
Allow for fetching entire []byte slice as bytes #323

Closed
deuill opened this issue Apr 3, 2023 · 5 comments

Comments

@deuill

deuill commented Apr 3, 2023

Currently, []byte slices in Go are presented in Python as a custom type, Slice_byte, which allows for iterating over and fetching individual byte values from the underlying slice. However, this is rather inefficient for large slices, as fetching each element requires a separate round of FFI calls between Python, C, and Go (i.e. a call to the Slice_byte_elem function).

It might, therefore, be more efficient to allow returning the entire []byte slice as a Python bytes object in a single FFI call, perhaps as a C.CString (or rather via C.CBytes plus an explicit length, since binary data may contain zero bytes, which a NUL-terminated C string cannot represent). Even if each (or the first) call creates a copy, the savings in latency and CPU time should more than make up for it.
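A minimal Python sketch of the difference, assuming the raw address and length of the slice's memory were available on the Python side (gopy does not necessarily expose them). A ctypes array stands in for the Go-owned memory, and per_element_copy and bulk_copy are hypothetical names. It also shows why the length has to travel with the pointer: reading the same memory as a NUL-terminated string stops at the first zero byte.

```python
import ctypes

# Assumption: a ctypes array stands in for the Go-owned backing array of a
# []byte; in gopy this memory would live on the Go heap instead.
payload = b"\xff\x00binary\x00data"
n = len(payload)
backing = (ctypes.c_ubyte * n).from_buffer_copy(payload)
addr = ctypes.addressof(backing)


def per_element_copy(arr, length):
    """One call per byte, analogous to calling Slice_byte_elem in a loop."""
    return bytes(arr[i] for i in range(length))


def bulk_copy(address, length):
    """A single call that copies `length` bytes into a new immutable bytes object."""
    return ctypes.string_at(address, length)


print(per_element_copy(backing, n) == bulk_copy(addr, n) == payload)  # True

# Without an explicit length, string_at treats the memory as NUL-terminated
# and stops at the first zero byte, losing the rest of the data.
print(ctypes.string_at(addr))  # b'\xff'
```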

@rcoreilly
Member

Presumably this could be achieved by using a string type? However, the semantics of a []byte would be lost if it were not directly writable on the Python side.

@deuill
Author

deuill commented Apr 3, 2023

My understanding is that a Go string is automatically mapped to a Python str, which holds Unicode text, so the underlying bytes have to be valid UTF-8 to be decoded into it. Indeed, trying something similar out returns an error from the autogenerated Get method:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte
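The error can be reproduced in plain Python: 0xff can never start a valid UTF-8 sequence, so decoding arbitrary binary data into a str fails, whereas a bytes object holds every byte value unchanged. The sample data below is arbitrary.

```python
raw = b"\xff\x00\x10"  # arbitrary binary data, not valid UTF-8

try:
    # Mirrors the UTF-8 decode step implied by the error above.
    raw.decode("utf-8")
    message = None
except UnicodeDecodeError as exc:
    message = str(exc)

print(message)
# 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte

# A bytes object keeps every byte value, including 0xff and 0x00.
print(list(raw))  # [255, 0, 16]
```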

I suppose there's no way of forcing a Go string to be handled as bytes instead? I assume not (or that any such conversion would take as much effort in GoPy itself as the original request). This is also partly why I asked for bytes rather than bytearray: mapping mutations of a Python bytearray back to the Go []byte may not be feasible, so it might be simplest to add a function that returns an immutable copy of the entire byte slice.
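A short Python sketch of why an immutable copy is the simpler contract, assuming the slice's memory can be read by address (a ctypes array stands in for the Go-owned []byte). A bytearray built from the data is mutable but detached, so edits never reach the original unless something writes them back explicitly. The write-back via ctypes.memmove is purely illustrative; in gopy it would have to go through Go.

```python
import ctypes

# Assumption: a ctypes array stands in for the Go-owned []byte memory.
backing = (ctypes.c_ubyte * 4).from_buffer_copy(b"\x01\x02\x03\x04")
addr = ctypes.addressof(backing)

snapshot = ctypes.string_at(addr, 4)  # immutable bytes copy
editable = bytearray(snapshot)        # mutable, but still a detached copy
editable[0] = 0xFF

# Mutating the copy does not touch the original memory...
print(ctypes.string_at(addr, 4))  # b'\x01\x02\x03\x04'

# ...so propagating changes back needs an explicit write (hypothetical
# write-back step; a real binding would have to call into Go for this).
ctypes.memmove(addr, bytes(editable), len(editable))
print(ctypes.string_at(addr, 4))  # b'\xff\x02\x03\x04'

try:
    snapshot[0] = 0  # bytes objects are immutable
except TypeError:
    print("bytes is read-only")
```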

@rcoreilly
Member

I'm not familiar enough with the relevant Python conventions to judge, but it sounds like we might want this to work in different ways depending on the use case. I believe we can flag things with some kind of comment directive, so that might be an option. I can't quite remember where it is used, but I believe it determines how an interface{} is treated, or something to that effect.

@rcoreilly
Member

This is a great issue for someone to work on! The current model is that Go owns all the data structures exposed by gopy; they continue to be managed by Go's garbage collector and are accessed exclusively through (auto-generated) handles. To do something more efficient, gopy could expose a method that returns an unsafe pointer to any slice's raw memory (&slice[0]) together with its length, which then shows up on the Python side as a bytes object. Appropriate safeguards and warnings would be needed to ensure that this raw memory is copied into Python immediately and that the pointer is not kept around after this initial call, since it becomes increasingly likely to be invalid (for example, once the slice is grown by append or its backing array is collected). Presumably the Python wrapper just makes the copy immediately, within the same call.

@rcoreilly
Member

Fixed by #342
