[ctypes] [feature request] Create a Python string without buffer copies given a bytes pointer, size, kind #104689

Open
vadimkantorov opened this issue May 20, 2023 · 3 comments
Labels
topic-ctypes type-feature A feature request or enhancement

vadimkantorov commented May 20, 2023

In interop scenarios, it would be useful to have a Python string that references an existing buffer without copying it (e.g. if the underlying char data is stored in NumPy/PyTorch tensors, accessing these char buffers through a standard Python interface is helpful for debugging and sometimes for perf).

I think this might already be possible with ctypes, by making use of the existing PyUnicode/PyASCIIObject object layout and resetting the size/data fields to my own values.

I agree that the advantage over copying the byte buffer is small, but it could still be useful in some specific scenarios, e.g. mmap'ing a giant string from a file on disk and being able to examine it easily.

vadimkantorov added the type-feature (A feature request or enhancement) label May 20, 2023
sunmy2019 (Member) commented

CPython strings do not need to be copied, since they are immutable.

> mmap'ing a giant string

Use mmap.mmap
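
A minimal sketch of what this suggestion could look like, assuming the "giant string" is ASCII text in a file. The temporary file and its contents are made up for illustration. The mapping is searched in place, and only the small slices that are actually inspected get decoded into `str` objects.

```python
import mmap
import os
import tempfile

# Hypothetical stand-in for the "giant string" stored on disk.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"header\n" + b"x" * 1_000_000 + b"\nneedle here\n")
    path = f.name

with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
    # find() scans the mapped pages directly; the file is not read into a bytes object.
    pos = mm.find(b"needle")
    # Slicing copies only the requested range, which is then decoded.
    snippet = mm[pos:pos + 11].decode("ascii")
    first_line = mm[:mm.find(b"\n")].decode("ascii")

os.remove(path)
print(first_line, "|", snippet)
```

This does not give a zero-copy `str`, but the copies are limited to the parts being examined.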

> making use of existing PyUnicode/PyASCIIObject object

A lot of code needs to be changed. Cost > Benefits.

sunmy2019 (Member) commented

You can always use the Python buffer protocol for your use cases.
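
As a rough illustration of the buffer-protocol route, here is a sketch using only the standard library. The `bytearray` stands in for memory owned by something like a NumPy or PyTorch tensor, which is an assumption for the example. A `memoryview` slice shares the underlying memory, and a `str` is created only for the window that is actually inspected.

```python
# Stand-in for an externally owned char buffer (e.g. tensor storage).
buf = bytearray(b"zero-copy interop with the buffer protocol")

view = memoryview(buf)
window = view[10:17]            # slicing a memoryview does not copy the data

# str() accepts any bytes-like object; the copy happens here, for 7 bytes only.
text = str(window, "ascii")

# The view tracks later changes to the buffer, which shows that memory is shared.
buf[10:17] = b"INTEROP"
after = bytes(window)

print(text, after)
```

The resulting `str` is still a copy, but the view itself can be sliced, compared and passed around without duplicating the buffer.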

vadimkantorov commented May 20, 2023

I agree that the use case is narrow. But consider the hack I'm thinking of:

  1. construct an empty or 1-character-long string with the user-provided kind
  2. replace its size and data fields with user-provided values

If this is possible, then the function would be relatively simple to implement in the ctypes module without any redesign of the string data structures, and it would be useful in some debugging/zero-copy interop scenarios (e.g. if the underlying char data is stored in NumPy/PyTorch tensors).
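
For reference, a read-only sketch of the layout that the two steps above would have to touch, assuming CPython (where `id()` is the object's address). `peek_ascii` is a hypothetical helper, not an existing ctypes function, and the offsets are derived at runtime rather than hard-coded. It only reads the length field and the character bytes of a compact ASCII string; it deliberately does not overwrite anything, since a `str` patched this way would no longer own or match its memory.

```python
import ctypes
import sys

# Assumption: CPython, where id(obj) is the address of the object in memory.
HEAD_SIZE = object.__basicsize__             # sizeof(PyObject)
ASCII_STRUCT_SIZE = sys.getsizeof("") - 1    # sizeof(PyASCIIObject); "" adds only a NUL byte

def peek_ascii(s):
    """Hypothetical helper: read the length field and character bytes from memory."""
    if type(s) is not str or not s.isascii():
        raise ValueError("only exact, ASCII-only str objects are handled")
    addr = id(s)
    # The length is the first field after the object header.
    length = ctypes.c_ssize_t.from_address(addr + HEAD_SIZE).value
    # For compact ASCII strings the characters are stored inline after the struct.
    data = ctypes.string_at(addr + ASCII_STRUCT_SIZE, length)
    return length, data

print(peek_ascii("hello"))
```

Note that in this compact form the characters live inside the object itself, so there is no separate data pointer to redirect for such strings; this may affect how simple step 2 is in practice.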
