[Feature Request] memoryview builtin and support for python buffer protocol #1515

Open
1 task done
guidorice opened this issue Dec 19, 2023 · 4 comments
Labels
enhancement New feature or request mojo Issues that are related to mojo mojo-repo Tag all issues with this label mojo-stdlib Tag for issues related to standard library

Comments

@guidorice

guidorice commented Dec 19, 2023

Review Mojo's priorities

What is your request?

This enhancement request is to add support for Python's memoryview builtin and for the Python buffer protocol. Here are some ideas about the tasks and level of effort that might be involved:

  • Add a new Mojo trait (Bufferable?) with the dunder methods __buffer__ and __release_buffer__.
  • Add support for Python's builtin memoryview() on Mojo structs. Because __buffer__ returns a memoryview, the type has to be built into Mojo (not imported from a Python module).
  • Add support for the C-level interface defined by the Python buffer protocol (https://docs.python.org/3/c-api/buffer.html). This would allow Mojo structs that expose their memory as a Py_buffer to be consumed from Python. Alternatively, maybe they could be wrapped in a PythonObject and returned as a memoryview?
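To make the first two bullets concrete, here is a minimal Python sketch of the two hooks. The class name Bufferable mirrors the tentative trait name above and is purely hypothetical, as are the exports counter and the bytearray storage; nothing here is an existing Mojo API. The hooks are called by hand so the sketch also runs on older interpreters; on Python 3.12+ (PEP 688), memoryview(obj) is expected to invoke them implicitly.

```python
class Bufferable:
    """Hypothetical Python stand-in for the proposed Mojo trait."""

    def __init__(self, size: int) -> None:
        self._data = bytearray(size)  # storage owned by the object
        self.exports = 0              # number of views currently handed out

    def __buffer__(self, flags: int) -> memoryview:
        # Hand out a zero-copy view of the underlying storage.
        self.exports += 1
        return memoryview(self._data)

    def __release_buffer__(self, view: memoryview) -> None:
        # Called when the consumer is done with the view.
        view.release()
        self.exports -= 1


# Call the hooks directly (assumption: flags=0 stands in for a simple request).
obj = Bufferable(4)
view = obj.__buffer__(0)
view[0] = 42                          # writes straight into obj's storage
first_byte = obj._data[0]
exports_while_held = obj.exports
obj.__release_buffer__(view)
print(first_byte, exports_while_held, obj.exports)
```

The acquire/release pairing is the part a Mojo trait would need to preserve, since it lets the owner know when its memory is still being borrowed.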

What is your motivation for this change?

Currently Mojo 0.6 has poor (nonexistent?) support for zero-copy shared memory buffers with Python.

For example, the Ray Tracing notebook in Mojo's documentation copies raster imagery into a NumPy array using MLIR ops. Not only is this an unnecessary memory copy, it is also too verbose, undocumented, and not Pythonic. See def to_numpy_image(self) -> PythonObject: in the source notebook.
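The difference between copying and sharing can be sketched with the Python standard library alone. This is an illustration of the buffer protocol on the Python side, not of anything Mojo provides today; the tiny 2x2 RGB image and the variable names are made up for the example.

```python
width, height = 2, 2
pixels = bytearray(width * height * 3)   # RGB image, all zero

copied = bytes(pixels)                   # independent copy of the data
shared = memoryview(pixels)              # zero-copy view of the same memory

pixels[0] = 255                          # mutate the original storage

# The copy is stale, the view sees the change.
copy_sees_change = copied[0] == 255
view_sees_change = shared[0] == 255

# Reinterpret the flat bytes as height x width x channels, still without copying.
image = shared.cast("B", shape=[height, width, 3])
top_left = image.tolist()[0][0]
print(copy_sees_change, view_sees_change, top_left)
```

Any consumer that accepts buffer-protocol objects can work on `shared` or `image` directly, which is the behaviour this request asks for across the Mojo/Python boundary.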

Mojo should enable and encourage interop with existing scientific computing packages in the most efficient manner; the Apache Arrow format is one example.

"The Arrow C data interface is inspired by the Python buffer protocol, which has proven immensely successful in allowing various Python libraries exchange numerical data with no knowledge of each other and near-zero adaptation cost." (Arrow Spec)

This enhancement would also lay the groundwork for supporting the Python array API standard.

Any other details?

Related Discussions/Issues:

Reference PEPs:

@guidorice guidorice added enhancement New feature or request mojo Issues that are related to mojo labels Dec 19, 2023
@gryznar
Contributor

gryznar commented Dec 19, 2023

As a struct, it should be named MemoryView. Please be consistent and avoid Python's mess in naming!

@guidorice
Author

Good suggestion! The naming is a bit confusing: there is the Py_buffer struct at the C level, and in Python land the memoryview type, which is created by calling memoryview(). I definitely would not want to add new names or concepts if that can be avoided.

Also, I thought this Python example with comments might help to illustrate the idea a little more:

# Made-up example (chatbot-generated)
import array

import numpy as np

# Typecode 'i' is a C int, which is 4 bytes on most platforms
arr = array.array('i', [1, 2, 3, 4, 5])

# Wrap the array's memory without copying it
mem_view = memoryview(arr)

# Access properties of the memoryview
print(mem_view.nbytes)
print(mem_view.itemsize)

# Index like a list; slicing returns another memoryview, not a copy
print(mem_view[0])
print(mem_view[-1])
print(mem_view[1:3])

# Iterate through the memoryview
for num in mem_view:
    print(num)

# Get a NumPy array that shares the same memory (np.intc matches typecode 'i')
num_arr = np.frombuffer(mem_view, dtype=np.intc)
print(num_arr)

output

20
4
1
5
<memory at 0x1011590c0>
1
2
3
4
5
[1 2 3 4 5]
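Since the naming across the C and Python levels came up, here is a small sketch of how the attributes of a Python memoryview line up with the fields of the C-level Py_buffer struct. The dictionary keys follow the Py_buffer field names from the C API documentation; the dictionary itself is only an illustration, not an API.

```python
import array

# Typecode 'd' is a C double (8 bytes).
view = memoryview(array.array("d", [1.0, 2.0, 3.0]))

# Py_buffer field name -> value exposed by the Python-level memoryview
py_buffer_fields = {
    "len": view.nbytes,          # total size in bytes
    "itemsize": view.itemsize,   # size of one element in bytes
    "format": view.format,       # struct-module format string
    "ndim": view.ndim,           # number of dimensions
    "shape": view.shape,         # elements per dimension
    "strides": view.strides,     # bytes to step per dimension
    "readonly": view.readonly,   # whether writes are allowed
}

for name, value in py_buffer_fields.items():
    print(f"{name}: {value!r}")
```

A Mojo-side MemoryView (or whatever it ends up being called) would presumably need to carry the same metadata to interoperate.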

@guidorice
Author

I think this enhancement would open up numerous use cases like:

  • Mojo <-> C ABI
  • Mojo <-> Python modules/packages
  • Mojo <-> Python <-> C/Rust/Fortran etc backed packages
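As a rough illustration of the C ABI case, using only Python's standard library (no Mojo involved), ctypes can lay a C int array over memory that an array.array exports through the buffer protocol. The variable names are made up for the example.

```python
import array
import ctypes

values = array.array("i", [1, 2, 3, 4, 5])

# Build a C `int[5]` type and place it over the array's existing memory.
CIntArray = ctypes.c_int * len(values)
c_view = CIntArray.from_buffer(values)

# Both objects point at the same address, so no data was copied.
same_address = ctypes.addressof(c_view) == values.buffer_info()[0]

# A write through the C-side view is visible from the Python-side array.
c_view[0] = 100
print(same_address, values[0])
```

A pointer obtained this way could be handed to any C-ABI function, which is the kind of exchange the list above has in mind.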

@gryznar
Contributor

gryznar commented Dec 19, 2023

I am aware that I am being quite pedantic, but if Mojo implements this, it would IMHO be better to spend one more character and name this constructor "memory_view". I don't like Python's style of blending words together without any separator. Keeping names strictly synchronized with Python is also not ideal, because it would require following Python's behaviour directly, which may be painful in some cases.

If Mojo is to be Python++ rather than a compiled copy of Python, it will gain its own identity, and small improvements like this will be very noticeable.

@ematejska ematejska added the mojo-stdlib Tag for issues related to standard library label Dec 20, 2023
@ematejska ematejska added the mojo-repo Tag all issues with this label label Apr 29, 2024