How to interpret/expose a C++ raw memory as a numpy array *view* in Python? #2271

Closed
ShuhuaGao opened this issue Jun 27, 2020 · 12 comments

Comments

@ShuhuaGao

This is a question or, more likely, a feature request.

Issue description

Suppose I have a block of memory in C++ (for example, the storage of a std::vector or a C array). How can I expose it to Python as a numpy array? More specifically, the numpy array should be a view of the memory rather than a copy, just like numpy.frombuffer or Eigen::Map.
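For reference, the view behaviour of numpy.frombuffer mentioned above can be sketched in plain Python. This is only an illustration, assuming NumPy is installed; a stdlib array.array stands in for the C++-owned memory, and the names storage, view, and copy are made up for the example.

```python
import array

import numpy as np

# Stand-in for a C++-owned buffer: five C ints held by the stdlib array module.
storage = array.array("i", [1, 2, 3, 4, 5])

# np.frombuffer wraps the existing memory instead of copying it.
view = np.frombuffer(storage, dtype=np.intc)
view[0] = -1                              # write through the view

# An explicit copy, for contrast: changes to it never reach `storage`.
copy = np.array(storage, dtype=np.intc)
copy[4] = -5

print(storage.tolist())                   # [-1, 2, 3, 4, 5]
print(view.flags.owndata)                 # False: the view does not own its data
```

The write through view shows up in storage, while the write to copy does not. That is the behaviour wanted from the pybind11 side.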

The discussion on gitter did not yield a solution. This issue is related to #1042

Reproducible example code

According to the suggestion by @YannickJadoul on gitter, I have tried the empty py::capsule, but it did not work. A minimal example is below.

struct Group {
    int indices[5] = {1, 2, 3, 4, 5};
};

// I hope the returned py::array_t is a view that interprets memory in the C array 
py::array_t<int> get_indices(Group& g) {
    // an empty capsule
    return py::array_t<int>{5, g.indices, py::capsule{}};
}

PYBIND11_MODULE(mymodule, m) {
    py::class_<Group>(m, "Group", "doc of the Group struct")
        .def(py::init<>())
        .def_property_readonly("indices", &get_indices)
        .def(  // to facilitate examination of indices
            "print_indices",
            [](const Group& g) {
                py::print("The indices is now: ");
                for (auto i : g.indices) py::print(i);
            },
            "print the content of indices");
}

Test it in Python

import mymodule

g = mymodule.Group()
g.print_indices()  # 1 2 3 4 5
g.indices[0] = -1
g.print_indices()  # still 1 2 3 4 5

It would be great if a frombuffer-like method were available for py::array and py::array_t (without copying and without taking ownership), though I am not sure how difficult that would be.

@YannickJadoul
Collaborator

YannickJadoul commented Jun 27, 2020

The discussion on gitter did not yield a solution.

I had not been able to notice or reply to the discussion on Gitter in the hour and a quarter before you posted this issue...

But as I have now replied there, the suggestion I made a long time ago, which I said might "maybe" work, doesn't: it seems py::capsule() doesn't produce an empty capsule but just a NULL PyObject*, which is not considered a valid base argument.

The solution is probably to just pass py::cast(g) or py::cast(&g) as the base argument. If the object already has a Python equivalent, pybind11 will just give you the py::object corresponding to the C++ object, and you'll be sure the memory behind your array stays alive, even if you don't have a reference left to the surrounding Group object in your Python code.

Alternatively, pass any valid object (e.g., take py::none() and use it as a "fake" base object). But this is unsafe if your C++ object gets destructed, which is why pybind11 does not do that by default. I explained this in the Gitter link you refer to.
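The keep-alive effect of a base object can be observed from Python with a weak reference. The sketch below is only an analogy, assuming NumPy is installed: a stdlib array.array plays the role of the C++ object, numpy.frombuffer plays the role of the py::array_t constructor, and all names are illustrative.

```python
import array
import gc
import weakref

import numpy as np

owner = array.array("i", [1, 2, 3, 4, 5])    # stand-in for the C++ object
alive = weakref.ref(owner)                    # lets us observe its lifetime

# The view keeps a reference to `owner` through its base chain.
view = np.frombuffer(owner, dtype=np.intc)

del owner
gc.collect()
owner_survived = alive() is not None          # still reachable via the view

del view
gc.collect()
owner_released = alive() is None              # freed once the last view is gone

print(owner_survived, owner_released)
```

An array whose base is an unrelated object such as py::none() has no such link to the memory's owner, which is the unsafe case described above.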

As for the feature request: you can always make a PR, if you want, but there are a lot of memory management subtleties where Python and C++ don't match. So that's probably why there isn't a feature like this.

@YannickJadoul
Collaborator

Another relevant link found by @bstaletic: https://gitter.im/pybind/Lobby?at=5a341ed6540c78242dcca87d

@ShuhuaGao
Author

A solution was obtained with the help of @YannickJadoul and @bstaletic, which is posted below for others' reference:

py::array_t<int> get_indices(Group& g) { 
    return py::array_t<int>{5, g.indices, py::cast(g)}; 
}

An alternative to py::cast(g) is py::none(). In fact, any non-NULL Python object will work here.
Do note that only a view is returned, and a user must pay attention to the lifetime of the memory, which is not very Pythonic.

I will leave this issue open for some while in case someone proposes a better solution 😊.

@YannickJadoul
Collaborator

I will leave this issue open for some while in case someone proposes a better solution 😊.

I'm not sure there is a better solution? If your problem is solved, could you close it? There are already far too many open issues in this project :-)

@molpopgen

Do note that only a view is returned, and a user must pay attention to the lifetime of the memory, which is not very Pythonic.

A keep_alive policy should help here?

@molpopgen

Actually, it seems that the lifetime management here is okay, as you've declared it a readonly property, so reference_internal is in play: https://pybind11.readthedocs.io/en/stable/advanced/functions.html#return-value-policies

@YannickJadoul
Collaborator

That's a good point, indeed. I had not considered the combination of that mechanism with the base mechanism.

So that would mean that even when not having py::cast(g) as base, it might still be safe? That would indeed allow for a looser handling of memory!

@molpopgen

So that would mean that even when not having py::cast(g) as base, it might still be safe? That would indeed allow for a looser handling of memory!

One would have to do some tests to confirm, but I think one can have safety and Pythonic semantics here.

@YannickJadoul
Collaborator

One would have to do some tests to confirm

No, you're right. I see no obvious reason why it wouldn't work.

but I think one can have safety and Pythonic semantics here.

You can already do so now, of course. But this could make it easier, indeed.

@ShuhuaGao
Author

Hi, @molpopgen and @YannickJadoul, if I am not mistaken, it seems the aforementioned reference_internal does not work as expected. See the tests below.

Test 1: use py::none() as base

struct Group {
    int indices[5] = {1, 2, 3, 4, 5};
};

py::array_t<int> get_indices(Group& g) { 
    return py::array_t<int>{5, g.indices, py::none()}; 
}

py::class_<Group>(m, "Group", "doc of the Group struct")
        .def(py::init<>())
        .def_property_readonly("indices", &get_indices, "getter of indices")
        .def(  // to print the internal data
            "print_indices", [](const Group& g) {
                for (auto i : g.indices) py::print(i);
            });

Test in python.

import gc

import mymodule

g = mymodule.Group()
g.print_indices()  # 1 2 3 4 5
g.indices[0] = -1
g.indices[4] = -5
g.print_indices()  # -1 2 3 4 -5

# test the lifecycle
ref_indices = g.indices
del g
gc.collect()
print('after gc.collect: ')
for i in range(3):
    print(f'Access #{i}: ', ref_indices)

After gc.collect(), each run produced different results in the last line. One result is listed below.

after gc.collect: 
Access #0:  [1478849888      21915          3          4         -5]
Access #1:  [1482134912      21915 1478849888      21915         64]
Access #2:  [1482134912      21915 1478849888      21915         64]

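It seems there is a memory violation, since we read what look like random numbers: the array still points at the memory of the Group object that has already been destroyed.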

Test 2: the only difference is to replace py::none() with py::cast(g)

Now the above code works, and the result is always:

after gc.collect: 
Access #0:  [-1  2  3  4 -5]
Access #1:  [-1  2  3  4 -5]
Access #2:  [-1  2  3  4 -5]

@YannickJadoul
Collaborator

@molpopgen, @ShuhuaGao Right. From looking at the code, it seems the return_value_policy is ignored when returning a py::object (or py::handle or any subclass). That kind of makes sense: you have already created a Python object that lives in the Python world, and any Python object is supposed to work correctly with Python memory semantics. So it's the job of the cast from C++ to Python to take memory management into account (like keeping things alive), and that's why you should use the correct base there.

Sorry for the confusion; I hadn't thought of/considered that :-/

@maartenbreddels

Unless I am mistaken, I think this is the only resource that gives a hint on how to return a NumPy array that is a view on the data of a C++ object and that keeps the C++ object alive.

Just to summarize, this should be the example, where the returned numpy array does not copy the data, and keeps the Group C++ object alive:

py::array_t<int> get_indices(Group& g) { 
    return py::array_t<int>{5, g.indices, py::cast(g)}; 
}

At least my testing shows this to be the case, and I thought I would make this more explicit, so others might find this as well.
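A check along these lines can be scripted from the Python side. The sketch below is a hedged illustration, assuming NumPy is installed: FakeGroup is a hypothetical pure-Python stand-in for the bound Group class, and looks_like_live_view is a made-up helper name, not part of pybind11 or NumPy.

```python
import numpy as np


class FakeGroup:
    """Pure-Python stand-in for the bound Group class (hypothetical)."""

    def __init__(self):
        self._storage = np.array([1, 2, 3, 4, 5], dtype=np.intc)

    @property
    def indices(self):
        # Return a view on the internal storage, never a copy.
        return self._storage.view()


def looks_like_live_view(group):
    """Heuristic: is `group.indices` a no-copy view with a base object?"""
    first, second = group.indices, group.indices
    first[0] = -1                          # write through one view...
    visible = second[0] == -1              # ...and read it back via another
    return bool(
        visible
        and not first.flags.owndata        # the array does not own its data
        and first.base is not None         # something is registered as base
        and np.shares_memory(first, second)
    )


result = looks_like_live_view(FakeGroup())
print(result)
```

With the real extension one would pass mymodule.Group() instead of FakeGroup(). The helper checks only the view and base properties, so the del g / gc.collect() test shown earlier in the thread is still needed to confirm the lifetime behaviour.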
