Not clear how to expose existing C++ vector as numpy array #1042

yesint · 2017-08-30T07:46:05Z

This is a question of documentation rather than an issue. I can't find any example of the following very common scenario:

std::vector<int> some_func();
...
// We want to expose returned std::vector as a numpy array without copying
m.def("some_func", []() -> py::array {
   auto data = some_func();
   // What to do with data?? Map it with Eigen (then return what?), wrap somehow with py::buffer (how?)
});

I don't know the answer. It would be very nice to have this explained in the docs, since this scenario is rather common.


YannickJadoul · 2017-08-30T09:35:37Z

py::array will automatically copy your data if you don't give it a base argument in the constructor (though maybe that's indeed not very well documented).

If you don't want to copy, one solution would be to move the std::vector into a py::capsule and use that capsule as the base for a new py::array, then keep using the v.data() of that moved vector to construct the array. If I'm not mistaken, the returned py::array will then keep the capsule alive and delete the vector once the capsule is garbage collected.
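A plain-C++ sketch of the ownership half of this idea, with no pybind11 involved. `HeapOwnedBuffer`, `move_to_heap` and `release` are hypothetical names, and `release` only stands in for what the capsule's destructor would do. It shows why it is safe to keep using `data()` after the move: moving a `std::vector` hands over its buffer instead of copying the elements (every mainstream standard library behaves this way, and the snippets in this thread rely on it).

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical plain-C++ stand-in (not pybind11) for the
// "move the vector to the heap, keep using its data()" step.
struct HeapOwnedBuffer {
    std::vector<int>* owner;  // heap-allocated vector that must outlive the view
    const int* data;          // pointer that would be handed to py::array
    std::size_t size;         // number of elements (the 1D shape)
};

inline HeapOwnedBuffer move_to_heap(std::vector<int>&& v) {
    // Moving transfers the vector's buffer instead of copying the elements,
    // so the heap object's data() is the same memory the caller filled.
    auto* owner = new std::vector<int>(std::move(v));
    return {owner, owner->data(), owner->size()};
}

// Plays the role of the capsule destructor: runs once, when the last
// Python reference to the array goes away.
inline void release(HeapOwnedBuffer& b) {
    delete b.owner;
    b = {nullptr, nullptr, 0};
}
```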

YannickJadoul · 2017-08-30T09:47:32Z

Untested code, but this should be the implementation of the non-copy approach:

auto v = new std::vector<int>(some_func());
auto capsule = py::capsule(v, [](void *v) { delete reinterpret_cast<std::vector<int>*>(v); });
return py::array(v->size(), v->data(), capsule);

Yes, probably more black magic than you might expect. But then again, you're not doing something simple either. You are keeping a C++ object alive to make sure you can access its internal data safely, without leaking the memory.

But if you don't mind the copy, just go:

auto v = some_func();
return py::array(v.size(), v.data());

yesint · 2017-08-30T10:03:23Z

@YannickJadoul, thank you very much, it really works! I don't mind doing black magic (and the magic is in fact quite logical), but currently the user is not even aware that this kind of magic exists.
Are there any plans to document the usage of py::array and py::capsule? The constructors of these types are non-trivial, and the usage of the base argument is, well, a bit arcane.

yesint · 2017-08-30T10:19:34Z

Another suggestion: probably it makes sense to provide an easy non-copying conversion from any contiguous buffer to py::array? Something like:

auto v = new std::vector<int>(some_func());
py::array array_from_buffer<int>(v, int ndim, shape, strides);

which would create the corresponding py::buffer_info and capsule internally?
This could be a great addition in cases where numerical data have to be returned, especially if one needs to wrap a function like:

void some_func(vector<int>& val1, vector<vector<float>>& val2);

Manual wrapping of each argument with py::buffer, py::capsule into the py::array becomes tedious in such cases.

YannickJadoul · 2017-08-30T10:31:07Z

but currently the user is not even aware that this kind of magic exists.

Agreed, I had to look into the actual headers to check the exact constructors, etc. But I don't know about planned documentation updates. If you feel like it, I'm sure a PR with more documentation on this would be gladly accepted ;-) Then again, I'm not always sure what's stable API and what're implementation details.

YannickJadoul · 2017-08-30T10:34:51Z

Probably it makes sense to provide an easy non-copying conversion from any contiguous buffer to py::array?

Not sure how easy that is to do (and how much more confusing this will make the whole situation). Maybe some kind of a static function as 'named constructor' could make sense, though?

By the way, std::vector<std::vector<int>> is not a contiguous structure. And I don't think this technique works when (un)wrapping the arguments of a function. What I just described was a way of not copying a returned std::vector.

yesint · 2017-08-30T10:59:22Z

Sure, it won't work with "input" function parameters, but it works for "output" ones, when one transforms a C++ signature into a Python function returning a tuple of numpy arrays instead of a bunch of reference parameters (that's exactly my case). In any case, such a thing should not be automatic; the user has to make it explicit in the lambda.

Indeed, std::vector<std::vector<float>> is not contiguous, sorry. But, for example, std::vector<Eigen::Vector3f> is contiguous and could be returned efficiently as a 2D array. Such structures are common when dealing with a variable number of spatial points, where Eigen::MatrixXf is not usable due to the unknown dimension.
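To illustrate the contiguity point without pulling in Eigen, here is a sketch that uses `std::array<float, 3>` as a stand-in for `Eigen::Vector3f` (an assumption: both hold three floats with no padding, which the `static_assert` checks for the stand-in). It computes the shape and byte strides a 2D NumPy view would need; `Point3`, `Shape2D` and `as_2d` are hypothetical names. These are the values one would pass as shape and strides to the `py::array` (shape, strides, ptr, base) constructor used elsewhere in this thread.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

using Point3 = std::array<float, 3>;  // stand-in for Eigen::Vector3f
// Viewing the points as one (n, 3) block requires no padding between them.
static_assert(sizeof(Point3) == 3 * sizeof(float), "Point3 must be unpadded");

struct Shape2D {
    const float* data;                   // first float of the first point
    std::size_t rows, cols;              // shape: (n, 3)
    std::size_t row_stride, col_stride;  // strides in bytes, C order
};

inline Shape2D as_2d(const std::vector<Point3>& pts) {
    return {pts.empty() ? nullptr : pts.front().data(),
            pts.size(), 3,
            sizeof(Point3), sizeof(float)};
}
```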

xkunglu · 2018-11-28T01:23:51Z

@YannickJadoul your code works, thanks for the reference, I just wanted to point out that you are missing a parenthesis at the end of the line
auto cap = py::capsule(v, [](void *v) { delete reinterpret_cast<std::vector<int>*>(v); });

arquolo · 2019-06-28T14:17:38Z

By the way, does anybody know how to get a py::array_t from a std::shared_ptr<std::vector<T>> without copying (and without using new/delete)?
I tried this:

std::shared_ptr<std::vector<float>> ptr = get_data();
return py::array_t<float>{
    ptr->size(),
    ptr->data(),
    py::capsule(ptr.get(), [](void* p){ reinterpret_cast<decltype(ptr)*>(p)->reset(); }),
};

Obviously, this will never work, because ptr is destroyed when the function returns.
Using a capture does not help either, because py::capsule can't accept capturing lambdas:

std::shared_ptr<std::vector<float>> ptr = get_data();
return py::array_t<float>{
    ptr->size(),
    ptr->data(),
    py::capsule([ptr](){ }),  // using lambda-capture to increase lifetime of ptr
};

This solution worked (although it seems very dirty):

std::shared_ptr<std::vector<float>> ptr = get_data();
return py::array_t<float>{
    ptr->size(),
    ptr->data(),
    py::capsule(
        new auto(ptr),  // <- can leak
        [](void* p){ delete reinterpret_cast<decltype(ptr)*>(p); }
    )
};

YannickJadoul · 2019-06-28T16:22:11Z

@arquolo Indeed, the only data that can be stored in a py::capsule is a single void * and a simple function pointer (this is a Python C API thing, by the way; pybind11 just made a C++ wrapper around it). So if you want the capsule to be a (co-)owner of the shared_ptr, I would think that the last solution is the only one that works and stores the actual shared_ptr object.

Is it that dirty, though? In the end, a capsule taking a std::function (or any kind of lambda/functor object) would incur this same allocation (inside of the std::function) because of the variable size of the capture.
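A plain-C++ model of that constraint, to show why the `new auto(ptr)` version works. `MiniCapsule` is a hypothetical stand-in for `py::capsule` that, like the C API capsule, can only hold a `void*` and a plain function pointer; `Data` and `make_owning_capsule` are likewise illustrative names. A captureless lambda converts to such a pointer and a capturing one does not, so a copy of the `shared_ptr` has to be placed on the heap and handed over as the `void*`.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for py::capsule: it stores only one void* and one
// plain function pointer, which is called on destruction.
class MiniCapsule {
public:
    using Destructor = void (*)(void*);
    MiniCapsule(void* p, Destructor d) : ptr_(p), dtor_(d) {}
    MiniCapsule(const MiniCapsule&) = delete;
    MiniCapsule& operator=(const MiniCapsule&) = delete;
    ~MiniCapsule() { if (dtor_) dtor_(ptr_); }

private:
    void* ptr_;
    Destructor dtor_;
};

using Data = std::shared_ptr<std::vector<float>>;

inline std::unique_ptr<MiniCapsule> make_owning_capsule(const Data& ptr) {
    // A capturing lambda has no conversion to a plain function pointer...
    auto capturing = [ptr](void*) { (void)ptr; };
    static_assert(!std::is_convertible<decltype(capturing), MiniCapsule::Destructor>::value,
                  "captures cannot be stored in a capsule");
    // ...but a captureless one does, so the shared_ptr itself goes to the heap.
    auto deleter = [](void* p) { delete static_cast<Data*>(p); };
    static_assert(std::is_convertible<decltype(deleter), MiniCapsule::Destructor>::value,
                  "captureless lambdas convert to function pointers");
    // As with `new auto(ptr)` above, this copy leaks if allocating the capsule throws.
    return std::make_unique<MiniCapsule>(new Data(ptr), deleter);
}
```

While the capsule is alive it co-owns the vector (the `use_count` goes up by one); destroying the capsule deletes the heap copy and drops the count again.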

The one thing to note, though, is that the base object doesn't need to be a capsule. It can just as well be any other object (though preferably one that keeps the data alive), so if your shared_ptr were stored as a member of a C++ class that is exposed to Python, you could also use that py::object.

ferdonline · 2019-07-04T22:00:31Z

We define the following utility functions, which have proven to be life savers :)

template <typename Sequence>
inline py::array_t<typename Sequence::value_type> as_pyarray(Sequence&& seq) {
    // Move entire object to heap (Sequence must be movable). Memory handled via Python capsule
    Sequence* seq_ptr = new Sequence(std::move(seq));
    auto capsule = py::capsule(seq_ptr, [](void* p) { delete reinterpret_cast<Sequence*>(p); });
    return py::array(seq_ptr->size(),  // shape of array (1D, number of elements)
                     seq_ptr->data(),  // pointer to the contiguous elements of Sequence
                     capsule           // numpy array references this parent
    );
}

and the copy version

template <typename Sequence>
inline py::array_t<typename Sequence::value_type> to_pyarray(const Sequence& seq) {
    return py::array(seq.size(), seq.data());
}

arquolo · 2019-07-09T07:42:43Z

Thanks @ferdonline
However, the move-helper needs to change signature to:

template <typename Sequence,
          typename = std::enable_if_t<std::is_rvalue_reference_v<Sequence&&>>>
inline py::array_t<typename Sequence::value_type> as_pyarray(Sequence&& seq)

With such a fix, the compiler will reject (with an error, not just a warning) any call made without std::move.
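Why the `enable_if` condition separates the two cases: with a forwarding reference `Sequence&&`, an lvalue argument deduces `Sequence` as `T&` (so `Sequence&&` collapses to `T&`), while an rvalue argument deduces plain `T`. A small sketch; `is_moved_in` and `only_rvalues` are hypothetical names, and the `static_assert`s just spell out the reference collapsing. It uses the C++14 `::value` form instead of `_v` so it also builds before C++17.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <utility>
#include <vector>

template <typename Sequence>
constexpr bool is_moved_in = std::is_rvalue_reference<Sequence&&>::value;

// What Sequence deduces to for an rvalue argument vs. an lvalue argument.
static_assert(is_moved_in<std::vector<int>>, "rvalue argument: Sequence = T");
static_assert(!is_moved_in<std::vector<int>&>, "lvalue argument: Sequence = T&");

// Same constraint as the suggested signature: calls with an lvalue do not compile.
template <typename Sequence,
          typename = std::enable_if_t<std::is_rvalue_reference<Sequence&&>::value>>
std::size_t only_rvalues(Sequence&& seq) {
    auto owned = std::move(seq);  // safe: the caller explicitly gave the container up
    return owned.size();
}
```

Calling `only_rvalues(v)` with a named vector `v` fails overload resolution, while `only_rvalues(std::move(v))` and temporaries are accepted.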

ferdonline · 2019-10-25T13:48:13Z

With such a fix, the compiler will warn you if you call it without std::move

@arquolo If you call without std::move, it will bind as an L-value reference and then inside it does the std::move anyway. IMHO that's a fine behavior.

YannickJadoul · 2019-10-25T13:54:40Z

@arquolo If you call without std::move, it will bind as an L-value reference and then inside it does the std::move anyway. IMHO that's a fine behavior.

You will destroy the original container, then, though. That's quite unexpected if you didn't pass the container as an rvalue.
Isn't the standard solution to use std::forward<Sequence>(seq)? In that case you'll copy if you pass an lvalue reference, and you'll move if you get an rvalue or rvalue reference.
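A small illustration of the difference being discussed, with a hypothetical `take_move`/`take_forward` pair standing in for the body of `as_pyarray`. Forcing `std::move` steals the buffer even from a caller's lvalue (leaving it in a valid but unspecified state, in practice empty), while `std::forward<Sequence>` copies lvalues and moves only rvalues.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>
#include <vector>

// Always moves, even when the caller passed an lvalue (the as_pyarray behaviour).
template <typename Sequence>
std::decay_t<Sequence> take_move(Sequence&& seq) {
    return std::decay_t<Sequence>(std::move(seq));
}

// Copies lvalues and moves rvalues (the std::forward suggestion).
template <typename Sequence>
std::decay_t<Sequence> take_forward(Sequence&& seq) {
    return std::decay_t<Sequence>(std::forward<Sequence>(seq));
}
```

Which one is preferable is the design question in the following replies: `take_move` never copies but surprises callers that pass a named container, while `take_forward` keeps the caller's container intact at the cost of a copy.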

ferdonline · 2019-10-25T14:04:27Z

The function is called as_pyarray and the "docs" say it will move, so I think it's fine, but it's your choice.
It's standard to use std::forward in case you want to pass on the same reference type. Here we don't care, we just want to transform whatever reference type to an rvalue reference.

LeslieGerman · 2020-01-17T17:27:02Z

By the way,
anybody knows, how to get py::array_t from std::shared_ptr<std::vector<T>> without copy (and using new/delete)?

@arquolo , you might be interested in what I have found: #323 (comment)

YannickJadoul · 2020-06-10T19:36:36Z

If anyone's interested in a version of @ferdonline's utility function without explicit/manual new and delete:

template <typename Sequence>
inline py::array_t<typename Sequence::value_type> as_pyarray(Sequence &&seq) {
    auto size = seq.size();
    auto data = seq.data();
    std::unique_ptr<Sequence> seq_ptr = std::make_unique<Sequence>(std::move(seq));
    auto capsule = py::capsule(seq_ptr.get(), [](void *p) { std::unique_ptr<Sequence>(reinterpret_cast<Sequence*>(p)); });
    seq_ptr.release();
    return py::array(size, data, capsule);
}

Apart from avoiding new and delete, this also does not leak if for some reason py::capsule would throw.

sharpe5 · 2020-06-21T16:09:02Z

@YannickJadoul

template <typename Sequence>
inline py::array_t<typename Sequence::value_type> as_pyarray(Sequence &&seq) {
    auto size = seq.size();
    auto data = seq.data();
    std::unique_ptr<Sequence> seq_ptr = std::make_unique<Sequence>(std::move(seq));
    auto capsule = py::capsule(seq_ptr.get(), [](void *p) { std::unique_ptr<Sequence>(reinterpret_cast<Sequence*>(p)); });
    seq_ptr.release();
    return py::array(size, data, capsule);
}

Apart from avoiding new and delete, this also does not leak if for some reason py::capsule would throw.

I'm not sure this would work?

The memory would be freed early as there is nothing left to hold onto the heap allocation after the unique_ptr goes out of scope.

Then another heap allocation could grab the same memory, and new writes could corrupt what is already there (i.e. the numpy buffer we just returned). See https://www.cplusplus.com/reference/memory/unique_ptr/get/.

sharpe5 · 2020-06-21T16:10:51Z

@YannickJadoul This is what I am using:

/**
 * \brief Returns py::array_t<T> from vector<T>. Efficient as zero-copy.
 * - Uses std::move to obtain ownership of said vector and transfer everything to the heap.
 * - Only accepts parameter using std::move(...), or else the vector metadata on the stack will go out of scope (heap data will always be fine).
 * \tparam T Type.
 * \param passthrough Vector whose contents are moved into the returned array.
 * \return py::array_t<T> that owns the moved vector; the vector is freed when the array is garbage collected.
 */
template<typename T>
inline py::array_t<T> toPyArray(std::vector<T>&& passthrough)
{
	// Pass result back to Python.
	// Ref: https://stackoverflow.com/questions/54876346/pybind11-and-stdvector-how-to-free-data-using-capsules
	auto* transferToHeapGetRawPtr = new std::vector<T>(std::move(passthrough));
	// At this point, transferToHeapGetRawPtr is a raw pointer to an object on the heap. No unique_ptr or shared_ptr, it will have to be freed with delete to avoid a memory leak.

	// Alternate implementation: use a shared_ptr or unique_ptr, but this appears to be more difficult to reason about as a raw pointer (void *) is involved - how does C++ know which destructor to call?
	
	const py::capsule freeWhenDone(transferToHeapGetRawPtr, [](void *toFree) {				
		delete static_cast<std::vector<T> *>(toFree);
		//fmt::print("Free memory."); // Within Python, clear memory to check free: sys.modules[__name__].__dict__.clear()
	});
	
	auto passthroughNumpy = py::array_t<T>(/*shape=*/{transferToHeapGetRawPtr->size()}, /*strides=*/{sizeof(T)}, /*ptr=*/transferToHeapGetRawPtr->data(), freeWhenDone);
	return passthroughNumpy;	
}

YannickJadoul · 2020-06-21T16:34:02Z

@sharpe5

The memory would be freed early as there is nothing left to hold onto the heap allocation after the unique_ptr goes out of scope.

That's why you call seq_ptr.release(), to release ownership of the pointer, right? (but only after you're certain the creation of the py::capsule worked) See https://en.cppreference.com/w/cpp/memory/unique_ptr/release
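A plain-C++ sketch of that ordering: ownership stays with the `unique_ptr` until the capsule has been constructed, and only then is `release()` called. `FlakyCapsule`, `Tracked` and `wrap` are hypothetical stand-ins (the capsule constructor can be told to throw), used only to count destructions; the deleter mirrors the `std::unique_ptr<Sequence>(...)` trick from the snippet above.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>

struct Tracked {
    static int destroyed;  // counts destructor calls
    ~Tracked() { ++destroyed; }
};
int Tracked::destroyed = 0;

// Hypothetical capsule stand-in whose constructor may throw.
class FlakyCapsule {
public:
    FlakyCapsule(void* p, void (*d)(void*), bool fail) : ptr_(p), dtor_(d) {
        if (fail) throw std::runtime_error("capsule construction failed");
    }
    ~FlakyCapsule() { dtor_(ptr_); }
    FlakyCapsule(const FlakyCapsule&) = delete;
    FlakyCapsule& operator=(const FlakyCapsule&) = delete;

private:
    void* ptr_;
    void (*dtor_)(void*);
};

inline std::unique_ptr<FlakyCapsule> wrap(bool fail) {
    auto owner = std::make_unique<Tracked>();
    // If this throws, `owner` still owns the object and destroys it: no leak.
    auto cap = std::make_unique<FlakyCapsule>(
        owner.get(),
        // The temporary unique_ptr deletes the object when it goes out of scope.
        [](void* p) { std::unique_ptr<Tracked>(static_cast<Tracked*>(p)); },
        fail);
    owner.release();  // only now does the capsule become the sole owner
    return cap;
}
```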

@YannickJadoul This is what I am using:

This seems quite similar to (or the same as?) @ferdonline's utility function. As far as I can see, it will still leak memory if py::capsule throws, because nothing is holding on to that raw pointer. But in practice it probably won't throw, and if it does, something else is probably wrong, so it's fine to use.
Also, it uses raw new/delete, which is what I tried and managed to avoid with my fragment.

sharpe5 · 2020-06-21T17:25:10Z

@YannickJadoul You are right, your code is absolutely correct.

I can't help but think that the content of the capsule function is just a very complicated way of calling delete. I greatly prefer modern C++ and smart pointers, but if there is (void *) in the middle it becomes more difficult to reason about the data flow (for me at least!). Either smart pointers up and down the entire stack, or not at all? It is tricky to choose the right level of abstraction, and sometimes if one abstracts too much the intent gets obscured.

I did not see @ferdonline's utility function initially (see above), the one I quoted was written from first principles. It's somewhat interesting that they are virtually identical :)

YannickJadoul · 2020-06-21T20:17:53Z

I can't help but think that the content of the capsule function is just a very complicated way of calling delete.

Yes, it definitely is, but it does have the advantage of covering the corner case of exceptions in py::capsule's constructor and of following the good practice of avoiding new and delete. I don't think it's that much more complicated, so I just threw that version out there in case people want to use it. But of course, use whatever is most comfortable for you.

bstaletic · 2020-07-23T00:53:28Z

This issue has been resolved. @YannickJadoul has done a great job answering questions here. Further questions are better suited for Gitter.

YannickJadoul · 2020-07-23T16:04:40Z

I'm thinking. Maybe we can/should add a convenience function for this to pybind11, since it seems to be such a popular issue. I'll reopen to remind ourselves.

sharpe5 · 2020-07-23T18:10:19Z

For the record, I have a large Python module that has zero-copy communication between Python and C++ when working with columns in a DataFrame. It is zero-copy both ways, i.e. Python >> C++ and C++ >> Python.

It is blazingly fast.

I usually combine it with OpenMP or TBB to do multi-threaded calculations on the column data.

It is all in pybind11 and modern C++ (except for one raw pointer, which is wrapped in a function; see above). It's easily testable: when the function is called from C++ it accepts a templated vector, and when it is called from Python it accepts a templated span.

The zero-copy C++ >> Python adapter is in my post above.

This is the zero-copy Python >> C++ adapter:

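#include <pybind11/numpy.h>
#include <span>
#include <stdexcept>

namespace py = pybind11;

/**
 * \brief Returns span<T> from py::array_t<T>. Efficient as zero-copy.
 * \tparam T Element type.
 * \param passthrough Numpy array (owned by the Python caller).
 * \return std::span<T> referencing the contents of the Numpy array; valid only while the array is alive.
 */
template<class T = float>
inline std::span<T> toSpan(const py::array_t<T>& passthrough)
{
	py::buffer_info passthroughBuf = passthrough.request();
	if (passthroughBuf.ndim != 1) {
		throw std::runtime_error("Error. Number of dimensions must be one");
	}
	// A strided view (e.g. arr[::2]) is not contiguous; a span over it would address the wrong elements.
	if (passthroughBuf.shape[0] > 1 && passthroughBuf.strides[0] != static_cast<py::ssize_t>(sizeof(T))) {
		throw std::runtime_error("Error. Array must be contiguous");
	}
	size_t length = static_cast<size_t>(passthroughBuf.shape[0]);
	T* passthroughPtr = static_cast<T*>(passthroughBuf.ptr);
	std::span<T> passthroughSpan(passthroughPtr, length);
	return passthroughSpan;
}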

ghost · 2020-07-25T07:37:16Z

Hi, I would like to check whether the cleanup function is really called, so I wrote the following code.

auto v = new std::vector<int>(some_func());
auto capsule = py::capsule(v, [](void *v) { 
    py::scoped_ostream_redirect output;
    std::cout << "deleting int vector\n";
    delete reinterpret_cast<std::vector<int>*>(v); 
});
return py::array(v->size(), v->data(), capsule);

However, "deleting int vector" is not printed when I run a Python script. I even added the following Python code at the end of the script, but it had no effect.

import gc
gc.collect(2)
gc.collect(1)
gc.collect(0)

Could you help me to make the cleanup function called explicitly?

Thank you

sharpe5 · 2020-07-25T07:40:26Z

@tlsdmstn56-2 You need to delete the variable returned by the pybind11 module on the Python side, or else the memory will not be freed. py::array returns a zero-copy reference to the data, so the memory will be held on the C++ side until it is no longer needed on the Python side.

del my_variable

cchriste · 2021-06-01T22:19:17Z

@sharpe5

For the record, I have a large Python module that has zero-copy communication between Python and C++ when working with columns in a DataFrame. It is zero-copy both ways, i.e. Python >> C++ and C++ >> Python.

It is blazingly fast.

I usually combine it with OpenMP or TBB to do multi-threaded calculations on the column data.

It is all in pybind11 and Modern C++ (except for one raw pointer reference which is wrapped in a function; see above). It's easily testable, when the function is called from C++ is accepts a templated vector, and when it is called from Python it accepts a templated span.

The zero-copy C++ >> Python adapter is in my post above.

This is the zero-copy Python >> C++ adapter:
/**
 * \brief Returns span<T> from py:array_T<T>. Efficient as zero-copy.
 * \tparam T Type.
 * \param passthrough Numpy array.
 * \return Span<T> that with a clean and safe reference to contents of Numpy array.
 */
template<class T=float32_t>
inline std::span<T> toSpan(const py::array_t<T>& passthrough)
{
	py::buffer_info passthroughBuf = passthrough.request();
	if (passthroughBuf.ndim != 1) {
		throw std::runtime_error("Error. Number of dimensions must be one");
	}
	size_t length = passthroughBuf.shape[0];
	T* passthroughPtr = static_cast<T*>(passthroughBuf.ptr);
	std::span<T> passthroughSpan(passthroughPtr, length);
	return passthroughSpan;
}

This is great for sharing the raw data, but how does it handle ownership? It looks like the short answer is that it doesn't, but maybe I'm missing something. Thanks!

sharpe5 · 2021-06-02T08:30:16Z

@cchriste mentioned:

This is great for sharing the raw data, but how does it handle ownership? It looks like the short answer is that it doesn't, but maybe I'm missing something. Thanks!

Short answer: it doesn't, but that's fine, as the parent Python function caller holds ownership for the duration of the call. Remember, this is the "zero-copy Python >> C++ adapter", so Python creates the Numpy array, C++ modifies the array contents, then returns.

Here is an example scenario:

* Python creates a Numpy array; it is the owner.
* Python calls a method written in C++/pybind11.
* The C++ uses the `toSpan` method above to obtain a reference to this array.
* The C++ can then safely edit the contents of this array.
* The C++ returns.
* The Numpy array is now modified, without the overhead of copying the array's contents back and forth from Python to C++ to Python.

This is *really* useful when modifying columns in a DataFrame.

It would be possible to break this if we really wanted to. The C++ side could create another thread, and that thread could start modifying the array behind Python's back, even after the original function call had returned and the Python side had deallocated it. But we assume that once the C++ function returns, it does not touch that array again.


cchriste · 2021-06-02T15:41:21Z


I appreciate the quick reply, and agree this is very useful. For our use case, we do in fact want to take ownership of the data.

Going from C++ to Python seems safe: memory buffers are tagged with an ownership flag and, after the last reference to that memory is removed, won't be freed unless owned. Thanks for your other example demonstrating a trick to claim ownership when creating arrays for which pybind11 should simply provide a more straightforward argument.

The other way around does not seem as straightforward. Even if some clever combination of PyObject_GetBuffer/PyObject_Release can be used to ensure Python doesn't delete memory out from under C++, if it's deleted by C++ then any existing Python objects will suddenly be pointing to deallocated space. Maybe if ownership transfer is achieved using a move (a py::array& can be passed to C++, so it's possible to modify the object directly), and only if the reference count is exactly one, the desired goal can be achieved.

sharpe5 · 2021-06-02T15:59:40Z

@cchriste For Python to C++, I imagine that if the C++ wanted to take ownership of the data, the easiest and safest way would be to make a copy. I imagine that's the only way to prevent Python garbage collecting that data once del variable is executed on the Python side. Get it working first, then optimise it later.

You also mentioned:

if it's deleted by C++ then any existing Python objects will suddenly be pointing to deallocated space

... but the method above exposes the Numpy array as a span which is read-only as far as memory allocation/deallocation goes, and can be range checked, which goes a long way towards making any subsequent C++ code more robust. The span container is actually quite nice like that, see comments on StackOverflow. I'd also recommend putting some comments in the code as insurance against other developers making changes without a clear understanding of the limitations.

roastduck · 2022-06-02T07:48:38Z

This seems to be a good place to use a memoryview for holding onto the buffer instead of a capsule? #2307 is useful for invalidating the buffer once it has been released.

Actually, I think I misunderstood the problem, never mind. A memoryview might be useful in some of these cases however.

@virtuald I am also encountering this problem. As far as I understand, returning a memoryview means "lend" my memory to a memoryview, while returning an array with a capsule described in this thread means "move" my memory to an array. I would prefer lending (or borrowing), because there is less black magic. I can keep my owner object alive using keep_alive, which is equivalent to "moving", if the owner object is also exposed to PyBind11.

However, a memoryview is not a NumPy object. It does not support NumPy's arithmetic operations. Can I lend my memory to an array instead of a memoryview? ~~I found that some of the array's constructors support a borrowed or stolen parameter, but I did not find any documentation.~~

I have figured it out. I can "lend" my data to an array by passing it a capsule with an empty destructor.

ghost · 2023-08-02T17:48:11Z

Not necessarily a pybind solution, but you could allocate the std::vector on the heap with new. That way it won't get freed until you call delete, so it should be safe to use its .data() pointer as the pointer for the NumPy array (as long as something eventually calls delete; otherwise the memory leaks).

wuxian08 · 2024-07-02T07:55:33Z

For the record, I have a large Python module that has zero-copy communication between Python and C++ when working with columns in a DataFrame. It is zero-copy both ways, i.e. Python >> C++ and C++ >> Python.

It is blazingly fast.

I usually combine it with OpenMP or TBB to do multi-threaded calculations on the column data.

It is all in pybind11 and Modern C++ (except for one raw pointer reference which is wrapped in a function; see above). It's easily testable, when the function is called from C++ is accepts a templated vector, and when it is called from Python it accepts a templated span.

The zero-copy C++ >> Python adapter is in my post above.

This is the zero-copy Python >> C++ adapter:
/**
 * \brief Returns span<T> from py:array_T<T>. Efficient as zero-copy.
 * \tparam T Type.
 * \param passthrough Numpy array.
 * \return Span<T> that with a clean and safe reference to contents of Numpy array.
 */
template<class T=float32_t>
inline std::span<T> toSpan(const py::array_t<T>& passthrough)
{
	py::buffer_info passthroughBuf = passthrough.request();
	if (passthroughBuf.ndim != 1) {
		throw std::runtime_error("Error. Number of dimensions must be one");
	}
	size_t length = passthroughBuf.shape[0];
	T* passthroughPtr = static_cast<T*>(passthroughBuf.ptr);
	std::span<T> passthroughSpan(passthroughPtr, length);
	return passthroughSpan;
}

Thanks for sharing the code. One thing to note is that if T is a struct but not packed (i.e., std::is_class_v<T> && alignof(T) > 1), this might lead to a core dump on some machines. The reason is that when T is registered as a NumPy dtype, it loses the alignment requirement. One can check this with the assertion assert(py::cast<int>(py::dtype::of<T>().attr("alignment")) == 1);.

In this case, the alignment of the input buffer passthroughBuf.ptr would be 1, which violates the alignment of T and triggers errors on some platforms.
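A hedged sketch of how to detect and work around the problem on the C++ side, independent of pybind11: check the buffer address against `alignof(T)` before reinterpreting it, and fall back to `std::memcpy` (valid for trivially copyable `T`) when it is misaligned. `Point`, `is_aligned_for` and `load_unaligned` are illustrative names; a real adapter would presumably throw or copy the data when the check fails.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

struct Point {  // not packed: alignof(Point) == alignof(double)
    double x;
    int id;
};

// True if p satisfies T's alignment requirement.
template <typename T>
bool is_aligned_for(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % alignof(T) == 0;
}

// Reads a trivially copyable T from a possibly misaligned address without
// undefined behaviour (no T* is ever formed from the misaligned pointer).
template <typename T>
T load_unaligned(const void* p) {
    T value;
    std::memcpy(&value, p, sizeof(T));
    return value;
}
```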

PierreMarchand20 · 2024-08-01T10:45:01Z

If anyone's interested in a version of @ferdonline's utility function without explicit/manual new and delete:
template <typename Sequence>
inline py::array_t<typename Sequence::value_type> as_pyarray(Sequence &&seq) {
    auto size = seq.size();
    auto data = seq.data();
    std::unique_ptr<Sequence> seq_ptr = std::make_unique<Sequence>(std::move(seq));
    auto capsule = py::capsule(seq_ptr.get(), [](void *p) { std::unique_ptr<Sequence>(reinterpret_cast<Sequence*>(p)); });
    seq_ptr.release();
    return py::array(size, data, capsule);
}
Apart from avoiding new and delete, this also does not leak if for some reason py::capsule would throw.

I was using this version for a while in a library, but recently I noticed it no longer worked. It must be something related to the compiler, because I did not change the pybind11 version I was using (its commit is pinned as a git submodule in my library). But the version by @sharpe5 works. The main difference seems to be the py::array constructor, so a fix for @YannickJadoul's version seems to be:

template <typename Sequence>
inline pybind11::array_t<typename Sequence::value_type> as_pyarray(Sequence &&seq) {
    auto size                         = seq.size();
    auto data                         = seq.data();
    std::unique_ptr<Sequence> seq_ptr = std::make_unique<Sequence>(std::move(seq));
    auto capsule = pybind11::capsule(seq_ptr.get(), [](void *p) { std::unique_ptr<Sequence>(reinterpret_cast<Sequence *>(p)); });
    seq_ptr.release();
    return pybind11::array({size}, {sizeof(typename Sequence::value_type)}, data, capsule);
}

add individual frame counting - no support for row-dark currently - had to change ctor for vectorToPyArray, see here: pybind/pybind11#1042 (comment) . This may be a numpy 2 thing - outputs will be a SparseArray with one frame/scan shape = (1, 1) and no metadata. This is to take advantage of methods in SparseArray

ax3l mentioned this issue Jun 13, 2018: Python (Numpy) API openPMD/openPMD-api#32 (Closed)

matsen mentioned this issue Jul 18, 2019: set up proper communication with numpy phylovi/bito#61 (Closed)

molpopgen mentioned this issue Sep 6, 2019: Possible crashes on OS X molpopgen/fwdpy11#299 (Closed)

ShuhuaGao mentioned this issue Jun 27, 2020: How to interpret/expose a C++ raw memory as a numpy array *view* in Python? #2271 (Closed)

YannickJadoul mentioned this issue Jul 8, 2020: Creating an array which owns its data #2121 (Closed)

bstaletic closed this as completed Jul 23, 2020

skgbanga mentioned this issue Sep 7, 2020: Overhead of calling C++ function from python using pybind #2470 (Closed)

YannickJadoul mentioned this issue Sep 27, 2020: [QUESTION] how to access raw data created in C++? #2533 (Closed)

YannickJadoul mentioned this issue Nov 12, 2020: [QUESTION] vector of variants as a NumPy array #2655 (Open)

cchriste mentioned this issue Jan 16, 2021: Add version of Image::getArray (and Mesh::getField) that only wraps data in numpy.ndarray rather than copying it SCIInstitute/ShapeWorks#903 (Closed)

thorstenhater mentioned this issue Feb 17, 2021: Python wrapper: prepare wrapped python objects for numpy usage arbor-sim/arbor#870 (Closed)

ghutchis mentioned this issue May 17, 2021: Access to Cube class inside Python OpenChemistry/avogadrolibs#580 (Open)

lkeegan mentioned this issue Jun 29, 2021: add ndarray example without copying data ssciwr/pybind11-numpy-example#1 (Closed)

andyj10224 mentioned this issue Jul 11, 2021: [Small, Important Change] Changed phi_ao code to support puream basis sets psi4/psi4#2210 (Merged)

LunarLanding mentioned this issue Oct 21, 2022: add batch_query_array which flattens nested vec to two numpy arrays before returning to python atksh/python_prtree#35 (Merged)

mlxd mentioned this issue Nov 23, 2022: Decouple Numpy layer data initialization PennyLaneAI/pennylane-lightning-gpu#70 (Merged)

benbovy mentioned this issue Dec 1, 2022: pybind11:vectorize and GIL release benbovy/spherely#2 (Open)

Suke0811 mentioned this issue Dec 22, 2022: Pybind vectorization issues Suke0811/REMS#157 (Open)

This was referenced Feb 1, 2023: Docker builds using GitHub Actions OpenChemistry/stempy#274 (Closed); Upgrading python crashes on MPI counting with multipass reader OpenChemistry/stempy#275 (Closed)

rwgk mentioned this issue Feb 9, 2023: FWD pybind11 google/pybind11clif#1042 (Closed)

QuLogic mentioned this issue Feb 23, 2024: Convert path extension to pybind11 matplotlib/matplotlib#27087 (Merged)

badisa mentioned this issue Sep 6, 2024: Prefer a more terse way of converting vectors to numpy arrays proteneer/timemachine#1373 (Closed)

swelborn mentioned this issue Sep 12, 2024: add electronCount for single frame OpenChemistry/stempy#320 (Open)


Comments

yesint commented Aug 30, 2017

YannickJadoul commented Aug 30, 2017

YannickJadoul commented Aug 30, 2017

yesint commented Aug 30, 2017

yesint commented Aug 30, 2017

YannickJadoul commented Aug 30, 2017

YannickJadoul commented Aug 30, 2017

yesint commented Aug 30, 2017

xkunglu commented Nov 28, 2018

arquolo commented Jun 28, 2019

YannickJadoul commented Jun 28, 2019

ferdonline commented Jul 4, 2019

arquolo commented Jul 9, 2019

ferdonline commented Oct 25, 2019

YannickJadoul commented Oct 25, 2019

ferdonline commented Oct 25, 2019

LeslieGerman commented Jan 17, 2020

YannickJadoul commented Jun 10, 2020

sharpe5 commented Jun 21, 2020

sharpe5 commented Jun 21, 2020

YannickJadoul commented Jun 21, 2020

sharpe5 commented Jun 21, 2020

YannickJadoul commented Jun 21, 2020

bstaletic commented Jul 23, 2020

YannickJadoul commented Jul 23, 2020

sharpe5 commented Jul 23, 2020

ghost commented Jul 25, 2020

sharpe5 commented Jul 25, 2020

cchriste commented Jun 1, 2021

sharpe5 commented Jun 2, 2021 via email

cchriste commented Jun 2, 2021

sharpe5 commented Jun 2, 2021

roastduck commented Jun 2, 2022

ghost commented Aug 2, 2023

wuxian08 commented Jul 2, 2024

PierreMarchand20 commented Aug 1, 2024
