-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Eigen/numpy referencing #610
Conversation
Looks really awesome! Two questions:
|
dfef38d
to
960d3d0
Compare
@ludwigschmidt - it likely will have some impact, depending on what you're doing with the I'll ponder the fail-instead-of-copy issue. |
Looks great!
This is an a very important point! End user is likely to have no idea about different storage order so the converter should provide the most logical and expected behavior. At the same time signatures of exposed C++ functions should not change just because we are making bindings for them.
|
@yesint: when returning from Eigen to numpy there is no problem: numpy will accept any stride so we can always map into an Eigen matrix (or Ref/Map/Block). When going the other way, swapping indices (which is essentially doing a transpose on the argument) seems rather unexpected. It also won't work in some cases: e.g. a One thing I should point out, though, is that Eigen itself will make a copy if you try to pass a storage-incompatible matrix into a |
docs/advanced/cast/eigen.rst
Outdated
|
||
.. code-block:: cpp | ||
|
||
m.def("scale", [](py::EigenRef<Eigen::MatrixXd> m, double c) { m *= c; }); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you probably mean to use py::EigenDRef
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whoops, yeah. I started with EigenRef, then added the D (for Dynamic) to make it clear it isn't just the typical Eigen::Ref<M>
.
provides a ``py::EigenDRef<MatrixType>`` type alias for your convenience (along | ||
with EigenDMap for the equivalent Map, and EigenDStride for just the stride | ||
type). | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I understand from this documentation why we need a fully dynamic stride type - why isn't Eigen::RowMajor
sufficient to get the same layout as numpy?
Edit: Ah I think I see now - this changes the stride, but not actually the memory layout - so it's less intrusive!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Numpy doesn't always have a row-major layout, either. E.g. if a
is row-major, then a.transpose()
will be column major.
Also the stride and the memory layout are the same concept: Row major really just means the column stride is 1 (i.e. rows are contiguous), and column major just means the row stride is 1 (i.e. columns are contiguous). The restriction in Eigen is essentially that Ref<Matrix>
requires an inner stride of 1 (where inner = column for row-major and = row for column-major).
(When comparing to numpy strides, numpy strides are in bytes, so multiply by the dtype size when looking at numpy array strides).
docs/advanced/cast/eigen.rst
Outdated
or columns of a row-major matrix), but not along the inner dimension. | ||
|
||
This type, however, has the added benefit of also being able to map numpy array | ||
slices. For example, you could use the following (contrived) example uses |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Small typo, remove "uses" or "you could use"
This looks awesome! I'll test it in my project over the next few days/weeks! A few remarks/questions:
I'm absolutely not sure here but wouldn't that be an lvalue?
In that case, why "when returning a reference (e.g. const MatrixXd &) we copy by default"? (Just curious or maybe I don't fully understand - probably there's a perfectly valid reason for it). Do I understand correctly that passing a np-array to a C++ function that takes a MatrixXd& or const & would occur a copy? So in this case the general advice would be to change a C++ API to use Eigen::Ref instead of & / const &? I guess you've thought about that and there's a reason for it but is it not possible to get the same behaviour with just normal references, without having to resort to Eigen::Ref's everywhere? This might be a stupid question... but: What happens with objects containing Eigen matrices, when these objects are passed by const&? Let's say I have
and I'm binding a function
When calling this function from Python, does Thank you very much for your great work on this! |
No, for ordinary object returns such as
Mainly because that's how lvalue reference returns work in the rest of pybind (with the default
Yes, you're correct. The issue is that in order to provide a
That won't be a problem at all: pybind11 internally works by storing a C++ instance of your my_class inside the wrapper that is exposed to Python; when a method is invoked, we call the C++ method on that internal instance; no copy is needed. |
0b5621a
to
a92bbff
Compare
Removed the gcc-7 fix commit (I included it in PR #611, which was changing related code anyway), and incorporated @patrikhuber's doc fixes. |
@jagerman Cool! Thank you very much for your explanations. Thanks in particular for the explanation regarding |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome! Great handling of copy/move/reference with return_value_policy
and also const-correctness with the writeable
flag.
Regarding Eigen::Ref
arguments to constant data with fixed strides, I feel like this should never make a copy. Eigen itself wouldn't make a copy there to adjust for the strides, and I would imagine most people use Ref
specifically to avoid a copy so that might be surprising behavior.
include/pybind11/eigen.h
Outdated
#if EIGEN_VERSION_AT_LEAST(3,3,0) | ||
using EigenIndex = Eigen::Index; | ||
#else | ||
using EigenIndex = std::ptrdiff_t; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be using EigenIndex = EIGEN_DEFAULT_DENSE_INDEX_TYPE
since it's user-configurable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
include/pybind11/pybind11.h
Outdated
@@ -132,9 +132,7 @@ class cpp_function : public function { | |||
? &rec->data : rec->data[0]); | |||
|
|||
/* Override policy for rvalues -- always move */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment is not accurate any more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed: s/always/usually to enforce rvp::move on an rvalue/
tests/test_eigen.cpp
Outdated
m.def("get_rm_const_ref", []() { return Eigen::Ref<const MatrixXdR>(get_rm()); }); | ||
// Just the corners (via a Map instead of a Ref): | ||
m.def("get_cm_corners", get_cm_corners); | ||
m.def("get_cm_corners_const", get_cm_corners_const); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not also use lambdas here (and everywhere), like above? Should reduce duplication and make things easier to follow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, updated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking everywhere: choleskyn
, incr_matrix
, even_rows
etc. As far as I can see only add_rm()
and add_cm()
are ever used more than once.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, probably worthwhile. I was just changing the ones I added, but I'll go change the rest, too.
include/pybind11/eigen.h
Outdated
template <typename T> using is_eigen_ref = is_template_base_of<Eigen::RefBase, T>; | ||
//template <typename T> struct is_eigen_ref : std::false_type {}; | ||
//template <typename PlainType, int Options, typename Stride> struct is_eigen_ref<Eigen::Ref<PlainType, Options, Stride>> | ||
// : std::true_type {}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove commented out code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whoops, yeah, fixed.
include/pybind11/eigen.h
Outdated
// operator Type*() { return value.get(); } | ||
// operator Type&() { if (!value) pybind11_fail("Eigen::Ref<...> value not loaded"); return *value; } | ||
// template <typename _T> using cast_op_type = pybind11::detail::cast_op_type<_T>; | ||
//}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Accidental leftover?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, fixed.
AFAIK The only time |
Consider moving the smaller fixes into another PR which can be merged quickly. It would benefit other PRs and avoid possible merge conflicts between parallel work. |
The docs say it'll copy, and it indeed appears to: #include <Eigen/Core>
#include <iostream>
using namespace Eigen;
void f(Ref<const MatrixXd> x) { std::cout << x(0, 1) << "\n"; }
int main() {
Matrix<double, 4, 4, RowMajor> x;
for (int i = 0; i < 16; i++) x(i / 4, i % 4) = i;
f(x);
} print |
Re: copying on a So essentially we're trading off pybind11 making the interface more restrictive to python than it is to C++ against the legitimate desire to not want a copy. I originally started playing around with argument passing to think about ways to have a sort of
which would essentially resolve this by letting the pybind11 layer indicate whether it wants copyability or not (either default to allowing copy with a |
995ac3d
to
a1067bd
Compare
I agree that there are valid reasons for either approach. Leaving the decision to the pybind user via argument policies would probably be ideal. |
@ludwigschmidt - the issue, however, is that right now there is no such policy, and adding one potentially imposes a cost on every single bound function. Return value policies are useful all over the place; at the moment, however, only eigen would benefit from a hypothetical argument policy. |
Regarding https://eigen.tuxfamily.org/dox-devel/classEigen_1_1Ref.html On that page, the Eigen documentation also says that keeping the inner stride flexible can lead to significantly slower code. So in some settings it might be good to avoid |
@jagerman I see, that's a good point. Adding running time overhead everywhere is indeed an issue. When reading the pybind documentation, I saw a section on call policies: http://pybind11.readthedocs.io/en/master/advanced/functions.html#additional-call-policies Could that be an approach that avoids new overhead? Unfortunately I don't know much about the pybind internals. |
The latter doesn't technically have to be |
Interesting. On the other hand,
Overloading |
I'm not particularly tied to that implementation. |
Just replaced the |
Hmm, with this and #634 in mind, should |
Conversion just shouldn't happen in the first place with |
(That useless-almost-everywhere |
This rocks! (especially with the new
EDIT: How would you like me to commit this? Squash, or rebase? |
I'm trying out some benchmarks to get an idea of this. Will report back shortly.
I'll squash it into fewer commits (after figuring out the above) and rebase to current master, and let you know once I think it's ready to go. |
Currently when we do a conversion between a numpy array and an Eigen Vector, we allow the conversion only if the Eigen type is a compile-time vector (i.e. at least one dimension is fixed at 1 at compile time), or if the type is dynamic on *both* dimensions. This means we can run into cases where MatrixXd allow things that conforming, compile-time sizes does not: for example, `Matrix<double,4,Dynamic>` is currently not allowed, even when assigning from a 4-element vector, but it *is* allowed for a `Matrix<double,Dynamic,Dynamic>`. This commit also reverts the current behaviour of using the matrix's storage order to determine the structure when the Matrix is fully dynamic (i.e. in both dimensions). Currently we assign to an eigen row if the storage order is row-major, and column otherwise: this seems wrong (the storage order has nothing to do with the shape!). While numpy doesn't distinguish between a row/column vector, Eigen does, but it makes more sense to consistently choose one than to produce something with a different shape based on the intended storage layout.
`is_template_base_of<T>` fails when `T` is `const` (because its implementation relies on being able to convert a `T*` to a `Base<U>*`, which won't work when `T` is const). (This also agrees with std::is_base_of, which ignores cv qualification.)
A few of pybind's numpy constants are using the numpy-deprecated names (without "ARRAY_" in them); updated our names to be consistent with current numpy code.
numpy arrays aren't currently properly setting base: by setting `->base` directly, the base doesn't follow what numpy expects and documents (that is, following chained array bases to the root array). This fixes the behaviour by using numpy's PyArray_SetBaseObject to set the base instead, and then updates the tests to reflect the fixed behaviour.
Numpy raises ValueError when attempting to modify an array, while py::array is raising a RuntimeError. This changes the exception to a std::domain_error, which gets mapped to the expected ValueError in python.
Eigen::Ref objects, when returned, are almost always returned as rvalues; what's important is the data they reference, not the outer shell, and so we want to be able to use `::copy`, `::reference_internal`, etc. to refer to the data the Eigen::Ref references (in the following commits), rather than the Eigen::Ref instance itself. This moves the policy override into a struct so that code that wants to avoid it (or wants to provide some other Return-type-conditional override) can create a specialization of return_value_policy_override<Return> in order to override the override. This lets an Eigen::Ref-returning function be bound with `rvp::copy`, for example, to specify that the data should be copied into a new numpy array rather than referenced, or `rvp::reference_internal` to indicate that it should be referenced, but a keep-alive used (actually, we used the array's `base` rather than a py::keep_alive in such a case, but it accomplishes the same thing).
8fe1859
to
3174f30
Compare
This commit largely rewrites the Eigen dense matrix support to avoid copying in many cases: Eigen arguments can now reference numpy data, and numpy objects can now reference Eigen data (given compatible types). Eigen::Ref<...> arguments now also make use of the new `convert` argument use (added in PR pybind#634) to avoid conversion, allowing `py::arg().noconvert()` to be used when binding a function to prohibit copying when invoking the function. Respecting `convert` also means Eigen overloads that avoid copying will be preferred during overload resolution to ones that require copying. This commit also rewrites the Eigen documentation and test suite to explain and test the new capabilities.
test_eigen.py and test_numpy_*.py have the same @pytest.requires_eigen_and_numpy or @pytest.requires_numpy on every single test; this changes them to use pytest's global `pytestmark = ...` instead to disable the entire module when numpy and/or eigen aren't available.
3174f30
to
93c3212
Compare
It's already there, in the "Returning values to Python" section. I don't explicitly say it's a capsule, as that's more of an implementation detail the caller doesn't need to worry about, but the lifetime of the stored matrix is explicitly mentioned.
I ran some benchmarks on this where I construct an eigen matrix of various sizes, returning it by value: the capsule approach appears to always be faster, even for tiny matrices. I tested at My guess is that the copying is relatively cheap, but the memory allocation adds enough of an overhead that it's cheaper to do an Eigen move + numpy reference for any size.
I've rebased and squashed it down now. The first 6 commits are all more or less independent features/fixes required by Eigen and, I think, properly belong as separate commits; then the main one (with code+docs+test, plus the last few miscellaneous fixes squashed together), and finally 3174f30 is a small cleanup to the test scripts to avoid needing to put the same decorator in front of almost every test in the eigen and numpy tests. |
See this PR: pybind/pybind11#610
This looks good to merge to me. Are there any other things planned, or shall I push the button? |
(and thanks for benchmarking, that's a relief) |
Nope, it's good to go! |
Ok, awesome! |
pybind's current Eigen support relies on copying whenever transferring data back and forth between Eigen C++ types and numpy arrays. In many cases we can avoid such copies by either having Eigen reference a numpy array data, or having numpy reference eigen's data. This PR does that.
(I'd like to leave it here for a while for people to comment/criticize/etc. before merging!)
Full details are in the documentation commit, but the basic highlights are:
MatrixXd
), we now return a numpyarray
without copying which references theMatrixXd
data. The MatrixXd itself gets stashed into acapsule
which is tied to the lifetime of the returnedarray
.const MatrixXd &
) we copy by default, but you can get a numpy reference instead viareturn_value_policy::reference
or a reference tied to the parent via::reference_internal
.Eigen::Ref<MatrixXd>
orEigen::Ref<const MatrixXd>
, we avoid copying if possible. Unfortunately it is often not possible:Eigen::Ref
by default assumes a contiguous inner stride, which means contiguous columns for a MatrixXd. Two possible solutions:order='F'
), or change Eigen types fromMatrixXd
toMatrix<double, Dynamic, Dynamic, RowMajor>
. But even this isn't always enough: numpy'sa.transpose()
, for example, returns a view into the original numpy array, but just flips the column/row majors.Eigen::Ref<Eigen::MatrixXd, 0, Eigen::Stride<Eigen::Dynamic, Eigen::Dynamic>>
. In addition to numpy arrays of either row/column contiguity, it also allows non-contiguous inputs such as array slices. Since this is really cumbersome to write, I addedpy::EigenDRef<MatrixType>
/py::EigenDMap<MatrixType>
/py::EigenDStride
as useful shortcuts.Eigen::Ref<MatrixXd>
) are supported. Unlike non-mutable references (e.g.Eigen::Ref<const MatrixXd>
) you won't be allowed to call such a function if a conversion of copy is required.Eigen::Block
s,Eigen::Map
s, andEigen::Ref
s when return a numpy array that references the block/map/matrix data without copying it. You'll very often want to use apy::return_value_policy::reference_internal
here (if a method returning internal data) or some other keep-alive mechanism, for obvious reasons. You can explicitly ask for data to be copied into a new numpy array by usingreturn_value_policy::copy
.flags.writeable
: you can't pass aflags.writeable = False
numpy array as anEigen::Map<Matrix>
argument. We also honour the other way: if you return a const Matrix value, reference, or Eigen::Ref, you end up with an array withflags.writeable = False
.This PR also contains a few other mostly-related odds and ends that I ran into when writing all of this up.