Eigen/numpy referencing #610

Merged
merged 8 commits into from
Feb 24, 2017

jagerman
Member

pybind's current Eigen support relies on copying whenever transferring data back and forth between Eigen C++ types and numpy arrays. In many cases we can avoid such copies by either having Eigen reference a numpy array's data, or having numpy reference Eigen's data. This PR does that.

(I'd like to leave it here for a while for people to comment/criticize/etc. before merging!)

Full details are in the documentation commit, but the basic highlights are:

  • when returning an rvalue Eigen concrete type (e.g. MatrixXd), we now return a numpy array without copying which references the MatrixXd data. The MatrixXd itself gets stashed into a capsule which is tied to the lifetime of the returned array.
  • when returning a reference (e.g. const MatrixXd &) we copy by default, but you can get a numpy reference instead via return_value_policy::reference or a reference tied to the parent via ::reference_internal.
  • when passing a numpy array to a function taking an Eigen::Ref<MatrixXd> or Eigen::Ref<const MatrixXd>, we avoid copying if possible. Unfortunately it is often not possible: Eigen::Ref by default assumes a contiguous inner stride, which means contiguous columns for a MatrixXd. Two possible solutions:
    • Thus you either need to pay attention to the order of passed numpy arrays (i.e. creating with order='F'), or change Eigen types from MatrixXd to Matrix<double, Dynamic, Dynamic, RowMajor>. But even this isn't always enough: numpy's a.transpose(), for example, returns a view into the original numpy array, but just flips the column/row majors.
    • A nicer solution is to take arguments as Eigen::Ref<Eigen::MatrixXd, 0, Eigen::Stride<Eigen::Dynamic, Eigen::Dynamic>>. In addition to numpy arrays of either row/column contiguity, it also allows non-contiguous inputs such as array slices. Since this is really cumbersome to write, I added py::EigenDRef<MatrixType>/py::EigenDMap<MatrixType>/py::EigenDStride as useful shortcuts.
  • mutable Eigen arguments (e.g. Eigen::Ref<MatrixXd>) are supported. Unlike non-mutable references (e.g. Eigen::Ref<const MatrixXd>) you won't be allowed to call such a function if a conversion or copy is required.
  • you can return Eigen::Blocks, Eigen::Maps, and Eigen::Refs, which return a numpy array that references the block/map/matrix data without copying it. You'll very often want to use py::return_value_policy::reference_internal here (for a method returning internal data) or some other keep-alive mechanism, for obvious reasons. You can explicitly ask for data to be copied into a new numpy array by using return_value_policy::copy.
  • we honour flags.writeable: you can't pass a flags.writeable = False numpy array as an Eigen::Map<Matrix> argument. We also honour the other way: if you return a const Matrix value, reference, or Eigen::Ref, you end up with an array with flags.writeable = False.
  • you can even do some crazy nested numpy-eigen-numpy-eigen-... referencing, if you feel so inclined (one of the added tests does this just to see that it works).
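
To make the stride points in the list above concrete, here is a minimal plain C++ sketch (no Eigen or pybind11 involved) of a view with two run-time strides, loosely analogous to what a fully dynamic Eigen::Stride<Eigen::Dynamic, Eigen::Dynamic> permits. The names StridedView, row_major_view, transposed and every_other_row are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical non-owning 2-D view with run-time strides, counted in elements.
struct StridedView {
    double* data;
    std::size_t rows, cols;
    std::ptrdiff_t row_stride, col_stride;

    double& operator()(std::size_t r, std::size_t c) const {
        assert(r < rows && c < cols);
        return data[static_cast<std::ptrdiff_t>(r) * row_stride +
                    static_cast<std::ptrdiff_t>(c) * col_stride];
    }
};

// View a contiguous row-major buffer (numpy's default layout).
inline StridedView row_major_view(std::vector<double>& buf,
                                  std::size_t rows, std::size_t cols) {
    assert(buf.size() == rows * cols);
    return {buf.data(), rows, cols, static_cast<std::ptrdiff_t>(cols), 1};
}

// Transposing only swaps the shape and the strides; no data moves.
inline StridedView transposed(const StridedView& v) {
    return {v.data, v.cols, v.rows, v.col_stride, v.row_stride};
}

// Every other row (like a[::2, :] in numpy): the row stride doubles.
inline StridedView every_other_row(const StridedView& v) {
    return {v.data, (v.rows + 1) / 2, v.cols, 2 * v.row_stride, v.col_stride};
}
```

A view with a fixed inner stride of 1 could represent the first of these three views but not the sliced one, which is the gap the fully dynamic stride type is meant to close.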

This PR also contains a few other mostly-related odds and ends that I ran into when writing all of this up.

@ludwigschmidt

Looks really awesome! Two questions:

  • I didn't know about Eigen::Ref<Eigen::MatrixXd, 0, Eigen::Stride<Eigen::Dynamic, Eigen::Dynamic>> before. Does it have (significant) running time overhead compared to using a fixed and known row- or column-major ordering?

  • Mutable arguments fail if a copy would be required. Is it possible to make this behavior available for immutable arguments, too? In some cases (e.g., very large matrices), it might be better to produce an error than to copy the large matrix. In other settings a copy is certainly not an issue. So it might be best if the Eigen / numpy bindings supported both.

@jagerman
Member Author

jagerman commented Jan 21, 2017

@ludwigschmidt - it likely will have some impact, depending on what you're doing with the Ref, as vectorization will be disabled, which can be a big deal (depending on your CPU capabilities, of course). Another option is to keep all the actual storage in Eigen (with everything in numpy just a view on Eigen data) in which case you don't need the fully Dynamic Ref.

I'll ponder the fail-instead-of-copy issue.

@yesint
Contributor

yesint commented Jan 21, 2017

Looks great!

Thus you either need to pay attention to the order of passed numpy arrays (i.e. creating with order='F'), or change Eigen types from MatrixXd to Matrix<double, Dynamic, Dynamic, RowMajor>.

This is a very important point! The end user is likely to have no idea about the different storage orders, so the converter should provide the most logical and expected behavior. At the same time, the signatures of exposed C++ functions should not change just because we are making bindings for them.
So, I think the best default behavior is:

  • When returning Eigen matrices, create a numpy array whose first index is the column of the Eigen matrix (referencing the same data is possible).
  • When receiving numpy arguments, expect the first index of the numpy array to be the column number.

@jagerman
Member Author

@yesint: when returning from Eigen to numpy there is no problem: numpy will accept any stride so we can always map into an Eigen matrix (or Ref/Map/Block).

When going the other way, swapping indices (which is essentially doing a transpose on the argument) seems rather unexpected. It also won't work in some cases: e.g. a mult(a, a.transpose()) would either get two a or two a.transpose(), neither of which can be multiplied together.

One thing I should point out, though, is that Eigen itself will make a copy if you try to pass a storage-incompatible matrix into a Ref<const M>, so we're not really doing something totally unexpected here.


.. code-block:: cpp

m.def("scale", [](py::EigenRef<Eigen::MatrixXd> m, double c) { m *= c; });
Contributor

I think you probably mean to use py::EigenDRef here?

Member Author

Whoops, yeah. I started with EigenRef, then added the D (for Dynamic) to make it clear it isn't just the typical Eigen::Ref<M>.

provides a ``py::EigenDRef<MatrixType>`` type alias for your convenience (along
with EigenDMap for the equivalent Map, and EigenDStride for just the stride
type).

Contributor

@patrikhuber patrikhuber Jan 21, 2017

I'm not sure I understand from this documentation why we need a fully dynamic stride type - why isn't Eigen::RowMajor sufficient to get the same layout as numpy?
Edit: Ah I think I see now - this changes the stride, but not actually the memory layout - so it's less intrusive!

Member Author

Numpy doesn't always have a row-major layout, either. E.g. if a is row-major, then a.transpose() will be column major.

Also the stride and the memory layout are the same concept: Row major really just means the column stride is 1 (i.e. rows are contiguous), and column major just means the row stride is 1 (i.e. columns are contiguous). The restriction in Eigen is essentially that Ref<Matrix> requires an inner stride of 1 (where inner = column for row-major and = row for column-major).

(When comparing to numpy strides, numpy strides are in bytes, so multiply by the dtype size when looking at numpy array strides).
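
As a rough illustration of the stride arithmetic just described, the following plain C++ sketch converts numpy-style byte strides into element strides and checks whether the inner stride is 1 for a given storage order. ArrayInfo, Order, inner_stride and fits_without_copy are hypothetical names; this is not how pybind11 implements the check, and it ignores edge cases such as dimensions of size 1.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical description of a 2-D numpy array: shape plus strides in BYTES.
struct ArrayInfo {
    std::size_t rows, cols;
    std::ptrdiff_t row_stride_bytes;  // bytes from one row to the next
    std::ptrdiff_t col_stride_bytes;  // bytes from one column to the next
    std::size_t itemsize;             // size of the dtype, e.g. 8 for float64
};

enum class Order { RowMajor, ColMajor };

// Inner stride in elements: the column stride for a row-major type,
// the row stride for a column-major type.
inline std::ptrdiff_t inner_stride(const ArrayInfo& a, Order order) {
    assert(a.itemsize > 0);
    const std::ptrdiff_t bytes =
        (order == Order::RowMajor) ? a.col_stride_bytes : a.row_stride_bytes;
    return bytes / static_cast<std::ptrdiff_t>(a.itemsize);
}

// A reference with the default (fixed) inner stride needs that stride to be 1.
inline bool fits_without_copy(const ArrayInfo& a, Order order) {
    return inner_stride(a, order) == 1;
}
```

For a C-ordered 3×4 float64 array the byte strides are (32, 8), so the row-major inner stride is 1; its transpose has byte strides (8, 32) and only fits a column-major type.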

or columns of a row-major matrix), but not along the inner dimension.

This type, however, has the added benefit of also being able to map numpy array
slices. For example, you could use the following (contrived) example uses
Contributor

Small typo, remove "uses" or "you could use"

@patrikhuber
Contributor

patrikhuber commented Jan 21, 2017

This looks awesome! I'll test it in my project over the next few days/weeks!

A few remarks/questions:

when returning an rvalue Eigen concrete type (e.g. MatrixXd)

I'm absolutely not sure here but wouldn't that be an lvalue?

if you return a const Matrix value, reference, or Eigen::Ref, you end up with an array with flags.writeable = False

In that case, why "when returning a reference (e.g. const MatrixXd &) we copy by default"? (Just curious or maybe I don't fully understand - probably there's a perfectly valid reason for it).

Do I understand correctly that passing a np-array to a C++ function that takes a MatrixXd& or const & would incur a copy? So in this case the general advice would be to change a C++ API to use Eigen::Ref instead of & / const &? I guess you've thought about that and there's a reason for it but is it not possible to get the same behaviour with just normal references, without having to resort to Eigen::Refs everywhere?

This might be a stupid question... but: What happens with objects containing Eigen matrices, when these objects are passed by const&? Let's say I have

class my_class {
public:
    MatrixXd big_matrix;
};

and I'm binding a function

void f(const my_class& my) { ... }

When calling this function from Python, does big_matrix get copied? (my_class is exposed to Python via pybind11 as well)

Thank you very much for your great work on this!

@jagerman
Member Author

when returning an rvalue Eigen concrete type (e.g. MatrixXd)
I'm absolutely not sure here but wouldn't that be an lvalue?

No, for ordinary object returns such as MatrixXd foo() { return MatrixXd::Random(5, 5); }; we capture the MatrixXd as an rvalue and move it into a new temporary MatrixXd.

In that case, why "when returning a reference (e.g. const MatrixXd &) we copy by default"? (Just curious or maybe I don't fully understand - probably there's a perfectly valid reason for it).

Mainly because that's how lvalue reference returns work in the rest of pybind (with the default return_value_policy::automatic). The justification is that we really can't know how long the lvalue reference is going to stay valid, so you need to take an extra step in the binding code to say "no, really, it's safe to keep using this" by explicitly binding with return_value_policy::reference or "no, really, it's safe to keep using this as long as you also keep the object itself alive" by using return_value_policy::reference_internal.

Do I understand correctly that passing a np-array to a C++ function that takes a MatrixXd& or const & would incur a copy? So in this case the general advice would be to change a C++ API to use Eigen::Ref instead of & / const &? I guess you've thought about that and there's a reason for it but is it not possible to get the same behaviour with just normal references, without having to resort to Eigen::Refs everywhere?

Yes, you're correct. The issue is that in order to provide a MatrixXd instance to a function, we have to have a MatrixXd instance in the first place, but we don't, we just have a chunk of data in memory with a map of how to interpret that data. MatrixXd, on the other hand, is a concrete class that always owns its own data: being able to reference some other data is the entire point of Ref<MatrixXd>. This is no different than Eigen itself, though: if you try to pass something that isn't exactly a MatrixXd to a function taking a const MatrixXd &, you'd invoke a copy. For example, while you couldn't pass a Matrix4d without a copy, you can pass the Matrix4d to a Ref<const MatrixXd> without needing a copy.
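
The owning-versus-referencing distinction can be sketched in plain C++ without Eigen. Here std::vector<double> stands in for a concrete, data-owning type such as MatrixXd, and a hypothetical ConstView stands in for a reference type; all names below are made up for the illustration.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Owning type: always holds its own storage.
using Owning = std::vector<double>;

// Hypothetical non-owning view: a pointer plus a length.
struct ConstView {
    const double* data;
    std::size_t size;
};

// Needs a real Owning instance, so foreign memory must be copied in first.
inline double sum_owning(const Owning& m) {
    return std::accumulate(m.begin(), m.end(), 0.0);
}

// Can be pointed at any buffer, including one owned by someone else.
inline double sum_view(ConstView v) {
    return std::accumulate(v.data, v.data + v.size, 0.0);
}

// The only way to hand foreign memory to sum_owning: build a copy.
inline Owning to_owning(const double* p, std::size_t n) {
    return Owning(p, p + n);
}
```

Calling sum_view on a raw buffer touches the original memory, while going through to_owning produces independent storage that no longer tracks later changes to the buffer.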

What happens with objects containing Eigen matrices ...

That won't be a problem at all: pybind11 internally works by storing a C++ instance of your my_class inside the wrapper that is exposed to Python; when a method is invoked, we call the C++ method on that internal instance; no copy is needed.

@jagerman jagerman force-pushed the eigen-numpy-referencing branch 2 times, most recently from 0b5621a to a92bbff Compare January 22, 2017 05:37
@jagerman
Member Author

Removed the gcc-7 fix commit (I included it in PR #611, which was changing related code anyway), and incorporated @patrikhuber's doc fixes.

@patrikhuber
Contributor

@jagerman Cool! Thank you very much for your explanations. Thanks in particular for the explanation regarding return_value_policy and the reference-passing - these are explained very clearly!

Member

@dean0x7d dean0x7d left a comment

Awesome! Great handling of copy/move/reference with return_value_policy and also const-correctness with the writeable flag.

Regarding Eigen::Ref arguments to constant data with fixed strides, I feel like this should never make a copy. Eigen itself wouldn't make a copy there to adjust for the strides, and I would imagine most people use Ref specifically to avoid a copy so that might be surprising behavior.

#if EIGEN_VERSION_AT_LEAST(3,3,0)
using EigenIndex = Eigen::Index;
#else
using EigenIndex = std::ptrdiff_t;
Member

Should be using EigenIndex = EIGEN_DEFAULT_DENSE_INDEX_TYPE since it's user-configurable.

Member Author

Fixed.

@@ -132,9 +132,7 @@ class cpp_function : public function {
? &rec->data : rec->data[0]);

/* Override policy for rvalues -- always move */
Member

Comment is not accurate any more.

Member Author

Fixed: s/always/usually to enforce rvp::move on an rvalue/

m.def("get_rm_const_ref", []() { return Eigen::Ref<const MatrixXdR>(get_rm()); });
// Just the corners (via a Map instead of a Ref):
m.def("get_cm_corners", get_cm_corners);
m.def("get_cm_corners_const", get_cm_corners_const);
Member

Why not also use lambdas here (and everywhere), like above? Should reduce duplication and make things easier to follow.

Member Author

Agreed, updated.

Member

I was thinking everywhere: choleskyn, incr_matrix, even_rows etc. As far as I can see only add_rm() and add_cm() are ever used more than once.

Member Author

Yeah, probably worthwhile. I was just changing the ones I added, but I'll go change the rest, too.

template <typename T> using is_eigen_ref = is_template_base_of<Eigen::RefBase, T>;
//template <typename T> struct is_eigen_ref : std::false_type {};
//template <typename PlainType, int Options, typename Stride> struct is_eigen_ref<Eigen::Ref<PlainType, Options, Stride>>
// : std::true_type {};
Member

Remove commented out code?

Member Author

Whoops, yeah, fixed.

// operator Type*() { return value.get(); }
// operator Type&() { if (!value) pybind11_fail("Eigen::Ref<...> value not loaded"); return *value; }
// template <typename _T> using cast_op_type = pybind11::detail::cast_op_type<_T>;
//};
Member

Accidental leftover?

Member Author

Yeah, fixed.

@dean0x7d
Member

@jagerman said:
One thing I should point out, though, is that Eigen itself will make a copy if you try to pass a storage-incompatible matrix into a Ref<const M>, so we're not really doing something totally unexpected here.

AFAIK The only time Ref<const M> is allowed to create a temporary is when accepting an expression. Then it evaluates that expression into a compatible format. But if it's assigned a concrete matrix, it's not going to accept incompatible strides.

@dean0x7d
Member

@jagerman said:
(I'd like to leave it here for a while for people to comment/criticize/etc. before merging!)

This PR also contains a few other mostly-related odds and ends that I ran into when writing all of this up.

Consider moving the smaller fixes into another PR which can be merged quickly. It would benefit other PRs and avoid possible merge conflicts between parallel work.

@jagerman
Member Author

AFAIK The only time Ref is allowed to create a temporary is when accepting an expression.

The docs say it'll copy, and it indeed appears to:

#include <Eigen/Core>
#include <iostream>

using namespace Eigen;

void f(Ref<const MatrixXd> x) { std::cout << x(0, 1) << "\n"; }

int main() {
    Matrix<double, 4, 4, RowMajor> x;
    for (int i = 0; i < 16; i++) x(i / 4, i % 4) = i;
    f(x);
}

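prints 1. (Had the row-major data been reinterpreted as column-major without a copy, x(0, 1) would have been 4.)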

@jagerman
Member Author

Re: copying on a Ref<const Matrix>, my main concern is that there are likely legitimate C++ interfaces out there that accept arguments by Ref<const Matrix> as an accepts-anything-conformable argument, because indeed Eigen will make a copy for expressions or conformable-but-not-stride-compatible types.

So essentially we're trading off pybind11 making the interface more restrictive to python than it is to C++ against the legitimate desire to not want a copy.

I originally started playing around with argument passing to think about ways to have a sort of argument_value_policy (this led to me getting sidetracked, with #611 being the result), so that you could say something like:

.def("whatever", &Class::whatever, py::arg("x") = argument_policy::allow_copy)

which would essentially resolve this by letting the pybind11 layer indicate whether it wants copyability or not (either default to allowing copy with an arg_policy::no_copy, or default to disallowing with arg_policy::allow_copy).

@jagerman jagerman mentioned this pull request Jan 22, 2017
@ludwigschmidt

So essentially we're trading off pybind11 making the interface more restrictive to python than it is to C++ against the legitimate desire to not want a copy.

I agree that there are valid reasons for either approach. Leaving the decision to the pybind user via argument policies would probably be ideal.

@jagerman
Member Author

Rebased with @dean0x7d 's comments integrated, and with the commits that this PR doesn't depend on moved out to #613 and #614.

@jagerman
Member Author

@ludwigschmidt - the issue, however, is that right now there is no such policy, and adding one potentially imposes a cost on every single bound function. Return value policies are useful all over the place; at the moment, however, only eigen would benefit from a hypothetical argument policy.

@ludwigschmidt

Regarding Ref and temporary copies: it looks like Ref<MatrixXf> is not allowed to copy but const Ref<const MatrixXf>& is allowed to copy. See

https://eigen.tuxfamily.org/dox-devel/classEigen_1_1Ref.html

On that page, the Eigen documentation also says that keeping the inner stride flexible can lead to significantly slower code. So in some settings it might be good to avoid EigenDRef. They suggest a workaround based on function overloading. Is such an overloading approach also possible via pybind?

@ludwigschmidt

@jagerman I see, that's a good point. Adding running time overhead everywhere is indeed an issue.

When reading the pybind documentation, I saw a section on call policies:

http://pybind11.readthedocs.io/en/master/advanced/functions.html#additional-call-policies

Could that be an approach that avoids new overhead? Unfortunately I don't know much about the pybind internals.

@jagerman
Member Author

Regarding Ref and temporary copies: it looks like Ref<MatrixXf> is not allowed to copy but const Ref<const MatrixXf>& is allowed to copy.

The latter doesn't technically have to be const or an lvalue (it's the const on the MatrixType that is important), but yes, you're right that Ref<Matrix> doesn't allow copies: this PR enforces that, too.

@dean0x7d
Member

The docs say it'll copy, and it indeed appears to:

Interesting. On the other hand, Matrix<double, 4, 1> -> Ref<const Matrix<double, 1, 4>> is a compile-time error even though both hold simple contiguous data.

I originally started playing around with argument passing to think about ways to have a sort of argument_value_policy (this led to me getting sidetracked, with #611 being the result), so that you could say something like:

.def("whatever", &Class::whatever, py::arg("x") = argument_policy::allow_copy)

Overloading arg assignment like this doesn't seem very nice, but I can't say I have a better suggestion.

@jagerman
Member Author

Overloading arg assignment like this doesn't seem very nice, but I can't say I have a better suggestion.

I'm not particularly tied to that implementation. py::arg_nocopy("name") could work as well.

@jagerman
Member Author

jagerman commented Feb 5, 2017

Just replaced the EigenNoCopyRef with a .noconvert()-respecting implementation to go with the just-merged #634.

@aldanor
Member

aldanor commented Feb 6, 2017

Hmm, with this and #634 in mind, should array ctors force-set writeable flag to false if conversion took place? Or even rather, only allow writeable on arrays passed in as non-convert args? ("lvalues")

@jagerman
Member Author

jagerman commented Feb 6, 2017

Conversion just shouldn't happen in the first place with noconvert(): they should fail to load entirely if loaded with convert = false by returning false from load(). If no other overload picks it up without conversion, and it doesn't have an explicit .noconvert(), it'll get called again with convert = true. At that point, I don't see any reason to expect writeable to be false: post-conversion we have a new array with new data that should be usable normally.
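
The two-pass behaviour described here can be modelled with a small plain C++ sketch. It is only a model of the logic as stated in this comment, not pybind11's dispatcher: Overload, load and dispatch are hypothetical names, and an argument is reduced to a dtype string.

```cpp
#include <initializer_list>
#include <string>
#include <vector>

// Hypothetical overload record.
struct Overload {
    std::string name;
    std::string dtype;  // dtype this overload can reference without copying
    bool noconvert;     // models an argument bound with .noconvert()
};

// Models the load(src, convert) contract: without convert only an exact
// match loads; with convert a copy or conversion is permitted, unless the
// argument was marked noconvert.
inline bool load(const Overload& o, const std::string& arg_dtype, bool convert) {
    if (arg_dtype == o.dtype) return true;
    return convert && !o.noconvert;
}

// First pass: try every overload without conversion.
// Second pass: try again, now allowing conversion.
// Returns the chosen overload's name, or an empty string if none loads.
inline std::string dispatch(const std::vector<Overload>& overloads,
                            const std::string& arg_dtype) {
    for (bool convert : {false, true})
        for (const Overload& o : overloads)
            if (load(o, arg_dtype, convert)) return o.name;
    return "";
}
```

Because the no-conversion pass runs over all overloads first, an overload that can reference the data directly wins over an earlier-registered one that would need a copy.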

@jagerman
Member Author

jagerman commented Feb 6, 2017

(That useless-almost-everywhere bool in load() has a use now!)

@wjakob
Member

wjakob commented Feb 14, 2017

This rocks! (especially with the new bool convert). A few minor comments:

  1. It would be good to mention in the docs that a returned Eigen::Matrix is stashed into a capsule so that the array can reference it without lifetime issues. (that is pretty cool btw)

  2. I'm wondering if 1. adds potentially too much overhead for trivial kinds of situations. Imagine a program that does computations with 3D vectors -- these should probably be copied rather than using such a fancy copy-avoiding approach. (Not that this kind of program would run particularly efficiently before, but adding a big extra slowdown would be unfortunate)

  3. The docs are amazing!! 👍

EDIT: How would you like me to commit this? Squash, or rebase?

@jagerman
Member Author

I'm wondering if 1. adds potentially too much overhead for trivial kinds of situations.

I'm trying out some benchmarks to get an idea of this. Will report back shortly.

EDIT: How would you like me to commit this? Squash, or rebase?

I'll squash it into fewer commits (after figuring out the above) and rebase to current master, and let you know once I think it's ready to go.

Currently when we do a conversion between a numpy array and an Eigen
Vector, we allow the conversion only if the Eigen type is a
compile-time vector (i.e. at least one dimension is fixed at 1 at
compile time), or if the type is dynamic on *both* dimensions.

This means we can run into cases where MatrixXd allows things that
conforming, compile-time-sized types do not: for example,
`Matrix<double,4,Dynamic>` is currently not allowed, even when assigning
from a 4-element vector, but it *is* allowed for a
`Matrix<double,Dynamic,Dynamic>`.

This commit also reverts the current behaviour of using the matrix's
storage order to determine the structure when the Matrix is fully
dynamic (i.e. in both dimensions).  Currently we assign to an eigen row
if the storage order is row-major, and column otherwise: this seems
wrong (the storage order has nothing to do with the shape!).  While
numpy doesn't distinguish between a row/column vector, Eigen does, but
it makes more sense to consistently choose one than to produce
something with a different shape based on the intended storage layout.

`is_template_base_of<T>` fails when `T` is `const` (because its
implementation relies on being able to convert a `T*` to a `Base<U>*`,
which won't work when `T` is const).

(This also agrees with std::is_base_of, which ignores cv qualification.)

A few of pybind's numpy constants are using the numpy-deprecated names
(without "ARRAY_" in them); updated our names to be consistent with
current numpy code.

numpy arrays aren't currently properly setting base: by setting `->base`
directly, the base doesn't follow what numpy expects and documents (that
is, following chained array bases to the root array).

This fixes the behaviour by using numpy's PyArray_SetBaseObject to set
the base instead, and then updates the tests to reflect the fixed
behaviour.

Numpy raises ValueError when attempting to modify an array, while
py::array is raising a RuntimeError.  This changes the exception to a
std::domain_error, which gets mapped to the expected ValueError in
python.

Eigen::Ref objects, when returned, are almost always returned as
rvalues; what's important is the data they reference, not the outer
shell, and so we want to be able to use `::copy`,
`::reference_internal`, etc. to refer to the data the Eigen::Ref
references (in the following commits), rather than the Eigen::Ref
instance itself.

This moves the policy override into a struct so that code that wants to
avoid it (or wants to provide some other Return-type-conditional
override) can create a specialization of
return_value_policy_override<Return> in order to override the override.

This lets an Eigen::Ref-returning function be bound with `rvp::copy`,
for example, to specify that the data should be copied into a new numpy
array rather than referenced, or `rvp::reference_internal` to indicate
that it should be referenced, but a keep-alive used (actually, we used
the array's `base` rather than a py::keep_alive in such a case, but it
accomplishes the same thing).

This commit largely rewrites the Eigen dense matrix support to avoid
copying in many cases: Eigen arguments can now reference numpy data, and
numpy objects can now reference Eigen data (given compatible types).

Eigen::Ref<...> arguments now also make use of the new `convert`
argument (added in PR pybind#634) to avoid conversion, allowing
`py::arg().noconvert()` to be used when binding a function to prohibit
copying when invoking the function.  Respecting `convert` also means
Eigen overloads that avoid copying will be preferred during overload
resolution to ones that require copying.

This commit also rewrites the Eigen documentation and test suite to
explain and test the new capabilities.

test_eigen.py and test_numpy_*.py have the same
@pytest.requires_eigen_and_numpy or @pytest.requires_numpy on every
single test; this changes them to use pytest's global `pytestmark = ...`
instead to disable the entire module when numpy and/or eigen aren't
available.

@jagerman
Member Author

jagerman commented Feb 15, 2017

  1. It would be good to mention in the docs that a returned Eigen::Matrix is stashed into a capsule so that the array can reference it without lifetime issues. (that is pretty cool btw)

It's already there, in the "Returning values to Python" section. I don't explicitly say it's a capsule, as that's more of an implementation detail the caller doesn't need to worry about, but the lifetime of the stored matrix is explicitly mentioned.

  2. I'm wondering if 1. adds potentially too much overhead for trivial kinds of situations.

I ran some benchmarks on this where I construct an eigen matrix of various sizes, returning it by value: the capsule approach appears to always be faster, even for tiny matrices. I tested at -O3 and -Os, for sizes 1×1, 3×1, 3×3, 10×10, 100×100, and 1000×1000. There's an expected penalty for the largest one: the procedure with numpy copying takes about 7.5 times longer. For the smaller sizes, copying takes between 27% and 48% longer, but on all those sizes, it always takes longer.

My guess is that the copying is relatively cheap, but the memory allocation adds enough of an overhead that it's cheaper to do an Eigen move + numpy reference for any size.
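
The "move + reference" idea can be sketched in plain C++, with std::vector standing in for the Eigen matrix and a std::shared_ptr playing the role of the capsule. ArrayRef and reference_rvalue are hypothetical names, and the sketch makes no claim about the timings reported above.

```cpp
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a numpy array that references external data and
// keeps the owner of that data alive through `base`.
struct ArrayRef {
    const double* data;
    std::size_t size;
    std::shared_ptr<void> base;  // plays the role of the capsule
};

// Move the rvalue result into a heap-allocated holder and reference its
// buffer. The elements themselves are never copied; the holder, and with it
// the buffer, is released when the last ArrayRef sharing `base` goes away.
inline ArrayRef reference_rvalue(std::vector<double>&& result) {
    auto owner = std::make_shared<std::vector<double>>(std::move(result));
    return {owner->data(), owner->size(), owner};
}
```

The element buffer keeps its address across the move, so the returned view points at exactly the memory the function originally filled.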

EDIT: How would you like me to commit this? Squash, or rebase?

I've rebased and squashed it down now. The first 6 commits are all more or less independent features/fixes required by Eigen and, I think, properly belong as separate commits; then the main one (with code+docs+test, plus the last few miscellaneous fixes squashed together), and finally 3174f30 is a small cleanup to the test scripts to avoid needing to put the same decorator in front of almost every test in the eigen and numpy tests.

@wjakob
Member

wjakob commented Feb 24, 2017

This looks good to merge to me. Are there any other things planned, or shall I push the button?

@wjakob
Member

wjakob commented Feb 24, 2017

(and thanks for benchmarking, that's a relief)

@jagerman
Member Author

Nope, it's good to go!

@wjakob
Member

wjakob commented Feb 24, 2017

Ok, awesome!

@wjakob wjakob merged commit 2a75784 into pybind:master Feb 24, 2017
@jagerman jagerman modified the milestone: v2.1 Mar 19, 2017