Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add unchecked array access via proxy object #746

Merged
merged 10 commits into from
Mar 22, 2017

Conversation

jagerman
Copy link
Member

@jagerman jagerman commented Mar 19, 2017

My stab at addressing #599/#617.

This adds bounds-unchecked access to arrays through a a.unchecked<Type, Dimensions>() method which returns a proxy object (after checking dimensionality and mutability). For array_t<T>, the Type template parameter is omitted. A readonly version (which doesn't require mutability when being created) is available as a.unchecked_readonly<...>().

(I chose the name unchecked rather than unsafe because it isn't exactly unsafe to use this, it's just potentially unsafe if not done carefully).

The one slightly unusual addition from @dean0x7d 's simple example is specifying the Dimensions as a template parameter. I did this because it allows storage of the strides and shape in std::arrays; having them stored that way (as opposed to storing a copy of the array's strides/shape pointers) allows the
compiler to make significant optimizations that it can't make with a pointer; testing with nested loops of the form:

    for (size_t i0 = 0; i0 < r.shape(0); i0++)
        for (size_t i1 = 0; i1 < r.shape(1); i1++)
            ...
                r(i0, i1, ...) += 1;

over a 10 million element array gave me around a 25% speedup (versus using a pointer) for the 1D case, 33% for 2D, and saw the 5D case run about 2.5 times as fast; the latter, in particular, was a pretty remarkably speedup.

Since the primary purpose of this interface is for unfettered, high-performance access, requiring a compile-time dimensionality seems worthwhile given the gains it can achieve. As a nice side effect, it also lets us catch an invalid number of indices at compile time, and allows operator [] access to be enabled only for 1D arrays.

I haven't added the tests into the PR yet; I wanted to invite comments on the proposal first.

This adds bounds-unchecked access to arrays through a `a.unchecked<Type,
Dimensions>()` method.  (For `array_t<T>`, the `Type` template parameter
is omitted).  A readonly version (which doesn't require mutability) is
available as `a.unchecked_readonly<...>()`.

Specifying the Dimensions as a template parameter allows storage of an
std::array; having the strides and sizes stored that way (as opposed to
storing a copy of the array's strides/shape pointers) allows the
compiler to make significant optimizations of the shape() method that it
can't make with a pointer; testing with nested loops of the form:

    for (size_t i0 = 0; i0 < r.shape(0); i0++)
        for (size_t i1 = 0; i1 < r.shape(1); i1++)
            ...
                r(i0, i1, ...) += 1;

over a 10 million element array gives around a 25% speedup (versus using
a pointer) for the 1D case, 33% for 2D, and runs more than twice as fast
with a 5D array.

Since the primary purpose of this interface is to get maximum
performance, needing a compile-time dimensionality seems worthwhile
given the gains it can achieve.
@jagerman jagerman added this to the v2.2 milestone Mar 19, 2017
@jagerman jagerman mentioned this pull request Mar 19, 2017
@wjakob
Copy link
Member

wjakob commented Mar 19, 2017

I thin that this looks great. Regarding the dramatic performance improvements: it's possible that having the dimensions fixed at compile time makes the program structure simple enough that the compiler will add a vectorized path for your example computations (adding entries, etc.).

One really minor comments is that the reference classes should perhaps be in the detail namespace since they are not part of the explicit API. (this is just C++ duck typing where we return another type that behaves like an array).

@wjakob
Copy link
Member

wjakob commented Mar 19, 2017

^ @SylvainCorlay

@jagerman
Copy link
Member Author

@wjakob - I expect that's most of it; but it appears to also need the variables to be local to be able to apply that optimization. I'm guessing that with just a pointer, the compiler can't guarantee that something else hasn't changed the value at that memory location, and has to check it each time.

This gives a compile-time failure (rather than a later runtime failure)
if attempting to return the proxy object (e.g. from a bound lambda with
return type deduction).
@jagerman jagerman mentioned this pull request Mar 19, 2017
@aldanor
Copy link
Member

aldanor commented Mar 19, 2017

This looks great, especially the compile-time ndim on the proxies.

@wjakob wjakob modified the milestones: v2.1, v2.2 Mar 19, 2017
@wjakob
Copy link
Member

wjakob commented Mar 19, 2017

Should the unchecked references also support the data() and mutable_data (if applicable) functions?

@SylvainCorlay
Copy link
Member

Quick note on the writable flag: mutable accessors return non-const pointers and references, which can be used later on while the numpy array writability flag has changed. If there is no consistent way to support the mutability flag in C++, why not drop it completely?

@wjakob
Copy link
Member

wjakob commented Mar 19, 2017

I don't think I would call that an inconsistency. When a pybind11 function invoked from Python (i.e. the main use case of pybind11), the GIL is held, so there is a well-defined window were we have exclusive access to arrays.

@jagerman
Copy link
Member Author

Only enforcing writability at the time of making the reference is also what numpy itself does:

>>> z = np.array([[1,2],[3,4]])
>>> y = np.transpose(z)
>>> z.flags.writeable = False
>>> z[1,1] = 10
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: assignment destination is read-only
>>> z
array([[1, 2],
       [3, 4]])
>>> y[1,1] = 10
>>> z
array([[ 1,  2],
       [ 3, 10]])
>>> y
array([[ 1,  3],
       [ 2, 10]])

Yes, you can work around it if you try, but that doesn't mean it isn't still useful.

@jagerman jagerman changed the title WIP: add unchecked array access via proxy object Add unchecked array access via proxy object Mar 20, 2017
@SylvainCorlay
Copy link
Member

OK, fair enough for the mutability check.

@SylvainCorlay
Copy link
Member

Regarding static dimension, what alternative did you test this against? If you were storing e.g. std::vector shapes, a difference of performance can be attributed to the stack-allocated std::array over heap-allocated std::vector. However, if you were wrapping the underlying pointer, I don't see the reason why it should make a difference.

@jagerman
Copy link
Member Author

I compared against both std::vector and storing a copy of the shape()/strides() pointers. Both were indistinguishable from each other.

@SylvainCorlay
Copy link
Member

ok, it makes sense. So the only problem would be dynamic allocation vs stack allocation.

@wjakob
Copy link
Member

wjakob commented Mar 20, 2017

@jagerman: would you mind also adding data() and mutable_data()?

Other than that, is there anything else that needs to be done here?

@jagerman
Copy link
Member Author

What would data()/mutable_data() do that array.data()/array.mutable_data() doesn't do?

@jagerman
Copy link
Member Author

jagerman commented Mar 20, 2017

I think adding data() would also require adding a stride(n) method; without the stride, I don't see any use for direct access to the data() pointer. But that's easy enough to add. Edit: I was misremembering how data() worked (it's a post-stride data pointer).

Copy link
Member

@dean0x7d dean0x7d left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great!

Minor comment on access defaults: the existing functions are read-only by default (e.g. data()/mutable_data(), at()/mutable_at()), while the new functions flip that to read-write by default (unchecked()/unchecked_readonly()). It would be nice to make the default unchecked() call read-only for consitency.

m.def("increment_3d", [](py::array_t<double> x) {
auto r = x.unchecked<3>(); // Will throw if ndim != 3 or flags.writeable is false
if (x.ndim() != 3)
throw std::runtime_error("error: 3D array required");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could be missing something, but doesn't the unchecked() call already throw under the same condition?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops, yeah. That's leftover from earlier versions where it didn't.

* care: the array must not be destroyed or reshaped for the duration of the returned object,
* and the caller must take care not to access invalid dimensions or dimension indices.
*/
template <typename T, size_t Dims> detail::unchecked_reference<T, Dims> unchecked() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that both const and non-const versions of this function are needed. So far in pybind11, const has referred to the state of the internal pointer PyObject *m_ptr, but not to the pointee. E.g. attr() is const because obj.attr("something") = value will modify the Python object, but not the pointer stored in obj.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Additionally with the unchecked() -> mutable_unchecked() rename it makes no sense at all to have a mutable_unchecked() that returns a non-mutable reference.


template <typename T, size_t Dim>
struct type_caster<unchecked_const_reference<T, Dim>> {
static_assert(Dim == (size_t) -1 /* always fail */, "unchecked array proxy object is not castable");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@jagerman
Copy link
Member Author

AFAICS, r.mutable_data(1, 2, 3) is going to be exactly equivalent to &r(1, 2, 3); I'm not entirely sure I see the point.

@wjakob
Copy link
Member

wjakob commented Mar 20, 2017

I just thought it is weird that code which uses mutable_data (maybe even without arguments to just get the pointer) would stop working if accessed through an unchecked reference. Thoughts?

@wjakob
Copy link
Member

wjakob commented Mar 20, 2017

(maybe I'm just missing something)

@jagerman
Copy link
Member Author

Such code is going to require change anyway, because it's going to have to access through a new proxy object rather than the array itself.

Makes it consistent with other numpy methods (e.g. mutable_at/at).

Also delete the const version of `unchecked` that mapped to
`unchecked_readonly`: it makes no sense at all with the rename for a
`mutable_unchecked` to return a non-mutable object.
@jagerman
Copy link
Member Author

Adding a:

template <typename... Ix> const T *data(Ix... ix) const { return &operator()(ix...); }

and exactly the same for mutable_data() (but in the mutable subclass, and without the two consts) would do it, I think; I just don't see that it offers much. (But I've no objection to adding it if you think it's worth having).

@wjakob
Copy link
Member

wjakob commented Mar 20, 2017

I think that would be nice. Ditto for the itemsize() (can now be implemented via sizeof()), size() and nbytes() methods.

@dean0x7d
Copy link
Member

A few more observations:

  • Is a runtime version needed? That is, .unchecked(3) in addition to .unchecked<3>()?
  • Iterators might be nice, but that's a lot more work for nD arrays, so it can probably wait.
  • We should probably discuss doccomment style consistently since there's a mixture of different styles for block comments /** */ in the codebase (leading asterisk, and without).

@jagerman
Copy link
Member Author

I pushed a change implementing dynamic dimensions as the default if the size is omitted (and thus defaults to -1), so you can write:

auto r1 = arr.unchecked<T>();
auto r2 = arr_t.unchecked();

I also added the various utility methods (itemsize(), data(), etc.) and added documentation/tests for both.

That also makes it easier to compare performance; these are for 1-, 2-, and 5-nested loops that increment the value or calculate the sum of all matrix values, for 10M-element matrices of various shapes; the values are the timeit times with number=500 calls:

fixed, 1D, 10M,  incr 3.946771406001062
fixed, 1D, 10M,  sum  4.139643211001385
fixed, 10K x 1K, incr 3.954216455000278
fixed, 10K x 1K, sum  4.140550066000287
fixed, 100x100x10x10x10, incr 4.9838552410001284
fixed, 100x100x10x10x10, sum  4.188112009000179
dynamic, 1D, 10M,  incr 5.28798617999928
dynamic, 1D, 10M,  sum  4.169090997002058
dynamic, 10K x 1K, incr 6.072964282000612
dynamic, 10K x 1K, sum  4.407604636999167
dynamic, 100x100x10x10x10, incr 10.352673255998525
dynamic, 100x100x10x10x10, sum  8.596683744002803

@wjakob
Copy link
Member

wjakob commented Mar 22, 2017

I pushed a change implementing dynamic dimensions as the default if the size is omitted (and thus defaults to -1)

That's super-neat! To me this all looks good (including the docs) -- shall we merge it?

@jagerman jagerman merged commit cc88278 into pybind:master Mar 22, 2017
@jagerman
Copy link
Member Author

Merged!

I ran another test to see how much of an improvement this makes versus the current checked access. The answer: huge.

checked, 1D, 10M,  incr 31.572535417000836
checked, 1D, 10M,  sum  22.71654115000092
checked, 10K x 1K, incr 44.53464991600049
checked, 10K x 1K, sum  39.15508063499874
checked, 100x100x10x10x10, incr 109.21409312399919
checked, 100x100x10x10x10, sum  102.60653751300015

(numbers are directly comparable to the previous post).

@dean0x7d
Copy link
Member

That's a pretty great speed up. Have you tried comparing to numpy itself? E.g. np.sum. I wonder if the compile-time dimensions give the proxy array an advantage over some of numpy's builtins.

Side note: The git history might have benefited from a bit of a cleanup. Fixes within the PR are not useful information when looking through global history.

@jagerman
Copy link
Member Author

Crap, you're right, I meant to squash that during merge. Can I undo a merge to fix it without messing up too much?

@wjakob
Copy link
Member

wjakob commented Mar 22, 2017

Yeah, that's a lot of commits. @jagerman: do you want to rewrite history and squash this down a bit?

@wjakob
Copy link
Member

wjakob commented Mar 22, 2017

You can just rebase -i on your end and then force-push.

@jagerman
Copy link
Member Author

Okay, I'll squash directly on master.

@wjakob
Copy link
Member

wjakob commented Mar 22, 2017

Looks like there is a problem with PyPy again

@jagerman
Copy link
Member Author

Rewritten.

@jagerman
Copy link
Member Author

PyPy 5.7 was released a few days ago; I suggest we switch the build to use the released version. (Perhaps we can add a 5.8 nightly as a failure-allowed build?).

@nbecker
Copy link

nbecker commented Mar 22, 2017

I'm happy to see speedups, and unchecked access is available. Unfortunately from my perspective, it isn't default and requires extra (code) steps. My typical use would be where indexes are correct by design, and checking access is a waste. It would be nice if I could write:
void F(unchecked_array<int,2> A)
And have all the same methods available on unchecked_array as on the default checked version (docs now mention a subset of methods available)

@jagerman
Copy link
Member Author

There are tradeoffs: the biggest issue/objection was that it would be encoding the access mechanism by using a fundamentally different C++ type for safe versus unsafe access arrays, despite the type representing exactly the same underlying object. You'd need yet another C++ type (either under a separate name, or via a template parameter--either way still a distinct type) to distinguish between read-only and mutable input arrays, so now we'd have 3 C++ types to represent exactly the same object--and they'd need implicit conversions defined so that you could switch between them at will (with mutability checks when switching from a read-only to mutable type). There would also be a large amount of method duplication to support it all.

So it's possible, but I'm not really convinced that it's worth the cost just to avoid needing to write auto r = a.unchecked<2>().

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants