Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cython ND Array iterator #1510

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

vighneshbirodkar
Copy link
Contributor

Thoughts appreciated.
Will submit the neighborhood iterator once this is approved.

@jni
Copy link
Member

jni commented May 15, 2015

What's the difference between pxi and pyx?

iter.next()

del iter
assert np.sum(arr) == sum
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't you need to import numpy somewhere? Also, don't overwrite built-in names...

@jni
Copy link
Member

jni commented May 15, 2015

Again, maybe my Cython noob-ness showing, but is it possible to make this more Pythonic? Here's how I would like it to read:

while iter:
    total += next(iter)

I particularly dislike the get_int and get_float methods... imho the iterator should "emit" the same type as the underlying array.

Finally, quite often, we want not just the value but the current index. Is there a way for you to get that easily with a current_index method? (This will be necessary for the neighbour iterator anyway, I think.)

@vighneshbirodkar
Copy link
Contributor Author

@jni, a pyx file is to be compiled into C code and eventually an .so or .dll file. A pxi is analogous to the C #include directive, it will simply copy paste the contents.

@stefanv
Copy link
Member

stefanv commented May 15, 2015

/cc @cournape

@stefanv
Copy link
Member

stefanv commented May 15, 2015

/cc @njsmith

@njsmith
Copy link

njsmith commented May 15, 2015

I am summoned! Uh. What am I looking at, and why?
On May 15, 2015 11:19 AM, "Stefan van der Walt" notifications@github.com
wrote:

/cc @njsmith https://github.com/njsmith


Reply to this email directly or view it on GitHub
#1510 (comment)
.

@ahojnnes
Copy link
Member

I probably missed the discussion, but how is this useful or better than for item in iterator?

@vighneshbirodkar
Copy link
Contributor Author

@ahojnnes this translates to a much more efficient C code and is intended for fadt Cython ND iteration. See #1495

@njsmith We are trying to write an easy to use as well as fast Cython wrapper around the NumPy iterator API

edit: @njsmith , see #1495 as well

@jni
Copy link
Member

jni commented May 16, 2015

@vighneshbirodkar is the .pxi significantly faster? As I mentioned, the lack of scoping is really frustrating to me...

@stefanv
Copy link
Member

stefanv commented May 16, 2015 via email

@ahojnnes
Copy link
Member

Why is this not a PR to Cython then? It should be relatively easy to generate the appropriate code whenever the iterable is a numpy array? So, that you can actually use nice syntax, e.g. for i in array...

@njsmith
Copy link

njsmith commented May 16, 2015

This looks really weird to me. Aside from the part where you're writing a .pxi instead of a .pxd... you can't free the nditer object, it's a regular refcounted Python object. And the exposed interface (of doing a single one-element-at-a-time iteration over a single array) is incredibly tiny compared to what nditer already exposes to C and Python. And the methods are all like single-line renames of existing nditer functions? (I guess I could be confused and you're wrapping one of the older iteration interfaces in numpy, but I'm 99% sure that PyArrayIterObject is the same as nditer.)

Like, if you just call the Python function np.nditer (which has a very rich construction API and who cares if it's a bit slow b/c it's not the inner loop), this returns a new Python object, and you just have to tell Cython that btw this Python object is specifically of the PyArrayIterObject type, and now you can call the low-level stuff on it directly while letting Cython take care of refcnting. Wouldn't that make more sense?

@vighneshbirodkar
Copy link
Contributor Author

Hello @njsmith , @stefanv thanks for looking into this
According to the numpy pxd file, keeping function definitions in .pxd file is still experimental, that's why I decided to write a .pxi file. With that said, I'll give it a try and see what happens. I will remove the free call.

Yes, the functionality offered by my code is indeed trivial. My only intent is to let it's user not have to do pointer manipulation and C imports in order to iterate over an ND array. With this, all they have to do is call next and get_* functions.

As you said np.nditer would do this with some loss of speed. But the problem comes when we try to iterate over an ND array and its ND neighborhood. This would need 2 python iterators one inside the other, that's when things start getting really slow. That's why I eventually also plan to wrap the Neighborhood Iterator. I believe iterating over an ND image and its ND neighborhood is pretty useful functionality to have. You can do it now using the C API from withing Cython, but that would make the code a lot less Pythonic and shift your focus to getting the imports and pointer manipulation right.

TL;DR The API(when complete) will let the user iterate over an ND array and its neighborhood without being familiar with Numpy C API.

Edit : corrected he to they

@njsmith
Copy link

njsmith commented May 16, 2015

If you just remove the free() then you'll have a memory leak. You need to either call Py_D
ECREF or (better) use the syntax that tells cython that these objects are actually python objects and not just C structs, so that it does refcnting automatically. See how PyArrayObject gets wrapped in numpy.pxd.

With this, all he has to do is call next and get_* functions.

Friendly reminder that not all programmers are male.

As you said np.nditer would do this with some loss of speed.

You're using nditer, just via a really limited api. The loss of speed of using the full api would be unmeasurable for any reasonable sized image and would make your code simpler -- all you really need for your purposes is some utility functions (not a class) that call next and return properly cast pointers. For the neighborhood iterator then yeah iirc there is currently no python wrapper for the constructor function so you also need to wrap that. (Maybe numpy should have a python level api for this built in though, if you get inspired... ;-))

@vighneshbirodkar
Copy link
Contributor Author

@njsmith Now that you point it out, the class does seem like an overkill, will make the changes soon.

I would disagree with you regarding the loss of speed. Python iterators would make python function calls which will come with its own over head. On the other hand PyArray_ITER_NEXT is just a macro with a for loop inside.

@vighneshbirodkar
Copy link
Contributor Author

@njsmith I'll run a quick benchmark and share the result soon.

@vighneshbirodkar
Copy link
Contributor Author

@njsmith See https://github.com/vighneshbirodkar/cy_iter/blob/master/iter.pyx

The code is not exactly the same, but similar, the results on my system were

Numpy took  0.000637054443359
ans =  2161916
Cython took  0.00178599357605
ans =  2161916
Python took  0.204711914062
ans =  2161916

I am assuming that the eventual operation intended is not something as simply as the arithmetic sum.

Now think about iterating over neighbors, let's say over a 2D image. For each pixel, you would have a Python iterator going over 8 values. The difference would be even more obvious then.

@njsmith
Copy link

njsmith commented May 16, 2015

Read again, you're seeing what you expect me to say, not what I'm saying
:-). You can call PyIter_ITER_NEXT on the object returned by np.nditer
On May 16, 2015 12:07 PM, "Vighnesh Birodkar" notifications@github.com
wrote:

@njsmith https://github.com/njsmith Now that you point it out, the
class does seem like an overkill, will make the changes soon.

I would disagree with you regarding the loss of speed. Python iterators
would make python function calls which will come with its own over head. On
the other hand PyArray_ITER_NEXT is just a macro with a for loop inside.


Reply to this email directly or view it on GitHub
#1510 (comment)
.

@vighneshbirodkar
Copy link
Contributor Author

@njsmith
Okay :-). I'll re-factor the file and break it into functions.

Also, multiple get_* functions are unattractive, as Juan also pointed out. Any idea how we could use fused types to solve this ? My question on the Cython mailing list is yet unanswered :-(
https://groups.google.com/forum/#!topic/cython-users/rcS4ZRXXkgM

@vighneshbirodkar
Copy link
Contributor Author

@njsmith (edit: If) I am not wrong the nditer function relies on the newer NpyIter struct. This eventually uses the NpyIter_GetIterNext function which returns a function pointer to be used for iteration. Wouldn't is be more convenient if this were wrapped up in a class ?

@njsmith
Copy link

njsmith commented May 17, 2015

Ah, yeah, you're probably right :-). I am not at all an expert on the details of numpy's iteration APIs (and they're very confusing!), so while I tried to figure out what exactly was going on with PyArrayIterObject before posting I probably got it wrong.

I can't really say whether any particular API would be more or less convenient than just using the existing numpy API directly. If the only difference is that the new API is better documented then you're probably better off submitting doc patches :-). If you have other improvement ideas then we'd certainly like to hear them upstream too, though that wouldn't help in the short term (b/c skimage needs to support older numpys).

@JDWarner
Copy link
Contributor

Just my 2 cents - I have no problem with Cython up through typed memoryviews, but find the NumPy C API difficult to grok. Even if it's possible using pointers and the C API presently, there's a reason few of our algorithms go this route right now - it's a barrier both to implementation and maintenance. An efficient and elegant Cython implementation, even if just a wrapper (and even if slightly less efficient), would let us realize these gains more generally across the package.

@vighneshbirodkar
Copy link
Contributor Author

Since we support numpy 0.9, we can't assume the Npy_* API available to us.
And, as @njsmith suggested, we cannot be sure np.nditer wraps which one
of these API.

And, as the documentation[1] suggests, neighbourhood iteration is not
supported with the new API.

I'll go ahead and refactor this class into utility functions using the same
iterator API.

[1] http://docs.scipy.org/doc/numpy/reference/c-api.iterator.html
On 18 May 2015 04:37, "Josh Warner" notifications@github.com wrote:

Just my 2 cents - I have no problem with Cython up through typed
memoryviews, but find the NumPy C API difficult to grok. Even if it's
possible using pointers and the C API presently, there's a reason few of
our algorithms go this route right now - it's a barrier both to
implementation and maintenance. An efficient and elegant Cython
implementation, even if just a wrapper (and even if slightly less
efficient), would let us realize these gains more generally across the
package.


Reply to this email directly or view it on GitHub
#1510 (comment)
.

@jaimefrio
Copy link
Contributor

Hi Vighnesh,

You may want to take a look at the Cython implementation of labeling in ndimage here. It uses PyArrayIterObject from Cython, and even has some examples of fused data type specialization to copy cast data from the iteration point into a contiguous buffer. A lot of it is just a convoluted mess of function pointers, that would be infinitely more clearly written in C directly, but I digress...

Anyway, @njsmith pointed you in the wrong direction, but with good intentions. ;-) PyArrayIterObject is indeed exposed as a Python object, only it is not np.nditer, but np.flatiter. As Nathaniel pointed out, you can call PyArray_ITER_NEXT on the return of np.flatiter. There are some issues with the way yu are accessing the data in the array, but I'll comment on that inline.

As you mentioned elsewhere, this is highly overlapping with @aman-iitj and his GSoC project. This said, I by no means want to push you away from working on this, but rather pull you in so that your work is coordinated with Aman's.


cdef inline cnp.float_t get_float(self):
""" Returns the current element as `float`. """
cdef cnp.float_t *address = <cnp.float_t*>self.iter.dataptr
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately you cannot do this... At least not without previously checking that the array being iterated is aligned and its type not byteswapped. The dtype object has a pointer to a copyswap function, see nere, that can be used to copy data into an aligned buffer which you can then dereference without worries.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello @jaimefrio
I plan to re visit this soon
Could you elaborate what you mean by aligned ? Or could you point me to some documentation which elaborates it ?

And I am guessing here by byteswapped you mean I should check for the endianness of the array ?

@stefanv
Copy link
Member

stefanv commented May 26, 2015

@jaimefrio If it can be written more clearly in C, shouldn't we do that and wrap the C with Cython?

Base automatically changed from master to main February 18, 2021 18:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants