
Add page to documentation explaining coordinate conventions in skimage #1170

Closed
jni opened this issue Sep 23, 2014 · 63 comments

@jni
Member

jni commented Sep 23, 2014

There's been a fair bit of recent confusion regarding coordinate systems in scikit-image. In particular, see #1142 (including this comment) and #1140. I suggest we adopt (at least) the following conventions:

  • Numpy-style indexing is referred to as (row, col) or (dep, row, col): vertical axis first, then horizontal, origin is at top left. In the case of 3D, it's (somewhat unintuitively) depth first, then row, then column. Other possibilities are (row, col, ch) and (dep, row, col, ch).
  • Matplotlib/Cartesian-style indexing has horizontal axis first, then vertical, then optionally "height", and should be referred to as xy or xyz. The origin is at the bottom-left.
  • Another convention is horizontal, then vertical, but with the origin at the top left. This could (should?) be called (col, row).

We should strive to keep this consistent throughout the library to avoid confusion (see above issues), and we should add a page in the docs defining this, just as we do with data types and ranges.
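
To make the distinction concrete, here is a minimal sketch (plain numpy only; the shape and pixel are arbitrary):

import numpy as np

image = np.zeros((480, 640))  # 480 rows, 640 columns

# (row, col) convention: vertical axis first, origin at the top left.
r, c = 10, 20
image[r, c] = 1

# Cartesian (x, y) convention: horizontal axis first, so the same pixel
# is (x, y) = (20, 10); whether the origin sits at the top or the bottom
# left then depends on the display convention.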

@stefanv
Member

stefanv commented Sep 23, 2014

(dep, row, col) does not fit intuitively with the idea of 3D data (as you point out).

@jni
Member Author

jni commented Sep 23, 2014

@stefanv, unfortunately I think it's what makes the most sense when using C-order arrays, since then im[z] gives you a (row, col) image that is a contiguous array. Usually the depth axis has a different voxel spacing from the rows and columns, and so it should be the leading dimension in C-order arrays (and the trailing dimension in Fortran-order ones). This mirrors the discussion we had about ZYX in SLIC (which will need to be updated according to this discussion).
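
A quick way to see the contiguity point (a plain numpy check, nothing skimage-specific):

import numpy as np

vol = np.zeros((30, 512, 512))  # (dep, row, col), C order by default

# slicing the leading axis gives one contiguous block of memory...
print(vol[0].flags['C_CONTIGUOUS'])        # True
# ...while slicing the trailing axis gives a strided view
print(vol[:, :, 0].flags['C_CONTIGUOUS'])  # False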

@stefanv
Member

stefanv commented Sep 24, 2014

Hasn't that ship sailed already? scikit-image already uses MxNx3 for color images. Are you saying that, for spatial data, we should use a different ordering of axes?

@ahojnnes
Member

Imo we should stay with MxNxDepth, independent of rc vs xy

@jni
Member Author

jni commented Sep 24, 2014

@stefanv @ahojnnes it's completely arbitrary what we call the axes. (row, col, channel) accurately describes what we do and is intuitive. For 3D grayscale data, whether we call something (row, col, depth) or (depth, row, col), it will not influence the behaviour of anything, only how we think of the volume. There are significant advantages to thinking about it in terms of (depth, row, col). For example, what prints out to the screen will mirror how we are thinking of the image:

In [1]: x = np.arange(60).reshape(3, 4, 5)

In [2]: x
Out[2]: 
array([[[ 0,  1,  2,  3,  4],
        [ 5,  6,  7,  8,  9],
        [10, 11, 12, 13, 14],
        [15, 16, 17, 18, 19]],

       [[20, 21, 22, 23, 24],
        [25, 26, 27, 28, 29],
        [30, 31, 32, 33, 34],
        [35, 36, 37, 38, 39]],

       [[40, 41, 42, 43, 44],
        [45, 46, 47, 48, 49],
        [50, 51, 52, 53, 54],
        [55, 56, 57, 58, 59]]])

@blink1073
Member

I agree that if the core object for scikit-image is numpy.ndarray, then our nomenclature should reflect how data is represented there (outside of masks of course).

@stefanv
Member

stefanv commented Sep 25, 2014

@jni I agree that there are advantages to doing it that way, it's just not the way we've been doing it (or calling things) so far. If you're saying that we have no algorithms in skimage at this point in time that depend on the (row, col, depth) format, then I guess we can change it without a problem (and hopefully that is the case since, as you said, naming should be arbitrary unless you do an operation that depends on knowing which way is "down").

@jni
Member Author

jni commented Sep 25, 2014

As far as I know there is no algorithm or implementation in skimage that depends on this. It is purely terminology and we get to choose now how we want to go forward.

@ahojnnes
Member

E.g. the whole warping framework depends on (row, col, channel).

@jni
Member Author

jni commented Sep 25, 2014

@ahojnnes I am not suggesting changing (row, col, channel) for 2D+c, I'm suggesting that we call 3D grayscale images dimensions (depth, row, col) and 3D+c images (depth, row, col, channel).

@ahojnnes
Member

@jni OK, but I would name it (Z, Y, X, C) then... ? The row, col <=> depth sounds / looks inconsistent... at least to my ears / eyes :-)

@ahojnnes
Member

This would also fit with the warps, since they use (Y, X, C)...

@jni
Member Author

jni commented Sep 25, 2014

ZYX has the problem of the location of the origin, which some would assume to be on the bottom left of each plane. (row, col) makes it clear that we are looking at it in numpy order. I agree that "depth" is a bit off but I think we need something that maintains "row col" but adds a third dimension.

@blink1073
Member

How about page, row, col? I hate to use Matlab as an example again, but....

@jni
Member Author

jni commented Sep 25, 2014

@blink1073 much like you, I hate to admit it, but I actually do like that terminology... =)

@JDWarner
Contributor

This could be my physics roots showing, but I always tightly link each spatial dimension to the corresponding NumPy axis in increasing order. IMHO, the minor potential performance gains from the C-contiguous (z, y, x) ordering are far outweighed by the dangerous mistakes it invites when coding. From experience, tightly coupling (x, y, z) to axes (0, 1, 2) loses me much less time in debugging. It's also more generalizable, which leads to cleaner and more maintainable code. The only real downside is that you can't use the shortcut arr[slice]; you need arr[:, :, slice] or arr[..., slice] instead.

To be clear, (x, y, z) here represent a right-handed coordinate frame as follows (z is coming out of your screen):

     o------------> y
    /|
   / |
  /  |
z    |
     v
     x

(row, col, page) has one significant downside when used in docs. In my experience, people will intuitively assume page increments in the wrong direction! They think the zero index is the "top" page, and the rest are buried behind it in a stack. That would define a left-handed (and therefore incorrect) coordinate frame, which inevitably results in confusion. This isn't just users: MATLAB themselves perpetuate this in their docs (see the page linked above by @blink1073)! Even the pros screw this up.

IMHO our most important concerns are:

  1. Pick a reference frame that is right-handed, define it robustly up front, and use it consistently.
  2. Take caution when choosing words rather than abstractions to represent axes, as they may carry unintentional inferential baggage.
  3. Balance generalizability/cleanliness from an algorithmic/code-maintenance standpoint with user expectations.

@jni
Member Author

jni commented Sep 25, 2014

@JDWarner nice writeup, as usual! But in this rare occasion I disagree on a few counts...

  1. Firstly, the motivation for this post, find_contours returns (y,x) instead of (x,y) pairs #1140, shows that tying (x, y, z) to the numpy indexing convention can result in confusion, because many, many users will think of x as the horizontal and y as the vertical, with the origin on the bottom left. This is also what mpl uses. Therefore, we need a clear way to distinguish our more common (rr, cc) indexing from this. I don't see it in your proposal. (Indeed, this is "unintentional inferential baggage", to use your term. =)
  2. Is there anything intrinsically wrong with using a left-handed reference frame? As you said, many people intuitively assume this anyway. And I very much like the volume[page] notation to grab an individual slice.
  3. Furthermore, the speed of many operations will depend on the contiguity of your slices, sometimes by a factor of 2x, sometimes more, so if you want to encourage volume[:, :, z] notation, we should switch to order='F' arrays... Which I'm not excited about doing.

Of the above, I would say that (1) is really the most important issue. I'd be ok with getting voted down on issue (2). On issue (3) I'm torn. I agree that maintainability trumps performance, but only to an extent, and it's getting to the point where we are getting regular complaints on the mailing list about our performance. So, I think it'd be good to place a bit more emphasis on the speed of scikit-image functions than we have in the past.

@jni
Member Author

jni commented Sep 25, 2014

Oh, PIL is another place where (x, y) doesn't match numpy's usage. (Looking at some things there now.)

@blink1073
Member

I asked my wife (an Aero engineer and photographer). She agreed with row, col, and suggested "layer", which is what Photoshop calls it.

@jni
Member Author

jni commented Sep 26, 2014

@blink1073 did she have an opinion on the ordering of things? =)

@blink1073
Member

She prefers row, col, layer. Also, if I load an image using PIL or feed one into OpenCV, the layer is last.

@blink1073
Member

The only downside is having to type img[:, :, 0] to get the red layer.

@blink1073
Member

And also the speed penalty you mentioned... We could use np.asfortranarray within functions where speed is of import.

@jni
Member Author

jni commented Sep 26, 2014

@blink1073 as I have mentioned repeatedly, channels != layers/pages/depth!

@blink1073
Member

You lost me Juan. I thought we were keeping channel out of this discussion because it is used by warps (even though that's what we use at work to describe them with our hyperspectral cameras...)

@jni
Member Author

jni commented Sep 26, 2014

So why are you mentioning "the red layer"?

@blink1073
Member

Ha! So channel == color/wavelength /alpha?

@JDWarner
Contributor

^ This, right here, is exactly how implicit meanings in an axis label lead to confusion!

@JDWarner
Contributor

My personal vote would be for (i, j, k) or (m, n, p) in an abstracted, truly right handed coordinate system.

I'm not sure forcing Fortran- vs. C-contiguous arrays is as bad as you're making it out to be. Any Cython function which uses typed memoryviews should have a Python wrapper with np.ascontiguousarray already; adding order='F' and changing arr[:, :, ::1] to arr[::1, :, :] is a minimal change, if necessary. But is a hypothetical 2x speed difference worth long-term maintainability?
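
For concreteness, here is a rough Python-level sketch of the wrapper pattern being described (run_kernel and the _kernel stand-in are hypothetical names, not existing skimage functions; only the contiguity handling is the point):

import numpy as np

def _kernel(view):
    # stand-in for a Cython kernel that would declare a typed memoryview,
    # e.g. double[:, :, ::1] for the C-contiguous case
    return np.asarray(view).sum()

def run_kernel(arr):
    # guarantee the layout the memoryview declaration expects
    arr = np.ascontiguousarray(arr, dtype=np.double)
    # Fortran-order alternative: np.asfortranarray(arr), paired with a
    # double[::1, :, :] declaration in the Cython signature
    return _kernel(arr)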

@jni
Member Author

jni commented Sep 26, 2014

@blink1073 yes

@JDWarner all of our nD/3D functions transparently allow abstract coordinates, and you can put the (page/plane/depth) at the end if you like, no problem. The question is how we prefer users to think and how we prefer our own code to be arranged. I personally strongly prefer volume[page] to volume[..., page] and I get performance as a bonus.

Incidentally, the performance difference is not hypothetical, it's real, check the mailing list.

Changing our array order convention is not as trivial as you think: you might also need to change your inner/outer for loops, and you'll break with numpy convention, which is our strongest asset.

(i, j, k) is clean, but it is not as useful or self-documenting as (row, col, page) or (page, row, col), when these are used correctly. I'm pretty sure with a bit more effort we can figure out what "correctly" means. =)

A bit of context about my thinking about performance. You'll find old statements from me on PRs and on the mailing list that readability is much more important than performance, etc. However, I think we are now at a critical moment in Scientific Python adoption. I think the only thing that can stop Python from becoming the next Perl will be its ability to be as performant as the up-and-comers such as Go and Julia. I think nonchalance about performance has got Python to where it is: famous for being slow. (My own gala library also suffers from this.) If we are going to stop that, we need to be at least a bit careful with what we do. Using the right array ordering and notation for the task is an easy thing to get right.

@blink1073
Member

The C-order does not help us on RGB images, since we'd always have to access each channel as img[:, :, ch]. Given this, having to use img_stack[:, :, page, ch] doesn't seem so bad...

@jni
Member Author

jni commented Sep 28, 2014

In [1]: a = np.ones((128, 128, 128, 3))
In [4]: def corder(arr):
   ...:     for z in range(128):
   ...:         b = arr[z, :, :, 0] * 3
In [5]: def forder(arr):
   ...:     for z in range(128):
   ...:         b = arr[:, :, z, 0] * 3
In [7]: %timeit corder(a)
100 loops, best of 3: 5.27 ms per loop
In [8]: %timeit forder(a)
10 loops, best of 3: 24.7 ms per loop

@blink1073
Member

Yikes, switching the order internally definitely does not scale well:

In [2]: a = np.ones((128, 128, 128, 3))

In [3]: def corder(arr):
   ...:     for z in range(128):
   ...:         b = arr[z, :, :, 0] * 3
   ...:

In [4]: def forder(arr):
   ...:     arr = np.asfortranarray(arr)
   ...:     for z in range(128):
   ...:         b = arr[:, :, z, 0] * 3
   ...:

In [5]: %timeit corder(a)
100 loops, best of 3: 5.04 ms per loop

In [6]: %timeit forder(a)
1 loops, best of 3: 116 ms per loop

I'm with you @jni, I think page should come first for major practicality reasons.

@jni
Member Author

jni commented Sep 29, 2014

@JDWarner @ahojnnes @stefanv Do the benchmarks above (which surprised even me in their magnitude) change your opinion about how we name/do things?

@stefanv
Member

stefanv commented Sep 30, 2014

@jni Oddly enough, I started feeling sympathy for your argument earlier this week when working with image stacks, and where typing stack[..., N] is pretty darn annoying (in addition to being slow). I regret that we cannot change everything at once, but such is the legacy. However, the Fortran order is one that somehow skipped my mind earlier--we can still survive the legacy (M, N, 3) by switching all internal repr's for color images to Fortran!

Anyway, I'm happy with using the first dimension as the depth of the stack, and as you say it happens to be nice that numpy also prints it that way. It also, ironically, allows a person to distinguish better between color and multi-dimensional data.
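
As a quick check of what switching the internal colour representation to Fortran order would buy for an (M, N, 3) image (plain numpy, illustrative only):

import numpy as np

rgb_c = np.zeros((512, 512, 3))     # current C-order layout
rgb_f = np.asfortranarray(rgb_c)    # Fortran-order internal representation

print(rgb_c[..., 0].flags['C_CONTIGUOUS'])  # False: each band is a strided view
print(rgb_f[..., 0].flags['F_CONTIGUOUS'])  # True: each band is one contiguous block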

@jni
Member Author

jni commented Sep 30, 2014

@stefanv See benchmarks above: switching to Fortran for colour is probably not worth it, especially in 3D+c images. The entire algorithm would have to be rewritten to change outer loops to inner loops. If a multichannel image needs each channel processed separately, the best approach is probably to grab copies of each channel:

if multichannel:
    channel_images = [a[..., i].copy() for i in range(a.shape[-1])]
else:
    channel_images = [a]
out = []
for ch in channel_images:
    out.append(do_function(ch))

Or somesuch.

@blink1073
Member

Sounds reasonable to me, making a local copy of a 2D array whenever we need to work with it:

In [13]: a = np.ones((128, 128, 3))

In [14]: def direct(a):
   ....:     for i in range(10):
   ....:         a[:, :, 0] += 10
   ....:

In [16]: def copy(a):
   ....:     b = a[:, :, 0].copy()
   ....:     for i in range(10):
   ....:         b += 10
   ....:

In [17]: %timeit direct(a)
1000 loops, best of 3: 267 µs per loop

In [18]: %timeit copy(a)
10000 loops, best of 3: 102 µs per loop

@blink1073
Member

Copies are c-contiguous:

In [1]: a = np.ones((100, 100))

In [2]: a = a.T

In [3]: a.flags
Out[3]:
  C_CONTIGUOUS : False
  F_CONTIGUOUS : True
  OWNDATA : False
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False

In [4]: b = a.copy()

In [5]: b.flags
Out[5]:
  C_CONTIGUOUS : True
  F_CONTIGUOUS : False
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False

@blink1073
Member

One more tidbit:

In [6]: %timeit np.ascontiguousarray(a)
100000 loops, best of 3: 16.1 µs per loop

In [7]: %timeit a.copy()
100000 loops, best of 3: 14.7 µs per loop

@stefanv
Member

stefanv commented Sep 30, 2014

I feel a utility function coming on: channels(image) (or something like it) can return an iterator that copies out the bands, then we can avoid repeating that pattern everywhere. Since we will organise the multi-layer data appropriately, we won't need a function there.

@jni
Member Author

jni commented Sep 30, 2014

That's a nice function and very readable, imho! Add a "which" kwarg defaulting to None (all channels), in case we only need to work with one (avoid two copies).
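
One possible sketch of that helper with the "which" keyword folded in (the name channels and its signature are just the ones floated above, not an existing skimage API):

import numpy as np

def channels(image, which=None):
    """Yield contiguous copies of the requested channels of `image`.

    which=None iterates over every channel; an int or a sequence of
    ints restricts the iteration to those channels.
    """
    if which is None:
        which = range(image.shape[-1])
    elif np.isscalar(which):
        which = [which]
    for ch in which:
        yield image[..., ch].copy()  # .copy() gives a C-contiguous band

# usage sketch: process each band without paying for strided access
# results = [some_filter(band) for band in channels(rgb)]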

@stefanv
Member

stefanv commented Oct 2, 2014

@jni Are you taking authorship of this one (for writing the page in the user guide as well as the utility function)?

@jni
Member Author

jni commented Nov 5, 2014

Oops, this fell by the wayside. I'll look into writing this.

@stefanv stefanv added this to the 0.11 milestone Nov 6, 2014
@emmanuelle
Member

@jni one good place to integrate this piece of documentation could be the new section about images as numpy arrays that has just been merged in the user guide (PR #705)

@stefanv
Member

stefanv commented Nov 28, 2014

@emmanuelle @jni was at a workshop yesterday, so he's a bit behind, but I'm sure he'll get to it today :)

@jni
Member Author

jni commented Dec 3, 2014

@stefanv wrong again! =P I've scheduled this for next week.

Do we want to make 0.11 a Christmas present to the SciPy community? ;)

@stefanv
Member

stefanv commented Dec 3, 2014

@jni That is an excellent idea. Shall I organise an online sprint for the 13th of December to round off the release? That gives us enough time to round up outstanding issues.

@blink1073
Member

+1 @stefanv

@jni
Member Author

jni commented Dec 3, 2014

@stefanv I love the idea of a sprint, but how are we dealing with time zones? =) I'm out on the 13th Australia time but could do the 14th morning, which corresponds with afternoon US time on the 13th, so that could work? But then it's like midnight in Europe. =) Where are you these days anyway? LOL

@blink1073
Member

Juan everyone knows the only true time is whatever time it is in Boston.

@stefanv
Member

stefanv commented Dec 3, 2014

@jni Tag teams! We do a whole weekend.

@jni
Member Author

jni commented Dec 9, 2014

Completely random observation: using level, row, column yields the abbreviation lrc, which are all together in an outside-in sequence (pinky to middle finger) on the Dvorak keyboard layout. That's awesome. =D

CC @tonysyu =D

@ahojnnes
Member

What's the latest status here?

@jni
Member Author

jni commented Dec 23, 2014

@ahojnnes Being addressed by #1280, but I still have some work to do there...

@jni
Member Author

jni commented Feb 3, 2015

OMG Finally: closed by #1280! =D
