
Add page to documentation explaining coordinate conventions in skimage #1170

Closed
jni opened this issue Sep 23, 2014 · 63 comments

@jni
Member

jni commented Sep 23, 2014

There's been a fair bit of recent confusion regarding coordinate systems in scikit-image. In particular, see #1142 (including this comment) and #1140. I suggest we adopt (at least) the following conventions:

  • Numpy-style indexing is referred to as (row, col) or (dep, row, col): vertical axis first, then horizontal, origin is at top left. In the case of 3D, it's (somewhat unintuitively) depth first, then row, then column. Other possibilities are (row, col, ch) and (dep, row, col, ch).
  • Matplotlib/Cartesian-style indexing has horizontal axis first, then vertical, then optionally "height", and should be referred to as xy or xyz. The origin is at the bottom-left.
  • Another convention is horizontal, then vertical, but with the origin at the top left. This could (should?) be called (col, row).

We should strive to keep this consistent throughout the library to avoid confusion (see above issues), and we should add a page in the docs defining this, just as we do with data types and ranges.
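
To make the distinction concrete, here is a minimal sketch (plain numpy only; the shape and pixel are arbitrary):

import numpy as np

image = np.zeros((480, 640))  # 480 rows, 640 columns

# (row, col) convention: vertical axis first, origin at the top left.
r, c = 10, 20
image[r, c] = 1

# Cartesian (x, y) convention: horizontal axis first, so the same pixel
# is (x, y) = (20, 10); whether the origin sits at the top or the bottom
# left then depends on the display convention.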

@stefanv
Member

stefanv commented Sep 23, 2014

(dep, row, col) does not fit intuitively with the idea of 3D data (as you point out).

@jni
Member Author

jni commented Sep 23, 2014

@stefanv, unfortunately I think it's what makes the most sense when using C-order arrays, since then im[z] gives you a (row, col) image that is a contiguous array. Usually the depth axis has a different voxel spacing from the rows and columns, and so it should be the leading dimension in C-order arrays (and the trailing dimension in Fortran-order ones). This mirrors the discussion we had about ZYX in SLIC (which will need to be updated according to this discussion).
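
A quick way to see the contiguity point (a plain numpy check, nothing skimage-specific):

import numpy as np

vol = np.zeros((30, 512, 512))  # (dep, row, col), C order by default

# slicing the leading axis gives one contiguous block of memory...
print(vol[0].flags['C_CONTIGUOUS'])        # True
# ...while slicing the trailing axis gives a strided view
print(vol[:, :, 0].flags['C_CONTIGUOUS'])  # False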

@stefanv
Member

stefanv commented Sep 24, 2014

Hasn't that ship sailed already? scikit-image already uses MxNx3 for color images. Are you saying that, for spatial data, we should use a different ordering of axes?

@ahojnnes
Member

Imo we should stay with MxNxDepth, independent of rc vs xy

@jni
Member Author

jni commented Sep 24, 2014

@stefanv @ahojnnes it's completely arbitrary what we call the axes. (row, col, channel) accurately describes what we do and is intuitive. For 3D grayscale data, whether we call something (row, col, depth) or (depth, row, col), it will not influence the behaviour of anything, only how we think of the volume. There are significant advantages to thinking about it in terms of (depth, row, col). For example, what prints out to the screen will mirror how we are thinking of the image:

In [1]: x = np.arange(60).reshape(3, 4, 5)

In [2]: x
Out[2]: 
array([[[ 0,  1,  2,  3,  4],
        [ 5,  6,  7,  8,  9],
        [10, 11, 12, 13, 14],
        [15, 16, 17, 18, 19]],

       [[20, 21, 22, 23, 24],
        [25, 26, 27, 28, 29],
        [30, 31, 32, 33, 34],
        [35, 36, 37, 38, 39]],

       [[40, 41, 42, 43, 44],
        [45, 46, 47, 48, 49],
        [50, 51, 52, 53, 54],
        [55, 56, 57, 58, 59]]])

@blink1073
Member

I agree that if the core object for scikit-image is numpy.ndarray, then our nomenclature should reflect how data is represented there (outside of masks of course).

@stefanv
Member

stefanv commented Sep 25, 2014

@jni I agree that there are advantages to doing it that way, it's just not the way we've been doing it (or calling things) so far. If you're saying that we have no algorithms in skimage at this point in time that depend on the (row, col, depth) format, then I guess we can change it without a problem (and hopefully that is the case since, as you said, naming should be arbitrary unless you do an operation that depends on knowing which way is "down").

@jni
Member Author

jni commented Sep 25, 2014

As far as I know there is no algorithm or implementation in skimage that depends on this. It is purely terminology and we get to choose now how we want to go forward.

@ahojnnes
Member

E.g. the whole warping framework depends on (row, col, channel).

@jni
Member Author

jni commented Sep 25, 2014

@ahojnnes I am not suggesting changing (row, col, channel) for 2D+c, I'm suggesting that we call 3D grayscale images dimensions (depth, row, col) and 3D+c images (depth, row, col, channel).

@ahojnnes
Member

@jni OK, but I would name it (Z, Y, X, C) then... ? The row, col <=> depth sounds / looks inconsistent... at least to my ears / eyes :-)

@ahojnnes
Member

This would also fit with the warps, since they use (Y, X, C)...

@jni
Member Author

jni commented Sep 25, 2014

ZYX has the problem of the location of the origin, which some would assume to be on the bottom left of each plane. (row, col) makes it clear that we are looking at it in numpy order. I agree that "depth" is a bit off but I think we need something that maintains "row col" but adds a third dimension.

@blink1073
Member

How about page, row, col? I hate to use Matlab as an example again, but....

@jni
Member Author

jni commented Sep 25, 2014

@blink1073 much like you, I hate to admit it, but I actually do like that terminology... =)

@JDWarner
Contributor

This could be my physics roots showing, but I always tightly link each spatial dimension to the corresponding NumPy axis in increasing order. IMHO, the minor potential performance gains from the C-contiguous (z, y, x) ordering are far outweighed by the dangerous mistakes it invites when coding. From experience, tightly coupling (x, y, z) to axes (0, 1, 2) loses me much less time in debugging. It's also more generalizable, which leads to cleaner and more maintainable code. The only real downside is that you can't use the shortcut arr[slice]; you need arr[:, :, slice] or arr[..., slice] instead.

To be clear, (x, y, z) here represent a right-handed coordinate frame as follows (z is coming out of your screen):

     o------------> y
    /|
   / |
  /  |
z    |
     v
     x

(row, col, page) has one significant downside when used in docs. In my experience, people will intuitively assume page increments in the wrong direction! They think the zero index is the "top" page, and the rest are buried behind it in a stack. That would define a left-handed (and therefore incorrect) coordinate frame, which inevitably results in confusion. This isn't just users: MATLAB themselves perpetuate this in their docs (see the page linked above by @blink1073)! Even the pros screw this up.

IMHO our most important concerns are:

  1. Pick a reference frame that is right-handed, define it robustly up front, and use it consistently.
  2. Take caution when choosing words rather than abstractions to represent axes, as they may carry unintentional inferential baggage.
  3. Balance generalizability/cleanliness from an algorithmic/code-maintenance standpoint with user expectations.

@jni
Member Author

jni commented Sep 25, 2014

@JDWarner nice writeup, as usual! But in this rare occasion I disagree on a few counts...

  1. Firstly, the motivation for this post, find_contours returns (y,x) instead of (x,y) pairs #1140, shows that tying (x, y, z) to the numpy indexing convention can result in confusion, because many, many users will think of x as the horizontal and y as the vertical, with the origin on the bottom left. This is also what mpl uses. Therefore, we need a clear way to distinguish our more common (rr, cc) indexing from this. I don't see it in your proposal. (Indeed, this is "unintentional inferential baggage", to use your term. =)
  2. Is there anything intrinsically wrong with using a left-handed reference frame? As you said, many people intuitively assume this anyway. And I very much like the volume[page] notation to grab an individual slice.
  3. Furthermore, the speed of many operations will depend on the contiguity of your slices, sometimes by a factor of 2x, sometimes more, so if you want to encourage volume[:, :, z] notation, we should switch to order='F' arrays... Which I'm not excited about doing.

Of the above, I would say that (1) is really the most important issue. I'd be ok with getting voted down on issue (2). On issue (3) I'm torn. I agree that maintainability trumps performance, but only to an extent, and it's getting to the point where we are getting regular complaints on the mailing list about our performance. So, I think it'd be good to place a bit more emphasis on the speed of scikit-image functions than we have in the past.

@jni
Member Author

jni commented Sep 25, 2014

Oh, PIL is another place where (x, y) doesn't match numpy's usage. (Looking at some things there now.)

@blink1073
Member

I asked my wife (an Aero engineer and photographer). She agreed with row, col, and suggested "layer", which is what Photoshop calls it.

@jni
Member Author

jni commented Sep 26, 2014

@blink1073 did she have an opinion on the ordering of things? =)

@blink1073
Member

She prefers row, col, layer. Also, if I load an image using PIL or feed one into OpenCV, the layer is last.

@blink1073
Member

The only downside is having to type img[:, :, 0] to get the red layer.

@blink1073
Member

And also the speed penalty you mentioned... We could use np.asfortranarray within functions where speed is of import.

@jni
Member Author

jni commented Sep 26, 2014

@blink1073 as I have mentioned repeatedly, channels != layers/pages/depth!

@blink1073
Member

You lost me Juan. I thought we were keeping channel out of this discussion because it is used by warps (even though that's what we use at work to describe them with our hyperspectral cameras...)

@jni
Member Author

jni commented Sep 26, 2014

So why are you mentioning "the red layer"?

@blink1073
Member

Ha! So channel == color/wavelength /alpha?

@JDWarner
Contributor

^ This, right here, is exactly how implicit meanings in an axis label lead to confusion!

@JDWarner
Contributor

My personal vote would be for (i, j, k) or (m, n, p) in an abstracted, truly right handed coordinate system.

I'm not sure forcing Fortran- vs. C-contiguous arrays is as bad as you're making it out to be. Any Cython function which uses typed memoryviews should have a Python wrapper with np.ascontiguousarray already; adding order='F' and changing arr[:, :, ::1] to arr[::1, :, :] is a minimal change, if necessary. But is a hypothetical 2x speed difference worth long-term maintainability?
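
For concreteness, here is a rough Python-level sketch of the wrapper pattern being described (run_kernel and the _kernel stand-in are hypothetical names, not existing skimage functions; only the contiguity handling is the point):

import numpy as np

def _kernel(view):
    # stand-in for a Cython kernel that would declare a typed memoryview,
    # e.g. double[:, :, ::1] for the C-contiguous case
    return np.asarray(view).sum()

def run_kernel(arr):
    # guarantee the layout the memoryview declaration expects
    arr = np.ascontiguousarray(arr, dtype=np.double)
    # Fortran-order alternative: np.asfortranarray(arr), paired with a
    # double[::1, :, :] declaration in the Cython signature
    return _kernel(arr)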

@jni
Member Author

jni commented Sep 26, 2014

@blink1073 yes

@JDWarner all of our nD/3D functions transparently allow abstract coordinates, and you can put the (page/plane/depth) at the end if you like, no problem. The question is how we prefer users to think and how we prefer our own code to be arranged. I personally strongly prefer volume[page] to volume[..., page] and I get performance as a bonus.

Incidentally, the performance difference is not hypothetical, it's real, check the mailing list.

Changing our array order convention is not as trivial as you think: you might also need to change your inner/outer for loops, and you'll break with numpy convention, which is our strongest asset.

(i, j, k) is clean, but it is not as useful or self-documenting as (row, col, page) or (page, row, col), when these are used correctly. I'm pretty sure with a bit more effort we can figure out what "correctly" means. =)

A bit of context about my thinking about performance. You'll find old statements from me on PRs and on the mailing list that readability is much more important than performance, etc. However, I think we are now at a critical moment in Scientific Python adoption. I think the only thing that can stop Python from becoming the next Perl will be its ability to be as performant as the up-and-comers such as Go and Julia. I think nonchalance about performance has got Python to where it is: famous for being slow. (My own gala library also suffers from this.) If we are going to stop that, we need to be at least a bit careful with what we do. Using the right array ordering and notation for the task is an easy thing to get right.

@blink1073
Member

The C-order does not help us on RGB images, since we'd always have to access each channel as img[:, :, ch]. Given this, having to use img_stack[:, :, page, ch] doesn't seem so bad...

@jni
Member Author

jni commented Sep 28, 2014

In [1]: a = np.ones((128, 128, 128, 3))
In [4]: def corder(arr):
   ...:     for z in range(128):
   ...:         b = arr[z, :, :, 0] * 3
In [5]: def forder(arr):
   ...:     for z in range(128):
   ...:         b = arr[:, :, z, 0] * 3
In [7]: %timeit corder(a)
100 loops, best of 3: 5.27 ms per loop
In [8]: %timeit forder(a)
10 loops, best of 3: 24.7 ms per loop

@blink1073
Member

Yikes, switching the order internally definitely does not scale well:

In [2]: a = np.ones((128, 128, 128, 3))

In [3]: def corder(arr):
   ...:     for z in range(128):
   ...:         b = arr[z, :, :, 0] * 3
   ...:

In [4]: def forder(arr):
   ...:     arr = np.asfortranarray(arr)
   ...:     for z in range(128):
   ...:         b = arr[:, :, z, 0] * 3
   ...:

In [5]: %timeit corder(a)
100 loops, best of 3: 5.04 ms per loop

In [6]: %timeit forder(a)
1 loops, best of 3: 116 ms per loop

I'm with you @jni, I think page should come first for major practicality reasons.

@jni
Member Author

jni commented Sep 29, 2014

@JDWarner @ahojnnes @stefanv Do the benchmarks above (which surprised even me in their magnitude) change your opinion about how we name/do things?

@stefanv
Member

stefanv commented Sep 30, 2014

@jni Oddly enough, I started feeling sympathy for your argument earlier this week when working with image stacks, and where typing stack[..., N] is pretty darn annoying (in addition to being slow). I regret that we cannot change everything at once, but such is the legacy. However, the Fortran order is one that somehow skipped my mind earlier--we can still survive the legacy (M, N, 3) by switching all internal repr's for color images to Fortran!

Anyway, I'm happy with using the first dimension as the depth of the stack, and as you say it happens to be nice that numpy also prints it that way. It also, ironically, allows a person to distinguish better between color and multi-dimensional data.
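
As a quick check of what switching the internal colour representation to Fortran order would buy for an (M, N, 3) image (plain numpy, illustrative only):

import numpy as np

rgb_c = np.zeros((512, 512, 3))     # current C-order layout
rgb_f = np.asfortranarray(rgb_c)    # Fortran-order internal representation

print(rgb_c[..., 0].flags['C_CONTIGUOUS'])  # False: each band is a strided view
print(rgb_f[..., 0].flags['F_CONTIGUOUS'])  # True: each band is one contiguous block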

@jni
Member Author

jni commented Sep 30, 2014

@stefanv See benchmarks above: switching to Fortran for colour is probably not worth it, especially in 3D+c images. The entire algorithm would have to be rewritten to change outer loops to inner loops. If a multichannel image needs each channel processed separately, the best approach is probably to grab copies of each channel:

if multichannel:
    channel_images = [a[..., i].copy() for i in range(a.shape[-1])]
else:
    channel_images = [a]
out = []
for ch in channel_images:
    out.append(do_function(ch))

Or somesuch.

@blink1073
Member

Sounds reasonable to me, making a local copy of a 2D array whenever we need to work with it:

In [13]: a = np.ones((128, 128, 3))

In [14]: def direct(a):
   ....:     for i in range(10):
   ....:         a[:, :, 0] += 10
   ....:

In [16]: def copy(a):
   ....:     b = a[:, :, 0].copy()
   ....:     for i in range(10):
   ....:         b += 10
   ....:

In [17]: %timeit direct(a)
1000 loops, best of 3: 267 µs per loop

In [18]: %timeit copy(a)
10000 loops, best of 3: 102 µs per loop

@blink1073
Member

Copies are c-contiguous:

In [1]: a = np.ones((100, 100))

In [2]: a = a.T

In [3]: a.flags
Out[3]:
  C_CONTIGUOUS : False
  F_CONTIGUOUS : True
  OWNDATA : False
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False

In [4]: b = a.copy()

In [5]: b.flags
Out[5]:
  C_CONTIGUOUS : True
  F_CONTIGUOUS : False
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False

@blink1073
Member

One more tidbit:

In [6]: %timeit np.ascontiguousarray(a)
100000 loops, best of 3: 16.1 µs per loop

In [7]: %timeit a.copy()
100000 loops, best of 3: 14.7 µs per loop

@stefanv
Member

stefanv commented Sep 30, 2014

I feel a utility function coming on: channels(image) (or something like it) can return an iterator that copies out the bands, then we can avoid repeating that pattern everywhere. Since we will organise the multi-layer data appropriately, we won't need a function there.

@jni
Member Author

jni commented Sep 30, 2014

That's a nice function and very readable, imho! Add a "which" kwarg defaulting to None (all channels), in case we only need to work with one (avoid two copies).
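
One possible sketch of that helper with the "which" keyword folded in (the name channels and its signature are just the ones floated above, not an existing skimage API):

import numpy as np

def channels(image, which=None):
    """Yield contiguous copies of the requested channels of `image`.

    which=None iterates over every channel; an int or a sequence of
    ints restricts the iteration to those channels.
    """
    if which is None:
        which = range(image.shape[-1])
    elif np.isscalar(which):
        which = [which]
    for ch in which:
        yield image[..., ch].copy()  # .copy() gives a C-contiguous band

# usage sketch: process each band without paying for strided access
# results = [some_filter(band) for band in channels(rgb)]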

@stefanv
Member

stefanv commented Oct 2, 2014

@jni Are you taking authorship of this one (for writing the page in the user guide as well as the utility function)?

@jni
Member Author

jni commented Nov 5, 2014

Oops, this fell by the wayside. I'll look into writing this.

@stefanv stefanv added this to the 0.11 milestone Nov 6, 2014
@emmanuelle
Member

@jni one good place to integrate this piece of documentation could be the new section about images as numpy arrays that has just been merged in the user guide (PR #705)

@stefanv
Member

stefanv commented Nov 28, 2014

@emmanuelle @jni was at a workshop yesterday, so he's a bit behind, but I'm sure he'll get to it today :)

@jni
Member Author

jni commented Dec 3, 2014

@stefanv wrong again! =P I've scheduled this for next week.

Do we want to make 0.11 a Christmas present to the SciPy community? ;)

@stefanv
Member

stefanv commented Dec 3, 2014

@jni That is an excellent idea. Shall I organise an online sprint for the 13th of December to round off the release? That gives us enough time to round up outstanding issues.

@blink1073
Member

+1 @stefanv

@jni
Member Author

jni commented Dec 3, 2014

@stefanv I love the idea of a sprint, but how are we dealing with time zones? =) I'm out on the 13th Australia time but could do the 14th morning, which corresponds with afternoon US time on the 13th, so that could work? But then it's like midnight in Europe. =) Where are you these days anyway? LOL

@blink1073
Member

Juan everyone knows the only true time is whatever time it is in Boston.

@stefanv
Member

stefanv commented Dec 3, 2014

@jni Tag teams! We do a whole weekend.

@jni
Member Author

jni commented Dec 9, 2014

Completely random observation: using level, row, column yields the abbreviation lrc, which are all together in an outside-in sequence (pinky to middle finger) on the Dvorak keyboard layout. That's awesome. =D

CC @tonysyu =D

@ahojnnes
Member

What's the latest status here?

@jni
Member Author

jni commented Dec 23, 2014

@ahojnnes Being addressed by #1280, but I still have some work to do there...

@jni
Member Author

jni commented Feb 3, 2015

OMG Finally: closed by #1280! =D
