
Use `plane, row, col` notation in all docstrings #3031

Open · soupault wants to merge 2 commits into master from soupault:plane_row_col

Conversation (9 participants)
@soupault (Member) commented Apr 27, 2018

References

Closes #2276.

For reviewers

(Don't remove the checklist below.)

  • Check that the PR title is short, concise, and will make sense 1 year
    later.
  • Check that new functions are imported in corresponding __init__.py.
  • Check that new features, API changes, and deprecations are mentioned in
    doc/release/release_dev.rst.

@soupault soupault added this to the 0.14 milestone Apr 27, 2018

@soupault soupault requested a review from jni Apr 27, 2018

@@ -621,7 +621,7 @@ def threshold_mean(image):
Parameters
----------
-    image : (N, M[, ..., P]) ndarray
+    image : ([P, ..., ]N, M) ndarray

@jni (Contributor), Apr 30, 2018:

Well, this (P, N, M) is inconsistent with ([P, ] M, N) above. And anyway I think the ordering of the letters should be alphabetical — the P here does not correspond to "planes" any more than N corresponds to "rows"...

-    template : (m, n[, d]) array
-        Template to locate. It must be `(m <= M, n <= N[, d <= D])`.
+    template : ([p, ]m, n) array
+        Template to locate. It must be `(m <= M, n <= N[, p <= P])`.

@jni (Contributor), Apr 30, 2018:

The ordering of the inequalities should match the ordering of the dimensions in the shape.

@@ -46,7 +46,7 @@ def manual_polygon_segmentation(image, alpha=0.4, return_all=False):
Returns
-------
-    labels : array of int, shape ([Q, ]M, N)
+    labels : array of int, shape ([P, ]M, N)

@jni (Contributor), Apr 30, 2018:

If we are trying to unify all of the docstring shapes, then specific letters should have specific meanings. In this case, the leading dimension is not "planes", it is, in a sense, replicas of the same image.

-    arr_in : (K, M, N[, C]) ndarray
-        An array representing an ensemble of `K` images of equal shape.
+    arr_in : (P, M, N[, C]) ndarray
+        An array representing an ensemble of `P` images of equal shape.

@jni (Contributor), Apr 30, 2018:

K here is a specifically nice letter to signify that the 2D images are not meant to be interpreted as a 3D stack.

@jni (Contributor) commented Apr 30, 2018

@soupault I appreciate the effort to make our docs more consistent, but I am not a fan of the convention here. "M, N, P" was chosen to signify arbitrary letters in a sequence, containing N which is the standard for "count of the number of elements". That P corresponds to planes is an unfortunate coincidence. I might suggest Np, Nr, Nc. However, I like the arbitrary letters because they expand to N dimensions (not just 2 and 3). What is the ... in the n-dimensional examples if we attach specific meaning to the letters?

So, I do want to unify all of the docstrings, but I would suggest (M, N), (M, N, P), and (M, N, ..., P) (with optional 3 or Nch where appropriate) everywhere. What do you think?

@soupault (Member, Author) commented Apr 30, 2018

@jni Thank you for the feedback! I knew this PR would cause a discussion, so let's get straight to it :).

> I would suggest (M, N), (M, N, P), and (M, N, ..., P)
>
> I might suggest Np, Nr, Nc

Let's take a look together :) :

| Case | This PR | Alphabetical order | N_something |
|---|---|---|---|
| 2/3D gray | `[P, ]M, N` | `[M, ]N, P` | `[Np, ]Nr, Nc` |
| 2+D gray | `[P, ..., ]M, N` | `[M, ..., ]N, P` | `[Np, ..., ]Nr, Nc` |
| 1+D gray | `[P, ..., ]M[, N]` | `[M, ..., ]N[, P]` | `[Np, ..., ]Nr[, Nc]` |
| 1+D gray, gray/multichannel | `[P, ..., ]M[, N][, C]` | `[M, ..., ]N[, P][, C]` | `[Np, ..., ]Nr[, Nc][, Nch]` |
| 2D multichannel | `(M, N, C)` | `(M, N, C)` | `(Nr, Nc, Nch)` |
| 3D gray/multichannel | `(P, M, N[, C])` | `(M, N, P[, C])` | `(Np, Nr, Nc[, Nch])` |

Which convention looks more consistent to you? As it is, I like "This PR" most (who knew, right :D) and also "N_something" (although it is visually heavy).

In my opinion, when alphabetical order is used, the meaning of the different letters disappears completely. In the proposed approach, M more or less stands for rows, N for columns, and P, ... for planes (whatever that means in a given context: spatial planes, images in a stack, etc.). For example, from [P, ..., ]M[, N][, C] I can infer (using the association rule from above) that the function takes either a column vector, a 2D (rows × columns) array, or an arbitrary nD stack of such 2D arrays. The alphabetical [M, ..., ]N[, P][, C] makes this inference harder for me. Notice that many skimage functions take only 2D arrays as input, and the notation there is (M, N), so I have built these association rules in my head in a natural way. :)

Another point is that in [P, ..., ]M, N, the ... naturally implies P, Q, R, ..., while in [M, ..., ]N the ... is counter-intuitive to unfold.

Regarding your other comments:

  • thresholding, template - yes, I agree;
  • montage, manual_segmentation - I think that P is fine there (images in the stack could be considered as planes), but the old notation also makes sense.

To sum up:
In my opinion there is nothing wrong with using M, N, P, C as soft aliases for the rows, columns, planes, and channels dimensions, but I see the point of your (let's call it) mathematical notation, and will be glad to follow any consistent option.

@jni (Contributor) commented May 1, 2018

@soupault thanks for your awesome table! =)

There's a fourth column, which is a modified version of this PR:

  • P, M, N when there is a 3-spatial-dimensions image, K, M, N when it's a collection of 2D images

I know you don't think this distinction is necessary, but for me it's helpful so I thought I'd bring it to the forefront for the discussion. =)

Looking at the first two columns, I did realise that I don't have a major preference, so if others vote for column 1, I will happily yield.

@soupault (Member, Author) commented May 1, 2018

Alright, so we have a tie here. :)
@scikit-image/core does anyone have a strong opinion on any of the alternatives?

@JDWarner (Contributor) commented May 1, 2018

I agree with @jni, pertaining to the distinction between a rank-3 array intended to represent a 3D volume and a rank-3 array which represents channels or a collection of images with two spatial dimensions.

The ship has sailed, and I realize why we opted to place the z-axis in front of the other spatial axes for performance, but it remains unfortunate from an accessibility standpoint. Users understand row, column, (plane), but when you lead with plane, it is significantly less clear.

As far as overall approach is concerned, consider we have multiple years of conference tutorials and other instruction materials which have been given surrounding the row/col convention. Row/Col is massively beneficial for any users without a MATLAB background. I do think we should retain r/c for consistency; thus, I actually favor the overall 'N_something' approach. But, instead of capitalizing (thus emphasizing) N, what if we reversed it - the first line in the table above then becomes [nP, ]nR, nC.

@kne42 (Contributor) commented May 2, 2018

ZYX/PRC notation is also unfortunate from a geometric standpoint, especially pertaining to n-dimensional quasipolar coordinates (see section 2.2 for definition).

In this definition, x1, x2, and x3 pertain to the y, x, and z dimensions, respectively. While alphabetically out of order, this makes sense if we look at how matrices are typically represented in terms of the xy-coordinate system: row corresponds to y and column corresponds to x. That is, a matrix M has a value v = M[y, x] at cartesian coordinates (x, y). In other words, y is the first dimension, and x is the second dimension in matrices.

By extension, x_n corresponds to the nth dimension, making z the third dimension. It logically follows that in an xyz-coordinate system, a matrix M has a value v = M[y, x, z] at Cartesian coordinates (x, y, z), and that additional entries should follow the form Nr, Nc, [..., Np].

Placing planes before rows and columns disrupts this correspondence between the ordering of dimensions and their quasipolar coordinate index. It is also just generally confusing and different from what most users are used to.
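The row-as-y, column-as-x correspondence described above can be checked directly in NumPy (an illustrative sketch; the array values and names are made up for this example):

```python
import numpy as np

# Illustrative sketch: each entry of M encodes its own (row, col) position,
# showing that the first (row) index plays the role of y and the second
# (column) index the role of x.
rows, cols = 3, 4
M = np.array([[10 * r + c for c in range(cols)] for r in range(rows)])

x, y = 2, 1                     # Cartesian point (x=2, y=1)
assert M[y, x] == 10 * y + x    # the value at (x, y) lives at M[y, x]
```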

I don't know how relevant this is, but I just wanted to give my 2¢.

@JDWarner (Contributor) commented May 2, 2018

@kne42 you're right. That's what I was referencing. However, it turns out there are real performance gains when iterating over the last axis due to the way arrays are allocated in memory. There was a lengthy discussion on this matter in a past issue and performance won over accessibility; we've since moved to this convention. I dissented at the time, but bowed to the majority.

@grlee77 (Contributor) commented May 2, 2018

> There's a fourth column, which is a modified version of this PR:
>
> - P, M, N when there is a 3-spatial-dimensions image, K, M, N when it's a collection of 2D images
>
> I know you don't think this distinction is necessary, but for me it's helpful so I thought I'd bring it to the forefront for the discussion. =)

I also like having a distinction for non-spatial axes (for image collections, medical image timeseries, etc.), but I don't have any strong preference among the different columns in the table. The most important thing is that it is consistent across the library! Thanks @soupault for working on making that the case.

@zpincus (Contributor) commented May 2, 2018

FWIW, the microscopy community has generally standardized on (x, y, z, c, t) for their axes: three spatial dimensions, then color/spectral channels, then time (or other index).

One of the nice things about the numpy striding system is that one can generally choose whatever makes most sense for the memory layout pretty much independently of whatever makes most sense for the indexing convention (i.e. which axes are listed in which order in an index expression). A properly constructed set of strides lets basically any index order fit with any memory order.

I thought it was a shame that skimage chose the matlab/C-array style of indexing convention originally rather than taking advantage of the richness of numpy indexing, and I still do think it's a shame, but Josh is right.

Anyhow, in my lab's microscopy code, we use clever striding so we can manipulate numpy arrays with the index order everyone is used to. Whenever we have to hand the arrays over to something more opinionated like skimage (scipy.ndimage doesn't usually make such strong assumptions), we just make sure to slice and transpose or whatnot as needed. Since we generally use the same "correct for performance" memory layout underneath as skimage does (and basically everyone else, with t the slowest-moving index, then z, y, x, and finally c -- though there is some disagreement about where c "ought" to go), these operations don't require actually shuffling any data around and so are basically zero-cost.
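A minimal sketch of the zero-cost reordering described here (the axis labels in the comments are illustrative, not an skimage convention):

```python
import numpy as np

# Transposing a NumPy array only permutes its shape and strides; the pixel
# data is never copied, so re-labelling axes is essentially free.
tzyx = np.zeros((5, 4, 3, 2), dtype=np.uint8)   # say, (t, z, y, x) layout
xyzt = tzyx.transpose(3, 2, 1, 0)               # same memory, (x, y, z, t) indexing

assert xyzt.base is tzyx                        # a view, not a copy
assert xyzt.strides == tzyx.strides[::-1]       # only the strides changed
xyzt[1, 2, 3, 4] = 7
assert tzyx[4, 3, 2, 1] == 7                    # same underlying element
```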

@jni (Contributor) commented May 3, 2018

Hi @zpincus!

> the microscopy community has generally standardized on (x, y, z, c, t) for their axes

That seems to me to be overstating the community's ability for standardisation. =P Just now I am working on a LIF (Leica Image Format) file with order (t, z, c, y, x). 🤦‍♂️ (Though when you think about the acquisition, this order makes sense.)

> one can generally choose whatever makes most sense for the memory layout pretty much independently of whatever makes most sense for the indexing convention
> [...]
> we use clever striding so we can manipulate numpy arrays with the index order everyone is used to.
> [...]
> these operations don't require actually shuffling any data around and so are basically zero-cost.

Can you elaborate on this? Honestly, I would very much like it if we could do "the right thing" for any input array order. However, for many algorithms, doing the memory-optimal, axis-order-independent thing is harder than assuming C- or F-order. If I understand correctly, you'd have to pepper np.argsort(arr.strides) calls everywhere?

At any rate, C-order is the NumPy default. If F-order had been the default, I would have advocated for xyzt. As mentioned in our array order section, there are significant performance penalties from using the memory-unoptimised ordering — usually about 5x, sometimes higher. (~9x on the webpage example.) This isn't peanuts.

I also object to calling scikit-image "opinionated" about this: as far as I know (do raise issues if not!) our code works perfectly fine whether you use prc or xyz convention in your arrays. Just pass in the right voxel spacing to the relevant functions and you'll be golden. You might just wait 10 times longer, depending on the algorithm.

Finally, regarding the tradeoff between performance and readability: I actually think prc/zyx is more readable, at least in a SciPy context. We've been conditioned for decades to think that volume[:, :, z] is more readable than volume[z], but this is plainly absurd. Further, row-col coordinates allow direct indexing into NumPy arrays. When we were a mix of xy and rc, we had to decide whether a function would take NumPy coordinates or "real" coordinates. Now they are equivalent. That's a massive benefit. (And note that this is also true of SciPy ndimage: the coordinate order in e.g. map_coordinates corresponds to the axis order in the array.)
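For instance (a small sketch, not part of the PR diff), the coordinate order in scipy.ndimage.map_coordinates matches the array's axis order, so row/col coordinates index and interpolate interchangeably:

```python
import numpy as np
from scipy import ndimage

image = np.arange(12, dtype=float).reshape(3, 4)

# Coordinates are given axis-first: axis-0 (row) values, then axis-1 (col).
coords = np.array([[2.0], [1.0]])
sampled = ndimage.map_coordinates(image, coords, order=1)

# At integer coordinates, interpolation reproduces direct indexing.
assert sampled[0] == image[2, 1]
```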

Anyway, that's enough rehashing of old arguments from me. I'm just hoping to convince the dissenters a little bit that they maybe might enjoy this convention, rather than grudgingly accept it. =P

@kne42 note that Cartesian coordinates typically have the origin on the bottom left, not top left like matrices, so the analogy between xy (Cartesian) and yx (Matrix) is not perfect. This is part of the motivation for my push to the row/column terminology (which by the way is extremely common in matrix literature anyway, more so than y/x).

@kne42 (Contributor) commented May 4, 2018

@jni I'm actually totally fine with leading indexing with planes. My main point of discontent is with the conflation of row, col with y, x and plane with z. While I get that we have moved away from this convention ourselves, that may not be so apparent to a lot of the people who use this library.

Thus, when they see PRC, they think ZYX, which is dangerous from a geometric standpoint, as outlined in my earlier comment. PRC should convert to ZXY for geometric purposes. And while, again, we have moved away from that axis convention, we should remind our users that if they were to convert a 3D input to an array, it should be of shape (Z, X, Y), and for 2D arrays, (X, Y). This would mean that the only change needed for geometrical applications that relied on (Y, X, Z) is to reverse the order of the indices used.

Of course, the whole confusion in the first place is due to the inconsistent naming conventions that resulted in the 0th axis being labeled as y, the 1st as x, and the 2nd as z. Once all of this is clarified, it can then be explained that [P, ..., ]M, N corresponds to axes [n, ..., ]1, 0.

@zpincus (Contributor) commented May 4, 2018

@jni Hey there! I knew I shouldn't have opened my mouth without being better prepared :)

Anyhow, my main point is that there are (at least) three separate concepts that all too often get conflated when discussing array indexing, particularly in the context of images:

  1. Layout of multidimensional data on disk / in file formats.
  2. Layout of multidimensional data in memory, especially as it relates to cache-optimal image-iteration strategies.
  3. Logical / display ordering of the indices with regard to each dimension.

So, when you talk about the Leica Image Format, that's really a case of concept 1, while discussion about array ordering (as in the skimage docs) is a case of concept 2. But why do those docs talk about what order the axes are labeled in (concept 3) as if it has anything to do with performance?

As long as you iterate over an image in order of fastest to slowest moving index (in memory), you get good performance regardless of how the numpy strides tuple is configured. A lot of people (not you, @jni, but many who should know better) don't seem to get this. But arbitrarily strided arrays are one of the particular bits that make numpy really great to use!! Moreover, numpy's C API has various nd iterator tools, so that one can iterate over an arbitrary numpy array in a memory-efficient fashion.

And in particular, numpy striding lets users break free from the tyrannical connection between concepts 2 and 3. Using striding, you can let an array be laid out in a memory-optimal fashion while indexing it in whatever way is most appropriate to a given application. If an algorithm is smart about iteration (i.e. either using numpy's C iterators, or doing a bit of inspection on the stride array), then you can just pass your array to the algorithm and all is well. If the algorithm makes assumptions about memory layout matching the index ordering, then it's almost as easy: a quick transpose operation (or otherwise rearranging the axis order without actually touching the memory), and then you can pass off your numpy array to the algorithm.

So I think it's reasonable to state that many microscopy folks would generally like to logically index their data as (x, y, z, c, t) (which is more or less how ImageJ and the OME Data Model do it), but would prefer memory layouts to look like (slow to fast moving) TZYXC or TZCYX, depending on how the color data is acquired and used.

And if there are use-cases where it makes more sense to say arr[z] rather than arr[:,:,z], then let them use (z,x,y,c,t) or whatever. Let a thousand flowers bloom! Numpy striding allows it!

> Further, row-col coordinates allow direct indexing into NumPy arrays. When we were a mix of xy and rc, we had to decide whether a function would take NumPy coordinates or "real" coordinates. Now they are equivalent. That's a massive benefit. (And note that this is also true of SciPy ndimage: the coordinate order in e.g. map_coordinates corresponds to the axis order in the array.)

Here we are in 100% agreement. Array index ordering should be treated as authoritative. There should never be a distinction between "numpy coordinates" and "real coordinates". If you index array[10, 30], then the corresponding geometric point ought to be (10, 30).

But I think it's silly to insist that if I have an image that's stored in the standard scanline order in memory (i.e. row by row) that it's only "appropriate" to index that chunk of memory as [r, c] with a C striding convention, rather than [x, y] with a Fortran striding convention.

And what if I have an RGB image in scanline order (e.g. rows of pixels as (r,g,b) triplets)? In many (but not all!) cases, I'd personally prefer to index it as [x, y, c], which is neither Fortran, nor C, but again totally reasonable for a numpy array.

So, I do wish that skimage had made more of an effort up front to be stride-agnostic and coordinate-convention-agnostic. I never run into these problems when using scipy.ndimage, which also doesn't need pages of documentation about what various image axes "mean". (This is what I mean when I say that skimage is more opinionated.)

However, at the end of the day, it hardly matters whether I call convert_to_c_indexing(my_image) before passing an array to skimage, or whether skimage does that internally (*). It would have been nice to push the burden of caring about memory layout onto the authors of the algorithms instead of their users, but that's not a huge thing. (And maybe that will still be possible in a future version of skimage.)

(*) Where convert_to_c_indexing() is some notional function that basically does np.argsort(arr.strides) and constructs a new view on the same memory that will allow an algorithm to blindly assume that the last index is fastest-moving.
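Under those assumptions, the notional helper might look something like this (a sketch, not an existing skimage or numpy function; edge cases like zero-stride axes are ignored):

```python
import numpy as np

def convert_to_c_indexing(arr):
    """Return a view of `arr` with axes sorted slowest- to fastest-moving
    in memory, so the last index is always the fastest-moving. No copy."""
    order = np.argsort(arr.strides)[::-1]   # largest stride first
    return arr.transpose(order)

# A Fortran-ordered image becomes, as a zero-copy view, C-contiguous:
f = np.asfortranarray(np.zeros((480, 640), dtype=np.uint8))
v = convert_to_c_indexing(f)
assert v.flags['C_CONTIGUOUS']
assert v.shape == (640, 480)
```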

@stefanv (Member) commented May 8, 2018

@zpincus I'm trying to fully comprehend your argument, so pardon any misunderstanding on my part. There are two typical situations: Python code that operates on an image and expects it to be indexable in a certain way ([p, r, c]) but makes few or no assumptions about memory layout, and Cython code that asks for contiguous data with a specified memory layout. Now, it may be true that many of our algorithms are currently implemented that way (making contiguous copies) for the sake of simplicity, but I don't suppose we are precluded from handling arbitrarily strided arrays in Cython. So, if the problem you describe is unnecessary internal copies, it feels like something we should be able to address?

I share your wish that we were more prescient ten years ago, but such is the challenge of living blinded by the present :)

@zpincus (Contributor) commented May 9, 2018

@stefanv First off, I realize that this wasn't really the right forum for me to bring this up, so my apologies for derailing relevant discussion about this PR!

Nevertheless, I'll continue :)

My overall point is that I think the rc vs. xy argument is actually a canard: if you think about it, there are surprisingly few cases where well-written code should need to make any assumptions about the particular indexing convention a user is employing.

> There are two typical situations: Python code that operate on an image, that expect the image to be indexable in a certain way ([p, r, c]) but makes few or no assumptions on memory layout, and Cython code, that asks for contiguous data with a specified memory layout.

Well put. I'd just note that these two situations often get conflated for some reason, with one being used to justify the other. But in numpy, these issues are much more separable than in C or Fortran or Matlab, where indexing is more closely tied to memory layout.

Regarding the first situation: I propose that it's a "code smell" for code that doesn't care about memory layout to have any expectations about how an array should be indexed. The main exceptions are (a) dealing with color/multichannel images, where the color dimension is semantically distinct in a way that a z or t axis is not; and (b) visualization tools.

But are there other justifiable cases where code "needs" to know what my axis labels are? I agree with @jni's point that functions should just directly index into numpy arrays. However, I disagree that this requires that all functions use a rc rather than xy convention: it merely requires that functions refrain wherever possible from asserting any convention. For example, scipy.ndimage functions work equally well whether I use xy indexing on a fortran-contiguous array or rc indexing on a C-contiguous one.

One might think that code that needs to receive / provide geometric coordinates (such as image drawing) must commit to using xy or rc coordinates, but even this isn't true. As long as the first geometric coordinate corresponds to the position along the first array axis (and so forth), then everything works out. Both ndimage.map_coordinates and skimage.measure.find_contours are decent examples.

Regarding the second situation (where memory layout matters): I posit that it's rarely much more work for a function to iterate efficiently over an arbitrary strided array than to just iterate over a C-contiguous one. A few simple helper functions can make this pretty trivial, by e.g. making a view on the same memory in which the last index is guaranteed to be the fastest-moving in memory.

In particular, I think it's silly to insist on contiguity. Say I have a contiguous array, but then want to apply an algorithm to a slice of it, say arr[10:999, 30:750]. That sub-array isn't contiguous, but it can still be iterated over extremely efficiently with appropriate striding. (Indeed, this would be rather more efficient than making a contiguous copy first...) Similarly, with an RGB image, algorithms that understand striding can perform efficient, zero-copy operations on individual color planes.
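The slice example can be sketched directly (array shape chosen to accommodate the indices from the comment):

```python
import numpy as np

arr = np.zeros((1000, 800), dtype=np.uint8)
sub = arr[10:999, 30:750]              # a view into the middle of the array

assert not sub.flags['C_CONTIGUOUS']   # rows are no longer adjacent in memory
assert sub.base is arr                 # yet no pixel data was copied
sub[0, 0] = 1
assert arr[10, 30] == 1                # writes go straight through the view
```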

Given that cython is pretty smart about strided indexing, all that's required is a helper function to provide a view on any given array in which the axes are in slow-to-fast order. Then one merely needs to write cython where the iteration is nested in the same order (i.e. the current best practice).

@jni (Contributor) commented May 9, 2018

@zpincus I have not helped things by going off on my layout efficiency tangent, but I think you have a misunderstanding of what xy/rc mean in the context of skimage. The point is not actually that we want to be prescriptive about axis order. (Actually if you look above I voted for the nomenclature with the least referential baggage, "M, N, P". Perhaps you want to cast your vote for that also? 😉)

The point is that we were not (and are not) consistent in our axis convention. The transform module, for example, uses what we are calling here "xy" coordinates, that is, the first axis is horizontal and the second is vertical, with the origin at the top-left, which doesn't match NumPy indexing any way you slice it. Before I started on this quest, this type of inconsistent coordinate use was scattered all over the library. Now I think only a few spots remain.

ndimage is wonderful, but certainly if you give it a Fortran-ordered array but use horizontal first axis coordinates, you will not get the results you think. Even when indexing a F-ordered array, in NumPy or ndimage, the first coordinate is the rows, and the second coordinate is the columns, and the only thing that changes is the optimal order of iteration.

The xy/rc distinction is necessary, therefore, so that at least people who are using the library one way or another know which functions require them to swap axes around. Perhaps in the far future we can redo all our docs so that we stop referring to "rows and columns" and instead we go for "axis 0 and axis 1". But is that really much better?

@soupault soupault force-pushed the soupault:plane_row_col branch from a40f81d to b94284b May 19, 2018

@codecov-io commented May 19, 2018

Codecov Report

Merging #3031 into master will decrease coverage by <.01%.
The diff coverage is 100%.


@@            Coverage Diff            @@
##           master   #3031      +/-   ##
=========================================
- Coverage    85.9%   85.9%   -0.01%     
=========================================
  Files         336     336              
  Lines       27301   27300       -1     
=========================================
- Hits        23454   23453       -1     
  Misses       3847    3847
Impacted Files Coverage Δ
skimage/segmentation/morphsnakes.py 97.54% <ø> (ø) ⬆️
skimage/filters/_frangi.py 97.05% <ø> (ø) ⬆️
skimage/measure/_marching_cubes_lewiner.py 87.5% <ø> (ø) ⬆️
skimage/feature/texture.py 77.57% <ø> (ø) ⬆️
skimage/feature/util.py 96.87% <ø> (ø) ⬆️
skimage/measure/_regionprops.py 97.8% <ø> (-0.01%) ⬇️
skimage/future/graph/rag.py 96.07% <ø> (ø) ⬆️
skimage/feature/template.py 100% <ø> (ø) ⬆️
skimage/measure/entropy.py 100% <ø> (ø) ⬆️
skimage/segmentation/active_contour_model.py 98.97% <ø> (ø) ⬆️
... and 7 more


Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 10b0543...b94284b.

@zpincus (Contributor) commented May 22, 2018

@jni I don't think there's really that much disagreement here. I also would prefer "M,N,P", or "axis 0...n" for minimal baggage, but I agree that failing that, the key thing is consistency. So the set of changes in this PR will be good on that front.

My main point in starting this conversation was to remind folks that numpy makes it pretty easy to index any particular chunk of memory in whatever way makes most sense for a given application. One can then change the indexing convention (without having to touch the actual data in memory) when handing off an array to other code that uses a different convention.

That said, I'm not sure I follow you here:

> ndimage is wonderful, but certainly if you give it a Fortran-ordered array but use horizontal first axis coordinates, you will not get the results you think. Even when indexing a F-ordered array, in NumPy or ndimage, the first coordinate is the rows, and the second coordinate is the columns, and the only thing that changes is the optimal order of iteration.

Could you give an example of what you mean by this? I do this sort of thing every day and don't seem to have the problems you suggest I ought to.

Let's say that I have a single-channel image stored in the standard scanline order in memory (i.e. the pixels are laid out as contiguous rows in memory). Let's further say that the image is 640 pixels wide and 480 tall, and the dtype is uint8. Just so we're all on the same page, I will refer to the image coordinates in the standard convention used by Photoshop, ImageJ, or any other standard image viewer: (0, 0) is top-left, (640, 0) is top-right, and (0, 480) is bottom-left. That is, using an (x, y) coordinate convention.

Using numpy, I can construct a Fortran-ordered view on this memory of shape (640, 480) with strides (1, 640). Indexing the image as image[x, y] will give me the exact same pixel value that I would get by hovering the cursor over position (x, y) in Photoshop or ImageJ.

Further, indexing this as image[30, :] gives me the 30th column of the image -- that is, if you opened the image in Photoshop, this would be the vertical strip of pixels which have x-coordinate = 30. Indexing as image[:, 30] gives the horizontal strip of pixels with y-coordinate = 30; i.e. the 30th row.

Using, say, ndimage.map_coordinates to transform the image also provides no surprises, as long as I make sure the x coordinates are first and the y coordinates are second. (I mean, where else would they be seeing as how I can index that array as [x, y]?)

So far so good, right? It certainly doesn't seem like "the first coordinate is the rows" here...

Moreover, all remains well in the world when I construct a C-ordered view on that same memory. This view would have shape (480, 640) and strides (640, 1). Then image[30, :] is the 30th row, and image[:, 30] is the 30th column. OK, it's a bit silly that image[y, x] gives me the pixel that I see listed in Photoshop as (x, y), but that's really not a big deal, it's just notation.

Moreover, if I use ndimage.map_coordinates to transform this image, everything still works, as long as I make sure that the y coordinates are first (and where else would they be, seeing as how I am indexing that array as [y, x]?)

What if I make a new C-ordered array, of shape (640, 480) and insist on indexing it as [x, y]? Incredibly, everything still works. Even ndimage.map_coordinates does exactly what one would expect. This image isn't stored in scanline order in memory, but that really doesn't matter until you try to save out the image or otherwise pass it to a library that only understands scanline-ordered memory. (Then you actually have to do a memory-reordering operation.)

The only thing that's a little odd is that when you print out a Fortran-ordered view of a scanline-ordered image in the terminal, you get the transpose of what you'd see in Photoshop. But for anything beyond toy 5-by-5 images, this is really not a big deal, since you never actually do it.

The key point is that the issue of "Fortran" or "C" ordering (or more complex striding schemes) is about the indexing convention, which in numpy is more or less separate from the memory layout (i.e. is the image in some kind of scanline order or not), which itself is reasonably separate from how the image ought to be viewed in Photoshop or matplotlib or whatever. (Check out the TIFF spec sometime, if you dare, to see how distinct all of these aspects of an image are.)

Now, obviously memory layout drives the cache-efficient traversal order, but how or why that should have anything to do with the indexing convention or visualization is beyond me.

Literally the only places where numpy "cares" about the ordering is in array printing, and the fact that C order is default when creating new arrays. I do admit that the latter means that incautious users who want to use [x, y] indexing but forget to include order='F' when creating new arrays will generate non-scanline-ordered images in memory, which can require slow shuffling before handing that memory over to external libraries. But this is a pretty minor concern, since the main libraries we're talking about here are for image IO -- and if you're hitting the disk, the few ms required to reshuffle an image in memory is pretty minor in comparison.

So, given all of this, I really don't think I fully agree with your conclusion that

The xy/rc distinction is necessary, therefore, so that at least people who are using the library one way or another know which functions require them to swap axes around.

I'm pretty sure that the only functions for which this distinction really is necessary are visualization and image IO. Can you all think of other classes of functions that need to know whether axis 0 represents vertical or horizontal (or temporal, &c.) strips of pixels? Certainly rasterization / image-drawing functions don't need to know any of this. They just need to know what axes to draw along, not whether those axes are "rows" or "columns". (ndimage is awesome because it makes no commitments about what axes "mean". Functions that need to operate on only one (or a subset of) axes just take that as a parameter, rather than making assumptions.)

As such, I do honestly believe that "axis 0...n" is the best notation except for visualization and IO. And in those cases, I think the best thing to do is for the functions to have a parameter that controls how axes are interpreted, rather than making a particular assumption.

@soupault soupault referenced this pull request May 22, 2018

Closed

Implementation of unsharp masking filter. #2772


@soupault soupault removed this from the 0.14 milestone May 29, 2018

@soupault soupault added this to the 0.14.1 milestone May 29, 2018

@jni

Contributor

jni commented Jun 21, 2018

@zpincus

Wow.

Thank you for that amazing response.

You are, of course, right about your assessment that this whole confusion, my whole confusion, has to do with display of the arrays. That is, both Matplotlib and NumPy printing work in row/column order, and that had coloured my whole view about our coordinate systems. As you say, there are a few different issues mixed up here and they were all jumbled in my head until your post. (They are probably still a bit jumbled, but much less so. ;)

So, let me clarify, in light of this new understanding, why some parts of skimage are still incorrect: in parts of the transform module, a coordinate [x, y] refers not to coordinate x on axis 0 and y on axis 1 of a NumPy array (which may or may not be displayed incorrectly by matplotlib). Rather, it refers to array[y, x]. See, for example, the dst coordinates in this example.

Do you agree that this is bad?

That is the primary problem that I tried to address in moving to r, c notation. I hope you agree that this example is problematic, even in the context of x, y coordinates. What your example shows is that it is not the notation that is wrong, but rather the axis/coordinate ordering.

Anyway, I think I'm going to put your vote down for "M, N, P". =P In my opinion, "N0, N1, N2" is too unwieldy for a docstring.

(Regarding scanline efficiency, that remains a problem with x/y, but you are right that it is a much more minor point.)

@zpincus

Contributor

zpincus commented Jun 21, 2018

@jni No worries -- it took me years of strenuously thinking about this exact question to come to this viewpoint. It's far from obvious or intuitive... (The good news is that once one fully absorbs how numpy striding works for images, you can do amazing/absurd things. Like make an (x, y, 3, 3)-shaped view on an (x, y)-shaped image, where the sub-array [i, j, :, :] of the view is the 3x3 neighborhood centered on pixel [i, j] in the original image...)
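The neighborhood view described above can be built without touching as_strided directly; a sketch using sliding_window_view (NumPy >= 1.20), which yields one window per interior pixel:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

img = np.arange(6 * 8).reshape(6, 8)

# windows[i, j] is the 3x3 neighborhood whose top-left corner is
# img[i, j], i.e. centered on img[i + 1, j + 1]. No memory is copied.
windows = sliding_window_view(img, (3, 3))
assert windows.shape == (4, 6, 3, 3)
assert np.array_equal(windows[2, 3], img[2:5, 3:6])
```

The window axes trail the image axes, so a (6, 8) image gives a (4, 6, 3, 3) view — the "(x, y, 3, 3)-shaped view" for interior pixels.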

Anyhow, yes, my vote would be for (M, N, P...) indexing. I agree that "n0, n1" etc. is too unwieldy.

Further, I totally agree about your example being a clear instance of something that ought to be fixed: the library assumes that coordinates are specified in (x, y) tuples, but that the array will be indexed [y, x]. In contrast, this is where ndimage.map_coordinates shines. So fixing the transforms and so forth to work more like ndimage would be a big win for sure.

My only objection to r, c notation is that it still contains the implicit assumption that the first axis is "supposed" to be rows, and the second axis is "supposed" to be columns. But once things are consistent, changing from r, c to m, n is (mostly) just a find-and-replace. I don't think there's too much code, other than image IO and display, that would have to be changed to go from r, c (with the implicit assumption about what the axes mean), to m, n (where there is no assumption and you need to ask the user if a bit of code needs to care about what the axes mean).

@stefanv

Member

stefanv commented Jun 21, 2018

For pedagogical purposes, it helps to have coordinates that make some intuitive sense. When we talk about rows, columns, depth, planes, etc. new users get it. If we tell them to think in terms of N-dimensional arrays and coordinates—well, that's all a bit overwhelming.

Is this benefit enough to justify talking about rows and columns? I don't know. The gap could perhaps be addressed effectively by a section in the user guide.

I suspect that we are currently making indexing/layout assumptions inside our Cython modules as well, so that's something to flag for future attention.

Anyway, these points are somewhat orthogonal to this PR—the convention introduced here seems perfectly fine.

@zpincus

Contributor

zpincus commented Jun 21, 2018

I agree with Stefan; intuitive coordinates may be pedagogically helpful. So long as it's clearly stated for advanced users and more importantly devs to not rely on axes "meaning" anything in particular.

I also agree that this is more or less orthogonal to this PR, as well.

@kmario23


kmario23 commented Jul 15, 2018

Just to add one more view: In deep learning, the conventions are:

NCHW - Chainer, CuDNN, PyTorch, TensorFlow, Torch
NHWC - TensorFlow
CHWN - CuDNN, Neon


  • N refers to the number of images in a batch.
  • H refers to the number of pixels in the vertical (height) dimension.
  • W refers to the number of pixels in the horizontal (width) dimension.
  • C refers to the channels. For example, 1 for black and white or grayscale and 3 for RGB.
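For reference, converting between these layouts in NumPy is just a transpose (a strided view; a copy is needed only when a framework requires contiguous memory). The shapes here are illustrative:

```python
import numpy as np

batch_nhwc = np.zeros((32, 224, 224, 3))        # N, H, W, C

# NHWC -> NCHW: move the channel axis next to the batch axis.
batch_nchw = batch_nhwc.transpose(0, 3, 1, 2)   # N, C, H, W
assert batch_nchw.shape == (32, 3, 224, 224)

# NCHW -> NHWC: the inverse permutation.
assert batch_nchw.transpose(0, 2, 3, 1).shape == (32, 224, 224, 3)
```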

@soupault soupault referenced this pull request Aug 19, 2018

Merged

FEAT: Flood fill in n-dimensions #3245


@soupault soupault modified the milestones: 0.14.1, 0.15 Aug 20, 2018

@jni

Contributor

jni commented Aug 23, 2018

@soupault by my reading of the above discussion, (M, N, P) is the least contentious. Though I am of course a biased reader... ;) (Incidentally, the "symbolic dimension" section of the XND ndtypes docs fit this scheme: https://ndtypes.readthedocs.io/en/latest/ndtypes/types.html#symbolic-dimension)

@jni jni referenced this pull request Dec 17, 2018

Merged

Viewer window refactor #88
