Fixed ragged map with zero dim and Larger Iterating signal #2877

CSSFrancis · 2022-02-04T23:35:45Z

Description of the change

This is kind of a small change to fix some bugs that I have come across with the map function

Ragged signals always have a navigation dimension. I think this makes sense as a numpy object array has to have some size rather than be some scalar.
Mapping now works when the iterating signal dimension is larger than the base signal

Progress of the PR

[ x] Change implemented (can be split into several points),
add an changelog entry in the upcoming_changes folder (see upcoming_changes/README.rst),
[ x] add tests,
[x ] ready for review.

…ger than your signal

codecov · 2022-02-04T23:48:45Z

Codecov Report

Merging #2877 (38246a1) into RELEASE_next_minor (db5743b) will increase coverage by 0.11%.
The diff coverage is 100.00%.

@@                  Coverage Diff                   @@
##           RELEASE_next_minor    #2877      +/-   ##
======================================================
+ Coverage               77.13%   77.24%   +0.11%     
======================================================
  Files                     206      206              
  Lines                   31868    31883      +15     
  Branches                 7161     7165       +4     
======================================================
+ Hits                    24582    24629      +47     
+ Misses                   5533     5500      -33     
- Partials                 1753     1754       +1

Impacted Files	Coverage Δ
hyperspy/signal.py	`74.02% <100.00%> (+0.22%)`	⬆️
hyperspy/learn/ornmf.py	`86.77% <0.00%> (-1.59%)`	⬇️
hyperspy/misc/utils.py	`84.20% <0.00%> (-0.54%)`	⬇️
hyperspy/_signals/lazy.py	`89.95% <0.00%> (-0.32%)`	⬇️
hyperspy/samfire.py	`64.46% <0.00%> (-0.32%)`	⬇️
hyperspy/drawing/image.py	`73.83% <0.00%> (-0.24%)`	⬇️
hyperspy/axes.py	`90.52% <0.00%> (-0.19%)`	⬇️
hyperspy/model.py	`80.59% <0.00%> (+0.23%)`	⬆️
hyperspy/samfire_utils/samfire_worker.py	`84.57% <0.00%> (+19.15%)`	⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update db5743b...38246a1. Read the comment docs.

ericpre · 2022-02-05T13:07:03Z

Ragged signals always have a navigation dimension. I think this makes sense as a numpy object array has to have some size rather than be some scalar.

This is an important and not a trivial change, can you elaborate on this?

Mapping now works when the iterating signal dimension is larger than the base signal

Can you give an example to illustrate what is the issue? This will help with considering other alternatives.

There are recent discussions on this topic:

and one important aspect is that we need to make sure that ragged and non-ragged signals behave consistently with map, slicing, etc. with respect to signal and navigation dimensions.

CSSFrancis · 2022-02-05T18:00:49Z

Sorry I should have given some examples, I submitted this a little quickly last night.

Ragged signals always have a navigation dimension. I think this makes sense as a numpy object array has to have some size rather than be some scalar.

This is an important and not a trivial change, can you elaborate on this?

The issue is that ragged signals built using the constructor can possible have a signal_shape = () and navigation_shape = (). This results in an error when you use the map function on a ragged array with one dimension.

For example this block of code would fail(In this case the signal have a nav dimension of ( ) )

import hyperspy.api as hs
import numpy as np
x = np.empty((1,), object)
x[0] = np.ones((2,3))
s = hs.signals.BaseSignal(x, ragged=True)
s.map(np.sum)

But this code block will not (In this case the signal have a nav dimension of 1)

import hyperspy.api as hs
import numpy as np
x = np.empty((1,), object)
x[0] = np.ones((2,3))
s = hs.signals.BaseSignal(x)
s.ragged=True
s.map(np.sum)

The reason this fails is that the rechunking function try to rechunk a fundamentally 1 dimensional array with hyperspy trying to tell that it is a zero dimensional array.

I should also note that there are some inconsistencies with numpy which had some small effects when trying to guess the output.

import hyperspy.api as hs
import numpy as np
x = np.empty((1,), object)
x[0] = np.ones((2,3))
s = hs.signals.BaseSignal(x)
s.ragged=True
s.inav[0]
# returns a ragged array
s.data[0]
# returns the object at index 0

Mapping now works when the iterating signal dimension is larger than the base signal

Can you give an example to illustrate what is the issue? This will help with considering other alternatives.

Yea so the issue is this chunk of code doesn't run:

def add_sum(image, add):
    out = np.sum(image) + np.sum(add)
    return out
import hyperspy.api as hs
import numpy as np
x = np.ones((10,20,2,3))
s = hs.signals.Signal2D(x)
s_add = hs.signals.BaseSignal(2 * np.ones((10, 20, 2, 2, 2))).transpose(3)


s_out = s.map(add_sum, inplace=False,  add=s_add)# This doesn't work
s_out = s_add.map(add_sum, inplace=False,  add=s)# This does work

The issue here is that the map_blocks function needs to have the blocks all the same length otherwise it takes the longest arg passed (not the signal passed). There is a way around this using the blockwise function but that requires a little bit more setup and would require tearing apart much of the map function. I tried that for a little bit but decided against it because it broke some other parts of the map function.

There are recent discussions on this topic:

_assign_subclass: signal_shape (1,) now becomes BaseSignal #2773

Fix ragged array #2842

and one important aspect is that we need to make sure that ragged and non-ragged signals behave consistently with map, slicing, etc. with respect to signal and navigation dimensions.

Ahh, I must have missed the recent stuff. I think that this is more consistent than before. At the very least the errors thrown in the above cases are very uninformative and potentially confusing. There is a high likelyhood that few people have come across them, but I recently was trying to map a couple of ragged signals and came across both go these edge cases.

ericpre · 2022-02-06T18:38:46Z

Thanks for the details, now I see where does it come from. I think, we need to define what is the expected and desired behaviour.

For example this block of code would fail(In this case the signal have a nav dimension of ( ) )
import hyperspy.api as hs
import numpy as np
x = np.empty((1,), object)
x[0] = np.ones((2,3))
s = hs.signals.BaseSignal(x, ragged=True)
s.map(np.sum)
But this code block will not (In this case the signal have a nav dimension of 1)
import hyperspy.api as hs
import numpy as np
x = np.empty((1,), object)
x[0] = np.ones((2,3))
s = hs.signals.BaseSignal(x)
s.ragged=True
s.map(np.sum)

This looks to me like a bug: these two (slightly different) ways of creating a bug should give the same type of signals. What I would expect here is to have the same navigation shape regardless if the signal is ragged or not. In this case, the navigation dimension is zero so I would expect the navigation shape to be ().

I should also note that there are some inconsistencies with numpy which had some small effects when trying to guess the output.
import hyperspy.api as hs
import numpy as np
x = np.empty((1,), object)
x[0] = np.ones((2,3))
s = hs.signals.BaseSignal(x)
s.ragged=True
s.inav[0]
# returns a ragged array
s.data[0]
# returns the object at index 0

Same bug as mentioned directly (in this comment) above, s.inav[0] shouldn't be possible in this case.

Mapping now works when the iterating signal dimension is larger than the base signal

Can you give an example to illustrate what is the issue? This will help with considering other alternatives.

Yea so the issue is this chunk of code doesn't run:
def add_sum(image, add):
    out = np.sum(image) + np.sum(add)
    return out
import hyperspy.api as hs
import numpy as np
x = np.ones((10,20,2,3))
s = hs.signals.Signal2D(x)
s_add = hs.signals.BaseSignal(2 * np.ones((10, 20, 2, 2, 2))).transpose(3)


s_out = s.map(add_sum, inplace=False,  add=s_add)# This doesn't work
s_out = s_add.map(add_sum, inplace=False,  add=s)# This does work
The issue here is that the map_blocks function needs to have the blocks all the same length otherwise it takes the longest arg passed (not the signal passed). There is a way around this using the blockwise function but that requires a little bit more setup and would require tearing apart much of the map function. I tried that for a little bit but decided against it because it broke some other parts of the map function.

It looks to me that map should be adjusted to special case ragged array. I guess @magnunor would have a similar issue to adjust pyxem code for the new implementation of map in pyxem/pyxem#789.

CSSFrancis · 2022-02-06T22:42:19Z

This looks to me like a bug: these two (slightly different) ways of creating a bug should give the same type of signals. What I would expect here is to have the same navigation shape regardless if the signal is ragged or not. In this case, the navigation dimension is zero so I would expect the navigation shape to be ().

I guess that makes some sense, as long as the map function works on zero dimensional signals I think that it is not terribly important. I am still thinking that the array np.empty((1,),dtype=object should have a navigation dimension of 1 dimension. It is pretty inconsistent with numpy if that is not that case

print(np.empty((), dtype=object).shape) # ()
print(np.empty((1), dtype=object).shape) # (1,)
print(np.empty((2), dtype=object).shape) # (2,)

I doesn't seem like the second and third example should have different numbers of dimensions in hyperspy when in numpy they have the same number of dimensions.

I should also note that there are some inconsistencies with numpy which had some small effects when trying to guess the output.
import hyperspy.api as hs
import numpy as np
x = np.empty((1,), object)
x[0] = np.ones((2,3))
s = hs.signals.BaseSignal(x)
s.ragged=True
s.inav[0]
# returns a ragged array
s.data[0]
# returns the object at index 0
Same bug as mentioned directly (in this comment) above, s.inav[0] shouldn't be possible in this case.

I think it should be? Based on my comment above. I would rather it actually return a non ragged signal kind of like how numpy handles it. I think that would fix some of the weird edge cases you are talking about.

It looks to me that map should be adjusted to special case ragged array. I guess @magnunor would have a similar issue to adjust pyxem code for the new implementation of map in pyxem/pyxem#789.

I think the issue is bigger than ragged signals as from my example you can see that it happens with non ragged signals. Ragged signals are just the signals where it happens the most because they have no signal dimensions.

Anyways, let me know what you think. I haven't really seen much of an error with ragged signals always having a navigation axis = 1 but obviously you have some more experience with this.

CSSFrancis · 2022-02-07T02:53:22Z

I looked over this a little bit more. It seems like there is still a bug with ragged signals mapped with a larger signal.

This test fails for example because there are extra dimensions added.

 def test_iter_kwarg_larger_shape_ragged(self):
        def return_img(image, add):
            return image
        x = np.empty((1,), dtype=object)
        x[0] = np.ones((4, 2))
        s = hs.signals.BaseSignal(x, ragged=True)

        s_add = hs.signals.BaseSignal(2 * np.ones((1, 201, 101))).transpose(2)
        s_out = s.map(return_img, inplace=False, add=s_add, ragged=True)
        np.testing.assert_array_equal(s_out.data[0], x[0])

I'll look this over a little bit more and see if I can find a way to fix this.

ericpre · 2022-02-07T10:21:45Z

I guess that makes some sense, as long as the map function works on zero dimensional signals I think that it is not terribly important. I am still thinking that the array np.empty((1,),dtype=object should have a navigation dimension of 1 dimension. It is pretty inconsistent with numpy if that is not that case
print(np.empty((), dtype=object).shape) # ()
print(np.empty((1), dtype=object).shape) # (1,)
print(np.empty((2), dtype=object).shape) # (2,)
I doesn't seem like the second and third example should have different numbers of dimensions in hyperspy when in numpy they have the same number of dimensions.

Yes, this is exactly one think that I have to make more consistent as part of #2842. The rational is that the ragged dimension is in the signal dimension and this is communicated to the users through the axes_manager. The example you give above is actually not a ragged array: it simply has object dtype. Maybe a better example is:

data = np.empty((2, ), dtype=object)
data[0] = np.arange(3) # data index 0 has shape (3, )
data[1] = np.arange(4) # data index 1 has shape (4, )
print(data.shape) # data shape is (2,) and the ragged dimension is not visible at all, while it is definitely there!

Considering this example, hyperspy is behaving consistently with numpy, as the ragged dimension is not visible. Hyperspy does explicitly mention it in the axes_manager.
Off the top of my head, a difference between hyperspy and numpy is that numpy will convert ragged array into normal array and hyperspy try to keep it as ragged array.

Sometimes it is not possible to find a solution, which provide a perfect consistency between numpy and hyperspy and all hyperspy features and we have to define a compromise somewhere. One of the concern that I have tried to address in #2842 is that slicing of ragged signal should behave consistently, as illustrated by the example in #2842 (comment). Then other things come up during the review.

I would suggest to separate the discussions, because otherwise it will difficult to follow for anyone (and also for our future self! ;)):

is the behaviour of ragged signal satisfactory?
is the map function working well on ragged signal?

On these two points, it seems that you reported two different bugs here (inconsistency when creating ragged signal for the first point).
Splitting the discussion should help to focus on the correct questions/issues while still keeping the wider context and perspective.

CSSFrancis · 2022-02-07T16:39:10Z

I would suggest to separate the discussions, because otherwise it will difficult to follow for anyone (and also for our future self! ;)):

is the behaviour of ragged signal satisfactory?

is the map function working well on ragged signal?

On these two points, it seems that you reported two different bugs here (inconsistency when creating ragged signal for the first point). Splitting the discussion should help to focus on the correct questions/issues while still keeping the wider context and perspective.

That sounds like a good idea. I kind of imagined this as a small fix, but obviously there is more to it that first meets the eye.

I will probably close this PR, move the relevant discussion and refactor the PR into two separate cases.

ericpre · 2022-03-16T18:11:41Z

@CSSFrancis, I think this PR can be closed now, after #2878 and #2903 have been merged?

CSSFrancis · 2022-03-16T20:14:58Z

Closing after #2903 #2878

Fixed ragged map with zero dim and mapping with iterating signals lar…

38246a1

…ger than your signal

CSSFrancis mentioned this pull request Feb 8, 2022

Map function larger iter arrays #2878

Merged

4 tasks

magnunor mentioned this pull request Feb 20, 2022

Update pyxem to work with new version of HyperSpy's s.map function pyxem/pyxem#789

Closed

jlaehne added the status: needs review label Feb 21, 2022

jlaehne added this to the v1.7 milestone Feb 21, 2022

jlaehne added release: next minor type: bug-fix labels Feb 22, 2022

magnunor mentioned this pull request Mar 13, 2022

Version 0.14.0 pyxem/pyxem#788

Closed

CSSFrancis closed this Mar 16, 2022

ericpre removed this from the v1.7 milestone Mar 17, 2022

ericpre removed status: needs review release: next minor labels Mar 17, 2022

magnunor mentioned this pull request Mar 21, 2022

Ragged BaseSignal navigation shape #2907

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fixed ragged map with zero dim and Larger Iterating signal #2877

Fixed ragged map with zero dim and Larger Iterating signal #2877

CSSFrancis commented Feb 4, 2022

codecov bot commented Feb 4, 2022 •

edited

ericpre commented Feb 5, 2022

CSSFrancis commented Feb 5, 2022

ericpre commented Feb 6, 2022

CSSFrancis commented Feb 6, 2022

CSSFrancis commented Feb 7, 2022 •

edited

ericpre commented Feb 7, 2022

CSSFrancis commented Feb 7, 2022

ericpre commented Mar 16, 2022

CSSFrancis commented Mar 16, 2022

Fixed ragged map with zero dim and Larger Iterating signal #2877

Fixed ragged map with zero dim and Larger Iterating signal #2877

Conversation

CSSFrancis commented Feb 4, 2022

Description of the change

Progress of the PR

codecov bot commented Feb 4, 2022 • edited

Codecov Report

ericpre commented Feb 5, 2022

CSSFrancis commented Feb 5, 2022

ericpre commented Feb 6, 2022

CSSFrancis commented Feb 6, 2022

CSSFrancis commented Feb 7, 2022 • edited

ericpre commented Feb 7, 2022

CSSFrancis commented Feb 7, 2022

ericpre commented Mar 16, 2022

CSSFrancis commented Mar 16, 2022

codecov bot commented Feb 4, 2022 •

edited

CSSFrancis commented Feb 7, 2022 •

edited