Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed ragged map with zero dim and Larger Iterating signal #2877

Closed

Conversation

CSSFrancis
Copy link
Member

Description of the change

This is kind of a small change to fix some bugs that I have come across with the map function

  • Ragged signals always have a navigation dimension. I think this makes sense as a numpy object array has to have some size rather than be some scalar.
  • Mapping now works when the iterating signal dimension is larger than the base signal

Progress of the PR

  • [ x] Change implemented (can be split into several points),
  • add an changelog entry in the upcoming_changes folder (see upcoming_changes/README.rst),
  • [ x] add tests,
  • [x ] ready for review.

@codecov
Copy link

codecov bot commented Feb 4, 2022

Codecov Report

Merging #2877 (38246a1) into RELEASE_next_minor (db5743b) will increase coverage by 0.11%.
The diff coverage is 100.00%.

Impacted file tree graph

@@                  Coverage Diff                   @@
##           RELEASE_next_minor    #2877      +/-   ##
======================================================
+ Coverage               77.13%   77.24%   +0.11%     
======================================================
  Files                     206      206              
  Lines                   31868    31883      +15     
  Branches                 7161     7165       +4     
======================================================
+ Hits                    24582    24629      +47     
+ Misses                   5533     5500      -33     
- Partials                 1753     1754       +1     
Impacted Files Coverage Δ
hyperspy/signal.py 74.02% <100.00%> (+0.22%) ⬆️
hyperspy/learn/ornmf.py 86.77% <0.00%> (-1.59%) ⬇️
hyperspy/misc/utils.py 84.20% <0.00%> (-0.54%) ⬇️
hyperspy/_signals/lazy.py 89.95% <0.00%> (-0.32%) ⬇️
hyperspy/samfire.py 64.46% <0.00%> (-0.32%) ⬇️
hyperspy/drawing/image.py 73.83% <0.00%> (-0.24%) ⬇️
hyperspy/axes.py 90.52% <0.00%> (-0.19%) ⬇️
hyperspy/model.py 80.59% <0.00%> (+0.23%) ⬆️
hyperspy/samfire_utils/samfire_worker.py 84.57% <0.00%> (+19.15%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update db5743b...38246a1. Read the comment docs.

@ericpre
Copy link
Member

ericpre commented Feb 5, 2022

  • Ragged signals always have a navigation dimension. I think this makes sense as a numpy object array has to have some size rather than be some scalar.

This is an important and not a trivial change, can you elaborate on this?

  • Mapping now works when the iterating signal dimension is larger than the base signal

Can you give an example to illustrate what is the issue? This will help with considering other alternatives.

There are recent discussions on this topic:

and one important aspect is that we need to make sure that ragged and non-ragged signals behave consistently with map, slicing, etc. with respect to signal and navigation dimensions.

@CSSFrancis
Copy link
Member Author

Sorry I should have given some examples, I submitted this a little quickly last night.

  • Ragged signals always have a navigation dimension. I think this makes sense as a numpy object array has to have some size rather than be some scalar.

This is an important and not a trivial change, can you elaborate on this?

The issue is that ragged signals built using the constructor can possible have a signal_shape = () and navigation_shape = (). This results in an error when you use the map function on a ragged array with one dimension.

For example this block of code would fail(In this case the signal have a nav dimension of ( ) )

import hyperspy.api as hs
import numpy as np
x = np.empty((1,), object)
x[0] = np.ones((2,3))
s = hs.signals.BaseSignal(x, ragged=True)
s.map(np.sum)

But this code block will not (In this case the signal have a nav dimension of 1)

import hyperspy.api as hs
import numpy as np
x = np.empty((1,), object)
x[0] = np.ones((2,3))
s = hs.signals.BaseSignal(x)
s.ragged=True
s.map(np.sum)

The reason this fails is that the rechunking function try to rechunk a fundamentally 1 dimensional array with hyperspy trying to tell that it is a zero dimensional array.

I should also note that there are some inconsistencies with numpy which had some small effects when trying to guess the output.

import hyperspy.api as hs
import numpy as np
x = np.empty((1,), object)
x[0] = np.ones((2,3))
s = hs.signals.BaseSignal(x)
s.ragged=True
s.inav[0]
# returns a ragged array
s.data[0]
# returns the object at index 0
  • Mapping now works when the iterating signal dimension is larger than the base signal

Can you give an example to illustrate what is the issue? This will help with considering other alternatives.

Yea so the issue is this chunk of code doesn't run:

def add_sum(image, add):
    out = np.sum(image) + np.sum(add)
    return out
import hyperspy.api as hs
import numpy as np
x = np.ones((10,20,2,3))
s = hs.signals.Signal2D(x)
s_add = hs.signals.BaseSignal(2 * np.ones((10, 20, 2, 2, 2))).transpose(3)


s_out = s.map(add_sum, inplace=False,  add=s_add)# This doesn't work
s_out = s_add.map(add_sum, inplace=False,  add=s)# This does work

The issue here is that the map_blocks function needs to have the blocks all the same length otherwise it takes the longest arg passed (not the signal passed). There is a way around this using the blockwise function but that requires a little bit more setup and would require tearing apart much of the map function. I tried that for a little bit but decided against it because it broke some other parts of the map function.

There are recent discussions on this topic:

and one important aspect is that we need to make sure that ragged and non-ragged signals behave consistently with map, slicing, etc. with respect to signal and navigation dimensions.

Ahh, I must have missed the recent stuff. I think that this is more consistent than before. At the very least the errors thrown in the above cases are very uninformative and potentially confusing. There is a high likelyhood that few people have come across them, but I recently was trying to map a couple of ragged signals and came across both go these edge cases.

@ericpre
Copy link
Member

ericpre commented Feb 6, 2022

Thanks for the details, now I see where does it come from. I think, we need to define what is the expected and desired behaviour.

For example this block of code would fail(In this case the signal have a nav dimension of ( ) )

import hyperspy.api as hs
import numpy as np
x = np.empty((1,), object)
x[0] = np.ones((2,3))
s = hs.signals.BaseSignal(x, ragged=True)
s.map(np.sum)

But this code block will not (In this case the signal have a nav dimension of 1)

import hyperspy.api as hs
import numpy as np
x = np.empty((1,), object)
x[0] = np.ones((2,3))
s = hs.signals.BaseSignal(x)
s.ragged=True
s.map(np.sum)

This looks to me like a bug: these two (slightly different) ways of creating a bug should give the same type of signals. What I would expect here is to have the same navigation shape regardless if the signal is ragged or not. In this case, the navigation dimension is zero so I would expect the navigation shape to be ().

I should also note that there are some inconsistencies with numpy which had some small effects when trying to guess the output.

import hyperspy.api as hs
import numpy as np
x = np.empty((1,), object)
x[0] = np.ones((2,3))
s = hs.signals.BaseSignal(x)
s.ragged=True
s.inav[0]
# returns a ragged array
s.data[0]
# returns the object at index 0

Same bug as mentioned directly (in this comment) above, s.inav[0] shouldn't be possible in this case.

  • Mapping now works when the iterating signal dimension is larger than the base signal

Can you give an example to illustrate what is the issue? This will help with considering other alternatives.

Yea so the issue is this chunk of code doesn't run:

def add_sum(image, add):
    out = np.sum(image) + np.sum(add)
    return out
import hyperspy.api as hs
import numpy as np
x = np.ones((10,20,2,3))
s = hs.signals.Signal2D(x)
s_add = hs.signals.BaseSignal(2 * np.ones((10, 20, 2, 2, 2))).transpose(3)


s_out = s.map(add_sum, inplace=False,  add=s_add)# This doesn't work
s_out = s_add.map(add_sum, inplace=False,  add=s)# This does work

The issue here is that the map_blocks function needs to have the blocks all the same length otherwise it takes the longest arg passed (not the signal passed). There is a way around this using the blockwise function but that requires a little bit more setup and would require tearing apart much of the map function. I tried that for a little bit but decided against it because it broke some other parts of the map function.

It looks to me that map should be adjusted to special case ragged array. I guess @magnunor would have a similar issue to adjust pyxem code for the new implementation of map in pyxem/pyxem#789.

@CSSFrancis
Copy link
Member Author

This looks to me like a bug: these two (slightly different) ways of creating a bug should give the same type of signals. What I would expect here is to have the same navigation shape regardless if the signal is ragged or not. In this case, the navigation dimension is zero so I would expect the navigation shape to be ().

I guess that makes some sense, as long as the map function works on zero dimensional signals I think that it is not terribly important. I am still thinking that the array np.empty((1,),dtype=object should have a navigation dimension of 1 dimension. It is pretty inconsistent with numpy if that is not that case

print(np.empty((), dtype=object).shape) # ()
print(np.empty((1), dtype=object).shape) # (1,)
print(np.empty((2), dtype=object).shape) # (2,)

I doesn't seem like the second and third example should have different numbers of dimensions in hyperspy when in numpy they have the same number of dimensions.

I should also note that there are some inconsistencies with numpy which had some small effects when trying to guess the output.

import hyperspy.api as hs
import numpy as np
x = np.empty((1,), object)
x[0] = np.ones((2,3))
s = hs.signals.BaseSignal(x)
s.ragged=True
s.inav[0]
# returns a ragged array
s.data[0]
# returns the object at index 0

Same bug as mentioned directly (in this comment) above, s.inav[0] shouldn't be possible in this case.

I think it should be? Based on my comment above. I would rather it actually return a non ragged signal kind of like how numpy handles it. I think that would fix some of the weird edge cases you are talking about.

It looks to me that map should be adjusted to special case ragged array. I guess @magnunor would have a similar issue to adjust pyxem code for the new implementation of map in pyxem/pyxem#789.

I think the issue is bigger than ragged signals as from my example you can see that it happens with non ragged signals. Ragged signals are just the signals where it happens the most because they have no signal dimensions.

Anyways, let me know what you think. I haven't really seen much of an error with ragged signals always having a navigation axis = 1 but obviously you have some more experience with this.

@CSSFrancis
Copy link
Member Author

CSSFrancis commented Feb 7, 2022

I looked over this a little bit more. It seems like there is still a bug with ragged signals mapped with a larger signal.

This test fails for example because there are extra dimensions added.

 def test_iter_kwarg_larger_shape_ragged(self):
        def return_img(image, add):
            return image
        x = np.empty((1,), dtype=object)
        x[0] = np.ones((4, 2))
        s = hs.signals.BaseSignal(x, ragged=True)

        s_add = hs.signals.BaseSignal(2 * np.ones((1, 201, 101))).transpose(2)
        s_out = s.map(return_img, inplace=False, add=s_add, ragged=True)
        np.testing.assert_array_equal(s_out.data[0], x[0])

I'll look this over a little bit more and see if I can find a way to fix this.

@ericpre
Copy link
Member

ericpre commented Feb 7, 2022

I guess that makes some sense, as long as the map function works on zero dimensional signals I think that it is not terribly important. I am still thinking that the array np.empty((1,),dtype=object should have a navigation dimension of 1 dimension. It is pretty inconsistent with numpy if that is not that case

print(np.empty((), dtype=object).shape) # ()
print(np.empty((1), dtype=object).shape) # (1,)
print(np.empty((2), dtype=object).shape) # (2,)

I doesn't seem like the second and third example should have different numbers of dimensions in hyperspy when in numpy they have the same number of dimensions.

Yes, this is exactly one think that I have to make more consistent as part of #2842. The rational is that the ragged dimension is in the signal dimension and this is communicated to the users through the axes_manager. The example you give above is actually not a ragged array: it simply has object dtype. Maybe a better example is:

data = np.empty((2, ), dtype=object)
data[0] = np.arange(3) # data index 0 has shape (3, )
data[1] = np.arange(4) # data index 1 has shape (4, )
print(data.shape) # data shape is (2,) and the ragged dimension is not visible at all, while it is definitely there!

Considering this example, hyperspy is behaving consistently with numpy, as the ragged dimension is not visible. Hyperspy does explicitly mention it in the axes_manager.
Off the top of my head, a difference between hyperspy and numpy is that numpy will convert ragged array into normal array and hyperspy try to keep it as ragged array.

Sometimes it is not possible to find a solution, which provide a perfect consistency between numpy and hyperspy and all hyperspy features and we have to define a compromise somewhere. One of the concern that I have tried to address in #2842 is that slicing of ragged signal should behave consistently, as illustrated by the example in #2842 (comment). Then other things come up during the review.

I would suggest to separate the discussions, because otherwise it will difficult to follow for anyone (and also for our future self! ;)):

  • is the behaviour of ragged signal satisfactory?
  • is the map function working well on ragged signal?

On these two points, it seems that you reported two different bugs here (inconsistency when creating ragged signal for the first point).
Splitting the discussion should help to focus on the correct questions/issues while still keeping the wider context and perspective.

@CSSFrancis
Copy link
Member Author

I would suggest to separate the discussions, because otherwise it will difficult to follow for anyone (and also for our future self! ;)):

  • is the behaviour of ragged signal satisfactory?
  • is the map function working well on ragged signal?

On these two points, it seems that you reported two different bugs here (inconsistency when creating ragged signal for the first point). Splitting the discussion should help to focus on the correct questions/issues while still keeping the wider context and perspective.

That sounds like a good idea. I kind of imagined this as a small fix, but obviously there is more to it that first meets the eye.

I will probably close this PR, move the relevant discussion and refactor the PR into two separate cases.

@ericpre
Copy link
Member

ericpre commented Mar 16, 2022

@CSSFrancis, I think this PR can be closed now, after #2878 and #2903 have been merged?

@CSSFrancis
Copy link
Member Author

Closing after #2903 #2878

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants