
value2index: convert index to int #2765

Merged: 16 commits into non_uniform_axes from nua_value2index_fix on Jul 1, 2021

Conversation

@jlaehne (Contributor) commented Jun 7, 2021

Description of the change

Follow-up to #2743: using numba for the value2index function can lead to a non-integer index that is not usable for slicing. Therefore, int is now enforced and tested.

Fixes failing test from LumiSpy: LumiSpy/lumispy#78 (comment)

Along the way I realized that rounding in value2index is not sufficiently tested. Fixed a bug for rounding=math.floor. However, rounding=round was also not working as it should: 0.5 was rounded down and not up.
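
For reference, Python's built-in round rounds exact halves to the nearest even value, so 0.5 goes down:

>>> round(0.5), round(1.5), round(2.5)
(0, 2, 2)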

Progress of the PR

  • Change implemented (can be split into several points),
  • fix rounding,
  • add tests for integer output,
  • add tests for rounding,
  • ready for review.

@jlaehne mentioned this pull request Jun 7, 2021
@jlaehne changed the title from "convert index to int" to "value2index: convert index to int" Jun 7, 2021
codecov bot commented Jun 7, 2021

Codecov Report

Merging #2765 (595be28) into non_uniform_axes (edd9937) will decrease coverage by 0.45%.
The diff coverage is 75.00%.


@@                 Coverage Diff                  @@
##           non_uniform_axes    #2765      +/-   ##
====================================================
- Coverage             78.43%   77.98%   -0.46%     
====================================================
  Files                   203      203              
  Lines                 31710    31129     -581     
  Branches               7087     6801     -286     
====================================================
- Hits                  24873    24276     -597     
- Misses                 5031     5052      +21     
+ Partials               1806     1801       -5     
Impacted Files Coverage Δ
hyperspy/misc/array_tools.py 80.71% <57.14%> (+4.36%) ⬆️
hyperspy/axes.py 90.49% <100.00%> (+0.38%) ⬆️
hyperspy/io_plugins/hspy.py 68.60% <0.00%> (-10.56%) ⬇️
hyperspy/io_plugins/nexus.py 79.28% <0.00%> (-9.82%) ⬇️
hyperspy/io_plugins/empad.py 93.61% <0.00%> (-2.11%) ⬇️
hyperspy/misc/utils.py 85.54% <0.00%> (-0.84%) ⬇️
hyperspy/model.py 80.23% <0.00%> (ø)
hyperspy/misc/example_signals_loading.py 100.00% <0.00%> (ø)

Continue to review full report at Codecov.

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update edd9937...595be28. Read the comment docs.

@jlaehne (Contributor, Author) commented Jun 10, 2021

The test_value_changed_event function revealed that rounding was actually not performed properly even prior to #2743.

@LMSC-NTappy could you review this PR and check the performance of the updated rounding function?

Comment on lines 473 to 474
vdiff_array = np.abs(axis_array - v)
index_array.flat[i] = np.flatnonzero(vdiff_array == np.min(vdiff_array))[-1]
@LMSC-NTappy (Contributor) commented:

Here the idea, I guess, is to deal with the case where two identical values are found in vdiff_array, in which case you want to return the last one to ensure rounding up.

Unfortunately, while this might solve a failing test, it is not a solution, for two reasons:

  1. if the axis is sorted in decreasing order, this will still round below;
  2. the case where vdiff_array == np.min(vdiff_array) matches more than one value does not typically happen.

Here is an example:

>>> import numpy as np
>>> ax = np.linspace(-10.0,-11.0,11)
>>> ax
array([-10. , -10.1, -10.2, -10.3, -10.4, -10.5, -10.6, -10.7, -10.8,
       -10.9, -11. ])
>>> vdiff_array = np.abs(ax+10.15)
>>> vdiff_array
array([0.15, 0.05, 0.05, 0.15, 0.25, 0.35, 0.45, 0.55, 0.65, 0.75, 0.85])
>>> vdiff_array==np.min(vdiff_array)
array([False, False,  True, False, False, False, False, False, False,
       False, False])

As you can see, this rounds below (case 1), and only a single value is matched by the equality comparison (case 2). Of course, the reason for this behaviour is floating-point arithmetic. We have:

>>> vdiff_array[1] 
0.05000000000000071
>>> vdiff_array[2]
0.049999999999998934

which, no matter how you look at it, will still round below under most strategies. I think the best way to deal with it would be to bias vdiff_array very slightly towards upper rounding. I would advocate something like np.abs(axis_array - v + MACHINEEPSILON), where MACHINEEPSILON is determined using some relative tolerance, e.g. rtol = 1e-10; MACHINEEPSILON = np.min(np.diff(axis)) * rtol.

I don't pretend this is the ideal solution, but it shifts a commonly encountered problem (incorrect value2indexing of a value exactly half the interval between axis points) to a much less commonly encountered one (incorrect value2indexing of a value that is very close to, but not exactly at, half the interval between axis points).
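
A minimal sketch of the proposed biasing (illustrative names only, not the PR's actual implementation; v is shifted by a tiny relative epsilon so that an exact midpoint resolves to the upper neighbour):

import numpy as np

def biased_value2index(axis_array, v, rtol=1e-10):
    # epsilon is tiny relative to the smallest axis spacing
    machine_epsilon = np.min(np.abs(np.diff(axis_array))) * rtol
    # shifting v upwards breaks exact-midpoint ties towards the upper neighbour
    vdiff_array = np.abs(axis_array - (v + machine_epsilon))
    return int(np.argmin(vdiff_array))

print(biased_value2index(np.linspace(-10.0, -11.0, 11), -10.15))
# prints 1, i.e. -10.1: the midpoint example above now rounds up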

@jlaehne (Contributor, Author) replied:

Yes, the idea was to enforce rounding up for a 0.5 value.

I like your idea with the upwards bias; I had thought in a similar direction. I'll implement it and see.

Though, I now realize that numpy also does not necessarily round 0.5 upwards; rather, it rounds to the nearest even number: https://github.com/numpy/numpy/blob/e94ed84010c60961f82860d146681d3fd607de4e/numpy/core/fromnumeric.py#L2723-L2789
(which explicitly mentions the issue with machine precision as well).
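
For example:

>>> import numpy as np
>>> np.round([0.5, 1.5, 2.5, 3.5])
array([0., 2., 2., 4.])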

@LMSC-NTappy (Contributor) replied:

Yeah exactly.

Actually, I wonder why this rounding approach is used in hyperspy at all. When slicing with a float value, I would personally prefer to know in advance whether or not the sliced axis will contain the value I input.

I would agree it's not a big deal, but in the end we spend some effort to preserve this functionality.

@jlaehne (Contributor, Author) replied:

It is Python's rounding, which uses the round-to-even strategy to avoid accumulating an offset: https://realpython.com/python-rounding/#pythons-built-in-round-function

@jlaehne (Contributor, Author) replied:

To be consistent with the behaviour for non-uniform axes, I have used a machine epsilon here as well, to get 'round-half-away-from-zero' instead of 'round-half-to-even'.

@jlaehne (Contributor, Author) replied:

Well, in the end I think it does not make sense to touch this function, because round is of course used elsewhere in the code as well, and we should be consistent there.

@LMSC-NTappy (Contributor) replied:

To be consistent with the behaviour for non-uniform axes, I have used a machine epsilon here as well, to get 'round-half-away-from-zero' instead of 'round-half-to-even'.

Not sure what behaviour you mean. Do you want behaviour similar to numpy.rint? https://numpy.org/doc/stable/reference/generated/numpy.rint.html#numpy.rint

In any case, I quickly checked out your branch, and I think there is still a uniformity issue between the uniform and non-uniform axes:

>>> from hyperspy.axes import UniformDataAxis
>>> inax = [[-11.0, -10.9], [-10.9, -11.0], [+10.9, +11.0], [+11.0, +10.9]]
>>> inval = [-10.95, -10.95, 10.95, 10.95]
>>> for i, j in zip(inax, inval):
...     ax = UniformDataAxis(scale=i[1]-i[0], offset=i[0], size=len(i))
...     nua_idx = super(type(ax), ax).value2index(j, rounding=round)
...     unif_idx = ax.value2index(j, rounding=round)
...     print("Value2Index IN ax: {}, val: {} --> UAX: {}, NUA: {}".format(ax.axis, j, unif_idx, nua_idx))

Value2Index IN ax: [-11.  -10.9], val: -10.95 --> UAX: 1, NUA: 0
Value2Index IN ax: [-10.9 -11. ], val: -10.95 --> UAX: 0, NUA: 1
Value2Index IN ax: [10.9 11. ], val: 10.95 --> UAX: 0, NUA: 1
Value2Index IN ax: [11.  10.9], val: 10.95 --> UAX: 1, NUA: 0

I assume the behaviour you desire is the one from NUA?

elif rounding is math.floor:
    # flooring means finding the index of the closest xi with xi - v <= 0
    # we look for the argmax of the strictly non-positive part of self.axis - v
    # the trick is to replace strictly positive values with -np.inf
-   index = numba_closest_index_ceil(self.axis, value)
+   index = numba_closest_index_floor(self.axis, value).astype(int)
@LMSC-NTappy (Contributor) commented:

Looks Good To Me.

However, this behaviour is weird, because in the numba_closest_index functions the returned array is initialized with a "uint" dtype, and the argmax/argmin functions should not return anything other than integer types. I wonder what causes this behaviour...

Well, if it works like this then who am I to judge?

@ericpre (Member) replied:

I agree, this is odd; there should be a bug somewhere. From a quick look, it should not return a float type, and explicitly casting to int after calling the jitted function shouldn't be necessary...

@jlaehne (Contributor, Author) replied:

I thought so as well.

However, for some reason a statement such as s.isig[value2index(x)+1:] would throw an error, because value2index(x)+1 was not an int that can be used as an index. It happens specifically when another integer is added to or subtracted from the result of value2index, and it is the source of the failure of the tests in LumiSpy/lumispy#78 (comment).
As the failure appeared only recently, and only on the non_uniform_axes branch, it must be a result of rewriting the value2index function.

Adding the .astype(int) fixes this problem. (Note that the value2index function for UniformDataAxis also explicitly casts to int.) Therefore, I propose to go for this "workaround".
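
A minimal sketch of the failure mode (assuming the jitted function returns the uint index it initializes; under NumPy's legacy promotion rules, before NEP 50, uint64 plus a Python int promotes to float64):

>>> import numpy as np
>>> idx = np.uint64(3)   # hypothetical index, as returned before the cast
>>> type(idx + 1)        # uint64 + Python int promotes to float64
<class 'numpy.float64'>
>>> data = list(range(10))
>>> data[int(idx) + 1:]  # data[idx + 1:] would raise TypeError; the cast fixes it
[4, 5, 6, 7, 8, 9]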

@LMSC-NTappy (Contributor) replied:

Therefore, I propose to go for this "workaround".

I share this opinion.

@LMSC-NTappy (Contributor) commented:

@jlaehne I took a look, see comments above.

I'll check the execution speed; however, I'd first like to finish discussing this rounding behaviour issue.

Cheers

Nicolas

@LMSC-NTappy (Contributor) commented:

In any case, this is not noticeably worse than the previous implementation.

[screenshot: benchmark timings]

Once the rounding behaviour is settled, this can be integrated IMO.

@jlaehne (Contributor, Author) commented Jun 14, 2021

Status is that for uniform axes value2index uses round-half-to-even, and for non-uniform axes round-half-away-from-zero.

Are we OK with that difference? Then I would just add the difference to the docstring.

Otherwise, I would have to redo the non-uniform axis behaviour to also perform round-half-to-even. Instead of biasing away from zero, that would mean reverting to checking whether axis_array - v has two values at the same distance from zero and, if so, checking which of them would result in an even value.

@LMSC-NTappy (Contributor) left a comment:

Sorry, I didn't see your reply soon enough. Yes, I am fine with that; if you think it doesn't matter that both behave differently, that's OK for me.

@ericpre (Member) commented Jun 14, 2021

Status is that for uniform axes value2index uses round-half-to-even, and for non-uniform axes round-half-away-from-zero.

Are we OK with that difference? Then I would just add the difference to the docstring.

Otherwise, I would have to redo the non-uniform axis behaviour to also perform round-half-to-even. Instead of biasing away from zero, that would mean reverting to checking whether axis_array - v has two values at the same distance from zero and, if so, checking which of them would result in an even value.

Hmm, it is not great if value2index doesn't give the same results for different types of axes, even if the axes are exactly the same... I don't think that the rounding strategy really matters, but it should be applied consistently! It would be good to get this sorted now, while we know what the issue is and what needs to be done.

@jlaehne (Contributor, Author) commented Jun 15, 2021

OK, I'll give it a try in the next days.

@jlaehne (Contributor, Author) commented Jun 22, 2021

Hmm, it is not great if value2index doesn't give the same results for different types of axes, even if the axes are exactly the same... I don't think that the rounding strategy really matters, but it should be applied consistently! It would be good to get this sorted now, while we know what the issue is and what needs to be done.

Indeed, value2index should behave the same for any type of axis.

However, the only way I could come up with to implement the round-to-even strategy for non-uniform axes is by deactivating numba for that function. If I have to decide, for example, whether 10.1 or 10.2 has an even last digit, I can check the last character of repr(x). As a float itself does not hold any information on how many digits are significant, I have not found a pure-numba way of checking that.
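
For illustration, the repr-based parity check (this kind of string inspection of a float's shortest repr is, as far as I know, not available in numba's nopython mode):

>>> int(repr(10.1)[-1]) % 2, int(repr(10.2)[-1]) % 2
(1, 0)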

@LMSC-NTappy (Contributor) commented:

the only way I could come up with to implement the round-to-even strategy for non uniform axes is by deactivating numba for that function

A consistent behaviour is important, but I would prefer a solution that does not require sacrificing the efficiency of BaseDataAxis.value2index to deal with this specific case. Besides, UniformDataAxis.value2index does not actually perform the round-to-even strategy:

>>> from hyperspy.axes import UniformDataAxis
>>> dd = UniformDataAxis(size=10, scale=0.1, offset=10)
>>> test_vals = dd.axis[:-1] + 0.05
>>> print(dd.axis)
[10.  10.1 10.2 10.3 10.4 10.5 10.6 10.7 10.8 10.9]
>>> print(test_vals)
[10.05 10.15 10.25 10.35 10.45 10.55 10.65 10.75 10.85]
>>> dd.value2index(test_vals, rounding=round)
array([1, 2, 2, 4, 5, 6, 7, 8, 9])

In the end, I am afraid we are chasing a chimaera, because round is evaluated on the result of floating-point operations. In UniformDataAxis.value2index:

index = rounding((value - self.offset) / self.scale)

which will, I think, hardly ever map to the exact midpoint between two integer indices.

I propose to implement the same biasing in UniformDataAxis.value2index and not care about the round-to-even policy. IEEE 754 recommends the round-to-even operation only to remove bias when adding many rounded values. I think we can all agree that this absolutely won't be an issue here. Or am I missing something?
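
The classic illustration of why a floating-point result rarely lands exactly on such a midpoint:

>>> 0.1 + 0.2 == 0.3
False
>>> 0.1 + 0.2
0.30000000000000004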

@ericpre (Member) commented Jun 23, 2021

Yes, I agree that the method used is not very critical here; we should choose a method that is efficient, simple and consistent between different types of axes. If that requires slightly changing the behaviour of UniformDataAxis.value2index, this sounds like a good compromise to me.

I propose to implement the same biasing on UniformDataAxis.value2index and not care about the round-to-even policy

@LMSC-NTappy, what do you mean? Sorry if you are referring to something above that I may have missed!

@jlaehne (Contributor, Author) commented Jun 24, 2021

OK, I checked only the cases for 10.15 and 10.25, and from those I thought it uses the numpy routine; good that you checked the full array. So I'll just revert the last commit, and then we should go for that solution instead of chasing further.

@ericpre (Member) commented Jun 24, 2021

In jlaehne#7, I have changed the rounding behaviour of both non-uniform and uniform axes to "round half towards zero". The different types of axis should behave consistently now!
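
For reference, a minimal sketch of a round-half-towards-zero rule (an illustration only, not necessarily the exact implementation in jlaehne#7):

import numpy as np

def round_half_towards_zero(x):
    # Exact halves resolve towards zero (1.5 -> 1, -1.5 -> -1);
    # all other values round to the nearest integer.
    return np.sign(x) * np.ceil(np.abs(x) - 0.5)

print(round_half_towards_zero(np.array([-1.5, -0.5, 0.5, 1.5, 1.6])))
# prints [-1. -0.  0.  1.  2.]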

@jlaehne (Contributor, Author) commented Jun 28, 2021

@ericpre, when I use the example of @LMSC-NTappy from above, which checks a complete 10-value axis, I still get a disagreement:

>>> from hyperspy.axes import UniformDataAxis
>>> dd = UniformDataAxis(size=10, scale=0.1, offset=10)
>>> test_vals = dd.axis[:-1] + 0.05
>>> print(dd.axis)
[10.  10.1 10.2 10.3 10.4 10.5 10.6 10.7 10.8 10.9]
>>> print(test_vals)
[10.05 10.15 10.25 10.35 10.45 10.55 10.65 10.75 10.85]
>>> dd.value2index(test_vals,rounding=round)
array([1, 2, 2, 4, 5, 6, 7, 7, 9])
>>> super(type(dd),dd).value2index(test_vals,rounding=round)
array([0, 1, 2, 3, 4, 5, 6, 7, 8])

What was the motivation for using round-half-towards-zero instead of round-half-away-from-zero?

@ericpre (Member) commented Jun 30, 2021

@jlaehne, @LMSC-NTappy: the inconsistency between uniform and non-uniform axes should be fixed in 0659013. Could you please review it?

What was the motivation for using round-half-towards-zero instead of round-half-away-from-zero?

I was thinking that it would help to avoid out-of-axis errors.

@jlaehne (Contributor, Author) commented Jun 30, 2021

Nice, LGTM! You covered it all with tests, and the locally run test cases now work consistently as well.

@jlaehne (Contributor, Author) commented Jun 30, 2021

I think we can merge this one now.

@LMSC-NTappy (Contributor) commented:

Looks good to me!

I tested the speed, and it is the same as before. In fact, the speed of UniformDataAxis.value2index() has doubled!

Congrats, ready to merge IMO!

[screenshot: benchmark timings]

@ericpre (Member) commented Jul 1, 2021
ericpre commented Jul 1, 2021

Thanks! It was tedious, but I think it ended up in a good place!

Regarding the choice of round-half-to-zero instead of round-half-away-from-zero, it should be very easy to change, if necessary.

@ericpre merged commit 2bfef10 into hyperspy:non_uniform_axes on Jul 1, 2021
@jlaehne deleted the nua_value2index_fix branch July 1, 2021 21:07