
value2index: convert index to int #2765

Merged: 16 commits into non_uniform_axes from nua_value2index_fix on Jul 1, 2021

Conversation

@jlaehne (Contributor) commented Jun 7, 2021

Description of the change

Follow-up to #2743: using numba for the value2index function can lead to a non-integer index that is not usable for slicing. Therefore, int is now enforced and tested.

Fixes failing test from LumiSpy: LumiSpy/lumispy#78 (comment)

Along the way I realized that rounding in value2index is not sufficiently tested. Fixed a bug for rounding=math.floor. However, rounding=round was also not working as it should: 0.5 was rounded down and not up.
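
For reference, Python's built-in round rounds exact halves to the nearest even value, so 0.5 goes down:

>>> round(0.5), round(1.5), round(2.5)
(0, 2, 2)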

Progress of the PR

  • Change implemented (can be split into several points),
  • fix rounding,
  • add tests for integer output,
  • add tests for rounding,
  • ready for review.

@jlaehne mentioned this pull request Jun 7, 2021
@jlaehne changed the title from "convert index to int" to "value2index: convert index to int" Jun 7, 2021
codecov bot commented Jun 7, 2021

Codecov Report

Merging #2765 (595be28) into non_uniform_axes (edd9937) will decrease coverage by 0.45%.
The diff coverage is 75.00%.


@@                 Coverage Diff                  @@
##           non_uniform_axes    #2765      +/-   ##
====================================================
- Coverage             78.43%   77.98%   -0.46%     
====================================================
  Files                   203      203              
  Lines                 31710    31129     -581     
  Branches               7087     6801     -286     
====================================================
- Hits                  24873    24276     -597     
- Misses                 5031     5052      +21     
+ Partials               1806     1801       -5     
Impacted Files Coverage Δ
hyperspy/misc/array_tools.py 80.71% <57.14%> (+4.36%) ⬆️
hyperspy/axes.py 90.49% <100.00%> (+0.38%) ⬆️
hyperspy/io_plugins/hspy.py 68.60% <0.00%> (-10.56%) ⬇️
hyperspy/io_plugins/nexus.py 79.28% <0.00%> (-9.82%) ⬇️
hyperspy/io_plugins/empad.py 93.61% <0.00%> (-2.11%) ⬇️
hyperspy/misc/utils.py 85.54% <0.00%> (-0.84%) ⬇️
hyperspy/model.py 80.23% <0.00%> (ø)
hyperspy/misc/example_signals_loading.py 100.00% <0.00%> (ø)

Continue to review full report at Codecov.

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update edd9937...595be28. Read the comment docs.

@jlaehne (Contributor, Author) commented Jun 10, 2021

The test_value_changed_event function revealed that rounding was actually not performed properly even prior to #2743.

@LMSC-NTappy could you review this PR and check the performance of the updated rounding function?

Comment on lines 473 to 474
vdiff_array = np.abs(axis_array - v)
index_array.flat[i] = np.flatnonzero(vdiff_array == np.min(vdiff_array))[-1]
@LMSC-NTappy (Contributor) commented:

Here the idea, I guess, is to deal with the case where two identical values are found in vdiff_array, in which case you want to return the last one to ensure rounding up.

Unfortunately, while this might solve a failing test, it is not a solution, for two reasons:

  1. if the axis is sorted in decreasing order, this will still round below;
  2. the case where vdiff_array == np.min(vdiff_array) matches more than one value does not typically happen.

Here is an example:

>>> import numpy as np
>>> ax = np.linspace(-10.0,-11.0,11)
>>> ax
array([-10. , -10.1, -10.2, -10.3, -10.4, -10.5, -10.6, -10.7, -10.8,
       -10.9, -11. ])
>>> vdiff_array = np.abs(ax+10.15)
>>> vdiff_array
array([0.15, 0.05, 0.05, 0.15, 0.25, 0.35, 0.45, 0.55, 0.65, 0.75, 0.85])
>>> vdiff_array==np.min(vdiff_array)
array([False, False,  True, False, False, False, False, False, False,
       False, False])

As you can see, this rounds below (case 1), and only a single value is matched by the equality comparison (case 2). Of course, the reason for this behaviour is floating-point arithmetic. We have:

>>> vdiff_array[1] 
0.05000000000000071
>>> vdiff_array[2]
0.049999999999998934

which, no matter how you look at it, will still round below under most strategies. I think the best way to deal with it would be to bias vdiff_array very slightly towards upper rounding. I would advocate something like np.abs(axis_array - v + MACHINEEPSILON), where MACHINEEPSILON is determined using some relative tolerance, e.g. rtol = 1e-10; MACHINEEPSILON = np.min(np.diff(axis)) * rtol.

I don't pretend this is the ideal solution, but it shifts a commonly encountered problem (incorrect value2indexing of a value exactly half the interval between axis points) to a much less commonly encountered one (incorrect value2indexing of a value that is very close to, but not exactly at, half the interval between axis points).
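
A minimal sketch of the proposed biasing (illustrative names only, not the PR's actual implementation; v is shifted by a tiny relative epsilon so that an exact midpoint resolves to the upper neighbour):

import numpy as np

def biased_value2index(axis_array, v, rtol=1e-10):
    # epsilon is tiny relative to the smallest axis spacing
    machine_epsilon = np.min(np.abs(np.diff(axis_array))) * rtol
    # shifting v upwards breaks exact-midpoint ties towards the upper neighbour
    vdiff_array = np.abs(axis_array - (v + machine_epsilon))
    return int(np.argmin(vdiff_array))

print(biased_value2index(np.linspace(-10.0, -11.0, 11), -10.15))
# prints 1, i.e. -10.1: the midpoint example above now rounds up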

@jlaehne (Contributor, Author) replied:

Yes, the idea was to enforce rounding up for a 0.5 value.

I like your idea with the upwards bias; I had thought in a similar direction. I'll implement it and see.

Though, I now realize that numpy also does not necessarily round 0.5 upwards; rather, it rounds to the nearest even number: https://github.com/numpy/numpy/blob/e94ed84010c60961f82860d146681d3fd607de4e/numpy/core/fromnumeric.py#L2723-L2789
(which explicitly mentions the issue with machine precision as well).
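
For example:

>>> import numpy as np
>>> np.round([0.5, 1.5, 2.5, 3.5])
array([0., 2., 2., 4.])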

@LMSC-NTappy (Contributor) replied:

Yeah exactly.

Actually, I wonder why this rounding approach is used in hyperspy at all. When slicing with a float value, I would personally prefer to know in advance whether or not the sliced axis will contain the value I input.

I would agree it's not a big deal, but in the end we spend some effort to preserve this functionality.

@jlaehne (Contributor, Author) replied:

It is Python's rounding, which uses the round-to-even strategy to avoid accumulating an offset: https://realpython.com/python-rounding/#pythons-built-in-round-function

@jlaehne (Contributor, Author) replied:

To be consistent with the behaviour for non-uniform axes, I have used a machine epsilon here as well, to get 'round-half-away-from-zero' instead of 'round-half-to-even'.

@jlaehne (Contributor, Author) replied:

Well, in the end I think it does not make sense to touch this function, because round is of course used elsewhere in the code as well, and we should be consistent there.

@LMSC-NTappy (Contributor) replied:

To be consistent with the behaviour for non-uniform axes, I have used a machine epsilon here as well, to get 'round-half-away-from-zero' instead of 'round-half-to-even'.

Not sure what behaviour you mean. Do you want behaviour similar to numpy.rint? https://numpy.org/doc/stable/reference/generated/numpy.rint.html#numpy.rint

In any case, I quickly checked out your branch, and I think there is still a uniformity issue between the uniform and non-uniform axes:

>>> from hyperspy.axes import UniformDataAxis
>>> inax = [[-11.0, -10.9], [-10.9, -11.0], [+10.9, +11.0], [+11.0, +10.9]]
>>> inval = [-10.95, -10.95, 10.95, 10.95]
>>> for i, j in zip(inax, inval):
...     ax = UniformDataAxis(scale=i[1]-i[0], offset=i[0], size=len(i))
...     nua_idx = super(type(ax), ax).value2index(j, rounding=round)
...     unif_idx = ax.value2index(j, rounding=round)
...     print("Value2Index IN ax: {}, val: {} --> UAX: {}, NUA: {}".format(ax.axis, j, unif_idx, nua_idx))

Value2Index IN ax: [-11.  -10.9], val: -10.95 --> UAX: 1, NUA: 0
Value2Index IN ax: [-10.9 -11. ], val: -10.95 --> UAX: 0, NUA: 1
Value2Index IN ax: [10.9 11. ], val: 10.95 --> UAX: 0, NUA: 1
Value2Index IN ax: [11.  10.9], val: 10.95 --> UAX: 1, NUA: 0

I assume the behaviour you desire is the one from NUA?

elif rounding is math.floor:
    # flooring means finding the index of the closest xi with xi - v <= 0
    # we look for the argmax of the strictly non-positive part of self.axis - v
    # the trick is to replace strictly positive values with -np.inf
-   index = numba_closest_index_ceil(self.axis, value)
+   index = numba_closest_index_floor(self.axis, value).astype(int)
@LMSC-NTappy (Contributor) commented:

Looks Good To Me.

However, this behaviour is weird, because in the numba_closest_index functions the returned array is initialized with a "uint" dtype, and the argmax/argmin functions should not return anything other than integer types. I wonder what causes this behaviour...

Well, if it works like this then who am I to judge?

@ericpre (Member) replied:

I agree, this is odd; there should be a bug somewhere. From a quick look, it should not return a float type, and explicitly casting to int after calling the jitted function shouldn't be necessary...

@jlaehne (Contributor, Author) replied:

I thought so as well.

However, for some reason a statement such as s.isig[value2index(x)+1:] would throw an error, because value2index(x)+1 was not an int that can be used as an index. It happens specifically when another integer is added to or subtracted from the result of value2index, and it is the source of the failure of the tests in LumiSpy/lumispy#78 (comment).
As the failure appeared only recently, and only on the non_uniform_axes branch, it must be a result of rewriting the value2index function.

Adding the .astype(int) fixes this problem. (Note that the value2index function for UniformDataAxis also explicitly casts to int.) Therefore, I propose to go for this "workaround".
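
A minimal sketch of the failure mode (assuming the jitted function returns the uint index it initializes; under NumPy's legacy promotion rules, before NEP 50, uint64 plus a Python int promotes to float64):

>>> import numpy as np
>>> idx = np.uint64(3)   # hypothetical index, as returned before the cast
>>> type(idx + 1)        # uint64 + Python int promotes to float64
<class 'numpy.float64'>
>>> data = list(range(10))
>>> data[int(idx) + 1:]  # data[idx + 1:] would raise TypeError; the cast fixes it
[4, 5, 6, 7, 8, 9]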

@LMSC-NTappy (Contributor) replied:

Therefore, I propose to go for this "workaround".

I share this opinion.

@LMSC-NTappy (Contributor) commented:

@jlaehne I took a look, see comments above.

I'll check the execution speed; however, I'd first like to finish discussing this rounding behaviour issue.

Cheers

Nicolas

@LMSC-NTappy (Contributor) commented:

In any case, this is not noticeably worse than the previous implementation.

[screenshot: benchmark timings]

Once the rounding behaviour is settled, this can be integrated IMO.

@jlaehne (Contributor, Author) commented Jun 14, 2021

Status is that for uniform axes value2index uses round-half-to-even, and for non-uniform axes round-half-away-from-zero.

Are we OK with that difference? Then I would just add the difference to the docstring.

Otherwise, I would have to redo the non-uniform axis behaviour to also perform round-half-to-even. Instead of biasing away from zero, that would mean reverting to checking whether axis_array - v has two values at the same distance from zero and, if so, checking which of them would result in an even value.

@LMSC-NTappy (Contributor) left a comment:

Sorry, I didn't see your reply soon enough. Yes, I am fine with that; if you think it doesn't matter that both behave differently, that's OK for me.

@ericpre (Member) commented Jun 14, 2021

Status is that for uniform axes value2index uses round-half-to-even, and for non-uniform axes round-half-away-from-zero.

Are we OK with that difference? Then I would just add the difference to the docstring.

Otherwise, I would have to redo the non-uniform axis behaviour to also perform round-half-to-even. Instead of biasing away from zero, that would mean reverting to checking whether axis_array - v has two values at the same distance from zero and, if so, checking which of them would result in an even value.

Hmm, it is not great if value2index doesn't give the same results for different types of axes, even if the axes are exactly the same... I don't think that the rounding strategy really matters, but it should be applied consistently! It would be good to get this sorted now, while we know what the issue is and what needs to be done.

@jlaehne (Contributor, Author) commented Jun 15, 2021

OK, I'll give it a try in the next days.

@jlaehne (Contributor, Author) commented Jun 22, 2021

Hmm, it is not great if value2index doesn't give the same results for different types of axes, even if the axes are exactly the same... I don't think that the rounding strategy really matters, but it should be applied consistently! It would be good to get this sorted now, while we know what the issue is and what needs to be done.

Indeed, value2index should behave the same for any type of axis.

However, the only way I could come up with to implement the round-to-even strategy for non-uniform axes is by deactivating numba for that function. If I have to decide, for example, whether 10.1 or 10.2 has an even last digit, I can check the last character of repr(x). As a float itself does not hold any information on how many digits are significant, I have not found a pure-numba way of checking that.
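
For illustration, the repr-based parity check (this kind of string inspection of a float's shortest repr is, as far as I know, not available in numba's nopython mode):

>>> int(repr(10.1)[-1]) % 2, int(repr(10.2)[-1]) % 2
(1, 0)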

@LMSC-NTappy (Contributor) commented:

the only way I could come up with to implement the round-to-even strategy for non uniform axes is by deactivating numba for that function

A consistent behaviour is important, but I would prefer a solution that does not require sacrificing the efficiency of BaseDataAxis.value2index to deal with this specific case. Besides, UniformDataAxis.value2index does not actually perform the round-to-even strategy:

>>> from hyperspy.axes import UniformDataAxis
>>> dd = UniformDataAxis(size=10, scale=0.1, offset=10)
>>> test_vals = dd.axis[:-1] + 0.05
>>> print(dd.axis)
[10.  10.1 10.2 10.3 10.4 10.5 10.6 10.7 10.8 10.9]
>>> print(test_vals)
[10.05 10.15 10.25 10.35 10.45 10.55 10.65 10.75 10.85]
>>> dd.value2index(test_vals, rounding=round)
array([1, 2, 2, 4, 5, 6, 7, 8, 9])

In the end, I am afraid we are chasing a chimaera, because round is evaluated on the result of floating-point operations. In UniformDataAxis.value2index:

index = rounding((value - self.offset) / self.scale)

which will, I think, hardly ever map to the exact midpoint between two integer indices.

I propose to implement the same biasing in UniformDataAxis.value2index and not care about the round-to-even policy. IEEE 754 recommends the round-to-even operation only to remove bias when adding many rounded values. I think we can all agree that this absolutely won't be an issue here. Or am I missing something?
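
The classic illustration of why a floating-point result rarely lands exactly on such a midpoint:

>>> 0.1 + 0.2 == 0.3
False
>>> 0.1 + 0.2
0.30000000000000004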

@ericpre (Member) commented Jun 23, 2021

Yes, I agree that the method used is not very critical here; we should choose a method that is efficient, simple and consistent between different types of axes. If that requires slightly changing the behaviour of UniformDataAxis.value2index, this sounds like a good compromise to me.

I propose to implement the same biasing on UniformDataAxis.value2index and not care about the round-to-even policy

@LMSC-NTappy, what do you mean? Sorry if you are referring to something above that I may have missed!

@jlaehne (Contributor, Author) commented Jun 24, 2021

OK, I checked only the cases for 10.15 and 10.25, and from those I thought it uses the numpy routine; good that you checked the full array. So I'll just revert the last commit, and then we should go for that solution instead of chasing further.

@ericpre (Member) commented Jun 24, 2021

In jlaehne#7, I have changed the rounding behaviour of both non-uniform and uniform axes to "round half towards zero". The different types of axis should behave consistently now!
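
For reference, a minimal sketch of a round-half-towards-zero rule (an illustration only, not necessarily the exact implementation in jlaehne#7):

import numpy as np

def round_half_towards_zero(x):
    # Exact halves resolve towards zero (1.5 -> 1, -1.5 -> -1);
    # all other values round to the nearest integer.
    return np.sign(x) * np.ceil(np.abs(x) - 0.5)

print(round_half_towards_zero(np.array([-1.5, -0.5, 0.5, 1.5, 1.6])))
# prints [-1. -0.  0.  1.  2.]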

@jlaehne (Contributor, Author) commented Jun 28, 2021

@ericpre, when I use the example of @LMSC-NTappy from above, which checks a complete 10-value axis, I still get a disagreement:

>>> from hyperspy.axes import UniformDataAxis
>>> dd = UniformDataAxis(size=10, scale=0.1, offset=10)
>>> test_vals = dd.axis[:-1] + 0.05
>>> print(dd.axis)
[10.  10.1 10.2 10.3 10.4 10.5 10.6 10.7 10.8 10.9]
>>> print(test_vals)
[10.05 10.15 10.25 10.35 10.45 10.55 10.65 10.75 10.85]
>>> dd.value2index(test_vals,rounding=round)
array([1, 2, 2, 4, 5, 6, 7, 7, 9])
>>> super(type(dd),dd).value2index(test_vals,rounding=round)
array([0, 1, 2, 3, 4, 5, 6, 7, 8])

What was the motivation for using round-half-towards-zero instead of round-half-away-from-zero?

@ericpre (Member) commented Jun 30, 2021

@jlaehne, @LMSC-NTappy: the inconsistency between uniform and non-uniform axes should be fixed in 0659013. Could you please review it?

What was the motivation for using round-half-towards-zero instead of round-half-away-from-zero?

I was thinking that it would help to avoid out-of-axis errors.

@jlaehne (Contributor, Author) commented Jun 30, 2021

Nice, LGTM! You covered it all with tests, and the locally run test cases now work consistently as well.

@jlaehne (Contributor, Author) commented Jun 30, 2021

I think we can merge this one now.

@LMSC-NTappy (Contributor) commented:

Looks good to me!

I tested the speed, and it is the same as before. In fact, the speed of UniformDataAxis.value2index() has doubled!

Congrats, ready to merge IMO!

[screenshot: benchmark timings]

@ericpre (Member) commented Jul 1, 2021
ericpre commented Jul 1, 2021

Thanks! It was tedious, but I think it ended up in a good place!

Regarding the choice of round-half-to-zero instead of round-half-away-from-zero, it should be very easy to change, if necessary.

@ericpre merged commit 2bfef10 into hyperspy:non_uniform_axes on Jul 1, 2021
@jlaehne deleted the nua_value2index_fix branch July 1, 2021 21:07