value2index: convert index to int #2765
Conversation
Codecov Report
@@ Coverage Diff @@
## non_uniform_axes #2765 +/- ##
====================================================
- Coverage 78.43% 77.98% -0.46%
====================================================
Files 203 203
Lines 31710 31129 -581
Branches 7087 6801 -286
====================================================
- Hits 24873 24276 -597
- Misses 5031 5052 +21
+ Partials 1806 1801 -5
Continue to review full report at Codecov.
@LMSC-NTappy could you review this PR and check the performance of the updated rounding function?
hyperspy/misc/array_tools.py
Outdated
vdiff_array = np.abs(axis_array - v)
index_array.flat[i] = np.flatnonzero(vdiff_array == np.min(vdiff_array))[-1]
Here the idea is to deal with the case where two identical values are found in vdiff_array, in which case you want to return the last one to ensure rounding above, I guess.
Unfortunately, while this might solve a failing test, it is not a solution, for two reasons:
- if the axis is sorted in decreasing order, this will still round below
- the case where vdiff_array == np.min(vdiff_array) matches more than one value does not typically happen
Here is an example:
>>> import numpy as np
>>> ax = np.linspace(-10.0,-11.0,11)
>>> ax
array([-10. , -10.1, -10.2, -10.3, -10.4, -10.5, -10.6, -10.7, -10.8,
-10.9, -11. ])
>>> vdiff_array = np.abs(ax+10.15)
>>> vdiff_array
array([0.15, 0.05, 0.05, 0.15, 0.25, 0.35, 0.45, 0.55, 0.65, 0.75, 0.85])
>>> vdiff_array==np.min(vdiff_array)
array([False, False, True, False, False, False, False, False, False,
False, False])
As you can see, this performs a rounding below (case 1) and we only have one single value matched by the equality comparison (case 2). Of course, the reason for this behaviour is floating-point arithmetic. We have:
>>> vdiff_array[1]
0.05000000000000071
>>> vdiff_array[2]
0.049999999999998934
which, no matter how you look at it, will still round below under most strategies. I think the best way to deal with it would be to bias vdiff_array very slightly towards upper-rounding. I would advocate something like np.abs(axis_array - v + MACHINEEPSILON), where MACHINEEPSILON would be determined using some relative tolerance, e.g. rtol = 1e-10; MACHINEEPSILON = np.min(np.diff(axis)) * rtol.
I don't pretend this is the ideal solution. But it shifts a commonly encountered problem (incorrectly value2index-ing a value lying exactly half the interval between axis points) to a much less commonly encountered one (incorrectly value2index-ing a value that is very close to, but not exactly, half the interval between axis points).
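For illustration, a rough sketch of such an epsilon-biased lookup (my own sketch, not a final implementation; the sign of the bias decides whether exact midpoints go to the larger or to the smaller axis value, and here it is chosen so that they go to the larger one):
import numpy as np

def closest_index_biased(axis, v, rtol=1e-10):
    # tolerance relative to the axis spacing
    eps = np.min(np.abs(np.diff(axis))) * rtol
    # shift v slightly towards larger values so that exact midpoints round up
    vdiff = np.abs(axis - (v + eps))
    return int(np.argmin(vdiff))

ax = np.linspace(-10.0, -11.0, 11)
print(ax[closest_index_biased(ax, -10.15)])   # -> -10.1 instead of -10.2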
Yes, the idea was to enforce rounding up for a 0.5 value.
I like your idea with the upwards bias; I had thought in a similar direction. I'll implement that and see.
Though, I now realize that numpy also does not necessarily round 0.5 upwards; it rounds to the nearest even number: https://github.com/numpy/numpy/blob/e94ed84010c60961f82860d146681d3fd607de4e/numpy/core/fromnumeric.py#L2723-L2789 (which explicitly mentions the issue with machine precision as well).
Yeah exactly.
Actually, I wonder why this rounding approach is used in hyperspy. When slicing with a float value, I would personally prefer knowing in advance whether or not the sliced axis will contain the value I input.
I would agree it's not a big deal, but in the end we spend some effort to preserve this functionality.
It is Python's rounding, which uses the round-half-to-even strategy to avoid accumulation of rounding bias: https://realpython.com/python-rounding/#pythons-built-in-round-function
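For reference, a quick illustration of the round-half-to-even ("banker's rounding") behaviour of both Python's built-in round and numpy:
>>> round(0.5), round(1.5), round(2.5)
(0, 2, 2)
>>> import numpy as np
>>> np.round([0.5, 1.5, 2.5])
array([0., 2., 2.])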
To be consistent with the behaviour for non-uniform axes, I have used a machine epsilon here as well to get 'round-half-away-from-zero' instead of 'round-half-to-even'.
Well, in the end I think it does not make sense to touch this function, because round is of course used elsewhere in the code as well and we should be consistent there.
To be consistent with the behaviour for non-uniform axes, I have used a machine epsilon here as well to get 'round-half-away-from-zero' instead of 'round-half-to-even'.
Not sure what behaviour you mean. You want a behaviour similar to numpy.rint? https://numpy.org/doc/stable/reference/generated/numpy.rint.html#numpy.rint
In any case, I quickly checked out your branch and I think there is still a consistency issue between the uniform and non-uniform axes:
>>>from hyperspy.axes import UniformDataAxis
>>>inax = [[-11.0,-10.9],[-10.9,-11.0],[+10.9,+11.0],[+11.0,+10.9]]
>>>inval = [-10.95,-10.95,10.95,10.95]
>>>for i,j in zip(inax,inval):
>>> ax = UniformDataAxis(scale=i[1]-i[0],offset=i[0],size=len(i))
>>> nua_idx = super(type(ax),ax).value2index(j,rounding=round)
>>> unif_idx = ax.value2index(j,rounding=round)
>>> print("Value2Index IN ax: {}, val: {} --> UAX: {}, NUA: {}".format(ax.axis,j,unif_idx,nua_idx))
Value2Index IN ax: [-11. -10.9], val: -10.95 --> UAX: 1, NUA: 0
Value2Index IN ax: [-10.9 -11. ], val: -10.95 --> UAX: 0, NUA: 1
Value2Index IN ax: [10.9 11. ], val: 10.95 --> UAX: 0, NUA: 1
Value2Index IN ax: [11. 10.9], val: 10.95 --> UAX: 1, NUA: 0
I assume the behaviour you desire is the one from NUA?
elif rounding is math.floor:
#flooring means finding index of the closest xi with xi - v <= 0
#we look for argmax of strictly non-positive part of self.axis-v.
#The trick is to replace strictly positive values with -np.inf
index = numba_closest_index_ceil(self.axis,value)
index = numba_closest_index_floor(self.axis,value).astype(int)
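For readers following along, a plain-NumPy sketch of the floor lookup described in the comments above (illustrative only; the actual hyperspy helper is a numba-jitted function):
import numpy as np

def closest_index_floor(axis, v):
    # keep only points with axis[i] - v <= 0, send the rest to -inf, then take the argmax
    diff = axis - v
    diff = np.where(diff <= 0, diff, -np.inf)
    return int(np.argmax(diff))

axis = np.array([10.0, 10.1, 10.3, 10.6])
print(closest_index_floor(axis, 10.4))   # -> 2, the index of 10.3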
Looks good to me.
However, this behaviour is weird, because in the numba_closest_index functions the returned array is initialized with "uint" dtype, and the argmax / argmin functions should not return anything other than int types. I wonder what causes this behaviour...
Well, if it works like this, then who am I to judge?
I agree, this is odd; there should be a bug somewhere. From a quick look, it should not return a float type, and casting explicitly to int after calling the jitted function shouldn't be necessary...
I thought so as well.
However, for some reason a statement such as s.isig[value2index(x)+1:] would throw an error because value2index(x)+1 is not an int that can be used as an index. It happens specifically when another integer is added to or subtracted from the result of value2index. It is the source of the failing tests in LumiSpy/lumispy#78 (comment).
As the failure appeared only recently and only for the non_uniform_axes branch, it must be a result of rewriting the value2index function.
Adding the .astype(int) fixes this problem. (Note that the value2index function for UniformDataAxis also explicitly casts to int.) Therefore, I propose to go for this "workaround".
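One plausible explanation for the non-integer result (an assumption based on NumPy 1.x casting rules, not something verified against the hyperspy code): adding a Python int to a uint64 scalar promotes the result to float64, because uint64 and a signed integer have no common integer type.
import numpy as np

idx = np.uint64(3)        # e.g. an element of an index array created with dtype "uint"
print((idx + 1).dtype)    # float64 on NumPy 1.x, so the result can no longer be used as a slice index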
Therefore, I propose to go for this "workaround".
I share this opinion
@jlaehne I took a look, see comments above. I'll check the execution speed; however, I'd like to finish discussing this rounding behaviour issue. Cheers, Nicolas
Force-pushed from 7cf2c5d to a50e108
Status is that for the uniform axis ... Are we OK with that difference? Then I would just add the difference to the docstring. Otherwise, I would have to redo the non-uniform axis behavior again to also perform round-half-to-even. Instead of the biasing away from zero, that would mean that I would have to revert to checking whether ...
Sorry I didn't see your reply.
Yes, I am fine with that. I didn't see your comment soon enough. If you think it doesn't matter that both behave differently, that's OK for me.
Hmm, it is not great if ...
OK, I'll give it a try in the next days.
Indeed. However, the only way I could come up with to implement the ...
A consistent behavior is important, but I prefer a solution that does not require sacrificing the efficiency of ...
>>> from hyperspy.axes import UniformDataAxis
>>> dd = UniformDataAxis(size=10, scale=0.1, offset=10)
>>> test_vals = dd.axis[:-1:] + 0.05
>>> print(dd.axis)
[10. 10.1 10.2 10.3 10.4 10.5 10.6 10.7 10.8 10.9]
>>> print(test_vals)
[10.05 10.15 10.25 10.35 10.45 10.55 10.65 10.75 10.85]
>>> dd.value2index(test_vals, rounding=round)
array([ 1, 2, 2, 4, 5, 6, 7, 8, 9])
In the end, I am afraid we are chasing a chimaera, because round is evaluated on the result of floating-point operations. In Line 569 in 2ddc359 ...
Which will, I think, hardly ever map to the exact midpoint between two integer indices. I propose to implement the same biasing on ...
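For concreteness, a rough sketch of what such a bias could look like for the uniform-axis computation (the (value - offset) / scale formula is assumed from the discussion above; this is not the actual hyperspy code):
def uniform_value2index_biased(value, offset, scale, rtol=1e-10):
    # tolerance relative to the axis spacing
    eps = abs(scale) * rtol
    return int(round((value - offset) / scale + eps))

print(uniform_value2index_biased(10.25, offset=10.0, scale=0.1))   # -> 3 instead of the 2 obtained with plain round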
Yes, I agree that the method used is not very critical here and we should choose a method which is efficient, simple and consistent between the different types of axes. If it requires slightly changing the behaviour of ...
@LMSC-NTappy, what do you mean? Sorry if you are referring to something above that I may have missed!
OK, I checked only the cases for 10.15 and 10.25 - and from that thought it uses the numpy routine - good that you checked the full array. So I'll just revert the last commit, and then we should go for that solution instead of chasing further.
In jlaehne#7, I have changed the rounding behaviour of both non-uniform and uniform axes to "round half towards zero". The different types of axes should behave consistently now!
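For reference, a minimal sketch of a "round half towards zero" rule (illustrative only; not the code from jlaehne#7):
import numpy as np

def round_half_towards_zero(x):
    x = np.asarray(x, dtype=float)
    # exact halves are pulled towards zero, everything else rounds to the nearest integer
    return np.sign(x) * np.ceil(np.abs(x) - 0.5)

print(round_half_towards_zero([0.5, 1.5, 2.5, -1.5, 2.6]))   # -> 0, 1, 2, -1, 3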
Nua value2index fix
@ericpre, when I use the example of @LMSC-NTappy from above that checks a complete 10-value axis, I still get a disagreement:
What was the motivation for using round-half-to-zero instead of round-half-away-from-zero?
@jlaehne, @LMSC-NTappy: the inconsistency between uniform and non-uniform axes should be fixed in 0659013. Could you please review it?
I was thinking that it would help to avoid out-of-axis errors.
Nice, LGTM! You covered it all with tests, and the locally run test cases also work consistently now.
I think we can merge this one now.
Thanks! It was tedious but I think it ended up in a good place! Regarding the choice of round-half-to-zero instead of round-half-away-from-zero, it should be very easy to change, if necessary.
Description of the change
Follow up for #2743: Using numba for the value2index function can lead to a non-integer index that is not usable for slicing. Therefore, int is enforced and tested.
Fixes failing test from LumiSpy: LumiSpy/lumispy#78 (comment)
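For illustration, a minimal check of the intended outcome (a hypothetical snippet, not the test added in this PR; UniformDataAxis usage as in the examples above):
from hyperspy.axes import UniformDataAxis

ax = UniformDataAxis(scale=0.1, offset=10.0, size=10)
idx = ax.value2index(10.32, rounding=round)
print(idx, type(idx))    # expected: 3 with an integer type
print(idx + 1)           # adding an integer should keep it usable for slicing, e.g. s.isig[idx + 1:]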
Along the way, I realized that rounding in value2index was not sufficiently tested. Fixed a bug for rounding=math.floor. However, rounding=round was also not working as it should: 0.5 was rounded down and not up.
Progress of the PR