
value2index: convert index to int #2765

Merged · 16 commits · Jul 1, 2021
6 changes: 3 additions & 3 deletions hyperspy/axes.py
Original file line number Diff line number Diff line change
@@ -594,17 +594,17 @@ def value2index(self, value, rounding=round):
if rounding is round:
#Use argmin(abs) which will return the closest value
# rounding_index = lambda x: np.abs(x).argmin()
index = numba_closest_index_round(self.axis,value)
index = numba_closest_index_round(self.axis,value).astype(int)
elif rounding is math.ceil:
#Ceiling means finding index of the closest xi with xi - v >= 0
#we look for argmin of strictly non-negative part of self.axis-v.
#The trick is to replace strictly negative values with +np.inf
index = numba_closest_index_ceil(self.axis,value)
index = numba_closest_index_ceil(self.axis,value).astype(int)
elif rounding is math.floor:
#Flooring means finding index of the closest xi with xi - v <= 0
#we look for argmax of strictly non-positive part of self.axis-v.
#The trick is to replace strictly positive values with -np.inf
index = numba_closest_index_ceil(self.axis,value)
index = numba_closest_index_floor(self.axis,value).astype(int)
Contributor
Looks Good To Me.

However, this behaviour is weird, because in the numba_closest_index functions the returned array is initialized with "uint" dtype, and the argmax / argmin functions should not return anything other than int types. I wonder what causes this behaviour...

Well, if it works like this then who am I to judge?

Member

I agree, this is odd, there should be a bug somewhere and from a quick look, it should not return float type and casting explicitly to int after calling jitted function shouldn't be necessary...

Contributor Author

I thought so as well.

However, for some reason a statement such as s.isig[value2index(x)+1:] would throw an error because value2index(x)+1 is not an int that can be used as an index. This happens specifically when another integer is added to or subtracted from the result of value2index, and it is the source of the failing tests in LumiSpy/lumispy#78 (comment).
As the failure appeared only recently, and only for the non_uniform_axes branch, it must be a result of rewriting the value2index function.

Adding the .astype(int) fixes this problem. (Note that the value2index function for UniformDataAxis also explicitly casts to int.) Therefore, I propose to go for this "workaround".
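In code, the suspected cause and the workaround look roughly like this (a sketch; the dtype='uint' promotion explanation is an assumption drawn from this discussion, not something confirmed in the thread):

```python
import numpy as np

# A tiny stand-in for the jitted helpers: the result array is allocated
# with dtype='uint', as in numba_closest_index_round.
axis = np.array([10.0, 10.1, 10.2])
index_array = np.empty(1, dtype='uint')
index_array[0] = np.abs(axis - 10.1).argmin()

# Under NumPy's legacy value-based promotion rules (pre-NumPy 2), a uint64
# scalar plus a Python int may promote to float64, which is then rejected
# when used as a slice index, hence s.isig[value2index(x)+1:] failing.
maybe_not_int = index_array[0] + 1  # result dtype depends on NumPy version

# The workaround, analogous to the PR's .astype(int): cast explicitly
# before doing arithmetic on the returned index.
safe = int(index_array[0]) + 1
```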

Contributor

Therefore, I propose to go for this "workaround".

I share this opinion

else:
raise ValueError(
f'Non-supported rounding function. Use '
4 changes: 2 additions & 2 deletions hyperspy/misc/array_tools.py
@@ -470,8 +470,8 @@ def numba_closest_index_round(axis_array,value_array):
index_array = np.empty_like(value_array,dtype='uint')
#assign on flat, iterate on flat.
for i,v in enumerate(value_array.flat):
index_array.flat[i] = np.abs(axis_array - v).argmin()

vdiff_array = np.abs(axis_array - v)
index_array.flat[i] = np.flatnonzero(vdiff_array == np.min(vdiff_array))[-1]
Contributor

Here the idea is to deal with the case where two identical values are found in vdiff_array in which case you want to return the last one to ensure rounding above, I guess.

Unfortunately, while this might solve a failing test, it is not a solution, for two reasons:

  1. if the axis is sorted in decreasing order, this will still round below
  2. the case where vdiff_array == np.min(vdiff_array) matches more than one value does not typically happen.

Here is an example:

>>> import numpy as np
>>> ax = np.linspace(-10.0,-11.0,11)
>>> ax
array([-10. , -10.1, -10.2, -10.3, -10.4, -10.5, -10.6, -10.7, -10.8,
       -10.9, -11. ])
>>> vdiff_array = np.abs(ax+10.15)
>>> vdiff_array
array([0.15, 0.05, 0.05, 0.15, 0.25, 0.35, 0.45, 0.55, 0.65, 0.75, 0.85])
>>> vdiff_array==np.min(vdiff_array)
array([False, False,  True, False, False, False, False, False, False,
       False, False])

As you can see, this performs a rounding below (case 1), and only a single value is matched by the equality comparison (case 2). Of course, the reason for this behaviour is floating-point arithmetic. We have:

>>> vdiff_array[1] 
0.05000000000000071
>>> vdiff_array[2]
0.049999999999998934

which, no matter how you look at it, will still round below in most strategies. I think the best way to deal with it would be to bias vdiff_array very slightly towards upper rounding. I would advocate something like np.abs(axis_array - v + MACHINEEPSILON), where MACHINEEPSILON is determined using some relative tolerance, e.g. rtol = 1e-10; MACHINEEPSILON = np.min(np.diff(axis)) * rtol.

I don't pretend this is the ideal solution, but it shifts a commonly encountered problem (incorrectly converting a value lying exactly half-way between two axis points) to a much less commonly encountered one (incorrectly converting a value that is very close, but not equal, to half-way between two axis points).
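A rough sketch of that bias (the helper name and the rtol value are illustrative, not hyperspy API; here the target value itself is nudged upward, v + eps, so that an exact half-interval value snaps to the larger axis value on both increasing and decreasing axes):

```python
import numpy as np

def closest_index_round_up(axis_array, v, rtol=1e-10):
    # eps is tiny compared to the axis spacing but large compared to the
    # ~1e-15 float noise that otherwise decides ties arbitrarily.
    eps = np.min(np.abs(np.diff(axis_array))) * rtol
    # Looking for the axis point closest to v + eps biases exact
    # half-interval values toward the larger axis value.
    return int(np.abs(axis_array - (v + eps)).argmin())

# The decreasing axis from the example above: -10.15 now snaps to -10.1
# (index 1) instead of -10.2, regardless of the float noise shown earlier.
ax = np.linspace(-10.0, -11.0, 11)
idx = closest_index_round_up(ax, -10.15)
```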

Contributor Author

Yes, the idea was to enforce rounding up for a 0.5 value.

I like your idea with the upwards bias, I had thought in a similar direction. I'll implement that to see.

Though, I now realize that numpy also does not necessarily round 0.5 upwards, but rounds to the nearest even number: https://github.com/numpy/numpy/blob/e94ed84010c60961f82860d146681d3fd607de4e/numpy/core/fromnumeric.py#L2723-L2789
(explicitly mentioning the issue with machine precision as well).

Contributor

Yeah exactly.

Actually, I wonder why this rounding approach is used in hyperspy. When slicing with a float value, I would personally prefer knowing in advance whether or not the sliced axis will contain the value I input.

I would agree it's not a big deal, but in the end we spend some effort to preserve this functionality.

Contributor Author

It is python rounding which uses the round-to-even strategy to avoid accumulation of offset: https://realpython.com/python-rounding/#pythons-built-in-round-function
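A quick illustration of round-half-to-even in plain Python:

```python
# Python's built-in round() uses round-half-to-even ("banker's rounding"):
# exact halves go to the nearest even integer, so the rounding bias does
# not accumulate over many values. (0.5, 1.5, 2.5, 3.5 are all exactly
# representable as floats, so no float-noise caveats apply here.)
halves = [round(x) for x in (0.5, 1.5, 2.5, 3.5)]
```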

Contributor Author

To be consistent with the behaviour for non-uniform axes, I have used a machine epsilon here as well, to get 'round-away-from-zero' instead of 'round-to-even'.

Contributor Author

Well, in the end I think it does not make sense to touch this function, because round is of course used elsewhere in the code as well and we should be consistent there.

Contributor

To be consistent with the behaviour for non-uniform axes, I have used a machineepsilon here as well to have 'round-away-from-zero' instead of 'round-to-even'

Not sure what behaviour you mean. You want a behaviour similar to numpy.rint? https://numpy.org/doc/stable/reference/generated/numpy.rint.html#numpy.rint

In any case, I quickly checked out your branch and I think there is still a uniformity issue between the uniform and non-uniform axes:

>>> from hyperspy.axes import UniformDataAxis
>>> inax = [[-11.0, -10.9], [-10.9, -11.0], [+10.9, +11.0], [+11.0, +10.9]]
>>> inval = [-10.95, -10.95, 10.95, 10.95]
>>> for i, j in zip(inax, inval):
...     ax = UniformDataAxis(scale=i[1]-i[0], offset=i[0], size=len(i))
...     nua_idx = super(type(ax), ax).value2index(j, rounding=round)
...     unif_idx = ax.value2index(j, rounding=round)
...     print("Value2Index IN ax: {}, val: {} --> UAX: {}, NUA: {}".format(ax.axis, j, unif_idx, nua_idx))

Value2Index IN ax: [-11.  -10.9], val: -10.95 --> UAX: 1, NUA: 0
Value2Index IN ax: [-10.9 -11. ], val: -10.95 --> UAX: 0, NUA: 1
Value2Index IN ax: [10.9 11. ], val: 10.95 --> UAX: 0, NUA: 1
Value2Index IN ax: [11.  10.9], val: 10.95 --> UAX: 1, NUA: 0

I assume the behaviour you desire is the one from NUA?

return index_array

@njit(cache=True)
16 changes: 15 additions & 1 deletion hyperspy/tests/axes/test_data_axis.py
@@ -133,8 +133,10 @@ def test_value_changed_event(self):
ax.events.value_changed.connect(m.trigger_me)
ax.value = ax.value
assert not m.trigger_me.called
ax.value = ax.value + (ax.axis[1] - ax.axis[0]) / 2
ax.value = ax.value + (ax.axis[1] - ax.axis[0]) * 0.4
assert not m.trigger_me.called
ax.value = ax.value + (ax.axis[1] - ax.axis[0]) / 2
assert m.trigger_me.called
ax.value = ax.axis[1]
assert m.trigger_me.called

@@ -179,6 +181,11 @@ def test_convert_to_uniform_axis(self):
def test_value2index(self):
assert self.axis.value2index(10.15) == 3
assert self.axis.value2index(60) == 8
assert self.axis.value2index(2.5,rounding=round) == 2
assert self.axis.value2index(2.5,rounding=math.ceil) == 2
assert self.axis.value2index(2.5,rounding=math.floor) == 1
# Test that output is integer
assert isinstance(self.axis.value2index(60), (int, np.integer))

def test_value2index_error(self):
with pytest.raises(ValueError):
@@ -342,6 +349,11 @@ def test_functional_value2index(self):
#Tests for value2index
#Works as intended
assert self.axis.value2index(44.7) == 7
assert self.axis.value2index(2.5,rounding=round) == 2
assert self.axis.value2index(2.5,rounding=math.ceil) == 2
assert self.axis.value2index(2.5,rounding=math.floor) == 1
# Returns integer
assert isinstance(self.axis.value2index(45), (int, np.integer))
#Input None --> output None
assert self.axis.value2index(None) == None
#NaN in --> error out
@@ -470,6 +482,8 @@ def test_uniform_value2index(self):
assert self.axis.value2index(10.15) == 2
assert self.axis.value2index(10.17,rounding=math.floor) == 1
assert self.axis.value2index(10.13,rounding=math.ceil) == 2
# Test that output is integer
assert isinstance(self.axis.value2index(10.15), (int, np.integer))
#Endpoint left
assert self.axis.value2index(10.) == 0
#Endpoint right