Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: loosen kwargs requirements in ediff1d #12713

Merged
merged 1 commit into from
Jan 10, 2019
Merged

Conversation

mattip
Copy link
Member

@mattip mattip commented Jan 10, 2019

Fixes #12711 and compatibility with 1.15. In fixing #11490, checks were added to ediff1d's to_begin and to_end kwargs in PR #11805. The checks are now too stringent, and cause a regression from previous versions. This PR relies on np.asanyarray(values, dtype=dtype_req) to raise if the values cannot be cast. Unfortunately, asanyarray uses unsafe casting, so an additional check that the values did not change is needed.

Also note the exception type in tests was adjusted to ValueError, since casting np.array([1, 2, 3], dtype='int64') to 'int32' is fine, but np.array([1, 1<<35, 3], dtype='int64') is not.

Copy link
Contributor

@tylerjereddy tylerjereddy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks useful / sensible at first glance, but you've probably noticed that the Azure failures are related to the new unit test added here.

@llimeht
Copy link

llimeht commented Jan 10, 2019

This solves the failing tests I had with pbcore, so +1 from that perspective. Thanks!

@seberg
Copy link
Member

seberg commented Jan 10, 2019

@llimeht do you guys rely on the np.array input also being cast lazily unsafely? Or only scalars, or scalars and lists? All three of those are a bit different... If we want to only be more forgiving with scalars it may actually be pretty reasonable to use:

# Hack relying on our "scalars are worthless" casting behaviour :)
to_begin = np.asarray(to_begin)
np.result_type(to_begin, arr.dtype) == arr.dtype

A good quick middle way may be to use same_kind casting here. That keeps the fix for the surprises with np.nan while not being overly strict. It would be a bit stricter than the current approach (since this one actually allows float arrays if values match), but that seems fine overall.

@seberg
Copy link
Member

seberg commented Jan 10, 2019

Well, I guess this isn't a bad middle way, we can make it more strict/nicer at some point later. Will keep a bit in case someone has a better idea, but we should probably just go with it and backport.

@mattip
Copy link
Member Author

mattip commented Jan 10, 2019

The Azure tests are from 32 bit platforms where instead of a ValueError the call to np.asanyarray(values, dtype='int32') raises an OverflowError since 1<<35 does not fit into an int on that platform. I will change the tests to fit into 32 bit integers. I think it is out of scope to change the asanyarray exception or try to recast one exception as another inside ediff1d.

@tylerjereddy tylerjereddy added this to the 1.16.1 release milestone Jan 10, 2019
@tylerjereddy tylerjereddy added the 56 - Needs Release Note. Needs an entry in doc/release/upcoming_changes label Jan 10, 2019
@tylerjereddy
Copy link
Contributor

This is a pure Python bug fix with 100 % patch diff coverage assessed by codecov and an approving backport suggestion from a non-BIDS core dev so in it goes with tentative 1.16.1 milestone tag and tag to remind us about the release note eventually needed.

@tylerjereddy tylerjereddy merged commit ce779dc into numpy:master Jan 10, 2019
@tylerjereddy
Copy link
Contributor

Thanks @mattip

@llimeht
Copy link

llimeht commented Jan 10, 2019

pbcore is also happy with this version. Thanks!

@charris charris added the 09 - Backport-Candidate PRs tagged should be backported label Jan 19, 2019
@charris charris removed the 09 - Backport-Candidate PRs tagged should be backported label Jan 20, 2019
@charris charris removed this from the 1.16.1 release milestone Jan 20, 2019
@stuartarchibald
Copy link
Contributor

This patch will cause an exception to be raised in the case of a NaN in the to_begin/to_end arrays. e.g.

dt=np.float64
np.ediff1d(np.arange(10, dtype=dt), to_begin=np.array([1, np.nan], dtype=dt))

because of lines like this with an equality test of array == array: https://github.com/numpy/numpy/pull/12713/files#diff-9e06c7b7bcc8e8153f5c2df57283b4cbR109

Numba unit tests caught it when updating to support 1.16. numba/numba#3826

Further... the error message is perhaps a bit strange in the context of e.g.:

np.ediff1d(np.arange(10, dtype=np.float32), to_begin=np.array([np.finfo(np.float64).max], dtype=np.float64)) 

which produces

ValueError: cannot convert 'to_begin' to array with dtype 'dtype('float32')' as required for input ary

which is not actually the problem, because technically the array np.array([np.finfo(np.float64).max], dtype=np.float64)) can be converted to a np.float32 dtype:

In [18]: np.array([np.finfo(np.float64).max], dtype=np.float64).astype(np.float32)                                              
Out[18]: array([inf], dtype=float32)

it's just that it's not "safe" under the given behavioural assumptions?

Happy to open a new ticket/patch for either of these if considered a problem? Else Numba will be patched to match.

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
00 - Bug 56 - Needs Release Note. Needs an entry in doc/release/upcoming_changes component: numpy.lib
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Noting ediff1d casting changes in release notes, docs?
6 participants