A range of fixes relating to small coords dtypes #158
Conversation
Codecov Report
@@            Coverage Diff             @@
##           master     #158      +/-   ##
==========================================
- Coverage   96.92%     96.9%    -0.03%
==========================================
  Files          11        11
  Lines        1205      1196        -9
==========================================
- Hits         1168      1159        -9
  Misses         37        37
Continue to review full report at Codecov.
I'm a little sad to see the smaller dtypes go away, but I agree that they have caused enough small and hard-to-detect errors that removing them is probably the right way to go short-term. I would not be surprised if a future performance-oriented effort led to them being reintroduced. Are there tests that we can add to ensure that these bugs don't get added back in the future by enthusiastic devs?
Thanks for making these fixes, by the way. It's surprisingly nice to discover bugs in the evening and then learn that someone else has fixed them when you wake up :)
I hope so too, but I wouldn't be surprised if it didn't happen. IIRC, this is the third time someone has reported bugs caused by this. Overflows can be surprisingly hard to catch; you have to go through every line looking for a possible overflow. Given the nature of sparse arrays, I'd like to see us switch to bigints or something similar in the future, for really large sparse arrays.
The tests are already there. They weren't being caught because (surprise, surprise) we were using unsigned dtypes. Now that we're using larger signed dtypes for `coords`, the failures actually show up.
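(A minimal sketch, not code from this PR, of the kind of silent failure being described, using nothing but NumPy: with a small unsigned `coords` dtype, a difference that should be negative wraps around to a large positive value, so a bounds or ordering check that should fail quietly passes.)

```python
import numpy as np

# Small unsigned coords: arithmetic wraps around modulo 256 with no error.
a = np.array([3], dtype=np.uint8)
b = np.array([200], dtype=np.uint8)

print(a - b)           # [59]    -- wrapped; the true result is -197
print((a - b) >= 0)    # [ True] -- a check that should fail silently passes
```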
I'm not sure I understand. If someone were to undo the source code changes that you're making here and go back to using dtypes as we were using them before, they wouldn't get a signal that they were breaking things.
I just did a quick test, adding back small dtypes without making the fixes required to actually support them. A lot of things *do* break. It's hard to come up with a comprehensive test suite that tests for overflows. You have to test for (for example) when you go from `np.uint8` to `np.uint16`, `np.int8` to `np.int16`, etc. (for someone who decides to use signed types), find all of those thresholds, and find what triggers them in each case. This can be challenging in general, about as challenging as actually finding the cases that would cause overflows in the first place.
Alright then
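(For reference, a hedged sketch of what testing those thresholds could look like. It assumes `pytest` and the `sparse.COO` round-trip API (`COO.from_numpy` / `todense`); the boundary list is illustrative, not exhaustive, and is not part of the PR.)

```python
import numpy as np
import pytest
import sparse

# Sizes just below, at, and just above the uint8/uint16 index boundaries.
BOUNDARIES = [2**8, 2**16]
SIZES = sorted({b + d for b in BOUNDARIES for d in (-1, 0, 1)})

@pytest.mark.parametrize("n", SIZES)
def test_roundtrip_across_dtype_boundaries(n):
    dense = np.zeros(n)
    dense[n - 1] = 1.0                   # nonzero right at the edge index
    s = sparse.COO.from_numpy(dense)
    assert s.todense()[n - 1] == 1.0     # would fail if coords overflowed
```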
There was a range of bugs relating to small `coords` dtypes (used to save memory), which often caused overflows and other problems, also in interaction with `dask.array`.
I removed all `np.min_scalar_type` instances as well as the custom small `coords` dtypes. This led to a few other bugs, which were also fixed. First seen in scipy-conference/scipy_proceedings#388 (comment).
Example from that here: https://gist.github.com/hameerabbasi/0230eb7b6359416d1db4167f802ada83
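(To illustrate the failure mode, a sketch rather than the removed code: `np.min_scalar_type` picks the smallest dtype that can hold the *current* maximum coordinate, so any later arithmetic that exceeds that value wraps around silently.)

```python
import numpy as np

print(np.min_scalar_type(255))   # uint8  -- chosen for a length-256 axis
print(np.min_scalar_type(256))   # uint16

# Coords stored in the "minimal" dtype...
coords = np.arange(250, 256, dtype=np.min_scalar_type(255))

# ...then shifted, e.g. while combining with a larger array:
# 250 + 10 should be 260, but uint8 wraps it to 4 with no error.
print(coords + np.uint8(10))     # [4 5 6 7 8 9]
```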