New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow correct reading of custom floating point types #781
Conversation
Hi, I was wondering if I needed to do anything more for this PR to be accepted ? |
Could you add a test for handling non-standard float sizes? Someone other than me should do a review of this (@FrancescAlted @andrewcollette @andreabedini ) as I am not confident in my domain knowledge of non-standard floating types. |
Hi,
Thanks for looking at this.
Unfortunately, I don't think I can write unit tests for this: as far as I
know, it's not possible to create non-standard floats in h5py, hence I
won't have anything to test on.
|
You can create custom data types (lifted from # Mini floats
IEEE_F16BE = IEEE_F32BE.copy()
IEEE_F16BE.set_fields(15, 10, 5, 0, 10)
IEEE_F16BE.set_size(2)
IEEE_F16BE.set_ebias(15)
IEEE_F16BE.lock() You will probably have to use the low-level interface, but I think you can create datasets with custom / non-standard floats. |
Ah, great ! Didn't know this was possible. I'll create some unit test asap. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add tests as discussed.
This allows the data type to be promoted to larger floats if exponent or mantissa are larger than the standard values for the current type (eg a 6 bits exponent with 9 bits mantissa has to be promoted to float 32). Fixes #630 Signed-off-by: Martin Raspaud <martin.raspaud@smhi.se>
Signed-off-by: Martin Raspaud <martin.raspaud@smhi.se>
Signed-off-by: Martin Raspaud <martin.raspaud@smhi.se>
Signed-off-by: Martin Raspaud <martin.raspaud@smhi.se>
Signed-off-by: Martin Raspaud <martin.raspaud@smhi.se>
Ok, the tests are added and seem to pass! |
dset = f[dataset3] | ||
try: | ||
self.assert_(dset.dtype == np.float16) | ||
except AttributeError: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
where can the AttributeError
come from?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah, I comes from the np.float16
not existing nvm 🐑
# Handle non-standard exponent and mantissa sizes. | ||
if m_size > 112 or (2**e_size - 2 - e_bias) > 16383 or (1 - e_bias) < -16382: | ||
raise ValueError('Invalid exponent or mantissa size in ' + str(self)) | ||
elif m_size > 52 or (2**e_size - 2 - e_bias) > 1023 or (1 - e_bias) < -1022: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, more ignorant questions from me:
Is it possible for more than one of these conditions be true and require the next size float up?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's what the last of the four tests does check, the float on 2 bytes gets promoted twice up to a float 64 (8 bytes)
@@ -946,16 +946,29 @@ cdef class TypeFloatID(TypeAtomicID): | |||
size = self.get_size() # int giving number of bytes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this line can be removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Should I ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think so, it makes the code less confusing and (maybe) saves a bit of time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
I am tentatively in favor of merging this. It has tests, does not break any existing tests, and I take on good faith @mraspaud claim that this fixes the original issue. My only concern is that I do not know enough about float internals to evaluate the if/elif block as being correct in all reasonable cases. |
I can explain the logic a bit further: When creating custom floating point types, we can tweak both the sizes of the exponent and mantissa, but also the exponent bias, in order to better cover a given range of real numbers better. This tweaking is usually done for compression. Now, when we convert those types back to standard IEEE floats, we need to check 2 things:
1 is quite easy to fulfil: just check that the bitlength of the standard mantissa matches or is bigger than the bitlength of the custom type mantissa. Hence the checks I made :
Hope that clears things up... |
Signed-off-by: Martin Raspaud <martin.raspaud@smhi.se>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am happy with this.
Going to let sit for a bit longer to give others a chance to review.
@mraspaud Thanks for the explanation! |
Is it worth putting some version of that explanation either in the docs or as a long comment in the source? |
Some suggestions:
I'm not sure about the cases where we're dealing with numbers which are larger than can be represented by a double/float64, it looks like we try using a long double, but that's equivalent to a normal double on windows (as long as you're using msvc, which most people are), so maybe we'd lose information? In any case, the tests currently don't cover floats that big, can those be added? |
Signed-off-by: Martin Raspaud <martin.raspaud@smhi.se>
Signed-off-by: Martin Raspaud <martin.raspaud@smhi.se>
Signed-off-by: Martin Raspaud <martin.raspaud@smhi.se>
Signed-off-by: Martin Raspaud <martin.raspaud@smhi.se>
Signed-off-by: Martin Raspaud <martin.raspaud@smhi.se>
@aragilar Thanks for the feedback. I implemented your suggestions, and made type checking/promotion more generic in the process :). I also did a lightweight check for testing the longdouble. However, more thorough checking is really tricky since, as you mentioned, it is heavily platform dependent. Tell me if you think this PR needs more work. |
Something went very badly wrong with git here.. |
I took the liberty of rebasing your work on top of the current master (it looks like you rebased current master on top of your branch?) and opened a new PR with only your commits. Left 2 small comments over at #812 |
Replaced by #812. |
Sometimes, the custom floating point types need to be expanded in larger standard floating point types, eg a custom float fitting in 16 bits has to be unpacked into a float 32 for correct representation. On the current master branch, this would fail in some cases of larger than standard exponent values for example.
This PR checks the exponent value extent and significand precision to find which standard float type to expand the custom type into. Fixes #630