
Allow correct reading of custom floating point types #781

Closed
wants to merge 19 commits into from

Conversation

mraspaud (Contributor)

Sometimes, custom floating point types need to be expanded into larger standard floating point types; e.g., a custom float fitting in 16 bits has to be unpacked into a float32 for correct representation. On the current master branch, this fails in some cases, for example when exponent values are larger than standard.

This PR checks the exponent value range and the significand precision to determine which standard float type to expand the custom type into. Fixes #630

@mraspaud (Contributor, Author)

mraspaud commented Dec 8, 2016

Hi, I was wondering if I needed to do anything more for this PR to be accepted?

@tacaswell (Member)

Could you add a test for handling non-standard float sizes?

Someone other than me should do a review of this (@FrancescAlted @andrewcollette @andreabedini ) as I am not confident in my domain knowledge of non-standard floating types.

@mraspaud (Contributor, Author)

mraspaud commented Dec 18, 2016 via email

@tacaswell (Member)

You can create custom data types (lifted from h5t.pyx):

# Mini floats
IEEE_F16BE = IEEE_F32BE.copy()
IEEE_F16BE.set_fields(15, 10, 5, 0, 10)
IEEE_F16BE.set_size(2)
IEEE_F16BE.set_ebias(15)
IEEE_F16BE.lock()

You will probably have to use the low-level interface, but I think you can create datasets with custom / non-standard floats.
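To see what those field offsets mean, here is a hypothetical pure-Python decoder for the 16-bit layout above (sign at bit 15, 5-bit exponent starting at bit 10, 10-bit mantissa at bit 0, bias 15), ignoring byte order. `decode_f16` is purely illustrative, not part of h5py; it is cross-checked against numpy's half-precision type, which uses the same layout.

```python
import numpy as np

# Hypothetical decoder for the mini-float layout above: sign at bit 15,
# 5-bit exponent at bit 10, 10-bit mantissa at bit 0, exponent bias 15.
def decode_f16(bits, e_bias=15):
    bits = int(bits)
    sign = -1.0 if bits >> 15 else 1.0
    exp = (bits >> 10) & 0x1F
    mant = bits & 0x3FF
    if exp == 0:            # exponent code 0: subnormal (implicit leading 0)
        return sign * 2.0 ** (1 - e_bias) * (mant / 1024.0)
    if exp == 0x1F:         # all-ones exponent: infinity or nan
        return sign * float("inf") if mant == 0 else float("nan")
    return sign * 2.0 ** (exp - e_bias) * (1.0 + mant / 1024.0)

# Cross-check against numpy's IEEE half-precision type:
assert decode_f16(np.float16(1.5).view(np.uint16)) == 1.5
assert decode_f16(np.float16(-2.0).view(np.uint16)) == -2.0
```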

@mraspaud (Contributor, Author)

Ah, great! I didn't know this was possible. I'll create some unit tests ASAP.

@tacaswell tacaswell modified the milestones: 2.7.1, 2.7 Dec 21, 2016

@tacaswell tacaswell left a comment


Please add tests as discussed.

This allows the data type to be promoted to a larger float if the exponent or
mantissa is larger than the standard values for the current type (e.g. a
6-bit exponent with a 9-bit mantissa has to be promoted to float32).
Fixes #630

Signed-off-by: Martin Raspaud <martin.raspaud@smhi.se>
@mraspaud (Contributor, Author)

Ok, the tests are added and seem to pass!

@tacaswell tacaswell dismissed their stale review December 22, 2016 13:54

Tests were added.

dset = f[dataset3]
try:
self.assert_(dset.dtype == np.float16)
except AttributeError:
Member


where can the AttributeError come from?

Member


ah, it comes from np.float16 not existing, nvm 🐑

# Handle non-standard exponent and mantissa sizes.
if m_size > 112 or (2**e_size - 2 - e_bias) > 16383 or (1 - e_bias) < -16382:
raise ValueError('Invalid exponent or mantissa size in ' + str(self))
elif m_size > 52 or (2**e_size - 2 - e_bias) > 1023 or (1 - e_bias) < -1022:
Member


Sorry, more ignorant questions from me:

Is it possible for more than one of these conditions to be true, requiring the next float size up?

Contributor Author


Yes, that's what the last of the four tests checks: the float on 2 bytes gets promoted twice, up to a float64 (8 bytes).
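For illustration, the cascading check can be sketched as a standalone function. This is a simplified sketch with hypothetical names (`promote`, `_LIMITS`), not h5py's actual code; the per-type limits are the standard IEEE 754 values, the larger of which the diff above hard-codes.

```python
import numpy as np

# Per-type IEEE 754 limits: (dtype, mantissa bits, max unbiased exponent,
# min unbiased exponent).
_LIMITS = [
    (np.dtype('float16'), 10, 15, -14),
    (np.dtype('float32'), 23, 127, -126),
    (np.dtype('float64'), 52, 1023, -1022),
]

def promote(e_size, m_size, e_bias):
    """Return the smallest standard dtype that can hold the custom type."""
    e_max = 2 ** e_size - 2 - e_bias   # the top exponent code is reserved
    e_min = 1 - e_bias                 # code 0 is reserved for subnormals
    for dt, nmant, emax, emin in _LIMITS:
        if m_size <= nmant and e_max <= emax and e_min >= emin:
            return dt
    raise ValueError("exponent or mantissa too large for float64")
```

With a (made-up) 16-bit layout such as a 5-bit exponent, 10-bit mantissa and a bias of -900, `promote(5, 10, -900)` skips float16 and float32 entirely and lands on float64, which is the kind of double promotion described above.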

@@ -946,16 +946,29 @@ cdef class TypeFloatID(TypeAtomicID):
size = self.get_size() # int giving number of bytes
Member


I think this line can be removed.

Contributor Author


Yes. Should I ?

Member


I think so, it makes the code less confusing and (maybe) saves a bit of time.

Contributor Author


done

@tacaswell (Member)

I am tentatively in favor of merging this. It has tests, does not break any existing tests, and I take on good faith @mraspaud's claim that this fixes the original issue.

My only concern is that I do not know enough about float internals to evaluate the if/elif block as being correct in all reasonable cases.

@mraspaud (Contributor, Author)

I can explain the logic a bit further:
A floating point number is made of a sign bit, an exponent and a mantissa (or significand), which have specific bit-lengths in the standard IEEE float types. One more thing needed to define these types is the bias of the exponent, which turns the unsigned exponent values into signed ones. In standard IEEE float types, the exponent bias is half of the maximum exponent value, for example 127 for an 8-bit exponent.

When creating custom floating point types, we can tweak not only the sizes of the exponent and mantissa, but also the exponent bias, in order to better cover a given range of real numbers. This tweaking is usually done for compression.

Now, when we convert those types back to standard IEEE floats, we need to check 2 things:

  1. that the mantissa of the custom type fits in the mantissa of the standard type (otherwise we lose absolute precision)
  2. that the exponent of the custom type fits in the exponent of the standard type (otherwise we lose precision over the span of numbers we are interested in).

1 is quite easy to fulfil: just check that the bit-length of the standard mantissa matches or exceeds the bit-length of the custom type mantissa.
2 is a bit trickier: you need to check not only that the bit-length of the standard type's exponent is larger than that of the custom type, but also that the range covered by the exponent with the bias applied fits within that of the standard type. For example, if we have a custom exponent of 3 bits (exponent in [0 to 7]) and a bias of 118, then we need a standard type which covers exponents up to 125.
That corresponds to a single-precision float (32 bits), which has an 8-bit exponent with a bias of 127, effectively covering -126 to 127 (-127 and 128 are reserved for special values, e.g. 0, nan, infinity...).

Hence the checks I made:

  • is the mantissa size big enough?
  • is the standard exponent coverage large enough?

Hope that clears things up...
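The bias rule above can be illustrated with two tiny helpers (`ieee_bias` and `exponent_range` are made-up names for illustration, not h5py code):

```python
# Standard IEEE bias for an exponent field of width w bits is half the
# maximum field value, i.e. 2**(w - 1) - 1 (127 for 8 bits, as above).
def ieee_bias(w):
    return 2 ** (w - 1) - 1

# Usable unbiased exponent range for a standard w-bit exponent; the
# all-zeros and all-ones codes are reserved (subnormals and inf/nan).
def exponent_range(w):
    bias = ieee_bias(w)
    return 1 - bias, 2 ** w - 2 - bias
```

For float32's 8-bit exponent this gives a bias of 127 and a range of -126 to 127, matching the values quoted above.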

Signed-off-by: Martin Raspaud <martin.raspaud@smhi.se>

@tacaswell tacaswell left a comment


I am happy with this.

Going to let sit for a bit longer to give others a chance to review.

@tacaswell (Member)

@mraspaud Thanks for the explanation!

@tacaswell (Member)

Is it worth putting some version of that explanation either in the docs or as a long comment in the source?

@aragilar (Member)

Some suggestions:

  • Use six.PY2 rather than sys.hexversion for consistency with the rest of h5py (or use b"", which is compatible with all the versions of Python we support).
  • The exception should mention that there's insufficient precision to represent the number, rather than it being invalid.
  • Use numpy.finfo rather than raw numbers; this avoids potential problems with https://docs.scipy.org/doc/numpy-dev/user/basics.types.html#extended-precision.

I'm not sure about the cases where we're dealing with numbers larger than can be represented by a double/float64. It looks like we try using a long double, but that's equivalent to a normal double on Windows (as long as you're using MSVC, which most people are), so maybe we'd lose information? In any case, the tests currently don't cover floats that big; can those be added?
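The numpy.finfo suggestion could look something like the following hypothetical predicate (a sketch only; `fits` and its parameters are illustrative names, not h5py code):

```python
import numpy as np

# Does a custom float with the given exponent width, mantissa width and
# exponent bias fit losslessly into the candidate numpy type? The limits
# come from numpy.finfo instead of hard-coded constants.
def fits(np_type, e_size, m_size, e_bias):
    info = np.finfo(np_type)
    e_max = 2 ** e_size - 2 - e_bias   # top exponent code is reserved
    e_min = 1 - e_bias                 # code 0 is reserved for subnormals
    return (m_size <= info.nmant          # mantissa bits, e.g. 52 for float64
            and e_max <= info.maxexp - 1  # finfo.maxexp is emax + 1
            and e_min >= info.minexp)     # e.g. -1022 for float64
```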

@mraspaud (Contributor, Author)

@aragilar Thanks for the feedback. I implemented your suggestions, and made type checking/promotion more generic in the process :). I also added a lightweight test for longdouble. However, more thorough checking is really tricky since, as you mentioned, it is heavily platform-dependent. Tell me if you think this PR needs more work.

@tacaswell (Member)

Something went very badly wrong with git here...

@tacaswell tacaswell mentioned this pull request Dec 30, 2016
@tacaswell (Member)

I took the liberty of rebasing your work on top of the current master (it looks like you rebased current master on top of your branch?) and opened a new PR with only your commits.

Left 2 small comments over at #812

@aragilar (Member)

Replaced by #812.

@aragilar aragilar closed this Dec 30, 2016