
Allow correct reading of custom floating point types #781

Closed
wants to merge 19 commits into from

Conversation

mraspaud (Contributor)

Sometimes, custom floating point types need to be expanded into larger standard floating point types; e.g., a custom float fitting in 16 bits has to be unpacked into a float32 for correct representation. On the current master branch, this fails in some cases, for example when exponent values are larger than standard.

This PR checks the exponent value range and the significand precision to determine which standard float type to expand the custom type into. Fixes #630

@mraspaud (Contributor, Author)

mraspaud commented Dec 8, 2016

Hi, I was wondering if I needed to do anything more for this PR to be accepted?

@tacaswell (Member)

Could you add a test for handling non-standard float sizes?

Someone other than me should do a review of this (@FrancescAlted @andrewcollette @andreabedini ) as I am not confident in my domain knowledge of non-standard floating types.

@mraspaud (Contributor, Author)

mraspaud commented Dec 18, 2016 via email

@tacaswell (Member)

You can create custom data types (lifted from h5t.pyx):

# Mini floats
IEEE_F16BE = IEEE_F32BE.copy()
IEEE_F16BE.set_fields(15, 10, 5, 0, 10)
IEEE_F16BE.set_size(2)
IEEE_F16BE.set_ebias(15)
IEEE_F16BE.lock()

You will probably have to use the low-level interface, but I think you can create datasets with custom / non-standard floats.
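To see what those field offsets mean, here is a hypothetical pure-Python decoder for the 16-bit layout above (sign at bit 15, 5-bit exponent starting at bit 10, 10-bit mantissa at bit 0, bias 15), ignoring byte order. `decode_f16` is purely illustrative, not part of h5py; it is cross-checked against numpy's half-precision type, which uses the same layout.

```python
import numpy as np

# Hypothetical decoder for the mini-float layout above: sign at bit 15,
# 5-bit exponent at bit 10, 10-bit mantissa at bit 0, exponent bias 15.
def decode_f16(bits, e_bias=15):
    bits = int(bits)
    sign = -1.0 if bits >> 15 else 1.0
    exp = (bits >> 10) & 0x1F
    mant = bits & 0x3FF
    if exp == 0:            # exponent code 0: subnormal (implicit leading 0)
        return sign * 2.0 ** (1 - e_bias) * (mant / 1024.0)
    if exp == 0x1F:         # all-ones exponent: infinity or nan
        return sign * float("inf") if mant == 0 else float("nan")
    return sign * 2.0 ** (exp - e_bias) * (1.0 + mant / 1024.0)

# Cross-check against numpy's IEEE half-precision type:
assert decode_f16(np.float16(1.5).view(np.uint16)) == 1.5
assert decode_f16(np.float16(-2.0).view(np.uint16)) == -2.0
```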

@mraspaud (Contributor, Author)

Ah, great! I didn't know this was possible. I'll create some unit tests ASAP.

@tacaswell tacaswell modified the milestones: 2.7.1, 2.7 Dec 21, 2016

@tacaswell tacaswell left a comment


Please add tests as discussed.

This allows the data type to be promoted to a larger float if the exponent or
mantissa is larger than the standard values for the current type (e.g. a
6-bit exponent with a 9-bit mantissa has to be promoted to float32).
Fixes #630

Signed-off-by: Martin Raspaud <martin.raspaud@smhi.se>
@mraspaud (Contributor, Author)

Ok, the tests are added and seem to pass!

@tacaswell tacaswell dismissed their stale review December 22, 2016 13:54

Tests were added.

dset = f[dataset3]
try:
self.assert_(dset.dtype == np.float16)
except AttributeError:
Member


where can the AttributeError come from?

Member


ah, it comes from np.float16 not existing, nvm 🐑

# Handle non-standard exponent and mantissa sizes.
if m_size > 112 or (2**e_size - 2 - e_bias) > 16383 or (1 - e_bias) < -16382:
raise ValueError('Invalid exponent or mantissa size in ' + str(self))
elif m_size > 52 or (2**e_size - 2 - e_bias) > 1023 or (1 - e_bias) < -1022:
Member


Sorry, more ignorant questions from me:

Is it possible for more than one of these conditions to be true, requiring the next float size up?

Contributor Author


Yes, that's what the last of the four tests checks: the float on 2 bytes gets promoted twice, up to a float64 (8 bytes).
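For illustration, the cascading check can be sketched as a standalone function. This is a simplified sketch with hypothetical names (`promote`, `_LIMITS`), not h5py's actual code; the per-type limits are the standard IEEE 754 values, the larger of which the diff above hard-codes.

```python
import numpy as np

# Per-type IEEE 754 limits: (dtype, mantissa bits, max unbiased exponent,
# min unbiased exponent).
_LIMITS = [
    (np.dtype('float16'), 10, 15, -14),
    (np.dtype('float32'), 23, 127, -126),
    (np.dtype('float64'), 52, 1023, -1022),
]

def promote(e_size, m_size, e_bias):
    """Return the smallest standard dtype that can hold the custom type."""
    e_max = 2 ** e_size - 2 - e_bias   # the top exponent code is reserved
    e_min = 1 - e_bias                 # code 0 is reserved for subnormals
    for dt, nmant, emax, emin in _LIMITS:
        if m_size <= nmant and e_max <= emax and e_min >= emin:
            return dt
    raise ValueError("exponent or mantissa too large for float64")
```

With a (made-up) 16-bit layout such as a 5-bit exponent, 10-bit mantissa and a bias of -900, `promote(5, 10, -900)` skips float16 and float32 entirely and lands on float64, which is the kind of double promotion described above.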

@@ -946,16 +946,29 @@ cdef class TypeFloatID(TypeAtomicID):
size = self.get_size() # int giving number of bytes
Member


I think this line can be removed.

Contributor Author


Yes. Should I ?

Member


I think so, it makes the code less confusing and (maybe) saves a bit of time.

Contributor Author


done

@tacaswell (Member)

I am tentatively in favor of merging this. It has tests, does not break any existing tests, and I take on good faith @mraspaud's claim that this fixes the original issue.

My only concern is that I do not know enough about float internals to evaluate the if/elif block as being correct in all reasonable cases.

@mraspaud (Contributor, Author)

I can explain the logic a bit further:
A floating point number is made of a sign bit, an exponent and a mantissa (or significand), which have specific bit-lengths in the standard IEEE float types. One more thing needed to define these types is the bias of the exponent, which turns the unsigned exponent values into signed ones. In standard IEEE float types, the exponent bias is half of the maximum exponent value, for example 127 for an 8-bit exponent.

When creating custom floating point types, we can tweak not only the sizes of the exponent and mantissa, but also the exponent bias, in order to better cover a given range of real numbers. This tweaking is usually done for compression.

Now, when we convert those types back to standard IEEE floats, we need to check 2 things:

  1. that the mantissa of the custom type fits in the mantissa of the standard type (otherwise we lose absolute precision)
  2. that the exponent of the custom type fits in the exponent of the standard type (otherwise we lose precision over the span of numbers we are interested in).

1 is quite easy to fulfil: just check that the bit-length of the standard mantissa matches or exceeds the bit-length of the custom type mantissa.
2 is a bit trickier: you need to check not only that the bit-length of the standard type's exponent is larger than that of the custom type, but also that the range covered by the exponent with the bias applied fits within that of the standard type. For example, if we have a custom exponent of 3 bits (exponent in [0 to 7]) and a bias of 118, then we need a standard type which covers exponents up to 125.
That corresponds to a single-precision float (32 bits), which has an 8-bit exponent with a bias of 127, effectively covering -126 to 127 (-127 and 128 are reserved for special values, e.g. 0, nan, infinity...).

Hence the checks I made:

  • is the mantissa size big enough?
  • is the standard exponent coverage large enough?

Hope that clears things up...
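The bias rule above can be illustrated with two tiny helpers (`ieee_bias` and `exponent_range` are made-up names for illustration, not h5py code):

```python
# Standard IEEE bias for an exponent field of width w bits is half the
# maximum field value, i.e. 2**(w - 1) - 1 (127 for 8 bits, as above).
def ieee_bias(w):
    return 2 ** (w - 1) - 1

# Usable unbiased exponent range for a standard w-bit exponent; the
# all-zeros and all-ones codes are reserved (subnormals and inf/nan).
def exponent_range(w):
    bias = ieee_bias(w)
    return 1 - bias, 2 ** w - 2 - bias
```

For float32's 8-bit exponent this gives a bias of 127 and a range of -126 to 127, matching the values quoted above.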

Signed-off-by: Martin Raspaud <martin.raspaud@smhi.se>

@tacaswell tacaswell left a comment


I am happy with this.

Going to let sit for a bit longer to give others a chance to review.

@tacaswell (Member)

@mraspaud Thanks for the explanation!

@tacaswell (Member)

Is it worth putting some version of that explanation either in the docs or as a long comment in the source?

@aragilar (Member)

Some suggestions:

  • Use six.PY2 rather than sys.hexversion for consistency with the rest of h5py (or use b"", which is compatible with all the versions of Python we support).
  • The exception should mention that there's insufficient precision to represent the number, rather than it being invalid.
  • Use numpy.finfo rather than raw numbers; this avoids potential problems with https://docs.scipy.org/doc/numpy-dev/user/basics.types.html#extended-precision.

I'm not sure about the cases where we're dealing with numbers larger than can be represented by a double/float64. It looks like we try using a long double, but that's equivalent to a normal double on Windows (as long as you're using MSVC, which most people are), so maybe we'd lose information? In any case, the tests currently don't cover floats that big; can those be added?
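The numpy.finfo suggestion could look something like the following hypothetical predicate (a sketch only; `fits` and its parameters are illustrative names, not h5py code):

```python
import numpy as np

# Does a custom float with the given exponent width, mantissa width and
# exponent bias fit losslessly into the candidate numpy type? The limits
# come from numpy.finfo instead of hard-coded constants.
def fits(np_type, e_size, m_size, e_bias):
    info = np.finfo(np_type)
    e_max = 2 ** e_size - 2 - e_bias   # top exponent code is reserved
    e_min = 1 - e_bias                 # code 0 is reserved for subnormals
    return (m_size <= info.nmant          # mantissa bits, e.g. 52 for float64
            and e_max <= info.maxexp - 1  # finfo.maxexp is emax + 1
            and e_min >= info.minexp)     # e.g. -1022 for float64
```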

@mraspaud (Contributor, Author)

@aragilar Thanks for the feedback. I implemented your suggestions, and made type checking/promotion more generic in the process :). I also added a lightweight test for longdouble. However, more thorough checking is really tricky since, as you mentioned, it is heavily platform-dependent. Tell me if you think this PR needs more work.

@tacaswell (Member)

Something went very badly wrong with git here...

@tacaswell tacaswell mentioned this pull request Dec 30, 2016
@tacaswell (Member)

I took the liberty of rebasing your work on top of the current master (it looks like you rebased current master on top of your branch?) and opened a new PR with only your commits.

Left 2 small comments over at #812

@aragilar (Member)

Replaced by #812.

@aragilar aragilar closed this Dec 30, 2016