Fully support IEEE-754 floats on binary matching #4537

josevalim · 2021-02-19T12:18:53Z

At the moment, Erlang does not fully support IEEE-754
encoding in binary matching. In particular, it is not
possible to decode/encode infinity, negative infinity,
and NaNs.

Furthermore, all of the arithmetic operations in Erlang
raise in the presence of the values above.

The IEEE-754 standard makes a strong case for not
raising when working with non-finite values - but we can
probably say this ship has sailed for Erlang. Furthermore,
one can assume that, if those features were desired by
the community, they would already have been implemented.

Therefore, this patch proposes to allow encoding/decoding
of non-finite values. After all, if a non-finite value was
encoded as part of a binary, it was likely expected that
the decoding party should handle it. This pull request adds
basic mechanisms to do so.
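
For context, a minimal sketch of the workaround that is needed without this patch: inspect the raw bits and map non-finite patterns to atoms, so that no non-finite Erlang float is ever created. The module name `ieee_codec` and the atoms `infinity`, `neg_infinity` and `nan` are assumptions made for illustration; they are not part of this pull request or of OTP.

```erlang
%% Hypothetical helper module, not part of OTP or of this PR.
-module(ieee_codec).
-export([decode/1, encode/1]).

%% Decode a big-endian IEEE-754 binary64.
%% Exponent 2047 (all ones) marks a non-finite value:
%% a zero mantissa is an infinity, anything else is a NaN.
decode(<<0:1, 2047:11, 0:52>>) -> infinity;
decode(<<1:1, 2047:11, 0:52>>) -> neg_infinity;
decode(<<_:1, 2047:11, _:52>>) -> nan;
decode(<<F:64/float>>) -> F.

%% Encode the atoms back to canonical bit patterns.
%% The NaN pattern chosen here is one common quiet NaN.
encode(infinity) -> <<16#7FF0000000000000:64>>;
encode(neg_infinity) -> <<16#FFF0000000000000:64>>;
encode(nan) -> <<16#7FF8000000000000:64>>;
encode(F) when is_float(F) -> <<F:64/float>>.
```

With the patch applied, the intent is that the plain `<<F/float>>` pattern would cover all of these cases directly.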

At the same time, this pull request does not change
(nor intends to) any of the arithmetic operations. Therefore,
operations that raised in the past when returning non-finite
values continue to behave the same.
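
The behaviour that stays unchanged can be sketched as follows. The module and function names are hypothetical; the code only wraps existing operations to show that a non-finite result surfaces as a `badarith` error.

```erlang
%% Hypothetical demo module showing today's raising behaviour.
-module(raise_demo).
-export([try_div/2, try_exp/1]).

%% Float division: dividing by zero raises badarith
%% instead of returning an infinity.
try_div(A, B) ->
    try A / B of
        Result -> {ok, Result}
    catch
        error:badarith -> {error, badarith}
    end.

%% math:exp/1 raises badarith when the result overflows.
try_exp(X) ->
    try math:exp(X) of
        Result -> {ok, Result}
    catch
        error:badarith -> {error, badarith}
    end.
```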

A possible avenue for exploration in a later pull request
is to augment the math module with IEEE compatible functions,
for example, math:ieee_log/1, math:ieee_exp/1, as well as
math:ieee_add/2, math:ieee_subtract/2 and so forth
(including math:ieee_negate/1) that support non-finite
values. This would allow Erlang developers to work with
non-finite values if they want to and allow them to do so
efficiently (currently it is inefficient and cumbersome).
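
To illustrate why this is cumbersome today, here is a rough sketch of a user-level IEEE-style addition. It is not the proposed math:ieee_add/2; the module name `ieee_arith` and the atoms standing in for non-finite values are assumptions, and NaN payloads and signs are ignored.

```erlang
%% Hypothetical user-level emulation, not an OTP API.
-module(ieee_arith).
-export([add/2]).

%% NaN propagates through every operation.
add(nan, _) -> nan;
add(_, nan) -> nan;
%% Infinities of opposite sign have no meaningful sum.
add(infinity, neg_infinity) -> nan;
add(neg_infinity, infinity) -> nan;
add(infinity, _) -> infinity;
add(_, infinity) -> infinity;
add(neg_infinity, _) -> neg_infinity;
add(_, neg_infinity) -> neg_infinity;
%% Finite operands: overflow raises badarith in Erlang. Overflow
%% only happens when both operands share a sign, so the sign of A
%% decides which infinity to return.
add(A, B) when is_float(A), is_float(B) ->
    try
        A + B
    catch
        error:badarith when A > 0.0 -> infinity;
        error:badarith -> neg_infinity
    end.
```

Every operation needs this kind of case analysis plus a try/catch, which is the overhead a native implementation would avoid.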

I am submitting this pull request as an initial request for
comments. In particular, I want to show that supporting non-finite
values seems to be trivial, given this is handled by any C
toolchain that supports IEEE-754 (which is commonplace).
I will work on tests if there is interest.

josevalim · 2021-02-19T12:20:37Z

Here is a quick snippet of what can be done:

1> <<F/float>> = <<16#7FF0000000000000:64>>.
<<127,240,0,0,0,0,0,0>>
2> F.
#Inf
3> float_to_list(F).
"#Inf"
4> is_float(F).
true
5> math:is_finite(F).
false
6> math:is_infinite(F).
true

I picked #Inf, #-Inf, and #NaN to represent those values.

josevalim · 2021-02-19T16:33:41Z

@paulo-ferraz-oliveira maybe I misunderstood your question but we don't have a float_to_atom. :)

paulo-ferraz-oliveira · 2021-02-19T17:09:45Z

@paulo-ferraz-oliveira maybe I misunderstood your question but we don't have a float_to_atom. :)

@josevalim, you were too fast. I figured my question made no sense and deleted it (I should have edited it, probably), but I still got an answer from you 😄.

Great addition, by the way.

For history, the (stupid) question was: "Will this work with float_to_atom(#Inf)?". 😞

It goes to show how excited I was/am: my brain just shut down and thought "Wouldn't it be nice to have this as an atom() too?"

peerst · 2021-02-19T18:19:01Z

The IEEE-754 standard makes a strong case for not
raising when working with non-finite values - but we can
probably say this ship has sailed for Erlang. Furthermore,
one can assume that, if those features were desired by
the community, they would already have been implemented.

Well I never got why Erlang raised on non-finite values and found it undesirable.
So speaking for myself I would strongly desire non raising non-finite values.

That it's not already implemented probably doesn't mean it's not desired. So many things to desire, so little time ;-)

I would say it's worthwhile discussing how such a feature could be implemented, possibly in a backward compatible form. It's not only the math functions but also expressions like 1.0/0.0 or over/underflowing (BTW do we have support for -0.0 and +0.0?)

Raising or non-raising should probably be a per-process setting, defaulting to raising.
Possibly with a way to set the default for the whole VM?

peerst · 2021-02-19T18:23:29Z

I picked #Inf, #-Inf, and #NaN to represent those values.

What was the rationale for the # ?

It's none of the choices listed here: https://en.wikipedia.org/wiki/NaN#Display

wojtekmach · 2021-02-19T18:34:14Z

My guess is the representation would be similar to #Ref<> and #Port<> and thus maybe also get special treatment by the shell. NaN or nan could be mistaken for a variable or an atom.

peerst · 2021-02-19T18:45:41Z

My guess is the representation would be similar to #Ref<> and #Port<> and thus maybe also get special treatment by the shell. NaN or nan could be mistaken for a variable or an atom.

How could I miss this! And one could even have #Nan<special nan flags>

josevalim · 2021-02-19T18:56:33Z

I would say it's worthwhile discussing how such a feature could be implemented, possibly in a backward compatible form. It's not only the math functions but also expressions like 1.0/0.0 or over/underflowing (BTW do we have support for -0.0 and +0.0?)

I will be happy to have this discussion. For now this PR focuses on the decoding/encoding/checking, exactly because those are backwards compatible. The arithmetic part can be as easy or as hard as we want it to be. :D

We do have support for -0.0 and 0.0. There were some bugs when handling and printing those, but they have already been fixed on master.

What was the rationale for the # ?

As @wojtekmach said, easy to recognize as a special entity in Erlang. But easy to change to anything else. :)

josevalim · 2021-03-06T11:26:35Z

Note to self: if we accept this PR, we need to change round/ceil/floor to raise if a non-finite float is given (as they are expected to return integers).

galdor · 2021-03-22T11:13:28Z

I would love to have full IEEE.754 support!

However, this kind of modification should really extend to other operations, such as the behaviour of division by zero. There should probably be a VM flag to activate this change.

Furthermore, one can assume that, if those features were desired by the community, they would already have been implemented.

I really wish this kind of argument was not used so often. In my experience, in the open source world, a ton of features are highly desired but never implemented for lots of reasons:

They are complicated to implement and no one tried and succeeded.
Implementing them is time consuming, and no one had the time for it.
There is too much friction in the contribution process.
The feature has been rejected by maintainers.

Regarding IEEE.754 support in Erlang:

People do not usually write math-heavy code in Erlang, it just does not make sense (slow, no IEEE.754, no unboxed vectors, etc.). So developers who need it use other languages and therefore never contribute.
Changing this kind of core component (arithmetic) is hard, and would probably spark intense discussions. This is not encouraging.

One should not assume that the right solution is the one most similar to the status quo. Floating point operations should definitely not raise.

galdor · 2021-03-22T12:53:22Z

Thinking about it, I believe the text representation should match IEEE.754 formats. You were talking about the need to recognize them as "special entities", but contrary to values such as ports or references which are specific to Erlang, non-finite floating point values are not special, they are perfectly valid numerical values.

It would be really strange to format strings for a user interface and end up with # everywhere. And it would be even more awkward to have to use a different function just to avoid these marks. Furthermore, the usual representations for non-finite values start with a capital letter (NaN, -Inf, etc.), and are therefore hard to miss.

peerst · 2021-03-22T13:12:05Z

It would be really strange to format strings for a user interface and end up with # everywhere. And it would be even more awkward to have to use a different function just to avoid these marks. Furthermore, the usual representations for non-finite values start with a capital letter (NaN, -Inf, etc.), and are therefore hard to miss.

We have to distinguish between external representations, which can be produced by float_to_list etc., and the Erlang literals used when a term is printed. If not going for the usual Erlang special "thing" format #thing<...>, other possibilities would be atoms with upper case letters, assuming that's what you are referring to. But they are not atoms; they are still float values. And we really don't want to convert an atom 'Inf' to a float. Just Inf as a literal is out anyway because it clashes with variable names.

float_to_list and its friends should return the standard IEEE names

josevalim · 2021-03-22T13:25:09Z

Thinking about it, I believe the text representation should match IEEE.754 formats.

I am not sure IEEE 754 actually specifies how Infinity should be displayed. At least I could not find any reference in the copy of the spec that I own. Most references are actually directly to ∞ (which we could actually use, especially as source files are now required to be UTF-8). If someone can point to where the spec defines the textual representation, I would appreciate it.

galdor · 2021-03-22T14:54:44Z

We have to distinguish between external representations, which can be produced by float_to_list etc., and the Erlang literals used when a term is printed. If not going for the usual Erlang special "thing" format #thing<...>, other possibilities would be atoms with upper case letters, assuming that's what you are referring to. But they are not atoms; they are still float values. And we really don't want to convert an atom 'Inf' to a float. Just Inf as a literal is out anyway because it clashes with variable names.

float_to_list and its friends should return the standard IEEE names

Yes I was talking about the external representation. To be precise, it is perfectly fine to use #NaN as Erlang term representing the IEEE.754 NaN value, but float_to_list(#NaN) should return "NaN", and io:format("~f", [#NaN]) should print NaN.

@josevalim Yes, you are right. So it is more about picking what is usually used. I'm used to NaN (and qNaN, sNaN), -Inf and +Inf, but I do not think it matters as long as it is something already commonly used and easy to recognize.

galdor · 2021-05-10T16:32:05Z

Any news about this one? I imagine there are lots of things to discuss before we can one day have full IEEE.754 support in Erlang, but there are some good ideas in this thread.

josevalim · 2021-05-10T17:19:46Z

I believe the goal is to resume this discussion once OTP 24 is released (this week!!!). :)

josevalim · 2021-06-03T16:39:09Z

Hi OTP team! Now that OTP 24 is out, I would love to know if you have any feedback on the next steps for this PR or what are the concerns with moving forward with this. Thank you!

KennethL · 2021-06-04T18:40:23Z

It is in our plans to work with this early on the way to OTP 25. You can expect activity really soon now.

KennethL · 2021-10-04T12:14:45Z

We have now discussed this in the OTP team:
We think a complete implementation would be preferable in the long run. We decided to spend some time to write an EEP for this which we then can discuss and get feedback on.

A short summary before we have a version of the EEP.

Comparison of IEEE floats?

Decided to follow ieee total ordering and break backwards compatibility. This is good because we have a standard we can point to, however it is not how most/all other languages compare floats, so it can lead to confusion.
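
As a rough illustration of what a total ordering over all bit patterns looks like, the sketch below maps each binary64 encoding to an integer key whose natural order matches the IEEE-754 totalOrder predicate. It works on raw binaries so that it runs on stock Erlang. The module name `total_order` is an assumption, and this is not the OTP team's implementation.

```erlang
%% Hypothetical sketch of IEEE-754 totalOrder on binary64 encodings.
-module(total_order).
-export([key/1, sort/1]).

%% Positive values: set the top bit so they sort above all negatives.
key(<<0:1, Rest:63>>) -> (1 bsl 63) bor Rest;
%% Negative values: invert all 64 bits so that a larger magnitude
%% yields a smaller key.
key(<<1:1, Rest:63>>) -> (1 bsl 63) - 1 - Rest.

%% Sort a list of 8-byte binaries in total order:
%% -NaN < -Inf < ... < -0.0 < +0.0 < ... < +Inf < +NaN.
sort(Binaries) ->
    lists:sort(fun(A, B) -> key(A) =< key(B) end, Binaries).
```

Note that under this ordering -0.0 sorts strictly before +0.0 and NaNs have a defined position, which is where it departs from the comparison semantics of most languages.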

How to handle sNaN?

sNaN is a special type of NaN that causes x86 (and possibly other) processors to segfault the current program if used.
An sNaN can enter Erlang via: binary_to_term/enif_make_float/<<Float/float>>
We suggest returning badarg/nomatch when one is found.
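
A sketch of how a signaling NaN could be told apart from a quiet one at the bit level. It assumes the convention used by x86 and recommended since IEEE 754-2008, where the most significant mantissa bit is set for quiet NaNs; some legacy architectures used the opposite convention. The module name `nan_kind` is hypothetical.

```erlang
%% Hypothetical classifier for binary64 encodings.
-module(nan_kind).
-export([classify/1]).

%% Exponent 2047 with a zero mantissa is an infinity.
classify(<<_:1, 2047:11, 0:52>>) -> infinite;
%% Top mantissa bit set: quiet NaN (assumed x86-style convention).
classify(<<_:1, 2047:11, 1:1, _:51>>) -> quiet_nan;
%% Top mantissa bit clear with a non-zero payload: signaling NaN.
%% The zero-payload case was already matched as infinite above.
classify(<<_:1, 2047:11, 0:1, _:51>>) -> signaling_nan;
classify(<<_:64>>) -> finite.
```

A decoder following the suggestion above would reject the `signaling_nan` case with badarg or a failed match.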

How to indicate that ieee arithmetics is to be used?

Introduce new operators: fdiv, fplus, fmul, fsub, where both operands must be floats.
Introduce new guard bifs: fabs/1, is_finite/1, is_infinite/1
Introduce a new module/API corresponding to math perhaps named float or similar. Here all the functions from math should be available but with the IEEE 754 semantics.

Other things to do

The type float() should be defined to include the infinite floats.
We need another type representing finite floats. Maybe finite_float or float_finite.
We need a literal syntax for NaN and +/- Inf as well as a printed representation.

It seems to be a lot of work to implement all of this and we questioned if there are other language features that are more important.
As already mentioned we decided to write an EEP with our suggested solution, then we will see if this is something for OTP 25, 26 or not at all. There might be a simple step in this direction that can support the most important use cases (@josevalim).

josevalim · 2021-10-04T12:41:09Z

Thank you @KennethL for the summary! 💯 So my understanding is that we will decode them from binaries (with the exception of snan) but the existing operators will continue to raise on non-finite types. Some quick thoughts:

For the math module, may I suggest fmath? This will also mirror operators and guards such as fabs, etc.
While experimenting on those ideas for Elixir, I haven't found the literal syntax for infinity and NaNs to be necessary, as long as we can verify those in guards. Especially with NaNs, where there isn't a single value. I think most languages don't provide a literal syntax either.

michallepicki · 2024-10-12T14:50:51Z

As already mentioned we decided to write an EEP with our suggested solution, then we will see if this is something for OTP 25, 26 or not at all.

Has there been any movement on that front? :)

josevalim · 2024-10-12T16:58:18Z

The latest input I received was that this would have large ramifications. Returning "infinity" or "nan" might as well be an error for many applications today and doing those changes would make it so it silently works. Therefore there are no plans to move this forward. :)

michalmuskala · 2024-10-13T11:26:24Z

I wonder if perhaps having an extra modifier for floats that would unlock this would be possible, something like

<<X/float-full:64>>

Though I assume the main complexity is in actually having infinities and nan values floating around the codebases, rather than just decoding them.

garazdawi · 2024-10-14T08:17:05Z

Yes, it is having NaN/±inf in Erlang code that is the problem.

Specifically it is equality and comparison that is the main hurdle. Now that we have decided that matching -0.0 and +0.0 as the same was a bug, we could make matching a structural comparison and == an arithmetic comparison.
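
The distinction between the two kinds of comparison can be sketched with the zeros, which already exist in Erlang today. Comparing the encoded bits stands in for a structural comparison here; the module and function names are assumptions for illustration.

```erlang
%% Hypothetical helpers contrasting structural and arithmetic equality.
-module(zero_cmp).
-export([same_bits/2, same_value/2]).

%% Structural: equal only if the IEEE-754 encodings are identical,
%% so -0.0 and +0.0 differ.
same_bits(A, B) when is_float(A), is_float(B) ->
    <<A/float>> =:= <<B/float>>.

%% Arithmetic: -0.0 and +0.0 compare equal.
same_value(A, B) when is_float(A), is_float(B) ->
    A == B.
```

For example, after `<<NegZero/float>> = <<1:1, 0:63>>`, `same_bits(0.0, NegZero)` is false while `same_value(0.0, NegZero)` is true.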

Though problems would still exist. For example, if we did lists:usort([#NaN<0>,#NaN<0>]) the return value would be [#NaN<0>,#NaN<0>]. There also might be code that expects A < B orelse A > B orelse A == B to always be true for Erlang terms, which it would not be anymore for #NaN.

It is also a bit odd that a > #Inf, though that one I think we can live with.

It is still a lot of work to fix this in all cases and, as Jose mentions, there is code where 1 / 0 is currently expected to fail and if we change it to not fail it could potentially silently break a lot of code.

So no, there are currently no plans to revisit this.

josevalim force-pushed the jv-ieee branch from fc90978 to 2f3d0d5 Compare February 19, 2021 14:05

rickard-green added the team:VM Assigned to OTP team VM label Feb 19, 2021

josevalim force-pushed the jv-ieee branch from 2f3d0d5 to 121685f Compare February 19, 2021 17:42

paulo-ferraz-oliveira mentioned this pull request Feb 19, 2021

Consider support for #Inf, #-Inf and NaN (OTP 24+) tomas-abrahamsson/gpb#199

Closed

seanmor5 mentioned this pull request Feb 19, 2021

Infix dot operator elixir-nx/nx#236

Closed

g-andrade added a commit to g-andrade/locus that referenced this pull request Aug 29, 2021

Support decoding IEEE-754 infinities in MMDB data

cc984e9

Hopefully the following will allow us to drop this code one day: * erlang/otp#4537

bjorng closed this Jan 11, 2022
