Fully support IEEE-754 floats on binary matching #4537

Closed
wants to merge 1 commit into from

Conversation

josevalim
Contributor

At the moment, Erlang does not fully support IEEE-754
encoding in binary matching. In particular, it is not
possible to decode/encode infinity, negative infinity,
and NaNs.

Furthermore, all of the arithmetic operations in Erlang
raise in the presence of the values above.

The IEEE-754 standard makes a strong case for not
raising when working with non-finite values - but we can
probably say this ship has sailed for Erlang. Furthermore,
one can assume that, if those features were desired by
the community, they would already have been implemented.

Therefore, this patch proposes to allow encoding/decoding
of non-finite values. After all, if a non-finite value was
encoded as part of a binary, it was likely expected that
the decoding party should handle it. This pull request adds
basic mechanisms to do so.

At the same time, this pull request does not change
(nor intends to) any of the arithmetic operations. Therefore,
operations that raised in the past when returning non-finite
values continue to behave the same.

A possible avenue for exploration in a later pull request
is to augment the math module with IEEE-compatible functions,
for example, math:ieee_log/1, math:ieee_exp/1, as well as
math:ieee_add/2, math:ieee_subtract/2 and so forth
(including math:ieee_negate/1) that support non-finite
values. This would allow Erlang developers to work with
non-finite values if they want to and allow them to do so
efficiently (currently it is inefficient and cumbersome).

I am submitting this pull request as an initial request for
comments. In particular, I want to show that supporting non-finite
values seems to be trivial, given this is handled by any C
toolchain that supports IEEE-754 (which is commonplace).
I will work on tests if there is interest.

@josevalim
Contributor Author

josevalim commented Feb 19, 2021

Here is a quick snippet of what can be done:

1> <<F/float>> = <<16#7FF0000000000000:64>>.
<<127,240,0,0,0,0,0,0>>
2> F.
#Inf
3> float_to_list(F).
"#Inf"
4> is_float(F).
true
5> math:is_finite(F).
false
6> math:is_infinite(F).
true

I picked #Inf, #-Inf, and #NaN to represent those values.
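
For comparison, on an OTP release without this patch the match in line 1 fails with a badmatch error, so code that has to accept such binaries inspects the bits by hand. A minimal sketch of that kind of workaround follows; the module name ieee_decode and the atoms infinity, neg_infinity and nan are inventions of this example, not part of the PR or of OTP.

```erlang
-module(ieee_decode).
-export([decode/1]).

%% decode(Bin) takes an 8-byte big-endian binary holding an IEEE-754
%% double. Finite values come back as ordinary floats; non-finite
%% values come back as tagging atoms (a convention of this sketch only).
%% Layout: 1 sign bit, 11 exponent bits, 52 fraction bits.
decode(<<0:1, 2047:11, 0:52>>) -> infinity;      % exponent all ones, fraction zero
decode(<<1:1, 2047:11, 0:52>>) -> neg_infinity;  % same, with the sign bit set
decode(<<_:1, 2047:11, _:52>>) -> nan;           % exponent all ones, fraction non-zero
decode(<<F:64/float>>) -> F.                     % everything else is a finite float
```

With this module, ieee_decode:decode(<<16#7FF0000000000000:64>>) returns the atom infinity, and ieee_decode:decode(<<1.5/float>>) returns 1.5.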

@josevalim
Contributor Author

@paulo-ferraz-oliveira maybe I misunderstood your question but we don't have a float_to_atom. :)

@paulo-ferraz-oliveira
Contributor

paulo-ferraz-oliveira commented Feb 19, 2021

@paulo-ferraz-oliveira maybe I misunderstood your question but we don't have a float_to_atom. :)

@josevalim, you were too fast. I figured my question made no sense and deleted it (I should have edited it, probably), but I still got an answer from you 😄.

Great addition, by the way.


For history, the (stupid) question was: "Will this work with float_to_atom(#Inf)?". 😞

It goes to show how excited I was/am: my brain just shut down and thought "Wouldn't it be nice to have this as an atom() too?"

@peerst
Contributor

peerst commented Feb 19, 2021

The IEEE-754 standard makes a strong case for not
raising when working with non-finite values - but we can
probably say this ship has sailed for Erlang. Furthermore,
one can assume that, if those features were desired by
the community, they would already have been implemented.

Well I never got why Erlang raised on non-finite values and found it undesirable.
So speaking for myself I would strongly desire non raising non-finite values.

That it's not already implemented probably doesn't mean it's not desired. So many things to desire, so little time ;-)

I would say it's worthwhile discussing how such a feature could be implemented, possibly in a backward-compatible form. It's not only the math functions but also expressions like 1.0/0.0 or overflow/underflow (BTW do we have support for -0.0 and +0.0?)

Raising or non raising should be probably a per process setting, defaulting to raising.
Possibly with a way to set the default for the whole VM?

@peerst
Contributor

peerst commented Feb 19, 2021

I picked #Inf, #-Inf, and #NaN to represent those values.

What was the rationale for the # ?

It's none of the choices listed here: https://en.wikipedia.org/wiki/NaN#Display

@wojtekmach
Contributor

wojtekmach commented Feb 19, 2021

My guess is the representation would be similar to #Ref<> and #Port<> and thus maybe also get special treatment by the shell. NaN or nan could be mistaken for a variable or an atom.

@peerst
Contributor

peerst commented Feb 19, 2021

My guess is the representation would be similar to #Ref<> and #Port<> and thus maybe also get special treatment by the shell. NaN or nan could be mistaken for a variable or an atom.

How could I miss this! And one could even have #Nan<special nan flags>

@josevalim
Contributor Author

I would say it's worthwhile discussing how such a feature could be implemented, possibly in a backward-compatible form. It's not only the math functions but also expressions like 1.0/0.0 or overflow/underflow (BTW do we have support for -0.0 and +0.0?)

I will be happy to have this discussion. For now this PR focuses on the decoding/encoding/checking, exactly because those are backwards compatible. The arithmetic part can be as easy or as hard as we want it to be. :D

We do have support for -0.0 and 0.0. There were some bugs when handling and printing those, but they have already been fixed on master.

What was the rationale for the # ?

As @wojtekmach said, easy to recognize as a special entity in Erlang. But easy to change to anything else. :)

@josevalim
Contributor Author

Note to self: if we accept this PR, we need to change round/ceil/floor to raise if a non-finite float is given (as they are expected to return integers).

@galdor

galdor commented Mar 22, 2021

I would love to have full IEEE.754 support!

However, this kind of modification should really extend to other operations, such as the behaviour of division by zero. There should probably be a VM flag to activate this change.

Furthermore, one can assume that, if those features were desired by the community, they would already have been implemented.

I really wish this kind of argument was not used so often. In my experience, in the open source world, a ton of features are highly desired but never implemented for lots of reasons:

  • They are complicated to implement and no one tried and succeeded.
  • Implementing them is time consuming, and no one had the time for it.
  • There is too much friction in the contribution process.
  • The feature has been rejected by maintainers.

Regarding IEEE.754 support in Erlang:

  • People do not usually write math-heavy code in Erlang, it just does not make sense (slow, no IEEE.754, no unboxed vectors, etc.). So developers who need it use other languages and therefore never contribute.
  • Changing this kind of core component (arithmetic) is hard, and would probably spark intense discussions. This is not encouraging.

One should not assume that the right solution is the one most similar to the status quo. Floating point operations should definitely not raise.

@galdor

galdor commented Mar 22, 2021

Thinking about it, I believe the text representation should match IEEE.754 formats. You were talking about the need to recognize them as "special entities", but contrary to values such as ports or references which are specific to Erlang, non-finite floating point values are not special, they are perfectly valid numerical values.

It would be really strange to format strings for a user interface and end up with # everywhere. And it would be even more awkward to have to use a different function just to avoid these marks. Furthermore, the usual representations for non-finite values start with a capital letter (NaN, -Inf, etc.), and are therefore hard to miss.

@peerst
Contributor

peerst commented Mar 22, 2021

It would be really strange to format strings for a user interface and end up with # everywhere. And it would be even more awkward to have to use a different function just to avoid these marks. Furthermore, the usual representations for non-finite values start with a capital letter (NaN, -Inf, etc.), and are therefore hard to miss.

We have to distinguish between two things: the external representations, which can be produced by float_to_list etc., and the Erlang literals used when a term is printed. If we do not go for the usual Erlang special "thing" format #thing<...>, other possibilities would be atoms with upper-case letters, assuming that's what you are referring to. But they are not atoms; they are still float values. And we really don't want to convert an atom 'Inf' to a float. Just Inf as a literal is out anyway because it clashes with variable names.

float_to_list and its friends should return the standard IEEE names

@josevalim
Contributor Author

Thinking about it, I believe the text representation should match IEEE.754 formats.

I am not sure IEEE 754 actually specifies how Infinity should be displayed. At least I could not find any reference in the copy of the spec that I own. Most references are actually directly to ∞ (which we could actually use, especially as source code is now required to be UTF-8). If someone can point to where the spec defines a textual representation, I would appreciate it.

@galdor

galdor commented Mar 22, 2021

We have to distinguish between two things: the external representations, which can be produced by float_to_list etc., and the Erlang literals used when a term is printed. If we do not go for the usual Erlang special "thing" format #thing<...>, other possibilities would be atoms with upper-case letters, assuming that's what you are referring to. But they are not atoms; they are still float values. And we really don't want to convert an atom 'Inf' to a float. Just Inf as a literal is out anyway because it clashes with variable names.

float_to_list and its friends should return the standard IEEE names

Yes I was talking about the external representation. To be precise, it is perfectly fine to use #NaN as Erlang term representing the IEEE.754 NaN value, but float_to_list(#NaN) should return "NaN", and io:format("~f", [#NaN]) should print NaN.

@josevalim Yes, you are right. So it is more about picking what is usually used. I'm used to NaN (and qNaN, sNaN), -Inf and +Inf, but I do not think it matters as long as it is something already commonly used and easy to recognize.

@galdor

galdor commented May 10, 2021

Any news about this one? I imagine there are lots of things to discuss before we can one day have full IEEE.754 support in Erlang, but there are some good ideas in this thread.

@josevalim
Contributor Author

I believe the goal is to resume this discussion once OTP 24 is released (this week!!!). :)

@josevalim
Contributor Author

Hi OTP team! Now that OTP 24 is out, I would love to know if you have any feedback on the next steps for this PR or what are the concerns with moving forward with this. Thank you!

@KennethL
Contributor

KennethL commented Jun 4, 2021 via email

g-andrade added a commit to g-andrade/locus that referenced this pull request Aug 29, 2021
Hopefully the following will allow us to drop this code one day:
* erlang/otp#4537
@KennethL
Contributor

KennethL commented Oct 4, 2021

We have now discussed this in the OTP team:
We think a complete implementation would be preferable in the long run. We decided to spend some time to write an EEP for this which we then can discuss and get feedback on.

A short summary before we have a version of the EEP.

Comparison of IEEE floats?

Decided to follow IEEE total ordering and break backwards compatibility. This is good because we have a standard we can point to; however, it is not how most/all other languages compare floats, so it can lead to confusion.
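
As an illustration of what a total ordering looks like, the sketch below derives an integer sort key from the 64-bit pattern, so that plain integer comparison reproduces the IEEE 754 totalOrder predicate (negative NaNs first, then -Inf, the negative finite values, -0.0, +0.0, the positive finite values, +Inf, and positive NaNs last). Whether the planned EEP meant exactly this predicate is an assumption, and the module and function names are made up for the example.

```erlang
-module(ieee_total_order).
-export([key/1, sort/1]).

%% key(Bin) maps an 8-byte big-endian IEEE-754 double to a non-negative
%% integer whose ordinary integer order matches totalOrder.
%% Sign bit clear: set the sign bit, so these sort above all negatives.
key(<<0:1, Rest:63>>) -> (1 bsl 63) + Rest;
%% Sign bit set: flip all 64 bits, so larger magnitudes sort lower.
key(<<1:1, Rest:63>>) -> (1 bsl 63) - 1 - Rest.

%% sort(Bins) orders a list of 8-byte binaries by that key.
sort(Bins) ->
    lists:sort(fun(A, B) -> key(A) =< key(B) end, Bins).
```

The example works on binaries rather than floats because, without the patch, a non-finite bit pattern cannot be held in an Erlang float at all.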

How to handle sNaN?

sNaN is a special type of NaN that causes x86 (and possibly other) processors to segfault the current program if used.
An sNaN can enter Erlang via: binary_to_term/enif_make_float/<<Float/float>>
We suggest returning badarg/nomatch when one is found.
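
One way to spot an sNaN before it is turned into a float is to look at the raw bits. The sketch below assumes the IEEE 754-2008 convention used by x86 and ARM, where the most significant fraction bit is 1 for a quiet NaN and 0 for a signaling one. The module name and the returned atoms are hypothetical, and this is not necessarily the check the OTP team had in mind.

```erlang
-module(ieee_nan).
-export([classify/1]).

%% classify(Bin) inspects an 8-byte big-endian IEEE-754 double.
%% Layout: 1 sign bit, 11 exponent bits, then 52 fraction bits, of which
%% the first is treated here as the "quiet" bit (an assumption, see above).
classify(<<_:1, 2047:11, 1:1, _:51>>) -> quiet_nan;
classify(<<_:1, 2047:11, 0:1, 0:51>>) -> infinity;
classify(<<_:1, 2047:11, 0:1, _:51>>) -> signaling_nan;
classify(<<_:64>>) -> finite.
```

A decoder following the suggestion above could call such a check first and fail the match when it returns signaling_nan.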

How to indicate that IEEE arithmetic is to be used?

  • Introduce new operators: fdiv, fplus, fmul, fsub, where both operands must be floats.
  • Introduce new guard bifs: fabs/1, is_finite/1, is_infinite/1
  • Introduce a new module/API corresponding to math perhaps named float or similar. Here all the functions from math should be available but with the IEEE 754 semantics.
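
To make the idea of a non-raising division concrete, here is a rough sketch of what an fdiv-like function could return, written against today's Erlang. Because current floats cannot hold non-finite values, the atoms infinity, neg_infinity and nan stand in for them; that encoding, the module name ieee_div and the helper sign/1 are assumptions of this example and not the proposed operator.

```erlang
-module(ieee_div).
-export([fdiv/2]).

%% fdiv(X, Y) divides two floats. Where X / Y would raise badarith
%% (division by zero or overflow), it returns a stand-in atom instead.
fdiv(X, Y) when is_float(X), is_float(Y) ->
    try
        X / Y
    catch
        %% 0.0 / 0.0 has no meaningful value: IEEE 754 yields NaN.
        error:badarith when X == 0.0, Y == 0.0 ->
            nan;
        %% Otherwise the result is an infinity whose sign is the
        %% exclusive-or of the operand signs.
        error:badarith ->
            case sign(X) bxor sign(Y) of
                0 -> infinity;
                1 -> neg_infinity
            end
    end.

%% sign(F) returns the IEEE-754 sign bit of F (0 or 1), read from the
%% encoded form so that a negative zero is told apart from a positive one.
sign(F) ->
    <<S:1, _:63>> = <<F/float>>,
    S.
```

With this, ieee_div:fdiv(1.0, 0.0) returns infinity and ieee_div:fdiv(0.0, 0.0) returns nan, while finite results are returned unchanged.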

Other things to do

  • The type float() should be defined to include the infinite floats.
  • We need another type representing finite floats. Maybe finite_float or float_finite.
  • We need a literal syntax for NaN and +/- Inf as well as a printed representation.

It seems to be a lot of work to implement all of this and we questioned if there are other language features that are more important.
As already mentioned we decided to write an EEP with our suggested solution, then we will see if this is something for OTP 25, 26 or not at all. There might be a simple step in this direction that can support the most important use cases (@josevalim).

@josevalim
Contributor Author

Thank you @KennethL for the summary! 💯 So my understanding is that we will decode them from binaries (with the exception of sNaN) but the existing operators will continue to raise on non-finite values. Some quick thoughts:

  1. For the math module, may I suggest fmath? This will also mirror operators and guards such as fabs, etc.

  2. While experimenting on those ideas for Elixir, I haven't found the literal syntax for infinity and NaNs to be necessary, as long as we can verify those in guards. Especially with NaNs, where there isn't a single value. I think most languages don't provide a literal syntax either.

@bjorng bjorng closed this Jan 11, 2022
@michallepicki
Contributor

As already mentioned we decided to write an EEP with our suggested solution, then we will see if this is something for OTP 25, 26 or not at all.

Has there been any movement on that front? :)

@josevalim
Contributor Author

The latest input I received was that this would have large ramifications. Returning "infinity" or "nan" may well indicate an error for many applications today, and making those changes would cause such cases to silently succeed. Therefore there are no plans to move this forward. :)

@michalmuskala
Contributor

I wonder if perhaps having an extra modifier for floats that would unlock this would be possible, something like

<<X/float-full:64>>

Though I assume the main complexity is in actually having infinities and nan values floating around the codebases, rather than just decoding them.

@garazdawi
Contributor

Yes, it is having NaN/±inf in Erlang code that is the problem.

Specifically it is equality and comparison that is the main hurdle. Now that we have decided that matching -0.0 and +0.0 as the same was a bug, we could make matching a structural comparison and == an arithmetic comparison.

Though problems would still exist, for example if we did lists:usort([#NaN<0>,#NaN<0>]) the return value would be [#NaN<0>,#NaN<0>]. There also might be code that expects A < B orelse A > B orelse A == B to always be true for Erlang terms, which it would not be anymore for #NaN.

It is also a bit odd that a > #Inf, though that one I think we can live with.

It is still a lot of work to fix this in all cases and, as Jose mentions, there is code where 1 / 0 is currently expected to fail and if we change it to not fail it could potentially silently break a lot of code.

So no, there are currently no plans to revisit this.

Labels
team:VM Assigned to OTP team VM