strconv: ParseFloat should accept 'p' notation for binary exponents #12518

Open
kortschak opened this issue Sep 6, 2015 · 25 comments

@kortschak
Contributor

@kortschak kortschak commented Sep 6, 2015

See https://groups.google.com/d/topic/golang-dev/oIB-wBj3ufw/discussion.

The language specification never mentioned binary exponent float representation, but it was previously included in the gc implementation and it is included as a formatting option via strconv.AppendFloat with the 'b' fmt argument. However, it now lives on as a parsing option only in the compiler and test code in strconv.

The capacity to represent exact float values in a clear human-readable way is valuable in numeric code, for example here, where otherwise comments are required to explain the magic hex.

It is not clear how this should be included, since parsing a string is failable at runtime and these values are likely to nearly always be compile time constants.

/cc @griesemer

@griesemer

Contributor

@griesemer griesemer commented Sep 6, 2015

Some comments:

  1. The compiler is using the mechanism to write (export) and read (import) exported float constants, written using the p exponent, because it permits an easy and lossless representation of a float in decimal form (decimal mantissa, exponent to power of 2, but written in decimal form). Note that the need for this format is likely going away since a binary representation of the exported data is more compact, just as precise, and faster to read and write (I'm working on a respective change).

  2. The strconv.AppendFloat representation of the 'b' format requires the bit size of the argument (32 or 64). It simply interprets the mantissa as a large decimal integer and then prints the exponent. For instance, a float64 0.0 in 'b' format is printed as "0p-1074", which is somewhat odd because explaining the result requires an understanding of the 64-bit float format. Similarly, 1.0 is printed as 4503599627370496p-52: the float64 mantissa (53 bits) is interpreted as a decimal integer (4503599627370496, the same as 1<<52) and printed, followed by the exponent (http://play.golang.org/p/Rt0SIFzzHi). Again, the mantissa size is needed to explain the output. (0p0 and 1p0 would be just as valid, but more expensive to derive: a canonical form is the same mantissa with trailing 0's removed and the exponent adjusted.)
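The outputs described in point 2 can be reproduced with strconv.FormatFloat and the 'b' format. This is a minimal sketch; the wrapper name formatB is only for illustration and is not part of strconv.

```go
package main

import (
	"fmt"
	"strconv"
)

// formatB is an illustrative wrapper (not part of strconv) around
// strconv.FormatFloat with the 'b' format. bitSize must be 32 or 64.
func formatB(x float64, bitSize int) string {
	return strconv.FormatFloat(x, 'b', -1, bitSize)
}

func main() {
	fmt.Println(formatB(0.0, 64)) // 0p-1074
	fmt.Println(formatB(1.0, 64)) // 4503599627370496p-52
	fmt.Println(formatB(1.0, 32)) // 8388608p-23
}
```

The same value 1.0 yields a different mantissa and exponent for bit sizes 32 and 64, which is why the bit size is needed to explain the output.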

And some questions:

  1. What are you proposing? (Is this a proposal?)
  2. Are you arguing that this format should be acceptable syntax in the language?
@kortschak

Contributor Author

@kortschak kortschak commented Sep 6, 2015

This is not a proposal, I wanted to sound things out first.

I don't think that it needs to be part of the language; its utility outside numerics is limited. At most, I am thinking of making strconv.ParseFloat handle it.

@griesemer

Contributor

@griesemer griesemer commented Sep 10, 2015

Having strconv.ParseFloat handle the format sounds reasonable to me. I think the next step would be to define the exact format (syntax). It's probably something like:

number = [sign] mantissa 'p' [sign] exponent.
sign = '+' | '-' .
mantissa = decimalDigit {decimalDigit}.
exponent = decimalDigit {decimalDigit}.

Questions:

  • Should both 'p' and 'P' be permitted? Why/why not?
  • Can the mantissa be hexadecimal? Why/why not?
@kortschak

Contributor Author

@kortschak kortschak commented Sep 10, 2015

It seems to me that permitting only 'p' makes sense: it is (marginally) simpler to look for instances, visually and mechanically, when there is only one thing to look for and that thing is visually distinct from a digit (in the decimal case). Optionally allowing a hexadecimal mantissa makes moderate sense, since a bit pattern may be what is being specified.

@griesemer

Contributor

@griesemer griesemer commented Sep 10, 2015

Permitting only 'p' sounds good. Perhaps for a start, also leave out hexadecimal notation. Thus, a float in p notation is essentially a signed integer followed by a 'p' exponent.

Venture to send a change list? (strconv/atof.go)

@kortschak

Contributor Author

@kortschak kortschak commented Sep 10, 2015

Yeah, I'll look into that. Just an initial observation though; it seems that fmt.Scan* handle binary exponent float representations judging by the test cases that exist (though also with dot rather than int-only mantissa). big.Float also handles these cases, but also includes hex input (including non-int).

Before I add to the variety, I'd like to get input on that.

@griesemer

Contributor

@griesemer griesemer commented Sep 10, 2015

I haven't looked at fmt.Scan. Permitting a decimal point for a decimal mantissa is tricky. The point of the 'p' notation is 100% lossless conversion with a fast and simple algorithm. In general that's not true anymore once a decimal point is permitted.

big.Float uses a different format: the mantissa is represented by a hex number which corresponds to the bits after (to the right of) the "decimal" point; that is, the mantissa value m satisfies 0.5 <= m < 1.0. It's essentially used for testing (and could possibly be changed).

Given a sign s (-1, +1), a mantissa m that is simply a decimal unsigned integer, and a binary exponent b, the floating point value x is x = s * m * 2**b . No further explanation needed.

There are design decisions to be made when printing using a binary exponent: the mantissa may be scaled arbitrarily. Currently, printing simply prints the float32/float64 mantissa bits as if they were an int32 or int64 number (with the appropriate exponent). Reproducing this requires knowing the bit size of the type. Another option would be to always print a canonical form; for instance, such that the mantissa is the smallest possible value before requiring a decimal point. That is equivalent to having no trailing 0's in the mantissa (or the mantissa being odd, except for x == 0).

But for parsing it doesn't matter.

More generally: it seems that the strconv conversion routines should parse the numbers that they can print.

@kortschak

Contributor Author

@kortschak kortschak commented Sep 10, 2015

Agreed. Just getting clarification.

@rsc rsc self-assigned this Oct 23, 2015
@rsc rsc added the Thinking label Oct 23, 2015
@rsc

Contributor

@rsc rsc commented Nov 25, 2015

@kortschak, regarding your initial comment:

The capacity to represent exact float values in a clear human-readable way is valuable in numeric code, for example here, where otherwise comments are required to explain the magic hex.

And the code says:

var (
    // dlamchE is the machine epsilon. For IEEE this is 2^-53.
    dlamchE = math.Float64frombits(0x3ca0000000000000)

    // dlamchP is 2 * eps
    dlamchP = math.Float64frombits(0x3cb0000000000000)

    // dlamchS is the "safe min", that is, the lowest number such that 1/sfmin does
    // not overflow. The Netlib code for calculating this number is not correct --
    // it overflows. Found by trial and error, it is equal to (1/math.MaxFloat64) * (1+ 6*eps)
    dlamchS = math.Float64frombits(0x4000000000001)

    ...
)

I want to make the point, unrelated to what we do in strconv, that this is unnecessary in Go. This kind of thing - specifying floating point constants in hexadecimal - is rampant in C because C compilers have historically been quite bad at reading floating point inputs. Using hex was the only way to guarantee the compiler arrived at the number you intended. But modern practice has improved, and Go gets this right. There are any number of ways you could write the above code using plain floating point constants, but the most direct is:

var (
    // dlamchE is the machine epsilon. For IEEE this is 2^-53.
    dlamchE = 1.1102230246251565e-16

    // dlamchP is 2 * eps
    dlamchP = 2.220446049250313e-16

    // dlamchS is the "safe min", that is, the lowest number such that 1/sfmin does
    // not overflow. The Netlib code for calculating this number is not correct --
    // it overflows. Found by trial and error, it is equal to (1/math.MaxFloat64) * (1+ 6*eps)
    dlamchS = 5.56268464626801e-309

    ...
)

This is guaranteed to have the same effect as the math.Float64frombits calls.
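The equivalence can be checked directly by comparing bit patterns. The sketch below reuses the constants from the comment above; sameBits is a helper name introduced only for this illustration.

```go
package main

import (
	"fmt"
	"math"
)

// Decimal constants taken from the comment above.
const (
	dlamchE = 1.1102230246251565e-16
	dlamchP = 2.220446049250313e-16
	dlamchS = 5.56268464626801e-309
)

// sameBits is an illustrative helper: it reports whether x has exactly
// the IEEE 754 bit pattern b.
func sameBits(x float64, b uint64) bool {
	return math.Float64bits(x) == b
}

func main() {
	fmt.Println(sameBits(dlamchE, 0x3ca0000000000000))
	fmt.Println(sameBits(dlamchP, 0x3cb0000000000000))
	fmt.Println(sameBits(dlamchS, 0x4000000000001))
}
```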

@rsc

Contributor

@rsc rsc commented Nov 25, 2015

Postponing the strconv work.

@kortschak

Contributor Author

@kortschak kortschak commented Nov 25, 2015

This misses the point. If we were able to say

var (
    // dlamchE is the machine epsilon. For IEEE this is 2^-53.
    dlamchE = 1p-53 // or package-provided equivalent.

    // dlamchP is 2 * eps
    dlamchP = 2*dlamchE

    // dlamchS is the "safe min", that is, the lowest number such that 1/sfmin does
    // not overflow. The Netlib code for calculating this number is not correct --
    // it overflows. Found by trial and error.
    dlamchS = (1/math.MaxFloat64) * (1+ 6*dlamchE)

    ...
)

then I would agree, but we can't. The capacity to express exact float values is less than half the problem.

@griesemer

Contributor

@griesemer griesemer commented Dec 11, 2015

@kortschak FWIW, in Go we can express 1p-53 quite elegantly as the constant expression 1.0/(1<<53). All the various forms agree with the value computed from the bit pattern: http://play.golang.org/p/VjkVDA8PrL .

Or more generally, any float constant of the form xxxp+exp or xxxp-exp can be expressed as xxx<<exp or xxx.0/(1<<exp) . Thus, the need for the p notation in code is diminished.

@kortschak

Contributor Author

@kortschak kortschak commented Dec 11, 2015

@griesemer Thanks for that tip. This covers our use case better than a strconv parser, though I feel probably the last sentence in #12518 (comment) justifies this addition for other uses.

@rsc

Contributor

@rsc rsc commented Jan 4, 2016

FWIW I agree that since strconv can generate the p form it should also accept the p form. That said, doing so correctly at the boundaries is tricky. It's not a 1-liner.

@rsc

Contributor

@rsc rsc commented Oct 10, 2016

The decision is in my previous comment above: yes, it's fine to do this; just get the corner cases right, please.

@rsc rsc added NeedsFix and removed NeedsDecision Thinking labels Oct 10, 2016
@kortschak

Contributor Author

@kortschak kortschak commented Oct 11, 2016

In looking into this I have found that fmt.Scan does binary exponent format float parsing (as should be expected from the documentation). However, it is less restrictive than the documentation in fmt suggests it should be (for example, this works although the documentation specifies a "decimalless scientific notation"). It is even more relaxed than would be acceptable for strconv.ParseFloat, since the routines backing the scan functions do not error when the exponent is out of range (instead setting the value to ±Inf). So I guess a question here is whether the scan float binary exponent functionality should be backed by a new strconv binary exponent float parser (keeping the behaviour that overflows are silently converted to Infs by discarding the out of range error), and whether the Scan behaviour with decimal mantissas should be brought into the strconv.ParseFloat function (and documented in fmt).

@rsc

Contributor

@rsc rsc commented Oct 21, 2016

The fmt scan overflow to inf behavior for binary exponents is a bug. Note that decimal exponents are handled right: https://play.golang.org/p/SHc0zAdyhx. It's just one more corner case that makes this non-trivial. It would be fine to support 1.2p4 as fmt.Scan does. In fact that's probably important. But it adds more corner cases.

I think we should probably postpone this to Go 1.9 since it's not urgent and there's little time left in Go 1.8. I'm interested to see this happen though.

@griesemer griesemer self-assigned this Oct 21, 2016
@kortschak

Contributor Author

@kortschak kortschak commented Mar 22, 2017

@griesemer

The comment here shows a technique for representing these constants using shifts. However, this only really handles a minority of cases easily; anything that is not a multiple of a power of 2 is difficult and (more troublingly) values where the exponent is greater than the spec minimum integer constant representation width cannot easily be expressed either. An example of this is the value LAPACK DLAMCH("S"), which is 1p-1022 and which cannot be expressed using the expression 1.0 / (1 << 1022), requiring instead 1.0 / (1 << 256) / (1 << 256) / (1 << 256) / (1 << 254) to shoehorn float exponents into the integer model.

@griesemer

Contributor

@griesemer griesemer commented Mar 22, 2017

@kortschak Ack. Are you proposing the p notation in the language?

@kortschak

Contributor Author

@kortschak kortschak commented Mar 22, 2017

At this stage, that's just a data point that I had not noticed before.

The issue here is that I would suppose that only a very small group of Go programmers would need that (and Go compiler authors, who have that in internal code as far as I remember). For the use that we have (1 case in two locations), I am suggesting that we use the longer expression above. It would be nice to be able to express these values simply, but whether that is worth your (pl.) time is not clear to me.

@griesemer

Contributor

@griesemer griesemer commented May 9, 2017

Not happening for 1.9.

@griesemer

Contributor

@griesemer griesemer commented Aug 15, 2017

I think the agreement here is that we accept the 'p' notation for binary exponents with strconv.ParseFloat and eventually update fmt.Scan to match its behavior with respect to error handling or corner cases.

@martisch Is this something you might be interested in looking into (starting with ParseFloat)? If so, feel free to assign this issue to yourself.

@griesemer griesemer changed the title from "strconv: add capacity to parse binary exponent float representations" to "strconv: strconv.ParseFloat should accept 'p' notation for binary exponents" Aug 15, 2017
@martisch martisch self-assigned this Aug 15, 2017
@martisch

Member

@martisch martisch commented Aug 15, 2017

I am interested and added myself.

@odeke-em odeke-em changed the title from "strconv: strconv.ParseFloat should accept 'p' notation for binary exponents" to "strconv: ParseFloat should accept 'p' notation for binary exponents" Mar 5, 2018
@odeke-em

Member

@odeke-em odeke-em commented Mar 5, 2018

@griesemer

Contributor

@griesemer griesemer commented Sep 14, 2018

Moving this off 1.12. This takes a dedicated careful effort. We get to it when we get to it.
