Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement fmt::Binary for f32 and f64 #49066

Closed
wants to merge 1 commit into from

Conversation

varkor
Copy link
Member

@varkor varkor commented Mar 15, 2018

I often find it useful to print f32 and f64 in their binary formats when doing floating point analysis or bit manipulations (e.g. in #48622) and I'm definitely not the only one. Printing their binary representation seems like sensible behaviour for fmt::Binary.

I've opted to print fixed-width (i.e. 32 bits for f32 and 64 bits for f64) because this seems to make the most sense for the primary use cases of printing the floating-point representation, where the bit ranges are very much relevant to the values, unlike for integers.

@rust-highfive
Copy link
Collaborator

r? @bluss

(rust_highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Mar 15, 2018
@hanna-kruppe
Copy link
Contributor

If we do this, there should also be matching fmt::{Upper,Lower}Hex impls.

@varkor
Copy link
Member Author

varkor commented Mar 15, 2018

Sure, I can add those now to give a more complete picture when reviewing.

@clarfonthey
Copy link
Contributor

It makes more sense to actually represent the number in binary, instead of its bits. If I want to show its bits, I'll explicitly call to_bits.

@hanna-kruppe
Copy link
Contributor

I have some sympathy for that position, but note that AFAIK we don't currently have any precedent for how to render a binary float in base 2. For hex there is a relatively established syntax (used in Java for example), but I've never seen anything for base 2.

@varkor
Copy link
Member Author

varkor commented Mar 16, 2018

@clarcharr: although I can understand that viewpoint, I don't think it's the natural way to display a IEEE 754 floating point number (as opposed to, say, Decimal or some other arbitrary-precision floating-point number). Note that it is also inconsistent with the fmt::Binary implementation for signed integers, which also prints the bitwise representation, rather than the base-2 representation.

@clarfonthey
Copy link
Contributor

@rkruppe I mean, there isn't precedent for it, but considering how the exponent for hex literals is a power of two, it seems very logical to do similarly for binary literals. I can't imagine it being implemented any other way, although having a standard hex format print for {:x} and just the raw bits being printed for {:b} seems very inconsistent.

I personally vote to print the actual float for binary, octal, and hex formatting. That said, it might make more sense to have an RFC for adding that to the syntax first...

Either way, considering how this impl would be insta-stable, I'm not comfortable having this be the default when to_bits exists and no larger discussion is had.

@clarfonthey
Copy link
Contributor

@varkor oh, I didn't know that. In that case, it makes sense to make the binary impl print the bits.

I still stand for octal and hex however.

@hanna-kruppe
Copy link
Contributor

@clarcharr

I still stand for octal and hex however.

Do you mean printing the number in base 8 and 16, as opposed to printing the IEEE 754 encoding as {:b} does? Such an inconsistency would be the worst of both worlds IMO.

@varkor
Copy link
Member Author

varkor commented Mar 16, 2018

@clarcharr: arguments for preferring mathematical or bitwise representations could have been made either way originally (I'm not sure what the original motivations were), but currently integers are formatted bitwise with fmt::LowerHex and fmt::UpperHex at the moment. Arguably it makes more sense for integers to be formatted base-16 (e.g. with signs, rather than bitwise) than floating-points, so there's clear precedent. Also, as @rkruppe says, consistency is important, across the different formatting impls.

@tspiteri
Copy link
Contributor

I'm with @clarcharr on this one; I see no value in printing the bitwise representation when it is easy to format!("{:032b}", f.to_bits()). And I think this PR is significant enough to require an RFC.

About consistency with signed integers, it is true that signed integers are formatted bitwise, but at least they consist of a single field where the meaning is much closer to an unsigned integer than the meaning of bits in floating-point numbers. IEEE754 numbers consist of three separate fields which can also complicate things: when you need to examine the bit representation, it almost always makes sense to separate the three fields, and they don't even lie on hex-digit boundaries, so printing the fields independently in hex is very different from printing the whole bitwise representation in hex.

And if consistency with signed integers is cited, then the commit should not include a width and zero-padding; format(":b", -1i8) produces "11111111", but format(":b", 1i8) produces "1" not "00000001".

Also, printing the bitwise representation is more useful in debugging than in displaying; and fmt::Binary and the others are closer to fmt::Display than to fmt::Debug. So maybe this would be more suitable as an extension similar to what RFC 2226 (tracking issue) is for integers.

@varkor
Copy link
Member Author

varkor commented Mar 16, 2018

so printing the fields independently in hex is very different from printing the whole bitwise representation in hex.

I agree here: unless there's a unanimous opinion about how to best format floating points in hex, I'd rather leave them for now. But perhaps @rkruppe had thoughts here?

And if consistency with signed integers is cited, then the commit should not include a width and zero-padding; format(":b", -1i8) produces "11111111", but format(":b", 1i8) produces "1" not "00000001".

Honestly, I think this is a mistake in the design of the Binary implementations for integers, but as it stands, that's a fair point.

and fmt::Binary and the others are closer to fmt::Display than to fmt::Debug

Why do you say this? Printing the 2's-complement representation of an integer seems more like Debug than Display to me.
Regardless, RFC 2226 might be the better route to go down here; that's a good call.

@hanna-kruppe
Copy link
Contributor

I am also leaning towards not implementing fmt::{Binary,LowerHex,UpperHex} at all given that both options have some advantages and downsides, and there's no pressing need to pick either either option.

@varkor
Copy link
Member Author

varkor commented Mar 16, 2018

Out of curiosity, @clarcharr, do you have any concrete examples of when you've wanted to print the decimal expansion of a floating-point number before? Just wondering whether it's just what you would expect the behaviour to be in this case, or whether the other behaviour is useful.

@tspiteri
Copy link
Contributor

And if consistency with signed integers is cited, then the commit should not include a width and zero-padding; format(":b", -1i8) produces "11111111", but format(":b", 1i8) produces "1" not "00000001".

Honestly, I think this is a mistake in the design of the Binary implementations for integers, but as it stands, that's a fair point.

I also think that the current behaviour is a mistake. My intention was not to say that the floats should be printed in less than 32/64 bits, but that the consistency argument is flawed, especially when considering that floats are very different from integers.

and fmt::Binary and the others are closer to fmt::Display than to fmt::Debug

Why do you say this? Printing the 2's-complement representation of an integer seems more like Debug than Display to me.

I always had the impression that all the formatting traits except Debug are meant to print something user-facing, and only Debug is there for debugging. For example, if it were only for debugging, why would there be both LowerHex and UpperHex? That is basically where I don't particularly like the behaviour of Binary and the other trait implementations for signed primitives as bitwise representations, but that's now done, and probably it was done that way to be consistent with printf("%x %X", ...), so there are probably valid reasons for the decisions taken.

In conclusion, I'm wary of implementing Binary (and maybe other formatting traits) on floats since there is no one obvious output that would be given. And the workaround for your use case is really simple, either use to_bits(), or else use a formatting wrapper and implement the formatting traits on that. If there are enough users of one such kind of wrapper, then it would make sense to consider the implementations in std.

@clarfonthey
Copy link
Contributor

clarfonthey commented Mar 16, 2018

@varkor The US' outdated system of measurement often measures in binary (eg. 5/16 inches) and while I've never personally written Rust code to deal with this it's not an unusual representation to try and figure out what 0.2 is in a binary-float expansion.

In terms of debugging, I personally find a binary float representation to be more fruitful than a raw-bit version. Knowing that -0.1 is:

10111101110011001100110011001101

instead of:

-10011001100110011001101p-4

Is honestly less useful to me. I can read the latter as -0.000(1001)... whereas I have to mentally read the first bit as a sign, the next eight bits as an exponent (which, mind you, I have to convert from binary to decimal to understand), and then the rest as the actual mantissa.

And, if I want to construct the actual IEEE754 representation, I can encode the sign, exponent, and mantissa back quite easily.

Thinking this over again, this in fact doesn't go against the fact that Binary for integers shows two's complement instead of a signed number. For floats, because the exponent and sign are 100% separate, you're still technically displaying the underlying representation, just, in a way more people can read.

@varkor
Copy link
Member Author

varkor commented Mar 16, 2018

For floats, because the exponent and sign are 100% separate, you're still technically displaying the underlying representation, just, in a way more people can read.

Readability of the components is a good point. What do you think about a space or underscore separated format, e.g. 1_01111011_10011001100110011001101 — still true to a "binary" formatting, but significantly more human-readable too. This would be just as useful to me as the initial implementation, but more useful than no implementation at all.

@tspiteri
Copy link
Contributor

tspiteri commented Mar 16, 2018

I hacked a couple of quick wrappers (playground link) for the two possibilities. I only implemented Binary and LowerHex, and did not do any padding, and did not pick good names; but I did handle all 5 FP categories. I cannot find a single case where I prefer the pure bitwise representation.

@clarfonthey
Copy link
Contributor

Yeah, not a very hard choice; the expansion looks way better than the raw version.

@varkor
Copy link
Member Author

varkor commented Mar 16, 2018

An expansion is not helpful for my use cases (which render the aesthetics moot), so I'll just close the PR, as if there's not general agreement, there's not strong motivation to settle this question here. While I'd prefer to see some more examples of where a binary expansion is desirable, you're right in that it's not difficult to print the bitwise representation without this implementation, so it's not pressing.

@varkor varkor closed this Mar 17, 2018
@varkor varkor deleted the floating-point-binary-format branch March 17, 2018 21:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-review Status: Awaiting review from the assignee but also interested parties.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants