Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[locale][num_get] Improve Stage 2 of string to float conversion #65168

Closed

Conversation

ldionne
Copy link
Member

@ldionne ldionne commented Sep 1, 2023

https://timsong-cpp.github.io/cppwp/n4140/facet.num.get.virtuals#3.2

"Stage 2" of num_get::do_get() depends on "a check ... to determine if c is allowed as the next character of an input field of the conversion specifier returned by Stage 1". Previously this was a very simple check whether the next character was in a set of allowed characters. This could lead to Stage 2 accumulating character sequences such as "1.2f" and passing them to strtold (Stage 3).
https://timsong-cpp.github.io/cppwp/n4140/facet.num.get.virtuals#3.3.3

Stage 3 can fail, however, if the entire character sequence from Stage 2 is not used in the conversion. For example, the "f" in "1.2f" is not used. https://timsong-cpp.github.io/cppwp/n4140/facet.num.get.virtuals#3.3.4

As a result, parsing a sequence like "1.2f" would return value 0.0 with failbit set.

This change improves the checks made in Stage 2, determining what is passed to Stage 3.

  • Hex digits are only considered valid if "0x" has been seen

  • INFINITY value is recognised

  • Characters in INFINITY and NAN are only valid in sequence. This is done by checking one character backwards, which has obvious limitations.

  • New tests are added. The old ones are preserved but refactored.

Differential Revision: https://reviews.llvm.org/D99091

@ldionne ldionne added the libc++ libc++ C++ Standard Library. Not GNU libstdc++. Not libc++abi. label Sep 1, 2023
@ldionne ldionne self-assigned this Sep 1, 2023
https://timsong-cpp.github.io/cppwp/n4140/facet.num.get.virtuals#3.2

"Stage 2" of num_get::do_get() depends on "a check ... to determine if c is
allowed as the next character of an input field of the conversion specifier
returned by Stage 1". Previously this was a very simple check whether the next
character was in a set of allowed characters.  This could lead to Stage 2
accumulating character sequences such as "1.2f" and passing them to strtold
(Stage 3).
https://timsong-cpp.github.io/cppwp/n4140/facet.num.get.virtuals#3.3.3

Stage 3 can fail, however, if the entire character sequence from Stage 2 is not
used in the conversion. For example, the "f" in "1.2f" is not used.
https://timsong-cpp.github.io/cppwp/n4140/facet.num.get.virtuals#3.3.4

As a result, parsing a sequence like "1.2f" would return value 0.0 with failbit
set.

This change improves the checks made in Stage 2, determining what is passed to
Stage 3.

* Hex digits are only considered valid if "0x" has been seen

* INFINITY value is recognised

* Characters in INFINITY and NAN are only valid in sequence. This is done by
  checking one character backwards, which has obvious limitations.

* New tests are added. The old ones are preserved but refactored.

Differential Revision: https://reviews.llvm.org/D99091
@ldionne ldionne force-pushed the wip/num_get-string-to-float-conversion branch from a69e54d to 18800a8 Compare September 1, 2023 13:24
@EricWF
Copy link
Member

EricWF commented Sep 6, 2023

It seems possible to ship this without breaking the ABI. And assuming the new behavior doesn't start to reject meaningful previously accepted code then I think we should ship it retroactively.

@frederick-vs-ja
Copy link
Contributor

LWG2381 (not implemented in libc++ yet) intendedly rejects Inf and NaN. Should we restrict the handling of Inf and NaN to some extension mode?

@mordante
Copy link
Member

It seems possible to ship this without breaking the ABI. And assuming the new behavior doesn't start to reject meaningful previously accepted code then I think we should ship it retroactively.

I also don't see why this has to be an ABI break. What is the reason?

LWG2381 (not implemented in libc++ yet) intendedly rejects Inf and NaN. Should we restrict the handling of Inf and NaN to some extension mode?

IMO we should just implement the LWG issue and not have an extension mode.

@ldionne
Copy link
Member Author

ldionne commented Jan 12, 2024

I'm more than happy to drop this PR if it's not valid anymore. I was just trying to salvage it from https://reviews.llvm.org/D99091, but honestly I'm not very fluent in this part of the library so progress has been almost non-existent, and this is a bit low on my priority list. I'll drop this.

@ldionne ldionne closed this Jan 12, 2024
@ldionne ldionne deleted the wip/num_get-string-to-float-conversion branch January 12, 2024 19:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
libc++ libc++ C++ Standard Library. Not GNU libstdc++. Not libc++abi.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants