[locale][num_get] Improve Stage 2 of string to float conversion #65168

ldionne · 2023-09-01T13:13:40Z

https://timsong-cpp.github.io/cppwp/n4140/facet.num.get.virtuals#3.2

"Stage 2" of num_get::do_get() depends on "a check ... to determine if c is allowed as the next character of an input field of the conversion specifier returned by Stage 1". Previously this was a very simple check whether the next character was in a set of allowed characters. This could lead to Stage 2 accumulating character sequences such as "1.2f" and passing them to strtold (Stage 3).
https://timsong-cpp.github.io/cppwp/n4140/facet.num.get.virtuals#3.3.3

Stage 3 can fail, however, if the entire character sequence from Stage 2 is not used in the conversion. For example, the "f" in "1.2f" is not used. https://timsong-cpp.github.io/cppwp/n4140/facet.num.get.virtuals#3.3.4

As a result, parsing a sequence like "1.2f" would return value 0.0 with failbit set.

This change improves the checks made in Stage 2, determining what is passed to Stage 3.

Hex digits are only considered valid if "0x" has been seen
INFINITY value is recognised
Characters in INFINITY and NAN are only valid in sequence. This is done by checking one character backwards, which has obvious limitations.
New tests are added. The old ones are preserved but refactored.

Differential Revision: https://reviews.llvm.org/D99091

https://timsong-cpp.github.io/cppwp/n4140/facet.num.get.virtuals#3.2 "Stage 2" of num_get::do_get() depends on "a check ... to determine if c is allowed as the next character of an input field of the conversion specifier returned by Stage 1". Previously this was a very simple check whether the next character was in a set of allowed characters. This could lead to Stage 2 accumulating character sequences such as "1.2f" and passing them to strtold (Stage 3). https://timsong-cpp.github.io/cppwp/n4140/facet.num.get.virtuals#3.3.3 Stage 3 can fail, however, if the entire character sequence from Stage 2 is not used in the conversion. For example, the "f" in "1.2f" is not used. https://timsong-cpp.github.io/cppwp/n4140/facet.num.get.virtuals#3.3.4 As a result, parsing a sequence like "1.2f" would return value 0.0 with failbit set. This change improves the checks made in Stage 2, determining what is passed to Stage 3. * Hex digits are only considered valid if "0x" has been seen * INFINITY value is recognised * Characters in INFINITY and NAN are only valid in sequence. This is done by checking one character backwards, which has obvious limitations. * New tests are added. The old ones are preserved but refactored. Differential Revision: https://reviews.llvm.org/D99091

EricWF · 2023-09-06T19:11:42Z

It seems possible to ship this without breaking the ABI. And assuming the new behavior doesn't start to reject meaningful previously accepted code then I think we should ship it retroactively.

frederick-vs-ja · 2023-09-20T01:40:50Z

LWG2381 (not implemented in libc++ yet) intendedly rejects Inf and NaN. Should we restrict the handling of Inf and NaN to some extension mode?

mordante · 2023-12-24T14:47:29Z

It seems possible to ship this without breaking the ABI. And assuming the new behavior doesn't start to reject meaningful previously accepted code then I think we should ship it retroactively.

I also don't see why this has to be an ABI break. What is the reason?

LWG2381 (not implemented in libc++ yet) intendedly rejects Inf and NaN. Should we restrict the handling of Inf and NaN to some extension mode?

IMO we should just implement the LWG issue and not have an extension mode.

ldionne · 2024-01-12T19:56:47Z

I'm more than happy to drop this PR if it's not valid anymore. I was just trying to salvage it from https://reviews.llvm.org/D99091, but honestly I'm not very fluent in this part of the library so progress has been almost non-existent, and this is a bit low on my priority list. I'll drop this.

ldionne added the libc++ libc++ C++ Standard Library. Not GNU libstdc++. Not libc++abi. label Sep 1, 2023

ldionne self-assigned this Sep 1, 2023

ldionne force-pushed the wip/num_get-string-to-float-conversion branch from a69e54d to 18800a8 Compare September 1, 2023 13:24

frederick-vs-ja mentioned this pull request Jan 12, 2024

[libc++] LWG2381: Inconsistency in parsing floating point numbers #77948

Open

ldionne closed this Jan 12, 2024

ldionne deleted the wip/num_get-string-to-float-conversion branch January 12, 2024 19:56

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[locale][num_get] Improve Stage 2 of string to float conversion #65168

[locale][num_get] Improve Stage 2 of string to float conversion #65168

ldionne commented Sep 1, 2023

EricWF commented Sep 6, 2023

frederick-vs-ja commented Sep 20, 2023

mordante commented Dec 24, 2023

ldionne commented Jan 12, 2024

[locale][num_get] Improve Stage 2 of string to float conversion #65168

[locale][num_get] Improve Stage 2 of string to float conversion #65168

Conversation

ldionne commented Sep 1, 2023

EricWF commented Sep 6, 2023

frederick-vs-ja commented Sep 20, 2023

mordante commented Dec 24, 2023

ldionne commented Jan 12, 2024