Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Runtime: fix hex float parsing with underscore in exponent #2296

Merged
merged 11 commits into from Apr 5, 2019

Conversation

@hhugo
Copy link
Contributor

commented Mar 7, 2019

Following the format described in https://caml.inria.fr/pub/docs/manual-ocaml/lex.html.

Note that the ocaml lexer is more restrictive than float_of_string. In particular, float_of_string accepts leading underscore in the exponent which is rejected in the ocaml lexer.

TODO:

  • add some tests with null bytes in the input
  • add some tests showing that underscores are accepted everywhere.
@hhugo hhugo force-pushed the hhugo:fix-float-of-string branch from 697bcad to 13435b5 Mar 7, 2019
@hhugo

This comment has been minimized.

Copy link
Contributor Author

commented Mar 8, 2019

Here is what would happen before this PR.

$ echo "0x1p1_1" > a.ml && ocamlc a.ml
Fatal error: exception Failure("float_of_string")
Raised by primitive operation at file "bytecomp/symtable.ml", line 209, characters 42-61
Called from file "bytecomp/symtable.ml", line 233, characters 39-55
Called from file "list.ml", line 106, characters 12-15
Called from file "bytecomp/symtable.ml", line 232, characters 2-87
Called from file "bytecomp/bytelink.ml", line 367, characters 14-47
Re-raised at file "bytecomp/bytelink.ml", line 386, characters 4-11
Called from file "driver/main.ml", line 199, characters 6-67
Re-raised at file "parsing/location.ml", line 479, characters 14-25
Re-raised at file "parsing/location.ml", line 479, characters 14-25
Re-raised at file "parsing/location.ml", line 479, characters 14-25
Re-raised at file "parsing/location.ml", line 479, characters 14-25
Re-raised at file "parsing/location.ml", line 479, characters 14-25
Re-raised at file "parsing/location.ml", line 479, characters 14-25
Called from file "parsing/location.ml" (inlined), line 484, characters 31-61
Called from file "driver/main.ml", line 203, characters 4-35
Called from file "driver/main.ml", line 207, characters 2-9
EXIT STATUS 2
@xavierleroy

This comment has been minimized.

Copy link
Contributor

commented Mar 11, 2019

I was wondering whether we really want underscores in exponents of FP numbers, but it's probably too late to change our minds about this, as they've been supported for ages in the case of decimal FP notation, e.g.

# 1e3_00;;
- : float = 1e+300
#  float_of_string "1e3_00";;
- : float = 1e+300

I still find the "copy string while removing underscores" code inelegant, but it's better than crashing the compiler.

@gasche

This comment has been minimized.

Copy link
Member

commented Mar 11, 2019

Could we have a fast path that does not do anything if no _ are present? If we start by counting the number of _, this may simplify the size-calculation logic in the slow path as well.

@hhugo

This comment has been minimized.

Copy link
Contributor Author

commented Mar 12, 2019

I still find the "copy string while removing underscores" code inelegant, but it's better than crashing the compiler.

This implementation was copied and adapted from caml_float_of_string.

Regarding the fast path, based on the following point, I'm tempted to do nothing.

  • The non-hexadecimal path already does the "copy string while removing underscores"
  • Hexadecimal floats are pretty uncommon
@xavierleroy

This comment has been minimized.

Copy link
Contributor

commented Mar 12, 2019

I'm not too concerned about speed, more concerned about code duplication. Perhaps float_of_string should always remove underscores first, then handle decimal and hex floats separately. I'll try to read the whole code again to form an opinion.

@gasche

This comment has been minimized.

Copy link
Member

commented Mar 12, 2019

It may have been a mistake to talk of fast/slow path, but I think having a direct path in the simple case is also a win for correctness. You have modified more than 10 lines of C code doing index computations, read and write in a fixed-size array, so that's above the limit where I feel reasonably confident that nothing goes wrong by just eyeballing the code. On the other hand, I could check a "simple path" implementation to easily become very confident that all existing use-cases (without underscores) are completely untouched by the change -- and thus be more at ease when reading the "complex path" code.

@hhugo

This comment has been minimized.

Copy link
Contributor Author

commented Mar 13, 2019

I gave a try at @gasche's suggestion. See last commits

runtime/floats.c Outdated Show resolved Hide resolved
@hhugo

This comment has been minimized.

Copy link
Contributor Author

commented Mar 13, 2019

float_of_string "0x1\000a" currently returns some result. With this PR, one would raise an exception, similar to what happen with the decimal notation.

hhugo and others added 6 commits Mar 14, 2019
Just remove underscores before parsing in all cases (hex and dec).
@xavierleroy

This comment has been minimized.

Copy link
Contributor

commented Mar 23, 2019

Your code was getting too complicated for me to review, so I took the liberty of pushing straight onto your branch the simpler fix I had in mind, namely remove underscores first, then parse as hex or as decimal. Again, I'm not worried about the possible run-time costs, but I am concerned about code complexity.

I'll give this PR a round of CI precheck next.

@gasche
gasche approved these changes Mar 23, 2019
Copy link
Member

left a comment

I reviewed @xavierleroy's new implementation and I believe that it is correct -- sorry for not noticing the simplification myself.

Optionally, the whole code could be further simplified/streamlined by:

  • merging the return continuations of the of hexademical and decimal cases (currently the caml_stat_free and caml_copy_double logics are duplicated)
  • moving the decimal logic (do we have strod_l or not?) to its own caml_float_of_dec function, so that the two branches are at the same abstraction level within the main function.
@hhugo

This comment has been minimized.

Copy link
Contributor Author

commented Mar 23, 2019

the simpler fix I had in mind, namely remove underscores first, then parse as hex or as decimal.

My last was doing that already, no?

merging the return continuations of the of hexademical and decimal cases

This simplication was part of my last version

@xavierleroy

This comment has been minimized.

Copy link
Contributor

commented Mar 23, 2019

@hhugo: reviewing code is harder than writing it, and your latest diff was so large that I certainly won't review it. From this point on, I'll write the code. Please don't feel offended, I'm just trying to use my time as efficiently as I can.

@xavierleroy

This comment has been minimized.

Copy link
Contributor

commented Mar 23, 2019

This said, both @hhugo's and my latest code accept things like _0x1p2 or 0_x1p2, which can be questioned.

Refactor code as suggested during review.
@xavierleroy

This comment has been minimized.

Copy link
Contributor

commented Apr 4, 2019

I give up on the idea of rejecting e.g. 0_x1p2. Let's be tolerant in what we accept...

I just pushed a cleanup of the code, with more sharing of continuations as suggested by @gasche.

@hhugo

This comment has been minimized.

Copy link
Contributor Author

commented Apr 5, 2019

@xavierleroy, should we be tolerant in what we accept for ints as well?

@xavierleroy

This comment has been minimized.

Copy link
Contributor

commented Apr 5, 2019

should we be tolerant in what we accept for ints as well?

For me a reasonable goal for int_of_string and float_of_string is that they match OCaml's lexical conventions for integer literals and FP literals. For string_of_float we accept more than just OCaml FP literals, because we don't control what strtod does. Checking the string before calling strtod would be a lot of code for a small benefit.

But for int_of_float I think the current implementation accepts all OCaml integer literals and only OCaml integer literals. I'm not sure we want to accept more. Did you have something specific in mind?

@gasche
gasche approved these changes Apr 5, 2019
Copy link
Member

left a comment

I have looked at the current patch and I believe that the code is correct.

@hhugo

This comment has been minimized.

Copy link
Contributor Author

commented Apr 5, 2019

@xavierleroy , nothing specific in mind. I'm fine with the current diff.

@xavierleroy xavierleroy merged commit 17d7479 into ocaml:trunk Apr 5, 2019
1 of 2 checks passed
1 of 2 checks passed
continuous-integration/travis-ci/pr The Travis CI build failed
Details
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
@xavierleroy

This comment has been minimized.

Copy link
Contributor

commented Apr 5, 2019

OK, this is merged in trunk. Should it be cherry-picked to 4.08? @gasche and I have cold feet (because it is very late in 4.08's release cycle) but @damiendoligez can decide.

@hhugo hhugo deleted the hhugo:fix-float-of-string branch Apr 5, 2019
@damiendoligez

This comment has been minimized.

Copy link
Member

commented Apr 8, 2019

@hhugo how did you find this bug? I have a hard time believing that it's big enough to justify cherry-picking.

@hhugo

This comment has been minimized.

Copy link
Contributor Author

commented Apr 8, 2019

I discovered the bug more or less randomly. I would not push hard to have a fix in 4.08.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants
You can’t perform that action at this time.