Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Float() parsing innacuracy #6480

Closed
Einscrew opened this issue Dec 1, 2020 · 5 comments
Closed

Float() parsing innacuracy #6480

Einscrew opened this issue Dec 1, 2020 · 5 comments

Comments

@Einscrew
Copy link

Einscrew commented Dec 1, 2020

Environment Information

I'm running jruby 9.2.13.0 (2.5.7) 2020-08-03 9a89c94bcc OpenJDK 64-Bit Server VM 25.252-b09 on 1.8.0_252-b09 +jit [linux-x86_64] from logstash (although I don't think this is a logstash specific problem) on a Linux 3.10.0-1062.el7.x86_64 #1 SMP Wed Aug 7 18:08:02 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

Here's a side-by-side comparison between ruby and jruby, I think the expected behavior should be the later:

  • ruby -e 'v="661e7086-33af-11eb-a4ef-fa163ec4aef9"; puts(Float(v) rescue v)' returns 661e7086-33af-11eb-a4ef-fa163ec4aef9
  • jruby -e 'v="661e7086-33af-11eb-a4ef-fa163ec4aef9"; puts(Float(v) rescue v)' returns Infinity

I think jruby is parsing "661e7086" from the original the string and interpreting as scientific notation. Although more robust, I don't think is a good behaviour in general (imho jruby should follow ruby).

Thanks a lot for the attention!

@enebo
Copy link
Member

enebo commented Dec 1, 2020

Reduced this a little more to:

Float("661e7086FROGGER")

So at first glance it appears that we ignore extra data after finding a valid float value on the front of the string and MRI will raise an ArgumentError because there is invalid extra crud after the float value. I half wonder if Ruby ever did what we did or not?

@enebo
Copy link
Member

enebo commented Dec 1, 2020

Some notes...

We actually lean on JVM Double.parseDouble BUT we pre-process the string first to account for things Java does not (or at least did not up to a point) like using _.

This processing happens to abort when seeing '6' of 7086 thinking as an exponent it will be too large a value. So this is how we get Inifinity. It never even makes it to the Java double parsing. If it did then we would end up with a NumberFormatException and match MRI.

At some point we should just properly make a double processor and not perform this pre-processing. It would be quicker and ultimately it be a little less confusing. It is also a fairly large and icky job.

For now, I might add some logic to look past the first three digits as part of tooLargeExponent() in ConvertDouble.java. If it sees a non-numeric value in the rest of the string it will just abort. In this case it should find a '-'.

The original String value is quite entertaining because all the characters after the initial float string are actually all value characters that can occur within a float number.

@enebo
Copy link
Member

enebo commented Dec 1, 2020

I actually am mistaken in my last comment this pre-processing does not have 0x float values so any non-[0-9] would be recognized as not valid.

@enebo enebo added this to the JRuby 9.2.14.0 milestone Dec 1, 2020
@enebo
Copy link
Member

enebo commented Dec 1, 2020

Targetting 9.2.14 which we plan on releasing soon since the fix I used is very very narrow so the risk is minimal. This new logic only occurs in a place where we think we have too many digits in an exponent. Worst case a break in logic would not cause Infinity to happen. There is not logical flaw here though 🥇 :)

Also ruby/spec does actually adequately test garbage after a float. Our failure is just a weird wrinkle in how we implement things so I am adding something to our regression specs.

enebo added a commit that referenced this issue Dec 1, 2020
@enebo enebo closed this as completed Dec 1, 2020
@Einscrew
Copy link
Author

Einscrew commented Dec 2, 2020

Thanks a lot for the very fast action!
I'm glad I could contribute somehow!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants