New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix #1559: Improve spec compliance when parsing IEEE754 strings. #1703
Fix #1559: Improve spec compliance when parsing IEEE754 strings. #1703
Conversation
3341e79
to
74cb4f3
Compare
Travis CI appears to be having a bad day, so I am having a bad day. To the best that I can tell the failures are internal to Travis and have |
74cb4f3
to
3762f50
Compare
rebased proactively. Travis CI is all green. |
errno.errno = 0 | ||
var res = f(cstr, end) | ||
|
||
def exceptionMsg = s"For input string \042${s}\042" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def exceptionMsg = s"For input string \042${s}\042" | |
def exceptionMsg = s"For input string \"$s\"" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
" not working in interpolated strings is a problem still a problem in many scala versions. It is a problem
in the Scala 2.11 currently used by SN and in the pending Scala 2.12. I believe that it is unfixed in
Scala 2.13.2. It may get fixed in 2.13.3 (or .n).
To reduce the donkeying around, I re-wrote the line to achieve the same intent but not rely upon
string interpolation. I also added an informative section documenting the bug I was working around.
I wanted to steer far clear of octal constants (\042), which work, and Unicode constants (\u0022), which
fail because they are translated too early.
// The need to use \042 for double quote seems to be a Scala 2.11 bug. | ||
// Uglier workarounds exist. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment here does not make sense. Should it apply to def exceptionMsg
instead? If yes, is it still relevant?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it is still relevant but looks like it is in the wrong/a_confusing place after exceptionMsg got factored out.
I moved the comment to just before exceptionMsg and documented the bug & need for workaround.
// Uglier workarounds exist. | ||
throw new NumberFormatException(exceptionMsg) | ||
} | ||
// Else strtod() or strtof() will have returned the proper type for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"returned the proper type"? I don't understand.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I reworded it. key change is s/type/value/. Also tried to remove some mystification about the return type out of
that if branch. After a year, I could not follow it, so I figured others might have problems also.
// with or without an initial 'D', 'd', 'F', 'f' size indicator | ||
// is allowed. | ||
// The whitespace can include tabs ('\t') and even newlines! ('\n'). | ||
// Perhaps even Unicode.... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Perhaps" is not very precise.
The JavaDoc has a very clear definition of what's allowed at https://docs.oracle.com/javase/8/docs/api/java/lang/Double.html#valueOf-java.lang.String-
There is even a regex for valid inputs, which we also use in Scala.js:
https://github.com/scala-js/scala-js/blob/0e8fe18808a10887db073cf33c6211721a064e5b/javalanglib/src/main/scala/java/lang/Double.scala#L75-L85
so whitespace chars are the ones in the range \u0000
to \u0020
included, and only those.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed the "Perhaps" line. I have learned more about Unicode and how Java & Scala define & parse whitespace over the year since this PR was created.
I followed the valueOf URL you kindly provided and added a reference to
javase/8/docs/api/java/lang/String.html#trim-- to the comment.
The PR is was explicitly crafted to use the Java String.trim method to ensure consistency.
Thank you for picking up on the less than stellar comment. Someone may have to follow this code someday.
def vetIEEE754Tail(tail: String): Boolean = { | ||
|
||
if (tail.length <= 0) { | ||
true | ||
} else { | ||
val rest = tail.tail.trim | ||
(tail(0): @scala.annotation.switch) match { | ||
case 'D' | 'd' => vetIEEE754Tail(rest) | ||
case 'F' | 'f' => vetIEEE754Tail(rest) | ||
case _ => (tail.trim.length == 0) | ||
} | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we know that chars in the range [0-32] are admissible, we can easily write this without trim
, not causing any more allocation than necessary:
private def vetIEEE754Tail(tail: String): Boolean = {
val len = tail.length
if (len == 0) true
else {
var i = 0
if ((tail.charAt(0) & 0xdd) == 0x44) // magic: is first char one of D d F f
i = 1
while (i != len) {
if ((tail.charAt(i) & 0xff) > ' ')
return false
i += 1
}
true
}
}
And to avoid even allocating the substring at call site, we can take the start index instead of assuming that we have to start at 0:
private def vetIEEE754Tail(tail: String, start: Int): Boolean = {
val len = tail.length
if (len == start) true
else {
var i = start
if ((tail.charAt(i) & 0xdd) == 0x44) // magic: is first char one of D d F f
i += 1
while (i != len) {
if ((tail.charAt(i) & 0xff) > ' ')
return false
i += 1
}
true
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I extensively re-worked the implementation based on review comments, sample code, the review
concern with minimizing allocations and a years worth of experience & understanding.
The idea was to do the math in the C and C byte realm rather than improperly conflating Scala/Java Strings.
I also added a few unit-test tests for corner cases which I found tricky or hard to remember.
} else if ((!end(0) == 0) || | ||
vetIEEE754Tail(s.slice((!end - cstr).toInt, s.length))) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Testing that !end(0) == 0
is not correct, because there might be a \u0000
followed by non-whitespace characters in the original string. With the suggested definition of vetIEEE754Tail
, we can simply do:
} else if ((!end(0) == 0) || | |
vetIEEE754Tail(s.slice((!end - cstr).toInt, s.length))) { | |
} else if (vetIEEE754Tail(s, (!end - cstr).toInt)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Understood. My mind was in the C world and I did not consider perverse Java string.
Gone in current iteration. Thank you for the catch.
throw new NumberFormatException(exceptionMsg) | ||
} else if ((!end(0) == 0) || | ||
vetIEEE754Tail(s.slice((!end - cstr).toInt, s.length))) { | ||
res |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is dead code. We return res
below. You can leave an empty block, or, better, invert the condition of the else if
just above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gone in current iteration.
val epsilon = 0.0001 | ||
assert(Math.abs(Double.parseDouble("6.66D") - 6.66) < epsilon, "a8") | ||
|
||
// Java allows trailing whitespace, including tabs. | ||
assert(Math.abs(Double.parseDouble("6.66D\t ") - 6.66) < epsilon, "a9") | ||
|
||
assert(Math.abs(Double.parseDouble("6.66d") - 6.66) < epsilon, "a10") | ||
|
||
assert(Math.abs(Double.parseDouble("7.77F") - 7.77) < epsilon, "a11") | ||
assert(Math.abs(Double.parseDouble("7.77f") - 7.77) < epsilon, "a12") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
parseDouble
has a very precise definition of its result. We should test it with a strong ==
, not with an epsilon
difference.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add tests with a \u0000
character, such as "7.77\u0000 "
(should succeed) and "7.77\u0000a"
(should fail).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In both DoubleSuite.scala & FloatSuite.scala:
epsilon
comparisons changed to strong==
test as recommended.- Requested tests added to each file.
I also fixed a copy & paste error in FloatSuite.scala where Double.parseDouble() was being called instead of
the correct Float.parseFloat().
…E754 strings. * This PR fixes Issue scala-native#1559 "toFloat/toDouble throw `NumberFormatException` when given trailing designation". My thanks to @teodimoff for detecting & reporting this issue. Strings such as "3.14f".toDouble and "3.14D".toFloat are now accepted in the same way they are in Java 8. * This PR supersedes PR scala-native#1638, which I will now close. I had forgotten that I had already fixed the issue two months ago, duh! Upon comparing the code, I like the unified parseIEEE754 approach better because it is DRY. I rolled the more Java 8 compatible exception message from PR scala-native#1638 forward into this one. The parseIEEE754 code handles close-to-zero and +/- Infinity better. * Java 8 allows contiguous whitespace in the after/to_the_left_of the IEEE754 value. This whitespace can be blanks, tabs, newlines and a whole host of characters. As of this PR, Scala Native follows the Java 8 practice. I am not sure if Unicode is allowed, so I used the Java string trim method and let Java figure it out. * I also fixed the way the parse handled values too close to zero or too far from zero in either positive or negative directions. These cases now return the expected values of 0.0, +Infinity, and -Infinity, respectively. * Any NumberFormatException thrown now uses the Java 8 idiom of echoing the input string within quotes: 'For input string: "+9z.93"'. This allows one to detect trailing whitespace. * Test cases were added to DoubleSuite.scala & FloatSuite.scala for the conditions covered by this PR. Test cases for parsing hexadecimal strings were also added to improve coverage. Documentation: * The standard changelog entry is requested. Testing: * Built and tested ("test-all") in release-fast mode using sbt 1.2.8 on X86_64 only . All tests pass.
3762f50
to
79293a2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Looks perfect.
…trings. (scala-native#1703) The JavaDoc says that those methods allow contiguous whitespace in the after/to_the_left_of the IEEE754 value. Whitespace is defined as what `trim()` would removed, i.e., characters in the range `\u0000` to `\u0020`, included. This commit also fixes the way the parse handled values too close to zero or too far from zero in either positive or negative directions. These cases now return the expected values of 0.0, +Infinity, and -Infinity, respectively. Test cases were added to DoubleSuite.scala and FloatSuite.scala for the conditions covered by this commit. Test cases for parsing hexadecimal strings were also added to improve coverage.
This PR fixes Issue toFloat/toDouble throw
NumberFormatException
when given trailing designation. #1559 "toFloat/toDouble throwNumberFormatException
when given trailing designation".
My thanks to @teodimoff for detecting & reporting this issue.
Strings such as "3.14f".toDouble and "3.14D".toFloat are now accepted in
the same way they are in Java 8.
Also fixes, in passing, Scala Native PR NumberFormatException instead of
Infinity
or-Infinity
#1836. Added test case from that Issue.This PR supersedes PR Fix #1559: Float & Double Strings with trailing type designation now parse #1638, which I will now close. I had forgotten
that I had already fixed the issue two months ago, duh!
Upon comparing the code, I like the unified parseIEEE754 approach better
because it is DRY.
I rolled the more Java 8 compatible exception message from PR Fix #1559: Float & Double Strings with trailing type designation now parse #1638
forward into this one.
The parseIEEE754 code handles close-to-zero and +/- Infinity
better.
Java 8 allows contiguous whitespace in the after/to_the_left_of
the IEEE754 value. This whitespace can be blanks, tabs, newlines
or any character between 0 (NUL) and 0x20 (space).
I also fixed the way the parse handled values too close to zero
or too far from zero in either positive or negative directions.
These cases now return the expected values of 0.0, +Infinity, and
-Infinity, respectively.
Any NumberFormatException thrown now uses the Java 8 idiom of
echoing the input string within quotes: 'For input string: "+9z.93"'.
This allows one to detect trailing whitespace.
Test cases were added to DoubleSuite.scala & FloatSuite.scala for
the conditions covered by this PR. Test cases for parsing
hexadecimal strings were also added to improve coverage.
Documentation:
Testing:
X86_64 only . All tests pass.