Character.digit is too restrictive in what it considers digits #1189

martijnhoekstra · 2018-03-07T12:17:57Z

The scala native implementation of Character.digit excludes the following digit ranges that OpenJDK does consider digits:

660 (ARABIC-INDIC DIGIT ZERO) - 669 (ARABIC-INDIC DIGIT NINE)
6f0 (EXTENDED ARABIC-INDIC DIGIT ZERO) - 6f9 (EXTENDED ARABIC-INDIC DIGIT NINE)
7c0 (NKO DIGIT ZERO) - 7c9 (NKO DIGIT NINE)
966 (DEVANAGARI DIGIT ZERO) - 96f (DEVANAGARI DIGIT NINE)
9e6 (BENGALI DIGIT ZERO) - 9ef (BENGALI DIGIT NINE)
a66 (GURMUKHI DIGIT ZERO) - a6f (GURMUKHI DIGIT NINE)
ae6 (GUJARATI DIGIT ZERO) - aef (GUJARATI DIGIT NINE)
b66 (ORIYA DIGIT ZERO) - b6f (ORIYA DIGIT NINE)
be6 (TAMIL DIGIT ZERO) - bef (TAMIL DIGIT NINE)
c66 (TELUGU DIGIT ZERO) - c6f (TELUGU DIGIT NINE)
ce6 (KANNADA DIGIT ZERO) - cef (KANNADA DIGIT NINE)
d66 (MALAYALAM DIGIT ZERO) - d6f (MALAYALAM DIGIT NINE)
e50 (THAI DIGIT ZERO) - e59 (THAI DIGIT NINE)
ed0 (LAO DIGIT ZERO) - ed9 (LAO DIGIT NINE)
f20 (TIBETAN DIGIT ZERO) - f29 (TIBETAN DIGIT NINE)
1040 (MYANMAR DIGIT ZERO) - 1049 (MYANMAR DIGIT NINE)
1090 (MYANMAR SHAN DIGIT ZERO) - 1099 (MYANMAR SHAN DIGIT NINE)
17e0 (KHMER DIGIT ZERO) - 17e9 (KHMER DIGIT NINE)
1810 (MONGOLIAN DIGIT ZERO) - 1819 (MONGOLIAN DIGIT NINE)
1946 (LIMBU DIGIT ZERO) - 194f (LIMBU DIGIT NINE)
19d0 (NEW TAI LUE DIGIT ZERO) - 19d9 (NEW TAI LUE DIGIT NINE)
1a80 (TAI THAM HORA DIGIT ZERO) - 1a89 (TAI THAM HORA DIGIT NINE)
1a90 (TAI THAM THAM DIGIT ZERO) - 1a99 (TAI THAM THAM DIGIT NINE)
1b50 (BALINESE DIGIT ZERO) - 1b59 (BALINESE DIGIT NINE)
1bb0 (SUNDANESE DIGIT ZERO) - 1bb9 (SUNDANESE DIGIT NINE)
1c40 (LEPCHA DIGIT ZERO) - 1c49 (LEPCHA DIGIT NINE)
1c50 (OL CHIKI DIGIT ZERO) - 1c59 (OL CHIKI DIGIT NINE)
a620 (VAI DIGIT ZERO) - a629 (VAI DIGIT NINE)
a8d0 (SAURASHTRA DIGIT ZERO) - a8d9 (SAURASHTRA DIGIT NINE)
a900 (KAYAH LI DIGIT ZERO) - a909 (KAYAH LI DIGIT NINE)
a9d0 (JAVANESE DIGIT ZERO) - a9d9 (JAVANESE DIGIT NINE)
aa50 (CHAM DIGIT ZERO) - aa59 (CHAM DIGIT NINE)
abf0 (MEETEI MAYEK DIGIT ZERO) - abf9 (MEETEI MAYEK DIGIT NINE)
ff10 (FULLWIDTH DIGIT ZERO) - ff19 (FULLWIDTH DIGIT NINE)

The text was updated successfully, but these errors were encountered:

densh · 2018-03-07T12:19:31Z

Thanks for the bug report!

LeeTibbert · 2018-08-29T03:52:56Z

@martijnhoekstra @densh

As a fellow user of ScalaNative, thank you for reporting this problem.
Are you still experiencing it with ScalaNative >= 0.3.8, released July 16, 2018.
I am just respectfully looking for a top of the head, "Yeah, I this gets me twice an hour!"
answer, not asking for more than 2 minutes of your time.

TLDR;

Looks like PR #651 dated April 21, 2017 (the 2017 looks suspicious!)
fixes this. From my reading of the Git history, this shipped in S-N 0.3.8.
If you are still experiencing problems, I'll dig deeper.

If you are no longer experiencing this problem, I believe this issue can
be closed.

Long Story:

You carefully documented report was incredibly helpful.

CharacterSuite.scala has no relevant tests.

When I could not replicate the reported issue using one arbitrarily chosen digit in the
list you kindly supplied I fell back to exhaustive testing.

I took the list off ranges you documented and wrote a program to
test all members of the range and one character below and one
above the range. For all ranges, for all tested characters isDigit()
returned the appropriate result.

I tested first using Java 10, then Java 8, the current S-N support standard.

Looks fixed to me. Please advise. Thank you.

Lee
gardening on the backlog.

martijnhoekstra · 2018-08-29T07:46:06Z

The bug is still there, but the title is wrong (the issue is with digit, as in the description, not with isDigit as in the title)

See: https://scastie.scala-lang.org/nlxWbwbyQumeFTzJGRxfPw

The native digit method there is lifted from https://github.com/scala-native/scala-native/blob/master/javalib/src/main/scala/java/lang/Character.scala#L1152-L1168

sjrd · 2018-08-29T08:12:26Z

We recently fixed Character.digit in Scala.js. See scala-js/scala-js@396ff72.

densh · 2018-08-29T08:33:30Z

@sjrd Thanks for the heads up! 👍

@sjrd

…considers digits" * The presenting problem was expressed in issue scala-native#1189 "Character.digit is too restrictive in what it considers digits " This issue is now fixed. ScalaNative now can correctly categorize digits from additional Unicode 16 bit blocks. * The code in this pull request is ported, with gratitude and minor modifications, from ScalaJS, especially ScalaJS commit: https://github.com/scala-js/scala-js/commit/ 396ff7287d06a7690ac12ed906b878e377fba91d Thank you, @sjrd, for both permission to use the code and, especially, notification that it existed. * The prior code had been missing a Java 8 method `def digit(codePoint: Int, radix: Int): Int`. That method has now been ported. * The ScalaJs digit test case was added, with minor modifications to avoid naming conflicts, to CharacterSuite.scala. 64/32 bit issues: None known. Documentation: Deserves a release note. Testing: * Built and tested ("test-all") on X86_64. All tests passed.

@sjrd

…considers digits" * The presenting problem was expressed in issue scala-native#1189 "Character.digit is too restrictive in what it considers digits " This issue is now fixed. ScalaNative now can correctly categorize digits from additional Unicode 16 bit blocks. * The code in this pull request is ported, with gratitude and minor modifications, from ScalaJS, especially ScalaJS commit: https://github.com/scala-js/scala-js/commit/ 396ff7287d06a7690ac12ed906b878e377fba91d Thank you, @sjrd, for both permission to use the code and, especially, notification that it existed. * An unexpected but welcome consequence of this PR is that j.l.Integer.parseInt, j.l.Integer.parseUnsignedInt, j.l.Long.parseLong, and j.l.Long.parseUnsignedLong now handle Unicode digits. I also checked that at least j.l.Integer.parseInt rejected, as it should, Unicode supplementary characters, even if Character.digit reports them as digits. ScalaJs PR "Fix scala-native#2935: Support non-ASCII scripts in `parseInt` and `parseLong`." appears to be unneeded in S-N. * The prior code had been missing a Java 8 method `def digit(codePoint: Int, radix: Int): Int`. That method has now been ported. * The ScalaJs digit test case was added, with minor modifications to avoid naming conflicts, to CharacterSuite.scala. 64/32 bit issues: None known. Documentation: Deserves a release note. Testing: * Built and tested ("test-all") on X86_64. All tests passed.

LeeTibbert · 2018-08-30T13:17:52Z

@sjrd Thank you for the information about the Scala.js Character.digit fix. You saved me days if not
weeks of effort.

@martijnhoekstra PR #1333 is in the queue. Let me know if that solves
the reported issue or not. Thank you for helping improve ScalaNative.

@sjrd

…considers digits" * The presenting problem was expressed in issue scala-native#1189 "Character.digit is too restrictive in what it considers digits " This issue is now fixed. ScalaNative now can correctly categorize digits from additional Unicode 16 bit blocks. * The code in this pull request is ported, with gratitude and minor modifications, from ScalaJS, especially ScalaJS commit: https://github.com/scala-js/scala-js/commit/ 396ff7287d06a7690ac12ed906b878e377fba91d Thank you, @sjrd, for both permission to use the code and, especially, notification that it existed. * The prior code had been missing a Java 8 method `def digit(codePoint: Int, radix: Int): Int`. That method has now been ported. * An unexpected but welcome consequence of this PR is that extended Unicode strings are now supported by j.l.Integer.parseInt, j.l.Integer.parseUnsignedInt, j.l.Long.parseLong, & j.l.Long.parseUnsignedLong. I also checked that at least j.l.Integer.parseInt rejected, as it should, Unicode supplemental characters, even though they might be reported as digits by Character.digit. I was prompted to explore the parse* methods parsing Unicode by a ScalaJs commit to Character.scala after the Character.digit commit I was porting. It appears that ScalaJs change is not needed by ScalaNative. * The ScalaJs digit test case was added, with minor modifications to avoid naming conflicts, to CharacterSuite.scala. 64/32 bit issues: None known. Documentation: Deserves a release note. Testing: * Built and tested ("test-all") on X86_64. All tests passed.

@sjrd

…considers digits" * The presenting problem was expressed in issue scala-native#1189 "Character.digit is too restrictive in what it considers digits " This issue is now fixed. ScalaNative now can correctly categorize digits from additional Unicode 16 bit blocks. * The code in this pull request is ported, with gratitude and minor modifications, from ScalaJS, especially ScalaJS commit: https://github.com/scala-js/scala-js/commit/ 396ff7287d06a7690ac12ed906b878e377fba91d Thank you, @sjrd, for both permission to use the code and, especially, notification that it existed. * The prior code had been missing a Java 8 method `def digit(codePoint: Int, radix: Int): Int`. That method has now been ported. * An unexpected but welcome consequence of this PR is that extended Unicode strings are now supported by j.l.Integer.parseInt, j.l.Integer.parseUnsignedInt, j.l.Long.parseLong, & j.l.Long.parseUnsignedLong. I also checked that at least j.l.Integer.parseInt rejected, as it should, Unicode supplemental characters, even though they might be reported as digits by Character.digit. I was prompted to explore the parse* methods parsing Unicode by a ScalaJs commit to Character.scala after the Character.digit commit I was porting. It appears that ScalaJs change is not needed by ScalaNative. * The ScalaJs digit test case was added, with minor modifications to avoid naming conflicts, to CharacterSuite.scala. 64/32 bit issues: None known. Documentation: Deserves a release note. Testing: * Built and tested ("test-all") on X86_64. All tests passed.

@sjrd

…gits (#1333) * The presenting problem was expressed in issue #1189 "Character.digit is too restrictive in what it considers digits " This issue is now fixed. ScalaNative now can correctly categorize digits from additional Unicode 16 bit blocks. * The code in this pull request is ported, with gratitude and minor modifications, from ScalaJS, especially ScalaJS commit: https://github.com/scala-js/scala-js/commit/ 396ff7287d06a7690ac12ed906b878e377fba91d Thank you, @sjrd, for both permission to use the code and, especially, notification that it existed. * The prior code had been missing a Java 8 method `def digit(codePoint: Int, radix: Int): Int`. That method has now been ported. * An unexpected but welcome consequence of this PR is that extended Unicode strings are now supported by j.l.Integer.parseInt, j.l.Integer.parseUnsignedInt, j.l.Long.parseLong, & j.l.Long.parseUnsignedLong. I also checked that at least j.l.Integer.parseInt rejected, as it should, Unicode supplemental characters, even though they might be reported as digits by Character.digit. I was prompted to explore the parse* methods parsing Unicode by a ScalaJs commit to Character.scala after the Character.digit commit I was porting. It appears that ScalaJs change is not needed by ScalaNative. * The ScalaJs digit test case was added, with minor modifications to avoid naming conflicts, to CharacterSuite.scala. 64/32 bit issues: None known. Documentation: Deserves a release note. Testing: * Built and tested ("test-all") on X86_64. All tests passed.

@sjrd

…considers digits (scala-native#1333) * The presenting problem was expressed in issue scala-native#1189 "Character.digit is too restrictive in what it considers digits " This issue is now fixed. ScalaNative now can correctly categorize digits from additional Unicode 16 bit blocks. * The code in this pull request is ported, with gratitude and minor modifications, from ScalaJS, especially ScalaJS commit: https://github.com/scala-js/scala-js/commit/ 396ff7287d06a7690ac12ed906b878e377fba91d Thank you, @sjrd, for both permission to use the code and, especially, notification that it existed. * The prior code had been missing a Java 8 method `def digit(codePoint: Int, radix: Int): Int`. That method has now been ported. * An unexpected but welcome consequence of this PR is that extended Unicode strings are now supported by j.l.Integer.parseInt, j.l.Integer.parseUnsignedInt, j.l.Long.parseLong, & j.l.Long.parseUnsignedLong. I also checked that at least j.l.Integer.parseInt rejected, as it should, Unicode supplemental characters, even though they might be reported as digits by Character.digit. I was prompted to explore the parse* methods parsing Unicode by a ScalaJs commit to Character.scala after the Character.digit commit I was porting. It appears that ScalaJs change is not needed by ScalaNative. * The ScalaJs digit test case was added, with minor modifications to avoid naming conflicts, to CharacterSuite.scala. 64/32 bit issues: None known. Documentation: Deserves a release note. Testing: * Built and tested ("test-all") on X86_64. All tests passed.

densh added bug component:javalib labels Mar 7, 2018

densh added this to the Backlog milestone Mar 7, 2018

martijnhoekstra changed the title ~~Character.isDigit is too restrictive in what it considers digits~~ Character.digit is too restrictive in what it considers digits Aug 29, 2018

LeeTibbert mentioned this issue Aug 30, 2018

Fix #1189 "Character.digit is too restrictive in what it considers digits" #1333

Merged

densh closed this as completed in #1333 Feb 26, 2019

densh modified the milestones: Backlog, 0.4.0 Feb 26, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Character.digit is too restrictive in what it considers digits #1189

Character.digit is too restrictive in what it considers digits #1189

martijnhoekstra commented Mar 7, 2018

densh commented Mar 7, 2018

LeeTibbert commented Aug 29, 2018

martijnhoekstra commented Aug 29, 2018

sjrd commented Aug 29, 2018

densh commented Aug 29, 2018

LeeTibbert commented Aug 30, 2018

Character.digit is too restrictive in what it considers digits #1189

Character.digit is too restrictive in what it considers digits #1189

Comments

martijnhoekstra commented Mar 7, 2018

densh commented Mar 7, 2018

LeeTibbert commented Aug 29, 2018

TLDR;

Long Story:

martijnhoekstra commented Aug 29, 2018

sjrd commented Aug 29, 2018

densh commented Aug 29, 2018

LeeTibbert commented Aug 30, 2018