Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Character.digit is too restrictive in what it considers digits #1189

Closed
martijnhoekstra opened this issue Mar 7, 2018 · 6 comments · Fixed by #1333
Closed

Character.digit is too restrictive in what it considers digits #1189

martijnhoekstra opened this issue Mar 7, 2018 · 6 comments · Fixed by #1333

Comments

@martijnhoekstra
Copy link

The scala native implementation of Character.digit excludes the following digit ranges that OpenJDK does consider digits:

660 (ARABIC-INDIC DIGIT ZERO) - 669 (ARABIC-INDIC DIGIT NINE)
6f0 (EXTENDED ARABIC-INDIC DIGIT ZERO) - 6f9 (EXTENDED ARABIC-INDIC DIGIT NINE)
7c0 (NKO DIGIT ZERO) - 7c9 (NKO DIGIT NINE)
966 (DEVANAGARI DIGIT ZERO) - 96f (DEVANAGARI DIGIT NINE)
9e6 (BENGALI DIGIT ZERO) - 9ef (BENGALI DIGIT NINE)
a66 (GURMUKHI DIGIT ZERO) - a6f (GURMUKHI DIGIT NINE)
ae6 (GUJARATI DIGIT ZERO) - aef (GUJARATI DIGIT NINE)
b66 (ORIYA DIGIT ZERO) - b6f (ORIYA DIGIT NINE)
be6 (TAMIL DIGIT ZERO) - bef (TAMIL DIGIT NINE)
c66 (TELUGU DIGIT ZERO) - c6f (TELUGU DIGIT NINE)
ce6 (KANNADA DIGIT ZERO) - cef (KANNADA DIGIT NINE)
d66 (MALAYALAM DIGIT ZERO) - d6f (MALAYALAM DIGIT NINE)
e50 (THAI DIGIT ZERO) - e59 (THAI DIGIT NINE)
ed0 (LAO DIGIT ZERO) - ed9 (LAO DIGIT NINE)
f20 (TIBETAN DIGIT ZERO) - f29 (TIBETAN DIGIT NINE)
1040 (MYANMAR DIGIT ZERO) - 1049 (MYANMAR DIGIT NINE)
1090 (MYANMAR SHAN DIGIT ZERO) - 1099 (MYANMAR SHAN DIGIT NINE)
17e0 (KHMER DIGIT ZERO) - 17e9 (KHMER DIGIT NINE)
1810 (MONGOLIAN DIGIT ZERO) - 1819 (MONGOLIAN DIGIT NINE)
1946 (LIMBU DIGIT ZERO) - 194f (LIMBU DIGIT NINE)
19d0 (NEW TAI LUE DIGIT ZERO) - 19d9 (NEW TAI LUE DIGIT NINE)
1a80 (TAI THAM HORA DIGIT ZERO) - 1a89 (TAI THAM HORA DIGIT NINE)
1a90 (TAI THAM THAM DIGIT ZERO) - 1a99 (TAI THAM THAM DIGIT NINE)
1b50 (BALINESE DIGIT ZERO) - 1b59 (BALINESE DIGIT NINE)
1bb0 (SUNDANESE DIGIT ZERO) - 1bb9 (SUNDANESE DIGIT NINE)
1c40 (LEPCHA DIGIT ZERO) - 1c49 (LEPCHA DIGIT NINE)
1c50 (OL CHIKI DIGIT ZERO) - 1c59 (OL CHIKI DIGIT NINE)
a620 (VAI DIGIT ZERO) - a629 (VAI DIGIT NINE)
a8d0 (SAURASHTRA DIGIT ZERO) - a8d9 (SAURASHTRA DIGIT NINE)
a900 (KAYAH LI DIGIT ZERO) - a909 (KAYAH LI DIGIT NINE)
a9d0 (JAVANESE DIGIT ZERO) - a9d9 (JAVANESE DIGIT NINE)
aa50 (CHAM DIGIT ZERO) - aa59 (CHAM DIGIT NINE)
abf0 (MEETEI MAYEK DIGIT ZERO) - abf9 (MEETEI MAYEK DIGIT NINE)
ff10 (FULLWIDTH DIGIT ZERO) - ff19 (FULLWIDTH DIGIT NINE)
@densh
Copy link
Member

densh commented Mar 7, 2018

Thanks for the bug report!

@densh densh added this to the Backlog milestone Mar 7, 2018
@LeeTibbert
Copy link
Contributor

@martijnhoekstra @densh

As a fellow user of ScalaNative, thank you for reporting this problem.
Are you still experiencing it with ScalaNative >= 0.3.8, released July 16, 2018.
I am just respectfully looking for a top of the head, "Yeah, I this gets me twice an hour!"
answer, not asking for more than 2 minutes of your time.

TLDR;

Looks like PR #651 dated April 21, 2017 (the 2017 looks suspicious!)
fixes this. From my reading of the Git history, this shipped in S-N 0.3.8.
If you are still experiencing problems, I'll dig deeper.

If you are no longer experiencing this problem, I believe this issue can
be closed.

Long Story:

You carefully documented report was incredibly helpful.

CharacterSuite.scala has no relevant tests.

When I could not replicate the reported issue using one arbitrarily chosen digit in the
list you kindly supplied I fell back to exhaustive testing.

I took the list off ranges you documented and wrote a program to
test all members of the range and one character below and one
above the range. For all ranges, for all tested characters isDigit()
returned the appropriate result.

I tested first using Java 10, then Java 8, the current S-N support standard.

Looks fixed to me. Please advise. Thank you.

Lee
gardening on the backlog.

@martijnhoekstra
Copy link
Author

The bug is still there, but the title is wrong (the issue is with digit, as in the description, not with isDigit as in the title)

See: https://scastie.scala-lang.org/nlxWbwbyQumeFTzJGRxfPw

The native digit method there is lifted from https://github.com/scala-native/scala-native/blob/master/javalib/src/main/scala/java/lang/Character.scala#L1152-L1168

@martijnhoekstra martijnhoekstra changed the title Character.isDigit is too restrictive in what it considers digits Character.digit is too restrictive in what it considers digits Aug 29, 2018
@sjrd
Copy link
Collaborator

sjrd commented Aug 29, 2018

We recently fixed Character.digit in Scala.js. See scala-js/scala-js@396ff72.

@densh
Copy link
Member

densh commented Aug 29, 2018

@sjrd Thanks for the heads up! 👍

LeeTibbert pushed a commit to LeeTibbert/scala-native-fork that referenced this issue Aug 30, 2018
…considers digits"

  * The presenting problem was expressed in issue scala-native#1189
    "Character.digit is too restrictive in what it considers digits "
    This issue is now fixed. ScalaNative now can correctly categorize
    digits from additional Unicode 16 bit blocks.

  * The code in this pull request is ported, with gratitude and minor
    modifications, from ScalaJS, especially ScalaJS commit:
        https://github.com/scala-js/scala-js/commit/
        396ff7287d06a7690ac12ed906b878e377fba91d

    Thank you, @sjrd, for both permission to use the code and, especially,
    notification that it existed.

  * The prior code had been missing a Java 8 method
    `def digit(codePoint: Int, radix: Int): Int`. That method
    has now been ported.

  * The ScalaJs digit test case was added, with minor modifications to avoid
    naming conflicts, to CharacterSuite.scala.

64/32 bit issues:

      None known.

Documentation:

      Deserves a release note.

Testing:

	* Built and tested ("test-all") on X86_64. All tests passed.
LeeTibbert pushed a commit to LeeTibbert/scala-native-fork that referenced this issue Aug 30, 2018
…considers digits"

  * The presenting problem was expressed in issue scala-native#1189
    "Character.digit is too restrictive in what it considers digits "
    This issue is now fixed. ScalaNative now can correctly categorize
    digits from additional Unicode 16 bit blocks.

  * The code in this pull request is ported, with gratitude and minor
    modifications, from ScalaJS, especially ScalaJS commit:
        https://github.com/scala-js/scala-js/commit/
        396ff7287d06a7690ac12ed906b878e377fba91d

    Thank you, @sjrd, for both permission to use the code and, especially,
    notification that it existed.

  * An unexpected but welcome consequence of this PR is that
    j.l.Integer.parseInt, j.l.Integer.parseUnsignedInt, j.l.Long.parseLong,
    and j.l.Long.parseUnsignedLong now handle Unicode digits. I also
    checked that at least j.l.Integer.parseInt rejected, as it should,
    Unicode supplementary characters, even if Character.digit reports them
    as digits.  ScalaJs PR "Fix scala-native#2935: Support non-ASCII scripts in
    `parseInt` and `parseLong`." appears to be unneeded in S-N.

  * The prior code had been missing a Java 8 method
    `def digit(codePoint: Int, radix: Int): Int`. That method
    has now been ported.

  * The ScalaJs digit test case was added, with minor modifications to avoid
    naming conflicts, to CharacterSuite.scala.

64/32 bit issues:

      None known.

Documentation:

      Deserves a release note.

Testing:

	* Built and tested ("test-all") on X86_64. All tests passed.
@LeeTibbert
Copy link
Contributor

@sjrd Thank you for the information about the Scala.js Character.digit fix. You saved me days if not
weeks of effort.

@martijnhoekstra PR #1333 is in the queue. Let me know if that solves
the reported issue or not. Thank you for helping improve ScalaNative.

LeeTibbert pushed a commit to LeeTibbert/scala-native-fork that referenced this issue Aug 30, 2018
…considers digits"

  * The presenting problem was expressed in issue scala-native#1189
    "Character.digit is too restrictive in what it considers digits "
    This issue is now fixed. ScalaNative now can correctly categorize
    digits from additional Unicode 16 bit blocks.

  * The code in this pull request is ported, with gratitude and minor
    modifications, from ScalaJS, especially ScalaJS commit:
        https://github.com/scala-js/scala-js/commit/
        396ff7287d06a7690ac12ed906b878e377fba91d

    Thank you, @sjrd, for both permission to use the code and, especially,
    notification that it existed.

  * The prior code had been missing a Java 8 method
    `def digit(codePoint: Int, radix: Int): Int`. That method
    has now been ported.

  * An unexpected but welcome consequence of this PR is that extended
    Unicode strings are now supported by j.l.Integer.parseInt,
    j.l.Integer.parseUnsignedInt, j.l.Long.parseLong, &
    j.l.Long.parseUnsignedLong. I also checked that at least
    j.l.Integer.parseInt rejected, as it should, Unicode supplemental
    characters, even though they might be reported as digits by
    Character.digit.

    I was prompted to explore the parse* methods parsing Unicode by
    a ScalaJs commit to Character.scala after the Character.digit
    commit I was porting.  It appears that ScalaJs change is not needed
    by ScalaNative.

  * The ScalaJs digit test case was added, with minor modifications to avoid
    naming conflicts, to CharacterSuite.scala.

64/32 bit issues:

      None known.

Documentation:

      Deserves a release note.

Testing:

	* Built and tested ("test-all") on X86_64. All tests passed.
LeeTibbert pushed a commit to LeeTibbert/scala-native-fork that referenced this issue Aug 30, 2018
…considers digits"

  * The presenting problem was expressed in issue scala-native#1189
    "Character.digit is too restrictive in what it considers digits "
    This issue is now fixed. ScalaNative now can correctly categorize
    digits from additional Unicode 16 bit blocks.

  * The code in this pull request is ported, with gratitude and minor
    modifications, from ScalaJS, especially ScalaJS commit:
        https://github.com/scala-js/scala-js/commit/
        396ff7287d06a7690ac12ed906b878e377fba91d

    Thank you, @sjrd, for both permission to use the code and, especially,
    notification that it existed.

  * The prior code had been missing a Java 8 method
    `def digit(codePoint: Int, radix: Int): Int`. That method has now
    been ported.

  * An unexpected but welcome consequence of this PR is that extended
    Unicode strings are now supported by j.l.Integer.parseInt,
    j.l.Integer.parseUnsignedInt, j.l.Long.parseLong, &
    j.l.Long.parseUnsignedLong. I also checked that at least
    j.l.Integer.parseInt rejected, as it should, Unicode supplemental
    characters, even though they might be reported as digits by
    Character.digit.

    I was prompted to explore the parse* methods parsing Unicode by
    a ScalaJs commit to Character.scala after the Character.digit
    commit I was porting.  It appears that ScalaJs change is not needed
    by ScalaNative.

  * The ScalaJs digit test case was added, with minor modifications to avoid
    naming conflicts, to CharacterSuite.scala.

64/32 bit issues:

      None known.

Documentation:

      Deserves a release note.

Testing:

	* Built and tested ("test-all") on X86_64. All tests passed.
densh pushed a commit that referenced this issue Feb 26, 2019
…gits (#1333)

* The presenting problem was expressed in issue #1189
    "Character.digit is too restrictive in what it considers digits "
    This issue is now fixed. ScalaNative now can correctly categorize
    digits from additional Unicode 16 bit blocks.

  * The code in this pull request is ported, with gratitude and minor
    modifications, from ScalaJS, especially ScalaJS commit:
        https://github.com/scala-js/scala-js/commit/
        396ff7287d06a7690ac12ed906b878e377fba91d

    Thank you, @sjrd, for both permission to use the code and, especially,
    notification that it existed.

  * The prior code had been missing a Java 8 method
    `def digit(codePoint: Int, radix: Int): Int`. That method has now
    been ported.

  * An unexpected but welcome consequence of this PR is that extended
    Unicode strings are now supported by j.l.Integer.parseInt,
    j.l.Integer.parseUnsignedInt, j.l.Long.parseLong, &
    j.l.Long.parseUnsignedLong. I also checked that at least
    j.l.Integer.parseInt rejected, as it should, Unicode supplemental
    characters, even though they might be reported as digits by
    Character.digit.

    I was prompted to explore the parse* methods parsing Unicode by
    a ScalaJs commit to Character.scala after the Character.digit
    commit I was porting.  It appears that ScalaJs change is not needed
    by ScalaNative.

  * The ScalaJs digit test case was added, with minor modifications to avoid
    naming conflicts, to CharacterSuite.scala.

64/32 bit issues:

      None known.

Documentation:

      Deserves a release note.

Testing:

	* Built and tested ("test-all") on X86_64. All tests passed.
@densh densh modified the milestones: Backlog, 0.4.0 Feb 26, 2019
ekrich pushed a commit to ekrich/scala-native that referenced this issue May 21, 2021
…considers digits (scala-native#1333)

* The presenting problem was expressed in issue scala-native#1189
    "Character.digit is too restrictive in what it considers digits "
    This issue is now fixed. ScalaNative now can correctly categorize
    digits from additional Unicode 16 bit blocks.

  * The code in this pull request is ported, with gratitude and minor
    modifications, from ScalaJS, especially ScalaJS commit:
        https://github.com/scala-js/scala-js/commit/
        396ff7287d06a7690ac12ed906b878e377fba91d

    Thank you, @sjrd, for both permission to use the code and, especially,
    notification that it existed.

  * The prior code had been missing a Java 8 method
    `def digit(codePoint: Int, radix: Int): Int`. That method has now
    been ported.

  * An unexpected but welcome consequence of this PR is that extended
    Unicode strings are now supported by j.l.Integer.parseInt,
    j.l.Integer.parseUnsignedInt, j.l.Long.parseLong, &
    j.l.Long.parseUnsignedLong. I also checked that at least
    j.l.Integer.parseInt rejected, as it should, Unicode supplemental
    characters, even though they might be reported as digits by
    Character.digit.

    I was prompted to explore the parse* methods parsing Unicode by
    a ScalaJs commit to Character.scala after the Character.digit
    commit I was porting.  It appears that ScalaJs change is not needed
    by ScalaNative.

  * The ScalaJs digit test case was added, with minor modifications to avoid
    naming conflicts, to CharacterSuite.scala.

64/32 bit issues:

      None known.

Documentation:

      Deserves a release note.

Testing:

	* Built and tested ("test-all") on X86_64. All tests passed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants