Update uppercase lowercase to use UnicodeData.txt vs CaseFolding.txt #1611
Analysis Output:
For the other 49 differences, our encoding has an uppercase mapping and the JVM does no casing. Edit: 2019-07-17
For all differences, our encoding has a lowercase mapping and the JVM does no casing. The previous encoding had 70 differences for
Just to confirm with myself that we are really doing well, I used
So the additional changes are due to Unicode 7.0.0 having more code points. I can update this PR to use the 6.3.0 data so that we more closely match JDK8, which uses 6.2.0. Then, when we decide to track the next production JDK version, 11, we can upgrade to Unicode 10.0.0.
Update: the 4 code points above in the JDK need to be handled, so I will change some code to handle that special case. I think it makes sense to have 100% parity with the current JDK 8 that we are tracking.
Now we match JDK8 exactly.
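For anyone who wants to repeat this kind of parity check, here is a minimal sketch of the idea: walk every code point and count where a candidate mapping disagrees with the running JVM's `Character.toUpperCase`. The class name `CaseParityCheck` and the `asciiUpper` stand-in are hypothetical and only for illustration. This is not the code used in this PR, and the result depends on which Unicode version the JVM you run it on implements.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntUnaryOperator;

public class CaseParityCheck {

    /** Returns every code point where the candidate mapping disagrees with the reference. */
    public static List<Integer> differences(IntUnaryOperator candidate,
                                            IntUnaryOperator reference) {
        List<Integer> diffs = new ArrayList<>();
        for (int cp = Character.MIN_CODE_POINT; cp <= Character.MAX_CODE_POINT; cp++) {
            if (candidate.applyAsInt(cp) != reference.applyAsInt(cp)) {
                diffs.add(cp);
            }
        }
        return diffs;
    }

    /** Hypothetical stand-in for a generated table: uppercases ASCII letters only. */
    public static int asciiUpper(int cp) {
        return (cp >= 'a' && cp <= 'z') ? cp - 32 : cp;
    }

    public static void main(String[] args) {
        // Compare the stand-in against the JVM's own simple uppercase mapping.
        List<Integer> diffs = differences(CaseParityCheck::asciiUpper, Character::toUpperCase);
        System.out.println("differences vs this JVM: " + diffs.size());
    }
}
```

Swapping `asciiUpper` for the real lookup (and `Character::toLowerCase` for the lowercase side) would give the same style of difference count as reported above; an empty list means full parity with that JVM.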
Generation output
@densh Ready for review
LGTM, good job @ekrich !
…ing.txt (scala-native#1611)
* Update uppercase lowercase to use UnicodeData.txt vs CaseFolding.txt
* Separate and update tests and simplify lookup code for new encoding
* Does not use SpecialCasing.txt
* Full parity with JDK8 using Unicode 6.3.0
* Remove test case above Unicode 6.3 upper/lower case range
This is a long overdue update. After learning more about Unicode, it became apparent that UnicodeData.txt is a more appropriate source than CaseFolding.txt. We are in the process of improving the transformation code, which is public at https://github.com/ekrich/scala-unicode, so we can also easily update to newer versions of Unicode. Unicode 10.0.0, which is used in JDK11, has already been transformed using the same code.
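To make the choice of source file concrete: each record in UnicodeData.txt is a semicolon-separated line of 15 fields, where field 0 is the code point, field 12 the simple uppercase mapping and field 13 the simple lowercase mapping. CaseFolding.txt, by contrast, is intended for caseless matching rather than case conversion. The sketch below only shows how those fields can be read. The class `UnicodeDataLine` and its helpers are hypothetical names and are not taken from scala-unicode.

```java
import java.util.OptionalInt;

public class UnicodeDataLine {
    // Field positions in a UnicodeData.txt record.
    static final int CODE = 0;
    static final int SIMPLE_UPPER = 12;
    static final int SIMPLE_LOWER = 13;

    /** Splits a record, keeping trailing empty fields (limit -1). */
    public static String[] fields(String line) {
        return line.split(";", -1);
    }

    /** Parses a hex field, or returns empty when the field is blank (no mapping). */
    public static OptionalInt hexField(String[] f, int index) {
        return f[index].isEmpty()
            ? OptionalInt.empty()
            : OptionalInt.of(Integer.parseInt(f[index], 16));
    }

    public static int codePoint(String line) {
        return Integer.parseInt(fields(line)[CODE], 16);
    }

    public static OptionalInt simpleUpper(String line) {
        return hexField(fields(line), SIMPLE_UPPER);
    }

    public static OptionalInt simpleLower(String line) {
        return hexField(fields(line), SIMPLE_LOWER);
    }

    public static void main(String[] args) {
        // The record for U+0061 LATIN SMALL LETTER A.
        String a = "0061;LATIN SMALL LETTER A;Ll;0;L;;;;;N;;;0041;;0041";
        System.out.printf("U+%04X upper=%s lower=%s%n",
            codePoint(a), simpleUpper(a), simpleLower(a));
    }
}
```

A blank field means the code point has no mapping of that kind, which is why the helpers return `OptionalInt` instead of falling back to the code point itself.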
The encoding compression is as follows: