8301119: Support for GB18030-2022 #45
👋 Welcome back andrew! A progress list of the required criteria for merging this PR into
This backport pull request has now been updated with issue from the original commit.
@gnu-andrew Please do not rebase or force-push to an active PR as it invalidates existing review comments. Note for future reference, the bots always squash all changes into a single commit automatically as part of the integration. See OpenJDK Developers’ Guide for more information.
The parent pull request that this pull request depends on has now been integrated and the target branch of this pull request has been updated. This means that changes from the dependent pull request can start to show up as belonging to this pull request, which may be confusing for reviewers. To remedy this situation, simply merge the latest changes from the new target branch into this pull request by running commands similar to these in the local repository for your personal fork:
git checkout JDK-8301119
git fetch https://git.openjdk.org/jdk8u.git master
git merge FETCH_HEAD
# if there are conflicts, follow the instructions given by git merge
git commit -m "Merge master"
git push
@gnu-andrew this pull request can not be integrated into
git checkout JDK-8301119
git fetch https://git.openjdk.org/jdk8u.git master
git merge FETCH_HEAD
# resolve conflicts and follow the instructions given by git merge
git commit -m "Merge master"
git push
I'm not able to reproduce these failures locally, which makes it hard to diagnose what's going on here. I can try and move the character set to
…64cpp Fix inter_masm_riscv64.cpp by fixing mdp->mdx and adding annotation
That was my bad. Ran on the wrong JDK build (so have hidden the comment). Sorry. I'll post a review of the patch today.
I'd suggest using the late init hook in AbstractCharsetProvider.charsetForName() instead, which would make this patch cleaner IMO. See gnu-andrew#1
private String canonicalize(String csn) {
    if (csn.startsWith("gb18030-")) {
        return csn.equals("gb18030-2022") && !GB18030.IS_2000 ||
               csn.equals("gb18030-2000") && GB18030.IS_2000 ? "gb18030" : csn;
    } else {
        String acn = aliasMap.get(csn);
        return (acn != null) ? acn : csn;
    }
I believe if we hooked into the late initialization hook instead we wouldn't need this canonicalization as the aliasMap would map it back to the correct class name.
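To illustrate the reviewer's point, here is a minimal sketch of a provider whose alias map is filled in by a late initialisation hook. Everything in it is hypothetical: the class LateInitProviderSketch, its init() method and the direct read of the jdk.charset.GB18030 property are stand-ins chosen for illustration, not the real AbstractCharsetProvider or GB18030.IS_2000 code. It only shows that once the variant-specific alias is registered late, a plain alias lookup suffices and no special-casing of "gb18030-" names is needed.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a charset provider; NOT the real
// sun.nio.cs.AbstractCharsetProvider.
class LateInitProviderSketch {
    private final Map<String, String> aliasMap = new HashMap<>();
    private boolean initialized;

    // Late initialisation hook: runs on the first lookup rather than at
    // construction time, so the variant is chosen as late as possible.
    // Assumption: the variant is read straight from the system property.
    protected void init() {
        boolean is2000 = "2000".equals(System.getProperty("jdk.charset.GB18030"));
        aliasMap.put(is2000 ? "gb18030-2000" : "gb18030-2022", "GB18030");
    }

    // Resolve an alias to its canonical name, or return the input unchanged.
    String canonicalize(String csn) {
        synchronized (this) {
            if (!initialized) {
                init();
                initialized = true;
            }
        }
        String acn = aliasMap.get(csn);
        return (acn != null) ? acn : csn;
    }
}
```

With the property set to 2000 before the first lookup, "gb18030-2000" resolves to "GB18030" while "gb18030-2022" is returned unchanged; without the property the roles are reversed.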
import java.util.Map;
import sun.misc.ASCIICaseInsensitiveComparator;

import sun.nio.cs.ext.GB18030;
Depending on the GB18030 class (sun.nio.cs.ext, in charsets.jar) from package sun.nio.cs (rt.jar) seems worrisome.
  charset("GB18030", "GB18030",
          new String[] {
-             "gb18030-2000"
+             GB18030.IS_2000 ? "gb18030-2000" : "gb18030-2022"
This initialization will be wrong when run with LANG=zh_CN.GB18030 and -Djdk.charset.GB18030=2000, as in that case GB18030.IS_2000 will be false because the JVM won't have properly initialized yet. Again, I suggest using the late init hook instead. See https://bugs.openjdk.org/browse/JDK-8310947
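The timing problem described here can be shown by analogy with a small sketch. It is not the JDK code: the class EagerVersusLazyFlag and its members are hypothetical, and the too-early read is simulated by touching the flag before the property is set, whereas per the PR description the real flag depends on sun.misc.VM.isBooted. The point is only that a static final flag is fixed when its class is initialised, so anything built from it at that moment (such as an alias array) keeps the early value.

```java
// Hypothetical illustration; none of these names exist in the JDK.
final class EagerVersusLazyFlag {
    static final String PROP = "jdk.charset.GB18030";

    // Eager: the flag is computed once, when Eager is first initialised,
    // and never changes afterwards.
    static final class Eager {
        static final boolean IS_2000 = "2000".equals(System.getProperty(PROP));
    }

    // Lazy: the setting is consulted each time it is needed, so a value
    // that becomes visible later is still honoured.
    static boolean is2000Lazy() {
        return "2000".equals(System.getProperty(PROP));
    }
}
```

If Eager.IS_2000 is first read before the property is visible, it stays false even after the property is set to 2000, while is2000Lazy() reflects the new value.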
Replaced by openjdk/jdk8u-dev#339
This is being proposed for inclusion in 8u382 during rampdown, so that the changes are in place for when GB18030-2022 enforcement begins in August. It modifies GB18030 to handle both the 2000 and the 2022 variant. The 2000 variant is available by setting -Djdk.charset.GB18030=2000.
With the preceding test changes in place (#43 and #44), the changes needed for this are fairly minimal. The biggest divergence from 11u is in the character set providers. The changes in the make directory are not needed as 8u never moved to using a template for GB18030 in the first place (the 11u changes revert it back to being source-based). The change in the SPI.java generator tool moves into ExtendedCharsets.java in the class library, as the file is not auto-generated in 8u. The change to StandardCharsets.java.template lands in AbstractCharsetProvider.java.
In 8u, the standard charsets are generated from a text file by a shell script, while the extended charsets are handled by a standard class. 11u moves GB18030 from extended to standard. I experimented with this in 8u, but it seemed more problematic than just keeping it in the extended set. The only reason I can see for moving it in 11u is that it allows IS_2000 to be package-private to sun.nio.cs, whereas we need to make it public in sun.nio.cs.ext so it can be accessed from sun.nio.cs.
To use the 11u solution would mean major rewrites to the shell script or bringing over the whole change in how the standard charset provider is generated from 11u, which I think, along with moving the package the character set is in, is too risky and unnecessary for this change. The generation changes are necessary because the GB18030 character set needs to provide a different alias, depending on whether it is the 2000 or 2022 variant. The genCharsetProvider.sh would need the alterations we have added to ExtendedCharsets.java to handle this, but converted to awk.
The only adjustment to GB18030.java, other than copyright headers, is to replace the use of jdk.internal.misc.VM.initLevel with that of sun.misc.VM.isBooted. 8u does not provide as fine-grained access to the initialisation status as 11u, and so may force the use of the 2022 standard until a later stage in the bootup (BOOTED is initLevel() = 4 in 11u).
With the tests, the adjustments are just due to differing bug IDs, the absence of @modules and the use of constructs (var) and library calls (Set.of) that don't exist in 8u. The List.of and Set.of calls are frequent issues in backports, so I used this as an opportunity to introduce a full set of equivalents into the test library. It should now be possible to just rewrite Set.of to Utils.setOf and List.of to Utils.listOf. The returned collections are expected to be unmodifiable, not contain null and (in the case of sets) not contain duplicates. Simple replacement with a newly constructed ArrayList or HashSet would not ensure this. While this test does not rely on this, others may, so it seemed worth providing a closer replacement for use in future backports.
All sun.nio.cs tests pass with this patch applied.
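The contract described above for the Set.of and List.of replacements (unmodifiable, no null elements, no duplicate set elements) could be met in Java 8 along the following lines. This is a sketch under assumptions, not the code added to the test library: the class name CollectionUtilsSketch is hypothetical, and the choice of exceptions (NullPointerException for null, IllegalArgumentException for duplicates) mirrors what Set.of and List.of do in later JDKs.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

// Hypothetical helper class; the real test-library Utils may differ.
final class CollectionUtilsSketch {
    private CollectionUtilsSketch() {}

    // Unmodifiable list that rejects null elements.
    @SafeVarargs
    static <E> List<E> listOf(E... elements) {
        List<E> list = new ArrayList<>(elements.length);
        for (E e : elements) {
            list.add(Objects.requireNonNull(e));
        }
        return Collections.unmodifiableList(list);
    }

    // Unmodifiable set that rejects null and duplicate elements.
    @SafeVarargs
    static <E> Set<E> setOf(E... elements) {
        Set<E> set = new LinkedHashSet<>();
        for (E e : elements) {
            if (!set.add(Objects.requireNonNull(e))) {
                throw new IllegalArgumentException("duplicate element: " + e);
            }
        }
        return Collections.unmodifiableSet(set);
    }
}
```

A plain new HashSet<>(Arrays.asList(...)) would silently drop duplicates, accept null and remain mutable, which is why a test written against Set.of could behave differently after a naive rewrite.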
Reviewing
Using git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk8u.git pull/45/head:pull/45
$ git checkout pull/45
Update a local copy of the PR:
$ git checkout pull/45
$ git pull https://git.openjdk.org/jdk8u.git pull/45/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 45
View PR using the GUI difftool:
$ git pr show -t 45
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk8u/pull/45.diff