
8301119: Support for GB18030-2022 #45

Closed
wants to merge 1 commit into from

Conversation

gnu-andrew
Member

@gnu-andrew gnu-andrew commented Jun 23, 2023

This is being proposed for inclusion in 8u382 during rampdown, so that the changes are in place for when GB18030-2022 enforcement begins in August. It modifies GB18030 to handle both the 2000 and the 2022 variants. The 2000 variant is available by setting -Djdk.charset.GB18030=2000.
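As a rough illustration of how a caller might tell which variant is active, the sketch below classifies the alias set returned by Charset.aliases(). It assumes the alias scheme used in this patch ("gb18030-2000" when -Djdk.charset.GB18030=2000 is set, "gb18030-2022" otherwise); the class name Gb18030VariantCheck is hypothetical.

```java
import java.nio.charset.Charset;
import java.util.Set;

final class Gb18030VariantCheck {
    // Assumes the alias scheme from this patch: "gb18030-2000" when the
    // 2000 variant is selected, "gb18030-2022" otherwise.
    static String variantOf(Set<String> aliases) {
        if (aliases.contains("gb18030-2000")) {
            return "2000";
        }
        if (aliases.contains("gb18030-2022")) {
            return "2022";
        }
        return "unknown"; // neither variant-specific alias is registered
    }

    public static void main(String[] args) {
        // Run with and without -Djdk.charset.GB18030=2000 to compare.
        System.out.println(variantOf(Charset.forName("GB18030").aliases()));
    }
}
```

Classifying by alias keeps the check on public API only, rather than reaching into sun.nio.cs internals for the IS_2000 flag.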

With the preceding test changes in place (#43 and #44), the changes needed for this are fairly minimal. The biggest divergence from 11u is in the character set providers. The changes in the make directory are not needed as 8u never moved to using a template for GB18030 in the first place (the 11u changes revert it back to being source-based). The change in the SPI.java generator tool moves into ExtendedCharsets.java in the class library, as the file is not auto-generated in 8u. The change to StandardCharsets.java.template lands in AbstractCharsetProvider.java.

In 8u, the standard charsets are generated from a text file by a shell script, while the extended charsets are handled by a standard class. 11u moves GB18030 from extended to standard. I experimented with this in 8u, but it seemed more problematic than just keeping it in the extended set. The only reason I can see for moving it in 11u is that it allows IS_2000 to be package-private to sun.nio.cs, whereas we need to make it public in sun.nio.cs.ext so it can be accessed from sun.nio.cs.

Using the 11u solution would mean either major rewrites to the shell script or bringing over from 11u the whole change in how the standard charset provider is generated. Together with moving the character set to a different package, I think that is too risky and unnecessary for this change. The generator changes would be needed because the GB18030 character set must provide a different alias depending on whether it is the 2000 or 2022 variant. genCharsetProvider.sh would need the alterations we have added to ExtendedCharsets.java to handle this, converted to awk.
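To make the alias requirement concrete, here is a minimal sketch of a case-insensitive alias table in which only the alias for the active variant is registered. AliasTableSketch and its two-argument charset(...) helper are hypothetical; String.CASE_INSENSITIVE_ORDER stands in for sun.misc.ASCIICaseInsensitiveComparator and is not identical to it for non-ASCII input.

```java
import java.util.Map;
import java.util.TreeMap;

final class AliasTableSketch {
    // Alias -> canonical name, looked up case-insensitively.
    private final Map<String, String> aliasMap =
            new TreeMap<>(String.CASE_INSENSITIVE_ORDER);

    // Hypothetical registration helper (the real provider also takes a class name).
    void charset(String canonicalName, String... aliases) {
        for (String alias : aliases) {
            aliasMap.put(alias, canonicalName);
        }
    }

    // Unknown names are returned unchanged, as in the provider's canonicalize().
    String canonicalize(String name) {
        String canonical = aliasMap.get(name);
        return canonical != null ? canonical : name;
    }

    // Registers only the alias that matches the selected variant.
    static AliasTableSketch forGb18030(boolean is2000) {
        AliasTableSketch table = new AliasTableSketch();
        table.charset("GB18030", is2000 ? "gb18030-2000" : "gb18030-2022");
        return table;
    }
}
```

With the 2022 table, "gb18030-2000" is simply not an alias, so it falls through unchanged and the lookup fails instead of silently returning the 2022 encoder.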

The only adjustment to GB18030.java, other than copyright headers, is to replace the use of jdk.internal.misc.VM.initLevel with that of sun.misc.VM.isBooted. 8u does not provide as fine-grained access to the initialisation status as 11u, and so may force the use of the 2022 standard until a later stage in the boot process (BOOTED corresponds to initLevel() == 4 in 11u).
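A minimal sketch of the boot-gated decision, assuming the property is only consulted once the VM reports that it has booted. The BooleanSupplier stands in for sun.misc.VM.isBooted() and the Supplier<String> for reading jdk.charset.GB18030; Gb18030VariantSelector is a hypothetical name, not the class in the patch.

```java
import java.util.function.BooleanSupplier;
import java.util.function.Supplier;

final class Gb18030VariantSelector {
    private final BooleanSupplier vmBooted;        // stand-in for sun.misc.VM::isBooted
    private final Supplier<String> propertyValue;  // stand-in for reading jdk.charset.GB18030

    Gb18030VariantSelector(BooleanSupplier vmBooted, Supplier<String> propertyValue) {
        this.vmBooted = vmBooted;
        this.propertyValue = propertyValue;
    }

    // Returns true only if the VM has booted and the property is exactly "2000";
    // before boot the answer is always false, i.e. the 2022 variant is used.
    boolean is2000() {
        if (!vmBooted.getAsBoolean()) {
            return false;
        }
        return "2000".equals(propertyValue.get());
    }
}
```

In 8u one might wire it up as `new Gb18030VariantSelector(sun.misc.VM::isBooted, () -> System.getProperty("jdk.charset.GB18030"))`; passing the dependencies in keeps the gating logic testable without a real boot sequence.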

With the tests, the adjustments are just due to differing bug IDs, the absence of @modules and the use of constructs (var) and library calls (Set.of) that don't exist in 8u. The List.of and Set.of calls are frequent issues in backports, so I used this as an opportunity to introduce a full set of equivalents into the test library. It should now be possible to just rewrite Set.of to Utils.setOf and List.of to Utils.listOf. The returned collections are expected to be unmodifiable, not contain null and (in the case of sets) not contain duplicates. Simple replacement with a newly constructed ArrayList or HashSet would not ensure this. While this test does not rely on this, others may, so it seemed worth providing a closer replacement for use in future backports.
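A hedged sketch of Java 8-compatible listOf/setOf helpers with the properties described above (unmodifiable results, no nulls, no duplicates in sets). This is not the code added to the test library; CollectionUtilsSketch is a hypothetical name, and the exception types simply mirror what List.of and Set.of throw in later JDKs.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

final class CollectionUtilsSketch {
    private CollectionUtilsSketch() { }

    // Unmodifiable list; rejects null elements like List.of.
    @SafeVarargs
    static <T> List<T> listOf(T... elements) {
        List<T> copy = new ArrayList<>(elements.length);
        for (T e : elements) {
            copy.add(Objects.requireNonNull(e, "null element"));
        }
        return Collections.unmodifiableList(copy);
    }

    // Unmodifiable set; rejects null and duplicate elements like Set.of.
    @SafeVarargs
    static <T> Set<T> setOf(T... elements) {
        Set<T> copy = new HashSet<>();
        for (T e : elements) {
            if (!copy.add(Objects.requireNonNull(e, "null element"))) {
                throw new IllegalArgumentException("duplicate element: " + e);
            }
        }
        return Collections.unmodifiableSet(copy);
    }
}
```

Copying into a fresh collection before wrapping it means later writes to the caller's varargs array cannot leak into the returned view.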

All sun.nio.cs tests pass with this patch applied.


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8301119: Support for GB18030-2022 (Enhancement - P4)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk8u.git pull/45/head:pull/45
$ git checkout pull/45

Update a local copy of the PR:
$ git checkout pull/45
$ git pull https://git.openjdk.org/jdk8u.git pull/45/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 45

View PR using the GUI difftool:
$ git pr show -t 45

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk8u/pull/45.diff


@gnu-andrew gnu-andrew changed the base branch from master to pr/44 June 23, 2023 01:39
@bridgekeeper

bridgekeeper bot commented Jun 23, 2023

👋 Welcome back andrew! A progress list of the required criteria for merging this PR into pr/44 will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot changed the title Backport cfe34ed89c4f6ef9a49dceef30da1e43b418b152 8186801: Add regression test to test mapping based charsets (generated at build time) Jun 23, 2023
@openjdk

openjdk bot commented Jun 23, 2023

This backport pull request has now been updated with issue from the original commit.

@openjdk openjdk bot added backport rfr Pull request is ready for review labels Jun 23, 2023
@mlbridge
Copy link

mlbridge bot commented Jun 23, 2023

Webrevs

@gnu-andrew gnu-andrew changed the title 8186801: Add regression test to test mapping based charsets (generated at build time) Backport 5c4e744dabcf7785c35168db5d0458ccebfd41e6 Jun 23, 2023
@openjdk openjdk bot changed the title Backport 5c4e744dabcf7785c35168db5d0458ccebfd41e6 8301119: Support for GB18030-2022 Jun 23, 2023
@openjdk

openjdk bot commented Jun 23, 2023

This backport pull request has now been updated with issue from the original commit.

@openjdk

openjdk bot commented Jun 23, 2023

@gnu-andrew Please do not rebase or force-push to an active PR as it invalidates existing review comments. Note for future reference, the bots always squash all changes into a single commit automatically as part of the integration. See OpenJDK Developers’ Guide for more information.

@openjdk-notifier openjdk-notifier bot changed the base branch from pr/44 to master June 23, 2023 12:24
@openjdk-notifier

The parent pull request that this pull request depends on has now been integrated and the target branch of this pull request has been updated. This means that changes from the dependent pull request can start to show up as belonging to this pull request, which may be confusing for reviewers. To remedy this situation, simply merge the latest changes from the new target branch into this pull request by running commands similar to these in the local repository for your personal fork:

git checkout JDK-8301119
git fetch https://git.openjdk.org/jdk8u.git master
git merge FETCH_HEAD
# if there are conflicts, follow the instructions given by git merge
git commit -m "Merge master"
git push

@openjdk

openjdk bot commented Jun 23, 2023

@gnu-andrew this pull request cannot be integrated into master due to one or more merge conflicts. To resolve these merge conflicts and update this pull request you can run the following commands in the local repository for your personal fork:

git checkout JDK-8301119
git fetch https://git.openjdk.org/jdk8u.git master
git merge FETCH_HEAD
# resolve conflicts and follow the instructions given by git merge
git commit -m "Merge master"
git push

@openjdk openjdk bot added the merge-conflict Pull request has merge conflict with target branch label Jun 23, 2023
@openjdk

openjdk bot commented Jun 23, 2023

@gnu-andrew Please do not rebase or force-push to an active PR as it invalidates existing review comments. Note for future reference, the bots always squash all changes into a single commit automatically as part of the integration. See OpenJDK Developers’ Guide for more information.

@openjdk openjdk bot removed the merge-conflict Pull request has merge conflict with target branch label Jun 23, 2023
jerboaa

This comment was marked as outdated.

@gnu-andrew
Member Author

I'm not able to reproduce these failures locally, which makes it hard to diagnose what's going on here.

I can try and move the character set to sun.nio.cs after all and see if it makes a difference for you.

DingliZhang pushed a commit to DingliZhang/jdk8u that referenced this pull request Jun 25, 2023
…64cpp

Fix inter_masm_riscv64.cpp by fixing mdp->mdx and adding annotation
@jerboaa
Contributor

jerboaa commented Jun 27, 2023

These 4 tests consistently fail for me with a fastdebug build:

FAILED: java/nio/charset/Charset/RegisteredCharsets.java
FAILED: sun/nio/cs/mapping/CoderTest.java
FAILED: sun/nio/cs/mapping/TestConv.java
FAILED: sun/nio/cs/TestGB18030.java

That was my bad. Ran on the wrong JDK build (so have hidden the comment). Sorry. I'll post a review of the patch today.

Contributor

@jerboaa jerboaa left a comment


I'd suggest using the late init hook in AbstractCharsetProvider.charsetForName() instead, which would make this patch cleaner IMO. See gnu-andrew#1

Comment on lines +118 to +125
private String canonicalize(String csn) {
    if (csn.startsWith("gb18030-")) {
        return csn.equals("gb18030-2022") && !GB18030.IS_2000 ||
               csn.equals("gb18030-2000") && GB18030.IS_2000 ? "gb18030" : csn;
    } else {
        String acn = aliasMap.get(csn);
        return (acn != null) ? acn : csn;
    }
Contributor


I believe that if we hooked into the late initialization hook instead, we wouldn't need this canonicalization, as the aliasMap would map the alias back to the correct class name.

@@ -34,7 +34,7 @@
import java.util.Locale;
import java.util.Map;
import sun.misc.ASCIICaseInsensitiveComparator;

import sun.nio.cs.ext.GB18030;
Contributor


Depending on the GB18030 class (in sun.nio.cs.ext, shipped in charsets.jar) from package sun.nio.cs (in rt.jar) seems worrisome.

Comment on lines 115 to +117
  charset("GB18030", "GB18030",
          new String[] {
-             "gb18030-2000"
+             GB18030.IS_2000 ? "gb18030-2000" : "gb18030-2022"
Contributor


This initialization will be wrong when run with LANG=zh_CN.GB18030 and -Djdk.charset.GB18030=2000, as in that case GB18030.IS_2000 will be false because the JVM won't have been fully initialized yet. Again, I suggest using the late init hook instead. See https://bugs.openjdk.org/browse/JDK-8310947
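A minimal sketch of the late-init pattern being suggested, assuming a provider that runs an init() hook on each lookup and only registers the variant-specific alias once the VM has booted. LateInitProviderSketch, resolve() and the two supplier parameters are hypothetical stand-ins for AbstractCharsetProvider.charsetForName(), sun.misc.VM.isBooted() and the jdk.charset.GB18030 property; resolve() returns a canonical name rather than a Charset to keep the example self-contained.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BooleanSupplier;
import java.util.function.Supplier;

final class LateInitProviderSketch {
    // Case-insensitive alias -> canonical charset name
    private final Map<String, String> aliasMap =
            new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
    private final BooleanSupplier vmBooted;          // stand-in for sun.misc.VM::isBooted
    private final Supplier<String> variantProperty;  // stand-in for jdk.charset.GB18030
    private boolean initialized;

    LateInitProviderSketch(BooleanSupplier vmBooted, Supplier<String> variantProperty) {
        this.vmBooted = vmBooted;
        this.variantProperty = variantProperty;
    }

    // Late-init hook: does nothing until the VM has booted, then registers
    // the GB18030 alias for the requested variant exactly once.
    private void init() {
        if (initialized || !vmBooted.getAsBoolean()) {
            return;
        }
        boolean is2000 = "2000".equals(variantProperty.get());
        aliasMap.put(is2000 ? "gb18030-2000" : "gb18030-2022", "GB18030");
        initialized = true;
    }

    // Mirrors the shape of charsetForName(): run the hook, then resolve the
    // name through aliasMap alone, with no GB18030-specific special case.
    synchronized String resolve(String name) {
        init();
        String canonical = aliasMap.get(name);
        return canonical != null ? canonical : name;
    }
}
```

Because the alias is only registered after boot, a lookup made early in startup does not freeze a pre-boot answer into the table, and canonicalization needs no separate gb18030 branch.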

@gnu-andrew
Member Author

Replaced by openjdk/jdk8u-dev#339

@gnu-andrew gnu-andrew closed this Jul 13, 2023
Labels
backport rfr Pull request is ready for review
2 participants