-
Notifications
You must be signed in to change notification settings - Fork 164
8301119: Support for GB18030-2022 #339
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
/contributor add @jerboaa |
👋 Welcome back andrew! A progress list of the required criteria for merging this PR into |
This backport pull request has now been updated with issue from the original commit. |
@gnu-andrew |
@gnu-andrew |
Webrevs
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Andrew,
I looked through the patch, and compared it with 11u, too, looks fine. Minor remarks inline, feel free to ignore the style nits.
new String[] { | ||
"gb18030-2000" | ||
}); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please add this replacement to the comment above listing the changes done in init?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. I've updated the comment to document jdk.charset.GB18030
.
@@ -1184,6 +1185,8 @@ protected void init() { | |||
|
|||
String map = AccessController.doPrivileged( | |||
new GetPropertyAction("sun.nio.cs.map")); | |||
boolean isGB18030_2000 = "2000".equals(AccessController.doPrivileged( | |||
new GetPropertyAction("jdk.charset.GB18030"))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More user friendly would be to test for "2000" or "2023" and to throw IAE for other values; but since later JDK versions don't do this either this is probably fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, but yes this is something that should go into trunk and be backported down. I'm already a little uneasy about fixing 8310947 in 8u first.
@@ -12595,6 +12637,7 @@ private CoderResult encodeArrayLoop(CharBuffer src, | |||
int condensedKey = 0; // expands to a four byte sequence | |||
int hiByte = 0, loByte = 0; | |||
currentState = GB18030_DOUBLE_BYTE; | |||
boolean isGB18030_2000 = ExtendedCharsets.isGB18030_2000(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here, and in other places: naming these vars IS_2000 would have simplified reviewing (mechanical comparison with 11)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think some renaming is intentional as it's a static boolean constant in 11u, but a method and a private instance variable in 8u. I do think the isGB18032_2000
is odd (presumably to avoid a name conflict with the method), so I will alter that to just is_2000
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Went with making that isGB18030_2000
in the end and the method IS_2000
. The latter certainly shortens some of the GB18030.java
changes.
if (e3 == null) { throw new NullPointerException("e3"); } | ||
if (e4 == null) { throw new NullPointerException("e4"); } | ||
if (e5 == null) { throw new NullPointerException("e5"); } | ||
if (e6 == null) { throw new NullPointerException("e6"); } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here, and in other places: Objects.requireNonNull may be shorter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Completely forgot that method existed!
added = s.add(e5); | ||
if (!added) { throw new IllegalArgumentException("duplicate 5"); } | ||
added = s.add(e6); | ||
if (!added) { throw new IllegalArgumentException("duplicate 6"); } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could be shortened, but thats up to you (if one does not care for the string, a simple if (!(s.add(e1) && s.add(e2) && ...
would be sufficient.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure why I wrote this in such a long-winded way. Maybe there was more to it at one point and I didn't refine it after making further changes :/
Anyway, converted to a single line but kept the message, which can be useful in debugging (this is test output after all)
Thanks for the quick review. I wasn't expecting you to be able to find time today. I've made adjustments as discussed above. Tests pass as before. |
Still good. Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks fine to me.
@gnu-andrew This change now passes all automated pre-integration checks. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been no new commits pushed to the ➡️ To integrate this PR with the above commit message to the |
Thanks both. Flagged for approval with |
I see |
Going to push as commit 4dc1305. |
@gnu-andrew Pushed as commit 4dc1305. 💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored. |
This patch modifies GB18030 to handle both the 2000 and the 2022 variant. The 2000 variant is available by setting
-Djdk.charset.GB18030=2000
. This PR replaces openjdk/jdk8u#45.With the preceding test changes in place (openjdk/jdk8u#43 and openjdk/jdk8u#44), the changes needed for this are fairly minimal. The biggest divergence from 11u is in the character set providers. The changes in the
make
directory are not needed as 8u never moved to using a template for GB18030 in the first place (the 11u changes revert it back to being source-based). The change in the SPI.java generator tool moves into ExtendedCharsets.java in the class library, as the file is not auto-generated in 8u. Following additional work by @jerboaa, the alias is now set to2022
initially, and then replaced in theinit()
method ifjdk.charset.GB18030
is2000
.In 8u, the standard charsets are generated from a text file by a shell script, while the extended charsets are handled by a standard class. 11u moves GB18030 from extended to standard. I experimented with this in 8u, but it seemed more problematic than just keeping it in the extended set. The only reason I can see for moving it in 11u is it allowed
IS_2000
to be package-private to sun.nio.cs. This is resolved by @jerboaa's changes which modifyaliasMap
appropriately on creation, and so allow the use of a package-private methodisGB18030_2000
inExtendedCharsets
instead.To use the 11u solution would mean major rewrites to the shell script or bringing over the whole change in how the standard charset provider is generated from 11u, which I think, along with moving the package the character set is in, is too risky and unnecessary for this change. The generation changes are necessary because the GB18030 character set needs to provide a different alias, depending on whether it is the 2000 or 2002 variant. The
genCharsetProvider.sh
would need the alterations we have added toExtendedCharsets.java
to handle this, but converted toawk
. I did experiment with this, but saw test failures.The only adjustment to the
GB18030.java
changes is copyright headers and the removal ofIS_2000
as mentioned above.With the tests, the adjustments are just due to differing bug IDs, the absence of
@modules
and the use of constructs (var
) and library calls (Set.of
) that don't exist in 8u. TheList.of
andSet.of
calls are frequent issues in backports, so I used this as an opportunity to introduce a full set of equivalents into the test library. It should now be possible to just rewriteSet.of
toUtils.setOf
andList.of
toUtils.listOf
. The returned collections are expected to be unmodifiable, not containnull
and (in the case of sets) not contain duplicates. Simple replacement with a newly constructedArrayList
orHashSet
would not ensure this. While this test does not rely on this, others may, so it seemed worth providing a closer replacement for use in future backports.All
sun.nio.cs
andjava.nio.charset
tests pass with this patch applied.Progress
Issues
Reviewers
Contributors
<sgehwolf@openjdk.org>
Reviewing
Using
git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk8u-dev.git pull/339/head:pull/339
$ git checkout pull/339
Update a local copy of the PR:
$ git checkout pull/339
$ git pull https://git.openjdk.org/jdk8u-dev.git pull/339/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 339
View PR using the GUI difftool:
$ git pr show -t 339
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk8u-dev/pull/339.diff
Webrev
Link to Webrev Comment