8300916: Re-examine the initialization of JNU Charset in StaticProperty #12171

Closed
wants to merge 10 commits into from

Conversation

naotoj
Member

@naotoj naotoj commented Jan 24, 2023

This issue was found during the review of PR #12132, where the Charset class was loaded and initialized in phase 1 of the startup process. Since Charset depends on StaticProperty, loading of the Charset class should be delayed. I moved the cache for jnuCharset into the actual call sites, ProcessImpl and ProcessEnvironment on Unix platforms, so that initPhase1() won't initialize the Charset class.
Unrelated, but I replaced Locale.ENGLISH with Locale.ROOT in the argument of toLowerCase().
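
For readers unfamiliar with the idiom, here is a minimal sketch of locale-neutral lowercasing with Locale.ROOT. The class and method names are hypothetical and not taken from the PR. Locale.ENGLISH and Locale.ROOT should give the same result for this kind of input; the point of either is to avoid locale-sensitive case mapping such as the Turkish dotless i.

```java
import java.util.Locale;

// Hypothetical demo class; not part of the PR.
class LowerCaseDemo {
    // Locale-neutral lowercasing, suitable for identifiers such as OS names.
    static String normalize(String s) {
        return s.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        String name = "WINDOWS";
        System.out.println(normalize(name)); // windows
        // Under Turkish rules, 'I' lowercases to dotless 'ı' (U+0131).
        System.out.println(name.toLowerCase(Locale.forLanguageTag("tr"))); // wındows
    }
}
```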


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change requires CSR request JDK-8301114 to be approved

Issues

  • JDK-8300916: Re-examine the initialization of JNU Charset in StaticProperty
  • JDK-8301114: Re-examine the initialization of JNU Charset in StaticProperty (CSR)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk pull/12171/head:pull/12171
$ git checkout pull/12171

Update a local copy of the PR:
$ git checkout pull/12171
$ git pull https://git.openjdk.org/jdk pull/12171/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 12171

View PR using the GUI difftool:
$ git pr show -t 12171

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/12171.diff

@bridgekeeper

bridgekeeper bot commented Jan 24, 2023

👋 Welcome back naoto! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot added the rfr Pull request is ready for review label Jan 24, 2023
@openjdk

openjdk bot commented Jan 24, 2023

@naotoj The following label will be automatically applied to this pull request:

  • core-libs

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the core-libs core-libs-dev@openjdk.org label Jan 24, 2023
@mlbridge

mlbridge bot commented Jan 24, 2023

@@ -69,6 +69,10 @@ final class ProcessImpl extends Process {
 // Linux platforms support a normal (non-forcible) kill signal.
 static final boolean SUPPORTS_NORMAL_TERMINATION = true;

+// Cache for JNU Charset. The encoding name is guaranteed
+// to be supported in this environment.
+static final Charset JNU_CHARSET = Charset.forName(StaticProperty.jnuEncoding());
Member

Before the change in this PR, jnuCharset used to fall back to Charset.defaultCharset() if the sun.jnu.encoding value wasn't a legal charset or couldn't be loaded.
On this line, we no longer do that. The comment, above this line, indicates that this is intentional. Looking at the code in java.lang.System#initPhase1(), the sun.jnu.encoding property value is updated to UTF-8 if the previous value isn't a supported charset. That is done just before the StaticProperty class is referenced and loaded. Effectively, as noted in this comment, the sun.jnu.encoding would just have the right usable/supported value stored in StaticProperty.jnuEncoding() by the time we reach here. So this change looks fine to me.
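
The reasoning above can be sketched as follows. This is an illustration only, under the assumption that startup code validates the encoding name once and substitutes UTF-8 when it is unsupported; JnuEncodingSketch and normalize are hypothetical names, not the JDK's actual code.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration; the names here are not the JDK's.
class JnuEncodingSketch {
    // Returns the name unchanged if it denotes a supported charset, else "UTF-8".
    static String normalize(String encodingName) {
        try {
            if (encodingName != null && Charset.isSupported(encodingName)) {
                return encodingName;
            }
        } catch (IllegalArgumentException e) {
            // Illegal charset name syntax: fall through to the UTF-8 default.
        }
        return StandardCharsets.UTF_8.name();
    }

    public static void main(String[] args) {
        String normalized = normalize(System.getProperty("sun.jnu.encoding"));
        // The name was validated above, so no fallback is needed at this point.
        Charset jnu = Charset.forName(normalized);
        System.out.println(jnu);
    }
}
```

Once the name has been normalized up front, a later Charset.forName call on it needs no fallback of its own, which is the property the review comment relies on.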

@jaikiran
Member

Hello Naoto,

This issue was found during the review of PR #12132, where the Charset class was loaded and initialized in phase 1 of the startup process. Since Charset depends on StaticProperty

From the comment in that PR:

That said, the change does highlight an issue in StaticProperty, where it calls Charset.defaultCharset(), which in turn will use StaticProperty.FILE_ENCODING before that class is fully initialized.

Looking at the code in Charset.defaultCharset() https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/nio/charset/Charset.java#L651, I don't see it using StaticProperty class. Instead, it just queries the System.getProperty(). I don't see any other methods in Charset using StaticProperty either.

@AlanBateman
Contributor

AlanBateman commented Jan 25, 2023

The change to StaticProperty to avoid calling out to Charset.defaultCharset from the initializer is good. However, the other part to that is the scenario in PR 12132 where the default Charset was accidentally located via the provider mechanism in JDK 9-17. If I read the changes correctly, that fragile scenario will come back. We have a couple of ways to avoid that, one being to ensure that defaultCharset is called before the boot layer is set. A simpler and more reliable approach would be to change Charset.defaultCharset to use standardProvider.charsetForName with the value of "file.encoding", and avoid the provider lookup completely.
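
A rough sketch of the suggested approach, for illustration only. The JDK's standardProvider is internal and cannot be called from application code, so a fixed table stands in for it here; BuiltInOnlyLookup, BUILT_IN, and resolveDefault are hypothetical names. The UTF-8 fallback is an assumption based on the behavior discussed later in this thread.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for a lookup that never consults installed providers.
class BuiltInOnlyLookup {
    // Fixed table playing the role of the JDK's internal standard provider.
    private static final Map<String, Charset> BUILT_IN = Map.of(
            "utf-8", StandardCharsets.UTF_8,
            "iso-8859-1", StandardCharsets.ISO_8859_1,
            "us-ascii", StandardCharsets.US_ASCII);

    // Resolves the file.encoding value against the built-in table only;
    // anything else (including provider-supplied charsets) yields UTF-8.
    static Charset resolveDefault(String fileEncoding) {
        if (fileEncoding == null) {
            return StandardCharsets.UTF_8;
        }
        Charset cs = BUILT_IN.get(fileEncoding.toLowerCase(Locale.ROOT));
        return cs != null ? cs : StandardCharsets.UTF_8;
    }

    public static void main(String[] args) {
        System.out.println(resolveDefault(System.getProperty("file.encoding")));
    }
}
```

Because the lookup never goes through the service-provider mechanism, the result does not depend on when the first call happens during startup.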

@naotoj
Member Author

naotoj commented Jan 25, 2023

Charset.defaultCharset() now uses the charset returned by standardProvider.charsetForName(<file.encoding>).

@AlanBateman
Contributor

Charset.defaultCharset() now uses the charset returned by standardProvider.charsetForName(<file.encoding>).

I think this is the right thing to do. It can also be changed to use StaticProperty.fileEncoding() and maybe the field can be changed to be a @Stable field.

It might be that we will need to create a CSR and Release Note for this change. The scenario in PR 12132 is unfortunate but does show that some deployments may have been relying on this from JDK 9 to JDK 17. With the change here, we are doubling down on ensuring that the default charset is loaded from java.base.

@openjdk openjdk bot added the csr Pull request needs approved CSR before integration label Jan 25, 2023
@naotoj
Member Author

naotoj commented Jan 25, 2023

Good point. Will make the change and create a CSR/relnote.

@mlchung
Member

mlchung commented Jan 25, 2023

A simpler and more reliable approach would be to change Charset.defaultCharset to use standardProvider.charsetForName with the value of "file.encoding", and avoid the provider lookup completely.

This is a good observation. The change looks good to me.

@@ -628,6 +629,7 @@ public SortedMap<String,Charset> run() {
 });
 }

+@Stable
 private static volatile Charset defaultCharset;
Contributor

If the field is stable then does the volatile go away?

Member Author

Right. It should.

cs = standardProvider.charsetForName(StaticProperty.fileEncoding());
} else {
PrivilegedAction<Charset> pa =
() -> standardProvider.charsetForName(StaticProperty.fileEncoding());
Member

Which operation here performs a security permission check? I don't think any does.

Member Author

Right. I was under the impression that the call should be made within a privileged action (as in the comments for those accessor methods in StaticProperty), but the property value itself is not given to untrusted code. Will remove the permission check.

@takiguc

takiguc commented Jan 26, 2023

Hello @AlanBateman
You said

The change to StaticProperty to avoid calling out to Charset.defaultCharset from the initializer is good. However, the other part to that is the scenario in PR 12132 where the default Charset was accidentally located via the provider mechanism in JDK 9-17. If I read the changes correctly, that fragile scenario will come back. We have a couple of ways to avoid that, one being to ensure that defaultCharset is called before the boot layer is set. A simpler and more reliable approach would be to change Charset.defaultCharset to use standardProvider.charsetForName with the value of "file.encoding", and avoid the provider lookup completely.

Could you explain the fragile scenario?

@AlanBateman
Contributor

Could you explain the fragile scenario?

Charset.defaultCharset does the lookup on first use. Since JDK 18, that first use is in initPhase1, so it will always find the default charset in java.base. In JDK 9-17, the first use may have been in initPhase2 (during module system initialization), in which case it would locate the default charset in java.base, or it may have been after the VM was fully initialized, in which case it would be a service provider lookup that might find the charset in jdk.charsets or on the class path.

I've added a comment to the CSR with more context and to support the proposal to double down on requiring the default charset to be in java.base.

Contributor

@AlanBateman AlanBateman left a comment

Thanks for the updates/iterations, I think you've got this to a good place.

One thing to think about is having System.initPhase3 read file.encoding and if not UTF-8, it could call Charset.defaultCharset and if not the expected value then it could emit a warning like is done for a bad value of java.io.tmpdir.

Another thing is whether to add a regression test to ensure that the default charset is UTF-8 when run with -Dfile.encoding=XXX and XXX is in the service provider module.
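
The startup check suggested above might look roughly like the following. This is a sketch under stated assumptions, not what System.initPhase3 does: FileEncodingCheck and warningFor are hypothetical names, and the message format is invented for illustration.

```java
import java.nio.charset.Charset;
import java.util.Optional;

// Hypothetical sketch of a "requested vs. actual default charset" check.
class FileEncodingCheck {
    // Returns a warning if the requested encoding did not become the default charset.
    static Optional<String> warningFor(String requested, Charset actual) {
        if (requested == null) {
            return Optional.empty();
        }
        try {
            // Charset.forName resolves aliases, so "UTF8" matches UTF-8.
            if (Charset.forName(requested).equals(actual)) {
                return Optional.empty();
            }
        } catch (IllegalArgumentException e) {
            // Unknown or illegal charset name: report it below.
        }
        return Optional.of("WARNING: file.encoding=" + requested
                + " is not in effect; default charset is " + actual.name());
    }

    public static void main(String[] args) {
        warningFor(System.getProperty("file.encoding"), Charset.defaultCharset())
                .ifPresent(System.err::println);
    }
}
```

A regression test along the lines proposed could then assert that Charset.defaultCharset() is UTF-8 when file.encoding names a charset that only a service provider supplies.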

@@ -628,7 +628,8 @@ public SortedMap<String,Charset> run() {
 });
 }

-private static volatile Charset defaultCharset;
+@Stable
+private static Charset defaultCharset;
Contributor

Style-wise, I think the annotation is placed between the private and static modifiers in other places.

Member Author

Fixed.

@naotoj
Member Author

naotoj commented Jan 26, 2023

Thanks for the updates/iterations, I think you've got this to a good place.

One thing to think about is having System.initPhase3 read file.encoding and if not UTF-8, it could call Charset.defaultCharset and if not the expected value then it could emit a warning like is done for a bad value of java.io.tmpdir.

Another thing is whether to add a regression test to ensure that the default charset is UTF-8 when run with -Dfile.encoding=XXX and XXX is in the service provider module.

Filed: https://bugs.openjdk.org/browse/JDK-8301199

@openjdk openjdk bot removed the csr Pull request needs approved CSR before integration label Jan 27, 2023
@openjdk

openjdk bot commented Jan 27, 2023

@naotoj This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8300916: Re-examine the initialization of JNU Charset in StaticProperty

Reviewed-by: mchung, alanb

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 96 new commits pushed to the master branch:

  • ae0e76d: 8301120: Cleanup utility classes java.util.Arrays and java.util.Collections
  • b8e5abc: 8301097: Update GHA XCode to 12.5.1
  • 9c4bc2c: 8301132: Test update for deprecated sprintf in Xcode 14
  • 7f05d57: 8217920: Lookup.defineClass injects a class that can access private members of any class in its own module
  • 22c976a: 8177418: NPE is not apparent for methods in java.util.TimeZone API docs
  • 7aaf76c: 8300924: Method::invoke throws wrong exception type when passing wrong number of arguments to method with 4 or more parameters
  • 49ff520: 8300241: Replace NULL with nullptr in share/classfile/
  • f52d35c: 8300240: Replace NULL with nullptr in share/ci/
  • 5c1ec82: 8301077: Replace NULL with nullptr in share/services/
  • dff4131: 8285850: [AIX] unreachable code in basic_tools.m4 -> BASIC_CHECK_TAR
  • ... and 86 more: https://git.openjdk.org/jdk/compare/86fed79670c109fc3a7fbe1eb2b1485c6dd99e2f...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Jan 27, 2023
@naotoj
Member Author

naotoj commented Jan 30, 2023

/integrate

@openjdk

openjdk bot commented Jan 30, 2023

Going to push as commit 3238139.
Since your change was applied there have been 122 commits pushed to the master branch:

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Jan 30, 2023
@openjdk openjdk bot closed this Jan 30, 2023
@openjdk openjdk bot removed the ready (Pull request is ready to be integrated) and rfr (Pull request is ready for review) labels Jan 30, 2023
@openjdk

openjdk bot commented Jan 30, 2023

@naotoj Pushed as commit 3238139.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Labels
core-libs core-libs-dev@openjdk.org integrated Pull request has been integrated
5 participants