8276970: Default charset for PrintWriter that wraps PrintStream #6401

Closed
wants to merge 3 commits into from

Conversation

Member

@naotoj naotoj commented Nov 15, 2021

This change fixes the default charset of a PrintWriter/OutputStreamWriter that wraps a PrintStream, so that it uses the PrintStream's charset. The issue was raised during the conversation in #5771.
A corresponding CSR has also been drafted: https://bugs.openjdk.java.net/browse/JDK-8277078
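
For illustration, here is a minimal sketch of the behavior this change targets. The class name `WrappedCharsetDemo` and the helper `encodeViaWrappedWriter` are hypothetical and exist only for this example. It writes one character through a PrintWriter that wraps a PrintStream created with an explicit charset, then reports whether the resulting bytes match that charset. On a build that contains this fix the check is expected to report `true`; on earlier builds the PrintWriter falls back to the default charset instead.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.PrintWriter;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class WrappedCharsetDemo {

    /**
     * Writes text through a PrintWriter that wraps a PrintStream created
     * with the given charset, and returns the bytes that reached the sink.
     * (Hypothetical helper, for illustration only.)
     */
    static byte[] encodeViaWrappedWriter(String text, Charset cs) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        PrintStream ps = new PrintStream(sink, true, cs);
        // No charset is passed here; which one gets used is the subject of this PR.
        PrintWriter pw = new PrintWriter(ps);
        pw.print(text);
        pw.flush();
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        // UTF-16BE is chosen because it differs visibly from common defaults.
        Charset cs = StandardCharsets.UTF_16BE;
        byte[] actual = encodeViaWrappedWriter("\u3042", cs);
        byte[] expected = "\u3042".getBytes(cs);
        System.out.println("inherits PrintStream charset: "
                + Arrays.equals(actual, expected));
    }
}
```

Code that must behave the same on every JDK version can still pass the charset explicitly, for example by wrapping the stream in an `OutputStreamWriter` constructed with that charset.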


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Issue

  • JDK-8276970: Default charset for PrintWriter that wraps PrintStream

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/6401/head:pull/6401
$ git checkout pull/6401

Update a local copy of the PR:
$ git checkout pull/6401
$ git pull https://git.openjdk.java.net/jdk pull/6401/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 6401

View PR using the GUI difftool:
$ git pr show -t 6401

Using diff file

Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/6401.diff

@naotoj
Member Author

naotoj commented Nov 15, 2021

/csr

@bridgekeeper

bridgekeeper bot commented Nov 15, 2021

👋 Welcome back naoto! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot added the rfr (Pull request is ready for review) and csr (Pull request needs approved CSR before integration) labels Nov 15, 2021
@openjdk

openjdk bot commented Nov 15, 2021

@naotoj this pull request will not be integrated until the CSR request JDK-8277078 for issue JDK-8276970 has been approved.

@openjdk

openjdk bot commented Nov 15, 2021

@naotoj The following label will be automatically applied to this pull request:

  • core-libs

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the core-libs (core-libs-dev@openjdk.org) label Nov 15, 2021
@mlbridge

mlbridge bot commented Nov 15, 2021

Webrevs

@@ -68,6 +68,7 @@
private final boolean autoFlush;
private boolean trouble = false;
private Formatter formatter;
private Charset charset;
Member

@jaikiran jaikiran Nov 16, 2021

Hello Naoto, should this be formally marked as final?

Member Author

@naotoj naotoj Nov 16, 2021

Good catch! I will make it final.

@takiguc

takiguc commented Nov 16, 2021

I tested some of the Java tool commands on #5771.

jar.exe, javac.exe, javadoc.exe, javap.exe, jdeps.exe, jlink.exe, jmod.exe, jpackage.exe

They worked as expected on CentOS7 (ja_JP.eucjp locale) and on Japanese Windows 10 Pro.

* default charset.
* OutputStreamWriter, which will convert characters into bytes using
* the charset in {@code out} if it is a {@code PrintStream}, or using
* the default charset.
*
Contributor

@AlanBateman AlanBateman Nov 16, 2021

I think I prefer the wording in OutputStreamWriter because it puts the default encoding first and makes it just a bit clearer that the PS case is the exception.

Member Author

@naotoj naotoj Nov 16, 2021

Will use OutputStreamWriter's wording here. Also I am tempted to make PrintStream::charset() public, as some custom OutputStreamWriter implementations would also need the charset information.

Contributor

@AlanBateman AlanBateman Nov 16, 2021

Also I am tempted to make PrintStream::charset() public, as some custom OutputStreamWriter implementations would also need the charset information.

I think that would be a good addition.

@naotoj
Member Author

naotoj commented Nov 16, 2021

Good to know! Thank you for your help.

*/
public Charset charset() {
return charset;
}
Contributor

@AlanBateman AlanBateman Nov 16, 2021

This looks good. You could use {@return the charset used ...} to avoid repeating the message. Also might be better to move the method to after the constructors so that it's with the other instance methods.
The updated method descriptions in PS, PW, and OutputStreamWriter look good.
So overall I think we've got to a good place. Wrapping a PS with PW and not inheriting the charset is a potential accident that goes back 20+ years.

Member Author

@naotoj naotoj Nov 16, 2021

Thanks, Alan. Modified as suggested.
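
As a sketch of the `{@return ...}` suggestion above: the inline form of the tag (available in Javadoc since JDK 16) produces both the summary sentence and the "Returns" section from a single phrase, so the text is not written twice. The class `EncodedSink` below is hypothetical and is not the PrintStream source; only the shape of the doc comment is the point.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Objects;

/** Hypothetical class, used only to illustrate the inline return tag. */
public class EncodedSink {
    // Declared final, in line with the earlier review comment on the field.
    private final Charset charset;

    public EncodedSink(Charset charset) {
        this.charset = Objects.requireNonNull(charset, "charset");
    }

    // The accessor sits after the constructor, with the other instance methods.
    /**
     * {@return the charset used by this sink to encode characters}
     */
    public Charset charset() {
        return charset;
    }

    public static void main(String[] args) {
        System.out.println(new EncodedSink(StandardCharsets.UTF_8).charset());
    }
}
```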

@naotoj
Member Author

naotoj commented Nov 16, 2021

BTW, I still observe on Windows (system locale=ja-JP):

D:\projects\jdk\git\jdk>.\build\windows-x64\jdk\bin\jshell -J-Duser.language=ja
|  JShellへようこそ -- バージョン18-internal
|  概要については、次を入力してください: /help intro

jshell> System.out.println("\u3042")
縺・

This needs to be separately addressed in https://bugs.openjdk.java.net/browse/JDK-8274784

@openjdk openjdk bot removed the csr (Pull request needs approved CSR before integration) label Nov 17, 2021
@openjdk

openjdk bot commented Nov 17, 2021

@naotoj This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8276970: Default charset for PrintWriter that wraps PrintStream

Reviewed-by: rriggs, alanb

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 195 new commits pushed to the master branch:

  • 29e552c: 8272358: Some tests may fail when executed with other locales than the US
  • ce4471f: 8277346: ProblemList 7 serviceability/sa tests on macosx-x64
  • 45a60db: 8277045: G1: Remove unnecessary set_concurrency call in G1ConcurrentMark::weak_refs_work
  • 6bb0462: 8277224: sun.security.pkcs.PKCS9Attributes.toString() throws NPE
  • d8c0280: 8277316: ciReplay: dump_replay_data is not thread-safe
  • 007ad7c: 8277303: Terminology mismatch between JLS17-3.9 and SE17's javax.lang.model.SourceVersion method specs
  • 8881f29: 8277310: ciReplay: @CPI MethodHandle references not resolved
  • 262d070: 8277246: Check for NonRepudiation as well when validating a TSA certificate
  • a907b2b: 8276177: nsk/jvmti/RedefineClasses/StressRedefineWithoutBytecodeCorruption failed with "assert(def_ik->is_being_redefined()) failed: should be being redefined to get here"
  • b687664: 8277159: Fix java/nio/file/FileStore/Basic.java test by ignoring /run/user/* mount points
  • ... and 185 more: https://git.openjdk.java.net/jdk/compare/7115892f96a5a57ce9d37602038b787d19da5d81...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready (Pull request is ready to be integrated) label Nov 17, 2021
@naotoj
Member Author

naotoj commented Nov 18, 2021

The following diff seems to fix the garbled char issue above:

$ git diff
diff --git a/src/jdk.jshell/share/classes/jdk/jshell/execution/RemoteExecutionControl.java b/src/jdk.jshell/share/classes/jdk/jshell/execution/RemoteExecutionControl.java
index 810e80acf47..be0b9dcb0c3 100644
--- a/src/jdk.jshell/share/classes/jdk/jshell/execution/RemoteExecutionControl.java
+++ b/src/jdk.jshell/share/classes/jdk/jshell/execution/RemoteExecutionControl.java
@@ -30,6 +30,7 @@ import java.io.PrintStream;
 import java.lang.reflect.Method;
 import java.net.Socket;

+import java.nio.charset.Charset;
 import java.util.HashMap;
 import java.util.Map;
 import java.util.function.Consumer;
@@ -63,8 +64,8 @@ public class RemoteExecutionControl extends DirectExecutionControl implements Ex
         InputStream inStream = socket.getInputStream();
         OutputStream outStream = socket.getOutputStream();
         Map<String, Consumer<OutputStream>> outputs = new HashMap<>();
-        outputs.put("out", st -> System.setOut(new PrintStream(st, true)));
-        outputs.put("err", st -> System.setErr(new PrintStream(st, true)));
+        outputs.put("out", st -> System.setOut(new PrintStream(st, true, Charset.forName(System.getProperty("native.encoding")))));
+        outputs.put("err", st -> System.setErr(new PrintStream(st, true, Charset.forName(System.getProperty("native.encoding")))));
         Map<String, Consumer<InputStream>> input = new HashMap<>();
         input.put("in", System::setIn);
         forwardExecutionControlAndIO(new RemoteExecutionControl(), inStream, outStream, outputs, input);
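
The idea in the diff above, sketched as a standalone snippet. `NativeEncodingStreams`, `nativeCharset` and `nativePrintStream` are hypothetical names and not part of jshell. Unlike the diff, this sketch assumes the `native.encoding` property may be absent or may name an unsupported charset, and falls back to the default charset in that case.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.io.PrintStream;
import java.nio.charset.Charset;

public class NativeEncodingStreams {

    /**
     * Resolves the charset named by the native.encoding system property.
     * Falls back to the default charset when the property is missing or
     * names a charset this runtime does not support (an assumption of
     * this sketch, not something the diff above does).
     */
    static Charset nativeCharset() {
        String name = System.getProperty("native.encoding");
        if (name != null) {
            try {
                return Charset.forName(name);
            } catch (IllegalArgumentException e) {
                // Illegal or unsupported charset name: use the fallback below.
            }
        }
        return Charset.defaultCharset();
    }

    /** Creates an auto-flushing PrintStream that encodes with the native charset. */
    static PrintStream nativePrintStream(OutputStream out) {
        return new PrintStream(out, true, nativeCharset());
    }

    public static void main(String[] args) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        PrintStream ps = nativePrintStream(sink);
        ps.print("\u3042");
        ps.flush();
        System.out.println(nativeCharset() + " -> " + sink.size() + " byte(s)");
    }
}
```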

@naotoj
Member Author

naotoj commented Nov 18, 2021

/integrate

@openjdk

openjdk bot commented Nov 18, 2021

Going to push as commit 231fb61.
Since your change was applied there have been 195 commits pushed to the master branch:

  • 29e552c: 8272358: Some tests may fail when executed with other locales than the US
  • ce4471f: 8277346: ProblemList 7 serviceability/sa tests on macosx-x64
  • 45a60db: 8277045: G1: Remove unnecessary set_concurrency call in G1ConcurrentMark::weak_refs_work
  • 6bb0462: 8277224: sun.security.pkcs.PKCS9Attributes.toString() throws NPE
  • d8c0280: 8277316: ciReplay: dump_replay_data is not thread-safe
  • 007ad7c: 8277303: Terminology mismatch between JLS17-3.9 and SE17's javax.lang.model.SourceVersion method specs
  • 8881f29: 8277310: ciReplay: @CPI MethodHandle references not resolved
  • 262d070: 8277246: Check for NonRepudiation as well when validating a TSA certificate
  • a907b2b: 8276177: nsk/jvmti/RedefineClasses/StressRedefineWithoutBytecodeCorruption failed with "assert(def_ik->is_being_redefined()) failed: should be being redefined to get here"
  • b687664: 8277159: Fix java/nio/file/FileStore/Basic.java test by ignoring /run/user/* mount points
  • ... and 185 more: https://git.openjdk.java.net/jdk/compare/7115892f96a5a57ce9d37602038b787d19da5d81...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot closed this Nov 18, 2021
@openjdk openjdk bot added the integrated (Pull request has been integrated) label and removed the ready (Pull request is ready to be integrated) label Nov 18, 2021
@openjdk openjdk bot removed the rfr (Pull request is ready for review) label Nov 18, 2021
@openjdk

openjdk bot commented Nov 18, 2021

@naotoj Pushed as commit 231fb61.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@takiguc

takiguc commented Nov 18, 2021

Many thanks, @naotoj.
That's a good idea!
With this change, we only need to touch RemoteExecutionControl.java, and the encoding issue in jshell may be resolved.
I'd like to discuss it on #5771.
