Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8264208: Console charset API #3419

Closed
wants to merge 14 commits into from
Closed

8264208: Console charset API #3419

wants to merge 14 commits into from

Conversation

@naotoj
Copy link
Member

@naotoj naotoj commented Apr 9, 2021

Please review the changes for the subject issue. This has been suggested in a recent discussion thread for the JEP 400 [1]. A CSR has also been drafted, and comments are welcome [2].


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Issue

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/3419/head:pull/3419
$ git checkout pull/3419

Update a local copy of the PR:
$ git checkout pull/3419
$ git pull https://git.openjdk.java.net/jdk pull/3419/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 3419

View PR using the GUI difftool:
$ git pr show -t 3419

Using diff file

Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/3419.diff

@naotoj
Copy link
Member Author

@naotoj naotoj commented Apr 9, 2021

/csr

@bridgekeeper
Copy link

@bridgekeeper bridgekeeper bot commented Apr 9, 2021

👋 Welcome back naoto! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk
Copy link

@openjdk openjdk bot commented Apr 9, 2021

@naotoj this pull request will not be integrated until the CSR request JDK-8264209 for issue JDK-8264208 has been approved.

@openjdk
Copy link

@openjdk openjdk bot commented Apr 9, 2021

@naotoj The following labels will be automatically applied to this pull request:

  • core-libs
  • security

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.

* {@link java.nio.charset.Charset#defaultCharset() Charset.defaultCharset()}.
*
* @return A {@code Charset} object used in this {@code Console}.
* @since 17

This comment has been minimized.

@JoeWang-Java

JoeWang-Java Apr 9, 2021
Member

A couple of minor comments:
May replace {@code Charset} with @link;
Add "the" to the sentence: The returned charset corresponds to "the" input...
@return: javadoc guide suggests "do not capitalize and do not end with a period" when writing a phrase. But I guess for consistency in this class, it's okay to capitalize.

This comment has been minimized.

@naotoj

naotoj Apr 9, 2021
Author Member

Thanks, Joe. Modified as suggested.

This comment has been minimized.

@JoeWang-Java

JoeWang-Java Apr 9, 2021
Member

Thanks. Looks good to me.

* {@link java.nio.charset.Charset#defaultCharset() Charset.defaultCharset()}.
*
* @return A {@code Charset} object used in this {@code Console}.
* @since 17

This comment has been minimized.

@JoeWang-Java

JoeWang-Java Apr 9, 2021
Member

Thanks. Looks good to me.

@mlbridge
Copy link

@mlbridge mlbridge bot commented Apr 10, 2021

Mailing list message from Bernd Eckenfels on security-dev:

Hello,

I like the API, it is useful, however not enough to replace the defaultCharset once the Change to UTF8 is done. You still need a way to query the platforms file encoding (especially on Windows).

Also I wonder if the Javadoc needs to discuss platform aspects of console, especially System.out and LANG on unix vs. windows.

Gruss
Bernd
--
http://bernd.eckenfels.net
________________________________
Von: security-dev <security-dev-retn at openjdk.java.net> im Auftrag von Naoto Sato <naoto at openjdk.java.net>
Gesendet: Friday, April 9, 2021 11:06:00 PM
An: core-libs-dev at openjdk.java.net <core-libs-dev at openjdk.java.net>; security-dev at openjdk.java.net <security-dev at openjdk.java.net>
Betreff: Re: RFR: 8264208: Console charset API [v2]

Please review the changes for the subject issue. This has been suggested in a recent discussion thread for the JEP 400 [[1](https://mail.openjdk.java.net/pipermail/core-libs-dev/2021-March/075214.html)]. A CSR has also been drafted, and comments are welcome [[2](https://bugs.openjdk.java.net/browse/JDK-8264209)].

Naoto Sato has updated the pull request incrementally with one additional commit since the last revision:

Reflected the review comments.

-------------

Changes:
- all: https://git.openjdk.java.net/jdk/pull/3419/files
- new: https://git.openjdk.java.net/jdk/pull/3419/files/d6db04bb..8fd8f6e

Webrevs:
- full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3419&range=01
- incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3419&range=00-01

Stats: 5 lines in 1 file changed: 2 ins; 0 del; 3 mod
Patch: https://git.openjdk.java.net/jdk/pull/3419.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/3419/head:pull/3419

PR: https://git.openjdk.java.net/jdk/pull/3419
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.java.net/pipermail/security-dev/attachments/20210410/2912ead1/attachment.htm>


/**
* Returns the {@link java.nio.charset.Charset Charset} object used in
* this {@code Console}.

This comment has been minimized.

@AlanBateman

AlanBateman Apr 11, 2021
Contributor

The Console is a singleton and the existing methods use "the console" so I think we should do the same here.

We'll need to add to the description of the System.{in,out,err} fields, I don't mind if we do it as part of this PR or another issue.

This comment has been minimized.

@naotoj

naotoj Apr 12, 2021
Author Member

Javadoc updated. Instead of modifying System.out/err (System.in is not affected, as it does not convert b2c), I modified PrintStream javadoc.

This comment has been minimized.

@naotoj

naotoj Apr 12, 2021
Author Member

Reverted the changes to PrintStream, as it defaults to Charset.defaultCharset(). Added descriptions to System.out/err instead (and modified the impl).

@mlbridge
Copy link

@mlbridge mlbridge bot commented Apr 12, 2021

Mailing list message from Naoto Sato on core-libs-dev:

Hi Bernd,

On 4/9/21 5:21 PM, Bernd Eckenfels wrote:

Hello,

I like the API, it is useful, however not enough to replace the defaultCharset once the Change to UTF8 is done. You still need a way to query the platforms file encoding (especially on Windows).

Initially I thought it would be beneficial to provide the method that
returns so-called `platform` charset, but I am not so sure introducing
it. The reason is that once JEP 400 is enabled, that method only serves
to migrate the old apps in the new environment. And that's where the
`COMPAT` system property would be utilized. If those apps have luxury to
make source code changes, I would recommend migrating the code by giving
the charset argument to the failing FileReader or alike.

Also I wonder if the Javadoc needs to discuss platform aspects of console, especially System.out and LANG on unix vs. windows.

I will add some descriptions to System.out/err in relation to Console,
but how they map to platform's settings (LANG on Unix/System locale on
Windows) is an implementation detail, and I don't think it should be be
described in the spec.

Naoto

setOut0(newPrintStream(fdOut, props.getProperty("sun.stdout.encoding")));
setErr0(newPrintStream(fdErr, props.getProperty("sun.stderr.encoding")));
setOut0(newPrintStream(fdOut, cs));
setErr0(newPrintStream(fdErr, cs));

This comment has been minimized.

@JoeWang-Java

JoeWang-Java Apr 13, 2021
Member

It was getting from sun.stdout.encoding or sun.stderr.encoding. After the change, when there is no console, it would be set with Charset.defaultCharset(). Would that be an incompatible change?

This comment has been minimized.

@naotoj

naotoj Apr 13, 2021
Author Member

Although the code path is different, the logic to determine the encoding is not changed, as sun.stdout/err.encoding are only set if the VM is invoked from a terminal (in fact, there's a bug where they aren't set even in a terminal on unix env, which I fixed in this patch) which is the same condition in which System.console() returns the console. And for both cases, it will default to Charset.defaultCharset() if they are not available.

Having said that, one regression test started to fail with this implementation change as follows:

Exception in thread "main" java.util.ServiceConfigurationError: Locale provider adapter "CLDR"cannot be instantiated.
	at java.base/sun.util.locale.provider.LocaleProviderAdapter.forType(LocaleProviderAdapter.java:199)
	at java.base/sun.util.locale.provider.LocaleProviderAdapter.findAdapter(LocaleProviderAdapter.java:287)
	at java.base/sun.util.locale.provider.LocaleProviderAdapter.getAdapter(LocaleProviderAdapter.java:258)
	at java.base/java.text.NumberFormat.getInstance(NumberFormat.java:960)
	at java.base/java.text.NumberFormat.getNumberInstance(NumberFormat.java:518)
	at java.base/java.util.Scanner.useLocale(Scanner.java:1270)
	at java.base/java.util.Scanner.<init>(Scanner.java:543)
	at java.base/java.util.Scanner.<init>(Scanner.java:554)
	at TestConsole.main(TestConsole.java:54)
Caused by: java.lang.reflect.InvocationTargetException
	at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:78)
	at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:499)
	at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:480)
	at java.base/sun.util.locale.provider.LocaleProviderAdapter.forType(LocaleProviderAdapter.java:188)
	... 8 more

I haven't looked at it in detail but it seems that calling System.console() in System.initPhase1() is not allowed, as the module system is not there yet. So I reverted the implementation to the original and simply retained the description in System.out/err with a change to determined by to equivalent to.

This comment has been minimized.

@JoeWang-Java

JoeWang-Java Apr 13, 2021
Member

Thanks for the explanation and updates. It's a worthwhile exercise to attempt to align things around the new method. A short note above line 2023 to record this might be useful as well (sth. like it's eq to console != null ? console.charset() : Charset.defaultCharset();). No need to create another update if you decide to add the note.

This comment has been minimized.

@naotoj

naotoj Apr 13, 2021
Author Member

Thanks. Added some explanations as suggested.

@takiguc
Copy link

@takiguc takiguc commented Apr 13, 2021

I have 2 concerns:

  1. I think method name "charset()" is too short. It's not called frequently. This method name should explain functionality.
  2. Sometimes stderr may be redirected to stdout by shell. Why do we need to set different encodings for these two (sun.stdout.encoding and sun.stderr.encoding) ?
@AlanBateman
Copy link
Contributor

@AlanBateman AlanBateman commented Apr 13, 2021

  1. I think method name "charset()" is too short. It's not called frequently. This method name should explain functionality.
  2. Sometimes stderr may be redirected to stdout by shell. Why do we need to set different encodings for these two (sun.stdout.encoding and sun.stderr.encoding) ?

Console's class description already covers this "If the virtual machine is started from an interactive command line without redirecting the standard input and output streams then ...". The existing reader and writer methods use the same charset.

@naotoj
Copy link
Member Author

@naotoj naotoj commented Apr 13, 2021

  1. I think method name "charset()" is too short. It's not called frequently. This method name should explain functionality.

As for this one, I am open to suggestions. I thought charset() was concise, and analogous to Charset.defaultCharset().

@@ -156,6 +159,9 @@ private System() {
* of a user even if the principal output stream, the value of the
* variable {@code out}, has been redirected to a file or other
* destination that is typically not continuously monitored.
* The encoding used in the conversion from characters to bytes is
* equivalent to {@link Console#charset()} if the {@code Console}
* exists, {@link Charset#defaultCharset()} otherwise.
*/
public static final PrintStream err = null;

This comment has been minimized.

@AlanBateman

AlanBateman Apr 14, 2021
Contributor

I still think we need something in System.in to link to Console::charset. If someone creates an InputStreamReader(System.in) then it will use the default charset, they should be using Console::reader for these cases.

This comment has been minimized.

@naotoj

naotoj Apr 14, 2021
Author Member

Added some explanation to it.


/**
* Returns the {@link java.nio.charset.Charset Charset} object used in
* the {@code Console}.

This comment has been minimized.

@AlanBateman

AlanBateman Apr 15, 2021
Contributor

What would you think about re-phrasing the first sentence to use "for the Console" rather than "in the Console".

This comment has been minimized.

@naotoj

naotoj Apr 15, 2021
Author Member

Changed to "for the Console", as well as @return.

* {@link Console#reader()}.
*
* @see Console#charset()
* @see Console#reader()

This comment has been minimized.

@AlanBateman

AlanBateman Apr 15, 2021
Contributor

What would you think about changing the example in InputStreamReader class description as part of this?

This comment has been minimized.

@naotoj

naotoj Apr 15, 2021
Author Member

Replaced System.in with generic anInputStream, as changing new InputStreamReader with Console.reader() would defy the purpose of the example.

Copy link
Contributor

@RogerRiggs RogerRiggs left a comment

For the test, can it be re-written in Java.
The direction has been to avoid creating new shell tests as they are fragile.
There are test utilities in ProcessTool to make launching and checking for output very easy.

@@ -48,7 +48,7 @@
*

This comment has been minimized.

@RogerRiggs

RogerRiggs Apr 16, 2021
Contributor

Oddly, none of the reference in this class to the default charset are links to Charset.defaultCharset().
That would be a useful addition, either in the class javadoc or in the 1-arg constructor that uses the default charset.

This comment has been minimized.

@naotoj

naotoj Apr 16, 2021
Author Member

Thanks, Roger. Both are good suggestions. Will incorporate them into the next iteration.

@openjdk openjdk bot removed the csr label Apr 21, 2021
@openjdk
Copy link

@openjdk openjdk bot commented Apr 21, 2021

@naotoj This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8264208: Console charset API

Reviewed-by: joehw, rriggs, alanb

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 57 new commits pushed to the master branch:

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready label Apr 21, 2021
// This method is called in sun.security.util.Password,
// cons already exists when this method is called
return cons.cs;
}
});

This comment has been minimized.

@RogerRiggs

RogerRiggs Apr 22, 2021
Contributor

Please keep the charset() method and return CHARSET.

I'm looking at a use case that needs to know the platform charset regardless of whether the console exists.
When a process is launched it may be redirected to /dev/tty or a pseudo tty and in that case
a Reader from that stream should be able to use the encoding of the platform.
Its still a work in progress, but it would save some refactoring or duplication later.

This comment has been minimized.

@naotoj

naotoj Apr 22, 2021
Author Member

Would the singleton Console.cons be instantiated in your use case? It is created only when isatty() (or Windows' equivalent) in the native code returns true.

This comment has been minimized.

@RogerRiggs

RogerRiggs Apr 22, 2021
Contributor

Not always, for example, if stderr was redirected to a terminal but not stdin and stdout.
The istty check is only true if both stdin and stdout are ttys.

This comment has been minimized.

@naotoj

naotoj Apr 22, 2021
Author Member

Then charset() in the shared secret would return null. Would that suffice your case?

This comment has been minimized.

@RogerRiggs

RogerRiggs Apr 22, 2021
Contributor

I read lines 575-587 as initializing CHARSET regardless of whether the Console was created.

This comment has been minimized.

@naotoj

naotoj Apr 22, 2021
Author Member

OK, revived the charset() method.

@naotoj
Copy link
Member Author

@naotoj naotoj commented Apr 23, 2021

/integrate

@openjdk openjdk bot closed this Apr 23, 2021
@openjdk openjdk bot added integrated and removed ready rfr labels Apr 23, 2021
@openjdk
Copy link

@openjdk openjdk bot commented Apr 23, 2021

@naotoj Since your change was applied there have been 57 commits pushed to the master branch:

Your commit was automatically rebased without conflicts.

Pushed as commit bebfae4.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@naotoj naotoj deleted the naotoj:JDK-8264208 branch Apr 29, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
5 participants