Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8264208: Console charset API #3419

Closed
wants to merge 14 commits into from
Closed

Conversation

naotoj
Copy link
Member

@naotoj naotoj commented Apr 9, 2021

Please review the changes for the subject issue. This has been suggested in a recent discussion thread for the JEP 400 [1]. A CSR has also been drafted, and comments are welcome [2].


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Issue

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/3419/head:pull/3419
$ git checkout pull/3419

Update a local copy of the PR:
$ git checkout pull/3419
$ git pull https://git.openjdk.java.net/jdk pull/3419/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 3419

View PR using the GUI difftool:
$ git pr show -t 3419

Using diff file

Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/3419.diff

@naotoj
Copy link
Member Author

naotoj commented Apr 9, 2021

/csr

@bridgekeeper
Copy link

bridgekeeper bot commented Apr 9, 2021

👋 Welcome back naoto! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot added rfr Pull request is ready for review csr Pull request needs approved CSR before integration labels Apr 9, 2021
@openjdk
Copy link

openjdk bot commented Apr 9, 2021

@naotoj this pull request will not be integrated until the CSR request JDK-8264209 for issue JDK-8264208 has been approved.

@openjdk
Copy link

openjdk bot commented Apr 9, 2021

@naotoj The following labels will be automatically applied to this pull request:

  • core-libs
  • security

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added security security-dev@openjdk.org core-libs core-libs-dev@openjdk.org labels Apr 9, 2021
@mlbridge
Copy link

mlbridge bot commented Apr 9, 2021

* {@link java.nio.charset.Charset#defaultCharset() Charset.defaultCharset()}.
*
* @return A {@code Charset} object used in this {@code Console}.
* @since 17
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of minor comments:
May replace {@code Charset} with @link;
Add "the" to the sentence: The returned charset corresponds to "the" input...
@return: javadoc guide suggests "do not capitalize and do not end with a period" when writing a phrase. But I guess for consistency in this class, it's okay to capitalize.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, Joe. Modified as suggested.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Looks good to me.

* {@link java.nio.charset.Charset#defaultCharset() Charset.defaultCharset()}.
*
* @return A {@code Charset} object used in this {@code Console}.
* @since 17
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Looks good to me.

@mlbridge
Copy link

mlbridge bot commented Apr 10, 2021

Mailing list message from Bernd Eckenfels on security-dev:

Hello,

I like the API, it is useful, however not enough to replace the defaultCharset once the Change to UTF8 is done. You still need a way to query the platforms file encoding (especially on Windows).

Also I wonder if the Javadoc needs to discuss platform aspects of console, especially System.out and LANG on unix vs. windows.

Gruss
Bernd
--
http://bernd.eckenfels.net
________________________________
Von: security-dev <security-dev-retn at openjdk.java.net> im Auftrag von Naoto Sato <naoto at openjdk.java.net>
Gesendet: Friday, April 9, 2021 11:06:00 PM
An: core-libs-dev at openjdk.java.net <core-libs-dev at openjdk.java.net>; security-dev at openjdk.java.net <security-dev at openjdk.java.net>
Betreff: Re: RFR: 8264208: Console charset API [v2]

Please review the changes for the subject issue. This has been suggested in a recent discussion thread for the JEP 400 [[1](https://mail.openjdk.java.net/pipermail/core-libs-dev/2021-March/075214.html)]. A CSR has also been drafted, and comments are welcome [[2](https://bugs.openjdk.java.net/browse/JDK-8264209)].

Naoto Sato has updated the pull request incrementally with one additional commit since the last revision:

Reflected the review comments.

-------------

Changes:
- all: https://git.openjdk.java.net/jdk/pull/3419/files
- new: https://git.openjdk.java.net/jdk/pull/3419/files/d6db04bb..8fd8f6e

Webrevs:
- full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3419&range=01
- incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3419&range=00-01

Stats: 5 lines in 1 file changed: 2 ins; 0 del; 3 mod
Patch: https://git.openjdk.java.net/jdk/pull/3419.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/3419/head:pull/3419

PR: https://git.openjdk.java.net/jdk/pull/3419
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.java.net/pipermail/security-dev/attachments/20210410/2912ead1/attachment.htm>


/**
* Returns the {@link java.nio.charset.Charset Charset} object used in
* this {@code Console}.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Console is a singleton and the existing methods use "the console" so I think we should do the same here.

We'll need to add to the description of the System.{in,out,err} fields, I don't mind if we do it as part of this PR or another issue.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Javadoc updated. Instead of modifying System.out/err (System.in is not affected, as it does not convert b2c), I modified PrintStream javadoc.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reverted the changes to PrintStream, as it defaults to Charset.defaultCharset(). Added descriptions to System.out/err instead (and modified the impl).

@mlbridge
Copy link

mlbridge bot commented Apr 12, 2021

Mailing list message from Naoto Sato on core-libs-dev:

Hi Bernd,

On 4/9/21 5:21 PM, Bernd Eckenfels wrote:

Hello,

I like the API, it is useful, however not enough to replace the defaultCharset once the Change to UTF8 is done. You still need a way to query the platforms file encoding (especially on Windows).

Initially I thought it would be beneficial to provide the method that
returns so-called `platform` charset, but I am not so sure introducing
it. The reason is that once JEP 400 is enabled, that method only serves
to migrate the old apps in the new environment. And that's where the
`COMPAT` system property would be utilized. If those apps have luxury to
make source code changes, I would recommend migrating the code by giving
the charset argument to the failing FileReader or alike.

Also I wonder if the Javadoc needs to discuss platform aspects of console, especially System.out and LANG on unix vs. windows.

I will add some descriptions to System.out/err in relation to Console,
but how they map to platform's settings (LANG on Unix/System locale on
Windows) is an implementation detail, and I don't think it should be be
described in the spec.

Naoto

setOut0(newPrintStream(fdOut, props.getProperty("sun.stdout.encoding")));
setErr0(newPrintStream(fdErr, props.getProperty("sun.stderr.encoding")));
setOut0(newPrintStream(fdOut, cs));
setErr0(newPrintStream(fdErr, cs));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was getting from sun.stdout.encoding or sun.stderr.encoding. After the change, when there is no console, it would be set with Charset.defaultCharset(). Would that be an incompatible change?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although the code path is different, the logic to determine the encoding is not changed, as sun.stdout/err.encoding are only set if the VM is invoked from a terminal (in fact, there's a bug where they aren't set even in a terminal on unix env, which I fixed in this patch) which is the same condition in which System.console() returns the console. And for both cases, it will default to Charset.defaultCharset() if they are not available.

Having said that, one regression test started to fail with this implementation change as follows:

Exception in thread "main" java.util.ServiceConfigurationError: Locale provider adapter "CLDR"cannot be instantiated.
	at java.base/sun.util.locale.provider.LocaleProviderAdapter.forType(LocaleProviderAdapter.java:199)
	at java.base/sun.util.locale.provider.LocaleProviderAdapter.findAdapter(LocaleProviderAdapter.java:287)
	at java.base/sun.util.locale.provider.LocaleProviderAdapter.getAdapter(LocaleProviderAdapter.java:258)
	at java.base/java.text.NumberFormat.getInstance(NumberFormat.java:960)
	at java.base/java.text.NumberFormat.getNumberInstance(NumberFormat.java:518)
	at java.base/java.util.Scanner.useLocale(Scanner.java:1270)
	at java.base/java.util.Scanner.<init>(Scanner.java:543)
	at java.base/java.util.Scanner.<init>(Scanner.java:554)
	at TestConsole.main(TestConsole.java:54)
Caused by: java.lang.reflect.InvocationTargetException
	at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:78)
	at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:499)
	at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:480)
	at java.base/sun.util.locale.provider.LocaleProviderAdapter.forType(LocaleProviderAdapter.java:188)
	... 8 more

I haven't looked at it in detail but it seems that calling System.console() in System.initPhase1() is not allowed, as the module system is not there yet. So I reverted the implementation to the original and simply retained the description in System.out/err with a change to determined by to equivalent to.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explanation and updates. It's a worthwhile exercise to attempt to align things around the new method. A short note above line 2023 to record this might be useful as well (sth. like it's eq to console != null ? console.charset() : Charset.defaultCharset();). No need to create another update if you decide to add the note.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Added some explanations as suggested.

@takiguc
Copy link

takiguc commented Apr 13, 2021

I have 2 concerns:

  1. I think method name "charset()" is too short. It's not called frequently. This method name should explain functionality.
  2. Sometimes stderr may be redirected to stdout by shell. Why do we need to set different encodings for these two (sun.stdout.encoding and sun.stderr.encoding) ?

@AlanBateman
Copy link
Contributor

  1. I think method name "charset()" is too short. It's not called frequently. This method name should explain functionality.
  2. Sometimes stderr may be redirected to stdout by shell. Why do we need to set different encodings for these two (sun.stdout.encoding and sun.stderr.encoding) ?

Console's class description already covers this "If the virtual machine is started from an interactive command line without redirecting the standard input and output streams then ...". The existing reader and writer methods use the same charset.

@naotoj
Copy link
Member Author

naotoj commented Apr 13, 2021

  1. I think method name "charset()" is too short. It's not called frequently. This method name should explain functionality.

As for this one, I am open to suggestions. I thought charset() was concise, and analogous to Charset.defaultCharset().

@@ -156,6 +159,9 @@ private System() {
* of a user even if the principal output stream, the value of the
* variable {@code out}, has been redirected to a file or other
* destination that is typically not continuously monitored.
* The encoding used in the conversion from characters to bytes is
* equivalent to {@link Console#charset()} if the {@code Console}
* exists, {@link Charset#defaultCharset()} otherwise.
*/
public static final PrintStream err = null;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still think we need something in System.in to link to Console::charset. If someone creates an InputStreamReader(System.in) then it will use the default charset, they should be using Console::reader for these cases.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added some explanation to it.


/**
* Returns the {@link java.nio.charset.Charset Charset} object used in
* the {@code Console}.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would you think about re-phrasing the first sentence to use "for the Console" rather than "in the Console".

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to "for the Console", as well as @return.

* {@link Console#reader()}.
*
* @see Console#charset()
* @see Console#reader()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would you think about changing the example in InputStreamReader class description as part of this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replaced System.in with generic anInputStream, as changing new InputStreamReader with Console.reader() would defy the purpose of the example.

Copy link
Contributor

@RogerRiggs RogerRiggs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the test, can it be re-written in Java.
The direction has been to avoid creating new shell tests as they are fragile.
There are test utilities in ProcessTool to make launching and checking for output very easy.

@@ -48,7 +48,7 @@
*
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oddly, none of the reference in this class to the default charset are links to Charset.defaultCharset().
That would be a useful addition, either in the class javadoc or in the 1-arg constructor that uses the default charset.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, Roger. Both are good suggestions. Will incorporate them into the next iteration.

@openjdk openjdk bot removed the csr Pull request needs approved CSR before integration label Apr 21, 2021
@openjdk
Copy link

openjdk bot commented Apr 21, 2021

@naotoj This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8264208: Console charset API

Reviewed-by: joehw, rriggs, alanb

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 57 new commits pushed to the master branch:

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Apr 21, 2021
// This method is called in sun.security.util.Password,
// cons already exists when this method is called
return cons.cs;
}
});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please keep the charset() method and return CHARSET.

I'm looking at a use case that needs to know the platform charset regardless of whether the console exists.
When a process is launched it may be redirected to /dev/tty or a pseudo tty and in that case
a Reader from that stream should be able to use the encoding of the platform.
Its still a work in progress, but it would save some refactoring or duplication later.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would the singleton Console.cons be instantiated in your use case? It is created only when isatty() (or Windows' equivalent) in the native code returns true.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not always, for example, if stderr was redirected to a terminal but not stdin and stdout.
The istty check is only true if both stdin and stdout are ttys.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then charset() in the shared secret would return null. Would that suffice your case?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I read lines 575-587 as initializing CHARSET regardless of whether the Console was created.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, revived the charset() method.

@naotoj
Copy link
Member Author

naotoj commented Apr 23, 2021

/integrate

@openjdk openjdk bot closed this Apr 23, 2021
@openjdk openjdk bot added integrated Pull request has been integrated and removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Apr 23, 2021
@openjdk
Copy link

openjdk bot commented Apr 23, 2021

@naotoj Since your change was applied there have been 57 commits pushed to the master branch:

Your commit was automatically rebased without conflicts.

Pushed as commit bebfae4.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@naotoj naotoj deleted the JDK-8264208 branch April 29, 2021 18:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core-libs core-libs-dev@openjdk.org integrated Pull request has been integrated security security-dev@openjdk.org
5 participants