8301991: Convert l10n properties resource bundles to UTF-8 native #12726

Closed

Conversation

justin-curtis-lu
Member

@justin-curtis-lu justin-curtis-lu commented Feb 23, 2023

This PR converts Unicode escape sequences in .properties files to native UTF-8 (excluding the Unicode space and tab escape sequences). The conversion was done using native2ascii.

In addition, the build logic is adjusted to read the .properties files as UTF-8 when converting them to .java ListResourceBundle files.
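
To make the conversion concrete, here is a minimal sketch of decoding `\uXXXX` escapes in a single line while leaving the space and tab escapes alone. The class `UnescapeSketch` and its method `toNative` are hypothetical names used for illustration only. The actual conversion in this PR was done with native2ascii, not with this code.

```java
final class UnescapeSketch {
    /**
     * Decodes backslash-u escapes (four hex digits) in one line of a
     * .properties file. Escapes for space (0x20) and tab (0x09) are kept
     * as-is, mirroring the exclusion described in the PR.
     * Hypothetical helper, not the tool the PR used.
     */
    static String toNative(String line) {
        StringBuilder out = new StringBuilder(line.length());
        int i = 0;
        while (i < line.length()) {
            char c = line.charAt(i);
            if (c == '\\' && i + 1 < line.length()) {
                char next = line.charAt(i + 1);
                if (next == 'u' && i + 6 <= line.length() && isHex(line, i + 2)) {
                    int value = Integer.parseInt(line.substring(i + 2, i + 6), 16);
                    if (value == 0x20 || value == 0x09) {
                        out.append(line, i, i + 6);   // keep space/tab escaped
                    } else {
                        out.append((char) value);     // emit the native character
                    }
                    i += 6;
                } else {
                    // Any other escape (including an escaped backslash) is copied verbatim.
                    out.append(c).append(next);
                    i += 2;
                }
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    /** Returns true if the four characters starting at 'start' are hex digits. */
    private static boolean isHex(String s, int start) {
        for (int k = start; k < start + 4; k++) {
            if (Character.digit(s.charAt(k), 16) < 0) {
                return false;
            }
        }
        return true;
    }
}
```

Copying an escaped backslash as a pair means that a literal backslash followed by `u0041` in a value is not mistaken for an escape.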


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8301991: Convert l10n properties resource bundles to UTF-8 native (Enhancement - P4)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/12726/head:pull/12726
$ git checkout pull/12726

Update a local copy of the PR:
$ git checkout pull/12726
$ git pull https://git.openjdk.org/jdk.git pull/12726/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 12726

View PR using the GUI difftool:
$ git pr show -t 12726

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/12726.diff

Webrev

Link to Webrev Comment

@bridgekeeper

bridgekeeper bot commented Feb 23, 2023

👋 Welcome back jlu! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk

openjdk bot commented Feb 23, 2023

@justin-curtis-lu The following labels will be automatically applied to this pull request:

  • build
  • client
  • compiler
  • core-libs
  • hotspot-compiler
  • i18n
  • javadoc
  • jmx
  • kulla
  • net
  • security
  • serviceability

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the serviceability (serviceability-dev@openjdk.org), hotspot-compiler (hotspot-compiler-dev@openjdk.org), kulla (kulla-dev@openjdk.org), i18n (i18n-dev@openjdk.org), javadoc (javadoc-dev@openjdk.org), security (security-dev@openjdk.org), jmx (jmx-dev@openjdk.org), build (build-dev@openjdk.org), client (client-libs-dev@openjdk.org), core-libs (core-libs-dev@openjdk.org), compiler (compiler-dev@openjdk.org), and net (net-dev@openjdk.org) labels Feb 23, 2023
@@ -249,7 +249,7 @@ private boolean createFile(String propertiesPath, String outputPath,
 Writer writer = null;
 try {
 writer = new BufferedWriter(
-new OutputStreamWriter(new FileOutputStream(outputPath), StandardCharsets.UTF_8));
+new OutputStreamWriter(new FileOutputStream(outputPath), StandardCharsets.ISO_8859_1));
Contributor

Using ISO_8859_1 seems strange.
Since these are generated files, you could write them as UTF-8 and then override the default javac option for ascii when compiling just these files.

Or else just stay with ascii; no one should be looking at these files!

Member Author

I will stick with your latter suggestion: since the .properties files were converted via native2ascii, it makes sense to write the output as ASCII.
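
As a rough illustration of writing the generated file as ASCII, the sketch below refuses to write text that US-ASCII cannot encode, rather than letting the writer silently substitute `?`. The class `AsciiWriterSketch` and method `writeAscii` are hypothetical and are not the build tool's actual code.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class AsciiWriterSketch {
    /**
     * Writes generated source text as US-ASCII. Assumes the caller has
     * already escaped every non-ASCII character; otherwise it fails fast.
     * Hypothetical helper for illustration only.
     */
    static void writeAscii(Path outputPath, String generatedSource) throws IOException {
        if (!StandardCharsets.US_ASCII.newEncoder().canEncode(generatedSource)) {
            throw new IllegalArgumentException("generated source contains non-ASCII characters");
        }
        // try-with-resources closes the writer (and the underlying stream) on every path.
        try (Writer writer = new BufferedWriter(
                new OutputStreamWriter(Files.newOutputStream(outputPath), StandardCharsets.US_ASCII))) {
            writer.write(generatedSource);
        }
    }
}
```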

@justin-curtis-lu justin-curtis-lu marked this pull request as ready for review March 15, 2023 16:02
@openjdk openjdk bot added the rfr (Pull request is ready for review) label Mar 15, 2023
@mlbridge

mlbridge bot commented Mar 15, 2023

Webrevs

Contributor

This file should probably be excluded because it's used in a test that relates to UTF-8 encoding (or not) of property files.

Member Author

Thank you, I removed the changes for this file.

Comment on lines 150 to 156
 tr=\u00A4
 tr_TR=TL
 uk=\u00A4
-uk_UA=\u0433\u0440\u043d.
+uk_UA=\u0433\u0440\u043D.
 zh=\u00A4
 zh_CN=\uFFE5
 zh_HK=HK$
Member

Why are they not encoded into UTF-8 native?

Member Author

Thanks for catching this.

I'm not sure; something must have gone wrong when scripting through the entire JDK.

Will convert the file individually.

Member Author

CurrencySymbols.properties is now fully converted to UTF-8.

@@ -221,7 +223,8 @@ private static boolean createFile(String propertiesPath, String outputPath,
 }
 Properties p = new Properties();
 try {
-p.load(new FileInputStream(propertiesPath));
+FileInputStream input = new FileInputStream(propertiesPath);

Should this stream be closed in a finally { } block?

Member

Or, better, use try-with-resources?
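
A minimal sketch of the try-with-resources suggestion, assuming the file is meant to be read as UTF-8. The class `LoadPropertiesSketch` and method `loadUtf8` are hypothetical names and do not reflect the build tool's real structure.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

final class LoadPropertiesSketch {
    /**
     * Loads a .properties file as UTF-8. The reader is closed automatically
     * when the try block exits, whether normally or through an exception.
     * Hypothetical helper for illustration only.
     */
    static Properties loadUtf8(Path propertiesPath) throws IOException {
        Properties p = new Properties();
        try (Reader reader = Files.newBufferedReader(propertiesPath, StandardCharsets.UTF_8)) {
            p.load(reader);
        }
        return p;
    }
}
```

Compared with closing the stream in a `finally` block, this form needs no null check and does not lose the original exception if `close()` also throws.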

outBuffer.append(toHex( aChar & 0xF));
outBuffer.append(toHex((aChar >> 8) & 0xF));
outBuffer.append(toHex((aChar >> 4) & 0xF));
outBuffer.append(toHex(aChar & 0xF));
Contributor

Sorry I don't know when this tool is called, but why is it still writing in \unnnn style?

Contributor

@wangweij wangweij Mar 17, 2023

I think I understand it now: the generated source code still needs escaping. When can we use UTF-8 there as well?
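
For readers unfamiliar with the escaping being discussed, the following is a simplified sketch of turning non-ASCII characters into `\uXXXX` form, one hex digit per 4-bit nibble. The names `EscapeSketch` and `escapeNonAscii` are hypothetical. The real tool also handles other cases (for example backslashes and quotes) that this sketch deliberately ignores.

```java
final class EscapeSketch {
    private static final char[] HEX_DIGITS = "0123456789ABCDEF".toCharArray();

    /** Maps the low four bits of 'nibble' to an uppercase hex digit. */
    private static char toHex(int nibble) {
        return HEX_DIGITS[nibble & 0xF];
    }

    /**
     * Replaces every character outside printable ASCII (0x20 to 0x7E) with
     * a backslash-u escape of four hex digits, most significant nibble first.
     * Simplified, hypothetical helper: it does not escape backslashes or quotes.
     */
    static String escapeNonAscii(String s) {
        StringBuilder outBuffer = new StringBuilder(s.length() * 2);
        for (int i = 0; i < s.length(); i++) {
            char aChar = s.charAt(i);
            if (aChar < 0x20 || aChar > 0x7E) {
                outBuffer.append('\\').append('u');
                outBuffer.append(toHex((aChar >> 12) & 0xF));
                outBuffer.append(toHex((aChar >> 8) & 0xF));
                outBuffer.append(toHex((aChar >> 4) & 0xF));
                outBuffer.append(toHex(aChar & 0xF));
            } else {
                outBuffer.append(aChar);
            }
        }
        return outBuffer.toString();
    }
}
```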

@justin-curtis-lu
Member Author

Something to consider is that IntelliJ defaults .properties files to ISO 8859-1.

https://www.jetbrains.com/help/idea/properties-files.html#encoding

So users of IntelliJ (and other IDEs that default to ISO 8859-1 for .properties files) will need to change the default encoding to UTF-8 for such files. Ideally, the respective IDEs would change their default encoding for .properties files if this change is integrated.

@naotoj
Member

naotoj commented Mar 31, 2023

Hmm, I just wonder why they are sticking to ISO-8859-1 as the default. I know j.u.Properties defaults to 8859-1, but PropertyResourceBundle, which is their primary use, has defaulted to UTF-8 since JDK 9 (https://openjdk.org/jeps/226).
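
The difference between the two defaults can be seen in a small sketch, assuming JDK 9 or later and UTF-8 input bytes. The class `EncodingDefaultsSketch` and its method names are hypothetical and exist only for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.Properties;
import java.util.PropertyResourceBundle;

final class EncodingDefaultsSketch {
    /** Properties.load(InputStream) decodes the bytes as ISO-8859-1. */
    static String viaProperties(byte[] fileBytes, String key) throws IOException {
        Properties p = new Properties();
        p.load(new ByteArrayInputStream(fileBytes));
        return p.getProperty(key);
    }

    /** Properties.load(Reader) uses whatever charset the supplied Reader decodes with. */
    static String viaUtf8Reader(byte[] fileBytes, String key) throws IOException {
        Properties p = new Properties();
        p.load(new InputStreamReader(new ByteArrayInputStream(fileBytes), StandardCharsets.UTF_8));
        return p.getProperty(key);
    }

    /** PropertyResourceBundle(InputStream) reads UTF-8 by default on JDK 9 and later. */
    static String viaBundle(byte[] fileBytes, String key) throws IOException {
        return new PropertyResourceBundle(new ByteArrayInputStream(fileBytes)).getString(key);
    }
}
```

Given the UTF-8 bytes of `zh_CN=￥`, `viaUtf8Reader` and `viaBundle` return the single character U+FFE5, while `viaProperties` returns three ISO-8859-1 characters, one per UTF-8 byte.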

@bridgekeeper

bridgekeeper bot commented Apr 29, 2023

@justin-curtis-lu This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

@justin-curtis-lu
Member Author

I am wondering whether anyone has thoughts on the consequences of this PR for IntelliJ's (and other IDEs') default encoding for .properties files. IntelliJ sets the default encoding for .properties files to ISO-8859-1, which would be the wrong encoding once the .properties files are converted to native UTF-8. Non-ASCII characters in some keys and values would then be displayed incorrectly when the file is opened.

Although the file encoding for .properties files can be switched to UTF-8, UTF-8 is not the default.

I would like to hear any thoughts on, or solutions to, this.

@naotoj
Member

naotoj commented May 11, 2023

I think this is fine, as those properties files are JDK's own. I believe the benefit of moving to UTF-8 outweighs the issue you wrote, which can be remedied by changing the encoding in the IDEs.

@bridgekeeper

bridgekeeper bot commented Jun 8, 2023

@justin-curtis-lu This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

@bridgekeeper

bridgekeeper bot commented Jul 6, 2023

@justin-curtis-lu This pull request has been inactive for more than 8 weeks and will now be automatically closed. If you would like to continue working on this pull request in the future, feel free to reopen it! This can be done using the /open pull request command.

@bridgekeeper bridgekeeper bot closed this Jul 6, 2023
Labels
build (build-dev@openjdk.org), client (client-libs-dev@openjdk.org), compiler (compiler-dev@openjdk.org), core-libs (core-libs-dev@openjdk.org), hotspot-compiler (hotspot-compiler-dev@openjdk.org), i18n (i18n-dev@openjdk.org), javadoc (javadoc-dev@openjdk.org), jmx (jmx-dev@openjdk.org), kulla (kulla-dev@openjdk.org), net (net-dev@openjdk.org), rfr (Pull request is ready for review), security (security-dev@openjdk.org), serviceability (serviceability-dev@openjdk.org)
6 participants