8311943: Cleanup usages of toLowerCase() and toUpperCase() in java.base #14763

Closed
wants to merge 4 commits into from

Conversation

Glavo
Contributor

@Glavo Glavo commented Jul 3, 2023

Clean up misuses of toLowerCase()/toUpperCase() in java.base.
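
The no-argument overloads apply the case rules of the JVM's default locale, so their result can differ between machines. A minimal sketch of the difference follows, using Turkish as the example locale; the class and method names are illustrative and do not come from this PR.

```java
import java.util.Locale;

// Illustrative only: the class and method names are not part of the PR.
final class LocaleCaseDemo {
    // Locale-independent lowercasing, e.g. for protocol keywords or identifiers.
    static String lowerRoot(String s) {
        return s.toLowerCase(Locale.ROOT);
    }

    // Lowercasing under an explicit locale; the no-argument overload behaves
    // like this with Locale.getDefault() as the argument.
    static String lowerIn(String s, Locale locale) {
        return s.toLowerCase(locale);
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr");
        System.out.println(lowerRoot("TITLE"));        // title
        System.out.println(lowerIn("TITLE", turkish)); // t\u0131tle (dotless i, U+0131)
    }
}
```

Under a Turkish default locale, a plain `"TITLE".toLowerCase()` therefore no longer equals `"title"`, which is why comparisons against ASCII keywords are usually written with `Locale.ROOT`.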


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8311943: Cleanup usages of toLowerCase() and toUpperCase() in java.base (Task - P4)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/14763/head:pull/14763
$ git checkout pull/14763

Update a local copy of the PR:
$ git checkout pull/14763
$ git pull https://git.openjdk.org/jdk.git pull/14763/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 14763

View PR using the GUI difftool:
$ git pr show -t 14763

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/14763.diff

Webrev

Link to Webrev Comment

@bridgekeeper

bridgekeeper bot commented Jul 3, 2023

👋 Welcome back Glavo! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk

openjdk bot commented Jul 3, 2023

@Glavo The following labels will be automatically applied to this pull request:

  • core-libs
  • net
  • nio
  • security

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added security security-dev@openjdk.org nio nio-dev@openjdk.org core-libs core-libs-dev@openjdk.org net net-dev@openjdk.org labels Jul 3, 2023
@jaikiran
Member

Hello Glavo, I've created https://bugs.openjdk.org/browse/JDK-8311943 to track this change. Please update the title of this PR to "8311943: Cleanup usages of toLowerCase() and toUpperCase() in java.base" so that it triggers the official RFR.

@Glavo Glavo changed the title from "Avoid locale-sensitive case conversions in java.base" to "8311943: Cleanup usages of toLowerCase() and toUpperCase() in java.base" Jul 12, 2023
@openjdk openjdk bot added the rfr Pull request is ready for review label Jul 12, 2023
@mlbridge

mlbridge bot commented Jul 12, 2023

Webrevs

@@ -628,7 +629,7 @@ else if ('0' <= c && c <= '9') {
     peekc = c;
     sval = String.copyValueOf(buf, 0, i);
     if (forceLower)
-        sval = sval.toLowerCase();
+        sval = sval.toLowerCase(Locale.ROOT);
Contributor

I suspect this change to StreamTokenizer needs eyes. I think long standing behavior of the lowerCaseMode(true) has been to use the rules for the default locale so we need to be careful.

Contributor Author

@Glavo Glavo Jul 12, 2023

> I suspect this change to StreamTokenizer needs eyes. I think long standing behavior of the lowerCaseMode(true) has been to use the rules for the default locale so we need to be careful.

I investigated usage of this method on GitHub:

https://github.com/search?q=%22lowerCaseMode%28true%29%22+language%3AJava&type=code

In some of the use cases I investigated, it seems that no one wants to rely on the default locale.

However, while I think this corrects the behavior, this caused a change in the behavior of the API, so a CSR may be required. I don't want to debate this in this PR, so I'll revert this change and open a new PR in the future.
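
Callers who want locale-independent tokens do not need the library change: they can leave `lowerCaseMode` off (its default) and lowercase `sval` themselves. A minimal sketch of that workaround follows; the class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of the PR: lowercases word tokens with
// Locale.ROOT instead of relying on StreamTokenizer.lowerCaseMode(true).
final class RootLowerTokens {
    static List<String> words(String input) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(input));
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_WORD) {
                    // sval is left untouched by the tokenizer because
                    // lowerCaseMode is off; lowercase it explicitly.
                    out.add(st.sval.toLowerCase(Locale.ROOT));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(words("SELECT Id FROM Items")); // [select, id, from, items]
    }
}
```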

Member

Maybe a small suggestion to make it clear what's wanted here. In other projects I am involved in (Apache Lucene/Solr, Apache Tika, PostgreSQL JDBC, Checkstyle itself, Elasticsearch/OpenSearch), which use the forbiddenapis Maven/Gradle/Ant plugin, we forbid all calls to several Java APIs (including toLowerCase/toUpperCase). Any bytecode that uses them fails the build (FYI, we also disallow other things, such as relying on the default time zone or character set).
To make it clear what is really intended, those projects agreed on writing toLowerCase(Locale.getDefault()), so it is explicit what's wanted.
Without that, somebody else could start the discussion again.

This is just a suggestion to be explicit as it makes maintaining the code easier.
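
As a sketch of that convention: `String.toLowerCase()` is documented as equivalent to `toLowerCase(Locale.getDefault())`, so spelling out the argument changes nothing at run time and only records the intent. The class and method names below are made up for illustration.

```java
import java.util.Locale;

// Illustrative only: shows the "be explicit" convention described above.
final class ExplicitDefaultLocale {
    // Same result as s.toLowerCase(), but the dependence on the default
    // locale is visible to reviewers and to static checkers.
    static String lowerForDisplay(String s) {
        return s.toLowerCase(Locale.getDefault());
    }

    public static void main(String[] args) {
        String s = "Hello World";
        System.out.println(lowerForDisplay(s).equals(s.toLowerCase())); // true
    }
}
```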

Contributor Author

@Glavo Glavo Jul 12, 2023

> Maybe a small suggestion to make it clear what's wanted here. In other projects I am involved in (Apache Lucene/Solr, Apache Tika, PostgreSQL JDBC, Checkstyle itself, Elasticsearch/OpenSearch), which use the forbiddenapis Maven/Gradle/Ant plugin, we forbid all calls to several Java APIs (including toLowerCase/toUpperCase). Any bytecode that uses them fails the build (FYI, we also disallow other things, such as relying on the default time zone or character set). To make it clear what is really intended, those projects agreed on writing toLowerCase(Locale.getDefault()), so it is explicit what's wanted. Without that, somebody else could start the discussion again.
>
> This is just a suggestion to be explicit as it makes maintaining the code easier.

I agree with this.

I'm working on deprecating toLowerCase() and toUpperCase(); this PR is part of that effort. I want to convert all uses of them to toLowerCase(Locale) and toUpperCase(Locale).

More backstory is detailed in #13434 (comment).

Contributor

> However, while I think this corrects the behavior, this caused a change in the behavior of the API, so a CSR may be required. I don't want to debate this in this PR, so I'll revert this change and open a new PR in the future.

StreamTokenizer is a very old API, and changing long-standing behavior may break something or be observable with existing code/usages. I see you've reverted this part (thanks), and looking at it separately is fine. It might be that the conclusion is that it's just too risky to change, in which case Uwe's suggestion is good and would avoid it showing up on someone else's radar in the future.

Contributor Author

> It might be that the conclusion is that it's just too risky to change, in which case Uwe's suggestion is good and would avoid it showing up on someone else's radar in the future.

Until we're sure we want to normalize a usage of toLowerCase() to one of toLowerCase(Locale.ROOT) or toLowerCase(Locale.getDefault()), I think it should be left here as-is, thus keeping it in an ambiguous state to remind us to continue discussing it in the future.

Contributor Author

If we can't normalize this use case to be locale-independent, then I even think lowerCaseMode should be deprecated, because it's almost impossible for users to get expected behavior with this method.

In order to make it meaningful, I think it is still necessary to consider making it locale insensitive. We can allow users to fall back to the old behavior through new system properties, or introduce new API methods in StreamTokenizer to allow users to set the Locale to be used.
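
One way the system-property fallback could look, as a rough sketch only: the property name `example.tokenizer.useDefaultLocale` and the class below are invented for illustration and are not an existing or proposed JDK API.

```java
import java.util.Locale;

// Hypothetical sketch of an opt-in fallback to the old, default-locale behavior.
final class LowerCaseLocalePolicy {
    // Invented property name; nothing in the JDK reads it.
    static final String LEGACY_PROPERTY = "example.tokenizer.useDefaultLocale";

    // Locale.ROOT unless the user explicitly asks for the legacy behavior.
    static Locale lowerCaseLocale() {
        return Boolean.getBoolean(LEGACY_PROPERTY) ? Locale.getDefault() : Locale.ROOT;
    }

    static String lower(String token) {
        return token.toLowerCase(lowerCaseLocale());
    }

    public static void main(String[] args) {
        // With the property unset this lowercases with Locale.ROOT.
        System.out.println(lower("TITLE"));
    }
}
```

Running with `-Dexample.tokenizer.useDefaultLocale=true` would restore the default-locale rules in this sketch; whether such a switch is acceptable for StreamTokenizer is exactly the open question in this thread.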

@mlbridge

mlbridge bot commented Jul 14, 2023

Mailing list message from Remi Forax on nio-dev:

----- Original Message -----

From: "Uwe Schindler" <uschindler at openjdk.org>
To: "core-libs-dev" <core-libs-dev at openjdk.org>, net-dev at openjdk.org, nio-dev at openjdk.org, security-dev at openjdk.org
Sent: Wednesday, July 12, 2023 6:08:17 PM
Subject: Re: RFR: 8311943: Cleanup usages of toLowerCase() and toUpperCase() in java.base [v2]

On Wed, 12 Jul 2023 14:31:53 GMT, Glavo <duke at openjdk.org> wrote:

src/java.base/share/classes/java/io/StreamTokenizer.java line 632:

630: sval = String.copyValueOf(buf, 0, i);
631: if (forceLower)
632: sval = sval.toLowerCase(Locale.ROOT);

I suspect this change to StreamTokenizer needs eyes. I think long standing
behavior of the lowerCaseMode(true) has been to use the rules for the default
locale so we need to be careful.

I investigated usage of this method on GitHub:

https://github.com/search?q=%22lowerCaseMode%28true%29%22+language%3AJava&type=code

In some of the use cases I investigated, it seems that no one wants to rely on
the default locale.

However, while I think this corrects the behavior, this caused a change in the
behavior of the API, so a CSR may be required. I don't want to debate this in
this PR, so I'll revert this change and open a new PR in the future.

Maybe a small suggestion to make it clear what's wanted here. In other projects I
am involved in (Apache Lucene/Solr, Apache Tika, PostgreSQL JDBC, Checkstyle
itself, Elasticsearch/OpenSearch), which use the [forbiddenapis
Maven/Gradle/Ant plugin](https://github.com/policeman-tools/forbidden-apis/),
we forbid all calls to several Java APIs (including toLowerCase/toUpperCase).
Any bytecode that uses them fails the build (FYI, we also disallow other
things, such as relying on the default time zone or character set).
To make it clear what is really intended, those projects agreed on writing
`toLowerCase(Locale.getDefault())`, so it is explicit what's wanted.
Without that, somebody else could start the discussion again.

This is just a suggestion to be explicit as it makes maintaining the code
easier.

One solution is to deprecate String.toLowerCase()/toUpperCase(), forcing users to explicitly use the variants that take a Locale.
Obviously, I'm talking about a simple deprecation, not a deprecation for removal.
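
The distinction between the two kinds of deprecation can be sketched on a stand-in class (the class, its methods, and the `since` values are hypothetical; this is not how `String` is annotated):

```java
import java.util.Locale;

// Hypothetical stand-in used to contrast the two forms of @Deprecated.
final class LegacyText {
    // Ordinary deprecation: callers get a warning, the method stays.
    @Deprecated(since = "hypothetical")
    static String lower(String s) {
        return s.toLowerCase(Locale.getDefault());
    }

    // Deprecation for removal: signals the method may be deleted later.
    @Deprecated(since = "hypothetical", forRemoval = true)
    static String lowerLegacy(String s) {
        return s.toLowerCase(Locale.getDefault());
    }

    // Reads the forRemoval flag back through reflection.
    static boolean isForRemoval(String methodName) {
        try {
            return LegacyText.class
                    .getDeclaredMethod(methodName, String.class)
                    .getAnnotation(Deprecated.class)
                    .forRemoval();
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isForRemoval("lower"));       // false
        System.out.println(isForRemoval("lowerLegacy")); // true
    }
}
```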

Rémi

@bridgekeeper

bridgekeeper bot commented Aug 11, 2023

@Glavo This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

@AlanBateman
Contributor

Now that the incompatible change to StreamTokenizer has been dropped from this PR, I assume the rest can be reviewed.

@Glavo
Contributor Author

Glavo commented Aug 15, 2023

I updated this PR to resolve the merge conflict. Now it is waiting to be reviewed again.

@Glavo
Contributor Author

Glavo commented Aug 16, 2023

Can someone review this PR?

Member

@naotoj naotoj left a comment

LGTM. Thanks for the changes.

@openjdk

openjdk bot commented Aug 16, 2023

@Glavo This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8311943: Cleanup usages of toLowerCase() and toUpperCase() in java.base

Reviewed-by: naoto

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 22 new commits pushed to the master branch:

  • 13f6450: 8313765: Invalid CEN header (invalid zip64 extra data field size)
  • 24e896d: 8310275: Bug in assignment operator of ReservedMemoryRegion
  • 1925508: 8314144: gc/g1/ihop/TestIHOPStatic.java fails due to extra concurrent mark with -Xcomp
  • b80001d: 8314209: Wrong @since tag for RandomGenerator::equiDoubles
  • ef6db5c: 8314211: Add NativeLibraryUnload event
  • 49ddb19: 8313760: [REDO] Enhance AES performance
  • d46f0fb: 8313720: C2 SuperWord: wrong result with -XX:+UseVectorCmov -XX:+UseCMoveUnconditionally
  • 38687f1: 8314262: GHA: Cut down cross-compilation sysroots deeper
  • a602624: 8314020: Print instruction blocks in byte units
  • 0b12480: 8314233: C2: assert(assertion_predicate_has_loop_opaque_node(iff)) failed: unexpected
  • ... and 12 more: https://git.openjdk.org/jdk/compare/a02d65efccaab5bb7c2f2aad4a2eb5062f545ef8...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@naotoj) but any other Committer may sponsor as well.

➡️ To flag this PR as ready for integration with the above commit message, type /integrate in a new comment. (Afterwards, your sponsor types /sponsor in a new comment to perform the integration).

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Aug 16, 2023
@Glavo
Contributor Author

Glavo commented Aug 16, 2023

/integrate

@openjdk openjdk bot added the sponsor Pull request is ready to be sponsored label Aug 16, 2023
@openjdk

openjdk bot commented Aug 16, 2023

@Glavo
Your change (at version c616072) is now ready to be sponsored by a Committer.

@naotoj
Member

naotoj commented Aug 16, 2023

/sponsor

@openjdk

openjdk bot commented Aug 16, 2023

Going to push as commit b32d641.
Since your change was applied there have been 22 commits pushed to the master branch:

  • 13f6450: 8313765: Invalid CEN header (invalid zip64 extra data field size)
  • 24e896d: 8310275: Bug in assignment operator of ReservedMemoryRegion
  • 1925508: 8314144: gc/g1/ihop/TestIHOPStatic.java fails due to extra concurrent mark with -Xcomp
  • b80001d: 8314209: Wrong @since tag for RandomGenerator::equiDoubles
  • ef6db5c: 8314211: Add NativeLibraryUnload event
  • 49ddb19: 8313760: [REDO] Enhance AES performance
  • d46f0fb: 8313720: C2 SuperWord: wrong result with -XX:+UseVectorCmov -XX:+UseCMoveUnconditionally
  • 38687f1: 8314262: GHA: Cut down cross-compilation sysroots deeper
  • a602624: 8314020: Print instruction blocks in byte units
  • 0b12480: 8314233: C2: assert(assertion_predicate_has_loop_opaque_node(iff)) failed: unexpected
  • ... and 12 more: https://git.openjdk.org/jdk/compare/a02d65efccaab5bb7c2f2aad4a2eb5062f545ef8...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Aug 16, 2023
@openjdk openjdk bot closed this Aug 16, 2023
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review sponsor Pull request is ready to be sponsored labels Aug 16, 2023
@openjdk

openjdk bot commented Aug 16, 2023

@naotoj @Glavo Pushed as commit b32d641.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@Glavo Glavo deleted the case-conversion-java-base branch August 16, 2023 17:56
Labels
core-libs core-libs-dev@openjdk.org integrated Pull request has been integrated net net-dev@openjdk.org nio nio-dev@openjdk.org security security-dev@openjdk.org
5 participants