Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8041488: Locale-Dependent List Patterns #15130

Closed
wants to merge 26 commits into from

Conversation

naotoj
Copy link
Member

@naotoj naotoj commented Aug 2, 2023

Introducing a new formatting class for locale-dependent list patterns. The class is to provide the functionality from the Unicode Consortium's LDML specification for list patterns. For example, given a list of String as "Monday", "Wednesday", "Friday", its format method would produce "Monday, Wednesday, and Friday" in US English. A CSR has also been drafted, and its draft javadoc can be viewed here: https://cr.openjdk.org/~naoto/JDK-8041488-ListPatterns-PR/api.00/java.base/java/text/ListFormat.html


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change requires CSR request JDK-8295240 to be approved

Issues

  • JDK-8041488: Locale-Dependent List Patterns (Enhancement - P4)
  • JDK-8295240: Locale-Dependent List Patterns (CSR)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/15130/head:pull/15130
$ git checkout pull/15130

Update a local copy of the PR:
$ git checkout pull/15130
$ git pull https://git.openjdk.org/jdk.git pull/15130/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 15130

View PR using the GUI difftool:
$ git pr show -t 15130

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/15130.diff

Webrev

Link to Webrev Comment

@bridgekeeper
Copy link

bridgekeeper bot commented Aug 2, 2023

👋 Welcome back naoto! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot added csr Pull request needs approved CSR before integration rfr Pull request is ready for review labels Aug 2, 2023
@openjdk
Copy link

openjdk bot commented Aug 2, 2023

@naotoj The following labels will be automatically applied to this pull request:

  • build
  • core-libs
  • i18n

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added build build-dev@openjdk.org core-libs core-libs-dev@openjdk.org i18n i18n-dev@openjdk.org labels Aug 2, 2023
@mlbridge
Copy link

mlbridge bot commented Aug 2, 2023

@naotoj
Copy link
Member Author

naotoj commented Aug 2, 2023

/label remove build

@openjdk openjdk bot removed the build build-dev@openjdk.org label Aug 2, 2023
@openjdk
Copy link

openjdk bot commented Aug 2, 2023

@naotoj
The build label was successfully removed.

src/java.base/share/classes/java/text/ListFormat.java Outdated Show resolved Hide resolved
src/java.base/share/classes/java/text/ListFormat.java Outdated Show resolved Hide resolved
src/java.base/share/classes/java/text/ListFormat.java Outdated Show resolved Hide resolved
src/java.base/share/classes/java/text/ListFormat.java Outdated Show resolved Hide resolved
src/java.base/share/classes/java/text/ListFormat.java Outdated Show resolved Hide resolved
src/java.base/share/classes/java/text/ListFormat.java Outdated Show resolved Hide resolved
src/java.base/share/classes/java/text/ListFormat.java Outdated Show resolved Hide resolved
src/java.base/share/classes/java/text/ListFormat.java Outdated Show resolved Hide resolved
naotoj and others added 5 commits August 28, 2023 12:55
Co-authored-by: Roger Riggs <Roger.Riggs@Oracle.com>
Co-authored-by: Roger Riggs <Roger.Riggs@Oracle.com>
Co-authored-by: Roger Riggs <Roger.Riggs@Oracle.com>
Co-authored-by: Roger Riggs <Roger.Riggs@Oracle.com>
Co-authored-by: Roger Riggs <Roger.Riggs@Oracle.com>
@openjdk openjdk bot removed the rfr Pull request is ready for review label Aug 28, 2023
@naotoj
Copy link
Member Author

naotoj commented Aug 28, 2023

Thanks, Roger. All comments make sense to me. Will update the PR soon.

@openjdk openjdk bot added the rfr Pull request is ready for review label Aug 28, 2023
@naotoj naotoj requested a review from JoeWang-Java September 1, 2023 18:51
* as "Monday, Wednesday, and Friday". This class provides the functionality
* defined in Unicode Consortium's LDML specification for
* <a href="https://www.unicode.org/reports/tr35/tr35-general.html#ListPatterns">
* List Patterns</a>.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main function, it seems to me, is to change the representation from one form to another. what would you think about the following:

The {@code ListFormat} class is a tool for converting a list of strings to a text representation and vice versa in a locale-sensitive way. It transforms strings to text in accordance with the List Patterns (link) as defined in Unicode Consortium's LDML specification. For example, it can be used to format a list of 3 weekdays, i.e. "Monday", "Wednesday", "Friday", as "Monday, Wednesday, and Friday" in an inclusive list pattern.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Will modify the wording in the next revision. I think we should stick to the wording format/parse here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd stick to the general first sentence structure used by DateFormat, Format, and MessageFormat.
("tools" in OpenJDK are standalone programs.)

For example,
"ListFormat formats and parses lists of strings."

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, Roger. The current version is derived from MessageFormat, which is the closest of all the *Format classes. Will come up with a better one.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. I mainly want to say that ListFormat doesn't create (produce) new items but change (convert) the presentation from one form to another.

* <a href="https://www.unicode.org/reports/tr35/tr35-general.html#ListPatterns">
* List Patterns</a>.
* <p>
* Three types of concatenation are provided: {@link Type#STANDARD STANDARD},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A "Type" and "Style" together make up a specific pattern. It might be good to introduce the term "List Patterns" here first, that is, moving the introduction of patterns to the class description from the 1-arg getInstance method. once we have the terms established, we can then delve into the specific cases "types" and "styles" represent. Something like:

List Patterns

List Patterns are rules that define how a series or list is formed ... (include the description for the getInstance(String[] patterns) here)

Standard Patterns

{@code ListFormat} supports a few pre-defined patterns with a combination of Type (link) and Style(link). Types and Styles are defined as follows.

Type

{@link Type#STANDARD STANDARD}: a simple list with conjunction "and";

...

Style

{@link Style#FULL FULL}: uses the conjunction word such as "and";

{@link Style#SHORT SHORT}: uses the shorthand of the conjunction word, "&" (ampersand) for "and" for example;

{@link Style#NARROW NARROW}: uses no conjunction word.

For example, a combination of {@link Type#STANDARD STANDARD} and {@link Style#FULL FULL} forms an inclusive list pattern.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that Type/Style/Locale forming a specific pattern is an implementation detail, so I would not describe it in the spec (although as you say they form a specific pattern in the impl). It could be that an impl of 3-arg getInstance() can be independent of patterns described in 1-arg getInstance().

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Type and Style select a pattern defined for the locale. (for the zero arg and three arg getInstance methods).
I think I would avoid the term "concatenation", the type defines a pattern that contains punctuation and connecting words. The style selects among the full, short, and narrow patterns.

Copy link
Member

@JoeWang-Java JoeWang-Java Sep 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought they're already a part of the public API described in the current javadoc, e.g. ListFormat.Style and ListFormat.Type (https://cr.openjdk.org/~naoto/JDK-8041488-ListPatterns-PR/api.00/java.base/java/text/ListFormat.html). I thought you meant to support general patterns (the List Patterns) and a few (specific) "type attributes" as in the LDML spec.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, they are public API. My intention was to keep those patterns exposure as minimal as possible, as they are complicated so that users who just wish to acquire formats only with Type/Style/Locale need not know it. Only those who want to customize their own patterns can do so with the 1-arg getInstance(), thus I wanted the description in that method.

/**
* The {@code UNIT} ListFormat style. This style concatenates
* elements, useful for enumerating units.
*/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The word "style" used in Type, I assume you meant "type"? Just that it might be confused with Style below.

Same as previous comments, a combination of Type and Style, if I understand correctly, forms a specific pattern. I might say something about it in the enum class description.

A STANDARD type then is a simple list with conjunction "and", or an inclusive list, and etc.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, good catch. Will correct those style/type typos.

A STANDARD type then is a simple list with conjunction "and", or an inclusive list, and etc.

I could not quite catch what you meant by this. Can you please elaborate on it more?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In a List Pattern, Type, as in the current document, defined its type as in the language, while Style the writing style. I guess I would try to avoid stating it "concatenates elements", but rather represents a simple list with conjunction "and", that is also called an inclusive list as in the language term.

private String createMessageFormatString(int count) {
var sb = new StringBuilder(256).append(patterns[START]);
IntStream.range(2, count - 1).forEach(i -> sb.append(middleBetween).append("{").append(i).append("}"));
sb.append(patterns[END].replaceFirst("\\{0}", "").replaceFirst("\\{1}", "\\{" + (count - 1) + "\\}"));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From what it looks, it could be a concern for potentially adding large number of long strings with a list of small items. I don't seem to see where the input is limited.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Will add some kind of limitation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't see what arbitrary limit would make sense.
But perhaps an APINote that formatting the string from an excessively long list may exceed memory or string sizes. (Without being specific about the exception being thrown).

Copy link
Member Author

@naotoj naotoj Sep 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, will add that @APinote in the 1-arg getInstance()

src/java.base/share/classes/java/text/ListFormat.java Outdated Show resolved Hide resolved
src/java.base/share/classes/java/text/ListFormat.java Outdated Show resolved Hide resolved
Comment on lines +86 to +90
* Alternatively, Locale, Type, and/or Style independent instances
* can be created with {@link #getInstance(String[])}. The String array to the
* method specifies the delimiting patterns for the start/middle/end portion of
* the formatted string, as well as optional specialized patterns for two or three
* elements. Refer to the method description for more detail.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* Alternatively, Locale, Type, and/or Style independent instances
* can be created with {@link #getInstance(String[])}. The String array to the
* method specifies the delimiting patterns for the start/middle/end portion of
* the formatted string, as well as optional specialized patterns for two or three
* elements. Refer to the method description for more detail.
* Alternatively, more flexible patterns can be constructed from the pattern parts for the start, middle,
* and end of the formatted list as well as specialized patterns for lists of two or three elements.
* Refer to the {@link #getInstance(String[])} method description for more detail.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this part, I would not use patterns in the first suggested sentence but use more flexible ListFormat instances instead.

Copy link
Contributor

@RogerRiggs RogerRiggs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking very good.

Comment on lines 93 to 95
* On parsing, if some ambiguity is found in the input string, such as delimiting
* sequences being found in the input string, may produce the result that when formatted is not a
* round-trip with the corresponding formatting. For example, a two element String list
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* On parsing, if some ambiguity is found in the input string, such as delimiting
* sequences being found in the input string, may produce the result that when formatted is not a
* round-trip with the corresponding formatting. For example, a two element String list
* On parsing, if some ambiguity is found in the input string, such as delimiting
* sequences in the input string, the result, when formatted with the same formatting, does not
* re-produce the input string . For example, a two element String list

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, Roger. Incorporated.

* @param obj The object to format. Must be a List or an array
* of Object.
* @param toAppendTo where the text is to be appended
* @param pos Ignored. Not used in ListFormat. May be null
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious, why not used?
I could see a use to identity the string inserted to enable highlighting or other markup around the new string.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FieldPosition is dedicated to identifying fields in *Format classes which are either Format.Field or fields that have names ending with "_FIELD", which ListFormat has neither of them.

Comment on lines +535 to +536
init();
} catch (IllegalArgumentException iae) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the patterns array contains a null, perhaps due to corrupted stream contents, init() will throw NPE.
I don't recommend catching NPE here, but perhaps init() should check for nulls and throw IAE.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, null checks for the patterns array elements should be done also for the 1-arg factory method. Will add the check in init() and the corresponding test.

@openjdk
Copy link

openjdk bot commented Sep 8, 2023

@naotoj This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8041488: Locale-Dependent List Patterns

Reviewed-by: joehw, rriggs

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 42 new commits pushed to the master branch:

  • 578ded4: 8312418: Add Elements.getEnumConstantBody
  • dccf670: 8306632: Add a JDK Property for specifying DTD support
  • a62c48b: 8315891: java/foreign/TestLinker.java failed with "error occurred while instantiating class TestLinker: null"
  • 9559e03: 8315578: PPC builds are broken after JDK-8304913
  • e409d07: 8315696: SignedLoggerFinderTest.java test failed
  • ab6a87e: 8314838: 3 compiler tests ignore vm flags
  • b3dfc39: 8315930: Revert "8315220: Event NativeLibraryLoad breaks invariant by taking a stacktrace when thread is in state _thread_in_native"
  • ebc718f: 8315818: vmTestbase/nsk/jvmti/Allocate/alloc001/alloc001.java fails on libgraal
  • 4a6bd81: 8315854: G1: Remove obsolete comment in G1ReclaimEmptyRegionsTask
  • c664f1c: 8307352: AARCH64: Improve itable_stub
  • ... and 32 more: https://git.openjdk.org/jdk/compare/bd477810b176696e0fd043f5594663ebcf9884cf...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added ready Pull request is ready to be integrated and removed csr Pull request needs approved CSR before integration labels Sep 8, 2023
@naotoj
Copy link
Member Author

naotoj commented Sep 11, 2023

/integrate

@openjdk
Copy link

openjdk bot commented Sep 11, 2023

Going to push as commit d0be73a.
Since your change was applied there have been 58 commits pushed to the master branch:

  • dd214d0: 8315437: Enable parallelism in vmTestbase/nsk/monitoring/stress/classload tests
  • 877731d: 8315770: serviceability/sa/TestJmapCoreMetaspace.java should run with -XX:-VerifyDependencies
  • d06a564: 8315765: G1: Incorrect use of G1LastPLABAverageOccupancy
  • 66b6a5a: 8277954: Replace use of monitors with explicit locks in the JDK LDAP provider implementation
  • 4cb4637: 8315970: Big-endian issues after JDK-8310929
  • ae08143: 8315611: Open source swing text/html and tree test
  • 7b3e697: 8315855: G1: Revise signature of set_humongous_candidate
  • 1941290: 8315942: Sort platform enums and definitions after JDK-8304913 follow-ups
  • 996b336: 8315781: Reduce the max value of GCDrainStackTargetSize
  • 35bccac: 8315841: RISC-V: Check for hardware TSO support
  • ... and 48 more: https://git.openjdk.org/jdk/compare/bd477810b176696e0fd043f5594663ebcf9884cf...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Sep 11, 2023
@openjdk openjdk bot closed this Sep 11, 2023
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Sep 11, 2023
@openjdk
Copy link

openjdk bot commented Sep 11, 2023

@naotoj Pushed as commit d0be73a.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core-libs core-libs-dev@openjdk.org i18n i18n-dev@openjdk.org integrated Pull request has been integrated
Development

Successfully merging this pull request may close these issues.

3 participants