8041488: Locale-Dependent List Patterns #15130
Conversation
Co-authored-by: Roger Riggs <Roger.Riggs@Oracle.com>
Thanks, Roger. All comments make sense to me. Will update the PR soon. |
 * as "Monday, Wednesday, and Friday". This class provides the functionality
 * defined in Unicode Consortium's LDML specification for
 * <a href="https://www.unicode.org/reports/tr35/tr35-general.html#ListPatterns">
 * List Patterns</a>.
The main function, it seems to me, is to change the representation from one form to another. What would you think about the following:
The {@code ListFormat} class is a tool for converting a list of strings to a text representation and vice versa in a locale-sensitive way. It transforms strings to text in accordance with the List Patterns (link) as defined in Unicode Consortium's LDML specification. For example, it can be used to format a list of 3 weekdays, i.e. "Monday", "Wednesday", "Friday", as "Monday, Wednesday, and Friday" in an inclusive list pattern.
Thanks. Will modify the wording in the next revision. I think we should stick to the wording format/parse here.
I'd stick to the general first sentence structure used by DateFormat, Format, and MessageFormat.
("tools" in OpenJDK are standalone programs.)
For example,
"ListFormat formats and parses lists of strings."
I agree, Roger. The current version is derived from MessageFormat, which is the closest of all the *Format classes. Will come up with a better one.
Sounds good. I mainly want to say that ListFormat doesn't create (produce) new items but changes (converts) the presentation from one form to another.
 * <a href="https://www.unicode.org/reports/tr35/tr35-general.html#ListPatterns">
 * List Patterns</a>.
 * <p>
 * Three types of concatenation are provided: {@link Type#STANDARD STANDARD},
A "Type" and a "Style" together make up a specific pattern. It might be good to introduce the term "List Patterns" here first, that is, to move the introduction of patterns from the 1-arg getInstance method to the class description. Once we have the terms established, we can then delve into the specific cases that "types" and "styles" represent. Something like:
List Patterns
List Patterns are rules that define how a series or list is formed ... (include the description for the getInstance(String[] patterns) here)
Standard Patterns
{@code ListFormat} supports a few pre-defined patterns with a combination of Type (link) and Style (link). Types and Styles are defined as follows.
Type
{@link Type#STANDARD STANDARD}: a simple list with conjunction "and";
...
Style
{@link Style#FULL FULL}: uses the conjunction word such as "and";
{@link Style#SHORT SHORT}: uses the shorthand of the conjunction word, "&" (ampersand) for "and" for example;
{@link Style#NARROW NARROW}: uses no conjunction word.
For example, a combination of {@link Type#STANDARD STANDARD} and {@link Style#FULL FULL} forms an inclusive list pattern.
I think that Type/Style/Locale forming a specific pattern is an implementation detail, so I would not describe it in the spec (although as you say they form a specific pattern in the impl). It could be that an impl of 3-arg getInstance() can be independent of patterns described in 1-arg getInstance().
Type and Style select a pattern defined for the locale (for the zero-arg and three-arg getInstance methods).
I think I would avoid the term "concatenation"; the type defines a pattern that contains punctuation and connecting words. The style selects among the full, short, and narrow patterns.
I thought they're already a part of the public API described in the current javadoc, e.g. ListFormat.Style and ListFormat.Type (https://cr.openjdk.org/~naoto/JDK-8041488-ListPatterns-PR/api.00/java.base/java/text/ListFormat.html). I thought you meant to support general patterns (the List Patterns) and a few (specific) "type attributes" as in the LDML spec.
Yes, they are public API. My intention was to keep the exposure of those patterns as minimal as possible, because they are complicated, so that users who just wish to acquire formats with only Type/Style/Locale need not know about them. Only those who want to customize their own patterns can do so with the 1-arg getInstance(), which is why I wanted the description in that method.
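To make the "Type and Style select a pattern" point concrete, here is a minimal sketch of such a lookup. It does not use ListFormat itself; the class name, the reduced Type enum, and the English pattern strings are illustrative assumptions, not values read from CLDR or from the PR.

```java
import java.util.Map;

// Hypothetical sketch: a (Type, Style) pair selects one "end" pattern
// for a single locale. The pattern strings are illustrative only.
class PatternSelectionSketch {
    enum Type { STANDARD, OR }
    enum Style { FULL, SHORT, NARROW }

    // Assumed English-like end patterns, keyed by type and then by style.
    static final Map<Type, Map<Style, String>> END_PATTERNS = Map.of(
        Type.STANDARD, Map.of(
            Style.FULL, "{0}, and {1}",
            Style.SHORT, "{0}, & {1}",
            Style.NARROW, "{0}, {1}"),
        Type.OR, Map.of(
            Style.FULL, "{0}, or {1}",
            Style.SHORT, "{0}, or {1}",
            Style.NARROW, "{0}, or {1}"));

    // Users pick a type and a style; they never see the pattern itself.
    static String endPattern(Type type, Style style) {
        return END_PATTERNS.get(type).get(style);
    }
}
```

A caller that only passes a type and a style never needs to know the pattern syntax, which is the separation argued for in the reply above.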
/**
 * The {@code UNIT} ListFormat style. This style concatenates
 * elements, useful for enumerating units.
 */
The word "style" used in Type, I assume you meant "type"? Just that it might be confused with Style below.
Same as previous comments, a combination of Type and Style, if I understand correctly, forms a specific pattern. I might say something about it in the enum class description.
A STANDARD type then is a simple list with conjunction "and", or an inclusive list, and etc.
Ah, good catch. Will correct those style/type typos.
A STANDARD type then is a simple list with conjunction "and", or an inclusive list, and etc.
I could not quite catch what you meant by this. Can you please elaborate on it more?
In a List Pattern, Type, as in the current document, defines the kind of list in the language sense, while Style defines the writing style. I guess I would try to avoid stating that it "concatenates elements", and rather say that it represents a simple list with the conjunction "and", which is also called an inclusive list in language terms.
private String createMessageFormatString(int count) {
    var sb = new StringBuilder(256).append(patterns[START]);
    IntStream.range(2, count - 1).forEach(i -> sb.append(middleBetween).append("{").append(i).append("}"));
    sb.append(patterns[END].replaceFirst("\\{0}", "").replaceFirst("\\{1}", "\\{" + (count - 1) + "\\}"));
From the looks of it, there could be a concern about potentially building a large number of long strings from a list of small items. I don't see where the input is limited.
Good point. Will add some kind of limitation.
I can't see what arbitrary limit would make sense.
But perhaps add an @apiNote saying that formatting an excessively long list may exceed memory or string size limits (without being specific about the exception being thrown).
OK, will add that @apiNote in the 1-arg getInstance().
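The size concern is easier to see with a small sketch that loosely mirrors the quoted createMessageFormatString snippet. It hard-codes English-like start, middle, and end text (an assumption; the PR reads these from the patterns array), and the class and method names are hypothetical. The point is that the generated MessageFormat pattern grows with the element count, before any item text is added.

```java
import java.text.MessageFormat;

// Hypothetical sketch: build one MessageFormat pattern per element count,
// with hard-coded English-like delimiters instead of locale data.
class MessageFormatPatternSketch {
    // Produces e.g. "{0}, {1}, {2}, and {3}" for count == 4.
    static String buildPattern(int count) {
        if (count < 3) {
            throw new IllegalArgumentException("count must be at least 3");
        }
        StringBuilder sb = new StringBuilder("{0}, {1}");   // start part
        for (int i = 2; i < count - 1; i++) {               // middle parts
            sb.append(", {").append(i).append("}");
        }
        sb.append(", and {").append(count - 1).append("}"); // end part
        return sb.toString();
    }

    // Formats the items with the pattern built for their count.
    static String format(String... items) {
        return MessageFormat.format(buildPattern(items.length), (Object[]) items);
    }
}
```

Because the pattern gains one placeholder and one delimiter per element, the pattern string is already proportional to the list length, which is why a note about very long lists was suggested rather than a fixed limit.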
 * Alternatively, Locale, Type, and/or Style independent instances
 * can be created with {@link #getInstance(String[])}. The String array to the
 * method specifies the delimiting patterns for the start/middle/end portion of
 * the formatted string, as well as optional specialized patterns for two or three
 * elements. Refer to the method description for more detail.
Suggested change:
 * Alternatively, more flexible patterns can be constructed from the pattern parts for the start, middle,
 * and end of the formatted list as well as specialized patterns for lists of two or three elements.
 * Refer to the {@link #getInstance(String[])} method description for more detail.
For this part, I would not use "patterns" in the first suggested sentence but use "more flexible ListFormat instances" instead.
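As a rough illustration of how start, middle, and end parts combine, the following sketch composes a list from the end backwards, in the manner described by the LDML List Patterns section. It is not the ListFormat implementation: the class and method names are hypothetical, the optional three-element pattern is left out for brevity, and each pattern is assumed to contain "{0}" before "{1}".

```java
import java.util.List;

// Hypothetical sketch of composing a list from start/middle/end patterns.
class ListPatternSketch {
    // Substitutes two values into a pattern such as "{0}, and {1}".
    static String apply(String pattern, String first, String second) {
        int i0 = pattern.indexOf("{0}");
        int i1 = pattern.indexOf("{1}");
        if (i0 < 0 || i1 < i0) {
            throw new IllegalArgumentException("bad pattern: " + pattern);
        }
        return pattern.substring(0, i0) + first
                + pattern.substring(i0 + 3, i1) + second
                + pattern.substring(i1 + 3);
    }

    static String format(List<String> items, String start, String middle,
                         String end, String two) {
        int n = items.size();
        if (n == 0) return "";
        if (n == 1) return items.get(0);
        if (n == 2) return apply(two, items.get(0), items.get(1));
        // Join the last two items, then wrap leftwards with middle and start.
        String result = apply(end, items.get(n - 2), items.get(n - 1));
        for (int i = n - 3; i >= 1; i--) {
            result = apply(middle, items.get(i), result);
        }
        return apply(start, items.get(0), result);
    }
}
```

With start and middle set to "{0}, {1}" and end set to "{0}, and {1}", the three weekdays from the PR description come out as "Monday, Wednesday, and Friday".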
…ources.java Co-authored-by: Roger Riggs <Roger.Riggs@Oracle.com>
Looking very good.
 * On parsing, if some ambiguity is found in the input string, such as delimiting
 * sequences being found in the input string, may produce the result that when formatted is not a
 * round-trip with the corresponding formatting. For example, a two element String list
Suggested change:
 * On parsing, if some ambiguity is found in the input string, such as delimiting
 * sequences in the input string, the result, when formatted with the same formatting, does not
 * re-produce the input string . For example, a two element String list
Thanks, Roger. Incorporated.
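The round-trip caveat can be reproduced with a deliberately naive formatter and parser. This is a sketch under stated assumptions, not ListFormat's parsing logic: the delimiters ", ", ", and ", and " and " are hard-coded English-like choices, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: an element that already contains a delimiter
// makes the parsed result differ from the list that was formatted.
class RoundTripSketch {
    static String format(List<String> items) {
        int n = items.size();
        if (n == 0) return "";
        if (n == 1) return items.get(0);
        if (n == 2) return items.get(0) + " and " + items.get(1);
        return String.join(", ", items.subList(0, n - 1))
                + ", and " + items.get(n - 1);
    }

    static List<String> parse(String text) {
        List<String> result = new ArrayList<>();
        int end = text.lastIndexOf(", and ");
        if (end >= 0) {
            // Looks like three or more elements: split the head on ", ".
            for (String s : text.substring(0, end).split(", ")) {
                result.add(s);
            }
            result.add(text.substring(end + ", and ".length()));
            return result;
        }
        int two = text.indexOf(" and ");
        if (two >= 0) {
            result.add(text.substring(0, two));
            result.add(text.substring(two + " and ".length()));
            return result;
        }
        result.add(text);
        return result;
    }
}
```

Formatting the two-element list "a, b," and "c" gives "a, b, and c", which this parser reads back as the three elements "a", "b", and "c", so the input list is not recovered.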
 * @param obj The object to format. Must be a List or an array
 * of Object.
 * @param toAppendTo where the text is to be appended
 * @param pos Ignored. Not used in ListFormat. May be null
Curious, why is it not used?
I could see a use in identifying the inserted string, to enable highlighting or other markup around the new string.
FieldPosition is dedicated to identifying fields in *Format classes, which are either Format.Field constants or fields that have names ending with "_FIELD". ListFormat has neither of them.
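For readers unfamiliar with FieldPosition, this sketch shows the kind of use it was designed for in an existing Format class: NumberFormat reports where its INTEGER_FIELD landed in the output. The class and method names are hypothetical; the NumberFormat and FieldPosition calls are the standard java.text API.

```java
import java.text.FieldPosition;
import java.text.NumberFormat;
import java.util.Locale;

// Hypothetical sketch: FieldPosition tracks a "_FIELD" constant's span.
class FieldPositionSketch {
    // Returns the formatted number followed by the integer field's range.
    static String describeIntegerField(double value) {
        NumberFormat nf = NumberFormat.getInstance(Locale.US);
        FieldPosition pos = new FieldPosition(NumberFormat.INTEGER_FIELD);
        StringBuffer sb = nf.format(value, new StringBuffer(), pos);
        // The begin and end indexes delimit the integer part in sb.
        return sb + " [" + pos.getBeginIndex() + "," + pos.getEndIndex() + ")";
    }
}
```

For 1234.5 in the US locale this yields "1,234.5 [0,5)". A list format has no comparable named field to track, which matches the explanation given in the reply.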
    init();
} catch (IllegalArgumentException iae) {
If the patterns array contains a null, perhaps due to corrupted stream contents, init() will throw NPE.
I don't recommend catching NPE here, but perhaps init() should check for nulls and throw IAE.
Actually, null checks for the patterns array elements should also be done for the 1-arg factory method. Will add the check in init() and the corresponding test.
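A minimal sketch of the kind of check discussed here, assuming a five-element array (start, middle, end, two, three) in which every element must be non-null. The class name, the constant, and the exception messages are hypothetical; the actual init() in the PR may validate differently.

```java
// Hypothetical sketch: reject null elements up front with an
// IllegalArgumentException instead of failing later with an NPE.
class PatternValidationSketch {
    // Assumed layout: start, middle, end, two-element, three-element.
    static final int PATTERN_COUNT = 5;

    static void validate(String[] patterns) {
        if (patterns == null || patterns.length != PATTERN_COUNT) {
            throw new IllegalArgumentException(
                    "patterns must be an array of " + PATTERN_COUNT + " strings");
        }
        for (int i = 0; i < patterns.length; i++) {
            if (patterns[i] == null) {
                throw new IllegalArgumentException("patterns[" + i + "] is null");
            }
        }
    }
}
```

Running the same check from both the factory method and the deserialization path means a corrupted stream and a bad caller argument fail in the same, documented way.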
/integrate
Going to push as commit d0be73a.
Your commit was automatically rebased without conflicts.
Introducing a new formatting class for locale-dependent list patterns. The class provides the functionality from the Unicode Consortium's LDML specification for list patterns. For example, given a list of Strings "Monday", "Wednesday", "Friday", its format method would produce "Monday, Wednesday, and Friday" in US English. A CSR has also been drafted, and its draft javadoc can be viewed here: https://cr.openjdk.org/~naoto/JDK-8041488-ListPatterns-PR/api.00/java.base/java/text/ListFormat.html
Reviewing
Using git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/15130/head:pull/15130
$ git checkout pull/15130
Update a local copy of the PR:
$ git checkout pull/15130
$ git pull https://git.openjdk.org/jdk.git pull/15130/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 15130
View PR using the GUI difftool:
$ git pr show -t 15130
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/15130.diff