x/text/language: Cannot parse ISO/IEC 15897 (C, POSIX, etc.) as language tags #25340
Labels
NeedsInvestigation
Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
ianlancetaylor added the NeedsInvestigation label on May 10, 2018.
wking added a commit to wking/libpod that referenced this issue on May 11, 2018:

We also considered ordering with sort.Strings, but Matthew rejected that because it uses a byte-by-byte UTF-8 comparison [1] which would fail many language-specific conventions [2].

There's some more discussion of the localeToLanguage mapping in [3]. Currently language.Parse does not handle either 'C' or 'POSIX', returning:

  und, language: tag is not well-formed

for both.

[1]: containers#686 (comment)
[2]: https://en.wikipedia.org/wiki/Alphabetical_order#Language-specific_conventions
[3]: golang/go#25340

Signed-off-by: W. Trevor King <wking@tremily.us>
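To illustrate the objection to sort.Strings in the commit message above, here is a minimal sketch using only the standard library. The helper name byteOrdered and the sample words are made up for illustration. It shows that byte-wise comparison puts every ASCII uppercase letter before every lowercase one, and accented letters after both, which is not what most language-specific collations expect.

```go
package main

import (
	"fmt"
	"sort"
)

// byteOrdered (hypothetical helper) returns a sorted copy of names.
// sort.Strings compares strings byte by byte, so the result follows
// UTF-8 byte values rather than any language's alphabetical order.
func byteOrdered(names []string) []string {
	out := append([]string(nil), names...)
	sort.Strings(out)
	return out
}

func main() {
	words := []string{"banana", "Zebra", "apple", "Apple", "éclair"}
	fmt.Println(byteOrdered(words))
	// prints: [Apple Zebra apple banana éclair]
}
```

"Zebra" sorts before "apple" because 'Z' (0x5A) is smaller than 'a' (0x61), and "éclair" comes last because its first byte is 0xC3.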
rh-atomic-bot pushed a commit to containers/podman that referenced this issue on May 11, 2018:

We also considered ordering with sort.Strings, but Matthew rejected that because it uses a byte-by-byte UTF-8 comparison [1] which would fail many language-specific conventions [2].

There's some more discussion of the localeToLanguage mapping in [3]. Currently language.Parse does not handle either 'C' or 'POSIX', returning:

  und, language: tag is not well-formed

for both.

[1]: #686 (comment)
[2]: https://en.wikipedia.org/wiki/Alphabetical_order#Language-specific_conventions
[3]: golang/go#25340

Signed-off-by: W. Trevor King <wking@tremily.us>

Closes: #686
Approved by: mheon
Go's BCP 47 tags support passing POSIX using "en-US-u-va-posix".
wking added a commit to wking/libpod that referenced this issue on Mar 6, 2019:

We're going to feed this into Go's BCP 47 language parser. Language tags have the form [1]:

  language ["-" script] ["-" region] *("-" variant) *("-" extension) ["-" privateuse]

and locales have the form [2]:

  [language[_territory][.codeset][@modifier]]

The modifier is useful for collation, but Go's language-based API [3] does not provide a way for us to supply it. This code converts our locale to a BCP 47 language by stripping the dot and later and replacing the first underscore, if any, with a hyphen. This will avoid errors like [4]:

  WARN[0000] failed to parse language "en_US.UTF-8": language: tag is not well-formed

when feeding language.Parse(...).

[1]: https://tools.ietf.org/html/bcp47#section-2.1
[2]: http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap08.html#tag_08_02
[3]: golang/go#25340
[4]: containers#2494

Signed-off-by: W. Trevor King <wking@tremily.us>
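The conversion described in this commit message can be sketched with the standard library alone. This is an illustration of the two steps as worded above (drop the first dot and everything after it, then replace the first underscore with a hyphen), not the actual podman code. The function name localeToLanguage is borrowed from the earlier commit message, but the body here is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// localeToLanguage is an illustrative sketch, not the podman or CRI-O
// implementation. It turns a POSIX locale name such as "en_US.UTF-8"
// into a string shaped like a BCP 47 tag.
func localeToLanguage(locale string) string {
	// Step 1: strip the first dot and everything after it. This removes
	// the codeset and any modifier that follows the codeset.
	if i := strings.Index(locale, "."); i >= 0 {
		locale = locale[:i]
	}
	// Step 2: replace the first underscore, if any, with a hyphen.
	return strings.Replace(locale, "_", "-", 1)
}

func main() {
	fmt.Println(localeToLanguage("en_US.UTF-8")) // prints: en-US
	fmt.Println(localeToLanguage("de_DE"))       // prints: de-DE
	fmt.Println(localeToLanguage("sr_RS@latin")) // prints: sr-RS@latin
}
```

As the last call shows, a modifier that is not preceded by a codeset survives these two steps, and names such as "C" and "POSIX" pass through unchanged, so the result is not guaranteed to be a well-formed language tag.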
wking added a commit to wking/cri-o that referenced this issue on Mar 8, 2019:

We're going to feed this into Go's BCP 47 language parser. Language tags have the form [1]:

  language ["-" script] ["-" region] *("-" variant) *("-" extension) ["-" privateuse]

and locales have the form [2]:

  [language[_territory][.codeset][@modifier]]

The modifier is useful for collation, but Go's language-based API [3] does not provide a way for us to supply it. This code converts our locale to a BCP 47 language by stripping the dot and later and replacing the first underscore, if any, with a hyphen. This will avoid errors like [4]:

  WARN[0000] failed to parse language "en_US.UTF-8": language: tag is not well-formed

when feeding language.Parse(...).

This ports containers/podman@69cb8639 (libpod/container_internal: Split locale at the first dot, etc., 2019-03-05, containers/podman#2550) to CRI-O.

[1]: https://tools.ietf.org/html/bcp47#section-2.1
[2]: http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap08.html#tag_08_02
[3]: golang/go#25340
[4]: containers/podman#2494

Signed-off-by: W. Trevor King <wking@tremily.us>
Please answer these questions before submitting your issue. Thanks!
What version of Go are you using (`go version`)?
Does this issue reproduce with the latest release?
Yes.
What operating system and processor architecture are you using (`go env`)?
What did you do?
What did you expect to see?
A POSIX language tag.
What did you see instead?
A "tag is not well-formed" error. Passing `C` instead of `POSIX` produces the same error.

Neither of these subtags is in the IANA registry, so the current errors are strictly compliant with the current BCP 47 claim. But POSIX defines a collation order which the current behavior does not provide access to. As far as I can tell, there's currently no way to create a `Collator` that will sort using the POSIX rules. Do folks who need to support them need to roll their own sorter, or should x/text/language be extended to support locales from the ISO/IEC 15897 registry? Is this related to the `posix` variant discussed here, here, and here (but not registered or grandfathered?)? Maybe I should be translating `LC_COLLATE=POSIX` and `LC_COLLATE=C` to an `und-u-va-posix` language tag for sorting? See also the distinction between languages and locales in RFC 2277. This may be related to this TODO?
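The translation floated in the last question could look roughly like the sketch below. It only builds the tag string and uses nothing outside the standard library. The function name posixCollationTag, the constant posixTag, and the decision to treat names such as "C.UTF-8" as POSIX are assumptions made for illustration. Whether a `Collator` built from this tag actually sorts by the POSIX rules is the open question in this issue and is not shown here.

```go
package main

import (
	"fmt"
	"strings"
)

// posixTag is the candidate tag from the question above; it is an
// assumption here, not a confirmed x/text convention.
const posixTag = "und-u-va-posix"

// posixCollationTag (hypothetical helper) reports whether an LC_COLLATE
// value names the C or POSIX locale and, if so, returns posixTag.
func posixCollationTag(lcCollate string) (string, bool) {
	name := lcCollate
	// Ignore any codeset or modifier, e.g. "C.UTF-8" becomes "C".
	if i := strings.IndexAny(name, ".@"); i >= 0 {
		name = name[:i]
	}
	if name == "C" || name == "POSIX" {
		return posixTag, true
	}
	return "", false
}

func main() {
	for _, v := range []string{"POSIX", "C", "C.UTF-8", "en_US.UTF-8"} {
		tag, ok := posixCollationTag(v)
		fmt.Printf("%q -> %q %v\n", v, tag, ok)
	}
}
```

Callers would fall back to their usual locale handling whenever the second return value is false.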