x/text/language: Cannot parse ISO/IEC 15897 (C, POSIX, etc.) as language tags #25340

Open
wking opened this issue May 10, 2018 · 1 comment

Comments

@wking
commented May 10, 2018

Please answer these questions before submitting your issue. Thanks!

What version of Go are you using (go version)?

~/.local/lib/go/src/golang.org/x/text $ go version
go version go1.10.2 linux/arm64
~/.local/lib/go/src/golang.org/x/text $ git describe
v0.3.0-48-g67e48ad

Does this issue reproduce with the latest release?

Yes.

What operating system and processor architecture are you using (go env)?

$ go env GOARCH
arm64

What did you do?

$ cat main.go 
package main

import (
        "fmt"

        "golang.org/x/text/language"
)

func main() {
        tag, err := language.Parse("POSIX")
        fmt.Printf("%v\n%v\n", tag, err)
}
$ go run main.go 
und
language: tag is not well-formed

What did you expect to see?

A POSIX language tag.

What did you see instead?

A "tag is not well-formed" error. Passing C instead of POSIX produces the same error.

Neither of these subtags is in the IANA registry, so the current errors are strictly compliant with the current BCP 47 claim. But POSIX defines a collation order, and the current behavior provides no access to it. As far as I can tell, there is currently no way to create a Collator that sorts using the POSIX rules.

Do folks who need to support them have to roll their own sorter, or should x/text/language be extended to support locales from the ISO/IEC 15897 registry? Is this related to the posix variant discussed here, here, and here (which does not appear to be registered or grandfathered)? Maybe I should be translating LC_COLLATE=POSIX and LC_COLLATE=C to a und-u-va-posix language tag for sorting?

See also the distinction between languages and locales in RFC 2277. This may be related to this TODO?
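
To make the translation idea above concrete, here is a minimal sketch using only the standard library. The helper name collationTag is hypothetical and not part of x/text. The sketch only reads LC_COLLATE, ignoring the LC_ALL and LANG precedence rules. Whether a Collator built from the resulting tag actually sorts by the POSIX rules is exactly the open question, so the sketch stops at producing the tag string.

```go
package main

import (
	"fmt"
	"os"
)

// posixTag is the BCP 47 tag floated in the issue text for POSIX collation.
const posixTag = "und-u-va-posix"

// collationTag is a hypothetical helper (not part of x/text). It reports the
// language tag string to use for the "C" and "POSIX" locale names, and
// ok == false for every other locale name.
func collationTag(locale string) (tag string, ok bool) {
	switch locale {
	case "C", "POSIX":
		return posixTag, true
	}
	return "", false
}

func main() {
	// Assumption: only LC_COLLATE is consulted; LC_ALL and LANG are ignored.
	locale := os.Getenv("LC_COLLATE")
	if tag, ok := collationTag(locale); ok {
		fmt.Printf("%q -> %q\n", locale, tag)
	} else {
		fmt.Printf("%q needs ordinary locale-to-language conversion\n", locale)
	}
}
```

The returned string could then be handed to language.Parse by the caller.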

@gopherbot gopherbot added this to the Unreleased milestone May 10, 2018

wking added a commit to wking/libpod that referenced this issue May 10, 2018
hooks: Order injection by collated JSON filename
We also considered ordering with sort.Strings, but Matthew rejected
that because it uses a byte-by-byte UTF-8 comparison [1] which would
fail many language-specific conventions [2].

There's some more discussion of the localeToLanguage mapping in [3].
Currently language.Parse does not handle either 'C' or 'POSIX',
returning:

  und, language: tag is not well-formed

for both.

[1]: containers#686 (comment)
[2]: https://en.wikipedia.org/wiki/Alphabetical_order#Language-specific_conventions
[3]: golang/go#25340

Signed-off-by: W. Trevor King <wking@tremily.us>
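
For readers wondering why byte-by-byte comparison was rejected in the commit message above, a small sketch of what sort.Strings does with mixed-case and non-ASCII names may help. The filenames are made up for illustration and byteSorted is a hypothetical helper, not code from libpod.

```go
package main

import (
	"fmt"
	"sort"
)

// byteSorted returns a sorted copy of names using sort.Strings, which
// compares strings byte by byte rather than by any locale's conventions.
func byteSorted(names []string) []string {
	out := append([]string(nil), names...)
	sort.Strings(out)
	return out
}

func main() {
	// Hypothetical hook filenames; "\u00e9" is a precomposed "é".
	names := []string{"apple.json", "Zebra.json", "\u00e9cole.json", "zoo.json"}
	// Uppercase "Z" (0x5A) sorts before lowercase "a" (0x61), and the
	// UTF-8 bytes of "é" (0xC3 0xA9) sort after every ASCII letter.
	fmt.Println(byteSorted(names))
	// prints: [Zebra.json apple.json zoo.json école.json]
}
```

Most language-specific conventions would place "école" near "e" and ignore case at the first level, which is the gap a locale-aware collator is meant to close.
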
wking added a commit to wking/libpod that referenced this issue May 10, 2018
hooks: Order injection by collated JSON filename
wking added a commit to wking/libpod that referenced this issue May 11, 2018
hooks: Order injection by collated JSON filename
wking added a commit to wking/libpod that referenced this issue May 11, 2018
hooks: Order injection by collated JSON filename
rh-atomic-bot added a commit to containers/libpod that referenced this issue May 11, 2018
hooks: Order injection by collated JSON filename
Closes: #686
Approved by: mheon
@mpvl

Member

commented Oct 29, 2018

Go's BCP 47 tags support passing POSIX using "en-US-u-va-posix".
Passing such a tag to the collator should select POSIX sorting. Does that work for you?

wking added a commit to wking/libpod that referenced this issue Mar 6, 2019
libpod/container_internal: Split locale at the first dot, etc.
We're going to feed this into Go's BCP 47 language parser.  Language
tags have the form [1]:

  language
  ["-" script]
  ["-" region]
  *("-" variant)
  *("-" extension)
  ["-" privateuse]

and locales have the form [2]:

  [language[_territory][.codeset][@modifier]]

The modifier is useful for collation, but Go's language-based API
[3,4] does not provide a way for us to supply it.  This code converts
our locale to a BCP 47 language tag by stripping the first dot and
everything after it, then replacing the first underscore, if any,
with a hyphen.  This will
avoid errors like [5]:

  WARN[0000] failed to parse language "en_US.UTF-8": language: tag is not well-formed

when feeding language.Parse(...).

[1]: https://golang.org/pkg/strings/#SplitN
[2]: http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap08.html#tag_08_02
[3]: https://tools.ietf.org/html/bcp47#section-2.1
[4]: golang/go#25340
[5]: containers#2494

Signed-off-by: W. Trevor King <wking@tremily.us>
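
The conversion this commit message describes can be sketched roughly as follows. This is an illustration written from the description above, not the actual libpod code. The function name localeToLanguage and the sample inputs are assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// localeToLanguage converts a POSIX locale name such as "en_US.UTF-8" into a
// BCP 47-shaped string such as "en-US". Illustrative sketch only.
func localeToLanguage(locale string) string {
	// Keep only the part before the first dot. This drops ".codeset" and
	// any "@modifier" that follows the codeset.
	lang := strings.SplitN(locale, ".", 2)[0]
	// Replace the first underscore (language_territory) with a hyphen.
	return strings.Replace(lang, "_", "-", 1)
}

func main() {
	for _, locale := range []string{"en_US.UTF-8", "de_DE.ISO-8859-15@euro", "fr"} {
		fmt.Printf("%q -> %q\n", locale, localeToLanguage(locale))
	}
}
```

One limitation of this approach: a modifier that appears without a codeset, as in "en_US@euro", is left in place, so the result would still need further handling before being passed to language.Parse.
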
wking added a commit to wking/libpod that referenced this issue Mar 6, 2019
libpod/container_internal: Split locale at the first dot, etc.
We're going to feed this into Go's BCP 47 language parser.  Language
tags have the form [1]:

  language
  ["-" script]
  ["-" region]
  *("-" variant)
  *("-" extension)
  ["-" privateuse]

and locales have the form [2]:

  [language[_territory][.codeset][@modifier]]

The modifier is useful for collation, but Go's language-based API
[3] does not provide a way for us to supply it.  This code converts
our locale to a BCP 47 language tag by stripping the first dot and
everything after it, then replacing the first underscore, if any,
with a hyphen.  This will
avoid errors like [4]:

  WARN[0000] failed to parse language "en_US.UTF-8": language: tag is not well-formed

when feeding language.Parse(...).

[1]: https://tools.ietf.org/html/bcp47#section-2.1
[2]: http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap08.html#tag_08_02
[3]: golang/go#25340
[4]: containers#2494

Signed-off-by: W. Trevor King <wking@tremily.us>
wking added a commit to wking/cri-o that referenced this issue Mar 8, 2019
lib/container_server: Split locale at the first dot, etc.
We're going to feed this into Go's BCP 47 language parser.  Language
tags have the form [1]:

  language
  ["-" script]
  ["-" region]
  *("-" variant)
  *("-" extension)
  ["-" privateuse]

and locales have the form [2]:

  [language[_territory][.codeset][@modifier]]

The modifier is useful for collation, but Go's language-based API
[3] does not provide a way for us to supply it.  This code converts
our locale to a BCP 47 language tag by stripping the first dot and
everything after it, then replacing the first underscore, if any,
with a hyphen.  This will
avoid errors like [4]:

  WARN[0000] failed to parse language "en_US.UTF-8": language: tag is not well-formed

when feeding language.Parse(...).

This ports containers/libpod@69cb863 (libpod/container_internal:
Split locale at the first dot, etc., 2019-03-05,
containers/libpod#2550) to CRI-O.

[1]: https://tools.ietf.org/html/bcp47#section-2.1
[2]: http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap08.html#tag_08_02
[3]: golang/go#25340
[4]: containers/libpod#2494

Signed-off-by: W. Trevor King <wking@tremily.us>
joselamego added a commit to joselamego/cri-o that referenced this issue Mar 13, 2019
lib/container_server: Split locale at the first dot, etc.
joselamego added a commit to joselamego/cri-o that referenced this issue Mar 13, 2019
lib/container_server: Split locale at the first dot, etc.
muayyad-alsadi added a commit to muayyad-alsadi/libpod that referenced this issue Apr 21, 2019
libpod/container_internal: Split locale at the first dot, etc.