x/text/language: change of behavior for language matcher #24211
Comments
/cc @mpvl |
This issue seems to occur only when the preferred language contains regional preferences. Example Supported:
Tests:
Regression
This regression was introduced in this commit: golang/text@6008361#diff-c716fa1ccf70cc54ac8b513b999c84eb
Here:
	if w.RegionID != tt.RegionID && w.RegionID != 0 {
		if w.RegionID != 0 && tt.RegionID != 0 && tt.RegionID.Contains(w.RegionID) {
			tt.RegionID = w.RegionID
			tt.RemakeString()
		} else if r := w.RegionID.String(); len(r) == 2 {
			// TODO: also filter macro and deprecated.
			tt, _ = tt.SetTypeForKey("rg", strings.ToLower(r)+"zzzz")
		}
	}
Temporary workaround |
I was just bitten by this today. Here is a minimal repro (does not repro on play.golang.org):
	package main

	import (
		"fmt"

		"golang.org/x/text/language"
	)

	func main() {
		m := language.NewMatcher([]language.Tag{language.English})
		tag, i, conf := m.Match(language.AmericanEnglish)
		fmt.Println(tag, i, conf) // en-u-rg-uszzzz 0 Exact
	}
|
I notice that the documentation says this:
It seems like the intended behavior is to save the input array passed to the matcher and then use the index returned from Match. It is annoying that I can't query the input array from the matcher itself though. Example workaround
|
My temporary workaround:
	langTag, _, _ := languageMatcher.Match(tags...)
	langTagString := langTag.String()[0:2]
|
See golang/go#24211 for more details. This hit us for a zh_CN manpage with precisely one "en" option: the returned tag is en-u-rg-chzzzz, not en.
I was a bit surprised when I ran into this problem in my production environment. Is it just a mistake? I really loved Go, but this has really frustrated me and shaken my trust in it. This is not about having to change existing code or improve my project's unit tests; it never was. |
I just got bitten by this. Why is it still open after so long? |
This seems like the most straightforward, least-hacky way to get the language out of the tag:
	supported := []language.Tag{language.English}
	m := language.NewMatcher(supported)
	tag, _, _ := m.Match(language.AmericanEnglish)
	base, _, _ := tag.Raw()
	languageCode := base.ISO3() // eng
|
Experiencing this on a patch update is quite irritating. We use this as a workaround: strings.Split(tag.String(), "-")[0] |
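The two string-based workarounds posted in this thread do not behave the same way for every tag. As a rough illustration only, the sketch below applies both to plain strings that stand in for the output of tag.String(), so it needs nothing beyond the standard library. The helper names firstTwo and firstSubtag are made up for this example. It relies on the fact that BCP 47 allows three-letter primary language subtags such as "fil" (Filipino).

```go
package main

import (
	"fmt"
	"strings"
)

// firstTwo mimics the tag.String()[0:2] workaround (hypothetical helper).
func firstTwo(tag string) string {
	if len(tag) < 2 {
		return tag
	}
	return tag[:2]
}

// firstSubtag mimics the strings.Split(tag.String(), "-")[0] workaround
// (hypothetical helper).
func firstSubtag(tag string) string {
	return strings.Split(tag, "-")[0]
}

func main() {
	// The strings below are assumed stand-ins for tag.String() output.
	for _, s := range []string{"en-u-rg-uszzzz", "fil-PH"} {
		fmt.Printf("%-16s slice=%q split=%q\n", s, firstTwo(s), firstSubtag(s))
	}
}
```

For "fil-PH" the two-character slice yields "fi", which is the code for Finnish, while splitting on "-" keeps "fil". Neither variant knows anything about how the library canonicalizes tags, so both remain string heuristics rather than a replacement for the index returned by Match.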
Should the community conclude that We have issues like this one going back 3 years, and these:
with no progress |
Related to this, the NewMatcher documentation about the returned index is a bit convoluted:
The beginning of the sentence is simple (i.e. it is the index in slice |
@ianlancetaylor looking at https://dev.golang.org/owners, @mpvl is still listed as owner for x/text, but it seems like he's not particularly engaged in it at the moment. Is there another owner we can ping? (This issue is really easy to hit and inconvenient to work around.) |
IMHO this is working as intended. The "new" behavior provides strictly more information than the old behavior. If you want to know which tag of your list of supported tags was matched, use the returned index. If you want to provide better internationalization support, use the returned tag, which can specify extra information using BCP 47 Extension U. Based on the original example, with the supported tag If you are using the tag as a key, for example for localization, your best option is probably to use both tags: the one obtained via the index for the lookup, and the returned one for services from
https://go.dev/blog/matchlang goes into some of the details of this, too. I believe that, as an alternative to using the index, it is safe to use In summary: the returned tag is an accurate description of the user's wishes, reconciled with the supported languages. This tag should be used with APIs that themselves understand BCP-47 tags. The returned index refers to the input and is useful if you need to use it as raw data, for example as a key into a map. As an aside, a lot of the code that takes the returned tag and slices and dices it (as included in many of the comments in this issue) is prone to being incorrect due to canonicalization. |
Please answer these questions before submitting your issue. Thanks!
What version of Go are you using (go version)?
1.9.2
Does this issue reproduce with the latest release?
yes
What operating system and processor architecture are you using (go env)?
linux amd64
What did you do?
The golang.org/x/text package seems to have changed its language-matching behaviour in an update a few days ago:
What did you expect to see?
This used to print "en" but now prints "en-u-rg-uszzzz". This doesn't make sense because I only support "en" and "fr" so why is it returning something else? Switching the order of my preferred languages gives "en".
Is there a rhyme or reason for the change? I cannot understand it. It defeats the purpose of a language "matcher" if it is going to return languages that I don't support.
If this is by design, what is the best way to get just "en"? Parent()? Base()? SomethingElse()?
What did you see instead?
"en-u-rg-uszzzz", a language I do not support.