Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x/text/language: change of behavior for language matcher #24211

Open
LayneChris opened this issue Mar 2, 2018 · 12 comments
Open

x/text/language: change of behavior for language matcher #24211

LayneChris opened this issue Mar 2, 2018 · 12 comments
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@LayneChris
Copy link

Please answer these questions before submitting your issue. Thanks!

What version of Go are you using (go version)?

1.9.2

Does this issue reproduce with the latest release?

yes

What operating system and processor architecture are you using (go env)?

linux amd64

What did you do?

The golang.org/x/text package seemed to have changed in the behaviour of language matching with an update a few days ago:

package main

import (
    "fmt"

    "golang.org/x/text/language"
)

func main() {
    s := []language.Tag{language.MustParse("en"), language.MustParse("fr")}
    p := []language.Tag{language.MustParse("en-US"), language.MustParse("en")}

    l := language.NewMatcher(s)
    ll, _, _ := l.Match(p...)

    fmt.Println(ll)
}

What did you expect to see?

This used to print "en" but now prints "en-u-rg-uszzzz". This doesn't make sense because I only support "en" and "fr" so why is it returning something else? Switching the order of my preferred languages gives "en".

Is there a rhyme or reason why the change because I cannot understand? Defeats the purpose of a language "matcher" if it is going to return languages that I don't support.

If this is by design what is the best way to get just "en". Parent()? Base()? SomethingElse()?

What did you see instead?

"en-u-rg-uszzzz", a language I do not support.

@bradfitz bradfitz changed the title Change of behavior for language matcher x/text/language: change of behavior for language matcher Mar 2, 2018
@gopherbot gopherbot added this to the Unreleased milestone Mar 2, 2018
@bradfitz
Copy link
Contributor

bradfitz commented Mar 2, 2018

/cc @mpvl

@andybons andybons added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Mar 13, 2018
@basgys
Copy link

basgys commented Mar 14, 2018

This issue seems to occur only when the preferred language contains regional preferences.

Example

Supported:

  • en
  • en-GB
  • de-CH
  • fr-CH
  • en-US

Tests:

  • want en expect result en actual result en
  • want fr expect result fr-CH actual result fr-CH
  • want de expect result de-CH actual result de-CH
  • want en-GB expect result en-GB actual result en-GB
  • want fr-CH expect result fr-CH actual result fr-CH
  • want de-CH expect result de-CH actual result de-CH
  • want it expect result en actual result en
  • want it-CH expect result en actual result en-u-rg-chzzzz

Regression

This regression was introduced on this commit: golang/text@6008361#diff-c716fa1ccf70cc54ac8b513b999c84eb

Here:

if w.RegionID != tt.RegionID && w.RegionID != 0 {
	if w.RegionID != 0 && tt.RegionID != 0 && tt.RegionID.Contains(w.RegionID) {
		tt.RegionID = w.RegionID
		tt.RemakeString()
	} else if r := w.RegionID.String(); len(r) == 2 {
		// TODO: also filter macro and deprecated.
		tt, _ = tt.SetTypeForKey("rg", strings.ToLower(r)+"zzzz")
	}
}

Temporary workaround
Rollback golang.org/x/text to commit 2120f96286c5897163172b6f5ac2fd0921777714 cc @LayneChris

@nicksnyder
Copy link

nicksnyder commented Apr 4, 2018

I was just bitten by this today.

Here is a minimal repro (does not repro on play.golang.org):

package main

import "fmt"
import "golang.org/x/text/language"

func main() {
	m := language.NewMatcher([]language.Tag{language.English})
	tag, i, conf := m.Match(language.AmericanEnglish)
	fmt.Println(tag, i, conf) // en-u-rg-uszzzz 0 Exact
}

@nicksnyder
Copy link

nicksnyder commented Apr 4, 2018

I notice that the documentation says this:

Note that Tag that is returned by Match and MatchString may differ from any of the supported languages, as it may contain carried over settings from the user tags. This may be inconvenient when your application has some additional locale-specific data for your supported languages. Match and MatchString both return the index of the matched supported tag to simplify associating such data with the matched tag.

It seems like the intended behavior is to save the input array passed to the matcher and then use the index returned from Match.

It is annoying that I can't query the input array from the matcher itself though.

Example workaround

package main

import "fmt"
import "golang.org/x/text/language"

func main() {
	supported := []language.Tag{language.English}
	m := language.NewMatcher(supported)
	_, i, conf := m.Match(language.AmericanEnglish)
	fmt.Println(supported[i], i, conf) // en 0 Exact
}

@dnx2k
Copy link

dnx2k commented May 30, 2018

My temporary workaround:

langTag, _, _ := languageMatcher.Match(tags...)
langTagString := langTag.String()[0:2]

stapelberg added a commit to stapelberg/debiman that referenced this issue Aug 9, 2018
See golang/go#24211 for more details

This hit us for a zh_CN manpage with precisely one "en" option: the returned tag
is en-u-rg-chzzzz, not en.
@chenjie4255
Copy link

I was a bit surprised by this when I meet this problem in my production environment ,is that just a mistake? I was really loving golang, but this is really frustrated me, my every beliefs on golang have beed break.

it not something about changing the exists codes or improving my project's unit test, it never be.

@schmurfy
Copy link

I just got bitten by this, why is it still open after so long ?

@dradtke
Copy link

dradtke commented May 4, 2020

This seems like the most straightforward, least-hacky way to get the language out of the tag:

supported := []language.Tag{language.English}
m := language.NewMatcher(supported)
tag, _, _ := m.Match(language.AmericanEnglish)
base, _, _ := tag.Raw()
languageCode := base.ISO3() // eng

playground link

@svenwltr
Copy link

Experiencing this on a patch update is quite irritating.

We use this as a workaround:

strings.Split(tag.String(), "-")[0]

@razor-1
Copy link

razor-1 commented Apr 4, 2021

@sebastien-rosset
Copy link

It seems like the intended behavior is to save the input array passed to the matcher and then use the index returned from Match.
It is annoying that I can't query the input array from the matcher itself though.

Related to this, the NewMatcher documentation about the returned index is a bit convoluted:

The index returned by the Match method corresponds to the index of the matched tag in t, but is augmented with the Unicode extension ('u')of the corresponding preferred tag. This allows user locale options to be passed transparently.

The beginning of the sentence is simple (i.e. it is the index in slice t). However using the word "augmented" to qualify the meaning of the index is confusing. What does it mean to augment "the index of the matched tag"?

@josharian
Copy link
Contributor

@ianlancetaylor looking at https://dev.golang.org/owners, @mpvl is still listed as owner for x/text, but it seems like he's not particularly engaged in it at the moment. Is there another owner we can ping?

(This issue is really easy to hit and inconvenient to work around.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Projects
None yet
Development

No branches or pull requests