x/text/language: fails to parse valid BCP47 -t extension string; mistakes field for a region #54316
Labels: NeedsInvestigation
What did you do?
What did you expect to see?
What did you see instead?
Cause
In /x/text/internal/language, this error occurs whenever parseExtension calls parseTag to parse the language, script, region, and variant subtags that follow "-t". For example, "en-t-m0-ungegn" is fine and does not trigger the error, but "en-t-en-m0-..." does.
RFC 6497 gives another example, und-Cyrl-t-und-latn-m0-ungegn-2007, which Go also fails to parse for the same reason. The RFC states: "The field separator subtags, such as 'm0', were chosen because they are short, visually distinctive, and cannot occur in a language subtag." But parseTag treats "m0" as part of the (language, script, region, variants) sequence: it reads "m0" as a region, just like the "GB" in "en-GB".
Note also that a tag like "en-001" (where 001 is the UN M.49 code for the world) is a valid language/script/region/variants sequence, so the fix has to be narrow: Go parses the tag incorrectly only when the subtag is exactly two characters long and contains a digit, and only that case should end the sequence.
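The narrow rule can be checked in isolation. This is a hypothetical comparison, not the code from the pull request: narrowStop implements the rule stated above, and naiveStop is an over-broad alternative included only to show why "contains a digit" alone is not enough. The naive rule would also cut off the numeric region in "en-001" and digit-initial variants such as the "1996" in "de-1996".

```go
package main

import "fmt"

// hasDigit reports whether s contains at least one ASCII digit.
func hasDigit(s string) bool {
	for i := 0; i < len(s); i++ {
		if '0' <= s[i] && s[i] <= '9' {
			return true
		}
	}
	return false
}

// naiveStop is a deliberately over-broad rule (for comparison only): end the
// tlang sequence at any subtag containing a digit.
func naiveStop(s string) bool { return hasDigit(s) }

// narrowStop is the rule proposed in the issue: end the tlang sequence only
// when the subtag is exactly two characters long and contains a digit. No
// language (2-8 letters), script (4 letters), region (2 letters or 3 digits)
// or variant (4-8 characters) has that shape.
func narrowStop(s string) bool { return len(s) == 2 && hasDigit(s) }

func main() {
	// "GB" and "001" are regions, "1996" is a variant (as in "de-1996"),
	// and "m0" is a -t field separator.
	for _, s := range []string{"GB", "001", "1996", "m0"} {
		fmt.Printf("%s: naive=%t narrow=%t\n", s, naiveStop(s), narrowStop(s))
	}
	// Prints:
	// GB: naive=false narrow=false
	// 001: naive=true narrow=false
	// 1996: naive=true narrow=false
	// m0: naive=true narrow=true
}
```

Only "m0" is rejected by the narrow rule, which is exactly the subtag that cannot belong to the source-language part.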
Fix
A small pull request is incoming. It passes all existing tests and adds one new test case.