Is html a language tag? #1

Open
vtellier opened this issue Jan 30, 2019 · 3 comments
Comments

@vtellier

Hi,

I'm quite surprised to see that the word html is considered a language tag. I could not find any html language tag on the net, so I'm posting this issue.

[Screenshot from 2019-01-30 18-02-29]

@revelt

revelt commented Jan 2, 2020

Currently, any sequence of two or more letters will yield true. No, html is not a language tag; the algorithm needs to be looked at. Even zzz and zz (just not z) are reported as true.

There are two ways to write this program. One is to concoct a regex that matches the existing variety of codes: "at least two characters, optionally followed by a dash...". That's the easy way.
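To illustrate why the shape-only approach lets html through, here is a minimal sketch. The regex and the function name `looksLikeLanguageTag` are hypothetical and are not taken from this package's source; they only assume the "at least two letters, optionally followed by dash-separated subtags" shape described above.

```typescript
// Hypothetical shape-only check; NOT this package's actual regex.
// Primary subtag of two or more letters, optionally followed by
// dash-separated alphanumeric subtags of 1 to 8 characters.
const SHAPE_ONLY = /^[a-z]{2,}(-[a-z0-9]{1,8})*$/i;

function looksLikeLanguageTag(input: string): boolean {
  return SHAPE_ONLY.test(input);
}

console.log(looksLikeLanguageTag("en-US")); // true
console.log(looksLikeLanguageTag("html"));  // true, although html is not a language tag
console.log(looksLikeLanguageTag("zz"));    // true
console.log(looksLikeLanguageTag("z"));     // false
```

A pattern like this can only say whether the input has the right shape, not whether the code is actually assigned to a language.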

The other way is to extract all the known codes (in RFC 1766, two-letter primary tags are ISO 639 language codes and two-letter first subtags are ISO 3166 alpha-2 country codes), then, at minimum, match the values without dashes against that list. Source: https://tools.ietf.org/html/rfc1766
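A list-based check could look roughly like the sketch below. The two sets are tiny illustrative samples, not real registry data, and the name `isKnownLanguageTag` is hypothetical. The sketch assumes that the primary subtag is looked up in a language list and an optional second subtag in a country list, and it deliberately rejects anything longer than two subtags.

```typescript
// Illustrative subsets only; a real list would be generated from
// ISO 639 / ISO 3166 or IANA registry data, not typed by hand.
const KNOWN_LANGUAGES = new Set(["ar", "de", "en", "fr", "ja", "zh"]);
const KNOWN_REGIONS = new Set(["DE", "FR", "GB", "JP", "US"]);

function isKnownLanguageTag(input: string): boolean {
  const parts = input.split("-");
  // This sketch handles only "xx" and "xx-YY"; longer tags are rejected.
  if (parts.length > 2) return false;
  const primary = (parts[0] ?? "").toLowerCase();
  if (!KNOWN_LANGUAGES.has(primary)) return false;
  const region = parts[1];
  // No region subtag is fine; if present, it must be a known country code.
  return region === undefined || KNOWN_REGIONS.has(region.toUpperCase());
}

console.log(isKnownLanguageTag("en"));    // true
console.log(isKnownLanguageTag("en-US")); // true
console.log(isKnownLanguageTag("html"));  // false
console.log(isKnownLanguageTag("zz"));    // false
```

The trade-off is that the lists have to be kept in sync with the registries, but inputs such as html and zz are no longer accepted just because they have a plausible shape.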

@revelt

revelt commented Jan 6, 2020

PS. In the meantime, if anybody needs stricter language tag validation, I created a validator app based on a function, not a regex: https://www.npmjs.com/package/is-language-code. It not only validates the values against known IANA tags but also evaluates the logic according to the spec. The missing test ar-a-aaa-b-bbb-a-ccc also passes.

@opyh

opyh commented Jan 23, 2020

(Shameless plug:) Maybe https://github.com/sozialhelden/ietf-language-tags might be interesting for more complex parsing and edge cases, too. Note that "language code" can refer to different standards: there are differences between "IETF language tag", "Unicode language tag" and "POSIX language tag".
