RFC 5646, "Tags for Identifying Languages," defines how language tags are built and registered. It is commonly referred to as BCP 47 (Best Current Practice #47), which also includes RFC 4647.
You can read the full text here: https://tools.ietf.org/html/rfc5646
Language tags built according to this RFC can get quite complex. While en-US simply means "English as used in the United States", a tag such as hy-Latn-IT-arevela means "Eastern Armenian written in Latin script, as used in Italy".
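To show how the subtags in a tag like hy-Latn-IT-arevela are told apart, here is a simplified splitter based on the subtag shapes in RFC 5646's syntax (section 2.1): language is 2 to 8 letters, extlang is 3 letters, script is 4 letters, region is 2 letters or 3 digits, and a variant is 5 to 8 alphanumerics or a digit followed by 3 alphanumerics. The function name `split_tag` is hypothetical, and it deliberately ignores extensions, private-use subtags and grandfathered tags, so treat it as a sketch rather than a validator.

```python
import re


def split_tag(tag: str) -> dict:
    """Split a simple language tag into its subtags (hypothetical helper).

    Simplified: extensions, private use (x-...) and grandfathered tags
    are not handled and raise ValueError.
    """
    first, *rest = tag.split("-")
    if not re.fullmatch(r"[A-Za-z]{2,8}", first):
        raise ValueError(f"bad language subtag: {first!r}")
    out = {"language": first, "extlang": [], "script": None,
           "region": None, "variants": []}
    i = 0
    # extlang: up to three 3-letter subtags, only after a 2-3 letter language
    while (len(first) <= 3 and len(out["extlang"]) < 3 and i < len(rest)
           and re.fullmatch(r"[A-Za-z]{3}", rest[i])):
        out["extlang"].append(rest[i])
        i += 1
    # script: 4 letters (ISO 15924)
    if i < len(rest) and re.fullmatch(r"[A-Za-z]{4}", rest[i]):
        out["script"] = rest[i]
        i += 1
    # region: 2 letters (ISO 3166-1) or 3 digits (UN M.49)
    if i < len(rest) and re.fullmatch(r"[A-Za-z]{2}|[0-9]{3}", rest[i]):
        out["region"] = rest[i]
        i += 1
    # variants: 5-8 alphanumerics, or a digit followed by 3 alphanumerics
    while i < len(rest) and re.fullmatch(
            r"[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3}", rest[i]):
        out["variants"].append(rest[i])
        i += 1
    if i < len(rest):
        raise ValueError(f"unsupported subtag: {rest[i]!r}")
    return out
```

For example, `split_tag("hy-Latn-IT-arevela")` returns `{'language': 'hy', 'extlang': [], 'script': 'Latn', 'region': 'IT', 'variants': ['arevela']}`.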
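RFC 5646 does not define an upper limit on tag length, but it recommends (section 4.4.1) that implementations support tags of at least 35 characters. That figure is the length of the longest tag built from these subtags, where each count after the language includes the subtag's leading hyphen:
language (8) + script (5) + region (4) + variant1 (9) + variant2 (9) = 35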
The Internet Assigned Numbers Authority (IANA) maintains the official registry of all registered language subtags.
The seven Python files in this repository were extracted from that registry, one file per record type; together they contain every tag and its description. A typical application probably only needs the "language" records, but the other types are available if you need them.
At the time of extraction, there were 9116 language tag records.
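- language: the primary language subtag. Mostly ISO 639 codes (2 or 3 letters), plus a few other entries such as the range reserved for private use.
- extlang: extended language subtags, also drawn from ISO 639. The syntax allows up to three, but the second and third positions are permanently reserved.
- script: from ISO 15924.
- region: either an ISO 3166-1 code or a 3-digit United Nations UN M.49 code.
- variant: registered variants, such as arevela in the example above.
- grandfathered: non-redundant tags registered during the RFC 3066 era. This includes the "i-klingon" and "i-enochian" tags, if you must know.
- redundant: whole tags registered under RFC 3066 that can now be composed from individual subtags. Rare and almost never used.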
A sample of the raw registry format (source: iana.org):
```
%%
Type: language
Subtag: bi
Description: Bislama
Added: 2005-10-16
%%
Type: language
Subtag: bm
Description: Bambara
Added: 2005-10-16
%%
Type: language
Subtag: bn
Description: Bengali
Description: Bangla
Added: 2005-10-16
Suppress-Script: Beng
%%
```
- Entries are separated by "%%".
- The number of fields per entry varies, e.g. repeated Description fields or extra fields such as Suppress-Script.
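A minimal parser for the record format described above might look like the following. It is a sketch, not the repository's extractor.py: the function name `parse_registry` is hypothetical, every field is stored as a list because fields such as Description can repeat, and folded lines (continuation lines that start with whitespace, as allowed by the record-jar format in RFC 5646 section 3.1.1) are joined to the previous value with a single space by assumption.

```python
from __future__ import annotations


def parse_registry(text: str) -> list[dict[str, list[str]]]:
    """Split registry text into records; each field maps to a list of values."""
    records: list[dict[str, list[str]]] = []
    current: dict[str, list[str]] = {}
    last_field = None
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        if line.strip() == "%%":
            # Record separator: close the current record, if any.
            if current:
                records.append(current)
            current, last_field = {}, None
        elif line[0] in " \t" and last_field is not None:
            # Folded line: continues the previous field's value.
            current[last_field][-1] += " " + line.strip()
        else:
            # Split on the first colon only; values may contain colons.
            field, _, value = line.partition(":")
            last_field = field.strip()
            # Fields such as Description may repeat, so always append.
            current.setdefault(last_field, []).append(value.strip())
    if current:
        records.append(current)
    return records
```

Applied to the sample above, the Bengali record comes back with `"Description": ["Bengali", "Bangla"]` and `"Suppress-Script": ["Beng"]`.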
- The easiest option is to copy and paste the contents into your own file.
- If that is too much, import it as a module instead. For example:
- Copy RFC5646_language.py into your current directory.
- Add the following to look up the description of the language tag:
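```python
from .RFC5646_language import RFC5646_language

# ... inside your view method ...

# The primary language subtag is everything before the first "-"
# ("en-US" -> "en"); split() returns the whole code when there is no "-".
# Tags are case-insensitive, and registry language subtags are lowercase.
lang = self.request.LANGUAGE_CODE.split("-")[0].lower()
if lang not in RFC5646_language:
    self.request.LANGUAGE_CODE = "en"  # default to English
    lang = "en"
requested_language = RFC5646_language[lang]

# ...
```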
- If you use a programming language other than Python, paste the contents into a text editor and reformat them to fit that language's syntax.
- Save the registry from the iana.org database as a text file. In this repository it is named "language-subtag-registry.txt".
- Run the included extractor.py to regenerate the Python files.
- Note: if iana.org changes the registry format, you will have to adjust extractor.py so that it parses the file correctly.
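The update workflow can be sketched roughly as follows. This is not the repository's extractor.py, whose output may differ: it assumes the records have already been parsed into dicts mapping each field name to a list of values, and the generated variable and file names (`RFC5646_<type>`) are hypothetical, modeled on RFC5646_language.py. The helper names are hypothetical too; the download URL is the location IANA publishes for the Language Subtag Registry.

```python
import pprint
import urllib.request
from pathlib import Path

# IANA's published location for the Language Subtag Registry.
REGISTRY_URL = ("https://www.iana.org/assignments/"
                "language-subtag-registry/language-subtag-registry")


def download_registry(path="language-subtag-registry.txt", url=REGISTRY_URL):
    """Save the raw registry text (UTF-8) to `path`."""
    with urllib.request.urlopen(url) as resp:
        Path(path).write_text(resp.read().decode("utf-8"), encoding="utf-8")
    return path


def group_by_type(records):
    """Build {type: {subtag_or_tag: description}} from parsed records."""
    tables = {}
    for rec in records:
        if "Type" not in rec:
            continue  # e.g. the leading File-Date record
        # grandfathered and redundant records carry "Tag" instead of "Subtag"
        key = (rec.get("Subtag") or rec.get("Tag"))[0]
        # Some records have several Description fields; join them.
        description = " / ".join(rec.get("Description", []))
        tables.setdefault(rec["Type"][0], {})[key] = description
    return tables


def render_module(rtype, table):
    """Return Python source defining a hypothetical RFC5646_<type> dict."""
    return f"RFC5646_{rtype} = {pprint.pformat(table)}\n"


def write_modules(tables, directory="."):
    """Write one RFC5646_<type>.py file per record type."""
    for rtype, table in tables.items():
        Path(directory, f"RFC5646_{rtype}.py").write_text(
            render_module(rtype, table), encoding="utf-8")
```

Joining multiple descriptions with " / " is an arbitrary choice made here; keeping the full list is equally valid if your application needs every name.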