This repository has been archived by the owner on Oct 26, 2024. It is now read-only.
Kosa: Construct a flat list of languages on the server side, but content is divided by text vs. audio/video. Non-Chinese text and audio happen to have the same keys. Chinese text content is either keyed as zho-hant or zho-hans, but nothing else. Audio content is keyed as cmn, yue, nan, or hak.
Mobile App: Users select a language and the app constructs a "preferred languages list" behind the scenes. All languages have a backup language of English. Selecting a Chinese language presents the user with a script selection box as well, with options of Traditional or Simplified. The "preferred" language is always zho-hant or zho-hans and the second-most-preferred language will be the spoken Chinese language (of cmn, yue, nan, or hak). Chinese users also get a final backup of English.
Thinking: This system allows Kosa to serve content with a flat language key for anything, greatly simplifying how it tracks languages and preventing a language tree from emerging anywhere in the API. The "preferred languages list" allows us to (a) back up everything with English content and (b) add flexible language preferences and new script options later, if required. The language-selection algorithm can be dumb-but-flexible, allowing us to avoid lookup trees entirely.
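The "dumb-but-flexible" selection described above could be sketched roughly as follows. This is only an illustration in Python (the app itself is written in Dart). The function names `preferred_languages` and `pick_content`, the `"traditional"`/`"simplified"` option strings, and the shape of the `available` dictionary are assumptions made for the example, not part of Kosa or the mobile app.

```python
from typing import Dict, List, Optional

# Spoken Chinese languages used as audio keys (taken from the notes above).
CHINESE_SPOKEN = {"cmn", "yue", "nan", "hak"}
# Script choice offered to the user, mapped to the Chinese text keys.
CHINESE_SCRIPTS = {"traditional": "zho-hant", "simplified": "zho-hans"}
# Every user gets English as the final backup.
FALLBACK = "eng"


def preferred_languages(selected: str, script: Optional[str] = None) -> List[str]:
    """Build the ordered "preferred languages list" for a user's selection."""
    if selected in CHINESE_SPOKEN:
        if script not in CHINESE_SCRIPTS:
            raise ValueError("Chinese languages need a script: traditional or simplified")
        # Script key first (text), spoken language second (audio).
        prefs = [CHINESE_SCRIPTS[script], selected]
    else:
        prefs = [selected]
    if FALLBACK not in prefs:
        prefs.append(FALLBACK)
    return prefs


def pick_content(available: Dict[str, str], prefs: List[str]) -> Optional[str]:
    """Return the content for the first preferred key the server has, if any."""
    for key in prefs:
        if key in available:
            return available[key]
    return None
```

Because every key is flat, the lookup is a plain loop over the list: a Cantonese user who chose Traditional script would get `["zho-hant", "yue", "eng"]`, so text resolves on the first key and audio on the second, with no language tree involved.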
Requirements
embed standardized language names in Dart and Ruby/Clojure using ICU libraries (?)
the minimum required set of languages includes:
pali
english
espanol
italiano
simplified chinese
francais
portugues
srpsko-hrvatski (serbo-croatian)
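For reference, the minimum set above might map to flat keys like this. The three-letter codes are the ISO 639-3 codes for each language (`hbs` is the Serbo-Croatian macrolanguage code), and `zho-hans` follows the Chinese text key used in these notes. The dictionary name `MINIMUM_LANGUAGES` is hypothetical, and the project may well settle on different keys.

```python
# Hypothetical mapping from the names in the list above to flat language keys.
MINIMUM_LANGUAGES = {
    "pali": "pli",
    "english": "eng",
    "espanol": "spa",
    "italiano": "ita",
    "simplified chinese": "zho-hans",
    "francais": "fra",
    "portugues": "por",
    "srpsko-hrvatski": "hbs",
}
```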
The complete list of languages currently supported by Pariyatti:
This list is available through ICU libraries. This CLDR format also contains the language name equivalents (आनगराी / English vs. Hindi / हिंदी vs every other possible combination).
Current Thinking (2021 July 22)
Language codes (eng, hin, etc.) and Chinese script codes (hans vs. hant).
It seems that ISO 639-3 (an extension of ISO 639-2) has reasonably comprehensive support:
(hans/hant instead of _CN/_TW?)
My current thinking is ISO 639-3 + (optional) region specifier. Alternatively, some BCP 47 subset... but it's just so complicated.
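A key of the form "ISO 639-3 code plus optional specifier" is simple enough to parse without a BCP 47 library. The sketch below is one possible reading of that idea, not a decided format. `LanguageKey`, `parse_key`, the accepted separators (`-` or `_`), and the 2-4 letter qualifier are all assumptions chosen so that both `zho-hant` and a gettext-style `zho_TW` would be accepted.

```python
import re
from typing import NamedTuple, Optional


class LanguageKey(NamedTuple):
    language: str             # three-letter ISO 639-3 code, e.g. "zho"
    qualifier: Optional[str]  # optional script or region, e.g. "hant" or "tw"


# Three lowercase letters, then an optional 2-4 letter qualifier.
KEY_PATTERN = re.compile(r"([a-z]{3})(?:[-_]([A-Za-z]{2,4}))?")


def parse_key(key: str) -> LanguageKey:
    """Split a flat key into its language code and optional qualifier."""
    match = KEY_PATTERN.fullmatch(key)
    if match is None:
        raise ValueError(f"not a recognised language key: {key!r}")
    language, qualifier = match.groups()
    # Qualifiers are lowercased so "TW" and "tw" compare equal.
    return LanguageKey(language, qualifier.lower() if qualifier else None)
```

This only checks the shape of the key. Whether the three letters are a real ISO 639-3 code would still need a lookup against a registry.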
Wikipedia uses a number of hacks to get around BCP 47 limitations:
Examples explaining why flattening Chinese languages won't work:
cmn, nan, hak but always uses zho-hant
nan and hak but always use zho-hans
Chinese scripts can be decoded here:
https://www.chineseconverter.com/en/convert/find-out-if-simplified-or-traditional-chinese
Old notes from Asana:
1:
My first round of research turned up this:
A Language should have three fields: IANA code, English name ("Hindi"), Actual name ("हिंदी")
IANA tag registry is here: https://www.iana.org/assignments/language-subtag-registry/language-subtag-registry
Prefer tag combinations which are the nearest matches to the Gettext locale standard, wherever possible:
https://www.gnu.org/software/gettext/manual/html_node/Locale-Names.html#Locale-Names
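The three-field Language described in this note could look something like the following. This is a minimal sketch in Python rather than the project's Dart or Ruby/Clojure. The class and field names are made up for illustration, and `hi` is the IANA registry subtag for Hindi.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Language:
    iana_code: str     # subtag from the IANA language subtag registry
    english_name: str  # name in English, e.g. "Hindi"
    actual_name: str   # name in the language itself, e.g. "हिंदी"


HINDI = Language(iana_code="hi", english_name="Hindi", actual_name="हिंदी")

# Index by code so a flat key can be turned into display names.
LANGUAGES_BY_CODE = {lang.iana_code: lang for lang in [HINDI]}
```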
2:
Ooooohhhkkkayyyyy. It looks like THIS is maybe the standard way to do this? At least according to friends at Wikipedia:
https://github.com/unicode-org/cldr/tree/release-37/common/main
3:
The canonical ICU webpage is here: http://site.icu-project.org/home
The Ruby library is listed here (gem icu): http://site.icu-project.org/related
There is a Dart package: https://pub.dev/packages/icu
4: (post-Asana)
Clojure: https://github.com/Vincit/satakieli (wraps ICU4J)
Java: http://site.icu-project.org (ICU4J)