Add zh_CN.yaml #6904
Most of the translations don't have a country code; would it make sense to make this |
Chinese is at least divided into traditional and simplified, and this PR is for simplified. Moreover, even within Traditional Chinese, usage in Taiwan is quite different from usage in Hong Kong. In this case we probably don't need to localize that finely (say, HK vs. Taiwan), so simply distinguishing between traditional and simplified may be good enough. One standard I have seen is zh-hant and zh-hans, where "Han" stands for 漢, and "t" and "s" are short for traditional and simplified. |
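To make the tag structure concrete, here is a minimal sketch of splitting a tag such as zh-Hans or zh_TW into its language, script, and region subtags. It is not pandoc's parser: `split_tag` is a hypothetical helper, it covers only a simplified subset of the BCP 47 grammar, and accepting `_` as a separator is an assumption made here so that locale-style names like zh_CN also parse.

```python
import re

# Simplified subset of the BCP 47 grammar: language, optional script,
# optional region. Variants, extensions, and private-use subtags are
# deliberately not handled. Accepting "_" is an assumption of this sketch.
_TAG_RE = re.compile(
    r"^(?P<language>[A-Za-z]{2,3})"
    r"(?:[-_](?P<script>[A-Za-z]{4}))?"
    r"(?:[-_](?P<region>[A-Za-z]{2}|[0-9]{3}))?$"
)


def split_tag(tag):
    """Return (language, script, region) with conventional casing.

    Hypothetical helper for illustration; raises ValueError when the tag
    does not fit the simplified pattern above.
    """
    m = _TAG_RE.match(tag)
    if m is None:
        raise ValueError(f"unsupported language tag: {tag!r}")
    language = m.group("language").lower()
    script = m.group("script")
    region = m.group("region")
    return (
        language,
        script.title() if script else None,  # scripts are written like "Hans"
        region.upper() if region else None,  # regions are written like "TW"
    )
```

With this, `split_tag("zh-hans")` yields the script `Hans` and no region, while `split_tag("zh_TW")` yields the region `TW` and no script, which is exactly the distinction being discussed: a script subtag says how the text is written, a region subtag says where.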
Will these names actually work with pandoc? I think pandoc is expecting a standard language + country code. I guess that the At https://stackoverflow.com/questions/4892372/language-codes-for-simplified-chinese-and-traditional-chinese it says that
I think it would create fewer issues to use |
From https://pandoc.org/MANUAL.html#language-variables
Note that Mandarin vs. Cantonese etc. refers to the languages, while traditional vs. simplified Chinese refers to the scripts usually used to write those languages.
Language
From the Language subtag lookup, we see for example for Mandarin:
Script
From https://tools.ietf.org/html/bcp47
Interestingly, for example the Mozilla docs uses The whole "Using Extended Language Subtags" section is worth a read: https://tools.ietf.org/html/bcp47#section-4.1.2, but basically:
|
A bit more info: as said above, there are differences in both the characters and the spoken languages. Simplified vs. traditional is about the characters. In this case the file is in Simplified Chinese and would be incorrect as Traditional Chinese (i.e. we'd need both).
But it is not entirely true that the spoken languages have no "localizations"; the matter is quite complicated. First, there is formal written Chinese, and here Mainland China's, Hong Kong's, and Taiwan's usage differ considerably (the first is written in Simplified Chinese, the last two in Traditional; Singaporean Chinese may differ as well, but I have no experience with it). Then there are "less formal" Chinese languages, which can also be written, such as Cantonese in Hong Kong and the surrounding areas, and 台語 (Taiwanese) in Taiwan, which is different from Mandarin and is roughly the second most spoken Chinese language there. Each of these spoken languages can look totally different in written form compared with the "formal Chinese" mentioned above.
A very good example comes from Chinese Bible translations. The CUV is like the "Chinese King James" and has simplified and traditional variants. There's another, 文理和合本, which is in ancient Chinese, sort of like Shakespearean English. Then there's a Cantonese Bible, which is completely different, and also a 台語聖經 in Taiwanese.
For simplicity, I think traditional vs. simplified is good enough for a start, because the lookup table is very simple here. I can look into how pandoc should handle it a bit more tomorrow. |
Thanks. If necessary, I can contact Traditional Chinese (zh_TW, zh_HK) and Cantonese speakers so we can improve these translations together.
|
A few more details: I checked, and the functions in Text.Pandoc.BCP47 do allow language variants. It occurred to me that citeproc might not be expecting the variant tags (e.g. |
So let's stick with zh-Hans? I can make a zh-Hant PR. |
To clarify: if |
I personally have only used zh-Hant and zh-Hans in the past. I think for written Chinese it is the simplest thing people will do. The reason is that Simplified Chinese and Traditional Chinese have different "character sets" in Unicode (in the past they had their own character sets, such as Big5 and even Hong Kong variants of Big5-*, while the simplified side has confusingly many GB-* variants). For example, when choosing Chinese fonts in LaTeX, matching the text you type to zh-Han(t|s) is very important, as many Chinese fonts cover only traditional or only simplified Chinese. So for simplicity, maybe start with only zh-Hant and zh-Hans, and add more only when there's demand. |
Well, if we use |
A quick search on the internet can't determine what zh alone means. Maybe someone else has declared it already, but if not, and we declare here that zh = zh-Hans, it is political and controversial. Simplified Chinese is the work of the Chinese Communist Party in "recent" history, which simplified the characters somewhat, aiming to make them easier to learn, though this has shown no real advantage in literacy rates; at the same time it destroys the history around those characters (the Chinese counterpart of studying etymology, etc.). The rest of the Chinese world, except probably only Singapore, still uses Traditional Chinese (not only other Chinese countries but also Chinese people in other countries).
Practically speaking, the zh-Hant to zh-Hans mapping is surjective as far as I know: when Simplified Chinese was designed, multiple traditional characters were mapped to the same simplified character. So zh-Hant carries more information. Hence, one possible approach would be to have zh-Hant only and use a library to convert it automatically. One example is OpenCC.
P.S. Of course, by popular vote zh-Hans will win, just because the PRC has more Chinese people than anywhere else. |
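A small sketch of why the traditional-to-simplified direction loses information. The table below is a tiny hand-picked illustration, not OpenCC's data and not a complete mapping, and `to_simplified` and `reverse_candidates` are hypothetical helpers; a real converter such as OpenCC works on phrases as well as single characters.

```python
# Tiny illustrative table (assumption: hand-picked examples only).
# Several traditional characters share one simplified form.
T2S = {
    "發": "发",  # to emit, to send out
    "髮": "发",  # hair
    "乾": "干",  # dry
    "幹": "干",  # to do; trunk
    "後": "后",  # behind, after
    "后": "后",  # empress (same in both scripts)
}


def to_simplified(text):
    """Convert character by character; unknown characters pass through."""
    return "".join(T2S.get(ch, ch) for ch in text)


def reverse_candidates(table):
    """Invert the table: each simplified character maps to every
    traditional character that could have produced it."""
    inverse = {}
    for trad, simp in table.items():
        inverse.setdefault(simp, []).append(trad)
    return inverse


if __name__ == "__main__":
    for simp, trads in reverse_candidates(T2S).items():
        print(simp, "<-", trads)
```

Going from traditional to simplified is a plain lookup, but the inverse table has more than one candidate per entry, so converting back needs context. That is the sense in which a zh-Hant source carries more information than a zh-Hans one.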
e.g. from https://tools.ietf.org/html/bcp47:
From these texts you can't see what zh alone can mean. In particular, the bold sentence seems to suggest that bare "zh" should be used only for historical purposes. Since this is a new "feature" here that no one has relied on in pandoc before, maybe we should just expect people to use the more precise variants (with language subtags). |
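As a rough sketch of what "expect the more precise variants" could look like in a lookup, the hypothetical `pick_translation` below only resolves tags that name a script and deliberately returns nothing for bare zh. It does not describe how pandoc actually selects translation files; the function name, the default list of available translations, and the choice not to infer a script from a region are all assumptions for illustration.

```python
def pick_translation(tag, available=("zh-Hans", "zh-Hant")):
    """Return the matching language-script name, or None.

    Hypothetical helper: a tag must carry an explicit four-letter script
    subtag to match, so bare "zh" (and region-only tags such as "zh-TW")
    are intentionally left unresolved.
    """
    parts = tag.replace("_", "-").split("-")
    language = parts[0].lower()
    # The script subtag is the four-letter alphabetic subtag, if any.
    script = next(
        (p.title() for p in parts[1:] if len(p) == 4 and p.isalpha()),
        None,
    )
    if script is None:
        return None
    candidate = f"{language}-{script}"
    return candidate if candidate in available else None
```

Under these assumptions `pick_translation("zh-Hans-CN")` gives `"zh-Hans"`, while `pick_translation("zh")` gives `None`, so nobody ends up depending on an implicit meaning for bare zh.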
OK, I'll go with zh-Hans and zh-Hant then. Thanks! |