zh-TW, zh-HK, zh-* locales and inheritance in zh-CN #1637

ltiao · 2015-03-18T12:25:47Z

From nikola/data/themes/base/messages, it can be seen that the only supported
Chinese locale is zh-CN, which is Simplified Chinese. The other Chinese locales
are summarized below:

Locale	Description
zh-CN	Chinese (Simplified, PRC)
zh-SG	Chinese (Simplified, Singapore)
zh-TW	Chinese (Traditional, Taiwan)
zh-HK	Chinese (Traditional, Hong Kong S.A.R.)
zh-MO	Chinese (Traditional, Macao S.A.R.)

We see that zh-SG also uses Simplified Chinese, and all the rest use Traditional
Chinese.

It would be very easy to support all the Chinese locales by:

1. Adding support for a locale that uses Traditional Chinese, say zh-TW.
2. Letting the other Traditional Chinese locales (zh-HK, zh-MO) simply inherit MESSAGES from the zh-TW locale.
3. Doing the same for zh-SG, which can just inherit from zh-CN.
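The three steps amount to a small parent table in which each locale names the locale it inherits from. A minimal sketch, assuming a hypothetical `PARENT_LOCALE` mapping and `fallback_chain` helper (neither exists in Nikola), using the underscore form of the names as in the `messages_zh_cn.py` filename:

```python
# Hypothetical parent table: each locale inherits MESSAGES from its parent.
PARENT_LOCALE = {
    "zh_HK": "zh_TW",  # Traditional, Hong Kong -> Traditional, Taiwan
    "zh_MO": "zh_TW",  # Traditional, Macao -> Traditional, Taiwan
    "zh_SG": "zh_CN",  # Simplified, Singapore -> Simplified, PRC
}


def fallback_chain(locale):
    """Return the locale followed by its ancestors, most specific first."""
    chain = [locale]
    while chain[-1] in PARENT_LOCALE:
        chain.append(PARENT_LOCALE[chain[-1]])
    return chain


for loc in ("zh_HK", "zh_MO", "zh_SG", "zh_TW", "zh_CN"):
    print(loc, "->", fallback_chain(loc))
```

Only zh_TW and zh_CN would need full translation files; every other locale resolves to one of them.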

A few things to note:

1. "Many characters were left untouched by simplification, and are thus identical
   between the traditional and simplified Chinese orthographies."
   (http://en.wikipedia.org/wiki/Simplified_Chinese_characters). In fact, many of
   the characters in messages_zh_cn.py (for example, line 27 at commit 7b9123e:
   "Read in English": "中文版",) are also Traditional Chinese characters. So
   potentially, all Chinese locale MESSAGES could inherit from a base Traditional
   Chinese file and be simplified as needed.
2. That said, the words used to describe something can still differ between cultures
   (kind of like how Americans say "trash" and Australians say "rubbish").

So, for example, "source [code]" is "源代码" in Mainland China and "原始碼"
everywhere else (Singapore uses "原始码"). Note that the difference between
the words goes beyond mere simplification. While "码" is a simplification of
"碼", "源" / "原" and "代" / "始" are completely different characters. This example
also illustrates how characters can be identical in traditional and simplified
orthographies: Singapore uses the simplified version of "原始碼", but "原" and "始"
have no simplified forms, so they are the same as in the traditional version.
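The point that character-by-character simplification cannot produce the Mainland term can be checked with a toy converter. The one-entry `TRAD_TO_SIMP` table below is a hypothetical illustration, not a real conversion table: it maps 碼 to 码 and leaves 原 and 始 unchanged, since they have no simplified forms.

```python
# Hypothetical toy table: among these characters, only 碼 has a simplified form.
TRAD_TO_SIMP = {"碼": "码"}


def simplify(text):
    """Naively convert Traditional to Simplified, one character at a time."""
    return "".join(TRAD_TO_SIMP.get(ch, ch) for ch in text)


traditional = "原始碼"  # Taiwan, Hong Kong, Macao
singapore = "原始码"    # Simplified, Singapore
mainland = "源代码"     # Simplified, PRC

print(simplify(traditional) == singapore)  # True: only 碼 changes
print(simplify(traditional) == mainland)   # False: 源/原 and 代/始 are different characters
```

Mechanical simplification gets from zh-TW to zh-SG here, but zh-CN still needs its own wording.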

For right now, I propose that we simply do the 3 steps above and refine later
on as needed, since "special cases aren't special enough to break the rules."

To illustrate, messages_zh_sg.py would look something like this:

```python
from copy import deepcopy
from messages_zh_cn import MESSAGES

# Copy so later overrides don't mutate zh_CN's dict
MESSAGES = deepcopy(MESSAGES)
```

and the future refinements would look something like this:

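```python
# -*- encoding:utf-8 -*-
from __future__ import unicode_literals

from copy import deepcopy

from messages_zh_cn import MESSAGES

# Copy first so the overrides below stay local to zh_SG
MESSAGES = deepcopy(MESSAGES)

# Override only the strings whose wording differs in Singapore
MESSAGES["Source"] = "原始码"
# ...etc
```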

Please let me know if the logic is sound; if so, I can work on this and submit a PR.


ralsina · 2015-03-18T12:41:26Z

We are doing all the translation work in Transifex. Just request a team
for whatever new locale you want, and that's all; no code changes should
be needed.

Kwpolska · 2015-03-18T14:49:30Z

I am also 👎 for this entire idea. It does not make any sense to do all this. Do you speak one of the 5 missing languages? Then just request a team on Transifex and get translating, it’s just a few strings. And you can start with the zh_CN translations and modify them appropriately. Playing with inheritance does not make any sense, especially for a small project like Nikola.

Also: your code examples completely ignore the fact that dicts are mutable.
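The mutability problem is easy to reproduce. Python caches imported modules in `sys.modules`, so every `from messages_zh_cn import MESSAGES` in the same process hands back the same dict object, and assigning it to a new name does not copy it. A minimal sketch, with plain dicts standing in for the two modules' `MESSAGES` (the contents are illustrative, not Nikola's actual strings):

```python
from copy import deepcopy

# Stand-in for messages_zh_cn.MESSAGES (illustrative content).
zh_cn_messages = {"Source": "源代码"}

# Without a copy, both names refer to the same dict object.
zh_sg_messages = zh_cn_messages
zh_sg_messages["Source"] = "原始码"
print(zh_cn_messages["Source"])  # 原始码: zh_CN was changed too

# With a copy, the override stays local to zh_SG.
zh_cn_messages = {"Source": "源代码"}
zh_sg_messages = deepcopy(zh_cn_messages)
zh_sg_messages["Source"] = "原始码"
print(zh_cn_messages["Source"])  # 源代码: zh_CN is untouched
```

Because the values are plain strings, a shallow `dict(MESSAGES)` copy would be enough here; `deepcopy` only matters if values were themselves mutable.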

ltiao · 2015-03-18T14:58:59Z

Thanks for the replies. I'm relatively new to the codebase and didn't realize you were using Transifex. Once @ralsina pointed this out, I understood that all of this is completely unnecessary. I'm happy for someone to go ahead and close this issue.

It is too difficult to update and maintain multiple files for each language. It would be better to use the main language file and override only the country-specific settings where differences exist. Also, I am getting a 'dictionary not loaded yet' error.
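The "main language file plus country-specific overrides" idea maps directly onto `collections.ChainMap` from the Python 3 standard library, which looks a key up in each mapping in turn. A sketch, assuming hypothetical `zh_cn` and `zh_sg_overrides` dicts; the "Source" wordings come from the discussion above and the "Read in English" entry from messages_zh_cn.py, but these are stand-ins, not Nikola's actual loader:

```python
from collections import ChainMap

# Main file: stand-in for the zh_CN MESSAGES dict.
zh_cn = {
    "Read in English": "中文版",
    "Source": "源代码",
}

# Country file: only the strings that differ in Singapore.
zh_sg_overrides = {
    "Source": "原始码",
}

# Lookups try the overrides first, then fall back to the main file.
zh_sg = ChainMap(zh_sg_overrides, zh_cn)

print(zh_sg["Source"])           # 原始码 (override)
print(zh_sg["Read in English"])  # 中文版 (inherited from zh_CN)
print(zh_cn["Source"])           # 源代码 (main file unchanged)
```

Writes to a `ChainMap` go only to its first mapping, so the main file is never modified and no copy is needed.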

Kwpolska closed this as completed Mar 18, 2015

This was referenced Nov 27, 2015:

- add chinese locales: zh-SG, zh-HK, zh-MO moment/moment#2776 (Closed)
- add chinese cultures: zh-SG, zh-TW, zh-HK, zh-MO BenjaminVanRyseghem/numbro#121 (Open)

erickguan mentioned this issue Mar 14, 2017:

- Choose strings for the UI using the Accept-Language header. eggpi/citationhunt#91 (Closed)

ukanuk mentioned this issue May 2, 2020:

- Request translations in particular language variants ukanuk/wplangtools#1 (Open)

akaroka mentioned this issue May 27, 2021:

- Add simplified chinese to BeeWare.lektorproject beeware/beeware.github.io#464 (Closed)
