zh-TW, zh-HK, zh-* locales and inheritance in zh-CN #1637

Closed
ltiao opened this issue Mar 18, 2015 · 3 comments

Comments

@ltiao

ltiao commented Mar 18, 2015

From nikola/data/themes/base/messages, it can be seen that the only supported
Chinese locale is zh-CN, which is Simplified Chinese. The other Chinese locales
are summarized below:

Locale  Description
zh-CN   Chinese (Simplified, PRC)
zh-SG   Chinese (Simplified, Singapore)
zh-TW   Chinese (Traditional, Taiwan)
zh-HK   Chinese (Traditional, Hong Kong S.A.R.)
zh-MO   Chinese (Traditional, Macao S.A.R.)

We see that zh-SG also uses Simplified Chinese, and all the rest use Traditional
Chinese.

It would be very easy to support all the Chinese locales:

  1. Add support for a locale which uses Traditional Chinese, say zh-TW.
  2. The other Traditional Chinese locales (zh-HK, zh-MO) can then simply inherit
    MESSAGES from the zh-TW locale.
  3. The same applies to zh-SG, which can just inherit from zh-CN.
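
As a rough sketch of the three steps above, the inheritance can be written as a fallback table. The names PARENT and inheritance_chain are hypothetical and are not part of Nikola; they only illustrate which locale would supply MESSAGES for which.

```python
# Hypothetical fallback table (not part of Nikola): each locale maps to
# the locale it would inherit MESSAGES from.
PARENT = {
    "zh-HK": "zh-TW",  # Traditional Chinese locales inherit from zh-TW
    "zh-MO": "zh-TW",
    "zh-SG": "zh-CN",  # Simplified Chinese locales inherit from zh-CN
}


def inheritance_chain(locale):
    """Return the locale followed by every locale it inherits from."""
    chain = [locale]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain
```

Under this sketch, zh-TW and zh-CN have no parent, so each needs its own complete translation.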

A few things to note:

  1. "Many characters were left untouched by simplification, and are thus identical
    between the traditional and simplified Chinese orthographies."
    (http://en.wikipedia.org/wiki/Simplified_Chinese_characters). In fact, many of
    the characters in messages_zh_cn.py, for example

    "Read in English": "中文版",

    are Traditional Chinese characters. So potentially, all Chinese locale MESSAGES
    can inherit from a base Traditional Chinese file and be simplified as needed.

  2. That said, the words used to describe something can still differ between cultures
    (kind of like how Americans say "trash" and Australians say "rubbish").

    So for example, "source [code]" is "源代码" in Mainland China and "原始碼"
    everywhere else (Singapore uses "原始码"). Note that the difference between
    the words goes beyond mere simplification. While "码" is a simplification of
    "碼", "源" / "原" and "代" / "始" are completely different words. This example
    also illustrates how characters can be identical in traditional and simplified
    orthographies: Singapore uses the simplified version of "原始碼", but there is
    no simplification for the words "原" and "始", so they are the same as in the
    traditional version.

For now, I propose that we simply do the 3 steps above and refine later
on as needed, since "special cases aren't special enough to break the rules."

To illustrate, messages_zh_sg.py would look something like this:

from messages_zh_cn import MESSAGES
from copy import deepcopy
MESSAGES = deepcopy(MESSAGES)

and the future refinements would look something like this:

# -*- encoding:utf-8 -*-
from __future__ import unicode_literals
from messages_zh_cn import MESSAGES
from copy import deepcopy
MESSAGES = deepcopy(MESSAGES)

MESSAGES["Source"] = "原始码"
# ...etc

Please let me know if the logic is sound and I can work on this and submit a PR.

@ralsina
Member

ralsina commented Mar 18, 2015

We are doing all the translation work in transifex. Just request a team
for whatever new locale you want and that's all, no code changes should
be needed.

@Kwpolska
Member

I am also 👎 for this entire idea. It does not make any sense to do all this. Do you speak one of the 5 missing languages? Then just request a team on Transifex and get translating, it’s just a few strings. And you can start with the zh_CN translations and modify them appropriately. Playing with inheritance does not make any sense, especially for a small project like Nikola.

Also: your code examples completely ignore the fact that dicts are mutable.
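
To illustrate the mutability point, here is a minimal sketch. ZH_CN_MESSAGES is a hypothetical two-entry stand-in for the dict in messages_zh_cn.py, not its real contents; the values are taken from the examples in this thread.

```python
from copy import deepcopy

# Hypothetical stand-in for messages_zh_cn.MESSAGES (assumed contents).
ZH_CN_MESSAGES = {"Source": "源代码", "Read in English": "中文版"}

# Binding another name to the dict does not copy it, so an override
# written through the new name also changes the zh-CN entry.
aliased = ZH_CN_MESSAGES
aliased["Source"] = "原始码"
leaked = ZH_CN_MESSAGES["Source"]

# Restore the parent, then copy before overriding: the parent stays intact.
ZH_CN_MESSAGES["Source"] = "源代码"
copied = deepcopy(ZH_CN_MESSAGES)
copied["Source"] = "原始码"
```

For a flat dict of strings, a shallow copy such as dict(ZH_CN_MESSAGES) would be enough; deepcopy only matters if the values are themselves mutable.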

@ltiao
Author

ltiao commented Mar 18, 2015

Thanks for the replies. I'm relatively new to the codebase and didn't realize you were using Transifex. Once @ralsina pointed this out, I understood that all of this is completely unnecessary. I'm happy for someone to go ahead and close this issue.
