Move zh-tw and zh-cn locales to zh-hant and zh-hans, respectively #5829

Closed
diox opened this issue Jul 3, 2017 · 13 comments
Labels: component:code_quality, component:i18n, state:stale, triaged

Comments

@diox
Member

diox commented Jul 3, 2017

Recent versions of Django moved the zh-tw and zh-cn locales to zh-hant and zh-hans. There is some backwards-compatibility code, but it will go away eventually and we'll need to handle that.

I had started doing that in #5811 but it requires more thought so I reverted that commit and moved the discussion here. One of the issues is that if we change the locales like this, we need to add redirects from the old locales and migrate data in translated fields.

This blocks #5271 (See #5829 (comment))
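
The redirect half of that migration could be sketched roughly as follows. This is only an illustration, not code from this project: `LEGACY_LOCALES` and `redirect_target` are hypothetical names, and it assumes the locale is the first path segment of the URL.

```python
# Hypothetical mapping from the legacy locale codes to the new ones.
LEGACY_LOCALES = {"zh-cn": "zh-hans", "zh-tw": "zh-hant"}


def redirect_target(path):
    """Return the new path for a legacy-locale URL, or None if no redirect applies."""
    # "/zh-CN/firefox/" -> ["", "zh-CN", "firefox/"]
    parts = path.split("/", 2)
    if len(parts) < 2 or parts[0] != "":
        return None  # not an absolute path with a leading locale segment
    new_locale = LEGACY_LOCALES.get(parts[1].lower())
    if new_locale is None:
        return None  # locale is not one of the renamed ones
    rest = parts[2] if len(parts) > 2 else ""
    return "/" + new_locale + "/" + rest


print(redirect_target("/zh-CN/firefox/"))
```

Migrating the data stored in translated fields is a separate step that this sketch does not cover.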

@eviljeff
Member

tbh, this change (in django) makes me uneasy. zh-CN is a locale code - Chinese understood in mainland China, whereas zh-hans is a language+script code - Chinese written in Simplified Chinese (not just understood in mainland China). It's converting one thing to something different - although it's true that zh-CN has typically been misused to mean Simplified Chinese anyway.

@diox
Member Author

diox commented Aug 30, 2017

See https://code.djangoproject.com/ticket/18419 for the discussion in Django. It looks like zh-CN was indeed considered to be Simplified Chinese and that's why they changed it.

In any case, I'm not sure we'll be able to avoid the change, unless we keep a copy of (or symlink) the old locale files for the strings translated in Django itself and ignore the warning.

@yookoala

yookoala commented Aug 30, 2017

In practice, zh-CN and zh-SG are subsets of zh-Hans; zh-TW and zh-HK are subsets of zh-Hant. So zh-CN, if it cannot be resolved directly, should be resolved as zh-Hans.

The differences within each set are usually vocabulary and translation customs (much like the difference between en_US and en_GB). But the script used is common within each set.
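
A minimal sketch of that resolution order, assuming a hypothetical `SCRIPT_FALLBACK` table and `resolve_locale` helper (neither comes from Django or this project): try the regional code first, then fall back to the script-level code.

```python
# Hypothetical table: regional Chinese locales and the script-level locale
# they belong to, as described in the comment above.
SCRIPT_FALLBACK = {
    "zh-cn": "zh-hans",
    "zh-sg": "zh-hans",
    "zh-tw": "zh-hant",
    "zh-hk": "zh-hant",
}


def resolve_locale(requested, available):
    """Pick the best available locale for `requested`, or None if there is none."""
    # Compare case-insensitively but return the spelling used in `available`.
    by_lower = {code.lower(): code for code in available}
    requested = requested.lower()
    if requested in by_lower:
        return by_lower[requested]  # exact match wins
    fallback = SCRIPT_FALLBACK.get(requested)
    if fallback in by_lower:
        return by_lower[fallback]  # script-level fallback
    return None


print(resolve_locale("zh-CN", ["zh-Hans", "zh-Hant"]))
```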

@diox
Member Author

diox commented Jan 10, 2018

Re-reading the django code I don't think it's a blocker for us... yet. When django removes the backwards-compatibility code, we'll be in trouble, but at the moment it will still work - and produce a deprecation warning.

@yookoala

yookoala commented Jan 11, 2018

According to the discussion in the ticket, there should be a fallback for zh-tw, zh-hk to zh-Hant (and zh-cn, zh-sg to zh-Hans). This would be the ideal behaviour given the situation.

The pull request django/django#1868 seems to have implemented the idea and was merged years ago. But the behaviour of https://addons.mozilla.org/ does not follow the Django default. So did something go wrong somewhere in between?

@diox
Member Author

diox commented Jan 11, 2018

We don't follow the default because Mozilla products don't - the source of truth for our languages is currently https://product-details.mozilla.org/1.0/languages.json and it doesn't support zh-hant/zh-hans.

Right now, everything works because of the fallback code. I created this issue because in more recent versions of Django, the fallback code emits a deprecation warning indicating that it might go away in the future. However, as I said in my previous comment, this is not necessarily a blocker for upgrading to Django 1.11: we can live with the deprecation warning as long as the fallback code is still there.
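
To keep track of where such a warning is still being triggered, something along these lines could be used in a test. It is a sketch only: `normalize_locale` is a hypothetical stand-in for a fallback that works but warns, not Django's actual code, and only the standard `warnings` module is assumed.

```python
import warnings

# Hypothetical legacy-to-new mapping used by the stand-in below.
LEGACY = {"zh-cn": "zh-hans", "zh-tw": "zh-hant"}


def normalize_locale(code):
    """Stand-in for a fallback that still works but emits a DeprecationWarning."""
    lowered = code.lower()
    if lowered in LEGACY:
        warnings.warn(
            f"{code} is deprecated, use {LEGACY[lowered]}",
            DeprecationWarning,
            stacklevel=2,
        )
        return LEGACY[lowered]
    return lowered


def uses_deprecated_fallback(code):
    """Return True if normalizing `code` goes through the deprecated path."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # make sure nothing is filtered out
        normalize_locale(code)
    return any(issubclass(w.category, DeprecationWarning) for w in caught)


print(uses_deprecated_fallback("zh-CN"))
```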

@muffinresearch
Contributor

This may also be impacted by https://github.com/mozilla-services/cloudops-deployment/issues/1404

@eviljeff
Member

eviljeff commented Jun 8, 2018

blocked by mozilla-services/cloudops-deployment#1404 pretty much

@eviljeff
Member

not blocked by mozilla-services/cloudops-deployment#1404 any more. Looks like the fallback was removed in Django 1.9 @diox 🐼

@stale

stale bot commented Mar 23, 2020

This issue has been automatically marked as stale because it has not had recent activity. If you think this bug should stay open, please comment on the issue with further details. Thank you for your contributions.

@stale stale bot added the state:stale label Mar 23, 2020
@yookoala

I don't think there is any improvement on the situation. Please keep this issue open.

@stale stale bot removed the state:stale label Mar 24, 2020
@xslidian

xslidian commented Apr 6, 2020

zh-Hans & zh-Hant are preferred, not only because they are the most widely used/accepted forms, but also because they could cover rare cases like zh-MY (→ undefined when only zh-CN/TW are available) or zh-US-x-LAX (LA Chinatown dialects; → undefined in zh-CN/TW).

When the language isn't specified for zh, zh-cmn is implied, because Mandarin Chinese grammar is the most compatible with the modern written Chinese used since the 1920s, known as Baihuawen.
When the form isn't specified, the locale's default form is selected, which is usually the form the local authorities write official documents in.
So we have the following implied mappings:

| Locale | Best-guess language tag |
| --- | --- |
| zh-CN | zh-cmn-Hans-CN |
| zh-HK | zh-cmn-Hant-HK |
| zh-MO | zh-cmn-Hant-MO |
| zh-TW | zh-cmn-Hant-TW |
| zh-SG | zh-cmn-Hans-SG |
| zh-MY | zh-cmn-Hans-MY |

You can find such default mapping rules (e.g. und_HK → zh_Hant_HK) in the Likely Subtags section of the Unicode CLDR Charts.
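
As a rough illustration, the mappings listed above can be turned into a small lookup. The `LIKELY_TAGS` dictionary below is hand-copied from this comment rather than loaded from CLDR data, and `best_guess_tag` is a hypothetical helper name.

```python
# Hand-copied from the mapping above; real code would load CLDR's
# likely-subtags data instead of hard-coding it.
LIKELY_TAGS = {
    "zh-CN": "zh-cmn-Hans-CN",
    "zh-HK": "zh-cmn-Hant-HK",
    "zh-MO": "zh-cmn-Hant-MO",
    "zh-TW": "zh-cmn-Hant-TW",
    "zh-SG": "zh-cmn-Hans-SG",
    "zh-MY": "zh-cmn-Hans-MY",
}


def best_guess_tag(locale):
    """Expand a language-region locale to its best-guess full tag, or None."""
    # Accept both "zh_HK" and "zh-hk" spellings.
    lang, _, region = locale.replace("_", "-").partition("-")
    key = f"{lang.lower()}-{region.upper()}"
    return LIKELY_TAGS.get(key)


print(best_guess_tag("zh-hk"))
```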


Take Hong Kong for example: the government writes in Mandarin Chinese (cmn) in the traditional form (Hant), so zh-cmn-Hant-HK should be preferred if the user selects zh-HK as the preferred locale but doesn't specify a particular language (we don't know if she's capable of reading the written form of a more real-life language).

However, Cantonese (zh-yue) is much more widely used in daily life and in newspapers.
So if a user explicitly selects zh-yue as the language and has a HK locale, and if zh-yue-Hant-HK isn't available, we should fall back to zh-yue-Hant-CN if possible, then to zh-cmn-Hant-HK/zh-cmn-Hant-TW.
(Yes, books published in mainland China are allowed to use zh-Hant :P)
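
That fallback order could be expressed as an ordered chain. This is only a sketch: `FALLBACK_CHAINS` and `pick_translation` are hypothetical names, and the single chain shown is the one described above.

```python
# Ordered fallback chain for the Cantonese/Hong Kong case described above.
FALLBACK_CHAINS = {
    "zh-yue-Hant-HK": [
        "zh-yue-Hant-HK",
        "zh-yue-Hant-CN",
        "zh-cmn-Hant-HK",
        "zh-cmn-Hant-TW",
    ],
}


def pick_translation(requested, available):
    """Return the first tag in the chain for `requested` that is available."""
    # Tags without a known chain only match themselves.
    chain = FALLBACK_CHAINS.get(requested, [requested])
    for tag in chain:
        if tag in available:
            return tag
    return None


print(pick_translation("zh-yue-Hant-HK", {"zh-cmn-Hant-TW", "zh-cmn-Hant-HK"}))
```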

By using the Likely Subtags data, you can get rid of chaos without having to learn the development history of the Chinese macro-language.

@stale

stale bot commented Oct 3, 2020

This issue has been automatically marked as stale because it has not had recent activity. If you think this bug should stay open, please comment on the issue with further details. Thank you for your contributions.

@stale stale bot added the state:stale label Oct 3, 2020