Specification for i18n #38

martinheidegger · 2016-10-18T01:38:24Z

Right now, READMEs on GitHub are not internationalised, even though they could be. E.g., README.ja.md could show the README in Japanese. I think it would be a step in the right direction to specify this format as well.

sotayamashita · 2016-11-18T01:48:59Z

@martinheidegger I agree, so we need to decide which language code to use.

martinheidegger · 2016-11-18T01:52:52Z

I tend towards ISO 639-1 (even though ISO 639-3 would be more complete) because a README spec should try to find a balance between effort and usefulness. I think any person who can read "Cantonese" can read "Chinese" (at least to my knowledge).

sotayamashita · 2016-11-18T08:50:42Z

@RichardLitt Please add the discussion label.

RichardLitt · 2016-11-18T13:11:38Z

Done. Thanks @sotayamashita.

I agree about this issue; thank you so much for making it!

Can you point me to some repos which already do translations? I would love to see how others have done it, already, before deciding on a new standard.

Why wouldn't you lean towards ISO 639-3? What would be the downsides?

martinheidegger · 2016-11-18T13:28:12Z

Downsides to ISO 639-3:

Uncommon in usage

People usually know ISO 639-1 (en, ja, etc.) but have not heard of the more extended forms. Mistakes and irritation are to be expected.

Usefulness questionable

It makes sense to translate content into more than one language because the major languages all have millions of speakers. Several of the ISO 639-3 languages are dialects spoken in addition to one of the ISO 639-1 languages. Those languages provide little or no value.

Growth of Maintenance cost

Maintaining translations is a pain in the ass. A restriction to 184 languages at least does a little to keep the translations from growing overboard. (tbh, I wonder if it wouldn't make sense to restrict the list to languages spoken by 50 million people or more: https://en.wikipedia.org/wiki/List_of_languages_by_number_of_native_speakers - 25 languages would qualify)

RichardLitt · 2016-11-18T20:50:26Z

Those are fair points. I think limiting to 184 languages is fine.

Would it be possible to specify both language codes? Is there a self-describing language code? As in, can it easily be made clear that we are using ISO 639-1 as opposed to ISO 639-3?

martinheidegger · 2016-11-18T23:57:17Z

ISO 639-1 codes are two-letter codes; ISO 639-3 codes are three-letter codes. Usually, specifying a "two-letter code" implies that you use ISO 639-1.

I am not quite sure if this is what you asked but: it is possible to support either language code:

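// Dispatch on length: two letters means ISO 639-1, three means ISO 639-3.
// iso639_1Check and iso639_3Check are placeholder validators.
if (characterCodePart.length === 2) {
  iso639_1Check(characterCodePart);
} else if (characterCodePart.length === 3) {
  iso639_3Check(characterCodePart);
} else {
  throw new Error(`Unrecognized language code: ${characterCodePart}`);
}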

RichardLitt · 2016-11-19T05:26:21Z

Ah. So, it's very easy to tell the difference, then. In that case, why not add something saying: "Use ISO 639-1 if you can. If you can't, using ISO 639-3 is also valid." Is there a need to eliminate one for the sake of the other?

martinheidegger · 2016-11-19T07:31:49Z

See my reasoning above. I still somehow think the top 25 would be enough.

RichardLitt · 2016-11-19T10:26:33Z

Maybe I am not being clear about where I am confused.

> ...even though they could be. E.g., README.ja.md could show the README in Japanese. I think it would be a step in the right direction to specify this format as well.

What I am seeing is that we add this to the format:

> If you have i18n for your READMEs, the standard is to name your files accordingly: README.ja.md, where ja is the ISO 639-1 code. All 639-1 codes are valid; if your language falls outside of ISO 639-1, then you may use ISO 639-3, which has three letters. For instance, README.ask.md for Askunu. However, if your language has both an ISO 639-1 and an ISO 639-3 code, default to the ISO 639-1 code. So, for instance, README.en.md instead of README.eng.md.

This is what I am considering adding to the spec. For the linter, we can check: if the code is two letters, it must be an ISO 639-1 language; if it is three, then it must be an ISO 639-3 language, as your code suggests. I don't see a feasible way to limit the languages to 25, or why we would even want to.

Does this sound alright? What am I missing from your understanding of intentionally excluding ISO 639-3 languages?
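A minimal sketch of how the linter check for the proposed wording could look. The function name `checkReadmeName` and the three small tables are hypothetical stand-ins; a real linter would load the complete ISO 639-1 and ISO 639-3 lists rather than these samples.

```javascript
// Sample data only (assumption): a real linter would use the full ISO 639 tables.
const ISO_639_1 = new Set(["en", "ja", "de", "zh"]);
const ISO_639_3 = new Set(["eng", "jpn", "deu", "zho", "ask"]);
// ISO 639-3 codes that also have a two-letter ISO 639-1 equivalent.
const ISO_639_3_TO_1 = new Map([
  ["eng", "en"],
  ["jpn", "ja"],
  ["deu", "de"],
  ["zho", "zh"],
]);

// Check a file name against the "README.<code>.md" rule from the proposal.
function checkReadmeName(fileName) {
  const match = /^README(?:\.([a-z]+))?\.md$/.exec(fileName);
  if (!match) return { ok: false, reason: "not a README file name" };

  const code = match[1];
  if (code === undefined) return { ok: true, code: null }; // plain README.md

  if (code.length === 2) {
    return ISO_639_1.has(code)
      ? { ok: true, code }
      : { ok: false, reason: `unknown ISO 639-1 code: ${code}` };
  }
  if (code.length === 3) {
    if (!ISO_639_3.has(code)) {
      return { ok: false, reason: `unknown ISO 639-3 code: ${code}` };
    }
    // Prefer the two-letter code whenever one exists.
    const shorter = ISO_639_3_TO_1.get(code);
    return shorter
      ? { ok: false, reason: `use README.${shorter}.md instead` }
      : { ok: true, code };
  }
  return { ok: false, reason: `unrecognized language code: ${code}` };
}

console.log(checkReadmeName("README.ja.md"));
console.log(checkReadmeName("README.eng.md"));
```

Under these sample tables, README.ask.md passes because Askunu has no two-letter code, while README.eng.md is rejected in favour of README.en.md.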

wooorm · 2016-11-19T10:30:54Z

Maybe this can be more permissive by allowing BCP-47 tags? E.g., de, en-GB, nl-BE, and the like. That would open up regions as well, and it allows both 639-1 and 639-3 (preferring the shortest) too.

RichardLitt · 2016-11-19T10:41:12Z

I'm finding the spec a bit hard to parse. I'm also not sure I want to live in a world where README.en.md and README.en-GB.md are two READMEs I need to keep updated. But it does seem to have the best ratification elsewhere - it's used by a lot of other computing standards [See wiki]. So, I am for that.

"Use the appropriate IETF tag" seems to be a pretty fine thing to say; it allows us to not worry about conforming to one ISO variant over another, and it puts the burden on the translator to know what tag they should use in their README version (which they should already know, anyway).
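As a rough illustration of what "use the appropriate IETF tag" could mean for tooling, the sketch below pulls the tag out of a README file name and canonicalizes it with the standard `Intl.getCanonicalLocales` function. The function name `parseReadmeTag` and its return shape are assumptions made for this example. Note that `Intl.getCanonicalLocales` only checks that a tag is structurally well-formed; it does not confirm that every subtag is registered.

```javascript
// Extract and canonicalize the BCP 47 tag in "README.<tag>.md".
// Returns { valid, tag }; tag is null for a plain README.md or an invalid name.
function parseReadmeTag(fileName) {
  const match = /^README(?:\.(.+))?\.md$/.exec(fileName);
  if (!match) return { valid: false, tag: null };
  if (match[1] === undefined) return { valid: true, tag: null };

  try {
    // Throws RangeError when the tag is not structurally valid BCP 47.
    const [canonical] = Intl.getCanonicalLocales(match[1]);
    return { valid: true, tag: canonical };
  } catch (err) {
    if (err instanceof RangeError) return { valid: false, tag: null };
    throw err;
  }
}

console.log(parseReadmeTag("README.en-gb.md"));
console.log(parseReadmeTag("README.en_UK.md"));
```

Canonicalizing also normalizes case, so README.en-gb.md and README.en-GB.md resolve to the same tag, while an underscore-separated name such as README.en_UK.md is reported as invalid.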

wooorm · 2016-11-19T10:51:55Z

Having BCP47 also allows for different currencies; comma or full-stop as number separators; multiple scripts (Chinese, some Slavic languages, etc)!
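To make the point about separators and scripts concrete, here is a small sketch using the standard `Intl.NumberFormat` and `Intl.Locale` APIs. The helper names `formatNumber` and `describeTag` are made up for this example, and the formatted output depends on the locale data shipped with the JavaScript runtime.

```javascript
// Format one number under the conventions associated with a BCP 47 tag.
function formatNumber(tag, value) {
  return new Intl.NumberFormat(tag).format(value);
}

// Split a tag into the subtags that could distinguish README variants.
function describeTag(tag) {
  const locale = new Intl.Locale(tag);
  return {
    language: locale.language,
    script: locale.script, // undefined when the tag has no script subtag
    region: locale.region, // undefined when the tag has no region subtag
  };
}

console.log(formatNumber("en-GB", 1234.5));
console.log(formatNumber("de-DE", 1234.5));
console.log(describeTag("zh-Hant"));
console.log(describeTag("sr-Latn-RS"));
```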

RichardLitt · 2016-11-19T10:56:38Z

Well, that's sold me. Others?

martinheidegger · 2016-11-19T12:28:41Z

I wrote above this:

> I wonder if it wouldn't make sense to restrict the list to languages spoken by 50 million people or more

Using BCP47 would go straight against this.

Of the following two options:

A

or

B

I am going with B). Godspeed.

RichardLitt · 2016-11-19T12:37:04Z

@martinheidegger I'm really sorry man; I'm still a bit confused why you feel that we need to restrict languages. That's why we keep talking past each other.

What would restricting languages functionally mean? Is this not just about naming the README.md files?

martinheidegger · 2016-11-21T04:31:05Z

Okay, not trying to win the argument here. Just trying to convey my point: At the core there is one question: "Why does GitHub not support internationalization?" And my guess is: because multiple languages would split the community and reduce the effectiveness of open source code.

With this in mind, the counter-question becomes "Why would you even try to support different languages in open source?" The only answer I can come up with is "incompetency" (sort of): not every person in the world speaks/writes/reads English. (See EPI)

By providing different translations we accept that and try to accommodate people who don't speak English, but that is an effort and comes at a cost. To justify that cost, to make it worth it, translators should focus on the largest number of people who cannot deal with English, because more translations mean more effort and less work on the open-source code itself. I don't think that is good to facilitate.

> allows for different currencies; comma or full-stop

To me, this just means that we can put more effort into a place where the effort doesn't help much, if at all.

RichardLitt · 2016-11-21T18:51:58Z

I understand all of that; thank you so much for laying it out clearly.

I agree about the cost, about why GitHub doesn't support i18n, and about focusing on the largest number of people.

I am curious about this possible point: If I speak a language that is uncommon, what is to stop me naming my README using my language code and doing my own work of translating the README? Is there a high cost for people who are not the translators? Because if there isn't, then I don't think it's a bad thing to support i18n for all possible languages - buy-in would be the responsibility of the translators and language communities, not the project. Limiting languages to the top 25 most spoken would be detrimental to their efforts, I think. Do you understand what I am getting at? I may be misinformed! Please let me know if so. I agree that, as a translator, if I spoke three languages, I should focus on the more common one, but I don't see a problem with also translating the other one if I wanted.

Regarding currencies, commas, and the like; those may be important, and I don't see a problem with using a standard that does away with any possible bike-shedding. That's the point of standard-readme in the end, too.

martinheidegger · 2016-11-22T08:35:14Z

> Is there a high cost for people who are not the translators? Because if there isn't, then I don't think it's a bad thing to support i18n for all possible languages.

The problem here, I think, is that you can't really separate the translators' spec from the writers' spec. Also, the translators need to work from some base. So: if you open the spec to all languages, all languages will likely be used.

> buy-in would be the responsibility of the translators and language communities, not the project.

That would suggest translators are not essential members of the project... heresy? 😛

> ... standard that does away with any possible bike-shedding.

Just for a complete picture: it is possible to limit the "25" languages to "25 BCP 47 codes". In other words: "en" could automatically stand for "en-GB".

There is an argument for and against limitation. BCP 47 will likely be a little too loose to be effective but is generally more inviting. Limiting the languages would be a "bolder choice" that might not be favored by people but could result in a nicer infrastructure.
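One way the "limited set of BCP 47 codes" idea could be sketched: accept any well-formed tag, but only when its primary language subtag is on an allowlist. The allowlist contents and the name `isAllowedTag` are purely illustrative, since the thread never settles on an actual list; `Intl.Locale` is the standard API used to read the language subtag.

```javascript
// Hypothetical allowlist of primary language subtags (not from the thread).
const ALLOWED_LANGUAGES = new Set(["en", "ja", "zh", "es", "de"]);

// Accept a BCP 47 tag only if its primary language is on the allowlist.
function isAllowedTag(tag) {
  try {
    // Intl.Locale throws RangeError for structurally invalid tags.
    return ALLOWED_LANGUAGES.has(new Intl.Locale(tag).language);
  } catch (err) {
    if (err instanceof RangeError) return false;
    throw err;
  }
}

console.log(isAllowedTag("en-GB")); // language "en" is on the list
console.log(isAllowedTag("nl-BE")); // language "nl" is not
```

With this shape, regional variants such as en-GB ride along with their base language, so the list limits languages without forbidding region or script subtags.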

RichardLitt · 2016-11-22T10:18:25Z

> so: if you open the spec to all languages, all languages will likely be used.

I don't think this is true. It's up to each project to decide which languages should be used. If I write a spec saying all languages are possible for standard-readme, I won't wake up tomorrow to 7000 translations from all human languages.

> That would suggest translators are not essential members of the project... heresy?

Sure they are. But only if the project has multilingual users or support.

> BCP 47 will likely be a little too loose to be effective but is generally more inviting. Limiting the languages would be a "bolder choice" that might not be favored by people but could result in a nicer infrastructure.

I think that sums it up for me. I am going to go with BCP 47. We can revisit this later if we need to.

RichardLitt added the discussion label Nov 18, 2016

RichardLitt added a commit that referenced this issue Nov 22, 2016

Added i18n note

359b36e

See #38

RichardLitt mentioned this issue Nov 22, 2016

Added i18n note #46

Merged

martinheidegger closed this as completed Nov 22, 2016

RichardLitt mentioned this issue Nov 14, 2017

fix: note SPDX for license #70

Merged
