Specification for i18n #38
Comments
@martinheidegger I agree with it, so we need to decide which language code to use. |
@RichardLitt Please add the |
Done. Thanks @sotayamashita. I agree about this issue; thank you so much for making it! Can you point me to some repos which already do translations? I would love to see how others have done it, already, before deciding on a new standard. Why wouldn't you lean towards ISO 639-3? What would be the downsides? |
Downsides to ISO 639-3:
Uncommon in usage: Usually people know ISO 639-1 (en, ja, etc.); they usually have not heard of the more extended forms. Mistakes and irritation are to be expected.
Usefulness questionable: It makes sense to translate content into more than one language because the major languages all have millions of people who speak them. Several of the ISO 639-3 languages are dialects spoken in addition to one of the ISO 639-1 languages. Those languages provide little or no value.
Growth of maintenance cost: Maintaining translations is a pain in the ass. A restriction to 184 languages at least does a little to keep the translations from growing overboard. (tbh. I wonder if it wouldn't make sense to restrict the list to languages spoken by 50 million people or more: https://en.wikipedia.org/wiki/List_of_languages_by_number_of_native_speakers - 25 languages would qualify) |
Those are fair points. I think limiting to 184 languages is fine. Would it be possible to specify both language codes? Is there a self-describing language code - as in, can it easily be made clear that we are using ISO 639-1 as opposed to ISO 639-3? |
ISO 639-1 are two letter codes, ISO 639-3 are three-letter codes. Usually specifying "two letter code" implies that you use ISO 639-1. I am not quite sure if this is what you asked but: it is possible to support either language code:
|
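The length-based distinction described above can be sketched as follows. This is a minimal illustration, not the snippet from the original comment; the function name `classify_language_code` and its return labels are hypothetical, and the check only looks at the shape of the code, not at whether it is actually registered.

```python
def classify_language_code(code):
    """Guess which ISO 639 part a code belongs to from its shape alone.

    Hypothetical helper: it does not verify that the code is registered.
    """
    # Accept only lowercase ASCII letters, as in "ja" or "jpn".
    if not (code.isascii() and code.isalpha() and code.islower()):
        return "invalid"
    if len(code) == 2:
        return "iso639-1"  # two-letter code
    if len(code) == 3:
        return "iso639-3"  # three-letter code
    return "invalid"
```

Note that a three-letter code could also be an ISO 639-2 code, so a real check would still need to look the code up in the relevant registry.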
Ah. So, it's very easy to tell the difference, then. In that case, why not add something saying: "Use ISO 639-1 if you can. If you can't, using ISO 639-3 is also valid." Is there a need to eliminate one for the sake of the other? |
see my reasoning above. i still somehow think the top 25 would be enough |
Maybe I am not being clear about where I am confused.
What I am seeing is that we add this to the format:
This is what I am considering adding to the spec. For the linter, we can check: if the code is two letters, it must be an ISO 639-1 language; if it is three, then it must be an ISO 639-3 one - as your code suggests. I don't see a feasible way to limit the languages to 25, or why we would even want to. Does this sound alright? What am I missing from your understanding of intentionally excluding ISO 639-3 languages? |
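A rough sketch of the linter check proposed here, assuming the `README.<code>.md` naming from the issue description. Everything in it is hypothetical: the function name `lint_readme_name`, the error messages, and the two tiny sample sets, which stand in for the full ISO 639-1 and ISO 639-3 code lists a real linter would load.

```python
import re

# Tiny illustrative samples only; a real linter would load the full lists.
ISO_639_1_SAMPLE = {"en", "ja", "de", "fr", "zh"}
ISO_639_3_SAMPLE = {"eng", "jpn", "deu", "fra", "zho"}

# README.md, or README.<lowercase letters>.md
README_PATTERN = re.compile(r"README(?:\.([a-z]+))?\.md")


def lint_readme_name(filename):
    """Return an error message, or None if the file name is acceptable."""
    match = README_PATTERN.fullmatch(filename)
    if match is None:
        return "not a README file name"
    code = match.group(1)
    if code is None:
        return None  # plain README.md carries no language code
    if len(code) == 2:
        if code in ISO_639_1_SAMPLE:
            return None
        return f"unknown ISO 639-1 code: {code}"
    if len(code) == 3:
        if code in ISO_639_3_SAMPLE:
            return None
        return f"unknown ISO 639-3 code: {code}"
    return f"language code must be two or three letters: {code}"
```

With these samples, `README.ja.md` and `README.jpn.md` both pass, while `README.xx.md` is reported as an unknown two-letter code.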
Maybe this can be more permissive by allowing BCP-47 tags? E.g., |
I'm finding the spec a bit hard to parse. I'm also not sure I want to live in a world where Use the appropriate IETF tag seems to be a pretty fine thing to say; it allows us to not worry about conforming to one ISO variant over another, and it puts the burden on the translator to know what tag they should use in their README version (which they should already know, anyway). |
Having BCP47 also allows for different currencies; comma or full-stop as number separators; multiple scripts (Chinese, some Slavic languages, etc)! |
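To make the language/script/region point concrete, here is a minimal sketch that splits a tag into those three subtags. It deliberately covers only a simplified subset of BCP 47 (no variants, extensions, private-use or grandfathered tags), and the names `BCP47_SIMPLE` and `parse_tag` are made up for this example.

```python
import re

# Simplified subset of BCP 47: language[-Script][-REGION]
BCP47_SIMPLE = re.compile(
    r"(?P<language>[A-Za-z]{2,3})"
    r"(?:-(?P<script>[A-Za-z]{4}))?"
    r"(?:-(?P<region>[A-Za-z]{2}|[0-9]{3}))?"
)


def parse_tag(tag):
    """Split a tag into language, script and region; None if it does not fit."""
    match = BCP47_SIMPLE.fullmatch(tag)
    if match is None:
        return None
    parts = match.groupdict()
    return {
        # Conventional casing: language lower, Script title, REGION upper.
        "language": parts["language"].lower(),
        "script": parts["script"].title() if parts["script"] else None,
        "region": parts["region"].upper() if parts["region"] else None,
    }
```

For example, `parse_tag("zh-Hant-TW")` separates the language (`zh`), the script (`Hant`) and the region (`TW`), while a plain `ja` yields only a language.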
Well, that's sold me. Others? |
@martinheidegger I'm really sorry man; I'm still a bit confused why you feel that we need to restrict languages. That's why we keep talking past each other. What would restricting languages functionally mean? Is this not just about naming the README.md files? |
Okay, not trying to win the argument here. Just trying to convey my point: At the core there is one question: "Why does GitHub not support internationalization?" And my guess on this is: because multiple languages would split the community and reduce the effectiveness of open source code. With this in mind, the counter question becomes "Why would you even try to support different languages in open source?". The only answer I can come up with is: "incompetency" (sort of): Not every person in the world speaks/writes/reads English. (See EPI) By providing different translations we accept that and try to accommodate people who don't speak English, but that is an effort and comes at a cost. To justify that cost, to make it worth it, translators should focus on the largest number of people who cannot deal with English. Because more translations mean more effort and less work on the open-source code itself. Which I think is not good to facilitate.
To me this just means that we would be putting more effort into a place where the effort doesn't help much, if at all. |
I understand all of that; thank you so much for laying it out clearly. I agree about the cost, about why GitHub doesn't support i18n, and about focusing on the largest number of people. I am curious about this possible point: If I speak a language that is uncommon, what is to stop me naming my README using my language code and doing my own work of translating the README? Is there a high cost for people who are not the translators? Because if there isn't, then I don't think it's a bad thing to support i18n for all possible languages - buy-in would be the responsibility of the translators and language communities, not the project. Limiting languages to the top 25 most spoken would be detrimental to their efforts, I think. Do you understand what I am getting at? I may be misinformed! Please let me know if so. I agree that, as a translator, if I spoke three languages, I should focus on the more common one, but I don't see a problem with also translating the other one if I wanted. Regarding currencies, commas, and the like: those may be important, and I don't see a problem with using a standard that does away with any possible bike-shedding. That's the point of standard-readme in the end, too. |
The problem here, I think, is that you can't really separate the translators' spec from the writers' spec. Also, the translators need to work from some base. So: if you open the spec to all languages, all languages will likely be used.
That would suggest translators are not essential members of the project... heresy? 😛
Just for a complete picture: It is possible to limit the "25" languages to "25 BCP 47 codes". In other words: "en" could automatically stand for "en-GB". There is an argument for and against limitation. BCP 47 will likely be a little too loose to be effective but is generally more inviting. Limiting the languages would be a "bolder choice" that might not be favored by people but could result in a nicer infrastructure. |
I don't think this is true. It's up to each project to decide which languages should be used. If I write a spec saying all languages are possible for standard-readme, I won't wake up tomorrow to 7000 translations from all human languages.
Sure they are. But only if the project has multilingual users or support.
I think that sums it up for me. I am going to go with BCP 47. We can revisit this later if we need to. |
Right now READMEs on GitHub are not internationalised, even though they could be. E.g.
README.ja.md
could show the README in Japanese. I think it would be a step in the right direction to specify this format as well.
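As a rough illustration of how the `README.<code>.md` naming could be consumed, the sketch below picks the README for a preferred language and falls back to the untranslated file. The function name `pick_readme` and the fallback rule are assumptions made for this example; nothing in the thread specifies how a host such as GitHub would select a file.

```python
def pick_readme(available, preferred_language):
    """Pick the README for a language, falling back to plain README.md.

    `available` is a collection of file names in the repository root.
    Returns None when neither file exists.
    """
    candidate = f"README.{preferred_language}.md"
    if candidate in available:
        return candidate
    if "README.md" in available:
        return "README.md"
    return None
```

For a repository containing `README.md` and `README.ja.md`, a Japanese reader would get `README.ja.md` and everyone else the default file.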