Make a determination about i18n and l10n #402

Closed
dstufft opened this Issue Mar 14, 2015 · 20 comments

@dstufft
Member

dstufft commented Mar 14, 2015

Right now we have strings tagged for translation. Is this actually important for PyPI? Should we plan for an eventual future where we translate PyPI, or can we assume that anyone programming in Python is going to grasp enough English to interact with the PyPI UI?

If we're going to support translating Warehouse, then we need to make sure we have enough mechanisms in place to do that and sort out how it will all work.

@dstufft

Member

dstufft commented Mar 14, 2015

This is something that @jezdez probably has opinions on.

@ncoghlan

Member

ncoghlan commented Mar 18, 2015

Red Hat's http://zanata.org/ team works out of the Brisbane office (same place I work), so I'd be inclined to suggest that as the translation platform if we decide to support translations (and I think we should).

Their (lead?) designer, @lukebrooker is also my main point of reference when it comes to keeping track of what "good" currently looks like from a web design point of view :)

@dstufft

Member

dstufft commented Mar 18, 2015

Right now most (or all?) of our strings are marked for translation, but I don't have a strong opinion on whether we should keep doing that, other than that supporting translations makes the code base slightly less nice, but only slightly, so I don't care much. We don't have any tooling around extracting strings or selecting a language, though. I just don't really know if it's useful. Almost all of the descriptions of projects are in English, although there are some which are not; in particular, there seems to be a decent contingent of descriptions in some Asian-looking language (pictographs, no idea what actual language it is).
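To make "marked for translation" and "selecting a language" concrete, here is a minimal sketch using only Python's standard library. It is not Warehouse's actual code: the `SUPPORTED` tuple and the `pick_language` helper are hypothetical names, and the header parsing is deliberately simplified (it only looks at the primary language subtag).

```python
import gettext

# Hypothetical set of languages the site has catalogs for.
SUPPORTED = ("en", "de", "ja")


def pick_language(accept_language, supported=SUPPORTED, default="en"):
    """Pick the best supported language from an Accept-Language header value."""
    candidates = []
    for part in accept_language.split(","):
        tag, _sep, params = part.strip().partition(";")
        weight = 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                weight = float(params[2:])
            except ValueError:
                weight = 0.0
        # Compare on the primary subtag only, so "de-AT" matches "de".
        primary = tag.strip().lower().split("-")[0]
        if primary:
            candidates.append((weight, primary))
    # Highest weight first; sorted() is stable, so header order breaks ties.
    for weight, primary in sorted(candidates, key=lambda c: -c[0]):
        if weight > 0 and primary in supported:
            return primary
    return default


# "Marking" a string: wrapping it in _() lets extraction tools find it and
# lets the active catalog translate it at runtime. With no catalog installed,
# gettext.gettext returns the original string unchanged.
_ = gettext.gettext
print(_("Search projects"))
print(pick_language("ja;q=0.8, de-AT, en;q=0.5"))
```

The marking itself costs one wrapper call per string, which is the "slightly less nice" part; the language selection is separate tooling that would still have to be built.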

If it is useful and we can find people willing to invest in maintaining translations (my assertion is that wrongly translated or partially translated is worse than not translated) then I think we should have it. I also don't have a strong opinion about what tool it should use and I'm more than happy to defer that decision to whoever actually does the work on translating things. Particularly interesting is how translating would work in software that doesn't get shipped periodically as versioned releases (like an OS or other similar software does) but in one that gets shipped continuously (or near continuously).

I speak English and only English so my own usefulness in this regard is quite limited.

@mktums

mktums commented Mar 22, 2015

As a native Russian who resides in Russia, I can tell you, as a developer, that almost all employers require fluent knowledge of English for reading tech literature (such as documentation, articles, etc.), so for most potential Russian-speaking developers the lack of a translation will not be a problem.

My personal opinion is that (for a Russian translation) it'd be much easier to work with English. But for Japanese or Chinese it would be better to have translations, IMHO.

@dstufft

Member

dstufft commented Mar 22, 2015

Thanks @mktums!

Your comment about Japanese or Chinese is interesting, I've noticed that almost all project descriptions on PyPI seem to be in English, except for a small (but sizable) minority which are using some sort of Asian looking language (I don't know which one(s), and maybe it isn't Asian... I'm seriously not knowledgeable in this area). An example that I just pulled off the front page of PyPI is ufp.

@dstufft

Member

dstufft commented Mar 22, 2015

Looking at the front page some more, maybe I'm just not noticing other languages because I generally just skim these lists, and if they use lettering that looks similar to English they just don't stand out to me. Here's another project, li-pagador-paypal, which appears to be in a non-English language.

@frecar

frecar commented Mar 22, 2015

As a native Norwegian, who lives in Norway, I (along with any other professional developer I know in Norway) prefer to use English. I think there are two main reasons for this:

  1. The documentation will always be better in English (since the core development is in English)
  2. Non-English projects are hard to adopt and will for that reason be unavailable for most developers.

I think it is wise to enforce English for any open-source project.

@mktums

mktums commented Mar 22, 2015

@dstufft it's actually pretty easy to determine which language it is, based on the form of the hieroglyphs ;) In the provided example it's Korean.

Generally, package owners use their own language primarily for packages that are only useful in their region, such as wrappers for local payment services. For example, there are a lot of Russia-only packages that will be interesting only to Russia-based developers.

And I totally agree with @frecar.

@ncoghlan

Member

ncoghlan commented Mar 26, 2015

Note that this issue is about translating the Warehouse UI itself, rather than saying anything about the documentation of uploaded projects. The rationale for doing that is well articulated in the post announcing the Portuguese language version of Stack Overflow: http://blog.stackoverflow.com/2014/01/ola-mundo-announcing-stack-overflow-in-portuguese/

The short version of the rationale is that while English is indeed a requirement for full participation in the global open source community, it shouldn't be a requirement to learn English first before you start learning to use open source software. Rather, it's desirable to achieve an environment where folks can still be learning English at the same time as they're learning to code. This is especially the case as learning to code becomes a standard part of primary school curricula around the world.

There's a GSoC project proposal this year to start enabling translation of exception messages for CPython itself, and this is the main rationale: so folks can still learn to automate their own tasks, without the "learn English" prerequisite involved in becoming a full fledged professional Python programmer in most countries. It's also worth noting my understanding is that China, Japan, Korea, India have large enough local programming communities that English is only a requirement if you want to work for a multinational organisation based in an English speaking country. I expect there are other countries with similarly large local communities, those are just the four where I'm already fairly sure it's the case.

@dstufft

Member

dstufft commented Mar 26, 2015

A key difference between what Warehouse would do and what Stack Overflow did is that while Stack Overflow made a Portuguese language version, it was essentially creating a whole new Stack Exchange site, distinct from all the others, that just happened to be in Portuguese. It has its own content distinct from all of the others, and that content is in Portuguese. This means that someone interacting with that site will work completely in Portuguese and it isn't just a translation of the UI itself.

So while this issue isn't about translating the documentation of the uploaded projects, I do think it is an important thing to note. If a user doesn't speak English well enough to understand an English UI, are they going to understand an English project description? Do we have enough projects where the description is available in a language other than English (and in particular, enough projects that share a language) where a non-English-speaking user can meaningfully benefit from PyPI? On the flip side, we have an incomplete set of data because currently you can only have one project description, so I think it makes sense for that to default to English even if the project wants to be more inclusive towards people who don't speak English. What if we went one further and enabled people to upload multilingual project descriptions, so that if someone was operating Warehouse in a particular language, and the project had a description uploaded in that language as well, we could then localize the project description itself?

I don't feel qualified myself to speak authoritatively on whether or not it's helpful in the general case, since I only speak English myself so it's already in the "right" language for me. However, from the (limited) feedback I've gotten on this issue, on Twitter, and in IRC, it appears that amongst the people I've talked to English is seen as a sort of lingua franca of the programming community: you need to be able to speak, read, and write it to meaningfully participate in the wider programming ecosystem. A number of them have expressed that even if Warehouse was in their native language they'd continue to use it in English, because of the lingua franca nature of English in programming and because it's more likely that the English is going to make sense and be properly worded.

Unfortunately there's going to be a bit of a bias in the answers I'm able to get. Someone who isn't able to communicate in English is unlikely to be following me on twitter, or talking to me in IRC, or reading this issue at all. There could be a sizable number of people out there silently struggling to use PyPI who would be helped by having it translated into their own language who I simply can't reach because I can't speak their language (would they be helped if the UI but not the content was in their language?).

Beyond the question of whether translating Warehouse makes sense at a high level is whether we have the manpower available to actually do it. It was expressed to me that people generally hate getting translated versions of sites because they are often out of date, or the translations are not of a high quality, or they are only partially translated anyway. If we continue to mark strings for translation and we put in the mechanisms that allow people to switch to different languages in Warehouse, are we going to have the manpower available to maintain high quality translations that are kept up to date? This can't be something where someone just comes along once and does a one-shot translation; it will need people who are committed to maintaining the translations on an ongoing basis. It will also need to be people who we can trust to some degree (e.g. they can't be relatively unknown), since they are going to have a lot of control over what words get put on the main site with little ability for me to review them, since I simply can't speak their language and the best I can do is plug it into translate.google.com.

How, too, will this affect our development process? Right now a PR can be completely self-contained: it has all of the new strings it needs (in English) and can be immediately deployed. However, if we have translations, does that mean we have to chase down translators if we adjust the wording or add a new translation string, and get them to submit PRs (or whatever the tooling to translate something is) before we can deploy? Do we deploy a change that's going to make some new string English only until a translator has a chance to translate it? Will that eventually become a liability, where we have languages that we don't have translators for anymore that are just slowly suffering from bitrot, and we'll then need to decide if it's better to remove them or let them continue to bitrot? Will we end up getting pushback from people who have come to rely on a language that we no longer have someone able to maintain?
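For reference, the "new string is English only until a translator gets to it" option is what Python's standard `gettext` machinery does by default: a message missing from a catalog comes back as the original string. A minimal sketch, assuming a toy dict-backed catalog (`DictTranslations` is a hypothetical class for illustration; real catalogs are compiled `.mo` files):

```python
import gettext


class DictTranslations(gettext.NullTranslations):
    """Toy catalog backed by a dict (hypothetical; real catalogs are .mo files)."""

    def __init__(self, messages):
        super().__init__()
        self._messages = messages

    def gettext(self, message):
        if message in self._messages:
            return self._messages[message]
        # NullTranslations.gettext defers to a fallback catalog if one was
        # added, and otherwise returns the original (English) string.
        return super().gettext(message)


german = DictTranslations({"Search projects": "Projekte suchen"})

print(german.gettext("Search projects"))   # already translated
print(german.gettext("Upload a release"))  # new string, not yet translated
```

So a deploy never has to block on translators for technical reasons; the cost is a page that is temporarily part English, which is the partial-translation problem raised above.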

Right now I'm leaning heavily towards stating that Warehouse (and thus PyPI) are English only (though I should probably at least post this issue on distutils-sig as well) with the primary reason being that I'm thoroughly concerned about our ability to attract and retain volunteers willing and able to maintain translations in a short enough timeframe that are also well known enough that we can feel comfortable enough essentially giving them the ability to write whatever words they want in our UI without much oversight. A secondary reason is that I have doubts about the usefulness in general given that the vast bulk of the content on PyPI is English only (which is backed up by the fact I've not yet found any non-native English speaker who says that they think it would be worthwhile to do the translation, admittedly a biased sample of people though).

@tiran

tiran commented Mar 26, 2015

I'd like to bring up another reason why i18n is useful. Some of you have already stated that most engineers and developers are fluent in English. In Germany that's true for almost every professional. Honestly, I can't imagine a professional developer without English skills.

But Python has a much broader audience than just professional coders. Projects like Raspberry Pi are trying to get young children into development. When I was in school I wasn't fluent in English until the age of 15 or 16, after a school trip to the UK. Some school classes have French as the first foreign language. These kids start to learn English at the age of 13! Also, elderly people tend to lack English skills in Germany. I know elderly people who learnt Ancient Greek, Latin and Hebrew at school but no English.

To me it makes sense to translate the new PyPI GUI as well as support i18n for project docs, to increase diversity and welcome people who haven't learnt sufficient English (yet).

@jezdez

jezdez commented Mar 26, 2015

I'm sorry for not taking the time to read this wall of text, the short answer to whether to localize a web site is always yes.

The mere fact that a lot of the people in our circles use English for communication is orthogonal to answering that question. Providing native user interfaces is a matter of reaching out to those that are not familiar with English in day to day work.

In case you're a native English speaker, the closest analogy I can think of is this: imagine having to read every document and website in "Legalese", the language often used in legal documents like terms of service. If you're not a trained legal professional you'll understand some of it, and can probably make a bit of sense of it. But probably not completely, and you're left with a lingering feeling of maybe having missed something. That's how it feels to read something in English if you're not a native speaker.

So if you think Python can afford to stick to English and risk alienating those users that feel intimidated by English, don't consider i18n/l10n. But I would urge you to make the effort and accept the fact that there are more languages than English.

@ionelmc

ionelmc commented Mar 26, 2015

My 2 cents here (and I'm not a native English speaker): if the documentation for the core tools (pip, virtualenv, setuptools) is not translated, there's not much to gain from having a localized interface for PyPI. At the very least http://packaging.python.org/ should be translated.

Users are going to look at the package index and then get baffled when they want to actually install a package.

Sure, it's desirable to translate Warehouse, but you need to look farther if you want to get my grandma to install a package.

@dstufft

Member

dstufft commented Mar 26, 2015

Ok, after taking all the feedback into consideration and talking over it on IRC I'm going to go ahead and say that we're going to require user facing strings be marked for translation and, assuming we can get people to do the translations, the Warehouse (and thus PyPI) UI will be translated into non English languages.

Improvements to the packaging standards to make the descriptions themselves translated is out of scope, I think it's probably not a bad idea, but that'll need to come from the package format side and if the package formats add support for it then Warehouse will implement it.

Improvements to other tooling/projects will have to take place in those projects/tools and I don't think that we need to wait for the other tools to "catch up" to start doing it. Someone has to go first, and since Warehouse is greenfield, it might as well be us.

Thanks again everyone for all your input!

@dstufft dstufft closed this Mar 27, 2015

@Ivoz

Member

Ivoz commented Mar 29, 2015

Just want to note that amazing crowd-sourcing appeared like magic to help a tutorial project that I help run (somewhat small in the scheme of things) at https://www.transifex.com/projects/p/python-for-beginners/ get translated into numerous languages. I can only predict that PyPI, being something much bigger, would easily get the copywriters needed if you gave them the tools to do the work. Transifex will host open-source projects for free and seems to have a large userbase already, but if @ncoghlan thinks Zanata is just as good then I don't mind which we end up going for.

@mktums

mktums commented Mar 29, 2015

@Ivoz I've already used Transifex for helping translate django-rest-framework, and I must say - it's pretty simple to use.

@mattrobenolt

Contributor

mattrobenolt commented Mar 29, 2015

@BYK did all of this for Disqus and we use Transifex as well.

@BYK

BYK commented Mar 29, 2015

I can indeed vouch for Transifex. I'd also be happy to help with anything related to automating translations (the process, not talking about machine translation :))

@dstufft

Member

dstufft commented Mar 29, 2015

One thing I'm interested in hearing about is how people manage translations for a web application. For a more traditional application which you periodically release, I know it's typical to have a "string freeze" towards the end of the life cycle and then have a call for translators to go through and translate the strings, which you then pull into the project before cutting the final release. However, given that a web application doesn't really have periodic releases like that, where you can do a string freeze and wait for translators, how do you manage that? Do you just expect translators to periodically check for new strings? Do you ship things without translations for new strings, or do you expect people to somehow get translations done before accepting a PR?

@alex

Member

alex commented Mar 29, 2015

If having it translated before launching it is important, and you want to do CD (both of which are pretty reasonable), you can just put anything with a new string behind a feature flag, get people to translate, and then drop the flag.
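One way to read that workflow in code, as a rough sketch only: the feature stays behind its flag until every supported language has a translation for each string the feature introduces. The function names and the dict-based catalogs are hypothetical, not part of Warehouse or of any translation platform's API.

```python
def missing_translations(msgids, catalogs):
    """Return {language: sorted list of msgids with no translation yet}."""
    missing = {}
    for lang, catalog in catalogs.items():
        gaps = sorted(m for m in msgids if not catalog.get(m))
        if gaps:
            missing[lang] = gaps
    return missing


def can_drop_flag(msgids, catalogs):
    """The flag can be dropped once no language has untranslated strings."""
    return not missing_translations(msgids, catalogs)


# Hypothetical data: strings a new feature adds, and the current catalogs.
new_strings = ["Two-factor login", "Enter your code"]
catalogs = {
    "de": {"Two-factor login": "Zwei-Faktor-Anmeldung"},
    "ja": {},
}

print(missing_translations(new_strings, catalogs))
print(can_drop_flag(new_strings, catalogs))
```

A check like this could run in CI or on a schedule, so translators get a list of gaps per language and developers get a clear signal for when the flag can go.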
