
Add Localization #1453

Open
di opened this Issue Sep 8, 2016 · 6 comments

4 participants
@di
Member

di commented Sep 8, 2016

An initial attempt at localization was removed in #1335, but eventually we may want to bring translation back to make PyPI more accessible to a wider audience.

Previously we used http://l20n.org/, but this may or may not be the best tool for the job.

Additional tools that might be worth exploring:

@brainwane

Member

brainwane commented Dec 17, 2017

I am very much in favor of PyPI being localized, but since legacy PyPI did not support localization, I need to make the hard call and say that it's not on our critical path for launching Warehouse.

@brainwane

Member

brainwane commented Feb 13, 2018

Per Heidi Waterhouse's overview and appreciating the discussion in #402, there are some prerequisite steps we ought to do even if we don't have time right now to gather volunteer translations via Transifex or TranslateWiki or fully implement a localization tool like l20n.

  • Figure out how to use Pyramid's existing localization/internationalization support so we can avoid hard-coding messages in our templates. For instance, warehouse/templates/includes/edit-project-button.html has the plain hard-coded English message "Edit Project", rather than something like (I don't know the syntax) {{ edit-project-msg }}, which would have the right message interpolated for the user's locale; at launch, we'd default that locale to en.
  • Confirm extended character support so people's names can include, e.g., Chinese characters.
  • Check our screens for fixed-width elements.
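As a minimal sketch of the first bullet, here is the standard-library `gettext` version of routing a template string through a message lookup instead of hard-coding it. This uses Python's own `gettext` module, not Pyramid's or pyramid-jinja2's API, and the `warehouse` domain name and the simplified markup are assumptions for illustration. Because no compiled catalogs exist yet, `fallback=True` returns a `NullTranslations` object that hands back each message ID unchanged, so marking strings now would still render English by default.

```python
import gettext
import tempfile


def render_edit_button(translations: gettext.NullTranslations) -> str:
    """Render a simplified, hypothetical version of the edit-project button."""
    _ = translations.gettext
    # "Edit Project" is a message ID looked up at render time, not a hard-coded label.
    return f"<a>{_('Edit Project')}</a>"


with tempfile.TemporaryDirectory() as localedir:
    # The directory is empty, so no .mo catalog is found for "fr"; with
    # fallback=True, gettext.translation() returns NullTranslations instead of raising.
    translations = gettext.translation(
        "warehouse", localedir=localedir, languages=["fr"], fallback=True
    )
    print(render_edit_button(translations))  # <a>Edit Project</a>
```

Once real catalogs exist, only the catalog lookup changes; the template code stays the same.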
@di

Member

di commented Feb 13, 2018

Figure out how to use Pyramid's existing localization/internationalization support so we can avoid hard-coding messages in our templates.

The actual marking of strings for translation can be done pretty easily with pyramid.i18n, which provides a request.localizer.translate function, and pyramid-jinja2, which provides the same function for our Jinja2 templates.
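The split described here (mark a string once, translate it per request) can be sketched with the standard library alone. This is not Pyramid's implementation. `N_` is the classic no-op gettext marker. `DictTranslations` is a hypothetical in-memory catalog standing in for compiled `.mo` files. `localize()` plays roughly the role that `request.localizer.translate` plays in a Pyramid view, picking the catalog for the request's locale. The French string is illustrative.

```python
import gettext


class DictTranslations(gettext.NullTranslations):
    """Hypothetical in-memory catalog; a real app would load compiled .mo files."""

    def __init__(self, messages: dict[str, str]):
        super().__init__()
        self._messages = messages

    def gettext(self, message: str) -> str:
        # Unknown message IDs fall back to the (English) ID itself.
        return self._messages.get(message, message)


def N_(message: str) -> str:
    """No-op marker: flags a string for extraction without translating it yet."""
    return message


# Marked at import time, before any request (and therefore any locale) exists.
EDIT_PROJECT = N_("Edit Project")

CATALOGS = {"fr": DictTranslations({"Edit Project": "Modifier le projet"})}


def localize(message: str, locale_name: str) -> str:
    """Translate a marked message for one request's locale, defaulting to English."""
    translations = CATALOGS.get(locale_name, gettext.NullTranslations())
    return translations.gettext(message)


print(localize(EDIT_PROJECT, "fr"))  # Modifier le projet
print(localize(EDIT_PROJECT, "en"))  # Edit Project
```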

The marked strings can then be extracted and compiled for translation with Babel.
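Babel's `pybabel extract` does this scanning for real, and `pybabel compile` turns translated `.po` files into `.mo` catalogs. To show roughly what extraction involves, here is a stdlib-only sketch that walks a Python module's syntax tree and collects the string literals passed to marker functions. The function names and the `_`/`N_` keywords are assumptions. Unlike Babel, it ignores templates, plurals, and translator comments.

```python
import ast
from collections.abc import Iterator


def extract_messages(source: str, keywords=("_", "N_")) -> Iterator[tuple[int, str]]:
    """Yield (line number, message ID) for calls like _("...") with a literal argument."""
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id in keywords
            and node.args
            and isinstance(node.args[0], ast.Constant)
            and isinstance(node.args[0].value, str)
        ):
            yield node.lineno, node.args[0].value


def collect_catalog(source: str) -> dict[str, list[int]]:
    """Group each message ID with every line that uses it, like a .pot file's #: references."""
    catalog: dict[str, list[int]] = {}
    for lineno, msgid in sorted(extract_messages(source)):
        catalog.setdefault(msgid, []).append(lineno)
    return catalog


SOURCE = '''
title = _("Edit Project")
label = _("Delete")
button = N_("Edit Project")
other = print("not marked, so not extracted")
'''

print(collect_catalog(SOURCE))  # {'Edit Project': [2, 4], 'Delete': [3]}
```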

The hard parts here as I see it are:

  • identifying all the strings which require localization
  • updating them (and all the tests that this will break, which will be a lot)
  • figuring out how this will affect caching
  • actually getting the strings translated
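For the first hard part, a rough first pass could flag visible text in templates that isn't wrapped in a Jinja2 expression. This is a heuristic sketch built on the stdlib `html.parser`, not an existing Warehouse tool. It ignores attribute values such as `title` or `alt` text. It skips `<script>`/`<style>` bodies. It treats anything inside `{{ ... }}`, `{% ... %}`, `{# ... #}`, or a `{% trans %}...{% endtrans %}` block as already handled.

```python
import re
from html.parser import HTMLParser

# Jinja2 regions treated as "already handled"; a trans block is matched first as a whole.
JINJA = re.compile(
    r"\{%-?\s*trans\b.*?endtrans\s*-?%\}|\{\{.*?\}\}|\{%.*?%\}|\{#.*?#\}", re.DOTALL
)


class HardcodedTextFinder(HTMLParser):
    """Collect visible text nodes that still contain literal words after removing Jinja2."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.found: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth:
            return
        text = JINJA.sub("", data).strip()
        if re.search(r"[A-Za-z]", text):  # crude: only flags Latin-letter text
            self.found.append(text)


def find_hardcoded_strings(template: str) -> list[str]:
    finder = HardcodedTextFinder()
    finder.feed(template)
    finder.close()
    return finder.found


TEMPLATE = """
<a class="button">Edit Project</a>
<span>{{ _("Delete") }}</span>
<p>Welcome, {{ user.name }}</p>
<p>{% trans %}Help{% endtrans %}</p>
<script>var label = "ignored";</script>
"""

print(find_hardcoded_strings(TEMPLATE))  # ['Edit Project', 'Welcome,']
```

A report like this would only be a starting list; each hit still needs a human to decide how to phrase the message ID.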
@dstufft

Member

dstufft commented Feb 13, 2018

figuring out how this will affect caching

This is a large part of why, when I disabled localization, I ripped it out completely. I also asked a number of people, and the responses I got were mixed, ranging from "absolutely, translate this, it will help non-native English speakers use PyPI" to "it would be nice, but unless you've got the infrastructure to ensure that the translations stay up to date, and that the bulk of the messages stay translated, it'll probably hurt more than it helps" to "don't bother, it's basically impossible to program Python without knowing English anyways".

I am a native English speaker (and speak only English), so I can't really decide between these.

My biggest concern with localization is really just logistics. We're a volunteer project, so we can't pay someone to be ready and able to translate new or changed strings; we have to rely on the community to do it. The likely result is that, at best, we end up with partial translations over time: people help translate a large portion of the text, then lose interest or no longer have the time to contribute, and the translation for that language starts to bitrot. This applies both to new strings (which would need to be translated from scratch) and to modified strings (which would need a second look to ensure the translation still makes sense after the change).
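To make that kind of bitrot visible, each language's catalog could be checked against the current source strings. The sketch below is hypothetical: it uses plain dicts and sets rather than parsing real `.po` files, and the sample strings and translations are invented. It borrows gettext's "fuzzy" notion, where a translation whose source string has changed is kept but flagged for review, which covers the modified-strings case.

```python
from dataclasses import dataclass


@dataclass
class CatalogStatus:
    translated: int
    fuzzy: int
    missing: int

    @property
    def percent_translated(self) -> float:
        total = self.translated + self.fuzzy + self.missing
        return 100.0 * self.translated / total if total else 100.0


def catalog_status(source_msgids, translations, fuzzy=frozenset()):
    """Classify each current source string as translated, fuzzy (needs review), or missing."""
    translated = fuzzy_count = missing = 0
    for msgid in source_msgids:
        if not translations.get(msgid):
            missing += 1  # new string nobody has translated yet
        elif msgid in fuzzy:
            fuzzy_count += 1  # source changed since it was translated
        else:
            translated += 1
    return CatalogStatus(translated, fuzzy_count, missing)


source = ["Edit Project", "Delete", "Log in", "Log out"]
fr = {"Edit Project": "Modifier le projet", "Delete": "Supprimer", "Log in": "Se connecter"}

status = catalog_status(source, fr, fuzzy={"Delete"})
print(status, f"{status.percent_translated:.0f}% translated")
# CatalogStatus(translated=2, fuzzy=1, missing=1) 50% translated
```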

The other issue is just the quality of the translations themselves. From my dabbling in this before, I've noticed that people who know two or more languages are often eager to help out and submit translations, but either their proficiency in one of the languages is lacking, or (since translating takes a different skill set than comprehension or fluency) they end up submitting technically correct but very poor or confusing translations. These translations are next to impossible for me to review, and possibly for anyone else currently on the team (I'm going to guess everyone working on the team is a native English speaker).

So at a minimum, I think if we're going to offer localization, we're going to need someone to take ownership of each language we add. This person would need experience doing localization and a strong background in both English and the target language. By taking ownership, they'd effectively be committing to keeping their language translated with high-quality translations (that doesn't mean they have to do it personally; they could recruit other people, for instance, and that's fine).

Until we have such a person (or persons), I don't think it makes sense to bother worrying about the technical side of making localization possible.

@nlhkabu

Member

nlhkabu commented Feb 14, 2018

I agree with Donald on this one: I think the first step here is to set up the social infrastructure to support i18n. IMO, we should look for an "owner" for each language, someone who could either edit/review translations or write them. Each owner should understand that they are committing to the project, and if they want to walk away, they will need to find someone to hand over to (of course, we can help them with this).

We might even want to recruit someone to manage the whole subject - someone to notify each language owner when new translations are needed, to help recruit translators and generally ensure that we don't end up with translation rot.

I am currently working with a translations specialist at PeopleDoc (my day job), and one thing that has been strongly emphasised is the importance of providing context to the translators. Basically, our translation specialist has asked that we annotate each string in the template to describe where it is and what it is doing. This helps translators answer questions such as: is this a verb, an adjective, or a noun? A good example is the word "complete", which could be either an action or a status. I think doing this would address the vast majority of the quality concerns Donald raised earlier.
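In gettext-based tooling, one way to carry that context is a message context (`msgctxt` in `.po` files), which lets the same English word get different translations. Python 3.8+ exposes it as `pgettext(context, message)`. Free-form notes for translators can also be extracted from source comments (e.g. with `pybabel extract --add-comments=TAG`). The sketch below uses a hypothetical in-memory catalog in place of compiled `.mo` files; the context labels and French strings are illustrative.

```python
import gettext


class ContextCatalog(gettext.NullTranslations):
    """Hypothetical in-memory catalog keyed by (context, message ID)."""

    def __init__(self, entries: dict[tuple[str, str], str]):
        super().__init__()
        self._entries = entries

    def pgettext(self, context: str, message: str) -> str:
        # Same English word, different translation depending on the context label.
        return self._entries.get((context, message), message)


fr = ContextCatalog({
    ("button label", "Complete"): "Terminer",    # action: finish something
    ("release status", "Complete"): "Terminé",   # status: it is finished
})

print(fr.pgettext("button label", "Complete"))    # Terminer
print(fr.pgettext("release status", "Complete"))  # Terminé
print(fr.pgettext("release status", "Pending"))   # Pending (no entry, falls back to the ID)
```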

If we are going to kick this off, may I suggest French as the first candidate? I work for a French company and have good connections with the French Python community, so it could be a good starting point. May I also suggest we leave RTL languages for last? We should be able to convert our CSS to RTL thanks to https://www.npmjs.com/package/postcss-rtl, but I'd rather tackle this down the line.

@brainwane

Member

brainwane commented May 2, 2018

I think French as a first candidate is wise!

RTL as a later standalone project makes some sense to me -- perhaps under the Google Summer of Code umbrella (I'm influenced by Moriel Schottlender here).

I'll note here, for future readers, that Warehouse is seeking grants or other sponsorship-type funding to work on researching and implementing l10n (as noted on distutils-sig).

(Another note: came across https://medium.com/@thejameskyle/the-language-of-programming-7983b8f6910d which recommends Crowdin.)
