
Internationalization in studio-frontend

When you need to internationalize a React application, this is the guide for you.

Relevant topics: studio-frontend react react-intl reactifex transifex keyvaluejson localization i18n internationalization icu


Background

To start, I assume you're more-or-less familiar with React. A fair amount of what follows will relate specifically to studio-frontend and edx-platform, but it is my hope that the documentation will allow you to apply a similar system to other independent React applications.

The problem you are now ready to solve is as follows:

How can I ensure my React application is displayed in the end user's chosen language?

In our case, this led to some followup questions:

  • How will I handle pluralization? Does my tooling support that?
  • Why does all of the current edX i18n tooling assume Python? Can we do this entirely in a JavaScript context?
  • Can we ensure we meet the guidelines specified in WCAG 2.0 Success Criterion 3.1.2 (Language of Parts)?

react-intl is a library that provides React components and an API to format dates, numbers, and strings, including pluralization and handling translations, per the project's README. And they have a well-documented wiki, huzzah! This is exactly what we need.

react-intl basics in studio-frontend

Flagging strings for translation

First things first, we can't localize strings for a user unless we know which strings should be localized. The simplest way is to use a FormattedMessage component, which will have your string specified in a defaultMessage prop.

In the case of studio-frontend, we elected to abstract things a bit further and declare our strings-to-localize using the underlying defineMessages API function directly in a separate file as seen here. This allows us to know that all display strings are consistently located at src/components/<component>/displayMessages.jsx, which we feel is easier to work with from a "where is this message defined" perspective.

Note that whichever way you choose to go, react-intl will be converting your display strings into message descriptors, which contain the string itself, a unique identifier, and optionally a description.
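
As a rough illustration of what such a message descriptor file might contain, here is a minimal sketch. The ids, strings, and descriptions are hypothetical and not taken from studio-frontend; in a real displayMessages.jsx the object would be passed to react-intl's defineMessages and exported.

```javascript
// Sketch of a displayMessages.jsx-style file. All ids, strings, and
// descriptions below are hypothetical. In a real file this object would be
// passed to react-intl's defineMessages() and exported.
const messages = {
  saveButtonLabel: {
    id: 'exampleComponentSaveButtonLabel',
    defaultMessage: 'Save',
    description: 'Label for the button that saves the current form',
  },
  resultCount: {
    id: 'exampleComponentResultCount',
    // ICU plural syntax; "#" stands for the formatted count.
    defaultMessage: '{count, plural, one {# result} other {# results}}',
    description: 'Number of search results shown above the results list',
  },
};

// The id is the lookup key for translations later on, so ids must be unique.
const ids = Object.values(messages).map((descriptor) => descriptor.id);
const hasUniqueIds = new Set(ids).size === ids.length;
console.log(hasUniqueIds);
```

Keeping the description next to each string is what lets enforceDescriptions-style checks work later in the pipeline.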

Extracting flagged messages

babel-plugin-react-intl can then be used to extract the messages from your source files. If you, like us, want to require descriptions, set "enforceDescriptions": true in your .babelrc as seen here.

The last step of our message extraction is one of convenience. By default, babel-plugin-react-intl will extract your strings into multiple files, corresponding to the files the messages were declared in. This may be useful when your display strings are declared in FormattedMessage components all over the place, but it adds little value in our situation. To place all the strings in a single file, I created reactifex, whose output looks like this (astute readers will note that descriptions are not preserved - more on this in a bit).
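
To make the flattening idea concrete, here is a minimal sketch of collapsing per-file extraction output into one {id: message} map. This is not reactifex's actual code; the file paths, ids, and the flattenMessages helper are hypothetical, and treating a duplicate id as an error is my own assumption.

```javascript
// Hypothetical per-file extraction output: one array of descriptors per file.
const extractedByFile = {
  'src/components/ExampleA/displayMessages.json': [
    { id: 'exampleASaveLabel', defaultMessage: 'Save', description: 'Save button label' },
  ],
  'src/components/ExampleB/displayMessages.json': [
    { id: 'exampleBCancelLabel', defaultMessage: 'Cancel', description: 'Cancel button label' },
  ],
};

// Collapse every descriptor into a single {id: defaultMessage} map.
// Descriptions are dropped; a duplicate id is treated as an error (assumption).
function flattenMessages(byFile) {
  const flat = {};
  for (const descriptors of Object.values(byFile)) {
    for (const { id, defaultMessage } of descriptors) {
      if (Object.prototype.hasOwnProperty.call(flat, id)) {
        throw new Error(`Duplicate message id: ${id}`);
      }
      flat[id] = defaultMessage;
    }
  }
  return flat;
}

const flatMessages = flattenMessages(extractedByFile);
console.log(JSON.stringify(flatMessages, null, 2));
```

The resulting flat map is the shape a key-value JSON file needs, which is also why the descriptions have nowhere to live in it.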

For studio-frontend, we have defined 2 simple make targets that will update the extracted messages-in-JSON file, as well as a CI check to ensure they've been run on each PR.

So the end result of message declaration and extraction is a JSON file of messages to be translated. Conveniently, this file can be uploaded to Transifex as KEYVALUEJSON. We've configured our project on Transifex to watch this file in GitHub.

The act of translation

For now, assume that the strings get translated, and we have in our possession a map that looks like {message_id: translated_message} for the user's current language. studio-frontend's translated message files look like this, as an example. We'll discuss how these files get updated as well as how the included message descriptors get injected into the app a bit later.

Using translated strings

Now that you have your translated strings, the process of utilizing them is fairly straightforward. First, you need to ensure your messages (and the corresponding locale, more on that below) get passed into an IntlProvider component that wraps your app's root. studio-frontend does so here.

Within this context, any FormattedMessage will call formatMessage under the hood. As described in those docs, this will try to match up the FormattedMessage's default message descriptor with the descriptor provided by the wrapping IntlProvider's messages using id as a key, falling back to the default message definition if there's no match.
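
The id-based lookup with fallback can be sketched in a few lines. This is only an illustration of the matching rule, not react-intl's implementation (which also parses and formats ICU syntax); resolveMessage and the sample ids are hypothetical.

```javascript
// Hypothetical translated messages for the current locale, keyed by message id.
const translatedMessages = {
  exampleSaveLabel: 'Guardar',
};

// Return the translation matching the descriptor's id, or fall back to the
// descriptor's defaultMessage when no translation exists.
function resolveMessage(messages, descriptor) {
  const translated = messages[descriptor.id];
  return typeof translated === 'string' ? translated : descriptor.defaultMessage;
}

const saveLabel = resolveMessage(translatedMessages, {
  id: 'exampleSaveLabel',
  defaultMessage: 'Save',
});
const cancelLabel = resolveMessage(translatedMessages, {
  id: 'exampleCancelLabel',
  defaultMessage: 'Cancel',
});
console.log(saveLabel, cancelLabel);
```

Here the first lookup finds a translation and the second falls back to the English default, which is the behavior a partially translated locale relies on.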

There are a few ways to utilize FormattedMessage objects in your React code; here are some examples from studio-frontend:

  • In the simplest case, the message component only gets a message descriptor, as seen here. This results in a simple <span> element, containing the translated string.
  • This is also true if you make use of the values prop to interpolate a variable into your message, seen here.
  • For complex cases, you may pass a function as the message component's child. This is especially useful if you want to translate an attribute such as aria-label, like we did here.

Advanced topics in studio-frontend internationalization

Locale data

react-intl's docs explain this better than I can. Basically, you need to load a library-provided file and call addLocaleData on it so that plurals, times, and numbers are formatted correctly. studio-frontend does that here, using a given locale code to look up the correct library-provided file in a map of supported languages.

Supported Languages

We spent forever trying to make our setup such that each language available from react-intl was available to webpack separately, but ended up realizing that we need to enumerate the list of languages available from edX and package them all up in the studio-frontend bundle. This is deemed acceptable because the language list is fairly static, and there are only 5 of them including the default English. The exported languages (and their corresponding locales) are defined here; you can read this PR for more detail.

As an aside, note that the translated message files live at src/data/i18n/locales/. They are updated by a weekly job and imported into the above file, currentlySupportedLangs.jsx, so that they are included in our bundled distribution.
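
A map of supported languages along these lines might look like the following sketch. The locale codes, the placeholder values, and the getLocaleBundle helper are hypothetical stand-ins; the real file imports translated message JSON and react-intl locale data instead of the inline values used here.

```javascript
// Hypothetical stand-ins for imported translation files and locale data.
const supportedLangs = {
  en: { messages: {}, localeData: 'en-locale-data-placeholder' },
  'es-419': {
    messages: { exampleSaveLabel: 'Guardar' },
    localeData: 'es-locale-data-placeholder',
  },
};

// Look up the bundle for a locale code, falling back to English when the
// requested locale is not one we ship (the fallback rule is an assumption).
function getLocaleBundle(localeCode) {
  const isSupported = Object.prototype.hasOwnProperty.call(supportedLangs, localeCode);
  const code = isSupported ? localeCode : 'en';
  return { locale: code, ...supportedLangs[code] };
}

const spanishBundle = getLocaleBundle('es-419');
const unknownBundle = getLocaleBundle('xx');
console.log(spanishBundle.locale, unknownBundle.locale);
```

Because every entry is imported statically, all of them end up in the bundle, which is the trade-off described above.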

WrappedMessage

As mentioned above, we want to ensure we comply with WCAG guidelines to the fullest extent possible. To that end, we created a wrapper around the default FormattedMessage, creatively named WrappedMessage. This wraps the component in a <span lang=<lang_code>> element, which allows screen readers to know the language. Notably, this approach allows for each string to be wrapped with the correct lang, even if we're missing the translation for that string and fall back to the English default.

Also note that we define the message prop in such a way as to exactly match the message descriptor objects linked above. This allows us to use the messages we define in a separate file as a variable, then have the defaultMessage and id sub-props fan out into FormattedMessage as described in the react-intl docs.

Use of this wrapper is totally optional; in the remainder of this document I'll refer to FormattedMessage instead, since they do the exact same thing and it may help when searching the react-intl wiki.

edx-platform Integration

It's one thing to have your React application load in a particular language. But what if the language isn't known anywhere in a JavaScript context, say because the app arrives as part of a Python-rendered response from a Django server, and the user's desired language lives deep in Python land? If you were to declare the following something of a "hack", I wouldn't fight you. I do resent the notion of it being a dirty hack though, that'll come later.

The solution we landed on is to have the Django server grab the desired data from our webpack bundle and load it into the DOM directly, in a predefined location. Then, our React code can extract the contents and dynamically load it into the root IntlProvider. The code for this can be seen in our mako entry point and the helper function it calls on the edx-platform side, and then in our app's root and the helper function it calls on the studio-frontend side.
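
The client-side half of that handoff could be sketched roughly as follows. This is an assumption-heavy illustration, not studio-frontend's helper: the element id, the JSON payload shape ({ locale, messages }), and the readI18nPayload name are all hypothetical, and a fake document object stands in for the browser DOM so the sketch runs anywhere.

```javascript
// Read a JSON payload that the server rendered into a known DOM element.
// `doc` is anything with getElementById (for example window.document).
// The element id and payload shape are hypothetical.
function readI18nPayload(doc, elementId) {
  const fallback = { locale: 'en', messages: {} };
  const element = doc.getElementById(elementId);
  if (!element) {
    return fallback;
  }
  try {
    return { ...fallback, ...JSON.parse(element.textContent) };
  } catch (error) {
    // Malformed payload: fall back to default English messages.
    return fallback;
  }
}

// Minimal stand-in for a browser document so the sketch runs outside a browser.
const fakeDocument = {
  getElementById: (id) => (id === 'example-i18n-data'
    ? { textContent: '{"locale":"es-419","messages":{"exampleSaveLabel":"Guardar"}}' }
    : null),
};

const payload = readI18nPayload(fakeDocument, 'example-i18n-data');
const missing = readI18nPayload(fakeDocument, 'no-such-element');
console.log(payload.locale, missing.locale);
```

The locale and messages recovered this way are what would be handed to the root IntlProvider.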

Pluralization

English only has "singular" and "plural" (or "not singular"); other languages such as Russian or Arabic have more than two options, depending on whether there exist zero, one, two, a few, or many items. If we do not handle these cases, a string as incorrect as "1 results" may be rendered in those languages.
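
To see the difference in plural categories without any library, the built-in Intl.PluralRules API can be queried directly. This is only a sketch of the underlying problem, not how react-intl is wired up in studio-frontend, and it assumes a runtime that ships full locale data (as current Node.js releases and browsers do).

```javascript
// Intl.PluralRules is built into modern JavaScript runtimes and reports the
// plural category a number falls into for a given locale.
const english = new Intl.PluralRules('en');
const russian = new Intl.PluralRules('ru');

const counts = [1, 2, 5, 21];
const englishCategories = counts.map((n) => english.select(n));
const russianCategories = counts.map((n) => russian.select(n));

console.log('en:', englishCategories.join(', '));
console.log('ru:', russianCategories.join(', '));
```

English only ever answers "one" or "other" for these counts, while Russian distinguishes "one", "few", and "many" (and treats 21 like 1), which is why a message format with per-category branches, such as ICU's {count, plural, ...}, is needed.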

This problem is why we stopped using Python-centric Transifex tools. react-intl, being part of FormatJS, handles pluralizable strings in ICU format quite well out-of-the-box. The problem for us initially came in when sending these files up to Transifex. Every previous edX project in Transifex used PO files to specify strings. For a hot sec, we were converting our JS-defined messages into Python-defined PO files, then doing the reverse on returning from Transifex. However, this broke horribly when we got to pluralizable strings, since the conversion tools we were using were not built to handle that case.

Instead of fixing the tool, we opted to keep things in JSON and use a KEYVALUEJSON file. This works correctly and keeps plurals working (note that Transifex already had support for translators to pluralize), but it came at a cost. That file type is so simplified that it loses track of descriptions, which can be vital to helping a translator know the context of the string they're translating. To solve that, I again turned to reactifex.

Push comments using the Transifex API

This is a dirty hack, for some value of the words "dirty" and "hack". Transifex provides a rather nice API, which can do 2 important things:

So naturally, I wrote a JS library that will make the above calls using curl. I used curl because that's what Transifex documented, and I couldn't get JS calls working correctly without external dependencies (something I'm a bit stubborn about when writing libraries). reactifex's documentation explains in more detail, and you can see it in action in studio-frontend in our Makefile.

Jenkins Jobs

There are only 2 things remaining that I haven't explained here - how do translated strings get pulled down from Transifex and updated in our repo, and how do we push comments to Transifex (given the fact that Transifex watches github for the main KEYVALUEJSON file)? The answer in both cases is a weekly job. At edX, we run them in Jenkins, but it could be any type of cron job.

The full details of these jobs and their setup are well beyond the scope of this document; I'll document them in an internal wiki page later. Suffice it to say that the push job runs these lines surrounding make push_translations, and the pull job runs make pull_translations.