
Localization Guide

chrisgarrity edited this page Oct 18, 2018 · 13 revisions

Making Scratch accessible to communities beyond those who speak English is very important to us. We work to localize as many parts of our site into as many languages as we can. Part of that effort involves properly tagging all text on our website so that it can be translated by our localization community*.

To make Scratch translatable, we use a React plugin called react-intl. React-intl provides components in which you can wrap text on the website, which it then uses to look up translations for that text (we currently use the v2 beta of this plugin). For static text inside of components, react-intl provides the FormattedMessage component. For instance, say we have the following text on Scratch:

var React = require('react');

var HitchhikersGalaxyGuide = React.createClass({
    .
    .
    .
    render: function () {
        return (
            <p className="box-content">
                In the beginning the Universe was created. This has made a lot of people very angry and been widely regarded as a bad move.
            </p>
        );
    }
});

If we wanted to make this translatable, we would turn it into the following:

var React = require('react');
var FormattedMessage = require('react-intl').FormattedMessage;

var HitchhikersGalaxyGuide = React.createClass({
    .
    .
    .
    render: function () {
        return (
            <p className="box-content">
                <FormattedMessage
                    id="info.DouglasAdamsQuote" />
            </p>
        );
    }
});

And we'd then add the string that is to be displayed into our template localization file, called l10n.json:

{
    .
    .
    .
    "info.DouglasAdamsQuote": "In the beginning the Universe was created. This has made a lot of people very angry and been widely regarded as a bad move.",
    .
    .
    .
}

Adding it to the template file ensures that when messages are loaded into the React component, the lookup for the id info.DouglasAdamsQuote finds this string.

You may notice that there are multiple files called l10n.json in the src directory. This ensures that only the strings relevant to a view are loaded onto that page for localization at request time. The basic structure of our l10n files is as follows:

  • /src/l10n.json – contains general strings that are re-used in multiple parts of the site.
  • /src/views/<viewName>/l10n.json – contains strings specific to the view named viewName (such as splash, or about).

Only views should have l10n.json files in them – components should not.
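The effect of splitting strings across files can be sketched as follows. This is a minimal illustration and not the actual scratch-www loading code: the function name messagesForView, the hypothetical "info" view, and the merge order (view strings winning over general ones) are all assumptions made for the example.

```javascript
// Hypothetical stand-in for /src/l10n.json (strings shared across the site).
const generalStrings = {
    'general.about': 'About'
};

// Hypothetical stand-in for /src/views/info/l10n.json (strings for one view).
const infoViewStrings = {
    'info.DouglasAdamsQuote': 'In the beginning the Universe was created. ' +
        'This has made a lot of people very angry and been widely regarded as a bad move.'
};

// Assumed merge step: a page receives the general strings plus the strings
// of its own view, and nothing from other views. On an id collision the
// view-specific entry wins (an assumption for this sketch).
function messagesForView(general, view) {
    return Object.assign({}, general, view);
}

const messages = messagesForView(generalStrings, infoViewStrings);
console.log(Object.keys(messages));
```

Because only one view's file is merged in, a page never carries strings it does not display.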

We occasionally use other react-intl components and methods (formatMessage(), FormattedHTMLMessage, and FormattedRelative), but much less often. More information on them is available in the react-intl documentation.

When assigning a string an id, it's important to make it descriptive. Generally, ids should follow this pattern:

  • id: '<name of component or view>.<brief description of what the string says>'

The brief description should be no longer than four words, and if it is more than one word, it should be in camelCase. The exception to this convention is when a string is used in multiple components and/or views, for example when translating the word "About". In these cases, the pattern is:

  • id: 'general.<brief description of what the string says>'

So, for the word "About" it might look like: id: 'general.about'
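The naming guidelines above can be expressed as a small check. This is only a sketch: checkMessageId is a hypothetical helper, not something that exists in scratch-www, and it assumes that words in the description can be counted by splitting before each uppercase letter.

```javascript
// Hypothetical helper that checks a message id against the guidelines:
// '<component, view, or general>.<description of at most four camelCase words>'.
function checkMessageId(id) {
    const parts = id.split('.');
    if (parts.length !== 2 || parts[0] === '' || parts[1] === '') {
        return {ok: false, reason: 'expected <scope>.<description>'};
    }
    const description = parts[1];
    if (!/^[A-Za-z][A-Za-z0-9]*$/.test(description)) {
        return {ok: false, reason: 'description should be one camelCase token'};
    }
    // Assumption: each uppercase letter starts a new word.
    const words = description.split(/(?=[A-Z])/);
    if (words.length > 4) {
        return {ok: false, reason: 'description is longer than four words'};
    }
    return {ok: true, scope: parts[0], words: words.length};
}

console.log(checkMessageId('general.about'));
console.log(checkMessageId('info.DouglasAdamsQuote'));
console.log(checkMessageId('splash.thisIsAVeryLongDescriptionIndeed'));
```

The first two ids pass; the third is rejected because its description splits into more than four words.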

Removing FormattedHTMLMessage

In react-intl the FormattedHTMLMessage has been deprecated in favor of using FormattedMessage with placeholder values. New strings should not include embedded HTML, and we should be actively replacing current uses of FormattedHTMLMessage.

Examples

A string with HTML formatting

If the current implementation is:

// In l10n.json:
"info.towelday": "<span class='some-class'>Towel day</span>, an annual commemoration of the life and work of Douglas Adams."

// In .jsx:
<FormattedHTMLMessage id='info.towelday'/>

Change to:

// In l10n.json:
"info.towelday": "Towel day",
"info.toweldaydesc": "{toweldayName}, an annual commemoration of the life and work of Douglas Adams."

// In the jsx:
...
<FormattedMessage 
    id="info.toweldaydesc" 
    values={{
        toweldayName: (
            <span className='some-class'>
                <FormattedMessage id="info.towelday" />
            </span>
        )
    }}
/>
// Or import injectIntl, and wrap the exported component:
import { injectIntl } from 'react-intl';
...
<FormattedMessage 
    id="info.toweldaydesc"
    values={{
        toweldayName: (
            <span className='some-class'>
                {this.props.intl.formatMessage({id: 'info.towelday'})}
            </span>
        )
    }}
/>
... 
export default injectIntl(<component>);

In general, do not split text into two independent strings unless they are unrelated (e.g., do not write <span><FormattedMessage id="string1"/></span><FormattedMessage id="string2"/> in the jsx). Some languages would translate the phrase with the placeholder at the end, and that is impossible if the text of the first string is not represented by a placeholder in the second. On the other hand, if the formatted HTML consists of separate sentences or block elements, feel free to make them separate string ids.
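To see why the placeholder matters, consider this simplified sketch. fillPlaceholders is a hypothetical stand-in written for this example; it is not react-intl's formatter, which handles far more (plurals, dates, React elements as values). The reordered string is a made-up word order, not a real translation.

```javascript
// Simplified stand-in for placeholder substitution: replaces each {name}
// with the matching entry in values, leaving unknown placeholders untouched.
function fillPlaceholders(template, values) {
    return template.replace(/\{(\w+)\}/g, (match, name) =>
        Object.prototype.hasOwnProperty.call(values, name) ? values[name] : match
    );
}

// The source string from the example above.
const english = '{toweldayName}, an annual commemoration of the life and work of Douglas Adams.';
// Made-up word order standing in for a language that puts the name last.
const reordered = 'An annual commemoration of the life and work of Douglas Adams: {toweldayName}.';

const values = {toweldayName: 'Towel day'};
console.log(fillPlaceholders(english, values));
console.log(fillPlaceholders(reordered, values));
```

With a single string and a placeholder, the translator decides where the name goes. With two independent strings rendered one after the other, the order is fixed in the jsx and cannot change per language.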

A string with a link

This is very similar to the example above.

If the original is a string that includes a link to a website such as:

// in l10n.json:
"info.scratchlink": "Take me back to <a href='//scratch.mit.edu'>scratch.mit.edu</a>"
// in jsx:
<FormattedHTMLMessage id="info.scratchlink"/>

That changes to:

// in l10n.json:
"info.scratchlinktext": "Take me back to {scratchLink}"
// in jsx:
<FormattedMessage 
    id='info.scratchlinktext'
    values={{
        scratchLink: (
            <a href='//scratch.mit.edu'>
                scratch.mit.edu
            </a>
        )
    }}
/>

Setting up Transifex for new views (pages)

When a new view with localization is added, the file needs to be initialized with Transifex to make it available for translators.

In scratch-www: If you have the tx command line utility available, you can use the tx set command to set the source file for the view. For example, to add a new resource called 'tips':

> tx set --source -r scratch-website.tips-l10njson -l en --type KEYVALUEJSON src/views/tips/l10n.json

If you don't have access to the command line tool, you can edit the .tx/config file directly. The block for the tips resource looks like:

[scratch-website.tips-l10njson]
source_file = src/views/tips/l10n.json
source_lang = en
type = KEYVALUEJSON

You can also push the file up to Transifex manually if you have developer access:

> tx push -r scratch-website.tips-l10njson -s

In scratchr2_translations: Translations are actually shared via the scratchr2_translations repository, so new translation resources need to be added there as well. Edit the config file in the scratchr2_translations repo in www/.tx. For a new page called tips you would add:

[scratch-website.tips-l10njson]
file_filter = translations/scratch-website.tips-l10njson/<lang>.json
source_lang = en
type = KEYVALUEJSON

The convention for the resource name is: scratch-website.<view>-<l10n file name without the dot>. Here scratch-website is the id of the project on Transifex.
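The naming convention and the scratch-www config block can be generated mechanically. The sketch below only illustrates the convention: resourceName and sourceConfigBlock are hypothetical helpers, not part of the tx client or of scratch-www.

```javascript
// Build the resource name 'scratch-website.<view>-<file name without the dot>'.
// For example, view 'tips' and file 'l10n.json' give 'scratch-website.tips-l10njson'.
function resourceName(view, fileName) {
    return 'scratch-website.' + view + '-' + fileName.replace(/\./g, '');
}

// Build the .tx/config block for a view in scratch-www, using the same
// keys as the tips example above.
function sourceConfigBlock(view, fileName) {
    return [
        '[' + resourceName(view, fileName) + ']',
        'source_file = src/views/' + view + '/' + fileName,
        'source_lang = en',
        'type = KEYVALUEJSON'
    ].join('\n');
}

console.log(sourceConfigBlock('tips', 'l10n.json'));
```

Called with 'tips' and 'l10n.json', this reproduces the block shown for the tips resource in scratch-www.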

Transifex Language mappings

For the most part, Transifex supports locale codes that use the ISO two-letter language codes, optionally followed by an underscore and a locale. For example, 'es', 'es_MX', 'es_419', 'es_ES', etc. Traditionally, Scratch has used only the two-letter language code, with three-letter codes for languages that do not have a two-letter code (e.g., Friulian: fur). However, there are a few cases where Scratch uses language + locale. These include places where there are different writing systems, such as Chinese and Japanese. They also include places where the language differs substantially between regions and we have enough volunteer translators to maintain multiple translations, such as Portuguese and Brazilian Portuguese.

There is one additional hitch associated with languages that include a locale. While the Transifex codes separate the two parts with an underscore, browsers separate them with a hyphen. So the locale in the browser will be specified as 'pt-br', not 'pt_BR'. For this reason, when we download the translated files we want them to be named the way the browser names the locale. Transifex supports this in the Transifex config file with the lang_map option:

lang_map = zh_CN:zh-cn, zh_TW:zh-tw, pt_BR:pt-br, es_419:es-419, aa_DJ:aa-dj

When a language with a locale is added in Transifex, we need to add it to the lang_map so that the downloaded file is named correctly.
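The mapping itself is mechanical: replace the underscore with a hyphen and lowercase the result. As a rough sketch, with toBrowserLocale and buildLangMap being hypothetical helpers written for this example rather than anything provided by Transifex:

```javascript
// Convert a Transifex-style code ('pt_BR') to the browser-style name ('pt-br').
function toBrowserLocale(transifexCode) {
    return transifexCode.replace('_', '-').toLowerCase();
}

// Build the value of the lang_map option. Codes without a locale part
// (such as 'es') need no mapping and are skipped.
function buildLangMap(codes) {
    return codes
        .filter(code => code.includes('_'))
        .map(code => code + ':' + toBrowserLocale(code))
        .join(', ');
}

console.log('lang_map = ' + buildLangMap(['es', 'zh_CN', 'zh_TW', 'pt_BR', 'es_419', 'aa_DJ']));
```

For the codes listed, this reproduces the lang_map line shown above.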


*If you are interested in becoming a translator for Scratch, you can find out more at http://wiki.scratch.mit.edu/wiki/How_to_Translate_Scratch.