This repository has been archived by the owner on Jan 19, 2022. It is now read-only.

Spell check is for the week
ozten committed Mar 19, 2013
1 parent 9ecf128 commit babd1cd
Showing 3 changed files with 61 additions and 50 deletions.
20 changes: 11 additions & 9 deletions localization/localization_part_1.md
@@ -7,7 +7,7 @@ Mozilla provides products and services which are localized into as many as 90 la
The following are just a few examples of localization:
* Providing copy translated into a specific regional variation of a language
* Rendering a screen right to left for a given language
-* Bulletproofing designs to accomodate variable length copy
+* Bulletproofing designs to accommodate variable length copy
* Making labels, headings, and buttons have names that resonate with a local audience

In this series of posts, I'm going to cover some technical aspects of how to localize a Node.js service.
@@ -51,7 +51,7 @@ In your code
translation_directory: 'static/i18n'
}));

-We will look at the configuration values in detail during the third intallment of this L10n series.
+We will look at the configuration values in detail during the third installment of this L10n series.

The i18n `abide` middleware sets up request processing and injects various functions we'll use for translation.

@@ -74,10 +74,12 @@ It will be either `ltr` or `rtl`. The English language is rendered `ltr` or left

`gettext` is a JS function which will take an English string and return a localize string, again based on the user's preferred region and language.


When doing localization, we refer to **strings** or Gettext strings.
-These are peices of copy, labels, button, etc.
-Any prose that is visible to the end user is a String.
+These are pieces of copy, labels, button, etc.
+Any prose that is visible to the end user is a string.

+Technically, we don't mean JavaScript String, as you can have strings which are part of your program, but never shown to the user.
+String is overloaded to mean, stuff that must get translated.

Here is an example JavaScript file:

@@ -87,15 +89,15 @@ Here is an example JavaScript file:
});
});

-We can see that these variables and functions are placed in the `req` object.
+We can see that these variables and functions (like `gettext`) are placed in the `req` object.

-So to setup our site for localization, we must look through all of our code and templates and wrap strings in calls to gettext.
+So to setup our site for localization, we must look through all of our code and templates and wrap **strings** in calls to `gettext`.

## Language Detection

-By setting up the i18n-abide module, we've actually installed a new peice of middleware.
+By setting up the i18n-abide module, we've actually installed a new piece of middleware.

-At runtime, the middleware will detect the user's prefered locale.
+At runtime, the middleware will detect the user's preferred locale.
It will look at it's configuration to find the best language match.
It will then output "Hello, World!" localized to one of your supported languages or default to English.

45 changes: 25 additions & 20 deletions localization/localization_part_2.md
@@ -10,19 +10,19 @@ We wrapped strings in templates files as well as JavaScript files with calls to
A goal for Mozilla Persona's Node.js code is to be compatible with the larger Mozilla community, while being Node friendly and flexible.

The Mozilla project is over a decade old.
-It's had one of the bigest (and coolest) l10n communities in Open Source.
-As a result, it has many existing tools and the lowest levels are old crotchety tools.
+It's had one of the biggest (and coolest) L10n communities in Open Source.
+As a result, it has many existing tools and those at the lowest levels are sometimes old *crotchety* tools.

### Gettext
-GNU Gettext is a toolchain that allows you to localize copy and other Strings from webapps or native apps. These are called Strings (after the C name for ... strings). When you write your Node.js code and templates, you put English Strings<sup>[1]</sup> in like normal, but you wrap them in a functional call to gettext.
+GNU Gettext is a toolchain that allows you to localize copy and other strings from webapps or native apps. These are called strings (after the C name for ... strings). When you write your Node.js code and templates, you put English strings<sup>[1]</sup> in like normal, but you wrap them in a functional call to `gettext`.

Wrapping with `gettext` does a few different things for you:
-* As a build step, you can extract all the Strings
-* At runtime, the gettext function replaces the English String with a localize string
+* As a build step, you can extract all the strings into a string catalog
+* At runtime, the gettext function replaces the English string with a localize string

-This build step builds a catalog of Strings from your code and template files.
+This build step is how we'll create a catalog of strings from your code and template files.

-All these Strings end up in text files that end with the `.po` file suffix. I'll refer to these as PO files.
+All these strings end up in text files that end with the `.po` file suffix. I'll refer to these as **PO files**.

### PO Files

@@ -34,21 +34,24 @@ An example snippet of a PO file named zh_TW/LC_MESSAGES/messages.po:
msgid "Persona preserves your privacy"
msgstr "Persona 保護您的隱私"

-We'll examine this in more detail below, but we can see that `msgid` is the English String and `msgstr` has the Chinese translation. There are comments in the file for where in the codebase the String is used.
-
-There are many other tools that Gettext provides, for managing Strings, PO files, etc. We'll cover these in a bit.
+We'll examine this in more detail below, but we can see that `msgid` is the English String and `msgstr` has the Chinese translation. There are comments in the file (anything starting with `#`).
+The comment above shows the location in the codebase the string is used.
+
+There are many other tools that GNU Gettext provides, for managing Strings, PO files, etc. We'll cover these in a bit.
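
To make the `msgid` / `msgstr` / `#` structure in this hunk concrete, here is a minimal sketch of reading such a file into a plain lookup object. It is an illustration only, not the parser used by GNU Gettext, `po2json`, or i18n-abide. The names `unquote`, `parsePo`, and `samplePo` are hypothetical, the source path in the sample comment is made up, and plural forms and message contexts are deliberately ignored.

```javascript
// Minimal PO reader: handles comments, msgid, msgstr and continuation lines.
// Plural forms (msgid_plural, msgstr[n]) and msgctxt are ignored in this sketch.

// Strip the surrounding quotes and decode the \" \\ and \n escapes.
function unquote(quoted) {
  return quoted.slice(1, -1).replace(/\\(["\\n])/g, (match, ch) => (ch === 'n' ? '\n' : ch));
}

function parsePo(text) {
  const catalog = {};
  let msgid = null;
  let msgstr = null;
  let current = null; // field that continuation lines append to

  // Store the finished entry; the header entry (empty msgid) is skipped.
  const flush = () => {
    if (msgid !== null && msgstr !== null && msgid !== '') {
      catalog[msgid] = msgstr;
    }
    msgid = null;
    msgstr = null;
    current = null;
  };

  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === '') {
      flush(); // a blank line ends the current entry
    } else if (line.startsWith('#')) {
      // Comment, for example a source location. Nothing to translate.
    } else if (line.startsWith('msgid ')) {
      flush();
      msgid = unquote(line.slice('msgid '.length).trim());
      current = 'msgid';
    } else if (line.startsWith('msgstr ')) {
      msgstr = unquote(line.slice('msgstr '.length).trim());
      current = 'msgstr';
    } else if (line.startsWith('"')) {
      if (current === 'msgid') msgid += unquote(line);
      if (current === 'msgstr') msgstr += unquote(line);
    } else {
      current = null; // keyword this sketch does not support
    }
  }
  flush();
  return catalog;
}

// Hypothetical sample; the path in the comment line is invented.
const samplePo = [
  '#: path/to/template.ejs:12',
  'msgid "Persona preserves your privacy"',
  'msgstr "Persona 保護您的隱私"',
].join('\n');

console.log(parsePo(samplePo));
```

The English string doubles as the lookup key, which is why an untranslated string can simply fall back to the `msgid` itself.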

## Why a new toolchain?
Before we get into the Node modules that make working with Gettext easy, we must ask ourselves... why this toolchain?

-A year ago I did a deep survey of all the Node l10n and i18n modules. Most "reinvent the wheel", creating their own JSON based formats for storing Strings.
+A year ago I did a deep survey of all the Node L10n and I18n modules.
+Most "reinvent the wheel", creating their own JSON based formats for storing Strings.

-In order to work with our community, we must use PO files. They have many tools such as [POEdit](http://www.poedit.net/), [Verbatim](https://localize.mozilla.org/), [Translate Toolkit](https://github.com/translate/translate), and [Pootle](https://github.com/translate/pootle)
+In order to work with the Mozilla community, we must use PO files.
+They have many tools such as [POEdit](http://www.poedit.net/), [Verbatim](https://localize.mozilla.org/), [Translate Toolkit](https://github.com/translate/translate), and [Pootle](https://github.com/translate/pootle).

So our basic constraint is to create a solution that uses `PO` files, which is how we'll tell our localizers what all of our strings are and how they will give us the finished translations.

-Coming from PHP and Python at Mozilla, I've found that Gettext works very well. As a web service gets large and has more copy, there are many nuances of localizing copy that require the well tested API of gettext.
+Coming from PHP and Python at Mozilla, I've found that Gettext works very well.
+As a web service gets large and has more copy, there are many nuances of localizing copy that require the well tested tools and APIs of gettext.

## Providing PO Files to localizers

@@ -59,9 +62,9 @@ This person or persons can be you, a localization expert, or a build system guru
So what does a String wrangler do?

* First time extraction of Strings from the software
-* Extracting new, changed, or detecting deleted Strings in later releases
+* Extracting new, changed, or detecting deleted strings in later releases
* Preparing the PO files for each localizer team
-* Resolving conflicts and marking Strings which have changed or been deleted
+* Resolving conflicts and marking strings which have changed or been deleted

This may sound complicated, but the good news is that only the String wrangler has to worry about this problems that crop up.
These steps can be automated.
@@ -162,17 +165,18 @@ Here is a sample file system layout:

You can give your localizers access to this part of your codebase.
The Spanish team will need access to `locale/es/LC_MESSAGES/messages.po` for example.
-If you have a really big team, you might have `es-ES` for Spain's Spanish and `es-AR` for Argentian Spanish, instead of just a base `es` for all Spanish locales.
+If you have a really big team, you might have `es-ES` for Spain's Spanish and `es-AR` for Argentinian Spanish, instead of just a base `es` for all Spanish locales.

You can grow the number of locales over time.

### Merging String changes

-Over time, you'll add new Strings and change or delete others. You'll need to update all of the PO files with these changes.
+Release after release, you'll add new Strings and change or delete others.
+You'll need to update all of the PO files with these changes.

Gettext has powerful tools to make this easy.

-We provide a wrapper shell script called `merge_po.sh` which uses `msgmerge` under the covers.
+We provide a wrapper shell script called `merge_po.sh` which uses GNU Gettext's `msgmerge` under the covers.

Let's put the i18n-abide tools in our path:

@@ -186,7 +190,7 @@ And run a String merge:
Just like the first time... `extract-pot` grabs all the Strings and updates the POT file. Next `merge_po.sh` updates each locale's PO file to match our codebase. You can now ask your L10n teams to localize any new or changed Strings.

### Gettext versus Not Invented Here
-It is easy enough to throw out Gettext and re-invent the wheel using an invented JSON format.
+It is easy enough to throw out Gettext and re-invent the wheel using a new JSON format.
This is the strategy that most node modules take.
If you have a healthy application, as you add locales and develop new features, you will find yourself frustrated by a thousand paper cuts.
Without `merge_po.sh`, you'll have to write your own merge tools.
@@ -199,6 +203,7 @@ Gettext offers a powerful merge feature, which will save us many painful hours o
Now that we have various catalogs of strings in a po file per locale, we can hand these off to our localization teams.

It is always a good idea to talk to the localizers before you start the extract / merge steps.
-You can read Gettext tutorials, as they are all comptaible with our setup.
+Give them a heads up on when the PO files will be ready, how many strings they have, and when you'd like to hav the localization finished by.
+Also, you can read Gettext tutorials, as they are all compatible with our setup.

Okay, go get your Strings translated and in the next installment, we'll put them to work!
46 changes: 25 additions & 21 deletions localization/localization_part_3.md
@@ -19,15 +19,15 @@ Typically in a file system like this:
LC_MESSAGES
messages.po

-We need a way to get strings in our PO files into our app at runtime. There are a few ways you can do this.
+We need a way to get strings in our PO files into our application at runtime. There are a few ways you can do this.

-The first way, is to have **server side strings** and the `gettext` function provided by i18n-abide will work it's magik.
+The first way, is to have **server side strings** and the `gettext` function provided by i18n-abide will work it's magic.

-The second way, is to hav e**client side strings** and you'll include a gettext.js script in your code.
+The second way, is to have **client side strings** and you'll include a gettext.js script in your code.
This is distributed with i18n-abide.

**Both of these methods require** the strings to be in a **JSON** file format.
-The server side translation loads them on app startup, and the client side translation loads them via HTTP (or you can put them into your built and minified JavaScript).
+The server side translation loads them on application startup, and the client side translation loads them via HTTP (or you can put them into your built and minified JavaScript).


Since this system is compatible with GNU Gettext, a third option for server side strings is to use [node-gettext](https://github.com/andris9/node-gettext). It's quite efficient for doing server side translation. We'll use the first option in this post.
@@ -58,13 +58,13 @@ And we get a file structure like:

The `static` directory is exposed to web traffic, so a request to `/i18n/es/messages.json` would get the Spanish JSON file.

-You can do this via Node.js or a webserver such as `nginx`.
+You can do this via Node.js or a web server such as `nginx`.

-## Config
+## Configuration

`i18n-abide` requires some configuration to decide which languages are supported and to know where to find our JSON files.

-As we saw in the first installment, here is the required configuration for our app
+As we saw in the first installment, here is the required configuration for our application

app.use(i18n.abide({
supported_languages: ['en-US', 'de', 'es', 'zh-TW'],
@@ -80,7 +80,7 @@ We mentioned in the first post that i18n-abide will do it's best to serve up an
But, how do we know what the user's preferred language is?

The i18n-abide module looks at the `Accept-Language` HTTP header.
-This sent by the browser and includes all of the user's preferred languages with a preference order.
+This is sent by the browser and includes all of the user's preferred languages with a preference order.

i18n-abide processes this value and compares it with your app's `supported_languages`.
It will make the best match possible and serve up that language.
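
The matching step described in this hunk can be sketched roughly as follows. This is a simplified illustration of `Accept-Language` negotiation, not i18n-abide's actual algorithm: the helper names `parseAcceptLanguage` and `bestLanguage` and the `defaultLang` parameter are assumptions made for the example, and the only fallback it tries is from a regional tag to its base language (`es-AR` to `es`).

```javascript
// Turn an Accept-Language header into lowercase language tags,
// ordered by q-value (highest first), then by position in the header.
function parseAcceptLanguage(header) {
  return header
    .split(',')
    .map((part, index) => {
      const [tag, ...params] = part.trim().split(';');
      const qParam = params.map((p) => p.trim()).find((p) => p.startsWith('q='));
      const q = qParam ? parseFloat(qParam.slice(2)) : 1.0;
      return { tag: tag.trim().toLowerCase(), q: Number.isNaN(q) ? 0 : q, index };
    })
    .filter((entry) => entry.tag !== '' && entry.q > 0)
    .sort((a, b) => b.q - a.q || a.index - b.index)
    .map((entry) => entry.tag);
}

// Pick the first preferred language that is supported, either exactly
// or through its base language. Fall back to defaultLang otherwise.
function bestLanguage(header, supportedLanguages, defaultLang) {
  const supported = supportedLanguages.map((lang) => lang.toLowerCase());
  for (const tag of parseAcceptLanguage(header || '')) {
    let i = supported.indexOf(tag);
    if (i !== -1) return supportedLanguages[i];
    i = supported.indexOf(tag.split('-')[0]);
    if (i !== -1) return supportedLanguages[i];
  }
  return defaultLang;
}

// Example with the supported_languages list from the configuration above.
console.log(bestLanguage('es-AR,es;q=0.8,en-US;q=0.5', ['en-US', 'de', 'es', 'zh-TW'], 'en-US'));
```

A real implementation has more cases to handle, such as mapping a bare `en` onto `en-US`, which this sketch leaves out.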
@@ -90,37 +90,37 @@ If it cannot find a good match, it will serve up the strings you've put into you

## Start you engines

-Okay, now that configs are in place, we have atleast one locale transliated, let's fire it up!
+Okay, now that configs are in place and we have at least one locale translated, let's fire it up!

npm start

-In your web browser, change your preferred language.
+In your web browser, change your preferred language to one which you have localized.

![](lang_selection.png)

-Now load a page for your application. You should see it localized now.
+Now load a page for your application. You should see it translated now.


![](dialog-greek.png)

Here is Mozilla Persona in **Greek**. So, cool!

Screenshot zh-TW.
![](dialog-greek.png)

### gobbledygook

If you want to **test** your L10n setup, **before you have real translations** done, we're built a great test locale.
-It is inspired by David Bowie's Labrythn.
+It is inspired by David Bowie's Labyrinth.

To use it, just add `it-CH` or another locale you're not currently using to your config under both `supported_languages` as well as the **debug_lang** setting.

-Partial config showing `it-CH` is used in supported_languages and debug_lang.
+Example partial config:

app.use(i18n.abide({
supported_languages: ['en-US', 'de', 'es', 'zh-TW', 'it-CH'],
debug_lang: 'it-CH',
...

-Now if you set your browser's preferred language to Italian/Switzerland, i18n-abide will use gobbledygook to localize the content.
+Now if you set your browser's preferred language to Italian/Switzerland (it-CH), i18n-abide will use gobbledygook to localize the content.

![](it-CH-chooser.png)

@@ -139,7 +139,8 @@ Here is a heads up on a few more topics.

i18n-abide provides a `format` function which can be used in client or server side JavaScript code.

-Format takes a formatted string. This function can be used in one of two flavors of parameter replacements.
+Format takes a formatted string and replaces parameters with actual values at runtime.
+This function can be used in one of two flavors of parameter replacements.

Formats
* %s - `format` is called with a format string and then an array of strings. Each will be replaced in order.
@@ -171,28 +172,31 @@ Reasons to use format:
The named parameters are nice, in that they are self documenting.
The localizer knows that the variable is a URL.

-String interpolation is quite common in localizaing software.
+String interpolation is quite common in localizing software.

Another example is runtime data injected into your strings.

<p>{{format(gettext('Welcome back, %(user_name)s'), {user_name: user.name})}}</p>
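
The two replacement flavors described for `format` (positional `%s` with an array, named `%(name)s` with an object) can be illustrated with a small stand-in. This is a sketch of the behavior as the post describes it, not the function shipped with i18n-abide; the name `sketchFormat` is hypothetical, and leaving unmatched placeholders untouched is an assumption of this example.

```javascript
// Replace placeholders in a (typically already translated) string.
// An array fills %s placeholders in order; an object fills %(name)s placeholders.
function sketchFormat(fmt, args) {
  if (Array.isArray(args)) {
    let next = 0;
    return fmt.replace(/%s/g, (match) =>
      next < args.length ? String(args[next++]) : match);
  }
  const named = args || {};
  return fmt.replace(/%\(([^)]+)\)s/g, (match, name) =>
    Object.prototype.hasOwnProperty.call(named, name) ? String(named[name]) : match);
}

// Positional flavor.
console.log(sketchFormat('%s has %s new strings', ['zh-TW', 12]));

// Named flavor, mirroring the template example above.
console.log(sketchFormat('Welcome back, %(user_name)s', { user_name: 'Ada' }));
```

Named parameters survive reordering by the localizer, which positional `%s` placeholders do not.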

## Avoid Inflexible Design

-We need to put our L10n hats on as early as when we review the initial graphic design of the website.
+We need to put our L10n hats on early.
+As early as when we review the initial graphic design of the website.

Avoid putting copy into images. Use CSS to keep words as plain text positioned over images.

Make sure [CSS is bulletproof](). An English word in German can be many times larger and destroy a
poorly planned design.

-Database backed websites have already taught us to think this way, but designers may not be used to
+Database backed websites have already taught us to think about systematic design way, but designers may not be used to
allowing for variable length labels or buttons.

Overly "tight" or clever designs simply will not work in a localized context.


## String Freeze

-Remember our build step to prepare files for localizers to transalte?
+Remember our build step to prepare files for localizers to translate?
And in this post we learned about `po2json` for using these strings in our app...
Well, this means we're going to need to coordinate our software releases with our L10n community.

