This repository has been archived by the owner on Feb 26, 2022. It is now read-only.

HTML Page Localization

ochameau edited this page Mar 27, 2012 · 9 revisions

Introduction

Add-on SDK-based addons often include HTML pages displayed in UI elements such as panels, widgets, and tabs. Such content typically includes locale-specific text, and addons are frequently used in multiple locales, so it should be possible for SDK-based addons to localize their pages.

The Mozilla platform supports the localization of XUL documents and XHTML pages via DTDs and entity references. It also supports the localization of text in scripts via .properties files and keys. But it doesn't support the localization of HTML pages.

Bug 691782 implements support for the localization of addon main programs via an l10n CommonJS module and .properties files. This proposal builds on and complements that one by adding two APIs for the localization of HTML pages.

Proposal

DOM Localization

For localization of static text in pages, we watch all HTML documents used in our addon. Each time a document is displayed, we search for all DOM nodes that have a data-l10n-id attribute and replace their text content with a string fetched from the .properties files: the value whose key equals the value of the data-l10n-id attribute.

This operation has to happen before the document is displayed, but only after the DOM tree is loaded. We start by hiding the document as soon as it is created (on the document-element-inserted notification, which fires as soon as the document's root element has been created, before any page script runs), then we wait for the DOMContentLoaded event before processing the DOM tree. Finally, when we are done, we unhide the document. To hide and unhide the document we can simply use the visibility CSS property on the document's root element. This has the benefit of still displaying the document's background color until the end of the process. (Bug 737003 asks for an additional CSS feature in order to avoid messing with the document's CSS.)
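The hide, wait, process, unhide sequence described above could be sketched roughly as follows. This is only an illustration, not the SDK's implementation: the helper name hideUntilLocalized and its localize callback are hypothetical, and the sketch assumes it is called at the point where the document has just been created (the observer wiring for document-element-inserted is omitted).

```javascript
// Hypothetical helper: keep a document hidden until its DOM has been localized.
// `doc` is a DOM Document; `localize` is a callback that rewrites the tree.
function hideUntilLocalized(doc, localize) {
  const root = doc.documentElement;
  // Hide the root element through the CSS visibility property, as proposed above.
  root.style.visibility = "hidden";

  doc.addEventListener("DOMContentLoaded", function onLoaded() {
    doc.removeEventListener("DOMContentLoaded", onLoaded);
    try {
      localize(doc);
    } finally {
      // Always unhide, even if localization throws, so the page never stays hidden.
      root.style.visibility = "";
    }
  });
}
```

Resetting the inline visibility value to the empty string hands control back to the page's own style sheets, which is one way to limit interference with the document's CSS.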

Here is an HTML document example:

<html>
  <head>
    <title data-l10n-id="title"></title>
  </head>
  <body>
    <p data-l10n-id="greeting"></p>
  </body>
</html>

Given this English .properties file in a browser set to the en-US locale:

title = Simple Page
greeting = Hello, world!

The page would be processed to:

<html>
  <head>
    <title data-l10n-id="title">Simple Page</title>
  </head>
  <body>
    <p data-l10n-id="greeting">Hello, world!</p>
  </body>
</html>
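The processing shown above could be sketched as follows. The function names parseProperties and localizeDocument are hypothetical, and the parser only handles the simple key = value lines used in this example (no escapes or line continuations, which real .properties files allow). Leaving a node untouched when its key has no translation is also an assumption.

```javascript
// Parse simple "key = value" lines into a Map.
// Assumption: no escape sequences or line continuations.
function parseProperties(text) {
  const strings = new Map();
  for (const line of text.split(/\r?\n/)) {
    const trimmed = line.trim();
    if (trimmed === "" || trimmed.startsWith("#")) continue; // blank line or comment
    const eq = trimmed.indexOf("=");
    if (eq === -1) continue; // not a key/value line
    strings.set(trimmed.slice(0, eq).trim(), trimmed.slice(eq + 1).trim());
  }
  return strings;
}

// Replace the text of every node carrying data-l10n-id with its localized string.
function localizeDocument(doc, strings) {
  for (const node of doc.querySelectorAll("[data-l10n-id]")) {
    const key = node.getAttribute("data-l10n-id");
    if (strings.has(key)) {
      // textContent never parses markup, so values cannot inject HTML.
      node.textContent = strings.get(key);
    }
  }
}
```

In a page, this would be called as localizeDocument(document, parseProperties(propertiesText)) once the DOM tree is loaded.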

[Alternatively, we could use HTML content instead of text content, but we would have to implement and run some processing in order to avoid XSS. I'd prefer keeping text content by default and offering an HTML feature in a later iteration. We could, for example, specify in .properties files that we want to use HTML by adding an #html suffix to the key, like greetings#html = Hello, <b>John</b> instead of greetings = Hello, John.]
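If the #html suffix idea were adopted, the key lookup might look something like the sketch below. The function name lookupString and its return shape are hypothetical, and the XSS processing the proposal calls for is deliberately left out: the caller would have to sanitize any value flagged as HTML before inserting it as markup.

```javascript
// Hypothetical lookup for the "#html" key suffix floated above.
// `strings` is a Map of property keys to values.
function lookupString(strings, id) {
  const htmlKey = id + "#html";
  if (strings.has(htmlKey)) {
    // Flagged as HTML: must be sanitized by the caller before use as markup.
    return { value: strings.get(htmlKey), isHtml: true };
  }
  if (strings.has(id)) {
    // Plain text: safe to assign to textContent.
    return { value: strings.get(id), isHtml: false };
  }
  return null; // no translation available
}
```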

[An alternative DOM attribute could be l10n-id instead of data-l10n-id. The advantage of l10n-id is that it could become a standard if l20n ends up keeping this name and succeeds in making it an official standard, whereas data-l10n-id is already a valid HTML attribute.]

Static Localization

Static/template localization was the initial proposal. It turned out to be challenging to implement due to bug 736046. In parallel, some Mozilla projects (l20n and Boot2Gecko) aimed to implement the same feature, but with a DOM approach. We ultimately shifted our focus to the DOM approach in order to have a common way of translating HTML.

For localization of static text in pages, we integrate a localization processor into the addon:// protocol handler, which is being implemented as part of the Add-on Pages API in bug 644595. The localization processor parses pages while they are being loaded and replaces property references (i.e. references to locale-specific text strings in .properties files) with their localized text referents. Property references are property keys delimited by dollar-sign-prefixed curly braces (${key}), and their referents are stored in the same .properties files that support localization of addon main programs.

[Alternative delimiter styles under consideration: mustaches ({{key}}), character entities (&key;). One consideration is avoiding conflicts with client-side template processors like jQuery Templates and mustache.js. We might make it possible to disable localization if we find that users encounter conflicts we cannot easily resolve, although we must work hard to minimize the risk of conflicts, since we want addons to be localizable, and addon developers want that too.]

[Alternately, we could make the processor identify all text nodes (and localizable attributes, like the alt attribute to the <img> tag) and use their values as keys automatically, so users wouldn't have to specify delimited keys. It is unclear how feasible or preferable this alternative is.]

For example, the following page includes two references:

<html>
  <head>
    <title>${title}</title>
  </head>
  <body>
    ${greeting}
  </body>
</html>

Given this English .properties file in a browser set to the en-US locale:

title = Simple Page
greeting = Hello, world!

The page would be processed to:

<html>
  <head>
    <title>Simple Page</title>
  </head>
  <body>
    Hello, world!
  </body>
</html>

The localization processor's parser is context-aware and escapes the text it inserts into the page using something like the approach described in Using type inference to make web templates robust against XSS.

Keys can be the localized strings themselves, and property references whose keys are not found in a .properties file are replaced with the key itself (i.e. <title>${Simple Page}</title> becomes <title>Simple Page</title> if there is no Simple Page key in the .properties file), supporting gettext-style localization.
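A much simplified sketch of the substitution and the gettext-style fallback follows. It is not the context-aware parser described above: it applies one uniform HTML escape to every inserted value, which is only adequate for element content and quoted attribute values. The names escapeHtml and localizeTemplate are hypothetical.

```javascript
const HTML_ESCAPES = { "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#39;" };

// Uniform escaping; a stand-in for the context-aware escaping described above.
function escapeHtml(text) {
  return text.replace(/[&<>"']/g, (ch) => HTML_ESCAPES[ch]);
}

// Replace each ${key} reference with its escaped, localized value.
// `strings` is a Map of property keys to values.
function localizeTemplate(html, strings) {
  return html.replace(/\$\{([^}]+)\}/g, (match, key) => {
    // gettext-style fallback: an unknown key stands for its own text.
    const value = strings.has(key) ? strings.get(key) : key;
    return escapeHtml(value);
  });
}
```

With the example .properties file, both <title>${title}</title> and <title>${Simple Page}</title> come out as <title>Simple Page</title>.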

Dynamic Localization of Text

For dynamic localization of text in pages, i.e. plaintext strings that are inserted into the DOM of a page by a page script after the page is loaded, the SDK injects an API into the pages that is like the one provided by the l10n CommonJS module for addon main programs.

[Alternative: give pages access to CommonJS modules and make the API available via the l10n CommonJS module.]

For example, given the .properties file mentioned previously, the statement:

alert(_("greeting"));

Would display an alert dialog with the text:

Hello, world!

References Within References

Property values are processed recursively, like entity values in DTD files, so they can embed references to other properties via the same syntax as the pages themselves.

For example, given the .properties file:

appName = Bamboozle
thankYou = Thank you for using ${appName}!

The statement:

alert(_("thankYou"));

Would display an alert dialog with the text:

Thank you for using Bamboozle!
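Recursive processing of property values could be sketched like this. The function name resolve is hypothetical, and both the cycle check and the fallback to the key for unknown references are assumptions that the proposal does not spell out.

```javascript
// Resolve a property value, expanding ${key} references recursively.
// `strings` is a Map of property keys to values; `seen` tracks the keys
// on the current resolution path so circular references can be detected.
function resolve(strings, key, seen = new Set()) {
  if (!strings.has(key)) return key; // assumption: unknown keys stand for themselves
  if (seen.has(key)) {
    throw new Error("Circular property reference: " + key);
  }
  seen.add(key);
  const value = strings
    .get(key)
    .replace(/\$\{([^}]+)\}/g, (match, ref) => resolve(strings, ref, seen));
  seen.delete(key); // leaving this key's resolution path
  return value;
}
```

A page-level _ function would then be a thin wrapper such as (key) => resolve(strings, key).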

[Alternative: rely on the conventional mechanism for embedding references in property values, which uses opaque identifiers like %S whose semantics can be specified via comments that are sometimes structured to facilitate machine readability. But parsing those comments is complex and brittle, and relying on opaque identifiers would complicate the JSON format by which localizations are shipped in addon XPIs.]

Non-goals

  1. Dynamic Localization of HTML: processing HTML content with locale-specific plaintext strings that is dynamically inserted into a page, à la jQuery Templates (unlike dynamic localization of the plaintext strings themselves, which is covered in the Dynamic Localization of Text section above). This doesn't seem essential for the initial implementation, although it may prove useful to implement in a later phase of development.
  2. Generic Template Processing: this proposal aims, instead, for interoperability with third-party template processors.

References