Intl

Steven R. Loomis edited this page Dec 15, 2015 · 2 revisions
What is Intl?

ECMAScript 402 (ECMA-402, the ECMAScript Internationalization API) describes the global Intl (short for Internationalization) object and other related functions and functionality.
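
As a rough illustration of what the Intl object offers, the sketch below formats a number for a given locale and feature-tests for Intl first, since a build without ICU does not define it. The helper name formatCount is hypothetical and used only for this example.

```javascript
// Feature-test for the global Intl object before using it.
// In a build configured with --with-intl=none, Intl is undefined.
function formatCount(value, locale) {
  if (typeof Intl !== 'object' || typeof Intl.NumberFormat !== 'function') {
    // Fall back to the locale-independent string form.
    return String(value);
  }
  return new Intl.NumberFormat(locale).format(value);
}

// Prints "1,234,567.891" when Intl is available.
console.log(formatCount(1234567.891, 'en-US'));
```

The fallback branch keeps the same code usable on a binary built without Intl, at the cost of unformatted output.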

Node.js (or more properly, the V8 engine) uses ICU4C to implement this Intl support in native C/C++. ICU's source is not included with Node's source repository or source distributions.

There are multiple ways to build Node with ICU. This page applies to v0.11.15+ and v0.12+.

Building

Building with a pre-installed ICU (system-icu)

Node can link against an ICU which is already installed on your system, either via make install in the ICU directory or via a package manager. pkg-config is used to locate ICU.

This option is not available on Windows.

Install ICU

.. From a package manager

Run one of these or similar as appropriate for your system:

 apt-get install libicu-dev
 yum install libicu-devel

.. From source

  • Download ICU source http://icu-project.org/download

  • Follow the enclosed readme.html to build ICU, particularly paying attention to the --prefix argument

  • Build ICU and then run:

    make install

Verify that ICU is installed

 pkg-config --modversion icu-i18n

If this command fails, node will not be able to find the installed ICU. Verify that PKG_CONFIG_PATH includes the directory containing the newly installed icu-i18n.pc file.
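
If you prefer to run this check from a script, the same pkg-config query can be wrapped in a few lines of JavaScript. This is only a sketch: the function name systemIcuVersion is hypothetical, and it assumes pkg-config is on the PATH (if it is not, the function reports null just as it does when ICU cannot be found).

```javascript
const { execFileSync } = require('child_process');

// Returns the ICU version reported by pkg-config, or null if pkg-config
// is missing or cannot locate the icu-i18n module.
function systemIcuVersion() {
  try {
    return execFileSync('pkg-config', ['--modversion', 'icu-i18n'], {
      encoding: 'utf8',
      stdio: ['ignore', 'pipe', 'ignore'],
    }).trim();
  } catch (err) {
    return null;
  }
}

const found = systemIcuVersion();
console.log(found === null
  ? 'icu-i18n not found; check PKG_CONFIG_PATH'
  : 'system ICU version: ' + found);
```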

Configure node with system-icu

 ./configure --with-intl=system-icu

Building Node with an embedded ICU

This section describes how to compile ICU as part of the Node build process. ICU is typically statically linked into Node and thus there are no further external dependencies.

full-icu vs small-icu

There are two different --with-intl modes with which to build ICU.

full-icu includes all locales which are available in the particular release of ICU.

small-icu includes a specific subset of locales, typically only English, but still supports the full API.
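
One way to tell the two modes apart at runtime is to format something in a non-English locale and see whether localized data comes back. The sketch below uses Spanish as the probe; the function name hasFullICU is only illustrative. It also returns true for a small-icu build that was customized or side-loaded with Spanish data, so treat it as a check for "Spanish data is present" rather than proof of a full-icu build.

```javascript
// Returns true when Spanish month names are available, which is the case
// for full-icu but not for a default (English-only) small-icu build.
function hasFullICU() {
  try {
    const january = new Date(9e8); // 11 January 1970 (UTC)
    const spanish = new Intl.DateTimeFormat('es', { month: 'long' });
    return spanish.format(january) === 'enero';
  } catch (err) {
    // Intl is missing entirely (for example --with-intl=none).
    return false;
  }
}

console.log(hasFullICU() ? 'Spanish locale data found' : 'no Spanish locale data');
```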

Configure Node with auto downloading

If the --download=all option is used (download-all on Windows), the full-icu and small-icu modes will attempt to download ICU's source from the Internet if it is not already present. If you omit the download option, Node will not attempt to download ICU. The downloaded ICU will be located in the deps/icu directory.

Unix/Macintosh: (either small or full ICU) choose one of:

./configure --with-intl=small-icu --download=all
./configure --with-intl=full-icu --download=all

Windows:

vcbuild small-icu download-all
vcbuild full-icu download-all

Configure Node with specific ICU source

You can find other ICU releases at the ICU homepage. Download the file named something like icu4c-##.#-src.tgz (or .zip).

Unix/Macintosh: from an already-unpacked ICU - you can omit --with-icu-source if you have unpacked ICU into deps/icu.

./configure --with-intl=[small-icu,full-icu] --with-icu-source=/path/to/icu

Unix/Macintosh: from a local ICU tarball

./configure --with-intl=[small-icu,full-icu] --with-icu-source=/path/to/icu.tgz

Unix/Macintosh: from a tarball URL

./configure --with-intl=[small-icu,full-icu] --with-icu-source=http://url/to/icu.tgz

Windows: first unpack the latest ICU archive, icu4c-##.#-src.tgz (or .zip), as deps/icu (you'll have: deps/icu/source/...)

vcbuild.bat small-icu|full-icu

Using and customizing the small-icu build

  • If you use the "small-icu" option, you can provide additional data at runtime.
    • Two methods:
      • The NODE_ICU_DATA env variable: env NODE_ICU_DATA=/some/path node
      • The --icu-data-dir parameter: node --icu-data-dir=/some/path
    • Example: If you use the path /some/path, then ICU 53 on Little Endian (l) finds:
    • Notes:
      • See u_setDataDirectory() and the ICU Users Guide for many more details.
      • "53l" will be "53b" on a big endian machine.
  • With the small-icu mode, you can also choose locales other than "English only" by passing them as an argument to configure. For example, --with-icu-locales=de,zh,fr will include only German, Chinese and French, but not English. The http://apps.icu-project.org/datacustom/ page lists the currently available locale IDs. This option is not available on Windows.
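
To work out which suffix ("53l", "53b", and so on) applies to the node binary you are running, something like the following sketch can help. The helper name icuDataSuffix is hypothetical; the sketch assumes the suffix is simply the ICU major version followed by "l" or "b", as described in the notes above.

```javascript
const os = require('os');

// Builds the ICU data suffix: the ICU major version followed by "l"
// (little endian) or "b" (big endian), for example "53l".
function icuDataSuffix(icuVersion, endianness) {
  const major = String(icuVersion).split('.')[0];
  return major + (endianness === 'LE' ? 'l' : 'b');
}

// process.versions.icu is only present when Node was built with ICU.
if (process.versions.icu) {
  console.log(icuDataSuffix(process.versions.icu, os.endianness()));
} else {
  console.log('this Node build does not report an ICU version');
}
```

Side-loaded data has to match this suffix, which is why a data file from a different ICU major version is not picked up.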

Building using Chromium's ICU

Note: not recommended! This build is missing some locales, is an older revision, and has a larger output size. It is documented here for completeness.

svn checkout --force --revision 214189 \
    http://src.chromium.org/svn/trunk/deps/third_party/icu46 \
    deps/v8/third_party/icu46
./configure --with-icu-path=deps/v8/third_party/icu46/icu.gyp
make
make install

Verifying an Intl build

  • node test/simple/test-intl.js is a built-in unit test of basic functionality.
  • btest402.js is a very basic but verbose test of whether Intl is built correctly.
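
For a quick look without running the test files, a short script can print what the binary itself reports. This is a minimal sketch, not a replacement for the tests above; the function name intlBuildReport is hypothetical.

```javascript
// Summarize what this Node binary reports about its Intl support.
function intlBuildReport() {
  const hasIntl = typeof Intl === 'object';
  return {
    hasIntl: hasIntl,
    // Undefined when Node was configured with --with-intl=none.
    icuVersion: process.versions.icu,
    // Locale that Intl resolves by default, if Intl is present.
    defaultLocale: hasIntl
      ? new Intl.DateTimeFormat().resolvedOptions().locale
      : undefined,
  };
}

console.log(intlBuildReport());
```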

Using Intl build

Updating

Updating Timezone data

From the ICU documentation:

Time zone data changes often in response to governments around the world changing their local rules and the areas where they apply. ICU derives its tz data from the IANA Time Zone Database.

The ICU project publishes updated timezone resource data in response to IANA updates, and these can be used to patch existing ICU installations. Several update strategies are possible, depending on the ICU version and configuration.

As the ICU User Guide states, it is possible to update time zone data (when ICU 54 or later is used) by:

  • Setting the ICU_TIMEZONE_FILES_DIR environment variable to point to some directory, such as /timezones.
  • Downloading the .res files from the appropriate subdirectory of the ICU TZ site (the 44/le directory for little endian machines or the 44/be directory for big endian machines) into the /timezones directory.
  • Restarting node. On its next start, it will use the .res files in the directory named by ICU_TIMEZONE_FILES_DIR to get the latest timezone data.
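
The steps above can be sketched from a launcher script that starts node with the variable set. The function names here are hypothetical, and process.versions.tz is an assumption: it is reported by newer Node releases and may be undefined in the versions this page covers.

```javascript
const { spawnSync } = require('child_process');

// Copy an environment and point ICU at a directory of updated
// time zone .res files.
function envWithTimezoneFiles(dir, baseEnv) {
  return Object.assign({}, baseEnv, { ICU_TIMEZONE_FILES_DIR: dir });
}

// Start a child node process with that environment and print the tz data
// version it reports (assumption: process.versions.tz is available).
function reportedTzVersion(dir) {
  const result = spawnSync(
    process.execPath,
    ['-p', 'process.versions.tz'],
    { env: envWithTimezoneFiles(dir, process.env), encoding: 'utf8' }
  );
  return result.stdout.trim();
}

// Example, using the /timezones directory from the steps above:
// console.log(reportedTzVersion('/timezones'));
```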

FAQ

Q: Why all of the options about configuration? What's the big deal?

ICU does a lot, and while it is highly customizable, by default the source code, object code, and data sizes are substantial. Therefore, it has had different treatment than other node.js dependencies to date.

Q: What do all of the build options mean?

Note that node.js builds statically by default.

  • --with-intl=none - this is the default as of v0.11.16. The Intl object and features aren't included.
  • --with-intl=system-icu - use pkg-config (.pc files) to locate an existing ICU that is already installed somewhere, and link to it (probably dynamically). This is a great option if you can do it, because it means you can share the ICU instance with other apps on your system.
  • --with-intl=full-icu - Take all of ICU, and make use of at least everything that Intl currently supports. Specifically, all locales that ship by default in ICU.
  • --with-intl=small-icu - Use a reduced set of ICU locales, by default just English. node.js is then monolingual, but:
    • the full API is available, so you can get started writing to Intl.
    • you can "side-load" a compatible ICU data file with a runtime option or environment variable.
  • --with-icu-path - Not recommended unless you specifically want to use Chromium's ICU.
  • --download=all / --download=icu - one of these options is needed to make configure go out and download ICU for you. (The configure options are listed, but the Windows vcbuild.bat options are similar.)

Q: By how much do the various options affect the on-disk sizes?

System tested:

  • cpu: x86_64
  • os: RHEL 6.6
  • node.js tag: v0.11.16

Results:

  • 16M ./configure --ninja --with-intl=none this is the default
  • 21M ./configure --ninja --with-intl=small-icu --download=all this is how v0.11.16 ships
    • ("side" data file is 25M)
  • 44M ./configure --ninja --with-intl=full-icu --download=all but this could be reduced in the future, see: #8979

Q: How do the pieces fit together? / How do I know who is responsible for the bug?

This is how the different pieces fit together to provide Intl support.

  • Node.js
    • configure / vcbuild.bat options for choosing ICU options
  • v8
    • receives ICU options
    • Implements Intl object (where?) and other functions
    • calls into ICU
  • ICU4C
    • implements NumberFormat, etc functions. Locale Data sourced from CLDR
    • implements normalization etc from Unicode data
  • CLDR
    • source of locale data (i.e. how do you spell Tuesday in Spanish)
  • Unicode
    • source of UCA (root/DUCET) collation
    • source of normalization data
    • source of character encoding
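
To make the layering a little more concrete, the sketch below touches three of the pieces from JavaScript: CLDR locale data (a weekday name), Unicode normalization data, and collation. It assumes a Node build that includes Intl; the function name intlLayersDemo is hypothetical.

```javascript
function intlLayersDemo() {
  // CLDR locale data, reached through ICU: the Spanish name of a known
  // Tuesday. This is "martes" when Spanish data is available; a default
  // small-icu build falls back to its own locale instead.
  const tuesday = new Date(Date.UTC(2015, 11, 15));
  const weekday = new Intl.DateTimeFormat('es', {
    weekday: 'long',
    timeZone: 'UTC',
  }).format(tuesday);

  // Unicode normalization data: "e" followed by a combining acute accent
  // composes to the single code point U+00E9 under NFC.
  const composed = 'e\u0301'.normalize('NFC') === '\u00e9';

  // Collation: a Collator orders "ä" next to "a", whereas a plain
  // code unit sort would place it after "z".
  const sorted = ['z', '\u00e4', 'a'].sort(new Intl.Collator('de').compare);

  return { weekday, composed, sorted };
}

console.log(intlLayersDemo());
```

If one of these results looks wrong, the comments indicate which layer (locale data, Unicode data, or collation) is the first place to look.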