CLDR data as versioned “peer” dependency #18

Closed
rxaviers opened this issue Sep 19, 2014 · 8 comments

Comments

@rxaviers
Owner

Goal

  1. Libraries should be able to define which CLDR versions (the data, not cldrjs itself) they are compatible with.
  2. Tools should assist with (a) fetching the data and (b) dependency management.

Winning approaches

npm + custom-downloader

Ideal for backend applications.

An npm module that uses a custom downloader. See the cldr-data npm module. The implementation of these modules was inspired by phantomjs.

bower + post-install hook (or grunt task)

Ideal for frontend applications.

A bower module that contains only the CLDR data zip URLs (really light). It works as follows. A project foo depends on several libraries with different CLDR data requirements, each declared via the cldr-data bower module in the library's own bower.json. When bower install is executed on project foo, bower resolves and flattens the cldr-data versions requested by each dependency and settles on a single cldr-data version that accommodates them all. A bower postinstall hook (e.g., cldr-data-downloader) or a grunt task (grunt-cldr-data-downloader) can then be used to download the data and populate the bower_components/cldr-data skeleton.
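
To make the "one version that accommodates them all" idea concrete, here is a minimal sketch. It is not bower's actual resolver: it assumes plain integer CLDR versions and only the operators >, >=, <, <= and exact match, and the names satisfies and resolveShared are hypothetical.

```javascript
// Simplified illustration only: integer CLDR versions, and the operators
// >, >=, <, <= or an exact match. This is NOT bower's real resolver.
function satisfies(version, range) {
  const match = /^(>=|<=|>|<)?\s*(\d+)$/.exec(range.trim());
  if (!match) {
    throw new Error("Unsupported range: " + range);
  }
  const operator = match[1] || "=";
  const wanted = Number(match[2]);
  switch (operator) {
    case ">": return version > wanted;
    case ">=": return version >= wanted;
    case "<": return version < wanted;
    case "<=": return version <= wanted;
    default: return version === wanted;
  }
}

// Returns the highest available version accepted by every range,
// or null when the requirements conflict.
function resolveShared(available, ranges) {
  const candidates = available
    .filter((version) => ranges.every((range) => satisfies(version, range)))
    .sort((a, b) => b - a);
  return candidates.length > 0 ? candidates[0] : null;
}

// Hypothetical requirements declared by three libraries used by project foo.
console.log(resolveShared([24, 25, 26, 27], [">=25", "<=26", ">24"])); // 26
console.log(resolveShared([24, 25, 26, 27], ["25", ">25"])); // null
```

A null result corresponds to the case where the libraries' requirements cannot be met by a single flat cldr-data version.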

See bower's cldr-data.

Unsuccessful approaches

npm mirror

❗ npm fails publishing the whole mirror. See comment below.

An npm module (e.g., cldr-data) contains all the CLDR JSON data. It follows the same version numbers as Unicode CLDR; for example, cldr-data v26 has the same data served by http://www.unicode.org/Public/cldr/26/json-full.zip.

Usage: a library defines the cldr-data dependency in its package.json:

"dependencies": {
  "cldr-data": "26"
}

Pros

  • Simplest solution using existing npm.

Cons

  • Big download size. CLDR v26 zipped is 51 MB. If the module also ships JS (wrapping the JSON using cldrjs) along with the JSON files themselves, the size grows further.
  • No flat dependency tree.

npm + cherry-pick fetch

❗ Fetching everything remotely takes way too long. See comment below.

An npm module (e.g., cldr-data) that follows the same version numbers as Unicode CLDR, but the module itself contains no CLDR data. It has an install.js script that npm executes during installation (the scripts/install directive) and that fetches the needed files. It is a kind of variant of the phantomjs approach; see https://gist.github.com/rxaviers/87e089c35d46fd3a1492.

Usage: a library defines the cldr-data dependency in its package.json, and it also needs to define which CLDR data set to fetch.

"dependencies": {
  "cldr-data": "26"
},
"_cldr": {
  "locales": [ "en", "zh", "es", "ar" ],
  "jsons": [
    "main/ca-gregorian",
    "supplemental/likelySubtags"
  ]
}
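
As a rough sketch of what an install script could do with such a whitelist, the snippet below expands it into file URLs. The helper name expandCldrSets is hypothetical, and the per-locale layout (main/<locale>/<name>.json) is an assumption based on the URL patterns quoted in the proof of concept in this thread; it is not the code from the gist.

```javascript
// Base URL taken from the proof of concept discussed in this issue.
const BASE_URL = "http://www.unicode.org/repos/cldr-aux/json/26/";

// Hypothetical helper: expands a "_cldr" whitelist into file URLs.
// Entries under "main/" are assumed to be per-locale; everything else
// is fetched once.
function expandCldrSets(config) {
  const urls = [];
  for (const entry of config.jsons) {
    const [group, name] = entry.split("/");
    if (group === "main") {
      for (const locale of config.locales) {
        urls.push(BASE_URL + "main/" + locale + "/" + name + ".json");
      }
    } else {
      urls.push(BASE_URL + entry + ".json");
    }
  }
  return urls;
}

const urls = expandCldrSets({
  locales: ["en", "zh", "es", "ar"],
  jsons: ["main/ca-gregorian", "supplemental/likelySubtags"]
});
console.log(urls.length); // 5
console.log(urls[0]);
// http://www.unicode.org/repos/cldr-aux/json/26/main/en/ca-gregorian.json
```

Each resulting URL is one HTTP request, which is the cost that the timing discussion later in this thread is concerned with.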

Pros

  • Saner CLDR data download size;

Cons

  • The need to whitelist CLDR sets. Obviously, we could also allow blacklists with "!", e.g. "!supplemental".
  • No flat dependency tree.

Question

  • Is the "_cldr" property in package.json the best place to keep that information?

bower mirror

❗ Installing a whole mirror works, but it takes a while and is tedious. See comment below.

A cldr-data repository that contains all the CLDR JSON data. It follows the same version numbers as Unicode CLDR; for example, cldr-data v26 has the same data served by http://www.unicode.org/Public/cldr/26/json-full.zip.

Usage: a library defines the cldr-data dependency in its bower.json:

"dependencies": {
  "cldr-data": "26"
}

Pros

  • Simplest solution using existing bower.

Cons

  • Big download size. CLDR v26 zipped is 51 MB. If the module also ships JS (wrapping the JSON using cldrjs) along with the JSON files themselves, the size grows further.

@rxaviers
Owner Author

I've implemented a proof of concept for the "npm + cherry-pick fetch" approach:

https://gist.github.com/rxaviers/87e089c35d46fd3a1492.

The script fetches an initial URL (e.g., http://www.unicode.org/repos/cldr-aux/json/26/) and crawls its content looking for other URLs that match a glob pattern (e.g., http://www.unicode.org/repos/cldr-aux/json/26/main/*/numbers.json, or http://www.unicode.org/repos/cldr-aux/json/26/**/numbers.json).
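
The glob filtering step can be sketched as follows. This is not the code from the gist: globToRegExp is a hypothetical helper, and it assumes that "*" matches within a single path segment while "**" matches across segments.

```javascript
// Hypothetical helper: turns a URL glob into a regular expression.
// "**" matches across path segments, "*" stays within one segment.
function globToRegExp(glob) {
  const source = glob
    .split("**")
    .map((part) =>
      part
        .split("*")
        // Escape regex metacharacters in the literal pieces.
        .map((literal) => literal.replace(/[.+?^${}()|[\]\\]/g, "\\$&"))
        .join("[^/]*")
    )
    .join(".*");
  return new RegExp("^" + source + "$");
}

const base = "http://www.unicode.org/repos/cldr-aux/json/26/";
// Example list of URLs as a crawler might have discovered them.
const crawled = [
  base + "main/en/numbers.json",
  base + "main/en/ca-gregorian.json",
  base + "main/zh/numbers.json",
  base + "supplemental/likelySubtags.json"
];

const pattern = globToRegExp(base + "main/*/numbers.json");
console.log(crawled.filter((url) => pattern.test(url)).length); // 2
```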

Conclusion: crawling the content is pretty quick. However, making multiple requests to crawl and fetch the above content takes way too long. In terms of speed, the simpler approach (fetching the whole set) is much better.

@rxaviers
Owner Author

I've mirrored the whole CLDR JSON v26 into a GitHub repository. Then I tried to publish it as an npm module, but it failed:

util.js:35
  var str = String(f).replace(formatRegExp, function(x) {
                      ^
RangeError: Maximum call stack size exceeded

Fetching the full mirror via bower works, but it's tedious.

@rxaviers
Owner Author

I've just created a CLDR JSON downloader: https://github.com/rxaviers/cldr-data-downloader

@raphamorim

@rxaviers, does this CLDR JSON downloader repository replace what that script did?

Plus: this module is working fine for me.

@rxaviers
Owner Author

@raphamorim yep. That script was my initial attempt to cherry-pick the files, that is, a custom downloader. But that didn't work well.

@raphamorim

@rxaviers, I've only tested the guide in your README. Is the goal that, when this module runs, it automatically identifies and downloads the version defined in package.json?

@rxaviers
Owner Author

Both cldr-data and cldr-data-full npm modules have been created. They address the goal of this issue as follows.

  1. Libraries should be able to define which CLDR versions (the data, not cldrjs itself) they are compatible with.

An i18n library declares which CLDR versions it is compatible with in its package.json:

"dependencies": {
  "cldr-data": ">=26"
}

  2. Tools should assist with (a) fetching the data and (b) dependency management.

The appropriate CLDR JSON data will be fetched with npm install.

Node.js users can access the data by using require("cldr-data").

var cldr = require("cldr-data");
var plurals = cldr("supplemental/plurals");

It's ideal to use cldr-data in conjunction with cldrjs.

var Cldr = require("cldrjs");
var cldr = require("cldr-data");

Cldr.load(cldr("supplemental/plurals"));

For more info, see the README.

@rxaviers
Owner Author

Here is a comparison of installation times for the core coverage data. Note that with the full coverage data, the GitHub mirror approaches become unusable.

method                      time
npm mirror                  3m57.674s
bower mirror                1m1.001s
npm + custom-downloader     0m8.958s
bower + custom-downloader   0m9.506s
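
To put the table in relative terms, the following sketch converts each wall-clock time to seconds and prints how many times slower each method is than the fastest one. The helper name toSeconds is hypothetical; the durations are copied from the table above.

```javascript
// Converts a `time`-style duration such as "3m57.674s" into seconds.
function toSeconds(duration) {
  const match = /^(\d+)m([\d.]+)s$/.exec(duration);
  if (!match) {
    throw new Error("Unexpected duration: " + duration);
  }
  return Number(match[1]) * 60 + Number(match[2]);
}

// Wall-clock ("real") times copied from the table above.
const timings = {
  "npm mirror": "3m57.674s",
  "bower mirror": "1m1.001s",
  "npm + custom-downloader": "0m8.958s",
  "bower + custom-downloader": "0m9.506s"
};

const fastest = Math.min(...Object.values(timings).map(toSeconds));
for (const [method, duration] of Object.entries(timings)) {
  const ratio = toSeconds(duration) / fastest;
  console.log(method + ": " + ratio.toFixed(1) + "x");
}
// npm mirror: 26.5x
// bower mirror: 6.8x
// npm + custom-downloader: 1.0x
// bower + custom-downloader: 1.1x
```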

Below is the output I got from running each command. Feel free to execute them yourself.

npm mirror

$ time npm install rxaviers/cldr-data#b0.0.1
cldr-data@0.0.1-alpha.3 node_modules/cldr-data

real    3m57.674s
user    3m27.044s
sys     1m14.285s

bower mirror

$ time bower install rxaviers/cldr-data#b0.0.1
[?] May bower anonymously report usage statistics to improve the tool over time? No
bower not-cached    git://github.com/rxaviers/cldr-data.git#b0.0.1
bower resolve       git://github.com/rxaviers/cldr-data.git#b0.0.1
bower checkout      cldr-data#b0.0.1
bower invalid-meta  cldr-data is missing "main" entry in bower.json
bower invalid-meta  cldr-data is missing "ignore" entry in bower.json
bower resolved      git://github.com/rxaviers/cldr-data.git#1aeff0b182
bower install       cldr-data#1aeff0b182

cldr-data#1aeff0b182 bower_components/cldr-data

real    1m1.001s
user    0m46.573s
sys     0m15.770s

npm + custom-downloader

$ time npm install cldr-data
> cldr-data@26.0.4 install /tmp/x/node_modules/cldr-data
> node install.js

GET `http://www.unicode.org/Public/cldr/26/json.zip`
  [========================================] 100% 0.0s
Received 3425K total.
Unpacking it into `./json`
cldr-data@26.0.4 node_modules/cldr-data
└── cldr-data-downloader@0.1.0 (progress@1.1.8, q@1.0.1, request-progress@0.3.1, nopt@3.0.1, mkdirp@0.5.0, adm-zip@0.4.4, npmconf@2.0.9, request@2.44.0)

real    0m8.958s
user    0m7.022s
sys     0m0.978s

bower + custom-downloader

Requires setting up .bowerrc.

$ time bower install cldr-data
bower not-cached    git://github.com/rxaviers/cldr-data-bower.git#*
bower resolve       git://github.com/rxaviers/cldr-data-bower.git#*
bower download      https://github.com/rxaviers/cldr-data-bower/archive/26.0.2.tar.gz
bower extract       cldr-data#* archive.tar.gz
bower invalid-meta  cldr-data is missing "ignore" entry in bower.json
bower resolved      git://github.com/rxaviers/cldr-data-bower.git#26.0.2
bower preinstall    npm install cldr-data-downloader
bower install       cldr-data#26.0.2
bower postinstall   node ./node_modules/cldr-data-downloader/bin/download.js -i bower_components/cldr-data/index.json -o bower_components/cldr-data/

cldr-data#26.0.2 bower_components/cldr-data

real    0m9.506s
user    0m7.855s
sys     0m1.135s
