Adopt globalize for i18n #1494

Closed
mikeal opened this issue Apr 21, 2015 · 15 comments
Labels
discuss Issues opened for discussions and feedbacks.

Comments

@mikeal
Contributor

mikeal commented Apr 21, 2015

The jQuery team, along with several other people working on i18n and JS standards, have put together a new library for i18n.

https://github.com/jquery/globalize

We've had a lot of conversations in the TC about how taking on ICU is "too big" for core, and that we would prefer a more modular approach that lets us load language support on demand, but nobody has written this yet. Globalize would appear to be at least part of the solution.

Thoughts?

@jasnell
Member

jasnell commented Apr 22, 2015

@srl295

@jasnell
Member

jasnell commented Apr 22, 2015

My first question would be: why? While globalize is a great project, there is already an ECMAScript standard Intl interface that is supported in V8 based on ICU and the Intl stuff has already been switched on in Node v0.12.x. @srl295 has gone to significant lengths to minimize the default footprint of ICU and to modularize the data to make it possible to use npm to install the additional CLDR/ICU data files. Globalize is fantastic to supplement the functionality currently not supported by the ECMAScript Intl API, but developers can already make use of that without io.js or node.js having to do anything in core.
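
For readers who have not used the API mentioned here, a minimal sketch of the standard Intl interface follows. It assumes a runtime built with Intl (ICU) support; the sample number and the `de-DE` request are made up for illustration, and what the second call resolves to depends on which ICU data the process was started with.

```javascript
// Sketch: formatting with the standard ECMAScript Intl API.
// Assumes the runtime was built with Intl (ICU) support.
const amount = 1234567.891;

// English locale data is normally present even in reduced ICU builds.
const enNumber = new Intl.NumberFormat('en-US').format(amount);

// resolvedOptions() reports which locale was actually used, which
// exposes a silent fallback when data for the request is missing.
const requested = 'de-DE';
const resolved = new Intl.NumberFormat(requested).resolvedOptions().locale;

console.log(enNumber);
console.log(`requested ${requested}, resolved ${resolved}`);
```

Comparing the requested and resolved locale is one way to see, from inside a program, how much data the runtime actually has.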

@piscisaureus
Contributor

👍

Some remarks:

@piscisaureus
Contributor

there is already an ECMAScript standard Intl interface that is supported in V8 based on ICU and the Intl stuff has already been switched on in Node v0.12.x.

But it has deliberately not been turned on in io.js, because we weren't happy with the solution.

@srl295 has gone to significant lengths to minimize the default footprint of ICU and to modularize the data to make it possible to use npm to install the additional CLDR/ICU data files.

Although it is now possible to fetch those data files with npm, it hasn't really been modularized. Node needs to be started up with particular command line arguments (or with an environment variable set), by which the ICU data that will be used globally is specified. It's not possible for a module that needs ICU data to load it on demand.
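
To make the limitation concrete: since the data is fixed at startup (through `--icu-data-dir`, or in Node the `NODE_ICU_DATA` environment variable), about all a module can do is inspect what it was given. The sketch below assumes a runtime with Intl enabled; `hasLocaleData` and `formatDate` are hypothetical helper names, not part of any library.

```javascript
// Sketch: a module cannot load ICU data itself, but it can check what
// the process was started with and degrade explicitly.
function hasLocaleData(locale) {
  // supportedLocalesOf() returns the subset of the requested locales
  // that the runtime can serve without falling back to its default.
  return Intl.DateTimeFormat.supportedLocalesOf([locale]).length > 0;
}

function formatDate(date, locale) {
  // Fall back to English on purpose rather than silently.
  const effective = hasLocaleData(locale) ? locale : 'en-US';
  return new Intl.DateTimeFormat(effective, { timeZone: 'UTC' }).format(date);
}

console.log(hasLocaleData('en-US'));
console.log(formatDate(new Date(Date.UTC(2015, 3, 22)), 'en-US'));
```

This only detects missing data; it does not address the actual complaint, which is that the module has no way to bring the data in itself.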

@jasnell
Member

jasnell commented Apr 22, 2015

Ok, that's fine. Additional modularization is something that can be explored, but let's not discount the work that's been done so far. Incremental improvement is A Good Thing.

So, right now, for very important performance reasons, ICU does a one-time initialization of its data files and memory maps everything. The downside, as you point out, is that the data files have to be specified at startup, with modules getting whatever they get from the environment.

So let's explore what this "load on demand" would mean...

Globalize currently depends on 'cldr-data'. When you npm install cldr-data, it goes out and downloads all of the CLDR data:

bash-3.2$ npm install cldr-data
npm WARN package.json a@ No description
npm WARN package.json a@ No repository field.
npm WARN package.json a@ No README data
npm WARN package.json http-problem@0.0.1 No repository field.
> cldr-data@27.0.3 install /Users/james/tmp/node_modules/cldr-data
> node install.js

GET `https://github.com/unicode-cldr/cldr-core/archive/27.0.3.zip`
GET `https://github.com/unicode-cldr/cldr-dates-modern/archive/27.0.3.zip`
GET `https://github.com/unicode-cldr/cldr-cal-buddhist-modern/archive/27.0.3.zip`
GET `https://github.com/unicode-cldr/cldr-cal-chinese-modern/archive/27.0.3.zip`
GET `https://github.com/unicode-cldr/cldr-cal-coptic-modern/archive/27.0.3.zip`
GET `https://github.com/unicode-cldr/cldr-cal-dangi-modern/archive/27.0.3.zip`
GET `https://github.com/unicode-cldr/cldr-cal-ethiopic-modern/archive/27.0.3.zip`
GET `https://github.com/unicode-cldr/cldr-cal-hebrew-modern/archive/27.0.3.zip`
GET `https://github.com/unicode-cldr/cldr-cal-indian-modern/archive/27.0.3.zip`
GET `https://github.com/unicode-cldr/cldr-cal-islamic-modern/archive/27.0.3.zip`
GET `https://github.com/unicode-cldr/cldr-cal-japanese-modern/archive/27.0.3.zip`
GET `https://github.com/unicode-cldr/cldr-cal-persian-modern/archive/27.0.3.zip`
GET `https://github.com/unicode-cldr/cldr-cal-roc-modern/archive/27.0.3.zip`
GET `https://github.com/unicode-cldr/cldr-localenames-modern/archive/27.0.3.zip`
GET `https://github.com/unicode-cldr/cldr-misc-modern/archive/27.0.3.zip`
GET `https://github.com/unicode-cldr/cldr-numbers-modern/archive/27.0.3.zip`
GET `https://github.com/unicode-cldr/cldr-segments-modern/archive/27.0.3.zip`
GET `https://github.com/unicode-cldr/cldr-units-modern/archive/27.0.3.zip`
  [========================================] 17271320/17264454 100% 0.0s
Received 28753K total.
Unpacking it into `./`
cldr-data@27.0.3 node_modules/cldr-data
└── cldr-data-downloader@0.2.2 (progress@1.1.8, q@1.0.1, adm-zip@0.4.4, request-progress@0.3.1, nopt@3.0.1, mkdirp@0.5.0, npmconf@2.0.9, request@2.53.0)
bash-3.2$ 

Only once everything is downloaded can you load it in a "modular" way, by cherry-picking exactly which downloaded files to load into memory. Regardless of what you end up pulling in using require, you end up having to download everything. (By the way, running du -hs node_modules/cldr-data shows 239M.)
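
As a rough illustration of what cherry-picking means here, the sketch below lists the JSON files one locale would need for a given feature. The relative paths follow the unicode-cldr JSON layout as described in the Globalize README, which is an assumption worth checking against the data version in use; `cldrFilesFor` is a hypothetical helper, not part of cldr-data or Globalize.

```javascript
// Sketch: choosing which CLDR JSON files to read for one locale.
// Paths are assumed from the unicode-cldr JSON layout; verify them
// against the CLDR version you actually have on disk.
const SUPPLEMENTAL = {
  core: ['supplemental/likelySubtags.json'],
  number: ['supplemental/numberingSystems.json'],
  date: ['supplemental/timeData.json', 'supplemental/weekData.json'],
};
const MAIN = {
  core: [],
  number: ['numbers.json'],
  date: ['ca-gregorian.json', 'timeZoneNames.json'],
};

// Hypothetical helper: returns relative paths, reads nothing from disk.
function cldrFilesFor(locale, features) {
  const files = [];
  for (const feature of ['core', ...features]) {
    files.push(...SUPPLEMENTAL[feature]);
    files.push(...MAIN[feature].map((name) => `main/${locale}/${name}`));
  }
  return files;
}

console.log(cldrFilesFor('pt', ['number']));
```

The list for a single locale and feature is only a handful of files, which is the contrast with the full download shown in the log.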

@jasnell
Member

jasnell commented Apr 22, 2015

Further, it's not exactly clear how the "load on demand" model would actually work here. Globalize is written the way it is in order to keep from having to load the entire CLDR dataset on a client-side connection. However, when used on the server side in node, we end up downloading the entire set anyway as part of the cldr-data installation (the ecma402 shim is no different in this regard). So it's not exactly clear what the advantage is on the server side. Perhaps you could take a few minutes to draw out how the load-on-demand model would / should work?

@rxaviers

Globalize currently depends on 'cldr-data'

Nope. Globalize uses whatever CLDR source you provide, not necessarily from cldr-data (note it's listed as a peer dependency, not a direct dependency). Therefore, although you can use cldr-data for convenience, you don't need to. For example, one could use https://github.com/unicode-cldr/ as source.

@rxaviers

Further, it's not exactly clear how the "load on demand" model would actually work here

Globalize needs CLDR content to function properly, although it doesn't embed or host such content. Instead, Globalize empowers developers to load CLDR data the way they want. Vanilla CLDR in its official JSON format (no pre-processing) is expected to be provided (via Globalize.load(<json>)). Developers can use up-to-date CLDR data directly from Unicode as soon as it's released, without having to wait for any pipeline on our side.
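
One way to picture load on demand with this design is a small wrapper that fetches a locale's JSON the first time it is asked for. This is a sketch under assumptions, not Globalize's own mechanism: `createLocaleLoader` is a hypothetical helper, and `readJson` and `load` are injected stand-ins so the example runs on its own. In a real program `load` would be Globalize.load and `readJson` would read from whichever CLDR source was chosen.

```javascript
// Sketch: load CLDR JSON for a locale the first time it is requested.
// `readJson` and `load` are injected so the sketch runs standalone.
function createLocaleLoader(readJson, load) {
  const loaded = new Set();
  return function ensureLocale(locale) {
    if (loaded.has(locale)) return false; // already in memory
    load(readJson(`main/${locale}/numbers.json`));
    loaded.add(locale);
    return true; // data was loaded by this call
  };
}

// Stand-ins for a CLDR source and for the library's load function.
const reads = [];
const fakeReadJson = (path) => { reads.push(path); return { path }; };
const store = [];
const ensureLocale = createLocaleLoader(fakeReadJson, (json) => store.push(json));

ensureLocale('en');
ensureLocale('pt');
ensureLocale('en'); // cached, so no second read
console.log(reads);
```

Because the data source is just a function, it could equally read from disk, from a database, or over the network.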

I'm happy to answer any Globalize question. Please just let me know if I can help with something.

@jasnell
Member

jasnell commented Apr 22, 2015

Ok, that's fair (and thanks for that reminder @rxaviers ). Like I said, better modularization in ICU is definitely something that can be worked on. I'm just not exactly clear what the overall benefit would be of having io.js "adopt" globalize vs. incrementally improving the ICU-based solution, particularly given the V8 support that already exists and given that there's absolutely nothing stopping developers from already using globalize if they want. In other words, why would io.js need to do anything in core with regard to globalize?

@jasnell
Member

jasnell commented Apr 22, 2015

(btw, my apologies for misspeaking... I'd actually forgotten that cldr-data was a peer-dependency)

@piscisaureus
Contributor

Like I said, better modularization in ICU is definitely something that can be worked on.

It seems that it would require some serious work on libicu so that multiple "instances" can be constructed (as opposed to there being one singleton instance).

Are there any plans to that end? A quick search of the website/mailing list didn't turn up anything (there was some discussion around ICU4J but not for the C++ implementation).

@jasnell
Member

jasnell commented Apr 22, 2015

@srl295 would be able to say for certain how much work would be involved, but the "fix" would be to allow multiple core data files, one for each locale. The --icu-data-dir mechanism already allows multiple paths to be specified; the challenge is that the ICU data loader stops on the first core file found. That's the change we'd need to make. It's definitely something I could look into doing.
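
The difference between the current lookup and the proposed one can be sketched with a toy model. This is not ICU's code: the directory names and the `icu.dat` file name are hypothetical, and `exists` is injected so the example needs no real files.

```javascript
// Toy model of the lookup described above; this is not ICU's code.
function firstMatch(dirs, fileName, exists) {
  // Behaviour as described: stop at the first directory that has the file.
  for (const dir of dirs) {
    if (exists(`${dir}/${fileName}`)) return [`${dir}/${fileName}`];
  }
  return [];
}

function allMatches(dirs, fileName, exists) {
  // Proposed behaviour: keep going and collect every data file found.
  return dirs.map((dir) => `${dir}/${fileName}`).filter((p) => exists(p));
}

// Hypothetical directories and file name, for illustration only.
const present = new Set(['/data/en/icu.dat', '/data/de/icu.dat']);
const dirs = ['/data/en', '/data/fr', '/data/de'];
const exists = (p) => present.has(p);

console.log(firstMatch(dirs, 'icu.dat', exists));
console.log(allMatches(dirs, 'icu.dat', exists));
```

In the first case the German file is never seen; in the second, each directory could contribute one locale's data.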

@mikeal
Contributor Author

mikeal commented Apr 22, 2015

It looks like there's a lot of ecosystem work going on around i18n.

  • @srl295 has published icu npm packages
  • there's the existing ecma402 polyfill
  • globalize and its underlying cldrjs
  • new work in the ecma402 group to port some of the features from globalize into the next version of ecma402
  • icu bindings to Intl object in node.js 0.12 and 0.13

With so much going on I think it's a bad idea to "pick one." It would probably be best to find a way for developers to bind the library of their choice to Intl in userland. I don't know how doable this is, but maybe it's time we ping the V8 team about this.
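
At the plain JavaScript level, binding a library to Intl in userland could look something like the sketch below. `bindIntl` is a hypothetical helper, and `provider` stands for any object implementing the ECMA-402 constructors (a polyfill, or a wrapper around another i18n library). Whether V8 could support anything comparable inside the engine is the open question.

```javascript
// Sketch: let the application decide what backs `Intl`.
// bindIntl is a hypothetical helper, not an existing API.
function bindIntl(target, provider, { override = false } = {}) {
  if (target.Intl && !override) return target.Intl; // keep what is there
  target.Intl = provider;
  return target.Intl;
}

// Stand-in provider exposing a single constructor for the demo.
const provider = {
  NumberFormat: class {
    format(n) { return `#${n}`; }
  },
};

const sandbox = {}; // a plain object instead of the real global
bindIntl(sandbox, provider);
console.log(new sandbox.Intl.NumberFormat().format(42));
```

Passing the real global object as `target` would turn this into the usual polyfill pattern; the `override` flag is there because an application might prefer its own library even when a built-in exists.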

@rxaviers

Definitely, there are. A little more about that farm in https://github.com/rxaviers/javascript-globalization/

@mikeal, could you please describe in a little more detail what i18n support io.js needs?

@Fishrock123 added the discuss label Apr 28, 2015
@Fishrock123
Contributor

Converging is going to require us to turn on some sort of Intl by default, and since we already have most of that, just not on by default, and that's where the work is going to be, I'm going to close this out and defer to #26. Re-open if necessary though.

5 participants