Fixes #66 Manage CLDR as peer dependency #68

rxaviers · 2014-09-25T18:42:19Z

Ongoing summary:

Manage CLDR data as peer dependency
Avoid duplicating CLDR data / avoid mirroring CLDR data in here
- Test requires sibling projects using specific file names or it fails #73 -- @rxaviers
- Fix failing tests -- @JCEmmons
- Have CLDR to include supplemental/aliases.json or supplemental/localeAliases.json via http://unicode.org/cldr/trac/ticket/8040 -- @JCEmmons
- post-install hook to filter all the fields required by a given library (cldr/config now lives in cldr-extra/config)
  - define the filtering syntax. (needs review)
  - implement it.
- There is also (at least) one hardcoded list of locales in ecma402 tests that may also need to be changed.

Follow up:

Related question, json-full.zip brings duplicate identical data, e.g., en and en-US. They are identical given likelySubtags. Also, removing en-US likely subtags gives en. @JCEmmons, please correct me if I'm wrong. Why are they included? Shouldn't en suffice? This also applies to pt vs. pt-BR and probably a bunch of others.

Manage CLDR data as peer dependency

Since ibm-js/ecma402 already suggests bower usage to manage and install dependencies, the first change this PR does is to also manage cldr-data using the same tool.

// bower.json
{
  "dependencies": {
    "cldr-data": ">= 25",
    ...
  }
}

At this point, user is able to spot conflicting cldr-data peer dependency between this and other i18n libraries that leverage cldr-data (for example jquery/globalize, which has parsing methods that complements ecma402).

Avoid duplicating CLDR data everywhere

The second step this PR could take, this is optional, is to rely on the above bower setup to avoid having CLDR data embedded in this library.

The benefits of this change are:

Never again have the trouble to mirror the CLDR JSON data yourself. Bower can
be used for that. By you (developing it), or by users (using it).
Avoid having CLDR data duplicated on each i18n library. Using the same example above (end application that depends on both ecma-402 and globalize), having both libraries to use the same peer dependency avoids having the same data duplicated on both libraries. It's avoided to have the same CLDR data on the two libraries. Yeap, I'm being repetitive. You see, duplication is awful :P. Note it's not only a matter of saving disk space during development. But, it's also about flattening the dependencies when serving to the client.

The changes involved here are:

rm -Rf cldr

//.bowerrc
{
  "scripts": {
    "preinstall": "npm install cldr-data-downloader",
    "postinstall": "./node_modules/cldr-data-downloader/bin/download.js -i bower_components/cldr-data/index.json -o bower_components/cldr-data/"
  }
}

require.config({
  path: {
    cldr: "bower_components/cldr-data"
  }
});

Changes of the form:

- "requirejs-text/text!./cldr/supplemental/timeData.json",
+ "requirejs-text/text!cldr/supplemental/timeData.json",

Avoid duplicating CLDR logic (and save runtime memory)

This is about a step further and it's about our both projects using the same foundation low level CLDR traverser cldrjs. But, I leave this for a next iteration after we talk about the above.

Obviously, my PR isn't complete. I want to discuss before implementing.
Fixes #66

cjolif · 2014-09-26T09:48:08Z

@rxaviers this is definitely going in an interesting direction. However if I'm not mistaken @JCEmmons is post-processing the CDLR for ecma402 use (see https://github.com/ibm-js/ecma402/tree/master/cldr/config), so how do you envision this to work here? I suppose this would require yet another post-install hook?

rxaviers · 2014-09-26T12:59:10Z

The post-processing described in cldr/config has the following goal, as I've interpreted it: avoid user to download unnecessary CLDR fields. Please, correct me if I'm wrong.

Given the above concern, I assume we're all in the same page about going against data duplication, or we would end up counting coffee sugar calories, but eating a whole cake later. :P

The fine tuning is indeed a valid goal. Not only for ibm-js/ecma402, but for all i18n projects. So, why don't we extend this for all of them? We could establish a declarative way to define the "field's filter" and have the post-install hook to handle it as well.

To the fine tuning debate, I add one more question. Can we fine tune based on usage? E.g., a user formatting hours (not dates) new Intl.DateTimeFormat("en", {hour: "2-digit", minute: "2-digit"}) perhaps doesn't need a bunch of CLDR fields. Going with the post-install hook approach, the default could be picking all fields the library needs. But, user could customize that filter for the pieces he actually needs if he will.

Please, just let me know on any questions.

JCEmmons · 2014-09-26T15:42:18Z

I'm willing to take a look at this, but to be honest, I haven't had much time to do so yet. Now that CLDR 26 is out the door, perhaps I can carve out some time to look at better ways to do this, as Rafael suggests. For the time being, I have updated all of our data here to the 26 release, and updated a few testcases accordingly, so we should be all current for the next 6 months or so.

cjolif · 2014-09-30T16:22:07Z

The post-processing described in cldr/config has the following goal, as I've interpreted it: avoid user to download unnecessary CLDR fields. Please, correct me if I'm wrong.

Right we also ignore some of the .json files entirely.

The fine tuning is indeed a valid goal. Not only for ibm-js/ecma402, but for all i18n projects. So, why don't we extend this for all of them? We could establish a declarative way to define the "field's filter" and have the post-install hook to handle it as well.

That's definitely a good idea.

To the fine tuning debate, I add one more question. Can we fine tune based on usage? E.g., a user formatting hours (not dates) new Intl.DateTimeFormat("en", {hour: "2-digit", minute: "2-digit"}) perhaps doesn't need a bunch of CLDR fields. Going with the post-install hook approach, the default could be picking all fields the library needs. But, user could customize that filter for the pieces he actually needs if he will.

I suppose that would be a second step. I guess the first step is to make the post-install hook works with all the fields required by a given library. A second step could be to offer fine grain tuning on what the user is exactly leveraging the library.

One concern I have though is, if each lib as different post install hook, then on the client we will end up with duplicating the data again? Or should the various post install hooks be able to work collaboratively and come up with a version that covers all the libraries used in a given application? (not sure how to achieve that).

rxaviers · 2014-09-30T20:10:45Z

I suppose that would be a second step. I guess the first step is to make the post-install hook works with all the fields required by a given library. A second step could be to offer fine grain tuning on what the user is exactly leveraging the library.

Exactly.

One concern I have though is, if each lib has different post install hook, then on the client we will end up with duplicating the data again? Or should the various post install hooks be able to work collaboratively and come up with a version that covers all the libraries used in a given application? (not sure how to achieve that).

The post install hook is set on a given application. Each library only declares its CLDR needs. Exemplifying: https://gist.github.com/rxaviers/dbd2e56840c5c7a508e8

cjolif · 2014-10-01T09:31:27Z

Thanks for the sample.

I guess cldr.json vs bowser.js with cldr key is an implementation detail and we can sort out everything and decide on that later.

The important point is making sure the filtering syntax is expressive enough to cover current use-cases like the following https://github.com/ibm-js/ecma402/blob/master/cldr/config/ecma402_cldr_ca_buddhist_config.txt as well as making sure the syntax makes the merge operation as easy as possible.

rxaviers · 2014-10-01T11:05:05Z

The important point is making sure the filtering syntax is expressive enough to cover current use-cases like the following https://github.com/ibm-js/ecma402/blob/master/cldr/config/ecma402_cldr_ca_buddhist_config.txt as well as making sure the syntax makes the merge operation as easy as possible.

👍

rxaviers · 2014-10-07T12:39:54Z

Considering we are all ok with the idea of "Avoid duplicating CLDR data" according to our last meeting, I will give this PR a next round of changes and will let you know.

Fixes ibm-js#73

rxaviers · 2014-10-07T18:55:31Z

Two supplemental files cannot be find in any CLDR v26 json zip files:

supplemental/aliases.json
supplemental/localeAliases.json

Am I missing something, are they available somewhere? Should we update http://unicode.org/cldr/trac/ticket/7968 with that request?

Ref ibm-js#66 Ref ibm-js#68

rxaviers · 2014-10-07T20:40:18Z

I've pushed a new commit that addresses the CLDR data duplication. It implements what's described in "Avoid duplicating CLDR data everywhere" above. It's not finished though. These items need to be addressed:

Fix failing tests, 🆘 (I need someone to take a look at this, see below)
Find supplemental/aliases.json or supplemental/localeAliases.json. 🆘 (They're not present on any json zip files, where can we find them?)
post-install hook to filter all the fields required by a given library (cldr/config now lives in cldr-extra/config)
- define the filtering syntax.
- implement it.

I will bring an initial definition for the filtering syntax above. I need a help on the two 🆘 items above.

Failing tests:

Warning: Test main - testAbstractOperations - matcherFunctions FAILED on firefox 32.0.3 on LINUX:
AssertionError: BestFitMatcher() should return the correct locale for language tag "en-US": expected 'en-US' to equal 'en'
No stack
Test main - 9.1 - Test_9_1_b FAILED on firefox 32.0.3 on LINUX:
AssertionError: Locale zh-Hans-CN is supported, but fallback zh-CN isn't.: expected -1 to not equal -1 (Testing with NumberFormat.)
No stack Use --force to continue.

rxaviers · 2014-10-08T11:21:00Z

@JCEmmons, where can I find supplemental/aliases.json or supplemental/localeAliases.json in the CLDR json zip files please?

rxaviers · 2014-10-09T02:07:56Z

I will bring an initial definition for the filtering syntax above

I've extended https://gist.github.com/rxaviers/dbd2e56840c5c7a508e8 to include a proposal for the fields filtering (see it at the end of the gist). Is there any currently-in-use-configuration not covered?

clmath · 2014-10-31T14:58:28Z

Needs #73 (done, needs to land on master)

See #73 (comment).

rxaviers · 2014-11-13T12:48:35Z

See #73 (comment).

It could be skipped then.

@JCEmmons, I still need your answer to the question above.

@clmath, I need your help to sort out the failing tests above.

clmath · 2014-11-14T17:02:09Z

Hi Rafael,

After my testing of your fork, I am pretty sure the problem with the tests is that ecma402 is relying on a custom version of the cldr while cldr-data install the official version.
So some locales are in ecma402/cldr but not in cldr-data and vice versa which lead to different locale fallback causing the tests to fail.
There is also (at least) one hardcoded list of locales in ecma402 tests that may also need to be changed.

I have no idea of why ecma402 is not using the official cldr data and I don't have enough knowledge in locales fallback to correct the tests. So I think we need your help @JCEmmons.

rxaviers · 2014-11-14T17:03:28Z

Thank you @clmath.

JCEmmons · 2014-11-17T21:24:33Z

The supplemental/aliases.json and supplemental/localeAliases.json are alias files that aren't in the official CLDR distribution. But they probably need to be added to it. For the time being, you can pull them from https://github.com/JCEmmons/cldr-ecma402 until they are added to CLDR's configuration.

JCEmmons · 2014-11-17T21:31:19Z

In response to @clmath 's comment "I have no idea of why ecma402 is not using the official cldr data and I don't have enough knowledge in locales fallback to correct the tests. " - Is that the "official cldr data" has a lot of fields that ECMA-402 doesn't need. So when I generate the customized data for ECMA-402, I strip out the stuff that we don't use. Note that the program that generates the official data and the one that generates the custom data are the same program, so the JSON keys and fields should be compatible ( although to be honest, I haven't fully tested this yet ).

I'm in the process of seeing what it would take in order to make ECMA-402 work properly with Rafael's "cldr-data". We give up some performance by loading more data, but the amount of additional data may be such that we don't care ( or can be filtered in some other fashion ).

rxaviers · 2014-11-18T12:30:14Z

The supplemental/aliases.json and supplemental/localeAliases.json are alias files that aren't in the official CLDR distribution. But they probably need to be added to it

@JCEmmons can you handle that? Do you need help from anyone in order to get that done?

clmath · 2014-11-18T13:10:10Z

So when I generate the customized data for ECMA-402, I strip out the stuff that we don't use.

I think there is more than that as the list of available locales is different between cldr-data and cldr-ecma402 (for example cldr-data provides zh-Hans-SG while cldr-ecma402 provides zh-SG). If this is relevant we have to make sure that the filtering mecanism from cldr-data take this into account.

rxaviers · 2014-11-18T13:47:37Z

It seems to me like a different content. If this is the case, I wouldn't encourage cldr-data to change what's offered by Unicode. Let's have it fixed upstream instead (i.e., having zh-SG instead of zh-Hans-SG on json-full.zip).

Related question, json-full.zip brings duplicate identical data, e.g., en and en-US. They are identical given likelySubtags. Also, removing en-US likely subtags gives en. @JCEmmons, please correct me if I'm wrong. Why are they included? Shouldn't en suffice? This also applies to pt vs. pt-BR and probably a bunch of others.

rxaviers · 2014-11-18T14:22:50Z

@clmath

There is also (at least) one hardcoded list of locales in ecma402 tests that may also need to be changed.

Is this change needed to get this working or is it an improvement that could be tackled in a separate issue?

clmath · 2014-11-18T14:35:39Z

@rxaviers I think it should be tracked in this issue as it is related to the list of locales available in ecma402. We will need to udpate it to match the final list of available locale so they are all tested.

rxaviers · 2014-11-18T14:36:42Z

Ok, I added this item up to the description as well then.

rxaviers · 2014-11-18T14:55:48Z

About the failing tests, @clmath has already pointed out some customizations made by ecma402 which are not present (or different) in the official CLDR, which may be causing the issue.

We need to list those differences that is causing problems. So, we can either: (a) have CLDR changed accordingly, or (b) have ecma402 code changed to conform with official CLDR.

@clmath @JCEmmons ideas?

clmath · 2014-11-18T19:33:09Z

I think that is @JCEmmons call.

On a side note I updated the description to keep the link to the locale list.

rxaviers · 2014-11-18T20:55:17Z

On a side note I updated the description to keep the link to the locale list.

👍

JCEmmons · 2014-11-18T22:12:28Z

Yes, I can handle it. See http://unicode.org/cldr/trac/ticket/8040

Regards,

John C. Emmons
Globalization Architect & Unicode CLDR TC Chairman
IBM Software Group
Internet: emmo@us.ibm.com

From: Rafael Xavier de Souza notifications@github.com
To: ibm-js/ecma402 ecma402@noreply.github.com
Cc: John Emmons/Austin/IBM@IBMUS
Date: 11/18/2014 06:30 AM
Subject: Re: [ecma402] Fixes #66 (WIP) (#68)

  The supplemental/aliases.json and supplemental/localeAliases.json are
  alias files that aren't in the official CLDR distribution. But they
  probably need to be added to it

@JCEmmons can you handle that? Do you need help from anyone in order to get
that done?

—
Reply to this email directly or view it on GitHub.

rxaviers · 2014-11-19T10:36:59Z

Yes, I can handle it. See http://unicode.org/cldr/trac/ticket/8040

👍

JCEmmons · 2014-11-20T15:34:07Z

In answer to your question about en-US vs. en, pt-BR vs. pt, etc. - yes they can and should be identical. And if your software is smart enough to do removal of likely subtags, you don't need en-US, fr-FR, etc. When we put together the original zip files, we tried not to make any assumptions about what people should and shouldn't do with the data. We didn't want people to have to implement the inheritance mechanism as defined by tr35 just to use the locale data.

Use AMD paths mappings

e1c5e21

Fixes ibm-js#73

rxaviers force-pushed the fix-66-cldr-data_cldr-js branch 2 times, most recently from e891d36 to 236b3ee Compare October 7, 2014 20:21

rxaviers added 2 commits October 7, 2014 17:23

Manage CLDR data as a peer dependency via bower

a586ee9

Ref ibm-js#66 Ref ibm-js#68

Use cldr-data and avoid mirroring CLDR data

55d9947

Ref ibm-js#66 Ref ibm-js#68

rxaviers force-pushed the fix-66-cldr-data_cldr-js branch from 236b3ee to 55d9947 Compare October 7, 2014 20:23

rxaviers mentioned this pull request Oct 7, 2014

Investigate if cldr.js can be used to load the CLDR #66

Open

rxaviers changed the title ~~Fixes #66 (WIP - needs a little discussion)~~ Fixes #66 (WIP) Oct 8, 2014

rxaviers changed the title ~~Fixes #66 (WIP)~~ Fixes #66 Manage CLDR as peer dependency Jan 27, 2015

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fixes #66 Manage CLDR as peer dependency #68

Fixes #66 Manage CLDR as peer dependency #68

rxaviers commented Sep 25, 2014

cjolif commented Sep 26, 2014

rxaviers commented Sep 26, 2014

JCEmmons commented Sep 26, 2014

cjolif commented Sep 30, 2014

rxaviers commented Sep 30, 2014

cjolif commented Oct 1, 2014

rxaviers commented Oct 1, 2014

rxaviers commented Oct 7, 2014

rxaviers commented Oct 7, 2014

rxaviers commented Oct 7, 2014

rxaviers commented Oct 8, 2014

rxaviers commented Oct 9, 2014

clmath commented Oct 31, 2014

rxaviers commented Nov 13, 2014

clmath commented Nov 14, 2014

rxaviers commented Nov 14, 2014

JCEmmons commented Nov 17, 2014

JCEmmons commented Nov 17, 2014

rxaviers commented Nov 18, 2014

clmath commented Nov 18, 2014

rxaviers commented Nov 18, 2014

rxaviers commented Nov 18, 2014

clmath commented Nov 18, 2014

rxaviers commented Nov 18, 2014

rxaviers commented Nov 18, 2014

clmath commented Nov 18, 2014

rxaviers commented Nov 18, 2014

JCEmmons commented Nov 18, 2014

rxaviers commented Nov 19, 2014

JCEmmons commented Nov 20, 2014

Fixes #66 Manage CLDR as peer dependency #68

Are you sure you want to change the base?

Fixes #66 Manage CLDR as peer dependency #68

Conversation

rxaviers commented Sep 25, 2014

Manage CLDR data as peer dependency

Avoid duplicating CLDR data everywhere

Avoid duplicating CLDR logic (and save runtime memory)

cjolif commented Sep 26, 2014

rxaviers commented Sep 26, 2014

JCEmmons commented Sep 26, 2014

cjolif commented Sep 30, 2014

rxaviers commented Sep 30, 2014

cjolif commented Oct 1, 2014

rxaviers commented Oct 1, 2014

rxaviers commented Oct 7, 2014

rxaviers commented Oct 7, 2014

rxaviers commented Oct 7, 2014

rxaviers commented Oct 8, 2014

rxaviers commented Oct 9, 2014

clmath commented Oct 31, 2014

rxaviers commented Nov 13, 2014

clmath commented Nov 14, 2014

rxaviers commented Nov 14, 2014

JCEmmons commented Nov 17, 2014

JCEmmons commented Nov 17, 2014

rxaviers commented Nov 18, 2014

clmath commented Nov 18, 2014

rxaviers commented Nov 18, 2014

rxaviers commented Nov 18, 2014

clmath commented Nov 18, 2014

rxaviers commented Nov 18, 2014

rxaviers commented Nov 18, 2014

clmath commented Nov 18, 2014

rxaviers commented Nov 18, 2014

JCEmmons commented Nov 18, 2014

rxaviers commented Nov 19, 2014

JCEmmons commented Nov 20, 2014