Skip to content
This repository has been archived by the owner on Apr 22, 2023. It is now read-only.

Publish tarball that contains only full ICU data #8996

Closed
misterdjules opened this issue Jan 8, 2015 · 26 comments
Closed

Publish tarball that contains only full ICU data #8996

misterdjules opened this issue Jan 8, 2015 · 26 comments

Comments

@misterdjules
Copy link

Eventually, binary installers will ship with full ICU data (see #8995). However, other binaries (binary tarballs, custom builds from source) will still not ship and load the full ICU data by default.

To allow users of these other binaries to use the full ICU data, we should generate and publish a separate tarball that contains only the full ICU data as a separate .dat file.

To use the full ICU data, users could put this file in $PREFIX/share/node/icu/ or could pass its path on the command line using --icu-data-dir=.

The tarball would be automatically generated and published by a Jenkins job.

This depends on #8983.

cc @srl295 @tjfontaine

@srl295
Copy link
Member

srl295 commented Jan 16, 2015

You can pull the full data file out of deps/icu/source/data/in/*.dat after a successful build. However, it won't necessatily have the right endianness.

@misterdjules misterdjules added this to the 0.12.1 milestone Jan 22, 2015
@srl295
Copy link
Member

srl295 commented Jan 24, 2015

Note: this is blocked by #9046 - we can't generate the big endian tarball for the same reason (conversion). Probably the solution proposed there needs to go in first, IF we want to publish BE data from node's Makefile

srl295 added a commit to srl295/node-v0.x-archive that referenced this issue Jan 24, 2015
ICU can't "swap endianness" if its conversion, etc code is removed.
We don't want this code in node, but it's needed for the tools.

May affect 'target' inadvertently - to be analyzed.
Basically this is nodejs#9046 but also blocks nodejs#8996
srl295 added a commit to srl295/node-v0.x-archive that referenced this issue Jan 24, 2015
…CU data

see nodejs#8996

Have not tested the resulting tarballs yet, posting for comment.

Generates:
 - 9.8M    node-v0.11.16-locales-be.tar.gz
 - 9.6M    node-v0.11.16-locales-le.tar.gz
@srl295
Copy link
Member

srl295 commented Jan 24, 2015

@misterdjules ^^ thoughts on #9091 ? Is it the right shape?

@srl295
Copy link
Member

srl295 commented Feb 10, 2015

FYI @sxa555 - how does this look?

@srl295
Copy link
Member

srl295 commented Feb 10, 2015

  • add a README to the tarball with some instructions. maube lodge it in tools/icu/README-data-tarballs.md

@srl295
Copy link
Member

srl295 commented Feb 11, 2015

  • add a command line to test-intl.js to verify full
  • add another test to upload the tarball, download it and test it.

@srl295
Copy link
Member

srl295 commented Feb 11, 2015

@misterdjules where should the extended tests ^^ live?

@sxa
Copy link
Member

sxa commented Feb 12, 2015

Seems good - subject to an assumption that Windows users will be ok with a .tgz instead of a ZIP.

@srl295
Copy link
Member

srl295 commented Feb 13, 2015

@sxa555 yeah wasn't sure about that. Make a .zip also? make them all zips?

@misterdjules
Copy link
Author

@srl295 Thank you very much for your work, and sorry for taking so long to do a first review.

It's not clear to me yet how to add tests for that, since I believe we don't currently have any similar "integration" test.

I would put them in a separate directory in test, maybe in test/integration (I don't like this name, but test/i18n might be too specific).

The test doesn't need to upload the tarball, it could just build node with small-icu support, generate the data tarballs, and then run node such that it loads the data in this tarball and run a small script that tests for full localization support.

We would also want to have that work on all supported platforms (OS X, Linux, SmartOS and Windows).

What do you think?

@sxa
Copy link
Member

sxa commented Feb 23, 2015

Tricky to say on tgz vs zip since in the general case you cant' assume that UNIX users will have an unzip/jar command either ... but I guess that might be more likely than a windows user have gzip+tar ...

Another thought though - how practical would it be to ship the ICU data as an NPM module which users could install globally - if you did that it might be possible to also have node default to searching in the global location for the ICU data file if no alternative was explcitly specified?

@srl295
Copy link
Member

srl295 commented Feb 26, 2015

@sxa555 actually, the npm module idea makes some sense. This could morph into a 'generator' that auto publishes into npm the appropriate versions. And actually, we could install with node (part of the normal build) a text file containing: npm install node-icu-data-le@54.1 ( in other words the exact command line needed to go fetch the apporpriate data version and the appropriate endianness.).

@srl295
Copy link
Member

srl295 commented Mar 19, 2015

While I look into the npm module idea, at least just to fetch the data itself (ala @rxaviers' cldr-data-npm would this make some sense:

  • have a (separate?) repo that has a builder script that generates package.json (with probably a more or less static version 1.0.0 or whatever), ICU license files, readme, etc
  • then push an instance of the npm module with: npm publish ./generated/ --tag icudt54l and npm publish ./generated/ --tag icudt54b
  • then users can get what they need with npm install [-g] node-icudata@icudt54l or npm install [-g] node-icudata@icudt54b for big endian

Alternate options:

  • commit each and every package.json to the repo
  • commit each version of the ICU data to the repo

Any thoughts about the npm package naming and use of the tag?

@misterdjules misterdjules modified the milestones: 0.12.3, 0.12.2 Apr 1, 2015
@srl295
Copy link
Member

srl295 commented Apr 2, 2015

OK, NPM package published here - https://www.npmjs.com/package/icu4c-data

snippets:

node -e 'console.dir("npm install icu4c-data@"+process.config.variables.icu_ver_major+process.config.variables.icu_endianness)'
npm install icu4c-data@54l
node --icu-data-dir=node_modules/icu4c-data -e "console.dir(new Date().toLocaleString('ru',{month:'long'})));"

any thoughts?

@sxa
Copy link
Member

sxa commented Apr 2, 2015

I'd probably install the latest LE version by default if you don't specify , but otherwise LGTM

I guess we need to encourage people to specify the version properly anyway to ensure the ICU version patches what's built into their node anyway.

@srl295
Copy link
Member

srl295 commented Apr 2, 2015

@sxa555 yeah, I think the default has to do with the 'real' npm package version being the highest. Maybe I can retag 'latest' somehow.

@srl295
Copy link
Member

srl295 commented Apr 2, 2015

@srl295 srl295 mentioned this issue Apr 3, 2015
@stuartpb
Copy link

stuartpb commented Apr 3, 2015

Why is endianness part of the package name? Doesn't NPM have a means of selecting platform-specific binaries?

Also, what's wrong with using semver for the package version, rather than putting the version in the name as well?

@srl295
Copy link
Member

srl295 commented Apr 3, 2015

Why is endianness part of the package name? Doesn't NPM have a means of selecting platform-specific binaries?

It's part of the tag, not the name. I'm not aware of that means but I will look, do you have a link?

edit sorry, please see https://www.npmjs.com/package/icu4c-data - I didn't update the proposal at the top of this ticket.

0.54.0 is tagged as 54l, 0.54.1 is tagged as 54b so you can npm install icu4c-data@54l or npm install icu4c-data@54b

@stuartpb
Copy link

stuartpb commented Apr 3, 2015

I guess npm's means of selecting platform-specific binaries is to include a binding.gyp in the package, which is then used on install. So, you'd have a binding.gyp that builds/downloads the appropriate blob at install time, depending on the platform it's been installed to.

(gyp documentation: https://code.google.com/p/gyp/wiki/GypUserDocumentation)

I'm not sure how gyp would select the endianness of the platform, but that essentially just stems from me not knowing what big-endian platforms there are with Node.

Other approaches that seem simple enough to my untrained eye:

  • Just include both as part of the package. (What's another 5MB?)
  • Include little-endian, and have big-endian platforms read from the little-endian data and dynamically convert at runtime. (This is the approach the binding.gyp could take too, actually.)

Specifying the endianness as part of the package spec is no good, as it unnecessarily limits your package to a specific platform.

@srl295
Copy link
Member

srl295 commented Apr 3, 2015

@stuartpb I was hoping to keep the dependencies more lightweight than gyp. Also, you have to select the right version number of the ICU data anyways, so if v8 was linked against icu 54 you already need to select 54 and not 53 or 55 - so selecting 54l versus 54b doesn't seem to add that much of a burden.

What you could have is a node module that on 'install', turns around and like this executes the command output by this.
node -e 'console.dir("npm install icu4c-data@"+process.config.variables.icu_ver_major+process.config.variables.icu_endianness)'

just include both

54l and 54b are each 25MB.

@sxa
Copy link
Member

sxa commented Apr 7, 2015

@srl295 Wasn't sure where you were storing the source for the NPM, but the usage example in the MD has ")))" at the end which is one too many - should just be "))"

@srl295
Copy link
Member

srl295 commented Apr 7, 2015

@sxa555 thanks, it's actually on icu-project at the moment. I'll fix that typo.

@srl295
Copy link
Member

srl295 commented Apr 8, 2015

The ICU ticket has been accepted by the ICU PMC. For now, this will be a task done and updated by the ICU release process (me). If ICU fails to update it, the updating could be taken over by node.js or some other party.

@misterdjules misterdjules modified the milestones: 0.12.3, 0.12.4 May 14, 2015
@misterdjules misterdjules modified the milestones: 0.12.4, 0.12.5 May 25, 2015
@misterdjules misterdjules modified the milestones: 0.12.5, 0.12.6 Jun 22, 2015
@jasnell
Copy link
Member

jasnell commented Jun 24, 2015

@srl295 ... what's the status on this one?

@misterdjules misterdjules modified the milestones: 0.12.6, 0.12.7, 0.12.8 Jul 6, 2015
@jasnell
Copy link
Member

jasnell commented Aug 26, 2015

@srl295 ... going to assume we can close this. Feel free to reopen if that assumption is incorrect.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

5 participants