Skip to content
This repository has been archived by the owner on Apr 22, 2023. It is now read-only.

build: enable a smaller Intl (i18n / ICU) build #7719

Closed
wants to merge 34 commits into from

Conversation

srl295
Copy link
Member

@srl295 srl295 commented Jun 2, 2014

Implements #7676 - see README.md for details. Also search for Intl on https://github.com/srl295/node/wiki/Installation to see items to add into the main wiki.

Passes make test as well as current HEAD did (some network test failures?)

known TODOs:

  • doc
    • Too much detail in the README.md. Move elsewhere/delete.
  • build
    • Windows build? Rewrite icu-generic.sh in python and/or use cygwin?
      • but, need to allow build of linked-in ICU data (large and small) in windows.
    • Switch to turn on and off "little by default"
    • Enable use of any ICU (not just special SRL branch) - will probably do this by
      • copying iculslocs.c and icutrim.py into node
      • making the unixy build work the same way as windows.
      • update the instructions to say "grab an ICU tarball, unpack as deps/icu, enjoy"
    • change configure options to: --with-intl=small-icu or full-icu or system-icu
      • actually implement system-icu - mac/unx only, looks for an installed ICU.
  • v8 ( discussion here: https://groups.google.com/forum/#!topic/v8-dev/uAdEzPBcdXE )
  • ICU
    • the "smaller ICU" changes could be pushed into a future ICU version. (already have bugs filed for this, not a blocker here)

NB: I have signed the CLA, you will find me under srloomis at US dot IBM dot COM

@srl295
Copy link
Member Author

srl295 commented Jun 2, 2014

see also discussion in #6371 and #4689
On my RedHat x86_64 machine, the build size is 14M as described here vs 18M for a 'non i18n' build.

@srl295 srl295 changed the title Enable a smaller Intl (i18n / ICU) build build: enable a smaller Intl (i18n / ICU) build Jun 2, 2014
@srl295
Copy link
Member Author

srl295 commented Jun 3, 2014

I updated the todo list at top - v8 has commented on some of these patches.

  • The "crasher" v8 number 3348 sounds bad, but it is really a UX issue on misconfiguration, and not a new problem. Not a blocker for this PR.
  • The "save 1M" v8 number 3345 - would be nice to save the 1M but is that a blocker?
  • The "InitializeICU" v8 number 3344 - v8's direction here https://code.google.com/p/v8/issues/detail?id=3344 seems to be that the embedder (i.e. Node) can package and initialize ICU in any other way. So that part of the PR could just be moved into node.cc or other suitable spot.

So, I think I can update this PR to not have any deps/v8 changes.

@srl295
Copy link
Member Author

srl295 commented Jun 3, 2014

I've rebased this PR, and removed the v8 deps.

@srl295
Copy link
Member Author

srl295 commented Jun 3, 2014

if anyone wants a full icudt53l.dat file to play with the --with-icu-dir= option, the one I am using is at https://ssl.icu-project.org/~srl/tmp/NOT_FOR_PRODUCTION/NOT_FOR_PRODUCTION_TESTICU.zip (md5: fd2dcff4bd5b88b4841710120987ebaf ) otherwise you can pull it from the ICU 53 source .tgz download at http://site.icu-project.org/download/53 - see README.md


`make`

`make install`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you wrap lines at 80 columns here? If you have trouble fitting URLs in that space, you can use GH's [description]: http://www.example.com/ markdown extension. See the bottom of doc/api/http.markdown for an example.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, will do. The README notes are sort of centralized notes on the whole PR, much of it will get either removed or split out elsewhere in the docs.

@bnoordhuis
Copy link
Member

There's a bunch of style issues but in general I like where this is going. Can you run make cpplint and fix the things it complains about?

@srl295
Copy link
Member Author

srl295 commented Jun 4, 2014

@bnoordhuis thanks, will do.

On the .sh script, would it be better written in Python? That won't help Windows (besides using cygwin/msys).

@bnoordhuis
Copy link
Member

Apropos the shell script, I think it's going to be a hard requirement that it works on both Windows and POSIX platforms. Some solution would need to be concocted but I'm not sure what. Do you have suggestions?

@srl295
Copy link
Member Author

srl295 commented Jun 5, 2014

@bnoordhuis re: Windows, Just to verify, there's no prereq of cygwin or msys? (I think this is why chromium's ICU branch was so big, it had prebuilt data in a consumable form for Windows to avoid building the tools..) Anyways, I've got Win7 fired up, I'll work on this. All of the above is important proof of concept and the C/python code will be needed no matter what.

@bnoordhuis
Copy link
Member

there's no prereq of cygwin or msys?

That's correct, the only prerequisites are MSVC and python. I suppose that, technically speaking, you could also build node with the Windows SDK but I don't think anyone actually does.

@srl295
Copy link
Member Author

srl295 commented Jun 9, 2014

@bnoordhuis I was able to get a windows build to build/run w/o crashing, minus any data. So we won't need any new prereqs, just need to port the ICU tooling over to gyp. Trying to avoid hand-curation of the kind that Chromium does.

FYI: windows work on a side PR https://github.com/srl295/node/pull/1 - I'll merge it back in back to this PR once working.

@srl295
Copy link
Member Author

srl295 commented Jun 9, 2014

FYI: https://github.com/srl295/btest402 is my trivial "is Intl alive or dead? is it a small or full set?" test project.

@srl295
Copy link
Member Author

srl295 commented Jun 12, 2014

@bnoordhuis I think I have addressed most of the style issues besides the README.md (which will need major reorganization of my changes), I am still working on the Windows side of things.

@srl295
Copy link
Member Author

srl295 commented Jun 19, 2014

FYI: windows build (with no inbuilt data) is working. Will keep working on the data portion.

@srl295
Copy link
Member Author

srl295 commented Jun 24, 2014

Now vcbuild ... small-icu will work on Windows. Results in an 11M (vs 8M base) binary with just English.

* English and Root data only
* Only the services needed by v8's Intl implementation
* Specify an additional icu data file with either:
* The `NODE_ICU_DATA` env variable: `env NODE_ICU_DATA=/some/path node`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could that be a potential security pitfall in the (admittedly rather unlikely) event that node is a setuid root binary?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ICU has a #define UCONFIG_NO_FILE_IO 1 switch as well as an API udata_setFileAccess(UDATA_NO_FILES);, either of which serves to shut off file I/O for ICU itself.

@trevnorris
Copy link

@srl295 Does seem to be a gyp bug. So I won't prevent this patch from landing because of someone else's screw up in a third-party dependency. :-P

Let me just go over this one more time and I'll sign off.

@trevnorris
Copy link

@srl295 Two nits. Other than that LGTM. Ping me when resolved and we'll finally merge this!

Reviewed-by: Trevor Norris <trev.norris@gmail.com>
Reviewed-by: Trevor Norris <trev.norris@gmail.com>
The parameter parser specifically looked for the old bracket syntax.
This generated a lot of warnings when building the docs. Those warnings
have been fixed by changing the parsing logic.

Signed-off-by: Trevor Norris <trev.norris@gmail.com>
Running fill() with an empty string would cause Node to hang
indefinitely. Now it will return without having operated on the buffer.

User facing function has been pulled into JS to perform all initial
value checks and coercions. The C++ method has been placed on the
"internal" object.

Coerced non-string values to numbers to match v0.10 support.

Simplified logic and changed a couple variable names.

Added tests for fill() and moved them all to the beginning of
buffer-test.js since many other tests depend on fill() working properly.

Fixes: nodejs#8469
Signed-off-by: Trevor Norris <trev.norris@gmail.com>
srl295 added a commit to srl295/node-v0.x-archive that referenced this pull request Sep 30, 2014
Note: This is a rebase on master/v0.12 (as of 57ed3da)

The two main goals of this change are:
 - to make it easier to build the Intl option using ICU
   (particularly, using a newer ICU than v8/Chromium's version)
 - to enable a much smaller ICU build with only English support
   The goal here is to get node.js binaries built this way by
   default so that the Intl API can be used. Additional data
   can be added at execution time (see Readme and wiki)

More details are in the PR nodejs#7719

In particular, this change adds the --with-intl= configure option to
 provide more ways of building "Intl":
 - full-icu picks up an ICU from deps/icu
 - small-icu is similar, but builds only English
 - system-icu uses pkg-config to find an installed ICU
 - none does nothing (no Intl)

For Windows builds, the "full-icu" or "small-icu" options are added to
vcbuild.bat.

Note that the existing "--with-icu-path" option is not removed from
configure, but may not be used alongside the new option.

Wiki changes have already been made on
 https://github.com/joyent/node/wiki/Installation
and a new page created at
 https://github.com/joyent/node/wiki/Intl
(marked as provisional until this change lands.)

Now, a summary of the changes:

* README.md : doc updates

* .gitignore : added "deps/icu" as this is the location where ICU is
  unpacked to.

* Makefile : added the tools/icu/* files to cpplint, but excluded a
  problematic file.

* configure : added the --with-intl option mentioned above.
  Calculate at config time the list of ICU source files to use and data
  packaging options.

* node.gyp : add the new files src/node_i18n.cc/.h as well as ICU
  linkage.

* src/node.cc : add call into
  node::i18n::InitializeICUDirectory(icu_data_dir) as well as new
  --icu-data-dir option and NODE_ICU_DATA env variable to configure ICU
  data loading. This loading is only relevant in the "small"
  configuration.

* src/node_i18n.cc: new source file for the above Initialize.. function,
  to setup ICU as needed.

* tools/icu : new directory with some tools needed for this build.

* tools/icu/icu-generic.gyp : new .gyp file that builds ICU in some new
  ways, both on unix/mac and windows.

* tools/icu/icu-system.gyp : new .gyp file to build node against a
  pkg-config detected ICU.

* tools/icu/icu_small.json : new config file for the "English-only" small
  build.

* tools/icu/icutrim.py : new tool for trimming down ICU data. Reads the
  above .json file.

* tools/icu/iculslocs.cc : new tool for repairing ICU data manifests
  after trim operation.

* tools/icu/no-op.cc : dummy file to force .gyp into using a C++ linker.

* vcbuild.bat : added small-icu and full-icu options, to call into
  configure.

* fixed toolset dependencies, see nodejs#7719 (comment)

* Made clang work

* wrapped the `--icu-data-dir=` param in `#if defined(NODE_HAVE_I18N_SUPPORT)` as requested
@srl295
Copy link
Member Author

srl295 commented Sep 30, 2014

@trevnorris done and rebased..

Reviewed-By: Chris Dickinson <christopher.s.dickinson@gmail.com>
Reviewed-By: Trevor Norris <trev.norris@gmail.com>
@trevnorris
Copy link

@srl295 Oh, so close. Sorry one more question. Thanks for persevering through all this. Think this is the longest PR thread in Node history.

@srl295
Copy link
Member Author

srl295 commented Oct 1, 2014

@trevnorris no prob, thanks to you (& the rest of the community) for working with me on it! OK, I've Edit: re-based on v0.12 I also reviewed the diffs vs. v0.12 and didn't find any other such odd changes. Maybe a M-x query-replace-regexp went awry.

Note: This is a rebase on master/v0.12 (as of 25702ab)

The two main goals of this change are:
 - to make it easier to build the Intl option using ICU
   (particularly, using a newer ICU than v8/Chromium's version)
 - to enable a much smaller ICU build with only English support
   The goal here is to get node.js binaries built this way by
   default so that the Intl API can be used. Additional data
   can be added at execution time (see Readme and wiki)

More details are in the PR nodejs#7719

In particular, this change adds the --with-intl= configure option to
 provide more ways of building "Intl":
 - full-icu picks up an ICU from deps/icu
 - small-icu is similar, but builds only English
 - system-icu uses pkg-config to find an installed ICU
 - none does nothing (no Intl)

For Windows builds, the "full-icu" or "small-icu" options are added to
vcbuild.bat.

Note that the existing "--with-icu-path" option is not removed from
configure, but may not be used alongside the new option.

Wiki changes have already been made on
 https://github.com/joyent/node/wiki/Installation
and a new page created at
 https://github.com/joyent/node/wiki/Intl
(marked as provisional until this change lands.)

Now, a summary of the changes:

* README.md : doc updates

* .gitignore : added "deps/icu" as this is the location where ICU is
  unpacked to.

* Makefile : added the tools/icu/* files to cpplint, but excluded a
  problematic file.

* configure : added the --with-intl option mentioned above.
  Calculate at config time the list of ICU source files to use and data
  packaging options.

* node.gyp : add the new files src/node_i18n.cc/.h as well as ICU
  linkage.

* src/node.cc : add call into
  node::i18n::InitializeICUDirectory(icu_data_dir) as well as new
  --icu-data-dir option and NODE_ICU_DATA env variable to configure ICU
  data loading. This loading is only relevant in the "small"
  configuration.

* src/node_i18n.cc: new source file for the above Initialize.. function,
  to setup ICU as needed.

* tools/icu : new directory with some tools needed for this build.

* tools/icu/icu-generic.gyp : new .gyp file that builds ICU in some new
  ways, both on unix/mac and windows.

* tools/icu/icu-system.gyp : new .gyp file to build node against a
  pkg-config detected ICU.

* tools/icu/icu_small.json : new config file for the "English-only" small
  build.

* tools/icu/icutrim.py : new tool for trimming down ICU data. Reads the
  above .json file.

* tools/icu/iculslocs.cc : new tool for repairing ICU data manifests
  after trim operation.

* tools/icu/no-op.cc : dummy file to force .gyp into using a C++ linker.

* vcbuild.bat : added small-icu and full-icu options, to call into
  configure.

* fixed toolset dependencies, see nodejs#7719 (comment)

* Made clang work

* wrapped the `--icu-data-dir=` param in `#if defined(NODE_HAVE_I18N_SUPPORT)` as requested
trevnorris pushed a commit that referenced this pull request Oct 1, 2014
The two main goals of this change are:
 - To make it easier to build the Intl option using ICU (particularly,
   using a newer ICU than v8/Chromium's version)
 - To enable a much smaller ICU build with only English support The goal
   here is to get node.js binaries built this way by default so that the
   Intl API can be used. Additional data can be added at execution time
   (see Readme and wiki)

More details are at #7719

In particular, this change adds the "--with-intl=" configure option to
provide more ways of building "Intl":
 - "full-icu" picks up an ICU from deps/icu
 - "small-icu" is similar, but builds only English
 - "system-icu" uses pkg-config to find an installed ICU
 - "none" does nothing (no Intl)

For Windows builds, the "full-icu" or "small-icu" options are added to
vcbuild.bat.

Note that the existing "--with-icu-path" option is not removed from
configure, but may not be used alongside the new option.

Wiki changes have already been made on
 https://github.com/joyent/node/wiki/Installation
and a new page created at
 https://github.com/joyent/node/wiki/Intl
(marked as provisional until this change lands.)

Summary of changes:

* README.md : doc updates

* .gitignore : added "deps/icu" as this is the location where ICU is
  unpacked to.

* Makefile : added the tools/icu/* files to cpplint, but excluded a
  problematic file.

* configure : added the "--with-intl" option mentioned above.
  Calculate at config time the list of ICU source files to use and data
  packaging options.

* node.gyp : add the new files src/node_i18n.cc/.h as well as ICU
  linkage.

* src/node.cc : add call into
  node::i18n::InitializeICUDirectory(icu_data_dir) as well as new
  --icu-data-dir option and NODE_ICU_DATA env variable to configure ICU
  data loading. This loading is only relevant in the "small"
  configuration.

* src/node_i18n.cc : new source file for the above Initialize..
  function, to setup ICU as needed.

* tools/icu : new directory with some tools needed for this build.

* tools/icu/icu-generic.gyp : new .gyp file that builds ICU in some new
  ways, both on unix/mac and windows.

* tools/icu/icu-system.gyp : new .gyp file to build node against a
  pkg-config detected ICU.

* tools/icu/icu_small.json : new config file for the "English-only" small
  build.

* tools/icu/icutrim.py : new tool for trimming down ICU data. Reads the
  above .json file.

* tools/icu/iculslocs.cc : new tool for repairing ICU data manifests
  after trim operation.

* tools/icu/no-op.cc : dummy file to force .gyp into using a C++ linker.

* vcbuild.bat : added small-icu and full-icu options, to call into
  configure.

* Fixed toolset dependencies, see
  #7719 (comment)

Note that because of a bug in gyp {CC,CXX}_host must also be set.
Otherwise gcc/g++ will be used by default for part of the build.

Reviewed-by: Trevor Norris <trev.norris@gmail.com>
Reviewed-by: Fedor Indutny <fedor@indutny.com>
@trevnorris
Copy link

Thanks much. Merged in ac2857b.

@trevnorris trevnorris closed this Oct 1, 2014
@srl295
Copy link
Member Author

srl295 commented Oct 1, 2014

@trevnorris yay! Will :

  • update the wiki
  • file followon for the getopt issue above

@ericf
Copy link

ericf commented Oct 1, 2014

Awesome! Now on to the quest of having the default Node build use the --with-intl 😄

@srl295
Copy link
Member Author

srl295 commented Oct 1, 2014

@ericf yes! That's job #7676 :)

@srl295 srl295 deleted the smaller-icu branch October 2, 2014 19:47
trevnorris pushed a commit that referenced this pull request Oct 7, 2014
Move from argparse to optparse for dependency management.

Fixes: #7719 (comment)
Reviewed-by: Trevor Norris <trev.norris@gmail.com>
@srl295 srl295 added the i18n label Jul 29, 2016
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet