Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

detect "full-icu" module #3460

Closed
srl295 opened this issue Oct 20, 2015 · 67 comments
Closed

detect "full-icu" module #3460

srl295 opened this issue Oct 20, 2015 · 67 comments
Assignees
Labels
feature request Issues that request new features to be added to Node.js. i18n-api Issues and PRs related to the i18n implementation.

Comments

@srl295
Copy link
Member

srl295 commented Oct 20, 2015

Followup to nodejs/node-v0.x-archive#8996

  • The npm module full-icu will at postinstall time make sure node_modules/full-icu contains the appropriate icudt*.dat file needed for full ICU support.
    • it also prints out instructions on how to fire up Node with full ICU.
  • Proposal: somehow detect the special (?) module name full-icu in any of the global search paths or local module search paths. If found && the correct icudt file is present, _automatically set Node's ICU path to point to that icudt_.dat file*

This will make getting full data as easy as npm install [-g] full-icu

@srl295
Copy link
Member Author

srl295 commented Oct 20, 2015

@jasnell

@srl295 srl295 added the i18n-api Issues and PRs related to the i18n implementation. label Oct 20, 2015
@jasnell
Copy link
Member

jasnell commented Oct 20, 2015

hmmm... trying it now...

npm install icu4c-data@55l (Node 4.0.0 and small-icu 55) -> icudt55l.dat
npm WARN unmet dependency /usr/local/lib/node_modules/yeoman requires bower@'~0.3.0' but will load
npm WARN unmet dependency /usr/local/lib/node_modules/bower,
npm WARN unmet dependency which is version 1.5.3
npm WARN unmet dependency /usr/local/lib/node_modules/npm/node_modules/npm-registry-client/node_modules/npm-package-arg requires semver@'4' but will load
npm WARN unmet dependency /usr/local/lib/node_modules/npm/node_modules/semver,
npm WARN unmet dependency which is version 5.0.1
icu4c-data@0.55.2 /usr/local/lib/node_modules/icu4c-data
 •  (no icudt55l.dat)
/usr/local/lib/node_modules/full-icu/install-spawn.js:51
                        throw Error('Somehow failed to install ' + icudat);
                        ^

Error: Somehow failed to install icudt55l.dat
    at Error (native)
    at npmInstallNpm (/usr/local/lib/node_modules/full-icu/install-spawn.js:51:10)
    at Object.<anonymous> (/usr/local/lib/node_modules/full-icu/postinstall.js:62:2)
    at Module._compile (module.js:434:26)
    at Object.Module._extensions..js (module.js:452:10)
    at Module.load (module.js:355:32)
    at Function.Module._load (module.js:310:12)
    at Function.Module.runMain (module.js:475:10)
    at startup (node.js:117:18)
    at node.js:951:3

@Fishrock123
Copy link
Contributor

Tagging this for tsc discussion. This is quite interesting but requires hard-coding in some this we haven't done anything like before.

@Fishrock123
Copy link
Contributor

Feel free to remove if you don't think it qualifies for it (yet).

@jasnell
Copy link
Member

jasnell commented Oct 20, 2015

Having it autodiscover is good in theory but does introduce a startup performance concern. It would be good to have a PoC implementation that we can benchmark.

@edef1c
Copy link
Contributor

edef1c commented Oct 20, 2015

It's worth noting that this creates a coupling between node and npm (something that hasn't really existed before — node just bundled npm thus far), or at least a coupling between node and ${npm_prefix}/lib/node_modules (which never has never been in any search path, to my knowledge).
I'm not sure that's a thing we want to do/have.

@edef1c
Copy link
Contributor

edef1c commented Oct 20, 2015

In addition, it obscures an application's true dependencies, by violating "dependency locality".
I think that's an even more serious concern than npm coupling.

@srl295
Copy link
Member Author

srl295 commented Oct 20, 2015

@jasnell the issue with installing as -g has been solved, nodejs/full-icu-npm#2

This is fine to discuss at TSC. It's been rattling around as an idea for a while now.

In addition, it obscures an application's true dependencies, by violating "dependency locality".

I assume this is not just related to the auto discover but the full-icu package itself. Do you have any suggestions as to the approach? To just get the data file, the "sub-dependency" wouldn't have to be an NPM dependency- post install could just wget fetch the data from somewhere.

Anyways - the entire approach is most certainly open to discussion.

@silverwind
Copy link
Contributor

BTW, is building with full-icu by default out of question?

@Fishrock123
Copy link
Contributor

@silverwind this was discussed in detail during the call: https://soundcloud.com/node-foundation/tsc-meeting-2015-10-21

Decision was to move back to GH though.

@srl295
Copy link
Member Author

srl295 commented Oct 22, 2015

@silverwind #3460 is about detecting the full data if you are built in small mode, which would remain relevant unless small mode was removed. I will reply to your question over on #3476 which is about how ICU is built itself.

@orangemocha
Copy link
Contributor

Rather than having node searching through node_modules, how about having the full-icu module put the data somewhere in the user folder, and have node look for it there at startup? It's a similar approach but it decouples the mechanism used by node to find the data from the tool used to install it.

@srl295
Copy link
Member Author

srl295 commented Oct 24, 2015

@orangemocha that could work. Suggestions on what the 'somewhere' should look like? The only precedent for this that I know of is the modules dirs. Maybe something like:

  • {CWD}/icu-data/icudt56l.dat
  • {HOME or profile}/.icu-data/icudt56l.dat
  • {Node install dir}/../lib/icu-data-icudt56l.dat

I put the specific file name because node will look for that one file that it needs. Three stat calls (in this example) and then done.

@orangemocha
Copy link
Contributor

Yes, I would follow a parallel with npm global modules. They are under %APPDATA%\npm on Windows. I believe the Linux equivalent is ~/.appname. I would use node as the app name assuming this data is only useful to node.

So:

  • On *nix: ~/.node/icu-data/icudt56l.dat
  • On Windows: %APPDATA%\node\icu-data\icudt56l.dat

@srl295
Copy link
Member Author

srl295 commented Oct 26, 2015 via email

@Fishrock123
Copy link
Contributor

Removing the tsc-agenda tag for now.

@bnoordhuis
Copy link
Member

Resolution from today's CTC meeting:

  1. No objections to the suggested approach.
  2. Follow up with working code.
  3. Mark experimental for a major cycle or two to shake out issues.

Does that sound about right?

@srl295
Copy link
Member Author

srl295 commented Nov 11, 2015

@bnoordhuis yes, thanks. There have been so many different opinions on this, appreciate the quick review

@chrislea
Copy link

FWIW, from the standpoint of making deb and rpm packages, you'd want to put the .dat file in

/usr/share/node/icudt56l.dat

We could then include basically just that file in a separate package called something like nodejs-intl-full_5.1.0.XXX.deb which would depend on the actual nodejs package, and it should all "just work" like people expect it to.

@StarpTech
Copy link

@bnoordhuis What do you mean with "just works"? I still need to configure the loading process either via env variable or cli flag.

@bnoordhuis
Copy link
Member

@StarpTech If you scroll through the comments, you'll see people remarking they had problems installing the module. Those issues have been fixed.

If you're asking about auto-detection: unlikely to happen, or it would have happened already.

@srl295
Copy link
Member Author

srl295 commented May 16, 2019

FYI, if you use full-icu please see a proposed change here: nodejs/full-icu-npm#36

@sgallagher
Copy link
Contributor

We (Fedora/Red Hat Enterprise Linux) have the following use-case:

  • We want to support developers and multiple deployment scenarios.
  • Not all users require full internationalization support and would prefer to avoid the disk-space usage when possible. (Example: container environments like OpenShift)
  • Most deployments do want full internationalization support.
  • We don't want to build Node.js twice (once for small-icu, once for full-icu) so as to avoid shipping different /usr/bin/node binaries.
  • We don't want users to be required to explicitly set NODE_ICU_DATA or pass --icu-data-dir. We want the presence or absence of the data to be based on the presence or absence of an RPM package.

What we would like to see is a mode to build Node.js with small-icu bundled but also build the icudt6X[bl].dat from the full-icu. We would then like for there to be a well-known location under $PREFIX that will always be checked as if the NODE_ICU_DATA was set to it (if it's not manually set).

We would then ship two subpackages for Node.js:

  • nodejs-i18n-data which contains only the data files to put in the well-known location
  • nodejs which provides the interpreter and Recommends: nodejs-i18n-data. (Recommends: is an RPM dependency specification which means that the listed package will be installed by default unless --no-recommends are passed, such as in a container environment. And also that removing it later does not result in removing the dependent package.)

@srl295
Copy link
Member Author

srl295 commented Dec 3, 2019

@sgallagher Hello. The full-icu npm module will download the icudt*.dat file for you. Right now it downloads from NPM, but I want to change it to download from ICU's GitHub release directly. Node.js doesn't need to build icudt*.dat file separately. Starting with Node.js 13, --with-intl=full-icu is the default. But you can still build with --with-intl=small-icu explicitly, and I recommend doing so starting now if that is your preference as a packager.

We would then like for there to be a well-known location under $PREFIX that will always be checked as if the NODE_ICU_DATA was set to it (if it's not manually set).

Yes, that was the premise of this issue when I opened it four years ago. However, it was never implemented beyond a proof of concept phase. It didn't work on all platforms, etc, etc. Someone could still implement this, but I don't plan to work on it.

sgallagher added a commit to sgallagher/node-1 that referenced this issue Dec 12, 2019
When compiled with `--with-intl=small` and
`--with-icu-default-data-dir=PATH`, Node.js will use PATH as a
fallback location for the ICU data.

We will first perform an access check using fopen(PATH, 'r') to
ensure that the file is readable. If it is, we'll set the
icu_data_directory and proceed. There's a slight overhead for the
fopen() check, but it should be barely measurable.

This will be useful for Linux distribution packagers who want to
be able to ship a minimal node binary in a container image but
also be able to add on the full i18n support where needed. With
this patch, it becomes possible to ship the interpreter as
/usr/bin/node in one package for the distribution and to ship the
data files in another package (without a strict dependency
between the two). This means that users of the distribution will
not need to explicitly direct Node.js to locate the ICU data. It
also means that in environments where full internationalization is
not required, they do not need to carry the extra content (with
the associated storage costs).

Refs: nodejs#3460

Signed-off-by: Stephen Gallagher <sgallagh@redhat.com>
Trott pushed a commit that referenced this issue Dec 14, 2019
When compiled with `--with-intl=small` and
`--with-icu-default-data-dir=PATH`, Node.js will use PATH as a
fallback location for the ICU data.

We will first perform an access check using fopen(PATH, 'r') to
ensure that the file is readable. If it is, we'll set the
icu_data_directory and proceed. There's a slight overhead for the
fopen() check, but it should be barely measurable.

This will be useful for Linux distribution packagers who want to
be able to ship a minimal node binary in a container image but
also be able to add on the full i18n support where needed. With
this patch, it becomes possible to ship the interpreter as
/usr/bin/node in one package for the distribution and to ship the
data files in another package (without a strict dependency
between the two). This means that users of the distribution will
not need to explicitly direct Node.js to locate the ICU data. It
also means that in environments where full internationalization is
not required, they do not need to carry the extra content (with
the associated storage costs).

Refs: #3460

Signed-off-by: Stephen Gallagher <sgallagh@redhat.com>

PR-URL: #30825
Reviewed-By: Steven R Loomis <srloomis@us.ibm.com>
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: Richard Lau <riclau@uk.ibm.com>
Reviewed-By: Rich Trott <rtrott@gmail.com>
MylesBorins pushed a commit that referenced this issue Dec 17, 2019
When compiled with `--with-intl=small` and
`--with-icu-default-data-dir=PATH`, Node.js will use PATH as a
fallback location for the ICU data.

We will first perform an access check using fopen(PATH, 'r') to
ensure that the file is readable. If it is, we'll set the
icu_data_directory and proceed. There's a slight overhead for the
fopen() check, but it should be barely measurable.

This will be useful for Linux distribution packagers who want to
be able to ship a minimal node binary in a container image but
also be able to add on the full i18n support where needed. With
this patch, it becomes possible to ship the interpreter as
/usr/bin/node in one package for the distribution and to ship the
data files in another package (without a strict dependency
between the two). This means that users of the distribution will
not need to explicitly direct Node.js to locate the ICU data. It
also means that in environments where full internationalization is
not required, they do not need to carry the extra content (with
the associated storage costs).

Refs: #3460

Signed-off-by: Stephen Gallagher <sgallagh@redhat.com>

PR-URL: #30825
Reviewed-By: Steven R Loomis <srloomis@us.ibm.com>
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: Richard Lau <riclau@uk.ibm.com>
Reviewed-By: Rich Trott <rtrott@gmail.com>
targos pushed a commit that referenced this issue Jan 14, 2020
When compiled with `--with-intl=small` and
`--with-icu-default-data-dir=PATH`, Node.js will use PATH as a
fallback location for the ICU data.

We will first perform an access check using fopen(PATH, 'r') to
ensure that the file is readable. If it is, we'll set the
icu_data_directory and proceed. There's a slight overhead for the
fopen() check, but it should be barely measurable.

This will be useful for Linux distribution packagers who want to
be able to ship a minimal node binary in a container image but
also be able to add on the full i18n support where needed. With
this patch, it becomes possible to ship the interpreter as
/usr/bin/node in one package for the distribution and to ship the
data files in another package (without a strict dependency
between the two). This means that users of the distribution will
not need to explicitly direct Node.js to locate the ICU data. It
also means that in environments where full internationalization is
not required, they do not need to carry the extra content (with
the associated storage costs).

Refs: #3460

Signed-off-by: Stephen Gallagher <sgallagh@redhat.com>

PR-URL: #30825
Reviewed-By: Steven R Loomis <srloomis@us.ibm.com>
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: Richard Lau <riclau@uk.ibm.com>
Reviewed-By: Rich Trott <rtrott@gmail.com>
BethGriggs pushed a commit that referenced this issue Feb 6, 2020
When compiled with `--with-intl=small` and
`--with-icu-default-data-dir=PATH`, Node.js will use PATH as a
fallback location for the ICU data.

We will first perform an access check using fopen(PATH, 'r') to
ensure that the file is readable. If it is, we'll set the
icu_data_directory and proceed. There's a slight overhead for the
fopen() check, but it should be barely measurable.

This will be useful for Linux distribution packagers who want to
be able to ship a minimal node binary in a container image but
also be able to add on the full i18n support where needed. With
this patch, it becomes possible to ship the interpreter as
/usr/bin/node in one package for the distribution and to ship the
data files in another package (without a strict dependency
between the two). This means that users of the distribution will
not need to explicitly direct Node.js to locate the ICU data. It
also means that in environments where full internationalization is
not required, they do not need to carry the extra content (with
the associated storage costs).

Refs: #3460

Signed-off-by: Stephen Gallagher <sgallagh@redhat.com>

PR-URL: #30825
Reviewed-By: Steven R Loomis <srloomis@us.ibm.com>
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: Richard Lau <riclau@uk.ibm.com>
Reviewed-By: Rich Trott <rtrott@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request Issues that request new features to be added to Node.js. i18n-api Issues and PRs related to the i18n implementation.
Projects
None yet
Development

No branches or pull requests