New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cache accesses to process.env? #3104

Closed
STRML opened this Issue Sep 28, 2015 · 21 comments

Comments

Projects
None yet
9 participants
@STRML
Contributor

STRML commented Sep 28, 2015

I've found in multiple projects that accessing process.env within a hot section of the code leads to major slowdown.

This really hurts in React server-side rendering (issue) and has caused them to rearrange how they access the env.

It would sense to cache already-accessed properties rather than reaching out to the actual environment. And of course, update the cache on assignment.

I would be happy to contribute a patch if anyone could point me in the right direction to get started, as I'm new to Node core dev.

@ChALkeR

This comment has been minimized.

Show comment
Hide comment
@ChALkeR

ChALkeR Sep 28, 2015

Member

See #1648.

Member

ChALkeR commented Sep 28, 2015

See #1648.

@STRML

This comment has been minimized.

Show comment
Hide comment
@STRML

STRML Sep 28, 2015

Contributor

Thanks @ChALkeR. I understand the argument for keeping it as-is, but if it is kept this way, I think it would help to document that values should be cached whenever possible as serious performance problems can result.

Alternatively, a function like process.env[freeze | unfreeze]() would be great to create this cache from inside our programs in case of misbehaving modules.

Contributor

STRML commented Sep 28, 2015

Thanks @ChALkeR. I understand the argument for keeping it as-is, but if it is kept this way, I think it would help to document that values should be cached whenever possible as serious performance problems can result.

Alternatively, a function like process.env[freeze | unfreeze]() would be great to create this cache from inside our programs in case of misbehaving modules.

@Fishrock123

This comment has been minimized.

Show comment
Hide comment
@Fishrock123

Fishrock123 Sep 28, 2015

Member

Is it actually possible to change the env externally once the process is running? If not, why not cache this?

Member

Fishrock123 commented Sep 28, 2015

Is it actually possible to change the env externally once the process is running? If not, why not cache this?

@ChALkeR

This comment has been minimized.

Show comment
Hide comment
@ChALkeR

ChALkeR Sep 28, 2015

Member

@Fishrock123 Any native module (or a library that is used by it) could call setenv directly. Though the likelihood of this is near zero, it's still possible. And it should affect process.env.

process.env behaves as it should, and caching it on the Node.js side will break it. There is no actual problem — anything that uses it in hot code path could cache it if it's fine with that.

-1 on caching, +1 on the minor note in the docs.

While we are here, the docs for the process.env should state things a bit more clear about how does the env work, because now «changes won't be reflected outside of your process» could be misunderstood. As well as the normal env, changes affect new child processes.

Member

ChALkeR commented Sep 28, 2015

@Fishrock123 Any native module (or a library that is used by it) could call setenv directly. Though the likelihood of this is near zero, it's still possible. And it should affect process.env.

process.env behaves as it should, and caching it on the Node.js side will break it. There is no actual problem — anything that uses it in hot code path could cache it if it's fine with that.

-1 on caching, +1 on the minor note in the docs.

While we are here, the docs for the process.env should state things a bit more clear about how does the env work, because now «changes won't be reflected outside of your process» could be misunderstood. As well as the normal env, changes affect new child processes.

@silverwind

This comment has been minimized.

Show comment
Hide comment
@silverwind

silverwind Sep 28, 2015

Contributor

Also keep in mind that there are modules that assign to process.env.

@STRML why not clone process.env to a local object once in your application?

Contributor

silverwind commented Sep 28, 2015

Also keep in mind that there are modules that assign to process.env.

@STRML why not clone process.env to a local object once in your application?

@STRML

This comment has been minimized.

Show comment
Hide comment
@STRML

STRML Sep 28, 2015

Contributor

@silverwind It's really more of a problem with child modules, like React itself (which is finally caching process.env properly new versions). And I've seen many more that do this in hot paths.

I think @ChALkeR has the right approach; I underestimated how much potential breakage there is. But documenting it properly and making it known that process.env accesses are slow would be great.

Perhaps it should have been function getter/setters, which would have been more expected to have slow behavior. Too late now of course.

Contributor

STRML commented Sep 28, 2015

@silverwind It's really more of a problem with child modules, like React itself (which is finally caching process.env properly new versions). And I've seen many more that do this in hot paths.

I think @ChALkeR has the right approach; I underestimated how much potential breakage there is. But documenting it properly and making it known that process.env accesses are slow would be great.

Perhaps it should have been function getter/setters, which would have been more expected to have slow behavior. Too late now of course.

@ChALkeR

This comment has been minimized.

Show comment
Hide comment
@ChALkeR

ChALkeR Sep 28, 2015

Member

@silverwind

Also keep in mind that there are modules that assign to process.env.

If that was the only possible way to set the env from node, then it could be covered by updating the cache in the setter. But it's not.

Member

ChALkeR commented Sep 28, 2015

@silverwind

Also keep in mind that there are modules that assign to process.env.

If that was the only possible way to set the env from node, then it could be covered by updating the cache in the setter. But it's not.

@Fishrock123

This comment has been minimized.

Show comment
Hide comment
@Fishrock123

Fishrock123 Oct 16, 2015

Member

Doesn't seem worthwhile. Closing, please cache it yourself if need be. :)

Member

Fishrock123 commented Oct 16, 2015

Doesn't seem worthwhile. Closing, please cache it yourself if need be. :)

@STRML

This comment has been minimized.

Show comment
Hide comment
@STRML

STRML Dec 2, 2015

Contributor

Just wanted to ping this, we're having quite a hard time handling this in React, and caching NODE_ENV results in a 100% (!) speedup in benchmark code. In real-world testing, I see roughly 50-70%. It's significant.

I'm getting sick of issuing PRs to repos caching access to NODE_ENV, and there really isn't a good solution for modules that need to hide features behind debug flags and want them eliminated in bundling. If we cache the environment lookup, UglifyJS is unable to eliminate the dead code in bundling unless we use a const, which then breaks older browsers.

In reality, is it really useful to rely on external changes to the env (say, from native code)? What proportion of users really need this, and is it a valuable tradeoff compared to the performance benefits we could get ecosystem-wide?

Would the Node core contributors consider a semver-major patch? My thoughts are to make process.env a plain object of the env available at boot (still global and able to be set internally), then added either a process.refreshEnv() method or a process.liveEnv object that had the old semantics.

Contributor

STRML commented Dec 2, 2015

Just wanted to ping this, we're having quite a hard time handling this in React, and caching NODE_ENV results in a 100% (!) speedup in benchmark code. In real-world testing, I see roughly 50-70%. It's significant.

I'm getting sick of issuing PRs to repos caching access to NODE_ENV, and there really isn't a good solution for modules that need to hide features behind debug flags and want them eliminated in bundling. If we cache the environment lookup, UglifyJS is unable to eliminate the dead code in bundling unless we use a const, which then breaks older browsers.

In reality, is it really useful to rely on external changes to the env (say, from native code)? What proportion of users really need this, and is it a valuable tradeoff compared to the performance benefits we could get ecosystem-wide?

Would the Node core contributors consider a semver-major patch? My thoughts are to make process.env a plain object of the env available at boot (still global and able to be set internally), then added either a process.refreshEnv() method or a process.liveEnv object that had the old semantics.

@Fishrock123

This comment has been minimized.

Show comment
Hide comment
@Fishrock123

Fishrock123 Dec 2, 2015

Member

@STRML could you detail what problems you are having caching it locally?

Member

Fishrock123 commented Dec 2, 2015

@STRML could you detail what problems you are having caching it locally?

@Fishrock123

This comment has been minimized.

Show comment
Hide comment
@Fishrock123

Fishrock123 Dec 2, 2015

Member

@STRML Another option, make a module that does this and load it in -r/--require:

const env = process.env
delete process.env // might not be necessary
process.env = env
Member

Fishrock123 commented Dec 2, 2015

@STRML Another option, make a module that does this and load it in -r/--require:

const env = process.env
delete process.env // might not be necessary
process.env = env
@STRML

This comment has been minimized.

Show comment
Hide comment
@STRML

STRML Dec 2, 2015

Contributor

I've been doing:

process.env = JSON.parse(JSON.stringify(process.env));

Which seems to work fine.

Okay - so the issue with a library like React caching locally is that there are three main usage paths and two goals.

The goals:

  1. Eliminate the runtime penalty from accessing process.env in Node environments.
  2. Successfully optimize out dev/debug flags so entire branches can be removed during a minification step.

As a library author, you usually deliver pre-bundled source (dist.js, dist.min.js), built source (say, with babel, in a lib/ folder), and sometimes pre-built source (ES2015 etc).

There are three main usage paths:

  • Path 1: Pre-bundled source in the browser (or even via require('mylibrary/dist/lib.min.js')); this is the easiest, we can easily stub process.env.NODE_ENV or __DEV__ flags with constants in our own build steps. Goal 1 and 2 are easily attained.
  • Path 2. Inside NodeJS. We need to have a build step from src/ to lib/ that's replacing __DEV__ flags or calls to process.env.NODE_ENV to cache them so process.env.NODE_ENV is not accessed in hot functions. That's what this PR does. You can do the same by simply hoisting yourself in smaller projects. Goal 1 can be attained with some fiddling, goal 2 is irrelevant.
  • Path 3. Through browserify/webpack by bundling source. This presents a problem in combination with Path 2 because we've just hoisted our env accesses, so the minifier needs to be smart enough to still be able to do dead code elimination. UglifyJS is smart enough to do this, but only if you use a const declaration, which is not portable to < IE11, so it's not viable in most situations. It will not be able to figure out:
// This would work if it were a const
var NODE_ENV = process.env.NODE_ENV;
//...
if(NODE_ENV !== 'production') { // this will not be eliminated
  // debug code
}

Closure compiler has a /** @const */ comment which works. I want to port this to UglifyJS which would solve the problem.

As a result, if we solve Path 2 (NodeJS), we produce larger bundles for Path 3 (Webpack/Browserify) with UglifyJS. In the case of React, the bundle can be as much as 40% larger gzipped, so it's very significant (38 vs 53kb).

If process.env didn't have this performance penalty, then we could just go on writing if (process.env.NODE_ENV !== 'production'), as most library authors already do, envify and Uglify can handle this, and there'd be no problem.

Contributor

STRML commented Dec 2, 2015

I've been doing:

process.env = JSON.parse(JSON.stringify(process.env));

Which seems to work fine.

Okay - so the issue with a library like React caching locally is that there are three main usage paths and two goals.

The goals:

  1. Eliminate the runtime penalty from accessing process.env in Node environments.
  2. Successfully optimize out dev/debug flags so entire branches can be removed during a minification step.

As a library author, you usually deliver pre-bundled source (dist.js, dist.min.js), built source (say, with babel, in a lib/ folder), and sometimes pre-built source (ES2015 etc).

There are three main usage paths:

  • Path 1: Pre-bundled source in the browser (or even via require('mylibrary/dist/lib.min.js')); this is the easiest, we can easily stub process.env.NODE_ENV or __DEV__ flags with constants in our own build steps. Goal 1 and 2 are easily attained.
  • Path 2. Inside NodeJS. We need to have a build step from src/ to lib/ that's replacing __DEV__ flags or calls to process.env.NODE_ENV to cache them so process.env.NODE_ENV is not accessed in hot functions. That's what this PR does. You can do the same by simply hoisting yourself in smaller projects. Goal 1 can be attained with some fiddling, goal 2 is irrelevant.
  • Path 3. Through browserify/webpack by bundling source. This presents a problem in combination with Path 2 because we've just hoisted our env accesses, so the minifier needs to be smart enough to still be able to do dead code elimination. UglifyJS is smart enough to do this, but only if you use a const declaration, which is not portable to < IE11, so it's not viable in most situations. It will not be able to figure out:
// This would work if it were a const
var NODE_ENV = process.env.NODE_ENV;
//...
if(NODE_ENV !== 'production') { // this will not be eliminated
  // debug code
}

Closure compiler has a /** @const */ comment which works. I want to port this to UglifyJS which would solve the problem.

As a result, if we solve Path 2 (NodeJS), we produce larger bundles for Path 3 (Webpack/Browserify) with UglifyJS. In the case of React, the bundle can be as much as 40% larger gzipped, so it's very significant (38 vs 53kb).

If process.env didn't have this performance penalty, then we could just go on writing if (process.env.NODE_ENV !== 'production'), as most library authors already do, envify and Uglify can handle this, and there'd be no problem.

@vkurchatkin

This comment has been minimized.

Show comment
Hide comment
@vkurchatkin

vkurchatkin Dec 2, 2015

Member

Is there a real reason to use env for this? You can use global.__REACT_ENV the same way

Member

vkurchatkin commented Dec 2, 2015

Is there a real reason to use env for this? You can use global.__REACT_ENV the same way

@STRML

This comment has been minimized.

Show comment
Hide comment
@STRML

STRML Dec 2, 2015

Contributor

It's possible. It appears to be the attitude of the devs that they want to avoid polluting globals whenever possible.

Despite that, this is a more widespread problem that doesn't just affect React.

Contributor

STRML commented Dec 2, 2015

It's possible. It appears to be the attitude of the devs that they want to avoid polluting globals whenever possible.

Despite that, this is a more widespread problem that doesn't just affect React.

@jtlapp

This comment has been minimized.

Show comment
Hide comment
@jtlapp

jtlapp Jun 1, 2016

The performance penalty is still not mentioned in the docs. On the contrary, because changes don't affect the shell, the docs seem to imply that env is just like any other JS object.

jtlapp commented Jun 1, 2016

The performance penalty is still not mentioned in the docs. On the contrary, because changes don't affect the shell, the docs seem to imply that env is just like any other JS object.

@bnoordhuis

This comment has been minimized.

Show comment
Hide comment
@bnoordhuis

bnoordhuis Jun 1, 2016

Member

because changes don't affect the shell

They do, child processes created with the child_process module inherit the modified environment unless told otherwise.

Are you thinking of seeing changes to process.env reflected in the parent shell? That's not how setenv(3) and friends work. No program works like that.

Member

bnoordhuis commented Jun 1, 2016

because changes don't affect the shell

They do, child processes created with the child_process module inherit the modified environment unless told otherwise.

Are you thinking of seeing changes to process.env reflected in the parent shell? That's not how setenv(3) and friends work. No program works like that.

@jtlapp

This comment has been minimized.

Show comment
Hide comment
@jtlapp

jtlapp Jun 1, 2016

Sorry, I'm not much of a shell programmer. I just would rather not have to discover the performance penalty when it's a problem. Why can't the docs say reading process.env makes system calls (if that's what it's doing)? Why do we have to discover that process.env is not a normal JS object the hard way?

jtlapp commented Jun 1, 2016

Sorry, I'm not much of a shell programmer. I just would rather not have to discover the performance penalty when it's a problem. Why can't the docs say reading process.env makes system calls (if that's what it's doing)? Why do we have to discover that process.env is not a normal JS object the hard way?

@bnoordhuis

This comment has been minimized.

Show comment
Hide comment
@bnoordhuis

bnoordhuis Jun 1, 2016

Member

We accept pull requests, you're welcome to have a stab at improving the documentation.

Having said that, any JS object can be "magic" and really be a bunch of C++ code; process.env is not special in that respect.

Member

bnoordhuis commented Jun 1, 2016

We accept pull requests, you're welcome to have a stab at improving the documentation.

Having said that, any JS object can be "magic" and really be a bunch of C++ code; process.env is not special in that respect.

@jtlapp

This comment has been minimized.

Show comment
Hide comment
@jtlapp

jtlapp Jun 1, 2016

I'd have to investigate the node.js code to have confidence making a statement. Several people in this discussion are already claiming to know how it works. I'm hoping that someone who wants to be helpful wouldn't mind taking a minute to update the repo they already have checked out.

jtlapp commented Jun 1, 2016

I'd have to investigate the node.js code to have confidence making a statement. Several people in this discussion are already claiming to know how it works. I'm hoping that someone who wants to be helpful wouldn't mind taking a minute to update the repo they already have checked out.

gcanti added a commit to gcanti/io-ts that referenced this issue Nov 28, 2017

docs: cache NODE_ENV in unsafe validate example
Just a small documentation update, if you're open to it. Suggesting that `NODE_ENV` be "cached" in the example `unsafeValidate` method.

Reasoning / discussions here:
* nodejs/node#3104
* nodejs/node#1648

Unless you'd expect `NODE_ENV` to be editable while the application is running (debugging on a live server, perhaps?)
@JDiPierro

This comment has been minimized.

Show comment
Hide comment
@JDiPierro

JDiPierro Aug 20, 2018

It looks like Node is caching process.env nowadays:

⌚ 15:46:00
$ export FOO=one

⌚ 15:46:07
$ export BAR=two

⌚ 15:46:09
$ export BAZ=three

⌚ 15:46:12
$ node
> console.time("logEnv"); console.log(process.env.FOO); console.timeEnd("logEnv");
one
logEnv: 0.849ms
undefined

> console.time("logEnv"); console.log(process.env.BAR); console.timeEnd("logEnv");
two
logEnv: 0.224ms
undefined

> console.time("logEnv"); console.log(process.env.BAZ); console.timeEnd("logEnv");
three
logEnv: 0.076ms
undefined

> console.time("logEnv"); console.log(process.env.FOO); console.timeEnd("logEnv");
one
logEnv: 0.085ms
undefined

> console.time("logEnv"); console.log(process.env.BAZ); console.timeEnd("logEnv");
three
logEnv: 0.077ms
undefined

> console.time("logEnv"); console.log(process.env.BAR); console.timeEnd("logEnv");
two
logEnv: 0.085ms
undefined
$ node --version
v10.6.0

JDiPierro commented Aug 20, 2018

It looks like Node is caching process.env nowadays:

⌚ 15:46:00
$ export FOO=one

⌚ 15:46:07
$ export BAR=two

⌚ 15:46:09
$ export BAZ=three

⌚ 15:46:12
$ node
> console.time("logEnv"); console.log(process.env.FOO); console.timeEnd("logEnv");
one
logEnv: 0.849ms
undefined

> console.time("logEnv"); console.log(process.env.BAR); console.timeEnd("logEnv");
two
logEnv: 0.224ms
undefined

> console.time("logEnv"); console.log(process.env.BAZ); console.timeEnd("logEnv");
three
logEnv: 0.076ms
undefined

> console.time("logEnv"); console.log(process.env.FOO); console.timeEnd("logEnv");
one
logEnv: 0.085ms
undefined

> console.time("logEnv"); console.log(process.env.BAZ); console.timeEnd("logEnv");
three
logEnv: 0.077ms
undefined

> console.time("logEnv"); console.log(process.env.BAR); console.timeEnd("logEnv");
two
logEnv: 0.085ms
undefined
$ node --version
v10.6.0
@a-stepanenko

This comment has been minimized.

Show comment
Hide comment
@a-stepanenko

a-stepanenko Sep 20, 2018

I'd like to warn you that previous comment is not the case. Node.js does not cache env vars, so each time you use process.env.VAR you actually call getnev(). This might be a performance issue if you have lots of env vars (this is especially important in case you run your node.js app in k8s since k8s creates env vars for all endpoints in a namespace, see kubernetes/kubernetes#60099 for details)

Proof (running node.js which prints some env var in an endless loop, attaching to the node.js process via gdb and changing this var):

root@it100msk:~# nodejs -v
v10.11.0
root@it100msk:~# cat test.js
while (true) {
    console.log(process.env.var1);
    Atomics.wait(new Int32Array(new SharedArrayBuffer(4)), 0, 0, 3000);
}
root@it100msk:~# export var1=1
root@it100msk:~# nodejs test.js
1
1
1
1
1

another console on the same host
root@it100msk:~# ps auxf | grep node
root     20226  0.0  0.0  15648  1032 pts/2    S+   02:16   0:00  |       \_ grep --color=auto node
root     20200  0.6  0.1 557612 30580 pts/1    Sl+  02:16   0:00          \_ nodejs test.js
root@it100msk:~# gdb
GNU gdb (Ubuntu 8.1-0ubuntu3) 8.1.0.20180409-git
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word".
(gdb) attach 20200
Attaching to process 20200
[New LWP 20201]
[New LWP 20202]
[New LWP 20203]
[New LWP 20204]
[New LWP 20205]
[New LWP 20206]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
0x00007f8cc38b4ed9 in futex_reltimed_wait_cancelable (private=<optimized out>, reltime=0x7ffe46e9cd00, expected=0, futex_word=0x4498250) at ../sysdeps/unix/sysv/linux/futex-internal.h:142
142	../sysdeps/unix/sysv/linux/futex-internal.h: No such file or directory.
(gdb) call putenv("var1=2")
$1 = 0
(gdb) detach
Detaching from program: /usr/bin/node, process 20200
(gdb) quit

back to the first console
2
2
2
2
2

a-stepanenko commented Sep 20, 2018

I'd like to warn you that previous comment is not the case. Node.js does not cache env vars, so each time you use process.env.VAR you actually call getnev(). This might be a performance issue if you have lots of env vars (this is especially important in case you run your node.js app in k8s since k8s creates env vars for all endpoints in a namespace, see kubernetes/kubernetes#60099 for details)

Proof (running node.js which prints some env var in an endless loop, attaching to the node.js process via gdb and changing this var):

root@it100msk:~# nodejs -v
v10.11.0
root@it100msk:~# cat test.js
while (true) {
    console.log(process.env.var1);
    Atomics.wait(new Int32Array(new SharedArrayBuffer(4)), 0, 0, 3000);
}
root@it100msk:~# export var1=1
root@it100msk:~# nodejs test.js
1
1
1
1
1

another console on the same host
root@it100msk:~# ps auxf | grep node
root     20226  0.0  0.0  15648  1032 pts/2    S+   02:16   0:00  |       \_ grep --color=auto node
root     20200  0.6  0.1 557612 30580 pts/1    Sl+  02:16   0:00          \_ nodejs test.js
root@it100msk:~# gdb
GNU gdb (Ubuntu 8.1-0ubuntu3) 8.1.0.20180409-git
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word".
(gdb) attach 20200
Attaching to process 20200
[New LWP 20201]
[New LWP 20202]
[New LWP 20203]
[New LWP 20204]
[New LWP 20205]
[New LWP 20206]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
0x00007f8cc38b4ed9 in futex_reltimed_wait_cancelable (private=<optimized out>, reltime=0x7ffe46e9cd00, expected=0, futex_word=0x4498250) at ../sysdeps/unix/sysv/linux/futex-internal.h:142
142	../sysdeps/unix/sysv/linux/futex-internal.h: No such file or directory.
(gdb) call putenv("var1=2")
$1 = 0
(gdb) detach
Detaching from program: /usr/bin/node, process 20200
(gdb) quit

back to the first console
2
2
2
2
2
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment