This is a proposal for two Rush changes to improve the developer experience in a large monorepo where postinstall scripts often fail:
Faster retries by separating the install and postinstall operations, so that postinstall can be retried without redoing pnpm install
Avoid unnecessary postinstalls by exposing PNPM's --ignore-scripts parameter for rush install and rush update. (When is a postinstall "unnecessary"? For example you need to install that package as part of rush update --full, but you aren't going to compile any projects that use it.)
(The term "postinstall" in this proposal includes multiple operations; see clarification below.)
Background
My employer has a large monorepo where postinstall fails unusually often. The root causes:
the monorepo has lots of recently imported projects with different tech stacks and versions, so the number of NPM packages with postinstall is unusually large
we also have multiple PNPM lockfiles, causing the same postinstall to run multiple times
CI jobs run in geolocations with unreliable network connections to USA CDNs -- postinstall scripts often download large binaries without retry mechanisms, and without providing a way to redirect their URL to a geolocated mirror
A full rush install of the monorepo does 3 kinds of work:
1. PNPM validates the lockfile, and in the case of rush update PNPM calculates an updated lockfile
2. PNPM downloads tarballs from the NPM registry and extracts them to make node_modules folders
3. Lastly, PNPM runs the "install", "postinstall", "prepare", etc lifecycle scripts for some NPM package dependencies. Typically these "postinstalls" download assets and/or invoke native compilers.
If step 3 fails, then we need to retry rush install repeatedly, for example to reattempt a failed download, or to find out if apt-get install has fixed a compile failure. Each iteration redoes all three steps from the beginning, which is painful because steps 1+2 are very slow.
What PNPM implements
PNPM already provides a mechanism for separating install+postinstall. It works like this:
# Install everything but skip postinstall
pnpm install --ignore-scripts
# Run the postinstall scripts for packages whose postinstall
# has not already completed successfully
pnpm rebuild --pending
This "pending" status is tracked in the pendingBuilds field of ./node_modules/.modules.yaml.
Proposed Solution
The solution has two parts:
Part A: Faster retries
When you run rush install, it invokes pnpm install. Let's ALWAYS add --ignore-scripts to that command line, and call that Stage 1. After it completes, we run Stage 2 which is pnpm rebuild --pending. This way if a postinstall script fails, then Rush can skip Stage 1 and go straight to Stage 2.
To be clear, even if you invoke rush install or rush update without --ignore-scripts, we're proposing that PNPM will still be invoked with --ignore-scripts during Stage 1. The Rush parameter only determines whether Stage 2 runs afterwards.
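Here is a minimal sketch of how the two stages in Part A could fit together. Everything in it is hypothetical: CommandRunner, twoStageInstall, and the stage1AlreadyDone flag are illustrative names, not Rush APIs, and the PNPM argument lists are simplified compared to what Rush really passes. The in-process retry loop is also an assumption; the proposal itself only requires that a later invocation can skip Stage 1.

```typescript
// Hypothetical sketch: none of these names exist in Rush today.

/** Runs a command and returns its exit code (0 means success). Injected so the logic can be tested. */
type CommandRunner = (command: string, args: string[]) => number;

interface TwoStageResult {
  stage1Ran: boolean;
  stage2Attempts: number;
  succeeded: boolean;
}

function twoStageInstall(
  run: CommandRunner,
  stage1AlreadyDone: boolean,
  maxStage2Attempts: number
): TwoStageResult {
  let stage1Ran = false;
  if (!stage1AlreadyDone) {
    // Stage 1: validate the lockfile and populate node_modules, skipping lifecycle scripts.
    stage1Ran = true;
    if (run("pnpm", ["install", "--ignore-scripts"]) !== 0) {
      return { stage1Ran, stage2Attempts: 0, succeeded: false };
    }
  }

  // Stage 2: run only the lifecycle scripts that have not completed yet.
  // Retrying here never repeats the slow Stage 1 work.
  let attempts = 0;
  while (attempts < maxStage2Attempts) {
    attempts++;
    if (run("pnpm", ["rebuild", "--pending"]) === 0) {
      return { stage1Ran, stage2Attempts: attempts, succeeded: true };
    }
  }
  return { stage1Ran, stage2Attempts: attempts, succeeded: false };
}
```

In a real implementation the runner would spawn PNPM as a child process, and stage1AlreadyDone would come from whatever state Rush ends up persisting between invocations.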
Part B: Avoid unnecessary postinstalls
Suppose that project X depends on a package D whose postinstall would fail on my computer. I don't want to fix this failure. (For example maybe the fix involves compiling Python 3 for my old Debian 9 box, which is pointless if my local work won't use this NPM package, or if I can rely on a CI job to build that project.)
If I want to work on project Y that does NOT depend on D, then a subset install will avoid the problem: rush install --to Y
However, as part of working on project Y, maybe I need to run rush update --full for the entire monorepo. This will need to install D. But the postinstall for D is NOT relevant to my work, because this operation only cares about versions/lockfiles. I will not compile project X.
Let's expose --ignore-scripts to rush update:
# Regenerate the lockfile for the entire monorepo,
# package D gets installed but not postinstalled
rush update --full --ignore-scripts
# Perform postinstall for dependencies of Y so that I
# can compile Y. (But do NOT postinstall D because I
# will not compile X.)
rush install --to Y
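To make the interaction between Part A and Part B concrete, here is a rough sketch of how the proposed Rush parameter might map onto PNPM invocations. The InstallParameters interface and planPnpmInvocations function are hypothetical names made up for illustration, and the argument lists leave out the other options Rush passes to PNPM.

```typescript
// Hypothetical sketch: option and function names are illustrative, not Rush's real API.

interface InstallParameters {
  /** Mirrors the proposed --ignore-scripts parameter for rush install / rush update. */
  ignoreScripts: boolean;
}

/** Returns the PNPM invocations (as argument arrays) that Rush would run, in order. */
function planPnpmInvocations(parameters: InstallParameters): string[][] {
  // Stage 1 always skips lifecycle scripts, regardless of the Rush parameter (Part A).
  const invocations: string[][] = [["pnpm", "install", "--ignore-scripts"]];

  if (!parameters.ignoreScripts) {
    // Stage 2 runs only when the user did not ask to skip scripts (Part B).
    invocations.push(["pnpm", "rebuild", "--pending"]);
  }
  return invocations;
}
```

Under this reading, rush update --full --ignore-scripts would stop after Stage 1, and a later rush install would pick up the pending postinstalls.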
Design considerations
In a world where postinstall always succeeds, deferring pnpm rebuild to Stage 2 might be slightly slower, because it cannot run in parallel with other downloads. (Does PNPM actually parallelize that?) Maybe we should provide a setting to opt in to this two-stage install? Something like robustPostinstalls=true
What about NPM and Yarn? It seems that neither Yarn Classic nor NPM implements the pnpm rebuild --pending functionality. Yarn Plug'n'Play does implement it, but is not yet supported by Rush. After some consideration, we're okay with this being a PNPM-only feature for now. This proposal doesn't take away any functionality for Yarn/NPM nor does it enable any new scenarios; it merely provides a tool to improve reliability of existing operations.
How does rush install know whether postinstall has run already? We could store this state in a file similar to last-install.flag. We could also parse ./node_modules/.modules.yaml and check whether pendingBuilds is empty, but it would be better to avoid reliance on PNPM internals.
How does rush install --to X know whether postinstall has run for the subset of dependencies needed by X? A few possibilities:
a. Rush could analyze the transitive dependencies of X.
b. Rush could spawn pnpm rebuild --pending --filter ...X every time? This will regress the execution time for an already up to date rush install --to X, but maybe that is negligible?
c. We could disable the two-stage installation in the case of subset installs.
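Option (a) could look roughly like the sketch below. It assumes Rush already has a map from each project or package name to its direct dependencies, and a list of package names whose postinstall is still pending. Both inputs are assumptions: the DependencyGraph type and the function names are hypothetical, and the sketch does not say where the pending list comes from (a Rush-owned state file or PNPM's pendingBuilds).

```typescript
// Hypothetical sketch: the types and function names are illustrative, not Rush APIs.

/** Maps each project or package name to the names of its direct dependencies. */
type DependencyGraph = ReadonlyMap<string, readonly string[]>;

/** Returns `root` plus every package reachable from it through dependency edges. */
function transitiveDependencies(graph: DependencyGraph, root: string): string[] {
  const visited = new Set<string>();
  const ordered: string[] = [];
  const stack: string[] = [root];

  while (stack.length > 0) {
    const current = stack.pop() as string;
    if (visited.has(current)) {
      continue; // already handled; also protects against dependency cycles
    }
    visited.add(current);
    ordered.push(current);
    for (const dependency of graph.get(current) || []) {
      stack.push(dependency);
    }
  }
  return ordered;
}

/** Returns the pending packages that a subset install of `project` would need postinstalled. */
function pendingBuildsFor(
  graph: DependencyGraph,
  project: string,
  pendingPackages: readonly string[]
): string[] {
  const needed = new Set<string>(transitiveDependencies(graph, project));
  return pendingPackages.filter((packageName) => needed.has(packageName));
}
```

If pendingBuildsFor returns an empty list for X, then rush install --to X could skip Stage 2 entirely, which would avoid the per-invocation cost that option (b) worries about.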
Clarification: In this proposal, the term "postinstall" is used as a general term for invoking several different lifecycle scripts (preinstall, install, postinstall, prepublish, prepare) and also recreating the node_modules/.bin scripts.
This seems useful to me, but wouldn't Part B be better handled by rush install --to <project you're working on> to avoid installing the problematic dependency at all?
@iclanton In the example, the person cannot solve it with rush install. Here's a more concrete scenario:
I need to upgrade the jest-canvas-mock dependency for project Y because it has a new API that I need for my unit tests
But ensureConsistentVersions requires me to upgrade all the other projects in the monorepo to have a consistent version.
rush update fails because the postinstall failed for optipng-bin used by project X. The failing code looks like:
try {
  // From https://sourceforge.net/projects/optipng/files/OptiPNG/
  await binBuild.file(path.resolve(__dirname, '../vendor/source/optipng.tar.gz'), [
    `./configure --with-system-zlib --prefix="${bin.dest()}" --bindir="${bin.dest()}"`,
    🤦♂️'make install'🤦♂️
  ]);
  console.log('optipng built successfully');
} catch (error) {
It could take 30 minutes to track down the right prerequisites to make that succeed on my VM. But that's irrelevant to my work -- an upgrade of jest-canvas-mock is unlikely to break project Y, and if it did happen we'll find out when the CI job builds it.
Thus, I need rush update to regenerate the lockfile, whereas the postinstall of optipng-bin is irrelevant.
(Ideally we need to eliminate optipng-bin/lib/install.js from our build, but... one step at a time.🙃 )
CC @chengcyber