fix(server): prevent intermittent hanging when requesting a tarball worker #7041
Problem
Fixes #7038. When running `pnpm server start`, the tarball fetcher worker pool (introduced in #6850) is immediately marked as "finishing" by this statement:
pnpm/pnpm/src/main.ts, line 300 in 131c723
The server continues running and handles requests after this point, which causes a few problems:
The exact conditions of the hanging were observed in #7038 (comment). In summary, the hanging happens if a worker is requested after a different worker has been marked for cleanup, but before the `exit` event for that other worker has fired.
It's a bit surprising to me that the worker pool will accept new work after `finishAsync` is called. I'm not sure if that's intentional.
Reproducing the hanging
The hanging can be reproduced consistently by forcing some contrived settings:
- `exit` event handler's logic. (WorkerPool.ts#L209-L214)
- `1`. This is the value CI on this repo evaluates to. (pnpm/worker/src/index.ts, line 22 in 494f875)
After applying the settings above, running `pnpm server start` and then `pnpm install is-positive@1.0.0` will reproduce the hanging.
Although this is fairly contrived, this was happening very consistently on pnpm's CI jobs.
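To make the timing window easier to picture, here is a toy model of the race. It is only a sketch: `ToyPool` and all of its methods are hypothetical, and the assumption that the `exit` handler frees a slot without waking queued requests is mine, not something taken from the real worker pool's source.

```typescript
// Toy model only. Every name below is hypothetical; this is not the real pool.
type OnWorker = (workerId: number) => void;

class ToyPool {
  private finishing = false;
  private aliveCount = 0; // workers whose "exit" event has not fired yet
  private idle: number[] = [];
  private waiting: OnWorker[] = [];
  private nextId = 1;
  private maxWorkers: number;

  constructor(maxWorkers: number) {
    this.maxWorkers = maxWorkers;
  }

  // Hand out an idle worker, spawn one if under the limit, otherwise queue.
  checkout(onWorker: OnWorker): void {
    const id = this.idle.pop();
    if (id !== undefined) {
      onWorker(id);
    } else if (this.aliveCount < this.maxWorkers) {
      this.aliveCount++;
      onWorker(this.nextId++);
    } else {
      this.waiting.push(onWorker);
    }
  }

  checkin(id: number): void {
    const next = this.waiting.shift();
    if (next !== undefined) {
      next(id);
    } else if (!this.finishing) {
      this.idle.push(id);
    }
  }

  // Marks idle workers for cleanup. They still count as alive until "exit".
  finish(): void {
    this.finishing = true;
    this.idle = [];
  }

  // Assumption: the exit handler frees the slot but does not wake waiters.
  onWorkerExit(): void {
    this.aliveCount--;
  }

  pendingRequests(): number {
    return this.waiting.length;
  }
}

function demoHang(): number {
  const pool = new ToyPool(1); // a single worker
  pool.checkout((id) => pool.checkin(id)); // one job runs and completes
  pool.finish(); // pool is marked as "finishing" early
  pool.checkout(() => console.log("never printed")); // request arrives before "exit"
  pool.onWorkerExit(); // "exit" fires, but nobody serves the queued request
  return pool.pendingRequests();
}
```

In this model `demoHang()` leaves one request queued forever, which mirrors the symptom described in the Problem section: the request lands after the worker was marked for cleanup but before its `exit` event.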
Changes
The main idea behind this PR is to delay the `finishAsync` call until the server is exiting through `Ctrl-C` or a `POST /exit` request.
pnpm/pnpm/src/main.ts, line 300 in 131c723
I'm not sure if this is the best approach. I'd welcome suggestions around an alternative fix.
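As a rough illustration of that idea (not the code in this PR), the sketch below routes both exit paths through one shutdown function that finishes the workers at most once. `createShutdown`, `createExitAwareServer`, and the `FinishWorkers` type are hypothetical names; `finishWorkers` stands in for whatever ends up calling `finishAsync`.

```typescript
import { createServer, type Server } from "node:http";

// Hypothetical stand-in for the pool's finishAsync(); not pnpm's actual code.
type FinishWorkers = () => Promise<void>;

// Returns a shutdown function that finishes the workers at most once,
// even if Ctrl-C and POST /exit both trigger it.
function createShutdown(finishWorkers: FinishWorkers): () => Promise<void> {
  let closing: Promise<void> | undefined;
  return () => {
    if (closing === undefined) {
      closing = finishWorkers();
    }
    return closing;
  };
}

// Wires both exit paths to the same shutdown function.
function createExitAwareServer(finishWorkers: FinishWorkers): Server {
  const shutdown = createShutdown(finishWorkers);
  const server = createServer((req, res) => {
    if (req.method === "POST" && req.url === "/exit") {
      res.end("exiting\n");
      server.close();
      void shutdown();
      return;
    }
    res.end("handled\n"); // normal requests can still use the workers
  });
  process.once("SIGINT", () => {
    server.close();
    void shutdown();
  });
  return server;
}
```

The point of the design is that the pool is never marked as finishing while the server can still receive requests, so the window described in the Problem section does not open during normal operation.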
Alternatives
We could also set up a check around the `global.finishWorkers?.()` call and avoid running it if the current process is driving a pnpm server. This felt a bit more hacky to me, though. To check whether the current process is a pnpm server, information would need to be propagated back up the command handler chain, or passed sideways in some way. In theory there are other situations where we'd want to prevent `global.finishWorkers?.()` from running as well.
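For comparison, a minimal sketch of what that guard might look like. `finishWorkersUnlessServer` and the `isServerProcess` flag are hypothetical; the flag is exactly the piece of information that would have to be propagated from the command handler.

```typescript
// Hypothetical helper; pnpm has no function with this name.
type MaybeFinish = (() => Promise<void>) | undefined;

// isServerProcess is an assumed flag that the `server start` command
// handler would somehow have to pass back up to the caller.
async function finishWorkersUnlessServer(
  finishWorkers: MaybeFinish,
  isServerProcess: boolean,
): Promise<boolean> {
  if (isServerProcess) {
    return false; // keep the pool alive while the server handles requests
  }
  await finishWorkers?.();
  return true;
}
```

The call site would pass `global.finishWorkers` as the first argument. The awkward part is the second argument, which is why this option felt less clean than delaying the call.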