Upload lock held for longer than necessary #11083
Labels
performance
remote build: The SSH store, ssh:, ssh-ng:, ... (split from protocol label 2024-07)
store: Issues and pull requests concerning the Nix store
Describe the bug
When building with remote builders, very often only one build makes progress; the others are stuck for quite a long time. With
sudo lslocks | grep upload
you can see that processes are waiting on the .upload-lock for the machine (and stracing them confirms that they're blocked on flock(5, LOCK_EX)). This can take a very long time. Interestingly, stracing the process that does hold the lock seems to indicate that it's past the upload phase anyway: I see hundreds of thousands of lines with type 105. I suppose this could be happening in parallel to trying to copy files, though I somewhat doubt that.
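To illustrate why the other builds stall, here is a minimal sketch of exclusive flock behaviour on a single shared lock file. It assumes Linux flock semantics and uses Python's fcntl module; the file name example.upload-lock and the helper try_exclusive_lock are made up for this illustration and are not Nix's actual code. The non-blocking flag is used only so the example reports contention instead of hanging the way the stuck builds do.

```python
import fcntl
import os
import tempfile


def try_exclusive_lock(path):
    """Return an fd holding an exclusive flock on path, or None if it is already held."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        # LOCK_NB makes flock fail immediately instead of blocking.
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        return None
    return fd


# Hypothetical lock file standing in for the per-machine upload lock.
lock_path = os.path.join(tempfile.mkdtemp(), "example.upload-lock")

holder = try_exclusive_lock(lock_path)  # the first build takes the lock
waiter = try_exclusive_lock(lock_path)  # a second build cannot get it
os.close(holder)                        # closing the fd releases the lock
retry = try_exclusive_lock(lock_path)   # now the lock is free again
```

With a plain LOCK_EX (no LOCK_NB), the second call would simply block until the holder releases the lock, which matches what strace shows for the waiting processes.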
Steps To Reproduce
.upload-locks in lslocks
Expected behavior
I expect the builds to start more quickly.
nix-env --version output
nix-env (Nix) 2.18.4
Additional context
Some investigation shows this is happening here. The original motivation for this logic, according to comments in the Perl precursor to this module from a decade ago, is to prevent multiple processes from trying to copy the same derivation over and over again.
It seems like the lock is potentially held too long. Moreover, it's too "big" a lock: we should probably only have one lock per store path + remote. The alarm of 15 minutes also seems very long.
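As a rough sketch of what a narrower lock could look like, the snippet below derives one lock file per (remote, store path) pair and gives up after a short, bounded wait. Everything here is hypothetical: the names lock_file_for and upload_lock, the hashing scheme, the 60 second default, and the polling loop (used instead of an alarm) are assumptions made for illustration, not the existing Nix implementation.

```python
import fcntl
import hashlib
import os
import tempfile
import time
from contextlib import contextmanager


def lock_file_for(lock_dir, remote, store_path):
    # Hypothetical naming scheme: one lock file per (remote, store path) pair.
    key = f"{remote}\0{store_path}".encode()
    digest = hashlib.sha256(key).hexdigest()[:32]
    return os.path.join(lock_dir, f"{digest}.upload-lock")


@contextmanager
def upload_lock(lock_dir, remote, store_path, timeout=60.0, poll=0.05):
    """Hold an exclusive lock for one store path on one remote, with a bounded wait."""
    path = lock_file_for(lock_dir, remote, store_path)
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    deadline = time.monotonic() + timeout
    try:
        while True:
            try:
                fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
                break
            except BlockingIOError:
                if time.monotonic() >= deadline:
                    raise TimeoutError(f"timed out waiting for {path}")
                time.sleep(poll)
        yield
    finally:
        os.close(fd)  # closing the fd also releases the lock


# Usage: uploads of different paths to the same remote no longer serialize.
lock_dir = tempfile.mkdtemp()
with upload_lock(lock_dir, "ssh://builder1", "/nix/store/aaa-foo"):
    with upload_lock(lock_dir, "ssh://builder1", "/nix/store/bbb-bar", timeout=1.0):
        pass  # both locks are held at once without contention
```

The trade-off is more lock files and possibly redundant closure checks, but two processes copying unrelated paths to the same machine would no longer wait on each other.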
Priorities
Add 👍 to issues you find important.