Do not GC the current active incremental session directory #147821
+9
−0
when building a relatively large repo (https://github.com/oxidecomputer/omicron) on illumos under heavy CPU pressure, i saw some rustc invocations die like:

a bit of debugging later and it seems that if the system is very slow, Unix-flavored `flock::Lock::new()` doesn't quite get the mutual exclusion `garbage_collect_session_directories` expects. before this patch i could reproduce this with the crate `nexus_db_queries` (in that repo) by pinning the full `cargo build` to one core and having a busy loop fighting on that same core. with this patch i cannot reproduce the issue. i took a look at how `flock::Lock` is used and i think this is the only problematic use, so i figure i'll propose this change particularly since i don't think file locking can be made.. good... for Unix in general.

In
`setup_dep_graph`, we set up a session directory for the current incremental compilation session, load the dep graph, and then GC stale incremental compilation sessions for the crate. The freshly-created session directory ends up in this list of potentially-GC'd directories but in practice is not typically even considered for GC because the new directory is neither finalized nor `is_old_enough_to_be_collected`.

Unfortunately,
`is_old_enough_to_be_collected` is a simple time check, and if `load_dep_graph` is slow enough it's possible for the freshly-created session directory to be tens of seconds old already. Then, old enough to be eligible to GC, we try to `flock::Lock` it as proof it is not owned by anyone else, and so is a stale working directory.

Because we hold the lock in the same process, the behavior of
`flock::Lock` is dependent on platform-specifics about file locking APIs.

`fcntl(F_SETLK)`-style locks used on non-Linux Unices do not provide mutual exclusion internal to a process. `fcntl_locking(2)` on Linux describes some relevant problems:

`fcntl`-locks will appear to succeed in locking the fresh incremental compilation directory, at which point we can remove it just before using it later for incremental compilation. Saving incremental compilation state later fails and takes rustc with it with an error like

The release-lock-on-close behavior has uncomfortable consequences for the freshly-opened file description for the lock, but I think in practice isn't an issue. If we would close the file, we failed to acquire the lock, so someone else had the lock and we're not releasing locks prematurely.
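
The time check that makes the fresh directory eligible for GC in the first place can be sketched in isolation. This is a minimal illustration, not the rustc implementation: `is_old_enough` and the 10-second `MIN_AGE_FOR_GC` are hypothetical stand-ins for `is_old_enough_to_be_collected` and whatever threshold it actually uses.

```rust
use std::time::{Duration, SystemTime};

// Hypothetical threshold for illustration; the real value in rustc may differ.
const MIN_AGE_FOR_GC: Duration = Duration::from_secs(10);

/// Pure time check: true if `created` lies at least `MIN_AGE_FOR_GC` before `now`.
/// Nothing here knows whether the directory belongs to the running session.
fn is_old_enough(created: SystemTime, now: SystemTime) -> bool {
    match now.duration_since(created) {
        Ok(age) => age >= MIN_AGE_FOR_GC,
        // `now` is earlier than `created` (clock went backwards): not old enough.
        Err(_) => false,
    }
}
```

If GC runs one second after the directory is created the check returns `false`, but if loading the dep graph takes 30 seconds the same directory passes the check even though it is still in use.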
`flock(LOCK_EX)` doesn't seem to have these same issues, and because `flock::Lock::new` always opens a new file description when locking, I don't think Linux can have this issue.

From reading `LockFileEx` on MSDN I think Windows has locking semantics similar to `flock`, but I haven't tested there at all.

My conclusion is that there is no way to write a pure-POSIX
`flock::Lock::new` which guarantees mutual exclusion across different file descriptions of the same file in the same process, and `flock::Lock::new` must not be used for that purpose. So, instead, avoid considering the current incremental session directory for GC in the first place. Our own `sess` is evidence we're alive and using it.
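
The idea of the fix can be sketched roughly as follows. This is not the actual diff: `gc_candidates` and its parameters are hypothetical names, and the real code works with rustc's own session and directory types.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: drop the session directory this process is using
/// from the list of GC candidates, before any age check or lock attempt.
fn gc_candidates(all_dirs: Vec<PathBuf>, current_session_dir: &Path) -> Vec<PathBuf> {
    all_dirs
        .into_iter()
        // Skipping our own directory means we never rely on a lock to protect it.
        .filter(|dir| dir.as_path() != current_session_dir)
        .collect()
}
```

Filtering up front means the outcome no longer depends on how same-process file locks behave on a given platform.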