Fix running Cargo concurrently #2486

alexcrichton · 2016-03-15T05:35:23Z

Cargo has historically had no protections against running it concurrently. This
is pretty unfortunate, however, as it essentially just means that you can only
run one instance of Cargo at a time globally on a system.

An "easy solution" to this would be the use of file locks, except they need to
be applied judiciously. It'd be a pretty bad experience to just lock the entire
system globally for Cargo (although it would work), but otherwise Cargo must be
principled in how it accesses the filesystem to ensure that locks are properly
held. This commit intends to solve all of these problems.

A new utility module is added to cargo, util::flock, which contains two types:

* `FileLock` - a locked version of a `File`. This RAII guard will unlock the
  lock on `Drop` and I/O can be performed through this object. The actual
  underlying `Path` can be read from this object as well.
* `Filesystem` - an unlocked representation of a `Path`. There is no "safe"
  method to access the underlying path without locking a file on the filesystem
  first.

Built on the fs2 library (https://github.com/danburkert/fs2-rs), these locks use
the `flock` system call on Unix and `LockFileEx` on Windows. File locking on
Unix is documented as not so great
(http://0pointer.de/blog/projects/locking.html), largely because of NFS, because
the locks are only advisory, and because there's no byte-range locking. These
issues don't necessarily plague Cargo, however, so we should try to leverage
file locks anyway. On both Windows and Unix the file locks are released when the
underlying OS handle is closed, which means that if the process dies the locks
are released.
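
A minimal sketch of how the two types could fit together, assuming a Unix target and calling `flock(2)` directly rather than going through fs2. The method names (`new`, `try_open_rw`, `file`, `path`) and the hand-written `LOCK_*` constants (the Linux/macOS values) are illustrative assumptions, not the API added by this PR.

```rust
use std::fs::{self, File, OpenOptions};
use std::io;
use std::os::raw::c_int;
use std::os::unix::io::AsRawFd;
use std::path::{Path, PathBuf};

// Unix-only assumption: declare flock(2) by hand so the sketch needs no crate.
unsafe extern "C" {
    fn flock(fd: c_int, operation: c_int) -> c_int;
}

const LOCK_EX: c_int = 2; // exclusive lock
const LOCK_NB: c_int = 4; // fail instead of blocking
const LOCK_UN: c_int = 8; // unlock

/// An unlocked directory. The root path is private, so the only way to reach
/// a file below it is to lock that file first.
pub struct Filesystem {
    root: PathBuf,
}

/// A locked file. I/O goes through `file()`, and the path is readable only
/// while the guard is alive.
pub struct FileLock {
    file: File,
    path: PathBuf,
}

impl Filesystem {
    pub fn new(root: PathBuf) -> Filesystem {
        Filesystem { root }
    }

    /// Opens `name` under the root and tries to lock it exclusively.
    /// `Ok(None)` means another open handle currently holds the lock.
    pub fn try_open_rw(&self, name: &str) -> io::Result<Option<FileLock>> {
        fs::create_dir_all(&self.root)?;
        let path = self.root.join(name);
        let file = OpenOptions::new()
            .read(true)
            .write(true)
            .create(true)
            .open(&path)?;
        let rc = unsafe { flock(file.as_raw_fd(), LOCK_EX | LOCK_NB) };
        if rc == 0 {
            return Ok(Some(FileLock { file, path }));
        }
        let err = io::Error::last_os_error();
        if err.kind() == io::ErrorKind::WouldBlock {
            Ok(None)
        } else {
            Err(err)
        }
    }
}

impl FileLock {
    pub fn file(&self) -> &File {
        &self.file
    }

    pub fn path(&self) -> &Path {
        &self.path
    }
}

impl Drop for FileLock {
    fn drop(&mut self) {
        // Explicit unlock; closing the handle would release the lock as well.
        unsafe {
            flock(self.file.as_raw_fd(), LOCK_UN);
        }
    }
}
```

Because the path is reachable only through a `FileLock`, callers in this sketch cannot touch the file without holding the lock, and dropping the guard releases it.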

Cargo has a number of global resources which it now needs to lock, and the
strategy is done in a fairly straightforward way:

* Each registry's index contains one lock (a dotfile in the index). Updating the
  index requires an exclusive (read/write) lock while reading the index requires
  a shared lock. This should allow each process to ensure a registry update
  happens while not blocking out others for an unnecessarily long time.
  Additionally any number of processes can read the index.
* When downloading crates, each downloaded crate is individually locked. A lock
  for the downloaded crate implies a lock on the output directory as well.
  Because downloaded crates are immutable, once the downloaded directory exists
  the lock is no longer needed as it won't be modified, so it can be released.
  This granularity of locking allows multiple Cargo instances to download
  dependencies in parallel.
* Git repositories have separate locks for the database and for the project
  checkout. The database and checkout are locked for read/write access when an
  update is performed, and the lock of the checkout is held for the entire
  lifetime of the git source. This is done to ensure that any other Cargo
  processes must wait while we use the git repository. Unfortunately there's
  just not that much parallelism here.
* Binaries managed by `cargo install` are locked by the local metadata file that
  Cargo manages. This is relatively straightforward.
* The actual artifact output directory is just globally locked for the entire
  build. It's hypothesized that running Cargo concurrently in *one directory* is
  needed less often than running multiple instances of Cargo globally (for now
  at least). It would be possible to have finer grained locking here, but that
  can likely be deferred to a future PR.
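
The strategy above can be summarised as a small lookup table. This is only a sketch: the `Access`, `Mode`, and `Held` names are hypothetical, and where the description does not state a lock mode (crate downloads, the `cargo install` metadata file) the table assumes an exclusive lock.

```rust
/// Whether other holders may share the lock.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Mode {
    Shared,
    Exclusive,
}

/// How long the lock is kept once acquired.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Held {
    /// Only for the duration of the single operation.
    Operation,
    /// Until the unpacked crate directory exists.
    UntilUnpacked,
    /// For the whole lifetime of the git source.
    SourceLifetime,
    /// For the entire build.
    WholeBuild,
}

/// The global resources listed in the description (names are hypothetical).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Access {
    RegistryIndexRead,
    RegistryIndexUpdate,
    CrateDownload,
    GitDatabaseUpdate,
    GitCheckout,
    InstallMetadata,
    BuildOutput,
}

/// Maps each kind of access to the lock mode and hold time described above.
pub fn lock_plan(access: Access) -> (Mode, Held) {
    match access {
        Access::RegistryIndexRead => (Mode::Shared, Held::Operation),
        Access::RegistryIndexUpdate => (Mode::Exclusive, Held::Operation),
        // Assumption: exclusive, since the mode is not spelled out.
        Access::CrateDownload => (Mode::Exclusive, Held::UntilUnpacked),
        Access::GitDatabaseUpdate => (Mode::Exclusive, Held::Operation),
        Access::GitCheckout => (Mode::Exclusive, Held::SourceLifetime),
        // Assumption: exclusive, since the mode is not spelled out.
        Access::InstallMetadata => (Mode::Exclusive, Held::Operation),
        Access::BuildOutput => (Mode::Exclusive, Held::WholeBuild),
    }
}
```

Only reading the registry index maps to a shared lock here, which matches the observation that any number of processes can read the index at once.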

So with all of this infrastructure in place, Cargo is now ready to grab some
locks and ensure that you can call it concurrently anywhere at any time and
everything always works out as one might expect.

One interesting question, however, is what does Cargo do on contention? On one
hand Cargo could immediately abort, but this would lead to a pretty poor UI as
any Cargo process on the system could kick out any other. Instead this PR takes
a more nuanced approach.

* First, Cargo attempts to acquire the lock without blocking (a "try lock"). If
  this succeeds, we're done.
* Next, Cargo prints a message to the console that it's going to block waiting
  for a lock. This is done because it's indeterminate how long Cargo will wait
  for the lock to become available, and most long-lasting operations in Cargo
  have a message printed for them.
* Finally, a blocking acquisition of the lock is issued and we wait for it to
  become available.
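
The three steps can be sketched as one helper that is generic over how the lock is actually taken. This is an illustration under assumptions, not the code from this PR: the `acquire` name, its closure parameters, and the wording of the message are all made up for the example, and "would block" is assumed to be reported as `io::ErrorKind::WouldBlock`.

```rust
use std::io;

/// Try to take a lock without blocking; if it is contended, tell the user and
/// then wait for it.
///
/// * `try_lock` performs the non-blocking attempt.
/// * `block` performs the blocking acquisition.
/// * `notify` receives the message to show before blocking.
pub fn acquire<T, B, N>(what: &str, try_lock: T, block: B, notify: N) -> io::Result<()>
where
    T: FnOnce() -> io::Result<()>,
    B: FnOnce() -> io::Result<()>,
    N: FnOnce(&str),
{
    // Step 1: the "try lock". Success means there was no contention.
    match try_lock() {
        Ok(()) => return Ok(()),
        Err(e) => {
            // Anything other than contention is a real error.
            if e.kind() != io::ErrorKind::WouldBlock {
                return Err(e);
            }
        }
    }

    // Step 2: announce that we are about to wait for an unknown time.
    notify(&format!("Blocking: waiting for file lock on {}", what));

    // Step 3: blocking acquisition.
    block()
}
```

Passing the two acquisition steps as closures keeps the helper independent of the lock type, so the same flow could wrap a shared or an exclusive lock.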

So all in all this should help Cargo fix any future concurrency bugs with file
locking in a principled fashion while also allowing concurrent Cargo processes
to proceed reasonably across the system.

Closes #354

rust-highfive · 2016-03-15T05:35:29Z

r? @wycats

(rust_highfive has picked a reviewer for you, use r? to override)

alexcrichton · 2016-03-15T05:35:38Z

r? @brson

alexcrichton · 2016-03-15T05:36:46Z

cc #1764 and #1781, previous attempts to do the same thing

brson · 2016-03-16T00:09:21Z

src/cargo/util/flock.rs

+    })
+}
+
+fn create_dir_all(path: &Path) -> io::Result<()> {


Why does this module have its own function for this instead of using std?

alexcrichton · 2016-03-16

The version in the standard library doesn't currently handle the NotFound error as it's not intended to be run concurrently (which I have personally advocated for). This is basically the concurrent variant of those functions.

Maybe we should consider adding it to libstd at some point...
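
For illustration, a race-tolerant variant might look roughly like the sketch below. It is not the function from this PR: the names `create_one` and `create_dir_all_concurrent` are hypothetical, and the approach shown (treat "already exists" as success, create the parent and retry on "not found") is one assumed way to tolerate other processes creating the same directories at the same time.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Creates a single directory, treating "someone else already created it" as
/// success.
fn create_one(path: &Path) -> io::Result<()> {
    match fs::create_dir(path) {
        Err(ref e) if e.kind() == io::ErrorKind::AlreadyExists && path.is_dir() => Ok(()),
        other => other,
    }
}

/// Creates `path` and any missing parents, tolerating concurrent callers that
/// create the same directories.
pub fn create_dir_all_concurrent(path: &Path) -> io::Result<()> {
    match create_one(path) {
        // A parent is missing: fall through and create it first.
        Err(ref e) if e.kind() == io::ErrorKind::NotFound => {}
        // Success or an unrelated error: report it as-is.
        other => return other,
    }
    match path.parent() {
        Some(parent) => create_dir_all_concurrent(parent)?,
        None => {
            return Err(io::Error::new(
                io::ErrorKind::Other,
                "failed to create whole directory tree",
            ))
        }
    }
    // The parent exists now, so retry the directory itself.
    create_one(path)
}
```

Each step only attempts the creation and inspects the error afterwards, so there is no check-then-create window for another process to invalidate.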

brson · 2016-03-16T00:10:41Z

Hard to evaluate how correct it is but it looks awesome. r=me

alexcrichton · 2016-03-17T00:25:18Z

@bors: r=brson

bors · 2016-03-17T00:25:19Z

📌 Commit e3ffa0c has been approved by brson

bors · 2016-03-17T00:43:29Z

☔ The latest upstream changes (presumably #2241) made this pull request unmergeable. Please resolve the merge conflicts.

alexcrichton · 2016-03-17T00:49:13Z

@bors: r=brson e22b08c

bors · 2016-03-17T01:04:36Z

⌛ Testing commit e22b08c with merge 1a5ec9b...

bors · 2016-03-17T01:12:31Z

💔 Test failed - cargo-mac-64

alexcrichton · 2016-03-17T05:14:33Z

@bors: r=brson

bors · 2016-03-17T05:14:34Z

📌 Commit 8eac1d6 has been approved by brson

bors · 2016-03-17T05:14:36Z

⌛ Testing commit 8eac1d6 with merge c5360c4...

bors · 2016-03-17T05:29:10Z

☀️ Test successful - cargo-linux-32, cargo-linux-64, cargo-mac-32, cargo-mac-64, cargo-win-gnu-32, cargo-win-gnu-64, cargo-win-msvc-32, cargo-win-msvc-64

Currently we have several lines of work:

- Integrating with the kernel tree and build system (Nick's & mine, both uploaded as branches, based on `rustc`; and another one I have been working on, based on `cargo`).
- Bindings and the first bits of functionality (Alex's & Geoffrey's, based on `cargo`).

This patch effectively merges the work we have been doing and integrates it in the latest mainline kernel tree.

This does *not* mean anything needs to stay as-is, but this gives us a working, common base to work and experiment upon. Hopefully, it will also attract external people to join!

As a summary, I added:

- `cargo` integration with the kernel `Makefile`s:
  + Virtual `cargo` workspace to have a single lock file and to share deps between `cargo` jobs.
  + Picks the same optimization level as configured for C.
  + Verbose output on `V=1`.
  + A `cargoclean` target to clean all the Rust-related artifacts.
- Initial support for built-in modules (i.e. `Y` as well as `M`):
  + It is a hack, we need to implement a few things (see the `TODO`s), but it is enough to start playing with things that depend on `MODULE`.
  + Passes `--cfg module` to built-in modules to be able to compile conditionally inside Rust.
  + Increased `KSYM_NAME_LEN` length to avoid warnings due to Rust long mangled symbols.
- Rust infrastructure in a new top level folder `rust/`:
  + A `kernel` package which contains the sources from Alex & Geoffrey's work plus some changes:
    * Adapted `build.rs`.
    * Removed the `THIS_MODULE` emulation until it is implemented.
    * Removed `Makefile` logic and the code that `cfg`-depended on kernel version (no need in mainline).
    * Moved the helpers to be triggered via normal compilation, renamed them under `rust_*` and exported via `EXPORT_SYMBOL` instead.
    * Added a prelude.
  + A `shlex` package which serves as an example of an "inline" dependency (i.e. package checked out copy to avoid the network)
- The example driver was setup at `drivers/char/rust_example/`.
- Misc
  + The beginning of `Documentation/rust/` with a quick start guide.
  + `MAINTAINERS` entry.
  + SPDXs for all files.

Other notes that aren't in `TODO`s:

- We could minimize the network requirements (use `bindgen` binary, use more inline dependencies...), but it is not clear it would be a good idea for the moment due to `core`/`alloc`/`compiler-builtins`.
- The intention of `rust/` is to have a place to put extra dependencies and split the `kernel` package into several in the future if it grows. It could resemble the usual kernel tree structure.
- With several drivers being built-in, `cargo` recompiles `kernel` and triggers duplicate symbol errors when merging thin archives. We need to make it only compile `kernel` once.
- When the above works, then `make`'s `-j` calling concurrent `cargo`s (e.g. several drivers at the same time) should be OK, I think. According to rust-lang/cargo#2486 it shouldn't miscompile anything, but if they start locking on each other we will have make jobs waiting that could have been doing something else.

Signed-off-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>

rust-highfive assigned wycats Mar 15, 2016

rust-highfive assigned brson and unassigned wycats Mar 15, 2016

brson reviewed Mar 16, 2016
alexcrichton force-pushed the flock branch 3 times, most recently from 97c6a31 to e3ffa0c Compare March 17, 2016 00:25

alexcrichton force-pushed the flock branch from e3ffa0c to e22b08c Compare March 17, 2016 00:48

alexcrichton force-pushed the flock branch from e22b08c to 8eac1d6 Compare March 17, 2016 05:14

bors merged commit 8eac1d6 into rust-lang:master Mar 17, 2016

This was referenced Mar 17, 2016

Support relative URLs, closes #2252 #2253

Closed

Network retry issue 1602 #2396

Merged

Update to rust-url 1.0 #2428

Closed

alexcrichton mentioned this pull request Mar 17, 2016

assertion failed: checkout.is_fresh() #2271

Closed

colin-kiegel mentioned this pull request Mar 18, 2016

Spawning rustc process on every save AtomLinter/linter-rust#16

Closed

alexcrichton deleted the flock branch March 28, 2016 05:52

jfhumann mentioned this pull request Mar 30, 2016

Cargo does not recover from failed git update (File lock issue?) #2533

Closed

japaric mentioned this pull request Apr 2, 2016

can't run xargo concurrently japaric/xargo#5

Closed

alexcrichton mentioned this pull request Apr 5, 2016

Use lockfile on incr. comp. directory rust-lang/rust#32754

Closed

retrry mentioned this pull request Apr 19, 2016

buildRustPackage is not very nix-ish NixOS/nixpkgs#11144

Closed

ojeda mentioned this pull request Sep 4, 2020

Initial Rust support Rust-for-Linux/linux#4

Merged

tjkirch mentioned this pull request Mar 5, 2021

os: run static and non-static Rust builds in parallel bottlerocket-os/bottlerocket#1368

Merged

idelvall mentioned this pull request Feb 12, 2024

Remove target locking earthly/lib#43

Closed

l4l mentioned this pull request Jun 12, 2024

Sharing TARGET_DIR leads to spurious missing items #14053

Open
