
Avoid buffering large amounts of rustc output. #7838

Merged · 4 commits · Mar 12, 2020

Conversation

@ehuss (Contributor) commented Jan 26, 2020

If rustc prints a lot of output (such as with RUSTC_LOG, or a huge number of diagnostics), Cargo would buffer large amounts of it in memory. For normal builds, this happens if the terminal cannot print fast enough. For "fresh" replay, everything was being buffered.

There are two issues:

  1. There is no back-pressure on the mpsc queue. If messages come in faster than they can be processed, it grows without bound.
  2. The cache-replay code runs in the "fresh" code path, which does not spawn a thread. Thus the main thread is blocked and unable to process messages while the replay is happening.

The solution here is to use a bounded queue, and to always spawn a thread for the "fresh" case.
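
For illustration, here is a minimal sketch of the back-pressure a bounded queue provides (it uses std::sync::mpsc::sync_channel with a placeholder bound of 100 and a plain String payload, not Cargo's actual message type):

    use std::sync::mpsc::sync_channel;
    use std::thread;

    fn main() {
        // A bounded channel: `send` blocks once 100 messages are queued, so a
        // fast producer (rustc output) cannot outrun the consumer (the main
        // thread printing to the terminal) without limit.
        let (tx, rx) = sync_channel::<String>(100);

        let producer = thread::spawn(move || {
            for i in 0..1_000 {
                // Blocks here whenever the queue is already full.
                tx.send(format!("diagnostic line {}", i)).unwrap();
            }
        });

        // The consumer drains the queue, so at most ~100 messages are ever
        // held in memory at once.
        for line in rx {
            println!("{}", line);
        }
        producer.join().unwrap();
    }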

The main concern here is performance. Previously the "fresh" jobs avoided spawning a thread to improve performance. I did a fair bit of profiling to understand the impact, using projects with anywhere from 100 to 500 units. On my macOS machine, I found spawning a thread to be slightly faster (1-5%). On Linux and Windows, it was generally about 0 to 5% slower. It might be helpful for others to profile it on their own system.

I'm on the fence about the cost/benefit here. It seems generally good to reduce memory usage, but the slight performance hit is disappointing. I tried several other approaches to fix this, all with worse trade-offs (I can discuss them if interested).

Fixes #6197

@rust-highfive

r? @Eh2406

(rust_highfive has picked a reviewer for you, use r? to override)

@rust-highfive added the S-waiting-on-review label (Status: Awaiting review from the assignee but also interested parties) on Jan 26, 2020
@Eh2406 (Contributor) commented Jan 27, 2020

The code looks ok to r+ when we are done with the discussion.

@alexcrichton (Member)

Agreed that the concern here is perf, and my "benchmark" is to check out servo, do cargo build, and then see how long it takes to do that again. I did some quick testing locally though and also haven't been able to see much of a difference.

I think this is a case where the difference in perf, if any, is going to be pretty small (as measured by @ehuss). I also think it's best to be "more correct" in terms of not blocking the main thread and not hogging tons of memory.

Before I r+, though, I wanted to clarify something. When we buffer huge amounts of output here, is Cargo actually going to print all of it? Or is Cargo buffering it in one location and later deciding not to print it? (If that is the case, it seems like a better bug to fix; but if we're actually destined to print everything we read, then we're just optimizing how we handle output we're already going to print.)

@Mark-Simulacrum (Member)

I would like to see a comment on the sync_channel pointing at this PR, at least.

Did you choose 100 as the bound for the queue length for some reason, or just "some number"? I'm wondering if that will fail to work well with the jobserver-per-rustc flag, as in that scenario on a ~16 core machine we would expect 16*15 = 240 token requests to come in from all the rustc processes fairly quickly (and keep coming in during the first few seconds of the build). For that case, we actually don't need to do anything with those requests (most will be ~immediately dropped on the floor, as by the time we get to them the process is already done, I imagine), but if this limit of 100 causes us to stall out and miss "Finished" events, that could seriously slow Cargo down.

(To be clear, I don't think fixing the above is necessary, it's an unstable flag for a reason -- but wanted to dump my thoughts somewhere at least :)

@alexcrichton (Member)

As we continue to scrutinize the channels in Cargo, I'm becoming a bit more wary of making sends blocking. I think we may want to do a quick audit of where Cargo sends messages to see whether it's ok to block. The areas I can see are:

  • All methods on JobState may block now. This notably includes the stdout printing and such (the whole point of this PR). I think this is mostly ok, but we may experience some timing issues depending on when timestamps get written. For example, rustc may finish, but it may take us quite some time to later touch a file to make a timestamp. I don't think this is a problem, but figured it'd be worth mentioning.
  • Sending tokens from the jobserver helper thread to the main thread. @ehuss this is like the interaction you were seeing, although I would have expected a deadlock rather than blocking for just a while. In any case this is pretty sketchy, because the jobserver helper thread needs to be shut down, but it's blocking here where it didn't expect to be blocking. (Note that the signals are intended to interrupt the blocking read call, if one is active.)
  • Handling diagnostics, which I think is basically the same bug as the jobserver helper thread all over again. We want to terminate the thread eventually, but doing so may be problematic if it's blocking where we didn't expect it to block.

I suppose, though, that the "maybe issues" are in practice never going to arise, because in theory we should never start shutting down until the whole message queue has been drained.

@alexcrichton (Member)

Reading more of @Mark-Simulacrum's comment as well, I think it's actually a pretty good point. I'm wondering now if it might be best to have a more surgical fix here where we rate-limit stdout information getting printed but not rate limit other more high-priority messages. For example everything about jobserver management is a pretty high-priority message (or anything related to scheduling) whereas printing things is informational and can happen whenever.

We could perhaps consider a fix where there's a fixed capacity above which print messages block the sender, but that's it: all other messages (such as scheduling things) are unconditionally sent and never block.

@Mark-Simulacrum (Member)

One similar option perhaps is to try to move all stderr/stdout printing to a separate thread. AFAICT, it doesn't really interact with the scheduling at all. It also seems like 90% of the problem comes from the fact that currently all Fresh jobs (whose output is on disk) load it into memory and send it over the channel. Can we instead give the Message event two variants, one of which we'd thread down as deeply as possible and then stream from disk to stderr/stdout? Ideally that would avoid most buffering, whereas today I believe some buffering is sort of unavoidable (i.e. a single message could be 20 megabytes for larger crates).
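
A minimal sketch of that streaming idea (the path handling and function name here are hypothetical; Cargo's actual cache-replay code is organized differently):

    use std::fs::File;
    use std::io::{self, BufRead, BufReader, Write};
    use std::path::Path;

    /// Stream a cached rustc output file to stderr line by line instead of
    /// loading the whole file into memory and sending it over a channel.
    fn replay_cached_output(path: &Path) -> io::Result<()> {
        let reader = BufReader::new(File::open(path)?);
        let stderr = io::stderr();
        let mut out = stderr.lock();
        for line in reader.lines() {
            // Only one line is held in memory at a time, so even a
            // multi-megabyte cache file is never buffered wholesale.
            writeln!(out, "{}", line?)?;
        }
        Ok(())
    }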

@ehuss (Contributor, Author) commented Jan 29, 2020

When we buffer huge amounts of output here, is Cargo actually going to print all the output? Or is cargo buffering it in one location and later deciding to not print it?

It is going to be printed.

Did you choose 100 as the bound for the queue length for some reason, or just "some number"?

100 is pretty arbitrary. Messages can be large (multiple kB), so I figured keeping up to a few megabytes in memory seemed like a good limit.

Your concerns about large numbers of token messages sound reasonable.

more surgical fix here where we rate-limit stdout information getting printed but not rate limit other more high-priority messages

This sounds good to me. I actually started with a different design where I had two separate queues, one for stdout/stderr, and one for everything else. But it ended up being quite a bit more complex. (I also had a branch where Shell is in a mutex and I removed Message::Stdout/Stderr, but it made the progress bar flicker too much.)

I'll try to take some time and digest your comments. I think you're both right, and this should probably have a different solution.

@bors (Collaborator) commented Jan 30, 2020

☔ The latest upstream changes (presumably #7844) made this pull request unmergeable. Please resolve the merge conflicts.

@ehuss (Contributor, Author) commented Mar 6, 2020

I pushed a different approach using two channels.

This change is somewhat risky, since there are some really subtle behaviors here. I've tried to think of all that could go wrong, and haven't come up with anything, yet. All other solutions I've thought of tend to be more complicated and riskier.

There is only one behavioral change that I can think of: the message "build failed, waiting for other jobs to finish..." can now be printed in between other messages, where previously it would be printed after the faulty job finished. I'm not sure how likely that is, or whether it really matters.

@alexcrichton (Member)

Once we get into the realm of multiple channels I agree it's pretty hairy. Could we stick with one channel, though? We could implement our own simple channel which is just a wrapper around Arc<Mutex<Vec<T>>>, with two methods: one that always pushes and one that waits to push until the list is under a certain length.

I think that would ideally help keep the concurrency here pretty simple since it's still just one queue of messages going out.
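
A rough sketch of what such a wrapper could look like (my illustration of the idea, not the queue type Cargo actually merged; it uses a VecDeque plus two condition variables, and the bound is supplied by the caller):

    use std::collections::VecDeque;
    use std::sync::{Condvar, Mutex};
    use std::time::Duration;

    /// One queue, two ways to push: `push` never blocks (scheduling
    /// messages), `push_bounded` blocks while the queue is at capacity
    /// (bulky stdout/stderr output).
    pub struct Queue<T> {
        state: Mutex<VecDeque<T>>,
        popper_cv: Condvar,  // signaled when an item is pushed
        bounded_cv: Condvar, // signaled when space frees up
        bound: usize,
    }

    impl<T> Queue<T> {
        pub fn new(bound: usize) -> Queue<T> {
            Queue {
                state: Mutex::new(VecDeque::new()),
                popper_cv: Condvar::new(),
                bounded_cv: Condvar::new(),
                bound,
            }
        }

        /// High-priority push: always succeeds immediately.
        pub fn push(&self, item: T) {
            self.state.lock().unwrap().push_back(item);
            self.popper_cv.notify_one();
        }

        /// Low-priority push: waits until the queue has room.
        pub fn push_bounded(&self, item: T) {
            let mut state = self.state.lock().unwrap();
            while state.len() >= self.bound {
                state = self.bounded_cv.wait(state).unwrap();
            }
            state.push_back(item);
            self.popper_cv.notify_one();
        }

        /// Pop with a timeout; wakes a blocked `push_bounded` when space
        /// opens up.
        pub fn pop(&self, timeout: Duration) -> Option<T> {
            let mut state = self.state.lock().unwrap();
            if state.is_empty() {
                let (guard, result) =
                    self.popper_cv.wait_timeout(state, timeout).unwrap();
                state = guard;
                if result.timed_out() && state.is_empty() {
                    return None;
                }
            }
            let value = state.pop_front()?;
            if state.len() < self.bound {
                self.bounded_cv.notify_one();
            }
            Some(value)
        }
    }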

@ehuss (Contributor, Author) commented Mar 6, 2020

You mean #7845? I'd be much happier with that.

@alexcrichton (Member)

Effectively, yeah, I don't think it's worth trying to bend over backwards to use crates.io for a simple channel here, especially if it's adding a lot of complexity in thinking about the concurrency here.

alexcrichton and others added 2 commits March 7, 2020 14:01
We don't need the complexity of most channels since this is not a
performance-sensitive part of Cargo, nor is it likely to be so any time
soon. Coupled with recent bugs (rust-lang#7840) we believe are in
`std::sync::mpsc`, let's just not use that and instead use a custom
queue type locally, which should be amenable to a blocking push soon too.
@ehuss (Contributor, Author) commented Mar 8, 2020

I have pushed a new approach that uses #7845 instead. I'm still not sure how I feel about it. I can't think of specific problems. I ran a variety of performance tests, and it was roughly the same.

}
scope.spawn(move |_| doit());
Reviewer (Member):

I think this change may no longer be necessary, but did you want to include it anyway here?

@ehuss (Contributor, Author):

It is necessary; otherwise the cached-message playback would deadlock if there were more than 100 messages. The playback shouldn't happen on the main thread, because otherwise there is nothing draining messages while they are added to the queue.
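
To make the hazard concrete, a small sketch against the simplified Queue outlined earlier in this thread (not Cargo's actual job-queue loop):

    use std::sync::Arc;
    use std::thread;
    use std::time::Duration;

    fn replay_cached_messages(queue: Arc<Queue<String>>) {
        // The replay must run on a worker thread...
        let q = Arc::clone(&queue);
        let replay = thread::spawn(move || {
            for i in 0..500 {
                // ...because with a bound of 100, the 101st push_bounded
                // would block forever if nothing were draining concurrently.
                q.push_bounded(format!("cached message {}", i));
            }
        });

        // ...while the main thread stays in its drain loop.
        let mut seen = 0;
        while seen < 500 {
            if queue.pop(Duration::from_millis(100)).is_some() {
                seen += 1;
            }
        }
        replay.join().unwrap();
    }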

Reviewer (Member):

Ah right yeah, forgot about that!

@ehuss (Contributor, Author):

I added a test for message caching to check for deadlock.

/// Pushes an item onto the queue, blocking if the queue is full.
pub fn push_bounded(&self, item: T) {
    let mut state = self.state.lock().unwrap();
    loop {
Reviewer (Member):

This might be able to make use of the nifty wait_until method:

let state = self.bounded_cv.wait_until(state, |s| s.items.len() < self.bound).unwrap();

@ehuss (Contributor, Author):

Didn't know that existed!

    // Assumes threads cannot be canceled.
    self.bounded_cv.notify_one();
}
Some(value)
Reviewer (Member):

This might actually also get cleaned up a good amount with wait_timeout_until

let (mut state, result) = self.popper_cv.wait_timeout_until(
    self.state.lock().unwrap(),
    timeout,
    |s| s.items.len() > 0,
).unwrap();
if result.timed_out() {
    None 
} else {
    // conditionally notify `bounded_cv`
    state.items.pop_front()
}

@ehuss (Contributor, Author):

Hm, after thinking about it some more, this subtly changes the semantics. If there are multiple poppers, and both are awoken, then one will get a value and the other won't. We don't use multiple poppers, but for the push_bounded case, it could result in pushing too many elements on the queue. To guard against that, we would need to keep the loops, which ends up not simplifying at all.

In general, it probably doesn't matter, but I would prefer to keep the current semantics with the loop that "retries" after the thread is awakened.

Reviewer (Member):

Hm, I'm not sure I follow, because if the closure returns true then that lock is persisted and returned, so we can't have two poppers simultaneously exit the wait-timeout loop, I believe? I think this is the same for the push case as well: when we get a lock back after wait_until, we're guaranteed that the condition evaluates to true for the locked state we were returned.

@ehuss (Contributor, Author):

Ah. Somehow it didn't click that it was atomically locked.

Pushed a commit with the change. Since it is unstable until 1.42, it will need to wait until Thursday.
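
For reference, a sketch of push_bounded in the condvar-predicate style against the simplified queue sketched earlier (this uses Condvar::wait_while, the name under which this API stabilized in 1.42; the predicate is inverted relative to the wait_until spelling quoted above, and this is only an illustration, not necessarily the exact code that landed):

    /// Pushes an item onto the queue, blocking while the queue is full.
    pub fn push_bounded(&self, item: T) {
        let state = self.state.lock().unwrap();
        // `wait_while` re-checks the predicate each time the thread wakes,
        // and it returns with the lock held and the predicate false, so
        // spurious or competing wakeups cannot overfill the queue.
        let mut state = self
            .bounded_cv
            .wait_while(state, |s| s.len() >= self.bound)
            .unwrap();
        state.push_back(item);
        self.popper_cv.notify_one();
    }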

Reviewer (Member):

Oh oops, sorry about that, I totally didn't realize it was still unstable... In any case at least Thursday isn't that far off!

@alexcrichton (Member)

@bors: r+

@bors (Collaborator) commented Mar 12, 2020

📌 Commit 05a1f43 has been approved by alexcrichton

@bors added the S-waiting-on-bors label (Status: Waiting on bors to run and complete tests; Bors will change the label on completion) and removed the S-waiting-on-review label on Mar 12, 2020
@bors (Collaborator) commented Mar 12, 2020

⌛ Testing commit 05a1f43 with merge 2bdc879...

@bors (Collaborator) commented Mar 12, 2020

☀️ Test successful - checks-azure
Approved by: alexcrichton
Pushing 2bdc879 to master...

@bors merged commit 2bdc879 into rust-lang:master on Mar 12, 2020
bors added a commit to rust-lang/rust that referenced this pull request Mar 18, 2020
Update cargo

21 commits in bda50510d1daf6e9c53ad6ccf603da6e0fa8103f..7019b3ed3d539db7429d10a343b69be8c426b576
2020-03-02 18:05:34 +0000 to 2020-03-17 21:02:00 +0000
- Run through clippy (rust-lang/cargo#8015)
- Fix config profiles using "dev" in `cargo test`. (rust-lang/cargo#8012)
- Run CI on all PRs. (rust-lang/cargo#8011)
- Add unit-graph JSON output. (rust-lang/cargo#7977)
- Split workspace/validate() into multiple functions (rust-lang/cargo#8008)
- Use Option::as_deref (rust-lang/cargo#8005)
- De-duplicate edges (rust-lang/cargo#7993)
- Revert "Disable preserving mtimes on archives" (rust-lang/cargo#7935)
- Close the front door for clippy but open the back (rust-lang/cargo#7533)
- Fix CHANGELOG.md typos (rust-lang/cargo#7999)
- Update changelog note about crate-versions flag. (rust-lang/cargo#7998)
- Bump to 0.45.0, update changelog (rust-lang/cargo#7997)
- Bump libgit2 dependencies (rust-lang/cargo#7996)
- Avoid buffering large amounts of rustc output. (rust-lang/cargo#7838)
- Add "Updating" status for git submodules. (rust-lang/cargo#7989)
- WorkspaceResolve: Use descriptive lifetime label. (rust-lang/cargo#7990)
- Support old html anchors in manifest chapter. (rust-lang/cargo#7983)
- Don't create hardlink for library test and integrations tests, fixing rust-lang/cargo#7960 (rust-lang/cargo#7965)
- Partially revert change to filter debug_assertions. (rust-lang/cargo#7970)
- Try to better handle restricted crate names. (rust-lang/cargo#7959)
- Fix bug with new feature resolver and required-features. (rust-lang/cargo#7962)
@ehuss added this to the 1.44.0 milestone on Feb 6, 2022
Merging this pull request may close: Cargo keeps entire rustc/rustdoc stdout in memory (#6197)