
Avoid buffering large amounts of rustc output. #7838

Merged
merged 4 commits into rust-lang:master from ehuss:fix-memory-rustc-output on Mar 12, 2020

Conversation

@ehuss (Contributor) commented Jan 26, 2020

If rustc prints a lot of output (such as with RUSTC_LOG, or a huge number of diagnostics), Cargo would buffer large amounts of it in memory. For normal builds, this would happen if the terminal does not print fast enough. For "fresh" replay, everything was being buffered.

There are two issues:

  1. There is no back-pressure on the mpsc queue. If messages come in faster than they can be processed, it grows without bounds.
  2. The cache-replay code runs in the "fresh" code path which does not spawn a thread. Thus the main thread was blocked and unable to process Messages while the replay is happening.

The solution here is to use a bounded queue, and to always spawn a thread for the "fresh" case.
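
As a rough illustration of the back-pressure idea (not the actual cargo code; `sync_channel` here is a stand-in for Cargo's message queue, and the bound of 100 and the message type are placeholders):

```rust
use std::sync::mpsc::sync_channel;
use std::thread;

fn main() {
    // Bounded queue: `send` blocks once 100 messages are in flight, so a
    // fast producer (rustc output, or cached-diagnostic replay) cannot
    // grow memory without bound.
    let (tx, rx) = sync_channel::<String>(100);

    // The producer runs on its own thread. This is also why the "fresh"
    // replay must be spawned: the main thread has to stay free to drain.
    let producer = thread::spawn(move || {
        for i in 0..10_000 {
            tx.send(format!("diagnostic line {}", i)).unwrap();
        }
    });

    // Main thread drains messages, e.g. writes them to the terminal.
    for line in rx {
        let _ = line;
    }
    producer.join().unwrap();
}
```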

The main concern here is performance. Previously the "fresh" jobs avoided spawning a thread to improve performance. I did a fair bit of profiling to understand the impact, using projects with anywhere from 100 to 500 units. On my macOS machine, I found spawning a thread to be slightly faster (1-5%). On Linux and Windows, it was generally about 0 to 5% slower. It might be helpful for others to profile it on their own system.

I'm on the fence about the cost/benefit here. It seems generally good to reduce memory usage, but the slight performance hit is disappointing. I tried several other approaches to fix this, all with worse trade-offs (I can discuss them if interested).

Fixes #6197

@rust-highfive commented Jan 26, 2020

r? @Eh2406

(rust_highfive has picked a reviewer for you, use r? to override)

@Eh2406 (Contributor) commented Jan 27, 2020

The code looks ok to r+ when we are done with the discussion.

@alexcrichton (Member) commented Jan 28, 2020

Agreed that the concern here is perf, and my "benchmark" is to check out servo, do cargo build, and then see how long it takes to do that again. I did some quick testing locally though and also haven't been able to see much of a difference.

I think this is a case where the difference in perf, if any, is going to be pretty small (as measured by @ehuss). I also think it's best to be "more correct" in terms of not blocking the main thread and not hogging tons of memory.

Before I r+, though, I wanted to clarify something. When we buffer huge amounts of output here, is Cargo actually going to print all the output? Or is cargo buffering it in one location and later deciding to not print it? (If that's the case, it seems like a better bug to fix; but if we're actually destined to print everything we read, then we're just optimizing here what we're already printing.)

@Mark-Simulacrum (Member) commented Jan 28, 2020

I would like to see a comment on the sync_channel pointing at this PR, at least.

Did you choose 100 as the bound for the queue length for some reason, or just "some number"? I'm wondering if that will fail to work well with the jobserver-per-rustc flag, since in that scenario on a ~16 core machine we would expect 16*15 = 240 token requests to come in from all the rustc processes fairly quickly (and keep coming in during the first few seconds of the build). For that case we don't actually need to do anything with those requests (most will be ~immediately dropped on the floor, since by the time we get to them the process is already done, I imagine), but if this limit of 100 causes us to stall out and miss "Finished" events, that could seriously slow Cargo down.

(To be clear, I don't think fixing the above is necessary, it's an unstable flag for a reason -- but wanted to dump my thoughts somewhere at least :)

@alexcrichton (Member) commented Jan 29, 2020

As we continue to scrutinize the channels in Cargo I'm becoming a bit more wary of making sends blocking. I think we may also want to do a quick audit of where Cargo sends messages to see if it's ok to block. The areas I can see are:

  • All methods on JobState may block now. This notably includes the stdout printing and such (the whole point of this PR). I think this is mostly ok but we may experience some timing issues depending on when timestamps get written and such. For example rustc may finish but it may take us quite some time to later touch a file to make a timestamp. I don't think this is a problem, but figured it'd be worth mentioning.
  • Sending tokens from the jobserver helper thread to the main thread. @ehuss this is like the interaction you were seeing, although I would have expected deadlock rather than blocking for just a while. In any case this is pretty sketchy because the jobserver helper thread needs to be shut down, but it's blocking here where it didn't expect to be blocking. (Note that the signals are intended to interrupt the blocking read call, if one is active.)
  • Handling diagnostics, which I think is basically the same bug as the jobserver helper thread all over again. We want to terminate the thread eventually, but doing so may be problematic if it's blocking where we didn't expect it to block.

I suppose, though, that the "maybe issues" will in practice never arise, because in theory we should never start shutting down until the whole message queue has been drained.

@alexcrichton (Member) commented Jan 29, 2020

Reading more of @Mark-Simulacrum's comment as well, I think it's actually a pretty good point. I'm wondering now if it might be best to have a more surgical fix here where we rate-limit stdout information getting printed but not rate limit other more high-priority messages. For example everything about jobserver management is a pretty high-priority message (or anything related to scheduling) whereas printing things is informational and can happen whenever.

We could perhaps consider a fix where there's a fixed capacity, above which print messages block the sender, but that's it. All other messages (such as scheduling things) are unconditionally sent and never block.

@Mark-Simulacrum (Member) commented Jan 29, 2020

One similar option perhaps is to try and move all stderr/stdout printing to a separate thread. AFAICT, it pretty much doesn't interact with the scheduling at all. It also seems like 90% of the problem comes from the fact that currently all Fresh jobs (whose output is on disk) load it into memory and send it over the channel. Can we instead make the Message event have two variants, one of which we'd thread down as deeply as possible and then stream from disk to stderr/out? Ideally that would avoid most buffering, whereas today I believe some buffering is sort of unavoidable (i.e. a single message could be 20 megabytes for larger crates).
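
A rough sketch of that streaming idea (the function name and `cache_path` are hypothetical, not cargo's actual API): the replay path could copy the cached output file straight to stderr instead of loading it into a single message.

```rust
use std::fs::File;
use std::io::{self, BufReader};
use std::path::Path;

/// Stream a cached rustc output file directly to stderr rather than
/// reading it into memory and sending it over the channel in one piece.
/// (`cache_path` is a hypothetical location for the cached output.)
fn replay_cached_output(cache_path: &Path) -> io::Result<u64> {
    let mut reader = BufReader::new(File::open(cache_path)?);
    let stderr = io::stderr();
    let mut handle = stderr.lock();
    // io::copy moves data in fixed-size chunks, so peak memory stays small
    // even if the cached output is tens of megabytes.
    io::copy(&mut reader, &mut handle)
}
```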

@ehuss (Contributor, author) commented Jan 29, 2020

When we buffer huge amounts of output here, is Cargo actually going to print all the output? Or is cargo buffering it in one location and later deciding to not print it?

It is going to be printed.

Did you choose 100 as the bound for the queue length for some reason, or just "some number"?

100 is pretty arbitrary. Messages can be large (multiple kB), so I figured keeping up to a few megabytes in memory seemed like a good limit.

Your concerns about large numbers of token messages sound reasonable.

more surgical fix here where we rate-limit stdout information getting printed but not rate limit other more high-priority messages

This sounds good to me. I actually started with a different design where I had two separate queues, one for stdout/stderr, and one for everything else. But it ended up being quite a bit more complex. (I also had a branch where Shell is in a mutex and I removed Message::Stdout/Stderr, but it made the progress bar flicker too much.)

I'll try to take some time and digest your comments. I think you're both right, and this should probably have a different solution.

@bors (Contributor) commented Jan 30, 2020

☔️ The latest upstream changes (presumably #7844) made this pull request unmergeable. Please resolve the merge conflicts.

@ehuss force-pushed the ehuss:fix-memory-rustc-output branch from e3f5032 to 9839826 on Mar 6, 2020
@ehuss (Contributor, author) commented Mar 6, 2020

I pushed a different approach using two channels.

This change is somewhat risky, since there are some really subtle behaviors here. I've tried to think of all that could go wrong, and haven't come up with anything, yet. All other solutions I've thought of tend to be more complicated and riskier.

There is only one behavioral change I can think of: the message "build failed, waiting for other jobs to finish..." can now be printed in between other messages, where previously it would be printed after the faulty job finished. I'm not sure how likely that is, or whether it really matters.

@alexcrichton (Member) commented Mar 6, 2020

Once we get into the realm of multiple channels I agree it's pretty hairy. Could we stick with one channel though? We could implement our own simple channel which is just a wrapper around Arc<Mutex<Vec<T>>> with two methods: one which always pushes, and one which waits to push until the list is under a certain size.

I think that would ideally help keep the concurrency here pretty simple since it's still just a single queue of messages going out.
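
A minimal sketch of such a queue, assuming a `VecDeque` guarded by a `Mutex` plus two condvars (the field and method names echo the snippets quoted later in this review, but this is not the actual cargo implementation):

```rust
use std::collections::VecDeque;
use std::sync::{Condvar, Mutex};

/// A single queue with one non-blocking and one bounded (blocking) push.
pub struct Queue<T> {
    state: Mutex<VecDeque<T>>,
    popper_cv: Condvar,  // signaled when an item is pushed
    bounded_cv: Condvar, // signaled when the queue drops below `bound`
    bound: usize,
}

impl<T> Queue<T> {
    pub fn new(bound: usize) -> Queue<T> {
        Queue {
            state: Mutex::new(VecDeque::new()),
            popper_cv: Condvar::new(),
            bounded_cv: Condvar::new(),
            bound,
        }
    }

    /// High-priority messages (tokens, job completion): never block.
    pub fn push(&self, item: T) {
        self.state.lock().unwrap().push_back(item);
        self.popper_cv.notify_one();
    }

    /// Informational output: block the sender while the queue is full.
    pub fn push_bounded(&self, item: T) {
        let mut state = self.state.lock().unwrap();
        while state.len() >= self.bound {
            // Loop to re-check the condition after spurious wakeups.
            state = self.bounded_cv.wait(state).unwrap();
        }
        state.push_back(item);
        self.popper_cv.notify_one();
    }

    /// Block until an item is available, then wake one bounded pusher.
    pub fn pop(&self) -> T {
        let mut state = self.state.lock().unwrap();
        while state.is_empty() {
            state = self.popper_cv.wait(state).unwrap();
        }
        let item = state.pop_front().unwrap();
        if state.len() < self.bound {
            self.bounded_cv.notify_one();
        }
        item
    }
}
```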

@ehuss (Contributor, author) commented Mar 6, 2020

You mean #7845? I'd be much happier with that.

@alexcrichton (Member) commented Mar 6, 2020

Effectively, yeah, I don't think it's worth trying to bend over backwards to use crates.io for a simple channel here, especially if it's adding a lot of complexity in thinking about the concurrency here.

alexcrichton and others added 2 commits Jan 29, 2020
We don't need the complexity of most channels since this is not a
performance sensitive part of Cargo, nor is it likely to be so any time
soon. Coupled with recent bugs (#7840) we believe are in `std::sync::mpsc`,
let's just not use that and use a custom queue type locally which should
be amenable to a blocking push soon too.
@ehuss force-pushed the ehuss:fix-memory-rustc-output branch from 9839826 to e2b28f7 on Mar 8, 2020
@ehuss (Contributor, author) commented Mar 8, 2020

I have pushed a new approach that uses #7845 instead. I'm still not sure how I feel about it. I can't think of specific problems. I ran a variety of performance tests, and it was roughly the same.

}
scope.spawn(move |_| doit());

@alexcrichton (Member) commented Mar 10, 2020

I think this change may no longer be necessary, but did you want to include it anyway here?

@ehuss (Contributor, author) commented Mar 10, 2020

It is necessary, otherwise the cached message playback would deadlock if there were more than 100 messages. The playback shouldn't happen on the main thread, otherwise there is nothing to drain messages while they are added to the queue.

@alexcrichton (Member) commented Mar 10, 2020

Ah right yeah, forgot about that!

@ehuss (Contributor, author) commented Mar 10, 2020

I added a test for message caching to check for deadlock.

/// Pushes an item onto the queue, blocking if the queue is full.
pub fn push_bounded(&self, item: T) {
    let mut state = self.state.lock().unwrap();
    loop {

@alexcrichton (Member) commented Mar 10, 2020

This might be able to make use of the nifty wait_until method:

let state = self.bounded_cv.wait_until(state, |s| s.items.len() < self.bound).unwrap();

@ehuss (Contributor, author) commented Mar 10, 2020

Didn't know that existed!

    // Assumes threads cannot be canceled.
    self.bounded_cv.notify_one();
}
Some(value)

@alexcrichton (Member) commented Mar 10, 2020

This might actually also get cleaned up a good amount with wait_timeout_until

let (mut state, result) = self.popper_cv.wait_timeout_until(
    self.state.lock().unwrap(),
    timeout,
    |s| s.items.len() > 0,
).unwrap();
if result.timed_out() {
    None 
} else {
    // conditionally notify `bounded_cv`
    state.items.pop_front()
}

@ehuss (Contributor, author) commented Mar 10, 2020

Hm, after thinking about it some more, this subtly changes the semantics. If there are multiple poppers, and both are awoken, then one will get a value and the other won't. We don't use multiple poppers, but for the push_bounded case, it could result in pushing too many elements on the queue. To guard against that, we would need to keep the loops, which ends up not simplifying at all.

In general, it probably doesn't matter, but I would prefer to keep the current semantics with the loop that "retries" after the thread is awakened.

@alexcrichton (Member) commented Mar 10, 2020

Hm I'm not sure I follow, because if the closure returns true then that lock is persisted and returned, so we can't have two poppers simultaneously exit the wait timeout loop I believe? I think this is the same for the push case as well, where when we get a lock back after wait_until we're guaranteed that the condition evaluates true for the lock state we were returned.

@ehuss (Contributor, author) commented Mar 10, 2020

Ah. Somehow it didn't click that it was atomically locked.

Pushed a commit with the change. Since it is unstable until 1.42, it will need to wait until Thursday.
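
For reference, a sketch of what the simplified methods might look like with the `Condvar::wait_while` / `wait_timeout_while` helpers stabilized in 1.42 (illustrative only, not the committed code; the queue fields follow the earlier sketch):

```rust
use std::collections::VecDeque;
use std::sync::{Condvar, Mutex};
use std::time::Duration;

struct Queue<T> {
    state: Mutex<VecDeque<T>>,
    popper_cv: Condvar,
    bounded_cv: Condvar,
    bound: usize,
}

impl<T> Queue<T> {
    /// Block while the queue is at capacity, then enqueue.
    fn push_bounded(&self, item: T) {
        let mut state = self
            .bounded_cv
            .wait_while(self.state.lock().unwrap(), |s| s.len() >= self.bound)
            .unwrap();
        state.push_back(item);
        self.popper_cv.notify_one();
    }

    /// Wait up to `timeout` for an item; wake a bounded pusher on success.
    fn pop(&self, timeout: Duration) -> Option<T> {
        let (mut state, _timed_out) = self
            .popper_cv
            .wait_timeout_while(self.state.lock().unwrap(), timeout, |s| s.is_empty())
            .unwrap();
        let item = state.pop_front()?; // still empty => we timed out
        if state.len() < self.bound {
            self.bounded_cv.notify_one();
        }
        Some(item)
    }
}
```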

@alexcrichton (Member) commented Mar 10, 2020

Oh oops, sorry about that, I totally didn't realize it was still unstable... In any case, at least Thursday isn't that far off!

@alexcrichton (Member) commented Mar 12, 2020

@bors: r+

@bors (Contributor) commented Mar 12, 2020

📌 Commit 05a1f43 has been approved by alexcrichton

@bors (Contributor) commented Mar 12, 2020

⌛️ Testing commit 05a1f43 with merge 2bdc879...

@bors (Contributor) commented Mar 12, 2020

☀️ Test successful - checks-azure
Approved by: alexcrichton
Pushing 2bdc879 to master...

@bors merged commit 2bdc879 into rust-lang:master on Mar 12, 2020

6 of 11 checks passed:
- rust-lang.cargo Build #20200310.3: failed
- rust-lang.cargo (Linux stable): failed
- rust-lang.cargo (Windows x86_64-msvc): failed
- rust-lang.cargo (macOS): failed
- rust-lang.cargo (resolver): failed
- homu: Test successful
- rust-lang.cargo (Linux beta): succeeded
- rust-lang.cargo (Linux nightly): succeeded
- rust-lang.cargo (build_std): succeeded
- rust-lang.cargo (docs): succeeded
- rust-lang.cargo (rustfmt): succeeded
@ehuss mentioned this pull request Mar 18, 2020
bors added a commit to rust-lang/rust that referenced this pull request Mar 18, 2020
Update cargo

21 commits in bda50510d1daf6e9c53ad6ccf603da6e0fa8103f..7019b3ed3d539db7429d10a343b69be8c426b576
2020-03-02 18:05:34 +0000 to 2020-03-17 21:02:00 +0000
- Run through clippy (rust-lang/cargo#8015)
- Fix config profiles using "dev" in `cargo test`. (rust-lang/cargo#8012)
- Run CI on all PRs. (rust-lang/cargo#8011)
- Add unit-graph JSON output. (rust-lang/cargo#7977)
- Split workspace/validate() into multiple functions (rust-lang/cargo#8008)
- Use Option::as_deref (rust-lang/cargo#8005)
- De-duplicate edges (rust-lang/cargo#7993)
- Revert "Disable preserving mtimes on archives" (rust-lang/cargo#7935)
- Close the front door for clippy but open the back (rust-lang/cargo#7533)
- Fix CHANGELOG.md typos (rust-lang/cargo#7999)
- Update changelog note about crate-versions flag. (rust-lang/cargo#7998)
- Bump to 0.45.0, update changelog (rust-lang/cargo#7997)
- Bump libgit2 dependencies (rust-lang/cargo#7996)
- Avoid buffering large amounts of rustc output. (rust-lang/cargo#7838)
- Add "Updating" status for git submodules. (rust-lang/cargo#7989)
- WorkspaceResolve: Use descriptive lifetime label. (rust-lang/cargo#7990)
- Support old html anchors in manifest chapter. (rust-lang/cargo#7983)
- Don't create hardlink for library test and integrations tests, fixing rust-lang/cargo#7960 (rust-lang/cargo#7965)
- Partially revert change to filter debug_assertions. (rust-lang/cargo#7970)
- Try to better handle restricted crate names. (rust-lang/cargo#7959)
- Fix bug with new feature resolver and required-features. (rust-lang/cargo#7962)