Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove fork-related code from Rust #7894

Merged
merged 4 commits into from Jun 20, 2019

Conversation

gshuflin
Copy link
Contributor

cf. #7821

Copy link
Contributor

@Eric-Arellano Eric-Arellano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great! Holding off on approval only because I'm not enough of an authority on Rust to confidently approve.

Thanks for doing this!

src/rust/engine/src/context.rs Show resolved Hide resolved
Copy link
Contributor

@illicitonion illicitonion left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Welcome to the project! This looks great, thanks!

CI is showing that the rust code hasn't been formatted with rustfmt; can you run build-support/bin/check_rust_formatting.sh -f and commit the result?

I recommend running build-support/bin/setup.sh one time in your git checkout, which will install git hooks which will do things like check formatting before you commit :)

.unwrap(),
remote_store_rpc_retries,
futures_timer_thread2.handle(),
let local_store_dir = local_store_dir.clone();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This clone is no longer necessary, because we're now moving the value into a FnOnce closure not an Fn closure.

@@ -378,13 +378,11 @@ pub extern "C" fn scheduler_metrics(
pub extern "C" fn scheduler_fork_context(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely a question for a separate PR, but I believe this is now unused, and it (and the python half which calls this) can also be deleted

@gshuflin gshuflin changed the title Remove resettable Remove fork-related code from Rust Jun 19, 2019
@gshuflin gshuflin force-pushed the remove_resettable branch 2 times, most recently from 57ff713 to 4e1de00 Compare June 19, 2019 23:22
@Eric-Arellano
Copy link
Contributor

This is still failing due to error cannot allocate memory https://travis-ci.org/pantsbuild/pants/jobs/547962452#L2810.

I originally thought this was us running out of memory because of V2 using too much of it, but I don't think that's the case anymore. This failure has happened twice now for this PR, and hasn't shown up on any other builds.

@illicitonion any thoughts on what might be causing it? Greg wasn't able to reproduce locally via ./pants --no-v1 --v2 test tests/python/pants_test/engine:fs.

--

Also, has a merge conflict now.

Copy link
Contributor

@blorente blorente left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Welcome to the project!

I don't know why the out of memory error happens, but to reproduce it locally you might want to try setting this value to something lower: https://github.com/pantsbuild/pants/blob/master/src/rust/engine/fs/src/store.rs#L24

// Allow for some overhead for bookkeeping threads (if any).
process_execution_parallelism + 2,
store.clone(),
futures_timer_thread2.clone(), // TODO remove clone once the store and http are not resettables
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we remove this clone?

@illicitonion
Copy link
Contributor

This is still failing due to error cannot allocate memory https://travis-ci.org/pantsbuild/pants/jobs/547962452#L2810.

I originally thought this was us running out of memory because of V2 using too much of it, but I don't think that's the case anymore. This failure has happened twice now for this PR, and hasn't shown up on any other builds.

@illicitonion any thoughts on what might be causing it? Greg wasn't able to reproduce locally via ./pants --no-v1 --v2 test tests/python/pants_test/engine:fs.

I do have thoughts! So, a while ago, we made the store attempt to allocate a ridiculous amount of virtual memory. This is needed because it mmaps a potentially very large file from disk, and we want to allow the file tot be very large. We make a store per pants process, or per scheduler we create in unit tests. @stuhood ended up making us re-use schedulers across the test-cases in a test-class, because he couldn't find a convenient and reliable way to force python to gc away the schedulers (which have stores) after each test-case, and so we ended up collecting a bunch of these, and they blew up memory usage like crazy.

We currently have two separate LMDB stores, one for files (which may be large), and one for metadata about directories (which should be tiny), but we allocate the same giant amount of virtual memory to both of them. We can probably trivially drop the allocation to the one for directories by a few orders of magnitude, which I will try doing now... This is still kind of hacking around the problem, but... shrug

The reason this PR will be making us more likely to max out memory is that before Stores were only lazily initialized when they were actually needed (because that's how Resettables worked), whereas now we're eagerly initialising them every time we make a scheduler; so some schedulers we made before which didn't take a bunch of virtual memory, now will.

Also, it's not just this PR; I saw this on another of my PRs today too.

--

Also, has a merge conflict now.

I've just pushed a rebase, because that merge conflict was totally my fault, and totally foreseeable by me! Sorry for the hassle!

@illicitonion illicitonion merged commit f7d45dc into pantsbuild:master Jun 20, 2019
@gshuflin gshuflin deleted the remove_resettable branch June 20, 2019 21:56
@@ -79,11 +77,10 @@ impl Core {
let mut remote_store_servers = remote_store_servers;
remote_store_servers.shuffle(&mut rand::thread_rng());

let runtime = Resettable::new(|| {
let runtime =
Arc::new(RwLock::new(Runtime::new().unwrap_or_else(|e| {
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Runtime is Send and Sync afaict, so the RwLock should no longer be necessary...? I wonder whether it ever was.

Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

...oh, sorry. Except we need mutable access to it for spawn and block_on. Nevermind.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants