
lintcheck: parallelize #6764

Merged: 6 commits into rust-lang:master on Feb 20, 2021
Conversation

@matthiaskrgr (Member) commented Feb 18, 2021

By default, we use a single thread and one target dir, as before.

If -j n is passed, use n target dirs and run one clippy process in each of them.
We need several target dirs because cargo would otherwise lock a single shared dir for one process, which would prevent the parallelism.
-j 0 makes rayon use $thread_count/2 threads (which I assume is the number of physical cores of the machine).
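
A minimal sketch of how this can be wired up, assuming a rayon global pool with one target dir per worker thread; the crate list, the plain `cargo clippy` invocation, and the directory layout are illustrative stand-ins, not the exact lintcheck code:

use rayon::prelude::*;
use std::process::Command;

fn lint_all(crates: Vec<String>, jobs: usize) {
    // -j 0: pick the thread count automatically; here we assume
    // "half the logical CPUs" as a stand-in for the physical core count.
    let threads = if jobs == 0 {
        std::thread::available_parallelism().map_or(1, |n| (n.get() / 2).max(1))
    } else {
        jobs
    };
    rayon::ThreadPoolBuilder::new()
        .num_threads(threads)
        .build_global()
        .expect("global rayon pool already initialized");

    crates.par_iter().for_each(|krate| {
        // One target dir per worker thread, so cargo's directory lock
        // does not serialize the clippy runs.
        let idx = rayon::current_thread_index().unwrap_or(0);
        let target_dir = format!("target/lintcheck/shared_target_dir/{}", idx);
        let _output = Command::new("cargo")
            .args(["clippy", "--message-format=json"])
            .env("CARGO_TARGET_DIR", &target_dir)
            .current_dir(krate)
            .output()
            .expect("failed to run clippy");
        // ... collect the warnings emitted in `_output.stdout` here
    });
}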

Other changes:
  • Show the output of clippy being compiled when building it for lintcheck (makes it easier to spot compiler errors etc.).
  • Show some progress indication in the "Linting... foo 1.2.3" message.
  • Sort crates before linting (previously crates would be split randomly between target dirs; with the sorting, we try to make sure that even crates land in target dir 0 and odd ones in target dir 1, and so on).

Please write a short comment explaining your change (or "none" for internal only changes)
changelog: parallelize lintcheck with rayon

Use rayon to figure out the thread count and halve it to get the core count.
For each core, create a dedicated target dir.
Otherwise, when running multiple clippys with the same target dir, cargo would lock the dir and prevent parallelism.
This way we can run multiple clippys at the same time (on root crates), but we sacrifice cache hits (for crate deps we have already cargo-checked in another target dir).
@rust-highfive

r? @flip1995

(rust-highfive has picked a reviewer for you, use r? to override)

@rust-highfive added the S-waiting-on-review label (Status: Awaiting review from the assignee but also interested parties) on Feb 18, 2021
@flip1995 (Member)

Timings on 40c server

  • first 100 lines of this list, i.e. everything from line 102 onward removed (error-chain is the last line)
  • Clippy and clippy_dev precompiled. All crates already downloaded.
  • Without rayon
cargo dev-lintcheck  576.96s user 53.13s system 161% cpu 6:30.50 total

Size of target dirs:

$ du -sh target
4.2G    target
$ du -sh target/lintcheck
1.6G    target/lintcheck
$ du -sh target/lintcheck/shared_target_dir
1.5G    target/lintcheck/shared_target_dir

  • first 100 lines of this list, i.e. everything from line 102 onward removed (error-chain is the last line)
  • Clippy and clippy_dev precompiled. All crates already downloaded.
  • With rayon
cargo dev-lintcheck  1165.68s user 99.59s system 1837% cpu 1:08.87 total

Size of target dirs:

$ du -sh target
4.7G    target
$ du -sh target/lintcheck
2.2G    target/lintcheck
$ du -sh target/lintcheck/shared_target_dir
2.1G    target/lintcheck/shared_target_dir

Overall a speed-up of 5.67x. So it is definitely worth it if you have a machine with many cores, but not that relevant on smaller machines.

So I would add a flag --parallel-checks N and only check in parallel if you specify that. (Maybe, if N isn't specified, keep the current behavior of using half of the cores.)

@matthiaskrgr (Member, Author)

Lol, so we throw in 40x the performance for a ~5-6x speedup 😆 ...
Then again, you only checked the first 100 crates; that's only about twice the core count.
I assume the parallel version would be more efficient by a larger margin if we had checked the full 400 crates?

I wonder if we could get slightly better performance by checking crates with large deptrees first or if the difference is negligible.

@flip1995 (Member)

Well, if you used 4c/8t before, it's more like 10x the power. I also looked at htop a bit, and when using the sequential version only 4-8 cores were in use at the same time. The parallel version used all 40 cores throughout.

"I wonder if we could get slightly better performance by checking crates with large deptrees first or if the difference is negligible."

I don't think that this will help much. Computing the dep trees and then figuring out what to compile where and in which order definitely sounds too complicated.

@matthiaskrgr (Member, Author) commented Feb 19, 2021

Extremely hacky and ugly way to figure out the total number of deps in a crate's dep tree:

format!("{:?}", cargo_metadata::MetadataCommand::new().manifest_path("./Cargo.toml").exec().unwrap().resolve.unwrap().nodes).matches("dependencies: ").count();
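
For comparison, a less string-matchy sketch of the same idea, assuming the cargo_metadata crate as a dependency (the function name is made up; this is not necessarily what lintcheck ended up doing):

use cargo_metadata::MetadataCommand;

// Size of the fully resolved dependency graph: one resolve node per package,
// which is roughly what the Debug-string match above counts as well.
fn dep_tree_size(manifest_path: &str) -> usize {
    let metadata = MetadataCommand::new()
        .manifest_path(manifest_path)
        .exec()
        .expect("failed to run `cargo metadata`");
    metadata.resolve.map_or(0, |resolve| resolve.nodes.len())
}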

defaults to 1
-j 0 chooses the number of threads automatically (= number of physical cores)
@matthiaskrgr changed the title from "[WIP] lintcheck: parallelize" to "lintcheck: parallelize" on Feb 19, 2021
@flip1995 (Member)

@bors r+

Thanks!

Does this command compute only the direct deps or the complete tree?

@bors (Collaborator) commented Feb 20, 2021

📌 Commit 8499a32 has been approved by flip1995

@bors (Collaborator) commented Feb 20, 2021

⌛ Testing commit 8499a32 with merge 23de801...

@bors (Collaborator) commented Feb 20, 2021

☀️ Test successful - checks-action_dev_test, checks-action_remark_test, checks-action_test
Approved by: flip1995
Pushing 23de801 to master...

@bors merged commit 23de801 into rust-lang:master on Feb 20, 2021
@matthiaskrgr (Member, Author)

It counts the number of deps in the entire dep tree.
