Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Offer rayon crate feature that parallelizes index-building #401

Closed
4 tasks
obi1kenobi opened this issue Aug 20, 2024 · 4 comments · Fixed by #424
Closed
4 tasks

Offer rayon crate feature that parallelizes index-building #401

obi1kenobi opened this issue Aug 20, 2024 · 4 comments · Fixed by #424
Assignees
Labels
good first issue Good for newcomers help wanted Extra attention is needed

Comments

@obi1kenobi
Copy link
Owner

obi1kenobi commented Aug 20, 2024

Our IndexedCrate type builds a variety of indexes based on the underlying rustdoc crate.

Rustdoc crates' JSON can be ~450MB in size for particularly large crates, so our indexes can be significant in size too. They could probably be built in parallel using rayon.

We don't want to require rayon, since that may not be suitable for all use cases. But we should offer it as a non-default option in case the performance improvement is worth it. For example, cargo-semver-checks would enable it, whereas the rustdoc Trustfall playground would probably not enable it.

Required steps:

  • add optional rayon feature
  • adjust CI to run tests both with and without the feature, so both configurations are known correct
  • implement rayon parallelization for index-building
  • benchmark (on your own machine) how much faster rayon makes IndexedCrate::new() on the rustdoc JSON of a large crate like aws-sdk-ec2
    • to generate rustdoc JSON, clone that repo, cd into the crate's directory, and use the RUSTDOCFLAGS="-Z unstable-options --document-private-items --document-hidden-items --output-format=json --cap-lints=allow" cargo doc --lib --no-deps shell command which will put aws-sdk-ec2.json into the target/doc directory of the workspace

This is a good first issue for someone who is already familiar with Rust, somewhat familiar with rayon, and wants to start contributing to this project. It is not a good first issue for folks new to Rust, since it requires understanding advanced Rust concepts such as Send + Sync auto-traits etc.

Comment to temporarily claim the issue, in case multiple people are interested in it!

@obi1kenobi obi1kenobi added help wanted Extra attention is needed good first issue Good for newcomers labels Aug 20, 2024
@jalil-salame
Copy link
Contributor

I have 2h tomorrow I can spend on this (in ~22h from the time I posted this comment). I'll do a test implementation and go from there c:

@obi1kenobi
Copy link
Owner Author

Hey, welcome to the project and thanks for looking into this! Feel free to reach out with any questions, I'm happy to help!

@jalil-salame
Copy link
Contributor

Closed by #424 ?

@obi1kenobi
Copy link
Owner Author

Indeed, thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
good first issue Good for newcomers help wanted Extra attention is needed
Projects
None yet
2 participants