Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

crates.io: Remove dev-dependencies from the index #3674

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

Turbo87
Copy link
Member

@Turbo87 Turbo87 commented Jul 31, 2024

@Turbo87 Turbo87 added the T-crates-io Relevant to the crates.io team, which will review and decide on the RFC. label Jul 31, 2024
Co-authored-by: Ed Page <eopage@gmail.com>
Co-authored-by: Joe ST <joe@fbstj.net>
Co-authored-by: Jake Goulding <shepmaster@mac.com>
text/3674-crates-io-remove-dev-deps.md Show resolved Hide resolved

The crates.io server will still process and save dev-dependencies in the database, but it will no longer include them in the index. To be more precise, any item in the `deps` field with `"kind": "dev"` will be removed from the index.

To reduce the amount of unnecessary commits to download for users of the git index we could implement this in a way where dev-dependencies are only removed from an index file if a release for the corresponding crate is being published and the file needs to be touched anyway. We could keep running in this state for a couple of weeks/months and then later trigger a full sync when a bigger chunk of the actively maintained crates have already been updated, reducing the amount of commits needed for the migration.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not quite following the relationship of the number of commits here. This paragraph seems to be written with the assumption that a full sync would be implemented as a separate commit per crate? Why wouldn't a sync be implemented as a single commit across all crates?

If it is done as a single commit, whether it is done immediately or later doesn't really matter. It's still going to generate a large number of deltas which will impact updates as mentioned above.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my thinking was that if the deltas are spread out over a number of weeks/months then the individual updates won't take quite as long.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realize I may have used the wrong wording in the RFC text and caused confusion. By "unnecessary commits" I meant "unnecessary deltas". Maybe that resolves your question?

@Turbo87
Copy link
Member Author

Turbo87 commented Aug 16, 2024

it looks like there are no major concerns brought up so far and the point about the large number of deltas is already addressed in the RFC, so let's get this process started:

@rfcbot fcp merge

@rfcbot
Copy link
Collaborator

rfcbot commented Aug 16, 2024

Team member @Turbo87 has proposed to merge this. The next step is review by the rest of the tagged team members:

Concerns:

Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

@rfcbot rfcbot added proposed-final-comment-period Currently awaiting signoff of all team members in order to enter the final comment period. disposition-merge This RFC is in PFCP or FCP with a disposition to merge it. labels Aug 16, 2024
@Eh2406
Copy link
Contributor

Eh2406 commented Aug 19, 2024

I wonder if the issues with the excessive deltas/commits could also be reduced by starting the process shortly after having done a index squash, and by doing another index squash shortly after completion of the process.

I am currently (ab)using the presence of dev-dependencies in the index to construct benchmark cases for cargo's new resolver. This is absolutely not a blocker, because the data is available from other places like downloading and extracting the .crate files (or possibly the crates.io database dump depending on implementation details). Alternatively, as a stopgap, I could pin my benchmarking code to the last version of the index before this data was removed.

@Turbo87
Copy link
Member Author

Turbo87 commented Aug 19, 2024

I wonder if the issues with the excessive deltas/commits could also be reduced by starting the process shortly after having done a index squash, and by doing another index squash shortly after completion of the process.

yeah, I thought I had put that in the RFC text, but apparently I forgot. my plan would be to keep this running for a couple of weeks/months and when we do a full sync we will likely couple it with an index squash then.

@rfcbot rfcbot added final-comment-period Will be merged/postponed/closed in ~10 calendar days unless new substational objections are raised. and removed proposed-final-comment-period Currently awaiting signoff of all team members in order to enter the final comment period. labels Aug 23, 2024
@rfcbot
Copy link
Collaborator

rfcbot commented Aug 23, 2024

🔔 This is now entering its final comment period, as per the review above. 🔔


- This change will temporarily increase the size of the git index, due to the amount of file changes necessary to remove the dev-dependencies from the index. This could potentially be coupled with an index squash though, which would reduce the size of the index again.

- This change could potentially break other users of the index, if they rely on the dev-dependencies being present in the index. Part of the reason for this RFC is to see whether there are any users of the dev-dependencies in the index and what we could do to help them migrate to a different solution.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's worth not doing this, but historically I have thought about making Crater try to structure its build graph based on what dependencies are actually needed for a crate -- and since Crater runs tests, it needs dev-deps too. The goal of that structuring would be better cache efficiency in terms of not rebuilding dependencies many times.

I think without this information in the index Crater would have to generate lockfiles (currently costly, since it probably requires sandboxes), but that seems like a fairly reasonable tradeoff -- especially since it's at least partially needed anyway for e.g. feature resolution.

Mostly commenting since it sounds like we're looking for reasons to use this information and this has been one I've been thinking about for a while, even if there are no active implementation plans to my knowledge.

@Turbo87
Copy link
Member Author

Turbo87 commented Aug 27, 2024

before I forget: we discussed this RFC in the crates.io team meeting last week and once accepted and implemented we are planning to publish an Inside Rust blog post with a cut-off date before we enable this behavior. this should give people that don't follow the RFCs a bigger chance of getting notified of the behavior change and have time to adjust if they are actually using the dev-dependencies in the index for anything.

@Turbo87
Copy link
Member Author

Turbo87 commented Aug 29, 2024

@rfcbot concern features using dev-dependencies

On Zulip @shepmaster notified us that cargo and crates.io currently allow publishing crates with features that depend on dev-dependencies. While these features are not usable from the outside, they can be useful for development purposes. One example of this is in the popular syn crate.

While these "private" features are not problematic in isolation, cargo currently throws up a parsing error when features exist that depend on a missing dependency declaration in the index:

failed to parse "3/s/syn" registry package: feature test includes syn-test-suite/all-features, but syn-test-suite is not a dependency

This makes removing the dev-dependencies more complicated, since we can't do it unconditionally anymore and have to check the features declarations first.

I currently see three options:

  1. we cancel this effort and keep the dev-dependencies in the index.
  2. we parse the features and keep only those dev-dependencies that are used in any of the features.
  3. we (recursively?) remove any features that depend on dev-dependencies.

I guess the latter two options are slightly more risky as we have to ensure that our logic matches that of cargo.

It ultimately becomes a question of whether the risk-benefit-ratio is high enough. I wrote this RFC under the assumption that cargo does not use the dev-dependencies in the index in any sort of form, since that was the initial response from the cargo team. Now that we discovered that cargo does indeed need the dev-dependencies in some cases I'm no longer sure whether the risk and additional complexity is worth the benefit anymore.

@rust-lang/crates-io @rust-lang/cargo I'm open to thoughts on how to proceed from here. I'm slightly leaning towards one of these three options, but I'm curious what others think :)

@rfcbot rfcbot added proposed-final-comment-period Currently awaiting signoff of all team members in order to enter the final comment period. and removed final-comment-period Will be merged/postponed/closed in ~10 calendar days unless new substational objections are raised. labels Aug 29, 2024
@LawnGnome
Copy link

@rust-lang/crates-io @rust-lang/cargo I'm open to thoughts on how to proceed from here. I'm slightly leaning towards one of these three options, but I'm curious what others think :)

I'm inclined towards option 1 (don't make the change), in this case — option 2 (keep only used dev-dependencies) results in less of a win, and requires additional knowledge on the part of crates.io on how Cargo manifests are processed, and option 3 (remove features that depend on dev-dependencies) scares me a little. (At best, we'd have to understand how tooling is using the features metadata that's currently in the index.)

The lesson is never try.

@VorpalBlade
Copy link

VorpalBlade commented Sep 1, 2024

I'm interested in using the devs dependency information for a tool that I'm currently planning which will run the test suite for all your dependencies recurisvely (as this is something cargo doesn't support currently).

@VorpalBlade
Copy link

Another concern: does lib.rs need this info @kornelski

@Turbo87
Copy link
Member Author

Turbo87 commented Sep 1, 2024

I'm interested in using the devs dependency information for a tool that I'm currently planning which will run the test suite for all your dependencies recurisvely

I'd suggest using the Cargo.toml files of the dependencies for that instead.

Another concern: does lib.rs need this info

lib.rs ingests the daily database dump and parses the Cargo.toml files itself, so I don't think they need this in the index either.

@Turbo87
Copy link
Member Author

Turbo87 commented Sep 1, 2024

regarding the concern raised in #3674 (comment): in the crates.io team meeting last week we decided to delay our decision on how to proceed a little bit. @shepmaster is currently working on some feature in margo that might help us validate how viable the options are. once we have more information we will be better prepared to take a decision on this. until that time this RFC will stay open.

@kornelski
Copy link
Contributor

I'm currently using dev-dependencies from the index for reverse deps statistics on lib.rs, but I can live without this info.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
disposition-merge This RFC is in PFCP or FCP with a disposition to merge it. proposed-final-comment-period Currently awaiting signoff of all team members in order to enter the final comment period. T-crates-io Relevant to the crates.io team, which will review and decide on the RFC.
Projects
Development

Successfully merging this pull request may close these issues.

None yet