crates.io: Remove dev-dependencies from the index #3674
Co-authored-by: Ed Page <eopage@gmail.com>
Co-authored-by: Joe ST <joe@fbstj.net>
Co-authored-by: Jake Goulding <shepmaster@mac.com>
The crates.io server will still process and save dev-dependencies in the database, but it will no longer include them in the index. To be more precise, any item in the `deps` field with `"kind": "dev"` will be removed from the index.
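As a rough illustration of the rule above, the filtering step could look like the following Python sketch. It assumes the usual one-JSON-object-per-line index entry layout; the function name `strip_dev_dependencies` and the example entry are hypothetical and only serve the illustration, they are not taken from the crates.io codebase.

```python
import json


def strip_dev_dependencies(index_line: str) -> str:
    """Return the index entry with every `"kind": "dev"` item removed from `deps`."""
    entry = json.loads(index_line)
    # Keep normal and build dependencies, drop only dev-dependencies.
    entry["deps"] = [dep for dep in entry.get("deps", []) if dep.get("kind") != "dev"]
    return json.dumps(entry, separators=(",", ":"))


# Hypothetical index entry, for illustration only.
example_line = json.dumps({
    "name": "example-crate",
    "vers": "0.1.0",
    "deps": [
        {"name": "serde", "req": "^1.0", "features": [], "optional": False,
         "default_features": True, "target": None, "kind": "normal"},
        {"name": "tempfile", "req": "^3", "features": [], "optional": False,
         "default_features": True, "target": None, "kind": "dev"},
    ],
    "cksum": "0" * 64,
    "features": {},
    "yanked": False,
})

if __name__ == "__main__":
    print(strip_dev_dependencies(example_line))
```

All other fields of the entry are passed through unchanged, so only the `deps` array shrinks.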
To reduce the amount of unnecessary commits to download for users of the git index, we could implement this in a way where dev-dependencies are only removed from an index file if a release for the corresponding crate is being published and the file needs to be touched anyway. We could keep running in this state for a couple of weeks/months and then later trigger a full sync when a bigger chunk of the actively maintained crates have already been updated, reducing the amount of commits needed for the migration.
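The staged rollout described in this paragraph could be sketched roughly as follows. This is a simplified in-memory model, not the actual crates.io implementation: the index is assumed to be a dict mapping a crate name to its list of JSON lines, and `make_entry`, `publish_release`, and `full_sync` are hypothetical names introduced for the illustration.

```python
import json


def has_dev_deps(line: str) -> bool:
    """True if the index line still lists at least one dev-dependency."""
    return any(d.get("kind") == "dev" for d in json.loads(line).get("deps", []))


def without_dev_deps(line: str) -> str:
    entry = json.loads(line)
    entry["deps"] = [d for d in entry.get("deps", []) if d.get("kind") != "dev"]
    return json.dumps(entry, separators=(",", ":"))


def publish_release(index: dict, crate: str, new_entry: dict) -> None:
    """Phase 1: the file is touched by the publish anyway, so clean every line in it."""
    lines = [without_dev_deps(line) for line in index.get(crate, [])]
    lines.append(without_dev_deps(json.dumps(new_entry)))
    index[crate] = lines


def full_sync(index: dict) -> int:
    """Phase 2: clean the remaining files; return how many files had to change."""
    changed = 0
    for crate, lines in index.items():
        if any(has_dev_deps(line) for line in lines):
            index[crate] = [without_dev_deps(line) for line in lines]
            changed += 1
    return changed


def make_entry(name: str, vers: str) -> dict:
    """Hypothetical entry with one normal and one dev-dependency."""
    return {"name": name, "vers": vers, "deps": [
        {"name": "dep-a", "req": "^1", "kind": "normal"},
        {"name": "dep-b", "req": "^1", "kind": "dev"},
    ]}


if __name__ == "__main__":
    index = {n: [json.dumps(make_entry(n, "1.0.0"))] for n in ("crate-a", "crate-b")}
    publish_release(index, "crate-a", make_entry("crate-a", "1.1.0"))
    print("files changed by the later full sync:", full_sync(index))
```

In this model, every crate that publishes during phase 1 is one file fewer for the later full sync to rewrite.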
I'm not quite following the relationship of the number of commits here. This paragraph seems to be written with the assumption that a full sync would be implemented as a separate commit per crate? Why wouldn't a sync be implemented as a single commit across all crates?
If it is done as a single commit, whether it is done immediately or later doesn't really matter. It's still going to generate a large number of deltas which will impact updates as mentioned above.
my thinking was that if the deltas are spread out over a number of weeks/months then the individual updates won't take quite as long.
I realize I may have used the wrong wording in the RFC text and caused confusion. By "unnecessary commits" I meant "unnecessary deltas". Maybe that resolves your question?
it looks like there are no major concerns brought up so far and the point about the large number of deltas is already addressed in the RFC, so let's get this process started: @rfcbot fcp merge
Team member @Turbo87 has proposed to merge this. The next step is review by the rest of the tagged team members: Concerns:
Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up! See this document for info about what commands tagged team members can give me.
I wonder if the issues with the excessive deltas/commits could also be reduced by starting the process shortly after having done an index squash, and by doing another index squash shortly after completion of the process. I am currently (ab)using the presence of dev-dependencies in the index to construct benchmark cases for cargo's new resolver. This is absolutely not a blocker, because the data is available from other places like downloading and extracting the
yeah, I thought I had put that in the RFC text, but apparently I forgot. my plan would be to keep this running for a couple of weeks/months and when we do a full sync we will likely couple it with an index squash then.
🔔 This is now entering its final comment period, as per the review above. 🔔
- This change will temporarily increase the size of the git index, due to the amount of file changes necessary to remove the dev-dependencies from the index. This could potentially be coupled with an index squash though, which would reduce the size of the index again.
- This change could potentially break other users of the index, if they rely on the dev-dependencies being present in the index. Part of the reason for this RFC is to see whether there are any users of the dev-dependencies in the index and what we could do to help them migrate to a different solution.
I don't think it's worth not doing this, but historically I have thought about making Crater try to structure its build graph based on what dependencies are actually needed for a crate -- and since Crater runs tests, it needs dev-deps too. The goal of that structuring would be better cache efficiency in terms of not rebuilding dependencies many times.
I think without this information in the index Crater would have to generate lockfiles (currently costly, since it probably requires sandboxes), but that seems like a fairly reasonable tradeoff -- especially since it's at least partially needed anyway for e.g. feature resolution.
Mostly commenting since it sounds like we're looking for reasons to use this information and this has been one I've been thinking about for a while, even if there are no active implementation plans to my knowledge.
before I forget: we discussed this RFC in the crates.io team meeting last week. once it is accepted and implemented, we are planning to publish an Inside Rust blog post with a cut-off date before we enable this behavior. this should give people who don't follow the RFCs a better chance of being notified of the behavior change, and time to adjust if they are actually using the dev-dependencies in the index for anything.
@rfcbot concern features using dev-dependencies On Zulip @shepmaster notified us that While these "private" features are not problematic in isolation,
This makes removing the dev-dependencies more complicated, since we can't do it unconditionally anymore and have to check the features declarations first. I currently see three options:
I guess the latter two options are slightly more risky as we have to ensure that our logic matches that of It ultimately becomes a question of whether the risk-benefit-ratio is high enough. I wrote this RFC under the assumption that @rust-lang/crates-io @rust-lang/cargo I'm open to thoughts on how to proceed from here. I'm slightly leaning towards one of these three options, but I'm curious what others think :)
I'm inclined towards option 1 (don't make the change) in this case: option 2 (keep only used dev-dependencies) results in less of a win and requires additional knowledge on the part of crates.io of how Cargo manifests are processed, and option 3 (remove features that depend on dev-dependencies) scares me a little. (At best, we'd have to understand how tooling is using the features metadata that's currently in the index.)
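To make option 2 a bit more concrete, here is a rough Python sketch of what "keep only used dev-dependencies" might involve. It assumes that a feature references a dependency through a `name/feature` (or `name?/feature`) value in the entry's `features` or `features2` map, and that the dependency's `name` field is what those values refer to. The function names and the example entry are hypothetical, and whether this matches Cargo's actual logic is exactly the risk raised in this thread.

```python
def dependencies_used_by_features(entry: dict) -> set:
    """Collect dependency names that appear as `name/feature` in any feature value."""
    used = set()
    for field in ("features", "features2"):
        for values in (entry.get(field) or {}).values():
            for value in values:
                if "/" in value:
                    dep_name = value.split("/", 1)[0]
                    # `name?/feature` is the weak-dependency form; drop the `?`.
                    used.add(dep_name.rstrip("?"))
    return used


def keep_only_used_dev_dependencies(entry: dict) -> dict:
    """Drop dev-dependencies unless a feature declaration still refers to them."""
    used = dependencies_used_by_features(entry)
    deps = [
        dep for dep in entry.get("deps", [])
        if dep.get("kind") != "dev" or dep["name"] in used
    ]
    return {**entry, "deps": deps}


# Hypothetical entry: `dev-helper` is referenced by a feature, `dev-unused` is not.
example_entry = {
    "name": "example-crate",
    "vers": "0.1.0",
    "deps": [
        {"name": "serde", "req": "^1.0", "kind": "normal"},
        {"name": "dev-helper", "req": "^0.3", "kind": "dev"},
        {"name": "dev-unused", "req": "^2", "kind": "dev"},
    ],
    "features": {"testing": ["dev-helper/extra"]},
}

if __name__ == "__main__":
    kept = keep_only_used_dev_dependencies(example_entry)
    print([dep["name"] for dep in kept["deps"]])
```

Even in this small form, the server has to interpret feature syntax, which is the "additional knowledge" cost mentioned above.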
I'm interested in using the dev-dependency information for a tool that I'm currently planning, which will run the test suite for all your dependencies recursively (as this is something cargo doesn't currently support).
Another concern: does lib.rs need this info, @kornelski?
I'd suggest using the
lib.rs ingests the daily database dump and parses the
regarding the concern raised in #3674 (comment): in the crates.io team meeting last week we decided to delay our decision on how to proceed a little bit. @shepmaster is currently working on some feature in margo that might help us validate how viable the options are. once we have more information we will be better prepared to make a decision on this. until that time this RFC will stay open.
I'm currently using dev-dependencies from the index for reverse deps statistics on lib.rs, but I can live without this info. |