Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Auto-switch to multi-sources vendor directory layout #10344

Closed
wants to merge 3 commits into from

Conversation

weihanglo
Copy link
Member

@weihanglo weihanglo commented Jan 31, 2022

What does this PR try to resolve?

External command cargo-vendor has been broken for a while, there is no a clear alternative way to solve the situation described in #10310. This PR tries to do what --no-merge-sources does but automatically switching to multi-sources layout instead of introducing the flag. (see #10344 (review) and #10344 (comment))

Fixes #10310

How should we test and review this PR?

Part of the logic is copied from https://github.com/alexcrichton/cargo-vendor.

Several tests are updated and added:

  • 🆕 vendor::duplicate_version_from_multiple_sources: Auto-switch between non-merged and merged sources.
  • vendor::git_duplicate: Removed. cargo-vendor now can auto-switch.
  • vendor::vendor_sample_config: Merged into vendor::vendor_simple.

Additional information

There are somethings I am uncertain:

@rust-highfive
Copy link

r? @ehuss

(rust-highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jan 31, 2022
@weihanglo
Copy link
Member Author

weihanglo commented Jan 31, 2022

If we're not ready to accept this flag, the alternative way to make cargo-vendor continue working is compiling it with older toolchain, say, 1.34.2.

# This is NOT recommended. Use at your own risk.
cargo +1.34.2 install --git https://github.com/alexcrichton/cargo-vendor#0.1.23
cargo-vendor vendor --no-merge-sources

Not recommended because old toolchains might not work as expected and contain possible security issues.

Copy link
Member

@alexcrichton alexcrichton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From a ux perspective I would personally prefer to avoid the need for this flag at all. In some sense cargo vendor should "just work" and it shouldn't require a flag to be passed to avoid an error. We try to put the vendor directory into a "nice" configuration where it's flat and easy to read, but if duplicate versions and/or sources make that invalid then I think it's reasonable to just automatically switch the behavior of cargo vendor.

Or, in other words, for any particular crate graph I think we can roughly infer what the "best looking" vendor directory looks like and try to produce that rather than requiring this option to succeed.

src/cargo/ops/vendor.rs Outdated Show resolved Hide resolved
src/cargo/ops/vendor.rs Outdated Show resolved Hide resolved
@bors
Copy link
Collaborator

bors commented Jan 31, 2022

☔ The latest upstream changes (presumably #10348) made this pull request unmergeable. Please resolve the merge conflicts.

@weihanglo weihanglo changed the title Add --no-merge-sources back to cargo vendor Auto-switch to multi-sources vendor directory layout Feb 1, 2022
@weihanglo
Copy link
Member Author

Great idea! Auto-switching is more friendly. I've just updated the PR. Thanks for the advice!

@ehuss
Copy link
Contributor

ehuss commented Feb 1, 2022

r? alexcrichton

@alexcrichton
Copy link
Member

r? @alexcrichton

@rust-highfive rust-highfive assigned alexcrichton and unassigned ehuss Feb 1, 2022
@weihanglo weihanglo force-pushed the issue-10310 branch 2 times, most recently from bcb91bc to 88a81a0 Compare February 1, 2022 17:03
Copy link
Member

@alexcrichton alexcrichton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realize now by accident that the request to switch behavior automatically still doesn't leave an option for users to force enable this behavior even if the old behavior would work. That may be a use case desired where we would have an option to force-enable this different layout, even if not necessary.

@YellowOnion as the original requester of this, do you have thoughts on this? I have proposed not actually adding a --no-merge-sources flag and instead having cargo vendor "just work" by using a different layout if necessary. For your vendoring use case, however, do you specifically always want to use the --no-merge-sources layout even if a merged layout would suffice?

src/cargo/ops/vendor.rs Outdated Show resolved Hide resolved
// The layout of the vendor directory depends on if there is any duplicate
// version. cargo-vendor auto switches between the two layouts and deletes
// the old vendor directory before vendoring crates.
match (sources_file.exists(), contains_duplicate_versions) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit confused still on what's going on here. My current impression is that cargo vendor gets to blow away the entire vendor directory unless you pass --no-delete, and in that situation it simply just appends what's being added. Is that model still accurate?

If so, how come this is removing contents plus the block below handling no_delete?

Copy link
Member Author

@weihanglo weihanglo Feb 3, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry about I haven't explained it too much. This piece was copied from https://github.com/alexcrichton/cargo-vendor/blob/07570e23d0841936cea8af2a125484f4d10ccdfa/src/main.rs#L147-L152.

The merged and no-merged layout are not compatible with each other, so cargo must delete something to make it work. For example, we have a crate with log and serde vendored,

vendor/
├── log/
└── serde/

If we add the other log deps from git and vendor it, the directory tree would look like:

vendor/
├── registry-1ecc6299db9ec823/ # from crates.io
│  ├── log/
│  └── serde/
└── git-1936cea8af2a1111/ # from git
   └── serde/

Since both registry-1ecc6299db9ec823 and git-1936cea8af2a1111 are valid package names, we need to delete them all to avoid ambiguity or even errors.

Off the top of my head, one solution is that cargo can put crates from crates.io source at the first level, and duplicate version crate from other sources at the second level with folder names contains special characters which make them not valid package names. For instances, prefix sources with ~:

vendor/
├── log/ # from crates.io
├── serde/ # from crates.io
└── ~git-1936cea8af2a1111/ # git source
   └── serde/

With this solution, the vendored crates becomes appendable and can respect no-delete again. Also, the use of vendor/.sources file are not necessary anymore. One caveat is that someone vendoring crates mostly from their own registry might contain far less crates at the first level of the vendor dir than the second level separate registry source dir.

src/cargo/ops/vendor.rs Outdated Show resolved Hide resolved
src/cargo/ops/vendor.rs Outdated Show resolved Hide resolved
src/cargo/ops/vendor.rs Outdated Show resolved Hide resolved
@alexcrichton
Copy link
Member

Ok sorry for taking awhile to get back to this, but I'm back to it now! Thanks for your comments as well, they definitely help clarify things!

I actually quite like your suggestion here -- #10344 (comment) -- to have top-level crates stay the same way and moving conflicts to their own directory. One way perhaps to solve the custom registry problem is to choose the source with the most crates and put that at the top, relegating all other sources to ~foo directories. I also like the idea of using a character that's not allowed in package names to avoid conflicts here, although we may want to use something like @ or something other than ~ to avoid clashes with unix shells. Especially if this removes the .sources file this seems like the right way to go to me!

Also sorry I clearly have lost all memory of the old cargo-vendor despite having probably "reviewed" many of the PRs coming in. Thanks for the links though they're helpful!

Part of the code is copied from https://github.com/alexcrichton/cargo-vendor

When duplicate versions detected. `cargo vendor` keeps the source of
majority at the top of vendor directory, crates from other sources go
into separate directories in order to prevent name collisions.

To avoid name conflicts with valid crate names, the separate source
directories are prefixed with `@`, so that they can co-exist at the top
of the vendor directory.
@alexcrichton
Copy link
Member

This all looks great to me, thanks again! Since this is a meaningful change I'm gonna ping some other folks here as well to ensure that everyone's ok with this:

@rfcbot fcp merge

For those being cc'd this is fixing an issue in cargo vendor where if the same name/version pair comes from two different sources previously cargo vendor would error and now it will automatically switch the layout of the vendor directory generated to have vendor/foo-1.2.3-style vendorings for the largest source of packages and other sources will have vendor/@git+deadbeef/foo-1.2.3. This automatically happens and doesn't require a flag to cargo vendor.

Ideally this shouldn't impact any users today since we're just making cases that previously error'd start to work now.

@rfcbot
Copy link
Collaborator

rfcbot commented Feb 10, 2022

Team member @alexcrichton has proposed to merge this. The next step is review by the rest of the tagged team members:

Concerns:

Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

@rfcbot rfcbot added proposed-final-comment-period An FCP proposal has started, but not yet signed off. disposition-merge FCP with intent to merge labels Feb 10, 2022
@joshtriplett
Copy link
Member

I like this idea, and how automatic it is.

One concern, though: doesn't cargo's "directory source" support only look one directory down, not two or more? So, if someone was using a directory source to point to the directory maintained by cargo vendor, that would break in this case.

I think we should either make directory sources work with this layout, or require an explicit option to enable this layout.

@rfcbot concern directory-source

@joshtriplett
Copy link
Member

Some possible options, in rough order of preference:

  • Make directory sources work with this layout, such as by looking one level deeper under subdirectories whose names start with @
  • Tweak this layout to work with directory sources (e.g. by prefixing each crate directory like @git+xyz+foo-1.2.3 rather than using a subdirectory)
  • Require an option to enable this layout

@alexcrichton
Copy link
Member

I'm not sure I understand your concern, the generated .cargo/config.toml information prints out what's necessary to use the vendored config, and that will have multiple directory sources as necessary (one per what source we're overriding). If users are trying to use the vendor directory as an independent directory source that seems like something that will keep working and is otherwise unaffected by this and they might just have to handle the case that new directories appear?

Basically that sounds like a sort of edge-use-case which may not necessarily affect this.

@joshtriplett
Copy link
Member

joshtriplett commented Feb 10, 2022

@alexcrichton Directory sources are commonly used by folks trying to do fully offline builds. cargo vendor works quite naturally as a mechanism to maintain those sources; I don't think that's an edge case. And in the offline-build case, I've seen that people commonly just write a .cargo/config.toml once and don't expect to have to modify or update it when vendoring more sources. Vendoring a new crate and having it not available from the directory source seems unfortunate, and it seems like something we could straightforwardly fix (e.g. by having the directory source code check for crates in subdirectories under directories that start with @).

@weihanglo
Copy link
Member Author

weihanglo commented Feb 11, 2022

Vendoring a new crate and having it not available from the directory source seems unfortunate

That happens today if people try to add new git dependencies but forget to update .cargo/config.toml

by having the directory source code check for crates in subdirectories under directories that start with @

If I understand your proposal correctly, let me extend it. The "auto-search-subdirectory" fix contains two parts:

  • Put only version-conflict crates into subdirectories of each sources separately.
  • A directory source starts collecting extra crates under the subdir prefixed with @.

For example, we have a package with log and serde vendored,

vendor/
├── log/
└── serde/

With the "auto-search-subdirectory" fix, If we add a dep serde with the same version from alternatvie registry and vendor it, the directory tree would look like:

vendor/
├── log/
├── @registry-1ecc6299db9ec823/ # from crates.io
│  └── serde/
└── @registry-1936cea8af2a1111/ # from alternative registry
   └── serde/

Since crates under subdirectories are added to directory source, the two source-replacement configs are happy to point to the same directory without modification.

[source.crates-io]
replace-with = "vendored-sources"

[source."http://alt-registry"]
registry = "http://alt-registry"
replace-with = "vendored-sources"

[source.vendored-sources]
directory = "vendor"

If I didn't misunderstand your intent, this solution also seems reasonable and less intrusive. One flaw is that the search logic of directory source changes, old toolchain might not be able to recognize the new vendor layout if there is a version conflict.

@YellowOnion
Copy link

Sorry for late reply, I'm not familiar with the internals of buildRustPackage in nix, but I've cc'd some devs to help.

@weihanglo
Copy link
Member Author

weihanglo commented Feb 15, 2022

I am not sure which solution is better. Let me summarize them:

Move minority sources into @ prefixed directories

  • Link
  • Brief: When a version conflict occurs, keep the source of majority in the top level of vendor directory. Move other sources into each @ prefixed directories separately.
  • Pros:
    • Vendor directories are backward compatible. Old toolchain can reuse vendor directories generated by new toolchain when conflict occurs (only need to apply some extra vendor configs).
  • Cons:
    • User needs to apply extra source-replacement configs when a version conflict occurs.
    • Vendor layout changes when version conflict occurs. It breaks if a user blindly depends on the directory source.

Directory sources start collecting crates from @ prefixed directories

  • Link
  • Brief: When version conflict occurs, put conflict crates into each @ prefixed directory. Directory source would collects extra crates in those directories.
  • Pros:
    • User need not to apply any extra source-replacement config when a version conflict occurs. New directory source logic would handle them.
  • Cons:
    • Vendor directories generated by new toolchain cannot be used by old toolchain.

@joshtriplett @alexcrichton do you have any preference?

@alexcrichton
Copy link
Member

I don't really have a great sense for what the best choice is here. I don't know how directory sources are used and it sounds like that question is better for @joshtriplett perhaps?

@joshtriplett
Copy link
Member

We discussed this in today's @rust-lang/cargo meeting, and settled on the second solution, but with some further nuance: Cargo directory sources shouldn't look at all @ subdirectories, just the @ subdirectory corresponding to the source being replaced (in addition to the top-level directory).

@weihanglo weihanglo added S-waiting-on-author Status: The marked PR is awaiting some action (such as code changes) from the PR author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Mar 5, 2022
@alexcrichton alexcrichton removed their assignment Mar 8, 2022
@weihanglo
Copy link
Member Author

rust-lang/rfcs#3243 is proposing namespace package name as @foo/bar (though it will start bikeshedding). This might conflict with what this PR tries to do. We can wait and see what happens next.

@weihanglo
Copy link
Member Author

@rfcbot cancel

I'm going to close this due to the current solution might not be compatible with future syntax of rust-lang/rfcs#3243. Feel free to take it if coming up with a better solution.

@rfcbot
Copy link
Collaborator

rfcbot commented Apr 21, 2022

@weihanglo proposal cancelled.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-author Status: The marked PR is awaiting some action (such as code changes) from the PR author. T-cargo Team: Cargo
Projects
None yet
Development

Successfully merging this pull request may close these issues.

cargo vendor can't handle duplicates.
8 participants