Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: create ephemeral workspace for git source #13689

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

hi-rustin
Copy link
Member

@hi-rustin hi-rustin commented Apr 2, 2024

What does this PR try to resolve?

close #13259

Root cause

The root cause of this problem is that cargo install creates a non-ephemeral workspace for git sources, that would try finding all member packages in that git repo, and add them as path dependencies.

However, when computing fingerprints to determine if a rebuild is needed, Cargo doesn't hash git revision into fingerprint directly. Instead, it checks if source file path has changed. For git source, it is an absolute path to ~/.cargo/git/checkouts//, so commit SHA is always there. For path dependency, it is relative to workspace root, so nothing change and Cargo doesn't rebuild even it is from a different git revision.

Sloution

If any time we create an ephemeral workspace, then we will update the git source every time and that would help us use the right code and rebuild the binary correctly.

There are some historical PRs related to this change:

  1. 4 years ago, we tried to all use ephemeral workspace in Simplify install #5350. But it caused cargo install doesn't follow workspace target #5662 so we revert that change in Fix cargo install using a workspace target dir #5685.
  2. Search for root manifest with ephemeral workspaces #7768 In this PR, we try to search the root manifest by using Workspace::new, but it also caused the cargo install not working correctly when using git repo #13259.

So in my fix, I just rollback to use ephemera workspace but using find_root to achieve the same goal of searching the root manifest. You can find more in the test: git_install_reads_workspace_manifest

How should we test and review this PR?

Additional information

None

@rustbot
Copy link
Collaborator

rustbot commented Apr 2, 2024

r? @weihanglo

rustbot has assigned @weihanglo.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@rustbot rustbot added Command-install S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Apr 2, 2024
@hi-rustin hi-rustin changed the title DO NOT READY FOR REVIEW: fix: always create ephemeral workspace for install NOT READY FOR REVIEW: fix: always create ephemeral workspace for install Apr 2, 2024
@hi-rustin
Copy link
Member Author

Debugging note:

  1. The failed case was caused by my change.
  2. The reason is that we don't check the package content in Workspace::ephemera function, but in Workspace::new we will try to load the package again. This way, it can bail out the correct error from the manifest.
  3. Perhaps we should explicitly check the manifest in the ephemera function or find a way to abort the process before make_ws_rustc_target.

@rustbot rustbot added the A-workspaces Area: workspaces label Apr 12, 2024
@hi-rustin hi-rustin changed the title NOT READY FOR REVIEW: fix: always create ephemeral workspace for install fix: always create ephemeral workspace for install Apr 12, 2024
@hi-rustin hi-rustin force-pushed the rustin-patch-install-git branch 5 times, most recently from f74eabe to ce67365 Compare April 13, 2024 03:26
@hi-rustin hi-rustin marked this pull request as ready for review April 13, 2024 03:32
@hi-rustin
Copy link
Member Author

Tested locally:

  1. Add a new config
[build] 
target-dir = "/Volumes/t7/code/cargo/target/tmp1" 
  1. Install the video_in_waveform with new cargo build:
./target/debug/cargo install --git https://github.com/hi-rustin/cargo-information --rev 5fd64bb6afe37092ee1dd7c518f4fc4865175c22
   Compiling bytesize v1.3.0
   Compiling cargo-information v0.4.2 (https://github.com/hi-rustin/cargo-information?rev=5fd64bb6afe37092ee1dd7c518f4fc4865175c22#5fd64bb6)
    Finished `release` profile [optimized] target(s) in 1m 26s
  Installing /Users/hi-rustin/.cargo/bin/cargo-info
   Installed package `cargo-information v0.4.2 (https://github.com/hi-rustin/cargo-information?rev=5fd64bb6afe37092ee1dd7c518f4fc4865175c22#5fd64bb6)` (executable `cargo-info`)
  1. Calculate the hash:
md5 ~/.cargo/bin/cargo-info
MD5 (/Users/hi-rustin/.cargo/bin/cargo-info) = 3a7ccae20527098730fda436790ddf72
  1. Install it again with a different version:
./target/debug/cargo install --git https://github.com/hi-rustin/cargo-information --rev e8b3419bc033d3252ad19081f516946c94e0c02e
    Updating git repository `https://github.com/hi-rustin/cargo-information`
    Updating crates.io index
   Compiling cargo-information v0.4.2 (https://github.com/hi-rustin/cargo-information?rev=e8b3419bc033d3252ad19081f516946c94e0c02e#e8b3419b)
    Finished `release` profile [optimized] target(s) in 7.85s
   Replacing /Users/hi-rustin/.cargo/bin/cargo-info
    Replaced package `cargo-information v0.4.2 (https://github.com/hi-rustin/cargo-information?rev=5fd64bb6afe37092ee1dd7c518f4fc4865175c22#5fd64bb6)` with `cargo-information v0.4.2 (https://github.com/hi-rustin/cargo-information?rev=e8b3419bc033d3252ad19081f516946c94e0c02e#e8b3419b)` (executable `cargo-info`)
  1. Calculate the hash:
md5 ~/.cargo/bin/cargo-info
MD5 (/Users/hi-rustin/.cargo/bin/cargo-info) = abd82eed7d6f3f88cf7a5371bf75d469

@hi-rustin hi-rustin changed the title fix: always create ephemeral workspace for install fix: create ephemeral workspace for git source Apr 13, 2024
@hi-rustin
Copy link
Member Author

@weihanglo
Could you please take a look? Thanks! 💚 💙 💜 💛 ❤️

Copy link
Member

@weihanglo weihanglo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for looking back into the history!

@@ -283,6 +283,10 @@ impl<'gctx> Workspace<'gctx> {
ws.member_ids.insert(id);
ws.default_members.push(ws.current_manifest.clone());
ws.set_resolve_behavior();
// The find_root function is used here to traverse the directory tree and locate the root of the workspace.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A bit worried about this change. The in-memory property isn't held for ephemeral workspaces after this patch. It is also a bit fragile because the workspace discovery logic is only guarded by the other place, which is not immediately clear and hard to track.

I wonder if we really need this line here. git_install_reads_workspace_manifest seems to fail if we don't do find_root, but does cargo ever use profile or anything from there? Things we might want to figure out:

  • What kind of workspace support cargo install --git current provide?
  • Can we skip this and still find all members and inheritable fields?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but does cargo ever use profile or anything from there?

After this PR we can only check the manifest for errors. Before that, I belive that we could use any information defined by the workspace. For example lints etc.

  • What kind of workspace support cargo install --git current provide?

Before this PR, we just load the whole workspace and create a workspace based on the root manifest.
After this PR, we create the workspace based on the specific package and we don't use the information from the root manifest. Only searched the root to see if there is an error in the manifest.

  • Can we skip this and still find all members and inheritable fields?

After this PR, we only checked the manifest, we don't respect workspace information anymore.

For example:
If we try cargo install --git https://github.com/hi-rustin/test-cargo-git-install bin-tool;

Before this change:

test-cargo-git-install on  master via 🦀 v1.77.2 cargo install --git https://github.com/hi-rustin/test-cargo-git-install bin-tool 
    Updating git repository `https://github.com/hi-rustin/test-cargo-git-install`
warning: virtual workspace defaulting to `resolver = "1"` despite one or more workspace members being on edition 2021 which implies `resolver = "2"`
note: to keep the current resolver, specify `workspace.resolver = "1"` in the workspace root's manifest
note: to use the edition 2021 resolver, specify `workspace.resolver = "2"` in the workspace root's manifest
note: for more details see https://doc.rust-lang.org/cargo/reference/resolver.html#resolver-versions
  Installing bin-tool v0.1.0 (https://github.com/hi-rustin/test-cargo-git-install#d7a594e5)
   Compiling lib1 v0.1.0 (/Users/hi-rustin/.cargo/git/checkouts/test-cargo-git-install-fae8728302b9b2da/d7a594e/lib1)
   Compiling bin-tool v0.1.0 (/Users/hi-rustin/.cargo/git/checkouts/test-cargo-git-install-fae8728302b9b2da/d7a594e/bin-tool)
error: usage of an `unsafe` block
 --> bin-tool/src/main.rs:2:5
  |
2 | /     unsafe {
3 | |         println!("Hello, world!");
4 | |     }
  | |_____^
  |
  = note: requested on the command line with `-F unsafe-code`

warning: unnecessary `unsafe` block
 --> bin-tool/src/main.rs:2:5
  |
2 |     unsafe {
  |     ^^^^^^ unnecessary `unsafe` block
  |
  = note: `#[warn(unused_unsafe)]` on by default

warning: `bin-tool` (bin "bin-tool") generated 1 warning
error: could not compile `bin-tool` (bin "bin-tool") due to 1 previous error; 1 warning emitted
error: failed to compile `bin-tool v0.1.0 (https://github.com/hi-rustin/test-cargo-git-install#d7a594e5)`, intermediate artifacts can be found at `/var/folders/j1/7l6snwpx6svgqxh79bz_d27m0000gn/T/cargo-installbO7SHc`.
To reuse those artifacts with a future compilation, set the environment variable `CARGO_TARGET_DIR` to that path.

After this change:

test-cargo-git-install on  master via 🦀 v1.77.2 ../cargo/target/debug/cargo install --git  https://github.com/hi-rustin/test-cargo-git-install bin-tool

    Updating git repository `https://github.com/hi-rustin/test-cargo-git-install`
  Installing bin-tool v0.1.0 (https://github.com/hi-rustin/test-cargo-git-install#d7a594e5)
   Compiling lib1 v0.1.0 (https://github.com/hi-rustin/test-cargo-git-install#d7a594e5)
   Compiling bin-tool v0.1.0 (https://github.com/hi-rustin/test-cargo-git-install#d7a594e5)
    Finished `release` profile [optimized] target(s) in 2.57s
  Installing /Users/hi-rustin/.cargo/bin/bin-tool
   Installed package `bin-tool v0.1.0 (https://github.com/hi-rustin/test-cargo-git-install#d7a594e5)` (executable `bin-tool`)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From a workspace information perspective, I think this PR introduces some regression.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From a workspace information perspective, I think this PR introduces some regression.

workspace is a feature which should not be there as it complicates all optimizations? it is only there because CARGO_TARGET_DIR does not work very well, isn't it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

workspace is already there so to move forward we need to first figure out the feature set cargo install --git offers. However, I agree with soloturn, we may need to step back a bit to re-evaludate the bug.

Both workspace initialization and CARGO_TARGET_DIR are symptoms of the bug. From my previous investigation, the bug is in workspace, members are collected as from PathSource, hence cargo install recognizes the as path dependencies. In contrast, when using in git dependency, workspace members will be marked as from GitSource. I wonder if we should mark the entire workspace with the same source as the root one for cargo install. Thinking of nested packages rust-lang/rfcs#3452, if we had that today, we might also need this, otherwise nested package will be marked as from local path and never get updated.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

workspace is a feature which should not be there as it complicates all optimizations? it is only there because CARGO_TARGET_DIR does not work very well, isn't it?

I think it's hard to say the workspace feature complicates all optimizations. As my example in https://github.com/hi-rustin/test-cargo-git-install, respect workspace information will help us avoid inconsistency.

The problem is not caused by the CARGO_TARGET_DIR. If users set a CARGO_TARGET_DIR, then all compilation would happen in the CARGO_TARGET_DIR.

The problem here is as @weihanglo said, we cannot rebuild the binary after the git repository is updated.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should mark the entire workspace with the same source as the root one for cargo install. Thinking of nested packages rust-lang/rfcs#3452, if we had that today, we might also need this, otherwise nested package will be marked as from local path and never get updated.

That sounds like a reasonable solution, I will take a look.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should mark the entire workspace with the same source as the root one for cargo install

I've looked at how we calculate the fingerprint of the path member.

pub fn path_args(ws: &Workspace<'_>, unit: &Unit) -> (PathBuf, PathBuf) {
    let ws_root = ws.root();
    let src = match unit. target.src_path() {
        TargetSourcePath::Path(path) => path.to_path_buf(),
        TargetSourcePath::Metabuild => unit.pkg.manifest().metabuild_path(ws.target_dir()),
    };
    assert!(src.is_absolute());
    if unit.pkg.package_id().source_id().is_path() {
        if let Ok(path) = src.strip_prefix(ws_root) {
            return (path.to_path_buf(), ws_root.to_path_buf());
        }
    }
    (src, unit.pkg.root().to_path_buf())
}

It seems we always try to track the workspace as much as possible.

So I am not sure what you mean by mark the entire workspace with the same source. Can you explain it a little bit?

Copy link
Member

@weihanglo weihanglo Apr 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we read a dependency from a Git source, no matter it is a single package or workspace member, it always returns a PackageId with SourceKind::Git. Under the hood it is actually a PathSource assoicated with a Git SourceId. This helps Cargo track the actual source.

let source_id = self
.source_id
.with_git_precise(Some(actual_rev.to_string()));
let path_source = PathSource::new_recursive(&checkout_path, source_id, self.gctx);
self.path_source = Some(path_source);

However, in cargo install we use workspace to find members so we lost that sticky fashion.

)
.run();

cargo_process(&format!(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should have a comment or two pointing out we want to observe a rebuild?
Or we can actually do a shasum comparsion like the original did?

@bors
Copy link
Collaborator

bors commented Apr 18, 2024

☔ The latest upstream changes (presumably #13769) made this pull request unmergeable. Please resolve the merge conflicts.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-workspaces Area: workspaces Command-install S-waiting-on-review Status: Awaiting review from the assignee but also interested parties.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

cargo install not working correctly when using git repo
5 participants