Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use gitoxide for list_files_git #13592

Merged
merged 11 commits into from
Mar 24, 2024
Merged

Conversation

Byron
Copy link
Member

@Byron Byron commented Mar 15, 2024

Related to #10150.

Tasks

  • update gix to v0.60
  • assure this is tested (currently only git-tests run with git2 and gitoxide)
  • allow list_files_git to use gitoxide if it is enabled as feature.
  • use dirwalk iterator
  • use new release of gix with necessary updates

Review Notes

As this PR has come a long way, I decided to keep a few of the steps leading up to the final state, showing the PR's evolution in the hope it helps the review.

  • Would it be better to simply use gitoxide for this without a switch? I don't think
    it will cause more trouble than git2, and if there is an issue I will fix it with priority.
  • In one test, the walk resolves a symlink to a submodule to individual files, including the .git/* folder contained in the submodule which is ignored by the walk, i.e. submodule/* does not contain it, but submodule-link/* does. This is fixed in the gitoxide version, and the git2 version.
  • I noticed that symlinks are resolved for packaging and are allowed to point to anywhere, even outside of package root. I left it, but felt that maybe this should be reconsidered.

Remarks

  • I love the test-suite! It's incredibly exhaustive to the point where it uncovers shortcomings in gitoxide, which I greatly appreciate.
  • I also love git2 as it's API for many things leads to pretty idiomatic code, and sometimes I really have to work to match it. The example here is the initial dirwalk() method which requires a delegate as it doesn't just collect into a Vec like git2 does (for good reason). Turning that into an iterator via dirwalk_iter() makes it far more usable, and will definitely be good for performance as the dirwalk work is offloaded into its own thread.

@rustbot
Copy link
Collaborator

rustbot commented Mar 15, 2024

r? @ehuss

rustbot has assigned @ehuss.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@rustbot rustbot added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Mar 15, 2024
@Byron Byron mentioned this pull request Mar 16, 2024
5 tasks
It contains the feature required to get a directory traversal.
@rustbot rustbot added the A-infrastructure Area: infrastructure around the cargo repo, ci, releases, etc. label Mar 18, 2024
@Byron Byron marked this pull request as ready for review March 18, 2024 08:52
With a traditional walk, `.git` will be picked up, and so will be
ignored directories. This commit also doesn't give submodules special
treatment - instead it just tries to open directories as repositories,
or walks them if that fails.
This is the case for the git2 implementation naturally, but as
`gitoxide` doesn't yet have a dirwalk iterator, it's much
less intuitive here.
Copy link
Member Author

@Byron Byron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once CI has passed, I think this is ready for review. I left plenty of comments in this PR, to associate some of my 'Review Notes' with code.

Overall, I am really excited about this PR and can say that it greatly improved gitoxide in the process. It's quite amazing to see what Cargo does to list files as well :).

.gctx
.cli_unstable()
.gitoxide
.map_or(false, |features| features.fetch)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here I don't know if it's great to above fetch. For now I left it as you might decide to use gix by default here. If not, one could add another feature toggle like list, even though I fear very low adoption/testing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A separate flag might be of use if we want to stabilize piece-meal.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! This is implemented now, and I called the flag 'list-files'.

let rel = rel.replace(r"\", "/");
match repo.find_submodule(&rel).and_then(|s| s.open()) {
warn!(" found directory {}", file_path.display());
match git2::Repository::open(&file_path) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As the intent is to add the directory, submodule or not, it's better to open it as Repository to avoid picking up .git folders. This fix I applied after noticing this in the gitoxide implementation.

.emit_ignored(None)
.emit_tracked(true)
.recurse_repositories(false)
.symlinks_to_directories_are_ignored_like_directories(true)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This option I added specifically to support a difference in behaviour when comparing libgit2 to Git. Symlinks to directories are treated as directories by the .gitignore machinery (on Unix only in libgit2, all platforms in gitoxide). Git itself always treats symlinks as files.

let mut pattern = include.clone();
pattern
.to_mut()
.insert_str(0, if include.is_empty() { ":!" } else { ":!/" });
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This pathspec prefix is short for 'exclude from top', with the 'top' being required to avoid the CWD to affect the pathspec.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the : syntax and where can I read more about it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, is it a path separator? Why are we using strings with separators rather than Vec?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The pathspec syntax is document in the Git glossary of all places.

With a top-level Cargo.toml file, this code produces two pathspecs: :/ and :!/target/, with a sub-package it would look like :/sub-package and :!/sub-package/target/.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is a Rust API using stringly-typed values based on a command-line?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now I see what you are getting at.

I didn't think of innovating here, following the API used by libgit2 and git2 respectively. Having strings for pathspecs runs deep, and they are treated more like paths or regex or globs rather than a data structure. The datastructure itself also isn't particularly well suited to be constructed by hand, as git settings affect it (like case-sensitivity). It's quite complex, but I can imagine to tackle this by adding a trait like TryIntoPathspec which then works for strings as well as for hand-created pathspecs.

I see how being able to not have to go through the parser here would be beneficial though, so I have put this into the roadmap.

(Since it's not worse than git2 which is also stringly typed, I hope this won't have to block this PR.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, thank you for the explanation!

For me, it is more of trying to make the intent of the code clearer. I'll mark this as resolved but have some other thoughts for improving it with what we have now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, was tempted to encourage pathspec being added to the variables but the leading let pathspec = helps. Still easy to miss the structure these variables are intended to uphold. I feel like I could go either way.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's definitely quite complex code, with a bunch of variables and inter-dependencies to make it all line up. This could already be improved a lit if format!() could be used, but since paths are involved, that's not possible :/.
Maybe the pathspec-code can be refactored into something better nonetheless.

// Do not trust what's recorded in the index, enforce checking the disk.
// This traversal is not part of a `status()`, and tracking things in `target/`
// is rare.
None,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a possible source of failures in the git2 version. The index can generally not be trusted to represent the working tree unless a status call has set the UPTODATE flag on the entry. This isn't currently happening, and fixing this would be more complicated than it's worth.
Maybe another argument to switch this part over to gix.

// This could be a submodule, or a sub-repository. In any case, we prefer to walk
// it with git-support to leverage ignored files and to avoid pulling in entire
// .git repositories.
match gix::open_opts(&file_path, gix::open::Options::isolated()) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I opted for 'isolated' open (to avoid being affected by user or system settings) as these shouldn't be relevant for what we are doing. This makes sure of that, and is faster.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this mean that the user/system level gitignore configuration is ignored? I believe that would be a change from the current model.

Some people put editor-specific ignore configuration in their user-level git configuration. Cargo should respect that when publishing.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a great catch, I don't know why I wasn't seeing this 🤦‍♂️.

It's fixed now, which also simplifies the code quite a bit as one can now discover and open the repository without altering the default configuration.

@@ -689,6 +689,7 @@ fn package_symlink_to_submodule() {
project
.cargo("package --no-verify -v")
.with_stderr_contains("[ARCHIVING] submodule/Makefile")
.with_stderr_does_not_contain("[ARCHIVING] submodule-link/.git/config")
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was to assure that in sub-repositories, we never pick up .git directories.

_ = fs::write(repo.workdir().unwrap().join(".gitignore"), "target/").unwrap();
git::add(&repo);
git::commit(&repo);
_ = fs::write(p.build_dir().join("untracked.txt"), "").unwrap();
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test was added to further stress the gitoxide implementation, which I could otherwise have implemented wrongly. But even if that wasn't the case, it's good to see that only tracked files in target will be picked up.

.github/renovate.json5 Outdated Show resolved Hide resolved
src/cargo/sources/path.rs Outdated Show resolved Hide resolved
src/cargo/sources/path.rs Outdated Show resolved Hide resolved
src/cargo/sources/path.rs Outdated Show resolved Hide resolved
* remove renovate group as it's not needed anymore
* repository discovery will open with isolation
Byron added a commit to Byron/gitoxide that referenced this pull request Mar 18, 2024
For pathspecs, plan to provide a type-safe version of pathspec
instantiation, as suggested [in this comment](rust-lang/cargo#13592 (comment)).
@weihanglo
Copy link
Member

@arlosi in case you missed this.

@Byron
Copy link
Member Author

Byron commented Mar 20, 2024

Just for clarity, from my point of view all issues brought up so far have been addressed. Is this PR blocked on me nonetheless?

@epage
Copy link
Contributor

epage commented Mar 20, 2024

For myself, I don't fully feel confident in answering a lot of the questions you raised (or identifying issues in this code). I can say "this is unstable, just merge" but I worry we might lose track of a lot of this.

Would appreciate a look for someone else, especially with a lot of experience with this code.

@weihanglo
Copy link
Member

I'll find a time reviewing this.

@rustbot rustbot added the A-unstable Area: nightly unstable support label Mar 20, 2024
@arlosi
Copy link
Contributor

arlosi commented Mar 21, 2024

I confirmed this PR does fix the issue I was having related to #10150 for the missing sdir extension, as well as the link extension mentioned in the issue.

src/cargo/sources/path.rs Outdated Show resolved Hide resolved
src/cargo/sources/path.rs Outdated Show resolved Hide resolved
It's not necessary, and adds noise.
src/cargo/sources/path.rs Outdated Show resolved Hide resolved
Byron and others added 2 commits March 23, 2024 08:41
Co-authored-by: Arlo Siemsen <arkixml@gmail.com>
Top-level pathspecs are needed to assure they are not affected by
the CWD. The trailing `/` in `'target` is needed to assure excluded
items are in a folder, and that only entries in that folder are extracted
from the index.
@arlosi
Copy link
Contributor

arlosi commented Mar 24, 2024

Thanks! Looking forward to making this the default.

@bors r+

@bors
Copy link
Collaborator

bors commented Mar 24, 2024

📌 Commit 9178160 has been approved by arlosi

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Mar 24, 2024
@bors
Copy link
Collaborator

bors commented Mar 24, 2024

⌛ Testing commit 9178160 with merge a5b31eb...

@bors
Copy link
Collaborator

bors commented Mar 24, 2024

☀️ Test successful - checks-actions
Approved by: arlosi
Pushing a5b31eb to master...

@bors bors merged commit a5b31eb into rust-lang:master Mar 24, 2024
23 checks passed
@weihanglo weihanglo added the Z-gitoxide Nightly: gitoxide integration label Mar 24, 2024
bors added a commit to rust-lang-ci/rust that referenced this pull request Mar 25, 2024
Update cargo

13 commits in d438c80c45c24be676ef5867edc79d0a14910efe..a510712d05c6c98f987af24dd73cdfafee8922e6
2024-03-19 16:11:22 +0000 to 2024-03-25 03:45:54 +0000
- Remove unnecessary test (rust-lang/cargo#13637)
- Use `gitoxide` for `list_files_git` (rust-lang/cargo#13592)
- fix: Warn on -Zlints (rust-lang/cargo#13632)
- feat: Add a basic linting system (rust-lang/cargo#13621)
- docs: remove untrue TODO for `native_dirs` (rust-lang/cargo#13631)
- refactor(testsuite): Rename lints to lints_table (rust-lang/cargo#13627)
- Fix debuginfo strip when using `--target` (rust-lang/cargo#13618)
- refactor(toml): Push diagnostic complexity on annotate-snippets (rust-lang/cargo#13619)
- Fix publish script due to crates.io CDN change (rust-lang/cargo#13614)
- fix(alias): dont panic when resolving an empty alias (rust-lang/cargo#13613)
- Update annotate snippets (rust-lang/cargo#13609)
- refactor(vendor): tiny not important refactors (rust-lang/cargo#13610)
- feat: Report some dependency changes on any command (rust-lang/cargo#13561)

r? ghost
@rustbot rustbot added this to the 1.79.0 milestone Mar 25, 2024
github-actions bot pushed a commit to rust-lang/miri that referenced this pull request Mar 26, 2024
Update cargo

13 commits in d438c80c45c24be676ef5867edc79d0a14910efe..a510712d05c6c98f987af24dd73cdfafee8922e6
2024-03-19 16:11:22 +0000 to 2024-03-25 03:45:54 +0000
- Remove unnecessary test (rust-lang/cargo#13637)
- Use `gitoxide` for `list_files_git` (rust-lang/cargo#13592)
- fix: Warn on -Zlints (rust-lang/cargo#13632)
- feat: Add a basic linting system (rust-lang/cargo#13621)
- docs: remove untrue TODO for `native_dirs` (rust-lang/cargo#13631)
- refactor(testsuite): Rename lints to lints_table (rust-lang/cargo#13627)
- Fix debuginfo strip when using `--target` (rust-lang/cargo#13618)
- refactor(toml): Push diagnostic complexity on annotate-snippets (rust-lang/cargo#13619)
- Fix publish script due to crates.io CDN change (rust-lang/cargo#13614)
- fix(alias): dont panic when resolving an empty alias (rust-lang/cargo#13613)
- Update annotate snippets (rust-lang/cargo#13609)
- refactor(vendor): tiny not important refactors (rust-lang/cargo#13610)
- feat: Report some dependency changes on any command (rust-lang/cargo#13561)

r? ghost
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-infrastructure Area: infrastructure around the cargo repo, ci, releases, etc. A-unstable Area: nightly unstable support S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. Z-gitoxide Nightly: gitoxide integration
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants