Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Migrate from git2 to gix #531

Merged
merged 3 commits into from
Sep 25, 2023
Merged

Migrate from git2 to gix #531

merged 3 commits into from
Sep 25, 2023

Conversation

Byron
Copy link
Contributor

@Byron Byron commented Sep 8, 2023

Switch from git2 to gix to increase performance and reduce compile times. tame-index also uses gix, which means that previously two different git implementations were used. Now the implementation is unified, reducing the amount of dependencies.

Please note that this also means that, to avoid duplication of gix in the tree, each time tame-index is upgraded, one should validate that its gix dependency matches up with the gix dependency used here.

Tasks

  • upgrade
  • wait for a new release of tame-index

Measurements

Before the change:

cargo build  834.74s user 31.81s system 873% cpu 1:39.23 total
cargo build --release  548.64s user 24.55s system 827% cpu 1:09.27 total

After the change:

cargo build  752.77s user 27.67s system 904% cpu 1:26.31 total
cargo build --release  505.19s user 20.98s system 892% cpu 58.938 total

Why gix indicates breaking changes basically by default

gix uses cargo smart-release to handle setting manifest versions based on conventional git commit comments, and to release interconnected crates in the correct order, based on need. There is also the notion of safety-bumps, which means that the downstream will indicate breaking changes if an upstream crate indicates a breaking change. This is because some crates may re-export their dependencies, adding their API surface in the process. However, this isn't always the case and generally, gix indicates breaking changes way more than it would have to. Wouldn't it be nice if breaking changes could be detected instead of assumed?

I think it would be a dream if cargo semver-check could somehow be integrated into the release-process of gix so that safety-bumps could be done based on actual need. Thus far I didn't look into this as it's an optimization, but it would definitely be very, very nice to have. And I am just mentioning this here in case it's interesting or even inspiring.

@obi1kenobi obi1kenobi added the C-enhancement Category: raise the bar on expectations label Sep 8, 2023
Copy link
Owner

@obi1kenobi obi1kenobi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is awesome, thank you! I especially love the max-performance feature name :)

Would you mind adding a comment near the gix and tame-index requirements that mentions the need to keep them in sync? Without such a comment, I think it's likely we'll forget to do that sooner or later and your wisdom will go to waste until rediscovered by re-reading this PR.

@obi1kenobi
Copy link
Owner

I think it would be a dream if cargo semver-check could somehow be integrated into the release-process of gix so that safety-bumps could be done based on actual need. Thus far I didn't look into this as it's an optimization, but it would definitely be very, very nice to have.

That'd be great! Unfortunately cargo-semver-checks currently can't connect different crates' APIs together since the rustc/rustdoc mechanism for doing this reliably isn't available yet. There's interest from the rustdoc team to do this, I just don't think it's happened yet.

It might be possible to do by scanning the 3rd party crates directly to find which APIs had breaking changes, then separately scanning gix to find which 3rd party APIs it uses, and finally intersecting the two datasets. This would require some custom work that I unfortunately don't have cycles for, but I'd be happy to mentor contributors who might be interested in it!

@Byron
Copy link
Contributor Author

Byron commented Sep 8, 2023

This is awesome, thank you! I especially love the max-performance feature name :)

I hope for a future version of gix, I will be introducing ultra-performance ;). For what's done here, I actually think max-performance isn't needed and the feature could just be removed. It's definitely your call - the trade here is performance for purity 😅.

Would you mind adding a comment near the gix and tame-index requirements that mentions the need to keep them in sync?

Makes perfect sense and is now done.

Please note that I just finished some compile-time optimization work which comes to effect only once tame-index has upgraded as well. I will wait until that happened unless you want to rather see that happen in a separate PR. Please let me know.

@obi1kenobi
Copy link
Owner

I hope for a future version of gix, I will be introducing ultra-performance ;). For what's done here, I actually think max-performance isn't needed and the feature could just be removed. It's definitely your call - the trade here is performance for purity 😅.

I trust your judgment. I haven't looked too closely at what the tradeoff is, and I'm happy to do what you think is best. Excited to get ultra-performance one day too! 🔥

Please note that I just finished some compile-time optimization work which comes to effect only once tame-index has upgraded as well. I will wait until that happened unless you want to rather see that happen in a separate PR. Please let me know.

Either way is totally fine. I'm probably not going to manage to make another release before I head to RustConf next week, so waiting is fine and probably won't make any difference. But again, your call, I'm equally happy to wait or merge this as-is.

@Byron
Copy link
Contributor Author

Byron commented Sep 9, 2023

Alright, then I will wait for tame-index so the switch to latest version of gix becomes possible. I'll keep you posted :).

Jake-Shadle added a commit to EmbarkStudios/tame-index that referenced this pull request Sep 11, 2023
Please note that I have removed the `max-performance-safe` feature
which pushes the choice to the consumer of (any) library that uses
`gix`. It's also causing less C to be compiled by default.

### Checklist

* [x] I have read the [Contributor Guide](../../CONTRIBUTING.md)
* [x] I have read and agree to the [Code of
Conduct](../../CODE_OF_CONDUCT.md)
* [x] I have added a description of my changes and why I'd like them
included in the section below

### Description of Changes

An upgrade to the latest `gix`, which includes less dependencies now and
compiles a little faster
because of it. It also avoids making decisions about performance (and
the inclusion of C-crates)
by removing the `max-performance-safe` feature.

The motivation for this update is to get a new `tame-index` release for
consumption in [this
PR](obi1kenobi/cargo-semver-checks#531)
so that it can also use the latest `gix` version without duplication.

### Related Issues

None

---------

Co-authored-by: Jake Shadle <jake.shadle@embark-studios.com>
@Byron Byron marked this pull request as ready for review September 22, 2023 16:35
@Byron
Copy link
Contributor Author

Byron commented Sep 22, 2023

The CI failure seems to be unrelated. Probably it works when restarted.

@Byron
Copy link
Contributor Author

Byron commented Sep 25, 2023

I am a bit puzzled about the reproducible CI error:

Error: failed to open crates.io git index

Caused by:
    Refusing to initialize the non-empty directory as '/home/runner/.cargo/registry/index/github.com-1ecc6299db9ec823'

Maybe this isn't the switch to gix, but rather, the switch to the latest tame-index? It seems to try to open the crates-index, and initialize it despite non-empty. Maybe it's related to the way the crates-index is created on CI?

I downgraded tame-index just to see if that makes CI pass. In the mean-time, do you @obi1kenobi have suggestions on how to reproduce the CI failure locally?

@Byron
Copy link
Contributor Author

Byron commented Sep 25, 2023

The problem with CI I am running into is as follows:

  • the cache restores the crates-index directory in a form that contains a .cache directory, but no .git directory, see this CI log.
  • This causes tame-index to fail to open the git repository, which makes it think it can clone it.
  • That fails as the folder it tries to clone into isn't empty - it contains the .cache directory after all.

CC @Jake-Shadle who might have an idea if tame-index could possibly handle that (and I am not suggesting it should) and CC @obi1kenobi who might know more about this CI setup and how it could be adjusted to include a .git directory.

Thanks everyone for their help.

@obi1kenobi
Copy link
Owner

obi1kenobi commented Sep 25, 2023

Our caching setup just uses the Swatinem/rust-cache action, so I'm not sure why a .git directory might not be present if it's expected to be there. I haven't had a chance to look into how Rust's .cache directory is expected to be structured, but I've deleted the GitHub Actions cache entry that the job was using, so we can see if the problem repeats even after a cache miss.

Thanks for all the work you've been doing here, and especially for debugging the problem!

@obi1kenobi
Copy link
Owner

Ah with the wiped cache, some of the debugging code is causing the job to fail. Sorry about that!

@Byron
Copy link
Contributor Author

Byron commented Sep 25, 2023

No worries at all, thanks for your help! Maybe it works now :).

@obi1kenobi
Copy link
Owner

Evicted the cache again, re-running CI.

@Byron
Copy link
Contributor Author

Byron commented Sep 25, 2023

I have seen it green once by the way :)!
This is the last push, and if it's still green it should be ready to go.

@obi1kenobi
Copy link
Owner

Ah, unfortunately it seems to have failed :(

I don't think there's anything particularly unusual about our caching setup, so I'm wondering why other projects haven't run into this issue as well. Perhaps it's just a matter of time / upgrading to newer versions until they do as well? If so, we should probably flag this with whoever we think is best-positioned to contain the problem before it starts affecting other projects.

Do you think this is more likely to be a Swatinem/rust-cache GitHub Action issue, a tame-index issue, or something fundamental in cargo / Rust?

@Byron
Copy link
Contributor Author

Byron commented Sep 25, 2023

Maybe it was a recent rust-cache update that now prevents .git directories to become a part of the cache, that would explain why it starts happening now. Maybe it's also related to older versions of crates-index/tame-index still being able to clone into an existing directory?

My suggestion is to remove the cache in this case and see if CI times are still acceptable or lower than the longest-running job. Then the added time due to the index clone might not even be noticeable.

@obi1kenobi
Copy link
Owner

I opened #540 to check if the issue might be related to the use of Rust 1.67 in that test case. With that PR, we'll use Rust 1.70 versus stable (currently, Rust 1.72) as the versions we check, which might cause a change in the cache behavior.

@obi1kenobi
Copy link
Owner

Seems like my guess was correct, the caching issue was related to Rust 1.67 and upgrading the job to 1.70 makes it pass both for cache hits and cache misses.

Our next non-patch release will drop 1.67 support anyway, so I don't feel like we have to get to the bottom of it right now. Merging! Thanks for putting this together, much appreciated 🚀

@obi1kenobi obi1kenobi merged commit c1669f3 into obi1kenobi:main Sep 25, 2023
34 checks passed
@Byron Byron deleted the git2-to-gix branch September 25, 2023 19:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-enhancement Category: raise the bar on expectations
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants