
Resolving deltas takes long time #11014

Closed
ar37-rs opened this issue Aug 22, 2022 · 13 comments

Comments


ar37-rs commented Aug 22, 2022

Problem

Cargo can be very slow updating the crates.io index. You may see a progress bar such as:

    Updating crates.io index
       Fetch [=================>       ]  74.01%, (64415/95919) resolving deltas

Workaround

There are three different workarounds for this issue. The following instructions assume that CARGO_HOME is the default, .cargo in your home directory.

Use net.git-fetch-with-cli

The net.git-fetch-with-cli config option will instruct cargo to use the git CLI to update the index instead of its built-in git client. The git CLI should be more efficient at updating the repository. Enter the following into your global cargo configuration file, usually ~/.cargo/config.toml (or ~/.cargo/config for older setups):

[net]
git-fetch-with-cli = true

Or set the environment variable CARGO_NET_GIT_FETCH_WITH_CLI=true.
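The config change can also be scripted. Below is a minimal POSIX sh sketch, assuming CARGO_HOME falls back to ~/.cargo as stated above. It does not parse TOML: it only appends a new [net] table when none exists, and otherwise asks you to edit the file by hand. It writes to the legacy extension-less config file if one is present, because Cargo reads that file when both exist.

```shell
#!/bin/sh
# Sketch: enable net.git-fetch-with-cli in the global Cargo config.
# Assumption: CARGO_HOME falls back to ~/.cargo when unset.
cargo_home="${CARGO_HOME:-$HOME/.cargo}"

# If both `config` and `config.toml` exist, Cargo reads `config`,
# so write to the legacy file when it is present.
if [ -f "$cargo_home/config" ]; then
  config="$cargo_home/config"
else
  config="$cargo_home/config.toml"
fi
mkdir -p "$cargo_home"

# Naive line-based checks; this does not parse TOML.
if grep -q '^git-fetch-with-cli' "$config" 2>/dev/null; then
  echo "git-fetch-with-cli is already present in $config; check its value"
elif grep -q '^\[net\]' "$config" 2>/dev/null; then
  # Appending a second [net] header would be a duplicate table (invalid TOML).
  echo "$config already has a [net] table; add the key there by hand" >&2
else
  printf '\n[net]\ngit-fetch-with-cli = true\n' >> "$config"
  echo "enabled git-fetch-with-cli in $config"
fi
```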

Delete the index cache

To work around this issue, delete the existing git index in Cargo's cache. Cargo will then re-download the index from scratch, which should leave minimal deltas to resolve.

For most users, the following should be sufficient:

rm -rf ~/.cargo/registry/index

For PowerShell users:

rm -R -Force ~\.cargo\registry\index

For Windows cmd users:

rmdir /s /q %USERPROFILE%\.cargo\registry\index
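The commands above hard-code the default location. If CARGO_HOME points elsewhere, a POSIX sh variant can derive the path from it instead. This is a sketch; like the commands above, it removes everything under registry/index, which Cargo downloads again the next time it needs the index.

```shell
#!/bin/sh
# Sketch: delete Cargo's registry index cache, honouring CARGO_HOME.
# Falls back to ~/.cargo when CARGO_HOME is unset, as assumed above.
index_dir="${CARGO_HOME:-$HOME/.cargo}/registry/index"

if [ -d "$index_dir" ]; then
  rm -rf "$index_dir"
  echo "removed $index_dir; cargo will download the index again on next use"
else
  echo "no index cache found at $index_dir"
fi
```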

Switch to the sparse protocol

Warning: This is only available starting in 1.68 (released 2023-03-09). Due to a config issue, older releases will forbid setting this option.

In the 1.68 release of Rust, you can opt-in to the new sparse protocol. This uses HTTPS instead of git to update the index, which should be much more efficient in most cases. In your global ~/.cargo/config.toml file, enter:

[registries.crates-io]
protocol = "sparse"

Or set the CARGO_REGISTRIES_CRATES_IO_PROTOCOL=sparse environment variable.
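Because older releases reject this option, a script shared across toolchains might opt in only when cargo reports 1.68 or newer. A minimal POSIX sh sketch, assuming `cargo --version` prints a string of the form `cargo 1.68.0 (<hash> <date>)`; `sparse_supported` is a hypothetical helper for this example, not part of Cargo.

```shell
#!/bin/sh
# sparse_supported VERSION_STRING: succeed if the `cargo --version` output
# is 1.68 or newer. Hypothetical helper, not part of Cargo.
sparse_supported() {
  ver=${1#cargo }    # drop the leading "cargo "
  ver=${ver%% *}     # keep "1.68.0" (or e.g. "1.70.0-nightly")
  major=${ver%%.*}
  rest=${ver#*.}
  minor=${rest%%.*}
  [ "$major" -gt 1 ] || { [ "$major" -eq 1 ] && [ "$minor" -ge 68 ]; }
}

# Opt in for this shell session only when the installed cargo supports it.
if command -v cargo >/dev/null 2>&1 && sparse_supported "$(cargo --version)"; then
  export CARGO_REGISTRIES_CRATES_IO_PROTOCOL=sparse
fi
```

Because of the `export`, the snippet only affects your current shell if it is sourced (for example with `.`) rather than run as a separate process. An unexpected version string makes the numeric test fail, so the variable is simply left unset in that case.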

Cause

Cargo uses the libgit2 library for handling git operations. Its "delta resolution" algorithm has some inefficiencies, particularly when updating an existing repository that has a chain of historic updates. This issue is tracked upstream in libgit2/libgit2#4674.

Solutions

There are several approaches that the Cargo Team is investigating for resolving this:

  • Enabling the sparse protocol by default.
  • Improving the delta resolution algorithm in libgit2.
  • More aggressively garbage-collecting or deleting the git index when it grows too large.
  • Switching to gitoxide for git operations.
ar37-rs added the C-tracking-issue label (Category: A tracking issue for something unstable.) on Aug 22, 2022
@weihanglo (Member)

Hi. I suppose you are talking about updating the crates.io index, like this:

    Updating crates.io index
    Fetch [==============>          ]  63.70%, (76766/170478) resolving deltas

It is a known issue, and the community is working on it from different angles.

First, Cargo has introduced an unstable feature called the sparse index, tracked in [1]. It provides a new way to fetch crate dependencies on demand, without downloading the whole index up front. Please check its documentation and help us improve it and push it to stable!

The other approach is speeding up Git itself. We're going to experiment with replacing some of the git2 functionality in Cargo with gitoxide, a more modular, safer, and faster Git implementation written in Rust. This doesn't remove the "resolving deltas" phase entirely, but it does accelerate it, not to mention fixing the long-standing performance issues with shallow clones [2] and the local index [3].

In the meantime, you can use net.git-fetch-with-cli or its equivalent environment variable CARGO_NET_GIT_FETCH_WITH_CLI to make git fetching faster.

I am going to close this as each approach has already been tracked by other issues. Thank you for the report!

Footnotes

  1. https://github.com/rust-lang/cargo/issues/9069

  2. https://github.com/rust-lang/cargo/issues/1171

  3. https://github.com/rust-lang/cargo/issues/9167

weihanglo closed this as not planned on Aug 22, 2022
weihanglo removed the C-tracking-issue label (Category: A tracking issue for something unstable.) on Aug 22, 2022

ar37-rs commented Aug 23, 2022

@weihanglo that's good news, thank you for your reply.


katopz commented Jan 12, 2023

Just hit this today via

cargo install -q worker-build && worker-build --release

I had to change it to

cargo install worker-build && worker-build --release

to see what was going on, and it seems I get stuck at

    Updating crates.io index
       Fetch [====>                    ]  20.72%, 7.22MiB/s

for a long time. Then I tried again and got

    Updating crates.io index
       Fetch [=================>       ]  75.61%, (65996/95304) resolving deltas

and another long wait. Maybe this started after updating wrangler to 2.7.1, or after running rustup update stable; I'm not sure (I did both).


skull-squadron commented Jan 12, 2023

Also experiencing this. Cargo should be using a global anycast HA endpoint, something other than leaning on GitHub. It should be minimal, binary, versioned, differential, compressed, authenticated, resumable, and performant... not git.

Wasting my life waiting for tens of KiB to populate at a time.

I could be wrong, but I believe it's hitting GH's API tarpit throttling code or there's some network storm going on.

@joshuataylor

Maybe it's a local GitHub country-mirror issue; for me the slow part is the deltas. Everything else is okay, but AU is off-peak. I get around 30-90mb/s to GitHub.


Eh2406 commented Jan 12, 2023

In addition to the fixes mentioned above, deleting ~/.cargo/registry will probably also work. This is a known, long-standing bug in libgit2: a clean clone of a repository is linear in the length of its history, but updating an existing repository is quadratic. Also, Cargo's index is big and can have a lot of history.


Eh2406 commented Jan 12, 2023

Apparently the index normalization was done today, which means today's delta includes changes to almost every file.

@Eh2406: This comment was marked as outdated.

@weihanglo (Member)

Some good news:

@weihanglo (Member)

The sparse registry is now the default for crates.io as of 1.70. The unstable -Zgitoxide flag also provides a shallow clone of the crates.io index since nightly-2023-05-05, IIRC.

I would say this is largely resolved. Closing, and thanks, everyone.


joshlf commented Feb 8, 2024

Does anyone have a suggested fix for this on older Rust versions? I'm currently trying to test on my crate's MSRV (1.57), and so the more recent Cargo fixes aren't available. I believe what I'm seeing is a download issue because there's no "resolving deltas" message printed, but I keep getting network timeouts to GitHub despite my other network traffic behaving as normal. It's timed out multiple times, meaning I literally am unable to update the index, and so I can't compile, period. Has anyone figured out a workaround for the network part of this (notably on older versions of Cargo)?


Byron commented Feb 8, 2024

Here is something you can try:

cd ~/.cargo/registry/index/github.com-1ecc6299db9ec823
git fetch https://github.com/rust-lang/crates.io-index refs/heads/master:refs/remotes/origin/master

That will fetch all objects you are missing, so next time cargo tries to fetch it won't have to fetch that much, which hopefully won't timeout.
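If that fetch itself keeps timing out on a flaky connection, one option is to retry it a few times. A minimal POSIX sh sketch: `retry` and `RETRY_DELAY` are hypothetical names for this example, and the `github.com-*` glob assumes the old git index directory naming (such as `github.com-1ecc6299db9ec823` above).

```shell
#!/bin/sh
# retry N CMD...: run CMD up to N times, stopping at the first success.
# RETRY_DELAY (hypothetical knob for this sketch) is the pause in seconds.
retry() {
  n=$1
  shift
  i=1
  while [ "$i" -le "$n" ]; do
    if "$@"; then
      return 0
    fi
    echo "attempt $i/$n failed: $*" >&2
    i=$((i + 1))
    [ "$i" -le "$n" ] && sleep "${RETRY_DELAY:-10}"
  done
  return 1
}

# Run the fetch in each git-protocol index directory that exists.
for dir in "${CARGO_HOME:-$HOME/.cargo}"/registry/index/github.com-*/; do
  [ -d "$dir" ] || continue   # the glob matched nothing
  (cd "$dir" && retry 5 git fetch https://github.com/rust-lang/crates.io-index \
    refs/heads/master:refs/remotes/origin/master)
done
```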

Please let me know how it goes and good luck!


joshlf commented Feb 8, 2024


Okay I'll try that, thanks!

EDIT: A few hours after I posted my comment, network conditions improved and I was able to download the index successfully. I tried this anyway, and it worked, although I'm not sure what that tells me since the problem had resolved itself (presumably transiently). Your suggestion did work, at least 🤷‍♂️
