
object not found when fetching git dependency #13555

Closed
banool opened this issue Mar 7, 2024 · 13 comments · Fixed by #13946
Labels
A-crate-dependencies Area: [dependencies] of any kind A-git Area: anything dealing with git C-bug Category: bug S-needs-info Status: Needs more info, such as a reproduction or more background for a feature request.

Comments

banool commented Mar 7, 2024

Problem

When building a crate that depends on other crates via git, sometimes you get an error like this:

Caused by:
  failed to load source for dependency `processor`

Caused by:
  Unable to update https://github.com/aptos-labs/aptos-indexer-processors.git?rev=d44b2d209f57872ac593299c34751a5531b51352#d44b2d20
  
Caused by:
  object not found - no match for id (d44b2d209f57872ac593299c34751a5531b51352); class=Odb (9); code=NotFound (-3)

You can see an instance of this failure here: https://github.com/Homebrew/homebrew-core/pull/165260/files. It comes from this CI run, in case the error doesn't show up on the first link: https://github.com/Homebrew/homebrew-core/actions/runs/8180427193/job/22368538918?pr=165260.

This only happens sometimes. It seems to happen more often in CI environments; I'm not sure I've ever seen it happen locally.

Steps

Reproducing this is challenging because it only occurs intermittently. If you look at any of the past Homebrew version bumps for the aptos formula, you'll see that one of the CI steps fails at least once each time due to this issue: https://github.com/Homebrew/homebrew-core/pulls?q=is%3Apr+aptos+is%3Aclosed.

The dep in question comes from here: https://github.com/aptos-labs/aptos-indexer-processors. That repo itself has git submodules, so that could be a factor.

Possible Solution(s)

I'm not sure how to fix this; for now we've mostly just been adding retries to get around it.

Notes

No response

Version

This is the cargo version we're using for the binary in question.

cargo 1.75.0 (1d8b05cdd 2023-11-20)
release: 1.75.0
commit-hash: 1d8b05cdd1287c64467306cf3ca2c8ac60c11eb0
commit-date: 2023-11-20
host: aarch64-apple-darwin
libgit2: 1.7.1 (sys:0.18.1 vendored)
libcurl: 8.4.0 (sys:0.4.68+curl-8.4.0 system ssl:(SecureTransport) LibreSSL/3.3.6)
ssl: OpenSSL 1.1.1u  30 May 2023
os: Mac OS 14.3.1 [64-bit]
@banool banool added C-bug Category: bug S-triage Status: This issue is waiting on initial triage. labels Mar 7, 2024
@epage epage added A-git Area: anything dealing with git A-crate-dependencies Area: [dependencies] of any kind labels Mar 7, 2024
weihanglo (Member) commented:

From my understanding, a retry under the same CI environment seems to work? That's odd. Is there any custom Git configuration in the CI environment?

@weihanglo weihanglo added S-needs-info Status: Needs more info, such as a reproduction or more background for a feature request. and removed S-triage Status: This issue is waiting on initial triage. labels Mar 7, 2024
banool (Author) commented Mar 7, 2024

I've noticed this happening in GitHub Actions, but notably on a variety of different runners, workflows, etc.

By the way, you'll see in the original error that the commit it says it can't find does indeed exist: aptos-labs/aptos-indexer-processors@d44b2d2. But it's an orphan commit; I'm not sure if that is relevant.

ehuss (Contributor) commented Mar 7, 2024

Indeed orphans can be a problem, but usually only when the rev is not specified. That does not appear to be the case here.

One thing to investigate is that it appears https://github.com/aptos-labs/aptos-indexer-processors.git is specified with three different commit revs. I don't recall how cargo handles fetching those separate revs. If it randomly fetches one of them, and the others aren't "reachable", then that could be a problem. It could also depend on how GitHub's servers decide what to send, since they don't always send a minimal set, and it might change depending on which server is accessed or the phase of the moon.

banool (Author) commented Mar 7, 2024

One thing to investigate is that it appears https://github.com/aptos-labs/aptos-indexer-processors.git is specified with three different commit revs

Where can you see that? I can try to fix that to reduce the odds of this problem happening.

weihanglo (Member) commented:

Here are the three commits:

I haven't had time to create a minimal reproducer with a similar layout, though.


@ehuss Would it be the case that git gc removed unreachable objects?

banool (Author) commented Mar 7, 2024

Oh, I see what you mean. Unfortunately this is intentional; we have something of a... complicated versioning scheme at the moment.

ehuss (Contributor) commented Mar 7, 2024

@ehuss Would it be the case that git gc removed unreachable objects?

Not directly, I don't think. The actions linked above seem to have some caching, but not of the cargo directory as far as I can see (and it would likely be too large for the cache anyway). Cargo won't run git gc for a long time.

GitHub runs git gc internally, and that can affect things. I believe they use some heuristics for how frequently it runs. Referring to orphans runs the risk that GitHub will remove them completely (and that is partially why I said it may contribute to the randomness, since from what I've seen GitHub has mirrors that are not consistent with one another in terms of how they compress and gc their pack files).

banool (Author) commented Mar 7, 2024

I believe this orphan has been this way for quite a while, so it seems strange that it would still be in this partially available state. I guess it's some bad state sharding on their side or something?

So I suppose depending on a non-orphan commit will improve our odds.

On the cargo side, does cargo retry in this situation?

ehuss (Contributor) commented Mar 8, 2024

OK, I think I see one possibility of what is happening.

When fetching a repo, cargo doesn't know whether a rev refers to a commit or a tag. It can resolve this via `github_fast_path`, which uses the GitHub API to determine the specific OID from the rev.

However, if you have exceeded the API rate limit, that function gets a 403 HTTP response and returns `FastPathRev::Indeterminate`. The fetch function then uses the refspecs `["+refs/heads/*:refs/remotes/origin/*", "+HEAD:refs/remotes/origin/HEAD"]`, fetching all branches in the hope that the commit lives on one of them. However, the commit d44b2d209f57872ac593299c34751a5531b51352 is only reachable from a PR, and GitHub PRs are only accessible via `pull/ID/head` refs.

I have opened #13563 to include the GITHUB_TOKEN to avoid the API rate limit on CI.

Another option is that cargo could assume anything that is 40 hexadecimal characters is a commit hash, not a tag, and use the single-commit refspec. I'm not entirely certain about that, but it seems relatively safe? @weihanglo WDYT?
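
A minimal sketch of that heuristic, for illustration only (this is not cargo's actual code; the helper names and the exact refspec formatting are assumptions based on the discussion above):

```rust
/// Sketch: pick the refspecs to fetch for a `rev` dependency.
fn refspecs_for_rev(rev: &str) -> Vec<String> {
    if looks_like_full_commit_oid(rev) {
        // Fetch only that object. This can retrieve a commit that is
        // reachable solely from a PR ref (e.g. refs/pull/ID/head), which
        // the branch refspecs below would miss.
        vec![format!("+{rev}:refs/commit/{rev}")]
    } else {
        // Fall back to fetching all branches plus HEAD and hope the rev
        // (a branch, tag, or abbreviated commit) is reachable from one.
        vec![
            "+refs/heads/*:refs/remotes/origin/*".to_string(),
            "+HEAD:refs/remotes/origin/HEAD".to_string(),
        ]
    }
}

/// Sketch: assume a full SHA-1 object id is exactly 40 hex characters.
fn looks_like_full_commit_oid(rev: &str) -> bool {
    rev.len() == 40 && rev.chars().all(|c| c.is_ascii_hexdigit())
}
```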

Generally, though, I would strongly recommend against using commits from PRs.

weihanglo (Member) commented:

Nice find!

Another option is that cargo could assume anything that is 40 hexadecimal characters is a commit hash, not a tag, and use the single-commit refspec. I'm not entirely certain about that, but it seems relatively safe? @weihanglo WDYT?

In #10807, we expect Git will eventually move to SHA-256, so for future compatibility I'm not sure we want to go back to assuming a commit hash is always 40 characters.
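
If SHA-256 compatibility is the concern, the length check in the sketch above could presumably accept both sizes (again, just an illustration, not a proposal for cargo's exact logic):

```rust
/// Sketch: treat `rev` as a full commit object id if it is all hex and has
/// the length of either a SHA-1 (40) or a SHA-256 (64) object id.
fn looks_like_full_commit_oid(rev: &str) -> bool {
    matches!(rev.len(), 40 | 64) && rev.chars().all(|c| c.is_ascii_hexdigit())
}
```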

bors added a commit that referenced this issue Mar 13, 2024
Use GITHUB_TOKEN in github_fast_path if available.

This changes the GitHub fast-path to include an Authorization header if the GITHUB_TOKEN environment variable is set. This is intended to help with API rate limits, which are not too hard to hit in CI. If it hits the rate limit, the fetch falls back to fetching all branches, which is significantly slower for some repositories.

This is also a partial solution for #13555, where the user has a `rev="<commit-hash>"` where that commit hash points to a PR. The default refspec of `["+refs/heads/*:refs/remotes/origin/*", "+HEAD:refs/remotes/origin/HEAD"]` will not fetch commits from PRs (unless they are reachable from some branch).

There is some risk that if the user has a GITHUB_TOKEN that is invalid or expired, then this will cause `github_fast_path` to fail where it previously wouldn't. I think that is probably unlikely to happen, though.

This also adds redirect support. Apparently GitHub now issues a redirect for `repos/{org}/{repo}/commits/{commit}` to `/repositories/{id}/commits/{commit}`.

The test `https::github_works` exercises this code path, and will use the token on CI.
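
The gist of the change is to attach an Authorization header to the fast-path API request whenever GITHUB_TOKEN is present. A simplified sketch (not the actual cargo code; the header list and function name are illustrative):

```rust
use std::env;

/// Sketch: build the HTTP headers for the GitHub fast-path request. If
/// GITHUB_TOKEN is set (GitHub Actions exports one by default), send it so
/// the request counts against the authenticated rate limit rather than the
/// much lower unauthenticated one.
fn github_fast_path_headers() -> Vec<String> {
    let mut headers = vec!["User-Agent: cargo".to_string()];
    if let Ok(token) = env::var("GITHUB_TOKEN") {
        if !token.is_empty() {
            headers.push(format!("Authorization: Bearer {token}"));
        }
    }
    headers
}
```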
mqudsi commented May 7, 2024

I don't know if this is the same case/bug or not, but I wanted to comment here and check before opening a new issue. We have a fairly repeatable issue with the same "object not found - no match for id" error that presents under CI when attempting to resolve Cargo.toml dependencies specified as a (GitHub) git URL and a rev value, but the weird thing is that the error only occurs on macOS and not on any of the other runner images.

It only recently popped up, but it now happens more often than not, always for the same URL/SHA pair. You can see the most recent CI failure here: https://github.com/fish-shell/fish-shell/actions/runs/8988940232/job/24690902288

The repo and revision in question is this one: https://github.com/meh/rust-terminfo/commits/7259f5aa5786a9d396162da0d993e268f6163fb2/

The error happens after several other git-resident dependencies have been fetched and used successfully as part of the build process. Using CARGO_NET_GIT_FETCH_WITH_CLI=true did not make any difference (I understand why not; this is just for posterity).

Do you have any pointers on what we can try or any additional information we can provide?

mqudsi referenced this issue in fish-shell/fish-shell May 7, 2024
CARGO_NET_GIT_FETCH_WITH_CLI uses the `git` executable instead of the Rust git2 crate/lib, which speeds things up and is known to resolve some issues fetching the registry or individual crates.

This is to work around a specific issue with git-resident Cargo.toml
dependencies (e.g. terminfo) that keep randomly failing to download under macOS
CI.
weihanglo (Member) commented:

@mqudsi
I'm really not sure. In #13718 the affected project is also using some orphan commits. Perhaps fetching orphan commits needs more requests (since they might have been gc'd), so it's easier to hit the rate limit? This came to mind because the linked fish-shell terminfo commit is also an orphan commit. Maybe change to meh/rust-terminfo@708facf and see if it happens again?

iliana (Contributor) commented May 21, 2024

In #10807, we expect Git will eventually move to SHA-256, so for future compatibility I'm not sure we want to go back to assuming a commit hash is always 40 characters.

It looks like this is the change that introduced the bug, because it changed the condition for adding refs/commit/{0} to the list of refspecs to fetch from "is it GitHub and is this a 40-character commit hash?" to "did github_fast_path succeed?", which implicitly relies on being able to reach api.github.com (and not being rate-limited). This is pretty unfortunate.
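
Roughly, the difference is between gating the single-commit refspec on the shape of the rev itself versus on the outcome of the GitHub API call; a sketch with hypothetical names (`FastPathRev` here is a simplified stand-in, not cargo's exact type):

```rust
/// Sketch of the old gate: decide from the rev itself.
fn add_commit_refspec_old(url: &str, rev: &str) -> bool {
    url.contains("github.com")
        && rev.len() == 40
        && rev.chars().all(|c| c.is_ascii_hexdigit())
}

/// Simplified stand-in for the fast-path result.
enum FastPathRev {
    UpToDate,
    NeedsFetch(String), // resolved object id
    Indeterminate,      // e.g. API unreachable or rate-limited
}

/// Sketch of the new gate: decide from whether the API call resolved the
/// rev, which implicitly depends on api.github.com being reachable and the
/// caller not being rate-limited.
fn add_commit_refspec_new(fast_path: &FastPathRev) -> bool {
    matches!(fast_path, FastPathRev::NeedsFetch(_))
}
```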
