Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: separate object lookup error for promisor packs #5776

Closed
wants to merge 3 commits into from

Conversation

rcoup
Copy link
Contributor

@rcoup rcoup commented Jan 8, 2021

There's currently no support in libgit2 for partial clones (#5564). This attempts to start support them by distinguishing object-lookup faults between ENOTFOUND (the requested oid is bad or the repo is corrupt) from a new EMISSING error (indicating the oid is referenced from a promisor pack, and therefore should be available remotely).

  1. Detect a promisor packfile by looking for a .packfile sidecar and track it in an attribute of git_pack_file
  2. When an object is referenced from a promisor packfile (eg. tree_entry → blob), raise EMISSING instead of ENOTFOUND if the object isn't present anywhere in the ODB.

To be clear, there's no support for fetching missing objects from a promisor remote.

The code doesn't actually do (2) properly when there are multiple ODB backends in place, but hopefully it conveys the basic idea.

Conceptually, EMISSING could also be used for things like shallow clones where parent commits aren't available locally, etc.

Checks for the presence of a '<pack>.promisor' sidecar file.
"Missing" means that an object isn't present in the ODB but should
be available from a promisor remote or similar. Promisor packfiles may
reference objects that aren't present in the packfile but are available
remotely.

Compare with to GIT_ENOTFOUND which indicates there is no information to
suggest the object will ever be available - the repository may be
corrupt or a oid lookup is not present.
If an object is referenced from a promisor packfile and the object is
not found elsewhere in the ODB, then return EMISSING instead of
ENOTFOUND.

Promisor packfiles can refer to objects that are not available in the
pack but should be available remotely. Contrast with ENOTFOUND to
indicate there is no information this object will ever exist.
rcoup added a commit to koordinates/pygit2 that referenced this pull request Jan 8, 2021
libgit2/libgit2#5776

Catch and deal with GIT_EMISSING and convert it to ObjectMissingError.
This is a subclass of KeyError which is mapped from GIT_ENOTFOUND.
@rcoup rcoup changed the title WIP: separate object lookup error for promisor packs RFC: separate object lookup error for promisor packs Jan 12, 2021
@rcoup
Copy link
Contributor Author

rcoup commented May 11, 2021

@ethomson @tiennou @pks-t sorry to nag, possible to get some feedback here? Happy to think through a plan for future steps if you think it's important to consider atm.

@olsen232
Copy link

olsen232 commented Jul 5, 2021

Hi github maintainers, I'm willing to do some work to help improve libgit2's partial clone support. What can I do to help this along?

Copy link
Member

@ethomson ethomson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry about the delay, @rcoup; I haven't had a lot of bandwidth in 2020. I think that the direction that this is going is largely correct.

The code doesn't actually do (2) properly when there are multiple ODB backends in place, but hopefully it conveys the basic idea.

Is the problem here that we're short circuiting the lookup on the first promisor pack, and a subsequent odb backend may have the object? (In other words, we erroneously report EMISSING even though we have the object?) Or is there a deeper underlying problem that I'm not seeing yet?

Conceptually, EMISSING could also be used for things like shallow clones where parent commits aren't available locally, etc.

I wonder if that's what we would want to do. I agree that they're similar (object exists on the remote and doesn't exist locally) but I wonder if they should be distinct error codes so that a caller can distinguish them more easily. 🤔

if (!git_disable_pack_keep_file_checks) {
memcpy(p->pack_name + root_len, ".keep", sizeof(".keep"));
if (git_path_exists(p->pack_name) == true)
p->pack_keep = 1;
}

memcpy(p->pack_name + root_len, ".pack", sizeof(".pack"));
} else {
root_len = path_len - strlen(".pack");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In line 1181 we ensure that path_len is at least strlen(".idx"). If it were strlen(".idx"), here we would set root_len to (size_t)-1.

I think that it's unlikely that there's a call path that would call us with just .idx as the path but some sort of guard seems like it should be in order.

@rcoup
Copy link
Contributor Author

rcoup commented Jul 28, 2021

Is the problem here that we're short circuiting the lookup on the first promisor pack, and a subsequent odb backend may have the object? (In other words, we erroneously report EMISSING even though we have the object?)

Yes. Presumably there might be some other complexities wrt multiple ODB backends.

Conceptually, EMISSING could also be used for things like shallow clones where parent commits aren't available locally, etc.

I wonder if that's what we would want to do. I agree that they're similar (object exists on the remote and doesn't exist locally) but I wonder if they should be distinct error codes so that a caller can distinguish them more easily. 🤔

Yes, maybe EPROMISED & EGRAFT (or ESHALLOW_GRAFT?) to make it clearer?

@olsen232
Copy link

FYI I picked up this PR and started to address the review comments - however I ran into a major obstacle - git_object_lookup doesn't actually have the necessary context to know if an object is non-existent, missing-without-good-reason (ie corrupt), or promised. In order to say that an object is promised, we should ensure the following:

  1. The object is not in the ODB.
  2. The object is referenced by an object that is in the ODB.
  3. The object that is in the ODB is in a packfile, and, that packfile is a promisor packfile.
    Since git_object_lookup doesn't have any context except the OID that we're looking for, we can't make the distinction at this layer without changing the API.

This PR doesn't currently do those things and lacks the context to do so without changing the libgit2 API. I think this PR promotes ENOTFOUND to EPROMISED basically every time if there is a promisor packfile involved, which is too eager (unless we define the error code as meaning "this object might be promised", as opposed to "is promised"). I think this PR is probably best simply closed - there are two other PRs that I will try to write instead:

  • Separate error code eg EMISSING for when an object lookup by path fails due to missing object (not due to an invalid path). For instance, if I run git_tree_entry_bypath(&entry, root, "ab/cd/ef") and "ab/cd/ef" is not a valid path - perhaps ab has no child cd - then I get ENOTFOUND, as I should. But if I run git_tree_entry_bypath(&entry, root, "ab/cd/ef")` and "ab/cd/ef" is valid, as far as can be tested, but an object is missing, I should get this new error instead. (Eg if ab does have a child cd, but that child is nowhere to be found.

  • Separate error code eg EPROMISED for when an object is EMISSING (as in the last PR) AND the packfile where we would expect to find the object is a promisor packfile. I would salvage some of this PR to check if the promisor file exists.

Limitations - I can only add these codes to the path API eg git_tree_entry_bypath, git_tree_entry_byname, git_object_lookup_bypath. git itself also has this limitation eg if you run git cat-file $SOME_OBJECT_ID it doesn't have enough context to know if the object is non-existent or promised, so it has to search all promisor backends for it.

@rcoup
Copy link
Contributor Author

rcoup commented Aug 16, 2021

Closed in favour of #5993 and follow-ups

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants