Make pygit2 throw if tree of a commit is not found #682
`Commit` objects in git always have a `tree_id` associated, which points to the corresponding `Tree` object. When the `Tree` object is missing, the repo is corrupted. In those cases:

- git exits with status 128 and the message `fatal: unable to read tree <hash>`
- libgit2 returns `GIT_ENOTFOUND` when calling `git_commit_tree()`
- in pygit2, `repo[commit.tree_id]` raises a `KeyError: <hash>`

On the commit object, however, rather than throwing an exception, pygit2 swallows the error returned by libgit2 and sets the `<Commit object>.tree` property to `None`.

This patch changes the behavior to raise an error in those cases.
Rationale:

`None` is arguably the wrong choice to encode an error condition, especially in Python, where `None` is used heavily. In particular, this caused our system to assume there was an empty tree, and the sync service that tails git repo changes decided to DELETE everything. The code was using `None` to represent an empty tree, which is useful, for example, when comparing a path between two commits (the path might not exist at one of the commits being compared).
I think the right decision in this case is to raise: it is an exceptional case caused by a corrupted repo, it is more consistent with other tools, and it ensures user code does not make the wrong decisions.
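To make the failure mode concrete, here is a minimal sketch of the ambiguity. It does not use pygit2; the dict-based "trees", the `store`, and all function names are hypothetical stand-ins for illustration, assuming a caller that treats `None` as an empty tree the way our sync service did.

```python
# Hypothetical stand-ins: a "tree" is a dict mapping path -> blob id,
# and None means "empty tree", as in the sync service described above.

def paths_to_delete(old_tree, new_tree):
    """Return paths present in old_tree but absent from new_tree."""
    old = old_tree or {}
    new = new_tree or {}
    return sorted(set(old) - set(new))


class MissingTreeError(Exception):
    """Raised when a commit's tree cannot be read (corrupted repo)."""


def load_tree_or_none(store, tree_id):
    # Old behavior: a missing tree is silently reported as None.
    return store.get(tree_id)


def load_tree_or_raise(store, tree_id):
    # Behavior argued for here: a missing tree is an error.
    try:
        return store[tree_id]
    except KeyError:
        raise MissingTreeError(f"Unable to read tree {tree_id}") from None


store = {"t1": {"a.txt": "b1", "b.txt": "b2"}}  # tree "t2" is missing
old_tree = store["t1"]

# With None as the error value, the caller sees an empty tree
# and plans to delete every path.
print(paths_to_delete(old_tree, load_tree_or_none(store, "t2")))

# With an exception, the sync stops before any deletion is planned.
try:
    paths_to_delete(old_tree, load_tree_or_raise(store, "t2"))
except MissingTreeError as exc:
    print(f"sync aborted: {exc}")
```

The point is only that "missing" and "empty" collapse into the same value in the first case, so the caller cannot tell them apart.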
For the curious: a corrupted repository can happen more often than expected. We run our repositories on a shared NFS filer, and one of our servers didn't have the `lookupcache=positive` option set. Without it, NFS caches metadata (the files in a directory, for example) and uses that cache for negative lookups (to deny the existence of files). In this case, the commit object was in a directory that was not cached, so the commit was seen immediately, but the tree object was in a directory that was cached; the cache didn't contain the tree object, so for a few seconds the tree appeared not to exist and the repo looked corrupted. Our sync service saw the tree being `None` and decided to delete everything, causing a lot of issues down the line.
Repro steps:

`rm .git/objects/36/83f870be446c7cc05ffaef9fa06415276e1828`

Before: `repo.revparse_single('HEAD').tree` will be `None`

After: `repo.revparse_single('HEAD')` raises `GitError: Unable to read tree 3683f870be446c7cc05ffaef9fa06415276e1828`
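For callers that may run against either behavior, a small guard along these lines could help. This is a sketch, not part of the patch: `require_tree` and `MissingTreeError` are hypothetical names, and the demo uses `SimpleNamespace` objects in place of real pygit2 commits, relying only on the `tree` and `tree_id` attributes mentioned above.

```python
from types import SimpleNamespace


class MissingTreeError(RuntimeError):
    """Hypothetical error for a commit whose tree cannot be read."""


def require_tree(commit):
    """Return commit.tree, refusing to treat None as an empty tree."""
    # With the behavior after this patch, an error may already have been
    # raised before this point; in that case it simply propagates.
    tree = commit.tree
    if tree is None:
        # Behavior before this patch: the lookup error was swallowed.
        raise MissingTreeError(f"Unable to read tree {commit.tree_id}")
    return tree


# Stand-ins for commit objects (assumption: only .tree and .tree_id are used).
healthy = SimpleNamespace(tree={"README": "blob1"}, tree_id="aaaa")
corrupted = SimpleNamespace(
    tree=None, tree_id="3683f870be446c7cc05ffaef9fa06415276e1828"
)

print(require_tree(healthy))
try:
    require_tree(corrupted)
except MissingTreeError as exc:
    print(exc)
```

With a guard like this, code that uses `None` internally to mean "empty tree" never receives a `None` that actually means "corrupted".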