refdb_fs: fix loose/packed refs lookup racing with repacks #4984

Merged: 5 commits merged on Feb 22, 2019

@pks-t (Member) commented Feb 15, 2019

This is a different approach to #4981 that should correctly fix the issues described by @peff.

There was an ordering dependency between loading loose and packed refs: loading loose refs set the PACKREF_SHADOWED flag in the already loaded packed refs, and reloading packed refs reset this flag. Because of that, we always had to load packed refs before loose refs, which is the wrong order with respect to repacks. This PR removes the dependency and thus allows us to first load the loose refs and only afterwards load the packed refs, effectively mitigating the race.

It also fixes the same race condition in the exists callback that has been pointed out by @peff, too.

When repacking references, git.git will first update the packed refs and only
afterwards delete any existing loose references that have now been moved to the
new packed refs file. Due to this, there is a potential for racing if one first
reads the packed refs file (which has not been updated yet) and only then tries
to read the loose reference (which has just been deleted). In this case, one will
incorrectly fail to look up the reference and it will be reported as missing.
Naturally, this is exactly what we've been doing in `refdb_fs_backend__exists`.

Fix the race by reversing the lookup: we will now first check if the loose
reference exists and only afterwards refresh the packed file.

Refactor the error handling in `refdb_fs_backend__iterator` to always return the
correct error code returned by the failing function.

When creating a new iterator, we eagerly load loose refs but only lazily create
a copy of packed refs. The lazy load only happens as soon as we have iterated
over all loose refs, opening up a potentially wide window for races. This
may lead to an inconsistent view, e.g. when the caller decides to reload packed
references at some point while iterating over the loose refs, which is unexpected.

Fix the issue by eagerly copying the sorted cache. Note that right now, we are
heavily dependent on ordering here: we first need to reload packed refs, then we
have to load loose refs and only as a last step are we allowed to copy the
cache. This is because loading loose refs has the side-effect of setting the
`PACKREF_SHADOWED` flag in the packed refs cache, which we require to avoid
outputting packed refs that already exist as loose refs.

Right now, loading loose refs has the side-effect of setting the
`PACKREF_SHADOWED` flag for references that exist both in the loose and the
packed refs. Because of this, we are forced to first load the packed refs and only
afterwards load the loose refs. This is susceptible to a race, though, when
refs are being repacked: when first loading the packed cache, then it may not
yet have the migrated loose ref. But when now trying to look up the loose
reference afterwards, then it may already have been migrated. Thus, we would
fail to find this reference in this scenario.

Remove this ordering dependency to allow fixing the above race. Instead of
setting the flag when loading loose refs, we will now instead set it lazily when
iterating over the loose refs. This even has the added benefit of not requiring
us to lock the packed refs cache, as we already have an owned copy of it.
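
The lazy shadowing described above can be illustrated with a small Python model. This is only a sketch under assumptions, not libgit2's C implementation: `PackedRef`, `iterate_refs`, and the numeric value of `PACKREF_SHADOWED` are hypothetical names and values chosen for illustration. The iterator owns its copy of the packed refs, so it can set the flag while walking the loose refs without taking a lock.

```python
from dataclasses import dataclass

# Hypothetical flag value; only the name comes from the text above.
PACKREF_SHADOWED = 0x1


@dataclass
class PackedRef:
    name: str
    oid: str
    flags: int = 0


def iterate_refs(loose, packed_snapshot):
    """Yield (name, oid) pairs: loose refs first, then unshadowed packed refs.

    `loose` maps ref names to oids. `packed_snapshot` maps ref names to
    PackedRef objects and is owned by this iterator, so mutating the flags
    here does not touch any shared cache.
    """
    for name in sorted(loose):
        shadowed = packed_snapshot.get(name)
        if shadowed is not None:
            # Set the flag lazily, at iteration time, rather than as a
            # side effect of loading the loose refs.
            shadowed.flags |= PACKREF_SHADOWED
        yield name, loose[name]

    for name in sorted(packed_snapshot):
        ref = packed_snapshot[name]
        if ref.flags & PACKREF_SHADOWED:
            continue  # already emitted from the loose set
        yield ref.name, ref.oid
```

In this model, a ref present in both stores is emitted once, with the loose value, regardless of the order in which the two stores were loaded.
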
Right now, we first load the packed refs cache and only afterwards load the
loose references. This is susceptible to a race when the loose ref is being
migrated to a packed cache by e.g. git-pack-refs(1):

                libgit2                             git-pack-refs

   1. We load the packed ref, which
      does not yet have the migrated
      reference.

                                      2. git-pack-refs updates the packed ref
                                         file to have the migrated ref.

                                      3. git-pack-refs deletes the old loose
                                         ref.

   4. We look up the loose ref.

So we now do not find the reference at all and will never iterate over it.

Fix the issue by reversing the order: instead of first loading the packed refs,
we will now look up the loose reference first. If it has already been deleted,
then it must already be present in the packed-refs by definition, as git.git
will only delete the reference after updating the packed refs file.
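
The interleaving from the diagram, and the effect of reversing the lookup order, can be reproduced with a toy Python model. This is a sketch under assumptions rather than the actual libgit2 code: `Repo`, `run_repack`, `lookup_packed_first`, and `lookup_loose_first` are hypothetical names, and the `between` callback stands in for a concurrent git-pack-refs run that happens between the two reads.

```python
class Repo:
    """Toy ref store with loose refs and a packed-refs file (both dicts)."""

    def __init__(self):
        self.loose = {"refs/heads/topic": "abc123"}
        self.packed = {}


def run_repack(repo):
    # Step 2 in the diagram: update the packed refs first.
    repo.packed.update(repo.loose)
    # Step 3: only afterwards delete the now-packed loose refs.
    repo.loose.clear()


def lookup_packed_first(repo, name, between):
    """Old order: read packed refs, then look up the loose ref."""
    packed_snapshot = dict(repo.packed)  # step 1: may predate the repack
    between()                            # concurrent repack runs here
    if name in repo.loose:               # step 4: loose ref already deleted
        return repo.loose[name]
    return packed_snapshot.get(name)


def lookup_loose_first(repo, name, between):
    """New order: look up the loose ref, then read packed refs."""
    loose_value = repo.loose.get(name)
    between()                            # concurrent repack runs here
    if loose_value is not None:
        return loose_value
    # If the loose ref was already gone, the packed refs must already
    # contain it, because the repack deletes only after updating.
    return repo.packed.get(name)
```

With a repack running between the two reads, `lookup_packed_first` returns `None` in this model, while `lookup_loose_first` still finds the reference.
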
@pks-t changed the title from "[WIP] refdb_fs: fix loose/packed refs lookup racing with repacks" to "refdb_fs: fix loose/packed refs lookup racing with repacks" on Feb 15, 2019

@pks-t (Member, Author) commented Feb 15, 2019

Force-pushed to provide proper commit messages and remove unrelated commits.

@ethomson merged commit 75dd7f2 into libgit2:master on Feb 22, 2019

@ethomson (Member) commented

Cool, thanks for doing this, and thanks for the report @peff.

@ethomson (Member) commented

Note: I logged #5003 around the delete_tail issue that @peff noted in #4981, so that we would not forget.

@pks-t deleted the pks/refdb-fs-race branch on February 22, 2019 10:30

@pks-t (Member, Author) commented Feb 22, 2019

Thanks, @ethomson! And thanks again @peff.
