-
Notifications
You must be signed in to change notification settings - Fork 95
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sparse-checkout: don't update sparse-checkout file on error #204
sparse-checkout: don't update sparse-checkout file on error #204
Conversation
48a132e
to
9d74a40
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left a few questions to help with my understanding of this code.
@@ -71,6 +71,7 @@ struct pattern_list { | |||
* excludes array above. If non-zero, that check succeeded. | |||
*/ | |||
unsigned use_cone_patterns; | |||
unsigned full_cone; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
However, we instead need to handle the case where that second
line does not exist. Due to the MUSTBEDIR and NEGATIVE flags, the
second pattern becomes "/*".Create a new option in struct pattern_list to store this "full
cone" mode. When we see the "/" pattern with no flags, we toggle
full_cone to on. If we later see "/" pattern with MUSTBEDIR and
NEGATIVE flags, then turn full_cone off.
For my own understanding, are these flags set when the file is parsed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is part of the parse_path_pattern()
method in dir.c
, which is called by add_pattern()
inside add_patterns_from_buffer()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That method also strips out the tail /
for directories (MUSTBEDIR
flag) and the preceding !
for negations (NEGATIVE
flag).
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Due to an oversight when handling cone patterns in sparse-checkout, the full cone patternset (only "/*") would not fill any entries in the hashsets, and only files at root would be populated. For cone mode patterns, the first two patterns are _usually_ the following: /* !/*/ However, we instead need to handle the case where that second line does not exist. Due to the MUSTBEDIR and NEGATIVE flags, the second pattern becomes "/*". Create a new option in struct pattern_list to store this "full cone" mode. When we see the "/*" pattern with no flags, we toggle full_cone to on. If we later see "/*" pattern with MUSTBEDIR and NEGATIVE flags, then turn full_cone off. Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
If two 'git sparse-checkout set' subcommands are launched at the same time, the behavior can be unexpected as they compete to write the sparse-checkout file and update the working directory. Take a lockfile around the writes to the sparse-checkout file. In addition, acquire this lock around the working directory update to avoid two commands updating the working directory in different ways. Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
9d74a40
to
c963314
Compare
Ping on this one. I adapted these commits into v3 of the sparse-checkout series which went upstream yesterday. I expect to need a re-roll due to clashes with some hashmap changes. cc: @kewillford @wilbaker |
if (given->flags & PATTERN_FLAG_NEGATIVE && | ||
given->flags & PATTERN_FLAG_MUSTBEDIR && | ||
!strcmp(given->pattern, "/*")) { | ||
pl->full_cone = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apologies if we already discussed this (I'm having trouble finding it if we did), but is there somewhere that git enforces that the sparse-checkout file is "valid", or are the entries guaranteed to be in a particular order?
I ask because I'm wondering if after setting pl->full_cone = 0
here we might read an entry that sets it to 1 below (or vice-versa). Is that the expected behavior?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The order is absolutely important in the sparse-checkout
file. In our typical cone-based patterns, we start with these two lines:
/* # include all files
!/*/ # except those in folders
The order gives the meaning to the negation. The last pattern to match the file /A/file.txt
is !/*/
, so the sparse-checkout says "do not include this file". (If no pattern matches at all, the default is "do not include this file".)
If they are reversed:
!/*/ # do not include things in folders
/* # include everything
then the sparse-checkout file will match everything, as the last pattern to match is a positive pattern.
So, this code is saying "If I see !/*/
, then set full_cone
to zero. If I see /*
, then set full_cone
to true."
If there are other patterns that make this file not look like a cone-based file, then use_cone_patterns
is set to zero and full_cone
is ignored.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Questions to help me better understand.
builtin/sparse-checkout.c
Outdated
o.head_idx = -1; | ||
o.src_index = r->index; | ||
o.dst_index = r->index; | ||
o.skip_sparse_checkout = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How did you decide these were the right options for unpack_trees?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I read the code in builtin/read-tree.c
and tracked what would happen in a call to git read-tree -mu HEAD
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me, but @kewillford is more of an expert on this code so you might want to wait for his approval before merging.
builtin/sparse-checkout.c
Outdated
|
||
cache_tree_free(&r->index->cache_tree); | ||
|
||
repo_hold_locked_index(r, &lock_file, LOCK_DIE_ON_ERROR); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering if we should hold the index.lock
for the entire sparse-checkout
command to prevent races? For example running a checkout after this code commits the index.lock
but it runs before the sparse-checkout file was updated?
Since commands get the index.lock before modifying the index, I'm thinking the sequence should be:
- Get the
index.lock
- Get the
sparse-checkout.lock
- Update the working directory and sparse-checkout (I don't think order matters here since we have both locks)
- Commit the
sparse-checkout
- Commit the
index
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll give that a try, thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I rearranged the code to do exactly this in 5cd306f. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. Approved with the one suggestion about locking.
/* commit index.lock file */ | ||
finish_update_data(ud); | ||
free(ud); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should the commit of the index.lock
be the last thing to happen so after the commit of the sparse-checkout.lock
? I'm not sure what git reads first the sparse-checkout file or the index. Not sure it matters too much either since both renames should happen rather fast. But since the commit has
5. Commit sparse-checkout.lock.
6. Commit index.lock.
Thought I would ask.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the sparse-checkout
builtin is the only thing looking at sparse-checkout.lock
, this won't affect parallel checkouts. A checkout
-like command would take index.lock
before reading the sparse-checkout
file.
To avoid deadlocks between competing git sparse-checkout set
commands, it is best to "pop" the locks in a stack.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And yes, I definitely did them in the wrong order in the code. Thanks for catching that!
There is a narrow window for race conditions while one process updates the sparse-checkout file and another process tries to do a checkout. To avoid these races, follow this pattern in 'git sparse-checkout set': 1. Take index.lock. 2. Take sparse-checkout.lock. 3. Update the working directory according to in-memory patterns. 4. Write patterns to sparse-checkout.lock. 5. Commit sparse-checkout.lock. 6. Commit index.lock. This avoids the race condition, as any checkout must first take the index.lock. The code is organized into a new "struct update_data" so we can hold the multiple structs and options across this time period. Helped-by: Kevin Willford <kewillford@microsoft.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
5cd306f
to
b9ef9f4
Compare
…heckout feature See microsoft/git#204 for the Git updates. The `SparseVerb` tests require some changes because we don't get "stuck" with a sparse-checkout if the set command fails. Thanks, @jeschu1 for creating such good tests so we can demonstrate the behavior change here!
If the index has an unstaged change in a file that will be dropped
by a change to the sparse-checkout file, then the index update
will fail. If this happens during 'git sparse-checkout set', then
the entire command will fail. However, the sparse-checkout file is
still updated, so the working directory is in a half-finished state.
Update the sparse-checkout builtin to run
unpack_trees()
directly instead of through an external
git read-tree
process.While doing so, allow passing a set of patterns, so we can
validate the patterns before trying to write to the sparse-checkout
file.
Also, write sparse-checkout under a .lock and take the
sparse-checkout lock around the unpack_trees() check.
Resolves #198.