Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sparse-checkout: don't update sparse-checkout file on error #204

Conversation

derrickstolee
Copy link
Collaborator

@derrickstolee derrickstolee commented Oct 1, 2019

If the index has an unstaged change in a file that will be dropped
by a change to the sparse-checkout file, then the index update
will fail. If this happens during 'git sparse-checkout set', then
the entire command will fail. However, the sparse-checkout file is
still updated, so the working directory is in a half-finished state.

Update the sparse-checkout builtin to run unpack_trees()
directly instead of through an external git read-tree process.
While doing so, allow passing a set of patterns, so we can
validate the patterns before trying to write to the sparse-checkout
file.

Also, write sparse-checkout under a .lock and take the
sparse-checkout lock around the unpack_trees() check.

Resolves #198.

Copy link
Member

@wilbaker wilbaker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a few questions to help with my understanding of this code.

builtin/sparse-checkout.c Outdated Show resolved Hide resolved
@@ -71,6 +71,7 @@ struct pattern_list {
* excludes array above. If non-zero, that check succeeded.
*/
unsigned use_cone_patterns;
unsigned full_cone;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, we instead need to handle the case where that second
line does not exist. Due to the MUSTBEDIR and NEGATIVE flags, the
second pattern becomes "/*".

Create a new option in struct pattern_list to store this "full
cone" mode. When we see the "/" pattern with no flags, we toggle
full_cone to on. If we later see "/
" pattern with MUSTBEDIR and
NEGATIVE flags, then turn full_cone off.

For my own understanding, are these flags set when the file is parsed?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is part of the parse_path_pattern() method in dir.c, which is called by add_pattern() inside add_patterns_from_buffer().

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That method also strips out the tail / for directories (MUSTBEDIR flag) and the preceding ! for negations (NEGATIVE flag).

dir.c Show resolved Hide resolved
@derrickstolee derrickstolee changed the title sparse-checkout: revert sparse-checkout file on error sparse-checkout: don't update sparse-checkout file on error Oct 3, 2019
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Due to an oversight when handling cone patterns in sparse-checkout,
the full cone patternset (only "/*") would not fill any entries in
the hashsets, and only files at root would be populated.

For cone mode patterns, the first two patterns are _usually_ the
following:

	/*
	!/*/

However, we instead need to handle the case where that second
line does not exist. Due to the MUSTBEDIR and NEGATIVE flags, the
second pattern becomes "/*".

Create a new option in struct pattern_list to store this "full
cone" mode. When we see the "/*" pattern with no flags, we toggle
full_cone to on. If we later see "/*" pattern with MUSTBEDIR and
NEGATIVE flags, then turn full_cone off.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
If two 'git sparse-checkout set' subcommands are launched at the
same time, the behavior can be unexpected as they compete to write
the sparse-checkout file and update the working directory.

Take a lockfile around the writes to the sparse-checkout file. In
addition, acquire this lock around the working directory update
to avoid two commands updating the working directory in different
ways.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
@derrickstolee
Copy link
Collaborator Author

Ping on this one. I adapted these commits into v3 of the sparse-checkout series which went upstream yesterday. I expect to need a re-roll due to clashes with some hashmap changes.

cc: @kewillford @wilbaker

if (given->flags & PATTERN_FLAG_NEGATIVE &&
given->flags & PATTERN_FLAG_MUSTBEDIR &&
!strcmp(given->pattern, "/*")) {
pl->full_cone = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies if we already discussed this (I'm having trouble finding it if we did), but is there somewhere that git enforces that the sparse-checkout file is "valid", or are the entries guaranteed to be in a particular order?

I ask because I'm wondering if after setting pl->full_cone = 0 here we might read an entry that sets it to 1 below (or vice-versa). Is that the expected behavior?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The order is absolutely important in the sparse-checkout file. In our typical cone-based patterns, we start with these two lines:

/*     # include all files
!/*/   # except those in folders

The order gives the meaning to the negation. The last pattern to match the file /A/file.txt is !/*/, so the sparse-checkout says "do not include this file". (If no pattern matches at all, the default is "do not include this file".)

If they are reversed:

!/*/     # do not include things in folders
/*       # include everything

then the sparse-checkout file will match everything, as the last pattern to match is a positive pattern.

So, this code is saying "If I see !/*/, then set full_cone to zero. If I see /*, then set full_cone to true."

If there are other patterns that make this file not look like a cone-based file, then use_cone_patterns is set to zero and full_cone is ignored.

Copy link
Member

@kewillford kewillford left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Questions to help me better understand.

dir.c Show resolved Hide resolved
dir.c Show resolved Hide resolved
o.head_idx = -1;
o.src_index = r->index;
o.dst_index = r->index;
o.skip_sparse_checkout = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How did you decide these were the right options for unpack_trees?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I read the code in builtin/read-tree.c and tracked what would happen in a call to git read-tree -mu HEAD.

builtin/sparse-checkout.c Show resolved Hide resolved
builtin/sparse-checkout.c Show resolved Hide resolved
Copy link
Member

@wilbaker wilbaker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, but @kewillford is more of an expert on this code so you might want to wait for his approval before merging.


cache_tree_free(&r->index->cache_tree);

repo_hold_locked_index(r, &lock_file, LOCK_DIE_ON_ERROR);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if we should hold the index.lock for the entire sparse-checkout command to prevent races? For example running a checkout after this code commits the index.lock but it runs before the sparse-checkout file was updated?

Since commands get the index.lock before modifying the index, I'm thinking the sequence should be:

  1. Get the index.lock
  2. Get the sparse-checkout.lock
  3. Update the working directory and sparse-checkout (I don't think order matters here since we have both locks)
  4. Commit the sparse-checkout
  5. Commit the index

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll give that a try, thanks!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I rearranged the code to do exactly this in 5cd306f. Thanks!

Copy link
Member

@kewillford kewillford left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Approved with the one suggestion about locking.

Comment on lines +366 to +376
/* commit index.lock file */
finish_update_data(ud);
free(ud);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the commit of the index.lock be the last thing to happen so after the commit of the sparse-checkout.lock? I'm not sure what git reads first the sparse-checkout file or the index. Not sure it matters too much either since both renames should happen rather fast. But since the commit has

  5. Commit sparse-checkout.lock.
  6. Commit index.lock.

Thought I would ask.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the sparse-checkout builtin is the only thing looking at sparse-checkout.lock, this won't affect parallel checkouts. A checkout-like command would take index.lock before reading the sparse-checkout file.

To avoid deadlocks between competing git sparse-checkout set commands, it is best to "pop" the locks in a stack.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And yes, I definitely did them in the wrong order in the code. Thanks for catching that!

There is a narrow window for race conditions while one process
updates the sparse-checkout file and another process tries to
do a checkout.

To avoid these races, follow this pattern in 'git sparse-checkout
set':

  1. Take index.lock.
  2. Take sparse-checkout.lock.
  3. Update the working directory according to in-memory patterns.
  4. Write patterns to sparse-checkout.lock.
  5. Commit sparse-checkout.lock.
  6. Commit index.lock.

This avoids the race condition, as any checkout must first take
the index.lock.

The code is organized into a new "struct update_data" so we can
hold the multiple structs and options across this time period.

Helped-by: Kevin Willford <kewillford@microsoft.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
@derrickstolee derrickstolee merged commit afc37de into microsoft:features/sparse-checkout-2.23.0 Oct 15, 2019
derrickstolee added a commit to microsoft/scalar that referenced this pull request Oct 17, 2019
…heckout feature

See microsoft/git#204 for the Git updates.

The `SparseVerb` tests require some changes because we don't get "stuck" with a sparse-checkout if the set command fails. Thanks, @jeschu1 for creating such good tests so we can demonstrate the behavior change here!
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants