Harden the sparse-checkout builtin #513
f63ccaa to 3f55c57
5a8520b to 79b6e9a
/submit
Submitted as pull.513.git.1579029962.gitgitgadget@gmail.com
@@ -1130,7 +1130,7 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
if (option_required_reference.nr || option_optional_reference.nr)
On the Git mailing list, Taylor Blau wrote (reply to this):
Hi Stolee,
On Tue, Jan 14, 2020 at 07:25:57PM +0000, Derrick Stolee via GitGitGadget wrote:
> From: Derrick Stolee <dstolee@microsoft.com>
>
> The --sparse option was added to the clone builtin in d89f09c (clone:
> add --sparse mode, 2019-11-21) and was tested with a local path clone
> in t1091-sparse-checkout-builtin.sh. However, due to a difference in
> how local paths are handled versus URLs, this mechanism does not work
> with URLs.
As we discussed off-list, both of us (as well as Peff) were able to
reproduce this issue. I think that this paragraph is a good description
of what's going on here.
> Modify the test to use a "file://" URL, which would output this error
> before the code change:
>
> Cloning into 'clone'...
> fatal: cannot change to 'file://.../repo': No such file or directory
> error: failed to initialize sparse-checkout
Nice, this should give us confidence that there won't be a regression
here in the future. I don't think that the explanation is complicated
enough to warrant a separate commit that first introduces an expected
failure, so grouping it all together in this patch seems good to me.
> These errors are due to using a "-C <path>" option to call 'git -C
> <path> sparse-checkout init' but the URL is being given instead of
> the target directory.
>
> Update that target directory to evaluate this correctly. I have also
> manually tested that https:// URLs are handled correctly as well.
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
> builtin/clone.c | 2 +-
> t/t1091-sparse-checkout-builtin.sh | 2 +-
> 2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/builtin/clone.c b/builtin/clone.c
> index 4348d962c9..2caefc44fb 100644
> --- a/builtin/clone.c
> +++ b/builtin/clone.c
> @@ -1130,7 +1130,7 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
> if (option_required_reference.nr || option_optional_reference.nr)
> setup_reference();
>
> - if (option_sparse_checkout && git_sparse_checkout_init(repo))
> + if (option_sparse_checkout && git_sparse_checkout_init(dir))
I agree that 'dir' is the right thing to use here. It's the string we
read from to print "Cloning into ...", which always displays the
directory relative to the cwd. Looking at the implementation in
'git_sparse_checkout_init', this matches my understanding, too.
> return 1;
>
> remote = remote_get(option_origin);
> diff --git a/t/t1091-sparse-checkout-builtin.sh b/t/t1091-sparse-checkout-builtin.sh
> index 37365dc668..58d9c69163 100755
> --- a/t/t1091-sparse-checkout-builtin.sh
> +++ b/t/t1091-sparse-checkout-builtin.sh
> @@ -90,7 +90,7 @@ test_expect_success 'init with existing sparse-checkout' '
> '
>
> test_expect_success 'clone --sparse' '
> - git clone --sparse repo clone &&
> + git clone --sparse "file://$(pwd)/repo" clone &&
> git -C clone sparse-checkout list >actual &&
> cat >expect <<-\EOF &&
> /*
> --
> gitgitgadget
This all looks good to me.
Acked-by: Taylor Blau <me@ttaylorr.com>
Thanks,
Taylor
On the Git mailing list, Taylor Blau wrote (reply to this):
|
On the Git mailing list, Derrick Stolee wrote (reply to this):
|
@@ -651,6 +651,13 @@ static void add_pattern_to_hashsets(struct pattern_list *pl, struct path_pattern
return;
On the Git mailing list, Jeff King wrote (reply to this):
On Tue, Jan 14, 2020 at 07:25:58PM +0000, Derrick Stolee via GitGitGadget wrote:
> From: Derrick Stolee <dstolee@microsoft.com>
>
> When core.sparseCheckoutCone is enabled, the 'git sparse-checkout set'
> command creates a restricted set of possible patterns that are used
> by a custom algorithm to quickly match those patterns.
>
> If a user manually edits the sparse-checkout file, then they could
> create patterns that do not match these expectations. The cone-mode
> matching algorithm can return incorrect results. The solution is to
> detect these incorrect patterns, warn that we do not recognize them,
> and revert to the standard algorithm.
>
> Check each pattern for the "**" substring, and revert to the old
> logic if seen. While technically a "/<dir>/**" pattern matches
> the meaning of "/<dir>/", it is not one that would be written by
> the sparse-checkout builtin in cone mode. Attempting to accept that
> pattern change complicates the logic and instead we punt and do
> not accept any instance of "**".
That all makes sense.
> diff --git a/dir.c b/dir.c
> index 22d08e61c2..f8e350dda2 100644
> --- a/dir.c
> +++ b/dir.c
> @@ -651,6 +651,13 @@ static void add_pattern_to_hashsets(struct pattern_list *pl, struct path_pattern
> return;
> }
>
> + if (strstr(given->pattern, "**")) {
> + /* Not a cone pattern. */
> + pl->use_cone_patterns = 0;
> + warning(_("unrecognized pattern: '%s'"), given->pattern);
> + goto clear_hashmaps;
> + }
The clear_hashmaps label already unsets pl->use_cone_patterns, so the
first line is redundant (the same is true of existing goto jumps, as
well, though).
I wondered whether this warning could be triggered accidentally by
somebody who just happened to add such a pattern. But we'd exit
immediately from add_pattern_to_hashsets() unless the user
has set core.sparseCheckoutCone. And if that's set, then warning is
definitely the right thing to do.
-Peff
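The check under review can be sketched in isolation as follows. This is only an illustration of the idea in the patch (reject any pattern containing "**" and let the caller fall back to the general matcher), not the code in dir.c; the function name `pattern_allows_cone_mode` is hypothetical.

```c
#include <assert.h>
#include <string.h>

// Hypothetical helper, not part of Git. Returns 1 if the pattern may stay
// on the cone-mode fast path, 0 if the caller should warn and fall back to
// the general pattern-matching algorithm.
int pattern_allows_cone_mode(const char *pattern)
{
	assert(pattern != NULL);

	// Any "**" is rejected, even in a pattern that means the same thing
	// as a plain directory pattern, because the builtin never writes it.
	return strstr(pattern, "**") == NULL;
}
```

A caller would run this on every pattern read from the sparse-checkout file and, on the first 0, disable cone matching for the whole list rather than for the single pattern.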
@@ -630,11 +630,38 @@ int pl_hashmap_cmp(const void *unused_cmp_data,
return strncmp(ee1->pattern, ee2->pattern, min_len);
On the Git mailing list, Jeff King wrote (reply to this):
On Tue, Jan 14, 2020 at 07:26:01PM +0000, Derrick Stolee via GitGitGadget wrote:
> From: Derrick Stolee <dstolee@microsoft.com>
>
> In cone mode, the sparse-checkout feature uses hashset containment
> queries to match paths. Make this algorithm respect escaped asterisk
> (*) and backslash (\) characters.
>
> Create dup_and_filter_pattern() method to convert a pattern by
> removing escape characters and dropping an optional "/*" at the end.
> This method is available in dir.h as we will use it in
> builtin/sparse-chekcout.c in a later change.
s/chekcout/checkout/
It took me a minute to understand the problem here, but I think it's: if
a path in the sparse-checkout file has "\*" in it, we'd try to match a
literal "\*" in the hash, not "*"?
But we wouldn't run into that yet because we don't properly _write_ the
escaped names until patch 8.
Is that right?
-Peff
On the Git mailing list, Derrick Stolee wrote (reply to this):
On 1/14/2020 4:21 PM, Jeff King wrote:
> On Tue, Jan 14, 2020 at 07:26:01PM +0000, Derrick Stolee via GitGitGadget wrote:
>
>> From: Derrick Stolee <dstolee@microsoft.com>
>>
>> In cone mode, the sparse-checkout feature uses hashset containment
>> queries to match paths. Make this algorithm respect escaped asterisk
>> (*) and backslash (\) characters.
>>
>> Create dup_and_filter_pattern() method to convert a pattern by
>> removing escape characters and dropping an optional "/*" at the end.
>> This method is available in dir.h as we will use it in
>> builtin/sparse-chekcout.c in a later change.
>
> s/chekcout/checkout/
Thanks.
> It took me a minute to understand the problem here, but I think it's: if
> a path in the sparse-checkout file has "\*" in it, we'd try to match a
> literal "\*" in the hash, not "*"?
Yes, the hashset would have the string "\*" instead of the string "*". This
would lead to missing directories when cone mode is enabled compared to
cone mode not being enabled.
> But we wouldn't run into that yet because we don't properly _write_ the
> escaped names until patch 8.
We wouldn't run into it when using the builtin, but also a user could
edit their sparse-checkout file manually OR figure out how to get the
"right" pattern by running 'git sparse-checkout set "my\\*dir"' (where the
escaped backslash is collapsed by the shell and Git sees "my\*dir").
Thanks,
-Stolee
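To make the escaping problem concrete, here is a minimal sketch of the filtering step the commit message describes: collapse backslash escapes and drop an optional trailing "/*" before the string goes into the hashset. It is written from that description only; Git's dup_and_filter_pattern() may differ in detail, and the name `filter_pattern_sketch` is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

// Illustrative only, written from the commit message's description.
// Returns a newly allocated copy of `pattern` with every backslash escape
// collapsed to the character it protects and one unescaped trailing "/*"
// dropped. The caller frees the result; NULL means allocation failure.
char *filter_pattern_sketch(const char *pattern)
{
	size_t len, i, out = 0;
	int last_escaped = 0;
	char *result;

	assert(pattern != NULL);
	len = strlen(pattern);
	result = malloc(len + 1);
	if (!result)
		return NULL;

	for (i = 0; i < len; i++) {
		last_escaped = 0;
		// A backslash escapes the next character: skip it, keep what follows.
		if (pattern[i] == '\\' && i + 1 < len) {
			i++;
			last_escaped = 1;
		}
		result[out++] = pattern[i];
	}
	result[out] = '\0';

	// Drop the suffix only when its '*' was a real glob, not an escaped one.
	if (out > 2 && !last_escaped &&
	    result[out - 2] == '/' && result[out - 1] == '*')
		result[out - 2] = '\0';

	return result;
}
```

With this, the file entry `/my\*dir/*` is stored as `/my*dir`, so a hashset lookup for the real directory name succeeds.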
@@ -140,6 +140,22 @@ static int update_working_directory(struct pattern_list *pl)
return result;
On the Git mailing list, Jeff King wrote (reply to this):
On Tue, Jan 14, 2020 at 07:26:02PM +0000, Derrick Stolee via GitGitGadget wrote:
> From: Derrick Stolee <dstolee@microsoft.com>
>
> If a user somehow creates a directory with an asterisk (*) or backslash
> (\), then the "git sparse-checkout set" command will struggle to provide
> the correct pattern in the sparse-checkout file. When not in cone mode,
> the provided pattern is written directly into the sparse-checkout file.
> However, in cone mode we expect a list of paths to directories and then
> we convert those into patterns.
>
> Even more specifically, the goal is to always allow the following from
> the root of a repo:
>
> git ls-tree --name-only -d HEAD | git sparse-checkout set --stdin
>
> The ls-tree command provides directory names with an unescaped asterisk.
> It also quotes the directories that contain an escaped backslash. We
> must remove these quotes, then keep the escaped backslashes.
Do we need to document these rules somewhere? Naively I'd expect
"--stdin" to take in literal pathnames. But of course it can't represent
a path with a newline. So perhaps it makes sense to take quoted names by
default, and allow literal NUL-separated input with "-z" if anybody
wants it.
-Peff
On the Git mailing list, Derrick Stolee wrote (reply to this):
On 1/14/2020 4:25 PM, Jeff King wrote:
> On Tue, Jan 14, 2020 at 07:26:02PM +0000, Derrick Stolee via GitGitGadget wrote:
>
>> From: Derrick Stolee <dstolee@microsoft.com>
>>
>> If a user somehow creates a directory with an asterisk (*) or backslash
>> (\), then the "git sparse-checkout set" command will struggle to provide
>> the correct pattern in the sparse-checkout file. When not in cone mode,
>> the provided pattern is written directly into the sparse-checkout file.
>> However, in cone mode we expect a list of paths to directories and then
>> we convert those into patterns.
>>
>> Even more specifically, the goal is to always allow the following from
>> the root of a repo:
>>
>> git ls-tree --name-only -d HEAD | git sparse-checkout set --stdin
>>
>> The ls-tree command provides directory names with an unescaped asterisk.
>> It also quotes the directories that contain an escaped backslash. We
>> must remove these quotes, then keep the escaped backslashes.
>
> Do we need to document these rules somewhere? Naively I'd expect
> "--stdin" to take in literal pathnames. But of course it can't represent
> a path with a newline. So perhaps it makes sense to take quoted names by
> default, and allow literal NUL-separated input with "-z" if anybody
> wants it.
This is worth thinking about the right way to describe the rules:
1. You don't _need_ quotes. They happen to come along for the ride in
'git ls-tree' so it doesn't mess up shell scripts that iterate on
those entries. At least, that's why I think they are quoted.
2. If you use quotes, the first layer of quotes will be removed.
How much of this needs to be documented explicitly, or how much should
we say "The input format matches what we would expect from 'git ls-tree
--name-only'"?
Thanks,
-Stolee
On the Git mailing list, Jeff King wrote (reply to this):
On Tue, Jan 14, 2020 at 05:11:03PM -0500, Derrick Stolee wrote:
> > Do we need to document these rules somewhere? Naively I'd expect
> > "--stdin" to take in literal pathnames. But of course it can't represent
> > a path with a newline. So perhaps it makes sense to take quoted names by
> > default, and allow literal NUL-separated input with "-z" if anybody
> > wants it.
>
> This is worth thinking about the right way to describe the rules:
>
> 1. You don't _need_ quotes. They happen to come along for the ride in
> 'git ls-tree' so it doesn't mess up shell scripts that iterate on
> those entries. At least, that's why I think they are quoted.
It's not just shell scripts. Without quoting, the syntax becomes
ambiguous (e.g., imagine a file with a newline in it). So most Git
output that shows a filename will quote it if necessary, unless
NUL separators are being used.
> 2. If you use quotes, the first layer of quotes will be removed.
I take this to mean that anything starting with a double-quote will have
the outer layer removed, and backslash escapes inside expanded. And
anything without a starting double quote (even if it has internal
backslash escapes!) will be taken literally.
That would match how things like "update-index --index-info" work.
As far as implementation, I know you're trying to keep some of the
escaping, but I think it might make more sense to use
unquote_c_style() to parse the input (see update-index's use for some
prior art), and then re-quote as necessary to put things into the
sparse-checkout file (I guess quoting more than just quote_c_style()
would do, since you need to quote glob metacharacters like '*' and
probably "!"). But as much as possible, I think you'd want literal
strings inside the program, and just quoting/unquoting at the edges.
> How much of this needs to be documented explicitly, or how much should
> we say "The input format matches what we would expect from 'git ls-tree
> --name-only'"?
I think it's fine to say that, and maybe call attention to the quoting.
Like:
The input format matches the output of `git ls-tree --name-only`. This
includes interpreting pathnames that begin with a double quote (") as
C-style quoted strings.
Disappointingly, update-index does not seem to explain the rules
anywhere. fast-import does cover it. Maybe it's something that ought to
be hoisted out into gitcli(7) or similar (or maybe it has been and I
just can't find it).
-Peff
On the Git mailing list, Derrick Stolee wrote (reply to this):
On 1/14/2020 5:48 PM, Jeff King wrote:
> On Tue, Jan 14, 2020 at 05:11:03PM -0500, Derrick Stolee wrote:
>
>>> Do we need to document these rules somewhere? Naively I'd expect
>>> "--stdin" to take in literal pathnames. But of course it can't represent
>>> a path with a newline. So perhaps it makes sense to take quoted names by
>>> default, and allow literal NUL-separated input with "-z" if anybody
>>> wants it.
>>
>> This is worth thinking about the right way to describe the rules:
>>
>> 1. You don't _need_ quotes. They happen to come along for the ride in
>> 'git ls-tree' so it doesn't mess up shell scripts that iterate on
>> those entries. At least, that's why I think they are quoted.
>
> It's not just shell scripts. Without quoting, the syntax becomes
> ambiguous (e.g., imagine a file with a newline in it). So most Git
> output that shows a filename will quote it if necessary, unless
> NUL separators are being used.
Good to know.
>> 2. If you use quotes, the first layer of quotes will be removed.
>
> I take this to mean that anything starting with a double-quote will have
> the outer layer removed, and backslash escapes inside expanded. And
> anything without a starting double quote (even if it has internal
> backslash escapes!) will be taken literally.
Hm. Perhaps you are right! The ls-tree output for the test example
is:
deep
folder1
folder2
"zbad\\dir"
zdoes*exist
so the "zdoes*exist" value is not escaped. I believe the current
logic does something extra: consider supplying this input to
'git sparse-checkout set --stdin':
deep
folder1
folder2
"zbad\\dir"
zdoes\*exist
then should we un-escape "\*" to "*"? Or is this not a valid input
because a backslash should have been quoted into C-style quotes?
The behavior in the current series allows this output that would
never be written by "git ls-tree".
> That would match how things like "update-index --index-info" work.
>
> As far as implementation, I know you're trying to keep some of the
> escaping, but I think it might make more sense to use
> unquote_c_style() to parse the input (see update-index's use for some
> prior art), and then re-quote as necessary to put things into the
> sparse-checkout file (I guess quoting more than just quote_c_style()
> would do, since you need to quote glob metacharacters like '*' and
> probably "!"). But as much as possible, I think you'd want literal
> strings inside the program, and just quoting/unquoting at the edges.
I was playing around with this, and I think that quote_c_style() is
necessary for the output, but we have a strange in-memory situation
for the other escaping: we both fill the hashsets with the un-escaped
data and fill the pattern list with the escaped patterns.
I'll add a commit with the quote_c_style() calls during the 'list'
subcommand.
>> How much of this needs to be documented explicitly, or how much should
>> we say "The input format matches what we would expect from 'git ls-tree
>> --name-only'"?
>
> I think it's fine to say that, and maybe call attention to the quoting.
> Like:
>
> The input format matches the output of `git ls-tree --name-only`. This
> includes interpreting pathnames that begin with a double quote (") as
> C-style quoted strings.
>
> Disappointingly, update-index does not seem to explain the rules
> anywhere. fast-import does cover it. Maybe it's something that ought to
> be hoisted out into gitcli(7) or similar (or maybe it has been and I
> just can't find it).
I'll start the process by using your recommended language. I noticed
also that the 'set' command doesn't actually document what happens
when in cone mode, so I will correct that, too.
Thanks,
-Stolee
On the Git mailing list, Jeff King wrote (reply to this):
On Fri, Jan 24, 2020 at 04:10:21PM -0500, Derrick Stolee wrote:
> Hm. Perhaps you are right! The ls-tree output for the test example
> is:
>
> deep
> folder1
> folder2
> "zbad\\dir"
> zdoes*exist
>
> so the "zdoes*exist" value is not escaped. I believe the current
> logic does something extra: consider supplying this input to
> 'git sparse-checkout set --stdin':
>
> deep
> folder1
> folder2
> "zbad\\dir"
> zdoes\*exist
>
> then should we un-escape "\*" to "*"? Or is this not a valid input
> because a backslash should have been quoted into C-style quotes?
I'd think we should not un-escape anything, because we weren't told that
this was a C-style quoted string by the presence of an opening
double-quote. And that's how, say, update-index behaves:
  $ blob=$(echo foo | git hash-object -w --stdin)
  $ printf '100644 %s\t%s\n' \
      $blob 'just*asterisk' \
      $blob 'backslash\without\quotes' \
      $blob '"backslash\\with\\quotes"' |
    git update-index --index-info
which results in:
  $ git ls-files
  "backslash\\with\\quotes"
  "backslash\\without\\quotes"
  just*asterisk
  [same, but without quoting]
  $ git ls-files -z | tr '\0' '\n'
  backslash\with\quotes
  backslash\without\quotes
  just*asterisk
> The behavior in the current series allows this output that would
> never be written by "git ls-tree".
Yes, I think we'd never write that, because ls-tree would quote anything
with a backslash in it, even though it's not strictly necessary. But it
would be valid input to specify a file that has backslashes but not
double-quotes, and I think sparse-checkout should be changed to match
update-index here.
> I was playing around with this, and I think that quote_c_style() is
> necessary for the output, but we have a strange in-memory situation
> for the other escaping: we both fill the hashsets with the un-escaped
> data and fill the pattern list with the escaped patterns.
Yeah, but I think that the syntactic escaping on input might not have
identical rules to the escaping needed for the patterns.
So it makes sense to me to handle input as a separate mechanism, get a
pristine copy of what the user was trying to communicate to us, and then
re-escape whatever we need to put into the pattern list.
And ultimately the flow would be something like:
  - read input
    - if argument is from command-line, use it verbatim
    - else if reading stdin with "-z", use it verbatim
    - else if line starts with double-quote, unquote_c_style()
    - else use line verbatim
    - the result is a single pristine filename
  - fill hashset with pristine filenames
  - generate pattern list to write to sparse file, escaping filenames as
    necessary according to sparse-pattern rules
Obviously you don't have a "-z" yet, but I think it's something we'd
probably want in the long run. And anything coming from the command-line
shouldn't need quoting to get it to us either (and so we'd need to
escape before writing to the sparse file).
-Peff
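The stdin branch of the flow above can be sketched as a single function. This is an assumption-laden illustration: `pristine_name_from_line` is a hypothetical name, and its unquoting is a deliberately reduced stand-in for Git's unquote_c_style() that understands only `\\`, `\"`, `\n` and `\t` (no octal escapes). It shows the rule being proposed: a line is unquoted only if it starts with a double quote, and is otherwise taken verbatim, backslashes included.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical sketch of the proposed input rule, not Git's implementation.
// Returns a newly allocated pristine filename, or NULL if a quoted line is
// malformed (or uses an escape this sketch does not handle) or if
// allocation fails. The caller frees the result.
char *pristine_name_from_line(const char *line)
{
	size_t len, i, out = 0;
	char *name;

	assert(line != NULL);
	len = strlen(line);
	name = malloc(len + 1);
	if (!name)
		return NULL;

	if (line[0] != '"') {
		// No opening double quote: use the line verbatim.
		memcpy(name, line, len + 1);
		return name;
	}

	for (i = 1; i < len; i++) {
		char c = line[i];

		if (c == '"') {
			// The closing quote must be the last character.
			if (i + 1 != len)
				break;
			name[out] = '\0';
			return name;
		}
		if (c == '\\') {
			if (++i >= len)
				break;
			switch (line[i]) {
			case 'n': c = '\n'; break;
			case 't': c = '\t'; break;
			case '\\': c = '\\'; break;
			case '"': c = '"'; break;
			default:
				// Octal and other escapes are left out of this sketch.
				free(name);
				return NULL;
			}
		}
		name[out++] = c;
	}

	// Unterminated or otherwise malformed quoted string.
	free(name);
	return NULL;
}
```

Under this rule the ls-tree line `"zbad\\dir"` yields the name `zbad\dir`, while the unquoted line `zdoes\*exist` is kept as is, which matches the update-index behavior shown earlier in the message.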
On the Git mailing list, Derrick Stolee wrote (reply to this):
On 1/24/2020 4:42 PM, Jeff King wrote:
> And ultimately the flow would be something like:
>
>   - read input
>     - if argument is from command-line, use it verbatim
>     - else if reading stdin with "-z", use it verbatim
>     - else if line starts with double-quote, unquote_c_style()
>     - else use line verbatim
>     - the result is a single pristine filename
>   - fill hashset with pristine filenames
>   - generate pattern list to write to sparse file, escaping filenames as
>     necessary according to sparse-pattern rules
>
> Obviously you don't have a "-z" yet, but I think it's something we'd
> probably want in the long run. And anything coming from the command-line
> shouldn't need quoting to get it to us either (and so we'd need to
> escape before writing to the sparse file).
This recommendation came async with my v2, so I'll follow shortly with
a v3 that uses this flow. I have something that I think works, after
slightly adapting my tests, but now I need to make sure that all the
patches still make sense and build cleanly.
Thanks,
-Stolee
On the Git mailing list, Jeff King wrote (reply to this):
|
On the Git mailing list, Junio C Hamano wrote (reply to this):
|
On the Git mailing list, Derrick Stolee wrote (reply to this):
|
@@ -12,6 +12,13 @@ list_files() {
(cd "$1" && printf '%s\n' *)
On the Git mailing list, Junio C Hamano wrote (reply to this):
"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> From: Derrick Stolee <dstolee@microsoft.com>
>
> When testing the sparse-checkout feature, we need to compare the
> contents of the working-directory against some expected output.
> Using here-docs was useful in the beginning, but became repetitive
> as the test script grew.
>
> Create a check_files helper to make the tests simpler and easier
> to extend. It also reduces instances of bad here-doc whitespace.
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
> t/t1091-sparse-checkout-builtin.sh | 215 ++++++++++-------------------
> 1 file changed, 71 insertions(+), 144 deletions(-)
>
> diff --git a/t/t1091-sparse-checkout-builtin.sh b/t/t1091-sparse-checkout-builtin.sh
> index ff7f8f7a1f..20caefe155 100755
> --- a/t/t1091-sparse-checkout-builtin.sh
> +++ b/t/t1091-sparse-checkout-builtin.sh
> @@ -12,6 +12,13 @@ list_files() {
> (cd "$1" && printf '%s\n' *)
> }
>
> +check_files() {
> + DIR=$1
> + printf "%s\n" $2 >expect &&
> + list_files $DIR >actual &&
It is unclear if the script is being deliberate or sloppy.
It turns out that not quoting $2 is deliberate (i.e. it wants to
pass more than one word in $2, have them split at $IFS and show
each of them on a separate line), while not quoting $DIR
is simply sloppy.
And it is totally unnecessary to confuse readers like this.
Unless you plan to extend this helper further, I think this would be
much less burdensome to the readers:
	check_files () {
		list_files "$1" >actual &&
		shift &&
		printf "%s\n" "$@" >expect &&
		test_cmp expect actual
	}
This ...
> test_cmp expect repo/.git/info/sparse-checkout &&
> - list_files repo >dir &&
> - cat >expect <<-EOF &&
> - a
> - folder1
> - folder2
> - EOF
> - test_cmp expect dir
> + check_files repo "a folder1 folder2"
... is a kind of change that the log message advertises, which is a
very nice rewrite.
And ...
> test_expect_success 'clone --sparse' '
> git clone --sparse repo clone &&
> git -C clone sparse-checkout list >actual &&
> - cat >expect <<-EOF &&
> - /*
> - !/*/
> + cat >expect <<-\EOF &&
> + /*
> + !/*/
> EOF
... this is a style-fix that is another nice rewrite but in a
different category. I wonder if they should be done in separate
commits.
Other than that, makes sense.
Thanks.
@@ -199,6 +199,10 @@ static int write_patterns_and_update(struct pattern_list *pl)
int result;
On the Git mailing list, Junio C Hamano wrote (reply to this):
"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> From: Derrick Stolee <dstolee@microsoft.com>
>
> The 'git init' command creates the ".git/info" directory and fills it
> with some default files. However, 'git worktree add' does not create
> the info directory for that worktree. This causes a problem when running
> "git sparse-checkout init" inside a worktree. While care was taken to
> allow the sparse-checkout config to be specific to a worktree, this
> initialization was untested.
>
> Safely create the leading directories for the sparse-checkout file. This
> is the safest thing to do even without worktrees, as a user could delete
> their ".git/info" directory and expect Git to recover safely.
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
> builtin/sparse-checkout.c | 4 ++++
> t/t1091-sparse-checkout-builtin.sh | 10 ++++++++++
> 2 files changed, 14 insertions(+)
>
> diff --git a/builtin/sparse-checkout.c b/builtin/sparse-checkout.c
> index b3bed891cb..3cee8ab46e 100644
> --- a/builtin/sparse-checkout.c
> +++ b/builtin/sparse-checkout.c
> @@ -199,6 +199,10 @@ static int write_patterns_and_update(struct pattern_list *pl)
> int result;
>
> sparse_filename = get_sparse_checkout_filename();
> +
> + if (safe_create_leading_directories(sparse_filename))
> + die(_("failed to create directory for sparse-checkout file"));
> +
The use of safe_create_leading_directories() here, which uses
adjust_shared_perm(), is the right thing to do.
Looks good.
> diff --git a/t/t1091-sparse-checkout-builtin.sh b/t/t1091-sparse-checkout-builtin.sh
> index 20caefe155..37365dc668 100755
> --- a/t/t1091-sparse-checkout-builtin.sh
> +++ b/t/t1091-sparse-checkout-builtin.sh
> @@ -295,4 +295,14 @@ test_expect_success 'interaction with submodules' '
> check_files super/modules/child "a deep folder1 folder2"
> '
>
> +test_expect_success 'different sparse-checkouts with worktrees' '
> + git -C repo worktree add --detach ../worktree &&
> + check_files worktree "a deep folder1 folder2" &&
> + git -C worktree sparse-checkout init --cone &&
> + git -C repo sparse-checkout set folder1 &&
> + git -C worktree sparse-checkout set deep/deeper1 &&
> + check_files repo "a folder1" &&
> + check_files worktree "a deep"
> +'
> +
> test_done
ed6fed6 to df000dc
When testing the sparse-checkout feature, we need to compare the contents of the working-directory against some expected output. Using here-docs was useful in the beginning, but became repetitive as the test script grew. Create a check_files helper to make the tests simpler and easier to extend. It also reduces instances of bad here-doc whitespace. Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
t1091-sparse-checkout-builtin.sh uses here-docs to populate the expected contents of the sparse-checkout file. These do not use shell interpolation, so use "-\EOF" instead of "-EOF". Also use proper tabbing. Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
The 'git init' command creates the ".git/info" directory and fills it with some default files. However, 'git worktree add' does not create the info directory for that worktree. This causes a problem when running "git sparse-checkout init" inside a worktree. While care was taken to allow the sparse-checkout config to be specific to a worktree, this initialization was untested. Safely create the leading directories for the sparse-checkout file. This is the safest thing to do even without worktrees, as a user could delete their ".git/info" directory and expect Git to recover safely. Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
The --sparse option was added to the clone builtin in d89f09c (clone: add --sparse mode, 2019-11-21) and was tested with a local path clone in t1091-sparse-checkout-builtin.sh. However, due to a difference in how local paths are handled versus URLs, this mechanism does not work with URLs. Modify the test to use a "file://" URL, which would output this error before the code change: Cloning into 'clone'... fatal: cannot change to 'file://.../repo': No such file or directory error: failed to initialize sparse-checkout These errors are due to using a "-C <path>" option to call 'git -C <path> sparse-checkout init' but the URL is being given instead of the target directory. Update that target directory to evaluate this correctly. I have also manually tested that https:// URLs are handled correctly as well. Acked-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
When core.sparseCheckoutCone is enabled, the 'git sparse-checkout set' command creates a restricted set of possible patterns that are used by a custom algorithm to quickly match those patterns. If a user manually edits the sparse-checkout file, then they could create patterns that do not match these expectations. The cone-mode matching algorithm can return incorrect results. The solution is to detect these incorrect patterns, warn that we do not recognize them, and revert to the standard algorithm. Check each pattern for the "**" substring, and revert to the old logic if seen. While technically a "/<dir>/**" pattern matches the meaning of "/<dir>/", it is not one that would be written by the sparse-checkout builtin in cone mode. Attempting to accept that pattern change complicates the logic and instead we punt and do not accept any instance of "**". Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
In cone mode, the shortest pattern the sparse-checkout command will write into the sparse-checkout file is "/*". This is handled carefully in add_pattern_to_hashsets(), so warn if any other pattern is this short. This will assist future pattern checks by allowing us to assume there are at least three characters in the pattern.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
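A sketch of the length rule, under the assumption that "this short" means two characters or fewer. The name `too_short_for_cone` is hypothetical, and the real check lives inside add_pattern_to_hashsets() rather than in a standalone function.

```c
#include <string.h>

/*
 * Hypothetical check, not Git's actual code: return 1 if `pattern` is too
 * short to be a valid cone-mode pattern. "/*" is the one allowed exception.
 */
static int too_short_for_cone(const char *pattern)
{
	if (!strcmp(pattern, "/*"))
		return 0;
	return strlen(pattern) <= 2;
}
```

Once this check has passed for anything other than "/*", later checks can index the first three characters of the pattern without testing the length again.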
df000dc
to
c27a17a
/submit
Submitted as pull.513.v2.git.1579900782.gitgitgadget@gmail.com
This branch is now known as
This patch series was integrated into pu via git@cb30b3d.
This patch series was integrated into pu via git@5d3bf0a.
This patch series was integrated into pu via git@a0268fb.
This patch series was integrated into pu via git@8d3f320.
The sparse-checkout patterns allow special globs according to fnmatch(3). When writing cone-mode patterns for paths containing these characters, they must be escaped.

Use is_glob_special() to check which characters must be escaped this way, and add a path to the tests that contains all glob characters at once. Note that ']' is not special, since the initial bracket '[' is escaped.

Reported-by: Jeff King <peff@peff.net>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
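The escaping step can be sketched as follows. The helper names `is_special` and `escape_glob` are hypothetical, and the set of special characters ('*', '?', '[' and '\') is an assumption about what is_glob_special() accepts; the sketch is not the code from this series.

```c
#include <stddef.h>
#include <string.h>

/* Assumed set of glob-special characters: '*', '?', '[' and '\'. */
static int is_special(char c)
{
	return c == '*' || c == '?' || c == '[' || c == '\\';
}

/*
 * Hypothetical helper: copy `in` to `out`, prefixing each special
 * character with a backslash. Returns the escaped length, or -1 if
 * `out` (of size `outsz`) cannot hold the result plus its terminator.
 */
static long escape_glob(const char *in, char *out, size_t outsz)
{
	size_t n = 0;

	if (outsz == 0)
		return -1;
	for (; *in; in++) {
		size_t need = is_special(*in) ? 2 : 1;

		/* Keep one byte in reserve for the trailing NUL. */
		if (n + need >= outsz)
			return -1;
		if (need == 2)
			out[n++] = '\\';
		out[n++] = *in;
	}
	out[n] = '\0';
	return (long)n;
}
```

With this sketch, a directory named `dir[1]` is written as `dir\[1]`: the closing ']' is left alone because, once '[' is escaped, it no longer closes a bracket expression.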
The existing documentation does not clarify how the 'set' subcommand changes when core.sparseCheckoutCone is enabled. Correct this by changing some language around the "A/B/C" example. Also include a description of the input format matching the output of 'git ls-tree --name-only'.

Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
10b380c
to
3dd8f97
The special "cone mode" of the sparse-checkout feature is intended to match exactly the paths that the same sparse-checkout file matches when cone mode is disabled.

When a file path is given to "git sparse-checkout set" in cone mode, cone mode improperly matches the file as a recursive path. The code that sets the skip-worktree bits did not expect the MATCHED_RECURSIVE response for files, and hence these files were left out of the matched cone.

Fix this bug by checking for MATCHED_RECURSIVE in addition to MATCHED, and add a test that prevents regression.

Reported-by: Finn Bryant <finnbryant@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
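The shape of the fix can be shown with a small sketch. MATCHED and MATCHED_RECURSIVE are the names used in the commit message; the enum name, the NOT_MATCHED value, the numeric values, and the function `path_is_in_cone` are assumptions for illustration only.

```c
/*
 * Illustrative result codes. MATCHED and MATCHED_RECURSIVE come from the
 * text above; NOT_MATCHED and the numeric values are assumed.
 */
enum match_result {
	NOT_MATCHED = 0,
	MATCHED = 1,
	MATCHED_RECURSIVE = 2
};

/*
 * Hypothetical predicate: a file belongs to the cone if the pattern match
 * returned either positive result. Testing only for MATCHED is the kind of
 * check that leaves recursively matched files out.
 */
static int path_is_in_cone(enum match_result r)
{
	return r == MATCHED || r == MATCHED_RECURSIVE;
}
```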
bf21156
to
5e9fcce
This patch series was integrated into pu via git@b652833.
/submit
Submitted as pull.513.v4.git.1580501775.gitgitgadget@gmail.com
On the Git mailing list, Elijah Newren wrote (reply to this):
This patch series was integrated into pu via git@2ca6622.
On the Git mailing list, Derrick Stolee wrote (reply to this):
This patch series was integrated into pu via git@96489ed.
This patch series was integrated into pu via git@1257ff8.
This patch series was integrated into next via git@56c09b9.
On the Git mailing list, Taylor Blau wrote (reply to this):
On the Git mailing list, Junio C Hamano wrote (reply to this):
This patch series was integrated into pu via git@c37891f.
This patch series was integrated into pu via git@8720e4d.
This patch series was integrated into pu via git@433b8aa.
This patch series was integrated into next via git@433b8aa.
This patch series was integrated into master via git@433b8aa.
Closed via 433b8aa.
This series is based on ds/sparse-list-in-cone-mode.
This series attempts to clean up some rough edges in the sparse-checkout feature, especially around the cone mode.
Unfortunately, after the v2.25.0 release, we noticed an issue with the "git clone --sparse" option when using a URL instead of a local path. This is fixed and properly tested here.
Also, let's improve Git's response to these more complicated scenarios:
Updates in V2:
Update in V3:
[1] https://lore.kernel.org/git/062301d5d0bc$c3e17760$4ba46620$@Frontier.com/
Thanks,
-Stolee
Cc: me@ttaylorr.com, peff@peff.net, newren@gmail.com