Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sparse-checkout: modify 'git add', 'git rm', and 'git mv' behavior #1018

Closed
56 changes: 50 additions & 6 deletions dir.c
Original file line number Diff line number Diff line change
Expand Up @@ -1294,7 +1294,7 @@ int match_pathname(const char *pathname, int pathlen,
* then our prefix match is all we need; we
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Glen Choo wrote (reply to this):


This patch changes the behavior of .gitignore such that directories are
now matched by prefix instead of matching exactly.

The failure that we observed is something like the following:

In "a/.gitignore", we have the pattern "git/". We should expect that
"a/git/foo" to be ignored because "git/" should be matched exactly.
However, "a/git-foo/bar" is also ignored because "git-foo" matches the
prefix.

I'll prepare a test case for this as soon as I figure out how to write
it..

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

Glen Choo <chooglen@google.com> writes:

> This patch changes the behavior of .gitignore such that directories are
> now matched by prefix instead of matching exactly.
>
> The failure that we observed is something like the following:
>
> In "a/.gitignore", we have the pattern "git/". We should expect that
> "a/git/foo" to be ignored because "git/" should be matched exactly.
> However, "a/git-foo/bar" is also ignored because "git-foo" matches the
> prefix.
>
> I'll prepare a test case for this as soon as I figure out how to write
> it..

FWIW, reverting this commit (and nothing else) from 'master' does not
seem to break any test.  That does not necessarily mean much (it may
be indicating a gap in the test coverage).

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 11/1/2021 8:34 PM, Junio C Hamano wrote:
> Glen Choo <chooglen@google.com> writes:
> 
>> This patch changes the behavior of .gitignore such that directories are
>> now matched by prefix instead of matching exactly.

Thank you for pointing out an unintended consequence.

>> The failure that we observed is something like the following:
>>
>> In "a/.gitignore", we have the pattern "git/". We should expect that
>> "a/git/foo" to be ignored because "git/" should be matched exactly.
>> However, "a/git-foo/bar" is also ignored because "git-foo" matches the
>> prefix.
>>
>> I'll prepare a test case for this as soon as I figure out how to write
>> it..
> 
> FWIW, reverting this commit (and nothing else) from 'master' does not
> seem to break any test.  That does not necessarily mean much (it may
> be indicating a gap in the test coverage).

Hm. I definitely seemed confident in my commit message that this change
was important for a test I was introducing at some point. Let me revert
it in microsoft/git and see if our Scalar functional tests find an
issue.

In the meantime, I'll try to create a Git test that demonstrates a
problem one way or another.

Thanks,
-Stolee

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 11/2/2021 9:42 AM, Derrick Stolee wrote:
> On 11/1/2021 8:34 PM, Junio C Hamano wrote:
>> Glen Choo <chooglen@google.com> writes:
>>
>>> This patch changes the behavior of .gitignore such that directories are
>>> now matched by prefix instead of matching exactly.
> 
> Thank you for pointing out an unintended consequence.
> 
>>> The failure that we observed is something like the following:
>>>
>>> In "a/.gitignore", we have the pattern "git/". We should expect that
>>> "a/git/foo" to be ignored because "git/" should be matched exactly.
>>> However, "a/git-foo/bar" is also ignored because "git-foo" matches the
>>> prefix.
>>>
>>> I'll prepare a test case for this as soon as I figure out how to write
>>> it..
...
> In the meantime, I'll try to create a Git test that demonstrates a
> problem one way or another.

I created a test, but had some trouble reproducing it due to some
subtleties higher in the call stack. Here is a patch that reverts
the change and adds some tests.

The Scalar functional tests passed with the revert, so the original
patch was worthless to begin with. I don't recall what motivated the
change, but clearly it was a mistake. Sorry.

---- >8 ----

From d1cfc8efeab015273bfebd6cd93465e6f38dc40f Mon Sep 17 00:00:00 2001
From: Derrick Stolee <dstolee@microsoft.com>
Date: Tue, 2 Nov 2021 10:40:06 -0400
Subject: [PATCH] dir: fix directory-matching bug

This reverts the change from ed49584 (dir: fix pattern matching on dirs,
2021-09-24), which claimed to fix a directory-matching problem without a
test case. It turns out to _create_ a bug, but it is a bit subtle.

The bug would have been revealed by the first of two tests being added to
t0008-ignores.sh. The first uses a pattern "/git/" inside the a/.gitignores
file, which matches against 'a/git/foo' but not 'a/git-foo/bar'. This test
would fail before the revert.

The second test shows what happens if the test instead uses a pattern "git/"
and this test passes both before and after the revert.

The difference in these two cases are due to how
last_matching_pattern_from_list() checks patterns both if they have the
PATTERN_FLAG_MUSTBEDIR and PATTERN_FLAG_NODIR flags. In the case of "git/",
the PATTERN_FLAG_NODIR is also provided, making the change in behavior in
match_pathname() not affect the end result of
last_matching_pattern_from_list().

Reported-by: Glen Choo <chooglen@google.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 dir.c              |  2 +-
 t/t0008-ignores.sh | 26 ++++++++++++++++++++++++++
 2 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/dir.c b/dir.c
index c6d7a8647b9..94489298f4c 100644
--- a/dir.c
+++ b/dir.c
@@ -1294,7 +1294,7 @@ int match_pathname(const char *pathname, int pathlen,
 		 * then our prefix match is all we need; we
 		 * do not need to call fnmatch at all.
 		 */
-		if (!patternlen && (!namelen || (flags & PATTERN_FLAG_MUSTBEDIR)))
+		if (!patternlen && !namelen)
 			return 1;
 	}
 
diff --git a/t/t0008-ignores.sh b/t/t0008-ignores.sh
index 532637de882..1889cfc60e0 100755
--- a/t/t0008-ignores.sh
+++ b/t/t0008-ignores.sh
@@ -803,6 +803,32 @@ test_expect_success 'existing directory and file' '
 	grep top-level-dir actual
 '
 
+test_expect_success 'exact prefix matching (with root)' '
+	test_when_finished rm -r a &&
+	mkdir -p a/git a/git-foo &&
+	touch a/git/foo a/git-foo/bar &&
+	echo /git/ >a/.gitignore &&
+	git check-ignore a/git a/git/foo a/git-foo a/git-foo/bar >actual &&
+	cat >expect <<-\EOF &&
+	a/git
+	a/git/foo
+	EOF
+	test_cmp expect actual
+'
+
+test_expect_success 'exact prefix matching (without root)' '
+	test_when_finished rm -r a &&
+	mkdir -p a/git a/git-foo &&
+	touch a/git/foo a/git-foo/bar &&
+	echo git/ >a/.gitignore &&
+	git check-ignore a/git a/git/foo a/git-foo a/git-foo/bar >actual &&
+	cat >expect <<-\EOF &&
+	a/git
+	a/git/foo
+	EOF
+	test_cmp expect actual
+'
+
 ############################################################################
 #
 # test whitespace handling
-- 
2.34.0.vfs.0.0.rc0.dirty

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Ævar Arnfjörð Bjarmason wrote (reply to this):


On Tue, Nov 02 2021, Derrick Stolee wrote:

> On 11/2/2021 9:42 AM, Derrick Stolee wrote:
>> On 11/1/2021 8:34 PM, Junio C Hamano wrote:
>>> Glen Choo <chooglen@google.com> writes:
>>>
>>>> This patch changes the behavior of .gitignore such that directories are
>>>> now matched by prefix instead of matching exactly.
>> 
>> Thank you for pointing out an unintended consequence.
>> 
>>>> The failure that we observed is something like the following:
>>>>
>>>> In "a/.gitignore", we have the pattern "git/". We should expect that
>>>> "a/git/foo" to be ignored because "git/" should be matched exactly.
>>>> However, "a/git-foo/bar" is also ignored because "git-foo" matches the
>>>> prefix.
>>>>
>>>> I'll prepare a test case for this as soon as I figure out how to write
>>>> it..
> ...
>> In the meantime, I'll try to create a Git test that demonstrates a
>> problem one way or another.
>
> I created a test, but had some trouble reproducing it due to some
> subtleties higher in the call stack. Here is a patch that reverts
> the change and adds some tests.
>
> The Scalar functional tests passed with the revert, so the original
> patch was worthless to begin with. I don't recall what motivated the
> change, but clearly it was a mistake. Sorry.
>
> ---- >8 ----
>
> From d1cfc8efeab015273bfebd6cd93465e6f38dc40f Mon Sep 17 00:00:00 2001
> From: Derrick Stolee <dstolee@microsoft.com>
> Date: Tue, 2 Nov 2021 10:40:06 -0400
> Subject: [PATCH] dir: fix directory-matching bug
>
> This reverts the change from ed49584 (dir: fix pattern matching on dirs,
> 2021-09-24), which claimed to fix a directory-matching problem without a
> test case. It turns out to _create_ a bug, but it is a bit subtle.
>
> The bug would have been revealed by the first of two tests being added to
> t0008-ignores.sh. The first uses a pattern "/git/" inside the a/.gitignores
> file, which matches against 'a/git/foo' but not 'a/git-foo/bar'. This test
> would fail before the revert.
>
> The second test shows what happens if the test instead uses a pattern "git/"
> and this test passes both before and after the revert.
>
> The difference in these two cases are due to how
> last_matching_pattern_from_list() checks patterns both if they have the
> PATTERN_FLAG_MUSTBEDIR and PATTERN_FLAG_NODIR flags. In the case of "git/",
> the PATTERN_FLAG_NODIR is also provided, making the change in behavior in
> match_pathname() not affect the end result of
> last_matching_pattern_from_list().
>
> Reported-by: Glen Choo <chooglen@google.com>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  dir.c              |  2 +-
>  t/t0008-ignores.sh | 26 ++++++++++++++++++++++++++
>  2 files changed, 27 insertions(+), 1 deletion(-)
>
> diff --git a/dir.c b/dir.c
> index c6d7a8647b9..94489298f4c 100644
> --- a/dir.c
> +++ b/dir.c
> @@ -1294,7 +1294,7 @@ int match_pathname(const char *pathname, int pathlen,
>  		 * then our prefix match is all we need; we
>  		 * do not need to call fnmatch at all.
>  		 */
> -		if (!patternlen && (!namelen || (flags & PATTERN_FLAG_MUSTBEDIR)))
> +		if (!patternlen && !namelen)
>  			return 1;
>  	}
>  
> diff --git a/t/t0008-ignores.sh b/t/t0008-ignores.sh
> index 532637de882..1889cfc60e0 100755
> --- a/t/t0008-ignores.sh
> +++ b/t/t0008-ignores.sh
> @@ -803,6 +803,32 @@ test_expect_success 'existing directory and file' '
>  	grep top-level-dir actual
>  '
>  
> +test_expect_success 'exact prefix matching (with root)' '
> +	test_when_finished rm -r a &&
> +	mkdir -p a/git a/git-foo &&
> +	touch a/git/foo a/git-foo/bar &&
> +	echo /git/ >a/.gitignore &&
> +	git check-ignore a/git a/git/foo a/git-foo a/git-foo/bar >actual &&
> +	cat >expect <<-\EOF &&
> +	a/git
> +	a/git/foo
> +	EOF
> +	test_cmp expect actual
> +'
> +
> +test_expect_success 'exact prefix matching (without root)' '
> +	test_when_finished rm -r a &&
> +	mkdir -p a/git a/git-foo &&
> +	touch a/git/foo a/git-foo/bar &&
> +	echo git/ >a/.gitignore &&
> +	git check-ignore a/git a/git/foo a/git-foo a/git-foo/bar >actual &&
> +	cat >expect <<-\EOF &&
> +	a/git
> +	a/git/foo
> +	EOF
> +	test_cmp expect actual
> +'
> +
>  ############################################################################
>  #
>  # test whitespace handling

We have t3070-wildmatch.sh testing various combinations of these, and
indeed this code resolves back to wildmatch().

But I think in this case this is due to dir.c's particular behavior of
splitting paths before feeding them to wildmatch, as it needs to match
things relative to the subdirectory.

Still, we've got a matrix of these in t3070-wildmatch.sh, which already
tests some (but apparently not all) cases where we need to create an
actual file on disk. These sorts of test blindspots are why I added that
in de8bada2bf6 (wildmatch test: create & test files on disk in addition
to in-memory, 2018-01-30).

Wouldn't it be better & more exhaustive here to change its test lines
like:


    match 0 0 1 1 foo/bar/baz 'bar'

To say:

    match 0 0 1 1 ? ? foo/bar/baz 'bar'

And just add to its match() function so that if we have a subject with a
slash, we mkdir up to that first slash (here: "mkdir foo"), and create a
.gitignore file therein with the "foo" directory with the "bar" content,
perhaps adding "/bar", "bar/" and "/bar/" variants implicitly.

Then create a "bar.txt" in the directory as well, and a
bar-otherdir/somefile.txt or whatever.

And fill in the "? ?" depending on whether those variants matched or
not...

Anyway, just an idea, but if you pursue that you should be able to get
much more exhaustive testing in this area that we've apparently had a
blindspot in.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 11/2/2021 11:33 AM, Ævar Arnfjörð Bjarmason wrote:
> 
> We have t3070-wildmatch.sh testing various combinations of these, and
> indeed this code resolves back to wildmatch().
> 
> But I think in this case this is due to dir.c's particular behavior of
> splitting paths before feeding them to wildmatch, as it needs to match
> things relative to the subdirectory.
> 
> Still, we've got a matrix of these in t3070-wildmatch.sh, which already
> tests some (but apparently not all) cases where we need to create an
> actual file on disk. These sorts of test blindspots are why I added that
> in de8bada2bf6 (wildmatch test: create & test files on disk in addition
> to in-memory, 2018-01-30).
> 
> Wouldn't it be better & more exhaustive here to change its test lines
> like:
> 
> 
>     match 0 0 1 1 foo/bar/baz 'bar'
> 
> To say:
> 
>     match 0 0 1 1 ? ? foo/bar/baz 'bar'
> 
> And just add to its match() function so that if we have a subject with a
> slash, we mkdir up to that first slash (here: "mkdir foo"), and create a
> .gitignore file therein with the "foo" directory with the "bar" content,
> perhaps adding "/bar", "bar/" and "/bar/" variants implicitly.
> 
> Then create a "bar.txt" in the directory as well, and a
> bar-otherdir/somefile.txt or whatever.
> 
> And fill in the "? ?" depending on whether those variants matched or
> not...
> 
> Anyway, just an idea, but if you pursue that you should be able to get
> much more exhaustive testing in this area that we've apparently had a
> blindspot in.

Those tests are quite exhaustive, but the test script is pretty
inscrutable and the refactor you suggest is pretty major. I'd prefer
to keep to the focused test for the sake of fixing this in time for
the release.

Thanks,
-Stolee

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

Derrick Stolee <stolee@gmail.com> writes:

> Those tests are quite exhaustive, but the test script is pretty
> inscrutable and the refactor you suggest is pretty major. I'd prefer
> to keep to the focused test for the sake of fixing this in time for
> the release.

Thanks, both.

* do not need to call fnmatch at all.
*/
if (!patternlen && !namelen)
if (!patternlen && (!namelen || (flags & PATTERN_FLAG_MUSTBEDIR)))
return 1;
}

Expand All @@ -1303,6 +1303,44 @@ int match_pathname(const char *pathname, int pathlen,
WM_PATHNAME) == 0;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Ævar Arnfjörð Bjarmason wrote (reply to this):


On Sun, Sep 12 2021, Derrick Stolee via GitGitGadget wrote:

> +	/*
> +	 * Use 'alloc' as an indicator that the string has not been
> +	 * initialized, in case the parent is the root directory.
> +	 */
> +	if (!path_parent->alloc) {

This isn't wrong, but seems to be way too cozy with the internal
implementation details of strbuf. For what it's worth I renamed it to
"alloc2" and found that this would be only the 3rd bit of code out of
strbuf.[ch] that cares about that member.

> +		char *slash;
> +		strbuf_addstr(path_parent, pathname);

So is "pathname" ever the empty string? If not we could check the
length?

Or probably better: ...

> @@ -1331,6 +1359,7 @@ static struct path_pattern *last_matching_pattern_from_list(const char *pathname
>  {
>  	struct path_pattern *res = NULL; /* undecided */
>  	int i;
> +	struct strbuf path_parent = STRBUF_INIT;

Just malloc + strbuf_init() this in the above function and have a
"struct strbuf *" initialized to NULL here? Then we can use a much more
idiomatic "is it NULL?" to check if it's initialized.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 9/12/2021 6:21 PM, Ævar Arnfjörð Bjarmason wrote:
> 
> On Sun, Sep 12 2021, Derrick Stolee via GitGitGadget wrote:
> 
>> +	/*
>> +	 * Use 'alloc' as an indicator that the string has not been
>> +	 * initialized, in case the parent is the root directory.
>> +	 */
>> +	if (!path_parent->alloc) {
> 
> This isn't wrong, but seems to be way too cozy with the internal
> implementation details of strbuf. For what it's worth I renamed it to
> "alloc2" and found that this would be only the 3rd bit of code out of
> strbuf.[ch] that cares about that member.

I can understand not wanting to poke into the internals.

>> +		char *slash;
>> +		strbuf_addstr(path_parent, pathname);
> 
> So is "pathname" ever the empty string? If not we could check the
> length?

We are given 'pathlen' as a parameter, so this should just use
strbuf_add() instead.

> Or probably better: ...
> 
>> @@ -1331,6 +1359,7 @@ static struct path_pattern *last_matching_pattern_from_list(const char *pathname
>>  {
>>  	struct path_pattern *res = NULL; /* undecided */
>>  	int i;
>> +	struct strbuf path_parent = STRBUF_INIT;
> 
> Just malloc + strbuf_init() this in the above function and have a
> "struct strbuf *" initialized to NULL here? Then we can use a much more
> idiomatic "is it NULL?" to check if it's initialized.
 
That makes sense. Can do.

Thanks,
-Stolee

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Elijah Newren wrote (reply to this):

On Sun, Sep 12, 2021 at 6:23 AM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Derrick Stolee <dstolee@microsoft.com>
>
> When matching a path against a list of patterns, the ones that require a
> directory match previously did not work when a filename is specified.
> This was fine when all pattern-matching was done within methods such as
> unpack_trees() that check a directory before recursing into the
> contained files. However, other commands will start matching individual
> files against pattern lists without that recursive approach.
>
> We modify path_matches_dir_pattern() to take a strbuf 'path_parent' that
> is used to store the parent directory of 'pathname' between multiple
> pattern matching tests. This is loaded lazily, only on the first pattern
> it finds that has the PATTERN_FLAG_MUSTBEDIR flag.
>
> If we find that a path has a parent directory, we start by checking to
> see if that parent directory matches the pattern. If so, then we do not
> need to query the index for the type (which can be expensive). If we
> find that the parent does not match, then we still must check the type
> from the index for the given pathname.
>
> Note that this does not affect cone mode pattern matching, but instead
> the more general -- and slower -- full pattern set. Thus, this does not
> affect the sparse index.
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  dir.c | 34 ++++++++++++++++++++++++++++++++--
>  1 file changed, 32 insertions(+), 2 deletions(-)
>
> diff --git a/dir.c b/dir.c
> index 652135df896..fe5ee87bb5f 100644
> --- a/dir.c
> +++ b/dir.c
> @@ -1305,10 +1305,38 @@ int match_pathname(const char *pathname, int pathlen,
>
>  static int path_matches_dir_pattern(const char *pathname,
>                                     int pathlen,
> +                                   struct strbuf *path_parent,
>                                     int *dtype,
>                                     struct path_pattern *pattern,
>                                     struct index_state *istate)
>  {
> +       /*
> +        * Use 'alloc' as an indicator that the string has not been
> +        * initialized, in case the parent is the root directory.
> +        */
> +       if (!path_parent->alloc) {
> +               char *slash;
> +               strbuf_addstr(path_parent, pathname);
> +               slash = find_last_dir_sep(path_parent->buf);
> +
> +               if (slash)
> +                       *slash = '\0';

Are you breaking strbuf invariants here?  path_parent->len will not be
corrected by this string manipulation.  Perhaps replace this if-else
block with

    strbuf_setlen(path_parent, slash ? slash - path_parent->buf : 0)

> +               else
> +                       strbuf_setlen(path_parent, 0);
> +       }
> +
> +       /*
> +        * If the parent directory matches the pattern, then we do not
> +        * need to check for dtype.
> +        */
> +       if (path_parent->len &&
> +           match_pathname(path_parent->buf, path_parent->len,
> +                          pattern->base,
> +                          pattern->baselen ? pattern->baselen - 1 : 0,
> +                          pattern->pattern, pattern->nowildcardlen,
> +                          pattern->patternlen, pattern->flags))
> +               return 1;
> +
>         *dtype = resolve_dtype(*dtype, istate, pathname, pathlen);
>         if (*dtype != DT_DIR)
>                 return 0;
> @@ -1331,6 +1359,7 @@ static struct path_pattern *last_matching_pattern_from_list(const char *pathname
>  {
>         struct path_pattern *res = NULL; /* undecided */
>         int i;
> +       struct strbuf path_parent = STRBUF_INIT;
>
>         if (!pl->nr)
>                 return NULL;    /* undefined */
> @@ -1340,8 +1369,8 @@ static struct path_pattern *last_matching_pattern_from_list(const char *pathname
>                 const char *exclude = pattern->pattern;
>                 int prefix = pattern->nowildcardlen;
>
> -               if ((pattern->flags & PATTERN_FLAG_MUSTBEDIR) &&
> -                   !path_matches_dir_pattern(pathname, pathlen,
> +               if (pattern->flags & PATTERN_FLAG_MUSTBEDIR &&
> +                   !path_matches_dir_pattern(pathname, pathlen, &path_parent,
>                                               dtype, pattern, istate))
>                         continue;
>
> @@ -1367,6 +1396,7 @@ static struct path_pattern *last_matching_pattern_from_list(const char *pathname
>                         break;
>                 }
>         }
> +       strbuf_release(&path_parent);
>         return res;
>  }
>
> --
> gitgitgadget

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 9/15/2021 10:54 AM, Elijah Newren wrote:
>> +       /*
>> +        * Use 'alloc' as an indicator that the string has not been
>> +        * initialized, in case the parent is the root directory.
>> +        */
>> +       if (!path_parent->alloc) {
>> +               char *slash;
>> +               strbuf_addstr(path_parent, pathname);
>> +               slash = find_last_dir_sep(path_parent->buf);
>> +
>> +               if (slash)
>> +                       *slash = '\0';
> 
> Are you breaking strbuf invariants here?  path_parent->len will not be
> corrected by this string manipulation.  Perhaps replace this if-else
> block with
> 
>     strbuf_setlen(path_parent, slash ? slash - path_parent->buf : 0)

Yes, I am. I noticed and fixed this when I was rewriting this
patch for Ævar's feedback. Thanks for pointing it out.

Thanks,
-Stolee

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> +static int path_matches_dir_pattern(const char *pathname,
> +				    int pathlen,
> +				    int *dtype,
> +				    struct path_pattern *pattern,
> +				    struct index_state *istate)
> +{
> +	*dtype = resolve_dtype(*dtype, istate, pathname, pathlen);
> +	if (*dtype != DT_DIR)
> +		return 0;
> +
> +	return 1;
> +}

The function name and parameter list have "pattern" but as far as I
can see any "matches" or "pattern" comes into the picture.  The code
in the caller after calling this function may be doing pattern
matching, but not this one.

What this helper is doing is "signal if the pathname in the working
tree is supposed to be a directory with the return value, while
filling *dtype with what kind of thing it is."

path_must_be_dir_in_working_tree() or something, perhaps?

> @@ -1327,11 +1340,10 @@ static struct path_pattern *last_matching_pattern_from_list(const char *pathname
>  		const char *exclude = pattern->pattern;
>  		int prefix = pattern->nowildcardlen;
>  
> -		if (pattern->flags & PATTERN_FLAG_MUSTBEDIR) {
> -			*dtype = resolve_dtype(*dtype, istate, pathname, pathlen);
> -			if (*dtype != DT_DIR)
> -				continue;
> -		}
> +		if ((pattern->flags & PATTERN_FLAG_MUSTBEDIR) &&
> +		    !path_matches_dir_pattern(pathname, pathlen,
> +					      dtype, pattern, istate))
> +			continue;
>  
>  		if (pattern->flags & PATTERN_FLAG_NODIR) {
>  			if (match_basename(basename,

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 9/22/2021 7:13 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> +static int path_matches_dir_pattern(const char *pathname,
>> +				    int pathlen,
>> +				    int *dtype,
>> +				    struct path_pattern *pattern,
>> +				    struct index_state *istate)
>> +{
>> +	*dtype = resolve_dtype(*dtype, istate, pathname, pathlen);
>> +	if (*dtype != DT_DIR)
>> +		return 0;
>> +
>> +	return 1;
>> +}
> 
> The function name and parameter list have "pattern" but as far as I
> can see any "matches" or "pattern" comes into the picture.  The code
> in the caller after calling this function may be doing pattern
> matching, but not this one.
> 
> What this helper is doing is "signal if the pathname in the working
> tree is supposed to be a directory with the return value, while
> filling *dtype with what kind of thing it is."
> 
> path_must_be_dir_in_working_tree() or something, perhaps?

Yes, a rename would be prudent here. Thanks.

-Stolee

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 9/23/2021 9:39 AM, Derrick Stolee wrote:
> On 9/22/2021 7:13 PM, Junio C Hamano wrote:
>> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
>>
>>> +static int path_matches_dir_pattern(const char *pathname,
>>> +				    int pathlen,
>>> +				    int *dtype,
>>> +				    struct path_pattern *pattern,
>>> +				    struct index_state *istate)
>>> +{
>>> +	*dtype = resolve_dtype(*dtype, istate, pathname, pathlen);
>>> +	if (*dtype != DT_DIR)
>>> +		return 0;
>>> +
>>> +	return 1;
>>> +}
>>
>> The function name and parameter list have "pattern" but as far as I
>> can see any "matches" or "pattern" comes into the picture.  The code
>> in the caller after calling this function may be doing pattern
>> matching, but not this one.
>>
>> What this helper is doing is "signal if the pathname in the working
>> tree is supposed to be a directory with the return value, while
>> filling *dtype with what kind of thing it is."
>>
>> path_must_be_dir_in_working_tree() or something, perhaps?
> 
> Yes, a rename would be prudent here. Thanks.

Of course, when I go to amend the commit, the commit message says

	We will expand the path_matches_dir_pattern() method in a following
	change.

which means that more will follow that will actually care about the
pattern and matching as a directory.

After looking at the extension in the next patch, do you still think a
rename is necessary? Specifically, this diff hunk:

diff --git a/dir.c b/dir.c
index 652135df896..9ea6cfe61cb 100644
--- a/dir.c
+++ b/dir.c
@@ -1305,10 +1305,35 @@ int match_pathname(const char *pathname, int pathlen,
 
 static int path_matches_dir_pattern(const char *pathname,
 				    int pathlen,
+				    struct strbuf **path_parent,
 				    int *dtype,
 				    struct path_pattern *pattern,
 				    struct index_state *istate)
 {
+	if (!*path_parent) {
+		char *slash;
+		CALLOC_ARRAY(*path_parent, 1);
+		strbuf_add(*path_parent, pathname, pathlen);
+		slash = find_last_dir_sep((*path_parent)->buf);
+
+		if (slash)
+			strbuf_setlen(*path_parent, slash - (*path_parent)->buf);
+		else
+			strbuf_setlen(*path_parent, 0);
+	}
+
+	/*
+	 * If the parent directory matches the pattern, then we do not
+	 * need to check for dtype.
+	 */
+	if ((*path_parent)->len &&
+	    match_pathname((*path_parent)->buf, (*path_parent)->len,
+			   pattern->base,
+			   pattern->baselen ? pattern->baselen - 1 : 0,
+			   pattern->pattern, pattern->nowildcardlen,
+			   pattern->patternlen, pattern->flags))
+		return 1;
+
 	*dtype = resolve_dtype(*dtype, istate, pathname, pathlen);
 	if (*dtype != DT_DIR)
 		return 0;

Thanks,
-Stolee

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

Derrick Stolee <stolee@gmail.com> writes:

> On 9/23/2021 9:39 AM, Derrick Stolee wrote:
>> On 9/22/2021 7:13 PM, Junio C Hamano wrote:
>>> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
>>>
>>>> +static int path_matches_dir_pattern(const char *pathname,
>>>> +				    int pathlen,
>>>> +				    int *dtype,
>>>> +				    struct path_pattern *pattern,
>>>> +				    struct index_state *istate)
>>>> +{
>>>> +	*dtype = resolve_dtype(*dtype, istate, pathname, pathlen);
>>>> +	if (*dtype != DT_DIR)
>>>> +		return 0;
>>>> +
>>>> +	return 1;
>>>> +}
>>>
>>> The function name and parameter list have "pattern" but as far as I
>>> can see any "matches" or "pattern" comes into the picture.  The code
>>> in the caller after calling this function may be doing pattern
>>> matching, but not this one.
>>>
>>> What this helper is doing is "signal if the pathname in the working
>>> tree is supposed to be a directory with the return value, while
>>> filling *dtype with what kind of thing it is."
>>>
>>> path_must_be_dir_in_working_tree() or something, perhaps?
>> 
>> Yes, a rename would be prudent here. Thanks.
>
> Of course, when I go to amend the commit, the commit message says
>
> 	We will expand the path_matches_dir_pattern() method in a following
> 	change.
>
> which means that more will follow that will actually care about the
> pattern and matching as a directory.
>
> After looking at the extension in the next patch, do you still think a
> rename is necessary?

When the focus and purpose of the function changes, it may warrant a
rename to include "matching" or "pattern", but not before.

Or we might be seeing a premature refactoring with these two steps.
Are we gaining multiple callers of this function before it gets
extended to care about pattern and matching?  If not, perhaps
teaching the inlined codepath about the pattern and matching in
place first before extracting the code to a helper function for
readability and reusability may help make the resulting series
easier to follow, and we do not have to see a function with a
misleading name.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 9/23/2021 2:23 PM, Junio C Hamano wrote:
> Derrick Stolee <stolee@gmail.com> writes:
> 
>> On 9/23/2021 9:39 AM, Derrick Stolee wrote:
>>> On 9/22/2021 7:13 PM, Junio C Hamano wrote:
>>>> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
>>>>
>>>>> +static int path_matches_dir_pattern(const char *pathname,
>>>>> +				    int pathlen,
>>>>> +				    int *dtype,
>>>>> +				    struct path_pattern *pattern,
>>>>> +				    struct index_state *istate)
>>>>> +{
>>>>> +	*dtype = resolve_dtype(*dtype, istate, pathname, pathlen);
>>>>> +	if (*dtype != DT_DIR)
>>>>> +		return 0;
>>>>> +
>>>>> +	return 1;
>>>>> +}
>>>>
>>>> The function name and parameter list have "pattern" but as far as I
>>>> can see any "matches" or "pattern" comes into the picture.  The code
>>>> in the caller after calling this function may be doing pattern
>>>> matching, but not this one.
>>>>
>>>> What this helper is doing is "signal if the pathname in the working
>>>> tree is supposed to be a directory with the return value, while
>>>> filling *dtype with what kind of thing it is."
>>>>
>>>> path_must_be_dir_in_working_tree() or something, perhaps?
>>>
>>> Yes, a rename would be prudent here. Thanks.
>>
>> Of course, when I go to amend the commit, the commit message says
>>
>> 	We will expand the path_matches_dir_pattern() method in a following
>> 	change.
>>
>> which means that more will follow that will actually care about the
>> pattern and matching as a directory.
>>
>> After looking at the extension in the next patch, do you still think a
>> rename is necessary?
> 
> When the focus and purpose of the function changes, it may warrant a
> rename to include "matching" or "pattern", but not before.
> 
> Or we might be seeing a premature refactoring with these two steps.
> Are we gaining multiple callers of this function before it gets
> extended to care about pattern and matching?  If not, perhaps
> teaching the inlined codepath about the pattern and matching in
> place first before extracting the code to a helper function for
> readability and reusability may help make the resulting series
> easier to follow, and we do not have to see a function with a
> misleading name.
 
Squashing these two patches together has the same effect, but
takes a little bit extra work to see that the re-used code is
the same. It's small enough that I don't see that as a huge
hurdle.

Thanks,
-Stolee

}

static int path_matches_dir_pattern(const char *pathname,
int pathlen,
struct strbuf **path_parent,
int *dtype,
struct path_pattern *pattern,
struct index_state *istate)
{
if (!*path_parent) {
char *slash;
CALLOC_ARRAY(*path_parent, 1);
strbuf_add(*path_parent, pathname, pathlen);
slash = find_last_dir_sep((*path_parent)->buf);

if (slash)
strbuf_setlen(*path_parent, slash - (*path_parent)->buf);
else
strbuf_setlen(*path_parent, 0);
}

/*
* If the parent directory matches the pattern, then we do not
* need to check for dtype.
*/
if ((*path_parent)->len &&
match_pathname((*path_parent)->buf, (*path_parent)->len,
pattern->base,
pattern->baselen ? pattern->baselen - 1 : 0,
pattern->pattern, pattern->nowildcardlen,
pattern->patternlen, pattern->flags))
return 1;

*dtype = resolve_dtype(*dtype, istate, pathname, pathlen);
if (*dtype != DT_DIR)
return 0;

return 1;
}

/*
* Scan the given exclude list in reverse to see whether pathname
* should be ignored. The first match (i.e. the last on the list), if
Expand All @@ -1318,6 +1356,7 @@ static struct path_pattern *last_matching_pattern_from_list(const char *pathname
{
struct path_pattern *res = NULL; /* undecided */
int i;
struct strbuf *path_parent = NULL;

if (!pl->nr)
return NULL; /* undefined */
Expand All @@ -1327,11 +1366,10 @@ static struct path_pattern *last_matching_pattern_from_list(const char *pathname
const char *exclude = pattern->pattern;
int prefix = pattern->nowildcardlen;

if (pattern->flags & PATTERN_FLAG_MUSTBEDIR) {
*dtype = resolve_dtype(*dtype, istate, pathname, pathlen);
if (*dtype != DT_DIR)
continue;
}
if (pattern->flags & PATTERN_FLAG_MUSTBEDIR &&
!path_matches_dir_pattern(pathname, pathlen, &path_parent,
dtype, pattern, istate))
continue;

if (pattern->flags & PATTERN_FLAG_NODIR) {
if (match_basename(basename,
Expand All @@ -1355,6 +1393,12 @@ static struct path_pattern *last_matching_pattern_from_list(const char *pathname
break;
}
}

if (path_parent) {
strbuf_release(path_parent);
free(path_parent);
}

return res;
}

Expand Down
28 changes: 28 additions & 0 deletions t/t1092-sparse-checkout-compatibility.sh
Original file line number Diff line number Diff line change
Expand Up @@ -187,6 +187,16 @@ test_sparse_match () {
test_cmp sparse-checkout-err sparse-index-err
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Ævar Arnfjörð Bjarmason wrote (reply to this):


On Sun, Sep 12 2021, Derrick Stolee via GitGitGadget wrote:

> +# NEEDSWORK: This documents current behavior, but is not a desirable
> +# behavior (untracked files are handled differently than tracked).

I wonder if a test_expect_failure test would be better for the thing
that is the desired behavior, but maybe we don't know what the CLI UI
for that would look like yet.


> +test_expect_success 'add outside sparse cone' '
> +	init_repos &&
> +
> +	run_on_sparse mkdir folder1 &&
> +	run_on_sparse ../edit-contents folder1/a &&
> +	run_on_sparse ../edit-contents folder1/newfile &&
> +	test_sparse_match test_must_fail git add folder1/a &&
> +	test_i18ngrep "Disable or modify the sparsity rules" sparse-checkout-err &&

Just "grep" is preferred over "test_i18ngrep" now, the GETTEXT_POISON
went away.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 9/12/2021 6:17 PM, Ævar Arnfjörð Bjarmason wrote:
> 
> On Sun, Sep 12 2021, Derrick Stolee via GitGitGadget wrote:
> 
>> +# NEEDSWORK: This documents current behavior, but is not a desirable
>> +# behavior (untracked files are handled differently than tracked).
> 
> I wonder if a test_expect_failure test would be better for the thing
> that is the desired behavior, but maybe we don't know what the CLI UI
> for that would look like yet.

The problem with test_expect_failure is that you don't know which
line of the test is the problem. That's probably all fine and good
when we completely understand the situation we want to solve but
don't have a good approach to fixing it, but here I want to document
a change in behavior.

Using test_expect_success allows me to demonstrate "it works this
way now" and then "this is how behavior changes".

>> +test_expect_success 'add outside sparse cone' '
>> +	init_repos &&
>> +
>> +	run_on_sparse mkdir folder1 &&
>> +	run_on_sparse ../edit-contents folder1/a &&
>> +	run_on_sparse ../edit-contents folder1/newfile &&
>> +	test_sparse_match test_must_fail git add folder1/a &&
>> +	test_i18ngrep "Disable or modify the sparsity rules" sparse-checkout-err &&
> 
> Just "grep" is preferred over "test_i18ngrep" now, the GETTEXT_POISON
> went away.

Right. Force of habit.

Thanks,
-Stolee

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Derrick Stolee <dstolee@microsoft.com>
>
> Add some tests to demonstrate the current behavior around adding files
> outside of the sparse-checkout cone. Currently, untracked files are
> handled differently from tracked files. A future change will make these
> cases be handled the same way.
>
> Further expand checking that a failed 'git add' does not stage changes
> to the index.
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  t/t1092-sparse-checkout-compatibility.sh | 28 ++++++++++++++++++++++++
>  1 file changed, 28 insertions(+)
>
> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
> index 886e78715fe..3fb764f5eb9 100755
> --- a/t/t1092-sparse-checkout-compatibility.sh
> +++ b/t/t1092-sparse-checkout-compatibility.sh
> @@ -187,6 +187,16 @@ test_sparse_match () {
>  	test_cmp sparse-checkout-err sparse-index-err
>  }
>  
> +test_sparse_unstaged () {
> +	file=$1 &&
> +	for repo in sparse-checkout sparse-index
> +	do
> +		git -C $repo status --porcelain >$repo-out &&
> +		! grep "^A  $file\$" $repo-out &&
> +		! grep "^M  $file\$" $repo-out || return 1

Is addition and modification very special, as opposed to other kinds
of changes?  Is the reason why we say "we do not want to see
addition nor modification" here because there is no concrete X that
we can say "we do want to see X" in various callers of this helper?

I am also wondering, if this is asserting that $file is not added to
the index, why we are using "git status" and not "ls-files", for
example.  Wouldn't

    test_must_fail git ls-files --error-unmatch "$file"

be a more direct way to do so?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 9/22/2021 7:06 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> From: Derrick Stolee <dstolee@microsoft.com>
>>
>> Add some tests to demonstrate the current behavior around adding files
>> outside of the sparse-checkout cone. Currently, untracked files are
>> handled differently from tracked files. A future change will make these
>> cases be handled the same way.
>>
>> Further expand checking that a failed 'git add' does not stage changes
>> to the index.
>>
>> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
>> ---
>>  t/t1092-sparse-checkout-compatibility.sh | 28 ++++++++++++++++++++++++
>>  1 file changed, 28 insertions(+)
>>
>> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
>> index 886e78715fe..3fb764f5eb9 100755
>> --- a/t/t1092-sparse-checkout-compatibility.sh
>> +++ b/t/t1092-sparse-checkout-compatibility.sh
>> @@ -187,6 +187,16 @@ test_sparse_match () {
>>  	test_cmp sparse-checkout-err sparse-index-err
>>  }
>>  
>> +test_sparse_unstaged () {
>> +	file=$1 &&
>> +	for repo in sparse-checkout sparse-index
>> +	do
>> +		git -C $repo status --porcelain >$repo-out &&
>> +		! grep "^A  $file\$" $repo-out &&
>> +		! grep "^M  $file\$" $repo-out || return 1
> 
> Is addition and modification very special, as opposed to other kinds
> of changes?  Is the reason why we say "we do not want to see
> addition nor modification" here because there is no concrete X that
> we can say "we do want to see X" in various callers of this helper?
> 
> I am also wondering, if this is asserting that $file is not added to
> the index, why we are using "git status" and not "ls-files", for
> example.  Wouldn't
> 
>     test_must_fail git ls-files --error-unmatch "$file"
> 
> be a more direct way to do so?

The intention is to avoid a staged change, even if that is a modification
or addition. The --error-unmatch approach only works if we are trying to
avoid an add, and we need to know that the path does not exist in the
index already if we want to take that approach.

Upon reflection, perhaps what we really want is this:

	git diff --staged -- $path >diff &&
	test_must_be_empty diff

This applies to the previous patch, too.

Thanks,
-Stolee

}

test_sparse_unstaged () {
file=$1 &&
for repo in sparse-checkout sparse-index
do
# Skip "unmerged" paths
git -C $repo diff --staged --diff-filter=ACDMRTXB -- "$file" >diff &&
test_must_be_empty diff || return 1
done
}

test_expect_success 'sparse-index contents' '
init_repos &&

Expand Down Expand Up @@ -291,6 +301,20 @@ test_expect_success 'add, commit, checkout' '
test_all_match git checkout -
'

# NEEDSWORK: This documents current behavior, but is not a desirable
# behavior (untracked files are handled differently than tracked).
test_expect_success 'add outside sparse cone' '
init_repos &&

run_on_sparse mkdir folder1 &&
run_on_sparse ../edit-contents folder1/a &&
run_on_sparse ../edit-contents folder1/newfile &&
test_sparse_match test_must_fail git add folder1/a &&
grep "Disable or modify the sparsity rules" sparse-checkout-err &&
test_sparse_unstaged folder1/a &&
test_sparse_match git add folder1/newfile
'

test_expect_success 'commit including unstaged changes' '
init_repos &&

Expand Down Expand Up @@ -339,7 +363,11 @@ test_expect_success 'status/add: outside sparse cone' '

# Adding the path outside of the sparse-checkout cone should fail.
test_sparse_match test_must_fail git add folder1/a &&
grep "Disable or modify the sparsity rules" sparse-checkout-err &&
test_sparse_unstaged folder1/a &&
test_sparse_match test_must_fail git add --refresh folder1/a &&
grep "Disable or modify the sparsity rules" sparse-checkout-err &&
test_sparse_unstaged folder1/a &&

# NEEDSWORK: Adding a newly-tracked file outside the cone succeeds
test_sparse_match git add folder1/new &&
Expand Down
14 changes: 14 additions & 0 deletions t/t3705-add-sparse-checkout.sh
Original file line number Diff line number Diff line change
Expand Up @@ -19,6 +19,7 @@ setup_sparse_entry () {
fi &&
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Elijah Newren wrote (reply to this):

On Sun, Sep 12, 2021 at 6:23 AM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Derrick Stolee <dstolee@microsoft.com>
>
> The tests in t3705-add-sparse-checkout.sh check to see how 'git add'
> behaves with paths outside the sparse-checkout definition. These
> currently check to see if a given warning is present but not that the
> index is not updated with the sparse entries. Add a new
> 'test_sparse_entry_unstaged' helper to be sure 'git add' is behaving
> correctly.
>
> We need to modify setup_sparse_entry to actually commit the sparse_entry
> file so it exists at HEAD but is not already staged in the index.
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  t/t3705-add-sparse-checkout.sh | 14 ++++++++++++++
>  1 file changed, 14 insertions(+)
>
> diff --git a/t/t3705-add-sparse-checkout.sh b/t/t3705-add-sparse-checkout.sh
> index 2b1fd0d0eef..af81b4b6846 100755
> --- a/t/t3705-add-sparse-checkout.sh
> +++ b/t/t3705-add-sparse-checkout.sh
> @@ -19,6 +19,7 @@ setup_sparse_entry () {
>         fi &&
>         git add sparse_entry &&
>         git update-index --skip-worktree sparse_entry &&
> +       git commit --allow-empty -m "ensure sparse_entry exists at HEAD" &&
>         SPARSE_ENTRY_BLOB=$(git rev-parse :sparse_entry)
>  }
>
> @@ -36,6 +37,11 @@ setup_gitignore () {
>         EOF
>  }
>
> +test_sparse_entry_unstaged () {
> +       git status --porcelain >actual &&
> +       ! grep "^M  sparse_entry\$" actual

Is there a reason this is ^M rather than ^D?  Granted, both would be
bugs so I wouldn't expect either to appear, but the point of the check
is looking for likely errors.  Wouldn't the more likely error case for
a file not present in the working tree be that we stage the deletion
of the file?

> +}
> +
>  test_expect_success 'setup' "
>         cat >sparse_error_header <<-EOF &&
>         The following pathspecs didn't match any eligible path, but they do match index
> @@ -55,6 +61,7 @@ test_expect_success 'git add does not remove sparse entries' '
>         setup_sparse_entry &&
>         rm sparse_entry &&
>         test_must_fail git add sparse_entry 2>stderr &&
> +       test_sparse_entry_unstaged &&
>         test_cmp error_and_hint stderr &&
>         test_sparse_entry_unchanged
>  '
> @@ -73,6 +80,7 @@ test_expect_success 'git add . does not remove sparse entries' '
>         rm sparse_entry &&
>         setup_gitignore &&
>         test_must_fail git add . 2>stderr &&
> +       test_sparse_entry_unstaged &&
>
>         cat sparse_error_header >expect &&
>         echo . >>expect &&
> @@ -88,6 +96,7 @@ do
>                 setup_sparse_entry &&
>                 echo modified >sparse_entry &&
>                 test_must_fail git add $opt sparse_entry 2>stderr &&
> +               test_sparse_entry_unstaged &&
>                 test_cmp error_and_hint stderr &&
>                 test_sparse_entry_unchanged
>         '
> @@ -98,6 +107,7 @@ test_expect_success 'git add --refresh does not update sparse entries' '
>         git ls-files --debug sparse_entry | grep mtime >before &&
>         test-tool chmtime -60 sparse_entry &&
>         test_must_fail git add --refresh sparse_entry 2>stderr &&
> +       test_sparse_entry_unstaged &&
>         test_cmp error_and_hint stderr &&
>         git ls-files --debug sparse_entry | grep mtime >after &&
>         test_cmp before after
> @@ -106,6 +116,7 @@ test_expect_success 'git add --refresh does not update sparse entries' '
>  test_expect_success 'git add --chmod does not update sparse entries' '
>         setup_sparse_entry &&
>         test_must_fail git add --chmod=+x sparse_entry 2>stderr &&
> +       test_sparse_entry_unstaged &&
>         test_cmp error_and_hint stderr &&
>         test_sparse_entry_unchanged &&
>         ! test -x sparse_entry
> @@ -116,6 +127,7 @@ test_expect_success 'git add --renormalize does not update sparse entries' '
>         setup_sparse_entry "LINEONE\r\nLINETWO\r\n" &&
>         echo "sparse_entry text=auto" >.gitattributes &&
>         test_must_fail git add --renormalize sparse_entry 2>stderr &&
> +       test_sparse_entry_unstaged &&
>         test_cmp error_and_hint stderr &&
>         test_sparse_entry_unchanged
>  '
> @@ -124,6 +136,7 @@ test_expect_success 'git add --dry-run --ignore-missing warn on sparse path' '
>         setup_sparse_entry &&
>         rm sparse_entry &&
>         test_must_fail git add --dry-run --ignore-missing sparse_entry 2>stderr &&
> +       test_sparse_entry_unstaged &&
>         test_cmp error_and_hint stderr &&
>         test_sparse_entry_unchanged
>  '
> @@ -148,6 +161,7 @@ test_expect_success 'do not warn when pathspec matches dense entries' '
>  test_expect_success 'add obeys advice.updateSparsePath' '
>         setup_sparse_entry &&
>         test_must_fail git -c advice.updateSparsePath=false add sparse_entry 2>stderr &&
> +       test_sparse_entry_unstaged &&
>         test_cmp sparse_entry_error stderr
>
>  '
> --
> gitgitgadget

Looks fine otherwise.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 9/15/2021 1:22 AM, Elijah Newren wrote:
> On Sun, Sep 12, 2021 at 6:23 AM Derrick Stolee via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
>>
>> From: Derrick Stolee <dstolee@microsoft.com>
>>
>> The tests in t3705-add-sparse-checkout.sh check to see how 'git add'
>> behaves with paths outside the sparse-checkout definition. These
>> currently check to see if a given warning is present but not that the
>> index is not updated with the sparse entries. Add a new
>> 'test_sparse_entry_unstaged' helper to be sure 'git add' is behaving
>> correctly.
>>
>> We need to modify setup_sparse_entry to actually commit the sparse_entry
>> file so it exists at HEAD but is not already staged in the index.
>>
>> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
>> ---
>>  t/t3705-add-sparse-checkout.sh | 14 ++++++++++++++
>>  1 file changed, 14 insertions(+)
>>
>> diff --git a/t/t3705-add-sparse-checkout.sh b/t/t3705-add-sparse-checkout.sh
>> index 2b1fd0d0eef..af81b4b6846 100755
>> --- a/t/t3705-add-sparse-checkout.sh
>> +++ b/t/t3705-add-sparse-checkout.sh
>> @@ -19,6 +19,7 @@ setup_sparse_entry () {
>>         fi &&
>>         git add sparse_entry &&
>>         git update-index --skip-worktree sparse_entry &&
>> +       git commit --allow-empty -m "ensure sparse_entry exists at HEAD" &&
>>         SPARSE_ENTRY_BLOB=$(git rev-parse :sparse_entry)
>>  }
>>
>> @@ -36,6 +37,11 @@ setup_gitignore () {
>>         EOF
>>  }
>>
>> +test_sparse_entry_unstaged () {
>> +       git status --porcelain >actual &&
>> +       ! grep "^M  sparse_entry\$" actual
> 
> Is there a reason this is ^M rather than ^D?  Granted, both would be
> bugs so I wouldn't expect either to appear, but the point of the check
> is looking for likely errors.  Wouldn't the more likely error case for
> a file not present in the working tree be that we stage the deletion
> of the file?

You are right that we should be checking for deletions or adds
_as well_ as modifications. The test_sparse_entry_unstaged helper
is used in a variety of situations that typically would trigger
a modification, but at least one instance in this test is a
possible deletion.

> 
>> +}
>> +
>>  test_expect_success 'setup' "
>>         cat >sparse_error_header <<-EOF &&
>>         The following pathspecs didn't match any eligible path, but they do match index
>> @@ -55,6 +61,7 @@ test_expect_success 'git add does not remove sparse entries' '
>>         setup_sparse_entry &&
>>         rm sparse_entry &&
>>         test_must_fail git add sparse_entry 2>stderr &&
>> +       test_sparse_entry_unstaged &&

Here, sparse_entry could be staged as a deletion.

>>         test_cmp error_and_hint stderr &&
>>         test_sparse_entry_unchanged
>>  '
>> @@ -73,6 +80,7 @@ test_expect_success 'git add . does not remove sparse entries' '
>>         rm sparse_entry &&
>>         setup_gitignore &&
>>         test_must_fail git add . 2>stderr &&
>> +       test_sparse_entry_unstaged &&

Deletion here.

>>         cat sparse_error_header >expect &&
>>         echo . >>expect &&
>> @@ -88,6 +96,7 @@ do
>>                 setup_sparse_entry &&
>>                 echo modified >sparse_entry &&
>>                 test_must_fail git add $opt sparse_entry 2>stderr &&
>> +               test_sparse_entry_unstaged &&

But here would be modified.

>>                 test_cmp error_and_hint stderr &&
>>                 test_sparse_entry_unchanged
>>         '
>> @@ -98,6 +107,7 @@ test_expect_success 'git add --refresh does not update sparse entries' '
>>         git ls-files --debug sparse_entry | grep mtime >before &&
>>         test-tool chmtime -60 sparse_entry &&
>>         test_must_fail git add --refresh sparse_entry 2>stderr &&
>> +       test_sparse_entry_unstaged &&

Same here.

>>         test_cmp error_and_hint stderr &&
>>         git ls-files --debug sparse_entry | grep mtime >after &&
>>         test_cmp before after
>> @@ -106,6 +116,7 @@ test_expect_success 'git add --refresh does not update sparse entries' '
>>  test_expect_success 'git add --chmod does not update sparse entries' '
>>         setup_sparse_entry &&
>>         test_must_fail git add --chmod=+x sparse_entry 2>stderr &&
>> +       test_sparse_entry_unstaged &&

Here it would be modified with a mode change.

Using the pattern "^[MDARCU][M ] sparse_entry\$" seems to work while also
covering these other cases.

Thanks,
-Stolee

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Derrick Stolee <dstolee@microsoft.com>
>
> The tests in t3705-add-sparse-checkout.sh check to see how 'git add'
> behaves with paths outside the sparse-checkout definition. These
> currently check to see if a given warning is present but not that the
> index is not updated with the sparse entries. Add a new
> 'test_sparse_entry_unstaged' helper to be sure 'git add' is behaving
> correctly.
>
> We need to modify setup_sparse_entry to actually commit the sparse_entry
> file so it exists at HEAD and as an entry in the index, but its exact
> contents are not staged in the index.
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  t/t3705-add-sparse-checkout.sh | 14 ++++++++++++++
>  1 file changed, 14 insertions(+)
>
> diff --git a/t/t3705-add-sparse-checkout.sh b/t/t3705-add-sparse-checkout.sh
> index 2b1fd0d0eef..e202a2ff74a 100755
> --- a/t/t3705-add-sparse-checkout.sh
> +++ b/t/t3705-add-sparse-checkout.sh
> @@ -19,6 +19,7 @@ setup_sparse_entry () {
>  	fi &&
>  	git add sparse_entry &&
>  	git update-index --skip-worktree sparse_entry &&
> +	git commit --allow-empty -m "ensure sparse_entry exists at HEAD" &&
>  	SPARSE_ENTRY_BLOB=$(git rev-parse :sparse_entry)
>  }
>  
> @@ -36,6 +37,11 @@ setup_gitignore () {
>  	EOF
>  }
>  
> +test_sparse_entry_unstaged () {
> +	git status --porcelain >actual &&
> +	! grep "^[MDARCU][M ] sparse_entry\$" actual

Does this say "we do not want any difference from the index, be it
modification, deletion, addtion, etc."?

Just wondering if there were a reason why the pattern is more
complex than "^[^ ][M ]" (i.e. anything but "unmodified since the
index"), not necessarily suggesting to spell the test differently.

Thanks.

git add sparse_entry &&
git update-index --skip-worktree sparse_entry &&
git commit --allow-empty -m "ensure sparse_entry exists at HEAD" &&
SPARSE_ENTRY_BLOB=$(git rev-parse :sparse_entry)
}

Expand All @@ -36,6 +37,11 @@ setup_gitignore () {
EOF
}

test_sparse_entry_unstaged () {
git diff --staged -- sparse_entry >diff &&
test_must_be_empty diff
}

test_expect_success 'setup' "
cat >sparse_error_header <<-EOF &&
The following pathspecs didn't match any eligible path, but they do match index
Expand All @@ -55,6 +61,7 @@ test_expect_success 'git add does not remove sparse entries' '
setup_sparse_entry &&
rm sparse_entry &&
test_must_fail git add sparse_entry 2>stderr &&
test_sparse_entry_unstaged &&
test_cmp error_and_hint stderr &&
test_sparse_entry_unchanged
'
Expand All @@ -73,6 +80,7 @@ test_expect_success 'git add . does not remove sparse entries' '
rm sparse_entry &&
setup_gitignore &&
test_must_fail git add . 2>stderr &&
test_sparse_entry_unstaged &&

cat sparse_error_header >expect &&
echo . >>expect &&
Expand All @@ -88,6 +96,7 @@ do
setup_sparse_entry &&
echo modified >sparse_entry &&
test_must_fail git add $opt sparse_entry 2>stderr &&
test_sparse_entry_unstaged &&
test_cmp error_and_hint stderr &&
test_sparse_entry_unchanged
'
Expand All @@ -98,6 +107,7 @@ test_expect_success 'git add --refresh does not update sparse entries' '
git ls-files --debug sparse_entry | grep mtime >before &&
test-tool chmtime -60 sparse_entry &&
test_must_fail git add --refresh sparse_entry 2>stderr &&
test_sparse_entry_unstaged &&
test_cmp error_and_hint stderr &&
git ls-files --debug sparse_entry | grep mtime >after &&
test_cmp before after
Expand All @@ -106,6 +116,7 @@ test_expect_success 'git add --refresh does not update sparse entries' '
test_expect_success 'git add --chmod does not update sparse entries' '
setup_sparse_entry &&
test_must_fail git add --chmod=+x sparse_entry 2>stderr &&
test_sparse_entry_unstaged &&
test_cmp error_and_hint stderr &&
test_sparse_entry_unchanged &&
! test -x sparse_entry
Expand All @@ -116,6 +127,7 @@ test_expect_success 'git add --renormalize does not update sparse entries' '
setup_sparse_entry "LINEONE\r\nLINETWO\r\n" &&
echo "sparse_entry text=auto" >.gitattributes &&
test_must_fail git add --renormalize sparse_entry 2>stderr &&
test_sparse_entry_unstaged &&
test_cmp error_and_hint stderr &&
test_sparse_entry_unchanged
'
Expand All @@ -124,6 +136,7 @@ test_expect_success 'git add --dry-run --ignore-missing warn on sparse path' '
setup_sparse_entry &&
rm sparse_entry &&
test_must_fail git add --dry-run --ignore-missing sparse_entry 2>stderr &&
test_sparse_entry_unstaged &&
test_cmp error_and_hint stderr &&
test_sparse_entry_unchanged
'
Expand All @@ -148,6 +161,7 @@ test_expect_success 'do not warn when pathspec matches dense entries' '
test_expect_success 'add obeys advice.updateSparsePath' '
setup_sparse_entry &&
test_must_fail git -c advice.updateSparsePath=false add sparse_entry 2>stderr &&
test_sparse_entry_unstaged &&
test_cmp sparse_entry_error stderr

'
Expand Down