Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sparse-checkout: modify 'git add', 'git rm', and 'git mv' behavior #1018

Conversation

derrickstolee
Copy link

@derrickstolee derrickstolee commented Aug 16, 2021

This series is based on ds/mergies-with-sparse-index.

As requested, this series looks to update the behavior of git add, git rm, and git mv when they attempt to modify paths outside of the sparse-checkout cone. In particular, this care is expanded to not just cache entries with the SKIP_WORKTREE bit, but also paths that do not match the sparse-checkout definition.

This means that commands that worked before this series can now fail. In particular, if 'git merge' results in a conflict outside of the sparse-checkout cone, then 'git add ' will now fail.

In order to allow users to circumvent these protections, a new '--sparse' option is added that ignores the sparse-checkout patterns and the SKIP_WORKTREE bit. The message for advice.updateSparsePath is adjusted to assist with discovery of this option.

There is a subtle issue with git mv in that it does not check the index until it discovers a directory and then uses the index to find the contained entries. This means that in non-cone-mode patterns, a pattern such as "sub/dir" will not match the path "sub" and this can cause an issue.

In order to allow for checking arbitrary paths against the sparse-checkout patterns, some changes to the underlying pattern matching code is required. It turns out that there are some bugs in the methods as advertised, but these bugs were never discovered because of the way methods like unpack_trees() will check a directory for a pattern match before checking its contained paths. Our new "check patterns on-demand" approach pokes holes in that approach, specifically with patterns that match entire directories.

Updates in v4

  • Instead of using 'git status' and 'grep' to detect staged changes, we use 'git diff --staged'. t1092 uses an additional --diff-filter because it tests with merge conflicts, so it needs this extra flag.

  • Patches 3 and 4 are merged into the new patch 3 to avoid temporarily having a poorly named method.

Updates in v3

  • Fixed an incorrectly-squashed commit. Spread out some changes in a better way. For example, I don't add --sparse to tests before introducing the option.

  • Use a NULL struct strbuf pointer to indicate an uninitialized value instead of relying on an internal member.

  • Use grep over test_i18ngrep.

  • Fixed line wrapping for error messages.

  • Use strbuf_setlen() over modifying the len member manually.

Updates in v2

  • I got no complaints about these restrictions, so this is now a full series, not RFC.

  • Thanks to Matheus, several holes are filled with extra testing and bugfixes.

  • New patches add --chmod and --renormalize improvements. These are added after the --sparse option to make them be one change each.

Thanks,
-Stolee

cc: newren@gmail.com
cc: gitster@pobox.com
cc: matheus.bernardino@usp.br
cc: stolee@gmail.com
cc: vdye@github.com
cc: Ævar Arnfjörð Bjarmason avarab@gmail.com
cc: Sean Christopherson seanjc@google.com
cc: Glen Choo chooglen@google.com

@derrickstolee derrickstolee self-assigned this Aug 16, 2021
@derrickstolee derrickstolee changed the title Sparse-checkout: modify 'git add', 'git rm', and 'git add' behavior [RFC] Sparse-checkout: modify 'git add', 'git rm', and 'git add' behavior Aug 16, 2021
@derrickstolee derrickstolee force-pushed the sparse-index/add-rm-mv-behavior branch from f6e52f0 to 1f9f77c Compare August 16, 2021 18:20
@derrickstolee
Copy link
Author

Note to self: do not submit this until after the git merge integration is submitted.

@derrickstolee derrickstolee force-pushed the sparse-index/add-rm-mv-behavior branch 2 times, most recently from 299a610 to cd881c8 Compare August 17, 2021 15:47
@derrickstolee derrickstolee force-pushed the stolee/sparse-index/merge branch 2 times, most recently from 350ed86 to f6d800c Compare August 24, 2021 01:12
@derrickstolee derrickstolee force-pushed the sparse-index/add-rm-mv-behavior branch from cd881c8 to 7601145 Compare August 24, 2021 01:12
@derrickstolee derrickstolee force-pushed the sparse-index/add-rm-mv-behavior branch from 7601145 to da5ed05 Compare August 24, 2021 01:28
@derrickstolee derrickstolee force-pushed the sparse-index/add-rm-mv-behavior branch from da5ed05 to cef2167 Compare August 24, 2021 19:00
@derrickstolee derrickstolee force-pushed the sparse-index/add-rm-mv-behavior branch from cef2167 to 7749a69 Compare August 24, 2021 21:25
@derrickstolee
Copy link
Author

/submit

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 24, 2021

Submitted as pull.1018.git.1629842085.gitgitgadget@gmail.com

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git pr-1018/derrickstolee/sparse-index/add-rm-mv-behavior-v1

To fetch this version to local tag pr-1018/derrickstolee/sparse-index/add-rm-mv-behavior-v1:

git fetch --no-tags https://github.com/gitgitgadget/git tag pr-1018/derrickstolee/sparse-index/add-rm-mv-behavior-v1

@@ -443,6 +443,7 @@ static void check_embedded_repo(const char *path)
static int add_files(struct dir_struct *dir, int flags)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Matheus Tavares Bernardino wrote (reply to this):

On Tue, Aug 24, 2021 at 6:54 PM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Derrick Stolee <dstolee@microsoft.com>
>
> The add_files() method in builtin/add.c takes a set of untracked files
> that are being added by the input pathspec and inserts them into the
> index. If these files are outside of the sparse-checkout cone, then they
> gain the SKIP_WORKTREE bit at some point. However, this was not checked
> before inserting into the index, so these files are added even though we
> want to avoid modifying the index outside of the sparse-checkout cone.
>
> Add a check within add_files() for these files and write the advice
> about files outside of the sprase-checkout cone.

s/sprase/sparse/

> This behavior change modifies some existing tests within t1092. These
> tests intended to document how a user could interact with the existing
> behavior in place. Many of these tests need to be marked as expecting
> failure. A future change will allow these tests to pass by adding a flag
> to 'git add' that allows users to modify index entries outside of the
> sparse-checkout cone.
>
> The 'submodule handling' test is intended to document what happens to
> directories that contain a submodule when the sparse index is enabled.
> It is not trying to say that users should be able to add submodules
> outside of the sparse-checkout cone, so that test can be modified to
> avoid that operation.

While I was playing with this patch, I did the following:

echo a >a
echo b >b
git add .
git commit -m files
git sparse-checkout set a
echo c >c
git add c

And the last `git add` was successful in adding the untracked `c` file
which is outside the sparse checkout. I'm not sure if I'm doing
something wrong, but it seems that `path_in_sparse_checkout()` returns
UNDECIDED for `c`. Is it because there was no pattern in the list
explicitly excluding it? And if so, should we consider UNDECIDED as
NOT_MATCHED for `path_in_sparse_checkout()`?

> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  builtin/add.c                            | 14 ++++++++++
>  t/t1092-sparse-checkout-compatibility.sh | 33 +++++++++++++++++-------
>  2 files changed, 38 insertions(+), 9 deletions(-)
>
> diff --git a/builtin/add.c b/builtin/add.c
> index 88a6c0c69fb..3a109276b74 100644
> --- a/builtin/add.c
> +++ b/builtin/add.c
> @@ -443,6 +443,7 @@ static void check_embedded_repo(const char *path)
>  static int add_files(struct dir_struct *dir, int flags)
>  {
>         int i, exit_status = 0;
> +       struct string_list only_match_skip_worktree = STRING_LIST_INIT_NODUP;

I see this reuses the logic from cmd_add() and refresh(). But since we
are operating on untracked files here, perhaps we could replace
"skip_worktree" by "sparse_paths" or something similar?

>         if (dir->ignored_nr) {
>                 fprintf(stderr, _(ignore_error));
> @@ -456,6 +457,11 @@ static int add_files(struct dir_struct *dir, int flags)
>         }
>
>         for (i = 0; i < dir->nr; i++) {
> +               if (!path_in_sparse_checkout(dir->entries[i]->name, &the_index)) {
> +                       string_list_append(&only_match_skip_worktree,
> +                                          dir->entries[i]->name);
> +                       continue;
> +               }
>                 if (add_file_to_index(&the_index, dir->entries[i]->name, flags)) {
>                         if (!ignore_add_errors)
>                                 die(_("adding files failed"));
> @@ -464,6 +470,14 @@ static int add_files(struct dir_struct *dir, int flags)
>                         check_embedded_repo(dir->entries[i]->name);
>                 }
>         }
> +
> +       if (only_match_skip_worktree.nr) {
> +               advise_on_updating_sparse_paths(&only_match_skip_worktree);


Hmm, advise_on_updating_sparse_paths() takes a list of pathspecs that
only matched sparse paths, but here we are passing a list of actual
pathnames... Well, these are technically pathspecs too, but the advice
message may be confusing.

For example, if we ran `git add *.c` on a repo with the untracked
files `d1/file.c` and `d2/file.c`, we will get:

The following pathspecs didn't match any eligible path, but they do match index
entries outside the current sparse checkout:
d1/file.c
d2/file.c

However, `d1/file.c` and `d2/file.c` are neither index entries nor the
pathspecs that the user has given to `git add`. So perhaps we need to
change the error/advice message?


> +               exit_status = 1;
> +       }
> +       string_list_clear(&only_match_skip_worktree, 0);
> +
>         return exit_status;
>  }

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Matheus Tavares Bernardino wrote (reply to this):

On Fri, Aug 27, 2021 at 6:06 PM Matheus Tavares Bernardino
<matheus.bernardino@usp.br> wrote:
>
> While I was playing with this patch, I did the following:
>
> echo a >a
> echo b >b
> git add .
> git commit -m files
> git sparse-checkout set a
> echo c >c
> git add c
>
> And the last `git add` was successful in adding the untracked `c` file
> which is outside the sparse checkout. I'm not sure if I'm doing
> something wrong, but it seems that `path_in_sparse_checkout()` returns
> UNDECIDED for `c`. Is it because there was no pattern in the list
> explicitly excluding it? And if so, should we consider UNDECIDED as
> NOT_MATCHED for `path_in_sparse_checkout()`?

Please disconsider this, It was my fault indeed. I had applied the
patches onto the wrong base. Now I fetched them again but from the GGG
tag, and my manual test worked as expected.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 8/27/2021 5:06 PM, Matheus Tavares Bernardino wrote:
> On Tue, Aug 24, 2021 at 6:54 PM Derrick Stolee via GitGitGadget
> <gitgitgadget@gmail.com> wrote:

Thanks for adding your review. I'm sorry I'm so late getting back to it.

>> From: Derrick Stolee <dstolee@microsoft.com>
>>
>> The add_files() method in builtin/add.c takes a set of untracked files
>> that are being added by the input pathspec and inserts them into the
>> index. If these files are outside of the sparse-checkout cone, then they
>> gain the SKIP_WORKTREE bit at some point. However, this was not checked
>> before inserting into the index, so these files are added even though we
>> want to avoid modifying the index outside of the sparse-checkout cone.
>>
>> Add a check within add_files() for these files and write the advice
>> about files outside of the sprase-checkout cone.
> 
> s/sprase/sparse/

Thanks.

>> diff --git a/builtin/add.c b/builtin/add.c
>> index 88a6c0c69fb..3a109276b74 100644
>> --- a/builtin/add.c
>> +++ b/builtin/add.c
>> @@ -443,6 +443,7 @@ static void check_embedded_repo(const char *path)
>>  static int add_files(struct dir_struct *dir, int flags)
>>  {
>>         int i, exit_status = 0;
>> +       struct string_list only_match_skip_worktree = STRING_LIST_INIT_NODUP;
> 
> I see this reuses the logic from cmd_add() and refresh(). But since we
> are operating on untracked files here, perhaps we could replace
> "skip_worktree" by "sparse_paths" or something similar?

How about "matched_sparse_paths" as a whole name swap? The earlier uses
cared only if every match was sparse, but here we are actually looking
at cases that are untracked, and the pathspec could also match other
non-sparse cases.

>> +
>> +       if (only_match_skip_worktree.nr) {
>> +               advise_on_updating_sparse_paths(&only_match_skip_worktree);
> 
> 
> Hmm, advise_on_updating_sparse_paths() takes a list of pathspecs that
> only matched sparse paths, but here we are passing a list of actual
> pathnames... Well, these are technically pathspecs too, but the advice
> message may be confusing.
> 
> For example, if we ran `git add *.c` on a repo with the untracked
> files `d1/file.c` and `d2/file.c`, we will get:
> 
> The following pathspecs didn't match any eligible path, but they do match index
> entries outside the current sparse checkout:
> d1/file.c
> d2/file.c
> 
> However, `d1/file.c` and `d2/file.c` are neither index entries nor the
> pathspecs that the user has given to `git add`. So perhaps we need to
> change the error/advice message?

I think the advice should be modified to refer to paths and/or pathspecs,
and also it is not completely correct anymore.

Instead of

  The following pathspecs didn't match any eligible path, but they do match index
  entries outside the current sparse checkout:

perhaps

  The following paths and/or pathspecs matched paths that exist outside of your
  sparse-checkout definition, so will not be updated in the index:

I'm going to save this one for a new patch at the end to make sure it handles
all of the cases involved in this series.

Thanks,
-Stolee

@@ -39,7 +39,8 @@ void add_pathspec_matches_against_index(const struct pathspec *pathspec,
return;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Matheus Tavares wrote (reply to this):

On Tue, Aug 24, 2021 at 6:54 PM Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com> wrote:
>
> Subject: [PATCH 06/13] add: skip paths that are outside sparse-checkout cone

Perhaps this could be "skip _tracked_ paths that ..." (to help
differentiate the end goal of this patch from the previous one).

> diff --git a/pathspec.c b/pathspec.c
> index 44306fdaca2..0e6e60fdc5a 100644
> --- a/pathspec.c
> +++ b/pathspec.c
> @@ -39,7 +39,8 @@ void add_pathspec_matches_against_index(const struct pathspec *pathspec,
>                 return;
>         for (i = 0; i < istate->cache_nr; i++) {
>                 const struct cache_entry *ce = istate->cache[i];
> -               if (sw_action == PS_IGNORE_SKIP_WORKTREE && ce_skip_worktree(ce))
> +               if (sw_action == PS_IGNORE_SKIP_WORKTREE &&
> +                   (ce_skip_worktree(ce) || !path_in_sparse_checkout(ce->name, istate)))

Hmm, even though we skip the sparse paths here, cmd_add() will call
add_files_to_cache() at the end and still update these paths in the
index. I think there are two ways to fix this. We could either change
run_diff_files() to skip these paths (but I don't know how other callers
of this functions want to handle this, so maybe this needs to hide
behind an option flag):

diff --git a/diff-lib.c b/diff-lib.c
index f9eadc4fc1..4245d7ead5 100644
--- a/diff-lib.c
+++ b/diff-lib.c
@@ -198,7 +198,8 @@ int run_diff_files(struct rev_info *revs, unsigned int option)
 				continue;
 		}
 
-		if (ce_uptodate(ce) || ce_skip_worktree(ce))
+		if (ce_uptodate(ce) || ce_skip_worktree(ce) ||
+		    !path_in_sparse_checkout(ce->name, istate))
 			continue;
 
 		/*

Or we could change the callback in add itself:

diff --git a/builtin/add.c b/builtin/add.c
index f675bdeae4..3d7762aac2 100644
--- a/builtin/add.c
+++ b/builtin/add.c
@@ -94,6 +94,10 @@ static void update_callback(struct diff_queue_struct *q,
 	for (i = 0; i < q->nr; i++) {
 		struct diff_filepair *p = q->queue[i];
 		const char *path = p->one->path;
+
+		if (!path_in_sparse_checkout(path, &the_index))
+			continue;
+
 		switch (fix_unmerged_status(p, data)) {
 		default:
 			die(_("unexpected diff status %c"), p->status);

I believe we also need to update a few other places to use the
`(ce_skip_worktree(ce) || !path_in_sparse_checkout())` logic in order to
avoid updating tracked sparse paths: chmod_pathspec() for add's --chmod
option, renormalize_tracked_files() for --renormalize, and
read-cache.c:refresh_index() for --refresh.

>                         continue;
>                 ce_path_match(istate, ce, pathspec, seen);
>         }

Hmm, don't we also want to update
find_pathspecs_matching_skip_worktree() in this patch to use
path_in_sparse_checkout()? I see you did that in patch 8, but I think
this should be together with this current patch as, without it, we stop
adding tracked sparse paths but we print no error/advice message about
it.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 8/27/2021 5:13 PM, Matheus Tavares wrote:
> On Tue, Aug 24, 2021 at 6:54 PM Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com> wrote:
>>
>> Subject: [PATCH 06/13] add: skip paths that are outside sparse-checkout cone
> 
> Perhaps this could be "skip _tracked_ paths that ..." (to help
> differentiate the end goal of this patch from the previous one).
> 
>> diff --git a/pathspec.c b/pathspec.c
>> index 44306fdaca2..0e6e60fdc5a 100644
>> --- a/pathspec.c
>> +++ b/pathspec.c
>> @@ -39,7 +39,8 @@ void add_pathspec_matches_against_index(const struct pathspec *pathspec,
>>                 return;
>>         for (i = 0; i < istate->cache_nr; i++) {
>>                 const struct cache_entry *ce = istate->cache[i];
>> -               if (sw_action == PS_IGNORE_SKIP_WORKTREE && ce_skip_worktree(ce))
>> +               if (sw_action == PS_IGNORE_SKIP_WORKTREE &&
>> +                   (ce_skip_worktree(ce) || !path_in_sparse_checkout(ce->name, istate)))
> 
> Hmm, even though we skip the sparse paths here, cmd_add() will call
> add_files_to_cache() at the end and still update these paths in the
> index. I think there are two ways to fix this. We could either change
> run_diff_files() to skip these paths (but I don't know how other callers
> of this functions want to handle this, so maybe this needs to hide
> behind an option flag):

You are absolutely right to point this out. I had missed this interaction.
But, this is also already broken. The patch below adds a check to show that
'git add' does not add the sparse_entry, but it does (even when applied
before any patch in this series). That is: all the modified tests fail
after this change. I'll work to fix this issue before the next version of
this series.

Thanks,
-Stolee

--- >8 ---

From 21dab466d221e8632d98553f5f1fa900a2d47c7f Mon Sep 17 00:00:00 2001
From: Derrick Stolee <dstolee@microsoft.com>
Date: Wed, 8 Sep 2021 15:40:32 -0400
Subject: [PATCH] t3705: test that 'sparse_entry' is unstaged

The tests in t3705-add-sparse-checkout.sh check to see how 'git add'
behaves with paths outside the sparse-checkout definition. These
currently check to see if a given warning is present but not that the
index is not updated with the sparse entries. Add a new
'test_sparse_entry_unstaged' helper to be sure 'git add' is behaving
correctly.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 t/t3705-add-sparse-checkout.sh | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/t/t3705-add-sparse-checkout.sh b/t/t3705-add-sparse-checkout.sh
index 2b1fd0d0eef..d31a0d4f550 100755
--- a/t/t3705-add-sparse-checkout.sh
+++ b/t/t3705-add-sparse-checkout.sh
@@ -36,6 +36,11 @@ setup_gitignore () {
 	EOF
 }
 
+test_sparse_entry_unstaged () {
+	git status --porcelain >actual &&
+	! grep "A  sparse_entry" actual
+}
+
 test_expect_success 'setup' "
 	cat >sparse_error_header <<-EOF &&
 	The following pathspecs didn't match any eligible path, but they do match index
@@ -55,6 +60,7 @@ test_expect_success 'git add does not remove sparse entries' '
 	setup_sparse_entry &&
 	rm sparse_entry &&
 	test_must_fail git add sparse_entry 2>stderr &&
+	test_sparse_entry_unstaged &&
 	test_cmp error_and_hint stderr &&
 	test_sparse_entry_unchanged
 '
@@ -73,6 +79,7 @@ test_expect_success 'git add . does not remove sparse entries' '
 	rm sparse_entry &&
 	setup_gitignore &&
 	test_must_fail git add . 2>stderr &&
+	test_sparse_entry_unstaged &&
 
 	cat sparse_error_header >expect &&
 	echo . >>expect &&
@@ -88,6 +95,7 @@ do
 		setup_sparse_entry &&
 		echo modified >sparse_entry &&
 		test_must_fail git add $opt sparse_entry 2>stderr &&
+		test_sparse_entry_unstaged &&
 		test_cmp error_and_hint stderr &&
 		test_sparse_entry_unchanged
 	'
@@ -98,6 +106,7 @@ test_expect_success 'git add --refresh does not update sparse entries' '
 	git ls-files --debug sparse_entry | grep mtime >before &&
 	test-tool chmtime -60 sparse_entry &&
 	test_must_fail git add --refresh sparse_entry 2>stderr &&
+	test_sparse_entry_unstaged &&
 	test_cmp error_and_hint stderr &&
 	git ls-files --debug sparse_entry | grep mtime >after &&
 	test_cmp before after
@@ -106,6 +115,7 @@ test_expect_success 'git add --refresh does not update sparse entries' '
 test_expect_success 'git add --chmod does not update sparse entries' '
 	setup_sparse_entry &&
 	test_must_fail git add --chmod=+x sparse_entry 2>stderr &&
+	test_sparse_entry_unstaged &&
 	test_cmp error_and_hint stderr &&
 	test_sparse_entry_unchanged &&
 	! test -x sparse_entry
@@ -116,6 +126,7 @@ test_expect_success 'git add --renormalize does not update sparse entries' '
 	setup_sparse_entry "LINEONE\r\nLINETWO\r\n" &&
 	echo "sparse_entry text=auto" >.gitattributes &&
 	test_must_fail git add --renormalize sparse_entry 2>stderr &&
+	test_sparse_entry_unstaged &&
 	test_cmp error_and_hint stderr &&
 	test_sparse_entry_unchanged
 '
@@ -124,6 +135,7 @@ test_expect_success 'git add --dry-run --ignore-missing warn on sparse path' '
 	setup_sparse_entry &&
 	rm sparse_entry &&
 	test_must_fail git add --dry-run --ignore-missing sparse_entry 2>stderr &&
+	test_sparse_entry_unstaged &&
 	test_cmp error_and_hint stderr &&
 	test_sparse_entry_unchanged
 '
@@ -148,6 +160,7 @@ test_expect_success 'do not warn when pathspec matches dense entries' '
 test_expect_success 'add obeys advice.updateSparsePath' '
 	setup_sparse_entry &&
 	test_must_fail git -c advice.updateSparsePath=false add sparse_entry 2>stderr &&
+	test_sparse_entry_unstaged &&
 	test_cmp sparse_entry_error stderr
 
 '
-- 
2.33.0.vfs.signtest.55.g904c8365e91

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 9/8/2021 3:46 PM, Derrick Stolee wrote:
> On 8/27/2021 5:13 PM, Matheus Tavares wrote:
>> On Tue, Aug 24, 2021 at 6:54 PM Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com> wrote:
>>>
>>> Subject: [PATCH 06/13] add: skip paths that are outside sparse-checkout cone
>>
>> Perhaps this could be "skip _tracked_ paths that ..." (to help
>> differentiate the end goal of this patch from the previous one).
>>
>>> diff --git a/pathspec.c b/pathspec.c
>>> index 44306fdaca2..0e6e60fdc5a 100644
>>> --- a/pathspec.c
>>> +++ b/pathspec.c
>>> @@ -39,7 +39,8 @@ void add_pathspec_matches_against_index(const struct pathspec *pathspec,
>>>                 return;
>>>         for (i = 0; i < istate->cache_nr; i++) {
>>>                 const struct cache_entry *ce = istate->cache[i];
>>> -               if (sw_action == PS_IGNORE_SKIP_WORKTREE && ce_skip_worktree(ce))
>>> +               if (sw_action == PS_IGNORE_SKIP_WORKTREE &&
>>> +                   (ce_skip_worktree(ce) || !path_in_sparse_checkout(ce->name, istate)))
>>
>> Hmm, even though we skip the sparse paths here, cmd_add() will call
>> add_files_to_cache() at the end and still update these paths in the
>> index. I think there are two ways to fix this. We could either change
>> run_diff_files() to skip these paths (but I don't know how other callers
>> of this functions want to handle this, so maybe this needs to hide
>> behind an option flag):
> 
> You are absolutely right to point this out. I had missed this interaction.
> But, this is also already broken. The patch below adds a check to show that
> 'git add' does not add the sparse_entry, but it does (even when applied
> before any patch in this series). That is: all the modified tests fail
> after this change. I'll work to fix this issue before the next version of
> this series.

Of course, the reason for the failures is because the 'sparse_entry' is
staged as part of setup_sparse_entry. Not a bug.

This makes things more difficult to test, so I'll look around for an
alternative way to test that 'git add' is behaving correctly.

Thanks,
-Stolee

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 8/27/2021 5:13 PM, Matheus Tavares wrote:
> On Tue, Aug 24, 2021 at 6:54 PM Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com> wrote:
>>
>> Subject: [PATCH 06/13] add: skip paths that are outside sparse-checkout cone
> 
> Perhaps this could be "skip _tracked_ paths that ..." (to help
> differentiate the end goal of this patch from the previous one).
> 
>> diff --git a/pathspec.c b/pathspec.c
>> index 44306fdaca2..0e6e60fdc5a 100644
>> --- a/pathspec.c
>> +++ b/pathspec.c
>> @@ -39,7 +39,8 @@ void add_pathspec_matches_against_index(const struct pathspec *pathspec,
>>                 return;
>>         for (i = 0; i < istate->cache_nr; i++) {
>>                 const struct cache_entry *ce = istate->cache[i];
>> -               if (sw_action == PS_IGNORE_SKIP_WORKTREE && ce_skip_worktree(ce))
>> +               if (sw_action == PS_IGNORE_SKIP_WORKTREE &&
>> +                   (ce_skip_worktree(ce) || !path_in_sparse_checkout(ce->name, istate)))
> 
> Hmm, even though we skip the sparse paths here, cmd_add() will call
> add_files_to_cache() at the end and still update these paths in the
> index. I think there are two ways to fix this. We could either change
> run_diff_files() to skip these paths (but I don't know how other callers
> of this functions want to handle this, so maybe this needs to hide
> behind an option flag):

Ok, this sent me off on a tangent (see other replies) trying to show
that 'git add <sparse-path>' is still modifying index entries. I finally
added enough checks that this fails in my merge/cherry-pick/rebase tests
for conflicts outside of the sparse-checkout cone. Here is a test that I
can add to t3705 to get a failure:

test_expect_success 'git add fails outside of sparse-checkout definition' '
	test_when_finished git sparse-checkout disable &&
	test_commit a &&
	git sparse-checkout init &&
	git sparse-checkout set a &&

	git update-index --no-skip-worktree sparse_entry &&
	test_must_fail git add sparse_entry &&
	test_sparse_entry_unstaged
'
 
> diff --git a/diff-lib.c b/diff-lib.c
> index f9eadc4fc1..4245d7ead5 100644
> --- a/diff-lib.c
> +++ b/diff-lib.c
> @@ -198,7 +198,8 @@ int run_diff_files(struct rev_info *revs, unsigned int option)
>  				continue;
>  		}
>  
> -		if (ce_uptodate(ce) || ce_skip_worktree(ce))
> +		if (ce_uptodate(ce) || ce_skip_worktree(ce) ||
> +		    !path_in_sparse_checkout(ce->name, istate))
>  			continue;
>  
>  		/*
> 
> Or we could change the callback in add itself:
> 
> diff --git a/builtin/add.c b/builtin/add.c
> index f675bdeae4..3d7762aac2 100644
> --- a/builtin/add.c
> +++ b/builtin/add.c
> @@ -94,6 +94,10 @@ static void update_callback(struct diff_queue_struct *q,
>  	for (i = 0; i < q->nr; i++) {
>  		struct diff_filepair *p = q->queue[i];
>  		const char *path = p->one->path;
> +
> +		if (!path_in_sparse_checkout(path, &the_index))
> +			continue;
> +
>  		switch (fix_unmerged_status(p, data)) {
>  		default:
>  			die(_("unexpected diff status %c"), p->status);

If I use this second callback, then I have access to 'include_sparse'
in the later change that adds the --sparse option.

> I believe we also need to update a few other places to use the
> `(ce_skip_worktree(ce) || !path_in_sparse_checkout())` logic in order to
> avoid updating tracked sparse paths: chmod_pathspec() for add's --chmod
> option, renormalize_tracked_files() for --renormalize, and
> read-cache.c:refresh_index() for --refresh.

OK, I'll make sure to include and test these cases.
 
>>                         continue;
>>                 ce_path_match(istate, ce, pathspec, seen);
>>         }
> 
> Hmm, don't we also want to update
> find_pathspecs_matching_skip_worktree() in this patch to use
> path_in_sparse_checkout()? I see you did that in patch 8, but I think
> this should be together with this current patch as, without it, we stop
> adding tracked sparse paths but we print no error/advice message about
> it.

I'll merge the patches to avoid confusion.

Thanks,
-Stolee

@@ -9,7 +9,7 @@ SYNOPSIS
--------
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Matheus Tavares Bernardino wrote (reply to this):

On Tue, Aug 24, 2021 at 6:54 PM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> diff --git a/builtin/add.c b/builtin/add.c
> index 3a109276b74..68f2de80594 100644
> --- a/builtin/add.c
> +++ b/builtin/add.c
>
> @@ -46,7 +47,7 @@ static int chmod_pathspec(struct pathspec *pathspec, char flip, int show_only)
>                 struct cache_entry *ce = active_cache[i];
>                 int err;
>
> -               if (ce_skip_worktree(ce))
> +               if (!include_sparse && ce_skip_worktree(ce))
>                         continue;

Not related to this change but just as a reminder: if you agree with
my suggestion from the previous patch to use the skipping logic at
run_diff_files() or update_callback(), we also need to override it in
this patch when include_sparse is true.

@@ -39,7 +39,8 @@ void add_pathspec_matches_against_index(const struct pathspec *pathspec,
return;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Matheus Tavares Bernardino wrote (reply to this):

On Tue, Aug 24, 2021 at 6:54 PM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Derrick Stolee <dstolee@microsoft.com>
>
> When a merge results in a conflict outside of the sparse-checkout cone,
> the conflicted file is written to the working tree and the index entry
> loses the SKIP_WORKTREE bit. This allows users to add the file to the
> index without realizing that the file might leave the working tree in a
> later Git command.
>
> Block this behavior, but keep in mind that the user can override the
> failure using the '--sparse' option.

Hmm, didn't we already block this behavior at patch 6?

Nevertheless, as I mentioned there, I think the change to
find_pathspecs_matching_skip_worktree() from this patch should be
together with the other changes from 6.

@@ -72,6 +72,12 @@ For more details, see the 'pathspec' entry in linkgit:gitglossary[7].
--ignore-unmatch::
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Matheus Tavares Bernardino wrote (reply to this):

On Tue, Aug 24, 2021 at 6:54 PM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> Subject: [PATCH 09/12] rm: add --sparse option

Maybe mention in the commit message that, for now, rm's --sparse only
affects entries with the skip_worktree bit set? (Which will be changed
in the following patch.)

> --- a/builtin/rm.c
> +++ b/builtin/rm.c
> @@ -298,7 +300,8 @@ int cmd_rm(int argc, const char **argv, const char *prefix)
>         ensure_full_index(&the_index);
>         for (i = 0; i < active_nr; i++) {
>                 const struct cache_entry *ce = active_cache[i];
> -               if (ce_skip_worktree(ce))
> +
> +               if (!include_sparse && ce_skip_worktree(ce))
>                         continue;

OK, we no longer ignore the skip_worktree entry if --sparse is given ...

>                 if (!ce_path_match(&the_index, ce, &pathspec, seen))
>                         continue;
> @@ -322,7 +325,8 @@ int cmd_rm(int argc, const char **argv, const char *prefix)
>                                 seen_any = 1;
>                         else if (ignore_unmatch)
>                                 continue;
> -                       else if (matches_skip_worktree(&pathspec, i, &skip_worktree_seen))
> +                       else if (!include_sparse &&
> +                                matches_skip_worktree(&pathspec, i, &skip_worktree_seen))
>                                 string_list_append(&only_match_skip_worktree, original);

... and we also don't need to look for matches here as we won't
display the error/advice match.

>                         else
>                                 die(_("pathspec '%s' did not match any files"), original);
> diff --git a/t/t3602-rm-sparse-checkout.sh b/t/t3602-rm-sparse-checkout.sh
> index e9e9a15c74c..a34b978bfd8 100755
> --- a/t/t3602-rm-sparse-checkout.sh
> +++ b/t/t3602-rm-sparse-checkout.sh
> @@ -36,13 +36,25 @@ done
>
>  test_expect_success 'recursive rm does not remove sparse entries' '
>         git reset --hard &&
> -       git sparse-checkout set sub/dir &&
> +       git sparse-checkout set sub/dir/ &&

Is this change necessary?

>         git rm -r sub &&
>         git status --porcelain -uno >actual &&
>         echo "D  sub/dir/e" >expected &&
>         test_cmp expected actual
>  '
>
> +test_expect_success 'recursive rm --sparse removes sparse entries' '
> +       git reset --hard &&
> +       git sparse-checkout set "sub/dir" &&
> +       git rm --sparse -r sub &&
> +       git status --porcelain -uno >actual &&
> +       cat >expected <<-\EOF &&
> +       D  sub/d
> +       D  sub/dir/e
> +       EOF
> +       test_cmp expected actual
> +'

Nice!

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 8/27/2021 5:17 PM, Matheus Tavares Bernardino wrote:
> On Tue, Aug 24, 2021 at 6:54 PM Derrick Stolee via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
>>
>> Subject: [PATCH 09/12] rm: add --sparse option
> 
> Maybe mention in the commit message that, for now, rm's --sparse only
> affects entries with the skip_worktree bit set? (Which will be changed
> in the following patch.)

I will expand with these details.
 
>>  test_expect_success 'recursive rm does not remove sparse entries' '
>>         git reset --hard &&
>> -       git sparse-checkout set sub/dir &&
>> +       git sparse-checkout set sub/dir/ &&
> 
> Is this change necessary?

No, it is not. Thanks.

-Stolee

@@ -237,6 +237,7 @@ static int check_local_mod(struct object_id *head, int index_only)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Matheus Tavares Bernardino wrote (reply to this):

On Tue, Aug 24, 2021 at 6:54 PM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> diff --git a/builtin/rm.c b/builtin/rm.c
> index 4208f3f9a5f..a6da03da2be 100644
> --- a/builtin/rm.c
> +++ b/builtin/rm.c
> @@ -301,7 +301,9 @@ int cmd_rm(int argc, const char **argv, const char *prefix)
>         for (i = 0; i < active_nr; i++) {
>                 const struct cache_entry *ce = active_cache[i];
>
> -               if (!include_sparse && ce_skip_worktree(ce))
> +               if (!include_sparse &&
> +                   (ce_skip_worktree(ce) ||
> +                    !path_in_sparse_checkout(ce->name, &the_index)))
>                         continue;

OK. And we don't need to update the advice reporting code below,
because `matches_skip_worktree()` already uses the
`(ce_skip_worktree(ce) || !path_in_sparse_checkout())`  logic.

> +                    !path_in_sparse_checkout"

>                 if (!ce_path_match(&the_index, ce, &pathspec, seen))
>                         continue;
> diff --git a/t/t3602-rm-sparse-checkout.sh b/t/t3602-rm-sparse-checkout.sh
> index a34b978bfd8..44f3e923164 100755
> --- a/t/t3602-rm-sparse-checkout.sh
> +++ b/t/t3602-rm-sparse-checkout.sh
> @@ -87,4 +87,15 @@ test_expect_success 'do not warn about sparse entries with --ignore-unmatch' '
>         git ls-files --error-unmatch b
>  '
>
> +test_expect_success 'refuse to rm a non-skip-worktree path outside sparse cone' '
> +       git reset --hard &&
> +       git sparse-checkout set a &&
> +       git update-index --no-skip-worktree b &&
> +       test_must_fail git rm b 2>stderr &&
> +       test_cmp b_error_and_hint stderr &&
> +       git rm --sparse b 2>stderr &&
> +       test_must_be_empty stderr &&
> +       test_path_is_missing b
> +'

OK.

@@ -133,6 +133,7 @@ int cmd_mv(int argc, const char **argv, const char *prefix)
struct string_list src_for_dst = STRING_LIST_INIT_NODUP;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Matheus Tavares Bernardino wrote (reply to this):

On Tue, Aug 24, 2021 at 6:54 PM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> diff --git a/builtin/mv.c b/builtin/mv.c
> index c2f96c8e895..b58fd4ce5ba 100644
> --- a/builtin/mv.c
> +++ b/builtin/mv.c
> @@ -176,10 +177,22 @@ int cmd_mv(int argc, const char **argv, const char *prefix)
>                 const char *src = source[i], *dst = destination[i];
>                 int length, src_is_dir;
>                 const char *bad = NULL;
> +               int skip_sparse = 0;
>
>                 if (show_only)
>                         printf(_("Checking rename of '%s' to '%s'\n"), src, dst);
>
> +               if (!path_in_sparse_checkout(src, &the_index)) {

`git mv` can only move/rename tracked paths, but since we check
whether `src` is sparse before checking if it is in the index, the
user will get the sparse error message instead. This is OK, but the
advice might be misleading, as it says they can use `--sparse` if they
really want to move the file, but repeating the command with
`--sparse` will now fail for another reason. I wonder if we should
check whether `src` is tracked before checking if it is sparse, or if
that is not really an issue we should bother with.

> +                       string_list_append(&only_match_skip_worktree, src);
> +                       skip_sparse = 1;
> +               }
> +               if (!path_in_sparse_checkout(dst, &the_index)) {
> +                       string_list_append(&only_match_skip_worktree, dst);
> +                       skip_sparse = 1;
> +               }
> +               if (skip_sparse)
> +                       continue;
> +
>                 length = strlen(src);
>                 if (lstat(src, &st) < 0)
>                         bad = _("bad source");
>
> diff --git a/t/t7002-mv-sparse-checkout.sh b/t/t7002-mv-sparse-checkout.sh
> new file mode 100755
> index 00000000000..5397c6d07bd
> --- /dev/null
> +++ b/t/t7002-mv-sparse-checkout.sh
> @@ -0,0 +1,99 @@
> +#!/bin/sh
> +
> +test_description='git mv in sparse working trees'
> +
> +. ./test-lib.sh
> +
> +test_expect_success 'setup' "
> +       mkdir -p sub/dir sub/dir2 &&
> +       touch a b c sub/d sub/dir/e sub/dir2/e &&
> +       git add -A &&
> +       git commit -m files &&
> +
> +       cat >sparse_error_header <<-EOF &&
> +       The following pathspecs didn't match any eligible path, but they do match index
> +       entries outside the current sparse checkout:
> +       EOF
> +
> +       cat >sparse_hint <<-EOF
> +       hint: Disable or modify the sparsity rules if you intend to update such entries.
> +       hint: Disable this message with \"git config advice.updateSparsePath false\"
> +       EOF
> +"
> +
> +test_expect_success 'mv refuses to move sparse-to-sparse' '
> +       rm -f e &&

At first glance, it confused me a bit that we are removing `e` when
the setup didn't create it. But then I realized the test itself might
create `e` if `git mv` succeeds in moving the `b` file. Could perhaps
this and the others `rm -f e` be a `test_when_finished`, to make it
clearer that it is a cleanup?

> +       git reset --hard &&
> +       git sparse-checkout set a &&
> +       touch b &&
> +       test_must_fail git mv b e 2>stderr &&

Here we try to move a "tracked sparse path" to an "untracked sparse
path". Do we also want to test with a tracked to tracked operation?
(Although the code path will be the same, of course.)

> +       cat sparse_error_header >expect &&
> +       echo b >>expect &&
> +       echo e >>expect &&
> +       cat sparse_hint >>expect &&
> +       test_cmp expect stderr
> +'
> +
> +test_expect_success 'mv refuses to move sparse-to-sparse, ignores failure' '
> +       rm -f e &&
> +       git reset --hard &&
> +       git sparse-checkout set a &&
> +       touch b &&
> +       git mv -k b e 2>stderr &&

Maybe also check that `b` is still there, and `e` is missing?

> +       cat sparse_error_header >expect &&
> +       echo b >>expect &&
> +       echo e >>expect &&
> +       cat sparse_hint >>expect &&
> +       test_cmp expect stderr
> +'
> +
> +test_expect_success 'mv refuses to move non-sparse-to-sparse' '
> +       rm -f e &&
> +       git reset --hard &&
> +       git sparse-checkout set a &&
> +       test_must_fail git mv a e 2>stderr &&
> +       cat sparse_error_header >expect &&
> +       echo e >>expect &&
> +       cat sparse_hint >>expect &&
> +       test_cmp expect stderr
> +'

OK.

> +test_expect_success 'mv refuses to move sparse-to-non-sparse' '
> +       rm -f e &&
> +       git reset --hard &&
> +       git sparse-checkout set a e &&
> +       touch b &&
> +       test_must_fail git mv b e 2>stderr &&
> +       cat sparse_error_header >expect &&
> +       echo b >>expect &&
> +       cat sparse_hint >>expect &&
> +       test_cmp expect stderr
> +'

OK.

> +test_expect_success 'recursive mv refuses to move (possible) sparse' '
> +       rm -f e &&
> +       git reset --hard &&
> +       # Without cone mode, "sub" and "sub2" do not match
> +       git sparse-checkout set sub/dir sub2/dir &&
> +       test_must_fail git mv sub sub2 2>stderr &&
> +       cat sparse_error_header >expect &&
> +       echo sub >>expect &&
> +       echo sub2 >>expect &&
> +       cat sparse_hint >>expect &&
> +       test_cmp expect stderr
> +'
> +
> +test_expect_success 'recursive mv refuses to move sparse' '
> +       git reset --hard &&
> +       # Use cone mode so "sub/" matches the sparse-checkout patterns
> +       git sparse-checkout init --cone &&
> +       git sparse-checkout set sub/dir sub2/dir &&
> +       test_must_fail git mv sub sub2 2>stderr &&
> +       cat sparse_error_header >expect &&
> +       echo sub/dir2/e >>expect &&
> +       echo sub2/dir2/e >>expect &&
> +       cat sparse_hint >>expect &&
> +       test_cmp expect stderr
> +'
> +

Ah, these last two are very interesting cases!

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Matheus Tavares Bernardino wrote (reply to this):

On Fri, Aug 27, 2021 at 6:20 PM Matheus Tavares Bernardino
<matheus.bernardino@usp.br> wrote:
>
> On Tue, Aug 24, 2021 at 6:54 PM Derrick Stolee via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
> >
> > diff --git a/builtin/mv.c b/builtin/mv.c
> > index c2f96c8e895..b58fd4ce5ba 100644
> > --- a/builtin/mv.c
> > +++ b/builtin/mv.c
> > @@ -176,10 +177,22 @@ int cmd_mv(int argc, const char **argv, const char *prefix)
> >                 const char *src = source[i], *dst = destination[i];
> >                 int length, src_is_dir;
> >                 const char *bad = NULL;
> > +               int skip_sparse = 0;
> >
> >                 if (show_only)
> >                         printf(_("Checking rename of '%s' to '%s'\n"), src, dst);
> >
> > +               if (!path_in_sparse_checkout(src, &the_index)) {
>
> `git mv` can only move/rename tracked paths, but since we check
> whether `src` is sparse before checking if it is in the index, the
> user will get the sparse error message instead. This is OK, but the
> advice might be misleading, as it says they can use `--sparse` if they
> really want to move the file, but repeating the command with
> `--sparse` will now fail for another reason. I wonder if we should
> check whether `src` is tracked before checking if it is sparse, or if
> that is not really an issue we should bother with.

Another problem is that the displayed message will say that the
pathspecs "match index entries outside sparse checkout" even when the
path given to mv doesn't really exist:

git sparse-checkout set some/dir/
git mv nonexistent-file foo

The following pathspecs didn't match any eligible path, but they do match index
entries outside the current sparse checkout:
nonexistent-file
hint: Disable or modify the sparsity rules if you intend to update such entries.
hint: Disable this message with "git config advice.updateSparsePath false"

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 8/27/2021 5:20 PM, Matheus Tavares Bernardino wrote:
> On Tue, Aug 24, 2021 at 6:54 PM Derrick Stolee via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
>>
>> diff --git a/builtin/mv.c b/builtin/mv.c
>> index c2f96c8e895..b58fd4ce5ba 100644
>> --- a/builtin/mv.c
>> +++ b/builtin/mv.c
>> @@ -176,10 +177,22 @@ int cmd_mv(int argc, const char **argv, const char *prefix)
>>                 const char *src = source[i], *dst = destination[i];
>>                 int length, src_is_dir;
>>                 const char *bad = NULL;
>> +               int skip_sparse = 0;
>>
>>                 if (show_only)
>>                         printf(_("Checking rename of '%s' to '%s'\n"), src, dst);
>>
>> +               if (!path_in_sparse_checkout(src, &the_index)) {
> 
> `git mv` can only move/rename tracked paths, but since we check
> whether `src` is sparse before checking if it is in the index, the
> user will get the sparse error message instead. This is OK, but the
> advice might be misleading, as it says they can use `--sparse` if they
> really want to move the file, but repeating the command with
> `--sparse` will now fail for another reason. I wonder if we should
> check whether `src` is tracked before checking if it is sparse, or if
> that is not really an issue we should bother with.

I will move the logic to the last possible place before "accepting"
the move, then add a comment detailing why it should be there.

>> +                       string_list_append(&only_match_skip_worktree, src);
>> +                       skip_sparse = 1;
>> +               }
>> +               if (!path_in_sparse_checkout(dst, &the_index)) {
>> +                       string_list_append(&only_match_skip_worktree, dst);
>> +                       skip_sparse = 1;
>> +               }
>> +               if (skip_sparse)
>> +                       continue;
>> +
...
>> +test_expect_success 'mv refuses to move sparse-to-sparse' '
>> +       rm -f e &&
> 
> At first glance, it confused me a bit that we are removing `e` when
> the setup didn't create it. But then I realized the test itself might
> create `e` if `git mv` succeeds in moving the `b` file. Could perhaps
> this and the others `rm -f e` be a `test_when_finished`, to make it
> clearer that it is a cleanup?

test_when_finished is cleaner.

> 
>> +       git reset --hard &&
>> +       git sparse-checkout set a &&
>> +       touch b &&
>> +       test_must_fail git mv b e 2>stderr &&
> 
> Here we try to move a "tracked sparse path" to an "untracked sparse
> path". Do we also want to test with a tracked to tracked operation?
> (Although the code path will be the same, of course.)

I can expand these tests to include tracked and untracked targets.

>> +       cat sparse_error_header >expect &&
>> +       echo b >>expect &&
>> +       echo e >>expect &&
>> +       cat sparse_hint >>expect &&
>> +       test_cmp expect stderr
>> +'
>> +
>> +test_expect_success 'mv refuses to move sparse-to-sparse, ignores failure' '
>> +       rm -f e &&
>> +       git reset --hard &&
>> +       git sparse-checkout set a &&
>> +       touch b &&
>> +       git mv -k b e 2>stderr &&
> 
> Maybe also check that `b` is still there, and `e` is missing?

Good idea.

In fact, there is a problem that the '-k' gets around the
protections because it doesn't return 1 early. I'll fix this
by jumping to the end of the loop which removes the entries
from the arrays.

Thanks,
-Stolee

@@ -118,21 +118,23 @@ static int index_range_of_same_dir(const char *src, int length,
int cmd_mv(int argc, const char **argv, const char *prefix)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Matheus Tavares Bernardino wrote (reply to this):

On Tue, Aug 24, 2021 at 6:54 PM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> diff --git a/builtin/mv.c b/builtin/mv.c
> index b58fd4ce5ba..92ea9f0ca37 100644
> --- a/builtin/mv.c
> +++ b/builtin/mv.c
> @@ -118,17 +118,18 @@ static int index_range_of_same_dir(const char *src, int length,
>  int cmd_mv(int argc, const char **argv, const char *prefix)
>  {
>         int i, flags, gitmodules_modified = 0;
> -       int verbose = 0, show_only = 0, force = 0, ignore_errors = 0;
> +       int verbose = 0, show_only = 0, force = 0, ignore_errors = 0, ignore_sparse = 0;
>         struct option builtin_mv_options[] = {
>                 OPT__VERBOSE(&verbose, N_("be verbose")),
>                 OPT__DRY_RUN(&show_only, N_("dry run")),
>                 OPT__FORCE(&force, N_("force move/rename even if target exists"),
>                            PARSE_OPT_NOCOMPLETE),
>                 OPT_BOOL('k', NULL, &ignore_errors, N_("skip move/rename errors")),
> +               OPT_BOOL(0, "sparse", &ignore_sparse, N_("allow updating entries outside of the sparse-checkout cone")),

Should this include a doc update too?

>                 OPT_END(),
>         };
>         const char **source, **destination, **dest_path, **submodule_gitfile;
> -       enum update_mode { BOTH = 0, WORKING_DIRECTORY, INDEX } *modes;
> +       enum update_mode { BOTH = 0, WORKING_DIRECTORY, INDEX, SPARSE } *modes;
>         struct stat st;
>         struct string_list src_for_dst = STRING_LIST_INIT_NODUP;
>         struct lock_file lock_file = LOCK_INIT;
> @@ -182,11 +183,11 @@ int cmd_mv(int argc, const char **argv, const char *prefix)
>                 if (show_only)
>                         printf(_("Checking rename of '%s' to '%s'\n"), src, dst);
>
> -               if (!path_in_sparse_checkout(src, &the_index)) {
> +               if (!ignore_sparse && !path_in_sparse_checkout(src, &the_index)) {
>                         string_list_append(&only_match_skip_worktree, src);
>                         skip_sparse = 1;
>                 }
> -               if (!path_in_sparse_checkout(dst, &the_index)) {
> +               if (!ignore_sparse && !path_in_sparse_checkout(dst, &the_index)) {
>                         string_list_append(&only_match_skip_worktree, dst);
>                         skip_sparse = 1;
>                 }
> @@ -194,9 +195,11 @@ int cmd_mv(int argc, const char **argv, const char *prefix)
>                         continue;
>
>                 length = strlen(src);
> -               if (lstat(src, &st) < 0)
> -                       bad = _("bad source");
> -               else if (!strncmp(src, dst, length) &&
> +               if (lstat(src, &st) < 0) {
> +                       /* only error if existence is expected. */
> +                       if (modes[i] != SPARSE)
> +                               bad = _("bad source");

OK, so this is about the directories which contain sparse entries in
it, right? In this case, we don't expect such entries to be in the
working tree, so we don't error out if they are missing and still let
the parent directory be moved.

This made me wonder about a slightly different case: would it be
interesting to also allow `git mv --sparse` to rename a single sparse
entry even if it's not in the working tree? I mean, something like:

echo a >a
echo b >b
git add a b
git commit -m files
git sparse-checkout set a
git mv --sparse b c

This currently wouldn't be allowed because "b" is not in the working
tree ("fatal: bad source" error). But, perhaps, would it be
interesting to allow the index to be updated anyway?

> +               } else if (!strncmp(src, dst, length) &&
>                                 (dst[length] == 0 || dst[length] == '/')) {
>                         bad = _("can not move directory into itself");
>                 } else if ((src_is_dir = S_ISDIR(st.st_mode))
> @@ -225,11 +228,12 @@ int cmd_mv(int argc, const char **argv, const char *prefix)
>                                 dst_len = strlen(dst);
>
>                                 for (j = 0; j < last - first; j++) {
> -                                       const char *path = active_cache[first + j]->name;
> +                                       const struct cache_entry *ce = active_cache[first + j];
> +                                       const char *path = ce->name;
>                                         source[argc + j] = path;
>                                         destination[argc + j] =
>                                                 prefix_path(dst, dst_len, path + length + 1);
> -                                       modes[argc + j] = INDEX;
> +                                       modes[argc + j] = ce_skip_worktree(ce) ? SPARSE : INDEX;

OK, here we are assigning the SPARSE mode to sparse index entries that
are inside a directory we want to move. Later iterations of the loop
will then process these entries, see this mode, and not error out if
the files are missing.

>                                         submodule_gitfile[argc + j] = NULL;
>                                 }
>                                 argc += last - first;
> @@ -293,7 +297,7 @@ int cmd_mv(int argc, const char **argv, const char *prefix)
>                         printf(_("Renaming %s to %s\n"), src, dst);
>                 if (show_only)
>                         continue;
> -               if (mode != INDEX && rename(src, dst) < 0) {
> +               if (mode != INDEX && mode != SPARSE && rename(src, dst) < 0) {

And finally, we don't want to rename the SPARSE working tree file (if
it exists) because its parent directory was already moved.

>                         if (ignore_errors)
>                                 continue;
>                         die_errno(_("renaming '%s' failed"), src);
> diff --git a/t/t7002-mv-sparse-checkout.sh b/t/t7002-mv-sparse-checkout.sh
> index 5397c6d07bd..517fd587fa8 100755
> --- a/t/t7002-mv-sparse-checkout.sh
> +++ b/t/t7002-mv-sparse-checkout.sh
> @@ -31,7 +31,9 @@ test_expect_success 'mv refuses to move sparse-to-sparse' '
>         echo b >>expect &&
>         echo e >>expect &&
>         cat sparse_hint >>expect &&
> -       test_cmp expect stderr
> +       test_cmp expect stderr &&
> +       git mv --sparse b e 2>stderr &&
> +       test_must_be_empty stderr
>  '
>
>  test_expect_success 'mv refuses to move sparse-to-sparse, ignores failure' '
> @@ -44,7 +46,9 @@ test_expect_success 'mv refuses to move sparse-to-sparse, ignores failure' '
>         echo b >>expect &&
>         echo e >>expect &&
>         cat sparse_hint >>expect &&
> -       test_cmp expect stderr
> +       test_cmp expect stderr &&
> +       git mv --sparse -k b e 2>stderr &&
> +       test_must_be_empty stderr

Nit: Isn't this case a bit redundant considering the test before this
one? That is, with `--sparse` there should be no error for `-k` to
ignore, and in the test above it we already checked that this command
indeed succeeds with `--sparse`.

>
>  test_expect_success 'recursive mv refuses to move (possible) sparse' '
> @@ -80,7 +88,14 @@ test_expect_success 'recursive mv refuses to move (possible) sparse' '
>         echo sub >>expect &&
>         echo sub2 >>expect &&
>         cat sparse_hint >>expect &&
> -       test_cmp expect stderr
> +       test_cmp expect stderr &&
> +       git mv --sparse sub sub2 2>stderr &&
> +       test_must_be_empty stderr &&
> +       git commit -m "moved sub to sub2" &&
> +       git rev-parse HEAD~1:sub >expect &&
> +       git rev-parse HEAD:sub2 >actual &&
> +       test_cmp expect actual &&
> +       git reset --hard HEAD~1

Perhaps this could be a `test_when_finished` (maybe right after the
`git commit` invocation), so that we can restore the original state
for the next tests even if this one fails?

@gitgitgadget
Copy link

gitgitgadget bot commented Sep 27, 2021

On the Git mailing list, Elijah Newren wrote (reply to this):

On Fri, Sep 24, 2021 at 8:39 AM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
...
> Updates in v4
> =============
>
>  * Instead of using 'git status' and 'grep' to detect staged changes, we use
>    'git diff --staged'. t1092 uses an additional --diff-filter because it
>    tests with merge conflicts, so it needs this extra flag.
>
>  * Patches 3 and 4 are merged into the new patch 3 to avoid temporarily
>    having a poorly named method.
>
>
...
>
> Range-diff vs v3:
>
>   1:  ea940f10a7c !  1:  642b05fc020 t3705: test that 'sparse_entry' is unstaged
>      @@ t/t3705-add-sparse-checkout.sh: setup_gitignore () {
>        }
>
>       +test_sparse_entry_unstaged () {
>      -+ git status --porcelain >actual &&
>      -+ ! grep "^[MDARCU][M ] sparse_entry\$" actual
>      ++ git diff --staged -- sparse_entry >diff &&
>      ++ test_must_be_empty diff
>       +}
>       +
>        test_expect_success 'setup' "
>   2:  c7dedb41291 !  2:  58389edc76c t1092: behavior for adding sparse files
>      @@ t/t1092-sparse-checkout-compatibility.sh: test_sparse_match () {
>       + file=$1 &&
>       + for repo in sparse-checkout sparse-index
>       + do
>      -+         git -C $repo status --porcelain >$repo-out &&
>      -+         ! grep "^A  $file\$" $repo-out &&
>      -+         ! grep "^M  $file\$" $repo-out || return 1
>      ++         # Skip "unmerged" paths
>      ++         git -C $repo diff --staged --diff-filter=ACDMRTXB -- "$file" >diff &&

Wouldn't this be more naturally spelled as --diff-filter=u ? (Note:
lowercase 'u', not uppercase.)  Then you could drop the comment too.

Other than that nit, this round looks good to me.  Feel free to add a

Reviewed-by: Elijah Newren <newren@gmail.com>

@gitgitgadget
Copy link

gitgitgadget bot commented Sep 27, 2021

On the Git mailing list, Junio C Hamano wrote (reply to this):

Elijah Newren <newren@gmail.com> writes:

>>   2:  c7dedb41291 !  2:  58389edc76c t1092: behavior for adding sparse files
>>      @@ t/t1092-sparse-checkout-compatibility.sh: test_sparse_match () {
>>       + file=$1 &&
>>       + for repo in sparse-checkout sparse-index
>>       + do
>>      -+         git -C $repo status --porcelain >$repo-out &&
>>      -+         ! grep "^A  $file\$" $repo-out &&
>>      -+         ! grep "^M  $file\$" $repo-out || return 1
>>      ++         # Skip "unmerged" paths
>>      ++         git -C $repo diff --staged --diff-filter=ACDMRTXB -- "$file" >diff &&
>
> Wouldn't this be more naturally spelled as --diff-filter=u ? (Note:
> lowercase 'u', not uppercase.)  Then you could drop the comment too.

Excellent.

>
> Other than that nit, this round looks good to me.  Feel free to add a
>
> Reviewed-by: Elijah Newren <newren@gmail.com>

@gitgitgadget
Copy link

gitgitgadget bot commented Sep 27, 2021

This patch series was integrated into seen via git@79acbf9.

@gitgitgadget
Copy link

gitgitgadget bot commented Sep 28, 2021

There was a status update in the "Cooking" section about the branch ds/add-rm-with-sparse-index on the Git mailing list:

"git add", "git mv", and "git rm" have been adjusted to avoid
updating paths outside of the sparse-checkout definition unless
the user specifies a "--sparse" option.

Will merge to 'next'?

@gitgitgadget
Copy link

gitgitgadget bot commented Sep 28, 2021

This patch series was integrated into seen via git@1a35cff.

@gitgitgadget
Copy link

gitgitgadget bot commented Sep 28, 2021

This patch series was integrated into seen via git@cedd1c9.

@gitgitgadget
Copy link

gitgitgadget bot commented Sep 29, 2021

This patch series was integrated into seen via git@c5e54cf.

@gitgitgadget
Copy link

gitgitgadget bot commented Sep 29, 2021

This patch series was integrated into seen via git@86ffc5f.

@gitgitgadget
Copy link

gitgitgadget bot commented Sep 29, 2021

This patch series was integrated into seen via git@7374b78.

@gitgitgadget
Copy link

gitgitgadget bot commented Sep 29, 2021

This patch series was integrated into seen via git@465984d.

@gitgitgadget
Copy link

gitgitgadget bot commented Oct 1, 2021

There was a status update in the "Cooking" section about the branch ds/add-rm-with-sparse-index on the Git mailing list:

"git add", "git mv", and "git rm" have been adjusted to avoid
updating paths outside of the sparse-checkout definition unless
the user specifies a "--sparse" option.

Will merge to 'next'.

@gitgitgadget
Copy link

gitgitgadget bot commented Oct 2, 2021

This patch series was integrated into seen via git@7022f8a.

@gitgitgadget
Copy link

gitgitgadget bot commented Oct 4, 2021

This patch series was integrated into seen via git@95c46d3.

@gitgitgadget
Copy link

gitgitgadget bot commented Oct 4, 2021

There was a status update in the "Cooking" section about the branch ds/add-rm-with-sparse-index on the Git mailing list:

"git add", "git mv", and "git rm" have been adjusted to avoid
updating paths outside of the sparse-checkout definition unless
the user specifies a "--sparse" option.

Will merge to 'next'.

@gitgitgadget
Copy link

gitgitgadget bot commented Oct 6, 2021

This patch series was integrated into seen via git@fbae8f3.

@gitgitgadget
Copy link

gitgitgadget bot commented Oct 6, 2021

This patch series was integrated into next via git@80a9cda.

@gitgitgadget gitgitgadget bot added the next label Oct 6, 2021
@gitgitgadget
Copy link

gitgitgadget bot commented Oct 7, 2021

There was a status update in the "Cooking" section about the branch ds/add-rm-with-sparse-index on the Git mailing list:

"git add", "git mv", and "git rm" have been adjusted to avoid
updating paths outside of the sparse-checkout definition unless
the user specifies a "--sparse" option.

Will merge to 'master'.

@gitgitgadget
Copy link

gitgitgadget bot commented Oct 18, 2021

On the Git mailing list, Sean Christopherson wrote (reply to this):

On Sun, Sep 12, 2021, Derrick Stolee via GitGitGadget wrote:
> This series is based on ds/mergies-with-sparse-index.
> 
> As requested, this series looks to update the behavior of git add, git rm,
> and git mv when they attempt to modify paths outside of the sparse-checkout
> cone. In particular, this care is expanded to not just cache entries with
> the SKIP_WORKTREE bit, but also paths that do not match the sparse-checkout
> definition.

I suspect something in this series broke 'git add' and friends with "odd" sparse
definitions (I haven't actually bisected).  git 2.33.0 rejects attempts to add
files with the below sparse-checkout and modified files.  There appears to be a
discrepancy in the query vs. checkout logic as the rejected files are checked out
in the working tree, e.g. git sees that the local file was deleted, yet will not
stage the deletion.

There's also arguably a flaw in the "advise" trigger.  AFAICT, the help message
is displayed if and only if the entire path is excluded from the working tree.
In my perfect world, git would complain and advise if there are unstaged changes
for tracked files covered by the specified path.

Note, my sparse-checkout is very much the result of trial and error to get the
exact files I care about.  It's entirely possible I'm doing something weird, but
at the same time git itself is obviously confused.

Thanks!

$ cat .git/info/sparse-checkout
!arch/*
!tools/arch/*
!virt/kvm/arm/*
/*
arch/.gitignore
arch/Kconfig
arch/x86
tools/arch/x86
tools/include/uapi/linux/kvm.h
!Documentation
!drivers

$ git read-tree -mu HEAD

$ rm arch/x86/kvm/x86.c

$ git commit -a
On branch x86/kvm_find_cpuid_entry_index
Your branch is up to date with 'kvm/queue'.

You are in a sparse checkout with 40% of tracked files present.

Changes not staged for commit:
  (use "git add/rm <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
	deleted:    arch/x86/kvm/x86.c

no changes added to commit (use "git add" and/or "git commit -a")

$ git add arch

$ git add .

$ git add arch/x86
The following paths and/or pathspecs matched paths that exist
outside of your sparse-checkout definition, so will not be
updated in the index:
arch/x86
hint: If you intend to update such entries, try one of the following:
hint: * Use the --sparse option.
hint: * Disable or modify the sparsity rules.
hint: Disable this message with "git config advice.updateSparsePath false"

@gitgitgadget
Copy link

gitgitgadget bot commented Oct 18, 2021

User Sean Christopherson <seanjc@google.com> has been added to the cc: list.

@gitgitgadget
Copy link

gitgitgadget bot commented Oct 19, 2021

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 10/18/2021 5:28 PM, Sean Christopherson wrote:
> On Sun, Sep 12, 2021, Derrick Stolee via GitGitGadget wrote:
>> This series is based on ds/mergies-with-sparse-index.
>>
>> As requested, this series looks to update the behavior of git add, git rm,
>> and git mv when they attempt to modify paths outside of the sparse-checkout
>> cone. In particular, this care is expanded to not just cache entries with
>> the SKIP_WORKTREE bit, but also paths that do not match the sparse-checkout
>> definition.
> 
> I suspect something in this series broke 'git add' and friends with "odd" sparse
> definitions (I haven't actually bisected).  git 2.33.0 rejects attempts to add
> files with the below sparse-checkout and modified files.  There appears to be a
> discrepancy in the query vs. checkout logic as the rejected files are checked out
> in the working tree, e.g. git sees that the local file was deleted, yet will not
> stage the deletion.

Are you using v2.33.0? This change is not in that version.

However, mt/add-rm-in-sparse-checkout [1] was introduced in v2.33.0 and
introduced these advice suggestions.

[1] https://github.com/git/git/compare/a5828ae6b52137b913b978e16cd2334482eb4c1f...d5f4b8260f623d6fdef36d5eaa8a0c2350390472

The series you are commenting on goes even farther in restricting adds to
be within the sparse-checkout definitions, even for unstaged files or files
that removed the skip-worktree bit due to a merge conflict. It also creates
an override '--sparse' option that allows you to ignore these protections.

> There's also arguably a flaw in the "advise" trigger.  AFAICT, the help message
> is displayed if and only if the entire path is excluded from the working tree.
> In my perfect world, git would complain and advise if there are unstaged changes
> for tracked files covered by the specified path.
>> Note, my sparse-checkout is very much the result of trial and error to get the
> exact files I care about.  It's entirely possible I'm doing something weird, but
> at the same time git itself is obviously confused.
> 
> Thanks!
> 
> $ cat .git/info/sparse-checkout
> !arch/*
> !tools/arch/*
> !virt/kvm/arm/*
> /*
> arch/.gitignore
> arch/Kconfig
> arch/x86
> tools/arch/x86
> tools/include/uapi/linux/kvm.h
> !Documentation
> !drivers

Have you tried using 'arch/x86/' and 'tools/arch/x86/' to specify
that these are directories? Just a thought.

> $ git read-tree -mu HEAD
> 
> $ rm arch/x86/kvm/x86.c
> 
> $ git commit -a
...
> 	deleted:    arch/x86/kvm/x86.c

This is certainly odd. Worth more investigation that I don't have
time for at this moment.

Thanks,
-Stolee

@gitgitgadget
Copy link

gitgitgadget bot commented Oct 19, 2021

On the Git mailing list, Sean Christopherson wrote (reply to this):

On Tue, Oct 19, 2021, Derrick Stolee wrote:
> On 10/18/2021 5:28 PM, Sean Christopherson wrote:
> > On Sun, Sep 12, 2021, Derrick Stolee via GitGitGadget wrote:
> >> This series is based on ds/mergies-with-sparse-index.
> >>
> >> As requested, this series looks to update the behavior of git add, git rm,
> >> and git mv when they attempt to modify paths outside of the sparse-checkout
> >> cone. In particular, this care is expanded to not just cache entries with
> >> the SKIP_WORKTREE bit, but also paths that do not match the sparse-checkout
> >> definition.
> > 
> > I suspect something in this series broke 'git add' and friends with "odd" sparse
> > definitions (I haven't actually bisected).  git 2.33.0 rejects attempts to add
> > files with the below sparse-checkout and modified files.  There appears to be a
> > discrepancy in the query vs. checkout logic as the rejected files are checked out
> > in the working tree, e.g. git sees that the local file was deleted, yet will not
> > stage the deletion.
> 
> Are you using v2.33.0? This change is not in that version.

Hrm, it's an internal build that says v2.33.0 is the bsae, but the --sparse option
is available so who knows what's actually underneath the hood.  I can try vanilla
upstream builds if that would help narrow down the issue.

> However, mt/add-rm-in-sparse-checkout [1] was introduced in v2.33.0 and
> introduced these advice suggestions.
> 
> [1] https://github.com/git/git/compare/a5828ae6b52137b913b978e16cd2334482eb4c1f...d5f4b8260f623d6fdef36d5eaa8a0c2350390472
>
> The series you are commenting on goes even farther in restricting adds to
> be within the sparse-checkout definitions, even for unstaged files or files
> that removed the skip-worktree bit due to a merge conflict. It also creates
> an override '--sparse' option that allows you to ignore these protections.
> 
> > There's also arguably a flaw in the "advise" trigger.  AFAICT, the help message
> > is displayed if and only if the entire path is excluded from the working tree.
> > In my perfect world, git would complain and advise if there are unstaged changes
> > for tracked files covered by the specified path.
> >> Note, my sparse-checkout is very much the result of trial and error to get the
> > exact files I care about.  It's entirely possible I'm doing something weird, but
> > at the same time git itself is obviously confused.
> > 
> > Thanks!
> > 
> > $ cat .git/info/sparse-checkout
> > !arch/*
> > !tools/arch/*
> > !virt/kvm/arm/*
> > /*
> > arch/.gitignore
> > arch/Kconfig
> > arch/x86
> > tools/arch/x86
> > tools/include/uapi/linux/kvm.h
> > !Documentation
> > !drivers
> 
> Have you tried using 'arch/x86/' and 'tools/arch/x86/' to specify
> that these are directories? Just a thought.

Nice!  That workaround resolves the issue.  I vaguely recall intentionally omitting
the trailing slash, but adding it back doesn't seem to have any unwanted side effects
on the current git versions I'm using.

> > $ git read-tree -mu HEAD
> > 
> > $ rm arch/x86/kvm/x86.c
> > 
> > $ git commit -a
> ...
> > 	deleted:    arch/x86/kvm/x86.c
> 
> This is certainly odd. Worth more investigation that I don't have
> time for at this moment.

I've no objection to punting on this now that I have a workaround.  The man pages
are quite clear that sparse checkouts are still experimental and it's no trouble
for me to whine again if something breaks in the future :-)

Thanks again!

@gitgitgadget
Copy link

gitgitgadget bot commented Oct 20, 2021

On the Git mailing list, Junio C Hamano wrote (reply to this):

Sean Christopherson <seanjc@google.com> writes:

>> Are you using v2.33.0? This change is not in that version.
>
> Hrm, it's an internal build that says v2.33.0 is the bsae, but the --sparse option
> is available so who knows what's actually underneath the hood.  I can try vanilla
> upstream builds if that would help narrow down the issue.

$ git version

Guessing from the e-mail address, perhaps you are using something
derived from the next branch of the day, maintained by Jonathan
Nieder's group, for internal consumption at Google.

@gitgitgadget
Copy link

gitgitgadget bot commented Oct 20, 2021

On the Git mailing list, Sean Christopherson wrote (reply to this):

On Wed, Oct 20, 2021, Junio C Hamano wrote:
> Sean Christopherson <seanjc@google.com> writes:
> 
> >> Are you using v2.33.0? This change is not in that version.
> >
> > Hrm, it's an internal build that says v2.33.0 is the bsae, but the --sparse option
> > is available so who knows what's actually underneath the hood.  I can try vanilla
> > upstream builds if that would help narrow down the issue.
> 
> $ git version
> 
> Guessing from the e-mail address, perhaps you are using something
> derived from the next branch of the day, maintained by Jonathan
> Nieder's group, for internal consumption at Google.

That's more than likely the case.  2.33.0.1079.g6e70778dc9-goog

@gitgitgadget
Copy link

gitgitgadget bot commented Oct 22, 2021

On the Git mailing list, Matheus Tavares wrote (reply to this):

On Mon, Oct 18, 2021 at 6:28 PM Sean Christopherson <seanjc@google.com> wrote:
>
> $ cat .git/info/sparse-checkout
> !arch/*
> !tools/arch/*
> !virt/kvm/arm/*
> /*
> arch/.gitignore
> arch/Kconfig
> arch/x86
> tools/arch/x86
> tools/include/uapi/linux/kvm.h
> !Documentation
> !drivers
>
> $ git read-tree -mu HEAD
>
> $ rm arch/x86/kvm/x86.c
[...]
> $ git add arch/x86
> The following paths and/or pathspecs matched paths that exist
> outside of your sparse-checkout definition, so will not be
> updated in the index:
> arch/x86

I think the problem may be that we are performing pattern matching
slightly different in add, mv, and rm, in comparison to "git
sparse-checkout". On "git sparse-checkout init" (or reapply), we call
clear_ce_flags() which calls path_matches_pattern_list() for each
component of the working tree paths. If the full path gives a match
result of UNDECIDED, we recursively try to use the match result from
the parent dir (or NOT_MATCHED if we reach the top with UNDECIDED).

In Sean's example, we get UNDECIDED for "arch/x86/kvm/x86.c", but
"arch/x86" gives MATCHED, so we end up using that for the full path.

However, in add|mv|rm we only call path_matches_pattern_list() for the
full path and get UNDECIDED, which we consider the same as NOT_MATCHED,
and end up disallowing the path update operation with a warning message.

The commands do work if we replace the sparsity pattern "arch/x86" with
"arch/x86/" (with a trailing slash), but note that it only works
because the pattern is relative to the root (see dir.c:1297). If we
change it to "x86/", it would no longer work.

So far, the only way I could think of to fix this would be to perform
pattern matching for the leading components of the paths too. That
doesn't seem very nice, though, as it can probably be quite expensive...
But here is a patch for discussion:

-- >8 --
Subject: [RFC PATCH] add|rm|mv: fix bug that prevent the update of non-sparse dirs

These three commands recently learned to avoid updating paths that do
not match the sparse-checkout patterns even if they are missing the
SKIP_WORKTREE bit. This is done using path_in_sparse_checkout(), which
tries to match the path with the current set of sparsity rules using
path_matches_pattern_list(). This is similar to what clear_ce_flags()
does when we run "git sparse-checkout init" or "git sparse-checkout
reapply". But note that clear_ce_flags() has a recursive behavior,
calling path_matches_pattern_list() for each component in a path,
whereas path_in_sparse_checkout() only calls it for the full path. This
makes the function miss matches such as the one between path "a/b/c" and
the pattern "b/". So if the user has the sparsity rules "!/a" and "b/",
for example, add, rm, and mv will fail to update the path "a/b/c" and
end up displaying a warning about "a/b/c" being outside the sparse
checkout even though it isn't. Note that this problem only occurs with
non-cone mode.

Fix this by making path_in_sparse_checkout() perform pattern matching
for every component in the given path when cone mode is disabled. (This
can be expensive, and we might want to do some form of caching for the
match results of the leading components. However, this is not
implemented in this patch.) Also add two tests for each command (add,
rm, and mv) to check that they behave correctly with the said pattern
matching. The first test would previously fail without this patch, while
the second already succeeded. It is added mostly to make sure that we
are not breaking the existing pattern matching for directories that are
really sparse, and also as a protection against any future
regressions.

Note that two other existing tests had to be changed: one test in t3602
checks that "git rm -r <dir>" won't remove sparse entries, but it
didn't allow the non-sparse entries inside <dir> to be removed. The
other one, in t7002, tested that "git mv" would correctly display a
warning message for sparse paths, but it accidentally expected the
message to include two non-sparse paths as well.

Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
 dir.c                          | 33 ++++++++++++++++++++++++------
 t/t3602-rm-sparse-checkout.sh  | 37 +++++++++++++++++++++++++++++++---
 t/t3705-add-sparse-checkout.sh | 18 +++++++++++++++++
 t/t7002-mv-sparse-checkout.sh  | 28 +++++++++++++++++++++++--
 4 files changed, 105 insertions(+), 11 deletions(-)

diff --git a/dir.c b/dir.c
index a4306ab874..225487a59c 100644
--- a/dir.c
+++ b/dir.c
@@ -1504,8 +1504,9 @@ static int path_in_sparse_checkout_1(const char *path,
 				     struct index_state *istate,
 				     int require_cone_mode)
 {
-	const char *base;
 	int dtype = DT_REG;
+	enum pattern_match_result ret = NOT_MATCHED;
+	const char *p, *last_slash = NULL;
 
 	/*
 	 * We default to accepting a path if there are no patterns or
@@ -1516,11 +1517,31 @@ static int path_in_sparse_checkout_1(const char *path,
 	     !istate->sparse_checkout_patterns->use_cone_patterns))
 		return 1;
 
-	base = strrchr(path, '/');
-	return path_matches_pattern_list(path, strlen(path), base ? base + 1 : path,
-					 &dtype,
-					 istate->sparse_checkout_patterns,
-					 istate) > 0;
+	if (istate->sparse_checkout_patterns->use_cone_patterns) {
+		const char *base = strrchr(path, '/');
+		return path_matches_pattern_list(path, strlen(path),
+				base ? base + 1 : path, &dtype,
+				istate->sparse_checkout_patterns, istate) > 0;
+	}
+
+	for (p = path; ; p++) {
+		enum pattern_match_result match;
+
+		if (*p && *p != '/')
+			continue;
+
+		match  = path_matches_pattern_list(path, p - path,
+				last_slash ? last_slash + 1 : path, &dtype,
+				istate->sparse_checkout_patterns, istate);
+
+		if (match != UNDECIDED)
+			ret = match;
+		if (!*p)
+			break;
+		last_slash = p;
+	}
+
+	return ret;
 }
 
 int path_in_sparse_checkout(const char *path,
diff --git a/t/t3602-rm-sparse-checkout.sh b/t/t3602-rm-sparse-checkout.sh
index ecce497a9c..6e127b966e 100755
--- a/t/t3602-rm-sparse-checkout.sh
+++ b/t/t3602-rm-sparse-checkout.sh
@@ -40,14 +40,20 @@ done
 test_expect_success 'recursive rm does not remove sparse entries' '
 	git reset --hard &&
 	git sparse-checkout set sub/dir &&
-	test_must_fail git rm -r sub &&
-	git rm --sparse -r sub &&
+	git rm -r sub &&
 	git status --porcelain -uno >actual &&
 	cat >expected <<-\EOF &&
+	D  sub/dir/e
+	EOF
+	test_cmp expected actual &&
+
+	git rm --sparse -r sub &&
+	git status --porcelain -uno >actual2 &&
+	cat >expected2 <<-\EOF &&
 	D  sub/d
 	D  sub/dir/e
 	EOF
-	test_cmp expected actual
+	test_cmp expected2 actual2
 '
 
 test_expect_success 'recursive rm --sparse removes sparse entries' '
@@ -105,4 +111,29 @@ test_expect_success 'refuse to rm a non-skip-worktree path outside sparse cone'
 	test_path_is_missing b
 '
 
+test_expect_success 'can remove files from non-sparse dir' '
+	git reset --hard &&
+	git sparse-checkout disable &&
+	mkdir -p w x/y &&
+	test_commit w/f &&
+	test_commit x/y/f &&
+
+	git sparse-checkout set w !/x y &&
+	git rm w/f.t x/y/f.t 2>stderr &&
+	test_must_be_empty stderr
+'
+
+test_expect_success 'refuse to remove non-skip-worktree file from sparse dir' '
+	git reset --hard &&
+	git sparse-checkout disable &&
+	mkdir -p x/y/z &&
+	test_commit x/y/z/f &&
+	git sparse-checkout set !/x y !x/y/z &&
+
+	git update-index --no-skip-worktree x/y/z/f.t &&
+	test_must_fail git rm x/y/z/f.t 2>stderr &&
+	echo x/y/z/f.t | cat sparse_error_header - sparse_hint >expect &&
+	test_cmp expect stderr
+'
+
 test_done
diff --git a/t/t3705-add-sparse-checkout.sh b/t/t3705-add-sparse-checkout.sh
index 5b904988d4..63f888af12 100755
--- a/t/t3705-add-sparse-checkout.sh
+++ b/t/t3705-add-sparse-checkout.sh
@@ -214,4 +214,22 @@ test_expect_success 'add allows sparse entries with --sparse' '
 	test_must_be_empty stderr
 '
 
+test_expect_success 'can add files from non-sparse dir' '
+	git sparse-checkout set w !/x y &&
+	mkdir -p w x/y &&
+	touch w/f x/y/f &&
+	git add w/f x/y/f 2>stderr &&
+	test_must_be_empty stderr
+'
+
+test_expect_success 'refuse to add non-skip-worktree file from sparse dir' '
+	git sparse-checkout set !/x y !x/y/z &&
+	mkdir -p x/y/z &&
+	touch x/y/z/f &&
+	test_must_fail git add x/y/z/f 2>stderr &&
+	echo x/y/z/f | cat sparse_error_header - sparse_hint >expect &&
+	test_cmp expect stderr
+
+'
+
 test_done
diff --git a/t/t7002-mv-sparse-checkout.sh b/t/t7002-mv-sparse-checkout.sh
index 545748949a..91a857bf05 100755
--- a/t/t7002-mv-sparse-checkout.sh
+++ b/t/t7002-mv-sparse-checkout.sh
@@ -143,8 +143,6 @@ test_expect_success 'recursive mv refuses to move (possible) sparse' '
 	cat >>expect <<-\EOF &&
 	sub/d
 	sub2/d
-	sub/dir/e
-	sub2/dir/e
 	sub/dir2/e
 	sub2/dir2/e
 	EOF
@@ -186,4 +184,30 @@ test_expect_success 'recursive mv refuses to move sparse' '
 	git reset --hard HEAD~1
 '
 
+test_expect_success 'can move files to non-sparse dir' '
+	git reset --hard &&
+	git sparse-checkout init --no-cone &&
+	git sparse-checkout set a b c w !/x y &&
+	mkdir -p w x/y &&
+
+	git mv a w/new-a 2>stderr &&
+	git mv b x/y/new-b 2>stderr &&
+	test_must_be_empty stderr
+'
+
+test_expect_success 'refuse to move file to non-skip-worktree sparse path' '
+	git reset --hard &&
+	git sparse-checkout init --no-cone &&
+	git sparse-checkout set a !/x y !x/y/z &&
+	mkdir -p x/y/z &&
+
+	test_must_fail git mv a x/y/z/new-a 2>stderr &&
+	cat sparse_error_header >expect &&
+	cat >>expect <<-\EOF &&
+	x/y/z/new-a
+	EOF
+	cat sparse_hint >>expect &&
+	test_cmp expect stderr
+'
+
 test_done
-- 
2.33.0

@gitgitgadget
Copy link

gitgitgadget bot commented Oct 22, 2021

On the Git mailing list, Matheus Tavares wrote (reply to this):

On Thu, Oct 21, 2021 at 11:28 PM Matheus Tavares <matheus.bernardino@usp.br> wrote:
>
> diff --git a/dir.c b/dir.c
> index a4306ab874..225487a59c 100644
> --- a/dir.c
> +++ b/dir.c
> @@ -1516,11 +1517,31 @@ static int path_in_sparse_checkout_1(const char *path,
>  	     !istate->sparse_checkout_patterns->use_cone_patterns))
>  		return 1;
>  
> -	base = strrchr(path, '/');
> -	return path_matches_pattern_list(path, strlen(path), base ? base + 1 : path,
> -					 &dtype,
> -					 istate->sparse_checkout_patterns,
> -					 istate) > 0;
> +	if (istate->sparse_checkout_patterns->use_cone_patterns) {
> +		const char *base = strrchr(path, '/');
> +		return path_matches_pattern_list(path, strlen(path),
> +				base ? base + 1 : path, &dtype,
> +				istate->sparse_checkout_patterns, istate) > 0;
> +	}
> +
> +	for (p = path; ; p++) {
> +		enum pattern_match_result match;
> +
> +		if (*p && *p != '/')
> +			continue;
> +
> +		match  = path_matches_pattern_list(path, p - path,
> +				last_slash ? last_slash + 1 : path, &dtype,
> +				istate->sparse_checkout_patterns, istate);
> +
> +		if (match != UNDECIDED)
> +			ret = match;
> +		if (!*p)
> +			break;
> +		last_slash = p;
> +	}
> +
> +	return ret;
>  }

Of course, after hitting send I realized it would make a lot more sense
to start the pattern matching from the full path and only go backwards
through the parent dirs until we find the first non-UNDECIDED result.
I.e. something like this:

static int path_in_sparse_checkout_1(const char *path,
				     struct index_state *istate,
				     int require_cone_mode)
{
	int dtype = DT_REG;
	enum pattern_match_result ret;
	const char *p, *base;

	/*
	 * We default to accepting a path if there are no patterns or
	 * they are of the wrong type.
	 */
	if (init_sparse_checkout_patterns(istate) ||
	    (require_cone_mode &&
	     !istate->sparse_checkout_patterns->use_cone_patterns))
		return 1;

	if (istate->sparse_checkout_patterns->use_cone_patterns) {
		base = strrchr(path, '/');
		return path_matches_pattern_list(path, strlen(path),
				base ? base + 1 : path, &dtype,
				istate->sparse_checkout_patterns, istate) > 0;
	}

	/*
	 * If the match for the path is UNDECIDED, try to match the parent dir
	 * recursively.
	 */
	for (p = path + strlen(path); p && p > path; p = base) {
		base = memrchr(path, '/', p - path);
		ret = path_matches_pattern_list(path, p - path,
				base ? base + 1 : path, &dtype,
				istate->sparse_checkout_patterns, istate);

		if (ret != UNDECIDED)
			break;
	}

	return ret == UNDECIDED ? NOT_MATCHED : ret;
}

But I will let others comment on the overall idea and/or other
alternatives before sending a possible v2.

@gitgitgadget
Copy link

gitgitgadget bot commented Oct 25, 2021

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 10/21/2021 10:28 PM, Matheus Tavares wrote:
> On Mon, Oct 18, 2021 at 6:28 PM Sean Christopherson <seanjc@google.com> wrote:
>>
>> $ cat .git/info/sparse-checkout
>> !arch/*
>> !tools/arch/*
>> !virt/kvm/arm/*
>> /*
>> arch/.gitignore
>> arch/Kconfig
>> arch/x86
>> tools/arch/x86
>> tools/include/uapi/linux/kvm.h
>> !Documentation
>> !drivers
>>
>> $ git read-tree -mu HEAD
>>
>> $ rm arch/x86/kvm/x86.c
> [...]
>> $ git add arch/x86
>> The following paths and/or pathspecs matched paths that exist
>> outside of your sparse-checkout definition, so will not be
>> updated in the index:
>> arch/x86
> 
> I think the problem may be that we are performing pattern matching
> slightly different in add, mv, and rm, in comparison to "git
> sparse-checkout". On "git sparse-checkout init" (or reapply), we call
> clear_ce_flags() which calls path_matches_pattern_list() for each
> component of the working tree paths. If the full path gives a match
> result of UNDECIDED, we recursively try to use the match result from
> the parent dir (or NOT_MATCHED if we reach the top with UNDECIDED).

Yes! I think this is absolutely the problem. Thanks for pointing
this out!
 
> In Sean's example, we get UNDECIDED for "arch/x86/kvm/x86.c", but
> "arch/x86" gives MATCHED, so we end up using that for the full path.
> 
> However, in add|mv|rm we only call path_matches_pattern_list() for the
> full path and get UNDECIDED, which we consider the same as NOT_MATCHED,
> and end up disallowing the path update operation with a warning message.
> 
> The commands do work if we replace the sparsity pattern "arch/x86" with
> "arch/x86/" (with a trailing slash), but note that it only works
> because the pattern is relative to the root (see dir.c:1297). If we
> change it to "x86/", it would no longer work.
> 
> So far, the only way I could think of to fix this would be to perform
> pattern matching for the leading components of the paths too. That
> doesn't seem very nice, though, as it can probably be quite expensive...
> But here is a patch for discussion:

I agree that it is expensive, but that's already the case for the
non-cone sparse-checkout patterns. Hopefully it is sufficient that
these cases are restricted to modified files (in the case of `git add .`)
or specific pathspecs (in the case of `git mv` and `git rm`).

> -- >8 --
> Subject: [RFC PATCH] add|rm|mv: fix bug that prevent the update of non-sparse dirs
> 
> These three commands recently learned to avoid updating paths that do
> not match the sparse-checkout patterns even if they are missing the
> SKIP_WORKTREE bit. This is done using path_in_sparse_checkout(), which
> tries to match the path with the current set of sparsity rules using
> path_matches_pattern_list(). This is similar to what clear_ce_flags()
> does when we run "git sparse-checkout init" or "git sparse-checkout
> reapply". But note that clear_ce_flags() has a recursive behavior,
> calling path_matches_pattern_list() for each component in a path,
> whereas path_in_sparse_checkout() only calls it for the full path. This
> makes the function miss matches such as the one between path "a/b/c" and
> the pattern "b/". So if the user has the sparsity rules "!/a" and "b/",
> for example, add, rm, and mv will fail to update the path "a/b/c" and
> end up displaying a warning about "a/b/c" being outside the sparse
> checkout even though it isn't. Note that this problem only occurs with
> non-cone mode.
> 
> Fix this by making path_in_sparse_checkout() perform pattern matching
> for every component in the given path when cone mode is disabled. (This
> can be expensive, and we might want to do some form of caching for the
> match results of the leading components. However, this is not
> implemented in this patch.) Also add two tests for each command (add,
> rm, and mv) to check that they behave correctly with the said pattern
> matching. The first test would previously fail without this patch, while
> the second already succeeded. It is added mostly to make sure that we
> are not breaking the existing pattern matching for directories that are
> really sparse, and also as a protection against any future
> regressions.
> 
> Note that two other existing tests had to be changed: one test in t3602
> checks that "git rm -r <dir>" won't remove sparse entries, but it
> didn't allow the non-sparse entries inside <dir> to be removed. The
> other one, in t7002, tested that "git mv" would correctly display a
> warning message for sparse paths, but it accidentally expected the
> message to include two non-sparse paths as well.


> @@ -1504,8 +1504,9 @@ static int path_in_sparse_checkout_1(const char *path,
>  				     struct index_state *istate,
>  				     int require_cone_mode)
>  {
> -	const char *base;
>  	int dtype = DT_REG;
> +	enum pattern_match_result ret = NOT_MATCHED;
> +	const char *p, *last_slash = NULL;
>  
>  	/*
>  	 * We default to accepting a path if there are no patterns or
> @@ -1516,11 +1517,31 @@ static int path_in_sparse_checkout_1(const char *path,
>  	     !istate->sparse_checkout_patterns->use_cone_patterns))
>  		return 1;
>  
> -	base = strrchr(path, '/');
> -	return path_matches_pattern_list(path, strlen(path), base ? base + 1 : path,
> -					 &dtype,
> -					 istate->sparse_checkout_patterns,
> -					 istate) > 0;
> +	if (istate->sparse_checkout_patterns->use_cone_patterns) {
> +		const char *base = strrchr(path, '/');
> +		return path_matches_pattern_list(path, strlen(path),
> +				base ? base + 1 : path, &dtype,
> +				istate->sparse_checkout_patterns, istate) > 0;
> +	}
> +
> +	for (p = path; ; p++) {
> +		enum pattern_match_result match;
> +
> +		if (*p && *p != '/')
> +			continue;
> +
> +		match  = path_matches_pattern_list(path, p - path,
> +				last_slash ? last_slash + 1 : path, &dtype,
> +				istate->sparse_checkout_patterns, istate);
> +
> +		if (match != UNDECIDED)
> +			ret = match;
> +		if (!*p)
> +			break;
> +		last_slash = p;
> +	}
> +
> +	return ret;

This implementation makes sense to me.

>  test_expect_success 'recursive rm does not remove sparse entries' '
>  	git reset --hard &&
>  	git sparse-checkout set sub/dir &&
> -	test_must_fail git rm -r sub &&
> -	git rm --sparse -r sub &&
> +	git rm -r sub &&

Interesting that the new pattern-matching already presents a change of
behavior in this test case.

>  	git status --porcelain -uno >actual &&
>  	cat >expected <<-\EOF &&
> +	D  sub/dir/e
> +	EOF
> +	test_cmp expected actual &&

And here is why. Excellent. I suppose that setting the pattern to be
"sub/dir/" would have shown this behavior before.

> +
> +	git rm --sparse -r sub &&
> +	git status --porcelain -uno >actual2 &&
> +	cat >expected2 <<-\EOF &&
>  	D  sub/d
>  	D  sub/dir/e
>  	EOF
> -	test_cmp expected actual
> +	test_cmp expected2 actual2
>  '

The rest of the test cases add new checks that are very valuable.

I love this idea and I agree that it would be better to change the
loop direction to match the full path first (as you mention in your
response).

Thanks,
-Stolee

@@ -1294,7 +1294,7 @@ int match_pathname(const char *pathname, int pathlen,
* then our prefix match is all we need; we
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Glen Choo wrote (reply to this):


This patch changes the behavior of .gitignore such that directories are
now matched by prefix instead of matching exactly.

The failure that we observed is something like the following:

In "a/.gitignore", we have the pattern "git/". We should expect that
"a/git/foo" to be ignored because "git/" should be matched exactly.
However, "a/git-foo/bar" is also ignored because "git-foo" matches the
prefix.

I'll prepare a test case for this as soon as I figure out how to write
it..

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

Glen Choo <chooglen@google.com> writes:

> This patch changes the behavior of .gitignore such that directories are
> now matched by prefix instead of matching exactly.
>
> The failure that we observed is something like the following:
>
> In "a/.gitignore", we have the pattern "git/". We should expect that
> "a/git/foo" to be ignored because "git/" should be matched exactly.
> However, "a/git-foo/bar" is also ignored because "git-foo" matches the
> prefix.
>
> I'll prepare a test case for this as soon as I figure out how to write
> it..

FWIW, reverting this commit (and nothing else) from 'master' does not
seem to break any test.  That does not necessarily mean much (it may
be indicating a gap in the test coverage).

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 11/1/2021 8:34 PM, Junio C Hamano wrote:
> Glen Choo <chooglen@google.com> writes:
> 
>> This patch changes the behavior of .gitignore such that directories are
>> now matched by prefix instead of matching exactly.

Thank you for pointing out an unintended consequence.

>> The failure that we observed is something like the following:
>>
>> In "a/.gitignore", we have the pattern "git/". We should expect that
>> "a/git/foo" to be ignored because "git/" should be matched exactly.
>> However, "a/git-foo/bar" is also ignored because "git-foo" matches the
>> prefix.
>>
>> I'll prepare a test case for this as soon as I figure out how to write
>> it..
> 
> FWIW, reverting this commit (and nothing else) from 'master' does not
> seem to break any test.  That does not necessarily mean much (it may
> be indicating a gap in the test coverage).

Hm. I definitely seemed confident in my commit message that this change
was important for a test I was introducing at some point. Let me revert
it in microsoft/git and see if our Scalar functional tests find an
issue.

In the meantime, I'll try to create a Git test that demonstrates a
problem one way or another.

Thanks,
-Stolee

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 11/2/2021 9:42 AM, Derrick Stolee wrote:
> On 11/1/2021 8:34 PM, Junio C Hamano wrote:
>> Glen Choo <chooglen@google.com> writes:
>>
>>> This patch changes the behavior of .gitignore such that directories are
>>> now matched by prefix instead of matching exactly.
> 
> Thank you for pointing out an unintended consequence.
> 
>>> The failure that we observed is something like the following:
>>>
>>> In "a/.gitignore", we have the pattern "git/". We should expect that
>>> "a/git/foo" to be ignored because "git/" should be matched exactly.
>>> However, "a/git-foo/bar" is also ignored because "git-foo" matches the
>>> prefix.
>>>
>>> I'll prepare a test case for this as soon as I figure out how to write
>>> it..
...
> In the meantime, I'll try to create a Git test that demonstrates a
> problem one way or another.

I created a test, but had some trouble reproducing it due to some
subtleties higher in the call stack. Here is a patch that reverts
the change and adds some tests.

The Scalar functional tests passed with the revert, so the original
patch was worthless to begin with. I don't recall what motivated the
change, but clearly it was a mistake. Sorry.

---- >8 ----

From d1cfc8efeab015273bfebd6cd93465e6f38dc40f Mon Sep 17 00:00:00 2001
From: Derrick Stolee <dstolee@microsoft.com>
Date: Tue, 2 Nov 2021 10:40:06 -0400
Subject: [PATCH] dir: fix directory-matching bug

This reverts the change from ed49584 (dir: fix pattern matching on dirs,
2021-09-24), which claimed to fix a directory-matching problem without a
test case. It turns out to _create_ a bug, but it is a bit subtle.

The bug would have been revealed by the first of two tests being added to
t0008-ignores.sh. The first uses a pattern "/git/" inside the a/.gitignores
file, which matches against 'a/git/foo' but not 'a/git-foo/bar'. This test
would fail before the revert.

The second test shows what happens if the test instead uses a pattern "git/"
and this test passes both before and after the revert.

The difference in these two cases are due to how
last_matching_pattern_from_list() checks patterns both if they have the
PATTERN_FLAG_MUSTBEDIR and PATTERN_FLAG_NODIR flags. In the case of "git/",
the PATTERN_FLAG_NODIR is also provided, making the change in behavior in
match_pathname() not affect the end result of
last_matching_pattern_from_list().

Reported-by: Glen Choo <chooglen@google.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 dir.c              |  2 +-
 t/t0008-ignores.sh | 26 ++++++++++++++++++++++++++
 2 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/dir.c b/dir.c
index c6d7a8647b9..94489298f4c 100644
--- a/dir.c
+++ b/dir.c
@@ -1294,7 +1294,7 @@ int match_pathname(const char *pathname, int pathlen,
 		 * then our prefix match is all we need; we
 		 * do not need to call fnmatch at all.
 		 */
-		if (!patternlen && (!namelen || (flags & PATTERN_FLAG_MUSTBEDIR)))
+		if (!patternlen && !namelen)
 			return 1;
 	}
 
diff --git a/t/t0008-ignores.sh b/t/t0008-ignores.sh
index 532637de882..1889cfc60e0 100755
--- a/t/t0008-ignores.sh
+++ b/t/t0008-ignores.sh
@@ -803,6 +803,32 @@ test_expect_success 'existing directory and file' '
 	grep top-level-dir actual
 '
 
+test_expect_success 'exact prefix matching (with root)' '
+	test_when_finished rm -r a &&
+	mkdir -p a/git a/git-foo &&
+	touch a/git/foo a/git-foo/bar &&
+	echo /git/ >a/.gitignore &&
+	git check-ignore a/git a/git/foo a/git-foo a/git-foo/bar >actual &&
+	cat >expect <<-\EOF &&
+	a/git
+	a/git/foo
+	EOF
+	test_cmp expect actual
+'
+
+test_expect_success 'exact prefix matching (without root)' '
+	test_when_finished rm -r a &&
+	mkdir -p a/git a/git-foo &&
+	touch a/git/foo a/git-foo/bar &&
+	echo git/ >a/.gitignore &&
+	git check-ignore a/git a/git/foo a/git-foo a/git-foo/bar >actual &&
+	cat >expect <<-\EOF &&
+	a/git
+	a/git/foo
+	EOF
+	test_cmp expect actual
+'
+
 ############################################################################
 #
 # test whitespace handling
-- 
2.34.0.vfs.0.0.rc0.dirty

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Ævar Arnfjörð Bjarmason wrote (reply to this):


On Tue, Nov 02 2021, Derrick Stolee wrote:

> On 11/2/2021 9:42 AM, Derrick Stolee wrote:
>> On 11/1/2021 8:34 PM, Junio C Hamano wrote:
>>> Glen Choo <chooglen@google.com> writes:
>>>
>>>> This patch changes the behavior of .gitignore such that directories are
>>>> now matched by prefix instead of matching exactly.
>> 
>> Thank you for pointing out an unintended consequence.
>> 
>>>> The failure that we observed is something like the following:
>>>>
>>>> In "a/.gitignore", we have the pattern "git/". We should expect that
>>>> "a/git/foo" to be ignored because "git/" should be matched exactly.
>>>> However, "a/git-foo/bar" is also ignored because "git-foo" matches the
>>>> prefix.
>>>>
>>>> I'll prepare a test case for this as soon as I figure out how to write
>>>> it..
> ...
>> In the meantime, I'll try to create a Git test that demonstrates a
>> problem one way or another.
>
> I created a test, but had some trouble reproducing it due to some
> subtleties higher in the call stack. Here is a patch that reverts
> the change and adds some tests.
>
> The Scalar functional tests passed with the revert, so the original
> patch was worthless to begin with. I don't recall what motivated the
> change, but clearly it was a mistake. Sorry.
>
> ---- >8 ----
>
> From d1cfc8efeab015273bfebd6cd93465e6f38dc40f Mon Sep 17 00:00:00 2001
> From: Derrick Stolee <dstolee@microsoft.com>
> Date: Tue, 2 Nov 2021 10:40:06 -0400
> Subject: [PATCH] dir: fix directory-matching bug
>
> This reverts the change from ed49584 (dir: fix pattern matching on dirs,
> 2021-09-24), which claimed to fix a directory-matching problem without a
> test case. It turns out to _create_ a bug, but it is a bit subtle.
>
> The bug would have been revealed by the first of two tests being added to
> t0008-ignores.sh. The first uses a pattern "/git/" inside the a/.gitignores
> file, which matches against 'a/git/foo' but not 'a/git-foo/bar'. This test
> would fail before the revert.
>
> The second test shows what happens if the test instead uses a pattern "git/"
> and this test passes both before and after the revert.
>
> The difference in these two cases are due to how
> last_matching_pattern_from_list() checks patterns both if they have the
> PATTERN_FLAG_MUSTBEDIR and PATTERN_FLAG_NODIR flags. In the case of "git/",
> the PATTERN_FLAG_NODIR is also provided, making the change in behavior in
> match_pathname() not affect the end result of
> last_matching_pattern_from_list().
>
> Reported-by: Glen Choo <chooglen@google.com>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  dir.c              |  2 +-
>  t/t0008-ignores.sh | 26 ++++++++++++++++++++++++++
>  2 files changed, 27 insertions(+), 1 deletion(-)
>
> diff --git a/dir.c b/dir.c
> index c6d7a8647b9..94489298f4c 100644
> --- a/dir.c
> +++ b/dir.c
> @@ -1294,7 +1294,7 @@ int match_pathname(const char *pathname, int pathlen,
>  		 * then our prefix match is all we need; we
>  		 * do not need to call fnmatch at all.
>  		 */
> -		if (!patternlen && (!namelen || (flags & PATTERN_FLAG_MUSTBEDIR)))
> +		if (!patternlen && !namelen)
>  			return 1;
>  	}
>  
> diff --git a/t/t0008-ignores.sh b/t/t0008-ignores.sh
> index 532637de882..1889cfc60e0 100755
> --- a/t/t0008-ignores.sh
> +++ b/t/t0008-ignores.sh
> @@ -803,6 +803,32 @@ test_expect_success 'existing directory and file' '
>  	grep top-level-dir actual
>  '
>  
> +test_expect_success 'exact prefix matching (with root)' '
> +	test_when_finished rm -r a &&
> +	mkdir -p a/git a/git-foo &&
> +	touch a/git/foo a/git-foo/bar &&
> +	echo /git/ >a/.gitignore &&
> +	git check-ignore a/git a/git/foo a/git-foo a/git-foo/bar >actual &&
> +	cat >expect <<-\EOF &&
> +	a/git
> +	a/git/foo
> +	EOF
> +	test_cmp expect actual
> +'
> +
> +test_expect_success 'exact prefix matching (without root)' '
> +	test_when_finished rm -r a &&
> +	mkdir -p a/git a/git-foo &&
> +	touch a/git/foo a/git-foo/bar &&
> +	echo git/ >a/.gitignore &&
> +	git check-ignore a/git a/git/foo a/git-foo a/git-foo/bar >actual &&
> +	cat >expect <<-\EOF &&
> +	a/git
> +	a/git/foo
> +	EOF
> +	test_cmp expect actual
> +'
> +
>  ############################################################################
>  #
>  # test whitespace handling

We have t3070-wildmatch.sh testing various combinations of these, and
indeed this code resolves back to wildmatch().

But I think in this case this is due to dir.c's particular behavior of
splitting paths before feeding them to wildmatch, as it needs to match
things relative to the subdirectory.

Still, we've got a matrix of these in t3070-wildmatch.sh, which already
tests some (but apparently not all) cases where we need to create an
actual file on disk. These sorts of test blindspots are why I added that
in de8bada2bf6 (wildmatch test: create & test files on disk in addition
to in-memory, 2018-01-30).

Wouldn't it be better & more exhaustive here to change its test lines
like:


    match 0 0 1 1 foo/bar/baz 'bar'

To say:

    match 0 0 1 1 ? ? foo/bar/baz 'bar'

And just add to its match() function so that if we have a subject with a
slash, we mkdir up to that first slash (here: "mkdir foo"), and create a
.gitignore file therein with the "foo" directory with the "bar" content,
perhaps adding "/bar", "bar/" and "/bar/" variants implicitly.

Then create a "bar.txt" in the directory as well, and a
bar-otherdir/somefile.txt or whatever.

And fill in the "? ?" depending on whether those variants matched or
not...

Anyway, just an idea, but if you pursue that you should be able to get
much more exhaustive testing in this area that we've apparently had a
blindspot in.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 11/2/2021 11:33 AM, Ævar Arnfjörð Bjarmason wrote:
> 
> We have t3070-wildmatch.sh testing various combinations of these, and
> indeed this code resolves back to wildmatch().
> 
> But I think in this case this is due to dir.c's particular behavior of
> splitting paths before feeding them to wildmatch, as it needs to match
> things relative to the subdirectory.
> 
> Still, we've got a matrix of these in t3070-wildmatch.sh, which already
> tests some (but apparently not all) cases where we need to create an
> actual file on disk. These sorts of test blindspots are why I added that
> in de8bada2bf6 (wildmatch test: create & test files on disk in addition
> to in-memory, 2018-01-30).
> 
> Wouldn't it be better & more exhaustive here to change its test lines
> like:
> 
> 
>     match 0 0 1 1 foo/bar/baz 'bar'
> 
> To say:
> 
>     match 0 0 1 1 ? ? foo/bar/baz 'bar'
> 
> And just add to its match() function so that if we have a subject with a
> slash, we mkdir up to that first slash (here: "mkdir foo"), and create a
> .gitignore file therein with the "foo" directory with the "bar" content,
> perhaps adding "/bar", "bar/" and "/bar/" variants implicitly.
> 
> Then create a "bar.txt" in the directory as well, and a
> bar-otherdir/somefile.txt or whatever.
> 
> And fill in the "? ?" depending on whether those variants matched or
> not...
> 
> Anyway, just an idea, but if you pursue that you should be able to get
> much more exhaustive testing in this area that we've apparently had a
> blindspot in.

Those tests are quite exhaustive, but the test script is pretty
inscrutable and the refactor you suggest is pretty major. I'd prefer
to keep to the focused test for the sake of fixing this in time for
the release.

Thanks,
-Stolee

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

Derrick Stolee <stolee@gmail.com> writes:

> Those tests are quite exhaustive, but the test script is pretty
> inscrutable and the refactor you suggest is pretty major. I'd prefer
> to keep to the focused test for the sake of fixing this in time for
> the release.

Thanks, both.

@gitgitgadget
Copy link

gitgitgadget bot commented Nov 2, 2021

User Glen Choo <chooglen@google.com> has been added to the cc: list.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

1 participant