Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sparse index: integrate with 'read-tree' #1157

Closed
wants to merge 8 commits into from
Closed

Sparse index: integrate with 'read-tree' #1157

wants to merge 8 commits into from

Conversation

vdye
Copy link

@vdye vdye commented Feb 22, 2022

Like previous integrations [1] [2], this series allows 'git read-tree' to operate using a sparse index.

The first three patches are bugfixes for issues found while implementing the 'read-tree' integration:

  • The first (patch [1/8]) fixes an edge case in which a repo with no in-cone files or directories would have its root collapsed into a sparse directory; the fix ensures the root directory is never collapsed into a sparse directory.
  • The second (patch [2/8]) corrects the 'git status' reporting of changes nested inside the subdirectory of a sparse directory, ensuring that the modified file (not the subdirectory) is correctly reported as having changes.
  • The third (patch [3/8]) explicitly disallows the somewhat ambiguous case of using 'git read-tree --prefix=/' to represent the repo root (it could be interpreted as a leading slash - even though it's actually trailing slash - which is not allowed for prefixes). Now, an error specifying that prefixes cannot begin with '/' guides users more directly to the correct syntax. If a user does want to specify the repo root as their prefix, that can still be done with 'git read-tree --prefix='

The remainder of the series focuses on utilizing the sparse index in 'git read-tree'. After some baseline behavior-establishing tests (patch [4/8]), sparse index usage is trivially enabled (patch [5/8]) for 'read-tree' except:

  • usage with '--prefix'
  • two- and three-way merge

These cases require additional changes in order to work as expected (i.e., outwardly matching non-sparse index sparse-checkout). For the former, the sparse index can be enabled as long as the index is expanded when the prefix is a directory outside the sparse cone (patch [6/8]). For the latter, sparse directories that cannot be trivially merged must have their contents merged file-by-file, done by recursively traversing the trees represented by the sparse directories (patches [7/8] & [8/8]).

Changes since V2

  • revised/clarified explanation for why the 'recursive' flag is necessary in 'git status' internal diff in both the commit message for [2/8] and the code comments
  • added patch [3/8] to disallow '/' as a prefix
  • moved the loop over different 'read-tree' scenarios outside the test in t1092 test 'read-tree --merge with files outside sparse definition', separating the tests into individually-runnable and traceable cases
  • improved test case clarity and variety in 'read-tree --prefix' tests
  • removed redundant arg to 'git reset' in 'sparse index is not expanded: read-tree'
  • pointed out pre-existing testing that covers '--prefix' in [6/8] commit message
  • moved prefix-dependent index expansion logic into function 'update_sparsity_for_prefix', improved explanation for function behavior in comments
  • pointed out pre-existing testing that covers two- and three-way merge in [7/8] & [8/8] commit messages, respectively
  • added test cases for "ensure not expanded" tests of read-tree for both two- and three-way merge
  • fixed issue with three-way merge implementation (if the "no merge" cases described in 't/t1000-read-tree-m-3way.sh' are reached, sparse directories should be recursively merged), testing added in the "ensure not expanded" cases

Changes since V1

  • switched an empty string check from '!strlen(path)' to the slightly-less-expensive '!*path'

Thanks!

  • Victoria

[1] https://lore.kernel.org/git/pull.1109.v2.git.1641924306.gitgitgadget@gmail.com/
[2] https://lore.kernel.org/git/pull.1048.v6.git.1638201164.gitgitgadget@gmail.com/

cc: newren@gmail.com
cc: gitster@pobox.com
cc: Derrick Stolee derrickstolee@github.com
cc: Ævar Arnfjörð Bjarmason avarab@gmail.com
cc: Glen Choo chooglen@google.com

@gitgitgadget
Copy link

gitgitgadget bot commented Feb 22, 2022

There are issues in commit e30124b:
read-tree: integrate with sparse index
Lines in the body of the commit messages should be wrapped between 60 and 76 characters.

@gitgitgadget
Copy link

gitgitgadget bot commented Feb 22, 2022

There are issues in commit e894f3c:
read-tree: narrow scope of index expansion for --prefix``
Lines in the body of the commit messages should be wrapped between 60 and 76 characters.

@vdye vdye changed the base branch from next to master February 23, 2022 17:34
@vdye
Copy link
Author

vdye commented Feb 23, 2022

/submit

@gitgitgadget
Copy link

gitgitgadget bot commented Feb 23, 2022

Submitted as pull.1157.git.1645640717.gitgitgadget@gmail.com

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git/ pr-1157/vdye/sparse/read-tree-v1

To fetch this version to local tag pr-1157/vdye/sparse/read-tree-v1:

git fetch --no-tags https://github.com/gitgitgadget/git/ tag pr-1157/vdye/sparse/read-tree-v1

@@ -1463,10 +1463,11 @@ static int path_in_sparse_checkout_1(const char *path,
const char *end, *slash;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 2/23/2022 1:25 PM, Victoria Dye via GitGitGadget wrote:
> From: Victoria Dye <vdye@github.com>

Aside: It looks like you have the same "mailto:" email addresses in the
CC similar to a mistake I made. The root cause of my issue was that I
copy-pasted from the rendered description of another PR instead of
copying the plaintext available by clicking the "Edit" button. Perhaps
this is worth extending GGG to prevent this issue in the future.

> Prevent the repository root from being collapsed into a sparse directory by
> treating an empty path as "inside the sparse-checkout". When collapsing a
> sparse index (e.g. in 'git sparse-checkout reapply'), the root directory
> typically could not become a sparse directory due to the presence of in-cone
> root-level files and directories. However, if no such in-cone files or
> directories were present, there was no explicit check signaling that the
> "repository root path" (an empty string, in the case of
> 'convert_to_sparse(...)') was in-cone, and a sparse directory index entry
> would be created from the repository root directory.
> 
> The documentation in Documentation/git-sparse-checkout.txt explicitly states
> that the files in the root directory are expected to be in-cone for a
> cone-mode sparse-checkout. Collapsing the root into a sparse directory entry
> violates that assumption, as sparse directory entries are expected to be
> outside the sparse cone and have SKIP_WORKTREE enabled. This invalid state
> in turn causes issues with commands that interact with the index, e.g.
> 'git status'.
> 
> Treating an empty (root) path as in-cone prevents the creation of a root
> sparse directory in 'convert_to_sparse(...)'. Because the repository root is
> otherwise never compared with sparse patterns (in both cone-mode and
> non-cone sparse-checkouts), the new check does not cause additional changes
> to how sparse patterns are applied.

Good catch! I agree with everything said here.

> +	if (!strlen(path) ||

Instead of a full strlen(), could we just check "if (!*path ||" to
look for the nul byte?

Thanks,
-Stolee

@gitgitgadget
Copy link

gitgitgadget bot commented Feb 24, 2022

User Derrick Stolee <derrickstolee@github.com> has been added to the cc: list.

@gitgitgadget
Copy link

gitgitgadget bot commented Feb 24, 2022

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 2/23/2022 1:25 PM, Victoria Dye via GitGitGadget wrote:
> Like previous integrations [1] [2], this series allows 'git read-tree' to
> operate using a sparse index.

(Fixed the CCs on this reply, too.)
 
> The first two patches are bugfixes for issues found while implementing the
> 'read-tree' integration:
> 
>  * The first (patch 1/7) fixes an edge case in which a repo with no in-cone
>    files or directories would have its root collapsed into a sparse
>    directory; the fix ensures the root directory is never collapsed into a
>    sparse directory.
>  * The second (patch 2/7) corrects the 'git status' reporting of changes
>    nested inside the subdirectory of a sparse directory, ensuring that the
>    modified file (not the subdirectory) is correctly reported as having
>    changes.

Thanks for these! I just found one tiny optimization in the first patch.

> The remainder of the series focuses on utilizing the sparse index in 'git
> read-tree'. After some baseline behavior-establishing tests (patch 3/7),
> sparse index usage is trivially enabled (patch 4/7) for 'read-tree' except:
> 
>  * usage with '--prefix'
>  * two- and three-way merge
> 
> These cases require additional changes in order to work as expected (i.e.,
> outwardly matching non-sparse index sparse-checkout). For the former, the
> sparse index can be enabled as long as the index is expanded when the prefix
> is a directory outside the sparse cone (patch 5/7). For the latter, sparse
> directories that cannot be trivially merged must have their contents merged
> file-by-file, done by recursively traversing the trees represented by the
> sparse directories (patches 6/7 & 7/7).

I enjoyed reading these remaining patches. I'm impressed with how you
constructed these tests and patches to do the smallest amount of change
per patch.

I couldn't find any fault in these patches, but perhaps Elijah's deep
experience with merge machinery could help add confidence, especially
for patches 6 & 7.

Thanks,
-Stolee

Prevent the repository root from being collapsed into a sparse directory by
treating an empty path as "inside the sparse-checkout". When collapsing a
sparse index (e.g. in 'git sparse-checkout reapply'), the root directory
typically could not become a sparse directory due to the presence of in-cone
root-level files and directories. However, if no such in-cone files or
directories were present, there was no explicit check signaling that the
"repository root path" (an empty string, in the case of
'convert_to_sparse(...)') was in-cone, and a sparse directory index entry
would be created from the repository root directory.

The documentation in Documentation/git-sparse-checkout.txt explicitly states
that the files in the root directory are expected to be in-cone for a
cone-mode sparse-checkout. Collapsing the root into a sparse directory entry
violates that assumption, as sparse directory entries are expected to be
outside the sparse cone and have SKIP_WORKTREE enabled. This invalid state
in turn causes issues with commands that interact with the index, e.g.
'git status'.

Treating an empty (root) path as in-cone prevents the creation of a root
sparse directory in 'convert_to_sparse(...)'. Because the repository root is
otherwise never compared with sparse patterns (in both cone-mode and
non-cone sparse-checkouts), the new check does not cause additional changes
to how sparse patterns are applied.

Helped-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Victoria Dye <vdye@github.com>
@vdye
Copy link
Author

vdye commented Feb 24, 2022

/submit

@gitgitgadget
Copy link

gitgitgadget bot commented Feb 24, 2022

Submitted as pull.1157.v2.git.1645742073.gitgitgadget@gmail.com

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git/ pr-1157/vdye/sparse/read-tree-v2

To fetch this version to local tag pr-1157/vdye/sparse/read-tree-v2:

git fetch --no-tags https://github.com/gitgitgadget/git/ tag pr-1157/vdye/sparse/read-tree-v2

@gitgitgadget
Copy link

gitgitgadget bot commented Feb 25, 2022

This branch is now known as vd/sparse-read-tree.

@gitgitgadget
Copy link

gitgitgadget bot commented Feb 25, 2022

This patch series was integrated into seen via git@26734f4.

@gitgitgadget gitgitgadget bot added the seen label Feb 25, 2022
@@ -244,6 +244,24 @@ test_expect_success 'expanded in-memory index matches full index' '
test_sparse_match git ls-files --stage
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Elijah Newren wrote (reply to this):

On Thu, Feb 24, 2022 at 2:34 PM Victoria Dye via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Victoria Dye <vdye@github.com>
>
> Add the 'recursive' flag to 'wt_status_collect_changes_index(...)'. Without

Perhaps "Set the 'recursive' diff option flag in
'wt_status_collect_changes_index(...)'" ?  There's no function
argument named 'recursive' in wt_status_collect_changes_index() before
or after your changes, which is what the wording led me to think of.

> the 'recursive' flag, 'git status' could report index changes incorrectly
> when the following conditions were met:
>
> * sparse index is enabled
> * there is a difference between the index and HEAD in a file inside a
>   *subdirectory* of a sparse directory
> * the sparse directory index entry is *not* expanded in-core
>
> In this scenario, 'git status' would not recurse into the sparse directory's
> subdirectories to identify which file contained the difference between the
> index and HEAD. Instead, it would report the immediate subdirectory itself
> as "modified".
>
> Example:
>
> $ git init
> $ mkdir -p sparse/sub
> $ echo test >sparse/sub/foo
> $ git add .
> $ git commit -m "commit 1"
> $ echo somethingelse >sparse/sub/foo
> $ git add .
> $ git commit -a -m "commit 2"
> $ git sparse-checkout set --cone --sparse-index 'sparse'
> $ git reset --soft HEAD~1
> $ git status
> On branch master
> You are in a sparse checkout.
>
> Changes to be committed:
>   (use "git restore --staged <file>..." to unstage)
>         modified:   sparse/sub
>
> The 'recursive' diff option in 'wt_status_collect_changes_index' corrects
> this by indicating that 'git status' should recurse into sparse directories
> to find modified files. Given the same repository setup as the example
> above, the corrected result of `git status` is:
>
> $ git status
> On branch master
> You are in a sparse checkout.
>
> Changes to be committed:
>   (use "git restore --staged <file>..." to unstage)
>         modified:   sparse/sub/foo
>
> Signed-off-by: Victoria Dye <vdye@github.com>
> ---
>  t/t1092-sparse-checkout-compatibility.sh | 7 +++++++
>  wt-status.c                              | 9 +++++++++
>  2 files changed, 16 insertions(+)
>
> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
> index 9ef7cd80885..b1dcaa0e642 100755
> --- a/t/t1092-sparse-checkout-compatibility.sh
> +++ b/t/t1092-sparse-checkout-compatibility.sh
> @@ -278,6 +278,13 @@ test_expect_success 'status with options' '
>         test_all_match git status --porcelain=v2 -uno
>  '
>
> +test_expect_success 'status with diff in unexpanded sparse directory' '
> +       init_repos &&
> +       test_all_match git checkout rename-base &&
> +       test_all_match git reset --soft rename-out-to-out &&
> +       test_all_match git status --porcelain=v2
> +'
> +
>  test_expect_success 'status reports sparse-checkout' '
>         init_repos &&
>         git -C sparse-checkout status >full &&
> diff --git a/wt-status.c b/wt-status.c
> index 335e723a71e..4a5b9beeca1 100644
> --- a/wt-status.c
> +++ b/wt-status.c
> @@ -651,6 +651,15 @@ static void wt_status_collect_changes_index(struct wt_status *s)
>         rev.diffopt.detect_rename = s->detect_rename >= 0 ? s->detect_rename : rev.diffopt.detect_rename;
>         rev.diffopt.rename_limit = s->rename_limit >= 0 ? s->rename_limit : rev.diffopt.rename_limit;
>         rev.diffopt.rename_score = s->rename_score >= 0 ? s->rename_score : rev.diffopt.rename_score;
> +
> +       /*
> +        * The `recursive` option must be enabled to show differences in files
> +        * *more than* one level deep in a sparse directory index entry (e.g., given
> +        * sparse directory 'sparse-dir/', reporting a difference in the file
> +        * 'sparse-dir/another-dir/my-file').
> +        */
> +       rev.diffopt.flags.recursive = 1;

Kind of clever, and makes sense.

I'm wondering if there's an alternate wording that might be helpful
here or in the commit message, that instead of just saying the
'recursive' option is necessary, perhaps says a little bit about why
it helps.  In particular, the diff machinery, by default, is not
recursive and stops at comparing the first level of trees.  (See e.g.
the -r option in diff-tree, it's just that it's turned on by default
in 'git diff' and by the -p option in 'git log'.)  I'm guessing the
recursive option never needed to be turned on previously within
wt-status, due to something about the nature of the index only holding
files previously.  Now, however, the sparse index changes that.  (And
it also suggests that perhaps we should look to see if other commands
run the diff machinery without the recursive flag, and see if they
need it now due to sparse indices.)

Granted, I'm not totally sure how to work these facts in (in part
because I don't know how comparison to the index normally avoids the
need for the recursive flag), and maybe what you have is fine.  Just
thought I'd point it out since I wasn't aware of the non-recursive
nature of the diff machinery until I started doing things with
diff-tree.


> +
>         copy_pathspec(&rev.prune_data, &s->pathspec);
>         run_diff_index(&rev, 1);
>         object_array_clear(&rev.pending);
> --
> gitgitgadget

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Victoria Dye wrote (reply to this):

Elijah Newren wrote:
> On Thu, Feb 24, 2022 at 2:34 PM Victoria Dye via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
>>
>> From: Victoria Dye <vdye@github.com>
>>
>> Add the 'recursive' flag to 'wt_status_collect_changes_index(...)'. Without
> 
> Perhaps "Set the 'recursive' diff option flag in
> 'wt_status_collect_changes_index(...)'" ?  There's no function
> argument named 'recursive' in wt_status_collect_changes_index() before
> or after your changes, which is what the wording led me to think of.
> 

Re-reading that now, I agree. Will fix!

>> the 'recursive' flag, 'git status' could report index changes incorrectly
>> when the following conditions were met:
>>
>> * sparse index is enabled
>> * there is a difference between the index and HEAD in a file inside a
>>   *subdirectory* of a sparse directory
>> * the sparse directory index entry is *not* expanded in-core
>>
>> In this scenario, 'git status' would not recurse into the sparse directory's
>> subdirectories to identify which file contained the difference between the
>> index and HEAD. Instead, it would report the immediate subdirectory itself
>> as "modified".
>>
>> Example:
>>
>> $ git init
>> $ mkdir -p sparse/sub
>> $ echo test >sparse/sub/foo
>> $ git add .
>> $ git commit -m "commit 1"
>> $ echo somethingelse >sparse/sub/foo
>> $ git add .
>> $ git commit -a -m "commit 2"
>> $ git sparse-checkout set --cone --sparse-index 'sparse'
>> $ git reset --soft HEAD~1
>> $ git status
>> On branch master
>> You are in a sparse checkout.
>>
>> Changes to be committed:
>>   (use "git restore --staged <file>..." to unstage)
>>         modified:   sparse/sub
>>
>> The 'recursive' diff option in 'wt_status_collect_changes_index' corrects
>> this by indicating that 'git status' should recurse into sparse directories
>> to find modified files. Given the same repository setup as the example
>> above, the corrected result of `git status` is:
>>
>> $ git status
>> On branch master
>> You are in a sparse checkout.
>>
>> Changes to be committed:
>>   (use "git restore --staged <file>..." to unstage)
>>         modified:   sparse/sub/foo
>>
>> Signed-off-by: Victoria Dye <vdye@github.com>
>> ---
>>  t/t1092-sparse-checkout-compatibility.sh | 7 +++++++
>>  wt-status.c                              | 9 +++++++++
>>  2 files changed, 16 insertions(+)
>>
>> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
>> index 9ef7cd80885..b1dcaa0e642 100755
>> --- a/t/t1092-sparse-checkout-compatibility.sh
>> +++ b/t/t1092-sparse-checkout-compatibility.sh
>> @@ -278,6 +278,13 @@ test_expect_success 'status with options' '
>>         test_all_match git status --porcelain=v2 -uno
>>  '
>>
>> +test_expect_success 'status with diff in unexpanded sparse directory' '
>> +       init_repos &&
>> +       test_all_match git checkout rename-base &&
>> +       test_all_match git reset --soft rename-out-to-out &&
>> +       test_all_match git status --porcelain=v2
>> +'
>> +
>>  test_expect_success 'status reports sparse-checkout' '
>>         init_repos &&
>>         git -C sparse-checkout status >full &&
>> diff --git a/wt-status.c b/wt-status.c
>> index 335e723a71e..4a5b9beeca1 100644
>> --- a/wt-status.c
>> +++ b/wt-status.c
>> @@ -651,6 +651,15 @@ static void wt_status_collect_changes_index(struct wt_status *s)
>>         rev.diffopt.detect_rename = s->detect_rename >= 0 ? s->detect_rename : rev.diffopt.detect_rename;
>>         rev.diffopt.rename_limit = s->rename_limit >= 0 ? s->rename_limit : rev.diffopt.rename_limit;
>>         rev.diffopt.rename_score = s->rename_score >= 0 ? s->rename_score : rev.diffopt.rename_score;
>> +
>> +       /*
>> +        * The `recursive` option must be enabled to show differences in files
>> +        * *more than* one level deep in a sparse directory index entry (e.g., given
>> +        * sparse directory 'sparse-dir/', reporting a difference in the file
>> +        * 'sparse-dir/another-dir/my-file').
>> +        */
>> +       rev.diffopt.flags.recursive = 1;
> 
> Kind of clever, and makes sense.
> 
> I'm wondering if there's an alternate wording that might be helpful
> here or in the commit message, that instead of just saying the
> 'recursive' option is necessary, perhaps says a little bit about why
> it helps.  In particular, the diff machinery, by default, is not
> recursive and stops at comparing the first level of trees.  (See e.g.
> the -r option in diff-tree, it's just that it's turned on by default
> in 'git diff' and by the -p option in 'git log'.)  I'm guessing the
> recursive option never needed to be turned on previously within
> wt-status, due to something about the nature of the index only holding
> files previously.  Now, however, the sparse index changes that.  (And
> it also suggests that perhaps we should look to see if other commands
> run the diff machinery without the recursive flag, and see if they
> need it now due to sparse indices.)
> 
> Granted, I'm not totally sure how to work these facts in (in part
> because I don't know how comparison to the index normally avoids the
> need for the recursive flag), and maybe what you have is fine.  Just
> thought I'd point it out since I wasn't aware of the non-recursive
> nature of the diff machinery until I started doing things with
> diff-tree.
>

All good points - I'll reword the commit message and code comment to explain
that 1) diff is not recursive by default, and 2) why the diff of a sparse
directory requires `recursive = 1`.
 
> 
>> +
>>         copy_pathspec(&rev.prune_data, &s->pathspec);
>>         run_diff_index(&rev, 1);
>>         object_array_clear(&rev.pending);
>> --
>> gitgitgadget

@@ -160,15 +160,18 @@ int cmd_read_tree(int argc, const char **argv, const char *cmd_prefix)
argc = parse_options(argc, argv, cmd_prefix, read_tree_options,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Elijah Newren wrote (reply to this):

On Thu, Feb 24, 2022 at 2:34 PM Victoria Dye via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Victoria Dye <vdye@github.com>
>
> When 'git read-tree' is provided with a prefix, expand the index only if the
> prefix is equivalent to a sparse directory or contained within one. If the
> index is not expanded in these cases, 'ce_in_traverse_path' will indicate
> that the relevant sparse directory is not in the prefix/traverse path,
> skipping past it and not unpacking the appropriate tree(s).
>
> If the prefix is in-cone, its sparse subdirectories (if any) will be
> traversed correctly without index expansion.
>
> Signed-off-by: Victoria Dye <vdye@github.com>
> ---
>  builtin/read-tree.c                      |  3 +--
>  t/t1092-sparse-checkout-compatibility.sh |  8 ++++++-
>  unpack-trees.c                           | 30 ++++++++++++++++++++++++
>  3 files changed, 38 insertions(+), 3 deletions(-)
>
> diff --git a/builtin/read-tree.c b/builtin/read-tree.c
> index c2fdbc2657f..a7b7f822281 100644
> --- a/builtin/read-tree.c
> +++ b/builtin/read-tree.c
> @@ -213,8 +213,7 @@ int cmd_read_tree(int argc, const char **argv, const char *cmd_prefix)
>         if (opts.merge && !opts.index_only)
>                 setup_work_tree();
>
> -       /* TODO: audit sparse index behavior in unpack_trees */
> -       if (opts.skip_sparse_checkout || opts.prefix)
> +       if (opts.skip_sparse_checkout)
>                 ensure_full_index(&the_index);
>
>         if (opts.merge) {
> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
> index ae44451a0a9..a404be0a10f 100755
> --- a/t/t1092-sparse-checkout-compatibility.sh
> +++ b/t/t1092-sparse-checkout-compatibility.sh
> @@ -1415,7 +1415,13 @@ test_expect_success 'sparse index is not expanded: read-tree' '
>         do
>                 ensure_not_expanded read-tree -mu $MERGE_TREES &&
>                 ensure_not_expanded reset --hard HEAD || return 1
> -       done
> +       done &&
> +
> +       rm -rf sparse-index/deep/deeper2 &&
> +       ensure_not_expanded add . &&
> +       ensure_not_expanded commit -m "test" &&
> +
> +       ensure_not_expanded read-tree --prefix=deep/deeper2 -u deepest
>  '
>
>  test_expect_success 'ls-files' '
> diff --git a/unpack-trees.c b/unpack-trees.c
> index 360844bda3a..dba122a02bb 100644
> --- a/unpack-trees.c
> +++ b/unpack-trees.c
> @@ -1739,6 +1739,36 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
>                 setup_standard_excludes(o->dir);
>         }
>
> +       /*
> +        * If the prefix is equal to or contained within a sparse directory, the
> +        * index needs to be expanded to traverse with the specified prefix.
> +        */
> +       if (o->prefix && o->src_index->sparse_index) {
> +               int prefix_len = strlen(o->prefix);
> +
> +               while (prefix_len > 0 && o->prefix[prefix_len - 1] == '/')
> +                       prefix_len--;
> +
> +               if (prefix_len > 0) {

Is this condition check necessary?  If we want some safety check here,
could it instead be something like

   if (prefix_len <= 0)
       BUG("Broken prefix passed to unpack_trees");

and then dedent the following code?  (Or are callers allowed to not
sanitize their input before passing to unpack_trees(), meaning that we
should use a die() rather than a BUG()?)

To test this idea, near the top of unpack_trees(), I added:
    if (o->prefix)
        assert(*o->prefix && *o->prefix != '/');
and reran all tests.  They all ran without hitting that assertion.  FWIW.

> +                       struct strbuf ce_prefix = STRBUF_INIT;
> +                       strbuf_grow(&ce_prefix, prefix_len + 1);
> +                       strbuf_add(&ce_prefix, o->prefix, prefix_len);
> +                       strbuf_addch(&ce_prefix, '/');
> +
> +                       /*
> +                        * If the prefix is not inside the sparse cone, then the
> +                        * index is explicitly expanded if it is found as a sparse
> +                        * directory, or implicitly expanded (by 'index_name_pos')
> +                        * if the path is inside a sparse directory.
> +                        */
> +                       if (!path_in_cone_mode_sparse_checkout(ce_prefix.buf, o->src_index) &&
> +                           index_name_pos(o->src_index, ce_prefix.buf, ce_prefix.len) >= 0)

style nit: Can you rewrap both the comments and the code at 80 characters?

It took me a bit of playing and testing to understand these two lines.
The comment helps, but it's still a bit dense to unpack; somehow I
didn't understand that the comment was referring to index_name_pos()'s
call to ensure_full_index().  Once I understood that, it all looks
good.


> +                               ensure_full_index(o->src_index);
> +
> +                       strbuf_release(&ce_prefix);
> +               }
> +       }
> +
>         if (!core_apply_sparse_checkout || !o->update)
>                 o->skip_sparse_checkout = 1;
>         if (!o->skip_sparse_checkout && !o->pl) {
> --
> gitgitgadget

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Victoria Dye wrote (reply to this):

Elijah Newren wrote:
> On Thu, Feb 24, 2022 at 2:34 PM Victoria Dye via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
>>
>> From: Victoria Dye <vdye@github.com>
>>
>> When 'git read-tree' is provided with a prefix, expand the index only if the
>> prefix is equivalent to a sparse directory or contained within one. If the
>> index is not expanded in these cases, 'ce_in_traverse_path' will indicate
>> that the relevant sparse directory is not in the prefix/traverse path,
>> skipping past it and not unpacking the appropriate tree(s).
>>
>> If the prefix is in-cone, its sparse subdirectories (if any) will be
>> traversed correctly without index expansion.
>>
>> Signed-off-by: Victoria Dye <vdye@github.com>
>> ---
>>  builtin/read-tree.c                      |  3 +--
>>  t/t1092-sparse-checkout-compatibility.sh |  8 ++++++-
>>  unpack-trees.c                           | 30 ++++++++++++++++++++++++
>>  3 files changed, 38 insertions(+), 3 deletions(-)
>>
>> diff --git a/builtin/read-tree.c b/builtin/read-tree.c
>> index c2fdbc2657f..a7b7f822281 100644
>> --- a/builtin/read-tree.c
>> +++ b/builtin/read-tree.c
>> @@ -213,8 +213,7 @@ int cmd_read_tree(int argc, const char **argv, const char *cmd_prefix)
>>         if (opts.merge && !opts.index_only)
>>                 setup_work_tree();
>>
>> -       /* TODO: audit sparse index behavior in unpack_trees */
>> -       if (opts.skip_sparse_checkout || opts.prefix)
>> +       if (opts.skip_sparse_checkout)
>>                 ensure_full_index(&the_index);
>>
>>         if (opts.merge) {
>> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
>> index ae44451a0a9..a404be0a10f 100755
>> --- a/t/t1092-sparse-checkout-compatibility.sh
>> +++ b/t/t1092-sparse-checkout-compatibility.sh
>> @@ -1415,7 +1415,13 @@ test_expect_success 'sparse index is not expanded: read-tree' '
>>         do
>>                 ensure_not_expanded read-tree -mu $MERGE_TREES &&
>>                 ensure_not_expanded reset --hard HEAD || return 1
>> -       done
>> +       done &&
>> +
>> +       rm -rf sparse-index/deep/deeper2 &&
>> +       ensure_not_expanded add . &&
>> +       ensure_not_expanded commit -m "test" &&
>> +
>> +       ensure_not_expanded read-tree --prefix=deep/deeper2 -u deepest
>>  '
>>
>>  test_expect_success 'ls-files' '
>> diff --git a/unpack-trees.c b/unpack-trees.c
>> index 360844bda3a..dba122a02bb 100644
>> --- a/unpack-trees.c
>> +++ b/unpack-trees.c
>> @@ -1739,6 +1739,36 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
>>                 setup_standard_excludes(o->dir);
>>         }
>>
>> +       /*
>> +        * If the prefix is equal to or contained within a sparse directory, the
>> +        * index needs to be expanded to traverse with the specified prefix.
>> +        */
>> +       if (o->prefix && o->src_index->sparse_index) {
>> +               int prefix_len = strlen(o->prefix);
>> +
>> +               while (prefix_len > 0 && o->prefix[prefix_len - 1] == '/')
>> +                       prefix_len--;
>> +
>> +               if (prefix_len > 0) {
> 
> Is this condition check necessary?  If we want some safety check here,
> could it instead be something like
> 
>    if (prefix_len <= 0)
>        BUG("Broken prefix passed to unpack_trees");
> 

This condition was intended to skip unnecessary computation for the
(probably unlikely, but still technically valid) case where the prefix is
the repo root (e.g., '--prefix=/') - because the repo root is represented
with only directory separator(s), `prefix_len` would be 0 after removing
trailing '/'. In that scenario, the index won't need to be expanded, so we
don't need to go looking in the index for that path. 

None of that is particularly clear from reading the patch, though, so I'll
add a comment & test covering it explicitly.

> and then dedent the following code?  (Or are callers allowed to not
> sanitize their input before passing to unpack_trees(), meaning that we
> should use a die() rather than a BUG()?)
> 
> To test this idea, near the top of unpack_trees(), I added:
>     if (o->prefix)
>         assert(*o->prefix && *o->prefix != '/');
> and reran all tests.  They all ran without hitting that assertion.  FWIW.
> 
>> +                       struct strbuf ce_prefix = STRBUF_INIT;
>> +                       strbuf_grow(&ce_prefix, prefix_len + 1);
>> +                       strbuf_add(&ce_prefix, o->prefix, prefix_len);
>> +                       strbuf_addch(&ce_prefix, '/');
>> +
>> +                       /*
>> +                        * If the prefix is not inside the sparse cone, then the
>> +                        * index is explicitly expanded if it is found as a sparse
>> +                        * directory, or implicitly expanded (by 'index_name_pos')
>> +                        * if the path is inside a sparse directory.
>> +                        */
>> +                       if (!path_in_cone_mode_sparse_checkout(ce_prefix.buf, o->src_index) &&
>> +                           index_name_pos(o->src_index, ce_prefix.buf, ce_prefix.len) >= 0)
> 
> style nit: Can you rewrap both the comments and the code at 80 characters?
> 

I couldn't think of a way to wrap the condition that wouldn't make it more
difficult to read. The best I could come up with was:

			if (!path_in_cone_mode_sparse_checkout(ce_prefix.buf, 
							       o->src_index) &&
			    index_name_pos(o->src_index, 
					   ce_prefix.buf, 
					   ce_prefix.len) >= 0)
				ensure_full_index(o->src_index);


which, to me, is a bit hard to parse. Alternatively, though, I can move the
prefix-checking logic into its own function (kind of like
'pathspec_needs_expanded_index(...)' in [1]), in which case I won't need to
change the current wrapping to keep it under 80 characters.

[1] https://lore.kernel.org/git/822d7344587f698e73abba1ca726c3a905f7b403.1638201164.git.gitgitgadget@gmail.com/

> It took me a bit of playing and testing to understand these two lines.
> The comment helps, but it's still a bit dense to unpack; somehow I
> didn't understand that the comment was referring to index_name_pos()'s
> call to ensure_full_index().  Once I understood that, it all looks
> good.
> 

Sorry about that, I'll revise to make that clearer.

> 
>> +                               ensure_full_index(o->src_index);
>> +
>> +                       strbuf_release(&ce_prefix);
>> +               }
>> +       }
>> +
>>         if (!core_apply_sparse_checkout || !o->update)
>>                 o->skip_sparse_checkout = 1;
>>         if (!o->skip_sparse_checkout && !o->pl) {
>> --
>> gitgitgadget

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Elijah Newren wrote (reply to this):

Hi Victoria,

On Fri, Feb 25, 2022 at 12:25 PM Victoria Dye <vdye@github.com> wrote:
>
> Elijah Newren wrote:
> > On Thu, Feb 24, 2022 at 2:34 PM Victoria Dye via GitGitGadget
> > <gitgitgadget@gmail.com> wrote:
> >>
> >> From: Victoria Dye <vdye@github.com>
> >>
> >> When 'git read-tree' is provided with a prefix, expand the index only if the
> >> prefix is equivalent to a sparse directory or contained within one. If the
> >> index is not expanded in these cases, 'ce_in_traverse_path' will indicate
> >> that the relevant sparse directory is not in the prefix/traverse path,
> >> skipping past it and not unpacking the appropriate tree(s).
> >>
> >> If the prefix is in-cone, its sparse subdirectories (if any) will be
> >> traversed correctly without index expansion.
> >>
> >> Signed-off-by: Victoria Dye <vdye@github.com>
> >> ---
> >>  builtin/read-tree.c                      |  3 +--
> >>  t/t1092-sparse-checkout-compatibility.sh |  8 ++++++-
> >>  unpack-trees.c                           | 30 ++++++++++++++++++++++++
> >>  3 files changed, 38 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/builtin/read-tree.c b/builtin/read-tree.c
> >> index c2fdbc2657f..a7b7f822281 100644
> >> --- a/builtin/read-tree.c
> >> +++ b/builtin/read-tree.c
> >> @@ -213,8 +213,7 @@ int cmd_read_tree(int argc, const char **argv, const char *cmd_prefix)
> >>         if (opts.merge && !opts.index_only)
> >>                 setup_work_tree();
> >>
> >> -       /* TODO: audit sparse index behavior in unpack_trees */
> >> -       if (opts.skip_sparse_checkout || opts.prefix)
> >> +       if (opts.skip_sparse_checkout)
> >>                 ensure_full_index(&the_index);
> >>
> >>         if (opts.merge) {
> >> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
> >> index ae44451a0a9..a404be0a10f 100755
> >> --- a/t/t1092-sparse-checkout-compatibility.sh
> >> +++ b/t/t1092-sparse-checkout-compatibility.sh
> >> @@ -1415,7 +1415,13 @@ test_expect_success 'sparse index is not expanded: read-tree' '
> >>         do
> >>                 ensure_not_expanded read-tree -mu $MERGE_TREES &&
> >>                 ensure_not_expanded reset --hard HEAD || return 1
> >> -       done
> >> +       done &&
> >> +
> >> +       rm -rf sparse-index/deep/deeper2 &&
> >> +       ensure_not_expanded add . &&
> >> +       ensure_not_expanded commit -m "test" &&
> >> +
> >> +       ensure_not_expanded read-tree --prefix=deep/deeper2 -u deepest
> >>  '
> >>
> >>  test_expect_success 'ls-files' '
> >> diff --git a/unpack-trees.c b/unpack-trees.c
> >> index 360844bda3a..dba122a02bb 100644
> >> --- a/unpack-trees.c
> >> +++ b/unpack-trees.c
> >> @@ -1739,6 +1739,36 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
> >>                 setup_standard_excludes(o->dir);
> >>         }
> >>
> >> +       /*
> >> +        * If the prefix is equal to or contained within a sparse directory, the
> >> +        * index needs to be expanded to traverse with the specified prefix.
> >> +        */
> >> +       if (o->prefix && o->src_index->sparse_index) {
> >> +               int prefix_len = strlen(o->prefix);
> >> +
> >> +               while (prefix_len > 0 && o->prefix[prefix_len - 1] == '/')
> >> +                       prefix_len--;
> >> +
> >> +               if (prefix_len > 0) {
> >
> > Is this condition check necessary?  If we want some safety check here,
> > could it instead be something like
> >
> >    if (prefix_len <= 0)
> >        BUG("Broken prefix passed to unpack_trees");
> >
>
> This condition was intended to skip unnecessary computation for the
> (probably unlikely, but still technically valid) case where the prefix is
> the repo root (e.g., '--prefix=/') - because the repo root is represented
> with only directory separator(s), `prefix_len` would be 0 after removing
> trailing '/'. In that scenario, the index won't need to be expanded, so we
> don't need to go looking in the index for that path.

Wow, that doesn't result in an error?!?  That surprises me.  I never
even thought to test such a thing.  Clearly, the following related
command does give an error:

  git read-tree --prefix=/toplevel/ $TREE

(namely, "error: invalid path '/toplevel/subpath'")

whereas, the reason a single slash is accepted it because git is
trying to be forgiving and treat the following two commands the same:

  git read-tree --prefix=subdir/ $TREE
  git read-tree --prefix=subdir $TREE

i.e. it tries to allow the trailing slash to be optional.  And, by
implementation quirk, making a trailing slash be optional turns out to
mean that --prefix=/ is treated the same as no prefix at all, because
empty string prefix just happens to give the same behavior as NULL
prefix.

I think we should just throw an error if prefix starts with '/'.
unpack_trees() can make it be a BUG() (and at the beginning of the
function rather than down at this point and only inside some
conditional). builtin/read-tree.c, the only thing that ever sets
prefix in unpack_trees_options, should die() if it's passed something
that starts with a '/'.  Having paths start with a '/' is antithetical
to how "prefix" is used throughout the codebase, though I guess I can
see people making that mistake if they are used to gitignore-style
patterns instead.

> None of that is particularly clear from reading the patch, though, so I'll
> add a comment & test covering it explicitly.
>
> > and then dedent the following code?  (Or are callers allowed to not
> > sanitize their input before passing to unpack_trees(), meaning that we
> > should use a die() rather than a BUG()?)
> >
> > To test this idea, near the top of unpack_trees(), I added:
> >     if (o->prefix)
> >         assert(*o->prefix && *o->prefix != '/');
> > and reran all tests.  They all ran without hitting that assertion.  FWIW.
> >
> >> +                       struct strbuf ce_prefix = STRBUF_INIT;
> >> +                       strbuf_grow(&ce_prefix, prefix_len + 1);
> >> +                       strbuf_add(&ce_prefix, o->prefix, prefix_len);
> >> +                       strbuf_addch(&ce_prefix, '/');
> >> +
> >> +                       /*
> >> +                        * If the prefix is not inside the sparse cone, then the
> >> +                        * index is explicitly expanded if it is found as a sparse
> >> +                        * directory, or implicitly expanded (by 'index_name_pos')
> >> +                        * if the path is inside a sparse directory.
> >> +                        */
> >> +                       if (!path_in_cone_mode_sparse_checkout(ce_prefix.buf, o->src_index) &&
> >> +                           index_name_pos(o->src_index, ce_prefix.buf, ce_prefix.len) >= 0)
> >
> > style nit: Can you rewrap both the comments and the code at 80 characters?
> >
>
> I couldn't think of a way to wrap the condition that wouldn't make it more
> difficult to read. The best I could come up with was:
>
>                         if (!path_in_cone_mode_sparse_checkout(ce_prefix.buf,
>                                                                o->src_index) &&
>                             index_name_pos(o->src_index,
>                                            ce_prefix.buf,
>                                            ce_prefix.len) >= 0)
>                                 ensure_full_index(o->src_index);
>
>
> which, to me, is a bit hard to parse. Alternatively, though, I can move the
> prefix-checking logic into its own function (kind of like
> 'pathspec_needs_expanded_index(...)' in [1]), in which case I won't need to
> change the current wrapping to keep it under 80 characters.
>
> [1] https://lore.kernel.org/git/822d7344587f698e73abba1ca726c3a905f7b403.1638201164.git.gitgitgadget@gmail.com/
>
> > It took me a bit of playing and testing to understand these two lines.
> > The comment helps, but it's still a bit dense to unpack; somehow I
> > didn't understand that the comment was referring to index_name_pos()'s
> > call to ensure_full_index().  Once I understood that, it all looks
> > good.
> >
>
> Sorry about that, I'll revise to make that clearer.

Thanks.  :-)

> >
> >> +                               ensure_full_index(o->src_index);
> >> +
> >> +                       strbuf_release(&ce_prefix);
> >> +               }
> >> +       }
> >> +
> >>         if (!core_apply_sparse_checkout || !o->update)
> >>                 o->skip_sparse_checkout = 1;
> >>         if (!o->skip_sparse_checkout && !o->pl) {
> >> --
> >> gitgitgadget

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Victoria Dye wrote (reply to this):

Elijah Newren wrote:
> Hi Victoria,
> 
> On Fri, Feb 25, 2022 at 12:25 PM Victoria Dye <vdye@github.com> wrote:
>>
>> Elijah Newren wrote:
>>> On Thu, Feb 24, 2022 at 2:34 PM Victoria Dye via GitGitGadget
>>> <gitgitgadget@gmail.com> wrote:
>>>>
>>>> From: Victoria Dye <vdye@github.com>
>>>>
>>>> When 'git read-tree' is provided with a prefix, expand the index only if the
>>>> prefix is equivalent to a sparse directory or contained within one. If the
>>>> index is not expanded in these cases, 'ce_in_traverse_path' will indicate
>>>> that the relevant sparse directory is not in the prefix/traverse path,
>>>> skipping past it and not unpacking the appropriate tree(s).
>>>>
>>>> If the prefix is in-cone, its sparse subdirectories (if any) will be
>>>> traversed correctly without index expansion.
>>>>
>>>> Signed-off-by: Victoria Dye <vdye@github.com>
>>>> ---
>>>>  builtin/read-tree.c                      |  3 +--
>>>>  t/t1092-sparse-checkout-compatibility.sh |  8 ++++++-
>>>>  unpack-trees.c                           | 30 ++++++++++++++++++++++++
>>>>  3 files changed, 38 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/builtin/read-tree.c b/builtin/read-tree.c
>>>> index c2fdbc2657f..a7b7f822281 100644
>>>> --- a/builtin/read-tree.c
>>>> +++ b/builtin/read-tree.c
>>>> @@ -213,8 +213,7 @@ int cmd_read_tree(int argc, const char **argv, const char *cmd_prefix)
>>>>         if (opts.merge && !opts.index_only)
>>>>                 setup_work_tree();
>>>>
>>>> -       /* TODO: audit sparse index behavior in unpack_trees */
>>>> -       if (opts.skip_sparse_checkout || opts.prefix)
>>>> +       if (opts.skip_sparse_checkout)
>>>>                 ensure_full_index(&the_index);
>>>>
>>>>         if (opts.merge) {
>>>> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
>>>> index ae44451a0a9..a404be0a10f 100755
>>>> --- a/t/t1092-sparse-checkout-compatibility.sh
>>>> +++ b/t/t1092-sparse-checkout-compatibility.sh
>>>> @@ -1415,7 +1415,13 @@ test_expect_success 'sparse index is not expanded: read-tree' '
>>>>         do
>>>>                 ensure_not_expanded read-tree -mu $MERGE_TREES &&
>>>>                 ensure_not_expanded reset --hard HEAD || return 1
>>>> -       done
>>>> +       done &&
>>>> +
>>>> +       rm -rf sparse-index/deep/deeper2 &&
>>>> +       ensure_not_expanded add . &&
>>>> +       ensure_not_expanded commit -m "test" &&
>>>> +
>>>> +       ensure_not_expanded read-tree --prefix=deep/deeper2 -u deepest
>>>>  '
>>>>
>>>>  test_expect_success 'ls-files' '
>>>> diff --git a/unpack-trees.c b/unpack-trees.c
>>>> index 360844bda3a..dba122a02bb 100644
>>>> --- a/unpack-trees.c
>>>> +++ b/unpack-trees.c
>>>> @@ -1739,6 +1739,36 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
>>>>                 setup_standard_excludes(o->dir);
>>>>         }
>>>>
>>>> +       /*
>>>> +        * If the prefix is equal to or contained within a sparse directory, the
>>>> +        * index needs to be expanded to traverse with the specified prefix.
>>>> +        */
>>>> +       if (o->prefix && o->src_index->sparse_index) {
>>>> +               int prefix_len = strlen(o->prefix);
>>>> +
>>>> +               while (prefix_len > 0 && o->prefix[prefix_len - 1] == '/')
>>>> +                       prefix_len--;
>>>> +
>>>> +               if (prefix_len > 0) {
>>>
>>> Is this condition check necessary?  If we want some safety check here,
>>> could it instead be something like
>>>
>>>    if (prefix_len <= 0)
>>>        BUG("Broken prefix passed to unpack_trees");
>>>
>>
>> This condition was intended to skip unnecessary computation for the
>> (probably unlikely, but still technically valid) case where the prefix is
>> the repo root (e.g., '--prefix=/') - because the repo root is represented
>> with only directory separator(s), `prefix_len` would be 0 after removing
>> trailing '/'. In that scenario, the index won't need to be expanded, so we
>> don't need to go looking in the index for that path.
> 
> Wow, that doesn't result in an error?!?  That surprises me.  I never
> even thought to test such a thing.  Clearly, the following related
> command does give an error:
> 
>   git read-tree --prefix=/toplevel/ $TREE
> 
> (namely, "error: invalid path '/toplevel/subpath'")
> 
> whereas, the reason a single slash is accepted it because git is
> trying to be forgiving and treat the following two commands the same:
> 
>   git read-tree --prefix=subdir/ $TREE
>   git read-tree --prefix=subdir $TREE
> 
> i.e. it tries to allow the trailing slash to be optional.  And, by
> implementation quirk, making a trailing slash be optional turns out to
> mean that --prefix=/ is treated the same as no prefix at all, because
> empty string prefix just happens to give the same behavior as NULL
> prefix.
> 

If by "NULL prefix" you mean "the --prefix option isn't specified", the
behavior isn't *quite* the same as with an empty prefix, since `git
read-tree $TREE` will completely overwrite the index without regard for
what's already there, whereas `git read-tree --prefix= $TREE` "merges"
`$TREE` into the current index (failing if any entries in `$TREE` already
exist in the index). I still think it's an extremely niche usage, but can
also imagine a couple instances where it might be useful (e.g., reading from
an orphan branch).

> I think we should just throw an error if prefix starts with '/'.
> unpack_trees() can make it be a BUG() (and at the beginning of the
> function rather than down at this point and only inside some
> conditional). builtin/read-tree.c, the only thing that ever sets
> prefix in unpack_trees_options, should die() if it's passed something
> that starts with a '/'.  Having paths start with a '/' is antithetical
> to how "prefix" is used throughout the codebase, though I guess I can
> see people making that mistake if they are used to gitignore-style
> patterns instead.
> 

I think this makes sense (especially because it doesn't cause a loss of
functionality - a user can still specify '--prefix=' for the repo root if
they really want to). I'll include a patch for it in my next re-roll.

>> None of that is particularly clear from reading the patch, though, so I'll
>> add a comment & test covering it explicitly.
>>
>>> and then dedent the following code?  (Or are callers allowed to not
>>> sanitize their input before passing to unpack_trees(), meaning that we
>>> should use a die() rather than a BUG()?)
>>>
>>> To test this idea, near the top of unpack_trees(), I added:
>>>     if (o->prefix)
>>>         assert(*o->prefix && *o->prefix != '/');
>>> and reran all tests.  They all ran without hitting that assertion.  FWIW.
>>>
>>>> +                       struct strbuf ce_prefix = STRBUF_INIT;
>>>> +                       strbuf_grow(&ce_prefix, prefix_len + 1);
>>>> +                       strbuf_add(&ce_prefix, o->prefix, prefix_len);
>>>> +                       strbuf_addch(&ce_prefix, '/');
>>>> +
>>>> +                       /*
>>>> +                        * If the prefix is not inside the sparse cone, then the
>>>> +                        * index is explicitly expanded if it is found as a sparse
>>>> +                        * directory, or implicitly expanded (by 'index_name_pos')
>>>> +                        * if the path is inside a sparse directory.
>>>> +                        */
>>>> +                       if (!path_in_cone_mode_sparse_checkout(ce_prefix.buf, o->src_index) &&
>>>> +                           index_name_pos(o->src_index, ce_prefix.buf, ce_prefix.len) >= 0)
>>>
>>> style nit: Can you rewrap both the comments and the code at 80 characters?
>>>
>>
>> I couldn't think of a way to wrap the condition that wouldn't make it more
>> difficult to read. The best I could come up with was:
>>
>>                         if (!path_in_cone_mode_sparse_checkout(ce_prefix.buf,
>>                                                                o->src_index) &&
>>                             index_name_pos(o->src_index,
>>                                            ce_prefix.buf,
>>                                            ce_prefix.len) >= 0)
>>                                 ensure_full_index(o->src_index);
>>
>>
>> which, to me, is a bit hard to parse. Alternatively, though, I can move the
>> prefix-checking logic into its own function (kind of like
>> 'pathspec_needs_expanded_index(...)' in [1]), in which case I won't need to
>> change the current wrapping to keep it under 80 characters.
>>
>> [1] https://lore.kernel.org/git/822d7344587f698e73abba1ca726c3a905f7b403.1638201164.git.gitgitgadget@gmail.com/
>>
>>> It took me a bit of playing and testing to understand these two lines.
>>> The comment helps, but it's still a bit dense to unpack; somehow I
>>> didn't understand that the comment was referring to index_name_pos()'s
>>> call to ensure_full_index().  Once I understood that, it all looks
>>> good.
>>>
>>
>> Sorry about that, I'll revise to make that clearer.
> 
> Thanks.  :-)
> 
>>>
>>>> +                               ensure_full_index(o->src_index);
>>>> +
>>>> +                       strbuf_release(&ce_prefix);
>>>> +               }
>>>> +       }
>>>> +
>>>>         if (!core_apply_sparse_checkout || !o->update)
>>>>                 o->skip_sparse_checkout = 1;
>>>>         if (!o->skip_sparse_checkout && !o->pl) {
>>>> --
>>>> gitgitgadget

@gitgitgadget
Copy link

gitgitgadget bot commented Feb 26, 2022

This patch series was integrated into seen via git@011834f.

@gitgitgadget
Copy link

gitgitgadget bot commented Feb 26, 2022

This patch series was integrated into seen via git@f417268.

@gitgitgadget
Copy link

gitgitgadget bot commented Feb 26, 2022

There was a status update in the "New Topics" section about the branch vd/sparse-read-tree on the Git mailing list:

"git read-tree" has been made to be aware of the sparse-index
feature.

Needs review.
source: <pull.1157.v2.git.1645742073.gitgitgadget@gmail.com>

@gitgitgadget
Copy link

gitgitgadget bot commented Feb 26, 2022

This patch series was integrated into seen via git@a1fae29.

@gitgitgadget
Copy link

gitgitgadget bot commented Feb 26, 2022

This patch series was integrated into seen via git@819de4e.

@@ -117,6 +117,7 @@ test_perf_on_all git diff
test_perf_on_all git diff --cached
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Elijah Newren wrote (reply to this):

On Thu, Feb 24, 2022 at 2:34 PM Victoria Dye via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Victoria Dye <vdye@github.com>
>
> Add tests focused on how 'git read-tree' behaves in sparse checkouts. Extra
> emphasis is placed on interactions with files outside the sparse cone, e.g.
> merges with out-of-cone conflicts.
>
> Signed-off-by: Victoria Dye <vdye@github.com>
> ---
>  t/perf/p2000-sparse-operations.sh        |  1 +
>  t/t1092-sparse-checkout-compatibility.sh | 85 ++++++++++++++++++++++++
>  2 files changed, 86 insertions(+)
>
> diff --git a/t/perf/p2000-sparse-operations.sh b/t/perf/p2000-sparse-operations.sh
> index 2a7106b9495..382716cfca9 100755
> --- a/t/perf/p2000-sparse-operations.sh
> +++ b/t/perf/p2000-sparse-operations.sh
> @@ -117,6 +117,7 @@ test_perf_on_all git diff
>  test_perf_on_all git diff --cached
>  test_perf_on_all git blame $SPARSE_CONE/a
>  test_perf_on_all git blame $SPARSE_CONE/f3/a
> +test_perf_on_all git read-tree -mu HEAD
>  test_perf_on_all git checkout-index -f --all
>  test_perf_on_all git update-index --add --remove $SPARSE_CONE/a
>
> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
> index b1dcaa0e642..9d58da4e925 100755
> --- a/t/t1092-sparse-checkout-compatibility.sh
> +++ b/t/t1092-sparse-checkout-compatibility.sh
> @@ -819,6 +819,91 @@ test_expect_success 'update-index --cacheinfo' '
>         test_cmp expect sparse-checkout-out
>  '
>
> +test_expect_success 'read-tree --merge with files outside sparse definition' '
> +       init_repos &&
> +
> +       test_all_match git checkout -b test-branch update-folder1 &&
> +       for MERGE_TREES in "base HEAD update-folder2" \
> +                          "update-folder1 update-folder2" \
> +                          "update-folder2"
> +       do
> +               # Clean up and remove on-disk files
> +               test_all_match git reset --hard HEAD &&
> +               test_sparse_match git sparse-checkout reapply &&
> +
> +               # Although the index matches, without --no-sparse-checkout, outside-of-
> +               # definition files will not exist on disk for sparse checkouts
> +               test_all_match git read-tree -mu $MERGE_TREES &&
> +               test_all_match git status --porcelain=v2 &&
> +               test_path_is_missing sparse-checkout/folder2 &&
> +               test_path_is_missing sparse-index/folder2 &&
> +
> +               test_all_match git read-tree --reset -u HEAD &&
> +               test_all_match git status --porcelain=v2 &&
> +
> +               test_all_match git read-tree -mu --no-sparse-checkout $MERGE_TREES &&
> +               test_all_match git status --porcelain=v2 &&
> +               test_cmp sparse-checkout/folder2/a sparse-index/folder2/a &&
> +               test_cmp sparse-checkout/folder2/a full-checkout/folder2/a || return 1
> +       done
> +'
> +
> +test_expect_success 'read-tree --merge with edit/edit conflicts in sparse directories' '
> +       init_repos &&
> +
> +       # Merge of multiple changes to same directory (but not same files) should
> +       # succeed
> +       test_all_match git read-tree -mu base rename-base update-folder1 &&
> +       test_all_match git status --porcelain=v2 &&
> +
> +       test_all_match git reset --hard &&
> +
> +       test_all_match git read-tree -mu rename-base update-folder2 &&
> +       test_all_match git status --porcelain=v2 &&
> +
> +       test_all_match git reset --hard &&
> +
> +       test_all_match test_must_fail git read-tree -mu base update-folder1 rename-out-to-in &&
> +       test_all_match test_must_fail git read-tree -mu rename-out-to-in update-folder1
> +'
> +
> +test_expect_success 'read-tree --prefix outside sparse definition' '
> +       init_repos &&
> +
> +       # Cannot read-tree --prefix with a single argument when files exist within
> +       # prefix

Given the comments in the cover letter about --prefix needing special
work, it's not clear to me whether the below is expected behavior or
current-but-buggy behavior that you are testing and documenting.
Could you clarify?

> +       test_all_match test_must_fail git read-tree --prefix=folder1/ -u update-folder1 &&
> +
> +       test_all_match git read-tree --prefix=folder2/0 -u rename-base &&
> +       test_path_is_missing sparse-checkout/folder2 &&
> +       test_path_is_missing sparse-index/folder2 &&
> +
> +       test_all_match git read-tree --reset -u HEAD &&
> +       test_all_match git read-tree --prefix=folder2/0 -u --no-sparse-checkout rename-base &&
> +       test_cmp sparse-checkout/folder2/0/a sparse-index/folder2/0/a &&
> +       test_cmp sparse-checkout/folder2/0/a full-checkout/folder2/0/a
> +'
> +
> +test_expect_success 'read-tree --merge with directory-file conflicts' '
> +       init_repos &&
> +
> +       test_all_match git checkout -b test-branch rename-base &&
> +
> +       # Although the index matches, without --no-sparse-checkout, outside-of-
> +       # definition files will not exist on disk for sparse checkouts
> +       test_sparse_match git read-tree -mu rename-out-to-out &&
> +       test_sparse_match git status --porcelain=v2 &&
> +       test_path_is_missing sparse-checkout/folder2 &&
> +       test_path_is_missing sparse-index/folder2 &&
> +
> +       test_sparse_match git read-tree --reset -u HEAD &&
> +       test_sparse_match git status --porcelain=v2 &&
> +
> +       test_sparse_match git read-tree -mu --no-sparse-checkout rename-out-to-out &&
> +       test_sparse_match git status --porcelain=v2 &&
> +       test_cmp sparse-checkout/folder2/0/1 sparse-index/folder2/0/1
> +'
> +
>  test_expect_success 'merge, cherry-pick, and rebase' '
>         init_repos &&
>
> --
> gitgitgadget
>

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Victoria Dye wrote (reply to this):

Elijah Newren wrote:
> On Thu, Feb 24, 2022 at 2:34 PM Victoria Dye via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
>>
>> From: Victoria Dye <vdye@github.com>
>>
>> Add tests focused on how 'git read-tree' behaves in sparse checkouts. Extra
>> emphasis is placed on interactions with files outside the sparse cone, e.g.
>> merges with out-of-cone conflicts.
>>
>> Signed-off-by: Victoria Dye <vdye@github.com>
>> ---
>>  t/perf/p2000-sparse-operations.sh        |  1 +
>>  t/t1092-sparse-checkout-compatibility.sh | 85 ++++++++++++++++++++++++
>>  2 files changed, 86 insertions(+)
>>
>> diff --git a/t/perf/p2000-sparse-operations.sh b/t/perf/p2000-sparse-operations.sh
>> index 2a7106b9495..382716cfca9 100755
>> --- a/t/perf/p2000-sparse-operations.sh
>> +++ b/t/perf/p2000-sparse-operations.sh
>> @@ -117,6 +117,7 @@ test_perf_on_all git diff
>>  test_perf_on_all git diff --cached
>>  test_perf_on_all git blame $SPARSE_CONE/a
>>  test_perf_on_all git blame $SPARSE_CONE/f3/a
>> +test_perf_on_all git read-tree -mu HEAD
>>  test_perf_on_all git checkout-index -f --all
>>  test_perf_on_all git update-index --add --remove $SPARSE_CONE/a
>>
>> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
>> index b1dcaa0e642..9d58da4e925 100755
>> --- a/t/t1092-sparse-checkout-compatibility.sh
>> +++ b/t/t1092-sparse-checkout-compatibility.sh
>> @@ -819,6 +819,91 @@ test_expect_success 'update-index --cacheinfo' '
>>         test_cmp expect sparse-checkout-out
>>  '
>>
>> +test_expect_success 'read-tree --merge with files outside sparse definition' '
>> +       init_repos &&
>> +
>> +       test_all_match git checkout -b test-branch update-folder1 &&
>> +       for MERGE_TREES in "base HEAD update-folder2" \
>> +                          "update-folder1 update-folder2" \
>> +                          "update-folder2"
>> +       do
>> +               # Clean up and remove on-disk files
>> +               test_all_match git reset --hard HEAD &&
>> +               test_sparse_match git sparse-checkout reapply &&
>> +
>> +               # Although the index matches, without --no-sparse-checkout, outside-of-
>> +               # definition files will not exist on disk for sparse checkouts
>> +               test_all_match git read-tree -mu $MERGE_TREES &&
>> +               test_all_match git status --porcelain=v2 &&
>> +               test_path_is_missing sparse-checkout/folder2 &&
>> +               test_path_is_missing sparse-index/folder2 &&
>> +
>> +               test_all_match git read-tree --reset -u HEAD &&
>> +               test_all_match git status --porcelain=v2 &&
>> +
>> +               test_all_match git read-tree -mu --no-sparse-checkout $MERGE_TREES &&
>> +               test_all_match git status --porcelain=v2 &&
>> +               test_cmp sparse-checkout/folder2/a sparse-index/folder2/a &&
>> +               test_cmp sparse-checkout/folder2/a full-checkout/folder2/a || return 1
>> +       done
>> +'
>> +
>> +test_expect_success 'read-tree --merge with edit/edit conflicts in sparse directories' '
>> +       init_repos &&
>> +
>> +       # Merge of multiple changes to same directory (but not same files) should
>> +       # succeed
>> +       test_all_match git read-tree -mu base rename-base update-folder1 &&
>> +       test_all_match git status --porcelain=v2 &&
>> +
>> +       test_all_match git reset --hard &&
>> +
>> +       test_all_match git read-tree -mu rename-base update-folder2 &&
>> +       test_all_match git status --porcelain=v2 &&
>> +
>> +       test_all_match git reset --hard &&
>> +
>> +       test_all_match test_must_fail git read-tree -mu base update-folder1 rename-out-to-in &&
>> +       test_all_match test_must_fail git read-tree -mu rename-out-to-in update-folder1
>> +'
>> +
>> +test_expect_success 'read-tree --prefix outside sparse definition' '
>> +       init_repos &&
>> +
>> +       # Cannot read-tree --prefix with a single argument when files exist within
>> +       # prefix
> 
> Given the comments in the cover letter about --prefix needing special
> work, it's not clear to me whether the below is expected behavior or
> current-but-buggy behavior that you are testing and documenting.
> Could you clarify?
> 

It's expected - per 'Documentation/git-read-tree.txt':

	The command will refuse to overwrite entries that already
	existed in the original index file.

I'll update the comment to make that clearer. Thanks!

>> +       test_all_match test_must_fail git read-tree --prefix=folder1/ -u update-folder1 &&
>> +
>> +       test_all_match git read-tree --prefix=folder2/0 -u rename-base &&
>> +       test_path_is_missing sparse-checkout/folder2 &&
>> +       test_path_is_missing sparse-index/folder2 &&
>> +
>> +       test_all_match git read-tree --reset -u HEAD &&
>> +       test_all_match git read-tree --prefix=folder2/0 -u --no-sparse-checkout rename-base &&
>> +       test_cmp sparse-checkout/folder2/0/a sparse-index/folder2/0/a &&
>> +       test_cmp sparse-checkout/folder2/0/a full-checkout/folder2/0/a
>> +'
>> +
>> +test_expect_success 'read-tree --merge with directory-file conflicts' '
>> +       init_repos &&
>> +
>> +       test_all_match git checkout -b test-branch rename-base &&
>> +
>> +       # Although the index matches, without --no-sparse-checkout, outside-of-
>> +       # definition files will not exist on disk for sparse checkouts
>> +       test_sparse_match git read-tree -mu rename-out-to-out &&
>> +       test_sparse_match git status --porcelain=v2 &&
>> +       test_path_is_missing sparse-checkout/folder2 &&
>> +       test_path_is_missing sparse-index/folder2 &&
>> +
>> +       test_sparse_match git read-tree --reset -u HEAD &&
>> +       test_sparse_match git status --porcelain=v2 &&
>> +
>> +       test_sparse_match git read-tree -mu --no-sparse-checkout rename-out-to-out &&
>> +       test_sparse_match git status --porcelain=v2 &&
>> +       test_cmp sparse-checkout/folder2/0/1 sparse-index/folder2/0/1
>> +'
>> +
>>  test_expect_success 'merge, cherry-pick, and rebase' '
>>         init_repos &&
>>
>> --
>> gitgitgadget
>>

@gitgitgadget
Copy link

gitgitgadget bot commented Feb 26, 2022

On the Git mailing list, Elijah Newren wrote (reply to this):

On Thu, Feb 24, 2022 at 2:34 PM Victoria Dye via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> Like previous integrations [1] [2], this series allows 'git read-tree' to
> operate using a sparse index.
>
> The first two patches are bugfixes for issues found while implementing the
> 'read-tree' integration:
>
>  * The first (patch 1/7) fixes an edge case in which a repo with no in-cone
>    files or directories would have its root collapsed into a sparse
>    directory; the fix ensures the root directory is never collapsed into a
>    sparse directory.
>  * The second (patch 2/7) corrects the 'git status' reporting of changes
>    nested inside the subdirectory of a sparse directory, ensuring that the
>    modified file (not the subdirectory) is correctly reported as having
>    changes.
>
> The remainder of the series focuses on utilizing the sparse index in 'git
> read-tree'. After some baseline behavior-establishing tests (patch 3/7),
> sparse index usage is trivially enabled (patch 4/7) for 'read-tree' except:
>
>  * usage with '--prefix'
>  * two- and three-way merge
>
> These cases require additional changes in order to work as expected (i.e.,
> outwardly matching non-sparse index sparse-checkout). For the former, the
> sparse index can be enabled as long as the index is expanded when the prefix
> is a directory outside the sparse cone (patch 5/7). For the latter, sparse
> directories that cannot be trivially merged must have their contents merged
> file-by-file, done by recursively traversing the trees represented by the
> sparse directories (patches 6/7 & 7/7).
>
>
> Changes since V1
> ================
>
>  * switched an empty string check from '!strlen(path)' to the
>    slightly-less-expensive '!*path'

I've read over the series.  It was a nice read, well motivated, and
split up rather nicely.  I only had a few small commetns.

I think it'd be nice to insert another patch into the series that
throws an error if the argument to --prefix starts with a '/'.  That
would also allow you to simplify patch 5/7 a little.

Patch 6/7 has the right idea, but I'm worried about one part of it; a
test would go a long way towards verifying whether that aspect is
handled correctly or whether my concern is warranted.

I had a couple smaller comments on some of the other patches.

Overall, nicely done.


> Thanks!
>
>  * Victoria
>
> [1]
> https://lore.kernel.org/git/pull.1109.v2.git.1641924306.gitgitgadget@gmail.com/
> [2]
> https://lore.kernel.org/git/pull.1048.v6.git.1638201164.gitgitgadget@gmail.com/
>
> Victoria Dye (7):
>   sparse-index: prevent repo root from becoming sparse
>   status: fix nested sparse directory diff in sparse index
>   read-tree: expand sparse checkout test coverage
>   read-tree: integrate with sparse index
>   read-tree: narrow scope of index expansion for '--prefix'
>   read-tree: make two-way merge sparse-aware
>   read-tree: make three-way merge sparse-aware
>
>  builtin/read-tree.c                      |  10 +-
>  dir.c                                    |   7 +-
>  t/perf/p2000-sparse-operations.sh        |   1 +
>  t/t1092-sparse-checkout-compatibility.sh | 129 +++++++++++++++++++++++
>  unpack-trees.c                           | 121 ++++++++++++++++++++-
>  wt-status.c                              |   9 ++
>  6 files changed, 268 insertions(+), 9 deletions(-)
>
>
> base-commit: e6ebfd0e8cbbd10878070c8a356b5ad1b3ca464e
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1157%2Fvdye%2Fsparse%2Fread-tree-v2
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1157/vdye/sparse/read-tree-v2
> Pull-Request: https://github.com/gitgitgadget/git/pull/1157
>
> Range-diff vs v1:
>
>  1:  90da1f9f33a ! 1:  744668eeece sparse-index: prevent repo root from becoming sparse
>      @@ Commit message
>           non-cone sparse-checkouts), the new check does not cause additional changes
>           to how sparse patterns are applied.
>
>      +    Helped-by: Derrick Stolee <derrickstolee@github.com>
>           Signed-off-by: Victoria Dye <vdye@github.com>
>
>        ## dir.c ##
>      @@ dir.c: static int path_in_sparse_checkout_1(const char *path,
>       +  * patterns, or the patterns are of the wrong type.
>          */
>       - if (init_sparse_checkout_patterns(istate) ||
>      -+ if (!strlen(path) ||
>      ++ if (!*path ||
>       +     init_sparse_checkout_patterns(istate) ||
>             (require_cone_mode &&
>              !istate->sparse_checkout_patterns->use_cone_patterns))
>  2:  c21c9b9be34 = 2:  f0cff03b95d status: fix nested sparse directory diff in sparse index
>  3:  ac42ae21d4a = 3:  ffe0b6aff2b read-tree: expand sparse checkout test coverage
>  4:  5ee193bfa87 = 4:  cb7e0cf419c read-tree: integrate with sparse index
>  5:  bea482b6b28 = 5:  4f05fa70209 read-tree: narrow scope of index expansion for '--prefix'
>  6:  9fdcab038b2 = 6:  94c2aad2f93 read-tree: make two-way merge sparse-aware
>  7:  1502e9acb32 = 7:  c4080e99d6e read-tree: make three-way merge sparse-aware
>
> --
> gitgitgadget

@@ -117,6 +117,7 @@ test_perf_on_all git diff
test_perf_on_all git diff --cached
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Ævar Arnfjörð Bjarmason wrote (reply to this):


On Thu, Feb 24 2022, Victoria Dye via GitGitGadget wrote:

> From: Victoria Dye <vdye@github.com>
>
> Add tests focused on how 'git read-tree' behaves in sparse checkouts. Extra
> emphasis is placed on interactions with files outside the sparse cone, e.g.
> merges with out-of-cone conflicts.
>
> Signed-off-by: Victoria Dye <vdye@github.com>
> ---
>  t/perf/p2000-sparse-operations.sh        |  1 +
>  t/t1092-sparse-checkout-compatibility.sh | 85 ++++++++++++++++++++++++
>  2 files changed, 86 insertions(+)
>
> diff --git a/t/perf/p2000-sparse-operations.sh b/t/perf/p2000-sparse-operations.sh
> index 2a7106b9495..382716cfca9 100755
> --- a/t/perf/p2000-sparse-operations.sh
> +++ b/t/perf/p2000-sparse-operations.sh
> @@ -117,6 +117,7 @@ test_perf_on_all git diff
>  test_perf_on_all git diff --cached
>  test_perf_on_all git blame $SPARSE_CONE/a
>  test_perf_on_all git blame $SPARSE_CONE/f3/a
> +test_perf_on_all git read-tree -mu HEAD
>  test_perf_on_all git checkout-index -f --all
>  test_perf_on_all git update-index --add --remove $SPARSE_CONE/a
>  
> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
> index b1dcaa0e642..9d58da4e925 100755
> --- a/t/t1092-sparse-checkout-compatibility.sh
> +++ b/t/t1092-sparse-checkout-compatibility.sh
> @@ -819,6 +819,91 @@ test_expect_success 'update-index --cacheinfo' '
>  	test_cmp expect sparse-checkout-out
>  '
>  
> +test_expect_success 'read-tree --merge with files outside sparse definition' '
> +	init_repos &&
> +
> +	test_all_match git checkout -b test-branch update-folder1 &&
> +	for MERGE_TREES in "base HEAD update-folder2" \
> +			   "update-folder1 update-folder2" \
> +			   "update-folder2"
> +	do
> +		# Clean up and remove on-disk files
> +		test_all_match git reset --hard HEAD &&
> +		test_sparse_match git sparse-checkout reapply &&
> +
> +		# Although the index matches, without --no-sparse-checkout, outside-of-
> +		# definition files will not exist on disk for sparse checkouts
> +		test_all_match git read-tree -mu $MERGE_TREES &&
> +		test_all_match git status --porcelain=v2 &&
> +		test_path_is_missing sparse-checkout/folder2 &&
> +		test_path_is_missing sparse-index/folder2 &&
> +
> +		test_all_match git read-tree --reset -u HEAD &&
> +		test_all_match git status --porcelain=v2 &&
> +
> +		test_all_match git read-tree -mu --no-sparse-checkout $MERGE_TREES &&
> +		test_all_match git status --porcelain=v2 &&
> +		test_cmp sparse-checkout/folder2/a sparse-index/folder2/a &&
> +		test_cmp sparse-checkout/folder2/a full-checkout/folder2/a || return 1
> +	done
> +'

Nit: Isn't this nicer/easier by unrolling the for-loop to the top-level, i.e.:

for MERGE_TREES in "base HEAD update-folder2" [...]
do
	test_expect_success "'read-tree -mu $MERGE_TREES' with files outside sparse definition" '
		init_repos &&
		test_when_finished "test_all_match git reset --hard HEAD" &&
                ...
	'
done

It makes failures easier to reason about since you see which for-loop
iteration you're in right away, and can e.g. pick one with --run.

And we can do the cleanup in test_when_finished instead of at the start
of every loop.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Victoria Dye wrote (reply to this):

Ævar Arnfjörð Bjarmason wrote:
> 
> On Thu, Feb 24 2022, Victoria Dye via GitGitGadget wrote:
> 
>> From: Victoria Dye <vdye@github.com>
>>
>> Add tests focused on how 'git read-tree' behaves in sparse checkouts. Extra
>> emphasis is placed on interactions with files outside the sparse cone, e.g.
>> merges with out-of-cone conflicts.
>>
>> Signed-off-by: Victoria Dye <vdye@github.com>
>> ---
>>  t/perf/p2000-sparse-operations.sh        |  1 +
>>  t/t1092-sparse-checkout-compatibility.sh | 85 ++++++++++++++++++++++++
>>  2 files changed, 86 insertions(+)
>>
>> diff --git a/t/perf/p2000-sparse-operations.sh b/t/perf/p2000-sparse-operations.sh
>> index 2a7106b9495..382716cfca9 100755
>> --- a/t/perf/p2000-sparse-operations.sh
>> +++ b/t/perf/p2000-sparse-operations.sh
>> @@ -117,6 +117,7 @@ test_perf_on_all git diff
>>  test_perf_on_all git diff --cached
>>  test_perf_on_all git blame $SPARSE_CONE/a
>>  test_perf_on_all git blame $SPARSE_CONE/f3/a
>> +test_perf_on_all git read-tree -mu HEAD
>>  test_perf_on_all git checkout-index -f --all
>>  test_perf_on_all git update-index --add --remove $SPARSE_CONE/a
>>  
>> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
>> index b1dcaa0e642..9d58da4e925 100755
>> --- a/t/t1092-sparse-checkout-compatibility.sh
>> +++ b/t/t1092-sparse-checkout-compatibility.sh
>> @@ -819,6 +819,91 @@ test_expect_success 'update-index --cacheinfo' '
>>  	test_cmp expect sparse-checkout-out
>>  '
>>  
>> +test_expect_success 'read-tree --merge with files outside sparse definition' '
>> +	init_repos &&
>> +
>> +	test_all_match git checkout -b test-branch update-folder1 &&
>> +	for MERGE_TREES in "base HEAD update-folder2" \
>> +			   "update-folder1 update-folder2" \
>> +			   "update-folder2"
>> +	do
>> +		# Clean up and remove on-disk files
>> +		test_all_match git reset --hard HEAD &&
>> +		test_sparse_match git sparse-checkout reapply &&
>> +
>> +		# Although the index matches, without --no-sparse-checkout, outside-of-
>> +		# definition files will not exist on disk for sparse checkouts
>> +		test_all_match git read-tree -mu $MERGE_TREES &&
>> +		test_all_match git status --porcelain=v2 &&
>> +		test_path_is_missing sparse-checkout/folder2 &&
>> +		test_path_is_missing sparse-index/folder2 &&
>> +
>> +		test_all_match git read-tree --reset -u HEAD &&
>> +		test_all_match git status --porcelain=v2 &&
>> +
>> +		test_all_match git read-tree -mu --no-sparse-checkout $MERGE_TREES &&
>> +		test_all_match git status --porcelain=v2 &&
>> +		test_cmp sparse-checkout/folder2/a sparse-index/folder2/a &&
>> +		test_cmp sparse-checkout/folder2/a full-checkout/folder2/a || return 1
>> +	done
>> +'
> 
> Nit: Isn't this nicer/easier by unrolling the for-loop to the top-level, i.e.:
> 
> for MERGE_TREES in "base HEAD update-folder2" [...]
> do
> 	test_expect_success "'read-tree -mu $MERGE_TREES' with files outside sparse definition" '
> 		init_repos &&
> 		test_when_finished "test_all_match git reset --hard HEAD" &&
>                 ...
> 	'
> done
> 
> It makes failures easier to reason about since you see which for-loop
> iteration you're in right away, and can e.g. pick one with --run.
> 

I like how this separates the test cases (while not adding any
redundant/copied code). I'll update in the next version, thanks!

> And we can do the cleanup in test_when_finished instead of at the start
> of every loop.

Because `init_repos` completely resets the test repos, this actually lets me
remove the extra cleanup steps completely.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Ævar Arnfjörð Bjarmason wrote (reply to this):


On Mon, Feb 28 2022, Victoria Dye wrote:

> Ævar Arnfjörð Bjarmason wrote:
>> 
>> On Thu, Feb 24 2022, Victoria Dye via GitGitGadget wrote:
>> 
>>> From: Victoria Dye <vdye@github.com>
>>>
>>> Add tests focused on how 'git read-tree' behaves in sparse checkouts. Extra
>>> emphasis is placed on interactions with files outside the sparse cone, e.g.
>>> merges with out-of-cone conflicts.
>>>
>>> Signed-off-by: Victoria Dye <vdye@github.com>
>>> ---
>>>  t/perf/p2000-sparse-operations.sh        |  1 +
>>>  t/t1092-sparse-checkout-compatibility.sh | 85 ++++++++++++++++++++++++
>>>  2 files changed, 86 insertions(+)
>>>
>>> diff --git a/t/perf/p2000-sparse-operations.sh b/t/perf/p2000-sparse-operations.sh
>>> index 2a7106b9495..382716cfca9 100755
>>> --- a/t/perf/p2000-sparse-operations.sh
>>> +++ b/t/perf/p2000-sparse-operations.sh
>>> @@ -117,6 +117,7 @@ test_perf_on_all git diff
>>>  test_perf_on_all git diff --cached
>>>  test_perf_on_all git blame $SPARSE_CONE/a
>>>  test_perf_on_all git blame $SPARSE_CONE/f3/a
>>> +test_perf_on_all git read-tree -mu HEAD
>>>  test_perf_on_all git checkout-index -f --all
>>>  test_perf_on_all git update-index --add --remove $SPARSE_CONE/a
>>>  
>>> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
>>> index b1dcaa0e642..9d58da4e925 100755
>>> --- a/t/t1092-sparse-checkout-compatibility.sh
>>> +++ b/t/t1092-sparse-checkout-compatibility.sh
>>> @@ -819,6 +819,91 @@ test_expect_success 'update-index --cacheinfo' '
>>>  	test_cmp expect sparse-checkout-out
>>>  '
>>>  
>>> +test_expect_success 'read-tree --merge with files outside sparse definition' '
>>> +	init_repos &&
>>> +
>>> +	test_all_match git checkout -b test-branch update-folder1 &&
>>> +	for MERGE_TREES in "base HEAD update-folder2" \
>>> +			   "update-folder1 update-folder2" \
>>> +			   "update-folder2"
>>> +	do
>>> +		# Clean up and remove on-disk files
>>> +		test_all_match git reset --hard HEAD &&
>>> +		test_sparse_match git sparse-checkout reapply &&
>>> +
>>> +		# Although the index matches, without --no-sparse-checkout, outside-of-
>>> +		# definition files will not exist on disk for sparse checkouts
>>> +		test_all_match git read-tree -mu $MERGE_TREES &&
>>> +		test_all_match git status --porcelain=v2 &&
>>> +		test_path_is_missing sparse-checkout/folder2 &&
>>> +		test_path_is_missing sparse-index/folder2 &&
>>> +
>>> +		test_all_match git read-tree --reset -u HEAD &&
>>> +		test_all_match git status --porcelain=v2 &&
>>> +
>>> +		test_all_match git read-tree -mu --no-sparse-checkout $MERGE_TREES &&
>>> +		test_all_match git status --porcelain=v2 &&
>>> +		test_cmp sparse-checkout/folder2/a sparse-index/folder2/a &&
>>> +		test_cmp sparse-checkout/folder2/a full-checkout/folder2/a || return 1
>>> +	done
>>> +'
>> 
>> Nit: Isn't this nicer/easier by unrolling the for-loop to the top-level, i.e.:
>> 
>> for MERGE_TREES in "base HEAD update-folder2" [...]
>> do
>> 	test_expect_success "'read-tree -mu $MERGE_TREES' with files outside sparse definition" '
>> 		init_repos &&
>> 		test_when_finished "test_all_match git reset --hard HEAD" &&
>>                 ...
>> 	'
>> done
>> 
>> It makes failures easier to reason about since you see which for-loop
>> iteration you're in right away, and can e.g. pick one with --run.
>> 
>
> I like how this separates the test cases (while not adding any
> redundant/copied code). I'll update in the next version, thanks!
>
>> And we can do the cleanup in test_when_finished instead of at the start
>> of every loop.

Sounds good!

Note for <reasons> we eval the body of the test into existence, but
*not* the description. So:

    for x in [...] test_expect_success "$x" '$x'

Works to expand "$x" in both cases, but not:

    for x in [...] test_expect_success '$x' '$x'

And you don't need to do:

    for x in [...] test_expect_success "$x" "$x"

Which is handy as double-quoting the body is often a hassle with
escaping stuff.

I only think I got that wrong the first 1, 2, 3.... etc. times I used
this pattern, so I thought I'd mention it :)

> Because `init_repos` completely resets the test repos, this actually lets me
> remove the extra cleanup steps completely.

\o/

@gitgitgadget
Copy link

gitgitgadget bot commented Feb 28, 2022

User Ævar Arnfjörð Bjarmason <avarab@gmail.com> has been added to the cc: list.

@gitgitgadget
Copy link

gitgitgadget bot commented Feb 28, 2022

This patch series was integrated into seen via git@73c31f7.

@gitgitgadget
Copy link

gitgitgadget bot commented Mar 1, 2022

This patch series was integrated into seen via git@c6a76f4.

@gitgitgadget
Copy link

gitgitgadget bot commented Mar 1, 2022

This patch series was integrated into seen via git@f13af5e.

@gitgitgadget
Copy link

gitgitgadget bot commented Mar 1, 2022

There was a status update in the "Cooking" section about the branch vd/sparse-read-tree on the Git mailing list:

"git read-tree" has been made to be aware of the sparse-index
feature.

Needs review.
source: <pull.1157.v2.git.1645742073.gitgitgadget@gmail.com>

Enable the 'recursive' diff option for the diff executed as part of 'git
status'. Without the 'recursive' enabled, 'git status' reports index
changes incorrectly when the following conditions were met:

* sparse index is enabled
* there is a difference between the index and HEAD in a file inside a
  *subdirectory* of a sparse directory
* the sparse directory index entry is *not* expanded in-core

Because it is not recursive by default, the diff in 'git status' reports
changes only at the level of files and directories that are immediate
children of a sparse directory, rather than recursing into directories with
changes to identify the modified file(s). As a result, 'git status' reports
the immediate subdirectory itself as "modified".

Example:

$ git init
$ mkdir -p sparse/sub
$ echo test >sparse/sub/foo
$ git add .
$ git commit -m "commit 1"
$ echo somethingelse >sparse/sub/foo
$ git add .
$ git commit -a -m "commit 2"
$ git sparse-checkout set --cone --sparse-index 'sparse'
$ git reset --soft HEAD~1
$ git status
On branch master
You are in a sparse checkout.

Changes to be committed:
  (use "git restore --staged <file>..." to unstage)
        modified:   sparse/sub

Enabling the 'recursive' diff option in 'wt_status_collect_changes_index'
corrects this issue by allowing the diff to recurse into subdirectories of
sparse directories to find modified files. Given the same repository setup
as the example above, the corrected result of `git status` is:

$ git status
On branch master
You are in a sparse checkout.

Changes to be committed:
  (use "git restore --staged <file>..." to unstage)
        modified:   sparse/sub/foo

Signed-off-by: Victoria Dye <vdye@github.com>
@gitgitgadget
Copy link

gitgitgadget bot commented Mar 1, 2022

This patch series was integrated into seen via git@c5fc750.

@gitgitgadget
Copy link

gitgitgadget bot commented Mar 1, 2022

This patch series was integrated into seen via git@a732655.

@gitgitgadget
Copy link

gitgitgadget bot commented Mar 2, 2022

On the Git mailing list, Elijah Newren wrote (reply to this):

On Tue, Mar 1, 2022 at 12:24 PM Victoria Dye via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> Like previous integrations [1] [2], this series allows 'git read-tree' to
> operate using a sparse index.
>
> The first three patches are bugfixes for issues found while implementing the
> 'read-tree' integration:
>
>  * The first (patch [1/8]) fixes an edge case in which a repo with no
>    in-cone files or directories would have its root collapsed into a sparse
>    directory; the fix ensures the root directory is never collapsed into a
>    sparse directory.
>  * The second (patch [2/8]) corrects the 'git status' reporting of changes
>    nested inside the subdirectory of a sparse directory, ensuring that the
>    modified file (not the subdirectory) is correctly reported as having
>    changes.
>  * The third (patch [3/8]) explicitly disallows the somewhat ambiguous case
>    of using 'git read-tree --prefix=/' to represent the repo root (it could
>    be interpreted as a leading slash - even though it's actually trailing
>    slash - which is not allowed for prefixes). Now, an error specifying that
>    prefixes cannot begin with '/' guides users more directly to the correct
>    syntax. If a user does want to specify the repo root as their prefix,
>    that can still be done with 'git read-tree --prefix='
>
> The remainder of the series focuses on utilizing the sparse index in 'git
> read-tree'. After some baseline behavior-establishing tests (patch [4/8]),
> sparse index usage is trivially enabled (patch [5/8]) for 'read-tree'
> except:
>
>  * usage with '--prefix'
>  * two- and three-way merge
>
> These cases require additional changes in order to work as expected (i.e.,
> outwardly matching non-sparse index sparse-checkout). For the former, the
> sparse index can be enabled as long as the index is expanded when the prefix
> is a directory outside the sparse cone (patch [6/8]). For the latter, sparse
> directories that cannot be trivially merged must have their contents merged
> file-by-file, done by recursively traversing the trees represented by the
> sparse directories (patches [7/8] & [8/8]).
>
>
> Changes since V2
> ================
>
>  * revised/clarified explanation for why the 'recursive' flag is necessary
>    in 'git status' internal diff in both the commit message for [2/8] and
>    the code comments
>  * added patch [3/8] to disallow '/' as a prefix
>  * moved the loop over different 'read-tree' scenarios outside the test in
>    t1092 test 'read-tree --merge with files outside sparse definition',
>    separating the tests into individually-runnable and traceable cases
>  * improved test case clarity and variety in 'read-tree --prefix' tests
>  * removed redundant arg to 'git reset' in 'sparse index is not expanded:
>    read-tree'
>  * pointed out pre-existing testing that covers '--prefix' in [6/8] commit
>    message
>  * moved prefix-dependent index expansion logic into function
>    'update_sparsity_for_prefix', improved explanation for function behavior
>    in comments
>  * pointed out pre-existing testing that covers two- and three-way merge in
>    [7/8] & [8/8] commit messages, respectively
>  * added test cases for "ensure not expanded" tests of read-tree for both
>    two- and three-way merge
>  * fixed issue with three-way merge implementation (if the "no merge" cases
>    described in 't/t1000-read-tree-m-3way.sh' are reached, sparse
>    directories should be recursively merged), testing added in the "ensure
>    not expanded" cases

I've read through all of these.  This round is:

Reviewed-by: Elijah Newren <newren@gmail.com>

>
>
> Changes since V1
> ================
>
>  * switched an empty string check from '!strlen(path)' to the
>    slightly-less-expensive '!*path'
>
> Thanks!
>
>  * Victoria
>
> [1]
> https://lore.kernel.org/git/pull.1109.v2.git.1641924306.gitgitgadget@gmail.com/
> [2]
> https://lore.kernel.org/git/pull.1048.v6.git.1638201164.gitgitgadget@gmail.com/
>
> Victoria Dye (8):
>   sparse-index: prevent repo root from becoming sparse
>   status: fix nested sparse directory diff in sparse index
>   read-tree: explicitly disallow prefixes with a leading '/'
>   read-tree: expand sparse checkout test coverage
>   read-tree: integrate with sparse index
>   read-tree: narrow scope of index expansion for '--prefix'
>   read-tree: make two-way merge sparse-aware
>   read-tree: make three-way merge sparse-aware
>
>  builtin/read-tree.c                      |  14 ++-
>  dir.c                                    |   7 +-
>  t/perf/p2000-sparse-operations.sh        |   1 +
>  t/t1003-read-tree-prefix.sh              |  10 ++
>  t/t1092-sparse-checkout-compatibility.sh | 133 ++++++++++++++++++++
>  unpack-trees.c                           | 147 +++++++++++++++++++++--
>  wt-status.c                              |   9 ++
>  7 files changed, 308 insertions(+), 13 deletions(-)
>
>
> base-commit: e6ebfd0e8cbbd10878070c8a356b5ad1b3ca464e
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1157%2Fvdye%2Fsparse%2Fread-tree-v3
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1157/vdye/sparse/read-tree-v3
> Pull-Request: https://github.com/gitgitgadget/git/pull/1157
>
> Range-diff vs v2:
>
>  1:  744668eeece = 1:  744668eeece sparse-index: prevent repo root from becoming sparse
>  2:  f0cff03b95d ! 2:  26f4d30bd95 status: fix nested sparse directory diff in sparse index
>      @@ Metadata
>        ## Commit message ##
>           status: fix nested sparse directory diff in sparse index
>
>      -    Add the 'recursive' flag to 'wt_status_collect_changes_index(...)'. Without
>      -    the 'recursive' flag, 'git status' could report index changes incorrectly
>      -    when the following conditions were met:
>      +    Enable the 'recursive' diff option for the diff executed as part of 'git
>      +    status'. Without the 'recursive' enabled, 'git status' reports index
>      +    changes incorrectly when the following conditions were met:
>
>           * sparse index is enabled
>           * there is a difference between the index and HEAD in a file inside a
>             *subdirectory* of a sparse directory
>           * the sparse directory index entry is *not* expanded in-core
>
>      -    In this scenario, 'git status' would not recurse into the sparse directory's
>      -    subdirectories to identify which file contained the difference between the
>      -    index and HEAD. Instead, it would report the immediate subdirectory itself
>      -    as "modified".
>      +    Because it is not recursive by default, the diff in 'git status' reports
>      +    changes only at the level of files and directories that are immediate
>      +    children of a sparse directory, rather than recursing into directories with
>      +    changes to identify the modified file(s). As a result, 'git status' reports
>      +    the immediate subdirectory itself as "modified".
>
>           Example:
>
>      @@ Commit message
>             (use "git restore --staged <file>..." to unstage)
>                   modified:   sparse/sub
>
>      -    The 'recursive' diff option in 'wt_status_collect_changes_index' corrects
>      -    this by indicating that 'git status' should recurse into sparse directories
>      -    to find modified files. Given the same repository setup as the example
>      -    above, the corrected result of `git status` is:
>      +    Enabling the 'recursive' diff option in 'wt_status_collect_changes_index'
>      +    corrects this issue by allowing the diff to recurse into subdirectories of
>      +    sparse directories to find modified files. Given the same repository setup
>      +    as the example above, the corrected result of `git status` is:
>
>           $ git status
>           On branch master
>      @@ wt-status.c: static void wt_status_collect_changes_index(struct wt_status *s)
>         rev.diffopt.rename_score = s->rename_score >= 0 ? s->rename_score : rev.diffopt.rename_score;
>       +
>       + /*
>      -+  * The `recursive` option must be enabled to show differences in files
>      -+  * *more than* one level deep in a sparse directory index entry (e.g., given
>      -+  * sparse directory 'sparse-dir/', reporting a difference in the file
>      -+  * 'sparse-dir/another-dir/my-file').
>      ++  * The `recursive` option must be enabled to allow the diff to recurse
>      ++  * into subdirectories of sparse directory index entries. If it is not
>      ++  * enabled, a subdirectory containing file(s) with changes is reported
>      ++  * as "modified", rather than the modified files themselves.
>       +  */
>       + rev.diffopt.flags.recursive = 1;
>       +
>  -:  ----------- > 3:  e48a281a4d3 read-tree: explicitly disallow prefixes with a leading '/'
>  3:  ffe0b6aff2b ! 4:  90ebcb7b8ff read-tree: expand sparse checkout test coverage
>      @@ Commit message
>           emphasis is placed on interactions with files outside the sparse cone, e.g.
>           merges with out-of-cone conflicts.
>
>      +    Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
>           Signed-off-by: Victoria Dye <vdye@github.com>
>
>        ## t/perf/p2000-sparse-operations.sh ##
>      @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'update-index --ca
>         test_cmp expect sparse-checkout-out
>        '
>
>      -+test_expect_success 'read-tree --merge with files outside sparse definition' '
>      -+ init_repos &&
>      -+
>      -+ test_all_match git checkout -b test-branch update-folder1 &&
>      -+ for MERGE_TREES in "base HEAD update-folder2" \
>      -+                    "update-folder1 update-folder2" \
>      -+                    "update-folder2"
>      -+ do
>      -+         # Clean up and remove on-disk files
>      -+         test_all_match git reset --hard HEAD &&
>      -+         test_sparse_match git sparse-checkout reapply &&
>      ++for MERGE_TREES in "base HEAD update-folder2" \
>      ++            "update-folder1 update-folder2" \
>      ++            "update-folder2"
>      ++do
>      ++ test_expect_success "'read-tree -mu $MERGE_TREES' with files outside sparse definition" '
>      ++         init_repos &&
>       +
>       +         # Although the index matches, without --no-sparse-checkout, outside-of-
>       +         # definition files will not exist on disk for sparse checkouts
>      @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'update-index --ca
>       +         test_all_match git read-tree -mu --no-sparse-checkout $MERGE_TREES &&
>       +         test_all_match git status --porcelain=v2 &&
>       +         test_cmp sparse-checkout/folder2/a sparse-index/folder2/a &&
>      -+         test_cmp sparse-checkout/folder2/a full-checkout/folder2/a || return 1
>      -+ done
>      -+'
>      ++         test_cmp sparse-checkout/folder2/a full-checkout/folder2/a
>      ++
>      ++ '
>      ++done
>       +
>       +test_expect_success 'read-tree --merge with edit/edit conflicts in sparse directories' '
>       + init_repos &&
>      @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'update-index --ca
>       + test_all_match test_must_fail git read-tree -mu rename-out-to-in update-folder1
>       +'
>       +
>      -+test_expect_success 'read-tree --prefix outside sparse definition' '
>      ++test_expect_success 'read-tree --prefix' '
>       + init_repos &&
>       +
>      -+ # Cannot read-tree --prefix with a single argument when files exist within
>      -+ # prefix
>      -+ test_all_match test_must_fail git read-tree --prefix=folder1/ -u update-folder1 &&
>      ++ # If files differing between the index and target <commit-ish> exist
>      ++ # inside the prefix, `read-tree --prefix` should fail
>      ++ test_all_match test_must_fail git read-tree --prefix=deep/ deepest &&
>      ++ test_all_match test_must_fail git read-tree --prefix=folder1/ update-folder1 &&
>       +
>      -+ test_all_match git read-tree --prefix=folder2/0 -u rename-base &&
>      -+ test_path_is_missing sparse-checkout/folder2 &&
>      -+ test_path_is_missing sparse-index/folder2 &&
>      ++ # If no differing index entries exist matching the prefix,
>      ++ # `read-tree --prefix` updates the index successfully
>      ++ test_all_match git rm -rf deep/deeper1/deepest/ &&
>      ++ test_all_match git read-tree --prefix=deep/deeper1/deepest -u deepest &&
>      ++ test_all_match git status --porcelain=v2 &&
>      ++
>      ++ test_all_match git rm -rf --sparse folder1/ &&
>      ++ test_all_match git read-tree --prefix=folder1/ -u update-folder1 &&
>      ++ test_all_match git status --porcelain=v2 &&
>       +
>      -+ test_all_match git read-tree --reset -u HEAD &&
>      -+ test_all_match git read-tree --prefix=folder2/0 -u --no-sparse-checkout rename-base &&
>      -+ test_cmp sparse-checkout/folder2/0/a sparse-index/folder2/0/a &&
>      -+ test_cmp sparse-checkout/folder2/0/a full-checkout/folder2/0/a
>      ++ test_all_match git rm -rf --sparse folder2/0 &&
>      ++ test_all_match git read-tree --prefix=folder2/0/ -u rename-out-to-out &&
>      ++ test_all_match git status --porcelain=v2
>       +'
>       +
>       +test_expect_success 'read-tree --merge with directory-file conflicts' '
>  4:  cb7e0cf419c ! 5:  015618a9f29 read-tree: integrate with sparse index
>      @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'sparse index is n
>       + for MERGE_TREES in "update-folder2"
>       + do
>       +         ensure_not_expanded read-tree -mu $MERGE_TREES &&
>      -+         ensure_not_expanded reset --hard HEAD || return 1
>      ++         ensure_not_expanded reset --hard || return 1
>       + done
>       +'
>       +
>  5:  4f05fa70209 ! 6:  1a9365c3bc5 read-tree: narrow scope of index expansion for '--prefix'
>      @@ Commit message
>           If the prefix is in-cone, its sparse subdirectories (if any) will be
>           traversed correctly without index expansion.
>
>      +    The behavior of 'git read-tree' with prefixes 1) inside of cone, 2) equal to
>      +    a sparse directory, and 3) inside a sparse directory are all tested as part
>      +    of the 't/t1092-sparse-checkout-compatibility.sh' test 'read-tree --prefix',
>      +    ensuring that the sparse index case works the way it did prior to this
>      +    change as well as matching non-sparse index sparse-checkout.
>      +
>      +    Helped-by: Elijah Newren <newren@gmail.com>
>           Signed-off-by: Victoria Dye <vdye@github.com>
>
>        ## builtin/read-tree.c ##
>      @@ t/t1092-sparse-checkout-compatibility.sh
>       @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'sparse index is not expanded: read-tree' '
>         do
>                 ensure_not_expanded read-tree -mu $MERGE_TREES &&
>      -          ensure_not_expanded reset --hard HEAD || return 1
>      +          ensure_not_expanded reset --hard || return 1
>       - done
>       + done &&
>       +
>      @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'sparse index is n
>        test_expect_success 'ls-files' '
>
>        ## unpack-trees.c ##
>      -@@ unpack-trees.c: int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
>      -          setup_standard_excludes(o->dir);
>      -  }
>      +@@ unpack-trees.c: static void populate_from_existing_patterns(struct unpack_trees_options *o,
>      +          o->pl = pl;
>      + }
>
>      -+ /*
>      -+  * If the prefix is equal to or contained within a sparse directory, the
>      -+  * index needs to be expanded to traverse with the specified prefix.
>      -+  */
>      -+ if (o->prefix && o->src_index->sparse_index) {
>      -+         int prefix_len = strlen(o->prefix);
>      ++static void update_sparsity_for_prefix(const char *prefix,
>      ++                                struct index_state *istate)
>      ++{
>      ++ int prefix_len = strlen(prefix);
>      ++ struct strbuf ce_prefix = STRBUF_INIT;
>       +
>      -+         while (prefix_len > 0 && o->prefix[prefix_len - 1] == '/')
>      -+                 prefix_len--;
>      ++ if (!istate->sparse_index)
>      ++         return;
>       +
>      -+         if (prefix_len > 0) {
>      -+                 struct strbuf ce_prefix = STRBUF_INIT;
>      -+                 strbuf_grow(&ce_prefix, prefix_len + 1);
>      -+                 strbuf_add(&ce_prefix, o->prefix, prefix_len);
>      -+                 strbuf_addch(&ce_prefix, '/');
>      ++ while (prefix_len > 0 && prefix[prefix_len - 1] == '/')
>      ++         prefix_len--;
>       +
>      -+                 /*
>      -+                  * If the prefix is not inside the sparse cone, then the
>      -+                  * index is explicitly expanded if it is found as a sparse
>      -+                  * directory, or implicitly expanded (by 'index_name_pos')
>      -+                  * if the path is inside a sparse directory.
>      -+                  */
>      -+                 if (!path_in_cone_mode_sparse_checkout(ce_prefix.buf, o->src_index) &&
>      -+                     index_name_pos(o->src_index, ce_prefix.buf, ce_prefix.len) >= 0)
>      -+                         ensure_full_index(o->src_index);
>      ++ if (prefix_len <= 0)
>      ++         BUG("Invalid prefix passed to update_sparsity_for_prefix");
>       +
>      -+                 strbuf_release(&ce_prefix);
>      -+         }
>      -+ }
>      ++ strbuf_grow(&ce_prefix, prefix_len + 1);
>      ++ strbuf_add(&ce_prefix, prefix, prefix_len);
>      ++ strbuf_addch(&ce_prefix, '/');
>      ++
>      ++ /*
>      ++  * If the prefix points to a sparse directory or a path inside a sparse
>      ++  * directory, the index should be expanded. This is accomplished in one
>      ++  * of two ways:
>      ++  * - if the prefix is inside a sparse directory, it will be expanded by
>      ++  *   the 'ensure_full_index(...)' call in 'index_name_pos(...)'.
>      ++  * - if the prefix matches an existing sparse directory entry,
>      ++  *   'index_name_pos(...)' will return its index position, triggering
>      ++  *   the 'ensure_full_index(...)' below.
>      ++  */
>      ++ if (!path_in_cone_mode_sparse_checkout(ce_prefix.buf, istate) &&
>      ++     index_name_pos(istate, ce_prefix.buf, ce_prefix.len) >= 0)
>      ++         ensure_full_index(istate);
>      ++
>      ++ strbuf_release(&ce_prefix);
>      ++}
>      +
>      + static int verify_absent(const struct cache_entry *,
>      +                   enum unpack_trees_error_types,
>      +@@ unpack-trees.c: int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
>      +          setup_standard_excludes(o->dir);
>      +  }
>      +
>      ++ if (o->prefix)
>      ++         update_sparsity_for_prefix(o->prefix, o->src_index);
>       +
>         if (!core_apply_sparse_checkout || !o->update)
>                 o->skip_sparse_checkout = 1;
>  6:  94c2aad2f93 ! 7:  71bec3ddad1 read-tree: make two-way merge sparse-aware
>      @@ Commit message
>           This process allows sparse directories to be individually merged at the
>           necessary depth *without* expanding a full index.
>
>      +    The 't/t1092-sparse-checkout-compatibility.sh' test 'read-tree --merge with
>      +    edit/edit conflicts in sparse directories' tests two-way merges with 1)
>      +    changes inside sparse directories that do not conflict and 2) changes that
>      +    do conflict (with the correct file(s) reported in the error message).
>      +    Additionally, add two-way merge cases to 'sparse index is not expanded:
>      +    read-tree' to confirm that the index is not expanded regardless of whether
>      +    edit/edit conflicts are present in a sparse directory.
>      +
>           Signed-off-by: Victoria Dye <vdye@github.com>
>
>        ## builtin/read-tree.c ##
>      @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'sparse index is n
>
>         ensure_not_expanded checkout -b test-branch update-folder1 &&
>       - for MERGE_TREES in "update-folder2"
>      -+ for MERGE_TREES in "update-folder2" \
>      -+                    "base update-folder2"
>      ++ for MERGE_TREES in "base update-folder2" \
>      ++                    "base rename-base" \
>      ++                    "update-folder2"
>         do
>                 ensure_not_expanded read-tree -mu $MERGE_TREES &&
>      -          ensure_not_expanded reset --hard HEAD || return 1
>      +          ensure_not_expanded reset --hard || return 1
>
>        ## unpack-trees.c ##
>       @@ unpack-trees.c: static int is_sparse_directory_entry(struct cache_entry *ce,
>  7:  c4080e99d6e ! 8:  6bc11325dd1 read-tree: make three-way merge sparse-aware
>      @@ Commit message
>           merge, the contents of each conflicted sparse directory are merged without
>           referencing the index, avoiding sparse index expansion.
>
>      +    As with two-way merge, the 't/t1092-sparse-checkout-compatibility.sh' test
>      +    'read-tree --merge with edit/edit conflicts in sparse directories' confirms
>      +    that three-way merges with edit/edit changes (both with and without
>      +    conflicts) inside a sparse directory result in the correct index state or
>      +    error message. To ensure the index is not unnecessarily expanded, add
>      +    three-way merge cases to 'sparse index is not expanded: read-tree'.
>      +
>           Signed-off-by: Victoria Dye <vdye@github.com>
>
>        ## builtin/read-tree.c ##
>      @@ builtin/read-tree.c: int cmd_read_tree(int argc, const char **argv, const char *
>
>        ## t/t1092-sparse-checkout-compatibility.sh ##
>       @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'sparse index is not expanded: read-tree' '
>      +  init_repos &&
>
>         ensure_not_expanded checkout -b test-branch update-folder1 &&
>      -  for MERGE_TREES in "update-folder2" \
>      --                    "base update-folder2"
>      +- for MERGE_TREES in "base update-folder2" \
>      ++ for MERGE_TREES in "base HEAD update-folder2" \
>      ++                    "base HEAD rename-base" \
>       +                    "base update-folder2" \
>      -+                    "base HEAD update-folder2"
>      +                     "base rename-base" \
>      +                     "update-folder2"
>         do
>      -          ensure_not_expanded read-tree -mu $MERGE_TREES &&
>      -          ensure_not_expanded reset --hard HEAD || return 1
>
>        ## unpack-trees.c ##
>       @@ unpack-trees.c: int threeway_merge(const struct cache_entry * const *stages,
>      @@ unpack-trees.c: int threeway_merge(const struct cache_entry * const *stages,
>
>         if (head) {
>                 /* #5ALT, #15 */
>      +@@ unpack-trees.c: int threeway_merge(const struct cache_entry * const *stages,
>      +
>      +  }
>      +
>      +- /* Below are "no merge" cases, which require that the index be
>      +-  * up-to-date to avoid the files getting overwritten with
>      +-  * conflict resolution files.
>      +-  */
>      ++ /* Handle "no merge" cases (see t/t1000-read-tree-m-3way.sh) */
>      +  if (index) {
>      ++         /*
>      ++          * If we've reached the "no merge" cases and we're merging
>      ++          * a sparse directory, we may have an "edit/edit" conflict that
>      ++          * can be resolved by individually merging directory contents.
>      ++          */
>      ++         if (S_ISSPARSEDIR(index->ce_mode))
>      ++                 return merged_sparse_dir(stages, 4, o);
>      ++
>      ++         /*
>      ++          * If we're not merging a sparse directory, ensure the index is
>      ++          * up-to-date to avoid files getting overwritten with conflict
>      ++          * resolution files
>      ++          */
>      +          if (verify_uptodate(index, o))
>      +                  return -1;
>      +  }
>
> --
> gitgitgadget

@gitgitgadget
Copy link

gitgitgadget bot commented Mar 2, 2022

This patch series was integrated into seen via git@64a4615.

@gitgitgadget
Copy link

gitgitgadget bot commented Mar 2, 2022

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 3/2/2022 2:22 AM, Elijah Newren wrote:
> On Tue, Mar 1, 2022 at 12:24 PM Victoria Dye via GitGitGadget
> <gitgitgadget@gmail.com> wrote:

> I've read through all of these.  This round is:
> 
> Reviewed-by: Elijah Newren <newren@gmail.com>

Thanks to your and Ævar's extra eyes, this series has
only improved since I reviewed v1.

Thanks,
-Stolee

@gitgitgadget
Copy link

gitgitgadget bot commented Mar 3, 2022

This patch series was integrated into seen via git@ffe9d23.

@gitgitgadget
Copy link

gitgitgadget bot commented Mar 3, 2022

This patch series was integrated into seen via git@d1d8afc.

@@ -160,15 +160,22 @@ int cmd_read_tree(int argc, const char **argv, const char *cmd_prefix)
argc = parse_options(argc, argv, cmd_prefix, read_tree_options,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Glen Choo wrote (reply to this):

"Victoria Dye via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Victoria Dye <vdye@github.com>
> +static void update_sparsity_for_prefix(const char *prefix,
> +				       struct index_state *istate)
> +{
> +	int prefix_len = strlen(prefix);
> +	struct strbuf ce_prefix = STRBUF_INIT;
> +
> +	if (!istate->sparse_index)
> +		return;
> +
> +	while (prefix_len > 0 && prefix[prefix_len - 1] == '/')
> +		prefix_len--;
> +
> +	if (prefix_len <= 0)
> +		BUG("Invalid prefix passed to update_sparsity_for_prefix");
> +
> +	strbuf_grow(&ce_prefix, prefix_len + 1);
> +	strbuf_add(&ce_prefix, prefix, prefix_len);
> +	strbuf_addch(&ce_prefix, '/');
> +
> +	/*
> +	 * If the prefix points to a sparse directory or a path inside a sparse
> +	 * directory, the index should be expanded. This is accomplished in one
> +	 * of two ways:
> +	 * - if the prefix is inside a sparse directory, it will be expanded by
> +	 *   the 'ensure_full_index(...)' call in 'index_name_pos(...)'.
> +	 * - if the prefix matches an existing sparse directory entry,
> +	 *   'index_name_pos(...)' will return its index position, triggering
> +	 *   the 'ensure_full_index(...)' below.
> +	 */
> +	if (!path_in_cone_mode_sparse_checkout(ce_prefix.buf, istate) &&
> +	    index_name_pos(istate, ce_prefix.buf, ce_prefix.len) >= 0)
> +		ensure_full_index(istate);
> +
> +	strbuf_release(&ce_prefix);
> +}

Hm, I don't think I follow the rationale for having two different ways
of expanding the index:

- If the prefix is inside a sparse directory, we should expand the
  index.
- If the prefix matches a sparse directory entry, we should expand the
  index.

So it seems like distinguishing between the two cases with
index_name_pos(...) isn't necessary. I've attached a diff that does
exactly this, and it passes t1092-sparse-checkout-compatibility.sh as
far as I can tell. I've also amended the comment in a way that makes
more sense to me, but I'm not 100% sure if it's accurate.

I'm also a little averse to using a side effect of index_name_pos() to
achieve what we really want, so I'd prefer to get rid of the call if we
can :)

----- >8 --------- >8 --------- >8 --------- >8 --------- >8 ----

diff --git a/unpack-trees.c b/unpack-trees.c
index b876caca0d..5b07055605 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -1749,17 +1749,11 @@ static void update_sparsity_for_prefix(const char *prefix,
 	strbuf_addch(&ce_prefix, '/');
 
 	/*
-	 * If the prefix points to a sparse directory or a path inside a sparse
-	 * directory, the index should be expanded. This is accomplished in one
-	 * of two ways:
-	 * - if the prefix is inside a sparse directory, it will be expanded by
-	 *   the 'ensure_full_index(...)' call in 'index_name_pos(...)'.
-	 * - if the prefix matches an existing sparse directory entry,
-	 *   'index_name_pos(...)' will return its index position, triggering
-	 *   the 'ensure_full_index(...)' below.
+	 * If the prefix points to a sparse directory or a path inside a
+	 * sparse directory (not within the sparse patterns), the index
+	 * should be expanded.
 	 */
-	if (!path_in_cone_mode_sparse_checkout(ce_prefix.buf, istate) &&
-	    index_name_pos(istate, ce_prefix.buf, ce_prefix.len) >= 0)
+	if (!path_in_cone_mode_sparse_checkout(ce_prefix.buf, istate))
 		ensure_full_index(istate);
 
 	strbuf_release(&ce_prefix);

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Victoria Dye wrote (reply to this):

Glen Choo wrote:
> "Victoria Dye via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> From: Victoria Dye <vdye@github.com>
>> +static void update_sparsity_for_prefix(const char *prefix,
>> +				       struct index_state *istate)
>> +{
>> +	int prefix_len = strlen(prefix);
>> +	struct strbuf ce_prefix = STRBUF_INIT;
>> +
>> +	if (!istate->sparse_index)
>> +		return;
>> +
>> +	while (prefix_len > 0 && prefix[prefix_len - 1] == '/')
>> +		prefix_len--;
>> +
>> +	if (prefix_len <= 0)
>> +		BUG("Invalid prefix passed to update_sparsity_for_prefix");
>> +
>> +	strbuf_grow(&ce_prefix, prefix_len + 1);
>> +	strbuf_add(&ce_prefix, prefix, prefix_len);
>> +	strbuf_addch(&ce_prefix, '/');
>> +
>> +	/*
>> +	 * If the prefix points to a sparse directory or a path inside a sparse
>> +	 * directory, the index should be expanded. This is accomplished in one
>> +	 * of two ways:
>> +	 * - if the prefix is inside a sparse directory, it will be expanded by
>> +	 *   the 'ensure_full_index(...)' call in 'index_name_pos(...)'.
>> +	 * - if the prefix matches an existing sparse directory entry,
>> +	 *   'index_name_pos(...)' will return its index position, triggering
>> +	 *   the 'ensure_full_index(...)' below.
>> +	 */
>> +	if (!path_in_cone_mode_sparse_checkout(ce_prefix.buf, istate) &&
>> +	    index_name_pos(istate, ce_prefix.buf, ce_prefix.len) >= 0)
>> +		ensure_full_index(istate);
>> +
>> +	strbuf_release(&ce_prefix);
>> +}
> 
> Hm, I don't think I follow the rationale for having two different ways
> of expanding the index:
> 
> - If the prefix is inside a sparse directory, we should expand the
>   index.
> - If the prefix matches a sparse directory entry, we should expand the
>   index.
> 

There's an (admittedly subtle if you aren't familiar with sparse index)
distinction between a "sparse directory" index entry and "a directory
outside the sparse-checkout cone with SKIP_WORKTREE enabled on its entries".
Ultimately, that's what necessitates the two checks but, as in [1], I want
to use this as an opportunity to shed some light on what 'unpack_trees(...)'
does.

Taking a step back, why would index expansion matter when you pass a prefix
to 'read-tree'? The answer lies in the tree traversal at the core of
'unpack_trees(...)'; when a prefix is provided, 'unpack_trees' does the
following:

1. Set the "traversal prefix" to the user-provided prefix.
2. Call 'traverse_trees(...)', iterating through the child files/directories
   of the prefix directory (NOTE: this does *not* use the index - it finds
   the child entries by extracting them from the tree).
3. For each child, call 'unpack_callback' which, among other things, looks
   for the child entry as it exists in the index to merge the input trees
   into it.

The problem with sparse directories arises in step 3. Suppose you have the
following repo:

.
├── bar
│   └── f1
├── baz
│   ├── deep
│   │   └── a
│   └── f2
├── foo
└── foo1


where directory 'baz/' is sparse, leading to an index that looks like this
(per 'git ls-files -t --sparse'):

H bar/f1
S baz/
H foo
H foo1

If you provide 'baz/' as a prefix, 'unpack_callback(...)' tries to find
'baz/deep/a' and 'baz/f2' in the index, but they won't be found because
they're "wrapped" in sparse directory 'baz/'. Ultimately, this leads to a
corrupted index with duplicate 'baz/' contents merged in:

H bar/f1
S baz/
S baz/deep/a
S baz/f2
H foo
H foo1

This explains why you need to expand the index at all, but not why you need
to check that the prefix is not in the sparse cone *and* that it (or its
parent directory) doesn't exist in the index. For that, an important bit of
context is that you can have a sparse index with non-sparse directories
outside the cone. This can happen in a number of ways (for example, running
'git update-index --no-skip-worktree' on a file in a sparse directory), but
the important thing is that it is a completely valid state and not entirely
uncommon. 

Using our example above, suppose 'baz/' is partially expanded in the index,
with the following index contents:

H bar/f1
S baz/deep/
S baz/f2
H foo
H foo1

If we use the prefix 'baz/' here, we actually traverse the trees properly:
'baz/deep/' and 'baz/f2' will be found and merged - no index expansion
needed! But if we only checked '!path_in_cone_mode_sparse_checkout(...)', we
would have expanded the index because 'baz/' is outside the sparse cone. 

This presents a problem because index expansion is *extremely* expensive -
we should avoid it whenever possible. That's where checking
'index_name_pos(...)' comes in: if the directory is in the index as a sparse
directory, the position is '>= 0' and 'ensure_full_index(...)' is called; if
the directory is inside an existing sparse directory, the position will be
'< 0' but the index will be expanded implicitly. In every other case, we
avoid expanding the index and proceed with the merge as normal.

Hope this helps!

[1] https://lore.kernel.org/git/dc47f12b-8724-22ef-ed2c-096badfafd76@github.com/

> So it seems like distinguishing between the two cases with
> index_name_pos(...) isn't necessary. I've attached a diff that does
> exactly this, and it passes t1092-sparse-checkout-compatibility.sh as
> far as I can tell. I've also amended the comment in a way that makes
> more sense to me, but I'm not 100% sure if it's accurate.
> 
> I'm also a little averse to using a side effect of index_name_pos() to
> achieve what we really want, so I'd prefer to get rid of the call if we
> can :)
> 
> ----- >8 --------- >8 --------- >8 --------- >8 --------- >8 ----
> 
> diff --git a/unpack-trees.c b/unpack-trees.c
> index b876caca0d..5b07055605 100644
> --- a/unpack-trees.c
> +++ b/unpack-trees.c
> @@ -1749,17 +1749,11 @@ static void update_sparsity_for_prefix(const char *prefix,
>  	strbuf_addch(&ce_prefix, '/');
>  
>  	/*
> -	 * If the prefix points to a sparse directory or a path inside a sparse
> -	 * directory, the index should be expanded. This is accomplished in one
> -	 * of two ways:
> -	 * - if the prefix is inside a sparse directory, it will be expanded by
> -	 *   the 'ensure_full_index(...)' call in 'index_name_pos(...)'.
> -	 * - if the prefix matches an existing sparse directory entry,
> -	 *   'index_name_pos(...)' will return its index position, triggering
> -	 *   the 'ensure_full_index(...)' below.
> +	 * If the prefix points to a sparse directory or a path inside a
> +	 * sparse directory (not within the sparse patterns), the index
> +	 * should be expanded.
>  	 */
> -	if (!path_in_cone_mode_sparse_checkout(ce_prefix.buf, istate) &&
> -	    index_name_pos(istate, ce_prefix.buf, ce_prefix.len) >= 0)
> +	if (!path_in_cone_mode_sparse_checkout(ce_prefix.buf, istate))
>  		ensure_full_index(istate);
>  
>  	strbuf_release(&ce_prefix);

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Glen Choo wrote (reply to this):

Victoria Dye <vdye@github.com> writes:
> There's an (admittedly subtle if you aren't familiar with sparse index)
> distinction between a "sparse directory" index entry and "a directory
> outside the sparse-checkout cone with SKIP_WORKTREE enabled on its entries".
> Ultimately, that's what necessitates the two checks but, as in [1], I want
> to use this as an opportunity to shed some light on what 'unpack_trees(...)'
> does.

Thanks! This explanation was really helpful.

> Using our example above, suppose 'baz/' is partially expanded in the index,
> with the following index contents:
>
> H bar/f1
> S baz/deep/
> S baz/f2
> H foo
> H foo1
>
> If we use the prefix 'baz/' here, we actually traverse the trees properly:
> 'baz/deep/' and 'baz/f2' will be found and merged - no index expansion
> needed! But if we only checked '!path_in_cone_mode_sparse_checkout(...)', we
> would have expanded the index because 'baz/' is outside the sparse cone. 

In particular, I didn't consider that a directory outside of the
sparse-checkout cone could be partially expanded. This seems to be the
crux of it, which is that even if the path is outside of the
sparse-checkout clone, we can still get correct behavior (without
expanding the index) if its entries are expanded...

> This presents a problem because index expansion is *extremely* expensive -
> we should avoid it whenever possible. That's where checking
> 'index_name_pos(...)' comes in: if the directory is in the index as a sparse
> directory, the position is '>= 0' and 'ensure_full_index(...)' is called; if
> the directory is inside an existing sparse directory, the position will be
> '< 0' but the index will be expanded implicitly. In every other case, we
> avoid expanding the index and proceed with the merge as normal.

and because of this, we don't always need to expand the index when the
path is outside of the cone, so my suggested patch expands the index in
too many cases.

What I also didn't consider is that index_name_pos() doesn't _always_
expand the index, it only expands the index when the directory is
inside a sparse directory entry.

So the side-effect of index_name_pos() is actually _exactly_ what we
want. Granted, it would be clearer if we had a function that did _only_
'expand if path is inside a sparse directory entry', but I suppose it's
overkill.

(Purely optional suggestion) I wonder if we could add a test that can
distinguish between 'always expand if --prefix is outside of the cone'
vs 'expand only if path is outside of cone AND inside a sparse directory
entry'. The scenario you described sounds perfect as a test case, though
I don't know how feasible it is to set up.

@gitgitgadget
Copy link

gitgitgadget bot commented Mar 3, 2022

User Glen Choo <chooglen@google.com> has been added to the cc: list.

@gitgitgadget
Copy link

gitgitgadget bot commented Mar 4, 2022

This patch series was integrated into seen via git@bd501c1.

@gitgitgadget
Copy link

gitgitgadget bot commented Mar 4, 2022

There was a status update in the "Cooking" section about the branch vd/sparse-read-tree on the Git mailing list:

"git read-tree" has been made to be aware of the sparse-index
feature.

Will merge to 'next'.
source: <pull.1157.v3.git.1646166271.gitgitgadget@gmail.com>

@gitgitgadget
Copy link

gitgitgadget bot commented Mar 4, 2022

This patch series was integrated into seen via git@166f3c8.

@gitgitgadget
Copy link

gitgitgadget bot commented Mar 7, 2022

This patch series was integrated into seen via git@8a8b71b.

@gitgitgadget
Copy link

gitgitgadget bot commented Mar 8, 2022

There was a status update in the "Cooking" section about the branch vd/sparse-read-tree on the Git mailing list:

"git read-tree" has been made to be aware of the sparse-index
feature.

Will merge to 'next'.
source: <pull.1157.v3.git.1646166271.gitgitgadget@gmail.com>

@gitgitgadget
Copy link

gitgitgadget bot commented Mar 8, 2022

This patch series was integrated into seen via git@1df3b59.

@gitgitgadget
Copy link

gitgitgadget bot commented Mar 8, 2022

This patch series was integrated into next via git@037c187.

@gitgitgadget gitgitgadget bot added the next label Mar 8, 2022
@gitgitgadget
Copy link

gitgitgadget bot commented Mar 9, 2022

This patch series was integrated into seen via git@4493ed7.

@gitgitgadget
Copy link

gitgitgadget bot commented Mar 11, 2022

This patch series was integrated into seen via git@1dff3d0.

@gitgitgadget
Copy link

gitgitgadget bot commented Mar 13, 2022

This patch series was integrated into seen via git@bf7da9f.

@gitgitgadget
Copy link

gitgitgadget bot commented Mar 14, 2022

This patch series was integrated into seen via git@b01664a.

@gitgitgadget
Copy link

gitgitgadget bot commented Mar 14, 2022

There was a status update in the "Cooking" section about the branch vd/sparse-read-tree on the Git mailing list:

"git read-tree" has been made to be aware of the sparse-index
feature.

Will merge to 'master'.
source: <pull.1157.v3.git.1646166271.gitgitgadget@gmail.com>

@gitgitgadget
Copy link

gitgitgadget bot commented Mar 15, 2022

This patch series was integrated into seen via git@e4ca37f.

@gitgitgadget
Copy link

gitgitgadget bot commented Mar 16, 2022

This patch series was integrated into seen via git@d0a6b79.

@gitgitgadget
Copy link

gitgitgadget bot commented Mar 17, 2022

This patch series was integrated into seen via git@190f9bf.

@gitgitgadget
Copy link

gitgitgadget bot commented Mar 17, 2022

This patch series was integrated into master via git@190f9bf.

@gitgitgadget
Copy link

gitgitgadget bot commented Mar 17, 2022

This patch series was integrated into next via git@190f9bf.

@gitgitgadget gitgitgadget bot added the master label Mar 17, 2022
@gitgitgadget gitgitgadget bot closed this Mar 17, 2022
@gitgitgadget
Copy link

gitgitgadget bot commented Mar 17, 2022

Closed via 190f9bf.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

1 participant