New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Sparse index: integrate with 'read-tree' #1157
Conversation
There are issues in commit e30124b: |
There are issues in commit e894f3c: |
/submit |
Submitted as pull.1157.git.1645640717.gitgitgadget@gmail.com To fetch this version into
To fetch this version to local tag
|
@@ -1463,10 +1463,11 @@ static int path_in_sparse_checkout_1(const char *path, | |||
const char *end, *slash; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Derrick Stolee wrote (reply to this):
On 2/23/2022 1:25 PM, Victoria Dye via GitGitGadget wrote:
> From: Victoria Dye <vdye@github.com>
Aside: It looks like you have the same "mailto:" email addresses in the
CC similar to a mistake I made. The root cause of my issue was that I
copy-pasted from the rendered description of another PR instead of
copying the plaintext available by clicking the "Edit" button. Perhaps
this is worth extending GGG to prevent this issue in the future.
> Prevent the repository root from being collapsed into a sparse directory by
> treating an empty path as "inside the sparse-checkout". When collapsing a
> sparse index (e.g. in 'git sparse-checkout reapply'), the root directory
> typically could not become a sparse directory due to the presence of in-cone
> root-level files and directories. However, if no such in-cone files or
> directories were present, there was no explicit check signaling that the
> "repository root path" (an empty string, in the case of
> 'convert_to_sparse(...)') was in-cone, and a sparse directory index entry
> would be created from the repository root directory.
>
> The documentation in Documentation/git-sparse-checkout.txt explicitly states
> that the files in the root directory are expected to be in-cone for a
> cone-mode sparse-checkout. Collapsing the root into a sparse directory entry
> violates that assumption, as sparse directory entries are expected to be
> outside the sparse cone and have SKIP_WORKTREE enabled. This invalid state
> in turn causes issues with commands that interact with the index, e.g.
> 'git status'.
>
> Treating an empty (root) path as in-cone prevents the creation of a root
> sparse directory in 'convert_to_sparse(...)'. Because the repository root is
> otherwise never compared with sparse patterns (in both cone-mode and
> non-cone sparse-checkouts), the new check does not cause additional changes
> to how sparse patterns are applied.
Good catch! I agree with everything said here.
> + if (!strlen(path) ||
Instead of a full strlen(), could we just check "if (!*path ||" to
look for the nul byte?
Thanks,
-Stolee
User |
On the Git mailing list, Derrick Stolee wrote (reply to this):
|
Prevent the repository root from being collapsed into a sparse directory by treating an empty path as "inside the sparse-checkout". When collapsing a sparse index (e.g. in 'git sparse-checkout reapply'), the root directory typically could not become a sparse directory due to the presence of in-cone root-level files and directories. However, if no such in-cone files or directories were present, there was no explicit check signaling that the "repository root path" (an empty string, in the case of 'convert_to_sparse(...)') was in-cone, and a sparse directory index entry would be created from the repository root directory. The documentation in Documentation/git-sparse-checkout.txt explicitly states that the files in the root directory are expected to be in-cone for a cone-mode sparse-checkout. Collapsing the root into a sparse directory entry violates that assumption, as sparse directory entries are expected to be outside the sparse cone and have SKIP_WORKTREE enabled. This invalid state in turn causes issues with commands that interact with the index, e.g. 'git status'. Treating an empty (root) path as in-cone prevents the creation of a root sparse directory in 'convert_to_sparse(...)'. Because the repository root is otherwise never compared with sparse patterns (in both cone-mode and non-cone sparse-checkouts), the new check does not cause additional changes to how sparse patterns are applied. Helped-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Victoria Dye <vdye@github.com>
/submit |
Submitted as pull.1157.v2.git.1645742073.gitgitgadget@gmail.com To fetch this version into
To fetch this version to local tag
|
This branch is now known as |
This patch series was integrated into seen via git@26734f4. |
@@ -244,6 +244,24 @@ test_expect_success 'expanded in-memory index matches full index' ' | |||
test_sparse_match git ls-files --stage |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Elijah Newren wrote (reply to this):
On Thu, Feb 24, 2022 at 2:34 PM Victoria Dye via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Victoria Dye <vdye@github.com>
>
> Add the 'recursive' flag to 'wt_status_collect_changes_index(...)'. Without
Perhaps "Set the 'recursive' diff option flag in
'wt_status_collect_changes_index(...)'" ? There's no function
argument named 'recursive' in wt_status_collect_changes_index() before
or after your changes, which is what the wording led me to think of.
> the 'recursive' flag, 'git status' could report index changes incorrectly
> when the following conditions were met:
>
> * sparse index is enabled
> * there is a difference between the index and HEAD in a file inside a
> *subdirectory* of a sparse directory
> * the sparse directory index entry is *not* expanded in-core
>
> In this scenario, 'git status' would not recurse into the sparse directory's
> subdirectories to identify which file contained the difference between the
> index and HEAD. Instead, it would report the immediate subdirectory itself
> as "modified".
>
> Example:
>
> $ git init
> $ mkdir -p sparse/sub
> $ echo test >sparse/sub/foo
> $ git add .
> $ git commit -m "commit 1"
> $ echo somethingelse >sparse/sub/foo
> $ git add .
> $ git commit -a -m "commit 2"
> $ git sparse-checkout set --cone --sparse-index 'sparse'
> $ git reset --soft HEAD~1
> $ git status
> On branch master
> You are in a sparse checkout.
>
> Changes to be committed:
> (use "git restore --staged <file>..." to unstage)
> modified: sparse/sub
>
> The 'recursive' diff option in 'wt_status_collect_changes_index' corrects
> this by indicating that 'git status' should recurse into sparse directories
> to find modified files. Given the same repository setup as the example
> above, the corrected result of `git status` is:
>
> $ git status
> On branch master
> You are in a sparse checkout.
>
> Changes to be committed:
> (use "git restore --staged <file>..." to unstage)
> modified: sparse/sub/foo
>
> Signed-off-by: Victoria Dye <vdye@github.com>
> ---
> t/t1092-sparse-checkout-compatibility.sh | 7 +++++++
> wt-status.c | 9 +++++++++
> 2 files changed, 16 insertions(+)
>
> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
> index 9ef7cd80885..b1dcaa0e642 100755
> --- a/t/t1092-sparse-checkout-compatibility.sh
> +++ b/t/t1092-sparse-checkout-compatibility.sh
> @@ -278,6 +278,13 @@ test_expect_success 'status with options' '
> test_all_match git status --porcelain=v2 -uno
> '
>
> +test_expect_success 'status with diff in unexpanded sparse directory' '
> + init_repos &&
> + test_all_match git checkout rename-base &&
> + test_all_match git reset --soft rename-out-to-out &&
> + test_all_match git status --porcelain=v2
> +'
> +
> test_expect_success 'status reports sparse-checkout' '
> init_repos &&
> git -C sparse-checkout status >full &&
> diff --git a/wt-status.c b/wt-status.c
> index 335e723a71e..4a5b9beeca1 100644
> --- a/wt-status.c
> +++ b/wt-status.c
> @@ -651,6 +651,15 @@ static void wt_status_collect_changes_index(struct wt_status *s)
> rev.diffopt.detect_rename = s->detect_rename >= 0 ? s->detect_rename : rev.diffopt.detect_rename;
> rev.diffopt.rename_limit = s->rename_limit >= 0 ? s->rename_limit : rev.diffopt.rename_limit;
> rev.diffopt.rename_score = s->rename_score >= 0 ? s->rename_score : rev.diffopt.rename_score;
> +
> + /*
> + * The `recursive` option must be enabled to show differences in files
> + * *more than* one level deep in a sparse directory index entry (e.g., given
> + * sparse directory 'sparse-dir/', reporting a difference in the file
> + * 'sparse-dir/another-dir/my-file').
> + */
> + rev.diffopt.flags.recursive = 1;
Kind of clever, and makes sense.
I'm wondering if there's an alternate wording that might be helpful
here or in the commit message, that instead of just saying the
'recursive' option is necessary, perhaps says a little bit about why
it helps. In particular, the diff machinery, by default, is not
recursive and stops at comparing the first level of trees. (See e.g.
the -r option in diff-tree, it's just that it's turned on by default
in 'git diff' and by the -p option in 'git log'.) I'm guessing the
recursive option never needed to be turned on previously within
wt-status, due to something about the nature of the index only holding
files previously. Now, however, the sparse index changes that. (And
it also suggests that perhaps we should look to see if other commands
run the diff machinery without the recursive flag, and see if they
need it now due to sparse indices.)
Granted, I'm not totally sure how to work these facts in (in part
because I don't know how comparison to the index normally avoids the
need for the recursive flag), and maybe what you have is fine. Just
thought I'd point it out since I wasn't aware of the non-recursive
nature of the diff machinery until I started doing things with
diff-tree.
> +
> copy_pathspec(&rev.prune_data, &s->pathspec);
> run_diff_index(&rev, 1);
> object_array_clear(&rev.pending);
> --
> gitgitgadget
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Victoria Dye wrote (reply to this):
Elijah Newren wrote:
> On Thu, Feb 24, 2022 at 2:34 PM Victoria Dye via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
>>
>> From: Victoria Dye <vdye@github.com>
>>
>> Add the 'recursive' flag to 'wt_status_collect_changes_index(...)'. Without
>
> Perhaps "Set the 'recursive' diff option flag in
> 'wt_status_collect_changes_index(...)'" ? There's no function
> argument named 'recursive' in wt_status_collect_changes_index() before
> or after your changes, which is what the wording led me to think of.
>
Re-reading that now, I agree. Will fix!
>> the 'recursive' flag, 'git status' could report index changes incorrectly
>> when the following conditions were met:
>>
>> * sparse index is enabled
>> * there is a difference between the index and HEAD in a file inside a
>> *subdirectory* of a sparse directory
>> * the sparse directory index entry is *not* expanded in-core
>>
>> In this scenario, 'git status' would not recurse into the sparse directory's
>> subdirectories to identify which file contained the difference between the
>> index and HEAD. Instead, it would report the immediate subdirectory itself
>> as "modified".
>>
>> Example:
>>
>> $ git init
>> $ mkdir -p sparse/sub
>> $ echo test >sparse/sub/foo
>> $ git add .
>> $ git commit -m "commit 1"
>> $ echo somethingelse >sparse/sub/foo
>> $ git add .
>> $ git commit -a -m "commit 2"
>> $ git sparse-checkout set --cone --sparse-index 'sparse'
>> $ git reset --soft HEAD~1
>> $ git status
>> On branch master
>> You are in a sparse checkout.
>>
>> Changes to be committed:
>> (use "git restore --staged <file>..." to unstage)
>> modified: sparse/sub
>>
>> The 'recursive' diff option in 'wt_status_collect_changes_index' corrects
>> this by indicating that 'git status' should recurse into sparse directories
>> to find modified files. Given the same repository setup as the example
>> above, the corrected result of `git status` is:
>>
>> $ git status
>> On branch master
>> You are in a sparse checkout.
>>
>> Changes to be committed:
>> (use "git restore --staged <file>..." to unstage)
>> modified: sparse/sub/foo
>>
>> Signed-off-by: Victoria Dye <vdye@github.com>
>> ---
>> t/t1092-sparse-checkout-compatibility.sh | 7 +++++++
>> wt-status.c | 9 +++++++++
>> 2 files changed, 16 insertions(+)
>>
>> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
>> index 9ef7cd80885..b1dcaa0e642 100755
>> --- a/t/t1092-sparse-checkout-compatibility.sh
>> +++ b/t/t1092-sparse-checkout-compatibility.sh
>> @@ -278,6 +278,13 @@ test_expect_success 'status with options' '
>> test_all_match git status --porcelain=v2 -uno
>> '
>>
>> +test_expect_success 'status with diff in unexpanded sparse directory' '
>> + init_repos &&
>> + test_all_match git checkout rename-base &&
>> + test_all_match git reset --soft rename-out-to-out &&
>> + test_all_match git status --porcelain=v2
>> +'
>> +
>> test_expect_success 'status reports sparse-checkout' '
>> init_repos &&
>> git -C sparse-checkout status >full &&
>> diff --git a/wt-status.c b/wt-status.c
>> index 335e723a71e..4a5b9beeca1 100644
>> --- a/wt-status.c
>> +++ b/wt-status.c
>> @@ -651,6 +651,15 @@ static void wt_status_collect_changes_index(struct wt_status *s)
>> rev.diffopt.detect_rename = s->detect_rename >= 0 ? s->detect_rename : rev.diffopt.detect_rename;
>> rev.diffopt.rename_limit = s->rename_limit >= 0 ? s->rename_limit : rev.diffopt.rename_limit;
>> rev.diffopt.rename_score = s->rename_score >= 0 ? s->rename_score : rev.diffopt.rename_score;
>> +
>> + /*
>> + * The `recursive` option must be enabled to show differences in files
>> + * *more than* one level deep in a sparse directory index entry (e.g., given
>> + * sparse directory 'sparse-dir/', reporting a difference in the file
>> + * 'sparse-dir/another-dir/my-file').
>> + */
>> + rev.diffopt.flags.recursive = 1;
>
> Kind of clever, and makes sense.
>
> I'm wondering if there's an alternate wording that might be helpful
> here or in the commit message, that instead of just saying the
> 'recursive' option is necessary, perhaps says a little bit about why
> it helps. In particular, the diff machinery, by default, is not
> recursive and stops at comparing the first level of trees. (See e.g.
> the -r option in diff-tree, it's just that it's turned on by default
> in 'git diff' and by the -p option in 'git log'.) I'm guessing the
> recursive option never needed to be turned on previously within
> wt-status, due to something about the nature of the index only holding
> files previously. Now, however, the sparse index changes that. (And
> it also suggests that perhaps we should look to see if other commands
> run the diff machinery without the recursive flag, and see if they
> need it now due to sparse indices.)
>
> Granted, I'm not totally sure how to work these facts in (in part
> because I don't know how comparison to the index normally avoids the
> need for the recursive flag), and maybe what you have is fine. Just
> thought I'd point it out since I wasn't aware of the non-recursive
> nature of the diff machinery until I started doing things with
> diff-tree.
>
All good points - I'll reword the commit message and code comment to explain
that 1) diff is not recursive by default, and 2) why the diff of a sparse
directory requires `recursive = 1`.
>
>> +
>> copy_pathspec(&rev.prune_data, &s->pathspec);
>> run_diff_index(&rev, 1);
>> object_array_clear(&rev.pending);
>> --
>> gitgitgadget
@@ -160,15 +160,18 @@ int cmd_read_tree(int argc, const char **argv, const char *cmd_prefix) | |||
argc = parse_options(argc, argv, cmd_prefix, read_tree_options, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Elijah Newren wrote (reply to this):
On Thu, Feb 24, 2022 at 2:34 PM Victoria Dye via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Victoria Dye <vdye@github.com>
>
> When 'git read-tree' is provided with a prefix, expand the index only if the
> prefix is equivalent to a sparse directory or contained within one. If the
> index is not expanded in these cases, 'ce_in_traverse_path' will indicate
> that the relevant sparse directory is not in the prefix/traverse path,
> skipping past it and not unpacking the appropriate tree(s).
>
> If the prefix is in-cone, its sparse subdirectories (if any) will be
> traversed correctly without index expansion.
>
> Signed-off-by: Victoria Dye <vdye@github.com>
> ---
> builtin/read-tree.c | 3 +--
> t/t1092-sparse-checkout-compatibility.sh | 8 ++++++-
> unpack-trees.c | 30 ++++++++++++++++++++++++
> 3 files changed, 38 insertions(+), 3 deletions(-)
>
> diff --git a/builtin/read-tree.c b/builtin/read-tree.c
> index c2fdbc2657f..a7b7f822281 100644
> --- a/builtin/read-tree.c
> +++ b/builtin/read-tree.c
> @@ -213,8 +213,7 @@ int cmd_read_tree(int argc, const char **argv, const char *cmd_prefix)
> if (opts.merge && !opts.index_only)
> setup_work_tree();
>
> - /* TODO: audit sparse index behavior in unpack_trees */
> - if (opts.skip_sparse_checkout || opts.prefix)
> + if (opts.skip_sparse_checkout)
> ensure_full_index(&the_index);
>
> if (opts.merge) {
> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
> index ae44451a0a9..a404be0a10f 100755
> --- a/t/t1092-sparse-checkout-compatibility.sh
> +++ b/t/t1092-sparse-checkout-compatibility.sh
> @@ -1415,7 +1415,13 @@ test_expect_success 'sparse index is not expanded: read-tree' '
> do
> ensure_not_expanded read-tree -mu $MERGE_TREES &&
> ensure_not_expanded reset --hard HEAD || return 1
> - done
> + done &&
> +
> + rm -rf sparse-index/deep/deeper2 &&
> + ensure_not_expanded add . &&
> + ensure_not_expanded commit -m "test" &&
> +
> + ensure_not_expanded read-tree --prefix=deep/deeper2 -u deepest
> '
>
> test_expect_success 'ls-files' '
> diff --git a/unpack-trees.c b/unpack-trees.c
> index 360844bda3a..dba122a02bb 100644
> --- a/unpack-trees.c
> +++ b/unpack-trees.c
> @@ -1739,6 +1739,36 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
> setup_standard_excludes(o->dir);
> }
>
> + /*
> + * If the prefix is equal to or contained within a sparse directory, the
> + * index needs to be expanded to traverse with the specified prefix.
> + */
> + if (o->prefix && o->src_index->sparse_index) {
> + int prefix_len = strlen(o->prefix);
> +
> + while (prefix_len > 0 && o->prefix[prefix_len - 1] == '/')
> + prefix_len--;
> +
> + if (prefix_len > 0) {
Is this condition check necessary? If we want some safety check here,
could it instead be something like
if (prefix_len <= 0)
BUG("Broken prefix passed to unpack_trees");
and then dedent the following code? (Or are callers allowed to not
sanitize their input before passing to unpack_trees(), meaning that we
should use a die() rather than a BUG()?)
To test this idea, near the top of unpack_trees(), I added:
if (o->prefix)
assert(*o->prefix && *o->prefix != '/');
and reran all tests. They all ran without hitting that assertion. FWIW.
> + struct strbuf ce_prefix = STRBUF_INIT;
> + strbuf_grow(&ce_prefix, prefix_len + 1);
> + strbuf_add(&ce_prefix, o->prefix, prefix_len);
> + strbuf_addch(&ce_prefix, '/');
> +
> + /*
> + * If the prefix is not inside the sparse cone, then the
> + * index is explicitly expanded if it is found as a sparse
> + * directory, or implicitly expanded (by 'index_name_pos')
> + * if the path is inside a sparse directory.
> + */
> + if (!path_in_cone_mode_sparse_checkout(ce_prefix.buf, o->src_index) &&
> + index_name_pos(o->src_index, ce_prefix.buf, ce_prefix.len) >= 0)
style nit: Can you rewrap both the comments and the code at 80 characters?
It took me a bit of playing and testing to understand these two lines.
The comment helps, but it's still a bit dense to unpack; somehow I
didn't understand that the comment was referring to index_name_pos()'s
call to ensure_full_index(). Once I understood that, it all looks
good.
> + ensure_full_index(o->src_index);
> +
> + strbuf_release(&ce_prefix);
> + }
> + }
> +
> if (!core_apply_sparse_checkout || !o->update)
> o->skip_sparse_checkout = 1;
> if (!o->skip_sparse_checkout && !o->pl) {
> --
> gitgitgadget
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Victoria Dye wrote (reply to this):
Elijah Newren wrote:
> On Thu, Feb 24, 2022 at 2:34 PM Victoria Dye via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
>>
>> From: Victoria Dye <vdye@github.com>
>>
>> When 'git read-tree' is provided with a prefix, expand the index only if the
>> prefix is equivalent to a sparse directory or contained within one. If the
>> index is not expanded in these cases, 'ce_in_traverse_path' will indicate
>> that the relevant sparse directory is not in the prefix/traverse path,
>> skipping past it and not unpacking the appropriate tree(s).
>>
>> If the prefix is in-cone, its sparse subdirectories (if any) will be
>> traversed correctly without index expansion.
>>
>> Signed-off-by: Victoria Dye <vdye@github.com>
>> ---
>> builtin/read-tree.c | 3 +--
>> t/t1092-sparse-checkout-compatibility.sh | 8 ++++++-
>> unpack-trees.c | 30 ++++++++++++++++++++++++
>> 3 files changed, 38 insertions(+), 3 deletions(-)
>>
>> diff --git a/builtin/read-tree.c b/builtin/read-tree.c
>> index c2fdbc2657f..a7b7f822281 100644
>> --- a/builtin/read-tree.c
>> +++ b/builtin/read-tree.c
>> @@ -213,8 +213,7 @@ int cmd_read_tree(int argc, const char **argv, const char *cmd_prefix)
>> if (opts.merge && !opts.index_only)
>> setup_work_tree();
>>
>> - /* TODO: audit sparse index behavior in unpack_trees */
>> - if (opts.skip_sparse_checkout || opts.prefix)
>> + if (opts.skip_sparse_checkout)
>> ensure_full_index(&the_index);
>>
>> if (opts.merge) {
>> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
>> index ae44451a0a9..a404be0a10f 100755
>> --- a/t/t1092-sparse-checkout-compatibility.sh
>> +++ b/t/t1092-sparse-checkout-compatibility.sh
>> @@ -1415,7 +1415,13 @@ test_expect_success 'sparse index is not expanded: read-tree' '
>> do
>> ensure_not_expanded read-tree -mu $MERGE_TREES &&
>> ensure_not_expanded reset --hard HEAD || return 1
>> - done
>> + done &&
>> +
>> + rm -rf sparse-index/deep/deeper2 &&
>> + ensure_not_expanded add . &&
>> + ensure_not_expanded commit -m "test" &&
>> +
>> + ensure_not_expanded read-tree --prefix=deep/deeper2 -u deepest
>> '
>>
>> test_expect_success 'ls-files' '
>> diff --git a/unpack-trees.c b/unpack-trees.c
>> index 360844bda3a..dba122a02bb 100644
>> --- a/unpack-trees.c
>> +++ b/unpack-trees.c
>> @@ -1739,6 +1739,36 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
>> setup_standard_excludes(o->dir);
>> }
>>
>> + /*
>> + * If the prefix is equal to or contained within a sparse directory, the
>> + * index needs to be expanded to traverse with the specified prefix.
>> + */
>> + if (o->prefix && o->src_index->sparse_index) {
>> + int prefix_len = strlen(o->prefix);
>> +
>> + while (prefix_len > 0 && o->prefix[prefix_len - 1] == '/')
>> + prefix_len--;
>> +
>> + if (prefix_len > 0) {
>
> Is this condition check necessary? If we want some safety check here,
> could it instead be something like
>
> if (prefix_len <= 0)
> BUG("Broken prefix passed to unpack_trees");
>
This condition was intended to skip unnecessary computation for the
(probably unlikely, but still technically valid) case where the prefix is
the repo root (e.g., '--prefix=/') - because the repo root is represented
with only directory separator(s), `prefix_len` would be 0 after removing
trailing '/'. In that scenario, the index won't need to be expanded, so we
don't need to go looking in the index for that path.
None of that is particularly clear from reading the patch, though, so I'll
add a comment & test covering it explicitly.
> and then dedent the following code? (Or are callers allowed to not
> sanitize their input before passing to unpack_trees(), meaning that we
> should use a die() rather than a BUG()?)
>
> To test this idea, near the top of unpack_trees(), I added:
> if (o->prefix)
> assert(*o->prefix && *o->prefix != '/');
> and reran all tests. They all ran without hitting that assertion. FWIW.
>
>> + struct strbuf ce_prefix = STRBUF_INIT;
>> + strbuf_grow(&ce_prefix, prefix_len + 1);
>> + strbuf_add(&ce_prefix, o->prefix, prefix_len);
>> + strbuf_addch(&ce_prefix, '/');
>> +
>> + /*
>> + * If the prefix is not inside the sparse cone, then the
>> + * index is explicitly expanded if it is found as a sparse
>> + * directory, or implicitly expanded (by 'index_name_pos')
>> + * if the path is inside a sparse directory.
>> + */
>> + if (!path_in_cone_mode_sparse_checkout(ce_prefix.buf, o->src_index) &&
>> + index_name_pos(o->src_index, ce_prefix.buf, ce_prefix.len) >= 0)
>
> style nit: Can you rewrap both the comments and the code at 80 characters?
>
I couldn't think of a way to wrap the condition that wouldn't make it more
difficult to read. The best I could come up with was:
if (!path_in_cone_mode_sparse_checkout(ce_prefix.buf,
o->src_index) &&
index_name_pos(o->src_index,
ce_prefix.buf,
ce_prefix.len) >= 0)
ensure_full_index(o->src_index);
which, to me, is a bit hard to parse. Alternatively, though, I can move the
prefix-checking logic into its own function (kind of like
'pathspec_needs_expanded_index(...)' in [1]), in which case I won't need to
change the current wrapping to keep it under 80 characters.
[1] https://lore.kernel.org/git/822d7344587f698e73abba1ca726c3a905f7b403.1638201164.git.gitgitgadget@gmail.com/
> It took me a bit of playing and testing to understand these two lines.
> The comment helps, but it's still a bit dense to unpack; somehow I
> didn't understand that the comment was referring to index_name_pos()'s
> call to ensure_full_index(). Once I understood that, it all looks
> good.
>
Sorry about that, I'll revise to make that clearer.
>
>> + ensure_full_index(o->src_index);
>> +
>> + strbuf_release(&ce_prefix);
>> + }
>> + }
>> +
>> if (!core_apply_sparse_checkout || !o->update)
>> o->skip_sparse_checkout = 1;
>> if (!o->skip_sparse_checkout && !o->pl) {
>> --
>> gitgitgadget
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Elijah Newren wrote (reply to this):
Hi Victoria,
On Fri, Feb 25, 2022 at 12:25 PM Victoria Dye <vdye@github.com> wrote:
>
> Elijah Newren wrote:
> > On Thu, Feb 24, 2022 at 2:34 PM Victoria Dye via GitGitGadget
> > <gitgitgadget@gmail.com> wrote:
> >>
> >> From: Victoria Dye <vdye@github.com>
> >>
> >> When 'git read-tree' is provided with a prefix, expand the index only if the
> >> prefix is equivalent to a sparse directory or contained within one. If the
> >> index is not expanded in these cases, 'ce_in_traverse_path' will indicate
> >> that the relevant sparse directory is not in the prefix/traverse path,
> >> skipping past it and not unpacking the appropriate tree(s).
> >>
> >> If the prefix is in-cone, its sparse subdirectories (if any) will be
> >> traversed correctly without index expansion.
> >>
> >> Signed-off-by: Victoria Dye <vdye@github.com>
> >> ---
> >> builtin/read-tree.c | 3 +--
> >> t/t1092-sparse-checkout-compatibility.sh | 8 ++++++-
> >> unpack-trees.c | 30 ++++++++++++++++++++++++
> >> 3 files changed, 38 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/builtin/read-tree.c b/builtin/read-tree.c
> >> index c2fdbc2657f..a7b7f822281 100644
> >> --- a/builtin/read-tree.c
> >> +++ b/builtin/read-tree.c
> >> @@ -213,8 +213,7 @@ int cmd_read_tree(int argc, const char **argv, const char *cmd_prefix)
> >> if (opts.merge && !opts.index_only)
> >> setup_work_tree();
> >>
> >> - /* TODO: audit sparse index behavior in unpack_trees */
> >> - if (opts.skip_sparse_checkout || opts.prefix)
> >> + if (opts.skip_sparse_checkout)
> >> ensure_full_index(&the_index);
> >>
> >> if (opts.merge) {
> >> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
> >> index ae44451a0a9..a404be0a10f 100755
> >> --- a/t/t1092-sparse-checkout-compatibility.sh
> >> +++ b/t/t1092-sparse-checkout-compatibility.sh
> >> @@ -1415,7 +1415,13 @@ test_expect_success 'sparse index is not expanded: read-tree' '
> >> do
> >> ensure_not_expanded read-tree -mu $MERGE_TREES &&
> >> ensure_not_expanded reset --hard HEAD || return 1
> >> - done
> >> + done &&
> >> +
> >> + rm -rf sparse-index/deep/deeper2 &&
> >> + ensure_not_expanded add . &&
> >> + ensure_not_expanded commit -m "test" &&
> >> +
> >> + ensure_not_expanded read-tree --prefix=deep/deeper2 -u deepest
> >> '
> >>
> >> test_expect_success 'ls-files' '
> >> diff --git a/unpack-trees.c b/unpack-trees.c
> >> index 360844bda3a..dba122a02bb 100644
> >> --- a/unpack-trees.c
> >> +++ b/unpack-trees.c
> >> @@ -1739,6 +1739,36 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
> >> setup_standard_excludes(o->dir);
> >> }
> >>
> >> + /*
> >> + * If the prefix is equal to or contained within a sparse directory, the
> >> + * index needs to be expanded to traverse with the specified prefix.
> >> + */
> >> + if (o->prefix && o->src_index->sparse_index) {
> >> + int prefix_len = strlen(o->prefix);
> >> +
> >> + while (prefix_len > 0 && o->prefix[prefix_len - 1] == '/')
> >> + prefix_len--;
> >> +
> >> + if (prefix_len > 0) {
> >
> > Is this condition check necessary? If we want some safety check here,
> > could it instead be something like
> >
> > if (prefix_len <= 0)
> > BUG("Broken prefix passed to unpack_trees");
> >
>
> This condition was intended to skip unnecessary computation for the
> (probably unlikely, but still technically valid) case where the prefix is
> the repo root (e.g., '--prefix=/') - because the repo root is represented
> with only directory separator(s), `prefix_len` would be 0 after removing
> trailing '/'. In that scenario, the index won't need to be expanded, so we
> don't need to go looking in the index for that path.
Wow, that doesn't result in an error?!? That surprises me. I never
even thought to test such a thing. Clearly, the following related
command does give an error:
git read-tree --prefix=/toplevel/ $TREE
(namely, "error: invalid path '/toplevel/subpath'")
whereas, the reason a single slash is accepted it because git is
trying to be forgiving and treat the following two commands the same:
git read-tree --prefix=subdir/ $TREE
git read-tree --prefix=subdir $TREE
i.e. it tries to allow the trailing slash to be optional. And, by
implementation quirk, making a trailing slash be optional turns out to
mean that --prefix=/ is treated the same as no prefix at all, because
empty string prefix just happens to give the same behavior as NULL
prefix.
I think we should just throw an error if prefix starts with '/'.
unpack_trees() can make it be a BUG() (and at the beginning of the
function rather than down at this point and only inside some
conditional). builtin/read-tree.c, the only thing that ever sets
prefix in unpack_trees_options, should die() if it's passed something
that starts with a '/'. Having paths start with a '/' is antithetical
to how "prefix" is used throughout the codebase, though I guess I can
see people making that mistake if they are used to gitignore-style
patterns instead.
> None of that is particularly clear from reading the patch, though, so I'll
> add a comment & test covering it explicitly.
>
> > and then dedent the following code? (Or are callers allowed to not
> > sanitize their input before passing to unpack_trees(), meaning that we
> > should use a die() rather than a BUG()?)
> >
> > To test this idea, near the top of unpack_trees(), I added:
> > if (o->prefix)
> > assert(*o->prefix && *o->prefix != '/');
> > and reran all tests. They all ran without hitting that assertion. FWIW.
> >
> >> + struct strbuf ce_prefix = STRBUF_INIT;
> >> + strbuf_grow(&ce_prefix, prefix_len + 1);
> >> + strbuf_add(&ce_prefix, o->prefix, prefix_len);
> >> + strbuf_addch(&ce_prefix, '/');
> >> +
> >> + /*
> >> + * If the prefix is not inside the sparse cone, then the
> >> + * index is explicitly expanded if it is found as a sparse
> >> + * directory, or implicitly expanded (by 'index_name_pos')
> >> + * if the path is inside a sparse directory.
> >> + */
> >> + if (!path_in_cone_mode_sparse_checkout(ce_prefix.buf, o->src_index) &&
> >> + index_name_pos(o->src_index, ce_prefix.buf, ce_prefix.len) >= 0)
> >
> > style nit: Can you rewrap both the comments and the code at 80 characters?
> >
>
> I couldn't think of a way to wrap the condition that wouldn't make it more
> difficult to read. The best I could come up with was:
>
> if (!path_in_cone_mode_sparse_checkout(ce_prefix.buf,
> o->src_index) &&
> index_name_pos(o->src_index,
> ce_prefix.buf,
> ce_prefix.len) >= 0)
> ensure_full_index(o->src_index);
>
>
> which, to me, is a bit hard to parse. Alternatively, though, I can move the
> prefix-checking logic into its own function (kind of like
> 'pathspec_needs_expanded_index(...)' in [1]), in which case I won't need to
> change the current wrapping to keep it under 80 characters.
>
> [1] https://lore.kernel.org/git/822d7344587f698e73abba1ca726c3a905f7b403.1638201164.git.gitgitgadget@gmail.com/
>
> > It took me a bit of playing and testing to understand these two lines.
> > The comment helps, but it's still a bit dense to unpack; somehow I
> > didn't understand that the comment was referring to index_name_pos()'s
> > call to ensure_full_index(). Once I understood that, it all looks
> > good.
> >
>
> Sorry about that, I'll revise to make that clearer.
Thanks. :-)
> >
> >> + ensure_full_index(o->src_index);
> >> +
> >> + strbuf_release(&ce_prefix);
> >> + }
> >> + }
> >> +
> >> if (!core_apply_sparse_checkout || !o->update)
> >> o->skip_sparse_checkout = 1;
> >> if (!o->skip_sparse_checkout && !o->pl) {
> >> --
> >> gitgitgadget
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Victoria Dye wrote (reply to this):
Elijah Newren wrote:
> Hi Victoria,
>
> On Fri, Feb 25, 2022 at 12:25 PM Victoria Dye <vdye@github.com> wrote:
>>
>> Elijah Newren wrote:
>>> On Thu, Feb 24, 2022 at 2:34 PM Victoria Dye via GitGitGadget
>>> <gitgitgadget@gmail.com> wrote:
>>>>
>>>> From: Victoria Dye <vdye@github.com>
>>>>
>>>> When 'git read-tree' is provided with a prefix, expand the index only if the
>>>> prefix is equivalent to a sparse directory or contained within one. If the
>>>> index is not expanded in these cases, 'ce_in_traverse_path' will indicate
>>>> that the relevant sparse directory is not in the prefix/traverse path,
>>>> skipping past it and not unpacking the appropriate tree(s).
>>>>
>>>> If the prefix is in-cone, its sparse subdirectories (if any) will be
>>>> traversed correctly without index expansion.
>>>>
>>>> Signed-off-by: Victoria Dye <vdye@github.com>
>>>> ---
>>>> builtin/read-tree.c | 3 +--
>>>> t/t1092-sparse-checkout-compatibility.sh | 8 ++++++-
>>>> unpack-trees.c | 30 ++++++++++++++++++++++++
>>>> 3 files changed, 38 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/builtin/read-tree.c b/builtin/read-tree.c
>>>> index c2fdbc2657f..a7b7f822281 100644
>>>> --- a/builtin/read-tree.c
>>>> +++ b/builtin/read-tree.c
>>>> @@ -213,8 +213,7 @@ int cmd_read_tree(int argc, const char **argv, const char *cmd_prefix)
>>>> if (opts.merge && !opts.index_only)
>>>> setup_work_tree();
>>>>
>>>> - /* TODO: audit sparse index behavior in unpack_trees */
>>>> - if (opts.skip_sparse_checkout || opts.prefix)
>>>> + if (opts.skip_sparse_checkout)
>>>> ensure_full_index(&the_index);
>>>>
>>>> if (opts.merge) {
>>>> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
>>>> index ae44451a0a9..a404be0a10f 100755
>>>> --- a/t/t1092-sparse-checkout-compatibility.sh
>>>> +++ b/t/t1092-sparse-checkout-compatibility.sh
>>>> @@ -1415,7 +1415,13 @@ test_expect_success 'sparse index is not expanded: read-tree' '
>>>> do
>>>> ensure_not_expanded read-tree -mu $MERGE_TREES &&
>>>> ensure_not_expanded reset --hard HEAD || return 1
>>>> - done
>>>> + done &&
>>>> +
>>>> + rm -rf sparse-index/deep/deeper2 &&
>>>> + ensure_not_expanded add . &&
>>>> + ensure_not_expanded commit -m "test" &&
>>>> +
>>>> + ensure_not_expanded read-tree --prefix=deep/deeper2 -u deepest
>>>> '
>>>>
>>>> test_expect_success 'ls-files' '
>>>> diff --git a/unpack-trees.c b/unpack-trees.c
>>>> index 360844bda3a..dba122a02bb 100644
>>>> --- a/unpack-trees.c
>>>> +++ b/unpack-trees.c
>>>> @@ -1739,6 +1739,36 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
>>>> setup_standard_excludes(o->dir);
>>>> }
>>>>
>>>> + /*
>>>> + * If the prefix is equal to or contained within a sparse directory, the
>>>> + * index needs to be expanded to traverse with the specified prefix.
>>>> + */
>>>> + if (o->prefix && o->src_index->sparse_index) {
>>>> + int prefix_len = strlen(o->prefix);
>>>> +
>>>> + while (prefix_len > 0 && o->prefix[prefix_len - 1] == '/')
>>>> + prefix_len--;
>>>> +
>>>> + if (prefix_len > 0) {
>>>
>>> Is this condition check necessary? If we want some safety check here,
>>> could it instead be something like
>>>
>>> if (prefix_len <= 0)
>>> BUG("Broken prefix passed to unpack_trees");
>>>
>>
>> This condition was intended to skip unnecessary computation for the
>> (probably unlikely, but still technically valid) case where the prefix is
>> the repo root (e.g., '--prefix=/') - because the repo root is represented
>> with only directory separator(s), `prefix_len` would be 0 after removing
>> trailing '/'. In that scenario, the index won't need to be expanded, so we
>> don't need to go looking in the index for that path.
>
> Wow, that doesn't result in an error?!? That surprises me. I never
> even thought to test such a thing. Clearly, the following related
> command does give an error:
>
> git read-tree --prefix=/toplevel/ $TREE
>
> (namely, "error: invalid path '/toplevel/subpath'")
>
> whereas, the reason a single slash is accepted it because git is
> trying to be forgiving and treat the following two commands the same:
>
> git read-tree --prefix=subdir/ $TREE
> git read-tree --prefix=subdir $TREE
>
> i.e. it tries to allow the trailing slash to be optional. And, by
> implementation quirk, making a trailing slash be optional turns out to
> mean that --prefix=/ is treated the same as no prefix at all, because
> empty string prefix just happens to give the same behavior as NULL
> prefix.
>
If by "NULL prefix" you mean "the --prefix option isn't specified", the
behavior isn't *quite* the same as with an empty prefix, since `git
read-tree $TREE` will completely overwrite the index without regard for
what's already there, whereas `git read-tree --prefix= $TREE` "merges"
`$TREE` into the current index (failing if any entries in `$TREE` already
exist in the index). I still think it's an extremely niche usage, but can
also imagine a couple instances where it might be useful (e.g., reading from
an orphan branch).
> I think we should just throw an error if prefix starts with '/'.
> unpack_trees() can make it be a BUG() (and at the beginning of the
> function rather than down at this point and only inside some
> conditional). builtin/read-tree.c, the only thing that ever sets
> prefix in unpack_trees_options, should die() if it's passed something
> that starts with a '/'. Having paths start with a '/' is antithetical
> to how "prefix" is used throughout the codebase, though I guess I can
> see people making that mistake if they are used to gitignore-style
> patterns instead.
>
I think this makes sense (especially because it doesn't cause a loss of
functionality - a user can still specify '--prefix=' for the repo root if
they really want to). I'll include a patch for it in my next re-roll.
>> None of that is particularly clear from reading the patch, though, so I'll
>> add a comment & test covering it explicitly.
>>
>>> and then dedent the following code? (Or are callers allowed to not
>>> sanitize their input before passing to unpack_trees(), meaning that we
>>> should use a die() rather than a BUG()?)
>>>
>>> To test this idea, near the top of unpack_trees(), I added:
>>> if (o->prefix)
>>> assert(*o->prefix && *o->prefix != '/');
>>> and reran all tests. They all ran without hitting that assertion. FWIW.
>>>
>>>> + struct strbuf ce_prefix = STRBUF_INIT;
>>>> + strbuf_grow(&ce_prefix, prefix_len + 1);
>>>> + strbuf_add(&ce_prefix, o->prefix, prefix_len);
>>>> + strbuf_addch(&ce_prefix, '/');
>>>> +
>>>> + /*
>>>> + * If the prefix is not inside the sparse cone, then the
>>>> + * index is explicitly expanded if it is found as a sparse
>>>> + * directory, or implicitly expanded (by 'index_name_pos')
>>>> + * if the path is inside a sparse directory.
>>>> + */
>>>> + if (!path_in_cone_mode_sparse_checkout(ce_prefix.buf, o->src_index) &&
>>>> + index_name_pos(o->src_index, ce_prefix.buf, ce_prefix.len) >= 0)
>>>
>>> style nit: Can you rewrap both the comments and the code at 80 characters?
>>>
>>
>> I couldn't think of a way to wrap the condition that wouldn't make it more
>> difficult to read. The best I could come up with was:
>>
>> if (!path_in_cone_mode_sparse_checkout(ce_prefix.buf,
>> o->src_index) &&
>> index_name_pos(o->src_index,
>> ce_prefix.buf,
>> ce_prefix.len) >= 0)
>> ensure_full_index(o->src_index);
>>
>>
>> which, to me, is a bit hard to parse. Alternatively, though, I can move the
>> prefix-checking logic into its own function (kind of like
>> 'pathspec_needs_expanded_index(...)' in [1]), in which case I won't need to
>> change the current wrapping to keep it under 80 characters.
>>
>> [1] https://lore.kernel.org/git/822d7344587f698e73abba1ca726c3a905f7b403.1638201164.git.gitgitgadget@gmail.com/
>>
>>> It took me a bit of playing and testing to understand these two lines.
>>> The comment helps, but it's still a bit dense to unpack; somehow I
>>> didn't understand that the comment was referring to index_name_pos()'s
>>> call to ensure_full_index(). Once I understood that, it all looks
>>> good.
>>>
>>
>> Sorry about that, I'll revise to make that clearer.
>
> Thanks. :-)
>
>>>
>>>> + ensure_full_index(o->src_index);
>>>> +
>>>> + strbuf_release(&ce_prefix);
>>>> + }
>>>> + }
>>>> +
>>>> if (!core_apply_sparse_checkout || !o->update)
>>>> o->skip_sparse_checkout = 1;
>>>> if (!o->skip_sparse_checkout && !o->pl) {
>>>> --
>>>> gitgitgadget
This patch series was integrated into seen via git@011834f. |
This patch series was integrated into seen via git@f417268. |
There was a status update in the "New Topics" section about the branch "git read-tree" has been made to be aware of the sparse-index feature. Needs review. source: <pull.1157.v2.git.1645742073.gitgitgadget@gmail.com> |
This patch series was integrated into seen via git@a1fae29. |
This patch series was integrated into seen via git@819de4e. |
@@ -117,6 +117,7 @@ test_perf_on_all git diff | |||
test_perf_on_all git diff --cached |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Elijah Newren wrote (reply to this):
On Thu, Feb 24, 2022 at 2:34 PM Victoria Dye via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Victoria Dye <vdye@github.com>
>
> Add tests focused on how 'git read-tree' behaves in sparse checkouts. Extra
> emphasis is placed on interactions with files outside the sparse cone, e.g.
> merges with out-of-cone conflicts.
>
> Signed-off-by: Victoria Dye <vdye@github.com>
> ---
> t/perf/p2000-sparse-operations.sh | 1 +
> t/t1092-sparse-checkout-compatibility.sh | 85 ++++++++++++++++++++++++
> 2 files changed, 86 insertions(+)
>
> diff --git a/t/perf/p2000-sparse-operations.sh b/t/perf/p2000-sparse-operations.sh
> index 2a7106b9495..382716cfca9 100755
> --- a/t/perf/p2000-sparse-operations.sh
> +++ b/t/perf/p2000-sparse-operations.sh
> @@ -117,6 +117,7 @@ test_perf_on_all git diff
> test_perf_on_all git diff --cached
> test_perf_on_all git blame $SPARSE_CONE/a
> test_perf_on_all git blame $SPARSE_CONE/f3/a
> +test_perf_on_all git read-tree -mu HEAD
> test_perf_on_all git checkout-index -f --all
> test_perf_on_all git update-index --add --remove $SPARSE_CONE/a
>
> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
> index b1dcaa0e642..9d58da4e925 100755
> --- a/t/t1092-sparse-checkout-compatibility.sh
> +++ b/t/t1092-sparse-checkout-compatibility.sh
> @@ -819,6 +819,91 @@ test_expect_success 'update-index --cacheinfo' '
> test_cmp expect sparse-checkout-out
> '
>
> +test_expect_success 'read-tree --merge with files outside sparse definition' '
> + init_repos &&
> +
> + test_all_match git checkout -b test-branch update-folder1 &&
> + for MERGE_TREES in "base HEAD update-folder2" \
> + "update-folder1 update-folder2" \
> + "update-folder2"
> + do
> + # Clean up and remove on-disk files
> + test_all_match git reset --hard HEAD &&
> + test_sparse_match git sparse-checkout reapply &&
> +
> + # Although the index matches, without --no-sparse-checkout, outside-of-
> + # definition files will not exist on disk for sparse checkouts
> + test_all_match git read-tree -mu $MERGE_TREES &&
> + test_all_match git status --porcelain=v2 &&
> + test_path_is_missing sparse-checkout/folder2 &&
> + test_path_is_missing sparse-index/folder2 &&
> +
> + test_all_match git read-tree --reset -u HEAD &&
> + test_all_match git status --porcelain=v2 &&
> +
> + test_all_match git read-tree -mu --no-sparse-checkout $MERGE_TREES &&
> + test_all_match git status --porcelain=v2 &&
> + test_cmp sparse-checkout/folder2/a sparse-index/folder2/a &&
> + test_cmp sparse-checkout/folder2/a full-checkout/folder2/a || return 1
> + done
> +'
> +
> +test_expect_success 'read-tree --merge with edit/edit conflicts in sparse directories' '
> + init_repos &&
> +
> + # Merge of multiple changes to same directory (but not same files) should
> + # succeed
> + test_all_match git read-tree -mu base rename-base update-folder1 &&
> + test_all_match git status --porcelain=v2 &&
> +
> + test_all_match git reset --hard &&
> +
> + test_all_match git read-tree -mu rename-base update-folder2 &&
> + test_all_match git status --porcelain=v2 &&
> +
> + test_all_match git reset --hard &&
> +
> + test_all_match test_must_fail git read-tree -mu base update-folder1 rename-out-to-in &&
> + test_all_match test_must_fail git read-tree -mu rename-out-to-in update-folder1
> +'
> +
> +test_expect_success 'read-tree --prefix outside sparse definition' '
> + init_repos &&
> +
> + # Cannot read-tree --prefix with a single argument when files exist within
> + # prefix
Given the comments in the cover letter about --prefix needing special
work, it's not clear to me whether the below is expected behavior or
current-but-buggy behavior that you are testing and documenting.
Could you clarify?
> + test_all_match test_must_fail git read-tree --prefix=folder1/ -u update-folder1 &&
> +
> + test_all_match git read-tree --prefix=folder2/0 -u rename-base &&
> + test_path_is_missing sparse-checkout/folder2 &&
> + test_path_is_missing sparse-index/folder2 &&
> +
> + test_all_match git read-tree --reset -u HEAD &&
> + test_all_match git read-tree --prefix=folder2/0 -u --no-sparse-checkout rename-base &&
> + test_cmp sparse-checkout/folder2/0/a sparse-index/folder2/0/a &&
> + test_cmp sparse-checkout/folder2/0/a full-checkout/folder2/0/a
> +'
> +
> +test_expect_success 'read-tree --merge with directory-file conflicts' '
> + init_repos &&
> +
> + test_all_match git checkout -b test-branch rename-base &&
> +
> + # Although the index matches, without --no-sparse-checkout, outside-of-
> + # definition files will not exist on disk for sparse checkouts
> + test_sparse_match git read-tree -mu rename-out-to-out &&
> + test_sparse_match git status --porcelain=v2 &&
> + test_path_is_missing sparse-checkout/folder2 &&
> + test_path_is_missing sparse-index/folder2 &&
> +
> + test_sparse_match git read-tree --reset -u HEAD &&
> + test_sparse_match git status --porcelain=v2 &&
> +
> + test_sparse_match git read-tree -mu --no-sparse-checkout rename-out-to-out &&
> + test_sparse_match git status --porcelain=v2 &&
> + test_cmp sparse-checkout/folder2/0/1 sparse-index/folder2/0/1
> +'
> +
> test_expect_success 'merge, cherry-pick, and rebase' '
> init_repos &&
>
> --
> gitgitgadget
>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Victoria Dye wrote (reply to this):
Elijah Newren wrote:
> On Thu, Feb 24, 2022 at 2:34 PM Victoria Dye via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
>>
>> From: Victoria Dye <vdye@github.com>
>>
>> Add tests focused on how 'git read-tree' behaves in sparse checkouts. Extra
>> emphasis is placed on interactions with files outside the sparse cone, e.g.
>> merges with out-of-cone conflicts.
>>
>> Signed-off-by: Victoria Dye <vdye@github.com>
>> ---
>> t/perf/p2000-sparse-operations.sh | 1 +
>> t/t1092-sparse-checkout-compatibility.sh | 85 ++++++++++++++++++++++++
>> 2 files changed, 86 insertions(+)
>>
>> diff --git a/t/perf/p2000-sparse-operations.sh b/t/perf/p2000-sparse-operations.sh
>> index 2a7106b9495..382716cfca9 100755
>> --- a/t/perf/p2000-sparse-operations.sh
>> +++ b/t/perf/p2000-sparse-operations.sh
>> @@ -117,6 +117,7 @@ test_perf_on_all git diff
>> test_perf_on_all git diff --cached
>> test_perf_on_all git blame $SPARSE_CONE/a
>> test_perf_on_all git blame $SPARSE_CONE/f3/a
>> +test_perf_on_all git read-tree -mu HEAD
>> test_perf_on_all git checkout-index -f --all
>> test_perf_on_all git update-index --add --remove $SPARSE_CONE/a
>>
>> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
>> index b1dcaa0e642..9d58da4e925 100755
>> --- a/t/t1092-sparse-checkout-compatibility.sh
>> +++ b/t/t1092-sparse-checkout-compatibility.sh
>> @@ -819,6 +819,91 @@ test_expect_success 'update-index --cacheinfo' '
>> test_cmp expect sparse-checkout-out
>> '
>>
>> +test_expect_success 'read-tree --merge with files outside sparse definition' '
>> + init_repos &&
>> +
>> + test_all_match git checkout -b test-branch update-folder1 &&
>> + for MERGE_TREES in "base HEAD update-folder2" \
>> + "update-folder1 update-folder2" \
>> + "update-folder2"
>> + do
>> + # Clean up and remove on-disk files
>> + test_all_match git reset --hard HEAD &&
>> + test_sparse_match git sparse-checkout reapply &&
>> +
>> + # Although the index matches, without --no-sparse-checkout, outside-of-
>> + # definition files will not exist on disk for sparse checkouts
>> + test_all_match git read-tree -mu $MERGE_TREES &&
>> + test_all_match git status --porcelain=v2 &&
>> + test_path_is_missing sparse-checkout/folder2 &&
>> + test_path_is_missing sparse-index/folder2 &&
>> +
>> + test_all_match git read-tree --reset -u HEAD &&
>> + test_all_match git status --porcelain=v2 &&
>> +
>> + test_all_match git read-tree -mu --no-sparse-checkout $MERGE_TREES &&
>> + test_all_match git status --porcelain=v2 &&
>> + test_cmp sparse-checkout/folder2/a sparse-index/folder2/a &&
>> + test_cmp sparse-checkout/folder2/a full-checkout/folder2/a || return 1
>> + done
>> +'
>> +
>> +test_expect_success 'read-tree --merge with edit/edit conflicts in sparse directories' '
>> + init_repos &&
>> +
>> + # Merge of multiple changes to same directory (but not same files) should
>> + # succeed
>> + test_all_match git read-tree -mu base rename-base update-folder1 &&
>> + test_all_match git status --porcelain=v2 &&
>> +
>> + test_all_match git reset --hard &&
>> +
>> + test_all_match git read-tree -mu rename-base update-folder2 &&
>> + test_all_match git status --porcelain=v2 &&
>> +
>> + test_all_match git reset --hard &&
>> +
>> + test_all_match test_must_fail git read-tree -mu base update-folder1 rename-out-to-in &&
>> + test_all_match test_must_fail git read-tree -mu rename-out-to-in update-folder1
>> +'
>> +
>> +test_expect_success 'read-tree --prefix outside sparse definition' '
>> + init_repos &&
>> +
>> + # Cannot read-tree --prefix with a single argument when files exist within
>> + # prefix
>
> Given the comments in the cover letter about --prefix needing special
> work, it's not clear to me whether the below is expected behavior or
> current-but-buggy behavior that you are testing and documenting.
> Could you clarify?
>
It's expected - per 'Documentation/git-read-tree.txt':
The command will refuse to overwrite entries that already
existed in the original index file.
I'll update the comment to make that clearer. Thanks!
>> + test_all_match test_must_fail git read-tree --prefix=folder1/ -u update-folder1 &&
>> +
>> + test_all_match git read-tree --prefix=folder2/0 -u rename-base &&
>> + test_path_is_missing sparse-checkout/folder2 &&
>> + test_path_is_missing sparse-index/folder2 &&
>> +
>> + test_all_match git read-tree --reset -u HEAD &&
>> + test_all_match git read-tree --prefix=folder2/0 -u --no-sparse-checkout rename-base &&
>> + test_cmp sparse-checkout/folder2/0/a sparse-index/folder2/0/a &&
>> + test_cmp sparse-checkout/folder2/0/a full-checkout/folder2/0/a
>> +'
>> +
>> +test_expect_success 'read-tree --merge with directory-file conflicts' '
>> + init_repos &&
>> +
>> + test_all_match git checkout -b test-branch rename-base &&
>> +
>> + # Although the index matches, without --no-sparse-checkout, outside-of-
>> + # definition files will not exist on disk for sparse checkouts
>> + test_sparse_match git read-tree -mu rename-out-to-out &&
>> + test_sparse_match git status --porcelain=v2 &&
>> + test_path_is_missing sparse-checkout/folder2 &&
>> + test_path_is_missing sparse-index/folder2 &&
>> +
>> + test_sparse_match git read-tree --reset -u HEAD &&
>> + test_sparse_match git status --porcelain=v2 &&
>> +
>> + test_sparse_match git read-tree -mu --no-sparse-checkout rename-out-to-out &&
>> + test_sparse_match git status --porcelain=v2 &&
>> + test_cmp sparse-checkout/folder2/0/1 sparse-index/folder2/0/1
>> +'
>> +
>> test_expect_success 'merge, cherry-pick, and rebase' '
>> init_repos &&
>>
>> --
>> gitgitgadget
>>
On the Git mailing list, Elijah Newren wrote (reply to this):
|
@@ -117,6 +117,7 @@ test_perf_on_all git diff | |||
test_perf_on_all git diff --cached |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Ævar Arnfjörð Bjarmason wrote (reply to this):
On Thu, Feb 24 2022, Victoria Dye via GitGitGadget wrote:
> From: Victoria Dye <vdye@github.com>
>
> Add tests focused on how 'git read-tree' behaves in sparse checkouts. Extra
> emphasis is placed on interactions with files outside the sparse cone, e.g.
> merges with out-of-cone conflicts.
>
> Signed-off-by: Victoria Dye <vdye@github.com>
> ---
> t/perf/p2000-sparse-operations.sh | 1 +
> t/t1092-sparse-checkout-compatibility.sh | 85 ++++++++++++++++++++++++
> 2 files changed, 86 insertions(+)
>
> diff --git a/t/perf/p2000-sparse-operations.sh b/t/perf/p2000-sparse-operations.sh
> index 2a7106b9495..382716cfca9 100755
> --- a/t/perf/p2000-sparse-operations.sh
> +++ b/t/perf/p2000-sparse-operations.sh
> @@ -117,6 +117,7 @@ test_perf_on_all git diff
> test_perf_on_all git diff --cached
> test_perf_on_all git blame $SPARSE_CONE/a
> test_perf_on_all git blame $SPARSE_CONE/f3/a
> +test_perf_on_all git read-tree -mu HEAD
> test_perf_on_all git checkout-index -f --all
> test_perf_on_all git update-index --add --remove $SPARSE_CONE/a
>
> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
> index b1dcaa0e642..9d58da4e925 100755
> --- a/t/t1092-sparse-checkout-compatibility.sh
> +++ b/t/t1092-sparse-checkout-compatibility.sh
> @@ -819,6 +819,91 @@ test_expect_success 'update-index --cacheinfo' '
> test_cmp expect sparse-checkout-out
> '
>
> +test_expect_success 'read-tree --merge with files outside sparse definition' '
> + init_repos &&
> +
> + test_all_match git checkout -b test-branch update-folder1 &&
> + for MERGE_TREES in "base HEAD update-folder2" \
> + "update-folder1 update-folder2" \
> + "update-folder2"
> + do
> + # Clean up and remove on-disk files
> + test_all_match git reset --hard HEAD &&
> + test_sparse_match git sparse-checkout reapply &&
> +
> + # Although the index matches, without --no-sparse-checkout, outside-of-
> + # definition files will not exist on disk for sparse checkouts
> + test_all_match git read-tree -mu $MERGE_TREES &&
> + test_all_match git status --porcelain=v2 &&
> + test_path_is_missing sparse-checkout/folder2 &&
> + test_path_is_missing sparse-index/folder2 &&
> +
> + test_all_match git read-tree --reset -u HEAD &&
> + test_all_match git status --porcelain=v2 &&
> +
> + test_all_match git read-tree -mu --no-sparse-checkout $MERGE_TREES &&
> + test_all_match git status --porcelain=v2 &&
> + test_cmp sparse-checkout/folder2/a sparse-index/folder2/a &&
> + test_cmp sparse-checkout/folder2/a full-checkout/folder2/a || return 1
> + done
> +'
Nit: Isn't this nicer/easier by unrolling the for-loop to the top-level, i.e.:
for MERGE_TREES in "base HEAD update-folder2" [...]
do
test_expect_success "'read-tree -mu $MERGE_TREES' with files outside sparse definition" '
init_repos &&
test_when_finished "test_all_match git reset --hard HEAD" &&
...
'
done
It makes failures easier to reason about since you see which for-loop
iteration you're in right away, and can e.g. pick one with --run.
And we can do the cleanup in test_when_finished instead of at the start
of every loop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Victoria Dye wrote (reply to this):
Ævar Arnfjörð Bjarmason wrote:
>
> On Thu, Feb 24 2022, Victoria Dye via GitGitGadget wrote:
>
>> From: Victoria Dye <vdye@github.com>
>>
>> Add tests focused on how 'git read-tree' behaves in sparse checkouts. Extra
>> emphasis is placed on interactions with files outside the sparse cone, e.g.
>> merges with out-of-cone conflicts.
>>
>> Signed-off-by: Victoria Dye <vdye@github.com>
>> ---
>> t/perf/p2000-sparse-operations.sh | 1 +
>> t/t1092-sparse-checkout-compatibility.sh | 85 ++++++++++++++++++++++++
>> 2 files changed, 86 insertions(+)
>>
>> diff --git a/t/perf/p2000-sparse-operations.sh b/t/perf/p2000-sparse-operations.sh
>> index 2a7106b9495..382716cfca9 100755
>> --- a/t/perf/p2000-sparse-operations.sh
>> +++ b/t/perf/p2000-sparse-operations.sh
>> @@ -117,6 +117,7 @@ test_perf_on_all git diff
>> test_perf_on_all git diff --cached
>> test_perf_on_all git blame $SPARSE_CONE/a
>> test_perf_on_all git blame $SPARSE_CONE/f3/a
>> +test_perf_on_all git read-tree -mu HEAD
>> test_perf_on_all git checkout-index -f --all
>> test_perf_on_all git update-index --add --remove $SPARSE_CONE/a
>>
>> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
>> index b1dcaa0e642..9d58da4e925 100755
>> --- a/t/t1092-sparse-checkout-compatibility.sh
>> +++ b/t/t1092-sparse-checkout-compatibility.sh
>> @@ -819,6 +819,91 @@ test_expect_success 'update-index --cacheinfo' '
>> test_cmp expect sparse-checkout-out
>> '
>>
>> +test_expect_success 'read-tree --merge with files outside sparse definition' '
>> + init_repos &&
>> +
>> + test_all_match git checkout -b test-branch update-folder1 &&
>> + for MERGE_TREES in "base HEAD update-folder2" \
>> + "update-folder1 update-folder2" \
>> + "update-folder2"
>> + do
>> + # Clean up and remove on-disk files
>> + test_all_match git reset --hard HEAD &&
>> + test_sparse_match git sparse-checkout reapply &&
>> +
>> + # Although the index matches, without --no-sparse-checkout, outside-of-
>> + # definition files will not exist on disk for sparse checkouts
>> + test_all_match git read-tree -mu $MERGE_TREES &&
>> + test_all_match git status --porcelain=v2 &&
>> + test_path_is_missing sparse-checkout/folder2 &&
>> + test_path_is_missing sparse-index/folder2 &&
>> +
>> + test_all_match git read-tree --reset -u HEAD &&
>> + test_all_match git status --porcelain=v2 &&
>> +
>> + test_all_match git read-tree -mu --no-sparse-checkout $MERGE_TREES &&
>> + test_all_match git status --porcelain=v2 &&
>> + test_cmp sparse-checkout/folder2/a sparse-index/folder2/a &&
>> + test_cmp sparse-checkout/folder2/a full-checkout/folder2/a || return 1
>> + done
>> +'
>
> Nit: Isn't this nicer/easier by unrolling the for-loop to the top-level, i.e.:
>
> for MERGE_TREES in "base HEAD update-folder2" [...]
> do
> test_expect_success "'read-tree -mu $MERGE_TREES' with files outside sparse definition" '
> init_repos &&
> test_when_finished "test_all_match git reset --hard HEAD" &&
> ...
> '
> done
>
> It makes failures easier to reason about since you see which for-loop
> iteration you're in right away, and can e.g. pick one with --run.
>
I like how this separates the test cases (while not adding any
redundant/copied code). I'll update in the next version, thanks!
> And we can do the cleanup in test_when_finished instead of at the start
> of every loop.
Because `init_repos` completely resets the test repos, this actually lets me
remove the extra cleanup steps completely.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Ævar Arnfjörð Bjarmason wrote (reply to this):
On Mon, Feb 28 2022, Victoria Dye wrote:
> Ævar Arnfjörð Bjarmason wrote:
>>
>> On Thu, Feb 24 2022, Victoria Dye via GitGitGadget wrote:
>>
>>> From: Victoria Dye <vdye@github.com>
>>>
>>> Add tests focused on how 'git read-tree' behaves in sparse checkouts. Extra
>>> emphasis is placed on interactions with files outside the sparse cone, e.g.
>>> merges with out-of-cone conflicts.
>>>
>>> Signed-off-by: Victoria Dye <vdye@github.com>
>>> ---
>>> t/perf/p2000-sparse-operations.sh | 1 +
>>> t/t1092-sparse-checkout-compatibility.sh | 85 ++++++++++++++++++++++++
>>> 2 files changed, 86 insertions(+)
>>>
>>> diff --git a/t/perf/p2000-sparse-operations.sh b/t/perf/p2000-sparse-operations.sh
>>> index 2a7106b9495..382716cfca9 100755
>>> --- a/t/perf/p2000-sparse-operations.sh
>>> +++ b/t/perf/p2000-sparse-operations.sh
>>> @@ -117,6 +117,7 @@ test_perf_on_all git diff
>>> test_perf_on_all git diff --cached
>>> test_perf_on_all git blame $SPARSE_CONE/a
>>> test_perf_on_all git blame $SPARSE_CONE/f3/a
>>> +test_perf_on_all git read-tree -mu HEAD
>>> test_perf_on_all git checkout-index -f --all
>>> test_perf_on_all git update-index --add --remove $SPARSE_CONE/a
>>>
>>> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
>>> index b1dcaa0e642..9d58da4e925 100755
>>> --- a/t/t1092-sparse-checkout-compatibility.sh
>>> +++ b/t/t1092-sparse-checkout-compatibility.sh
>>> @@ -819,6 +819,91 @@ test_expect_success 'update-index --cacheinfo' '
>>> test_cmp expect sparse-checkout-out
>>> '
>>>
>>> +test_expect_success 'read-tree --merge with files outside sparse definition' '
>>> + init_repos &&
>>> +
>>> + test_all_match git checkout -b test-branch update-folder1 &&
>>> + for MERGE_TREES in "base HEAD update-folder2" \
>>> + "update-folder1 update-folder2" \
>>> + "update-folder2"
>>> + do
>>> + # Clean up and remove on-disk files
>>> + test_all_match git reset --hard HEAD &&
>>> + test_sparse_match git sparse-checkout reapply &&
>>> +
>>> + # Although the index matches, without --no-sparse-checkout, outside-of-
>>> + # definition files will not exist on disk for sparse checkouts
>>> + test_all_match git read-tree -mu $MERGE_TREES &&
>>> + test_all_match git status --porcelain=v2 &&
>>> + test_path_is_missing sparse-checkout/folder2 &&
>>> + test_path_is_missing sparse-index/folder2 &&
>>> +
>>> + test_all_match git read-tree --reset -u HEAD &&
>>> + test_all_match git status --porcelain=v2 &&
>>> +
>>> + test_all_match git read-tree -mu --no-sparse-checkout $MERGE_TREES &&
>>> + test_all_match git status --porcelain=v2 &&
>>> + test_cmp sparse-checkout/folder2/a sparse-index/folder2/a &&
>>> + test_cmp sparse-checkout/folder2/a full-checkout/folder2/a || return 1
>>> + done
>>> +'
>>
>> Nit: Isn't this nicer/easier by unrolling the for-loop to the top-level, i.e.:
>>
>> for MERGE_TREES in "base HEAD update-folder2" [...]
>> do
>> test_expect_success "'read-tree -mu $MERGE_TREES' with files outside sparse definition" '
>> init_repos &&
>> test_when_finished "test_all_match git reset --hard HEAD" &&
>> ...
>> '
>> done
>>
>> It makes failures easier to reason about since you see which for-loop
>> iteration you're in right away, and can e.g. pick one with --run.
>>
>
> I like how this separates the test cases (while not adding any
> redundant/copied code). I'll update in the next version, thanks!
>
>> And we can do the cleanup in test_when_finished instead of at the start
>> of every loop.
Sounds good!
Note for <reasons> we eval the body of the test into existence, but
*not* the description. So:
for x in [...] test_expect_success "$x" '$x'
Works to expand "$x" in both cases, but not:
for x in [...] test_expect_success '$x' '$x'
And you don't need to do:
for x in [...] test_expect_success "$x" "$x"
Which is handy as double-quoting the body is often a hassle with
escaping stuff.
I only think I got that wrong the first 1, 2, 3.... etc. times I used
this pattern, so I thought I'd mention it :)
> Because `init_repos` completely resets the test repos, this actually lets me
> remove the extra cleanup steps completely.
\o/
User |
This patch series was integrated into seen via git@73c31f7. |
This patch series was integrated into seen via git@c6a76f4. |
This patch series was integrated into seen via git@f13af5e. |
There was a status update in the "Cooking" section about the branch "git read-tree" has been made to be aware of the sparse-index feature. Needs review. source: <pull.1157.v2.git.1645742073.gitgitgadget@gmail.com> |
Enable the 'recursive' diff option for the diff executed as part of 'git status'. Without the 'recursive' enabled, 'git status' reports index changes incorrectly when the following conditions were met: * sparse index is enabled * there is a difference between the index and HEAD in a file inside a *subdirectory* of a sparse directory * the sparse directory index entry is *not* expanded in-core Because it is not recursive by default, the diff in 'git status' reports changes only at the level of files and directories that are immediate children of a sparse directory, rather than recursing into directories with changes to identify the modified file(s). As a result, 'git status' reports the immediate subdirectory itself as "modified". Example: $ git init $ mkdir -p sparse/sub $ echo test >sparse/sub/foo $ git add . $ git commit -m "commit 1" $ echo somethingelse >sparse/sub/foo $ git add . $ git commit -a -m "commit 2" $ git sparse-checkout set --cone --sparse-index 'sparse' $ git reset --soft HEAD~1 $ git status On branch master You are in a sparse checkout. Changes to be committed: (use "git restore --staged <file>..." to unstage) modified: sparse/sub Enabling the 'recursive' diff option in 'wt_status_collect_changes_index' corrects this issue by allowing the diff to recurse into subdirectories of sparse directories to find modified files. Given the same repository setup as the example above, the corrected result of `git status` is: $ git status On branch master You are in a sparse checkout. Changes to be committed: (use "git restore --staged <file>..." to unstage) modified: sparse/sub/foo Signed-off-by: Victoria Dye <vdye@github.com>
This patch series was integrated into seen via git@c5fc750. |
This patch series was integrated into seen via git@a732655. |
On the Git mailing list, Elijah Newren wrote (reply to this):
|
This patch series was integrated into seen via git@64a4615. |
On the Git mailing list, Derrick Stolee wrote (reply to this):
|
This patch series was integrated into seen via git@ffe9d23. |
This patch series was integrated into seen via git@d1d8afc. |
@@ -160,15 +160,22 @@ int cmd_read_tree(int argc, const char **argv, const char *cmd_prefix) | |||
argc = parse_options(argc, argv, cmd_prefix, read_tree_options, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Glen Choo wrote (reply to this):
"Victoria Dye via GitGitGadget" <gitgitgadget@gmail.com> writes:
> From: Victoria Dye <vdye@github.com>
> +static void update_sparsity_for_prefix(const char *prefix,
> + struct index_state *istate)
> +{
> + int prefix_len = strlen(prefix);
> + struct strbuf ce_prefix = STRBUF_INIT;
> +
> + if (!istate->sparse_index)
> + return;
> +
> + while (prefix_len > 0 && prefix[prefix_len - 1] == '/')
> + prefix_len--;
> +
> + if (prefix_len <= 0)
> + BUG("Invalid prefix passed to update_sparsity_for_prefix");
> +
> + strbuf_grow(&ce_prefix, prefix_len + 1);
> + strbuf_add(&ce_prefix, prefix, prefix_len);
> + strbuf_addch(&ce_prefix, '/');
> +
> + /*
> + * If the prefix points to a sparse directory or a path inside a sparse
> + * directory, the index should be expanded. This is accomplished in one
> + * of two ways:
> + * - if the prefix is inside a sparse directory, it will be expanded by
> + * the 'ensure_full_index(...)' call in 'index_name_pos(...)'.
> + * - if the prefix matches an existing sparse directory entry,
> + * 'index_name_pos(...)' will return its index position, triggering
> + * the 'ensure_full_index(...)' below.
> + */
> + if (!path_in_cone_mode_sparse_checkout(ce_prefix.buf, istate) &&
> + index_name_pos(istate, ce_prefix.buf, ce_prefix.len) >= 0)
> + ensure_full_index(istate);
> +
> + strbuf_release(&ce_prefix);
> +}
Hm, I don't think I follow the rationale for having two different ways
of expanding the index:
- If the prefix is inside a sparse directory, we should expand the
index.
- If the prefix matches a sparse directory entry, we should expand the
index.
So it seems like distinguishing between the two cases with
index_name_pos(...) isn't necessary. I've attached a diff that does
exactly this, and it passes t1092-sparse-checkout-compatibility.sh as
far as I can tell. I've also amended the comment in a way that makes
more sense to me, but I'm not 100% sure if it's accurate.
I'm also a little averse to using a side effect of index_name_pos() to
achieve what we really want, so I'd prefer to get rid of the call if we
can :)
----- >8 --------- >8 --------- >8 --------- >8 --------- >8 ----
diff --git a/unpack-trees.c b/unpack-trees.c
index b876caca0d..5b07055605 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -1749,17 +1749,11 @@ static void update_sparsity_for_prefix(const char *prefix,
strbuf_addch(&ce_prefix, '/');
/*
- * If the prefix points to a sparse directory or a path inside a sparse
- * directory, the index should be expanded. This is accomplished in one
- * of two ways:
- * - if the prefix is inside a sparse directory, it will be expanded by
- * the 'ensure_full_index(...)' call in 'index_name_pos(...)'.
- * - if the prefix matches an existing sparse directory entry,
- * 'index_name_pos(...)' will return its index position, triggering
- * the 'ensure_full_index(...)' below.
+ * If the prefix points to a sparse directory or a path inside a
+ * sparse directory (not within the sparse patterns), the index
+ * should be expanded.
*/
- if (!path_in_cone_mode_sparse_checkout(ce_prefix.buf, istate) &&
- index_name_pos(istate, ce_prefix.buf, ce_prefix.len) >= 0)
+ if (!path_in_cone_mode_sparse_checkout(ce_prefix.buf, istate))
ensure_full_index(istate);
strbuf_release(&ce_prefix);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Victoria Dye wrote (reply to this):
Glen Choo wrote:
> "Victoria Dye via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
>> From: Victoria Dye <vdye@github.com>
>> +static void update_sparsity_for_prefix(const char *prefix,
>> + struct index_state *istate)
>> +{
>> + int prefix_len = strlen(prefix);
>> + struct strbuf ce_prefix = STRBUF_INIT;
>> +
>> + if (!istate->sparse_index)
>> + return;
>> +
>> + while (prefix_len > 0 && prefix[prefix_len - 1] == '/')
>> + prefix_len--;
>> +
>> + if (prefix_len <= 0)
>> + BUG("Invalid prefix passed to update_sparsity_for_prefix");
>> +
>> + strbuf_grow(&ce_prefix, prefix_len + 1);
>> + strbuf_add(&ce_prefix, prefix, prefix_len);
>> + strbuf_addch(&ce_prefix, '/');
>> +
>> + /*
>> + * If the prefix points to a sparse directory or a path inside a sparse
>> + * directory, the index should be expanded. This is accomplished in one
>> + * of two ways:
>> + * - if the prefix is inside a sparse directory, it will be expanded by
>> + * the 'ensure_full_index(...)' call in 'index_name_pos(...)'.
>> + * - if the prefix matches an existing sparse directory entry,
>> + * 'index_name_pos(...)' will return its index position, triggering
>> + * the 'ensure_full_index(...)' below.
>> + */
>> + if (!path_in_cone_mode_sparse_checkout(ce_prefix.buf, istate) &&
>> + index_name_pos(istate, ce_prefix.buf, ce_prefix.len) >= 0)
>> + ensure_full_index(istate);
>> +
>> + strbuf_release(&ce_prefix);
>> +}
>
> Hm, I don't think I follow the rationale for having two different ways
> of expanding the index:
>
> - If the prefix is inside a sparse directory, we should expand the
> index.
> - If the prefix matches a sparse directory entry, we should expand the
> index.
>
There's an (admittedly subtle if you aren't familiar with sparse index)
distinction between a "sparse directory" index entry and "a directory
outside the sparse-checkout cone with SKIP_WORKTREE enabled on its entries".
Ultimately, that's what necessitates the two checks but, as in [1], I want
to use this as an opportunity to shed some light on what 'unpack_trees(...)'
does.
Taking a step back, why would index expansion matter when you pass a prefix
to 'read-tree'? The answer lies in the tree traversal at the core of
'unpack_trees(...)'; when a prefix is provided, 'unpack_trees' does the
following:
1. Set the "traversal prefix" to the user-provided prefix.
2. Call 'traverse_trees(...)', iterating through the child files/directories
of the prefix directory (NOTE: this does *not* use the index - it finds
the child entries by extracting them from the tree).
3. For each child, call 'unpack_callback' which, among other things, looks
for the child entry as it exists in the index to merge the input trees
into it.
The problem with sparse directories arises in step 3. Suppose you have the
following repo:
.
├── bar
│ └── f1
├── baz
│ ├── deep
│ │ └── a
│ └── f2
├── foo
└── foo1
where directory 'baz/' is sparse, leading to an index that looks like this
(per 'git ls-files -t --sparse'):
H bar/f1
S baz/
H foo
H foo1
If you provide 'baz/' as a prefix, 'unpack_callback(...)' tries to find
'baz/deep/a' and 'baz/f2' in the index, but they won't be found because
they're "wrapped" in sparse directory 'baz/'. Ultimately, this leads to a
corrupted index with duplicate 'baz/' contents merged in:
H bar/f1
S baz/
S baz/deep/a
S baz/f2
H foo
H foo1
This explains why you need to expand the index at all, but not why you need
to check that the prefix is not in the sparse cone *and* that it (or its
parent directory) doesn't exist in the index. For that, an important bit of
context is that you can have a sparse index with non-sparse directories
outside the cone. This can happen in a number of ways (for example, running
'git update-index --no-skip-worktree' on a file in a sparse directory), but
the important thing is that it is a completely valid state and not entirely
uncommon.
Using our example above, suppose 'baz/' is partially expanded in the index,
with the following index contents:
H bar/f1
S baz/deep/
S baz/f2
H foo
H foo1
If we use the prefix 'baz/' here, we actually traverse the trees properly:
'baz/deep/' and 'baz/f2' will be found and merged - no index expansion
needed! But if we only checked '!path_in_cone_mode_sparse_checkout(...)', we
would have expanded the index because 'baz/' is outside the sparse cone.
This presents a problem because index expansion is *extremely* expensive -
we should avoid it whenever possible. That's where checking
'index_name_pos(...)' comes in: if the directory is in the index as a sparse
directory, the position is '>= 0' and 'ensure_full_index(...)' is called; if
the directory is inside an existing sparse directory, the position will be
'< 0' but the index will be expanded implicitly. In every other case, we
avoid expanding the index and proceed with the merge as normal.
Hope this helps!
[1] https://lore.kernel.org/git/dc47f12b-8724-22ef-ed2c-096badfafd76@github.com/
> So it seems like distinguishing between the two cases with
> index_name_pos(...) isn't necessary. I've attached a diff that does
> exactly this, and it passes t1092-sparse-checkout-compatibility.sh as
> far as I can tell. I've also amended the comment in a way that makes
> more sense to me, but I'm not 100% sure if it's accurate.
>
> I'm also a little averse to using a side effect of index_name_pos() to
> achieve what we really want, so I'd prefer to get rid of the call if we
> can :)
>
> ----- >8 --------- >8 --------- >8 --------- >8 --------- >8 ----
>
> diff --git a/unpack-trees.c b/unpack-trees.c
> index b876caca0d..5b07055605 100644
> --- a/unpack-trees.c
> +++ b/unpack-trees.c
> @@ -1749,17 +1749,11 @@ static void update_sparsity_for_prefix(const char *prefix,
> strbuf_addch(&ce_prefix, '/');
>
> /*
> - * If the prefix points to a sparse directory or a path inside a sparse
> - * directory, the index should be expanded. This is accomplished in one
> - * of two ways:
> - * - if the prefix is inside a sparse directory, it will be expanded by
> - * the 'ensure_full_index(...)' call in 'index_name_pos(...)'.
> - * - if the prefix matches an existing sparse directory entry,
> - * 'index_name_pos(...)' will return its index position, triggering
> - * the 'ensure_full_index(...)' below.
> + * If the prefix points to a sparse directory or a path inside a
> + * sparse directory (not within the sparse patterns), the index
> + * should be expanded.
> */
> - if (!path_in_cone_mode_sparse_checkout(ce_prefix.buf, istate) &&
> - index_name_pos(istate, ce_prefix.buf, ce_prefix.len) >= 0)
> + if (!path_in_cone_mode_sparse_checkout(ce_prefix.buf, istate))
> ensure_full_index(istate);
>
> strbuf_release(&ce_prefix);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Glen Choo wrote (reply to this):
Victoria Dye <vdye@github.com> writes:
> There's an (admittedly subtle if you aren't familiar with sparse index)
> distinction between a "sparse directory" index entry and "a directory
> outside the sparse-checkout cone with SKIP_WORKTREE enabled on its entries".
> Ultimately, that's what necessitates the two checks but, as in [1], I want
> to use this as an opportunity to shed some light on what 'unpack_trees(...)'
> does.
Thanks! This explanation was really helpful.
> Using our example above, suppose 'baz/' is partially expanded in the index,
> with the following index contents:
>
> H bar/f1
> S baz/deep/
> S baz/f2
> H foo
> H foo1
>
> If we use the prefix 'baz/' here, we actually traverse the trees properly:
> 'baz/deep/' and 'baz/f2' will be found and merged - no index expansion
> needed! But if we only checked '!path_in_cone_mode_sparse_checkout(...)', we
> would have expanded the index because 'baz/' is outside the sparse cone.
In particular, I didn't consider that a directory outside of the
sparse-checkout cone could be partially expanded. This seems to be the
crux of it, which is that even if the path is outside of the
sparse-checkout clone, we can still get correct behavior (without
expanding the index) if its entries are expanded...
> This presents a problem because index expansion is *extremely* expensive -
> we should avoid it whenever possible. That's where checking
> 'index_name_pos(...)' comes in: if the directory is in the index as a sparse
> directory, the position is '>= 0' and 'ensure_full_index(...)' is called; if
> the directory is inside an existing sparse directory, the position will be
> '< 0' but the index will be expanded implicitly. In every other case, we
> avoid expanding the index and proceed with the merge as normal.
and because of this, we don't always need to expand the index when the
path is outside of the cone, so my suggested patch expands the index in
too many cases.
What I also didn't consider is that index_name_pos() doesn't _always_
expand the index, it only expands the index when the directory is
inside a sparse directory entry.
So the side-effect of index_name_pos() is actually _exactly_ what we
want. Granted, it would be clearer if we had a function that did _only_
'expand if path is inside a sparse directory entry', but I suppose it's
overkill.
(Purely optional suggestion) I wonder if we could add a test that can
distinguish between 'always expand if --prefix is outside of the cone'
vs 'expand only if path is outside of cone AND inside a sparse directory
entry'. The scenario you described sounds perfect as a test case, though
I don't know how feasible it is to set up.
User |
This patch series was integrated into seen via git@bd501c1. |
There was a status update in the "Cooking" section about the branch "git read-tree" has been made to be aware of the sparse-index feature. Will merge to 'next'. source: <pull.1157.v3.git.1646166271.gitgitgadget@gmail.com> |
This patch series was integrated into seen via git@166f3c8. |
This patch series was integrated into seen via git@8a8b71b. |
There was a status update in the "Cooking" section about the branch "git read-tree" has been made to be aware of the sparse-index feature. Will merge to 'next'. source: <pull.1157.v3.git.1646166271.gitgitgadget@gmail.com> |
This patch series was integrated into seen via git@1df3b59. |
This patch series was integrated into next via git@037c187. |
This patch series was integrated into seen via git@4493ed7. |
This patch series was integrated into seen via git@1dff3d0. |
This patch series was integrated into seen via git@bf7da9f. |
This patch series was integrated into seen via git@b01664a. |
There was a status update in the "Cooking" section about the branch "git read-tree" has been made to be aware of the sparse-index feature. Will merge to 'master'. source: <pull.1157.v3.git.1646166271.gitgitgadget@gmail.com> |
This patch series was integrated into seen via git@e4ca37f. |
This patch series was integrated into seen via git@d0a6b79. |
This patch series was integrated into seen via git@190f9bf. |
This patch series was integrated into master via git@190f9bf. |
This patch series was integrated into next via git@190f9bf. |
Closed via 190f9bf. |
Like previous integrations [1] [2], this series allows 'git read-tree' to operate using a sparse index.
The first three patches are bugfixes for issues found while implementing the 'read-tree' integration:
The remainder of the series focuses on utilizing the sparse index in 'git read-tree'. After some baseline behavior-establishing tests (patch [4/8]), sparse index usage is trivially enabled (patch [5/8]) for 'read-tree' except:
These cases require additional changes in order to work as expected (i.e., outwardly matching non-sparse index sparse-checkout). For the former, the sparse index can be enabled as long as the index is expanded when the prefix is a directory outside the sparse cone (patch [6/8]). For the latter, sparse directories that cannot be trivially merged must have their contents merged file-by-file, done by recursively traversing the trees represented by the sparse directories (patches [7/8] & [8/8]).
Changes since V2
Changes since V1
Thanks!
[1] https://lore.kernel.org/git/pull.1109.v2.git.1641924306.gitgitgadget@gmail.com/
[2] https://lore.kernel.org/git/pull.1048.v6.git.1638201164.gitgitgadget@gmail.com/
cc: newren@gmail.com
cc: gitster@pobox.com
cc: Derrick Stolee derrickstolee@github.com
cc: Ævar Arnfjörð Bjarmason avarab@gmail.com
cc: Glen Choo chooglen@google.com