Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dir: check for single file cone patterns #1446

Conversation

williams-unity
Copy link

@williams-unity williams-unity commented Dec 20, 2022

Resubmitting this without the superfluous double negation of the 'strcmp' output.

cc: Ævar Arnfjörð Bjarmason avarab@gmail.com
cc: Victoria Dye vdye@github.com

@gitgitgadget
Copy link

gitgitgadget bot commented Dec 20, 2022

There are issues in commit ac14e6c:
dir: check for single file cone patterns
Lines in the body of the commit messages should be wrapped between 60 and 76 characters.
Indented lines, and lines without whitespace, are exempt

@williams-unity
Copy link
Author

/preview

@gitgitgadget
Copy link

gitgitgadget bot commented Dec 20, 2022

Preview email sent as pull.1446.git.1671544091487.gitgitgadget@gmail.com

@williams-unity
Copy link
Author

/submit

@gitgitgadget
Copy link

gitgitgadget bot commented Dec 20, 2022

Submitted as pull.1446.git.1671546459151.gitgitgadget@gmail.com

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git/ pr-1446/williams-unity/ws/sparse-checkout-pattern-match-v1

To fetch this version to local tag pr-1446/williams-unity/ws/sparse-checkout-pattern-match-v1:

git fetch --no-tags https://github.com/gitgitgadget/git/ tag pr-1446/williams-unity/ws/sparse-checkout-pattern-match-v1

@gitgitgadget
Copy link

gitgitgadget bot commented Dec 20, 2022

On the Git mailing list, Ævar Arnfjörð Bjarmason wrote (reply to this):

On Tue, Dec 20 2022, William Sprent via GitGitGadget wrote:

> From: William Sprent <williams@unity3d.com>
> [...]
> +	if (!(given->flags & PATTERN_FLAG_MUSTBEDIR) &&
> +	    !!strcmp(given->pattern, "/*")) {

Style: We use !! to convert 123 to 1 or whatever, but here we just care
about "not matched", so we can drop this !!strcmp() for strcmp().

@gitgitgadget
Copy link

gitgitgadget bot commented Dec 20, 2022

User Ævar Arnfjörð Bjarmason <avarab@gmail.com> has been added to the cc: list.

The sparse checkout documentation states that the cone mode pattern set
is limited to patterns that either recursively include directories or
patterns that match all files in a directory. In the sparse checkout
file, the former manifest in the form:

    /A/B/C/

while the latter become a pair of patterns either in the form:

    /A/B/
    !/A/B/*/

or in the special case of matching the toplevel files:

    /*
    !/*/

The 'add_pattern_to_hashsets()' function contains checks which serve to
disable cone-mode when non-cone patterns are encountered. However, these
do not catch when the pattern list attempts to match a single file or
directory, e.g. a pattern in the form:

    /A/B/C

This causes sparse-checkout to exhibit unexpected behaviour when such a
pattern is in the sparse-checkout file and cone mode is enabled.
Concretely, with the pattern like the above, sparse-checkout, in
non-cone mode, will only include the directory or file located at
'/A/B/C'. However, with cone mode enabled, sparse-checkout will instead
just manifest the toplevel files but not any file located at '/A/B/C'.

Relatedly, issues occur when supplying the same kind of filter when
partial cloning with '--filter=sparse:oid=<oid>'. 'upload-pack' will
correctly just include the objects that match the non-cone pattern
matching. Which means that checking out the newly cloned repo with the
same filter, but with cone mode enabled, fails due to missing objects.

To fix these issues, add a cone mode pattern check that asserts that
every pattern is either a directory match or the pattern '/*'. Add a
test to verify the new pattern check and modify another to reflect that
non-directory patterns are caught earlier.

Signed-off-by: William Sprent <williams@unity3d.com>
@gitgitgadget
Copy link

gitgitgadget bot commented Dec 21, 2022

On the Git mailing list, William Sprent wrote (reply to this):

On 20/12/2022 15.33, Ævar Arnfjörð Bjarmason wrote:
> > On Tue, Dec 20 2022, William Sprent via GitGitGadget wrote:
> >> From: William Sprent <williams@unity3d.com>
>> [...]
>> +	if (!(given->flags & PATTERN_FLAG_MUSTBEDIR) &&
>> +	    !!strcmp(given->pattern, "/*")) {
> > Style: We use !! to convert 123 to 1 or whatever, but here we just care
> about "not matched", so we can drop this !!strcmp() for strcmp().

Thanks. I'll drop the '!!' for next version.

@williams-unity
Copy link
Author

/submit

@gitgitgadget
Copy link

gitgitgadget bot commented Jan 3, 2023

Submitted as pull.1446.v2.git.1672734059938.gitgitgadget@gmail.com

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git/ pr-1446/williams-unity/ws/sparse-checkout-pattern-match-v2

To fetch this version to local tag pr-1446/williams-unity/ws/sparse-checkout-pattern-match-v2:

git fetch --no-tags https://github.com/gitgitgadget/git/ tag pr-1446/williams-unity/ws/sparse-checkout-pattern-match-v2

@gitgitgadget
Copy link

gitgitgadget bot commented Jan 4, 2023

On the Git mailing list, Junio C Hamano wrote (reply to this):

"William Sprent via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: William Sprent <williams@unity3d.com>
>
> The sparse checkout documentation states that the cone mode pattern set
> is limited to patterns that either recursively include directories or
> patterns that match all files in a directory. In the sparse checkout
> file, the former manifest in the form:
>
>     /A/B/C/
>
> while the latter become a pair of patterns either in the form:
>
>     /A/B/
>     !/A/B/*/
>
> or in the special case of matching the toplevel files:
>
>     /*
>     !/*/
>
> The 'add_pattern_to_hashsets()' function contains checks which serve to
> disable cone-mode when non-cone patterns are encountered. However, these
> do not catch when the pattern list attempts to match a single file or
> directory, e.g. a pattern in the form:
>
>     /A/B/C
>
> This causes sparse-checkout to exhibit unexpected behaviour when such a
> pattern is in the sparse-checkout file and cone mode is enabled.
> Concretely, with the pattern like the above, sparse-checkout, in
> non-cone mode, will only include the directory or file located at
> '/A/B/C'. However, with cone mode enabled, sparse-checkout will instead
> just manifest the toplevel files but not any file located at '/A/B/C'.
>
> Relatedly, issues occur when supplying the same kind of filter when
> partial cloning with '--filter=sparse:oid=<oid>'. 'upload-pack' will
> correctly just include the objects that match the non-cone pattern
> matching. Which means that checking out the newly cloned repo with the
> same filter, but with cone mode enabled, fails due to missing objects.
>
> To fix these issues, add a cone mode pattern check that asserts that
> every pattern is either a directory match or the pattern '/*'. Add a
> test to verify the new pattern check and modify another to reflect that
> non-directory patterns are caught earlier.
>
> Signed-off-by: William Sprent <williams@unity3d.com>
> ---

Summoning those who worked in the area, which includes area experts,
for further comments, picked out of output from

    $ git shortlog --no-merges --grep=cone --since=2.years -s -n -e


> diff --git a/dir.c b/dir.c
> index fbdb24fc819..4e99f0c868f 100644
> --- a/dir.c
> +++ b/dir.c
> @@ -732,6 +732,13 @@ static void add_pattern_to_hashsets(struct pattern_list *pl, struct path_pattern
>  		goto clear_hashmaps;
>  	}
>  
> +	if (!(given->flags & PATTERN_FLAG_MUSTBEDIR) &&
> +	    strcmp(given->pattern, "/*")) {
> +		/* Not a cone pattern. */
> +		warning(_("unrecognized pattern: '%s'"), given->pattern);
> +		goto clear_hashmaps;
> +	}
> +
>  	prev = given->pattern;
>  	cur = given->pattern + 1;
>  	next = given->pattern + 2;
> diff --git a/t/t1091-sparse-checkout-builtin.sh b/t/t1091-sparse-checkout-builtin.sh
> index b563d6c263e..627267be153 100755
> --- a/t/t1091-sparse-checkout-builtin.sh
> +++ b/t/t1091-sparse-checkout-builtin.sh
> @@ -238,7 +238,7 @@ test_expect_success 'cone mode: match patterns' '
>  test_expect_success 'cone mode: warn on bad pattern' '
>  	test_when_finished mv sparse-checkout repo/.git/info/ &&
>  	cp repo/.git/info/sparse-checkout . &&
> -	echo "!/deep/deeper/*" >>repo/.git/info/sparse-checkout &&
> +	echo "!/deep/deeper/*/" >>repo/.git/info/sparse-checkout &&
>  	git -C repo read-tree -mu HEAD 2>err &&
>  	test_i18ngrep "unrecognized negative pattern" err
>  '
> @@ -667,6 +667,15 @@ test_expect_success 'pattern-checks: starting "*"' '
>  	check_read_tree_errors repo "a deep" "disabling cone pattern matching"
>  '
>  
> +test_expect_success 'pattern-checks: non directory pattern' '
> +	cat >repo/.git/info/sparse-checkout <<-\EOF &&
> +	/deep/deeper1/a
> +	EOF
> +	check_read_tree_errors repo deep "disabling cone pattern matching" &&
> +	check_files repo/deep deeper1 &&
> +	check_files repo/deep/deeper1 a
> +'
> +
>  test_expect_success 'pattern-checks: contained glob characters' '
>  	for c in "[a]" "\\" "?" "*"
>  	do
>
> base-commit: 7c2ef319c52c4997256f5807564523dfd4acdfc7

@gitgitgadget
Copy link

gitgitgadget bot commented Jan 4, 2023

On the Git mailing list, Victoria Dye wrote (reply to this):

William Sprent via GitGitGadget wrote:
> From: William Sprent <williams@unity3d.com>
> 
> The sparse checkout documentation states that the cone mode pattern set
> is limited to patterns that either recursively include directories or
> patterns that match all files in a directory. In the sparse checkout
> file, the former manifest in the form:
> 
>     /A/B/C/
> 
> while the latter become a pair of patterns either in the form:
> 
>     /A/B/
>     !/A/B/*/
> 
> or in the special case of matching the toplevel files:
> 
>     /*
>     !/*/
> 
> The 'add_pattern_to_hashsets()' function contains checks which serve to
> disable cone-mode when non-cone patterns are encountered. However, these
> do not catch when the pattern list attempts to match a single file or
> directory, e.g. a pattern in the form:
> 
>     /A/B/C
> 
> This causes sparse-checkout to exhibit unexpected behaviour when such a
> pattern is in the sparse-checkout file and cone mode is enabled.
> Concretely, with the pattern like the above, sparse-checkout, in
> non-cone mode, will only include the directory or file located at
> '/A/B/C'. However, with cone mode enabled, sparse-checkout will instead
> just manifest the toplevel files but not any file located at '/A/B/C'.
> 
> Relatedly, issues occur when supplying the same kind of filter when
> partial cloning with '--filter=sparse:oid=<oid>'. 'upload-pack' will
> correctly just include the objects that match the non-cone pattern
> matching. Which means that checking out the newly cloned repo with the
> same filter, but with cone mode enabled, fails due to missing objects.

This clearly explains the issue you're trying to fix: cone mode
sparse-checkout requires patterns like '/A/B/', !/A/B/*/', '/A/B/C/', etc.,
but invalid patterns like '/A/B/C' (no trailing slash) currently don't force
the switch to non-cone mode, leading to unexpected behavior.

> 
> To fix these issues, add a cone mode pattern check that asserts that
> every pattern is either a directory match or the pattern '/*'. Add a
> test to verify the new pattern check and modify another to reflect that
> non-directory patterns are caught earlier.

I think this is the best way to maintain the current intended behavior of
cone mode sparse-checkout. While the idea of single file "exceptions" to
cone patterns has been brought up in the past (I think most recently at the
Contributor's Summit this past September [1]), it'd be a substantial change
that's definitely out-of-scope of this small bugfix.

[1] https://lore.kernel.org/git/YzXwOsaoCdBhHsX1@nand.local/

> 
> Signed-off-by: William Sprent <williams@unity3d.com>
> ---
>     dir: check for single file cone patterns
>     
>     Resubmitting this without the superfluous double negation of the
>     'strcmp' output.
> 
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1446%2Fwilliams-unity%2Fws%2Fsparse-checkout-pattern-match-v2
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1446/williams-unity/ws/sparse-checkout-pattern-match-v2
> Pull-Request: https://github.com/gitgitgadget/git/pull/1446
> 
> Range-diff vs v1:
> 
>  1:  cfb4b75c378 ! 1:  d5406e62f6f dir: check for single file cone patterns
>      @@ dir.c: static void add_pattern_to_hashsets(struct pattern_list *pl, struct path_
>        	}
>        
>       +	if (!(given->flags & PATTERN_FLAG_MUSTBEDIR) &&
>      -+	    !!strcmp(given->pattern, "/*")) {
>      ++	    strcmp(given->pattern, "/*")) {
>       +		/* Not a cone pattern. */
>       +		warning(_("unrecognized pattern: '%s'"), given->pattern);
>       +		goto clear_hashmaps;
> 
> 
>  dir.c                              |  7 +++++++
>  t/t1091-sparse-checkout-builtin.sh | 11 ++++++++++-
>  2 files changed, 17 insertions(+), 1 deletion(-)
> 
> diff --git a/dir.c b/dir.c
> index fbdb24fc819..4e99f0c868f 100644
> --- a/dir.c
> +++ b/dir.c
> @@ -732,6 +732,13 @@ static void add_pattern_to_hashsets(struct pattern_list *pl, struct path_pattern
>  		goto clear_hashmaps;
>  	}
>  
> +	if (!(given->flags & PATTERN_FLAG_MUSTBEDIR) &&
> +	    strcmp(given->pattern, "/*")) {

This condition is only entered if:

- the pattern *does not* end in a trailing slash 
  ('!(given->flags & PATTERN_FLAG_MUSTBEDIR)'), AND
- the pattern *does not* start with '/*' ('strcmp(given->pattern, "/*")')

The '/*'-like patterns are already handled by other parts of
'add_pattern_to_hashsets()', so only the '/A/B/C'-like patterns will trigger
the "unrecognized" warning. Looks good!


> +		/* Not a cone pattern. */
> +		warning(_("unrecognized pattern: '%s'"), given->pattern);
> +		goto clear_hashmaps;
> +	}
> +
>  	prev = given->pattern;
>  	cur = given->pattern + 1;
>  	next = given->pattern + 2;
> diff --git a/t/t1091-sparse-checkout-builtin.sh b/t/t1091-sparse-checkout-builtin.sh
> index b563d6c263e..627267be153 100755
> --- a/t/t1091-sparse-checkout-builtin.sh
> +++ b/t/t1091-sparse-checkout-builtin.sh
> @@ -238,7 +238,7 @@ test_expect_success 'cone mode: match patterns' '
>  test_expect_success 'cone mode: warn on bad pattern' '
>  	test_when_finished mv sparse-checkout repo/.git/info/ &&
>  	cp repo/.git/info/sparse-checkout . &&
> -	echo "!/deep/deeper/*" >>repo/.git/info/sparse-checkout &&
> +	echo "!/deep/deeper/*/" >>repo/.git/info/sparse-checkout &&

This test changes because, without the trailing slash on the pattern, the
error would become "unrecognized pattern: !/deep/deeper/*" due to the new
check in 'add_pattern_to_hashsets()'. Changing the test pattern ensures
we're still exercising what the test was originally intended to exercise,
and doesn't indicate a regression.

>  	git -C repo read-tree -mu HEAD 2>err &&
>  	test_i18ngrep "unrecognized negative pattern" err
>  '
> @@ -667,6 +667,15 @@ test_expect_success 'pattern-checks: starting "*"' '
>  	check_read_tree_errors repo "a deep" "disabling cone pattern matching"
>  '
>  
> +test_expect_success 'pattern-checks: non directory pattern' '
> +	cat >repo/.git/info/sparse-checkout <<-\EOF &&
> +	/deep/deeper1/a
> +	EOF
> +	check_read_tree_errors repo deep "disabling cone pattern matching" &&
> +	check_files repo/deep deeper1 &&
> +	check_files repo/deep/deeper1 a
> +'

And this test ensures the new check is working for the appropriate patterns.

This patch looks good to me, thanks for finding & fixing the bug!

> +
>  test_expect_success 'pattern-checks: contained glob characters' '
>  	for c in "[a]" "\\" "?" "*"
>  	do
> 
> base-commit: 7c2ef319c52c4997256f5807564523dfd4acdfc7

@gitgitgadget
Copy link

gitgitgadget bot commented Jan 4, 2023

User Victoria Dye <vdye@github.com> has been added to the cc: list.

@gitgitgadget
Copy link

gitgitgadget bot commented Jan 5, 2023

On the Git mailing list, Junio C Hamano wrote (reply to this):

Victoria Dye <vdye@github.com> writes:

> This clearly explains the issue you're trying to fix: cone mode
> sparse-checkout requires patterns like '/A/B/', !/A/B/*/', '/A/B/C/', etc.,
> but invalid patterns like '/A/B/C' (no trailing slash) currently don't force
> the switch to non-cone mode, leading to unexpected behavior.
> ...
>> To fix these issues, add a cone mode pattern check that asserts that
>> every pattern is either a directory match or the pattern '/*'. Add a
>> test to verify the new pattern check and modify another to reflect that
>> non-directory patterns are caught earlier.
>
> I think this is the best way to maintain the current intended behavior of
> cone mode sparse-checkout. While the idea of single file "exceptions" to
> cone patterns has been brought up in the past (I think most recently at the
> Contributor's Summit this past September [1]), it'd be a substantial change
> that's definitely out-of-scope of this small bugfix.
> ...
> And this test ensures the new check is working for the appropriate patterns.
>
> This patch looks good to me, thanks for finding & fixing the bug!

Thanks for an excellently written review, and a patch that is well
done.

Will queue.

@gitgitgadget
Copy link

gitgitgadget bot commented Jan 5, 2023

This branch is now known as ws/single-file-cone.

@gitgitgadget
Copy link

gitgitgadget bot commented Jan 5, 2023

This patch series was integrated into seen via git@c4136f8.

@gitgitgadget gitgitgadget bot added the seen label Jan 5, 2023
@gitgitgadget
Copy link

gitgitgadget bot commented Jan 5, 2023

On the Git mailing list, William Sprent wrote (reply to this):

On 05/01/2023 00.48, Victoria Dye wrote:
> > I think this is the best way to maintain the current intended behavior of
> cone mode sparse-checkout. While the idea of single file "exceptions" to
> cone patterns has been brought up in the past (I think most recently at the
> Contributor's Summit this past September [1]), it'd be a substantial change
> that's definitely out-of-scope of this small bugfix.
> > [1] https://lore.kernel.org/git/YzXwOsaoCdBhHsX1@nand.local/
> Thanks for the context. I didn't know that it had been discussed.

>>
>> ...
>>
> > And this test ensures the new check is working for the appropriate patterns.
> > This patch looks good to me, thanks for finding & fixing the bug!
> Thanks for taking the time to review!

@gitgitgadget
Copy link

gitgitgadget bot commented Jan 6, 2023

This patch series was integrated into seen via git@99ebb44.

@gitgitgadget
Copy link

gitgitgadget bot commented Jan 6, 2023

There was a status update in the "New Topics" section about the branch ws/single-file-cone on the Git mailing list:

The logic to see if we are using the "cone" mode by checking the
sparsity patterns has been tightened to avoid mistaking a pattern
that names a single file as specifying a cone.

Will merge to 'next'.
source: <pull.1446.v2.git.1672734059938.gitgitgadget@gmail.com>

@gitgitgadget
Copy link

gitgitgadget bot commented Jan 7, 2023

This patch series was integrated into seen via git@606fba4.

@gitgitgadget
Copy link

gitgitgadget bot commented Jan 8, 2023

This patch series was integrated into seen via git@69fea17.

@gitgitgadget
Copy link

gitgitgadget bot commented Jan 8, 2023

This patch series was integrated into seen via git@34f8f17.

@gitgitgadget
Copy link

gitgitgadget bot commented Jan 8, 2023

This patch series was integrated into seen via git@60b73d4.

@gitgitgadget
Copy link

gitgitgadget bot commented Jan 9, 2023

This patch series was integrated into seen via git@2c67126.

@gitgitgadget
Copy link

gitgitgadget bot commented Jan 9, 2023

This patch series was integrated into next via git@b6d4d7b.

@gitgitgadget gitgitgadget bot added the next label Jan 9, 2023
@gitgitgadget
Copy link

gitgitgadget bot commented Jan 9, 2023

There was a status update in the "Cooking" section about the branch ws/single-file-cone on the Git mailing list:

The logic to see if we are using the "cone" mode by checking the
sparsity patterns has been tightened to avoid mistaking a pattern
that names a single file as specifying a cone.

Will merge to 'master'.
source: <pull.1446.v2.git.1672734059938.gitgitgadget@gmail.com>

@gitgitgadget
Copy link

gitgitgadget bot commented Jan 13, 2023

This patch series was integrated into seen via git@bc8d351.

@gitgitgadget
Copy link

gitgitgadget bot commented Jan 14, 2023

This patch series was integrated into seen via git@211c853.

@gitgitgadget
Copy link

gitgitgadget bot commented Jan 14, 2023

There was a status update in the "Cooking" section about the branch ws/single-file-cone on the Git mailing list:

The logic to see if we are using the "cone" mode by checking the
sparsity patterns has been tightened to avoid mistaking a pattern
that names a single file as specifying a cone.

Will merge to 'master'.
source: <pull.1446.v2.git.1672734059938.gitgitgadget@gmail.com>

@gitgitgadget
Copy link

gitgitgadget bot commented Jan 14, 2023

This patch series was integrated into seen via git@fa465d3.

@gitgitgadget
Copy link

gitgitgadget bot commented Jan 14, 2023

This patch series was integrated into seen via git@2f050e9.

@gitgitgadget
Copy link

gitgitgadget bot commented Jan 15, 2023

This patch series was integrated into seen via git@fa0d3e1.

@gitgitgadget
Copy link

gitgitgadget bot commented Jan 15, 2023

This patch series was integrated into seen via git@fb4d122.

@gitgitgadget
Copy link

gitgitgadget bot commented Jan 15, 2023

This patch series was integrated into seen via git@72ed483.

@gitgitgadget
Copy link

gitgitgadget bot commented Jan 15, 2023

This patch series was integrated into seen via git@1368943.

@gitgitgadget
Copy link

gitgitgadget bot commented Jan 16, 2023

This patch series was integrated into seen via git@ab85a7d.

@gitgitgadget
Copy link

gitgitgadget bot commented Jan 16, 2023

This patch series was integrated into master via git@ab85a7d.

@gitgitgadget
Copy link

gitgitgadget bot commented Jan 16, 2023

This patch series was integrated into next via git@ab85a7d.

@gitgitgadget gitgitgadget bot added the master label Jan 16, 2023
@gitgitgadget gitgitgadget bot closed this Jan 16, 2023
@gitgitgadget
Copy link

gitgitgadget bot commented Jan 16, 2023

Closed via ab85a7d.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

1 participant