Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

index-pack: fsck honor checks #1658

Closed
wants to merge 2 commits into from

Conversation

john-cai
Copy link
Contributor

@john-cai john-cai commented Jan 24, 2024

git-index-pack has a --strict mode that can take an optional argument to
provide a list of fsck issues to change their severity. --fsck-objects
does not have such a utility, which would be useful if one would like to
be more lenient or strict on data integrity in a repository.

Like --strict, Allow --fsck-objects to also take a list of fsck msgs to
change the severity.

Changes since V3:

  • clarification of --fsck-objects documentation wording

Changes since V2:

  • fixed some typos in the documentation
  • added commit trailers

Change since V1:

  • edited commit messages
  • clarified formatting in documentation for --strict= and --fsck-objects=

cc: Jonathan Tan jonathantanmy@google.com
cc: Patrick Steinhardt ps@pks.im
cc: Christian Couder christian.couder@gmail.com
cc: SZEDER Gábor szeder.dev@gmail.com

@john-cai john-cai force-pushed the jc/index-pack-fsck-honor-checks branch 2 times, most recently from 9c83b88 to 074e0c7 Compare January 25, 2024 19:49
@john-cai
Copy link
Contributor Author

/preview

Copy link

Preview email sent as pull.1658.git.git.1706212367.gitgitgadget@gmail.com

@john-cai
Copy link
Contributor Author

/submit

Copy link

Submitted as pull.1658.git.git.1706215884.gitgitgadget@gmail.com

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git/ pr-git-1658/john-cai/jc/index-pack-fsck-honor-checks-v1

To fetch this version to local tag pr-git-1658/john-cai/jc/index-pack-fsck-honor-checks-v1:

git fetch --no-tags https://github.com/gitgitgadget/git/ tag pr-git-1658/john-cai/jc/index-pack-fsck-honor-checks-v1

@@ -79,8 +79,13 @@ OPTIONS
to force the version for the generated pack index, and to force

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

"John Cai via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: John Cai <johncai86@gmail.com>
>
> 5d477a334a (fsck (receive-pack): allow demoting errors to warnings,
> 2015-06-22) allowed a list of fsck msg to downgrade to be passed to
> --strict. However this is a hidden argument that was not documented nor
> tested. Though true that most users would not call this option
> direction, (nor use index-pack for that matter) it is still useful to

Though it is true that ... call this option directly (nor use
index-pack, for that matter), it is still ...

or something like that, probably.

> document and test this feature.

And I agree with that.  Thanks for adding the necessary doc.

> +--strict[=<msg-ids>]::

<msg-id> in the context of "git fsck --help" seems to refer to the
left hand side of <msg-id>=<severity>.  <msg-ids> sounds as if it is
just the list of <msg-id> without saying anything about their severity,
which is not what we want to imply.

Either use a made-up word that is clearly different and can not be
mistaken as a list of <msg-id>, or spell it out a bit more
explicitly, may make it easier to follow?

	--strict[=<fsck-config>]
	--strict[=<msg-id>=<severity>...]

I dunno.

Use of <msg-id> and <severity> below looks good in the body of the
paragraph here.

> +	Die, if the pack contains broken objects or links. If `<msg-ids>` is passed,
> +	it should be a comma-separated list of `<msg-id>=<severity>` elements where
> +	`<msg-id>` and `<severity>` are used to change the severity of some possible
> +	issues, eg: `--strict="missingEmail=ignore,badTagName=error"`. See the entry
> +	for the `fsck.<msg-id>` configuration options in `linkgit:git-fsck[1] for
> +	more information on the possible values of `<msg-id>` and `<severity>`.

"eg:" -> "e.g.," probably.

> diff --git a/t/t5300-pack-object.sh b/t/t5300-pack-object.sh
> index d402ec18b79..9563372ae27 100755
> --- a/t/t5300-pack-object.sh
> +++ b/t/t5300-pack-object.sh
> @@ -441,6 +441,28 @@ test_expect_success 'index-pack with --strict' '
>  	)
>  '
>  
> +test_expect_success 'index-pack with --strict downgrading fsck msgs' '
> +	test_when_finished rm -rf strict &&
> +	git init strict &&
> +	(
> +		cd strict &&
> +		test_commit first hello &&
> +		cat >commit <<-EOF &&
> +		tree $(git rev-parse HEAD^{tree})
> +		parent $(git rev-parse HEAD)
> +		author A U Thor
> +		committer A U Thor
> +
> +		commit: this is a commit wit bad emails

"wit" -> "with"; as this typo does not contribute anything to
the badness we expect index-pack to notice, it would pay to
make sure we do not have it, to avoid distracting readers.

> +		EOF
> +		git hash-object --literally -t commit -w --stdin <commit >commit_list &&
> +		PACK=$(git pack-objects test <commit_list) &&
> +		test_must_fail git index-pack --strict "test-$PACK.pack" &&
> +		git index-pack --strict="missingEmail=ignore" "test-$PACK.pack"
> +	)
> +'
> +
>  test_expect_success 'honor pack.packSizeLimit' '
>  	git config pack.packSizeLimit 3m &&
>  	packname_10=$(git pack-objects test-10 <obj-list) &&

@@ -91,8 +96,14 @@ default and "Indexing objects" when `--stdin` is specified.
--check-self-contained-and-connected::

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

"John Cai via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: John Cai <johncai86@gmail.com>
>
> git-index-pack has a --strict mode that can take an optional argument to

"mode" -> "option", probably.

> provide a list of fsck issues to change their severity. --fsck-objects
> does not have such a utility, which would be useful if one would like to
> be more lenient or strict on data integrity in a repository.
>
> Like --strict, Allow --fsck-objects to also take a list of fsck msgs to
> change the severity.

"Allow" -> "allow".

> This commit also removes the "For internal use only" note for
> --fsck-objects, and documents the option. This won't often be used by
> the normal end user, but it turns out it is useful for Git forges like
> GitLab.

"This commit also removes", "documents" -> "Remove", "document".

> ---fsck-objects::
> -	For internal use only.
> +--fsck-objects[=<msg-ids>]::
> +	Instructs index-pack to check for broken objects instead of broken
> +	links. If `<msg-ids>` is passed, it should be  a comma-separated list of

Very much pleased to see an additional description that is written
to clarify the difference between this option and the other --strict
option.  The original was totally unclear on this point, and it is
very much appreciated.

The other option notices and dies upon seeing either a broken object
or a dangling link.  This one only diagnoses broken objects and does
not care if the objects are connected.  However, saying "instead of"
here tempts readers to mistakenly think that the other one only
checks links and this one only checks contents, which is not what we
want to say.  Perhaps "to check for broken objects, but unlike
`--strict`, do not choke on broken links" or something?

Same comment on <msg-ids> as the previous step.

> +	`<msg-id>=<severity>` where `<msg-id>` and `<severity>` are used to
> +	change the severity of `fsck` errors, eg: `--strict="missingEmail=ignore,badTagName=ignore"`.

Same comment for "eg:" as before.

> @@ -1785,8 +1785,9 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
>  			} else if (!strcmp(arg, "--check-self-contained-and-connected")) {
>  				strict = 1;
>  				check_self_contained_and_connected = 1;
> -			} else if (!strcmp(arg, "--fsck-objects")) {
> +			} else if (skip_to_optional_arg(arg, "--fsck-objects", &arg)) {
>  				do_fsck_object = 1;
> +				fsck_set_msg_types(&fsck_options, arg);
>  			} else if (!strcmp(arg, "--verify")) {
>  				verify = 1;
>  			} else if (!strcmp(arg, "--verify-stat")) {

The implementation of this part looks quite obvious, once you see
how "--strict[=<msgid>=<level>]" is implemented.

Looking good.

Thanks.

Copy link

This branch is now known as jc/index-pack-fsck-levels.

Copy link

This patch series was integrated into seen via 0883498.

@john-cai john-cai force-pushed the jc/index-pack-fsck-honor-checks branch from 074e0c7 to cce63c6 Compare January 26, 2024 17:08
@john-cai
Copy link
Contributor Author

/submit

Copy link

Submitted as pull.1658.v2.git.git.1706289180.gitgitgadget@gmail.com

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git/ pr-git-1658/john-cai/jc/index-pack-fsck-honor-checks-v2

To fetch this version to local tag pr-git-1658/john-cai/jc/index-pack-fsck-honor-checks-v2:

git fetch --no-tags https://github.com/gitgitgadget/git/ tag pr-git-1658/john-cai/jc/index-pack-fsck-honor-checks-v2

@@ -79,8 +79,13 @@ OPTIONS
to force the version for the generated pack index, and to force

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

"John Cai via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: John Cai <johncai86@gmail.com>
>
> 5d477a334a (fsck (receive-pack): allow demoting errors to warnings,
> 2015-06-22) allowed a list of fsck msg to downgrade to be passed to
> --strict. However this is a hidden argument that was not documented nor
> tested. Though it is true that most users would not call this option
> directly, (nor use index-pack for that matter) it is still useful to
> document and test this feature.
>
> Signed-off-by: John Cai <johncai86@gmail.com>
> ---
>  Documentation/git-index-pack.txt |  9 +++++++--
>  builtin/index-pack.c             |  2 +-
>  t/t5300-pack-object.sh           | 22 ++++++++++++++++++++++
>  3 files changed, 30 insertions(+), 3 deletions(-)
>
> diff --git a/Documentation/git-index-pack.txt b/Documentation/git-index-pack.txt
> index 6486620c3d8..f7a98bbf9c8 100644
> --- a/Documentation/git-index-pack.txt
> +++ b/Documentation/git-index-pack.txt
> @@ -79,8 +79,13 @@ OPTIONS
>  	to force the version for the generated pack index, and to force
>  	64-bit index entries on objects located above the given offset.
>  
> ---strict::
> -	Die, if the pack contains broken objects or links.
> +--strict[=<msg-id>=<severity>...]::
> +	Die, if the pack contains broken objects or links. If `<msg-ids>` is passed,
> +	it should be a comma-separated list of `<msg-id>=<severity>` elements where
> +	`<msg-id>` and `<severity>` are used to change the severity of some possible
> +	issues, e.g., `--strict="missingEmail=ignore,badTagName=error"`. See the entry

There no longer is <msg-ids>, so I'll tweak the text perhaps like so:

	An optional value that is a comma-separated list of '<msg-id>=<severity>'
	can be passed to change the severity of some possible issues, ...

while queueing.  Will probably do the same for the --fsck-objects
side in the next patch.

Other than that, thanks for a pleasant read.

> +	for the `fsck.<msg-id>` configuration options in `linkgit:git-fsck[1] for
> +	more information on the possible values of `<msg-id>` and `<severity>`.
>  
>  --progress-title::
>  	For internal use only.
> diff --git a/builtin/index-pack.c b/builtin/index-pack.c
> index 1ea87e01f29..1e53ca23775 100644
> --- a/builtin/index-pack.c
> +++ b/builtin/index-pack.c
> @@ -24,7 +24,7 @@
>  #include "setup.h"
>  
>  static const char index_pack_usage[] =
> -"git index-pack [-v] [-o <index-file>] [--keep | --keep=<msg>] [--[no-]rev-index] [--verify] [--strict] (<pack-file> | --stdin [--fix-thin] [<pack-file>])";
> +"git index-pack [-v] [-o <index-file>] [--keep | --keep=<msg>] [--[no-]rev-index] [--verify] [--strict[=<msg-ids>]] (<pack-file> | --stdin [--fix-thin] [<pack-file>])";
>  
>  struct object_entry {
>  	struct pack_idx_entry idx;
> diff --git a/t/t5300-pack-object.sh b/t/t5300-pack-object.sh
> index d402ec18b79..496fffa0f8a 100755
> --- a/t/t5300-pack-object.sh
> +++ b/t/t5300-pack-object.sh
> @@ -441,6 +441,28 @@ test_expect_success 'index-pack with --strict' '
>  	)
>  '
>  
> +test_expect_success 'index-pack with --strict downgrading fsck msgs' '
> +	test_when_finished rm -rf strict &&
> +	git init strict &&
> +	(
> +		cd strict &&
> +		test_commit first hello &&
> +		cat >commit <<-EOF &&
> +		tree $(git rev-parse HEAD^{tree})
> +		parent $(git rev-parse HEAD)
> +		author A U Thor
> +		committer A U Thor
> +
> +		commit: this is a commit with bad emails
> +
> +		EOF
> +		git hash-object --literally -t commit -w --stdin <commit >commit_list &&
> +		PACK=$(git pack-objects test <commit_list) &&
> +		test_must_fail git index-pack --strict "test-$PACK.pack" &&
> +		git index-pack --strict="missingEmail=ignore" "test-$PACK.pack"
> +	)
> +'
> +
>  test_expect_success 'honor pack.packSizeLimit' '
>  	git config pack.packSizeLimit 3m &&
>  	packname_10=$(git pack-objects test-10 <obj-list) &&

@@ -91,8 +96,14 @@ default and "Indexing objects" when `--stdin` is specified.
--check-self-contained-and-connected::

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

"John Cai via GitGitGadget" <gitgitgadget@gmail.com> writes:

> +--fsck-objects[=<msg-ids>=<severity>...]::
> +	Instructs index-pack to check for broken objects, but unlike `--strict`,
> +	does not choke on broken links. If `<msg-ids>` is passed, it should be
> +	a comma-separated list of `<msg-id>=<severity>` where `<msg-id>` and
> +	`<severity>` are used to change the severity of `fsck` errors e.g.,
> +	`--strict="missingEmail=ignore,badTagName=ignore"`. See the entry for

In addition to the comment I made in my reply to 1/2, this should
be `--fsck-objects="missingEmail=ignore,badTagName=ignore"`, I
think.  Will treak locally.

Thanks.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, John Cai wrote (reply to this):

Hi Junio,

On 26 Jan 2024, at 13:13, Junio C Hamano wrote:

> "John Cai via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
>> +--fsck-objects[=<msg-ids>=<severity>...]::
>> +	Instructs index-pack to check for broken objects, but unlike `--strict`,
>> +	does not choke on broken links. If `<msg-ids>` is passed, it should be
>> +	a comma-separated list of `<msg-id>=<severity>` where `<msg-id>` and
>> +	`<severity>` are used to change the severity of `fsck` errors e.g.,
>> +	`--strict="missingEmail=ignore,badTagName=ignore"`. See the entry for
>
> In addition to the comment I made in my reply to 1/2, this should
> be `--fsck-objects="missingEmail=ignore,badTagName=ignore"`, I
> think.  Will treak locally.

Good catch. I might as well re-roll this since I forgot to add a Reviewed-by:
Christian Couder <christian.couder@gmail.com> trailer to the commits.

>
> Thanks.

@john-cai john-cai force-pushed the jc/index-pack-fsck-honor-checks branch 2 times, most recently from ac9ed44 to 16233cf Compare January 26, 2024 20:39
5d477a3 (fsck (receive-pack): allow demoting errors to warnings,
2015-06-22) allowed a list of fsck msg to downgrade to be passed to
--strict. However this is a hidden argument that was not documented nor
tested. Though it is true that most users would not call this option
directly, (nor use index-pack for that matter) it is still useful to
document and test this feature.

Reviewed-by: Christian Couder <christian.couder@gmail.com>
Signed-off-by: John Cai <johncai86@gmail.com>
@john-cai john-cai force-pushed the jc/index-pack-fsck-honor-checks branch from 16233cf to a2b9adb Compare January 26, 2024 20:56
@john-cai
Copy link
Contributor Author

/submit

Copy link

Submitted as pull.1658.v3.git.git.1706302749.gitgitgadget@gmail.com

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git/ pr-git-1658/john-cai/jc/index-pack-fsck-honor-checks-v3

To fetch this version to local tag pr-git-1658/john-cai/jc/index-pack-fsck-honor-checks-v3:

git fetch --no-tags https://github.com/gitgitgadget/git/ tag pr-git-1658/john-cai/jc/index-pack-fsck-honor-checks-v3

Copy link

On the Git mailing list, Junio C Hamano wrote (reply to this):

"John Cai via GitGitGadget" <gitgitgadget@gmail.com> writes:

>  1:  b3b3e8bd0bf ! 1:  cdf7fc7fe8a index-pack: test and document --strict=<msg>
>      @@ Metadata
>       Author: John Cai <johncai86@gmail.com>
>       
>        ## Commit message ##
>      -    index-pack: test and document --strict=<msg>
>      +    index-pack: test and document --strict=<msg-id>=<severity>...

Ah, I missed this one.  Nice spotting.

>           5d477a334a (fsck (receive-pack): allow demoting errors to warnings,
>           2015-06-22) allowed a list of fsck msg to downgrade to be passed to
>      @@ Commit message
>           directly, (nor use index-pack for that matter) it is still useful to
>           document and test this feature.
>       
>      +    Reviewed-by: Christian Couder <christian.couder@gmail.com>
>           Signed-off-by: John Cai <johncai86@gmail.com>

I haven't seen Christian involved (by getting Cc'ed these patches,
sending out review comments, or giving his Reviewed-by:) during
these three rounds of this topic.  I'll wait until I hear from him
before queuing this, just to be safe.

>      ++	Die, if the pack contains broken objects or links. An optional
>      ++	comma-separated list of `<msg-id>=<severity>` can be passed to change
>      ++	the severity of some possible issues, e.g.,
>      ++	 `--strict="missingEmail=ignore,badTagName=error"`. See the entry for the
>      ++	`fsck.<msg-id>` configuration options in linkgit:git-fsck[1] for more
>      ++	information on the possible values of `<msg-id>` and `<severity>`.

This is much better than the tentative text I tweaked.  Nice.

>      ++--fsck-objects[=<msg-id>=<severity>...]::
>      ++	Die if the pack contains broken objects. If the pack contains a tree
>      ++	pointing to a .gitmodules blob that does not exist, prints the hash of
>      ++	that blob (for the caller to check) after the hash that goes into the
>      ++	name of the pack/idx file (see "Notes").

Not a new problem bit I have to wonder what happens if the pack
contains many trees that point at different blobs for ".gitmodules"
path and many of these blobs are not included in the packfile?  Will
the caller receive all of these blob object names so that they can
be verified?  The reference to the "Notes" only refer to the fact
that usually a single hash value that is used in constructing the
name of the packfile "pack-<Hashvalue>.pack" is emitted to the
standard output, which is not wrong per se, but does not help
readers very much wrt to understanding this.

[jc: dragging JTan into the thread, as this comes from his 5476e1ef
(fetch-pack: print and use dangling .gitmodules, 2021-02-22)].

>        +
>      ++Unlike `--strict` however, don't choke on broken links. An optional

You'd need a comma on both sides of "however" used like this, I
think.  

In any case, I thought your original construction to have this
"unlike" immediately after "die on broken objects" was far easier to
follow.

Thanks.

git-index-pack has a --strict option that can take an optional argument
to provide a list of fsck issues to change their severity.
--fsck-objects does not have such a utility, which would be useful if
one would like to be more lenient or strict on data integrity in a
repository.

Like --strict, allow --fsck-objects to also take a list of fsck msgs to
change the severity.

Remove the "For internal use only" note for --fsck-objects, and document
the option. This won't often be used by the normal end user, but it
turns out it is useful for Git forges like GitLab.

Reviewed-by: Christian Couder <christian.couder@gmail.com>
Signed-off-by: John Cai <johncai86@gmail.com>
@john-cai john-cai force-pushed the jc/index-pack-fsck-honor-checks branch from a2b9adb to f29ab91 Compare January 26, 2024 22:14
Copy link

On the Git mailing list, John Cai wrote (reply to this):

Hi Junio,

On 26 Jan 2024, at 16:18, Junio C Hamano wrote:

> "John Cai via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
>>  1:  b3b3e8bd0bf ! 1:  cdf7fc7fe8a index-pack: test and document --strict=<msg>
>>      @@ Metadata
>>       Author: John Cai <johncai86@gmail.com>
>>
>>        ## Commit message ##
>>      -    index-pack: test and document --strict=<msg>
>>      +    index-pack: test and document --strict=<msg-id>=<severity>...
>
> Ah, I missed this one.  Nice spotting.
>
>>           5d477a334a (fsck (receive-pack): allow demoting errors to warnings,
>>           2015-06-22) allowed a list of fsck msg to downgrade to be passed to
>>      @@ Commit message
>>           directly, (nor use index-pack for that matter) it is still useful to
>>           document and test this feature.
>>
>>      +    Reviewed-by: Christian Couder <christian.couder@gmail.com>
>>           Signed-off-by: John Cai <johncai86@gmail.com>
>
> I haven't seen Christian involved (by getting Cc'ed these patches,
> sending out review comments, or giving his Reviewed-by:) during
> these three rounds of this topic.  I'll wait until I hear from him
> before queuing this, just to be safe.

Christian was involved on an off-list review of this patch series. You can see
it in [1].


1. https://gitlab.com/gitlab-org/git/-/merge_requests/88
>
>>      ++	Die, if the pack contains broken objects or links. An optional
>>      ++	comma-separated list of `<msg-id>=<severity>` can be passed to change
>>      ++	the severity of some possible issues, e.g.,
>>      ++	 `--strict="missingEmail=ignore,badTagName=error"`. See the entry for the
>>      ++	`fsck.<msg-id>` configuration options in linkgit:git-fsck[1] for more
>>      ++	information on the possible values of `<msg-id>` and `<severity>`.
>
> This is much better than the tentative text I tweaked.  Nice.
>
>>      ++--fsck-objects[=<msg-id>=<severity>...]::
>>      ++	Die if the pack contains broken objects. If the pack contains a tree
>>      ++	pointing to a .gitmodules blob that does not exist, prints the hash of
>>      ++	that blob (for the caller to check) after the hash that goes into the
>>      ++	name of the pack/idx file (see "Notes").
>
> Not a new problem bit I have to wonder what happens if the pack
> contains many trees that point at different blobs for ".gitmodules"
> path and many of these blobs are not included in the packfile?  Will
> the caller receive all of these blob object names so that they can
> be verified?  The reference to the "Notes" only refer to the fact
> that usually a single hash value that is used in constructing the
> name of the packfile "pack-<Hashvalue>.pack" is emitted to the
> standard output, which is not wrong per se, but does not help
> readers very much wrt to understanding this.
>
> [jc: dragging JTan into the thread, as this comes from his 5476e1ef
> (fetch-pack: print and use dangling .gitmodules, 2021-02-22)].

sounds good, will wait for some clarification here
>
>>        +
>>      ++Unlike `--strict` however, don't choke on broken links. An optional
>
> You'd need a comma on both sides of "however" used like this, I
> think.

good catch

>
> In any case, I thought your original construction to have this
> "unlike" immediately after "die on broken objects" was far easier to
> follow.

I'll reformulate this to be clearer. From the previous version I realized I
didn't take into account the pre-existing "Die if the pack contains broken
objects" block so I put it at the beginning. But now I think you're right in
that the "Unlike..." comes too late.

>
> Thanks.

Copy link

On the Git mailing list, Jonathan Tan wrote (reply to this):

Junio C Hamano <gitster@pobox.com> writes:
> >      ++--fsck-objects[=<msg-id>=<severity>...]::
> >      ++	Die if the pack contains broken objects. If the pack contains a tree
> >      ++	pointing to a .gitmodules blob that does not exist, prints the hash of
> >      ++	that blob (for the caller to check) after the hash that goes into the
> >      ++	name of the pack/idx file (see "Notes").
> 
> Not a new problem bit I have to wonder what happens if the pack
> contains many trees that point at different blobs for ".gitmodules"
> path and many of these blobs are not included in the packfile?  Will
> the caller receive all of these blob object names so that they can
> be verified?  The reference to the "Notes" only refer to the fact
> that usually a single hash value that is used in constructing the
> name of the packfile "pack-<Hashvalue>.pack" is emitted to the
> standard output, which is not wrong per se, but does not help
> readers very much wrt to understanding this.
> 
> [jc: dragging JTan into the thread, as this comes from his 5476e1ef
> (fetch-pack: print and use dangling .gitmodules, 2021-02-22)].

Ah...I can see how that documentation isn't clear. The intention of that
commit is to check every link to a .gitmodules blob. The tests perhaps
should have been written with 2 .gitmodules blobs (in separate commits),
but I think the production code works: I tried changing the test to have
2 commits each with their own .gitmodules blob, and error messages were
printed for both blobs.

(If someone changes that test, e.g. to have 2 blobs, the ">h" in the
"configure_exclusion" invocations look superfluous and is perhaps a
copy-and-paste error from other tests that needed the hash later.)

Copy link

User Jonathan Tan <jonathantanmy@google.com> has been added to the cc: list.

Copy link

This patch series was integrated into seen via 9d836bb.

Copy link

There was a status update in the "New Topics" section about the branch jc/index-pack-fsck-levels on the Git mailing list:

The "--fsck-objects" option of "git index-pack" now can take the
optional parameter to tweak severity of different fsck errors.

Expecting a reroll.
cf. <BF772E83-2BFE-4652-A742-67FADF3D8FE2@gmail.com>
source: <pull.1658.v3.git.git.1706302749.gitgitgadget@gmail.com>

@john-cai
Copy link
Contributor Author

john-cai commented Feb 1, 2024

/submit

Copy link

Submitted as pull.1658.v4.git.git.1706751483.gitgitgadget@gmail.com

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git/ pr-git-1658/john-cai/jc/index-pack-fsck-honor-checks-v4

To fetch this version to local tag pr-git-1658/john-cai/jc/index-pack-fsck-honor-checks-v4:

git fetch --no-tags https://github.com/gitgitgadget/git/ tag pr-git-1658/john-cai/jc/index-pack-fsck-honor-checks-v4

Copy link

On the Git mailing list, John Cai wrote (reply to this):

Hi Jonathan,

On 31 Jan 2024, at 17:30, Jonathan Tan wrote:

> John Cai <johncai86@gmail.com> writes:
>> Hi Jonathan,
>>
>> On 26 Jan 2024, at 17:13, Jonathan Tan wrote:
>>
>>> Junio C Hamano <gitster@pobox.com> writes:
>>>>>      ++--fsck-objects[=<msg-id>=<severity>...]::
>>>>>      ++	Die if the pack contains broken objects. If the pack contains a tree
>>>>>      ++	pointing to a .gitmodules blob that does not exist, prints the hash of
>>>>>      ++	that blob (for the caller to check) after the hash that goes into the
>>>>>      ++	name of the pack/idx file (see "Notes").
>
>> Thanks for clarifying! Would you mind providing a patch to revise the wording
>> here to make it clearer? I would try but I feel like I might get the wording
>> wrong.
>
> I think the wording there is already mostly correct, except maybe make
> everything plural (a tree -> trees, a .gitmodules blob -> .gitmodules
> blobs, hash of that blob -> hashes of those blobs). We might also need
> to modify a test to show that the current code indeed handles the plural
> situation correctly. I don't have time right now to get to this, so
> hopefully someone could pick this up.

Thanks! It sounds like we may want to tackle this as part of another patch.

John

Copy link

On the Git mailing list, Junio C Hamano wrote (reply to this):

John Cai <johncai86@gmail.com> writes:

>>> Thanks for clarifying! Would you mind providing a patch to revise the wording
>>> here to make it clearer? I would try but I feel like I might get the wording
>>> wrong.
>>
>> I think the wording there is already mostly correct, except maybe make
>> everything plural (a tree -> trees, a .gitmodules blob -> .gitmodules
>> blobs, hash of that blob -> hashes of those blobs). We might also need
>> to modify a test to show that the current code indeed handles the plural
>> situation correctly. I don't have time right now to get to this, so
>> hopefully someone could pick this up.
>
> Thanks! It sounds like we may want to tackle this as part of another patch.

Yeah, the existing documentation has been with our users for some
time, and it is not ultra urgent to fix it in that sense.  I'd say
that it can even wait until JTan gets bored with what he's doing and
needs some distraction himself ;-) 

As long as our collective mind remembers it as #leftoverbits it
would be sufficient.

Thanks, both.

Copy link

This patch series was integrated into seen via 5817b63.

Copy link

On the Git mailing list, Christian Couder wrote (reply to this):

(John, sorry for having already sent this only to you.)

On Thu, Feb 1, 2024 at 2:47 AM John Cai via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> git-index-pack has a --strict mode that can take an optional argument to
> provide a list of fsck issues to change their severity. --fsck-objects does
> not have such a utility, which would be useful if one would like to be more
> lenient or strict on data integrity in a repository.
>
> Like --strict, Allow --fsck-objects to also take a list of fsck msgs to
> change the severity.
>
> Changes since V3:
>
>  * clarification of --fsck-objects documentation wording
>
> Changes since V2:
>
>  * fixed some typos in the documentation
>  * added commit trailers
>
> Change since V1:
>
>  * edited commit messages
>  * clarified formatting in documentation for --strict= and --fsck-objects=
>
> John Cai (2):
>   index-pack: test and document --strict=<msg-id>=<severity>...
>   index-pack: --fsck-objects to take an optional argument for fsck msgs

I reviewed internally on GitLab the initial versions of this series
and I reviewed this version 4. It looks great to me, so I am happy to
give my "Reviewed-by:".

Thanks!

Copy link

User Christian Couder <christian.couder@gmail.com> has been added to the cc: list.

Copy link

This patch series was integrated into seen via 0e4ef26.

Copy link

This patch series was integrated into seen via 4d26bd8.

Copy link

This patch series was integrated into next via 0e4ef26.

@gitgitgadget-git gitgitgadget-git bot added the next label Feb 2, 2024
Copy link

There was a status update in the "Cooking" section about the branch jc/index-pack-fsck-levels on the Git mailing list:

The "--fsck-objects" option of "git index-pack" now can take the
optional parameter to tweak severity of different fsck errors.

Will merge to 'master'.
source: <pull.1658.v4.git.git.1706751483.gitgitgadget@gmail.com>

Copy link

This patch series was integrated into seen via 0e4ef26.

Copy link

This patch series was integrated into seen via 945ce3c.

Copy link

There was a status update in the "Cooking" section about the branch jc/index-pack-fsck-levels on the Git mailing list:

The "--fsck-objects" option of "git index-pack" now can take the
optional parameter to tweak severity of different fsck errors.

Will merge to 'master'.
source: <pull.1658.v4.git.git.1706751483.gitgitgadget@gmail.com>

Copy link

This patch series was integrated into seen via 39515a5.

Copy link

This patch series was integrated into seen via b6d75b6.

Copy link

There was a status update in the "Cooking" section about the branch jc/index-pack-fsck-levels on the Git mailing list:

The "--fsck-objects" option of "git index-pack" now can take the
optional parameter to tweak severity of different fsck errors.

Will merge to 'master'.
source: <pull.1658.v4.git.git.1706751483.gitgitgadget@gmail.com>

Copy link

This patch series was integrated into seen via 41c4570.

Copy link

This patch series was integrated into seen via cabe1ba.

Copy link

This patch series was integrated into seen via 2c015ba.

Copy link

This patch series was integrated into seen via fc320f5.

Copy link

This patch series was integrated into seen via 2c90347.

Copy link

This patch series was integrated into master via 2c90347.

Copy link

This patch series was integrated into next via 2c90347.

Copy link

Closed via 2c90347.

@@ -91,13 +96,18 @@ default and "Indexing objects" when `--stdin` is specified.
--check-self-contained-and-connected::

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, SZEDER Gábor wrote (reply to this):

On Thu, Feb 01, 2024 at 01:38:02AM +0000, John Cai via GitGitGadget wrote:
> diff --git a/t/t5300-pack-object.sh b/t/t5300-pack-object.sh
> index 496fffa0f8a..a58f91035d1 100755
> --- a/t/t5300-pack-object.sh
> +++ b/t/t5300-pack-object.sh
> @@ -441,8 +441,7 @@ test_expect_success 'index-pack with --strict' '
>  	)
>  '
>  
> -test_expect_success 'index-pack with --strict downgrading fsck msgs' '
> -	test_when_finished rm -rf strict &&
> +test_expect_success 'setup for --strict and --fsck-objects downgrading fsck msgs' '
>  	git init strict &&
>  	(
>  		cd strict &&
> @@ -457,12 +456,32 @@ test_expect_success 'index-pack with --strict downgrading fsck msgs' '
>  
>  		EOF
>  		git hash-object --literally -t commit -w --stdin <commit >commit_list &&
> -		PACK=$(git pack-objects test <commit_list) &&
> -		test_must_fail git index-pack --strict "test-$PACK.pack" &&
> -		git index-pack --strict="missingEmail=ignore" "test-$PACK.pack"
> +		git pack-objects test <commit_list >pack-name
>  	)
>  '
>  
> +test_with_bad_commit () {
> +	must_fail_arg="$1" &&
> +	must_pass_arg="$2" &&
> +	(
> +		cd strict &&
> +		test_expect_fail git index-pack "$must_fail_arg" "test-$(cat pack-name).pack"

There is no such command as 'test_expect_fail', resulting in:

  expecting success of 5300.34 'index-pack with --strict downgrading fsck msgs':
          test_with_bad_commit --strict --strict="missingEmail=ignore"

  + test_with_bad_commit --strict --strict=missingEmail=ignore
  + must_fail_arg=--strict
  + must_pass_arg=--strict=missingEmail=ignore
  + cd strict
  + cat pack-name
  + test_expect_fail git index-pack --strict test-e4e1649155bf444fbd9cd85e376628d6eaf3d3bd.pack
  ./t5300-pack-object.sh: 468: eval: test_expect_fail: not found
  + cat pack-name
  + git index-pack --strict=missingEmail=ignore test-e4e1649155bf444fbd9cd85e376628d6eaf3d3bd.pack
  e4e1649155bf444fbd9cd85e376628d6eaf3d3bd

  ok 34 - index-pack with --strict downgrading fsck msgs

The missing command should fail the test, but it doesn't, because the
&&-chain is broken on this line as well.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, John Cai wrote (reply to this):

Hi Szeder,

On 8 Mar 2024, at 17:24, SZEDER Gábor wrote:

> On Thu, Feb 01, 2024 at 01:38:02AM +0000, John Cai via GitGitGadget wrote:
>> diff --git a/t/t5300-pack-object.sh b/t/t5300-pack-object.sh
>> index 496fffa0f8a..a58f91035d1 100755
>> --- a/t/t5300-pack-object.sh
>> +++ b/t/t5300-pack-object.sh
>> @@ -441,8 +441,7 @@ test_expect_success 'index-pack with --strict' '
>>  	)
>>  '
>>
>> -test_expect_success 'index-pack with --strict downgrading fsck msgs' '
>> -	test_when_finished rm -rf strict &&
>> +test_expect_success 'setup for --strict and --fsck-objects downgrading fsck msgs' '
>>  	git init strict &&
>>  	(
>>  		cd strict &&
>> @@ -457,12 +456,32 @@ test_expect_success 'index-pack with --strict downgrading fsck msgs' '
>>
>>  		EOF
>>  		git hash-object --literally -t commit -w --stdin <commit >commit_list &&
>> -		PACK=$(git pack-objects test <commit_list) &&
>> -		test_must_fail git index-pack --strict "test-$PACK.pack" &&
>> -		git index-pack --strict="missingEmail=ignore" "test-$PACK.pack"
>> +		git pack-objects test <commit_list >pack-name
>>  	)
>>  '
>>
>> +test_with_bad_commit () {
>> +	must_fail_arg="$1" &&
>> +	must_pass_arg="$2" &&
>> +	(
>> +		cd strict &&
>> +		test_expect_fail git index-pack "$must_fail_arg" "test-$(cat pack-name).pack"
>
> There is no such command as 'test_expect_fail', resulting in:

Indeed, thanks for catching this.

>
>   expecting success of 5300.34 'index-pack with --strict downgrading fsck msgs':
>           test_with_bad_commit --strict --strict="missingEmail=ignore"
>
>   + test_with_bad_commit --strict --strict=missingEmail=ignore
>   + must_fail_arg=--strict
>   + must_pass_arg=--strict=missingEmail=ignore
>   + cd strict
>   + cat pack-name
>   + test_expect_fail git index-pack --strict test-e4e1649155bf444fbd9cd85e376628d6eaf3d3bd.pack
>   ./t5300-pack-object.sh: 468: eval: test_expect_fail: not found
>   + cat pack-name
>   + git index-pack --strict=missingEmail=ignore test-e4e1649155bf444fbd9cd85e376628d6eaf3d3bd.pack
>   e4e1649155bf444fbd9cd85e376628d6eaf3d3bd
>
>   ok 34 - index-pack with --strict downgrading fsck msgs
>
> The missing command should fail the test, but it doesn't, because the
> &&-chain is broken on this line as well.

yes will need to fix this as well

thanks
John

Copy link

User SZEDER Gábor <szeder.dev@gmail.com> has been added to the cc: list.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
1 participant