Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generalize 'scalar diagnose' into 'git diagnose' and 'git bugreport --diagnose' #1310

Closed
wants to merge 11 commits into from
Closed

Generalize 'scalar diagnose' into 'git diagnose' and 'git bugreport --diagnose' #1310

wants to merge 11 commits into from

Conversation

vdye
Copy link

@vdye vdye commented Aug 1, 2022

As part of the preparation for moving Scalar out of 'contrib/' and into Git, this series moves the functionality of 'scalar diagnose' into a new builtin ('git diagnose') and a new option ('--diagnose') for 'git bugreport'. This change further aligns Scalar with the objective [1] of having it only contain functionality and settings that benefit large Git repositories, but not all repositories. The diagnostics reported by 'scalar diagnose' relevant for investigating issues in any Git repository, so generating them should be part of a "normal" Git builtin.

The series is organized as follows:

  • Miscellaneous fixes for the existing 'scalar diagnose' implementation
  • Moving the code for generating diagnostics into a common location in the Git tree
  • Implementing 'git diagnose'
  • Implementing 'git bugreport --diagnose'
  • Updating the Scalar roadmap

Finally, despite 'scalar diagnose' now being nothing more than a wrapper for 'git bugreport --diagnose', it is not being deprecated in this series. Although deprecation -> removal could be a future cleanup effort, 'scalar diagnose' is kept around for now as an alias for users already accustomed to using it in 'scalar'.

Changes since V3

  • Moved 'parse-options.h' import in 'diagnose.h' into the commit where it's first needed.
  • Removed a redundant 'if (!res)' condition gating the final "Diagnostics complete" message in 'create_diagnostics_archive()'.
  • Improved error reporting in 'create_diagnostics_archive()'. I was originally going to modify the "failed to write archive" error to trigger whenever 'create_diagnostics_archive()' returned a nonzero value [2]. However, while working on it I realized the message would no longer be tied to a failure of 'write_archive()', making it less helpful in pinpointing an issue. To address the original issue ('add_directory_to_archiver()' silently failing in 'create_diagnostics_archive()'), I instead refactored those calls into a loop and added the error message. Now, there's exactly one error message printed for each possible early exit scenario from 'create_diagnostics_archive()', hopefully avoiding both redundancy & under-reporting.

Changes since V2

  • Replaced 'int include_everything' arg to 'create_diagnostic_archive()' with 'enum diagnose_mode mode'.
  • Replaced '--all' with configurable '--mode' option in 'git diagnose'; moved 'option_parse_diagnose()' into 'diagnose.c' so that it can be used for both 'git bugreport --diagnose' and 'git diagnose --mode'.
  • Split "builtin/diagnose.c: gate certain data behind '--all'" (formerly patch 7/10) into "diagnose.c: add option to configure archive contents" (patch 6/11) and "builtin/diagnose.c: add '--mode' option" (patch 8/11).
  • Added '--no-diagnose' for 'git bugreport'. I was initially going to use '--diagnose=none', but '--no-diagnose' was easier to configure when using the shared 'option_parse_diagnose()' function .
  • Updated usage strings, option descriptions, and documentation files for 'mode' option. To avoid needing to keep multiple lists of valid 'mode' values up-to-date, format mode value as <mode> everywhere except option description in 'git-diagnose.txt', where the values are listed. The documentation of '--diagnose' in 'git-bugreport.txt' links to 'git-diagnose.txt' and explicitly calls out that details on 'mode' can be found there.
  • Reworded 'git diagnose' and 'git bugreport' command & option documentation.
  • Added additional checks to 't0091-bugreport.sh' and 't0092-diagnose.sh' tests
  • Moved '#include "cache.h" from 'diagnose.h' to 'diagnose.c'.
  • Fixed '--output-directory' usage string in 'builtin/diagnose.c'.
  • Replaced 'die()' with 'die_errno()' in error triggered when leading directories of archive cannot be created.
  • Changed hardcoded '-1' error exit code in 'scalar diagnose' to returning the exit code from 'git diagnose --mode=all'.

Changes since V1

  • Reorganized patches to fix minor issues (e.g., more gently adding directories to the archive) of 'scalar diagnose' in 'scalar.c', before the code is moved out of that file.
  • (Almost) entirely redesigned the UI for generating diagnostics. The new approach avoids cluttering 'git bugreport' with a mode that doesn't actually generate a report. Now, there are distinct options for different use cases: generating extra diagnostics with a bug report ('git bugreport --diagnose') and generating diagnostics for personal debugging/addition to an existing bug report ('git diagnose').
  • Moved 'get_disk_info()' into 'compat/'.
  • Moved 'create_diagnostics_archive()' into a new 'diagnose.c', as it now has multiple callers.
  • Updated command & option documentation to more clearly guide users on how to use the new options.
  • Added the '--all' (and '--diagnose=all') option to change the default behavior of diagnostics generation to exclude '.git' directory contents. For many bug reporters, this would reveal private repository contents they don't want to expose to the public mailing list. This has the added benefit of creating much smaller archives by default, which will be more likely to successfully send to the mailing list.

Thanks!

  • Victoria

[1] https://lore.kernel.org/git/pull.1275.v2.git.1657584367.gitgitgadget@gmail.com/
[2] https://lore.kernel.org/git/32f2cadc-556e-1cd5-a238-c8f1cdaaf470@github.com/

CC: derrickstolee@github.com
CC: johannes.schindelin@gmx.de
cc: Ævar Arnfjörð Bjarmason avarab@gmail.com

@vdye vdye self-assigned this Aug 1, 2022
@vdye
Copy link
Author

vdye commented Aug 1, 2022

/submit

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 1, 2022

Submitted as pull.1310.git.1659388498.gitgitgadget@gmail.com

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git/ pr-1310/vdye/scalar/generalize-diagnose-v1

To fetch this version to local tag pr-1310/vdye/scalar/generalize-diagnose-v1:

git fetch --no-tags https://github.com/gitgitgadget/git/ tag pr-1310/vdye/scalar/generalize-diagnose-v1

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 1, 2022

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Victoria Dye via GitGitGadget" <gitgitgadget@gmail.com> writes:

> As part of the preparation for moving Scalar out of 'contrib/' and into Git,
> this series moves the functionality of 'scalar diagnose' into a new option
> ('--diagnose') for 'git bugreport'. This change further aligns Scalar with
> the objective [1] of having it only contain functionality and settings that
> benefit large Git repositories, but not all repositories. The diagnostics
> reported by 'scalar diagnose' relevant for investigating issues in any Git
> repository, so generating them should be part of a "normal" Git builtin.

;-)

@@ -109,14 +109,14 @@ test_expect_success UNZIP 'scalar diagnose' '
sed -n "s/.*$SQ\\(.*\\.zip\\)$SQ.*/\\1/p" <err >zip_path &&
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Victoria Dye via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Victoria Dye <vdye@github.com>
>
> Use the "$GIT_UNZIP" test variable rather than verbatim 'unzip' to unzip the
> 'scalar diagnose' archive. Using "$GIT_UNZIP" is needed to run the Scalar
> tests on systems where 'unzip' is not in the system path.

Makes sense.  It makes it more in line with how a handful of tests
in t/ already use the zip archive.

@@ -8,7 +8,7 @@ git-bugreport - Collect information for user to file a bug report
SYNOPSIS
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Victoria Dye via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Victoria Dye <vdye@github.com>
>
> Create a '--diagnose' option for 'git bugreport' to collect additional
> information about the repository and write it to a zipped archive.
>
> The "diagnose" functionality was originally implemented for Scalar in
> aa5c79a331 (scalar: implement `scalar diagnose`, 2022-05-28). However, the
> diagnostics gathered are not specific to Scalar-cloned repositories and
> could be useful when diagnosing issues in any Git repository.
>
> Note that, while this patch appears large, it is mostly copied directly out
> of 'scalar.c'. Specifically, the functions
>
> - dir_file_stats_objects()
> - dir_file_stats()
> - count_files()
> - loose_objs_stats()
> - add_directory_to_archiver()
> - get_disk_info()

Yup.  As this does not "move" code across from older place to the
new home, it takes a bit of processing to verify the above claim,
but

 $ git blame -C -C -C -s -b master.. -- builtin/bugreport.c

shows that these are largely verbatim copies.

> +#ifndef WIN32
> +#include <sys/statvfs.h>
> +#endif
> +
> +static int get_disk_info(struct strbuf *out)
> +{
> +#ifdef WIN32
> +	struct strbuf buf = STRBUF_INIT;
> +...
> +	strbuf_addf(out, "Available space on '%s': ", buf.buf);
> +	strbuf_humanise_bytes(out, avail2caller.QuadPart);
> +...
> +#else
> +...
> +	strbuf_addf(out, "Available space on '%s': ", buf.buf);
> +	strbuf_humanise_bytes(out, st_mult(stat.f_bsize, stat.f_bavail));
> +...
> +#endif
> +	return 0;
> +}

As a proper part of Git, this part should probably be factored out
so that a platform specific helper function, implemented in compat/
layer, grabs "available disk space" number in off_t and the caller
of the above function becomes

	strbuf_realpath(&dir, ".", 1);
	strbuf_addf(out, "Available space on '%s:' ", dir.buf);
	strbuf_humanise_bytes(out, get_disk_size(dir.buf));

or something, without having to have #ifdef droppings.

> +static int create_diagnostics_archive(struct strbuf *zip_path)
> +{

Large part of this function is also lifted from scalar, and it looks
OK.  One thing I noticed is that "res" is explicitly initialized to
0, but given that the way the code is structured to use the "we
process sequencially in successful case, and branch out by 'goto'
immediately when we see a breakage" pattern, it may be better to
initialize it to -1 (i.e. assume error), or even better, leave it
uninitialized (i.e. let the compiler notice if a jump to cleanup is
made without setting res appropriately).

> +diagnose_cleanup:
> +	if (archiver_fd >= 0) {
> +		close(1);
> +		dup2(stdout_fd, 1);
> +	}
> +	free(argv_copy);
> +	strvec_clear(&archiver_args);
> +	strbuf_release(&buf);

Hmph, stdout_fd is a copy of the file descriptor 1 that was saved
away at the beginning.  Then archiver_fd was created to write into
the zip archive, and during the bulk of the function it was dup2'ed
to the file descriptor 1, to make anything written to the latter
appear in the zip output.

When we successfully opened archive_fd but failed to dup2(), we may
close a wrong file desciptor 1 here, but we recover from that by
using the saved-away stdout_fd, so we'd be OK.  If we did dup2(),
then we would be OK, too.

I am wondering if archiver_fd itself is leaking here, though.

Also, if we failed to open archiver_fd, then we have stdout_fd
leaking here, I suspect.

> +	return res;
> +}

Other than that, looks good to me.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Victoria Dye wrote (reply to this):

Junio C Hamano wrote:
> "Victoria Dye via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> From: Victoria Dye <vdye@github.com>
>>
>> Create a '--diagnose' option for 'git bugreport' to collect additional
>> information about the repository and write it to a zipped archive.
>>
>> The "diagnose" functionality was originally implemented for Scalar in
>> aa5c79a331 (scalar: implement `scalar diagnose`, 2022-05-28). However, the
>> diagnostics gathered are not specific to Scalar-cloned repositories and
>> could be useful when diagnosing issues in any Git repository.
>>
>> Note that, while this patch appears large, it is mostly copied directly out
>> of 'scalar.c'. Specifically, the functions
>>
>> - dir_file_stats_objects()
>> - dir_file_stats()
>> - count_files()
>> - loose_objs_stats()
>> - add_directory_to_archiver()
>> - get_disk_info()
> 
> Yup.  As this does not "move" code across from older place to the
> new home, it takes a bit of processing to verify the above claim,
> but
> 
>  $ git blame -C -C -C -s -b master.. -- builtin/bugreport.c
> 
> shows that these are largely verbatim copies.
> 
>> +#ifndef WIN32
>> +#include <sys/statvfs.h>
>> +#endif
>> +
>> +static int get_disk_info(struct strbuf *out)
>> +{
>> +#ifdef WIN32
>> +	struct strbuf buf = STRBUF_INIT;
>> +...
>> +	strbuf_addf(out, "Available space on '%s': ", buf.buf);
>> +	strbuf_humanise_bytes(out, avail2caller.QuadPart);
>> +...
>> +#else
>> +...
>> +	strbuf_addf(out, "Available space on '%s': ", buf.buf);
>> +	strbuf_humanise_bytes(out, st_mult(stat.f_bsize, stat.f_bavail));
>> +...
>> +#endif
>> +	return 0;
>> +}
> 
> As a proper part of Git, this part should probably be factored out
> so that a platform specific helper function, implemented in compat/
> layer, grabs "available disk space" number in off_t and the caller
> of the above function becomes
> 
> 	strbuf_realpath(&dir, ".", 1);
> 	strbuf_addf(out, "Available space on '%s:' ", dir.buf);
> 	strbuf_humanise_bytes(out, get_disk_size(dir.buf));
> 
> or something, without having to have #ifdef droppings.
> 

This makes sense, I'll probably follow an approach similar to what was done
with 'compat/compiler.h' in [1] (unless adding to 'git-compat-util.h' would
be more appropriate?).

[1] https://lore.kernel.org/git/20200416211807.60811-6-emilyshaffer@google.com/

>> +static int create_diagnostics_archive(struct strbuf *zip_path)
>> +{
> 
> Large part of this function is also lifted from scalar, and it looks
> OK.  One thing I noticed is that "res" is explicitly initialized to
> 0, but given that the way the code is structured to use the "we
> process sequencially in successful case, and branch out by 'goto'
> immediately when we see a breakage" pattern, it may be better to
> initialize it to -1 (i.e. assume error), or even better, leave it
> uninitialized (i.e. let the compiler notice if a jump to cleanup is
> made without setting res appropriately).
> 

I'll go with the "uninitialized" approach in the re-roll; I like the
simplicity of relying on the compiler to determine if it's unassigned.

>> +diagnose_cleanup:
>> +	if (archiver_fd >= 0) {
>> +		close(1);
>> +		dup2(stdout_fd, 1);
>> +	}
>> +	free(argv_copy);
>> +	strvec_clear(&archiver_args);
>> +	strbuf_release(&buf);
> 
> Hmph, stdout_fd is a copy of the file descriptor 1 that was saved
> away at the beginning.  Then archiver_fd was created to write into
> the zip archive, and during the bulk of the function it was dup2'ed
> to the file descriptor 1, to make anything written to the latter
> appear in the zip output.
> 
> When we successfully opened archive_fd but failed to dup2(), we may
> close a wrong file desciptor 1 here, but we recover from that by
> using the saved-away stdout_fd, so we'd be OK.  If we did dup2(),
> then we would be OK, too.
> 
> I am wondering if archiver_fd itself is leaking here, though.
> 
> Also, if we failed to open archiver_fd, then we have stdout_fd
> leaking here, I suspect.
> 

If I'm not mistaken, both 'archiver_fd' and 'stdout_fd' are always leaked if
they're successfully created (they're never 'close()'d). There's also an
unnecessary check for 'archiver_fd < 0', since 'xopen()' will die if it
can't open the file. And, as you mentioned, the wrong file descriptor 1 is
closed if the 'dup2()' of 'archiver_fd' fails.

I'll clean this up for V2, thanks.

>> +	return res;
>> +}
> 
> Other than that, looks good to me.

@@ -5,6 +5,10 @@
#include "compat/compiler.h"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Victoria Dye via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Victoria Dye <vdye@github.com>
>
> Avoid size_t overflow when reporting the available disk space in
> 'get_disk_info' by casting the block size and available block count to
> 'uint64_t' before multiplying them. Without this change, 'st_mult' would
> (correctly) report size_t overflow on 32-bit systems at or exceeding 2^32
> bytes of available space.

Sane.  But shouldn't the cast be to off_t, which is what
strbuf_humanise_bytes() takes anyway?

>
> Signed-off-by: Victoria Dye <vdye@github.com>
> ---
>  builtin/bugreport.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/builtin/bugreport.c b/builtin/bugreport.c
> index 35b1fc48bf1..720889a37ad 100644
> --- a/builtin/bugreport.c
> +++ b/builtin/bugreport.c
> @@ -258,7 +258,7 @@ static int get_disk_info(struct strbuf *out)
>  	}
>  
>  	strbuf_addf(out, "Available space on '%s': ", buf.buf);
> -	strbuf_humanise_bytes(out, st_mult(stat.f_bsize, stat.f_bavail));
> +	strbuf_humanise_bytes(out, (uint64_t)stat.f_bsize * (uint64_t)stat.f_bavail);
>  	strbuf_addf(out, " (mount flags 0x%lx)\n", stat.f_flag);
>  	strbuf_release(&buf);
>  #endif

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Victoria Dye wrote (reply to this):

Junio C Hamano wrote:
> "Victoria Dye via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> From: Victoria Dye <vdye@github.com>
>>
>> Avoid size_t overflow when reporting the available disk space in
>> 'get_disk_info' by casting the block size and available block count to
>> 'uint64_t' before multiplying them. Without this change, 'st_mult' would
>> (correctly) report size_t overflow on 32-bit systems at or exceeding 2^32
>> bytes of available space.
> 
> Sane.  But shouldn't the cast be to off_t, which is what
> strbuf_humanise_bytes() takes anyway?
> 

I chose 'uint64_t' to mimic 'throughput_string()' [1], but the signed
'off_t' is a better choice given its use in 'strbuf_humanise_bytes()'.

On a related note, while writing this I made the (unsubstantiated)
assumption that 'off_t' would be a 64-bit int, even on 32-bit systems. Your
comment prompted me to confirm that assumption; while 'off_t' is not always
guaranteed to be an int64 by default [2], Git is compiled with '#define
_FILE_OFFSET_BITS 64' [3] so 'off_t' is equivalent to 'off64_t'.

I'll update the casts to 'off_t' in V2. Thanks!

[1] https://lore.kernel.org/git/20171110173956.25105-4-newren@gmail.com/
[2] https://www.gnu.org/software/libc/manual/html_mono/libc.html#index-off_005ft
[3] https://lore.kernel.org/git/7vr6smc1de.fsf@assigned-by-dhcp.cox.net/

>>
>> Signed-off-by: Victoria Dye <vdye@github.com>
>> ---
>>  builtin/bugreport.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/builtin/bugreport.c b/builtin/bugreport.c
>> index 35b1fc48bf1..720889a37ad 100644
>> --- a/builtin/bugreport.c
>> +++ b/builtin/bugreport.c
>> @@ -258,7 +258,7 @@ static int get_disk_info(struct strbuf *out)
>>  	}
>>  
>>  	strbuf_addf(out, "Available space on '%s': ", buf.buf);
>> -	strbuf_humanise_bytes(out, st_mult(stat.f_bsize, stat.f_bavail));
>> +	strbuf_humanise_bytes(out, (uint64_t)stat.f_bsize * (uint64_t)stat.f_bavail);
>>  	strbuf_addf(out, " (mount flags 0x%lx)\n", stat.f_flag);
>>  	strbuf_release(&buf);
>>  #endif

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

Victoria Dye <vdye@github.com> writes:

> I chose 'uint64_t' to mimic 'throughput_string()' [1], but the signed
> 'off_t' is a better choice given its use in 'strbuf_humanise_bytes()'.
>
> On a related note, while writing this I made the (unsubstantiated)
> assumption that 'off_t' would be a 64-bit int, even on 32-bit systems. Your
> comment prompted me to confirm that assumption; while 'off_t' is not always
> guaranteed to be an int64 by default [2], Git is compiled with '#define
> _FILE_OFFSET_BITS 64' [3] so 'off_t' is equivalent to 'off64_t'.

Offset into a single file can be smaller than the size of the whole
disk, after all, so from that point of view, use of off_t in
"humanise_bytes" API may be something we would want to fix
eventually to reduce confusion.

I do not particularly mind casting up to uint64, especially if that
matches the code lifted from scalar.  As long as our longer term
plan is to update strbuf_humanise_bytes() to take the widest
possible unsigned integer, what we do right now would not make that
much of a difference.

Thanks.

@@ -5,6 +5,10 @@
#include "compat/compiler.h"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Victoria Dye via GitGitGadget" <gitgitgadget@gmail.com> writes:

>  	int at_root = !*path;
> -	DIR *dir = opendir(at_root ? "." : path);
> +	DIR *dir;
>  	struct dirent *e;
>  	struct strbuf buf = STRBUF_INIT;
>  	size_t len;
>  	int res = 0;
>  
> +	if (!file_exists(at_root ? "." : path)) {
> +		warning(_("directory '%s' does not exist, will not be archived"), path);
> +		return 0;
> +	}
> +
> +	dir = opendir(at_root ? "." : path);
>  	if (!dir)
>  		return error_errno(_("could not open directory '%s'"), path);

I am not sure if TOCTTOU is how we want to be more gentle.  Do we
rather want to do something like this

	dir = opendir(...);
	if (!dir) {
		if (errno == ENOENT) {
			warning(_("not archiving missing directory '%s'", path);
		        return 0;
		}
                return error_errno(_("cannot open directory '%s'"), path);
	}

or am I missing something subtle?

Thanks.

> diff --git a/t/t0091-bugreport.sh b/t/t0091-bugreport.sh
> index 3cf983aa67f..e9db89ef2c8 100755
> --- a/t/t0091-bugreport.sh
> +++ b/t/t0091-bugreport.sh
> @@ -78,7 +78,7 @@ test_expect_success 'indicates populated hooks' '
>  	test_cmp expect actual
>  '
>  
> -test_expect_failure UNZIP '--diagnose creates diagnostics zip archive' '
> +test_expect_success UNZIP '--diagnose creates diagnostics zip archive' '
>  	test_when_finished rm -rf report &&
>  
>  	git bugreport --diagnose -o report -s test >out &&
> @@ -98,4 +98,13 @@ test_expect_failure UNZIP '--diagnose creates diagnostics zip archive' '
>  	grep "^Total: [0-9][0-9]*" out
>  '
>  
> +test_expect_success '--diagnose warns when archived dir does not exist' '
> +	test_when_finished rm -rf report &&
> +
> +	# Remove logs - not guaranteed to exist
> +	rm -rf .git/logs &&
> +	git bugreport --diagnose -o report -s test 2>err &&
> +	grep "directory .\.git/logs. does not exist, will not be archived" err
> +'
> +
>  test_done

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Victoria Dye wrote (reply to this):

Junio C Hamano wrote:
> "Victoria Dye via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>>  	int at_root = !*path;
>> -	DIR *dir = opendir(at_root ? "." : path);
>> +	DIR *dir;
>>  	struct dirent *e;
>>  	struct strbuf buf = STRBUF_INIT;
>>  	size_t len;
>>  	int res = 0;
>>  
>> +	if (!file_exists(at_root ? "." : path)) {
>> +		warning(_("directory '%s' does not exist, will not be archived"), path);
>> +		return 0;
>> +	}
>> +
>> +	dir = opendir(at_root ? "." : path);
>>  	if (!dir)
>>  		return error_errno(_("could not open directory '%s'"), path);
> 
> I am not sure if TOCTTOU is how we want to be more gentle.  Do we
> rather want to do something like this
> 
> 	dir = opendir(...);
> 	if (!dir) {
> 		if (errno == ENOENT) {
> 			warning(_("not archiving missing directory '%s'", path);
> 		        return 0;
> 		}
>                 return error_errno(_("cannot open directory '%s'"), path);
> 	}
> 
> or am I missing something subtle?
> 

The "gentleness" was meant to be a reference only to the error -> warning
change, the TOCTTOU change was just a miss by me. I'll fix it in the next
version, thanks!

> Thanks.
> 
>> diff --git a/t/t0091-bugreport.sh b/t/t0091-bugreport.sh
>> index 3cf983aa67f..e9db89ef2c8 100755
>> --- a/t/t0091-bugreport.sh
>> +++ b/t/t0091-bugreport.sh
>> @@ -78,7 +78,7 @@ test_expect_success 'indicates populated hooks' '
>>  	test_cmp expect actual
>>  '
>>  
>> -test_expect_failure UNZIP '--diagnose creates diagnostics zip archive' '
>> +test_expect_success UNZIP '--diagnose creates diagnostics zip archive' '
>>  	test_when_finished rm -rf report &&
>>  
>>  	git bugreport --diagnose -o report -s test >out &&
>> @@ -98,4 +98,13 @@ test_expect_failure UNZIP '--diagnose creates diagnostics zip archive' '
>>  	grep "^Total: [0-9][0-9]*" out
>>  '
>>  
>> +test_expect_success '--diagnose warns when archived dir does not exist' '
>> +	test_when_finished rm -rf report &&
>> +
>> +	# Remove logs - not guaranteed to exist
>> +	rm -rf .git/logs &&
>> +	git bugreport --diagnose -o report -s test 2>err &&
>> +	grep "directory .\.git/logs. does not exist, will not be archived" err
>> +'
>> +
>>  test_done

@@ -8,7 +8,7 @@ git-bugreport - Collect information for user to file a bug report
SYNOPSIS
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Victoria Dye via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Victoria Dye <vdye@github.com>
>
> Add a '--no-report' option to 'git bugreport' to avoid writing the
> 'git-bugreport-<suffix>.txt' file. This gives users the option of creating
> only the diagnostic archive with '--diagnose' and mirroring the behavior of
> the original 'scalar diagnose' as closely as possible.
>
> If a user specifies '--no-report' *without* also specifying '--diagnose',
> the 'git bugreport' operation is a no-op; a warning message is printed and
> the command returns with a non-error exit code.

I think this makes sense from scalar side, and I have no objection
against this "--no-report" feature existing, but I wonder if those
who want to send report may want to have a handy way to tell the
command to "include" the diag archive in their report (instead of
creating separate report and diagnose files, having to attach two
files to their message).  Perhaps that is unneeded, or perhaps that
comes in later patches in the series, I dunno.

> +--no-report::
> +	Do not write out a 'git-bugreport-<suffix>.txt' file. This option is
> +	intended for use with `--diagnose` when only the diagnostic archive is
> +	needed. If `--no-report` is used without `--diagnose`, `git bugreport`
> +	is a no-op.

I wonder if thinking it this way may make the UI simpler to explain.

The "git bugreport" is capable of showing report and diagnose with
these two orthogonal options, i.e.

	--report::	writes bugreport file
	--diagnose::	writes diagnostic archive

And for backward compatibility reasons, the command pretends as if
you gave it "--report" when you run it without either.

That way, "bugreport --diagnose" will just show diagnostic archive
without having to pass "--no-report".  There is no need for "nothing
to do", either.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Victoria Dye wrote (reply to this):

Junio C Hamano wrote:
> "Victoria Dye via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> From: Victoria Dye <vdye@github.com>
>>
>> Add a '--no-report' option to 'git bugreport' to avoid writing the
>> 'git-bugreport-<suffix>.txt' file. This gives users the option of creating
>> only the diagnostic archive with '--diagnose' and mirroring the behavior of
>> the original 'scalar diagnose' as closely as possible.
>>
>> If a user specifies '--no-report' *without* also specifying '--diagnose',
>> the 'git bugreport' operation is a no-op; a warning message is printed and
>> the command returns with a non-error exit code.
> 
> I think this makes sense from scalar side, and I have no objection
> against this "--no-report" feature existing, but I wonder if those
> who want to send report may want to have a handy way to tell the
> command to "include" the diag archive in their report (instead of
> creating separate report and diagnose files, having to attach two
> files to their message).  Perhaps that is unneeded, or perhaps that
> comes in later patches in the series, I dunno.
> 

I tried finding where in the documentation there are instructions on sending
a bug report to the mailing list, but didn't see anything (otherwise, I'd
add some info on '--diagnose' there). Maybe Emily would know?

If instructions like that don't exist, I'll update the command documentation
here to clarify that '--diagnose' generates an attachment that includes more
complete repository information to aid in debugging.

>> +--no-report::
>> +	Do not write out a 'git-bugreport-<suffix>.txt' file. This option is
>> +	intended for use with `--diagnose` when only the diagnostic archive is
>> +	needed. If `--no-report` is used without `--diagnose`, `git bugreport`
>> +	is a no-op.
> 
> I wonder if thinking it this way may make the UI simpler to explain.
> 
> The "git bugreport" is capable of showing report and diagnose with
> these two orthogonal options, i.e.
> 
> 	--report::	writes bugreport file
> 	--diagnose::	writes diagnostic archive
> 
> And for backward compatibility reasons, the command pretends as if
> you gave it "--report" when you run it without either.
> 
> That way, "bugreport --diagnose" will just show diagnostic archive
> without having to pass "--no-report".  There is no need for "nothing
> to do", either.
> 

I like the simplicity of this, but I'd imagine that a user would want to
generate diagnostics *with* a report more often than without one. The cases
I can think of for "standalone diagnostics" are: internally in 'scalar
diagnose', someone requesting more info after an initial bug report, or a
user looking into something on their own.

Maybe I could replace '--no-report' with '--diagnostics-only'? Then the
three modes of use would be:

- 'git bugreport': report only (most common usage)
- 'git bugreport --diagnose': report + diagnostics
- 'git bugreport --diagnostics-only': diagnostics only (least common usage)

It would eliminate the need for "nothing to do" while making it (I think?)
clearer to a user why you would want to use any of these options.

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 1, 2022

This branch is now known as vd/scalar-generalize-diagnose.

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 1, 2022

This patch series was integrated into seen via git@79d3423.

@gitgitgadget gitgitgadget bot added the seen label Aug 1, 2022
@@ -5,6 +5,10 @@
#include "compat/compiler.h"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Ævar Arnfjörð Bjarmason wrote (reply to this):

On Mon, Aug 01 2022, Victoria Dye via GitGitGadget wrote:

> From: Victoria Dye <vdye@github.com>
>
> Avoid size_t overflow when reporting the available disk space in
> 'get_disk_info' by casting the block size and available block count to
> 'uint64_t' before multiplying them. Without this change, 'st_mult' would
> (correctly) report size_t overflow on 32-bit systems at or exceeding 2^32
> bytes of available space.
>
> Signed-off-by: Victoria Dye <vdye@github.com>
> ---
>  builtin/bugreport.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/builtin/bugreport.c b/builtin/bugreport.c
> index 35b1fc48bf1..720889a37ad 100644
> --- a/builtin/bugreport.c
> +++ b/builtin/bugreport.c
> @@ -258,7 +258,7 @@ static int get_disk_info(struct strbuf *out)
>  	}
>  
>  	strbuf_addf(out, "Available space on '%s': ", buf.buf);
> -	strbuf_humanise_bytes(out, st_mult(stat.f_bsize, stat.f_bavail));
> +	strbuf_humanise_bytes(out, (uint64_t)stat.f_bsize * (uint64_t)stat.f_bavail);

Doesn't this remove the overflow guard on 64 bit systems to support
those 32 bit systems?

I also don't tthink it's correct that this would "correctly
report...". Before this we were simply assuming that "size_t" and
"unsigned long" & "fsblkcnt_t" would all yield the same thing.

But I don't think per [1] and [2] that POSIX is giving us any guarantees
in that regard, even on 32 bit systems, but perhaps it's a reasonable
assumption in practice.

1. https://pubs.opengroup.org/onlinepubs/009695399/basedefs/sys/statvfs.h.html
2. https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/sys_types.h.html

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Victoria Dye wrote (reply to this):

Ævar Arnfjörð Bjarmason wrote:
> 
> On Mon, Aug 01 2022, Victoria Dye via GitGitGadget wrote:
> 
>> From: Victoria Dye <vdye@github.com>
>>
>> Avoid size_t overflow when reporting the available disk space in
>> 'get_disk_info' by casting the block size and available block count to
>> 'uint64_t' before multiplying them. Without this change, 'st_mult' would
>> (correctly) report size_t overflow on 32-bit systems at or exceeding 2^32
>> bytes of available space.
>>
>> Signed-off-by: Victoria Dye <vdye@github.com>
>> ---
>>  builtin/bugreport.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/builtin/bugreport.c b/builtin/bugreport.c
>> index 35b1fc48bf1..720889a37ad 100644
>> --- a/builtin/bugreport.c
>> +++ b/builtin/bugreport.c
>> @@ -258,7 +258,7 @@ static int get_disk_info(struct strbuf *out)
>>  	}
>>  
>>  	strbuf_addf(out, "Available space on '%s': ", buf.buf);
>> -	strbuf_humanise_bytes(out, st_mult(stat.f_bsize, stat.f_bavail));
>> +	strbuf_humanise_bytes(out, (uint64_t)stat.f_bsize * (uint64_t)stat.f_bavail);
> 
> Doesn't this remove the overflow guard on 64 bit systems to support
> those 32 bit systems?
> 

It does, but the total disk space available on a system should be able to
fit into a 64-bit integer. I considered adding an explicit
'unsigned_mult_overflows', but decided against it because it's almost
certainly overkill for such an implausible edge case.

> I also don't tthink it's correct that this would "correctly
> report...". Before this we were simply assuming that "size_t" and
> "unsigned long" & "fsblkcnt_t" would all yield the same thing.
> 

The point I was making is that, if your 'size_t' is 32 bits, but you have
more than ~4GB of disk space available on your system, the result of the
multiplication will overflow 'size_t'. So, 'st_mult' failing because it
detects an overflow is "correct", rather than e.g. a false positive.

> But I don't think per [1] and [2] that POSIX is giving us any guarantees
> in that regard, even on 32 bit systems, but perhaps it's a reasonable
> assumption in practice.
> 
> 1. https://pubs.opengroup.org/onlinepubs/009695399/basedefs/sys/statvfs.h.html
> 2. https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/sys_types.h.html

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Ævar Arnfjörð Bjarmason wrote (reply to this):

On Tue, Aug 02 2022, Victoria Dye wrote:

> Ævar Arnfjörð Bjarmason wrote:
>> 
>> On Mon, Aug 01 2022, Victoria Dye via GitGitGadget wrote:
>> 
>>> From: Victoria Dye <vdye@github.com>
>>>
>>> Avoid size_t overflow when reporting the available disk space in
>>> 'get_disk_info' by casting the block size and available block count to
>>> 'uint64_t' before multiplying them. Without this change, 'st_mult' would
>>> (correctly) report size_t overflow on 32-bit systems at or exceeding 2^32
>>> bytes of available space.
>>>
>>> Signed-off-by: Victoria Dye <vdye@github.com>
>>> ---
>>>  builtin/bugreport.c | 2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/builtin/bugreport.c b/builtin/bugreport.c
>>> index 35b1fc48bf1..720889a37ad 100644
>>> --- a/builtin/bugreport.c
>>> +++ b/builtin/bugreport.c
>>> @@ -258,7 +258,7 @@ static int get_disk_info(struct strbuf *out)
>>>  	}
>>>  
>>>  	strbuf_addf(out, "Available space on '%s': ", buf.buf);
>>> -	strbuf_humanise_bytes(out, st_mult(stat.f_bsize, stat.f_bavail));
>>> +	strbuf_humanise_bytes(out, (uint64_t)stat.f_bsize * (uint64_t)stat.f_bavail);
>> 
>> Doesn't this remove the overflow guard on 64 bit systems to support
>> those 32 bit systems?
>> 
>
> It does, but the total disk space available on a system should be able to
> fit into a 64-bit integer. I considered adding an explicit
> 'unsigned_mult_overflows', but decided against it because it's almost
> certainly overkill for such an implausible edge case.

Yeah it's probably overkill, and maybe this is good as-is & we don't
need to worry here.

But that's quite different from what the patch says, it's not "avoid
size_t overflow" but e.g.:

	bugreport.c: don't do size_t overflow check before casting to 32bit

	It's a hassle to support the check on 32 bit systems, and we
	don't think this is something we'll run into in practice [...]

Perhaps?

>> I also don't tthink it's correct that this would "correctly
>> report...". Before this we were simply assuming that "size_t" and
>> "unsigned long" & "fsblkcnt_t" would all yield the same thing.
>> 
>
> The point I was making is that, if your 'size_t' is 32 bits, but you have
> more than ~4GB of disk space available on your system, the result of the
> multiplication will overflow 'size_t'. So, 'st_mult' failing because it
> detects an overflow is "correct", rather than e.g. a false positive.

I think it would be useful to document these assumptions in the commit
message, POSIX just says "blkcnt_t and off_t shall be signed integer
types", and "size_t shall be an unsigned integer type.".

Do other bits of the standard(s) that I've missed say that off_t's
signed type must be double the width of size_t's unsigned, or is it one
of those things that's not standardized but can be relied on in
practice?

We have a related assertion in 37ee680d9b9 (http.postbuffer: allow full
range of ssize_t values, 2017-04-11) (xcurl_off_t()). Perhaps you want
to do something similar to sanity check your assumptions here?

1. https://pubs.opengroup.org/onlinepubs/009604599/basedefs/sys/types.h.html

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 2, 2022

User Ævar Arnfjörð Bjarmason <avarab@gmail.com> has been added to the cc: list.

@@ -8,7 +8,7 @@ git-bugreport - Collect information for user to file a bug report
SYNOPSIS
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Ævar Arnfjörð Bjarmason wrote (reply to this):

On Mon, Aug 01 2022, Victoria Dye via GitGitGadget wrote:

> From: Victoria Dye <vdye@github.com>
> [...]
>  Documentation/git-bugreport.txt |  11 +-
>  builtin/bugreport.c             | 282 +++++++++++++++++++++++++++++++-
>  t/t0091-bugreport.sh            |  20 +++
>  3 files changed, 309 insertions(+), 4 deletions(-)
> [...]

Maybe it's not easy in this case, but I wonder if this series can't be
re-arranged in a way that more directly benefits from the diff move
detection.

E.g. if we moved the unchanged functions to a new repo-disk-usage.c or
something we could have an intermediate step of having both use that,
and then going forward would work towards a better lib/built-in
split-up...

> --- a/Documentation/git-bugreport.txt
> +++ b/Documentation/git-bugreport.txt
> @@ -8,7 +8,7 @@ git-bugreport - Collect information for user to file a bug report
>  SYNOPSIS
>  --------
>  [verse]
> -'git bugreport' [(-o | --output-directory) <path>] [(-s | --suffix) <format>]
> +'git bugreport' [<options>]
> [...]
>  static const char * const bugreport_usage[] = {
> -	N_("git bugreport [-o|--output-directory <file>] [-s|--suffix <format>]"),
> +	N_("git bugreport [<options>]"),
>  	NULL
>  };

We have some built-ins that punt on re-listing the synopsis in the -h
output, but we always list the full usage in the SYNOPSIS.

I think both of these hunks should be dropped, instead we should
(presumably) add a "git bugreport --diagnose" to this, and if it
combines (or not) with other options, let's update both accordingly.

> [...]
> +	strvec_pushl(&archiver_args, "scalar-diagnose", "--format=zip", NULL);
>

Is the "scalar-diagnose" here a mistake?

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 2, 2022

On the Git mailing list, Ævar Arnfjörð Bjarmason wrote (reply to this):

On Mon, Aug 01 2022, Victoria Dye via GitGitGadget wrote:

> [...] I didn't see a major UX benefit of 'git diagnose' vs 'git
> bugreport --diagnose', so I went with the latter, simpler approach.

I really wanted to like this, but I find the end result here really
confusing from a UX perspective.

You can now run "git bugreport --diagnose", which creates a giant *.zip
file to go along with your *.txt, but your *.txt makes no reference to
it.

Should you ... attach it to your bug report to this mailing list, do
something else?

The documentation doesn't offer much in the way of hints, other than
suggesting (with --no-report) that this --diagnose is for something
entirely different (and that's how "scalar" uses it).

I know what it's really for after reading this series, but for "git
bugreport" in particular we should be really careful about not making
the UX confusing.

The generated *.zip contains some really deep info about your repo (and
not just metadata, e.g. copies of the index, various logs etc.), someone
e.g. in a proprietary setting really doesn't want to be sharing that
info.

So I would like to see real integration into "git bugreport", i.e. for
us to smartly report more repository metrics, e.g. approx number of
loose objects, the sort of state "__git_ps1" might report, etc.

But I think the end-state here makes things much more confusing for
users.

> An alternative implementation considered was creating a new 'git diagnose'
> builtin, but the new command would end up duplicating much of
> 'builtin/bugreport.c'.

It seems we always "return" from cmd_bugreport() quite quickly, and we
basically only share the code to create the output directory. Just
duplicating or sharing that seems like a much better approach for now
than creating the above UX confusion.

Note that you can also share code between multiple built-ins, even in
the same file (see e.g. builtin/{checkout,log}.c). So we could even
share something like the safe_create_leading_directories() calling code
in bugreport.c without libifying it.

> Finally, despite 'scalar diagnose' now being nothing more than a wrapper for
> 'git bugreport --diagnose', it is not being deprecated in this series.
> Although deprecation -> removal could be a future cleanup effort, 'scalar
> diagnose' is kept around for now as an alias for users already accustomed to
> using it in 'scalar'.

We don't have a "make install" to get a "scalar" onto user's systems
yet, do we really need to worry about those users?

Or is this a reference to the out-of-tree version of "scalar", not
git.git's?

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 2, 2022

On the Git mailing list, Victoria Dye wrote (reply to this):

Ævar Arnfjörð Bjarmason wrote:
> 
> On Mon, Aug 01 2022, Victoria Dye via GitGitGadget wrote:
> 
>> [...] I didn't see a major UX benefit of 'git diagnose' vs 'git
>> bugreport --diagnose', so I went with the latter, simpler approach.
> 
> I really wanted to like this, but I find the end result here really
> confusing from a UX perspective.
> 
> You can now run "git bugreport --diagnose", which creates a giant *.zip
> file to go along with your *.txt, but your *.txt makes no reference to
> it.
> 
> Should you ... attach it to your bug report to this mailing list, do
> something else?
> 
> The documentation doesn't offer much in the way of hints, other than
> suggesting (with --no-report) that this --diagnose is for something
> entirely different (and that's how "scalar" uses it).
> 
> I know what it's really for after reading this series, but for "git
> bugreport" in particular we should be really careful about not making
> the UX confusing.
> 
> The generated *.zip contains some really deep info about your repo (and
> not just metadata, e.g. copies of the index, various logs etc.), someone
> e.g. in a proprietary setting really doesn't want to be sharing that
> info.
> 
> So I would like to see real integration into "git bugreport", i.e. for
> us to smartly report more repository metrics, e.g. approx number of
> loose objects, the sort of state "__git_ps1" might report, etc.
> 
> But I think the end-state here makes things much more confusing for
> users.
> 

The "confusing UX" you describe here doesn't seem to be an inherent issue
with the implementation (nor is it as insurmountable as you're implying), it
largely appears to be an issue of under-documentation. I'll improve that in
V2 [1], but I want clarify what I was/am going for here as well.

In the context of a bug report, the diagnostics are intended to be used as
supplemental information to aid in debugging (i.e., attached with the report
in the email to the list). They're especially valuable when a bug reporter
isn't very familiar with Git internals and they can't reproduce the issue. A
lot of bugs can be investigated without those diagnostics, though, which is
why '--diagnose' isn't "on" by default.

There are also valid use-cases (beyond 'scalar diagnose') for '--no-report':
someone requests more information after looking into an already-generated
report, or a developer wants to investigate a bug on their own and use the
diagnostics as a "starting point" to guide more in-depth debugging. 

As for the proprietary data issue, I'd be open to having an option to
configure which diagnostics a user wants (either something like '--diagnose
<level>' or a separate option entirely). I'm pretty indifferent on the UI,
though, so I'll defer to other contributors on 1) if they want that feature,
and 2) what they think that should look like.

[1] https://lore.kernel.org/git/f3235afe-25cc-21a4-fc35-56e35d6be0ce@github.com/

>> An alternative implementation considered was creating a new 'git diagnose'
>> builtin, but the new command would end up duplicating much of
>> 'builtin/bugreport.c'.
> 
> It seems we always "return" from cmd_bugreport() quite quickly, and we
> basically only share the code to create the output directory. Just
> duplicating or sharing that seems like a much better approach for now
> than creating the above UX confusion.
> 
> Note that you can also share code between multiple built-ins, even in
> the same file (see e.g. builtin/{checkout,log}.c). So we could even
> share something like the safe_create_leading_directories() calling code
> in bugreport.c without libifying it.
> 

You deleted the part where I addressed this suggestion directly:

> Although that issue could be overcome with refactoring, I didn't see a
> major UX benefit of 'git diagnose' vs 'git bugreport --diagnose', so I
> went with the latter, simpler approach.

And, in the process of writing down my thoughts on the UX above, I've become
more convinced that including '--diagnose' in 'git bugreport' is the better
way to present this functionality to users.

>> Finally, despite 'scalar diagnose' now being nothing more than a wrapper for
>> 'git bugreport --diagnose', it is not being deprecated in this series.
>> Although deprecation -> removal could be a future cleanup effort, 'scalar
>> diagnose' is kept around for now as an alias for users already accustomed to
>> using it in 'scalar'.
> 
> We don't have a "make install" to get a "scalar" onto user's systems
> yet, do we really need to worry about those users?
> 
> Or is this a reference to the out-of-tree version of "scalar", not
> git.git's?
> 

In practice, it's the "out-of-tree Scalar" users that would care the most.
That said, with Scalar in the Git tree (albeit 'contrib/' and not built by
default), I think it's reasonable to want to avoid breaking changes if
possible. The continued existence of 'scalar diagnose' wouldn't really be
hurting anyone anyway, so there's no pressing need to deprecate it now.

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 3, 2022

On the Git mailing list, Ævar Arnfjörð Bjarmason wrote (reply to this):

On Tue, Aug 02 2022, Victoria Dye wrote:

> Ævar Arnfjörð Bjarmason wrote:
>> 
>> On Mon, Aug 01 2022, Victoria Dye via GitGitGadget wrote:
>> 
>>> [...] I didn't see a major UX benefit of 'git diagnose' vs 'git
>>> bugreport --diagnose', so I went with the latter, simpler approach.
>> 
>> I really wanted to like this, but I find the end result here really
>> confusing from a UX perspective.
>> 
>> You can now run "git bugreport --diagnose", which creates a giant *.zip
>> file to go along with your *.txt, but your *.txt makes no reference to
>> it.
>> 
>> Should you ... attach it to your bug report to this mailing list, do
>> something else?
>> 
>> The documentation doesn't offer much in the way of hints, other than
>> suggesting (with --no-report) that this --diagnose is for something
>> entirely different (and that's how "scalar" uses it).
>> 
>> I know what it's really for after reading this series, but for "git
>> bugreport" in particular we should be really careful about not making
>> the UX confusing.
>> 
>> The generated *.zip contains some really deep info about your repo (and
>> not just metadata, e.g. copies of the index, various logs etc.), someone
>> e.g. in a proprietary setting really doesn't want to be sharing that
>> info.
>> 
>> So I would like to see real integration into "git bugreport", i.e. for
>> us to smartly report more repository metrics, e.g. approx number of
>> loose objects, the sort of state "__git_ps1" might report, etc.
>> 
>> But I think the end-state here makes things much more confusing for
>> users.
>> 
>
> The "confusing UX" you describe here doesn't seem to be an inherent issue
> with the implementation (nor is it as insurmountable as you're implying),

I'm not implying or saying that it's insurmountable.

I think in principle having such a mode in "git bugreport" would make
sense.

But the UX here seems to be an afterthought. So I wonder if we shouldn't
hold off on it and just have a new *--helper or something instead.

> it
> largely appears to be an issue of under-documentation. I'll improve that in
> V2 [1], but I want clarify what I was/am going for here as well.
>
> In the context of a bug report, the diagnostics are intended to be used as
> supplemental information to aid in debugging (i.e., attached with the report
> in the email to the list).

Per https://lore.kernel.org/git/?q=n%3Azip we don't block *.zip
attachments, but we have fairly low size limits (to the point of
blocking some large patches).

E.g. e448263716f (po/git.pot: this is now a generated file, 2022-05-26)
(https://lore.kernel.org/git/20220526145035.18958-7-worldhello.net@gmail.com/)
was blocked, and it's just over 500k.

Aside from the data-sharing issues that seems like a good addition to
git-bugreport, i.e. tell the user if the attachment would be blocked due
to its size...

> They're especially valuable when a bug reporter
> isn't very familiar with Git internals and they can't reproduce the issue. A
> lot of bugs can be investigated without those diagnostics, though, which is
> why '--diagnose' isn't "on" by default.
>
> There are also valid use-cases (beyond 'scalar diagnose') for '--no-report':
> someone requests more information after looking into an already-generated
> report, or a developer wants to investigate a bug on their own and use the
> diagnostics as a "starting point" to guide more in-depth debugging. 

Yes, it's useful in a lot of cases. I'm just saying that we really need
to bridge the gap of telling the user what they should be doing with
this new file....

> As for the proprietary data issue, I'd be open to having an option to
> configure which diagnostics a user wants (either something like '--diagnose
> <level>' or a separate option entirely). I'm pretty indifferent on the UI,
> though, so I'll defer to other contributors on 1) if they want that feature,
> and 2) what they think that should look like.

I think it's really important that a bug report feature doesn't submit
private data by default. The name of a "--diagnose" option heavily
implies some aggregate metrics etc.

We then attach what to the user are opaque binary files, but which
contain even the file contants of the repository (index files).

I think scalar was developer for "internal" use-cases where such sharing
wasn't an issue for anyone, but I don't want anyone to get in trouble
because they shared their proprietary source code on the ML while trying
to file a bug report, so defaults really matter.

Especially because if there's one thing we've learned from bug reports &
questions to this & git-users is that the one thing you can rely on is
that users routinely don't carefully scrutinize documentation before
trying out various features...

> [1] https://lore.kernel.org/git/f3235afe-25cc-21a4-fc35-56e35d6be0ce@github.com/
>
>>> An alternative implementation considered was creating a new 'git diagnose'
>>> builtin, but the new command would end up duplicating much of
>>> 'builtin/bugreport.c'.
>> 
>> It seems we always "return" from cmd_bugreport() quite quickly, and we
>> basically only share the code to create the output directory. Just
>> duplicating or sharing that seems like a much better approach for now
>> than creating the above UX confusion.
>> 
>> Note that you can also share code between multiple built-ins, even in
>> the same file (see e.g. builtin/{checkout,log}.c). So we could even
>> share something like the safe_create_leading_directories() calling code
>> in bugreport.c without libifying it.
>> 
>
> You deleted the part where I addressed this suggestion directly:

Yes, the "Although that..." sentence, but I commented on the UX
trade-off elsewhere.

>> Although that issue could be overcome with refactoring, I didn't see a
>> major UX benefit of 'git diagnose' vs 'git bugreport --diagnose', so I
>> went with the latter, simpler approach.
>
> And, in the process of writing down my thoughts on the UX above, I've become
> more convinced that including '--diagnose' in 'git bugreport' is the better
> way to present this functionality to users.

We're on the same page there, we're just discussing if/how to make the
end-to-end process clearer to users.

I don't think it's clear in v1, and is sorely missing something like the
discussion we have around "ANONYMIZING" in git-fast-export(1), and we
really should have a "safe by default". Everything I'm noting here would
be addressed by e.g.:

 * The *.txt output would say "you can additionally attach a diagnostic
   *.zip" file etc. etc., noting if/when the user would do that.

 * We'd have that --diagnostics be safe by default in the way
   --anonymize is, e.g. including stats about number of refs & the like,
   not their contents.

 * We could also have e.g. "--diagnostics --no-anonymize", or
   "--diagnostics=full" or whatever, which would be some approximation
   of the current output.

>>> Finally, despite 'scalar diagnose' now being nothing more than a wrapper for
>>> 'git bugreport --diagnose', it is not being deprecated in this series.
>>> Although deprecation -> removal could be a future cleanup effort, 'scalar
>>> diagnose' is kept around for now as an alias for users already accustomed to
>>> using it in 'scalar'.
>> 
>> We don't have a "make install" to get a "scalar" onto user's systems
>> yet, do we really need to worry about those users?
>> 
>> Or is this a reference to the out-of-tree version of "scalar", not
>> git.git's?
>> 
>
> In practice, it's the "out-of-tree Scalar" users that would care the most.
> That said, with Scalar in the Git tree (albeit 'contrib/' and not built by
> default), I think it's reasonable to want to avoid breaking changes if
> possible. The continued existence of 'scalar diagnose' wouldn't really be
> hurting anyone anyway, so there's no pressing need to deprecate it now.

I'm fine with keeping it, but just found the juxtaposition of doing that
& previous discussions about scalar to be surprising.

When it was submitted (from my recollection, I see
Documentation/technical/scalar.txt is rather light on the topic) the
idea was (and this is from my own recollection, I haven't dug up ML
references) that scalar would be "going away", and that the reason to
have it in-tree was to inform the design of the main "git" tooling for
the use-cases scalar is addressing.

So, a bit similar to Cogito (although that was never in-tree).

I'm fine with just keeping it as an alias for "git bugreport
--whatever", but think it would make even more sense to change the
interface a bit as we integrate it to the rest of git, and then "die()"
saying "'scalar xyz' is now 'git abc', mostly....".

Existing scalar users used to 'scalar xyz' would then try the new 'git
abc', and report if/how that fits their use-case.

Then at some point if we do that with all of scalar's features we'd be
at the point of 'git rm'-ing it, as it would have served its stated
purpose.

Just my 0.02, I think in any case having the scalar.txt doc's "Roadmap"
updated to address this would be very useful.

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 3, 2022

This patch series was integrated into seen via git@55dba0f.

@vdye vdye changed the title Generalize 'scalar diagnose' into 'git bugreport --diagnose' Generalize 'scalar diagnose' into 'git diagnose' and 'git bugreport --diagnose' Aug 4, 2022
@vdye
Copy link
Author

vdye commented Aug 4, 2022

/submit

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 4, 2022

Submitted as pull.1310.v2.git.1659577543.gitgitgadget@gmail.com

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git/ pr-1310/vdye/scalar/generalize-diagnose-v2

To fetch this version to local tag pr-1310/vdye/scalar/generalize-diagnose-v2:

git fetch --no-tags https://github.com/gitgitgadget/git/ tag pr-1310/vdye/scalar/generalize-diagnose-v2

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 4, 2022

This patch series was integrated into seen via git@b8d67d5.

@@ -266,14 +266,20 @@ static int add_directory_to_archiver(struct strvec *archiver_args,
const char *path, int recurse)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Ævar Arnfjörð Bjarmason wrote (reply to this):

On Thu, Aug 04 2022, Victoria Dye via GitGitGadget wrote:

> From: Victoria Dye <vdye@github.com>
>
> If a directory added to the 'scalar diagnose' archiver does not exist, warn
> and return 0 from 'add_directory_to_archiver()' rather than failing with a
> fatal error. This handles a failure edge case where the '.git/logs' has not
> yet been created when running 'scalar diagnose', but extends to any
> situation where a directory may be missing in the '.git' dir.
>
> Now, when a directory is missing a warning is captured in the diagnostic
> logs. This provides a user with more complete information than if 'scalar
> diagnose' simply failed with an error.
>
> Helped-by: Junio C Hamano <gitster@pobox.com>
> Signed-off-by: Victoria Dye <vdye@github.com>
> ---
>  contrib/scalar/scalar.c | 10 ++++++++--
>  1 file changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/contrib/scalar/scalar.c b/contrib/scalar/scalar.c
> index 04046452284..b9092f0b612 100644
> --- a/contrib/scalar/scalar.c
> +++ b/contrib/scalar/scalar.c
> @@ -266,14 +266,20 @@ static int add_directory_to_archiver(struct strvec *archiver_args,
>  					  const char *path, int recurse)
>  {
>  	int at_root = !*path;
> -	DIR *dir = opendir(at_root ? "." : path);
> +	DIR *dir;
>  	struct dirent *e;
>  	struct strbuf buf = STRBUF_INIT;
>  	size_t len;
>  	int res = 0;
>  
> -	if (!dir)
> +	dir = opendir(at_root ? "." : path);
> +	if (!dir) {
> +		if (errno == ENOENT) {

Per [1] I think this is incorrect or overly strict. Let's not spew
warnings if the user "rm -rf .git/hooks" or whatever.

It might be valuable to note in some file in the archive such oddities
we find, but warning() won't give us that.

> +			warning(_("could not archive missing directory '%s'"), path);

In any case, this should be e.g.:

	warning_errno(_("could not archive directory '%s'"), path);

You already have an errno, so using *_errno() will add the standard
information about what the issue is.

> +			return 0;
> +		}
>  		return error_errno(_("could not open directory '%s'"), path);
> +	}
>  
>  	if (!at_root)
>  		strbuf_addf(&buf, "%s/", path);

1. https://lore.kernel.org/git/220610.86ilp9s1x7.gmgdl@evledraar.gmail.com/

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

>> +	dir = opendir(at_root ? "." : path);
>> +	if (!dir) {
>> +		if (errno == ENOENT) {
>
> Per [1]

"Per [1]" somehow sounds more like a reference to an authoritative
source, at least to me.  Every time you use it, I have to see what
it refers to, and after realizing that you used it as a replacement
of "I said it already in [1]" again, it leaves a funny feeling.

> I think this is incorrect or overly strict. Let's not spew
> warnings if the user "rm -rf .git/hooks" or whatever.

The above is doing the right thing even in that situation, doesn't
it?  If there is no ".git/hooks" that is fine.  We get ENOENT, give
a warning to indicate that we found an unusual situation, and return
without failing.  If we got something other than ENOENT, we fail with
error_errno(), because opendir() failed for a reason other than "No
such file or directory".

> You already have an errno, so using *_errno() will add the standard
> information about what the issue is.

Reading the code aloud, slowly, may help.  When errno says ENOENT,
we know opendir() failed because of "No such file or directory",
so "path" was missing.  So let's say 'not archiving a missing directory'".

ENOENT or "No such file or directory" is an implementation detail
that does not help the end user.

The other side, i.e. when the errno is *not* ENOENT, already uses
error_errno().

So, I am puzzled.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Ævar Arnfjörð Bjarmason wrote (reply to this):

On Thu, Aug 04 2022, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
>
>>> +	dir = opendir(at_root ? "." : path);
>>> +	if (!dir) {
>>> +		if (errno == ENOENT) {
>>
>> Per [1]
>
> "Per [1]" somehow sounds more like a reference to an authoritative
> source, at least to me.  Every time you use it, I have to see what
> it refers to, and after realizing that you used it as a replacement
> of "I said it already in [1]" again, it leaves a funny feeling.

"Per" in the sense of "Per what I noted in [1]".

>> I think this is incorrect or overly strict. Let's not spew
>> warnings if the user "rm -rf .git/hooks" or whatever.
>
> The above is doing the right thing even in that situation, doesn't
> it?  If there is no ".git/hooks" that is fine.  We get ENOENT, give
> a warning to indicate that we found an unusual situation, and return
> without failing.  If we got something other than ENOENT, we fail with
> error_errno(), because opendir() failed for a reason other than "No
> such file or directory".

I'm mainly noting that the point of this step is to produce an archive
for the consumption of the remote end.

Therefore it seems to me like it would me much more useful to note these
"oddities" in some log that we're about to zip up, rather than issue a
warning().

The "per [1]" was a reference to the (paraphrasing) "maybe that's not
needed at all", but you seemed to think so. But for the purposes of the
discussion here let's assume we keep it.

>> You already have an errno, so using *_errno() will add the standard
>> information about what the issue is.
>
> Reading the code aloud, slowly, may help.  When errno says ENOENT,
> we know opendir() failed because of "No such file or directory",
> so "path" was missing.  So let's say 'not archiving a missing directory'".
>
> ENOENT or "No such file or directory" is an implementation detail
> that does not help the end user.
>
> The other side, i.e. when the errno is *not* ENOENT, already uses
> error_errno().
>
> So, I am puzzled.

I'm pointing out that we don't need to include that part in the message,
because warning_errno() will already give us that for free. I.e.:

	warning: could not archive directory '<some dir>': No such file or directory

v.s.:

	warning: could not archive missing directory '<some dir>'

The advantages of doing so being:

 * It's clear (at least to the keen eye) that it's using the "errno
   format", so you know it's not just saying "could not for <whatever
   reason>", it specifically got ENOENT.

 * The i18n for the strerror() comes from the C library, which will be
   translated already, whereas a new git.pot message won't be (but we'll
   hopefully bridge the gap eventually).

 * This way we we can share the message, whatever the errno happens to
   be, so we could e.g.:

      errno = ENOENT;
      warning_errno(_("could not archive directory '%s'"), "<some dir>");
      errno = ENOMEM;
      error_errno(_("could not archive directory '%s'"), "<some dir>");

   Whereas putting the reason for why we couldn't (which just duplicates
   the errno) in the message forces the messages & i18n to diverge.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> I'm mainly noting that the point of this step is to produce an archive
> for the consumption of the remote end.
>
> Therefore it seems to me like it would me much more useful to note these
> "oddities" in some log that we're about to zip up, rather than issue a
> warning().

Hmph, the receiving end that inspects the archive will know that a
directory did not get archived, but they cannot tell if that is
because it did not exist, or because it was unreadable, and the
trouble the user may be having can well be the result of having
the directory unreadable.  So from that point of view, in addition
to these warnings and errors, it would be helpful to record the
errors we encounter while we generate the diagnostic archive in the
archive itself for inspection.

But the warning on the local side has merit to warn the user of an
unusual situation.  "I am puzzled why the hook I thought I wrote did
not trigger, but the diag tool says I do not have .git/hooks at all"
is a welcome side effect, even though it may not be the primary
effect we are aiming to gain by having these warning messages.

> I'm pointing out that we don't need to include that part in the message,
> because warning_errno() will already give us that for free. I.e.:
>
> 	warning: could not archive directory '<some dir>': No such file or directory
>
> v.s.:
>
> 	warning: could not archive missing directory '<some dir>'
>
> The advantages of doing so being:
>
>  * It's clear (at least to the keen eye) that it's using the "errno
>    format", so you know it's not just saying "could not for <whatever
>    reason>", it specifically got ENOENT.

Funny.  I find it much clearer if we can use our own message without
having to rely on whatever strerror(error) gives us.  We know better
than the C library why we got ENOENT and be more readable.  They say
"No such file or directory" because from ENOENT alone they cannot
tell you if it was a file or a directory that was missing, but we
have a better context like "we were trying to create an archive" and
"we tried opendir, expecting that the thing is a directory".

>  * The i18n for the strerror() comes from the C library, which will be
>    translated already, whereas a new git.pot message won't be (but we'll
>    hopefully bridge the gap eventually).

As I said above, I do not think it is an advantage in this case that
strerror() is translated, as the point of having a separate message
is because we can be more to-the-point, and we do not need to use
the strerror() result in there.

>  * This way we we can share the message, whatever the errno happens to
>    be, so we could e.g.:
>
>       errno = ENOENT;
>       warning_errno(_("could not archive directory '%s'"), "<some dir>");
>       errno = ENOMEM;
>       error_errno(_("could not archive directory '%s'"), "<some dir>");
>
>    Whereas putting the reason for why we couldn't (which just duplicates
>    the errno) in the message forces the messages & i18n to diverge.

And an added clarity that we can use a separate message is something
I think is worth having, compared to the cost of having an extra
message over the more generic one for any errno.

@@ -932,6 +932,7 @@ LIB_OBJS += ctype.o
LIB_OBJS += date.o
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Ævar Arnfjörð Bjarmason wrote (reply to this):

On Thu, Aug 04 2022, Victoria Dye via GitGitGadget wrote:

> From: Victoria Dye <vdye@github.com>
> [...]
> diff --git a/diagnose.h b/diagnose.h
> new file mode 100644
> index 00000000000..e86e8a3c962
> --- /dev/null
> +++ b/diagnose.h
> @@ -0,0 +1,9 @@
> +#ifndef DIAGNOSE_H
> +#define DIAGNOSE_H
> +
> +#include "cache.h"

We don't need cache.h here, just ...

> +#include "strbuf.h"

...this, also a matter of preference, but we could also just skip the
includes here and use a forward decl (as is common in other headers):

	struct strbuf;

> +
> +int create_diagnostics_archive(struct strbuf *zip_path);
> +
> +#endif /* DIAGNOSE_H */

@@ -53,6 +53,7 @@
/git-cvsimport
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Ævar Arnfjörð Bjarmason wrote (reply to this):

On Thu, Aug 04 2022, Victoria Dye via GitGitGadget wrote:

> From: Victoria Dye <vdye@github.com>
>
> Create a 'git diagnose' builtin to generate a standalone zip archive of
> repository diagnostics.

It's good to have this as a built-in separate from "git bugreport",
but...

> +git-diagnose - Generate a zip archive of diagnostic information

...I'd really prefer for this not to squat on such a common name we
might regret having reserved later for such very specific
functionality. I'd think e.g. these would be better:

	git mk-diagnostics-zip

Or maybe:

	git archive-interesting-for-report

If I had to guess what a "git diagnose" did, I'd probably think:

 * It analyzes your config, and suggests redundancies/alternatives
 * It does some perf tests / heuritics, and e.g. suggests you turn on
   the commit-graph writing.

etc., this (arguably even too generic then) made sense as "scalar
diagnose" because scalar is all about being an opinionated interface
targeted at performance", so there's an implied "my repo's performance"
following a "scalar diagnose".

But as a top-level command-name I think we should pick something more
specific to what it does, which is (I haven't fully read ahead in the
re-roll, but I'm assuming is) mostly/entirely to be a "helper" for
"scalar diagnose" and/or "git bugreport".

> +	switch (safe_create_leading_directories(zip_path.buf)) {
> +	case SCLD_OK:
> +	case SCLD_EXISTS:
> +		break;
> +	default:
> +		die(_("could not create leading directories for '%s'"),
> +		    zip_path.buf);

This seems to be carrying forward a minor bug from bugreport.c we should
probably fix: we should use die_errno() here (and maybe lead with a
commit to fix bugreport.c's version).

The strbuf*() before that also seems like a good candidate for a utility
function in your new diagnose library, i.e. to have both bugreport.c and
diagnose.c pass it the prefix/suffix/format, then try to create that
directory, then replace the copy/pasting here with a one-line call to
that now-shared function.

The two codepaths only seem to differ in the prefix & suffix from a
quick skim, the rest is all copy/pasted...

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 8/4/2022 2:27 AM, Ævar Arnfjörð Bjarmason wrote:
> 
> On Thu, Aug 04 2022, Victoria Dye via GitGitGadget wrote:
> 
>> From: Victoria Dye <vdye@github.com>
>>
>> Create a 'git diagnose' builtin to generate a standalone zip archive of
>> repository diagnostics.
> 
> It's good to have this as a built-in separate from "git bugreport",
> but...
> 
>> +git-diagnose - Generate a zip archive of diagnostic information
> 
> ...I'd really prefer for this not to squat on such a common name we
> might regret having reserved later for such very specific
> functionality. I'd think e.g. these would be better:
> 
> 	git mk-diagnostics-zip
> 
> Or maybe:
> 
> 	git archive-interesting-for-report

These are not realistic replacements.

> If I had to guess what a "git diagnose" did, I'd probably think:
> 
>  * It analyzes your config, and suggests redundancies/alternatives
>  * It does some perf tests / heuritics, and e.g. suggests you turn on
>    the commit-graph writing.

These sound like great options to add in the future, such as:

   --perf-test: Run performance tests on your repository using different
   Git config options and recommend certain settings.

(This --perf-test option would be a great way to get wider adoption
of parallel checkout, since its optimal settings are so machine
dependent.)

The thing is, even if we did these other things, it would result in
some kind of document that summarizes the repository shape and features.
That kind of data is exactly what this version of 'git diagnose' does.

For now, it leaves the human reader responsible for making decisions
based on those documents, but they have been incredibly helpful when we
are _diagnosing_ issues users are having with their repositories.

Thanks,
-Stolee

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Ævar Arnfjörð Bjarmason wrote (reply to this):

On Fri, Aug 05 2022, Derrick Stolee wrote:

> On 8/4/2022 2:27 AM, Ævar Arnfjörð Bjarmason wrote:
>> 
>> On Thu, Aug 04 2022, Victoria Dye via GitGitGadget wrote:
>> 
>>> From: Victoria Dye <vdye@github.com>
>>>
>>> Create a 'git diagnose' builtin to generate a standalone zip archive of
>>> repository diagnostics.
>> 
>> It's good to have this as a built-in separate from "git bugreport",
>> but...
>> 
>>> +git-diagnose - Generate a zip archive of diagnostic information
>> 
>> ...I'd really prefer for this not to squat on such a common name we
>> might regret having reserved later for such very specific
>> functionality. I'd think e.g. these would be better:
>> 
>> 	git mk-diagnostics-zip
>> 
>> Or maybe:
>> 
>> 	git archive-interesting-for-report
>
> These are not realistic replacements.

Maybe:

	git diagnose create-zip

?

>> If I had to guess what a "git diagnose" did, I'd probably think:
>> 
>>  * It analyzes your config, and suggests redundancies/alternatives
>>  * It does some perf tests / heuritics, and e.g. suggests you turn on
>>    the commit-graph writing.
>
> These sound like great options to add in the future, such as:
>
>    --perf-test: Run performance tests on your repository using different
>    Git config options and recommend certain settings.
>
> (This --perf-test option would be a great way to get wider adoption
> of parallel checkout, since its optimal settings are so machine
> dependent.)

...

> The thing is, even if we did these other things, it would result in
> some kind of document that summarizes the repository shape and features.
> That kind of data is exactly what this version of 'git diagnose' does.

I think a command like "git diagnose" that had options to do other
unrelated stuff, but by default created a zip archive when given no
options would be rather confusing.

Yes, it makes sense to emit some human-readable summary, but to zip it
up as well? That's something we just need for the "git bugreport"
case...

> For now, it leaves the human reader responsible for making decisions
> based on those documents, but they have been incredibly helpful when we
> are _diagnosing_ issues users are having with their repositories.

This is orthagonal to what I'm pointing out. You're basically saying the
user can just read the documentation to find out what this built-in
does.

That's true, what I'm pointing out is that it's unfortunate that such
highly specific functionality is squatting on such a short & generic
name, but just e.g. adding a "create-zip" sub-command would address
that.

@@ -0,0 +1,60 @@
git-diagnose(1)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Ævar Arnfjörð Bjarmason wrote (reply to this):

On Thu, Aug 04 2022, Victoria Dye via GitGitGadget wrote:

> From: Victoria Dye <vdye@github.com>
> [...]
>  --------
>  [verse]
>  'git diagnose' [(-o | --output-directory) <path>] [(-s | --suffix) <format>]
> +	       [-a | --all]

I have some local patches that...

>  static const char * const diagnose_usage[] = {
> -	N_("git diagnose [-o|--output-directory <file>] [-s|--suffix <format>]"),
> +	N_("git diagnose [-o|--output-directory <file>] [-s|--suffix <format>] [-a|--all]"),
>  	NULL
>  };

...spot when we have SYNOPSIS & -h discrepancies. In this case we break
with a \n after <format> in the SYNOPSIS, but you don't add a "\n" and
indentation here in the -h output. The two should be the same.

> @@ -13,6 +13,7 @@ int cmd_diagnose(int argc, const char **argv, const char *prefix)
>  	struct strbuf zip_path = STRBUF_INIT;
>  	time_t now = time(NULL);
>  	struct tm tm;
> +	int include_everything = 0;
>  	char *option_output = NULL;
>  	char *option_suffix = "%Y-%m-%d-%H%M";
>  	char *prefixed_filename;
> @@ -22,6 +23,9 @@ int cmd_diagnose(int argc, const char **argv, const char *prefix)
>  			   N_("specify a destination for the diagnostics archive")),
>  		OPT_STRING('s', "suffix", &option_suffix, N_("format"),
>  			   N_("specify a strftime format suffix for the filename")),
> +		OPT_BOOL_F('a', "all", &include_everything,
> +			   N_("collect complete diagnostic information"),
> +			   PARSE_OPT_NONEG),

Nice to have a "stats only" by default and some way to add the whole
shebang optionally...

> +int create_diagnostics_archive(struct strbuf *zip_path, int include_everything)

...but maybe...
>  {
>  	struct strvec archiver_args = STRVEC_INIT;
>  	char **argv_copy = NULL;
> @@ -176,11 +176,13 @@ int create_diagnostics_archive(struct strbuf *zip_path)
>  	loose_objs_stats(&buf, ".git/objects");
>  	strvec_push(&archiver_args, buf.buf);
>  
> -	if ((res = add_directory_to_archiver(&archiver_args, ".git", 0)) ||
> -	    (res = add_directory_to_archiver(&archiver_args, ".git/hooks", 0)) ||
> -	    (res = add_directory_to_archiver(&archiver_args, ".git/info", 0)) ||
> -	    (res = add_directory_to_archiver(&archiver_args, ".git/logs", 1)) ||
> -	    (res = add_directory_to_archiver(&archiver_args, ".git/objects/info", 0)))
> +	/* Only include this if explicitly requested */
> +	if (include_everything &&
> +	    ((res = add_directory_to_archiver(&archiver_args, ".git", 0)) ||
> +	     (res = add_directory_to_archiver(&archiver_args, ".git/hooks", 0)) ||
> +	     (res = add_directory_to_archiver(&archiver_args, ".git/info", 0)) ||
> +	     (res = add_directory_to_archiver(&archiver_args, ".git/logs", 1)) ||
> +	     (res = add_directory_to_archiver(&archiver_args, ".git/objects/info", 0))))
>  		goto diagnose_cleanup;

...this should be --include-gitdir-extract or something, instead of
"--all" and "--include-everything".

I'd think that "all" would be a thing that would actually tar up my
entire .git directory as-is (in a way that would pass git fsck on the
other end (unless that's the bug being reported...)).

Aside: Since we are getting the churn of adding this, then re-indenting
 it maybe a prep step of adding a add_directories_to_archiver() would be
 useful, which would just have a data-driven:

	{
		{ ".git" },
		[...],
		{ ".git/logs, 1 },
		NULL
	},

Then loop over that, making it easy to add/declare new subdirs to add.

@@ -11,8 +11,6 @@
#include "dir.h"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Ævar Arnfjörð Bjarmason wrote (reply to this):

On Thu, Aug 04 2022, Victoria Dye via GitGitGadget wrote:

> From: Victoria Dye <vdye@github.com>
>
> Replace implementation of 'scalar diagnose' with an internal invocation of
> 'git diagnose --all'. This simplifies the implementation of 'cmd_diagnose'
> by making it a direct alias of 'git diagnose' and removes some code in
> 'scalar.c' that is duplicated in 'builtin/diagnose.c'. The simplicity of the
> alias also sets up a clean deprecation path for 'scalar diagnose' (in favor
> of 'git diagnose'), if that is desired in the future.
>
> This introduces one minor change to the output of 'scalar diagnose', which
> is that the prefix of the created zip archive is changed from 'scalar_' to
> 'git-diagnostics-'.
>
> Signed-off-by: Victoria Dye <vdye@github.com>
> ---
>  contrib/scalar/scalar.c | 29 +++++++----------------------
>  1 file changed, 7 insertions(+), 22 deletions(-)
>
> diff --git a/contrib/scalar/scalar.c b/contrib/scalar/scalar.c
> index b10955531ce..fe2a0e9decb 100644
> --- a/contrib/scalar/scalar.c
> +++ b/contrib/scalar/scalar.c
> @@ -11,7 +11,6 @@
>  #include "dir.h"
>  #include "packfile.h"
>  #include "help.h"
> -#include "diagnose.h"
>  
>  /*
>   * Remove the deepest subdirectory in the provided path string. Path must not
> @@ -510,34 +509,20 @@ static int cmd_diagnose(int argc, const char **argv)
>  		N_("scalar diagnose [<enlistment>]"),
>  		NULL
>  	};
> -	struct strbuf zip_path = STRBUF_INIT;
> -	time_t now = time(NULL);
> -	struct tm tm;
> +	struct strbuf diagnostics_root = STRBUF_INIT;
>  	int res = 0;
>  
>  	argc = parse_options(argc, argv, NULL, options,
>  			     usage, 0);
>  
> -	setup_enlistment_directory(argc, argv, usage, options, &zip_path);
> -
> -	strbuf_addstr(&zip_path, "/.scalarDiagnostics/scalar_");
> -	strbuf_addftime(&zip_path,
> -			"%Y%m%d_%H%M%S", localtime_r(&now, &tm), 0, 0);
> -	strbuf_addstr(&zip_path, ".zip");
> -	switch (safe_create_leading_directories(zip_path.buf)) {
> -	case SCLD_EXISTS:
> -	case SCLD_OK:
> -		break;
> -	default:
> -		error_errno(_("could not create directory for '%s'"),
> -			    zip_path.buf);
> -		goto diagnose_cleanup;

Just spotting this now, but we had ad error, but we "goto
diagnose_cleanup", but that will use our "res = 0" above.

Is this untested already or in this series (didn't go back to look). But
maybe a moot point, the post-image replacement uses die()..

> -	}
> +	setup_enlistment_directory(argc, argv, usage, options, &diagnostics_root);
> +	strbuf_addstr(&diagnostics_root, "/.scalarDiagnostics");
>  
> -	res = create_diagnostics_archive(&zip_path, 1);
> +	if (run_git("diagnose", "--all", "-s", "%Y%m%d_%H%M%S",
> +		    "-o", diagnostics_root.buf, NULL) < 0)
> +		res = -1;

The code handling here seems really odd, issues:

 * This *can* return -1, if start_command() fails, but that's by far the
   rarer case, usually it would be 0 or >0 (only <0 if we can't start
   the command at all).

 * You should not be returning -1 from cmd_*() in general (we have
   outstanding issues with it, but those should be fixed). It will yield
   an exit code of 255 (but it's not portable)).

 * If you're going to return -1 at all, why override <0 with -1, just
   "res = run_git(...)" instead?

I think all-in-all this should be:

	res = run_git(...);

Then:

>  
> -diagnose_cleanup:
> -	strbuf_release(&zip_path);
> +	strbuf_release(&diagnostics_root);
>  	return res;

	return res < 0 ? -res : res;

Or whatever.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Victoria Dye wrote (reply to this):

Ævar Arnfjörð Bjarmason wrote:
> 
> On Thu, Aug 04 2022, Victoria Dye via GitGitGadget wrote:
> 
>> From: Victoria Dye <vdye@github.com>
>>
>> Replace implementation of 'scalar diagnose' with an internal invocation of
>> 'git diagnose --all'. This simplifies the implementation of 'cmd_diagnose'
>> by making it a direct alias of 'git diagnose' and removes some code in
>> 'scalar.c' that is duplicated in 'builtin/diagnose.c'. The simplicity of the
>> alias also sets up a clean deprecation path for 'scalar diagnose' (in favor
>> of 'git diagnose'), if that is desired in the future.
>>
>> This introduces one minor change to the output of 'scalar diagnose', which
>> is that the prefix of the created zip archive is changed from 'scalar_' to
>> 'git-diagnostics-'.
>>
>> Signed-off-by: Victoria Dye <vdye@github.com>
>> ---
>>  contrib/scalar/scalar.c | 29 +++++++----------------------
>>  1 file changed, 7 insertions(+), 22 deletions(-)
>>
>> diff --git a/contrib/scalar/scalar.c b/contrib/scalar/scalar.c
>> index b10955531ce..fe2a0e9decb 100644
>> --- a/contrib/scalar/scalar.c
>> +++ b/contrib/scalar/scalar.c
>> @@ -11,7 +11,6 @@
>>  #include "dir.h"
>>  #include "packfile.h"
>>  #include "help.h"
>> -#include "diagnose.h"
>>  
>>  /*
>>   * Remove the deepest subdirectory in the provided path string. Path must not
>> @@ -510,34 +509,20 @@ static int cmd_diagnose(int argc, const char **argv)
>>  		N_("scalar diagnose [<enlistment>]"),
>>  		NULL
>>  	};
>> -	struct strbuf zip_path = STRBUF_INIT;
>> -	time_t now = time(NULL);
>> -	struct tm tm;
>> +	struct strbuf diagnostics_root = STRBUF_INIT;
>>  	int res = 0;
>>  
>>  	argc = parse_options(argc, argv, NULL, options,
>>  			     usage, 0);
>>  
>> -	setup_enlistment_directory(argc, argv, usage, options, &zip_path);
>> -
>> -	strbuf_addstr(&zip_path, "/.scalarDiagnostics/scalar_");
>> -	strbuf_addftime(&zip_path,
>> -			"%Y%m%d_%H%M%S", localtime_r(&now, &tm), 0, 0);
>> -	strbuf_addstr(&zip_path, ".zip");
>> -	switch (safe_create_leading_directories(zip_path.buf)) {
>> -	case SCLD_EXISTS:
>> -	case SCLD_OK:
>> -		break;
>> -	default:
>> -		error_errno(_("could not create directory for '%s'"),
>> -			    zip_path.buf);
>> -		goto diagnose_cleanup;
> 
> Just spotting this now, but we had ad error, but we "goto
> diagnose_cleanup", but that will use our "res = 0" above.
> 
> Is this untested already or in this series (didn't go back to look). But
> maybe a moot point, the post-image replacement uses die()..

Nice catch - this does appear to be a pre-existing bug in 'scalar diagnose'.
Given that both 'git diagnose' and 'git bugreport --diagnose' handle this
case more appropriately, though, I agree that it's a bit of a moot point and
not worth the churn created by a bugfix patch.

> 
>> -	}
>> +	setup_enlistment_directory(argc, argv, usage, options, &diagnostics_root);
>> +	strbuf_addstr(&diagnostics_root, "/.scalarDiagnostics");
>>  
>> -	res = create_diagnostics_archive(&zip_path, 1);
>> +	if (run_git("diagnose", "--all", "-s", "%Y%m%d_%H%M%S",
>> +		    "-o", diagnostics_root.buf, NULL) < 0)
>> +		res = -1;
> 
> The code handling here seems really odd, issues:
> 
>  * This *can* return -1, if start_command() fails, but that's by far the
>    rarer case, usually it would be 0 or >0 (only <0 if we can't start
>    the command at all).
> 
>  * You should not be returning -1 from cmd_*() in general (we have
>    outstanding issues with it, but those should be fixed). It will yield
>    an exit code of 255 (but it's not portable)).
> 
>  * If you're going to return -1 at all, why override <0 with -1, just
>    "res = run_git(...)" instead?

Thanks for the info, I'll replace the hardcoded '-1' return value with
something derived from 'res' in the next version.

> 
> I think all-in-all this should be:
> 
> 	res = run_git(...);
> 
> Then:
> 
>>  
>> -diagnose_cleanup:
>> -	strbuf_release(&zip_path);
>> +	strbuf_release(&diagnostics_root);
>>  	return res;
> 
> 	return res < 0 ? -res : res;
> 
> Or whatever.

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 4, 2022

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Victoria Dye via GitGitGadget" <gitgitgadget@gmail.com> writes:

>  * (Almost) entirely redesigned the UI for generating diagnostics. The new
>    approach avoids cluttering 'git bugreport' with a mode that doesn't
>    actually generate a report. Now, there are distinct options for different
>    use cases: generating extra diagnostics with a bug report ('git bugreport
>    --diagnose') and generating diagnostics for personal debugging/addition
>    to an existing bug report ('git diagnose').

An additional command gives us far more design flexibility, and in
this case I think it may be worth it.  It has a risk of confusing
users between "git bugreport --diag" and "git diagnose --report" a
way to send a report with diagnostic information, though.

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 5, 2022

This patch series was integrated into seen via git@8f0a679.

@vdye
Copy link
Author

vdye commented Aug 12, 2022

/submit

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 12, 2022

Submitted as pull.1310.v4.git.1660335019.gitgitgadget@gmail.com

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git/ pr-1310/vdye/scalar/generalize-diagnose-v4

To fetch this version to local tag pr-1310/vdye/scalar/generalize-diagnose-v4:

git fetch --no-tags https://github.com/gitgitgadget/git/ tag pr-1310/vdye/scalar/generalize-diagnose-v4

@@ -932,6 +932,7 @@ LIB_OBJS += ctype.o
LIB_OBJS += date.o
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Victoria Dye via GitGitGadget" <gitgitgadget@gmail.com> writes:

> +	res = write_archive(archiver_args.nr, (const char **)argv_copy, NULL,
> +			    the_repository, NULL, 0);
> +	if (res) {
> +		error(_("failed to write archive"));
> +		goto diagnose_cleanup;
> +	}
> +
> +	fprintf(stderr, "\n"
> +		"Diagnostics complete.\n"
> +		"All of the gathered info is captured in '%s'\n",
> +		zip_path->buf);
> +
> +diagnose_cleanup:
> +	if (archiver_fd >= 0) {
> +		dup2(stdout_fd, STDOUT_FILENO);
> +		close(stdout_fd);
> +		close(archiver_fd);
> +	}

Hmph, after reading 

https://lore.kernel.org/git/32f2cadc-556e-1cd5-a238-c8f1cdaaf470@github.com/

I would have expected to see the above part more like:

                res = write_archive(...);

        diagnose_cleanup:
                if (res)
                        error(...);
                else
                        fprintf(stderr, "Diag complete");

                if (archiver_fd >= 0) {
                        ...

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Victoria Dye wrote (reply to this):

Junio C Hamano wrote:
> "Victoria Dye via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> +	res = write_archive(archiver_args.nr, (const char **)argv_copy, NULL,
>> +			    the_repository, NULL, 0);
>> +	if (res) {
>> +		error(_("failed to write archive"));
>> +		goto diagnose_cleanup;
>> +	}
>> +
>> +	fprintf(stderr, "\n"
>> +		"Diagnostics complete.\n"
>> +		"All of the gathered info is captured in '%s'\n",
>> +		zip_path->buf);
>> +
>> +diagnose_cleanup:
>> +	if (archiver_fd >= 0) {
>> +		dup2(stdout_fd, STDOUT_FILENO);
>> +		close(stdout_fd);
>> +		close(archiver_fd);
>> +	}
> 
> Hmph, after reading 
> 
> https://lore.kernel.org/git/32f2cadc-556e-1cd5-a238-c8f1cdaaf470@github.com/
> 
> I would have expected to see the above part more like:
> 
>                 res = write_archive(...);
> 
>         diagnose_cleanup:
>                 if (res)
>                         error(...);
>                 else
>                         fprintf(stderr, "Diag complete");
> 
>                 if (archiver_fd >= 0) {
>                         ...
> 

I originally planned to implement it this way, but instead went with adding
an error printout explicitly for failed 'add_directory_to_archiver()' calls
(in patch 6 [1]). I elaborated on the thought process/reasoning for
modifying the approach in the cover letter [2]:

> Improved error reporting in 'create_diagnostics_archive()'. I was
> originally going to modify the "failed to write archive" error to trigger
> whenever 'create_diagnostics_archive()' returned a nonzero value.
> However, while working on it I realized the message would no longer be
> tied to a failure of 'write_archive()', making it less helpful in
> pinpointing an issue. To address the original issue
> ('add_directory_to_archiver()' silently failing in
> 'create_diagnostics_archive()'), I instead refactored those calls into a
> loop and added the error message. Now, there's exactly one error message
> printed for each possible early exit scenario from
> 'create_diagnostics_archive()', hopefully avoiding both redundancy &
> under-reporting.

To add a bit more context: when I used the "move 'diagnose_cleanup'"
approach, I felt that the added message was either redundant or too general
to help a user identify an issue. Redundancy appeared when, for example,
'dup2()' returned a nonzero code; if that happened, we'd get the printouts:

$ git diagnose --suffix test
error: could not redirect output: <ERRNO error message>
error: could not write archive
error: unable to create diagnostics archive 'git-diagnostics-test.zip': <ERRNO error message>

The first two are from 'create_diagnostics_archive()', and the second
doesn't really give us information that we don't get out of the third (in
'builtin/diagnose.c').

Conversely, a failure in 'add_directory_to_archiver()' and 'write_archive()'
would give us the same printouts (at least within the scope of
'diagnose.c'/'builtin/diagnose.c'):

$ git diagnose --suffix test
error: could not write archive
error: unable to create diagnostics archive 'git-diagnostics-test.zip: <some error message>

Previously, "could not write archive" would point someone debugging to
'write_archive()'; now, it's unclear.

I ended up settling on adding the error message directly to the
'add_directory_to_archiver()' loop in patch 6 because it meant that:

1. 'create_diagnostics_archive()' would only ever print one error message;
   the others printed would indicate (IMO more clearly) where in the call
   stack the error happened
2. there would be a unique error message for each condition that caused
   'create_diagnostics_archive()' to exit early

Apologies for not sending another reply with these details before
re-rolling. I'll be more direct when changing plans in the future.

Thanks!

[1] https://lore.kernel.org/git/710b67e5776363d199ed5043d019386819d44e7e.1660335019.git.gitgitgadget@gmail.com/
[2] https://lore.kernel.org/git/pull.1310.v4.git.1660335019.gitgitgadget@gmail.com/

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

Victoria Dye <vdye@github.com> writes:

>> Improved error reporting in 'create_diagnostics_archive()'. I was
>> originally going to modify the "failed to write archive" error to trigger
>> whenever 'create_diagnostics_archive()' returned a nonzero value.
>> However, while working on it I realized the message would no longer be
>> tied to a failure of 'write_archive()', making it less helpful in
>> pinpointing an issue. To address the original issue
>> ('add_directory_to_archiver()' silently failing in
>> 'create_diagnostics_archive()'), I instead refactored those calls into a
>> loop and added the error message. Now, there's exactly one error message
>> printed for each possible early exit scenario from
>> 'create_diagnostics_archive()', hopefully avoiding both redundancy &
>> under-reporting.

Ah, I see.  I probably should have read the cover letter before
responding.  I try to understand the new iteration _without_ relying
on the cover letter first, to ensure that the resulting history is
still understandable; when I see something questionable, however, I
should see if cover letter gives more context and clues.  Sorry for
the noise.

@@ -11,8 +11,7 @@
#include "dir.h"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Victoria Dye via GitGitGadget" <gitgitgadget@gmail.com> writes:

> -int create_diagnostics_archive(struct strbuf *zip_path)
> +int create_diagnostics_archive(struct strbuf *zip_path, enum diagnose_mode mode)
>  {
>  	struct strvec archiver_args = STRVEC_INIT;
>  	char **argv_copy = NULL;
>  	int stdout_fd = -1, archiver_fd = -1;
>  	struct strbuf buf = STRBUF_INIT;
> -	int res;
> +	int res, i;
> +	struct archive_dir archive_dirs[] = {
> +		{ ".git", 0 },
> +		{ ".git/hooks", 0 },
> +		{ ".git/info", 0 },
> +		{ ".git/logs", 1 },
> +		{ ".git/objects/info", 0 }
> +	};
> +
> +	if (mode == DIAGNOSE_NONE) {
> +		res = 0;
> +		goto diagnose_cleanup;
> +	}
>  
>  	stdout_fd = dup(STDOUT_FILENO);
>  	if (stdout_fd < 0) {
> @@ -177,12 +194,18 @@ int create_diagnostics_archive(struct strbuf *zip_path)
>  	loose_objs_stats(&buf, ".git/objects");
>  	strvec_push(&archiver_args, buf.buf);
>  
> -	if ((res = add_directory_to_archiver(&archiver_args, ".git", 0)) ||
> -	    (res = add_directory_to_archiver(&archiver_args, ".git/hooks", 0)) ||
> -	    (res = add_directory_to_archiver(&archiver_args, ".git/info", 0)) ||
> -	    (res = add_directory_to_archiver(&archiver_args, ".git/logs", 1)) ||
> -	    (res = add_directory_to_archiver(&archiver_args, ".git/objects/info", 0)))
> -		goto diagnose_cleanup;
> +	/* Only include this if explicitly requested */
> +	if (mode == DIAGNOSE_ALL) {
> +		for (i = 0; i < ARRAY_SIZE(archive_dirs); i++) {
> +			if (add_directory_to_archiver(&archiver_args,
> +						      archive_dirs[i].path,
> +						      archive_dirs[i].recursive)) {
> +				res = error_errno(_("could not add directory '%s' to archiver"),
> +						  archive_dirs[i].path);
> +				goto diagnose_cleanup;
> +			}
> +		}
> +	}

Even without the "only include under DIAGNOSE_ALL" support added by
this step, the table-driven organization is much nicer.  The earlier
"move to common" step aimed to be as close as pure move, so this
step is our first opportunity to make this clean-up, so I do not
mind too much about this step doing two unrelated things (one is to
clean-up the if (A||B||C) into a loop over A, B and C, the other is
to introduce the diagnose_mode) at once.

Thanks.

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 12, 2022

This patch series was integrated into seen via git@edbcd27.

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 15, 2022

This patch series was integrated into seen via git@efda1e9.

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 15, 2022

This patch series was integrated into seen via git@9ff408f.

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 16, 2022

There was a status update in the "Cooking" section about the branch vd/scalar-generalize-diagnose on the Git mailing list:

The "diagnose" feature to create a zip archive for diagnostic
material has been lifted from "scalar" and made into a feature of
"git bugreport".

Will merge to 'next'?
source: <pull.1310.v4.git.1660335019.gitgitgadget@gmail.com>

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 16, 2022

This patch series was integrated into seen via git@3270574.

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 17, 2022

This patch series was integrated into seen via git@a56c316.

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 17, 2022

This patch series was integrated into seen via git@aecc965.

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 18, 2022

There was a status update in the "Cooking" section about the branch vd/scalar-generalize-diagnose on the Git mailing list:

The "diagnose" feature to create a zip archive for diagnostic
material has been lifted from "scalar" and made into a feature of
"git bugreport".

Will merge to 'next'?
source: <pull.1310.v4.git.1660335019.gitgitgadget@gmail.com>

@@ -53,6 +53,7 @@
/git-cvsimport
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Ævar Arnfjörð Bjarmason wrote (reply to this):

On Fri, Aug 12 2022, Victoria Dye via GitGitGadget wrote:

> From: Victoria Dye <vdye@github.com>
> [...]

This is correct:

> +'git diagnose' [(-o | --output-directory) <path>] [(-s | --suffix) <format>]

...

> +	N_("git diagnose [-o|--output-directory <path>] [-s|--suffix <format>]"),

But this is not, it's missing () around the short v.s. long option, and
we should have a space surrounding the "|" as well.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> On Fri, Aug 12 2022, Victoria Dye via GitGitGadget wrote:
>
>> From: Victoria Dye <vdye@github.com>
>> [...]
>
> This is correct:
>
>> +'git diagnose' [(-o | --output-directory) <path>] [(-s | --suffix) <format>]
>
> ...
>
>> +	N_("git diagnose [-o|--output-directory <path>] [-s|--suffix <format>]"),
>
> But this is not, it's missing () around the short v.s. long option, and
> we should have a space surrounding the "|" as well.

You are commenting on what appears inside N_() and I agree that it
should match the other one.

It is kind of sad that our usage strings do not allow proper
translation and instead force translaters to _know_ that they are
supposed to touch _only_ the placeholder strings (path and format in
this case).

Not your fault, or Victoria's, of course ;-)

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 18, 2022

This patch series was integrated into seen via git@b422958.

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 18, 2022

This patch series was integrated into next via git@bd8e574.

@gitgitgadget gitgitgadget bot added the next label Aug 18, 2022
@gitgitgadget
Copy link

gitgitgadget bot commented Aug 19, 2022

This patch series was integrated into seen via git@f28b562.

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 20, 2022

This patch series was integrated into seen via git@4dc2500.

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 20, 2022

There was a status update in the "Cooking" section about the branch vd/scalar-generalize-diagnose on the Git mailing list:

The "diagnose" feature to create a zip archive for diagnostic
material has been lifted from "scalar" and made into a feature of
"git bugreport".

Will merge to 'master'.
source: <pull.1310.v4.git.1660335019.gitgitgadget@gmail.com>

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 20, 2022

This patch series was integrated into seen via git@aaa07f6.

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 22, 2022

This patch series was integrated into seen via git@21aea5e.

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 24, 2022

This patch series was integrated into seen via git@f92d184.

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 25, 2022

This patch series was integrated into seen via git@460f5c1.

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 25, 2022

There was a status update in the "Cooking" section about the branch vd/scalar-generalize-diagnose on the Git mailing list:

The "diagnose" feature to create a zip archive for diagnostic
material has been lifted from "scalar" and made into a feature of
"git bugreport".

Will merge to 'master'.
source: <pull.1310.v4.git.1660335019.gitgitgadget@gmail.com>

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 25, 2022

This patch series was integrated into seen via git@f00ddc9.

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 25, 2022

This patch series was integrated into master via git@f00ddc9.

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 25, 2022

This patch series was integrated into next via git@f00ddc9.

@gitgitgadget gitgitgadget bot added the master label Aug 25, 2022
@gitgitgadget gitgitgadget bot closed this Aug 25, 2022
@gitgitgadget
Copy link

gitgitgadget bot commented Aug 25, 2022

Closed via f00ddc9.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant