Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Commit-graph write refactor (was: Create commit-graph file format v2) #112

17 changes: 0 additions & 17 deletions Documentation/technical/commit-graph.txt
Original file line number Diff line number Diff line change
Expand Up @@ -127,23 +127,6 @@ Design Details
helpful for these clones, anyway. The commit-graph will not be read or
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Ævar Arnfjörð Bjarmason wrote (reply to this):


On Wed, May 01 2019, Derrick Stolee via GitGitGadget wrote:

> The commit-graph feature began with a long list of planned
> benefits, most of which are now complete. The future work
> section has only a few items left.
>
> As for making more algorithms aware of generation numbers,
> some are only waiting for generation number v2 to ensure the
> performance matches the existing behavior using commit date.
>
> It is unlikely that we will ever send a commit-graph file
> as part of the protocol, since we would need to verify the
> data, and that is as expensive as writing a commit-graph from
> scratch. If we want to start trusting remote content, then
> that item can be investigated again.

My best of 3 times for "write" followed by "verify" on linux.git are
8.7/7.9 real/user for "write" and 5.2/4.9 real/user for "write".

So that's a reduction of ~40%. I have another big in-house repo where I
get similar numbers of 17/16 for "write" and 10/9 for "verify". Both for
a commit-graph file on the order of 50MB where it would be quicker for
me to download and verify it if the protocol supported it.

I'm not clamoring to make it part of the protocol, but the claim that
"verify" needs to do the equivalent of "write" seems to be demonstrably
wrong, or perhaps "verify" isn't doing all the work it should be doing?

> While there is more work to be done on the feature, having
> a section of the docs devoted to a TODO list is wasteful and
> hard to keep up-to-date.

Agreed, whatever we decide to do in the future I think it makes sense to
remove this section from the docs, although perhaps the commit message
should be amended per the above :)

> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  Documentation/technical/commit-graph.txt | 17 -----------------
>  1 file changed, 17 deletions(-)
>
> diff --git a/Documentation/technical/commit-graph.txt b/Documentation/technical/commit-graph.txt
> index 7805b0968c..fb53341d5e 100644
> --- a/Documentation/technical/commit-graph.txt
> +++ b/Documentation/technical/commit-graph.txt
> @@ -127,23 +127,6 @@ Design Details
>    helpful for these clones, anyway. The commit-graph will not be read or
>    written when shallow commits are present.
>
> -Future Work
> ------------
> -
> -- After computing and storing generation numbers, we must make graph
> -  walks aware of generation numbers to gain the performance benefits they
> -  enable. This will mostly be accomplished by swapping a commit-date-ordered
> -  priority queue with one ordered by generation number. The following
> -  operations are important candidates:
> -
> -    - 'log --topo-order'
> -    - 'tag --merged'
> -
> -- A server could provide a commit-graph file as part of the network protocol
> -  to avoid extra calculations by clients. This feature is only of benefit if
> -  the user is willing to trust the file, because verifying the file is correct
> -  is as hard as computing it from scratch.
> -
>  Related Links
>  -------------
>  [0] https://bugs.chromium.org/p/git/issues/detail?id=8

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 5/1/2019 10:58 AM, Ævar Arnfjörð Bjarmason wrote:
> 
> On Wed, May 01 2019, Derrick Stolee via GitGitGadget wrote:
> 
>> The commit-graph feature began with a long list of planned
>> benefits, most of which are now complete. The future work
>> section has only a few items left.
>>
>> As for making more algorithms aware of generation numbers,
>> some are only waiting for generation number v2 to ensure the
>> performance matches the existing behavior using commit date.
>>
>> It is unlikely that we will ever send a commit-graph file
>> as part of the protocol, since we would need to verify the
>> data, and that is as expensive as writing a commit-graph from
>> scratch. If we want to start trusting remote content, then
>> that item can be investigated again.
> 
> My best of 3 times for "write" followed by "verify" on linux.git are
> 8.7/7.9 real/user for "write" and 5.2/4.9 real/user for "write".
> 
> So that's a reduction of ~40%. I have another big in-house repo where I
> get similar numbers of 17/16 for "write" and 10/9 for "verify". Both for
> a commit-graph file on the order of 50MB where it would be quicker for
> me to download and verify it if the protocol supported it.

Keep in mind that your first "write" may have warmed up the file system
and your pack-files parsed faster the second time around.

You are right though, 'verify' doesn't do these things:

1. Sort a list of OIDs.
2. Write a file.

And perhaps some other things. I should mean that "the main task of
'git commit-graph verify' is to parse commits from the object store,
and this is the most expensive operation in 'git commit-graph write'."

Thanks,
-Stolee

written when shallow commits are present.

Future Work
-----------

- After computing and storing generation numbers, we must make graph
walks aware of generation numbers to gain the performance benefits they
enable. This will mostly be accomplished by swapping a commit-date-ordered
priority queue with one ordered by generation number. The following
operations are important candidates:

- 'log --topo-order'
- 'tag --merged'

- A server could provide a commit-graph file as part of the network protocol
to avoid extra calculations by clients. This feature is only of benefit if
the user is willing to trust the file, because verifying the file is correct
is as hard as computing it from scratch.

Related Links
-------------
[0] https://bugs.chromium.org/p/git/issues/detail?id=8
Expand Down
22 changes: 12 additions & 10 deletions builtin/commit-graph.c
Original file line number Diff line number Diff line change
Expand Up @@ -141,6 +141,8 @@ static int graph_write(int argc, const char **argv)
struct string_list *pack_indexes = NULL;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Derrick Stolee <dstolee@microsoft.com>
>
> The write_commit_graph() and write_commit_graph_reachable() methods
> currently take two boolean parameters: 'append' and 'report_progress'.
> We will soon expand the possible options to send to these methods, so
> instead of complicating the parameter list, first simplify it.
>
> Collapse these parameters into a 'flags' parameter, and adjust the
> callers to provide flags as necessary.

Nice.  It would make more sense for a collection of independent bits
to be in an unsigned, not signed integer variable, though.  Unless
you assign some special meaning to the top-most bit, that is.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Ævar Arnfjörð Bjarmason wrote (reply to this):


On Wed, May 01 2019, Derrick Stolee via GitGitGadget wrote:

> From: Derrick Stolee <dstolee@microsoft.com>
>
> The write_commit_graph() method uses die() to report failure and
> exit when confronted with an unexpected condition. This use of
> die() in a library function is incorrect and is now replaced by
> error() statements and an int return type.
>
> Now that we use 'goto cleanup' to jump to the terminal condition
> on an error, we have new paths that could lead to uninitialized
> values. New initializers are added to correct for this.
>
> The builtins 'commit-graph', 'gc', and 'commit' call these methods,
> so update them to check the return value.

Seems good to have a test to check for some of this behavior. I see that
can be done as just:

    echo doesnotexist | git commit-graph write --stdin-packs

And checking the exit code is 1 as it is now, not 128.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Derrick Stolee <dstolee@microsoft.com>
>
> The write_commit_graph() method uses die() to report failure and
> exit when confronted with an unexpected condition. This use of
> die() in a library function is incorrect and is now replaced by
> error() statements and an int return type.

OK.

> diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
> index 537fdfd0f0..2e86251f02 100644
> --- a/builtin/commit-graph.c
> +++ b/builtin/commit-graph.c
> @@ -141,6 +141,7 @@ static int graph_write(int argc, const char **argv)
>  	struct string_list *pack_indexes = NULL;
>  	struct string_list *commit_hex = NULL;
>  	struct string_list lines;
> +	int result;
>  
>  	static struct option builtin_commit_graph_write_options[] = {
>  		OPT_STRING(0, "object-dir", &opts.obj_dir,
> @@ -168,10 +169,8 @@ static int graph_write(int argc, const char **argv)
>  
>  	read_replace_refs = 0;
>  
> -	if (opts.reachable) {
> -		write_commit_graph_reachable(opts.obj_dir, opts.append, 1);
> -		return 0;
> -	}
> +	if (opts.reachable)
> +		return write_commit_graph_reachable(opts.obj_dir, opts.append, 1);
>  
>  	string_list_init(&lines, 0);
>  	if (opts.stdin_packs || opts.stdin_commits) {
> @@ -188,14 +187,14 @@ static int graph_write(int argc, const char **argv)
>  		UNLEAK(buf);
>  	}
>  
> -	write_commit_graph(opts.obj_dir,
> -			   pack_indexes,
> -			   commit_hex,
> -			   opts.append,
> -			   1);
> +	result = write_commit_graph(opts.obj_dir,
> +				    pack_indexes,
> +				    commit_hex,
> +				    opts.append,
> +				    1);
>  
>  	UNLEAK(lines);
> -	return 0;
> +	return result;
>  }

What were the error values this function used to return?  I am
wondering if the callers of this function are prepraed to see the
returned values from write_commit_graph() this function stores in
'result' (which presumably are small negative value like our usual
internal API convention)?


> diff --git a/builtin/commit.c b/builtin/commit.c
> index 2986553d5f..b9ea7222fa 100644
> --- a/builtin/commit.c
> +++ b/builtin/commit.c
> @@ -1669,8 +1669,9 @@ int cmd_commit(int argc, const char **argv, const char *prefix)
>  		      "new_index file. Check that disk is not full and quota is\n"
>  		      "not exceeded, and then \"git reset HEAD\" to recover."));
>  
> -	if (git_env_bool(GIT_TEST_COMMIT_GRAPH, 0))
> -		write_commit_graph_reachable(get_object_directory(), 0, 0);
> +	if (git_env_bool(GIT_TEST_COMMIT_GRAPH, 0) &&
> +	    write_commit_graph_reachable(get_object_directory(), 0, 0))
> +		return 1;

This is good.  An error signalled as a small negative integer would
not seep thru to the exit status but is explicitly turned into 1
with this change.

> +	if (gc_write_commit_graph &&
> +	    write_commit_graph_reachable(get_object_directory(), 0,
> +					 !quiet && !daemonized))
> +		return 1;

Ditto.

> +int write_commit_graph_reachable(const char *obj_dir, int append,
> +				 int report_progress)
>  {
>  	struct string_list list = STRING_LIST_INIT_DUP;
> +	int result;
>  
>  	for_each_ref(add_ref_to_list, &list);
> -	write_commit_graph(obj_dir, NULL, &list, append, report_progress);
> +	result = write_commit_graph(obj_dir, NULL, &list,
> +				    append, report_progress);
>  
>  	string_list_clear(&list, 0);
> +	return result;
>  }

OK.  The callers of write_commit_graph_reachable() can be careful
about its return values to the same degree as the callers of
write_commit_graph().

These functions perhaps deserves
/*
 * returns X when ....
 */
in front (or in *.h)?

> +int write_commit_graph(const char *obj_dir,
> +		       struct string_list *pack_indexes,
> +		       struct string_list *commit_hex,
> +		       int append, int report_progress)
>  {
>  	struct packed_oid_list oids;
>  	struct packed_commit_list commits;
>  	struct hashfile *f;
>  	uint32_t i, count_distinct = 0;
> -	char *graph_name;
> +	char *graph_name = NULL;
>  	struct lock_file lk = LOCK_INIT;
>  	uint32_t chunk_ids[5];
>  	uint64_t chunk_offsets[5];
> @@ -883,15 +886,17 @@ void write_commit_graph(const char *obj_dir,
>  	uint64_t progress_cnt = 0;
>  	struct strbuf progress_title = STRBUF_INIT;
>  	unsigned long approx_nr_objects;
> +	int res = 0;
>  
>  	if (!commit_graph_compatible(the_repository))
> -		return;
> +		return 0;

OK.  I tend to find "return 0" easier to read/follow than "return
res" here.

>  	oids.nr = 0;
>  	approx_nr_objects = approximate_object_count();
>  	oids.alloc = approx_nr_objects / 32;
>  	oids.progress = NULL;
>  	oids.progress_done = 0;
> +	commits.list = NULL;
>  
>  	if (append) {
>  		prepare_commit_graph_one(the_repository, obj_dir);
> @@ -932,10 +937,16 @@ void write_commit_graph(const char *obj_dir,
>  			strbuf_setlen(&packname, dirlen);
>  			strbuf_addstr(&packname, pack_indexes->items[i].string);
>  			p = add_packed_git(packname.buf, packname.len, 1);
> -			if (!p)
> -				die(_("error adding pack %s"), packname.buf);
> -			if (open_pack_index(p))
> -				die(_("error opening index for %s"), packname.buf);
> +			if (!p) {
> +				error(_("error adding pack %s"), packname.buf);
> +				res = 1;
> +				goto cleanup;
> +			}
> +			if (open_pack_index(p)) {
> +				error(_("error opening index for %s"), packname.buf);
> +				res = 1;
> +				goto cleanup;
> +			}

Hmph, is this signal an error by returning a positive "1"?  That's a
bit unusual.

> @@ -1006,8 +1017,11 @@ void write_commit_graph(const char *obj_dir,
>  	}
>  	stop_progress(&progress);
>  
> -	if (count_distinct >= GRAPH_EDGE_LAST_MASK)
> -		die(_("the commit graph format cannot write %d commits"), count_distinct);
> +	if (count_distinct >= GRAPH_EDGE_LAST_MASK) {
> +		error(_("the commit graph format cannot write %d commits"), count_distinct);
> +		res = 1;
> +		goto cleanup;
> +	}
>  
>  	commits.nr = 0;
>  	commits.alloc = count_distinct;
> @@ -1039,16 +1053,21 @@ void write_commit_graph(const char *obj_dir,
>  	num_chunks = num_extra_edges ? 4 : 3;
>  	stop_progress(&progress);
>  
> -	if (commits.nr >= GRAPH_EDGE_LAST_MASK)
> -		die(_("too many commits to write graph"));
> +	if (commits.nr >= GRAPH_EDGE_LAST_MASK) {
> +		error(_("too many commits to write graph"));
> +		res = 1;
> +		goto cleanup;
> +	}
>  
>  	compute_generation_numbers(&commits, report_progress);
>  
>  	graph_name = get_commit_graph_filename(obj_dir);
>  	if (safe_create_leading_directories(graph_name)) {
>  		UNLEAK(graph_name);
> -		die_errno(_("unable to create leading directories of %s"),
> -			  graph_name);
> +		error(_("unable to create leading directories of %s"),
> +			graph_name);
> +		res = errno;
> +		goto cleanup;
>  	}

Hmph.  Do we know errno==0 means no error everywhere?  Do we know
errno==1 is not used by anybody as a meaningful value?

What I am getting at is if a hardcoded "1" we saw above as "error
exists but we are not telling the caller what kind of system-level
error led to it by returning errno" (and a hardcoded "0" as "there
is no error") are consistent with this use of "res" where "the
callers are allowed to learn what system-level error led to this
error return from this function by sending the return value of this
function to strerror() or comparing with EWHATEVER".  I do not think
this is a good design.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 5/12/2019 11:13 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
>> diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
>> @@ -188,14 +187,14 @@ static int graph_write(int argc, const char **argv)
>>  		UNLEAK(buf);
>>  	}
>>  
>> -	write_commit_graph(opts.obj_dir,
>> -			   pack_indexes,
>> -			   commit_hex,
>> -			   opts.append,
>> -			   1);
>> +	result = write_commit_graph(opts.obj_dir,
>> +				    pack_indexes,
>> +				    commit_hex,
>> +				    opts.append,
>> +				    1);
>>  
>>  	UNLEAK(lines);
>> -	return 0;
>> +	return result;
>>  }
> 
> What were the error values this function used to return?  I am
> wondering if the callers of this function are prepraed to see the
> returned values from write_commit_graph() this function stores in
> 'result' (which presumably are small negative value like our usual
> internal API convention)?

The only caller is cmd_commit_graph() and it is in this snippet:

        if (argc > 0) {
                if (!strcmp(argv[0], "read"))
                        return graph_read(argc, argv);
                if (!strcmp(argv[0], "verify"))
                        return graph_verify(argc, argv);
                if (!strcmp(argv[0], "write"))
                        return graph_write(argc, argv);
        }

So these return values are passed directly to the result of the
builtin. If that is against convention (passing an error code from
the library to the result of the builtin) then I can modify.

> OK.  The callers of write_commit_graph_reachable() can be careful
> about its return values to the same degree as the callers of
> write_commit_graph().
> 
> These functions perhaps deserves
> /*
>  * returns X when ....
>  */
> in front (or in *.h)?

Can do, in commit-graph.h.

>> +int write_commit_graph(const char *obj_dir,
>> +		       struct string_list *pack_indexes,
>> +		       struct string_list *commit_hex,
>> +		       int append, int report_progress)
>>  {
>>  	struct packed_oid_list oids;
>>  	struct packed_commit_list commits;
>>  	struct hashfile *f;
>>  	uint32_t i, count_distinct = 0;
>> -	char *graph_name;
>> +	char *graph_name = NULL;
>>  	struct lock_file lk = LOCK_INIT;
>>  	uint32_t chunk_ids[5];
>>  	uint64_t chunk_offsets[5];
>> @@ -883,15 +886,17 @@ void write_commit_graph(const char *obj_dir,
>>  	uint64_t progress_cnt = 0;
>>  	struct strbuf progress_title = STRBUF_INIT;
>>  	unsigned long approx_nr_objects;
>> +	int res = 0;
>>  
>>  	if (!commit_graph_compatible(the_repository))
>> -		return;
>> +		return 0;
> 
> OK.  I tend to find "return 0" easier to read/follow than "return
> res" here.

Yes, this choice was deliberate as there is no cleanup to do if we
return this early. Also note that we don't "fail" because we did
exactly as much work as we expect in this scenario. I'll be careful
to point this out when I add a comment to the header file.

>>  	oids.nr = 0;
>>  	approx_nr_objects = approximate_object_count();
>>  	oids.alloc = approx_nr_objects / 32;
>>  	oids.progress = NULL;
>>  	oids.progress_done = 0;
>> +	commits.list = NULL;
>>  
>>  	if (append) {
>>  		prepare_commit_graph_one(the_repository, obj_dir);
>> @@ -932,10 +937,16 @@ void write_commit_graph(const char *obj_dir,
>>  			strbuf_setlen(&packname, dirlen);
>>  			strbuf_addstr(&packname, pack_indexes->items[i].string);
>>  			p = add_packed_git(packname.buf, packname.len, 1);
>> -			if (!p)
>> -				die(_("error adding pack %s"), packname.buf);
>> -			if (open_pack_index(p))
>> -				die(_("error opening index for %s"), packname.buf);
>> +			if (!p) {
>> +				error(_("error adding pack %s"), packname.buf);
>> +				res = 1;
>> +				goto cleanup;
>> +			}
>> +			if (open_pack_index(p)) {
>> +				error(_("error opening index for %s"), packname.buf);
>> +				res = 1;
>> +				goto cleanup;
>> +			}
> 
> Hmph, is this signal an error by returning a positive "1"?  That's a
> bit unusual.

Your hint above of "passing a negative value by convention" did make me
think I must be doing something wrong.

>> @@ -1006,8 +1017,11 @@ void write_commit_graph(const char *obj_dir,
>>  	}
>>  	stop_progress(&progress);
>>  
>> -	if (count_distinct >= GRAPH_EDGE_LAST_MASK)
>> -		die(_("the commit graph format cannot write %d commits"), count_distinct);
>> +	if (count_distinct >= GRAPH_EDGE_LAST_MASK) {
>> +		error(_("the commit graph format cannot write %d commits"), count_distinct);
>> +		res = 1;
>> +		goto cleanup;
>> +	}
>>  
>>  	commits.nr = 0;
>>  	commits.alloc = count_distinct;
>> @@ -1039,16 +1053,21 @@ void write_commit_graph(const char *obj_dir,
>>  	num_chunks = num_extra_edges ? 4 : 3;
>>  	stop_progress(&progress);
>>  
>> -	if (commits.nr >= GRAPH_EDGE_LAST_MASK)
>> -		die(_("too many commits to write graph"));
>> +	if (commits.nr >= GRAPH_EDGE_LAST_MASK) {
>> +		error(_("too many commits to write graph"));
>> +		res = 1;
>> +		goto cleanup;
>> +	}
>>  
>>  	compute_generation_numbers(&commits, report_progress);
>>  
>>  	graph_name = get_commit_graph_filename(obj_dir);
>>  	if (safe_create_leading_directories(graph_name)) {
>>  		UNLEAK(graph_name);
>> -		die_errno(_("unable to create leading directories of %s"),
>> -			  graph_name);
>> +		error(_("unable to create leading directories of %s"),
>> +			graph_name);
>> +		res = errno;
>> +		goto cleanup;
>>  	}
> 
> Hmph.  Do we know errno==0 means no error everywhere?  Do we know
> errno==1 is not used by anybody as a meaningful value?
> 
> What I am getting at is if a hardcoded "1" we saw above as "error
> exists but we are not telling the caller what kind of system-level
> error led to it by returning errno" (and a hardcoded "0" as "there
> is no error") are consistent with this use of "res" where "the
> callers are allowed to learn what system-level error led to this
> error return from this function by sending the return value of this
> function to strerror() or comparing with EWHATEVER".  I do not think
> this is a good design.

That's a good point. In a new design, would you like me to (1) ignore
errno here and use a constant value for "write_commit_graph() failed
at some point" or to (2) split the possible _reasons_ for the failure
into different constants? I believe the use of error() should prevent
the need for the second option. The first option would only change
this 'res = errno' into 'res = 1'.

Thanks,
-Stolee

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 5/13/2019 7:04 AM, Derrick Stolee wrote:
> On 5/12/2019 11:13 PM, Junio C Hamano wrote:
>> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
>>
>>> diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
>>> @@ -188,14 +187,14 @@ static int graph_write(int argc, const char **argv)
>>>  		UNLEAK(buf);
>>>  	}
>>>  
>>> -	write_commit_graph(opts.obj_dir,
>>> -			   pack_indexes,
>>> -			   commit_hex,
>>> -			   opts.append,
>>> -			   1);
>>> +	result = write_commit_graph(opts.obj_dir,
>>> +				    pack_indexes,
>>> +				    commit_hex,
>>> +				    opts.append,
>>> +				    1);
>>>  
>>>  	UNLEAK(lines);
>>> -	return 0;
>>> +	return result;
>>>  }
>>
>> What were the error values this function used to return?  I am
>> wondering if the callers of this function are prepraed to see the
>> returned values from write_commit_graph() this function stores in
>> 'result' (which presumably are small negative value like our usual
>> internal API convention)?
> 
> The only caller is cmd_commit_graph() and it is in this snippet:
> 
>         if (argc > 0) {
>                 if (!strcmp(argv[0], "read"))
>                         return graph_read(argc, argv);
>                 if (!strcmp(argv[0], "verify"))
>                         return graph_verify(argc, argv);
>                 if (!strcmp(argv[0], "write"))
>                         return graph_write(argc, argv);
>         }
> 
> So these return values are passed directly to the result of the
> builtin. If that is against convention (passing an error code from
> the library to the result of the builtin) then I can modify.

And I see from your other feedback (upon re-reading) that you prefer
translating a negative error value from the library into a "1" here
for the builtin.

As I prepare my next version, I'll have write_commit_graph() return -1
for all errors and have graph_write() translate that to a 1. But I'll
wait to see if you want more specific error codes from write_commit_graph().

-Stolee

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Derrick Stolee <dstolee@microsoft.com>
>
> The write_commit_graph() and write_commit_graph_reachable() methods
> currently take two boolean parameters: 'append' and 'report_progress'.
> We will soon expand the possible options to send to these methods, so
> instead of complicating the parameter list, first simplify it.

I think this change to introduce "flags" and pack these two into a
single parameter, even if there is no plan to add code that starts
using third and subsequent bits immediately.

We are no longer adding anything beyond PROGRESS and APPEND in this
series, no?

>
> Collapse these parameters into a 'flags' parameter, and adjust the
> callers to provide flags as necessary.
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  builtin/commit-graph.c | 8 +++++---
>  builtin/commit.c       | 2 +-
>  builtin/gc.c           | 4 ++--
>  commit-graph.c         | 9 +++++----
>  commit-graph.h         | 8 +++++---
>  5 files changed, 18 insertions(+), 13 deletions(-)
>
> diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
> index 2e86251f02..828b1a713f 100644
> --- a/builtin/commit-graph.c
> +++ b/builtin/commit-graph.c
> @@ -142,6 +142,7 @@ static int graph_write(int argc, const char **argv)
>  	struct string_list *commit_hex = NULL;
>  	struct string_list lines;
>  	int result;
> +	int flags = COMMIT_GRAPH_PROGRESS;

Make it a habit to use "unsigned" not a signed type, when you pack a
collection of bits into a flag word, unless you are treating the MSB
specially, e.g. checking to see if it is negative is cheaper than
masking with MSB to see if it is set.

> ...
>  	result = write_commit_graph(opts.obj_dir,
>  				    pack_indexes,
>  				    commit_hex,
> -				    opts.append,
> -				    1);
> +				    flags);
> ...
> -int write_commit_graph_reachable(const char *obj_dir, int append,
> -				 int report_progress)
> +int write_commit_graph_reachable(const char *obj_dir, unsigned int flags)
> ...
>  int write_commit_graph(const char *obj_dir,
>  		       struct string_list *pack_indexes,
>  		       struct string_list *commit_hex,
> -		       int append, int report_progress)
> +		       unsigned int flags)

OK, so the receivers of the flags word know the collection is
unsigned; it's just the user of the API in graph_write() that gets
the signedness wrong.  OK, easy enough to correct, I guess.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 5/12/2019 11:44 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> From: Derrick Stolee <dstolee@microsoft.com>
>>
>> The write_commit_graph() and write_commit_graph_reachable() methods
>> currently take two boolean parameters: 'append' and 'report_progress'.
>> We will soon expand the possible options to send to these methods, so
>> instead of complicating the parameter list, first simplify it.
> 
> I think this change to introduce "flags" and pack these two into a
> single parameter, even if there is no plan to add code that starts
> using third and subsequent bits immediately.
> 
> We are no longer adding anything beyond PROGRESS and APPEND in this
> series, no?

In this series, we are no longer expanding the options. I will add
a flag when I update the incremental file format series. I can modify
the message to no longer hint at an immediate addition.

>>
>> Collapse these parameters into a 'flags' parameter, and adjust the
>> callers to provide flags as necessary.
>>
>> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
>> ---
>>  builtin/commit-graph.c | 8 +++++---
>>  builtin/commit.c       | 2 +-
>>  builtin/gc.c           | 4 ++--
>>  commit-graph.c         | 9 +++++----
>>  commit-graph.h         | 8 +++++---
>>  5 files changed, 18 insertions(+), 13 deletions(-)
>>
>> diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
>> index 2e86251f02..828b1a713f 100644
>> --- a/builtin/commit-graph.c
>> +++ b/builtin/commit-graph.c
>> @@ -142,6 +142,7 @@ static int graph_write(int argc, const char **argv)
>>  	struct string_list *commit_hex = NULL;
>>  	struct string_list lines;
>>  	int result;
>> +	int flags = COMMIT_GRAPH_PROGRESS;
> 
> Make it a habit to use "unsigned" not a signed type, when you pack a
> collection of bits into a flag word, unless you are treating the MSB
> specially, e.g. checking to see if it is negative is cheaper than
> masking with MSB to see if it is set.

Ah sorry. I missed this one after changing the parameter in your earlier
feedback.
 
>> ...
>>  	result = write_commit_graph(opts.obj_dir,
>>  				    pack_indexes,
>>  				    commit_hex,
>> -				    opts.append,
>> -				    1);
>> +				    flags);
>> ...
>> -int write_commit_graph_reachable(const char *obj_dir, int append,
>> -				 int report_progress)
>> +int write_commit_graph_reachable(const char *obj_dir, unsigned int flags)
>> ...
>>  int write_commit_graph(const char *obj_dir,
>>  		       struct string_list *pack_indexes,
>>  		       struct string_list *commit_hex,
>> -		       int append, int report_progress)
>> +		       unsigned int flags)
> 
> OK, so the receivers of the flags word know the collection is
> unsigned; it's just the user of the API in graph_write() that gets
> the signedness wrong.  OK, easy enough to correct, I guess.
> 

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, SZEDER Gábor wrote (reply to this):

On Wed, Jun 12, 2019 at 06:29:37AM -0700, Derrick Stolee via GitGitGadget wrote:
> diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
> index e80c1cac02..3b6fd0d728 100755
> --- a/t/t5318-commit-graph.sh
> +++ b/t/t5318-commit-graph.sh
> @@ -23,6 +23,14 @@ test_expect_success 'write graph with no packs' '
>  	test_path_is_file info/commit-graph
>  '
>  
> +test_expect_success 'close with correct error on bad input' '
> +	cd "$TRASH_DIRECTORY/full" &&
> +	echo doesnotexist >in &&
> +	{ git commit-graph write --stdin-packs <in 2>stderr; ret=$?; } &&
> +	test "$ret" = 1 &&

This could be: 

  test_expect_code 1 git commit-graph write --stdin-packs <in 2>stderr


> +	test_i18ngrep "error adding pack" stderr
> +'

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 6/29/2019 1:23 PM, SZEDER Gábor wrote:
> On Wed, Jun 12, 2019 at 06:29:37AM -0700, Derrick Stolee via GitGitGadget wrote:
>> diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
>> index e80c1cac02..3b6fd0d728 100755
>> --- a/t/t5318-commit-graph.sh
>> +++ b/t/t5318-commit-graph.sh
>> @@ -23,6 +23,14 @@ test_expect_success 'write graph with no packs' '
>>  	test_path_is_file info/commit-graph
>>  '
>>  
>> +test_expect_success 'close with correct error on bad input' '
>> +	cd "$TRASH_DIRECTORY/full" &&
>> +	echo doesnotexist >in &&
>> +	{ git commit-graph write --stdin-packs <in 2>stderr; ret=$?; } &&
>> +	test "$ret" = 1 &&
> 
> This could be: 
> 
>   test_expect_code 1 git commit-graph write --stdin-packs <in 2>stderr
> 
> 
>> +	test_i18ngrep "error adding pack" stderr
>> +'

Thanks!, you are right! test_expect_code is what I should have used here
instead of finding the "ret=$?" trick in t0005-signals.sh, which needs to
do more interesting logic on the return code.

Here is your suggestion as a diff. Junio: could you squash this in, or
should I submit a full patch?

diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index 22cb9d66430..4391007f4c1 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -26,8 +26,7 @@ test_expect_success 'write graph with no packs' '
 test_expect_success 'close with correct error on bad input' '
        cd "$TRASH_DIRECTORY/full" &&
        echo doesnotexist >in &&
-       { git commit-graph write --stdin-packs <in 2>stderr; ret=$?; } &&
-       test "$ret" = 1 &&
+       test_expect_code 1 git commit-graph write --stdin-packs <in 2>stderr &&
        test_i18ngrep "error adding pack" stderr
 '

I took inventory of when we are using "=$?" in the test scripts and saw
this was the only one that could easily be removed. Every other place is
doing something that can't be replaced by test_expect_code.

Thanks,
-Stolee

struct string_list *commit_hex = NULL;
struct string_list lines;
int result = 0;
unsigned int flags = COMMIT_GRAPH_PROGRESS;

static struct option builtin_commit_graph_write_options[] = {
OPT_STRING(0, "object-dir", &opts.obj_dir,
Expand All @@ -165,13 +167,13 @@ static int graph_write(int argc, const char **argv)
die(_("use at most one of --reachable, --stdin-commits, or --stdin-packs"));
if (!opts.obj_dir)
opts.obj_dir = get_object_directory();
if (opts.append)
flags |= COMMIT_GRAPH_APPEND;

read_replace_refs = 0;

if (opts.reachable) {
write_commit_graph_reachable(opts.obj_dir, opts.append, 1);
return 0;
}
if (opts.reachable)
return write_commit_graph_reachable(opts.obj_dir, flags);

string_list_init(&lines, 0);
if (opts.stdin_packs || opts.stdin_commits) {
Expand All @@ -188,14 +190,14 @@ static int graph_write(int argc, const char **argv)
UNLEAK(buf);
}

write_commit_graph(opts.obj_dir,
pack_indexes,
commit_hex,
opts.append,
1);
if (write_commit_graph(opts.obj_dir,
pack_indexes,
commit_hex,
flags))
result = 1;

UNLEAK(lines);
return 0;
return result;
}

int cmd_commit_graph(int argc, const char **argv, const char *prefix)
Expand Down
5 changes: 3 additions & 2 deletions builtin/commit.c
Original file line number Diff line number Diff line change
Expand Up @@ -1669,8 +1669,9 @@ int cmd_commit(int argc, const char **argv, const char *prefix)
"new_index file. Check that disk is not full and quota is\n"
"not exceeded, and then \"git reset HEAD\" to recover."));

if (git_env_bool(GIT_TEST_COMMIT_GRAPH, 0))
write_commit_graph_reachable(get_object_directory(), 0, 0);
if (git_env_bool(GIT_TEST_COMMIT_GRAPH, 0) &&
write_commit_graph_reachable(get_object_directory(), 0))
return 1;

repo_rerere(the_repository, 0);
run_command_v_opt(argv_gc_auto, RUN_GIT_CMD);
Expand Down
7 changes: 4 additions & 3 deletions builtin/gc.c
Original file line number Diff line number Diff line change
Expand Up @@ -664,9 +664,10 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
clean_pack_garbage();
}

if (gc_write_commit_graph)
write_commit_graph_reachable(get_object_directory(), 0,
!quiet && !daemonized);
if (gc_write_commit_graph &&
write_commit_graph_reachable(get_object_directory(),
!quiet && !daemonized ? COMMIT_GRAPH_PROGRESS : 0))
return 1;

if (auto_gc && too_many_loose_objects())
warning(_("There are too many unreachable loose objects; "
Expand Down
Loading