Maintenance builtin, allowing 'gc --auto' customization #671

gitgitgadget · 2020-07-25T01:30:44Z

On the Git mailing list, Taylor Blau wrote (reply to this):

On Thu, Jul 23, 2020 at 05:56:23PM +0000, Derrick Stolee via GitGitGadget wrote: > From: Derrick Stolee <dstolee@microsoft.com> Everything in this patch looks very sensible to me, and I think that implementing 'git maintenance' in the same file as the current 'git gc' builtin makes a lot of sense to me. Reviewed-by: Taylor Blau <me@ttaylorr.com> Thanks, Taylor

gitgitgadget · 2020-07-25T01:48:27Z

On the Git mailing list, Đoàn Trần Công Danh wrote (reply to this):

On 2020-07-23 17:56:23+0000, Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com> wrote: > From: Derrick Stolee <dstolee@microsoft.com> > diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh > new file mode 100755 > index 0000000000..d00641c4dd > --- /dev/null > +++ b/t/t7900-maintenance.sh > @@ -0,0 +1,22 @@ > +#!/bin/sh > + > +test_description='git maintenance builtin' > + > +GIT_TEST_COMMIT_GRAPH=0 > +GIT_TEST_MULTI_PACK_INDEX=0 > + > +. ./test-lib.sh > + > +test_expect_success 'help text' ' > + test_must_fail git maintenance -h 2>err && > + test_i18ngrep "usage: git maintenance run" err > +' I think this test has been tested better already in t0012, Anyway, if we would like to check the word "git maintenance run" explicitly here, it would be clearer to s/test_must_fail/test_expect_code 129/ > + > +test_expect_success 'gc [--auto]' ' > + GIT_TRACE2_EVENT="$(pwd)/run-no-auto.txt" git maintenance run && > + GIT_TRACE2_EVENT="$(pwd)/run-auto.txt" git maintenance run --auto && > + grep ",\"gc\"]" run-no-auto.txt && > + grep ",\"gc\",\"--auto\"]" run-auto.txt > +' > + > +test_done > -- > gitgitgadget > -- Danh

gitgitgadget · 2020-07-29T22:23:14Z

On the Git mailing list, Jonathan Nieder wrote (reply to this):

Hi, Derrick Stolee wrote: > [Subject: maintenance: create basic maintenance runner] Seems sensible, and a good way to set up for the later patches. Let's take a look at how it does that. [...] > --- /dev/null > +++ b/Documentation/git-maintenance.txt > @@ -0,0 +1,57 @@ [...] > +SUBCOMMANDS > +----------- > + > +run:: > + Run one or more maintenance tasks. [jrnieder] How do I supply the tasks on the command line? Are they parameters to this subcommand? If so, it could make sense for this to say something like run <task>...:: What is the exit code convention for "git maintenance run"? (Many Git commands don't document this, but since we're making a new command it seems worth building the habit.) [...] > --- a/builtin/gc.c > +++ b/builtin/gc.c > @@ -699,3 +699,62 @@ int cmd_gc(int argc, const char **argv, const char *prefix) [...] > +static struct maintenance_opts { > + int auto_flag; > +} opts; Packing this in a struct feels a bit unusual. Is the struct going to be passed somewhere, or could these be individual locals in cmd_maintenance? [...] > + > +static int maintenance_task_gc(void) > +{ > + int result; > + struct argv_array cmd = ARGV_ARRAY_INIT; > + > + argv_array_pushl(&cmd, "gc", NULL); > + > + if (opts.auto_flag) > + argv_array_pushl(&cmd, "--auto", NULL); These are both pushing single strings, so they can use argv_array_push. [...] > --- /dev/null > +++ b/t/t7900-maintenance.sh > @@ -0,0 +1,22 @@ > +#!/bin/sh > + > +test_description='git maintenance builtin' > + > +GIT_TEST_COMMIT_GRAPH=0 > +GIT_TEST_MULTI_PACK_INDEX=0 Why does this disable commit graph and multipack index? Is that setting up for something that comes later? [...] > +test_expect_success 'gc [--auto]' ' > + GIT_TRACE2_EVENT="$(pwd)/run-no-auto.txt" git maintenance run && > + GIT_TRACE2_EVENT="$(pwd)/run-auto.txt" git maintenance run --auto && > + grep ",\"gc\"]" run-no-auto.txt && > + grep ",\"gc\",\"--auto\"]" run-auto.txt This feels a little odd in two ways: - the use of "git gc" isn't a user-facing detail, so this is testing implementation instead of the user-facing behavior. That's okay, though --- it can be useful to test internals sometimes. - the way that this tests for "git gc" feels brittle: if the trace emitter changes some day to include a space after the comma, for example, then the test would start failing. I thought that in the spirit of fakes, it could make sense to write a custom "git gc" command using test_write_script, but that isn't likely to work either since gc is a builtin. Perhaps this is suggesting that we need some central test helper for parsing traces so we can do this reliably in one place. What do you think? Thanks, Jonathan

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 7/29/2020 6:19 PM, Jonathan Nieder wrote: > Hi, > > Derrick Stolee wrote: > >> [Subject: maintenance: create basic maintenance runner] > > Seems sensible, and a good way to set up for the later patches. Let's > take a look at how it does that. > > [...] >> --- /dev/null >> +++ b/Documentation/git-maintenance.txt >> @@ -0,0 +1,57 @@ > [...] >> +SUBCOMMANDS >> +----------- >> + >> +run:: >> + Run one or more maintenance tasks. > > [jrnieder] How do I supply the tasks on the command line? Are they > parameters to this subcommand? If so, it could make sense for this to > say something like > > run <task>...:: Hopefully this is documented to your satisfaction when the ability to customize the tasks is implemented. > What is the exit code convention for "git maintenance run"? (Many Git > commands don't document this, but since we're making a new command > it seems worth building the habit.) Is this worth doing? Do we really want every command to document that "0 means success, everything else means failure, and some of those exit codes mean a specific kind of failure that is global to Git"? > [...] >> --- a/builtin/gc.c >> +++ b/builtin/gc.c >> @@ -699,3 +699,62 @@ int cmd_gc(int argc, const char **argv, const char *prefix) > [...] >> +static struct maintenance_opts { >> + int auto_flag; >> +} opts; > > Packing this in a struct feels a bit unusual. Is the struct going to > be passed somewhere, or could these be individual locals in > cmd_maintenance? This will grow, and I'd rather have one global struct than many individual global items. It makes it clearer when I use "opts.auto_flag" that this corresponds to whether "--auto" was provided as a command-line option. > [...] >> + >> +static int maintenance_task_gc(void) >> +{ >> + int result; >> + struct argv_array cmd = ARGV_ARRAY_INIT; >> + >> + argv_array_pushl(&cmd, "gc", NULL); >> + >> + if (opts.auto_flag) >> + argv_array_pushl(&cmd, "--auto", NULL); > > These are both pushing single strings, so they can use argv_array_push. Thanks. I noticed a few of these myself. Luckily, I'll be going through all of these invocations and replacing them with new methods soon ;) [1] https://lore.kernel.org/git/30933a71-3130-5478-cbfd-0ca5bb308cf2@gmail.com/ > [...] >> --- /dev/null >> +++ b/t/t7900-maintenance.sh >> @@ -0,0 +1,22 @@ >> +#!/bin/sh >> + >> +test_description='git maintenance builtin' >> + >> +GIT_TEST_COMMIT_GRAPH=0 >> +GIT_TEST_MULTI_PACK_INDEX=0 > > Why does this disable commit graph and multipack index? Is that setting > up for something that comes later? Yes, these don't need to be here yet. > [...] >> +test_expect_success 'gc [--auto]' ' >> + GIT_TRACE2_EVENT="$(pwd)/run-no-auto.txt" git maintenance run && >> + GIT_TRACE2_EVENT="$(pwd)/run-auto.txt" git maintenance run --auto && >> + grep ",\"gc\"]" run-no-auto.txt && >> + grep ",\"gc\",\"--auto\"]" run-auto.txt > > This feels a little odd in two ways: > > - the use of "git gc" isn't a user-facing detail, so this is testing > implementation instead of the user-facing behavior. That's okay, > though --- it can be useful to test internals sometimes. Consider this a "unit test" of the builtin. I'm testing whether the command-line arguments had an effect on the child process. > - the way that this tests for "git gc" feels brittle: if the trace > emitter changes some day to include a space after the comma, for > example, then the test would start failing. I thought that in the > spirit of fakes, it could make sense to write a custom "git gc" > command using test_write_script, but that isn't likely to work > either since gc is a builtin. > > Perhaps this is suggesting that we need some central test helper for > parsing traces so we can do this reliably in one place. What do you > think? I'm specifically using GIT_TRACE2_EVENT, which is intended for consumption by computer, not humans. Adding whitespace or otherwise changing the output format would be an unnecessary disruption that is more likely to have downstream effects with external tools. In some way, adding extra dependencies like this in our own test suite can help ensure the output format is more stable. If there is a better way to ask "Did my command call 'git gc' (with no arguments|with these arguments)?" then I'm happy to consider it. Thanks, -Stolee

On the Git mailing list, Jonathan Nieder wrote (reply to this):

Derrick Stolee wrote: > On 7/29/2020 6:19 PM, Jonathan Nieder wrote: >> [jrnieder] How do I supply the tasks on the command line? Are they >> parameters to this subcommand? If so, it could make sense for this to >> say something like >> >> run <task>...:: > > Hopefully this is documented to your satisfaction when the ability > to customize the tasks is implemented. Hm, do you mean that this change is too difficult to squash into the first patch? >> What is the exit code convention for "git maintenance run"? (Many Git >> commands don't document this, but since we're making a new command >> it seems worth building the habit.) > > Is this worth doing? Do we really want every command to document > that "0 means success, everything else means failure, and some of > those exit codes mean a specific kind of failure that is global to > Git"? I mean things like "what exit code does it run if there is no maintenance to do?" [...] >>> +static struct maintenance_opts { >>> + int auto_flag; >>> +} opts; >> >> Packing this in a struct feels a bit unusual. Is the struct going to >> be passed somewhere, or could these be individual locals in >> cmd_maintenance? > > This will grow, and I'd rather have one global struct than many > individual global items. It makes it clearer when I use > "opts.auto_flag" that this corresponds to whether "--auto" was > provided as a command-line option. That doesn't seem idiomatic within the Git codebase. Can they be locals instead? [...] >> Perhaps this is suggesting that we need some central test helper for >> parsing traces so we can do this reliably in one place. What do you >> think? > > I'm specifically using GIT_TRACE2_EVENT, which is intended for > consumption by computer, not humans. Adding whitespace or otherwise > changing the output format would be an unnecessary disruption that > is more likely to have downstream effects with external tools. > > In some way, adding extra dependencies like this in our own test > suite can help ensure the output format is more stable. Hm, I thought the contract for GIT_TRACE2_EVENT is that it is JSON, which is a well defined format. To that end, I would sincerely hope that external tools are not assuming anything beyond that the format is JSON. If that wasn't the intent, I'd rather that we not use a JSON-like format at all. That would make the expectations for evolving this interface clearer. > If there is a better way to ask "Did my command call 'git gc' (with > no arguments|with these arguments)?" then I'm happy to consider it. My proposal was just to factor this out into a function in test-lib-functions.sh so it's easy to evolve over time in one place. Thanks and hope that helps, Jonathan

On the Git mailing list, Jonathan Nieder wrote (reply to this):

Derrick Stolee wrote: > On 7/29/2020 6:19 PM, Jonathan Nieder wrote: > > Derrick Stolee wrote: >>> --- /dev/null >>> +++ b/Documentation/git-maintenance.txt >>> @@ -0,0 +1,57 @@ >> [...] >>> +SUBCOMMANDS >>> +----------- >>> + >>> +run:: >>> + Run one or more maintenance tasks. >> >> [jrnieder] How do I supply the tasks on the command line? Are they >> parameters to this subcommand? If so, it could make sense for this to >> say something like >> >> run <task>...:: > > Hopefully this is documented to your satisfaction when the ability > to customize the tasks is implemented. I think my last reply to this part missed the point a little, so let me try again. In multi-patch series, requests to make improvements to early parts that would be obsoleted by later parts may seem strange --- isn't this just wasted work? Sure, occasionally someone will stumble on the early part using "git bisect" or "git log", but isn't that rare enough to not be worth worrying too much about? And I'd agree with that reaction --- for a documentation nit like this one, that benefit alone is probably not enough to be worth the fuss of tweaking things. When I focus more on early parts of a series, it is for a different reason: from a release management point of view, it is useful to get the early part of a series "cooked" as soon as possible so it can be exposed to users in "next" and then become a foundation for later patches in "master". So we try to get as much of the early part of a series into that state as we can, and to merge the early part even without the later part as soon as the former is ready and valuable enough to users on its own. Relatedly, ensuring each commit is "complete" in this sense on its own gives us more flexibility in case a problem is discovered later: we can revert a patch that is causing issues without having to revert the preceding part. I think that might be part of the context I didn't convey well, and why I want to get this in tip-top shape and start using it. Thanks, Jonathan

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 7/30/2020 8:30 PM, Jonathan Nieder wrote: > Derrick Stolee wrote: >> On 7/29/2020 6:19 PM, Jonathan Nieder wrote: > >>> [jrnieder] How do I supply the tasks on the command line? Are they >>> parameters to this subcommand? If so, it could make sense for this to >>> say something like >>> >>> run <task>...:: >> >> Hopefully this is documented to your satisfaction when the ability >> to customize the tasks is implemented. > > Hm, do you mean that this change is too difficult to squash into the > first patch? I mean that there is only one task right now. Until the commit-graph task is implemented, there is no need to have a --task=<task> option. Creating a --task=<task> option in this patch would unnecessarily bloat the patch by implementing multiple tasks while filling in the boilerplate of creating a new builtin. >>> What is the exit code convention for "git maintenance run"? (Many Git >>> commands don't document this, but since we're making a new command >>> it seems worth building the habit.) >> >> Is this worth doing? Do we really want every command to document >> that "0 means success, everything else means failure, and some of >> those exit codes mean a specific kind of failure that is global to >> Git"? > > I mean things like "what exit code does it run if there is no > maintenance to do?" > > [...] >>>> +static struct maintenance_opts { >>>> + int auto_flag; >>>> +} opts; >>> >>> Packing this in a struct feels a bit unusual. Is the struct going to >>> be passed somewhere, or could these be individual locals in >>> cmd_maintenance? >> >> This will grow, and I'd rather have one global struct than many >> individual global items. It makes it clearer when I use >> "opts.auto_flag" that this corresponds to whether "--auto" was >> provided as a command-line option. > > That doesn't seem idiomatic within the Git codebase. > > Can they be locals instead? Which part do you want to be idiomatic? The fact that the parse-options library typically refers to static global values or the fact that the data is not organized in a struct? Turning these into locals is _even worse_ than the current model of static globals because that then requires updating the prototypes of all the (static) local methods that operate on the arguments. Every time a new option is added, that would cascade in method prototypes all over again. The natural thing to do to "fix" that situation is to create a struct that holds the information from the parsed command-line arguments. But then why is it a local parameter that is passed through all of the local methods instead of a global (as presented here in this patch)? > [...] >>> Perhaps this is suggesting that we need some central test helper for >>> parsing traces so we can do this reliably in one place. What do you >>> think? >> >> I'm specifically using GIT_TRACE2_EVENT, which is intended for >> consumption by computer, not humans. Adding whitespace or otherwise >> changing the output format would be an unnecessary disruption that >> is more likely to have downstream effects with external tools. >> >> In some way, adding extra dependencies like this in our own test >> suite can help ensure the output format is more stable. > > Hm, I thought the contract for GIT_TRACE2_EVENT is that it is JSON, > which is a well defined format. To that end, I would sincerely hope > that external tools are not assuming anything beyond that the format > is JSON. > > If that wasn't the intent, I'd rather that we not use a JSON-like > format at all. That would make the expectations for evolving this > interface clearer. > >> If there is a better way to ask "Did my command call 'git gc' (with >> no arguments|with these arguments)?" then I'm happy to consider it. > > My proposal was just to factor this out into a function in > test-lib-functions.sh so it's easy to evolve over time in one place. This is a valuable suggestion, but this series is already too large to make such a change in addition to the patches already here. I'd consider this more if there were more tests that checked "did we run this subcommand with these arguments?" Better to extract a helper when this is actually used in multiple places. Thanks, -Stolee

On the Git mailing list, Jonathan Nieder wrote (reply to this):

Derrick Stolee wrote: > What is seems like you are asking instead is for me to create a tool > in the test suite that parses each JSON line, extracts a specific > member from that JSON object, reconstructs a command-line invocation > from the JSON array, and reports whether that process worked for any > line in the event output. No, that isn't what I'm asking. I'm asking for this patch series to take the existing "grep" lines and put them in a function in test-lib-functions.sh, so that we can change them in one place when the trace emitter changes. [...] > If this is to truly be a hard requirement for these tests to move > forward, Yes, from my point of view it really is. But that "is this truly a hard requirement?" comes up tells me I have not done a good job of communicating in this review. A review is about participants in the project working together to improve a patch, not people making demands at each other. [...] > If I'm to spend time engineering something more complicated just to > check "did this subcommand run with these arguments?" then I don't see why this is more complicated than what is in patch 1. In fact, I think it would be a little more simple. Jonathan

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 8/4/2020 10:42 AM, Jonathan Nieder wrote: > Derrick Stolee wrote: > >> What is seems like you are asking instead is for me to create a tool >> in the test suite that parses each JSON line, extracts a specific >> member from that JSON object, reconstructs a command-line invocation >> from the JSON array, and reports whether that process worked for any >> line in the event output. > > No, that isn't what I'm asking. > > I'm asking for this patch series to take the existing "grep" lines > and put them in a function in test-lib-functions.sh, so that we can > change them in one place when the trace emitter changes. Thanks for clarifying, clearing up my misinterpretation of your request. I'm coming around to this idea. > [...] >> If this is to truly be a hard requirement for these tests to move >> forward, > > Yes, from my point of view it really is. > > But that "is this truly a hard requirement?" comes up tells me I have > not done a good job of communicating in this review. A review is > about participants in the project working together to improve a patch, > not people making demands at each other. It's also about a balance. A patch author and a reviewer can have disagreements, and it is important to discover exactly how hard each is holding to their opinion. > [...] >> If I'm to spend time engineering something more complicated just to >> check "did this subcommand run with these arguments?" then > > I don't see why this is more complicated than what is in patch 1. In > fact, I think it would be a little more simple. My interpretation (parsing JSON) _was_ more complicated, hence my reservations. The helper still requires non-trivial script-fu (for someone like me who is bad at scripting languages) but centralizing the grep has value. Here is my attempt so far: trace2_subcommand_data () { command="$1" && commas=$(echo $command | sed s/\ /\",\"/g) && printf "[\"$commas\"]" } test_subcommand_called () { data=$(trace2_subcommand_data $1) && grep $data $2 } test_subcommand_not_called () { ! test_subcommand_called $1 $2 } with callers looking like test_subcommand_called "gc" run-no-auto.txt && test_subcommand_not_called "gc --auto" run-auto.txt && test_subcommand_called "gc --quiet" run-quiet.txt Thanks, -Stolee

On the Git mailing list, Jonathan Nieder wrote (reply to this):

Derrick Stolee wrote: > Here is my attempt so far: [...] > test_subcommand_called "gc" run-no-auto.txt && > test_subcommand_not_called "gc --auto" run-auto.txt && > test_subcommand_called "gc --quiet" run-quiet.txt I like it. Some tweaks: - can simplify by using 'printf' instead of sed - passing the trace in stdin - following the convention that helpers like test_i18ngrep use of accepting ! as a parameter to negate the exit code That would make (untested) # Check that the given command was invoked as part of the # trace2-format trace on stdin. # # test_subcommand_called [!] <command> <args>... < <trace> # # For example, to look for an invocation of "git upload-pack # /path/to/repo" # # GIT_TRACE2_EVENT=event.log git fetch ... && # test_subcommand_called git upload-pack "$PATH" <event.log # # If the first parameter passed is !, this instead checks that # the given command was not called. # test_subcommand_called () { local negate= if test "$1" = "!" then negate=t shift fi local expr=$(printf '"%s",' "$@") expr="${expr%,}" if test -n "$negate" then ! grep "\[$expr\]" else grep "\[$expr\]" fi }

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 8/4/2020 1:02 PM, Jonathan Nieder wrote: > Derrick Stolee wrote: > >> Here is my attempt so far: > [...] >> test_subcommand_called "gc" run-no-auto.txt && >> test_subcommand_not_called "gc --auto" run-auto.txt && >> test_subcommand_called "gc --quiet" run-quiet.txt > > I like it. Some tweaks: > - can simplify by using 'printf' instead of sed > - passing the trace in stdin > - following the convention that helpers like test_i18ngrep use of > accepting ! as a parameter to negate the exit code > > That would make (untested) > > # Check that the given command was invoked as part of the > # trace2-format trace on stdin. > # > # test_subcommand_called [!] <command> <args>... < <trace> > # > # For example, to look for an invocation of "git upload-pack > # /path/to/repo" > # > # GIT_TRACE2_EVENT=event.log git fetch ... && > # test_subcommand_called git upload-pack "$PATH" <event.log > # > # If the first parameter passed is !, this instead checks that > # the given command was not called. > # > test_subcommand_called () { > local negate= > if test "$1" = "!" > then > negate=t > shift > fi > > local expr=$(printf '"%s",' "$@") > expr="${expr%,}" > > if test -n "$negate" > then > ! grep "\[$expr\]" > else > grep "\[$expr\]" > fi > } This works! Thanks. With the ! usage, I think a rename to 'test_subcommand' makes sense. Further, I was somehow wrong about not including "git" in the beginning, so here is my example use of this method; test_subcommand git gc <run-no-auto.txt && test_subcommand ! git gc --auto <run-auto.txt && test_subcommand git gc --quiet <run-quiet.txt Thanks, -Stolee

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 8/3/2020 1:46 PM, Jonathan Nieder wrote: > Derrick Stolee wrote: >> On 7/30/2020 8:30 PM, Jonathan Nieder wrote: >>> Derrick Stolee wrote: >>>> On 7/29/2020 6:19 PM, Jonathan Nieder wrote: Revisiting the first part of this thread, as it got lost in the discussion about trace2 and JSON parsing... >>>>> [jrnieder] How do I supply the tasks on the command line? Are they >>>>> parameters to this subcommand? If so, it could make sense for this to >>>>> say something like >>>>> >>>>> run <task>...:: >>>> >>>> Hopefully this is documented to your satisfaction when the ability >>>> to customize the tasks is implemented. > [...] >> I mean that there is only one task right now. Until the commit-graph >> task is implemented, there is no need to have a --task=<task> option. > > Ah, that wasn't clear to me from the docs. > > You're saying that "git maintenance run" runs the "gc" task? Can the > documentation say so? In this patch, the documentation says Run tasks to optimize Git repository data the plural is a bit incorrect at this point, but not a huge deal. The run subcommand says "Run one or more maintenance tasks" which again is a bit misleading since "or more" is not technically possible. This becomes a bit obvious when we see that the task list only contains a 'gc' task. So no, the documentation in this patch doesn't say "Run the 'gc' task" because that could be gathered from context _and_ it will be incorrect by the time this patch is integrated as part of the series. So I struggle with finding a way to improve the documentation in this patch to satisfy your concerns. Any change made here will just be un-done by a later patch in the same series. I'd rather have something vague now and improve the documentation incrementally along with the implementation. Another point you made earlier, but seems to be dropped in this email: you wanted the return code documented. This also seems like wasteful text, because the best I can think of is This command will return 0 unless a maintenance task failed in an unrecoverable way, or other typical failure cases such as incorrect usage or the command being run outside of a Git repository. Please recommend something better, because that description suffers from being too generic/obvious when talking about the maintenance task and being an obvious repetition of Git's existing error code behavior. Perhaps this would be easier if there was an established pattern of documenting the return code. So, I went digging. The examples I could find are: 1. 'git cat-file -e' 2. 'git check-ref-format' 3. 'git commit-graph --object-dir=<dir>' 4. 'git cvsimport' 5. 'git fast-import' 6. 'git filter-branch' 7. 'git grep --quiet' 8. 'git gui citool' 9. 'git ls-remote --exit-code' 10. 'git merge-base --is-ancestor' 11. 'git push --recurse-submodules' 12. 'git reflog exists' 13. 'git rev-parse --parseopt' 14. 'git update-index --test-untracked-cache' Most of these are for specific options, and the exit code is documented with relation to how that specific option may lead to a non-zero exit code (1, 3, 7, 8, 9, 10, 11, 12, 13, 14). The one that is most clearly part of the main description is for the plumbing command 'git check-ref-format': Checks if a given 'refname' is acceptable, and exits with a non-zero status if it is not. This seems appropriate for something intended for use as a helper in script, with a very simple behavior. The exit code is the only way it provides any feedback to the caller! This use of the exit code as the only output is used by some other cases (1, 7, 9, 10, 11, 12, 13, 14). Of course, we also have examples like this in git-cvsimport.txt: Otherwise, success is indicated the Unix way, i.e. by simply exiting with a zero exit status. This phrase should be _assumed_ as part of Git itself, instead of written in the documentation for specific commands. The documentation for 'git fast-import' is more useful here: If fast-import is supplied invalid input it will terminate with a non-zero exit status and create a crash report... But, you can see that the exit code is not the focus, but the crash report _is_ important to document. 'git filter-branch' has a special exit code for the case that no commits were found to update. 'git gui citool' returns an error code if the window is closed before the command can complete. This is also not a core feature, so is not a good example. My only conclusion from this investigation is that the exit code is documented when it is useful, but otherwise the standard Unix pattern is assumed and not worth documenting. If this is something we want to change, then I don't think it valuable to have that discussion here. Instead, let's start that discussion elsewhere with a series that updates existing docs to catch up to a new standard of documenting the exit code. [1] https://git-scm.com/docs/git-cat-file#Documentation/git-cat-file.txt--e [2] https://git-scm.com/docs/git-check-ref-format#_description [3] https://git-scm.com/docs/git-commit-graph#Documentation/git-commit-graph.txt---object-dir [4] https://git-scm.com/docs/git-cvsimport#_output [5] https://git-scm.com/docs/git-fast-import#_crash_reports [6] https://git-scm.com/docs/git-filter-branch#_exit_status [7] https://git-scm.com/docs/git-grep#Documentation/git-grep.txt---quiet [8] https://git-scm.com/docs/git-gui#Documentation/git-gui.txt-codegitguicitoolcode [9] https://git-scm.com/docs/git-ls-remote#Documentation/git-ls-remote.txt---exit-code [10] https://git-scm.com/docs/git-merge-base#Documentation/git-merge-base.txt---is-ancestor [11] https://git-scm.com/docs/git-push#Documentation/git-push.txt---recurse-submodulescheckon-demandonlyno [12] https://git-scm.com/docs/git-reflog [13] https://git-scm.com/docs/git-rev-parse#_parseopt [14] https://git-scm.com/docs/git-update-index#Documentation/git-update-index.txt---test-untracked-cache > [...] >>>>>> +static struct maintenance_opts { >>>>>> + int auto_flag; >>>>>> +} opts; >>>>> >>>>> Packing this in a struct feels a bit unusual. Is the struct going to >>>>> be passed somewhere, or could these be individual locals in >>>>> cmd_maintenance? >>>> >>>> This will grow, and I'd rather have one global struct than many >>>> individual global items. It makes it clearer when I use >>>> "opts.auto_flag" that this corresponds to whether "--auto" was >>>> provided as a command-line option. >>> >>> That doesn't seem idiomatic within the Git codebase. >>> >>> Can they be locals instead? >> >> Which part do you want to be idiomatic? The fact that the parse-options >> library typically refers to static global values or the fact that the >> data is not organized in a struct? > > parse-options has no requirement about the values being global. Some > older code does that, but newer code tends to use locals when > appropriate. You're right. Global is not required. The struct needs to be 'static' or initialized for compilation. While trying to get rid of globals based on your recommendation, I see that the callback functions in the option-parsing _do_ require global state. This is used in "maintenance: add --task option" with the global tasks array, but it also uses a global count of "how many tasks have been selected" currently in the struct. Two ways to get around that issue include (1) making a global int, or (2) looping through the task list to count how many have been selected so far. Neither of these solutions is clean. (In the diff below, I chose option 2.) > Putting it in a struct is fine as long as that struct is being passed > around. What isn't idiomatic in Git is using a global struct for > namespacing. In my opinion, using a global struct would be better than the existing pattern of a bunch of global variables, but I get your point about precedence. More on this below. > [...] >> The natural thing to do to "fix" that situation is to create a struct >> that holds the information from the parsed command-line arguments. But >> then why is it a local parameter that is passed through all of the >> local methods instead of a global (as presented here in this patch)? > > I'm having trouble parsing that last sentence. You're saying a global > is preferable over passing around a pointer to a local "opts" struct? Using a "singleton" like the opts struct here, the code is definitely much simpler to read. There is only ever one instance, because the struct definition comes along with the instance declaration. By making it a local instance, we then need to pass around the struct as a parameter to all of the static methods in the builtin, even though there is only ever one instance of this struct per process. The two points above are the ones that led me to use the global instance in the first place. To get back to your point about being idiomatic, these builtins use a struct to group options, then pass it around as a parameter: * cat-file (struct batch_options) * checkout (struct checkout_opts) * rebase (struct rebase_options) * tag (struct create_tag_options) * worktree (struct add_opts) Unfortunately, there _is_ an example of the behavior I was following in this series: my own example in the sparse-checkout builtin. So, I'm just repeating my own mistakes that were not challenged before. I updated my v4-in-progress branch with that change, and the tip-to-tip diff is below. It required some non-trivial care around the --task=<task> callback (which can't see the local options struct) and other cases. Thanks, -Stolee --- diff --git a/builtin/gc.c b/builtin/gc.c index 1f0c519d14..3e473a1293 100644 --- a/builtin/gc.c +++ b/builtin/gc.c @@ -709,11 +709,10 @@ static const char * const builtin_maintenance_usage[] = { NULL }; -static struct maintenance_opts { +struct maintenance_opts { int auto_flag; int quiet; - int tasks_selected; -} opts; +}; /* Remember to update object flag allocation in object.h */ #define PARENT1 (1u<<16) @@ -789,7 +788,7 @@ static int should_write_commit_graph(void) return result; } -static int run_write_commit_graph(void) +static int run_write_commit_graph(struct maintenance_opts *opts) { struct child_process child = CHILD_PROCESS_INIT; @@ -797,13 +796,13 @@ static int run_write_commit_graph(void) strvec_pushl(&child.args, "commit-graph", "write", "--split", "--reachable", NULL); - if (opts.quiet) + if (opts->quiet) strvec_push(&child.args, "--no-progress"); return !!run_command(&child); } -static int run_verify_commit_graph(void) +static int run_verify_commit_graph(struct maintenance_opts *opts) { struct child_process child = CHILD_PROCESS_INIT; @@ -811,24 +810,24 @@ static int run_verify_commit_graph(void) strvec_pushl(&child.args, "commit-graph", "verify", "--shallow", NULL); - if (opts.quiet) + if (opts->quiet) strvec_push(&child.args, "--no-progress"); return !!run_command(&child); } -static int maintenance_task_commit_graph(void) +static int maintenance_task_commit_graph(struct maintenance_opts *opts) { struct repository *r = the_repository; char *chain_path; close_object_store(r->objects); - if (run_write_commit_graph()) { + if (run_write_commit_graph(opts)) { error(_("failed to write commit-graph")); return 1; } - if (!run_verify_commit_graph()) + if (!run_verify_commit_graph(opts)) return 0; warning(_("commit-graph verify caught error, rewriting")); @@ -840,14 +839,14 @@ static int maintenance_task_commit_graph(void) } free(chain_path); - if (!run_write_commit_graph()) + if (!run_write_commit_graph(opts)) return 0; error(_("failed to rewrite commit-graph")); return 1; } -static int fetch_remote(const char *remote) +static int fetch_remote(const char *remote, struct maintenance_opts *opts) { struct child_process child = CHILD_PROCESS_INIT; @@ -855,7 +854,7 @@ static int fetch_remote(const char *remote) strvec_pushl(&child.args, "fetch", remote, "--prune", "--no-tags", "--no-write-fetch-head", "--refmap=", NULL); - if (opts.quiet) + if (opts->quiet) strvec_push(&child.args, "--quiet"); strvec_pushf(&child.args, "+refs/heads/*:refs/prefetch/%s/*", remote); @@ -871,7 +870,7 @@ static int fill_each_remote(struct remote *remote, void *cbdata) return 0; } -static int maintenance_task_prefetch(void) +static int maintenance_task_prefetch(struct maintenance_opts *opts) { int result = 0; struct string_list_item *item; @@ -886,23 +885,23 @@ static int maintenance_task_prefetch(void) for (item = remotes.items; item && item < remotes.items + remotes.nr; item++) - result |= fetch_remote(item->string); + result |= fetch_remote(item->string, opts); cleanup: string_list_clear(&remotes, 0); return result; } -static int maintenance_task_gc(void) +static int maintenance_task_gc(struct maintenance_opts *opts) { struct child_process child = CHILD_PROCESS_INIT; child.git_cmd = 1; strvec_push(&child.args, "gc"); - if (opts.auto_flag) + if (opts->auto_flag) strvec_push(&child.args, "--auto"); - if (opts.quiet) + if (opts->quiet) strvec_push(&child.args, "--quiet"); else strvec_push(&child.args, "--no-quiet"); @@ -911,14 +910,14 @@ static int maintenance_task_gc(void) return run_command(&child); } -static int prune_packed(void) +static int prune_packed(struct maintenance_opts *opts) { struct child_process child = CHILD_PROCESS_INIT; child.git_cmd = 1; strvec_push(&child.args, "prune-packed"); - if (opts.quiet) + if (opts->quiet) strvec_push(&child.args, "--quiet"); return !!run_command(&child); @@ -977,7 +976,7 @@ static int write_loose_object_to_stdin(const struct object_id *oid, return ++(d->count) > d->batch_size; } -static int pack_loose(void) +static int pack_loose(struct maintenance_opts *opts) { struct repository *r = the_repository; int result = 0; @@ -996,7 +995,7 @@ static int pack_loose(void) pack_proc.git_cmd = 1; strvec_push(&pack_proc.args, "pack-objects"); - if (opts.quiet) + if (opts->quiet) strvec_push(&pack_proc.args, "--quiet"); strvec_pushf(&pack_proc.args, "%s/pack/loose", r->objects->odb->path); @@ -1027,9 +1026,9 @@ static int pack_loose(void) return result; } -static int maintenance_task_loose_objects(void) +static int maintenance_task_loose_objects(struct maintenance_opts *opts) { - return prune_packed() || pack_loose(); + return prune_packed(opts) || pack_loose(opts); } static int incremental_repack_auto_condition(void) @@ -1061,14 +1060,14 @@ static int incremental_repack_auto_condition(void) return count >= incremental_repack_auto_limit; } -static int multi_pack_index_write(void) +static int multi_pack_index_write(struct maintenance_opts *opts) { struct child_process child = CHILD_PROCESS_INIT; child.git_cmd = 1; strvec_pushl(&child.args, "multi-pack-index", "write", NULL); - if (opts.quiet) + if (opts->quiet) strvec_push(&child.args, "--no-progress"); if (run_command(&child)) @@ -1077,7 +1076,7 @@ static int multi_pack_index_write(void) return 0; } -static int rewrite_multi_pack_index(void) +static int rewrite_multi_pack_index(struct maintenance_opts *opts) { struct repository *r = the_repository; char *midx_name = get_midx_filename(r->objects->odb->path); @@ -1085,17 +1084,18 @@ static int rewrite_multi_pack_index(void) unlink(midx_name); free(midx_name); - return multi_pack_index_write(); + return multi_pack_index_write(opts); } -static int multi_pack_index_verify(const char *message) +static int multi_pack_index_verify(struct maintenance_opts *opts, + const char *message) { struct child_process child = CHILD_PROCESS_INIT; child.git_cmd = 1; strvec_pushl(&child.args, "multi-pack-index", "verify", NULL); - if (opts.quiet) + if (opts->quiet) strvec_push(&child.args, "--no-progress"); if (run_command(&child)) { @@ -1106,14 +1106,14 @@ static int multi_pack_index_verify(const char *message) return 0; } -static int multi_pack_index_expire(void) +static int multi_pack_index_expire(struct maintenance_opts *opts) { struct child_process child = CHILD_PROCESS_INIT; child.git_cmd = 1; strvec_pushl(&child.args, "multi-pack-index", "expire", NULL); - if (opts.quiet) + if (opts->quiet) strvec_push(&child.args, "--no-progress"); close_object_store(the_repository->objects); @@ -1164,14 +1164,14 @@ static off_t get_auto_pack_size(void) return result_size; } -static int multi_pack_index_repack(void) +static int multi_pack_index_repack(struct maintenance_opts *opts) { struct child_process child = CHILD_PROCESS_INIT; child.git_cmd = 1; strvec_pushl(&child.args, "multi-pack-index", "repack", NULL); - if (opts.quiet) + if (opts->quiet) strvec_push(&child.args, "--no-progress"); strvec_pushf(&child.args, "--batch-size=%"PRIuMAX, @@ -1185,7 +1185,7 @@ static int multi_pack_index_repack(void) return 0; } -static int maintenance_task_incremental_repack(void) +static int maintenance_task_incremental_repack(struct maintenance_opts *opts) { prepare_repo_settings(the_repository); if (!the_repository->settings.core_multi_pack_index) { @@ -1193,22 +1193,22 @@ static int maintenance_task_incremental_repack(void) return 0; } - if (multi_pack_index_write()) + if (multi_pack_index_write(opts)) return 1; - if (multi_pack_index_verify("after initial write")) - return rewrite_multi_pack_index(); - if (multi_pack_index_expire()) + if (multi_pack_index_verify(opts, "after initial write")) + return rewrite_multi_pack_index(opts); + if (multi_pack_index_expire(opts)) return 1; - if (multi_pack_index_verify("after expire step")) - return !!rewrite_multi_pack_index(); - if (multi_pack_index_repack()) + if (multi_pack_index_verify(opts, "after expire step")) + return !!rewrite_multi_pack_index(opts); + if (multi_pack_index_repack(opts)) return 1; - if (multi_pack_index_verify("after repack step")) - return !!rewrite_multi_pack_index(); + if (multi_pack_index_verify(opts, "after repack step")) + return !!rewrite_multi_pack_index(opts); return 0; } -typedef int maintenance_task_fn(void); +typedef int maintenance_task_fn(struct maintenance_opts *opts); /* * An auto condition function returns 1 if the task should run @@ -1275,9 +1275,9 @@ static int compare_tasks_by_selection(const void *a_, const void *b_) return b->selected_order - a->selected_order; } -static int maintenance_run(void) +static int maintenance_run(struct maintenance_opts *opts) { - int i; + int i, found_selected = 0; int result = 0; struct lock_file lk; struct repository *r = the_repository; @@ -1291,7 +1291,7 @@ static int maintenance_run(void) * recursive process stack. Do not report an error in * that case. */ - if (!opts.auto_flag && !opts.quiet) + if (!opts->auto_flag && !opts->quiet) error(_("lock file '%s' exists, skipping maintenance"), lock_path); free(lock_path); @@ -1299,23 +1299,26 @@ static int maintenance_run(void) } free(lock_path); - if (opts.tasks_selected) + for (i = 0; !found_selected && i < TASK__COUNT; i++) + found_selected = tasks[i].selected; + + if (found_selected) QSORT(tasks, TASK__COUNT, compare_tasks_by_selection); for (i = 0; i < TASK__COUNT; i++) { - if (opts.tasks_selected && !tasks[i].selected) + if (found_selected && !tasks[i].selected) continue; - if (!opts.tasks_selected && !tasks[i].enabled) + if (!found_selected && !tasks[i].enabled) continue; - if (opts.auto_flag && + if (opts->auto_flag && (!tasks[i].auto_condition || !tasks[i].auto_condition())) continue; trace2_region_enter("maintenance", tasks[i].name, r); - if (tasks[i].fn()) { + if (tasks[i].fn(opts)) { error(_("task '%s' failed"), tasks[i].name); result = 1; } @@ -1349,17 +1352,15 @@ static void initialize_task_config(void) static int task_option_parse(const struct option *opt, const char *arg, int unset) { - int i; + int i, num_selected = 0; struct maintenance_task *task = NULL; BUG_ON_OPT_NEG(unset); - opts.tasks_selected++; - for (i = 0; i < TASK__COUNT; i++) { + num_selected += tasks[i].selected; if (!strcasecmp(tasks[i].name, arg)) { task = &tasks[i]; - break; } } @@ -1374,13 +1375,14 @@ static int task_option_parse(const struct option *opt, } task->selected = 1; - task->selected_order = opts.tasks_selected; + task->selected_order = num_selected + 1; return 0; } int cmd_maintenance(int argc, const char **argv, const char *prefix) { + static struct maintenance_opts opts; static struct option builtin_maintenance_options[] = { OPT_BOOL(0, "auto", &opts.auto_flag, N_("run tasks based on the state of the repository")), @@ -1408,7 +1410,7 @@ int cmd_maintenance(int argc, const char **argv, const char *prefix) if (argc == 1) { if (!strcmp(argv[0], "run")) - return maintenance_run(); + return maintenance_run(&opts); } usage_with_options(builtin_maintenance_usage,

gitgitgadget · 2020-07-23T20:25:29Z

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes: > +--[no-]maintenance:: > --[no-]auto-gc:: > - Run `git gc --auto` at the end to perform garbage collection > - if needed. This is enabled by default. > + Run `git maintenance run --auto` at the end to perform garbage > + collection if needed. This is enabled by default. Shouldn't the new synonym be called --auto-maintenance or an abbreviation thereof? It is not like we will run the full maintenance suite when "--no-maintenance" is omitted, which certainly is not the impression we want to give our readers. > These objects may be removed by normal Git operations (such as `git commit`) > -which automatically call `git gc --auto`. (See linkgit:git-gc[1].) > -If these objects are removed and were referenced by the cloned repository, > -then the cloned repository will become corrupt. > +which automatically call `git maintenance run --auto` and `git gc --auto`. Hmph. Perhaps the picture may change in the end of the series but I got an impression that "gc --auto" would eventually become just part of "maintenance --auto" and the users won't have to be even aware of its existence? Wouldn't we rather want to say something like --[no-]auto-maintenance:: --[no-]auto-gc:: Run `git maintenance run --auto` at the end to perform garbage collection if needed (`--[no-]auto-gc` is a synonym). This is enabled by default. > diff --git a/builtin/fetch.c b/builtin/fetch.c > index 82ac4be8a5..49a4d727d4 100644 > --- a/builtin/fetch.c > +++ b/builtin/fetch.c > @@ -196,8 +196,10 @@ static struct option builtin_fetch_options[] = { > OPT_STRING_LIST(0, "negotiation-tip", &negotiation_tip, N_("revision"), > N_("report that we have only objects reachable from this object")), > OPT_PARSE_LIST_OBJECTS_FILTER(&filter_options), > + OPT_BOOL(0, "maintenance", &enable_auto_gc, > + N_("run 'maintenance --auto' after fetching")), > OPT_BOOL(0, "auto-gc", &enable_auto_gc, > + N_("run 'maintenance --auto' after fetching")), OK, so this is truly a backward-compatible synonym at this point. > diff --git a/run-command.c b/run-command.c > index 9b3a57d1e3..82ad241638 100644 > --- a/run-command.c > +++ b/run-command.c > @@ -1865,14 +1865,17 @@ int run_processes_parallel_tr2(int n, get_next_task_fn get_next_task, > return result; > } > > -int run_auto_gc(int quiet) > +int run_auto_maintenance(int quiet) > { > struct argv_array argv_gc_auto = ARGV_ARRAY_INIT; > int status; > > - argv_array_pushl(&argv_gc_auto, "gc", "--auto", NULL); > + argv_array_pushl(&argv_gc_auto, "maintenance", "run", "--auto", NULL); > if (quiet) > argv_array_push(&argv_gc_auto, "--quiet"); > + else > + argv_array_push(&argv_gc_auto, "--no-quiet"); > + > status = run_command_v_opt(argv_gc_auto.argv, RUN_GIT_CMD); > argv_array_clear(&argv_gc_auto); > return status; Don't we want to replace all _gc_ with _maintenance_ in this function? I think the first business before we can do so would be to rethink if spelling out "maintenance" fully in code is a good idea in the first space. It would make names for variables, structures and fields unnecessarily long without contributing to ease of understanding an iota, and a easy-to-remember short-form or an abbreviation may be needed. Using a short-form/abbreviation wouldn't worsen the end-user experience, and not the developer experience for that matter. If we choose "gc" as the short-hand, most of the change in this step would become unnecessary. I also do not mind if we some other words or word-fragment (perhaps "maint"???) is chosen. > diff --git a/run-command.h b/run-command.h > index 191dfcdafe..d9a800e700 100644 > --- a/run-command.h > +++ b/run-command.h > @@ -221,7 +221,7 @@ int run_hook_ve(const char *const *env, const char *name, va_list args); > /* > * Trigger an auto-gc > */ > -int run_auto_gc(int quiet); > +int run_auto_maintenance(int quiet); > > #define RUN_COMMAND_NO_STDIN 1 > #define RUN_GIT_CMD 2 /*If this is to be git sub-command */ > diff --git a/t/t5510-fetch.sh b/t/t5510-fetch.sh > index a66dbe0bde..9850ecde5d 100755 > --- a/t/t5510-fetch.sh > +++ b/t/t5510-fetch.sh > @@ -919,7 +919,7 @@ test_expect_success 'fetching with auto-gc does not lock up' ' > git config fetch.unpackLimit 1 && > git config gc.autoPackLimit 1 && > git config gc.autoDetach false && > - GIT_ASK_YESNO="$D/askyesno" git fetch >fetch.out 2>&1 && > + GIT_ASK_YESNO="$D/askyesno" git fetch --verbose >fetch.out 2>&1 && > test_i18ngrep "Auto packing the repository" fetch.out && > ! grep "Should I try again" fetch.out > )

On the Git mailing list, Taylor Blau wrote (reply to this):

On Thu, Jul 23, 2020 at 01:21:55PM -0700, Junio C Hamano wrote: > "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes: > > > +--[no-]maintenance:: > > --[no-]auto-gc:: > > - Run `git gc --auto` at the end to perform garbage collection > > - if needed. This is enabled by default. > > + Run `git maintenance run --auto` at the end to perform garbage > > + collection if needed. This is enabled by default. > > Shouldn't the new synonym be called --auto-maintenance or an > abbreviation thereof? It is not like we will run the full > maintenance suite when "--no-maintenance" is omitted, which > certainly is not the impression we want to give our readers. > > > These objects may be removed by normal Git operations (such as `git commit`) > > -which automatically call `git gc --auto`. (See linkgit:git-gc[1].) > > -If these objects are removed and were referenced by the cloned repository, > > -then the cloned repository will become corrupt. > > +which automatically call `git maintenance run --auto` and `git gc --auto`. > > Hmph. Perhaps the picture may change in the end of the series but I > got an impression that "gc --auto" would eventually become just part > of "maintenance --auto" and the users won't have to be even aware of > its existence? Wouldn't we rather want to say something like > > --[no-]auto-maintenance:: > --[no-]auto-gc:: > Run `git maintenance run --auto` at the end to perform > garbage collection if needed (`--[no-]auto-gc` is a > synonym). This is enabled by default. > > > diff --git a/builtin/fetch.c b/builtin/fetch.c > > index 82ac4be8a5..49a4d727d4 100644 > > --- a/builtin/fetch.c > > +++ b/builtin/fetch.c > > @@ -196,8 +196,10 @@ static struct option builtin_fetch_options[] = { > > OPT_STRING_LIST(0, "negotiation-tip", &negotiation_tip, N_("revision"), > > N_("report that we have only objects reachable from this object")), > > OPT_PARSE_LIST_OBJECTS_FILTER(&filter_options), > > > + OPT_BOOL(0, "maintenance", &enable_auto_gc, > > + N_("run 'maintenance --auto' after fetching")), > > OPT_BOOL(0, "auto-gc", &enable_auto_gc, > > + N_("run 'maintenance --auto' after fetching")), > > OK, so this is truly a backward-compatible synonym at this point. I wouldn't be opposed to making the 'auto-gc' option an 'OPT_HIDDEN_BOOL', but I realize that users may not want to move as quickly as that. Perhaps we should visit this in a couple of releases (or perhaps you are getting to it in a later patch that I haven't read yet). > > diff --git a/run-command.c b/run-command.c > > index 9b3a57d1e3..82ad241638 100644 > > --- a/run-command.c > > +++ b/run-command.c > > @@ -1865,14 +1865,17 @@ int run_processes_parallel_tr2(int n, get_next_task_fn get_next_task, > > return result; > > } > > > > -int run_auto_gc(int quiet) > > +int run_auto_maintenance(int quiet) > > { > > struct argv_array argv_gc_auto = ARGV_ARRAY_INIT; > > int status; > > > > - argv_array_pushl(&argv_gc_auto, "gc", "--auto", NULL); > > + argv_array_pushl(&argv_gc_auto, "maintenance", "run", "--auto", NULL); > > if (quiet) > > argv_array_push(&argv_gc_auto, "--quiet"); > > + else > > + argv_array_push(&argv_gc_auto, "--no-quiet"); > > + > > status = run_command_v_opt(argv_gc_auto.argv, RUN_GIT_CMD); > > argv_array_clear(&argv_gc_auto); > > return status; > > Don't we want to replace all _gc_ with _maintenance_ in this > function? I think the first business before we can do so would be > to rethink if spelling out "maintenance" fully in code is a good > idea in the first space. It would make names for variables, > structures and fields unnecessarily long without contributing to > ease of understanding an iota, and a easy-to-remember short-form or > an abbreviation may be needed. Using a short-form/abbreviation > wouldn't worsen the end-user experience, and not the developer > experience for that matter. > > If we choose "gc" as the short-hand, most of the change in this step > would become unnecessary. I also do not mind if we some other words > or word-fragment (perhaps "maint"???) is chosen. Yeah, writing out 'maintenance' every time in the code and in command-line arguments is kind of a mouthful. I'm more willing to accept that '--maintenance' is something that users would write or script around, but 'maint' makes sense to me as a shorthand in the code. I could go either way on calling the command-line flag '--maint', though. > > > diff --git a/run-command.h b/run-command.h > > index 191dfcdafe..d9a800e700 100644 > > --- a/run-command.h > > +++ b/run-command.h > > @@ -221,7 +221,7 @@ int run_hook_ve(const char *const *env, const char *name, va_list args); > > /* > > * Trigger an auto-gc > > */ > > -int run_auto_gc(int quiet); > > +int run_auto_maintenance(int quiet); > > > > #define RUN_COMMAND_NO_STDIN 1 > > #define RUN_GIT_CMD 2 /*If this is to be git sub-command */ > > diff --git a/t/t5510-fetch.sh b/t/t5510-fetch.sh > > index a66dbe0bde..9850ecde5d 100755 > > --- a/t/t5510-fetch.sh > > +++ b/t/t5510-fetch.sh > > @@ -919,7 +919,7 @@ test_expect_success 'fetching with auto-gc does not lock up' ' > > git config fetch.unpackLimit 1 && > > git config gc.autoPackLimit 1 && > > git config gc.autoDetach false && > > - GIT_ASK_YESNO="$D/askyesno" git fetch >fetch.out 2>&1 && > > + GIT_ASK_YESNO="$D/askyesno" git fetch --verbose >fetch.out 2>&1 && > > test_i18ngrep "Auto packing the repository" fetch.out && > > ! grep "Should I try again" fetch.out > > ) Thanks, Taylor

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 7/23/2020 4:21 PM, Junio C Hamano wrote: > "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes: > >> +--[no-]maintenance:: >> --[no-]auto-gc:: >> - Run `git gc --auto` at the end to perform garbage collection >> - if needed. This is enabled by default. >> + Run `git maintenance run --auto` at the end to perform garbage >> + collection if needed. This is enabled by default. > > Shouldn't the new synonym be called --auto-maintenance or an > abbreviation thereof? It is not like we will run the full > maintenance suite when "--no-maintenance" is omitted, which > certainly is not the impression we want to give our readers. Adding "auto-" to the argument can be informative. I would think that abbreviating the option may make understanding the argument more difficult for users where English is not their first language. Of course, I'm a bad judge of that. I also don't think this option is called often by users at the command-line and instead is done by scripts and third-party tools. Abbreviating the word may have less value there. (This reminds me that is would be good to have a config option that prevents running this process _at all_. Sure, gc.auto=0 prevents the command from doing anything, but there is still an extra cost of starting a process. This is more of a problem on Windows. Making a note to self here.) >> These objects may be removed by normal Git operations (such as `git commit`) >> -which automatically call `git gc --auto`. (See linkgit:git-gc[1].) >> -If these objects are removed and were referenced by the cloned repository, >> -then the cloned repository will become corrupt. >> +which automatically call `git maintenance run --auto` and `git gc --auto`. > > Hmph. Perhaps the picture may change in the end of the series but I > got an impression that "gc --auto" would eventually become just part > of "maintenance --auto" and the users won't have to be even aware of > its existence? Wouldn't we rather want to say something like > > --[no-]auto-maintenance:: > --[no-]auto-gc:: > Run `git maintenance run --auto` at the end to perform > garbage collection if needed (`--[no-]auto-gc` is a > synonym). This is enabled by default. I don't completely eliminate 'git gc' at the end of this series, but instead hope that we can peel it apart slowly in follow-up series. However, you are correct that I should be more careful to obliterate it from the documentation instead of keeping references around. >> diff --git a/builtin/fetch.c b/builtin/fetch.c >> index 82ac4be8a5..49a4d727d4 100644 >> --- a/builtin/fetch.c >> +++ b/builtin/fetch.c >> @@ -196,8 +196,10 @@ static struct option builtin_fetch_options[] = { >> OPT_STRING_LIST(0, "negotiation-tip", &negotiation_tip, N_("revision"), >> N_("report that we have only objects reachable from this object")), >> OPT_PARSE_LIST_OBJECTS_FILTER(&filter_options), > >> + OPT_BOOL(0, "maintenance", &enable_auto_gc, >> + N_("run 'maintenance --auto' after fetching")), >> OPT_BOOL(0, "auto-gc", &enable_auto_gc, >> + N_("run 'maintenance --auto' after fetching")), > > OK, so this is truly a backward-compatible synonym at this point. > >> diff --git a/run-command.c b/run-command.c >> index 9b3a57d1e3..82ad241638 100644 >> --- a/run-command.c >> +++ b/run-command.c >> @@ -1865,14 +1865,17 @@ int run_processes_parallel_tr2(int n, get_next_task_fn get_next_task, >> return result; >> } >> >> -int run_auto_gc(int quiet) >> +int run_auto_maintenance(int quiet) >> { >> struct argv_array argv_gc_auto = ARGV_ARRAY_INIT; >> int status; >> >> - argv_array_pushl(&argv_gc_auto, "gc", "--auto", NULL); >> + argv_array_pushl(&argv_gc_auto, "maintenance", "run", "--auto", NULL); >> if (quiet) >> argv_array_push(&argv_gc_auto, "--quiet"); >> + else >> + argv_array_push(&argv_gc_auto, "--no-quiet"); >> + >> status = run_command_v_opt(argv_gc_auto.argv, RUN_GIT_CMD); >> argv_array_clear(&argv_gc_auto); >> return status; > > Don't we want to replace all _gc_ with _maintenance_ in this > function? I think the first business before we can do so would be > to rethink if spelling out "maintenance" fully in code is a good > idea in the first space. It would make names for variables, > structures and fields unnecessarily long without contributing to > ease of understanding an iota, and a easy-to-remember short-form or > an abbreviation may be needed. Using a short-form/abbreviation > wouldn't worsen the end-user experience, and not the developer > experience for that matter. > > If we choose "gc" as the short-hand, most of the change in this step > would become unnecessary. I also do not mind if we some other words > or word-fragment (perhaps "maint"???) is chosen. Yes, I should have noticed that. Also, with Peff's feedback from another thread, the method can look a bit simpler this way: int run_auto_maintenance(int quiet) { struct child_process maint = CHILD_PROCESS_INIT; maint.git_cmd = 1; argv_array_pushl(&maint.argv, "maintenance", "run", "--auto", NULL); if (quiet) argv_array_push(&maint.argv, "--quiet"); else argv_array_push(&maint.argv, "--no-quiet"); return run_command(&maint.argv); } (I will update it again to work on Peff's argv_array work, but hopefully it is clear how this is simpler.) Thanks, -Stolee

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 7/30/2020 9:29 AM, Derrick Stolee wrote: > On 7/23/2020 4:21 PM, Junio C Hamano wrote: >> Don't we want to replace all _gc_ with _maintenance_ in this >> function? I think the first business before we can do so would be >> to rethink if spelling out "maintenance" fully in code is a good >> idea in the first space. It would make names for variables, >> structures and fields unnecessarily long without contributing to >> ease of understanding an iota, and a easy-to-remember short-form or >> an abbreviation may be needed. Using a short-form/abbreviation >> wouldn't worsen the end-user experience, and not the developer >> experience for that matter. >> >> If we choose "gc" as the short-hand, most of the change in this step >> would become unnecessary. I also do not mind if we some other words >> or word-fragment (perhaps "maint"???) is chosen. > > Yes, I should have noticed that. Also, with Peff's feedback from > another thread, the method can look a bit simpler this way: It would help if I actually _compile_ code before sending it. Here is the fixed version: int run_auto_maintenance(int quiet) { struct child_process maint = CHILD_PROCESS_INIT; maint.git_cmd = 1; argv_array_pushl(&maint.args, "maintenance", "run", "--auto", NULL); if (quiet) argv_array_push(&maint.args, "--quiet"); else argv_array_push(&maint.args, "--no-quiet"); return run_command(&maint); }

On the Git mailing list, Eric Sunshine wrote (reply to this):

On Thu, Jul 30, 2020 at 9:31 AM Derrick Stolee <stolee@gmail.com> wrote: > int run_auto_maintenance(int quiet) > { > struct child_process maint = CHILD_PROCESS_INIT; > maint.git_cmd = 1; > > argv_array_pushl(&maint.args, "maintenance", "run", "--auto", NULL); > if (quiet) > argv_array_push(&maint.args, "--quiet"); > else > argv_array_push(&maint.args, "--no-quiet"); It's subjective, but this might be a good fit for the ternary operator: argv_array_pushl(&maint.args, "maintenance", "run", "--auto", NULL); argv_array_push(&maint.args, quiet ? "--quiet" : "--no-quiet"); Or even: argv_array_pushl(&maint.args, "maintenance", "run", "--auto", quiet ? "--quiet" : "--no-quiet", NULL); The latter is a bit less readable to me.

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 7/30/2020 3:00 PM, Eric Sunshine wrote: > On Thu, Jul 30, 2020 at 9:31 AM Derrick Stolee <stolee@gmail.com> wrote: >> int run_auto_maintenance(int quiet) >> { >> struct child_process maint = CHILD_PROCESS_INIT; >> maint.git_cmd = 1; >> >> argv_array_pushl(&maint.args, "maintenance", "run", "--auto", NULL); >> if (quiet) >> argv_array_push(&maint.args, "--quiet"); >> else >> argv_array_push(&maint.args, "--no-quiet"); > > It's subjective, but this might be a good fit for the ternary operator: > > argv_array_pushl(&maint.args, "maintenance", "run", "--auto", NULL); > argv_array_push(&maint.args, quiet ? "--quiet" : "--no-quiet"); Good idea! Thanks. > Or even: > > argv_array_pushl(&maint.args, "maintenance", "run", "--auto", > quiet ? "--quiet" : "--no-quiet", NULL); > > The latter is a bit less readable to me. I agree. -Stolee

gitgitgadget · 2020-07-09T02:33:10Z

On the Git mailing list, Jonathan Tan wrote (reply to this):

> +static int run_write_commit_graph(struct repository *r) > +{ > + int result; > + struct argv_array cmd = ARGV_ARRAY_INIT; > + > + argv_array_pushl(&cmd, "-C", r->worktree, > + "commit-graph", "write", > + "--split", "--reachable", > + NULL); As mentioned in my reply to an earlier patch (sent a few minutes ago), this won't work if there are environment variables like GIT_DIR present. Besides that, the overall design looks reasonable.

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 7/8/2020 10:29 PM, Jonathan Tan wrote: >> +static int run_write_commit_graph(struct repository *r) >> +{ >> + int result; >> + struct argv_array cmd = ARGV_ARRAY_INIT; >> + >> + argv_array_pushl(&cmd, "-C", r->worktree, >> + "commit-graph", "write", >> + "--split", "--reachable", >> + NULL); > > As mentioned in my reply to an earlier patch (sent a few minutes ago), > this won't work if there are environment variables like GIT_DIR present. Do we not pass GIT_DIR to the subcommand? Or does using "-C" override the GIT_DIR? Thanks, -Stolee

On the Git mailing list, Jeff King wrote (reply to this):

On Thu, Jul 09, 2020 at 07:14:41AM -0400, Derrick Stolee wrote: > On 7/8/2020 10:29 PM, Jonathan Tan wrote: > >> +static int run_write_commit_graph(struct repository *r) > >> +{ > >> + int result; > >> + struct argv_array cmd = ARGV_ARRAY_INIT; > >> + > >> + argv_array_pushl(&cmd, "-C", r->worktree, > >> + "commit-graph", "write", > >> + "--split", "--reachable", > >> + NULL); > > > > As mentioned in my reply to an earlier patch (sent a few minutes ago), > > this won't work if there are environment variables like GIT_DIR present. > > Do we not pass GIT_DIR to the subcommand? Or does using "-C" override > the GIT_DIR? We do pass GIT_DIR to the subcommand, and "-C" does not override it. I think this code would work as long as "r" is the_repository, which it would be in the current code. But then the "-C" would be doing nothing useful (it might change to the top of the worktree if we weren't there for some reason, but I don't think "commit-graph write" would care either way). But if "r" is some other repository, "commit-graph" would continue to operate in the parent process repository because of the inherited GIT_DIR. Using "--git-dir" would solve that, but as a general practice, if you're spawning a sub-process that might be in another repository, you should clear any repo-specific environment variables. The list is in local_repo_env, which you can feed to the "env" or "env_array" parameter of a child_process (see the use in connect.c for an example). Even in the current scheme where "r" is always the_repository, I suspect this might still be buggy. If we're in a bare repository, presumably r->worktree would be NULL. -Peff

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 7/9/2020 6:52 PM, Jeff King wrote: > On Thu, Jul 09, 2020 at 07:14:41AM -0400, Derrick Stolee wrote: > >> On 7/8/2020 10:29 PM, Jonathan Tan wrote: >>>> +static int run_write_commit_graph(struct repository *r) >>>> +{ >>>> + int result; >>>> + struct argv_array cmd = ARGV_ARRAY_INIT; >>>> + >>>> + argv_array_pushl(&cmd, "-C", r->worktree, >>>> + "commit-graph", "write", >>>> + "--split", "--reachable", >>>> + NULL); >>> >>> As mentioned in my reply to an earlier patch (sent a few minutes ago), >>> this won't work if there are environment variables like GIT_DIR present. >> >> Do we not pass GIT_DIR to the subcommand? Or does using "-C" override >> the GIT_DIR? > > We do pass GIT_DIR to the subcommand, and "-C" does not override it. I > think this code would work as long as "r" is the_repository, which it > would be in the current code. But then the "-C" would be doing nothing > useful (it might change to the top of the worktree if we weren't there > for some reason, but I don't think "commit-graph write" would care > either way). > > But if "r" is some other repository, "commit-graph" would continue to > operate in the parent process repository because of the inherited > GIT_DIR. Using "--git-dir" would solve that, but as a general practice, > if you're spawning a sub-process that might be in another repository, > you should clear any repo-specific environment variables. The list is in > local_repo_env, which you can feed to the "env" or "env_array" parameter > of a child_process (see the use in connect.c for an example). > > Even in the current scheme where "r" is always the_repository, I suspect > this might still be buggy. If we're in a bare repository, presumably > r->worktree would be NULL. Ah. I'll investigate this more and work to create a way to run a subcommand in a given repository. Your pointers will help a lot. Thanks, -Stolee

gitgitgadget · 2020-07-09T02:36:14Z

On the Git mailing list, Jonathan Tan wrote (reply to this):

> 3. By adding a new refspec "+refs/heads/*:refs/hidden/<remote>/*" > we can ensure that we actually load the new values somewhere in > our refspace while not updating refs/heads or refs/remotes. By > storing these refs here, the commit-graph job will update the > commit-graph with the commits from these hidden refs. > > 4. --prune will delete the refs/hidden/<remote> refs that no > longer appear on the remote. Having a ref path where Git can place commit IDs that it needs persisted is useful, not only in this case but in other cases (e.g. when fetching a submodule commit by hash, we might not have a ref name for that commit but want to persist it anyway), so I look forward to having something like this. The name of this special ref path and its specific nature could be discussed further, but maybe it is sufficient for now to just say that the refs under this special ref path are controlled by Git, and their layout is experimental and subject to change (e.g. future versions of Git could just erase the entire path and rewrite the refs its own way).

gitgitgadget · 2020-07-23T20:25:28Z

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes: > diff --git a/Documentation/git-maintenance.txt b/Documentation/git-maintenance.txt > index 35b0be7d40..9204762e21 100644 > --- a/Documentation/git-maintenance.txt > +++ b/Documentation/git-maintenance.txt > @@ -73,6 +73,10 @@ OPTIONS > --quiet:: > Do not report progress or other information over `stderr`. > > +--task=<task>:: > + If this option is specified one or more times, then only run the > + specified tasks in the specified order. > + > GIT > --- > Part of the linkgit:git[1] suite > diff --git a/builtin/gc.c b/builtin/gc.c > index 2cd17398ec..c58dea6fa5 100644 > --- a/builtin/gc.c > +++ b/builtin/gc.c > @@ -710,6 +710,7 @@ static const char * const builtin_maintenance_usage[] = { > static struct maintenance_opts { > int auto_flag; > int quiet; > + int tasks_selected; > } opts; > > static int run_write_commit_graph(void) > @@ -804,20 +805,38 @@ typedef int maintenance_task_fn(void); > struct maintenance_task { > const char *name; > maintenance_task_fn *fn; > - unsigned enabled:1; > + int task_order; > + unsigned enabled:1, > + selected:1; > }; > > static struct maintenance_task *tasks[MAX_NUM_TASKS]; > static int num_tasks; > > +static int compare_tasks_by_selection(const void *a_, const void *b_) > +{ > + const struct maintenance_task *a, *b; > + a = (const struct maintenance_task *)a_; > + b = (const struct maintenance_task *)b_; > + > + return b->task_order - a->task_order; > +} It forces the reader to know intimately that task_order *is* selection order in order to understand why this is "tasks by selection". Perhaps rename the field to match what it is (i.e. "selection_order")? > static int maintenance_run(void) > { > int i; > int result = 0; > > + if (opts.tasks_selected) > + QSORT(tasks, num_tasks, compare_tasks_by_selection); > + > for (i = 0; !result && i < num_tasks; i++) { > - if (!tasks[i]->enabled) > + if (opts.tasks_selected && !tasks[i]->selected) > + continue; > + > + if (!opts.tasks_selected && !tasks[i]->enabled) > continue; I am not sure about this. Even if the task <x> is disabled, if the user says --task=<x>, it is run anyway? Doesn't make an immediate sense to me. As I am bad at deciphering de Morgan, I'd find it easier to read if the first were written more like so: if (!(!opts.tasks_selected || tasks[i]->selected)) continue; That is, "do not skip any when not limited, and do not skip the ones that are selected when limited". And that would easily extend to if (!tasks[i]->enabled || !(!opts.tasks_selected || tasks[i]->selected)) continue; > + > result = tasks[i]->fn(); > } > @@ -842,6 +861,44 @@ static void initialize_tasks(void) > num_tasks++; > } > > +static int task_option_parse(const struct option *opt, > + const char *arg, int unset) > +{ > + int i; > + struct maintenance_task *task = NULL; > + > + BUG_ON_OPT_NEG(unset); > + > + if (!arg || !strlen(arg)) { > + error(_("--task requires a value")); > + return 1; There is no need to special case an empty string that was explicitly given as the value---it will be caught as "'' is not a valid task". > + } > + > + opts.tasks_selected++; > + > + for (i = 0; i < MAX_NUM_TASKS; i++) { > + if (tasks[i] && !strcasecmp(tasks[i]->name, arg)) { > + task = tasks[i]; > + break; > + } > + } > + > + if (!task) { > + error(_("'%s' is not a valid task"), arg); > + return 1; > + } > + > + if (task->selected) { > + error(_("task '%s' cannot be selected multiple times"), arg); > + return 1; > + } > + > + task->selected = 1; > + task->task_order = opts.tasks_selected; > + > + return 0; > +}

On the Git mailing list, Junio C Hamano wrote (reply to this):

Junio C Hamano <gitster@pobox.com> writes: >> for (i = 0; !result && i < num_tasks; i++) { >> - if (!tasks[i]->enabled) >> + if (opts.tasks_selected && !tasks[i]->selected) >> + continue; >> + >> + if (!opts.tasks_selected && !tasks[i]->enabled) >> continue; > > I am not sure about this. Even if the task <x> is disabled, if the > user says --task=<x>, it is run anyway? Doesn't make an immediate > sense to me. OK, after seeing the title of 12/18, it does make sense. Even if it is disabled by default, you could still choose to run, which makes sense.

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 7/23/2020 4:21 PM, Junio C Hamano wrote: > "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes: >> +static int compare_tasks_by_selection(const void *a_, const void *b_) >> +{ >> + const struct maintenance_task *a, *b; >> + a = (const struct maintenance_task *)a_; >> + b = (const struct maintenance_task *)b_; >> + >> + return b->task_order - a->task_order; >> +} > > It forces the reader to know intimately that task_order *is* > selection order in order to understand why this is "tasks by > selection". Perhaps rename the field to match what it is > (i.e. "selection_order")? Good idea. I made this fix locally. >> static int maintenance_run(void) >> { >> int i; >> int result = 0; >> >> + if (opts.tasks_selected) >> + QSORT(tasks, num_tasks, compare_tasks_by_selection); >> + >> for (i = 0; !result && i < num_tasks; i++) { >> - if (!tasks[i]->enabled) >> + if (opts.tasks_selected && !tasks[i]->selected) >> + continue; >> + >> + if (!opts.tasks_selected && !tasks[i]->enabled) >> continue; > > I am not sure about this. Even if the task <x> is disabled, if the > user says --task=<x>, it is run anyway? Doesn't make an immediate > sense to me. You already replied that you figured this out. However, I could make it easier by adding some foreshadowing in the commit message here. > As I am bad at deciphering de Morgan, I'd find it easier to read if > the first were written more like so: > > if (!(!opts.tasks_selected || tasks[i]->selected)) > continue; > > That is, "do not skip any when not limited, and do not skip the ones > that are selected when limited". And that would easily extend to > > if (!tasks[i]->enabled || > !(!opts.tasks_selected || tasks[i]->selected)) > continue; This isn't quite right, due to the confusing nature of "enabled". The condition here will _never_ allow selecting a disabled task. Perhaps it would be better to rename 'enabled' to 'run_by_default'? That would make it clear that it is choosing which tasks to run unless specified otherwise with --task=<task> here, the config option maintenance.<task>.enabled later, and the --auto conditions even later. Looking even farther down the line (into the next series) there will be similar checks for auto-conditionschecking time-based schedules. Since this loop becomes more complicated in the future, I specifically wanted to group the "skip this task" conditions into their own if blocks: 1. If the user didn't specify --task=<task> explicitly and this task is disabled, then skip this task. 2. If the user _did_ specify --task=<task> explicitly and this task was not on the list, then skip this task. 3. If the user specified --auto and the auto condition fails, then skip this task. 4. (Later) If the user specified --scheduled and the time since the last run is too soon, then skip this task. With this being the planned future, I'd prefer these be split out as separate if conditions instead of a giant combined if. And since that is the plan, then I won't work too hard to combine conditions 1 and 2 into a single condition. >> + >> result = tasks[i]->fn(); >> } > >> @@ -842,6 +861,44 @@ static void initialize_tasks(void) >> num_tasks++; >> } >> >> +static int task_option_parse(const struct option *opt, >> + const char *arg, int unset) >> +{ >> + int i; >> + struct maintenance_task *task = NULL; >> + >> + BUG_ON_OPT_NEG(unset); >> + >> + if (!arg || !strlen(arg)) { >> + error(_("--task requires a value")); >> + return 1; > > There is no need to special case an empty string that was explicitly > given as the value---it will be caught as "'' is not a valid task". Sounds good. No need for this extra message. Thanks, -Stolee

On the Git mailing list, Junio C Hamano wrote (reply to this):

Derrick Stolee <stolee@gmail.com> writes: >>> for (i = 0; !result && i < num_tasks; i++) { >>> - if (!tasks[i]->enabled) >>> + if (opts.tasks_selected && !tasks[i]->selected) >>> + continue; >>> + >>> + if (!opts.tasks_selected && !tasks[i]->enabled) >>> continue; >> >> I am not sure about this. Even if the task <x> is disabled, if the >> user says --task=<x>, it is run anyway? Doesn't make an immediate >> sense to me. >> ... >> if (!tasks[i]->enabled || >> !(!opts.tasks_selected || tasks[i]->selected)) >> continue; > > This isn't quite right, due to the confusing nature of "enabled". Yes, in the message you are responding to, I was still assuming that --task=foo that defeat task.foo.enabled=no would be a bug. If we want to run disabled tasks by selection, of course the condition would need to change.

gitgitgadget · 2020-07-23T20:25:31Z

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes: > +static int maintenance_task_commit_graph(void) > +{ > + struct repository *r = the_repository; > + char *chain_path; > + > + /* Skip commit-graph when --auto is specified. */ > + if (opts.auto_flag) > + return 0; Stepping back a bit, back in "git gc" days, "--auto" had two distinct meanings rolled into one. Check if it even needs to be done, and perform only the lightweight variant if needed. For this task, there is no "lightweight variant" is possible, so returning without checking the need to do a lightweight one makes perfect sense here. But wouldn't it suggest perhaps we could name "auto" field of the options struct in a more meaningful way? Perhaps "quick" (i.e. only the quicker-variant of the maintenance job) or something? > + close_object_store(r->objects); > + if (run_write_commit_graph()) { > + error(_("failed to write commit-graph")); > + return 1; > + } > + > + if (!run_verify_commit_graph()) > + return 0; > + > + warning(_("commit-graph verify caught error, rewriting")); > + > + chain_path = get_commit_graph_chain_filename(r->objects->odb); > + if (unlink(chain_path)) { > + UNLEAK(chain_path); > + die(_("failed to remove commit-graph at %s"), chain_path); OK. > + } > + free(chain_path); > + > + if (!run_write_commit_graph()) > + return 0; > + > + error(_("failed to rewrite commit-graph")); > + return 1; > +} Error convention is "positive for error, zero for success?" That is a bit unusual for our internal API. > static int maintenance_task_gc(void) > { > int result; > @@ -768,6 +836,10 @@ static void initialize_tasks(void) > tasks[num_tasks]->fn = maintenance_task_gc; > tasks[num_tasks]->enabled = 1; > num_tasks++; > + > + tasks[num_tasks]->name = "commit-graph"; > + tasks[num_tasks]->fn = maintenance_task_commit_graph; > + num_tasks++; Again, I am not sure if we want to keep piling on this. > diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh > index e4e4036e50..216ac0b19e 100755 > --- a/t/t7900-maintenance.sh > +++ b/t/t7900-maintenance.sh > @@ -12,7 +12,7 @@ test_expect_success 'help text' ' > test_i18ngrep "usage: git maintenance run" err > ' > > -test_expect_success 'gc [--auto|--quiet]' ' > +test_expect_success 'run [--auto|--quiet]' ' It does not look like this change belongs here. If "run" is appropriate title for this test at this step, it must have been so in the previous step. > GIT_TRACE2_EVENT="$(pwd)/run-no-auto.txt" git maintenance run --no-quiet && > GIT_TRACE2_EVENT="$(pwd)/run-auto.txt" git maintenance run --auto && > GIT_TRACE2_EVENT="$(pwd)/run-quiet.txt" git maintenance run --quiet &&

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 7/23/2020 4:22 PM, Junio C Hamano wrote: > "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes: > >> +static int maintenance_task_commit_graph(void) >> +{ >> + struct repository *r = the_repository; >> + char *chain_path; >> + >> + /* Skip commit-graph when --auto is specified. */ >> + if (opts.auto_flag) >> + return 0; Now that you point this out, this is actually a stray condition from an earlier version. We now have the ".enabled" config and the auto condition function pointer. That handles all of that "should we run this when --auto is specified?" logic outside of the task itself. > Stepping back a bit, back in "git gc" days, "--auto" had two > distinct meanings rolled into one. Check if it even needs to be > done, and perform only the lightweight variant if needed. > > For this task, there is no "lightweight variant" is possible, so > returning without checking the need to do a lightweight one makes > perfect sense here. > > But wouldn't it suggest perhaps we could name "auto" field of the > options struct in a more meaningful way? Perhaps "quick" (i.e. only > the quicker-variant of the maintenance job) or something? But you are discussing here how the _behavior_ can change when --auto is specified. And specifically, "git gc --auto" really meant "This is running after a foreground command, so only do work if necessary and do it quickly to minimize blocking time." I'd be happy to replace "--auto" with "--quick" in the maintenance builtin. This opens up some extra design space for how the individual tasks perform depending on "--quick" being specified or not. My intention was to create tasks that are already in "quick" mode: * loose-objects have a maximum batch size. * incremental-repack is capped in size. * commit-graph uses the --split option. But this "quick" distinction might be important for some of the tasks we intend to extract from the gc builtin. >> + close_object_store(r->objects); >> + if (run_write_commit_graph()) { >> + error(_("failed to write commit-graph")); >> + return 1; >> + } >> + >> + if (!run_verify_commit_graph()) >> + return 0; >> + >> + warning(_("commit-graph verify caught error, rewriting")); >> + >> + chain_path = get_commit_graph_chain_filename(r->objects->odb); >> + if (unlink(chain_path)) { >> + UNLEAK(chain_path); >> + die(_("failed to remove commit-graph at %s"), chain_path); > > OK. > >> + } >> + free(chain_path); >> + >> + if (!run_write_commit_graph()) >> + return 0; >> + >> + error(_("failed to rewrite commit-graph")); >> + return 1; >> +} > > Error convention is "positive for error, zero for success?" That is > a bit unusual for our internal API. Since the tasks are frequently running subcommands, returning 0 for success and non-zero for error matches the error codes returned by those subcommands. Should I instead change the behavior and clearly document that task functions matching maintenance_task_fn follow this error pattern? >> diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh >> index e4e4036e50..216ac0b19e 100755 >> --- a/t/t7900-maintenance.sh >> +++ b/t/t7900-maintenance.sh >> @@ -12,7 +12,7 @@ test_expect_success 'help text' ' >> test_i18ngrep "usage: git maintenance run" err >> ' >> >> -test_expect_success 'gc [--auto|--quiet]' ' >> +test_expect_success 'run [--auto|--quiet]' ' > > It does not look like this change belongs here. If "run" is > appropriate title for this test at this step, it must have been so > in the previous step. Thanks. Will fix. -Stolee

On the Git mailing list, Junio C Hamano wrote (reply to this):

Derrick Stolee <stolee@gmail.com> writes: > But you are discussing here how the _behavior_ can change when > --auto is specified. And specifically, "git gc --auto" really > meant "This is running after a foreground command, so only do > work if necessary and do it quickly to minimize blocking time." > > I'd be happy to replace "--auto" with "--quick" in the > maintenance builtin. > > This opens up some extra design space for how the individual > tasks perform depending on "--quick" being specified or not. > My intention was to create tasks that are already in "quick" > mode: > > * loose-objects have a maximum batch size. > * incremental-repack is capped in size. > * commit-graph uses the --split option. > > But this "quick" distinction might be important for some of > the tasks we intend to extract from the gc builtin. Yup. To be honest, I came to this topic from a completely different direction. The field name "auto" alone (and no other field name) had to have an extra cruft (i.e. "_flag") attached to it, which is understandable but ugly. Then I started thinking if 'auto(matic)' is really the right word to describe what we want out of the option, and came to the realization that there may be better words. > Since the tasks are frequently running subcommands, returning > 0 for success and non-zero for error matches the error codes > returned by those subcommands. As long as these will _never_ be called from other helper functions but from the cmd_foo() top-level and their return values are only used directly as the top-level's return value, I do not mind too much. But whenever I am writing such a code, I find myself not brave enough to make such a bold promise (I saw other people call the helpers I wrote in unintended ways and had to adjust the semantics of them to accomodate the new callers too many times), so I'd rather see the caller do "return !!helper_fn()" to allow helper_fn() to be written more naturally (e.g. letting them return error(...)). Thanks.

On the Git mailing list, Taylor Blau wrote (reply to this):

On Fri, Jul 24, 2020 at 12:47:00PM -0700, Junio C Hamano wrote: > Derrick Stolee <stolee@gmail.com> writes: > > > But you are discussing here how the _behavior_ can change when > > --auto is specified. And specifically, "git gc --auto" really > > meant "This is running after a foreground command, so only do > > work if necessary and do it quickly to minimize blocking time." > > > > I'd be happy to replace "--auto" with "--quick" in the > > maintenance builtin. > > > > This opens up some extra design space for how the individual > > tasks perform depending on "--quick" being specified or not. > > My intention was to create tasks that are already in "quick" > > mode: > > > > * loose-objects have a maximum batch size. > > * incremental-repack is capped in size. > > * commit-graph uses the --split option. > > > > But this "quick" distinction might be important for some of > > the tasks we intend to extract from the gc builtin. > > Yup. To be honest, I came to this topic from a completely different > direction. The field name "auto" alone (and no other field name) > had to have an extra cruft (i.e. "_flag") attached to it, which is > understandable but ugly. Then I started thinking if 'auto(matic)' > is really the right word to describe what we want out of the option, > and came to the realization that there may be better words. I wonder what the quick and slow paths are here. For the commit-graph code, what you wrote here seems to match what I'd expect with passing '--auto' in the sense of running 'git gc'. That is, I'm leaving it up to the commit-graph machinery's idea of the normal '--split' rules to figure out when to roll up layers of a commit-graph, as opposed to creating a new layer and extending the chain. So, I think that makes sense if the caller gave '--auto'. But, I'm not sure that it makes sense if they didn't, in which case I'd imagine something quicker to happen. There, I'd expect something more like: 1. Run 'git commit-graph write --reachable --split=no-merge'. 2. Run 'git commit-graph verify'. 3. If 'git commit-graph verify' failed, drop the existing commit graph and rebuild it with 'git commit-graph --reachable --split=replace'. 4. Otherwise, do nothing. I'm biased, of course, but I think that that matches roughly what I'd expect to happen in the fast/slow path. Granted, the steps to rebuild the commit graph are going to be slow no matter what (depending on the size of the repository), and so in that case maybe the commit-graph should just be dropped. I'm not really sure what to do about that... > > Since the tasks are frequently running subcommands, returning > > 0 for success and non-zero for error matches the error codes > > returned by those subcommands. > > As long as these will _never_ be called from other helper functions > but from the cmd_foo() top-level and their return values are only > used directly as the top-level's return value, I do not mind too > much. > > But whenever I am writing such a code, I find myself not brave > enough to make such a bold promise (I saw other people call the > helpers I wrote in unintended ways and had to adjust the semantics > of them to accomodate the new callers too many times), so I'd rather > see the caller do "return !!helper_fn()" to allow helper_fn() to be > written more naturally (e.g. letting them return error(...)). > > Thanks. Thanks, Taylor

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 7/24/2020 9:52 PM, Taylor Blau wrote: > On Fri, Jul 24, 2020 at 12:47:00PM -0700, Junio C Hamano wrote: >> Derrick Stolee <stolee@gmail.com> writes: >> >>> But you are discussing here how the _behavior_ can change when >>> --auto is specified. And specifically, "git gc --auto" really >>> meant "This is running after a foreground command, so only do >>> work if necessary and do it quickly to minimize blocking time." >>> >>> I'd be happy to replace "--auto" with "--quick" in the >>> maintenance builtin. >>> >>> This opens up some extra design space for how the individual >>> tasks perform depending on "--quick" being specified or not. >>> My intention was to create tasks that are already in "quick" >>> mode: >>> >>> * loose-objects have a maximum batch size. >>> * incremental-repack is capped in size. >>> * commit-graph uses the --split option. >>> >>> But this "quick" distinction might be important for some of >>> the tasks we intend to extract from the gc builtin. >> >> Yup. To be honest, I came to this topic from a completely different >> direction. The field name "auto" alone (and no other field name) >> had to have an extra cruft (i.e. "_flag") attached to it, which is >> understandable but ugly. Then I started thinking if 'auto(matic)' >> is really the right word to describe what we want out of the option, >> and came to the realization that there may be better words. > > I wonder what the quick and slow paths are here. For the commit-graph > code, what you wrote here seems to match what I'd expect with passing > '--auto' in the sense of running 'git gc'. That is, I'm leaving it up to > the commit-graph machinery's idea of the normal '--split' rules to > figure out when to roll up layers of a commit-graph, as opposed to > creating a new layer and extending the chain. I had intended all of my new tasks to be the "quick" version of their operations. The "slow" version would abandon hope of doing a small amount of work to create the best possible world for the repository. This would include: * The commit-graph would collapse all layers into one file. * The multi-pack-index repack would rewrite all object data into one pack-file. * The loose-objects task would not stop at a maximum number of loose objects (and would probably want to repack everything, anyway). I'm open to making this possibility more explicit by renaming "--auto" and just performing a translation to 'git gc --auto'. So, what should the name be? Here are a few options to consider: --quick --fast --limited --incremental -O[0|1|2...] (think GCC optimization flags, exposing granularity) --[non-]aggressive Regardless, this makes me rethink that the --[no-]maintenance option from PATCH 03/18 is better than --[no-]auto-maintenance, since we are really saying "run _some_ maintenance or _no_ maintenance" and the "how" of the maintenance is left intentionally vague. I've already made the change locally to add "auto-" so I'll wait for confirmation before reverting that change. > So, I think that makes sense if the caller gave '--auto'. But, I'm not > sure that it makes sense if they didn't, in which case I'd imagine > something quicker to happen. There, I'd expect something more like: > > 1. Run 'git commit-graph write --reachable --split=no-merge'. > 2. Run 'git commit-graph verify'. > 3. If 'git commit-graph verify' failed, drop the existing commit graph > and rebuild it with 'git commit-graph --reachable --split=replace'. > 4. Otherwise, do nothing. > > I'm biased, of course, but I think that that matches roughly what I'd > expect to happen in the fast/slow path. Granted, the steps to rebuild > the commit graph are going to be slow no matter what (depending on the > size of the repository), and so in that case maybe the commit-graph > should just be dropped. I'm not really sure what to do about that... I think this approach is the best we can do given the current behavior inside the commit-graph builtin. Perhaps in the future we could change the commit-graph builtin to include a "--verify" option so it could do the "git commit-graph verify --shallow" on the new layer before committing the new commit-graph-chain file and expiring old layers. That way, we would not need to delete and rewrite the whole thing when there is a problem writing the top layer. >>> Since the tasks are frequently running subcommands, returning >>> 0 for success and non-zero for error matches the error codes >>> returned by those subcommands. >> >> As long as these will _never_ be called from other helper functions >> but from the cmd_foo() top-level and their return values are only >> used directly as the top-level's return value, I do not mind too >> much. >> >> But whenever I am writing such a code, I find myself not brave >> enough to make such a bold promise (I saw other people call the >> helpers I wrote in unintended ways and had to adjust the semantics >> of them to accomodate the new callers too many times), so I'd rather >> see the caller do "return !!helper_fn()" to allow helper_fn() to be >> written more naturally (e.g. letting them return error(...)). I will try to be consistent here with the behavior: * 0 is success * 1 is failure Which is what I think you are implying by "return !!helper_fn()". Thanks, -Stolee

gitgitgadget · 2020-07-23T20:55:28Z

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes: > diff --git a/Documentation/git-maintenance.txt b/Documentation/git-maintenance.txt > index 9204762e21..0927643247 100644 > --- a/Documentation/git-maintenance.txt > +++ b/Documentation/git-maintenance.txt > @@ -53,6 +53,18 @@ since it will not expire `.graph` files that were in the previous > `commit-graph-chain` file. They will be deleted by a later run based on > the expiration delay. > > +prefetch:: > + The `fetch` task updates the object directory with the latest objects s/fetch/prefetch/ most likely. > + from all registered remotes. For each remote, a `git fetch` command > + is run. The refmap is custom to avoid updating local or remote s/remote/remote-tracking/ definitely. Do not forget the hyphen between the two words. I think it made the above unnecessarily confusing that you ended a sentence after "is run". It gives a wrong impression that you'd be doing a "real fetch", which you need to dispel with a follow up description of the refmap. For each remote, a `git fetch` command is run with a refspec to fetch their branches (those in their `refs/heads`) into our `refs/prefetch/<remote>/` hierarchy and without auto following tags (the configured refspec in the repository is ignored). > + branches (those in `refs/heads` or `refs/remotes`). Instead, the > + remote refs are stored in `refs/prefetch/<remote>/`. Also, tags are > + not updated. > ++ > +This means that foreground fetches are still required to update the > +remote refs, but the users is notified when the branches and tags are s/is notified/are notified/??? > +updated on the remote. Often, when one needs to say "X. This means Y.", X is a suboptimal way to explain what needs to be conveyed to the readers. But this is not such a case. Unlike the "This means" that is often an attempt to rephrasing a poor explanation given first, this gives an implication. But let's not start with a negative impression (i.e. even with prefetch, I still have to do X? What's the point???), but let them feel why it is a good thing. Perhaps (continuing my previous rewrite): This is done to avoid disrupting the remote-tracking branches---the end users expect them to stay unmoved unless they initiate a fetch. With prefetch task, however, the objects necessary to complete a later real fetch would already be obtained, so the real fetch would go faster. In the ideal case, it will just become an update to bunch of remote-tracking branches without any object transfer. or something like that? > + argv_array_pushl(&cmd, "fetch", remote, "--prune", > + "--no-tags", "--refmap=", NULL); > + strbuf_addf(&refmap, "+refs/heads/*:refs/prefetch/%s/*", remote); > + argv_array_push(&cmd, refmap.buf); The command line looks somewhat fishy, but I think it is correct. At first glance it looks like a mistake to pass "--refmap=" and the refspec "+refs/heads/*:refs/prefetch/origin/*" as separate arguments, but I think that is exactly what you want here, i.e. - defeat any refspec in the configuration with "--refmap=<empty>" - give explicit refspec "+refs/heads/*:...", with "--no-tags" to decline auto-following, to tell what exactly are to be fetched and stored where. The description in the log message about refmap needs to be clarified, though (I've already done so in the above suggested rewrite). > +static int maintenance_task_prefetch(void) > +{ > + int result = 0; > + struct string_list_item *item; > + struct string_list remotes = STRING_LIST_INIT_DUP; > + > + if (for_each_remote(fill_each_remote, &remotes)) { > + error(_("failed to fill remotes")); > + result = 1; > + goto cleanup; > + } > + > + /* > + * Do not modify the result based on the success of the 'fetch' > + * operation, as a loss of network could cause 'fetch' to fail > + * quickly. We do not want that to stop the rest of our > + * background operations. > + */ The loop that runs different tasks abort at the first failure, though. Perhaps that loop needs to be rethought as well? > + for (item = remotes.items; > + item && item < remotes.items + remotes.nr; > + item++) > + fetch_remote(item->string); > + > +cleanup: > + string_list_clear(&remotes, 0); > + return result; > +} > + > static int maintenance_task_gc(void) > { > int result; > @@ -871,6 +929,10 @@ static void initialize_tasks(void) > for (i = 0; i < MAX_NUM_TASKS; i++) > tasks[i] = xcalloc(1, sizeof(struct maintenance_task)); > > + tasks[num_tasks]->name = "prefetch"; > + tasks[num_tasks]->fn = maintenance_task_prefetch; > + num_tasks++; > + > tasks[num_tasks]->name = "gc"; > tasks[num_tasks]->fn = maintenance_task_gc; > tasks[num_tasks]->enabled = 1; Two things. - As I said upfront, I do not see the point of preparing the table with code. - The reason why prefetch is placed in front is probably because you do not want to repack before you add more objects to the object store. But doesn't that imply that there is an inherent ordering that we, as those who are more expert on Git than the end users, prefer? Is it a wise decision to let the users affect the order of the tasks run by giving command line options in different order in the previous step?

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 7/23/2020 4:53 PM, Junio C Hamano wrote: > "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes: > >> diff --git a/Documentation/git-maintenance.txt b/Documentation/git-maintenance.txt >> index 9204762e21..0927643247 100644 >> --- a/Documentation/git-maintenance.txt >> +++ b/Documentation/git-maintenance.txt >> @@ -53,6 +53,18 @@ since it will not expire `.graph` files that were in the previous >> `commit-graph-chain` file. They will be deleted by a later run based on >> the expiration delay. >> >> +prefetch:: >> + The `fetch` task updates the object directory with the latest objects > > s/fetch/prefetch/ most likely. > >> + from all registered remotes. For each remote, a `git fetch` command >> + is run. The refmap is custom to avoid updating local or remote > > s/remote/remote-tracking/ definitely. Do not forget the hyphen > between the two words. > > I think it made the above unnecessarily confusing that you ended a > sentence after "is run". It gives a wrong impression that you'd be > doing a "real fetch", which you need to dispel with a follow up > description of the refmap. > > For each remote, a `git fetch` command is run with a refspec > to fetch their branches (those in their `refs/heads`) into > our `refs/prefetch/<remote>/` hierarchy and without auto > following tags (the configured refspec in the repository is > ignored). > >> + branches (those in `refs/heads` or `refs/remotes`). Instead, the >> + remote refs are stored in `refs/prefetch/<remote>/`. Also, tags are >> + not updated. >> ++ >> +This means that foreground fetches are still required to update the >> +remote refs, but the users is notified when the branches and tags are > > s/is notified/are notified/??? > >> +updated on the remote. > > Often, when one needs to say "X. This means Y.", X is a suboptimal > way to explain what needs to be conveyed to the readers. But this > is not such a case. Unlike the "This means" that is often an > attempt to rephrasing a poor explanation given first, this gives an > implication. > > But let's not start with a negative impression (i.e. even with > prefetch, I still have to do X? What's the point???), but let them > feel why it is a good thing. Perhaps (continuing my previous > rewrite): > > This is done to avoid disrupting the remote-tracking > branches---the end users expect them to stay unmoved unless > they initiate a fetch. With prefetch task, however, the > objects necessary to complete a later real fetch would > already be obtained, so the real fetch would go faster. In > the ideal case, it will just become an update to bunch of > remote-tracking branches without any object transfer. > > or something like that? I like this clarification and have adapted it with minimal edits. >> + argv_array_pushl(&cmd, "fetch", remote, "--prune", >> + "--no-tags", "--refmap=", NULL); >> + strbuf_addf(&refmap, "+refs/heads/*:refs/prefetch/%s/*", remote); >> + argv_array_push(&cmd, refmap.buf); > > The command line looks somewhat fishy, but I think it is correct. > At first glance it looks like a mistake to pass "--refmap=" and the > refspec "+refs/heads/*:refs/prefetch/origin/*" as separate arguments, > but I think that is exactly what you want here, i.e. > > - defeat any refspec in the configuration with "--refmap=<empty>" > > - give explicit refspec "+refs/heads/*:...", with "--no-tags" to > decline auto-following, to tell what exactly are to be fetched > and stored where. > > The description in the log message about refmap needs to be > clarified, though (I've already done so in the above suggested > rewrite). I could have made your life easier by referring to b40a50264ac (fetch: document and test --refmap="", 2020-01-21) in my commit message. It includes this sentence in Documentation/fetch-options.txt: Providing an empty `<refspec>` to the `--refmap` option causes Git to ignore the configured refspecs and rely entirely on the refspecs supplied as command-line arguments. >> +static int maintenance_task_prefetch(void) >> +{ >> + int result = 0; >> + struct string_list_item *item; >> + struct string_list remotes = STRING_LIST_INIT_DUP; >> + >> + if (for_each_remote(fill_each_remote, &remotes)) { >> + error(_("failed to fill remotes")); >> + result = 1; >> + goto cleanup; >> + } >> + >> + /* >> + * Do not modify the result based on the success of the 'fetch' >> + * operation, as a loss of network could cause 'fetch' to fail >> + * quickly. We do not want that to stop the rest of our >> + * background operations. >> + */ > > The loop that runs different tasks abort at the first failure, > though. Perhaps that loop needs to be rethought as well? You're right. These maintenance tasks are intended to be independent of each other, so let's try all of them and report a failure after all have been given an opportunity to run. That makes this failure behavior unnecessary. >> + for (item = remotes.items; >> + item && item < remotes.items + remotes.nr; >> + item++) >> + fetch_remote(item->string); >> + >> +cleanup: >> + string_list_clear(&remotes, 0); >> + return result; >> +} >> + >> static int maintenance_task_gc(void) >> { >> int result; >> @@ -871,6 +929,10 @@ static void initialize_tasks(void) >> for (i = 0; i < MAX_NUM_TASKS; i++) >> tasks[i] = xcalloc(1, sizeof(struct maintenance_task)); >> >> + tasks[num_tasks]->name = "prefetch"; >> + tasks[num_tasks]->fn = maintenance_task_prefetch; >> + num_tasks++; >> + >> tasks[num_tasks]->name = "gc"; >> tasks[num_tasks]->fn = maintenance_task_gc; >> tasks[num_tasks]->enabled = 1; > > Two things. > > - As I said upfront, I do not see the point of preparing the table > with code. > > - The reason why prefetch is placed in front is probably because > you do not want to repack before you add more objects to the > object store. But doesn't that imply that there is an inherent > ordering that we, as those who are more expert on Git than the > end users, prefer? Is it a wise decision to let the users affect > the order of the tasks run by giving command line options in > different order in the previous step? I don't anticipate users specifying --task=<task> very often, as it requires deep knowledge of the tasks. If a user _does_ use the option, then we should trust their order as they might have a good reason to choose that order. Generally, my philosophy is to provide expert users with flexible choices while creating sensible defaults for non-expert users. That said, if this sort is more of a problem than the value it provides, then I can drop the sort and the duplicate --task=<task> check. Thanks, -Stolee

On the Git mailing list, Junio C Hamano wrote (reply to this):

Derrick Stolee <stolee@gmail.com> writes: > I don't anticipate users specifying --task=<task> very often, as > it requires deep knowledge of the tasks. If a user _does_ use the > option, then we should trust their order as they might have a > good reason to choose that order. > > Generally, my philosophy is to provide expert users with flexible > choices while creating sensible defaults for non-expert users. Sounds sensible.

gitgitgadget · 2020-07-23T21:07:40Z

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes: > Create a 'loose-objects' task for the 'git maintenance run' command. > This helps clean up loose objects without disrupting concurrent Git > commands using the following sequence of events: > > 1. Run 'git prune-packed' to delete any loose objects that exist > in a pack-file. Concurrent commands will prefer the packed > version of the object to the loose version. (Of course, there > are exceptions for commands that specifically care about the > location of an object. These are rare for a user to run on > purpose, and we hope a user that has selected background > maintenance will not be trying to do foreground maintenance.) OK. That would make sense. > 2. Run 'git pack-objects' on a batch of loose objects. These > objects are grouped by scanning the loose object directories in > lexicographic order until listing all loose objects -or- > reaching 50,000 objects. This is more than enough if the loose > objects are created only by a user doing normal development. I haven't seen this in action, but my gut feeling is that this would result in horrible locality and deltification in the resulting packfile. The order you feed the objects to pack-objects and the path hint you attach to each object matters quite a lot. I do agree that it would be useful to have a task to deal with only loose objects without touching existing packfiles. I just am not sure if 2. is a worthwhile thing to do. A poorly constructed pack will also contaminate later packfiles made without "-f" option to "git repack".

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 7/23/2020 4:59 PM, Junio C Hamano wrote: > "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes: > >> Create a 'loose-objects' task for the 'git maintenance run' command. >> This helps clean up loose objects without disrupting concurrent Git >> commands using the following sequence of events: >> >> 1. Run 'git prune-packed' to delete any loose objects that exist >> in a pack-file. Concurrent commands will prefer the packed >> version of the object to the loose version. (Of course, there >> are exceptions for commands that specifically care about the >> location of an object. These are rare for a user to run on >> purpose, and we hope a user that has selected background >> maintenance will not be trying to do foreground maintenance.) > > OK. That would make sense. > >> 2. Run 'git pack-objects' on a batch of loose objects. These >> objects are grouped by scanning the loose object directories in >> lexicographic order until listing all loose objects -or- >> reaching 50,000 objects. This is more than enough if the loose >> objects are created only by a user doing normal development. > > I haven't seen this in action, but my gut feeling is that this would > result in horrible locality and deltification in the resulting > packfile. The order you feed the objects to pack-objects and the > path hint you attach to each object matters quite a lot. > > I do agree that it would be useful to have a task to deal with only > loose objects without touching existing packfiles. I just am not > sure if 2. is a worthwhile thing to do. A poorly constructed pack > will also contaminate later packfiles made without "-f" option to > "git repack". There are several factors going on here: * In a partial clone, it is likely that we get loose objects only due to a command like "git log -p" that downloads blobs one-by-one. In such a case, this step coming in later and picking up those blobs _will_ find good deltas because they are present in the same batch. * (I know this case isn't important to core Git, but please indulge me) In a VFS for Git repo, the loose objects correspond to blobs that were faulted in by a virtual filesystem read. In this case, the blobs are usually from a single commit in history, so good deltas between the blobs don't actually exist! * My experience indicates that the packs created by the loose-objects task are rather small (when created daily). This means that they get selected by the incremental-repack task to repack into a new pack-file where deltas are recomputed with modest success. As mentioned in that task, we saw a significant compression factor using that step for users of the Windows OS repo, mostly due to recomputing tree deltas. * Some amount of "extra" space is expected with this incremental repacking scheme. The most space-efficient thing to do is a full repack along with a tree walk that detects the paths used for each blob, allowing better hints for delta compression. However, that operation is very _time_ consuming. The trade-off here is something I should make more explicit. In my experience, disk space is cheap but CPU time is expensive. Most repositories could probably do a daily repack without being a disruption to the user. These steps enable maintenance for repositories where a full repack is too disruptive. I hope this adds some context. I would love if someone who knows more about delta compression could challenge my assumptions. Sharing that expertise can help create better maintenance strategies. Junio's initial concern here is a good first step in that direction. Thanks, -Stolee

On the Git mailing list, Junio C Hamano wrote (reply to this):

Derrick Stolee <stolee@gmail.com> writes: > * In a partial clone, it is likely that we get loose objects only > due to a command like "git log -p" that downloads blobs > one-by-one. In such a case, this step coming in later and picking > up those blobs _will_ find good deltas because they are present > in the same batch. > > * (I know this case isn't important to core Git, but please indulge > me) In a VFS for Git repo, the loose objects correspond to blobs > that were faulted in by a virtual filesystem read. In this case, > the blobs are usually from a single commit in history, so good > deltas between the blobs don't actually exist! Let me stop here by saying that I am now starting to worry about overfitting the repacking strategy to lazy clone repositories. I am perfectly fine with the plan to start with just one strategy overfit for partially cloned repositories, as long as we make sure that we can be extended it to suit other access/object acquisition patterns.

gitgitgadget · 2020-07-23T22:04:39Z

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes: > 1. 'git multi-pack-index write' creates a multi-pack-index file if > one did not exist, and otherwise will update the multi-pack-index > with any new pack-files that appeared since the last write. This > is particularly relevant with the background fetch job. > > When the multi-pack-index sees two copies of the same object, it > stores the offset data into the newer pack-file. This means that > some old pack-files could become "unreferenced" which I will use > to mean "a pack-file that is in the pack-file list of the > multi-pack-index but none of the objects in the multi-pack-index > reference a location inside that pack-file." An obvious alternative is to favor the copy in the older pack, right? Is the expectation that over time, most of the objects that are relevant would reappear in newer packs, so that eventually by favoring the copies in the newer packs, we can retire and remove the old pack, keeping only the newer ones? But would that assumption hold? The old packs hold objects that are necessary for the older parts of the history, so unless you are cauterizing away the old history, these objects in the older packs are likely to stay with us longer than those used by the newer parts of the history, some of which may not even have been pushed out yet and can be rebased away? > 2. 'git multi-pack-index expire' deletes any unreferenced pack-files > and updaes the multi-pack-index to drop those pack-files from the > list. This is safe to do as concurrent Git processes will see the > multi-pack-index and not open those packs when looking for object > contents. (Similar to the 'loose-objects' job, there are some Git > commands that open pack-files regardless of the multi-pack-index, > but they are rarely used. Further, a user that self-selects to > use background operations would likely refrain from using those > commands.) OK. > 3. 'git multi-pack-index repack --bacth-size=<size>' collects a set > of pack-files that are listed in the multi-pack-index and creates > a new pack-file containing the objects whose offsets are listed > by the multi-pack-index to be in those objects. The set of pack- > files is selected greedily by sorting the pack-files by modified > time and adding a pack-file to the set if its "expected size" is > smaller than the batch size until the total expected size of the > selected pack-files is at least the batch size. The "expected > size" is calculated by taking the size of the pack-file divided > by the number of objects in the pack-file and multiplied by the > number of objects from the multi-pack-index with offset in that > pack-file. The expected size approximats how much data from that approximates. > pack-file will contribute to the resulting pack-file size. The > intention is that the resulting pack-file will be close in size > to the provided batch size. > +static int maintenance_task_incremental_repack(void) > +{ > + if (multi_pack_index_write()) { > + error(_("failed to write multi-pack-index")); > + return 1; > + } > + > + if (multi_pack_index_verify()) { > + warning(_("multi-pack-index verify failed after initial write")); > + return rewrite_multi_pack_index(); > + } > + > + if (multi_pack_index_expire()) { > + error(_("multi-pack-index expire failed")); > + return 1; > + } > + > + if (multi_pack_index_verify()) { > + warning(_("multi-pack-index verify failed after expire")); > + return rewrite_multi_pack_index(); > + } > + if (multi_pack_index_repack()) { > + error(_("multi-pack-index repack failed")); > + return 1; > + } Hmph, I wonder if these warning should come from each helper functions that are static to this function anyway. It also makes it easier to reason about this function by eliminating the need for having a different pattern only for the verify helper. Instead, verify could call rewrite internally when it notices a breakage. I.e. if (multi_pack_index_write()) return 1; if (multi_pack_index_verify("after initial write")) return 1; if (multi_pack_index_exire()) return 1; ... Also, it feels odd, compared to our internal API convention, that positive non-zero is used as an error here. > + return 0; > +} > + > typedef int maintenance_task_fn(void); > > struct maintenance_task { > @@ -1037,6 +1152,10 @@ static void initialize_tasks(void) > tasks[num_tasks]->fn = maintenance_task_loose_objects; > num_tasks++; > > + tasks[num_tasks]->name = "incremental-repack"; > + tasks[num_tasks]->fn = maintenance_task_incremental_repack; > + num_tasks++; > + > tasks[num_tasks]->name = "gc"; > tasks[num_tasks]->fn = maintenance_task_gc; > tasks[num_tasks]->enabled = 1; Exactly the same comment as 08/18 about natural/inherent ordering applies here as well. Thanks.

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 7/23/2020 6:00 PM, Junio C Hamano wrote: > "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes: > >> 1. 'git multi-pack-index write' creates a multi-pack-index file if >> one did not exist, and otherwise will update the multi-pack-index >> with any new pack-files that appeared since the last write. This >> is particularly relevant with the background fetch job. >> >> When the multi-pack-index sees two copies of the same object, it >> stores the offset data into the newer pack-file. This means that >> some old pack-files could become "unreferenced" which I will use >> to mean "a pack-file that is in the pack-file list of the >> multi-pack-index but none of the objects in the multi-pack-index >> reference a location inside that pack-file." > > An obvious alternative is to favor the copy in the older pack, > right? Is the expectation that over time, most of the objects that > are relevant would reappear in newer packs, so that eventually by > favoring the copies in the newer packs, we can retire and remove the > old pack, keeping only the newer ones? > > But would that assumption hold? The old packs hold objects that are > necessary for the older parts of the history, so unless you are > cauterizing away the old history, these objects in the older packs > are likely to stay with us longer than those used by the newer parts > of the history, some of which may not even have been pushed out yet > and can be rebased away? If we created a new pack-file containing an already-packed object, then shouldn't we assume that the new pack-file does a _better_ job of compressing that object? Or at least, doesn't make it worse? For example, if we use 'git multi-pack-index repack --batch-size=0', then this creates a new pack-file containing every previously-packed object. This new pack-file should have better delta compression than the previous setup across multiple pack-files. We want this new pack-file to be used, not the old ones. This "pick the newest pack" strategy is also what allows us to safely use the 'expire' option to drop old pack-files. If we always keep the old copies, then when we try to switch the new pack-files, we cannot delete the old packs safely because a concurrent Git process could be trying to reference it. One race is as follows: * Process A opens the multi-pack-index referring to old pack P. It doesn't open the pack-file as it hasn't tried to parse objects yet. * Process B is 'git multi-pack-index expire'. It sees that pack P can be dropped because all objects appear in newer pack-files. It deletes P. * Process A tries to read from P, and this fails. A must reprepare its representation of the pack-files. This is the less disruptive race since A can recover with a small cost to its performance. The worse race (on Windows) is this: * Process A loads the multi-pack-index and tries to parse an object by loading "old" pack P. * Process B tries to delete P. However, on Windows the handle to P by A prevents the deletion. At this point, there could be two resolutions. The first is to have the 'expire' fail because we can't delete A. This means we might never delete A in a busy repository. The second is that the 'expire' command continues and drops A from the list in the multi-pack-index. However, now all Git processes add A to the packed_git list because it isn't referenced by the multi-pack-index. In summary: the decision to pick the newer copies is a fundamental part of how the write->expire->repack loop was designed. >> 2. 'git multi-pack-index expire' deletes any unreferenced pack-files >> and updaes the multi-pack-index to drop those pack-files from the >> list. This is safe to do as concurrent Git processes will see the >> multi-pack-index and not open those packs when looking for object >> contents. (Similar to the 'loose-objects' job, there are some Git >> commands that open pack-files regardless of the multi-pack-index, >> but they are rarely used. Further, a user that self-selects to >> use background operations would likely refrain from using those >> commands.) > > OK. > >> 3. 'git multi-pack-index repack --bacth-size=<size>' collects a set >> of pack-files that are listed in the multi-pack-index and creates >> a new pack-file containing the objects whose offsets are listed >> by the multi-pack-index to be in those objects. The set of pack- >> files is selected greedily by sorting the pack-files by modified >> time and adding a pack-file to the set if its "expected size" is >> smaller than the batch size until the total expected size of the >> selected pack-files is at least the batch size. The "expected >> size" is calculated by taking the size of the pack-file divided >> by the number of objects in the pack-file and multiplied by the >> number of objects from the multi-pack-index with offset in that >> pack-file. The expected size approximats how much data from that > > approximates. > >> pack-file will contribute to the resulting pack-file size. The >> intention is that the resulting pack-file will be close in size >> to the provided batch size. > >> +static int maintenance_task_incremental_repack(void) >> +{ >> + if (multi_pack_index_write()) { >> + error(_("failed to write multi-pack-index")); >> + return 1; >> + } >> + >> + if (multi_pack_index_verify()) { >> + warning(_("multi-pack-index verify failed after initial write")); >> + return rewrite_multi_pack_index(); >> + } >> + >> + if (multi_pack_index_expire()) { >> + error(_("multi-pack-index expire failed")); >> + return 1; >> + } >> + >> + if (multi_pack_index_verify()) { >> + warning(_("multi-pack-index verify failed after expire")); >> + return rewrite_multi_pack_index(); >> + } >> + if (multi_pack_index_repack()) { >> + error(_("multi-pack-index repack failed")); >> + return 1; >> + } > > Hmph, I wonder if these warning should come from each helper > functions that are static to this function anyway. > > It also makes it easier to reason about this function by eliminating > the need for having a different pattern only for the verify helper. > Instead, verify could call rewrite internally when it notices a > breakage. I.e. > > if (multi_pack_index_write()) > return 1; > if (multi_pack_index_verify("after initial write")) > return 1; > if (multi_pack_index_exire()) > return 1; > ... This is a cleaner model. I'll work on that. > Also, it feels odd, compared to our internal API convention, that > positive non-zero is used as an error here. ... > Exactly the same comment as 08/18 about natural/inherent ordering > applies here as well. I'll leave these to be resolved in the earlier messages. Thanks, -Stolee

gitgitgadget · 2020-07-25T01:45:27Z

On the Git mailing list, Đoàn Trần Công Danh wrote (reply to this):

On 2020-07-23 17:56:30+0000, Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com> wrote: > From: Derrick Stolee <dstolee@microsoft.com> > > When working with very large repositories, an incremental 'git fetch' > command can download a large amount of data. If there are many other > users pushing to a common repo, then this data can rival the initial > pack-file size of a 'git clone' of a medium-size repo. > > Users may want to keep the data on their local repos as close as > possible to the data on the remote repos by fetching periodically in > the background. This can break up a large daily fetch into several > smaller hourly fetches. > > The task is called "prefetch" because it is work done in advance > of a foreground fetch to make that 'git fetch' command much faster. > > However, if we simply ran 'git fetch <remote>' in the background, > then the user running a foregroudn 'git fetch <remote>' would lose > some important feedback when a new branch appears or an existing > branch updates. This is especially true if a remote branch is > force-updated and this isn't noticed by the user because it occurred > in the background. Further, the functionality of 'git push > --force-with-lease' becomes suspect. > > When running 'git fetch <remote> <options>' in the background, use > the following options for careful updating: Does this job interfere with FETCH_HEAD? From my quick test (by applying 01-08 on top of rc1, and messing with t7900), it looks like yes. I (and some other people, probably) rely on FETCH_HEAD for our scripts. Hence, it would be nice to not touch FETCH_HEAD with prefetch job. Thanks, -Danh > > 1. --no-tags prevents getting a new tag when a user wants to see > the new tags appear in their foreground fetches. > > 2. --refmap= removes the configured refspec which usually updates > refs/remotes/<remote>/* with the refs advertised by the remote. > > 3. By adding a new refspec "+refs/heads/*:refs/prefetch/<remote>/*" > we can ensure that we actually load the new values somewhere in > our refspace while not updating refs/heads or refs/remotes. By > storing these refs here, the commit-graph job will update the > commit-graph with the commits from these hidden refs. > > 4. --prune will delete the refs/prefetch/<remote> refs that no > longer appear on the remote. > > We've been using this step as a critical background job in Scalar > [1] (and VFS for Git). This solved a pain point that was showing up > in user reports: fetching was a pain! Users do not like waiting to > download the data that was created while they were away from their > machines. After implementing background fetch, the foreground fetch > commands sped up significantly because they mostly just update refs > and download a small amount of new data. The effect is especially > dramatic when paried with --no-show-forced-udpates (through > fetch.showForcedUpdates=false). > > [1] https://github.com/microsoft/scalar/blob/master/Scalar.Common/Maintenance/FetchStep.cs > > Signed-off-by: Derrick Stolee <dstolee@microsoft.com> > --- > Documentation/git-maintenance.txt | 12 ++++++ > builtin/gc.c | 64 ++++++++++++++++++++++++++++++- > t/t7900-maintenance.sh | 24 ++++++++++++ > 3 files changed, 99 insertions(+), 1 deletion(-) > > diff --git a/Documentation/git-maintenance.txt b/Documentation/git-maintenance.txt > index 9204762e21..0927643247 100644 > --- a/Documentation/git-maintenance.txt > +++ b/Documentation/git-maintenance.txt > @@ -53,6 +53,18 @@ since it will not expire `.graph` files that were in the previous > `commit-graph-chain` file. They will be deleted by a later run based on > the expiration delay. > > +prefetch:: > + The `fetch` task updates the object directory with the latest objects > + from all registered remotes. For each remote, a `git fetch` command > + is run. The refmap is custom to avoid updating local or remote > + branches (those in `refs/heads` or `refs/remotes`). Instead, the > + remote refs are stored in `refs/prefetch/<remote>/`. Also, tags are > + not updated. > ++ > +This means that foreground fetches are still required to update the > +remote refs, but the users is notified when the branches and tags are > +updated on the remote. > + > gc:: > Cleanup unnecessary files and optimize the local repository. "GC" > stands for "garbage collection," but this task performs many > diff --git a/builtin/gc.c b/builtin/gc.c > index 5d99b4b805..969c127877 100644 > --- a/builtin/gc.c > +++ b/builtin/gc.c > @@ -28,6 +28,7 @@ > #include "blob.h" > #include "tree.h" > #include "promisor-remote.h" > +#include "remote.h" > > #define FAILED_RUN "failed to run %s" > > @@ -700,7 +701,7 @@ int cmd_gc(int argc, const char **argv, const char *prefix) > return 0; > } > > -#define MAX_NUM_TASKS 2 > +#define MAX_NUM_TASKS 3 > > static const char * const builtin_maintenance_usage[] = { > N_("git maintenance run [<options>]"), > @@ -781,6 +782,63 @@ static int maintenance_task_commit_graph(void) > return 1; > } > > +static int fetch_remote(const char *remote) > +{ > + int result; > + struct argv_array cmd = ARGV_ARRAY_INIT; > + struct strbuf refmap = STRBUF_INIT; > + > + argv_array_pushl(&cmd, "fetch", remote, "--prune", > + "--no-tags", "--refmap=", NULL); > + > + strbuf_addf(&refmap, "+refs/heads/*:refs/prefetch/%s/*", remote); > + argv_array_push(&cmd, refmap.buf); > + > + if (opts.quiet) > + argv_array_push(&cmd, "--quiet"); > + > + result = run_command_v_opt(cmd.argv, RUN_GIT_CMD); > + > + strbuf_release(&refmap); > + return result; > +} > + > +static int fill_each_remote(struct remote *remote, void *cbdata) > +{ > + struct string_list *remotes = (struct string_list *)cbdata; > + > + string_list_append(remotes, remote->name); > + return 0; > +} > + > +static int maintenance_task_prefetch(void) > +{ > + int result = 0; > + struct string_list_item *item; > + struct string_list remotes = STRING_LIST_INIT_DUP; > + > + if (for_each_remote(fill_each_remote, &remotes)) { > + error(_("failed to fill remotes")); > + result = 1; > + goto cleanup; > + } > + > + /* > + * Do not modify the result based on the success of the 'fetch' > + * operation, as a loss of network could cause 'fetch' to fail > + * quickly. We do not want that to stop the rest of our > + * background operations. > + */ > + for (item = remotes.items; > + item && item < remotes.items + remotes.nr; > + item++) > + fetch_remote(item->string); > + > +cleanup: > + string_list_clear(&remotes, 0); > + return result; > +} > + > static int maintenance_task_gc(void) > { > int result; > @@ -871,6 +929,10 @@ static void initialize_tasks(void) > for (i = 0; i < MAX_NUM_TASKS; i++) > tasks[i] = xcalloc(1, sizeof(struct maintenance_task)); > > + tasks[num_tasks]->name = "prefetch"; > + tasks[num_tasks]->fn = maintenance_task_prefetch; > + num_tasks++; > + > tasks[num_tasks]->name = "gc"; > tasks[num_tasks]->fn = maintenance_task_gc; > tasks[num_tasks]->enabled = 1; > diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh > index c09a9eb90b..8b04a04c79 100755 > --- a/t/t7900-maintenance.sh > +++ b/t/t7900-maintenance.sh > @@ -44,4 +44,28 @@ test_expect_success 'run --task duplicate' ' > test_i18ngrep "cannot be selected multiple times" err > ' > > +test_expect_success 'run --task=prefetch with no remotes' ' > + git maintenance run --task=prefetch 2>err && > + test_must_be_empty err > +' > + > +test_expect_success 'prefetch multiple remotes' ' > + git clone . clone1 && > + git clone . clone2 && > + git remote add remote1 "file://$(pwd)/clone1" && > + git remote add remote2 "file://$(pwd)/clone2" && > + git -C clone1 switch -c one && > + git -C clone2 switch -c two && > + test_commit -C clone1 one && > + test_commit -C clone2 two && > + GIT_TRACE2_EVENT="$(pwd)/run-prefetch.txt" git maintenance run --task=prefetch && > + grep ",\"fetch\",\"remote1\"" run-prefetch.txt && > + grep ",\"fetch\",\"remote2\"" run-prefetch.txt && > + test_path_is_missing .git/refs/remotes && > + test_cmp clone1/.git/refs/heads/one .git/refs/prefetch/remote1/one && > + test_cmp clone2/.git/refs/heads/two .git/refs/prefetch/remote2/two && > + git log prefetch/remote1/one && > + git log prefetch/remote2/two > +' > + > test_done > -- > gitgitgadget > -- Danh

On the Git mailing list, Junio C Hamano wrote (reply to this):

Đoàn Trần Công Danh <congdanhqx@gmail.com> writes: >> When running 'git fetch <remote> <options>' in the background, use >> the following options for careful updating: > > Does this job interfere with FETCH_HEAD? > From my quick test (by applying 01-08 on top of rc1, and messing with t7900), > it looks like yes. > > I (and some other people, probably) rely on FETCH_HEAD for our scripts. > Hence, it would be nice to not touch FETCH_HEAD with prefetch job. Very good point. For that, Derrick may want to swallow the single patch from 'jc/no-update-fetch-head' topic into this series and pass the new command line option.

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 7/24/2020 9:48 PM, Junio C Hamano wrote: > Đoàn Trần Công Danh <congdanhqx@gmail.com> writes: > >>> When running 'git fetch <remote> <options>' in the background, use >>> the following options for careful updating: >> >> Does this job interfere with FETCH_HEAD? >> From my quick test (by applying 01-08 on top of rc1, and messing with t7900), >> it looks like yes. >> >> I (and some other people, probably) rely on FETCH_HEAD for our scripts. >> Hence, it would be nice to not touch FETCH_HEAD with prefetch job. > > Very good point. For that, Derrick may want to swallow the single > patch from 'jc/no-update-fetch-head' topic into this series and > pass the new command line option. Thanks for the pointer! I appreciate the attention to detail here. I'll rebase onto jc/no-update-fetch-head for the next version, since that branch is based on v2.28.0-rc0, which is recent enough. Thanks, -Stolee

On the Git mailing list, Junio C Hamano wrote (reply to this):

Derrick Stolee <stolee@gmail.com> writes: > I'll rebase onto jc/no-update-fetch-head for the next version, since > that branch is based on v2.28.0-rc0, which is recent enough. I do not think it is wise to base a work on top of unfinished "you could do it this way, perhaps?" demonstration patch the original author does not have much inclination to finish, though. When I am really bored, I may go back to the topic to finish it, but I wouldn't mind if you took ownership of it at all. Thanks.

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 7/27/2020 12:13 PM, Junio C Hamano wrote: > Derrick Stolee <stolee@gmail.com> writes: > >> I'll rebase onto jc/no-update-fetch-head for the next version, since >> that branch is based on v2.28.0-rc0, which is recent enough. > > I do not think it is wise to base a work on top of unfinished "you > could do it this way, perhaps?" demonstration patch the original > author does not have much inclination to finish, though. > > When I am really bored, I may go back to the topic to finish it, but > I wouldn't mind if you took ownership of it at all. Ah. I didn't understand the status of that branch. I'll pull it in to this topic. Thanks, -Stolee

On the Git mailing list, Junio C Hamano wrote (reply to this):

Derrick Stolee <stolee@gmail.com> writes: > On 7/27/2020 12:13 PM, Junio C Hamano wrote: >> Derrick Stolee <stolee@gmail.com> writes: >> >>> I'll rebase onto jc/no-update-fetch-head for the next version, since >>> that branch is based on v2.28.0-rc0, which is recent enough. >> >> I do not think it is wise to base a work on top of unfinished "you >> could do it this way, perhaps?" demonstration patch the original >> author does not have much inclination to finish, though. >> >> When I am really bored, I may go back to the topic to finish it, but >> I wouldn't mind if you took ownership of it at all. > > Ah. I didn't understand the status of that branch. I'll pull it in > to this topic. So here is with one of the two things that I found missing in the first iteration of the patch: documentation. The other thing that I found iffy (and still missing from this version) was what should be done when "git pull" is explicitly given the "--no-write-fetch-head" option. I think (but didn't check the recent code) that 'git pull' would pass only known-to-make-sense command line options to underlying 'git fetch', so it probably will barf with "unknown option", which is the best case. We might want to make it sure with a new test in 5521. On the other hand, if we get anything other than "no such option", we may want to think if we want to "fix" it or just leave it inside "if it hurts, don't do it" territory. Thanks. The patch without doc was Reviewed-by: Taylor Blau <me@ttaylorr.com> but this round has not been. -- >8 -- If you run fetch but record the result in remote-tracking branches, and either if you do nothing with the fetched refs (e.g. you are merely mirroring) or if you always work from the remote-tracking refs (e.g. you fetch and then merge origin/branchname separately), you can get away with having no FETCH_HEAD at all. Teach "git fetch" a command line option "--[no-]write-fetch-head" and "fetch.writeFetchHEAD" configuration variable. Without either, the default is to write FETCH_HEAD, and the usual rule that the command line option defeats configured default applies. Note that under "--dry-run" mode, FETCH_HEAD is never written; otherwise you'd see list of objects in the file that you do not actually have. Passing `--fetch-write-head` does not force `git fetch` to write the file. Also note that this option is explicitly passed when "git pull" internally invokes "git fetch", so that those who configured their "git fetch" not to write FETCH_HEAD would not be able to break the cooperation between these two commands. "git pull" must see what "git fetch" got recorded in FETCH_HEAD to work correctly. Signed-off-by: Junio C Hamano <gitster@pobox.com> --- Documentation/config/fetch.txt | 7 ++++++ Documentation/fetch-options.txt | 10 +++++++++ builtin/fetch.c | 19 +++++++++++++--- builtin/pull.c | 3 ++- t/t5510-fetch.sh | 39 +++++++++++++++++++++++++++++++-- 5 files changed, 72 insertions(+), 6 deletions(-) diff --git a/Documentation/config/fetch.txt b/Documentation/config/fetch.txt index b20394038d..0aaa05e8c0 100644 --- a/Documentation/config/fetch.txt +++ b/Documentation/config/fetch.txt @@ -91,3 +91,10 @@ fetch.writeCommitGraph:: merge and the write may take longer. Having an updated commit-graph file helps performance of many Git commands, including `git merge-base`, `git push -f`, and `git log --graph`. Defaults to false. + +fetch.writeFetchHEAD:: + Setting it to false tells `git fetch` not to write the list + of remote refs fetched in the `FETCH_HEAD` file directly + under `$GIT_DIR`. Can be countermanded from the command + line with the `--[no-]write-fetch-head` option. Defaults to + true. diff --git a/Documentation/fetch-options.txt b/Documentation/fetch-options.txt index 6e2a160a47..6775e8499f 100644 --- a/Documentation/fetch-options.txt +++ b/Documentation/fetch-options.txt @@ -64,6 +64,16 @@ documented in linkgit:git-config[1]. --dry-run:: Show what would be done, without making any changes. +ifndef::git-pull[] +--[no-]write-fetch-head:: + Write the list of remote refs fetched in the `FETCH_HEAD` + file directly under `$GIT_DIR`. This is the default unless + the configuration variable `fetch.writeFetchHEAD` is set to + false. Passing `--no-write-fetch-head` from the command + line tells Git not to write the file. Under `--dry-run` + option, the file is never written. +endif::git-pull[] + -f:: --force:: When 'git fetch' is used with `<src>:<dst>` refspec it may diff --git a/builtin/fetch.c b/builtin/fetch.c index 82ac4be8a5..3ccf69753f 100644 --- a/builtin/fetch.c +++ b/builtin/fetch.c @@ -56,6 +56,7 @@ static int prune_tags = -1; /* unspecified */ #define PRUNE_TAGS_BY_DEFAULT 0 /* do we prune tags by default? */ static int all, append, dry_run, force, keep, multiple, update_head_ok; +static int write_fetch_head = 1; static int verbosity, deepen_relative, set_upstream; static int progress = -1; static int enable_auto_gc = 1; @@ -118,6 +119,10 @@ static int git_fetch_config(const char *k, const char *v, void *cb) return 0; } + if (!strcmp(k, "fetch.writefetchhead")) { + write_fetch_head = git_config_bool(k, v); + return 0; + } return git_default_config(k, v, cb); } @@ -162,6 +167,8 @@ static struct option builtin_fetch_options[] = { PARSE_OPT_OPTARG, option_fetch_parse_recurse_submodules), OPT_BOOL(0, "dry-run", &dry_run, N_("dry run")), + OPT_BOOL(0, "write-fetch-head", &write_fetch_head, + N_("write fetched references to the FETCH_HEAD file")), OPT_BOOL('k', "keep", &keep, N_("keep downloaded pack")), OPT_BOOL('u', "update-head-ok", &update_head_ok, N_("allow updating of HEAD ref")), @@ -893,7 +900,9 @@ static int store_updated_refs(const char *raw_url, const char *remote_name, const char *what, *kind; struct ref *rm; char *url; - const char *filename = dry_run ? "/dev/null" : git_path_fetch_head(the_repository); + const char *filename = (!write_fetch_head + ? "/dev/null" + : git_path_fetch_head(the_repository)); int want_status; int summary_width = transport_summary_width(ref_map); @@ -1327,7 +1336,7 @@ static int do_fetch(struct transport *transport, } /* if not appending, truncate FETCH_HEAD */ - if (!append && !dry_run) { + if (!append && write_fetch_head) { retcode = truncate_fetch_head(); if (retcode) goto cleanup; @@ -1594,7 +1603,7 @@ static int fetch_multiple(struct string_list *list, int max_children) int i, result = 0; struct argv_array argv = ARGV_ARRAY_INIT; - if (!append && !dry_run) { + if (!append && write_fetch_head) { int errcode = truncate_fetch_head(); if (errcode) return errcode; @@ -1795,6 +1804,10 @@ int cmd_fetch(int argc, const char **argv, const char *prefix) if (depth || deepen_since || deepen_not.nr) deepen = 1; + /* FETCH_HEAD never gets updated in --dry-run mode */ + if (dry_run) + write_fetch_head = 0; + if (all) { if (argc == 1) die(_("fetch --all does not take a repository argument")); diff --git a/builtin/pull.c b/builtin/pull.c index 8159c5d7c9..e988d92b53 100644 --- a/builtin/pull.c +++ b/builtin/pull.c @@ -527,7 +527,8 @@ static int run_fetch(const char *repo, const char **refspecs) struct argv_array args = ARGV_ARRAY_INIT; int ret; - argv_array_pushl(&args, "fetch", "--update-head-ok", NULL); + argv_array_pushl(&args, "fetch", "--update-head-ok", + "--write-fetch-head", NULL); /* Shared options */ argv_push_verbosity(&args); diff --git a/t/t5510-fetch.sh b/t/t5510-fetch.sh index a66dbe0bde..3052c2d8d5 100755 --- a/t/t5510-fetch.sh +++ b/t/t5510-fetch.sh @@ -539,13 +539,48 @@ test_expect_success 'fetch into the current branch with --update-head-ok' ' ' -test_expect_success 'fetch --dry-run' ' - +test_expect_success 'fetch --dry-run does not touch FETCH_HEAD' ' rm -f .git/FETCH_HEAD && git fetch --dry-run . && ! test -f .git/FETCH_HEAD ' +test_expect_success '--no-write-fetch-head does not touch FETCH_HEAD' ' + rm -f .git/FETCH_HEAD && + git fetch --no-write-fetch-head . && + ! test -f .git/FETCH_HEAD +' + +test_expect_success '--write-fetch-head gets defeated by --dry-run' ' + rm -f .git/FETCH_HEAD && + git fetch --dry-run --write-fetch-head . && + ! test -f .git/FETCH_HEAD +' + +test_expect_success 'fetch.writeFetchHEAD and FETCH_HEAD' ' + rm -f .git/FETCH_HEAD && + git -c fetch.writeFetchHEAD=no fetch . && + ! test -f .git/FETCH_HEAD +' + +test_expect_success 'fetch.writeFetchHEAD gets defeated by --dry-run' ' + rm -f .git/FETCH_HEAD && + git -c fetch.writeFetchHEAD=yes fetch --dry-run . && + ! test -f .git/FETCH_HEAD +' + +test_expect_success 'fetch.writeFetchHEAD and --no-write-fetch-head' ' + rm -f .git/FETCH_HEAD && + git -c fetch.writeFetchHEAD=yes fetch --no-write-fetch-head . && + ! test -f .git/FETCH_HEAD +' + +test_expect_success 'fetch.writeFetchHEAD and --write-fetch-head' ' + rm -f .git/FETCH_HEAD && + git -c fetch.writeFetchHEAD=no fetch --write-fetch-head . && + test -f .git/FETCH_HEAD +' + test_expect_success "should be able to fetch with duplicate refspecs" ' mkdir dups && ( -- 2.28.0

On the Git mailing list, Phillip Wood wrote (reply to this):

On 28/07/2020 17:37, Junio C Hamano wrote: > Derrick Stolee <stolee@gmail.com> writes: > >> On 7/27/2020 12:13 PM, Junio C Hamano wrote: >>> Derrick Stolee <stolee@gmail.com> writes: >>> >>>> I'll rebase onto jc/no-update-fetch-head for the next version, since >>>> that branch is based on v2.28.0-rc0, which is recent enough. >>> >>> I do not think it is wise to base a work on top of unfinished "you >>> could do it this way, perhaps?" demonstration patch the original >>> author does not have much inclination to finish, though. >>> >>> When I am really bored, I may go back to the topic to finish it, but >>> I wouldn't mind if you took ownership of it at all. >> >> Ah. I didn't understand the status of that branch. I'll pull it in >> to this topic. > > So here is with one of the two things that I found missing in the > first iteration of the patch: documentation. > > The other thing that I found iffy (and still missing from this > version) was what should be done when "git pull" is explicitly given > the "--no-write-fetch-head" option. > > I think (but didn't check the recent code) that 'git pull' would > pass only known-to-make-sense command line options to underlying > 'git fetch', so it probably will barf with "unknown option", which > is the best case. We might want to make it sure with a new test in > 5521. On the other hand, if we get anything other than "no such > option", we may want to think if we want to "fix" it or just leave > it inside "if it hurts, don't do it" territory. > > Thanks. > > The patch without doc was Reviewed-by: Taylor Blau <me@ttaylorr.com> > but this round has not been. > > -- >8 -- > > If you run fetch but record the result in remote-tracking branches, > and either if you do nothing with the fetched refs (e.g. you are > merely mirroring) or if you always work from the remote-tracking > refs (e.g. you fetch and then merge origin/branchname separately), > you can get away with having no FETCH_HEAD at all. > > Teach "git fetch" a command line option "--[no-]write-fetch-head" > and "fetch.writeFetchHEAD" configuration variable. Without either, > the default is to write FETCH_HEAD, and the usual rule that the > command line option defeats configured default applies. > > Note that under "--dry-run" mode, FETCH_HEAD is never written; > otherwise you'd see list of objects in the file that you do not > actually have. Passing `--fetch-write-head` Typo, it should be `--write-fetch-head` >does not force `git > fetch` to write the file. > > Also note that this option is explicitly passed when "git pull" > internally invokes "git fetch", so that those who configured their > "git fetch" not to write FETCH_HEAD would not be able to break the > cooperation between these two commands. "git pull" must see what > "git fetch" got recorded in FETCH_HEAD to work correctly. > > Signed-off-by: Junio C Hamano <gitster@pobox.com> > --- > Documentation/config/fetch.txt | 7 ++++++ > Documentation/fetch-options.txt | 10 +++++++++ > builtin/fetch.c | 19 +++++++++++++--- > builtin/pull.c | 3 ++- > t/t5510-fetch.sh | 39 +++++++++++++++++++++++++++++++-- > 5 files changed, 72 insertions(+), 6 deletions(-) > > diff --git a/Documentation/config/fetch.txt b/Documentation/config/fetch.txt > index b20394038d..0aaa05e8c0 100644 > --- a/Documentation/config/fetch.txt > +++ b/Documentation/config/fetch.txt > @@ -91,3 +91,10 @@ fetch.writeCommitGraph:: > merge and the write may take longer. Having an updated commit-graph > file helps performance of many Git commands, including `git merge-base`, > `git push -f`, and `git log --graph`. Defaults to false. > + > +fetch.writeFetchHEAD:: > + Setting it to false tells `git fetch` not to write the list > + of remote refs fetched in the `FETCH_HEAD` file directly > + under `$GIT_DIR`. Can be countermanded from the command > + line with the `--[no-]write-fetch-head` option. Defaults to > + true. > diff --git a/Documentation/fetch-options.txt b/Documentation/fetch-options.txt > index 6e2a160a47..6775e8499f 100644 > --- a/Documentation/fetch-options.txt > +++ b/Documentation/fetch-options.txt > @@ -64,6 +64,16 @@ documented in linkgit:git-config[1]. > --dry-run:: > Show what would be done, without making any changes. > > +ifndef::git-pull[] > +--[no-]write-fetch-head:: > + Write the list of remote refs fetched in the `FETCH_HEAD` > + file directly under `$GIT_DIR`. This is the default unless > + the configuration variable `fetch.writeFetchHEAD` is set to > + false. Passing `--no-write-fetch-head` from the command > + line tells Git not to write the file. Under `--dry-run` > + option, the file is never written. > +endif::git-pull[] > + > -f:: > --force:: > When 'git fetch' is used with `<src>:<dst>` refspec it may > diff --git a/builtin/fetch.c b/builtin/fetch.c > index 82ac4be8a5..3ccf69753f 100644 > --- a/builtin/fetch.c > +++ b/builtin/fetch.c > @@ -56,6 +56,7 @@ static int prune_tags = -1; /* unspecified */ > #define PRUNE_TAGS_BY_DEFAULT 0 /* do we prune tags by default? */ > > static int all, append, dry_run, force, keep, multiple, update_head_ok; > +static int write_fetch_head = 1; > static int verbosity, deepen_relative, set_upstream; > static int progress = -1; > static int enable_auto_gc = 1; > @@ -118,6 +119,10 @@ static int git_fetch_config(const char *k, const char *v, void *cb) > return 0; > } > > + if (!strcmp(k, "fetch.writefetchhead")) { > + write_fetch_head = git_config_bool(k, v); > + return 0; > + } > return git_default_config(k, v, cb); > } > > @@ -162,6 +167,8 @@ static struct option builtin_fetch_options[] = { > PARSE_OPT_OPTARG, option_fetch_parse_recurse_submodules), > OPT_BOOL(0, "dry-run", &dry_run, > N_("dry run")), > + OPT_BOOL(0, "write-fetch-head", &write_fetch_head, > + N_("write fetched references to the FETCH_HEAD file")), > OPT_BOOL('k', "keep", &keep, N_("keep downloaded pack")), > OPT_BOOL('u', "update-head-ok", &update_head_ok, > N_("allow updating of HEAD ref")), > @@ -893,7 +900,9 @@ static int store_updated_refs(const char *raw_url, const char *remote_name, > const char *what, *kind; > struct ref *rm; > char *url; > - const char *filename = dry_run ? "/dev/null" : git_path_fetch_head(the_repository); > + const char *filename = (!write_fetch_head > + ? "/dev/null" > + : git_path_fetch_head(the_repository)); I was suspicious of this as we haven't cleared write_fetch_head in the --dry-run case yet but the test below seems to show that we still don't write FETCH_HEAD if --dry-run is given. That makes we wonder what the point of setting the filename based on the value of write_fetch_head is. The rest looks good to me, though it might be worth having a test to check that pull does indeed reject --no-write-fetch-head Best Wishes Phillip > int want_status; > int summary_width = transport_summary_width(ref_map); > > @@ -1327,7 +1336,7 @@ static int do_fetch(struct transport *transport, > } > > /* if not appending, truncate FETCH_HEAD */ > - if (!append && !dry_run) { > + if (!append && write_fetch_head) { > retcode = truncate_fetch_head(); > if (retcode) > goto cleanup; > @@ -1594,7 +1603,7 @@ static int fetch_multiple(struct string_list *list, int max_children) > int i, result = 0; > struct argv_array argv = ARGV_ARRAY_INIT; > > - if (!append && !dry_run) { > + if (!append && write_fetch_head) { > int errcode = truncate_fetch_head(); > if (errcode) > return errcode; > @@ -1795,6 +1804,10 @@ int cmd_fetch(int argc, const char **argv, const char *prefix) > if (depth || deepen_since || deepen_not.nr) > deepen = 1; > > + /* FETCH_HEAD never gets updated in --dry-run mode */ > + if (dry_run) > + write_fetch_head = 0; > + > if (all) { > if (argc == 1) > die(_("fetch --all does not take a repository argument")); > diff --git a/builtin/pull.c b/builtin/pull.c > index 8159c5d7c9..e988d92b53 100644 > --- a/builtin/pull.c > +++ b/builtin/pull.c > @@ -527,7 +527,8 @@ static int run_fetch(const char *repo, const char **refspecs) > struct argv_array args = ARGV_ARRAY_INIT; > int ret; > > - argv_array_pushl(&args, "fetch", "--update-head-ok", NULL); > + argv_array_pushl(&args, "fetch", "--update-head-ok", > + "--write-fetch-head", NULL); > > /* Shared options */ > argv_push_verbosity(&args); > diff --git a/t/t5510-fetch.sh b/t/t5510-fetch.sh > index a66dbe0bde..3052c2d8d5 100755 > --- a/t/t5510-fetch.sh > +++ b/t/t5510-fetch.sh > @@ -539,13 +539,48 @@ test_expect_success 'fetch into the current branch with --update-head-ok' ' > > ' > > -test_expect_success 'fetch --dry-run' ' > - > +test_expect_success 'fetch --dry-run does not touch FETCH_HEAD' ' > rm -f .git/FETCH_HEAD && > git fetch --dry-run . && > ! test -f .git/FETCH_HEAD > ' > > +test_expect_success '--no-write-fetch-head does not touch FETCH_HEAD' ' > + rm -f .git/FETCH_HEAD && > + git fetch --no-write-fetch-head . && > + ! test -f .git/FETCH_HEAD > +' > + > +test_expect_success '--write-fetch-head gets defeated by --dry-run' ' > + rm -f .git/FETCH_HEAD && > + git fetch --dry-run --write-fetch-head . && > + ! test -f .git/FETCH_HEAD > +' > + > +test_expect_success 'fetch.writeFetchHEAD and FETCH_HEAD' ' > + rm -f .git/FETCH_HEAD && > + git -c fetch.writeFetchHEAD=no fetch . && > + ! test -f .git/FETCH_HEAD > +' > + > +test_expect_success 'fetch.writeFetchHEAD gets defeated by --dry-run' ' > + rm -f .git/FETCH_HEAD && > + git -c fetch.writeFetchHEAD=yes fetch --dry-run . && > + ! test -f .git/FETCH_HEAD > +' > + > +test_expect_success 'fetch.writeFetchHEAD and --no-write-fetch-head' ' > + rm -f .git/FETCH_HEAD && > + git -c fetch.writeFetchHEAD=yes fetch --no-write-fetch-head . && > + ! test -f .git/FETCH_HEAD > +' > + > +test_expect_success 'fetch.writeFetchHEAD and --write-fetch-head' ' > + rm -f .git/FETCH_HEAD && > + git -c fetch.writeFetchHEAD=no fetch --write-fetch-head . && > + test -f .git/FETCH_HEAD > +' > + > test_expect_success "should be able to fetch with duplicate refspecs" ' > mkdir dups && > ( >

On the Git mailing list, Phillip Wood wrote (reply to this):

On 29/07/2020 10:12, Phillip Wood wrote: > On 28/07/2020 17:37, Junio C Hamano wrote: >> Derrick Stolee <stolee@gmail.com> writes: >> >>> On 7/27/2020 12:13 PM, Junio C Hamano wrote: >>>> Derrick Stolee <stolee@gmail.com> writes: >>>> >>>>> I'll rebase onto jc/no-update-fetch-head for the next version, since >>>>> that branch is based on v2.28.0-rc0, which is recent enough. >>>> >>>> I do not think it is wise to base a work on top of unfinished "you >>>> could do it this way, perhaps?" demonstration patch the original >>>> author does not have much inclination to finish, though. >>>> >>>> When I am really bored, I may go back to the topic to finish it, but >>>> I wouldn't mind if you took ownership of it at all. >>> >>> Ah. I didn't understand the status of that branch. I'll pull it in >>> to this topic. >> >> So here is with one of the two things that I found missing in the >> first iteration of the patch: documentation. >> >> The other thing that I found iffy (and still missing from this >> version) was what should be done when "git pull" is explicitly given >> the "--no-write-fetch-head" option. >> >> I think (but didn't check the recent code) that 'git pull' would >> pass only known-to-make-sense command line options to underlying >> 'git fetch', so it probably will barf with "unknown option", which >> is the best case. We might want to make it sure with a new test in >> 5521. On the other hand, if we get anything other than "no such >> option", we may want to think if we want to "fix" it or just leave >> it inside "if it hurts, don't do it" territory. >> >> Thanks. >> >> The patch without doc was Reviewed-by: Taylor Blau <me@ttaylorr.com> >> but this round has not been. >> >> -- >8 -- >> >> If you run fetch but record the result in remote-tracking branches, >> and either if you do nothing with the fetched refs (e.g. you are >> merely mirroring) or if you always work from the remote-tracking >> refs (e.g. you fetch and then merge origin/branchname separately), >> you can get away with having no FETCH_HEAD at all. >> >> Teach "git fetch" a command line option "--[no-]write-fetch-head" >> and "fetch.writeFetchHEAD" configuration variable. Without either, >> the default is to write FETCH_HEAD, and the usual rule that the >> command line option defeats configured default applies. >> >> Note that under "--dry-run" mode, FETCH_HEAD is never written; >> otherwise you'd see list of objects in the file that you do not >> actually have. Passing `--fetch-write-head` > > Typo, it should be `--write-fetch-head` > >> does not force `git >> fetch` to write the file. >> >> Also note that this option is explicitly passed when "git pull" >> internally invokes "git fetch", so that those who configured their >> "git fetch" not to write FETCH_HEAD would not be able to break the >> cooperation between these two commands. "git pull" must see what >> "git fetch" got recorded in FETCH_HEAD to work correctly. >> >> Signed-off-by: Junio C Hamano <gitster@pobox.com> >> --- >> Documentation/config/fetch.txt | 7 ++++++ >> Documentation/fetch-options.txt | 10 +++++++++ >> builtin/fetch.c | 19 +++++++++++++--- >> builtin/pull.c | 3 ++- >> t/t5510-fetch.sh | 39 +++++++++++++++++++++++++++++++-- >> 5 files changed, 72 insertions(+), 6 deletions(-) >> >> diff --git a/Documentation/config/fetch.txt >> b/Documentation/config/fetch.txt >> index b20394038d..0aaa05e8c0 100644 >> --- a/Documentation/config/fetch.txt >> +++ b/Documentation/config/fetch.txt >> @@ -91,3 +91,10 @@ fetch.writeCommitGraph:: >> merge and the write may take longer. Having an updated commit-graph >> file helps performance of many Git commands, including `git >> merge-base`, >> `git push -f`, and `git log --graph`. Defaults to false. >> + >> +fetch.writeFetchHEAD:: >> + Setting it to false tells `git fetch` not to write the list >> + of remote refs fetched in the `FETCH_HEAD` file directly >> + under `$GIT_DIR`. Can be countermanded from the command >> + line with the `--[no-]write-fetch-head` option. Defaults to >> + true. >> diff --git a/Documentation/fetch-options.txt >> b/Documentation/fetch-options.txt >> index 6e2a160a47..6775e8499f 100644 >> --- a/Documentation/fetch-options.txt >> +++ b/Documentation/fetch-options.txt >> @@ -64,6 +64,16 @@ documented in linkgit:git-config[1]. >> --dry-run:: >> Show what would be done, without making any changes. >> +ifndef::git-pull[] >> +--[no-]write-fetch-head:: >> + Write the list of remote refs fetched in the `FETCH_HEAD` >> + file directly under `$GIT_DIR`. This is the default unless >> + the configuration variable `fetch.writeFetchHEAD` is set to >> + false. Passing `--no-write-fetch-head` from the command >> + line tells Git not to write the file. Under `--dry-run` >> + option, the file is never written. >> +endif::git-pull[] >> + >> -f:: >> --force:: >> When 'git fetch' is used with `<src>:<dst>` refspec it may >> diff --git a/builtin/fetch.c b/builtin/fetch.c >> index 82ac4be8a5..3ccf69753f 100644 >> --- a/builtin/fetch.c >> +++ b/builtin/fetch.c >> @@ -56,6 +56,7 @@ static int prune_tags = -1; /* unspecified */ >> #define PRUNE_TAGS_BY_DEFAULT 0 /* do we prune tags by default? */ >> static int all, append, dry_run, force, keep, multiple, update_head_ok; >> +static int write_fetch_head = 1; >> static int verbosity, deepen_relative, set_upstream; >> static int progress = -1; >> static int enable_auto_gc = 1; >> @@ -118,6 +119,10 @@ static int git_fetch_config(const char *k, const >> char *v, void *cb) >> return 0; >> } >> + if (!strcmp(k, "fetch.writefetchhead")) { >> + write_fetch_head = git_config_bool(k, v); >> + return 0; >> + } >> return git_default_config(k, v, cb); >> } >> @@ -162,6 +167,8 @@ static struct option builtin_fetch_options[] = { >> PARSE_OPT_OPTARG, option_fetch_parse_recurse_submodules), >> OPT_BOOL(0, "dry-run", &dry_run, >> N_("dry run")), >> + OPT_BOOL(0, "write-fetch-head", &write_fetch_head, >> + N_("write fetched references to the FETCH_HEAD file")), >> OPT_BOOL('k', "keep", &keep, N_("keep downloaded pack")), >> OPT_BOOL('u', "update-head-ok", &update_head_ok, >> N_("allow updating of HEAD ref")), >> @@ -893,7 +900,9 @@ static int store_updated_refs(const char *raw_url, >> const char *remote_name, >> const char *what, *kind; >> struct ref *rm; >> char *url; >> - const char *filename = dry_run ? "/dev/null" : >> git_path_fetch_head(the_repository); >> + const char *filename = (!write_fetch_head >> + ? "/dev/null" >> + : git_path_fetch_head(the_repository)); > > I was suspicious of this as we haven't cleared write_fetch_head in the > --dry-run case yet but the test below seems to show that we still don't > write FETCH_HEAD if --dry-run is given. That makes we wonder what the > point of setting the filename based on the value of write_fetch_head is. Sorry ignore that. I misread the hunk header - we have in fact cleared write_fetch_head in the --dry-run case by the time we get here. > The rest looks good to me, though it might be worth having a test to > check that pull does indeed reject --no-write-fetch-head > > Best Wishes > > Phillip > >> int want_status; >> int summary_width = transport_summary_width(ref_map); >> @@ -1327,7 +1336,7 @@ static int do_fetch(struct transport *transport, >> } >> /* if not appending, truncate FETCH_HEAD */ >> - if (!append && !dry_run) { >> + if (!append && write_fetch_head) { >> retcode = truncate_fetch_head(); >> if (retcode) >> goto cleanup; >> @@ -1594,7 +1603,7 @@ static int fetch_multiple(struct string_list >> *list, int max_children) >> int i, result = 0; >> struct argv_array argv = ARGV_ARRAY_INIT; >> - if (!append && !dry_run) { >> + if (!append && write_fetch_head) { >> int errcode = truncate_fetch_head(); >> if (errcode) >> return errcode; >> @@ -1795,6 +1804,10 @@ int cmd_fetch(int argc, const char **argv, >> const char *prefix) >> if (depth || deepen_since || deepen_not.nr) >> deepen = 1; >> + /* FETCH_HEAD never gets updated in --dry-run mode */ >> + if (dry_run) >> + write_fetch_head = 0; >> + >> if (all) { >> if (argc == 1) >> die(_("fetch --all does not take a repository argument")); >> diff --git a/builtin/pull.c b/builtin/pull.c >> index 8159c5d7c9..e988d92b53 100644 >> --- a/builtin/pull.c >> +++ b/builtin/pull.c >> @@ -527,7 +527,8 @@ static int run_fetch(const char *repo, const char >> **refspecs) >> struct argv_array args = ARGV_ARRAY_INIT; >> int ret; >> - argv_array_pushl(&args, "fetch", "--update-head-ok", NULL); >> + argv_array_pushl(&args, "fetch", "--update-head-ok", >> + "--write-fetch-head", NULL); >> /* Shared options */ >> argv_push_verbosity(&args); >> diff --git a/t/t5510-fetch.sh b/t/t5510-fetch.sh >> index a66dbe0bde..3052c2d8d5 100755 >> --- a/t/t5510-fetch.sh >> +++ b/t/t5510-fetch.sh >> @@ -539,13 +539,48 @@ test_expect_success 'fetch into the current >> branch with --update-head-ok' ' >> ' >> -test_expect_success 'fetch --dry-run' ' >> - >> +test_expect_success 'fetch --dry-run does not touch FETCH_HEAD' ' >> rm -f .git/FETCH_HEAD && >> git fetch --dry-run . && >> ! test -f .git/FETCH_HEAD >> ' >> +test_expect_success '--no-write-fetch-head does not touch FETCH_HEAD' ' >> + rm -f .git/FETCH_HEAD && >> + git fetch --no-write-fetch-head . && >> + ! test -f .git/FETCH_HEAD >> +' >> + >> +test_expect_success '--write-fetch-head gets defeated by --dry-run' ' >> + rm -f .git/FETCH_HEAD && >> + git fetch --dry-run --write-fetch-head . && >> + ! test -f .git/FETCH_HEAD >> +' >> + >> +test_expect_success 'fetch.writeFetchHEAD and FETCH_HEAD' ' >> + rm -f .git/FETCH_HEAD && >> + git -c fetch.writeFetchHEAD=no fetch . && >> + ! test -f .git/FETCH_HEAD >> +' >> + >> +test_expect_success 'fetch.writeFetchHEAD gets defeated by --dry-run' ' >> + rm -f .git/FETCH_HEAD && >> + git -c fetch.writeFetchHEAD=yes fetch --dry-run . && >> + ! test -f .git/FETCH_HEAD >> +' >> + >> +test_expect_success 'fetch.writeFetchHEAD and --no-write-fetch-head' ' >> + rm -f .git/FETCH_HEAD && >> + git -c fetch.writeFetchHEAD=yes fetch --no-write-fetch-head . && >> + ! test -f .git/FETCH_HEAD >> +' >> + >> +test_expect_success 'fetch.writeFetchHEAD and --write-fetch-head' ' >> + rm -f .git/FETCH_HEAD && >> + git -c fetch.writeFetchHEAD=no fetch --write-fetch-head . && >> + test -f .git/FETCH_HEAD >> +' >> + >> test_expect_success "should be able to fetch with duplicate refspecs" ' >> mkdir dups && >> ( >>

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 7/28/2020 12:37 PM, Junio C Hamano wrote: > Derrick Stolee <stolee@gmail.com> writes: > >> On 7/27/2020 12:13 PM, Junio C Hamano wrote: >>> Derrick Stolee <stolee@gmail.com> writes: >>> >>>> I'll rebase onto jc/no-update-fetch-head for the next version, since >>>> that branch is based on v2.28.0-rc0, which is recent enough. >>> >>> I do not think it is wise to base a work on top of unfinished "you >>> could do it this way, perhaps?" demonstration patch the original >>> author does not have much inclination to finish, though. >>> >>> When I am really bored, I may go back to the topic to finish it, but >>> I wouldn't mind if you took ownership of it at all. >> >> Ah. I didn't understand the status of that branch. I'll pull it in >> to this topic. > > So here is with one of the two things that I found missing in the > first iteration of the patch: documentation. > > The other thing that I found iffy (and still missing from this > version) was what should be done when "git pull" is explicitly given > the "--no-write-fetch-head" option. > > I think (but didn't check the recent code) that 'git pull' would > pass only known-to-make-sense command line options to underlying > 'git fetch', so it probably will barf with "unknown option", which > is the best case. We might want to make it sure with a new test in > 5521. On the other hand, if we get anything other than "no such > option", we may want to think if we want to "fix" it or just leave > it inside "if it hurts, don't do it" territory. Here is the version I applied and updated. Please let me know what you think. -->8-- From 3f60a0f0fd388447aa9c2b805b5646039df98d0b Mon Sep 17 00:00:00 2001 From: Junio C Hamano <gitster@pobox.com> Date: Tue, 28 Jul 2020 09:37:59 -0700 Subject: [PATCH] fetch: optionally allow disabling FETCH_HEAD update If you run fetch but record the result in remote-tracking branches, and either if you do nothing with the fetched refs (e.g. you are merely mirroring) or if you always work from the remote-tracking refs (e.g. you fetch and then merge origin/branchname separately), you can get away with having no FETCH_HEAD at all. Teach "git fetch" a command line option "--[no-]write-fetch-head" and "fetch.writeFetchHEAD" configuration variable. Without either, the default is to write FETCH_HEAD, and the usual rule that the command line option defeats configured default applies. Note that under "--dry-run" mode, FETCH_HEAD is never written; otherwise you'd see list of objects in the file that you do not actually have. Passing `--write-fetch-head` does not force `git fetch` to write the file. Also note that this option is explicitly passed when "git pull" internally invokes "git fetch", so that those who configured their "git fetch" not to write FETCH_HEAD would not be able to break the cooperation between these two commands. "git pull" must see what "git fetch" got recorded in FETCH_HEAD to work correctly. Signed-off-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> --- Documentation/config/fetch.txt | 7 ++++++ Documentation/fetch-options.txt | 10 +++++++++ builtin/fetch.c | 19 +++++++++++++--- builtin/pull.c | 3 ++- t/t5510-fetch.sh | 39 +++++++++++++++++++++++++++++++-- t/t5521-pull-options.sh | 16 ++++++++++++++ 6 files changed, 88 insertions(+), 6 deletions(-) diff --git a/Documentation/config/fetch.txt b/Documentation/config/fetch.txt index b20394038d1..0aaa05e8c0e 100644 --- a/Documentation/config/fetch.txt +++ b/Documentation/config/fetch.txt @@ -91,3 +91,10 @@ fetch.writeCommitGraph:: merge and the write may take longer. Having an updated commit-graph file helps performance of many Git commands, including `git merge-base`, `git push -f`, and `git log --graph`. Defaults to false. + +fetch.writeFetchHEAD:: + Setting it to false tells `git fetch` not to write the list + of remote refs fetched in the `FETCH_HEAD` file directly + under `$GIT_DIR`. Can be countermanded from the command + line with the `--[no-]write-fetch-head` option. Defaults to + true. diff --git a/Documentation/fetch-options.txt b/Documentation/fetch-options.txt index 495bc8ab5a1..6972ad2522c 100644 --- a/Documentation/fetch-options.txt +++ b/Documentation/fetch-options.txt @@ -64,6 +64,16 @@ documented in linkgit:git-config[1]. --dry-run:: Show what would be done, without making any changes. +ifndef::git-pull[] +--[no-]write-fetch-head:: + Write the list of remote refs fetched in the `FETCH_HEAD` + file directly under `$GIT_DIR`. This is the default unless + the configuration variable `fetch.writeFetchHEAD` is set to + false. Passing `--no-write-fetch-head` from the command + line tells Git not to write the file. Under `--dry-run` + option, the file is never written. +endif::git-pull[] + -f:: --force:: When 'git fetch' is used with `<src>:<dst>` refspec it may diff --git a/builtin/fetch.c b/builtin/fetch.c index d8253f66e4c..1d7194aac26 100644 --- a/builtin/fetch.c +++ b/builtin/fetch.c @@ -56,6 +56,7 @@ static int prune_tags = -1; /* unspecified */ #define PRUNE_TAGS_BY_DEFAULT 0 /* do we prune tags by default? */ static int all, append, dry_run, force, keep, multiple, update_head_ok; +static int write_fetch_head = 1; static int verbosity, deepen_relative, set_upstream; static int progress = -1; static int enable_auto_gc = 1; @@ -118,6 +119,10 @@ static int git_fetch_config(const char *k, const char *v, void *cb) return 0; } + if (!strcmp(k, "fetch.writefetchhead")) { + write_fetch_head = git_config_bool(k, v); + return 0; + } return git_default_config(k, v, cb); } @@ -162,6 +167,8 @@ static struct option builtin_fetch_options[] = { PARSE_OPT_OPTARG, option_fetch_parse_recurse_submodules), OPT_BOOL(0, "dry-run", &dry_run, N_("dry run")), + OPT_BOOL(0, "write-fetch-head", &write_fetch_head, + N_("write fetched references to the FETCH_HEAD file")), OPT_BOOL('k', "keep", &keep, N_("keep downloaded pack")), OPT_BOOL('u', "update-head-ok", &update_head_ok, N_("allow updating of HEAD ref")), @@ -895,7 +902,9 @@ static int store_updated_refs(const char *raw_url, const char *remote_name, const char *what, *kind; struct ref *rm; char *url; - const char *filename = dry_run ? "/dev/null" : git_path_fetch_head(the_repository); + const char *filename = (!write_fetch_head + ? "/dev/null" + : git_path_fetch_head(the_repository)); int want_status; int summary_width = transport_summary_width(ref_map); @@ -1329,7 +1338,7 @@ static int do_fetch(struct transport *transport, } /* if not appending, truncate FETCH_HEAD */ - if (!append && !dry_run) { + if (!append && write_fetch_head) { retcode = truncate_fetch_head(); if (retcode) goto cleanup; @@ -1596,7 +1605,7 @@ static int fetch_multiple(struct string_list *list, int max_children) int i, result = 0; struct argv_array argv = ARGV_ARRAY_INIT; - if (!append && !dry_run) { + if (!append && write_fetch_head) { int errcode = truncate_fetch_head(); if (errcode) return errcode; @@ -1797,6 +1806,10 @@ int cmd_fetch(int argc, const char **argv, const char *prefix) if (depth || deepen_since || deepen_not.nr) deepen = 1; + /* FETCH_HEAD never gets updated in --dry-run mode */ + if (dry_run) + write_fetch_head = 0; + if (all) { if (argc == 1) die(_("fetch --all does not take a repository argument")); diff --git a/builtin/pull.c b/builtin/pull.c index 8159c5d7c96..e988d92b535 100644 --- a/builtin/pull.c +++ b/builtin/pull.c @@ -527,7 +527,8 @@ static int run_fetch(const char *repo, const char **refspecs) struct argv_array args = ARGV_ARRAY_INIT; int ret; - argv_array_pushl(&args, "fetch", "--update-head-ok", NULL); + argv_array_pushl(&args, "fetch", "--update-head-ok", + "--write-fetch-head", NULL); /* Shared options */ argv_push_verbosity(&args); diff --git a/t/t5510-fetch.sh b/t/t5510-fetch.sh index 9850ecde5df..31c91d0ed2e 100755 --- a/t/t5510-fetch.sh +++ b/t/t5510-fetch.sh @@ -539,13 +539,48 @@ test_expect_success 'fetch into the current branch with --update-head-ok' ' ' -test_expect_success 'fetch --dry-run' ' - +test_expect_success 'fetch --dry-run does not touch FETCH_HEAD' ' rm -f .git/FETCH_HEAD && git fetch --dry-run . && ! test -f .git/FETCH_HEAD ' +test_expect_success '--no-write-fetch-head does not touch FETCH_HEAD' ' + rm -f .git/FETCH_HEAD && + git fetch --no-write-fetch-head . && + ! test -f .git/FETCH_HEAD +' + +test_expect_success '--write-fetch-head gets defeated by --dry-run' ' + rm -f .git/FETCH_HEAD && + git fetch --dry-run --write-fetch-head . && + ! test -f .git/FETCH_HEAD +' + +test_expect_success 'fetch.writeFetchHEAD and FETCH_HEAD' ' + rm -f .git/FETCH_HEAD && + git -c fetch.writeFetchHEAD=no fetch . && + ! test -f .git/FETCH_HEAD +' + +test_expect_success 'fetch.writeFetchHEAD gets defeated by --dry-run' ' + rm -f .git/FETCH_HEAD && + git -c fetch.writeFetchHEAD=yes fetch --dry-run . && + ! test -f .git/FETCH_HEAD +' + +test_expect_success 'fetch.writeFetchHEAD and --no-write-fetch-head' ' + rm -f .git/FETCH_HEAD && + git -c fetch.writeFetchHEAD=yes fetch --no-write-fetch-head . && + ! test -f .git/FETCH_HEAD +' + +test_expect_success 'fetch.writeFetchHEAD and --write-fetch-head' ' + rm -f .git/FETCH_HEAD && + git -c fetch.writeFetchHEAD=no fetch --write-fetch-head . && + test -f .git/FETCH_HEAD +' + test_expect_success "should be able to fetch with duplicate refspecs" ' mkdir dups && ( diff --git a/t/t5521-pull-options.sh b/t/t5521-pull-options.sh index 159afa7ac81..1acae3b9a4f 100755 --- a/t/t5521-pull-options.sh +++ b/t/t5521-pull-options.sh @@ -77,6 +77,7 @@ test_expect_success 'git pull -q -v --no-rebase' ' test_must_be_empty out && test -s err) ' + test_expect_success 'git pull --cleanup errors early on invalid argument' ' mkdir clonedcleanup && (cd clonedcleanup && git init && @@ -85,6 +86,21 @@ test_expect_success 'git pull --cleanup errors early on invalid argument' ' test -s err) ' +test_expect_success 'git pull --no-write-fetch-head fails' ' + mkdir clonedwfh && + (cd clonedwfh && git init && + test_must_fail git pull --no-write-fetch-head "../parent" >out 2>err && + test_must_be_empty out && + test_i18ngrep "no-write-fetch-head" err) +' + +test_expect_success 'git pull succeeds with fetch.writeFetchHEAD=false' ' + mkdir clonedwfhconfig && + (cd clonedwfhconfig && git init && + git config fetch.writeFetchHEAD false && + git pull "../parent" >out 2>err && + grep FETCH_HEAD err) +' test_expect_success 'git pull --force' ' mkdir clonedoldstyle && -- 2.27.0.366.g34746c1d11e.dirty

gitgitgadget · 2020-07-29T00:26:35Z

On the Git mailing list, Jeff King wrote (reply to this):

On Thu, Jul 23, 2020 at 05:56:27PM +0000, Derrick Stolee via GitGitGadget wrote: > +static int run_write_commit_graph(void) > +{ > + int result; > + struct argv_array cmd = ARGV_ARRAY_INIT; > + > + argv_array_pushl(&cmd, "commit-graph", "write", > + "--split", "--reachable", NULL); > + > + if (opts.quiet) > + argv_array_push(&cmd, "--no-progress"); > + > + result = run_command_v_opt(cmd.argv, RUN_GIT_CMD); > + argv_array_clear(&cmd); > + > + return result; > +} This is a pretty minor nit, but since I happened to be looking at the merge of all of the recent argv_array callers today... :) You can write this a bit more succinctly by reusing the argv_array provided by the child_process: struct child_process cmd = CHILD_PROCESS_INIT; cmd.git_cmd = 1; argv_array_pushl(&cmd.args, "commit-graph", "write", "--split", "--reachable", NULL); if (opts.quiet) argv_array_push(&cmd.args, "--no-progress"); return run_command(&cmd); Then you don't have to worry about freeing the argv memory, because it's handled automatically. Like I said, quite minor, but it looks like this pattern appears in a few places, so it might be worth tweaking. And it would still work with the "pushf" people recommended to avoid extra strbufs (I saw another one in fetch_remote(), too). -Peff

gitgitgadget · 2020-07-29T22:23:17Z

On the Git mailing list, Emily Shaffer wrote (reply to this):

On Thu, Jul 23, 2020 at 05:56:31PM +0000, Derrick Stolee via GitGitGadget wrote: > +loose-objects:: > + The `loose-objects` job cleans up loose objects and places them into > + pack-files. In order to prevent race conditions with concurrent Git > + commands, it follows a two-step process. First, it deletes any loose > + objects that already exist in a pack-file; concurrent Git processes > + will examine the pack-file for the object data instead of the loose > + object. Second, it creates a new pack-file (starting with "loose-") [jonathan tan + jonathan nieder] If you are going to document this, probably it should also be tested, so the documentation does not become stale. Or, just don't document it. > +static int pack_loose(void) > +{ > + struct repository *r = the_repository; > + int result = 0; > + struct write_loose_object_data data; > + struct strbuf prefix = STRBUF_INIT; > + struct child_process *pack_proc; > + > + /* > + * Do not start pack-objects process > + * if there are no loose objects. > + */ > + if (!for_each_loose_file_in_objdir(r->objects->odb->path, > + loose_object_exists, > + NULL, NULL, NULL)) [emily] To me, this is unintuitive - but upon inspection, it's exiting the foreach early if any loose object is found, so this is cheaper than actually counting. Maybe a comment would help to understand? Or we could name the function differently, like "bail_if_loose()" or something? > +test_expect_success 'loose-objects task' ' > + # Repack everything so we know the state of the object dir > + git repack -adk && > + > + # Hack to stop maintenance from running during "git commit" > + echo in use >.git/objects/maintenance.lock && > + test_commit create-loose-object && [jonathan nieder] Does it make sense to use a different git command which is guaranteed to make a loose object? Is 'git commit' futureproof, if we decide commits should directly create packs in the future? 'git unpack-objects' is guaranteed to make a loose object, although it is clumsy because it needs a packfile to begin with... [jonathan tan] But, using 'git commit' is easier to understand in this context. Maybe commenting to say that we assume 'git commit' makes 1 or more loose objects will be enough to futureproof - then we have a signal to whoever made a change to make this fail, and that person knows how to fix this test.

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 7/29/2020 6:21 PM, Emily Shaffer wrote: > On Thu, Jul 23, 2020 at 05:56:31PM +0000, Derrick Stolee via GitGitGadget wrote: >> +loose-objects:: >> + The `loose-objects` job cleans up loose objects and places them into >> + pack-files. In order to prevent race conditions with concurrent Git >> + commands, it follows a two-step process. First, it deletes any loose >> + objects that already exist in a pack-file; concurrent Git processes >> + will examine the pack-file for the object data instead of the loose >> + object. Second, it creates a new pack-file (starting with "loose-") > > [jonathan tan + jonathan nieder] If you are going to document this, > probably it should also be tested, so the documentation does not become > stale. Or, just don't document it. Adding a condition to the test is a good idea. >> +static int pack_loose(void) >> +{ >> + struct repository *r = the_repository; >> + int result = 0; >> + struct write_loose_object_data data; >> + struct strbuf prefix = STRBUF_INIT; >> + struct child_process *pack_proc; >> + >> + /* >> + * Do not start pack-objects process >> + * if there are no loose objects. >> + */ >> + if (!for_each_loose_file_in_objdir(r->objects->odb->path, >> + loose_object_exists, >> + NULL, NULL, NULL)) > > [emily] To me, this is unintuitive - but upon inspection, it's exiting > the foreach early if any loose object is found, so this is cheaper than > actually counting. Maybe a comment would help to understand? Or we could > name the function differently, like "bail_if_loose()" or something? Sure. "bail_on_loose()" makes more sense to me. >> +test_expect_success 'loose-objects task' ' >> + # Repack everything so we know the state of the object dir >> + git repack -adk && >> + >> + # Hack to stop maintenance from running during "git commit" >> + echo in use >.git/objects/maintenance.lock && >> + test_commit create-loose-object && > > [jonathan nieder] Does it make sense to use a different git command > which is guaranteed to make a loose object? Is 'git commit' futureproof, > if we decide commits should directly create packs in the future? 'git > unpack-objects' is guaranteed to make a loose object, although it is > clumsy because it needs a packfile to begin with... "unpack-objects" also has the problem that it won't write the loose object if it exists as a packed object, so it requires moving the pack-file out of the object directory first. (That's also required for testing that the loose objects get packed by the maintenance task, but I'd rather not need to remove the multi-pack-index, too.) > [jonathan tan] But, using 'git commit' is easier to understand in this > context. Maybe commenting to say that we assume 'git commit' makes 1 or > more loose objects will be enough to futureproof - then we have a signal > to whoever made a change to make this fail, and that person knows how to > fix this test. I'll add the comment for now. Thanks. -Stolee

gitgitgadget · 2020-07-29T22:23:19Z

On the Git mailing list, Emily Shaffer wrote (reply to this):

On Thu, Jul 23, 2020 at 05:56:32PM +0000, Derrick Stolee via GitGitGadget wrote: > 1. 'git multi-pack-index write' creates a multi-pack-index file if > one did not exist, and otherwise will update the multi-pack-index > with any new pack-files that appeared since the last write. This > is particularly relevant with the background fetch job. [emily shaffer] Will this use midx even if the user has disabled it in their config? > diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh > index 94bb493733..3ec813979a 100755 > --- a/t/t7900-maintenance.sh > +++ b/t/t7900-maintenance.sh > @@ -103,4 +103,41 @@ test_expect_success 'loose-objects task' ' > test_cmp packs-between packs-after > ' [emily shaffer] Can we include a test to prove that this task is or is not run if core.multipackindex is set or unset? That behavior is hard to deduce from the code... we might want to be cautious. > > +test_expect_success 'incremental-repack task' ' > + packDir=.git/objects/pack && > + for i in $(test_seq 1 5) > + do > + test_commit $i || return 1 > + done && > + > + # Create three disjoint pack-files with size BIG, small, small. > + echo HEAD~2 | git pack-objects --revs $packDir/test-1 && > + test_tick && > + git pack-objects --revs $packDir/test-2 <<-\EOF && > + HEAD~1 > + ^HEAD~2 > + EOF > + test_tick && > + git pack-objects --revs $packDir/test-3 <<-\EOF && > + HEAD > + ^HEAD~1 > + EOF > + rm -f $packDir/pack-* && > + rm -f $packDir/loose-* && > + ls $packDir/*.pack >packs-before && > + test_line_count = 3 packs-before && > + > + # the job repacks the two into a new pack, but does not > + # delete the old ones. > + git maintenance run --task=incremental-repack && > + ls $packDir/*.pack >packs-between && > + test_line_count = 4 packs-between && > + > + # the job deletes the two old packs, and does not write > + # a new one because only one pack remains. > + git maintenance run --task=incremental-repack && > + ls .git/objects/pack/*.pack >packs-after && > + test_line_count = 1 packs-after > +' > + > test_done > -- > gitgitgadget >

-Original file line number
+Diff line change
@@ Expand Up / @@ -90,6 +90,7 @@ @@
     /git-ls-tree
     /git-mailinfo
     /git-mailsplit
+    /git-maintenance
     /git-merge
     /git-merge-base
     /git-merge-index
@@ Expand Down @@

-Original file line number
+Diff line change
@@ Expand Up / @@ -396,6 +396,8 @@ include::config/mailinfo.txt[] @@
     include::config/mailmap.txt[]
+    include::config/maintenance.txt[]
     include::config/man.txt[]
     include::config/merge.txt[]
@@ Expand Down @@

-Original file line number
+Diff line change
@@ Expand Up / @@ -606,8 +606,8 @@ core.useReplaceRefs:: @@
     core.multiPackIndex::
     	Use the multi-pack-index file to track multiple packfiles using a
-    	single index. See link:technical/multi-pack-index.html[the
-    	multi-pack-index design document].
+    	single index. See linkgit:git-multi-pack-index[1] for more
+    	information. Defaults to true.
     core.sparseCheckout::
     	Enable "sparse checkout" feature. See linkgit:git-sparse-checkout[1]
@@ Expand Down @@

-Original file line number
+Diff line change
@@ -0,0 +1,32 @@
+    maintenance.<task>.enabled::
+    	This boolean config option controls whether the maintenance task
+    	with name `<task>` is run when no `--task` option is specified.
+    	By default, only `maintenance.gc.enabled` is true.
+    maintenance.commit-graph.auto::
+    	This integer config option controls how often the `commit-graph` task
+    	should be run as part of `git maintenance run --auto`. If zero, then
+    	the `commit-graph` task will not run with the `--auto` option. A
+    	negative value will force the task to run every time. Otherwise, a
+    	positive value implies the command should run when the number of
+    	reachable commits that are not in the commit-graph file is at least
+    	the value of `maintenance.commit-graph.auto`. The default value is
+.
+    maintenance.loose-objects.auto::
+    	This integer config option controls how often the `loose-objects` task
+    	should be run as part of `git maintenance run --auto`. If zero, then
+    	the `loose-objects` task will not run with the `--auto` option. A
+    	negative value will force the task to run every time. Otherwise, a
+    	positive value implies the command should run when the number of
+    	loose objects is at least the value of `maintenance.loose-objects.auto`.
+    	The default value is 100.
+    maintenance.incremental-repack.auto::
+    	This integer config option controls how often the `incremental-repack`
+    	task should be run as part of `git maintenance run --auto`. If zero,
+    	then the `incremental-repack` task will not run with the `--auto`
+    	option. A negative value will force the task to run every time.
+    	Otherwise, a positive value implies the command should run when the
+    	number of pack-files not in the multi-pack-index is at least the value
+    	of `maintenance.incremental-repack.auto`. The default value is 10.

-Original file line number
+Diff line change
@@ Expand Up @@
     other Git command that makes any existing commit unreferenced) in the
     source repository, some objects may become unreferenced (or dangling).
     These objects may be removed by normal Git operations (such as `git commit`)
-    which automatically call `git gc --auto`. (See linkgit:git-gc[1].)
-    If these objects are removed and were referenced by the cloned repository,
-    then the cloned repository will become corrupt.
+    which automatically call `git maintenance run --auto`. (See
+    linkgit:git-maintenance[1].) If these objects are removed and were referenced
+    by the cloned repository, then the cloned repository will become corrupt.
     +
     Note that running `git repack` without the `--local` option in a repository
     cloned with `--shared` will copy objects from the source repository into a pack
@@ Expand Down @@

Maintenance builtin, allowing 'gc --auto' customization #671

Maintenance builtin, allowing 'gc --auto' customization #671

Diff view

Diff view

There are no files selected for viewing

gitgitgadget bot Jul 25, 2020

Choose a reason for hiding this comment

gitgitgadget bot Jul 25, 2020

Choose a reason for hiding this comment

gitgitgadget bot Jul 29, 2020

Choose a reason for hiding this comment

gitgitgadget bot Jul 30, 2020

Choose a reason for hiding this comment

gitgitgadget bot Jul 31, 2020

Choose a reason for hiding this comment

gitgitgadget bot Jul 31, 2020

Choose a reason for hiding this comment

gitgitgadget bot Aug 3, 2020

Choose a reason for hiding this comment

gitgitgadget bot Aug 4, 2020

Choose a reason for hiding this comment

gitgitgadget bot Aug 4, 2020

Choose a reason for hiding this comment

gitgitgadget bot Aug 4, 2020

Choose a reason for hiding this comment

gitgitgadget bot Aug 4, 2020

Choose a reason for hiding this comment

gitgitgadget bot Aug 5, 2020

Choose a reason for hiding this comment

gitgitgadget bot Jul 23, 2020

Choose a reason for hiding this comment

gitgitgadget bot Jul 25, 2020

Choose a reason for hiding this comment

gitgitgadget bot Jul 30, 2020

Choose a reason for hiding this comment

gitgitgadget bot Jul 30, 2020

Choose a reason for hiding this comment

gitgitgadget bot Jul 30, 2020

Choose a reason for hiding this comment

gitgitgadget bot Jul 30, 2020

Choose a reason for hiding this comment

gitgitgadget bot Jul 9, 2020

Choose a reason for hiding this comment

gitgitgadget bot Jul 9, 2020

Choose a reason for hiding this comment

gitgitgadget bot Jul 9, 2020

Choose a reason for hiding this comment

gitgitgadget bot Jul 9, 2020

Choose a reason for hiding this comment

gitgitgadget bot Jul 9, 2020

Choose a reason for hiding this comment

gitgitgadget bot Jul 23, 2020

Choose a reason for hiding this comment

gitgitgadget bot Jul 23, 2020

Choose a reason for hiding this comment

gitgitgadget bot Jul 24, 2020

Choose a reason for hiding this comment

gitgitgadget bot Jul 24, 2020

Choose a reason for hiding this comment

gitgitgadget bot Jul 23, 2020

Choose a reason for hiding this comment

gitgitgadget bot Jul 24, 2020

Choose a reason for hiding this comment

gitgitgadget bot Jul 24, 2020

Choose a reason for hiding this comment

gitgitgadget bot Jul 25, 2020

Choose a reason for hiding this comment

gitgitgadget bot Jul 30, 2020

Choose a reason for hiding this comment

gitgitgadget bot Jul 23, 2020

Choose a reason for hiding this comment

gitgitgadget bot Jul 24, 2020

Choose a reason for hiding this comment

gitgitgadget bot Jul 24, 2020

Choose a reason for hiding this comment

gitgitgadget bot Jul 23, 2020

Choose a reason for hiding this comment

gitgitgadget bot Jul 24, 2020

Choose a reason for hiding this comment

gitgitgadget bot Jul 24, 2020