Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Maintenance I: Command, gc and commit-graph tasks #695

Conversation

derrickstolee
Copy link

@derrickstolee derrickstolee commented Aug 6, 2020

This series is based on jc/no-update-fetch-head.

This patch series contains 11patches that were going to be part of v4 of ds/maintenance [1], but the discussion has gotten really long. To help focus the conversation, I'm splitting out the portions that create and test the 'maintenance' builtin from the additional tasks (prefetch, loose-objects, incremental-repack) that can be brought in later.

[1] https://lore.kernel.org/git/pull.671.git.1594131695.gitgitgadget@gmail.com/

As mentioned before, git gc already plays the role of maintaining Git repositories. It has accumulated several smaller pieces in its long history, including:

  1. Repacking all reachable objects into one pack-file (and deleting unreachable objects).
  2. Packing refs.
  3. Expiring reflogs.
  4. Clearing rerere logs.
  5. Updating the commit-graph file.
  6. Pruning worktrees.

While expiring reflogs, clearing rererelogs, and deleting unreachable objects are suitable under the guise of "garbage collection", packing refs and updating the commit-graph file are not as obviously fitting. Further, these operations are "all or nothing" in that they rewrite almost all repository data, which does not perform well at extremely large scales. These operations can also be disruptive to foreground Git commands when git gc --auto triggers during routine use.

This series does not intend to change what git gc does, but instead create new choices for automatic maintenance activities, of which git gc remains the only one enabled by default.

The new maintenance tasks are:

  • 'commit-graph' : write and verify a single layer of an incremental commit-graph.
  • 'loose-objects' : prune packed loose objects, then create a new pack from a batch of loose objects.
  • 'pack-files' : expire redundant packs from the multi-pack-index, then repack using the multi-pack-index's incremental repack strategy.
  • 'prefetch' : fetch from each remote, storing the refs in 'refs/prefetch//'.

The only included tasks are the 'gc' and 'commit-graph' tasks. The rest will follow in a follow-up series. Including the 'commit-graph' task here allows us to build and test features like config settings and the --task= command-line argument.

These tasks are all disabled by default, but can be enabled with config options or run explicitly using "git maintenance run --task=". There are additional config options to allow customizing the conditions for which the tasks run during the '--auto' option.

Because 'gc' is implemented as a maintenance task, the most dramatic change of this series is to convert the 'git gc --auto' calls into 'git maintenance run --auto' calls at the end of some Git commands. By default, the only change is that 'git gc --auto' will be run below an additional 'git maintenance' process.

The 'git maintenance' builtin has a 'run' subcommand so it can be extended later with subcommands that manage background maintenance, such as 'start' or 'stop'. These are not the subject of this series, as it is important to focus on the maintenance activities themselves.

Update in v4

A segfault when running just "git maintenance" is fixed.

Updates since v3

  • Two commit-message typos are fixed.

Thanks for all of the review!

Updates since v2

  • Based on jc/no-update-fetch-head instead of jk/strvec.

  • I realized that the other maintenance subcommands should not accept the options for the 'run' subcommand, so I reorganized the option parsing into that subcommand. This makes the range-diff noisier than it would have been otherwise.

  • While updating the parsing, I also updated the usage strings.

  • The "verify, then delete and rewrite on failure" logic is removed. I'll consider bringing this back with a way to test the behavior in a later patch series.

  • Other feedback from Jonathan Tan is applied.

Updates since v1 (of this series)

  • Documentation fixes.

  • The builtin code had some slight tweaks in PATCH 1.

UPDATES since v3 of [1]

  • The biggest change here is the use of "test_subcommand", based on Jonathan Nieder's approach. This requires having the exact command-line figured out, which now requires spelling out all --no- options. I also added a bunch of "2>/dev/null" checks because of the isatty(2) calls. Without that, the behavior will change depending on whether the test is run with -x/-v or without.

  • The option parsing has changed to use a local struct and pass that struct to the helper methods. This is instead of having a global singleton.

Thanks,
-Stolee

Cc: sandals@crustytoothpaste.net, steadmon@google.com, jrnieder@gmail.com, peff@peff.net, congdanhqx@gmail.com, phillip.wood123@gmail.com, emilyshaffer@google.com, sluongng@gmail.com, jonathantanmy@google.com

cc: Jonathan Tan jonathantanmy@google.com
cc: Johannes Schindelin Johannes.Schindelin@gmx.de
cc: Ævar Arnfjörð Bjarmason avarab@gmail.com
cc: Derrick Stolee stolee@gmail.com

@derrickstolee derrickstolee changed the title Maintenance builtin: Command, gc and commit-graph tasks Maintenance I: Command, gc and commit-graph tasks Aug 6, 2020
@derrickstolee
Copy link
Author

/submit

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 6, 2020

@derrickstolee derrickstolee marked this pull request as ready for review August 6, 2020 15:54
@gitgitgadget
Copy link

gitgitgadget bot commented Aug 6, 2020

This branch is now known as ds/maintenance.

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 6, 2020

This patch series was integrated into seen via git@3fde867.

@gitgitgadget gitgitgadget bot added the seen label Aug 6, 2020
@gitgitgadget
Copy link

gitgitgadget bot commented Aug 7, 2020

This patch series was integrated into seen via git@507626a.

@@ -90,6 +90,7 @@
/git-ls-tree
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Martin Ågren wrote (reply to this):

On Thu, 6 Aug 2020 at 19:57, Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> +DESCRIPTION
> +-----------
> +Run tasks to optimize Git repository data, speeding up other Git commands
> +and reducing storage requirements for the repository.
> ++

This "+" and the one below render literally so you would want to drop
them. (You're not in any kind of "list" here, so no need for a "list
continuation".)

> +Git commands that add repository data, such as `git add` or `git fetch`,
> +are optimized for a responsive user experience. These commands do not take
> +time to optimize the Git data, since such optimizations scale with the full
> +size of the repository while these user commands each perform a relatively
> +small action.
> ++
> +The `git maintenance` command provides flexibility for how to optimize the
> +Git repository.

Martin

@@ -0,0 +1,78 @@
git-maintenance(1)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Martin Ågren wrote (reply to this):

On Thu, 6 Aug 2020 at 18:50, Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
> diff --git a/Documentation/git-maintenance.txt b/Documentation/git-maintenance.txt
> index 089fa4cedc..35b0be7d40 100644
> --- a/Documentation/git-maintenance.txt
> +++ b/Documentation/git-maintenance.txt
> @@ -35,6 +35,24 @@ run::
>  TASKS
>  -----
>
> +commit-graph::
> +       The `commit-graph` job updates the `commit-graph` files incrementally,
> +       then verifies that the written data is correct. If the new layer has an
> +       issue, then the chain file is removed and the `commit-graph` is
> +       rewritten from scratch.
> ++
> +The verification only checks the top layer of the `commit-graph` chain.
> +If the incremental write merged the new commits with at least one
> +existing layer, then there is potential for on-disk corruption being
> +carried forward into the new file. This will be noticed and the new
> +commit-graph file will be clean as Git reparses the commit data from
> +the object database.

This reads somewhat awkwardly. I think what you mean is something like
"is there a risk for on-disk corruption? yes, but no: we're clever
enough to detect it and avoid it". So from a user's point of view, I
think this is too detailed.

How about

 The verification checks the top layer of the resulting `commit-graph` chain.
 This ensures that the maintenance task leaves the top layer in a shape
 that matches the object database, even if it was ostensibly constructed
 by "merging in" existing, incorrect layers.

? I don't quite like it -- it's a bit too technical -- but it describes
the end result which the user should care about -- on-disk data is
consistent and correct -- rather than how we got there.

Perhaps:

 It is ensured that the resulting "top layer" is correct. This should
 help avoid most on-disk corruption of the commit-graph and ensure
 that the commit-graph matches the object database.

Don't know whether that's entirely true though...

Food for thought, perhaps.

There's something probabilistic about this whole thing: If a low layer
is corrupt, you might "eventually" get to replace it. I suppose it could
make sense to go "verify the whole thing, drop however many top layers
we need to drop to get only correct layers, then generate a new layer
(and merge and whatnot) on top of that". But the proposed commit message
makes it fairly clear that this would have other drawbacks and that we
don't really expect corrupt layers in the first place.

Martin

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 8/7/2020 6:29 PM, Martin Ågren wrote:
> On Thu, 6 Aug 2020 at 18:50, Derrick Stolee via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
>> diff --git a/Documentation/git-maintenance.txt b/Documentation/git-maintenance.txt
>> index 089fa4cedc..35b0be7d40 100644
>> --- a/Documentation/git-maintenance.txt
>> +++ b/Documentation/git-maintenance.txt
>> @@ -35,6 +35,24 @@ run::
>>  TASKS
>>  -----
>>
>> +commit-graph::
>> +       The `commit-graph` job updates the `commit-graph` files incrementally,
>> +       then verifies that the written data is correct. If the new layer has an
>> +       issue, then the chain file is removed and the `commit-graph` is
>> +       rewritten from scratch.
>> ++
>> +The verification only checks the top layer of the `commit-graph` chain.
>> +If the incremental write merged the new commits with at least one
>> +existing layer, then there is potential for on-disk corruption being
>> +carried forward into the new file. This will be noticed and the new
>> +commit-graph file will be clean as Git reparses the commit data from
>> +the object database.
> 
> This reads somewhat awkwardly. I think what you mean is something like
> "is there a risk for on-disk corruption? yes, but no: we're clever
> enough to detect it and avoid it". So from a user's point of view, I
> think this is too detailed.
> 
> How about
> 
>  The verification checks the top layer of the resulting `commit-graph` chain.
>  This ensures that the maintenance task leaves the top layer in a shape
>  that matches the object database, even if it was ostensibly constructed
>  by "merging in" existing, incorrect layers.
> 
> ? I don't quite like it -- it's a bit too technical -- but it describes
> the end result which the user should care about -- on-disk data is
> consistent and correct -- rather than how we got there.
> 
> Perhaps:
> 
>  It is ensured that the resulting "top layer" is correct. This should
>  help avoid most on-disk corruption of the commit-graph and ensure
>  that the commit-graph matches the object database.
> 
> Don't know whether that's entirely true though...

This is my understanding. We focus on the data we just wrote to see
if anything went wrong at a lower level (i.e. filesystem or RAM
corruption during the write process) instead of the data "at rest".

> Food for thought, perhaps.
> 
> There's something probabilistic about this whole thing: If a low layer
> is corrupt, you might "eventually" get to replace it. I suppose it could
> make sense to go "verify the whole thing, drop however many top layers
> we need to drop to get only correct layers, then generate a new layer
> (and merge and whatnot) on top of that". But the proposed commit message
> makes it fairly clear that this would have other drawbacks and that we
> don't really expect corrupt layers in the first place.

Right. And we want to keep the amount of work to be very small in most
cases, amortizing the cost of the big merge operations across many much
smaller runs. Perhaps a later change could introduce an option to drop
the '--shallow' option and check the entire chain, for users willing to
pay that price.

Back to the point of your comments: I'm not sure this second paragraph
is required at all in the documentation. The first paragraph already
says:

	...then verifies that the written data is correct.

This "written data" _is_ the top layer of the chain. There is probably
no reason to dig deeper into _why_ we do this in this user-facing
documentation.

So, I propose just deleting this paragraph. What do you think?

Thanks for keeping a close eye on documentation changes! I updated
my local branch to include the '+' problem from your other reply.

Thanks,
-Stolee

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Martin Ågren wrote (reply to this):

On Wed, 12 Aug 2020 at 15:30, Derrick Stolee <stolee@gmail.com> wrote:
>
> On 8/7/2020 6:29 PM, Martin Ågren wrote:
> > On Thu, 6 Aug 2020 at 18:50, Derrick Stolee via GitGitGadget
> > <gitgitgadget@gmail.com> wrote:
> >> diff --git a/Documentation/git-maintenance.txt b/Documentation/git-maintenance.txt
> >> index 089fa4cedc..35b0be7d40 100644
> >> --- a/Documentation/git-maintenance.txt
> >> +++ b/Documentation/git-maintenance.txt
> >> @@ -35,6 +35,24 @@ run::
> >>  TASKS
> >>  -----
> >>
> >> +commit-graph::
> >> +       The `commit-graph` job updates the `commit-graph` files incrementally,
> >> +       then verifies that the written data is correct. If the new layer has an
> >> +       issue, then the chain file is removed and the `commit-graph` is
> >> +       rewritten from scratch.
> >> ++
> >> +The verification only checks the top layer of the `commit-graph` chain.
> >> +If the incremental write merged the new commits with at least one
> >> +existing layer, then there is potential for on-disk corruption being
> >> +carried forward into the new file. This will be noticed and the new
> >> +commit-graph file will be clean as Git reparses the commit data from
> >> +the object database.
> >
> > This reads somewhat awkwardly. I think what you mean is something like
> > "is there a risk for on-disk corruption? yes, but no: we're clever
> > enough to detect it and avoid it". So from a user's point of view, I
> > think this is too detailed.

[snip quite a bit]

> Back to the point of your comments: I'm not sure this second paragraph
> is required at all in the documentation. The first paragraph already
> says:
>
>         ...then verifies that the written data is correct.
>
> This "written data" _is_ the top layer of the chain. There is probably
> no reason to dig deeper into _why_ we do this in this user-facing
> documentation.
>
> So, I propose just deleting this paragraph. What do you think?

Yeah, that makes lots of sense. Thanks!


Martin

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 7, 2020

This patch series was integrated into seen via git@bf7a618.

@derrickstolee derrickstolee changed the base branch from jk/strvec to master August 10, 2020 18:34
@gitgitgadget
Copy link

gitgitgadget bot commented Aug 10, 2020

This patch series was integrated into seen via git@e69a107.

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 10, 2020

This patch series was integrated into seen via git@e8d9043.

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 10, 2020

This patch series was integrated into seen via git@583391a.

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 11, 2020

This patch series was integrated into seen via git@ab49a6f.

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 12, 2020

This patch series was integrated into seen via git@a5744db.

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 12, 2020

This patch series was integrated into seen via git@75e279c.

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 13, 2020

This patch series was integrated into seen via git@f845188.

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 13, 2020

This patch series was integrated into seen via git@667039e.

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 14, 2020

This patch series was integrated into seen via git@bc7bb28.

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 14, 2020

This patch series was integrated into seen via git@e802655.

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 14, 2020

This patch series was integrated into seen via git@4c2b4aa.

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 17, 2020

This patch series was integrated into seen via git@1c7923e.

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 18, 2020

This patch series was integrated into seen via git@51f5b49.

The first new task in the 'git maintenance' builtin is the
'commit-graph' task. This updates the commit-graph file
incrementally with the command

	git commit-graph write --reachable --split

By writing an incremental commit-graph file using the "--split"
option we minimize the disruption from this operation. The default
behavior is to merge layers until the new "top" layer is less than
half the size of the layer below. This provides quick writes most
of the time, with the longer writes following a power law
distribution.

Most importantly, concurrent Git processes only look at the
commit-graph-chain file for a very short amount of time, so they
will verly likely not be holding a handle to the file when we try
to replace it. (This only matters on Windows.)

If a concurrent process reads the old commit-graph-chain file, but
our job expires some of the .graph files before they can be read,
then those processes will see a warning message (but not fail).
This could be avoided by a future update to use the --expire-time
argument when writing the commit-graph.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A user may want to only run certain maintenance tasks in a certain
order. Add the --task=<task> option, which allows a user to specify an
ordered list of tasks to run. These cannot be run multiple times,
however.

Here is where our array of maintenance_task pointers becomes critical.
We can sort the array of pointers based on the task order, but we do not
want to move the struct data itself in order to preserve the hashmap
references. We use the hashmap to match the --task=<task> arguments into
the task struct data.

Keep in mind that the 'enabled' member of the maintenance_task struct is
a placeholder for a future 'maintenance.<task>.enabled' config option.
Thus, we use the 'enabled' member to specify which tasks are run when
the user does not specify any --task=<task> arguments. The 'enabled'
member should be ignored if --task=<task> appears.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Performing maintenance on a Git repository involves writing data to the
.git directory, which is not safe to do with multiple writers attempting
the same operation. Ensure that only one 'git maintenance' process is
running at a time by holding a file-based lock. Simply the presence of
the .git/maintenance.lock file will prevent future maintenance. This
lock is never committed, since it does not represent meaningful data.
Instead, it is only a placeholder.

If the lock file already exists, then no maintenance tasks are
attempted. This will become very important later when we implement the
'prefetch' task, as this is our stop-gap from creating a recursive process
loop between 'git fetch' and 'git maintenance run --auto'.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently, a normal run of "git maintenance run" will only run the 'gc'
task, as it is the only one enabled. This is mostly for backwards-
compatible reasons since "git maintenance run --auto" commands replaced
previous "git gc --auto" commands after some Git processes. Users could
manually run specific maintenance tasks by calling "git maintenance run
--task=<task>" directly.

Allow users to customize which steps are run automatically using config.
The 'maintenance.<task>.enabled' option then can turn on these other
tasks (or turn off the 'gc' task).

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The 'git maintenance run' command has an '--auto' option. This is used
by other Git commands such as 'git commit' or 'git fetch' to check if
maintenance should be run after adding data to the repository.

Previously, this --auto option was only used to add the argument to the
'git gc' command as part of the 'gc' task. We will be expanding the
other tasks to perform a check to see if they should do work as part of
the --auto flag, when they are enabled by config.

First, update the 'gc' task to perform the auto check inside the
maintenance process. This prevents running an extra 'git gc --auto'
command when not needed. It also shows a model for other tasks.

Second, use the 'auto_condition' function pointer as a signal for
whether we enable the maintenance task under '--auto'. For instance, we
do not want to enable the 'fetch' task in '--auto' mode, so that
function pointer will remain NULL.

Now that we are not automatically calling 'git gc', a test in
t5514-fetch-multiple.sh must be changed to watch for 'git maintenance'
instead.

We continue to pass the '--auto' option to the 'git gc' command when
necessary, because of the gc.autoDetach config option changes behavior.
Likely, we will want to absorb the daemonizing behavior implied by
gc.autoDetach as a maintenance.autoDetach config option.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Instead of writing a new commit-graph in every 'git maintenance run
--auto' process (when maintenance.commit-graph.enalbed is configured to
be true), only write when there are "enough" commits not in a
commit-graph file.

This count is controlled by the maintenance.commit-graph.auto config
option.

To compute the count, use a depth-first search starting at each ref, and
leaving markers using the SEEN flag. If this count reaches the limit,
then terminate early and start the task. Otherwise, this operation will
peel every ref and parse the commit it points to. If these are all in
the commit-graph, then this is typically a very fast operation. Users
with many refs might feel a slow-down, and hence could consider updating
their limit to be very small. A negative value will force the step to
run every time.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
@derrickstolee
Copy link
Author

/submit

@gitgitgadget
Copy link

gitgitgadget bot commented Sep 17, 2020

Submitted as pull.695.v5.git.1600366313.gitgitgadget@gmail.com

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git pr-695/derrickstolee/maintenance/builtin-v5

To fetch this version to local tag pr-695/derrickstolee/maintenance/builtin-v5:

git fetch --no-tags https://github.com/gitgitgadget/git tag pr-695/derrickstolee/maintenance/builtin-v5

@gitgitgadget
Copy link

gitgitgadget bot commented Sep 17, 2020

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> Update in v4
> ============
>
> A segfault when running just "git maintenance" is fixed.

The net change is just a single liner below, and is obviously
correct.

I propagated it through to part #2 and part #3 locally, and
hopefully this makes part #1 ready to go.

Thanks.



diff --git a/builtin/gc.c b/builtin/gc.c
index c3bcdc1167..090959350e 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -1027,7 +1027,8 @@ static const char builtin_maintenance_usage[] = N_("git maintenance run [<option
 
 int cmd_maintenance(int argc, const char **argv, const char *prefix)
 {
-	if (argc == 2 && !strcmp(argv[1], "-h"))
+	if (argc < 2 ||
+	    (argc == 2 && !strcmp(argv[1], "-h")))
 		usage(builtin_maintenance_usage);
 
 	if (!strcmp(argv[1], "run"))
diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
index 4f6a04ddb1..53c883531e 100755
--- a/t/t7900-maintenance.sh
+++ b/t/t7900-maintenance.sh
@@ -10,7 +10,9 @@ test_expect_success 'help text' '
 	test_expect_code 129 git maintenance -h 2>err &&
 	test_i18ngrep "usage: git maintenance run" err &&
 	test_expect_code 128 git maintenance barf 2>err &&
-	test_i18ngrep "invalid subcommand: barf" err
+	test_i18ngrep "invalid subcommand: barf" err &&
+	test_expect_code 129 git maintenance 2>err &&
+	test_i18ngrep "usage: git maintenance" err
 '
 
 test_expect_success 'run [--auto|--quiet]' '

@gitgitgadget
Copy link

gitgitgadget bot commented Sep 17, 2020

This patch series was integrated into seen via git@5cb2693.

@gitgitgadget
Copy link

gitgitgadget bot commented Sep 18, 2020

This patch series was integrated into seen via git@a2d2679.

@gitgitgadget
Copy link

gitgitgadget bot commented Sep 18, 2020

On the Git mailing list, Johannes Schindelin wrote (reply to this):

Hi Junio,

On Thu, 17 Sep 2020, Junio C Hamano wrote:

> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > Update in v4
> > ============
> >
> > A segfault when running just "git maintenance" is fixed.
>
> The net change is just a single liner below, and is obviously
> correct.
>
> I propagated it through to part #2 and part #3 locally, and
> hopefully this makes part #1 ready to go.

Yay!

> diff --git a/builtin/gc.c b/builtin/gc.c
> index c3bcdc1167..090959350e 100644
> --- a/builtin/gc.c
> +++ b/builtin/gc.c
> @@ -1027,7 +1027,8 @@ static const char builtin_maintenance_usage[] = N_("git maintenance run [<option
>
>  int cmd_maintenance(int argc, const char **argv, const char *prefix)
>  {
> -	if (argc == 2 && !strcmp(argv[1], "-h"))
> +	if (argc < 2 ||
> +	    (argc == 2 && !strcmp(argv[1], "-h")))
>  		usage(builtin_maintenance_usage);
>
>  	if (!strcmp(argv[1], "run"))
> diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
> index 4f6a04ddb1..53c883531e 100755
> --- a/t/t7900-maintenance.sh
> +++ b/t/t7900-maintenance.sh
> @@ -10,7 +10,9 @@ test_expect_success 'help text' '
>  	test_expect_code 129 git maintenance -h 2>err &&
>  	test_i18ngrep "usage: git maintenance run" err &&
>  	test_expect_code 128 git maintenance barf 2>err &&
> -	test_i18ngrep "invalid subcommand: barf" err
> +	test_i18ngrep "invalid subcommand: barf" err &&
> +	test_expect_code 129 git maintenance 2>err &&
> +	test_i18ngrep "usage: git maintenance" err
>  '
>
>  test_expect_success 'run [--auto|--quiet]' '
>

Yep, that's how I read the range-diff, too.

Ciao,
Dscho

@gitgitgadget
Copy link

gitgitgadget bot commented Sep 18, 2020

User Johannes Schindelin <Johannes.Schindelin@gmx.de> has been added to the cc: list.

@gitgitgadget
Copy link

gitgitgadget bot commented Sep 18, 2020

This patch series was integrated into seen via git@1db4528.

@gitgitgadget
Copy link

gitgitgadget bot commented Sep 19, 2020

This patch series was integrated into seen via git@6567522.

@gitgitgadget
Copy link

gitgitgadget bot commented Sep 19, 2020

This patch series was integrated into next via git@4c367d3.

@gitgitgadget gitgitgadget bot added the next label Sep 19, 2020
@@ -699,3 +699,221 @@ int cmd_gc(int argc, const char **argv, const char *prefix)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Ævar Arnfjörð Bjarmason wrote (reply to this):


On Thu, Sep 17 2020, Derrick Stolee via GitGitGadget wrote:

> From: Derrick Stolee <dstolee@microsoft.com>
>
> Performing maintenance on a Git repository involves writing data to the
> .git directory, which is not safe to do with multiple writers attempting
> the same operation. Ensure that only one 'git maintenance' process is
> running at a time by holding a file-based lock. Simply the presence of
> the .git/maintenance.lock file will prevent future maintenance. This
> lock is never committed, since it does not represent meaningful data.
> Instead, it is only a placeholder.
>
> If the lock file already exists, then no maintenance tasks are
> attempted. This will become very important later when we implement the
> 'prefetch' task, as this is our stop-gap from creating a recursive process
> loop between 'git fetch' and 'git maintenance run --auto'.
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> Signed-off-by: Junio C Hamano <gitster@pobox.com>
> ---
>  builtin/gc.c | 20 ++++++++++++++++++++
>  1 file changed, 20 insertions(+)
>
> diff --git a/builtin/gc.c b/builtin/gc.c
> index 00fff59bdb..7ba9c6f7c9 100644
> --- a/builtin/gc.c
> +++ b/builtin/gc.c
> @@ -798,6 +798,25 @@ static int maintenance_run_tasks(struct maintenance_run_opts *opts)
>  {
>  	int i, found_selected = 0;
>  	int result = 0;
> +	struct lock_file lk;
> +	struct repository *r = the_repository;
> +	char *lock_path = xstrfmt("%s/maintenance", r->objects->odb->path);
> +
> +	if (hold_lock_file_for_update(&lk, lock_path, LOCK_NO_DEREF) < 0) {
> +		/*
> +		 * Another maintenance command is running.
> +		 *
> +		 * If --auto was provided, then it is likely due to a
> +		 * recursive process stack. Do not report an error in
> +		 * that case.
> +		 */
> +		if (!opts->auto_flag && !opts->quiet)
> +			warning(_("lock file '%s' exists, skipping maintenance"),
> +				lock_path);
> +		free(lock_path);
> +		return 0;
> +	}
> +	free(lock_path);
>  
>  	for (i = 0; !found_selected && i < TASK__COUNT; i++)
>  		found_selected = tasks[i].selected_order >= 0;

There's now two different lock strategies in builtin/gc.c, the existing
one introduced in 64a99eb476 ("gc: reject if another gc is running,
unless --force is given", 2013-08-08) where we write the hostname to the
gc.pid file, and then discard the lockfile depending on a heuristic of
whether or not it's the same etc., and this one.

With this as an entry point we'll entirely do away with the old one
since we don't use the "gc --auto" entry point.

All of that may or may not be desirable, but I think a description in
the docs & tests for how these lock files should interact would be
helpful. E.g. writing a different hostname in the gc lockfile and
setting the time on it with with "test-tool chmtime" will cause it to
plow ahead, but doing the same for "git maintenance" will stop it in its
tracks no matter the time or content.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 9/21/2020 9:36 AM, Ævar Arnfjörð Bjarmason wrote:
> There's now two different lock strategies in builtin/gc.c, the existing
> one introduced in 64a99eb476 ("gc: reject if another gc is running,
> unless --force is given", 2013-08-08) where we write the hostname to the
> gc.pid file, and then discard the lockfile depending on a heuristic of
> whether or not it's the same etc., and this one.
> 
> With this as an entry point we'll entirely do away with the old one
> since we don't use the "gc --auto" entry point.

Users could still erroneously launch two `git gc` commands concurrently
and that will not use the maintenance.lock file. The GC lock is worth
keeping around until we redirect the `gc` command to be an alias that
runs `git maintenance run --task=gc [options]`.

> All of that may or may not be desirable, but I think a description in
> the docs & tests for how these lock files should interact would be
> helpful. E.g. writing a different hostname in the gc lockfile and
> setting the time on it with with "test-tool chmtime" will cause it to
> plow ahead, but doing the same for "git maintenance" will stop it in its
> tracks no matter the time or content.

Thanks for the pointers to create this test. I'll try to get to it
soon. For now, here is a tracking issue [1].

[1] https://github.com/gitgitgadget/git/issues/737

Thanks,
-Stolee

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

Derrick Stolee <stolee@gmail.com> writes:

>> With this as an entry point we'll entirely do away with the old one
>> since we don't use the "gc --auto" entry point.
>
> Users could still erroneously launch two `git gc` commands concurrently
> and that will not use the maintenance.lock file. The GC lock is worth
> keeping around until we redirect the `gc` command to be an alias that
> runs `git maintenance run --task=gc [options]`.

Yes, that is prudent.  And as long as the traditional gc lock stops
the maintenance command from running the gc task, everthing would
work consistently ;-)

Thanks.

@gitgitgadget
Copy link

gitgitgadget bot commented Sep 21, 2020

User Ævar Arnfjörð Bjarmason <avarab@gmail.com> has been added to the cc: list.

@gitgitgadget
Copy link

gitgitgadget bot commented Sep 21, 2020

User Derrick Stolee <stolee@gmail.com> has been added to the cc: list.

@gitgitgadget
Copy link

gitgitgadget bot commented Sep 21, 2020

This patch series was integrated into seen via git@c4dd4c6.

@gitgitgadget
Copy link

gitgitgadget bot commented Sep 22, 2020

This patch series was integrated into seen via git@a70bcb6.

@gitgitgadget
Copy link

gitgitgadget bot commented Sep 25, 2020

This patch series was integrated into seen via git@48794ac.

@gitgitgadget
Copy link

gitgitgadget bot commented Sep 25, 2020

This patch series was integrated into next via git@48794ac.

@gitgitgadget
Copy link

gitgitgadget bot commented Sep 25, 2020

This patch series was integrated into master via git@48794ac.

@gitgitgadget gitgitgadget bot added the master label Sep 25, 2020
@gitgitgadget gitgitgadget bot closed this Sep 25, 2020
@gitgitgadget
Copy link

gitgitgadget bot commented Sep 25, 2020

Closed via 48794ac.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

1 participant