Merge branch 'ds/maintenance' into seen
A big brother of "git gc" has been introduced to take care of more
repository maintenance tasks, not limited to cleaning the object
database.

* ds/maintenance: (21 commits)
  midx: use start_delayed_progress()
  maintenance: add pack-files auto condition
  maintenance: create auto condition for loose-objects
  maintenance: add auto condition for commit-graph task
  maintenance: use pointers to check --auto
  maintenance: create maintenance.<task>.enabled config
  maintenance: auto-size pack-files batch
  maintenance: add pack-files task
  maintenance: add loose-objects task
  maintenance: add fetch task
  maintenance: take a lock on the objects directory
  maintenance: add --task option
  maintenance: add commit-graph task
  maintenance: initialize task array and hashmap
  maintenance: replace run_auto_gc()
  maintenance: add --quiet option
  maintenance: create basic maintenance runner
  gc: drop the_repository in log location
  gc: use repo config
  gc: use repository in too_many_loose_objects()
  ...
gitster committed Jul 9, 2020
2 parents b497479 + 111a6e6 commit 2d70fc6
Showing 27 changed files with 1,265 additions and 92 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -90,6 +90,7 @@
 /git-ls-tree
 /git-mailinfo
 /git-mailsplit
+/git-maintenance
 /git-merge
 /git-merge-base
 /git-merge-index
2 changes: 2 additions & 0 deletions Documentation/config.txt
@@ -396,6 +396,8 @@ include::config/mailinfo.txt[]
 
 include::config/mailmap.txt[]
 
+include::config/maintenance.txt[]
+
 include::config/man.txt[]
 
 include::config/merge.txt[]
32 changes: 32 additions & 0 deletions Documentation/config/maintenance.txt
@@ -0,0 +1,32 @@
maintenance.<task>.enabled::
This boolean config option controls whether the maintenance task
with name `<task>` is run when no `--task` option is specified.
By default, only `maintenance.gc.enabled` is true.

maintenance.commit-graph.auto::
This integer config option controls how often the `commit-graph` task
should be run as part of `git maintenance run --auto`. If zero, then
the `commit-graph` task will not run with the `--auto` option. A
negative value will force the task to run every time. Otherwise, a
positive value implies the command should run when the number of
reachable commits that are not in the commit-graph file is at least
the value of `maintenance.commit-graph.auto`. The default value is
100.

maintenance.loose-objects.auto::
This integer config option controls how often the `loose-objects` task
should be run as part of `git maintenance run --auto`. If zero, then
the `loose-objects` task will not run with the `--auto` option. A
negative value will force the task to run every time. Otherwise, a
positive value implies the command should run when the number of
loose objects is at least the value of `maintenance.loose-objects.auto`.
The default value is 100.

maintenance.pack-files.auto::
This integer config option controls how often the `pack-files` task
should be run as part of `git maintenance run --auto`. If zero, then
the `pack-files` task will not run with the `--auto` option. A
negative value will force the task to run every time. Otherwise, a
positive value implies the command should run when the number of
pack-files not in the multi-pack-index is at least the value of
`maintenance.pack-files.auto`. The default value is 10.
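The three `*.auto` options above share one threshold rule. Below is a minimal C sketch of that rule, not Git's implementation. The helper name `auto_condition_met()` is hypothetical, and the sketch assumes the caller has already computed the count that the relevant option describes.

```c
#include <stdbool.h>

/*
 * Hypothetical helper (not Git's code): decide whether a task runs under
 * `git maintenance run --auto`. `threshold` is maintenance.<task>.auto;
 * `count` is the task's precomputed measure: reachable commits missing
 * from the commit-graph, loose objects, or pack-files outside the
 * multi-pack-index.
 */
static bool auto_condition_met(int threshold, long count)
{
	if (threshold == 0)
		return false;		/* zero: never run with --auto */
	if (threshold < 0)
		return true;		/* negative: run every time */
	return count >= threshold;	/* positive: run once count reaches it */
}
```

Under the default `maintenance.pack-files.auto` of 10, twelve pack-files outside the multi-pack-index satisfy the check. A setting of 0 skips the task regardless of the count.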
5 changes: 3 additions & 2 deletions Documentation/fetch-options.txt
@@ -86,9 +86,10 @@ ifndef::git-pull[]
 	Allow several <repository> and <group> arguments to be
 	specified. No <refspec>s may be specified.
 
+--[no-]maintenance::
 --[no-]auto-gc::
-	Run `git gc --auto` at the end to perform garbage collection
-	if needed. This is enabled by default.
+	Run `git maintenance run --auto` at the end to perform garbage
+	collection if needed. This is enabled by default.
 
 --[no-]write-commit-graph::
 	Write a commit-graph after fetching. This overrides the config
7 changes: 4 additions & 3 deletions Documentation/git-clone.txt
@@ -78,9 +78,10 @@ repository using this option and then delete branches (or use any
 other Git command that makes any existing commit unreferenced) in the
 source repository, some objects may become unreferenced (or dangling).
 These objects may be removed by normal Git operations (such as `git commit`)
-which automatically call `git gc --auto`. (See linkgit:git-gc[1].)
-If these objects are removed and were referenced by the cloned repository,
-then the cloned repository will become corrupt.
+which automatically call `git maintenance run --auto` and `git gc --auto`.
+(See linkgit:git-maintenance[1] and linkgit:git-gc[1].) If these objects
+are removed and were referenced by the cloned repository, then the cloned
+repository will become corrupt.
 +
 Note that running `git repack` without the `--local` option in a repository
 cloned with `--shared` will copy objects from the source repository into a pack
124 changes: 124 additions & 0 deletions Documentation/git-maintenance.txt
@@ -0,0 +1,124 @@
git-maintenance(1)
==================

NAME
----
git-maintenance - Run tasks to optimize Git repository data


SYNOPSIS
--------
[verse]
'git maintenance' run [<options>]


DESCRIPTION
-----------
Run tasks to optimize Git repository data, speeding up other Git commands
and reducing storage requirements for the repository.

Git commands that add repository data, such as `git add` or `git fetch`,
are optimized for a responsive user experience. These commands do not take
time to optimize the Git data, since such optimizations scale with the full
size of the repository while these user commands each perform a relatively
small action.

The `git maintenance` command provides flexibility for how to optimize the
Git repository.

SUBCOMMANDS
-----------

run::
Run one or more maintenance tasks. If one or more `--task` options
are specified, then those tasks are run in that order. Otherwise,
the tasks are determined by which `maintenance.<task>.enabled`
config options are true. By default, only `maintenance.gc.enabled`
is true.
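The task-selection rule for `run` can be sketched in C as follows. The `struct task` table and `select_tasks()` are hypothetical names for illustration. This sketch silently skips unknown `--task` names, whereas a real command would more likely report them as an error.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical task table; names mirror the TASKS section. */
struct task {
	const char *name;
	bool enabled;		/* value of maintenance.<task>.enabled */
};

/*
 * Store the indexes of the tasks to run in `out` and return how many.
 * If `requested` (the --task values, in command-line order) is non-empty,
 * select exactly those, in that order; otherwise select every enabled task.
 * `out` must have room for max(nr_requested, nr_tasks) entries.
 */
static size_t select_tasks(const struct task *tasks, size_t nr_tasks,
			   const char **requested, size_t nr_requested,
			   size_t *out)
{
	size_t n = 0;

	if (nr_requested) {
		for (size_t r = 0; r < nr_requested; r++)
			for (size_t t = 0; t < nr_tasks; t++)
				if (!strcmp(requested[r], tasks[t].name)) {
					out[n++] = t;
					break;
				}
		return n;
	}

	for (size_t t = 0; t < nr_tasks; t++)
		if (tasks[t].enabled)
			out[n++] = t;
	return n;
}
```

When only `gc` is enabled, as in the default configuration, a plain `run` selects just `gc`. Passing `--task=fetch --task=commit-graph` selects those two tasks in that order, whatever their `enabled` values.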

TASKS
-----

commit-graph::
The `commit-graph` job updates the `commit-graph` files incrementally,
then verifies that the written data is correct. If the new layer has an
issue, then the chain file is removed and the `commit-graph` is
rewritten from scratch.
+
The verification only checks the top layer of the `commit-graph` chain.
If the incremental write merged the new commits with at least one
existing layer, then there is potential for on-disk corruption being
carried forward into the new file. This will be noticed and the new
commit-graph file will be clean as Git reparses the commit data from
the object database.
+
The incremental write is safe to run alongside concurrent Git processes
since it will not expire `.graph` files that were in the previous
`commit-graph-chain` file. They will be deleted by a later run based on
the expiration delay.

fetch::
The `fetch` job updates the object directory with the latest objects
from all registered remotes. For each remote, a `git fetch` command
is run. The refmap is custom to avoid updating local or remote
branches (those in `refs/heads` or `refs/remotes`). Instead, the
remote refs are stored in `refs/hidden/<remote>/`. Also, no tags are
updated.
+
This means that foreground fetches are still required to update the
remote refs, but the user is notified when the branches and tags are
updated on the remote.

gc::
Clean up unnecessary files and optimize the local repository. "GC"
stands for "garbage collection," but this task performs many
smaller tasks. This task can be rather expensive for large
repositories, as it repacks all Git objects into a single pack-file.
It can also be disruptive in some situations, as it deletes stale
data.

loose-objects::
The `loose-objects` job cleans up loose objects and places them into
pack-files. In order to prevent race conditions with concurrent Git
commands, it follows a two-step process. First, it deletes any loose
objects that already exist in a pack-file; concurrent Git processes
will examine the pack-file for the object data instead of the loose
object. Second, it creates a new pack-file (starting with "loose-")
containing a batch of loose objects. The batch size is limited to 50
thousand objects to prevent the job from taking too long on a
repository with many loose objects.

pack-files::
The `pack-files` job incrementally repacks the object directory
using the `multi-pack-index` feature. In order to prevent race
conditions with concurrent Git commands, it follows a two-step
process. First, it deletes any pack-files included in the
`multi-pack-index` where none of the objects in the
`multi-pack-index` reference those pack-files; this only happens
if all objects in the pack-file are also stored in a newer
pack-file. Second, it selects a group of pack-files whose "expected
size" is below the batch size until the group's total expected size
is at least the batch size; see the `--batch-size` option for
the `repack` subcommand in linkgit:git-multi-pack-index[1]. The
default batch-size is zero, which is a special case that attempts
to repack all pack-files into a single pack-file.
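The second step's batch selection can be sketched in C, following the "expected size" description of `--batch-size` in the `git multi-pack-index` documentation. There, a pack's expected size is its on-disk size scaled by the fraction of its objects that the multi-pack-index still references, and packs are examined from oldest to newest. The struct layout and `select_batch()` are hypothetical. Returning an empty selection when the target is never reached is an assumption of this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-pack data, ordered oldest to newest. */
struct pack_info {
	uint64_t pack_size;		/* bytes on disk */
	uint32_t num_objects;		/* objects stored in the pack */
	uint32_t referenced_objects;	/* of those, how many the midx uses */
};

/*
 * Set selected[i] = 1 for each pack to repack and return the count.
 * batch_size == 0 selects every pack; if the group never reaches
 * batch_size, nothing is selected and 0 is returned.
 */
static size_t select_batch(const struct pack_info *packs, size_t nr,
			   uint64_t batch_size, int *selected)
{
	uint64_t total = 0;
	size_t count = 0;

	for (size_t i = 0; i < nr; i++)
		selected[i] = !batch_size;	/* zero batch size: take every pack */
	if (!batch_size)
		return nr;

	for (size_t i = 0; i < nr && total < batch_size; i++) {
		uint64_t expected;

		if (!packs[i].num_objects)
			continue;
		/* Scale the pack size by the fraction of objects the midx uses. */
		expected = packs[i].pack_size * packs[i].referenced_objects /
			   packs[i].num_objects;
		if (expected < batch_size) {
			selected[i] = 1;
			total += expected;
			count++;
		}
	}

	if (total < batch_size) {	/* target never reached: select nothing */
		for (size_t i = 0; i < nr; i++)
			selected[i] = 0;
		return 0;
	}
	return count;
}
```

Consider three packs with expected sizes 1000, 1000, and 5000 bytes and a batch size of 1500. The first two are selected, since together they reach 2000, and the large pack is left alone.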

OPTIONS
-------
--auto::
When combined with the `run` subcommand, run maintenance tasks
only if certain thresholds are met. For example, the `gc` task
runs when the number of loose objects exceeds the number stored
in the `gc.auto` config setting, or when the number of pack-files
exceeds the `gc.autoPackLimit` config setting.

--quiet::
Do not report progress or other information over `stderr`.

--task=<task>::
If this option is specified one or more times, then only run the
specified tasks in the specified order.
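The `--auto` thresholds for the `gc` task can be sketched as a single predicate. This illustrative C helper (`gc_auto_needed()` is a hypothetical name) takes already-computed counts. Per git-config, setting `gc.auto` to 0 disables all of these heuristics, while setting `gc.autoPackLimit` to 0 disables only the pack-count check.

```c
#include <stdbool.h>

/*
 * Sketch of the two `gc` thresholds. Counts are precomputed here;
 * Git itself estimates the loose-object count by sampling and counts
 * only packs that are not marked with a .keep file.
 */
static bool gc_auto_needed(long loose_objects, long packs,
			   long gc_auto, long gc_auto_pack_limit)
{
	if (gc_auto <= 0)
		return false;	/* gc.auto = 0 turns off every --auto heuristic */
	if (gc_auto_pack_limit > 0 && packs > gc_auto_pack_limit)
		return true;	/* gc.autoPackLimit = 0 turns off only this check */
	return loose_objects > gc_auto;
}
```

With the defaults (`gc.auto` = 6700, `gc.autoPackLimit` = 50), either 6701 loose objects or 51 packs would make the `gc` task run.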

GIT
---
Part of the linkgit:git[1] suite
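The two-step `loose-objects` process from the TASKS section can be modeled in C over an in-memory list. `struct loose_object` and `loose_objects_task()` are hypothetical. The sketch only marks objects, whereas the real task deletes files and writes a "loose-" pack-file. The batch limit is a parameter, so the 50-thousand-object cap described above can be passed in.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical view of one loose object for this sketch. */
struct loose_object {
	bool in_pack;		/* a copy already exists in some pack-file */
	bool deleted;		/* set by step 1 */
	bool batched;		/* set by step 2 */
};

/*
 * Step 1: drop loose objects that are already packed; concurrent readers
 * fall back to the pack-file copy. Step 2: choose at most `limit` of the
 * remaining loose objects for the new pack-file. Returns the batch size.
 */
static size_t loose_objects_task(struct loose_object *objs, size_t nr,
				 size_t limit)
{
	size_t batch = 0;

	for (size_t i = 0; i < nr; i++)
		if (objs[i].in_pack)
			objs[i].deleted = true;

	for (size_t i = 0; i < nr && batch < limit; i++)
		if (!objs[i].deleted) {
			objs[i].batched = true;
			batch++;
		}
	return batch;
}
```

Ordering matters here. Deleting already-packed objects first means no object is ever absent from both the loose store and the packs, so a concurrent reader always finds it.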
1 change: 1 addition & 0 deletions builtin.h
@@ -167,6 +167,7 @@ int cmd_ls_tree(int argc, const char **argv, const char *prefix);
 int cmd_ls_remote(int argc, const char **argv, const char *prefix);
 int cmd_mailinfo(int argc, const char **argv, const char *prefix);
 int cmd_mailsplit(int argc, const char **argv, const char *prefix);
+int cmd_maintenance(int argc, const char **argv, const char *prefix);
 int cmd_merge(int argc, const char **argv, const char *prefix);
 int cmd_merge_base(int argc, const char **argv, const char *prefix);
 int cmd_merge_index(int argc, const char **argv, const char *prefix);
2 changes: 1 addition & 1 deletion builtin/am.c
@@ -1795,7 +1795,7 @@ static void am_run(struct am_state *state, int resume)
 	if (!state->rebasing) {
 		am_destroy(state);
 		close_object_store(the_repository->objects);
-		run_auto_gc(state->quiet);
+		run_auto_maintenance(state->quiet);
 	}
 }
 
2 changes: 1 addition & 1 deletion builtin/commit.c
@@ -1702,7 +1702,7 @@ int cmd_commit(int argc, const char **argv, const char *prefix)
 	git_test_write_commit_graph_or_die();
 
 	repo_rerere(the_repository, 0);
-	run_auto_gc(quiet);
+	run_auto_maintenance(quiet);
 	run_commit_hook(use_editor, get_index_file(), "post-commit", NULL);
 	if (amend && !no_post_rewrite) {
 		commit_post_rewrite(the_repository, current_head, &oid);
6 changes: 4 additions & 2 deletions builtin/fetch.c
@@ -196,8 +196,10 @@ static struct option builtin_fetch_options[] = {
 	OPT_STRING_LIST(0, "negotiation-tip", &negotiation_tip, N_("revision"),
 			N_("report that we have only objects reachable from this object")),
 	OPT_PARSE_LIST_OBJECTS_FILTER(&filter_options),
+	OPT_BOOL(0, "maintenance", &enable_auto_gc,
+		 N_("run 'maintenance --auto' after fetching")),
 	OPT_BOOL(0, "auto-gc", &enable_auto_gc,
-		 N_("run 'gc --auto' after fetching")),
+		 N_("run 'maintenance --auto' after fetching")),
 	OPT_BOOL(0, "show-forced-updates", &fetch_show_forced_updates,
 		 N_("check for forced-updates on all updated branches")),
 	OPT_BOOL(0, "write-commit-graph", &fetch_write_commit_graph,
@@ -1882,7 +1884,7 @@ int cmd_fetch(int argc, const char **argv, const char *prefix)
 	close_object_store(the_repository->objects);
 
 	if (enable_auto_gc)
-		run_auto_gc(verbosity < 0);
+		run_auto_maintenance(verbosity < 0);
 
 	return result;
 }
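The `fetch` maintenance task described in `Documentation/git-maintenance.txt` runs `git fetch` for each remote with a custom refmap. Below is a minimal C sketch that builds those arguments for one remote. `fetch_task_argv()` is hypothetical. The exact flag set (`--no-tags`, an empty `--refmap=`, and a forced `+refs/heads/*:refs/hidden/<remote>/*` refspec) is an assumption drawn from that description, not copied from the task's code. Both flags are real `git fetch` options; an empty `--refmap=` makes Git ignore the configured `remote.<name>.fetch` refspecs.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: fill `argv` with the arguments for one `git fetch`
 * run of the maintenance `fetch` task, writing the refspec into `buf`.
 * Returns the argument count (argv is NULL-terminated), or 0 if `buf`
 * is too small.
 */
static size_t fetch_task_argv(const char **argv, char *buf, size_t size,
			      const char *remote)
{
	/* Map the remote's branches into refs/hidden/<remote>/ (forced). */
	int len = snprintf(buf, size, "+refs/heads/*:refs/hidden/%s/*", remote);
	size_t n = 0;

	if (len < 0 || (size_t)len >= size)
		return 0;

	argv[n++] = "fetch";
	argv[n++] = remote;
	argv[n++] = "--no-tags";	/* the task does not update tags */
	argv[n++] = "--refmap=";	/* ignore configured remote.<name>.fetch */
	argv[n++] = buf;
	argv[n] = NULL;
	return n;
}
```

With this refmap, `refs/heads/` and `refs/remotes/` stay untouched, and the prefetched objects are anchored under a namespace that ordinary commands do not display.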
