Merge branch 'cc/multi-promisor'
Teach the lazy clone machinery that there can be more than one
promisor remote and consult them in order when downloading missing
objects on demand.

* cc/multi-promisor:
  Move core_partial_clone_filter_default to promisor-remote.c
  Move repository_format_partial_clone to promisor-remote.c
  Remove fetch-object.{c,h} in favor of promisor-remote.{c,h}
  remote: add promisor and partial clone config to the doc
  partial-clone: add multiple remotes in the doc
  t0410: test fetching from many promisor remotes
  builtin/fetch: remove unique promisor remote limitation
  promisor-remote: parse remote.*.partialclonefilter
  Use promisor_remote_get_direct() and has_promisor_remote()
  promisor-remote: use repository_format_partial_clone
  promisor-remote: add promisor_remote_reinit()
  promisor-remote: implement promisor_remote_get_direct()
  Add initial support for many promisor remotes
  fetch-object: make functions return an error code
  t0410: remove pipes after git commands
gitster committed Sep 18, 2019
2 parents de67293 + 4ca9474 commit b9ac6c5
Showing 27 changed files with 523 additions and 172 deletions.
8 changes: 8 additions & 0 deletions Documentation/config/remote.txt
@@ -76,3 +76,11 @@ remote.<name>.pruneTags::
 +
 See also `remote.<name>.prune` and the PRUNING section of
 linkgit:git-fetch[1].
+
+remote.<name>.promisor::
+	When set to true, this remote will be used to fetch promisor
+	objects.
+
+remote.<name>.partialclonefilter::
+	The filter that will be applied when fetching from this
+	promisor remote.
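Both new variables are keyed by remote name, so whatever reads them first has to split `remote.<name>.<key>`. A minimal C sketch of that step, assuming Git's usual config naming rules (section and key are case-insensitive, while the remote name is kept as written and may itself contain dots). `parse_promisor_var()` and `enum promisor_key` are hypothetical names used only for illustration; they are not functions from this commit.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <strings.h>

/* Which per-remote promisor setting a config variable names. */
enum promisor_key {
	PROMISOR_KEY_NONE,   /* not one of the two settings below */
	PROMISOR_KEY_FLAG,   /* remote.<name>.promisor */
	PROMISOR_KEY_FILTER  /* remote.<name>.partialclonefilter */
};

/*
 * Hypothetical helper: split "remote.<name>.<key>" and classify <key>.
 * The key is taken after the LAST dot because <name> may contain dots.
 * On a match, the remote name is copied (possibly truncated) to name_out.
 */
static enum promisor_key parse_promisor_var(const char *var,
					    char *name_out, size_t name_size)
{
	const char *name, *dot;
	enum promisor_key kind;
	size_t len;

	assert(var && name_out && name_size > 0);
	if (strncasecmp(var, "remote.", 7))
		return PROMISOR_KEY_NONE;
	name = var + 7;
	dot = strrchr(name, '.');
	if (!dot || dot == name)
		return PROMISOR_KEY_NONE; /* missing key or empty remote name */

	if (!strcasecmp(dot + 1, "promisor"))
		kind = PROMISOR_KEY_FLAG;
	else if (!strcasecmp(dot + 1, "partialclonefilter"))
		kind = PROMISOR_KEY_FILTER;
	else
		return PROMISOR_KEY_NONE;

	len = (size_t)(dot - name);
	if (len >= name_size)
		len = name_size - 1;
	memcpy(name_out, name, len);
	name_out[len] = '\0';
	return kind;
}
```

For example, `remote.cache.eu.partialCloneFilter` yields the remote name `cache.eu` and `PROMISOR_KEY_FILTER`, while `core.partialclonefilter` is not matched at all.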
117 changes: 84 additions & 33 deletions Documentation/technical/partial-clone.txt
@@ -30,12 +30,20 @@ advance* during clone and fetch operations and thereby reduce download
 times and disk usage. Missing objects can later be "demand fetched"
 if/when needed.

+A remote that can later provide the missing objects is called a
+promisor remote, as it promises to send the objects when
+requested. Initially Git supported only one promisor remote, the origin
+remote from which the user cloned and that was configured in the
+"extensions.partialClone" config option. Later, support for more than
+one promisor remote was implemented.
+
 Use of partial clone requires that the user be online and the origin
-remote be available for on-demand fetching of missing objects. This may
-or may not be problematic for the user. For example, if the user can
-stay within the pre-selected subset of the source tree, they may not
-encounter any missing objects. Alternatively, the user could try to
-pre-fetch various objects if they know that they are going offline.
+remote or other promisor remotes be available for on-demand fetching
+of missing objects. This may or may not be problematic for the user.
+For example, if the user can stay within the pre-selected subset of
+the source tree, they may not encounter any missing objects.
+Alternatively, the user could try to pre-fetch various objects if they
+know that they are going offline.


 Non-Goals
@@ -100,18 +108,18 @@ or commits that reference missing trees.
 Handling Missing Objects
 ------------------------

-- An object may be missing due to a partial clone or fetch, or missing due
-  to repository corruption. To differentiate these cases, the local
-  repository specially indicates such filtered packfiles obtained from the
-  promisor remote as "promisor packfiles".
+- An object may be missing due to a partial clone or fetch, or missing
+  due to repository corruption. To differentiate these cases, the
+  local repository specially indicates such filtered packfiles
+  obtained from promisor remotes as "promisor packfiles".
 +
 These promisor packfiles consist of a "<name>.promisor" file with
 arbitrary contents (like the "<name>.keep" files), in addition to
 their "<name>.pack" and "<name>.idx" files.

 - The local repository considers a "promisor object" to be an object that
-  it knows (to the best of its ability) that the promisor remote has promised
-  that it has, either because the local repository has that object in one of
+  it knows (to the best of its ability) that promisor remotes have promised
+  that they have, either because the local repository has that object in one of
   its promisor packfiles, or because another promisor object refers to it.
 +
 When Git encounters a missing object, Git can see if it is a promisor object
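Because a promisor packfile is recognized only by its sibling "<name>.promisor" file, whose contents are arbitrary, the check reduces to a path rewrite plus an existence test. A minimal sketch, assuming POSIX `access()`; `promisor_marker_path()`, `pack_is_promisor()` and `write_promisor_marker()` are hypothetical helpers, not Git's implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Derive "<name>.promisor" from "<name>.pack"; false if not a .pack path. */
static bool promisor_marker_path(const char *pack_path,
				 char *out, size_t out_size)
{
	const size_t ext_len = strlen(".pack");
	size_t len;

	assert(pack_path && out);
	len = strlen(pack_path);
	if (len < ext_len || strcmp(pack_path + len - ext_len, ".pack"))
		return false;
	if (len - ext_len + sizeof(".promisor") > out_size)
		return false; /* too long for the caller's buffer */
	memcpy(out, pack_path, len - ext_len);
	/* sizeof includes the terminating NUL, so this also terminates out */
	memcpy(out + len - ext_len, ".promisor", sizeof(".promisor"));
	return true;
}

/* A pack counts as a promisor pack iff its marker exists. */
static bool pack_is_promisor(const char *pack_path)
{
	char marker[4096];

	if (!promisor_marker_path(pack_path, marker, sizeof(marker)))
		return false;
	return access(marker, F_OK) == 0;
}

/* Create an empty marker; only its existence matters, not its contents. */
static int write_promisor_marker(const char *pack_path)
{
	char marker[4096];
	FILE *f;

	if (!promisor_marker_path(pack_path, marker, sizeof(marker)))
		return -1;
	f = fopen(marker, "w");
	if (!f)
		return -1;
	return fclose(f) == 0 ? 0 : -1;
}
```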
@@ -123,12 +131,12 @@ expensive-to-modify list of missing objects.[a]
 - Since almost all Git code currently expects any referenced object to be
   present locally and because we do not want to force every command to do
   a dry-run first, a fallback mechanism is added to allow Git to attempt
-  to dynamically fetch missing objects from the promisor remote.
+  to dynamically fetch missing objects from promisor remotes.
 +
 When the normal object lookup fails to find an object, Git invokes
-fetch-object to try to get the object from the server and then retry
-the object lookup. This allows objects to be "faulted in" without
-complicated prediction algorithms.
+promisor_remote_get_direct() to try to get the object from a promisor
+remote and then retry the object lookup. This allows objects to be
+"faulted in" without complicated prediction algorithms.
 +
 For efficiency reasons, no check as to whether the missing object is
 actually a promisor object is performed.
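The fallback described in this hunk (look up, fetch on a miss, retry the lookup once) can be modeled with a toy in-memory store. In this sketch, plain strings stand in for object ids and `fake_get_direct()` is a hypothetical stand-in for promisor_remote_get_direct(); as the text says, nothing checks beforehand whether the missing id is really a promisor object.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_OBJECTS 16

/* Toy object store; object ids are plain strings in this sketch. */
struct store {
	const char *ids[MAX_OBJECTS];
	size_t nr;
};

static bool store_has(const struct store *s, const char *id)
{
	for (size_t i = 0; i < s->nr; i++)
		if (!strcmp(s->ids[i], id))
			return true;
	return false;
}

/* Hypothetical stand-in for promisor_remote_get_direct(): 0 on success. */
static int fake_get_direct(struct store *local, const struct store *remote,
			   const char *id)
{
	if (!store_has(remote, id) || local->nr == MAX_OBJECTS)
		return -1;
	local->ids[local->nr++] = id;
	return 0;
}

/*
 * Normal lookup first; on a miss, fault the object in and retry once.
 * No check is made that the missing id is really a promisor object.
 */
static bool lookup_object(struct store *local, const struct store *remote,
			  const char *id)
{
	assert(local && remote && id);
	if (store_has(local, id))
		return true;
	if (fake_get_direct(local, remote, id))
		return false;
	return store_has(local, id); /* retry the normal lookup */
}
```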
@@ -157,8 +165,7 @@ and prefetch those objects in bulk.
 +
 We are not happy with this global variable and would like to remove it,
 but that requires significant refactoring of the object code to pass an
-additional flag. We hope that concurrent efforts to add an ODB API can
-encompass this.
+additional flag.


 Fetching Missing Objects
@@ -182,21 +189,63 @@ has been updated to not use any object flags when the corresponding argument
 though they are not necessary.


+Using many promisor remotes
+---------------------------
+
+Many promisor remotes can be configured and used.
+
+This allows, for example, a user to have multiple geographically-close
+cache servers for fetching missing blobs while continuing to do
+filtered `git-fetch` commands from the central server.
+
+When fetching objects, promisor remotes are tried one after the other
+until all the objects have been fetched.
+
+Remotes that are considered "promisor" remotes are those specified by
+the following configuration variables:
+
+- `extensions.partialClone = <name>`
+
+- `remote.<name>.promisor = true`
+
+- `remote.<name>.partialCloneFilter = ...`
+
+Only one promisor remote can be configured using the
+`extensions.partialClone` config variable. This promisor remote will
+be the last one tried when fetching objects.
+
+We decided to make it the last one we try, because it is likely that
+someone using many promisor remotes is doing so because the other
+promisor remotes are better for some reason (maybe they are closer or
+faster for some kind of objects) than the origin, and the origin is
+likely to be the remote specified by extensions.partialClone.
+
+This justification is not very strong, but one choice had to be made,
+and anyway the long-term plan should be to make the order somehow
+fully configurable.
+
+For now, though, the other promisor remotes will be tried in the order
+they appear in the config file.
+
 Current Limitations
 -------------------

-- The remote used for a partial clone (or the first partial fetch
-  following a regular clone) is marked as the "promisor remote".
+- It is not possible to specify the order in which the promisor
+  remotes are tried in other ways than the order in which they appear
+  in the config file.
 +
-We are currently limited to a single promisor remote and only that
-remote may be used for subsequent partial fetches.
+It is also not possible to specify an order to be used when fetching
+from one remote and a different order when fetching from another
+remote.
+
+- It is not possible to push only specific objects to a promisor
+  remote.
 +
-We accept this limitation because we believe initial users of this
-feature will be using it on repositories with a strong single central
-server.
+It is not possible to push at the same time to multiple promisor
+remotes in a specific order.

-- Dynamic object fetching will only ask the promisor remote for missing
-  objects. We assume that the promisor remote has a complete view of the
+- Dynamic object fetching will only ask promisor remotes for missing
+  objects. We assume that promisor remotes have a complete view of the
   repository and can satisfy all such requests.

 - Repack essentially treats promisor and non-promisor packfiles as 2
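The ordering rule in "Using many promisor remotes" (config-file order, with the extensions.partialClone remote tried last) amounts to building a list and moving one entry to its tail. A sketch with a singly linked list; `struct promisor` and the helper functions are hypothetical, chosen for illustration rather than taken from promisor-remote.c.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct promisor {
	struct promisor *next;
	char name[64];
};

/* Append a remote, keeping config-file order; duplicates are ignored. */
static void promisor_add(struct promisor **head, const char *name)
{
	struct promisor **p;

	assert(head && name);
	for (p = head; *p; p = &(*p)->next)
		if (!strcmp((*p)->name, name))
			return;
	*p = calloc(1, sizeof(**p)); /* zeroed, so name stays NUL-terminated */
	if (*p)
		strncpy((*p)->name, name, sizeof((*p)->name) - 1);
}

/* Unlink the named remote, if present, and re-append it at the tail. */
static void promisor_move_to_tail(struct promisor **head, const char *name)
{
	struct promisor **p = head, *found = NULL;

	assert(head && name);
	while (*p) {
		if (!found && !strcmp((*p)->name, name)) {
			found = *p;
			*p = found->next; /* unlink without advancing */
			continue;
		}
		p = &(*p)->next;
	}
	if (found) {
		found->next = NULL;
		*p = found; /* p now addresses the final NULL link */
	}
}

/*
 * Flagged remotes come first, in config-file order; the
 * extensions.partialClone remote (if any) is added if needed and put last.
 */
static struct promisor *promisor_list_build(const char *const *config_order,
					    size_t n, const char *partial_clone)
{
	struct promisor *head = NULL;

	for (size_t i = 0; i < n; i++)
		promisor_add(&head, config_order[i]);
	if (partial_clone) {
		promisor_add(&head, partial_clone);
		promisor_move_to_tail(&head, partial_clone);
	}
	return head;
}

static void promisor_list_free(struct promisor *head)
{
	while (head) {
		struct promisor *next = head->next;
		free(head);
		head = next;
	}
}
```

With promisor remotes appearing in the config as origin, cache-eu, cache-us and `extensions.partialClone = origin`, the resulting try order is cache-eu, cache-us, origin.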
@@ -218,15 +267,17 @@ server.
 Future Work
 -----------

-- Allow more than one promisor remote and define a strategy for fetching
-  missing objects from specific promisor remotes or of iterating over the
-  set of promisor remotes until a missing object is found.
+- Improve the way to specify the order in which promisor remotes are
+  tried.
 +
-A user might want to have multiple geographically-close cache servers
-for fetching missing blobs while continuing to do filtered `git-fetch`
-commands from the central server, for example.
+For example, this could allow specifying explicitly something like:
+"When fetching from this remote, I want to use these promisor remotes
+in this order, though, when pushing to or fetching from that remote, I
+want to use those promisor remotes in that order."
+
+- Allow pushing to promisor remotes.
 +
-Or the user might want to work in a triangular work flow with multiple
+The user might want to work in a triangular work flow with multiple
 promisor remotes that each have an incomplete view of the repository.

 - Allow repack to work on promisor packfiles (while keeping them distinct
2 changes: 1 addition & 1 deletion Makefile
@@ -884,7 +884,6 @@ LIB_OBJS += ewah/ewah_io.o
 LIB_OBJS += ewah/ewah_rlw.o
 LIB_OBJS += exec-cmd.o
 LIB_OBJS += fetch-negotiator.o
-LIB_OBJS += fetch-object.o
 LIB_OBJS += fetch-pack.o
 LIB_OBJS += fsck.o
 LIB_OBJS += fsmonitor.o
@@ -948,6 +947,7 @@ LIB_OBJS += preload-index.o
 LIB_OBJS += pretty.o
 LIB_OBJS += prio-queue.o
 LIB_OBJS += progress.o
+LIB_OBJS += promisor-remote.o
 LIB_OBJS += prompt.o
 LIB_OBJS += protocol.o
 LIB_OBJS += quote.o
5 changes: 3 additions & 2 deletions builtin/cat-file.c
@@ -15,6 +15,7 @@
 #include "sha1-array.h"
 #include "packfile.h"
 #include "object-store.h"
+#include "promisor-remote.h"

 struct batch_options {
 	int enabled;
@@ -524,8 +525,8 @@ static int batch_objects(struct batch_options *opt)
 	if (opt->all_objects) {
 		struct object_cb_data cb;

-		if (repository_format_partial_clone)
-			warning("This repository has extensions.partialClone set. Some objects may not be loaded.");
+		if (has_promisor_remote())
+			warning("This repository uses promisor remotes. Some objects may not be loaded.");

 		cb.opt = opt;
 		cb.expand = &data;
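Here, as in the gc, repack, index-pack and cache-tree hunks, the test changes from "is extensions.partialClone set?" to "is there any promisor remote?". A sketch of the latter question over a simplified, hypothetical view of the config, treating a remote as a promisor remote under any of the three settings listed in the partial-clone documentation; `struct remote_cfg`, `is_promisor()` and `any_promisor_remote()` are illustrative names only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Simplified, hypothetical view of one [remote "<name>"] section. */
struct remote_cfg {
	const char *name;
	bool promisor;      /* remote.<name>.promisor = true */
	const char *filter; /* remote.<name>.partialCloneFilter, or NULL */
};

/* Any of the three documented settings makes a remote a promisor remote. */
static bool is_promisor(const struct remote_cfg *r, const char *partial_clone)
{
	assert(r && r->name);
	return r->promisor || r->filter ||
	       (partial_clone && !strcmp(r->name, partial_clone));
}

/* Roughly the question has_promisor_remote() answers in these hunks. */
static bool any_promisor_remote(const struct remote_cfg *remotes, size_t n,
				const char *partial_clone)
{
	if (partial_clone)
		return true; /* that remote counts even with no other settings */
	for (size_t i = 0; i < n; i++)
		if (is_promisor(&remotes[i], NULL))
			return true;
	return false;
}
```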
29 changes: 10 additions & 19 deletions builtin/fetch.c
@@ -24,6 +24,7 @@
 #include "list-objects-filter-options.h"
 #include "commit-reach.h"
 #include "branch.h"
+#include "promisor-remote.h"

 #define FORCED_UPDATES_DELAY_WARNING_IN_MS (10 * 1000)

@@ -1559,37 +1560,27 @@ static inline void fetch_one_setup_partial(struct remote *remote)
 	 * If no prior partial clone/fetch and the current fetch DID NOT
 	 * request a partial-fetch, do a normal fetch.
 	 */
-	if (!repository_format_partial_clone && !filter_options.choice)
+	if (!has_promisor_remote() && !filter_options.choice)
 		return;

 	/*
-	 * If this is the FIRST partial-fetch request, we enable partial
-	 * on this repo and remember the given filter-spec as the default
-	 * for subsequent fetches to this remote.
+	 * If this is a partial-fetch request, we enable partial on
+	 * this repo if not already enabled and remember the given
+	 * filter-spec as the default for subsequent fetches to this
+	 * remote.
 	 */
-	if (!repository_format_partial_clone && filter_options.choice) {
+	if (filter_options.choice) {
 		partial_clone_register(remote->name, &filter_options);
 		return;
 	}

-	/*
-	 * We are currently limited to only ONE promisor remote and only
-	 * allow partial-fetches from the promisor remote.
-	 */
-	if (strcmp(remote->name, repository_format_partial_clone)) {
-		if (filter_options.choice)
-			die(_("--filter can only be used with the remote "
-			      "configured in extensions.partialClone"));
-		return;
-	}
-
 	/*
 	 * Do a partial-fetch from the promisor remote using either the
 	 * explicitly given filter-spec or inherit the filter-spec from
 	 * the config.
 	 */
 	if (!filter_options.choice)
-		partial_clone_get_default_filter_spec(&filter_options);
+		partial_clone_get_default_filter_spec(&filter_options, remote->name);
 	return;
 }

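With the "only the extensions.partialClone remote" check removed, fetch_one_setup_partial() reduces to three cases that no longer depend on which remote is being fetched. A sketch of that decision as a pure function; `enum partial_setup` and `classify_fetch()` are hypothetical, and the real function acts (registers the remote, loads its default filter) rather than returning a value.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of what fetch_one_setup_partial() ends up doing. */
enum partial_setup {
	SETUP_PLAIN_FETCH,     /* no promisor remote and no --filter */
	SETUP_REGISTER_FILTER, /* --filter given: record remote and filter */
	SETUP_DEFAULT_FILTER   /* no --filter: use this remote's configured filter */
};

static enum partial_setup classify_fetch(bool has_promisor, bool filter_given)
{
	if (!has_promisor && !filter_given)
		return SETUP_PLAIN_FETCH;
	if (filter_given)
		return SETUP_REGISTER_FILTER;
	/*
	 * The filter is now looked up by remote name, so this case applies
	 * to whichever remote is fetched, not only extensions.partialClone.
	 */
	return SETUP_DEFAULT_FILTER;
}
```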
@@ -1710,7 +1701,7 @@ int cmd_fetch(int argc, const char **argv, const char *prefix)
 	if (depth || deepen_since || deepen_not.nr)
 		deepen = 1;

-	if (filter_options.choice && !repository_format_partial_clone)
+	if (filter_options.choice && !has_promisor_remote())
 		die("--filter can only be used when extensions.partialClone is set");

 	if (all) {
@@ -1744,7 +1735,7 @@ int cmd_fetch(int argc, const char **argv, const char *prefix)
 	}

 	if (remote) {
-		if (filter_options.choice || repository_format_partial_clone)
+		if (filter_options.choice || has_promisor_remote())
 			fetch_one_setup_partial(remote);
 		result = fetch_one(remote, argc, argv, prune_tags_ok);
 	} else {
3 changes: 2 additions & 1 deletion builtin/gc.c
@@ -27,6 +27,7 @@
 #include "pack-objects.h"
 #include "blob.h"
 #include "tree.h"
+#include "promisor-remote.h"

 #define FAILED_RUN "failed to run %s"

@@ -659,7 +660,7 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
 		argv_array_push(&prune, prune_expire);
 		if (quiet)
 			argv_array_push(&prune, "--no-progress");
-		if (repository_format_partial_clone)
+		if (has_promisor_remote())
 			argv_array_push(&prune,
 					"--exclude-promisor-objects");
 		if (run_command_v_opt(prune.argv, RUN_GIT_CMD))
8 changes: 4 additions & 4 deletions builtin/index-pack.c
@@ -14,7 +14,7 @@
 #include "thread-utils.h"
 #include "packfile.h"
 #include "object-store.h"
-#include "fetch-object.h"
+#include "promisor-remote.h"

 static const char index_pack_usage[] =
 "git index-pack [-v] [-o <index-file>] [--keep | --keep=<msg>] [--verify] [--strict] (<pack-file> | --stdin [--fix-thin] [<pack-file>])";
@@ -1352,7 +1352,7 @@ static void fix_unresolved_deltas(struct hashfile *f)
 		sorted_by_pos[i] = &ref_deltas[i];
 	QSORT(sorted_by_pos, nr_ref_deltas, delta_pos_compare);

-	if (repository_format_partial_clone) {
+	if (has_promisor_remote()) {
 		/*
 		 * Prefetch the delta bases.
 		 */
@@ -1366,8 +1366,8 @@ static void fix_unresolved_deltas(struct hashfile *f)
 			oid_array_append(&to_fetch, &d->oid);
 		}
 		if (to_fetch.nr)
-			fetch_objects(repository_format_partial_clone,
-				      to_fetch.oid, to_fetch.nr);
+			promisor_remote_get_direct(the_repository,
+						   to_fetch.oid, to_fetch.nr);
 		oid_array_clear(&to_fetch);
 	}

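index-pack now hands the whole batch of missing delta bases to promisor_remote_get_direct(), and the documentation says promisor remotes are tried one after the other until all the objects have been fetched. A simplified sketch of that loop over toy remotes; `struct toy_remote` and `fetch_from_promisors()` are hypothetical, and the real code performs network fetches rather than the membership tests used here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy remote: a name plus the object ids it can serve (hypothetical). */
struct toy_remote {
	const char *name;
	const char *const *objects;
	size_t nr_objects;
};

static bool toy_remote_has(const struct toy_remote *r, const char *id)
{
	for (size_t i = 0; i < r->nr_objects; i++)
		if (!strcmp(r->objects[i], id))
			return true;
	return false;
}

/*
 * Ask each remote, in order, for whatever is still missing; compact
 * `missing` in place and return how many ids no remote could supply.
 */
static size_t fetch_from_promisors(const struct toy_remote *remotes,
				   size_t nr_remotes,
				   const char **missing, size_t nr_missing)
{
	assert(remotes || nr_remotes == 0);
	for (size_t r = 0; r < nr_remotes && nr_missing; r++) {
		size_t kept = 0;

		for (size_t i = 0; i < nr_missing; i++)
			if (!toy_remote_has(&remotes[r], missing[i]))
				missing[kept++] = missing[i]; /* still missing */
		nr_missing = kept;
	}
	return nr_missing;
}
```

Stopping as soon as nothing is left to fetch means later (for example, central) remotes are only contacted for objects the earlier ones could not supply.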
3 changes: 2 additions & 1 deletion builtin/repack.c
@@ -11,6 +11,7 @@
 #include "midx.h"
 #include "packfile.h"
 #include "object-store.h"
+#include "promisor-remote.h"

 static int delta_base_offset = 1;
 static int pack_kept_objects = -1;
@@ -361,7 +362,7 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 		argv_array_push(&cmd.args, "--all");
 		argv_array_push(&cmd.args, "--reflog");
 		argv_array_push(&cmd.args, "--indexed-objects");
-		if (repository_format_partial_clone)
+		if (has_promisor_remote())
 			argv_array_push(&cmd.args, "--exclude-promisor-objects");
 		if (write_bitmaps > 0)
 			argv_array_push(&cmd.args, "--write-bitmap-index");
3 changes: 2 additions & 1 deletion cache-tree.c
@@ -5,6 +5,7 @@
 #include "cache-tree.h"
 #include "object-store.h"
 #include "replace-object.h"
+#include "promisor-remote.h"

 #ifndef DEBUG_CACHE_TREE
 #define DEBUG_CACHE_TREE 0
@@ -357,7 +358,7 @@ static int update_one(struct cache_tree *it,
 		}

 		ce_missing_ok = mode == S_IFGITLINK || missing_ok ||
-			(repository_format_partial_clone &&
+			(has_promisor_remote() &&
 			 ce_skip_worktree(ce));
 		if (is_null_oid(oid) ||
 		    (!ce_missing_ok && !has_object_file(oid))) {
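The cache-tree condition decides when an index entry's object may legitimately be absent. Pulled out as a standalone predicate it reads as follows; `entry_may_be_missing()` is a hypothetical name, and 0160000 is the mode Git uses for gitlink (submodule) entries. Skip-worktree entries are the ones a sparse checkout leaves out of the working tree.

```c
#include <assert.h>
#include <stdbool.h>

/* Mode Git uses for gitlink (submodule) entries in the index. */
#define GITLINK_MODE 0160000

/*
 * May this index entry's object be absent locally? Submodule entries and
 * explicitly allowed entries always qualify; with any promisor remote,
 * skip-worktree entries qualify too (before this change, only when
 * extensions.partialClone was set).
 */
static bool entry_may_be_missing(unsigned int mode, bool missing_ok,
				 bool has_promisor, bool skip_worktree)
{
	return mode == GITLINK_MODE || missing_ok ||
	       (has_promisor && skip_worktree);
}
```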
2 changes: 0 additions & 2 deletions cache.h
@@ -937,8 +937,6 @@ extern int grafts_replace_parents;
 #define GIT_REPO_VERSION 0
 #define GIT_REPO_VERSION_READ 1
 extern int repository_format_precious_objects;
-extern char *repository_format_partial_clone;
-extern const char *core_partial_clone_filter_default;
 extern int repository_format_worktree_config;

 /*
5 changes: 0 additions & 5 deletions config.c
@@ -1379,11 +1379,6 @@ static int git_default_core_config(const char *var, const char *value, void *cb)
 		return 0;
 	}

-	if (!strcmp(var, "core.partialclonefilter")) {
-		return git_config_string(&core_partial_clone_filter_default,
-					 var, value);
-	}
-
 	if (!strcmp(var, "core.usereplacerefs")) {
 		read_replace_refs = git_config_bool(var, value);
 		return 0;