Merge branch 'jt/fetch-cdn-offload' into pu
WIP for allowing a response to "git fetch" to instruct that the bulk of
the pack contents be taken instead from elsewhere (aka a CDN).

* jt/fetch-cdn-offload:
  SQUASH???
  upload-pack: send part of packfile response as uri
  fetch-pack: support more than one pack lockfile
  upload-pack: refactor reading of pack-objects out
  Documentation: add Packfile URIs design doc
  Documentation: order protocol v2 sections
  http-fetch: support fetching packfiles by URL
  http: improve documentation of http_pack_request
  http: use --stdin when getting dumb HTTP pack
gitster committed Aug 14, 2019
2 parents f283c80 + 1eba6eb commit 18ae2f5
Showing 17 changed files with 669 additions and 132 deletions.
8 changes: 7 additions & 1 deletion Documentation/git-http-fetch.txt
@@ -9,7 +9,7 @@ git-http-fetch - Download from a remote Git repository via HTTP
SYNOPSIS
--------
[verse]
'git http-fetch' [-c] [-t] [-a] [-d] [-v] [-w filename] [--recover] [--stdin] <commit> <url>
'git http-fetch' [-c] [-t] [-a] [-d] [-v] [-w filename] [--recover] [--stdin | --packfile | <commit>] <url>

DESCRIPTION
-----------
@@ -40,6 +40,12 @@ commit-id::

        <commit-id>['\t'<filename-as-in--w>]

--packfile::
        Instead of a commit id on the command line (which is not expected in
        this case), 'git http-fetch' fetches the packfile directly at the given
        URL and uses index-pack to generate corresponding .idx and .keep files.
        The output of index-pack is printed to stdout.

--recover::
        Verify that everything reachable from target is fetched. Used after
        an earlier fetch is interrupted.
78 changes: 78 additions & 0 deletions Documentation/technical/packfile-uri.txt
@@ -0,0 +1,78 @@
Packfile URIs
=============

This feature allows servers to serve part of their packfile response as URIs.
This allows server designs that improve scalability in bandwidth and CPU usage
(for example, by serving some data through a CDN), and (in the future) provides
some measure of resumability to clients.

This feature is available only in protocol version 2.

Protocol
--------

The server advertises `packfile-uris`.

If the client then communicates which protocols (HTTPS, etc.) it supports with
a `packfile-uris` argument, the server MAY send a `packfile-uris` section
directly before the `packfile` section (right after `wanted-refs` if it is
sent) containing URIs of any of the given protocols. The URIs point to
packfiles that use only features that the client has declared that it supports
(e.g. ofs-delta and thin-pack). See protocol-v2.txt for the documentation of
this section.

The client should then understand that the returned packfile could be
incomplete, and that it needs to download all the given URIs before the
fetch or clone is complete.

Server design
-------------

The server can be trivially made compatible with the proposed protocol by
having it advertise `packfile-uris`, tolerating the client sending
`packfile-uris`, and never sending any `packfile-uris` section. But we should
include some sort of non-trivial implementation in the Minimum Viable Product,
at least so that we can test the client.

This is the implementation: a feature, marked experimental, that allows the
server to be configured by one or more `uploadpack.blobPackfileUri=<object-hash>
<pack-hash> <uri>` entries. Whenever the list of objects to be sent is
assembled, a blob with the given object hash can be replaced by the given URI.
This allows, for example, servers to delegate serving of large blobs to CDNs.

Client design
-------------

While fetching, the client needs to remember the list of URIs and cannot
declare that the fetch is complete until all URIs have been downloaded as
packfiles.

The division of work (initial fetch + additional URIs) introduces convenient
points for resumption of an interrupted clone; such resumption can be
implemented after the Minimum Viable Product (see "Future work").

The client can inhibit this feature (i.e. refrain from sending the
`packfile-uris` parameter) by passing --no-packfile-uris to `git fetch`.

Future work
-----------

The protocol design allows some evolution of the server and client without any
need for protocol changes, so only a small-scoped design is included here to
form the MVP. For example, the following can be done:

* On the server, a long-running process that takes in entire requests and
  outputs a list of URIs and the corresponding inclusion and exclusion sets of
  objects. This allows, e.g., signed URIs to be used and packfiles for common
  requests to be cached.
* On the client, resumption of clone. If a clone is interrupted, information
  could be recorded in the repository's config and a "clone-resume" command
  can resume the clone in progress. (Resumption of subsequent fetches is more
  difficult because that must deal with the user wanting to use the repository
  even after the fetch was interrupted.)

There are some possible features that will require a change in protocol:

* Additional HTTP headers (e.g. authentication)
* Byte range support
* Different file formats referenced by URIs (e.g. raw object)
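The `packfile-uris` line format the design doc describes (a pack hash, a space, then the URI) can be illustrated with a small parsing sketch. Python and the function name are purely illustrative, not part of this series:

```python
import re

def parse_packfile_uri_line(line: str) -> tuple:
    """Split one packfile-uris payload line into (pack_hash, uri).

    Per the protocol-v2 grammar in this series, the payload is
    40 hex digits, a single space, then the URI.
    """
    m = re.fullmatch(r"([0-9a-f]{40}) (.+)", line.rstrip("\n"))
    if not m:
        raise ValueError("malformed packfile-uri line: %r" % line)
    return m.group(1), m.group(2)
```

A client would collect one such pair per line of the section and fetch each URI before declaring the clone complete.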
44 changes: 34 additions & 10 deletions Documentation/technical/protocol-v2.txt
@@ -323,13 +323,26 @@ included in the client's request:
          indicating its sideband (1, 2, or 3), and the server may send "0005\2"
          (a PKT-LINE of sideband 2 with no payload) as a keepalive packet.

If the 'packfile-uris' feature is advertised, the following argument can be
included in the client's request, and the server's response may additionally
contain a 'packfile-uris' section, as explained below.

    packfile-uris <comma-separated list of protocols>
        Indicates to the server that the client is willing to receive
        URIs of any of the given protocols in place of objects in the
        sent packfile. Before performing the connectivity check, the
        client should download from all given URIs. Currently, the
        protocols supported are "http" and "https".

The response of `fetch` is broken into a number of sections separated by
delimiter packets (0001), with each section beginning with its section
header.
header. Most sections are sent only when the packfile is sent.

    output = *section
    section = (acknowledgments | shallow-info | wanted-refs | packfile)
              (flush-pkt | delim-pkt)
    output = acknowledgments flush-pkt |
             [acknowledgments delim-pkt] [shallow-info delim-pkt]
             [wanted-refs delim-pkt] [packfile-uris delim-pkt]
             packfile flush-pkt

    acknowledgments = PKT-LINE("acknowledgments" LF)
                      (nak | *ack)
@@ -347,13 +360,17 @@ header.
                  *PKT-LINE(wanted-ref LF)
    wanted-ref = obj-id SP refname

    packfile-uris = PKT-LINE("packfile-uris" LF) *packfile-uri
    packfile-uri = PKT-LINE(40*(HEXDIGIT) SP *%x20-ff LF)

    packfile = PKT-LINE("packfile" LF)
               *PKT-LINE(%x01-03 *%x00-ff)

    acknowledgments section
        * If the client determines that it is finished with negotiations
          by sending a "done" line, the acknowledgments sections MUST be
          omitted from the server's response.
        * If the client determines that it is finished with negotiations by
          sending a "done" line (thus requiring the server to send a packfile),
          the acknowledgments sections MUST be omitted from the server's
          response.

        * Always begins with the section header "acknowledgments"

@@ -404,9 +421,6 @@ header.
          which the client has not indicated was shallow as a part of
          its request.

        * This section is only included if a packfile section is also
          included in the response.

    wanted-refs section
        * This section is only included if the client has requested a
          ref using a 'want-ref' line and if a packfile section is also
@@ -420,6 +434,16 @@ header.
        * The server MUST NOT send any refs which were not requested
          using 'want-ref' lines.

    packfile-uris section
        * This section is only included if the client sent
          'packfile-uris' and the server has at least one such URI to
          send.

        * Always begins with the section header "packfile-uris".

        * For each URI the server sends, it sends a hash of the pack's
          contents (as output by git index-pack) followed by the URI.

    packfile section
        * This section is only included if the client has sent 'want'
          lines in its request and either requested that no more
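The section framing in the grammar above can be made concrete with a small pkt-line sketch, assuming the standard pkt-line rules (a 4-hex-digit length prefix that counts itself, `0000` as flush-pkt, `0001` as delim-pkt). The helper and the example response bytes are illustrative only, not code from this series:

```python
def pkt_line(payload: bytes) -> bytes:
    """Frame a payload as a pkt-line: 4 hex digits of total length, then data."""
    return b"%04x%s" % (len(payload) + 4, payload)

FLUSH_PKT = b"0000"
DELIM_PKT = b"0001"

# A server response carrying a packfile-uris section, then the packfile
# section, per the grammar: each section ends with a delim-pkt and the
# whole response ends with a flush-pkt.
response = (
    pkt_line(b"packfile-uris\n")
    + pkt_line(b"0123456789abcdef0123456789abcdef01234567 https://cdn.example.com/x.pack\n")
    + DELIM_PKT
    + pkt_line(b"packfile\n")
    # ... sideband-multiplexed packfile data would follow here ...
    + FLUSH_PKT
)
```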
17 changes: 11 additions & 6 deletions builtin/fetch-pack.c
@@ -48,8 +48,8 @@ int cmd_fetch_pack(int argc, const char **argv, const char *prefix)
        struct ref **sought = NULL;
        int nr_sought = 0, alloc_sought = 0;
        int fd[2];
        char *pack_lockfile = NULL;
        char **pack_lockfile_ptr = NULL;
        struct string_list pack_lockfiles = STRING_LIST_INIT_DUP;
        struct string_list *pack_lockfiles_ptr = NULL;
        struct child_process *conn;
        struct fetch_pack_args args;
        struct oid_array shallow = OID_ARRAY_INIT;
@@ -138,7 +138,7 @@ int cmd_fetch_pack(int argc, const char **argv, const char *prefix)
                }
                if (!strcmp("--lock-pack", arg)) {
                        args.lock_pack = 1;
                        pack_lockfile_ptr = &pack_lockfile;
                        pack_lockfiles_ptr = &pack_lockfiles;
                        continue;
                }
                if (!strcmp("--check-self-contained-and-connected", arg)) {
@@ -239,10 +239,15 @@ int cmd_fetch_pack(int argc, const char **argv, const char *prefix)
        }

        ref = fetch_pack(&args, fd, ref, sought, nr_sought,
                         &shallow, pack_lockfile_ptr, version);
        if (pack_lockfile) {
                printf("lock %s\n", pack_lockfile);
                         &shallow, pack_lockfiles_ptr, version);
        if (pack_lockfiles.nr) {
                int i;

                printf("lock %s\n", pack_lockfiles.items[0].string);
                fflush(stdout);
                for (i = 1; i < pack_lockfiles.nr; i++)
                        warning(_("Lockfile created but not reported: %s"),
                                pack_lockfiles.items[i].string);
        }
        if (args.check_self_contained_and_connected &&
            args.self_contained_and_connected) {
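The new reporting behavior in this hunk (print a `lock` line for only the first lockfile, since the `--lock-pack` output protocol knows of one, and warn about the rest) can be restated as a sketch. This is an illustrative Python rendering, not code from the patch:

```python
import sys

def report_lockfiles(lockfiles):
    """Print a 'lock' line for the first pack lockfile; warn about extras."""
    if not lockfiles:
        return
    print("lock %s" % lockfiles[0], flush=True)
    for extra in lockfiles[1:]:
        # Mirrors warning(_("Lockfile created but not reported: %s"), ...)
        print("warning: Lockfile created but not reported: %s" % extra,
              file=sys.stderr)
```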
76 changes: 76 additions & 0 deletions builtin/pack-objects.c
@@ -115,6 +115,8 @@ static unsigned long window_memory_limit = 0;

static struct list_objects_filter_options filter_options;

static struct string_list uri_protocols = STRING_LIST_INIT_NODUP;

enum missing_action {
        MA_ERROR = 0, /* fail if any missing objects are encountered */
        MA_ALLOW_ANY, /* silently allow ALL missing objects */
@@ -123,6 +125,15 @@ enum missing_action {
static enum missing_action arg_missing_action;
static show_object_fn fn_show_object;

struct configured_exclusion {
        struct oidmap_entry e;
        char *pack_hash_hex;
        char *uri;
};
static struct oidmap configured_exclusions;

static struct oidset excluded_by_config;

/*
 * stats
 */
@@ -837,6 +848,25 @@ static off_t write_reused_pack(struct hashfile *f)
        return reuse_packfile_offset - sizeof(struct pack_header);
}

static void write_excluded_by_configs(void)
{
        struct oidset_iter iter;
        const struct object_id *oid;

        oidset_iter_init(&excluded_by_config, &iter);
        while ((oid = oidset_iter_next(&iter))) {
                struct configured_exclusion *ex =
                        oidmap_get(&configured_exclusions, oid);

                if (!ex)
                        BUG("configured exclusion wasn't configured");
                write_in_full(1, ex->pack_hash_hex, strlen(ex->pack_hash_hex));
                write_in_full(1, " ", 1);
                write_in_full(1, ex->uri, strlen(ex->uri));
                write_in_full(1, "\n", 1);
        }
}

static const char no_split_warning[] = N_(
"disabling bitmap writing, packs are split due to pack.packSizeLimit"
);
@@ -1133,6 +1163,25 @@ static int want_object_in_pack(const struct object_id *oid,
                }
        }

        if (uri_protocols.nr) {
                struct configured_exclusion *ex =
                        oidmap_get(&configured_exclusions, oid);
                int i;
                const char *p;

                if (ex) {
                        for (i = 0; i < uri_protocols.nr; i++) {
                                if (skip_prefix(ex->uri,
                                                uri_protocols.items[i].string,
                                                &p) &&
                                    *p == ':') {
                                        oidset_insert(&excluded_by_config, oid);
                                        return 0;
                                }
                        }
                }
        }

        return 1;
}

@@ -2723,6 +2772,29 @@ static int git_pack_config(const char *k, const char *v, void *cb)
                            pack_idx_opts.version);
                return 0;
        }
        if (!strcmp(k, "uploadpack.blobpackfileuri")) {
                struct configured_exclusion *ex = xmalloc(sizeof(*ex));
                const char *oid_end, *pack_end;
                /*
                 * Stores the pack hash. This is not a true object ID, but is
                 * of the same form.
                 */
                struct object_id pack_hash;

                if (parse_oid_hex(v, &ex->e.oid, &oid_end) ||
                    *oid_end != ' ' ||
                    parse_oid_hex(oid_end + 1, &pack_hash, &pack_end) ||
                    *pack_end != ' ')
                        die(_("value of uploadpack.blobpackfileuri must be "
                              "of the form '<object-hash> <pack-hash> <uri>' (got '%s')"), v);
                if (oidmap_get(&configured_exclusions, &ex->e.oid))
                        die(_("object already configured in another "
                              "uploadpack.blobpackfileuri (got '%s')"), v);
                ex->pack_hash_hex = xcalloc(1, pack_end - oid_end);
                memcpy(ex->pack_hash_hex, oid_end + 1, pack_end - oid_end - 1);
                ex->uri = xstrdup(pack_end + 1);
                oidmap_put(&configured_exclusions, ex);
        }
        return git_default_config(k, v, cb);
}

@@ -3320,6 +3392,9 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
                         N_("do not pack objects in promisor packfiles")),
                OPT_BOOL(0, "delta-islands", &use_delta_islands,
                         N_("respect islands during delta compression")),
                OPT_STRING_LIST(0, "uri-protocol", &uri_protocols,
                                N_("protocol"),
                                N_("exclude any configured uploadpack.blobpackfileuri with this protocol")),
                OPT_END(),
        };

@@ -3508,6 +3583,7 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
                                   the_repository);
        }

        write_excluded_by_configs();
        trace2_region_enter("pack-objects", "write-pack-file", the_repository);
        write_pack_file();
        trace2_region_leave("pack-objects", "write-pack-file", the_repository);
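The value format that the config parsing above enforces for `uploadpack.blobpackfileuri` can be restated compactly. This sketch assumes SHA-1 (40 hex digits) for both hashes and is illustrative only, not code from the patch:

```python
import re

HEX40 = r"[0-9a-f]{40}"

def parse_blob_packfile_uri(value):
    """Parse an uploadpack.blobpackfileuri value into
    (object_hash, pack_hash, uri), mirroring the validation in
    git_pack_config() above."""
    m = re.fullmatch(r"(%s) (%s) (.+)" % (HEX40, HEX40), value)
    if not m:
        raise ValueError(
            "value of uploadpack.blobpackfileuri must be of the form "
            "'<object-hash> <pack-hash> <uri>' (got %r)" % value)
    return m.group(1), m.group(2), m.group(3)
```

A server operator would add one such entry per blob to be offloaded; duplicate object hashes are rejected by the C code.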
8 changes: 5 additions & 3 deletions connected.c
@@ -42,10 +42,12 @@ int check_connected(oid_iterate_fn fn, void *cb_data,

        if (transport && transport->smart_options &&
            transport->smart_options->self_contained_and_connected &&
            transport->pack_lockfile &&
            strip_suffix(transport->pack_lockfile, ".keep", &base_len)) {
            transport->pack_lockfiles.nr == 1 &&
            strip_suffix(transport->pack_lockfiles.items[0].string,
                         ".keep", &base_len)) {
                struct strbuf idx_file = STRBUF_INIT;
                strbuf_add(&idx_file, transport->pack_lockfile, base_len);
                strbuf_add(&idx_file, transport->pack_lockfiles.items[0].string,
                           base_len);
                strbuf_addstr(&idx_file, ".idx");
                new_pack = add_packed_git(idx_file.buf, idx_file.len, 1);
                strbuf_release(&idx_file);
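The condition in this hunk, restated as an illustrative sketch: the connectivity-check shortcut applies only when there is exactly one pack lockfile and it ends in `.keep`, in which case the corresponding `.idx` path is derived by swapping the suffix (the function name here is hypothetical):

```python
from typing import List, Optional

def idx_from_lockfiles(lockfiles: List[str]) -> Optional[str]:
    """Return the .idx path for the single .keep lockfile, or None if
    the optimization in check_connected() does not apply."""
    if len(lockfiles) != 1 or not lockfiles[0].endswith(".keep"):
        return None
    return lockfiles[0][:-len(".keep")] + ".idx"
```

With packfile URIs, a fetch can now produce several lockfiles, which is why the C code checks `nr == 1` before taking this path.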
