From fad8ff4330bdf3f5048c7a3d787e5d44434f04b3 Mon Sep 17 00:00:00 2001 From: Derrick Stolee Date: Fri, 20 May 2022 18:40:19 +0000 Subject: [PATCH 01/24] docs: document bundle URI standard Introduce the idea of bundle URIs to the Git codebase through an aspirational design document. This document includes the full design intended to include the feature in its fully-implemented form. This will take several steps as detailed in the Implementation Plan section. By committing this document now, it can be used to motivate changes necessary to reach these final goals. The design can still be altered as new information is discovered. Signed-off-by: Derrick Stolee Signed-off-by: Junio C Hamano --- Documentation/technical/bundle-uri.txt | 477 +++++++++++++++++++++++++ 1 file changed, 477 insertions(+) create mode 100644 Documentation/technical/bundle-uri.txt diff --git a/Documentation/technical/bundle-uri.txt b/Documentation/technical/bundle-uri.txt new file mode 100644 index 00000000000000..0862336c9eb059 --- /dev/null +++ b/Documentation/technical/bundle-uri.txt @@ -0,0 +1,477 @@ +Bundle URIs +=========== + +Bundle URIs are locations where Git can download one or more bundles in +order to bootstrap the object database in advance of fetching the remaining +objects from a remote. + +One goal is to speed up clones and fetches for users with poor network +connectivity to the origin server. Another benefit is to allow heavy users, +such as CI build farms, to use local resources for the majority of Git data +and thereby reducing the load on the origin server. + +To enable the bundle URI feature, users can specify a bundle URI using +command-line options or the origin server can advertise one or more URIs +via a protocol v2 capability. + +Design Goals +------------ + +The bundle URI standard aims to be flexible enough to satisfy multiple +workloads. The bundle provider and the Git client have several choices in +how they create and consume bundle URIs. + +* Bundles can have whatever name the server desires. This name could refer + to immutable data by using a hash of the bundle contents. However, this + means that a new URI will be needed after every update of the content. + This might be acceptable if the server is advertising the URI (and the + server is aware of new bundles being generated) but would not be + ergonomic for users using the command line option. + +* The bundles could be organized specifically for bootstrapping full + clones, but could also be organized with the intention of bootstrapping + incremental fetches. The bundle provider must decide on one of several + organization schemes to minimize client downloads during incremental + fetches, but the Git client can also choose whether to use bundles for + either of these operations. + +* The bundle provider can choose to support full clones, partial clones, + or both. The client can detect which bundles are appropriate for the + repository's partial clone filter, if any. + +* The bundle provider can use a single bundle (for clones only), or a + list of bundles. When using a list of bundles, the provider can specify + whether or not the client needs _all_ of the bundle URIs for a full + clone, or if _any_ one of the bundle URIs is sufficient. This allows the + bundle provider to use different URIs for different geographies. + +* The bundle provider can organize the bundles using heuristics, such as + timestamps or tokens, to help the client prevent downloading bundles it + does not need. When the bundle provider does not provide these + heuristics, the client can use optimizations to minimize how much of the + data is downloaded. + +* The bundle provider does not need to be associated with the Git server. + The client can choose to use the bundle provider without it being + advertised by the Git server. + +* The client can choose to discover bundle providers that are advertised + by the Git server. This could happen during `git clone`, during + `git fetch`, both, or neither. The user can choose which combination + works best for them. + +* The client can choose to configure a bundle provider manually at any + time. The client can also choose to specify a bundle provider manually + as a command-line option to `git clone`. + +Each repository is different and every Git server has different needs. +Hopefully the bundle URI feature is flexible enough to satisfy all needs. +If not, then the feature can be extended through its versioning mechanism. + +Server requirements +------------------- + +To provide a server-side implementation of bundle servers, no other parts +of the Git protocol are required. This allows server maintainers to use +static content solutions such as CDNs in order to serve the bundle files. + +At the current scope of the bundle URI feature, all URIs are expected to +be HTTP(S) URLs where content is downloaded to a local file using a `GET` +request to that URL. The server could include authentication requirements +to those requests with the aim of triggering the configured credential +helper for secure access. (Future extensions could use "file://" URIs or +SSH URIs.) + +Assuming a `200 OK` response from the server, the content at the URL is +expected to be of one of two forms: + +1. Bundle: A Git bundle file of version 2 or higher. + +2. Table of Contents: A plain-text file that is parsable using Git's + config file parser. This file describes one or more bundles that are + accessible from other URIs. + +Any other data provided by the server is considered erroneous. + +Bundle Lists +------------ + +The Git server can advertise bundle URIs using a set of `key=value` pairs. +A bundle URI can also serve a plain-text file in the Git config format +containing these same `key=value` pairs. In both cases, we consider this +to be a _bundle list_. The pairs specify information about the bundles +that the client can use to make decisions for which bundles to download +and which to ignore. + +A few keys focus on properties of the list itself. + +bundle.list.version:: + (Required) This value provides a version number for the table of + contents. If a future Git change enables a feature that needs the + Git client to react to a new key in the table of contents file, + then this version will increment. The only current version number + is 1, and if any other value is specified then Git will fail to + use this file. + +bundle.list.mode:: + (Required) This value has one of two values: `all` and `any`. When + `all` is specified, then the client should expect to need all of + the listed bundle URIs that match their repository's requirements. + When `any` is specified, then the client should expect that any + one of the bundle URIs that match their repository's requirements + will suffice. Typically, the `any` option is used to list a number + of different bundle servers located in different geographies. + +bundle.list.forFetch:: + This boolean value is a signal to the Git client that + the bundle server has designed its bundle organization to assist `git fetch` + commands in addition to `git clone` commands. If this is missing, + Git should not use this table of contents for `git fetch` as it + may lead to excess data downloads. + +The remaining keys include an `` segment which is a server-designated +name for each available bundle. + +bundle..uri:: + (Required) This string value is the URI for downloading bundle + ``. If the URI begins with a protocol (`http://` or + `https://`) then the URI is absolute. Otherwise, the URI is + interpreted as relative to the URI used for the table of contents. + If the URI begins with `/`, then that relative path is relative to + the domain name used for the table of contents. (This use of + relative paths is intended to make it easier to distribute a set + of bundles across a large number of servers or CDNs with different + domain names.) + +bundle..timestamp:: + This value is the number of seconds since Unix epoch (UTC) that + this bundle was created. This is used as an approximation of a + point in time that the bundle matches the data available at the + origin server. + +bundle..requires:: + This string value represents the ID of another bundle. When + present, the server is indicating that this bundle contains a thin + packfile. If the client does not have all necessary objects to + unbundle this packfile, then the client can download the bundle + with the `requires` ID and try again. (Note: it may be beneficial + to allow the server to specify multiple `requires` bundles.) + +bundle..filter:: + This string value represents an object filter that should also + appear in the header of this bundle. The server uses this value to + differentiate different kinds of bundles from which the client can + choose those that match their object filters. + +bundle..list:: + This boolean value indicates whether the client should expect the + content from this URI to be a list (if `true`) or a bundle (if + `false`). This is typically used when `bundle.list.mode` is `any`. + +bundle..location:: + This string value advertises a real-world location from where the + bundle URI is served. This can be used to present the user with an + option for which bundle URI to use. This is only valuable when + `bundle.list.mode` is `any`. + +Here is an example bundle list using the Git config format: + +``` +[bundle "list"] + version = 1 + mode = all + forFetch = true + +[bundle "2022-02-09-1644442601-daily"] + uri = https://bundles.fake.com/git/git/2022-02-09-1644442601-daily.bundle + timestamp = 1644442601 + requires = 2022-02-02-1643842562 + +[bundle "2022-02-02-1643842562"] + uri = https://bundles.fake.com/git/git/2022-02-02-1643842562.bundle + timestamp = 1643842562 + +[bundle "2022-02-09-1644442631-daily-blobless"] + uri = 2022-02-09-1644442631-daily-blobless.bundle + timestamp = 1644442631 + requires = 2022-02-02-1643842568-blobless + filter = blob:none + +[bundle "2022-02-02-1643842568-blobless"] + uri = /git/git/2022-02-02-1643842568-blobless.bundle + timestamp = 1643842568 + filter = blob:none +``` + +This example uses `bundle.list.mode=all` as well as the +`bundle..timestamp` heuristic. It also uses the `bundle..filter` +options to present two parallel sets of bundles: one for full clones and +another for blobless partial clones. + +Suppose that this bundle list was found at the URI +`https://bundles.fake.com/git/git/` and so the two blobless bundles have +the following fully-expanded URIs: + +* `https://bundles.fake.com/git/git/2022-02-09-1644442631-daily-blobless.bundle` +* `https://bundles.fake.com/git/git/2022-02-02-1643842568-blobless.bundle` + +Advertising Bundle URIs +----------------------- + +If a user knows a bundle URI for the repository they are cloning, then they +can specify that URI manually through a command-line option. However, a +Git host may want to advertise bundle URIs during the clone operation, +helping users unaware of the feature. + +The only thing required for this feature is that the server can advertise +one or more bundle URIs. This advertisement takes the form of a new +protocol v2 capability specifically for discovering bundle URIs. + +The client could choose an arbitrary bundle URI as an option _or_ select +the URI with lowest latency by some exploratory checks. It is up to the +bundle provider to decide if having multiple URIs is preferable to a +single URI that is geodistributed through server-side infrastructure. + +Cloning with Bundle URIs +------------------------ + +The primary need for bundle URIs is to speed up clones. The Git client +will interact with bundle URIs according to the following flow: + +1. The user specifies a bundle URI with the `--bundle-uri` command-line + option _or_ the client discovers a bundle list advertised by the + Git server. + +2. If the downloaded data from a bundle URI is a bundle, then the client + inspects the bundle headers to check that the negative commit OIDs are + present in the client repository. If some are missing, then the client + delays unbundling until other bundles have been unbundled, making those + OIDs present. When all required OIDs are present, the client unbundles + that data using a refspec. The default refspec is + `+refs/heads/*:refs/bundles/*`, but this can be configured. + +3. If the file is instead a bundle list, then the client inspects the + `bundle.list.mode` to see if the list is of the `all` or `any` form. + + a. If `bundle.list.mode=all`, then the client considers all bundle + URIs. The list is reduced based on the `bundle..filter` options + matching the client repository's partial clone filter. Then, all + bundle URIs are requested. If the `bundle..timestamp` heuristic + is provided, then the bundles are downloaded in reverse- + chronological order, stopping when a bundle has all required OIDs. + The bundles can then be unbundled in chronological order. The client + stores the latest timestamp as a heuristic for avoiding future + downloads if the bundle list does not advertise newer bundles. + + b. If `bundle.list.mode=any`, then the client can choose any one of the + bundle URIs to inspect. The client can use a variety of ways to + choose among these URIs. The client can also fallback to another URI + if the initial choice fails to return a result. + +Note that during a clone we expect that all bundles will be required, and +heuristics such as `bundle..timestamp` can be used to download bundles +in chronological order or in parallel. + +If a given bundle URI is a bundle list with `bundle.list.forFetch=true`, +then the client can choose to store that URI as its chosen bundle URI. The +client can then navigate directly to that URI during later `git fetch` +calls. + +When downloading bundle URIs, the client can choose to inspect the initial +content before committing to downloading the entire content. This may +provide enough information to determine if the URI is a bundle list or +a bundle. In the case of a bundle, the client may inspect the bundle +header to determine that all advertised tips are already in the client +repository and cancel the remaining download. + +Fetching with Bundle URIs +------------------------- + +When the client fetches new data, it can decide to fetch from bundle +servers before fetching from the origin remote. This could be done via +a command-line option, but it is more likely useful to use a config value +such as the one specified during the clone. + +The fetch operation follows the same procedure to download bundles from a +bundle list (although we do _not_ want to use parallel downloads here). We +expect that the process will end when all negative commit OIDs in a thin +bundle are already in the object database. + +A further optimization is that the client can avoid downloading any +bundles if their timestamps are not larger than the stored timestamp. +After fetching new bundles, this local timestamp value is updated. + +If the bundle provider does not provide the `bundle..timestamp` +heuristic, then the client should attempt to inspect the bundle headers +before downloading the full bundle data in case the bundle tips already +exist in the client repository. + +Error Conditions +---------------- + +If the Git client discovers something unexpected while downloading +information according to a bundle URI or the table of contents found at +that location, then Git can ignore that data and continue as if it was not +given a bundle URI. The remote Git server is the ultimate source of truth, +not the bundle URI. + +Here are a few example error conditions: + +* The client fails to connect with a server at the given URI or a connection + is lost without any chance to recover. + +* The client receives a response other than `200 OK` (such as `404 Not Found`, + `401 Not Authorized`, or `500 Internal Server Error`). The client should + use the `credential.helper` to attempt authentication after the first + `401 Not Authorized` response, but a second such response is a failure. + +* The client receives data that is not parsable as a bundle or table of + contents. + +* The bundle list describes a directed cycle in the + `bundle..requires` links. + +* A bundle includes a filter that does not match expectations. + +* The client cannot unbundle the bundles because the negative commit OIDs + are not in the object database and there are no more + `bundle..requires` links to follow. + +There are also situations that could be seen as wasteful, but are not +error conditions: + +* The downloaded bundles contain more information than is requested by + the clone or fetch request. A primary example is if the user requests + a clone with `--single-branch` but downloads bundles that store every + reachable commit from all `refs/heads/*` references. This might be + initially wasteful, but perhaps these objects will become reachable by + a later ref update that the client cares about. + +* A bundle download during a `git fetch` contains objects already in the + object database. This is probably unavoidable if we are using bundles + for fetches, since the client will almost always be slightly ahead of + the bundle servers after performing its "catch-up" fetch to the remote + server. This extra work is most wasteful when the client is fetching + much more frequently than the server is computing bundles, such as if + the client is using hourly prefetches with background maintenance, but + the server is computing bundles weekly. For this reason, the client + should not use bundle URIs for fetch unless the server has explicitly + recommended it through the `bundle.list.forFetch = true` value. + +Implementation Plan +------------------- + +This design document is being submitted on its own as an aspirational +document, with the goal of implementing all of the mentioned client +features over the course of several patch series. Here is a potential +outline for submitting these features: + +1. Integrate bundle URIs into `git clone` with a `--bundle-uri` option. + This will include a new `git fetch --bundle-uri` mode for use as the + implementation underneath `git clone`. The initial version here will + expect a single bundle at the given URI. + +2. Implement the ability to parse a bundle list from a bundle URI and + update the `git fetch --bundle-uri` logic to properly distinguish + between `bundle.list.mode` options. Specifically design the feature so + that the config format parsing feeds a list of key-value pairs into the + bundle list logic. + +3. Create the `bundle-uri` protocol v2 verb so Git servers can advertise + bundle URIs using the key-value pairs. Plug into the existing key-value + input to the bundle list logic. Allow `git clone` to discover these + bundle URIs and bootstrap the client repository from the bundle data. + (This choice is an opt-in via a config option and a command-line + option.) + +4. Allow the client to understand the `bundle.list.forFetch` configuration + and the `bundle..timestamp` heuristic. When `git clone` discovers a + bundle URI with `bundle.list.forFetch=true`, it configures the client + repository to check that bundle URI during later `git fetch ` + commands. + +5. Allow clients to discover bundle URIs during `git fetch` and configure + a bundle URI for later fetches if `bundle.list.forFetch=true`. + +6. Implement the "inspect headers" heuristic to reduce data downloads when + the `bundle..timestamp` heuristic is not available. + +As these features are reviewed, this plan might be updated. We also expect +that new designs will be discovered and implemented as this feature +matures and becomes used in real-world scenarios. + +Related Work: Packfile URIs +--------------------------- + +The Git protocol already has a capability where the Git server can list +a set of URLs along with the packfile response when serving a client +request. The client is then expected to download the packfiles at those +locations in order to have a complete understanding of the response. + +This mechanism is used by the Gerrit server (implemented with JGit) and +has been effective at reducing CPU load and improving user performance for +clones. + +A major downside to this mechanism is that the origin server needs to know +_exactly_ what is in those packfiles, and the packfiles need to be available +to the user for some time after the server has responded. This coupling +between the origin and the packfile data is difficult to manage. + +Further, this implementation is extremely hard to make work with fetches. + +Related Work: GVFS Cache Servers +-------------------------------- + +The GVFS Protocol [2] is a set of HTTP endpoints designed independently of +the Git project before Git's partial clone was created. One feature of this +protocol is the idea of a "cache server" which can be colocated with build +machines or developer offices to transfer Git data without overloading the +central server. + +The endpoint that VFS for Git is famous for is the `GET /gvfs/objects/{oid}` +endpoint, which allows downloading an object on-demand. This is a critical +piece of the filesystem virtualization of that product. + +However, a more subtle need is the `GET /gvfs/prefetch?lastPackTimestamp=` +endpoint. Given an optional timestamp, the cache server responds with a list +of precomputed packfiles containing the commits and trees that were introduced +in those time intervals. + +The cache server computes these "prefetch" packfiles using the following +strategy: + +1. Every hour, an "hourly" pack is generated with a given timestamp. +2. Nightly, the previous 24 hourly packs are rolled up into a "daily" pack. +3. Nightly, all prefetch packs more than 30 days old are rolled up into + one pack. + +When a user runs `gvfs clone` or `scalar clone` against a repo with cache +servers, the client requests all prefetch packfiles, which is at most +`24 + 30 + 1` packfiles downloading only commits and trees. The client +then follows with a request to the origin server for the references, and +attempts to checkout that tip reference. (There is an extra endpoint that +helps get all reachable trees from a given commit, in case that commit +was not already in a prefetch packfile.) + +During a `git fetch`, a hook requests the prefetch endpoint using the +most-recent timestamp from a previously-downloaded prefetch packfile. +Only the list of packfiles with later timestamps are downloaded. Most +users fetch hourly, so they get at most one hourly prefetch pack. Users +whose machines have been off or otherwise have not fetched in over 30 days +might redownload all prefetch packfiles. This is rare. + +It is important to note that the clients always contact the origin server +for the refs advertisement, so the refs are frequently "ahead" of the +prefetched pack data. The missing objects are downloaded on-demand using +the `GET gvfs/objects/{oid}` requests, when needed by a command such as +`git checkout` or `git log`. Some Git optimizations disable checks that +would cause these on-demand downloads to be too aggressive. + +See Also +-------- + +[1] https://lore.kernel.org/git/RFC-cover-00.13-0000000000-20210805T150534Z-avarab@gmail.com/ + An earlier RFC for a bundle URI feature. + +[2] https://github.com/microsoft/VFSForGit/blob/master/Protocol.md + The GVFS Protocol From e6f4072bb5fadbee2003698316f31d8fe930a21d Mon Sep 17 00:00:00 2001 From: Derrick Stolee Date: Fri, 20 May 2022 18:40:20 +0000 Subject: [PATCH 02/24] remote-curl: add 'get' capability A future change will want a way to download a file over HTTP(S) using the simplest of download mechanisms. We do not want to assume that the server on the other side understands anything about the Git protocol but could be a simple static web server. Create the new 'get' capability for the remote helpers which advertises that the 'get' command is avalable. A caller can send a line containing 'get ' to download the file at into the file at . RFC-TODO: This change requires tests directly on the remote helper. Signed-off-by: Derrick Stolee Signed-off-by: Junio C Hamano --- Documentation/gitremote-helpers.txt | 6 ++++++ remote-curl.c | 32 +++++++++++++++++++++++++++++ t/t6021-fetch-bundle.sh | 23 +++++++++++++++++++++ transport-helper.c | 5 ++++- 4 files changed, 65 insertions(+), 1 deletion(-) create mode 100755 t/t6021-fetch-bundle.sh diff --git a/Documentation/gitremote-helpers.txt b/Documentation/gitremote-helpers.txt index 6f1e269ae43e00..f82588601a9a5f 100644 --- a/Documentation/gitremote-helpers.txt +++ b/Documentation/gitremote-helpers.txt @@ -168,6 +168,9 @@ Supported commands: 'list', 'import'. Can guarantee that when a clone is requested, the received pack is self contained and is connected. +'get':: + Can use the 'get' command to download a file from a given URI. + If a helper advertises 'connect', Git will use it if possible and fall back to another capability if the helper requests so when connecting (see the 'connect' command under COMMANDS). @@ -418,6 +421,9 @@ Supported if the helper has the "connect" capability. + Supported if the helper has the "stateless-connect" capability. +'get' :: + Downloads the file from the given `` to the given ``. + If a fatal error occurs, the program writes the error message to stderr and exits. The caller should expect that a suitable error message has been printed if the child closes the connection without diff --git a/remote-curl.c b/remote-curl.c index 67f178b1120bd9..53750d88e76de8 100644 --- a/remote-curl.c +++ b/remote-curl.c @@ -1276,6 +1276,33 @@ static void parse_fetch(struct strbuf *buf) strbuf_reset(buf); } +static void parse_get(struct strbuf *buf) +{ + struct http_get_options opts = { 0 }; + struct strbuf url = STRBUF_INIT; + struct strbuf path = STRBUF_INIT; + const char *p, *space; + + if (!skip_prefix(buf->buf, "get ", &p)) + die(_("http transport does not support %s"), buf->buf); + + space = strchr(p, ' '); + + if (!space) + die(_("protocol error: expected ' ', missing space")); + + strbuf_add(&url, p, space - p); + strbuf_addstr(&path, space + 1); + + http_get_file(url.buf, path.buf, &opts); + + strbuf_release(&url); + strbuf_release(&path); + printf("\n"); + fflush(stdout); + strbuf_reset(buf); +} + static int push_dav(int nr_spec, const char **specs) { struct child_process child = CHILD_PROCESS_INIT; @@ -1549,9 +1576,14 @@ int cmd_main(int argc, const char **argv) printf("unsupported\n"); fflush(stdout); + } else if (skip_prefix(buf.buf, "get ", &arg)) { + parse_get(&buf); + fflush(stdout); + } else if (!strcmp(buf.buf, "capabilities")) { printf("stateless-connect\n"); printf("fetch\n"); + printf("get\n"); printf("option\n"); printf("push\n"); printf("check-connectivity\n"); diff --git a/t/t6021-fetch-bundle.sh b/t/t6021-fetch-bundle.sh new file mode 100755 index 00000000000000..8ae16009ba0aee --- /dev/null +++ b/t/t6021-fetch-bundle.sh @@ -0,0 +1,23 @@ +#!/bin/sh + +test_description='test fetching files and bundles over HTTP' + +. ./test-lib.sh +. "$TEST_DIRECTORY"/lib-httpd.sh +start_httpd + +test_expect_success 'set up source repo' ' + git init src && + test_commit -C src one +' + +test_expect_success 'fail to get a file over HTTP' ' + cat >input <<-EOF && + get $HTTPD_URL/does-not-exist.txt download-failed.txt + + EOF + git remote-https "$HTTPD_URL" err && + test_path_is_missing download-failed.txt +' + +test_done diff --git a/transport-helper.c b/transport-helper.c index b4dbbabb0c2ef6..dfbeaebe40cac0 100644 --- a/transport-helper.c +++ b/transport-helper.c @@ -33,7 +33,8 @@ struct helper_data { check_connectivity : 1, no_disconnect_req : 1, no_private_update : 1, - object_format : 1; + object_format : 1, + get : 1; /* * As an optimization, the transport code may invoke fetch before @@ -210,6 +211,8 @@ static struct child_process *get_helper(struct transport *transport) data->no_private_update = 1; } else if (starts_with(capname, "object-format")) { data->object_format = 1; + } else if (!strcmp(capname, "get")) { + data->get = 1; } else if (mandatory) { die(_("unknown mandatory capability %s; this remote " "helper probably needs newer version of Git"), From 7e38468ed7f0369c9318551916fc02824c199efe Mon Sep 17 00:00:00 2001 From: Derrick Stolee Date: Fri, 20 May 2022 18:40:21 +0000 Subject: [PATCH 03/24] bundle-uri: create basic file-copy logic Before implementing a way to fetch bundles into a repository, create the basic logic. Assume that the URI is actually a file path. Future logic will make this more careful to other protocols. Signed-off-by: Derrick Stolee Signed-off-by: Junio C Hamano --- Makefile | 1 + bundle-uri.c | 92 ++++++++++++++++++++++++++++++++++++++++++++++++++++ bundle-uri.h | 14 ++++++++ 3 files changed, 107 insertions(+) create mode 100644 bundle-uri.c create mode 100644 bundle-uri.h diff --git a/Makefile b/Makefile index f8bccfab5e9c46..8f27310836d9a7 100644 --- a/Makefile +++ b/Makefile @@ -887,6 +887,7 @@ LIB_OBJS += blob.o LIB_OBJS += bloom.o LIB_OBJS += branch.o LIB_OBJS += bulk-checkin.o +LIB_OBJS += bundle-uri.o LIB_OBJS += bundle.o LIB_OBJS += cache-tree.o LIB_OBJS += cbtree.o diff --git a/bundle-uri.c b/bundle-uri.c new file mode 100644 index 00000000000000..91aba061833804 --- /dev/null +++ b/bundle-uri.c @@ -0,0 +1,92 @@ +#include "cache.h" +#include "bundle-uri.h" +#include "bundle.h" +#include "object-store.h" +#include "refs.h" +#include "run-command.h" + +static void find_temp_filename(struct strbuf *name) +{ + int fd; + /* + * Find a temporray filename that is available. This is briefly + * racy, but unlikely to collide. + */ + fd = odb_mkstemp(name, "bundles/tmp_uri_XXXXXX"); + if (fd < 0) + die(_("failed to create temporary file")); + close(fd); + unlink(name->buf); +} + +static int copy_uri_to_file(const char *uri, const char *file) +{ + /* Copy as a file */ + return copy_file(uri, file, 0444); +} + +static int unbundle_from_file(struct repository *r, const char *file) +{ + int result = 0; + int bundle_fd; + struct bundle_header header = BUNDLE_HEADER_INIT; + struct strvec extra_index_pack_args = STRVEC_INIT; + struct string_list_item *refname; + struct strbuf bundle_ref = STRBUF_INIT; + size_t bundle_prefix_len; + + if ((bundle_fd = read_bundle_header(file, &header)) < 0) + return 1; + + result = unbundle(r, &header, bundle_fd, &extra_index_pack_args); + + /* + * Convert all refs/heads/ from the bundle into refs/bundles/ + * in the local repository. + */ + strbuf_addstr(&bundle_ref, "refs/bundles/"); + bundle_prefix_len = bundle_ref.len; + + for_each_string_list_item(refname, &header.references) { + struct object_id *oid = refname->util; + struct object_id old_oid; + const char *branch_name; + int has_old; + + if (!skip_prefix(refname->string, "refs/heads/", &branch_name)) + continue; + + strbuf_setlen(&bundle_ref, bundle_prefix_len); + strbuf_addstr(&bundle_ref, branch_name); + + has_old = !read_ref(bundle_ref.buf, &old_oid); + update_ref("fetched bundle", bundle_ref.buf, oid, + has_old ? &old_oid : NULL, + REF_SKIP_OID_VERIFICATION, + UPDATE_REFS_MSG_ON_ERR); + } + + bundle_header_release(&header); + return result; +} + +int fetch_bundle_uri(struct repository *r, const char *uri) +{ + int result = 0; + struct strbuf filename = STRBUF_INIT; + + find_temp_filename(&filename); + if ((result = copy_uri_to_file(uri, filename.buf))) + goto cleanup; + + if ((result = !is_bundle(filename.buf, 0))) + goto cleanup; + + if ((result = unbundle_from_file(r, filename.buf))) + goto cleanup; + +cleanup: + unlink(filename.buf); + strbuf_release(&filename); + return result; +} \ No newline at end of file diff --git a/bundle-uri.h b/bundle-uri.h new file mode 100644 index 00000000000000..8a152f1ef142f9 --- /dev/null +++ b/bundle-uri.h @@ -0,0 +1,14 @@ +#ifndef BUNDLE_URI_H +#define BUNDLE_URI_H + +struct repository; + +/** + * Fetch data from the given 'uri' and unbundle the bundle data found + * based on that information. + * + * Returns non-zero if no bundle information is found at the given 'uri'. + */ +int fetch_bundle_uri(struct repository *r, const char *uri); + +#endif From bdc6bdbd0952b50b233b1b6ac5f52f409a2b843a Mon Sep 17 00:00:00 2001 From: Derrick Stolee Date: Fri, 20 May 2022 18:40:22 +0000 Subject: [PATCH 04/24] bundle-uri: add support for http(s):// and file:// The previous change created the logic for copying a file by name. Now, first inspect the URI for an HTTP(S) prefix and use git-remote-https as the way to download the data at that URI. Otherwise, check to see if file:// is present and modify the prefix accordingly. Signed-off-by: Derrick Stolee Signed-off-by: Junio C Hamano --- bundle-uri.c | 65 +++++++++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 64 insertions(+), 1 deletion(-) diff --git a/bundle-uri.c b/bundle-uri.c index 91aba061833804..08de7257c74759 100644 --- a/bundle-uri.c +++ b/bundle-uri.c @@ -19,10 +19,73 @@ static void find_temp_filename(struct strbuf *name) unlink(name->buf); } +static int download_https_uri_to_file(const char *uri, const char *file) +{ + int result = 0; + struct child_process cp = CHILD_PROCESS_INIT; + FILE *child_in = NULL, *child_out = NULL; + struct strbuf line = STRBUF_INIT; + int found_get = 0; + + strvec_pushl(&cp.args, "git-remote-https", "origin", uri, NULL); + cp.in = -1; + cp.out = -1; + + if (start_command(&cp)) + return 1; + + child_in = fdopen(cp.in, "w"); + if (!child_in) { + result = 1; + goto cleanup; + } + + child_out = fdopen(cp.out, "r"); + if (!child_out) { + result = 1; + goto cleanup; + } + + fprintf(child_in, "capabilities\n"); + fflush(child_in); + + while (!strbuf_getline(&line, child_out)) { + if (!line.len) + break; + if (!strcmp(line.buf, "get")) + found_get = 1; + } + strbuf_release(&line); + + if (!found_get) { + result = error(_("insufficient capabilities")); + goto cleanup; + } + + fprintf(child_in, "get %s %s\n\n", uri, file); + +cleanup: + if (child_in) + fclose(child_in); + if (finish_command(&cp)) + return 1; + if (child_out) + fclose(child_out); + return result; +} + static int copy_uri_to_file(const char *uri, const char *file) { + const char *out; + if (skip_prefix(uri, "https:", &out) || + skip_prefix(uri, "http:", &out)) + return download_https_uri_to_file(uri, file); + + if (!skip_prefix(uri, "file://", &out)) + out = uri; + /* Copy as a file */ - return copy_file(uri, file, 0444); + return !!copy_file(out, file, 0); } static int unbundle_from_file(struct repository *r, const char *file) From b4865121f3b8a3e0a78e174a159fd5867b2bf5be Mon Sep 17 00:00:00 2001 From: Derrick Stolee Date: Fri, 20 May 2022 18:40:23 +0000 Subject: [PATCH 05/24] fetch: add --bundle-uri option Teach 'git fetch' a new --bundle-uri= option which changes the mode from fetching from a remote using the Git protocol to fetching a bundle from the given . See Documentation/technical/bundle-uri.txt for more information on the design of this feature. This implementation is limited to the most basic version of the feature. We expect the content at that URI to be a bundle file, not a bundle list. Bundle lists will be implemented later. This implementation is sufficient for a bundle provider to create a single bootstrap bundle for a large repository. The user would bootstrap a repository using a sequence of Git commands, such as: 1. git init && cd 2. git fetch --bundle-uri= 3. git remote add origin 4. git fetch origin 5. git checkout FETCH_HEAD Later changes will make this seamless within a 'git clone' command, but this implementation is large enough to delay that integration. Currently, this option supports URIs using "http(s)://", "file://" or simply specifying a filename. Other protocols could be added in the future. RFC TODO: add end-to-end tests of this workflow. Signed-off-by: Derrick Stolee Signed-off-by: Junio C Hamano --- Documentation/fetch-options.txt | 5 +++++ Documentation/git-fetch.txt | 1 + builtin/fetch.c | 10 ++++++++++ 3 files changed, 16 insertions(+) diff --git a/Documentation/fetch-options.txt b/Documentation/fetch-options.txt index 622bd84768b056..09bd1feeed8779 100644 --- a/Documentation/fetch-options.txt +++ b/Documentation/fetch-options.txt @@ -317,3 +317,8 @@ endif::git-pull[] -6:: --ipv6:: Use IPv6 addresses only, ignoring IPv4 addresses. + +--bundle-uri=:: + Instead of fetching from a remote, fetch a bundle from the given + `` and unbundle the data into the local repository. The refs + in the bundle will be stored under the `refs/bundle/*` namespace. diff --git a/Documentation/git-fetch.txt b/Documentation/git-fetch.txt index e9d364669af7b9..4fd8911b3361ab 100644 --- a/Documentation/git-fetch.txt +++ b/Documentation/git-fetch.txt @@ -13,6 +13,7 @@ SYNOPSIS 'git fetch' [] 'git fetch' --multiple [] [( | )...] 'git fetch' --all [] +'git fetch' --bundle-uri= [] DESCRIPTION diff --git a/builtin/fetch.c b/builtin/fetch.c index e3791f09ed51d0..cb0d2fbe82caf0 100644 --- a/builtin/fetch.c +++ b/builtin/fetch.c @@ -29,6 +29,7 @@ #include "commit-graph.h" #include "shallow.h" #include "worktree.h" +#include "bundle-uri.h" #define FORCED_UPDATES_DELAY_WARNING_IN_MS (10 * 1000) @@ -37,6 +38,7 @@ static const char * const builtin_fetch_usage[] = { N_("git fetch [] "), N_("git fetch --multiple [] [( | )...]"), N_("git fetch --all []"), + N_("git fetch --bundle-uri= []"), NULL }; @@ -86,6 +88,7 @@ static struct string_list negotiation_tip = STRING_LIST_INIT_NODUP; static int fetch_write_commit_graph = -1; static int stdin_refspecs = 0; static int negotiate_only; +static const char *bundle_uri; static int git_fetch_config(const char *k, const char *v, void *cb) { @@ -224,6 +227,8 @@ static struct option builtin_fetch_options[] = { N_("write the commit-graph after fetching")), OPT_BOOL(0, "stdin", &stdin_refspecs, N_("accept refspecs from stdin")), + OPT_STRING(0, "bundle-uri", &bundle_uri, N_("uri"), + N_("download bundle data from the given URI instead of from a remote")), OPT_END() }; @@ -2181,6 +2186,11 @@ int cmd_fetch(int argc, const char **argv, const char *prefix) if (dry_run) write_fetch_head = 0; + if (bundle_uri) { + result = fetch_bundle_uri(the_repository, bundle_uri); + goto cleanup; + } + if (all) { if (argc == 1) die(_("fetch --all does not take a repository argument")); From 767020d569746c50e4fc451e8b7fdd63bb865ed2 Mon Sep 17 00:00:00 2001 From: Derrick Stolee Date: Fri, 20 May 2022 18:40:24 +0000 Subject: [PATCH 06/24] fetch: add 'refs/bundle/' to log.excludeDecoration When fetching from a bundle URI, the branches of that bundle are stored in a different ref namespace: refs/bundles/. This namespace is intended to assist with later 'git fetch' negotiations with a Git server, allowing the client to advertise which data it already has from a bundle URI. These references can be confusing for a user when they appear as a decoration in 'git log' output. Add "refs/bundles/" to the multi-valued log.excludeDecoration config value. This is similar to the way "refs/prefetch/" is hidden by background prefetch operations in 'git maintenance' as added by 96eaffebb (maintenance: set log.excludeDecoration durin prefetch, 2021-01-19). Signed-off-by: Derrick Stolee Signed-off-by: Junio C Hamano --- Documentation/fetch-options.txt | 3 ++- bundle-uri.c | 7 +++++++ 2 files changed, 9 insertions(+), 1 deletion(-) diff --git a/Documentation/fetch-options.txt b/Documentation/fetch-options.txt index 09bd1feeed8779..8b801bcc2f31a8 100644 --- a/Documentation/fetch-options.txt +++ b/Documentation/fetch-options.txt @@ -321,4 +321,5 @@ endif::git-pull[] --bundle-uri=:: Instead of fetching from a remote, fetch a bundle from the given `` and unbundle the data into the local repository. The refs - in the bundle will be stored under the `refs/bundle/*` namespace. + in the bundle will be stored under the hidden `refs/bundle/*` + namespace. diff --git a/bundle-uri.c b/bundle-uri.c index 08de7257c74759..0692a62071a6c2 100644 --- a/bundle-uri.c +++ b/bundle-uri.c @@ -1,6 +1,7 @@ #include "cache.h" #include "bundle-uri.h" #include "bundle.h" +#include "config.h" #include "object-store.h" #include "refs.h" #include "run-command.h" @@ -148,6 +149,12 @@ int fetch_bundle_uri(struct repository *r, const char *uri) if ((result = unbundle_from_file(r, filename.buf))) goto cleanup; + git_config_set_multivar_gently("log.excludedecoration", + "refs/bundle/", + "refs/bundle/", + CONFIG_FLAGS_FIXED_VALUE | + CONFIG_FLAGS_MULTI_REPLACE); + cleanup: unlink(filename.buf); strbuf_release(&filename); From 1951859406b28ab24b0cd78137f0c22ac50f3a26 Mon Sep 17 00:00:00 2001 From: Derrick Stolee Date: Fri, 20 May 2022 18:40:25 +0000 Subject: [PATCH 07/24] clone: add --bundle-uri option Cloning a remote repository is one of the most expensive operations in Git. The server can spend a lot of CPU time generating a pack-file for the client's request. The amount of data can clog the network for a long time, and the Git protocol is not resumable. For users with poor network connections or are located far away from the origin server, this can be especially painful. The 'git bundle fetch' command allows users to bootstrap a repository using a set of bundles. However, this would require them to use 'git init' first, followed by the 'git bundle fetch', and finally add a remote, fetch, and checkout the branch they want. Instead, integrate this workflow directly into 'git clone' with the --bundle-uri' option. If the user is aware of a bundle server, then they can tell Git to bootstrap the new repository with these bundles before fetching the remaining objects from the origin server. RFC-TODO: Document this option in git-clone.txt. RFC-TODO: I added a comment about the location of this code being necessary for the later step of auto-discovering the bundle URI from the origin server. This is probably not actually a requirement, but rather a pain point around how I implemented the feature. If a --bundle-uri option is specified, but SSH is used for the clone, then the SSH connection is left open while Git downloads bundles from another server. This is sub-optimal and should be reconsidered when fully reviewed. RFC-TODO: create tests for this option with a variety of URI types. RFC-TODO: a simple end-to-end test is available at the end of the series. Signed-off-by: Derrick Stolee Signed-off-by: Junio C Hamano --- builtin/clone.c | 22 ++++++++++++++++++++++ 1 file changed, 22 insertions(+) diff --git a/builtin/clone.c b/builtin/clone.c index 52316563795fe3..fd1ae82e57b2b0 100644 --- a/builtin/clone.c +++ b/builtin/clone.c @@ -34,6 +34,7 @@ #include "list-objects-filter-options.h" #include "hook.h" #include "bundle.h" +#include "bundle-uri.h" /* * Overall FIXMEs: @@ -77,6 +78,7 @@ static int option_filter_submodules = -1; /* unspecified */ static int config_filter_submodules = -1; /* unspecified */ static struct string_list server_options = STRING_LIST_INIT_NODUP; static int option_remote_submodules; +static const char *bundle_uri; static int recurse_submodules_cb(const struct option *opt, const char *arg, int unset) @@ -160,6 +162,8 @@ static struct option builtin_clone_options[] = { N_("any cloned submodules will use their remote-tracking branch")), OPT_BOOL(0, "sparse", &option_sparse_checkout, N_("initialize sparse-checkout file to include only files at root")), + OPT_STRING(0, "bundle-uri", &bundle_uri, + N_("uri"), N_("a URI for downloading bundles before fetching from origin remote")), OPT_END() }; @@ -1232,6 +1236,24 @@ int cmd_clone(int argc, const char **argv, const char *prefix) refs = transport_get_remote_refs(transport, &transport_ls_refs_options); + /* + * NOTE: The bundle URI download takes place after transport_get_remote_refs() + * because a later change will introduce a check for recommended features, + * which might include a recommended bundle URI. + */ + + /* + * Before fetching from the remote, download and install bundle + * data from the --bundle-uri option. + */ + if (bundle_uri) { + /* At this point, we need the_repository to match the cloned repo. */ + repo_init(the_repository, git_dir, work_tree); + if (fetch_bundle_uri(the_repository, bundle_uri)) + warning(_("failed to fetch objects from bundle URI '%s'"), + bundle_uri); + } + if (refs) mapped_refs = wanted_peer_refs(refs, &remote->fetch); From 74ef2cc069313da9372e900d176bdb2dc637804a Mon Sep 17 00:00:00 2001 From: Derrick Stolee Date: Fri, 20 May 2022 18:40:26 +0000 Subject: [PATCH 08/24] clone: --bundle-uri cannot be combined with --depth The previous change added the '--bundle-uri' option, but did not check if the --depth parameter was included. Since bundles are not compatible with shallow clones, provide an error message to the user who is attempting this combination. I am leaving this as its own change, separate from the one that implements '--bundle-uri', because this is more of an advisory for the user. There is nothing wrong with bootstrapping with bundles and then fetching a shallow clone. However, that is likely going to involve too much work for the client _and_ the server. The client will download all of this bundle information containing the full history of the repository only to ignore most of it. The server will get a shallow fetch request, but with a list of haves that might cause a more painful computation of that shallow pack-file. RFC-TODO: add a test case for this error message. Signed-off-by: Derrick Stolee Signed-off-by: Junio C Hamano --- builtin/clone.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/builtin/clone.c b/builtin/clone.c index fd1ae82e57b2b0..a9caa5dfed640b 100644 --- a/builtin/clone.c +++ b/builtin/clone.c @@ -926,6 +926,11 @@ int cmd_clone(int argc, const char **argv, const char *prefix) option_no_checkout = 1; } + if (bundle_uri) { + if (deepen) + die(_("--bundle-uri is incompatible with --depth, --shallow-since, and --shallow-exclude")); + } + repo_name = argv[0]; path = get_repo_path(repo_name, &is_bundle); From f02f1bf883785de351e8612efacd3377cb8596d8 Mon Sep 17 00:00:00 2001 From: Derrick Stolee Date: Fri, 20 May 2022 18:40:27 +0000 Subject: [PATCH 09/24] bundle-uri: create bundle_list struct and helpers It will likely be rare where a user uses a single bundle URI and expects that URI to point to a bundle. Instead, that URI will likely be a list of bundles provided in some format. Alternatively, the Git server could advertise a list of bundles. In anticipation of these two ways of advertising multiple bundles, create a data structure that represents such a list. This will be populated using a common API, but for now focus on what data can be represented. Each list contains a number of remote_bundle_info structs. These contain an 'id' that is used to uniquely identify them in the list, and also a 'uri' that contains the location of its data. Finally, there is a strbuf containing the filename used when Git downloads the contents to disk. The list itself stores these remote_bundle_info structs in a hashtable using 'id' as the key. The order of the structs in the input is considered unimportant, but future modifications to the format and these data structures will place ordering possibilities on the set. The list also has a few "global" properties, including the version (used when parsing the list) and the mode. The mode is one of these two options: 1. BUNDLE_MODE_ALL: all listed URIs are intended to be combined together. The client should download all of the advertised data to have a complete copy of the data. 2. BUNDLE_MODE_ANY: any one listed item is sufficient to have a complete copy of the data. The client can choose arbitrarily from these options. In the future, the client may use pings to find the closest URI among geodistributed replicas, or use some other heuristic information added to the format. This API is currently unused, but will soon be expanded with parsing logic and then be consumed by the bundle URI download logic. Signed-off-by: Derrick Stolee Signed-off-by: Junio C Hamano --- bundle-uri.c | 61 +++++++++++++++++++++++++++++++++++++++++++++++ bundle-uri.h | 67 ++++++++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 128 insertions(+) diff --git a/bundle-uri.c b/bundle-uri.c index 0692a62071a6c2..b9a219d3202e03 100644 --- a/bundle-uri.c +++ b/bundle-uri.c @@ -5,6 +5,67 @@ #include "object-store.h" #include "refs.h" #include "run-command.h" +#include "hashmap.h" +#include "pkt-line.h" + +static int compare_bundles(const void *hashmap_cmp_fn_data, + const struct hashmap_entry *he1, + const struct hashmap_entry *he2, + const void *id) +{ + const struct remote_bundle_info *e1 = + container_of(he1, const struct remote_bundle_info, ent); + const struct remote_bundle_info *e2 = + container_of(he2, const struct remote_bundle_info, ent); + + return strcmp(e1->id, id ? (const char *)id : e2->id); +} + +void init_bundle_list(struct bundle_list *list) +{ + memset(list, 0, sizeof(*list)); + + /* Implied defaults. */ + list->mode = BUNDLE_MODE_ALL; + list->version = 1; + + hashmap_init(&list->bundles, compare_bundles, NULL, 0); +} + +static int clear_remote_bundle_info(struct remote_bundle_info *bundle, + void *data) +{ + free(bundle->id); + free(bundle->uri); + strbuf_release(&bundle->file); + return 0; +} + +void clear_bundle_list(struct bundle_list *list) +{ + if (!list) + return; + + for_all_bundles_in_list(list, clear_remote_bundle_info, NULL); + hashmap_clear_and_free(&list->bundles, struct remote_bundle_info, ent); +} + +int for_all_bundles_in_list(struct bundle_list *list, + bundle_iterator iter, + void *data) +{ + struct remote_bundle_info *info; + struct hashmap_iter i; + + hashmap_for_each_entry(&list->bundles, &i, info, ent) { + int result = iter(info, data); + + if (result) + return result; + } + + return 0; +} static void find_temp_filename(struct strbuf *name) { diff --git a/bundle-uri.h b/bundle-uri.h index 8a152f1ef142f9..0f95197744c705 100644 --- a/bundle-uri.h +++ b/bundle-uri.h @@ -1,7 +1,74 @@ #ifndef BUNDLE_URI_H #define BUNDLE_URI_H +#include "hashmap.h" +#include "strbuf.h" + struct repository; +struct strbuf; +struct string_list; + +/** + * The remote_bundle_info struct contains information for a single bundle + * URI. This may be initialized simply by a given URI or might have + * additional metadata associated with it if the bundle was advertised by + * a bundle list. + */ +struct remote_bundle_info { + struct hashmap_entry ent; + + /** + * The 'id' is a name given to the bundle for reference + * by other bundle infos. + */ + char *id; + + /** + * The 'uri' is the location of the remote bundle so + * it can be downloaded on-demand. This will be NULL + * if there was no table of contents. + */ + char *uri; + + /** + * If the bundle has been downloaded, then 'file' is a + * filename storing its contents. Otherwise, 'file' is + * an empty string. + */ + struct strbuf file; +}; + +#define REMOTE_BUNDLE_INFO_INIT { \ + .file = STRBUF_INIT, \ +} + +enum bundle_list_mode { + BUNDLE_MODE_NONE = 0, + BUNDLE_MODE_ALL, + BUNDLE_MODE_ANY +}; + +/** + * A bundle_list contains an unordered set of remote_bundle_info structs, + * as well as information about the bundle listing, such as version and + * mode. + */ +struct bundle_list { + int version; + enum bundle_list_mode mode; + unsigned forFetch : 1; + struct hashmap bundles; +}; + +void init_bundle_list(struct bundle_list *list); +void clear_bundle_list(struct bundle_list *list); + +typedef int (*bundle_iterator)(struct remote_bundle_info *bundle, + void *data); + +int for_all_bundles_in_list(struct bundle_list *list, + bundle_iterator iter, + void *data); /** * Fetch data from the given 'uri' and unbundle the bundle data found From 7b12487b1e7082f5cb31487d1293e87c0d5b60d6 Mon Sep 17 00:00:00 2001 From: Derrick Stolee Date: Fri, 20 May 2022 18:40:28 +0000 Subject: [PATCH 10/24] bundle-uri: create base key-value pair parsing There will be two primary ways to advertise a bundle list: as a list of packet lines in Git's protocol v2 and as a config file served from a bundle URI. Both of these fundamentally use a list of key-value pairs. We will use the same set of key-value pairs across these formats. Create a new bundle_list_update() method that is currently unusued, but will be used in the next change. It inspects each key to see if it is understood and then applies it to the given bundle_list. Here are the keys that we teach Git to understand: * bundle.list.version: This value should be an integer. Git currently understands only version 1 and will ignore the list if the version is any other value. This version can be increased in the future if we need to add new keys that Git should not ignore. We can add new "heuristic" keys without incrementing the version. * bundle.list.mode: This value should be one of "all" or "any". If this mode is not understood, then Git will ignore the list. This mode indicates whether Git needs all of the bundle list items to make a complete view of the content or if any single item is sufficient. The rest of the keys use a bundle identifier "" as part of the key name, and this cannot equal "list". Keys using the same "" describe a common bundle list item. * bundle..uri: This stores the URI of the bundle item. This currently is expected to be an absolute URI, but will be relaxed to be a relative URI in the future. Signed-off-by: Derrick Stolee Signed-off-by: Junio C Hamano --- bundle-uri.c | 80 ++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 80 insertions(+) diff --git a/bundle-uri.c b/bundle-uri.c index b9a219d3202e03..f18bba7071d72f 100644 --- a/bundle-uri.c +++ b/bundle-uri.c @@ -67,6 +67,86 @@ int for_all_bundles_in_list(struct bundle_list *list, return 0; } +/** + * Given a key-value pair, update the state of the given bundle list. + * Returns 0 if the key-value pair is understood. Returns 1 if the key + * is not understood or the value is malformed. + */ +MAYBE_UNUSED +static int bundle_list_update(const char *key, const char *value, + struct bundle_list *list) +{ + const char *pkey, *dot; + struct strbuf id = STRBUF_INIT; + struct remote_bundle_info lookup = REMOTE_BUNDLE_INFO_INIT; + struct remote_bundle_info *bundle; + + if (!skip_prefix(key, "bundle.", &pkey)) + return 1; + + if (!strcmp(pkey, "list.version")) { + int version = atoi(value); + if (version != 1) + return 1; + + list->version = version; + return 0; + } + + if (!strcmp(pkey, "list.mode")) { + if (!strcmp(value, "all")) + list->mode = BUNDLE_MODE_ALL; + else if (!strcmp(value, "any")) + list->mode = BUNDLE_MODE_ANY; + else + return 1; + return 0; + } + + /* + * All remaining keys must be of the form "bundle..*" where + * != "list" + */ + + dot = strchr(pkey, '.'); + if (!dot) + return 1; + if (dot - pkey == 4 && + !strncmp(pkey, "list", 4)) + return 1; + + strbuf_add(&id, pkey, dot - pkey); + dot++; + + /* + * Check for an existing bundle with this , or create one + * if necessary. + */ + lookup.id = id.buf; + hashmap_entry_init(&lookup.ent, strhash(lookup.id)); + if (!(bundle = hashmap_get_entry(&list->bundles, &lookup, ent, NULL))) { + CALLOC_ARRAY(bundle, 1); + bundle->id = strbuf_detach(&id, NULL); + strbuf_init(&bundle->file, 0); + hashmap_entry_init(&bundle->ent, strhash(bundle->id)); + hashmap_add(&list->bundles, &bundle->ent); + } + strbuf_release(&id); + + if (!strcmp(dot, "uri")) { + free(bundle->uri); + bundle->uri = xstrdup(value); + return 0; + } + + /* + * At this point, we ignore any information that we don't + * understand, assuming it to be hints for a heuristic the client + * does not currently understand. + */ + return 0; +} + static void find_temp_filename(struct strbuf *name) { int fd; From 399333882e448bd4cdc94c206e313c20d4a86741 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?=C3=86var=20Arnfj=C3=B6r=C3=B0=20Bjarmason?= Date: Fri, 20 May 2022 18:40:29 +0000 Subject: [PATCH 11/24] bundle-uri: create "key=value" line parsing MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit When advertising a bundle list over Git's protocol v2, we will use packet lines. Each line will be of the form "key=value" representing a bundle list. Connect the API necessary for Git's transport to the key-value pair parsing created in the previous change. Co-authored-by: Derrick Stolee Signed-off-by: Ævar Arnfjörð Bjarmason Signed-off-by: Derrick Stolee Signed-off-by: Junio C Hamano --- bundle-uri.c | 29 +++++++++++++++++++++++++++-- bundle-uri.h | 14 +++++++++++++- 2 files changed, 40 insertions(+), 3 deletions(-) diff --git a/bundle-uri.c b/bundle-uri.c index f18bba7071d72f..948f974c478ea0 100644 --- a/bundle-uri.c +++ b/bundle-uri.c @@ -72,7 +72,6 @@ int for_all_bundles_in_list(struct bundle_list *list, * Returns 0 if the key-value pair is understood. Returns 1 if the key * is not understood or the value is malformed. */ -MAYBE_UNUSED static int bundle_list_update(const char *key, const char *value, struct bundle_list *list) { @@ -300,4 +299,30 @@ int fetch_bundle_uri(struct repository *r, const char *uri) unlink(filename.buf); strbuf_release(&filename); return result; -} \ No newline at end of file +} + +/** + * General API for {transport,connect}.c etc. + */ +int bundle_uri_parse_line(struct bundle_list *list, const char *line) +{ + int result; + const char *equals; + struct strbuf key = STRBUF_INIT; + + if (!strlen(line)) + return error(_("bundle-uri: got an empty line")); + + equals = strchr(line, '='); + + if (!equals) + return error(_("bundle-uri: line is not of the form 'key=value'")); + if (line == equals || !*(equals + 1)) + return error(_("bundle-uri: line has empty key or value")); + + strbuf_add(&key, line, equals - line); + result = bundle_list_update(key.buf, equals + 1, list); + strbuf_release(&key); + + return result; +} diff --git a/bundle-uri.h b/bundle-uri.h index 0f95197744c705..344d594392244a 100644 --- a/bundle-uri.h +++ b/bundle-uri.h @@ -78,4 +78,16 @@ int for_all_bundles_in_list(struct bundle_list *list, */ int fetch_bundle_uri(struct repository *r, const char *uri); -#endif +/** + * General API for {transport,connect}.c etc. + */ + +/** + * Parse a "key=value" packet line from the bundle-uri verb. + * + * Returns 0 on success and non-zero on error. + */ +int bundle_uri_parse_line(struct bundle_list *list, + const char *line); + +#endif /* BUNDLE_URI_H */ From 018753e8121d826669809aa61ec4a718e8533918 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?=C3=86var=20Arnfj=C3=B6r=C3=B0=20Bjarmason?= Date: Fri, 20 May 2022 18:40:30 +0000 Subject: [PATCH 12/24] bundle-uri: unit test "key=value" parsing MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Create a new 'test-tool bundle-uri' test helper. This helper will assist in testing logic deep in the bundle URI feature. This change introduces the 'parse-key-values' subcommand, which parses stdin as a list of lines. These are fed into bundle_uri_parse_line() to test how we construct a 'struct bundle_list' from that data. The list is then output to stdout as if the key-value pairs were a Git config file. Co-authored-by: Derrick Stolee Signed-off-by: Ævar Arnfjörð Bjarmason Signed-off-by: Derrick Stolee Signed-off-by: Junio C Hamano --- Makefile | 1 + bundle-uri.c | 34 ++++++++++++++ bundle-uri.h | 3 ++ t/helper/test-bundle-uri.c | 63 +++++++++++++++++++++++++ t/helper/test-tool.c | 1 + t/helper/test-tool.h | 1 + t/t5750-bundle-uri-parse.sh | 91 +++++++++++++++++++++++++++++++++++++ t/test-lib-functions.sh | 11 +++++ 8 files changed, 205 insertions(+) create mode 100644 t/helper/test-bundle-uri.c create mode 100755 t/t5750-bundle-uri-parse.sh diff --git a/Makefile b/Makefile index 8f27310836d9a7..c8a14793005206 100644 --- a/Makefile +++ b/Makefile @@ -706,6 +706,7 @@ PROGRAMS += $(patsubst %.o,git-%$X,$(PROGRAM_OBJS)) TEST_BUILTINS_OBJS += test-advise.o TEST_BUILTINS_OBJS += test-bitmap.o TEST_BUILTINS_OBJS += test-bloom.o +TEST_BUILTINS_OBJS += test-bundle-uri.o TEST_BUILTINS_OBJS += test-chmtime.o TEST_BUILTINS_OBJS += test-config.o TEST_BUILTINS_OBJS += test-crontab.o diff --git a/bundle-uri.c b/bundle-uri.c index 948f974c478ea0..ff2154228884a7 100644 --- a/bundle-uri.c +++ b/bundle-uri.c @@ -67,6 +67,40 @@ int for_all_bundles_in_list(struct bundle_list *list, return 0; } +static int summarize_bundle(struct remote_bundle_info *info, void *data) +{ + FILE *fp = data; + fprintf(fp, "[bundle \"%s\"]\n", info->id); + fprintf(fp, "\turi = %s\n", info->uri); + return 0; +} + +void print_bundle_list(FILE *fp, struct bundle_list *list) +{ + const char *mode; + + switch (list->mode) { + case BUNDLE_MODE_ALL: + mode = "all"; + break; + + case BUNDLE_MODE_ANY: + mode = "any"; + break; + + case BUNDLE_MODE_NONE: + default: + mode = ""; + } + + printf("[bundle \"list\"]\n"); + printf("\tversion = %d\n", list->version); + printf("\tmode = %s\n", mode); + + for_all_bundles_in_list(list, summarize_bundle, fp); +} + + /** * Given a key-value pair, update the state of the given bundle list. * Returns 0 if the key-value pair is understood. Returns 1 if the key diff --git a/bundle-uri.h b/bundle-uri.h index 344d594392244a..4a12a90bf42620 100644 --- a/bundle-uri.h +++ b/bundle-uri.h @@ -70,6 +70,9 @@ int for_all_bundles_in_list(struct bundle_list *list, bundle_iterator iter, void *data); +struct FILE; +void print_bundle_list(FILE *fp, struct bundle_list *list); + /** * Fetch data from the given 'uri' and unbundle the bundle data found * based on that information. diff --git a/t/helper/test-bundle-uri.c b/t/helper/test-bundle-uri.c new file mode 100644 index 00000000000000..5cb0c9196fa57c --- /dev/null +++ b/t/helper/test-bundle-uri.c @@ -0,0 +1,63 @@ +#include "test-tool.h" +#include "parse-options.h" +#include "bundle-uri.h" +#include "strbuf.h" +#include "string-list.h" + +static int cmd__bundle_uri_parse_key_values(int argc, const char **argv) +{ + const char *usage[] = { + "test-tool bundle-uri parse-key-values []", + NULL + }; + struct option options[] = { + OPT_END(), + }; + + argc = parse_options(argc, argv, NULL, options, usage, + PARSE_OPT_STOP_AT_NON_OPTION | + PARSE_OPT_KEEP_ARGV0); + if (argc == 1) + goto usage; + + if (!strcmp(argv[1], "parse-key-values")) + return cmd__bundle_uri_parse_key_values(argc - 1, argv + 1); + error("there is no test-tool bundle-uri tool '%s'", argv[1]); + +usage: + usage_with_options(usage, options); +} diff --git a/t/helper/test-tool.c b/t/helper/test-tool.c index 0424f7adf5d69c..bff823fbd3e213 100644 --- a/t/helper/test-tool.c +++ b/t/helper/test-tool.c @@ -17,6 +17,7 @@ static struct test_cmd cmds[] = { { "advise", cmd__advise_if_enabled }, { "bitmap", cmd__bitmap }, { "bloom", cmd__bloom }, + { "bundle-uri", cmd__bundle_uri }, { "chmtime", cmd__chmtime }, { "config", cmd__config }, { "crontab", cmd__crontab }, diff --git a/t/helper/test-tool.h b/t/helper/test-tool.h index c876e8246fbdff..eb747e07dd19f5 100644 --- a/t/helper/test-tool.h +++ b/t/helper/test-tool.h @@ -7,6 +7,7 @@ int cmd__advise_if_enabled(int argc, const char **argv); int cmd__bitmap(int argc, const char **argv); int cmd__bloom(int argc, const char **argv); +int cmd__bundle_uri(int argc, const char **argv); int cmd__chmtime(int argc, const char **argv); int cmd__config(int argc, const char **argv); int cmd__crontab(int argc, const char **argv); diff --git a/t/t5750-bundle-uri-parse.sh b/t/t5750-bundle-uri-parse.sh new file mode 100755 index 00000000000000..ebb805961256a7 --- /dev/null +++ b/t/t5750-bundle-uri-parse.sh @@ -0,0 +1,91 @@ +#!/bin/sh + +test_description="Test bundle-uri bundle_uri_parse_line()" + +TEST_NO_CREATE_REPO=1 +TEST_PASSES_SANITIZE_LEAK=true +. ./test-lib.sh + +test_expect_success 'bundle_uri_parse_line() just URIs' ' + cat >in <<-\EOF && + bundle.one.uri=http://example.com/bundle.bdl + bundle.two.uri=https://example.com/bundle.bdl + bundle.three.uri=file:///usr/share/git/bundle.bdl + EOF + + cat >expect <<-\EOF && + [bundle "list"] + version = 1 + mode = all + [bundle "one"] + uri = http://example.com/bundle.bdl + [bundle "two"] + uri = https://example.com/bundle.bdl + [bundle "three"] + uri = file:///usr/share/git/bundle.bdl + EOF + + test-tool bundle-uri parse-key-values actual 2>err && + test_must_be_empty err && + test_cmp_config_output expect actual +' + +test_expect_success 'bundle_uri_parse_line() parsing edge cases: empty key or value' ' + cat >in <<-\EOF && + =bogus-value + bogus-key= + EOF + + cat >err.expect <<-EOF && + error: bundle-uri: line has empty key or value + error: bad line: '\''=bogus-value'\'' + error: bundle-uri: line has empty key or value + error: bad line: '\''bogus-key='\'' + EOF + + cat >expect <<-\EOF && + [bundle "list"] + version = 1 + mode = all + EOF + + test_must_fail test-tool bundle-uri parse-key-values actual 2>err && + test_cmp err.expect err && + test_cmp_config_output expect actual +' + +test_expect_success 'bundle_uri_parse_line() parsing edge cases: empty lines' ' + cat >in <<-\EOF && + bundle.one.uri=http://example.com/bundle.bdl + + bundle.two.uri=https://example.com/bundle.bdl + + bundle.three.uri=file:///usr/share/git/bundle.bdl + EOF + + cat >err.expect <<-\EOF && + error: bundle-uri: got an empty line + error: bad line: '\'''\'' + error: bundle-uri: got an empty line + error: bad line: '\'''\'' + EOF + + # We fail, but try to continue parsing regardless + cat >expect <<-\EOF && + [bundle "list"] + version = 1 + mode = all + [bundle "one"] + uri = http://example.com/bundle.bdl + [bundle "two"] + uri = https://example.com/bundle.bdl + [bundle "three"] + uri = file:///usr/share/git/bundle.bdl + EOF + + test_must_fail test-tool bundle-uri parse-key-values actual 2>err && + test_cmp err.expect err && + test_cmp_config_output expect actual +' + +test_done diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh index 93c03380d44966..747739f81b74c8 100644 --- a/t/test-lib-functions.sh +++ b/t/test-lib-functions.sh @@ -1944,3 +1944,14 @@ test_is_magic_mtime () { rm -f .git/test-mtime-actual return $ret } + +# Given two filenames, parse both using 'git config --list --file' +# and compare the sorted output of those commands. Useful when +# wanting to ignore whitespace differences and sorting concerns. +test_cmp_config_output () { + git config --list --file="$1" >config-expect && + git config --list --file="$2" >config-actual && + sort config-expect >sorted-expect && + sort config-actual >sorted-actual && + test_cmp sorted-expect sorted-actual +} From a275e379733923844e7d143c92da9c749e324049 Mon Sep 17 00:00:00 2001 From: Derrick Stolee Date: Fri, 20 May 2022 18:40:31 +0000 Subject: [PATCH 13/24] bundle-uri: limit recursion depth for bundle lists The next change will start allowing us to parse bundle lists that are downloaded from a provided bundle URI. Those lists might point to other lists, which could proceed to an arbitrary depth (and even create cycles). Restructure fetch_bundle_uri() to have an internal version that has a recursion depth. Compare that to a new max_bundle_uri_depth constant that is twice as high as we expect this depth to be for any legitimate use of bundle list linking. We can consider making max_bundle_uri_depth a configurable value if there is demonstrated value in the future. Signed-off-by: Derrick Stolee Signed-off-by: Junio C Hamano --- bundle-uri.c | 21 ++++++++++++++++++++- 1 file changed, 20 insertions(+), 1 deletion(-) diff --git a/bundle-uri.c b/bundle-uri.c index ff2154228884a7..74a26ce6c00076 100644 --- a/bundle-uri.c +++ b/bundle-uri.c @@ -308,11 +308,25 @@ static int unbundle_from_file(struct repository *r, const char *file) return result; } -int fetch_bundle_uri(struct repository *r, const char *uri) +/** + * This limits the recursion on fetch_bundle_uri_internal() when following + * bundle lists. + */ +static int max_bundle_uri_depth = 4; + +static int fetch_bundle_uri_internal(struct repository *r, + const char *uri, + int depth) { int result = 0; struct strbuf filename = STRBUF_INIT; + if (depth >= max_bundle_uri_depth) { + warning(_("exceeded bundle URI recursion limit (%d)"), + max_bundle_uri_depth); + return -1; + } + find_temp_filename(&filename); if ((result = copy_uri_to_file(uri, filename.buf))) goto cleanup; @@ -335,6 +349,11 @@ int fetch_bundle_uri(struct repository *r, const char *uri) return result; } +int fetch_bundle_uri(struct repository *r, const char *uri) +{ + return fetch_bundle_uri_internal(r, uri, 0); +} + /** * General API for {transport,connect}.c etc. */ From 8d557a258bcfc0958b4691ebbff7dde7203f8d30 Mon Sep 17 00:00:00 2001 From: Derrick Stolee Date: Fri, 20 May 2022 18:40:32 +0000 Subject: [PATCH 14/24] bundle-uri: parse bundle list in config format When a bundle provider wants to operate independently from a Git remote, they want to provide a single, consistent URI that users can use in their 'git clone --bundle-uri' commands. At this point, the Git client expects that URI to be a single bundle that can be unbundled and used to bootstrap the rest of the clone from the Git server. This single bundle cannot be re-used to assist with future incremental fetches. To allow for the incremental fetch case, teach Git to understand a bundle list that could be advertised at an independent bundle URI. Such a bundle list is likely to be inspected by human readers, even if only by the bundle provider creating the list. For this reason, we can take our expected "key=value" pairs and instead format them using Git config format. Create parse_bundle_list_in_config_format() to parse a file in config format and convert that into a 'struct bundle_list' filled with its understanding of the contents. Be careful to call git_config_from_file_with_options() because the default action for git_config_from_file() is to die() on a parsing error. The current warning isn't particularly helpful if it arises to a user, but it will be made more verbose at a higher layer later. Update 'test-tool bundle-uri' to take this config file format as input. It uses a filename instead of stdin because there is no existing way to parse a FILE pointer in the config machinery. Using git_config_from_mem() is overly complicated and more likely to introduce bugs than this simpler version. I would rather have a slightly confusing test helper than complicated product code. Signed-off-by: Derrick Stolee Signed-off-by: Junio C Hamano --- bundle-uri.c | 28 +++++++++++++++++++++ bundle-uri.h | 10 ++++++++ t/helper/test-bundle-uri.c | 45 ++++++++++++++++++++++++++------- t/t5750-bundle-uri-parse.sh | 50 +++++++++++++++++++++++++++++++++++++ 4 files changed, 124 insertions(+), 9 deletions(-) diff --git a/bundle-uri.c b/bundle-uri.c index 74a26ce6c00076..ab4b6385fc0c0e 100644 --- a/bundle-uri.c +++ b/bundle-uri.c @@ -180,6 +180,34 @@ static int bundle_list_update(const char *key, const char *value, return 0; } +static int config_to_bundle_list(const char *key, const char *value, void *data) +{ + struct bundle_list *list = data; + return bundle_list_update(key, value, list); +} + +int parse_bundle_list_in_config_format(const char *uri, + const char *filename, + struct bundle_list *list) +{ + int result; + struct config_options opts = { + .error_action = CONFIG_ERROR_ERROR, + }; + + list->mode = BUNDLE_MODE_NONE; + result = git_config_from_file_with_options(config_to_bundle_list, + filename, list, + &opts); + + if (!result && list->mode == BUNDLE_MODE_NONE) { + warning(_("bundle list at '%s' has no mode"), uri); + result = 1; + } + + return result; +} + static void find_temp_filename(struct strbuf *name) { int fd; diff --git a/bundle-uri.h b/bundle-uri.h index 4a12a90bf42620..b01838703360ab 100644 --- a/bundle-uri.h +++ b/bundle-uri.h @@ -73,6 +73,16 @@ int for_all_bundles_in_list(struct bundle_list *list, struct FILE; void print_bundle_list(FILE *fp, struct bundle_list *list); +/** + * A bundle URI may point to a bundle list where the key=value + * pairs are provided in config file format. This method is + * exposed publicly for testing purposes. + */ + +int parse_bundle_list_in_config_format(const char *uri, + const char *filename, + struct bundle_list *list); + /** * Fetch data from the given 'uri' and unbundle the bundle data found * based on that information. diff --git a/t/helper/test-bundle-uri.c b/t/helper/test-bundle-uri.c index 5cb0c9196fa57c..23ce0eebca3dd9 100644 --- a/t/helper/test-bundle-uri.c +++ b/t/helper/test-bundle-uri.c @@ -4,27 +4,52 @@ #include "strbuf.h" #include "string-list.h" -static int cmd__bundle_uri_parse_key_values(int argc, const char **argv) +enum input_mode { + KEY_VALUE_PAIRS, + CONFIG_FILE, +}; + +static int cmd__bundle_uri_parse(int argc, const char **argv, enum input_mode mode) { - const char *usage[] = { + const char *key_value_usage[] = { "test-tool bundle-uri parse-key-values ", + NULL + }; struct option options[] = { OPT_END(), }; + const char **usage = key_value_usage; struct strbuf sb = STRBUF_INIT; struct bundle_list list; int err = 0; - argc = parse_options(argc, argv, NULL, options, usage, 0); - if (argc) - goto usage; + if (mode == CONFIG_FILE) + usage = config_usage; + + argc = parse_options(argc, argv, NULL, options, usage, + PARSE_OPT_STOP_AT_NON_OPTION); init_bundle_list(&list); - while (strbuf_getline(&sb, stdin) != EOF) { - if (bundle_uri_parse_line(&list, sb.buf) < 0) - err = error("bad line: '%s'", sb.buf); + + switch (mode) { + case KEY_VALUE_PAIRS: + if (argc) + goto usage; + while (strbuf_getline(&sb, stdin) != EOF) { + if (bundle_uri_parse_line(&list, sb.buf) < 0) + err = error("bad line: '%s'", sb.buf); + } + break; + + case CONFIG_FILE: + if (argc != 1) + goto usage; + err = parse_bundle_list_in_config_format("", argv[0], &list); + break; } strbuf_release(&sb); @@ -55,7 +80,9 @@ int cmd__bundle_uri(int argc, const char **argv) goto usage; if (!strcmp(argv[1], "parse-key-values")) - return cmd__bundle_uri_parse_key_values(argc - 1, argv + 1); + return cmd__bundle_uri_parse(argc - 1, argv + 1, KEY_VALUE_PAIRS); + if (!strcmp(argv[1], "parse-config")) + return cmd__bundle_uri_parse(argc - 1, argv + 1, CONFIG_FILE); error("there is no test-tool bundle-uri tool '%s'", argv[1]); usage: diff --git a/t/t5750-bundle-uri-parse.sh b/t/t5750-bundle-uri-parse.sh index ebb805961256a7..c2b7a8ce9681e5 100755 --- a/t/t5750-bundle-uri-parse.sh +++ b/t/t5750-bundle-uri-parse.sh @@ -88,4 +88,54 @@ test_expect_success 'bundle_uri_parse_line() parsing edge cases: empty lines' ' test_cmp_config_output expect actual ' +test_expect_success 'parse config format: just URIs' ' + cat >expect <<-\EOF && + [bundle "list"] + version = 1 + mode = all + [bundle "one"] + uri = http://example.com/bundle.bdl + [bundle "two"] + uri = https://example.com/bundle.bdl + [bundle "three"] + uri = file:///usr/share/git/bundle.bdl + EOF + + test-tool bundle-uri parse-config expect >actual 2>err && + test_must_be_empty err && + test_cmp_config_output expect actual +' + +test_expect_success 'parse config format edge cases: empty key or value' ' + cat >in1 <<-\EOF && + = bogus-value + EOF + + cat >err1 <<-EOF && + error: bad config line 1 in file in1 + EOF + + cat >expect <<-\EOF && + [bundle "list"] + version = 1 + mode = + EOF + + test_must_fail test-tool bundle-uri parse-config in1 >actual 2>err && + test_cmp err1 err && + test_cmp_config_output expect actual && + + cat >in2 <<-\EOF && + bogus-key = + EOF + + cat >err2 <<-EOF && + warning: bundle list at '\'''\'' has no mode + EOF + + test_must_fail test-tool bundle-uri parse-config in2 >actual 2>err && + test_cmp err2 err && + test_cmp_config_output expect actual +' + test_done From 2fba1fc42cf3a0a7b80b207b68ffb1c480b4be29 Mon Sep 17 00:00:00 2001 From: Derrick Stolee Date: Fri, 20 May 2022 18:40:33 +0000 Subject: [PATCH 15/24] bundle-uri: fetch a list of bundles When the content at a given bundle URI is not understood as a bundle (based on inspecting the initial content), then Git currently gives up and ignores that content. Independent bundle providers may want to split up the bundle content into multiple bundles, but still make them available from a single URI. Teach Git to attempt parsing the bundle URI content as a Git config file providing the key=value pairs for a bundle list. Git then looks at the mode of the list to see if ANY single bundle is sufficient or if ALL bundles are required. The content at the selected URIs are downloaded and the content is inspected again, creating a recursive process. To guard the recursion against malformed or malicious content, limit the recursion depth to a reasonable four for now. This can be converted to a configured value in the future if necessary. The value of four is twice as high as expected to be useful (a bundle list is unlikely to point to more bundle lists). RFC TODO: add tests for this feature right now. Signed-off-by: Derrick Stolee Signed-off-by: Junio C Hamano --- bundle-uri.c | 213 ++++++++++++++++++++++++++++++++++++++++++++++----- bundle-uri.h | 6 ++ 2 files changed, 200 insertions(+), 19 deletions(-) diff --git a/bundle-uri.c b/bundle-uri.c index ab4b6385fc0c0e..601e6e87822a78 100644 --- a/bundle-uri.c +++ b/bundle-uri.c @@ -35,9 +35,10 @@ void init_bundle_list(struct bundle_list *list) static int clear_remote_bundle_info(struct remote_bundle_info *bundle, void *data) { - free(bundle->id); - free(bundle->uri); + FREE_AND_NULL(bundle->id); + FREE_AND_NULL(bundle->uri); strbuf_release(&bundle->file); + bundle->unbundled = 0; return 0; } @@ -336,18 +337,102 @@ static int unbundle_from_file(struct repository *r, const char *file) return result; } +struct bundle_list_context { + struct repository *r; + struct bundle_list *list; + enum bundle_list_mode mode; + int count; + int depth; +}; + +/* + * This early definition is necessary because we use indirect recursion: + * + * While iterating through a bundle list that was downloaded as part + * of fetch_bundle_uri_internal(), iterator methods eventually call it + * again, but with depth + 1. + */ +static int fetch_bundle_uri_internal(struct repository *r, + struct remote_bundle_info *bundle, + int depth, + struct bundle_list *list); + +static int download_bundle_to_file(struct remote_bundle_info *bundle, void *data) +{ + struct bundle_list_context *ctx = data; + + if (ctx->mode == BUNDLE_MODE_ANY && ctx->count) + return 0; + + ctx->count++; + return fetch_bundle_uri_internal(ctx->r, bundle, ctx->depth + 1, ctx->list); +} + +static int download_bundle_list(struct repository *r, + struct bundle_list *local_list, + struct bundle_list *global_list, + int depth) +{ + struct bundle_list_context ctx = { + .r = r, + .list = global_list, + .depth = depth + 1, + .mode = local_list->mode, + }; + + return for_all_bundles_in_list(local_list, download_bundle_to_file, &ctx); +} + +static int fetch_bundle_list_in_config_format(struct repository *r, + struct bundle_list *global_list, + struct remote_bundle_info *bundle, + int depth) +{ + int result; + struct bundle_list list_from_bundle; + + init_bundle_list(&list_from_bundle); + + if ((result = parse_bundle_list_in_config_format(bundle->uri, + bundle->file.buf, + &list_from_bundle))) + goto cleanup; + + if (list_from_bundle.mode == BUNDLE_MODE_NONE) { + warning(_("unrecognized bundle mode from URI '%s'"), + bundle->uri); + result = -1; + goto cleanup; + } + + if ((result = download_bundle_list(r, &list_from_bundle, + global_list, depth))) + goto cleanup; + +cleanup: + clear_bundle_list(&list_from_bundle); + return result; +} + /** * This limits the recursion on fetch_bundle_uri_internal() when following * bundle lists. */ static int max_bundle_uri_depth = 4; +/** + * Recursively download all bundles advertised at the given URI + * to files. If the file is a bundle, then add it to the given + * 'list'. Otherwise, expect a bundle list and recurse on the + * URIs in that list according to the list mode (ANY or ALL). + */ static int fetch_bundle_uri_internal(struct repository *r, - const char *uri, - int depth) + struct remote_bundle_info *bundle, + int depth, + struct bundle_list *list) { int result = 0; - struct strbuf filename = STRBUF_INIT; + struct remote_bundle_info *bcopy; if (depth >= max_bundle_uri_depth) { warning(_("exceeded bundle URI recursion limit (%d)"), @@ -355,31 +440,121 @@ static int fetch_bundle_uri_internal(struct repository *r, return -1; } - find_temp_filename(&filename); - if ((result = copy_uri_to_file(uri, filename.buf))) + if (!bundle->file.len) + find_temp_filename(&bundle->file); + if ((result = copy_uri_to_file(bundle->uri, bundle->file.buf))) goto cleanup; - if ((result = !is_bundle(filename.buf, 0))) - goto cleanup; - - if ((result = unbundle_from_file(r, filename.buf))) + if ((result = !is_bundle(bundle->file.buf, 1))) { + result = fetch_bundle_list_in_config_format( + r, list, bundle, depth); goto cleanup; + } - git_config_set_multivar_gently("log.excludedecoration", - "refs/bundle/", - "refs/bundle/", - CONFIG_FLAGS_FIXED_VALUE | - CONFIG_FLAGS_MULTI_REPLACE); + /* Copy the bundle and insert it into the global list. */ + CALLOC_ARRAY(bcopy, 1); + bcopy->id = xstrdup(bundle->id); + strbuf_init(&bcopy->file, 0); + strbuf_add(&bcopy->file, bundle->file.buf, bundle->file.len); + strbuf_detach(&bundle->file, NULL); + hashmap_entry_init(&bcopy->ent, strhash(bcopy->id)); + hashmap_add(&list->bundles, &bcopy->ent); cleanup: - unlink(filename.buf); - strbuf_release(&filename); + if (bundle->file.len) + unlink(bundle->file.buf); return result; } +struct attempt_unbundle_context { + struct repository *r; + int success_count; + int failure_count; +}; + +static int attempt_unbundle(struct remote_bundle_info *info, void *data) +{ + struct attempt_unbundle_context *ctx = data; + + if (info->unbundled || !unbundle_from_file(ctx->r, info->file.buf)) { + ctx->success_count++; + info->unbundled = 1; + } else { + ctx->failure_count++; + } + + return 0; +} + +static int unbundle_all_bundles(struct repository *r, + struct bundle_list *list) +{ + int last_success_count = -1; + struct attempt_unbundle_context ctx = { + .r = r, + }; + + /* + * Iterate through all bundles looking for ones that can + * successfully unbundle. If any succeed, then perhaps another + * will succeed in the next attempt. + */ + while (last_success_count < ctx.success_count) { + last_success_count = ctx.success_count; + + ctx.success_count = 0; + ctx.failure_count = 0; + for_all_bundles_in_list(list, attempt_unbundle, &ctx); + } + + if (ctx.success_count) + git_config_set_multivar_gently("log.excludedecoration", + "refs/bundle/", + "refs/bundle/", + CONFIG_FLAGS_FIXED_VALUE | + CONFIG_FLAGS_MULTI_REPLACE); + + if (ctx.failure_count) { + warning(_("failed to unbundle %d bundles"), + ctx.failure_count); + return -1; + } + + return 0; +} + +static int unlink_bundle(struct remote_bundle_info *info, void *data) +{ + if (info->file.buf) + unlink_or_warn(info->file.buf); + return 0; +} + int fetch_bundle_uri(struct repository *r, const char *uri) { - return fetch_bundle_uri_internal(r, uri, 0); + int result; + struct bundle_list list; + struct remote_bundle_info bundle = { + .uri = xstrdup(uri), + .id = xstrdup(""), + .file = STRBUF_INIT, + }; + + init_bundle_list(&list); + + /* If a bundle is added to this global list, then it is required. */ + list.mode = BUNDLE_MODE_ALL; + + if ((result = fetch_bundle_uri_internal(r, &bundle, 0, &list))) + goto cleanup; + + result = unbundle_all_bundles(r, &list); + +cleanup: + for_all_bundles_in_list(&list, unlink_bundle, NULL); + clear_bundle_list(&list); + clear_remote_bundle_info(&bundle, NULL); + return result; } /** diff --git a/bundle-uri.h b/bundle-uri.h index b01838703360ab..adce5dc07e236a 100644 --- a/bundle-uri.h +++ b/bundle-uri.h @@ -36,6 +36,12 @@ struct remote_bundle_info { * an empty string. */ struct strbuf file; + + /** + * If the bundle has been unbundled successfully, then + * this boolean is true. + */ + unsigned unbundled:1; }; #define REMOTE_BUNDLE_INFO_INIT { \ From 7d14211d95a8748c8f1097a8a2a8776c92580754 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?=C3=86var=20Arnfj=C3=B6r=C3=B0=20Bjarmason?= Date: Fri, 20 May 2022 18:40:34 +0000 Subject: [PATCH 16/24] protocol v2: add server-side "bundle-uri" skeleton MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Add a skeleton server-side implementation of a new "bundle-uri" command to protocol v2. This will allow conforming clients to optionally seed their initial clones or incremental fetches from URLs containing "*.bundle" files created with "git bundle create". This change only performs the basic boilerplate of advertising a new protocol v2 capability. The new 'bundle-uri' capability allows a client to request a list of bundles. Right now, the server only returns a flush packet, which corresponds to an empty advertisement. The critical bit right now is that the new boolean uploadPack.adverstiseBundleURIs config value signals whether or not this capability should be advertised at all. RFC TODO: Write documentation. RFC TODO: Compare to original implementation. Co-authored-by: Derrick Stolee Signed-off-by: Ævar Arnfjörð Bjarmason Signed-off-by: Derrick Stolee Signed-off-by: Junio C Hamano --- bundle-uri.c | 36 ++++++++++++++++++++++++++++++++++++ bundle-uri.h | 7 +++++++ serve.c | 6 ++++++ t/t5701-git-serve.sh | 40 +++++++++++++++++++++++++++++++++++++++- 4 files changed, 88 insertions(+), 1 deletion(-) diff --git a/bundle-uri.c b/bundle-uri.c index 601e6e87822a78..79a6b7ed8b88d4 100644 --- a/bundle-uri.c +++ b/bundle-uri.c @@ -557,6 +557,42 @@ int fetch_bundle_uri(struct repository *r, const char *uri) return result; } +/** + * API for serve.c. + */ + +static int advertise_bundle_uri = -1; + +int bundle_uri_advertise(struct repository *r, struct strbuf *value) +{ + if (advertise_bundle_uri != -1) + goto cached; + + advertise_bundle_uri = 0; + git_config_get_maybe_bool("uploadpack.advertisebundleuris", &advertise_bundle_uri); + +cached: + return advertise_bundle_uri; +} + +int bundle_uri_command(struct repository *r, + struct packet_reader *request) +{ + struct packet_writer writer; + packet_writer_init(&writer, 1); + + while (packet_reader_read(request) == PACKET_READ_NORMAL) + die(_("bundle-uri: unexpected argument: '%s'"), request->line); + if (request->status != PACKET_READ_FLUSH) + die(_("bundle-uri: expected flush after arguments")); + + /* TODO: Implement the communication */ + + packet_writer_flush(&writer); + + return 0; +} + /** * General API for {transport,connect}.c etc. */ diff --git a/bundle-uri.h b/bundle-uri.h index adce5dc07e236a..9f6fc2d75f9fa6 100644 --- a/bundle-uri.h +++ b/bundle-uri.h @@ -4,6 +4,7 @@ #include "hashmap.h" #include "strbuf.h" +struct packet_reader; struct repository; struct strbuf; struct string_list; @@ -97,6 +98,12 @@ int parse_bundle_list_in_config_format(const char *uri, */ int fetch_bundle_uri(struct repository *r, const char *uri); +/** + * API for serve.c. + */ +int bundle_uri_advertise(struct repository *r, struct strbuf *value); +int bundle_uri_command(struct repository *r, struct packet_reader *request); + /** * General API for {transport,connect}.c etc. */ diff --git a/serve.c b/serve.c index b3fe9b5126a334..f3e0203d2c6c71 100644 --- a/serve.c +++ b/serve.c @@ -8,6 +8,7 @@ #include "protocol-caps.h" #include "serve.h" #include "upload-pack.h" +#include "bundle-uri.h" static int advertise_sid = -1; static int client_hash_algo = GIT_HASH_SHA1; @@ -136,6 +137,11 @@ static struct protocol_capability capabilities[] = { .advertise = always_advertise, .command = cap_object_info, }, + { + .name = "bundle-uri", + .advertise = bundle_uri_advertise, + .command = bundle_uri_command, + }, }; void protocol_v2_advertise_capabilities(void) diff --git a/t/t5701-git-serve.sh b/t/t5701-git-serve.sh index 1896f671cb37f9..f21e5e9d33d199 100755 --- a/t/t5701-git-serve.sh +++ b/t/t5701-git-serve.sh @@ -13,7 +13,7 @@ test_expect_success 'test capability advertisement' ' wrong_algo sha1:sha256 wrong_algo sha256:sha1 EOF - cat >expect <<-EOF && + cat >expect.base <<-EOF && version 2 agent=git/$(git version | cut -d" " -f3) ls-refs=unborn @@ -21,8 +21,11 @@ test_expect_success 'test capability advertisement' ' server-option object-format=$(test_oid algo) object-info + EOF + cat >expect.trailer <<-EOF && 0000 EOF + cat expect.base expect.trailer >expect && GIT_TEST_SIDEBAND_ALL=0 test-tool serve-v2 \ --advertise-capabilities >out && @@ -342,4 +345,39 @@ test_expect_success 'basics of object-info' ' test_cmp expect actual ' +test_expect_success 'test capability advertisement with uploadpack.advertiseBundleURIs' ' + test_config uploadpack.advertiseBundleURIs true && + + cat >expect.extra <<-EOF && + bundle-uri + EOF + cat expect.base \ + expect.extra \ + expect.trailer >expect && + + GIT_TEST_SIDEBAND_ALL=0 test-tool serve-v2 \ + --advertise-capabilities >out && + test-tool pkt-line unpack actual && + test_cmp expect actual +' + +test_expect_success 'basics of bundle-uri: dies if not enabled' ' + test-tool pkt-line pack >in <<-EOF && + command=bundle-uri + 0000 + EOF + + cat >err.expect <<-\EOF && + fatal: invalid command '"'"'bundle-uri'"'"' + EOF + + cat >expect <<-\EOF && + ERR serve: invalid command '"'"'bundle-uri'"'"' + EOF + + test_must_fail test-tool serve-v2 --stateless-rpc out 2>err.actual && + test_cmp err.expect err.actual && + test_must_be_empty out +' + test_done From edceab01e04cc07c4cad3c6c035aaaeeb1beda01 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?=C3=86var=20Arnfj=C3=B6r=C3=B0=20Bjarmason?= Date: Fri, 20 May 2022 18:40:35 +0000 Subject: [PATCH 17/24] bundle-uri client: add minimal NOOP client MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Set up all the needed client parts of the "bundle-uri" protocol extension, without actually doing anything with the bundle URIs. I.e. if the server says it supports "bundle-uri" we'll issue a command=bundle-uri after command=ls-refs when we're cloning. We'll parse the returned output using the code already tested for in t5750-bundle-uri-parse.sh. What we aren't doing is actually acting on that data, i.e. downloading the bundle(s) before we get to doing the command=fetch, and adjusting our negotiation dialog appropriately. I'll do that in subsequent commits. There's a question of what level of encapsulation we should use here, I've opted to use connect.h in clone.c, but we could also e.g. make transport_get_remote_refs() invoke this, i.e. make it implicitly get the bundle-uri list for later steps. This approach means that we don't "support" this in "git fetch" for now. I'm starting with the case of initial clones, although as noted in preceding commits to the protocol documentation nothing about this approach precludes getting bundles on incremental fetches. For the t5732-protocol-v2-bundle-uri-http.sh it's not easy to set environment variables for git-upload-pack (it's started by Apache), so let's skip the test under T5730_HTTP, and add unused T5730_{FILE,GIT} prerequisites for consistency and future use. Co-authored-by: Derrick Stolee Signed-off-by: Ævar Arnfjörð Bjarmason Signed-off-by: Derrick Stolee Signed-off-by: Junio C Hamano --- builtin/clone.c | 7 ++ bundle-uri.c | 4 + connect.c | 47 ++++++++ remote.h | 5 + t/lib-t5730-protocol-v2-bundle-uri.sh | 148 +++++++++++++++++++++++++ t/t5730-protocol-v2-bundle-uri-file.sh | 36 ++++++ t/t5731-protocol-v2-bundle-uri-git.sh | 17 +++ t/t5732-protocol-v2-bundle-uri-http.sh | 17 +++ transport-helper.c | 13 +++ transport-internal.h | 7 ++ transport.c | 55 +++++++++ transport.h | 19 ++++ 12 files changed, 375 insertions(+) create mode 100644 t/lib-t5730-protocol-v2-bundle-uri.sh create mode 100755 t/t5730-protocol-v2-bundle-uri-file.sh create mode 100755 t/t5731-protocol-v2-bundle-uri-git.sh create mode 100755 t/t5732-protocol-v2-bundle-uri-http.sh diff --git a/builtin/clone.c b/builtin/clone.c index a9caa5dfed640b..7bedd9eb29e054 100644 --- a/builtin/clone.c +++ b/builtin/clone.c @@ -27,6 +27,7 @@ #include "iterator.h" #include "sigchain.h" #include "branch.h" +#include "connect.h" #include "remote.h" #include "run-command.h" #include "connected.h" @@ -1262,6 +1263,12 @@ int cmd_clone(int argc, const char **argv, const char *prefix) if (refs) mapped_refs = wanted_peer_refs(refs, &remote->fetch); + /* + * Populate transport->got_remote_bundle_uri and + * transport->bundle_uri. We might get nothing. + */ + transport_get_remote_bundle_uri(transport); + if (mapped_refs) { int hash_algo = hash_algo_by_ptr(transport_get_hash_algo(transport)); diff --git a/bundle-uri.c b/bundle-uri.c index 79a6b7ed8b88d4..fed508cd51a919 100644 --- a/bundle-uri.c +++ b/bundle-uri.c @@ -565,6 +565,10 @@ static int advertise_bundle_uri = -1; int bundle_uri_advertise(struct repository *r, struct strbuf *value) { + if (value && + git_env_bool("GIT_TEST_BUNDLE_URI_UNKNOWN_CAPABILITY_VALUE", 0)) + strbuf_addstr(value, "test-unknown-capability-value"); + if (advertise_bundle_uri != -1) goto cached; diff --git a/connect.c b/connect.c index e6d0b1d34bd61c..00dd00cfa444ad 100644 --- a/connect.c +++ b/connect.c @@ -15,6 +15,7 @@ #include "version.h" #include "protocol.h" #include "alias.h" +#include "bundle-uri.h" static char *server_capabilities_v1; static struct strvec server_capabilities_v2 = STRVEC_INIT; @@ -491,6 +492,52 @@ static void send_capabilities(int fd_out, struct packet_reader *reader) } } +int get_remote_bundle_uri(int fd_out, struct packet_reader *reader, + struct bundle_list *bundles, int stateless_rpc) +{ + int line_nr = 1; + + /* Assert bundle-uri support */ + server_supports_v2("bundle-uri", 1); + + /* (Re-)send capabilities */ + send_capabilities(fd_out, reader); + + /* Send command */ + packet_write_fmt(fd_out, "command=bundle-uri\n"); + packet_delim(fd_out); + + /* Send options */ + if (git_env_bool("GIT_TEST_PROTOCOL_BAD_BUNDLE_URI", 0)) + packet_write_fmt(fd_out, "test-bad-client\n"); + packet_flush(fd_out); + + /* Process response from server */ + while (packet_reader_read(reader) == PACKET_READ_NORMAL) { + const char *line = reader->line; + line_nr++; + + if (!bundle_uri_parse_line(bundles, line)) + continue; + + return error(_("error on bundle-uri response line %d: %s"), + line_nr, line); + } + + if (reader->status != PACKET_READ_FLUSH) + return error(_("expected flush after bundle-uri listing")); + + /* + * Might die(), but obscure enough that that's OK, e.g. in + * serve.c we'll call BUG() on its equivalent (the + * PACKET_READ_RESPONSE_END check). + */ + check_stateless_delimiter(stateless_rpc, reader, + _("expected response end packet after ref listing")); + + return 0; +} + struct ref **get_remote_refs(int fd_out, struct packet_reader *reader, struct ref **list, int for_push, struct transport_ls_refs_options *transport_options, diff --git a/remote.h b/remote.h index dd4402436f1f2e..5f743438cd48a1 100644 --- a/remote.h +++ b/remote.h @@ -236,6 +236,11 @@ struct ref **get_remote_refs(int fd_out, struct packet_reader *reader, const struct string_list *server_options, int stateless_rpc); +/* Used for protocol v2 in order to retrieve refs from a remote */ +struct bundle_list; +int get_remote_bundle_uri(int fd_out, struct packet_reader *reader, + struct bundle_list *bundles, int stateless_rpc); + int resolve_remote_symref(struct ref *ref, struct ref *list); /* diff --git a/t/lib-t5730-protocol-v2-bundle-uri.sh b/t/lib-t5730-protocol-v2-bundle-uri.sh new file mode 100644 index 00000000000000..27294e9c9768ae --- /dev/null +++ b/t/lib-t5730-protocol-v2-bundle-uri.sh @@ -0,0 +1,148 @@ +# Included from t573*-protocol-v2-bundle-uri-*.sh + +T5730_PARENT= +T5730_URI= +T5730_BUNDLE_URI= +case "$T5730_PROTOCOL" in +file) + T5730_PARENT=file_parent + T5730_URI="file://$PWD/file_parent" + T5730_BUNDLE_URI="$T5730_URI/fake.bdl" + test_set_prereq T5730_FILE + ;; +git) + . "$TEST_DIRECTORY"/lib-git-daemon.sh + start_git_daemon --export-all --enable=receive-pack + T5730_PARENT="$GIT_DAEMON_DOCUMENT_ROOT_PATH/parent" + T5730_URI="$GIT_DAEMON_URL/parent" + T5730_BUNDLE_URI="https://example.com/fake.bdl" + test_set_prereq T5730_GIT + ;; +http) + . "$TEST_DIRECTORY"/lib-httpd.sh + start_httpd + T5730_PARENT="$HTTPD_DOCUMENT_ROOT_PATH/http_parent" + T5730_URI="$HTTPD_URL/smart/http_parent" + T5730_BUNDLE_URI="https://example.com/fake.bdl" + test_set_prereq T5730_HTTP + ;; +*) + BUG "Need to pass valid T5730_PROTOCOL (was $T5730_PROTOCOL)" + ;; +esac + +test_expect_success "setup protocol v2 $T5730_PROTOCOL:// tests" ' + git init "$T5730_PARENT" && + test_commit -C "$T5730_PARENT" one && + git -C "$T5730_PARENT" config uploadpack.advertiseBundleURIs true +' + +# Poor man's URI escaping. Good enough for the test suite whose trash +# directory has a space in it. See 93c3fcbe4d4 (git-svn: attempt to +# mimic SVN 1.7 URL canonicalization, 2012-07-28) for prior art. +test_uri_escape() { + sed 's/ /%20/g' +} + +case "$T5730_PROTOCOL" in +http) + test_expect_success "setup config for $T5730_PROTOCOL:// tests" ' + git -C "$T5730_PARENT" config http.receivepack true + ' + ;; +*) + ;; +esac +T5730_BUNDLE_URI_ESCAPED=$(echo "$T5730_BUNDLE_URI" | test_uri_escape) + +test_expect_success "connect with $T5730_PROTOCOL:// using protocol v2: no bundle-uri" ' + test_when_finished "rm -f log" && + test_when_finished "git -C \"$T5730_PARENT\" config uploadpack.advertiseBundleURIs true" && + git -C "$T5730_PARENT" config uploadpack.advertiseBundleURIs false && + + GIT_TRACE_PACKET="$PWD/log" \ + git \ + -c protocol.version=2 \ + ls-remote --symref "$T5730_URI" \ + >actual 2>err && + + # Server responded using protocol v2 + grep "< version 2" log && + + ! grep bundle-uri log +' + +test_expect_success "connect with $T5730_PROTOCOL:// using protocol v2: have bundle-uri" ' + test_when_finished "rm -f log" && + + test_config -C "$T5730_PARENT" \ + uploadpack.bundleURI "$T5730_BUNDLE_URI_ESCAPED" && + + GIT_TRACE_PACKET="$PWD/log" \ + git \ + -c protocol.version=2 \ + ls-remote --symref "$T5730_URI" \ + >actual 2>err && + + # Server responded using protocol v2 + grep "< version 2" log && + + # Server advertised bundle-uri capability + grep bundle-uri log +' + +test_expect_success !T5730_HTTP "bad client with $T5730_PROTOCOL:// using protocol v2" ' + test_when_finished "rm -f log" && + + test_config -C "$T5730_PARENT" uploadpack.bundleURI \ + "$T5730_BUNDLE_URI_ESCAPED" && + + cat >err.expect <<-\EOF && + Cloning into '"'"'child'"'"'... + EOF + case "$T5730_PROTOCOL" in + file) + cat >fatal-bundle-uri.expect <<-\EOF + fatal: bundle-uri: unexpected argument: '"'"'test-bad-client'"'"' + EOF + ;; + *) + cat >fatal.expect <<-\EOF + fatal: read error: Connection reset by peer + EOF + ;; + esac && + + test_when_finished "rm -rf child" && + test_must_fail ok=sigpipe env \ + GIT_TRACE_PACKET="$PWD/log" \ + GIT_TEST_PROTOCOL_BAD_BUNDLE_URI=true \ + git -c protocol.version=2 \ + clone "$T5730_URI" child \ + >out 2>err && + test_must_be_empty out && + + grep -v -e ^fatal: -e ^error: err >err.actual && + test_cmp err.expect err.actual && + + case "$T5730_PROTOCOL" in + file) + # Due to general race conditions with client/server replies we + # may or may not get "fatal: the remote end hung up + # expectedly" here + grep "^fatal: bundle-uri:" err >fatal-bundle-uri.actual && + test_cmp fatal-bundle-uri.expect fatal-bundle-uri.actual + ;; + *) + grep "^fatal:" err >fatal.actual && + # Due to the same race conditions this might be + # "fatal: read error: Connection reset by peer", "fatal: the remote end + # hung up unexpectedly" etc. + cat fatal.actual && + test_file_not_empty fatal.actual + ;; + esac && + + grep "clone> test-bad-client$" log >sent-bad-request && + test_file_not_empty sent-bad-request +' diff --git a/t/t5730-protocol-v2-bundle-uri-file.sh b/t/t5730-protocol-v2-bundle-uri-file.sh new file mode 100755 index 00000000000000..89203d3a23c40a --- /dev/null +++ b/t/t5730-protocol-v2-bundle-uri-file.sh @@ -0,0 +1,36 @@ +#!/bin/sh + +test_description="Test bundle-uri with protocol v2 and 'file://' transport" + +TEST_NO_CREATE_REPO=1 + +GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main +export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME + +. ./test-lib.sh + +# Test protocol v2 with 'file://' transport +# +T5730_PROTOCOL=file +. "$TEST_DIRECTORY"/lib-t5730-protocol-v2-bundle-uri.sh + +test_expect_success "unknown capability value with $T5730_PROTOCOL:// using protocol v2" ' + test_when_finished "rm -f log" && + + test_config -C "$T5730_PARENT" \ + uploadpack.bundleURI "$T5730_BUNDLE_URI_ESCAPED" && + + GIT_TRACE_PACKET="$PWD/log" \ + GIT_TEST_BUNDLE_URI_UNKNOWN_CAPABILITY_VALUE=true \ + git \ + -c protocol.version=2 \ + ls-remote --symref "$T5730_URI" \ + >actual 2>err && + + # Server responded using protocol v2 + grep "< version 2" log && + + grep "> bundle-uri=test-unknown-capability-value" log +' + +test_done diff --git a/t/t5731-protocol-v2-bundle-uri-git.sh b/t/t5731-protocol-v2-bundle-uri-git.sh new file mode 100755 index 00000000000000..282847b311fc09 --- /dev/null +++ b/t/t5731-protocol-v2-bundle-uri-git.sh @@ -0,0 +1,17 @@ +#!/bin/sh + +test_description="Test bundle-uri with protocol v2 and 'git://' transport" + +TEST_NO_CREATE_REPO=1 + +GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main +export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME + +. ./test-lib.sh + +# Test protocol v2 with 'git://' transport +# +T5730_PROTOCOL=git +. "$TEST_DIRECTORY"/lib-t5730-protocol-v2-bundle-uri.sh + +test_done diff --git a/t/t5732-protocol-v2-bundle-uri-http.sh b/t/t5732-protocol-v2-bundle-uri-http.sh new file mode 100755 index 00000000000000..fcc1cf3faef442 --- /dev/null +++ b/t/t5732-protocol-v2-bundle-uri-http.sh @@ -0,0 +1,17 @@ +#!/bin/sh + +test_description="Test bundle-uri with protocol v2 and 'git://' transport" + +TEST_NO_CREATE_REPO=1 + +GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main +export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME + +. ./test-lib.sh + +# Test protocol v2 with 'git://' transport +# +T5730_PROTOCOL=http +. "$TEST_DIRECTORY"/lib-t5730-protocol-v2-bundle-uri.sh + +test_done diff --git a/transport-helper.c b/transport-helper.c index dfbeaebe40cac0..ee7d0b09a240c8 100644 --- a/transport-helper.c +++ b/transport-helper.c @@ -1270,9 +1270,22 @@ static struct ref *get_refs_list_using_list(struct transport *transport, return ret; } +static int get_bundle_uri(struct transport *transport) +{ + get_helper(transport); + + if (process_connect(transport, 0)) { + do_take_over(transport); + return transport->vtable->get_bundle_uri(transport); + } + + return -1; +} + static struct transport_vtable vtable = { .set_option = set_helper_option, .get_refs_list = get_refs_list, + .get_bundle_uri = get_bundle_uri, .fetch_refs = fetch_refs, .push_refs = push_refs, .connect = connect_helper, diff --git a/transport-internal.h b/transport-internal.h index c4ca0b733acbdb..90ea749e5cfbf7 100644 --- a/transport-internal.h +++ b/transport-internal.h @@ -26,6 +26,13 @@ struct transport_vtable { struct ref *(*get_refs_list)(struct transport *transport, int for_push, struct transport_ls_refs_options *transport_options); + /** + * Populates the remote side's bundle-uri under protocol v2, + * if the "bundle-uri" capability was advertised. Returns 0 if + * OK, negative values on error. + */ + int (*get_bundle_uri)(struct transport *transport); + /** * Fetch the objects for the given refs. Note that this gets * an array, and should ignore the list structure. diff --git a/transport.c b/transport.c index 3d64a43ab394b1..106bb8c5577fc8 100644 --- a/transport.c +++ b/transport.c @@ -22,6 +22,7 @@ #include "protocol.h" #include "object-store.h" #include "color.h" +#include "bundle-uri.h" static int transport_use_color = -1; static char transport_colors[][COLOR_MAXLEN] = { @@ -359,6 +360,25 @@ static struct ref *get_refs_via_connect(struct transport *transport, int for_pus return handshake(transport, for_push, options, 1); } +static int get_bundle_uri(struct transport *transport) +{ + struct git_transport_data *data = transport->data; + struct packet_reader reader; + int stateless_rpc = transport->stateless_rpc; + + if (!transport->bundles) { + CALLOC_ARRAY(transport->bundles, 1); + init_bundle_list(transport->bundles); + } + + packet_reader_init(&reader, data->fd[0], NULL, 0, + PACKET_READ_CHOMP_NEWLINE | + PACKET_READ_GENTLE_ON_EOF); + + return get_remote_bundle_uri(data->fd[1], &reader, + transport->bundles, stateless_rpc); +} + static int fetch_refs_via_pack(struct transport *transport, int nr_heads, struct ref **to_fetch) { @@ -899,6 +919,7 @@ static int disconnect_git(struct transport *transport) static struct transport_vtable taken_over_vtable = { .get_refs_list = get_refs_via_connect, + .get_bundle_uri = get_bundle_uri, .fetch_refs = fetch_refs_via_pack, .push_refs = git_transport_push, .disconnect = disconnect_git @@ -1052,6 +1073,7 @@ static struct transport_vtable bundle_vtable = { static struct transport_vtable builtin_smart_vtable = { .get_refs_list = get_refs_via_connect, + .get_bundle_uri = get_bundle_uri, .fetch_refs = fetch_refs_via_pack, .push_refs = git_transport_push, .connect = connect_git, @@ -1066,6 +1088,9 @@ struct transport *transport_get(struct remote *remote, const char *url) ret->progress = isatty(2); string_list_init_dup(&ret->pack_lockfiles); + CALLOC_ARRAY(ret->bundles, 1); + init_bundle_list(ret->bundles); + if (!remote) BUG("No remote provided to transport_get()"); @@ -1473,6 +1498,34 @@ int transport_fetch_refs(struct transport *transport, struct ref *refs) return rc; } +int transport_get_remote_bundle_uri(struct transport *transport) +{ + const struct transport_vtable *vtable = transport->vtable; + + /* Check config only once. */ + if (transport->got_remote_bundle_uri++) + return 0; + + /* + * "Support" protocol v0 and v2 without bundle-uri support by + * silently degrading to a NOOP. + */ + if (!server_supports_v2("bundle-uri", 0)) + return 0; + + /* + * This is intentionally below the transport.injectBundleURI, + * we want to be able to inject into protocol v0, or into the + * dialog of a server who doesn't support this. + */ + if (!vtable->get_bundle_uri) + return error(_("bundle-uri operation not supported by protocol")); + + if (vtable->get_bundle_uri(transport) < 0) + return error(_("could not retrieve server-advertised bundle-uri list")); + return 0; +} + void transport_unlock_pack(struct transport *transport, unsigned int flags) { int in_signal_handler = !!(flags & TRANSPORT_UNLOCK_PACK_IN_SIGNAL_HANDLER); @@ -1503,6 +1556,8 @@ int transport_disconnect(struct transport *transport) ret = transport->vtable->disconnect(transport); if (transport->got_remote_refs) free_refs((void *)transport->remote_refs); + clear_bundle_list(transport->bundles); + free(transport->bundles); free(transport); return ret; } diff --git a/transport.h b/transport.h index 12bc08fc33949a..015375e083d9d9 100644 --- a/transport.h +++ b/transport.h @@ -62,6 +62,7 @@ enum transport_family { TRANSPORT_FAMILY_IPV6 }; +struct bundle_list; struct transport { const struct transport_vtable *vtable; @@ -76,6 +77,18 @@ struct transport { */ unsigned got_remote_refs : 1; + /** + * Indicates whether we already called get_bundle_uri_list(); set by + * transport.c::transport_get_remote_bundle_uri(). + */ + unsigned got_remote_bundle_uri : 1; + + /* + * The results of "command=bundle-uri", if both sides support + * the "bundle-uri" capability. + */ + struct bundle_list *bundles; + /* * Transports that call take-over destroys the data specific to * the transport type while doing so, and cannot be reused. @@ -280,6 +293,12 @@ void transport_ls_refs_options_release(struct transport_ls_refs_options *opts); const struct ref *transport_get_remote_refs(struct transport *transport, struct transport_ls_refs_options *transport_options); +/** + * Retrieve bundle URI(s) from a remote. Populates "struct + * transport"'s "bundle_uri" and "got_remote_bundle_uri". + */ +int transport_get_remote_bundle_uri(struct transport *transport); + /* * Fetch the hash algorithm used by a remote. * From 8ae7d7b7c22104c09100a873408d84ed20ff81b4 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?=C3=86var=20Arnfj=C3=B6r=C3=B0=20Bjarmason?= Date: Fri, 20 May 2022 18:40:36 +0000 Subject: [PATCH 18/24] bundle-uri client: add "git ls-remote-bundle-uri" MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit TODO: make this a test-helper instead of its own builtin? Add a git-ls-remote-bundle-uri command, this is a thin wrapper for issuing protocol v2 "bundle-uri" commands to a server, and to the parsing routines in bundle-uri.c. Since in the "git clone" case we'll have already done the handshake(), but not here, introduce a "got_advertisement" state along with "got_remote_heads". It seems to me that the "got_remote_heads" is badly named in the first place, and the whole logic of eagerly getting ls-refs on handshake() or not could be refactored somewhat, but let's not do that now, and instead just add another self-documenting state variable. Signed-off-by: Ævar Arnfjörð Bjarmason Signed-off-by: Derrick Stolee Signed-off-by: Junio C Hamano --- .gitignore | 1 + Documentation/git-ls-remote-bundle-uri.txt | 62 +++++++++ Documentation/git-ls-remote.txt | 1 + Makefile | 1 + builtin.h | 1 + builtin/clone.c | 2 +- builtin/ls-remote-bundle-uri.c | 71 ++++++++++ command-list.txt | 1 + git.c | 1 + t/lib-t5730-protocol-v2-bundle-uri.sh | 147 ++++++++++++++++++++- transport.c | 43 ++++-- transport.h | 6 +- 12 files changed, 324 insertions(+), 13 deletions(-) create mode 100644 Documentation/git-ls-remote-bundle-uri.txt create mode 100644 builtin/ls-remote-bundle-uri.c diff --git a/.gitignore b/.gitignore index e81de1063a4004..e1312b2795c4b2 100644 --- a/.gitignore +++ b/.gitignore @@ -91,6 +91,7 @@ /git-log /git-ls-files /git-ls-remote +/git-ls-remote-bundle-uri /git-ls-tree /git-mailinfo /git-mailsplit diff --git a/Documentation/git-ls-remote-bundle-uri.txt b/Documentation/git-ls-remote-bundle-uri.txt new file mode 100644 index 00000000000000..793d7677f2f7ac --- /dev/null +++ b/Documentation/git-ls-remote-bundle-uri.txt @@ -0,0 +1,62 @@ +git-ls-remote-bundle-uri(1) +=========================== + +NAME +---- +git-ls-remote-bundle-uri - List 'bundle-uri' in a remote repository + +SYNOPSIS +-------- +[verse] +'git ls-remote-bundle-uri' [-q |--quiet] [--uri] [--upload-pack=] + [[-o | --server-option=]