Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bundle URIs II: git clone --bundle-uri #1300

Closed

Conversation

derrickstolee
Copy link

@derrickstolee derrickstolee commented Jul 25, 2022

This is the second series building the bundle URI feature as discussed in the previous series that added the design document [1]. This series does not modify the design document, so the patches are independent and can be applied to the latest 'master'.

[1] https://lore.kernel.org/git/pull.1248.v3.git.1658757188.gitgitgadget@gmail.com

This series brings in just enough logic that we can bootstrap clones from a single bundle using git clone --bundle-uri=<X>.

  • Patch 1 adds a 'get' capability to 'git remote-https' which allows downloading the contents of a URI to a local file.
  • Patch 2 creates basic file-copy logic within a new bundle-uri.c file. It is not used until patch 3.
  • Patch 3 creates the git clone --bundle-uri=<X> option, allowing Git to bootstrap a clone from a bundle, but get the remaining objects from the origin URL. (As of this patch, it only accepts a filename.)
  • Patch 4 extends the git clone --bundle-uri=<X> option to allow file:// and https:// URIs.
  • Patch 5 is a CLI helper to avoid using --bundle-uri and --depth at the same time in git clone.

As outlined in [1], the next steps after this are:

  1. Allow parsing a bundle list as a config file at the given URI. The key-value format is unified with the protocol v2 verb (coming in (3)). [2]
  2. Implement the protocol v2 verb, re-using the bundle list logic from (2). Use this to auto-discover bundle URIs during 'git clone' (behind a config option). [3]
  3. Implement the 'creationToken' heuristic, allowing incremental 'git fetch' commands to download a bundle list from a configured URI, and only download bundles that are new based on the creation token values. [4]

I have prepared some of this work as pull requests on my personal fork so curious readers can look ahead to where we are going:

[2] derrickstolee#20

[3] derrickstolee#21

[4] derrickstolee#22

Note: this series includes some code pulled out of the first series [1], and in particular the git fetch --bundle-uri=<X> option is removed. The intention was to replace that with git bundle fetch <X>, but that conflicts with the work to refactor how subcommands are parsed. The git bundle fetch subcommand could be added later for maximum flexibility on the client side, but we can also move forward without it.

Updates in v3

  • The protocol matching for "http://", "https://", and "file://" now uniformly include the "//" portion and are case-sensitive.

Updates in v2

  • Several typos or small tweaks. See the range-diff for details.

Thanks,
-Stolee

cc: gitster@pobox.com
cc: me@ttaylorr.com
cc: newren@gmail.com
cc: avarab@gmail.com
cc: dyroneteng@gmail.com
cc: Johannes.Schindelin@gmx.de
cc: szeder.dev@gmail.com
cc: mjcheetham@outlook.com
cc: steadmon@google.com

@derrickstolee derrickstolee self-assigned this Jul 25, 2022
@derrickstolee derrickstolee force-pushed the bundle-redo/clone branch 2 times, most recently from 1b07cae to e823b16 Compare July 25, 2022 20:24
@derrickstolee
Copy link
Author

/submit

@gitgitgadget
Copy link

gitgitgadget bot commented Jul 25, 2022

Submitted as pull.1300.git.1658781277.gitgitgadget@gmail.com

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git/ pr-1300/derrickstolee/bundle-redo/clone-v1

To fetch this version to local tag pr-1300/derrickstolee/bundle-redo/clone-v1:

git fetch --no-tags https://github.com/gitgitgadget/git/ tag pr-1300/derrickstolee/bundle-redo/clone-v1

@gitgitgadget
Copy link

gitgitgadget bot commented Jul 25, 2022

This branch is now known as ds/bundle-uri-clone.

@gitgitgadget
Copy link

gitgitgadget bot commented Jul 25, 2022

This patch series was integrated into seen via git@6ba7107.

@gitgitgadget gitgitgadget bot added the seen label Jul 25, 2022
@gitgitgadget
Copy link

gitgitgadget bot commented Jul 27, 2022

This patch series was integrated into seen via git@952976f.

@@ -168,6 +168,9 @@ Supported commands: 'list', 'import'.
Can guarantee that when a clone is requested, the received
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Josh Steadmon wrote (reply to this):

On 2022.07.25 20:34, Derrick Stolee via GitGitGadget wrote:
> From: Derrick Stolee <derrickstolee@github.com>
> 
> A future change will want a way to download a file over HTTP(S) using
> the simplest of download mechanisms. We do not want to assume that the
> server on the other side understands anything about the Git protocol but
> could be a simple static web server.
> 
> Create the new 'get' capability for the remote helpers which advertises
> that the 'get' command is avalable. A caller can send a line containing
> 'get <url> <path>' to download the file at <url> into the file at
> <path>.
> 
> Signed-off-by: Derrick Stolee <derrickstolee@github.com>
> ---
>  Documentation/gitremote-helpers.txt |  9 +++++++
>  remote-curl.c                       | 32 +++++++++++++++++++++++++
>  t/t5557-http-get.sh                 | 37 +++++++++++++++++++++++++++++
>  3 files changed, 78 insertions(+)
>  create mode 100755 t/t5557-http-get.sh
> 
> diff --git a/Documentation/gitremote-helpers.txt b/Documentation/gitremote-helpers.txt
> index 6f1e269ae43..ed8da428c98 100644
> --- a/Documentation/gitremote-helpers.txt
> +++ b/Documentation/gitremote-helpers.txt
> @@ -168,6 +168,9 @@ Supported commands: 'list', 'import'.
>  	Can guarantee that when a clone is requested, the received
>  	pack is self contained and is connected.
>  
> +'get'::
> +	Can use the 'get' command to download a file from a given URI.
> +
>  If a helper advertises 'connect', Git will use it if possible and
>  fall back to another capability if the helper requests so when
>  connecting (see the 'connect' command under COMMANDS).
> @@ -418,6 +421,12 @@ Supported if the helper has the "connect" capability.
>  +
>  Supported if the helper has the "stateless-connect" capability.
>  
> +'get' <uri> <path>::
> +	Downloads the file from the given `<uri>` to the given `<path>`. If
> +	`<path>.temp` exists, then Git assumes that the `.temp` file is a
> +	partial download from a previous attempt and will resume the
> +	download from that position.
> +
>  If a fatal error occurs, the program writes the error message to
>  stderr and exits. The caller should expect that a suitable error
>  message has been printed if the child closes the connection without
> diff --git a/remote-curl.c b/remote-curl.c
> index b8758757ece..73fbdbddd84 100644
> --- a/remote-curl.c
> +++ b/remote-curl.c
> @@ -1286,6 +1286,33 @@ static void parse_fetch(struct strbuf *buf)
>  	strbuf_reset(buf);
>  }
>  
> +static void parse_get(struct strbuf *buf)
> +{
> +	struct strbuf url = STRBUF_INIT;
> +	struct strbuf path = STRBUF_INIT;
> +	const char *p, *space;
> +
> +	if (!skip_prefix(buf->buf, "get ", &p))
> +		die(_("http transport does not support %s"), buf->buf);

Nit: since we're already calling skip_prefix(...) in cmd_main() below,
can we just pass the suffix to parse_get() and avoid having to skip the
prefix twice?


> +
> +	space = strchr(p, ' ');
> +
> +	if (!space)
> +		die(_("protocol error: expected '<url> <path>', missing space"));
> +
> +	strbuf_add(&url, p, space - p);
> +	strbuf_addstr(&path, space + 1);
> +
> +	if (http_get_file(url.buf, path.buf, NULL))
> +		die(_("failed to download file at URL '%s'"), url.buf);
> +
> +	strbuf_release(&url);
> +	strbuf_release(&path);
> +	printf("\n");
> +	fflush(stdout);
> +	strbuf_reset(buf);
> +}
> +
>  static int push_dav(int nr_spec, const char **specs)
>  {
>  	struct child_process child = CHILD_PROCESS_INIT;
> @@ -1564,9 +1591,14 @@ int cmd_main(int argc, const char **argv)
>  				printf("unsupported\n");
>  			fflush(stdout);
>  
> +		} else if (skip_prefix(buf.buf, "get ", &arg)) {
> +			parse_get(&buf);
> +			fflush(stdout);
> +
>  		} else if (!strcmp(buf.buf, "capabilities")) {
>  			printf("stateless-connect\n");
>  			printf("fetch\n");
> +			printf("get\n");
>  			printf("option\n");
>  			printf("push\n");
>  			printf("check-connectivity\n");
> diff --git a/t/t5557-http-get.sh b/t/t5557-http-get.sh
> new file mode 100755
> index 00000000000..50b7dbcf957
> --- /dev/null
> +++ b/t/t5557-http-get.sh
> @@ -0,0 +1,37 @@
> +#!/bin/sh
> +
> +test_description='test downloading a file by URL'
> +
> +. ./test-lib.sh
> +
> +. "$TEST_DIRECTORY"/lib-httpd.sh
> +start_httpd
> +
> +test_expect_success 'get by URL: 404' '
> +	url="$HTTPD_URL/none.txt" &&
> +	cat >input <<-EOF &&
> +	capabilities
> +	get $url file1
> +	EOF
> +
> +	test_must_fail git remote-http $url <input 2>err &&
> +	test_path_is_missing file1 &&
> +	grep "failed to download file at URL" err &&
> +	rm file1.temp
> +'
> +
> +test_expect_success 'get by URL: 200' '
> +	echo data >"$HTTPD_DOCUMENT_ROOT_PATH/exists.txt" &&
> +
> +	url="$HTTPD_URL/exists.txt" &&
> +	cat >input <<-EOF &&
> +	capabilities
> +	get $url file2
> +
> +	EOF
> +
> +	GIT_TRACE2_PERF=1 git remote-http $url <input &&
> +	test_cmp "$HTTPD_DOCUMENT_ROOT_PATH/exists.txt" file2
> +'
> +
> +test_done
> -- 
> gitgitgadget
> 

@@ -906,6 +906,7 @@ LIB_OBJS += blob.o
LIB_OBJS += bloom.o
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Josh Steadmon wrote (reply to this):

On 2022.07.25 20:34, Derrick Stolee via GitGitGadget wrote:
> From: Derrick Stolee <derrickstolee@github.com>
> 
> Before implementing a way to fetch bundles into a repository, create the
> basic logic. Assume that the URI is actually a file path. Future logic
> will make this more careful to other protocols.
> 
> For now, we also only succeed if the content at the URI is a bundle
> file, not a bundle list. Bundle lists will be implemented in a future
> change.
> 
> Note that the discovery of a temporary filename is slightly racy because
> the odb_mkstemp() relies on the temporary file not existing. With the
> current implementation being limited to file copies, we could replace
> the copy_file() with copy_fd(). The tricky part comes in future changes
> that send the filename to 'git remote-https' and its 'get' capability.

> At that point, we need the file descriptor closed _and_ the file
> unlinked.

Ahh, it looks like this was the point I missed in my previous review.
IIUC, we need the file unlinked because http_get_file() will eventually
call finalize_object_file() to move a tempfile to the final object name,
and that will fail if we have an empty file already in place.


> If we were to keep the file descriptor open for the sake of
> normal file copies, then we would pollute the rest of the code for
> little benefit. This is especially the case because we expect that most
> bundle URI use will be based on HTTPS instead of file copies.
> 
> Signed-off-by: Derrick Stolee <derrickstolee@github.com>
> ---
>  Makefile     |   1 +
>  bundle-uri.c | 104 +++++++++++++++++++++++++++++++++++++++++++++++++++
>  bundle-uri.h |  14 +++++++
>  3 files changed, 119 insertions(+)
>  create mode 100644 bundle-uri.c
>  create mode 100644 bundle-uri.h
> 
> diff --git a/Makefile b/Makefile
> index 1624471badc..7d5f48069ea 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -906,6 +906,7 @@ LIB_OBJS += blob.o
>  LIB_OBJS += bloom.o
>  LIB_OBJS += branch.o
>  LIB_OBJS += bulk-checkin.o
> +LIB_OBJS += bundle-uri.o
>  LIB_OBJS += bundle.o
>  LIB_OBJS += cache-tree.o
>  LIB_OBJS += cbtree.o
> diff --git a/bundle-uri.c b/bundle-uri.c
> new file mode 100644
> index 00000000000..b35babc36aa
> --- /dev/null
> +++ b/bundle-uri.c
> @@ -0,0 +1,104 @@
> +#include "cache.h"
> +#include "bundle-uri.h"
> +#include "bundle.h"
> +#include "object-store.h"
> +#include "refs.h"
> +#include "run-command.h"
> +
> +static int find_temp_filename(struct strbuf *name)
> +{
> +	int fd;
> +	/*
> +	 * Find a temporary filename that is available. This is briefly
> +	 * racy, but unlikely to collide.
> +	 */
> +	fd = odb_mkstemp(name, "bundles/tmp_uri_XXXXXX");
> +	if (fd < 0) {
> +		warning(_("failed to create temporary file"));
> +		return -1;
> +	}
> +
> +	close(fd);
> +	unlink(name->buf);
> +	return 0;
> +}
> +
> +static int copy_uri_to_file(const char *file, const char *uri)
> +{
> +	/* File-based URIs only for now. */
> +	return copy_file(file, uri, 0);
> +}
> +
> +static int unbundle_from_file(struct repository *r, const char *file)
> +{
> +	int result = 0;
> +	int bundle_fd;
> +	struct bundle_header header = BUNDLE_HEADER_INIT;
> +	struct string_list_item *refname;
> +	struct strbuf bundle_ref = STRBUF_INIT;
> +	size_t bundle_prefix_len;
> +
> +	if ((bundle_fd = read_bundle_header(file, &header)) < 0)
> +		return 1;
> +
> +	if ((result = unbundle(r, &header, bundle_fd, NULL)))
> +		return 1;
> +
> +	/*
> +	 * Convert all refs/heads/ from the bundle into refs/bundles/
> +	 * in the local repository.
> +	 */
> +	strbuf_addstr(&bundle_ref, "refs/bundles/");
> +	bundle_prefix_len = bundle_ref.len;
> +
> +	for_each_string_list_item(refname, &header.references) {
> +		struct object_id *oid = refname->util;
> +		struct object_id old_oid;
> +		const char *branch_name;
> +		int has_old;
> +
> +		if (!skip_prefix(refname->string, "refs/heads/", &branch_name))
> +			continue;
> +
> +		strbuf_setlen(&bundle_ref, bundle_prefix_len);
> +		strbuf_addstr(&bundle_ref, branch_name);
> +
> +		has_old = !read_ref(bundle_ref.buf, &old_oid);
> +		update_ref("fetched bundle", bundle_ref.buf, oid,
> +			   has_old ? &old_oid : NULL,
> +			   REF_SKIP_OID_VERIFICATION,
> +			   UPDATE_REFS_MSG_ON_ERR);
> +	}
> +
> +	bundle_header_release(&header);

We still also need to release bundle_ref here, right?


> +	return result;
> +}
> +
> +int fetch_bundle_uri(struct repository *r, const char *uri)
> +{
> +	int result = 0;
> +	struct strbuf filename = STRBUF_INIT;
> +
> +	if ((result = find_temp_filename(&filename)))
> +		goto cleanup;
> +
> +	if ((result = copy_uri_to_file(filename.buf, uri))) {
> +		warning(_("failed to download bundle from URI '%s'"), uri);
> +		goto cleanup;
> +	}
> +
> +	if ((result = !is_bundle(filename.buf, 0))) {
> +		warning(_("file at URI '%s' is not a bundle"), uri);
> +		goto cleanup;
> +	}
> +
> +	if ((result = unbundle_from_file(r, filename.buf))) {
> +		warning(_("failed to unbundle bundle from URI '%s'"), uri);
> +		goto cleanup;
> +	}
> +
> +cleanup:
> +	unlink(filename.buf);
> +	strbuf_release(&filename);
> +	return result;
> +}
> diff --git a/bundle-uri.h b/bundle-uri.h
> new file mode 100644
> index 00000000000..8a152f1ef14
> --- /dev/null
> +++ b/bundle-uri.h
> @@ -0,0 +1,14 @@
> +#ifndef BUNDLE_URI_H
> +#define BUNDLE_URI_H
> +
> +struct repository;
> +
> +/**
> + * Fetch data from the given 'uri' and unbundle the bundle data found
> + * based on that information.
> + *
> + * Returns non-zero if no bundle information is found at the given 'uri'.
> + */
> +int fetch_bundle_uri(struct repository *r, const char *uri);
> +
> +#endif
> -- 
> gitgitgadget
> 

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 7/27/2022 6:09 PM, Josh Steadmon wrote:
> On 2022.07.25 20:34, Derrick Stolee via GitGitGadget wrote:
>> From: Derrick Stolee <derrickstolee@github.com>
>>
>> Before implementing a way to fetch bundles into a repository, create the
>> basic logic. Assume that the URI is actually a file path. Future logic
>> will make this more careful to other protocols.
>>
>> For now, we also only succeed if the content at the URI is a bundle
>> file, not a bundle list. Bundle lists will be implemented in a future
>> change.
>>
>> Note that the discovery of a temporary filename is slightly racy because
>> the odb_mkstemp() relies on the temporary file not existing. With the
>> current implementation being limited to file copies, we could replace
>> the copy_file() with copy_fd(). The tricky part comes in future changes
>> that send the filename to 'git remote-https' and its 'get' capability.
> 
>> At that point, we need the file descriptor closed _and_ the file
>> unlinked.
> 
> Ahh, it looks like this was the point I missed in my previous review.
> IIUC, we need the file unlinked because http_get_file() will eventually
> call finalize_object_file() to move a tempfile to the final object name,
> and that will fail if we have an empty file already in place.

Yes, and I also was not sure what would happen if the empty file existed.
I tested it and thought allowing overwriting an existing file would be a
bigger problem than this choice of a filename.

We also discussed options about how it would be nice to have a predictable
filename so we could resume downloads _across Git process failures_
instead of just a network failure within a single Git process. This is
something to explore when creating that functionality.

Thanks,
-Stolee

@@ -0,0 +1,168 @@
#include "cache.h"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Josh Steadmon wrote (reply to this):

On 2022.07.25 20:34, Derrick Stolee via GitGitGadget wrote:
> From: Derrick Stolee <derrickstolee@github.com>
> 
> The previous change created the 'git clone --bundle-uri=<uri>' option.
> Currently, <uri> must be a filename.
> 
> Update copy_uri_to_file() to first inspect the URI for an HTTP(S) prefix
> and use git-remote-https as the way to download the data at that URI.
> Otherwise, check to see if file:// is present and modify the prefix
> accordingly.
> 
> Signed-off-by: Derrick Stolee <derrickstolee@github.com>
> ---
>  bundle-uri.c                | 70 +++++++++++++++++++++++++++++++++++--
>  t/t5558-clone-bundle-uri.sh | 45 ++++++++++++++++++++++++
>  2 files changed, 112 insertions(+), 3 deletions(-)
> 
> diff --git a/bundle-uri.c b/bundle-uri.c
> index b35babc36aa..7086382b186 100644
> --- a/bundle-uri.c
> +++ b/bundle-uri.c
> @@ -23,10 +23,74 @@ static int find_temp_filename(struct strbuf *name)
>  	return 0;
>  }
>  
> -static int copy_uri_to_file(const char *file, const char *uri)
> +static int download_https_uri_to_file(const char *file, const char *uri)
>  {
> -	/* File-based URIs only for now. */
> -	return copy_file(file, uri, 0);
> +	int result = 0;
> +	struct child_process cp = CHILD_PROCESS_INIT;
> +	FILE *child_in = NULL, *child_out = NULL;
> +	struct strbuf line = STRBUF_INIT;
> +	int found_get = 0;
> +
> +	strvec_pushl(&cp.args, "git-remote-https", "origin", uri, NULL);
> +	cp.in = -1;
> +	cp.out = -1;
> +
> +	if (start_command(&cp))
> +		return 1;
> +
> +	child_in = fdopen(cp.in, "w");
> +	if (!child_in) {
> +		result = 1;
> +		goto cleanup;
> +	}
> +
> +	child_out = fdopen(cp.out, "r");
> +	if (!child_out) {
> +		result = 1;
> +		goto cleanup;
> +	}
> +
> +	fprintf(child_in, "capabilities\n");
> +	fflush(child_in);
> +
> +	while (!strbuf_getline(&line, child_out)) {
> +		if (!line.len)
> +			break;
> +		if (!strcmp(line.buf, "get"))
> +			found_get = 1;
> +	}
> +	strbuf_release(&line);
> +
> +	if (!found_get) {
> +		result = error(_("insufficient capabilities"));
> +		goto cleanup;
> +	}
> +
> +	fprintf(child_in, "get %s %s\n\n", uri, file);
> +
> +cleanup:
> +	if (child_in)
> +		fclose(child_in);
> +	if (finish_command(&cp))
> +		return 1;
> +	if (child_out)
> +		fclose(child_out);
> +	return result;
> +}
> +
> +static int copy_uri_to_file(const char *filename, const char *uri)
> +{
> +	const char *out;
> +
> +	if (skip_prefix(uri, "https:", &out) ||
> +	    skip_prefix(uri, "http:", &out))
> +		return download_https_uri_to_file(filename, uri);

Since we're not using "out" here, should we replace the skip_prefix()s
with starts_with(), or perhaps even istarts_with()?


> +	if (!skip_prefix(uri, "file://", &out))
> +		out = uri;
> +
> +	/* Copy as a file */
> +	return copy_file(filename, out, 0);
>  }
>  
>  static int unbundle_from_file(struct repository *r, const char *file)
> diff --git a/t/t5558-clone-bundle-uri.sh b/t/t5558-clone-bundle-uri.sh
> index f709bcb729c..ad666a2d28a 100755
> --- a/t/t5558-clone-bundle-uri.sh
> +++ b/t/t5558-clone-bundle-uri.sh
> @@ -33,4 +33,49 @@ test_expect_success 'clone with path bundle' '
>  	test_cmp expect actual
>  '
>  
> +test_expect_success 'clone with file:// bundle' '
> +	git clone --bundle-uri="file://$(pwd)/clone-from/B.bundle" \
> +		clone-from clone-file &&
> +	git -C clone-file rev-parse refs/bundles/topic >actual &&
> +	git -C clone-from rev-parse topic >expect &&
> +	test_cmp expect actual
> +'
> +
> +#########################################################################
> +# HTTP tests begin here
> +
> +. "$TEST_DIRECTORY"/lib-httpd.sh
> +start_httpd
> +
> +test_expect_success 'fail to fetch from non-existent HTTP URL' '
> +	test_when_finished rm -rf test &&
> +	git clone --bundle-uri="$HTTPD_URL/does-not-exist" . test 2>err &&
> +	grep "failed to download bundle from URI" err
> +'
> +
> +test_expect_success 'fail to fetch from non-bundle HTTP URL' '
> +	test_when_finished rm -rf test &&
> +	echo bogus >"$HTTPD_DOCUMENT_ROOT_PATH/bogus" &&
> +	git clone --bundle-uri="$HTTPD_URL/bogus" . test 2>err &&
> +	grep "is not a bundle" err
> +'
> +
> +test_expect_success 'clone HTTP bundle' '
> +	cp clone-from/B.bundle "$HTTPD_DOCUMENT_ROOT_PATH/B.bundle" &&
> +
> +	git clone --no-local --mirror clone-from \
> +		"$HTTPD_DOCUMENT_ROOT_PATH/fetch.git" &&
> +
> +	git clone --bundle-uri="$HTTPD_URL/B.bundle" \
> +		"$HTTPD_URL/smart/fetch.git" clone-http &&
> +	git -C clone-http rev-parse refs/bundles/topic >actual &&
> +	git -C clone-from rev-parse topic >expect &&
> +	test_cmp expect actual &&
> +
> +	test_config -C clone-http log.excludedecoration refs/bundle/
> +'
> +
> +# Do not add tests here unless they use the HTTP server, as they will
> +# not run unless the HTTP dependencies exist.
> +
>  test_done
> -- 
> gitgitgadget
> 

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 7/27/2022 6:09 PM, Josh Steadmon wrote:
> On 2022.07.25 20:34, Derrick Stolee via GitGitGadget wrote:

>> +static int copy_uri_to_file(const char *filename, const char *uri)
>> +{
>> +	const char *out;
>> +
>> +	if (skip_prefix(uri, "https:", &out) ||
>> +	    skip_prefix(uri, "http:", &out))
>> +		return download_https_uri_to_file(filename, uri);
> 
> Since we're not using "out" here, should we replace the skip_prefix()s
> with starts_with(), or perhaps even istarts_with()?

Sounds good to me. Can't abandon "out" because it is used
for the "file://" prefix, but istarts_with() is better here.

Thanks,
-Stolee

@gitgitgadget
Copy link

gitgitgadget bot commented Jul 27, 2022

This patch series was integrated into seen via git@8adee93.

@gitgitgadget
Copy link

gitgitgadget bot commented Jul 27, 2022

On the Git mailing list, Josh Steadmon wrote (reply to this):

On 2022.07.25 20:34, Derrick Stolee via GitGitGadget wrote:
> This is the second series building the bundle URI feature as discussed in
> the previous series that added the design document [1]. This series does not
> modify the design document, so the patches are independent and can be
> applied to the latest 'master'.
> 
> [1]
> https://lore.kernel.org/git/pull.1248.v3.git.1658757188.gitgitgadget@gmail.com
> 
> This series brings in just enough logic that we can bootstrap clones from a
> single bundle using git clone --bundle-uri=<X>.
> 
>  * Patch 1 adds a 'get' capability to 'git remote-https' which allows
>    downloading the contents of a URI to a local file.
>  * Patch 2 creates basic file-copy logic within a new bundle-uri.c file. It
>    is not used until patch 3.
>  * Patch 3 creates the git clone --bundle-uri=<X> option, allowing Git to
>    bootstrap a clone from a bundle, but get the remaining objects from the
>    origin URL. (As of this patch, it only accepts a filename.)
>  * Patch 4 extends the git clone --bundle-uri=<X> option to allow file://
>    and https:// URIs.
>  * Patch 5 is a CLI helper to avoid using --bundle-uri and --depth at the
>    same time in git clone.
> 
> As outlined in [1], the next steps after this are:
> 
>  1. Allow parsing a bundle list as a config file at the given URI. The
>     key-value format is unified with the protocol v2 verb (coming in (3)).
>     [2]
>  2. Implement the protocol v2 verb, re-using the bundle list logic from (2).
>     Use this to auto-discover bundle URIs during 'git clone' (behind a
>     config option). [3]
>  3. Implement the 'creationToken' heuristic, allowing incremental 'git
>     fetch' commands to download a bundle list from a configured URI, and
>     only download bundles that are new based on the creation token values.
>     [4]
> 
> I have prepared some of this work as pull requests on my personal fork so
> curious readers can look ahead to where we are going:
> 
> [2] https://github.com/derrickstolee/git/pull/20
> 
> [3] https://github.com/derrickstolee/git/pull/21
> 
> [4] https://github.com/derrickstolee/git/pull/22
> 
> Note: this series includes some code pulled out of the first series [1], and
> in particular the git fetch --bundle-uri=<X> option is removed. The
> intention was to replace that with git bundle fetch <X>, but that conflicts
> with the work to refactor how subcommands are parsed. The git bundle fetch
> subcommand could be added later for maximum flexibility on the client side,
> but we can also move forward without it.
> 
> Thanks, -Stolee
> 
> P.S. Here is the range-diff compared to v2 of the previous bundle URI
> series:
> 
> 1:  d444042dc4dcc < -:  ------------- docs: document bundle URI standard
> 2:  0a2cf60437f78 ! 1:  40808e92afb7b remote-curl: add 'get' capability
>     @@ remote-curl.c: static void parse_fetch(struct strbuf *buf)
>       
>      +static void parse_get(struct strbuf *buf)
>      +{
>     -+	struct http_get_options opts = { 0 };
>      +	struct strbuf url = STRBUF_INIT;
>      +	struct strbuf path = STRBUF_INIT;
>      +	const char *p, *space;
>     @@ remote-curl.c: static void parse_fetch(struct strbuf *buf)
>      +	strbuf_add(&url, p, space - p);
>      +	strbuf_addstr(&path, space + 1);
>      +
>     -+	if (http_get_file(url.buf, path.buf, &opts))
>     ++	if (http_get_file(url.buf, path.buf, NULL))
>      +		die(_("failed to download file at URL '%s'"), url.buf);
>      +
>      +	strbuf_release(&url);
>     @@ t/t5557-http-get.sh (new)
>      +	get $url file1
>      +	EOF
>      +
>     -+	test_must_fail git remote-http $url $url <input 2>err &&
>     ++	test_must_fail git remote-http $url <input 2>err &&
>      +	test_path_is_missing file1 &&
>      +	grep "failed to download file at URL" err &&
>      +	rm file1.temp
>     @@ t/t5557-http-get.sh (new)
>      +
>      +	EOF
>      +
>     -+	GIT_TRACE2_PERF=1 git remote-http $url $url <input &&
>     ++	GIT_TRACE2_PERF=1 git remote-http $url <input &&
>      +	test_cmp "$HTTPD_DOCUMENT_ROOT_PATH/exists.txt" file2
>      +'
>      +
>      +test_done
>     -
>     - ## transport-helper.c ##
>     -@@ transport-helper.c: struct helper_data {
>     - 		check_connectivity : 1,
>     - 		no_disconnect_req : 1,
>     - 		no_private_update : 1,
>     --		object_format : 1;
>     -+		object_format : 1,
>     -+		get : 1;
>     - 
>     - 	/*
>     - 	 * As an optimization, the transport code may invoke fetch before
>     -@@ transport-helper.c: static struct child_process *get_helper(struct transport *transport)
>     - 			data->no_private_update = 1;
>     - 		} else if (starts_with(capname, "object-format")) {
>     - 			data->object_format = 1;
>     -+		} else if (!strcmp(capname, "get")) {
>     -+			data->get = 1;
>     - 		} else if (mandatory) {
>     - 			die(_("unknown mandatory capability %s; this remote "
>     - 			      "helper probably needs newer version of Git"),
> 3:  abec47564fd9c ! 2:  7d3159f0d9a29 bundle-uri: create basic file-copy logic
>     @@ Commit message
>          file, not a bundle list. Bundle lists will be implemented in a future
>          change.
>      
>     +    Note that the discovery of a temporary filename is slightly racy because
>     +    the odb_mkstemp() relies on the temporary file not existing. With the
>     +    current implementation being limited to file copies, we could replace
>     +    the copy_file() with copy_fd(). The tricky part comes in future changes
>     +    that send the filename to 'git remote-https' and its 'get' capability.
>     +    At that point, we need the file descriptor closed _and_ the file
>     +    unlinked. If we were to keep the file descriptor open for the sake of
>     +    normal file copies, then we would pollute the rest of the code for
>     +    little benefit. This is especially the case because we expect that most
>     +    bundle URI use will be based on HTTPS instead of file copies.
>     +
>          Signed-off-by: Derrick Stolee <derrickstolee@github.com>
>      
>       ## Makefile ##
>     @@ bundle-uri.c (new)
>      +#include "refs.h"
>      +#include "run-command.h"
>      +
>     -+static void find_temp_filename(struct strbuf *name)
>     ++static int find_temp_filename(struct strbuf *name)
>      +{
>      +	int fd;
>      +	/*
>     @@ bundle-uri.c (new)
>      +	 * racy, but unlikely to collide.
>      +	 */
>      +	fd = odb_mkstemp(name, "bundles/tmp_uri_XXXXXX");
>     -+	if (fd < 0)
>     -+		die(_("failed to create temporary file"));
>     ++	if (fd < 0) {
>     ++		warning(_("failed to create temporary file"));
>     ++		return -1;
>     ++	}
>     ++
>      +	close(fd);
>      +	unlink(name->buf);
>     ++	return 0;
>      +}
>      +
>     -+static int copy_uri_to_file(const char *uri, const char *file)
>     ++static int copy_uri_to_file(const char *file, const char *uri)
>      +{
>     -+	/* Copy as a file */
>     -+	return copy_file(file, uri, 0444);
>     ++	/* File-based URIs only for now. */
>     ++	return copy_file(file, uri, 0);
>      +}
>      +
>      +static int unbundle_from_file(struct repository *r, const char *file)
>     @@ bundle-uri.c (new)
>      +	int result = 0;
>      +	int bundle_fd;
>      +	struct bundle_header header = BUNDLE_HEADER_INIT;
>     -+	struct strvec extra_index_pack_args = STRVEC_INIT;
>      +	struct string_list_item *refname;
>      +	struct strbuf bundle_ref = STRBUF_INIT;
>      +	size_t bundle_prefix_len;
>     @@ bundle-uri.c (new)
>      +	if ((bundle_fd = read_bundle_header(file, &header)) < 0)
>      +		return 1;
>      +
>     -+	if ((result = unbundle(r, &header, bundle_fd, &extra_index_pack_args)))
>     ++	if ((result = unbundle(r, &header, bundle_fd, NULL)))
>      +		return 1;
>      +
>      +	/*
>     @@ bundle-uri.c (new)
>      +	int result = 0;
>      +	struct strbuf filename = STRBUF_INIT;
>      +
>     -+	find_temp_filename(&filename);
>     -+	if ((result = copy_uri_to_file(uri, filename.buf)))
>     ++	if ((result = find_temp_filename(&filename)))
>     ++		goto cleanup;
>     ++
>     ++	if ((result = copy_uri_to_file(filename.buf, uri))) {
>     ++		warning(_("failed to download bundle from URI '%s'"), uri);
>      +		goto cleanup;
>     ++	}
>      +
>     -+	if ((result = !is_bundle(filename.buf, 0)))
>     ++	if ((result = !is_bundle(filename.buf, 0))) {
>     ++		warning(_("file at URI '%s' is not a bundle"), uri);
>      +		goto cleanup;
>     ++	}
>      +
>     -+	if ((result = unbundle_from_file(r, filename.buf)))
>     ++	if ((result = unbundle_from_file(r, filename.buf))) {
>     ++		warning(_("failed to unbundle bundle from URI '%s'"), uri);
>      +		goto cleanup;
>     ++	}
>      +
>      +cleanup:
>      +	unlink(filename.buf);
> 4:  f6255ec518857 < -:  ------------- fetch: add --bundle-uri option
> -:  ------------- > 3:  29e645a54ba7f clone: add --bundle-uri option
> 5:  bfbd11b48bf1b ! 4:  f6bc3177332e8 bundle-uri: add support for http(s):// and file://
>     @@ Metadata
>       ## Commit message ##
>          bundle-uri: add support for http(s):// and file://
>      
>     -    The previous change created the 'git fetch --bundle-uri=<uri>' option.
>     +    The previous change created the 'git clone --bundle-uri=<uri>' option.
>          Currently, <uri> must be a filename.
>      
>          Update copy_uri_to_file() to first inspect the URI for an HTTP(S) prefix
>     @@ Commit message
>          Signed-off-by: Derrick Stolee <derrickstolee@github.com>
>      
>       ## bundle-uri.c ##
>     -@@ bundle-uri.c: static void find_temp_filename(struct strbuf *name)
>     - 	unlink(name->buf);
>     +@@ bundle-uri.c: static int find_temp_filename(struct strbuf *name)
>     + 	return 0;
>       }
>       
>     -+static int download_https_uri_to_file(const char *uri, const char *file)
>     -+{
>     +-static int copy_uri_to_file(const char *file, const char *uri)
>     ++static int download_https_uri_to_file(const char *file, const char *uri)
>     + {
>     +-	/* File-based URIs only for now. */
>     +-	return copy_file(file, uri, 0);
>      +	int result = 0;
>      +	struct child_process cp = CHILD_PROCESS_INIT;
>      +	FILE *child_in = NULL, *child_out = NULL;
>     @@ bundle-uri.c: static void find_temp_filename(struct strbuf *name)
>      +	return result;
>      +}
>      +
>     - static int copy_uri_to_file(const char *uri, const char *file)
>     - {
>     ++static int copy_uri_to_file(const char *filename, const char *uri)
>     ++{
>      +	const char *out;
>     ++
>      +	if (skip_prefix(uri, "https:", &out) ||
>      +	    skip_prefix(uri, "http:", &out))
>     -+		return download_https_uri_to_file(uri, file);
>     ++		return download_https_uri_to_file(filename, uri);
>      +
>      +	if (!skip_prefix(uri, "file://", &out))
>      +		out = uri;
>      +
>     - 	/* Copy as a file */
>     --	return copy_file(file, uri, 0444);
>     -+	return !!copy_file(file, out, 0);
>     ++	/* Copy as a file */
>     ++	return copy_file(filename, out, 0);
>       }
>       
>       static int unbundle_from_file(struct repository *r, const char *file)
>      
>     - ## t/t5558-fetch-bundle-uri.sh ##
>     -@@ t/t5558-fetch-bundle-uri.sh: test_expect_success 'fetch file bundle' '
>     + ## t/t5558-clone-bundle-uri.sh ##
>     +@@ t/t5558-clone-bundle-uri.sh: test_expect_success 'clone with path bundle' '
>           test_cmp expect actual
>       '
>       
>     -+test_expect_success 'fetch file:// bundle' '
>     -+	git init fetch-file &&
>     -+	git -C fetch-file fetch --bundle-uri="file://$(pwd)/fetch-from/B.bundle" &&
>     -+	git -C fetch-file rev-parse refs/bundles/topic >actual &&
>     -+	git -C fetch-from rev-parse topic >expect &&
>     ++test_expect_success 'clone with file:// bundle' '
>     ++	git clone --bundle-uri="file://$(pwd)/clone-from/B.bundle" \
>     ++		clone-from clone-file &&
>     ++	git -C clone-file rev-parse refs/bundles/topic >actual &&
>     ++	git -C clone-from rev-parse topic >expect &&
>      +	test_cmp expect actual
>      +'
>      +
>     @@ t/t5558-fetch-bundle-uri.sh: test_expect_success 'fetch file bundle' '
>      +start_httpd
>      +
>      +test_expect_success 'fail to fetch from non-existent HTTP URL' '
>     -+	test_must_fail git fetch --bundle-uri="$HTTPD_URL/does-not-exist" 2>err &&
>     ++	test_when_finished rm -rf test &&
>     ++	git clone --bundle-uri="$HTTPD_URL/does-not-exist" . test 2>err &&
>      +	grep "failed to download bundle from URI" err
>      +'
>      +
>      +test_expect_success 'fail to fetch from non-bundle HTTP URL' '
>     ++	test_when_finished rm -rf test &&
>      +	echo bogus >"$HTTPD_DOCUMENT_ROOT_PATH/bogus" &&
>     -+	test_must_fail git fetch --bundle-uri="$HTTPD_URL/bogus" 2>err &&
>     ++	git clone --bundle-uri="$HTTPD_URL/bogus" . test 2>err &&
>      +	grep "is not a bundle" err
>      +'
>      +
>     -+test_expect_success 'fetch HTTP bundle' '
>     -+	cp fetch-from/B.bundle "$HTTPD_DOCUMENT_ROOT_PATH/B.bundle" &&
>     -+	git init fetch-http &&
>     -+	git -C fetch-http fetch --bundle-uri="$HTTPD_URL/B.bundle" &&
>     -+	git -C fetch-http rev-parse refs/bundles/topic >actual &&
>     -+	git -C fetch-from rev-parse topic >expect &&
>     -+	test_cmp expect actual
>     ++test_expect_success 'clone HTTP bundle' '
>     ++	cp clone-from/B.bundle "$HTTPD_DOCUMENT_ROOT_PATH/B.bundle" &&
>     ++
>     ++	git clone --no-local --mirror clone-from \
>     ++		"$HTTPD_DOCUMENT_ROOT_PATH/fetch.git" &&
>     ++
>     ++	git clone --bundle-uri="$HTTPD_URL/B.bundle" \
>     ++		"$HTTPD_URL/smart/fetch.git" clone-http &&
>     ++	git -C clone-http rev-parse refs/bundles/topic >actual &&
>     ++	git -C clone-from rev-parse topic >expect &&
>     ++	test_cmp expect actual &&
>     ++
>     ++	test_config -C clone-http log.excludedecoration refs/bundle/
>      +'
>      +
>      +# Do not add tests here unless they use the HTTP server, as they will
> 6:  a217e9a0640b4 < -:  ------------- fetch: add 'refs/bundle/' to log.excludeDecoration
> -:  ------------- > 5:  e823b168ab725 clone: --bundle-uri cannot be combined with --depth
> 
> 
> Derrick Stolee (5):
>   remote-curl: add 'get' capability
>   bundle-uri: create basic file-copy logic
>   clone: add --bundle-uri option
>   bundle-uri: add support for http(s):// and file://
>   clone: --bundle-uri cannot be combined with --depth
> 
>  Documentation/git-clone.txt         |   7 ++
>  Documentation/gitremote-helpers.txt |   9 ++
>  Makefile                            |   1 +
>  builtin/clone.c                     |  18 +++
>  bundle-uri.c                        | 168 ++++++++++++++++++++++++++++
>  bundle-uri.h                        |  14 +++
>  remote-curl.c                       |  32 ++++++
>  t/t5557-http-get.sh                 |  37 ++++++
>  t/t5558-clone-bundle-uri.sh         |  81 ++++++++++++++
>  t/t5606-clone-options.sh            |   8 ++
>  10 files changed, 375 insertions(+)
>  create mode 100644 bundle-uri.c
>  create mode 100644 bundle-uri.h
>  create mode 100755 t/t5557-http-get.sh
>  create mode 100755 t/t5558-clone-bundle-uri.sh
> 
> 
> base-commit: e72d93e88cb20b06e88e6e7d81bd1dc4effe453f
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1300%2Fderrickstolee%2Fbundle-redo%2Fclone-v1
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1300/derrickstolee/bundle-redo/clone-v1
> Pull-Request: https://github.com/gitgitgadget/git/pull/1300
> -- 
> gitgitgadget

Looks good to me, with just one small fix and a couple of optional
nitpicks. Thaks for the series!

Reviewed-by: Josh Steadmon <steadmon@google.com>

@@ -168,6 +168,9 @@ Supported commands: 'list', 'import'.
Can guarantee that when a clone is requested, the received
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Ævar Arnfjörð Bjarmason wrote (reply to this):

On Mon, Jul 25 2022, Derrick Stolee via GitGitGadget wrote:

> From: Derrick Stolee <derrickstolee@github.com>
> [...]

Add a:

	TEST_PASSES_SANITIZE_LEAK=true

Before:

> +. ./test-lib.sh
> +
> +. "$TEST_DIRECTORY"/lib-httpd.sh
> +start_httpd
> +
> +test_expect_success 'get by URL: 404' '
> +	url="$HTTPD_URL/none.txt" &&
> +	cat >input <<-EOF &&
> +	capabilities
> +	get $url file1
> +	EOF
> +
> +	test_must_fail git remote-http $url <input 2>err &&
> +	test_path_is_missing file1 &&
> +	grep "failed to download file at URL" err &&
> +	rm file1.temp

This should presumably be a :

	test_when_finished "rm file.temp"

Before the "test_must_fail", otherwise we'll leave this around if
e.g. the "grep" in the &&-chain fails, but we surely want to clean this
up in that case..

> +'
> +
> +test_expect_success 'get by URL: 200' '
> +	echo data >"$HTTPD_DOCUMENT_ROOT_PATH/exists.txt" &&
> +
> +	url="$HTTPD_URL/exists.txt" &&
> +	cat >input <<-EOF &&
> +	capabilities
> +	get $url file2
> +
> +	EOF
> +
> +	GIT_TRACE2_PERF=1 git remote-http $url <input &&

Is this ad-hoc debugging that snuck into the test? Nothing here needs
this GIT_TRACE2_PERF...

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 7/27/2022 7:00 PM, Ævar Arnfjörð Bjarmason wrote:
> 
> On Mon, Jul 25 2022, Derrick Stolee via GitGitGadget wrote:
> 
>> From: Derrick Stolee <derrickstolee@github.com>
>> [...]
> 
> Add a:
> 
> 	TEST_PASSES_SANITIZE_LEAK=true

I trust this is important for your series that flags tests
that _could_ pass with this. Thanks.

>> +. ./test-lib.sh
>> +
>> +. "$TEST_DIRECTORY"/lib-httpd.sh
>> +start_httpd
>> +
>> +test_expect_success 'get by URL: 404' '
>> +	url="$HTTPD_URL/none.txt" &&
>> +	cat >input <<-EOF &&
>> +	capabilities
>> +	get $url file1
>> +	EOF
>> +
>> +	test_must_fail git remote-http $url <input 2>err &&
>> +	test_path_is_missing file1 &&
>> +	grep "failed to download file at URL" err &&
>> +	rm file1.temp
> 
> This should presumably be a :
> 
> 	test_when_finished "rm file.temp"

Yes. Thanks.

>> +'
>> +
>> +test_expect_success 'get by URL: 200' '
>> +	echo data >"$HTTPD_DOCUMENT_ROOT_PATH/exists.txt" &&
>> +
>> +	url="$HTTPD_URL/exists.txt" &&
>> +	cat >input <<-EOF &&
>> +	capabilities
>> +	get $url file2
>> +
>> +	EOF
>> +
>> +	GIT_TRACE2_PERF=1 git remote-http $url <input &&
> 
> Is this ad-hoc debugging that snuck into the test? Nothing here needs
> this GIT_TRACE2_PERF...

Good eye. Removed.

Thanks,
-Stolee

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Ævar Arnfjörð Bjarmason wrote (reply to this):

On Mon, Aug 01 2022, Derrick Stolee wrote:

> On 7/27/2022 7:00 PM, Ævar Arnfjörð Bjarmason wrote:
>> 
>> On Mon, Jul 25 2022, Derrick Stolee via GitGitGadget wrote:
>> 
>>> From: Derrick Stolee <derrickstolee@github.com>
>>> [...]
>> 
>> Add a:
>> 
>> 	TEST_PASSES_SANITIZE_LEAK=true
>
> I trust this is important for your series that flags tests
> that _could_ pass with this. Thanks.

Not really, it does add a "check" mode, and this is one of the few tests
crop up when merged with "seen" if run in that mode, but currentlny it's
only run manually, and we have other outstanding deviations.

It's just something for "master": We have the "linux-leaks" job for a
while now, so it makes sense for new tests to declare if they're
leak-free or not, so we guard against the introduction of leaks with new
code, if possible.

I (or someone else) can always follow-up with something like[1], but as
it's easy it makes sense to just add it in the first place, thanks.

1. https://lore.kernel.org/git/patch-v3-13.15-28255ac3239-20220727T230800Z-avarab@gmail.com/

@gitgitgadget
Copy link

gitgitgadget bot commented Jul 28, 2022

This patch series was integrated into seen via git@b1dc090.

@gitgitgadget
Copy link

gitgitgadget bot commented Jul 29, 2022

This patch series was integrated into seen via git@d6118a7.

@gitgitgadget
Copy link

gitgitgadget bot commented Jul 29, 2022

This patch series was integrated into seen via git@e3b6dda.

@gitgitgadget
Copy link

gitgitgadget bot commented Jul 29, 2022

This patch series was integrated into seen via git@711119e.

@gitgitgadget
Copy link

gitgitgadget bot commented Jul 29, 2022

There was a status update in the "New Topics" section about the branch ds/bundle-uri-clone on the Git mailing list:

Implement "git clone --bundle-uri".
source: <pull.1300.git.1658781277.gitgitgadget@gmail.com>

A future change will want a way to download a file over HTTP(S) using
the simplest of download mechanisms. We do not want to assume that the
server on the other side understands anything about the Git protocol but
could be a simple static web server.

Create the new 'get' capability for the remote helpers which advertises
that the 'get' command is avalable. A caller can send a line containing
'get <url> <path>' to download the file at <url> into the file at
<path>.

Reviewed-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Before implementing a way to fetch bundles into a repository, create the
basic logic. Assume that the URI is actually a file path. Future logic
will make this more careful to other protocols.

For now, we also only succeed if the content at the URI is a bundle
file, not a bundle list. Bundle lists will be implemented in a future
change.

Note that the discovery of a temporary filename is slightly racy because
the odb_mkstemp() relies on the temporary file not existing. With the
current implementation being limited to file copies, we could replace
the copy_file() with copy_fd(). The tricky part comes in future changes
that send the filename to 'git remote-https' and its 'get' capability.
At that point, we need the file descriptor closed _and_ the file
unlinked. If we were to keep the file descriptor open for the sake of
normal file copies, then we would pollute the rest of the code for
little benefit. This is especially the case because we expect that most
bundle URI use will be based on HTTPS instead of file copies.

Reviewed-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Cloning a remote repository is one of the most expensive operations in
Git. The server can spend a lot of CPU time generating a pack-file for
the client's request. The amount of data can clog the network for a long
time, and the Git protocol is not resumable. For users with poor network
connections or are located far away from the origin server, this can be
especially painful.

Add a new '--bundle-uri' option to 'git clone' to bootstrap a clone from
a bundle. If the user is aware of a bundle server, then they can tell
Git to bootstrap the new repository with these bundles before fetching
the remaining objects from the origin server.

Reviewed-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
@gitgitgadget
Copy link

gitgitgadget bot commented Aug 1, 2022

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 7/27/2022 6:11 PM, Josh Steadmon wrote:

> Looks good to me, with just one small fix and a couple of optional
> nitpicks. Thaks for the series!
> 
> Reviewed-by: Josh Steadmon <steadmon@google.com>

Thanks for the review!

-Stolee

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 1, 2022

This patch series was integrated into seen via git@3b8e67e.

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 1, 2022

There was a status update in the "Cooking" section about the branch ds/bundle-uri-clone on the Git mailing list:

Implement "git clone --bundle-uri".
source: <pull.1300.git.1658781277.gitgitgadget@gmail.com>

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 1, 2022

This patch series was integrated into seen via git@197f788.

@derrickstolee
Copy link
Author

/submit

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 2, 2022

Submitted as pull.1300.v2.git.1659443384.gitgitgadget@gmail.com

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git/ pr-1300/derrickstolee/bundle-redo/clone-v2

To fetch this version to local tag pr-1300/derrickstolee/bundle-redo/clone-v2:

git fetch --no-tags https://github.com/gitgitgadget/git/ tag pr-1300/derrickstolee/bundle-redo/clone-v2

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 5, 2022

This patch series was integrated into seen via git@6ae7029.

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 6, 2022

There was a status update in the "Cooking" section about the branch ds/bundle-uri-clone on the Git mailing list:

Implement "git clone --bundle-uri".
source: <pull.1300.v2.git.1659443384.gitgitgadget@gmail.com>

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 8, 2022

This patch series was integrated into seen via git@5afd477.

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 8, 2022

This patch series was integrated into seen via git@787f8fc.

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 9, 2022

There was a status update in the "Cooking" section about the branch ds/bundle-uri-clone on the Git mailing list:

Implement "git clone --bundle-uri".

Expecting a reroll.
cf. <9afd5eb2-44a0-6342-6006-5dbdefba9947@github.com>
source: <pull.1300.v2.git.1659443384.gitgitgadget@gmail.com>

@derrickstolee
Copy link
Author

/submit

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 9, 2022

Submitted as pull.1300.v3.git.1660050703.gitgitgadget@gmail.com

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git/ pr-1300/derrickstolee/bundle-redo/clone-v3

To fetch this version to local tag pr-1300/derrickstolee/bundle-redo/clone-v3:

git fetch --no-tags https://github.com/gitgitgadget/git/ tag pr-1300/derrickstolee/bundle-redo/clone-v3

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 10, 2022

This patch series was integrated into seen via git@c5404b7.

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 10, 2022

This patch series was integrated into seen via git@c641cc1.

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 11, 2022

This patch series was integrated into seen via git@a960cf7.

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 11, 2022

There was a status update in the "Cooking" section about the branch ds/bundle-uri-clone on the Git mailing list:

Implement "git clone --bundle-uri".
source: <pull.1300.v3.git.1660050703.gitgitgadget@gmail.com>

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 11, 2022

This patch series was integrated into seen via git@0026f34.

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 12, 2022

This patch series was integrated into seen via git@851294a.

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 15, 2022

This patch series was integrated into seen via git@d93a9b1.

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 15, 2022

This patch series was integrated into seen via git@c322243.

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 16, 2022

There was a status update in the "Cooking" section about the branch ds/bundle-uri-clone on the Git mailing list:

Implement "git clone --bundle-uri".
source: <pull.1300.v3.git.1660050703.gitgitgadget@gmail.com>

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 16, 2022

This patch series was integrated into seen via git@27af781.

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 17, 2022

This patch series was integrated into seen via git@03cf395.

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 17, 2022

This patch series was integrated into seen via git@08c1ec6.

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 18, 2022

There was a status update in the "Cooking" section about the branch ds/bundle-uri-clone on the Git mailing list:

Implement "git clone --bundle-uri".
source: <pull.1300.v3.git.1660050703.gitgitgadget@gmail.com>

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 18, 2022

This patch series was integrated into seen via git@c0ad21a.

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 18, 2022

This patch series was integrated into next via git@5e8a3ec.

@gitgitgadget gitgitgadget bot added the next label Aug 18, 2022
@gitgitgadget
Copy link

gitgitgadget bot commented Aug 19, 2022

This patch series was integrated into seen via git@633f7df.

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 20, 2022

This patch series was integrated into seen via git@c8b7c44.

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 20, 2022

There was a status update in the "Cooking" section about the branch ds/bundle-uri-clone on the Git mailing list:

Implement "git clone --bundle-uri".

Will merge to 'master'.
source: <pull.1300.v3.git.1660050703.gitgitgadget@gmail.com>

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 20, 2022

This patch series was integrated into seen via git@7ce9b86.

@@ -323,6 +323,12 @@ or `--mirror` is given)
for `host.xz:foo/.git`). Cloning into an existing directory
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> +	/*
> +	 * Before fetching from the remote, download and install bundle
> +	 * data from the --bundle-uri option.
> +	 */
> +	if (bundle_uri) {
> +		/* At this point, we need the_repository to match the cloned repo. */
> +		repo_init(the_repository, git_dir, work_tree);
> +		if (fetch_bundle_uri(the_repository, bundle_uri))
> +			warning(_("failed to fetch objects from bundle URI '%s'"),
> +				bundle_uri);
> +	}

I do not offhand know why I suddenly started seeing the issue for
this relatively old commit I have had in my tree for at least 10
days, but I am getting this

builtin/clone.c: In function 'cmd_clone':
builtin/clone.c:1248:17: error: ignoring return value of 'repo_init' declared with attribute 'warn_unused_result' [-Werror=unused-result]
 1248 |                 repo_init(the_repository, git_dir, work_tree);
      |                 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


with the commit merged in 'seen'.

It seems that the updated ab/submodule-helper-prep is doing it.  

Why can't that (or any of Ævar's) topic focus on what it needs to
do, without churning the codebase and inflict damages to other
topics like this?  Quite frustrating.

I'll redo today's integration without the offending topic to give
other topics test coverage.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 8/22/2022 5:24 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> +	/*
>> +	 * Before fetching from the remote, download and install bundle
>> +	 * data from the --bundle-uri option.
>> +	 */
>> +	if (bundle_uri) {
>> +		/* At this point, we need the_repository to match the cloned repo. */
>> +		repo_init(the_repository, git_dir, work_tree);
>> +		if (fetch_bundle_uri(the_repository, bundle_uri))
>> +			warning(_("failed to fetch objects from bundle URI '%s'"),
>> +				bundle_uri);
>> +	}
> 
> I do not offhand know why I suddenly started seeing the issue for
> this relatively old commit I have had in my tree for at least 10
> days, but I am getting this
> 
> builtin/clone.c: In function 'cmd_clone':
> builtin/clone.c:1248:17: error: ignoring return value of 'repo_init' declared with attribute 'warn_unused_result' [-Werror=unused-result]
>  1248 |                 repo_init(the_repository, git_dir, work_tree);
>       |                 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> 
> 
> with the commit merged in 'seen'.

Thank you for pointing this out. Here is a patch that fixes
the issue from this patch:

--- >8 ---

From 6df8bc6d7ffdc1b115d85ef9550bab5dcf1811f8 Mon Sep 17 00:00:00 2001
From: Derrick Stolee <derrickstolee@github.com>
Date: Tue, 23 Aug 2022 09:53:47 -0400
Subject: [PATCH] clone: warn on failure to repo_init()

The --bundle-uri option was added in 55568919616 (clone: add
--bundle-uri option, 2022-08-09), but this also introduced a call to
repo_init() whose return value was ignored. Fix that ignored value by
warning that the bundle URI process could not continue if it failed.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 builtin/clone.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index 4463789680b..e21d42dfee5 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -1245,8 +1245,9 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 	 */
 	if (bundle_uri) {
 		/* At this point, we need the_repository to match the cloned repo. */
-		repo_init(the_repository, git_dir, work_tree);
-		if (fetch_bundle_uri(the_repository, bundle_uri))
+		if (repo_init(the_repository, git_dir, work_tree))
+			warning(_("failed to initialize the repo, skipping bundle URI"));
+		else if (fetch_bundle_uri(the_repository, bundle_uri))
 			warning(_("failed to fetch objects from bundle URI '%s'"),
 				bundle_uri);
 	}
-- 
2.37.1.vfs.0.0.rebase

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

Derrick Stolee <derrickstolee@github.com> writes:

> From: Derrick Stolee <derrickstolee@github.com>
> Date: Tue, 23 Aug 2022 09:53:47 -0400
> Subject: [PATCH] clone: warn on failure to repo_init()
>
> The --bundle-uri option was added in 55568919616 (clone: add
> --bundle-uri option, 2022-08-09), but this also introduced a call to
> repo_init() whose return value was ignored. Fix that ignored value by
> warning that the bundle URI process could not continue if it failed.
>
> Signed-off-by: Derrick Stolee <derrickstolee@github.com>
> ---

OK.  It looks like a sensible thing to do.

>  builtin/clone.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/builtin/clone.c b/builtin/clone.c
> index 4463789680b..e21d42dfee5 100644
> --- a/builtin/clone.c
> +++ b/builtin/clone.c
> @@ -1245,8 +1245,9 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
>  	 */
>  	if (bundle_uri) {
>  		/* At this point, we need the_repository to match the cloned repo. */
> -		repo_init(the_repository, git_dir, work_tree);
> -		if (fetch_bundle_uri(the_repository, bundle_uri))
> +		if (repo_init(the_repository, git_dir, work_tree))
> +			warning(_("failed to initialize the repo, skipping bundle URI"));
> +		else if (fetch_bundle_uri(the_repository, bundle_uri))
>  			warning(_("failed to fetch objects from bundle URI '%s'"),
>  				bundle_uri);
>  	}

@@ -0,0 +1,168 @@
#include "cache.h"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Teng Long wrote (reply to this):

Derrick Stolee <derrickstolee@github.com> writes:

> +	while (!strbuf_getline(&line, child_out)) {
> +		if (!line.len)
> +			break;
> +		if (!strcmp(line.buf, "get"))
> +			found_get = 1;
> +	}

Clear implementation for me to read, thanks.

I'm not sure but maybe a nit, should it be a "break;" after
"found_get = 1;" for omitting the left uncencerned capabilities
to quit earlier?

Thanks.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 8/29/2022 12:58 AM, Teng Long wrote:
> Derrick Stolee <derrickstolee@github.com> writes:
> 
>> +	while (!strbuf_getline(&line, child_out)) {
>> +		if (!line.len)
>> +			break;
>> +		if (!strcmp(line.buf, "get"))
>> +			found_get = 1;
>> +	}
> 
> Clear implementation for me to read, thanks.
> 
> I'm not sure but maybe a nit, should it be a "break;" after
> "found_get = 1;" for omitting the left uncencerned capabilities
> to quit earlier?

Teng, you are right that it does not hurt to have a 'break'
here. Since this topic has already merged, that change will
need a forward-fix that I can add to a future series.

Thanks,
-Stolee

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

1 participant