fetch: download bundles once, even with --all #1508

derrickstolee · 2023-03-31T14:55:02Z

I noticed this in local testing where I had set up a bundle server for my Git repo, but wrote the wrong port number so I saw multiple warning messages during a git fetch --all run.

Thanks,
-Stolee

cc: vdye@github.com
cc: gitster@pobox.com
cc: Jeff King peff@peff.net

When fetch.bundleURI is set, 'git fetch' downloads bundles from the given bundle URI before fetching from the specified remote. However, when using non-file remotes, 'git fetch --all' will launch 'git fetch' subprocesses which then read fetch.bundleURI and fetch the bundle list again. We do not expect the bundle list to have new information during these multiple runs, so avoid these extra calls by un-setting fetch.bundleURI in the subprocess arguments. Be careful to skip fetching bundles for the empty bundle string. Fetching bundles from the empty list presents some interesting test failures. Signed-off-by: Derrick Stolee <derrickstolee@github.com>

derrickstolee · 2023-03-31T15:58:24Z

/submit

gitgitgadget · 2023-03-31T15:59:11Z

Submitted as pull.1508.git.1680278344173.gitgitgadget@gmail.com

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git/ pr-1508/derrickstolee/fetch-bundles-multiple-v1

To fetch this version to local tag pr-1508/derrickstolee/fetch-bundles-multiple-v1:

git fetch --no-tags https://github.com/gitgitgadget/git/ tag pr-1508/derrickstolee/fetch-bundles-multiple-v1

gitgitgadget · 2023-03-31T17:09:27Z

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Derrick Stolee <derrickstolee@github.com>
>
> When fetch.bundleURI is set, 'git fetch' downloads bundles from the
> given bundle URI before fetching from the specified remote. However,
> when using non-file remotes, 'git fetch --all' will launch 'git fetch'
> subprocesses which then read fetch.bundleURI and fetch the bundle list
> again. We do not expect the bundle list to have new information during
> these multiple runs, so avoid these extra calls by un-setting
> fetch.bundleURI in the subprocess arguments.

It is a good observation, if we assume that all these remotes want
to share the same expectation on what the list of bundles described
in bundleURI should be.

I expected that people use multiple remotes and do a "fetch --all"
before starting work from a cronjob or while fetching coffee first
time in the morning to fetch from repositories holding work from
different folks and manged by different groups, and these groups do
not tightly share the object management recipes at what bundles to
pre-package and list in the list served at a bundleURI.  If there is
such an arrangement between repositires to share the object
management, even if the repositories fetched with the "--all" option
are truly multiple places, it may make sense to assume that these
repositories you are fetching from want you to use the same set of
bundles that are managed the same way to be used.  But I am not sure
if that assumption holds true.

Where does fetch.bundleURI come originally?  If we set only one
globally for the local repository at "git clone" time, perhaps that
is a problem?  IOW, instead of fetch.bundleURI, we would want to
have something per remote, e.g. remote.<name>.bundleURI, instead?

Putting that design level tangent aside, everything I see in this
patch makes sense, assuming that the fetch_bundle_uri() call done
fairly early in the parent process is sufficient to tell the child
processes that deal with individual repositories to reuse the info
that was already retrieved that call.

Thanks.

gitgitgadget · 2023-03-31T18:12:28Z

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 3/31/2023 1:06 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> From: Derrick Stolee <derrickstolee@github.com>
>>
>> When fetch.bundleURI is set, 'git fetch' downloads bundles from the
>> given bundle URI before fetching from the specified remote. However,
>> when using non-file remotes, 'git fetch --all' will launch 'git fetch'
>> subprocesses which then read fetch.bundleURI and fetch the bundle list
>> again. We do not expect the bundle list to have new information during
>> these multiple runs, so avoid these extra calls by un-setting
>> fetch.bundleURI in the subprocess arguments.
> 
> It is a good observation, if we assume that all these remotes want
> to share the same expectation on what the list of bundles described
> in bundleURI should be.
> 
> I expected that people use multiple remotes and do a "fetch --all"
> before starting work from a cronjob or while fetching coffee first
> time in the morning to fetch from repositories holding work from
> different folks and manged by different groups, and these groups do
> not tightly share the object management recipes at what bundles to
> pre-package and list in the list served at a bundleURI.  If there is
> such an arrangement between repositires to share the object
> management, even if the repositories fetched with the "--all" option
> are truly multiple places, it may make sense to assume that these
> repositories you are fetching from want you to use the same set of
> bundles that are managed the same way to be used.  But I am not sure
> if that assumption holds true.
> 
> Where does fetch.bundleURI come originally?  If we set only one
> globally for the local repository at "git clone" time, perhaps that
> is a problem?  IOW, instead of fetch.bundleURI, we would want to
> have something per remote, e.g. remote.<name>.bundleURI, instead?

fetch.bundleURI is either set by 'git clone --bundle-uri' when the
list advertises a bundle.heuristic value, or can be set manually by
the user.

This indicates the use of a "standalone" bundle server that is
managed independently from a Git remote. The bundle list is downloaded
from a static URI instead of during protocol v2 in the 'bundle-uri'
command.

In that sense, the static bundle URI is not paired with a remote,
but instead presents a place where objects can be downloaded perhaps
faster than any Git remote (by downloading from a static file, or
because the bundle server has a faster link to the client machine).
The objects added to the client object store can then reduce the
size of the fetch from each other Git server remote, of course only
if the objects in the bundle(s) are also present on those remotes.

If there are remote-specific bundle lists, then the protocol v2
command will be a good way to communicate bundle lists that are
custom to those remotes. However, even in that case it is unlikely
that sourcing bundles from multiple sources will be useful, since
the bundle downloads will likely have significant common data.

> Putting that design level tangent aside, everything I see in this
> patch makes sense, assuming that the fetch_bundle_uri() call done
> fairly early in the parent process is sufficient to tell the child
> processes that deal with individual repositories to reuse the info
> that was already retrieved that call.

Thanks for looking closely.

-Stolee

gitgitgadget · 2023-03-31T21:01:02Z

This branch is now known as ds/fetch-bundle-uri-with-all.

gitgitgadget · 2023-03-31T21:01:04Z

This patch series was integrated into seen via git@57b99f4.

gitgitgadget · 2023-03-31T22:41:32Z

This patch series was integrated into seen via git@b18e7e7.

gitgitgadget · 2023-03-31T22:41:33Z

This patch series was integrated into next via git@a9f7873.

gitgitgadget · 2023-04-01T01:31:46Z

This patch series was integrated into seen via git@8c60b41.

gitgitgadget · 2023-04-01T08:25:54Z

On the Git mailing list, Jeff King wrote (reply to this):

On Fri, Mar 31, 2023 at 03:59:04PM +0000, Derrick Stolee via GitGitGadget wrote:

> diff --git a/bundle-uri.c b/bundle-uri.c
> index 177c1810402..56390651b92 100644
> --- a/bundle-uri.c
> +++ b/bundle-uri.c
> @@ -792,6 +792,15 @@ int fetch_bundle_uri(struct repository *r, const char *uri,
>  
>  	init_bundle_list(&list);
>  
> +	/*
> +	 * Do not fetch a NULL or empty bundle URI. An empty bundle URI
> +	 * could signal that a configured bundle URI has been disabled.
> +	 */
> +	if (!uri || !*uri) {
> +		result = 0;
> +		goto cleanup;
> +	}

Coverity flagged this "!uri"; it can never be true since we
unconditionally xstrdup(uri) at the top of the function, and we'd
segfault in that case.

I think the existing code that assumes a non-NULL uri is reasonable.
Even before the xstrdup() we'd have segfaulted when it was handed to
fetch_bundle_uri_internal().

It's not a huge deal, in the sense that you're just being overly
defensive, but the comment and conditional here are a little misleading.

-Peff

gitgitgadget · 2023-04-01T08:25:57Z

User Jeff King <peff@peff.net> has been added to the cc: list.

gitgitgadget · 2023-04-03T18:47:54Z

On the Git mailing list, Junio C Hamano wrote (reply to this):

Jeff King <peff@peff.net> writes:

>> +	if (!uri || !*uri) {
>> +		result = 0;
>> +		goto cleanup;
>> +	}
>
> Coverity flagged this "!uri"; it can never be true since we
> unconditionally xstrdup(uri) at the top of the function, and we'd
> segfault in that case.
>
> I think the existing code that assumes a non-NULL uri is reasonable.
> Even before the xstrdup() we'd have segfaulted when it was handed to
> fetch_bundle_uri_internal().
>
> It's not a huge deal, in the sense that you're just being overly
> defensive, but the comment and conditional here are a little misleading.

Nice.  Thanks for commenting on what you spotted.

gitgitgadget · 2023-04-03T21:54:16Z

This patch series was integrated into seen via git@2866c3d.

gitgitgadget · 2023-04-04T23:33:18Z

This patch series was integrated into seen via git@f8b9054.

gitgitgadget · 2023-04-05T00:17:47Z

There was a status update in the "New Topics" section about the branch ds/fetch-bundle-uri-with-all on the Git mailing list:

"git fetch --all" does not have to download and handle the same
bundleURI over and over, which has been corrected.

Will merge to 'master'.
source: <pull.1508.git.1680278344173.gitgitgadget@gmail.com>

gitgitgadget · 2023-04-05T23:28:14Z

This patch series was integrated into seen via git@1d4a47c.

gitgitgadget · 2023-04-06T21:30:39Z

This patch series was integrated into seen via git@89833fc.

gitgitgadget · 2023-04-06T21:30:40Z

This patch series was integrated into master via git@89833fc.

gitgitgadget · 2023-04-06T21:30:41Z

This patch series was integrated into next via git@89833fc.

gitgitgadget · 2023-04-06T21:30:45Z

Closed via 89833fc.

derrickstolee self-assigned this Mar 31, 2023

derrickstolee force-pushed the fetch-bundles-multiple branch from 6806764 to bbfe549 Compare March 31, 2023 15:16

derrickstolee force-pushed the fetch-bundles-multiple branch from bbfe549 to 1ca2651 Compare March 31, 2023 15:19

gitgitgadget bot added the seen label Mar 31, 2023

gitgitgadget bot added the next label Mar 31, 2023

gitgitgadget bot added the master label Apr 6, 2023

gitgitgadget bot closed this Apr 6, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fetch: download bundles once, even with --all #1508

fetch: download bundles once, even with --all #1508

derrickstolee commented Mar 31, 2023 •

edited by gitgitgadget bot

Loading

derrickstolee commented Mar 31, 2023

gitgitgadget bot commented Mar 31, 2023

gitgitgadget bot commented Mar 31, 2023

gitgitgadget bot commented Mar 31, 2023

gitgitgadget bot commented Mar 31, 2023

gitgitgadget bot commented Mar 31, 2023

gitgitgadget bot commented Mar 31, 2023

gitgitgadget bot commented Mar 31, 2023

gitgitgadget bot commented Apr 1, 2023

gitgitgadget bot commented Apr 1, 2023

gitgitgadget bot commented Apr 1, 2023

gitgitgadget bot commented Apr 3, 2023

gitgitgadget bot commented Apr 3, 2023

gitgitgadget bot commented Apr 4, 2023

gitgitgadget bot commented Apr 5, 2023

gitgitgadget bot commented Apr 5, 2023

gitgitgadget bot commented Apr 6, 2023

gitgitgadget bot commented Apr 6, 2023

gitgitgadget bot commented Apr 6, 2023

gitgitgadget bot commented Apr 6, 2023

fetch: download bundles once, even with --all #1508

fetch: download bundles once, even with --all #1508

Conversation

derrickstolee commented Mar 31, 2023 • edited by gitgitgadget bot Loading

derrickstolee commented Mar 31, 2023

gitgitgadget bot commented Mar 31, 2023

gitgitgadget bot commented Mar 31, 2023

gitgitgadget bot commented Mar 31, 2023

gitgitgadget bot commented Mar 31, 2023

gitgitgadget bot commented Mar 31, 2023

gitgitgadget bot commented Mar 31, 2023

gitgitgadget bot commented Mar 31, 2023

gitgitgadget bot commented Apr 1, 2023

gitgitgadget bot commented Apr 1, 2023

gitgitgadget bot commented Apr 1, 2023

gitgitgadget bot commented Apr 3, 2023

gitgitgadget bot commented Apr 3, 2023

gitgitgadget bot commented Apr 4, 2023

gitgitgadget bot commented Apr 5, 2023

gitgitgadget bot commented Apr 5, 2023

gitgitgadget bot commented Apr 6, 2023

gitgitgadget bot commented Apr 6, 2023

gitgitgadget bot commented Apr 6, 2023

gitgitgadget bot commented Apr 6, 2023

derrickstolee commented Mar 31, 2023 •

edited by gitgitgadget bot

Loading