Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bundle URIs III: Parse and download from bundle lists #1333

Conversation

derrickstolee
Copy link

@derrickstolee derrickstolee commented Aug 22, 2022

This is the third series building the bundle URI feature. It is built on top of ds/bundle-uri-clone, which introduced 'git clone --bundle-uri=' where is a URI to a bundle file. This series adds the capability of downloading and parsing a bundle list and then downloading the URIs in that list.

The core functionality of bundle lists is implemented by creating data structures from a list of key-value pairs. These pairs can come from a plain-text file in Git config format, but in the future, we will support the list being supplied by packet lines over Git's protocol v2 in the 'bundle-uri' command (reserved for the next series).

The patches are organized in this way (updated for v4):

  1. Patch 1 is a cleanup from the previous part. This allows us to simplify our bundle list data structure slightly.

  2. Patches 2-3 create the bundle list data structures and the logic for populating the list from key-value pairs.

  3. Patches 4-5 teach Git to parse "key=value" lines to construct a bundle list. Add unit tests that ensure this logic constructs lists correctly. These patches are adapted from Ævar's RFC [1] and were previously seen in my combined RFC [2].

  4. Patch 6 teaches Git to parse Git config files into bundle lists.

  5. Patches 7-9 implement the ability to download a bundle list and recursively download the contained bundles (and possibly the bundle lists within). This is limited by a constant depth to avoid issues with cycles or otherwise incorrectly configured bundle lists. This also fixes a previous bug when running verify_bundle() multiple times in the same process, as it did not clear the PREREQ_MARK flag upon leaving (see patch 8).

  6. Patches 10-12 suppress unhelpful warnings from user visibility.

[1] https://lore.kernel.org/git/RFC-cover-v2-00.36-00000000000-20220418T165545Z-avarab@gmail.com/

[2] https://lore.kernel.org/git/pull.1234.git.1653072042.gitgitgadget@gmail.com/

At the end of this series, users can bootstrap clones using 'git clone --bundle-uri= ' where points to a bundle list instead of a single bundle file.

As outlined in the design document [1], the next steps after this are:

  1. Implement the protocol v2 verb, re-using the bundle list logic from (2). Use this to auto-discover bundle URIs during 'git clone' (behind a config option). [2]
  2. Implement the 'creationToken' heuristic, allowing incremental 'git fetch' commands to download a bundle list from a configured URI, and only download bundles that are new based on the creation token values. [3]

I have prepared some of this work as pull requests on my personal fork so curious readers can look ahead to where we are going:

[3] https://lore.kernel.org/git/pull.1248.v3.git.1658757188.gitgitgadget@gmail.com

[4] derrickstolee#21

[5] derrickstolee#22

Updates in v5

  • The bug about verify_bundle() not working multile times in the same process is fixed without removing the revision walk. Instead, more flags needed to be removed as the method cleaned up after itself.

Updates in v4

  • Properly updated the patch outline.

  • Jonathan Tan asked for more tests, and this revealed some interesting behaviors which I have now either fixed or made explicit:

    1. In "all" mode, we try to download and apply all bundles. Do not fail if a single bundle download fails.
    2. Previously, not all bundles were being applied, and this was noticed by the added checks for the refs/bundles/* refs at the end of the tests. This revealed the need for removing the reachability walk from verify_bundle() since the written refs/bundles/* refs were not being picked up by the loose ref cache. Since removing the reachability walk seemed like the faster (for users) option, I went that direction.
    3. While running those tests and examining the output carefully, I noticed several error messages related to missing prerequisites due to attempting unbundling in a random order. This doesn't appear in the later creationToken version, so I hadn't noticed it at the tip of my local work. These messages are removed with a new quiet mode for verify_bundle().

Updates in v3

  • Fixed a comment about a return value of -1.
  • Fixed and tested scenario where early URIs fail in "any" mode and Git should try the rest of the list.
  • Instead of using 'success_count' and 'failure_count', use the iterator return value to terminate the "all" mode loop early.

Updates in v2

Thank you to all of the voices who chimed in on the previous version. I'm sorry it took so long for me to get a new version.

  • I've done a rather thorough overhaul to minimize how often later patches rewrite portions of earlier patches.

  • We no longer use a strbuf in struct remote_bundle_info. Instead, use a 'char *' and only in the patch where it is first used.

  • The config documentation is more clearly indicating that the bundle.* section has no effect in the repository config (at the moment, which will change in the next series).

  • The bundle.version value is now parsed using git_parse_int().

  • The config key is now parsed using parse_config_key().

  • Commit messages clarify more about the context of the change in the bigger picture of the bundle URI effort.

  • Some printf()s are correctly changed to fprintf()s.

  • The test helper CLI is unified across the two modes. They both take a filename now.

  • The count of downloaded bundles is now only updated after a successful download, allowing the "any" mode to keep trying after a failure.

Thanks,

  • Stolee

cc: gitster@pobox.com
cc: me@ttaylorr.com
cc: newren@gmail.com
cc: avarab@gmail.com
cc: mjcheetham@outlook.com
cc: steadmon@google.com
cc: Glen Choo chooglen@google.com
cc: Jonathan Tan jonathantanmy@google.com
cc: Teng Long dyroneteng@gmail.com

@derrickstolee derrickstolee self-assigned this Aug 22, 2022
@derrickstolee derrickstolee changed the base branch from master to ds/bundle-uri-clone August 22, 2022 13:47
@derrickstolee
Copy link
Author

/submit

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 22, 2022

Submitted as pull.1333.git.1661181174.gitgitgadget@gmail.com

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git/ pr-1333/derrickstolee/bundle-redo/list-v1

To fetch this version to local tag pr-1333/derrickstolee/bundle-redo/list-v1:

git fetch --no-tags https://github.com/gitgitgadget/git/ tag pr-1333/derrickstolee/bundle-redo/list-v1

@@ -4,6 +4,67 @@
#include "object-store.h"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> +/**
> + * The remote_bundle_info struct contains information for a single bundle
> + * URI. This may be initialized simply by a given URI or might have
> + * additional metadata associated with it if the bundle was advertised by
> + * a bundle list.
> + */
> +struct remote_bundle_info {
> +	struct hashmap_entry ent;
> +
> +	/**
> +	 * The 'id' is a name given to the bundle for reference
> +	 * by other bundle infos.
> +	 */
> +	char *id;
> +
> +	/**
> +	 * The 'uri' is the location of the remote bundle so
> +	 * it can be downloaded on-demand. This will be NULL
> +	 * if there was no table of contents.
> +	 */
> +	char *uri;
> +
> +	/**
> +	 * If the bundle has been downloaded, then 'file' is a
> +	 * filename storing its contents. Otherwise, 'file' is
> +	 * an empty string.
> +	 */
> +	struct strbuf file;
> +};

Presumably the sequence of events are that first a bundle list is
obtained, with their .file member set to empty, then http worker(s)
download and deposit the contents to files at which time the .file
member is set to the resulting file.  The file downloader presumably
uses the usual "create a temporary file, download to it, and then
commit it by closing and then renaming" dance, and the downloading
http worker may want to have two strbufs somewhere it can access to
come up with the name of the temporary and the name of the final
file.  But once the result becomes a committed file, its name will
not change, or will it?

At this step without the code that actually uses the data, use of
strbuf, instead of "char *" like id and uri members do, smells like
a premature optimization, and it is unclear if the optimization is
even effective.

Other than that, looks good to me.

@@ -387,6 +387,8 @@ include::config/branch.txt[]

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> diff --git a/Documentation/config.txt b/Documentation/config.txt
> index e376d547ce0..4280af6992e 100644
> --- a/Documentation/config.txt
> +++ b/Documentation/config.txt
> @@ -387,6 +387,8 @@ include::config/branch.txt[]
>  
>  include::config/browser.txt[]
>  
> +include::config/bundle.txt[]
> +

The file that records a list of bundles may borrow the format of git
config files, but will we store their contents in configuration
files in the receiving (or originating) repository?  With the
presence of fields like "bundle.version", I somehow doubt it.

Should "git config --help" list them?

> diff --git a/Documentation/config/bundle.txt b/Documentation/config/bundle.txt
> new file mode 100644
> index 00000000000..3515bfe38d1
> --- /dev/null
> +++ b/Documentation/config/bundle.txt

If the answer is "no", then this file looks out of place.

> diff --git a/bundle-uri.c b/bundle-uri.c
> index ceeef0b6641..ade7eccce39 100644
> --- a/bundle-uri.c
> +++ b/bundle-uri.c
> @@ -66,6 +66,80 @@ int for_all_bundles_in_list(struct bundle_list *list,
>  	return 0;
>  }
>  
> +/**
> + * Given a key-value pair, update the state of the given bundle list.
> + * Returns 0 if the key-value pair is understood. Returns 1 if the key
> + * is not understood or the value is malformed.

Let's stick to the "error is negative" if we do not have a strong
reason not to.

> + */
> +MAYBE_UNUSED
> +static int bundle_list_update(const char *key, const char *value,
> +			      struct bundle_list *list)
> +{
> +	const char *pkey, *dot;
> +	struct strbuf id = STRBUF_INIT;
> +	struct remote_bundle_info lookup = REMOTE_BUNDLE_INFO_INIT;
> +	struct remote_bundle_info *bundle;
> +
> +	if (!skip_prefix(key, "bundle.", &pkey))
> +		return 1;
> +	dot = strchr(pkey, '.');
> +	if (!dot) {
> +		if (!strcmp(pkey, "version")) {
> +			int version = atoi(value);

Can atoi() safely fail?  Are we happy of pkey that says "1A" and we
parse it as "1"?

> +			if (version != 1)
> +				return 1;
> +
> +			list->version = version;
> +			return 0;
> +		}

Is it OK for a bundle list described in the config-file format to
have "bundle.version" twice, giving different values?  It feels
counter-intuitive to apply the "last one wins" rule that is usual
for configuration files.

> +		if (!strcmp(pkey, "mode")) {
> +			if (!strcmp(value, "all"))
> +				list->mode = BUNDLE_MODE_ALL;
> +			else if (!strcmp(value, "any"))
> +				list->mode = BUNDLE_MODE_ANY;
> +			else
> +				return 1;
> +			return 0;
> +		}

Likewise for bundle.mode

> +		/* Ignore other unknown global keys. */
> +		return 0;
> +	}
> +
> +	strbuf_add(&id, pkey, dot - pkey);
> +	dot++;
> +
> +	/*
> +	 * Check for an existing bundle with this <id>, or create one
> +	 * if necessary.
> +	 */
> +	lookup.id = id.buf;
> +	hashmap_entry_init(&lookup.ent, strhash(lookup.id));
> +	if (!(bundle = hashmap_get_entry(&list->bundles, &lookup, ent, NULL))) {
> +		CALLOC_ARRAY(bundle, 1);
> +		bundle->id = strbuf_detach(&id, NULL);
> +		strbuf_init(&bundle->file, 0);
> +		hashmap_entry_init(&bundle->ent, strhash(bundle->id));
> +		hashmap_add(&list->bundles, &bundle->ent);
> +	}
> +	strbuf_release(&id);
> +
> +	if (!strcmp(dot, "uri")) {
> +		free(bundle->uri);
> +		bundle->uri = xstrdup(value);
> +		return 0;
> +	}

This explicitly implements "the last one wins".  Would it really
make sense for a server to serve a bundle list that says redundant
and wasteful pieces of information, i.e.

    [bundle "1"]
	url = one
	url = two

It is not like doing so would allow us to reuse an otherwise mostly
good file by appending new information and that would be a performance
or storage win.  So I am not quite sure why we want "the last one wins"
rule here.  It instead looks like something we want to sanity check
and complain about.

> +	/*
> +	 * At this point, we ignore any information that we don't
> +	 * understand, assuming it to be hints for a heuristic the client
> +	 * does not currently understand.
> +	 */

This is sensible.

> +	return 0;
> +}
> +
>  static int find_temp_filename(struct strbuf *name)
>  {
>  	int fd;

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 8/22/2022 2:20 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> diff --git a/Documentation/config.txt b/Documentation/config.txt
>> index e376d547ce0..4280af6992e 100644
>> --- a/Documentation/config.txt
>> +++ b/Documentation/config.txt
>> @@ -387,6 +387,8 @@ include::config/branch.txt[]
>>  
>>  include::config/browser.txt[]
>>  
>> +include::config/bundle.txt[]
>> +
> 
> The file that records a list of bundles may borrow the format of git
> config files, but will we store their contents in configuration
> files in the receiving (or originating) repository?  With the
> presence of fields like "bundle.version", I somehow doubt it.
> 
> Should "git config --help" list them?

I suppose that at this point, they should be left out, since
writing them to your Git config does nothing.

In the future, having these config values present will advertise
the bundle list during the 'bundle-uri' protocol v2 command. That
could use some clarification in the documentation, too, perhaps
with a "bundle.*" item discussing how all of the other items are
related to that advertisement.

>> +/**
>> + * Given a key-value pair, update the state of the given bundle list.
>> + * Returns 0 if the key-value pair is understood. Returns 1 if the key
>> + * is not understood or the value is malformed.
> 
> Let's stick to the "error is negative" if we do not have a strong
> reason not to.

Right. Can do.
 
>> + */
>> +MAYBE_UNUSED
>> +static int bundle_list_update(const char *key, const char *value,
>> +			      struct bundle_list *list)
>> +{
>> +	const char *pkey, *dot;
>> +	struct strbuf id = STRBUF_INIT;
>> +	struct remote_bundle_info lookup = REMOTE_BUNDLE_INFO_INIT;
>> +	struct remote_bundle_info *bundle;
>> +
>> +	if (!skip_prefix(key, "bundle.", &pkey))
>> +		return 1;
>> +	dot = strchr(pkey, '.');
>> +	if (!dot) {
>> +		if (!strcmp(pkey, "version")) {
>> +			int version = atoi(value);
> 
> Can atoi() safely fail?  Are we happy of pkey that says "1A" and we
> parse it as "1"?
> 
>> +			if (version != 1)
>> +				return 1;
>> +
>> +			list->version = version;
>> +			return 0;
>> +		}
> 
> Is it OK for a bundle list described in the config-file format to
> have "bundle.version" twice, giving different values?  It feels
> counter-intuitive to apply the "last one wins" rule that is usual
> for configuration files.

...
> This explicitly implements "the last one wins".  Would it really
> make sense for a server to serve a bundle list that says redundant
> and wasteful pieces of information, i.e.
> 
>     [bundle "1"]
> 	url = one
> 	url = two
> 
> It is not like doing so would allow us to reuse an otherwise mostly
> good file by appending new information and that would be a performance
> or storage win.  So I am not quite sure why we want "the last one wins"
> rule here.  It instead looks like something we want to sanity check
> and complain about.

I could switch this to "expect at most one value" and add warnings for
duplicate keys. Should duplicate keys then mean "the bundle list is
malformed, abort downloading bundles"? That seems reasonable to me.

Thanks,
-Stolee

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Jonathan Tan wrote (reply to this):

Derrick Stolee <derrickstolee@github.com> writes:
> On 8/22/2022 2:20 PM, Junio C Hamano wrote:
> > "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> > 
> >> diff --git a/Documentation/config.txt b/Documentation/config.txt
> >> index e376d547ce0..4280af6992e 100644
> >> --- a/Documentation/config.txt
> >> +++ b/Documentation/config.txt
> >> @@ -387,6 +387,8 @@ include::config/branch.txt[]
> >>  
> >>  include::config/browser.txt[]
> >>  
> >> +include::config/bundle.txt[]
> >> +
> > 
> > The file that records a list of bundles may borrow the format of git
> > config files, but will we store their contents in configuration
> > files in the receiving (or originating) repository?  With the
> > presence of fields like "bundle.version", I somehow doubt it.
> > 
> > Should "git config --help" list them?
> 
> I suppose that at this point, they should be left out, since
> writing them to your Git config does nothing.
> 
> In the future, having these config values present will advertise
> the bundle list during the 'bundle-uri' protocol v2 command. That
> could use some clarification in the documentation, too, perhaps
> with a "bundle.*" item discussing how all of the other items are
> related to that advertisement.

I think the main point of confusion is that these config variables
currently do nothing when in a repo config, but they will be
subsequently used once we implement advertising them, and it is
convenient that these configs delegate to other files that have the same
format (and that we can specify, at the CLI, a file of the same format).
Maybe documentation like this would clear up the confusion:

  bundle.*::
  	The `bundle.*` keys may appear in a repo's config, in a file
  	linked by bundle.<id>.uri, or in a file passed to "clone
  	--bundle-uri".
  +
  NEEDSWORK: Currently, only the latter 2 situations work. `bundle.*` keys
  appearing in a repo's config will take effect once support for
  advertising bundles in fetch protocol v2 is implemented.
  +
  See link:technical/bundle-uri.html[the bundle URI design document] for
  more details.

@@ -4,6 +4,140 @@
#include "object-store.h"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Ævar Arnfjörð Bjarmason via GitGitGadget"  <gitgitgadget@gmail.com>
writes:

> +/**
> + * General API for {transport,connect}.c etc.
> + */
> +int bundle_uri_parse_line(struct bundle_list *list, const char *line)
> +{
> +	int result;
> +	const char *equals;
> +	struct strbuf key = STRBUF_INIT;
> +
> +	if (!strlen(line))
> +		return error(_("bundle-uri: got an empty line"));
> +
> +	equals = strchr(line, '=');
> +
> +	if (!equals)
> +		return error(_("bundle-uri: line is not of the form 'key=value'"));
> +	if (line == equals || !*(equals + 1))
> +		return error(_("bundle-uri: line has empty key or value"));

The suggestions implied by my asking fall strictly into the "it does
not have to exist here at this step and we can later extend it", but
for something whose equivalent can be stored in our configuration
file, it is curious why we _insist_ to refuse an empty string as the
value.

I do not miss the "key alone without even '=' means 'true'"
convention, personally, so insisting to have '=' is OK, but the
inability to have an empty string as a value looks a bit disturbing.

This depends on how the helper gets called, but most likely the
caller has a single line of pkt-line that it GAVE us to process, so
it sounds a bit wasteful to insist that "line" to be const to us and
force us to use a separate strbuf, instead of just stuffing NUL at
where we found '=' and pass the two halves to bundle_list_update().

Not a huge deal, it is just something I found funny in the "back in
the days we coded together, Linus would never have written like
this" way.

Other than that small detail, the code looks OK to me.

> +	strbuf_add(&key, line, equals - line);
> +	result = bundle_list_update(key.buf, equals + 1, list);
> +	strbuf_release(&key);
> +
> +	return result;
> +}
> diff --git a/bundle-uri.h b/bundle-uri.h
> index 6692aa4b170..f725c9796f7 100644
> --- a/bundle-uri.h
> +++ b/bundle-uri.h
> @@ -76,4 +76,16 @@ int for_all_bundles_in_list(struct bundle_list *list,
>   */
>  int fetch_bundle_uri(struct repository *r, const char *uri);
>  
> -#endif
> +/**
> + * General API for {transport,connect}.c etc.
> + */
> +
> +/**
> + * Parse a "key=value" packet line from the bundle-uri verb.
> + *
> + * Returns 0 on success and non-zero on error.
> + */
> +int bundle_uri_parse_line(struct bundle_list *list,
> +			  const char *line);
> +
> +#endif /* BUNDLE_URI_H */

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 8/22/2022 3:17 PM, Junio C Hamano wrote:
> "Ævar Arnfjörð Bjarmason via GitGitGadget"  <gitgitgadget@gmail.com>
> writes:
> 
>> +/**
>> + * General API for {transport,connect}.c etc.
>> + */
>> +int bundle_uri_parse_line(struct bundle_list *list, const char *line)
>> +{
>> +	int result;
>> +	const char *equals;
>> +	struct strbuf key = STRBUF_INIT;
>> +
>> +	if (!strlen(line))
>> +		return error(_("bundle-uri: got an empty line"));
>> +
>> +	equals = strchr(line, '=');
>> +
>> +	if (!equals)
>> +		return error(_("bundle-uri: line is not of the form 'key=value'"));
>> +	if (line == equals || !*(equals + 1))
>> +		return error(_("bundle-uri: line has empty key or value"));
> 
> The suggestions implied by my asking fall strictly into the "it does
> not have to exist here at this step and we can later extend it", but
> for something whose equivalent can be stored in our configuration
> file, it is curious why we _insist_ to refuse an empty string as the
> value.
> 
> I do not miss the "key alone without even '=' means 'true'"
> convention, personally, so insisting to have '=' is OK, but the
> inability to have an empty string as a value looks a bit disturbing.

I'd be happy to switch this to allow an empty value.
 
> This depends on how the helper gets called, but most likely the
> caller has a single line of pkt-line that it GAVE us to process, so
> it sounds a bit wasteful to insist that "line" to be const to us and
> force us to use a separate strbuf, instead of just stuffing NUL at
> where we found '=' and pass the two halves to bundle_list_update().

I can look into using a non-const buffer.

Thanks,
-Stolee

@@ -4,6 +4,202 @@
#include "object-store.h"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> To allow for the incremental fetch case, teach Git to understand a
> bundle list that could be advertised at an independent bundle URI. Such
> a bundle list is likely to be inspected by human readers, even if only
> by the bundle provider creating the list. For this reason, we can take
> our expected "key=value" pairs and instead format them using Git config
> format.

"can" does not explain why it is a good idea.  "As a sequence of
key=value pairs is a lot more dense and harder to read than the
configuration file format, let's declare that it is the format we
use in a file that holds a bundle-list" would be.

I do not personally buy it, though.  As I hinted in an earlier step,
some trait we associate with our configuration fioe format, like the
"last one wins" semantics, are undesirable ones, so even if we reuse
the appearance of the text, the semantics would have to become
different (including "syntax errors lead to die()" mentioned
elsewhere in the proposed log message).

> Update 'test-tool bundle-uri' to take this config file format as input.
> It uses a filename instead of stdin because there is no existing way to
> parse a FILE pointer in the config machinery. Using
> git_config_from_mem() is overly complicated and more likely to introduce
> bugs than this simpler version. I would rather have a slightly confusing
> test helper than complicated product code.

All the troubles described above seem to come from the initial
mistake to try reusing the configuration file parser or reusing the
configuration file format, at least to me.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 8/22/2022 3:25 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> To allow for the incremental fetch case, teach Git to understand a
>> bundle list that could be advertised at an independent bundle URI. Such
>> a bundle list is likely to be inspected by human readers, even if only
>> by the bundle provider creating the list. For this reason, we can take
>> our expected "key=value" pairs and instead format them using Git config
>> format.
> 
> "can" does not explain why it is a good idea.  "As a sequence of
> key=value pairs is a lot more dense and harder to read than the
> configuration file format, let's declare that it is the format we
> use in a file that holds a bundle-list" would be.

This "more dense and harder to read" was definitely my intention for
wanting a different format. 

> I do not personally buy it, though.  As I hinted in an earlier step,
> some trait we associate with our configuration fioe format, like the
> "last one wins" semantics, are undesirable ones, so even if we reuse
> the appearance of the text, the semantics would have to become
> different (including "syntax errors lead to die()" mentioned
> elsewhere in the proposed log message).

The points you made earlier about "last one wins" semantics are the
biggest road-blocks to using the config file format, from what I've read
so far. We could change those semantics to be different from my current
implementation which respects the "last one wins" rule, and then that
makes the config format match not as closely. That burden of avoiding
multiple key values is not on the end-user but the bundle provider to
match the new expectations. (There might be something we should be careful
about when advertising the bundle list from our Git config in the
'bundle-uri' command in the next series.)

The "syntax errors lead to die()" is mitigated by using
CONFIG_ERROR_ERROR, which is what I meant by "Be careful to call..." I
should have been more clear that we are _not_ going to die() based on the
remote data. We might write an error message and then abort the bundle
download.

With all of these points in mind, I'd still prefer to use the config file
format as described in the design document. If you still don't agree, then
I'll change the format to be key=value pairs split with newlines, and
update the design document accordingly.

Thanks,
-Stolee

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Jonathan Tan wrote (reply to this):

Junio C Hamano <gitster@pobox.com> writes:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
> > To allow for the incremental fetch case, teach Git to understand a
> > bundle list that could be advertised at an independent bundle URI. Such
> > a bundle list is likely to be inspected by human readers, even if only
> > by the bundle provider creating the list. For this reason, we can take
> > our expected "key=value" pairs and instead format them using Git config
> > format.
> 
> "can" does not explain why it is a good idea.  "As a sequence of
> key=value pairs is a lot more dense and harder to read than the
> configuration file format, let's declare that it is the format we
> use in a file that holds a bundle-list" would be.
> 
> I do not personally buy it, though.  As I hinted in an earlier step,
> some trait we associate with our configuration fioe format, like the
> "last one wins" semantics, are undesirable ones, so even if we reuse
> the appearance of the text, the semantics would have to become
> different (including "syntax errors lead to die()" mentioned
> elsewhere in the proposed log message).

One reason for using the configuration file format (which perhaps could
have been better explained in the commit message) is that we plan to
have a way for a repo to advertise a list of bundles during fetch.
I think that config is a natural place to put that, even with its "last
one wins" semantics.

It could be argued that we can just put a single URI in config and only
allow advertising of a single URI (and then use a different format for
the bundle lists with semantics that are stricter than "last one wins"),
but that seems unnecessarily restrictive (and would make the client make
one more network request). And if we're advertising multiple bundles, it
seems reasonable to make all bundle lists have the same format (whether
they are in config or in a separate file).

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 22, 2022

This branch is now known as ds/bundle-uri-3.

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 22, 2022

This patch series was integrated into seen via git@5367958.

@gitgitgadget gitgitgadget bot added the seen label Aug 22, 2022
@gitgitgadget
Copy link

gitgitgadget bot commented Aug 22, 2022

This patch series was integrated into seen via git@45a6cf2.

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 24, 2022

This patch series was integrated into seen via git@475d589.

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 24, 2022

This patch series was integrated into seen via git@d6010ce.

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 25, 2022

This patch series was integrated into seen via git@0709a31.

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 25, 2022

There was a status update in the "New Topics" section about the branch ds/bundle-uri-3 on the Git mailing list:

Define the logical elements of a "bundle list", data structure to
store them in-core, format to transfer them, and code to parse
them.

Needs review.
source: <pull.1333.git.1661181174.gitgitgadget@gmail.com>

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 25, 2022

This patch series was integrated into seen via git@713624f.

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 26, 2022

This patch series was integrated into seen via git@61be489.

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 26, 2022

There was a status update in the "Cooking" section about the branch ds/bundle-uri-3 on the Git mailing list:

Define the logical elements of a "bundle list", data structure to
store them in-core, format to transfer them, and code to parse
them.

Needs review.
source: <pull.1333.git.1661181174.gitgitgadget@gmail.com>

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 29, 2022

This patch series was integrated into seen via git@d5803d0.

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 30, 2022

This patch series was integrated into seen via git@b4f8e61.

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 30, 2022

This patch series was integrated into seen via git@e028421.

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 30, 2022

There was a status update in the "Cooking" section about the branch ds/bundle-uri-3 on the Git mailing list:

Define the logical elements of a "bundle list", data structure to
store them in-core, format to transfer them, and code to parse
them.

Needs review.
source: <pull.1333.git.1661181174.gitgitgadget@gmail.com>

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 30, 2022

This patch series was integrated into seen via git@35b6c51.

@@ -387,6 +387,8 @@ include::config/branch.txt[]

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Glen Choo wrote (reply to this):

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> +/**
> + * Given a key-value pair, update the state of the given bundle list.
> + * Returns 0 if the key-value pair is understood. Returns 1 if the key
> + * is not understood or the value is malformed.
> + */
> +MAYBE_UNUSED
> +static int bundle_list_update(const char *key, const char *value,
> +			      struct bundle_list *list)
> +{
> +	const char *pkey, *dot;
> +	struct strbuf id = STRBUF_INIT;
> +	struct remote_bundle_info lookup = REMOTE_BUNDLE_INFO_INIT;
> +	struct remote_bundle_info *bundle;
> +
> +	if (!skip_prefix(key, "bundle.", &pkey))
> +		return 1;
> +
> +	dot = strchr(pkey, '.');
> +	if (!dot) {
> +		if (!strcmp(pkey, "version")) {
> +			int version = atoi(value);
> +			if (version != 1)
> +				return 1;
> +
> +			list->version = version;
> +			return 0;
> +		}
> +
> +		if (!strcmp(pkey, "mode")) {
> +			if (!strcmp(value, "all"))
> +				list->mode = BUNDLE_MODE_ALL;
> +			else if (!strcmp(value, "any"))
> +				list->mode = BUNDLE_MODE_ANY;
> +			else
> +				return 1;
> +			return 0;
> +		}

Drive-by comment from Review Club: we could simplify
"section.[subsection.]key" parsing using parse_config_key(). There are
other places in the code that do custom parsing like this, but maybe
they should use parse_config_key() too.

> +
> +		/* Ignore other unknown global keys. */
> +		return 0;
> +	}
> +
> +	strbuf_add(&id, pkey, dot - pkey);
> +	dot++;
> +
> +	/*
> +	 * Check for an existing bundle with this <id>, or create one
> +	 * if necessary.
> +	 */
> +	lookup.id = id.buf;
> +	hashmap_entry_init(&lookup.ent, strhash(lookup.id));
> +	if (!(bundle = hashmap_get_entry(&list->bundles, &lookup, ent, NULL))) {
> +		CALLOC_ARRAY(bundle, 1);
> +		bundle->id = strbuf_detach(&id, NULL);
> +		strbuf_init(&bundle->file, 0);
> +		hashmap_entry_init(&bundle->ent, strhash(bundle->id));
> +		hashmap_add(&list->bundles, &bundle->ent);
> +	}
> +	strbuf_release(&id);
> +
> +	if (!strcmp(dot, "uri")) {
> +		free(bundle->uri);
> +		bundle->uri = xstrdup(value);
> +		return 0;
> +	}
> +
> +	/*
> +	 * At this point, we ignore any information that we don't
> +	 * understand, assuming it to be hints for a heuristic the client
> +	 * does not currently understand.
> +	 */
> +	return 0;
> +}
> +
>  static int find_temp_filename(struct strbuf *name)
>  {
>  	int fd;
> -- 
> gitgitgadget

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 31, 2022

User Glen Choo <chooglen@google.com> has been added to the cc: list.

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 31, 2022

User Jonathan Tan <jonathantanmy@google.com> has been added to the cc: list.

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 31, 2022

This patch series was integrated into seen via git@21b73fd.

@gitgitgadget
Copy link

gitgitgadget bot commented Sep 1, 2022

This patch series was integrated into seen via git@9a5b0fe.

@@ -387,6 +387,8 @@ include::config/branch.txt[]

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Teng Long wrote (reply to this):

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> +> +	init_bundle_list(&list);
> +	while (strbuf_getline(&sb, stdin) != EOF) {
> +		if (bundle_uri_parse_line(&list, sb.buf) < 0)
> +			err = error("bad line: '%s'", sb.buf);
> +	}

The command to write such a test is useful for people who
want to experiment about the feature, Thanks. On top of that,
I have a little question about the condition:

  if (bundle_uri_parse_line(&list, sb.buf) < 0)

"bundle_uri_parse_line" will call "bundle_list_update" inside, and
could get the result of it as "bundle_uri_parse_line"'s return, then
actually "bundle_list_update" could return "1", so I'm not sure but maybe
the line could modified to:

  if (bundle_uri_parse_line(&list, sb.buf))

at here.

Thanks.

@gitgitgadget
Copy link

gitgitgadget bot commented Oct 13, 2022

This patch series was integrated into seen via git@0a5c856.

@gitgitgadget
Copy link

gitgitgadget bot commented Oct 14, 2022

This patch series was integrated into seen via git@077d1cb.

@gitgitgadget
Copy link

gitgitgadget bot commented Oct 17, 2022

This patch series was integrated into seen via git@c248d22.

@gitgitgadget
Copy link

gitgitgadget bot commented Oct 17, 2022

There was a status update in the "Cooking" section about the branch ds/bundle-uri-3 on the Git mailing list:

Define the logical elements of a "bundle list", data structure to
store them in-core, format to transfer them, and code to parse
them.
source: <pull.1333.v5.git.1665579160.gitgitgadget@gmail.com>

@gitgitgadget
Copy link

gitgitgadget bot commented Oct 18, 2022

This patch series was integrated into seen via git@1f6debc.

@gitgitgadget
Copy link

gitgitgadget bot commented Oct 19, 2022

This patch series was integrated into seen via git@fdc2ce0.

@gitgitgadget
Copy link

gitgitgadget bot commented Oct 20, 2022

There was a status update in the "Cooking" section about the branch ds/bundle-uri-3 on the Git mailing list:

Define the logical elements of a "bundle list", data structure to
store them in-core, format to transfer them, and code to parse
them.
source: <pull.1333.v5.git.1665579160.gitgitgadget@gmail.com>

@gitgitgadget
Copy link

gitgitgadget bot commented Oct 20, 2022

There was a status update in the "Cooking" section about the branch ds/bundle-uri-3 on the Git mailing list:

Define the logical elements of a "bundle list", data structure to
store them in-core, format to transfer them, and code to parse
them.
source: <pull.1333.v5.git.1665579160.gitgitgadget@gmail.com>

@gitgitgadget
Copy link

gitgitgadget bot commented Oct 20, 2022

This patch series was integrated into seen via git@5343e9f.

@gitgitgadget
Copy link

gitgitgadget bot commented Oct 20, 2022

This patch series was integrated into seen via git@bec6733.

@gitgitgadget
Copy link

gitgitgadget bot commented Oct 21, 2022

This patch series was integrated into seen via git@c22ffd8.

@gitgitgadget
Copy link

gitgitgadget bot commented Oct 21, 2022

This patch series was integrated into seen via git@273e05b.

@gitgitgadget
Copy link

gitgitgadget bot commented Oct 22, 2022

This patch series was integrated into seen via git@00743d0.

@gitgitgadget
Copy link

gitgitgadget bot commented Oct 23, 2022

This patch series was integrated into seen via git@3a33d3e.

@gitgitgadget
Copy link

gitgitgadget bot commented Oct 24, 2022

This patch series was integrated into seen via git@415696b.

@gitgitgadget
Copy link

gitgitgadget bot commented Oct 26, 2022

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 10/12/2022 8:52 AM, Derrick Stolee via GitGitGadget wrote:
> This is the third series building the bundle URI feature. It is built on top
> of ds/bundle-uri-clone, which introduced 'git clone --bundle-uri=' where is
> a URI to a bundle file. This series adds the capability of downloading and
> parsing a bundle list and then downloading the URIs in that list.
> 
> The core functionality of bundle lists is implemented by creating data
> structures from a list of key-value pairs. These pairs can come from a
> plain-text file in Git config format, but in the future, we will support the
> list being supplied by packet lines over Git's protocol v2 in the
> 'bundle-uri' command (reserved for the next series).

This version has been available for a while now without comment. Could
we consider it for merging to 'next' soon?

I want to wait for this series to merge into 'master' before sending
part IV on top, which advertises bundle URIs over protocol v2.

Thanks,
-Stolee

@gitgitgadget
Copy link

gitgitgadget bot commented Oct 26, 2022

On the Git mailing list, Junio C Hamano wrote (reply to this):

Derrick Stolee <derrickstolee@github.com> writes:

> This version has been available for a while now without comment. Could
> we consider it for merging to 'next' soon?

Could somebody who has reviewed it fully give an Ack (or two)?  I
know an earlier rounds had some comments, but after v3 things have
quieted down.

I know the change from v4 to v5 has good improvements, but do not
claim to have read the other parts in detail.

Thanks.

@gitgitgadget
Copy link

gitgitgadget bot commented Oct 26, 2022

This patch series was integrated into seen via git@dcec1e4.

@gitgitgadget
Copy link

gitgitgadget bot commented Oct 26, 2022

There was a status update in the "Cooking" section about the branch ds/bundle-uri-3 on the Git mailing list:

Define the logical elements of a "bundle list", data structure to
store them in-core, format to transfer them, and code to parse
them.

Will merge to 'next'?
source: <pull.1333.v5.git.1665579160.gitgitgadget@gmail.com>

@gitgitgadget
Copy link

gitgitgadget bot commented Oct 27, 2022

This patch series was integrated into seen via git@985c158.

@gitgitgadget
Copy link

gitgitgadget bot commented Oct 28, 2022

This patch series was integrated into seen via git@d8100ee.

@gitgitgadget
Copy link

gitgitgadget bot commented Oct 28, 2022

This patch series was integrated into next via git@9d9092b.

@gitgitgadget gitgitgadget bot added the next label Oct 28, 2022
@gitgitgadget
Copy link

gitgitgadget bot commented Oct 28, 2022

There was a status update in the "Cooking" section about the branch ds/bundle-uri-3 on the Git mailing list:

Define the logical elements of a "bundle list", data structure to
store them in-core, format to transfer them, and code to parse
them.

Will merge to 'master'.
source: <pull.1333.v5.git.1665579160.gitgitgadget@gmail.com>

@gitgitgadget
Copy link

gitgitgadget bot commented Oct 31, 2022

This patch series was integrated into seen via git@d32dd8a.

@gitgitgadget
Copy link

gitgitgadget bot commented Oct 31, 2022

This patch series was integrated into master via git@d32dd8a.

@gitgitgadget
Copy link

gitgitgadget bot commented Oct 31, 2022

This patch series was integrated into next via git@d32dd8a.

@gitgitgadget gitgitgadget bot added the master label Oct 31, 2022
@gitgitgadget
Copy link

gitgitgadget bot commented Oct 31, 2022

Closed via d32dd8a.

@gitgitgadget gitgitgadget bot closed this Oct 31, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants