-
Notifications
You must be signed in to change notification settings - Fork 128
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Partial bundles #1159
Partial bundles #1159
Conversation
The --promisor option of 'git index-pack' was created in 88e2f9e (introduce fetch-object: fetch one promisor object, 2017-12-05) but was untested. It is currently unused within the Git codebase, but that will change in an upcoming change to 'git bundle unbundle' when there is a filter capability. For now, add documentation about the option and add a test to ensure it is working as expected. Signed-off-by: Derrick Stolee <derrickstolee@github.com>
/submit |
Submitted as pull.1159.git.1645638911.gitgitgadget@gmail.com To fetch this version into
To fetch this version to local tag
|
d552efd
to
82dd0cb
Compare
bundle.c
Outdated
@@ -451,6 +451,12 @@ struct bundle_prerequisites_info { | |||
int fd; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Jeff Hostetler wrote (reply to this):
On 2/23/22 12:55 PM, Derrick Stolee via GitGitGadget wrote:
> From: Derrick Stolee <derrickstolee@github.com>
> > Since 'git bundle' uses setup_revisions() to specify the object walk,
> some options do not make sense to include during the pack-objects child
> process. Further, these options are used for a call to
> traverse_commit_list() which would then require a callback which is
> currently NULL.
> > By populating the callback we prevent a segfault in the case of adding
> the --objects flag. This is really a redundant statement because the
Nit: I stumbled over "...because the bundles are constructing..."
Is there a better wording here??
> bundles are constructing a pack-file containing all objects in the
> discovered commit range.
> > Adding --objects to a 'git bundle' command might cause a slower command,
> but at least it will not have a hard failure when the user supplies this
> option. We can also disable walking trees and blobs in advance of this
> walk.
> > Signed-off-by: Derrick Stolee <derrickstolee@github.com>
> ---
> bundle.c | 10 +++++++++-
> t/t6020-bundle-misc.sh | 12 ++++++++++++
> 2 files changed, 21 insertions(+), 1 deletion(-)
> > diff --git a/bundle.c b/bundle.c
> index a0bb687b0f4..dc56db9a50a 100644
> --- a/bundle.c
> +++ b/bundle.c
> @@ -451,6 +451,12 @@ struct bundle_prerequisites_info {
> int fd;
> };
> > +
> +static void ignore_object(struct object *obj, const char *v, void *data)
> +{
> + /* Do nothing. */
> +}
> +
> static void write_bundle_prerequisites(struct commit *commit, void *data)
> {
> struct bundle_prerequisites_info *bpi = data;
> @@ -544,7 +550,9 @@ int create_bundle(struct repository *r, const char *path,
> die("revision walk setup failed");
> bpi.fd = bundle_fd;
> bpi.pending = &revs_copy.pending;
> - traverse_commit_list(&revs, write_bundle_prerequisites, NULL, &bpi);
> +
> + revs.blob_objects = revs.tree_objects = 0;
> + traverse_commit_list(&revs, write_bundle_prerequisites, ignore_object, &bpi);
> object_array_remove_duplicates(&revs_copy.pending);
> > /* write bundle refs */
> diff --git a/t/t6020-bundle-misc.sh b/t/t6020-bundle-misc.sh
> index b13e8a52a93..6522401617d 100755
> --- a/t/t6020-bundle-misc.sh
> +++ b/t/t6020-bundle-misc.sh
> @@ -475,4 +475,16 @@ test_expect_success 'clone from bundle' '
> test_cmp expect actual
> '
> > +test_expect_success 'unfiltered bundle with --objects' '
> + git bundle create all-objects.bdl \
> + --all --objects &&
> + git bundle create all.bdl \
> + --all &&
> +
> + # Compare the headers of these files.
> + head -11 all.bdl >expect &&
> + head -11 all-objects.bdl >actual &&
> + test_cmp expect actual
> +'
> +
> test_done
>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Junio C Hamano wrote (reply to this):
Jeff Hostetler <git@jeffhostetler.com> writes:
> On 2/23/22 12:55 PM, Derrick Stolee via GitGitGadget wrote:
>> From: Derrick Stolee <derrickstolee@github.com>
>> Since 'git bundle' uses setup_revisions() to specify the object
>> walk,
>> some options do not make sense to include during the pack-objects child
>> process. Further, these options are used for a call to
>> traverse_commit_list() which would then require a callback which is
>> currently NULL.
>> By populating the callback we prevent a segfault in the case of
>> adding
>> the --objects flag. This is really a redundant statement because the
>
> Nit: I stumbled over "...because the bundles are constructing..."
> Is there a better wording here??
"... because the command is constructing ..." should be sufficient,
I hope?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Derrick Stolee wrote (reply to this):
On 3/4/2022 5:58 PM, Junio C Hamano wrote:
> Jeff Hostetler <git@jeffhostetler.com> writes:
>
>> On 2/23/22 12:55 PM, Derrick Stolee via GitGitGadget wrote:
>>> From: Derrick Stolee <derrickstolee@github.com>
>>> Since 'git bundle' uses setup_revisions() to specify the object
>>> walk,
>>> some options do not make sense to include during the pack-objects child
>>> process. Further, these options are used for a call to
>>> traverse_commit_list() which would then require a callback which is
>>> currently NULL.
>>> By populating the callback we prevent a segfault in the case of
>>> adding
>>> the --objects flag. This is really a redundant statement because the
>>
>> Nit: I stumbled over "...because the bundles are constructing..."
>> Is there a better wording here??
>
> "... because the command is constructing ..." should be sufficient,
> I hope?
That works for me. Thanks!
-Stolee
User |
On the Git mailing list, Jeff Hostetler wrote (reply to this):
|
On the Git mailing list, Derrick Stolee wrote (reply to this):
|
8854f3b
to
a4fa9bb
Compare
On the Git mailing list, Jeff Hostetler wrote (reply to this):
|
4f693b8
to
d55ad1d
Compare
On the Git mailing list, Derrick Stolee wrote (reply to this):
|
@@ -62,7 +62,6 @@ static const char rev_list_usage[] = | |||
static struct progress *progress; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Junio C Hamano wrote (reply to this):
"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> static int try_bitmap_count(struct rev_info *revs,
> - struct list_objects_filter_options *filter,
> int filter_provided_objects)
This makes quite a lot of sense as filter is now available as
revs->filter.
> {
> uint32_t commit_count = 0,
> @@ -436,7 +434,8 @@ static int try_bitmap_count(struct rev_info *revs,
> */
> max_count = revs->max_count;
>
> - bitmap_git = prepare_bitmap_walk(revs, filter, filter_provided_objects);
> + bitmap_git = prepare_bitmap_walk(revs, revs->filter,
> + filter_provided_objects);
And we should be able to do the same to prepare_bitmap_walk(). It
is OK if such a change comes later and not as part of this commit.
Perhaps it is deliberate. Unlike the helpers this step touches,
namely, try_bitmap_count(), try_bitmap_traversal(), and
try_bitmap_disk_usage(), prepare_bitmap_walk() is not a file-scope
static helper and updating it will need touching many more places.
> @@ -597,13 +595,17 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
> }
>
> if (skip_prefix(arg, ("--" CL_ARG__FILTER "="), &arg)) {
#leftoverbit. We need to remember to clean this up, now "--filter"
is well established (I am assuming this literal-string pasting is
because we didn't know what the right and final word to be used as
the option name back when this code was originally written), when
the code around here is quiescent.
> - parse_list_objects_filter(&filter_options, arg);
> - if (filter_options.choice && !revs.blob_objects)
> + if (!revs.filter)
> + CALLOC_ARRAY(revs.filter, 1);
> + parse_list_objects_filter(revs.filter, arg);
> + if (revs.filter->choice && !revs.blob_objects)
> die(_("object filtering requires --objects"));
> continue;
OK. The original "filter_options" was a structure and not a pointer
to a structure; now we have a pointer to a structure in revs as a
member so we need an on-demand allocation. CALLOC_ARRAY() instead
of xcalloc(), when we know we are creating one element and not an
array of elements whose size happens to be one, is not wrong but it
does look strange. Was there a reason why we avoid xcalloc()?
Makes me also wonder how big the filter_options structure is;
because we will not use unbounded many revs structure, it may have
been a simpler conversion to turn a static struct into an embedded
struct member in a struct (instead of a member of a struct that is a
pointer to a struct). That way, we did not have to ...
> }
> if (!strcmp(arg, ("--no-" CL_ARG__FILTER))) {
> - list_objects_filter_set_no_filter(&filter_options);
> + if (!revs.filter)
> + CALLOC_ARRAY(revs.filter, 1);
... repeat the on-demand allocation. If some code used to pass
&filter_options in a parameter to helper functions, and such calling
sites get rewritten to pass the value in the revs.filter pointer,
and if revs hasn't gone through this codepath, these helper functions
will start receiving NULL in their filter_options parameter, which
they may or may not be prepared to take. This "we get rid of a
global struct and replace it with an on-demand allocated structure,
pointer to which is stored in the revs structure" rewrite somehow
makes me nervous.
> diff --git a/revision.h b/revision.h
> index 3c58c18c63a..1ddb73ab82e 100644
> --- a/revision.h
> +++ b/revision.h
> @@ -81,6 +81,7 @@ struct rev_cmdline_info {
>
> struct oidset;
> struct topo_walk_info;
> +struct list_objects_filter_options;
Is the forward-declaration the only reason why we needed to have a
pointer to a(n opaque) struct, not an embedded struct, as a member?
> struct rev_info {
> /* Starting list */
> @@ -94,6 +95,9 @@ struct rev_info {
> /* The end-points specified by the end user */
> struct rev_cmdline_info cmdline;
>
> + /* Object filter options. NULL for no filtering. */
> + struct list_objects_filter_options *filter;
> +
> /* excluding from --branches, --refs, etc. expansion */
> struct string_list *ref_excludes;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Derrick Stolee wrote (reply to this):
On 3/4/2022 5:15 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
>> static int try_bitmap_count(struct rev_info *revs,
>> - struct list_objects_filter_options *filter,
>> int filter_provided_objects)
>
> This makes quite a lot of sense as filter is now available as
> revs->filter.
>
>> {
>> uint32_t commit_count = 0,
>> @@ -436,7 +434,8 @@ static int try_bitmap_count(struct rev_info *revs,
>> */
>> max_count = revs->max_count;
>>
>> - bitmap_git = prepare_bitmap_walk(revs, filter, filter_provided_objects);
>> + bitmap_git = prepare_bitmap_walk(revs, revs->filter,
>> + filter_provided_objects);
>
> And we should be able to do the same to prepare_bitmap_walk(). It
> is OK if such a change comes later and not as part of this commit.
>
> Perhaps it is deliberate. Unlike the helpers this step touches,
> namely, try_bitmap_count(), try_bitmap_traversal(), and
> try_bitmap_disk_usage(), prepare_bitmap_walk() is not a file-scope
> static helper and updating it will need touching many more places.
I'm making a note that this cleanup can happen in a follow-up series.
>> @@ -597,13 +595,17 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
>> }
>>
>> if (skip_prefix(arg, ("--" CL_ARG__FILTER "="), &arg)) {
>
> #leftoverbit. We need to remember to clean this up, now "--filter"
> is well established (I am assuming this literal-string pasting is
> because we didn't know what the right and final word to be used as
> the option name back when this code was originally written), when
> the code around here is quiescent.
Good point.
>> - parse_list_objects_filter(&filter_options, arg);
>> - if (filter_options.choice && !revs.blob_objects)
>> + if (!revs.filter)
>> + CALLOC_ARRAY(revs.filter, 1);
>> + parse_list_objects_filter(revs.filter, arg);
>> + if (revs.filter->choice && !revs.blob_objects)
>> die(_("object filtering requires --objects"));
>> continue;
>
> OK. The original "filter_options" was a structure and not a pointer
> to a structure; now we have a pointer to a structure in revs as a
> member so we need an on-demand allocation. CALLOC_ARRAY() instead
> of xcalloc(), when we know we are creating one element and not an
> array of elements whose size happens to be one, is not wrong but it
> does look strange. Was there a reason why we avoid xcalloc()?
I think I've been using CALLOC_ARRAY(..., 1) over "... = xcalloc()"
for simplicity's sake for a while. I see quite a few across the
codebase, too, but I can swap the usage here if you feel that is
important.
> Makes me also wonder how big the filter_options structure is;
> because we will not use unbounded many revs structure, it may have
> been a simpler conversion to turn a static struct into an embedded
> struct member in a struct (instead of a member of a struct that is a
> pointer to a struct). That way, we did not have to ...
>>> }
>> if (!strcmp(arg, ("--no-" CL_ARG__FILTER))) {
>> - list_objects_filter_set_no_filter(&filter_options);
>> + if (!revs.filter)
>> + CALLOC_ARRAY(revs.filter, 1);
>
> ... repeat the on-demand allocation. If some code used to pass
> &filter_options in a parameter to helper functions, and such calling
> sites get rewritten to pass the value in the revs.filter pointer,
> and if revs hasn't gone through this codepath, these helper functions
> will start receiving NULL in their filter_options parameter, which
> they may or may not be prepared to take. This "we get rid of a
> global struct and replace it with an on-demand allocated structure,
> pointer to which is stored in the revs structure" rewrite somehow
> makes me nervous.
I think the main idea is that the filter being NULL indicates "no
filter is used. Do a full object walk." If we use a static struct,
then we need to instead use revs->filter.filter_spec.nr, but that
is already being used as a BUG() statement:
const char *list_objects_filter_spec(struct list_objects_filter_options *filter)
{
if (!filter->filter_spec.nr)
BUG("no filter_spec available for this filter");
so a non-NULL filter with no specs is considered invalid, while
a NULL filter _is_ currently considered valid.
While we _could_ make that switch to using a static struct and
change our checks to allow empty specs, that would be a more
involved change. Maybe we can leave it for a follow up?
Thanks,
-Stolee
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Junio C Hamano wrote (reply to this):
Derrick Stolee <derrickstolee@github.com> writes:
>> member so we need an on-demand allocation. CALLOC_ARRAY() instead
>> of xcalloc(), when we know we are creating one element and not an
>> array of elements whose size happens to be one, is not wrong but it
>> does look strange. Was there a reason why we avoid xcalloc()?
>
> I think I've been using CALLOC_ARRAY(..., 1) over "... = xcalloc()"
> for simplicity's sake for a while. I see quite a few across the
> codebase, too, but I can swap the usage here if you feel that is
> important.
Not at all. I think it was just me who was confused; CALLOC_ARRAY()
vs xcalloc() is not confusing. Both are capable of and meant for
allocating an array of these elements, and use of it for a single
element non-array is the same either way. Nothing to complain about.
>> they may or may not be prepared to take. This "we get rid of a
>> global struct and replace it with an on-demand allocated structure,
>> pointer to which is stored in the revs structure" rewrite somehow
>> makes me nervous.
>
> I think the main idea is that the filter being NULL indicates "no
> filter is used. Do a full object walk." If we use a static struct,
> then we need to instead use revs->filter.filter_spec.nr, but that
> is already being used as a BUG() statement:
Thanks. My observation was primarily that it looked deviating from
the original code but that is not an objection unless the original
was without room for improvement. And in fact in this case, I think
the global variable that was a static struct should have been a
global variable that is a pointer to a struct which is NULL unless
the filtering capability is being used. So in that sense, the
conversion done in this series is an improvement and going in the
right direction.
> While we _could_ make that switch to using a static struct and
> change our checks to allow empty specs, that would be a more
> involved change. Maybe we can leave it for a follow up?
No, there is no need to. A pointer that is NULL when unused is the
right thing to do.
Thanks.
@@ -71,6 +71,11 @@ and the Git bundle v2 format cannot represent a shallow clone repository. | |||
== Capabilities |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Ævar Arnfjörð Bjarmason wrote (reply to this):
On Tue, Mar 08 2022, Derrick Stolee via GitGitGadget wrote:
> From: Derrick Stolee <derrickstolee@github.com>
> [...]
> static const char v2_bundle_signature[] = "# v2 git bundle\n";
> static const char v3_bundle_signature[] = "# v3 git bundle\n";
> @@ -33,6 +33,7 @@ void bundle_header_release(struct bundle_header *header)
> {
> string_list_clear(&header->prerequisites, 1);
> string_list_clear(&header->references, 1);
> + list_objects_filter_release(&header->filter);
> }
>
> static int parse_capability(struct bundle_header *header, const char *capability)
> @@ -45,6 +46,10 @@ static int parse_capability(struct bundle_header *header, const char *capability
> header->hash_algo = &hash_algos[algo];
> return 0;
> }
> + if (skip_prefix(capability, "filter=", &arg)) {
> + parse_list_objects_filter(&header->filter, arg);
> + return 0;
> + }
> return error(_("unknown capability '%s'"), capability);
> }
I haven't tested, but I did wonder if our purely "check reachability"
(or equivalent) "verify" would be slowed down by doing whatever filter
magic we're doing here, but then remembered/saw that we only parse the
header, so it can't be that bad :)
I.e. this is only checking the syntax of the filter, surely, and then
spits it back at us. That makes sense.
I think that this hunk from the subsequent 10/12 is in the wrong place
though, and should be here when we change "verify" (not "create" later):
diff --git a/Documentation/git-bundle.txt b/Documentation/git-bundle.txt
index 72ab8139052..831c4788a94 100644
--- a/Documentation/git-bundle.txt
+++ b/Documentation/git-bundle.txt
@@ -75,8 +75,8 @@ verify <file>::
cleanly to the current repository. This includes checks on the
bundle format itself as well as checking that the prerequisite
commits exist and are fully linked in the current repository.
- 'git bundle' prints a list of missing commits, if any, and exits
- with a non-zero status.
+ 'git bundle' prints the bundle's object filter and its list of
+ missing commits, if any, and exits with a non-zero status.
list-heads <file>::
Lists the references defined in the bundle. If followed by a
I think instead of starting to list every header we might add in the
future there it would make sense to just add after the pre-image
full-stop something like:
We'll also print out information about any known capabilities, such
as "object filter". See "Capabilities" in technical/...
Which future proofs it a bit...
> [...]
I think it would also be great to have tests for intentionally
corrupting the header and seeing what the "verify" output is, i.e. do we
die right away, proceed to validate the rest. So just (somewhat
pseudocode):
git bundle create [...] my.bdl &&
sed 's/@filter: .*/@filter: bad/' <my.bdl >bad.bdl &&
test_must_fail git bundle verify bad.bdl 2>actual &&
[...]
test_cmp expect actual
Overall this series looks really good at this point, and I'm down to
minor shades of colors on the bikeshedding :)
805e1d1
to
e14691b
Compare
As we add more embedded members with type 'struct list_objects_filter_options', it will be important to easily perform a deep copy across multiple such structs. Create list_objects_filter_copy() to satisfy this need. This method is recursive to match the recursive nature of the struct. Signed-off-by: Derrick Stolee <derrickstolee@github.com>
e14691b
to
3507a1b
Compare
Placing a 'struct list_objects_filter_options' within 'struct rev_info' will assist making some bookkeeping around object filters in the future. For now, let's use this new member to remove a static global instance of the struct from builtin/rev-list.c. Signed-off-by: Derrick Stolee <derrickstolee@github.com>
In builtin/pack-objects.c, we use a 'filter_options' global to populate the --filter=<X> argument. The previous change created a pointer to a filter option in 'struct rev_info', so we can use that pointer here as a start to simplifying some usage of object filters. Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Now that all consumers of prepare_bitmap_walk() have populated the 'filter' member of 'struct rev_info', we can drop that extra parameter from the method and access it directly from the 'struct rev_info'. Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Now that all consumers of traverse_commit_list_filtered() populate the 'filter' member of 'struct rev_info', we can drop that parameter from the method prototype to simplify things. In addition, the only thing different now between traverse_commit_list_filtered() and traverse_commit_list() is the presence of the 'omitted' parameter, which is only non-NULL for one caller. We can consolidate these two methods by having one call the other and use the simpler form everywhere the 'omitted' parameter would be NULL. Signed-off-by: Derrick Stolee <derrickstolee@github.com>
The previous change consolidated traverse_commit_list() and traverse_commit_list_filtered(). This allows us to simplify the recommended usage in MyFirstObjectWalk.txt to use this new set of values. While here, add some clarification on the difference between the two methods. Signed-off-by: Derrick Stolee <derrickstolee@github.com>
If a caller to traverse_commit_list() specifies the options for the --objects flag but does not specify a show_object function pointer, the result is a segfault. This is currently visible by running 'git bundle create --objects HEAD'. We could fix this problem by supplying a no-op callback in builtin/bundle.c, but that only solves the problem for one builtin, leaving this segfault open for other callers. Replace all callers of the show_commit and show_object function pointers in list-objects.c to call helper functions show_commit() and show_object() which check that the given context has non-NULL functions before passing the necessary data. One extra benefit is that it reduces duplication due to passing ctx->show_data to every caller. Test that this segfault no longer occurs for 'git bundle'. Co-authored-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Derrick Stolee <derrickstolee@github.com>
The v3 bundle format has capabilities, allowing newer versions of Git to create bundles with newer features. Older versions that do not understand these new capabilities will fail with a helpful warning. Create a new capability allowing Git to understand that the contained pack-file is filtered according to some object filter. Typically, this filter will be "blob:none" for a blobless partial clone. This change teaches Git to parse this capability, place its value in the bundle header, and demonstrate this understanding by adding a message to 'git bundle verify'. Since we will use gently_parse_list_objects_filter() outside of list-objects-filter-options.c, make it an external method and move its API documentation to before its declaration. Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Now that 'struct rev_info' has a 'filter' member and most consumers of object filtering are using that member instead of an external struct, move the parsing of the '--filter' option out of builtin/rev-list.c and into revision.c. This use within handle_revision_pseudo_opt() allows us to find the option within setup_revisions() if the arguments are passed directly. In the case of a command such as 'git blame', the arguments are first scanned and checked with parse_revision_opt(), which complains about the option, so 'git blame --filter=blob:none <file>' does not become valid with this change. Some commands, such as 'git diff' gain this option without having it make an effect. And 'git diff --objects' was already possible, but does not actually make sense in that builtin. The key addition that is coming is 'git bundle create --filter=<X>' so we can create bundles containing promisor packs. More work is required to make them fully functional, but that will follow. Signed-off-by: Derrick Stolee <derrickstolee@github.com>
A previous change allowed Git to parse bundles with the 'filter' capability. Now, teach Git to create bundles with this option. Some rearranging of code is required to get the option parsing in the correct spot. There are now two reasons why we might need capabilities (a new hash algorithm or an object filter) so that is pulled out into a place where we can check both at the same time. The --filter option is parsed as part of setup_revisions(), but it expected the --objects flag, too. That flag is somewhat implied by 'git bundle' because it creates a pack-file walking objects, but there is also a walk that walks the revision range expecting only commits. Make this parsing work by setting 'revs.tree_objects' and 'revs.blob_objects' before the call to setup_revisions(). Signed-off-by: Derrick Stolee <derrickstolee@github.com>
In order to have a valid pack-file after unbundling a bundle that has the 'filter' capability, we need to generate a .promisor file. The bundle does not promise _where_ the objects can be found, but we can expect that these bundles will be unbundled in repositories with appropriate promisor remotes that can find those missing objects. Use the 'git index-pack --promisor=<message>' option to create this .promisor file. Add "from-bundle" as the message to help anyone diagnose issues with these promisor packs. Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Users can create a new repository using 'git clone <bundle-file>'. The new "@filter" capability for bundles means that we can generate a bundle that does not contain all reachable objects, even if the header has no negative commit OIDs. It is feasible to think that we could make a filtered bundle work with the command git clone --filter=$filter --bare <bundle-file> or possibly replacing --bare with --no-checkout. However, this requires having some repository-global config that specifies the specified object filter and notifies Git about the existence of promisor pack-files. Without a remote, that is currently impossible. As a stop-gap, parse the bundle header during 'git clone' and die() with a helpful error message instead of the current behavior of failing due to "missing objects". Most of the existing logic for handling bundle clones actually happens in fetch-pack.c, but that logic is the same as if the user specified 'git fetch <bundle>', so we want to avoid failing to fetch a filtered bundle when in an existing repository that has the proper config set up for at least one remote. Carefully comment around the test that this is not the desired long-term behavior of 'git clone' in this case, but instead that we need to do more work before that is possible. Signed-off-by: Derrick Stolee <derrickstolee@github.com>
3507a1b
to
de6a1a8
Compare
/submit |
Submitted as pull.1159.v4.git.1646841703.gitgitgadget@gmail.com To fetch this version into
To fetch this version to local tag
|
@@ -75,8 +75,11 @@ verify <file>:: | |||
cleanly to the current repository. This includes checks on the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Junio C Hamano wrote (reply to this):
"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> diff --git a/Documentation/git-bundle.txt b/Documentation/git-bundle.txt
> index 72ab8139052..ac4c4352aae 100644
> --- a/Documentation/git-bundle.txt
> +++ b/Documentation/git-bundle.txt
> @@ -75,8 +75,11 @@ verify <file>::
> cleanly to the current repository. This includes checks on the
> bundle format itself as well as checking that the prerequisite
> commits exist and are fully linked in the current repository.
> - 'git bundle' prints a list of missing commits, if any, and exits
> - with a non-zero status.
> + Information about additional capabilities, such as "object filter",
> + is printed. See "Capabilities" in link:technical/bundle-format.html
> + for more information. Finally, 'git bundle' prints a list of
> + missing commits, if any. The exit code is zero for success, but
> + will be nonzero if the bundle file is invalid.
Hmph. I wasn't expecting this change (not objecting, but mostly am
surprised) relative to the previous round where the filter was
mentioned only when we issue an error message. I was expecting to
see something like "list-filters <file>", which is analog to the
"list-heads <file>", to help those who want to programatically build
around the "git bundle" command output. Or "--list-capabilities" to
accomodate the current, this, and future capabilities. We already
have the object-format thing before this series. Do we have an
interface to expose that out of a given bundle file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Derrick Stolee wrote (reply to this):
On 3/9/2022 1:41 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
>> diff --git a/Documentation/git-bundle.txt b/Documentation/git-bundle.txt
>> index 72ab8139052..ac4c4352aae 100644
>> --- a/Documentation/git-bundle.txt
>> +++ b/Documentation/git-bundle.txt
>> @@ -75,8 +75,11 @@ verify <file>::
>> cleanly to the current repository. This includes checks on the
>> bundle format itself as well as checking that the prerequisite
>> commits exist and are fully linked in the current repository.
>> - 'git bundle' prints a list of missing commits, if any, and exits
>> - with a non-zero status.
>> + Information about additional capabilities, such as "object filter",
>> + is printed. See "Capabilities" in link:technical/bundle-format.html
>> + for more information. Finally, 'git bundle' prints a list of
>> + missing commits, if any. The exit code is zero for success, but
>> + will be nonzero if the bundle file is invalid.
>
> Hmph. I wasn't expecting this change (not objecting, but mostly am
> surprised) relative to the previous round where the filter was
> mentioned only when we issue an error message. I was expecting to
> see something like "list-filters <file>", which is analog to the
> "list-heads <file>", to help those who want to programatically build
> around the "git bundle" command output. Or "--list-capabilities" to
> accomodate the current, this, and future capabilities. We already
> have the object-format thing before this series. Do we have an
> interface to expose that out of a given bundle file?
The object format does _not_ appear to be output by 'git bundle verify'.
I just ran t6020 under GIT_TEST_DEFAULT_HASH=sha256 and see the
capability being written, but not output by verify:
The bundle contains these 10 refs:
d519553fbcf280df4448d588c25a51872f2d8dec95ba65a8a1bd3c64a5eec664 refs/heads/main
265b1effb3fdb80e04f7ea64e717f5677ddf57d00145dce7c508ba1f5ddb9081 refs/heads/release
611ac8182ea26d7aad227873b70f584593af1aa584bbdd37b36055e71be6ccd7 refs/heads/topic/1
4251af01ec70cdca692a3d15d78ccb9a6ca92ef344bd2dbc3bac20081347ae9b refs/heads/topic/2
611ac8182ea26d7aad227873b70f584593af1aa584bbdd37b36055e71be6ccd7 refs/pull/1/head
ec7e40a591df46923b25fd44bd86a2a80927f343d141f55ddf295c5d2d57959e refs/pull/2/head
754e9363bbfce179d35ccc48ae3a3c81db95a489cc632fafe5c10b25aed29d74 refs/tags/v1
a96c78650835f2041c49dce964bb759add14cfc4d35af3b7ee2b22289f9ba817 refs/tags/v2
398a930e72d21ea455c982227cca3c8fb5feb88c31f1d42226a3e6c42ff8db8f refs/tags/v3
d519553fbcf280df4448d588c25a51872f2d8dec95ba65a8a1bd3c64a5eec664 HEAD
The bundle uses this filter: blob:none
The bundle records a complete history.
partial.bdl is okay
This output does not seem to be designed for machine parsing, so
this extension of "The bundle uses this filter:" shouldn't be
problematic. A similar addition for object-format should be
possible as a follow-up.
Further, it seems you are asking for something that just reads
the header and supplies valuable data from the header such as
the refs and the capabilities. Perhaps 'git bundle header <file>'?
(We already have 'git bundle list-heads', so maybe we should
leave the refs out of 'git bundle header' output?)
I can add both as follow-ups to my existing list of things.
Thanks,
-Stolee
@@ -62,7 +62,6 @@ static const char rev_list_usage[] = | |||
static struct progress *progress; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Junio C Hamano wrote (reply to this):
"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> diff --git a/revision.h b/revision.h
> index 3c58c18c63a..b1669a8cc33 100644
> --- a/revision.h
> +++ b/revision.h
> @@ -8,6 +8,7 @@
> #include "pretty.h"
> #include "diff.h"
> #include "commit-slab-decl.h"
> +#include "list-objects-filter-options.h"
>
> /**
> * The revision walking API offers functions to build a list of revisions
> @@ -94,6 +95,12 @@ struct rev_info {
> /* The end-points specified by the end user */
> struct rev_cmdline_info cmdline;
>
> + /*
> + * Object filter options. No filtering is specified
> + * if and only if filter.choice is zero.
> + */
> + struct list_objects_filter_options filter;
I wondered if s/zero/LOFC_DISABLED/ would make it more helpful, but
seeing changes like the one i the later "parse filter capability"
step, which can just become
- if (header->filter.choice != LOFC_DISABLED) {
+ if (header->filter.choice) {
... do things ...
relative to the previous round, I think what is in the posted patch
indeed is a better way to help developers.
This patch series was integrated into seen via git@455f075. |
This patch series was integrated into seen via git@3b3888b. |
@@ -3651,7 +3651,7 @@ static int pack_options_allow_reuse(void) | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Ævar Arnfjörð Bjarmason wrote (reply to this):
On Wed, Mar 09 2022, Derrick Stolee via GitGitGadget wrote:
> From: Derrick Stolee <derrickstolee@github.com>
>
> In builtin/pack-objects.c, we use a 'filter_options' global to populate
> the --filter=<X> argument. The previous change created a pointer to a
> filter option in 'struct rev_info', so we can use that pointer here as a
> start to simplifying some usage of object filters.
>
> Signed-off-by: Derrick Stolee <derrickstolee@github.com>
> ---
> builtin/pack-objects.c | 5 +++--
> 1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
> index ba2006f2212..e5b7d015d7d 100644
> --- a/builtin/pack-objects.c
> +++ b/builtin/pack-objects.c
> @@ -3651,7 +3651,7 @@ static int pack_options_allow_reuse(void)
>
> static int get_object_list_from_bitmap(struct rev_info *revs)
> {
> - if (!(bitmap_git = prepare_bitmap_walk(revs, &filter_options, 0)))
> + if (!(bitmap_git = prepare_bitmap_walk(revs, &revs->filter, 0)))
> return -1;
>
> if (pack_options_allow_reuse() &&
> @@ -3727,6 +3727,7 @@ static void get_object_list(int ac, const char **av)
> repo_init_revisions(the_repository, &revs, NULL);
> save_commit_buffer = 0;
> setup_revisions(ac, av, &revs, &s_r_opt);
> + list_objects_filter_copy(&revs.filter, &filter_options);
>
> /* make sure shallows are read */
> is_repository_shallow(the_repository);
> @@ -3777,7 +3778,7 @@ static void get_object_list(int ac, const char **av)
>
> if (!fn_show_object)
> fn_show_object = show_object;
> - traverse_commit_list_filtered(&filter_options, &revs,
> + traverse_commit_list_filtered(&revs.filter, &revs,
> show_commit, fn_show_object, NULL,
> NULL);
Re your
https://lore.kernel.org/git/77c8ef4b-5dce-401b-e703-cfa32e18c853@github.com/
I was looking at how to handle the interaction between this and my
revisions_release() series.
This adds a new memory leak, which can be seen with:
make SANITIZE=leak
(cd t && ./t5532-fetch-proxy.sh -vixd)
I.e. this part is new:
remote: Direct leak of 1 byte(s) in 1 object(s) allocated from:
remote: #0 0x4552f8 in __interceptor_malloc (git+0x4552f8)
remote: #1 0x75a089 in do_xmalloc wrapper.c:41:8
remote: #2 0x75a046 in xmalloc wrapper.c:62:9
remote: #3 0x62c692 in list_objects_filter_copy list-objects-filter-options.c:433:2
remote: #4 0x4f70bf in get_object_list builtin/pack-objects.c:3730:2
remote: #5 0x4f5e0e in cmd_pack_objects builtin/pack-objects.c:4157:3
remote: #6 0x4592ba in run_builtin git.c:465:11
remote: #7 0x457d71 in handle_builtin git.c:718:3
remote: #8 0x458ca5 in run_argv git.c:785:4
remote: #9 0x457b30 in cmd_main git.c:916:19
remote: #10 0x563179 in main common-main.c:56:11
remote: #11 0x7fddd2da67ec in __libc_start_main csu/../csu/libc-start.c:332:16
remote: #12 0x4300e9 in _start (git+0x4300e9)
Of course it's not "new" in the sense that we in effect leaked this
before, but it was still reachable, you're just changing it so that
instead of being stored in the static "filter_options" variable in
pack-objects.c we're storing it in "struct rev_info", which has a
different lifetime.
I think instead of me rebasing my series on top of yours and tangling
the two up a better option is to just add a change to this, so after
list_objects_filter_copy() do:
UNLEAK(revs.filter);
Or, alternatively adding this to the end of the function (in which case
Junio will need to deal with a minor textual conflict):
list_objects_filter_release(&revs.filter);
Both of those make my series merged with "seen" (which has this change)
pass with SANITIZE=leak + GIT_TEST_PASSING_SANITIZE_LEAK=true again.
You could do the same in your later change adding
list_objects_filter_copy() to verify_bundle(), that one also adds a new
leak, but happens not to cause test failures since the bundle.c code
isn't otherwise marked as passing with SANITIZE=leak, it fails in
various other ways.
Obviously we should do something about the actual leak eventually, but
that can be done in some follow-up work to finish up the missing bits of
release_revisions(), i.e. adding list_objects_filter_release() etc. to
release_revisions().
So I think just adding UNLEAK() here (and optionally, also to the
bundle.c code) is the least invasive thing, if you & Junio are OK with
that approach.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Derrick Stolee wrote (reply to this):
On 3/10/2022 8:11 AM, Ævar Arnfjörð Bjarmason wrote:
>
> On Wed, Mar 09 2022, Derrick Stolee via GitGitGadget wrote:
>
>> From: Derrick Stolee <derrickstolee@github.com>
>>
>> In builtin/pack-objects.c, we use a 'filter_options' global to populate
>> the --filter=<X> argument. The previous change created a pointer to a
>> filter option in 'struct rev_info', so we can use that pointer here as a
>> start to simplifying some usage of object filters.
>>
>> Signed-off-by: Derrick Stolee <derrickstolee@github.com>
>> ---
>> builtin/pack-objects.c | 5 +++--
>> 1 file changed, 3 insertions(+), 2 deletions(-)
>>
>> diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
>> index ba2006f2212..e5b7d015d7d 100644
>> --- a/builtin/pack-objects.c
>> +++ b/builtin/pack-objects.c
>> @@ -3651,7 +3651,7 @@ static int pack_options_allow_reuse(void)
>>
>> static int get_object_list_from_bitmap(struct rev_info *revs)
>> {
>> - if (!(bitmap_git = prepare_bitmap_walk(revs, &filter_options, 0)))
>> + if (!(bitmap_git = prepare_bitmap_walk(revs, &revs->filter, 0)))
>> return -1;
>>
>> if (pack_options_allow_reuse() &&
>> @@ -3727,6 +3727,7 @@ static void get_object_list(int ac, const char **av)
>> repo_init_revisions(the_repository, &revs, NULL);
>> save_commit_buffer = 0;
>> setup_revisions(ac, av, &revs, &s_r_opt);
>> + list_objects_filter_copy(&revs.filter, &filter_options);
>>
>> /* make sure shallows are read */
>> is_repository_shallow(the_repository);
>> @@ -3777,7 +3778,7 @@ static void get_object_list(int ac, const char **av)
>>
>> if (!fn_show_object)
>> fn_show_object = show_object;
>> - traverse_commit_list_filtered(&filter_options, &revs,
>> + traverse_commit_list_filtered(&revs.filter, &revs,
>> show_commit, fn_show_object, NULL,
>> NULL);
>
> Re your
> https://lore.kernel.org/git/77c8ef4b-5dce-401b-e703-cfa32e18c853@github.com/
> I was looking at how to handle the interaction between this and my
> revisions_release() series.
>
> This adds a new memory leak, which can be seen with:
>
> make SANITIZE=leak
> (cd t && ./t5532-fetch-proxy.sh -vixd)
>
> I.e. this part is new:
>
> remote: Direct leak of 1 byte(s) in 1 object(s) allocated from:
> remote: #0 0x4552f8 in __interceptor_malloc (git+0x4552f8)
> remote: #1 0x75a089 in do_xmalloc wrapper.c:41:8
> remote: #2 0x75a046 in xmalloc wrapper.c:62:9
> remote: #3 0x62c692 in list_objects_filter_copy list-objects-filter-options.c:433:2
> remote: #4 0x4f70bf in get_object_list builtin/pack-objects.c:3730:2
> remote: #5 0x4f5e0e in cmd_pack_objects builtin/pack-objects.c:4157:3
> remote: #6 0x4592ba in run_builtin git.c:465:11
> remote: #7 0x457d71 in handle_builtin git.c:718:3
> remote: #8 0x458ca5 in run_argv git.c:785:4
> remote: #9 0x457b30 in cmd_main git.c:916:19
> remote: #10 0x563179 in main common-main.c:56:11
> remote: #11 0x7fddd2da67ec in __libc_start_main csu/../csu/libc-start.c:332:16
> remote: #12 0x4300e9 in _start (git+0x4300e9)
>
> Of course it's not "new" in the sense that we in effect leaked this
> before, but it was still reachable, you're just changing it so that
> instead of being stored in the static "filter_options" variable in
> pack-objects.c we're storing it in "struct rev_info", which has a
> different lifetime.
True, and 'struct rev_info' is not being release in any way, either,
right?
> I think instead of me rebasing my series on top of yours and tangling
> the two up a better option is to just add a change to this, so after
> list_objects_filter_copy() do:
>
> UNLEAK(revs.filter);
>
> Or, alternatively adding this to the end of the function (in which case
> Junio will need to deal with a minor textual conflict):
>
> list_objects_filter_release(&revs.filter);
>
> Both of those make my series merged with "seen" (which has this change)
> pass with SANITIZE=leak + GIT_TEST_PASSING_SANITIZE_LEAK=true again.
>
> You could do the same in your later change adding
> list_objects_filter_copy() to verify_bundle(), that one also adds a new
> leak, but happens not to cause test failures since the bundle.c code
> isn't otherwise marked as passing with SANITIZE=leak, it fails in
> various other ways.
>
> Obviously we should do something about the actual leak eventually, but
> that can be done in some follow-up work to finish up the missing bits of
> release_revisions(), i.e. adding list_objects_filter_release() etc. to
> release_revisions().
I understand that you like to "show your work" by marking tests as
safe for leak-check by making the smallest changes possible, but your
series has a lot of small patches that do nothing but add a free() or
release_*() call instead of implementing the "right" release_revisions()
from the start.
> So I think just adding UNLEAK() here (and optionally, also to the
> bundle.c code) is the least invasive thing, if you & Junio are OK with
> that approach.
Could you send a patch that does just that, so we are sure to cover
the warnings you are seeing in your tests?
Thanks,
-Stolee
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Ævar Arnfjörð Bjarmason wrote (reply to this):
On Thu, Mar 10 2022, Derrick Stolee wrote:
> On 3/10/2022 8:11 AM, Ævar Arnfjörð Bjarmason wrote:
>>
>> On Wed, Mar 09 2022, Derrick Stolee via GitGitGadget wrote:
>>
>>> From: Derrick Stolee <derrickstolee@github.com>
>>>
>>> In builtin/pack-objects.c, we use a 'filter_options' global to populate
>>> the --filter=<X> argument. The previous change created a pointer to a
>>> filter option in 'struct rev_info', so we can use that pointer here as a
>>> start to simplifying some usage of object filters.
>>>
>>> Signed-off-by: Derrick Stolee <derrickstolee@github.com>
>>> ---
>>> builtin/pack-objects.c | 5 +++--
>>> 1 file changed, 3 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
>>> index ba2006f2212..e5b7d015d7d 100644
>>> --- a/builtin/pack-objects.c
>>> +++ b/builtin/pack-objects.c
>>> @@ -3651,7 +3651,7 @@ static int pack_options_allow_reuse(void)
>>>
>>> static int get_object_list_from_bitmap(struct rev_info *revs)
>>> {
>>> - if (!(bitmap_git = prepare_bitmap_walk(revs, &filter_options, 0)))
>>> + if (!(bitmap_git = prepare_bitmap_walk(revs, &revs->filter, 0)))
>>> return -1;
>>>
>>> if (pack_options_allow_reuse() &&
>>> @@ -3727,6 +3727,7 @@ static void get_object_list(int ac, const char **av)
>>> repo_init_revisions(the_repository, &revs, NULL);
>>> save_commit_buffer = 0;
>>> setup_revisions(ac, av, &revs, &s_r_opt);
>>> + list_objects_filter_copy(&revs.filter, &filter_options);
>>>
>>> /* make sure shallows are read */
>>> is_repository_shallow(the_repository);
>>> @@ -3777,7 +3778,7 @@ static void get_object_list(int ac, const char **av)
>>>
>>> if (!fn_show_object)
>>> fn_show_object = show_object;
>>> - traverse_commit_list_filtered(&filter_options, &revs,
>>> + traverse_commit_list_filtered(&revs.filter, &revs,
>>> show_commit, fn_show_object, NULL,
>>> NULL);
>>
>> Re your
>> https://lore.kernel.org/git/77c8ef4b-5dce-401b-e703-cfa32e18c853@github.com/
>> I was looking at how to handle the interaction between this and my
>> revisions_release() series.
>>
>> This adds a new memory leak, which can be seen with:
>>
>> make SANITIZE=leak
>> (cd t && ./t5532-fetch-proxy.sh -vixd)
>>
>> I.e. this part is new:
>>
>> remote: Direct leak of 1 byte(s) in 1 object(s) allocated from:
>> remote: #0 0x4552f8 in __interceptor_malloc (git+0x4552f8)
>> remote: #1 0x75a089 in do_xmalloc wrapper.c:41:8
>> remote: #2 0x75a046 in xmalloc wrapper.c:62:9
>> remote: #3 0x62c692 in list_objects_filter_copy list-objects-filter-options.c:433:2
>> remote: #4 0x4f70bf in get_object_list builtin/pack-objects.c:3730:2
>> remote: #5 0x4f5e0e in cmd_pack_objects builtin/pack-objects.c:4157:3
>> remote: #6 0x4592ba in run_builtin git.c:465:11
>> remote: #7 0x457d71 in handle_builtin git.c:718:3
>> remote: #8 0x458ca5 in run_argv git.c:785:4
>> remote: #9 0x457b30 in cmd_main git.c:916:19
>> remote: #10 0x563179 in main common-main.c:56:11
>> remote: #11 0x7fddd2da67ec in __libc_start_main csu/../csu/libc-start.c:332:16
>> remote: #12 0x4300e9 in _start (git+0x4300e9)
>>
>> Of course it's not "new" in the sense that we in effect leaked this
>> before, but it was still reachable, you're just changing it so that
>> instead of being stored in the static "filter_options" variable in
>> pack-objects.c we're storing it in "struct rev_info", which has a
>> different lifetime.
>
> True, and 'struct rev_info' is not being release in any way, either,
> right?
Yes, sorry to not be clear there. There's 2x other leaks just in that
one test on "master", my series addresses those, but then the 3rd
"added" here will be left behind.
>> I think instead of me rebasing my series on top of yours and tangling
>> the two up a better option is to just add a change to this, so after
>> list_objects_filter_copy() do:
>>
>> UNLEAK(revs.filter);
>>
>> Or, alternatively adding this to the end of the function (in which case
>> Junio will need to deal with a minor textual conflict):
>>
>> list_objects_filter_release(&revs.filter);
>>
>> Both of those make my series merged with "seen" (which has this change)
>> pass with SANITIZE=leak + GIT_TEST_PASSING_SANITIZE_LEAK=true again.
>>
>> You could do the same in your later change adding
>> list_objects_filter_copy() to verify_bundle(), that one also adds a new
>> leak, but happens not to cause test failures since the bundle.c code
>> isn't otherwise marked as passing with SANITIZE=leak, it fails in
>> various other ways.
>>
>> Obviously we should do something about the actual leak eventually, but
>> that can be done in some follow-up work to finish up the missing bits of
>> release_revisions(), i.e. adding list_objects_filter_release() etc. to
>> release_revisions().
>
> I understand that you like to "show your work" by marking tests as
> safe for leak-check by making the smallest changes possible, but your
> series has a lot of small patches that do nothing but add a free() or
> release_*() call instead of implementing the "right" release_revisions()
> from the start.
Yes, another way to do it would be to add the end-state
release_revisions() I have and incrementally add it everywhere.
The way I opted to do it admittedly results in a bit more churn, but was
(and still is) very useful to bisect and debug any changes.
I.e. I can easily pinpoint for any failures what exact rev.* member
being released caused a failure, which I couldn't do if I added the
end-state release_revisions() and then started adding it to code in
various places.
Then a failure due to adding it to say pack-objects wouldn't be easily
distinguishable from a failure due to releasing rev.SOMETHING in
pack-objects.
But in any case, the interaction with tips of these two sets of patches
(this series & mine) would be the same whatever the intra-series
progression I opted for is.
>> So I think just adding UNLEAK() here (and optionally, also to the
>> bundle.c code) is the least invasive thing, if you & Junio are OK with
>> that approach.
>
> Could you send a patch that does just that, so we are sure to cover
> the warnings you are seeing in your tests?
I'm suggesting fixing the following up into this series, the first hunk
is needed for the interaction with the release_revisions() series, the
second is there for completeness, but isn't required currently:
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 1a0b0950a28..ffe6197729c 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -3987,6 +3987,7 @@ static void get_object_list(int ac, const char **av)
save_commit_buffer = 0;
setup_revisions(ac, av, &revs, &s_r_opt);
list_objects_filter_copy(&revs.filter, &filter_options);
+ UNLEAK(revs.filter);
/* make sure shallows are read */
is_repository_shallow(the_repository);
diff --git a/bundle.c b/bundle.c
index e359370cfcd..90cfea0c984 100644
--- a/bundle.c
+++ b/bundle.c
@@ -226,6 +226,7 @@ int verify_bundle(struct repository *r,
setup_revisions(2, argv, &revs, NULL);
list_objects_filter_copy(&revs.filter, &header->filter);
+ UNLEAK(revs.filter);
if (prepare_revision_walk(&revs))
die(_("revision walk setup failed"));
This patch series was integrated into seen via git@6b8bcc1. |
This patch series was integrated into seen via git@39a1b59. |
This patch series was integrated into seen via git@3126a82. |
This patch series was integrated into next via git@03529ea. |
There was a status update in the "Cooking" section about the branch Bundle file format gets extended to allow a partial bundle, filtered by similar criteria you would give when making a partial/lazy clone. Will merge to 'master'. source: <pull.1159.v4.git.1646841703.gitgitgadget@gmail.com> |
While discussing bundle-URIs [1], it came to my attention that bundles have no way to specify an object filter, so bundles cannot be used with partial clones.
[1] https://lore.kernel.org/git/7fab28bf-54e7-d0e9-110a-53fad6244cc9@gmail.com/
This series provides a way to fix that by adding a 'filter' capability to the bundle file format and allowing one to create a partial bundle with 'git bundle create --filter=blob:none '.
There are a couple things that I want to point out about this implementation that could use some high-level feedback:
I moved the '--filter' parsing into setup_revisions() instead of adding another place to parse it. This works for 'git bundle' but it also allows it to be parsed successfully in commands such as 'git diff' which doesn't make sense. Options such as '--objects' are already being parsed there, and they don't make sense either, so I want some thoughts on this.
If someone uses 'git clone partial.bdl partial' where 'partial.bdl' is a filtered bundle, then the clone will fail with a message such as
fatal: missing blob object '9444604d515c0b162e37e59accd54a0bac50ed2e'
fatal: remote did not send all necessary objects
This might be fine. We don't expect users to clone partial bundles or fetch partial bundles into an unfiltered repo and these failures are expected. It is possible that we could put in custom logic to fail faster by reading the bundle header for a filter.
Generally, the idea is to open this up as a potential way to bootstrap a clone of a partial clone using a set of precomputed partial bundles.
Updates in v4
'struct rev_info' now has 'filter' embedded statically, using filter.choice to indicate if it is an empty filter. This makes the range-diff look really messy, as lots of '&' characters are inserted, especially in the middle patches. The end result looks very similar.
To accommodate previous lines that were a pointer copy, create list_objects_filter_copy() to assist with deep copies of filters.
Commit message typo fixes.
Documentation improvements.
Tests use 'sed' over 'head' to be more robust to future changes.
Initialization of a 'struct traversal_context' is made more compact.
"filter.choice != LOFC_DISABLED" is replaced by "filter.choice", since LOFC_DISABLED is zero by design.
Updates in v3
'struct bundle_header' now has 'filter' embedded statically, using filter.choice to indicate if it is an empty filter.
list-objects.c is now more robust to NULL function pointers.
Updates in v2
Thanks for the reviews, Jeff, Junio, and Ævar!
Commit message typos and grammar are improved.
Grammar in MyFirstObjectWalk.txt is improved.
Unnecessary line wrapping is unwrapped.
Final test to check unbundled repo is made more rigorous.
The new 'filter' capability is added to Documentation/technical/bundle-format.txt
Expanded docs for 'git bundle verify'.
Moved API docs gently_parse_list_objects_filter() to header.
Test name swaps '' with "" to evaluate $filter.
Added a new patch that helps
git clone <bundle>
fail gracefully when<bundle>
is has a filter capability.Thanks,
-Stolee
cc: stolee@gmail.com
cc: avarab@gmail.com
cc: gitster@pobox.com
cc: zhiyou.jx@alibaba-inc.com
cc: jonathantanmy@google.com
cc: Jeff Hostetler git@jeffhostetler.com