Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

object checking related additions and fixes for bundles in fetches #1730

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

blanet
Copy link

@blanet blanet commented May 15, 2024

While attempting to fix a reference negotiation bug in bundle-uri, we
identified that the fetch process lacks some crucial object validation
checks when processing bundles. The primary issues are:

  1. In the bundle-uri scenario, object IDs were not validated before
    writing bundle references. This was the root cause of the original
    negotiation bug in bundle-uri and could lead to potential repository
    corruption.
  2. The existing "fetch.fsckObjects" and "transfer.fsckObjects"
    configurations were not applied when directly fetching bundles or
    fetching with bundle-uri enabled. In fact, there were no object
    validation supports for unbundle.

The first patch addresses the bundle-uri negotiation issue by removing
the REF_SKIP_OID_VERIFICATION flag when writing bundle references.

Patches 2 through 3 extend verify_bundle_flags for bundle.c:unbundle
to add support for object validation (fsck) in fetch scenarios, mainly
following the suggestions from Junio and Patrick on the mailing list.

cc: Patrick Steinhardt ps@pks.im
cc: Karthik Nayak karthik.188@gmail.com

@blanet
Copy link
Author

blanet commented May 15, 2024

/preview

Copy link

gitgitgadget bot commented May 15, 2024

Preview email sent as pull.1730.git.1715741156686.gitgitgadget@gmail.com

@blanet
Copy link
Author

blanet commented May 15, 2024

/submit

Copy link

gitgitgadget bot commented May 15, 2024

Submitted as pull.1730.git.1715742069966.gitgitgadget@gmail.com

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git/ pr-1730/blanet/xx/bundle-uri-bug-using-bundle-list-v1

To fetch this version to local tag pr-1730/blanet/xx/bundle-uri-bug-using-bundle-list-v1:

git fetch --no-tags https://github.com/gitgitgadget/git/ tag pr-1730/blanet/xx/bundle-uri-bug-using-bundle-list-v1

Copy link

gitgitgadget bot commented May 17, 2024

On the Git mailing list, Patrick Steinhardt wrote (reply to this):

On Wed, May 15, 2024 at 03:01:09AM +0000, blanet via GitGitGadget wrote:
> From: Xing Xin <xingxin.xx@bytedance.com>

Long time no see :)

> When using the bundle-uri mechanism with a bundle list containing
> multiple interrelated bundles, we encountered a bug where tips from
> downloaded bundles were not being discovered, resulting in rather slow
> clones. This was particularly problematic when employing the heuristic
> `creationTokens`.
> 
> And this is easy to reproduce. Suppose we have a repository with a
> single branch `main` pointing to commit `A`, firstly we create a base
> bundle with
> 
>   git bundle create base.bundle main
> 
> Then let's add a new commit `B` on top of `A`, so that an incremental
> bundle for `main` can be created with
> 
>   git bundle create incr.bundle A..main
> 
> Now we can generate a bundle list with the following content:
> 
>   [bundle]
>       version = 1
>       mode = all
>       heuristic = creationToken
> 
>   [bundle "base"]
>       uri = base.bundle
>       creationToken = 1
> 
>   [bundle "incr"]
>       uri = incr.bundle
>       creationToken = 2
> 
> A fresh clone with the bundle list above would give the expected
> `refs/bundles/main` pointing at `B` in new repository, in other words we
> already had everything locally from the bundles, but git would still
> download everything from server as if we got nothing.
> 
> So why the `refs/bundles/main` is not discovered? After some digging I
> found that:
> 
> 1. when unbundling a downloaded bundle, a `verify_bundle` is called to
>    check its prerequisites if any. The verify procedure would find oids
>    so `packed_git` is initialized.
> 
> 2. after unbundled all bundles, we would enter `do_fetch_pack_v2`,
>    during which `mark_complete_and_common_ref` and `mark_tips` would
>    find oids with `OBJECT_INFO_QUICK` flag set, so no new packs would be
>    enlisted if `packed_git` has already initialized in 1.

And I assume we do not want it to not use `OBJECT_INFO_QUICK`?

> Back to the example above, when unbunding `incr.bundle`, `base.pack` is
> enlisted to `packed_git` bacause of the prerequisites to verify. Then we
> can not find `B` for negotiation at a latter time bacause `B` exists in
> `incr.pack` which is not enlisted in `packed_git`.

Okay, the explanation feels sensible.

> This commit fixes this by adding a `reprepare_packed_git` call for every
> successfully unbundled bundle, which ensures to enlist all generated
> packs from bundle uri. And a set of negotiation related tests are added.

This makes me wonder though. Do we really need to call
`reprepare_packed_git()` once for every bundle, or can't we instead call
it once at the end once we have fetched all bundles? Reading on.

> Signed-off-by: Xing Xin <xingxin.xx@bytedance.com>
> ---
>     bundle-uri: refresh packed_git if unbundle succeed
> 
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1730%2Fblanet%2Fxx%2Fbundle-uri-bug-using-bundle-list-v1
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1730/blanet/xx/bundle-uri-bug-using-bundle-list-v1
> Pull-Request: https://github.com/gitgitgadget/git/pull/1730
> 
>  bundle-uri.c                |   3 +
>  t/t5558-clone-bundle-uri.sh | 129 ++++++++++++++++++++++++++++++++++++
>  2 files changed, 132 insertions(+)
> 
> diff --git a/bundle-uri.c b/bundle-uri.c
> index ca32050a78f..2b9d36cfd8e 100644
> --- a/bundle-uri.c
> +++ b/bundle-uri.c
> @@ -7,6 +7,7 @@
>  #include "refs.h"
>  #include "run-command.h"
>  #include "hashmap.h"
> +#include "packfile.h"
>  #include "pkt-line.h"
>  #include "config.h"
>  #include "remote.h"
> @@ -376,6 +377,8 @@ static int unbundle_from_file(struct repository *r, const char *file)
>  			       VERIFY_BUNDLE_QUIET)))
>  		return 1;
>  
> +	reprepare_packed_git(r);
> +

So what's hidden here is that `unbundle_from_file()` will also try to
access the bundle's refs right away. Surprisingly, we do so by calling
`refs_update_ref()` with `REF_SKIP_OID_VERIFICATION`, which has the
effect that we basically accept arbitrary object IDs here even if we do
not know those. That's why we didn't have to `reprepare_packed_git()`
before this change.

Now there are two conflicting thoughts here:

  - Either we can now drop `REF_SKIP_OID_VERIFICATION` as the object IDs
    should now be accessible.

  - Or we can avoid calling `reprepare_packed_git()` inside the loop and
    instead call it once after we have fetched all bundles.

The second one feels a bit like premature optimization to me. But the
first item does feel like it could help us to catch broken bundles
because we wouldn't end up creating refs for objects that neither we nor
the bundle have.

Patrick

Copy link

gitgitgadget bot commented May 17, 2024

User Patrick Steinhardt <ps@pks.im> has been added to the cc: list.

Copy link

gitgitgadget bot commented May 17, 2024

On the Git mailing list, Karthik Nayak wrote (reply to this):

"blanet via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Xing Xin <xingxin.xx@bytedance.com>>
> When using the bundle-uri mechanism with a bundle list containing
> multiple interrelated bundles, we encountered a bug where tips from
> downloaded bundles were not being discovered, resulting in rather slow
> clones. This was particularly problematic when employing the heuristic
> `creationTokens`.
>
> And this is easy to reproduce. Suppose we have a repository with a
> single branch `main` pointing to commit `A`, firstly we create a base
> bundle with
>
>   git bundle create base.bundle main
>
> Then let's add a new commit `B` on top of `A`, so that an incremental
> bundle for `main` can be created with
>
>   git bundle create incr.bundle A..main
>
> Now we can generate a bundle list with the following content:
>
>   [bundle]
>       version = 1
>       mode = all
>       heuristic = creationToken
>
>   [bundle "base"]
>       uri = base.bundle
>       creationToken = 1
>
>   [bundle "incr"]
>       uri = incr.bundle
>       creationToken = 2
>
> A fresh clone with the bundle list above would give the expected
> `refs/bundles/main` pointing at `B` in new repository, in other words we
> already had everything locally from the bundles, but git would still
> download everything from server as if we got nothing.
>
> So why the `refs/bundles/main` is not discovered? After some digging I
> found that:
>
> 1. when unbundling a downloaded bundle, a `verify_bundle` is called to

s/a//

>    check its prerequisites if any. The verify procedure would find oids
>    so `packed_git` is initialized.
>

So the flow is:
1. `fetch_bundle_list` fetches all the bundles advertised via
`download_bundle_list` to local files.
2. It then calls `unbundle_all_bundles` to unbundle all the bundles.
3. Each bundle is then unbundled using `unbundle_from_file`.
4. Here, we first read the bundle header to get all the prerequisites
for the bundle, this is done in `read_bundle_header`.
5. Then we call `unbundle`, which calls `verify_bundle` to ensure that
the repository does indeed contain the prerequisites mentioned in the
bundle. Then it creates the index from the bundle file.

So because the objects are being checked, the `prepare_packed_git`
function is eventually called, which means that the
`raw_object_store->packed_git` data gets filled in and
`packed_git_initialized` is set.

This means consecutive calls to `prepare_packed_git` doesn't
re-initiate `raw_object_store->packed_git` since
`packed_git_initialized` already is set.

So your explanation makes sense, as a _nit_ I would perhaps add the part
about why consecutive calls to `prepare_packed_git` are ineffective.

> 2. after unbundled all bundles, we would enter `do_fetch_pack_v2`,

s/unbundled/unbundling

>    during which `mark_complete_and_common_ref` and `mark_tips` would
>    find oids with `OBJECT_INFO_QUICK` flag set, so no new packs would be
>    enlisted if `packed_git` has already initialized in 1.
> Back to the example above, when unbunding `incr.bundle`, `base.pack` is
> enlisted to `packed_git` bacause of the prerequisites to verify. Then we
> can not find `B` for negotiation at a latter time bacause `B` exists in
> `incr.pack` which is not enlisted in `packed_git`.
>
> This commit fixes this by adding a `reprepare_packed_git` call for every
> successfully unbundled bundle, which ensures to enlist all generated
> packs from bundle uri. And a set of negotiation related tests are added.
>

The solution makes sense.

> Signed-off-by: Xing Xin <xingxin.xx@bytedance.com>
> ---
>     bundle-uri: refresh packed_git if unbundle succeed
>
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1730%2Fblanet%2Fxx%2Fbundle-uri-bug-using-bundle-list-v1
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1730/blanet/xx/bundle-uri-bug-using-bundle-list-v1
> Pull-Request: https://github.com/gitgitgadget/git/pull/1730
>
>  bundle-uri.c                |   3 +
>  t/t5558-clone-bundle-uri.sh | 129 ++++++++++++++++++++++++++++++++++++
>  2 files changed, 132 insertions(+)
>
> diff --git a/bundle-uri.c b/bundle-uri.c
> index ca32050a78f..2b9d36cfd8e 100644
> --- a/bundle-uri.c
> +++ b/bundle-uri.c
> @@ -7,6 +7,7 @@
>  #include "refs.h"
>  #include "run-command.h"
>  #include "hashmap.h"
> +#include "packfile.h"
>  #include "pkt-line.h"
>  #include "config.h"
>  #include "remote.h"
> @@ -376,6 +377,8 @@ static int unbundle_from_file(struct repository *r, const char *file)
>  			       VERIFY_BUNDLE_QUIET)))
>  		return 1;
>
> +	reprepare_packed_git(r);
> +
>

Would it make sense to move this to `bundle.c:unbundle()`, since that is
also where the idx is created?

[snip]

Copy link

gitgitgadget bot commented May 17, 2024

User Karthik Nayak <karthik.188@gmail.com> has been added to the cc: list.

Copy link

gitgitgadget bot commented May 17, 2024

On the Git mailing list, Junio C Hamano wrote (reply to this):

Patrick Steinhardt <ps@pks.im> writes:

> Now there are two conflicting thoughts here:
>
>   - Either we can now drop `REF_SKIP_OID_VERIFICATION` as the object IDs
>     should now be accessible.
>
>   - Or we can avoid calling `reprepare_packed_git()` inside the loop and
>     instead call it once after we have fetched all bundles.
>
> The second one feels a bit like premature optimization to me. But the
> first item does feel like it could help us to catch broken bundles
> because we wouldn't end up creating refs for objects that neither we nor
> the bundle have.

I like the way your thoughts are structured around here.

I do agree that the latter is a wrong approach---we shouldn't be
trusting what came from elsewhere over the network without first
checking.  We should probably be running the "index-pack --fix-thin"
the unbundling process runs with also the "--fsck-objects" option if
we are not doing so already, and even then, we should make sure that
the object we are making our ref point at have everything behind it.

@blanet blanet force-pushed the xx/bundle-uri-bug-using-bundle-list branch 3 times, most recently from 3d2d2b6 to 2f96966 Compare May 20, 2024 06:12
Copy link

gitgitgadget bot commented May 20, 2024

On the Git mailing list, "Xing Xin" wrote (reply to this):

At 2024-05-17 13:00:49, "Patrick Steinhardt" <ps@pks.im> wrote:
>On Wed, May 15, 2024 at 03:01:09AM +0000, blanet via GitGitGadget wrote:
>> From: Xing Xin <xingxin.xx@bytedance.com>
>
>Long time no see :)

Glad to see you again here :)

>> 
>> So why the `refs/bundles/main` is not discovered? After some digging I
>> found that:
>> 
>> 1. when unbundling a downloaded bundle, a `verify_bundle` is called to
>>    check its prerequisites if any. The verify procedure would find oids
>>    so `packed_git` is initialized.
>> 
>> 2. after unbundled all bundles, we would enter `do_fetch_pack_v2`,
>>    during which `mark_complete_and_common_ref` and `mark_tips` would
>>    find oids with `OBJECT_INFO_QUICK` flag set, so no new packs would be
>>    enlisted if `packed_git` has already initialized in 1.
>
>And I assume we do not want it to not use `OBJECT_INFO_QUICK`?

I think so. For clones or fetches without using bundle-uri, we can hardly
encounter the case that new packs are added during the negotiation process.
So using `OBJECT_INFO_QUICK` can get some performance gain.

>
>> Back to the example above, when unbunding `incr.bundle`, `base.pack` is
>> enlisted to `packed_git` bacause of the prerequisites to verify. Then we
>> can not find `B` for negotiation at a latter time bacause `B` exists in
>> `incr.pack` which is not enlisted in `packed_git`.
>
>Okay, the explanation feels sensible.
>
>> This commit fixes this by adding a `reprepare_packed_git` call for every
>> successfully unbundled bundle, which ensures to enlist all generated
>> packs from bundle uri. And a set of negotiation related tests are added.
>
>This makes me wonder though. Do we really need to call
>`reprepare_packed_git()` once for every bundle, or can't we instead call
>it once at the end once we have fetched all bundles? Reading on.
>
>> Signed-off-by: Xing Xin <xingxin.xx@bytedance.com>
>> ---
>>     bundle-uri: refresh packed_git if unbundle succeed
>> 
>> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1730%2Fblanet%2Fxx%2Fbundle-uri-bug-using-bundle-list-v1
>> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1730/blanet/xx/bundle-uri-bug-using-bundle-list-v1
>> Pull-Request: https://github.com/gitgitgadget/git/pull/1730
>> 
>>  bundle-uri.c                |   3 +
>>  t/t5558-clone-bundle-uri.sh | 129 ++++++++++++++++++++++++++++++++++++
>>  2 files changed, 132 insertions(+)
>> 
>> diff --git a/bundle-uri.c b/bundle-uri.c
>> index ca32050a78f..2b9d36cfd8e 100644
>> --- a/bundle-uri.c
>> +++ b/bundle-uri.c
>> @@ -7,6 +7,7 @@
>>  #include "refs.h"
>>  #include "run-command.h"
>>  #include "hashmap.h"
>> +#include "packfile.h"
>>  #include "pkt-line.h"
>>  #include "config.h"
>>  #include "remote.h"
>> @@ -376,6 +377,8 @@ static int unbundle_from_file(struct repository *r, const char *file)
>>  			       VERIFY_BUNDLE_QUIET)))
>>  		return 1;
>>  
>> +	reprepare_packed_git(r);
>> +
>
>So what's hidden here is that `unbundle_from_file()` will also try to
>access the bundle's refs right away. Surprisingly, we do so by calling
>`refs_update_ref()` with `REF_SKIP_OID_VERIFICATION`, which has the
>effect that we basically accept arbitrary object IDs here even if we do
>not know those. That's why we didn't have to `reprepare_packed_git()`
>before this change.

You are right! I tried dropping this `REF_SKIP_OID_VERIFICATION` flag and
the negotiation works as expected.

After some further digging I find that without `REF_SKIP_OID_VERIFICATION`,
both `write_ref_to_lockfile` for files backend and `reftable_be_transaction_prepare`
for reftable backend would call `parse_object` to check the oid. `parse_object`
can help refresh `packed_git` via `reprepare_packed_git`.

>
>Now there are two conflicting thoughts here:
>
>  - Either we can now drop `REF_SKIP_OID_VERIFICATION` as the object IDs
>    should now be accessible.
>
>  - Or we can avoid calling `reprepare_packed_git()` inside the loop and
>    instead call it once after we have fetched all bundles.
>
>The second one feels a bit like premature optimization to me. But the
>first item does feel like it could help us to catch broken bundles
>because we wouldn't end up creating refs for objects that neither we nor
>the bundle have.

I favor the first approach because a validation on the object IDs we are
writing is a safe guard . And the flag itself was designed to be used in
testing scenarios.

/*
 * Blindly write an object_id. This is useful for testing data corruption
 * scenarios.
 */
#define REF_SKIP_OID_VERIFICATION (1 << 10)

Copy link

gitgitgadget bot commented May 20, 2024

On the Git mailing list, "Xing Xin" wrote (reply to this):

At 2024-05-17 15:36:53, "Karthik Nayak" <karthik.188@gmail.com> wrote:
>"blanet via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
>> From: Xing Xin <xingxin.xx@bytedance.com>>
>> When using the bundle-uri mechanism with a bundle list containing
>> multiple interrelated bundles, we encountered a bug where tips from
>> downloaded bundles were not being discovered, resulting in rather slow
>> clones. This was particularly problematic when employing the heuristic
>> `creationTokens`.
>>
>> And this is easy to reproduce. Suppose we have a repository with a
>> single branch `main` pointing to commit `A`, firstly we create a base
>> bundle with
>>
>>   git bundle create base.bundle main
>>
>> Then let's add a new commit `B` on top of `A`, so that an incremental
>> bundle for `main` can be created with
>>
>>   git bundle create incr.bundle A..main
>>
>> Now we can generate a bundle list with the following content:
>>
>>   [bundle]
>>       version = 1
>>       mode = all
>>       heuristic = creationToken
>>
>>   [bundle "base"]
>>       uri = base.bundle
>>       creationToken = 1
>>
>>   [bundle "incr"]
>>       uri = incr.bundle
>>       creationToken = 2
>>
>> A fresh clone with the bundle list above would give the expected
>> `refs/bundles/main` pointing at `B` in new repository, in other words we
>> already had everything locally from the bundles, but git would still
>> download everything from server as if we got nothing.
>>
>> So why the `refs/bundles/main` is not discovered? After some digging I
>> found that:
>>
>> 1. when unbundling a downloaded bundle, a `verify_bundle` is called to
>
>s/a//

Thanks!

>
>>    check its prerequisites if any. The verify procedure would find oids
>>    so `packed_git` is initialized.
>>
>
>So the flow is:
>1. `fetch_bundle_list` fetches all the bundles advertised via
>`download_bundle_list` to local files.
>2. It then calls `unbundle_all_bundles` to unbundle all the bundles.
>3. Each bundle is then unbundled using `unbundle_from_file`.
>4. Here, we first read the bundle header to get all the prerequisites
>for the bundle, this is done in `read_bundle_header`.
>5. Then we call `unbundle`, which calls `verify_bundle` to ensure that
>the repository does indeed contain the prerequisites mentioned in the
>bundle. Then it creates the index from the bundle file.
>
>So because the objects are being checked, the `prepare_packed_git`
>function is eventually called, which means that the
>`raw_object_store->packed_git` data gets filled in and
>`packed_git_initialized` is set.
>
>This means consecutive calls to `prepare_packed_git` doesn't
>re-initiate `raw_object_store->packed_git` since
>`packed_git_initialized` already is set.
>
>So your explanation makes sense, as a _nit_ I would perhaps add the part
>about why consecutive calls to `prepare_packed_git` are ineffective.

Thanks my friend, you have expressed this issue more clearly. I will
post a new description based on your explanation with the creationToken
case covered.

>
>> 2. after unbundled all bundles, we would enter `do_fetch_pack_v2`,
>
>s/unbundled/unbundling

Copy that.

>
>> +#include "packfile.h"
>>  #include "pkt-line.h"
>>  #include "config.h"
>>  #include "remote.h"
>> @@ -376,6 +377,8 @@ static int unbundle_from_file(struct repository *r, const char *file)
>>  			       VERIFY_BUNDLE_QUIET)))
>>  		return 1;
>>
>> +	reprepare_packed_git(r);
>> +
>>
>
>Would it make sense to move this to `bundle.c:unbundle()`, since that is
>also where the idx is created?
>

I wonder if we need a mental model that we should `reprepare_packed_git`
that when a new pack and its corresponding idx is generated? Currently
whether to call `reprepare_packed_git` is determined by the caller.

But within the scope of this bug, I tend to remove the
`REF_SKIP_OID_VERIFICATION` flag when writing refs as Patrick suggested.

Xing Xin

Copy link

gitgitgadget bot commented May 20, 2024

On the Git mailing list, "Xing Xin" wrote (reply to this):

At 2024-05-18 00:14:47, "Junio C Hamano" <gitster@pobox.com> wrote:
>Patrick Steinhardt <ps@pks.im> writes:
>
>> Now there are two conflicting thoughts here:
>>
>>   - Either we can now drop `REF_SKIP_OID_VERIFICATION` as the object IDs
>>     should now be accessible.
>>
>>   - Or we can avoid calling `reprepare_packed_git()` inside the loop and
>>     instead call it once after we have fetched all bundles.
>>
>> The second one feels a bit like premature optimization to me. But the
>> first item does feel like it could help us to catch broken bundles
>> because we wouldn't end up creating refs for objects that neither we nor
>> the bundle have.
>
>I like the way your thoughts are structured around here.
>
>I do agree that the latter is a wrong approach---we shouldn't be
>trusting what came from elsewhere over the network without first
>checking.  We should probably be running the "index-pack --fix-thin"
>the unbundling process runs with also the "--fsck-objects" option if
>we are not doing so already, and even then, we should make sure that
>the object we are making our ref point at have everything behind it.

Currently `unbundle` and all its callers are not adding "--fsck-objects".
There is a param `extra_index_pack_args` for `unbundle` but it is
mainly used for progress related options.

Personally I think data from bundles and data received via network
should be treated equally. For "fetch-pack" we now have some configs
such as  "fetch.fsckobjects" and "transfer.fsckobjects" to decide the
behavior, these configs are invisible when we are fetching bundles.

So for bundles I think we have some different ways to go:

  - Acknowledge the configs mentioned above and behave like
    "fetch-pack".

  - Add "--fsck-objects" as a default in `unbundle`.

  - In `unbundle_from_file`, pass in "--fsck-objects" as
    `extra_index_pack_args` for `unbundle` so this only affect the
    bundle-uri related process.

What do you think?

Xing Xin

@blanet blanet force-pushed the xx/bundle-uri-bug-using-bundle-list branch from 2f96966 to 8bdeacf Compare May 20, 2024 12:17
@blanet
Copy link
Author

blanet commented May 20, 2024

/submit

Copy link

gitgitgadget bot commented May 20, 2024

Submitted as pull.1730.v2.git.1716208605926.gitgitgadget@gmail.com

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git/ pr-1730/blanet/xx/bundle-uri-bug-using-bundle-list-v2

To fetch this version to local tag pr-1730/blanet/xx/bundle-uri-bug-using-bundle-list-v2:

git fetch --no-tags https://github.com/gitgitgadget/git/ tag pr-1730/blanet/xx/bundle-uri-bug-using-bundle-list-v2

Copy link

gitgitgadget bot commented May 20, 2024

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Xing Xin" <bupt_xingxin@163.com> writes:

> Personally I think data from bundles and data received via network
> should be treated equally.

Yup, that is not personal ;-) but is universally accepted as a good
discipline.  In the case of bundle-uri, the bundle came over the
network so it is even more true that they should be treated the
same.

> For "fetch-pack" we now have some configs
> such as  "fetch.fsckobjects" and "transfer.fsckobjects" to decide the
> behavior, these configs are invisible when we are fetching bundles.

When fetching over network, transport.c:fetch_refs_via_pack() calls
fetch_pack.c:fetch_pack(), which eventually calls get_pack() and the
configuration variables are honored there.  It appears that the
transport layer is unaware of the .fsckobjects configuration knobs.

When fetching from a bundle, transport.c:fetch_refs_from_bundle()
calls bundle.c:unbundle().  This function has three callers, i.e.
"git bundle unbundle", normal fetching from a bundle, and more
recently added bundle-uri codepaths.  

I think one reasonable approach to take is to add an extra parameter
that takes one of three values: (never, use-config, always), and
conditionally add "--fsck-objects" to the command line of the
index-pack.  Teach "git bundle unbundle" the "--fsck-objects" option
so that it can pass 'never' or 'always' from the command line, and
pass 'use-config' from the code paths for normal fetching from a
budnle and bundle-uri.

To implement use-config, you'd probably need to refactor a small
part of fetch-pack.c:get_pack()

	if (fetch_fsck_objects >= 0
	    ? fetch_fsck_objects
	    : transfer_fsck_objects >= 0
	    ? transfer_fsck_objects
	    : 0)
		fsck_objects = 1;

into a public function (to support a caller like unbundle() that
comes from sideways, the new function may also need to call
fetch_pack_setup() to prime them).

A patch series may take a structure like so:

 * define enum { UNBUNDLE_FSCK_NEVER, UNBUNDLE_FSCK_ALWAYS } in
   bundle.h, have bundle.c:unbundle() accept a new parameter of that
   type, and conditionally add "--fsck-objects" to its call to
   "index-pack".  "git bundle unbundle" can pass 'never' to its
   invocation to unbundle() as an easy way to test it.  For the
   other two callers, we can start by passing 'always'.

 * (optional) teach "git bundle unbundle" a new "--fsck-objects"
   option to allow passing 'always' to its call to unbundle().  With
   that, add tests to feed it a bundle with questionable objects in
   it and make sure that unbundling notices.

 * refactor fetch-pack.c:get_pack() to make the fetch-then-transfer
   configuration logic available to external callers.

 * Add UNBUNDLE_FSCK_USE_CONFIG to the enum, enhance unbundle() to
   react to the value by calling the helper function you introduced
   in the previous step.

Thanks.

Copy link

gitgitgadget bot commented May 21, 2024

On the Git mailing list, Karthik Nayak wrote (reply to this):

"blanet via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Xing Xin <xingxin.xx@bytedance.com>
>
> When using the bundle-uri mechanism with a bundle list containing
> multiple interrelated bundles, we encountered a bug where tips from
> downloaded bundles were not discovered, thus resulting in rather slow
> clones. This was particularly problematic when employing the heuristic
> `creationTokens`.
>
> And this is easy to reproduce. Suppose we have a repository with a
> single branch `main` pointing to commit `A`, firstly we create a base
> bundle with
>
>   git bundle create base.bundle main
>
> Then let's add a new commit `B` on top of `A`, so that an incremental
> bundle for `main` can be created with
>
>   git bundle create incr.bundle A..main
>
> Now we can generate a bundle list with the following content:
>
>   [bundle]
>       version = 1
>       mode = all
>       heuristic = creationToken
>
>   [bundle "base"]
>       uri = base.bundle
>       creationToken = 1
>
>   [bundle "incr"]
>       uri = incr.bundle
>       creationToken = 2
>
> A fresh clone with the bundle list above would give the expected
> `refs/bundles/main` pointing at `B` in new repository, in other words we
> already had everything locally from the bundles, but git would still
> download everything from server as if we got nothing.
>
> So why the `refs/bundles/main` is not discovered? After some digging I
> found that:
>
> 1. Bundles in bundle list are downloaded to local files via
>    `download_bundle_list` or via `fetch_bundles_by_token` for the
>    creationToken heuristic case.
> 2. Then it tries to unbundle each bundle via `unbundle_from_file`, which
>    is called by `unbundle_all_bundles` or called within
>    `fetch_bundles_by_token` for the creationToken heuristic case.
> 3. Here, we first read the bundle header to get all the prerequisites
>    for the bundle, this is done in `read_bundle_header`.
> 4. Then we call `unbundle`, which calls `verify_bundle` to ensure that
>    the repository does indeed contain the prerequisites mentioned in the
>    bundle.
> 5. The `verify_bundle` will call `parse_object`, within which the
>    `prepare_packed_git` or `reprepare_packed_git` is eventually called,
>    which means that the `raw_object_store->packed_git` data gets filled
>    in and ``packed_git_initialized` is set. This also means consecutive
>    calls to `prepare_packed_git` doesn't re-initiate
>    `raw_object_store->packed_git` since `packed_git_initialized` already
>    is set.
> 6. If `unbundle` succeeds, it writes some refs via `refs_update_ref`
>    with `REF_SKIP_OID_VERIFICATION` set. So the bundle refs which can
>    target arbitrary objects are written to the repository.
> 7. Finally in `do_fetch_pack_v2`, `mark_complete_and_common_ref` and
>    `mark_tips` are called with `OBJECT_INFO_QUICK` set to find local
>    tips. Here it won't call `reprepare_packed_git` anymore so it would
>    fail to parse oids that only reside in the last bundle.
>
> Back to the example above, when unbunding `incr.bundle`, `base.pack` is
> enlisted to `packed_git` bacause of the prerequisites to verify. While

s/bacause/because

[snip]

> diff --git a/bundle-uri.c b/bundle-uri.c
> index 91b3319a5c1..65666a11d9c 100644
> --- a/bundle-uri.c
> +++ b/bundle-uri.c
> @@ -400,8 +400,7 @@ static int unbundle_from_file(struct repository *r, const char *file)
>  		refs_update_ref(get_main_ref_store(the_repository),
>  				"fetched bundle", bundle_ref.buf, oid,
>  				has_old ? &old_oid : NULL,
> -				REF_SKIP_OID_VERIFICATION,
> -				UPDATE_REFS_MSG_ON_ERR);
> +				0, UPDATE_REFS_MSG_ON_ERR);
>  	}
>
>  	bundle_header_release(&header);
> diff --git a/t/t5558-clone-bundle-uri.sh b/t/t5558-clone-bundle-uri.sh
> index 1ca5f745e73..a5b04d6f187 100755
> --- a/t/t5558-clone-bundle-uri.sh
> +++ b/t/t5558-clone-bundle-uri.sh
> @@ -20,7 +20,10 @@ test_expect_success 'fail to clone from non-bundle file' '
>  test_expect_success 'create bundle' '
>  	git init clone-from &&
>  	git -C clone-from checkout -b topic &&
> +
>  	test_commit -C clone-from A &&
> +	git -C clone-from bundle create A.bundle topic &&
> +
>  	test_commit -C clone-from B &&
>  	git -C clone-from bundle create B.bundle topic
>  '
> @@ -259,6 +262,132 @@ test_expect_success 'clone bundle list (file, any mode, all failures)' '
>  	! grep "refs/bundles/" refs
>  '
>
> +#########################################################################
> +# Clone negotiation related tests begin here
> +
> +test_expect_success 'negotiation: bundle with part of wanted commits' '
> +	test_when_finished rm -rf trace*.txt &&
> +	GIT_TRACE_PACKET="$(pwd)/trace-packet.txt" \
> +	git clone --no-local --bundle-uri="clone-from/A.bundle" \
> +		clone-from nego-bundle-part &&
> +	git -C nego-bundle-part for-each-ref --format="%(refname)" >refs &&
> +	grep "refs/bundles/" refs >actual &&
> +	cat >expect <<-\EOF &&
> +	refs/bundles/topic
> +	EOF
> +	test_cmp expect actual &&
> +	# Ensure that refs/bundles/topic are sent as "have".
> +	grep "clone> have $(git -C clone-from rev-parse A)" trace-packet.txt
> +'
> +
> +test_expect_success 'negotiation: bundle with all wanted commits' '
> +	test_when_finished rm -rf trace*.txt &&
> +	GIT_TRACE_PACKET="$(pwd)/trace-packet.txt" \
> +	git clone --no-local --single-branch --branch=topic --no-tags \
> +		--bundle-uri="clone-from/B.bundle" \
> +		clone-from nego-bundle-all &&
> +	git -C nego-bundle-all for-each-ref --format="%(refname)" >refs &&
> +	grep "refs/bundles/" refs >actual &&
> +	cat >expect <<-\EOF &&
> +	refs/bundles/topic
> +	EOF
> +	test_cmp expect actual &&
> +	# We already have all needed commits so no "want" needed.
> +	! grep "clone> want " trace-packet.txt
> +'
> +
> +test_expect_success 'negotiation: bundle list (no heuristic)' '
> +	test_when_finished rm -f trace*.txt &&
> +	cat >bundle-list <<-EOF &&
> +	[bundle]
> +		version = 1
> +		mode = all
> +
> +	[bundle "bundle-1"]
> +		uri = file://$(pwd)/clone-from/bundle-1.bundle
> +
> +	[bundle "bundle-2"]
> +		uri = file://$(pwd)/clone-from/bundle-2.bundle
> +	EOF
> +
> +	GIT_TRACE_PACKET="$(pwd)/trace-packet.txt" \
> +	git clone --no-local --bundle-uri="file://$(pwd)/bundle-list" \
> +		clone-from nego-bundle-list-no-heuristic &&
> +
> +	git -C nego-bundle-list-no-heuristic for-each-ref --format="%(refname)" >refs &&
> +	grep "refs/bundles/" refs >actual &&
> +	cat >expect <<-\EOF &&
> +	refs/bundles/base
> +	refs/bundles/left
> +	EOF
> +	test_cmp expect actual &&
> +	grep "clone> have $(git -C nego-bundle-list-no-heuristic rev-parse refs/bundles/left)" trace-packet.txt
> +'
> +
> +test_expect_success 'negotiation: bundle list (creationToken)' '
> +	test_when_finished rm -f trace*.txt &&
> +	cat >bundle-list <<-EOF &&
> +	[bundle]
> +		version = 1
> +		mode = all
> +		heuristic = creationToken
> +
> +	[bundle "bundle-1"]
> +		uri = file://$(pwd)/clone-from/bundle-1.bundle
> +		creationToken = 1
> +
> +	[bundle "bundle-2"]
> +		uri = file://$(pwd)/clone-from/bundle-2.bundle
> +		creationToken = 2
> +	EOF
> +
> +	GIT_TRACE_PACKET="$(pwd)/trace-packet.txt" \
> +	git clone --no-local --bundle-uri="file://$(pwd)/bundle-list" \
> +		clone-from nego-bundle-list-heuristic &&
> +
> +	git -C nego-bundle-list-heuristic for-each-ref --format="%(refname)" >refs &&
> +	grep "refs/bundles/" refs >actual &&
> +	cat >expect <<-\EOF &&
> +	refs/bundles/base
> +	refs/bundles/left
> +	EOF
> +	test_cmp expect actual &&
> +	grep "clone> have $(git -C nego-bundle-list-heuristic rev-parse refs/bundles/left)" trace-packet.txt
> +'
> +
> +test_expect_success 'negotiation: bundle list with all wanted commits' '
> +	test_when_finished rm -f trace*.txt &&
> +	cat >bundle-list <<-EOF &&
> +	[bundle]
> +		version = 1
> +		mode = all
> +		heuristic = creationToken
> +
> +	[bundle "bundle-1"]
> +		uri = file://$(pwd)/clone-from/bundle-1.bundle
> +		creationToken = 1
> +
> +	[bundle "bundle-2"]
> +		uri = file://$(pwd)/clone-from/bundle-2.bundle
> +		creationToken = 2
> +	EOF
> +
> +	GIT_TRACE_PACKET="$(pwd)/trace-packet.txt" \
> +	git clone --no-local --single-branch --branch=left --no-tags \
> +		--bundle-uri="file://$(pwd)/bundle-list" \
> +		clone-from nego-bundle-list-all &&
> +
> +	git -C nego-bundle-list-all for-each-ref --format="%(refname)" >refs &&
> +	grep "refs/bundles/" refs >actual &&
> +	cat >expect <<-\EOF &&
> +	refs/bundles/base
> +	refs/bundles/left
> +	EOF
> +	test_cmp expect actual &&
> +	# We already have all needed commits so no "want" needed.
> +	! grep "clone> want " trace-packet.txt
> +'
> +
>  #########################################################################
>  # HTTP tests begin here
>
>
> base-commit: d8ab1d464d07baa30e5a180eb33b3f9aa5c93adf
> --
> gitgitgadget

This update looks good to me, Thanks.

@blanet blanet force-pushed the xx/bundle-uri-bug-using-bundle-list branch 3 times, most recently from 992e226 to fa92198 Compare May 27, 2024 13:15
@blanet
Copy link
Author

blanet commented May 27, 2024

/preview

@blanet blanet force-pushed the xx/bundle-uri-bug-using-bundle-list branch from fa92198 to c19b8f6 Compare May 27, 2024 13:29
Copy link

gitgitgadget bot commented May 27, 2024

Preview email sent as pull.1730.v3.git.1716816589.gitgitgadget@gmail.com

@blanet
Copy link
Author

blanet commented May 27, 2024

/preview

Copy link

gitgitgadget bot commented May 27, 2024

Preview email sent as pull.1730.v3.git.1716824184.gitgitgadget@gmail.com

@blanet blanet changed the title bundle-uri: refresh packed_git if unbundle succeed object verification related additions and fixes for bundles in fetches May 27, 2024
@blanet blanet changed the title object verification related additions and fixes for bundles in fetches object checking related additions and fixes for bundles in fetches May 27, 2024
@blanet
Copy link
Author

blanet commented May 30, 2024

/preview

Copy link

gitgitgadget bot commented May 30, 2024

Preview email sent as pull.1730.v4.git.1717057012.gitgitgadget@gmail.com

@blanet
Copy link
Author

blanet commented May 30, 2024

/submit

Copy link

gitgitgadget bot commented May 30, 2024

Submitted as pull.1730.v4.git.1717057290.gitgitgadget@gmail.com

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git/ pr-1730/blanet/xx/bundle-uri-bug-using-bundle-list-v4

To fetch this version to local tag pr-1730/blanet/xx/bundle-uri-bug-using-bundle-list-v4:

git fetch --no-tags https://github.com/gitgitgadget/git/ tag pr-1730/blanet/xx/bundle-uri-bug-using-bundle-list-v4

@@ -373,7 +373,7 @@ static int unbundle_from_file(struct repository *r, const char *file)
* the prerequisite commits.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Patrick Steinhardt wrote (reply to this):

On Thu, May 30, 2024 at 08:21:28AM +0000, Xing Xin via GitGitGadget wrote:
> From: Xing Xin <xingxin.xx@bytedance.com>

Tiny nit: the important change in this commit is not that you wire up
the flag, but rather that we start to execute git-fsck(1) now. I'd thus
propose to adapt the commit title accordingly.

Patrick

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, "Xing Xin" wrote (reply to this):

At 2024-06-06 20:06:41, "Patrick Steinhardt" <ps@pks.im> wrote:
>On Thu, May 30, 2024 at 08:21:28AM +0000, Xing Xin via GitGitGadget wrote:
>> From: Xing Xin <xingxin.xx@bytedance.com>
>
>Tiny nit: the important change in this commit is not that you wire up
>the flag, but rather that we start to execute git-fsck(1) now. I'd thus

To be precise, it is adding a "--fsck-objects" flag to "git-index-pack".

>propose to adapt the commit title accordingly.

This commit is mapped to [PATCH v5 3/4] due to some adjustments to the
commit order and implementation details. Hope the new title can better
describe the new patch.

Xing Xin

@@ -373,7 +373,7 @@ static int unbundle_from_file(struct repository *r, const char *file)
* the prerequisite commits.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Patrick Steinhardt wrote (reply to this):

On Thu, May 30, 2024 at 08:21:30AM +0000, Xing Xin via GitGitGadget wrote:
> From: Xing Xin <xingxin.xx@bytedance.com>

Same here, the important part is not that we introduce the flag, but
that we start using it in `unbundle_from_file()`.

> diff --git a/bundle-uri.c b/bundle-uri.c
> index 066ff788104..e7ebac6ce57 100644
> --- a/bundle-uri.c
> +++ b/bundle-uri.c
> @@ -373,7 +373,7 @@ static int unbundle_from_file(struct repository *r, const char *file)
>  	 * the prerequisite commits.
>  	 */
>  	if ((result = unbundle(r, &header, bundle_fd, NULL,
> -			       VERIFY_BUNDLE_QUIET | VERIFY_BUNDLE_FSCK_ALWAYS)))
> +			       VERIFY_BUNDLE_QUIET | VERIFY_BUNDLE_FSCK_FOLLOW_FETCH)))
>  		return 1;
>  
>  	/*

One thing that is a bit weird is that we first change `unbundle()` to
use `FSCK_ALWAYS` in a preceding patch, and then convert it to use
`FSCK_FOLLOW_FETCH` in the same series. It could be restructured a bit
to first introduce the flags, only, while not modifying any of the
callsites yet. Passing the respective flags would then be done in a
separate commit.

Patrick

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, "Xing Xin" wrote (reply to this):

At 2024-06-06 20:06:47, "Patrick Steinhardt" <ps@pks.im> wrote:
>On Thu, May 30, 2024 at 08:21:30AM +0000, Xing Xin via GitGitGadget wrote:
>> From: Xing Xin <xingxin.xx@bytedance.com>
>
>Same here, the important part is not that we introduce the flag, but
>that we start using it in `unbundle_from_file()`.
>
>> diff --git a/bundle-uri.c b/bundle-uri.c
>> index 066ff788104..e7ebac6ce57 100644
>> --- a/bundle-uri.c
>> +++ b/bundle-uri.c
>> @@ -373,7 +373,7 @@ static int unbundle_from_file(struct repository *r, const char *file)
>>  	 * the prerequisite commits.
>>  	 */
>>  	if ((result = unbundle(r, &header, bundle_fd, NULL,
>> -			       VERIFY_BUNDLE_QUIET | VERIFY_BUNDLE_FSCK_ALWAYS)))
>> +			       VERIFY_BUNDLE_QUIET | VERIFY_BUNDLE_FSCK_FOLLOW_FETCH)))
>>  		return 1;
>>  
>>  	/*
>
>One thing that is a bit weird is that we first change `unbundle()` to
>use `FSCK_ALWAYS` in a preceding patch, and then convert it to use
>`FSCK_FOLLOW_FETCH` in the same series. It could be restructured a bit
>to first introduce the flags, only, while not modifying any of the
>callsites yet. Passing the respective flags would then be done in a
>separate commit.

This makes sense to me, thanks!

Xing Xin

Currently, we can use "transfer.fsckObjects" and the more specific
"fetch.fsckObjects" to control checks for broken objects in received
packs during fetches. However, these configurations were only
acknowledged by `fetch-pack.c:get_pack` and did not take effect in
direct bundle fetches and fetches with _bundle-uri_ enabled.

This commit exposes the fetch-then-transfer configuration logic by
adding a new function `fetch_pack_fsck_objects` in fetch-pack.h. This
new function is used to replace the assignment for `fsck_objects` in
`fetch-pack.c:get_pack`. In the next commit, it will also be used by
`bundle.c:unbundle` to better fit fetching scenarios.

Helped-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Xing Xin <xingxin.xx@bytedance.com>
@blanet blanet force-pushed the xx/bundle-uri-bug-using-bundle-list branch from 68b9bca to cd34aa5 Compare June 11, 2024 05:01
@blanet
Copy link
Author

blanet commented Jun 11, 2024

/preview

Copy link

gitgitgadget bot commented Jun 11, 2024

Preview email sent as pull.1730.v5.git.1718085353.gitgitgadget@gmail.com

@blanet blanet force-pushed the xx/bundle-uri-bug-using-bundle-list branch from cd34aa5 to eb9f21f Compare June 11, 2024 06:31
@blanet
Copy link
Author

blanet commented Jun 11, 2024

/submit

Copy link

gitgitgadget bot commented Jun 11, 2024

Submitted as pull.1730.v5.git.1718088126.gitgitgadget@gmail.com

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git/ pr-1730/blanet/xx/bundle-uri-bug-using-bundle-list-v5

To fetch this version to local tag pr-1730/blanet/xx/bundle-uri-bug-using-bundle-list-v5:

git fetch --no-tags https://github.com/gitgitgadget/git/ tag pr-1730/blanet/xx/bundle-uri-bug-using-bundle-list-v5

@@ -17,6 +17,7 @@
#include "list-objects-filter-options.h"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Patrick Steinhardt wrote (reply to this):

On Tue, Jun 11, 2024 at 06:42:05AM +0000, Xing Xin via GitGitGadget wrote:
> From: Xing Xin <xingxin.xx@bytedance.com>
> 
> This commit extends object verification support in `bundle.c:unbundle`
> by adding two new options to `verify_bundle_flags`:
> 
> - `VERIFY_BUNDLE_FSCK_ALWAYS` explicitly enables checks for broken
>   objects. It will be used to add "--fsck-objects" support for "git
>   bundle unbundle" in a separate series.
> - `VERIFY_BUNDLE_FSCK_FOLLOW_FETCH` is designed to be used during fetch
>   operations, specifically for direct bundle fetches and _bundle-uri_
>   enabled fetches. When enabled, `bundle.c:unbundle` invokes
>   `fetch-pack.c:fetch_pack_fsck_objects` to determine whether to enable
>   checks for broken objects. Passing this flag during fetching will be
>   implemented in a subsequent commit.
> 
> Note that the option `VERIFY_BUNDLE_FSCK_ALWAYS` takes precedence over
> `VERIFY_BUNDLE_FSCK_FOLLOW_FETCH`.

Thanks, the new sequence of commits is much easier to follow. It also
shows that there is no user of `VERIFY_BUNDLE_FSCK_ALWAYS` at the end of
this series. So maybe we should drop that flag?

If you do that, then I'd also propose to merge patches 2 and 3 into one
given that both are quite trivial and related to each other.

Other than that this series looks good to me.

Patrick

> Reviewed-by: Patrick Steinhardt <ps@pks.im>
> Signed-off-by: Xing Xin <xingxin.xx@bytedance.com>
> ---
>  bundle.c | 10 ++++++++++
>  bundle.h |  2 ++
>  2 files changed, 12 insertions(+)
> 
> diff --git a/bundle.c b/bundle.c
> index 95367c2d0a0..53ac73834ea 100644
> --- a/bundle.c
> +++ b/bundle.c
> @@ -17,6 +17,7 @@
>  #include "list-objects-filter-options.h"
>  #include "connected.h"
>  #include "write-or-die.h"
> +#include "fetch-pack.h"
>  
>  static const char v2_bundle_signature[] = "# v2 git bundle\n";
>  static const char v3_bundle_signature[] = "# v3 git bundle\n";
> @@ -615,6 +616,7 @@ int unbundle(struct repository *r, struct bundle_header *header,
>  	     enum verify_bundle_flags flags)
>  {
>  	struct child_process ip = CHILD_PROCESS_INIT;
> +	int fsck_objects = 0;
>  
>  	if (verify_bundle(r, header, flags))
>  		return -1;
> @@ -625,6 +627,14 @@ int unbundle(struct repository *r, struct bundle_header *header,
>  	if (header->filter.choice)
>  		strvec_push(&ip.args, "--promisor=from-bundle");
>  
> +	if (flags & VERIFY_BUNDLE_FSCK_ALWAYS)
> +		fsck_objects = 1;
> +	else if (flags & VERIFY_BUNDLE_FSCK_FOLLOW_FETCH)
> +		fsck_objects = fetch_pack_fsck_objects();
> +
> +	if (fsck_objects)
> +		strvec_push(&ip.args, "--fsck-objects");
> +
>  	if (extra_index_pack_args) {
>  		strvec_pushv(&ip.args, extra_index_pack_args->v);
>  		strvec_clear(extra_index_pack_args);
> diff --git a/bundle.h b/bundle.h
> index 021adbdcbb3..a39d8ea1a7e 100644
> --- a/bundle.h
> +++ b/bundle.h
> @@ -33,6 +33,8 @@ int create_bundle(struct repository *r, const char *path,
>  enum verify_bundle_flags {
>  	VERIFY_BUNDLE_VERBOSE = (1 << 0),
>  	VERIFY_BUNDLE_QUIET = (1 << 1),
> +	VERIFY_BUNDLE_FSCK_ALWAYS = (1 << 2),
> +	VERIFY_BUNDLE_FSCK_FOLLOW_FETCH = (1 << 3),
>  };
>  
>  int verify_bundle(struct repository *r, struct bundle_header *header,
> -- 
> gitgitgadget
> 

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, "Xing Xin" wrote (reply to this):

At 2024-06-11 17:11:09, "Patrick Steinhardt" <ps@pks.im> wrote:
>On Tue, Jun 11, 2024 at 06:42:05AM +0000, Xing Xin via GitGitGadget wrote:
>> From: Xing Xin <xingxin.xx@bytedance.com>
>> 
>> This commit extends object verification support in `bundle.c:unbundle`
>> by adding two new options to `verify_bundle_flags`:
>> 
>> - `VERIFY_BUNDLE_FSCK_ALWAYS` explicitly enables checks for broken
>>   objects. It will be used to add "--fsck-objects" support for "git
>>   bundle unbundle" in a separate series.
>> - `VERIFY_BUNDLE_FSCK_FOLLOW_FETCH` is designed to be used during fetch
>>   operations, specifically for direct bundle fetches and _bundle-uri_
>>   enabled fetches. When enabled, `bundle.c:unbundle` invokes
>>   `fetch-pack.c:fetch_pack_fsck_objects` to determine whether to enable
>>   checks for broken objects. Passing this flag during fetching will be
>>   implemented in a subsequent commit.
>> 
>> Note that the option `VERIFY_BUNDLE_FSCK_ALWAYS` takes precedence over
>> `VERIFY_BUNDLE_FSCK_FOLLOW_FETCH`.
>
>Thanks, the new sequence of commits is much easier to follow. It also
>shows that there is no user of `VERIFY_BUNDLE_FSCK_ALWAYS` at the end of
>this series. So maybe we should drop that flag?

OK, let's focus on the fetching scenarios in this patch series.

>If you do that, then I'd also propose to merge patches 2 and 3 into one
>given that both are quite trivial and related to each other.

I merged patches 3 and 4 instead to combine the new option definition and usage
logic, as the two are more closely related and more trivial. :)

Xing Xin

>Other than that this series looks good to me.
>
>Patrick
>

@blanet blanet force-pushed the xx/bundle-uri-bug-using-bundle-list branch from eb9f21f to 81a6782 Compare June 11, 2024 11:48
This commit extends object verification support for fetches in
`bundle.c:unbundle` by adding the `VERIFY_BUNDLE_FSCK_FOLLOW_FETCH`
option to `verify_bundle_flags`. When this option is enabled,
`bundle.c:unbundle` invokes `fetch-pack.c:fetch_pack_fsck_objects` to
determine whether to append the "--fsck-objects" flag to
"git-index-pack".

`VERIFY_BUNDLE_FSCK_FOLLOW_FETCH` is now passed to `unbundle` in the
fetching process, including:

- `transport.c:fetch_refs_from_bundle` for direct bundle fetches.
- `bundle-uri.c:unbundle_from_file` for bundle-uri enabled fetches.

This addition ensures a consistent logic for object verification during
fetch operations. Tests have been added to confirm functionality in the
scenarios mentioned above.

Reviewed-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Xing Xin <xingxin.xx@bytedance.com>
@blanet blanet force-pushed the xx/bundle-uri-bug-using-bundle-list branch from 81a6782 to 53395e8 Compare June 11, 2024 12:14
@blanet
Copy link
Author

blanet commented Jun 11, 2024

/submit

Copy link

gitgitgadget bot commented Jun 11, 2024

Submitted as pull.1730.v6.git.1718109943.gitgitgadget@gmail.com

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git/ pr-1730/blanet/xx/bundle-uri-bug-using-bundle-list-v6

To fetch this version to local tag pr-1730/blanet/xx/bundle-uri-bug-using-bundle-list-v6:

git fetch --no-tags https://github.com/gitgitgadget/git/ tag pr-1730/blanet/xx/bundle-uri-bug-using-bundle-list-v6

Copy link

gitgitgadget bot commented Jun 11, 2024

On the Git mailing list, Patrick Steinhardt wrote (reply to this):

On Tue, Jun 11, 2024 at 12:45:40PM +0000, blanet via GitGitGadget wrote:
> While attempting to fix a reference negotiation bug in bundle-uri, we
> identified that the fetch process lacks some crucial object validation
> checks when processing bundles. The primary issues are:
> 
>  1. In the bundle-uri scenario, object IDs were not validated before writing
>     bundle references. This was the root cause of the original negotiation
>     bug in bundle-uri and could lead to potential repository corruption.
>  2. The existing "fetch.fsckObjects" and "transfer.fsckObjects"
>     configurations were not applied when directly fetching bundles or
>     fetching with bundle-uri enabled. In fact, there were no object
>     validation supports for unbundle.
> 
> The first patch addresses the bundle-uri negotiation issue by removing the
> REF_SKIP_OID_VERIFICATION flag when writing bundle references.
> 
> Patches 2 through 3 extend verify_bundle_flags for bundle.c:unbundle to add
> support for object validation (fsck) in fetch scenarios, mainly following
> the suggestions from Junio and Patrick on the mailing list.

Thanks, this version looks good to me.

Patrick

@@ -400,8 +400,7 @@ static int unbundle_from_file(struct repository *r, const char *file)
refs_update_ref(get_main_ref_store(the_repository),
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Xing Xin via GitGitGadget" <gitgitgadget@gmail.com> writes:

> This commit fixes the bug by removing `REF_SKIP_OID_VERIFICATION` flag

"Fix the bug by ...".

> when writing bundle refs. When `refs.c:refs_update_ref` is called to to

"to to"

> write the corresponding bundle refs, it triggers
> `refs.c:ref_transaction_commit`.  This, in turn, invokes
> `refs.c:ref_transaction_prepare`, which calls `transaction_prepare` of
> the refs storage backend. For files backend, this function is
> `files-backend.c:files_transaction_prepare`, and for reftable backend,
> it is `reftable-backend.c:reftable_be_transaction_prepare`. Both
> functions eventually call `object.c:parse_object`, which can invoke
> `packfile.c:reprepare_packed_git` to refresh `packed_git`.

Nice.  That does sound like the right fix.  The one who did
something to _require_ us to reprepare causes the packed_git
list refreshed.

>  test_expect_success 'create bundle' '
>  	git init clone-from &&
> -	git -C clone-from checkout -b topic &&
> -	test_commit -C clone-from A &&
> -	test_commit -C clone-from B &&
> -	git -C clone-from bundle create B.bundle topic
> +	(
> +		cd clone-from &&
> +		git checkout -b topic &&
> +
> +		test_commit A &&
> +		git bundle create A.bundle topic &&
> +
> +		test_commit B &&
> +		git bundle create B.bundle topic &&
> +
> +		# Create a bundle with reference pointing to non-existent object.
> +		sed "s/$(git rev-parse A)/$(git rev-parse B)/" <A.bundle >bad-header.bundle

I suspect that this would be terribly unportable.  The early part of
a bundle file may be text and sed may be able to grok but are you
sure everybody's implementation of sed would not barf (or even
worse, corrupt) the pack stream data that follows?

The code used in t/lib-bundle.sh:convert_bundle_to_pack() has been
in use since 8315588b (bundle: fix wrong check of read_header()'s
return value & add tests, 2007-03-06), so munging the bundle with
a code similar to it may have a better portability story.

Add something like:

        corrupt_bundle_header () {
                sed -e '/^$/q' "$@"
                cat
        }

to t/lib-bundle.sh, which can take an arbitrary sequence of command
line parameters to drive "sed", and can be used like so:

	corrupt_bundle_header \
		-e "s/^$(git rev-parse A) /$(git rev-parse B) /" \
		<A.bndl >B.bndl

perhaps?

> @@ -33,6 +42,16 @@ test_expect_success 'clone with path bundle' '
>  	test_cmp expect actual
>  '
>  
> +test_expect_success 'clone with bundle that has bad header' '
> +	git clone --bundle-uri="clone-from/bad-header.bundle" \
> +		clone-from clone-bad-header 2>err &&
> +	# Write bundle ref fails, but clone can still proceed.
> +	commit_b=$(git -C clone-from rev-parse B) &&
> +	test_grep "trying to write ref '\''refs/bundles/topic'\'' with nonexistent object $commit_b" err &&
> +	git -C clone-bad-header for-each-ref --format="%(refname)" >refs &&
> +	! grep "refs/bundles/" refs
> +'
> +

So this is the test the proposed log message discussed.  The
description gave a false impression that the "broken header" test
that has not much to do with the bug being fixed was the only added
test---we probably want to correct that impression.

> @@ -259,6 +278,132 @@ test_expect_success 'clone bundle list (file, any mode, all failures)' '
>  	! grep "refs/bundles/" refs
>  '
>  
> +#########################################################################
> +# Clone negotiation related tests begin here

Drop this divider and comment.  The HTTP tests you see below has a
much better reason to be separated like that in order to warn test
writers (they shouldn't add their random new tests after that point,
because everything after that one is skipped when HTTPD tests are
disabled---see the beginning of t/lib-httpd.sh which is included
after that divider line), but everything here you added is not
special.  Everybody should run these tests.

> +test_expect_success 'negotiation: bundle with part of wanted commits' '
> +	test_when_finished rm -rf trace*.txt &&

Do not overly depend on glob not matching at this point when you
establish the when-finished handler (or glob matching the files you
want to catch and later test not adding anything you would want to
clean).  Quote "rm -f trace*.txt" and drop "r" if you do not absolutely
need it (and I would imagine you don't, given the .txt suffix).

> +	GIT_TRACE_PACKET="$(pwd)/trace-packet.txt" \
> +	git clone --no-local --bundle-uri="clone-from/A.bundle" \
> +		clone-from nego-bundle-part &&
> +	git -C nego-bundle-part for-each-ref --format="%(refname)" >refs &&
> +	grep "refs/bundles/" refs >actual &&
> +	cat >expect <<-\EOF &&
> +	refs/bundles/topic
> +	EOF

Hmph, if the expected pattern is only a few lines without any magic,

	test_write_lines >expect refs/bundles/topic &&

may be easier to follow.

> +	test_cmp expect actual &&
> +	# Ensure that refs/bundles/topic are sent as "have".
> +	grep "clone> have $(git -C clone-from rev-parse A)" trace-packet.txt
> +'

Using "test_grep" would make it easier to diagnose when test breaks.
A failing "grep" will be silent (i.e., "I didn't find anything you
told me to look for").  A failing "test_grep" will tell you "I was
told to find this, but didn't find any in that".

> +test_expect_success 'negotiation: bundle with all wanted commits' '
> +	test_when_finished rm -rf trace*.txt &&
> +	GIT_TRACE_PACKET="$(pwd)/trace-packet.txt" \
> +	git clone --no-local --single-branch --branch=topic --no-tags \
> +		--bundle-uri="clone-from/B.bundle" \
> +		clone-from nego-bundle-all &&
> +	git -C nego-bundle-all for-each-ref --format="%(refname)" >refs &&
> +	grep "refs/bundles/" refs >actual &&
> +	cat >expect <<-\EOF &&
> +	refs/bundles/topic
> +	EOF
> +	test_cmp expect actual &&
> +	# We already have all needed commits so no "want" needed.
> +	! grep "clone> want " trace-packet.txt
> +'
> +
> +test_expect_success 'negotiation: bundle list (no heuristic)' '
> +	test_when_finished rm -f trace*.txt &&
> +	cat >bundle-list <<-EOF &&
> +	[bundle]
> +		version = 1
> +		mode = all
> +
> +	[bundle "bundle-1"]
> +		uri = file://$(pwd)/clone-from/bundle-1.bundle
> +
> +	[bundle "bundle-2"]
> +		uri = file://$(pwd)/clone-from/bundle-2.bundle
> +	EOF

OK.  This is a good use of here-doc (as opposed to test_write_lines
I sugested earlier for different purposes).  I wondered if these
$(pwd) and file://$(pwd) are safe (I always get confused by the need
to sometimes use $PWD to help Windows), but I see them used in what
Derrick wrote in this file, so they must be fine.

But there may be characters in the leading part of $(pwd) that we do
not control that needs quoting (like a double quote '"').  The value
of bundle.*.uri may need to be quoted a bit carefully.  This is not
a new problem this patch introduces, so you do not have to rewrite
this part of the patch; I'll mark it with #leftoverbits here---the
idea being somebody else who is too bored can come back, see if it
is truly broken, and fix them after all dust settles.

Abusing "git config -f bundle-list" might be safer, e.g.

	$ git config -f bundle.list bundle.bundle-1.uri \
		'file:///home/abc"def/t/trash dir/clone-from/b1.bndl'
	$ cat bundle.list
        [bundle "bundle-1"]
                uri = file:///home/abc\"def/t/trash dir/clone-from/b1.bndl

as you do not know what other garbage character is in $(pwd) part.

> +	# We already have all needed commits so no "want" needed.
> +	! grep "clone> want " trace-packet.txt

Just FYI, to negate test_grep, use

	test_grep ! "clone > want " trace-packet.txt

not	

	! test_grep "clone > want " trace-packet.txt ;# WRONG

@@ -954,12 +954,7 @@ static int get_pack(struct fetch_pack_args *args,
strvec_push(&cmd.args, alternate_shallow_file);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Xing Xin via GitGitGadget" <gitgitgadget@gmail.com> writes:

> -	if (fetch_fsck_objects >= 0
> -	    ? fetch_fsck_objects
> -	    : transfer_fsck_objects >= 0
> -	    ? transfer_fsck_objects
> -	    : 0)
> -		fsck_objects = 1;
> +	fsck_objects = fetch_pack_fsck_objects();
> ...
> +int fetch_pack_fsck_objects(void)
> +{
> +	fetch_pack_setup();

OK, this one is designed to be as lightweight as possible when it
has already been called, so a potentially duplicated call would not
cause too much worries here.

> +	if (fetch_fsck_objects >= 0)
> +		return fetch_fsck_objects;
> +	if (transfer_fsck_objects >= 0)
> +		return transfer_fsck_objects;
> +	return 0;
> +}

Much easier to follow.  Nicely done.

@@ -373,7 +373,7 @@ static int unbundle_from_file(struct repository *r, const char *file)
* the prerequisite commits.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Xing Xin via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Xing Xin <xingxin.xx@bytedance.com>
>
> This commit extends object verification support for fetches in
> `bundle.c:unbundle` by adding the `VERIFY_BUNDLE_FSCK_FOLLOW_FETCH`
> option to `verify_bundle_flags`. When this option is enabled,
> `bundle.c:unbundle` invokes `fetch-pack.c:fetch_pack_fsck_objects` to
> determine whether to append the "--fsck-objects" flag to
> "git-index-pack".

Please start your proposed log message by stating what the perceived
problem without this patch in the current world is.  Without it, you
cannot easily answer the most basic question: "why are we doing this?"

The name VERIFY_BUNDLE_FSCK_FOLLOW_FETCH does not read very well.
VERIFY_BUNDLE part is common across various flags, so what is
specific to the feature is "FSCK_FOLLOW_FETCH", and it is good that
we convey the fact that we do a bit more than the normal
VERIFY_BUNDLE (which is, to read the prerequisite headers and make
sure we have them in the sense that they are reachable from our
refs) with the word FSCK.

But is it necessary (or even a good idea) to limit its usability
with "FOLLOW_FETCH" (which does not look even grammatical)?  Aren't
we closing the door to other folks who may want to do a more
thorough fsck-like checks in other codepaths by saying "if you are
not doing this immediately after you fetch, you are unwelcome to use
this flag"?

> `VERIFY_BUNDLE_FSCK_FOLLOW_FETCH` is now passed to `unbundle` in the
> fetching process, including:
>
> - `transport.c:fetch_refs_from_bundle` for direct bundle fetches.
> - `bundle-uri.c:unbundle_from_file` for bundle-uri enabled fetches.
>
> This addition ensures a consistent logic for object verification during
> fetch operations. Tests have been added to confirm functionality in the
> scenarios mentioned above.
>
> Reviewed-by: Patrick Steinhardt <ps@pks.im>
> Signed-off-by: Xing Xin <xingxin.xx@bytedance.com>
> ---
>  bundle-uri.c                |  2 +-
>  bundle.c                    |  5 +++++
>  bundle.h                    |  1 +
>  t/t5558-clone-bundle-uri.sh | 35 ++++++++++++++++++++++++++++++++++-
>  t/t5607-clone-bundle.sh     | 33 +++++++++++++++++++++++++++++++++
>  transport.c                 |  2 +-
>  6 files changed, 75 insertions(+), 3 deletions(-)
>
> diff --git a/bundle-uri.c b/bundle-uri.c
> index 65666a11d9c..e7ebac6ce57 100644
> --- a/bundle-uri.c
> +++ b/bundle-uri.c
> @@ -373,7 +373,7 @@ static int unbundle_from_file(struct repository *r, const char *file)
>  	 * the prerequisite commits.
>  	 */
>  	if ((result = unbundle(r, &header, bundle_fd, NULL,
> -			       VERIFY_BUNDLE_QUIET)))
> +			       VERIFY_BUNDLE_QUIET | VERIFY_BUNDLE_FSCK_FOLLOW_FETCH)))
>  		return 1;

OK (modulo the flag name).

> +	if (flags & VERIFY_BUNDLE_FSCK_FOLLOW_FETCH)
> +		if (fetch_pack_fsck_objects())
> +			strvec_push(&ip.args, "--fsck-objects");
> +

OK, that's quite straight-forward.  We are running "index-pack
--fix-thin --stdin" and feeding the bundle data to it.  We just tell
it to also work in the "--fsck-objects" mode.

> diff --git a/transport.c b/transport.c
> index 0ad04b77fd2..6cd5683bb45 100644
> --- a/transport.c
> +++ b/transport.c
> @@ -184,7 +184,7 @@ static int fetch_refs_from_bundle(struct transport *transport,
>  	if (!data->get_refs_from_bundle_called)
>  		get_refs_from_bundle_inner(transport);
>  	ret = unbundle(the_repository, &data->header, data->fd,
> -		       &extra_index_pack_args, 0);
> +		       &extra_index_pack_args, VERIFY_BUNDLE_FSCK_FOLLOW_FETCH);

OK.

I wonder if something like this is a potential follow-up topic
somebody may be interested in after the dust settles.  That is
exactly why the name of this bit matters.



 builtin/bundle.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git c/builtin/bundle.c w/builtin/bundle.c
index d5d41a8f67..eeb5963dcb 100644
--- c/builtin/bundle.c
+++ w/builtin/bundle.c
@@ -194,10 +194,13 @@ static int cmd_bundle_unbundle(int argc, const char **argv, const char *prefix)
 	int bundle_fd = -1;
 	int ret;
 	int progress = isatty(2);
+	int fsck_objects = 0;
 
 	struct option options[] = {
 		OPT_BOOL(0, "progress", &progress,
 			 N_("show progress meter")),
+		OPT_BOOL(0, "fsck-objects", &fsck_objects,
+			 N_("check the objects in the bundle")),
 		OPT_END()
 	};
 	char *bundle_file;
@@ -217,7 +220,8 @@ static int cmd_bundle_unbundle(int argc, const char **argv, const char *prefix)
 		strvec_pushl(&extra_index_pack_args, "-v", "--progress-title",
 			     _("Unbundling objects"), NULL);
 	ret = !!unbundle(the_repository, &header, bundle_fd,
-			 &extra_index_pack_args, 0) ||
+			 &extra_index_pack_args,
+			 fsck_objects ? VERIFY_BUNDLE_FSCK_FOLLOW_FETCH : 0) ||
 		list_bundle_refs(&header, argc, argv);
 	bundle_header_release(&header);
 cleanup:

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, "Xing Xin" wrote (reply to this):

At 2024-06-12 04:05:35, "Junio C Hamano" <gitster@pobox.com> wrote:
>"Xing Xin via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
>> From: Xing Xin <xingxin.xx@bytedance.com>
>>
>> This commit extends object verification support for fetches in
>> `bundle.c:unbundle` by adding the `VERIFY_BUNDLE_FSCK_FOLLOW_FETCH`
>> option to `verify_bundle_flags`. When this option is enabled,
>> `bundle.c:unbundle` invokes `fetch-pack.c:fetch_pack_fsck_objects` to
>> determine whether to append the "--fsck-objects" flag to
>> "git-index-pack".
>
>Please start your proposed log message by stating what the perceived
>problem without this patch in the current world is.  Without it, you
>cannot easily answer the most basic question: "why are we doing this?"

Got it.

>The name VERIFY_BUNDLE_FSCK_FOLLOW_FETCH does not read very well.
>VERIFY_BUNDLE part is common across various flags, so what is
>specific to the feature is "FSCK_FOLLOW_FETCH", and it is good that
>we convey the fact that we do a bit more than the normal
>VERIFY_BUNDLE (which is, to read the prerequisite headers and make
>sure we have them in the sense that they are reachable from our
>refs) with the word FSCK.
>
>But is it necessary (or even a good idea) to limit its usability
>with "FOLLOW_FETCH" (which does not look even grammatical)?  Aren't
>we closing the door to other folks who may want to do a more
>thorough fsck-like checks in other codepaths by saying "if you are
>not doing this immediately after you fetch, you are unwelcome to use
>this flag"?

I initially considered adding another option VERIFY_BUNDLE_FSCK_ALWAYS
for other scenarios, which would take a higher priority than
VERIFY_BUNDLE_FSCK_FOLLOW_FETCH. However, that approach is also
confusing, as we would have two flags both controlling the fsck
behavior.

How about extending VERIFY_BUNDLE_FSCK and letting the callers decide
whether to add the flag for fscking?

In bundle.c, we can make a small change like:

@@ -625,6 +626,9 @@ int unbundle(struct repository *r, struct bundle_header *header,
 	if (header->filter.choice)
 		strvec_push(&ip.args, "--promisor=from-bundle");

+	if (flags & VERIFY_BUNDLE_FSCK)
+		strvec_push(&ip.args, "--fsck-objects");
+
 	if (extra_index_pack_args) {
 		strvec_pushv(&ip.args, extra_index_pack_args->v);
 		strvec_clear(extra_index_pack_args);

For example, in `bundle-uri.c:unbundle_from_file`, which is one of the
callers of unbundle, we can use `fetch_pack_fsck_objects` to decide
whether to add that option:

@@ -373,7 +373,7 @@ static int unbundle_from_file(struct repository *r, const char *file)
 	 * the prerequisite commits.
 	 */
 	if ((result = unbundle(r, &header, bundle_fd, NULL,
-			       VERIFY_BUNDLE_QUIET)))
+			       VERIFY_BUNDLE_QUIET | (fetch_pack_fsck_objects() ? VERIFY_BUNDLE_FSCK : 0))))
 		return 1;

This approach should streamline the code while maintaining flexibility.
The follow-up patch you mentioned below should just work then, it is not
for now because we are touching the unwanted `fetch_pack_fsck_objects`
within `unbundle`.

Xing Xin

>I wonder if something like this is a potential follow-up topic
>somebody may be interested in after the dust settles.  That is
>exactly why the name of this bit matters.
>
>
>
> builtin/bundle.c | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
>
>diff --git c/builtin/bundle.c w/builtin/bundle.c
>index d5d41a8f67..eeb5963dcb 100644
>--- c/builtin/bundle.c
>+++ w/builtin/bundle.c
>@@ -194,10 +194,13 @@ static int cmd_bundle_unbundle(int argc, const char **argv, const char *prefix)
> 	int bundle_fd = -1;
> 	int ret;
> 	int progress = isatty(2);
>+	int fsck_objects = 0;
> 
> 	struct option options[] = {
> 		OPT_BOOL(0, "progress", &progress,
> 			 N_("show progress meter")),
>+		OPT_BOOL(0, "fsck-objects", &fsck_objects,
>+			 N_("check the objects in the bundle")),
> 		OPT_END()
> 	};
> 	char *bundle_file;
>@@ -217,7 +220,8 @@ static int cmd_bundle_unbundle(int argc, const char **argv, const char *prefix)
> 		strvec_pushl(&extra_index_pack_args, "-v", "--progress-title",
> 			     _("Unbundling objects"), NULL);
> 	ret = !!unbundle(the_repository, &header, bundle_fd,
>-			 &extra_index_pack_args, 0) ||
>+			 &extra_index_pack_args,
>+			 fsck_objects ? VERIFY_BUNDLE_FSCK_FOLLOW_FETCH : 0) ||
> 		list_bundle_refs(&header, argc, argv);
> 	bundle_header_release(&header);
> cleanup:

Copy link

gitgitgadget bot commented Jun 12, 2024

This branch is now known as xx/bundie-uri-fixes.

Copy link

gitgitgadget bot commented Jun 12, 2024

This patch series was integrated into seen via git@7ae269d.

@gitgitgadget gitgitgadget bot added the seen label Jun 12, 2024
Copy link

gitgitgadget bot commented Jun 12, 2024

There was a status update in the "New Topics" section about the branch xx/bundie-uri-fixes on the Git mailing list:

When bundleURI interface fetches multiple bundles, Git failed to
take full advantage of all bundles and ended up slurping duplicated
objects.

Will merge to 'next'?
cf. <xmqqa5jruwkr.fsf@gitster.g>
cf. <xmqqsexjtfcg.fsf@gitster.g>
source: <pull.1730.v6.git.1718109943.gitgitgadget@gmail.com>

Copy link

gitgitgadget bot commented Jun 12, 2024

This patch series was integrated into seen via git@3739a48.

Copy link

gitgitgadget bot commented Jun 13, 2024

There was a status update in the "Cooking" section about the branch xx/bundie-uri-fixes on the Git mailing list:

When bundleURI interface fetches multiple bundles, Git failed to
take full advantage of all bundles and ended up slurping duplicated
objects.

Expecting a reroll.
cf. <xmqqa5jruwkr.fsf@gitster.g>
cf. <xmqqsexjtfcg.fsf@gitster.g>
source: <pull.1730.v6.git.1718109943.gitgitgadget@gmail.com>

Copy link

gitgitgadget bot commented Jun 13, 2024

This patch series was integrated into seen via git@384bfa3.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

1 participant