Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

send-pack.c: add config push.useBitmaps #1263

Closed
wants to merge 1 commit into from

Conversation

keyu98
Copy link

@keyu98 keyu98 commented Jun 15, 2022

This patch add config push.useBitmaps to prevent git push using bitmap.

The origin discussion is here:
https://lore.kernel.org/git/b940e705fbe9454685757f2e3055e2ce@tencent.com/

Thanks,
-Kyle

Changes since v1:

  • changed the commit message
  • modified and added missing \n to push.txt
  • used test_subcommand for test
  • modified "if" statement for "git_config_get_bool()" in send-pack.c

Changes since v2:

  • enable 'push.useBitmaps" by default
  • fix nit in t/t5516-fetch-push.sh

Changes since v3:

  • changed the commit message
  • s/no_use_bitmaps/disable_bitmaps in send-pack.h and send-pack.c
  • I modified the document about "push.useBitmaps".
    When it is true, Git will keep the historical behaviour. So I mainly introduced what happens when it set to false.
  • use test_config and test_unconfig for test.

cc: Ævar Arnfjörð Bjarmason avarab@gmail.com
cc: Derrick Stolee derrickstolee@github.com
cc: Taylor Blau me@ttaylorr.com
cc: kylezhao(赵柯宇) kylezhao@tencent.com
cc: Jeff King peff@peff.net

@keyu98
Copy link
Author

keyu98 commented Jun 15, 2022

/submit

@gitgitgadget
Copy link

gitgitgadget bot commented Jun 15, 2022

Submitted as pull.1263.git.1655291320433.gitgitgadget@gmail.com

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git/ pr-1263/keyu98/kz/push-usebitmps-config-v1

To fetch this version to local tag pr-1263/keyu98/kz/push-usebitmps-config-v1:

git fetch --no-tags https://github.com/gitgitgadget/git/ tag pr-1263/keyu98/kz/push-usebitmps-config-v1

@gitgitgadget
Copy link

gitgitgadget bot commented Jun 15, 2022

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 6/15/22 7:08 AM, Kyle Zhao via GitGitGadget wrote:
> From: Kyle Zhao <kylezhao@tencent.com>
> 
> This allows you to disabled bitmaps for "git push". Default is false.

s/disabled/disable/

> Bitmaps are designed to speed up the "counting objects" phase of
> subsequent packs created for clones and fetches.
> But in some cases, turning bitmaps on does horrible things for "push"
> performance[1].

I would rephrase this message body as follows:

  Reachability bitmaps are designed to speed up the "counting objects"
  phase of generating a pack during a clone or fetch. They are not
  optimized for Git clients sending a small topic branch via "git push".
  In some cases (see [1]), using reachability bitmaps during "git push"
  can cause significant performance regressions.

  Add a new "push.useBitmaps" config option to disable reachability
  bitmaps during "git push". This allows reachability bitmaps to still
  be used in other areas, such as "git rev-list --use-bitmap-index".

> [1]: https://lore.kernel.org/git/87zhoz8b9o.fsf@evledraar.gmail.com/

> +
> +push.useBitmaps::
> +	If this config and `pack.useBitmaps` are both "true", git will
> +	use pack bitmaps (if available) when git push. Default is false.

Rewording slightly:

  If this config and `pack.useBitmaps` are both `true`, then Git will
  use reachability bitmaps during `git push`, if available (disabled
  by default).

> \ No newline at end of file

Please fix this missing newline.

I'm glad that this references `pack.useBitmaps`. I wonder if that
config is sufficient for your purposes: do you expect to use your
bitmaps to generate pack-files in any other way than "git push"?

That is: are you serving remote requests from your repo with your
bitmaps while also using "git push"? Or, are you using bitmaps
only for things like "git rev-list --use-bitmap-index"?

I just want to compare this with `pack.useSparse` which targets
"git push", but focuses entirely on the pack-creation part of the
operation. Since the existence of reachability bitmaps overrides
the sparse algorithm, this does not affect Git servers (who should
have a reachability bitmap).

I just want to be sure that using pack.useBitmaps=false would not
be sufficient for your situation (and probably most situations).


> diff --git a/send-pack.c b/send-pack.c
> index bc0fcdbb000..d6091571caa 100644
> --- a/send-pack.c
> +++ b/send-pack.c
> @@ -84,6 +84,8 @@ static int pack_objects(int fd, struct ref *refs, struct oid_array *advertised,
>  		strvec_push(&po.args, "--progress");
>  	if (is_repository_shallow(the_repository))
>  		strvec_push(&po.args, "--shallow");
> +	if (!args->use_bitmaps)
> +		strvec_push(&po.args, "--no-use-bitmap-index");

Here is where you specify the lower-level pack creation only
when sending a pack. It is very focused. Good.

> +	int use_bitmaps = 0;

> +	git_config_get_bool("push.usebitmaps", &use_bitmaps);
> +	if (use_bitmaps)
> +		args->use_bitmaps = 1;

You can instead write this as:

	if (!git_config_get_bool("push.usebitmaps", &use_bitmaps))
		args->use_bitmaps = use_bitmaps;

> diff --git a/t/t5516-fetch-push.sh b/t/t5516-fetch-push.sh
> index dedca106a7a..ee0b912a5e8 100755
> --- a/t/t5516-fetch-push.sh
> +++ b/t/t5516-fetch-push.sh
> @@ -1865,4 +1865,15 @@ test_expect_success 'push warns or fails when using username:password' '
>  	test_line_count = 1 warnings
>  '
>  
> +test_expect_success 'push with config push.useBitmaps' '
> +	mk_test testrepo heads/main &&
> +	git checkout main &&
> +	GIT_TRACE=1 git push testrepo main:test >/dev/null 2>stderr &&

Just use "err" instead of "stderr".

> +	grep "no-use-bitmap-index" stderr &&
> +
> +	git config push.useBitmaps true &&
> +	GIT_TRACE=1 git push testrepo main:test2 >/dev/null 2>stderr &&
> +	! grep "no-use-bitmap-index" stderr
> +'

I believe this test is correct, but it might be worth looking
into test_subcommand if you can determine the exact subcommand
arguments you are looking for.

Thanks,
-Stolee

@gitgitgadget
Copy link

gitgitgadget bot commented Jun 15, 2022

On the Git mailing list, Ævar Arnfjörð Bjarmason wrote (reply to this):

On Wed, Jun 15 2022, Kyle Zhao via GitGitGadget wrote:

[CC'd Taylor, who's worked a lot on bitmaps]

> From: Kyle Zhao <kylezhao@tencent.com>
>
> This allows you to disabled bitmaps for "git push". Default is false.

Thanks for following up.

Refresh my memory, we always use them if we find them now, correct?
I.e. if repack.writebitmaps=true

This doesn't make it clear: Are we now going to ignore them for "push"
by default, even if they exist? I.e. a change in the current default.

I think I recall from the previous discussions that it was a bit of hit
& miss, maybe we're confident that they're almost (or always?) bad for
"push", but I think there *are* cases where they help.

> Bitmaps are designed to speed up the "counting objects" phase of
> subsequent packs created for clones and fetches.
> But in some cases, turning bitmaps on does horrible things for "push"
> performance[1].
>
> [1]: https://lore.kernel.org/git/87zhoz8b9o.fsf@evledraar.gmail.com/
>
> Signed-off-by: Kyle Zhao <kylezhao@tencent.com>
> ---
>     send-pack.c: add config push.useBitmaps
>     
>     This patch add config push.useBitmaps to prevent git push using bitmap.
>     
>     The origin discussion is here:
>     https://lore.kernel.org/git/b940e705fbe9454685757f2e3055e2ce@tencent.com/
>     
>     Thanks, Kyle
>
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1263%2Fkeyu98%2Fkz%2Fpush-usebitmps-config-v1
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1263/keyu98/kz/push-usebitmps-config-v1
> Pull-Request: https://github.com/gitgitgadget/git/pull/1263
>
>  Documentation/config/push.txt |  4 ++++
>  send-pack.c                   |  6 ++++++
>  send-pack.h                   |  3 ++-
>  t/t5516-fetch-push.sh         | 11 +++++++++++
>  4 files changed, 23 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/config/push.txt b/Documentation/config/push.txt
> index e32801e6c91..d8fb0bd1414 100644
> --- a/Documentation/config/push.txt
> +++ b/Documentation/config/push.txt
> @@ -137,3 +137,7 @@ push.negotiate::
>  	server attempt to find commits in common. If "false", Git will
>  	rely solely on the server's ref advertisement to find commits
>  	in common.
> +
> +push.useBitmaps::
> +	If this config and `pack.useBitmaps` are both "true", git will
> +	use pack bitmaps (if available) when git push. Default is false.
> \ No newline at end of file

"git diff" is telling you something is wrong here :) hint: missing \n.

> diff --git a/send-pack.c b/send-pack.c
> index bc0fcdbb000..d6091571caa 100644
> --- a/send-pack.c
> +++ b/send-pack.c
> @@ -84,6 +84,8 @@ static int pack_objects(int fd, struct ref *refs, struct oid_array *advertised,
>  		strvec_push(&po.args, "--progress");
>  	if (is_repository_shallow(the_repository))
>  		strvec_push(&po.args, "--shallow");
> +	if (!args->use_bitmaps)
> +		strvec_push(&po.args, "--no-use-bitmap-index");
>  	po.in = -1;
>  	po.out = args->stateless_rpc ? -1 : fd;
>  	po.git_cmd = 1;
> @@ -482,6 +484,7 @@ int send_pack(struct send_pack_args *args,
>  	int use_push_options = 0;
>  	int push_options_supported = 0;
>  	int object_format_supported = 0;
> +	int use_bitmaps = 0;
>  	unsigned cmds_sent = 0;
>  	int ret;
>  	struct async demux;
> @@ -497,6 +500,9 @@ int send_pack(struct send_pack_args *args,
>  	git_config_get_bool("push.negotiate", &push_negotiate);
>  	if (push_negotiate)
>  		get_commons_through_negotiation(args->url, remote_refs, &commons);
> +	git_config_get_bool("push.usebitmaps", &use_bitmaps);
> +	if (use_bitmaps)
> +		args->use_bitmaps = 1;

[A bit rambling, sorry]

A bit off of a use of the API, can't we just do:

	git_config_get_bool("push.usebitmaps", &args->use_bitmaps);

And drop the local variable? I.e. if we have a config variable we'll
write the value to it.

But then again that goes with the suggested default, i.e. I think we
should probably use them by default, and provide an out, unless we're
*really* sure they're useless for "push".

In this case I found the code a bit odd, which took me a moment to put
my finger on.

In builtin/send-pack.c we initialize "struct send_pack_args" in the file
scope, so it's zero'd out.

So all parameters are 0'd by default.

So your new "use_bitmaps" is born false.

Then when we get here you assign 0 to use_bitmaps, and only if it's true
do you use it.

Which to me is a couple of layers in to something that's less clear than
it could be, i.e. we're making the hard assumption here that the default
in the struct is false. So we should really just do:

	int tmp;
	if (!git_config_get_bool("push.usebitmaps", &tmp))
		args->use_bitmaps = tmp;

So don't care what the default was before, we have an explicit config
variable we're going to use, if we saw push.usebitmaps let's use its
value.

This way the default can flip, and this code downstream of that will
still do what's intended.

FWIW that "if" is redundant, but it's the general idiom, but we *can* do
it the way I suggested above, but I think it's clearer to go with the
second form, which reads more obviously as "if the variable exists, set
it to ...".

For config.c in particular the "without the if" works, but I think
that's relying on an implementation detail, i.e. we have a few other
APIs that "zero out" the parameter you put in as the first thing they
do. So if they also return "I didn't error" you want to use a temp
variable, and only assign it to your "real" variable if the return value
was "OK".

>  
>  	git_config_get_bool("transfer.advertisesid", &advertise_sid);
>  
> diff --git a/send-pack.h b/send-pack.h
> index e148fcd9609..f7af1b0353e 100644
> --- a/send-pack.h
> +++ b/send-pack.h
> @@ -26,7 +26,8 @@ struct send_pack_args {
>  		/* One of the SEND_PACK_PUSH_CERT_* constants. */
>  		push_cert:2,
>  		stateless_rpc:1,
> -		atomic:1;
> +		atomic:1,
> +		use_bitmaps:1;
>  	const struct string_list *push_options;
>  };
>  
> diff --git a/t/t5516-fetch-push.sh b/t/t5516-fetch-push.sh
> index dedca106a7a..ee0b912a5e8 100755
> --- a/t/t5516-fetch-push.sh
> +++ b/t/t5516-fetch-push.sh
> @@ -1865,4 +1865,15 @@ test_expect_success 'push warns or fails when using username:password' '
>  	test_line_count = 1 warnings
>  '
>  
> +test_expect_success 'push with config push.useBitmaps' '
> +	mk_test testrepo heads/main &&
> +	git checkout main &&
> +	GIT_TRACE=1 git push testrepo main:test >/dev/null 2>stderr &&

We generally don't >/dev/null in tests, just emit the output, and if we
run with -v you'll see it.

In this case though you want just:

    GIT_TRACE="$PWD/trace.log" [...]

Without any redirection,

Also, you probably want GIT_TRACE2_EVENT=$PWD/trace.json, and see
"test_subcommand". We have a handy helper for finding if we have trace2
regions.

I'm assuming we have some region for "used bitmaps" already, but I
didn't check...

@gitgitgadget
Copy link

gitgitgadget bot commented Jun 15, 2022

On the Git mailing list, Taylor Blau wrote (reply to this):

On Wed, Jun 15, 2022 at 09:09:35AM -0400, Derrick Stolee wrote:
> I just want to be sure that using pack.useBitmaps=false would not
> be sufficient for your situation (and probably most situations).

I think the only other affected scenario on the client side would be
repacking. And in theory most clients are repacking in the background
anyways, so any speed-ups from using bitmaps wouldn't be noticeable
anyway.

I think just relying on the existing pack.useBitmaps config should be
sufficient here.

Thanks,
Taylor

@gitgitgadget
Copy link

gitgitgadget bot commented Jun 15, 2022

User Taylor Blau <me@ttaylorr.com> has been added to the cc: list.

@gitgitgadget
Copy link

gitgitgadget bot commented Jun 15, 2022

On the Git mailing list, Taylor Blau wrote (reply to this):

On Wed, Jun 15, 2022 at 09:47:58PM +0200, Ævar Arnfjörð Bjarmason wrote:
> Refresh my memory, we always use them if we find them now, correct?
> I.e. if repack.writebitmaps=true

Kind of. `rev-list` has an opt-in `--use-bitmap-index` option, and
`pack-objects` has `pack.useBitmaps` which controls whether or not
bitmaps are read.

So `git rev-list --objects HEAD` won't use any bitmaps, even if they
exist, but `git rev-list --objects --use-bitmap-index HEAD` will. There
there's a good reason not to use bitmaps by default, which is that they
(effectively) randomize the ordering of your output.

`pack-objects` behaves slightly differently, cf. its
`use_bitmap_index{,default}` variables to see how that works.

> This doesn't make it clear: Are we now going to ignore them for "push"
> by default, even if they exist? I.e. a change in the current default.
>
> I think I recall from the previous discussions that it was a bit of hit
> & miss, maybe we're confident that they're almost (or always?) bad for
> "push", but I think there *are* cases where they help.

Yeah. In theory they should help most of the time as long as the bitmaps
are kept reasonably up-to-date. There's a tradeoff between using bitmaps
and how much walking between bitmap and ref-tips is required. Since
every object we encounter between the most recent bitmap and the thing
we're trying to generate a bitmap for has to incur extra overhead in
order to find and mark its position in the bitmap being generated,
there's a certain point at which it would have been faster to just walk
down to the roots and avoid bitmaps altogether.

But I suspect that in this case the bitmaps are just simply stale, and
that a "git repack -adb" or more aggressive background maintenance would
make things quite a bit faster.

Thanks,
Taylor

@gitgitgadget
Copy link

gitgitgadget bot commented Jun 16, 2022

On the Git mailing list, kylezhao(赵柯宇) wrote (reply to this):

> > I'm glad that this references `pack.useBitmaps`. I wonder if that config is sufficient for your purposes: do you expect to use your bitmaps to generate pack-files in any other way > > I just want to be sure that using pack.useBitmaps=false would not be 
> > sufficient for your situation (and probably most situations).
> 
> I think the only other affected scenario on the client side would be repacking. And in theory most clients are repacking in the background anyways, so any speed-ups from using bitmaps wouldn't be noticeable anyway.
> 
> I think just relying on the existing pack.useBitmaps config should be sufficient here.

In fact ,we also use "git push" on our server side. 
Each git repositories have multiple replicas on our servers to improve system disaster tolerance and read performance.

For example, a git repo will be distributed on multiple servers (like server-1, server-2, server-3).
If user pushes the pack to server-1, then server-1 will call "git push" to transfer the objects data to server-2 and server-3.
And users can clone from all the server mentioned above.
Under such a process, our system works well most of the time.

If we set pack.useBitmaps=false, "git upload-pack" will also be affected.
For my situation, it would be sufficient when set both pack.useBitmaps=true and push.useBitmaps=false.

> But I suspect that in this case the bitmaps are just simply stale, and
> that a "git repack -adb" or more aggressive background maintenance would
> make things quite a bit faster.

It doesn't seem to be the reason.
I have already called  "git repack -adb" in this case[1] but that didn't seem to fix the push performance issue.
You can see that the git repo had only one pack at that time.

[1] https://lore.kernel.org/git/b940e705fbe9454685757f2e3055e2ce@tencent.com/

Thanks,
- Kyle

@gitgitgadget
Copy link

gitgitgadget bot commented Jun 16, 2022

User kylezhao(赵柯宇) <kylezhao@tencent.com> has been added to the cc: list.

@gitgitgadget
Copy link

gitgitgadget bot commented Jun 16, 2022

There is a merge commit in this Pull Request:

d0397ef3ac13ad5c26951b60bf8ee951ea7f22ec

Please rebase the branch and force-push.

@keyu98 keyu98 force-pushed the kz/push-usebitmps-config branch 3 times, most recently from 91ce038 to 42e0b48 Compare June 16, 2022 03:32
@keyu98
Copy link
Author

keyu98 commented Jun 16, 2022

/submit

@gitgitgadget
Copy link

gitgitgadget bot commented Jun 16, 2022

Submitted as pull.1263.v2.git.1655350617442.gitgitgadget@gmail.com

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git/ pr-1263/keyu98/kz/push-usebitmps-config-v2

To fetch this version to local tag pr-1263/keyu98/kz/push-usebitmps-config-v2:

git fetch --no-tags https://github.com/gitgitgadget/git/ tag pr-1263/keyu98/kz/push-usebitmps-config-v2

@gitgitgadget
Copy link

gitgitgadget bot commented Jun 16, 2022

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 6/15/2022 11:36 PM, Kyle Zhao via GitGitGadget wrote:
> From: Kyle Zhao <kylezhao@tencent.com>
> 
> This allows you to disable bitmaps for "git push". Default is false.
> 
> Reachability bitmaps are designed to speed up the "counting objects"
> phase of generating a pack during a clone or fetch. They are not
> optimized for Git clients sending a small topic branch via "git push".
> In some cases (see [1]), using reachability bitmaps during "git push"
> can cause significant performance regressions.
> 
> Add a new "push.useBitmaps" config option to disable reachability
> bitmaps during "git push". This allows reachability bitmaps to still
> be used in other areas, such as "git rev-list --use-bitmap-index".
> 
> [1]: https://lore.kernel.org/git/87zhoz8b9o.fsf@evledraar.gmail.com/
> 
> Signed-off-by: Kyle Zhao <kylezhao@tencent.com>
> ---
>     send-pack.c: add config push.useBitmaps
>     
>     This patch add config push.useBitmaps to prevent git push using bitmap.
>     
>     The origin discussion is here:
>     https://lore.kernel.org/git/b940e705fbe9454685757f2e3055e2ce@tencent.com/
>     
>     Thanks, -Kyle
>     
>     Changes since v1:
>     
>      * changed the commit message
>      * modified and added missing \n to push.txt
>      * used test_subcommand for test
>      * modified "if" statement for "git_config_get_bool()" in send-pack.c

This version satisfies the recommended changes I had from the last
version. As long as we think there is a benefit to having this
additional knob over just pack.useBitmaps, then this is good to go.

For my part, I think having the additional knob is less complicated
than requiring all users in this situation to shift from "push" to
a "fetch" (from the other side of the connection).

Thanks,
-Stolee

@gitgitgadget
Copy link

gitgitgadget bot commented Jun 16, 2022

On the Git mailing list, Ævar Arnfjörð Bjarmason wrote (reply to this):

On Thu, Jun 16 2022, Kyle Zhao via GitGitGadget wrote:

> From: Kyle Zhao <kylezhao@tencent.com>
>
> This allows you to disable bitmaps for "git push". Default is false.
>
> Reachability bitmaps are designed to speed up the "counting objects"
> phase of generating a pack during a clone or fetch. They are not
> optimized for Git clients sending a small topic branch via "git push".
> In some cases (see [1]), using reachability bitmaps during "git push"
> can cause significant performance regressions.
>
> Add a new "push.useBitmaps" config option to disable reachability
> bitmaps during "git push". This allows reachability bitmaps to still
> be used in other areas, such as "git rev-list --use-bitmap-index".
>
> [1]: https://lore.kernel.org/git/87zhoz8b9o.fsf@evledraar.gmail.com/
> [...]
> +	mk_test testrepo heads/main &&
> +	git checkout main &&
> +	GIT_TRACE2_EVENT="$PWD/default" \
> +	git push testrepo main:test &&
> +	test_subcommand git pack-objects --all-progress-implied --revs --stdout \
> +	--thin --delta-base-offset -q --no-use-bitmap-index <default &&

Nit: We tend to indent these ase we wrap, so e.g.:

	test_subcommand git ... \
		--thin --delta [...]

The rest all looks good as far as the diff goes, if what we want to do
is to disable this by default, and this isn't worth a re-roll in itself.

But I still think that completely disabling bitmaps might be premature
here, especially per Taylor's comment on v1 (which I understand to mean
that they should help some of the time, even with push).

And since this is linking to that old thread I started in 2019 I really
think the commit message should be exploring and justifying why this
might be slower in all cases, or at least a sufficient number of cases
to flip the default.

if it's not are we throwing out a generally useful optimization (by
default) due to some edge cases we've found?

E.g. have you tried to follow-up on this comment by Jeff King?
https://lore.kernel.org/git/20190523113031.GA17448@sigill.intra.peff.net/

At the time I didn't, because as noted in a follow-up I'd lost my test
case by the time I read that, but it seems you haven't, and have a
current test case.

If you apply that patch on that old git version at the time (as it
probably won't apply cleanly), does it make a difference for your case?

To be clear I'm *not* confident that this isn't the right thing to do as
far as the diff is concerned. But I *am* confident that the proposed
commit message isn't selling me on the idea.

I think a really good start would be to come up with some benchmarks for
a few common cases, e.g. how do bitmaps do if we have a big repo but few
refs, how about a lot of refs? Does their stale-ness make a difference
or not (per previous discussions) etc. etc.

Or, if you don't have time for any of that I think a good change for now
would be to add the flag, but not flip the default, and say "we don't
have enough data to say if bitmaps are overall worse for pushes, but
here's an opt-opt".

But the change as it stands is effectively saying "bitmaps on push
hinder more than they help, and turning them on for push was a mistake".

Maybe that's true, but I don't think we've got any data to support
that. Even though I've got one of those anecdotes (and from occasional
investigations of slow "git push" I'm pretty sure this would help my
use-cases more than it would harm them) the plural of anecdote isn't
data.

I've only personally run into this with repos with a large number (>30k
at least) number of refs, and where the bitmaps were in some state of
staleness (this usually being my working copy, where I rely on "git gc
--auto").

@gitgitgadget
Copy link

gitgitgadget bot commented Jun 16, 2022

On the Git mailing list, kylezhao(赵柯宇) wrote (reply to this):

> And since this is linking to that old thread I started in 2019 I really
>think the commit message should be exploring and justifying why this
> might be slower in all cases, or at least a sufficient number of cases
> to flip the default.
> 
> if it's not are we throwing out a generally useful optimization (by
> default) due to some edge cases we've found?

I'm not sure if it's an edge case.
However, if a git repo has this problem, it's "git push" will always be very slow
instead of going fast and slow.

> E.g. have you tried to follow-up on this comment by Jeff King?
> https://lore.kernel.org/git/20190523113031.GA17448@sigill.intra.peff.net/

Not yet, I'll try it later.

> At the time I didn't, because as noted in a follow-up I'd lost my test
> case by the time I read that, but it seems you haven't, and have a
> current test case.

I tried to find a test case in open-source projects, and finally found one.
$ git clone https://github.com/JetBrains/intellij-community.git --bare
$ cd intellij-community.git
$ git repack -adb
$ GIT_TRACE=1 git push . master:test1

Then it would get stuck in "git pack-objects" for about 10s. (OS: Windows)
After I deleted the bitmap, "git push" took less than 1s.


> I think a really good start would be to come up with some benchmarks for
> a few common cases, e.g. how do bitmaps do if we have a big repo but few
> refs, how about a lot of refs? Does their stale-ness make a difference
> or not (per previous discussions) etc. etc.

According to my observation, the problem always occur on git repos which have 
a lot of refs and objects.
Also, "git push"  takes much time on "find _objects(..) for haves_bitmap".

Hope the information above can help find the cause of the problem.

Thanks,
- Kyle

@gitgitgadget
Copy link

gitgitgadget bot commented Jun 16, 2022

On the Git mailing list, Junio C Hamano wrote (reply to this):

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

>> +	mk_test testrepo heads/main &&
>> +	git checkout main &&
>> +	GIT_TRACE2_EVENT="$PWD/default" \
>> +	git push testrepo main:test &&
>> +	test_subcommand git pack-objects --all-progress-implied --revs --stdout \
>> +	--thin --delta-base-offset -q --no-use-bitmap-index <default &&
>
> Nit: We tend to indent these ase we wrap, so e.g.:
>
> 	test_subcommand git ... \
> 		--thin --delta [...]
>
> The rest all looks good as far as the diff goes, if what we want to do
> is to disable this by default, and this isn't worth a re-roll in itself.
>
> But I still think that completely disabling bitmaps might be premature
> here, especially per Taylor's comment on v1 (which I understand to mean
> that they should help some of the time, even with push).

The usual way to move is to move slowly and carefully.

It may well be the case that disabling bitmaps gives users a better
default, but that is not even proven and is hard to prove.  How many
users of Git do we have?  Those silently using it happily will by
definition complain here or elsewhere, and the complaints "X is slow
with Y, so Y should be disabled when doing X" we hear tend to be
louder than "I am happily doing X with Y".

I have different problems with this patch, though.  It can use a bit
more honesty.  If you introduce a new knob and sell it as a knob
that allows disabling, be honest and keep its behaviour as
advertised.

As posted, IIUC, the patch does something quite different.  It
disables by default, and have a knob to allow it enabled again.

So, perhaps make it default on to keep the historical behaviour, and
document it as "setting it false may improve push performance without
affecting use of the reachability bitmaps for other operations.
Default is true."

Thanks.

@gitgitgadget
Copy link

gitgitgadget bot commented Jun 16, 2022

On the Git mailing list, Jeff King wrote (reply to this):

On Thu, Jun 16, 2022 at 03:38:36PM +0200, Ævar Arnfjörð Bjarmason wrote:

> But the change as it stands is effectively saying "bitmaps on push
> hinder more than they help, and turning them on for push was a mistake".
> 
> Maybe that's true, but I don't think we've got any data to support
> that. Even though I've got one of those anecdotes (and from occasional
> investigations of slow "git push" I'm pretty sure this would help my
> use-cases more than it would harm them) the plural of anecdote isn't
> data.

Yeah, I'm not convinced they hinder more than they help on net. And any
problems on pushing are also things that _could_ happen on fetching, so
I think this is really a band-aid. But:

  - even if the average is improved, it's reasonable to want to avoid
    the most pathological cases

  - it's reasonable for it to be the user's choice to make, and right
    now the config isn't expressive enough to allow the choice they want

My biggest fear would be that the pathological cases are improved
(perhaps with the approach of mine that you linked earlier; thanks for
that), but that people erroneously continue to think that turning off
push.useBitmaps is a good idea due to inertia or superstition. But that
seems like a minor problem that can be addressed later if need be.

-Peff

@gitgitgadget
Copy link

gitgitgadget bot commented Jun 16, 2022

User Jeff King <peff@peff.net> has been added to the cc: list.

@gitgitgadget
Copy link

gitgitgadget bot commented Jun 16, 2022

On the Git mailing list, Taylor Blau wrote (reply to this):

On Thu, Jun 16, 2022 at 03:11:09PM +0000, kylezhao(赵柯宇) wrote:
> > At the time I didn't, because as noted in a follow-up I'd lost my test
> > case by the time I read that, but it seems you haven't, and have a
> > current test case.
>
> I tried to find a test case in open-source projects, and finally found one.
> $ git clone https://github.com/JetBrains/intellij-community.git --bare
> $ cd intellij-community.git
> $ git repack -adb
> $ GIT_TRACE=1 git push . master:test1

I wouldn't expect this to push any objects at all, since you're pushing
to a repository that already has all of the objects contained in
`master`.

A more representative test might be something like:

    $ git clone https://github.com/JetBrains/intellij-community.git --bare
    $ cd intellij-community.git
    $ git repack -adb
    $ git rev-parse HEAD >in
    $ time git pack-objects --revs --stdout <in >/dev/null
    # move the bitmap away so we don't use it
    $ mv objects/pack/pack-*.bitmap{,.bak}
    $ time git pack-objects --revs --stdout <in >/dev/null

Thanks,
Taylor

@keyu98
Copy link
Author

keyu98 commented Jun 17, 2022

/submit

@gitgitgadget
Copy link

gitgitgadget bot commented Jun 17, 2022

Submitted as pull.1263.v3.git.1655438361228.gitgitgadget@gmail.com

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git/ pr-1263/keyu98/kz/push-usebitmps-config-v3

To fetch this version to local tag pr-1263/keyu98/kz/push-usebitmps-config-v3:

git fetch --no-tags https://github.com/gitgitgadget/git/ tag pr-1263/keyu98/kz/push-usebitmps-config-v3

@gitgitgadget
Copy link

gitgitgadget bot commented Jun 17, 2022

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Kyle Zhao via GitGitGadget" <gitgitgadget@gmail.com> writes:

>  Documentation/config/push.txt |  6 ++++++
>  send-pack.c                   |  6 ++++++
>  send-pack.h                   |  3 ++-
>  t/t5516-fetch-push.sh         | 21 +++++++++++++++++++++
>  4 files changed, 35 insertions(+), 1 deletion(-)

Thanks.  This round looks more or less OK.

> diff --git a/Documentation/config/push.txt b/Documentation/config/push.txt
> index e32801e6c91..f16ac9311db 100644
> --- a/Documentation/config/push.txt
> +++ b/Documentation/config/push.txt
> @@ -137,3 +137,9 @@ push.negotiate::
>  	server attempt to find commits in common. If "false", Git will
>  	rely solely on the server's ref advertisement to find commits
>  	in common.
> +
> +push.useBitmaps::
> +	If this config and `pack.useBitmaps` are both `true`, then Git will
> +	use reachability bitmaps during `git push`, if available. If set to
> +	`false`, may improve push performance without affecting use of the
> +	reachability bitmaps for other operations. Default is true.

While nothing in the description is incorrect per-se, I somehow find
it hard to follow.  "git push" uses reachability bitmaps when three
conditions hold true at the same time (pack.useBitmaps that is by
default true, push.useBitmaps that is by default true, and there
exist reachability bitmaps to be used).  It is left unsaid why the
user needs to know about this configuration variable in order to
tweak "without affecting other operations".

I tried to come up with a version that I find easier to read (at
least to me).  I am not sure how successfull I was.

    While git can be told to use reachability bitmaps in general
    with "pack.useBitmaps" configuration, this variable can be used
    to disable use of bitmaps only for "git push" by setting it to
    false, without preventing other git operations from using
    bitmaps.

> diff --git a/send-pack.c b/send-pack.c
> index bc0fcdbb000..d18cde850ef 100644
> --- a/send-pack.c
> +++ b/send-pack.c
> @@ -84,6 +84,8 @@ static int pack_objects(int fd, struct ref *refs, struct oid_array *advertised,
>  		strvec_push(&po.args, "--progress");
>  	if (is_repository_shallow(the_repository))
>  		strvec_push(&po.args, "--shallow");
> +	if (args->no_use_bitmaps)
> +		strvec_push(&po.args, "--no-use-bitmap-index");

"disable_bitmaps" might be a better name for the struct member, but OK.

> @@ -498,6 +501,9 @@ int send_pack(struct send_pack_args *args,
>  	if (push_negotiate)
>  		get_commons_through_negotiation(args->url, remote_refs, &commons);
>  
> +	if (!git_config_get_bool("push.usebitmaps", &use_bitmaps))
> +		args->no_use_bitmaps = !use_bitmaps;

Because the "--no-use-bitmap-index" is explicitly given only when
the ".no_use_bitmaps" member is true and we do not pass
"--use-bitmap-index" merely because the member is false, the
configuration can only disable, which matches the documented
design.  Good.

> diff --git a/t/t5516-fetch-push.sh b/t/t5516-fetch-push.sh
> index dedca106a7a..ffda830ba22 100755
> --- a/t/t5516-fetch-push.sh
> +++ b/t/t5516-fetch-push.sh
> @@ -1865,4 +1865,25 @@ test_expect_success 'push warns or fails when using username:password' '
>  	test_line_count = 1 warnings
>  '
>  
> +test_expect_success 'push with config push.useBitmaps' '
> +	mk_test testrepo heads/main &&
> +	git checkout main &&
> +	GIT_TRACE2_EVENT="$PWD/default" \
> +	git push testrepo main:test &&
> +	test_subcommand git pack-objects --all-progress-implied --revs --stdout \
> +		--thin --delta-base-offset -q <default &&

We do not pass "--no-use-bitmap-index" without configuration.
Good. 

How do we know this is the "default" state, by the way?  Because we
read the tests before this, all 1860 lines?

Perhaps

	test_unconfig push.useBitmaps &&

at the very beginning of the test would help future readers.  That
way, they can immediately tell that the first test piece is without
the variable.

> +	git config push.useBitmaps true &&
> +	GIT_TRACE2_EVENT="$PWD/true" \
> +	git push testrepo main:test2 &&

Perhaps use

	test_config push.useBitmaps true

instead?  That way, after this test-expect-success finishes,
push.useBitmaps variable will be wiped from the configuration
variable.  That is a good hygiene to follow to help future
developers who may need to add more tests after this one.

> +	test_subcommand git pack-objects --all-progress-implied --revs --stdout \
> +		--thin --delta-base-offset -q <true &&
> +
> +	git config push.useBitmaps false &&

Likewise.

> +	GIT_TRACE2_EVENT="$PWD/false" \
> +	git push testrepo main:test3 &&
> +	test_subcommand git pack-objects --all-progress-implied --revs --stdout \
> +		--thin --delta-base-offset -q --no-use-bitmap-index <false
> +'
> +
>  test_done


Lastly, some critique on the proposed log message.

> This allows you to disable bitmaps for "git push". Default is true.

I find it more confusing to have this line at the beginning than
without.  What the configuration variable does is explained later
anyway, and our convention is not to start from such a "conclusion"
but ease the readers into the reasoning by starting with the
explanation of the status quo, which leads readers to realize why
the current system is lacking.

> Reachability bitmaps are designed to speed up the "counting objects"
> phase of generating a pack during a clone or fetch. They are not
> optimized for Git clients sending a small topic branch via "git push".
> In some cases (see [1]), using reachability bitmaps during "git push"
> can cause significant performance regressions.

And this paragraph brilliantly does its job.  A reader who did not
even know what the reachability bitmaps are for is now prepared to
follow your logic why it is a good idea to special case "git push"
because you explain that it is good for some things and not
necessarily good for others like "git push".

> Add a new "push.useBitmaps" config option to disable reachability
> bitmaps during "git push". This allows reachability bitmaps to still
> be used in other areas, such as "git rev-list --use-bitmap-index".

I had the same problem with this paragraph as the one in the
documentation part of the patch.  Nothing in it is incorrect per-se,
but is roundabout way of saying what it wants to say.

Here is my attempted rewrite.

    Add a new "push.useBitmaps" configuration variable to allow users to
    tell "git push" not to use bitmaps.  We already have "pack.bitmaps"
    that controls the use of bitmaps, but a separate configuration
    variable allows the reachability bitmaps to still be used in other
    areas, such as "git rev-list --use-bitmap-index", while disabling it
    only for "git push".

Thanks.

Reachability bitmaps are designed to speed up the "counting objects"
phase of generating a pack during a clone or fetch. They are not
optimized for Git clients sending a small topic branch via "git push".
In some cases (see [1]), using reachability bitmaps during "git push"
can cause significant performance regressions.

Add a new "push.useBitmaps" configuration variable to allow users to
tell "git push" not to use bitmaps. We already have "pack.bitmaps"
that controls the use of bitmaps, but a separate configuration variable
allows the reachability bitmaps to still be used in other areas,
such as "git upload-pack", while disabling it only for "git push".

[1]: https://lore.kernel.org/git/87zhoz8b9o.fsf@evledraar.gmail.com/

Signed-off-by: Kyle Zhao <kylezhao@tencent.com>
@keyu98
Copy link
Author

keyu98 commented Jun 17, 2022

/submit

@gitgitgadget
Copy link

gitgitgadget bot commented Jun 17, 2022

Submitted as pull.1263.v4.git.1655492779228.gitgitgadget@gmail.com

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git/ pr-1263/keyu98/kz/push-usebitmps-config-v4

To fetch this version to local tag pr-1263/keyu98/kz/push-usebitmps-config-v4:

git fetch --no-tags https://github.com/gitgitgadget/git/ tag pr-1263/keyu98/kz/push-usebitmps-config-v4

@gitgitgadget
Copy link

gitgitgadget bot commented Jun 18, 2022

This patch series was integrated into seen via git@699d68f.

@gitgitgadget
Copy link

gitgitgadget bot commented Jun 18, 2022

This patch series was integrated into seen via git@2d65c0e.

@gitgitgadget
Copy link

gitgitgadget bot commented Jun 18, 2022

There was a status update in the "New Topics" section about the branch zk/push-use-bitmaps on the Git mailing list:

"git push" sometimes perform poorly when reachability bitmaps are
used, even in a repository where other operations are helped by
bitmaps.  The push.useBitmaps configuration variable is introduced
to allow disabling use of reachability bitmaps only for "git push".

Will merge to 'next'?
source: <pull.1263.v4.git.1655492779228.gitgitgadget@gmail.com>

@gitgitgadget
Copy link

gitgitgadget bot commented Jun 21, 2022

This patch series was integrated into seen via git@1dc587f.

@gitgitgadget
Copy link

gitgitgadget bot commented Jun 21, 2022

This patch series was integrated into seen via git@e0b6469.

@gitgitgadget
Copy link

gitgitgadget bot commented Jun 22, 2022

This patch series was integrated into seen via git@8130449.

@gitgitgadget
Copy link

gitgitgadget bot commented Jun 22, 2022

There was a status update in the "Cooking" section about the branch zk/push-use-bitmaps on the Git mailing list:

"git push" sometimes perform poorly when reachability bitmaps are
used, even in a repository where other operations are helped by
bitmaps.  The push.useBitmaps configuration variable is introduced
to allow disabling use of reachability bitmaps only for "git push".

Will merge to 'next'?
source: <pull.1263.v4.git.1655492779228.gitgitgadget@gmail.com>

@gitgitgadget
Copy link

gitgitgadget bot commented Jun 23, 2022

This patch series was integrated into seen via git@d02b069.

@gitgitgadget
Copy link

gitgitgadget bot commented Jun 23, 2022

This patch series was integrated into seen via git@8b0a66a.

@gitgitgadget
Copy link

gitgitgadget bot commented Jun 27, 2022

This patch series was integrated into seen via git@47e6fcb.

@gitgitgadget
Copy link

gitgitgadget bot commented Jun 27, 2022

There was a status update in the "Cooking" section about the branch zk/push-use-bitmaps on the Git mailing list:

"git push" sometimes perform poorly when reachability bitmaps are
used, even in a repository where other operations are helped by
bitmaps.  The push.useBitmaps configuration variable is introduced
to allow disabling use of reachability bitmaps only for "git push".

Will merge to 'next'.
source: <pull.1263.v4.git.1655492779228.gitgitgadget@gmail.com>

@gitgitgadget
Copy link

gitgitgadget bot commented Jun 29, 2022

This patch series was integrated into seen via git@f43efff.

@gitgitgadget
Copy link

gitgitgadget bot commented Jul 1, 2022

There was a status update in the "Cooking" section about the branch zk/push-use-bitmaps on the Git mailing list:

"git push" sometimes perform poorly when reachability bitmaps are
used, even in a repository where other operations are helped by
bitmaps.  The push.useBitmaps configuration variable is introduced
to allow disabling use of reachability bitmaps only for "git push".

Will merge to 'next'.
source: <pull.1263.v4.git.1655492779228.gitgitgadget@gmail.com>

@gitgitgadget
Copy link

gitgitgadget bot commented Jul 3, 2022

This patch series was integrated into seen via git@b25c611.

@gitgitgadget
Copy link

gitgitgadget bot commented Jul 6, 2022

This patch series was integrated into seen via git@4312b03.

@gitgitgadget
Copy link

gitgitgadget bot commented Jul 8, 2022

This patch series was integrated into seen via git@74edef6.

@gitgitgadget
Copy link

gitgitgadget bot commented Jul 8, 2022

This patch series was integrated into next via git@8aa1f94.

@gitgitgadget gitgitgadget bot added the next label Jul 8, 2022
@gitgitgadget
Copy link

gitgitgadget bot commented Jul 9, 2022

There was a status update in the "Cooking" section about the branch zk/push-use-bitmaps on the Git mailing list:

"git push" sometimes perform poorly when reachability bitmaps are
used, even in a repository where other operations are helped by
bitmaps.  The push.useBitmaps configuration variable is introduced
to allow disabling use of reachability bitmaps only for "git push".

Will merge to 'master'.
source: <pull.1263.v4.git.1655492779228.gitgitgadget@gmail.com>

@gitgitgadget
Copy link

gitgitgadget bot commented Jul 10, 2022

This patch series was integrated into seen via git@b3f320f.

@gitgitgadget
Copy link

gitgitgadget bot commented Jul 10, 2022

This patch series was integrated into seen via git@a41e923.

@gitgitgadget
Copy link

gitgitgadget bot commented Jul 11, 2022

This patch series was integrated into seen via git@a02cb59.

@gitgitgadget
Copy link

gitgitgadget bot commented Jul 12, 2022

This patch series was integrated into seen via git@50bb8fa.

@gitgitgadget
Copy link

gitgitgadget bot commented Jul 12, 2022

There was a status update in the "Cooking" section about the branch zk/push-use-bitmaps on the Git mailing list:

"git push" sometimes perform poorly when reachability bitmaps are
used, even in a repository where other operations are helped by
bitmaps.  The push.useBitmaps configuration variable is introduced
to allow disabling use of reachability bitmaps only for "git push".

Will merge to 'master'.
source: <pull.1263.v4.git.1655492779228.gitgitgadget@gmail.com>

@gitgitgadget
Copy link

gitgitgadget bot commented Jul 13, 2022

This patch series was integrated into seen via git@81705c4.

@gitgitgadget
Copy link

gitgitgadget bot commented Jul 13, 2022

This patch series was integrated into master via git@81705c4.

@gitgitgadget
Copy link

gitgitgadget bot commented Jul 13, 2022

This patch series was integrated into next via git@81705c4.

@gitgitgadget gitgitgadget bot added the master label Jul 13, 2022
@gitgitgadget
Copy link

gitgitgadget bot commented Jul 13, 2022

Closed via 81705c4.

@gitgitgadget gitgitgadget bot closed this Jul 13, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

1 participant