Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fetch: show progress for packfile uri downloads #907

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

albertcui
Copy link

@albertcui albertcui commented Mar 17, 2021

Git appears to hang when downloading packfiles as this part of the
fetch is silent, causing user confusion. This change implements
progress for the number of packfiles downloaded; a progress display
for bytes would involve deeper changes at the http-fetch layer
instead of fetch-pack, the caller.

cc: Jeff King peff@peff.net
cc: Ævar Arnfjörð Bjarmason avarab@gmail.com
cc: Jonathan Nieder jrnieder@gmail.com

@gitgitgadget
Copy link

gitgitgadget bot commented Mar 17, 2021

Welcome to GitGitGadget

Hi @albertcui, and welcome to GitGitGadget, the GitHub App to send patch series to the Git mailing list from GitHub Pull Requests.

Please make sure that your Pull Request has a good description, as it will be used as cover letter.

Also, it is a good idea to review the commit messages one last time, as the Git project expects them in a quite specific form:

  • the lines should not exceed 76 columns,
  • the first line should be like a header and typically start with a prefix like "tests:" or "revisions:" to state which subsystem the change is about, and
  • the commit messages' body should be describing the "why?" of the change.
  • Finally, the commit messages should end in a Signed-off-by: line matching the commits' author.

It is in general a good idea to await the automated test ("Checks") in this Pull Request before contributing the patches, e.g. to avoid trivial issues such as unportable code.

Contributing the patches

Before you can contribute the patches, your GitHub username needs to be added to the list of permitted users. Any already-permitted user can do that, by adding a comment to your PR of the form /allow. A good way to find other contributors is to locate recent pull requests where someone has been /allowed:

Both the person who commented /allow and the PR author are able to /allow you.

An alternative is the channel #git-devel on the FreeNode IRC network:

<newcontributor> I've just created my first PR, could someone please /allow me? https://github.com/gitgitgadget/git/pull/12345
<veteran> newcontributor: it is done
<newcontributor> thanks!

Once on the list of permitted usernames, you can contribute the patches to the Git mailing list by adding a PR comment /submit.

If you want to see what email(s) would be sent for a /submit request, add a PR comment /preview to have the email(s) sent to you. You must have a public GitHub email address for this.

After you submit, GitGitGadget will respond with another comment that contains the link to the cover letter mail in the Git mailing list archive. Please make sure to monitor the discussion in that thread and to address comments and suggestions (while the comments and suggestions will be mirrored into the PR by GitGitGadget, you will still want to reply via mail).

If you do not want to subscribe to the Git mailing list just to be able to respond to a mail, you can download the mbox from the Git mailing list archive (click the (raw) link), then import it into your mail program. If you use GMail, you can do this via:

curl -g --user "<EMailAddress>:<Password>" \
    --url "imaps://imap.gmail.com/INBOX" -T /path/to/raw.txt

To iterate on your change, i.e. send a revised patch or patch series, you will first want to (force-)push to the same branch. You probably also want to modify your Pull Request description (or title). It is a good idea to summarize the revision by adding something like this to the cover letter (read: by editing the first comment on the PR, i.e. the PR description):

Changes since v1:
- Fixed a typo in the commit message (found by ...)
- Added a code comment to ... as suggested by ...
...

To send a new iteration, just add another PR comment with the contents: /submit.

Need help?

New contributors who want advice are encouraged to join git-mentoring@googlegroups.com, where volunteers who regularly contribute to Git are willing to answer newbie questions, give advice, or otherwise provide mentoring to interested contributors. You must join in order to post or view messages, but anyone can join.

You may also be able to find help in real time in the developer IRC channel, #git-devel on Freenode. Remember that IRC does not support offline messaging, so if you send someone a private message and log out, they cannot respond to you. The scrollback of #git-devel is archived, though.

@dscho
Copy link
Member

dscho commented Mar 17, 2021

/allow

@gitgitgadget
Copy link

gitgitgadget bot commented Mar 17, 2021

User albertcui is now allowed to use GitGitGadget.

WARNING: albertcui has no public email address set on GitHub

@albertcui
Copy link
Author

/submit

@gitgitgadget
Copy link

gitgitgadget bot commented Mar 17, 2021

Submitted as pull.907.git.1616007794513.gitgitgadget@gmail.com

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git pr-907/albertcui/progress-v1

To fetch this version to local tag pr-907/albertcui/progress-v1:

git fetch --no-tags https://github.com/gitgitgadget/git tag pr-907/albertcui/progress-v1

@gitgitgadget
Copy link

gitgitgadget bot commented Mar 17, 2021

On the Git mailing list, Jeff King wrote (reply to this):

On Wed, Mar 17, 2021 at 07:03:13PM +0000, Albert Cui via GitGitGadget wrote:

> From: Albert Cui <albertqcui@gmail.com>
> 
> Git appears to hang when downloading packfiles as this part of the
> fetch is silent, causing user confusion. This change implements
> progress for the number of packfiles downloaded; a progress display
> for bytes would involve deeper changes at the http-fetch layer
> instead of fetch-pack, the caller.

I think this is an improvement, but I agree that a real byte display
would be much better. I actually worked on this a long time ago for a
very similar feature that we never quite pushed over the finish line.
See patches 11 and 12 here:

  https://lore.kernel.org/git/20111110074330.GA27925@sigill.intra.peff.net/

(it might need some of the earlier refactoring, too, I'm not sure; and
quite likely will need forward-porting as it has been 10 years).

-Peff

@gitgitgadget
Copy link

gitgitgadget bot commented Mar 17, 2021

User Jeff King <peff@peff.net> has been added to the cc: list.

@gitgitgadget
Copy link

gitgitgadget bot commented Mar 17, 2021

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Albert Cui via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Albert Cui <albertqcui@gmail.com>
>
> Git appears to hang when downloading packfiles as this part of the
> fetch is silent, causing user confusion. This change implements
> progress for the number of packfiles downloaded; a progress display
> for bytes would involve deeper changes at the http-fetch layer
> instead of fetch-pack, the caller.

... "hence we do not do so in this patch"?  

That's probably a very sensible way to go.

I expect that http-fetch will in the longer term become a mere
fallback default used by those who do not have anything better.
Because we are not in the business of writing a performant HTTP
downloader, we would be better off if we make it easy to plug an
external HTTP downloader other people write in to this codepath.

> +	packfile_uri_progress = start_progress(_("Downloading packs"), packfile_uris.nr);

OK, so we plan to count from 0 up to .nr; and the message is made
localizable.  Good.

> @@ -1696,6 +1700,7 @@ static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args,
>  		const char *uri = packfile_uris.items[i].string +
>  			the_hash_algo->hexsz + 1;
>  
> +		display_progress(packfile_uri_progress, i+1);

		display_progress(packfile_uri_progress, i + 1);

> diff --git a/t/t5702-protocol-v2.sh b/t/t5702-protocol-v2.sh
> index 2e1243ca40b0..8964a4003678 100755
> --- a/t/t5702-protocol-v2.sh
> +++ b/t/t5702-protocol-v2.sh
> @@ -851,7 +851,8 @@ test_expect_success 'part of packfile response provided as URI' '
>  	GIT_TRACE=1 GIT_TRACE_PACKET="$(pwd)/log" GIT_TEST_SIDEBAND_ALL=1 \
>  	git -c protocol.version=2 \
>  		-c fetch.uriprotocols=http,https \
> -		clone "$HTTPD_URL/smart/http_parent" http_child &&
> +		clone "$HTTPD_URL/smart/http_parent" http_child \
> +		--progress 2>progress &&

Some existing tests use GIT_PROGRESS_DELAY to protect against an
operation that is too quick to complete.  Don't we need to do the
same?  If not, then perhaps we need to allow delaying the progress
meter we add with this patch for "too quick" case, perhaps?

>  	# Ensure that my-blob and other-blob are in separate packfiles.
>  	for idx in http_child/.git/objects/pack/*.idx
> @@ -875,6 +876,8 @@ test_expect_success 'part of packfile response provided as URI' '
>  	test -f hfound &&
>  	test -f h2found &&
>  
> +	test_i18ngrep "Downloading packs" progress &&

Also, I am not sure with all the terminal control junk, 'grep'
should be expected to reliably pick this substring in the output.
Are we expecting any other output to the standard error stream?
Some tests in t5318 seem to just see if the output is non-empty, and
I am wondering if that is an approach more appropriate here (not
rhetorical---I simply do not know the answer).

>  	# Ensure that there are exactly 3 packfiles with associated .idx
>  	ls http_child/.git/objects/pack/*.pack \
>  	    http_child/.git/objects/pack/*.idx >filelist &&
>
> base-commit: a5828ae6b52137b913b978e16cd2334482eb4c1f

Git appears to hang when downloading packfiles as this part of the
fetch is silent, causing user confusion. This change implements
progress for the number of packfiles downloaded; a progress display
for bytes would involve deeper changes at the http-fetch layer
instead of fetch-pack, the caller, so we do not do that in this
patch.

Signed-off-by: Albert Cui <albertqcui@gmail.com>
@albertcui
Copy link
Author

/submit

@gitgitgadget
Copy link

gitgitgadget bot commented Apr 9, 2021

Submitted as pull.907.v2.git.1618008249632.gitgitgadget@gmail.com

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git pr-907/albertcui/progress-v2

To fetch this version to local tag pr-907/albertcui/progress-v2:

git fetch --no-tags https://github.com/gitgitgadget/git tag pr-907/albertcui/progress-v2

@gitgitgadget
Copy link

gitgitgadget bot commented Apr 10, 2021

On the Git mailing list, Ævar Arnfjörð Bjarmason wrote (reply to this):


On Wed, Mar 17 2021, Albert Cui via GitGitGadget wrote:

> From: Albert Cui <albertqcui@gmail.com>
> [...]
> @@ -875,6 +876,8 @@ test_expect_success 'part of packfile response provided as URI' '
>  	test -f hfound &&
>  	test -f h2found &&
>  
> +	test_i18ngrep "Downloading packs" progress &&

This can just be "grep"

@gitgitgadget
Copy link

gitgitgadget bot commented Apr 10, 2021

User Ævar Arnfjörð Bjarmason <avarab@gmail.com> has been added to the cc: list.

@gitgitgadget
Copy link

gitgitgadget bot commented Apr 11, 2021

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Albert Cui via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Albert Cui <albertqcui@gmail.com>
>
> Git appears to hang when downloading packfiles as this part of the
> fetch is silent, causing user confusion. This change implements
> progress for the number of packfiles downloaded; a progress display
> for bytes would involve deeper changes at the http-fetch layer
> instead of fetch-pack, the caller, so we do not do that in this
> patch.

... meaning, hopefully later we'd hook into transport->progress and
implement the byte-level progress display down there?  And when that
happens, we'd remove this file-level progress as it would be too
confusing to have both at the same time?

Is this start_progress() call a way to unconditionally enable the
progress display?  How does it interact with transport->progress
that is driven by transport_set_verbosity(), which in turn is called
by builtin/fetch.c and friends?  If it doesn't, shouldn't this
codepath pay attention to the transport->progress and enable the
progress meter only when it is enabled (i.e. the stderr going to a
terminal, or --progress explicitly being asked)?

> @@ -1585,6 +1586,7 @@ static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args,
>  	struct fetch_negotiator *negotiator;
>  	int seen_ack = 0;
>  	struct string_list packfile_uris = STRING_LIST_INIT_DUP;
> +	struct progress *packfile_uri_progress;
>  	int i;
>  	struct strvec index_pack_args = STRVEC_INIT;
>  	struct oidset gitmodules_oids = OIDSET_INIT;
> @@ -1689,6 +1691,8 @@ static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args,
>  		}
>  	}
>  
> +	packfile_uri_progress = start_progress(_("Downloading packs"), packfile_uris.nr);
> +
>  	for (i = 0; i < packfile_uris.nr; i++) {
>  		int j;
>  		struct child_process cmd = CHILD_PROCESS_INIT;
> @@ -1696,6 +1700,7 @@ static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args,
>  		const char *uri = packfile_uris.items[i].string +
>  			the_hash_algo->hexsz + 1;
>  
> +		display_progress(packfile_uri_progress, i + 1);
>  		strvec_push(&cmd.args, "http-fetch");
>  		strvec_pushf(&cmd.args, "--packfile=%.*s",
>  			     (int) the_hash_algo->hexsz,
> @@ -1739,6 +1744,9 @@ static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args,
>  						 get_object_directory(),
>  						 packname));
>  	}
> +
> +	stop_progress(&packfile_uri_progress);
> +
>  	string_list_clear(&packfile_uris, 0);
>  	strvec_clear(&index_pack_args);
>  
> diff --git a/t/t5702-protocol-v2.sh b/t/t5702-protocol-v2.sh
> index 2e1243ca40b0..0476b3f50455 100755
> --- a/t/t5702-protocol-v2.sh
> +++ b/t/t5702-protocol-v2.sh
> @@ -848,10 +848,12 @@ test_expect_success 'part of packfile response provided as URI' '
>  	configure_exclusion "$P" my-blob >h &&
>  	configure_exclusion "$P" other-blob >h2 &&
>  
> -	GIT_TRACE=1 GIT_TRACE_PACKET="$(pwd)/log" GIT_TEST_SIDEBAND_ALL=1 \
> +	GIT_PROGRESS_DELAY=0 GIT_TRACE=1 GIT_TRACE2_EVENT=1 \
> +	GIT_TRACE_PACKET="$(pwd)/log" GIT_TEST_SIDEBAND_ALL=1 \
>  	git -c protocol.version=2 \
>  		-c fetch.uriprotocols=http,https \
> -		clone "$HTTPD_URL/smart/http_parent" http_child &&
> +		clone "$HTTPD_URL/smart/http_parent" http_child \
> +		--progress 2>progress &&
>  
>  	# Ensure that my-blob and other-blob are in separate packfiles.
>  	for idx in http_child/.git/objects/pack/*.idx
> @@ -875,6 +877,8 @@ test_expect_success 'part of packfile response provided as URI' '
>  	test -f hfound &&
>  	test -f h2found &&
>  
> +	test_i18ngrep "Downloading packs" progress &&
> +
>  	# Ensure that there are exactly 3 packfiles with associated .idx
>  	ls http_child/.git/objects/pack/*.pack \
>  	    http_child/.git/objects/pack/*.idx >filelist &&
>
> base-commit: a5828ae6b52137b913b978e16cd2334482eb4c1f

@gitgitgadget
Copy link

gitgitgadget bot commented Jun 30, 2021

On the Git mailing list, Jonathan Nieder wrote (reply to this):

Hi,

Albert Cui wrote:

> Git appears to hang when downloading packfiles as this part of the
> fetch is silent, causing user confusion. This change implements
> progress for the number of packfiles downloaded; a progress display
> for bytes would involve deeper changes at the http-fetch layer
> instead of fetch-pack, the caller, so we do not do that in this
> patch.
>
> Signed-off-by: Albert Cui <albertqcui@gmail.com>
> ---
>  fetch-pack.c           | 8 ++++++++
>  t/t5702-protocol-v2.sh | 8 ++++++--
>  2 files changed, 14 insertions(+), 2 deletions(-)

This is something that came up at the last in-person Git Contributor
Summit; I'm glad to see it being taken care of.

> --- a/fetch-pack.c
> +++ b/fetch-pack.c
> @@ -23,6 +23,7 @@
>  #include "fetch-negotiator.h"
>  #include "fsck.h"
>  #include "shallow.h"
> +#include "progress.h"
>  
>  static int transfer_unpack_limit = -1;
>  static int fetch_unpack_limit = -1;
> @@ -1585,6 +1586,7 @@ static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args,
>  	struct fetch_negotiator *negotiator;
>  	int seen_ack = 0;
>  	struct string_list packfile_uris = STRING_LIST_INIT_DUP;
> +	struct progress *packfile_uri_progress;

It seems to be more idiomatic to initialize this to NULL.

>  	int i;
>  	struct strvec index_pack_args = STRVEC_INIT;
>  	struct oidset gitmodules_oids = OIDSET_INIT;
> @@ -1689,6 +1691,8 @@ static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args,
>  		}
>  	}
>  
> +	packfile_uri_progress = start_progress(_("Downloading packs"), packfile_uris.nr);

That way, we can respect a --quiet option by making this remain NULL
when progress is not enabled:

	if (!args->quiet && !args->no_progress)
		packfile_uri_progress = ...;

[...]
> --- a/t/t5702-protocol-v2.sh
> +++ b/t/t5702-protocol-v2.sh
> @@ -848,10 +848,12 @@ test_expect_success 'part of packfile response provided as URI' '
>  	configure_exclusion "$P" my-blob >h &&
>  	configure_exclusion "$P" other-blob >h2 &&
>  
> -	GIT_TRACE=1 GIT_TRACE_PACKET="$(pwd)/log" GIT_TEST_SIDEBAND_ALL=1 \
> +	GIT_PROGRESS_DELAY=0 GIT_TRACE=1 GIT_TRACE2_EVENT=1 \

This puts the trace in stderr mixed with other output.  Would it make
sense to put it in a separate file, like this?

	GIT_TRACE=1 GIT_TRACE_PACKET="$(pwd)/log" \
	GIT_PROGRESS_DELAY=0 GIT_TRACE2_EVENT="$(pwd)/trace2" \
	GIT_TEST_SIDEBAND_ALL=1 \
	git -c [etc]

[...]
> @@ -875,6 +877,8 @@ test_expect_success 'part of packfile response provided as URI' '
>  	test -f hfound &&
>  	test -f h2found &&
>  
> +	test_i18ngrep "Downloading packs" progress &&

That way, this "grep" could check the trace2 file which would contain
output intended for machines, and we wouldn't have to worry e.g. about
ANSII control codes potentially affecting the output around the space
some day in the progress output intended for a terminal.

With whatever subset of the changes described above make sense, this is
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>

Thanks.

@gitgitgadget
Copy link

gitgitgadget bot commented Jun 30, 2021

User Jonathan Nieder <jrnieder@gmail.com> has been added to the cc: list.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants