
Progress callbacks #990

Merged
merged 27 commits into from
Oct 25, 2012

Conversation

@ben ben (Member) commented Oct 17, 2012

Fetch, checkout, and clone currently report progress through a shared memory object, which requires a two-thread arrangement on the caller's side. This is a generally good idea, but not especially friendly to bindings.

This PR converts these three APIs to use inline callbacks for reporting progress, so no threading is necessary. It probably accomplishes the same goal as #890, and in a more binding-friendly way.

  • Checkout will report on every diff
  • Fetch will report on every object received, every object indexed, and (ideally) every 100kb or so received

data->found_submodules = true;
data->num_stages = 3;
Member

It seems that this could come fairly late in the checkout process (always in the second stage? possibly as the very last item of the second stage?). Is it possible that progress could jump back quite a bit (say, from ~100% to ~66% in the worst case)?

Member Author

It actually hits somewhere in the first stage, while missing items are being created. And yeah, I've seen the progress jump back from 50% to 33%. Is this better or worse than having an instantaneous jump from 66% to 100%?

We could also weight the third stage to be 10% of the total and always include it. It's only creating a relatively small number of directories, and I doubt we'll be implementing anything that includes fetching and checking out the submodules.

Member

I think having progress jump backwards is very problematic. It seems vastly preferable to have it jump forward. Imagine this being displayed as a progress bar in the UI: a backwards jump is not a good user experience.

By the way, I actually think that, as implemented, this jump will happen somewhere during the second phase (the first phase is removing files). You could move the jump to the first phase by replicating the "is there a submodule that would need to be checked out" logic while scanning through the deletes, but I opted not to do that since I knew that further rethinking of the progress reporting was still to come.

Another alternative is to exclude the third pass from the progress reporting completely since it just calls mkdir on the submodule. We could then require an explicit call to checkout submodules and/or add a submodule callback a la the notification callback.

Member Author

Yeah, the time spent doing mkdir is probably negligible. I'll weight that stage to 0%. 😉

@jamill jamill (Member) commented Oct 19, 2012

Thanks Ben! I think this looks pretty good to me!

Just two small questions about the progress value calculation.

  1. With the current logic, it seems that it would be possible for the progress value to jump back (if there are submodules).

  2. Does the checkout algorithm know the total number of files it is going to modify, and how many it has already updated? For instance, core Git (from observation) reports checkout progress in the following format: Checking out files: 100% (253/253), done. I assume the numbers refer to the number of files it is going to modify. Would that be a useful metric to report as well (if possible)? I am not sure how much work it would be to calculate this data from the diff structures (# of updated files, total # of files to update), or if it is even possible. Just a thought - it seems that this would also avoid the backward progress (if that is an issue).

@ben ben (Member Author) commented Oct 19, 2012

Thanks for the review!

  1. See above.
  2. I'm being a little lazy here. Yes, we know how many steps the checkout will go through, but git_diff_foreach passes in a float for progress, so that's what I used. I'll put that on my list.

ben and others added 22 commits October 19, 2012 19:34
Also removing all the *stats parameters from external
APIs that don't need them anymore.
Also converted the network example to use it.
git_index_read_tree() was exposing a parameter to provide the user with
a progress indicator. Unfortunately, due to the recursive nature of the
tree walk, the maximum number of items to process was unknown. Thus,
the indicator was only counting processed entries, without providing
any information about the number of remaining items.
Also implemented in the git2 example.
The fetch code takes advantage of this to implement a
progress callback every 100kb of transfer.
Also, now only reporting checkout progress for files that
are actually being added or removed.
@ben ben (Member Author) commented Oct 20, 2012

There, I switched the checkout callbacks to current/total.

I've also added a network-transfer callback for when you're fetching large objects (try git2 clone https://github.com/PublicMapping/DistrictBuilder /tmp/foo and watch what happens at 91% of the fetch). @carlosmn will have something to say about this.

@carlosmn carlosmn (Member) commented
The only thing would be that the callback and bytes use different sources, and bytes is bound to lag a bit behind, as it only updates when we receive a full packet.

@ben ben (Member Author) commented Oct 22, 2012

The callback gets an accumulated number of bytes that's been returned from p_recv or SSL_read, which is the same place gitno_recv gets its number from. There will be a bit of jitter from sideband packets, so the absolute number of bytes will occasionally differ, but it's brought back into sync when a GIT_PKT_DATA packet has been completed.

@ben ben (Member Author) commented Oct 23, 2012

I think this is ready. Any more commentary?

git_repository **out,
const char *origin_url,
const char *dest_path,
git_indexer_progress_callback fetch_progress_cb,
Member

This irks me. Is indexer_progress the right name for a callback that will also report network operations?

Member Author

  • git_fetch_progress_callback?
  • git_network_progress_callback?
  • git_something_happened_update_your_ui?

Member Author

Also: git_indexer_stats is used to report network-transfer progress. Just sayin.

Member Author

There, I changed everything that was git_indexer_* to git_transfer_progress_*, and sane-ified the progress member naming. What do you think?

@jamill jamill (Member) commented Oct 24, 2012

git_remote_download still passes the bytes parameter by reference. Will consumers still have to poll in order to get this value?

I guess consumers could read the bytes value as part of the callback routine (either include a pointer to it in progress_payload or some other mechanism) - but will this be sufficient to cover the times that bytes are being received? That is, will there be extended periods where we are receiving bytes but not calling any callbacks?

git_indexer_stats and friends -> git_transfer_progress*
Also made git_transfer_progress members more sanely named.
@ben ben (Member Author) commented Oct 24, 2012

@jamill: how would you feel if I nuked bytes from git_remote_download? If the caller wants progress info, they can always call git_remote_stats and watch that buffer, or provide a callback.

@jamill jamill (Member) commented Oct 24, 2012

how would you feel if I nuked bytes from git_remote_download? If the caller wants progress info, they can always call git_remote_stats and watch that buffer, or provide a callback.

👍 I think that makes sense. The other reference parameters have already been removed. I think it would be consistent to remove bytes as well.

vmg pushed a commit that referenced this pull request Oct 25, 2012
@vmg vmg merged commit 1eb8cd7 into libgit2:development Oct 25, 2012
phatblat pushed a commit to phatblat/libgit2 that referenced this pull request Sep 13, 2014