Win http fetch #213

Closed
wants to merge 2 commits into
from

Projects

None yet

6 participants

@jamill
Member
jamill commented Sep 12, 2012

An updated Fetch implementation based off of @Ben's repository, from suggestions in Pull Request #206. This includes a basic test for fetching from a remote into an empty repository. I made some choices on how / where to expose this functionality, so please let me know if you have thoughts / comments / questions.

Missing functionality:

  • It does not appear that libgit2 fetches tags
    • Tests do no cover fetching tags
  • Completion callback in git_remote_callbacks does not appear to be called by libgit2
    • Completion callback not plumbed through LibGit2Sharp

Implementation:

  • I have exposed the git_remote_callbacks as a property on the FetchProgress object that is passed into Fetch(). This is slightly different than how it is organized in libgit2 (where the callbacks are added to the remote itself). I went this way because the lifetime of these callbacks seems tied to the actual fetch operation itself, at least in how I see callers of the LibGit2Sharp API. If I am missing something and this is not the appropriate place, please let me know.
  • I slighty changed the signature on the clone interface to so that callers of the LibGit2Sharp do not have to include the LibGitSharp.Core namespace. Instead, I added a wrapper around GitIndexerStats for LibGit2Sharp callers, and exposed the uint fields as long properties.
  • I expose the libgit2 C callbacks as Events on the LibGit2Sharp interface.

Issues:

I have seen intermittent failures from libgit2 during both the clone and fetch tests that I have not tracked down at this point (as part of call to git_remote_update_tips in fetch, and git_clone in Clone).

LibGit2Sharp.LibGit2SharpException : An error was raised by libgit2. Category = Reference (Error).
Target OID for the reference doesn't exist on the repository

@ben ben commented on an outdated diff Sep 12, 2012
LibGit2Sharp/Repository.cs
+ /// This allows fetching by a named remote or by url.
+ /// </summary>
+ /// <param name="remoteHandle"></param>
+ /// <param name="fetchProgress"></param>
+ private void FetchInternal(RemoteSafeHandle remoteHandle, FetchProgress fetchProgress)
+ {
+ // reset the current progress object
+ fetchProgress.Reset();
+
+ try
+ {
+ // It is OK to pass the reference to the GitCallbacks directly here because libgit2 makes a copy of
+ // the data in the git_remote_callbacks structure. If, in the future, libgit2 changes its implementation
+ // to store a reference to the git_remote_callbacks structure this would introduce a subtle bug
+ // where the managed layer could move the git_remote_callbacks to a different location in memory,
+ // but libgit2 would still reference the old address.
@ben
ben Sep 12, 2012 Member

💕 comments like this.

@nulltoken
Member

@jamill Wow. That's a very detailed PR. Thanks a lot for this!

It does not appear that libgit2 fetches tags

It doesn't do this by default. Yet. (@carlosmn ;-) )
However, maybe would it make sense to "emulate" this in C# meanwhile. Once the native implementation is done, we would get rid of the managed code (provided the tests still pass).

A naive implementation could be, once the heads have been fetched, to create a temporary remote targeting the same url and bearing a different refspec (refs/tags/*:refs/tags/* should work), fetch from this remote, then drop the remote. Provided we make this work, this might do this trick at the cost of an additional fetch.

I expose the libgit2 C callbacks as Events on the LibGit2Sharp interface.

I'm not sure about Events.. Please give me a couple of days to contemplate your proposal...

LibGit2Sharp.LibGit2SharpException : An error was raised by libgit2. Category = Reference (Error).
Target OID for the reference doesn't exist on the repository

@ben @carlosmn Have you encountered this as well?

@ben
Member
ben commented Sep 13, 2012

I've seen it, and I can reproduce it fairly regularly using the libgit2sharp unit tests. I'll do some digging today.

@ben
Member
ben commented Sep 13, 2012

Found the root cause of the "OID doesn't exist" problem; a fix is pending on libgit2/libgit2#932. This will most likely only happen with REALLY small repos.

@nulltoken
Member

@ben ❤️

@jamill
Member
jamill commented Sep 13, 2012

@Ben Thanks!

@nulltoken

I'm not sure about Events.. Please give me a couple of days to contemplate your proposal...

sure!

I will look at "emulating" the tag fetching in C#.

The other thing I wanted to ask your thoughts on was the fact that I have encapsulated the entire fetch operation in a basically a single method. The other approach would have been to expose loading the remote, downloading the packfile, updating the reference tips as seperate operations (including managing the lifetime of the remote handle and remote connection between - as they would have to be kept alive for longer than the single call on the public method)

@nulltoken
Member

I will look at "emulating" the tag fetching in C#.

Great!

The other thing I wanted to ask your thoughts on was the fact that I have encapsulated the entire fetch operation in a basically a single method.

I completely agree with this. Let's make the standard operations easy to deal with.

@dahlbyk
Member
dahlbyk commented Sep 14, 2012

Even if it's not the primary API, a useful overload might look something like this...

public delegate void FetchProgressHandler(string message);
public delegate void UpdateTipsHandler(string referenceName, ObjectId oldId, ObjectId newId);

public void Fetch(Remote remote,
                  FetchProgressHandler onProgress = null,
                  UpdateTipsHandler onUpdateTips = null,
                  Action onCompleted = null);
@uluhonolulu
Contributor

+1 on handlers vs events -- makes consuming easier.

@jamill
Member
jamill commented Sep 15, 2012

@dahlbyk , @uluhonolulu Thanks for the feedback! I don't mind exposing these callbacks as delegates - I would prefer to only expose one way or the other, but I don't mind if that way is as delegates.

@dahlbyk dahlbyk commented on the diff Sep 16, 2012
LibGit2Sharp/Repository.cs
+ // It is OK to pass the reference to the GitCallbacks directly here because libgit2 makes a copy of
+ // the data in the git_remote_callbacks structure. If, in the future, libgit2 changes its implementation
+ // to store a reference to the git_remote_callbacks structure this would introduce a subtle bug
+ // where the managed layer could move the git_remote_callbacks to a different location in memory,
+ // but libgit2 would still reference the old address.
+ NativeMethods.git_remote_set_callbacks(remoteHandle, ref fetchProgress.RemoteCallbacks.GitCallbacks);
+
+ int res = NativeMethods.git_remote_connect(remoteHandle, GitDirection.Fetch);
+ Ensure.Success(res);
+
+ int downloadResult = NativeMethods.git_remote_download(remoteHandle, ref fetchProgress.bytes, fetchProgress.indexerStats);
+ Ensure.Success(downloadResult);
+ }
+ finally
+ {
+ if (remoteHandle != null)
@dahlbyk
dahlbyk Sep 16, 2012 Member

Is it even valid for remoteHandle to be null here?

@jamill
jamill Oct 9, 2012 Member

I will remove the null check here.

@dahlbyk
Member
dahlbyk commented Sep 16, 2012

I'll cast my vote for delegates:

FetchProgress progress = new FetchProgress();
progress.RemoteCallbacks.UpdateTipsChanged += new System.EventHandler<UpdateTipsChangedEventArgs>(helper.RemoteUpdateTipsHandler);
repo.Fetch(remote, progress);

vs

repo.Fetch(remote, onUpdateTips = helper.RemoteUpdateTipsHandler);

Can/should Fetch() support cancellation?

@jamill
Member
jamill commented Sep 17, 2012

Can/should Fetch() support cancellation?

I think it seems reasonable that Fetch() would support cancellation - however, I do not see this supported in libgit2's fetch implementation. I think this is something that would be handled at a layer below LibGit2Sharp...

@dahlbyk
Member
dahlbyk commented Sep 17, 2012

Is calling git_remote_disconnect mid-request not supported?

@jamill
Member
jamill commented Sep 18, 2012

Is calling git_remote_disconnect mid-request not supported?

I see, thanks! Looks like that would cancel the network transfer portion of the fetch (where I would expect the majority of the time is). I like the idea of fetch being cancelable, but I would like think through some of the details a bit more before I would go ahead and implement it. I wouldn't expect the Fetch interface to change much to support cancelation.

I will expose the callbacks as delegates instead of events. I had a question about the other part of your API suggestion. The signature of the Fetch method overload that you proposed takes some extra parameters (one for each of the callbacks) and removes the progress reporting parameter (containing the bytes received and the GitIndexerStats). Is the scenario that you have in mind that a caller would only care about the callbacks and not the other progress information? Or, were you suggesting that there should be a flavor of the Fetch method that can take each of the arguments individually as optional parameters.

That is, is:

public void Fetch(Remote remote,
              FetchProgressHandler onProgress = null,
              UpdateTipsHandler onUpdateTips = null,
              Action onCompleted = null);

the full signature of the overload, or were you suggesting something more along the lines of:

public void Fetch(Remote remote,
              ref long bytes,
              ref GitIndexerStats stats,
              FetchProgressHandler onProgress = null,
              UpdateTipsHandler onUpdateTips = null,
              Action onCompleted = null);

Thank you again for you feedback.

@dahlbyk
Member
dahlbyk commented Sep 18, 2012

My first reaction was to just return the FetchProgress:

public FetchProgress Fetch(Remote remote,
                           FetchProgressHandler onProgress = null,
                           UpdateTipsHandler onUpdateTips = null,
                           Action onCompleted = null);

However, it seems reasonable to assume onProgress would want access to the progress details, so maybe this makes more sense?

public delegate void FetchProgressHandler(FetchProgress progress, string message);
public delegate void UpdateTipsHandler(string referenceName, ObjectId oldId, ObjectId newId);

public void Fetch(Remote remote,
                  FetchProgressHandler onProgress = null,
                  UpdateTipsHandler onUpdateTips = null,
                  Action onCompleted = null);

Or you could return the FetchProgress but also expose it to the callback.

@uluhonolulu
Contributor

+1 for public delegate void FetchProgressHandler(FetchProgress progress, string message); ref parameters should be avoided IMO -- make the code much more verbose than it has to be.

Returning FetchProgress seems reasonable if we want to have the Cancel method hanging off it.

var progress = repo.Fetch(..);
Button1.Click += () => progress.Cancel();

In that case, C# grammar requires we both return it and use it in the delegate, since we cannot write a lambda for it.

Anyway, let's try and write a couple of tests, and see which way is less painful.

@nulltoken nulltoken commented on the diff Sep 18, 2012
LibGit2Sharp.Tests/TestHelpers/BaseFixture.cs
@@ -83,7 +83,10 @@ public void Dispose()
{
foreach (string directory in directories)
{
- DirectoryHelper.DeleteDirectory(directory);
+ if (Directory.Exists(directory))
@nulltoken
nulltoken Sep 18, 2012 Member

Could you please give me some more context about this?
When would this check be required?

@jamill
jamill Sep 18, 2012 Member

I had hit an issue running the CanCloneARepository(...) tests where the test would create the SelfCleaningDirectory object and register the temporary directory path for deletion, but the test would fail in Repository.Clone(...) before the temporary directory actually got created. On cleanup, the test would throw an exception indicating that it could not find the path of the temporary directory (as it does not exist). The end result is that the test reports that it failed to find the path of the temporary directory (which hides the original exception).

@nulltoken
nulltoken Sep 18, 2012 Member

Thanks a lot for your answer. Indeed, that makes sense.

However, I'm afraid this check could hide that something may have went wrong earlier (a wrongly written test for instance). Could you think of a way to let the user know about this not expected state?

@jamill
jamill Oct 9, 2012 Member

For now (especially as the clone is no longer part of the test suite) I will remove this change so we can concentrate on the core Fetch changes.

@nulltoken nulltoken commented on the diff Sep 18, 2012
LibGit2Sharp.Tests/BranchFixture.cs
@@ -221,6 +221,19 @@ public void CanListAllBranchesIncludingRemoteRefs()
}
[Fact]
+ public void CanResolveTrackedRemote()
+ {
+ using (var repo = new Repository(StandardTestRepoPath))
+ {
+ Branch master = repo.Branches["master"];
+ Assert.Equal(repo.Remotes["origin"], master.ResolveTrackedRemote());
+
+ Branch test = repo.Branches["test"];
+ Assert.Null(test);
@nulltoken
nulltoken Sep 18, 2012 Member

What's the use of this assert?

@jamill
jamill Oct 9, 2012 Member

This is not correct - I will update this test to cover what I meant for it to test.

@nulltoken nulltoken commented on the diff Sep 18, 2012
LibGit2Sharp/Branch.cs
@@ -136,9 +136,31 @@ public virtual ICommitLog Commits
get { return repo.Commits.QueryBy(new Filter { Since = this }); }
}
+ /// <summary>
+ /// Gets the <see cref="Remote"/> tracking this branch
+ /// </summary>
+ /// <returns>The <see cref="Remote"/> tracking this branch if one exists, otherwise returns null.</returns>
+ public virtual Remote ResolveTrackedRemote()
+ {
@nulltoken
nulltoken Sep 18, 2012 Member

I'd rather be in favor of exposing a property Remote.

Could you please cover with test how this would behave when dealing with a local tracking branch (there's one in the StandardTestRepository named "track-local" IIRC)?

@jamill
jamill Oct 9, 2012 Member

I will expose this as a new property.

I was not aware of local tracking tracking branches ... I covered this behavior in a test, In this case, there is no "Remote" for the local machine that I could see, so this would return "null"... Maybe we should add a property indicating that this is a "local tracking" branch?

@nulltoken nulltoken commented on the diff Sep 18, 2012
LibGit2Sharp/Core/GitIndexerStats.cs
@@ -3,10 +3,10 @@
namespace LibGit2Sharp.Core
{
[StructLayout(LayoutKind.Sequential)]
- public class GitIndexerStats
+ internal class GitIndexerStats
@nulltoken
nulltoken Sep 18, 2012 Member

❤️ !

@nulltoken nulltoken commented on the diff Sep 18, 2012
LibGit2Sharp/FetchProgress.cs
@@ -0,0 +1,50 @@
+using LibGit2Sharp.Core;
+using System;
+using System.Collections.Generic;
+using System.Linq;
+using System.Text;
+
+namespace LibGit2Sharp
+{
+ /// <summary>
+ /// Contains data regarding fetch progress.
+ /// </summary>
+ public class FetchProgress : IndexerStats
@nulltoken
nulltoken Sep 18, 2012 Member

Nitpick... From a semantic standpoint, I don't think FetchProgress should derive from IndexerStats.
To my knowledge, indexing is only the last step of the fetching process.

@jamill
jamill Oct 9, 2012 Member

I will split this out so FetchProgress no longer derives from IndexerStats.

@jamill
Member
jamill commented Sep 18, 2012

@dahlbyk, @uluhonolulu Regarding the Fetch method:

  1. As currently implemented, the Fetch method is synchronous, which prevents it from returning a useful progress datastructure. This is also why there are ref parameters. I could look at making method this asynchronous, but it did not appear that were any other asynchronous methods at the moment, which is one reason I did not go down this route. Has there been other discussion(s) on how to handle long running operations in LibGit2Sharp (push, pull, fetch, clone, etc)?

  2. I could see the progress datastructure could be useful to the OnProgress callback. Currently the OnProgress matches the libgit2 progress callback. How do we feel about breaking this mapping here - I know the high level design goals indicate that LibGit2Sharp is not a 1:1 mapping of libgit2, so maybe this would not be a problem.

@nulltoken
Member

@ben regarding the Clone implementation, could you please get rid of the GitCheckoutOptions?

@nulltoken
Member

@jamill @ben I'm not in favor of exposing the IndexerStats as it is. Unless I'm wrong, consuming it would require a separate thread.

@carlosmn How complex would that be (or is it even feasible) to make the libgit2 indexing process invoke a progress callback?

@nulltoken
Member

@jamill

Regarding the API:

I'm not sure about Events..

I also agree with @dahlbyk and @uluhonolulu. Let's go down the delegates road :)

Is calling git_remote_disconnect mid-request not supported?

When callbacks are involved, usually returning something different from 0 is the "conventional" way to interrupt the process. Hopefully, the remote would be disconnected.

Can/should Fetch() support cancellation?

I'm not even sure this scenario has been tested at the libgit2 level. /cc @carlosmn

@nulltoken
Member

@jamill

  1. As currently implemented, the Fetch method is synchronous, which prevents it from returning a useful progress datastructure.

FWIW libgit2/libgit2#890 is trying to tackle this issue (/cc @ben)

@nulltoken
Member

@jamill @ben Provided I do not encounter any regression while running the LibGit2Sharp test suite, I'm going to update vNext very soon with a newer version of libgit2 as soon as libgit2/libgit2#932 is merged.

The dedicated libgit2 build and LibGit2Sharp branch are no longer needed as libgit2/libgit2#901 is now merged.

/cc @arrbee

@jamill
Member
jamill commented Sep 18, 2012

@nulltoken @ben @dahlbyk

Thanks, so a couple of questions:

1. How to report progress

My understanding of the current state of things is that currently, the best way to get some of the progress information for git_indexer_stats and the bytes is by passing them in as ref parameters to the fetch call.

  • Including these parameters as part of the OnProgress callback does not look like it would work, as it appears that the git_indexer_stat datastructure and the OnProgress callbacks are updated at different points of the fetch process.
    This has several downsides, in particular that it can be akward to work with.
  • This has the downside that this can be akward for callers (need to be queried on a different thread, method signature has ref parameters)

However, there is some work going on to make update progress reporting:

  • With libgit2/libgit2#890, the progress callback should report more interperatable information.
  • If it is possible to for the indexing process to invoke a callback, then we could at least get indexing progress on the same thread. I will wait for @carlosmn for more on this

With these items, it seems we would not have to query the progress datastructures on a seperate thread for progress.

2. Whether there should be an asynchronous fetch method

From the API usage examples above, it seems that an asynchronous version of this method would be nice. In the clone / fetch examples for libgit2 the caller is responsible for running clone / fetch on a background thread. Should LibGit2Sharp follow this pattern? Alternatively, should LibGit2Sharp provide an asynchronous method that will go ahead and run the long running operation on the background thread and return immediately.

Even if there was an async version of Fetch, it seems that a synchronous version is needed. Given the above, I propose the following to move forward (in addition to addressing other feedback):

  • Leave the Progress structure (with GitIndexerStats and bytes) as a ref parameter until there is a better progress reporting mechanism.
  • Leave an async version of fetch for future work.

Thoughts?

@nulltoken
Member

Leave the Progress structure (with GitIndexerStats and bytes) as a ref parameter until there is a better progress reporting mechanism.

👍 Even if I tend to avoid delivering API that will be broken by next version, Fetch has been delayed for too long.

Leave an async version of fetch for future work.

👍 as well.

The dedicated libgit2 build and LibGit2Sharp branch are no longer needed as libgit2/libgit2#901 is now merged.

Unfortunately, libgit2/libgit2#932 cannot be merged right now. However, in order to not keep you holding, vNext has been updated with the tip of libgit2 development branch (libgit2/libgit2@411cb01).

Please note that a Proxy class has been added since the benstraub/clone branch was created. It's now required to go through this layer when invoking NativeMethods.

@dahlbyk
Member
dahlbyk commented Sep 18, 2012

👍 from me as well.

Thanks for clearing up the difference between the values exposed through FetchProgress and the onProgress callback. A callback for stats seems like a better option there.

@uluhonolulu
Contributor

I'm all for release now, enhance later.

@ben
Member
ben commented Sep 19, 2012

I'm actually thinking that passing a structure that receives progress info is a Bad Idea. What if the GC moves the structure around on the managed heap during the fetch? The C code wouldn't have a clue, and Bad Things could happen.

Given this, we should probably double-down on delegates for reporting progress, which means I'll have to think about how to make that happen with clone. Unfortunately I'm jet-lagged today and speaking at a conference tomorrow morning, so I can't read through @jamill's code to figure out how it should be done. Hopefully I'll have a chance in the next couple of days.

@jamill
Member
jamill commented Sep 19, 2012

@ben From a correctness point of view, I think passing structures that receive progress info is OK - The progress structures are pinned by the interop layer during the call to git_remote_download and should not be moved by GC (git_remote_download is synchronous and will not return until fetch is complete).

@carlosmn
Member
carlosmn commented Oct 2, 2012

The code to auto-follow tags and create them has been merged to libgit2, you'll get that whenever you update the library.

As for the indexer calling a callback, there isn't really a unit of measurement where it makes more sense than another to call it. How often should we call it? Per received object? Per received x bytes?

@nulltoken
Member

@carlosmn ❤️

@jamill I'll update vNext with a more recent version of libgit2 in the following days.

@jamill
Member
jamill commented Oct 3, 2012

@carlosmn , @nulltoken Thanks! I will move back on to the main development branch. I will get back to this shortly and update the change based on the feedback.

@dahlbyk
Member
dahlbyk commented Oct 3, 2012

As for the indexer calling a callback, there isn't really a unit of measurement where it makes more sense than another to call it. How often should we call it? Per received object? Per received x bytes?

I'd go with what's easiest. If they're equally easy, go with the more granular. If it's called more often than a consumer cares for, they can throttle it.

@nulltoken
Member

@jamill vNext has been updated with libgit2/libgit2@9adfa7d

@jamill jamill referenced this pull request Oct 9, 2012
Closed

Fetch #221

@jamill
Member
jamill commented Oct 9, 2012

@nulltoken @dahlbyk @carlosmn @uluhonolulu : I have created a new PR with the changes based off of vNext (I did not see a way to update the branch the PR is based off of - if I missed something, please let me know).

Again, thank you everyone for the feedback so far. I have addressed much of it in the new PR.

@nulltoken
Member

Closing this one, as all the love is now redirected to #221

@nulltoken nulltoken closed this Oct 10, 2012
@jamill jamill referenced this pull request Oct 12, 2012
Merged

Clone #223

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment