Awesome work! 😍 #1

jairbubbles · 2023-04-22T06:37:25Z

Having a managed library to read .git repository is a plus for the .NET ecosystem.

I quickly looked at how libGit2Sharp is used in our application and I'm wondering if you have in mind to support more things in this library or keep it super minimalistic.

Here is the main struct that we fill when we inspect a repo:

internal struct GitInfos
    {
        public string RemoteUrl;
        public IGitCommit HeadCommit;
        public IGitBranch CurrentBranch;
        public IReadOnlyList<GitBranch> Branches;
        public IGitTag[] Tags;
        public IGitRemote[] Remotes;
        public IGitStash[] Stashes;
        public DateTimeOffset LastCommitDate;
        public int BehindDefaultBranch;
        public IGitBranch DefaultRemoteBranch;
        public IGitOperation CurrentOperation;
}

Your lib seems to support already the main things but is lacking things like remotes or stashes.

Let me know, I'd be much interested to contribute. Cheers!

The text was updated successfully, but these errors were encountered:

kekyo · 2023-04-22T11:02:40Z

Thanks your comment!

Interesting information, I wrote GitReader to remove libgit2sharp from RelaxVersioner, but of course for general purpose. From your code, I've tried to map the functionality you want GitReader to have:

string RemoteUrl : GitReader does not have remote fetch ability, so I do not plan to make it possible to retrieve URLs at this time. However, it is not difficult to make URL acquisition possible, and I will consider it if necessary.
IGitCommit HeadCommit : Already can get it.
IGitBranch CurrentBranch : Already can get it.
IReadOnlyList<GitBranch> Branches : Already can get it.
IGitTag[] Tags : Already can get it.
IGitRemote[] Remotes : GitReader does not have remote fetch ability, so I do not plan to make it.
IGitStash[] Stashes : Can not get it, but I will consider it positively.
DateTimeOffset LastCommitDate : Already can get it from head commit.
int BehindDefaultBranch : (?) Required manually calculation.
IGitBranch DefaultRemoteBranch : Already can get it from remote branch list.
IGitOperation CurrentOperation : (?)

jairbubbles · 2023-04-22T11:31:17Z

Happy to know you have great plans for your lib. We use libGit2Sharp for other things and totally replacing it with GitReader is probably a long shot (our application is basically a full featured git GUI). We use git.exe for commands (including fetch and status) and libGit2Sharp to introspect the repository, build the graph, compute diff between commits, edit remotes...

kekyo · 2023-04-22T23:54:23Z

Well, I have often wanted to analyze Git commit graphs (I don't do it full time now, but in the past I have held both CI and progress analysis maintainer roles). One of my motivations is that it would be useful to have such a library as an infrastructure that can be easily handled for such purposes ;)

kekyo · 2023-05-18T07:45:24Z

@jairbubbles 0.10.0 released.

After the merged, I did some tweaking for consistency. If you have any problems, please throw them here or create a separate issue if appropriate.

jairbubbles · 2023-05-18T07:51:30Z

@kekyo Cool! I have started some work to benchmark GitReader vs LibGit2Sharp. In a nusthell, what I see is that it's faster when using the "primitive" open (but we're not getting all info that LibGit2Sharp is providing) but it's a lot slower when using the "structure" open which is too bad as the data structures are a lot user friendly.

jairbubbles · 2023-05-18T07:53:50Z

Method	Mean	Error	StdDev
GitReader	4.604 ms	0.0415 ms	0.0347 ms
GitReaderStructured	187.221 ms	17.3900 ms	51.0018 ms
LibGit2sharp	20.431 ms	0.4045 ms	0.4496 ms

kekyo · 2023-05-18T09:49:02Z

Primitive access has been able to reduce latency more than expected 😄

The difficulty is that when we open the repository in Structures interface, it is loading packed indexes, branches, tags, stashes, and many other things...
However, since these are asynchronous operations, it is difficult to say that they cannot be done on demand when accessing Repository.Branches.get() its non-awaitable.

At first, I thought about designing a method like Repository.GetBranchesAsync(), but then there is the problem of what to do with Commit.Branches.get(). There is a way to make everything including these methods awaitable, but that would affect convenience, so I left it out of the basic design of Structures interface.

jairbubbles · 2023-05-18T09:53:43Z

Yest but in my benchmark I'm also getting the references in the "primitive" mode. My guess is that it's the commit resolving which is taking a lot of time. I'm wondering if a lazy evaluation approach like in LibGit2sharp wouldn't be better.

jairbubbles · 2023-05-18T13:22:47Z

I quickly test to remove Commit / Tag resolving for each reference and as expected the Primitive / Structured have similar performance:

Method	Mean	Error	StdDev
GitReader	4.562 ms	0.0652 ms	0.0544 ms
GitReaderStructured	4.143 ms	0.0820 ms	0.0976 ms
LibGit2sharp	24.052 ms	0.4013 ms	0.3557 ms

kekyo · 2023-05-19T01:27:06Z

For example, even in the Structures interface, we may be able to use the idea to stop reading Branches, Tags, etc. in bulk when they are opened, and instead have them call an asynchronous method that explicitly reads them. Suppose we could control what information to read with FillFlags like the following:

[Flags]
enum FillFlags
{
  None = 0x00,
  Branches = 0x01,
  RemoteBranches = 0x02,
  Tags = 0x04,
  Stashes = 0x08,
  All = 0x0f,
}

// (Defaulted: FillFlags.All)
using var repository = await Repository.Factory.OpenStructureAsync(FillFlags.None);

// All refernces are NOT loaded.
Trace.Assert(repository.Branches.Count == 0);
Trace.Assert(repository.RemoteBranches.Count == 0);
Trace.Assert(repository.Tags.Count == 0);
Trace.Assert(repository.Stashes.Count == 0);

// The commit doesn't fixup any additional informations.
var commit = await repository.GetCommitAsync("....");

Trace.Assert(commit.Branches.Count == 0);
Trace.Assert(commit.RemoteBranches.Count == 0);
Trace.Assert(commit.Tags.Count == 0);

// After delayed but explicitly reading:
await repository.FillImmediateAsync(FillFlags.Branches | FillFlags.Tags);

Trace.Assert(repository.Branches.Count >= 1);
Trace.Assert(repository.Tags.Count >= 1);

// (this may require careful implementation of the process in Commit to make this possible)
Trace.Assert(commit.Branches.Count >= 1);
Trace.Assert(commit.Tags.Count >= 1);

By explicitly calling FillImmediateAsync(), users can control the timing of time-consuming tasks themselves. And by default, everything is read automatically, so the convenience of the current Structures interface is not lost.

jairbubbles · 2023-05-19T06:47:48Z

@kekyo I agree that we need control but reading references is not really slow when they are packed. We also need to control commits / tag resolving, it would be some kind of prefetch option. Do you want to pay the price of resolving right away when you open the repository or when you access objects later on?

For instance if you have 490 packed branches, 10 branches in refs/heads/. The cost would be in the commits resolving as you'll have to resolve 500 commits.

Moreover, I feel like it's mostly useless to resolve all branches or tags, it's unlikely that you need that info for all them, at least for most common scenarios.

As for controlling references retrieval why not exposing directly the methods on the repository?

// Method for each types?
public class Respository
{
  IReadonLyDictionary<string, Branch> GetBranchesAsync(ResolvingFlags ...)
  IReadonLyDictionary<string, Branch> GetRemoteBranchesAsync(ResolvingFlags ...)
  IReadonLyCollection<Stash> GetStashesAsync(ResolvingFlags ...)
  ...
}
// Or more generic?
public class Respository
{
  IReadonLyDictionary<string, Branch> GetReferencesAsync(ReferenceTypes...)
  ...
}

public enum ReferenceTypes
{
  Branch,
  RemoteBranch,
  Stash,
  Tag 
}

If we provide enough control through these methods we wouldn't need structures vs primitives anymore which would make the code a lot simpler / easier to consume and it would cover a lot a different use cases.

jairbubbles · 2023-05-19T06:56:11Z

In my use case, we open the repository to get its info as soon as the file watcher detects a change so we want this to be as fast as possible. I was thinking that it would be interesting to be able to keep the object cache between several repository opening.

// Persistent cache that we would be kept in memory
static ObjectsCache  cache = new ObjectsCache();

// When we refresh we would pass the cache
var repository = Factory.OpenRepository(cache);
var branch = await repository.GetHeadBranchAsync();
var commit = await branch.GetCommitAsync(); // If the commit didn't change we didn't price to look for commits in the disk, it's already in the cache

kekyo · 2023-05-19T09:45:36Z

I see, so you are saying that you would eliminate property accesses such as Repository.Branches and switch to an awaitable asynchronous method like Repository.GetBranchesAsync(). I thought about that too, but I figured that making it a method would be a bad debugging experience.

An immediate example is the test result of Verify(model) in GitReader.Tests, which is property-accessible, so the test is easy to write. If access to branches and tags were only possible via asynchronous method calls, we need to write code to retrieve this information every time with asynchoronous method calling.

Since this example is test code for GitReader, it is fine to write labor-intensive asynchronous method call code, but it is easy to imagine that this kind of labor would be required in general use. Since the Structures interface is a high-level interface, I thought it would be desirable to make it easier to use, even at the compromise of performance loss.

(I think it would be better if there was something in between the Structures interface and the Primitive interface, but I also think that having too many options is a problem...)

jairbubbles · 2023-05-19T10:42:23Z

but I figured that making it a method would be a bad debugging experience.

How so? I mean you have one method call for one what you need, it's pretty straight forward.

so the test is easy to write

Well for Verify creating a wrapper class is probably the best approach, it gives you control on what you want to test:

internal class RepositoryWrapper
{
  async Task<RepositoryWrapper> InitAsync(string gitPath)
  {
    var repository = Factory.OpenRepository(cache);
    Branches  = await GetBranchesAsync();
    RemoteBranches  = await GetBranchesAsync();
  }
  
   IReadonlyDictionary<string, Branch> Branches { get; }
   IReadonlyDictionary<string, Branch> RemoteBranches { get; }
   ...
}

I feel like the high level interface should:

expose user friendly classes like Branch, Tag and so on
as performant as possible 😍

But having a low level is also super interesting for more advanced scenarios but I would expose things like:

ReadPackedRefs
ReadReferences
ReadGitConfig
....

It wouldn't expose a class Repository for that API it could be only static methods that takes a .git path.

jairbubbles · 2023-05-19T18:46:04Z

In #3 I'm not resolving anymore the commits, it's still slower because of the tags resolving (I have many in the repo I'm benchmarking). I see an optimisation by treating the info about peeled tags in packed-refs, it's currently ignored.

Method	Mean	Error	StdDev
GitReader	3.833 ms	0.0397 ms	0.0310 ms
GitReaderStructured	97.224 ms	6.3351 ms	18.6791 ms
LibGit2sharp	19.009 ms	0.2665 ms	0.2492 ms

kekyo · 2023-06-25T09:26:00Z

Thank you again your suggestions, GitReader reached 1.0.0!

This issue is closed, please open new issue when you want to.

kekyo added the question Further information is requested label Apr 22, 2023

kekyo linked a pull request May 18, 2023 that will close this issue

GetStashes / GetRefLog #2

Merged

kekyo mentioned this issue May 19, 2023

CommitRef #5

Closed

kekyo closed this as completed Jun 25, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Awesome work! 😍 #1

Awesome work! 😍 #1

jairbubbles commented Apr 22, 2023

kekyo commented Apr 22, 2023

jairbubbles commented Apr 22, 2023

kekyo commented Apr 22, 2023

kekyo commented May 18, 2023

jairbubbles commented May 18, 2023

jairbubbles commented May 18, 2023

kekyo commented May 18, 2023

jairbubbles commented May 18, 2023 •

edited

jairbubbles commented May 18, 2023 •

edited

kekyo commented May 19, 2023 •

edited

jairbubbles commented May 19, 2023

jairbubbles commented May 19, 2023

kekyo commented May 19, 2023 •

edited

jairbubbles commented May 19, 2023

jairbubbles commented May 19, 2023

kekyo commented Jun 25, 2023

Awesome work! 😍 #1

Awesome work! 😍 #1

Comments

jairbubbles commented Apr 22, 2023

kekyo commented Apr 22, 2023

jairbubbles commented Apr 22, 2023

kekyo commented Apr 22, 2023

kekyo commented May 18, 2023

jairbubbles commented May 18, 2023

jairbubbles commented May 18, 2023

kekyo commented May 18, 2023

jairbubbles commented May 18, 2023 • edited

jairbubbles commented May 18, 2023 • edited

kekyo commented May 19, 2023 • edited

jairbubbles commented May 19, 2023

jairbubbles commented May 19, 2023

kekyo commented May 19, 2023 • edited

jairbubbles commented May 19, 2023

jairbubbles commented May 19, 2023

kekyo commented Jun 25, 2023

jairbubbles commented May 18, 2023 •

edited

jairbubbles commented May 18, 2023 •

edited

kekyo commented May 19, 2023 •

edited

kekyo commented May 19, 2023 •

edited