Skip to content
This repository has been archived by the owner on Sep 11, 2020. It is now read-only.

plumbing: object, add APIs for traversing over commit graphs #1132

Merged
merged 19 commits into from
May 14, 2019

Conversation

filipnavara
Copy link
Contributor

@filipnavara filipnavara commented Apr 23, 2019

Follow-up to #1128. Only the last commit is relevant, rest will be disapper with rebase eventually.

This wires up the commit-graph API on the object layer. It introduces a concept of a lightweight CommitNode interface that is used to encapsulate either real Commit or a node loaded from the commit graph. A second interface called CommitNodeIndex is introduced to access the graph of CommitNode objects.

Two implementations of CommitNode and CommitNodeIndex are provided. One of them operates on top of storer.EncodedObjectStorer and regular git objects. The other one operates on commit-graph files and has fallback to storer.EncodedObjectStorer for cases where the commit-graph file doesn't cover the entire repository.

NewCommitNodeIterCTime iterator is added, analogous to NewCommitIterCTime, that can operate on the CommitNodeIndex implementations.

@filipnavara filipnavara changed the title plumbing: format/commitgraph, add APIs for reading and writing commit-graph files plumbing: object, add APIs for traversing over commit graphs Apr 23, 2019
@filipnavara filipnavara force-pushed the commitgraph-obj branch 2 times, most recently from 3ab40da to ec65d90 Compare April 24, 2019 08:48
Signed-off-by: Filip Navara <filip.navara@gmail.com>
Signed-off-by: Filip Navara <filip.navara@gmail.com>
Signed-off-by: Filip Navara <filip.navara@gmail.com>
Signed-off-by: Filip Navara <filip.navara@gmail.com>
Signed-off-by: Filip Navara <filip.navara@gmail.com>
@filipnavara filipnavara marked this pull request as ready for review April 24, 2019 09:58
Signed-off-by: Filip Navara <filip.navara@gmail.com>
@filipnavara filipnavara reopened this Apr 24, 2019
Signed-off-by: Filip Navara <filip.navara@gmail.com>
Signed-off-by: Filip Navara <filip.navara@gmail.com>
func NewCommitNodeIterCTime(
c CommitNode,
seenExternal map[plumbing.Hash]bool,
ignore []plumbing.Hash,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This mimics the interface from NewCommitIterCTime. As far as I can see ignore and seenExternal essentially serve the same purpose, perhaps with different performance characteristics. I'd be fine with keeping it this way or removing one of them.


// CommitNode is generic interface encapsulating a lightweight commit object retrieved
// from CommitNodeIndex
type CommitNode interface {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to get the generation of the commit? This information is interesting to compare two commits or to stop walking at a certain level.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is available at the format level, but I didn't propagate it here and it is not exposed through API at this level. I will look into it. One thing to consider though is that the Git folks are looking into replacing generation numbers with "corrected commit date". I will link to the relevent discussion once I get back to my computer...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, V3 was the one I most recently read about. There's no final decision about any particular approach, but I didn't feel confident about designing API around something that is likely to change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like there are three things we can expose with a reasonable level of forward compatibility with the new proposals:

  • Generation() uint64 API on CommitNode that will uphold the property that if A.Generation() < B.Generation() then B is unreachable by A. While the current format allows the generation to be only 30-bit integer extending it to 64-bit should account for any of the other proposals to work with the same API (except maybe the FELINE version), including the 34-bit corrected commit dates.
  • A high-level API like IsUnreachableFrom(a, b CommitNode). The idea is to abstract away the concept of the generations from the user and just expose the most useful underlying property. This may need some thinking since we basically have three states: a) definitely unreachable (property coming from the generations) b) definitely reachable (not in the commit-graph, but perhaps something that is stored in reachability bitmap for some subset of commits?) c) we don't know.
  • High-level API for priority heaps for traversing the graph in topological order by reusing the Generation property without directly exposing it.

Thoughts? I don't feel like implementing the last one since I currently have no use case for it, but maybe it would be useful for merge-base or something of that sort.

Copy link
Contributor Author

@filipnavara filipnavara Apr 29, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't feel confident enough to establish API shape for two of the above options. I've added the Generation() method to CommitNode in a fashion similar to what git uses internally. It should be fairly easy to expand on it later.

Signed-off-by: Filip Navara <filip.navara@gmail.com>
@filipnavara
Copy link
Contributor Author

Any news about this one? I have the next PR in the pipeline and this one is approved... /cc @mcuadros

@mcuadros
Copy link
Contributor

mcuadros commented May 3, 2019

I rather not add new objects for unstable APIs to the object package, maybe make sense add this to any package called commitgraph, and mark this as EXPERIMENTAL, I am not very sure about this API or where is going.

@filipnavara
Copy link
Contributor Author

I was considering that. The reason I put it into the object package was that git itself has only one Commit object and it hides from the API user whether it was loaded from commit-graph or object storage. Unfortunately that's not possible to do in go-git without incompatible API changes because the current Commit struct exposes data as fields, not through methods.

I do consider this somewhat experimental which is one of the reasons why it's designed to be quite independent of the rest of the code.

As for where it's going I plan to write it up all the way to the Repository and provide methods for the following:

  • Getting a CommitNodeIndex object that will transparently return either one using commit-graph, if available, or one using the object storage
  • Writing a commit-graph file for existing repository.

Additionally I may rewrite the Log, Blame or other similar methods to use the new APIs and validate its shape.

@mcuadros
Copy link
Contributor

mcuadros commented May 3, 2019

I understand the situation that's why I rather keep this package and his usage limited to some packages until we reach the v5 of the library.

Meanwhile, we can place all the code in a package.
Otherwise will be hard to understand by the users.

@filipnavara
Copy link
Contributor Author

Fine with me. The only problem with the commitgraph name is that it will inevitably clash with the name of the plumbing/fmt/commitgraph package and it will have to be aliased. If that's acceptable I will just do it.

@mcuadros
Copy link
Contributor

mcuadros commented May 3, 2019

The format is used inside of the object/commitgraph, so it's not a problem.

@mcuadros
Copy link
Contributor

mcuadros commented May 3, 2019

Awesome thanks! Can you add a doc.go file with the package documentation explaining the goal and also that is an experimental package (this implies that the API is also unstable), also will be great if you can add an example. Thanks!

@filipnavara
Copy link
Contributor Author

filipnavara commented May 3, 2019

Can you add a doc.go file with the package documentation explaining the goal and also that is an experimental package (this implies that the API is also unstable), also will be great if you can add an example.

Sure. I'll add the doc.go for both this and the plumbing/fmt/commitgraph package. The examples would make more sense once I add the high-level API but I will see what can I do.

Signed-off-by: Filip Navara <filip.navara@gmail.com>
Signed-off-by: Filip Navara <filip.navara@gmail.com>
@filipnavara
Copy link
Contributor Author

I added basic documentation and one high-level usage example.

Signed-off-by: Filip Navara <filip.navara@gmail.com>
Signed-off-by: Filip Navara <filip.navara@gmail.com>
Signed-off-by: Filip Navara <filip.navara@gmail.com>
@filipnavara
Copy link
Contributor Author

@mcuadros Anything else that is missing and needs to be addressed?

@jfontan jfontan requested a review from mcuadros May 14, 2019 13:42
@mcuadros mcuadros merged commit 52fcf7d into src-d:master May 14, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants