-
Notifications
You must be signed in to change notification settings - Fork 533
Improved consistency of Tree iterators #30
Conversation
Current coverage is
|
|
I need to review it, but on the meanwhile can you squash the commits? |
Instead of returning a channel of files, Tree.Files() now returns a FileIter with these qualities: * It returns files in the original order of the repository (relying on a * new Tree.OrderedNames property) * It can return errors encountered when retrieving files and trees from * underlying storage * It can be Closed without having to drain the entire channel * It defers the heavy lifting to a new TreeWalker type * Its behavior is a little more consistent with other Iter types * It's a little less prone to memory leaks This update includes a new TreeWalker type that will iterate through all of the entries of a tree and its descendant subtrees. It does the dirty work that Tree.walkEntries() used to do, but with a public API. A new TreeIter type is also included that just walks through subtrees. This could be useful for performing a directory search while ignoring files/blobs altogether.
82df1bf to
8261deb
Compare
|
@mcuadros Commits are now squashed. |
tree.go
Outdated
| Entries map[string]TreeEntry | ||
| Hash core.Hash | ||
| Entries map[string]TreeEntry | ||
| OrderedNames []string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is really need it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought it would be useful and appropriate to keep the entries ordered. I already found it useful for writing tests, and it could make Tree comparisons more efficient when we can rely on a known order.
The hash of the tree is already dependent on its order, and that order generally seems well defined. If a tree is recreated with the same entries in a different order I expect that it would produce a different hash. Users of go-git might be surprised to get the files back in a weird order when the order is well defined by the tree itself.
Relying on the map here meant that the resulting order would always be random. Users could sort it again to put it back in order, but that seems wasteful. Also, golang's default lexicographic sort algorithm is different than what I'm seeing in most git repositories.
The implementation under review here does add some overhead and I actually don't like it very much. What I'd really like to do is this:
- Change the definition of
Tree.Entriesto[]TreeEntry - Add a
Tree.Mapproperty asmap[string]*TreeEntry - Build
Tree.Maplazily: only on the first call toTree.File()for example (it basically functions as a cache) - Update
TreeWalkerto simply iterate throughTree.Entries(it wouldn't touchTree.Map)
Making such a change would have these advantages:
- The implementation would be cleaner than the current code under review
- Users of
Treethat don't need random lookups would never bother generating a map (a performance improvement)
I was hesitant to do this before because it's more invasive, but I'd be happy to update this PR with the changes described above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Making Tree.Entries a slice ended up being pretty straightforward so I went ahead and added the change to this PR.
Tree's mapping of names to entries has been made internal, and will only be built when necessary with the first call to Tree.File().
Improved consistency of Tree iterators
Improved consistency of Tree iterators
Add missing api path in README
This update primarily impacts iteration of
Treeentries and the processing ofTree.Files().Instead of returning a channel of files,
Tree.Files()now returns aFileIter. It has the following benefits:Tree.OrderedNamesproperty)It has the following downsides:
Closefunction, which means bothering with anotherClose()callTreeWalkerimplementation it relies upon is slightly harder to reason about than the closure-based versionThis update includes a new
TreeWalkertype that will iterate through all of the entries of a tree and its descendant subtrees. It does the dirty work thatTree.walkEntries()used to do, but with a public API.A new
TreeItertype is also included that just walks through subtrees. This could be useful for performing a directory search while ignoring files/blobs altogether.