Improved consistency of Tree iterators #30

JoshuaSjoding · 2016-02-17T11:49:26Z

This update primarily impacts iteration of Tree entries and the processing of Tree.Files().

Instead of returning a channel of files, Tree.Files() now returns a FileIter. It has the following benefits:

It returns files in the original order of the repository (relying on a new Tree.OrderedNames property)
It can return errors encountered when retrieving files and trees from underlying storage
It can be closed early, whereas the old channel had to be drained completely
It defers the heavy lifting to a new TreeWalker type
Its behavior is a little more consistent with other Iter types
It's a little less prone to memory leaks

It has the following downsides:

Using it is slightly more cumbersome than simply ranging over a channel
It exposes a Close() method, so callers have one more Close() call to remember
The TreeWalker implementation it relies upon is slightly harder to reason about than the closure-based version

This update includes a new TreeWalker type that will iterate through all of the entries of a tree and its descendant subtrees. It does the dirty work that Tree.walkEntries() used to do, but with a public API.

A new TreeIter type is also included that just walks through subtrees. This could be useful for performing a directory search while ignoring files/blobs altogether.
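Along the same lines, a subtree-only iterator can be sketched as a stack of pending directories that never yields blob entries. The types below (including a pre-resolved `Sub` pointer in place of loading subtrees by hash) are hypothetical. The usage shows the directory-search case mentioned above: finding every `testdata` directory.

```go
package main

import (
	"fmt"
	"io"
	"path"
)

// Hypothetical stand-ins; a real TreeIter would load subtrees from storage by hash.
type TreeEntry struct {
	Name  string
	IsDir bool
	Sub   *Tree // already-resolved subtree when IsDir is true
}

type Tree struct {
	Entries []TreeEntry
}

type pending struct {
	dir  string
	tree *Tree
}

// TreeIter yields only subtrees, depth-first; file entries are skipped.
type TreeIter struct {
	stack []pending
}

func NewTreeIter(root *Tree) *TreeIter {
	it := &TreeIter{}
	it.pushSubtrees("", root)
	return it
}

// pushSubtrees queues t's subtrees in reverse so they pop in tree order.
func (it *TreeIter) pushSubtrees(dir string, t *Tree) {
	if t == nil {
		return
	}
	for i := len(t.Entries) - 1; i >= 0; i-- {
		if e := t.Entries[i]; e.IsDir {
			it.stack = append(it.stack, pending{path.Join(dir, e.Name), e.Sub})
		}
	}
}

// Next returns the path and tree of the next subtree, or io.EOF.
func (it *TreeIter) Next() (string, *Tree, error) {
	if len(it.stack) == 0 {
		return "", nil, io.EOF
	}
	p := it.stack[len(it.stack)-1]
	it.stack = it.stack[:len(it.stack)-1]
	it.pushSubtrees(p.dir, p.tree)
	return p.dir, p.tree, nil
}

func main() {
	testdata := func() *Tree { return &Tree{Entries: []TreeEntry{{Name: "fixture.json"}}} }
	root := &Tree{Entries: []TreeEntry{
		{Name: "README"},
		{Name: "cmd", IsDir: true, Sub: &Tree{Entries: []TreeEntry{
			{Name: "main.go"},
			{Name: "testdata", IsDir: true, Sub: testdata()},
		}}},
		{Name: "pkg", IsDir: true, Sub: &Tree{Entries: []TreeEntry{
			{Name: "testdata", IsDir: true, Sub: testdata()},
		}}},
	}}

	// Directory search: prints cmd/testdata and pkg/testdata.
	it := NewTreeIter(root)
	for {
		dir, _, err := it.Next()
		if err != nil { // only io.EOF in this sketch
			break
		}
		if path.Base(dir) == "testdata" {
			fmt.Println(dir)
		}
	}
}
```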

codecov-io · 2016-02-17T12:06:57Z

Current coverage is `60.35%`

Merging #30 into master will increase coverage by +1.84% as of c190b44

```diff
@@            master     #30   diff @@
======================================
  Files           24      27     +3
  Stmts         1567    1657    +90
  Branches       200     216    +16
  Methods          0       0       
======================================
+ Hit            917    1000    +83
- Partial         93     101     +8
+ Missed         557     556     -1
```
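As a cross-check of the headline figure, assuming Codecov's usual definition in which partially covered lines do not count as hits, the #30 column gives:

```math
\text{coverage} = \frac{\text{hit}}{\text{hit} + \text{partial} + \text{missed}} = \frac{1000}{1000 + 101 + 556} = \frac{1000}{1657} \approx 60.35\%
```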


mcuadros · 2016-02-17T12:18:07Z

I need to review it, but in the meantime can you squash the commits?

Instead of returning a channel of files, Tree.Files() now returns a FileIter with these qualities:

* It returns files in the original order of the repository (relying on a new Tree.OrderedNames property)
* It can return errors encountered when retrieving files and trees from underlying storage
* It can be Closed without having to drain the entire channel
* It defers the heavy lifting to a new TreeWalker type
* Its behavior is a little more consistent with other Iter types
* It's a little less prone to memory leaks

This update includes a new TreeWalker type that will iterate through all of the entries of a tree and its descendant subtrees. It does the dirty work that Tree.walkEntries() used to do, but with a public API.

A new TreeIter type is also included that just walks through subtrees. This could be useful for performing a directory search while ignoring files/blobs altogether.

JoshuaSjoding · 2016-02-17T12:59:06Z

@mcuadros Commits are now squashed.

mcuadros · 2016-02-17T14:19:59Z

tree.go

```diff
-	Entries map[string]TreeEntry
-	Hash    core.Hash
+	Entries      map[string]TreeEntry
+	OrderedNames []string
```


Is it really needed?

I thought it would be useful and appropriate to keep the entries ordered. I already found it useful for writing tests, and it could make Tree comparisons more efficient when we can rely on a known order.

The hash of the tree is already dependent on its order, and that order generally seems well defined. If a tree is recreated with the same entries in a different order I expect that it would produce a different hash. Users of go-git might be surprised to get the files back in a weird order when the order is well defined by the tree itself.

Relying on the map here meant that the resulting order was effectively random, since Go deliberately randomizes map iteration order. Users could sort the entries again to put them back in order, but that seems wasteful. Also, Go's default lexicographic sort order is different from what I'm seeing in most git repositories.

The implementation under review here does add some overhead and I actually don't like it very much. What I'd really like to do is this:

Change the definition of Tree.Entries to []TreeEntry

Add a Tree.Map property as map[string]*TreeEntry

Build Tree.Map lazily, e.g. only on the first call to Tree.File() (it basically functions as a cache)

Update TreeWalker to simply iterate through Tree.Entries (it wouldn't touch Tree.Map)

Making such a change would have these advantages:

The implementation would be cleaner than the current code under review

Users of Tree that don't need random lookups would never bother generating a map (a performance improvement)

I was hesitant to do this before because it's more invasive, but I'd be happy to update this PR with the changes described above.

Making Tree.Entries a slice ended up being pretty straightforward, so I went ahead and added the change to this PR.

Tree's mapping of names to entries has been made internal, and will only be built when necessary with the first call to Tree.File().


JoshuaSjoding force-pushed the consistent-iterators branch from 82df1bf to 8261deb February 17, 2016 12:55

mcuadros reviewed Feb 17, 2016

Tree.Entries is now a slice

e8524ed


mcuadros added a commit that referenced this pull request Feb 17, 2016

Merge pull request #30 from scjalliance/consistent-iterators

33dada7

Improved consistency of Tree iterators

mcuadros merged commit 33dada7 into src-d:master Feb 17, 2016

JoshuaSjoding deleted the consistent-iterators branch June 24, 2016 09:10

mcuadros added a commit that referenced this pull request Jan 31, 2017

Merge pull request #30 from scjalliance/consistent-iterators

f57aa86

Improved consistency of Tree iterators

gsalingu-ovhus pushed a commit to gsalingu-ovhus/go-git that referenced this pull request Mar 28, 2019

Merge pull request src-d#30 from bodji/rc4

b87add0

Add missing api path in README

3 participants

Improved consistency of Tree iterators #30

Improved consistency of Tree iterators #30

Uh oh!

Conversation

JoshuaSjoding commented Feb 17, 2016

Uh oh!

codecov-io commented Feb 17, 2016

Current coverage is 60.35%

Uh oh!

mcuadros commented Feb 17, 2016

Uh oh!

JoshuaSjoding commented Feb 17, 2016

Uh oh!

mcuadros Feb 17, 2016

Choose a reason for hiding this comment

Uh oh!

JoshuaSjoding Feb 17, 2016

Choose a reason for hiding this comment

Uh oh!

JoshuaSjoding Feb 17, 2016

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Current coverage is `60.35%`