Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

trie: add difference iterator #3637

Merged
merged 6 commits into from Feb 22, 2017
Merged

trie: add difference iterator #3637

merged 6 commits into from Feb 22, 2017

Conversation

Arachnid
Copy link
Contributor

@Arachnid Arachnid commented Feb 2, 2017

This PR implements a differenceIterator, which allows iterating over trie nodes that exist in one trie but not in another. This is a prerequisite for most GC strategies, in order to find obsolete nodes.

@mention-bot
Copy link

@Arachnid, thanks for your PR! By analyzing the history of the files in this pull request, we identified @karalabe, @fjl and @obscuren to be potential reviewers.

@GitCop
Copy link

GitCop commented Feb 2, 2017

Thank you for your contribution! Your commits seem to not adhere to the repository coding standards

  • Commit: 2b78c04
  • Commits must be prefixed with the package(s) they modify

Please check the contribution guidelines for more details.


This message was auto-generated by https://gitcop.com

@Arachnid Arachnid requested review from fjl and obscuren February 2, 2017 18:20
@fjl fjl changed the title Implement differenceIterator trie: add difference iterator Feb 3, 2017

accountHash common.Hash // Hash of the node containing the account
codeHash common.Hash // Hash of the contract source code
code []byte // Source code associated with a contract

Hash common.Hash // Hash of the current entry being iterated (nil if not standalone)
Entry interface{} // Current state entry being iterated (internal representation)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you delete this because it wasn't used anywhere? Any particular reason? Doesn't this make the iterator a lot less useful as there's no way to actually access the data being iterated?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I deleted it because it isn't being used anywhere. Because the entry types are all private anyway, this field wasn't usable outside the package.

trie/iterator.go Outdated
nodeIt: NewNodeIterator(trie),
keyBuf: make([]byte, 0, 64),
Key: nil,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not needed, nil is the default value anyway.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ha, just saw this was in the original code too. Anyway, probably should be deleted.

trie/iterator.go Outdated
// FromNodeIterator creates a new key-value iterator from a node iterator
func FromNodeIterator(it NodeIterator) *Iterator {
return &Iterator{
nodeIt: it,
Key: nil,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Key: nil shouldn't be needed.

trie/iterator.go Outdated
}

// FromNodeIterator creates a new key-value iterator from a node iterator
func FromNodeIterator(it NodeIterator) *Iterator {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the correct name is NewIteratorFromNodeIterator. A bit unwieldy but NewSomething should be part of the method signature to make it visible what it does. I'm open to suggestions though :D

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NewIterator

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Except that already exists

}
return decodeCompact(key)
// NodeIterator is an iterator to traverse the trie pre-order.
type NodeIterator interface {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that this interface is a public one, please document all the methods.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few special mentions probably warranted for:

  • The caller of the interface methods must not modify the returned values (at least for LeadBlob and Path), since those are not copied by the iterator, just returned as is.
  • The caller should not retain the result of Path or should make a copy, since the next invocation to "Next" will potentially change its content.

trie/iterator.go Outdated

// Hash returns the hash of the current node
func (it *nodeIterator) Hash() common.Hash {
if it.trie == nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is up for debate, but I think it would be cleaner to have if len(it.stack) == 0 here, rather than looking at it.trie == nil because below you are indexing into the stack and you need an extra "mental jump" to realize what the connection between it.trie and it.stack is.

Same for the methods below.

trie/iterator.go Outdated
return nil
}

if node, ok := it.stack[len(it.stack)-1].node.(valueNode); ok {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure about it but I think you could convert to []byte directly here instead of valueNode first and then []byte later.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can't, unfortunately.

trie/iterator.go Outdated
func (it *NodeIterator) Next() bool {
// sets the Error field to the encountered failure. If `children` is false,
// skips iterating over any subnodes of the current node.
func (it *nodeIterator) Next(children bool) bool {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick. Maybe descend would be a more appropriate name than children? It more verb-y, at least fits better with a bool.

hashes[it.Hash] = struct{}{}
for it := NewNodeIterator(trie); it.Next(true); {
if it.Hash() != (common.Hash{}) {
hashes[it.Hash()] = struct{}{}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's fine here since it's only a test, but live code should cache the result of .Hash() since it allocates a new common.Hash memory area. Just saying to make sure any code after this will be ok :)

func (it *NodeIterator) retrieve() bool {
// Clear out any previously set values
it.Hash, it.Node, it.Parent, it.Leaf, it.LeafBlob = common.Hash{}, nil, common.Hash{}, false, nil
type differenceIterator struct {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add some dox. I know it's only an internal type, but often it helps when skimming the code if internals have at least slight docs.

trie/iterator.go Outdated
a: a,
b: b,
eof: false,
count: 0,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need to set eof to false and count to 0, those are the default values.

trie/iterator.go Outdated
// If the iteration's done, return no available data
if it.trie == nil {
// NewDifferenceIterator constructs a NodeIterator that iterates over elements in b that
// are not in a.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please also mention what the int pointer return is, and why it's a pointer.

trie/iterator.go Outdated
it.Hash, it.Node, it.Parent, it.Leaf, it.LeafBlob = common.Hash{}, nil, common.Hash{}, false, nil
type differenceIterator struct {
a, b NodeIterator
eof bool
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please document that eof marks the end of a.

trie/iterator.go Outdated

for {
apath, bpath := it.a.Path(), it.b.Path()
cmp := bytes.Compare(apath, bpath)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would probably be nicer with:

switch bytes.Compare(it.a.Path(), it.b.Path()) {
case -1:
  // ...
case 1:
  // ...
default:
  // ...
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe not, see and chose whichever :)

trie/iterator.go Outdated
// a and b are identical; skip this whole subtree if the nodes have hashes
if !it.b.Next(it.a.Hash() == common.Hash{}) {
return false
}
Copy link
Member

@karalabe karalabe Feb 14, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you ought to increment count by one here, and by one below. Otherwise you might miss a count if a goes EOF.

trie/iterator.go Outdated
}

// a and b are identical; skip this whole subtree if the nodes have hashes
if !it.b.Next(it.a.Hash() == common.Hash{}) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

b.Hash()

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be the same as the current one, but at least it doesn't look like an accident :P

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've refactored a bit to make the logic clearer.

@GitCop
Copy link

GitCop commented Feb 21, 2017

Thank you for your contribution! Your commits seem to not adhere to the repository coding standards

  • Commit: 2b78c04
  • Commits must be prefixed with the package(s) they modify

Please check the contribution guidelines for more details.


This message was auto-generated by https://gitcop.com

Copy link
Member

@karalabe karalabe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SGTM

@karalabe karalabe added this to the 1.6.0 milestone Feb 22, 2017
@fjl fjl merged commit 5552734 into ethereum:master Feb 22, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants