-
Notifications
You must be signed in to change notification settings - Fork 540
packfile/decoder: speed up packfile iterator when specific type #200
packfile/decoder: speed up packfile iterator when specific type #200
Conversation
If specified, the packfile decoder only decode objects of a specific type, improving the decoding time in this cases. - Added public constructor NewPackfileIter to be able to iterate objects of a specific type into a packfile in a low level way. - Modified the Decoder to be able to read object headers first, to only decode specific objects.
Current coverage is 76.22% (diff: 87.93%)@@ master #200 diff @@
==========================================
Files 96 96
Lines 6224 6270 +46
Methods 0 0
Messages 0 0
Branches 0 0
==========================================
+ Hits 4775 4779 +4
- Misses 923 971 +48
+ Partials 526 520 -6
|
@ajnavarro Please, use the appropiate title for the commit and PR (first package, then message, lowercase). |
return NewDecoderForType(s, o, plumbing.AnyObject) | ||
} | ||
|
||
func NewDecoderForType(s *Scanner, o storer.EncodedObjectStorer, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add documentation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
var realType plumbing.ObjectType | ||
switch { | ||
case h.Type == plumbing.OFSDeltaObject: | ||
realType = d.offsetToType[h.OffsetReference] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
offset reference might not be present, check it and do error handling
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
case h.Type == plumbing.OFSDeltaObject: | ||
realType = d.offsetToType[h.OffsetReference] | ||
case h.Type == plumbing.REFDeltaObject: | ||
ofs := d.hashToOffset[h.Reference] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reference might not be present, check it and do error handling
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
} | ||
|
||
var realType plumbing.ObjectType | ||
switch { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
change this with switch h.Type
, then you can use the values directly in cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
return d.decodeByHeader(h) | ||
} | ||
|
||
return nil, nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return nil?
Nope, return the next object with a relevant type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh, I see that logic is in the PackfileIter, ok.
case plumbing.REFDeltaObject: | ||
crc, err = d.fillREFDeltaObjectContent(obj, h.Reference) | ||
case plumbing.OFSDeltaObject: | ||
crc, err = d.fillOFSDeltaObjectContent(obj, h.OffsetReference) | ||
case plumbing.CommitObject, plumbing.TreeObject, plumbing.BlobObject, plumbing.TagObject: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no need to change the order of this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
scanner := packfile.NewScanner(f.Packfile()) | ||
storage := memory.NewStorage() | ||
|
||
d, err := packfile.NewDecoderForType(scanner, storage, plumbing.CommitObject) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only CommitObject? Test all types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
@@ -91,3 +91,24 @@ func (s *FsSuite) TestIterWithType(c *C) { | |||
c.Assert(err, IsNil) | |||
}) | |||
} | |||
|
|||
func (s *FsSuite) TestPackFileIterator(c *C) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TestPackFileIterator -> TestPackfileIter (same spelling)
} | ||
|
||
func NewDecoderForType(s *Scanner, o storer.EncodedObjectStorer, | ||
t plumbing.ObjectType) (*Decoder, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
first thing to do is probably returning an error if t is plumbing.InvalidObject
.
In case t is plumbing.OFSDeltaObject
or plumbing.REFDeltaObject
I would recommend returning also an error.
@@ -174,24 +185,54 @@ func (d *Decoder) decodeObjectsWithObjectStorerTx(count int) error { | |||
|
|||
// DecodeObject reads the next object from the scanner and returns it. This | |||
// method can be used in replacement of the Decode method, to work in a | |||
// interative way | |||
// interactive way. If you created a new decoder instance using NewDecoderForType | |||
// constructor, if the object decoded is not equals to the specified one, null will |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/null/nil/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why return nil instead of an error or just skip til the next object of that type? The caller wants an object or an error, not something that is not an object, nor an error. Let's keep things easy for the caller doing the work inside the function, not outside.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To do this in a proper way, we should refactor the Decoder to be an Iterator instead of get the elements count from the Scanner and then call to DecodeObject 'count' times. But this was the previous behavior. Could we do this refactor in another PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
of course, that will be a nice solution.
return nil, nil | ||
} | ||
|
||
func (d *Decoder) nextHeader() (*ObjectHeader, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method, along with 193-195 is redundant. Refactor all these.
@@ -201,7 +242,7 @@ func (d *Decoder) DecodeObject() (plumbing.EncodedObject, error) { | |||
} | |||
|
|||
hash := obj.Hash() | |||
d.setOffset(hash, h.Offset) | |||
d.setOffsetAndType(hash, h.Offset, obj.Type()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
set... is quite a bad name, as it does not transmit intention or usefulness. Can we call it memoize
or remember
instead?
return iter.Next() | ||
} | ||
iter.position++ | ||
if obj != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is what I meant before when commenting on not giving extra work to our users. This method is unnecasary complex due to DecodeObject returning ambigous results. Please change that and simplify this.
Besides this: this if
should negated to avoid extra logic an extra indentation:
if obj == nil {
continue
}
return iter.Next() | ||
} | ||
if iter.t != plumbing.AnyObject && iter.t != obj.Type() { | ||
return iter.Next() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a recursive call inside a for loop? on a condition that can not happen because the code just added to the file above? WTF.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oops, good catch.
}) | ||
} | ||
|
||
func (s *FsSuite) TestPackFileIterator(c *C) { | ||
fixtures.ByTag(".git").ByTag("packfile").Test(c, func(f *fixtures.Fixture) { | ||
func (s *FsSuite) TestPackFileIter(c *C) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TestPackFileIter -> TestPackfileIter
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
Typo in api documentation
If specified, the packfile decoder only decode objects of a specific type, improving the decoding time in this cases.