This repository has been archived by the owner on Sep 11, 2020. It is now read-only.

packfile/decoder: speed up packfile iterator when specific type #200

Merged
merged 3 commits into from
Jan 12, 2017


ajnavarro
Contributor

If a type is specified, the packfile decoder only decodes objects of that type, which reduces decoding time in those cases.

  • Added a public constructor, NewPackfileIter, to iterate over objects of a specific type in a packfile at a low level.
  • Modified the Decoder so it can read object headers first and decode only the objects of the requested type.
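
The header-first idea described above can be sketched roughly as follows. This is a simplified illustration, not the code from this PR: ObjectType, entry, and decoder are hypothetical stand-ins for the plumbing and packfile types, and delta objects (whose real type has to be resolved from their base) are ignored here.

```go
package main

import "fmt"

// ObjectType is a simplified, hypothetical stand-in for plumbing.ObjectType.
type ObjectType int

const (
	AnyObject ObjectType = iota
	CommitObject
	TreeObject
	BlobObject
)

// entry models one packfile entry: a cheap header (Type) plus a payload
// that would be expensive to inflate in a real decoder.
type entry struct {
	Type    ObjectType
	Payload string
}

// decoder walks the entries and counts how many payloads it decoded.
type decoder struct {
	entries []entry
	want    ObjectType
	decoded int
}

// decodeAll reads every header but decodes the payload only when the
// header type matches the requested one (or when any type is accepted).
func (d *decoder) decodeAll() []string {
	var out []string
	for _, e := range d.entries {
		if d.want != AnyObject && e.Type != d.want {
			continue // header was read, payload is skipped
		}
		d.decoded++
		out = append(out, e.Payload)
	}
	return out
}

func main() {
	d := &decoder{
		entries: []entry{
			{Type: CommitObject, Payload: "c1"},
			{Type: BlobObject, Payload: "b1"},
			{Type: TreeObject, Payload: "t1"},
			{Type: CommitObject, Payload: "c2"},
		},
		want: CommitObject,
	}
	fmt.Println(d.decodeAll(), d.decoded)
}
```

The saving comes from the skipped branch: the cost of the non-matching entries is limited to reading their headers.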

@codecov-io

codecov-io commented Jan 4, 2017

Current coverage is 76.22% (diff: 87.93%)

Merging #200 into master will decrease coverage by 0.49%

@@             master       #200   diff @@
==========================================
  Files            96         96          
  Lines          6224       6270    +46   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits           4775       4779     +4   
- Misses          923        971    +48   
+ Partials        526        520     -6   

Powered by Codecov. Last update 133b97e...874330e

@smola
Collaborator

smola commented Jan 4, 2017

@ajnavarro Please use the appropriate title for the commit and PR (package first, then the message, in lowercase).

return NewDecoderForType(s, o, plumbing.AnyObject)
}

func NewDecoderForType(s *Scanner, o storer.EncodedObjectStorer,
Collaborator

add documentation.

Contributor Author

done

var realType plumbing.ObjectType
switch {
case h.Type == plumbing.OFSDeltaObject:
realType = d.offsetToType[h.OffsetReference]
Collaborator

offset reference might not be present, check it and do error handling

Contributor Author

done

case h.Type == plumbing.OFSDeltaObject:
realType = d.offsetToType[h.OffsetReference]
case h.Type == plumbing.REFDeltaObject:
ofs := d.hashToOffset[h.Reference]
Collaborator

reference might not be present, check it and do error handling

Contributor Author

done

}

var realType plumbing.ObjectType
switch {
Collaborator

change this with switch h.Type, then you can use the values directly in cases.

Contributor Author

done
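
For readers unfamiliar with the two switch forms, here is a small sketch of the tagged form the reviewer asks for. The ObjectType constants are simplified stand-ins and isDelta is a hypothetical helper, not a function from this PR.

```go
package main

import "fmt"

// ObjectType is a simplified, hypothetical stand-in for plumbing.ObjectType.
type ObjectType int

const (
	CommitObject ObjectType = iota + 1
	TreeObject
	OFSDeltaObject
	REFDeltaObject
)

// isDelta uses a tagged switch: the switch expression is evaluated once
// and each case lists plain values, instead of repeating "t == ..." in
// every case of an expressionless switch.
func isDelta(t ObjectType) bool {
	switch t {
	case OFSDeltaObject, REFDeltaObject:
		return true
	default:
		return false
	}
}

func main() {
	fmt.Println(isDelta(OFSDeltaObject), isDelta(CommitObject))
}
```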

return d.decodeByHeader(h)
}

return nil, nil
Collaborator

return nil?
Nope, return the next object with a relevant type.

Collaborator

oh, I see that logic is in the PackfileIter, ok.

case plumbing.REFDeltaObject:
crc, err = d.fillREFDeltaObjectContent(obj, h.Reference)
case plumbing.OFSDeltaObject:
crc, err = d.fillOFSDeltaObjectContent(obj, h.OffsetReference)
case plumbing.CommitObject, plumbing.TreeObject, plumbing.BlobObject, plumbing.TagObject:
Collaborator

no need to change the order of this

Contributor Author

done

scanner := packfile.NewScanner(f.Packfile())
storage := memory.NewStorage()

d, err := packfile.NewDecoderForType(scanner, storage, plumbing.CommitObject)
Collaborator

Only CommitObject? Test all types.

Contributor Author

done

@@ -91,3 +91,24 @@ func (s *FsSuite) TestIterWithType(c *C) {
c.Assert(err, IsNil)
})
}

func (s *FsSuite) TestPackFileIterator(c *C) {
Collaborator

TestPackFileIterator -> TestPackfileIter (same spelling)

@ajnavarro ajnavarro changed the title Speed up packfile iterator when specific type packfile/decoder: Speed up packfile iterator when specific type Jan 5, 2017
@ajnavarro ajnavarro changed the title packfile/decoder: Speed up packfile iterator when specific type packfile/decoder: speed up packfile iterator when specific type Jan 5, 2017
}

func NewDecoderForType(s *Scanner, o storer.EncodedObjectStorer,
t plumbing.ObjectType) (*Decoder, error) {
Contributor

first thing to do is probably returning an error if t is plumbing.InvalidObject.

In case t is plumbing.OFSDeltaObject or plumbing.REFDeltaObject I would recommend returning also an error.
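
One way the suggested validation could look, as a sketch only. The constructor signature is deliberately reduced to the type argument (the real one in the diff also takes a *Scanner and a storer.EncodedObjectStorer), and ErrInvalidType and the simplified ObjectType are hypothetical names introduced for this example.

```go
package main

import (
	"errors"
	"fmt"
)

// ObjectType is a simplified, hypothetical stand-in for plumbing.ObjectType.
type ObjectType int

const (
	InvalidObject ObjectType = iota
	CommitObject
	OFSDeltaObject
	REFDeltaObject
)

// ErrInvalidType is a hypothetical sentinel error for this sketch.
var ErrInvalidType = errors.New("invalid object type for decoder")

// Decoder only records the requested type in this sketch.
type Decoder struct{ want ObjectType }

// NewDecoderForType rejects the invalid type and the two delta types,
// since a delta is never the resolved type of a decoded object.
func NewDecoderForType(t ObjectType) (*Decoder, error) {
	switch t {
	case InvalidObject, OFSDeltaObject, REFDeltaObject:
		return nil, ErrInvalidType
	}
	return &Decoder{want: t}, nil
}

func main() {
	if _, err := NewDecoderForType(OFSDeltaObject); err != nil {
		fmt.Println("rejected:", err)
	}
}
```

Failing in the constructor means a caller learns about a bad type argument immediately, rather than getting an iterator that silently never yields anything.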

@@ -174,24 +185,54 @@ func (d *Decoder) decodeObjectsWithObjectStorerTx(count int) error {

// DecodeObject reads the next object from the scanner and returns it. This
// method can be used in replacement of the Decode method, to work in a
// interative way
// interactive way. If you created a new decoder instance using NewDecoderForType
// constructor, if the object decoded is not equals to the specified one, null will
Contributor

s/null/nil/

Contributor

Why return nil instead of an error, or instead of just skipping until the next object of that type? The caller wants an object or an error, not something that is neither an object nor an error. Let's keep things easy for the caller by doing the work inside the function, not outside.

Contributor Author

To do this properly, we should refactor the Decoder to be an Iterator, instead of getting the element count from the Scanner and then calling DecodeObject 'count' times. But that was the previous behavior. Could we do this refactor in another PR?

Contributor

@alcortesm alcortesm Jan 9, 2017

of course, that will be a nice solution.

return nil, nil
}

func (d *Decoder) nextHeader() (*ObjectHeader, error) {
Contributor

This method, along with 193-195 is redundant. Refactor all these.

@@ -201,7 +242,7 @@ func (d *Decoder) DecodeObject() (plumbing.EncodedObject, error) {
}

hash := obj.Hash()
d.setOffset(hash, h.Offset)
d.setOffsetAndType(hash, h.Offset, obj.Type())
Contributor

set... is quite a bad name, as it does not transmit intention or usefulness. Can we call it memoize or remember instead?

return iter.Next()
}
iter.position++
if obj != nil {
Contributor

This is what I meant before when commenting on not giving extra work to our users. This method is unnecessarily complex because DecodeObject returns ambiguous results. Please change that and simplify this.

Besides this: this if should be negated to avoid extra logic and extra indentation:

if obj == nil {
    continue
}

return iter.Next()
}
if iter.t != plumbing.AnyObject && iter.t != obj.Type() {
return iter.Next()
Contributor

a recursive call inside a for loop? on a condition that can not happen because the code just added to the file above? WTF.

Contributor Author

oops, good catch.

})
}

func (s *FsSuite) TestPackFileIterator(c *C) {
fixtures.ByTag(".git").ByTag("packfile").Test(c, func(f *fixtures.Fixture) {
func (s *FsSuite) TestPackFileIter(c *C) {
Collaborator

TestPackFileIter -> TestPackfileIter

Contributor Author

done

@mcuadros mcuadros merged commit c5f1056 into src-d:master Jan 12, 2017
@ajnavarro ajnavarro deleted the improvement/filesystem-storage branch January 12, 2017 09:08
gsalingu-ovhus pushed a commit to gsalingu-ovhus/go-git that referenced this pull request Mar 28, 2019
Typo in api documentation
5 participants