
exact slice allocation for fixed size repeated packed fields #437

Merged
merged 3 commits into gogo:master on Jul 31, 2018

Conversation

@gunnihinn
Contributor

gunnihinn commented Jul 25, 2018

See the comments in the commit messages and in issue #436.

gunnihinn added some commits Jul 25, 2018

Add a test for Unmarshal memory use
This is a regression test for issue 436:

    #436
Allocate exactly the memory needed for packed fields
Suppose we have the protobuf definition

    message Foo {
        repeated type Stuff = 1;
    }

that we deserialize in a fairly common way:

    f := &Foo{}
    f.Unmarshal(blob)

Before the call to Unmarshal, `f.Stuff` will be a slice of length 0, so
the Unmarshal operation will more or less be:

    for _, x := range xs {
        f.Stuff = append(f.Stuff, x)
    }

If we don't know how many elements we're going to deserialize
beforehand, this is the best we can do.

Suppose, however, that we know that we're going to deserialize n
elements. If k is such that 2^k < n <= 2^{k+1}, then the Go runtime's
exponential doubling strategy for resizing the arrays that back slices
will cause us to allocate memory for at least

    1 + 2 + ... + 2^{k+1} = 2^{k+2} - 1 ≈ 2 * 2^{k+1}

elements, which is usually more than double what we actually need. For
n = 1000, for example, the doubling sequence 1, 2, 4, ..., 1024 allocates
room for roughly 2047 elements in total, more than twice the 1000 we keep.
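
(The overshoot is easy to observe directly. A small standalone
demonstration, separate from this PR; the exact figures depend on the Go
version's growth policy:)

    package main

    import "fmt"

    func main() {
        var s []float64
        allocated := 0 // total capacity requested across all backing arrays
        for i := 0; i < 1000; i++ {
            before := cap(s)
            s = append(s, float64(i))
            if cap(s) != before {
                allocated += cap(s) // append allocated a new, larger array
            }
        }
        // With pure doubling this prints: needed 1000, allocated 2047.
        fmt.Printf("needed %d, allocated %d\n", len(s), allocated)
    }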

When we deserialize packed fields, we know how many bytes we're going to
deserialize before we start the default append loop. If we furthermore
know how many elements those bytes correspond to, which we do when the
protobuf wire type corresponding to `type` has fixed length [1], we can
precede the default append loop with

    f.Stuff = make([]type, 0, n)

and ask for exactly the memory we're going to use.
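
(Concretely, for elements that occupy 8 bytes on the wire, n is just the
payload length divided by 8. A standalone sketch of the idea, using
illustrative names rather than the identifiers the generated code
actually uses:)

    package pb

    import (
        "encoding/binary"
        "fmt"
    )

    // unmarshalPackedFixed64 decodes a packed fixed64 payload, allocating
    // exactly the capacity needed. Sketch only; the generated Unmarshal
    // appends to the message's existing field instead.
    func unmarshalPackedFixed64(payload []byte) ([]uint64, error) {
        if len(payload)%8 != 0 {
            return nil, fmt.Errorf("payload length %d is not a multiple of 8", len(payload))
        }
        n := len(payload) / 8       // element count is known up front
        out := make([]uint64, 0, n) // ask for exactly the memory we need
        for i := 0; i < n; i++ {
            out = append(out, binary.LittleEndian.Uint64(payload[8*i:]))
        }
        return out, nil
    }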

This results in considerable memory savings, between 50 and 80 percent,
compared with the default strategy. These savings are important to
people who use protobuf to communicate things like time series between
services, which consist almost entirely of large arrays of floats and
doubles.

This fixes #436.

It would be possible to do something similar for packed types of
non-fixed length. They're encoded as varints, and we _could_ run through
the byte stream we're about to deserialize and count the bytes that
don't have their most significant bit set (every varint ends with such a
byte), but the performance implications of that extra pass seem less
predictable than those of the simple division we can perform here.
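
(The counting pass itself would be short. A hedged sketch, not code from
this PR:)

    // countVarints returns how many varints a packed payload contains:
    // every varint ends at a byte whose most significant bit is clear.
    func countVarints(payload []byte) int {
        n := 0
        for _, b := range payload {
            if b < 0x80 {
                n++
            }
        }
        return n
    }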

[1] https://developers.google.com/protocol-buffers/docs/encoding#structure
Commit modified test files
Travis CI is unhappy with me.
@gunnihinn


Contributor

gunnihinn commented Jul 26, 2018

I thought a bit more about the throwaway comment about packed types of non-fixed length. I think it's worth a shot to do that, and tune it in such a way that maybe it only kicks in if we're sure there are enough varints to deserialize to be worth it. It sounds like a nice project for junior developers, so I'm fishing for some at $work to see if anyone wants to have a go at this. It might take a little while to organize, so I think we'll do a separate PR for that.

@awalterschulze


Member

awalterschulze commented Jul 30, 2018

Thank you so much @gunnihinn this looks very promising.

@donaldgraham would you mind reviewing this?

@donaldgraham


Contributor

donaldgraham commented Jul 31, 2018

This looks great. I like the way the test expresses the expectation that we allocate less than half of the memory we had allocated before...
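
(For context, one way such an expectation can be written with the
standard testing package; a sketch with placeholder names: Foo, blob,
and baselineBytes are illustrative, not this PR's actual test code.)

    package pb_test

    import "testing"

    func TestUnmarshalAllocations(t *testing.T) {
        res := testing.Benchmark(func(b *testing.B) {
            b.ReportAllocs()
            for i := 0; i < b.N; i++ {
                f := &Foo{} // Foo: generated message with a packed field
                if err := f.Unmarshal(blob); err != nil {
                    b.Fatal(err)
                }
            }
        })
        // baselineBytes: bytes/op measured before the change (placeholder).
        if got := res.AllocedBytesPerOp(); got > baselineBytes/2 {
            t.Errorf("allocated %d bytes/op, want at most %d", got, baselineBytes/2)
        }
    }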

@awalterschulze awalterschulze changed the title from Gmagnusson/exact slice allocation to exact slice allocation for fixed size repeated packed fields Jul 31, 2018

@awalterschulze awalterschulze merged commit 5440baf into gogo:master Jul 31, 2018

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
@awalterschulze


Member

awalterschulze commented Jul 31, 2018

Thank you so much @gunnihinn and @donaldgraham

gunnihinn added a commit to go-graphite/protocol that referenced this pull request Jul 31, 2018

Update gogo/protobuf dependency
... and generate the protobuf files with the latest version.

We really want to run at least commit 5440baf of gogo/protobuf, because
it contains a pull request we got merged upstream:

    gogo/protobuf#437

The effect of the PR is to reduce the memory allocated by the

    carbonapi_v3_pb.MultiFetchResponse.Unmarshal

method by at least 50% and up to 80%. This is very relevant to
Graphite's interests.