Optimize encoding of dictionaries #752

fxamacker
Member

@fxamacker fxamacker commented Apr 1, 2021

Closes #743

Description

Optimize encoding of dictionaries to:

  • speed up encoding dictionaries
  • speed up decoding dictionaries
  • reduce memory use
  • reduce size of stored data

Changes

  • Switch from CBOR map to CBOR array to improve speed and preserve ordering.
  • Remove key strings from encoding.
  • Remove old backwards-compatibility decoding code.
  • Add tests, including round-trip for decoding old format and encoding new format.
  • Continue to support deferred keys (no change).
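
The layout change listed above can be sketched roughly as follows. This is a simplified illustration, not the Cadence encoder: the `dictionary` type and the `prepareAsMap` / `prepareAsArray` helpers are hypothetical, and it assumes the ordered keys are stored separately so that each entry can be matched to its key by position.

```go
package main

import "fmt"

// dictionary is a hypothetical stand-in for an ordered dictionary value:
// keys records insertion order, values is indexed by key string.
type dictionary struct {
	keys   []string
	values map[string]interface{}
}

// prepareAsMap mirrors the old layout: entries keyed by key string.
// A map carries every key string and has no defined iteration order.
func prepareAsMap(d dictionary) map[string]interface{} {
	entries := make(map[string]interface{}, len(d.keys))
	for _, k := range d.keys {
		entries[k] = d.values[k]
	}
	return entries
}

// prepareAsArray mirrors the new layout: entries stored in key order,
// without repeating the key strings.
func prepareAsArray(d dictionary) []interface{} {
	entries := make([]interface{}, len(d.keys))
	for i, k := range d.keys {
		entries[i] = d.values[k]
	}
	return entries
}

func main() {
	d := dictionary{
		keys:   []string{"b", "a"},
		values: map[string]interface{}{"a": 1, "b": 2},
	}
	fmt.Println(prepareAsMap(d))
	fmt.Println(prepareAsArray(d))
}
```

Because the array form keeps entries in the same order as the keys, a decoder can pair them up by index instead of looking each entry up by its key string.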

Encoding comparisons:

name                  old time/op    new time/op    delta
EncodingSmallValue-4     200µs ± 0%     138µs ± 0%  -31.32%  (p=0.000 n=8+10)
EncodingLargeValue-4    19.4ms ± 1%    13.2ms ± 1%  -32.39%  (p=0.000 n=10+10)

name                  old alloc/op   new alloc/op   delta
EncodingSmallValue-4    57.6kB ± 0%    36.2kB ± 0%  -37.12%  (p=0.000 n=10+9)
EncodingLargeValue-4    4.67MB ± 1%    3.55MB ± 0%  -23.95%  (p=0.000 n=10+10)

name                  old allocs/op  new allocs/op  delta
EncodingSmallValue-4     1.10k ± 0%     0.80k ± 0%  -27.46%  (p=0.000 n=10+10)
EncodingLargeValue-4     92.1k ± 0%     70.8k ± 0%  -23.12%  (p=0.000 n=10+10)

Decoding comparisons:

name                  old time/op    new time/op    delta
DecodingSmallValue-4     192µs ± 0%     145µs ± 0%  -24.21%  (p=0.000 n=10+8)
DecodingLargeValue-4    19.9ms ± 0%    14.7ms ± 0%  -26.29%  (p=0.000 n=10+10)

name                  old alloc/op   new alloc/op   delta
DecodingSmallValue-4    73.9kB ± 0%    60.7kB ± 0%  -17.84%  (p=0.000 n=10+10)
DecodingLargeValue-4    7.49MB ± 0%    5.75MB ± 0%  -23.23%  (p=0.000 n=10+10)

name                  old allocs/op  new allocs/op  delta
DecodingSmallValue-4     1.57k ± 0%     1.36k ± 0%  -13.36%  (p=0.000 n=10+10)
DecodingLargeValue-4      143k ± 0%      122k ± 0%  -14.63%  (p=0.000 n=10+10)

Benchmark comparisons were done on linux_amd64 with Go 1.15.10.

Special thanks to @turbolent and @SupunS for their helpful feedback, explanations, and suggestions during our call!


For contributor use:

  • Targeted PR against feature/storage-optimizations branch
  • Linked to GitHub issue with discussion and accepted design OR link to spec that describes this work
  • Code follows the standards mentioned here
  • Updated relevant documentation
  • Re-reviewed Files changed in the GitHub PR explorer
  • Added appropriate labels

Switch from CBOR map to CBOR array to improve speed and preserve
ordering.

Remove key strings from encoding.

Remove old backwards-compatibility decoding code.

Add tests, including round-trip for decoding old format and encoding new
format.

Closes onflow#743
Member

@SupunS SupunS left a comment

Nice! 👏

@@ -755,13 +757,13 @@ func (e *Encoder) prepareDictionaryValue(
         if err != nil {
             return nil, err
         }
-        entries[key] = prepared
+        entries = append(entries, prepared)
Member

@SupunS SupunS Apr 1, 2021

Assigning the value directly at its index might be slightly better in terms of performance (I haven't tested it though, just a hunch).

Suggested change
-entries = append(entries, prepared)
+entries[index] = prepared
+index++

Member Author

VERY nice catch! Your hunch is correct! 👍

Member Author

@fxamacker fxamacker Apr 1, 2021

In isolation (small example program), this change provides a bigger improvement.

For encoding, the new approach required more code changes. The encoding speed improvement is around 0.5-0.7% vs. append for large data. Thanks again for the feedback, I'll incorporate this change!

New benchmark comparisons won't show this 0.5-0.7% speedup (compared to yesterday's benchmark) because compiling after adding error handling took away this gain.
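
The two variants discussed in this thread can be compared side by side with a small standalone sketch. The function names are hypothetical and the input is a plain `[]int` rather than prepared dictionary entries. Note that the index-based variant needs the slice allocated at full length rather than with length 0 and a capacity, which may be part of why it required more code changes.

```go
package main

import "fmt"

// fillWithAppend starts from length 0 with pre-allocated capacity,
// so append never reallocates but still updates the slice header each call.
func fillWithAppend(values []int) []interface{} {
	entries := make([]interface{}, 0, len(values))
	for _, v := range values {
		entries = append(entries, v)
	}
	return entries
}

// fillWithIndex allocates the full length up front and assigns by index.
func fillWithIndex(values []int) []interface{} {
	entries := make([]interface{}, len(values))
	index := 0
	for _, v := range values {
		entries[index] = v
		index++
	}
	return entries
}

func main() {
	values := []int{10, 20, 30}
	fmt.Println(fillWithAppend(values))
	fmt.Println(fillWithIndex(values))
}
```

Both functions return the same contents; any speed difference is small and, as noted above, should be measured in the real encoder rather than assumed from an isolated program.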

@@ -683,7 +684,8 @@ func (e *Encoder) prepareDictionaryValue(
         return nil, err
     }

-    entries := make(map[string]interface{}, v.Entries.Len())
+    // Use CBOR array for entry value to preserve ordering and improve speed.
+    entries := make([]interface{}, 0, v.Entries.Len())
Member

Could we use cborArray here and in other places?

Suggested change
-entries := make([]interface{}, 0, v.Entries.Len())
+entries := make(cborArray, 0, v.Entries.Len())

Member Author

Yes! Nice catch! For this PR, I should do this in other places if it's in functions for dictionaries. For non-dictionaries like composite types, I'd prefer to do it in the PR for composite types but I'm OK with either approach.

Member

sure, that sounds good!

@fxamacker
Member Author

fxamacker commented Apr 1, 2021

I found some code that required more error handling (it needed to check the array length). I fixed it and will be more careful before opening my next PR. I had gotten too comfortable relying on coverage and fuzzing tools to alert me, which is feasible for serialization libraries but probably not for interpreters.

@fxamacker fxamacker marked this pull request as draft April 1, 2021 15:06
Add error handling to check decoded dictionary array length.

Incorporate feedback to replace append with direct assignment
for an additional speed gain of 0.5-0.7% for encoding large data.
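
A length check of the kind this commit describes might look roughly like the sketch below. It is an assumption-laden illustration, not the code from this PR: `checkDictionaryContent`, the field positions, and the two-field layout (keys array plus entries array) are all hypothetical.

```go
package main

import "fmt"

// Hypothetical positions of the two fields inside the decoded dictionary array.
const (
	fieldKeys    = 0
	fieldEntries = 1
	fieldCount   = 2
)

// checkDictionaryContent validates the shape of decoded content before any
// element is indexed, so malformed input yields an error instead of a panic.
func checkDictionaryContent(content interface{}) (keys, entries []interface{}, err error) {
	fields, ok := content.([]interface{})
	if !ok {
		return nil, nil, fmt.Errorf("invalid dictionary encoding: expected array, got %T", content)
	}
	if len(fields) != fieldCount {
		return nil, nil, fmt.Errorf("invalid dictionary encoding: expected %d fields, got %d", fieldCount, len(fields))
	}
	keys, ok = fields[fieldKeys].([]interface{})
	if !ok {
		return nil, nil, fmt.Errorf("invalid dictionary keys encoding: %T", fields[fieldKeys])
	}
	entries, ok = fields[fieldEntries].([]interface{})
	if !ok {
		return nil, nil, fmt.Errorf("invalid dictionary entries encoding: %T", fields[fieldEntries])
	}
	// With positional entries, a count mismatch cannot be resolved by key lookup.
	if len(keys) != len(entries) {
		return nil, nil, fmt.Errorf("invalid dictionary encoding: %d keys but %d entries", len(keys), len(entries))
	}
	return keys, entries, nil
}

func main() {
	good := []interface{}{[]interface{}{"a", "b"}, []interface{}{1, 2}}
	_, _, err := checkDictionaryContent(good)
	fmt.Println("well-formed:", err)

	truncated := []interface{}{[]interface{}{"a", "b"}}
	_, _, err = checkDictionaryContent(truncated)
	fmt.Println("truncated:", err)
}
```

With a map-based layout a missing entry surfaces as a failed lookup; with an array-based layout the decoder indexes by position, so the lengths have to be checked explicitly.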
@fxamacker
Member Author

New benchmarks after incorporating feedback and adding error-handling are similar to the original benchmark comparisons posted with the PR.

Encoding vs master:

name                  old time/op    new time/op    delta
EncodingSmallValue-4     199µs ± 0%     137µs ± 0%  -31.26%  (p=0.000 n=10+9)
EncodingLargeValue-4    19.4ms ± 0%    13.1ms ± 1%  -32.25%  (p=0.000 n=10+10)

name                  old alloc/op   new alloc/op   delta
EncodingSmallValue-4    57.6kB ± 0%    36.2kB ± 0%  -37.12%  (p=0.000 n=10+10)
EncodingLargeValue-4    4.67MB ± 0%    3.55MB ± 0%  -23.93%  (p=0.000 n=10+10)

name                  old allocs/op  new allocs/op  delta
EncodingSmallValue-4     1.10k ± 0%     0.80k ± 0%  -27.46%  (p=0.000 n=10+10)
EncodingLargeValue-4     92.1k ± 0%     70.8k ± 0%  -23.12%  (p=0.000 n=10+10)

Decoding vs master:

name                  old time/op    new time/op    delta
DecodingSmallValue-4     192µs ± 0%     145µs ± 0%  -24.32%  (p=0.000 n=9+10)
DecodingLargeValue-4    19.9ms ± 1%    14.7ms ± 0%  -26.26%  (p=0.000 n=10+10)

name                  old alloc/op   new alloc/op   delta
DecodingSmallValue-4    73.9kB ± 0%    60.7kB ± 0%  -17.84%  (p=0.000 n=10+10)
DecodingLargeValue-4    7.49MB ± 0%    5.75MB ± 0%  -23.23%  (p=0.000 n=10+9)

name                  old allocs/op  new allocs/op  delta
DecodingSmallValue-4     1.57k ± 0%     1.36k ± 0%  -13.36%  (p=0.000 n=10+10)
DecodingLargeValue-4      143k ± 0%      122k ± 0%  -14.62%  (p=0.000 n=10+10)

Benchmark comparisons were done on linux_amd64 with Go 1.15.10.

@fxamacker fxamacker marked this pull request as ready for review April 1, 2021 17:01
Member

@turbolent turbolent left a comment

This looks great, nice work @fxamacker! 👏

No worries about opening earlier, feel free to keep doing that, the review and adding missing things is what the PRs are here for 👍

@@ -31,6 +31,8 @@ import (
     "github.com/onflow/cadence/runtime/common"
 )

+type cborArray = []interface{}
Member

👍

         }
     }

     return cbor.Tag{
         Number: cborTagDictionaryValue,
-        Content: cborMap{
+        Content: cborArray{
Member

Hah, nice that the slice initialization can stay the same, I didn't know this was possible 👍
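
One plausible reading of this comment is that Go slice literals accept constant integer indices as keys, so a literal written as `key: value` pairs compiles for a slice as well as for a map. The sketch below illustrates that, along with the fact that `cborArray` is a type alias and therefore identical to `[]interface{}`. The `fieldKeys` / `fieldEntries` constants and the `describe` helper are hypothetical names, not identifiers from the PR.

```go
package main

import "fmt"

// cborArray is an alias (note the "="), not a new named type.
type cborArray = []interface{}

// Hypothetical field positions used as keys in the literal below.
const (
	fieldKeys    = 0
	fieldEntries = 1
)

// describe reports whether a value's dynamic type is []interface{}.
func describe(v interface{}) string {
	switch v.(type) {
	case []interface{}:
		return "[]interface{}"
	default:
		return "other"
	}
}

// buildContent uses a keyed slice literal: each constant is an element index.
func buildContent() cborArray {
	return cborArray{
		fieldKeys:    cborArray{"a", "b"},
		fieldEntries: cborArray{1, 2},
	}
}

func main() {
	content := buildContent()

	// No conversion is needed because the alias and []interface{} are the same type.
	var plain []interface{} = content

	fmt.Println(describe(content), len(plain))
}
```

Because the alias introduces no new type, existing type switches and assertions on `[]interface{}` keep working unchanged.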

Comment on lines +78 to +80
+    if version <= 3 {
+        decoded, err = DecodeValueV3(encoded, &testOwner, nil, version, nil)
+    } else {
Member

Great idea to still support the old one 👍
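
The version dispatch in the quoted test lines can be sketched in isolation as follows. The decoders here are hypothetical stand-ins that only tag their input; they do not reflect the real signatures of `DecodeValueV3` or its successor.

```go
package main

import "fmt"

// decodeV3 is a hypothetical stand-in for the decoder of the old format.
func decodeV3(encoded []byte) (string, error) {
	return "v3:" + string(encoded), nil
}

// decodeCurrent is a hypothetical stand-in for the decoder of the new format.
func decodeCurrent(encoded []byte) (string, error) {
	return "current:" + string(encoded), nil
}

// decode picks a decoder from the stored format version, mirroring the
// "version <= 3" check in the test code under review.
func decode(encoded []byte, version uint16) (string, error) {
	if version <= 3 {
		return decodeV3(encoded)
	}
	return decodeCurrent(encoded)
}

func main() {
	for _, version := range []uint16{3, 4} {
		decoded, err := decode([]byte("data"), version)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(version, decoded)
	}
}
```

Keeping the old decoder reachable is what makes a round-trip test possible: data written in the old format is decoded with the old decoder and re-encoded in the new format.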
