Add Compression.LZ4ContiguousBlock and implementation #681

Merged
merged 13 commits into from
Nov 30, 2019

neuecc
Member

@neuecc neuecc commented Nov 27, 2019

Related to #675, #680

  • Add Compression.LZ4ContiguousBlock
  • Implement LZ4ContiguousBlock serialization in MessagePackSerializer
  • Use ReusableSequenceWithMinSize.Rent() instead of new Sequence in LZ4 compression (new Sequence still exists in Deserialize(Stream))
  • Cache the LZ4Transform delegate

The unit test dumps the serialized binary size.
With MinimumSpanLength = 4096, it shows this result:

Len:1 NoneSize:23
Len:1 Lz4BlockSize:23
Len:1 Lz4ContiguousBlockSize:23
Len:10 NoneSize:221
Len:10 Lz4BlockSize:107
Len:10 Lz4ContiguousBlockSize:110
Len:100 NoneSize:2377
Len:100 Lz4BlockSize:816
Len:100 Lz4ContiguousBlockSize:819
Len:1000 NoneSize:27085
Len:1000 Lz4BlockSize:8690
Len:1000 Lz4ContiguousBlockSize:8737
Len:10000 NoneSize:279085
Len:10000 Lz4BlockSize:86004
Len:10000 Lz4ContiguousBlockSize:88304

…nto v2.0

# Conflicts:
#	src/MessagePack.UnityClient/Assets/Scripts/MessagePack/MessagePackSerializer.cs
@neuecc neuecc requested a review from AArnott November 27, 2019 08:05
@neuecc
Member Author

neuecc commented Nov 28, 2019

I've added a benchmark; here are the results.

Simple object with 9 properties, array of 1000

| Method | Mean | Error | Gen 0/1k Op | Gen 1/1k Op | Gen 2/1k Op | Allocated Memory/Op |
|---|---|---|---|---|---|---|
| SerializeNone | 463.4 us | NA | 49.3164 | 49.3164 | 49.3164 | 168 KB |
| SerializeLz4Block | 686.5 us | NA | 9.7656 | 0.9766 | - | 67.46 KB |
| SerializeLz4ContiguousBlock | 716.0 us | NA | 9.7656 | 0.9766 | - | 70.66 KB |
| DeserializeNone | 1,024.6 us | NA | 7.8125 | 1.9531 | - | 62.71 KB |
| DeserializeLz4Block | 1,110.6 us | NA | 7.8125 | 1.9531 | - | 62.71 KB |
| DeserializeLz4ContiguousBlock | 1,241.5 us | NA | 7.8125 | 1.9531 | - | 62.71 KB |

Simple object with 9 properties, array of 10000

| Method | Mean | Error | Gen 0/1k Op | Gen 1/1k Op | Gen 2/1k Op | Allocated Memory/Op |
|---|---|---|---|---|---|---|
| SerializeNone | 4.805 ms | NA | 343.7500 | 273.4375 | 187.5000 | 2710.08 KB |
| SerializeLz4Block | 8.161 ms | NA | 531.2500 | 437.5000 | 375.0000 | 5136.12 KB |
| SerializeLz4ContiguousBlock | 7.120 ms | NA | 320.3125 | 218.7500 | 125.0000 | 1968.8 KB |
| DeserializeNone | 9.685 ms | NA | 78.1250 | 31.2500 | - | 626.67 KB |
| DeserializeLz4Block | 11.415 ms | NA | 312.5000 | 281.2500 | 234.3750 | 2307.55 KB |
| DeserializeLz4ContiguousBlock | 10.349 ms | NA | 250.0000 | 125.0000 | - | 1720.38 KB |

The numbers seem to change drastically once the pool is used up.
The allocation for SerializeNone with the 1000-element array looks strange.
I'll investigate it.

Edit:
The allocation for SerializeNone with the 1000-element array is just the final byte[].
No problem.

@neuecc
Member Author

neuecc commented Nov 28, 2019

SequencePool.MinimumSpanLength = 4098

| Method | Length | Mean | Error | Gen 0/1k Op | Gen 1/1k Op | Gen 2/1k Op | Allocated Memory/Op |
|---|---|---|---|---|---|---|---|
| SerializeNone | 1 | 736.1 ns | NA | 0.0477 | - | - | 201 B |
| SerializeLz4Block | 1 | 2,381.1 ns | NA | 0.0458 | - | - | 193 B |
| SerializeLz4ContiguousBlock | 1 | 2,340.3 ns | NA | 0.0458 | - | - | 193 B |
| DeserializeNone | 1 | 1,323.4 ns | NA | 0.0210 | - | - | 88 B |
| DeserializeLz4Block | 1 | 2,448.9 ns | NA | 0.0191 | - | - | 88 B |
| DeserializeLz4ContiguousBlock | 1 | 2,276.5 ns | NA | 0.0191 | - | - | 88 B |
| SerializeNone | 10 | 5,076.8 ns | NA | 0.4120 | - | - | 1758 B |
| SerializeLz4Block | 10 | 7,989.1 ns | NA | 0.1831 | - | - | 819 B |
| SerializeLz4ContiguousBlock | 10 | 8,216.6 ns | NA | 0.1831 | - | - | 819 B |
| DeserializeNone | 10 | 11,499.4 ns | NA | 0.1526 | - | - | 666 B |
| DeserializeLz4Block | 10 | 12,497.6 ns | NA | 0.1526 | - | - | 666 B |
| DeserializeLz4ContiguousBlock | 10 | 12,778.0 ns | NA | 0.1526 | - | - | 666 B |
| SerializeNone | 100 | 49,682.6 ns | NA | 4.0894 | - | - | 17256 B |
| SerializeLz4Block | 100 | 72,355.9 ns | NA | 1.5869 | - | - | 6933 B |
| SerializeLz4ContiguousBlock | 100 | 78,011.5 ns | NA | 1.7090 | - | - | 7346 B |
| DeserializeNone | 100 | 109,853.4 ns | NA | 1.4648 | - | - | 6443 B |
| DeserializeLz4Block | 100 | 116,401.1 ns | NA | 1.4648 | - | - | 6443 B |
| DeserializeLz4ContiguousBlock | 100 | 124,026.5 ns | NA | 1.4648 | - | - | 6444 B |
| SerializeNone | 1000 | 562,298.8 ns | NA | 48.8281 | 48.8281 | 48.8281 | 172032 B |
| SerializeLz4Block | 1000 | 820,224.5 ns | NA | 15.6250 | - | - | 69264 B |
| SerializeLz4ContiguousBlock | 1000 | 904,753.9 ns | NA | 16.6016 | - | - | 72368 B |
| DeserializeNone | 1000 | 1,231,013.6 ns | NA | 13.6719 | 1.9531 | - | 64225 B |
| DeserializeLz4Block | 1000 | 1,243,090.8 ns | NA | 13.6719 | 1.9531 | - | 64225 B |
| DeserializeLz4ContiguousBlock | 1000 | 1,068,335.9 ns | NA | 13.6719 | 1.9531 | - | 64225 B |
| SerializeNone | 10000 | 4,749,817.9 ns | NA | 367.1875 | 359.3750 | 203.1250 | 2774943 B |
| SerializeLz4Block | 10000 | 9,319,995.2 ns | NA | 562.5000 | 500.0000 | 375.0000 | 5259804 B |
| SerializeLz4ContiguousBlock | 10000 | 7,070,504.0 ns | NA | 335.9375 | 242.1875 | 117.1875 | 2015710 B |
| DeserializeNone | 10000 | 13,153,344.1 ns | NA | 93.7500 | 46.8750 | - | 641822 B |
| DeserializeLz4Block | 10000 | 11,653,735.9 ns | NA | 328.1250 | 296.8750 | 234.3750 | 2362782 B |
| DeserializeLz4ContiguousBlock | 10000 | 10,544,559.4 ns | NA | 281.2500 | 140.6250 | - | 1761820 B |

SequencePool.MinimumSpanLength = 32768

| Method | Length | Mean | Error | Gen 0/1k Op | Gen 1/1k Op | Gen 2/1k Op | Allocated Memory/Op |
|---|---|---|---|---|---|---|---|
| SerializeNone | 1 | 734.9 ns | NA | 0.0477 | - | - | 201 B |
| SerializeLz4Block | 1 | 2,079.9 ns | NA | 0.0420 | - | - | 185 B |
| SerializeLz4ContiguousBlock | 1 | 2,216.0 ns | NA | 0.0458 | - | - | 193 B |
| DeserializeNone | 1 | 1,148.5 ns | NA | 0.0210 | - | - | 88 B |
| DeserializeLz4Block | 1 | 2,692.1 ns | NA | 0.0191 | - | - | 88 B |
| DeserializeLz4ContiguousBlock | 1 | 2,209.4 ns | NA | 0.0191 | - | - | 88 B |
| SerializeNone | 10 | 4,421.4 ns | NA | 0.4120 | - | - | 1758 B |
| SerializeLz4Block | 10 | 7,072.2 ns | NA | 0.1907 | - | - | 811 B |
| SerializeLz4ContiguousBlock | 10 | 10,115.5 ns | NA | 0.1907 | - | - | 811 B |
| DeserializeNone | 10 | 10,337.2 ns | NA | 0.1526 | - | - | 666 B |
| DeserializeLz4Block | 10 | 10,557.7 ns | NA | 0.1526 | - | - | 666 B |
| DeserializeLz4ContiguousBlock | 10 | 11,294.3 ns | NA | 0.1526 | - | - | 666 B |
| SerializeNone | 100 | 41,411.4 ns | NA | 4.0894 | - | - | 17256 B |
| SerializeLz4Block | 100 | 60,677.4 ns | NA | 1.5869 | - | - | 6956 B |
| SerializeLz4ContiguousBlock | 100 | 62,890.5 ns | NA | 1.5869 | - | - | 7037 B |
| DeserializeNone | 100 | 93,014.4 ns | NA | 1.4648 | - | - | 6443 B |
| DeserializeLz4Block | 100 | 99,454.8 ns | NA | 1.4648 | - | - | 6443 B |
| DeserializeLz4ContiguousBlock | 100 | 106,544.8 ns | NA | 1.4648 | - | - | 6443 B |
| SerializeNone | 1000 | 430,591.0 ns | NA | 49.8047 | 49.8047 | 49.8047 | 172032 B |
| SerializeLz4Block | 1000 | 639,525.4 ns | NA | 15.6250 | - | - | 69312 B |
| SerializeLz4ContiguousBlock | 1000 | 616,490.2 ns | NA | 15.6250 | - | - | 68688 B |
| DeserializeNone | 1000 | 928,720.0 ns | NA | 14.6484 | 2.9297 | - | 64217 B |
| DeserializeLz4Block | 1000 | 1,065,028.8 ns | NA | 13.6719 | 1.9531 | - | 64225 B |
| DeserializeLz4ContiguousBlock | 1000 | 1,076,702.1 ns | NA | 13.6719 | 1.9531 | - | 64225 B |
| SerializeNone | 10000 | 4,418,331.1 ns | NA | 203.1250 | 203.1250 | 203.1250 | 1720024 B |
| SerializeLz4Block | 10000 | 7,342,280.9 ns | NA | 273.4375 | 273.4375 | 273.4375 | 4137640 B |
| SerializeLz4ContiguousBlock | 10000 | 6,219,741.9 ns | NA | 109.3750 | 109.3750 | 109.3750 | 683592 B |
| DeserializeNone | 10000 | 9,281,947.1 ns | NA | 93.7500 | 46.8750 | - | 641822 B |
| DeserializeLz4Block | 10000 | 10,792,139.1 ns | NA | 328.1250 | 296.8750 | 234.3750 | 2362625 B |
| DeserializeLz4ContiguousBlock | 10000 | 9,669,103.1 ns | NA | 93.7500 | 46.8750 | - | 641822 B |

SequencePool.MinimumSpanLength = 65536

| Method | Length | Mean | Error | Gen 0/1k Op | Gen 1/1k Op | Gen 2/1k Op | Allocated Memory/Op |
|---|---|---|---|---|---|---|---|
| SerializeNone | 1 | 613.9 ns | NA | 0.0477 | - | - | 201 B |
| SerializeLz4Block | 1 | 2,208.6 ns | NA | 0.0420 | - | - | 185 B |
| SerializeLz4ContiguousBlock | 1 | 2,269.3 ns | NA | 0.0458 | - | - | 193 B |
| DeserializeNone | 1 | 1,164.1 ns | NA | 0.0210 | - | - | 88 B |
| DeserializeLz4Block | 1 | 2,137.0 ns | NA | 0.0191 | - | - | 88 B |
| DeserializeLz4ContiguousBlock | 1 | 2,104.0 ns | NA | 0.0191 | - | - | 88 B |
| SerializeNone | 10 | 5,092.2 ns | NA | 0.4120 | - | - | 1758 B |
| SerializeLz4Block | 10 | 7,831.7 ns | NA | 0.1831 | - | - | 803 B |
| SerializeLz4ContiguousBlock | 10 | 7,844.0 ns | NA | 0.1831 | - | - | 803 B |
| DeserializeNone | 10 | 11,712.6 ns | NA | 0.1526 | - | - | 666 B |
| DeserializeLz4Block | 10 | 11,198.9 ns | NA | 0.1526 | - | - | 666 B |
| DeserializeLz4ContiguousBlock | 10 | 12,735.4 ns | NA | 0.1526 | - | - | 666 B |
| SerializeNone | 100 | 47,462.6 ns | NA | 4.0894 | - | - | 17256 B |
| SerializeLz4Block | 100 | 69,421.6 ns | NA | 1.5869 | - | - | 6945 B |
| SerializeLz4ContiguousBlock | 100 | 68,989.6 ns | NA | 1.5869 | - | - | 7038 B |
| DeserializeNone | 100 | 104,542.6 ns | NA | 1.4648 | - | - | 6443 B |
| DeserializeLz4Block | 100 | 111,816.9 ns | NA | 1.4648 | - | - | 6443 B |
| DeserializeLz4ContiguousBlock | 100 | 118,971.0 ns | NA | 1.4648 | - | - | 6444 B |
| SerializeNone | 1000 | 494,267.8 ns | NA | 48.8281 | 48.8281 | 48.8281 | 172032 B |
| SerializeLz4Block | 1000 | 680,661.0 ns | NA | 15.6250 | - | - | 69232 B |
| SerializeLz4ContiguousBlock | 1000 | 683,819.1 ns | NA | 15.6250 | - | - | 68168 B |
| DeserializeNone | 1000 | 1,039,536.1 ns | NA | 14.6484 | 2.9297 | - | 64217 B |
| DeserializeLz4Block | 1000 | 1,070,342.4 ns | NA | 13.6719 | 1.9531 | - | 64225 B |
| DeserializeLz4ContiguousBlock | 1000 | 1,402,801.4 ns | NA | 13.6719 | 1.9531 | - | 64225 B |
| SerializeNone | 10000 | 4,621,291.7 ns | NA | 203.1250 | 203.1250 | 203.1250 | 1720032 B |
| SerializeLz4Block | 10000 | 8,946,428.0 ns | NA | 265.6250 | 265.6250 | 265.6250 | 4137368 B |
| SerializeLz4ContiguousBlock | 10000 | 7,375,459.2 ns | NA | 109.3750 | 109.3750 | 109.3750 | 680560 B |
| DeserializeNone | 10000 | 11,641,417.8 ns | NA | 93.7500 | 46.8750 | - | 641822 B |
| DeserializeLz4Block | 10000 | 14,458,389.1 ns | NA | 328.1250 | 296.8750 | 234.3750 | 2362965 B |
| DeserializeLz4ContiguousBlock | 10000 | 12,878,323.4 ns | NA | 93.7500 | 46.8750 | - | 641822 B |

(The property values are random, so the results vary between runs.)


Looking at all the code paths, I think MinimumSpanLength should be 65536.
The only case that needs more than 65536 bytes is LZ4 compression, which sizes its output buffer with LZ4Codec.MaximumOutputLength.
For Lz4Block, MinimumSpanLength is irrelevant because arbitrarily large sizes can be requested.
For Lz4ContiguousBlock, we can optimize and control the memory used.
If SequencePool's ArrayPool is exposed, only one array is needed per thread:

// Find the largest segment so that a single rented buffer fits every block.
var maxLength = 0;
foreach (var item in msgpackUncompressedData)
{
    sequenceCount++;
    maxLength = Math.Max(maxLength, item.Length);
}

// rent/return only once.
var maxCompressedLength = LZ4Codec.MaximumOutputLength(maxLength);
var lz4Buffer = msgpackUncompressedData.ArrayPool.Rent(maxCompressedLength);
try
{
    foreach (var item in msgpackUncompressedData)
    {
        int lz4Length = LZ4Codec.Encode(item.Span, lz4Buffer);
        writer.Write(lz4Buffer.AsSpan(0, lz4Length));
    }
}
finally
{
    msgpackUncompressedData.ArrayPool.Return(lz4Buffer);
}

Because SequencePool rentals can be nested (two in use at once), you may need to allow for Environment.ProcessorCount * 2 in the worst case.
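
To make the 65536 argument concrete, here is a minimal sketch of the arithmetic. It assumes the usual LZ4 worst-case bound (input + input/255 + 16); the class and method names are hypothetical, and the library's own LZ4Codec.MaximumOutputLength may compute the bound differently.

```csharp
using System;

public static class Lz4BoundSketch
{
    // Assumed worst-case bound: input + input/255 + 16 (the usual LZ4 formula).
    public static int MaximumOutputLength(int inputLength) =>
        inputLength + (inputLength / 255) + 16;

    // True when the worst-case compressed size no longer fits in one span.
    public static bool ExceedsSpan(int blockLength, int minimumSpanLength) =>
        MaximumOutputLength(blockLength) > minimumSpanLength;
}
```

Under that assumption a full 65536-byte block has a bound of 65809 bytes, so the compression buffer is the one request that outgrows a 65536-byte span, while a 32768-byte block (bound 32912) still fits.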


Int32 (small data) LZ4 serialization benchmark results:

| Method | Mean | Error | Gen 0/1k Op | Gen 1/1k Op | Gen 2/1k Op | Allocated Memory/Op |
|---|---|---|---|---|---|---|
| SerializeNone | 164.9 ns | NA | 0.0076 | - | - | 32 B |
| SerializeLz4Block | 474.8 ns | NA | 0.0076 | - | - | 32 B |
| SerializeLz4ContiguousBlock | 512.9 ns | NA | 0.0076 | - | - | 32 B |
| DeserializeNone | 114.7 ns | NA | - | - | - | - |
| DeserializeLz4Block | 196.5 ns | NA | - | - | - | - |
| DeserializeLz4ContiguousBlock | 230.0 ns | NA | - | - | - | - |

For data this small, the LZ4 options should perform the same as None, but they are much slower.
This is because v1 reused the same buffer, while the LZ4 path in v2 requires a separate write buffer.
Ideally, we would skip the LZ4 code path whenever the payload is under 64 bytes, but only primitives have a size that is known before serialization.
I've added this optimization:

if (options.Compression.IsCompression() && !PrimitiveChecker<T>.IsFixedSizePrimitive)
{
}
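
A check like PrimitiveChecker<T>.IsFixedSizePrimitive could be built as a generic static class that inspects typeof(T) once and caches the answer. The sketch below is an assumption about the shape of such a helper, not the PR's code; the class name and the exact list of types are hypothetical.

```csharp
using System;

// Hypothetical stand-in for PrimitiveChecker<T>; the PR's real class may differ.
public static class PrimitiveCheckerSketch<T>
{
    // Computed once per closed generic type, so the serialize path only reads a field.
    public static readonly bool IsFixedSizePrimitive = Check(typeof(T));

    // Types whose MessagePack encoding is at most 9 bytes (format byte + 8 data bytes).
    private static bool Check(Type type) =>
        type == typeof(bool) || type == typeof(char) ||
        type == typeof(sbyte) || type == typeof(byte) ||
        type == typeof(short) || type == typeof(ushort) ||
        type == typeof(int) || type == typeof(uint) ||
        type == typeof(long) || type == typeof(ulong) ||
        type == typeof(float) || type == typeof(double);
}
```

Because the result lives in a static readonly field of a generic type, the per-call cost is a single field read, which matters when the whole serialization takes a few hundred nanoseconds.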

Deserialize also has room for optimization.
It currently always calls ReusableSequenceWithMinSize.Rent(), which is unnecessary when TryDecompress returns false.

@AArnott AArnott added this to the v2.0 milestone Nov 29, 2019
Collaborator

@AArnott AArnott left a comment

I like how this avoids copying data into a large contiguous buffer. :)

AArnott added a commit to AArnott/MessagePack-CSharp that referenced this pull request Nov 29, 2019
Also add support for changing the `MinimumSpanLength` property for rented `Sequence<T>`  on an as-needed basis. This should allow the upcoming LZ4 compression scheme added in MessagePack-CSharp#681 to crank up the value to 64KB for the benefits of larger compression blocks.

Closes MessagePack-CSharp#680
@AArnott
Collaborator

AArnott commented Nov 29, 2019

I see you're targeting v2.0 with this PR. Are you keen to get it in for 2.0 (which we're trying to stabilize) or would 2.1 (master branch) be better?

@neuecc
Member Author

neuecc commented Nov 30, 2019

I want to target 2.0 if possible.
v2 is a big change, and many people will refer to the new v2 documentation.

I think this API (Lz4BlockArray) is better than the traditional Lz4Block (it's like how StringBuilder in .NET 4 avoids the LOH; it also reduces LOH usage during deserialization).
I want to announce this API at the beginning of v2.
Delaying it to 2.1 would result in an API that most people don't know about.

But if we can't agree on the implementation and it takes too much time, we should move it to 2.1.
I'm going to write code that responds to the issues raised.

@neuecc
Member Author

neuecc commented Nov 30, 2019

I've pushed the fixed code.

In the end, I changed the serialization code to the following.

foreach (var item in msgpackUncompressedData)
{
    // Reserve room for the worst-case compressed size plus the 5-byte bin32 header.
    var maxCompressedLength = LZ4Codec.MaximumOutputLength(item.Length);
    var lz4Span = writer.GetSpan(maxCompressedLength + 5);
    // Compress directly into the writer's buffer, past the header slot.
    int lz4Length = LZ4Codec.Encode(item.Span, lz4Span.Slice(5, lz4Span.Length - 5));
    WriteBin32Header((uint)lz4Length, lz4Span);
    writer.Advance(lz4Length + 5);
}

writer.GetSpan(maxCompressedLength + 5) will return 128K, depending on the implementation of the backing IBufferWriter.
If this is not desired, we can avoid it by setting MinimumSpanLength to 32K.
Indeed, the benchmark results showed no difference between 32K and 64K for this sample data.
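
The 5-byte offset in the loop matches the MessagePack bin 32 header: one format byte (0xC6) followed by a 4-byte big-endian length. A minimal sketch of a helper with the same call shape as WriteBin32Header is shown here; the body is an assumption based on the MessagePack format, not the PR's actual implementation, and the class name is hypothetical.

```csharp
using System;
using System.Buffers.Binary;

public static class Bin32HeaderSketch
{
    public const int HeaderSize = 5;      // 1 format byte + 4 length bytes
    private const byte Bin32Code = 0xC6;  // MessagePack "bin 32" format code

    // Writes the header at the start of destination; the payload follows at offset 5.
    public static void WriteBin32Header(uint payloadLength, Span<byte> destination)
    {
        destination[0] = Bin32Code;
        BinaryPrimitives.WriteUInt32BigEndian(destination.Slice(1, 4), payloadLength);
    }
}
```

Always using the fixed-width bin 32 form, even for short payloads, is what lets the header slot be reserved before the compressed length is known.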

| Method | Length | Mean | Error | Gen 0/1k Op | Gen 1/1k Op | Gen 2/1k Op | Allocated Memory/Op |
|---|---|---|---|---|---|---|---|
| SerializeNone | 10000 | 4.521 ms | NA | 179.6875 | 179.6875 | 179.6875 | 1679.72 KB |
| SerializeLz4Block | 10000 | 8.640 ms | NA | 281.2500 | 281.2500 | 281.2500 | 4040.32 KB |
| SerializeLz4BlockArray | 10000 | 6.335 ms | NA | 78.1250 | 78.1250 | 78.1250 | 664.92 KB |
| DeserializeNone | 10000 | 10.370 ms | NA | 93.7500 | 46.8750 | - | 626.78 KB |
| DeserializeLz4Block | 10000 | 10.985 ms | NA | 312.5000 | 281.2500 | 218.7500 | 2307.35 KB |
| DeserializeLz4BlockArray | 10000 | 9.730 ms | NA | 93.7500 | 46.8750 | - | 626.78 KB |

@neuecc
Member Author

neuecc commented Nov 30, 2019

Reply to WriteInt32:

Oh, yes, I'm sure it works well,
but I feel that an empty extension header used only as a flag is slightly ugly.

However, while calculating sequenceCount, we can calculate the write size too.

var extHeaderSize = 0;
foreach (var item in msgpackUncompressedData)
{
    sequenceCount++;
    extHeaderSize += GetUInt32WriteSize((uint)item.Length);
}

That was a simple fix, and I've pushed it.
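
For reference, a size helper like GetUInt32WriteSize can follow the MessagePack integer formats directly. The sketch below assumes the writer always picks the smallest encoding for an unsigned value; the PR's helper of the same name may be written differently, and the class name is hypothetical.

```csharp
public static class UInt32SizeSketch
{
    // Bytes needed to encode value with the smallest MessagePack unsigned format.
    public static int GetUInt32WriteSize(uint value)
    {
        if (value <= 127) return 1;             // positive fixint: value in the format byte
        if (value <= byte.MaxValue) return 2;   // uint 8: format byte + 1 byte
        if (value <= ushort.MaxValue) return 3; // uint 16: format byte + 2 bytes
        return 5;                               // uint 32: format byte + 4 bytes
    }
}
```

Summing this over every segment length gives the extension body size up front, so the header can be written once without a placeholder.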

@neuecc
Member Author

neuecc commented Nov 30, 2019

Thank you for the approval.
I'll update the ReadMe and migration.md (which still mentions the obsolete WithLZ4Compression) soon and create another PR.

@neuecc neuecc merged commit ab22531 into v2.0 Nov 30, 2019
@neuecc neuecc deleted the lz4contiguous branch November 30, 2019 17:23