Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve pickling/unpickling performance #59

Merged
merged 3 commits into from Apr 18, 2019
Merged

Conversation

stephentoub
Copy link
Contributor

Thanks for the helpful library!

I found while using it that it was generating a bunch of intermediate objects that could be avoided, and that these were contributing non-trivially to execution time. This PR addresses many of them while also avoiding touching changing the public surface area, as I wasn't sure if it was malleable or not. There's more that can be done in this regard, but for now this addresses a lot of the low-hanging fruit that's there.

(I also disabled one test that was failing prior to any of my changes.)

Example:

using System.IO;
using System.Linq;
using Razorvine.Pickle;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

[MemoryDiagnoser]
public class Test
{
    public static void Main() => BenchmarkRunner.Run<Test>();

    private readonly MemoryStream _stream = new MemoryStream();

    private object[][] _objectArrays = Enumerable.Range(0, 1000).Select(_ => new object[] { 1.0, 2.0 }).ToArray();
    private double[][] _doubleArrays = Enumerable.Range(0, 1000).Select(_ => new double[] { 1.0, 2.0 }).ToArray();
    private string[][] _stringArrays = Enumerable.Range(0, 1000).Select(_ => new string[] { "hello", "world" }).ToArray();

    [Benchmark] public void ObjectArrays() => PickleUnpickle(_objectArrays);
    [Benchmark] public void DoubleArrays() => PickleUnpickle(_doubleArrays);
    [Benchmark] public void StringArrays() => PickleUnpickle(_stringArrays);

    private void PickleUnpickle(object obj)
    {
        _stream.SetLength(0);
        new Pickler(useMemo: false).dump(obj, _stream);
        _stream.Position = 0;
        new Unpickler().load(_stream);
    }
}

Before:

|       Method |       Mean |     Error |   StdDev |    Gen 0 |   Gen 1 | Gen 2 |  Allocated |
|------------- |-----------:|----------:|---------:|---------:|--------:|------:|-----------:|
| ObjectArrays |   444.1 us |  9.490 us | 21.81 us |  78.6133 | 11.7188 |     - |  321.79 KB |
| DoubleArrays | 1,250.1 us | 26.046 us | 32.94 us | 312.5000 |       - |     - | 1282.73 KB |
| StringArrays |   633.2 us | 11.740 us | 12.06 us | 124.0234 |  1.9531 |     - |  509.29 KB |

After:

|       Method |     Mean |    Error |   StdDev |   Gen 0 |  Gen 1 | Gen 2 | Allocated |
|------------- |---------:|---------:|---------:|--------:|-------:|------:|----------:|
| ObjectArrays | 234.0 us | 2.081 us | 1.946 us | 26.8555 | 4.1504 |     - | 110.31 KB |
| DoubleArrays | 660.7 us | 5.529 us | 5.172 us | 89.8438 | 0.9766 |     - |  368.4 KB |
| StringArrays | 415.5 us | 3.102 us | 2.901 us | 61.0352 | 0.9766 |     - | 250.94 KB |

cc: @imback82, @eerhardt, @rapoth

I found while using the library that it was generating a bunch of intermediate objects that could be avoided, and that these were contributing non-trivially to execution time.  I addressed many of them while also avoiding touching the public surface area, as I wasn't sure if it was maleable or not.
@eerhardt
Copy link

@stephentoub - Do you think it would be valuable to add your benchmark test to the repo?

@stephentoub
Copy link
Contributor Author

Do you think it would be valuable to add your benchmark test to the repo?

That would be up to @irmen. There's currently no perf tests project, and the only perf test anywhere for the C# functionality is minimal and disabled (

[Fact(Skip = "performancetest")]
public void TestUnpicklingPerformance()
). Adding perf tests is only valuable if someone will actually be using them to either monitor for regressions or drive further improvements.

Copy link

@eerhardt eerhardt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great. Thanks for this improvement!

Copy link
Owner

@irmen irmen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for this PR! Really appreciate the contribution! I have added a few questions in this review though

@irmen
Copy link
Owner

irmen commented Apr 16, 2019

@stephentoub may I ask in general, what you are using Pyrolite for?

@stephentoub
Copy link
Contributor Author

may I ask in general, what you are using Pyrolite for?

Yup. Been looking at using it as part of https://issues.apache.org/jira/browse/SPARK-27006.

@stephentoub
Copy link
Contributor Author

@irmen, please let me know if there's anything else you'd like me to address. I'm happy to make whatever changes would help to get this in and into an updated NuGet.

@irmen irmen merged commit fbf9360 into irmen:master Apr 18, 2019
@stephentoub
Copy link
Contributor Author

Thanks, @irmen.

@stephentoub stephentoub deleted the pickleperf branch April 18, 2019 22:08
@irmen
Copy link
Owner

irmen commented Apr 19, 2019

just published the new version to nuget.org as well.

@stephentoub
Copy link
Contributor Author

Awesome. Thanks!

@irmen
Copy link
Owner

irmen commented Apr 19, 2019

thanks for the improvements and interest in pyrolite, really appreciated

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants