
Deserialization gets slow after deserializing large collection #541

Open
andreas-eriksson opened this issue Aug 19, 2019 · 4 comments

@andreas-eriksson

I need to deserialize a large collection and then a lot of smaller objects. However, performance degrades considerably when deserializing the smaller objects if I have deserialized the large object first. Is there a cache or something that needs to be cleared between deserializations?

I made a small sample program that reproduces the problem. Deserializing a single node 1000 times takes 0 milliseconds if I don't first deserialize the large collection, and 1200 milliseconds if I do.

using ProtoBuf;
using System;
using System.Collections.Generic;
using System.IO;

namespace ProtobufPerf
{
    internal class Program
    {
        private static void Main(string[] args)
        {
            byte[] data;

            using (var ms = new MemoryStream())
            {
                var node = new Node { Id = 1, Name = "Test" };
                Serializer.Serialize(ms, node);
                data = ms.ToArray();
            }

            // Deserializing the large collection first is what makes the loop below slow;
            // comment this call out and the loop finishes in ~0 ms.
            GenerateData();

            var stopwatch = System.Diagnostics.Stopwatch.StartNew();

            for (int i = 0; i < 1000; i++)
            {
                using (var ms = new MemoryStream(data))
                {
                    var node = Serializer.Deserialize<Node>(ms);
                }
            }

            Console.WriteLine(stopwatch.ElapsedMilliseconds);

            Console.Read();
        }

        // Serializes a 500,000-node collection, then deserializes it once.
        private static void GenerateData()
        {
            byte[] data;
            using (var ms = new MemoryStream())
            {
                var b = new NodeCollection();
                for (int i = 0; i < 500000; i++)
                {
                    var item = new Node
                    {
                        Id = i,
                        Name = i.ToString(),
                    };

                    b.Nodes.Add(item);
                }

                Serializer.Serialize(ms, b);
                data = ms.ToArray();
            }

            using (var ms = new MemoryStream(data))
            {
                var nodes = Serializer.Deserialize<NodeCollection>(ms);
            }
        }

        [ProtoContract]
        public class NodeCollection
        {
            [ProtoMember(1, AsReference = true)]
            public HashSet<Node> Nodes { get; set; }

            public NodeCollection()
            {
                Nodes = new HashSet<Node>();
            }
        }

        [ProtoContract]
        public class Node
        {
            [ProtoMember(1)]
            public int Id { get; set; }

            [ProtoMember(2)]
            public string Name { get; set; }

            [ProtoMember(3, AsReference = true)]
            public Node Child { get; set; }
        }
    }
}

It seems that removing the AsReference = true from the NodeCollection.Nodes property speeds things up considerably, but don't I need that to be able to track all nodes and their children?
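For reference, the fast variant simply drops that option from the collection property (Node.Child keeps its AsReference):

[ProtoContract]
public class NodeCollection
{
    // Nodes is no longer reference-tracked; Node.Child still is
    [ProtoMember(1)]
    public HashSet<Node> Nodes { get; set; }

    public NodeCollection()
    {
        Nodes = new HashSet<Node>();
    }
}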

@mgravell (Member)

Well, I can confirm you aren't imagining it. Presumably something in the NetObjectCache is playing up, but I haven't looked at what yet; you could try the following, but I haven't tested whether it is reliable in your scenario:

// RuntimeTypeModel requires: using ProtoBuf.Meta;
using (var reader = ProtoReader.Create(ms, RuntimeTypeModel.Default, null))
{
    var node = (Node)reader.Model.Deserialize(reader, null, typeof(Node));
}

but yeah, something is definitely broken in there

@andreas-eriksson (Author)

Thanks Marc, it seems to work okay.

Would you recommend using the same pattern for all deserialization calls or only when deserializing large objects?

I also tested it on the large collection, and that seemed to work as well.

using (var reader = ProtoReader.Create(ms, RuntimeTypeModel.Default))
{
    var nodes = (NodeCollection)reader.Model.Deserialize(reader, null, typeof(NodeCollection));
}
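
If it is safe to use everywhere, I could wrap the pattern in a small helper, something like this (DeserializeFresh is just my own name for it, not part of the library):

using System.IO;
using ProtoBuf;
using ProtoBuf.Meta;

internal static class ProtoHelper
{
    // Creates a fresh reader per call instead of going through
    // Serializer.Deserialize<T>, which re-uses pooled readers.
    public static T DeserializeFresh<T>(Stream source)
    {
        using (var reader = ProtoReader.Create(source, RuntimeTypeModel.Default, null))
        {
            return (T)reader.Model.Deserialize(reader, null, typeof(T));
        }
    }
}

The loop would then become var node = ProtoHelper.DeserializeFresh<Node>(ms);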

@mgravell (Member)

This is more of a workaround than a recommendation - it abuses some knowledge of the internal pooling model: in the current code, the snippet above creates a fresh reader each time rather than continually re-using the same pooled reader (and whatever state that reader has accumulated).

The real recommendation is that I find and fix the actual problem, especially since the above workaround won't work with the 3.* changes in my pending branches. So... yeah, I need to find out what is actually wrong here!

@andreas-eriksson (Author)

Ok, let me know if you need any help.
