New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve deserialization performance by ~22%. #97
Conversation
Deserialization CPU performance was suffering because both LookupDiscriminatorConvention and LookupSerializer were being called for nearly each class member. Both functions take out and release the global config read-lock. Profiling showed that around 25% of all CPU time was being spent in ReaderWriterLockSlim.TryEnterReadLock and ReaderWriterLockSlim.ExitReadLock due to these functions. Fix makes the BsonClassMap and BsonMemberMap persist references to the appropriate IDiscriminatorConvention and IBsonSerializer singletons. By doing this, lookups are avoided while maintaining semantic correctness due to the associated BsonClassMap and BsonMemberMap types never changing once constructed. Also introduced is the FastSingleton class, which extends the existing runtime singleton logic as follows: 1. Enables more static singleton binding both at compile time and when using generics. 2. Enables lazy singleton instantiation 3. Decouples singleton data structure logic from type-specific instantiation logic. 4. Provides structural performance advantages by isolating runtime binding reader/writer locks and dictionaries by use-case. Other issues addressed: 1. DefaultSerializers are now always lazy-bound. This simplified the implementation by ensuring singletons are only set once in the case where a user has specified a custom serializer. 2. BsonSerializer.Serialize had a bug where a class implementing IBsonSerializable would always be serialized using the IBsonSerializable implemention, even if the implementation was overridden by a registered IBsonSerializer.
I tried to reproduce your 22% speed improvement and did not see the same improvement. I ran the test program that you provided me separately 4 times each with the driver from master and from your fork. The results were: master: 10.68, 10.71, 10.72, 11.30 (seconds for 10,000 iterations) So I'm seeing essentially no speed difference at all. I'm not sure why. |
At Alex's suggestion I have repeated the tests without a debugger attached and also using Release builds. Without a debugger attached I see 7% speed improvement in Debug builds and 9% improvement in Release builds. Maybe not the same improvement Alex is seeing, but certainly significant now. |
I've poured over this pull request and I'm pretty convinced that the only thing contributing to any speed improvement is that BsonClassMap and BsonMemberMap are caching information about the corresponding serializer and discriminator convention. I don't think FastSingleton is contributing anything. I also think FastSingleton is incredibly confusing and would rather not adopt it. Check out this experimental commit which achieves even better performance than this pull request with very much simpler changes: Would love to know if you see similar performance improvements on your machine using my experimental commit (or whether you see any bugs or thread safety issues in my experimental commit). |
I pushed another experimental commit to my personal branch: that replaces the HashSet used in BsonClassMapSerializer.Deserialize with a linear array. I use a shrinking from/to range to avoid resizing the array, and the fact that elements probably appear in the same order as the memberMaps to assume that a linear search will be very fast. This code needs to be cleaned up a bit before being incorporated into the master branch, but it should serve the purpose of proving that the speed improvements are real. |
Created two JIRAs for the two components of speed improvements discussed above (caching serializers and discriminator conventions and not using HashSet to detect missing required elements): https://jira.mongodb.org/browse/CSHARP-451 While we won't be merging in this pull request verbatim I would like to give full credit to this pull request for finding these two opportunities for speed improvements and for the research and prototyping involved. Thanks Alex! |
Deserialization CPU performance was suffering because both LookupDiscriminatorConvention and LookupSerializer were being called for nearly each class member. Both functions take out and release the global config read-lock. Profiling showed that around 25% of all CPU time was being spent in ReaderWriterLockSlim.TryEnterReadLock and ReaderWriterLockSlim.ExitReadLock due to these functions.
Fix makes the BsonClassMap and BsonMemberMap persist references to the appropriate IDiscriminatorConvention and IBsonSerializer singletons. By doing this, lookups are avoided while maintaining semantic correctness due to the associated BsonClassMap and BsonMemberMap types never changing once constructed.
Also introduced is the FastSingleton class, which extends the existing runtime singleton logic as follows:
Other issues addressed: