Performance of serializing nested collections is poor #10
Comments
The deepcopy operation is expensive, but necessary, so that serializers can store errors from nested serializers. I did a little work with cProfile and your code above (the gist of the script is here) and found 2 significant speedups:
Example: use
    collaborators = fields.Nested(UserSerializer(), many=True)
instead of
    collaborators = fields.Nested(UserSerializer, many=True)
This avoids repeating the initialization code (including the deepcopy) for each collaborator. In the future, it'll be better to cache the nested serializer object, or disallow passing classes altogether.
These two modifications decreased the execution time of the above script by almost half. Thanks for reporting this. I will continue to do more profiling and see where performance can be improved even further.
I underestimated the effect of passing an instance to a Nested field: doing this for both the "user" and the "collaborators" fields reduces the total runtime of the profiling script from ~5.7s to ~1.6s.
    class BlogSerializer(Serializer):
        title = fields.String()
        user = fields.Nested(UserSerializer())
        collaborators = fields.Nested(UserSerializer(), many=True)
        categories = fields.List(fields.String)
        id = fields.String()
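
To illustrate why passing an instance helps, here is a minimal standard-library sketch. `ToySerializer` and `ToyNested` are hypothetical stand-ins, not marshmallow's actual classes, and the per-instance `deepcopy` only imitates the initialization cost described in this thread. It counts how often the setup code runs in each case:

```python
import copy


class ToySerializer:
    """Hypothetical stand-in for a serializer; not marshmallow's real class."""

    init_count = 0  # how many times the setup code (and its deepcopy) has run
    declared_fields = {"name": str, "email": str}

    def __init__(self):
        type(self).init_count += 1
        # Each instance takes its own copy of the field definitions,
        # imitating the per-instance deepcopy discussed in the thread.
        self.fields = copy.deepcopy(self.declared_fields)

    def dump(self, obj):
        return {key: convert(obj[key]) for key, convert in self.fields.items()}


class ToyNested:
    """Hypothetical nested field that accepts a serializer class or instance."""

    def __init__(self, nested):
        self.nested = nested

    def dump_many(self, objs):
        if isinstance(self.nested, type):
            # Class given: a fresh serializer is built for every object.
            return [self.nested().dump(obj) for obj in objs]
        # Instance given: the same serializer is reused for every object.
        return [self.nested.dump(obj) for obj in objs]


users = [{"name": f"user{i}", "email": f"user{i}@example.com"} for i in range(1000)]

ToySerializer.init_count = 0
from_class = ToyNested(ToySerializer).dump_many(users)
inits_with_class = ToySerializer.init_count

ToySerializer.init_count = 0
from_instance = ToyNested(ToySerializer()).dump_many(users)
inits_with_instance = ToySerializer.init_count

print(inits_with_class, inits_with_instance)  # prints "1000 1"
```

Both calls produce the same output; only the number of initializations differs, which is where the saved time would come from under these assumptions.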
That's great! Very interesting... Thanks again!
Glad I could help. I've made some further improvements so that performance should be good whether you pass a Serializer class or a Serializer object into a Nested field.
I have the same issue with serialization, which is how I found this issue. I think you should update the documentation at http://marshmallow.readthedocs.org/en/latest/nesting.html to recommend passing an instance instead of a class.
…ass declaration time #24 Reduced query time from ~1.35s to ~1.28s (-5%) marshmallow-code/marshmallow#10 (comment)
@sloria, just to be clear: performance should now be similar in both scenarios (passing a Serializer class to a Nested field, or passing a Serializer instance)? So if nested fields are running slow for me, it's just my shitty code?
@mgd722 I haven't compared the two usages in a while, but they should be similar. If you're serializing ORM objects, I'd first look into your relationship loading technique and make sure you're not running into the n+1 problem.
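
For readers unfamiliar with the n+1 problem mentioned here, this is a rough sketch using only `sqlite3` from the standard library. The `users` and `blogs` tables and the helper names are made up for illustration and do not come from the thread or from any ORM. It counts the queries issued when each blog's user is loaded separately versus with a single join:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE blogs (id INTEGER PRIMARY KEY, title TEXT, user_id INTEGER);
    """
)
conn.executemany("INSERT INTO users VALUES (?, ?)", [(i, f"user{i}") for i in range(1, 51)])
conn.executemany("INSERT INTO blogs VALUES (?, ?, ?)", [(i, f"blog{i}", i) for i in range(1, 51)])

counter = {"queries": 0}  # counts every SELECT sent to the database


def run(sql, params=()):
    counter["queries"] += 1
    return conn.execute(sql, params).fetchall()


def load_lazy():
    # One query for the blogs, then one more query per blog for its user: n + 1.
    blogs = run("SELECT id, title, user_id FROM blogs ORDER BY id")
    return [
        {"title": title, "user": run("SELECT name FROM users WHERE id = ?", (user_id,))[0][0]}
        for _blog_id, title, user_id in blogs
    ]


def load_joined():
    # A single joined query fetches blogs and their users together.
    rows = run(
        "SELECT b.title, u.name FROM blogs b JOIN users u ON u.id = b.user_id ORDER BY b.id"
    )
    return [{"title": title, "user": name} for title, name in rows]


counter["queries"] = 0
lazy = load_lazy()
lazy_queries = counter["queries"]

counter["queries"] = 0
joined = load_joined()
joined_queries = counter["queries"]

print(lazy_queries, joined_queries)  # prints "51 1"
```

With an ORM, the equivalent fix is usually some form of eager loading for the relationship; the exact option depends on the ORM in use.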
I worked up a quick test using the nose timed decorator. The user tests all pass, but the medium and large blog tests do not. Obviously, these could pass on some machines, but it's still rather slow.
I did a little bit more testing with profile. Serializing the whole blog collection was running between 5 and 6s. It looks like the bottleneck is the deepcopy operation in serializer.py and it doesn't seem like the call can be removed, or changed to a pickle/unpickle operation.
I'm going to keep digging to see what I can do. If you have any insight, I'd appreciate the help. Thanks!
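
A profiling run of the kind described above can be sketched with `cProfile` and `pstats` from the standard library. The `serialize_all` function is a hypothetical workload that calls `copy.deepcopy` once per item; it is not the script from this issue, only a small example of how such a hotspot shows up in a profile:

```python
import copy
import cProfile
import io
import pstats


def serialize_all(items):
    # Stand-in workload: one deepcopy per item, imitating the reported hotspot.
    template = {"fields": {"title": "", "tags": []}}
    out = []
    for item in items:
        state = copy.deepcopy(template)
        state["fields"]["title"] = item
        out.append(state)
    return out


profiler = cProfile.Profile()
profiler.enable()
result = serialize_all([f"blog{i}" for i in range(2000)])
profiler.disable()

# Collect the ten most expensive entries by cumulative time into a string.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats(10)
report = buffer.getvalue()
print(report)
```

Sorting by cumulative time places `deepcopy` and its helpers near the top of the report, which is the kind of signal that points at the call as the bottleneck.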