New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make django-lifecycle much, much faster #64
Conversation
eca83db
to
194861d
Compare
|
I didn't measure this but I have to imagine that the deduplicated caches would save a lot of memory as well. |
a62b897
to
b258b00
Compare
Some of the work that the lifecycle mixin is during the initialization of new model objects is very expensive and unnecessary. It's calculating (and caching) field names and foreign key models per-object, rather than per-class / model. All instances of a model are going to have the same field names and foreign key model types so this work actually only needs to be done once per model type. Replacing cached methods with cached classmethods yields a very sizable performance improvement when creating a bunch of new model instances.
|
I finally got the tests passing. The changes necessary to do so are in a separate commit in case it's easier to review that way. The performance difference is pretty significant. Our code does a lot of heavy creating, saving, querying and deleting of model instances, many tens of thousands of them, and our speedup ranges from 40% to 65%. It's doing a bunch of other stuff too so it's not a good pure benchmark, but it shows that there are very tangible benefits in real-world code. Here's some cProfile graphs showing before and after for different parts of our code. Before:After:Before:After: |
|
(The graphs don't show every single function called, just the ones significant enough impact to be worth paying attention to. That's why the other blocks disappeared -- they're still being called, but only like <10 times instead of tens/hundreds of thousands of times like before, which means the impact is totally insignificant). |
| attr = getattr(self, name) | ||
| if ismethod(attr) and hasattr(attr, "_hooked"): | ||
| attr = getattr(cls, name) | ||
| if isfunction(attr) and hasattr(attr, "_hooked"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor semantic change right here - I think the type of the python object under the hood depends on whether you're looking at it from the instance perspective (where methods are special and self is implicitly passed along) or from the class perspective (where methods are just functions and there's no magic going on).
I don't think it's problematic but I want to call it out for a little extra attention.
| @@ -196,15 +199,15 @@ def _run_hooked_methods(self, hook: str) -> List[str]: | |||
| if when_field: | |||
| if self._check_callback_conditions(when_field, callback_specs): | |||
| fired.append(method.__name__) | |||
| method() | |||
| method(self) | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The flip side of that thing I just mentioned, it's returning functions not bound methods, so we can't rely on internal Python magic to pass self along behind the scenes. But functionally it's the same thing.
|
I can squash the commits once you're done reviewing also |
|
This is great. Thank you very much. |




Some of the work that the lifecycle mixin is during the initialization of new model objects is very expensive and unnecessary. It's calculating (and caching) field names and foreign key models per-object, rather than per-class / model. All instances of a model are going to have the same field names and foreign key model types so this work actually only needs to be done once per model type.
Replacing cached methods with cached classmethods yields a very sizable performance improvement when creating a bunch of new model instances.