Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Profile page performance issue Fix #5472

Merged

Conversation

@saadmk11
Copy link
Member

@saadmk11 saadmk11 commented Mar 16, 2019

closes #5468

@safwanrahman
Copy link
Member

@safwanrahman safwanrahman commented Mar 16, 2019

This seems like a elegant fix for the optimization. I did not know about solving the perfomance issue using the Exits query.

Copy link
Contributor

@davidfischer davidfischer left a comment

I just realized the project_list_detailed.html template is also used to display the pages for tags (eg. /projects/tags/tag-slug/). As-is, since on that page there is no project.good_build attribute, it shows all the projects with no builds.

I see two possibilities going forward here:

  • We also need to change the tags view to have a similar annotation
  • We can make the annotation write to a "private" variable (like _good_build) and then the project.has_good_build method can check if that variable exists and use it if it does and fallback to making an extra query.

This is becoming less of a good first issue and if you need help with it I'm happy to take it back and build on top of what you've done already.

@saadmk11
Copy link
Member Author

@saadmk11 saadmk11 commented Mar 18, 2019

@davidfischer I would like to give it a try if its okay with you.

@saadmk11
Copy link
Member Author

@saadmk11 saadmk11 commented Mar 19, 2019

@davidfischer Updated The PR please have a look and let me know if the implementation is what you wanted. and what do you think I should do for the other annotation latest_build_date as it is used in /projects/tags/tag-slug/ page also.

@davidfischer
Copy link
Contributor

@davidfischer davidfischer commented Mar 19, 2019

I like the solution for has_good_build and that was exactly what I was thinking!

I think we can use a similar approach for latest_build_date. Perhaps there's a way to get the whole build object and re-use the property project.get_latest_build?

@saadmk11
Copy link
Member Author

@saadmk11 saadmk11 commented Mar 20, 2019

@davidfischer Great! Actually I Didn't use get_latest_build because it created an extra Database query. But now following this approach we can solve it. But I was thinking we don't use the full build object on the template and also I was not able to find a way to get a single build object in the same query. so can we just get the latest build date by adding another property or can we add the variable on the tag view also using the same approach? What do you think?

@davidfischer
Copy link
Contributor

@davidfischer davidfischer commented Mar 21, 2019

This was a little trickier than I thought but you can control the prefetching with a Prefetch object:

from readthedocs.projects.models import Project
from readthedocs.builds.models import Build

from django.db.models import Prefetch


latest_build = Prefetch('builds', queryset=Build.objects.filter(state='finished', type='html'), to_attr='_latest_build')

# Executes 2 queries
projects = list(Project.objects.all().prefetch_related(latest_build))

# This executes 0 queries
for project in projects:
    print(project.slug)
    if project._latest_build:
        # It is a little awkward that "_latest_build" is a list... and it could be empty
        print(' - {}'.format(project._latest_build[0].date))

The above code isn't ideal but it should give you some tips on working this out.

@safwanrahman
Copy link
Member

@safwanrahman safwanrahman commented Mar 22, 2019

@davidfischer The queryset projects = list(Project.objects.all().prefetch_related(latest_build)) will fetch all the builds from all the projects that are selected. I think its more likely memory heavy and make the queryset also much slow.

Do you think its good enough for fetching the latest build data that we need or fetch the whole object?

@davidfischer
Copy link
Contributor

@davidfischer davidfischer commented Mar 22, 2019

I think its more likely memory heavy and make the queryset also much slow.

Hmmmm, you're right. Maybe there's a way to limit it to just 1 result.

@davidfischer
Copy link
Contributor

@davidfischer davidfischer commented Mar 22, 2019

Do you think its good enough for fetching the latest build data that we need or fetch the whole object?

A lot of things get simpler if we can fetch the whole build object. However, we don't need to fetch 100s of build objects if possible.

@davidfischer
Copy link
Contributor

@davidfischer davidfischer commented Mar 22, 2019

Thanks for looking at this @safwanrahman. I think this will do it if you wouldn't mind taking another look:

from django.db.models import OuterRef, Subquery, Prefetch

from readthedocs.projects.models import Project
from readthedocs.builds.models import Build


subquery = Subquery(Build.objects.filter(state='finished', type='html', project=OuterRef('project_id')).values_list('id', flat=True)[:1])
latest_build = Prefetch('builds', Build.objects.filter(pk__in=subquery), to_attr='_latest_build')
projects = list(Project.objects.all().prefetch_related(latest_build))

@saadmk11
Copy link
Member Author

@saadmk11 saadmk11 commented Mar 22, 2019

@davidfischer Thanks a lot for your help. I tried out your code in the shell and also in the application but I can't seem to reduce the number of Database Queries. When ever project._latest_build is called it creates a new Query to the database.

>>> from django.db import connection
>>> from django.db.models import OuterRef, Subquery, Prefetch
>>> from readthedocs.projects.models import Project
>>> from readthedocs.builds.models import Build
>>> 
>>> subquery = Subquery(Build.objects.filter(state='finished', type='html', project=OuterRef('project_id')).values_list('id', flat=True)[:1])
>>> latest_build = Prefetch('builds', Build.objects.filter(pk__in=subquery), to_attr='_latest_build')
>>> # Executes 2 queries
>>> projects = list(Project.objects.all().prefetch_related(latest_build))
>>> 
>>> # Executes 8 queries as 8 projects had builds.
>>> for project in projects:
...     if project._latest_build:
...         project._latest_build
... 
[<Build: Build Cue for AnonymousUser admin (11)>]
[<Build: Build Django Test Utils for AnonymousUser admin (13)>]
[<Build: Build Fabric for AnonymousUser admin (15)>]
[<Build: Build Kong for AnonymousUser admin (21)>]
[<Build: Build Pinax for AnonymousUser admin (19)>]
[<Build: Build Read The Docs for AnonymousUser admin (9)>]
[<Build: Build South for AnonymousUser admin (20)>]
[<Build: Build Tastypie for AnonymousUser admin (16)>]
>>> len(connection.queries)
10

@davidfischer
Copy link
Contributor

@davidfischer davidfischer commented Mar 22, 2019

Those queries are coming from querying the project_user table from the __str__ method on build. (the "for AnonymousUser" part). If you print say only the build date and ID you won't see them.

@saadmk11
Copy link
Member Author

@saadmk11 saadmk11 commented Mar 22, 2019

@davidfischer I have Updated the PR according to your recommendations.
I have removed state='finished', type='html' from the filter as previously the template wasn't using the project.get_latest_build() method it was just getting the latest build like project.builds.all.0. Let me know what you think about the PR and thanks a lot for Guiding me. 👍

Copy link
Contributor

@davidfischer davidfischer left a comment

This worked well and performed much better locally for me!

@saadmk11
Copy link
Member Author

@saadmk11 saadmk11 commented Mar 27, 2019

Great

@davidfischer
Copy link
Contributor

@davidfischer davidfischer commented Apr 27, 2019

I'm interested in getting this merged soon if possible.

humitos
humitos approved these changes May 9, 2019
Copy link
Member

@humitos humitos left a comment

I didn't test it locally, but changes look good to me.

@davidfischer davidfischer merged commit 768856e into readthedocs:master May 9, 2019
1 check passed
@saadmk11 saadmk11 deleted the profile-page-performance-fix branch May 10, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

4 participants