Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Index date and ID together on builds #6926

Merged
merged 1 commit into from Apr 27, 2020

Conversation

davidfischer
Copy link
Contributor

@davidfischer davidfischer commented Apr 17, 2020

Currently the production Django admin for Builds times out. This is a fix for that.

  • The Django admin automatically adds the ID to the ordering in order to get a deterministic sort order.
  • In our setup, this greatly slows down the query since date and ID are not indexed together.
  • This index may be very slow to add on .org where the Build table has 6.8M rows. In testing, it was added in less than a minute.

Below is the query plan in production. Virtually all the time is spent on sorting.

Limit  (cost=1301869.22..1301869.72 rows=200 width=805) (actual time=42943.729..42943.815 rows=200 loops=1)
   ->  Sort  (cost=1301869.22..1317972.23 rows=6441205 width=805) (actual time=42943.723..42943.797 rows=200 loops=1)
         Sort Key: builds_build.date DESC, builds_build.id DESC
         Sort Method: top-N heapsort  Memory: 266kB
         ->  Hash Join  (cost=39213.29..1023484.97 rows=6441205 width=805) (actual time=947.943..35212.336 rows=6785440 loops=1)
               Hash Cond: (builds_build.project_id = projects_project.id)
               ->  Merge Right Join  (cost=8.83..489575.94 rows=6441205 width=223) (actual time=0.118..12322.444 rows=6785440 loops=1)
                     Merge Cond: (builds_version.id = builds_build.version_id)
                     ->  Index Scan using builds_version_pkey on builds_version  (cost=0.43..81188.92 rows=1584185 width=85) (actual time=0.006..1396.431 rows=1590883 loops=1)
                     ->  Index Scan using builds_build_version_id_cd1a89966ebc2f3 on builds_build  (cost=0.56..324291.61 rows=6441205 width=138) (actual time=0.005..7627.731 rows=6785440 loops=1)
               ->  Hash  (cost=20343.87..20343.87 rows=217487 width=582) (actual time=928.183..928.183 rows=217648 loops=1)
                     Buckets: 8192  Batches: 64  Memory Usage: 1513kB
                     ->  Seq Scan on projects_project  (cost=0.00..20343.87 rows=217487 width=582) (actual time=0.011..184.260 rows=217648 loops=1)
 Planning time: 0.699 ms
 Execution time: 42944.572 ms

I'm not 100% sure this will fix the problem but it should be pretty easy to verify before we roll it out:

CREATE INDEX CONCURRENTLY deterministic_build_sort ON builds_build ("date", "id");
-- Ensure that the Builds admin loads in the Django admin
DROP INDEX CONCURRENTLY deterministic_build_sort;

- The Django admin automatically adds the ID to the ordering
  in order to get a deterministic sort order.
- In our setup, this *greatly* slows down the query
  since date and ID are not indexed together.
- This index may be very slow to add on .org where the Build table
  has ~6.8M rows.
@davidfischer davidfischer requested a review from Apr 17, 2020
Copy link
Member

@humitos humitos left a comment

I think this is fine. However, won't this fail if we click on another field that it's not date in that page, like state? I suppose that Django will still add id and we will end up ordering by state and id which is again, not indexed.

Another different approach could be to override ModelAdmin.get_ordering to always remove the id unless it's the only sorting option. See https://docs.djangoproject.com/en/2.2/ref/contrib/admin/#django.contrib.admin.ModelAdmin.get_ordering

@davidfischer
Copy link
Contributor Author

@davidfischer davidfischer commented Apr 19, 2020

However, won't this fail if we click on another field that it's not date in that page, like state?

Probably. However, to filter by state we probably want to use the filters not the sorts. Those already work in production.

Another different approach could be to override ModelAdmin.get_ordering

I can give this a try.

Copy link
Member

@ericholscher ericholscher left a comment

I meant to ship this today -- we could ship it to the admin instance as a hotfix if we wanted to test it prior.

@ericholscher
Copy link
Member

@ericholscher ericholscher commented Apr 21, 2020

Though I guess its a DB migration, so not easy to deploy to a single server :)

@davidfischer
Copy link
Contributor Author

@davidfischer davidfischer commented Apr 21, 2020

Another different approach could be to override ModelAdmin.get_ordering

I tested this and the deterministic build ordering overrides this method. Specifically, even with a method like:

    def get_ordering(self, request):
        return ['-date']

...the final query generated by the admin ends with ORDER BY "builds_build"."date" DESC, "builds_build"."id" DESC. It looks like there is a different solution but it is not as clean.

@davidfischer
Copy link
Contributor Author

@davidfischer davidfischer commented Apr 21, 2020

I think this is fine. However, won't this fail if we click on another field that it's not date in that page, like state? I suppose that Django will still add id and we will end up ordering by state and id which is again, not indexed.

Actually, the final query when you sort by state ends with ORDER BY "builds_build"."state" ASC, "builds_build"."date" DESC, "builds_build"."id" DESC. It's possible that a (date, ID) index might still help there!

@davidfischer
Copy link
Contributor Author

@davidfischer davidfischer commented Apr 21, 2020

I tested adding the index in production. It does allow the build admin to load. Filtering works like a charm. However, sorting by other columns can cause the build admin to still time out. Also searching can also trigger timeouts. I still think having this index is better than not having it as this gives an easy look into what builds are in which states across our servers from the admin.

@ericholscher ericholscher merged commit cd12af5 into master Apr 27, 2020
2 checks passed
@ericholscher ericholscher deleted the davidfischer/deterministic-order-build-admin branch Apr 27, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants