-
Notifications
You must be signed in to change notification settings - Fork 107
Description
There is currently a bug on the staging branch where database prefetching (which is used to mitigate N+1 problems - #577) isn't working for deeply nested foreign keys when query parameters are passed to the endpoint.
Illustration & Reproduction Intructions
The bug can be reproduced as follows:
- Install Django Debug Toolbar on your local fork (this allows you to view response times, SQL query count, etc.)
- Checkout the
stagingbranch and spin up a development server - Visit the
/v2/spellsendpoint – notice that Django is making 7 SQL calls to the DB. All good, prefetching the related fields is working well. - Now visit
/v2/spell/?fields=document– this instructs the API to only return thedocumentfield for each spell. Notice that Django now makes 102 SQL calls to the DB (yikes!).
Digging into the SQL breakdown, it looks like including the document query parameter broke the prefetching for the document__publisher and document__gamesystem that was stopping N+1 problems occuring.
I suspect that the issuelies in the implementation of the EagerLoadingMixin (/api_v2/views/mixins.py)
Additional Context
Drilling down into the EagerLoadingMixin code:
# Override these lists in child views
select_related_fields = [] # ForeignKeys to optimise
prefetch_related_fields = [] # ManyToMany & reverse relationships to prefetch
def get_queryset(self):
"""Override DRF's default get_queryset() method to apply eager loading"""
queryset = super().get_queryset()
request = self.request
# Get query parameters
requested_fields = request.query_params.get('fields', '')
depth = int(request.query_params.get('depth', 0))
if requested_fields:
requested_fields = set(requested_fields.split(','))
else:
# If no fields requested, apply all opitmisations
requested_fields = set(self.select_related_fields + self.prefetch_related_fields)
# Filter fields based on on which have been requested
select_fields = [field for field in self.select_related_fields if field in requested_fields]
prefetch_fields = [field for field in self.prefetch_related_fields if field in requested_fields]
# Apply optimisations
if select_fields:
queryset = queryset.select_related(*select_fields)
if prefetch_fields:
queryset = queryset.prefetch_related(*prefetch_fields)
return queryset
The job of this piece of code is: when a fields query parameter is provided, filter any fields not included in query parameter from the select_related_fields and prefetch_related_fields. This is so that fields that are not being returned by the API are not being unneccesarily prefetched.
The problem here is that it, if the query parameter is document (as in our test case), it will filter out the document__publisher and document__gamesystem items from the prefetch_related_fields (defined in /api_v2/views/spells.py), so these nested fields will no longer be prefetched. This leads to N+1 problems
Proposed Solution
The best solution IMHO would be to update the EagerLoadingMixin so that prefetched/selected fields are filtered based on whether they contain a query parameter as a substring. ie. if ”/?fields=document”, then prefetch_fields = [“document”, “document__publisher”, “document__gamesystem”]
However, this might cause problems if nested fields are requested. ie.
/?fields=document&document__fields=name,gamesystem
In this case we would want prefetch_fields = [“document”, “document__gamesystem”] (not need to prefetch the publisher, it won’t be returned by the API.