Automatically determine `select_related` and `prefetch_related` on ModelSerializer. #1964

Closed
cancan101 opened this Issue Oct 16, 2014 · 11 comments

Comments

Projects
None yet
5 participants
@cancan101
Contributor

cancan101 commented Oct 16, 2014

I have a serializer that looks the the following:

class ReportSerializer(serializers.ModelSerializer):
    person = serializers.SlugRelatedField(slug_field='uuid')

which is causing an N +1 select issue because select_related is not being used to pre-query the Person object. This means that in order to serialize the person field when serializing a Report, each Person must be queried for its uuid.

We likely want to use select_related on all SlugRelatedField and likely all nested serializers somewhere like here: https://github.com/tomchristie/django-rest-framework/blob/e437520217e20d500d641b95482d49484b1f24a7/rest_framework/serializers.py#L570

@tomchristie

This comment has been minimized.

Show comment
Hide comment
@tomchristie

tomchristie Oct 17, 2014

Member

It's actually been a conscious design decision to leave the select_related and prefetch_related down the end users.

Attempting to automatically add them would be helpful in many cases, but could also end up failing or giving worse performance in other cases. The other bit that's awkward is that we'd only want to add this to ModelSerializer (Serializer shouldn't have anything specific to Django models in it), which would end up introducing a further divide between what those two classes do, which we might or might not want.

Of course we'd also have to do the clevernesses of figuring out when to use prefetch_related and when to use select_related, so we'd be introducing a bunch of complexity.

Having said all that, might not be averse to building something like this in, in the future. Any thoughts from anyone else?

Member

tomchristie commented Oct 17, 2014

It's actually been a conscious design decision to leave the select_related and prefetch_related down the end users.

Attempting to automatically add them would be helpful in many cases, but could also end up failing or giving worse performance in other cases. The other bit that's awkward is that we'd only want to add this to ModelSerializer (Serializer shouldn't have anything specific to Django models in it), which would end up introducing a further divide between what those two classes do, which we might or might not want.

Of course we'd also have to do the clevernesses of figuring out when to use prefetch_related and when to use select_related, so we'd be introducing a bunch of complexity.

Having said all that, might not be averse to building something like this in, in the future. Any thoughts from anyone else?

@tomchristie tomchristie changed the title from select_related not being used -> many extra DB queries to Automatically determine `select_related` and `prefetch_related` on ModelSerializer. Oct 17, 2014

@cancan101

This comment has been minimized.

Show comment
Hide comment
@cancan101

cancan101 Oct 17, 2014

Contributor

A follow up question, would be "given that DRF does not currently do the
needed magic, how would I as a user specify to use select related?" I would
think that I would want to subclass and change the line of code I linked to
(ie subclass data property) or explicitly specify the queryset to the
serializer?

I feel at the very least more documentation is deserved. I was quite
surprised to see the large number of SQL queries occurring due to the slug
related usage.
On Oct 17, 2014 10:35 AM, "Tom Christie" notifications@github.com wrote:

It's actually been a conscious design decision to leave the select_related
and prefetch_related down the end users.

Attempting to automatically add them would be helpful in many cases, but
could also end up failing or giving worse performance in other cases. The
other bit that's awkward is that we'd only want to add this to
ModelSerializer (Serializer shouldn't have anything specific to Django
models in it), which would end up introducing a further divide between what
those two classes do, which we might or might not want.

Of course we'd also have to do the clevernesses of figuring out when to
use prefetch_related and when to use select_related, so we'd be
introducing a bunch of complexity.

Having said all that, might not be averse to building something like this
in, in the future. Any thoughts from anyone else?


Reply to this email directly or view it on GitHub
#1964 (comment)
.

Contributor

cancan101 commented Oct 17, 2014

A follow up question, would be "given that DRF does not currently do the
needed magic, how would I as a user specify to use select related?" I would
think that I would want to subclass and change the line of code I linked to
(ie subclass data property) or explicitly specify the queryset to the
serializer?

I feel at the very least more documentation is deserved. I was quite
surprised to see the large number of SQL queries occurring due to the slug
related usage.
On Oct 17, 2014 10:35 AM, "Tom Christie" notifications@github.com wrote:

It's actually been a conscious design decision to leave the select_related
and prefetch_related down the end users.

Attempting to automatically add them would be helpful in many cases, but
could also end up failing or giving worse performance in other cases. The
other bit that's awkward is that we'd only want to add this to
ModelSerializer (Serializer shouldn't have anything specific to Django
models in it), which would end up introducing a further divide between what
those two classes do, which we might or might not want.

Of course we'd also have to do the clevernesses of figuring out when to
use prefetch_related and when to use select_related, so we'd be
introducing a bunch of complexity.

Having said all that, might not be averse to building something like this
in, in the future. Any thoughts from anyone else?


Reply to this email directly or view it on GitHub
#1964 (comment)
.

@tomchristie

This comment has been minimized.

Show comment
Hide comment
@tomchristie

tomchristie Oct 17, 2014

Member

Assuming you're using generic views then specify it on the queryset attribute of view or viewset, eg...

queryset = Report.objects.all().select_related('person')

And yes, we should make to include some notes on that in the generic view section, plus also have a a section on general usage tips etc. Including:

  • Good API style.
  • querysets and select_related, prefetch_related.
  • DB indexes, particularly wrt filters, relationships, view lookups.
  • Model encapsulation.
Member

tomchristie commented Oct 17, 2014

Assuming you're using generic views then specify it on the queryset attribute of view or viewset, eg...

queryset = Report.objects.all().select_related('person')

And yes, we should make to include some notes on that in the generic view section, plus also have a a section on general usage tips etc. Including:

  • Good API style.
  • querysets and select_related, prefetch_related.
  • DB indexes, particularly wrt filters, relationships, view lookups.
  • Model encapsulation.
@cancan101

This comment has been minimized.

Show comment
Hide comment
@cancan101

cancan101 Oct 17, 2014

Contributor

Okay. That makes sense.

I am curious to hear what others think, but I would love to see some magic
here with automatically adding select related based on slug related usage.
On Oct 17, 2014 10:47 AM, "Tom Christie" notifications@github.com wrote:

Assuming you're using generic views then specify it on the queryset
attribute of view or viewset, eg...

queryset = Report.objects.all().select_related('person')

And yes, we should make to include some notes on that in the generic view
section, plus also have a a section on general usage tips etc. Including:

  • Good API style.
  • querysets and select_related, prefetch_related.
  • DB indexes, particularly wrt filters, relationships, view lookups.
  • Model encapsulation.


Reply to this email directly or view it on GitHub
#1964 (comment)
.

Contributor

cancan101 commented Oct 17, 2014

Okay. That makes sense.

I am curious to hear what others think, but I would love to see some magic
here with automatically adding select related based on slug related usage.
On Oct 17, 2014 10:47 AM, "Tom Christie" notifications@github.com wrote:

Assuming you're using generic views then specify it on the queryset
attribute of view or viewset, eg...

queryset = Report.objects.all().select_related('person')

And yes, we should make to include some notes on that in the generic view
section, plus also have a a section on general usage tips etc. Including:

  • Good API style.
  • querysets and select_related, prefetch_related.
  • DB indexes, particularly wrt filters, relationships, view lookups.
  • Model encapsulation.


Reply to this email directly or view it on GitHub
#1964 (comment)
.

@tomchristie

This comment has been minimized.

Show comment
Hide comment
@tomchristie

tomchristie Oct 17, 2014

Member

"slug related usage" That's just one of many possibilities of course - nested serializers and other relationship types also benefit from this.

Member

tomchristie commented Oct 17, 2014

"slug related usage" That's just one of many possibilities of course - nested serializers and other relationship types also benefit from this.

@thedrow

This comment has been minimized.

Show comment
Hide comment
@thedrow

thedrow Oct 19, 2014

Contributor

I believe that this approach if implemented correctly can enhance the performance of DRF greatly.
If @tomchristie doesn't want this to be the default it can be provided as a mixin.

The algorithm for generating the values of the arguments of select_related is:

  1. Initialize select_related_list as an empty list.
  2. For each serializer field
  3. If the field type is a serializer and the field's many attribute is False:
    1. Append the source attribute as an item to select_related_list.
  4. Else if the field's source contain references to a one to one relationship, a foreign key or a reverse one to one relationship:
    1. Initialize relationship_source to the path of the relationship (e.g. If the source is foo.bar then the path of the relationship should be foo)
    2. Append the source attribute as an item to select_related_list.

The algorithm for generating the values of the arguments of prefetch_related is:

  1. Initialize prefetch_related_list as an empty list.
  2. For each serializer field
  3. If the field type is a serializer and the field's many attribute is True:
    1. Append the source attribute as an item to prefetch_related_list.
  4. Else if the field's source contain references to a many to many relationship or a reverse foreign key relationship:
    1. Initialize relationship_source to the path of the relationship (e.g. If the source is foo.bar then the path of the relationship should be foo)
    2. Append the source attribute as an item to prefetch_related_list.

The only problem I see here is what happens when the relationship source involves a nested relationship since I don't know how Django performs the queries.
E.g. The following serializer represents a model that has a m2m to a foo model that has an FK to the bar model:

class MySerializer(ModelSerializer):
  foo = CharField(source='foo.bar.name')
  class Meta:
    model = MyModel

Should generate which query exactly?

I think that the query should be:

MyModel.objects.prefetch_related('foo').select_related('foo__bar').only('foo__bar__name')

But I'm not sure what django will do in that case. In any case, I'd use only to ensure only the fields I require are fetched.

I believe that extending this issue to support automatically detecting exactly which fields should be fetched using only()/defer() is a good idea since it relates to the exact same problem and might offer us some solutions in paths like the above.
Detecting when to use only() or defer() is much easier. You just iterate on each field and translate the source attribute into a relationship path if required.
As a bonus Django will use update_fields when updating an existing record since the fields which are not used by the serializer will be deferred. See https://github.com/django/django/blob/master/django/db/models/base.py#L614 for the implementation.

We can compromise and specify that we don't support detecting *_related on nested relationships or defer the support for those relationships to later version (which I'd prefer if any of you thinks it's too complex to implement).

Contributor

thedrow commented Oct 19, 2014

I believe that this approach if implemented correctly can enhance the performance of DRF greatly.
If @tomchristie doesn't want this to be the default it can be provided as a mixin.

The algorithm for generating the values of the arguments of select_related is:

  1. Initialize select_related_list as an empty list.
  2. For each serializer field
  3. If the field type is a serializer and the field's many attribute is False:
    1. Append the source attribute as an item to select_related_list.
  4. Else if the field's source contain references to a one to one relationship, a foreign key or a reverse one to one relationship:
    1. Initialize relationship_source to the path of the relationship (e.g. If the source is foo.bar then the path of the relationship should be foo)
    2. Append the source attribute as an item to select_related_list.

The algorithm for generating the values of the arguments of prefetch_related is:

  1. Initialize prefetch_related_list as an empty list.
  2. For each serializer field
  3. If the field type is a serializer and the field's many attribute is True:
    1. Append the source attribute as an item to prefetch_related_list.
  4. Else if the field's source contain references to a many to many relationship or a reverse foreign key relationship:
    1. Initialize relationship_source to the path of the relationship (e.g. If the source is foo.bar then the path of the relationship should be foo)
    2. Append the source attribute as an item to prefetch_related_list.

The only problem I see here is what happens when the relationship source involves a nested relationship since I don't know how Django performs the queries.
E.g. The following serializer represents a model that has a m2m to a foo model that has an FK to the bar model:

class MySerializer(ModelSerializer):
  foo = CharField(source='foo.bar.name')
  class Meta:
    model = MyModel

Should generate which query exactly?

I think that the query should be:

MyModel.objects.prefetch_related('foo').select_related('foo__bar').only('foo__bar__name')

But I'm not sure what django will do in that case. In any case, I'd use only to ensure only the fields I require are fetched.

I believe that extending this issue to support automatically detecting exactly which fields should be fetched using only()/defer() is a good idea since it relates to the exact same problem and might offer us some solutions in paths like the above.
Detecting when to use only() or defer() is much easier. You just iterate on each field and translate the source attribute into a relationship path if required.
As a bonus Django will use update_fields when updating an existing record since the fields which are not used by the serializer will be deferred. See https://github.com/django/django/blob/master/django/db/models/base.py#L614 for the implementation.

We can compromise and specify that we don't support detecting *_related on nested relationships or defer the support for those relationships to later version (which I'd prefer if any of you thinks it's too complex to implement).

@thedrow

This comment has been minimized.

Show comment
Hide comment
@thedrow

thedrow Oct 19, 2014

Contributor

Note that prefetch_related should not be applied when pagination is off or if no filtering is performed.

Also after thinking about the nested relationships problem I believe it can be solved only in Django 1.7 and above using the Prefetch object.

Contributor

thedrow commented Oct 19, 2014

Note that prefetch_related should not be applied when pagination is off or if no filtering is performed.

Also after thinking about the nested relationships problem I believe it can be solved only in Django 1.7 and above using the Prefetch object.

@tomchristie

This comment has been minimized.

Show comment
Hide comment
@tomchristie

tomchristie Nov 5, 2014

Member

At this point in time I'm closing this off in favor of the documentation ticket #1977.
Would I be happy to consider this at some point in the future? Sure, but it needs to start off life as a third party project. There's nothing to stop someone else from taking this on outside of core.

Member

tomchristie commented Nov 5, 2014

At this point in time I'm closing this off in favor of the documentation ticket #1977.
Would I be happy to consider this at some point in the future? Sure, but it needs to start off life as a third party project. There's nothing to stop someone else from taking this on outside of core.

@Deepakdubey90

This comment has been minimized.

Show comment
Hide comment
@Deepakdubey90

Deepakdubey90 Mar 23, 2015

Hi,
i'm trying to query with generic relation what should i use, either 'Select_related' or 'Prefetch_related'.
i'm having to two table user and contact.
i'm using contact table as generic(created generic relation from django view) .
so i'm making query like:
result = User.objects.filter().select_related('id')
in above queryset i'm passing "ID" to select_related . (ID is pk of "user" table and acting as Entity_object_id in "contact" table and Entity_content_type_id)
How can i get each user details with their contact details
for generic relation used link :https://docs.djangoproject.com/en/1.7/ref/contrib/contenttypes/#generic-relations.

Hi,
i'm trying to query with generic relation what should i use, either 'Select_related' or 'Prefetch_related'.
i'm having to two table user and contact.
i'm using contact table as generic(created generic relation from django view) .
so i'm making query like:
result = User.objects.filter().select_related('id')
in above queryset i'm passing "ID" to select_related . (ID is pk of "user" table and acting as Entity_object_id in "contact" table and Entity_content_type_id)
How can i get each user details with their contact details
for generic relation used link :https://docs.djangoproject.com/en/1.7/ref/contrib/contenttypes/#generic-relations.

@xordoquy

This comment has been minimized.

Show comment
Hide comment
@xordoquy

xordoquy Mar 23, 2015

Collaborator

@Deepakdubey90 please do use the mailing list, IRC or stack overflow for usage question.

Collaborator

xordoquy commented Mar 23, 2015

@Deepakdubey90 please do use the mailing list, IRC or stack overflow for usage question.

@Deepakdubey90

This comment has been minimized.

Show comment
Hide comment
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment