Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ListSerializer.to_representation does not respect prefetches #2704

Closed
aleontiev opened this issue Mar 17, 2015 · 20 comments
Closed

ListSerializer.to_representation does not respect prefetches #2704

aleontiev opened this issue Mar 17, 2015 · 20 comments
Labels
Milestone

Comments

@aleontiev
Copy link
Contributor

aleontiev commented Mar 17, 2015

Update: Related to #2727 - @tomchristie


I noticed some undesirable behavior around ListSerializer's implementation of to_representation in the context of relations prefetched with Django 1.7's Prefetch object.

I am talking about this block of code:

        # Dealing with nested relationships, data can be a Manager,
        # so, first get a queryset from the Manager if needed
        iterable = data.all() if isinstance(data, (models.Manager, query.QuerySet)) else data
        return [
            self.child.to_representation(item) for item in iterable
        ]

To demonstrate the problem, consider a request made to fetch all auth.Users and their related auth.Groups matching a certain filter. This can be done with Prefetch like so:

>>> users_with_test_groups = User.objects.all().prefetch_related(Prefetch('groups', queryset=Group.objects.filter(name__icontains='test')

Lets look at the first user and his prefetched groups:

>>> user = users_with_test_groups[0]
>>> user.groups.all()
[<Group: test>]

... so far, so good. Now let's add DRF into the mix; I have a UserSerializer and a related GroupSerializer:

UserSerializer(ModelSerializer):
   groups = GroupSerializer(many=True)
   ...

GroupSerializer(ModelSerializer):
   ...   

If I call UserSerializer(users_with_test_groups, many=True).data, I expect to see my first user returned with only those related groups that contain test. But I actually see ALL groups related to that user!

This is because calling .all() on a filtered managed queryset will re-evaluate that queryset as if it has no filters:

>>> user.groups.all()
[<Group: test>]
>>> user.groups.all().all()
[<Group: student>, <Group: test>]

This took a while to figure out and seems like bizarre behavior. Why is calling .all() on a queryset necessary when you can just iterate over it?

As a workaround, I have started using a custom ListSerializer that never calls .all(). Interested to hear if anybody else has run into this or tried using DRF Serializers together with the Prefetch object.

@xordoquy
Copy link
Collaborator

xordoquy commented Mar 17, 2015

Closing this as duplicate of #2442.

@aleontiev
Copy link
Contributor Author

aleontiev commented Mar 17, 2015

@xordoquy this is actually a distinct issue that only comes up when you use the Prefetch object along with a queryset to filter the relation.

My use case is providing a flexible GET API that provides filtering on secondary related resources.

@xordoquy xordoquy reopened this Mar 17, 2015
@xordoquy
Copy link
Collaborator

xordoquy commented Mar 17, 2015

Will take time to investigate this further then

@xordoquy
Copy link
Collaborator

xordoquy commented Mar 17, 2015

This took a while to figure out and seems like bizarre behavior. Why is calling .all() on a queryset necessary when you can just iterate over it?

This is to force the QS reevaluation when someone writes the view class such as:

class View(...):
    queryset = User.objects.all()

Which means that the queryset will be evaluated at the class declaration rather than on per request basis if we didn't force the reevaluation.

@xordoquy
Copy link
Collaborator

xordoquy commented Mar 17, 2015

@tomchristie do we really need the queryset reevaluation ?
It definitively helps for new comers that don't fully understand QS but I'm a bit afraid we'll get a couple of similar issues - such as this one - at some point

@aleontiev
Copy link
Contributor Author

aleontiev commented Mar 17, 2015

This isn't a major blocker right now (e.g. we've been able to work around it with a custom ListSerializer and/or custom relation field that proxies a serializer)

However, perhaps there should be an setting that allows you to toggle this behavior, which can be on by default for newcomers per your example?

@tomchristie
Copy link
Member

tomchristie commented Mar 17, 2015

@tomchristie do we really need the queryset reevaluation ?

@xordoquy Sorry, which one specifically?

@xordoquy
Copy link
Collaborator

xordoquy commented Mar 17, 2015

@tomchristie the one we have in serializers:

        # Dealing with nested relationships, data can be a Manager,
        # so, first get a queryset from the Manager if needed
        iterable = data.all() if isinstance(data, (models.Manager, query.QuerySet)) else data
        return [
            self.child.to_representation(item) for item in iterable
        ]

The more I think about it the more I believe it is encouraging bad patterns (i.e. queryset evaluated where they shouldn't be)

@tomchristie
Copy link
Member

tomchristie commented Mar 17, 2015

We certainly need it for the Manager case - I expect we do need it for the queryset case too, but it'd be worth look at the history for that line.

@kevin-brown
Copy link
Member

kevin-brown commented Mar 17, 2015

This is because calling .all() on a filtered managed queryset will re-evaluate that queryset as if it has no filters:

Is this something that's documented? It seems strange that calling all() on a queryset multiple times will perform different queries.

@xordoquy
Copy link
Collaborator

xordoquy commented Mar 17, 2015

@kevin-brown adding .all() creates a new queryset which by default isn't evaluated hence the second query. In most cases the .all().all() won't add additional DB request since the first queryset won't get a chance to be evaluated.

@kevin-brown
Copy link
Member

kevin-brown commented Mar 17, 2015

My question is specifically about the different query. I understand that it should be re-evaluated, which would be fine if it triggered the same query, but I don't understand why it's being evaluated using a different query.

I'm specifically referencing this

>>> user.groups.all()
[<Group: test>]
>>> user.groups.all().all()
[<Group: student>, <Group: test>]

@xordoquy
Copy link
Collaborator

xordoquy commented Mar 17, 2015

@kevin-brown you probably have a point there. It should have been the very same request played another time. Probably an issue on the Django part.

@aleontiev
Copy link
Contributor Author

aleontiev commented Mar 17, 2015

@kevin-brown @xordoquy this does seem like a Django bug / undesired behavior related to the way querysets are cloned, which is happening during calls to .all. (https://github.com/django/django/blob/stable/1.7.x/django/db/models/query.py#L953)

The prefetch context is lost during cloning; also note that calling .all and then re-evaluating an evaluated queryset will issue another query:

>>> from django.db.models import Prefetch
>>> from django.contrib.auth.models import User, Group
>>> user = User.objects.create(username='test@prefetch.com', email='test@prefetch.com')
>>> user.groups = [Group.objects.create(name='test1'), Group.objects.create(name='test2')]
>>> connection.queries = []
>>> users = User.objects.filter(pk=user.pk).only('id').prefetch_related(Prefetch('groups', queryset=Group.objects.filter(name='test1').only('name')))
>>> user = users[0]
>>> connection.queries
[{u'sql': u'SELECT "auth_user"."id" FROM "auth_user" WHERE "auth_user"."id" = 1435 LIMIT 1', u'time': u'0.001'}, {u'sql': u'SELECT ("auth_user_groups"."user_id") AS "_prefetch_related_val_user_id", "auth_group"."id", "auth_group"."name" FROM "auth_group" INNER JOIN "auth_user_groups" ON ( "auth_group"."id" = "auth_user_groups"."group_id" ) WHERE ("auth_group"."name" = \'test1\' AND "auth_user_groups"."user_id" IN (1435))', u'time': u'0.001'}]
>>> len(connection.queries)
2
>>> user_groups = user.groups.all()
>>> user_groups
[<Group: test1>]
>>> len(connection.queries)
2
>>> user_groups.all()
[<Group: test1>, <Group: test2>]
>>> len(connection.queries)
3
>>> user_groups.all().all()
[<Group: test1>, <Group: test2>]
>>> len(connection.queries)
4

@tomchristie
Copy link
Member

tomchristie commented Jun 24, 2015

Have closed #2727 as a duplicate of this, although possible that we can more broadly state the issue, as it's probably not just prefetch_related that's at issue, but anything else that can be lost when .all() is called on the queryset.

@tomchristie
Copy link
Member

tomchristie commented Jun 24, 2015

TODO to progress this issue:

  • Issue a pull request with iterable = data.all() if isinstance(data, models.Manager) else data (Note that QuerySet is removed from the isinstance check.
  • Do any tests fail?
  • Check the blame/history on that line - when was it last modified and when was QuerySet added? What was the rationale at the time?

Note that #2727 includes a very trival example for demonstrating the behavior that doesn't rely on prefetch_related... #2727 (comment)

@jpadilla
Copy link
Member

jpadilla commented Jun 25, 2015

Working on #3076 to hopefully progress this issue.

@tomchristie
Copy link
Member

tomchristie commented Jun 25, 2015

Closed by #3076.

@tomchristie tomchristie added this to the 3.1.4 Release milestone Jun 25, 2015
@aleontiev
Copy link
Contributor Author

aleontiev commented Jun 25, 2015

Thanks for the fix guys :)

@tomchristie
Copy link
Member

tomchristie commented Jun 25, 2015

😄

@tomchristie tomchristie modified the milestones: 3.1.4 Release, 3.2.0 Release Jul 30, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

5 participants