Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Invalidate any existing prefetch cache on PUT requests. #4668

Merged
merged 2 commits into from Nov 11, 2016

Conversation

tomchristie
Copy link
Member

@tomchristie tomchristie commented Nov 10, 2016

Invalidate the prefetch cache instead of reloading the object from the database when a prefetch_related queryset is used with a PUT request.

Would be nice to also have a test case demonstrating why this approach has a different behavioral effect.

Closes #4661.
Refs #4553.

@tomchristie tomchristie added this to the 3.5.4 Release milestone Nov 10, 2016
@tomchristie tomchristie merged commit 24791cb into master Nov 11, 2016
4 checks passed
@tomchristie tomchristie deleted the invalidate-prefetch-cache branch Nov 11, 2016
@spectras
Copy link

spectras commented Nov 11, 2016

The only drawback I can see to this approach is, if I remember well, in the absence of a prefetch cache, every use of the related objects from that point forward will trigger a query. (but double-check that, I did not follow the latest Django updates very closely).
Other than this, it seems to be better than the former approach in every aspect: faster, won't reload queryset that won't get used, will not raise a 404 if the update took the object out of the queryset.

I was wondering whether this could be a possible solution too, digging just a single bit deeper into the queryset internals:

cache = getattr(instance, '_prefetched_objects_cache', None)
if cache:
    for key, queryset in cache.items():
        cache[key] = queryset.all()

Or, invalidating the cache in place instead:

cache = getattr(instance, '_prefetched_objects_cache', None)
if cache:
    for queryset in cache.values():
        queryset._result_cache = None

The idea is using related objects will still use the prefetch cache. First use will re-load data from the database, subsequent loads will not.

(I for one would love to see a class-level flag allowing to disable that code section entirely, but if I am too isolated a user, I will just use my own altered UpdateMixin where I need it).

@tomchristie
Copy link
Member Author

tomchristie commented Nov 11, 2016

Thanks for all that! Given that we use the instance once-only after removing the cache it's not clear that'd benefit us, tho would be perfectly happy to consider any pull requests that came along with assertNumQueries to demonstrate any improvements that we could still make here.
We'll roll with what we've got for now. 😄

@spectras
Copy link

spectras commented Nov 11, 2016

I don't either, let's just say that if someone does he'll make his own mixin too. Well, thanks for taking the time to come up with an improved solution.
I still believe third-party packages should not "fix" or deviate from Django behaviors when they have an equivalent but, what can I say, it's your project and I have great respect for your work. I will live with it - and much more easily so with this new approach.

@fredpalmer
Copy link

fredpalmer commented Nov 16, 2016

I'm not sure this patch is the most diplomatic approach...

People spend a lot of time tweaking select_related and prefetch_related queries to minimize the amount of queries needed for m2m and other related objects. At least this is only happing on updates. However, it's sneaky in that one has dive into the depths of the DRF framework to understand why one's data is incurring a possible 2x+ the normal amount of queries for a PATCH request that updates a simple property only on the root object. Although, I'm currently not on this version - Unless this is changed, I now have make sure to bypass this behavior when I upgrade to the next point release. Don't get me wrong, I have to do this as well currently. However, there a few differences in my approach:

The primary differences are:

  • When the cache is cleared (before any updates take place)
  • What has to be reloaded at the end (only the related items)
    def update(self, instance, validated_data):
        """
        Surgically removes potentially stale, prefetched, cached, related objects
        in order to ensure non-stale related data in the serialized object after updating it.
        """
        # There are several reports of issues around this, e.g.:
        #   https://github.com/tomchristie/django-rest-framework/pull/1754
        #   https://github.com/tomchristie/django-rest-framework/pull/4668
        if hasattr(instance, "_prefetched_objects_cache"):
            # noinspection PyProtectedMember
            prefetch_objects_cache = instance._prefetched_objects_cache

            # It's common practice to support nested writes by popping it from `validated_data`
            # before going through the standard `update` from DRF
            initial_data_keys = self.initial_data.keys() if self.initial_data else []
            validated_data_keys = validated_data.keys()
            all_possible = set(initial_data_keys + validated_data_keys)
            prefetch_keys_to_clear = set(prefetch_objects_cache.keys()).intersection(all_possible)
            for key in prefetch_keys_to_clear:
                del prefetch_objects_cache[key]

        # This would essentially go on to DRF's `update`
        return super(BaseModelSerializer, self).update(instance, validated_data)

Summing Up...

  • The current implementation is not just re-loading the the related objects, it's reloading the root object, the m2m related objects and any other dependencies that could be needed and otherwise optimized within select_related.
  • Clearing the _prefetched_objects_cache before updating the object fixes the issue without the need to reload everything.
  • Developers are very deliberate about what they choose to prefetch_related. Automatically clearing might lead to unhappy users.
  • The ability to opt-in to this new behavior might also be more palatable (mentioned already somewhere else).
  • There may be more one could do here in my sample implementation to ascertain the need to clear an individual entry in the cache. For example:
    • One already has incurred the cost to populate the cache, one could determine if there's a delta between the already saved, related objects and the validated ones.

I see there's already been a ton of discussion around this - sorry I'm late to the party. Thanks for the wonderful framework - I hope that helps

@spectras
Copy link

spectras commented Nov 16, 2016

@fredpalmer > current release re-invokes get_object() indeed. The point of the change in this very issue is to invalidate the cache instead. Other than this I share your feeling. The issue being, some developers are not that deliberate about what they choose to prefetch_related and they regularly come and complain of stale cache entries because they fail to reload or update them.

For now, the best workaround is to have your own UpdateMixin that does not reload the object, and pull that in in all your ViewSets. I sure wish it was opt-in but, given the rationale, as I understand it, is some developers can't read, they would not read about the opt-in either.

At least with this patch here, the impact will be much more limited that with current release and it will avoid some pitfalls.

@carltongibson
Copy link
Collaborator

carltongibson commented Nov 16, 2016

@tomchristie Is the behaviour here settled? (It's currently unreleased right?)

Lots of people run into this when first using prefetch_related — so doing something as a correct but maybe slow default seems good.

But equally, we're all adults, and I keep coming back to the upstream decision not to handle this, i.e. to leave it to application developers.

For now, the best workaround is to have your own UpdateMixin that does not reload the object, and pull that in in all your ViewSets

This seems (to me) an overly heavy burden to put on everybody just in order to add what are essentially training wheels for using prefetch_related.

If we do want this, do we not also want — although "want" isn't quite what I mean to say — a setting and/or an GenericAPIView-level attribute?

@tomchristie
Copy link
Member Author

tomchristie commented Nov 16, 2016

Is the behaviour here settled?

Yes, it's settled. (Tho the improvement is currently unreleased, yes.)

We've got two options:

  • Have slow behavior when prefetch_related is used with updates.
  • Have broken behavior when prefetch_related is used with updates.

There's no question which of those two we should do.

It may be that there's still further improvements we could make, and I'm very happy to consider productive discussion there, but that's a different issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants