Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Queryset with annotated Now() is cached #193

Closed
dbartenstein opened this issue Aug 18, 2021 · 12 comments · Fixed by #195
Closed

Queryset with annotated Now() is cached #193

dbartenstein opened this issue Aug 18, 2021 · 12 comments · Fixed by #195
Labels

Comments

@dbartenstein
Copy link
Contributor

What happened?

An annotated query including Now() is cached.

What should've happened instead?

An annotated query containing Now() must not be cached.

Steps to reproduce

from django.db.models.functions import Now
# Replace Vehicle with some sort of cachable model
Vehicle.objects.all().annotate(now=Now())

Debian Linux
Django version 3.2 LTS
Postgresql 10

@Andrew-Chen-Wang

This comment has been minimized.

@dbartenstein
Copy link
Contributor Author

dbartenstein commented Aug 18, 2021

@Andrew-Chen-Wang: this is a constructed case of course which I came up with: so there is no need for a temporary cure.

One feasible approach would be to do an additional check at:

if isinstance(annotation, Subquery):

if isinstance(annotation, Now):
    raise UncachableQuery

Wouldn’t it be most obvious to simply deactivate caching - on the lines of how the Now() function is handled in the following lines of code?

elif h_class in UNCACHABLE_FUNCS:

Another question

I also noted that annotated condititional expression do not seem to be handled? Is this perception correct?
It would be something like:
isinstance(annotation, Case)

@Andrew-Chen-Wang
Copy link
Collaborator

Andrew-Chen-Wang commented Aug 19, 2021

Ah I just noticed the UNCACHABLE_FUNC already has Now. Apologies. Surprisingly, the filter behavior passes in our tests but somehow the annotation for Now, as you've rightly pointed out, isn't being tested:

def test_now(self):
"""
Checks that queries with a Now() parameter are not cached.
"""
obj = Test.objects.create(datetime='1992-07-02T12:00:00')
qs = Test.objects.filter(
datetime__lte=Now())
with self.assertNumQueries(1):
obj1 = qs.get()
with self.assertNumQueries(1):
obj2 = qs.get()
self.assertEqual(obj1, obj2)
self.assertEqual(obj1, obj)

Did you happen to have this behavior on Django 3.1 and/or on previous cachalot versions? Django 3.2 brought a weird change that had deeply hidden nuances, and the changes I made or Django made may have affected the ORM module.

One feasible approach would be to do an additional check

That's OK for a temporary solution, but it doesn't resolve the worry that these annotations aren't being caught in the utils.py on L117.

I also noted that annotated condititional expression do not seem to be handled? Is this perception correct?

They are handled I think; though, I suppose we don't have a test case for Case or Where specifically.

@dbartenstein
Copy link
Contributor Author

dbartenstein commented Aug 19, 2021

Did you happen to have this behavior on Django 3.1 and/or on previous cachalot versions? Django 3.2 brought a weird change that had deeply hidden nuances, and the changes I made or Django made may have affected the ORM module.

We have been using cachalot for only two weeks now and as we are always on the latest LTS Django version, we have only tested it with Django 3.2.

They are handled I think; though, I suppose we don't have a test case for Case or Where specifically.
I don’t think so. I will come up with an example.

Is it ok for you if I open a new PR with additional tests that demonstrate use cases which are not handled yet? In a further step we could then try to fix these cases in the same PR. What do you think?

@Andrew-Chen-Wang
Copy link
Collaborator

Approved the CI. Thanks for the quick write-up!

@Andrew-Chen-Wang
Copy link
Collaborator

@dbartenstein thanks for the quick test and great bug reports. Both are resolved in #195 and #196

@dbartenstein
Copy link
Contributor Author

dbartenstein commented Aug 19, 2021

@dbartenstein thanks for the quick test and great bug reports. Both are resolved in #195 and #196

@Andrew-Chen-Wang: wuhu - you’re quick! 👍 I’ll have a look at our code base and check it for additional sophisticated queries which might not yet be covered by cachalot.

For many uses cases not caching a query is better than falsely not invalidating one. What do you generally think of introducing a setting CACHALOT_CAUTIOUS_BEHAVIOR which by default would be set to False (to mimic current behavior)? CACHALOT_CAUTIOUS_BEHAVIOR = True would force cachalot to only cache when everything is safe. When unknown expressions are part of the query it would not do any caching to be on the safe side. The current behavior is very optimistic and leads to unwanted cached queries.

@Andrew-Chen-Wang
Copy link
Collaborator

Can you specify where this setting would be used? Cachalot is pretty algorithmic, so we can't really tell if we're missing something or not or making a false positive besides tests. Do you mean when we add new behaviors? In which case, I haven't added any significantly new features, besides some new settings; everything's in maintenance mode since it's been working fine for me.

@dbartenstein
Copy link
Contributor Author

dbartenstein commented Aug 20, 2021

Can you specify where this setting would be used? Cachalot is pretty algorithmic, so we can't really tell if we're missing something or not or making a false positive besides tests. Do you mean when we add new behaviors? In which case, I haven't added any significantly new features, besides some new settings; everything's in maintenance mode since it's been working fine for me.

It was more like an idea. When we added cachalot we realized that there were cases where unwanted caching happened - due to the fact that not all cases were handled (e.g. annotated Case). Thus I asked whether if was possible to make cachalot behave more conservative. This would need explicit handling of expressions etc., which of course would add a lot of extra code. Not a good idea. Maybe you have some other idea how cachalot could be forced (upon request) to behave more conservative?

Edit: to be on the safe side, one could additionally check the resulting SQL str(query) using _get_tables_from_sql when running in conservative mode. If the final call of _get_tables_from_sql returned additional tables, it would be a hint that maybe some case is not yet covered by cachalot.

@Andrew-Chen-Wang
Copy link
Collaborator

Andrew-Chen-Wang commented Aug 20, 2021

This would need explicit handling of expressions etc., which of course would add a lot of extra code

I thought about it before, and it's not that bad in terms of code (it was pretty bad in terms of performance tho), but it isn't necessary since the bug reports found here or, in general, a conservative approach, misses the core dilemma: where do we even look for these expressions? Plainly, it's the ever-changing internals of the ORM and the lack of variety in the test suite that cause issues. For example, Django uses the RHS/LHS concept, and for all of cachalot history, only in Django 3.2 have I needed to add support for checking the LHS. Or the introduction of inner_query:

tables.update(_get_tables(db_alias, query.inner_query))
It took me hours to find out about it since ORM internals are never in release notes.

Taking a conservative approach vs. a liberal approach both end up at this "where do I look" dilemma. If we were to check all expressions (the conservative approach), how would we know that we checked every single QuerySet class method that might contain a cachable expression? Same vice versa for the liberal approach: how do we know we caught all the uncachable expressions?

So it boils down to:

  1. We need a more extensive test suite. Like much larger.
  2. I need to start just comparing everything in the minor version Django releases in the db module.

@dbartenstein
Copy link
Contributor Author

dbartenstein commented Aug 21, 2021

Taking a conservative approach vs. a liberal approach both end up at this "where do I look" dilemma. If we were to check all expressions (the conservative approach), how would we know that we checked every single QuerySet class method that might contain a cachable expression? Same vice versa for the liberal approach: how do we know we caught all the uncachable expressions?

So it boils down to:

1. We need a more extensive test suite. Like much larger.

👍

2. I need to start just comparing everything in the minor version Django releases in the db module.

👍

Yesterday I experimented a bit with an additional SQL check - at the end of def _get_tables(db_alias, query) I added:

# Additional check
final_check_tables = _get_tables_from_sql(connections[db_alias], str(query))
if not tables.issuperset(final_check_tables):
    print(str(query))
    print('There is a difference!', tables, final_check_tables)
    tables.update(final_check_tables)

One of the tests which surprised me at first:

test_custom_subquery (cachalot.tests.read.ReadTestCase) ... SELECT "auth_permission"."id", "auth_permission"."name", "auth_permission"."content_type_id", "auth_permission"."codename", (SELECT U0."name" FROM "cachalot_test" U0 WHERE U0."permission_id" = "auth_permission"."id" ORDER BY U0."name" ASC LIMIT 1) AS "first_permission" FROM "auth_permission" INNER JOIN "django_content_type" ON ("auth_permission"."content_type_id" = "django_content_type"."id") ORDER BY "django_content_type"."app_label" ASC, "django_content_type"."model" ASC, "auth_permission"."codename" ASC
There is a difference! {'auth_permission', 'cachalot_test'} {'cachalot_test', 'auth_permission', 'django_content_type'}
FAIL

Learning

The ordering by content_type is defined on the Permission model: https://github.com/django/django/blob/ca9872905559026af82000e46cde6f7dedc897b6/django/contrib/auth/models.py#L72

Changing content types’ labels would result in different queryset order.

  • Is that sufficient to have django_content_type returned by _get_tables? Or is it valid to only do that when the ordering is requested explicitly?
  • From your experience, how helpful is it to analyze the resulting SQL to determine the affected tables? Could this be the extra step for the convervative behavior?

@Andrew-Chen-Wang
Copy link
Collaborator

Is that sufficient to have django_content_type returned by _get_tables? Or is it valid to only do that when the ordering is requested explicitly?

The Content Type app is used by several packages and Django internally quite a lot, so it's fine (all tables are invalidated on migration). In general, if someone specifies ordering in their model's Meta like the way Permissions has, the user would assume that the tables are cached assuming they specified caching it in the settings. But for most users, they do the naive approach of selecting all models so :P

There are many things Django does behind the scenes that you don't specify explicitly in an ORM query or inside your view.

From your experience, how helpful is it to analyze the resulting SQL to determine the affected tables?

I can really only answer this in two parts:

  1. when maintaining cachalot, it's not really that helpful since my debugger has lots of class attributes with their values listed. I think the only time I found it helpful was when the compiled SQL was unexpectedly in 2 queries rather than 1 or something like the Permissions model example you gave.
  2. I guess it's pretty useful when it comes to my own applications...? It's usually something unexpected, though, in context of the question; I always check my compiled SQL queries -- since you never know if your Django ORM queries were built inefficiently -- and that's usually where I subconsciously find a mistake in my cachalot configuration.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants