Queryset with annotated Now() is cached #193

dbartenstein · 2021-08-18T08:49:38Z

What happened?

An annotated query including Now() is cached.

What should've happened instead?

An annotated query containing Now() must not be cached.

Steps to reproduce

from django.db.models.functions import Now
# Replace Vehicle with some sort of cachable model
Vehicle.objects.all().annotate(now=Now())

Debian Linux
Django version 3.2 LTS
Postgresql 10

The text was updated successfully, but these errors were encountered:

dbartenstein · 2021-08-18T21:14:21Z

@Andrew-Chen-Wang: this is a constructed case of course which I came up with: so there is no need for a temporary cure.

One feasible approach would be to do an additional check at:

django-cachalot/cachalot/utils.py

Line 176 in 9d52865

if isinstance(annotation, Subquery):

if isinstance(annotation, Now):
    raise UncachableQuery

Wouldn’t it be most obvious to simply deactivate caching - on the lines of how the Now() function is handled in the following lines of code?

django-cachalot/cachalot/utils.py

Line 117 in 9d52865

elif h_class in UNCACHABLE_FUNCS:

Another question

I also noted that annotated condititional expression do not seem to be handled? Is this perception correct?
It would be something like:
isinstance(annotation, Case)

Andrew-Chen-Wang · 2021-08-19T02:41:45Z

Ah I just noticed the UNCACHABLE_FUNC already has Now. Apologies. Surprisingly, the filter behavior passes in our tests but somehow the annotation for Now, as you've rightly pointed out, isn't being tested:

django-cachalot/cachalot/tests/read.py

Lines 1034 to 1046 in 9d52865

    
               def test_now(self): 
        
                   """ 
        
                   Checks that queries with a Now() parameter are not cached. 
        
                   """ 
        
                   obj = Test.objects.create(datetime='1992-07-02T12:00:00') 
        
                   qs = Test.objects.filter( 
        
                       datetime__lte=Now()) 
        
                   with self.assertNumQueries(1): 
        
                       obj1 = qs.get() 
        
                   with self.assertNumQueries(1): 
        
                       obj2 = qs.get() 
        
                   self.assertEqual(obj1, obj2) 
        
                   self.assertEqual(obj1, obj)

Did you happen to have this behavior on Django 3.1 and/or on previous cachalot versions? Django 3.2 brought a weird change that had deeply hidden nuances, and the changes I made or Django made may have affected the ORM module.

One feasible approach would be to do an additional check

That's OK for a temporary solution, but it doesn't resolve the worry that these annotations aren't being caught in the utils.py on L117.

I also noted that annotated condititional expression do not seem to be handled? Is this perception correct?

They are handled I think; though, I suppose we don't have a test case for Case or Where specifically.

dbartenstein · 2021-08-19T08:12:12Z

Did you happen to have this behavior on Django 3.1 and/or on previous cachalot versions? Django 3.2 brought a weird change that had deeply hidden nuances, and the changes I made or Django made may have affected the ORM module.

We have been using cachalot for only two weeks now and as we are always on the latest LTS Django version, we have only tested it with Django 3.2.

They are handled I think; though, I suppose we don't have a test case for Case or Where specifically.
I don’t think so. I will come up with an example.

Is it ok for you if I open a new PR with additional tests that demonstrate use cases which are not handled yet? In a further step we could then try to fix these cases in the same PR. What do you think?

Andrew-Chen-Wang · 2021-08-19T16:40:13Z

Approved the CI. Thanks for the quick write-up!

Andrew-Chen-Wang · 2021-08-19T18:17:35Z

@dbartenstein thanks for the quick test and great bug reports. Both are resolved in #195 and #196

dbartenstein · 2021-08-19T19:28:57Z

@dbartenstein thanks for the quick test and great bug reports. Both are resolved in #195 and #196

@Andrew-Chen-Wang: wuhu - you’re quick! 👍 I’ll have a look at our code base and check it for additional sophisticated queries which might not yet be covered by cachalot.

For many uses cases not caching a query is better than falsely not invalidating one. What do you generally think of introducing a setting CACHALOT_CAUTIOUS_BEHAVIOR which by default would be set to False (to mimic current behavior)? CACHALOT_CAUTIOUS_BEHAVIOR = True would force cachalot to only cache when everything is safe. When unknown expressions are part of the query it would not do any caching to be on the safe side. The current behavior is very optimistic and leads to unwanted cached queries.

Andrew-Chen-Wang · 2021-08-20T00:03:16Z

Can you specify where this setting would be used? Cachalot is pretty algorithmic, so we can't really tell if we're missing something or not or making a false positive besides tests. Do you mean when we add new behaviors? In which case, I haven't added any significantly new features, besides some new settings; everything's in maintenance mode since it's been working fine for me.

dbartenstein · 2021-08-20T08:16:21Z

Can you specify where this setting would be used? Cachalot is pretty algorithmic, so we can't really tell if we're missing something or not or making a false positive besides tests. Do you mean when we add new behaviors? In which case, I haven't added any significantly new features, besides some new settings; everything's in maintenance mode since it's been working fine for me.

It was more like an idea. When we added cachalot we realized that there were cases where unwanted caching happened - due to the fact that not all cases were handled (e.g. annotated Case). Thus I asked whether if was possible to make cachalot behave more conservative. This would need explicit handling of expressions etc., which of course would add a lot of extra code. Not a good idea. Maybe you have some other idea how cachalot could be forced (upon request) to behave more conservative?

Edit: to be on the safe side, one could additionally check the resulting SQL str(query) using _get_tables_from_sql when running in conservative mode. If the final call of _get_tables_from_sql returned additional tables, it would be a hint that maybe some case is not yet covered by cachalot.

Andrew-Chen-Wang · 2021-08-20T17:54:41Z

This would need explicit handling of expressions etc., which of course would add a lot of extra code

I thought about it before, and it's not that bad in terms of code (it was pretty bad in terms of performance tho), but it isn't necessary since the bug reports found here or, in general, a conservative approach, misses the core dilemma: where do we even look for these expressions? Plainly, it's the ever-changing internals of the ORM and the lack of variety in the test suite that cause issues. For example, Django uses the RHS/LHS concept, and for all of cachalot history, only in Django 3.2 have I needed to add support for checking the LHS. Or the introduction of inner_query:

django-cachalot/cachalot/utils.py

Line 201 in b15027a

tables.update(_get_tables(db_alias, query.inner_query))

It took me hours to find out about it since ORM internals are never in release notes.

Taking a conservative approach vs. a liberal approach both end up at this "where do I look" dilemma. If we were to check all expressions (the conservative approach), how would we know that we checked every single QuerySet class method that might contain a cachable expression? Same vice versa for the liberal approach: how do we know we caught all the uncachable expressions?

So it boils down to:

We need a more extensive test suite. Like much larger.
I need to start just comparing everything in the minor version Django releases in the db module.

dbartenstein · 2021-08-21T12:04:18Z

Taking a conservative approach vs. a liberal approach both end up at this "where do I look" dilemma. If we were to check all expressions (the conservative approach), how would we know that we checked every single QuerySet class method that might contain a cachable expression? Same vice versa for the liberal approach: how do we know we caught all the uncachable expressions?

So it boils down to:
1. We need a more extensive test suite. Like much larger.

👍

2. I need to start just comparing everything in the minor version Django releases in the db module.

👍

Yesterday I experimented a bit with an additional SQL check - at the end of def _get_tables(db_alias, query) I added:

# Additional check
final_check_tables = _get_tables_from_sql(connections[db_alias], str(query))
if not tables.issuperset(final_check_tables):
    print(str(query))
    print('There is a difference!', tables, final_check_tables)
    tables.update(final_check_tables)

One of the tests which surprised me at first:

test_custom_subquery (cachalot.tests.read.ReadTestCase) ... SELECT "auth_permission"."id", "auth_permission"."name", "auth_permission"."content_type_id", "auth_permission"."codename", (SELECT U0."name" FROM "cachalot_test" U0 WHERE U0."permission_id" = "auth_permission"."id" ORDER BY U0."name" ASC LIMIT 1) AS "first_permission" FROM "auth_permission" INNER JOIN "django_content_type" ON ("auth_permission"."content_type_id" = "django_content_type"."id") ORDER BY "django_content_type"."app_label" ASC, "django_content_type"."model" ASC, "auth_permission"."codename" ASC
There is a difference! {'auth_permission', 'cachalot_test'} {'cachalot_test', 'auth_permission', 'django_content_type'}
FAIL

Learning

The ordering by content_type is defined on the Permission model: https://github.com/django/django/blob/ca9872905559026af82000e46cde6f7dedc897b6/django/contrib/auth/models.py#L72

Changing content types’ labels would result in different queryset order.

Is that sufficient to have django_content_type returned by _get_tables? Or is it valid to only do that when the ordering is requested explicitly?
From your experience, how helpful is it to analyze the resulting SQL to determine the affected tables? Could this be the extra step for the convervative behavior?

Andrew-Chen-Wang · 2021-08-21T12:29:37Z

Is that sufficient to have django_content_type returned by _get_tables? Or is it valid to only do that when the ordering is requested explicitly?

The Content Type app is used by several packages and Django internally quite a lot, so it's fine (all tables are invalidated on migration). In general, if someone specifies ordering in their model's Meta like the way Permissions has, the user would assume that the tables are cached assuming they specified caching it in the settings. But for most users, they do the naive approach of selecting all models so :P

There are many things Django does behind the scenes that you don't specify explicitly in an ORM query or inside your view.

From your experience, how helpful is it to analyze the resulting SQL to determine the affected tables?

I can really only answer this in two parts:

when maintaining cachalot, it's not really that helpful since my debugger has lots of class attributes with their values listed. I think the only time I found it helpful was when the compiled SQL was unexpectedly in 2 queries rather than 1 or something like the Permissions model example you gave.
I guess it's pretty useful when it comes to my own applications...? It's usually something unexpected, though, in context of the question; I always check my compiled SQL queries -- since you never know if your Django ORM queries were built inefficiently -- and that's usually where I subconsciously find a mistake in my cachalot configuration.

Andrew-Chen-Wang added needs info bug and removed needs info labels Aug 18, 2021

This comment has been minimized.

Sign in to view

This was referenced Aug 19, 2021

Check for Uncachables (i.e. Now) in annotations #195

Merged

Check for annotated case #196

Merged

Andrew-Chen-Wang closed this as completed in #195 Aug 19, 2021

oliwarner mentioned this issue Oct 17, 2022

Excluding certain queries #124

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Queryset with annotated Now() is cached #193

Queryset with annotated Now() is cached #193

dbartenstein commented Aug 18, 2021

This comment has been minimized.

dbartenstein commented Aug 18, 2021 •

edited

Andrew-Chen-Wang commented Aug 19, 2021 •

edited

dbartenstein commented Aug 19, 2021 •

edited

Andrew-Chen-Wang commented Aug 19, 2021

Andrew-Chen-Wang commented Aug 19, 2021

dbartenstein commented Aug 19, 2021 •

edited

Andrew-Chen-Wang commented Aug 20, 2021

dbartenstein commented Aug 20, 2021 •

edited

Andrew-Chen-Wang commented Aug 20, 2021 •

edited

dbartenstein commented Aug 21, 2021 •

edited

Andrew-Chen-Wang commented Aug 21, 2021

Queryset with annotated Now() is cached #193

Queryset with annotated Now() is cached #193

Comments

dbartenstein commented Aug 18, 2021

What happened?

What should've happened instead?

Steps to reproduce

This comment has been minimized.

dbartenstein commented Aug 18, 2021 • edited

Another question

Andrew-Chen-Wang commented Aug 19, 2021 • edited

dbartenstein commented Aug 19, 2021 • edited

Andrew-Chen-Wang commented Aug 19, 2021

Andrew-Chen-Wang commented Aug 19, 2021

dbartenstein commented Aug 19, 2021 • edited

Andrew-Chen-Wang commented Aug 20, 2021

dbartenstein commented Aug 20, 2021 • edited

Andrew-Chen-Wang commented Aug 20, 2021 • edited

dbartenstein commented Aug 21, 2021 • edited

Learning

Andrew-Chen-Wang commented Aug 21, 2021

dbartenstein commented Aug 18, 2021 •

edited

Andrew-Chen-Wang commented Aug 19, 2021 •

edited

dbartenstein commented Aug 19, 2021 •

edited

dbartenstein commented Aug 19, 2021 •

edited

dbartenstein commented Aug 20, 2021 •

edited

Andrew-Chen-Wang commented Aug 20, 2021 •

edited

dbartenstein commented Aug 21, 2021 •

edited