Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Invalidation of data stored in a primary/replica configured DB invalidates only for primary instance #37

Closed
micku opened this issue Dec 10, 2015 · 14 comments

Comments

@micku
Copy link

micku commented Dec 10, 2015

I have a Django app with two MariaDB databases configured as primary/replica with the following configuration:

DATABASE_ROUTERS = ['app.db_replica.PrimaryReplicaRouter', ]

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': 'app',
        'USER': 'app',
        'PASSWORD': '*******************',
        'HOST': 'mysql-master',
    },
    'replica1': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': 'app',
        'USER': 'app',
        'PASSWORD': '*******************',
        'HOST': 'mysql-slave',
    },
}

The router is configured to write on primary and read on replica:

class PrimaryReplicaRouter(object):
    def db_for_read(self, model, **hints):
        return 'replica1'

    def db_for_write(self, model, **hints):
        return 'default'

    def allow_relation(self, obj1, obj2, **hints):
        db_list = ('default', 'replica1')
        if obj1._state.db in db_list and obj2._state.db in db_list:
            return True
        return None

    def allow_migrate(self, db, app_label, model=None, **hints):
        return True

When Django writes on a table, cache gets invalidated only on primary.
You can test this by configuring the router to randomly return the primary or the replica, you will see old and new value alternated refreshing the page.

Calling ./manage.py invalidate_cachalot works, I think because it invalidates all the cache ignoring the database instance.

@marccerrato
Copy link

We have recently commented the same behaviour here: #27 (comment)

@micku
Copy link
Author

micku commented Dec 11, 2015

Thank you @marccerrato, didn't see that issue since it is closed.
I think that this issue needs to have a dedicated post.

This is blocking for me since I am about to deploy a high traffic app, and I am stucked with only one DB of the two available.

Regarding your post, as far as I can tell routers does not have to implement a method that returns the list of read DB aliases, so I don't think this is a viable solution, without the need to modify existing routers adding non-standard methods. I may be wrong since I'm not that experienced with Django, and I had no time to dig deeper.

@marccerrato
Copy link

Then the only feasible options I see would be either to invalidate the table for all the available DB aliases or simply stop using the DB alias for generating the cache key.

But let's wait for @BertrandBordage thoughts... 😄

@micku
Copy link
Author

micku commented Dec 11, 2015

Probably the best way is to stop using DB aliases, I see that DB routers are mainly used for two purposes:

  • Primary/replica
  • Separation of apps data

In both cases table name, which contains the name of the app, seems to be a valid key since you can't have two apps with the same name in a single Django project, or two tables with the same name in one app as far as I know.

Maybe I am missing something :)

@BertrandBordage
Copy link
Member

Short answer: you can’t use a replica DB with django-cachalot, and I will add it to the docs.

Long answer:
The problem is not coming from the router. The router only selects a database or another, and cachalot has no problem with multiple databases.
The only problem is that one of the databases is updated outside Django, so cachalot has no way of knowing what has changed! The replica is automatically modified on each modification of the primary database, but from the Django point of view, these are two independent databases.
So you should stop using django-cachalot in this way.
I will add a note in the documentation about that.

@micku:
./manage.py invalidate_cachalot works because it invalidates the cache for all configured databases and caches (unless you specify otherwise using command parameters).

@micku
Copy link
Author

micku commented Dec 18, 2015

Thank you @BertrandBordage for your response.
Unfortunately in my case primary/replica is mandatory due to high availability requirement.
At this point I think that the only alternative is to try with another module, it's unfortunate since I was really happy with cachalot.

@marccerrato
Copy link

@BertrandBordage the other option was to invalidate the table for all the available database aliases. Would it make sense to you?

@BertrandBordage
Copy link
Member

@marccerrato: Yes, you’re right, but it can’t be done automatically, you have to do it yourself, and that’s what I’ll add to the docs.
Using the post_invalidation signal you can achieve this.

@micku, @marccerrato, could you therefore test if this works correctly?

from cachalot.api import invalidate
from cachalot.signal post_invalidation

@receiver(post_invalidation)
def invalidate_replica(sender, **kwargs):
    if kwargs['db_alias'] == 'default':
        invalidate(sender, db_alias='replica')

@micku Other similar django libraries will have the same problem if I remember correctly how they work.

@micku
Copy link
Author

micku commented Dec 18, 2015

@BertrandBordage I'll be very happy to test this for you as soon as I can!
And thank you for you advice.
As for other libraries, I see that django-cacheops uses the query as a key, and has a config param named db_agnostic which changes this behaviour, but I have to dig more to be sure, so I prefer to continue using cachalot if this snippet works.

@xino12
Copy link

xino12 commented Dec 18, 2015

@BertrandBordage The solution works :)

Thanks!
The code should be like this:

from cachalot.api import invalidate
from cachalot.signals import post_invalidation
from django.dispatch import receiver
@receiver(post_invalidation)
def invalidate_replica(sender, **kwargs):
    if kwargs['db_alias'] == 'default':
        invalidate(sender, db_alias='replica1')

@micku Exactly for your case.

@BertrandBordage
Copy link
Member

@xino12 Cool :) Glad it works as expected! Yes, I forgot an import, that’s right.

@micku
Copy link
Author

micku commented Dec 18, 2015

Thank you @xino12, I'm implementing it on our stage environment right now!
I'll be back after my tests.

@micku
Copy link
Author

micku commented Dec 18, 2015

Works really well!
Thank you @BertrandBordage and @xino12.

Just a note, I had to run ./manage.py invalidate_cachalot after adding the code for it to become effective.

@BertrandBordage
Copy link
Member

Fixed by 286e590.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants