Cached values may be stale after updating in a transaction #296

Closed
dtao opened this issue May 31, 2018 · 3 comments
@dtao
Contributor

dtao commented May 31, 2018

On Bitbucket we use waffle flags/samples/switches in some very high-traffic scenarios, which can lead to race conditions like this one.

Recently we uncovered another one: since BaseModel.save() calls flush() immediately after saving, there is the potential for a separate process to read the previous value and cache it before the transaction is committed, resulting in the stale value being cached indefinitely.
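To make the interleaving concrete, here is a minimal sketch that replays it with plain dictionaries instead of a real database and cache. Everything in it (`simulate`, the `committed` and `cache` dicts, the switch name) is hypothetical and only models the ordering described above; it is not waffle's actual code.

```python
def simulate(flush_after_commit):
    """Replay the writer/reader interleaving and return what a later read sees."""
    committed = {"my-switch": False}  # rows visible to other connections
    cache = {}

    def switch_is_active(name):
        # Read-through cache: on a miss, load the committed value and keep it.
        if name not in cache:
            cache[name] = committed[name]
        return cache[name]

    switch_is_active("my-switch")  # warm the cache with the old value

    # Writer: save() inside a transaction that is still open.
    pending = {"my-switch": True}
    if not flush_after_commit:
        cache.pop("my-switch", None)  # flush() runs immediately after save()

    # Reader in another process, before the commit. If the cache was just
    # flushed, this miss re-caches the old committed value.
    switch_is_active("my-switch")

    committed.update(pending)  # the writer's transaction commits
    if flush_after_commit:
        cache.pop("my-switch", None)  # flush() deferred until after the commit

    return switch_is_active("my-switch")


stale_result = simulate(flush_after_commit=False)
fresh_result = simulate(flush_after_commit=True)
```

With the immediate flush, the later read still returns the old value even though the new one is committed; deferring the flush until after the commit avoids that in this model.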

You might think this is an edge case, but the Django admin site always wraps changes in transactions, so this applies to any application where flags and switches are flipped mainly through the admin UI.

The most natural fix for this would be to change BaseModel.save() to something like:

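from django.db import transaction
from django.utils import timezone

def save(self, *args, **kwargs):
    self.modified = timezone.now()
    ret = super(BaseModel, self).save(*args, **kwargs)
    # Defer cache invalidation until the surrounding transaction commits.
    transaction.on_commit(self.flush)
    return ret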

However, transaction.on_commit only became available in Django 1.9, so this wouldn't work out of the box on previous versions of Django. If this were my library, I'd probably just add a runtime check to use on_commit if available, otherwise call flush() directly and add a note in the docs to library consumers: sorry, but there is a race condition unless you're on Django>=1.9.
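The runtime check could look roughly like the sketch below. `flush_after_commit` is a hypothetical helper name, not something that exists in waffle; in the real `save()` the `transaction` argument would be `django.db.transaction` and `flush` would be `self.flush`. Stand-in objects are used here so the snippet runs without Django installed.

```python
import types


def flush_after_commit(transaction, flush):
    """Defer flush until commit when on_commit exists, else flush right away."""
    on_commit = getattr(transaction, "on_commit", None)
    if on_commit is not None:
        on_commit(flush)  # Django >= 1.9: runs once the transaction commits
    else:
        flush()  # older Django: immediate flush, race condition remains


# Stand-ins for django.db.transaction with and without on_commit (assumptions).
deferred = []
modern = types.SimpleNamespace(on_commit=deferred.append)
legacy = types.SimpleNamespace()

flushed = []
flush_after_commit(modern, lambda: flushed.append("modern"))
flush_after_commit(legacy, lambda: flushed.append("legacy"))
```

After these two calls only the legacy path has flushed; the modern path has queued its callback in `deferred`, to be run at commit time.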

The reason I am opening an issue and not a pull request is something else entirely. The current test suite seems to use sqlite, but the best way I can come up with to demonstrate the problem is to use threads, and it seems sqlite doesn't play nicely with threads. At least, it didn't work for me locally (the linked issue appears to have a solution, but this is about where I stopped digging).

In any case, here is a test that can be added if someone wants to put in the effort to make it work:

import threading

from django.db import transaction
from django.test import TransactionTestCase

from waffle import switch_is_active
from waffle.models import Switch


class WaffleTestCase(TransactionTestCase):
    def test_update_switch_in_transaction(self):
        """Wait to invalidate the cache until after the current transaction."""

        switch_name = 'transaction-switch-name'
        switch = Switch.objects.create(name=switch_name, active=False)
        self.addCleanup(switch.delete)

        switch_written_in_background_thread = threading.Event()
        switch_read_in_main_thread = threading.Event()

        @transaction.atomic
        def update_switch():
            switch.active = True
            switch.save()

            # Signal to the main thread that the switch has been updated, but
            # the transaction is not yet committed.
            switch_written_in_background_thread.set()

            # Pause here to allow the main thread to make an assertion.
            switch_read_in_main_thread.wait(timeout=1)

        # Start a background thread to update the switch in a transaction.
        t = threading.Thread(target=update_switch)
        t.daemon = True
        t.start()

        # After the switch is updated but before the transaction is committed,
        # the cache will still have the previous value.
        switch_written_in_background_thread.wait(timeout=1)
        assert not switch_is_active(switch_name)

        # After the transaction is committed, the cache should have been
        # invalidated, hence the next call to switch_is_active should have the
        # correct value.
        switch_read_in_main_thread.set()
        t.join(timeout=1)
        assert switch_is_active(switch_name)
@jsocol
Collaborator

jsocol commented May 31, 2018

Interesting!

So I'm OK with the proposed fix more or less as-is, and here are some bonus thoughts:

  • Can definitely feature-detect transaction.on_commit, but I don't think Django 1.8 is still an active LTS, so we can also probably just assume it and support >=1.11.
  • I'd be OK using a real DB for testing on Travis but would rather not require it locally.
  • I'm OK skipping impossible tests based on the DB backend, or even not having explicit tests for race conditions. This one looks fairly stable, but I know they tend to be slow and flaky.
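
Skipping by backend could be sketched along these lines. `skip_on_backends` is a hypothetical helper, and the vendor string is passed in by hand so the snippet runs without Django; in a Django project it would presumably come from `django.db.connection.vendor`.

```python
import unittest


def skip_on_backends(vendor, unsupported=("sqlite",)):
    """Return a decorator that skips a test when vendor is in unsupported."""
    return unittest.skipIf(
        vendor in unsupported,
        "race-condition test needs a backend with concurrent connections",
    )


class RaceConditionTests(unittest.TestCase):
    # "sqlite" is hard-coded here for illustration only (assumption).
    @skip_on_backends("sqlite")
    def test_update_switch_in_transaction(self):
        self.fail("not reached when the backend is sqlite")


suite = unittest.defaultTestLoader.loadTestsFromTestCase(RaceConditionTests)
result = unittest.TestResult()
suite.run(result)
```

This keeps the threaded test in the suite for a real database on CI while letting a local sqlite run pass.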

@stale

stale bot commented Jun 30, 2018

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Jun 30, 2018
@stale

stale bot commented Jul 14, 2018

This issue has been closed as stale because it has not had any recent activity. Please feel free to re-open if the issue is still relevant. Thank you for your contributions!
