My Pygotham 2018 talk titled: How to Stop Worrying and Love atomic(), Banking Blunders & Concurrency Challenges
Clone or download
Latest commit 46ecdf7 Dec 5, 2018
Permalink
Type Name Latest commit message Commit time
Failed to load latest commit information.
slides.html initial commit Oct 6, 2018
slides.jpg initial commit Oct 6, 2018
README.md Update README.md Dec 4, 2018
slides.pdf initial commit Oct 6, 2018

README.md

How I Learned to Stop Worrying and Love atomic()
Banking Blunders and Concurrency Challenges

How to deal with concurrency, money, and log-structured data in the Django ORM

Nick Sweeting | @theSquashSH on Twitter | @pirate on Github | LinkedIn | Co-Founder/CTO @ Monadical building OddSlingers Poker


PyGotham Talk Page
#pygotham2018 #python #django #distributed-systems #concurrency #databases

(sorry for the overused "how I learned..." title)

Video

Watch the PyGotham 2018 talk video on YouTube

Slides

You can view the slides here:

Overview

What do Superman III, Hackers, and Office Space all have in common? Find out in this talk, along with some concurrency, database integrity, and financial data safety fundamentals. This is a technical talk going over the core principles in database integrity and data safety, it assumes familiarity with the Django ORM.

Description

Did you know every Django app already behaves like a distributed system, even when it’s running only on one server? In this talk I’ll go over some of the distributed systems & database fundamentals that you’ll neeed to understand when building a Python project that handles sensitive data. We’ll focus on intermediate and advanced usage of the Django ORM, but many of the concepts apply equally well to SQLAlchemy and other Python ORMs.

We’ll go over:

  • how to use immutable append-only logs to order events across your system (log-structured storage)
  • the importance of accurate timestamps, and why that's difficult in distributed systems
  • the meaning of transaction isolation levels
  • how and when to use locking (pessimistic concurrency)
  • how and when to use atomic compare-and-swaps (optimistic concurrency)
  • how and when to use a hybrid optimistic/pessimistic approach like MVCC
  • type safety for currencies, and math operations to avoid
  • new distributed-SQL databases like spanner, TiDB, and Cockroachdb
  • transaction lifecycles when doing async processing with django-channels

We spent the last two years building an online poker engine based on Django + channels, and we have plenty of stories about our failures and discoveries to share along the way. Come learn about all the ways it’s possible to screw up when handling sensitive data, and how to avoid them!

Notes

One thing I didn't cover in the talk is that SQL can do something called "gap locking". That is if you have an index for a given column, and you perform a .select_for_update() with a filter, it won't just lock the rows that match the filter, it will actually prevent any new rows from being added that match the filter while the lock is held, which lets you effectively lock append-only tables without needing to lock the entire table.

(Thanks to Sam Kimbrel for telling me about this feature in the hall after the talk)

Example:

class BalanceTransfer(models.Model):
    src = models.ForeignKey(User)
    dst = models.ForeignKey(User)
    amt = models.DecimalField(max_digits=20, decimal_places=2)
    timestamp = models.DateTimeField(auto_now_add=True)

...
with transaction.atomic():
    lock = BalanceTransfer.objects.select_for_update().filter(Q(src=user) | Q(dst=user))
    ...
    # no new rows can be added matching Q(src=user) | Q(dst=user) during this time
    # meaning we can safely do some logic without a user's balance being changed by new transfers added to the table
    withdraw(user, 5000)

Gap locking only works on columns that are indexed (all columns must be indexed together for the query you're tring to lock). By default, all django ForeignKey columns are indexed, so you only need to worry about that if you want to gap lock a field like CharField, DecimalField, DateTimeField, etc.

Further Reading