## Deadlocks

Deadlocks are the Charybdis to the Scylla of race conditions.  That is, avoiding one puts you in danger of suffering the other.

In [1]:
from threading import Thread, current_thread, Lock
from time import sleep
from random import randint

## A simple example

Let's start by analyzing a simple example: transfers between two "bank accounts":

In [2]:
def move_funds(from_, to_):
    global accounts, kill_switch
    initial_total = accounts[from_] + accounts[to_] 
    name = current_thread().name
    
    for n in range(1_000_000):
        transfer = randint(1, 100)
        accounts[from_] -= transfer
        accounts[to_] += transfer        
        total = accounts[from_] + accounts[to_]
        
        # Exit if balance wrong or if another thread thinks so
        if total != initial_total:
            print(f"{name} inconsistent balance: ${total:,} ({n:,} transactions)")
            kill_switch = True
            break
        elif kill_switch:
            print(f"{name} other thread flagged: ${total:,} ({n:,} transactions)")
            break
    else:
        print(f'{name} reached iteration limit. Stopping...')            

The augmented assignments, `accounts[from_] -= transfer` and `accounts[to_] += transfer`, can introduce race conditions, because each one reads a balance, computes a new value, and writes it back as separate steps.
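To make the hazard concrete, the sketch below replays one unlucky interleaving by hand in a single thread, so the result is deterministic. The names `a_read` and `b_read` are illustrative only and are not part of the lesson code.

```python
# Hand-simulated interleaving of two "threads" updating the same balance.
accounts = {"acc1": 100}

a_read = accounts["acc1"]        # thread A reads 100 (it wants to subtract 30)
b_read = accounts["acc1"]        # thread B reads 100 (it wants to add 50)
accounts["acc1"] = a_read - 30   # thread A writes 70
accounts["acc1"] = b_read + 50   # thread B writes 150, overwriting A's update

print(accounts["acc1"])  # 150, not the 120 that two atomic updates would give
```

Thread A's withdrawal is lost entirely, which is the kind of inconsistency the running threads detect when the total no longer matches.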

In [12]:
kill_switch = False
accounts = {'acc1': 100_000, 'acc2': 100_000}

t1 = Thread(target=move_funds, args=('acc1', 'acc2'))
t2 = Thread(target=move_funds, args=('acc2', 'acc1'))

t1.start()
t2.start()
t1.join()
t2.join()

print("Balances:", accounts)

Thread-14 inconsistent balance: $109,598 (6,612 transactions)
Thread-15 inconsistent balance: $109,557 (1,803 transactions)
Balances: {'acc1': -236058, 'acc2': 345656}


## Adding Locks

In the last lesson, you learned about locks. We can use them to try to synchronize access to the accounts. We'll create two locks, one for each account:

In [4]:
def move_funds2(from_, lock_from, to_, lock_to):
    initial_total = accounts[from_] + accounts[to_]
    name = current_thread().name
    
    for n in range(1_000_000):
        amount = randint(1, 100)
        with lock_from, lock_to:
            accounts[from_] -= amount
            accounts[to_] += amount

            total = accounts[from_] + accounts[to_]
            if total != initial_total:
                print(f"{name} inconsistent balance: ${total:,} ({n:,} transactions)")
                break
    else:
        print(f'{name} reached iteration limit. Stopping...')

In [5]:
lock_acc1, lock_acc2 = Lock(), Lock()
accounts = {'acc1': 100_000, 'acc2': 100_000}

t1 = Thread(target=move_funds2, args=('acc1', lock_acc1, 'acc2', lock_acc2))
t2 = Thread(target=move_funds2, args=('acc2', lock_acc1, 'acc1', lock_acc2))

t1.start()
t2.start()

while t1.is_alive() or t2.is_alive():
    print("t1 alive?", t1.is_alive(), "| t2 alive?", t2.is_alive(), "|", accounts)
    sleep(3)

print("acc1 locked?", lock_acc1.locked(), "| acc2 locked?", lock_acc2.locked())

t1 alive? True | t2 alive? True | {'acc1': -10231, 'acc2': 210231}
t1 alive? True | t2 alive? True | {'acc1': 253564, 'acc2': -53564}
t1 alive? True | t2 alive? True | {'acc1': 436827, 'acc2': -236827}
t1 alive? True | t2 alive? True | {'acc1': 505229, 'acc2': -305229}
t1 alive? True | t2 alive? True | {'acc1': 455660, 'acc2': -255660}
t1 alive? True | t2 alive? True | {'acc1': 410521, 'acc2': -210521}
Thread-7 reached iteration limit. Stopping...
Thread-6 reached iteration limit. Stopping...
acc1 locked? False | acc2 locked? False


It worked (this time)! Access to the accounts is protected by the locks. 

But there is a danger lurking here. We succeeded more or less accidentally because of the order in which locks were acquired: both threads were handed `lock_acc1` first and `lock_acc2` second, so neither could hold one lock while waiting on the other. That also means the code is conceptually wrong, because thread `t2` used `lock_acc1` to protect `acc2`.

If we make a small change, altering the order of the locks that are passed to our threads, we will find ourselves deadlocked:

In [14]:
lock_acc1, lock_acc2 = Lock(), Lock()
accounts = {'acc1': 100_000, 'acc2': 100_000}

t1 = Thread(target=move_funds2, args=('acc1', lock_acc1, 'acc2', lock_acc2))
t2 = Thread(target=move_funds2, args=('acc2', lock_acc2, 'acc1', lock_acc1))
print("Threads created:", accounts)
t1.start()
t2.start()
print("Threads started:", accounts)

for _ in range(8):
    print("t1 alive?", t1.is_alive(), "| t2 alive?", t2.is_alive(), "|", accounts)
    sleep(3)

Threads created: {'acc1': 100000, 'acc2': 100000}
Threads started: {'acc1': 135771, 'acc2': 64229}
t1 alive? True | t2 alive? True | {'acc1': 168248, 'acc2': 31752}
t1 alive? True | t2 alive? True | {'acc1': 168248, 'acc2': 31752}
t1 alive? True | t2 alive? True | {'acc1': 168248, 'acc2': 31752}
t1 alive? True | t2 alive? True | {'acc1': 168248, 'acc2': 31752}
t1 alive? True | t2 alive? True | {'acc1': 168248, 'acc2': 31752}
t1 alive? True | t2 alive? True | {'acc1': 168248, 'acc2': 31752}
t1 alive? True | t2 alive? True | {'acc1': 168248, 'acc2': 31752}
t1 alive? True | t2 alive? True | {'acc1': 168248, 'acc2': 31752}


Both locks remain locked.

In [7]:
lock_acc1.locked(), lock_acc2.locked()

(True, True)

These threads will never die, and the locks will never be released.  We can do something a bit hack-ish to repeatedly release the locks until both threads crash and end.  This would almost never be a good idea in a production program, but we do it just for teaching.

In [15]:
nerr, err_info = 0, None
while t1.is_alive() or t2.is_alive():
    try:
        lock_acc1.release(), lock_acc2.release()
    except Exception as err:
        sleep(1e-9)
        err_info = repr(err)
        nerr += 1
        
print(f"{nerr:,} errors of type {err_info}")
# Status of threads and locks
t1.is_alive(), lock_acc1.locked(), t2.is_alive(), lock_acc2.locked()

Exception in thread Thread-18:
Traceback (most recent call last):
  File "/home/dmertz/miniconda3/envs/INE/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/home/dmertz/miniconda3/envs/INE/lib/python3.8/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "<ipython-input-4-4b94722f1175>", line 14, in move_funds2
RuntimeError: release unlocked lock
Exception in thread Thread-19:
Traceback (most recent call last):
  File "/home/dmertz/miniconda3/envs/INE/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/home/dmertz/miniconda3/envs/INE/lib/python3.8/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "<ipython-input-4-4b94722f1175>", line 14, in move_funds2
RuntimeError: release unlocked lock


25 errors of type RuntimeError('release unlocked lock')


(False, False, False, False)
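The trick above relies on a property of the plain `Lock`: it has no owner, so any thread may release it, not only the thread that acquired it. The following minimal sketch contrasts this with `RLock`, which refuses a release from a thread that does not own it. The helper name `release_from_other_thread` and the `outcome` dictionary are made up for illustration.

```python
from threading import Thread, Lock, RLock

lock, rlock = Lock(), RLock()
lock.acquire()    # both locks are acquired by the main thread
rlock.acquire()
outcome = {}

def release_from_other_thread():
    lock.release()                 # allowed: a plain Lock has no owner
    outcome["lock"] = "released"
    try:
        rlock.release()            # not allowed: this thread does not own it
        outcome["rlock"] = "released"
    except RuntimeError:
        outcome["rlock"] = "RuntimeError"

t = Thread(target=release_from_other_thread)
t.start()
t.join()
rlock.release()                    # the owning (main) thread can release it

print(outcome, "| lock still held?", lock.locked())
```

This is also why the forced release makes the worker threads crash: when they later leave their `with` block, they try to release a lock that is already unlocked.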

## What are Deadlocks?

A deadlock occurs when multiple threads (or processes, or nodes in a cluster) are mutually suspended, each waiting for another to signal that processing may continue. Because the waiting is circular, none of them can ever proceed. As soon as there are at least two resources that might be needed together, deadlocks are a danger.


| Step | Status       | Thread-1            | Thread-2            | Thread-3 
|:-----|:-------------|:--------------------|:--------------------|:--------------------
| 1    | Success      | Acquire *lock_A*    | Acquire *lock_B*    | Acquire *lock_C*        
| 2    | **Blocked**  | **Wait for lock_C** | **Wait for lock_A** | **Wait for lock_B**
| 3    | Cannot Reach | Wait for lock_B     | Wait for lock_C     | Wait for lock_A
| 4    | Cannot Reach | Work with A/B/C     | Work with A/B/C     | Work with A/B/C
| 5    | Cannot Reach | Release *lock_B*    | Release *lock_C*    | Release *lock_A*
| 6    | Cannot Reach | Release *lock_C*    | Release *lock_A*    | Release *lock_B*
| 7    | Cannot Reach | Release *lock_A*    | Release *lock_B*    | Release *lock_C*
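The circular wait in the table can be reproduced safely with a short sketch. It assumes a `Barrier` to force every thread to hold its first lock before asking for the second, and a timeout on the second acquisition so the program ends instead of hanging. The names `plan`, `worker`, and `got_second` are illustrative.

```python
from threading import Thread, Lock, Barrier

locks = {"A": Lock(), "B": Lock(), "C": Lock()}
# (lock held first, lock wanted next), matching steps 1 and 2 of the table
plan = [("A", "C"), ("B", "A"), ("C", "B")]
barrier = Barrier(len(plan))
got_second = {}

def worker(first, second):
    with locks[first]:
        barrier.wait()   # step 1 done: every thread now holds its first lock
        # Step 2: without the timeout, this call would block forever
        got_second[first] = locks[second].acquire(timeout=0.2)
        barrier.wait()   # keep holding the first lock until all have given up
        if got_second[first]:
            locks[second].release()

threads = [Thread(target=worker, args=pair) for pair in plan]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(got_second)  # every value is False: no thread obtained its second lock
```

Each thread holds exactly the lock that another thread is waiting for, so none of the three second acquisitions can succeed.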

In *Operating System Concepts*, Avi Silberschatz describes a non-computing example:

> Perhaps the best illustration of a deadlock can be drawn from a law passed by the Kansas legislature early in the 20th century. It said, in part: “When two trains approach each other at a crossing, both shall come to a full stop and neither shall start up again until the other has gone.”

### How to prevent deadlocks

The unfortunate truth is that it is **very hard** to prevent deadlocks. 

One simple technique is to always use timeouts when trying to acquire locks. If you need N shared locks and cannot acquire all N within the timeout, release the ones you did acquire and start over. We can refactor the funds transfer this way.

In [9]:
def move_funds3(from_, lock_from, to_, lock_to):
    name, T = current_thread().name, LOCK_TIMEOUT
    all_locks = [lock_from, lock_to]
    
    for n in range(10_000):
        amount = randint(1, 100)
        
        # Acquire all locks, if failure, release and keep trying
        while not all(locks_good := [l.acquire(timeout=T) for l in all_locks]):
            for i, acquired in enumerate(locks_good):
                if acquired:
                    all_locks[i].release()
        
        # Perform the action on locked resources
        # (the validation of 'accounts' from earlier versions is omitted)
        accounts[from_] -= amount
        accounts[to_] += amount
        
        # Release all locks
        for lock in all_locks:
            lock.release()

    print(f'{name} reached iteration limit. Stopping...')

In [10]:
LOCK_TIMEOUT = .001
lock_acc1, lock_acc2 = Lock(), Lock()
accounts = {'acc1': 100_000, 'acc2': 100_000}

t1 = Thread(target=move_funds3, args=('acc1', lock_acc1, 'acc2', lock_acc2))
t2 = Thread(target=move_funds3, args=('acc2', lock_acc2, 'acc1', lock_acc1))

t1.start()
t2.start()

while t1.is_alive() or t2.is_alive():
    print("t1 alive?", t1.is_alive(), "| t2 alive?", t2.is_alive(), "|", accounts)
    sleep(3)

print("acc1 locked?", lock_acc1.locked(), "| acc2 locked?", lock_acc2.locked())
print(f"Validation: {sum(accounts.values()):,} == 200,000")

t1 alive? True | t2 alive? True | {'acc1': -163645, 'acc2': 363645}
t1 alive? True | t2 alive? True | {'acc1': -162924, 'acc2': 362924}
t1 alive? True | t2 alive? True | {'acc1': -155605, 'acc2': 355605}
t1 alive? True | t2 alive? True | {'acc1': -140102, 'acc2': 340102}
Thread-10 reached iteration limit. Stopping...
Thread-11 reached iteration limit. Stopping...
acc1 locked? False | acc2 locked? False
Validation: 200,000 == 200,000
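Another widely used technique, not shown in the cells above, is to have every thread acquire locks in one agreed global order, so that a circular wait cannot form. The sketch below is one possible variant under stated assumptions: the `locks` dictionary keyed by account name and the function `move_funds_ordered` are hypothetical, and sorting by account name is just one convenient choice of order.

```python
from threading import Thread, Lock
from random import randint

accounts = {"acc1": 100_000, "acc2": 100_000}
# Assumption: one lock per account, looked up by account name
locks = {name: Lock() for name in accounts}

def move_funds_ordered(from_, to_, n_transfers=10_000):
    # Acquire in sorted-name order, regardless of the transfer direction
    first, second = sorted((from_, to_))
    for _ in range(n_transfers):
        amount = randint(1, 100)
        with locks[first], locks[second]:
            accounts[from_] -= amount
            accounts[to_] += amount

t1 = Thread(target=move_funds_ordered, args=("acc1", "acc2"))
t2 = Thread(target=move_funds_ordered, args=("acc2", "acc1"))
t1.start()
t2.start()
t1.join()
t2.join()

print(f"Validation: {sum(accounts.values()):,} == 200,000")
```

Binding each lock to its account also removes the chance of passing the wrong lock for an account, and no retry loop is needed. The cost is that every piece of code touching these locks has to follow the same ordering rule.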


## Thread Synchronization Summary

Other synchronization mechanisms, outside the scope of this course, include `Semaphore`, `Condition`, `Event`, and `Barrier`, among others. These follow the same broad principles as locks but vary in specifics.
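As a brief illustration of how one of these differs from a lock, here is a minimal sketch of a semaphore, which admits up to a fixed number of threads at once rather than exactly one. The limit of 2, the `worker` function, and the `active`/`peak` counters are assumptions made for the example.

```python
from threading import Thread, Semaphore, Lock
from time import sleep

slots = Semaphore(2)        # at most two workers inside the guarded block
counter_lock = Lock()       # protects the two counters below
active, peak = 0, 0

def worker():
    global active, peak
    with slots:
        with counter_lock:
            active += 1
            peak = max(peak, active)
        sleep(0.01)         # stand-in for real work
        with counter_lock:
            active -= 1

threads = [Thread(target=worker) for _ in range(6)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print("Peak concurrent workers:", peak)
```

The recorded peak never exceeds the semaphore's limit, even though six threads were started.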

An unfortunate but necessary takeaway from this lesson is: **synchronization is HARD and bug-prone**. Even the most experienced developers create bugs when writing synchronized code.

However, synchronization is a necessary evil as well. Race conditions corrupt data, and many problems demand concurrency. In the next lessons we look at additional approaches to concurrent code that can ease the pitfalls of synchronization.

