## Deadlocks

Deadlocks are yet another problematic condition that might arise as the result of poorly synchronized code.

In [1]:
from threading import Thread, current_thread, Lock
from time import sleep
from random import randint
from sys import stderr

## A simple example

Let's start by analyzing a simple example: transfers between two "bank accounts":

In [2]:
def move_funds(from_, to_, expected):
    global accounts, kill_switch
    name, n = current_thread().name, 0
    
    for n in range(1_000_000):
        transfer = randint(0, 100)
        accounts[from_] -= transfer
        accounts[to_] += transfer        
        total = sum(accounts.values())
        
        # Exit if balance wrong or if another thread thinks so
        if total != expected:
            print(f"{name} inconsistent balance: ${total:,} ({n:,} transactions)")
            kill_switch = True
            break
        elif kill_switch:
            print(f"{name} other thread flagged: ${total:,} ({n:,} transactions)")
            break
    else:
        print(f'{name} reached iteration limit. Stopping...')            

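From the last lesson, you can spot the problem we will encounter. The operations `accounts[from_] -= transfer` and `accounts[to_] += transfer` are each a read-modify-write sequence rather than a single atomic step, so two threads running them concurrently can introduce race conditions.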
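
To make the danger concrete, here is a minimal sketch that replays one unlucky interleaving by hand, in a single thread, so the result is deterministic. The names `balance`, `seen_by_a`, and `seen_by_b` are illustrative only and do not appear in the notebook.

```python
# Hand-written replay of one bad interleaving of two unsynchronized updates.
# All names here are hypothetical; no real threads are involved.
balance = 100

seen_by_a = balance          # "thread A" reads 100 (it wants to subtract 10)
seen_by_b = balance          # "thread B" reads 100 before A has written back

balance = seen_by_a - 10     # A writes 90
balance = seen_by_b + 10     # B writes 110, silently overwriting A's update

print(balance)               # 110, although -10 followed by +10 should leave 100
```

With real threads the same loss happens only occasionally, which is why the balance check in `move_funds` can pass for thousands of iterations before it fails.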

In [3]:
kill_switch = False
accounts = {'acc1': 100_000, 'acc2': 100_000}

t1 = Thread(target=move_funds, args=('acc1', 'acc2', 200_000))
t2 = Thread(target=move_funds, args=('acc2', 'acc1', 200_000))

t1.start()
t2.start()
t1.join()
t2.join()

print("Balances:", accounts)

Thread-5 inconsistent balance: $199,963 (3,892 transactions)
Thread-4 other thread flagged: $200,000 (22,020 transactions)
Balances: {'acc1': -807416, 'acc2': 1007416}


## Adding Locks

We've already learned about locks, so we can use them to try to synchronize access to the accounts. We'll create two locks, one for each account:

In [4]:
def move_funds2(from_, lock_from, to_, lock_to, expected):
    name = current_thread().name
    
    for n in range(1_000_000):
        amount = randint(0, 100)
        with lock_from:
            with lock_to:
                accounts[from_] -= amount
                accounts[to_] += amount
                
                total = sum(accounts.values())
                if total != expected:
                    print(f"{name} inconsistent balance: ${total:,} ({n:,} transactions)")
                    break
    else:
        print(f'{name} reached iteration limit. Stopping...')

In [5]:
lock_acc1, lock_acc2 = Lock(), Lock()
accounts = {'acc1': 100_000, 'acc2': 100_000}

t1 = Thread(target=move_funds2, args=('acc1', lock_acc1, 'acc2', lock_acc2, 200_000))
t2 = Thread(target=move_funds2, args=('acc2', lock_acc1, 'acc1', lock_acc2, 200_000))

t1.start()
t2.start()

while t1.is_alive() or t2.is_alive():
    print("t1 alive?", t1.is_alive(), "| t2 alive?", t2.is_alive(), "|", accounts)
    sleep(3)

print("acc1 locked?", lock_acc1.locked(), "| acc2 locked?", lock_acc2.locked())

t1 alive? True | t2 alive? True | {'acc1': 198104, 'acc2': 1896}
t1 alive? True | t2 alive? True | {'acc1': 1000586, 'acc2': -800586}
t1 alive? True | t2 alive? True | {'acc1': 2040077, 'acc2': -1840077}
t1 alive? True | t2 alive? True | {'acc1': 1912379, 'acc2': -1712379}
t1 alive? True | t2 alive? True | {'acc1': 1397130, 'acc2': -1197130}
Thread-7 reached iteration limit. Stopping...
Thread-6 reached iteration limit. Stopping...
acc1 locked? False | acc2 locked? False


It worked (this time)! Access to the accounts is protected by the locks. 

But there is a danger lurking here. We succeeded more or less accidentally: both threads happened to acquire the locks in the same order (`lock_acc1` first, then `lock_acc2`). The call is conceptually wrong, though, because thread `t2` was handed `lock_acc1` as the lock protecting `acc2`.

If we make a small change, altering the order of the locks that are passed to our threads, we will find ourselves deadlocked:

In [6]:
lock_acc1, lock_acc2 = Lock(), Lock()
accounts = {'acc1': 100_000, 'acc2': 100_000}

t1 = Thread(target=move_funds2, args=('acc1', lock_acc1, 'acc2', lock_acc2, 200_000))
t2 = Thread(target=move_funds2, args=('acc2', lock_acc2, 'acc1', lock_acc1, 200_000))
print("Threads created:", accounts)
t1.start()
t2.start()
print("Threads started:", accounts)

for _ in range(8):
    print(f"{t1.name} alive?", t1.is_alive(), f"| {t2.name} alive?", t2.is_alive(), "|", accounts)
    sleep(3)

Threads created: {'acc1': 100000, 'acc2': 100000}
Threads started: {'acc1': 88258, 'acc2': 111742}
Thread-8 alive? True | Thread-9 alive? True | {'acc1': 88258, 'acc2': 111742}
Thread-8 alive? True | Thread-9 alive? True | {'acc1': 204802, 'acc2': -4802}
Thread-8 alive? True | Thread-9 alive? True | {'acc1': 204802, 'acc2': -4802}
Thread-8 alive? True | Thread-9 alive? True | {'acc1': 204802, 'acc2': -4802}
Thread-8 alive? True | Thread-9 alive? True | {'acc1': 204802, 'acc2': -4802}
Thread-8 alive? True | Thread-9 alive? True | {'acc1': 204802, 'acc2': -4802}
Thread-8 alive? True | Thread-9 alive? True | {'acc1': 204802, 'acc2': -4802}
Thread-8 alive? True | Thread-9 alive? True | {'acc1': 204802, 'acc2': -4802}


Both locks remain locked.

In [7]:
lock_acc1.locked(), lock_acc2.locked()

(True, True)

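These threads will never finish on their own, and the locks will never be released. We can do something a bit hackish: repeatedly release the locks from the main thread until both worker threads crash and end. Each worker crashes with a `RuntimeError` when its `with` block tries to release a lock that has already been released. This would almost never be a good idea in a production program; we do it here only for teaching.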

In [8]:
nerr, err_info = 0, None
while t1.is_alive() or t2.is_alive():
    try:
        lock_acc1.release(), lock_acc2.release()
    except Exception as err:
        sleep(1e-9)
        err_info = repr(err)
        nerr += 1
        
print(f"{nerr:,} errors of type {err_info}")
# Status of threads and locks
t1.is_alive(), lock_acc1.locked(), t2.is_alive(), lock_acc2.locked()

Exception in thread Exception in thread Thread-9:
Traceback (most recent call last):
Thread-8  File "/home/davidmertz/miniconda3/envs/INE/lib/python3.8/threading.py", line 932, in _bootstrap_inner
:
Traceback (most recent call last):
  File "/home/davidmertz/miniconda3/envs/INE/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/home/davidmertz/miniconda3/envs/INE/lib/python3.8/threading.py", line 870, in run
    self.run()
  File "/home/davidmertz/miniconda3/envs/INE/lib/python3.8/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "<ipython-input-4-1bff43128daa>", line 14, in move_funds2
    self._target(*self._args, **self._kwargs)
  File "<ipython-input-4-1bff43128daa>", line 14, in move_funds2
RuntimeErrorRuntimeError: release unlocked lock
: release unlocked lock


55 errors of type RuntimeError('release unlocked lock')


(False, False, False, False)

## What are Deadlocks?

A deadlock occurs when multiple threads (or processes, or nodes in a cluster) are mutually suspended, each waiting for another to signal that processing may continue. Because the dependency is circular, none of them can ever proceed. As soon as there are at least two resources that might be needed at the same time, deadlocks are a danger.


| Step | Status       | Thread-1            | Thread-2            | Thread-3 
|:-----|:-------------|:--------------------|:--------------------|:--------------------
| 1    | Success      | Acquire *lock_A*    | Acquire *lock_B*    | Acquire *lock_C*        
| 2    | **Blocked**  | **Wait for lock_C** | **Wait for lock_A** | **Wait for lock_B**
| 3    | Cannot Reach | Wait for lock_B     | Wait for lock_C     | Wait for lock_A
| 4    | Cannot Reach | Work with A/B/C     | Work with A/B/C     | Work with A/B/C
| 5    | Cannot Reach | Release *lock_B*    | Release *lock_C*    | Release *lock_A*
| 6    | Cannot Reach | Release *lock_C*    | Release *lock_A*    | Release *lock_B*
| 7    | Cannot Reach | Release *lock_A*    | Release *lock_B*    | Release *lock_C*
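
The first two rows of the table can be reproduced in a small sketch. To keep it from hanging, the second acquisition uses a timeout, and a `Barrier` keeps the three threads in lockstep so the outcome is deterministic. The names `plan`, `got_second`, and `worker` are illustrative assumptions, not part of the notebook.

```python
from threading import Thread, Lock, Barrier

locks = {"A": Lock(), "B": Lock(), "C": Lock()}
# (held, wanted) pairs taken from steps 1 and 2 of the table above
plan = {"Thread-1": ("A", "C"), "Thread-2": ("B", "A"), "Thread-3": ("C", "B")}
barrier = Barrier(len(plan))   # forces the three threads to move in lockstep
got_second = {}                # name -> did the second acquisition succeed?

def worker(name, held, wanted):
    with locks[held]:                              # step 1: always succeeds
        barrier.wait()                             # now all three hold one lock
        ok = locks[wanted].acquire(timeout=0.2)    # step 2: would block forever without a timeout
        got_second[name] = ok
        barrier.wait()                             # nobody releases until every attempt has finished
        if ok:
            locks[wanted].release()

threads = [Thread(target=worker, args=(name, *pair)) for name, pair in plan.items()]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(got_second)   # every value is False: each thread waited on a lock held by another
```

Without the timeout, each thread would sit at step 2 indefinitely, which is exactly the "Cannot Reach" state shown for steps 3 to 7.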

In _Operating System Concepts_, Avi Silberschatz and his co-authors describe a non-computing example:

> Perhaps the best illustration of a deadlock can be drawn from a law passed by the Kansas legislature early in the 20th century. It said, in part: “When two trains approach each other at a crossing, both shall come to a full stop and neither shall start up again until the other has gone.”

### How to prevent deadlocks

The unfortunate truth is that it is **very hard** to prevent deadlocks. 

One simple technique is to always use timeouts when trying to acquire locks. If you need N locks and cannot acquire all N within the timeout, release the ones you did acquire and start over. We can refactor the funds transfer this way.

In [9]:
def move_funds3(from_, lock_from, to_, lock_to):
    name, T = current_thread().name, LOCK_TIMEOUT
    all_locks = [lock_from, lock_to]
    
    for n in range(10_000):
        amount = randint(0, 100)
        
        # Try to acquire every lock; on failure, release those acquired and retry
        while not all(locks_good := [l.acquire(timeout=T) for l in all_locks]):
            for i, acquired in enumerate(locks_good):
                if acquired:
                    all_locks[i].release()
        
        # Perform the action on locked resources
        # (the balance validation from earlier versions is omitted here)
        accounts[from_] -= amount
        accounts[to_] += amount
        
        # Release all locks
        for lock in all_locks:
            lock.release()

    print(f'{name} reached iteration limit. Stopping...')
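
The acquire-or-back-off step can also be factored into a small helper. The sketch below is one possible shape, not the notebook's own code: `acquire_all` is a hypothetical name, and unlike the loop in `move_funds3` it stops at the first lock it cannot get instead of attempting all of them.

```python
from threading import Lock

def acquire_all(locks, timeout):
    """Try to take every lock; on any failure release what was taken and return False."""
    taken = []
    for lock in locks:
        if lock.acquire(timeout=timeout):
            taken.append(lock)
        else:
            # Back off completely so other threads can make progress
            for held in reversed(taken):
                held.release()
            return False
    return True

# Usage: simulate contention by holding `b` before the first attempt
a, b = Lock(), Lock()
b.acquire()
first_try = acquire_all([a, b], timeout=0.01)    # False, and `a` has been released again
b.release()
second_try = acquire_all([a, b], timeout=0.01)   # True, both locks are now held
for lock in (a, b):
    lock.release()
```

A caller would typically loop until `acquire_all` returns `True`. Sleeping for a short random interval between attempts is a common way to reduce the chance that two threads keep retrying in lockstep.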

In [10]:
LOCK_TIMEOUT = .001
lock_acc1, lock_acc2 = Lock(), Lock()
accounts = {'acc1': 100_000, 'acc2': 100_000}

t1 = Thread(target=move_funds3, args=('acc1', lock_acc1, 'acc2', lock_acc2))
t2 = Thread(target=move_funds3, args=('acc2', lock_acc2, 'acc1', lock_acc1))

t1.start()
t2.start()

while t1.is_alive() or t2.is_alive():
    print("t1 alive?", t1.is_alive(), "| t2 alive?", t2.is_alive(), "|", accounts)
    sleep(3)

print("acc1 locked?", lock_acc1.locked(), "| acc2 locked?", lock_acc2.locked())
print(f"Validation: {sum(accounts.values()):,} == 200,000")

t1 alive? True | t2 alive? True | {'acc1': 75963, 'acc2': 124037}
t1 alive? True | t2 alive? True | {'acc1': 24257, 'acc2': 175743}
t1 alive? True | t2 alive? True | {'acc1': 26383, 'acc2': 173617}
t1 alive? True | t2 alive? True | {'acc1': 36203, 'acc2': 163797}
t1 alive? True | t2 alive? True | {'acc1': 17967, 'acc2': 182033}
t1 alive? True | t2 alive? True | {'acc1': 58775, 'acc2': 141225}
t1 alive? True | t2 alive? True | {'acc1': 102415, 'acc2': 97585}
Thread-11 reached iteration limit. Stopping...
Thread-10 reached iteration limit. Stopping...
acc1 locked? False | acc2 locked? False
Validation: 200,000 == 200,000
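
Timeouts are not the only option. Another widely used technique, not shown in the cells above, is to make every thread acquire its locks in the same global order, so a circular wait cannot form. The sketch below assumes one lock per account and sorts the account names to pick the order; `balances`, `account_locks`, and `move_funds_ordered` are hypothetical names, and the two accounts are assumed to be distinct.

```python
from threading import Thread, Lock
from random import randint

balances = {"acc1": 100_000, "acc2": 100_000}         # stand-in for `accounts`
account_locks = {name: Lock() for name in balances}   # one lock per account

def move_funds_ordered(from_, to_, iterations=2_000):
    # Sorting the names means every thread locks "acc1" before "acc2",
    # whatever the direction of the transfer. Assumes from_ != to_.
    first, second = sorted((from_, to_))
    for _ in range(iterations):
        amount = randint(0, 100)
        with account_locks[first]:
            with account_locks[second]:
                balances[from_] -= amount
                balances[to_] += amount

workers = [
    Thread(target=move_funds_ordered, args=("acc1", "acc2")),
    Thread(target=move_funds_ordered, args=("acc2", "acc1")),
]
for w in workers:
    w.start()
for w in workers:
    w.join()

print(sum(balances.values()))   # 200000
```

The trade-off is that every piece of code touching these locks has to respect the same ordering rule, which is easy to state but hard to enforce across a large program.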


## Thread Synchronization Summary

There are several other synchronization mechanisms that we're not explicitly covering, such as `Semaphore`, `Condition`, `Event`, and `Barrier`. These follow the same principles as locks, but are used for other purposes.
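
As one brief illustration of a different purpose, a `Semaphore` admits up to a fixed number of threads at once rather than exactly one. The sketch below is only an assumption about how such a limit might be used; `slots`, `stats`, and `limited_worker` are made-up names.

```python
from threading import Thread, Semaphore, Lock
from time import sleep

slots = Semaphore(2)                  # at most two workers in the guarded section
stats_lock = Lock()                   # protects the counters in `stats`
stats = {"active": 0, "peak": 0}

def limited_worker():
    with slots:
        with stats_lock:
            stats["active"] += 1
            stats["peak"] = max(stats["peak"], stats["active"])
        sleep(0.01)                   # pretend to use the limited resource
        with stats_lock:
            stats["active"] -= 1

pool = [Thread(target=limited_worker) for _ in range(6)]
for t in pool:
    t.start()
for t in pool:
    t.join()

print(stats["peak"])                  # never more than 2
```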

The main takeaway from this lesson is: **synchronization is HARD and bug-prone**. Even the most experienced developers avoid writing synchronized code when they can, because something always seems to go wrong.

Still, synchronization seems to be a necessary evil. We don't want race conditions in our code. In the following lessons we'll explore alternatives for writing correct code without having to work with synchronization directly.

