# Thread Data and Race Conditions

We saw at the end of the last lesson that threads can access shared data. This is useful for sharing configuration, but it introduces problems. The main problem that arises is _[**race conditions**](https://en.wikipedia.org/wiki/Race_condition)_. Let us create an example.

In [46]:
import threading
from threading import Thread, current_thread
from time import sleep, time
import random
import sys

## Shared state

Previously we created a dictionary and saw issues arise when different threads updated its values. In fact, a single shared scalar value suffices to show the problem.

In [56]:
counter, nthread, nloop = 0, 100, 50_000 

def increment(n):
    global counter
    for _ in range(n):
        counter += 1

threads = [Thread(target=increment, args=(nloop,)) for _ in range(nthread)]
for t in threads:
    t.start()

In [57]:
# Make sure they have finished before we report
while alive := sum(t.is_alive() for t in threads):
    sleep(2)
    print("Num threads alive:", sum(t.is_alive() for t in threads))
    
print(f"{nloop:,} loops X {nthread:,} threads -> counter is {counter:,}")    

50,000 loops X 100 threads -> counter is 4,950,000


### Where did we go wrong?

A subtlety in our program is that the single augmented assignment, `+=`, is not **atomic**.

Behind the scenes, `+=` runs two (or more) operations. In pseudo-code, this looks like:

```
add(counter, 1) TO <temp>  # A
store("counter", <temp>)   # B
```

Multiple threads are executing concurrently. Hence one or more may hold a value that becomes outdated between the time when the addition is performed (A) and the time when the name `counter` is rebound (B).
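To make the lost update concrete, here is a minimal sketch that replays one unlucky interleaving of steps A and B by hand, with no real threads involved. The names `temp_a` and `temp_b` are hypothetical stand-ins for the `<temp>` value each thread holds.

```python
# Deterministic replay of one unlucky interleaving of two threads.
counter = 0

temp_a = counter + 1   # thread A, step A: reads 0, computes 1
temp_b = counter + 1   # thread B, step A: also reads 0, computes 1
counter = temp_a       # thread A, step B: counter is now 1
counter = temp_b       # thread B, step B: counter is still 1

lost_updates = 2 - counter
print(f"Two increments ran, counter is {counter}, lost updates: {lost_updates}")
```

Both "threads" did their work, yet one increment vanished because thread B stored a result computed from a stale read.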

To be more exact, we can look at the Python bytecode (wordcode). A thread may be suspended between any two instructions, and there are **four** instructions involved in the single augmented assignment (the exact opcode names vary between CPython versions).

In [None]:
import dis
dis.dis(increment)
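If reading the full disassembly is too noisy, a small sketch like the following lists only the opcode names and locates the load and the store of `counter`. The function `bump` is a hypothetical one-line stand-in for the loop body, and the sketch assumes CPython, where `LOAD_GLOBAL` and `STORE_GLOBAL` are the opcodes used for a `global` name.

```python
import dis

def bump():
    global counter
    counter += 1

# Opcode names differ between CPython versions, so we list them
# rather than hard-coding what sits between the load and the store.
ops = [ins.opname for ins in dis.get_instructions(bump)]
load_at = ops.index("LOAD_GLOBAL")     # read the current value of counter
store_at = ops.index("STORE_GLOBAL")   # rebind the name counter

print(ops)
print(f"{store_at - load_at - 1} instruction(s) sit between the load and the store")
```

Every instruction between the load and the store is a point where another thread could run and change `counter` underneath us.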

## Thread Synchronization

<img src="recording_studio_light.png" width="25%" align="right"/>How can we fix the race condition? 

We need a way to keep the threads from stepping on each other's data, some signal that a resource is **busy**.

*(For example, at INE's studios: when the recording light is on, the studio is busy and nobody will enter the room.)*

The easiest synchronization mechanism is a [Lock](https://en.wikipedia.org/wiki/Lock_(computer_science)), or a Mutex (mutual exclusion lock). Python includes the very intuitive `threading.Lock` class. 

A Lock works like the RECORDING light pictured. The first person (thread) that "arrives" turns on the light (acquires the lock). Anyone else has to wait for the person/thread to turn the light off and make the room (resource) available again.

## Locking

In [None]:
lock = threading.Lock()

def lock_hogger(lock, wait=5):
    name = current_thread().name
    print(f"{name}: acquiring lock.")
    lock.acquire()
    print(f"{name}: Lock acquired, sleeping")
    sleep(wait)
    print(f"{name}: Woke up, releasing lock")
    lock.release()

Thread(target=lock_hogger, args=(lock,)).start()

We can ask about the state of a lock.

In [None]:
print("Is lock currently being used?", lock.locked())

# Claim the lock
lock.acquire()
print("Lock acquired?", lock.locked())

# We are done now
lock.release()
print("Still being used?", lock.locked())

If we only ever ran sequential code, the lock would just be equivalent to a Boolean value.  But it is shared across all threads that have it in scope.  When another thread tries to acquire a lock that is in use, it will block until the lock becomes free.
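The blocking behaviour can be observed with a short sketch. Here the main thread holds the lock while a hypothetical `worker` thread tries to enter; the `events` list and the 0.2 second wait are illustrative choices, not part of the lesson's code.

```python
import threading

lock = threading.Lock()
events = []

def worker():
    events.append("worker: waiting")
    with lock:                        # blocks until the main thread releases
        events.append("worker: got lock")

lock.acquire()                        # the main thread holds the lock first
t = threading.Thread(target=worker)
t.start()

t.join(timeout=0.2)                   # give the worker a chance to run
still_blocked = t.is_alive()          # it cannot finish while we hold the lock

events.append("main: releasing")
lock.release()
t.join()                              # now the worker can proceed and exit

print("Worker was blocked:", still_blocked)
print(events)
```

The worker only records that it got the lock after the main thread has released it.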

It is important (and sometimes tricky) to get the sequence of actions right, since releasing a lock that is already unlocked will raise a `RuntimeError`.

In [None]:
lock = threading.Lock()
t = Thread(target=lock_hogger, args=(lock, 0.1))
t.start()
print("Releasing lock in MainThread...")
lock.release()

Which of the competing `lock.release()` calls raises the exception depends on thread timing.

In [None]:
lock = threading.Lock()
t = Thread(target=lock_hogger, args=(lock, 0))
t.start()
sleep(0.1)
try:
    print("Releasing lock in MainThread...")
    lock.release()
except Exception as err:
    print(f"{repr(err)} in {current_thread().name}", file=sys.stderr)
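The same failure can be reproduced without any timing dependence. This single-threaded sketch releases a lock twice; the variable name `second_release_error` is purely illustrative. Keep in mind that a `threading.Lock` has no owner: any thread may release it, which is why the main thread was able to release the lock held by `lock_hogger` in the cells above.

```python
import threading

lock = threading.Lock()
lock.acquire()
lock.release()              # fine: the lock was held

try:
    lock.release()          # not fine: the lock is already free
except RuntimeError as err:
    second_release_error = err
else:
    second_release_error = None

print("Second release raised:", repr(second_release_error))
```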

## Fixing the counter

Now that we know about locks, we can use them to fix our counter example:

In [47]:
lock = threading.Lock()
counter, nthread, nloop = 0, 100, 50_000 

def increment(n, lock):
    global counter
    for _ in range(n):
        lock.acquire()
        counter += 1
        lock.release()

threads, now = [], time()
for _ in range(nthread):
    t = Thread(target=increment, args=(nloop, lock))
    threads.append(t)
    t.start()

In [48]:
while alive := sum(t.is_alive() for t in threads):
    sleep(5)
    print("Num threads alive:", sum(t.is_alive() for t in threads))
    
print(f"{nloop:,} loops X {nthread:,} threads -> counter is {counter:,}")
print(f"Running everything took {time()-now:.2f} seconds!")

Num threads alive: 100
Num threads alive: 100
Num threads alive: 100
Num threads alive: 100
Num threads alive: 100
Num threads alive: 99
Num threads alive: 99
Num threads alive: 99
Num threads alive: 99
Num threads alive: 0
50,000 loops X 100 threads -> counter is 5,000,000
Running everything took 51.14 seconds!


## Problems with synchronization

Locks are acquired before entering what we call "critical sections": the sections of our code that can potentially introduce race conditions.

The problem is that locks are "cooperative": nothing obliges you to use them. If just one function in the codebase does lock management wrong, the problems can propagate. Possible problems:

1. You fail to recognize that there is a "critical section".
2. You fail to acquire the lock before entering the critical section.
3. The critical section might use resources NOT protected by the lock.
4. You fail to release the lock (e.g. the code breaks before releasing).
5. Deadlocks! (more in the next lesson).

### Exceptions

What if our function doesn't run as hoped?

In [None]:
lock = threading.Lock()

def faulty_lock_handler(lock, sleep=10):
    print("\t\tThread: Acquiring lock.")
    lock.acquire()
    print("\t\tThread: Lock acquired")
    time.sleep(sleep)
    print("\t\tThread: Woke up, releasing lock")
    lock.release()

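# The `sleep` argument is incorrect (it should be a number). With the imports
# above, `time` is also a function rather than the module, so `time.sleep`
# raises AttributeError. Either way, the thread dies while holding the lock.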
Thread(target=faulty_lock_handler, args=(lock, 'x')).start()

The thread died before reaching `lock.release()`, so trying to acquire the lock will block **FOREVER**:

In [61]:
lock.acquire()

KeyboardInterrupt: 

We could add a _timeout_ to our `acquire` call: it will block for at most the specified number of seconds and, if it hasn't acquired the lock by then, return `False`:

In [62]:
lock.acquire(timeout=2)

False

We can also make it non-blocking: the call returns immediately, with `True` if it acquired the lock and `False` if it could not:

In [65]:
lock.acquire(blocking=False)

True

Thankfully, we have a handle on the lock variable, so we can release it from here, but this is cheating:

In [66]:
lock.release()
lock.acquire(blocking=False)

True
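The `timeout` and `blocking` arguments can also be exercised deterministically in a single thread, since a `threading.Lock` is not reentrant. The following is a minimal sketch; the 0.2 second timeout and the variable names are illustrative.

```python
import threading
from time import monotonic

lock = threading.Lock()
lock.acquire()                                   # simulate a lock held elsewhere

start = monotonic()
got_with_timeout = lock.acquire(timeout=0.2)     # waits up to 0.2 s, then gives up
waited = monotonic() - start

got_nonblocking = lock.acquire(blocking=False)   # returns at once

lock.release()                                   # the holder finally lets go
got_after_release = lock.acquire(blocking=False)
lock.release()

print(got_with_timeout, got_nonblocking, got_after_release)
print(f"Timed acquire gave up after about {waited:.2f} seconds")
```

Checking the return value matters: when `acquire` returns `False` we do not hold the lock and must not enter the critical section or call `release`.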

## Context Manager

The way to solve this problem is to use locks as context managers. This will release the lock **even if** something goes wrong within the critical section:

In [None]:
def fixed_lock_handler(lock, sleep=10):
    print("\t\tThread: Acquiring lock.")
    with lock:
        print("\t\tThread: Lock acquired")
        if sleep:
            time.sleep(sleep)
    print("\t\tThread: Woke up, releasing lock")

In [67]:
lock = threading.Lock()
Thread(target=fixed_lock_handler, args=(lock, 5)).start()

		Thread: Acquiring lock.
		Thread: Lock acquired


Exception in thread Thread-4527:
Traceback (most recent call last):
  File "/home/davidmertz/miniconda3/envs/INE/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/home/davidmertz/miniconda3/envs/INE/lib/python3.8/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "<ipython-input-67-8211ba63e980>", line 8, in fixed_lock_handler
AttributeError: 'builtin_function_or_method' object has no attribute 'sleep'


In [68]:
lock.acquire()

True

In [69]:
lock.release()

In [70]:
# The `sleep` argument is incorrect (it should be a number); as the traceback
# shows, the call fails at `time.sleep` regardless
Thread(target=fixed_lock_handler, args=(lock, 'x')).start()

		Thread: Acquiring lock.
		Thread: Lock acquired


Exception in thread Thread-4528:
Traceback (most recent call last):
  File "/home/davidmertz/miniconda3/envs/INE/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/home/davidmertz/miniconda3/envs/INE/lib/python3.8/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "<ipython-input-67-8211ba63e980>", line 8, in fixed_lock_handler
AttributeError: 'builtin_function_or_method' object has no attribute 'sleep'


Is the lock still acquired?

In [71]:
lock.locked()

False

In [72]:
lock.acquire()

True

In [73]:
lock.release()

The critical section failed with an exception, but the lock was released before exiting. Using `with` and the lock as a context manager is syntactic sugar for this pattern:

```python
lock.acquire()
try:
    critical_section()
finally:
    lock.release()  # We'll release the lock no matter what
```
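As a quick check of that claim, this sketch runs a deliberately failing critical section inside `with lock:` and then inspects the lock. The `critical_section` function here is a hypothetical placeholder that always raises.

```python
import threading

lock = threading.Lock()

def critical_section():
    raise ValueError("something went wrong while holding the lock")

try:
    with lock:
        critical_section()
except ValueError as err:
    caught = err          # the exception still propagates to the caller

print("Caught:", repr(caught))
print("Lock still held?", lock.locked())
```

The exception is not swallowed; the context manager only guarantees that the lock is released on the way out.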

## Improving the threaded counter

The last touch for our counter should be to use the context manager protocol of the lock object:

In [77]:
lock = threading.Lock()
counter, nthread, nloop = 0, 100, 50_000 

def increment(n, lock):
    global counter
    for _ in range(n):
        with lock:
            counter += 1
            
threads, now = [], time()
for _ in range(nthread):
    t = Thread(target=increment, args=(nloop, lock))
    threads.append(t)
    t.start()

In [78]:
while alive := sum(t.is_alive() for t in threads):
    sleep(5)
    print("Num threads alive:", sum(t.is_alive() for t in threads))
    
print(f"{nloop:,} loops X {nthread:,} threads -> counter is {counter:,}")
print(f"Running everything took {time()-now:.2f} seconds!")

Num threads alive: 100
Num threads alive: 100
Num threads alive: 100
Num threads alive: 100
Num threads alive: 100
Num threads alive: 100
Num threads alive: 99
Num threads alive: 99
Num threads alive: 0
50,000 loops X 100 threads -> counter is 5,000,000
Running everything took 55.22 seconds!


## Coarser Locks

The threaded program is a **whole lot** slower than a serial version! The augmented assignment `+=` is not *atomic*, but it is still a lot less work than acquiring and releasing the lock.  The program is dominated by juggling locks.

In [80]:
%%time
counter, nthread, nloop = 0, 100, 50_000 

def increment(nthread, nloop):
    global counter
    for n in range(nloop):
        for m in range(nthread):
            counter += 1
            
increment(nthread, nloop)

CPU times: user 354 ms, sys: 0 ns, total: 354 ms
Wall time: 350 ms


Even the threaded version that did not use locks will be slower than the serial version, but not hundreds of times slower.  By acquiring and releasing the lock less often, and doing more work each time a thread holds it, we can make a compromise.

In [83]:
lock = threading.Lock()
counter, nthread, nloop = 0, 100, 50_000
coarseness = 1000

def increment(nloop, lock, coarseness):
    assert nloop % coarseness == 0, "Cannot evenly divide work"
    global counter
    for chunk in range(nloop//coarseness):
        with lock:
            for _ in range(coarseness):
                counter += 1

threads, now = [], time()
for _ in range(nthread):
    t = Thread(target=increment, args=(nloop, lock, coarseness))
    threads.append(t)
    t.start()

In [84]:
while alive := sum(t.is_alive() for t in threads):
    sleep(5)
    print("Num threads alive:", sum(t.is_alive() for t in threads))
    
print(f"{nloop:,} loops X {nthread:,} threads -> counter is {counter:,}")
print(f"Running everything took {time()-now:.2f} seconds!")

50,000 loops X 100 threads -> counter is 5,000,000
Running everything took 3.26 seconds!
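Taking the same idea one step further, each thread could count in a private local variable and touch the shared counter only once. This is a variation on the coarse-lock cell above, not the lesson's own code; the function name `increment_locally` and the smaller `nthread` and `nloop` values are assumptions chosen to keep the sketch quick.

```python
import threading
from threading import Thread

lock = threading.Lock()
counter, nthread, nloop = 0, 8, 10_000   # smaller than the lesson's numbers

def increment_locally(n, lock):
    global counter
    local = 0
    for _ in range(n):
        local += 1        # private to this thread, no lock needed
    with lock:            # one short critical section per thread
        counter += local

threads = [Thread(target=increment_locally, args=(nloop, lock))
           for _ in range(nthread)]
for t in threads:
    t.start()
for t in threads:
    t.join()              # wait for every thread to finish

print(f"{nloop:,} loops X {nthread:,} threads -> counter is {counter:,}")
```

The trade-off is that other threads cannot see partial progress: the shared counter only changes when a thread has finished all of its work.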


## Summary:

We've seen the importance of keeping our critical sections safe, to avoid race conditions. But there's no free lunch. To prevent race conditions we have to use synchronization mechanisms, and as we saw, that can carry other issues.

In the next section we'll explore one of the many things that can go wrong with manual synchronization, and one of the scariest words in computer science: **Deadlocks**.