# Thread Data and Race Conditions

We saw at the end of the last lesson that threads can access shared data. This is useful for sharing configuration, but it introduces problems. The main problem that arises is _[**race conditions**](https://en.wikipedia.org/wiki/Race_condition)_. Let us create an example.

In [1]:
import threading
from threading import Thread, current_thread
from time import sleep
import random
import sys

## Shared state

Previously we created a dictionary and saw issues arise when different threads updated its values. But actually, a single shared scalar value suffices to show the problem.

In [2]:
counter, nthread, nloop = 0, 100, 50_000 

def increment(n):
    global counter
    for _ in range(n):
        counter += 1

for _ in range(nthread):
    Thread(target=increment, args=(nloop,)).start()
    
print(f"{nloop:,} loops X {nthread:,} threads -> counter is {counter:,}")

50,000 loops X 100 threads -> counter is 4,326,495


### Where did we go wrong?

A subtlety in our program is that the single augmented assignment, `+=`, is not **atomic**.

Behind the scenes, `+=` runs two (or more) operations. In pseudo-code, this looks like:

```
add(counter, 1) TO <temp>  # A
store("counter", <temp>)   # B
```

Multiple threads are executing concurrently. Hence one or more may hold a value that becomes outdated between the time when the addition is performed (A) and the time when the name `counter` is rebound (B).
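The lost update can be reproduced deterministically, without any threads, by performing steps A and B by hand in the unlucky order. This is only an illustrative sketch: `temp_1` and `temp_2` are hypothetical names standing in for each thread's private `<temp>` value.

```python
counter = 0

# Step A in "thread 1": compute the new value from the current counter.
temp_1 = counter + 1
# Step A in "thread 2": runs before thread 1 has stored its result,
# so it reads the same stale value of counter (0).
temp_2 = counter + 1

# Step B in each "thread": rebind the name.
counter = temp_1
counter = temp_2

print(counter)  # prints 1, although two increments were attempted
```

Both "threads" computed their result from the same starting value, so one of the two increments is silently lost.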

To be more exact, we can actually look at the Python bytecode (wordcode).  A thread may be suspended after any single instruction, and there are **four** instructions involved in the single augmented assignment.

In [3]:
import dis
dis.dis(increment)

  5           0 LOAD_GLOBAL              0 (range)
              2 LOAD_FAST                0 (n)
              4 CALL_FUNCTION            1
              6 GET_ITER
        >>    8 FOR_ITER                12 (to 22)
             10 STORE_FAST               1 (_)

  6          12 LOAD_GLOBAL              1 (counter)
             14 LOAD_CONST               1 (1)
             16 INPLACE_ADD
             18 STORE_GLOBAL             1 (counter)
             20 JUMP_ABSOLUTE            8
        >>   22 LOAD_CONST               0 (None)
             24 RETURN_VALUE
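Opcode names differ between Python versions (the listing above comes from Python 3.8), so a version-independent way to see the window in which a thread can be suspended is to list whatever sits between the load and the store of `counter`. The sketch below assumes a small hypothetical function `bump` that performs just the augmented assignment.

```python
import dis

counter = 0

def bump():
    global counter
    counter += 1

instructions = list(dis.get_instructions(bump))

# Locate the instruction that reads the global and the one that rebinds it.
load = next(i for i, ins in enumerate(instructions)
            if ins.opname == "LOAD_GLOBAL" and ins.argval == "counter")
store = next(i for i, ins in enumerate(instructions)
             if ins.opname == "STORE_GLOBAL" and ins.argval == "counter")

# Everything from the load up to the store is the vulnerable window.
for ins in instructions[load:store + 1]:
    print(ins.opname, ins.argrepr)
```

Whatever the exact opcode names are on your interpreter, the read and the write are separate instructions with other work in between.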


## Thread Synchronization

<img src="recording_studio_light.png" width="25%" align="right"/>How can we fix the race condition? 

We need a way to keep the threads from stepping onto each other's data, some signal that a resource is **busy**.

*(Think of INE's studios: when the recording light is on, the studio is busy and nobody enters the room.)*

The simplest synchronization mechanism is a [Lock](https://en.wikipedia.org/wiki/Lock_(computer_science)), or a Mutex (mutual exclusion lock). Python includes the very intuitive `threading.Lock` class.

A Lock works like the RECORDING light pictured. The first person (thread) that "arrives" turns on the light (acquires the lock). Anyone else has to wait for the person/thread to turn the light off and make the room (resource) available again.

## Locking

In [4]:
lock = threading.Lock()

def lock_hogger(lock, wait=5):
    name = current_thread().name
    print(f"{name}: acquiring lock.")
    lock.acquire()
    print(f"{name}: Lock acquired, sleeping")
    sleep(wait)
    print(f"{name}: Woke up, releasing lock")
    lock.release()

Thread(target=lock_hogger, args=(lock,)).start()

Thread-104: acquiring lock.
Thread-104: Lock acquired, sleeping


We can ask about the state of a lock.

In [5]:
print("Is lock currently being used?", lock.locked())

# Claim the lock
lock.acquire()
print("Lock acquired?", lock.locked())

# We are done now
lock.release()
print("Still being used?", lock.locked())

Is lock currently being used? True
Thread-104: Woke up, releasing lock
Lock acquired? True
Still being used? False


If we only ever ran sequential code, the lock would just be equivalent to a Boolean value.  But it is shared across all threads that have it in scope.  When another thread tries to acquire a lock that is in use, it will block until the lock becomes free.
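A minimal sketch of that blocking behavior, assuming a hypothetical `worker` function and an `events` list used only to record the order in which things happen:

```python
import threading
import time

lock = threading.Lock()
events = []  # records the order in which things happened

def worker():
    events.append("worker: waiting for lock")
    with lock:  # blocks while the main thread holds the lock
        events.append("worker: got lock")

lock.acquire()  # the main thread takes the lock first
t = threading.Thread(target=worker)
t.start()
time.sleep(0.1)  # give the worker time to reach the lock and block
events.append("main: releasing lock")
lock.release()
t.join()

print(events)
```

The worker cannot record "got lock" until the main thread has released the lock, so that entry always comes after "main: releasing lock".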

It is important (and sometimes tricky) to get the sequence of actions right, since a released lock will raise an exception if you try to release it again.

In [17]:
t = Thread(target=lock_hogger, args=(lock, 0.1))
t.start()
print("Releasing lock in MainThread...")
lock.release()

Thread-117: acquiring lock.
Thread-117: Lock acquired, sleeping
Releasing lock in MainThread...


Exception in thread Thread-117:
Traceback (most recent call last):
  File "/home/davidmertz/miniconda3/envs/INE/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/home/davidmertz/miniconda3/envs/INE/lib/python3.8/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "<ipython-input-4-76a137a4e77b>", line 10, in lock_hogger
RuntimeError: release unlocked lock


Thread-117: Woke up, releasing lock


Which of the competing `lock.release()` calls raises the exception depends on the timing of the threads.

In [18]:
t = Thread(target=lock_hogger, args=(lock, 0))
t.start()
sleep(0.1)
try:
    print("Releasing lock in MainThread...")
    lock.release()
except Exception as err:
    print(f"{repr(err)} in {current_thread().name}", file=sys.stderr)

Thread-118: acquiring lock.
Thread-118: Lock acquired, sleeping
Thread-118: Woke up, releasing lock
Releasing lock in MainThread...


RuntimeError('release unlocked lock') in MainThread
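The same error can be triggered without any timing dependence by releasing a lock twice in a single thread. This is a minimal sketch; the `raised` flag is a hypothetical name used only to record what happened.

```python
import threading

lock = threading.Lock()
lock.acquire()
lock.release()  # first release: fine, the lock was held

raised = False
try:
    lock.release()  # second release: the lock is no longer held
except RuntimeError as err:
    raised = True
    print(f"{err!r} on second release")
```

With threads involved, the only difference is that it becomes a matter of timing which caller ends up making the second release.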


## Fixing the counter

Now that we know about locks, we can use them to fix our counter example:

In [33]:
lock = threading.Lock()
counter, nthread, nloop = 0, 100, 50_000 

def increment(n, lock):
    global counter
    for _ in range(n):
        lock.acquire()
        counter += 1
        lock.release()

for _ in range(nthread):
    Thread(target=increment, args=(nloop, lock)).start()
    
print(f"{nloop:,} loops X {nthread:,} threads -> counter is {counter:,}")

50,000 loops X 100 threads -> counter is 45,374
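The figure of 45,374 is not a failure of the lock. The `print` runs in the main thread right after the last `start()`, while most workers are still looping, so it reports a partial count. A sketch of the same cell that keeps references to the threads and calls `join()` before printing is shown below. It assumes smaller values of `nthread` and `nloop` than the original so that it finishes quickly.

```python
import threading
from threading import Thread

lock = threading.Lock()
counter, nthread, nloop = 0, 10, 10_000  # smaller than the original, for speed

def increment(n, lock):
    global counter
    for _ in range(n):
        lock.acquire()
        counter += 1
        lock.release()

threads = [Thread(target=increment, args=(nloop, lock)) for _ in range(nthread)]
for t in threads:
    t.start()
for t in threads:
    t.join()  # wait for every worker before reading the counter

print(f"{nloop:,} loops X {nthread:,} threads -> counter is {counter:,}")
# prints: 10,000 loops X 10 threads -> counter is 100,000
```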


In [9]:
def increment(n, lock):
    global COUNTER
    for _ in range(n):
        lock.acquire()
        COUNTER += 1
        lock.release()
        sleep(0.001)

In [10]:
ITERATIONS = 1000

In [11]:
lock = threading.Lock()

In [12]:
threads = [Thread(target=increment, args=(ITERATIONS, lock)) for _ in range(10)]

In [13]:
[t.start() for t in threads];

Exception in thread Exception in thread Thread-108:
Traceback (most recent call last):
  File "/home/davidmertz/miniconda3/envs/INE/lib/python3.8/threading.py", line 932, in _bootstrap_inner
Thread-107:
Traceback (most recent call last):
  File "/home/davidmertz/miniconda3/envs/INE/lib/python3.8/threading.py", line 932, in _bootstrap_inner
Exception in thread Thread-109:
Traceback (most recent call last):
  File "/home/davidmertz/miniconda3/envs/INE/lib/python3.8/threading.py", line 932, in _bootstrap_inner
Exception in thread Thread-110:
Traceback (most recent call last):
  File "/home/davidmertz/miniconda3/envs/INE/lib/python3.8/threading.py", line 932, in _bootstrap_inner
Exception in thread Thread-111:
Traceback (most recent call last):
  File "/home/davidmertz/miniconda3/envs/INE/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/home/davidmertz/miniconda3/envs/INE/lib/python3.8/threading.py", line 870, in run
    self.run()
  File "/home/davidmertz

In [14]:
[t.join() for t in threads];

In [15]:
COUNTER

10

In [16]:
assert COUNTER == (len(threads) * ITERATIONS), f"Invalid value for counter: {COUNTER}"

AssertionError: Invalid value for counter: 10

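The run above did not reach the expected total (its traceback is cut off, so the cause is not visible here). The intent of the example still holds: when `COUNTER` starts at 0 and every increment is guarded by the lock, the result is `len(threads) * ITERATIONS` no matter how many times we run it.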

#### Problems with synchronization

Locks are acquired before accessing what we call "Critical Sections"; important sections in our code that can potentially introduce race conditions. The usual process is:

```python
lock = threading.Lock() # Problem [1]

# before entering critical section
lock.acquire()          # Problem [2]

# critical section
do_your_thing()         # Problem [3]

# after we're done with it
lock.release()          # Problem [4]
```

The problem is that locks (and many other synchronization mechanisms) are "cooperative". You're cooperating by using locks, but you're not obliged to use them. In a team of n developers, if just one of them gets their lock management wrong, everybody loses.
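A minimal sketch of what "cooperative" means in practice, assuming a hypothetical `rogue` function that simply never asks for the lock:

```python
import threading

lock = threading.Lock()
shared = {"value": 0}

def rogue():
    # This thread never acquires the lock, so nothing stops it.
    shared["value"] = 99

with lock:  # the main thread "owns" the shared data here
    t = threading.Thread(target=rogue)
    t.start()
    t.join()
    seen_while_locked = shared["value"]

print(seen_while_locked)  # prints 99: the write happened while the lock was held
```

The lock protects data only from threads that also acquire it. It places no restriction on code that ignores it.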

These are the things that can potentially go wrong with cooperative, manual synchronization mechanisms:

1. You might forget to use locks at all! You might have failed to recognize the situation as having a "critical section".
2. You might forget to acquire the lock, getting directly into the critical section.
3. Your critical section might be using resources NOT protected by the lock you're using, so other threads can still be stepping on them.
4. You might forget to release the lock, or your code could break before you're able to release the lock (next example).
5. Deadlocks! (More on this later.)

Problem *No. 4* is very common. Let's see an example of it:

In [None]:
lock = threading.Lock()

In [None]:
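import time  # needed here: the `sleep` parameter shadows the imported `sleep` function

def faulty_lock_handler(lock, sleep=10):
    print("\t\tThread: Acquiring lock.")
    lock.acquire()
    print("\t\tThread: Lock acquired")
    if sleep:
        time.sleep(sleep)  # raises if `sleep` is not a number, leaving the lock held
    print("\t\tThread: Woke up, releasing lock")
    lock.release()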

In [None]:
t = Thread(target=faulty_lock_handler, args=(lock, 'x'))  # The `sleep` param is incorrect, should be a number

In [None]:
t.start()

Trying to acquire the lock will block **FOREVER**:

In [None]:
lock.acquire()

We could add a _timeout_ to our `acquire` call. It will block for up to `n` seconds and, if it hasn't acquired the lock by then, return `False`:

In [None]:
lock.acquire(timeout=2)

Or we can even make it non-blocking. If it's not able to acquire the lock, it returns `False` immediately instead of waiting:

In [None]:
lock.acquire(blocking=False)

Thankfully, we have a handle on the lock variable, so we can release it from here, but this is cheating:

In [None]:
lock.release()

In [None]:
lock.acquire(blocking=False)
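The return values in this sequence can be checked in a single thread, because `threading.Lock` is not reentrant: a second `acquire` fails even for the thread that already holds the lock. The sketch below uses that to simulate a lock that is never released. The variable names are hypothetical.

```python
import threading

lock = threading.Lock()
lock.acquire()  # simulate a lock that its holder never releases

with_timeout = lock.acquire(timeout=0.2)  # waits up to 0.2 s, then gives up
print(with_timeout)  # prints False

non_blocking = lock.acquire(blocking=False)  # returns at once, no waiting
print(non_blocking)  # prints False

lock.release()  # the "cheating" release from outside
after_release = lock.acquire(blocking=False)
print(after_release)  # prints True: the lock was free this time
lock.release()
```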

The way to solve this is to use Locks as Context Managers, so we're sure we'll release the lock EVEN if something goes wrong within the critical section:

In [None]:
lock = threading.Lock()

In [None]:
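import time  # needed here: the `sleep` parameter shadows the imported `sleep` function

def fixed_lock_handler(lock, sleep=10):
    print("\t\tThread: Acquiring lock.")
    with lock:
        print("\t\tThread: Lock acquired")
        if sleep:
            time.sleep(sleep)
    # The lock is already released here, whether or not the block raised
    print("\t\tThread: Woke up, lock released")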

In [None]:
t = Thread(target=fixed_lock_handler, args=(lock, 5))

In [None]:
t.start()

In [None]:
lock.acquire()

In [None]:
lock.release()

In [None]:
t = Thread(target=fixed_lock_handler, args=(lock, 'x'))  # The `sleep` param is incorrect, should be a number

In [None]:
t.start()

Is the lock still acquired?

In [None]:
lock.locked()

In [None]:
lock.acquire()

In [None]:
lock.release()

As you can see, the critical section failed with an exception, but the lock was released before exiting. For reference, using a lock in a `with` statement is equivalent to this pattern:

```python
lock.acquire()
try:
    critical_section()
finally:
    lock.release()  # We'll release the lock no matter what
```
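A short check of that guarantee, assuming a hypothetical `critical_section` that always fails:

```python
import threading

lock = threading.Lock()

def critical_section():
    raise ValueError("something went wrong")

try:
    with lock:
        held_inside = lock.locked()  # the lock is held inside the block
        critical_section()
except ValueError:
    pass

print(held_inside, lock.locked())  # prints: True False
```

The exception still propagates out of the `with` block, but only after the lock has been released.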

#### Fixing our counter using `with`

The last touch for our counter should be to use the context manager protocol of the lock object:

In [None]:
COUNTER = 0

In [None]:
def increment(n, lock):
    global COUNTER
    for _ in range(n):
        with lock:
            COUNTER += 1
        sleep(0.001)  # `sleep` was imported from `time` in the first cell

In [None]:
ITERATIONS = 1000

In [None]:
lock = threading.Lock()

In [None]:
threads = [Thread(target=increment, args=(ITERATIONS, lock)) for _ in range(10)]

In [None]:
[t.start() for t in threads];

In [None]:
[t.join() for t in threads];

In [None]:
COUNTER

In [None]:
assert COUNTER == (len(threads) * ITERATIONS), f"Invalid value for counter: {COUNTER}"

Perfect!

## Summary:

We've seen the importance of keeping our critical sections safe, to avoid race conditions. But there's no free lunch. To prevent race conditions we have to use synchronization mechanisms, and as we saw, that can carry other issues.

In the next section we'll explore one of the many things that can go wrong with manual synchronization, and one of the scariest words in computer science: **Deadlocks**.