In [1]:
import time
import threading
from threading import Thread
import random
import sys

# Shared Data and Synchronization

As we saw in our previous notebook, threads can access to share data. This is useful to communicate things and access common variables. But it'll also introduce problems, mainly [**Race Conditions**](https://en.wikipedia.org/wiki/Race_condition). Let's see an example of a Race Condition.

## Race Condition

We will start 10 threads, each one increases the shared variable `COUNTER` by 1 for 1000 iterations. If running these threads sequentially, we will have `COUNTER=10*1000=10,000`. However, due to race condition, we will not have `COUNTER=10,000`

In [2]:
COUNTER = 0

In [3]:
def increment(n):
    global COUNTER
    for _ in range(n):
        COUNTER += 1
        time.sleep(0.001)

In [4]:
ITERATIONS = 1000

In [5]:
threads = [Thread(target=increment, args=(ITERATIONS,)) for _ in range(10)]

In [6]:
[t.start() for t in threads];

In [7]:
assert COUNTER == (len(threads) * ITERATIONS), f"Invalid value for counter: {COUNTER}"

AssertionError: Invalid value for counter: 8143

What's happening here is that the operation `+=` is not **atomic** (an one-step operation). Which means that, behind the scenes, `+=` runs 2 (or more) operations:

```python
aux = COUNTER + 1
COUNTER = aux
```

Both threads are executing concurrently (potentially at the same time) and they're reading outdated values of `COUNTER`, which results in a race condition.

## Thread Synchronization

How can we fix then this race condition? Basically, we need a way to keep the threads from stepping onto each other's data, some signal that the given resource _"is busy"_:

<img src="img/recording_studio_light.png" width=400/>

*(Example of INEs studios, a recording light is on, the studio is busy, nobody will enter the room)*

The easiest synchronization mechanism is a [_Lock_](https://en.wikipedia.org/wiki/Lock_(computer_science)), or a _Mutex_ (mutual exclusion lock). Python includes the very intuitive `threading.Lock` class. Let's see how a Lock works.

A Lock works in the same way as the Studio Light from the picture. The first one that "arrives" to that given resource "turns on the light", or, formally, "acquires the lock". Any other threads reaching that point, if they want to acquire the lock, they have to wait for the first thread to "release it". Let's see an example:

#### Locking

In [8]:
lock = threading.Lock()

In [9]:
def lock_hogger(lock, sleep=10):
    print("\t\tThread: Acquiring lock.")
    lock.acquire()
    print("\t\tThread: Lock acquired, sleeping")
    if sleep:
        time.sleep(sleep)
    print("\t\tThread: Woke up, releasing lock")
    lock.release()

In [13]:
t = Thread(target=lock_hogger, args=(lock, ))

In [14]:
t.start()

		Thread: Acquiring lock.
		Thread: Lock acquired, sleeping


Trying to acquire the lock here will probably block for a few seconds:

In [23]:
lock.locked()

False

The main thread acquires the lock

In [24]:
lock.acquire()
print("Lock acquired!")

Lock acquired!


Once the lock has been acquired, if any other thread tries to acquire it, it will not be able to acquire it:

In [27]:
t = Thread(target=lock_hogger, args=(lock, 0))

In [28]:
t.start()

		Thread: Acquiring lock.


Now if the main thread releases the lock, the other threads will be able to acquire it

In [29]:
lock.release()

		Thread: Lock acquired, sleeping
		Thread: Woke up, releasing lock
		Thread: Lock acquired, sleeping
		Thread: Woke up, releasing lock


#### Time to fix our counter!

Now that we know about locks, we can use them to fix our counter example:

In [6]:
COUNTER = 0

In [7]:
def increment(n, lock):
    global COUNTER
    for _ in range(n):
        # lock before increment COUNTER
        lock.acquire()
        COUNTER += 1
        lock.release()
        time.sleep(0.001)

In [8]:
ITERATIONS = 1000

In [9]:
lock = threading.Lock()

In [10]:
threads = [Thread(target=increment, args=(ITERATIONS, lock)) for _ in range(10)]

In [11]:
[t.start() for t in threads];

In [12]:
[t.join() for t in threads];

In [13]:
COUNTER

10000

In [14]:
assert COUNTER == (len(threads) * ITERATIONS), f"Invalid value for counter: {COUNTER}"

It doesn't matter how many times we run the example, our code will always be synchronized!

#### Problems with synchronization

Locks are acquired before accessing what we call "Critical Sections"; important sections in our code that can potentially introduce race conditions. The usual process is:

```python
lock = threading.Lock() # Problem [1]
# before entering critical section
lock.acquire()          # Problem [2]
# critical section
do_your_thing()         # Problem [3]
# after we're done with it
lock.release()          # Problem [4]
```

The problem is that locks (and many other synchronization mechanisms) are "cooperative". You're cooperating by using locks, but you're not obliged to use them. In a team of n developers, just one of them screws up with their lock management, everybody loses.

These are the things that can potentially go wrong with cooperative, manual synchronization mechanisms:
```diff
- 1. You might *forget to use locks at all*! You might have failed to recognize the situation as having a "critical section".
- 2. You might *forget to acquire the lock*, getting directly into the critical section.
- 3. *Your critical section might be using resources NOT protected by the lock you're using*, so other threads can be stepping onto that.
- 4. You might *forget to release the lock*, or your code could break before you're able to release the lock (next example)
- 5. *Deadlocks*! (more on later).
```

## Code breaks before a lock is released

Problem *No.4 - Forget to release the lock / code breaks before the lock is released* is very common, let's see an example of it:

In [2]:
lock = threading.Lock()

In [48]:
def faulty_lock_handler(lock, sleep=10):
    print("\t\tThread: Acquiring lock.")
    lock.acquire()
    print("\t\tThread: Lock acquired")
    if sleep:
        time.sleep(sleep)  # this code will raise an exception
        # the lock below will never be released
    print("\t\tThread: Woke up, releasing lock")
    lock.release()

In [49]:
t = Thread(target=faulty_lock_handler, args=(lock, 'x'))  # The `sleep` param is incorrect, should be a number

In [None]:
t.start()

		Thread: Acquiring lock.
		Thread: Lock acquired


Exception in thread Thread-20 (faulty_lock_handler):
Traceback (most recent call last):
  File "/home/dk/miniconda3/lib/python3.10/threading.py", line 1016, in _bootstrap_inner


    self.run()
  File "/home/dk/miniconda3/lib/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "/tmp/ipykernel_204815/835078815.py", line 6, in faulty_lock_handler
TypeError: 'str' object cannot be interpreted as an integer


Trying to acquire the lock will block **FOREVER**, and we will have to interupt the kernel to make it stop:

In [51]:
lock.acquire()

KeyboardInterrupt: 

We could potentially add a _timeout_ to our acquire method, it will block for `n` seconds and if it hasn't acquired the lock, it'll return `False`:

In [52]:
lock.acquire(timeout=2)

False

Or we can even make it non-blocking, if it's not able to acquire the lock, it'll release it immediately:

In [20]:
lock.acquire(blocking=False)

False

Thankfully, we have a handle on the lock variable, so we can release it from here, but this is cheating:

In [24]:
lock.release()

In [25]:
lock.acquire(blocking=False)

True

The way to solve this is to <span style="color: lime"> use Locks as Context Managers </span>, so we're sure we'll release the lock EVEN if something goes wrong within the critical section:

In [57]:
lock = threading.Lock()
lock

<unlocked _thread.lock object at 0x7fc704484a80>

In [3]:
def fixed_lock_handler(lock, sleep=10):
    print("\t\tThread: Acquiring lock.")
    with lock:
        print("\t\tThread: Lock acquired")
        if sleep:
            time.sleep(sleep)
    print("\t\tThread: Woke up, releasing lock")

In [4]:
t = Thread(target=fixed_lock_handler, args=(lock, 5))

In [5]:
t.start()

		Thread: Acquiring lock.
		Thread: Lock acquired


In [6]:
lock.acquire()

		Thread: Woke up, releasing lock


True

In [7]:
lock.release()

In [8]:
t = Thread(target=fixed_lock_handler, args=(lock, 'x'))  # The `sleep` param is incorrect, should be a number

In [None]:
t.start()

		Thread: Acquiring lock.
		Thread: Lock acquired


Exception in thread Thread-5 (fixed_lock_handler):
Traceback (most recent call last):
  File "/home/dk/miniconda3/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/home/dk/miniconda3/lib/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "/tmp/ipykernel_230447/4120026485.py", line 6, in fixed_lock_handler


TypeError: 'str' object cannot be interpreted as an integer


Is the lock still acquired? (It should not be)

In [10]:
lock.locked()

False

In [11]:
lock.acquire()

True

In [12]:
lock.release()

As you can see, the critical section failed with an exception, but the lock was released before exiting. As a reference, the `with` context manager is syntactic sugar for the pattern:

```python
lock.acquire()
try:
    critical_section()
finally:
    lock.release()  # We'll release the lock no matter what
```

#### Fixing our counter using `with`

The last touch for our counter should be to use the context manager protocol of the lock object:

In [15]:
COUNTER = 0

In [16]:
def increment(n, lock):
    global COUNTER
    for _ in range(n):
        with lock:
            COUNTER += 1
        time.sleep(0.001)

In [17]:
ITERATIONS = 1000

In [18]:
lock = threading.Lock()

In [19]:
threads = [Thread(target=increment, args=(ITERATIONS, lock)) for _ in range(10)]

In [20]:
[t.start() for t in threads];

In [21]:
[t.join() for t in threads];

In [22]:
COUNTER

10000

In [23]:
assert COUNTER == (len(threads) * ITERATIONS), f"Invalid value for counter: {COUNTER}"

Perfect!

## Summary:

We've seen the importance of keeping our critical sections safe, to avoid race conditions. But there's no free lunch. To prevent race conditions we have to use synchronization mechanisms, and as we saw, that can carry other issues.

In the next section we'll explore one of the many things that can go wrong with manual synchronization: one of the the scariest words in computer science: **Deadlocks**.