## The evil GIL 😈🔒

GIL stands for Global Interpreter Lock. It is a mechanism used in the CPython implementation of Python, which is the reference implementation and the most widely used one. The purpose of the GIL is to synchronize access to Python objects, preventing multiple native threads from executing Python bytecodes at once. 

In Python, when multiple threads are used in a program, the GIL ensures that **only one thread executes Python bytecodes at any given time, even on multi-core systems**. This means that although you can have multiple threads in a Python program, they won't run in parallel and **cannot take full advantage of multiple CPU cores for CPU-bound tasks**.

The GIL exists largely for historical reasons. CPython manages memory with reference counting, and the GIL was introduced to keep that bookkeeping thread-safe without requiring fine-grained locks on every object. Because only one thread can execute Python code at a time, CPython avoids many low-level thread-safety issues.

The cost of this design is performance in certain scenarios. Since only one thread can execute Python bytecodes at a time, using multiple threads brings little or no speedup for CPU-bound tasks. The GIL has far less impact when threads mostly perform I/O-bound work or wait for external resources.

It's important to note that the GIL is specific to CPython. Other implementations of Python, such as Jython or IronPython, do not have a GIL and can take full advantage of multiple cores for parallel execution.

### An example

Let's look at an example that shows the effect of the GIL on a Python program. Consider the following code:

In [12]:
%%timeit

import time
import threading


def fibonacci(n):
    if n <= 1:
        return n
    else:
        return fibonacci(n-1) + fibonacci(n-2)


fibonacci(31)
fibonacci(32)
fibonacci(33)

3.05 s ± 35.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


Let's now rewrite it using the `threading` module to see if it takes less time:

In [13]:
%%timeit

import time
import threading


def fibonacci(n):
    if n <= 1:
        return n
    else:
        return fibonacci(n-1) + fibonacci(n-2)


thread_1 = threading.Thread(target=fibonacci, args=(31,))
thread_2 = threading.Thread(target=fibonacci, args=(32,))
thread_3 = threading.Thread(target=fibonacci, args=(33,))

thread_1.start()
thread_2.start()
thread_3.start()

thread_1.join()
thread_2.join()
thread_3.join()

4.01 s ± 196 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


This code takes more time than the non-threaded version because of the GIL in CPython, which prevents multiple threads from executing Python bytecodes in parallel. Even though the program has three threads, they take turns running on a single core and cannot use multiple CPU cores for this CPU-bound task.

In fact, in this specific case the threaded version is slower, because it pays the additional overhead of creating the threads and switching between them while they compete for the GIL.
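Part of that switching behaviour can be inspected directly. As a small sketch, assuming CPython 3.2 or later: the interpreter asks the running thread to give up the GIL after a "switch interval", which `sys.getswitchinterval()` reports in seconds.

```python
import sys

# The value is 0.005 s (5 ms) by default, unless it has been changed
# with sys.setswitchinterval().
interval = sys.getswitchinterval()
print(f"switch interval: {interval * 1000:.1f} ms")
```

With CPU-bound threads, each switch hands the GIL to another thread without letting any two of them run at the same time, so the switches add cost but no parallelism.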

> With the `multiprocessing` module, by contrast, we can get parallel execution: each process runs its own Python interpreter with its own Global Interpreter Lock (GIL), so the processes can run on different cores at the same time.

In [14]:
%%timeit

import time
import multiprocessing


def fibonacci(n):
    if n <= 1:
        return n
    else:
        return fibonacci(n-1) + fibonacci(n-2)


process_1 = multiprocessing.Process(target=fibonacci, args=(31,))
process_2 = multiprocessing.Process(target=fibonacci, args=(32,))
process_3 = multiprocessing.Process(target=fibonacci, args=(33,))

process_1.start()
process_2.start()
process_3.start()

process_1.join()
process_2.join()
process_3.join()

1.79 s ± 36.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


### GIL and I/O-bound tasks

The GIL is primarily a concern in CPU-bound tasks where the execution is dominated by Python bytecode, such as computational tasks involving heavy calculations. In such cases, the GIL limits the performance gains you can get from using multiple threads.

**However, in I/O-bound tasks, where a significant amount of time is spent waiting for input/output operations to complete, the GIL is less of a bottleneck**. When a thread performs a blocking I/O operation such as reading from or writing to a file, making a network request, or talking to a database, **CPython usually releases the GIL for the duration of the underlying system call**. This allows other threads to execute Python code while the I/O operation is in progress. Therefore, the GIL has a smaller impact on the overall performance of I/O-bound tasks.

In I/O-bound scenarios, leveraging multiple threads or processes can still provide benefits. For example, while one thread is waiting for an I/O operation to complete, other threads can continue executing Python code, making progress on other tasks. This can lead to improved overall throughput and responsiveness, even though each individual thread may not be able to fully utilize the CPU due to the GIL.

In summary, the GIL is less of a concern in I/O-bound tasks because it is usually released during blocking I/O operations, allowing other threads to execute Python code in the meantime. This makes threads an effective tool for concurrency in I/O-bound scenarios, leading to improved performance and responsiveness.
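The effect can be sketched without any network access. In the snippet below, `time.sleep` stands in for a blocking I/O call (it also releases the GIL while waiting); the helper names `fake_io`, `run_sequential` and `run_threaded` are made up for this illustration.

```python
import threading
import time


def fake_io(delay: float) -> None:
    # Stand-in for a blocking I/O call: time.sleep releases the GIL
    # while the thread is waiting.
    time.sleep(delay)


def run_sequential(delay: float, count: int) -> float:
    """Run `count` waits one after another and return the elapsed time."""
    start = time.perf_counter()
    for _ in range(count):
        fake_io(delay)
    return time.perf_counter() - start


def run_threaded(delay: float, count: int) -> float:
    """Run `count` waits in separate threads and return the elapsed time."""
    start = time.perf_counter()
    threads = [
        threading.Thread(target=fake_io, args=(delay,)) for _ in range(count)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return time.perf_counter() - start


sequential = run_sequential(0.2, 3)
threaded = run_threaded(0.2, 3)
print(f"sequential: {sequential:.2f} s, threaded: {threaded:.2f} s")
```

The sequential run has to wait for the three delays one after another, while the threaded run overlaps them, so it should finish in roughly the time of a single delay.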

#### Example: Reading a web page

Consider the following code, in which we send a request to read a web page:

In [21]:
%%timeit

import requests


def read_page(address: str) -> None:
    response = requests.get(address)


read_page('https://github.com')
read_page('https://google.com')
read_page('https://python.org')

4.14 s ± 221 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


Let's now rewrite it using the `threading` module to see if it takes less time:

In [22]:
%%timeit

import requests
import threading


def read_page(address: str) -> None:
    response = requests.get(address)


thread_1 = threading.Thread(target=read_page, args=('https://github.com',))
thread_2 = threading.Thread(target=read_page, args=('https://google.com',))
thread_3 = threading.Thread(target=read_page, args=('https://python.org',))

thread_1.start()
thread_2.start()
thread_3.start()

thread_1.join()
thread_2.join()
thread_3.join()

1.63 s ± 64.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


This time the threaded version is significantly faster: the threads spend most of their time waiting on the network, and the GIL is released while they wait.

This example shows how a concurrent design can improve the performance of a Python program, even with the Global Interpreter Lock (GIL) present in the CPython implementation.

<img src="./pics/gil.png" alt="io-bound-tasks" width="700" height="400">
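For more than a handful of pages, starting and joining each thread by hand becomes repetitive. One possible alternative, shown here only as a sketch, is the standard library's `concurrent.futures.ThreadPoolExecutor`. The helpers `fetch_status` and `fetch_all` are hypothetical names, and `urllib.request` is swapped in for `requests` so the snippet depends only on the standard library.

```python
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def fetch_status(address: str, timeout: float = 10.0) -> int:
    # Blocking network call; the GIL is released while waiting for the response.
    with urllib.request.urlopen(address, timeout=timeout) as response:
        return response.status


def fetch_all(addresses, fetch=fetch_status, max_workers: int = 8) -> list:
    # The pool runs `fetch` in worker threads; map() yields the results
    # in the same order as the input addresses.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, addresses))
```

Calling `fetch_all(['https://github.com', 'https://google.com', 'https://python.org'])` would request the three pages concurrently and return their HTTP status codes. The `fetch` parameter is there only so that a different function can be substituted, for example in tests that should not touch the network.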