# Multithreading and Multiprocessing

1. [Programs, Processes, and Threads](#intro)
2. [When to Choose Threads vs. Multiple Processes](#choose)
3. [Global Interpreter Lock](#gil)
    * [Original Purpose](#purpose)
    * [Impact on Multi-Threaded Python Programs](#impact)
    * [Alternatives to Multi-Threading in Python](#alt)
<br>
<br>
4. [Concurrency](#conc)
    * [Concurrency and Parallelism](#cp)
    * [When is Concurrency Useful?](#use)
<br>
<br>
5. [Multiprocessing](#multi)
    * [Advantages of Multiprocessing](#multi2)
<br>
<br>
6. [Threading in Python](#thread)
    * [More on Threads](#thread2)
    * [Using a ThreadPoolExecutor](#thread3)
    * [Race Conditions](#race)
        * [Solving Race Conditions via Basic Synchronization using Lock](#thread4)
        * [Deadlock](#dead)
    * [Producer-Consumer Threading](#thread6)
        * [The Lock Method](#pct1)
        * [The Queue Method](#pct2)
    * [Threading Objects](#thread7)

## Programs, Processes, and Threads  <a class="anchor" id="intro"></a>

* Programs are stored in the machine's non-volatile memory in binary form
* Programs obviously need memory and various OS resources to run
    * Registers - data holding places that are part of the CPU. May hold instructions, storage address, or some other process-dependent data
    * Program Counter (or Instruction Pointer) - keeps track of where a computer is in its program sequence
    * Stack - data structure that stores information about the active subroutines of a computer program and used as a scratch space for the process
    * Heap - dynamically allocated memory for a process
* __Process__ - a program that has been loaded into memory _along with all the resources it needs to operate_
    * Each instance of a program is its own process which runs independently and isolated from other processes
    * Cannot directly access shared data in other processes
    * Switching processes requires some time (relatively speaking) for saving and loading registers, memory maps, and other resources
    * This independence helps ensure that a problem with one process doesn't corrupt or otherwise mess with another process
<br>
<img src="resources/threads/proc_thread.PNG">
<br>
* __Threads__ - Unit of execution _within_ a process.
    * Can vary from 1 to many for a single process
    * Each thread in a process shares the process's allocated memory and resources.
    * ___Each thread has its own stack, but all threads in a process will share the heap___
    * Sometimes called lightweight processes because each one needs only its own stack (and registers) rather than a full set of process resources
    * Share the same address space as the process and other threads, so communication is quick and easy
    * A problem with one thread in a process can affect the other threads and the viability of the process as a whole
* Single-threaded processes contain only 1 thread, meaning that the process and the thread are one and the same - there is only one thing happening
* Multithreaded processes contain more than one thread and accomplish a number of things at (almost) the same time
<br>
<img src="resources/threads/single_multi.PNG">
<br>
* Systems with multiple processors or CPU cores (common in modern processors) can execute multiple processes or threads in parallel, but a single-core processor cannot
    * A single-core processor runs processes via a scheduling algorithm that divides the CPU's time among them and gives the illusion of parallel execution (called _concurrency_)
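
The stack-versus-heap point above can be illustrated with a small sketch. The names `shared_results` and `worker` are made up for this example; it assumes CPython, where appending to a list from several threads is safe.

```python
import threading

shared_results = []   # one list object, visible to every thread in the process


def worker(worker_id):
    # `local_total` is a local variable, so each thread gets its own copy
    local_total = sum(range(worker_id + 1))
    # All threads append to the same shared list
    shared_results.append((worker_id, local_total))


threads = [threading.Thread(target=worker, args=(i,)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(shared_results))   # [(0, 0), (1, 1), (2, 3)]
```

The arrival order in `shared_results` depends on scheduling, which is why the output is sorted before printing.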

## When to Choose Threads vs. Multiple Processes <a class="anchor" id="choose"></a>

* Tradeoffs between memory and resources and potentially speed  
* ex: Google Chrome, unlike most other browsers, runs each tab in its own process, rather than threads of a single process. 
    * Higher fixed cost in memory and resources but less memory bloat overall  
    * Better memory usage when memory is low 
        * inactive window is treated as a lower priority by the OS and becomes eligible to be swapped to disk when memory is needed for other processes
<br>
<img src="resources/threads/adv_disadv.PNG">
<br>

## Python's Global Interpreter Lock (GIL) <a class="anchor" id="gil"></a>

* Mutex (lock) that allows only one thread to hold control of the Python interpreter
    * AKA - only one thread can be executing Python bytecode at any point in time
    * Mutex is short for mutual exclusion
* Performance bottleneck in CPU-bound and multi-threaded code

### Original Purpose <a class="anchor" id="purpose"></a>

* Python uses reference counting for memory management
    * Objects in python have a reference count variable that keeps track of the number of references that point to the object
    * When that count hits 0, the memory occupied by the object is released
* The reference count variable needed protection from [__race conditions__](#race) where two threads increase or decrease its value simultaneously
    * Consequences: Leaked memory that's never released, incorrectly released memory, or worse
    * Locks could be added to all data structures that are shared across threads to prevent them from being modified inconsistently
        * Having multiple locks can lead to __Deadlock__ 
        * Also decreased performance by repeated acquisition and release of locks
* GIL is a single lock on the interpreter itself which adds a rule that execution of any Python bytecode requires acquiring the interpreter lock
    * Prevents deadlocks, since there's only one lock
    * Doesn't add much performance overhead
    * Effectively makes any CPU-bound Python program single-threaded
    * Pragmatic solution to what was a new problem at the time of development

### Impact on Multi-Threaded Python Programs <a class="anchor" id="impact"></a>

* __CPU-bound Programs__ - push the CPU to its limit
    * Includes programs that do mathematical computations like matrix multiplications, searching, image processing, etc.
    * Effectively single-threaded because of GIL
    * May even run slower than an entirely single-threaded version of the same program, because of the overhead of acquiring and releasing the lock
    * Typically better dealt with by multiprocessing
* __I/O-bound Programs__ - spend time waiting for input/output, which can come from a user, file, database, network, etc.
    * May have to wait for significant amounts of time for input due to source doing its own processing
    * Not highly impacted by the GIL, because a thread releases the lock while it waits for I/O, letting other threads run
    * Frequently arise when working with something much slower than your CPU (typically file system and network connections)
    * Typically better dealt with using Threading
* There are also programs with threads of each type, so Python implemented a forced release
    * After a fixed interval of continuous use, a thread is forced to release the GIL; if no other thread acquires it, the same thread can continue
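
A rough sketch of the I/O-bound case: `time.sleep` stands in for a blocking network or disk call, and `fake_io` is a made-up name. The forced-release interval can be read with `sys.getswitchinterval()` on CPython 3.2 and later.

```python
import sys
import threading
import time


def fake_io(seconds):
    # Stand-in for blocking I/O; the GIL is released while the thread sleeps
    time.sleep(seconds)


start = time.perf_counter()
threads = [threading.Thread(target=fake_io, args=(0.2,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - start

# The four waits overlap, so the total is close to 0.2 s rather than 0.8 s
print(f"4 waits of 0.2 s took {elapsed:.2f} s")
print(f"switch interval: {sys.getswitchinterval()} s")
```

Replacing the sleep with a pure-Python computation would remove most of the benefit, since only one thread could run at a time.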

### Alternatives to Multi-Threading in Python <a class="anchor" id="alt"></a>                                            

* Multi-processing vs multi-threading
    * Each Python process gets its own Python interpreter and memory space, so the GIL isn't a problem
    * Python has a `multiprocessing` module that does this easily
* Alternative Python interpreters
    * There are multiple interpreter implementations written in a number of languages. If your program and its libraries are available for another implementation, you can try that
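
A minimal sketch of the `multiprocessing` route, using `multiprocessing.Pool`. The function `cpu_task` and the input sizes are illustrative only.

```python
from multiprocessing import Pool


def cpu_task(n):
    # Pure-Python arithmetic: CPU-bound work that threads would not speed up
    return sum(i * i for i in range(n))


if __name__ == "__main__":
    # The guard matters on platforms that start workers by re-importing this module
    with Pool(processes=2) as pool:
        # Each input is handled in a worker process with its own interpreter and GIL
        results = pool.map(cpu_task, [10, 100, 1000])
    print(results)
```

Arguments and results are pickled to cross the process boundary, so this pays off mainly when each task does a meaningful amount of work.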

## Concurrency <a class="anchor" id="conc"></a>

### Concurrency and Parallelism <a class="anchor" id="cp"></a>

* __Concurrency__ - multiple things making progress over the same period of time (loosely, things happening simultaneously)
    * In Python, those things have many different names (thread, task, process) but at a high level refer to instructions running in order.
* Different libraries that provide Python workarounds for allowing concurrency do so in different ways. Only in `multiprocessing` are events truly happening at the same time
* Pre-emptive multitasking - OS can pre-empt your thread to make a switch to a different thread
    * Used by `threading`
    * Don't have to hard code anything into the program to make the switch
    * Switch can happen at _any time_ even in the middle of trivial statements
* Cooperative multitasking - tasks cooperate by 'announcing' when they are ready to be switched out
    * Requires slight code change but you know switching will happen at specific times
    * Used by `asyncio`
* __Parallelism__ - Multiple processes running at the same time on different CPU cores
    * `multiprocessing` creates new processes
    * Because each process is separate, each can run on a different core in true parallelism
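
Cooperative multitasking can be sketched with `asyncio`. The coroutine name `fetch` and the delays are made up; `asyncio.sleep` stands in for a real network wait.

```python
import asyncio


async def fetch(name, delay):
    # `await` marks the point where this task announces it can be switched out
    await asyncio.sleep(delay)
    return name


async def main():
    # Both coroutines run on a single thread; gather returns results in argument order
    return await asyncio.gather(fetch("slow", 0.2), fetch("fast", 0.1))


results = asyncio.run(main())
print(results)   # ['slow', 'fast']
```

Unlike `threading`, a switch can only happen at an `await`, so code between two awaits runs without interruption from other tasks.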

## Multiprocessing <a class="anchor" id="multi"></a>

* Multiprocessing refers to the ability of a system to support more than one processor at the same time
    * Applications are broken to smaller chunks of code that run independently.
    * These chunks (threads) are allocated to multiple processors, improving performance

### Advantages of Multiprocessing <a class="anchor" id="multi2"></a>

1. Increased Throughput - By increasing the number of processors more work can be completed in the same time
2. Cost Saving - A parallel system shares memory, buses, peripherals, etc. Also, if a number of programs operate on the same data, it is cheaper to store that data on a single disk shared by all than to keep many copies
3. Increased reliability - If one processor fails then its failure may slightly slow down the speed of the system, but it will still work smoothly

## Threading in Python <a class="anchor" id="thread"></a>

* Python has a `threading` module which provides a lot of functionality in this realm
    * For a Python Threading Tutorial, see [this notebook](#http://localhost:8888/notebooks/Python/threading_tutorial.ipynb#)

### More On Threads <a class="anchor" id="thread2"></a>

* __Daemon Threads__ - shut down immediately when the program exits
    * Can be considered a thread that runs in the background without worrying about having to manually shut it down
    * A program running `Threads` that are not `daemons` will wait for those threads to complete before it terminates, whereas `daemon` threads are killed when the program exits
* When you want to specifically wait for a thread to stop, call `.join()`
    * Works for both ensuring that a thread is executed before a program closes and for making one thread wait until another is finished
* The order in which threads are run is determined by the OS and can be hard to predict
    * Can be coordinated in the `threading` module
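
A small sketch of the `daemon` flag and `.join()`. The function `background_job` and the thread names are made up for illustration.

```python
import threading
import time

finished = []


def background_job(name):
    time.sleep(0.1)
    finished.append(name)


# daemon=True: the interpreter will not wait for this thread at exit
helper = threading.Thread(target=background_job, args=("helper",), daemon=True)
# daemon defaults to False: the program waits for this thread before exiting
worker = threading.Thread(target=background_job, args=("worker",))

helper.start()
worker.start()

worker.join()   # block here until `worker` has finished
helper.join()   # a daemon thread can be joined too, which makes the wait explicit

print(sorted(finished))   # ['helper', 'worker']
```

Without the `helper.join()` call, the daemon thread could be killed mid-sleep if the main program reached its end first.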

### Using a ThreadPoolExecutor <a class="anchor" id="thread3"></a>

* Context manager which starts up a given number of worker threads
    * Uses `map` to step through an iterable, passing each one to a thread in the pool
    * Automatically does the `.join()` at the end
    * Can cause some confusing errors - exceptions raised inside a worker are not always shown, so a task may just terminate with no output
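
A minimal sketch of `concurrent.futures.ThreadPoolExecutor`, including one way to surface a hidden exception. The functions `square` and `broken` are made up for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def square(n):
    return n * n


def broken(n):
    raise ValueError(f"cannot process {n}")


with ThreadPoolExecutor(max_workers=3) as executor:
    # map hands each item to a worker and yields results in input order
    squares = list(executor.map(square, range(5)))

    # submit returns a Future; the exception only surfaces when result() is called
    future = executor.submit(broken, 7)
    try:
        future.result()
    except ValueError as exc:
        error_message = str(exc)
# Leaving the `with` block waits for all workers, like calling .join() on each

print(squares)         # [0, 1, 4, 9, 16]
print(error_message)   # cannot process 7
```

If `future.result()` is never called, the `ValueError` stays stored in the future and nothing is printed, which is the silent failure described above.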

### Race Conditions <a class="anchor" id="race"></a>

* Occur when two or more threads access a shared piece of data or resource and at least one of them modifies it without synchronization
* Difficult to debug because they're typically rare and produce confusing results
* Oversimplified explanation
    * When you tell ThreadPoolExecutor to run each thread, you tell it which function to run and what parameters to pass to it
    * The result of this is that each of the threads in a pool will call that function, which calls an instance method on the target object
    * Each thread will have a reference to the same object and will also have a unique index 
        * If you're only running a single thread, then there is (obviously) no concern for race conditions, so things like `time.sleep()` don't have an effect
    * When the thread starts running the function, it has its own version of all the data local to the function, which means that all variables that are scoped to a function are __thread safe__, but the shared target object is the problem
    * Essentially, the threads have interleaving access to a single shared object, overwriting each others' results
        * Similar race conditions can arise when one thread frees memory or closes a file handle before the other thread is finished accessing it 
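
The lost update described above can be reproduced with a short sketch. `FakeDatabase` is a hypothetical class, and the `threading.Barrier` is an assumption added purely to force the unlucky interleaving on every run; in real code the switch would happen only occasionally.

```python
import threading


class FakeDatabase:
    """Hypothetical shared object, used only for this illustration."""

    def __init__(self):
        self.value = 0
        # Forces both threads to finish reading before either one writes
        self._both_have_read = threading.Barrier(2)

    def update(self):
        local_copy = self.value       # read shared state into a thread-local variable
        local_copy += 1               # modify the private copy
        self._both_have_read.wait()   # stand-in for an unlucky thread switch
        self.value = local_copy       # write back, overwriting the other thread's result


database = FakeDatabase()
threads = [threading.Thread(target=database.update) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(database.value)   # 1, not 2: one update was lost
```

`local_copy` is thread safe because it is local to the function; the problem is only the shared `database.value`.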

#### Solving Race Conditions via Basic Synchronization using Lock <a class="anchor" id="thread4"></a>

* `Lock` essentially does the same thing as the GIL
    * Mutex object 
    * Only one thread at a time can have the `Lock` object, and any others must wait until it is released
    * Also operates as a context manager, so with certain syntax it will be released automatically when exited
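
A sketch of the same read-modify-write pattern guarded by `threading.Lock`. `SafeCounter` and `work` are illustrative names, not part of any library.

```python
import threading


class SafeCounter:
    """Hypothetical counter whose read-modify-write is guarded by a Lock."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # `with` acquires the lock and releases it on exit, even if an exception is raised
        with self._lock:
            local_copy = self.value
            local_copy += 1
            self.value = local_copy


counter = SafeCounter()


def work():
    for _ in range(10_000):
        counter.increment()


threads = [threading.Thread(target=work) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter.value)   # 40000
```

Only one thread at a time can be inside the `with` block, so no update is overwritten.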

#### Deadlock <a class="anchor" id="dead"></a>

* Describes the situation in which a thread waits forever for a lock that will never be released
* Typical causes:
    1. An implementation bug where a lock is not released properly
        * Reduce occurrence by using `Lock` as a context manager
    2. A design issue where a utility function needs to be called by functions that might or might not already have the lock
        * Python has a second object designed to help with this (`RLock`) - a reentrant lock that the thread already holding it can acquire again without blocking
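
The second cause can be sketched as follows. The functions `helper` and `caller` are made up, and the `timeout` on the plain `Lock` is there only so the example reports the problem instead of hanging.

```python
import threading

plain_lock = threading.Lock()
reentrant_lock = threading.RLock()

with plain_lock:
    # A second blocking acquire by the same thread would wait forever;
    # with a timeout it gives up and returns False instead
    reacquired = plain_lock.acquire(timeout=0.1)


def helper(lock):
    # Utility that takes the lock itself, whether or not the caller already holds it
    with lock:
        return "done"


def caller(lock):
    with lock:
        return helper(lock)   # nested acquire by the same thread


result = caller(reentrant_lock)

print(reacquired)   # False
print(result)       # done
```

Calling `caller(plain_lock)` would deadlock, because `helper` would block on a lock its own thread is holding.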

### Producer-Consumer Threading <a class="anchor" id="thread6"></a>

* Standard computer science problem used to look at threading or process synchronization issues
* Producer - generates an output that is the consumer's input
* Consumer - inputs the Producer's output
* Between the two, there is a pipeline that transmits the relevant data
    * The pipeline can be altered to use different synchronization methods, like locks or queues
* For the lock method, there are two locks, `consumer_lock` and `producer_lock`, which manage information transmission through the pipeline
* ___EXAMPLE___:
* Producer - Imagine a program that needs to read messages from a network and write them to a disk
    * Does not request a message when it wants, but must be 'listening' and accepting  messages as they come
    * Messages arrive in bursts, not regular pace
* Consumer - Imagine a program that writes the messages to a database
    * Access is slow, but fast enough to keep up to the average pace of messages
    * _Not_ fast enough to keep up with a burst of messages
* Between producer and consumer, there exists a pipeline 
    * This is what will change based on the synchronization objects 

#### The Lock Method <a class="anchor" id="pct1"></a>

* In this case, the Pipeline is just a class made to allow a single element between producer and consumer
1. Initialize pipeline with Producer, Consumer, and Message members and acquire the consumer lock
    * Essentially, the producer is allowed to put messages in the pipeline, but the consumer can't do any work until a message is present
2. Once the consumer has acquired the `.consumer_lock` it copies the message and then releases the `producer_lock`, allowing the producer to insert the next message into the pipeline
3. On the other side, the producer will notify the pipeline that it has a new message, acquire the `producer_lock`, send the message, then release the `consumer_lock` which allows the consumer to read the message
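
The three steps above can be sketched as a single-slot pipeline. The method names `set_message` and `get_message` and the `SENTINEL` end marker are assumptions made for this illustration.

```python
import threading

SENTINEL = object()   # hypothetical end-of-stream marker


class Pipeline:
    """Single-slot pipeline guarded by two locks."""

    def __init__(self):
        self.message = None
        self.producer_lock = threading.Lock()
        self.consumer_lock = threading.Lock()
        self.consumer_lock.acquire()   # nothing to read yet, so the consumer must wait

    def set_message(self, message):
        self.producer_lock.acquire()   # wait until the slot is free
        self.message = message
        self.consumer_lock.release()   # allow the consumer to read it

    def get_message(self):
        self.consumer_lock.acquire()   # wait until a message is present
        message = self.message
        self.producer_lock.release()   # allow the producer to write the next one
        return message


def producer(pipeline, messages):
    for message in messages:
        pipeline.set_message(message)
    pipeline.set_message(SENTINEL)


def consumer(pipeline, received):
    while (message := pipeline.get_message()) is not SENTINEL:
        received.append(message)


pipeline = Pipeline()
received = []
threads = [
    threading.Thread(target=producer, args=(pipeline, [1, 2, 3])),
    threading.Thread(target=consumer, args=(pipeline, received)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(received)   # [1, 2, 3]
```

Each lock is acquired by one thread and released by the other, which `threading.Lock` permits. The cost is that the producer can never get more than one message ahead of the consumer.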

#### The Queue Method <a class="anchor" id='pct2'></a>

* Allows more than one value in the pipeline at a time because it grows and shrinks as data backs up from the producer
* Change Pipeline to a queue
    * The queue has a maxsize to limit memory usage, and is inherently thread-safe
1. `Event` allows one thread to signal an event while many other threads can be waiting for that event to happen
    * Waiting threads check status every once in a while, don't need to stop what they're doing
    * Producer and Consumer essentially loop until the event is triggered
2. Producer creates a number of messages and is swapped out either by queue reaching maxsize or by the OS
3. Consumer takes the messages from the pipeline
4. Producer finishes adding all messages to the pipeline and exits
5. Consumer continues its work until it has cleaned out the pipeline
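
A sketch of the queue-based variant using `queue.Queue` and `threading.Event`. In this simplified version the producer sets the event itself once it has queued everything, which is an assumption; the names `done` and `received` are also made up.

```python
import queue
import threading


def producer(pipeline, done, messages):
    for message in messages:
        pipeline.put(message)   # blocks while the queue is at maxsize
    done.set()                  # signal that no more messages will arrive


def consumer(pipeline, done, received):
    # Keep going until the producer is done AND the queue has been drained
    while not done.is_set() or not pipeline.empty():
        try:
            message = pipeline.get(timeout=0.05)
        except queue.Empty:
            continue            # nothing yet; re-check the event
        received.append(message)


pipeline = queue.Queue(maxsize=2)   # bounded to limit memory usage
done = threading.Event()
received = []

threads = [
    threading.Thread(target=producer, args=(pipeline, done, range(10))),
    threading.Thread(target=consumer, args=(pipeline, done, received)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(received)   # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

The timeout on `get` keeps the consumer from blocking forever on an empty queue after the producer has exited.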

### Threading Objects <a class="anchor" id="thread7"></a>

* __Semaphore__ - Atomic counter with special lock acquisition/release based behaviors
    * AKA the OS can't swap the thread in the middle of incrementing/decrementing the counter
    * If a thread tries to acquire the semaphore while the counter is 0, it will block until a different thread releases it and increments the counter to 1
* __Timer__ - Schedules functions to be called after a certain amount of time has passed (but not necessarily at the exact desired time)
* __Barrier__ - Keeps a fixed number of threads in sync
    * Threads will be blocked until a specific number of them are waiting, then all are released at the same time
    * Still scheduled to run one at a time
    * Common use - Allows a pool of threads to initialize themselves, ensuring that none of the threads start running before all are finished with their initialization
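
The three objects can be tried together in one small sketch. The worker function, the `stats` dictionary, and the limits chosen are illustrative only.

```python
import threading
import time

slots = threading.Semaphore(2)      # counter starts at 2: at most 2 holders at once
start_line = threading.Barrier(4)   # releases once 4 threads are waiting
stats_lock = threading.Lock()
stats = {"active": 0, "peak": 0}


def worker():
    start_line.wait()               # no worker proceeds until all 4 have arrived
    with slots:                     # decrements the counter, blocks while it is 0
        with stats_lock:
            stats["active"] += 1
            stats["peak"] = max(stats["peak"], stats["active"])
        time.sleep(0.05)
        with stats_lock:
            stats["active"] -= 1


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

timer_fired = threading.Event()
timer = threading.Timer(0.1, timer_fired.set)   # calls timer_fired.set() after about 0.1 s
timer.start()
timer.join()                        # Timer is a Thread subclass, so it can be joined

print(stats["peak"])                # never more than 2
print(timer_fired.is_set())         # True
```

The semaphore caps how many workers are inside the guarded block at once, even though the barrier releases all four together.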