# Parallel programming in Python 
## Thread-based parallelism using the threading and concurrent.futures modules


The [concurrent.futures](https://docs.python.org/3/library/concurrent.futures.html) module provides interfaces for running tasks using pools of thread or process workers. The APIs are the same, so applications can switch between threads and processes with minimal changes.

The [threading](https://docs.python.org/3/library/threading.html) module provides a high-level, object-oriented API for managing the execution of threads within a process. Thread objects run concurrently within the same process and share its memory with the other threads. In the standard CPython build, the global interpreter lock (GIL) allows only one thread to execute Python bytecode at a time. As a result, threads are an easy way to speed up I/O-bound tasks, because other threads can run while one thread waits for I/O. They bring little benefit for CPU-bound work.
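The I/O-bound claim can be checked with a small timing sketch. It assumes that `time.sleep()` is a fair stand-in for an I/O wait: like a blocking network or disk call, it releases the GIL while waiting. The `fake_io` helper is hypothetical. `join()` blocks until a thread has finished.

```python
import threading
import time

def fake_io(delay):
    """Hypothetical stand-in for an I/O wait; time.sleep() releases the GIL."""
    time.sleep(delay)

start = time.perf_counter()
threads = [threading.Thread(target=fake_io, args=(0.2,)) for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()  # wait for every thread to finish
elapsed = time.perf_counter() - start

# The five 0.2 s waits overlap, so the total is close to 0.2 s rather than 1.0 s.
print('elapsed: {:.2f} s'.format(elapsed))
```

If a CPU-bound function replaced `time.sleep()`, the standard CPython build would show little or no speed-up, because the GIL serializes bytecode execution.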

----

In [1]:
import threading

def worker():
    """thread worker function"""
    print('Worker')


threads = []
for i in range(5):
    t = threading.Thread(target=worker)
    threads.append(t)
    t.start()

Worker
Worker
Worker
Worker
Worker


It is often useful to spawn a thread and pass it arguments that tell it what work to do. Any type of object can be passed as an argument to the thread. This example passes a number. The thread sleeps for that many seconds and then prints the number.

In [14]:
import threading
import time

def worker(number):
    """thread worker function"""
    time.sleep(number)
    print('Worker: %s' % number)
    
threads = []
for i in range(5):
    t = threading.Thread(target=worker, args=[i])
    threads.append(t)
    t.start()
    

Worker: 0
Worker: 1
Worker: 2
Worker: 3
Worker: 4
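`Thread` discards the return value of its target. A common workaround is to pass a mutable object for the thread to fill in. The sketch below shows this pattern. The `results` list and the `label` keyword are illustrative and not part of the example above. The sketch also uses `kwargs`, which passes keyword arguments alongside the positional `args` tuple.

```python
import threading

def worker(number, results, label='Worker'):
    """Store a value in a shared list, since Thread ignores the return value."""
    results.append('{}: {}'.format(label, number * number))

results = []  # a mutable object shared by all threads
threads = []
for i in range(5):
    # Positional arguments go in `args` (a tuple), keyword arguments in `kwargs` (a dict).
    t = threading.Thread(target=worker, args=(i, results), kwargs={'label': 'squared'})
    threads.append(t)
    t.start()

for t in threads:
    t.join()  # make sure every worker has appended its value

# list.append() is thread-safe in CPython, so no lock is needed for this simple case.
print(sorted(results))
```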


To identify the current thread, query the `Thread` object returned by `threading.current_thread()`:

In [13]:
print(threading.current_thread().name)
print(threading.current_thread().native_id)

MainThread
29843
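A thread can be identified in other ways besides its name. In this sketch, `threading.get_ident()` returns an identifier assigned by Python, which may be reused after a thread exits. `threading.get_native_id()` (Python 3.8+) returns the ID assigned by the operating system. That is the same value as the `native_id` attribute printed above.

```python
import threading

current = threading.current_thread()
print(current.name)                                     # 'MainThread' when run from the main thread
print(current is threading.main_thread())               # True in the main thread
print(threading.get_ident() == current.ident)           # same Python-level identifier
print(threading.get_native_id() == current.native_id)   # same OS-level thread ID
```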


In [4]:
import threading
import time

def worker(number):
    """thread worker function"""
    time.sleep(number)
    print('Worker name: {}'.format(threading.current_thread().name))
    print('Worker native id: {}'.format(threading.current_thread().native_id))
    print('Worker number: {}'.format(number))
    

threads = []
for i in range(5):
    t = threading.Thread(target=worker, args=[i], name="ben_{}".format(i))
    threads.append(t)
    t.start()
    

Worker name: ben_0
Worker native id: 613
Worker number: 0
Worker name: ben_1
Worker native id: 614
Worker number: 1
Worker name: ben_2
Worker native id: 615
Worker number: 2
Worker name: ben_3
Worker native id: 616
Worker number: 3
Worker name: ben_4
Worker native id: 617
Worker number: 4


----

## Daemon vs. non-daemon threads

Up to this point, the main program has implicitly waited for all threads to complete their work before exiting, because they were non-daemon threads. Sometimes it is beneficial for a program to spawn a thread as a daemon. A daemon thread runs without blocking the main program from exiting and is stopped abruptly when the program shuts down.

Daemon threads are useful for services where there may be no easy way to interrupt the thread, or where letting the thread die in the middle of its work does not lose or corrupt data. An example is a thread that generates heartbeats for a service-monitoring tool. To mark a thread as a daemon, pass `daemon=True` when constructing it, or set its `daemon` attribute to `True` before calling `start()`.

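By default, a new thread inherits the daemon status of the thread that creates it. The main thread is not a daemon, so threads created from it are non-daemon unless `daemon=True` is given.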
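The sketch below shows both ways of marking a thread as a daemon. It also shows the inherited non-daemon default and the `RuntimeError` raised when the flag is changed after `start()`. The `background` function is a placeholder.

```python
import threading
import time

def background():
    time.sleep(0.1)  # placeholder for real background work

# Option 1: pass daemon=True to the constructor.
d1 = threading.Thread(target=background, daemon=True)

# Option 2: set the daemon attribute before calling start().
d2 = threading.Thread(target=background)
d2.daemon = True

# No flag given: inherits daemon=False from the (non-daemon) main thread.
n = threading.Thread(target=background)

for t in (d1, d2, n):
    t.start()

print(d1.daemon, d2.daemon, n.daemon)  # True True False

# The daemon flag cannot be changed once the thread has been started.
error_raised = False
try:
    n.daemon = True
except RuntimeError as exc:
    error_raised = True
    print('RuntimeError:', exc)

for t in (d1, d2, n):
    t.join()
```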

----

In [51]:
%%writefile tdeamons.py
import threading
import time

def daemon(duration):
    print('Starting {} and sleeping for {} secs'.format(threading.current_thread().name, duration))
    time.sleep(duration)
    print('Exiting', threading.current_thread().name)


def non_daemon(duration):
    print('Starting {} and sleeping for {} secs'.format(threading.current_thread().name, duration))
    time.sleep(duration)
    print('Exiting', threading.current_thread().name)

d = threading.Thread(name='daemon', target=daemon, args=[5], daemon=True)

t = threading.Thread(name='non-daemon', target=non_daemon, args=[2], daemon=False)

d.start()
t.start()

# Wait for the daemon thread. Without this line, the program exits once the
# non-daemon thread finishes, killing the daemon before it prints 'Exiting'.
d.join()

Overwriting tdeamons.py


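If you want the main program (thread) to wait until a daemon thread has completed its work, call the thread's `join()` method. The script above does this with `d.join()`. Try removing that call and running the script again. The program then exits as soon as the non-daemon thread finishes, and the daemon thread never prints its exit message.

- By default, `join()` blocks indefinitely. You can also pass a `timeout` in seconds (a float) to limit how long to wait for the thread to finish. `join()` always returns `None`, so if the thread does not complete within the timeout, `join()` simply returns. Call `is_alive()` afterwards to find out whether the thread actually finished.
- `join()` is not limited to daemon threads. Calling it on non-daemon threads makes the main thread wait for them at a specific point, so it acts as a barrier.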

In [52]:
!python tdeamons.py

Starting daemon and sleeping for 5 secs
Starting non-daemon and sleeping for 2 secs
Exiting non-daemon
Exiting daemon
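The following sketch shows the `timeout` behaviour of `join()` described above. Here `slow` is a hypothetical worker that only sleeps. `join()` returns `None` whether or not the thread finished, so `is_alive()` is what distinguishes the two cases.

```python
import threading
import time

def slow():
    time.sleep(1.0)  # hypothetical long-running work

t = threading.Thread(target=slow, name='slow-worker')
t.start()

t.join(timeout=0.1)                  # gives up after about 0.1 s
alive_after_timeout = t.is_alive()
print('alive after timeout:', alive_after_timeout)   # True: still sleeping

t.join()                             # no timeout: block until the thread ends
alive_after_join = t.is_alive()
print('alive after join:', alive_after_join)         # False
```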


----

## Enumerating over all active threads

It is not necessary to retain an explicit handle to every daemon thread in order to ensure that they have completed before the main process exits. `threading.enumerate()` returns a list of all active `Thread` instances. The list includes the current thread, which must be skipped: joining the current thread would deadlock, so `join()` raises a `RuntimeError` instead.

----

In [5]:
import threading
#print out all active threads
for t in threading.enumerate():
    print(t)

<_MainThread(MainThread, started 22446638413632)>
<Thread(Thread-3, started daemon 22446595532544)>
<Heartbeat(Thread-4, started daemon 22446593431296)>
<Thread(Thread-5, started daemon 22446587127552)>
<Thread(Thread-6, started daemon 22446585026304)>
<ControlThread(Thread-2, started daemon 22446582925056)>
<HistorySavingThread(IPythonHistorySavingThread, started 22446580823808)>
<ParentPollerUnix(Thread-1, started daemon 22446578460416)>


In [7]:
%%writefile print_active_threads.py
import threading
#print out all active threads
for t in threading.enumerate():
    print(t)

Overwriting print_active_threads.py


In [8]:
!python print_active_threads.py

<_MainThread(MainThread, started 23021584615232)>
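The sketch below puts these pieces together. It starts a few daemon threads without keeping references to them and then joins every active thread except the current one. Run it as a script, like `print_active_threads.py`. In a notebook, `threading.enumerate()` also returns the kernel's own background threads listed above, and those never finish. For that reason the sketch bounds each `join()` with a timeout, an extra safeguard that the text above does not require. Thread names such as `worker-0` are illustrative.

```python
import threading
import time

def worker(delay):
    time.sleep(delay)

# Start daemon threads without keeping any references to them.
for i in range(3):
    threading.Thread(target=worker, args=(0.1 * (i + 1),),
                     name='worker-{}'.format(i), daemon=True).start()

current = threading.current_thread()
for t in threading.enumerate():
    if t is current:
        continue              # joining the current thread raises RuntimeError
    print('joining', t.name)
    t.join(timeout=5.0)       # bounded wait in case a thread never finishes

still_running = [t.name for t in threading.enumerate() if t.name.startswith('worker-')]
print('workers still running:', still_running)  # []
```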


## concurrent.futures

[concurrent.futures](https://docs.python.org/3/library/concurrent.futures.html)


The module provides two types of classes for interacting with the pools: `Executor`s manage pools of workers, and `Future`s manage the results computed by the workers.

To use a pool of workers, an application creates an instance of the appropriate executor class and then submits tasks for it to run. Each submitted task immediately returns a `Future` instance.

When the result of the task is needed, an application can call the future's `result()` method, which blocks until the result is available.

### Executor Objects 

`Executor` is an abstract class that provides methods to execute calls asynchronously. It should not be used directly, but only through its concrete subclasses, `ThreadPoolExecutor` and `ProcessPoolExecutor`.



### Future Objects

The `Future` class encapsulates the asynchronous execution of a callable. `Future` instances are created by `Executor.submit()` and should not be created directly, except for testing.
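The sketch below shows a few `Future` methods in action. `done()` checks the state without blocking. `add_done_callback()` registers a function to run when the future finishes. `result()` blocks until the value is ready. The `task` and `report` functions are hypothetical, and the `callback_values` list exists only so the outcome can be inspected afterwards.

```python
from concurrent import futures
import time

def task(n):
    time.sleep(0.2)
    return n * 10

callback_values = []

def report(fut):
    # Receives the finished Future. If the future is already done when the
    # callback is added, the callback runs immediately in the calling thread.
    callback_values.append(fut.result())

with futures.ThreadPoolExecutor(max_workers=1) as ex:
    f = ex.submit(task, 4)               # returns a Future right away
    print('done yet?', f.done())         # usually False: the task is still sleeping
    f.add_done_callback(report)
    value = f.result()                   # blocks until task() has returned
    print('done now?', f.done(), value)  # True 40

# Leaving the with-block waits for the workers, so the callback has run by now.
print(callback_values)                   # [40]
```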

### ThreadPoolExecutor

ThreadPoolExecutor is an Executor subclass that uses a pool of threads to execute calls asynchronously. ThreadPoolExecutor manages a set of worker threads, passing tasks to them as they become available for more work. 
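Executors also work as context managers. In this sketch, leaving the `with` block calls `shutdown(wait=True)`, so the pool's threads are cleaned up without an explicit call. The `thread_name_prefix` argument only makes the worker names easier to read. Running six tasks on a pool of two workers shows that threads are reused rather than created for each task.

```python
from concurrent import futures
import threading
import time

def task(n):
    time.sleep(0.1)
    return threading.current_thread().name  # report which worker ran this task

# On exit, the with-block calls ex.shutdown(wait=True), even if an exception is raised inside it.
with futures.ThreadPoolExecutor(max_workers=2, thread_name_prefix='pool') as ex:
    names = list(ex.map(task, range(6)))

print(sorted(set(names)))   # at most two distinct names, e.g. ['pool_0', 'pool_1']
```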

This example uses `map()` to concurrently produce a set of results from an input iterable. The task uses `time.sleep()` to pause for a different amount of time for each input. Regardless of the order in which the concurrent tasks finish, `map()` always returns the values in the order of the inputs.

In [57]:
from concurrent import futures
import threading
import time


def task(n):
    print('{} with ID {}: is sleeping for {} seconds'.format(threading.current_thread().name,threading.current_thread().native_id,n))
    time.sleep(n)
    print('{}: done with {}'.format(threading.current_thread().name,n))
    return n * 10.


# Create a pool with up to three worker threads
ex = futures.ThreadPoolExecutor(max_workers=3)
print('{}: is starting work'.format(threading.current_thread().name))

# map() submits one task per input right away and returns an iterator over the results
results = ex.map(task, range(2, 5))

print('{}: is waiting for the results'.format(threading.current_thread().name))

# Consuming the iterator blocks until each result, in input order, is available
real_results = list(results)
print('main: results: {}'.format(real_results))
ex.shutdown()

MainThread: is starting work
ThreadPoolExecutor-9_0 with ID 7257: is sleeping for 2 seconds
ThreadPoolExecutor-9_1 with ID 7258: is sleeping for 3 seconds
ThreadPoolExecutor-9_2 with ID 7259: is sleeping for 4 seconds
MainThread: is waiting for the results
ThreadPoolExecutor-9_0: done with 2
ThreadPoolExecutor-9_1: done with 3
ThreadPoolExecutor-9_2: done with 4
main: results: [20.0, 30.0, 40.0]
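In the run above, the first input also had the shortest sleep, so the completion order and the input order happen to coincide. The sketch below uses the inputs `[4, 2, 3]` to make the difference visible. It contrasts `map()` with `futures.as_completed()`, which yields futures in the order they finish.

```python
from concurrent import futures
import time

def task(n):
    time.sleep(n / 10)   # smaller inputs finish sooner
    return n * 10.

inputs = [4, 2, 3]
with futures.ThreadPoolExecutor(max_workers=3) as ex:
    # map() yields results in input order...
    mapped = list(ex.map(task, inputs))
    # ...whereas as_completed() yields futures in the order they finish.
    pending = [ex.submit(task, n) for n in inputs]
    finished = [f.result() for f in futures.as_completed(pending)]

print(mapped)    # [40.0, 20.0, 30.0]
print(finished)  # typically [20.0, 30.0, 40.0]
```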


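### Example use of submit()

`submit()` schedules a single call and immediately returns a `Future`. Calling `result()` on that future blocks until the task has finished and then returns its value.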

In [13]:
from concurrent import futures
import threading
import time


def task(n):
    print('{}: sleeping {}'.format(threading.current_thread().name,n))
    time.sleep(n)
    print('{}: done with {}'.format(threading.current_thread().name,n))
    return n * 10.

ex = futures.ThreadPoolExecutor(max_workers=2)

print('{}: Starting'.format(threading.current_thread().name))
f = ex.submit(task, 10)

print('{}: future object: {}'.format(threading.current_thread().name, f))
print('{}: waiting for results'.format(threading.current_thread().name))
result = f.result()


print('{}: result: {}'.format(threading.current_thread().name,result))
print('{}: future after result: {}'.format(threading.current_thread().name,f))

MainThread: Starting
ThreadPoolExecutor-9_0: sleeping 10
MainThread: future object: <Future at 0x14e40b1c52b0 state=running>
MainThread: waiting for results
ThreadPoolExecutor-9_0: done with 10
MainThread: result: 100.0
MainThread: future after result: <Future at 0x14e40b1c52b0 state=finished returned float>
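`result()` also propagates errors. If the callable raises an exception, the exception is stored on the `Future` and re-raised in whichever thread calls `result()`. `exception()` returns the stored exception without raising it. The `risky` function below is hypothetical.

```python
from concurrent import futures

def risky(n):
    if n < 0:
        raise ValueError('negative input: {}'.format(n))
    return n * 10.

caught = None
with futures.ThreadPoolExecutor(max_workers=2) as ex:
    ok = ex.submit(risky, 3)
    bad = ex.submit(risky, -1)

    print(ok.result())        # 30.0
    print(bad.exception())    # negative input: -1 (returned, not raised)
    try:
        bad.result()          # re-raises the worker's exception in this thread
    except ValueError as exc:
        caught = exc
        print('caught:', exc)
```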


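### Watch out: deadlocks ahead!

Deadlocks can occur when the callable associated with a `Future` waits on the result of another `Future`. In the script below, each task waits on the other's result, so neither task can ever finish.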

In [21]:
%%writefile future_deadlock.py
import time
from concurrent import futures

def wait_on_b():
    time.sleep(5)
    print(b.result())  # b will never complete because it is waiting on a.
    return 5

def wait_on_a():
    time.sleep(5)
    print(a.result())  # a will never complete because it is waiting on b.
    return 6


executor = futures.ThreadPoolExecutor(max_workers=2)
a = executor.submit(wait_on_b)
b = executor.submit(wait_on_a)

Overwriting future_deadlock.py


In [22]:
!python future_deadlock.py

^C
Exception ignored in: <module 'threading' from '/sw/arch/Debian10/EB_production/2021/software/Python/3.9.5-GCCcore-10.3.0/lib/python3.9/threading.py'>
Traceback (most recent call last):
  File "/sw/arch/Debian10/EB_production/2021/software/Python/3.9.5-GCCcore-10.3.0/lib/python3.9/threading.py", line 1415, in _shutdown
    atexit_call()
  File "/sw/arch/Debian10/EB_production/2021/software/Python/3.9.5-GCCcore-10.3.0/lib/python3.9/concurrent/futures/thread.py", line 31, in _python_exit
    t.join()
  File "/sw/arch/Debian10/EB_production/2021/software/Python/3.9.5-GCCcore-10.3.0/lib/python3.9/threading.py", line 1033, in join
    self._wait_for_tstate_lock()
  File "/sw/arch/Debian10/EB_production/2021/software/Python/3.9.5-GCCcore-10.3.0/lib/python3.9/threading.py", line 1049, in _wait_for_tstate_lock
    elif lock.acquire(block, timeout):
KeyboardInterrupt: 
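The script above never finishes. Each task blocks on the other's `result()`. At interpreter exit, `concurrent.futures` joins its worker threads, which are still waiting, so the program has to be stopped with Ctrl+C. That is why the output ends in a `KeyboardInterrupt` traceback. The real fix is to avoid tasks that wait on each other's futures. As a minimal safeguard, sketched here with shortened sleeps, a `timeout` on `result()` turns the silent hang into a `futures.TimeoutError` stored on each future.

```python
import time
from concurrent import futures

def wait_on_b():
    time.sleep(0.5)
    return b.result(timeout=1)   # gives up after 1 s instead of waiting forever

def wait_on_a():
    time.sleep(0.5)
    return a.result(timeout=1)   # same safeguard on the other side of the cycle

with futures.ThreadPoolExecutor(max_workers=2) as executor:
    a = executor.submit(wait_on_b)
    b = executor.submit(wait_on_a)

# Both waits time out, so each future finishes with a TimeoutError
# instead of the whole program hanging.
print(type(a.exception()).__name__, type(b.exception()).__name__)
```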
