In [2]:
import time
import random
import threading
from threading import Thread

In [10]:
dir(Thread)

['__class__',
 '__delattr__',
 '__dict__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__le__',
 '__lt__',
 '__module__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 '__weakref__',
 '_bootstrap',
 '_bootstrap_inner',
 '_delete',
 '_initialized',
 '_reset_internal_locks',
 '_set_ident',
 '_set_native_id',
 '_set_tstate_lock',
 '_stop',
 '_wait_for_tstate_lock',
 'daemon',
 'getName',
 'ident',
 'isDaemon',
 'is_alive',
 'join',
 'name',
 'native_id',
 'run',
 'setDaemon',
 'setName',
 'start']

In Python 3, `threading` is the module used to create and use threads. There's also a low-level module, `_thread`, but using it directly is not recommended. I'm mentioning it only as a warning: **don't use `_thread`!**

The most important class in the `threading` module is `Thread` (doh!).

Very simplified, this is how a thread is instantiated:

```python
class Thread:
    def __init__(self, target, name=None, args=(), kwargs=None):
        pass
```
(There's also a `group` argument, which should always be `None`; it's reserved for future use.)

In this case, `target` is the function that will be executed in that particular thread.

Once a thread has been _created_ (instantiated), we need to `start()` it so that it begins running.

#### Basic example of a thread

In [11]:
def simple_worker():
    print('hello', flush=True)
    time.sleep(2)
    print('world', flush=True)

In [20]:
t1 = Thread(target=simple_worker)

Start the thread with `start()` and it will run in the background. In the meantime, we can do other things:

In [21]:
t1.start()
print(2+2)

hello
4


world


Wait until the thread terminates with `join()`:

In [19]:
t1.join()

#### Running multiple threads in parallel

In [23]:
def simple_worker():
    time.sleep(random.random() * 5)
    value = random.randint(0, 99)
    print(f'My value: {value}')

In [29]:
threads = [Thread(target=simple_worker) for _ in range(5)]

In [30]:
[t.start() for t in threads]
print(2+2)

4


My value: 7
My value: 17
My value: 95
My value: 80
My value: 26


In [31]:
[t.join() for t in threads]

[None, None, None, None, None]

#### Thread States

A thread can be in multiple states. When a thread has just been created, its state is `"ready"`:

In [47]:
def simple_worker():

    print('Thread running...')
    time.sleep(6)
    print('Thread finished...')

In [48]:
t = Thread(target=simple_worker)

A thread is only alive after it's started

In [49]:
t.is_alive()

False

In [50]:
t.start()

Thread running...


In [51]:
t.is_alive()

True

Let's wait for it

In [52]:
t.join()

Thread finished...


A thread that has finished can't be started again, as shown in the following example:

In [53]:
try:
    t.start()
except Exception as e:
    print(f"Error: {e}")

Error: threads can only be started once


**Important:** It's not possible(\*) to manage thread states manually, for example, stopping a thread. A thread always has to run its natural cycle.

(\*) You might find hacks on the internet for stopping threads, but **it's a bad practice**. We'll discuss this more later.
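A thread can still be *asked* to finish, as long as it cooperates and ends its own cycle. The following is a minimal sketch of that idea using `threading.Event`; the names `stop_requested`, `polite_worker` and `iterations` are hypothetical and chosen only for illustration:

```python
import threading
import time

stop_requested = threading.Event()  # hypothetical shared "please stop" flag
iterations = []

def polite_worker():
    # Keep working until someone asks us to stop.
    while not stop_requested.is_set():
        iterations.append(time.time())
        # wait() pauses for up to 0.05s, but returns early once the event is set.
        stop_requested.wait(0.05)

t = threading.Thread(target=polite_worker)
t.start()
time.sleep(0.2)        # let the worker run for a bit
stop_requested.set()   # ask the thread to finish
t.join()               # the thread ends by itself; nothing is killed from outside
print(f"alive: {t.is_alive()}, iterations: {len(iterations)}")
```

The thread is never stopped from the outside: it checks the flag and returns from its function on its own, which is its natural cycle.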

#### Thread Identity

The `Thread` class has two attributes that let us identify each thread: the human-readable `name`, which we can set when we construct the thread, and the machine-oriented `ident`.  
This is very useful for debugging.

In [54]:
def simple_worker():
    print('Thread running...')
    time.sleep(5)
    print('Thread exiting...')

In [55]:
t = Thread(target=simple_worker)

In [56]:
t.name

'Thread-26 (simple_worker)'

`ident` will be `None` until we run the thread.

In [57]:
t.ident is None

True

In [58]:
t.start()

Thread running...


Thread exiting...


In [59]:
t.name

'Thread-26 (simple_worker)'

In [60]:
t.ident

140704470791744

We can create a thread and assign a custom name to it:

In [65]:
t = Thread(target=simple_worker, name='PyCon 2020 Tutorial!')

In [68]:
t.start()

Thread running...


In [69]:
t.name

'PyCon 2020 Tutorial!'

In [70]:
t.ident

140704470791744

Thread exiting...


#### A thread knows itself

It's also possible to know the identity of the thread from within the thread itself. It might seem counterintuitive, as we don't have a reference to the created object, but the module function `threading.current_thread()` provides access to it.

In [71]:
def simple_worker():
    sleep_secs = random.randint(1, 5)
    myself = threading.current_thread()
    ident = threading.get_ident()
    print(f"I am thread {myself.name} (ID {ident}), and I'm sleeping for {sleep_secs}.")
    time.sleep(sleep_secs)
    print(f'Thread {myself.name} exiting...')

In [72]:
t1 = Thread(target=simple_worker, name='Bubbles')
t2 = Thread(target=simple_worker, name='Blossom')
t3 = Thread(target=simple_worker, name='Buttercup')

In [73]:
t1.start()

I am thread Bubbles (ID 140704470791744), and I'm sleeping for 4.


In [74]:
t2.start()

I am thread Blossom (ID 140705019172416), and I'm sleeping for 1.


Thread Blossom exiting...


In [75]:
t3.start()

I am thread Buttercup (ID 140705019172416), and I'm sleeping for 4.


Thread Bubbles exiting...
Thread Buttercup exiting...


In [76]:
print('Waiting...')

Waiting...
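`threading.current_thread()` works from any code, not only from a `target` function: it returns whichever thread is executing the call. A small sketch, assuming a hypothetical worker named `Mojo` and a `seen` dictionary used only to record the names:

```python
import threading
from threading import Thread

seen = {}

def who_am_i():
    # Runs inside the new thread, so this is the worker's own Thread object.
    seen['worker'] = threading.current_thread().name

t = Thread(target=who_am_i, name='Mojo')
t.start()
t.join()

# Runs in the thread that created the worker (the main thread in a plain script).
seen['caller'] = threading.current_thread().name
print(seen)
```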


#### Passing parameters to threads

Passing parameters is simple with the `Thread` constructor: just use the `args` argument.

In [77]:
def simple_worker(time_to_sleep):
    myself = threading.current_thread()
    ident = threading.get_ident()
    print(f"I am thread {myself.name} (ID {ident}), and I'm sleeping for {time_to_sleep}.")
    time.sleep(time_to_sleep)
    print(f'Thread {myself.name} exiting...')

In [78]:
t1 = Thread(target=simple_worker, name='Bubbles', args=(3, ))
t2 = Thread(target=simple_worker, name='Blossom', args=(1.5, ))
t3 = Thread(target=simple_worker, name='Buttercup', args=(2, ))

In [79]:
t1.start()

I am thread Bubbles (ID 140705019172416), and I'm sleeping for 3.


In [80]:
t2.start()

I am thread Blossom (ID 140704470791744), and I'm sleeping for 1.5.


In [81]:
t3.start()

I am thread Buttercup (ID 140705041200704), and I'm sleeping for 2.


Thread Blossom exiting...
Thread Buttercup exiting...
Thread Bubbles exiting...
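The constructor also accepts a `kwargs` dictionary for keyword arguments, which can be combined with `args`. A minimal sketch; the `greet` function, its parameters and the `results` list are hypothetical names used only for illustration:

```python
from threading import Thread

results = []

def greet(greeting, name="world", punctuation="!"):
    # Record what this thread received so we can inspect it after join().
    results.append(f"{greeting}, {name}{punctuation}")

t = Thread(
    target=greet,
    args=("hello",),                                 # positional arguments
    kwargs={"name": "Bubbles", "punctuation": "?"},  # keyword arguments
)
t.start()
t.join()
print(results)  # ['hello, Bubbles?']
```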


#### Subclassing `Thread`

So far, we've created threads by passing a `target` function to be executed. There's an alternative, more object-oriented way to do it: extending the `Thread` class.

In [85]:
class MyThread(Thread):
    def __init__(self, time_to_sleep, name=None):
        super().__init__(name=name)
        self.time_to_sleep = time_to_sleep
        
    def run(self):
        ident = threading.get_ident()
        print(f"I am thread {self.name} (ID {ident}), and I'm sleeping for {self.time_to_sleep} secs.")
        time.sleep(self.time_to_sleep)
        print(f'Thread {self.name} exiting...')

In [89]:
t1 = MyThread(2, name="t1")
t2 = MyThread(3, name="t2")

In [90]:
t1.start()

I am thread t1 (ID 140705041200704), and I'm sleeping for 2 secs.


In [91]:
t2.start()
print(2+2)

I am thread t2 (ID 140705019172416), and I'm sleeping for 3 secs.
4


Thread t1 exiting...
Thread t2 exiting...
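One possible reason to subclass is that the instance can hold the outcome of its own work. A sketch under that assumption; `SquareThread` and its `result` attribute are hypothetical and not part of the `threading` module:

```python
import time
from threading import Thread

class SquareThread(Thread):  # hypothetical example class
    def __init__(self, number, name=None):
        super().__init__(name=name)
        self.number = number
        self.result = None  # filled in by run()

    def run(self):
        time.sleep(0.1)  # stand-in for slow work
        self.result = self.number ** 2

workers = [SquareThread(n) for n in (2, 3, 4)]
for w in workers:
    w.start()
for w in workers:
    w.join()  # once join() returns, run() has finished and result is set

print([w.result for w in workers])  # [4, 9, 16]
```

Reading `result` before `join()` returns could still yield `None`, so the sketch only looks at it after joining.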


## Shared Data

As we'll see, **`Threads` can access shared data within the process they live in**. Example:

In [107]:
TIME_TO_SLEEP = 2.5
EXIT_THREAD = True  # was False when the threads below were started; re-run this cell with True to make them exit

I am thread Buttercup, and I'm sleeping for 2.5.


Thread Bubbles exiting...
Thread Blossom exiting...
Thread Buttercup exiting...


In [102]:
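def simple_worker():
    # Look up the current thread before the loop so `myself` is defined
    # even when EXIT_THREAD is already True and the loop body never runs.
    myself = threading.current_thread()
    while not EXIT_THREAD:
        print(f"I am thread {myself.name}, and I'm sleeping for {TIME_TO_SLEEP}.")
        time.sleep(TIME_TO_SLEEP)
    print(f'Thread {myself.name} exiting...')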

In [103]:
t1 = Thread(target=simple_worker, name='Bubbles')
t2 = Thread(target=simple_worker, name='Blossom')
t3 = Thread(target=simple_worker, name='Buttercup')

In [104]:
t1.start()

I am thread Bubbles, and I'm sleeping for 2.5.


In [105]:
t2.start()

I am thread Blossom, and I'm sleeping for 2.5.


In [106]:
t3.start()

I am thread Buttercup, and I'm sleeping for 2.5.


I am thread Bubbles, and I'm sleeping for 2.5.
I am thread Blossom, and I'm sleeping for 2.5.
I am thread Buttercup, and I'm sleeping for 2.5.
I am thread Bubbles, and I'm sleeping for 2.5.
I am thread Blossom, and I'm sleeping for 2.5.
I am thread Buttercup, and I'm sleeping for 2.5.
I am thread Bubbles, and I'm sleeping for 2.5.
I am thread Blossom, and I'm sleeping for 2.5.
I am thread Buttercup, and I'm sleeping for 2.5.
I am thread Bubbles, and I'm sleeping for 2.5.
I am thread Blossom, and I'm sleeping for 2.5.
I am thread Buttercup, and I'm sleeping for 2.5.
I am thread Bubbles, and I'm sleeping for 2.5.
I am thread Blossom, and I'm sleeping for 2.5.
I am thread Buttercup, and I'm sleeping for 2.5.


How is this possible?

Remember, all threads live **within the same process**, and the variables `TIME_TO_SLEEP` and `EXIT_THREAD` are stored in that process, so every thread we create can access them.

<img src="img/thread_shared_data.png" width=900 />
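Threads can write to shared objects as well as read them. Here is a small sketch in which each thread stores its identifier in a module-level dictionary; the `shared` dictionary and `record_name` function are hypothetical names. Each thread writes under its own key, so no two threads touch the same entry:

```python
import threading
from threading import Thread

shared = {}  # lives in the process, visible to every thread

def record_name():
    me = threading.current_thread()
    # Each thread writes only under its own name.
    shared[me.name] = threading.get_ident()

threads = [
    Thread(target=record_name, name=name)
    for name in ("Bubbles", "Blossom", "Buttercup")
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(shared))  # ['Blossom', 'Bubbles', 'Buttercup']
```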

## A real example

In the `crypto-examples` directory, we've included a real example of a web server that serves prices of different cryptocurrencies. You can run it by changing into `crypto-examples` with `cd` and running `python flask_app.py --sleep [sleep in seconds]`. The server can be slowed down on purpose to simulate a real slow server. For the following example, we use `python flask_app.py --sleep 2`.

Let's check how to get one price as an example:

In [112]:
BASE_URL = "http://localhost:5000"

In [113]:
import requests

In [114]:
resp = requests.get(f"{BASE_URL}/price/bitfinex/btc/2020-04-01")

In [115]:
resp

<Response [200]>

In [116]:
resp.json()

{'exchange': 'bitfinex',
 'symbol': 'btc',
 'open': 6386.3,
 'high': 6512.8,
 'low': 6325.7,
 'close': 6409.8,
 'volume': 127.05448768,
 'day': '2020-04-01'}

Now, let's suppose we want to get the price of Bitcoin from 3 different exchanges: `bitfinex`, `bitstamp` and `kraken`. 


### Sequential Requests
The sequential requests take at least 6 seconds (3 requests, each delayed by the sleep param of 2).

In [117]:
EXCHANGES = ['bitfinex', 'bitstamp', 'kraken']

In [140]:
start = time.time()

In [141]:
for exchange in EXCHANGES:
    resp = requests.get(f"{BASE_URL}/price/{exchange}/btc/2020-04-01")
    print(f"{exchange.title()}: ${resp.json()['close']}")
end = time.time()

Bitfinex: $6409.8
Bitstamp: $6421.14
Kraken: $6401.9


In [142]:
print(f"sequential requests take {end - start:.2f}s")

sequential requests take 6.75s


### Multithread Requests

Let's now move it to threads! For now, we'll just **print** the output, as we'll see data sharing in further lessons...

In [121]:
def check_price(exchange, symbol, date, base_url=BASE_URL):
    resp = requests.get(f"{base_url}/price/{exchange}/{symbol}/{date}")
    print(f"{exchange.title()}: ${resp.json()['close']}")

In [122]:
check_price('bitfinex', 'btc', '2020-04-01')

Bitfinex: $6409.8


In [143]:
threads = [
    Thread(target=check_price, args=(exchange, 'btc', '2020-04-01'))
    for exchange in EXCHANGES
]

In [144]:
start = time.time()

In [145]:
[t.start() for t in threads];

In [146]:
[t.join() for t in threads];
end = time.time()

Bitfinex: $6409.8
Bitstamp: $6421.14
Kraken: $6401.9


In [147]:
print(f"multi-thread requests take {end - start:.2f}s")

multi-thread requests take 2.35s


Note: We can't be sure which request will finish first.
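The same comparison can be reproduced without the web server. The sketch below replaces the HTTP call with `time.sleep`; `fake_check_price` and `DELAY` are hypothetical stand-ins for `check_price` and the server's `--sleep` value:

```python
import time
from threading import Thread

EXCHANGES = ['bitfinex', 'bitstamp', 'kraken']
DELAY = 0.2  # assumed per-request latency, standing in for --sleep

def fake_check_price(exchange):
    # Hypothetical stand-in for check_price: it only waits, no HTTP involved.
    time.sleep(DELAY)

# Sequential: each call waits for the previous one to finish.
start = time.perf_counter()
for exchange in EXCHANGES:
    fake_check_price(exchange)
sequential = time.perf_counter() - start

# Threaded: start everything first, then wait for all of them.
threads = [Thread(target=fake_check_price, args=(exchange,)) for exchange in EXCHANGES]
start = time.perf_counter()
for t in threads:
    t.start()
for t in threads:
    t.join()
threaded = time.perf_counter() - start

print(f"sequential: {sequential:.2f}s, threaded: {threaded:.2f}s")
```

The waits overlap in the threaded version, so its total should be close to a single `DELAY` rather than three of them.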

#### How many threads can we start?

Let's say we need to get prices for 10 exchanges, 3 symbols, for a total of 30 days. Those are a lot of requests:

In [None]:
10 * 3 * 30

Can we start 900 threads all at once? Sadly, we can't. Each thread consumes resources, and too many threads are usually a problem.

So, what can we do when we need to process too many concurrent jobs? We'll create workers and use a consumer-producer model. But first, we need to talk about shared data, race conditions and synchronization...
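As a rough preview of that idea, the sketch below pushes 900 jobs through a fixed number of worker threads using `queue.Queue`. It's only an illustration under assumptions: `NUM_WORKERS`, the `worker` function, the `None` sentinel convention and the integer job tuples are hypothetical choices, and the real request is replaced by a short sleep:

```python
import queue
import time
from threading import Thread

NUM_WORKERS = 8            # assumed pool size; tune for your workload
jobs = queue.Queue()       # the producer puts jobs here, workers take them
results = queue.Queue()    # Queue is designed to be used from several threads

def worker():
    while True:
        job = jobs.get()
        if job is None:    # sentinel value: nothing left for this worker
            break
        time.sleep(0.001)  # stand-in for the real (slow) request
        results.put(job)

# Producer: 10 exchanges x 3 symbols x 30 days = 900 jobs.
for exchange in range(10):
    for symbol in range(3):
        for day in range(30):
            jobs.put((exchange, symbol, day))
for _ in range(NUM_WORKERS):
    jobs.put(None)         # one sentinel per worker

workers = [Thread(target=worker) for _ in range(NUM_WORKERS)]
for w in workers:
    w.start()
for w in workers:
    w.join()

print(f"{results.qsize()} jobs processed by {NUM_WORKERS} threads")
```

Only 8 threads exist at any time, no matter how many jobs are queued.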

## Summary:

* `threading` module ✅
* `_thread`  module ❌

A thread's life cycle is Instantiated > Started > Running > Finished.