# Thread Basics

In this lesson, we look at the `threading` module and think about threads at a low level.  Other lessons address higher-level abstractions built on these basics.

Note that underneath `threading` lives an even lower-level module `_thread`.  As the initial underscore suggests, `_thread` is really only for developers of Python itself, or perhaps custom low-level libraries.  Use `threading` or "above."

In [144]:
import time
from time import sleep
from random import choice, random, randint
import threading
from threading import Thread

In interface, a thread constructor looks like this:

```python
class Thread:
    def __init__(self, target, name=None, args=(), kwargs={}):
        pass
```

The `target` argument is the function that will be executed in that particular thread.  Alternatively, a subclass of `Thread` can define a `.run()` method rather than using a passed-in function; passing in a callable is generally more flexible.

Once a thread has been _created_ (instantiated), we need to `start()` it in order for it to begin to process.
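
As a minimal sketch of how the constructor arguments fit together, the following passes positional and keyword arguments to a target function. The `greet` function and the `results` list are hypothetical names used only for illustration, and `.join()` (covered later in this lesson) is used here just to wait for the thread before reading the result.

```python
from threading import Thread

results = []  # hypothetical container, filled in by the worker thread

def greet(greeting, name, punctuation="!"):
    # Runs inside the new thread once start() is called
    results.append(f"{greeting}, {name}{punctuation}")

# args supplies positional arguments; kwargs supplies keyword arguments
t = Thread(target=greet, args=("Hello", "world"), kwargs={"punctuation": "?"})
t.start()  # begin executing greet() in the new thread
t.join()   # wait for it to finish before reading results
print(results)
```

Note that `args` must be a tuple (or other sequence), so a single argument is written as `args=(value,)`.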

## Basic example of a thread

In [145]:
def worker():
    print('hello', flush=True)
    time.sleep(2)
    print('world', flush=True)

In [146]:
t1 = Thread(target=worker)

In [147]:
t1.start()

hello
world


## Running multiple threads concurrently

Although you *can* launch a single thread as an indirect way to run one function, the real reason to use threads is to run several of them concurrently.

In [173]:
def worker(favcolor=None):
    time.sleep(random() * 5)
    value = randint(0, 99)
    name = threading.current_thread().name
    print(f'My name is {name}; '
          f'computation is {value}; '
          f'favorite color is {favcolor}')

In [174]:
worker()

My name is MainThread; computation is 82; favorite color is None


So far, this is just a function, albeit one that reports the name of the thread it runs in.  If your code never mentions `threading` or related modules, your program runs in one thread called "MainThread".  However, let us create several threads that utilize this function.

In [154]:
threads = [Thread(target=worker, 
                  name=f"thread-{i}", 
                  kwargs={'favcolor': choice(['blue', 'green', 'yellow'])}) 
           for i in range(5)]
threads

[<Thread(thread-0, initial)>,
 <Thread(thread-1, initial)>,
 <Thread(thread-2, initial)>,
 <Thread(thread-3, initial)>,
 <Thread(thread-4, initial)>]

In [155]:
for thread in threads:
    thread.start()

My name is thread-0; computation is 54; favorite color is blue
My name is thread-4; computation is 8; favorite color is blue
My name is thread-2; computation is 8; favorite color is yellow
My name is thread-1; computation is 72; favorite color is green
My name is thread-3; computation is 4; favorite color is blue


## Thread states

A thread is alive when it is running, but not otherwise.  There is both a time "before its birth" and a time "after its termination" and it is alive during neither of those times. Creating a thread only reserves resources like memory for it; it has to be `.start()`'ed to actually become alive and carry out actions.

In [159]:
t = Thread(target=worker)
t.is_alive()

False

In [160]:
t.start()
t.is_alive()

True

My name is Thread-51; computation is 62; favorite color is None


In [163]:
time.sleep(5)  # Must wait for output from above...
t.is_alive()

False

A thread that has finished can't be started again, as shown in the following example:

In [162]:
try:
    t.start()
except Exception as err:
    print(repr(err))

RuntimeError('threads can only be started once')
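
The examples above rely on `sleep` to observe each state. One possible way to walk through the same states without depending on timing is sketched below; the `release` event and the `wait_for_release` function are assumptions introduced for illustration, not part of the lesson's code.

```python
import threading

release = threading.Event()  # hypothetical signal that keeps the thread running

def wait_for_release():
    release.wait()  # block until the main thread sets the event

t = threading.Thread(target=wait_for_release)
states = [t.is_alive()]      # before start(): not alive
t.start()
states.append(t.is_alive())  # started and blocked on the event: alive
release.set()                # let the thread finish
t.join()                     # wait until it has actually finished
states.append(t.is_alive())  # after it finishes: not alive
print(states)

try:
    t.start()                # a finished thread cannot be restarted
except RuntimeError as err:
    print(repr(err))
```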


## Joining threads

There is one aspect of a thread's lifetime that we glossed over.  Sometimes you would like to wait within the MainThread for a thread to finish; or indeed, wait within some child thread for some grandchild to finish.  For example:

In [164]:
t = Thread(target=worker)
print("About to start...")
t.start()
print("All done")

About to start...
All done
My name is Thread-52; computation is 22; favorite color is None


That was not as intended. The "All done" message printed before the thread was done.  We *could* manually wait on it:

In [165]:
t = Thread(target=worker)
print("About to start...")
t.start()
while t.is_alive():
    sleep(0.01)
print("All done")

About to start...
My name is Thread-53; computation is 14; favorite color is None
All done


A better, shorter, and more explicit way is to `.join()` the thread.

In [166]:
t = Thread(target=worker)
print("About to start...")
t.start()
t.join() # Don't continue until thread completes
print("All done")

About to start...
My name is Thread-54; computation is 96; favorite color is None
All done
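
The same idea extends to several threads: start them all first, then join each one. The sketch below assumes a hypothetical `nap` function and a shared `finished` list purely for illustration.

```python
from threading import Thread
from time import sleep

finished = []  # labels of the workers that have completed

def nap(label, seconds):
    sleep(seconds)
    finished.append(label)

threads = [Thread(target=nap, args=(f"worker-{i}", 0.1 * i)) for i in range(3)]
for t in threads:
    t.start()  # start everything before waiting on anything
for t in threads:
    t.join()   # returns once this particular thread has finished

# Every thread has been joined, so all three labels are present
print(sorted(finished))
print("All done")
```

Starting all threads before joining any of them matters: calling `start()` and `join()` in the same loop would run the threads one after another.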


## Thread identity

The `Thread` class has two attributes that let us identify each thread: the human-readable `name`, which we can set when we construct the thread, and the machine-oriented `ident`.  The name is set immediately, either to a passed-in value or to a generated one.  The `ident` is `None` until the thread actually starts.

In [170]:
t = Thread(target=worker)
print("Automatic thread name:", t.name)
print("Thread identity:", t.ident)

print("Starting thread...")
t.start()
print("Thread identity:", t.ident)

Automatic thread name: Thread-57
Thread identity: None
Starting thread...
Thread identity: 133674540541696
My name is Thread-57; computation is 76; favorite color is None
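
A small sketch can confirm that the `ident` seen from inside a thread matches the one on the `Thread` object. The `seen` dictionary and the `record_identity` function are hypothetical names for illustration.

```python
import threading

seen = {}  # maps thread name -> ident observed from inside the thread

def record_identity():
    me = threading.current_thread()
    seen[me.name] = me.ident

t = threading.Thread(target=record_identity, name="inspector")
ident_before = t.ident  # None: the thread has not started yet
t.start()
t.join()

# The ident stays available on the object after the thread has finished
print(ident_before, t.name, seen["inspector"] == t.ident)
```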


## Subclassing `Thread`

In prior examples, we created threads by passing a `target` function to be executed. An alternative, object-oriented way to do it is to extend the `Thread` class:

In [180]:
class MyThread(Thread):
    def __init__(self, time_to_sleep, name=None):
        super().__init__(name=name)
        self.time_to_sleep = time_to_sleep
        
    def run(self):
        ident = threading.current_thread().ident
        print(f"I am thread {self.name} (ID {ident}), "
              f"sleeping for {self.time_to_sleep} seconds.")
        sleep(self.time_to_sleep)
        print(f'Thread {self.name} exiting...')

In [181]:
t = MyThread(2, name="Santiago")
t.start()

I am thread Santiago (ID 133674540541696), sleeping for 2 seconds.
Thread Santiago exiting...
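
One thing a subclass makes convenient is keeping a computed value on the thread object itself. The following is a minimal sketch; `SquareThread` and its `result` attribute are hypothetical and not part of the `threading` API.

```python
from threading import Thread

class SquareThread(Thread):  # hypothetical subclass for illustration
    def __init__(self, number, name=None):
        super().__init__(name=name)
        self.number = number
        self.result = None  # filled in by run()

    def run(self):
        # Executed in the new thread when start() is called
        self.result = self.number ** 2

t = SquareThread(7, name="squarer")
t.start()
t.join()  # wait so that result has been set before reading it
print(t.name, t.result)
```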


## Shared Data

Threads can access shared data within the process they live in. For example:

In [41]:
TIME_TO_SLEEP = 2

In [42]:
def simple_worker():
    myself = threading.current_thread()
    print(f"I am thread {myself.name}, and I'm sleeping for {TIME_TO_SLEEP}.")
    time.sleep(TIME_TO_SLEEP)
    print(f'Thread {myself.name} exiting...')

In [43]:
t1 = Thread(target=simple_worker, name='Bubbles')
t2 = Thread(target=simple_worker, name='Blossom')
t3 = Thread(target=simple_worker, name='Buttercup')

In [44]:
t1.start()

I am thread Bubbles, and I'm sleeping for 2.


In [45]:
t2.start()

I am thread Blossom, and I'm sleeping for 2.


In [46]:
t3.start()

I am thread Buttercup, and I'm sleeping for 2.


How is this possible?

Remember, all threads live **within the same process**, and the variable `TIME_TO_SLEEP` is a module-level global held in that process's memory. So every thread created in the process can access that variable.
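
The same holds for mutable objects. In the sketch below, several threads read one global and append to one shared list; `GREETING`, `messages`, and `announce` are hypothetical names. A single `append` per thread is enough for this illustration, but coordinated updates to shared data generally need synchronization.

```python
import threading

GREETING = "hi"  # module-level value, visible to every thread
messages = []    # one list object shared by all threads

def announce():
    name = threading.current_thread().name
    messages.append(f"{GREETING} from {name}")

threads = [threading.Thread(target=announce, name=n)
           for n in ("Bubbles", "Blossom", "Buttercup")]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Sort because the order in which threads append is not guaranteed
print(sorted(messages))
```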

## A real example

In the `crypto-examples` directory, we've included a real example of a web server that serves prices of different cryptocurrencies. You can run it with `python flask_app.py --sleep [sleep in seconds]`. The server can be slowed down on purpose to simulate a slow real-world server.

Let's check how to get one price as an example:

In [47]:
BASE_URL = "http://localhost:5000"

In [48]:
import requests

In [64]:
resp = requests.get(f"{BASE_URL}/price/bitfinex/btc/2020-04-01")

In [50]:
resp

<Response [200]>

In [51]:
resp.json()

{'exchange': 'bitfinex',
 'symbol': 'btc',
 'open': 6386.3,
 'high': 6512.8,
 'low': 6325.7,
 'close': 6409.8,
 'volume': 127.05448768,
 'day': '2020-04-01'}

Now, let's suppose we want to get the price of Bitcoin from 3 different exchanges: `bitfinex`, `bitstamp` and `kraken`. Sequential requests would take about 6 seconds (with the sleep parameter set to 2).

In [79]:
%%time
for exchange in ['bitfinex', 'bitstamp', 'kraken']:
    url = f"{BASE_URL}/price/{exchange}/btc/2020-04-01"
    resp = requests.get(url)
    print(f"{exchange.title()}: ${resp.json()['close']}")

Bitfinex: $6409.8
Bitstamp: $6421.14
Kraken: $6401.9
CPU times: user 32.3 ms, sys: 14.2 ms, total: 46.5 ms
Wall time: 6.69 s


Let's now move it to threads! For now, we'll just **print** the output, as we'll see data sharing in later lessons...

In [80]:
%%time
def check_price(exchange, symbol, date, base_url=BASE_URL):
    "General price-checker for given exchange, symbol, and date"
    url = f"{base_url}/price/{exchange}/{symbol}/{date}"
    resp = requests.get(url)
    print(f"{exchange.title()}: ${resp.json()['close']}")

check_price('bitfinex', 'btc', '2020-04-01')

Bitfinex: $6409.8
CPU times: user 8.56 ms, sys: 8.28 ms, total: 16.8 ms
Wall time: 2.23 s


In [90]:
EXCHANGES = ['bitfinex', 'bitstamp', 'kraken']

threads = [
    Thread(target=check_price, args=(exchange, 'btc', '2020-04-01'))
    for exchange in EXCHANGES
]

print("Starting threads...")
%time for t in threads: t.start()
print('-----')

print("Joining threads...")
%time for t in threads: t.join()

Starting threads...
CPU times: user 13.8 ms, sys: 2.64 ms, total: 16.4 ms
Wall time: 10.9 ms
-----
Joining threads...
Bitfinex: $6409.8
Kraken: $6401.9
Bitstamp: $6421.14
CPU times: user 9.2 ms, sys: 5.52 ms, total: 14.7 ms
Wall time: 2.25 s
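
If the example server is not running, the same timing effect can be reproduced with a stand-in. In this sketch `fake_fetch` and `DELAY` are assumptions: the function simply sleeps instead of calling `requests.get()`, so the thread just waits as it would on a slow network response.

```python
import time
from threading import Thread

DELAY = 0.2  # hypothetical per-request latency, standing in for --sleep

def fake_fetch(exchange):
    # Stand-in for a slow HTTP request: just wait
    time.sleep(DELAY)

exchanges = ['bitfinex', 'bitstamp', 'kraken']

# Sequential: the waits add up
start = time.perf_counter()
for exchange in exchanges:
    fake_fetch(exchange)
sequential = time.perf_counter() - start

# Threaded: the waits overlap
start = time.perf_counter()
threads = [Thread(target=fake_fetch, args=(exchange,)) for exchange in exchanges]
for t in threads:
    t.start()
for t in threads:
    t.join()
threaded = time.perf_counter() - start

print(f"sequential: {sequential:.2f}s, threaded: {threaded:.2f}s")
```

The sequential time should come out near three times `DELAY`, while the threaded time should be close to a single `DELAY`, mirroring the 6-second versus 2-second difference seen with the real server.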


## How many threads can we start?

Let's say we need to get prices for 10 exchanges and 3 symbols over 30 days: 10 × 3 × 30 = 900 requests.

We cannot efficiently start 900 threads at once, however. Each thread consumes resources, and too many threads tend to bog down your system.

So, what can we do when we need to process too many concurrent jobs? We'll create workers and use a producer-consumer model. But first, we need to talk about shared data, race conditions and synchronization...
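
As a rough preview of that worker model, the sketch below has a small, fixed number of threads pull 900 jobs from a `queue.Queue`. The pool size, the job tuples, and the `None` sentinel convention are all assumptions made for illustration; the jobs only build a string rather than fetching a real price.

```python
import queue
from threading import Thread

NUM_WORKERS = 4          # hypothetical pool size
jobs = queue.Queue()     # work waiting to be done
results = queue.Queue()  # finished work

def worker():
    while True:
        job = jobs.get()
        if job is None:  # sentinel: no more work for this worker
            break
        exchange, symbol, day = job
        results.put(f"{exchange}/{symbol}/{day}")

# Producer side: 10 exchanges x 3 symbols x 30 days = 900 jobs
for e in range(10):
    for s in range(3):
        for d in range(30):
            jobs.put((f"exchange-{e}", f"symbol-{s}", f"day-{d}"))
for _ in range(NUM_WORKERS):
    jobs.put(None)  # one sentinel per worker

workers = [Thread(target=worker) for _ in range(NUM_WORKERS)]
for w in workers:
    w.start()
for w in workers:
    w.join()

print(results.qsize())
```

Only four threads ever exist here, no matter how many jobs are queued.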

## Summary:

* `threading` module ✅
* `_thread`  module ❌

A thread's life cycle is Instantiated > Started > Running > Finished.