# Agenda

1. Concurrency and parallelism in programming in general, and Python in particular
2. Basic threads
3. Joining threads
4. Switching threads and the GIL
5. Sharing data (and other resources)
6. Producer-consumer 
7. Events and timers
8. Multiprocessing
9. `concurrent.futures` and making life easier for ourselves

# Concurrency and parallelism in Python

- Concurrency means: I have several things that I want to be tracking at once, even if they're not necessarily executing at once.
- Parallelism means: I have several things that I want to be tracking at once, *AND* they should also be executing at once.

If you want true parallel execution on a computer, then you need multiple cores (processors).  But you'll probably have more processes running than cores anyway, which means that the operating system needs to keep track of each process, giving it a turn on a CPU core and then swapping it out so that another process can run.

How can we have multiple things happen in our program, so that we can break a problem apart and deal with it using concurrency?
- The oldest, and most traditional, way is to use *processes*.  The good news is that each process runs separately, with its own memory, and is independent of other processes.  This means that the computer can decide which core runs which process, and when.  The problem is that there's a lot of overhead to that: each process takes more memory, and switching between processes requires more time and resources.
- A newer way to do things is *threads*.  Just as the OS runs multiple processes, each process can contain multiple threads, each of which is a separate line of execution.  (On modern operating systems, and in CPython, threads are real OS threads, which the OS schedules much as it schedules processes.)  The advantage of threads is that they're much lighter weight, so it's cheaper to create them and to switch between them.  Plus, because they are in the same process, they share memory.
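To make the "share memory" point concrete, here is a minimal sketch (not part of the original notebook) in which several threads append to a single list that the main thread created. It uses `join`, which is covered further down, only so that we read the list after every thread has finished; `results` and `record` are illustrative names.

```python
import threading

# A list created by the main thread...
results = []

def record(n):
    # ...is visible to, and modifiable by, every other thread in the same process
    results.append(n * n)

threads = [threading.Thread(target=record, args=(i,)) for i in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()   # wait for every thread before reading the shared list

print(sorted(results))   # [0, 1, 4, 9, 16]
```

We sort before printing because the threads may append in any order.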

Threads weren't ever popular in the Unix world.  But they became super popular among Windows programmers and in the Java world.  The combination forced Unix people to admit that maybe threads aren't that bad.

# Simple example of threads

To use threads in Python, we need:

- the `threading` module
- a function we want to run in a thread (i.e., not serially, but concurrently with the "main thread")


In [3]:
# let's run the function serially -- meaning, our Python interpreter will consist of 
# one process and one thread.  It'll run our function 5 times.

def hello():
    print('Hello!')
    
for i in range(5):
    hello()

Hello!
Hello!
Hello!
Hello!
Hello!


In [5]:
# now let's run our function 5 times, but each time we do that, we're going to 
# do so inside of a new thread.

# Meaning: We're not going to run the function ourselves, directly.  We're going to
# create a new Thread object, and hand it the function we want to run.  The 
# Thread object will run the function on our behalf inside of a new thread


In [6]:
import threading     # the module we need to work with threads

def hello():
    print('Hello!')

t = threading.Thread(target=hello)    # the function "hello" is the argument we pass to "target"
t.start()                             # ask t to run our function in a new thread

Hello!


In [12]:
# let's run our function 5 times, as before, each time in its own thread

def hello():
    print('Hello!\n', end='')     # end='' stops print from adding its own \n; our string already ends with one
    
for i in range(5):
    t = threading.Thread(target=hello)
    t.start()   

Hello!
Hello!
Hello!
Hello!
Hello!


In [15]:
# let's prove that we are running concurrently
# how? We'll add time.sleep to our function call
# then, we'll see if the functions run in order
# we'll also add a number to the function call, so we can identify the threads

import time
import random

def hello(n):
    time.sleep(random.randint(0, 3))   # sleep 0-3 seconds
    print(f'{n} Hello!\n', end='')     # don't add \n to the end of print
    
for i in range(5):
    t = threading.Thread(target=hello, args=(i,))   # run the function with the argument of i
    t.start()   

2 Hello!
3 Hello!
0 Hello!
1 Hello!
4 Hello!


# Things to know

1. We can name our threads, which makes them easier to identify: just pass `name=` when we create a new `Thread` object.
2. We can always get the currently running thread object from `threading.current_thread()`.  I can get the name of the current thread with `threading.current_thread().name`
3. Because we're not running our function directly, but are instead handing it to the threading system, the function's return value never comes back to us.  Any value it returns is ignored.
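Here is a small sketch (not from the original notebook) that combines all three points: it names a thread, reads that name from inside the thread with `threading.current_thread().name`, and works around the discarded return value by storing the result in a shared dict. The `results` dict and `square` function are hypothetical names chosen for illustration; a shared dict is just one simple workaround.

```python
import threading

# Hypothetical workaround: Thread discards return values, so store results in a shared dict
results = {}

def square(n):
    name = threading.current_thread().name   # the name we passed with name=
    results[name] = n * n                    # save the result where the main thread can see it
    return n * n                             # this return value is silently discarded

t = threading.Thread(target=square, args=(7,), name='square-7')
t.start()
t.join()   # wait for the thread to finish before reading results

print(threading.current_thread().name)   # MainThread
print(results)                           # {'square-7': 49}
```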

# Exercise: Hello and goodbye

1. Write two functions, `hello` and `goodbye`, similar to my `hello` here -- it'll take an integer as an ID number, and it'll `time.sleep` for a random number of seconds.  (Keep it small!)
2. Launch 5 threads for each of these functions.
3. You should have a total of 10 lines printed out, 5 from `hello` and 5 from `goodbye`.

In [17]:
import threading
import time
import random

def hello(n):
    time.sleep(random.randint(0, 3))
    print(f'{n} Hello!\n', end='')
    
def goodbye(n):
    time.sleep(random.randint(0, 3))
    print(f'{n} Goodbye!\n', end='')
    
for i in range(5):
    t = threading.Thread(target=hello, args=(i,), name=f'hello-{i}')
    t.start()
    
for i in range(5):
    t = threading.Thread(target=goodbye, args=(i,), name=f'goodbye-{i}')
    t.start() 


0 Goodbye!
3 Goodbye!
2 Goodbye!
1 Goodbye!
4 Hello!
4 Goodbye!
0 Hello!
1 Hello!
2 Hello!
3 Hello!


main thread
hello-0
hello-1
hello-2
hello-3
hello-4
goodbye-0
goodbye-1
goodbye-2
goodbye-3
goodbye-4


# When do threads give up the CPU?

1. When about 5 ms pass.  A thread may run as many bytecodes as it can within that time slice.  But once the time is up, the running thread is asked to give up the GIL (the lock that lets only one thread at a time run Python bytecode); it does so between bytecodes, and another thread gets a chance to run.
2. Every time a thread performs blocking I/O (input/output), it gives up control of the CPU to another thread.  That's because I/O (disk/network/screen) takes so long compared with everything else that it's not worth holding onto the CPU while we wait.
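In CPython, the "about 5 ms" time slice is called the *switch interval*, and the `sys` module lets you inspect it. A minimal sketch, assuming nothing else in your program has changed the default:

```python
import sys

# CPython's "switch interval": how long (in seconds) a thread may run
# before it is asked to give up the GIL
interval = sys.getswitchinterval()
print(interval)   # 0.005 unless something has changed it

# sys.setswitchinterval(seconds) changes it, although you rarely need to
```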

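When we say `print('a')` in Python, we're basically saying: (1) write the string `'a'`, then (2) write the `'\n'`.  Because these are two separate writes, another thread often (not always, but often) gets control in between, and its output can land between our `a` and our newline.  That's why the examples above put the `\n` inside the string and pass `end=''`.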
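We can watch those separate writes happen by giving `print` a file object that records every call to `write`. This is a sketch that assumes CPython's behavior of writing each argument and the `end` string separately; `RecordingStream` is a hypothetical helper, not part of the standard library.

```python
import io

class RecordingStream(io.StringIO):
    """A StringIO that remembers each separate (non-empty) write() call."""

    def __init__(self):
        super().__init__()
        self.writes = []

    def write(self, s):
        if s:                      # ignore empty writes, such as the one from end=''
            self.writes.append(s)
        return super().write(s)

two_writes = RecordingStream()
print('a', file=two_writes)             # default end='\n'
print(two_writes.writes)                # ['a', '\n']

one_write = RecordingStream()
print('a\n', end='', file=one_write)    # newline is already part of the string
print(one_write.writes)                 # ['a\n']
```

Both streams end up holding the same text, `'a\n'`; the difference is that the second one received it in a single write, leaving no gap for another thread to sneak into.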

In [18]:
import dis  # disassembly module in Python 

dis.dis(hello)   # show me the bytecodes for the "hello" function (the exact output varies by Python version)

  6           0 LOAD_GLOBAL              0 (time)
              2 LOAD_METHOD              1 (sleep)
              4 LOAD_GLOBAL              2 (random)
              6 LOAD_METHOD              3 (randint)
              8 LOAD_CONST               1 (0)
             10 LOAD_CONST               2 (3)
             12 CALL_METHOD              2
             14 CALL_METHOD              1
             16 POP_TOP

  7          18 LOAD_GLOBAL              4 (print)
             20 LOAD_FAST                0 (n)
             22 FORMAT_VALUE             0
             24 LOAD_CONST               3 (' Hello!\n')
             26 BUILD_STRING             2
             28 LOAD_CONST               4 ('')
             30 LOAD_CONST               5 (('end',))
             32 CALL_FUNCTION_KW         2
             34 POP_TOP
             36 LOAD_CONST               0 (None)
             38 RETURN_VALUE


In [None]:
# for i in range(5):
#     t1 = threading.Thread(target=hello, args=(i,))  # run the function with the argument of i
#     t2 = threading.Thread(target=goodbye, args=(i,))   # run the function with the argument of i
#     t1.start()
#     t2.start()

In [20]:
# let's print something to the user when we're done!

import threading
import time
import random

def hello(n):
    time.sleep(random.randint(0, 3))   # sleep 0-3 seconds
    print(f'{n} Hello!\n', end='')     # don't add \n to the end of print
    
for i in range(5):
    t = threading.Thread(target=hello, args=(i,))   # run the function with the argument of i
    t.start()   
    
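time.sleep(7)   # don't do this! We're guessing how long the threads need: too short and DONE prints early, too long and we waste time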
print('*** DONE! ***')    

1 Hello!
3 Hello!
0 Hello!
4 Hello!
2 Hello!
*** DONE! ***


In [22]:
# what do I do, in order to wait for all of the threads to complete?
# I can use "join"
# "join" is a method we can run on a thread object
# it means: I'll wait for you to finish

import threading
import time
import random

def hello(n):
    time.sleep(random.randint(0, 3))   # sleep 0-3 seconds
    print(f'{n} Hello!\n', end='')     # don't add \n to the end of print
    
for i in range(5):
    t = threading.Thread(target=hello, args=(i,))   # run the function with the argument of i
    t.start()   
    t.join()   # wait for this thread to finish before creating the next one -- which makes our threads run one at a time!
    
print('*** DONE! ***')    

0 Hello!
1 Hello!
2 Hello!
3 Hello!
4 Hello!
*** DONE! ***
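Joining inside the loop gives us tidy output, but it means each thread must finish before the next one even starts, so we've lost the concurrency. The following sketch (not from the original notebook) measures the difference; `work` and `run` are hypothetical names, and the fixed 0.2-second sleep stands in for a slow task such as waiting on I/O. Exact timings will vary by machine.

```python
import threading
import time

def work():
    time.sleep(0.2)   # stand-in for a slow task, e.g. waiting on I/O

def run(join_inside_loop):
    start = time.perf_counter()
    threads = []
    for _ in range(5):
        t = threading.Thread(target=work)
        t.start()
        if join_inside_loop:
            t.join()      # wait before starting the next thread: effectively serial
        threads.append(t)
    for t in threads:
        t.join()          # returns immediately for threads that already finished
    return time.perf_counter() - start

serial = run(join_inside_loop=True)
concurrent = run(join_inside_loop=False)
print(f'join inside loop: {serial:.2f}s')       # roughly 1.0s (5 x 0.2s)
print(f'join afterwards:  {concurrent:.2f}s')   # roughly 0.2s
```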


In [25]:
# what do I do, in order to wait for all of the threads to complete?
# I can use "join"
# "join" is a method we can run on a thread object
# it means: I'll wait for you to finish

# I'm going to put all threads in a list
# then I'll iterate over that list, joining each thread (meaning: wait for the thread to finish)
# when I'm done joining each thread, I know that they're all done

import threading
import time
import random

def hello(n):
    time.sleep(random.randint(0, 3))   # sleep 0-3 seconds
    print(f'{n} Hello!\n', end='')     # don't add \n to the end of print
    
all_threads = []
for i in range(5):
    t = threading.Thread(target=hello, args=(i,), name=f'hello-{i}')   # run the function with the argument of i
    t.start()   
    all_threads.append(t)
    
# Now go through each thread, and join it (i.e., wait for it)
for one_thread in all_threads:
    print(f'\tNow joining {one_thread.name}')
    one_thread.join()    # join blocks -- it hangs, waiting for the thread to finish
    
# by the time I reach the print below, I'm guaranteed that all of the threads are done
print('*** DONE! ***')    

0 Hello!
2 Hello!
3 Hello!
	Now joining hello-0
	Now joining hello-1
4 Hello!
1 Hello!
	Now joining hello-2
	Now joining hello-3
	Now joining hello-4
*** DONE! ***


In [27]:
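# another way: invoke join with an argument, a float that tells join how many seconds we're willing to wait
# if that much time passes without the thread ending, join returns anyway, and we can move on to another thread
# (join returns None either way, so we use is_alive() to find out whether the thread actually ended)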


import threading
import time
import random

def hello(n):
    time.sleep(random.randint(0, 3))   # sleep 0-3 seconds
    print(f'{n} Hello!\n', end='')     # don't add \n to the end of print
    
all_threads = []
for i in range(5):
    t = threading.Thread(target=hello, args=(i,), name=f'hello-{i}')   # run the function with the argument of i
    t.start()   
    all_threads.append(t)
    
# go through all_threads as many times as we need, giving each
# thread a chance to be joined.  When it's joined, we remove the 
# thread from the all_threads list

while all_threads:   # meaning: so long as the list is non-empty
    for one_thread in all_threads[:]:   # iterate over a copy, since we remove items from the original list
        one_thread.join(0.1)       # wait up to 0.1 seconds for the thread to end
        if not one_thread.is_alive():
            print(f'\tRemoved {one_thread.name}')
            all_threads.remove(one_thread)
    
# BELOW HERE, we *know* that all threads have ended
print('*** DONE! ***')    

4 Hello!
	Removed hello-4
3 Hello!
	Removed hello-3
0 Hello!
1 Hello!
2 Hello!
	Removed hello-2
	Removed hello-0
	Removed hello-1
*** DONE! ***
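Why does the loop above need `is_alive()` at all? Because `join(timeout)` returns `None` whether or not the thread finished, so its return value can't tell us what happened. A minimal sketch (not from the original notebook), using `time.sleep` directly as the thread's target:

```python
import threading
import time

t = threading.Thread(target=time.sleep, args=(0.5,))   # a thread that runs for about half a second
t.start()

result = t.join(0.1)            # stop waiting after 0.1 seconds
still_running = t.is_alive()
print(result, still_running)    # None True -- join returned, but the thread is still running

t.join()                        # no timeout: wait as long as it takes
print(t.is_alive())             # False
```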
