# Threading & Multi-processing
**Threading**
* When we run code concurrently with threads, the code is not actually executing at the same moment
* It only gives the illusion of running at the same time: whenever one thread reaches a point where it is just waiting (for example, on an IO operation), the program moves on and runs other code until that wait finishes

**CPU Bound**
* CPU-bound tasks are operations that crunch a lot of numbers and data, and use CPU resources to do so
* If our tasks are CPU bound, we may not see much benefit from threading
* Instead, CPU-bound tasks benefit more from multiprocessing, which runs them in parallel
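
To make the CPU-bound case concrete, the sketch below times a pure-Python number-crunching function run sequentially and then through a thread pool. The names `count_primes` and `timed` and the workload sizes are made up for illustration. On a standard CPython build, the global interpreter lock lets only one thread execute Python bytecode at a time, so the threaded run is typically no faster.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def count_primes(limit):
    """CPU-bound work: count the primes below `limit` by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def timed(label, func):
    """Run func(), print how long it took, and return its result."""
    start = time.perf_counter()
    result = func()
    print(f"{label}: {time.perf_counter() - start:.3f}s")
    return result


limits = [20_000] * 4  # hypothetical workload: four identical jobs

sequential = timed("sequential", lambda: [count_primes(n) for n in limits])

with ThreadPoolExecutor(max_workers=4) as executor:
    threaded = timed("threads", lambda: list(executor.map(count_primes, limits)))

# Both approaches compute the same answers; only the timing differs.
print(sequential == threaded)
```

For CPU-bound work like this, `concurrent.futures.ProcessPoolExecutor` exposes the same `submit`/`map` interface and runs the jobs in separate processes, which is where a parallel speed-up would come from.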

**IO Bound**
* IO-bound tasks spend most of their time waiting for input and output operations to complete, and barely use CPU resources
* Examples of IO-bound tasks are reading from and writing to the file system, downloading from the internet, and other file system or network operations
* IO-bound tasks benefit more from threading than from multiprocessing

## Threading

In [2]:
import threading
import concurrent.futures
import requests
import os
import time  # needed for the time.sleep() calls used throughout
from timeit import default_timer as timer

In [28]:
def do_something(): # a function to sleep for one second
    print('Sleeping for one second...')
    time.sleep(1) # to sleep for one second 
    print('Done Sleeping')

In [29]:
start = timer() # starting counter

do_something() 
do_something()
# the above functions run sequentially

finish = timer()
print(f'time taken: {round(finish - start,5)}s')

Sleeping for one second...
Done Sleeping
Sleeping for one second...
Done Sleeping
time taken: 2.00125s


In [30]:
start = timer() 

# create the Thread objects
t1 = threading.Thread(target=do_something) # pass the function itself, NOT the result of calling it
t2 = threading.Thread(target=do_something)

t1.start() # to start the thread
t2.start() # to start the thread

finish = timer()
print(f'time taken: {round(finish - start,5)}s')
# both threads start at almost the same time
# while the threads were sleeping, the main thread carried on with the rest of the script and computed the finish time before either thread was done

Sleeping for one second...
Sleeping for one second...
time taken: 0.00503s
Done Sleeping
Done Sleeping
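
The output above shows the timing line printed before either thread finished, because `start()` returns immediately. A minimal sketch of that behaviour, using `Thread.is_alive()` to observe a thread before and after `join()`; the `release` event and the `wait_for_release` function are hypothetical stand-ins for a slow operation, chosen so the result does not depend on timing.

```python
import threading

release = threading.Event()  # hypothetical signal standing in for "the IO finished"


def wait_for_release():
    """Block until the main thread sets the event."""
    release.wait()


worker = threading.Thread(target=wait_for_release)
worker.start()

# start() returns immediately; the worker is still blocked on the event.
alive_after_start = worker.is_alive()

release.set()   # let the worker finish
worker.join()   # block the main thread until the worker has exited

alive_after_join = worker.is_alive()
print(alive_after_start, alive_after_join)  # True False
```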


In [31]:
start = timer() 

t1 = threading.Thread(target=do_something)
t2 = threading.Thread(target=do_something)

t1.start()
t2.start()

t1.join() # join() blocks the calling thread until the thread it is called on has terminated
t2.join() # joining here ensures both threads finish before the rest of the code runs

finish = timer()

time_taken = finish - start
print(f'time taken: {round(finish - start,5)}s')

# For example, when join() is called from the main thread, the main thread
# waits until the child thread it was called on has exited.
# Without join(), the main thread carries on while the child threads are still
# running, so any code that depends on their work (such as the finish-time
# calculation) may run too early and see incomplete results.

Sleeping for one second...
Sleeping for one second...
Done Sleeping
Done Sleeping
time taken: 1.00899s


In [32]:
start = timer() 

# start 10 threads
# _ is a throwaway variable: we only need the loop to repeat, not the loop value

threads = []

for _ in range(10):
    t = threading.Thread(target=do_something)
    t.start()
    threads.append(t)
    # we cannot call t.join() inside this loop: it would block the main thread until that thread finished before creating the next one, which is as good as running the code synchronously
    # instead, start all the threads in this loop, then loop over them again and call join() so that all 10 threads finish before the script continues

for thread in threads:
    thread.join()
# make sure that all threads finish before continuing with the rest of the script

finish = timer()

time_taken = finish - start
print(f'time taken: {round(finish - start,5)}s')

Sleeping for one second...
Sleeping for one second...
Sleeping for one second...
Sleeping for one second...
Sleeping for one second...
Sleeping for one second...
Sleeping for one second...
Sleeping for one second...
Sleeping for one second...
Sleeping for one second...
Done Sleeping
Done Sleeping
Done SleepingDone Sleeping

Done SleepingDone Sleeping

Done Sleeping
Done Sleeping
Done Sleeping
Done Sleeping
time taken: 1.02678s
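
The fused `Done SleepingDone Sleeping` lines in the output above are most likely several threads printing at nearly the same moment, with one thread's text landing before another thread's newline. One common way to keep each line intact is to guard `print` with a lock. This is a sketch of that idea, not part of the original notebook; `print_lock`, `worker` and `run_workers` are illustrative names.

```python
import threading
import time

print_lock = threading.Lock()  # shared by all workers


def worker(worker_id, seconds=0.1):
    """Sleep, then print one complete line while holding the lock."""
    time.sleep(seconds)
    with print_lock:
        print(f"Worker {worker_id} done sleeping")


def run_workers(count=10):
    """Start `count` workers, then wait for all of them to finish."""
    threads = [threading.Thread(target=worker, args=(i,)) for i in range(count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


run_workers()
```

The lines can still appear in any order; the lock only guarantees that no line is split by another thread's output.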


In [33]:
def do_something_cool(seconds): # a function to sleep for the given number of seconds
    print(f'Sleeping for {seconds} second(s)')
    time.sleep(seconds) # sleep for `seconds` seconds
    print('Done Sleeping')

start = timer() 

threads = []
for _ in range(10):
    t = threading.Thread(target=do_something_cool, args=[1.5]) # positional arguments for the function are passed as a list or tuple via args
    t.start()
    threads.append(t)

for thread in threads:
    thread.join()

finish = timer()

#time_taken = finish - start
print(f'time taken: {round(finish - start,5)}s')

Sleeping for 1.5 second(s)
Sleeping for 1.5 second(s)
Sleeping for 1.5 second(s)
Sleeping for 1.5 second(s)
Sleeping for 1.5 second(s)
Sleeping for 1.5 second(s)
Sleeping for 1.5 second(s)
Sleeping for 1.5 second(s)
Sleeping for 1.5 second(s)
Sleeping for 1.5 second(s)
Done Sleeping
Done Sleeping
Done Sleeping
Done Sleeping
Done Sleeping
Done Sleeping
Done SleepingDone Sleeping

Done Sleeping
Done Sleeping
time taken: 1.5283s
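
Besides `args`, `threading.Thread` also accepts a `kwargs` dictionary for keyword arguments. A small sketch of the three ways to pass arguments; the function `nap`, its `name` parameter and the `finished` list are invented for this example.

```python
import threading
import time

finished = []  # names of the workers that completed


def nap(seconds, name="worker"):
    """Sleep for `seconds`, then record that this worker finished."""
    time.sleep(seconds)
    finished.append(name)  # list.append is thread-safe in CPython


threads = [
    # positional argument only (note the trailing comma in the one-element tuple)
    threading.Thread(target=nap, args=(0.1,)),
    # positional argument plus a keyword argument
    threading.Thread(target=nap, args=(0.1,), kwargs={"name": "b"}),
    # keyword arguments only
    threading.Thread(target=nap, kwargs={"seconds": 0.1, "name": "c"}),
]

for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(finished))  # ['b', 'c', 'worker']
```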


In [35]:
def do_something_cool(seconds): # a function to sleep for the given number of seconds
    print(f'Sleeping for {seconds} second(s)')
    time.sleep(seconds) # sleep for `seconds` seconds
    return 'Done Sleeping ...'

# A more convenient way to run threads (and processes): the concurrent.futures executors

with concurrent.futures.ThreadPoolExecutor() as executor:
    f1 = executor.submit(do_something_cool, 1) # schedule one call: pass the function, then its arguments (1 is the argument here)
    f2 = executor.submit(do_something_cool, 1)
    print(f1.result())
    print(f2.result())

    # submit() schedules a single call of the function and returns a Future object
    # the Future represents the scheduled call: we can check whether it is running or done, and fetch its result
    # result() returns the function's return value, waiting for the call to finish if necessary

Sleeping for 1 second(s)
Sleeping for 1 second(s)
Done Sleeping ...
Done Sleeping ...
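
The Future returned by `submit` can be inspected while the call is still in progress. The sketch below checks `done()` before and after the task finishes and uses `result(timeout=...)` to wait for the return value; the `gate` event and `wait_then_answer` are hypothetical stand-ins for a slow IO operation, so the outcome does not depend on sleep timing.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

gate = threading.Event()  # hypothetical stand-in for a slow IO operation


def wait_then_answer():
    """Block until the gate opens, then return a value."""
    gate.wait()
    return "finished"


with ThreadPoolExecutor(max_workers=1) as executor:
    future = executor.submit(wait_then_answer)

    done_before = future.done()        # the task is still blocked on the gate
    gate.set()                         # let the task finish
    value = future.result(timeout=5)   # wait (up to 5 s) for the return value
    done_after = future.done()         # the task has now completed

print(done_before, value, done_after)  # False finished True
```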


In [36]:
with concurrent.futures.ThreadPoolExecutor() as exe:
    results = [exe.submit(do_something_cool, 1) for _ in range(10)] # submit the function 10 times

    for f in concurrent.futures.as_completed(results): # as_completed() returns an iterator that yields each future as it completes
        print(f.result())

Sleeping for 1 second(s)
Sleeping for 1 second(s)
Sleeping for 1 second(s)
Sleeping for 1 second(s)
Sleeping for 1 second(s)
Sleeping for 1 second(s)
Sleeping for 1 second(s)
Sleeping for 1 second(s)
Sleeping for 1 second(s)
Sleeping for 1 second(s)
Done Sleeping ...
Done Sleeping ...
Done Sleeping ...
Done Sleeping ...
Done Sleeping ...
Done Sleeping ...
Done Sleeping ...
Done Sleeping ...
Done Sleeping ...
Done Sleeping ...


In [40]:
def do_something_cool(seconds): 
    print(f'Sleeping for {seconds} second(s)')
    time.sleep(seconds) 
    return f'Done Sleeping {seconds}...'

with concurrent.futures.ThreadPoolExecutor() as exe:
    
    lst = [5,4,3,2,1]
    results = [exe.submit(do_something_cool, sec) for sec in lst]

    for f in concurrent.futures.as_completed(results):
        print(f.result()) # the results are printed in the order in which the calls complete

Sleeping for 5 second(s)
Sleeping for 4 second(s)
Sleeping for 3 second(s)
Sleeping for 2 second(s)
Sleeping for 1 second(s)
Done Sleeping 1...
Done Sleeping 2...
Done Sleeping 3...
Done Sleeping 4...
Done Sleeping 5...
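
Because `as_completed` yields futures in completion order, it is often useful to remember which input each future belongs to. A common pattern, sketched here with made-up names (`nap`, `delays`, `future_to_delay`), is to keep a dictionary from each future to its argument.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def nap(seconds):
    """Sleep for `seconds` and report how long the nap was."""
    time.sleep(seconds)
    return f"slept {seconds}s"


delays = [0.3, 0.2, 0.1]
completed = {}  # delay -> result, filled in as the calls finish

with ThreadPoolExecutor() as executor:
    # Map each future back to the argument it was submitted with.
    future_to_delay = {executor.submit(nap, d): d for d in delays}

    for future in as_completed(future_to_delay):
        delay = future_to_delay[future]
        completed[delay] = future.result()
        print(delay, "->", completed[delay])
```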


In [44]:
# we can also use the executor's map method to apply the function to each element of an iterable

with concurrent.futures.ThreadPoolExecutor() as exe:
    lst = [5,4,3,2,1]
    results = exe.map(do_something_cool, lst)
    # map() on the executor returns an iterator over the return values (not Future objects)
    # the results come back in the order of the input iterable, not in the order of completion

    for f in results:
        print(f)

Sleeping for 5 second(s)
Sleeping for 4 second(s)
Sleeping for 3 second(s)
Sleeping for 2 second(s)
Sleeping for 1 second(s)
Done Sleeping 5...
Done Sleeping 4...
Done Sleeping 3...
Done Sleeping 2...
Done Sleeping 1...
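
One detail of `map` worth knowing: if a call raises an exception, that exception is raised when its value is retrieved from the results iterator, not when the call is submitted. A minimal sketch, using an invented `reciprocal` function and a deliberately bad input:

```python
from concurrent.futures import ThreadPoolExecutor


def reciprocal(x):
    """Return 1/x; raises ZeroDivisionError for x == 0."""
    return 1 / x


collected = []
error = None

with ThreadPoolExecutor() as executor:
    results = executor.map(reciprocal, [4, 2, 0, 1])
    try:
        for value in results:  # values arrive in input order
            collected.append(value)
    except ZeroDivisionError as exc:
        error = exc  # raised when the third result is retrieved

print(collected)  # [0.25, 0.5]
print(repr(error))
```

Iteration stops at the failing item, so the result for the last input is never retrieved. When every outcome matters, `submit` with one `try`/`except` per future is the more forgiving choice.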


In [4]:
# Real-world example: downloading images sequentially, as a baseline for the threaded version

os.chdir(r'C:\Users\tanzh\Documents\Python\image_download_threading')
start = timer()
img_urls = [
    'https://images.unsplash.com/photo-1516117172878-fd2c41f4a759',
    'https://images.unsplash.com/photo-1532009324734-20a7a5813719',
    'https://images.unsplash.com/photo-1524429656589-6633a470097c',
    'https://images.unsplash.com/photo-1530224264768-7ff8c1789d79',
    'https://images.unsplash.com/photo-1564135624576-c5c88640f235',
    'https://images.unsplash.com/photo-1541698444083-023c97d3f4b6',
    'https://images.unsplash.com/photo-1522364723953-452d3431c267',
    'https://images.unsplash.com/photo-1513938709626-033611b8cc03',
    'https://images.unsplash.com/photo-1507143550189-fed454f93097',
    'https://images.unsplash.com/photo-1493976040374-85c8e12f0c0e',
    'https://images.unsplash.com/photo-1504198453319-5ce911bafcde',
    'https://images.unsplash.com/photo-1530122037265-a5f1f91d3b99',
    'https://images.unsplash.com/photo-1516972810927-80185027ca84',
    'https://images.unsplash.com/photo-1550439062-609e1531270e',
    'https://images.unsplash.com/photo-1549692520-acc6669e2f0c'
]

for url in img_urls:
    image_content = requests.get(url).content
    image_name = f"{url.split('/')[3]}.jpg"

    with open(image_name, 'wb') as f:
        f.write(image_content)
        print(f'{image_name} was downloaded')

end = timer()

print(end - start)

photo-1516117172878-fd2c41f4a759.jpg was downloaded
photo-1532009324734-20a7a5813719.jpg was downloaded
photo-1524429656589-6633a470097c.jpg was downloaded
photo-1530224264768-7ff8c1789d79.jpg was downloaded
photo-1564135624576-c5c88640f235.jpg was downloaded
photo-1541698444083-023c97d3f4b6.jpg was downloaded
photo-1522364723953-452d3431c267.jpg was downloaded
photo-1513938709626-033611b8cc03.jpg was downloaded
photo-1507143550189-fed454f93097.jpg was downloaded
photo-1493976040374-85c8e12f0c0e.jpg was downloaded
photo-1504198453319-5ce911bafcde.jpg was downloaded
photo-1530122037265-a5f1f91d3b99.jpg was downloaded
photo-1516972810927-80185027ca84.jpg was downloaded
photo-1550439062-609e1531270e.jpg was downloaded
photo-1549692520-acc6669e2f0c.jpg was downloaded
16.250105400000002


In [5]:
start = timer()

img_urls = [
    'https://images.unsplash.com/photo-1516117172878-fd2c41f4a759',
    'https://images.unsplash.com/photo-1532009324734-20a7a5813719',
    'https://images.unsplash.com/photo-1524429656589-6633a470097c',
    'https://images.unsplash.com/photo-1530224264768-7ff8c1789d79',
    'https://images.unsplash.com/photo-1564135624576-c5c88640f235',
    'https://images.unsplash.com/photo-1541698444083-023c97d3f4b6',
    'https://images.unsplash.com/photo-1522364723953-452d3431c267',
    'https://images.unsplash.com/photo-1513938709626-033611b8cc03',
    'https://images.unsplash.com/photo-1507143550189-fed454f93097',
    'https://images.unsplash.com/photo-1493976040374-85c8e12f0c0e',
    'https://images.unsplash.com/photo-1504198453319-5ce911bafcde',
    'https://images.unsplash.com/photo-1530122037265-a5f1f91d3b99',
    'https://images.unsplash.com/photo-1516972810927-80185027ca84',
    'https://images.unsplash.com/photo-1550439062-609e1531270e',
    'https://images.unsplash.com/photo-1549692520-acc6669e2f0c'
]

def download_image(url):
    image_content = requests.get(url).content
    image_name = f"{url.split('/')[3]}.jpg"

    with open(image_name, 'wb') as f:
        f.write(image_content)
        print(f'{image_name} was downloaded')    

with concurrent.futures.ThreadPoolExecutor() as exe:
    exe.map(download_image, img_urls)

end = timer()
print(end-start)

photo-1564135624576-c5c88640f235.jpg was downloaded
photo-1516117172878-fd2c41f4a759.jpg was downloaded
photo-1507143550189-fed454f93097.jpg was downloaded
photo-1549692520-acc6669e2f0c.jpg was downloaded
photo-1516972810927-80185027ca84.jpg was downloaded
photo-1504198453319-5ce911bafcde.jpg was downloaded
photo-1530224264768-7ff8c1789d79.jpg was downloaded
photo-1550439062-609e1531270e.jpg was downloaded
photo-1530122037265-a5f1f91d3b99.jpg was downloaded
photo-1524429656589-6633a470097c.jpg was downloaded
photo-1513938709626-033611b8cc03.jpg was downloaded
photo-1522364723953-452d3431c267.jpg was downloaded
photo-1532009324734-20a7a5813719.jpg was downloaded
photo-1493976040374-85c8e12f0c0e.jpg was downloaded
photo-1541698444083-023c97d3f4b6.jpg was downloaded
12.62778759999999
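
In the threaded cell above, the iterator returned by `exe.map` is never consumed, so an exception raised inside `download_image` would go unnoticed. The sketch below shows one possible way to collect successes and failures separately. It is an illustration under stated assumptions, not the notebook's method: `save_image`, `download_all`, `fake_fetch` and the `example.com` URLs are hypothetical, and the fetch function is passed in as a parameter so the sketch runs offline.

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path


def save_image(url, fetch, dest_dir):
    """Fetch one URL and write it into dest_dir; return the file path."""
    path = Path(dest_dir) / f"{url.split('/')[3]}.jpg"
    path.write_bytes(fetch(url))
    return path


def download_all(urls, fetch, dest_dir, max_workers=5):
    """Download every URL concurrently; return (saved_paths, failures)."""
    saved, failures = [], {}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        future_to_url = {
            executor.submit(save_image, url, fetch, dest_dir): url for url in urls
        }
        for future in as_completed(future_to_url):
            url = future_to_url[future]
            try:
                saved.append(future.result())
            except Exception as exc:  # one failed download should not stop the rest
                failures[url] = exc
    return saved, failures


def fake_fetch(url):
    """Hypothetical stand-in for a real HTTP request."""
    if "missing" in url:
        raise IOError(f"could not fetch {url}")
    return b"image bytes"


demo_urls = [
    "https://example.com/photo-a",
    "https://example.com/photo-b",
    "https://example.com/missing-c",
]

with tempfile.TemporaryDirectory() as tmp:
    saved, failures = download_all(demo_urls, fake_fetch, tmp)
    print(sorted(p.name for p in saved))  # ['photo-a.jpg', 'photo-b.jpg']
    print(list(failures))                 # ['https://example.com/missing-c']
```

With the `requests` import used in this notebook, a real fetcher could be `lambda url: requests.get(url, timeout=10).content`.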
