# Assignment 1 - Threading and Multiprocessing

In this project, we will explore the difference between multithreading and multiprocessing. For that purpose, we have an imaginary colleague named John, who asks for your help speeding up a process that downloads images from the internet.

John already has the serial code; however, he doesn't know concurrent or parallel programming! Help John succeed in his mission by using multithreading and multiprocessing to speed up his task.

He has two tasks:

1. Download images from the internet
2. Resize them to 128x128 px.


## Imports

In [1]:
import os
import utils

In [2]:
#pip install -r requirements.txt

## Global Variables

In [3]:
NUM_OF_IMAGES = 500 # at most 12500 requests can be made per day
CLIENT_ID = utils.get_imgur_client_id()
IMAGES_DIR = utils.create_download_dir()

The number of images was set to 500 because I had a hard time getting 1000 images.

## 1. Downloading Images from Internet (Threading)

In this section, we will download some images from the internet. Network-related tasks are IO bound, so they can be sped up by running the downloads in multiple threads. John has already written the serial version; it is your turn to write the multithreaded one.

You are free to choose any library you want. Your success will be based on your ability to beat John's timing.

### Serial Code of John

In [4]:
image_links = utils.build_link_list(CLIENT_ID, NUM_OF_IMAGES)

In [5]:
print('Number of Image Links:', len(image_links))

Number of Image Links: 502


In [6]:
%%time

for image_link in image_links:
    utils.download_image_from_url(image_link, IMAGES_DIR)

CPU times: user 10.8 s, sys: 1.69 s, total: 12.5 s
Wall time: 1min 37s
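
The gap between the CPU time (12.5 s) and the wall time (1min 37s) reported above is what marks this loop as IO bound: most of the elapsed time is spent waiting rather than computing. A minimal sketch of how the two clocks can be compared outside a notebook, assuming `time.sleep` as a stand-in for a network wait (the `fake_download` and `measure` helpers are hypothetical and not part of `utils`):

```python
import time


def fake_download(delay=0.2):
    """Hypothetical stand-in for a network request: it only waits."""
    time.sleep(delay)


def measure(func, *args):
    """Return (wall_seconds, cpu_seconds) spent running func(*args)."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    func(*args)
    cpu_used = time.process_time() - cpu_start
    wall_used = time.perf_counter() - wall_start
    return wall_used, cpu_used


wall, cpu = measure(fake_download, 0.2)
print(f"wall: {wall:.3f} s, cpu: {cpu:.3f} s")
```

When the wall time is much larger than the CPU time, as here, threads can overlap the waiting periods even though only one of them runs Python code at a time.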


Before running the same task with multithreading, the images downloaded by the serial code were removed from the images folder, leaving it empty.

### Multithreading John's Task

In [7]:
from concurrent.futures import ThreadPoolExecutor

In [8]:
%%time

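futures = []

# Leaving the `with` block waits for every submitted download to finish,
# so the wall time below covers the whole task.
with ThreadPoolExecutor(max_workers=10) as executor:
    for image_link in image_links:
        futures.append(executor.submit(utils.download_image_from_url, image_link, IMAGES_DIR))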


CPU times: user 9.19 s, sys: 1.64 s, total: 10.8 s
Wall time: 33.5 s
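
The cell above never inspects the objects returned by `executor.submit`, so an exception raised inside a worker thread would go unnoticed. A minimal sketch of collecting results with `as_completed`, assuming a hypothetical `fake_download` function in place of `utils.download_image_from_url` and made-up `example.com` links:

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def fake_download(link):
    """Hypothetical stand-in for a download: wait briefly, fail on 'bad' links."""
    time.sleep(0.05)
    if "bad" in link:
        raise ValueError(f"could not fetch {link}")
    return link


def download_all(links, max_workers=10):
    """Run fake_download over links in a thread pool; return (done, failed)."""
    done, failed = [], []
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Map each future back to its link so failures can be reported.
        future_to_link = {executor.submit(fake_download, link): link for link in links}
        for future in as_completed(future_to_link):
            link = future_to_link[future]
            try:
                done.append(future.result())  # re-raises any worker exception
            except ValueError:
                failed.append(link)
    return done, failed


links = [f"https://example.com/img{i}.jpg" for i in range(20)]
links.append("https://example.com/bad.jpg")
done, failed = download_all(links)
print(len(done), "ok,", len(failed), "failed")
```

Calling `future.result()` is what surfaces a worker's exception; keeping the failed links makes it possible to retry them or to report how many images are missing.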


## 2. Resizing (Multiprocessing)

In this part, we resize the downloaded images to another size, 128x128 px in this example. CPU-bound operations are generally a good fit for multiprocessing, so resizing suits this purpose well!

You are free to choose any library you want. Your success will be based on your ability to beat John's timing.

### Serial Code of John

In [9]:
%%time

image_path_list = os.listdir('images')

for image_path in image_path_list:
    utils.create_thumbnail((128, 128), os.path.join('images', image_path))

CPU times: user 5.44 s, sys: 252 ms, total: 5.69 s
Wall time: 5.71 s


Before running the same task with multiprocessing, the resized images produced by the serial code were removed from the images folder.

### Multiprocessing John's Task

In [10]:
import multiprocessing

In [11]:
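%%time
# Pool(3) starts three worker processes; leaving the `with` block shuts them down.
with multiprocessing.Pool(3) as pool:
    for image_path in image_path_list:
        pool.apply_async(utils.create_thumbnail, args=((128, 128), os.path.join('images', image_path)))
    pool.close()  # no more tasks will be submitted
    pool.join()   # wait for every resize to finish before the timer stops


CPU times: user 13 ms, sys: 13.5 ms, total: 26.5 ms
Wall time: 32.6 ms

The wall time above was recorded without waiting for the workers (no `pool.close()` and `pool.join()`), so it covers only the time needed to submit the tasks with `apply_async`, not the resizing itself. It is therefore not directly comparable to the 5.71 s serial run.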
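
Because `apply_async` returns immediately, a timing is only meaningful once the workers have finished. A minimal sketch of a blocking variant that uses `Pool.map` inside a `with` block; `busy_work` is a hypothetical CPU-bound stand-in for `utils.create_thumbnail`, and the pool size of 3 simply mirrors the cell above:

```python
import multiprocessing
import time


def busy_work(n):
    """Hypothetical CPU-bound stand-in for resizing one image."""
    return sum(i * i for i in range(n))


def run_parallel(inputs, processes=3):
    """Apply busy_work to every input; map() blocks until all results are back."""
    with multiprocessing.Pool(processes) as pool:
        return pool.map(busy_work, inputs)


if __name__ == "__main__":
    inputs = [200_000] * 12
    start = time.perf_counter()
    results = run_parallel(inputs)
    elapsed = time.perf_counter() - start
    print(f"{len(results)} tasks finished in {elapsed:.2f} s")
```

The `if __name__ == "__main__":` guard matters when this runs as a script on platforms that start worker processes by spawning a fresh interpreter, since each worker imports the main module again.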


## Conclusion

John is very happy with your help and he wants to show his progress to his manager. Help him create a dataframe or table to present his results.

Create a table that shows the differences between all four approaches and the time each task took. The table can take any form, as long as it shows the differences, as in the example below.

|Description | Time 
|:----------- | :---- 
|Task 1 | 19.2 sec
|Task 2 | 3.2 sec
|Task N | 6.2 sec
|... | ...

In [12]:
import pandas as pd

# DataFrame.append was removed in pandas 2.0, so build the table from a list of rows.
df = pd.DataFrame(
    [
        {'Description': 'Serial Code Gather Images', 'Time': '1min 37s'},
        {'Description': 'Multithreading Gather Images', 'Time': '33.5 s'},
        {'Description': 'Serial Code Resize Images', 'Time': '5.71 s'},
        {'Description': 'Multiprocessing Resize Images', 'Time': '32.6 ms'},
    ]
)
df

| |Description | Time
|:-- |:----------- | :----
|0 | Serial Code Gather Images | 1min 37s
|1 | Multithreading Gather Images | 33.5 s
|2 | Serial Code Resize Images | 5.71 s
|3 | Multiprocessing Resize Images | 32.6 ms
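
The table stores the timings as strings, which makes them hard to compare numerically. A minimal sketch of turning the `%%time` wall-time strings into seconds and computing the download speedup; the `to_seconds` helper and the `UNIT_SECONDS` mapping are hypothetical and only handle the `min`, `s` and `ms` units that appear in this notebook:

```python
import re

# Seconds per unit for the wall-time strings printed by %%time.
UNIT_SECONDS = {"min": 60.0, "s": 1.0, "ms": 0.001}


def to_seconds(text):
    """Convert a string such as '1min 37s' or '33.5 s' to seconds."""
    parts = re.findall(r"(\d+(?:\.\d+)?)\s*(min|ms|s)", text)
    return sum(float(value) * UNIT_SECONDS[unit] for value, unit in parts)


serial_download = to_seconds("1min 37s")   # serial wall time from the table
threaded_download = to_seconds("33.5 s")   # multithreaded wall time from the table
speedup = serial_download / threaded_download
print(f"download speedup: {speedup:.2f}x")
```

With the figures reported here, 97 s against 33.5 s, the threaded download comes out roughly 2.9 times faster than the serial one.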
