# Assignment 1 - Threading and Multiprocessing

In this project, we will explore the difference between multithreading and multiprocessing. To do so, we have an imaginary colleague named John, who asks for your help speeding up a process that downloads images from the internet.

John already has the serial code; however, he doesn't know concurrent or parallel programming! Help John succeed in his mission by using multithreading and multiprocessing to speed up his task.

He has two tasks:

1. Download images from the internet.
2. Resize them to 128x128 px.


## Imports

In [1]:
import os
import utils

## Global Variables

In [2]:
NUM_OF_IMAGES = 500  # at most 12500 requests can be made per day
CLIENT_ID = utils.get_imgur_client_id()
IMAGES_DIR = utils.create_download_dir()

## 1. Downloading Images from Internet (Threading)

In this section, we will download some images from the internet. Because network-related tasks are IO-bound, the download can be sped up with multithreading. John has already written the serial version; now it is your turn to write the multithreaded one.

You are free to choose any library you want. Your success will be based on your ability to beat John's timing.

### Serial Code of John

In [4]:
%%time

image_links = utils.build_link_list(CLIENT_ID, NUM_OF_IMAGES)

for image_link in image_links:
    utils.download_image_from_url(image_link, IMAGES_DIR)

Wall time: 10min 48s


### Multithreading John's Task

In [14]:
import multiprocessing
import time

starttime=time.time()
processes=[]

for image_link in image_links:
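    # Pass the function and its arguments separately; calling it inside target=
    # would run the download in the parent process before the child even starts.
    p=multiprocessing.Process(target=utils.download_image_from_url, args=(image_link, IMAGES_DIR))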
    processes.append(p)
    p.start()

for process in processes:
    process.join()
    
print("It Takes{} second".format(time.time()-starttime))
    
    
## todo: your code goes here

It Takes183.54300546646118 second
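
Because the download step is IO-bound, a thread pool is one way to overlap the time spent waiting on the network. The following is a minimal sketch under stated assumptions, not John's code: `fake_download` is a hypothetical stand-in for `utils.download_image_from_url` that only sleeps to mimic network latency, and `download_all`, the example URLs, and the worker count are invented for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_download(link, target_dir):
    """Hypothetical stand-in for utils.download_image_from_url (assumption)."""
    time.sleep(0.05)  # pretend to wait on the network
    return f"{target_dir}/{link.rsplit('/', 1)[-1]}"


def download_all(links, target_dir, download=fake_download, max_workers=8):
    """Call download(link, target_dir) for every link using a pool of threads."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # map() preserves input order and re-raises any worker exception.
        return list(executor.map(lambda link: download(link, target_dir), links))


# Hypothetical links, used only to exercise the sketch.
links = [f"https://example.com/img_{i}.jpg" for i in range(16)]

start = time.perf_counter()
paths = download_all(links, "images")
elapsed = time.perf_counter() - start
print(f"{len(paths)} downloads in {elapsed:.2f} s")
```

In the notebook, the same helper could presumably be called as `download_all(image_links, IMAGES_DIR, download=utils.download_image_from_url)`, assuming that function is safe to call from several threads at once.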


## 2. Resizing (Multiprocessing)

In this part, we have to resize the downloaded images to another size, in this case 128x128 px. Because CPU-bound operations generally benefit from multiprocessing, resizing is a good fit for it!

You are free to choose any library you want. Your success will be based on your ability to beat John's timing.

### Serial Code of John

In [2]:
%%time

# PS: time for 845 images : 10.1 s

image_path_list = os.listdir('images')

for image_path in image_path_list:
    utils.create_thumbnail((128, 128), os.path.join('images', image_path))

Wall time: 50.9 s


### Multiprocessing John's Task

In [3]:
%%time
import concurrent.futures

with concurrent.futures.ProcessPoolExecutor() as Ex:
    for image_path in image_path_list:
        # Pass the callable and its arguments separately so the resize runs in a worker process.
        result=Ex.submit(utils.create_thumbnail, (128, 128), os.path.join('images', image_path))


# todo: your code goes here

Wall time: 15.3 s
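
For reference, here is a small self-contained sketch of how work is usually handed to a process pool: `submit()` receives the callable and its arguments separately, and each returned future is collected afterwards. It rests on assumptions: `fake_thumbnail` is a hypothetical stand-in for `utils.create_thumbnail` that only burns CPU with hashing, and `resize_all` and the file names are made up for the example.

```python
import hashlib
from concurrent.futures import ProcessPoolExecutor


def fake_thumbnail(size, path):
    """Hypothetical stand-in for utils.create_thumbnail (assumption)."""
    digest = path.encode()
    for _ in range(20_000):  # CPU-bound busy work in place of real resizing
        digest = hashlib.sha256(digest).digest()
    width, height = size
    return f"{path}.{width}x{height}.thumb"


def resize_all(paths, size, resize=fake_thumbnail, max_workers=None):
    """Run resize(size, path) for every path in a pool of worker processes."""
    with ProcessPoolExecutor(max_workers=max_workers) as executor:
        # submit() takes the callable and its arguments separately.
        futures = [executor.submit(resize, size, path) for path in paths]
        # result() blocks until the task is done and re-raises worker errors.
        return [future.result() for future in futures]


if __name__ == "__main__":
    # The guard keeps worker processes from re-running this demo on import.
    names = [f"img_{i}.jpg" for i in range(8)]
    print(resize_all(names, (128, 128)))
```

Collecting `future.result()` matters: a task that fails inside a worker raises nothing in the parent until its result is requested, so errors would otherwise pass silently.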


## Conclusion

John is very happy with your help and wants to show his progress to his manager. Help him create a dataframe or table to present his results.

Create a table that shows the time each of the four approaches took. The table can take any form, as long as it shows the differences, as in the example below.

|Description | Time 
|:----------- | :---- 
|Task 1 | 19.2 sec
|Task 2 | 3.2 sec
|Task N | 6.2 sec
|... | ...

In [8]:
import pandas as pd

df=pd.DataFrame([('10min 48s','50.9 s'),('183.54s','15.3 s')],index=["John","Ho Man"],columns=("Task 1","Task 2"))
df

| | Task 1 | Task 2
|:----------- | :---- | :----
|John | 10min 48s | 50.9 s
|Ho Man | 183.54s | 15.3 s
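
The recorded wall times can also be compared as speedup factors (serial time divided by concurrent time). The sketch below uses only the timings reported in this notebook; the helper names `to_seconds` and `speedup_rows` and the row labels are assumptions made for illustration, and it uses plain Python rather than pandas so that it runs without extra dependencies.

```python
def to_seconds(minutes=0, seconds=0.0):
    """Convert a 'Xmin Ys' wall time into seconds."""
    return minutes * 60 + seconds


# Wall times copied from the cells above.
timings = {
    "Task 1: download": {"serial": to_seconds(10, 48), "concurrent": 183.54},
    "Task 2: resize": {"serial": 50.9, "concurrent": 15.3},
}


def speedup_rows(timings):
    """Return (task, serial_s, concurrent_s, speedup) tuples."""
    rows = []
    for task, t in timings.items():
        rows.append((task, t["serial"], t["concurrent"], t["serial"] / t["concurrent"]))
    return rows


rows = speedup_rows(timings)

# Print a table in the same markdown style as the example above.
print("|Description | Serial (s) | Concurrent (s) | Speedup")
print("|:----------- | :---- | :---- | :----")
for task, serial, concurrent, speedup in rows:
    print(f"|{task} | {serial:.1f} | {concurrent:.1f} | {speedup:.2f}x")
```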
