# Assignment 1 - Threading and Multiprocessing

In this project, we will explore the differences between multithreading and multiprocessing. For that purpose, we have an imaginary colleague named John, who asks for your help speeding up a process that downloads images from the internet.

John already has serial code; however, he doesn't know concurrent or parallel programming! Help John succeed in his mission by using multithreading and multiprocessing to speed up his task.

He has two tasks:

1. Download images from the internet
2. Resize them to 128x128 px.


## Imports

In [1]:
import os
import utils

## Global Variables

In [2]:
NUM_OF_IMAGES = 1000  # the maximum number of requests per day is 12500
CLIENT_ID = utils.get_imgur_client_id()
IMAGES_DIR = utils.create_download_dir()

## 1. Downloading Images from Internet (Threading)

In this section, we will download images from the internet. Network-related tasks are I/O-bound: most of their time is spent waiting for remote servers to respond, so they can be sped up by running several downloads at once in separate threads. John has already written the serial version of the download; it is your turn to write the multithreaded one.

You are free to choose any library you want. Your success will be based on your ability to beat John's timing.
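Why threads help with I/O-bound work can be seen without touching the network at all. The sketch below replaces a real download with `time.sleep`, a stand-in for network latency; `fake_download` and the example URLs are hypothetical and only for illustration. It compares a serial loop with `concurrent.futures.ThreadPoolExecutor` from the standard library.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_download(url):
    """Hypothetical stand-in for a network request: it only waits."""
    time.sleep(0.1)  # simulated network latency; no CPU work is done
    return url

urls = [f"https://example.com/img{i}.jpg" for i in range(20)]  # hypothetical URLs

# Serial: the 20 waits happen one after another (roughly 20 x 0.1 s).
start = time.perf_counter()
serial_results = [fake_download(u) for u in urls]
serial_time = time.perf_counter() - start

# Threaded: up to 10 waits overlap, because a sleeping (or I/O-blocked)
# thread releases the GIL and lets the other threads run.
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=10) as executor:
    threaded_results = list(executor.map(fake_download, urls))
threaded_time = time.perf_counter() - start

print(f"serial:   {serial_time:.2f} s")
print(f"threaded: {threaded_time:.2f} s")
```

`executor.map` returns results in input order, so the threaded version produces the same list as the serial loop, only sooner.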

### Serial Code of John

In [4]:
%%time

image_links = utils.build_link_list(CLIENT_ID, NUM_OF_IMAGES)

for image_link in image_links:
    utils.download_image_from_url(image_link, IMAGES_DIR)

too many requests, enough, or you can choose to put time.sleep() in here...
Wall time: 23min 29s


In [5]:
print('Number of images:', len(image_links))

Number of images: 830


### Multithreading John's Task

In [3]:
NUM_OF_IMAGES = 830

In [4]:
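%%time

import threading

imageurl = utils.build_link_list(CLIENT_ID, NUM_OF_IMAGES)
NUM_THREADS = 10

def download_images(urls):
    # Each thread downloads only the URLs it was given
    for url in urls:
        utils.download_image_from_url(url, IMAGES_DIR)

threads = []
for i in range(NUM_THREADS):
    # Thread i gets every NUM_THREADS-th link starting at index i,
    # so no image is downloaded twice
    t = threading.Thread(target=download_images, args=(imageurl[i::NUM_THREADS],))
    threads.append(t)
    t.start()

# Wait for every thread so that %%time measures the full download
for t in threads:
    t.join()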

too many requests, enough, or you can choose to put time.sleep() in here...
Wall time: 3min 43s


In [5]:
print('Number of images:', len(imageurl))

Number of images: 813
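Another way to divide the downloads among threads is a shared work queue: a fixed set of worker threads repeatedly take the next URL until the queue is empty, so every link is downloaded exactly once and the caller can wait for all of them with `join()`. This is a minimal standard-library sketch; `download_all` is a hypothetical helper, and `download` is assumed to take `(url, out_dir)` like `utils.download_image_from_url` in this notebook.

```python
import queue
import threading

def download_all(urls, download, out_dir, num_threads=10):
    """Download every URL exactly once using a fixed set of worker threads.

    `download(url, out_dir)` is assumed to match the signature of
    utils.download_image_from_url used in this notebook.
    """
    work = queue.Queue()
    for url in urls:
        work.put(url)  # fill the queue before any worker starts

    def worker():
        while True:
            try:
                url = work.get_nowait()  # each URL is handed to one thread only
            except queue.Empty:
                return  # no work left, so this thread exits
            download(url, out_dir)

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # block until every download has finished
```

In the notebook this would be called as `download_all(imageurl, utils.download_image_from_url, IMAGES_DIR)`. If `download` raises, only the worker thread that hit the error stops; the remaining threads keep draining the queue.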


## 2. Resizing (Multiprocessing)

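In this part, we resize the downloaded images to 128x128 px. CPU-bound operations like this are generally better suited to multiprocessing than to multithreading: in CPython, the Global Interpreter Lock (GIL) allows only one thread to execute Python bytecode at a time, whereas separate processes each have their own interpreter and can run on several CPU cores at once. Resizing fits this purpose exactly!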

You are free to choose any library you want. Your success will be based on your ability to beat John's timing.
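The effect of the GIL on CPU-bound work can be illustrated with a pure-Python loop. In the sketch below, `busy_work` is a hypothetical stand-in for resizing one image; the serial run and a `multiprocessing.Pool` run should produce identical results, and on a multi-core machine the pool version should finish faster. As the `multiprocessing` documentation notes, worker functions must be importable by the child processes; on platforms that start children with "spawn" (Windows, macOS), functions defined directly in a notebook cell may therefore fail, which is one reason to keep them in a module such as `utils`.

```python
import time
from multiprocessing import Pool

def busy_work(n):
    """Hypothetical CPU-bound task: a pure-Python loop with no I/O."""
    total = 0
    for i in range(n):
        total += i * i
    return total

def run_serial(jobs):
    return [busy_work(n) for n in jobs]

def run_parallel(jobs, processes=4):
    # Each worker process has its own interpreter and its own GIL,
    # so several busy_work loops can run on different cores at once.
    with Pool(processes) as pool:
        return pool.map(busy_work, jobs)  # results come back in input order

if __name__ == "__main__":
    jobs = [1_000_000] * 8
    start = time.perf_counter()
    serial = run_serial(jobs)
    print(f"serial:   {time.perf_counter() - start:.2f} s")
    start = time.perf_counter()
    parallel = run_parallel(jobs)
    print(f"parallel: {time.perf_counter() - start:.2f} s")
    assert parallel == serial
```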

### Serial Code of John

In [5]:
%%time

# PS: time for 845 images : 10.1 s

image_path_list = os.listdir('images')

for image_path in image_path_list:
    utils.create_thumbnail((128, 128), os.path.join('images', image_path))

Wall time: 4min 41s


### Multiprocessing John's Task

In [4]:

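%%time

from multiprocessing import Pool

image_path_list = os.listdir('images')

if __name__ == '__main__':
    # One (size, path) argument tuple per image, matching create_thumbnail's parameters
    args = [((128, 128), os.path.join('images', image_path))
            for image_path in image_path_list]
    # A single pool of 9 worker processes; starmap passes the function and its
    # arguments separately, so each thumbnail is created inside a worker process.
    # starmap blocks until every task has finished.
    with Pool(9) as pool:
        pool.starmap(utils.create_thumbnail, args)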


Wall time: 4min 32s
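A common pitfall with `Pool.apply_async` is writing `pool.apply_async(f(x))`: Python evaluates `f(x)` in the parent process first, so the work runs serially and the pool only receives the return value. The sketch below shows the correct form, passing the function and its arguments separately. `work` is a hypothetical stand-in for `utils.create_thumbnail` that reports which process ran it, and the file names are made up.

```python
import os
from multiprocessing import Pool

def work(path):
    """Hypothetical stand-in for utils.create_thumbnail: report which process ran it."""
    return path, os.getpid()

if __name__ == "__main__":
    paths = [f"img{i}.jpg" for i in range(6)]  # hypothetical file names
    with Pool(3) as pool:
        # Wrong: pool.apply_async(work(p)) would run work(p) right here in the
        # parent process. Right: hand the function and its arguments to the pool.
        async_results = [pool.apply_async(work, (p,)) for p in paths]
        # get() waits for each task and re-raises any exception from the worker.
        results = [r.get() for r in async_results]
    parent = os.getpid()
    for path, pid in results:
        print(path, "ran in", "parent" if pid == parent else f"worker {pid}")
```

For a whole list of inputs, `Pool.map` or `Pool.starmap` does the same job with less bookkeeping; `apply_async` is useful when tasks are submitted one at a time.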


## Conclusion

John is very happy with your help and wants to show his progress to his manager. Help him create a DataFrame/table to present his results.

Create a table that shows all four approaches and the time each one took. The table can take any form, as long as it shows the differences, as in the example below.

|Description | Time 
|:----------- | :---- 
|Task 1 | 19.2 sec
|Task 2 | 3.2 sec
|Task N | 6.2 sec
|... | ...

In [8]:
import pandas as pd

# DataFrame.append was removed in pandas 2.0, so build the table from a list of rows
rows = [
    {'Description': "John's code to download images", 'Time': '23min 29s'},
    {'Description': 'Multithreading to download images', 'Time': '3min 43s'},
    {'Description': "John's code to resize images", 'Time': '4min 41s'},
    {'Description': 'Multiprocessing to resize images', 'Time': '4min 32s'},
]
df = pd.DataFrame(rows, columns=['Description', 'Time'])
df

|   | Description | Time |
|:--|:------------|:-----|
| 0 | John's code to download images | 23min 29s |
| 1 | Multithreading to download images | 3min 43s |
| 2 | John's code to resize images | 4min 41s |
| 3 | Multiprocessing to resize images | 4min 32s |
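Labels like `23min 29s` are hard to compare at a glance. A small helper can convert them to seconds and express each concurrent version as a speedup over John's serial code. This is a standard-library sketch; `to_seconds` is a hypothetical helper that only understands the `Xmin Ys` and `Y s` formats printed by `%%time` in this notebook, and the timings are the ones recorded above.

```python
import re

def to_seconds(label):
    """Convert a %%time-style label such as '23min 29s' or '10.1 s' to seconds."""
    match = re.fullmatch(r"(?:(\d+)\s*min)?\s*(?:([\d.]+)\s*s)?", label.strip())
    if match is None or not any(match.groups()):
        raise ValueError(f"unrecognised time label: {label!r}")
    minutes, seconds = match.groups()
    return int(minutes or 0) * 60 + float(seconds or 0)

# Timings recorded in this notebook: (serial, concurrent) for each task.
pairs = {
    "Download (threads)": ("23min 29s", "3min 43s"),
    "Resize (processes)": ("4min 41s", "4min 32s"),
}

for task, (serial, concurrent) in pairs.items():
    s, c = to_seconds(serial), to_seconds(concurrent)
    print(f"{task}: {s:.0f} s -> {c:.0f} s, speedup x{s / c:.2f}")
```

The same helper could be applied to the `Time` column, for example with `df['Seconds'] = df['Time'].map(to_seconds)`, to add a numeric column to the DataFrame.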
