# Assignment 1 - Threading and Multiprocessing
In this project, we will explore the difference between multithreading and multiprocessing. For that purpose, we have an imaginary colleague named John, who asks for your help to speed up his process for downloading images from the internet.

John already has the serial code. However, he does not know concurrent or parallel programming! Help John succeed in his mission by using multithreading and multiprocessing to speed up his tasks.

He has two tasks:

1. Download images from the internet.
2. Resize them to 128x128 px.

# Imports

In [31]:
import os
import utils

# Global Variables

In [32]:
# Number of images to download when testing the code

NUM_OF_IMAGES = 1000
CLIENT_ID = utils.get_imgur_client_id()
IMAGES_DIR = utils.create_download_dir()

# 1. Downloading Images from Internet (Threading)
In this section, we will download some images from the internet. Because network-related tasks are I/O bound, the download can be sped up with multithreading: while one thread waits for a server response, the other threads can make progress. John already wrote the serial version of the download; it is your turn to write the multithreaded one.

You are free to choose any library you want. Your success will be based on your ability to beat John's timing.
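To make the I/O-bound argument concrete, here is a minimal sketch that needs no network access. `fake_download` is a hypothetical stand-in that only sleeps to mimic waiting on a server; it is not part of `utils`, and the link names are made up for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_download(link):
    """Hypothetical stand-in for a network request: sleep to mimic server latency."""
    time.sleep(0.05)
    return link


links = [f"image_{i}" for i in range(8)]

# Serial: the waits happen one after another (roughly 8 * 0.05 s in total).
start = time.perf_counter()
serial_results = [fake_download(link) for link in links]
serial_time = time.perf_counter() - start

# Threaded: up to 4 waits overlap, so the total is closer to 2 * 0.05 s.
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as executor:
    threaded_results = list(executor.map(fake_download, links))
threaded_time = time.perf_counter() - start

print(f"serial: {serial_time:.2f} s, threaded: {threaded_time:.2f} s")
```

The threads spend almost all of their time waiting, which is why they can overlap even though only one thread runs Python code at a time.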

# Serial Code of John

In [33]:
%%time

image_links = utils.build_link_list(CLIENT_ID, NUM_OF_IMAGES)

for image_link in image_links:
    utils.download_image_from_url(image_link, IMAGES_DIR)

too many requests, enough, or you can choose to put time.sleep() in here...
Wall time: 5.11 s


# Multithreading John's Task

In [34]:
%%time

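import os
import utils
import urllib.request
from concurrent.futures import ThreadPoolExecutor

NUM_OF_IMAGES = 1000
CLIENT_ID = utils.get_imgur_client_id()
IMAGES_DIR = utils.create_download_dir()


image_links = utils.build_link_list(CLIENT_ID, NUM_OF_IMAGES)

# Fetch the links with a pool of 4 worker threads; leaving the with-block
# waits until every request has finished.
# Note: urlopen only opens each URL and does not write the image to
# IMAGES_DIR, unlike utils.download_image_from_url in the serial version.
with ThreadPoolExecutor(4) as executor:
    results = executor.map(urllib.request.urlopen, image_links)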

too many requests, enough, or you can choose to put time.sleep() in here...
Wall time: 1.21 s
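The cell above hands `urllib.request.urlopen` to the pool, which opens each link but does not save a file the way John's serial loop does. A minimal sketch of a threaded version that reuses a two-argument download function could look like the following. `download_all` and `fake_download` are hypothetical names introduced for illustration, and the function passed in is assumed to take `(link, dest_dir)` like `utils.download_image_from_url` in the serial code.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import repeat


def download_all(download_fn, links, dest_dir, workers=4):
    """Call download_fn(link, dest_dir) for every link using a thread pool.

    Hypothetical helper: download_fn is assumed to accept (link, dest_dir).
    """
    with ThreadPoolExecutor(max_workers=workers) as executor:
        # repeat(dest_dir) supplies the same second argument to every call.
        # list() consumes the lazy iterator, so an exception raised inside a
        # worker is re-raised here instead of being silently dropped.
        return list(executor.map(download_fn, links, repeat(dest_dir)))


def fake_download(link, dest_dir):
    """Stand-in used only for this demo; it builds a path and downloads nothing."""
    return f"{dest_dir}/{link}"


print(download_all(fake_download, ["a.jpg", "b.jpg"], "images", workers=2))
```

In the notebook this might be called as `download_all(utils.download_image_from_url, image_links, IMAGES_DIR)`, assuming that function is safe to call from several threads at once.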


In [35]:
%%time

with ThreadPoolExecutor(6) as executor:
    results = executor.map(urllib.request.urlopen, image_links)

Wall time: 687 ms


In [36]:
%%time

with ThreadPoolExecutor(8) as executor:
    results = executor.map(urllib.request.urlopen, image_links)

Wall time: 766 ms


In [37]:
%%time

with ThreadPoolExecutor(16) as executor:
    results = executor.map(urllib.request.urlopen, image_links)

Wall time: 767 ms
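The four cells above repeat the same code with 4, 6, 8 and 16 workers, and the reported times level off after 6 workers (687 ms, 766 ms, 767 ms). The 4-worker cell also times the imports and `utils.build_link_list`, so its 1.21 s is not directly comparable with the others. A small loop can run such a sweep under identical conditions. The sketch below is one possible way to do it; `time_worker_counts` and `simulated_request` are hypothetical names, and the sleep stands in for a real request.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def time_worker_counts(task, items, worker_counts):
    """Return {worker_count: elapsed_seconds} for running task over items."""
    timings = {}
    for workers in worker_counts:
        start = time.perf_counter()
        with ThreadPoolExecutor(max_workers=workers) as executor:
            # Consume the results so the timing covers every task.
            list(executor.map(task, items))
        timings[workers] = time.perf_counter() - start
    return timings


def simulated_request(_):
    """Hypothetical stand-in for one download: wait 10 ms."""
    time.sleep(0.01)


for workers, elapsed in time_worker_counts(simulated_request, range(32), [4, 6, 8, 16]).items():
    print(f"{workers:>2} workers: {elapsed * 1000:.0f} ms")
```

With a real server the curve usually flattens once bandwidth or rate limits, rather than the number of threads, become the bottleneck.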


# 2. Resizing Images (Multiprocessing)

# Serial Code of John

In [38]:
%%time

image_path_list = os.listdir('images')

for image_path in image_path_list:
    utils.create_thumbnail((128, 128), os.path.join('images', image_path))

Wall time: 6.67 s


# Multiprocessing John's Task

In [2]:
%%time
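import os
import utils
import multiprocessing


NUM_OF_IMAGES = 1000
CLIENT_ID = utils.get_imgur_client_id()
IMAGES_DIR = utils.create_download_dir()

image_links = utils.build_link_list(CLIENT_ID, NUM_OF_IMAGES)

# Submit one resize task per entry to a pool of 4 worker processes.
# Note: apply_async returns immediately and the pool is never closed or joined,
# so the wall time reported for this cell does not include the resizing itself.
# Note: image_links holds URLs rather than the file names stored in 'images'.
pool = multiprocessing.Pool(4)
for image_link in image_links:
    pool.apply_async(utils.create_thumbnail, args=((128, 128), os.path.join('images', image_link)))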

too many requests, enough, or you can choose to put time.sleep() in here...
Wall time: 600 ms
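The cell above submits its tasks with `apply_async` and never waits for them, and it iterates over download links rather than over the files in `images`, so the 600 ms does not measure the resizing work. A minimal sketch of a pool-based version that mirrors John's serial loop and blocks until all work is done might look like this. `list_image_paths` and `resize_all` are hypothetical helper names, and `resize_fn` is assumed to take `(size, path)` like `utils.create_thumbnail`.

```python
import os
from multiprocessing import Pool


def list_image_paths(image_dir):
    """Return the full path of every regular file in image_dir, sorted by name."""
    return sorted(
        os.path.join(image_dir, name)
        for name in os.listdir(image_dir)
        if os.path.isfile(os.path.join(image_dir, name))
    )


def resize_all(resize_fn, size, image_paths, workers=4):
    """Run resize_fn(size, path) for every path in a pool of worker processes.

    Hypothetical helper: resize_fn must be picklable, for example a function
    defined at the top level of an importable module.
    """
    tasks = [(size, path) for path in image_paths]
    # starmap unpacks each (size, path) tuple into two arguments and blocks
    # until every task has finished, so a surrounding timer sees the real work.
    with Pool(workers) as pool:
        return pool.starmap(resize_fn, tasks)
```

In the notebook this might be called as `resize_all(utils.create_thumbnail, (128, 128), list_image_paths('images'))`. On platforms that start worker processes by spawning a fresh interpreter, the worker function needs to live in an importable module such as `utils` rather than in a notebook cell.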


In [10]:
import pandas as pd
data = [['Download images by John', '5.11 s'], ['Download images using multithreading', '1.21 s'],
        ['Resize images by John', '6.67 s'], ['Resize images using multiprocessing', '600 ms']]
df = pd.DataFrame(data, columns=['Method', 'Time'])
print(df)

                                 Method    Time
0               Download images by John  5.11 s
1  Download images using multithreading  1.21 s
2                 Resize images by John  6.67 s
3   Resize images using multiprocessing  600 ms
