### Homework 07: Concurrency

## Due Date: Apr 13, 2020, 08:00am

#### Firstname Lastname: Chengwei Chen

#### E-mail: cc6576@nyu.edu

#### Enter your solutions and submit this notebook


---

**Problem 1** **(60 Points)**

Let us consider the Gamma function, or the Euler integral of the second kind: 

$$\Gamma(x) = \int_{0} ^ \infty t ^{x - 1} e^{-t} dt, $$

and in this HW we consider real $x > 0$.

(Here is more on the Gamma function https://en.wikipedia.org/wiki/Gamma_function .
It is not needed for this HW assignment.) 

**1.1 (Points 15)**: 

Write a function (in the cell below) that sequentially calculates the given Gamma integral.


In [1]:
import numpy as np
from time import time

In [2]:
def calculate_gamma(x, bound_1, bound_2, number_of_steps):
    # Sequential version to calculate Gamma(x):
    # approximate the given integral by a discrete sum over
    # number_of_steps equidistant points on the interval [bound_1, bound_2].
    # Returns the approximation of Gamma(x).
    ts = time()
    gamma = 0
    for i in np.linspace(bound_1, bound_2, number_of_steps):
        gamma = gamma + ((bound_2-bound_1)/number_of_steps)*(i**(x-1))*np.exp(-i)
    print('Took {}s'.format(time() - ts))  # time() returns seconds
    return gamma

**1.2 (Points 5)** 

Evaluate $\Gamma(6)$ by using `calculate_gamma(x, bound_1, bound_2, number_of_steps)` and report the error of this computation.


As arguments, use `x=6, bound_1=0, bound_2=1000, number_of_steps=10_000_000`. We know that $\Gamma(n) = (n-1)!$ for positive integers $n$, so $\Gamma(6) = 5! = 120$. 


In [3]:
gamma_6 = calculate_gamma(x=6, bound_1=0, bound_2=1000, number_of_steps=10000000)

Took 39.265263080596924s


In [4]:
print(gamma_6)
print("error = ", 120-gamma_6)

119.99998799994694
error =  1.2000053061456128e-05
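The error of about $1.2 \times 10^{-5}$ is roughly $120 \times 10^{-7}$, which is consistent with the width used in the sum, `(bound_2-bound_1)/number_of_steps`, being slightly smaller than the actual spacing of the `np.linspace` points, `(bound_2-bound_1)/(number_of_steps-1)`. The following is a minimal vectorized sketch, not part of the graded solution, that uses the actual spacing. The name `calculate_gamma_vectorized` is hypothetical, the smaller grid is chosen only to keep the example fast, and the sketch assumes $x \geq 1$ so that the integrand is finite at $t = 0$.

```python
import math
import numpy as np

def calculate_gamma_vectorized(x, bound_1, bound_2, number_of_steps):
    # retstep=True also returns the true spacing between neighbouring points,
    # (bound_2 - bound_1) / (number_of_steps - 1).
    t, width = np.linspace(bound_1, bound_2, number_of_steps, retstep=True)
    # Evaluate the integrand t^(x-1) * e^(-t) on the whole grid at once.
    integrand = t ** (x - 1) * np.exp(-t)
    # Rectangle-rule sum: each sample contributes integrand * width.
    return float(np.sum(integrand) * width)

# Hypothetical, smaller problem size than in the assignment.
approx = calculate_gamma_vectorized(x=6, bound_1=0, bound_2=100, number_of_steps=100_001)
print(approx, math.gamma(6))
```

`math.gamma` from the standard library serves as the reference value here.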


---

Write two functions to calculate $\Gamma(x)$ by using:



**1.3.1 (Points 15)**
**threading** with N=4 threads; 

**1.3.2 (Points 15)**
**multiprocessing** with N=4 processes. 


**1.3.3 (Points 10)** 
Compare the times of the three versions and write a short explanation of what you are observing.

How does the answer change when N=8 and why?

    

In [5]:
x = 6
bound_1 = 0
bound_2 = 1000
number_of_steps = 10000000

In [6]:
# Thread version
from queue import Queue
from threading import Thread
from threading import Lock

In [7]:
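lock = Lock()
gamma = 0
num_threads = 4

def thread(q):
    # Worker: take chunks of sample points from the queue and add their
    # contributions to the shared global sum.
    global gamma
    while True:
        chunk = q.get()
        for i in chunk:
            # The lock protects the read-modify-write of the shared sum.
            # Taking it once per point is costly; a per-chunk partial sum
            # added under the lock once would reduce contention.
            with lock:
                gamma = gamma + ((bound_2-bound_1)/number_of_steps)*(i**(x-1))*np.exp(-i)
        q.task_done()

# Build the grid once and split it into one chunk per thread.
points = np.linspace(bound_1, bound_2, number_of_steps)
chunks = np.array_split(points, num_threads)

ts = time()
q = Queue()

for i in range(num_threads):
    # Daemon threads do not block interpreter exit while waiting on q.get().
    worker = Thread(target=thread, args=(q, ), daemon=True)
    worker.start()

for chunk in chunks:
    q.put(chunk)

q.join()
print(gamma, '-->', time()-ts, 's')  # time() returns seconds
print("error = ", 120-gamma)

119.99998799994694 --> 40.321210861206055 s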
error =  1.2000053061456128e-05


In [8]:
# Multiprocessing version
from multiprocessing.pool import Pool 
import functools

In [9]:
def multi_processes(x, bound_1, bound_2, number_of_steps, chunk):
    # Partial sum of the integrand over one chunk of sample points.
    gamma = 0
    for i in chunk:
        gamma = gamma + ((bound_2-bound_1)/number_of_steps)*(i**(x-1))*np.exp(-i)
    return gamma

# Build the grid once and split it into one chunk per process.
points = np.linspace(bound_1, bound_2, number_of_steps)
chunks = np.array_split(points, 4)

# Bind the fixed arguments so that Pool.map only has to supply the chunk.
multi_processes_gamma = functools.partial(multi_processes, x, bound_1, bound_2, number_of_steps)
ts = time()
with Pool(4) as p:
    results = p.map(multi_processes_gamma, chunks)

total = sum(results)
print(total, '-->', time()-ts, 's')  # time() returns seconds
print("error = ", 120-total)

119.99998799994694 --> 25.4341881275177 s
error =  1.2000053061456128e-05


------------

**Explanation:**

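Comparing the three versions (the printed values are in seconds, since `time()` returns seconds): the sequential version takes about 39.3 s, the threaded version about 40.3 s, and the multiprocessing version about 25.4 s. Multiprocessing is the fastest because each process has its own interpreter, so the chunks are evaluated in parallel on separate cores. Threading is slightly slower than the sequential version: the task is CPU bound, the GIL allows only one thread to execute Python bytecode at a time, and the per-point lock and thread switching add overhead.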

------------
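The per-point lock in the threaded version can be avoided by letting each thread compute a partial sum over its own chunk and combining the partial sums at the end. The following is a minimal sketch under stated assumptions, not the submitted solution: the names `partial_gamma` and `threaded_gamma` are hypothetical, the chunks are evaluated with vectorized NumPy calls instead of a Python loop, the width is the actual grid spacing returned by `np.linspace(..., retstep=True)`, and the grid is smaller than in the assignment. It removes the shared state but makes no claim about the resulting speed-up.

```python
from concurrent.futures import ThreadPoolExecutor
import numpy as np

def partial_gamma(chunk, x, width):
    # Partial sum over one chunk; no shared state, so no lock is needed.
    return float(np.sum(chunk ** (x - 1) * np.exp(-chunk)) * width)

def threaded_gamma(x, bound_1, bound_2, number_of_steps, num_threads=4):
    points, width = np.linspace(bound_1, bound_2, number_of_steps, retstep=True)
    chunks = np.array_split(points, num_threads)
    with ThreadPoolExecutor(max_workers=num_threads) as executor:
        futures = [executor.submit(partial_gamma, chunk, x, width) for chunk in chunks]
        # Combine the partial sums in the main thread.
        return sum(future.result() for future in futures)

# Hypothetical, smaller problem size than in the assignment.
result = threaded_gamma(x=6, bound_1=0, bound_2=100, number_of_steps=100_001)
print(result)
```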

In [10]:
## N = 8
# Thread version
lock = Lock()
gamma = 0
num_threads = 8

def thread(q):
    # Worker: take chunks of sample points from the queue and add their
    # contributions to the shared global sum.
    global gamma
    while True:
        chunk = q.get()
        for i in chunk:
            # The lock protects the read-modify-write of the shared sum.
            with lock:
                gamma = gamma + ((bound_2-bound_1)/number_of_steps)*(i**(x-1))*np.exp(-i)
        q.task_done()

# Build the grid once and split it into one chunk per thread.
points = np.linspace(bound_1, bound_2, number_of_steps)
chunks = np.array_split(points, num_threads)

ts = time()
q = Queue()

for i in range(num_threads):
    # Daemon threads do not block interpreter exit while waiting on q.get().
    worker = Thread(target=thread, args=(q, ), daemon=True)
    worker.start()

for chunk in chunks:
    q.put(chunk)

q.join()
print(gamma, '-->', time()-ts, 's')  # time() returns seconds
print("error = ", 120-gamma)

119.99998799994694 --> 42.29123306274414 s
error =  1.2000053061456128e-05


In [11]:
## N = 8
# Multiprocessing version
def multi_processes(x, bound_1, bound_2, number_of_steps, chunk):
    # Partial sum of the integrand over one chunk of sample points.
    gamma = 0
    for i in chunk:
        gamma = gamma + ((bound_2-bound_1)/number_of_steps)*(i**(x-1))*np.exp(-i)
    return gamma

# Build the grid once and split it into one chunk per process.
points = np.linspace(bound_1, bound_2, number_of_steps)
chunks = np.array_split(points, 8)

# Bind the fixed arguments so that Pool.map only has to supply the chunk.
multi_processes_gamma = functools.partial(multi_processes, x, bound_1, bound_2, number_of_steps)
ts = time()
with Pool(8) as p:
    results = p.map(multi_processes_gamma, chunks)

total = sum(results)
print(total, '-->', time()-ts, 's')  # time() returns seconds
print("error = ", 120-total)

119.99998799994694 --> 23.738280057907104 s
error =  1.2000053061456128e-05


------------

**Explanation:**

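With N=8, the threaded version becomes slightly slower (about 40.3 s to 42.3 s), while the multiprocessing version becomes slightly faster (about 25.4 s to 23.7 s). The task is CPU bound. Threads cannot speed it up because the GIL lets only one thread execute Python bytecode at a time, so additional threads only add switching and lock contention. Processes each have their own interpreter and GIL and run truly in parallel, which is why multiprocessing performs better. The gain from 4 to 8 processes is small, presumably because it is limited by the number of available CPU cores and by the overhead of starting processes and transferring the chunks.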


------------
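How much N=8 can help a CPU-bound job depends on how many cores the machine has, which this notebook does not record. A small sketch for checking this follows; `effective_workers` is a hypothetical helper, and capping the worker count at the core count is a rule of thumb for CPU-bound work rather than a requirement.

```python
import os

def effective_workers(requested):
    # os.cpu_count() may return None if the count cannot be determined.
    available = os.cpu_count() or 1
    # More CPU-bound workers than cores cannot all run at the same time.
    return min(requested, available)

print(os.cpu_count(), effective_workers(4), effective_workers(8))
```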

---

**Problem 2 (40 points)**

__Website uptime__ is the time that a website or web service is available to the users over a given period.

The task is to build an application that checks the uptime of websites. 

- The application should go over a list of website URLs and check whether those websites are up.
- Instead of performing a classic HTTP GET request, it performs a HEAD request so that it does not affect traffic significantly.
- If the HTTP status is in the danger ranges (400+, 500+), a message is printed. 

Here are some useful functions:

In [12]:
#### _website uptimer_ ####

import time
import logging
import requests
 
class WebsiteDownException(Exception):
    pass
 
def ping_website(address, timeout=20):
    """
    Check if a website is down. A website is considered down
    if the status_code is >= 400 or if the request fails or times out.

    Raise a WebsiteDownException if any of the website-down conditions are met.
    """
    try:
        response = requests.head(address, timeout=timeout)
        if response.status_code >= 400:
            logging.warning("Website %s returned status_code=%s", address, response.status_code)
            raise WebsiteDownException()
    except requests.exceptions.RequestException:
        # Covers timeouts as well as connection errors and other request failures.
        logging.warning("Request failed or timed out for website %s", address)
        raise WebsiteDownException()
         
def check_website(address):
    """
    Utility function: check if a website is down, if so, notify the user
    """
    try:
        ping_website(address)
    except WebsiteDownException:
        print('The website ' + address + ' is down')

---

You need a website list to try our system out. Create your own list or use the following one. 

---

In [13]:
WEBSITE_LIST = [
    'http://amazon.co.uk',
    'http://amazon.com',
    'http://facebook.com',
    'http://google.com',
    'http://google.fr',
    'http://google.es',
    'http://google.co.uk',
    'http://gmail.com',
    'http://stackoverflow.com',
    'http://github.com',
    'http://heroku.com',
    'http://really-cool-available-domain.com',
    'http://djangoproject.com',
    'http://rubyonrails.org',
    'http://basecamp.com',
    'http://trello.com',
    'http://shopify.com',
    'http://another-really-interesting-domain.co',
    'http://airbnb.com',
    'http://instagram.com',
    'http://snapchat.com',
    'http://youtube.com',
    'http://baidu.com',
    'http://yahoo.com',
    'http://live.com',
    'http://linkedin.com',
    'http://netflix.com',
    'http://wordpress.com',
    'http://bing.com',
]

---

A serial version of the _website uptimer_ can be written as: 

---


In [14]:
import time
 
start_time = time.time()
 
for address in WEBSITE_LIST:
    check_website(address)
         
end_time = time.time()        
 
print("Time for Serial: %ssecs" % (end_time - start_time))



The website http://netflix.com is down
Time for Serial: 3.8986947536468506secs


You should build two versions of the **website uptimer**, by using:

**2.1 (Points 15)**
**threading** with N=4 threads; 

**2.2 (Points 15)**
**multiprocessing** with N=4 processes. 


**2.3 (Points 10)** 

Compare the times of the three versions and write a short explanation of what you are observing.

How does the answer change when N=8 and why?


In [15]:
from time import time
# Thread version
def thread_web(q):
    # Worker: take addresses from the queue and check them one at a time.
    while True:
        address = q.get()
        check_website(address)
        q.task_done()

ts = time()
q = Queue()
num_threads = 4

for i in range(num_threads):
    # Daemon threads do not block interpreter exit while waiting on q.get().
    worker = Thread(target=thread_web, args=(q, ), daemon=True)
    worker.start()

for web in WEBSITE_LIST:
    q.put(web)

q.join()

print("Time for threading (N=%s): %ssecs" % (num_threads, time() - ts))



The website http://netflix.com is down
Time for threading (N=4): 1.5670547485351562secs


In [16]:
from time import time
# Multiprocessing version
ts = time()
with Pool(4) as p:
    results = p.map(check_website, WEBSITE_LIST)

print("Time for multiprocessing (N=4): %ssecs" % (time() - ts))



The website http://netflix.com is down
Time for multiprocessing (N=4): 0.967993974685669secs


------------

**Explanation:**

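Comparing the three versions: the serial version takes about 3.90 s, the threaded version about 1.57 s, and the multiprocessing version about 0.97 s. Both concurrent versions are faster than the serial one because the task is I/O bound: while one worker waits for a response, the others can issue their own requests. With N=4, the multiprocessing version is the fastest.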

------------
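The effect of overlapping waits can be reproduced without any network access. The following is a minimal sketch in which `sleep` stands in for waiting on a response; `fake_request`, `run_serial`, and `run_threaded` are hypothetical names, and the delays are made-up values rather than measured latencies.

```python
from time import perf_counter, sleep
from concurrent.futures import ThreadPoolExecutor

def fake_request(delay):
    # sleep stands in for waiting on a network response;
    # like blocking I/O, it releases the GIL while waiting.
    sleep(delay)
    return delay

def run_serial(delays):
    start = perf_counter()
    for delay in delays:
        fake_request(delay)
    return perf_counter() - start

def run_threaded(delays, num_threads):
    start = perf_counter()
    with ThreadPoolExecutor(max_workers=num_threads) as executor:
        # list() forces all results to be collected before timing stops.
        list(executor.map(fake_request, delays))
    return perf_counter() - start

delays = [0.05] * 8  # eight simulated requests of 50 ms each
serial_time = run_serial(delays)
threaded_time = run_threaded(delays, num_threads=4)
print("serial: %.2fs, threaded: %.2fs" % (serial_time, threaded_time))
```

The serial run waits for each simulated request in turn, while the threaded run lets up to four waits overlap, so it should finish in a fraction of the serial time.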

In [17]:
## N = 8
# Thread version
from time import time
def thread_web(q):
    # Worker: take addresses from the queue and check them one at a time.
    while True:
        address = q.get()
        check_website(address)
        q.task_done()

ts = time()
q = Queue()
num_threads = 8

for i in range(num_threads):
    # Daemon threads do not block interpreter exit while waiting on q.get().
    worker = Thread(target=thread_web, args=(q, ), daemon=True)
    worker.start()

for web in WEBSITE_LIST:
    q.put(web)

q.join()

print("Time for threading (N=%s): %ssecs" % (num_threads, time() - ts))



The website http://netflix.com is down
Time for threading (N=8): 0.7486000061035156secs


In [18]:
## N = 8
# Multiprocessing version
from time import time

ts = time()
with Pool(8) as p:
    results = p.map(check_website, WEBSITE_LIST)

print("Time for multiprocessing (N=8): %ssecs" % (time() - ts))



The website http://netflix.com is down
Time for multiprocessing (N=8): 0.9068851470947266secs


------------

**Explanation:**

With N=8, both versions improve, but threading improves much more (1.567 s to 0.749 s) than multiprocessing (0.968 s to 0.907 s), and threading is now the fastest.

The GIL allows only one thread to execute Python bytecode at a time, but it is released while a thread waits for the network, which is where most of the time is spent here. More threads therefore mean more requests in flight at once. Threads are also cheaper than processes: they share the same memory space and avoid the cost of starting separate interpreters. This is why threading provides the larger speed-up for this task.

------------
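As an alternative to managing a `Queue` and daemon threads by hand, the standard library's `concurrent.futures.ThreadPoolExecutor` can run the checks and return the results. The following is a minimal sketch, not the submitted solution: `find_down_sites`, `is_up`, and `fake_is_up` are hypothetical names, and the checker is a stand-in that performs no real HEAD request so that the example runs offline.

```python
from concurrent.futures import ThreadPoolExecutor

def find_down_sites(addresses, is_up, num_threads=8):
    # is_up is any callable that takes an address and returns True or False.
    with ThreadPoolExecutor(max_workers=num_threads) as executor:
        # executor.map preserves the order of the input addresses.
        statuses = list(executor.map(is_up, addresses))
    return [address for address, up in zip(addresses, statuses) if not up]

def fake_is_up(address):
    # Stand-in for a real HEAD request, so the example runs offline.
    return "down" not in address

# Made-up addresses for illustration only.
sites = ["http://up.example", "http://down.example", "http://also-up.example"]
print(find_down_sites(sites, fake_is_up))
```

Returning the list of failing addresses, instead of printing from inside each worker, keeps the output in one place and makes the result easy to test.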