### Homework 07: Concurrency

## Due Date: Apr 5, 2023, 11:59pm

#### Firstname Lastname: Seonhye Yang

#### E-mail: sy3420@nyu.edu

#### Enter your solutions and submit this notebook


---

**Problem 1** **(60 Points)**

Let us consider the Gamma function, or the Euler integral of the second kind: 

$$\Gamma(x) = \int_{0} ^ \infty t ^{x - 1} e^{-t} dt, $$

and in this HW we consider real $x > 0$.

(Here is more on the Gamma function https://en.wikipedia.org/wiki/Gamma_function .
It is not needed for this HW assignment.) 

**1.1 (Points 15)**: 

Write a function (in the cell below) that sequentially calculates the given Gamma integral.


In [50]:
from math import exp
import numpy as np
import pandas as pd
from multiprocessing import cpu_count
from queue import Queue
from threading import Thread
from multiprocessing.pool import Pool
import logging
import time
import math
from threading import Lock

In [51]:
def calculate_gamma(x, bound_1, bound_2, number_of_steps):
    # sequential version to calculate Gamma(x):
    # where we approximate the given integral,
    # like this a discrete sum in number_of_steps
    # equidistant points on the interval [bound_1, bound_2]
    
    bounds = (bound_2 - bound_1)/number_of_steps
    gamma = 0
    
    for i in range(number_of_steps):
        t_i = bound_1 + (i + 0.5) * bounds
        gamma = gamma + (t_i ** (x - 1)) * exp(-t_i) * bounds
        
    
    # return Gamma(x)
    return gamma

**1.2 (Points 5)** 

Evaluate, $\Gamma(6)$ by using `calculate_gamma(x, bound_1, bound_2, number_of_steps)` and the error of this computation.


As arguments, use `x=6, bound_1=0, bound_2=1000, number_of_steps=10_000_000`. We know that $\Gamma(x) = x!$, so $\Gamma(6) = 5! = 120$. 


In [52]:
gamma_function = calculate_gamma(6, bound_1 = 0, bound_2 = 1000, number_of_steps = 10000000)

print(gamma_function)

119.9999999999461


In [67]:
#calculating the error
print('error for this computation is', gamma_function - math.factorial(5))


error for this computation is -5.39017719347612e-11


---

Write two functions to calculate $\Gamma(x)$ by using:



**1.3.1 (Points 15)**
**threading** with N=4 threads; 

**1.3.2 (Points 15)**
**multiprocessing** with N=4 processes. 


**1.3.3 (Points 10)** 
Compare the times of the three versions and write a short explanation of what you are observing.

How does the answer change when N=8 and why?

    

**1.3.1**

In [54]:
step = 1000/10000000
bound = np.arange(0,1000+step,step)
intervals = []
for i in range(0,len(bound)-1000,1000):
    results = bound[i:i+1000+1]
    intervals.append(results)

In [62]:
y = 0
def gamma_threading(x, a, step):
    global y
    while True:
        a = q.get()
        if a is None:
            q.task_done()
            break
        for i in range(len(a)-1):
            res1 = (a[i]**(x-1))*math.exp(-a[i])
            res2 = (a[i+1]**(x-1))*math.exp(-a[i+1])

            height = res1 if res2 >= res1 else res2
            with Lock():
                y += height*step
        q.task_done()

In [63]:
start_time = time.time()
q=Queue()
num_threads=4
for i in range(num_threads):
    worker = Thread(target=gamma_threading,args=(6,q,step))
    worker.setDaemon(True)
    worker.start()

for a in intervals:
    q.put(a)

q.join()
end_time = time.time()

print(f'N threads = 4 result is ... {y}')
print(f'Time taken is {end_time - start_time} seconds')


N threads = 4 result is ... 119.71997895833506
Time taken is 19.72249698638916 seconds


In [65]:
start_time = time.time()
q=Queue()
num_threads=8
for i in range(num_threads):
    worker = Thread(target=gamma_threading,args=(6,q,step))
    worker.setDaemon(True)
    worker.start()

for a in intervals:
    q.put(a)

q.join()
end_time = time.time()

print(f'N threads = 4 result is ... {y}')
print(f'Time taken is {end_time - start_time} seconds')


N threads = 4 result is ... 118.2323173190342
Time taken is 19.934449911117554 seconds


**1.3.2**

In [19]:
def gamma_multi(x,a):
    j=0
    for i in range(len(a)-1):
        res1 = (a[i]**(x-1))*math.exp(-a[i])
        res2 = (a[i+1]**(x-1))*math.exp(-a[i+1])

        height = res1 if res2 >= res1 else res2
        j+=height*step
    return j


In [20]:
step = 1000/10000000
bound = np.arange(0,1000+step,step)
intervals = []
for i in range(0,len(bound)-1000,1000):
    results = bound[i:i+1000+1]
    intervals.append(results)

In [21]:
start_time = time.time()

thread_n = 4
lis = []
for a in intervals:
    lis.append((6, a))

with Pool(thread_n) as p:
    results = p.starmap(gamma_multi,lis)


end_time = time.time()

print(f'N threads = {thread_n} result is ... {np.sum(results)}')
print(f'Time taken is {end_time - start_time} seconds')

N threads = 4 result is ... 119.9978943915628
Finished in 4.431375026702881 seconds


In [23]:
start_time = time.time()

thread_n = 8
lis = []
for a in intervals:
    lis.append((6, a))

with Pool(thread_n) as p:
    results = p.starmap(gamma_multi,lis)
    
end_time = time.time()

print(f'N threads = {thread_n} result is ... {np.sum(results)}')
print(f'Time taken is {end_time - start_time} seconds')

N threads = 8 result is ... 119.9978943915628
Finished in 3.314188003540039 seconds


**1.3.3** 

What I am noticing here is that multiprocess takes a shorter amount of time than that of threading. Threading takes about 19 seconds on average and multiprocessing takes about 4 seconds. 

As we increase n to 8, the timing decreases a little bit. This depends on various factors. When N is increased to 8 beyound the number of avaiilable CPU cores, it might deteriorate. However, if the workload is sufficiently parallelizable and the number of available CPU cores is not entirely utilized, increased N can lead to improvements in performance.

---

**Problem 2 (40 points)**

__Website uptime__ is the time that a website or web service is available to the users over a given period.

The task is to build an application that checks the uptime of websites. 

- The application should go over a list of website URLs and checks if those websites are up.
- Instead of performing a classic HTTP GET request, it performs a HEAD request so that it does not affect traffic significantly.
- If the HTTP status is in the danger ranges (400+, 500+), a message is casted. 

Here are some useful functions:

In [24]:
#### _website uptimer_ ####

import time
import logging
import requests
 
class WebsiteDownException(Exception):
    pass
 
def ping_website(address, timeout=20):
    """
    Check if a website is down. A website is considered down 
    if either the status_code >= 400 or if the timeout expires
     
    Throw a WebsiteDownException if any of the website down conditions are met
    """
    try:
        response = requests.head(address, timeout=timeout)
        if response.status_code >= 400:
            logging.warning("Website %s returned status_code=%s" % (address, response.status_code))
            raise WebsiteDownException()
    except requests.exceptions.RequestException:
        logging.warning("Timeout expired for website %s" % address)
        raise WebsiteDownException()
         
def check_website(address):
    """
    Utility function: check if a website is down, if so, notify the user
    """
    try:
        ping_website(address)
    except WebsiteDownException:
        print('The websie ' + address + ' is down')

---

You need a website list to try our system out. Create your own list or use the following one. 

---

In [25]:
WEBSITE_LIST = [
    'http://amazon.co.uk',
    'http://amazon.com',
    'http://facebook.com',
    'http://google.com',
    'http://google.fr',
    'http://google.es',
    'http://google.co.uk',
    'http://gmail.com',
    'http://stackoverflow.com',
    'http://github.com',
    'http://heroku.com',
    'http://really-cool-available-domain.com',
    'http://djangoproject.com',
    'http://rubyonrails.org',
    'http://basecamp.com',
    'http://trello.com',
    'http://shopify.com',
    'http://another-really-interesting-domain.co',
    'http://airbnb.com',
    'http://instagram.com',
    'http://snapchat.com',
    'http://youtube.com',
    'http://baidu.com',
    'http://yahoo.com',
    'http://live.com',
    'http://linkedin.com',
    'http://netflix.com',
    'http://wordpress.com',
    'http://bing.com',
]

---

A serial version of the _website uptimer_ can be written as: 

---


In [26]:
import time
 
start_time = time.time()
 
for address in WEBSITE_LIST:
    check_website(address)
         
end_time = time.time()        
 
print("Time for Serial: %ssecs" % (end_time - start_time))



The websie http://really-cool-available-domain.com is down




The websie http://another-really-interesting-domain.co is down
Time for Serial: 3.563690185546875secs


You should build two versions of the **website uptimer**, by using:

**2.1 (Points 15)**
**threading** with N=4 threads; 

**2.2 (Points 15)**
**multiprocessing** with N=4 processes. 


**2.3 (Points 10)** 

Compare the times of the three versions and write a short explanation of what you are observing.

How does the answer change when N=8 and why?


**2.1**

In [68]:
def website_threading(link):
    while True:
        link = q.get()
        try:
            ping_website(link)
        except WebsiteDownException:
            for link in WEBSITE_LIST:
                check_website(link)
        q.task_done()

In [69]:
start_time = time.time()
q = Queue()
thread_n = 4
for i in range(thread_n):
    worker = Thread(target=website_threading, args=(q,))
    worker.setDaemon(True)
    worker.start()
    
for link in WEBSITE_LIST:
    q.put(link)

q.join()
end_time = time.time()
print(f'Time for Threading{thread_n}', (end_time - start_time))



The websie http://really-cool-available-domain.com is down
The websie http://really-cool-available-domain.com is down




The websie http://another-really-interesting-domain.co is down
The websie http://another-really-interesting-domain.co is down
Time for Threading4 2.8732500076293945


In [70]:
start_time = time.time()
q = Queue()
thread_n = 8
for i in range(thread_n):
    worker = Thread(target=website_threading, args=(q, ))
    worker.setDaemon(True)
    worker.start()
    
for link in WEBSITE_LIST:
    q.put(link)

q.join()
end_time = time.time()
print(f'Time for Threading{thread_n}', (end_time - start_time))



The websie http://really-cool-available-domain.com is down
The websie http://really-cool-available-domain.com is down




The websie http://another-really-interesting-domain.co is down
The websie http://another-really-interesting-domain.co is down




The websie http://really-cool-available-domain.com is down




The websie http://another-really-interesting-domain.co is down
The websie http://baidu.com is down
Time for Threading8 2.852005958557129


**2.2**

In [75]:
def check_websites(thread_n, addresses):
    with Pool(processes=thread_n) as pool:
        pool.map(check_website, addresses)

In [76]:
start_time = time.time()

thread_n = 4

check_websites(thread_n, WEBSITE_LIST)
         
end_time = time.time()        
 
print(f'Time for Pool{thread_n}', (end_time - start_time))



The websie http://really-cool-available-domain.com is down




The websie http://another-really-interesting-domain.co is down
Time for Pool4 1.0622408390045166


In [77]:
start_time = time.time()

thread_n = 8

check_websites(thread_n, WEBSITE_LIST)
         
end_time = time.time()        

print(f'Time for Pool{thread_n}', (end_time - start_time))



The websie http://really-cool-available-domain.com is down




The websie http://another-really-interesting-domain.co is down
Time for Pool8 0.8827729225158691


**2.3** 


What I am noticing here is that multiprocess takes a shorter amount of time than that of threading. Threading takes about 3 seconds on average and multiprocessing takes about 1 seconds. 

As we increase n to 8, the timing decreases a little bit. This depends on vvarious factors. When N is increased to 8 beyound the number of avaiilable CPU cores, it might deteriorate. However, if the workload is sufficiently parallelizable and the number of available CPU cores is not entirely utilized, increased N can lead to improvements in performance.