# Assignment 6

The objective of this assignment is to compute the confidence interval of some performance measures of a server.

## Characteristics
The server is characterized by:
- A **hyper-exponential** distribution of the **interarrival time** with two stages ($\lambda_{1} = 0.1$, $\lambda_{2} = 0.05$, $p_{1} = 0.5$)
- A **hypo-exponential** distribution of the **service time** with two stages ($\lambda_{1} = 0.1$, $\lambda_{2} = 0.5$)
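
As a sanity check on these parameters, the mean interarrival and service times can be worked out by hand. This is a sketch based on the standard mean formulas for the two distributions, assuming $p_{2} = 1 - p_{1}$ (not stated explicitly above). The symbols $E[A]$, $E[S]$ and $\rho$ are introduced here only for illustration, and $\lambda_{1}$, $\lambda_{2}$ refer to the rates of the distribution being considered in each line.

$$E[A] = \frac{p_{1}}{\lambda_{1}} + \frac{1 - p_{1}}{\lambda_{2}} = \frac{0.5}{0.1} + \frac{0.5}{0.05} = 15$$

$$E[S] = \frac{1}{\lambda_{1}} + \frac{1}{\lambda_{2}} = \frac{1}{0.1} + \frac{1}{0.5} = 12$$

$$\rho = \frac{E[S]}{E[A]} = \frac{12}{15} = 0.8$$

Since $\rho < 1$, the server should be able to keep up with the arrivals, with an expected utilization of about 0.8 and a throughput of about $1/15 \approx 0.067$ jobs per time unit.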

The assignment asks us to plot the arrivals and the completions, and to compute a confidence interval for several performance indices.

The confidence level considered is *95%*, and the intervals must be computed for:
- The average response time (**R**)
- The average number of jobs (**N**)
- The utilization (**U**)
- The throughput (**X**)

The number of jobs to generate is 10000, divided into 50 runs of 200 jobs each.

## Initialization
First of all, we need to import the necessary libraries, tell IPython how to display the generated plots, and initialize the random number generator.

In [1]:
%matplotlib tk

import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as sps

rng = np.random.default_rng(seed = 0xdeadbeef)
samples = 10000
runs = 50
ci = 0.95

## Generation
We are now ready to generate the inter arrival samples and the service time samples.

In [2]:
## inter arrival time samples
data1 = rng.random(size=samples)
data2 = rng.random(size=samples)
probabilities = rng.random(size=samples)
inter_time_samples = []
for i in range(0,samples):
    if probabilities[i] < 0.5:
        inter_time_samples.append(-np.log(data1[i])/0.1)
    else:
        inter_time_samples.append(-np.log(data2[i])/0.05)

## service time samples
data1 = rng.random(size=samples)
data2 = rng.random(size=samples)
service_times = - np.log(data1)/0.1 - np.log(data2)/0.5
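
The cell above draws the samples by inverse-transform sampling. As a cross-check, the same two distributions can be sketched with NumPy's built-in exponential generator. This is an alternative formulation, not the notebook's method; the seed, the sample size `n`, and the variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(seed=12345)  # hypothetical seed, not the notebook's
n = 100_000                              # hypothetical sample size for the check

# Hyper-exponential: pick a branch per sample, then draw from that branch.
# Generator.exponential takes the scale (the mean), i.e. 1 / rate.
use_first = rng.random(n) < 0.5
inter = np.where(use_first,
                 rng.exponential(scale=1 / 0.1, size=n),
                 rng.exponential(scale=1 / 0.05, size=n))

# Hypo-exponential: sum of two independent exponential stages.
service = rng.exponential(1 / 0.1, n) + rng.exponential(1 / 0.5, n)

print(inter.mean(), service.mean())
```

The two printed sample means should land close to 15 and 12, the theoretical means of the interarrival and service times.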

## Arrivals and completions curves
Next we are asked to plot the arrivals and the completions.

In [3]:
arrival_time_samples = np.cumsum(inter_time_samples)

#completion_time_samples = np.add(arrival_time_samples, service_times)
# This formula would be correct if we had the response times, not the service times.

completion_time_samples = [arrival_time_samples[0]+service_times[0]]
for i in range(1, samples):
    if(arrival_time_samples[i]>completion_time_samples[i-1]):
        completion_time_samples.append(arrival_time_samples[i]+service_times[i])
    else:
        completion_time_samples.append(completion_time_samples[i-1]+service_times[i])
completion_time_samples = np.array(completion_time_samples)
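
The loop above can be read as a single rule: a job starts when it arrives or when the previous job completes, whichever is later. A minimal sketch of that rule on a toy input follows; the function name and the three-job example are hypothetical, and a single FIFO server with an unbounded queue and non-negative arrival times is assumed.

```python
def completion_times(arrivals, services):
    """Completion times for a single FIFO server with an unbounded queue."""
    completions = []
    previous = 0.0  # assumption: the server is idle before the first arrival
    for arrival, service in zip(arrivals, services):
        start = max(arrival, previous)  # wait if the server is still busy
        previous = start + service
        completions.append(previous)
    return completions

# Hypothetical toy input: the third job arrives while the second is in service.
print(completion_times([1.0, 5.0, 6.0], [2.0, 3.0, 1.0]))  # [3.0, 8.0, 9.0]
```

The third job arrives at time 6 but has to wait until time 8, so it completes at time 9 rather than 7.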

The resulting plots are the following:

In [4]:
y = [*range(0,samples)]
plt.step(arrival_time_samples, y, label='arrivals')
plt.step(completion_time_samples, y, label='completions')
# There is no need to sort the completion time samples
# since they are already sorted (you can check this by doing
# print(all(completion_time_samples == sorted(completion_time_samples))))
plt.legend()


## Confidence intervals
The final requirement is to compute the 95% confidence intervals for the performance measures listed above.

### Service time
To compute the service time confidence interval, we first need to compute the average and the variance of the service times sample set.

In [5]:
avg_service_time = np.sum(service_times) / samples
var_service_time = np.sum(np.power(service_times - avg_service_time,2))/(samples-1)

We can then compute the requested quantile of a Student's t distribution whose number of degrees of freedom is the number of samples minus one.

In [6]:
dist = sps.t(samples - 1)
c_gamma = dist.ppf((1+ci)/2)

In [7]:
lowr = avg_service_time - c_gamma * np.sqrt(np.divide(var_service_time, samples))
uppr = avg_service_time + c_gamma * np.sqrt(np.divide(var_service_time, samples))
ci_service = (lowr, uppr)
print("With 95% probability, the average service time of the system will be between {:.3f} and {:.3f}".format(ci_service[0], ci_service[1]))

With 95% probability, the average service time of the system will be between 11.757 and 12.154


Theoretically, if we repeat the experiment many times and build an interval each time, __about 95% of those intervals should contain the true mean of the distribution__.
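
This coverage property can be checked empirically. The sketch below repeats a smaller version of the service-time experiment many times and counts how often the interval contains the theoretical mean $1/0.1 + 1/0.5 = 12$. The seed, the number of experiments, and the per-experiment sample size are hypothetical choices made to keep the run short.

```python
import numpy as np
import scipy.stats as sps

rng = np.random.default_rng(seed=2024)   # hypothetical seed
experiments, n = 2000, 500               # hypothetical sizes, chosen to run quickly
level = 0.95
true_mean = 1 / 0.1 + 1 / 0.5            # mean of the hypo-exponential service time

# One row per repeated experiment, each with n service-time samples.
s = (rng.exponential(1 / 0.1, (experiments, n))
     + rng.exponential(1 / 0.5, (experiments, n)))

means = s.mean(axis=1)
half = sps.t.ppf((1 + level) / 2, df=n - 1) * s.std(axis=1, ddof=1) / np.sqrt(n)

# Fraction of experiments whose interval contains the true mean.
covered = (means - half <= true_mean) & (true_mean <= means + half)
coverage = covered.mean()
print(f"empirical coverage: {coverage:.3f}")
```

The printed fraction should be close to 0.95, with some fluctuation due to the finite number of repetitions.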

Since the number of samples considered is much greater than 30, instead of taking the quantile from the Student's t distribution, we can approximate it with the quantile of a standard normal distribution.
For a two-sided 95% interval we need the 97.5% quantile of the standard normal distribution, which is approximately 1.96.

In [8]:
d_gamma = 1.96
# Already at this point, comparing d_gamma with c_gamma shows that
# the two values are really close.

lowr = avg_service_time - d_gamma * np.sqrt(np.divide(var_service_time, samples))
uppr = avg_service_time + d_gamma * np.sqrt(np.divide(var_service_time, samples))
ci_service_norm = (lowr, uppr)
print("With 95% probability, the average service time of the system will be between {:.3f} and {:.3f}".format(ci_service_norm[0], ci_service_norm[1]))

With 95% probability, the average service time of the system will be between 11.757 and 12.154
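
To see how quickly the Student's t quantile approaches the normal one, the two can be compared for a few degrees of freedom. This is a small sketch; the list of degrees of freedom is an arbitrary choice for illustration.

```python
import scipy.stats as sps

level = 0.95
q = (1 + level) / 2  # 0.975: a two-sided interval leaves 2.5% in each tail

z = sps.norm.ppf(q)  # standard normal quantile
for dof in (9, 29, 49, 9999):
    t = sps.t.ppf(q, df=dof)
    print(f"dof={dof:>5}  t={t:.4f}  z={z:.4f}  relative gap={(t - z) / z:.2%}")
```

The gap shrinks as the degrees of freedom grow, which is why the two service-time intervals above coincide at the displayed precision.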


### Response time
To compute the response time confidence interval, we first need to compute the response times samples, their average and their variance.

In [9]:
response_times = np.subtract(completion_time_samples, arrival_time_samples)
avg_response_time = np.sum(response_times) / samples
var_response_time = np.sum(np.power(response_times - avg_response_time,2))/(samples-1)

For the same reasons as above, we can take the quantile from a standard normal distribution (instead of a Student's t distribution).

In [10]:
lowr = avg_response_time - d_gamma * np.sqrt(np.divide(var_response_time, samples))
uppr = avg_response_time + d_gamma * np.sqrt(np.divide(var_response_time, samples))
ci_response = (lowr, uppr)
print("With 95% probability, the average response time of the system will be between {:.3f} and {:.3f}".format(ci_response[0], ci_response[1]))

With 95% probability, the average response time of the system will be between 58.161 and 60.660


### Utilization
Since utilization is a ratio computed over a whole observation period rather than an average of per-job samples, we cannot directly apply the method shown for the service time.

We can instead split the sample dataset into multiple runs (the number of runs is given in the assignment), compute the utilization for each run and, given this array of utilizations, compute its confidence interval.

Let's start by splitting the service times, the arrivals and the completions into runs.

In [11]:
service_times_runs = np.split(service_times, runs)
arrival_runs = np.split(arrival_time_samples, runs)
completion_runs = np.split(completion_time_samples, runs)

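Then, for each run, we can compute the busy time $B_{i}$ by summing up the service times of its jobs, and the run duration $T_{i}$ as the time between its first arrival and its last completion.
From there, the utilization is given by the formula $$U_{i} = \frac{B_{i}}{T_{i}}$$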

In [12]:
busy_times_runs = np.sum(service_times_runs, 1)

# Duration of each run: from its first arrival to its last completion
timespans = np.max(completion_runs, axis=1) - np.min(arrival_runs, axis=1)

utilization_runs = busy_times_runs / timespans

Much like the service time case, we can now compute the confidence interval we need. 

In [13]:
avg_utilization = np.sum(utilization_runs) / runs
var_utilization = np.sum(np.power(utilization_runs - avg_utilization,2))/(runs-1)

In [14]:
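# 50 degrees of freedom are used here; the textbook choice is runs - 1 = 49,
# which changes the quantile only slightly (about 2.010 instead of 2.009).
dist = sps.t(50)
c_gamma_runs = dist.ppf((1+ci)/2)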

In [15]:
lowr = avg_utilization - c_gamma_runs * np.sqrt(np.divide(var_utilization, runs))
uppr = avg_utilization + c_gamma_runs * np.sqrt(np.divide(var_utilization, runs))
ci_utilization = (lowr, uppr)
print("With 95% probability, the average utilization of the system will be between {:.3f} and {:.3f}".format(ci_utilization[0], ci_utilization[1]))

With 95% probability, the average utilization of the system will be between 0.759 and 0.805
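
The same three steps (mean and variance across runs, Student's t quantile, interval bounds) recur for every run-based index, so they can be wrapped in one helper. The sketch below is one possible way to do that; the function name `runs_ci` and the five sample values are hypothetical, and it uses the number of runs minus one as degrees of freedom.

```python
import numpy as np
import scipy.stats as sps

def runs_ci(per_run_values, level=0.95):
    """Student's t interval for the mean of one value per run."""
    v = np.asarray(per_run_values, dtype=float)
    k = v.size
    quantile = sps.t.ppf((1 + level) / 2, df=k - 1)  # k - 1 degrees of freedom
    half_width = quantile * v.std(ddof=1) / np.sqrt(k)
    return v.mean() - half_width, v.mean() + half_width

# Hypothetical per-run utilizations, only to show the call.
low, high = runs_ci([0.78, 0.81, 0.75, 0.80, 0.79])
print(f"{low:.3f} .. {high:.3f}")
```

Passing an array with one utilization, job count or throughput value per run would give the corresponding interval with a single call.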


### Number of jobs
The computation of the confidence interval for the number of jobs is similar to that of the utilization, except that we start from the response times, whose sum over a run gives $W_{i}$, and we use the formula $$N_{i} = \frac{W_{i}}{T_{i}}$$

Let's start with the division of the response times in multiple runs.

In [16]:
response_times_runs = np.split(response_times, runs)

Next, we can compute the average number of jobs for each run.

In [17]:
w_runs = np.sum(response_times_runs, 1)
numberj_runs = w_runs / timespans

Finally, given the formula shown above and the values obtained in the previous step, we can compute the confidence interval for the average number of jobs in the system.

In [18]:
avg_numberj = np.sum(numberj_runs) / runs
var_numberj = np.sum(np.power(numberj_runs - avg_numberj,2))/(runs-1)

In [19]:
lowr = avg_numberj - c_gamma_runs * np.sqrt(np.divide(var_numberj, runs))
uppr = avg_numberj + c_gamma_runs * np.sqrt(np.divide(var_numberj, runs))
ci_numberj = (lowr, uppr)

print("With 95% probability, the average number of jobs in the system will be between {:.3f} and {:.3f}".format(ci_numberj[0], ci_numberj[1]))

With 95% probability, the average number of jobs in the system will be between 3.205 and 4.718
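
Within a single run, the formula $N_{i} = W_{i}/T_{i}$ is Little's law in disguise: dividing and multiplying by the number of jobs in the run gives throughput times average response time. A toy check follows; the four response times and the duration `T` are hypothetical.

```python
import numpy as np

# Hypothetical toy run: response times of 4 jobs observed over T = 20 time units.
response = np.array([3.0, 5.0, 2.0, 6.0])
T = 20.0

W = response.sum()      # total time spent in the system by all jobs
N = W / T               # average number of jobs, as in the formula above
X = response.size / T   # throughput of the run
R = response.mean()     # average response time of the run

print(N, X * R)  # both equal 0.8
```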


### Throughput
The computation of the confidence interval for the throughput is similar to that of the utilization, except that we start from the number of jobs completed in each run, $C = 200$, and we use the formula $$X_{i} = \frac{C}{T_{i}}$$

Since we don't need any additional performance index, we can start with the computation of the average throughput for each run.

In [20]:
x_runs = (samples/runs) / timespans

We are now able to compute the confidence interval for the throughput.

In [21]:
avg_throughput = np.sum(x_runs)/runs
var_throughput = np.sum(np.power(x_runs - avg_throughput, 2))/(runs-1)

In [22]:
lowr = avg_throughput - c_gamma_runs * np.sqrt(np.divide(var_throughput, runs))
uppr = avg_throughput + c_gamma_runs * np.sqrt(np.divide(var_throughput, runs))
ci_throughput = (lowr, uppr)
print("With 95% probability, the average throughput of the system will be between {:.3f} and {:.3f}".format(ci_throughput[0], ci_throughput[1]))

With 95% probability, the average throughput of the system will be between 0.064 and 0.067


## Summing up
The confidence intervals we obtained are:

In [23]:
print("{!s: ^10}{!s:<15}{!s:<15}\n".format("Index","lower","upper"))
print("{!s: ^10}{:<15.4f}{:<15.4f}".format("S", ci_service[0], ci_service[1]))
print("{!s: ^10}{:<15.4f}{:<15.4f}".format("R", ci_response[0], ci_response[1]))
print("{!s: ^10}{:<15.4f}{:<15.4f}".format("N", ci_numberj[0], ci_numberj[1]))
print("{!s: ^10}{:<15.4f}{:<15.4f}".format("U", ci_utilization[0], ci_utilization[1]))
print("{!s: ^10}{:<15.4f}{:<15.4f}".format("X", ci_throughput[0], ci_throughput[1]))

  Index   lower          upper          

    S     11.7567        12.1535        
    R     58.1611        60.6602        
    N     3.2050         4.7176         
    U     0.7589         0.8049         
    X     0.0639         0.0668         
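
As a final sanity check, some of these intervals can be compared with the values implied by the distribution parameters in the Characteristics section. The sketch below copies the bounds from the table and assumes that the queue is stable, so that the throughput equals the arrival rate; the dictionary names are hypothetical.

```python
# Interval bounds copied from the summary table above.
intervals = {
    "S": (11.7567, 12.1535),
    "U": (0.7589, 0.8049),
    "X": (0.0639, 0.0668),
}

# Theoretical values implied by the parameters in the Characteristics section,
# assuming a stable queue (throughput equal to the arrival rate).
mean_service = 1 / 0.1 + 1 / 0.5              # hypo-exponential mean
mean_interarrival = 0.5 / 0.1 + 0.5 / 0.05    # hyper-exponential mean
expected = {
    "S": mean_service,
    "U": mean_service / mean_interarrival,
    "X": 1 / mean_interarrival,
}

for name, (low, high) in intervals.items():
    inside = low <= expected[name] <= high
    print(f"{name}: expected {expected[name]:.4f}, inside interval: {inside}")
```

With the bounds above, all three expected values (12, 0.8 and about 0.0667) fall inside their intervals.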
