# **Assignment 5**

The objective of the assignment is to generate 500 samples from each of several distributions and compare them on a single plot.


## Initialization
First of all, we need to import the necessary libraries and tell IPython how to display the generated plots.

In [16]:
%matplotlib tk
import numpy as np
from matplotlib import pyplot as plt

## Random number generation

We then need to set up the generator used to create random numbers between 0 and 1.
In this version, I went with the standard NumPy generator, although a linear congruential generator could be used instead.

At the same time, we define the number of samples to generate for each distribution (required by the assignment to be 500).

In [17]:
rng = np.random.default_rng(seed = 0xdeadbeef)
samples = 500
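
Since the text mentions a linear congruential generator as an alternative, here is a minimal sketch of one. The class name `LCG` and its constants are assumptions made for illustration (the multiplier and increment are the commonly quoted "Numerical Recipes" values); nothing here is part of the assignment's actual code.

```python
class LCG:
    """Minimal linear congruential generator: x_{k+1} = (a * x_k + c) mod m."""

    def __init__(self, seed, a=1664525, c=1013904223, m=2**32):
        # Illustrative constants only; other (a, c, m) choices are possible.
        self.a, self.c, self.m = a, c, m
        self.state = seed % m

    def random(self, size):
        """Return a list of `size` floats in [0, 1)."""
        out = []
        for _ in range(size):
            self.state = (self.a * self.state + self.c) % self.m
            out.append(self.state / self.m)
        return out


lcg = LCG(seed=0xdeadbeef)
u = lcg.random(500)
```

The `random(size)` method mirrors the call shape of `rng.random(size=...)`, so the uniform variates could in principle be swapped in, although the statistical quality of such a simple generator is lower than NumPy's default.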

## Distributions

For each distribution considered, I generated a new set of random data to avoid any sort of correlation.

### Uniform

Generate the samples of a continuous uniform distribution *between 10 and 20*.

In [18]:
data = rng.random(size=samples)
unif = 10 + (20-10)*data  # rescale U(0, 1) to U(10, 20)

The generated samples give the following *mean* and *coefficient of variation*:

In [19]:
mean = np.mean(unif)
cv = np.sqrt(np.var(unif))/mean
print("cv = %3.5f\nmean = %3.2f" % (cv, mean))

cv = 0.18236
mean = 15.41


The obtained value for the coefficient of variation is in line with what we expected:  
A uniform distribution on $[a, b]$ with positive bounds has, by nature, a **standard deviation smaller than its mean**, so its c.v. is below 1.
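
As a quick check of this claim, the theoretical values can be computed from the standard formulas for a uniform distribution on $[a, b]$: mean $(a+b)/2$ and standard deviation $(b-a)/\sqrt{12}$. The variable names below are chosen for this sketch only.

```python
import math

a, b = 10.0, 20.0                    # bounds used in this section
theo_mean = (a + b) / 2              # 15.0
theo_std = (b - a) / math.sqrt(12)   # about 2.887
theo_cv = theo_std / theo_mean       # about 0.192
print("theoretical cv = %3.5f\ntheoretical mean = %3.2f" % (theo_cv, theo_mean))
```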

### Discrete

Generate the samples of a discrete distribution that returns the following values with the given probability:

| Value | Probability |
| --- | --- |
| 5 | 0.2 |
| 15 | 0.6 |
| 20 | 0.2 |

In [20]:
data = rng.random(size=samples)
discrete = []
for i in data:
    if i < 0.2 :
        discrete.append(5)
    elif i < (0.2 + 0.6) :
        discrete.append(15)
    else :
        discrete.append(20)
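
The loop above is an inverse-transform lookup on the cumulative probabilities. A vectorized variant could look like the sketch below; `rng_demo`, `values`, `probs`, and `discrete_vec` are names introduced here for illustration, and a separate generator is used so the notebook's own random stream is left untouched.

```python
import numpy as np

rng_demo = np.random.default_rng(seed=1)   # hypothetical generator for this sketch
values = np.array([5, 15, 20])
probs = np.array([0.2, 0.6, 0.2])

cum = np.cumsum(probs)   # cumulative probabilities, roughly [0.2, 0.8, 1.0]
cum[-1] = 1.0            # guard against floating-point rounding in the last entry
u = rng_demo.random(size=500)

# side="right" gives the first index whose cumulative probability is
# strictly greater than u, which matches the chained `<` tests in the loop.
idx = np.searchsorted(cum, u, side="right")
discrete_vec = values[idx]
```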

The generated samples give the following *mean* and *coefficient of variation*:

In [21]:
mean = np.mean(discrete)
cv = np.sqrt(np.var(discrete))/mean
print("cv = %3.5f\nmean = %3.2f" % (cv, mean))

cv = 0.35679
mean = 13.77


The obtained values are in line with what we expected: from the table, the theoretical mean is 14 and the theoretical c.v. is about 0.350.
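
The reference values for this comparison follow directly from the probability table. A short sketch, with variable names chosen here for illustration:

```python
import numpy as np

values = np.array([5.0, 15.0, 20.0])
probs = np.array([0.2, 0.6, 0.2])

theo_mean = np.sum(values * probs)                    # E[X]
theo_var = np.sum(values**2 * probs) - theo_mean**2   # E[X^2] - E[X]^2
theo_cv = np.sqrt(theo_var) / theo_mean
print("theoretical cv = %3.5f\ntheoretical mean = %3.2f" % (theo_cv, theo_mean))
```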

### Exponential

Generate the samples of an exponential distribution with average 15.

In [22]:
data = rng.random(size=samples)
exp = -np.log(data)*15

The generated samples give the following *mean* and *coefficient of variation*:

In [23]:
mean = np.mean(exp)
cv = np.sqrt(np.var(exp))/mean
print("cv = %3.5f\nmean = %3.2f" % (cv, mean))

cv = 0.95178
mean = 16.33


The obtained value for the coefficient of variation is in line with what we expected:  
The exponential distribution has a **c.v. of 1** and the value we obtained is close to it.  
The remaining deviation, like that of the sample mean from 15, is consistent with the *small number of samples*.
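
One way to probe the effect of the sample size is to repeat the experiment with increasing `n` and watch the sample c.v. The sketch below is an assumption-laden illustration: it uses its own generator and seed (`rng_demo`), a hypothetical helper `sample_cv`, and `1 - u` instead of `u` inside the logarithm so that a draw of exactly 0 cannot produce `log(0)`.

```python
import numpy as np

rng_demo = np.random.default_rng(seed=12345)   # hypothetical seed for this sketch


def sample_cv(n, mean=15.0):
    """Sample c.v. of n exponential variates drawn by inverse transform."""
    u = rng_demo.random(size=n)
    # 1 - u lies in (0, 1], so the logarithm is always finite.
    x = -np.log(1.0 - u) * mean
    return np.std(x) / np.mean(x)


for n in (500, 5_000, 50_000, 500_000):
    print("n = %7d  cv = %3.5f" % (n, sample_cv(n)))
```

The printed values depend on the seed, but they should tend to cluster more tightly around 1 as `n` grows.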

### Hyper-exponential

Generate the samples of a hyper-exponential distribution with two stages characterized by:

| Name | Value |
| --- | --- |
| $\lambda_{1}$ | 0.1 |
| $\lambda_{2}$ | 0.05 |
| p | 0.5 |

In [24]:
data1 = rng.random(size=samples)
data2 = rng.random(size=samples)
probabilities = rng.random(size=samples)
hyper = []
for i in range(0,samples):
    if probabilities[i] <= 0.5:
        hyper.append(-np.log(data1[i])/0.1)
    else:
        hyper.append(-np.log(data2[i])/0.05)
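
The same branch selection can be written without an explicit loop. This sketch is not the code used for the results in this report: it draws a single uniform stream for the exponential variate instead of two, uses `1 - u` to avoid `log(0)`, and relies on its own generator `rng_demo`. All names are introduced here for illustration.

```python
import numpy as np

rng_demo = np.random.default_rng(seed=7)   # hypothetical generator for this sketch
n = 500
lam1, lam2, p = 0.1, 0.05, 0.5

u = rng_demo.random(size=n)        # drives the exponential variate
branch = rng_demo.random(size=n)   # selects which stage is used

# Pick the rate per sample, then apply the inverse transform once.
rate = np.where(branch <= p, lam1, lam2)
hyper_vec = -np.log(1.0 - u) / rate
```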

The generated samples give the following *mean* and *coefficient of variation*:

In [25]:
mean = np.mean(hyper)
cv = np.sqrt(np.var(hyper))/mean
print("cv = %3.5f\nmean = %3.2f" % (cv, mean))

cv = 1.13836
mean = 15.19


The obtained value for the coefficient of variation is in line with what we expected:  
The hyper-exponential distribution should have a **c.v. greater than 1**.
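
For reference, the theoretical c.v. for these parameters can be derived from the mixture's first two moments, using $E[X] = 1/\lambda$ and $E[X^2] = 2/\lambda^2$ for each exponential stage. Variable names are chosen for this sketch.

```python
lam1, lam2, p = 0.1, 0.05, 0.5

m1 = p / lam1 + (1 - p) / lam2                 # first moment of the mixture
m2 = 2 * p / lam1**2 + 2 * (1 - p) / lam2**2   # second moment of the mixture
theo_cv = (m2 - m1**2) ** 0.5 / m1
print("theoretical cv = %3.5f\ntheoretical mean = %3.2f" % (theo_cv, m1))
```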

### Hypo-exponential

Generate the samples of a hypo-exponential distribution with two stages characterized by:

| Name | Value |
| --- | --- |
| $\lambda_{1}$ | 0.1 |
| $\lambda_{2}$ | 0.2 |

In [26]:
data1 = rng.random(size=samples)
data2 = rng.random(size=samples)
hypo = - np.log(data1)/0.1 - np.log(data2)/0.2

The generated samples give the following *mean* and *coefficient of variation*:

In [27]:
mean = np.mean(hypo)
cv = np.sqrt(np.var(hypo))/mean
print("cv = %3.5f\nmean = %3.2f" % (cv, mean))

cv = 0.77219
mean = 14.74


The obtained value for the coefficient of variation is in line with what we expected:  
The hypo-exponential distribution should have a **c.v. smaller than 1**.
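
Because the two stages are independent and in series, their means and variances add, which gives a simple theoretical reference. A sketch with names chosen for illustration:

```python
lam1, lam2 = 0.1, 0.2

theo_mean = 1 / lam1 + 1 / lam2          # means of the stages add
theo_var = 1 / lam1**2 + 1 / lam2**2     # variances of independent stages add
theo_cv = theo_var ** 0.5 / theo_mean
print("theoretical cv = %3.5f\ntheoretical mean = %3.2f" % (theo_cv, theo_mean))
```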

### Hyper-Erlang

Generate the samples of a hyper-Erlang distribution with the following branches:

| Stages | Rate | Probability |
| --- | --- | --- |
| 1 | 0.02 | 0.1 |
| 2 | 0.2 | 0.4 |
| 3 | 0.25 | 0.5 |

In [28]:
data1 = rng.random(size=samples)
data2 = rng.random(size=samples)
data3 = rng.random(size=samples)
probabilities = rng.random(size=samples)
hyper_erlang = []
for i in range(0, samples):
    if probabilities[i] < 0.1:
        hyper_erlang.append(-np.log(data1[i])/0.02)
    elif probabilities[i] < (0.4 + 0.1):
        hyper_erlang.append(-np.log(data1[i])/0.2-np.log(data2[i])/0.2)
    else:
        hyper_erlang.append(-np.log(data1[i])/0.25 -np.log(data2[i])/0.25 -np.log(data3[i])/0.25)

The generated samples give the following *mean* and *coefficient of variation*:

In [29]:
mean = np.mean(hyper_erlang)
cv = np.sqrt(np.var(hyper_erlang))/mean
print("cv = %3.5f\nmean = %3.2f" % (cv, mean))

cv = 1.27984
mean = 14.67


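The obtained value for the coefficient of variation is plausible:  
A hyper-Erlang distribution can have a **c.v. either below or above 1**, depending on its branches. With the parameters above, the theoretical mean is 15 and the theoretical c.v. is about 1.38; the first branch is rare but has a long mean, which likely makes the sample estimate noisy at 500 samples.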
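
The theoretical moments for the three branches can be checked with the standard Erlang results: an Erlang with $k$ stages and rate $\lambda$ has mean $k/\lambda$ and second moment $k(k+1)/\lambda^2$. The list `branches` and the other names are introduced for this sketch only.

```python
# (stages k, rate lam, probability p) for each branch of the table
branches = [(1, 0.02, 0.1), (2, 0.2, 0.4), (3, 0.25, 0.5)]

# Mixture moments are the probability-weighted sums of the branch moments.
m1 = sum(p * k / lam for k, lam, p in branches)
m2 = sum(p * k * (k + 1) / lam**2 for k, lam, p in branches)
theo_cv = (m2 - m1**2) ** 0.5 / m1
print("theoretical cv = %3.5f\ntheoretical mean = %3.2f" % (theo_cv, m1))
```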

## Plots

To compare the distributions created above, we computed the empirical CDF of each sample set and plotted them together.

In [30]:
for (data, name) in [(unif, 'uniform'),
                     (discrete, 'discrete'),
                     (exp, 'exponential'),
                     (hyper, 'hyperexponential'),
                     (hypo, 'hypoexponential'),
                     (hyper_erlang, 'hyper-Erlang')]:

    # Empirical CDF: the i-th smallest sample (i = 0..n-1) is paired with i/n
    y = np.arange(len(data)) / len(data)
    x = np.sort(data)

    plt.plot(x, y, label=name)
plt.legend()

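
The CDF construction used in the plotting cell can be factored into a small helper. The function `ecdf` below is a hypothetical name, and it uses the common $(i+1)/n$ convention, so the curve ends at 1, rather than the $i/n$ values used in the cell above; treat it as a sketch rather than the method used for the plot.

```python
import numpy as np


def ecdf(data):
    """Return sorted values x and cumulative fractions y with y[i] = (i + 1) / n."""
    x = np.sort(np.asarray(data, dtype=float))
    y = np.arange(1, len(x) + 1) / len(x)
    return x, y


x, y = ecdf([3.0, 1.0, 2.0, 2.0])
```

Each pair `(x, y)` could then be passed to `plt.plot(x, y, label=name)` exactly as in the loop.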