# Before you start:

    Read the README.md file
    Comment as much as you can and use the resources (README.md file)
    Happy learning!

In this exercise, we will generate random numbers from the continuous distributions we learned in the lesson. There are two ways to generate random numbers:

1. Using the NumPy library
1. Using the SciPy library

Use either or both of the libraries in this exercise.
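
As a rough illustration of the two options, the sketch below draws the same kind of uniform sample with each library. The bounds, sample size, and seed are arbitrary values chosen for the example, not part of the exercise.

```python
import numpy as np
from scipy.stats import uniform

a, b, n = 2, 3, 5  # illustrative bounds and sample size (assumed values)

# NumPy: the Generator API takes the interval bounds directly
rng = np.random.default_rng(seed=0)
numpy_sample = rng.uniform(low=a, high=b, size=n)

# SciPy: `loc` is the lower bound and `scale` is the interval width,
# so the sample falls in [loc, loc + scale]
scipy_sample = uniform.rvs(loc=a, scale=b - a, size=n, random_state=0)

print(numpy_sample)
print(scipy_sample)
```

Both calls return a NumPy array; the main difference is how the interval is parameterised.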

## Uniform Distribution

To generate uniform random numbers between any two given values using scipy, we can either use the following code or the code that we have
discussed in class:

In [None]:
from scipy.stats import uniform
x = uniform.rvs(size=10)
a = 2
b = 3
randoms  = a + (b-a)*x
print(randoms)
print(x)

**Your task:**

1. Based on the code above, write a function that generates uniformly distributed random numbers. There are several requirements for your function:
    * It should accept 3 parameters: 
        * `bottom` - the lower boundary of the generated numbers
        * `ceiling` - the upper boundary of the generated numbers
        * `count` - how many numbers to generate
    * It should return an array of uniformly distributed random numbers

1. Call your function with 2 sets of params below:
    * bottom=10, ceiling=15, count=100
    * bottom=10, ceiling=60, count=1,000

1. Plot the uniform distributions generated above using histograms, where x axis is the value and y axis is the count. Let the histogram's number of bins be 10.



In [None]:
# your code here
import matplotlib.pyplot as plt
import seaborn as sns

def unif(bottom, ceiling, count):
    x = uniform.rvs(size=count)
    return  bottom + (ceiling - bottom) * x

dist_100 = unif(10, 15, 100)
dist_1000 = unif(10, 60, 1000)

How are the two distributions different?

In [None]:
# your answer below
fig, [plot_a, plot_b] = plt.subplots(1,2, figsize=(12,4))

plot_a.hist(dist_100)
plot_b.hist(dist_1000)

plt.show()

# The more samples a uniform distribution has, the closer the bin frequencies tend to be to one another
# Paolo: yes, and the x-axis range depends on the ceiling, while the number of observations in each bin depends on n
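
The reviewer's point about bin counts can be checked numerically. The following is a minimal sketch, assuming NumPy's `Generator.uniform` in place of the `unif` function above and an arbitrary seed; it compares the observed count per bin with the expected `count / bins`.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # seed chosen only for reproducibility
bins = 10

for bottom, ceiling, count in [(10, 15, 100), (10, 60, 1000)]:
    sample = rng.uniform(bottom, ceiling, count)
    # np.histogram returns the count in each bin and the bin edges
    counts, edges = np.histogram(sample, bins=bins)
    print(f"n={count}: expected per bin={count / bins:.0f}, observed={counts}")
```

With more samples, the observed counts usually sit closer, in relative terms, to the expected value, which is why the larger histogram looks flatter.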

## Normal Distribution

1. As in the Uniform Distribution challenge, write a function that generates normally distributed random numbers.
1. Generate 1,000 normally distributed numbers with a mean of 10 and a standard deviation of 1.
1. Generate 1,000 normally distributed numbers with a mean of 10 and a standard deviation of 50.
1. Plot the distributions of the data generated.

You can check the expected output [here](https://drive.google.com/file/d/1ULdYD411SqkrlR9CqJJ7H8_Rt5T2GjLe/view?usp=sharing)

In [None]:
# your code here
import numpy as np

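def normal(mean, stdev, count):
    # Draw `count` samples from a normal distribution with the given mean and standard deviation
    return np.random.normal(mean, stdev, count)

fig, [plot_a, plot_b] = plt.subplots(1,2, figsize=(12,4))

# Left: standard deviation 1; right: standard deviation 50
plot_a.hist(normal(10, 1, 1000))
plot_b.hist(normal(10, 50, 1000))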

plt.show()

How are the two distributions different?

In [None]:
# your answer below
# They hardly differ in shape; the two distributions look fairly similar
# Paolo: yes, but the standard deviations are different
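
Because each histogram picks its own x-axis range, the two plots can look alike even though the spreads differ by a factor of 50. A small numeric sketch (assuming NumPy's `Generator.normal` and an arbitrary seed) makes the difference explicit:

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # seed chosen only for reproducibility
narrow = rng.normal(10, 1, 1000)     # mean 10, standard deviation 1
wide = rng.normal(10, 50, 1000)      # mean 10, standard deviation 50

for name, sample in [("std=1", narrow), ("std=50", wide)]:
    print(f"{name}: sample std={sample.std():.2f}, "
          f"range=({sample.min():.1f}, {sample.max():.1f})")
```

Plotting both samples on a shared x-axis would show the same thing visually: one narrow spike next to a very wide bell.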

## Normal Distribution of Real Data

In this challenge we are going to take a look at real data. We will use the `vehicles.csv` file for this exercise.

First import `vehicles.csv` from [here](https://drive.google.com/file/d/1bNZgaQ-_Z9i3foO-OeB89x7kXJxm8xcC/view?usp=sharing), place it in the data folder and load it.


In [None]:
#your code here
import pandas as pd

vehicles = pd.read_csv('data/vehicles.csv')

Then plot the histograms for the following variables:
1. Fuel Barrels/Year

In [None]:
# your code here
plt.hist(vehicles['Fuel Barrels/Year'])
plt.show()

2. CO2 Emission Grams/Mile 

In [None]:
# your code here
plt.hist(vehicles['CO2 Emission Grams/Mile'])
plt.show()

3. Combined MPG

In [None]:
# your code here
plt.hist(vehicles['Combined MPG'])
plt.show()

Which one(s) of the variables are nearly normally distributed? How do you know?

In [None]:
# your answer here
# All three seem quite normally distributed, but with a small tail to the right
# Paolo: in general it is not always easy to assess the normality of a distribution;
# it can look different to the eye depending on the number of bins you choose.
# One possibility is to use Q-Q plots, have a look here:
# https://en.wikipedia.org/wiki/Q%E2%80%93Q_plot
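
The Q-Q plot suggestion can be sketched with `scipy.stats.probplot`. The example below uses synthetic samples rather than the vehicles data (an assumption, so that it runs without the CSV) and compares how closely each sample's quantiles follow a straight line against normal quantiles.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=0)  # seed chosen only for reproducibility
normal_sample = rng.normal(10, 1, 1000)
skewed_sample = rng.exponential(10, 1000)

# probplot returns the Q-Q points and a least-squares line fit;
# r is the correlation between sample and theoretical normal quantiles
(_, _), (_, _, r_normal) = stats.probplot(normal_sample, dist="norm")
(_, _), (_, _, r_skewed) = stats.probplot(skewed_sample, dist="norm")

print(f"normal sample: r = {r_normal:.4f}")
print(f"skewed sample: r = {r_skewed:.4f}")
```

An `r` close to 1 means the points lie near a straight line, which is what a roughly normal sample produces. To draw the plot for a real column, one option is `stats.probplot(vehicles['Combined MPG'], dist="norm", plot=plt)`.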

## Exponential Distribution

1. Using `numpy.random.exponential`, create a function that returns a list of exponentially distributed numbers with a mean of 10.

1. Use the function to generate two number sequences of sizes 1 and 100.

1. Plot the distributions as histograms with the number of bins set to 100.



In [None]:
# your code here
def exp(size):
    # The scale parameter of numpy.random.exponential is the mean (1/λ), here 10
    return np.random.exponential(10, size=size)

plt.hist(exp(100), bins=100)
plt.show()

How are the two distributions different?

In [None]:
# your answer here
#There is no difference. 
#Paolo: plotting with size=1 you would have had only one data point

## Exponential Distribution of Real Data

Suppose that the amount of time one spends in a bank is exponentially distributed with a mean of 10 minutes (i.e. λ = 1/10). What is the probability that a customer will spend less than fifteen minutes in the bank?

Write Python code to solve this problem.

In [None]:
# your answer here
# Hint: This is the same as P(x < 15)
from scipy.stats import expon

def cdf(mean, value):
    return expon(mean).cdf(value)
# Paolo: you have to specify scale=mean (expon(mean) passes mean as loc, which shifts the distribution); see below for the differences

In [None]:
def cdf_(mean, value):
    return expon(scale=mean).cdf(value)

In [None]:
cdf_(10, 15)

In [None]:
# your answer here
cdf(10, 15)
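
The difference between the two calls can be worked out by hand. With `scale=10` the distribution has mean 10 and starts at 0, so

$$
P(X < 15) = 1 - e^{-15/10} = 1 - e^{-1.5} \approx 0.777
$$

whereas `expon(10)` is read by SciPy as `loc=10` with the default `scale=1`, that is, a mean-1 exponential shifted to start at 10:

$$
P(X < 15) = 1 - e^{-(15 - 10)/1} = 1 - e^{-5} \approx 0.993
$$

Only the first value answers the question as posed.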

What is the probability that the customer will spend more than 15 minutes in the bank?
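
One way to approach this, sketched here under the same assumptions as the previous question (mean of 10 minutes, so `scale=10`), is to take the complement of the CDF. SciPy exposes this directly as the survival function `sf`; the variable names are illustrative.

```python
from scipy.stats import expon

mean_minutes = 10
time_in_bank = expon(scale=mean_minutes)  # scale is the mean, 1/λ

p_less = time_in_bank.cdf(15)  # P(X < 15)
p_more = time_in_bank.sf(15)   # P(X > 15), equal to 1 - cdf

print(f"P(X < 15) = {p_less:.4f}")
print(f"P(X > 15) = {p_more:.4f}")
```

The two probabilities sum to 1, so the second is simply what remains after the first.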

# Central Limit Theorem

A delivery company needs an average of 35 minutes to deliver a package, with a standard deviation of 8 minutes. Suppose that in one day, they deliver 200 packages.

**Hint**: `stats.norm.cdf` can help you find the answers.

#### Step 1: What is the probability that the mean delivery time today is between 30 and 35 minutes?

In [None]:
# your code here
from scipy.stats import norm
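x = 35  # mean delivery time per package, in minutes
std = 8  # standard deviation of a single delivery, in minutes
n = 200  # packages delivered in one day
dist_std = std / np.sqrt(n)  # standard error of the daily mean delivery time

# Probability that the daily mean falls between 30 and 35 minutes
norm.cdf(35, x, dist_std) - norm.cdf(30, x, dist_std)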
#Paolo: yes

#### Step 2: What is the probability that in total, it takes more than 115 hours to deliver all 200 packages?

In [None]:
# your code here
# A total of 115 hours over 200 packages corresponds to a mean of 34.5 minutes per package
mean_time_needed = 115 * 60 / 200
1 - norm.cdf(mean_time_needed, x, dist_std)
#Paolo:yes
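
The normal approximation used in Step 2 can be sanity-checked with a simulation. The sketch below assumes, purely for illustration, that individual delivery times follow a gamma distribution with mean 35 and standard deviation 8; the exercise does not specify the underlying distribution, and the seed and number of simulated days are arbitrary.

```python
import numpy as np
from scipy.stats import norm

mean, std, n = 35, 8, 200

# Assumption: gamma-distributed delivery times with matching mean and std
shape = (mean / std) ** 2   # gamma mean = shape * scale
scale = std ** 2 / mean     # gamma variance = shape * scale**2

rng = np.random.default_rng(seed=0)
# Each row is one simulated day of 200 deliveries
daily_means = rng.gamma(shape, scale, size=(20_000, n)).mean(axis=1)

threshold = 115 * 60 / n  # 34.5 minutes per package
simulated = (daily_means > threshold).mean()
clt = 1 - norm.cdf(threshold, mean, std / np.sqrt(n))

print(f"simulated: {simulated:.3f}, CLT approximation: {clt:.3f}")
```

Even though the assumed delivery-time distribution is skewed, the simulated share of days exceeding 115 hours should land close to the value from `norm.cdf`, which is the practical content of the Central Limit Theorem here.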