# Before you start:

- Read the README.md file.
- Comment as much as you can and use the resources (README.md file).
- Happy learning!

In this exercise, we will generate random numbers from the continuous distributions we learned in the lesson. There are two ways to generate random numbers:

1. Using the NumPy library
1. Using the SciPy library

Use either or both of the libraries in this exercise.

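As a minimal sketch of the two options, the snippet below draws five standard uniform numbers with each library and shows that fixing a seed makes the draws reproducible. The seed value and the variable names are arbitrary choices for this illustration, not part of the exercise.

```python
import numpy as np
from scipy.stats import uniform

SEED = 42  # arbitrary seed chosen for this illustration

# NumPy: draw from a seeded Generator
rng = np.random.default_rng(SEED)
numpy_draws = rng.uniform(low=0.0, high=1.0, size=5)

# SciPy: pass the seed through the random_state argument
scipy_draws = uniform.rvs(size=5, random_state=SEED)

# Re-seeding reproduces exactly the same numbers in each library
numpy_again = np.random.default_rng(SEED).uniform(low=0.0, high=1.0, size=5)
scipy_again = uniform.rvs(size=5, random_state=SEED)

print(numpy_draws)
print(scipy_draws)
```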
In [None]:
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import bernoulli
from scipy.stats import binom
%matplotlib inline
from scipy import stats
import pandas as pd

## Uniform Distribution

To generate uniform random numbers between any two given values with SciPy, we can draw from the standard uniform distribution on [0, 1] and rescale the result, as in the following code, or use the code that we discussed in class:

In [None]:
from scipy.stats import uniform

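# Draw 10 numbers from the standard uniform distribution on [0, 1]
x = uniform.rvs(size=10)
a = 2  # lower boundary
b = 3  # upper boundary
# Stretch by the interval width (b - a) and shift by a to land in [a, b]
randoms = a + (b - a) * x
print(randoms)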

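SciPy can also do the rescaling itself. In `scipy.stats.uniform`, `loc` is the lower boundary and `scale` is the width of the interval, so the numbers fall in `[loc, loc + scale]`. The sketch below compares the manual rescaling with the `loc`/`scale` form; `random_state=0` is an arbitrary seed used only so both calls start from the same underlying draws.

```python
import numpy as np
from scipy.stats import uniform

a, b = 2, 3

# Manual rescaling of standard uniform draws
x = uniform.rvs(size=10, random_state=0)
manual = a + (b - a) * x

# Same distribution via loc (lower bound) and scale (width of the interval)
direct = uniform.rvs(loc=a, scale=b - a, size=10, random_state=0)

# True when both approaches produce the same numbers
print(np.allclose(manual, direct))
```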
**Your task:**

1. Based on the code above, write a function that generates uniformly distributed random numbers. There are several requirements for your function:
    * It should accept 3 parameters: 
        * `bottom` - the lower boundary of the generated numbers
        * `ceiling` - the upper boundary of the generated numbers
        * `count` - how many numbers to generate
    * It should return an array of uniformly distributed random numbers

1. Call your function with 2 sets of params below:
    * bottom=10, ceiling=15, count=100
    * bottom=10, ceiling=60, count=1,000

1. Plot the uniform distributions generated above using histograms, where the x-axis is the value and the y-axis is the count. Set the number of bins in each histogram to 10.



In [None]:
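# your code here
def uniform_numbers(bottom, ceiling, count):
    """Return `count` uniformly distributed random numbers in [bottom, ceiling)."""
    return np.random.uniform(bottom, ceiling, count)

# First parameter set: bottom=10, ceiling=15, count=100
x = uniform_numbers(bottom=10, ceiling=15, count=100)
plt.subplots(figsize=(6, 6))
plt.hist(x, bins=10)
plt.xlabel('value')
plt.ylabel('count')
plt.show()

In [None]:
# Second parameter set: bottom=10, ceiling=60, count=1000
x = uniform_numbers(bottom=10, ceiling=60, count=1000)
plt.hist(x, bins=10)
plt.xlabel('value')
plt.ylabel('count')
plt.show()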

How are the two distributions different?

In [None]:
# your answer below
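#Both samples come from a uniform distribution, but they cover different ranges
#(10 to 15 versus 10 to 60). The second histogram looks flatter because 1,000 draws
#leave less relative sampling noise in each bin than 100 draws.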

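One way to put a number on "flatter" is the relative spread of the bin counts (standard deviation divided by mean). This is only a sketch: `bin_count_cv` is a hypothetical helper introduced for illustration, and the seed is arbitrary. A value of 0 means every bin holds the same count, and the value is typically smaller for the larger sample.

```python
import numpy as np

def bin_count_cv(sample, bins=10):
    """Relative spread of histogram bin counts (std / mean); 0 means perfectly flat."""
    counts, _ = np.histogram(sample, bins=bins)
    return counts.std() / counts.mean()

rng = np.random.default_rng(0)  # arbitrary seed
small = rng.uniform(10, 15, 100)
large = rng.uniform(10, 60, 1000)

print("count=100: ", bin_count_cv(small))
print("count=1000:", bin_count_cv(large))
```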
## Normal Distribution

1. In the same way as in the Uniform Distribution challenge, write a function that generates normally distributed random numbers.
1. Generate 1,000 normally distributed numbers with a mean of 10 and a standard deviation of 1.
1. Generate 1,000 normally distributed numbers with a mean of 10 and a standard deviation of 50.
1. Plot the distributions of the data generated.

You can check the expected output [here](https://drive.google.com/file/d/1ULdYD411SqkrlR9CqJJ7H8_Rt5T2GjLe/view?usp=sharing)

In [None]:
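# your code here
def normal_numbers(mean, std, count):
    """Return `count` normally distributed random numbers."""
    return np.random.normal(mean, std, count)

# Mean 10, standard deviation 1
s = normal_numbers(mean=10, std=1, count=1000)
plt.hist(s, bins=100)
plt.show()

In [None]:
# Mean 10, standard deviation 50
s = normal_numbers(mean=10, std=50, count=1000)
plt.hist(s, bins=100)
plt.show()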

How are the two distributions different?

In [None]:
# your answer below
#Both distributions are centered on 10, but the standard deviation of the second one
#is 50 times larger, so its values are spread out over a much wider range.

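The difference in spread can also be checked numerically rather than by eye. The sketch below, using an arbitrary seed, prints the sample mean, the sample standard deviation, and the range that contains the middle 95% of each sample.

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed

narrow = rng.normal(loc=10, scale=1, size=1000)
wide = rng.normal(loc=10, scale=50, size=1000)

for name, sample in [("std=1", narrow), ("std=50", wide)]:
    # 2.5th and 97.5th percentiles bracket the middle 95% of the sample
    lo, hi = np.percentile(sample, [2.5, 97.5])
    print(f"{name}: mean={sample.mean():.2f}, std={sample.std(ddof=1):.2f}, "
          f"middle 95% from {lo:.1f} to {hi:.1f}")
```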
## Normal Distribution of Real Data

In this challenge we are going to take a look at real data. We will use the `vehicles.csv` file for this exercise.

First import `vehicles.csv` from [here](https://drive.google.com/file/d/1bNZgaQ-_Z9i3foO-OeB89x7kXJxm8xcC/view?usp=sharing), place it in the data folder and load it.


In [None]:
#your code here
vehicles = pd.read_csv('./data/vehicles.csv')
vehicles

Then plot the histograms for the following variables:
1. Fuel Barrels/Year

In [None]:
# your code 
plt.hist(vehicles['Fuel Barrels/Year'],bins=50)
plt.show()

In [None]:
# Q-Q plot against a normal distribution: the closer the points follow
# the reference line, the closer the variable is to normal
from statsmodels.graphics.gofplots import qqplot
from matplotlib import pyplot
qqplot(vehicles['Fuel Barrels/Year'], line='s')
pyplot.show()

2. CO2 Emission Grams/Mile 

In [None]:
# your code here
plt.hist(vehicles['CO2 Emission Grams/Mile'],bins=50)
plt.show()

In [None]:
qqplot(vehicles['CO2 Emission Grams/Mile'], line='s')
pyplot.show()

3. Combined MPG

In [None]:
# your code here
plt.hist(vehicles['Combined MPG'],bins=50)
plt.show()

In [None]:
qqplot(vehicles['Combined MPG'], line='s')
pyplot.show()

Which one(s) of the variables are nearly normally distributed? How do you know?

In [None]:
# your answer here
#From the histograms it seemed like Combined MPG was the most normally distributed out of the three.
#However, after checking with QQplot, it seems like CO2 Emission Grams/Mile is the variable closest
#to a normal distribution.

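Reading Q-Q plots by eye can be backed up with numbers. The sketch below uses `scipy.stats.probplot`, which returns the correlation `r` between the sample quantiles and the normal quantiles (closer to 1 means closer to the reference line), together with `scipy.stats.skew`. The two samples are synthetic stand-ins, not the vehicles data, and `qq_correlation` is a hypothetical helper; in the notebook it could be applied to a column such as `vehicles['Combined MPG']`.

```python
import numpy as np
from scipy import stats

def qq_correlation(values):
    """Correlation between sample quantiles and normal quantiles (hypothetical helper)."""
    (_, _), (_, _, r) = stats.probplot(values, dist="norm")
    return r

rng = np.random.default_rng(2)  # arbitrary seed
roughly_normal = rng.normal(loc=20, scale=5, size=2000)     # synthetic stand-in
right_skewed = rng.lognormal(mean=3, sigma=0.8, size=2000)  # synthetic stand-in

for name, sample in [("normal-like", roughly_normal), ("right-skewed", right_skewed)]:
    print(f"{name}: skew={stats.skew(sample):.2f}, Q-Q r={qq_correlation(sample):.3f}")
```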
## Exponential Distribution

1. Using `numpy.random.exponential`, create a function that returns a list of exponentially distributed numbers with a mean of 10.

1. Use the function to generate two number sequences, of sizes 1 and 100.

1. Plot the distributions as histograms with the number of bins set to 100.



In [None]:
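# your code here
def exponential_numbers(count, mean=10):
    """Return a list of `count` exponentially distributed numbers with the given mean."""
    return np.random.exponential(scale=mean, size=count).tolist()

# Sequence of size 1
s = exponential_numbers(1)
plt.hist(s, bins=100)
plt.show()

In [None]:
# Sequence of size 100
s = exponential_numbers(100)
plt.hist(s, bins=100)
plt.show()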

How are the two distributions different?

In [None]:
# your answer here
#The first sample has size 1, so its histogram shows a single bar and says nothing
#about the shape of the distribution.
#The second sample is right-skewed: most values are small and pile up on the left,
#with a long tail to the right, and the sample mean is close to 10.

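The shape of the exponential distribution can be made concrete with a much larger sample. For a mean of 10, the share of values below the mean is 1 - e^(-1), roughly 63%, which is why the histogram leans to the left. The sample size and seed below are arbitrary choices for this sketch.

```python
import numpy as np

rng = np.random.default_rng(3)  # arbitrary seed
sample = rng.exponential(scale=10, size=100_000)  # large sample to reduce noise

print("sample mean:      ", sample.mean())
print("sample median:    ", np.median(sample))
print("share below mean: ", np.mean(sample < 10))
print("theoretical share:", 1 - np.exp(-1))
```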
## Exponential Distribution of Real Data

Suppose that the amount of time one spends in a bank is exponentially distributed with a mean of 10 minutes (i.e. λ = 1/10). What is the probability that a customer will spend less than fifteen minutes in the bank?

Write Python code to solve this problem.

In [None]:
# your answer here
# Hint: This is the same as asking for P(x<15)
# P(x<15) = 1 - e^(-λ*x) with x = 15 (the exponential CDF)
import math
p_less15 = 1- math.exp(-(1/10)*15)
p_less15

What is the probability that the customer will spend more than 15 minutes?

In [None]:
# your answer here
p_more15 = 1 - p_less15
p_more15

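Both probabilities can be cross-checked with `scipy.stats.expon`, which is parameterized by `scale = 1/λ` (the mean) rather than by λ itself. This is an alternative sketch, not a replacement for the closed-form calculation; `mean_minutes` is a name introduced here for readability.

```python
from scipy.stats import expon

mean_minutes = 10  # scale = 1/λ, so λ = 1/10

p_less15 = expon.cdf(15, scale=mean_minutes)  # P(X < 15), about 0.777
p_more15 = expon.sf(15, scale=mean_minutes)   # P(X > 15) = 1 - cdf, about 0.223

print(p_less15, p_more15)
```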
## Central Limit Theorem

A delivery company needs an average of 35 minutes to deliver a package, with a standard deviation of 8 minutes. Suppose that in one day, they deliver 200 packages.

**Hint**: `stats.norm.cdf` can help you find the answers.

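Before calling `stats.norm.cdf`, it helps to write down which normal distribution applies. Assuming the 200 delivery times are independent with mean μ = 35 and standard deviation σ = 8 (independence is an assumption the exercise leaves implicit), the Central Limit Theorem gives the following approximations for the sample mean and for the total, where the second argument of each normal distribution is a variance:

```latex
\begin{align*}
\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i &\;\approx\; \mathcal{N}\!\left(\mu,\ \frac{\sigma^2}{n}\right),
  & \sigma_{\bar{X}} &= \frac{\sigma}{\sqrt{n}} = \frac{8}{\sqrt{200}} \approx 0.566 \text{ minutes} \\
S = \sum_{i=1}^{n} X_i &\;\approx\; \mathcal{N}\!\left(n\mu,\ n\sigma^2\right),
  & \sigma_{S} &= \sigma\sqrt{n} = 8\sqrt{200} \approx 113.1 \text{ minutes}
\end{align*}
```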
#### Step 1: What is the probability that the mean delivery time today is between 30 and 35 minutes?

In [None]:
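# your code here
# By the CLT, the mean of n delivery times is approximately normal with
# mean 35 and standard error population_std / sqrt(n), not population_std
delivery_mins = 35
population_std = 8
n = 200
std_error = population_std / np.sqrt(n)
p_35 = stats.norm.cdf(35, loc=delivery_mins, scale=std_error)
p_30 = stats.norm.cdf(30, loc=delivery_mins, scale=std_error)
print(p_35 - p_30)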

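The normal approximation for the daily mean can be sanity-checked by simulation. The sketch below assumes individual delivery times follow a gamma distribution with mean 35 and standard deviation 8; that shape is an assumption made only for this illustration, since the exercise specifies nothing beyond the mean and standard deviation. Because the standard error of the mean is about 0.57 minutes, almost no simulated day averages below 30 minutes, and the estimate should come out close to 0.5.

```python
import numpy as np

rng = np.random.default_rng(4)  # arbitrary seed
mu, sigma, n = 35, 8, 200
n_days = 20_000  # number of simulated days (arbitrary)

# Assumed shape of individual delivery times: gamma with mean mu and std sigma
shape = (mu / sigma) ** 2
scale = sigma ** 2 / mu
times = rng.gamma(shape, scale, size=(n_days, n))

# Mean delivery time for each simulated day
daily_means = times.mean(axis=1)

# Share of days whose mean delivery time lies between 30 and 35 minutes
estimate = np.mean((daily_means >= 30) & (daily_means <= 35))
print(estimate)
```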
#### Step 2: What is the probability that in total, it takes more than 115 hours to deliver all 200 packages?

In [None]:
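# P(total > 115*60 minutes): the total of n delivery times is approximately
# normal with mean n*35 and standard deviation population_std*sqrt(n)
total_mins = 115 * 60
total_mean = n * delivery_mins
total_std = population_std * np.sqrt(n)
print(1 - stats.norm.cdf(total_mins, loc=total_mean, scale=total_std))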