# Before you start:

    Read the README.md file
    Comment as much as you can and use the resources (README.md file)
    Happy learning!

In this exercise, we will generate random numbers from the continuous distributions we learned in the lesson. There are two ways to generate random numbers:

1. Using the NumPy library
1. Using the SciPy library

Use either or both of the libraries in this exercise.

## Uniform Distribution

To generate uniform random numbers between any two given values using SciPy, we can either use the following code or the code that we have discussed in class:

In [None]:
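from scipy.stats import uniform

# draw 10 samples from the standard uniform distribution on [0, 1]
x = uniform.rvs(size=10)
a = 2  # lower bound
b = 3  # upper bound
# rescale the samples from [0, 1] to [a, b]
randoms = a + (b - a) * x
print(randoms, x)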
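
The rescaling above can also be delegated to SciPy itself. The sketch below assumes the same bounds as the cell above and uses the `loc` and `scale` arguments of `uniform.rvs`, where `loc` is the lower bound and `scale` is the width of the interval. The variable name `randoms_direct` and the fixed `random_state` are illustrative choices, not part of the exercise.

```python
from scipy.stats import uniform

a, b = 2, 3  # same bounds as in the cell above

# loc is the lower bound and scale is the interval width,
# so the samples fall in [loc, loc + scale] = [a, b]
randoms_direct = uniform.rvs(loc=a, scale=b - a, size=10, random_state=0)
print(randoms_direct)
```

Both approaches should produce samples from the same distribution; the manual formula just makes the rescaling step explicit.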

**Your task:**

1. Based on the code above, write a function that generates uniformly distributed random numbers. There are several requirements for your function:
    * It should accept 3 parameters: 
        * `bottom` - the lower boundary of the generated numbers
        * `ceiling` - the upper boundary of the generated numbers
        * `count` - how many numbers to generate
    * It should return an array of uniformly distributed random numbers

1. Call your function with 2 sets of params below:
    * bottom=10, ceiling=15, count=100
    * bottom=10, ceiling=60, count=1,000

1. Plot the uniform distributions generated above using histograms, where x axis is the value and y axis is the count. Let the histogram's number of bins be 10.

You can check the expected output [here](https://drive.google.com/file/d/1uSelMUT-aSspJcDbfXpswZv9A5ChlaEL/view?usp=sharing)

In [None]:
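# your code here
def unif_random(bottom, ceiling, count):
    """Return `count` uniform random numbers between `bottom` and `ceiling`."""
    x = uniform.rvs(size=count)  # samples on [0, 1]
    return bottom + (ceiling - bottom) * x  # rescale to [bottom, ceiling]

params1 = unif_random(10, 15, 100)
params2 = unif_random(10, 60, 1000)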

In [None]:
import matplotlib.pyplot as plt

fig, [ax1, ax2] = plt.subplots(1,2, figsize = (10,4))

ax1.hist(params1,bins=10)
ax1.set_xticks(range(10, 16, 1))  # range() excludes its stop value, so use 16 to include the tick at 15
ax1.set_yticks(range(0, 120, 20))
ax1.set_title("parameters 1")

ax2.hist(params2,bins=10)
ax2.set_xticks(range(10, 61, 10))  # range() excludes its stop value, so use 61 to include the tick at 60
ax2.set_yticks(range(0, 120, 20))
ax2.set_title("parameters 2")

plt.show()

How are the two distributions different?

In [None]:
# your answer below
# both look roughly uniform over their own ranges (10 to 15 and 10 to 60);
# the second has 10 times as many samples, so each bin holds far more counts (as expected)
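
One way to put numbers on this comparison is to check each sample against the theoretical mean `(bottom + ceiling) / 2` and standard deviation `(ceiling - bottom) / sqrt(12)` of a uniform distribution. The helper `describe_uniform` below is a hypothetical name introduced for illustration, and it uses NumPy's generator with a fixed seed rather than the SciPy call from the exercise.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility

def describe_uniform(bottom, ceiling, count):
    """Return (sample mean, sample std, theoretical mean, theoretical std)."""
    sample = rng.uniform(bottom, ceiling, count)
    theory_mean = (bottom + ceiling) / 2
    theory_std = (ceiling - bottom) / np.sqrt(12)
    return sample.mean(), sample.std(), theory_mean, theory_std

# the two parameter sets from the task
for args in [(10, 15, 100), (10, 60, 1000)]:
    print(args, describe_uniform(*args))
```

The wider interval should show a larger spread, and the larger sample should usually sit closer to its theoretical values.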

## Normal Distribution

1. As in the Uniform Distribution challenge, write a function that generates normally distributed random numbers.
1. Generate 1,000 normally distributed numbers with a mean of 10 and a standard deviation of 1.
1. Generate 1,000 normally distributed numbers with a mean of 10 and a standard deviation of 50.
1. Plot the distributions of the data generated.

You can check the expected output [here](https://drive.google.com/file/d/1ULdYD411SqkrlR9CqJJ7H8_Rt5T2GjLe/view?usp=sharing)

In [None]:
# your code here
import numpy as np
def normal_random(avg, stdv, count):
    """Return `count` normally distributed numbers with mean `avg` and standard deviation `stdv`."""
    return np.random.normal(avg, stdv, count)

nor_params1 = normal_random(10, 1, 1000)
nor_params2 = normal_random(10, 50, 1000)

In [None]:
fig, [ax1, ax2] = plt.subplots(1,2, figsize = (10,4))

ax1.hist(nor_params1,bins=50)
ax1.set_title("parameters 1")

ax2.hist(nor_params2,bins=50)
ax2.set_title("parameters 2")

plt.show()

How are the two distributions different?

In [None]:
# your answer below
# both have a mean of 10, so the peak of each histogram is near 10
# when the standard deviation is 1, almost all values fall within 3 of the mean (between 7 and 13)
# for figure 2, with a standard deviation of 50, the values spread out far more,
# but since the data is normally distributed, they fall symmetrically on both sides of the mean
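
The statement about how far values fall from the mean can be checked with the empirical rule (roughly 68%, 95% and 99.7% of a normal sample within 1, 2 and 3 standard deviations). The sketch below is illustrative only: `share_within` is a hypothetical helper, and the samples are regenerated with a seeded NumPy generator instead of reusing the arrays from the cells above.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility

def share_within(sample, mean, std, k):
    """Fraction of the sample within k standard deviations of the mean."""
    return np.mean(np.abs(sample - mean) <= k * std)

for std in (1, 50):
    sample = rng.normal(10, std, 1000)
    shares = [share_within(sample, 10, std, k) for k in (1, 2, 3)]
    print("std =", std, "shares within 1, 2, 3 std:", shares)
```

The shares should be similar for both standard deviations, which shows that the two distributions have the same shape and differ only in scale.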

## Normal Distribution of Real Data

In this challenge we are going to take a look at real data. We will use the `vehicles.csv` file for this exercise.

First import `vehicles.csv` from [here](https://drive.google.com/file/d/1bNZgaQ-_Z9i3foO-OeB89x7kXJxm8xcC/view?usp=sharing), place it in the data folder and load it.


In [None]:
#your code here
import pandas as pd

vehicles = pd.read_csv('data/vehicles.csv')
vehicles.head()

Then plot the histograms for the following variables:
1. Fuel Barrels/Year

In [None]:
# your code here
vehicles['Fuel Barrels/Year'].plot.hist()

2. CO2 Emission Grams/Mile 

In [None]:
# your code here
vehicles['CO2 Emission Grams/Mile'].plot.hist()

3. Combined MPG

In [None]:
# your code here
vehicles['Combined MPG'].plot.hist()

Which one(s) of the variables are nearly normally distributed? How do you know?

In [None]:
# your answer here
# visually it is really hard to say; if I had to guess, I would say Combined MPG.
# But we can check more carefully: for a normal distribution the mean, median and mode coincide,
# so large gaps between them suggest the data is not normal

vehicles['Fuel Barrels/Year'].mean() #17.609
vehicles['Fuel Barrels/Year'].median() #17.347
vehicles['Fuel Barrels/Year'].mode() #18.311

vehicles['CO2 Emission Grams/Mile'].mean() #475.316
vehicles['CO2 Emission Grams/Mile'].median() #467.736
vehicles['CO2 Emission Grams/Mile'].mode() #493.722

vehicles['Combined MPG'].mean() #19.929
vehicles['Combined MPG'].median() #19.0
vehicles['Combined MPG'].mode() #18

In [None]:
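# or maybe it is better to look at the standard deviation?
vehicles[['Fuel Barrels/Year','CO2 Emission Grams/Mile','Combined MPG']].std()
# the standard deviation only measures spread, not shape, so on its own it cannot show normality;
# going by the small gap between mean and median above (relative to the mean),
# Fuel Barrels/Year looks the closest to a symmetric distribution.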
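
A more direct check of shape is the sample skewness (close to 0 for symmetric data) together with a normality test. The sketch below uses `scipy.stats.skew` and `scipy.stats.normaltest` on two synthetic samples, which are stand-ins invented for illustration and not the vehicles data.

```python
import numpy as np
from scipy.stats import skew, normaltest

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility

# Synthetic stand-ins (assumptions, not the vehicles data)
symmetric = rng.normal(loc=20, scale=5, size=5000)
right_skewed = rng.exponential(scale=5, size=5000)

for name, values in [("symmetric", symmetric), ("right_skewed", right_skewed)]:
    stat, p_value = normaltest(values)
    print(name, "skew:", round(skew(values), 3), "normaltest p-value:", p_value)
```

With the real data, the same calls could be applied to a column such as `vehicles['Combined MPG']`. A skewness far from 0 or a very small p-value would argue against normality.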

## Exponential Distribution

1. Using `numpy.random.exponential`, create a function that returns a list of exponentially distributed numbers with a mean of 10.

1. Use the function to generate two number sequences with sizes of 1 and 100.

1. Plot the distributions as histograms with the number of bins set to 100.

You can check the expected output [here](https://drive.google.com/file/d/1pybmhXeeG5Wzb69wfFv2J8JyR6t44mRi/view?usp=sharing)

In [None]:
# your code here
def rand_exp(scale,size):
    return np.random.exponential(scale=scale,size=size)

seq1 = rand_exp(10,1)
seq2 = rand_exp(10,100)
seq1

In [None]:
fig, [ax1, ax2] = plt.subplots(1,2, figsize = (10,4))

ax1.hist(seq1,bins=100)
ax1.set_title("sequence 1")

ax2.hist(seq2,bins=100)
ax2.set_title("sequence 2")

plt.show()

How are the two distributions different?

In [None]:
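# your answer here
# they do not look like the example: with size=1 the first histogram is a single bar,
# and 100 samples spread over 100 bins give a noisy picture.
# Probably a typo in the instructions (larger sizes seem intended)?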
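
The effect of the sample size can be illustrated without plotting. The sketch below assumes larger sizes than the task states (they are illustrative choices) and shows that a single draw fills exactly one bin, while the sample mean settles near the scale of 10 as the size grows.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility

# A single draw occupies exactly one of the 100 bins
one_draw = rng.exponential(scale=10, size=1)
counts, _ = np.histogram(one_draw, bins=100)
print("non-empty bins for size=1:", np.count_nonzero(counts))

# The sample mean settles near the scale (10) as the size grows
for size in (100, 10_000, 1_000_000):
    sample = rng.exponential(scale=10, size=size)
    print(size, "sample mean:", sample.mean())
```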

## Exponential Distribution of Real Data

Suppose that the amount of time one spends in a bank is exponentially distributed with a mean of 10 minutes (i.e. λ = 1/10). What is the probability that a customer will spend less than fifteen minutes in the bank?

Write Python code to solve this problem.

In [None]:
# your answer here
# Hint: This is the same as saying P(x<15)
# formula: F(a) = 1 - e^(-λa)
import math

result = 1 - math.exp(-15/10)

#result == 0.7769


What is the probability that the customer will spend more than 15 minutes?

In [None]:
# your answer here
result2 = 1 - result
# result2 == 0.2231
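
The same two probabilities can be cross-checked with `scipy.stats.expon`, whose `scale` parameter is the mean (1/λ). This is a sketch of an alternative route, not a replacement for the formula above. The variable names are illustrative.

```python
from scipy.stats import expon

mean_minutes = 10  # scale = 1 / λ

p_less_15 = expon.cdf(15, scale=mean_minutes)  # P(x < 15)
p_more_15 = expon.sf(15, scale=mean_minutes)   # survival function, P(x > 15)
print(round(p_less_15, 4), round(p_more_15, 4))  # 0.7769 0.2231
```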

## Central Limit Theorem

A delivery company needs 35 minutes on average to deliver a package, with a standard deviation of 8 minutes. Suppose that in one day, they deliver 200 packages.

**Hint**: `stats.norm.cdf` can help you find the answers.

#### Step 1: What is the probability that the mean delivery time today is between 30 and 35 minutes?

In [None]:
# your code here
import numpy as np
from scipy.stats import norm

# By the Central Limit Theorem, the mean of n=200 deliveries is approximately normal
# with mean 35 and standard error 8 / sqrt(200), so the sample size enters through scale
n = 200
sample_mean_dist = norm(loc=35, scale=8 / np.sqrt(n))

cumulative_prob_30to35 = sample_mean_dist.cdf(35) - sample_mean_dist.cdf(30)
cumulative_prob_30to35 # approximately 0.5

#### Step 2: What is the probability that in total, it takes more than 115 hours to deliver all 200 packages?

In [None]:
# your code here
# 115 hours to deliver all 200 packages would give us a mean of 34.5 min per package:
mean_per_package = (115 * 60) / n # 34.5

cumulative_prob_115hrs = 1 - sample_mean_dist.cdf(mean_per_package)
cumulative_prob_115hrs # probability of approximately 0.8116
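
A quick simulation can serve as a sanity check on how the sample size of 200 enters these probabilities. The sketch below assumes that individual delivery times are normally distributed (the exercise does not say so) and simulates many days of 200 deliveries each. The number of simulated days is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility

mu, sigma, n = 35, 8, 200
n_days = 20_000  # number of simulated days, an arbitrary choice

# Assumption: individual delivery times are drawn from a normal distribution
daily_means = rng.normal(mu, sigma, size=(n_days, n)).mean(axis=1)

print("std of daily means:", daily_means.std())
print("theoretical standard error:", sigma / np.sqrt(n))
print("P(30 < mean < 35):", np.mean((daily_means > 30) & (daily_means < 35)))
# a total above 115 hours is the same event as a mean above 34.5 minutes
print("P(total > 115 h):", np.mean(daily_means > 34.5))
```

The spread of the daily means should be close to 8 / sqrt(200), far smaller than the 8 minutes of a single delivery, which is why the sample size matters for both questions.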