# Overall Question: Is the new bus route improving commute times?

**Scenario**: A new bus route for line X8 is implemented. MTA wants to
know if it improves commute time (travel time at peak hours).
They know what the mean travel time used to be, and measure
the new travel time 100 times. The data is in
https://raw.githubusercontent.com/fedhere/PUI2018_fb55/master/Lab4_fb55/times.txt



# Null Hypothesis ($H_0$):
The commute time with the new bus route is the same as or longer than it was before:
$T_{new} \geq T_{old}$


# Alternative Hypothesis ($H_a$):
The commute time is shorter with the new bus route than it was before: $T_{new} < T_{old}$


# Significance Level
We will use a significance level of 0.05: **$\alpha = 0.05$**

*Note: This is a one-tailed hypothesis test, since the alternative hypothesis specifies a direction ($T_{new} < T_{old}$).*


# Formulas Used

Population distribution of the old travel time: $N(\mu=36, \sigma=6)$

$Z = \frac{\mu_{pop}-\mu_{sample}}{\sigma/\sqrt{N}}$
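
As a quick worked check of the denominator, assuming $N = 100$ measurements (as stated in the scenario) and $\sigma = 6$ (from the population distribution above), the standard error of the sample mean is:

$$\frac{\sigma}{\sqrt{N}} = \frac{6}{\sqrt{100}} = 0.6$$

Under these assumptions, every 0.6 units of difference between $\mu_{pop}$ and $\mu_{sample}$ (in the units of the travel times) corresponds to one unit of $Z$. With this sign convention, a positive $Z$ means the sample mean is below the population mean.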



In [44]:
from __future__ import print_function  # must come before any other import
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
# Data directory, read from the PUIDATA environment variable
PUIDATA = os.getenv("PUIDATA")

In [49]:
#Download the text file and save it in the $PUIDATA directory
!curl https://raw.githubusercontent.com/fedhere/PUI2018_fb55/master/Lab4_fb55/times.txt --output $PUIDATA/times.txt

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  1179  100  1179    0     0   8214      0 --:--:-- --:--:-- --:--:--  8244


In [46]:
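# Load the new travel times (the file has no header row)
times_path = os.path.join(PUIDATA, 'times.txt')
Tnew = pd.read_csv(times_path, header=None)
# Sample mean of the single data column
Tnew_mean = Tnew[0].mean()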

In [48]:
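#Translate the Z-score formula into a function
def Z_score(mu_pop, mu_new, sigma, N):
    '''Z = (mu_pop - mu_new) / (sigma / sqrt(N)), as in the Formulas Used section.'''
    return (mu_pop - mu_new) / (sigma / np.sqrt(N))

# Old-route population: mean 36, standard deviation 6; N is the number of new measurements
Z = Z_score(36, Tnew_mean, 6, Tnew.size)
print('Z Value: ' + str(Z))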

Z Value: 2.55639718617


* With a Z-score of **2.556**, which exceeds the one-tailed critical value of 1.645 for $\alpha = 0.05$, we can reject the null hypothesis.
* Because $Z$ is computed from $\mu_{pop} - \mu_{sample}$, a positive Z-score of this size means that the mean travel time on the new bus route was *significantly lower* than the population mean, at an alpha level of 0.05.
* Consulting a Z table shows that the probability of obtaining a Z-score at least this large if the null hypothesis were true is ~0.005, which is much lower than our *significance threshold*.
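
The Z-table lookup can be cross-checked numerically. The sketch below is not part of the original analysis: it assumes the Z value printed above, and the helper name `one_tailed_p_value` is hypothetical. It uses the standard-library `math.erfc` to compute the upper-tail probability of a standard normal variable, then compares it with $\alpha$.

```python
from __future__ import print_function
import math

def one_tailed_p_value(z):
    """Upper-tail probability P(Z >= z) for a standard normal variable."""
    # For a standard normal, P(Z >= z) = 0.5 * erfc(z / sqrt(2))
    return 0.5 * math.erfc(z / math.sqrt(2))

ALPHA = 0.05                # significance level chosen above
z_observed = 2.55639718617  # Z value printed by the notebook (copied by hand)

p_value = one_tailed_p_value(z_observed)
print('p-value: {:.4f}'.format(p_value))
print('Reject H0: {}'.format(p_value < ALPHA))
```

Using `math.erfc` keeps the check free of extra dependencies. If SciPy is available, `scipy.stats.norm.sf(z_observed)` should give the same upper-tail probability.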