# Analysis of the New Bus Route for the X8 Line

## Overall question:

Is the average commute time on the new route shorter than on the old route?

## Null Hypothesis ($H_0$):

The average commute time for the new route ($T_{new}$) is the same as or longer than the average commute time of the old route ($T_{old}$).

$H_0$: $T_{new} - T_{old} \geq 0$

## Alternative Hypothesis ($H_a$):

The average commute time for the new route is shorter than the average commute time of the old route.

$H_a$: $T_{new} - T_{old} < 0$

## Testing:

I will set the significance level at $\alpha$ = .05 when testing the sample from the new bus route. An $\alpha$ value this high seems reasonable (or possibly too high) given that the data come from an uncontrolled, real-world environment in which many outside factors could affect the speed of the X8 along the new route.

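I will compute the Z-score of the new route's sample mean to determine whether it differs significantly from the mean of the standard route for the X8 line. In the formula below, $\mu_{pop}$ is the mean commute time on the old route, $\mu_{sample}$ is the mean of the sampled commute times on the new route, $\sigma$ is the population standard deviation, and $N$ is the sample size. Because the sample mean is subtracted from the population mean, a positive Z indicates that the new route is faster on average.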

$Z = \frac{\mu_{pop}-\mu_{sample}}{\sigma/\sqrt{N}}$
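The formula above can be sketched as a small helper function. This is only an illustration: the function name `z_statistic` and the example inputs (a sample mean of 34 from 36 trips) are hypothetical and were chosen so the arithmetic is easy to follow; they are not values from the X8 data.

```python
import math


def z_statistic(pop_mean, pop_sd, sample_mean, n):
    """Z statistic using the (population - sample) sign convention."""
    # Standard error of the mean: sigma / sqrt(N)
    standard_error = pop_sd / math.sqrt(n)
    return (pop_mean - sample_mean) / standard_error


# Hypothetical inputs, not taken from the X8 sample
z_example = z_statistic(pop_mean=36, pop_sd=6, sample_mean=34, n=36)
print("Example Z score:", z_example)
```

With these made-up inputs the standard error is 6/6 = 1, so the Z score is simply the difference between the two means.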

In [1]:
# Set the environment variable PUIdata for this notebook
import os

os.environ["PUIdata"] = "{}/PUIdata".format(os.getenv("HOME"))

In [2]:
# Import libraries
from __future__ import print_function, division
import os
import json
import pylab as pl
import pandas as pd
import numpy as np

%pylab inline

# Warn if the required environment variables are missing
if os.getenv('PUI2018') is None:
    print("Must set env variable PUI2018")
if os.getenv('PUIdata') is None:
    print("Must set env variable PUIdata")

Populating the interactive namespace from numpy and matplotlib
Must set env variable PUI2018


In [3]:
# Download the bus data into the PUIdata directory
!curl --url https://raw.githubusercontent.com/fedhere/PUI2018_fb55/master/Lab4_fb55/times.txt \
    --output $PUIdata/times.txt

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  1179  100  1179    0     0   7481      0 --:--:-- --:--:-- --:--:--  7509


In [4]:
# Read the bus data into a dataframe
x8 = pd.read_csv(os.getenv("PUIdata") + '/' + 'times.txt', sep=" ", header=None)
x8.columns = ['times']

In [5]:
# Preview of the first 5 rows
x8.head(5)

|   | times |
|---|-----------|
| 0 | 31.622239 |
| 1 | 32.821376 |
| 2 | 30.229101 |
| 3 | 31.413766 |
| 4 | 39.01055 |


In [6]:
# Get the count and mean of the travel times
sampm = x8['times'].mean()
n = x8['times'].count()

In [7]:
# Calculate the z statistic
popm = 36   # population mean (old route)
popsd = 6   # population standard deviation (old route)
z = (popm - sampm) / (popsd / np.sqrt(n))
zround = round(z, 2)
print("Z score: " + str(zround))

Z score: 2.56


## Conclusion

Because the alternative hypothesis is one-sided, the $\alpha$ value of .05 corresponds to a critical Z value of approximately 1.645 standard errors from our population mean of 36 (the familiar threshold of roughly 2 applies to a two-sided test). Our calculated Z score of 2.56 exceeds this critical value, so the mean travel time on the new X8 route is significantly lower than the population mean. Therefore, we reject our null hypothesis: the data support the conclusion that the new bus route reduces average travel time for passengers on the X8.
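As a check on the decision rule, the following sketch computes the one-sided critical value and p-value for the reported Z score of 2.56. It uses `statistics.NormalDist` from the Python standard library, which assumes Python 3.8 or later (the notebook itself does not use it). The variable names are illustrative.

```python
from statistics import NormalDist

alpha = 0.05
z_observed = 2.56  # Z score reported in the notebook output

standard_normal = NormalDist()  # mean 0, standard deviation 1

# One-sided critical value: the Z above which only alpha of the mass lies
z_critical = standard_normal.inv_cdf(1 - alpha)

# One-sided p-value: probability of a Z at least this large under H0
p_value = 1 - standard_normal.cdf(z_observed)

reject_null = z_observed > z_critical

print("Critical Z (one-sided):", round(z_critical, 3))
print("p-value:", round(p_value, 4))
print("Reject H0:", reject_null)
```

The p-value comes out well below .05, which agrees with comparing the Z score against the critical value directly.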