# Hypothesis Testing Exercises

## Overview Exercises
***For each of the following questions, formulate a null and alternative hypothesis (be as specific as you can be), then give an example of what a true positive, true negative, type I and type II errors would look like. Note that some of the questions are intentionally phrased in a vague way. It is your job to reword these as more precise questions that could be tested.***

- **1. Has the network latency gone up since we switched internet service providers?**


- H<sub>o</sub> : There is no difference in network latency since switching ISPs
- H<sub>*a*</sub> : Switching ISPs has increased network latency


- True Positive: Switching ISPs **has** increased latency and we **rejected** H<sub>o</sub> in testing
- True Negative: Switching ISPs **has not** increased latency and we **failed to reject** H<sub>o</sub> in testing
- Type I Error:  Switching ISPs **has not** increased latency but we **rejected** H<sub>o</sub> in testing
- Type II Error: Switching ISPs **has** increased latency but we **failed to reject** H<sub>o</sub> in testing

- **2. Is the website redesign any good?**

- H<sub>o</sub> : The website redesign has had no impact on the average time users spend on the website
- H<sub>*a*</sub> : The website redesign has increased the average time users spend on the website


- True Positive: Redesign **has** increased avg user time and we **rejected** H<sub>o</sub> in testing
- True Negative: Redesign **has not** increased avg user time and we **failed to reject** H<sub>o</sub> in testing
- Type I Error:  Redesign **has not** increased avg user time but we **rejected** H<sub>o</sub> in testing
- Type II Error: Redesign **has** increased avg user time but we **failed to reject** H<sub>o</sub> in testing

- **3. Is our television ad driving more sales?**

- H<sub>o</sub> : Sales have not increased after airing our TV ad
- H<sub>*a*</sub> : Sales have increased since airing our TV ad 


- True Positive: Sales **have** increased and we **rejected** H<sub>o</sub> in testing
- True Negative: Sales **have not** increased and we **failed to reject** H<sub>o</sub> in testing
- Type I Error:  Sales **have not** increased but we **rejected** H<sub>o</sub> in testing
- Type II Error: Sales **have** increased but we **failed to reject** H<sub>o</sub> in testing
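
The four outcomes in the answers above depend on only two facts: whether the effect is real (the null hypothesis is actually false) and whether the test rejected the null hypothesis. A minimal Python sketch of that mapping follows. The function name `classify_outcome` and its arguments are hypothetical, chosen only for illustration.

```python
def classify_outcome(effect_is_real: bool, rejected_null: bool) -> str:
    """Name the outcome of a hypothesis test given the (usually unknown) truth."""
    if effect_is_real and rejected_null:
        return "True Positive"
    if not effect_is_real and not rejected_null:
        return "True Negative"
    if not effect_is_real and rejected_null:
        return "Type I Error"   # false positive
    return "Type II Error"      # false negative


# Example: sales did not increase, but the test rejected the null hypothesis
print(classify_outcome(effect_is_real=False, rejected_null=True))
```

In practice the first argument is unknown, which is why the significance level is chosen in advance to cap the Type I error rate.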

## T-test Exercises

In [12]:
from math import sqrt
from scipy import stats
from scipy.stats import ttest_ind_from_stats

%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

### Ace Realty wants to determine whether the average time it takes to sell homes is different for its two offices. A sample of 40 sales from office #1 revealed a mean of 90 days and a standard deviation of 15 days. A sample of 50 sales from office #2 revealed a mean of 100 days and a standard deviation of 20 days. Use a .05 level of significance.

In [148]:
#List the data we have 

#Avg time it takes each office to sell homes
office_1_mean = 90
office_2_mean = 100

#Sample standard deviation of the time it takes each office to sell homes
office_1_sd = 15
office_2_sd = 20

In [149]:
#Setup the test and set the confidence level
null_hypothesis = "There is no difference in average time for selling homes between the two Ace Realty offices"
alt_hypothesis = "There is a difference in avg time for selling homes"
confidence_level = .95
a = 1 - confidence_level

In [152]:
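#Use scipy to conduct a two-sample t-test from summary statistics
#(equal_var defaults to True, so the two sample variances are pooled)
t, p = stats.ttest_ind_from_stats(mean1=office_1_mean, std1=office_1_sd, nobs1=40,
                                  mean2=office_2_mean, std2=office_2_sd, nobs2=50)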
t, p

(-2.6252287036468456, 0.01020985244923939)
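
As a sanity check, the t-statistic above can be reproduced by hand from the summary statistics in the exercise. The sketch below assumes the pooled-variance (equal variances) form of the two-sample t-test, which is what `ttest_ind_from_stats` uses by default. The variable names are illustrative.

```python
from math import sqrt

# Summary statistics from the exercise statement
n1, mean1, sd1 = 40, 90, 15
n2, mean2, sd2 = 50, 100, 20

# Pooled variance: weighted average of the two sample variances
df = n1 + n2 - 2
pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df

# Standard error of the difference in means
std_err = sqrt(pooled_var * (1 / n1 + 1 / n2))

t_manual = (mean1 - mean2) / std_err
print(df, round(t_manual, 4))
```

The result should agree with the scipy output to rounding error. The p-value then comes from a t distribution with `df` degrees of freedom.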

In [151]:
if p > a:
    print(f"Fail to reject the null hypothesis: {null_hypothesis}.")
else:
    print(f"Reject the null hypothesis. {alt_hypothesis}.")

Reject the null hypothesis. There is a difference in avg time for selling homes.
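
The two offices have different sample standard deviations (15 vs 20 days), so it may be worth checking that the conclusion does not hinge on the equal-variance assumption. The sketch below computes Welch's t-statistic and the Welch-Satterthwaite degrees of freedom by hand. This is an alternative check, not the method used in the cell above; in scipy the same variant is available by passing `equal_var=False` to `ttest_ind_from_stats`.

```python
from math import sqrt

# Summary statistics from the exercise statement
n1, mean1, sd1 = 40, 90, 15
n2, mean2, sd2 = 50, 100, 20

# Variance of each sample mean
v1 = sd1**2 / n1
v2 = sd2**2 / n2

# Welch's t-statistic does not pool the variances
t_welch = (mean1 - mean2) / sqrt(v1 + v2)

# Welch-Satterthwaite approximation to the degrees of freedom
df_welch = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

print(round(t_welch, 3), round(df_welch, 1))
```

Here the Welch statistic is slightly larger in magnitude than the pooled one and the degrees of freedom are nearly the same, so both versions point to the same decision at the .05 level.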


### Load the mpg dataset and use it to answer the following questions:

In [153]:
from pydataset import data
mpg = data("mpg")
mpg.head()

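| | manufacturer | model | displ | year | cyl | trans | drv | cty | hwy | fl | class |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | audi | a4 | 1.8 | 1999 | 4 | auto(l5) | f | 18 | 29 | p | compact |
| 2 | audi | a4 | 1.8 | 1999 | 4 | manual(m5) | f | 21 | 29 | p | compact |
| 3 | audi | a4 | 2.0 | 2008 | 4 | manual(m6) | f | 20 | 31 | p | compact |
| 4 | audi | a4 | 2.0 | 2008 | 4 | auto(av) | f | 21 | 30 | p | compact |
| 5 | audi | a4 | 2.8 | 1999 | 6 | auto(l5) | f | 16 | 26 | p | compact |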


- **Is there a difference in fuel-efficiency in cars from 2008 vs 1999?**

In [174]:
#Prepare the data:

#Step 1: Calculate the avg mpg
mpg['avg_mpg'] = (mpg.cty + mpg.hwy) / 2
mpg.head()

#Note from demo: when dealing with rates (ratios) like mpg, a harmonic mean may be more appropriate than an arithmetic mean

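| | manufacturer | model | displ | year | cyl | trans | drv | cty | hwy | fl | class_of_car | avg_mpg |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | audi | a4 | 1.8 | 1999 | 4 | auto(l5) | f | 18 | 29 | p | compact | 23.5 |
| 2 | audi | a4 | 1.8 | 1999 | 4 | manual(m5) | f | 21 | 29 | p | compact | 25.0 |
| 3 | audi | a4 | 2.0 | 2008 | 4 | manual(m6) | f | 20 | 31 | p | compact | 25.5 |
| 4 | audi | a4 | 2.0 | 2008 | 4 | auto(av) | f | 21 | 30 | p | compact | 25.5 |
| 5 | audi | a4 | 2.8 | 1999 | 6 | auto(l5) | f | 16 | 26 | p | compact | 21.0 |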
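
The note about the harmonic mean can be illustrated with the first row of the output. If a car drives the same distance in the city and on the highway, the overall miles per gallon is the harmonic mean of `cty` and `hwy`, not the arithmetic mean. The sketch below uses Python's standard-library `statistics.harmonic_mean`. It is an aside and is not used in the tests that follow.

```python
from statistics import harmonic_mean

# City and highway mpg for the first row shown above (audi a4, 1999)
cty, hwy = 18, 29

arithmetic = (cty + hwy) / 2
harmonic = harmonic_mean([cty, hwy])  # equals 2 / (1/cty + 1/hwy)

print(arithmetic, round(harmonic, 2))
```

The harmonic mean is always less than or equal to the arithmetic mean, so the arithmetic `avg_mpg` column slightly overstates combined efficiency under the equal-distance assumption.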


In [175]:
#Step 2: Classify vehicles by the years 2008 and 1999
cars_from_99 = mpg[mpg.year == 1999]
cars_from_08 = mpg[mpg.year == 2008] 

In [176]:
#Setup the test and set the confidence level
null_hypothesis = "There is no difference in fuel-efficiency in cars from 2008 vs 1999"
alt_hypothesis = "There is a difference in fuel-efficiency in cars from 2008 vs 1999"

confidence_level = .95
a = 1 - confidence_level

In [177]:
#Use ttest_ind for a two sample t-test
t, p = stats.ttest_ind(cars_from_99.avg_mpg, cars_from_08.avg_mpg)
t, p

(0.21960177245940962, 0.8263744040323578)

In [178]:
#Use a two-tailed test
if p < a: 
    print("Reject the null hypothesis that there is no difference in fuel-efficiency")
    print("Move forward with the understanding that there is a difference in efficiency between cars from 1999 and 2008")
else:
    print("Fail to reject the null hypothesis.")
    print("Not enough evidence to support a difference in fuel-efficiency.")

Fail to reject the null hypothesis.
Not enough evidence to support a difference in fuel-efficiency.


- **Are compact cars more fuel-efficient than the average car?**

In [179]:
#Rename the class column because class is a reserved keyword in Python, so mpg.class would be a syntax error
mpg = mpg.rename(columns={'class': 'class_of_car'})


In [183]:
#Separate the compact car data sets to compare against the population
compact_cars = mpg[mpg.class_of_car == "compact"]

In [184]:
#Setup the test and set the confidence level
null_hypothesis = "There is no difference in fuel-efficiency for compact vs average cars"
alt_hypothesis = "Compact cars are more efficient than the average car"
confidence_level = .95
a = 1 - confidence_level

In [185]:
#Use ttest_1samp for a one-sample t-test of compact cars against the overall mean
t, p = stats.ttest_1samp(compact_cars.avg_mpg, mpg.avg_mpg.mean())
t,p

(7.896888573132535, 4.1985637943171336e-10)

In [186]:
#One-tailed test: halve the two-tailed p-value and require t > 0 (compact mean above the overall mean)
if p/2 < a and t > 0:
    print("Reject the null hypothesis that states there is no difference in fuel-efficiency")
    print("Move forward with the understanding that compact cars are more efficient than the average car.")
else:
    print(f'Fail to reject the null hypothesis. There is not enough evidence to support the claim: {alt_hypothesis}')

Reject the null hypothesis that states there is no difference in fuel-efficiency
Move forward with the understanding that compact cars are more efficient than the average car.
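
The `p/2 < a and t > 0` check is the manual way to turn scipy's two-tailed p-value into a one-tailed one. Assuming SciPy 1.6 or newer, `ttest_1samp` also accepts an `alternative` argument that returns the one-sided p-value directly. The sketch below uses a small made-up sample (not taken from the mpg dataset) to show that the two approaches agree when `t > 0`.

```python
from scipy import stats

# Made-up sample for illustration only (not taken from the mpg dataset)
sample = [24.0, 25.5, 23.5, 26.0, 24.5, 25.0]
reference_mean = 20.0

# Default is a two-sided test
t_two, p_two = stats.ttest_1samp(sample, reference_mean)

# One-sided test: is the sample mean greater than the reference mean?
t_one, p_one = stats.ttest_1samp(sample, reference_mean, alternative="greater")

# When t > 0, the one-sided p-value is half the two-sided one
print(t_two > 0, abs(p_one - p_two / 2) < 1e-12)
```

With `alternative="greater"` the comparison becomes simply `p_one < a`, with no separate check on the sign of `t`.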


- **Do manual cars get better gas mileage than automatic cars?**

In [193]:
#Prepare the data by grouping manual and automatic cars
def trans_type(x):
    if "auto" in x:
        return "auto"
    else:
        return "manual"

In [194]:
#Label each car as 'auto' (automatic transmission) or 'manual' (manual transmission)
mpg['trans_type'] = mpg.trans.apply(trans_type)
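
The same labeling can also be done without a row-by-row `apply`, using pandas string methods. The sketch below runs on a small Series built from the `trans` values shown in the `mpg.head()` output rather than on the full dataset. The names `trans`, `is_auto`, and `trans_type_col` are illustrative.

```python
import pandas as pd

# Transmission strings copied from the mpg.head() output above
trans = pd.Series(["auto(l5)", "manual(m5)", "manual(m6)", "auto(av)"])

# Vectorized equivalent of the `"auto" in x` check in trans_type
is_auto = trans.str.contains("auto", regex=False)
trans_type_col = is_auto.map({True: "auto", False: "manual"})

print(trans_type_col.tolist())
```

Both approaches produce the same labels. The vectorized form mainly helps on larger datasets.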

In [195]:
#Separate the two data sets for comparison
auto_cars = mpg[mpg.trans_type == 'auto']
manual_cars = mpg[mpg.trans_type == 'manual']

In [196]:
#Setup the test and select a confidence level 
null_hypothesis = "There is no difference in gas mileage for automatic vs manual cars"
alt_hypothesis = "Manual cars get better gas mileage than automatic cars"
confidence_level = .95
a = 1 - confidence_level

In [197]:
#Use ttest_ind for the two sample t-test
t, p = stats.ttest_ind(manual_cars.avg_mpg, auto_cars.avg_mpg)
t, p

(4.593437735750014, 7.154374401145683e-06)

In [192]:
#Since the question is whether manual cars get better gas mileage, use a one-tailed test:
if p/2 < a and t > 0:
    print("Reject the null hypothesis that states there is no difference in gas mileage for automatic vs. manual cars")
    print("Move forward with the understanding that manual cars get better gas mileage than automatic cars.")
else:
    print("Fail to reject the null hypothesis.")
    print(f"There is not enough evidence to support the claim: {alt_hypothesis}")

Reject the null hypothesis that states there is no difference in gas mileage for automatic vs. manual cars
Move forward with the understanding that manual cars get better gas mileage than automatic cars.
