# Hypothesis Testing Exercises

In [73]:
from scipy import stats
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

## 1. Answer with the type of test you would use (assume normal distribution):

### a. Is there a difference in grades of students on the second floor compared to grades of all students?

Independent t-test

Instructor answer: one-sample, two-tailed t-test

### b. Are adults who drink milk taller than adults who don't drink milk?

One-sample t-test

Instructor answer: independent, one-tailed t-test

### c. Is the price of gas higher in Texas or in New Mexico?

One-sample t-test

Instructor answer: independent, one-tailed t-test

### d. Are there differences in stress levels between students who take data science vs students who take web development vs students who take cloud academy?

ANOVA test
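As a sketch of how the test types above map onto SciPy, here is one call per scenario. The samples are seeded, randomly generated stand-ins (hypothetical numbers, not real grades, heights, gas prices, or stress scores). The direction chosen for (c) is an assumption for illustration, and the `alternative` argument assumes SciPy 1.6 or newer.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)  # seeded, hypothetical data

# a. One-sample, two-tailed: second-floor grades vs. the mean of all students
all_grades_mean = 80.0  # hypothetical population mean
second_floor = rng.normal(82, 10, 30)
t_a, p_a = stats.ttest_1samp(second_floor, all_grades_mean)

# b. Independent, one-tailed: HA = milk drinkers are taller than non-drinkers
milk = rng.normal(172, 8, 50)
no_milk = rng.normal(170, 8, 50)
t_b, p_b = stats.ttest_ind(milk, no_milk, alternative='greater')

# c. Independent, one-tailed: HA = gas costs more in Texas (direction assumed)
texas = rng.normal(3.1, 0.2, 40)
new_mexico = rng.normal(3.0, 0.2, 40)
t_c, p_c = stats.ttest_ind(texas, new_mexico, alternative='greater')

# d. One-way ANOVA across three independent groups
data_science = rng.normal(6.0, 2, 25)
web_dev = rng.normal(6.5, 2, 25)
cloud = rng.normal(5.5, 2, 25)
f_d, p_d = stats.f_oneway(data_science, web_dev, cloud)

for name, p in [('a', p_a), ('b', p_b), ('c', p_c), ('d', p_d)]:
    print(name, round(float(p), 4))
```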

## 2. Ace Realty wants to determine whether the average time it takes to sell homes is different for its two offices. A sample of 40 sales from office #1 revealed a mean of 90 days and a standard deviation of 15 days. A sample of 50 sales from office #2 revealed a mean of 100 days and a standard deviation of 20 days. Use a .05 level of significance.

H0 = The mean time to sell a home at office #1 == the mean time to sell a home at office #2.

HA = The mean time to sell a home at office #1 != the mean time to sell a home at office #2.
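Because the problem gives only summary statistics, SciPy's `stats.ttest_ind_from_stats` can run Welch's t-test on them directly, without simulating samples. This is a minimal sketch, assuming the quoted standard deviations are sample standard deviations.

```python
from scipy import stats

# Summary statistics from the problem statement
mean_1, sd_1, n_1 = 90, 15, 40   # office #1
mean_2, sd_2, n_2 = 100, 20, 50  # office #2
alpha = 0.05

# Welch's t-test (equal_var=False). The default alternative is two-sided,
# which matches HA: the means are not equal.
t, p = stats.ttest_ind_from_stats(mean_1, sd_1, n_1,
                                  mean_2, sd_2, n_2,
                                  equal_var=False)
print(f't = {t:.3f}, p = {p:.4f}')
print('We reject the null' if p < alpha else 'We fail to reject the null')
```

Working from the exact summary numbers avoids the run-to-run variation that comes from drawing random samples.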



In [16]:
office_a = np.random.normal(90, 15, 40)
office_b = np.random.normal(100, 20, 50)

# Instructor answer: generate the samples with scipy, then run Levene's test
# to decide whether the variances can be treated as equal:
#   office_1 = stats.norm(90, 15).rvs(40)
#   office_2 = stats.norm(100, 20).rvs(50)
#   stat, p_val = stats.levene(office_1, office_2)
#   if p_val < 0.05:
#       print('We reject H0 ==> unequal variance')
#   else:
#       print('We fail to reject H0 ==> equal variance')
print(office_a.var())
print(office_b.var())

146.80842818538014
399.38322533367705
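The instructor's variance check can be made reproducible by seeding the generator. This sketch assumes NumPy's `default_rng` API. The seed value is arbitrary, so the variances it produces will differ from the output above.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(123)  # arbitrary seed for reproducibility
office_1 = rng.normal(90, 15, 40)
office_2 = rng.normal(100, 20, 50)

# Levene's test: H0 = the two samples have equal variances
stat, p_val = stats.levene(office_1, office_2)
equal_var = bool(p_val >= 0.05)
print(f'Levene p = {p_val:.4f}, treat variances as equal: {equal_var}')

# Feed the result into the t-test choice (Welch's test when variances differ)
t, p = stats.ttest_ind(office_1, office_2, equal_var=equal_var)
print(f't = {t:.4f}, p = {p:.4f}')
```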


In [72]:
alpha = .05
# HA is two-sided (!=), so compare the full two-tailed p-value with alpha.
t, p = stats.ttest_ind(office_a, office_b, equal_var=False)
print(f't = {t:.4f}, p = {p:.4f}')

if p < alpha:
    print('We reject the null')
else:
    print('We fail to reject the null')

t = -2.5959, p = 0.0112
We reject the null
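The one-tailed cells in this notebook halve the two-sided p-value and check the sign of t. That is equivalent to passing `alternative` to `stats.ttest_ind` (SciPy 1.6+). Here is a short check on seeded, hypothetical samples.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
a = rng.normal(90, 15, 40)
b = rng.normal(100, 20, 50)

t, p_two = stats.ttest_ind(a, b, equal_var=False)
_, p_less = stats.ttest_ind(a, b, equal_var=False, alternative='less')
_, p_greater = stats.ttest_ind(a, b, equal_var=False, alternative='greater')

# The one-tailed p in the direction of t is half the two-sided p;
# the opposite direction gets 1 - p_two / 2.
if t < 0:
    print(np.isclose(p_less, p_two / 2), np.isclose(p_greater, 1 - p_two / 2))
else:
    print(np.isclose(p_greater, p_two / 2), np.isclose(p_less, 1 - p_two / 2))
```

Passing `alternative` states the direction of HA explicitly, which is harder to get wrong than a separate sign check.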


In [2]:
from pydataset import data

## 3. Load the mpg dataset and use it to answer the following questions:

In [25]:
mpg = data('mpg')
mpg

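| | manufacturer | model | displ | year | cyl | trans | drv | cty | hwy | fl | class |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | audi | a4 | 1.8 | 1999 | 4 | auto(l5) | f | 18 | 29 | p | compact |
| 2 | audi | a4 | 1.8 | 1999 | 4 | manual(m5) | f | 21 | 29 | p | compact |
| 3 | audi | a4 | 2.0 | 2008 | 4 | manual(m6) | f | 20 | 31 | p | compact |
| 4 | audi | a4 | 2.0 | 2008 | 4 | auto(av) | f | 21 | 30 | p | compact |
| 5 | audi | a4 | 2.8 | 1999 | 6 | auto(l5) | f | 16 | 26 | p | compact |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 230 | volkswagen | passat | 2.0 | 2008 | 4 | auto(s6) | f | 19 | 28 | p | midsize |
| 231 | volkswagen | passat | 2.0 | 2008 | 4 | manual(m6) | f | 21 | 29 | p | midsize |
| 232 | volkswagen | passat | 2.8 | 1999 | 6 | auto(l5) | f | 16 | 26 | p | midsize |
| 233 | volkswagen | passat | 2.8 | 1999 | 6 | manual(m5) | f | 18 | 26 | p | midsize |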


### a. Is there a difference in fuel efficiency between cars from 2008 and cars from 1999?

H0 = Fuel efficiency in cars from 2008 == the fuel efficiency of cars in 1999.

HA = Fuel efficiency in cars from 2008 != the fuel efficiency of cars in 1999.

The mpg dataset has two fuel-efficiency measures: city (`cty`) and highway (`hwy`). These tests use the per-car arithmetic mean of the two as the fuel-efficiency value.
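The arithmetic mean of `cty` and `hwy` is one choice. Because mpg is a rate (miles per gallon), the harmonic mean equals the overall mpg for a trip split into equal distances of city and highway driving. The comparison below uses the first two rows of the table above. The 100-mile trip length is an arbitrary assumption; any equal split gives the same result.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Rows 1 and 2 of the mpg table above (cty, hwy)
cars = pd.DataFrame({'cty': [18.0, 21.0], 'hwy': [29.0, 29.0]})

arith = cars[['cty', 'hwy']].mean(axis=1)
harm = stats.hmean(cars[['cty', 'hwy']], axis=1)

# Overall mpg for 100 city miles plus 100 highway miles
miles = 100
gallons = miles / cars['cty'] + miles / cars['hwy']
trip_mpg = 2 * miles / gallons

print(arith.tolist())
print(np.round(harm, 4))
print(trip_mpg.round(4).tolist())
```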

In [78]:
cars_2008_fe = mpg[mpg.year == 2008][['cty', 'hwy']].mean(axis=1)
cars_1999_fe = mpg[mpg.year == 1999][['cty', 'hwy']].mean(axis=1)

# Instructor answer: use the harmonic mean, since mpg is a rate:
#   mpg['avg_fe'] = stats.hmean(mpg[['cty', 'hwy']], axis=1)
print(cars_2008_fe.var())
print(cars_1999_fe.var())

# Instructor answer: the variances do not have to be identical to use the
# default equal_var=True in ttest_ind; they just need to be close.

24.097480106100797
27.122605363984682


In [77]:
alpha = .05
# HA is two-sided (!=), so compare the full two-tailed p-value with alpha.
t, p = stats.ttest_ind(cars_2008_fe, cars_1999_fe, equal_var=False)
# Instructor answer: t, p = stats.ttest_ind(cars_2008_fe, cars_1999_fe)
print(f't = {t:.4f}, p = {p:.4f}')

if p < alpha:
    print('We reject the null')
else:
    print('We fail to reject the null')

t = -0.2196, p = 0.8264
We fail to reject the null


### b. Are compact cars more fuel-efficient than the average car?

H0 = Fuel efficiency of compact cars == the fuel efficiency of the average car.

HA = Fuel efficiency of compact cars > the fuel efficiency of the average car.
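The instructor treats this as a one-sample, one-tailed test that compares compact cars against the mean fuel efficiency of all cars. This sketch assumes the `cty`, `hwy`, and `class` columns of the mpg data. The helper name `compact_vs_overall` and the toy rows in the usage example are hypothetical.

```python
import pandas as pd
from scipy import stats

def compact_vs_overall(df, alpha=0.05):
    """One-sample, one-tailed t-test.

    HA: the mean fuel efficiency of compact cars is greater than the
    mean over all cars (the 'average car').
    """
    fe = df[['cty', 'hwy']].mean(axis=1)   # per-car average efficiency
    overall_mean = fe.mean()                # mean over every car
    compact = fe[df['class'] == 'compact']
    t, p = stats.ttest_1samp(compact, overall_mean, alternative='greater')
    return t, p, bool(p < alpha)

# Toy rows (hypothetical numbers) just to exercise the function
toy = pd.DataFrame({
    'cty':   [21, 22, 20, 23, 15, 14, 16, 13],
    'hwy':   [30, 31, 29, 32, 20, 19, 22, 18],
    'class': ['compact'] * 4 + ['suv'] * 4,
})
t, p, reject = compact_vs_overall(toy)
print(t, p, reject)
```

On the real data, calling `compact_vs_overall(mpg)` would run the instructor's version of the test.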

In [69]:
cars_compact_fe = mpg[mpg['class'] == 'compact'][['cty', 'hwy']].mean(axis=1)
# Comparison group: every car that is not compact
cars_average_fe = mpg[mpg['class'] != 'compact'][['cty', 'hwy']].mean(axis=1)

# Instructor answer: compare compact cars to the mean of all cars with a
# one-sample test, e.g. mean = mpg.avg_fuel.mean()
print(cars_compact_fe.var())
print(cars_average_fe.var())

t, p = stats.ttest_ind(cars_compact_fe, cars_average_fe, equal_var=False)
print(t, p/2)

if p / 2 > alpha:
    print('We fail to reject the null')
elif t < 0:  # one-tailed (HA: compact > average), so only t > 0 can reject
    print('We fail to reject the null')
else:
    print('We reject the null')

12.442876965772433
23.652794548904602
8.128810422808078 8.009030328061511e-13
We reject the null


### c. Do manual cars get better gas mileage than automatic cars?

H0 = Fuel efficiency of manual cars == the fuel efficiency of automatic cars.

HA = Fuel efficiency of manual cars > the fuel efficiency of automatic cars.

Note: practice correctly identifying whether a test is one- or two-tailed, and which type of test to run.
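One way to practice this categorization is to make the tail explicit in code. The hypothetical helper `decide` takes a t-statistic and the two-sided p-value that `stats.ttest_ind` returns by default. It then applies the rejection rule for the stated HA, mirroring the `p / 2` and sign-of-t logic used in this notebook's cells.

```python
def decide(t, p_two_sided, alpha=0.05, tail='two-sided'):
    """Return True if H0 is rejected.

    tail: 'two-sided' (HA: !=), 'greater' (HA: first mean > second),
          or 'less' (HA: first mean < second).
    """
    if tail == 'two-sided':
        return p_two_sided < alpha
    if tail == 'greater':
        # One-tailed: t must point the way HA says, and p / 2 must beat alpha
        return t > 0 and p_two_sided / 2 < alpha
    if tail == 'less':
        return t < 0 and p_two_sided / 2 < alpha
    raise ValueError(f'unknown tail: {tail!r}')

# Problem 2's numbers: t ≈ -2.596, two-sided p ≈ 0.0112
print(decide(-2.596, 0.0112, tail='two-sided'))
print(decide(-2.596, 0.0112, tail='greater'))
print(decide(-2.596, 0.0112, tail='less'))
```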

In [63]:
cars_manual_fe = mpg[mpg.trans.str.startswith('m')][['cty', 'hwy']].mean(axis=1)
cars_auto_fe = mpg[mpg.trans.str.startswith('a')][['cty', 'hwy']].mean(axis=1)
print(cars_manual_fe.var())
print(cars_auto_fe.var())

26.635167464114826
21.942777233382337


In [71]:
t, p = stats.ttest_ind(cars_manual_fe, cars_auto_fe, equal_var=False)
print(t, p/2)

if p / 2 > alpha:
    print('We fail to reject the null')
elif t < 0:
    print('We fail to reject the null')
else:
    print('We reject the null')

4.443514012903072 8.976124499958947e-06
We reject the null
