# Hypothesis Testing

For each of the following questions, formulate a null and alternative hypothesis (be as specific as you can be), then give an example of what a true positive, true negative, type I and type II errors would look like. Note that some of the questions are intentionally phrased in a vague way. It is your job to reword these as more precise questions that could be tested.

### Has the network latency gone up since we switched internet service providers?

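Null hypothesis: Average network latency is the same with the new internet service provider as it was with the old one

Alternative hypothesis: Average network latency is higher with the new internet service provider (our new internet is giving me lag! T_T)

True positive: We conclude that latency went up after the switch, and it really did

True negative: We conclude that latency did not go up after the switch, and it really didn't; any lag is due to factors outside of our choice of service provider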

Type I error: I knew we shouldn't have switched our ISP! Now we're lagging! #But it's not the ISP, bruh

Type II error: No, just got off the phone with our ISP. They're saying we're good. #BUT IT IS THE ISP BRUH


### Is the website redesign any good?

Null hypothesis: Web performance is the same with the new design of the website as with the old one

Alternative hypothesis: Web performance is better with the new design (our redo is killin' it!)

True positive: We conclude that the new design improves web performance, and it really does

True negative: We conclude that the new design does not affect web performance, and it really doesn't

Type I error: Web performance is awesome thanks to the website design team! #but good web performance is due to other factors

Type II error: Our web performance went up due to other factors. #But it was really the outstanding new design

### Is our television ad driving more sales?

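Null hypothesis: Average sales are the same with and without our television ad

Alternative hypothesis: Average sales are higher when our television ad is running (our ad is pulling the cash in!)

True positive: We conclude that our ad increases sales, and it really does

True negative: We conclude that our ad does not increase sales, and it really doesn't; any increase in sales is due to other factors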

Type I error: Look at all these profits thanks to the new ad! #But I told my fanbase about the product 2 weeks ago, and they been mad posting with the product for a week

Type II error: Our sales are just going up, I can't find enough evidence to say our ad affects sales. #But no one knew about our product before the ad boiiiiiiiiiiiiii
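To make the two error types concrete, here is a minimal simulation sketch. The group means, standard deviation, sample size, and number of trials are all made-up illustration values, not data from any of the questions above. When the null hypothesis is true, the share of rejections approximates the Type I error rate (it should land near α); when the null is false, the share of non-rejections approximates the Type II error rate.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)  # fixed seed so the sketch is repeatable
alpha = 0.05
n_trials, n = 2000, 30           # hypothetical: 2000 experiments, 30 observations per group

# Case 1: the null is TRUE (both groups share the same mean).
a = rng.normal(50, 10, size=(n_trials, n))
b = rng.normal(50, 10, size=(n_trials, n))
_, p_null = stats.ttest_ind(a, b, axis=1)
type_1_rate = np.mean(p_null < alpha)   # rejected a true null: false positives

# Case 2: the null is FALSE (the second group really is 5 units higher).
c = rng.normal(55, 10, size=(n_trials, n))
_, p_alt = stats.ttest_ind(a, c, axis=1)
type_2_rate = np.mean(p_alt >= alpha)   # failed to reject a false null: misses

print(f"Type I error rate:  {type_1_rate:.3f}")
print(f"Type II error rate: {type_2_rate:.3f}")
```

The Type II rate depends on the size of the true difference and on the sample size, so it has no fixed target the way the Type I rate does.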

# T-Test Exercises

In [1]:
import numpy as np
import scipy.stats as stats
import pandas as pd
import matplotlib.pyplot as plt
from pydataset import data

## Exercise 1
Ace Realty wants to determine whether the average time it takes to sell homes is different for its two offices. A sample of 40 sales from office #1 revealed a mean of 90 days and a standard deviation of 15 days. A sample of 50 sales from office #2 revealed a mean of 100 days and a standard deviation of 20 days. Use a .05 level of significance.

In [2]:
#null hypothesis: The average time it takes to sell a home is the same for office #1 and office #2
#alternate hypothesis: The average time it takes to sell a home is different for the two offices

In [3]:
α = .05
μ1 = 90
σ1 = 15
μ2 = 100
σ2 = 20
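#simulate samples from the summary statistics; these are random draws, so the numbers below change on every run
office_1 = np.random.normal(μ1, σ1, 40)
office_2 = np.random.normal(μ2, σ2, 50)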

In [4]:
office_1.var(), office_2.var() 

(163.71188074678173, 419.0918666447691)

In [5]:
office_2.var() / office_1.var()
#variance ratio is less than 4

2.559935569324938

In [6]:
#tentative: t-test on the simulated samples, so the result changes from run to run
t, p = stats.ttest_ind(office_1, office_2, equal_var = True)
t, p, α

(-4.260784992739686, 5.093602523912023e-05, 0.05)

In [7]:
if p < α:
    print('REJECTED: Null hypothesis')
else:
    print('Failed to reject: Null hypothesis')

REJECTED: Null hypothesis


In [8]:
#for sure: run the test directly on the summary statistics from the problem, no simulation needed
t, p = stats.ttest_ind_from_stats(μ1, σ1, 40, μ2, σ2, 50)
t, p, α

(-2.6252287036468456, 0.01020985244923939, 0.05)

In [9]:
if p < α:
    print('REJECTED: Null hypothesis')
else:
    print('Failed to reject: Null hypothesis')

REJECTED: Null hypothesis
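As a cross-check, the pooled-variance t statistic can be reproduced by hand from the summary statistics in the problem. The sketch below also runs Welch's version (`equal_var=False`), which does not assume equal variances; that is an extra comparison, not part of the original exercise. Variable names here are illustrative.

```python
import math
from scipy import stats

# Summary statistics from the exercise (sample means, sample standard deviations, sample sizes)
mean1, sd1, n1 = 90, 15, 40
mean2, sd2, n2 = 100, 20, 50

# Pooled-variance (Student's) two-sample t-test, computed by hand
df = n1 + n2 - 2
pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
std_err = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
t_manual = (mean1 - mean2) / std_err
p_manual = 2 * stats.t.sf(abs(t_manual), df)  # two-tailed p-value

# Welch's t-test from the same summary statistics (no equal-variance assumption)
t_welch, p_welch = stats.ttest_ind_from_stats(
    mean1, sd1, n1, mean2, sd2, n2, equal_var=False
)

print(t_manual, p_manual)
print(t_welch, p_welch)
```

The hand-computed values should agree with the `ttest_ind_from_stats` output above, and both versions of the test lead to the same decision at α = .05.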


In [10]:
#We reject the null hypothesis: the average time it takes to sell a home is different for the two offices

## Exercise 2
#### Load the mpg dataset and use it to answer the following questions:

Is there a difference in fuel-efficiency in cars from 2008 vs 1999?

Are compact cars more fuel-efficient than the average car?

Do manual cars get better gas mileage than automatic cars?

#### A. Is there a difference in fuel-efficiency in cars from 2008 vs 1999?

In [11]:
mpg = data('mpg')

In [12]:
mpg.head()

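|   | manufacturer | model | displ | year | cyl | trans | drv | cty | hwy | fl | class |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | audi | a4 | 1.8 | 1999 | 4 | auto(l5) | f | 18 | 29 | p | compact |
| 2 | audi | a4 | 1.8 | 1999 | 4 | manual(m5) | f | 21 | 29 | p | compact |
| 3 | audi | a4 | 2.0 | 2008 | 4 | manual(m6) | f | 20 | 31 | p | compact |
| 4 | audi | a4 | 2.0 | 2008 | 4 | auto(av) | f | 21 | 30 | p | compact |
| 5 | audi | a4 | 2.8 | 1999 | 6 | auto(l5) | f | 16 | 26 | p | compact |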


In [13]:
mpg['average_mileage'] = mpg[['cty', 'hwy']].mean(axis = 1)
mpg.head()

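|   | manufacturer | model | displ | year | cyl | trans | drv | cty | hwy | fl | class | average_mileage |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | audi | a4 | 1.8 | 1999 | 4 | auto(l5) | f | 18 | 29 | p | compact | 23.5 |
| 2 | audi | a4 | 1.8 | 1999 | 4 | manual(m5) | f | 21 | 29 | p | compact | 25.0 |
| 3 | audi | a4 | 2.0 | 2008 | 4 | manual(m6) | f | 20 | 31 | p | compact | 25.5 |
| 4 | audi | a4 | 2.0 | 2008 | 4 | auto(av) | f | 21 | 30 | p | compact | 25.5 |
| 5 | audi | a4 | 2.8 | 1999 | 6 | auto(l5) | f | 16 | 26 | p | compact | 21.0 |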


In [14]:
models_2008 = mpg[mpg.year == 2008]
models_1999 = mpg[mpg.year == 1999]
models_1999.head()

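|   | manufacturer | model | displ | year | cyl | trans | drv | cty | hwy | fl | class | average_mileage |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | audi | a4 | 1.8 | 1999 | 4 | auto(l5) | f | 18 | 29 | p | compact | 23.5 |
| 2 | audi | a4 | 1.8 | 1999 | 4 | manual(m5) | f | 21 | 29 | p | compact | 25.0 |
| 5 | audi | a4 | 2.8 | 1999 | 6 | auto(l5) | f | 16 | 26 | p | compact | 21.0 |
| 6 | audi | a4 | 2.8 | 1999 | 6 | manual(m5) | f | 18 | 26 | p | compact | 22.0 |
| 8 | audi | a4 quattro | 1.8 | 1999 | 4 | manual(m5) | 4 | 18 | 26 | p | compact | 22.0 |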


In [15]:
#null hypothesis: There is no difference in average fuel efficiency between cars from 2008 and cars from 1999
#alternate hypothesis: Average fuel efficiency is different for cars from 2008 than for cars from 1999

In [16]:
α = .05
models_1999.average_mileage.var(), models_2008.average_mileage.var()
#variance ratio is less than 4, so we treat the variances as roughly equal

(27.122605363984682, 24.097480106100797)
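The "variance ratio under 4" check is a rule of thumb. One alternative is Levene's test, which SciPy provides as `stats.levene`. The sketch below is an assumption-laden illustration: the helper name `variances_look_equal` is hypothetical, and the arrays are synthetic stand-ins rather than the mpg data.

```python
import numpy as np
from scipy import stats

def variances_look_equal(a, b, alpha=0.05):
    """Return (variance ratio, Levene p-value, True if we fail to reject equal variances)."""
    var_a, var_b = np.var(a, ddof=1), np.var(b, ddof=1)
    ratio = max(var_a, var_b) / min(var_a, var_b)  # larger over smaller, so always >= 1
    _, p = stats.levene(a, b)                      # null hypothesis: the variances are equal
    return ratio, p, bool(p >= alpha)

rng = np.random.default_rng(1)
base = rng.normal(20, 5, size=100)     # synthetic sample
shifted = base + 10                    # same spread, different mean
wide = rng.normal(20, 25, size=100)    # much larger spread

ratio_same, p_same, ok_same = variances_look_equal(base, shifted)
ratio_wide, p_wide, ok_wide = variances_look_equal(base, wide)
print(ratio_same, p_same, ok_same)
print(ratio_wide, p_wide, ok_wide)
```

When the check fails, passing `equal_var=False` to `stats.ttest_ind` switches to Welch's t-test, which does not assume equal variances.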

In [17]:
t, p = stats.ttest_ind(models_2008.average_mileage, models_1999.average_mileage)
t, p, α

(-0.21960177245940962, 0.8263744040323578, 0.05)

In [18]:
if p < α:
    print('REJECTED: Null hypothesis')
else:
    print('Failed to reject: Null hypothesis')

Failed to reject: Null hypothesis


In [19]:
#We fail to reject the null hypothesis: we found no evidence of a difference in fuel efficiency between 2008 cars and 1999 cars

#### B. Are compact cars more fuel-efficient than the average car?

In [20]:
#null hypothesis: compact cars and average cars have the same fuel efficiency
#alternate hypothesis: compact cars are more fuel-efficient than the average car
α = .05
compact_cars = mpg[mpg['class'] == 'compact']
mpg.average_mileage.mean()

20.14957264957265

In [21]:
t, p = stats.ttest_1samp(compact_cars.average_mileage, mpg.average_mileage.mean())
t, p, α

(7.896888573132535, 4.1985637943171336e-10, 0.05)

In [22]:
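#one-tailed test: halve the two-tailed p-value and check that t points in the hypothesized direction
if p / 2 < α and t > 0: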
    print('REJECTED: Null hypothesis')
else:
    print('Failed to reject: Null hypothesis')

REJECTED: Null hypothesis


In [23]:
#We reject the null hypothesis: compact cars are more fuel-efficient than the average car
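Because the alternative hypothesis here is directional ("more fuel-efficient"), a one-tailed test fits the question. Recent SciPy versions (1.6 and later) accept an `alternative` argument on `stats.ttest_1samp`. The sketch below uses a synthetic sample and a made-up population mean, not the mpg data, to show that `alternative='greater'` gives half the two-tailed p-value when t is positive.

```python
import math
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sample = rng.normal(24, 3, size=47)  # synthetic stand-in for one group's mileage
popmean = 20.15                      # made-up overall mean for illustration
alpha = 0.05

# Default two-tailed test
t_two, p_two = stats.ttest_1samp(sample, popmean)

# One-tailed test: is the sample mean GREATER than popmean?
t_one, p_greater = stats.ttest_1samp(sample, popmean, alternative='greater')

print(t_two, p_two)
print(t_one, p_greater)
print(math.isclose(p_greater, p_two / 2, rel_tol=1e-6))  # halving rule holds because t > 0
```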

#### C. Do manual cars get better gas mileage than automatic cars?

In [24]:
#null hypothesis: automatic cars and manual cars have the same gas mileage
#alternate hypothesis: manual cars have better gas mileage than automatic cars
α = .05
manual_average = mpg[mpg.trans.str.contains('manual')].average_mileage
auto_average = mpg[mpg.trans.str.contains('auto')].average_mileage
manual_average.var(), auto_average.var()
#variances are roughly the same (variance1:variance2 < 4)

(26.635167464114826, 21.942777233382337)

In [25]:
t, p = stats.ttest_ind(manual_average, auto_average)
t, p, α

(4.593437735750014, 7.154374401145683e-06, 0.05)

In [26]:
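#one-tailed test: halve the two-tailed p-value and check that t points in the hypothesized direction
if p / 2 < α and t > 0: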
    print('REJECTED: Null hypothesis')
else:
    print('Failed to reject: Null hypothesis')

REJECTED: Null hypothesis


In [27]:
#We reject the null hypothesis: manual cars get better gas mileage than automatic cars
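For a directional two-sample question like this one, the order of the arguments matters. A minimal sketch, assuming SciPy 1.6 or later and using synthetic groups (the means, spreads, and sizes are invented, not taken from mpg): with `alternative='greater'` the test asks whether the first sample's mean is larger than the second's, and flipping to `'less'` gives the complementary p-value.

```python
import math
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
manual = rng.normal(25, 5, size=77)   # synthetic "manual" mileage
auto = rng.normal(19, 5, size=157)    # synthetic "automatic" mileage
alpha = 0.05

# Alternative hypothesis: mean(manual) > mean(auto)
t, p_greater = stats.ttest_ind(manual, auto, alternative='greater')

# Same data, opposite direction: mean(manual) < mean(auto)
_, p_less = stats.ttest_ind(manual, auto, alternative='less')

print(t, p_greater, p_less)
if p_greater < alpha:
    print('REJECTED: Null hypothesis')
else:
    print('Failed to reject: Null hypothesis')
```

Testing in the wrong direction here would give a p-value close to 1, which is why the sign of t needs checking whenever a two-tailed p-value is halved by hand.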