## For each of the following questions, formulate a null and alternative hypothesis (be as specific as you can be), then give an example of what a true positive, true negative, type I and type II errors would look like. Note that some of the questions are intentionally phrased in a vague way. It is your job to reword these as more precise questions that could be tested.

> Has the network latency gone up since we switched internet service providers?

- Measure latency (ping) with Ookla Speedtest and compare the readings taken since the switch against readings from the old provider.
- $H_0$: Our average network latency has not increased since we switched internet service providers.
- $H_a$: Our average network latency has increased since we switched internet service providers.

    - True positive: reject $H_0$
        - Latency really is higher with the new provider, and our measurements show it.
    - False positive (type I error): reject $H_0$
        - Latency has not actually changed, but we took the new measurements while everyone was quarantined for coronavirus and the whole neighborhood was online, so it looked higher.
    - False negative (type II error): fail to reject $H_0$
        - Latency really is higher, but we measured at 10 a.m. on a Tuesday while nobody else in the neighborhood was home, so it looked normal.
    - True negative: fail to reject $H_0$
        - Latency has not changed since the switch, and our measurements show no increase.

> Is the website redesign any good?

- Survey our users, covering a broad spectrum of demographics among new and existing users.
- $H_0$: The redesign did not make the website easier to navigate than the old design.
- $H_a$: The redesign made for a better user experience: the website is easier to navigate than the old design.

    - True positive: reject $H_0$
        - We surveyed a broad spectrum of users, and the redesign really did make the website easier to navigate.
    - False positive (type I error): reject $H_0$
        - We surveyed only the designers of the new UX, who told us it is better, when for our users it is no easier to navigate.
    - False negative (type II error): fail to reject $H_0$
        - We surveyed only people in the 50+ age bracket who had never been on a website before; they saw no improvement, even though the redesign really is easier for our users.
    - True negative: fail to reject $H_0$
        - We surveyed a broad spectrum of users and found that the redesign did not make the website any easier for new and existing users to navigate.


> Is our television ad driving more sales?

- Get the sales figures from the company's accounting department.
- $H_0$: Sales of the advertised product have not increased since the television ad aired.
- $H_a$: Sales of the advertised product have increased since the television ad aired.

    - True positive: reject $H_0$
        - Since we aired the ad, sales of the advertised product have increased 35%, and the ad is the cause.
    - False positive (type I error): reject $H_0$
        - Since we aired the ad, sales of the advertised product increased 15%, but it's Christmas, and the seasonal bump rather than the ad explains the increase.
    - False negative (type II error): fail to reject $H_0$
        - The ad really did increase sales, but the figures we received show a 10% decrease because they came from a disgruntled accountant who wants to sabotage our data.
    - True negative: fail to reject $H_0$
        - Since we aired the ad, sales of the advertised product decreased 72%: it was a bad ad and it did nothing to make people want to buy our product.
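
The four outcomes used in the answers above depend on only two things: whether the test rejected $H_0$, and whether $H_0$ is actually true. A minimal sketch of that mapping follows; `classify_outcome` is a hypothetical helper name introduced here for illustration, not part of any library.

```python
def classify_outcome(reject_null: bool, null_is_true: bool) -> str:
    """Name the outcome of a hypothesis test.

    Hypothetical helper for illustration. In practice we never observe
    `null_is_true` directly; it stands for the real state of the world.
    """
    if reject_null and null_is_true:
        return "false positive (type I error)"
    if reject_null and not null_is_true:
        return "true positive"
    if not reject_null and null_is_true:
        return "true negative"
    return "false negative (type II error)"


# Example: the ad had no real effect (H0 is true), but the Christmas
# sales bump led us to reject H0 anyway.
print(classify_outcome(reject_null=True, null_is_true=True))
```

The "positive" and "negative" labels describe the decision (reject or fail to reject), while "true" and "false" describe whether that decision matches reality.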

In [2]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

- Ace Realty wants to determine whether the average time it takes to sell homes is different for its two offices. A sample of 40 sales from office #1 revealed a mean of 90 days and a standard deviation of 15 days. A sample of 50 sales from office #2 revealed a mean of 100 days and a standard deviation of 20 days. Use a .05 level of significance.

In [7]:
from scipy import stats
from math import sqrt

# office 1: sample of 40 sales, mean = 90 days, std = 15 days
# office 2: sample of 50 sales, mean = 100 days, std = 20 days

xbar1 = 90
xbar2 = 100

n1 = 40
n2 = 50

s1 = 15
s2 = 20

degf = (n1 + n2) - 2

s_p = sqrt(
    ((n1 - 1) * s1**2 + (n2 - 1) * s2**2)
    /
    (n1 + n2 - 2)
)

t = (xbar1 - xbar2) / (s_p * sqrt(1 / n1 + 1 / n2))

In [8]:
t

-2.6252287036468456

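In [9]:
# Two-tailed p-value: use |t| so the result stays between 0 and 1
p = stats.t(degf).sf(abs(t)) * 2
round(p, 4)

0.0102

In [11]:
alpha = .05

round(p, 4), alpha

(0.0102, 0.05)

- Because p < alpha, we reject the null hypothesis that the two offices take the same average time to sell a home.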
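
As a cross-check on the hand-computed statistic, SciPy can run the same pooled two-sample t-test directly from the summary statistics with `stats.ttest_ind_from_stats`. This is a sketch under the same equal-variance assumption as the pooled formula above; the names `t_check` and `p_check` are introduced here only for illustration.

```python
from scipy import stats

# Summary statistics from the problem statement.
xbar1, s1, n1 = 90, 15, 40    # office 1: mean, std, sample size
xbar2, s2, n2 = 100, 20, 50   # office 2: mean, std, sample size

# equal_var=True runs the pooled-variance test with n1 + n2 - 2 degrees of freedom.
t_check, p_check = stats.ttest_ind_from_stats(
    xbar1, s1, n1,
    xbar2, s2, n2,
    equal_var=True,
)

alpha = 0.05
print(f"t = {t_check:.4f}, two-sided p = {p_check:.4f}")
print("reject H0" if p_check < alpha else "fail to reject H0")
```

The p-value returned here is two-sided, so it always lies between 0 and 1. Passing `equal_var=False` instead would run Welch's t-test, which does not assume the two offices share a variance.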

- Load the mpg dataset and use it to answer the following questions:

In [49]:
from pydataset import data
mpg = data('mpg')
mpg.head()

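|   | manufacturer | model | displ | year | cyl | trans | drv | cty | hwy | fl | class |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | audi | a4 | 1.8 | 1999 | 4 | auto(l5) | f | 18 | 29 | p | compact |
| 2 | audi | a4 | 1.8 | 1999 | 4 | manual(m5) | f | 21 | 29 | p | compact |
| 3 | audi | a4 | 2.0 | 2008 | 4 | manual(m6) | f | 20 | 31 | p | compact |
| 4 | audi | a4 | 2.0 | 2008 | 4 | auto(av) | f | 21 | 30 | p | compact |
| 5 | audi | a4 | 2.8 | 1999 | 6 | auto(l5) | f | 16 | 26 | p | compact |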


- Is there a difference in fuel-efficiency in cars from 2008 vs 1999?

In [50]:
mpg['fuel_efficency'] = (mpg.hwy + mpg.cty) / 2

mpg.head()

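|   | manufacturer | model | displ | year | cyl | trans | drv | cty | hwy | fl | class | fuel_efficency |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | audi | a4 | 1.8 | 1999 | 4 | auto(l5) | f | 18 | 29 | p | compact | 23.5 |
| 2 | audi | a4 | 1.8 | 1999 | 4 | manual(m5) | f | 21 | 29 | p | compact | 25.0 |
| 3 | audi | a4 | 2.0 | 2008 | 4 | manual(m6) | f | 20 | 31 | p | compact | 25.5 |
| 4 | audi | a4 | 2.0 | 2008 | 4 | auto(av) | f | 21 | 30 | p | compact | 25.5 |
| 5 | audi | a4 | 2.8 | 1999 | 6 | auto(l5) | f | 16 | 26 | p | compact | 21.0 |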


In [55]:
x1 = mpg[mpg.year == 1999].fuel_efficency
x2 = mpg[mpg.year == 2008].fuel_efficency

t, p = stats.ttest_ind(x1, x2)

In [56]:
t, p

(0.21960177245940962, 0.8263744040323578)

- With a p-value of about 0.83, well above an alpha of 0.05, we fail to reject the null hypothesis: this sample shows no significant difference in average fuel efficiency between cars from 1999 and cars from 2008.

- Are compact cars more fuel-efficient than the average car?

In [114]:
compact = mpg[mpg['class'] == 'compact']
not_compact = mpg[mpg['class'] != 'compact']

x1 = compact.fuel_efficency
x2 = not_compact.fuel_efficency

t, p = stats.ttest_ind(x1, x2)

In [115]:
t, p

(6.731177612837954, 1.3059121585018135e-10)

In [131]:
alpha = 0.05
print(f'with an alpha of {alpha} and a p-value of {p:.11f}, we can reject the null hypothesis')

with an alpha of 0.05 and a p-value of 0.00000000013, we can reject the null hypothesis
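
The question compares compact cars with the average car, and asks specifically whether they are *more* efficient, so another reasonable reading is a one-sample, one-tailed t-test against the overall mean rather than the two-sample comparison above. A minimal sketch of that approach follows. `mean_exceeds` is a hypothetical helper, the numbers in the usage lines are made-up placeholders rather than values from the mpg dataset, and the `alternative` argument assumes a reasonably recent SciPy (it is not available in old releases).

```python
from scipy import stats


def mean_exceeds(sample, popmean, alpha=0.05):
    """One-sample, one-tailed t-test of H_a: mean(sample) > popmean.

    Hypothetical helper for illustration only.
    """
    result = stats.ttest_1samp(sample, popmean, alternative="greater")
    return result.statistic, result.pvalue, bool(result.pvalue < alpha)


# Placeholder data only. With the notebook's objects the call would be
# mean_exceeds(compact.fuel_efficency, mpg.fuel_efficency.mean()).
placeholder_sample = [24.0, 25.5, 23.5, 26.0, 25.0, 24.5]
t_stat, p_one_sided, reject = mean_exceeds(placeholder_sample, popmean=20.0)
print(f"t = {t_stat:.2f}, one-sided p = {p_one_sided:.2g}, reject H0: {reject}")
```

Note that the overall mean includes the compact cars themselves, which is the main design difference from the compact versus non-compact comparison.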


- Do manual cars get better gas mileage than automatic cars?

In [125]:
manual = mpg[mpg['trans'].str.contains('manual')]
auto = mpg[mpg['trans'].str.contains('auto')]

x1 = manual.fuel_efficency
x2 = auto.fuel_efficency

t, p = stats.ttest_ind(x1, x2)

In [127]:
t, p

(4.593437735750014, 7.154374401145683e-06)

In [130]:
alpha = .05
print(f'with an alpha of {alpha} and a p-value of {p:.6f}, we can reject the null hypothesis')

with an alpha of 0.05 and a p-value of 0.000007, we can reject the null hypothesis
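
Because the question is directional ("better", not just "different"), the two-sided p-value reported by `ttest_ind` can be converted to a one-tailed one. The sketch below does this by hand with the values printed above. `one_tailed_p` is a hypothetical helper name, and the conversion relies on the t-distribution being symmetric around zero.

```python
def one_tailed_p(t_stat: float, p_two_sided: float) -> float:
    """Convert a two-sided p-value to a one-tailed p-value for H_a: mean1 > mean2.

    Hypothetical helper for illustration; relies on the symmetry of the
    t-distribution around zero.
    """
    if t_stat > 0:
        return p_two_sided / 2
    return 1 - p_two_sided / 2


# Values reported above for manual (x1) vs. automatic (x2) cars.
t_stat, p_two_sided = 4.593437735750014, 7.154374401145683e-06
alpha = 0.05

p_one = one_tailed_p(t_stat, p_two_sided)
print(f"one-tailed p = {p_one:.2e}")
print("reject H0" if p_one < alpha else "fail to reject H0")
```

The sign check matters: had `t_stat` been negative, a small two-sided p-value would be evidence that manual cars get *worse* mileage, and the one-tailed p-value for "better" would be close to 1.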
