# T-Test

__Ace Realty__ wants to determine whether the average time it takes to sell homes is different for its two offices.    
- A sample of 40 sales from office #1 revealed:
  - a mean of 90 days,
  - a standard deviation of 15 days.    
 ***
- A sample of 50 sales from office #2 revealed:
  - a mean of 100 days,
  - a standard deviation of 20 days.  
***
Use a .05 level of significance.

In [3]:
import matplotlib.pyplot as plt

import numpy as np
import pandas as pd

from scipy import stats
import pydataset

import env

In [None]:
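n1 = 40
xbar1 = 90
s1 = 15

n2 = 50
xbar2 = 100
s2 = 20

# Two-sample t-test from summary statistics (pooled variance by default)
t, p = stats.ttest_ind_from_stats(xbar1, s1, n1, xbar2, s2, n2)

# Halving the two-tailed p gives a one-tailed p-value
# (the question as posed, "is there a difference?", is two-tailed)
one_tailed_p = p / 2
one_tailed_p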
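
The call above uses SciPy's default `equal_var=True` (pooled variance), while a later cell passes `equal_var=False` (Welch's test). As a minimal sketch, using only the summary statistics from the problem statement, the two variants can be compared side by side. The names `t_pooled`, `p_pooled`, `t_welch`, and `p_welch` are introduced here for illustration only.

```python
from scipy import stats

# Summary statistics from the problem statement
xbar1, s1, n1 = 90, 15, 40
xbar2, s2, n2 = 100, 20, 50

# Pooled-variance t-test (assumes both offices share one variance)
t_pooled, p_pooled = stats.ttest_ind_from_stats(
    xbar1, s1, n1, xbar2, s2, n2, equal_var=True
)

# Welch's t-test (does not assume equal variances)
t_welch, p_welch = stats.ttest_ind_from_stats(
    xbar1, s1, n1, xbar2, s2, n2, equal_var=False
)

print(f"pooled: t = {t_pooled:.4f}, p = {p_pooled:.4f}")
print(f"welch:  t = {t_welch:.4f}, p = {p_welch:.4f}")
```

Both variants give a two-tailed p-value below .05 for these numbers, so the conclusion does not depend on the choice here.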

In [None]:
degf = n1 + n2 - 2

dist = stats.t(degf)

x = np.linspace(-3.5, 3.5)
y = dist.pdf(x)

plt.figure(figsize=(12, 9))
plt.plot(x, y)
plt.vlines([-2, 2], 0, .3)
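
The vertical lines at ±2 in the plot above approximate the rejection boundaries. A minimal sketch of computing the two-tailed critical values from the t-distribution itself, assuming the same pooled degrees of freedom and the .05 significance level from the problem statement (`t_crit`, `lower`, and `upper` are names introduced for this example):

```python
from scipy import stats

alpha = 0.05
n1, n2 = 40, 50
degf = n1 + n2 - 2  # pooled degrees of freedom, as in the cell above

dist = stats.t(degf)

# Two-tailed test: put alpha / 2 in each tail
t_crit = dist.ppf(1 - alpha / 2)
lower, upper = -t_crit, t_crit

print(f"reject the null hypothesis if t < {lower:.3f} or t > {upper:.3f}")
```

Passing `[lower, upper]` to `plt.vlines` in place of `[-2, 2]` would draw the boundaries at the computed values.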

In [None]:
x = np.arange(50,150)

y1 = stats.norm(90,15).pdf(x)
y2 = stats.norm(100,20).pdf(x)


plt.plot(x, y1, label = 'office 1')
plt.plot(x, y2, label = 'office 2')
plt.axvline(90, ls = ':')
plt.axvline(100, ls = ':', color = 'orange')

plt.legend()

In [None]:
#Using Scipy 
α = 0.05

t, p = stats.ttest_ind_from_stats(90,15,40,100,20,50, equal_var= False)
t,p
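
To see where the Welch statistic comes from, here is a sketch that reproduces it by hand from the summary statistics, using the standard Welch formula for the t-statistic and the Welch-Satterthwaite approximation for the degrees of freedom. The variable names (`v1`, `v2`, `t_manual`, `df_welch`) are introduced for this example.

```python
import math

# Summary statistics from the problem statement
xbar1, s1, n1 = 90, 15, 40
xbar2, s2, n2 = 100, 20, 50

# Estimated variance of each sample mean
v1 = s1**2 / n1
v2 = s2**2 / n2

# Welch's t-statistic: difference in means over its standard error
se = math.sqrt(v1 + v2)
t_manual = (xbar1 - xbar2) / se

# Welch-Satterthwaite approximation for the degrees of freedom
df_welch = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

print(f"t = {t_manual:.4f}, df = {df_welch:.2f}")
```

The resulting degrees of freedom are fractional and slightly below the pooled value of `n1 + n2 - 2 = 88`.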

In [None]:
print(f'''
t = {t:.4f}
p = {p:.4f}

Because p ({p:.4f}) < alpha (.05), we reject the null hypothesis that
the average home sale time is the same for office 1 and office 2.
'''.strip())

# MPG DATASET

Load the __mpg dataset__ and use it to answer the following questions:

- Is there a difference in fuel-efficiency in cars from 2008 vs 1999?    
***
- Are compact cars more fuel-efficient than the average car?
***
- Do manual cars get better gas mileage than automatic cars?

In [4]:
mpg = pydataset.data('mpg')


### _Is there a difference in fuel-efficiency in cars from 2008 vs 1999?_

In [5]:
mpg['avg_fe'] = stats.hmean(mpg[['cty', 'hwy']], axis =1)
mpg.head()

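|   | manufacturer | model | displ | year | cyl | trans | drv | cty | hwy | fl | class | avg_fe |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | audi | a4 | 1.8 | 1999 | 4 | auto(l5) | f | 18 | 29 | p | compact | 22.212766 |
| 2 | audi | a4 | 1.8 | 1999 | 4 | manual(m5) | f | 21 | 29 | p | compact | 24.36 |
| 3 | audi | a4 | 2.0 | 2008 | 4 | manual(m6) | f | 20 | 31 | p | compact | 24.313725 |
| 4 | audi | a4 | 2.0 | 2008 | 4 | auto(av) | f | 21 | 30 | p | compact | 24.705882 |
| 5 | audi | a4 | 2.8 | 1999 | 6 | auto(l5) | f | 16 | 26 | p | compact | 19.809524 |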
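
The `avg_fe` column is the harmonic mean of city and highway mileage. As a quick check, the first row (cty 18, hwy 29) can be reproduced with the two-value harmonic mean formula and with the standard library. This is a sketch for verification only; `by_formula` and `by_stdlib` are names introduced here.

```python
from statistics import harmonic_mean

cty, hwy = 18, 29  # first row of the output above

# Harmonic mean of two values: 2ab / (a + b)
by_formula = 2 * cty * hwy / (cty + hwy)

# Same quantity via the standard library
by_stdlib = harmonic_mean([cty, hwy])

print(f"{by_formula:.6f} {by_stdlib:.6f}")
```

The harmonic mean is the usual way to average rates such as miles per gallon when the distance driven in each mode is the same.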


In [9]:
fe_2008 = mpg[mpg.year == 2008].avg_fe
fe_1999 = mpg[mpg.year == 1999].avg_fe


In [None]:
# plot distribution for fe_2008
fe_2008.hist()

In [None]:
# plot distribution for fe_1999
fe_1999.hist()

In [None]:
# How many observations are in each sample? (If n > 30 in both, we meet the normality condition.)

fe_2008.count(), fe_1999.count()

In [None]:
# Are the variances roughly the same for both samples? Yes

fe_2008.var(), fe_1999.var()

In [None]:
# Optional check: Levene's test returns a p-value; a small p-value suggests unequal variances
# stats.levene(fe_2008, fe_1999)

# Calculate the t-statistic and p-value

t, p = stats.ttest_ind(fe_2008, fe_1999)
t, p
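
The commented-out Levene check can feed directly into the choice of test. Below is a minimal sketch of one common (though debated) approach: use the pooled test when Levene's test does not reject equal variances, and Welch's test otherwise. The helper `compare_means` is hypothetical, and the arrays are synthetic stand-ins, not the mpg data.

```python
import numpy as np
from scipy import stats

def compare_means(a, b, alpha=0.05):
    """Hypothetical helper: choose pooled vs Welch t-test via Levene's test."""
    _, p_levene = stats.levene(a, b)
    # A large Levene p-value means no evidence against equal variances
    equal_var = bool(p_levene > alpha)
    t, p = stats.ttest_ind(a, b, equal_var=equal_var)
    return equal_var, t, p

# Synthetic stand-in samples (NOT the mpg data)
narrow = np.linspace(20, 30, 50)   # small spread
shifted = narrow + 1.0             # same spread, shifted mean
wide = np.linspace(0, 50, 50)      # much larger spread

print(compare_means(narrow, shifted)[0])  # same spread
print(compare_means(narrow, wide)[0])     # very different spread
```

With the notebook's data, the call would be `compare_means(fe_2008, fe_1999)`.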

In [None]:
print(f'''
Because p ({p:.3f}) > alpha (.05), we fail to reject the null\
 hypothesis that there is no difference in fuel-efficiency in cars\
 from 2008 and 1999.
''')

In [None]:
fe_2008.mean(), fe_1999.mean()

In [None]:
plt.hist([fe_1999, fe_2008], label=["1999 cars", "2008 cars"])
plt.legend(loc="upper right")

In [None]:
# The approach above (harmonic mean of cty and hwy) is from Florence.
# The approach below (highway mileage only) is from Darden.

In [None]:
x1 = mpg[mpg.year == 1999].hwy
x2 = mpg[mpg.year == 2008].hwy

x1.var(), x2.var()

In [None]:
t, p = stats.ttest_ind(x1, x2)
p

In [None]:
print(f'''
Because p ({p:.3f}) > alpha (.05), we fail to reject the null hypothesis that there
is no difference in fuel-efficiency in cars from 2008 and 1999.
''')

### _Are compact cars more fuel-efficient than the average car?_

In [None]:
x = mpg[mpg['class'] == 'compact'].hwy
mu = mpg.hwy.mean()

t, p = stats.ttest_1samp(x, mu)
print('t=', t)
print('p=', p)

In [None]:
print(f'''
Because p ({p:.12f}) < alpha (.05), we reject the null hypothesis that there is
no difference in fuel-efficiency between compact cars and the overall average.
''')

In [None]:
print(f'''
Avg mileage for compact cars: {x.mean():.2f}
Overall average mileage:      {mu:.2f}
''')
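
The question asks whether compact cars are *more* fuel-efficient, which is a one-sided alternative, while `ttest_1samp` is two-sided by default. A minimal sketch of the one-sided version, assuming SciPy 1.6 or later (where the `alternative` parameter is available). The helper `mean_exceeds` is hypothetical, and the sample values are made up for illustration rather than taken from the mpg dataset.

```python
import numpy as np
from scipy import stats

def mean_exceeds(sample, popmean, alpha=0.05):
    """Hypothetical helper: one-sided, one-sample t-test (H1: mean > popmean)."""
    t, p = stats.ttest_1samp(sample, popmean, alternative="greater")
    return bool(p < alpha), t, p

# Made-up illustrative mileages, NOT taken from the mpg dataset
sample = np.array([28.0, 29.0, 30.0, 31.0, 32.0])

# Sample mean (30) is well above 23, and well below 35
rejects_low, t_low, p_low = mean_exceeds(sample, popmean=23.0)
rejects_high, t_high, p_high = mean_exceeds(sample, popmean=35.0)

print(rejects_low, rejects_high)
```

With the notebook's data, the call would be `mean_exceeds(x, mu)`.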

### _Do manual cars get better gas mileage than automatic cars?_

In [10]:
is_automatic_transmission = mpg.trans.str.startswith('auto')

x1 = mpg[is_automatic_transmission].hwy
x2 = mpg[~ is_automatic_transmission].hwy

x1.var(), x2.var()

(31.61873264739507, 35.54272043745727)

In [11]:
t, p = stats.ttest_ind(x2, x1)
print('t=', t)
print('p=', p)

t= 4.368349972819688
p= 1.888044765552951e-05


In [12]:
print(f'''
Because p ({p:.5f}) < alpha (.05), we reject the null hypothesis that there
is no difference in gas mileage between manual and automatic cars.
''')


Because p (0.00002) < alpha (.05), we reject the null hypothesis that there
is no difference in gas mileage between manual and automatic cars.
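
The test above is two-sided, but the question ("do manual cars get *better* mileage?") is directional. Because the t-distribution is symmetric, a two-sided p-value can be converted to a one-sided one using the sign of t. The sketch below applies that conversion to the t and p values printed above; `one_sided_p` is a hypothetical helper written for this example.

```python
def one_sided_p(t, p_two_sided, alternative="greater"):
    """Hypothetical helper: convert a two-sided t-test p-value to one-sided."""
    half = p_two_sided / 2
    if alternative == "greater":
        # Evidence for "greater" only when t is positive
        return half if t > 0 else 1 - half
    if alternative == "less":
        return half if t < 0 else 1 - half
    raise ValueError("alternative must be 'greater' or 'less'")

# Values printed by the manual-vs-automatic test above (manual sample first)
t = 4.368349972819688
p = 1.888044765552951e-05

p_greater = one_sided_p(t, p, alternative="greater")
print(f"one-sided p = {p_greater:.2e}")
```

Since t is positive and the one-sided p-value is below .05, the data support the directional claim that manual cars get better highway mileage than automatic cars.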

