#### HYPOTHESIS TESTING

<br>

## Familiar: A Study in Data Analysis
<hr>

Welcome to Familiar, a startup in the new market of blood transfusion!

### What Can Familiar Do For You?

In [18]:
import pandas as pd
import numpy as np
from scipy.stats import ttest_1samp, ttest_ind, chi2_contingency

lifespans = pd.read_csv('familiar_lifespan.csv')
iron = pd.read_csv('familiar_iron.csv')

In [19]:
lifespans.head()

| Unnamed: 0 | pack | lifespan |
|---|---|---|
| 0 | vein | 76.25509 |
| 1 | artery | 76.404504 |
| 2 | artery | 75.952442 |
| 3 | artery | 76.923082 |
| 4 | artery | 73.771212 |


In [20]:
vein_pack_lifespans = lifespans.lifespan[lifespans.pack == 'vein']
vein_pack_lifespans.head()

0     76.255090
7     74.502021
8     76.937674
9     74.838006
11    77.484756
Name: lifespan, dtype: float64

In [21]:
vein_lifespans_mean = np.mean(vein_pack_lifespans)
print("Vein Lifespan Mean: " + str(vein_lifespans_mean))  # the sample mean is above the 73-year benchmark

Vein Lifespan Mean: 76.16901335636044


We’d like to find out if the average lifespan of a Vein Pack subscriber is significantly different from the average life expectancy of 73 years.
<br>

<b>Null:</b> The average lifespan of a Vein Pack subscriber is 73 years. <br>
<b>Alternative:</b> The average lifespan of a Vein Pack subscriber is NOT 73 years.

In [22]:
tstat, pval = ttest_1samp(vein_pack_lifespans, 73)
print("p-value: {:.10f}".format(pval))
print("Reject the null hypothesis. The average lifespan of a Vein Pack subscriber is NOT 73 years")

p-value: 0.0000005972
Reject the null hypothesis. The average lifespan of a Vein Pack subscriber is NOT 73 years
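
To make the one-sample test above less of a black box, here is a minimal sketch that computes the t statistic and two-sided p-value by hand and compares them with `ttest_1samp`. The sample is synthetic (the names `sample` and `popmean` and the generated values are assumptions for illustration, not the Familiar data), so the numbers it prints will not match the output above.

```python
import numpy as np
from scipy import stats

# Synthetic stand-in for vein_pack_lifespans; NOT the Familiar data.
rng = np.random.default_rng(0)
sample = rng.normal(loc=76.0, scale=2.0, size=20)
popmean = 73  # benchmark value under the null hypothesis

n = sample.size
# t = (sample mean - hypothesized mean) / standard error of the mean
standard_error = sample.std(ddof=1) / np.sqrt(n)
t_manual = (sample.mean() - popmean) / standard_error
# Two-sided p-value from the t distribution with n - 1 degrees of freedom
p_manual = 2 * stats.t.sf(abs(t_manual), df=n - 1)

t_scipy, p_scipy = stats.ttest_1samp(sample, popmean)
print("manual:", t_manual, p_manual)
print("scipy: ", t_scipy, p_scipy)
```

The two lines should agree up to floating-point error, which shows that the test only depends on the sample mean, the sample standard deviation, and the sample size.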


### Upselling Familiar: Pumping Life Into The Company

In [23]:
artery_pack_lifespans = lifespans.lifespan[lifespans.pack == 'artery']
artery_pack_lifespans.head()

1    76.404504
2    75.952442
3    76.923082
4    73.771212
5    74.117420
Name: lifespan, dtype: float64

In [24]:
artery_lifespans_mean = np.mean(artery_pack_lifespans)
print("Artery Lifespan Mean: " + str(artery_lifespans_mean))

Artery Lifespan Mean: 74.87366223517039


We’d like to find out if the average lifespan of a Vein Pack subscriber is significantly different from the average lifespan of an Artery Pack subscriber.
<br>

<b>Null: </b>The average lifespan of a Vein Pack subscriber is equal to the average lifespan of an Artery Pack subscriber. <br>
<b>Alternative: </b>The average lifespan of a Vein Pack subscriber is NOT equal to the average lifespan of an Artery Pack subscriber.

In [30]:
tstat, pval = ttest_ind(vein_pack_lifespans, artery_pack_lifespans)
print("p-value: " + str(pval))
print("Fail to reject the null hypothesis at the 0.05 level. There is not enough evidence that the average lifespans of Vein Pack and Artery Pack subscribers differ.")

p-value: 0.05588883079070819
Fail to reject the null hypothesis at the 0.05 level. There is not enough evidence that the average lifespans of Vein Pack and Artery Pack subscribers differ.
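
The p-value above (about 0.056) sits just above the conventional 0.05 cutoff, so the conclusion is sensitive to the test's assumptions. By default `ttest_ind` assumes the two groups have equal variances; passing `equal_var=False` runs Welch's t-test, which drops that assumption. The sketch below compares the two on synthetic groups and derives the decision from an explicit threshold instead of hard-coding it. The names `group_a`, `group_b`, and `alpha`, and the generated values, are assumptions for illustration, not the Familiar data.

```python
import numpy as np
from scipy.stats import ttest_ind

# Synthetic stand-ins for the two packs; NOT the Familiar data.
rng = np.random.default_rng(1)
group_a = rng.normal(loc=76.0, scale=2.0, size=20)
group_b = rng.normal(loc=75.0, scale=3.0, size=25)

alpha = 0.05  # significance threshold, chosen before looking at the data

# Student's t-test (assumes equal variances) vs. Welch's t-test (does not)
_, p_student = ttest_ind(group_a, group_b)
_, p_welch = ttest_ind(group_a, group_b, equal_var=False)

for name, p in [("Student", p_student), ("Welch", p_welch)]:
    decision = "reject" if p < alpha else "fail to reject"
    print("{}: p-value = {:.4f} -> {} the null hypothesis".format(name, p, decision))
```

Whether the two variants would agree on the actual lifespan data is not something this sketch can tell us; it only shows how to run the comparison.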


### Side Effects: A Familiar Problem

The Familiar team has provided us with another dataset containing survey data about iron counts for our subscribers. This data has been pre-processed to categorize iron counts as “low”, “normal”, and “high” for each subscriber. Familiar wants to be able to advise potential subscribers about possible side effects of these packs and whether they differ for the Vein vs. the Artery pack.

In [26]:
iron.head()

| Unnamed: 0 | pack | iron |
|---|---|---|
| 0 | vein | low |
| 1 | artery | normal |
| 2 | artery | normal |
| 3 | artery | normal |
| 4 | artery | high |


In [27]:
# Build a contingency table: one row per pack, one column per iron level
Xtab = pd.crosstab(iron.pack, iron.iron)
print(Xtab)

iron    high  low  normal
pack                     
artery    87   29      29
vein      20  140      40


We’d like to find out if there is a significant association between which pack (Vein vs. Artery) someone subscribes to and their iron level.
<br>

<b>Null: </b>There is NOT an association between which pack (Vein vs. Artery) someone subscribes to and their iron level. <br>
<b>Alternative: </b>There is an association between which pack (Vein vs. Artery) someone subscribes to and their iron level.

In [31]:
chi2, pval, dof, exp = chi2_contingency(Xtab)
print("{:.28f}".format(pval))
print("Reject the null hypothesis. There is a significant association between pack and iron level.")

0.0000000000000000000000009360
Reject the null hypothesis. There is a significant association between pack and iron level.
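
To show where the chi-square result comes from, the sketch below rebuilds the test by hand from the contingency table printed above: it computes the counts expected if pack and iron level were independent, sums the squared deviations, and compares the result with `chi2_contingency`. The variable names are illustrative; the counts are copied from the `Xtab` output.

```python
import numpy as np
from scipy.stats import chi2, chi2_contingency

# Observed counts from Xtab above (rows: artery, vein; columns: high, low, normal)
observed = np.array([[87, 29, 29],
                     [20, 140, 40]])

# Expected count per cell under independence: row total * column total / grand total
row_totals = observed.sum(axis=1, keepdims=True)
col_totals = observed.sum(axis=0, keepdims=True)
expected = row_totals * col_totals / observed.sum()

# Pearson chi-square statistic and its degrees of freedom
chi2_manual = ((observed - expected) ** 2 / expected).sum()
dof_manual = (observed.shape[0] - 1) * (observed.shape[1] - 1)
p_manual = chi2.sf(chi2_manual, dof_manual)

chi2_scipy, p_scipy, dof_scipy, expected_scipy = chi2_contingency(observed)
print("manual:", chi2_manual, dof_manual, p_manual)
print("scipy: ", chi2_scipy, dof_scipy, p_scipy)
print(np.round(expected, 2))
```

Comparing `expected` with the observed table shows which cells drive the association: for instance, far more Artery Pack subscribers fall in the "high" column than independence would predict.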
