# Familiar: A Study In Data Analysis
Welcome to Familiar, a startup in the new market of blood transfusion!

Familiar has fallen on some tough times lately, so we’re hoping to draw some insights from their product data and help move the needle (so to speak).

The Familiar team has provided us with some data on lifespans for subscribers to two different packages, the Vein Pack and the Artery Pack!

In [22]:
import pandas as pd
from scipy.stats import ttest_1samp,ttest_ind,chi2_contingency
import numpy as np

In [5]:
lifespans = pd.read_csv('familiar_lifespan.csv')
iron = pd.read_csv('familiar_iron.csv')
lifespans.head()

Unnamed: 0,pack,lifespan
0,vein,76.25509
1,artery,76.404504
2,artery,75.952442
3,artery,76.923082
4,artery,73.771212


In [6]:
iron.head()

Unnamed: 0,pack,iron
0,vein,low
1,artery,normal
2,artery,normal
3,artery,normal
4,artery,high


# Finding 1
We’d like to find out if the average lifespan of a Vein Pack subscriber is significantly different from the average life expectancy of 73 years.

Null: The average lifespan of a Vein Pack subscriber is 73 years.

Alternative: The average lifespan of a Vein Pack subscriber is NOT 73 years.

In [18]:
vein_lifespan = lifespans.lifespan[lifespans.pack == 'vein']
print(np.mean(vein_lifespan))
# Running ttest for vein pack lifespan
tstat, pval = ttest_1samp(vein_lifespan, 73)
print(pval)

76.16901335636044
5.972157921433211e-07


The p-value is about 0.000000597, which is much smaller than 0.05, so we reject the null hypothesis and conclude that the average lifespan of Vein Pack subscribers IS significantly different from 73 years.
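
To make the one-sample test less of a black box, here is a minimal sketch that computes the t statistic and two-sided p-value by hand and compares them with `ttest_1samp`. The helper name `one_sample_t` and the `demo` sample are assumptions for illustration only; `demo` is synthetic data, not the Familiar dataset.

```python
import numpy as np
from scipy.stats import t, ttest_1samp


def one_sample_t(sample, popmean):
    """Hypothetical helper: two-sided one-sample t-test computed by hand."""
    sample = np.asarray(sample, dtype=float)
    n = sample.size
    # Standard error of the mean, using the sample standard deviation (ddof=1)
    se = sample.std(ddof=1) / np.sqrt(n)
    tstat = (sample.mean() - popmean) / se
    # Two-sided p-value from the t distribution with n - 1 degrees of freedom
    pval = 2 * t.sf(abs(tstat), df=n - 1)
    return tstat, pval


# Synthetic sample for illustration only (NOT the Familiar data)
rng = np.random.default_rng(0)
demo = rng.normal(loc=76, scale=4, size=20)

manual_t, manual_p = one_sample_t(demo, 73)
scipy_t, scipy_p = ttest_1samp(demo, 73)
print(manual_t, scipy_t)
print(manual_p, scipy_p)
```

The two pairs of numbers should agree, which shows that the test only depends on the sample mean, the sample standard deviation, and the sample size.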

# Finding 2
We’d like to find out if the average lifespan of a Vein Pack subscriber is significantly different from the average lifespan of an Artery Pack subscriber.


Null: The average lifespan of a Vein Pack subscriber is equal to the average lifespan of an Artery Pack subscriber.

Alternative: The average lifespan of a Vein Pack subscriber is NOT equal to the average lifespan of an Artery Pack subscriber.

In [21]:
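# Select the Artery Pack lifespans (vein_lifespan was defined in the Finding 1 cell)
artery_lifespan = lifespans.lifespan[lifespans.pack == 'artery']
print(np.mean(artery_lifespan))
# Two-sample t-test comparing Vein Pack and Artery Pack lifespans
tstat, pval = ttest_ind(vein_lifespan, artery_lifespan)
print(pval)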

74.87366223517039
0.05588883079070819


This is larger than 0.05, so we fail to reject the null hypothesis: the average lifespan of Vein Pack subscribers is not significantly different from the average lifespan of Artery Pack subscribers.
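
Because this p-value sits close to the 0.05 threshold, it can be worth checking how sensitive the result is to the test's assumptions. By default `ttest_ind` pools the two variances (`equal_var=True`); passing `equal_var=False` runs Welch's t-test instead. The sketch below is an illustration only: the helper `compare_means` and the two synthetic samples are assumptions, not part of the Familiar data or the analysis above.

```python
import numpy as np
from scipy.stats import ttest_ind


def compare_means(a, b):
    """Hypothetical helper: run the pooled-variance and Welch two-sample t-tests."""
    pooled_t, pooled_p = ttest_ind(a, b)                  # assumes equal variances
    welch_t, welch_p = ttest_ind(a, b, equal_var=False)   # Welch's t-test
    return {
        "pooled": (pooled_t, pooled_p),
        "welch": (welch_t, welch_p),
    }


# Synthetic samples for illustration only (NOT the Familiar data)
rng = np.random.default_rng(1)
group_a = rng.normal(loc=76, scale=3, size=20)
group_b = rng.normal(loc=75, scale=6, size=20)

results = compare_means(group_a, group_b)
for name, (tstat, pval) in results.items():
    print(name, tstat, pval)
```

With equal group sizes the two t statistics coincide and only the degrees of freedom (and therefore the p-values) differ. If both variants land on the same side of 0.05 for the real data, the conclusion does not hinge on the equal-variance assumption.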

# Finding 3
# Side Effects: A Familiar Problem
The Familiar team has provided us with another dataset containing survey data about iron counts for our subscribers. This data has been pre-processed to categorize iron counts as “low”, “normal”, and “high” for each subscriber. Familiar wants to be able to advise potential subscribers about possible side effects of these packs and whether they differ for the Vein vs. the Artery pack.

We’d like to find out if there is a significant association between which pack (Vein vs. Artery) someone subscribes to and their iron level.


Null: There is NOT an association between which pack (Vein vs. Artery) someone subscribes to and their iron level.

Alternative: There is an association between which pack (Vein vs. Artery) someone subscribes to and their iron level.

In [23]:
crosstab = pd.crosstab(iron.pack,iron.iron)
crosstab

pack,high,low,normal
artery,87,29,29
vein,20,140,40


In [24]:
chi2,pval,dof,expected = chi2_contingency(crosstab)
print(pval)

9.359749337433008e-25


The p-value is about 9.36e-25, which is far smaller than 0.05, so we reject the null hypothesis and conclude that there IS a significant association between pack and iron level.
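
To see where the chi-square result comes from, the sketch below rebuilds the expected counts from the row and column totals of the contingency table shown above and compares them with what `chi2_contingency` returns. The counts are copied from the crosstab output; the variable names (`observed`, `expected_manual`, `chi2_manual`) are illustrative assumptions.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Observed counts copied from the crosstab above (columns: high, low, normal)
observed = np.array([
    [87, 29, 29],    # artery
    [20, 140, 40],   # vein
])

# Expected counts under independence: row total * column total / grand total
row_totals = observed.sum(axis=1, keepdims=True)
col_totals = observed.sum(axis=0, keepdims=True)
grand_total = observed.sum()
expected_manual = row_totals * col_totals / grand_total

# Pearson chi-square statistic computed by hand
chi2_manual = ((observed - expected_manual) ** 2 / expected_manual).sum()

chi2, pval, dof, expected = chi2_contingency(observed)
print(np.round(expected_manual, 1))
print(chi2_manual, chi2, dof)
```

Comparing the observed and expected tables shows which cells drive the association: Artery Pack subscribers report far more "high" iron levels, and Vein Pack subscribers far more "low" iron levels, than independence would predict.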