Welcome to Familiar, a startup in the new market of blood transfusion! You’ve joined the team because you appreciate the flexible hours and extremely intelligent team, but the overeager doorman welcoming you into the office is a nice way to start your workday (well, work-evening).

Familiar has fallen on some tough times lately, so you’re hoping to help the team gain some insights into its product and move the needle (so to speak).

In [1]:
import pandas as pd
import numpy as np

In [2]:
#Loading the dataset
lifespans = pd.read_csv('data/life_spans.csv')
iron = pd.read_csv('data/familiar_iron.csv')

In [3]:
lifespans.head()

Unnamed: 0,pack,lifespan
0,vein,76.25509
1,artery,76.404504
2,artery,75.952442
3,artery,76.923082
4,artery,73.771212


In [4]:
iron.head()

Unnamed: 0,pack,iron
0,vein,low
1,artery,normal
2,artery,normal
3,artery,normal
4,artery,high


The first thing we want to know is whether Familiar’s most basic package, the Vein Pack, actually has a significant impact on its subscribers. It would be a marketing goldmine if we could show that subscribers to the Vein Pack live longer than other people.

We’d like to find out if the average lifespan of a Vein Pack subscriber is significantly different from the average life expectancy of 73 years.

Hypothesis:
- Null: The average lifespan of a Vein Pack subscriber is 73 years.
- Alternative: The average lifespan of a Vein Pack subscriber is NOT 73 years.

In [5]:
#Importing 1-sample t-test from scipy
from scipy.stats import ttest_1samp

#Extracting the lifespans of vein pack subscribers
vein_pack_lifespans = lifespans[lifespans['pack'] == 'vein']['lifespan']

#Average lifespan of vein pack subscribers
print('Average life span of vein pack subscribers:', np.mean(vein_pack_lifespans))

#Performing 1-sample t-test
tstat, pval = ttest_1samp(vein_pack_lifespans, 73)
print('P-Value:', pval)

Average life span of vein pack subscribers: 76.16901335636044
P-Value: 5.972157921433211e-07


The p-value is far smaller than 0.05, so we reject the null hypothesis: the average lifespan of Vein Pack subscribers (about 76.2 years in this sample) is significantly different from 73 years.
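
To make the mechanics of the one-sample t-test concrete, here is a minimal sketch that computes the t-statistic and two-sided p-value by hand and compares them with `ttest_1samp`. The data is a synthetic stand-in generated for illustration (it is not the Familiar dataset), and `one_sample_t` is a hypothetical helper name, not part of SciPy.

```python
import numpy as np
from scipy.stats import t, ttest_1samp

# Synthetic stand-in for vein_pack_lifespans (an assumption, NOT the real data).
rng = np.random.default_rng(0)
sample = rng.normal(loc=76.0, scale=4.0, size=20)

def one_sample_t(values, popmean):
    """Hypothetical helper: two-sided one-sample t-test computed by hand."""
    values = np.asarray(values, dtype=float)
    n = values.size
    # Standard error of the mean, using the sample standard deviation (ddof=1).
    standard_error = values.std(ddof=1) / np.sqrt(n)
    t_stat = (values.mean() - popmean) / standard_error
    # Two-sided p-value from the t distribution with n - 1 degrees of freedom.
    p_value = 2 * t.sf(abs(t_stat), df=n - 1)
    return t_stat, p_value

manual_t, manual_p = one_sample_t(sample, 73)
scipy_t, scipy_p = ttest_1samp(sample, 73)
print('Manual t-stat and p-value:', manual_t, manual_p)
print('SciPy t-stat and p-value: ', scipy_t, scipy_p)
```

The two printed lines should agree, which shows that the test only asks how many standard errors the sample mean sits away from the hypothesized mean.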

In order to differentiate Familiar’s different product lines, we’d like to compare this lifespan data between our different packages. Our next step up from the Vein Pack is the Artery Pack.

We’d like to find out if the average lifespan of a Vein Pack subscriber is significantly different from the average lifespan of an Artery Pack subscriber.

Hypothesis:
- Null: The average lifespan of a Vein Pack subscriber is equal to the average lifespan of an Artery Pack subscriber.
- Alternative: The average lifespan of a Vein Pack subscriber is NOT equal to the average lifespan of an Artery Pack subscriber.

In [6]:
#Extracting Lifespans of artery pack subscribers
artery_pack_lifespans = lifespans[lifespans['pack'] == 'artery']['lifespan']
print('Average lifespan of artery pack subscribers:', np.mean(artery_pack_lifespans))

#importing 2-sample ttest
from scipy.stats import ttest_ind

tstat, pval = ttest_ind(vein_pack_lifespans, artery_pack_lifespans)
print('P-Value:', pval)

Average lifespan of artery pack subscribers: 74.8736622351704
P-Value: 0.05588883079070819


The p-value here is greater than 0.05, so we fail to reject the null hypothesis: at the 0.05 significance level, there is no significant difference in average lifespan between the two groups.
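
Because this p-value sits so close to the 0.05 threshold, it can be worth checking how sensitive the result is to the test's assumptions. By default, `ttest_ind` assumes both groups have equal variance; passing `equal_var=False` runs Welch's t-test instead. The sketch below uses synthetic stand-in groups (not the Familiar data) purely to show the two calls side by side; it is not a claim about which variant suits the real dataset.

```python
import numpy as np
from scipy.stats import ttest_ind

# Synthetic stand-ins for the two subscriber groups (assumptions, NOT the real data).
rng = np.random.default_rng(1)
group_a = rng.normal(loc=76.0, scale=3.0, size=20)
group_b = rng.normal(loc=75.0, scale=5.0, size=30)

# Student's t-test: assumes both groups share the same variance (SciPy's default).
student_t, student_p = ttest_ind(group_a, group_b)

# Welch's t-test: does not assume equal variances.
welch_t, welch_p = ttest_ind(group_a, group_b, equal_var=False)

print("Student's t-test p-value:", student_p)
print("Welch's t-test p-value:  ", welch_p)
```

If the two p-values fall on opposite sides of 0.05, the conclusion depends on the variance assumption and should be reported with that caveat.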

The Familiar team has provided us with another dataset containing survey data about iron counts for our subscribers. This data has been pre-processed to categorize iron counts as “low”, “normal”, and “high” for each subscriber. Familiar wants to be able to advise potential subscribers about possible side effects of these packs and whether they differ for the Vein vs. the Artery pack.

We’d like to find out if there is a significant association between which pack (Vein vs. Artery) someone subscribes to and their iron level.

Hypothesis:
- Null: There is NOT an association between which pack (Vein vs. Artery) someone subscribes to and their iron level.
- Alternative: There is an association between which pack (Vein vs. Artery) someone subscribes to and their iron level.

In [7]:
from scipy.stats import chi2_contingency

#Cross-tabulation
Xtab = pd.crosstab(iron['pack'], iron['iron'])
print(Xtab)

chi2, pval, dof, exp = chi2_contingency(Xtab)
print('P-value:', pval)

iron    high  low  normal
pack                     
artery    87   29      29
vein      20  140      40
P-value: 9.359749337433008e-25


The p-value is smaller than 0.05, so we conclude that there IS a significant association between pack and iron level.
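
To see where this result comes from, the sketch below rebuilds the expected counts under independence from the cross-tabulation printed above and compares them with what `chi2_contingency` returns. The observed counts are copied from that output; the variable names are illustrative.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Observed counts copied from the cross-tabulation output.
# Rows: artery, vein. Columns: high, low, normal.
observed = np.array([[87, 29, 29],
                     [20, 140, 40]])

row_totals = observed.sum(axis=1)
col_totals = observed.sum(axis=0)
grand_total = observed.sum()

# Expected count per cell if pack and iron level were independent:
# (row total * column total) / grand total.
expected_manual = np.outer(row_totals, col_totals) / grand_total

# Chi-square statistic: sum of squared deviations scaled by the expected counts.
chi2_manual = ((observed - expected_manual) ** 2 / expected_manual).sum()

chi2, pval, dof, expected = chi2_contingency(observed)
print('Expected counts:')
print(np.round(expected, 1))
print('Manual chi-square:', chi2_manual)
print('SciPy chi-square: ', chi2)
print('Degrees of freedom:', dof)
```

Comparing the observed and expected tables cell by cell shows which iron levels drive the association: the "high" and "low" columns deviate strongly from independence, while the "normal" column matches its expected counts.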