### Familiar: A Study In Data Analysis
Welcome to Familiar, a startup in the new market of blood transfusion! You’ve joined the team because you appreciate the flexible hours and the extremely intelligent team, and the overeager doorman welcoming you into the office is a nice way to start your workday (well, work-evening).

Familiar has fallen on hard times lately, so you’re hoping to help the team gain some insights about its product and move the needle (so to speak).

The Familiar team has provided us with data on the lifespans of subscribers to two different packages, the Vein Pack and the Artery Pack! We load this data below into a dataframe named `lifespans`, along with a second dataset on subscribers’ iron levels named `iron`.

```python
import numpy as np
import pandas as pd
```

```python
# Load datasets
lifespans = pd.read_csv('/Users/elorm/Documents/Repos/Datasets/familiar_lifespan.csv')
iron = pd.read_csv('/Users/elorm/Documents/Repos/Datasets/familiar_iron.csv')
```

```python
# Inspect the first few rows of each dataframe
print(lifespans.head())
print('')
print(iron.head())
```

```
     pack   lifespan
0    vein  76.255090
1  artery  76.404504
2  artery  75.952442
3  artery  76.923082
4  artery  73.771212

     pack    iron
0    vein     low
1  artery  normal
2  artery  normal
3  artery  normal
4  artery    high
```
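Because the t-tests that follow depend on sample size, it helps to check how many subscribers each pack has. With the real data this is just `lifespans['pack'].value_counts()`. The self-contained sketch below instead rebuilds only the `pack` column from the row indices shown in the Vein Pack printout further down, assuming (as the untruncated printouts suggest) that the file has exactly 40 rows and that every non-vein row is an Artery Pack row. The names `vein_idx`, `pack`, and `counts` are illustrative.

```python
import pandas as pd

# Row indices of Vein Pack subscribers, copied from the vein_pack_lifespans printout
vein_idx = [0, 7, 8, 9, 11, 12, 13, 15, 18, 19,
            21, 23, 24, 26, 27, 30, 33, 34, 37, 38]

# Assumption: rows 0-39 make up the whole file, and every non-vein row is artery
pack = pd.Series(
    ["vein" if i in vein_idx else "artery" for i in range(40)],
    name="pack",
)

# Count subscribers per pack; each count is the sample size n used by later tests
counts = pack.value_counts()
print(counts)
```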


The first thing we want to know is whether Familiar’s most basic package, the Vein Pack, actually has a significant impact on subscribers. It would be a marketing goldmine if we could show that Vein Pack subscribers live longer than other people.

```python
# Extract the lifespans of 'vein' pack subscribers into vein_pack_lifespans
vein_pack_lifespans = lifespans.loc[lifespans['pack'] == 'vein', 'lifespan']
print(vein_pack_lifespans)
```

```
0     76.255090
7     74.502021
8     76.937674
9     74.838006
11    77.484756
12    79.969158
13    77.047370
15    72.874752
18    77.492341
19    77.583983
21    75.993359
23    74.798150
24    73.343702
26    78.326720
27    76.341550
30    77.488889
33    72.142566
34    76.532101
37    75.993032
38    77.435045
Name: lifespan, dtype: float64
```
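The filter works by comparing the `pack` column to a string, which yields a boolean mask with one `True`/`False` per row, and then keeping only the rows where the mask is `True`. A minimal sketch on the five rows printed by `head()` above (the small frame `df` is a stand-in for `lifespans`): `.loc[mask, 'lifespan']` selects rows and column in a single step, which pandas recommends over chained indexing such as `df[mask]['lifespan']`.

```python
import pandas as pd

# The first five rows of `lifespans`, copied from the head() output above
df = pd.DataFrame({
    "pack": ["vein", "artery", "artery", "artery", "artery"],
    "lifespan": [76.255090, 76.404504, 75.952442, 76.923082, 73.771212],
})

# Comparing a column to a scalar gives a boolean Series aligned with the rows
mask = df["pack"] == "vein"
print(mask.tolist())  # [True, False, False, False, False]

# Select the masked rows and the 'lifespan' column in one indexing step;
# the original row index (here 0) is preserved in the result
vein = df.loc[mask, "lifespan"]
print(vein)
```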


```python
# Average lifespan of Vein Pack subscribers
print(np.mean(vein_pack_lifespans))
```

```
76.16901335636044
```


We’d like to find out whether the average lifespan of a Vein Pack subscriber is significantly different from the average life expectancy of 73 years.

To do this, we need a test from `scipy.stats` that fits the following null and alternative hypotheses:

- Null: The average lifespan of a Vein Pack subscriber is 73 years.
- Alternative: The average lifespan of a Vein Pack subscriber is NOT 73 years.
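In symbols, writing $\mu$ for the mean lifespan of all Vein Pack subscribers, $\bar{x}$ for the sample mean, $s$ for the sample standard deviation, and $n$ for the sample size, the hypotheses and the one-sample t statistic are:

$$
H_0:\ \mu = 73 \qquad H_1:\ \mu \neq 73
$$

$$
t = \frac{\bar{x} - 73}{s / \sqrt{n}}, \qquad s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2}
$$

Under $H_0$, $t$ follows a t distribution with $n - 1$ degrees of freedom, and the two-sided p-value is $2\,P(T_{n-1} \ge |t|)$. Here $n = 20$, since the Vein Pack printout lists 20 subscribers.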

We have a single sample of lifespans, and we want to compare its mean to a hypothesized population mean of 73 years, so a one-sample t-test is appropriate here. The test assumes the lifespans are independent observations from a roughly normal population.
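To make the test less of a black box, here is a minimal sketch that computes the one-sample t statistic and its two-sided p-value by hand and checks them against `scipy.stats.ttest_1samp`. It uses the 20 Vein Pack values from the printout above, which are rounded to six decimals, so results match the notebook’s p-value closely but not to every digit. Variable names such as `t_manual` and `mu0` are illustrative.

```python
import numpy as np
from scipy import stats

# Vein Pack lifespans, copied from the vein_pack_lifespans printout (6 decimals)
vein = np.array([
    76.255090, 74.502021, 76.937674, 74.838006, 77.484756,
    79.969158, 77.047370, 72.874752, 77.492341, 77.583983,
    75.993359, 74.798150, 73.343702, 78.326720, 76.341550,
    77.488889, 72.142566, 76.532101, 75.993032, 77.435045,
])
mu0 = 73  # hypothesized population mean (average life expectancy)

n = vein.size
x_bar = vein.mean()
s = vein.std(ddof=1)  # sample standard deviation (n - 1 in the denominator)

# t statistic, then two-sided p-value from the t distribution's survival function
t_manual = (x_bar - mu0) / (s / np.sqrt(n))
p_manual = 2 * stats.t.sf(abs(t_manual), df=n - 1)

# scipy's one-sample t-test should agree with the hand computation
t_scipy, p_scipy = stats.ttest_1samp(vein, mu0)
print(t_manual, p_manual)
print(t_scipy, p_scipy)
```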

```python
# Run a one-sample t-test against the hypothesized mean of 73 years
from scipy.stats import ttest_1samp
tstat, pval = ttest_1samp(vein_pack_lifespans, 73)
print(pval)
```

```
5.972157921433082e-07
```


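The p-value (about 6.0e-07) is far below the 0.05 significance level, so we reject the null hypothesis and conclude that the average lifespan of Vein Pack subscribers IS significantly different from 73 years. Since the sample mean is about 76.2 years, the difference points toward longer lives.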
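A complementary view is a confidence interval for the mean lifespan. For a two-sided test at significance level α = 0.05, rejecting the null hypothesis is equivalent to 73 falling outside the 95% confidence interval. The sketch below, again using the rounded Vein Pack values from the printout, builds that interval from the t critical value and the standard error; `alpha`, `se`, and `t_crit` are illustrative names.

```python
import numpy as np
from scipy import stats

# Vein Pack lifespans, copied from the vein_pack_lifespans printout (6 decimals)
vein = np.array([
    76.255090, 74.502021, 76.937674, 74.838006, 77.484756,
    79.969158, 77.047370, 72.874752, 77.492341, 77.583983,
    75.993359, 74.798150, 73.343702, 78.326720, 76.341550,
    77.488889, 72.142566, 76.532101, 75.993032, 77.435045,
])
mu0, alpha = 73, 0.05

n = vein.size
se = vein.std(ddof=1) / np.sqrt(n)             # standard error of the mean
t_crit = stats.t.ppf(1 - alpha / 2, df=n - 1)  # two-sided critical value

# 95% confidence interval for the mean Vein Pack lifespan
ci_low = vein.mean() - t_crit * se
ci_high = vein.mean() + t_crit * se

# Decision rule: reject H0 when p < alpha (equivalently, mu0 lies outside the CI)
_, pval = stats.ttest_1samp(vein, mu0)
reject = pval < alpha
print(f"p = {pval:.3g}, reject H0: {reject}")
print(f"95% CI: ({ci_low:.2f}, {ci_high:.2f})")
```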

#### Upselling Familiar: Pumping Life Into The Company
To differentiate Familiar’s product lines, we’d like to compare this lifespan data across our packages. Our next step up from the Vein Pack is the Artery Pack.

```python
# Extract the lifespans of 'artery' pack subscribers into artery_pack_lifespans
artery_pack_lifespans = lifespans.loc[lifespans['pack'] == 'artery', 'lifespan']
print(artery_pack_lifespans)
```

```
1     76.404504
2     75.952442
3     76.923082
4     73.771212
5     74.117420
6     74.963119
10    73.319543
14    75.857401
16    76.335370
17    73.079249
20    76.152654
22    77.023545
25    74.639757
28    78.385477
29    68.314898
31    77.386507
32    73.044766
35    74.544983
36    73.355103
39    73.902213
Name: lifespan, dtype: float64
```


```python
# Average lifespan of Artery Pack subscribers
print(np.mean(artery_pack_lifespans))
```

```
74.8736622351704
```
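Rather than filtering each pack separately, a single `groupby` can summarize every pack side by side. With the real data this would be `lifespans.groupby('pack')['lifespan'].agg(['count', 'mean', 'std'])`. The self-contained sketch below rebuilds a long-format frame from the two printouts above (values rounded to six decimals), so the means match the notebook’s output to about five decimal places; `df` and `summary` are illustrative names.

```python
import pandas as pd

# Lifespans copied from the vein_pack_lifespans and artery_pack_lifespans printouts
vein = [76.255090, 74.502021, 76.937674, 74.838006, 77.484756,
        79.969158, 77.047370, 72.874752, 77.492341, 77.583983,
        75.993359, 74.798150, 73.343702, 78.326720, 76.341550,
        77.488889, 72.142566, 76.532101, 75.993032, 77.435045]
artery = [76.404504, 75.952442, 76.923082, 73.771212, 74.117420,
          74.963119, 73.319543, 75.857401, 76.335370, 73.079249,
          76.152654, 77.023545, 74.639757, 78.385477, 68.314898,
          77.386507, 73.044766, 74.544983, 73.355103, 73.902213]

# Stack both packs into one long-format frame shaped like `lifespans`
df = pd.DataFrame({
    "pack": ["vein"] * len(vein) + ["artery"] * len(artery),
    "lifespan": vein + artery,
})

# Count, mean, and sample standard deviation of lifespan for each pack
summary = df.groupby("pack")["lifespan"].agg(["count", "mean", "std"])
print(summary)
```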
