## Analyzing Distinctions in Success of Blood Transfusion Products using Inferential Statistics Techniques

A project for my Codecademy-based Data Scientist Certification, for Machine Learning specialists.

Robert Hall

08/22/2024

#### Product and Scenario Rundown

This analysis is for a fictional company named *Familiar*, a blood transfusion product manufacturer that offers two products: the 'artery' pack and the 'vein' pack. The company needs to understand the efficacy and relative success of the two products.

The following questions guide the project:

* What are the average participant lifespans for each product?
* Is the average lifespan of vein pack subscribers significantly different from the [hypothetical] average life expectancy of 73 years? 
* Is the average lifespan of artery pack subscribers significantly different from the [hypothetical] average life expectancy of 73 years? 
* Is the average lifespan of vein pack patients significantly different from the average lifespan of artery pack patients?
* Is there an association between the product that a given patient uses and their iron level?

In [1]:
import numpy as np
import pandas as pd
from scipy.stats import ttest_1samp, ttest_ind, chi2_contingency

In [5]:
lifespans = pd.read_csv('lifespan.csv')
print(lifespans.head())

     pack   lifespan
0    vein  76.255090
1  artery  76.404504
2  artery  75.952442
3  artery  76.923082
4  artery  73.771212


In [6]:
iron = pd.read_csv('iron.csv')
print(iron.head())

     pack    iron
0    vein     low
1  artery  normal
2  artery  normal
3  artery  normal
4  artery    high


#### What are the average participant lifespans for each product?


In [18]:
# select the lifespan column (a pandas Series) for vein pack patients only
vein_pack_lifespans = lifespans[lifespans['pack'] == 'vein']['lifespan']

# select the lifespan column (a pandas Series) for artery pack patients only
artery_pack_lifespans = lifespans[lifespans['pack'] == 'artery']['lifespan']

# means for each group
vein_lifespan_mean = np.round(np.mean(vein_pack_lifespans), 2)
artery_lifespan_mean = np.round(np.mean(artery_pack_lifespans), 2)

# print mean lifespan for vein pack patients
print(f"mean lifespan for Vein:  {vein_lifespan_mean} years")

# print mean lifespan for artery pack patients
print(f"mean lifespan for Artery:  {artery_lifespan_mean} years")

mean lifespan for Vein:  76.17 years
mean lifespan for Artery:  74.87 years
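
As a compact alternative to filtering each pack by hand, the same per-product means can be obtained with a single `groupby`. The sketch below is only illustrative: it uses a synthetic stand-in (`demo`) with the same two columns as `lifespan.csv`, so its numbers are not the project's results.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for lifespan.csv: same columns, synthetic values.
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "pack": ["vein"] * 20 + ["artery"] * 20,
    "lifespan": np.concatenate([rng.normal(76, 4, 20), rng.normal(75, 4, 20)]),
})

# One row per pack: group size, mean, and sample standard deviation.
summary = demo.groupby("pack")["lifespan"].agg(["count", "mean", "std"]).round(2)
print(summary)
```

Reporting the group size and standard deviation alongside the mean gives some context for the t-tests that follow.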


#### Is the average lifespan of vein pack subscribers significantly different from the [hypothetical] average life expectancy of 73 years? 

In [16]:
sig = 0.05 # significance threshold

print(f"average lifespan:            73 years")
print(f"vein pack average lifespan:  {np.round(np.mean(vein_pack_lifespans), 2)} years")

tstat, pval = ttest_1samp(vein_pack_lifespans, 73) # two-sided one-sample t-test

print("p-value: ", round(pval, 6))
if pval < sig:
  print("significant difference from average life expectancy.")
else:
  print("no significant difference from average life expectancy.")

average lifespan:            73 years
vein pack average lifespan:  76.17 years
p-value:  1e-06
significant difference from average life expectancy.
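
To make explicit what `ttest_1samp` computes, the sketch below rebuilds the t statistic and two-sided p-value by hand and compares them with SciPy's result. The `sample` array is synthetic (an assumption for illustration), not the vein pack data.

```python
import numpy as np
from scipy.stats import t, ttest_1samp

# Synthetic lifespans for illustration only (not the project data).
rng = np.random.default_rng(1)
sample = rng.normal(76, 5, 30)
mu0 = 73  # hypothesized population mean

n = sample.size
# t = (sample mean - mu0) / (sample std / sqrt(n)), with ddof=1 for the sample std
t_manual = (sample.mean() - mu0) / (sample.std(ddof=1) / np.sqrt(n))
# two-sided p-value from the t distribution with n - 1 degrees of freedom
p_manual = 2 * t.sf(abs(t_manual), df=n - 1)

tstat, pval = ttest_1samp(sample, mu0)
print(np.isclose(t_manual, tstat), np.isclose(p_manual, pval))
```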


#### Is the average lifespan of artery pack subscribers significantly different from the [hypothetical] average life expectancy of 73 years?

In [14]:
sig = 0.05 # significance threshold

print(f"average lifespan:               73 years")
print(f"artery pack average life span:  {np.round(np.mean(artery_pack_lifespans), 2)} years")

tstat, pval = ttest_1samp(artery_pack_lifespans, 73) # two-sided one-sample t-test

print("p-value: ", round(pval, 6))
if pval < sig:
  print("significant difference from average life expectancy.")
else:
  print("no significant difference from average life expectancy.")

average lifespan:               73 years
artery pack average life span:  74.87 years
p-value:  0.001241
significant difference from average life expectancy.
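
A confidence interval for the mean is a useful companion to the p-value, because it shows how far the group mean plausibly sits from 73 years. The following is a minimal sketch on synthetic data (the `sample` array is an assumption, not the artery pack data); a 95% interval excludes 73 exactly when the two-sided test is significant at 0.05.

```python
import numpy as np
from scipy import stats

# Synthetic lifespans for illustration only (not the project data).
rng = np.random.default_rng(2)
sample = rng.normal(75, 5, 40)
mu0 = 73  # hypothesized population mean

n = sample.size
mean = sample.mean()
sem = stats.sem(sample)  # standard error of the mean (uses ddof=1)

# 95% two-sided confidence interval based on the t distribution
lo, hi = stats.t.interval(0.95, df=n - 1, loc=mean, scale=sem)
_, pval = stats.ttest_1samp(sample, mu0)

excludes_mu0 = not (lo <= mu0 <= hi)
print(f"mean = {mean:.2f}, 95% CI = ({lo:.2f}, {hi:.2f}), p = {pval:.4f}")
print("interval excludes 73:", excludes_mu0)
```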


#### Is the average lifespan of vein pack patients significantly different from the average lifespan of artery pack patients?

In [20]:
sig = 0.05 # significance threshold

print(f'average lifespan for artery pack group: {artery_lifespan_mean} years')
print(f'average lifespan for vein pack group: {vein_lifespan_mean} years')

tstat, pval = ttest_ind(artery_pack_lifespans, vein_pack_lifespans) # two-sample t-test (assumes equal variances by default)

print("p-value: ", round(pval, 6))
if pval < sig:
  print("significant difference in life expectancy.")
else:
  print("no significant difference in life expectancy.")

average lifespan for artery pack group: 74.87 years
average lifespan for vein pack group: 76.17 years
p-value:  0.055889
no significant difference in life expectancy.
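
Because this p-value is close to the 0.05 threshold, it may be worth checking how sensitive the result is to the equal-variance assumption that `ttest_ind` makes by default. The sketch below compares the default (Student) test with Welch's test via `equal_var=False`. The two arrays are synthetic stand-ins with deliberately different spreads and sizes, so the printed values say nothing about the real data.

```python
import numpy as np
from scipy.stats import ttest_ind

# Synthetic groups for illustration only (not the project data).
rng = np.random.default_rng(3)
artery_demo = rng.normal(75, 6, 25)
vein_demo = rng.normal(76, 3, 40)

# Student's t-test: pools the two variances (SciPy's default)
t_student, p_student = ttest_ind(artery_demo, vein_demo)
# Welch's t-test: does not assume equal variances
t_welch, p_welch = ttest_ind(artery_demo, vein_demo, equal_var=False)

print(f"Student: t = {t_student:.3f}, p = {p_student:.4f}")
print(f"Welch:   t = {t_welch:.3f}, p = {p_welch:.4f}")
```

If the two p-values fall on opposite sides of the threshold, the conclusion depends on the variance assumption and should be reported with that caveat.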


#### Is there an association between the product that a given patient uses and their iron level?

In [17]:
sig = 0.05 # significance threshold
contingency = pd.crosstab(iron.pack, iron.iron) # contingency table: product (rows) by iron level (columns)

print(contingency, '\n') # observed counts of each iron level per product group

chi2, pval, dof, expected = chi2_contingency(contingency) # chi2 test of association

print("p-value: ", round(pval, 26))
if pval < sig:
  print("significant association.")
else:
  print("no significant association.")

iron    high  low  normal
pack                     
artery    87   29      29
vein      20  140      40 

p-value:  9.400000000000001e-25
significant association.
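
The chi-square test only says that product and iron level are associated, not how. One way to see the pattern is to convert the counts to within-product shares and compare the observed counts with those expected under independence. The sketch below rebuilds the table from the counts printed above rather than reading `iron.csv`; the names `observed`, `row_share`, and `expected_df` are illustrative.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Observed counts copied from the contingency table printed above.
observed = pd.DataFrame(
    [[87, 29, 29], [20, 140, 40]],
    index=pd.Index(["artery", "vein"], name="pack"),
    columns=pd.Index(["high", "low", "normal"], name="iron"),
)

# Share of each iron level within each product (each row sums to 1).
row_share = observed.div(observed.sum(axis=1), axis=0).round(3)
print(row_share, "\n")

chi2, pval, dof, expected = chi2_contingency(observed)

# Counts expected if product and iron level were independent.
expected_df = pd.DataFrame(
    expected, index=observed.index, columns=observed.columns
).round(1)
print(expected_df, "\n")
print("degrees of freedom:", dof)
```

In these counts, 60% of artery pack patients have high iron (87 of 145), while 70% of vein pack patients have low iron (140 of 200).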


#### Summary Analysis and Key Takeaways

* Vein pack users have a mean lifespan of 76.17 years; artery pack users have a mean lifespan of 74.87 years.

* For both products, the average lifespan is significantly higher than the hypothetical life expectancy of 73 years (one-sample t-tests: p ≈ 0.000001 for the vein pack, p = 0.001241 for the artery pack).

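* At the 0.05 significance level, there is no statistically significant difference in lifespan between the two products (two-sample t-test, p = 0.055889), although the p-value falls only slightly above the threshold.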

* There is a statistically significant association between product and iron level (chi-square test, p ≈ 9.4e-25). In the contingency table, 140 of 200 vein pack patients have low iron, while 87 of 145 artery pack patients have high iron.