# Heart Disease Research Part I
In this project, we’ll investigate data from a sample of patients who were evaluated for heart disease at the Cleveland Clinic Foundation. The data was downloaded from the UCI Machine Learning Repository and then cleaned for analysis.

In [16]:
import pandas as pd
from scipy.stats import ttest_1samp
from scipy.stats import binom_test

In [3]:
df = pd.read_csv('heart_disease.csv')
df.head()

Unnamed: 0,age,sex,trestbps,chol,cp,exang,fbs,thalach,heart_disease
0,63.0,male,145.0,233.0,typical angina,0.0,1.0,150.0,absence
1,67.0,male,160.0,286.0,asymptomatic,1.0,0.0,108.0,presence
2,67.0,male,120.0,229.0,asymptomatic,1.0,0.0,129.0,presence
3,37.0,male,130.0,250.0,non-anginal pain,0.0,0.0,187.0,absence
4,41.0,female,130.0,204.0,atypical angina,0.0,0.0,172.0,absence


# Cholesterol Analysis

In [6]:
yes_hd = df[df.heart_disease == 'presence']
no_hd = df[df.heart_disease == 'absence']
# we’ll investigate cholesterol levels for patients with heart disease
chol_hd = yes_hd['chol']
print(chol_hd.mean())

251.4748201438849


In general, total cholesterol over 240 mg/dl is considered “high” (and therefore unhealthy).


Do people with heart disease have high cholesterol levels (greater than 240 mg/dl) on average?

Null hypothesis: People with heart disease have an average cholesterol level equal to 240 mg/dl

Alternative hypothesis: People with heart disease have an average cholesterol level that is greater than 240 mg/dl

In [9]:
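# compare to cut-off for high cholesterol
# ttest_1samp returns a two-sided p-value; halving it gives the one-sided
# p-value here because the sample mean (about 251.5) is above 240
tstat, pval = ttest_1samp(chol_hd, 240)
print(pval/2)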

0.0035411033905155707


# Finding 1
Using a significance threshold of 0.05, we reject the null hypothesis and conclude that heart disease patients have an average cholesterol level significantly greater than 240 mg/dl.
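The one-sided p-value in the cell above was obtained by halving the two-sided result. The sketch below illustrates why that works when the t statistic is positive, and shows the same one-sided test requested directly. It uses a small made-up array in place of `chol_hd` (the values are hypothetical, not from the dataset), and the `alternative` argument assumes SciPy 1.6 or later.

```python
import numpy as np
from scipy.stats import ttest_1samp

# Hypothetical stand-in values; the real analysis uses chol_hd from heart_disease.csv.
sample = np.array([251.0, 262.0, 238.0, 270.0, 245.0, 259.0, 233.0, 266.0])
threshold = 240

# Two-sided test, as in the notebook cell, followed by halving the p-value.
tstat, p_two_sided = ttest_1samp(sample, threshold)
p_halved = p_two_sided / 2

# One-sided test requested directly (assumes SciPy >= 1.6).
_, p_greater = ttest_1samp(sample, threshold, alternative='greater')

# The t statistic by hand: (sample mean - threshold) / standard error of the mean.
t_manual = (sample.mean() - threshold) / (sample.std(ddof=1) / np.sqrt(len(sample)))

# Halving is only valid when the statistic points in the direction of the alternative.
print(tstat > 0, np.isclose(p_halved, p_greater), np.isclose(tstat, t_manual))
```

If the sample mean were below the threshold, the t statistic would be negative and the one-sided p-value would be `1 - pval/2` rather than `pval/2`, so passing `alternative='greater'` is the less error-prone option where it is available.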

In [10]:
# we’ll investigate cholesterol levels for patients without heart disease
chol_no_hd = no_hd['chol']
print(chol_no_hd.mean())

242.640243902439


In [11]:
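# compare to cut-off for high cholesterol
# halving the two-sided p-value is valid here because the sample mean
# (about 242.6) is above 240
tstat, pval = ttest_1samp(chol_no_hd, 240)
print(pval/2)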

0.26397120232220506


# Finding 2

Using a significance threshold of 0.05, we fail to reject the null hypothesis: there is no significant evidence that patients without heart disease have an average cholesterol level above 240 mg/dl.


# Fasting Blood Sugar Analysis

In [12]:
# calculate number of patients total
print(len(df))

303


The fbs column of this dataset indicates whether or not a patient’s fasting blood sugar was greater than 120 mg/dl (1 means that their fasting blood sugar was greater than 120 mg/dl; 0 means it was less than or equal to 120 mg/dl).

In [14]:
num_highfbs_patients = sum(df.fbs)
num_highfbs_patients

45.0

By some estimates, about 8% of the U.S. population had diabetes (diagnosed or undiagnosed) in 1988 when this data was collected.

Does this sample come from a population in which the rate of fbs > 120 mg/dl is equal to 8%?

Null Hypothesis: This sample was drawn from a population where 8% of people have fasting blood sugar > 120 mg/dl

Alternative hypothesis: This sample was drawn from a population where more than 8% of people have fasting blood sugar > 120 mg/dl
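For intuition before running the test, it can help to compare the count the null hypothesis would predict with the count actually observed. The short sketch below only restates the numbers printed earlier (303 patients, 45 with fbs equal to 1); the variable names are illustrative.

```python
n_patients = 303   # sample size printed earlier
n_high_fbs = 45    # patients with fbs == 1
null_rate = 0.08   # rate of fbs > 120 mg/dl under the null hypothesis

# Count we would expect if the null hypothesis were true.
expected_high_fbs = null_rate * n_patients

# Rate actually observed in the sample.
observed_rate = n_high_fbs / n_patients

print(round(expected_high_fbs, 2))  # about 24.24 patients expected
print(round(observed_rate, 3))      # about 0.149, compared with 0.08 under the null
```

The binomial test quantifies how surprising it would be to see 45 or more such patients when only about 24 are expected.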

In [22]:
# run binomial test
p_val = binom_test(45,303,0.08,alternative='greater')
print(p_val)

4.689471951449078e-05
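Note that `binom_test` was deprecated in SciPy 1.10 and removed in SciPy 1.12. On newer installations, a minimal equivalent (assuming SciPy 1.7 or later) uses `binomtest`, which returns a result object rather than a bare p-value. The cross-check against the binomial survival function is added here for illustration only.

```python
from scipy.stats import binom, binomtest

# Same one-sided test as above: 45 successes out of 303 trials, null rate 0.08.
result = binomtest(45, n=303, p=0.08, alternative='greater')
p_val = result.pvalue

# Cross-check: P(X >= 45) for X ~ Binomial(303, 0.08) is the survival function at 44.
p_tail = binom.sf(44, 303, 0.08)

print(p_val)
```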


# Finding 3
The p-value is about 0.0000469, which is less than 0.05. We therefore reject the null hypothesis: this sample likely comes from a population where more than 8% of people have fbs > 120 mg/dl.