## Cholesterol Analysis
The full dataset has been loaded and split into two subsets:

`yes_hd`, which contains data for patients with heart disease
`no_hd`, which contains data for patients without heart disease

For this project, we’ll investigate the following variables:

`chol`: serum cholesterol in mg/dl
`fbs`: an indicator for whether fasting blood sugar is greater than 120 mg/dl (`1` = true; `0` = false)

In [12]:
# Import libraries
import pandas as pd
import numpy as np
from scipy.stats import ttest_1samp

# Load data
heart = pd.read_csv('heart_disease_data.csv')
yes_hd = heart[heart.heart_disease == 'presence']
no_hd = heart[heart.heart_disease == 'absence']
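
The split above relies on boolean masks over the `heart_disease` column. As a minimal sketch of how that filtering behaves, the example below uses a tiny hypothetical DataFrame (`heart_demo`, with made-up values) in place of the real CSV and checks that every row lands in exactly one subset. The names `heart_demo`, `yes_demo`, and `no_demo` are illustrative only.

```python
import pandas as pd

# Hypothetical stand-in for heart_disease_data.csv (values are illustrative only)
heart_demo = pd.DataFrame({
    'chol': [233, 286, 229, 254],
    'fbs': [1, 0, 0, 1],
    'heart_disease': ['absence', 'presence', 'presence', 'absence'],
})

# Show which labels occur in the column and how often
print(heart_demo.heart_disease.value_counts())

# Boolean masks select the rows belonging to each group
yes_demo = heart_demo[heart_demo.heart_disease == 'presence']
no_demo = heart_demo[heart_demo.heart_disease == 'absence']

# Every row should land in exactly one of the two subsets
assert len(yes_demo) + len(no_demo) == len(heart_demo)
```

Running the same length check on the real `yes_hd` and `no_hd` would flag any rows whose label is missing or spelled differently.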

To start, we’ll investigate cholesterol levels for patients with heart disease. We save the `chol` column of `yes_hd` as a variable named `chol_hd` and then calculate its mean.

In [13]:
# Get cholesterol levels for patients with heart disease
chol_hd = yes_hd.chol

# Calculate the mean cholesterol for people with heart disease
mean_chol_hd = np.mean(chol_hd)
# Print the rounded value
print(round(mean_chol_hd))

251


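We want to know whether people with heart disease have high cholesterol levels (240 mg/dl or above) on average. This calls for a one-sided test: the null hypothesis is that their mean cholesterol is 240 mg/dl, and the alternative is that it is greater than 240 mg/dl. `ttest_1samp` returns a two-sided p-value by default, so we divide it by two, which is valid here because the sample mean (251) lies above 240.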

In [14]:
# Compare to cut-off for high cholesterol
tstat, pval = ttest_1samp(chol_hd, 240)
# Calculate the p-value for the one-sided test by dividing the two-sided p-value in half
print(pval/2)

0.0035411033905155703


The one-sided p-value of about `0.0035` is below the significance threshold of `0.05`, so we reject the null hypothesis: heart disease patients have an average cholesterol level significantly higher than 240 mg/dl.
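
To make the test statistic behind `ttest_1samp` more concrete, here is a minimal sketch that computes a one-sample t-statistic by hand using only the standard library. The readings in `sample` are hypothetical and are not taken from the dataset; only the cut-off of 240 mg/dl comes from the analysis above.

```python
import math
import statistics

# Hypothetical cholesterol readings in mg/dl (illustrative, not from the dataset)
sample = [233, 286, 229, 254, 203, 256, 284, 224, 206, 335]
mu0 = 240  # cut-off for high cholesterol

n = len(sample)
sample_mean = statistics.mean(sample)
sample_sd = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)

# One-sample t-statistic: distance of the sample mean from mu0, in standard errors
t_stat = (sample_mean - mu0) / (sample_sd / math.sqrt(n))

print(sample_mean, round(t_stat, 3))
```

The sign of the statistic matters for the one-sided test: halving the two-sided p-value gives the "greater than" p-value only when the t-statistic is positive, that is, when the sample mean is above the cut-off.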

Now we want to answer the same question for patients without heart disease, so we repeat the previous steps for `no_hd`.

In [16]:
# Get cholesterol levels for patients without heart disease
chol_no_hd = no_hd.chol

# Calculate the mean cholesterol for people without heart disease
mean_chol_no_hd = np.mean(chol_no_hd)
# Print the rounded value
print(round(mean_chol_no_hd))

# Compare to cut-off for high cholesterol
tstat, pval = ttest_1samp(chol_no_hd, 240)
# Calculate the p-value for the one-sided test by dividing the two-sided p-value in half
print(pval/2)

243
0.26397120232220506
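
For patients without heart disease, the p-value of about `0.264` is above `0.05`, so there is no evidence here that their mean cholesterol exceeds 240 mg/dl. Since the same one-sided test is run twice, it may be worth asking SciPy for it directly instead of halving by hand. The sketch below assumes SciPy 1.6 or later, where `ttest_1samp` accepts an `alternative` argument, and uses hypothetical readings (`sample`, `low_sample`) instead of the real data.

```python
import numpy as np
from scipy.stats import ttest_1samp

# Hypothetical readings in mg/dl (illustrative, not from the dataset)
sample = np.array([233, 286, 229, 254, 203, 256, 284, 224, 206, 335])

# Two-sided test, then halve the p-value (valid because the t-statistic is positive)
tstat, pval_two_sided = ttest_1samp(sample, 240)
pval_halved = pval_two_sided / 2

# One-sided test requested directly (assumes SciPy >= 1.6)
tstat_greater, pval_greater = ttest_1samp(sample, 240, alternative='greater')
print(pval_halved, pval_greater)  # the two values agree

# When the sample mean is below the cut-off, halving no longer gives the
# "greater than" p-value; the correct value is 1 - (two-sided p-value / 2)
low_sample = sample - 30
tstat_low, pval_low_two_sided = ttest_1samp(low_sample, 240)
_, pval_low_greater = ttest_1samp(low_sample, 240, alternative='greater')
print(pval_low_two_sided / 2, pval_low_greater)
```

Passing `alternative='greater'` avoids having to check the sign of the t-statistic before interpreting the halved p-value.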


## Fasting Blood Sugar Analysis

The `fbs` column of this dataset indicates whether or not a patient’s fasting blood sugar was greater than 120 mg/dl (`1` means that their fasting blood sugar was greater than 120 mg/dl; `0` means it was less than or equal to 120 mg/dl). Because the column holds only `0`s and `1`s, summing it counts the patients with high fasting blood sugar. We first calculate the total number of patients and then use `sum` to count those whose fasting blood sugar exceeds 120 mg/dl.

In [19]:
# Calculate the total number of patients in the heart dataset
num_patients = len(heart)
# Print the results
print(num_patients)

# Calculate number of patients with fbs>120
num_highfbs_patients = np.sum(heart.fbs)
# Print the results
print(num_highfbs_patients)

303
45.0
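
As a small worked example of the arithmetic, the two counts above can be turned into a proportion. The sketch below hard-codes the reported totals (303 and 45); the list `fbs_example` is a hypothetical indicator column used only to show why summing counts the `1`s.

```python
# Counts reported above: 303 patients in total, 45 with fbs > 120 mg/dl
num_patients = 303
num_highfbs_patients = 45

# A tiny hypothetical indicator column: summing 0/1 values counts the 1s
fbs_example = [1, 0, 0, 1, 0]
assert sum(fbs_example) == fbs_example.count(1)

# Share of patients with high fasting blood sugar
prop_highfbs = num_highfbs_patients / num_patients
print(round(prop_highfbs, 3))  # prints 0.149
```

So roughly 15% of the patients in this dataset have fasting blood sugar above 120 mg/dl. Assuming the column has no missing values, `np.mean(heart.fbs)` should give the same proportion in one step.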
