In [17]:
import numpy as np
import pandas as pd
from scipy.stats import ttest_1samp, ttest_ind, ttest_rel

# Load the heart disease dataset
df = pd.read_csv("heart.csv")


### One-Sample t-test

The one-sample t-test checks whether the mean of the population a sample was drawn from differs from a known or hypothesized value. Here, we’ll test whether the average cholesterol level (Chol) in our sample is significantly different from a hypothesized population mean of 200 mg/dL.

In [11]:
# One-sample t-test: Test if the mean cholesterol level is different from 200

chol_sample = df['Chol']
population_mean = 200

t_statistic, p_value = ttest_1samp(chol_sample, population_mean)
print("One-Sample t-test:")
print("Null Hypothesis: The mean cholesterol level of the sample is equal to 200 mg/dL.")
print(f"t-statistic = {t_statistic}, p-value = {p_value}")
if p_value < 0.05:
    print("Result: Reject the null hypothesis - the sample mean cholesterol is significantly different from 200 mg/dL.\n")
else:
    print("Result: Fail to reject the null hypothesis - the sample mean cholesterol is not significantly different from 200 mg/dL.\n")


One-Sample t-test:
Null Hypothesis: The mean cholesterol level of the sample is equal to 200 mg/dL.
t-statistic = 15.697754943543861, p-value = 5.111676087498585e-41
Result: Reject the null hypothesis - the sample mean cholesterol is significantly different from 200 mg/dL.
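
For reference, the statistic reported by `ttest_1samp` follows the usual formula `t = (mean - mu0) / (s / sqrt(n))`, where `s` is the sample standard deviation computed with `n - 1` in the denominator, and the two-sided p-value comes from a t distribution with `n - 1` degrees of freedom. The sketch below reproduces this by hand on synthetic values (made up for illustration, not taken from heart.csv):

```python
import numpy as np
from scipy import stats

# Synthetic cholesterol-like values; illustrative only, not the notebook's data
rng = np.random.default_rng(0)
sample = rng.normal(loc=210, scale=50, size=300)
mu0 = 200  # hypothesized population mean

n = sample.size
standard_error = sample.std(ddof=1) / np.sqrt(n)  # ddof=1 gives the sample SD
t_manual = (sample.mean() - mu0) / standard_error
p_manual = 2 * stats.t.sf(abs(t_manual), df=n - 1)  # two-sided p-value

t_scipy, p_scipy = stats.ttest_1samp(sample, mu0)
print(f"manual: t = {t_manual:.4f}, p = {p_manual:.4g}")
print(f"scipy:  t = {t_scipy:.4f}, p = {p_scipy:.4g}")
```

Both lines should print the same values, which is a quick way to check what the library call is doing.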



### Two-Sample t-test

An independent two-sample t-test compares the means of two independent groups. In the heart disease dataset, we can test whether the mean resting blood pressure (RestBP) differs between individuals with and without heart disease, assuming the AHD column indicates the presence of heart disease.
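
Note that `ttest_ind` assumes equal population variances by default (`equal_var=True`). Passing `equal_var=False` runs Welch's t-test, which drops that assumption. A small sketch on made-up blood-pressure-like samples (not the notebook's data) comparing the two, with Welch's statistic also computed by hand:

```python
import numpy as np
from scipy.stats import ttest_ind

# Synthetic groups with different sizes and spreads; illustrative only
rng = np.random.default_rng(1)
group_a = rng.normal(loc=134, scale=19, size=140)
group_b = rng.normal(loc=129, scale=16, size=160)

# Student's t-test (pooled variance) vs. Welch's t-test (separate variances)
t_student, p_student = ttest_ind(group_a, group_b)
t_welch, p_welch = ttest_ind(group_a, group_b, equal_var=False)

# Welch's statistic by hand: mean difference over the unpooled standard error
var_a = group_a.var(ddof=1) / group_a.size
var_b = group_b.var(ddof=1) / group_b.size
t_welch_manual = (group_a.mean() - group_b.mean()) / np.sqrt(var_a + var_b)

print(f"Student: t = {t_student:.4f}, p = {p_student:.4g}")
print(f"Welch:   t = {t_welch:.4f}, p = {p_welch:.4g}")
print(f"Welch (manual t) = {t_welch_manual:.4f}")
```

When group sizes or variances differ noticeably, Welch's version is often the safer choice.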

In [14]:
# Independent two-sample t-test: Test if the mean resting blood pressure differs between patients with and without heart disease

group1 = df[df['AHD'] == 1]['RestBP']  # Patients with heart disease
group2 = df[df['AHD'] == 0]['RestBP']  # Patients without heart disease

t_statistic, p_value = ttest_ind(group1, group2)
print("Independent Two-Sample t-test:")
print("Null Hypothesis: The mean resting blood pressure is the same for individuals with and without heart disease.")
print(f"t-statistic = {t_statistic}, p-value = {p_value}")
if p_value < 0.05:
    print("Result: Reject the null hypothesis - there is a significant difference in resting blood pressure between individuals with and without heart disease.\n")
else:
    print("Result: Fail to reject the null hypothesis - no significant difference in resting blood pressure between individuals with and without heart disease.\n")

Independent Two-Sample t-test:
Null Hypothesis: The mean resting blood pressure is the same for individuals with and without heart disease.
t-statistic = nan, p-value = nan
Result: Fail to reject the null hypothesis - no significant difference in resting blood pressure between individuals with and without heart disease.
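
The `nan` values above are not a real test result, so the "fail to reject" message printed for them should not be interpreted. `ttest_ind` returns `nan` when a group is empty or contains missing values. One plausible cause here, which is an assumption because the notebook does not show the column's contents, is that `AHD` holds string labels such as "Yes"/"No" rather than 1/0, so both filters match no rows. A minimal sketch on a hypothetical toy frame showing the symptom and a more defensive version:

```python
import numpy as np
import pandas as pd
from scipy.stats import ttest_ind

# Hypothetical toy data; the "Yes"/"No" encoding is an assumption
toy = pd.DataFrame({
    "AHD": ["Yes", "No", "Yes", "No", "Yes", "No", "Yes", "No"],
    "RestBP": [145, 130, 150, 120, 138, 128, 160, 118],
})

# Comparing string labels with the integer 1 matches nothing: the group is empty
empty_group = toy[toy["AHD"] == 1]["RestBP"]
print("rows matched by AHD == 1:", len(empty_group))

# Inspect the labels actually present, then filter on those values
print(toy["AHD"].value_counts())
with_ahd = toy.loc[toy["AHD"] == "Yes", "RestBP"].dropna()
without_ahd = toy.loc[toy["AHD"] == "No", "RestBP"].dropna()

# Guard against empty or near-empty groups before testing
if len(with_ahd) < 2 or len(without_ahd) < 2:
    raise ValueError("Each group needs at least two observations.")

t_stat, p_val = ttest_ind(with_ahd, without_ahd)
print(f"t-statistic = {t_stat:.4f}, p-value = {p_val:.4g}")
```

Checking `df['AHD'].unique()` and the group sizes on the real data would confirm or rule out this explanation.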



### Paired t-test

A paired t-test compares the means of two related sets of measurements, such as the same individuals measured twice. Suppose each individual’s cholesterol level was measured before a treatment (chol_before) and again after it (chol_after). We can then test whether there’s a significant change in cholesterol levels post-treatment.

The dataset contains no such measurements, so for this example we create both columns for demonstration purposes:

In [18]:
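# Adding hypothetical columns for demonstration
df['chol_before'] = df['Chol']  # Treat 'Chol' as the before-treatment cholesterol
# Simulated after-treatment cholesterol: subtract a random drop with mean 5 and SD 10 mg/dL
# (no random seed is set, so the exact numbers change from run to run)
df['chol_after'] = df['Chol'] - np.random.normal(5, 10, size=len(df))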

# Null Hypothesis: The mean cholesterol level before and after treatment is the same.
t_statistic, p_value = ttest_rel(df['chol_before'], df['chol_after'])
print("Paired t-test:")
print("Null Hypothesis: The mean cholesterol level before and after treatment is the same.")
print(f"t-statistic = {t_statistic}, p-value = {p_value}")
if p_value < 0.05:
    print("Result: Reject the null hypothesis - there is a significant difference in cholesterol levels before and after treatment.\n")
else:
    print("Result: Fail to reject the null hypothesis - no significant difference in cholesterol levels before and after treatment.\n")


Paired t-test:
Null Hypothesis: The mean cholesterol level before and after treatment is the same.
t-statistic = 9.619967189412986, p-value = 2.7706526851086036e-19
Result: Reject the null hypothesis - there is a significant difference in cholesterol levels before and after treatment.
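
A paired t-test is equivalent to a one-sample t-test on the per-individual differences, tested against a mean of 0. The sketch below illustrates this on synthetic before/after values (made up for illustration, using a seeded generator so the numbers are reproducible):

```python
import numpy as np
from scipy.stats import ttest_1samp, ttest_rel

# Synthetic paired measurements; illustrative only, not the notebook's data
rng = np.random.default_rng(42)
before = rng.normal(loc=245, scale=50, size=300)
after = before - rng.normal(loc=5, scale=10, size=300)  # simulated average drop of 5

# Paired test on the two columns
t_rel, p_rel = ttest_rel(before, after)

# Same test expressed as a one-sample test on the differences
differences = before - after
t_diff, p_diff = ttest_1samp(differences, 0.0)

print(f"ttest_rel:            t = {t_rel:.4f}, p = {p_rel:.4g}")
print(f"ttest_1samp on diffs: t = {t_diff:.4f}, p = {p_diff:.4g}")
```

Because only the differences matter, the large person-to-person variation in baseline cholesterol does not inflate the standard error, which is why a small simulated drop can still yield a very small p-value.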

