# Health Data Analysis Project
**Name:** Shouq Alshammari

## Introduction
This project presents a statistical analysis of simulated health data. The aim is to explore patterns and relationships among several health and demographic variables:
- Age
- Gender
- Heart Rate
- BMI (Body Mass Index)
- Sleep Hours
- Daily Activity Level
- Health Status

We will perform four statistical tests to investigate hypotheses, interpret results, and visualize the findings.

In [None]:
# Import libraries
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from scipy import stats
import warnings
warnings.filterwarnings('ignore')

# Set seed and style
np.random.seed(1)
sns.set(style='whitegrid')

In [None]:
# Generate simulated health data
n = 150
data = pd.DataFrame({
    'Age': np.random.randint(18, 65, n),
    'HeartRate': np.random.normal(75, 10, n).round(1),
    'BMI': np.random.normal(25, 4, n).round(1),
    'SleepHours': np.clip(np.random.normal(7, 1.5, n), 3, 10).round(1),
    'ActivityLevel': np.random.choice(['Low', 'Medium', 'High'], n),
    'Gender': np.random.choice(['Male', 'Female'], n),
    'HealthStatus': np.random.choice(['Healthy', 'Unhealthy'], n, p=[0.6, 0.4])
})
data.head()

## Descriptive Statistics

In [None]:
data.describe()

## Statistical Hypotheses
We will perform the following four statistical tests:
1. **T-test**: Is there a significant difference in BMI between males and females?
   - H₀: There is no difference in BMI between genders.
   - H₁: There is a difference in BMI between genders.

2. **Pearson Correlation**: Is there a relationship between sleep hours and BMI?
   - H₀: No correlation exists between sleep hours and BMI.
   - H₁: There is a correlation between sleep hours and BMI.

3. **ANOVA**: Does heart rate differ by activity level?
   - H₀: Heart rate means are equal across activity levels.
   - H₁: At least one group has a different mean heart rate.

4. **Chi-Square Test**: Is health status related to activity level?
   - H₀: Health status is independent of activity level.
   - H₁: Health status is associated with activity level.

### 1. T-test: BMI by Gender

In [None]:
male_bmi = data[data['Gender'] == 'Male']['BMI']
female_bmi = data[data['Gender'] == 'Female']['BMI']
t_stat, p_val = stats.ttest_ind(male_bmi, female_bmi)
print(f"T-statistic = {t_stat:.2f}, p-value = {p_val:.4f}")
print("Reject H₀" if p_val < 0.05 else "Fail to reject H₀")
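
The t-test above uses the default equal-variance form. As a hedged complement, the sketch below runs Welch's variant (`equal_var=False`) and reports Cohen's d as an effect size. The arrays `group_a` and `group_b` and the helper `cohens_d` are hypothetical stand-ins introduced for illustration, not part of the original notebook.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Hypothetical stand-ins for the male and female BMI columns
group_a = rng.normal(25, 4, 80)
group_b = rng.normal(25, 4, 70)

# Welch's t-test does not assume equal variances in the two groups
t_stat, p_val = stats.ttest_ind(group_a, group_b, equal_var=False)

def cohens_d(x, y):
    """Standardized mean difference using the pooled standard deviation."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * np.var(x, ddof=1)
                  + (ny - 1) * np.var(y, ddof=1)) / (nx + ny - 2)
    return (np.mean(x) - np.mean(y)) / np.sqrt(pooled_var)

d = cohens_d(group_a, group_b)
print(f"Welch t = {t_stat:.2f}, p-value = {p_val:.4f}, Cohen's d = {d:.2f}")
```

Reporting an effect size alongside the p-value shows how large a difference is, not only whether it is detectable.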

### 2. Pearson Correlation: Sleep Hours and BMI

In [None]:
corr, p_val = stats.pearsonr(data['SleepHours'], data['BMI'])
print(f"Correlation (r) = {corr:.2f}, p-value = {p_val:.4f}")
print("Reject H₀" if p_val < 0.05 else "Fail to reject H₀")
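
A confidence interval can make the correlation easier to interpret than the p-value alone. The sketch below is one possible approach, using the Fisher z-transformation; the helper name `pearson_ci` and the sample arrays are assumptions made for illustration.

```python
import numpy as np
from scipy import stats

def pearson_ci(x, y, alpha=0.05):
    """Pearson r with an approximate (1 - alpha) CI via the Fisher z-transform."""
    r, p = stats.pearsonr(x, y)
    n = len(x)
    z = np.arctanh(r)                 # Fisher z-transform of r
    se = 1.0 / np.sqrt(n - 3)         # approximate standard error of z
    z_crit = stats.norm.ppf(1 - alpha / 2)
    lo = np.tanh(z - z_crit * se)     # transform the bounds back to the r scale
    hi = np.tanh(z + z_crit * se)
    return r, p, lo, hi

rng = np.random.default_rng(0)
# Hypothetical stand-ins for the SleepHours and BMI columns
x = rng.normal(7, 1.5, 150)
y = rng.normal(25, 4, 150)

r, p, lo, hi = pearson_ci(x, y)
print(f"r = {r:.2f}, p-value = {p:.4f}, 95% CI = [{lo:.2f}, {hi:.2f}]")
```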

### 3. ANOVA: Heart Rate by Activity Level

In [None]:
groups = [group['HeartRate'].values for name, group in data.groupby('ActivityLevel')]
f_stat, p_val = stats.f_oneway(*groups)
print(f"F-statistic = {f_stat:.2f}, p-value = {p_val:.4f}")
print("Reject H₀" if p_val < 0.05 else "Fail to reject H₀")
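
One-way ANOVA assumes roughly equal variances across groups. The following sketch, under the assumption that the three activity groups are available as separate arrays, checks that assumption with Levene's test and computes eta squared as an effect size. The helper `eta_squared` and the sample groups are hypothetical.

```python
import numpy as np
from scipy import stats

def eta_squared(groups):
    """Share of total variance explained by group membership (SS_between / SS_total)."""
    all_values = np.concatenate(groups)
    grand_mean = all_values.mean()
    ss_between = sum(len(g) * (np.mean(g) - grand_mean) ** 2 for g in groups)
    ss_total = ((all_values - grand_mean) ** 2).sum()
    return ss_between / ss_total

rng = np.random.default_rng(0)
# Hypothetical stand-ins for heart rate in the Low, Medium, and High groups
groups = [rng.normal(75, 10, 50) for _ in range(3)]

# Levene's test: H0 is that all groups have equal variance
lev_stat, lev_p = stats.levene(*groups)
f_stat, p_val = stats.f_oneway(*groups)
print(f"Levene p-value = {lev_p:.4f}")
print(f"F = {f_stat:.2f}, p-value = {p_val:.4f}, eta squared = {eta_squared(groups):.3f}")
```

If Levene's test rejects equal variances, a rank-based alternative such as `stats.kruskal` may be a safer choice.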

### 4. Chi-Square Test: Activity Level and Health Status

In [None]:
contingency = pd.crosstab(data['ActivityLevel'], data['HealthStatus'])
chi2, p_val, dof, expected = stats.chi2_contingency(contingency)
print(f"Chi-square = {chi2:.2f}, p-value = {p_val:.4f}")
print("Reject H₀" if p_val < 0.05 else "Fail to reject H₀")
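
The chi-square approximation is usually considered reliable when expected cell counts are not too small (a common rule of thumb is at least 5). The sketch below checks that condition and computes Cramér's V as a measure of association strength. The helper `cramers_v` and the example table are assumptions for illustration, not values from this notebook.

```python
import numpy as np
from scipy import stats

def cramers_v(table):
    """Cramér's V for a contingency table (0 = no association, 1 = perfect)."""
    table = np.asarray(table)
    # correction=False so V reaches exactly 1 for a perfectly associated 2x2 table
    chi2, _, _, _ = stats.chi2_contingency(table, correction=False)
    n = table.sum()
    k = min(table.shape) - 1
    return np.sqrt(chi2 / (n * k))

# Hypothetical 3x2 table: rows = activity level, columns = health status
table = np.array([[30, 20],
                  [28, 22],
                  [32, 18]])

chi2, p_val, dof, expected = stats.chi2_contingency(table)
print(f"Smallest expected count = {expected.min():.1f}")
print(f"Chi-square = {chi2:.2f}, p-value = {p_val:.4f}, Cramér's V = {cramers_v(table):.3f}")
```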

## Data Visualizations

In [None]:
# Histogram: BMI
plt.figure(figsize=(8,4))
sns.histplot(data['BMI'], kde=True, color='skyblue')
plt.title('Distribution of BMI')
plt.xlabel('BMI')
plt.ylabel('Frequency')
plt.show()

In [None]:
# Boxplot: Heart Rate by Activity Level
plt.figure(figsize=(8,4))
sns.boxplot(x='ActivityLevel', y='HeartRate', data=data)
plt.title('Heart Rate by Activity Level')
plt.xlabel('Activity Level')
plt.ylabel('Heart Rate')
plt.show()

In [None]:
# Scatterplot: Sleep Hours vs. BMI
plt.figure(figsize=(8,4))
sns.scatterplot(x='SleepHours', y='BMI', hue='ActivityLevel', data=data)
plt.title('Sleep Hours vs. BMI by Activity Level')
plt.xlabel('Sleep Hours')
plt.ylabel('BMI')
plt.legend(title='Activity Level')
plt.show()

In [None]:
# Countplot: Health Status by Activity Level
plt.figure(figsize=(8,4))
sns.countplot(x='ActivityLevel', hue='HealthStatus', data=data)
plt.title('Health Status by Activity Level')
plt.xlabel('Activity Level')
plt.ylabel('Count')
plt.legend(title='Health Status')
plt.show()

## Conclusion
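Each conclusion should be read from the p-value printed by the corresponding test, using a 0.05 significance level:
- Sleep hours and BMI are significantly correlated only if the Pearson p-value is below 0.05.
- Mean heart rate differs across activity levels only if the ANOVA p-value is below 0.05.
- BMI differs between genders only if the t-test p-value is below 0.05.
- Health status is associated with activity level only if the Chi-square p-value is below 0.05.

Because every column is simulated independently of the others, no true association exists by construction, and any significant result here is a chance finding. Applied to real data, the same workflow can help show how behavior and demographics relate to basic health metrics.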
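
Because four tests are run at the 0.05 level, the chance of at least one false positive is higher than 5%. One common, conservative remedy is the Bonferroni correction, sketched below. The helper `bonferroni` and the p-values in `example_p` are placeholders for illustration only; they are not results from this notebook.

```python
def bonferroni(p_values, alpha=0.05):
    """Multiply each p-value by the number of tests (capped at 1) and compare to alpha."""
    m = len(p_values)
    adjusted = [min(p * m, 1.0) for p in p_values]
    reject = [p_adj < alpha for p_adj in adjusted]
    return adjusted, reject

# Placeholder p-values, NOT outputs of the tests in this notebook
example_p = {"t-test": 0.20, "pearson": 0.03, "anova": 0.004, "chi-square": 0.60}

adjusted, reject = bonferroni(list(example_p.values()))
for name, p_adj, rej in zip(example_p, adjusted, reject):
    decision = "Reject H₀" if rej else "Fail to reject H₀"
    print(f"{name}: adjusted p-value = {p_adj:.3f} -> {decision}")
```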
