

Health & Lifestyle Dataset – Analyze Well-being, Habits, and Trends Easily

Sleep, Exercise, and Heart Health

Goal: Investigate how sleep hours and exercise relate to:

- Heart Rate

- Blood Pressure

- Heart Disease

In [3]:
import pandas as pd
import numpy as np
import seaborn as sns
from matplotlib import pyplot as plt
import os

In [None]:
# Configure pandas to show all rows and columns.
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)

In [None]:
# Load the dataset (raw string keeps the Windows backslashes literal)
df = pd.read_csv(r'C:\Users\Luis\Downloads\health_activity_data.csv')


In [None]:
df.head()

In [None]:
df = df.set_index('ID')

In [None]:
df.head()

In [None]:
df.isnull().sum()

In [None]:
df.duplicated().sum()

In [None]:
df.info()

In [None]:
df.shape

In [None]:
df.describe()

In [None]:
sns.histplot(data=df, x='Hours_of_Sleep')
plt.show()

In [None]:
sns.histplot(data=df, x='Exercise_Hours_per_Week')
plt.show()

In [None]:
sns.histplot(data=df, x='Heart_Rate')
plt.show()

In [None]:
df['Heart_Disease'].value_counts().sort_index()

In [None]:
# Sort by class label so the bar colors line up with 0 (green) and 1 (crimson)
heart_disease_counts = df['Heart_Disease'].value_counts().sort_index()

plt.bar(
    x=heart_disease_counts.index.astype(str),  # string labels give a categorical x-axis
    height=heart_disease_counts.values,
    color=['green', 'crimson'],
)

plt.title('Heart Disease Distribution')
plt.xlabel('Heart Disease (0 = No, 1 = Yes)')
plt.ylabel('Number of People')
plt.show()

Sleep vs Heart Health


In [None]:
plt.figure(figsize=(8, 5))
sns.regplot(
    x='Hours_of_Sleep',
    y='Heart_Rate',
    data=df,
    scatter_kws={'alpha': 0.6, 'color': 'mediumslateblue'},
    line_kws={'color': 'red', 'linewidth': 2}
)
plt.title("Sleep Duration vs Heart Rate")
plt.xlabel("Hours of Sleep")
plt.ylabel("Heart Rate (bpm)")
plt.grid(True)
plt.tight_layout()
plt.show()

There is no clear linear correlation between sleep hours and heart rate. Any relationship is weak at best, so sleep alone explains little of the variation in heart rate.

Other factors (such as fitness, stress, and caffeine) probably play a large role in heart rate as well.
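
One way to put a number on "weak" is the Pearson correlation coefficient r and its square, which is the share of variance in one variable that is linearly associated with the other. The sketch below uses a small made-up frame that borrows the notebook's column names; the values are illustrative only, not taken from the dataset, and `pearson_summary` is a hypothetical helper.

```python
import pandas as pd

def pearson_summary(frame: pd.DataFrame, x: str, y: str) -> dict:
    """Return Pearson r and r^2 for two numeric columns, ignoring missing rows."""
    pair = frame[[x, y]].dropna()
    r = pair[x].corr(pair[y])  # Pearson is the default method
    return {"n": len(pair), "r": r, "r_squared": r ** 2}

# Illustrative values only (not from health_activity_data.csv)
demo = pd.DataFrame({
    "Hours_of_Sleep": [5.0, 6.0, 7.0, 8.0, 9.0],
    "Heart_Rate": [72, 80, 68, 77, 70],
})
summary = pearson_summary(demo, "Hours_of_Sleep", "Heart_Rate")
print(summary)
```

On the real data the call would be `pearson_summary(df, 'Hours_of_Sleep', 'Heart_Rate')`. An r² close to 0 would support the reading above.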



In [None]:
plt.figure(figsize=(10, 5))
sns.boxplot(x='Heart_Disease', y='Hours_of_Sleep', data=df)

plt.title('Hours of Sleep vs Heart Disease')
plt.ylabel('Hours of Sleep')
plt.xlabel('Heart Disease (0 = No, 1 = Yes)')
plt.tight_layout()
plt.show()

In [None]:
df.groupby('Heart_Disease')['Hours_of_Sleep'].describe()

Sleep duration does not differ noticeably between individuals with and without heart disease, at least not visually. No statistical test was run here, so this is a descriptive observation only.
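
To back up the visual comparison with a number, one option is a standardized mean difference (Cohen's d) between the two groups. This is a minimal sketch, assuming `Heart_Disease` is coded 0/1 as the plot labels suggest. The `cohens_d` helper is hypothetical and the demo values are made up for illustration.

```python
import numpy as np
import pandas as pd

def cohens_d(frame: pd.DataFrame, group_col: str, value_col: str) -> float:
    """Standardized mean difference (group 1 minus group 0) using the pooled SD."""
    g0 = frame.loc[frame[group_col] == 0, value_col].dropna()
    g1 = frame.loc[frame[group_col] == 1, value_col].dropna()
    pooled_var = (
        (len(g0) - 1) * g0.var(ddof=1) + (len(g1) - 1) * g1.var(ddof=1)
    ) / (len(g0) + len(g1) - 2)
    return float((g1.mean() - g0.mean()) / np.sqrt(pooled_var))

# Illustrative values only (not from health_activity_data.csv)
demo = pd.DataFrame({
    "Heart_Disease": [0, 0, 0, 0, 1, 1, 1, 1],
    "Hours_of_Sleep": [6.0, 7.0, 8.0, 7.0, 6.5, 7.5, 7.0, 8.0],
})
d = cohens_d(demo, "Heart_Disease", "Hours_of_Sleep")
print(round(d, 3))
```

By a common rule of thumb, an absolute d below about 0.2 is considered negligible. The same helper could be applied to `Exercise_Hours_per_Week` or `Heart_Rate`.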


Exercise vs Heart Health

In [None]:
plt.figure(figsize=(8, 5))
sns.regplot(
    x='Heart_Rate',
    y='Exercise_Hours_per_Week',
    data=df,
    scatter_kws={'alpha': 0.6, 'color': 'mediumslateblue'},
    line_kws={'color': 'red', 'linewidth': 2}
)
plt.title("Exercise Hours per Week vs Heart Rate")
plt.xlabel("Heart Rate (bpm)")
plt.ylabel("Exercise Hours per Week")
plt.grid(True)
plt.tight_layout()
plt.show()

In [None]:
# pd.cut assigns each heart rate to a 5-bpm interval: [50, 55), ..., [115, 120).
# Values outside [50, 120) fall in no bin and become NaN.
df['HR_bin'] = pd.cut(df['Heart_Rate'], bins=range(50, 125, 5), right=False)

plt.figure(figsize=(16, 6))
sns.boxplot(data=df, x='HR_bin', y='Exercise_Hours_per_Week')

plt.xticks(rotation=45, ha='right')
plt.xlabel("Heart Rate (binned)")
plt.ylabel("Exercise Hours per Week")
plt.title("Exercise Hours per Week vs Heart Rate (binned)")
plt.tight_layout()
plt.show()

More hours of exercise per week are associated with a lower resting heart rate, which is consistent with the known effects of cardiovascular fitness, although the relationship appears weak to moderate in this dataset.
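
The binned boxplot can be complemented with a small table of per-bin counts and medians, which makes it easier to see whether a trend rests on bins with only a few people. The sketch below reuses the same `pd.cut` bins on a made-up frame; the values are illustrative only.

```python
import pandas as pd

# Illustrative values only (not from health_activity_data.csv)
demo = pd.DataFrame({
    "Heart_Rate": [52, 54, 61, 63, 64, 72, 73, 88],
    "Exercise_Hours_per_Week": [6.0, 5.0, 5.5, 4.0, 4.5, 3.5, 2.5, 3.0],
})
demo["HR_bin"] = pd.cut(demo["Heart_Rate"], bins=range(50, 125, 5), right=False)

# observed=True keeps only the bins that actually contain rows
bin_summary = (
    demo.groupby("HR_bin", observed=True)["Exercise_Hours_per_Week"]
    .agg(["count", "median"])
)
print(bin_summary)
```

Medians from bins with a low count are noisy and should be read with caution.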

Heart Disease vs No Heart Disease

In [None]:
avg_sleep = df.groupby('Heart_Disease')['Hours_of_Sleep'].mean()
print(avg_sleep)

In [None]:
avg_exercise_hours = df.groupby('Heart_Disease')['Exercise_Hours_per_Week'].mean()
print(avg_exercise_hours)

In [None]:
# First, split into two new columns
df[['Systolic_BP', 'Diastolic_BP']] = df['Blood_Pressure'].str.split('/', expand=True)

# Convert them to numeric
df['Systolic_BP'] = pd.to_numeric(df['Systolic_BP'], errors='coerce')
df['Diastolic_BP'] = pd.to_numeric(df['Diastolic_BP'], errors='coerce')
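
Because `errors='coerce'` silently turns unparseable readings into NaN, it can be worth counting how many rows were affected before averaging. This is a minimal sketch on made-up strings; the malformed entries are assumptions for illustration and may not occur in the real file. Passing `n=1` to `str.split` is an added safeguard so that at most two columns are produced.

```python
import pandas as pd

# Illustrative values only; "n/a" and None stand in for possible bad entries
demo = pd.DataFrame({"Blood_Pressure": ["120/80", "135/85", "n/a", None, "118/76"]})

# Split on the first "/" only, so the result has at most two columns
parts = demo["Blood_Pressure"].str.split("/", n=1, expand=True)
demo["Systolic_BP"] = pd.to_numeric(parts[0], errors="coerce")
demo["Diastolic_BP"] = pd.to_numeric(parts[1], errors="coerce")

# Rows where either reading failed to parse
unparsed = demo[demo[["Systolic_BP", "Diastolic_BP"]].isna().any(axis=1)]
print(f"{len(unparsed)} of {len(demo)} rows could not be parsed")
```

If the count is zero on the real data, the group means computed next are based on every row.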


In [None]:
avg_hr_bp = df.groupby('Heart_Disease')[['Heart_Rate', 'Systolic_BP', 'Diastolic_BP']].mean().round(2)
print(avg_hr_bp)

In this dataset, blood pressure shows the clearest relationship with heart disease.
Sleep, exercise, and heart rate show no strong or clear differences between groups.