# Data Exploration

In this notebook, we will perform exploratory data analysis (EDA) on the workout schedule dataset. The goal is to understand the data better, identify patterns, and prepare for further analysis and model training.

In [1]:
# Import necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Set visualization style
sns.set_theme(style='whitegrid')  # sns.set is an older alias of set_theme

In [2]:
# Load the dataset
data_path = '../data/processed/workout_schedule.csv'  # Update with the correct path
workout_data = pd.read_csv(data_path)

# Display the first few rows of the dataset
workout_data.head()

In [3]:
# Summary statistics
workout_data.describe()
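
`describe()` summarizes only numeric columns by default, so text or categorical columns are easy to overlook. The sketch below is one way to get a per-column overview; the helper name `column_overview` is hypothetical and not part of pandas or the original notebook.

```python
import pandas as pd


def column_overview(df: pd.DataFrame) -> pd.DataFrame:
    """Return dtype, non-null count, and distinct-value count for each column.

    Hypothetical helper for illustration; not part of the original notebook.
    """
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),   # column type as a readable string
        "non_null": df.notnull().sum(),   # number of populated rows
        "n_unique": df.nunique(),         # distinct values, NaN excluded
    })
```

Calling `column_overview(workout_data)` after loading the data would show which columns are non-numeric and which have very few distinct values.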

In [4]:
# Check for missing values
missing_values = workout_data.isnull().sum()
missing_values[missing_values > 0]
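
Raw counts of missing values are easier to judge alongside the share of rows they represent. The following is a minimal sketch, assuming a helper named `missing_summary` (a hypothetical name, not from the original notebook), that reports both and lists the most affected columns first.

```python
import pandas as pd


def missing_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Return count and percentage of missing values per column, worst first.

    Hypothetical helper for illustration; columns without gaps are omitted.
    """
    counts = df.isnull().sum()
    summary = pd.DataFrame({
        "missing_count": counts,
        "missing_pct": (counts / len(df) * 100).round(2),
    })
    # Keep only columns that actually have missing values
    summary = summary[summary["missing_count"] > 0]
    return summary.sort_values("missing_count", ascending=False)
```

An empty result from `missing_summary(workout_data)` would mean no column has missing values.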

In [5]:
# Visualize the distribution of muscle gains
plt.figure(figsize=(10, 6))
sns.histplot(workout_data['muscle_gain'], bins=30, kde=True)
plt.title('Distribution of Muscle Gains')
plt.xlabel('Muscle Gain')
plt.ylabel('Frequency')
plt.show()

In [6]:
# Correlation heatmap
plt.figure(figsize=(12, 8))
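# Restrict to numeric columns; pandas 2.x raises an error on non-numeric data otherwise
correlation_matrix = workout_data.corr(numeric_only=True)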
sns.heatmap(correlation_matrix, annot=True, fmt='.2f', cmap='coolwarm')
plt.title('Correlation Heatmap')
plt.show()
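
A heatmap becomes hard to read as the number of features grows. As a complement, the sketch below ranks features by the strength of their correlation with a single target column. The helper name `correlations_with_target` is hypothetical, and treating `muscle_gain` as the target is an assumption based on the histogram above.

```python
import pandas as pd


def correlations_with_target(df: pd.DataFrame, target: str) -> pd.Series:
    """Return Pearson correlations of numeric features with `target`,
    sorted by absolute value, strongest first.

    Hypothetical helper for illustration; `target` must be a numeric column.
    """
    numeric = df.select_dtypes(include="number")  # drop text/categorical columns
    corr = numeric.corr()[target].drop(target)    # remove the self-correlation
    return corr.sort_values(key=lambda s: s.abs(), ascending=False)
```

For example, `correlations_with_target(workout_data, 'muscle_gain')` would list the features most strongly associated with muscle gain, whether positively or negatively. Correlation measures only linear association and does not imply causation.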

## Conclusion

In this notebook, we explored the workout schedule dataset, visualized the distribution of muscle gains, and examined the correlations between different features. This analysis will help inform our data preprocessing and model training steps.