# 🚀 Project Steps

1️⃣ **Load & Explore Data**

2️⃣ **Data Preprocessing**

3️⃣ **Feature Engineering**

4️⃣ **Model Selection & Training**

5️⃣ **Streamlit Deployment**

## 1️⃣ Load & Explore Data

Let’s start by loading the dataset and understanding its structure.

In [1]:
import pandas as pd

# Load the dataset
df = pd.read_csv("workout_fitness_tracker.csv")

# Display first few rows
print(df.head())

# Check for missing values
print(df.isnull().sum())

# Get basic statistics
print(df.describe())

   User ID  Age  Gender  Height (cm)  Weight (kg) Workout Type  \
0        1   39    Male          175           99      Cycling   
1        2   36   Other          157          112       Cardio   
2        3   25  Female          180           66         HIIT   
3        4   56    Male          154           89      Cycling   
4        5   53   Other          194           59     Strength   

   Workout Duration (mins)  Calories Burned  Heart Rate (bpm)  Steps Taken  \
0                       79              384               112         8850   
1                       73              612               168         2821   
2                       27              540               133        18898   
3                       39              672               118        14102   
4                       56              410               170        16518   

   Distance (km) Workout Intensity  Sleep Hours  Water Intake (liters)  \
0          14.44              High          8.2             

## 2️⃣ Data Preprocessing

✅ **Handling Missing Values**

In [2]:
# Fill missing numerical values with mean
num_cols = df.select_dtypes(include=['float64', 'int64']).columns
df[num_cols] = df[num_cols].fillna(df[num_cols].mean())

# Fill missing categorical values with mode
cat_cols = df.select_dtypes(include=['object']).columns
df[cat_cols] = df[cat_cols].fillna(df[cat_cols].mode().iloc[0])

✅ **Encoding Categorical Variables**

In [3]:
from sklearn.preprocessing import LabelEncoder
import pickle
# List of categorical columns
categorical_cols = ["Gender", "Workout Type", "Workout Intensity", "Mood Before Workout", "Mood After Workout"]
encoders = {}

# Encode categorical columns and store encoders
for col in categorical_cols:
    encoders[col] = LabelEncoder()
    df[col] = encoders[col].fit_transform(df[col])

# Save all encoders in one file
with open("label_encoders.pkl", "wb") as f:
    pickle.dump(encoders, f)

✅ **Feature Scaling**

In [None]:
from sklearn.preprocessing import StandardScaler
import pickle

scaled_features = ["Age", "Height (cm)", "Weight (kg)", "Heart Rate (bpm)", "Steps Taken",
                   "Distance (km)", "Sleep Hours", "Water Intake (liters)", "Daily Calories Intake",
                   "Resting Heart Rate (bpm)", "VO2 Max", "Body Fat (%)"]
scaler = StandardScaler()
scaler.fit(df[scaled_features])
pickle.dump(scaler, open("scaler.pkl", "wb"))

## 3️⃣ Feature Engineering

We’ll define the target variable (Workout Efficiency) based on calories burned.

In [None]:
# Define target variable
df["Workout Efficiency"] = df["Calories Burned"].apply(lambda x: "Low" if x < 200 else ("Medium" if x < 400 else "High"))

# ✅ Use a separate LabelEncoder for target variable
efficiency_encoder = LabelEncoder()
df["Workout Efficiency"] = efficiency_encoder.fit_transform(df["Workout Efficiency"])

# Save efficiency encoder
with open("efficiency_encoder.pkl", "wb") as f:
    pickle.dump(efficiency_encoder, f)

# Drop unnecessary columns
df.drop(["User ID", "Calories Burned"], axis=1, inplace=True)

## 4️⃣ Model Selection & Training

We’ll use Random Forest Classifier to predict workout efficiency.

In [6]:
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report
import pickle

# Split data into training and testing sets
X = df.drop("Workout Efficiency", axis=1)
y = df["Workout Efficiency"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train the model
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Make predictions
y_pred = model.predict(X_test)

with open("workout_model.pkl", "wb") as f:
    pickle.dump(model, f)

# Evaluate the model
print("Accuracy:", accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))

Accuracy: 1.0
              precision    recall  f1-score   support

           0       1.00      1.00      1.00      2000

    accuracy                           1.00      2000
   macro avg       1.00      1.00      1.00      2000
weighted avg       1.00      1.00      1.00      2000



## Save Model

## 5️⃣ Streamlit Deployment

Now, we’ll build an interactive UI to take user input and predict Workout Efficiency.

In [7]:
import streamlit as st
import pandas as pd
import numpy as np
import pickle

# Load the trained model
model = pickle.load(open("workout_model.pkl", "rb"))

# Load categorical encoders
encoders = pickle.load(open("label_encoders.pkl", "rb"))  # ✅ Correct encoder
scaler = pickle.load(open("scaler.pkl", "rb"))  # Load scaler if used during training

# Streamlit UI
st.title("🏋️ Workout & Fitness Tracker")

st.sidebar.header("Enter Your Workout Data")

# Collect user input
age = st.sidebar.slider("Age", 18, 60, 25)
gender = st.sidebar.selectbox("Gender", ["Male", "Female", "Other"])
height = st.sidebar.slider("Height (cm)", 140, 210, 175)
weight = st.sidebar.slider("Weight (kg)", 40, 150, 70)
workout_type = st.sidebar.selectbox("Workout Type", ["Cardio", "Strength", "Yoga", "HIIT", "Cycling", "Running"])
duration = st.sidebar.slider("Workout Duration (mins)", 10, 120, 45)
heart_rate = st.sidebar.slider("Heart Rate (bpm)", 60, 200, 120)
steps = st.sidebar.slider("Steps Taken", 100, 20000, 5000)
distance = st.sidebar.slider("Distance (km)", 0.0, 15.0, 3.0)
intensity = st.sidebar.selectbox("Workout Intensity", ["Low", "Medium", "High"])
sleep = st.sidebar.slider("Sleep Hours", 0, 12, 7)
water = st.sidebar.slider("Water Intake (liters)", 0.5, 5.0, 2.0)
calories_intake = st.sidebar.slider("Daily Calories Intake", 1000, 4000, 2500)
resting_hr = st.sidebar.slider("Resting Heart Rate (bpm)", 50, 100, 70)
vo2_max = st.sidebar.slider("VO2 Max", 20, 60, 40)
body_fat = st.sidebar.slider("Body Fat (%)", 5, 40, 20)
mood_before = st.sidebar.selectbox("Mood Before Workout", ["Happy", "Neutral", "Tired", "Stressed"])
mood_after = st.sidebar.selectbox("Mood After Workout", ["Energized", "Neutral", "Fatigued"])

# Convert input data into a DataFrame
input_data = pd.DataFrame([[age, gender, height, weight, workout_type, duration, heart_rate, steps, distance,
                            intensity, sleep, water, calories_intake, resting_hr, vo2_max, body_fat, mood_before, mood_after]],
                          columns=["Age", "Gender", "Height (cm)", "Weight (kg)", "Workout Type", "Workout Duration (mins)",
                                   "Heart Rate (bpm)", "Steps Taken", "Distance (km)", "Workout Intensity",
                                   "Sleep Hours", "Water Intake (liters)", "Daily Calories Intake",
                                   "Resting Heart Rate (bpm)", "VO2 Max", "Body Fat (%)", "Mood Before Workout", "Mood After Workout"])

# ✅ Encode categorical values correctly
for col in ["Gender", "Workout Type", "Workout Intensity", "Mood Before Workout", "Mood After Workout"]:
    input_data[col] = encoders[col].transform([input_data[col][0]])  # ✅ Fix transformation

# ✅ Scale numerical features
numerical_cols = ["Age", "Height (cm)", "Weight (kg)", "Workout Duration (mins)", "Heart Rate (bpm)", "Steps Taken",
                  "Distance (km)", "Sleep Hours", "Water Intake (liters)", "Daily Calories Intake",
                  "Resting Heart Rate (bpm)", "VO2 Max", "Body Fat (%)"]

input_data[numerical_cols] = scaler.transform(input_data[numerical_cols])  # ✅ Scale input

# Make prediction
prediction = model.predict(input_data)[0]

# Decode the prediction
efficiency_map = {0: "Low", 1: "Medium", 2: "High"}
st.subheader(f"Predicted Workout Efficiency: **{efficiency_map[prediction]}**")

2025-02-26 08:26:18.233 
  command:

    streamlit run /home/codespace/.local/lib/python3.12/site-packages/ipykernel_launcher.py [ARGUMENTS]
2025-02-26 08:26:18.240 Session state does not function when running a script without `streamlit run`


ValueError: The feature names should match those that were passed during fit.
Feature names unseen at fit time:
- Workout Duration (mins)
Feature names seen at fit time, yet now missing:
- Calories Burned
