🏋️ Fitness Classification Project

📋 Problem Description

This project addresses a binary classification problem using a synthetic dataset that simulates real-world fitness assessment. The goal is to predict whether a person is fit (is_fit = 1) or not fit (is_fit = 0) based on health and lifestyle features.

📊 Dataset Overview

📈 Samples: 2,000 individuals
🎯 Features: 10 predictive features + 1 target variable
⚖️ Target Distribution: ~60% not fit (0), ~40% fit (1)
🚨 Data Challenges:
- 🔀 Mixed data types (numerical and categorical)
- ❓ Missing values (~8% in sleep_hours)
- 🔠 Inconsistent formatting (mixed numeric/string values)
- 📊 Outliers present in weight_kg (~2% of samples)
- 🔗 Complex feature relationships with target

🗂️ Features

Feature	Description	Type
`age`	Age in years	Integer
`height_cm`	Height in centimeters	Integer
`weight_kg`	Weight in kilograms (contains outliers)	Integer
`heart_rate`	Resting heart rate in BPM	Float
`blood_pressure`	Systolic blood pressure in mmHg	Float
`sleep_hours`	Average daily sleep hours (contains NaNs)	Float
`nutrition_quality`	Daily nutrition score 0-10	Float
`activity_index`	Physical activity level 1-5	Float
`smokes`	Smoking status (mixed types)	Mixed
`gender`	Gender ('M' or 'F')	String
`is_fit`	Target variable (1=fit, 0=not fit)	Binary

🔧 Methodology

🧹 Data Preprocessing

🔄 Handled Mixed Data Types:
- ✅ Converted 'smokes' column to consistent binary format (0/1)
- ✅ Created 'is_male' feature from gender column (0/1 encoding)
🎯 Missing Value Treatment:
- ✅ Imputed missing 'sleep_hours' values with the median computed on the training set only, to avoid data leakage
📏 Feature Scaling:
- ✅ Standardized numerical features for the logistic regression model. Tree-based models do not require scaling, so they were trained on a separate copy of the same dataset.
🔍 Exploratory Data Analysis:
- ✅ Generated histograms and boxplots to understand data distribution
- ✅ Analyzed feature relationships and outlier patterns
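
The preprocessing steps above can be sketched roughly as follows. This is a minimal, dependency-free illustration and not the project's actual code: the spellings accepted for `smokes`, the use of `None` for missing values, and the helper names `to_binary`, `fit_preprocessor`, and `transform` are all assumptions made for the example. The key point is that the median and the scaling statistics are learned from training rows only and then reused on other rows.

```python
from statistics import mean, median, pstdev

# Hypothetical spellings; the real dataset's mixed values may differ.
YES = {"yes", "y", "true"}
NO = {"no", "n", "false"}


def to_binary(value):
    """Map a mixed numeric/string flag to 0 or 1."""
    text = str(value).strip().lower()
    if text in YES:
        return 1
    if text in NO:
        return 0
    # Numeric spellings such as 1, "0", or 1.0; raises ValueError otherwise.
    return int(float(text) != 0)


def fit_preprocessor(train_rows):
    """Learn imputation and scaling statistics from training rows only."""
    observed = [r["sleep_hours"] for r in train_rows if r["sleep_hours"] is not None]
    sleep_median = median(observed)
    filled = [sleep_median if r["sleep_hours"] is None else r["sleep_hours"]
              for r in train_rows]
    # Population standard deviation, the same convention as a standard scaler.
    return {"median": sleep_median, "mean": mean(filled), "std": pstdev(filled)}


def transform(row, stats):
    """Apply the training-set statistics to any row (train, validation, or new)."""
    sleep = stats["median"] if row["sleep_hours"] is None else row["sleep_hours"]
    return {
        "smokes": to_binary(row["smokes"]),
        "is_male": int(row["gender"] == "M"),
        "sleep_hours_scaled": (sleep - stats["mean"]) / stats["std"],
    }


train = [
    {"sleep_hours": 6.0, "smokes": "yes", "gender": "M"},
    {"sleep_hours": None, "smokes": 0, "gender": "F"},
    {"sleep_hours": 8.0, "smokes": "no", "gender": "F"},
]
stats = fit_preprocessor(train)

validation_row = {"sleep_hours": None, "smokes": "1", "gender": "M"}
print(transform(validation_row, stats))
```

Because the validation row is transformed with statistics fitted on `train`, no information from the validation data leaks into the imputation or scaling.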

🤖 Model Development

Four classification models were trained and evaluated:

Model	Type	Purpose
Logistic Regression	Linear	Baseline performance
Random Forest	Ensemble	Handle non-linearity
XGBoost	Gradient Boosting	State-of-the-art performance
Decision Tree	Tree-based	Interpretability

📊 Model Performance

🏆 Validation ROC-AUC Scores

Model	🥇 ROC-AUC Score	Rank	Status
Logistic Regression	0.8566	1	🥇 Best Model
Random Forest	0.8417	2	🥈
XGBoost	0.8400	3	🥉
Decision Tree	0.7999	4
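
For readers unfamiliar with the metric, ROC-AUC can be read as the probability that a randomly chosen fit person receives a higher predicted score than a randomly chosen not-fit person. The sketch below computes it directly from that definition on a tiny made-up example. It is an illustration only: the function name `roc_auc` and the sample labels and scores are assumptions, and the project itself may have used a library implementation instead.

```python
def roc_auc(labels, scores):
    """Share of (positive, negative) pairs ranked correctly; ties count half."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("ROC-AUC needs at least one example of each class")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up labels (1 = fit) and predicted probabilities, for illustration only.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(labels, scores))  # 0.75
```

The pairwise loop is quadratic in the number of samples, which is fine for a demonstration but slower than the rank-based computation used by typical libraries.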

⚙️ Hyperparameter Tuning

🔧 Performed hyperparameter optimization on the training set
📋 Validated performance on a separate validation set
🎯 Logistic Regression achieved the highest validation ROC-AUC (0.8566)

🚀 Deployment

🌐 Production Setup

🎯 Final Model: Logistic Regression trained on the full training dataset
☁️ Deployment Platform: Fly.io
🖥️ Application Type: Binary classification web service
🎥 Project Deployment Demo: Includes demonstration video

💡 Key Findings

✅ Logistic regression achieved the best performance despite dataset complexity
✅ Proper data preprocessing was crucial for model success
✅ Feature scaling significantly improved linear model performance
✅ The synthetic dataset effectively mimics real-world data challenges

FastAPI Prediction Service

A machine learning prediction service built with FastAPI, uv, and Docker.

🚀 Quick Start

# 1. Build Docker image
docker build -t fitness-prediction .

# 2. Run container
docker run -it --rm -p 9696:9696 fitness-prediction

# 3. Test the service by opening the interactive docs in a browser: http://localhost:9696/docs
The prediction endpoint is `predict`.
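
Besides the interactive docs page, the service can also be called from a script. The sketch below uses only the Python standard library. Several details are assumptions, not facts from this repository: the path `/predict`, the POST method, and a JSON body whose keys match the feature table. Check http://localhost:9696/docs for the real request schema before relying on it.

```python
import json
from urllib import request

# Assumed URL: port 9696 comes from the Quick Start, the /predict path is a guess.
URL = "http://localhost:9696/predict"


def build_request(person, url=URL):
    """Build a JSON POST request for the prediction service."""
    body = json.dumps(person).encode("utf-8")
    return request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(person, url=URL):
    """Send the request and return the parsed JSON response."""
    with request.urlopen(build_request(person, url), timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


# Hypothetical payload; field names follow the feature table, values are made up.
example_person = {
    "age": 35,
    "height_cm": 175,
    "weight_kg": 72,
    "heart_rate": 68.0,
    "blood_pressure": 120.0,
    "sleep_hours": 7.5,
    "nutrition_quality": 6.5,
    "activity_index": 3.0,
    "smokes": "no",
    "gender": "M",
}

req = build_request(example_person)
print(req.get_method(), req.full_url)
```

With the container from step 2 running, `predict(example_person)` sends the request and returns whatever JSON the service responds with.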

🎯 Usage

The deployed model can be used to predict fitness levels based on health and lifestyle parameters, making it suitable for:

🏥 Health assessment applications
💪 Wellness programs

Note: This dataset is synthetic and intended for educational purposes to practice data cleaning, feature engineering, and classification modeling.

