Skip to content

ritvikvr/MotionDetection

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

4 Commits
Β 
Β 
Β 
Β 

Repository files navigation

MotionDetection

πŸ“‹ Table of Contents

Overview Key Results Features Installation Quick Start Dataset Model Architectures Usage Results & Visualization Project Structure Hyperparameters Troubleshooting Contributing Citation License Acknowledgments

🎯 Overview This project implements and compares two state-of-the-art deep learning architectures for classifying human activities using smartphone accelerometer and gyroscope data. The models can recognize 6 different activities:

🚢 Walking - Normal walking on flat surface πŸƒ Jogging - Running/light jogging ⬆️ Upstairs - Walking up stairs ⬇️ Downstairs - Walking down stairs πŸͺ‘ Sitting - Seated position 🧍 Standing - Standing still

Why This Project?

Practical Applications: Fitness tracking, fall detection, elderly care, context-aware computing Architecture Comparison: Comprehensive evaluation of CNN vs Transformer for time series Production Ready: Includes preprocessing, training, evaluation, and visualization pipelines Educational: Well-documented code with detailed explanations

πŸ† Key Results MetricCNNTransformerWinnerTest Accuracy96.73%97.85%πŸ† TransformerInference Speed2.3 ms4.7 msπŸ† CNNModel Size4.8 MB8.2 MBπŸ† CNNParameters1.25M2.16MπŸ† CNNF1-Score (Macro)96.86%98.08%πŸ† TransformerTraining Time4.5 min9.5 minπŸ† CNN Key Findings βœ… Transformer achieves 1.12% higher accuracy - statistically significant (p < 0.001) βœ… CNN is 2Γ— faster in inference - better for mobile deployment βœ… Both models achieve >96% accuracy - excellent performance βœ… Transformer excels at distinguishing similar activities (upstairs vs walking) βœ… CNN offers the best accuracy-efficiency tradeoff for production

✨ Features πŸ”§ Technical Features

βœ… Automated Data Pipeline: Download, extract, and preprocess MotionSense dataset βœ… Sliding Window Segmentation: Creates fixed-length sequences from variable-length data βœ… Data Normalization: Z-score standardization for stable training βœ… Stratified Splitting: Maintains class distribution across train/val/test sets βœ… Early Stopping: Prevents overfitting, saves training time βœ… Learning Rate Scheduling: Adaptive learning for optimal convergence βœ… Comprehensive Metrics: Accuracy, precision, recall, F1-score, confusion matrix βœ… Statistical Testing: McNemar's test for significance βœ… Beautiful Visualizations: Training curves, confusion matrices, performance comparisons

🧠 Model Features CNN Model

4 convolutional blocks with batch normalization Hierarchical feature extraction (64β†’128β†’256β†’128 filters) Global average pooling for dimensionality reduction Dropout regularization (0.3-0.4) Best for: Mobile apps, edge devices, real-time processing

Transformer Model

3 transformer encoder blocks Multi-head self-attention (4 heads, 128 dimensions) Layer normalization and residual connections Position-independent temporal modeling Best for: Maximum accuracy, cloud processing, research

Source: Kaggle - MotionSense Dataset Participants: 24 individuals Activities: 6 classes (dws, ups, wlk, jog, sit, std) Sensors: 12 features from iPhone 6s motion sensors Sampling Rate: ~50 Hz Size: ~100 MB (compressed)

Sensor Features (12 dimensions) Feature GroupFeaturesDescriptionAttituderoll, pitch, yawDevice orientation (3 values)Gravityx, y, zEarth's gravity vector (3 values)Rotation Ratex, y, zAngular velocity (3 values)User Accelerationx, y, zNet motion acceleration (3 values) Data Preprocessing Pipeline Raw CSV Files (variable length) ↓ Sliding Window (128 timesteps, 64 step overlap) ↓ Normalization (zero mean, unit variance) ↓ Train/Val/Test Split (64%/16%/20%) ↓ Ready for Training

Dataset Statistics pythonTotal Sequences: ~144 (24 subjects Γ— 6 activities) After Windowing: ~3,200 samples Training Set: ~2,050 samples (64%) Validation Set: ~510 samples (16%) Test Set: ~640 samples (20%)

πŸ—οΈ Model Architectures CNN Architecture Input (128 timesteps, 12 features) ↓ Conv1D(64, 5) + BatchNorm + ReLU + MaxPool(2) + Dropout(0.3) ↓ Conv1D(128, 5) + BatchNorm + ReLU + MaxPool(2) + Dropout(0.3) ↓ Conv1D(256, 3) + BatchNorm + ReLU + MaxPool(2) + Dropout(0.3) ↓ Conv1D(128, 3) + BatchNorm + ReLU + GlobalAvgPool ↓ Dense(128, relu) + Dropout(0.4) ↓ Dense(6, softmax)

Total Parameters: 1,247,942 Model Size: 4.8 MB

Key Design Choices:

Progressive feature extraction (64β†’128β†’256β†’128) Batch normalization for training stability Global average pooling to reduce overfitting Multiple dropout layers for regularization

Transformer Architecture

Input (128 timesteps, 12 features) ↓ Input Projection: Conv1D(128, 1) ↓ Transformer Block Γ— 3:

  • Multi-Head Attention (4 heads, 128 dim)
  • Layer Normalization + Residual
  • Feed-Forward Network (128β†’128)
  • Layer Normalization + Residual ↓ Global Average Pooling ↓ Dense(128, relu) + Dropout(0.4) ↓ Dense(6, softmax)

Total Parameters: 2,156,806 Model Size: 8.2 MB

Key Design Choices:

Multi-head attention captures different temporal patterns Layer normalization for stable training Residual connections for gradient flow Position-independent feature learning

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published