# Workout Form Classification (Colab Clean Notebook)

End-to-end pipeline for detecting proper vs improper execution across multiple strength exercises (10 classes total).

Sections:
1. Environment Setup (TensorFlow 2.16, Python 3.11)
2. Project Code Acquisition / Sync
3. Data Placement (raw videos) & Frame Extraction
4. Dataset Assembly (arrays)
5. Model Training (CNN+LSTM baseline)
6. Evaluation & Quick Inference
7. (Optional) Confusion Matrix & Per-Class Metrics
8. Next Steps / Export

Class taxonomy:
- barbell_bicep_curl_(proper|improper)
- bench_press_(proper|improper)
- deadlift_(proper|improper)
- plank_(proper|improper)
- squat_(proper|improper)
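
The taxonomy above amounts to 5 exercises × 2 form labels = 10 class keys. The sketch below shows one way to enumerate the keys and map them to integer labels. `EXERCISES`, `FORMS`, `CLASS_KEYS`, `LABEL_OF`, and `split_class_key` are hypothetical names used only for illustration, and the exercise-major ordering is an assumption; the `stretch_detector` package may order its classes differently.

```python
from itertools import product

# Hypothetical names; the real package may name or order these differently.
EXERCISES = ["barbell_bicep_curl", "bench_press", "deadlift", "plank", "squat"]
FORMS = ["proper", "improper"]

# Assumed exercise-major ordering: 0 = barbell_bicep_curl_proper, 1 = barbell_bicep_curl_improper, ...
CLASS_KEYS = [f"{exercise}_{form}" for exercise, form in product(EXERCISES, FORMS)]
LABEL_OF = {key: idx for idx, key in enumerate(CLASS_KEYS)}


def split_class_key(key: str) -> tuple[str, str]:
    """Split 'bench_press_improper' into ('bench_press', 'improper')."""
    exercise, form = key.rsplit("_", 1)
    return exercise, form


print(len(CLASS_KEYS))
print(LABEL_OF["deadlift_improper"])
print(split_class_key(CLASS_KEYS[9]))
```

Splitting on the last underscore works because the form label is always the final token, even when the exercise name itself contains underscores.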

> Colab: Runtime → Change runtime type → Python 3.11 + GPU (T4/A100). Then run cells sequentially.


In [None]:
# 1. Environment Setup
import sys, platform
print('Python:', sys.version)
print('Platform:', platform.platform())

!pip install -q --upgrade pip
!pip install -q tensorflow==2.16.2 "numpy<2.0" "pandas<2.3" opencv-python pillow matplotlib moviepy scikit-learn

import tensorflow as tf, numpy as np
print('TF:', tf.__version__)
print('GPUs:', tf.config.list_physical_devices('GPU'))

if tf.config.list_physical_devices('GPU'):
    from tensorflow.keras import mixed_precision
    mixed_precision.set_global_policy('mixed_float16')
    print('Mixed precision enabled')
else:
    print('Running on CPU')

## 2. Project Code Acquisition
Choose one method below to get the code into this Colab session.

A) Public clone:
```bash
!git clone https://github.com/nikimanhanif/HaakimFYP.git
%cd HaakimFYP
```
B) Private clone (replace TOKEN):
```bash
!git clone https://<TOKEN>@github.com/nikimanhanif/HaakimFYP.git
%cd HaakimFYP
```
C) Google Drive (persistent):
```python
from google.colab import drive
drive.mount('/content/drive')
%cd /content/drive/MyDrive/HaakimFYP
```
After cloning, list files:
```python
import os
print(os.listdir('.'))
```

In [None]:
# 3. Data Placement & Frame Extraction
from pathlib import Path
import os

RAW = Path('data/raw')
for split in ['train','val','test']:
    (RAW / split).mkdir(parents=True, exist_ok=True)

print('Upload your workout videos into data/raw/train (and val/test).')
print('Each filename should contain exercise + quality tokens, e.g. deadlift_proper_01.mp4, squat_wrong_set2.mp4')

from stretch_detector.config import DEFAULT_CONFIG as cfg
from stretch_detector.data.video_dataset import prepare_frames
cfg.ensure_dirs()
splits = prepare_frames(cfg)
print({ 'train_videos': len(splits.train), 'val_videos': len(splits.val), 'test_videos': len(splits.test) })
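
Labels are inferred from tokens in each filename (or parent folder name). How `prepare_frames` resolves those tokens is not shown in this notebook, so the following is only a sketch of the idea. The alias tables and the `parse_class_key` helper are hypothetical, apart from the `wrong` and `bench` spellings that appear in the example filenames.

```python
import re
from pathlib import Path

# Hypothetical alias tables for illustration; the package's real aliases may differ.
EXERCISE_ALIASES = {
    "barbell_bicep_curl": ("barbell_bicep_curl", "bicep_curl"),
    "bench_press": ("bench_press", "bench"),
    "deadlift": ("deadlift",),
    "plank": ("plank",),
    "squat": ("squat",),
}
FORM_ALIASES = {
    "improper": ("improper", "wrong"),
    "proper": ("proper", "correct"),
}


def _match(table, haystack):
    """Return the first canonical name whose alias appears as whole tokens."""
    for canonical, aliases in table.items():
        if any(f"_{alias}_" in haystack for alias in aliases):
            return canonical
    return None


def parse_class_key(path):
    """Return e.g. 'squat_improper' for a video path, or None if unresolved."""
    p = Path(path)
    text = f"{p.parent.name}_{p.stem}".lower()
    # Split on anything that is not a letter, so digits and separators are dropped.
    tokens = [t for t in re.split(r"[^a-z]+", text) if t]
    haystack = "_" + "_".join(tokens) + "_"
    exercise = _match(EXERCISE_ALIASES, haystack)
    form = _match(FORM_ALIASES, haystack)
    if exercise is None or form is None:
        return None
    return f"{exercise}_{form}"


print(parse_class_key("data/raw/train/squat_wrong_set2.mp4"))
print(parse_class_key("data/raw/train/deadlift_proper_01.mp4"))
```

Matching on whole tokens rather than raw substrings matters here: a plain substring test for `proper` would also fire on `improper`.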

In [None]:
# 4. Dataset Assembly (Arrays)
from stretch_detector.data.video_dataset import build_arrays_for_split
from stretch_detector.config import DEFAULT_CONFIG as cfg

cfg.seq_len = 10
cfg.image_size = (128,128)
cfg.num_classes = 10

X_train, y_train = build_arrays_for_split(cfg, 'train')
X_val, y_val = build_arrays_for_split(cfg, 'val')
print('Shapes -> X_train', X_train.shape, 'y_train', y_train.shape, 'X_val', X_val.shape, 'y_val', y_val.shape)

assert X_train.shape[0] > 0, 'No training samples found. Ensure videos uploaded + frame extraction finished.'
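
Each sample is a fixed-length sequence of `cfg.seq_len` frames, so `X_train` should have shape `(N, seq_len, H, W, C)`. The frame selection used inside `build_arrays_for_split` is not shown here. One common approach, sketched below under that caveat, is to take evenly spaced frames and pad short clips by repeating the last frame. `sample_frame_indices` is a hypothetical helper, not part of the package.

```python
def sample_frame_indices(num_frames: int, seq_len: int) -> list[int]:
    """Pick seq_len evenly spaced frame indices from a clip with num_frames frames.

    Clips shorter than seq_len are padded by repeating the last frame index.
    """
    if num_frames <= 0 or seq_len <= 0:
        raise ValueError("num_frames and seq_len must be positive")
    if num_frames < seq_len:
        return list(range(num_frames)) + [num_frames - 1] * (seq_len - num_frames)
    # Take the centre of each of seq_len equal-width segments of the clip.
    return [int((i + 0.5) * num_frames / seq_len) for i in range(seq_len)]


print(sample_frame_indices(100, 10))  # [5, 15, 25, 35, 45, 55, 65, 75, 85, 95]
print(sample_frame_indices(4, 10))    # [0, 1, 2, 3, 3, 3, 3, 3, 3, 3]
```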

In [None]:
# 5. Model Training (CNN+LSTM Baseline)
import collections
from tensorflow import keras
from stretch_detector.models.cnn_lstm import build_cnn_lstm

counts = collections.Counter(y_train.tolist())
class_weight = {cls: max(1.0, float(sum(counts.values()))/(len(counts)*cnt)) for cls, cnt in counts.items()}
print('Class counts:', dict(counts))
print('Class weight:', class_weight)

input_shape = (cfg.seq_len, cfg.image_size[0], cfg.image_size[1], cfg.channels)
model = build_cnn_lstm(input_shape, num_classes=cfg.num_classes)
model.compile(optimizer=keras.optimizers.Adam(learning_rate=cfg.learning_rate),
              loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.summary()

callbacks = [
    keras.callbacks.EarlyStopping(patience=4, restore_best_weights=True, monitor='val_loss'),
]

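history = model.fit(
    X_train, y_train,
    validation_data=(X_val, y_val) if X_val.shape[0] > 0 else None,
    epochs=cfg.epochs,
    batch_size=min(cfg.batch_size, max(1, X_train.shape[0])),
    shuffle=True,
    class_weight=class_weight,
    # Early stopping monitors val_loss, so attach it only when a validation set exists.
    callbacks=callbacks if X_val.shape[0] > 0 else None,
    verbose=1,
)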

from pathlib import Path
models_dir = Path('models'); models_dir.mkdir(exist_ok=True)
model_path = models_dir / f'workout_form_cnn_lstm_{cfg.num_classes}cls.keras'
model.save(model_path)
print('Saved model to', model_path)
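
The `class_weight` expression in this cell computes `total / (num_classes_present * count)` for each class and then clamps the result to at least 1.0. A small worked example with made-up counts may make the effect clearer; `clamped_class_weight` is a hypothetical wrapper around the same expression.

```python
import collections


def clamped_class_weight(labels):
    """Same formula as the training cell: balanced weight, floored at 1.0."""
    counts = collections.Counter(labels)
    total = sum(counts.values())
    return {cls: max(1.0, total / (len(counts) * cnt)) for cls, cnt in counts.items()}


# Made-up label distribution: 40, 10 and 50 samples for classes 0, 1 and 2.
labels = [0] * 40 + [1] * 10 + [2] * 50
weights = clamped_class_weight(labels)
print(weights)
# Unclamped values would be 100/120 ≈ 0.83, 100/30 ≈ 3.33 and 100/150 ≈ 0.67.
```

Because of the `max(1.0, ...)` floor, only the rare class (class 1 here) is up-weighted; frequent classes keep weight 1.0 instead of being down-weighted as they would be under a plain balanced scheme.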

In [None]:
# 6. Evaluation & Quick Inference
from stretch_detector.data.video_dataset import class_key_from_label
import numpy as np

if X_val.shape[0] > 0:
    # return_dict=True returns named metrics directly (e.g. loss, accuracy).
    metrics = model.evaluate(X_val, y_val, verbose=0, return_dict=True)
    print('Validation:', metrics)

sample = X_train[0:1]
pred = model.predict(sample, verbose=0)
cls_id = int(np.argmax(pred, axis=-1)[0])
print('Predicted class id:', cls_id, '->', class_key_from_label(cls_id))

In [None]:
# 7. Confusion Matrix & Per-Class Metrics
from sklearn.metrics import confusion_matrix, classification_report
import numpy as np
from stretch_detector.data.video_dataset import COMBINED_CLASS_KEYS

if X_val.shape[0] > 0:
    val_preds = model.predict(X_val, verbose=0)
    y_pred = np.argmax(val_preds, axis=-1)
    # Pass explicit labels so classes absent from the validation set do not break the report.
    labels = list(range(cfg.num_classes))
    cm = confusion_matrix(y_val, y_pred, labels=labels)
    print('Confusion Matrix:\n', cm)
    print('\nClassification Report:\n')
    print(classification_report(
        y_val, y_pred, labels=labels,
        target_names=COMBINED_CLASS_KEYS[:cfg.num_classes], zero_division=0))
else:
    print('No validation set available; skipping metrics.')
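
The per-class precision and recall in the report follow directly from the confusion matrix, where row `i`, column `j` counts samples of true class `i` predicted as class `j`. The sketch below derives them by hand on a made-up 3-class matrix; `per_class_precision_recall` is a hypothetical helper and the numbers are illustrative, not results from this model.

```python
def per_class_precision_recall(cm):
    """cm[i][j] = number of samples with true class i predicted as class j."""
    n = len(cm)
    results = []
    for k in range(n):
        tp = cm[k][k]
        predicted_k = sum(cm[i][k] for i in range(n))  # column sum: everything predicted as k
        actual_k = sum(cm[k])                          # row sum: everything that truly is k
        precision = tp / predicted_k if predicted_k else 0.0
        recall = tp / actual_k if actual_k else 0.0
        results.append((precision, recall))
    return results


# Made-up confusion matrix for three classes.
cm = [
    [8, 2, 0],
    [1, 7, 2],
    [0, 1, 9],
]
for k, (p, r) in enumerate(per_class_precision_recall(cm)):
    print(f"class {k}: precision={p:.3f} recall={r:.3f}")
```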

## 8. Next Steps / Export
- Increase `image_size` for more spatial detail, or `seq_len` for more temporal granularity (after confirming the baseline works).
- Try 3D CNN (`build_cnn3d`) for potentially richer temporal encoding.
- Add data augmentation (random crop, brightness, slight rotation) via a tf.data pipeline.
- Consider balancing strategies if improper samples are scarce.
- Export to TFLite:
```python
import tensorflow as tf
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()
with open('workout_form_model.tflite', 'wb') as f:
    f.write(tflite_model)
```
- Download artifacts:
```python
from google.colab import files
files.download(str(model_path))
```
- Persist to Drive (if mounted) for reuse.
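
For the augmentation item above, one point worth illustrating is that a clip should receive the same random crop and brightness shift on every frame, otherwise the sequence jitters between frames. The sketch below uses NumPy rather than a tf.data pipeline, omits rotation, and assumes clips are float arrays in `[0, 1]` with shape `(seq_len, H, W, C)`. `augment_clip` and its parameters are hypothetical.

```python
import numpy as np


def augment_clip(clip, rng, pad=8, max_brightness_delta=0.1):
    """Apply one random shift-crop and one brightness offset to all frames of a clip.

    clip: float array of shape (seq_len, H, W, C) with values in [0, 1] (assumed).
    The output keeps the input shape, so it still fits the model's input_shape.
    """
    _, h, w, _ = clip.shape
    # Reflect-pad height and width, then crop back to (h, w) at a random offset.
    padded = np.pad(clip, ((0, 0), (pad, pad), (pad, pad), (0, 0)), mode="reflect")
    top = int(rng.integers(0, 2 * pad + 1))
    left = int(rng.integers(0, 2 * pad + 1))
    cropped = padded[:, top:top + h, left:left + w, :]
    # One brightness offset shared by every frame in the clip.
    delta = float(rng.uniform(-max_brightness_delta, max_brightness_delta))
    return np.clip(cropped + delta, 0.0, 1.0).astype(clip.dtype)


rng = np.random.default_rng(0)
clip = rng.random((10, 128, 128, 3)).astype(np.float32)
augmented = augment_clip(clip, rng)
print(augmented.shape, augmented.dtype)
```

Padding and cropping back to the original size avoids a separate resize step; a crop that changes the frame size would need resizing to `cfg.image_size` before training.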

This notebook is now trimmed to only the workflow needed for workout form classification in Colab.

# ✅ Workout Form Classification on Colab (Updated 2025-09-21)

This notebook now targets **multi-class workout form classification** (proper vs improper execution across multiple exercises) using the refactored Python package.

**Pipeline covered:**
1. Environment + dependencies (TensorFlow 2.16, Python 3.11)
2. Project code acquisition (clone / Drive / upload)
3. Raw workout video placement (train/val/test) and frame extraction
4. Building supervised arrays (sequences of frames → tensor)
5. Training CNN+LSTM (or 3D CNN) to classify exercise + form quality
6. Validation, quick inference, and model export

> Runtime: Set Python = 3.11 and enable GPU for faster training.

Classes (10): barbell_bicep_curl_(proper|improper), bench_press_(proper|improper), deadlift_(proper|improper), plank_(proper|improper), squat_(proper|improper).

In [None]:
# Colab Environment Setup
# Ensures Python 3.11 runtime, installs dependencies, and validates TensorFlow.
import sys, platform
print('Python version:', sys.version)
print('Platform:', platform.platform())

# Install core dependencies (avoid macOS-specific extras)
!pip install -q --upgrade pip
!pip install -q tensorflow==2.16.2 "numpy<2.0" "pandas<2.3" opencv-python pillow matplotlib moviepy

import tensorflow as tf, numpy as np
print('TF version:', tf.__version__)
print('Num GPUs:', len(tf.config.list_physical_devices('GPU')))

# Optional: mixed precision if GPU present
if tf.config.list_physical_devices('GPU'):
    from tensorflow.keras import mixed_precision
    mixed_precision.set_global_policy('mixed_float16')
    print('Mixed precision enabled')
else:
    print('Running on CPU - expect slower training')

### 🔄 Get the Project Code
Choose ONE of the following methods:

1. Public GitHub clone (recommended):
```bash
!git clone https://github.com/nikimanhanif/HaakimFYP.git
%cd HaakimFYP
```
2. Private repo: add a fine-grained token, then:
```bash
!git clone https://<TOKEN>@github.com/nikimanhanif/HaakimFYP.git
%cd HaakimFYP
```
3. Manual upload: Use Colab left pane → Files → Upload folder (not ideal for large data).
4. Google Drive (persistent between sessions):
```python
from google.colab import drive
drive.mount('/content/drive')
# Then: %cd /content/drive/MyDrive/HaakimFYP
```

After cloning / moving into project root, list files to verify:
```python
import os
print('\n'.join(os.listdir('.')))
```

In [None]:
# 📂 Dataset Placement & Frame Extraction (Workout Form)
# Place your workout videos in data/raw/train, data/raw/val, data/raw/test.
# Each video filename (or a parent folder name) should contain tokens that hint exercise + form quality
# e.g. squat_proper_001.mp4, bench_wrong_session2.mp4 (aliases resolved automatically).
import os, shutil, glob, json, sys
from pathlib import Path

PROJECT_ROOT = Path.cwd()
print('Project root:', PROJECT_ROOT)
raw_root = PROJECT_ROOT / 'data' / 'raw'
raw_root.mkdir(parents=True, exist_ok=True)

print('Raw split contents:')
for split in ['train','val','test']:
    d = raw_root / split
    if d.exists():
        vids = list(d.rglob('*'))
        count = sum(1 for p in vids if p.is_file() and p.suffix.lower() in ['.mp4','.mov','.avi','.mkv','.mpeg','.mpg'])
        print(f'  {split}: {count} video files (path={d})')
    else:
        print(f'  {split}: MISSING (create and upload videos)')

from stretch_detector.config import DEFAULT_CONFIG as cfg
from stretch_detector.data.video_dataset import prepare_frames
cfg.ensure_dirs()
splits = prepare_frames(cfg)
print('Videos counted:', { 'train': len(splits.train), 'val': len(splits.val), 'test': len(splits.test) })

In [None]:
# 🧪 Build Arrays & Train (Workout Form Classifier)
import collections
from stretch_detector.data.video_dataset import build_arrays_for_split
from stretch_detector.models.cnn_lstm import build_cnn_lstm
from stretch_detector.models.cnn3d import build_cnn3d
from stretch_detector.config import DEFAULT_CONFIG as cfg
from tensorflow import keras

# Core hyperparameters tailored for workout form clips
cfg.seq_len = 10            # frames per sample
cfg.image_size = (128,128)  # reduce for speed; increase to improve spatial detail
cfg.batch_size = 8
cfg.epochs = 15
cfg.num_classes = 10        # 5 exercises × 2 forms (proper/improper)

X_train, y_train = build_arrays_for_split(cfg, 'train')
X_val, y_val = build_arrays_for_split(cfg, 'val')
print('Train shape:', X_train.shape, 'Val shape:', X_val.shape)

if X_train.shape[0] == 0:
    raise ValueError('No training samples found. Upload workout videos (proper/improper) then re-run extraction.')

# Optional class weighting for imbalance
counts = collections.Counter(y_train.tolist())
class_weight = {cls: max(1.0, float(sum(counts.values()))/(len(counts)*cnt)) for cls, cnt in counts.items()}
print('Class counts:', dict(counts))
print('Derived class_weight:', class_weight)

model_choice = 'cnn_lstm'  # switch to 'cnn3d' after verifying pipeline
input_shape = (cfg.seq_len, cfg.image_size[0], cfg.image_size[1], cfg.channels)
model = build_cnn_lstm(input_shape, num_classes=cfg.num_classes) if model_choice=='cnn_lstm' else build_cnn3d(input_shape, num_classes=cfg.num_classes)

model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=cfg.learning_rate),
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)
model.summary()

callbacks = [
    keras.callbacks.EarlyStopping(patience=4, restore_best_weights=True, monitor='val_loss'),
]

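history = model.fit(
    X_train, y_train,
    validation_data=(X_val, y_val) if X_val.shape[0] > 0 else None,
    epochs=cfg.epochs,
    batch_size=min(cfg.batch_size, max(1, X_train.shape[0])),
    shuffle=True,
    verbose=1,
    class_weight=class_weight,
    # Early stopping monitors val_loss, so attach it only when a validation set exists.
    callbacks=callbacks if X_val.shape[0] > 0 else None,
)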

from pathlib import Path
models_dir = Path('models'); models_dir.mkdir(exist_ok=True)
model_path = models_dir / f'workout_form_{model_choice}_{cfg.num_classes}cls.keras'
model.save(model_path)
print('Saved model to', model_path)
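
The `EarlyStopping` callback defined in this cell (`patience=4`, monitoring `val_loss`, `restore_best_weights=True`) only takes effect when it is passed to `model.fit` through the `callbacks` argument. The sketch below is a simplified simulation of the patience rule, not Keras code; `simulate_early_stopping` is a hypothetical helper and the loss values are made up.

```python
def simulate_early_stopping(val_losses, patience=4):
    """Return (stopped_epoch, best_epoch) as 0-based indices.

    Training stops once val_loss has failed to improve for `patience`
    consecutive epochs; best_epoch is where the restored weights would come from.
    """
    best = float("inf")
    best_epoch = None
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    # Ran through every epoch without triggering the stop.
    return len(val_losses) - 1, best_epoch


# Made-up validation losses: best at epoch 2, then four epochs without improvement.
losses = [1.20, 0.95, 0.80, 0.82, 0.85, 0.81, 0.90, 0.70]
print(simulate_early_stopping(losses, patience=4))  # (6, 2)
```

In this example training halts at epoch 6 and never reaches the lower loss at epoch 7, which is the trade-off of a small `patience` when validation loss is noisy.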

In [None]:
# 📊 Evaluate & Quick Inference (Workout Form)
import numpy as np
from stretch_detector.data.video_dataset import class_key_from_label

if 'X_val' in globals() and X_val.shape[0] > 0:
    # return_dict=True returns named metrics directly (e.g. loss, accuracy).
    val_metrics = model.evaluate(X_val, y_val, verbose=0, return_dict=True)
    print('Validation metrics:', val_metrics)

if X_train.shape[0] > 0:
    sample = X_train[0:1]
    pred = model.predict(sample, verbose=0)
    cls_id = int(np.argmax(pred, axis=-1)[0])
    print('Predicted class id:', cls_id, '->', class_key_from_label(cls_id))


### ✅ Tips & Next Steps (Workout Form)
- Reduce `cfg.seq_len` (e.g., 6) for faster iteration; increase when temporal nuances matter (e.g., deadlift phases).
- Increase `image_size` (160–192) once pipeline is stable to capture bar path & limb positions.
- Add data augmentation (future: random crop, slight brightness shifts) to generalize across gyms.
- Track precision/recall per class by extending evaluation (confusion matrix) for proper vs improper forms.
- Export to TFLite for mobile form feedback apps after pruning/quantization.
- Download model: `from google.colab import files; files.download(str(model_path))`.

Cleanup (optional):
```python
import shutil, pathlib
for p in ['data/frames','data/npz']:
    d=pathlib.Path(p)
    if d.exists():
        shutil.rmtree(d)
```

Next enhancement: replace raw frame stacking with a tf.data streaming loader + on-the-fly augmentation.
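
A streaming loader yields one batch at a time instead of holding every clip in memory. The sketch below shows the idea with a plain Python generator. `stream_batches`, `load_clip`, and the toy sample list are hypothetical; in a real pipeline a generator like this could be wrapped with `tf.data.Dataset.from_generator`, with augmentation applied per clip inside the loader.

```python
import random


def stream_batches(samples, load_clip, batch_size, shuffle=True, seed=None):
    """Lazily yield (clips, labels) batches.

    samples: list of (clip_id, label) pairs.
    load_clip: callable that loads one clip by id (hypothetical; e.g. reads frames from disk).
    """
    order = list(samples)
    if shuffle:
        random.Random(seed).shuffle(order)
    for start in range(0, len(order), batch_size):
        chunk = order[start:start + batch_size]
        # Clips are loaded only when this batch is requested.
        clips = [load_clip(clip_id) for clip_id, _ in chunk]
        labels = [label for _, label in chunk]
        yield clips, labels


# Toy usage: the "loader" just records which clips were requested.
loaded = []


def fake_load(clip_id):
    loaded.append(clip_id)
    return f"frames-of-{clip_id}"


samples = [(f"clip{i}", i % 2) for i in range(5)]
batches = stream_batches(samples, fake_load, batch_size=2, shuffle=False)
first_clips, first_labels = next(batches)
print(first_clips, first_labels, len(loaded))
```

Only the two clips of the first batch have been loaded after the first `next` call, which is the property that keeps memory use bounded on larger datasets.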