# miDiKompanion ML Model Training - Google Colab

This notebook trains all 5 ML models for miDiKompanion using Google Colab's free GPU tier.

**5-Model Architecture**:
1. EmotionRecognizer: Audio → Emotion (128→64)
2. MelodyTransformer: Emotion → MIDI (64→128)
3. HarmonyPredictor: Context → Chords (128→64)
4. DynamicsEngine: Context → Expression (32→16)
5. GroovePredictor: Emotion → Groove (64→32)

**Total**: ~1M parameters, ~4MB, <10ms inference
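
The size figure is consistent with the parameter count if the weights are stored as 32-bit floats. The source does not state the precision, so that is an assumption. A quick check of the arithmetic, using a hypothetical helper name:

```python
def estimate_model_size_mb(num_params: int, bytes_per_param: int = 4) -> float:
    """Approximate weight size in megabytes (1 MB = 10**6 bytes)."""
    return num_params * bytes_per_param / 1_000_000

# ~1M parameters at float32 (4 bytes each, assumed) gives ~4 MB
print(estimate_model_size_mb(1_000_000))  # 4.0
```

Exported ONNX files also carry graph metadata, so the on-disk size will be slightly larger than this estimate.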

**Training Time**: 8-14 hours on Colab's free GPU tier. This can exceed a single session (see Notes), so plan to resume from checkpoints.

## Setup

In [None]:
# Install dependencies (the first line selects the CUDA 11.8 build of PyTorch)
!pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
!pip install numpy pandas librosa soundfile mido music21 pretty_midi
!pip install onnx onnxruntime tensorboard scikit-learn tqdm matplotlib

In [None]:
# Check GPU availability
import torch
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")
    print(f"CUDA version: {torch.version.cuda}")

## Upload Training Scripts

In [None]:
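# Upload training scripts from your local machine,
# or clone them from the repository instead
from google.colab import files
print("Please upload the following files:")
print("  - train_all_models.py")
print("  - prepare_datasets.py")
print("  - export_to_onnx.py")
print("  - validate_models.py")
print("  - training_utils.py")
print("  - dataset_loaders.py")
print("\nOr clone from repository:")
print("!git clone <repository_url> ml_training")

# Opens the browser file picker; uploaded files are saved to the current directory.
# Skip this call if you cloned the repository instead.
uploaded = files.upload()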
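
Before moving on, it can help to confirm that the scripts actually landed in the working directory. The following is a minimal sketch, not part of the project: `missing_scripts` and `REQUIRED_SCRIPTS` are hypothetical names, and the list assumes the scripts sit in the current directory rather than in a cloned subfolder.

```python
from pathlib import Path

# Script names from the upload list, plus validate_models.py,
# which the Validate Models step invokes.
REQUIRED_SCRIPTS = [
    "train_all_models.py",
    "prepare_datasets.py",
    "export_to_onnx.py",
    "validate_models.py",
    "training_utils.py",
    "dataset_loaders.py",
]

def missing_scripts(directory=".", required=REQUIRED_SCRIPTS):
    """Return the required script names that are not files in `directory`."""
    base = Path(directory)
    return [name for name in required if not (base / name).is_file()]

missing = missing_scripts(".")
if missing:
    print("Missing scripts:", ", ".join(missing))
else:
    print("All training scripts found.")
```

If the repository was cloned into `ml_training`, pass that folder as `directory` instead.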

## Prepare Datasets

In [None]:
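# Prepare datasets with node labels
# (expects audio and MIDI files to already be in /content/datasets/audio and /content/datasets/midi)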
!python prepare_datasets.py \
    --audio-dir /content/datasets/audio \
    --midi-dir /content/datasets/midi \
    --output-dir /content/datasets/prepared \
    --node-labels

## Train All Models

In [None]:
# Train all 5 models
!python train_all_models.py \
    --data-dir /content/datasets/prepared \
    --output-dir /content/trained_models \
    --epochs 100 \
    --batch-size 64 \
    --learning-rate 0.001

## Export to ONNX

In [None]:
# Export trained models to ONNX format
!python export_to_onnx.py \
    --models-dir /content/trained_models \
    --output-dir /content/models/onnx \
    --optimize

## Validate Models

In [None]:
# Validate ONNX models
!python validate_models.py \
    --models-dir /content/models/onnx \
    --check-inference
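
The contents of `validate_models.py` are not shown in this notebook. As a rough illustration of how the <10ms inference figure could be checked by hand, the sketch below times any callable and reports the median. `median_latency_ms` is a hypothetical helper, and the lambda is a stand-in workload rather than a real model call.

```python
import statistics
import time

def median_latency_ms(fn, warmup=5, runs=50):
    """Median wall-clock time of fn() in milliseconds, after warm-up calls."""
    for _ in range(warmup):
        fn()  # warm-up runs are not timed
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)

# Stand-in workload; replace with a real model call
latency = median_latency_ms(lambda: sum(range(1000)))
print(f"median latency: {latency:.3f} ms")
```

For an ONNX model, `fn` could wrap `session.run(None, {input_name: x})` on an `onnxruntime.InferenceSession`. The input name and shape depend on how `export_to_onnx.py` exported each model, which is not specified here.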

## Download Models

In [None]:
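# Download trained models
from google.colab import files
import zipfile
import os

onnx_dir = '/content/models/onnx'

# Create zip archive, storing paths relative to onnx_dir
with zipfile.ZipFile('midikompanion_models.zip', 'w', zipfile.ZIP_DEFLATED) as zipf:
    # Use `filenames`, not `files`, so the google.colab `files` module is not shadowed
    for root, _dirs, filenames in os.walk(onnx_dir):
        for filename in filenames:
            path = os.path.join(root, filename)
            zipf.write(path, arcname=os.path.relpath(path, onnx_dir))

# Download
files.download('midikompanion_models.zip')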

## Training Progress Monitoring

In [None]:
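# Start TensorBoard (run this cell before the training cell to watch progress live;
# assumes the training script writes its logs to /content/logs)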
%load_ext tensorboard
%tensorboard --logdir /content/logs

## Notes

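- **Free GPU Tier**: Sessions are limited to about 12 hours and can disconnect sooner; files under `/content` are lost when the runtime resets. Save checkpoints frequently and copy them somewhere persistent, such as Google Drive.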
- **Model Size**: All 5 models total ~4MB, suitable for plugin distribution.
- **Inference Target**: <10ms per model on CPU, <5ms on GPU.
- **Node-Aware Training**: Use `node_aware_training.py` for 216-node thesaurus integration.
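
Because a training run may span more than one session, it is useful to locate the most recent checkpoint when a new session starts. The sketch below assumes a hypothetical file naming scheme, `checkpoint_epoch_<N>.pt`; the actual names written by `train_all_models.py`, and whether it supports resuming, are not stated in this notebook.

```python
import re
from pathlib import Path

# Hypothetical naming scheme: checkpoint_epoch_<N>.pt
CHECKPOINT_PATTERN = re.compile(r"checkpoint_epoch_(\d+)\.pt$")

def latest_checkpoint(checkpoint_dir):
    """Return (epoch, path) for the highest-numbered checkpoint, or None."""
    directory = Path(checkpoint_dir)
    if not directory.is_dir():
        return None
    best = None
    for path in directory.iterdir():
        match = CHECKPOINT_PATTERN.search(path.name)
        if match:
            epoch = int(match.group(1))
            # Compare epochs numerically so epoch 10 ranks above epoch 2
            if best is None or epoch > best[0]:
                best = (epoch, path)
    return best

found = latest_checkpoint("/content/trained_models")
print("resume from:", found if found else "no checkpoint found")
```

To keep checkpoints across sessions, the directory could live on Google Drive, mounted with `drive.mount('/content/drive')` from `google.colab`.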