# MLPRouter - Training

This notebook demonstrates how to train the **MLPRouter** (Multi-Layer Perceptron Router).

## Overview

MLPRouter trains a multi-layer perceptron (MLP) classifier. The classifier takes a query's embedding as input and predicts which candidate LLM should handle that query.

**Key Features**:
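- Learns non-linear decision boundaries between candidate LLMs in embedding space
- Configurable depth and width through `hidden_layer_sizes`
- Suited to larger routing datasets: the default `adam` solver trains in mini-batches (on small datasets, `lbfgs` often converges faster)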
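The sketch below makes the routing decision concrete. It is a minimal pure-Python MLP forward pass: ReLU hidden layers, then a softmax over the candidate LLMs, then an argmax to pick one. The weights are random and the candidate names (`llm-a`, ...) are hypothetical. In MLPRouter the weights come from training, and the real internals may differ from this sketch.

```python
import math
import random

def relu(v):
    return [max(0.0, x) for x in v]

def dense(x, weights, bias):
    # weights holds one row per output unit; each row has one weight per input
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]

def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def route(embedding, layers, candidates):
    """Return the chosen candidate and the per-candidate probabilities."""
    h = embedding
    for weights, bias in layers[:-1]:
        h = relu(dense(h, weights, bias))      # hidden layers
    weights, bias = layers[-1]
    probs = softmax(dense(h, weights, bias))   # one probability per candidate LLM
    best = max(range(len(candidates)), key=probs.__getitem__)
    return candidates[best], probs

def random_layers(sizes, seed=0):
    """Random (untrained) weights for layer sizes like [input, hidden..., output]."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        weights = [[rng.gauss(0.0, 1.0 / math.sqrt(n_in)) for _ in range(n_in)]
                   for _ in range(n_out)]
        layers.append((weights, [0.0] * n_out))
    return layers

candidates = ["llm-a", "llm-b", "llm-c"]  # hypothetical candidate names
layers = random_layers([8, 16, 8, len(candidates)])
choice, probs = route([0.1 * i for i in range(8)], layers, candidates)
print(choice, [round(p, 3) for p in probs])
```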

## 1. Environment Setup

In [None]:
import os
import sys
from pathlib import Path

# Assumes this notebook sits exactly two directory levels below the project root
PROJECT_ROOT = Path(os.getcwd()).parent.parent
if str(PROJECT_ROOT) not in sys.path:
    sys.path.insert(0, str(PROJECT_ROOT))

os.chdir(PROJECT_ROOT)
print(f"Working directory: {os.getcwd()}")
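Using `.parent.parent` ties the notebook to one fixed location. If the notebook might move, a more robust option is to walk upward from the current directory until you find a directory containing `configs/`, which is the folder the config path below is relative to. `find_project_root` is a hypothetical helper and is not part of `llmrouter`.

```python
import tempfile
from pathlib import Path

def find_project_root(start, marker="configs"):
    """Walk up from `start` until a directory containing `marker/` is found (hypothetical helper)."""
    start = Path(start).resolve()
    for candidate in (start, *start.parents):
        if (candidate / marker).is_dir():
            return candidate
    raise FileNotFoundError(f"No parent of {start} contains {marker!r}")

# Demo on a throwaway directory tree: <root>/configs and <root>/notebooks/mlprouter
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "configs").mkdir()
    nb_dir = root / "notebooks" / "mlprouter"
    nb_dir.mkdir(parents=True)
    found_ok = find_project_root(nb_dir) == root.resolve()

print(found_ok)
```

In the notebook, this would replace the `.parent.parent` line with `PROJECT_ROOT = find_project_root(os.getcwd())`.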

In [None]:
from llmrouter.models.mlprouter import MLPRouter, MLPRouterTrainer
from llmrouter.utils import setup_environment

setup_environment()
print("Environment setup complete!")

## 2. Configuration

MLPRouter reads the following MLP hyperparameters from its YAML configuration file:

| Parameter | Description | Default |
|-----------|-------------|--------|
| `hidden_layer_sizes` | Number of neurons in each hidden layer | [128, 64] |
| `activation` | Hidden-layer activation function | "relu" |
| `solver` | Weight optimizer: "adam", "lbfgs", or "sgd" | "adam" |
| `alpha` | L2 regularization strength | 0.0001 |
| `learning_rate` | Learning-rate schedule (used only when `solver` is "sgd") | "adaptive" |
| `max_iter` | Maximum iterations (epochs for "adam" and "sgd") | 500 |
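Layer sizes determine model size directly. In a fully connected MLP, a layer with `n_in` inputs and `n_out` outputs holds `n_in * n_out` weights plus `n_out` biases. The sketch below adds these up. The 768-dimensional embedding and the 5 candidate LLMs are assumptions for illustration. The single output unit for two-class problems follows scikit-learn's `MLPClassifier`.

```python
def count_mlp_params(input_dim, hidden_layer_sizes, n_classes):
    """Total weights + biases of a fully connected MLP classifier."""
    # scikit-learn uses one logistic output unit for binary problems, one unit per class otherwise
    n_out = 1 if n_classes == 2 else n_classes
    sizes = [input_dim, *hidden_layer_sizes, n_out]
    return sum(n_in * n_o + n_o for n_in, n_o in zip(sizes[:-1], sizes[1:]))

# Assumed: 768-dim query embeddings, 5 candidate LLMs, default [128, 64] hidden layers
print(count_mlp_params(768, [128, 64], 5))  # → 107013
```

Most of the parameters sit in the first layer. The embedding dimension therefore matters more for model size than the depth of the hidden stack.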

In [None]:
import yaml

CONFIG_PATH = "configs/model_config_train/mlprouter.yaml"

with open(CONFIG_PATH, 'r') as f:
    config = yaml.safe_load(f)

print("Current Configuration:")
print("=" * 50)
print(yaml.dump(config, default_flow_style=False))
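You may want to validate the hyperparameters before building the router. The sketch below assumes the overrides arrive as a plain dict. It merges them over the defaults in the table above and rejects values outside the options scikit-learn's `MLPClassifier` accepts. `resolve_mlp_params` is hypothetical. This notebook does not show which key in `mlprouter.yaml` holds these values, so pass the relevant sub-dict of `config` yourself.

```python
# Defaults copied from the configuration table above
DEFAULTS = {
    "hidden_layer_sizes": [128, 64],
    "activation": "relu",
    "solver": "adam",
    "alpha": 0.0001,
    "learning_rate": "adaptive",
    "max_iter": 500,
}
# Options scikit-learn's MLPClassifier accepts for each categorical setting
ALLOWED = {
    "activation": {"identity", "logistic", "tanh", "relu"},
    "solver": {"adam", "lbfgs", "sgd"},
    "learning_rate": {"constant", "invscaling", "adaptive"},
}

def resolve_mlp_params(overrides):
    """Merge overrides over DEFAULTS and reject invalid values (hypothetical helper)."""
    params = {**DEFAULTS, **(overrides or {})}
    for key, allowed in ALLOWED.items():
        if params[key] not in allowed:
            raise ValueError(f"{key}={params[key]!r}; expected one of {sorted(allowed)}")
    if params["alpha"] < 0:
        raise ValueError("alpha must be non-negative")
    if params["max_iter"] <= 0:
        raise ValueError("max_iter must be positive")
    if any(n <= 0 for n in params["hidden_layer_sizes"]):
        raise ValueError("hidden_layer_sizes must contain positive integers")
    return params

params = resolve_mlp_params({"solver": "lbfgs", "max_iter": 1000})
print(params)

try:
    resolve_mlp_params({"solver": "rmsprop"})
    rejected = False
except ValueError:
    rejected = True
```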

## 3. Initialize Router

In [None]:
router = MLPRouter(yaml_path=CONFIG_PATH)

print("Router initialized successfully!")
print(f"Number of training samples: {len(router.routing_data_train)}")
print(f"Number of LLM candidates: {len(router.llm_data)}")
print(f"LLM candidates: {list(router.llm_data.keys())}")

In [None]:
# Inspect the MLP's hyperparameters (layer sizes, solver, regularization, ...)
print("MLP Model Parameters:")
print(router.mlp_model.get_params())

## 4. Training

In [None]:
trainer = MLPRouterTrainer(router=router, device='cpu')

print("Trainer initialized!")
print(f"Training samples: {len(trainer.query_embedding_list)}")
print(f"Save path: {trainer.save_model_path}")
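Check the label distribution before training. The majority-class share is the accuracy a router would get by always picking the most common LLM, which makes it a useful baseline for the accuracies in the architecture comparison. The smallest class count also matters there: for classifiers, `cross_val_score(..., cv=3)` uses stratified 3-fold splitting, and scikit-learn complains when a class has fewer than 3 examples. `label_summary` is a hypothetical helper, and the demo labels are made up.

```python
from collections import Counter

def label_summary(labels):
    """Return per-LLM label shares, the majority label, its share, and the smallest class count."""
    counts = Counter(labels)
    total = sum(counts.values())
    shares = {name: n / total for name, n in counts.most_common()}
    majority_name, majority_count = counts.most_common(1)[0]
    return shares, majority_name, majority_count / total, min(counts.values())

# Hypothetical labels; in the notebook, pass trainer.model_name_list instead
labels = ["llm-a", "llm-a", "llm-b", "llm-a", "llm-c"]
shares, majority, baseline, smallest = label_summary(labels)
print(majority, baseline, smallest)  # → llm-a 0.6 1
```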

In [None]:
print("Starting training...")
print("=" * 50)

trainer.train()

print("=" * 50)
print("Training completed!")

## 5. Model Verification

In [None]:
from llmrouter.utils import load_model
import numpy as np

saved_model = load_model(trainer.save_model_path)

print("Model loaded successfully!")
print(f"Model type: {type(saved_model).__name__}")
print(f"Number of hidden layers: {len(saved_model.hidden_layer_sizes)}")
print(f"Layer sizes: {saved_model.hidden_layer_sizes}")
print(f"Classes: {saved_model.classes_}")

In [None]:
# Quick prediction test on the first training embedding
test_embedding = np.asarray(trainer.query_embedding_list[0]).reshape(1, -1)
prediction = saved_model.predict(test_embedding)

print(f"Test prediction: {prediction[0]}")

proba = saved_model.predict_proba(test_embedding)
print("\nPrediction probabilities:")
for llm_name, prob in zip(saved_model.classes_, proba[0]):
    print(f"  {llm_name}: {prob:.4f}")
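`predict_proba` gives a full ranking of the candidates, not just the single prediction. The sketch below checks that the probabilities sum to 1 and sorts the candidates from most to least likely. The top entry should match `predict`. In the notebook you would call it as `rank_candidates(saved_model.classes_, proba[0])`. `rank_candidates` is a hypothetical helper, and the demo names and probabilities are made up.

```python
def rank_candidates(classes, probabilities, tol=1e-6):
    """Sort (candidate, probability) pairs from most to least likely (hypothetical helper)."""
    total = sum(probabilities)
    if abs(total - 1.0) > tol:
        raise ValueError(f"probabilities sum to {total}, expected 1")
    return sorted(zip(classes, probabilities), key=lambda pair: pair[1], reverse=True)

ranked = rank_candidates(["llm-a", "llm-b", "llm-c"], [0.2, 0.7, 0.1])
print(ranked[0])  # → ('llm-b', 0.7)
```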

## 6. Learning Curve Analysis

In [None]:
import matplotlib.pyplot as plt

# Plot training loss curve
if hasattr(saved_model, 'loss_curve_'):
    plt.figure(figsize=(10, 5))
    plt.plot(saved_model.loss_curve_)
    plt.xlabel('Iteration')
    plt.ylabel('Loss')
    plt.title('MLP Training Loss Curve')
    plt.grid(True, alpha=0.3)
    plt.show()
else:
    print("Loss curve not available (model may use 'lbfgs' solver)")
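With the default `early_stopping=False`, scikit-learn's stochastic solvers (`adam`, `sgd`) stop training once the loss has failed to improve by at least `tol` for more than `n_iter_no_change` consecutive epochs (defaults `1e-4` and `10`). The sketch below is modeled on that rule and reports the epoch at which it would have fired. Details may differ from scikit-learn's exact implementation. `plateau_epoch` is hypothetical; in the notebook you would call it as `plateau_epoch(saved_model.loss_curve_)`.

```python
def plateau_epoch(losses, tol=1e-4, n_iter_no_change=10):
    """1-based epoch where a tol/n_iter_no_change stopping rule would fire, or None."""
    best = float("inf")
    no_improve = 0
    for epoch, loss in enumerate(losses, start=1):
        # An epoch counts as "no improvement" unless it beats the best loss by more than tol
        if loss > best - tol:
            no_improve += 1
        else:
            no_improve = 0
        best = min(best, loss)
        if no_improve > n_iter_no_change:
            return epoch
    return None

print(plateau_epoch([1.0, 0.5, 0.25] + [0.25] * 12))  # → 14
```

If the rule never fires and the curve runs to `max_iter`, training probably stopped before converging. Raising `max_iter` may help in that case.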

## 7. Architecture Comparison

In [None]:
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score
import numpy as np

X = np.array(trainer.query_embedding_list)
y = np.array(trainer.model_name_list)

# Candidate hidden-layer configurations, each scored with 3-fold cross-validation
architectures = [
    (64,),
    (128,),
    (128, 64),
    (256, 128),
    (256, 128, 64),
]

print("Architecture comparison:")
print("=" * 50)

results = []
for arch in architectures:
    # Use the configured default max_iter (500) so the comparison matches the trained router
    mlp = MLPClassifier(hidden_layer_sizes=arch, max_iter=500, random_state=42)
    scores = cross_val_score(mlp, X, y, cv=3, scoring='accuracy')
    results.append((arch, scores.mean(), scores.std()))
    print(f"{str(arch):20} Accuracy: {scores.mean():.4f} (+/- {scores.std():.4f})")

best_arch, best_score, _ = max(results, key=lambda x: x[1])
print(f"\nBest tested architecture: {best_arch} (mean CV accuracy: {best_score:.4f})")
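Picking the highest mean score can favor a larger network whose advantage falls within cross-validation noise. One alternative is the "one-standard-error rule": among architectures whose mean score is within one standard error of the best, pick the simplest. The sketch below measures simplicity as total hidden units, which is an assumption. It takes `results` tuples in the same `(arch, mean, std)` shape as above. `scores.std()` is a population standard deviation, so the standard error here is approximate. The demo scores are made up.

```python
import math

def one_standard_error_choice(results, n_folds):
    """Smallest architecture whose mean score is within one standard error of the best."""
    _, best_mean, best_std = max(results, key=lambda r: r[1])
    threshold = best_mean - best_std / math.sqrt(n_folds)
    eligible = [r for r in results if r[1] >= threshold]
    # "Smallest" = fewest total hidden units; ties go to the higher mean score
    return min(eligible, key=lambda r: (sum(r[0]), -r[1]))[0]

# Made-up scores for illustration only
demo_results = [
    ((64,), 0.80, 0.02),
    ((128,), 0.81, 0.03),
    ((128, 64), 0.82, 0.03),
    ((256, 128), 0.815, 0.02),
]
print(one_standard_error_choice(demo_results, n_folds=3))  # → (128,)
```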

## Summary

In this notebook, we:

1. **Loaded Configuration**: Set up MLPRouter with YAML configuration
2. **Trained Model**: Used MLPRouterTrainer to fit the neural network
3. **Verified Model**: Loaded and tested the saved model
4. **Compared Architectures**: Ranked five hidden-layer configurations by 3-fold cross-validated accuracy

**Next Steps**:
- Use `02_mlprouter_inference.ipynb` for inference
- Experiment with different activation functions
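As a reference for trying other activations, here are the four options scikit-learn's `MLPClassifier` accepts for `activation`, written in plain Python. It is an assumption that MLPRouter passes the YAML `activation` value through unchanged.

```python
import math

# The four activation options accepted by scikit-learn's MLPClassifier
ACTIVATIONS = {
    "identity": lambda x: x,
    "logistic": lambda x: 1.0 / (1.0 + math.exp(-x)),
    "tanh": math.tanh,
    "relu": lambda x: max(0.0, x),
}

for name, fn in ACTIVATIONS.items():
    print(name, [round(fn(x), 3) for x in (-2.0, 0.0, 2.0)])
```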