# Tutorial 02: Evaluating Classical Controllers

This tutorial demonstrates how to use Myriad's platform to evaluate classical control strategies on the CartPole task.

## Learning Objectives
1. Use the platform's `evaluate()` function with `EvalConfig`
2. Compare different control strategies using the platform
3. Work with `EvaluationResults` objects
4. Analyze and visualize results

**Key Platform Feature:** The `evaluate()` function with `EvalConfig` provides a clean API for evaluation-only runs, which makes it a good fit for classical controllers, pre-trained models, and baselines.

## Setup

In [1]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

from myriad.configs.default import AgentConfig, EnvConfig, EvalConfig
from myriad.platform import evaluate

# Set seaborn style
sns.set_theme(style="whitegrid", palette="muted")

SEED = 42

---

# Section 1: Classical Controllers Overview

The platform includes classical controllers for baselines and debugging:

1. **Random**: Uniform random action selection
2. **Bang-Bang**: Threshold-based switching (if theta > 0, push right; else push left)
3. **PID**: Proportional-Integral-Derivative control (for continuous actions)

These are registered in the agent registry, so you can use them with `AgentConfig(name="random")` just like learned agents.
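
The control laws behind the Bang-Bang and PID entries can be sketched in a few lines of plain Python. This is a minimal illustration, not Myriad's implementation: the names `bang_bang_action` and `SimplePID`, the `dt` field, and the choice of error signal are assumptions made for this example. Only the parameter names `kp`, `ki`, `kd`, `control_low`, and `control_high` come from the `AgentConfig` used in this tutorial.

```python
from dataclasses import dataclass


def bang_bang_action(theta: float) -> int:
    """Threshold switching: push right (1) if theta > 0, else push left (0)."""
    return 1 if theta > 0 else 0


@dataclass
class SimplePID:
    """Hypothetical discrete-time PID controller with output clipping."""

    kp: float
    ki: float
    kd: float
    control_low: float = -1.0
    control_high: float = 1.0
    dt: float = 1.0  # assumed time step, not taken from the platform
    integral: float = 0.0
    prev_error: float = 0.0

    def step(self, error: float) -> float:
        # Accumulate the integral and estimate the derivative by finite difference.
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        u = self.kp * error + self.ki * self.integral + self.kd * derivative
        # Clip the command to the allowed control range.
        return max(self.control_low, min(self.control_high, u))


pid = SimplePID(kp=1.0, ki=0.0, kd=1.0)
print(bang_bang_action(0.05), bang_bang_action(-0.05))
print(pid.step(0.1), pid.step(5.0))
```

The bang-bang rule returns a discrete action, while the PID output is a clipped real number, which is why the list above marks PID as intended for continuous actions.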

---

# Section 2: Evaluating with EvalConfig

The `evaluate()` function accepts an `EvalConfig`, a simplified configuration focused exclusively on evaluation.

**Why EvalConfig?**
- No training-specific parameters (`num_envs`, `steps_per_env`, `scan_chunk_size`, etc.)
- Clear intent: evaluation-only
- Simpler configuration

## 2.1 Evaluate Random Controller

In [2]:
# Create evaluation config
random_config = EvalConfig(
    env=EnvConfig(name="cartpole-control"),
    agent=AgentConfig(name="random"),
    seed=SEED,
    eval_rollouts=100,      # Run 100 episodes
    eval_max_steps=500,     # Max 500 steps per episode
)

# Evaluate (with return_episodes=True to get trajectories)
random_results = evaluate(config=random_config, return_episodes=True)

# EvaluationResults has pre-computed statistics
print("\nResults:")
print(f"  Mean return: {random_results.mean_return:.1f}")
print(f"  Std return: {random_results.std_return:.1f}")
print(f"  Return range: [{random_results.min_return:.1f}, {random_results.max_return:.1f}]")
print(f"  Mean episode length: {random_results.mean_length:.1f}")

# Can also use summary()
print("\nSummary:")
print(random_results.summary())


Results:
  Mean return: 21.2
  Std return: 11.0
  Return range: [8.0, 62.0]
  Mean episode length: 21.2

Summary:
{'mean_return': 21.25, 'std_return': 11.014876365661621, 'min_return': 8.0, 'max_return': 62.0, 'mean_length': 21.25, 'num_episodes': 100}
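
For reference, the same kind of summary can be reproduced from raw per-episode data. The sketch below is a hypothetical helper (`summarize` is not part of Myriad) that mirrors the keys printed by `summary()`. It runs on made-up returns, and it assumes a population standard deviation, since the tutorial does not state which convention the platform uses.

```python
import statistics


def summarize(episode_returns, episode_lengths):
    """Hypothetical helper that mirrors the keys printed by summary()."""
    return {
        "mean_return": statistics.fmean(episode_returns),
        "std_return": statistics.pstdev(episode_returns),  # population std (assumption)
        "min_return": min(episode_returns),
        "max_return": max(episode_returns),
        "mean_length": statistics.fmean(episode_lengths),
        "num_episodes": len(episode_returns),
    }


# Made-up data for illustration. Return and length are set equal here,
# matching the outputs above where mean return equals mean length.
returns = [8.0, 15.0, 21.0, 30.0, 62.0]
stats = summarize(returns, returns)
print(stats)
```

In the notebook you would pass `random_results.episode_returns` and `random_results.episode_lengths` instead of the made-up list.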


## 2.2 Evaluate Bang-Bang Controller

In [3]:
bangbang_config = EvalConfig(
    env=EnvConfig(name="cartpole-control"),
    agent=AgentConfig(name="bangbang"),
    seed=SEED,
    eval_rollouts=100,
    eval_max_steps=500,
)

bangbang_results = evaluate(config=bangbang_config, return_episodes=True)

print("\nResults summary:")
for k, v in bangbang_results.summary().items():
    print(f"\t{k}: {v:.2f}")


Results summary:
	mean_return: 42.01
	std_return: 9.24
	min_return: 24.00
	max_return: 66.00
	mean_length: 42.01
	num_episodes: 100.00


## 2.3 Evaluate PID Controller

In [5]:
pid_config = EvalConfig(
    env=EnvConfig(name="cartpole-control"),
    agent=AgentConfig(name="pid", kp=1.0, ki=0.0, kd=1.0, control_low=-1, control_high=1),
    seed=SEED,
    eval_rollouts=100,
    eval_max_steps=500,
)

pid_results = evaluate(config=pid_config, return_episodes=True)

print("\nResults summary:")
for k, v in pid_results.summary().items():
    print(f"\t{k}: {v:.2f}")


Results summary:
	mean_return: 8.75
	std_return: 0.46
	min_return: 8.00
	max_return: 10.00
	mean_length: 8.75
	num_episodes: 100.00


---

# Section 3: Comparing Results

`EvaluationResults` provides both summary statistics and raw data for custom analysis.

## 3.1 Summary Table

In [8]:
comparison_df = pd.DataFrame({
    "Controller": ["Random", "Bang-Bang", "PID"],
    "Mean Return": [random_results.mean_return, bangbang_results.mean_return, pid_results.mean_return],
    "Std Return": [random_results.std_return, bangbang_results.std_return, pid_results.std_return],
    "Mean Length": [random_results.mean_length, bangbang_results.mean_length, pid_results.mean_length],
    "Max Return": [random_results.max_return, bangbang_results.max_return, pid_results.max_return],
})

print("\nPerformance Comparison:")
print(comparison_df.to_string(index=False))


Performance Comparison:
Controller  Mean Return  Std Return  Mean Length  Max Return
    Random    21.250000   11.014876        21.25        62.0
 Bang-Bang    42.009998    9.236336        42.01        66.0
       PID     8.750000    0.455522         8.75        10.0
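
To judge whether the differences in mean return are larger than the sampling noise, the table values can be turned into approximate confidence intervals. The sketch below uses the means and standard deviations printed above with `n = 100` episodes and a normal approximation (`z = 1.96`). The helper `mean_ci` is hypothetical, and the approximation is rough for skewed return distributions.

```python
import math

n = 100  # eval_rollouts used for every controller

# (mean return, std return) copied from the comparison table
table = {
    "Random": (21.25, 11.014876),
    "Bang-Bang": (42.01, 9.236336),
    "PID": (8.75, 0.455522),
}


def mean_ci(mean, std, n, z=1.96):
    """Normal-approximation confidence interval for the mean (95% for z=1.96)."""
    half_width = z * std / math.sqrt(n)
    return mean - half_width, mean + half_width


intervals = {name: mean_ci(m, s, n) for name, (m, s) in table.items()}
for name, (lo, hi) in intervals.items():
    print(f"{name:>9}: [{lo:.2f}, {hi:.2f}]")
```

Under these assumptions the three intervals do not overlap, so the ordering Bang-Bang > Random > PID is unlikely to be an artifact of the 100-episode sample.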


## 3.2 Distribution Plots

In [None]:
fig, axes = plt.subplots(1, 2, figsize=(14, 5))
fig.suptitle("Classical Controller Performance on CartPole", fontsize=16, fontweight="bold")

# Episode returns from the raw per-episode data (PID is not plotted here)
ax = axes[0]
data = [random_results.episode_returns, bangbang_results.episode_returns]
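# Note: Matplotlib 3.9+ renames the `labels` argument to `tick_labels`; update both boxplot calls if you see a deprecation warning.
bp = ax.boxplot(data, labels=["Random", "Bang-Bang"], patch_artist=True)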
for patch, color in zip(bp["boxes"], ["lightblue", "lightcoral"]):
    patch.set_facecolor(color)
ax.axhline(y=500, color="green", linestyle="--", linewidth=2, alpha=0.6, label="Max (500)")
ax.set_ylabel("Episode Return", fontsize=12)
ax.set_title("Return Distribution", fontweight="bold")
ax.legend()
ax.grid(True, alpha=0.3)

# Episode lengths
ax = axes[1]
data = [random_results.episode_lengths, bangbang_results.episode_lengths]
bp = ax.boxplot(data, labels=["Random", "Bang-Bang"], patch_artist=True)
for patch, color in zip(bp["boxes"], ["lightblue", "lightcoral"]):
    patch.set_facecolor(color)
ax.axhline(y=500, color="green", linestyle="--", linewidth=2, alpha=0.6, label="Max (500)")
ax.set_ylabel("Episode Length", fontsize=12)
ax.set_title("Length Distribution", fontweight="bold")
ax.legend()
ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

## 3.3 Histograms

In [None]:
fig, axes = plt.subplots(1, 2, figsize=(14, 5))
fig.suptitle("Episode Return Distributions", fontsize=16, fontweight="bold")

# Random
ax = axes[0]
ax.hist(random_results.episode_returns, bins=20, alpha=0.7, color="lightblue", edgecolor="black")
ax.axvline(
    random_results.mean_return, color="blue", linestyle="--", linewidth=2,
    label=f"Mean: {random_results.mean_return:.1f}"
)
ax.set_xlabel("Episode Return")
ax.set_ylabel("Frequency")
ax.set_title("Random Controller", fontweight="bold")
ax.legend()
ax.grid(True, alpha=0.3)

# Bang-Bang
ax = axes[1]
ax.hist(bangbang_results.episode_returns, bins=20, alpha=0.7, color="lightcoral", edgecolor="black")
ax.axvline(
    bangbang_results.mean_return, color="red", linestyle="--", linewidth=2,
    label=f"Mean: {bangbang_results.mean_return:.1f}"
)
ax.set_xlabel("Episode Return")
ax.set_ylabel("Frequency")
ax.set_title("Bang-Bang Controller", fontweight="bold")
ax.legend()
ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

---

# Section 4: Analyzing Trajectories

Since we used `return_episodes=True`, the results include full trajectories.

In [None]:
print("Trajectory data available in results.episodes:")
print(f"  Keys: {list(bangbang_results.episodes.keys())}")
print("\nShapes:")
for key, arr in bangbang_results.episodes.items():
    print(f"  {key}: {arr.shape}")

print(f"\nFormat: (num_episodes={bangbang_results.num_episodes}, max_steps={bangbang_config.eval_max_steps}, ...)")
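
Because every episode is stored with `max_steps` time steps, batch statistics over `results.episodes` need to ignore the steps after each episode ended. The sketch below builds a boolean mask from the episode lengths, using synthetic data. The array layout `(num_episodes, max_steps, obs_dim)` follows the format line printed above. What the platform stores in the padded steps is not specified here, so the mask excludes them instead of relying on their values.

```python
import numpy as np

# Synthetic stand-in for results.episodes["observations"]
num_episodes, max_steps, obs_dim = 3, 6, 4
observations = np.arange(
    num_episodes * max_steps * obs_dim, dtype=float
).reshape(num_episodes, max_steps, obs_dim)
episode_lengths = np.array([2, 6, 4])  # stand-in for results.episode_lengths

# valid[i, t] is True while step t lies inside episode i
valid = np.arange(max_steps)[None, :] < episode_lengths[:, None]

# Mean absolute pole angle (observation index 2) per episode, padded steps excluded
theta_abs = np.abs(observations[:, :, 2])
mean_abs_theta = np.where(valid, theta_abs, 0.0).sum(axis=1) / episode_lengths

print(valid.sum(axis=1))  # number of valid steps per episode
print(mean_abs_theta)
```

The same mask can be reused for any per-step quantity, such as actions or rewards, as long as it shares the leading `(num_episodes, max_steps)` axes.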

## 4.1 Visualize a Single Episode

In [None]:
# Pick the first episode
episode_idx = 0
length = int(bangbang_results.episode_lengths[episode_idx])

# Extract trajectory data (trim to actual episode length)
obs = bangbang_results.episodes["observations"][episode_idx, :length, :]
actions = bangbang_results.episodes["actions"][episode_idx, :length]

# Parse observations: [x, x_dot, theta, theta_dot]
x = obs[:, 0]
theta = obs[:, 2]

# Plot
fig, axes = plt.subplots(3, 1, figsize=(12, 8), sharex=True)
fig.suptitle(
    f"Bang-Bang Controller - Episode {episode_idx} (Length: {length}, Return: {bangbang_results.episode_returns[episode_idx]:.1f})",
    fontsize=14, fontweight="bold"
)

# Pole angle
ax = axes[0]
ax.plot(theta, linewidth=2, color="crimson")
ax.axhline(y=0, color="gray", linestyle="--", alpha=0.5)
ax.set_ylabel("Pole Angle (rad)")
ax.set_title("Pole Angle Over Time", fontweight="bold")
ax.grid(True, alpha=0.3)

# Cart position
ax = axes[1]
ax.plot(x, linewidth=2, color="crimson")
ax.axhline(y=0, color="gray", linestyle="--", alpha=0.5)
ax.set_ylabel("Cart Position (m)")
ax.set_title("Cart Position Over Time", fontweight="bold")
ax.grid(True, alpha=0.3)

# Actions
ax = axes[2]
ax.step(range(length), actions, where="post", linewidth=2, color="crimson")
ax.set_ylabel("Action\n(0=Left, 1=Right)")
ax.set_xlabel("Time Step")
ax.set_title("Actions Over Time", fontweight="bold")
ax.set_ylim([-0.1, 1.1])
ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()
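
Beyond plotting, a trajectory can be reduced to simple statistics. One example is the fraction of steps at which the action changes, which gives a rough measure of how often a threshold controller switches direction. The sketch below runs on a made-up action sequence. `count_switches` is a hypothetical helper, not part of Myriad, and in the notebook you would pass the trimmed `actions` array from the cell above.

```python
def count_switches(actions):
    """Count how many times consecutive actions differ."""
    return sum(int(a != b) for a, b in zip(actions, actions[1:]))


# Made-up discrete action sequence (0 = left, 1 = right)
actions = [1, 0, 0, 1, 1, 1, 0]

switches = count_switches(actions)
switch_rate = switches / (len(actions) - 1)
print(f"switches: {switches}, switch rate: {switch_rate:.2f}")
```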

---

# Tutorial Complete!

## What You Learned
1. How to use `EvalConfig` for clean evaluation-only configuration
2. How to work with `EvaluationResults` objects (pre-computed statistics + raw data)
3. How to use `return_episodes=True` to collect full trajectories
4. How to compare different agents using the platform

## Key Platform Features
- **`EvalConfig`**: Simplified config for evaluation (no training-specific fields)
- **`EvaluationResults`**: Structured results with pre-computed statistics
  - `results.mean_return`, `results.std_return`, etc.
  - `results.episode_returns` (raw data for custom analysis)
  - `results.episodes` (full trajectories if `return_episodes=True`)
  - `results.summary()` (quick overview)
- **Agent registry**: Classical controllers available via `AgentConfig(name="random")`

## API Comparison

**Old approach (manual stats):**
```python
results = evaluate(config)  # Returns dict
mean_return = np.mean(results['episode_return'])  # Manual computation
```

**New approach (structured results):**
```python
results = evaluate(config)  # Returns EvaluationResults
mean_return = results.mean_return  # Pre-computed
print(results.summary())  # Easy overview
```

## Next Steps
Try:
- Compare with learned agents from Tutorial 01 (use `Config` instead of `EvalConfig`)
- Evaluate on different environment configurations
- Use `results.episodes` to debug agent behavior
- Test pre-trained models by passing `agent_state` to `evaluate()`
- Enable W&B logging in `EvalConfig`