# 03: Metrics Visualization

This notebook demonstrates visualization of experiment metrics using the `plot_metrics()` function.

## What You'll Learn

- Using `plot_metrics()` for basic and advanced visualizations
- Plotting training curves (single and multiple experiments)
- Creating multi-metric dashboards
- Comparing experiments with custom labels
- Creating subplot grids for parameter comparisons
- Statistical aggregation with confidence intervals

## Prerequisites

**This notebook uses the same 12 training experiments from Notebooks 01 and 02.**

If you haven't already, run the sweep below. The `cd` path assumes you start from the repository root; skip that line if you are already in the `examples/cli/05_multi_step_metrics/` directory:

```bash
cd examples/cli/05_multi_step_metrics
yanex run train_model.py \
  --param "epochs=10,20,30" \
  --param "learning_rate=logspace(-4, -1, 4)" \
  --param "batch_size=32" \
  --tag results-demo \
  --parallel 0
```

This creates **12 experiments** (3 epochs × 4 learning rates) with the tag `results-demo`.
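
The count follows from the Cartesian product of the swept values. The sketch below reproduces it in plain Python, assuming `logspace(-4, -1, 4)` follows the NumPy convention (four base-10 exponents evenly spaced from -4 to -1, endpoints included). The `logspace` helper is a stand-in written for this illustration, not yanex's implementation.

```python
from itertools import product


def logspace(start, stop, num):
    """Return `num` values evenly spaced on a log10 scale from 10**start to 10**stop."""
    if num == 1:
        return [10.0 ** start]
    step = (stop - start) / (num - 1)
    return [10.0 ** (start + i * step) for i in range(num)]


epochs = [10, 20, 30]
learning_rates = logspace(-4, -1, 4)  # assumed NumPy-style semantics
batch_sizes = [32]

# Every combination of the swept values becomes one experiment.
grid = list(product(epochs, learning_rates, batch_sizes))
print(f"{len(epochs)} x {len(learning_rates)} x {len(batch_sizes)} = {len(grid)} experiments")
```

Because `batch_size` has a single value, it does not multiply the number of runs.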

## Import Libraries

Import the Results API and configure matplotlib for notebook display.

In [None]:
import yanex.results as yr
import matplotlib.pyplot as plt

# Configure matplotlib for better display in notebooks
%matplotlib inline
plt.rcParams['figure.dpi'] = 100

## Basic Usage: Single Experiment Training Curve

Let's start with the simplest case: plotting the training curve for a single experiment.

In [None]:
# Get one experiment
experiments = yr.get_experiments(tags=["results-demo"], limit=1)
exp_id = experiments[0].id

print(f"Plotting experiment: {exp_id}")
print(f"Epochs: {experiments[0].get_param('epochs')}")
print(f"Learning rate: {experiments[0].get_param('learning_rate')}")

In [None]:
# Plot training accuracy over time
yr.plot_metrics("train_accuracy", ids=[exp_id]);

## Multiple Metrics: Multi-Metric Dashboard

Plot multiple metrics from the same experiment to create a dashboard view.

In [None]:
# Plot multiple metrics in separate subplots
yr.plot_metrics(
    ["train_accuracy", "train_loss"],
    ids=[exp_id],
    figsize=(12, 4)
);

## Comparing Multiple Experiments

Plot the same metric across multiple experiments to compare their performance.

In [None]:
# Compare training accuracy across all experiments
yr.plot_metrics(
    "train_accuracy",
    tags=["results-demo"],
    title="Training Accuracy Comparison"
);

**Note:** The legend shows experiment IDs by default. Let's make this more readable by labeling by learning rate.

## Custom Labels: Label by Parameter

Use `label_by` to label experiments by a parameter value instead of IDs.

In [None]:
# Label experiments by learning rate
yr.plot_metrics(
    "train_accuracy",
    tags=["results-demo"],
    label_by="learning_rate",
    title="Training Accuracy by Learning Rate"
);

**Much better!** The legend now shows learning rates instead of IDs. Because each learning rate was run with three different epoch counts, several curves share the same label.

## Subplots: Organizing by Parameters

Use `subplot_by` to create separate plots for different parameter values.

In [None]:
# Create subplots by number of epochs
yr.plot_metrics(
    "train_accuracy",
    tags=["results-demo"],
    label_by="learning_rate",
    subplot_by="epochs",
    figsize=(15, 4),
    title="Training Accuracy: Learning Rate Effect Across Different Training Durations"
);

**Key Insight:** Each subplot shows one training duration (10, 20, or 30 epochs), with one colored line per learning rate. This makes it easy to compare the effect of learning rate within each training duration.
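
To picture which lines land in which panel, here is a minimal grouping sketch in plain Python. The experiment records are hypothetical stand-ins for the 12 demo runs, and the grouping only mirrors the idea behind `subplot_by` and `label_by`; it is not how yanex implements them.

```python
from collections import defaultdict
from itertools import product

# Hypothetical stand-ins for the 12 demo experiments; real records come from yanex.
experiments = [
    {"epochs": epochs, "learning_rate": lr}
    for epochs, lr in product([10, 20, 30], [0.0001, 0.001, 0.01, 0.1])
]

# subplot_by="epochs": one panel per distinct epochs value.
panels = defaultdict(list)
for exp in experiments:
    # label_by="learning_rate": each line in a panel is labeled by its learning rate.
    panels[exp["epochs"]].append(exp["learning_rate"])

for epochs in sorted(panels):
    print(f"epochs={epochs}: lines for lr={panels[epochs]}")
```

Each of the three panels ends up with four lines, one per learning rate.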

## Custom Subplot Layout

Control the subplot arrangement with `subplot_layout`.

In [None]:
# Arrange subplots vertically
yr.plot_metrics(
    "train_loss",
    tags=["results-demo"],
    label_by="learning_rate",
    subplot_by="epochs",
    subplot_layout=(3, 1),  # 3 rows, 1 column
    figsize=(8, 12)
);

## Multi-Metric + Subplots

Combine multiple metrics with subplots for comprehensive analysis.
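
Before plotting, it can help to enumerate the panels a grid needs. The sketch below assumes one row per metric and one column per `epochs` value, as the comment in the cell below indicates. The variable names are illustrative, and how yanex assigns panels internally is not shown in this notebook.

```python
metrics = ["train_accuracy", "train_loss"]  # one row per metric
epoch_values = [10, 20, 30]                 # one column per subplot_by value
rows, cols = 2, 3                           # matches subplot_layout=(2, 3)

# Fail early if the layout cannot hold every metric/epochs combination.
assert rows * cols >= len(metrics) * len(epoch_values)

# Map each (row, column) cell to the metric and epochs value it would show.
grid = {
    (r, c): (metric, epochs)
    for r, metric in enumerate(metrics)
    for c, epochs in enumerate(epoch_values)
}

for (r, c), (metric, epochs) in grid.items():
    print(f"row {r}, col {c}: {metric} @ epochs={epochs}")
```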

In [None]:
# Create a 2x3 grid: 2 metrics × 3 epoch values
yr.plot_metrics(
    ["train_accuracy", "train_loss"],
    tags=["results-demo"],
    label_by="learning_rate",
    subplot_by="epochs",
    subplot_layout=(2, 3),  # 2 rows (metrics) × 3 columns (epochs)
    figsize=(16, 8)
);

## Filtering Experiments

Combine `plot_metrics()` with filtering to focus on specific experiments.

In [None]:
# Get IDs for experiments with 20 epochs using pandas
df = yr.compare(tags=["results-demo"])
exp_20_epochs = df[df[("param", "epochs")] == 20].index.tolist()

print(f"Found {len(exp_20_epochs)} experiments with 20 epochs")

# Plot only those experiments
yr.plot_metrics(
    "train_accuracy",
    ids=exp_20_epochs,
    label_by="learning_rate",
    title="Training Accuracy: 20 Epochs Only"
);

## Customizing Plot Appearance

Customize the title, axis labels, figure size, grid, and legend position for publication-ready plots.

In [None]:
# Custom styling
yr.plot_metrics(
    "train_accuracy",
    tags=["results-demo"],
    label_by="learning_rate",
    subplot_by="epochs",
    title="Neural Network Training Performance",
    xlabel="Training Epoch",
    ylabel="Accuracy",
    figsize=(15, 4),
    grid=True,
    legend_position="lower right"
);

## Advanced: Post-Processing Plots

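Use `return_axes=True` together with `show=False` to get the matplotlib figure and axes back before anything is displayed, then customize them with regular matplotlib calls.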

In [None]:
# Get figure and axes for customization
fig, axes = yr.plot_metrics(
    "train_loss",
    tags=["results-demo"],
    label_by="learning_rate",
    subplot_by="epochs",
    figsize=(15, 4),
    show=False,
    return_axes=True
)

# Add custom annotations
for ax in axes:
    # Add horizontal line at target loss
    ax.axhline(y=0.3, color='red', linestyle='--', alpha=0.5, label='Target Loss')
    ax.legend()

plt.tight_layout()
plt.show()

## Statistical Aggregation (Advanced)

The demo data contains no true replicate runs, so the cell below only prints an example call instead of producing a plot. In real scenarios, you'd use `group_by` to aggregate over random seeds or cross-validation folds.

**Note:** The `group_by` parameter aggregates experiments and shows mean/confidence intervals. This is most useful when you have multiple runs with different random seeds.
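
To make the aggregation concrete, here is a minimal sketch of a per-step mean with a 95% interval across replicate runs. The curves are made-up numbers, and the interval uses a normal approximation (mean ± 1.96 × standard error); this notebook does not state which method yanex uses, so treat this as an illustration of the idea only.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical accuracy curves from three replicate runs (one list per seed).
runs = [
    [0.50, 0.70, 0.80],
    [0.54, 0.66, 0.83],
    [0.52, 0.68, 0.86],
]


def aggregate(curves, z=1.96):
    """Return (mean, lower, upper) per step using a normal-approximation interval."""
    summary = []
    for values in zip(*curves):  # values at one step across all runs
        m = mean(values)
        half_width = z * stdev(values) / sqrt(len(values))
        summary.append((m, m - half_width, m + half_width))
    return summary


summary = aggregate(runs)
for step, (m, lower, upper) in enumerate(summary):
    print(f"step {step}: mean={m:.3f}, 95% interval=({lower:.3f}, {upper:.3f})")
```

With only a handful of runs, the normal approximation understates the uncertainty; a t-distribution multiplier would give wider intervals.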

In [None]:
# For demonstration, let's show what this would look like conceptually
# In practice, you'd use: group_by="random_seed", label_by="learning_rate"

print("Statistical aggregation with group_by is most useful when you have:")
print("  - Multiple runs with different random seeds")
print("  - Cross-validation folds")
print("  - Replicate experiments")
print("\nExample usage:")
print("  yr.plot_metrics(")
print("      'accuracy',")
print("      tags=['repeated_runs'],")
print("      group_by='random_seed',     # Aggregate over seeds")
print("      label_by='learning_rate',   # Color by LR")
print("      show_ci=True,               # Show 95% confidence interval")
print("      show_individuals=True       # Show individual runs faintly")
print("  )")

## Building Blocks: Using Lower-Level APIs

For maximum flexibility, use the underlying building blocks to create custom visualizations.

In [None]:
# Import building blocks
from yanex.results.viz import extract_metrics_df

# Get experiments
experiments = yr.get_experiments(tags=["results-demo"])

# Extract metrics to pandas DataFrame
df = extract_metrics_df(experiments, ["train_accuracy", "train_loss"])

print(f"Extracted {len(df)} metric records from {df['experiment_id'].nunique()} experiments")
print(f"\nDataFrame columns: {list(df.columns)}")
print("\nFirst few rows:")
df.head(10)

In [None]:
# Now you can use any pandas operations!
# For example, compute rolling average
df_with_smooth = df.copy()
df_with_smooth['train_accuracy_smooth'] = (
    df_with_smooth.groupby('experiment_id')['train_accuracy']
    .transform(lambda x: x.rolling(window=3, min_periods=1).mean())
)

# Plot with matplotlib
fig, ax = plt.subplots(figsize=(10, 6))

for exp_id, exp_df in df_with_smooth.groupby('experiment_id'):
    lr = exp_df['learning_rate'].iloc[0]
    ax.plot(exp_df['step'], exp_df['train_accuracy_smooth'],
            label=f'lr={lr:.4f}', alpha=0.7)

ax.set_xlabel('Step')
ax.set_ylabel('Training Accuracy (smoothed)')
ax.set_title('Custom Plot: Smoothed Training Curves')
ax.legend(bbox_to_anchor=(1.05, 1), loc='upper left')
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
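
The smoothing step above uses a trailing window of three points, and `min_periods=1` lets the first points average over whatever is available so far. A plain-Python sketch of the same calculation, with a hypothetical `rolling_mean` helper and made-up values:

```python
def rolling_mean(values, window=3):
    """Trailing mean over up to `window` points; shorter windows at the start."""
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


smoothed = rolling_mean([0.0, 3.0, 6.0, 9.0])
print(smoothed)  # [0.0, 1.5, 3.0, 6.0]
```

The first two outputs average one and two points respectively, so no step is dropped from the smoothed curve.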

## Comparing Best vs Worst

Visualize the difference between the best and worst performing experiments.
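
Picking the two extremes only needs a minimum and a maximum over the final metric values. A minimal sketch with made-up IDs and accuracies (the cell below sorts the full list instead, which also yields a ranking):

```python
# Hypothetical (experiment_id, final_accuracy) pairs; the IDs are made up.
final_accuracies = [("a1b2", 0.81), ("c3d4", 0.93), ("e5f6", 0.67)]

# key= tells min/max to compare by accuracy rather than by ID.
worst = min(final_accuracies, key=lambda item: item[1])
best = max(final_accuracies, key=lambda item: item[1])

print(f"Worst: {worst[0]} ({worst[1]:.2f})")
print(f"Best:  {best[0]} ({best[1]:.2f})")
```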

In [None]:
# Find best and worst experiments
all_experiments = yr.get_experiments(tags=["results-demo"])

# Get final accuracy for each
final_accuracies = [
    (exp.id, exp.get_metric('train_accuracy')[-1], 
     exp.get_param('learning_rate'), exp.get_param('epochs'))
    for exp in all_experiments
]

# Sort by accuracy
final_accuracies.sort(key=lambda x: x[1])

worst_id, worst_acc, worst_lr, worst_epochs = final_accuracies[0]
best_id, best_acc, best_lr, best_epochs = final_accuracies[-1]

print(f"Worst: {worst_id} - {worst_acc:.4f} (lr={worst_lr:.4f}, epochs={worst_epochs})")
print(f"Best:  {best_id} - {best_acc:.4f} (lr={best_lr:.4f}, epochs={best_epochs})")

# Plot comparison
yr.plot_metrics(
    ["train_accuracy", "train_loss"],
    ids=[worst_id, best_id],
    subplot_layout=(2, 1),
    figsize=(10, 8),
    title="Best vs Worst Experiment"
);

## Key Takeaways

✅ **Basic Plotting:**
- `yr.plot_metrics(metric, ids=[...])` - Single experiment
- `yr.plot_metrics([metric1, metric2], ...)` - Multiple metrics
- `yr.plot_metrics(metric, tags=[...])` - Multiple experiments

✅ **Organization:**
- `label_by` - Determines line colors and legend labels
- `subplot_by` - Creates separate subplots by parameter
- `subplot_layout` - Controls subplot arrangement (rows, cols)

✅ **Customization:**
- `title`, `xlabel`, `ylabel` - Custom labels
- `figsize` - Figure dimensions
- `grid`, `legend_position` - Grid lines and legend placement
- `return_axes=True` - Get axes for advanced customization
- `show=False` - Suppress immediate display

✅ **Advanced:**
- `group_by` - Statistical aggregation (for replicate runs)
- `show_ci`, `show_std`, `show_individuals` - Confidence intervals, standard deviation, and faint individual runs
- Building blocks: `extract_metrics_df()` for custom analysis

✅ **Best Practices:**
- Use `label_by` to make legends readable
- Use `subplot_by` to organize comparisons
- Combine with pandas for filtering
- Use building blocks for custom visualizations

## Summary

The `plot_metrics()` function provides a flexible, progressive API:
- **Beginners:** Simple one-liners for common plots
- **Intermediate:** Parameters for organizing and labeling
- **Advanced:** Building blocks for custom analysis

All plots use publication-ready defaults (colorblind-safe, clean styling) but can be fully customized when needed.