# 02: Filtering and Comparing

This notebook shows how to load experiments into a pandas DataFrame with `compare()`, then filter, sort, and group the results with standard pandas operations.

## What You'll Learn

- Using `compare()` to create pandas DataFrames
- Understanding the multi-level column structure
- Accessing parameters and metrics in DataFrames
- Finding optimal hyperparameters with pandas
- Grouping and aggregating experiment results

## Prerequisites

**This notebook uses the same 12 training experiments from Notebook 01.**

If you haven't already, run the following from the directory that contains `examples/` (the first line changes into `examples/cli/05_multi_step_metrics/`):

```bash
cd examples/cli/05_multi_step_metrics
yanex run train_model.py \
  --param "epochs=10,20,30" \
  --param "learning_rate=logspace(-4, -1, 4)" \
  --param "batch_size=32" \
  --tag results-demo \
  --parallel 0
```

This creates **12 experiments** (3 epochs × 4 learning rates) with the tag `results-demo`.
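The count of 12 is the size of the Cartesian product of the swept values. Here is a minimal sketch in plain Python, assuming `logspace(-4, -1, 4)` means four base-10 log-spaced values in the style of NumPy's `logspace`. That reading is an assumption about the sweep syntax, although the learning rates in this notebook's outputs are consistent with it. The `logspace` helper is hypothetical and not part of yanex.

```python
from itertools import product


def logspace(start_exp, stop_exp, num):
    """Hypothetical helper: `num` values from 10**start_exp to 10**stop_exp, evenly spaced in the exponent."""
    step = (stop_exp - start_exp) / (num - 1)
    return [10 ** (start_exp + i * step) for i in range(num)]


epochs_values = [10, 20, 30]
learning_rates = logspace(-4, -1, 4)  # assumed to mirror --param "learning_rate=logspace(-4, -1, 4)"
batch_sizes = [32]

# One configuration per combination of swept values
grid = [
    {"epochs": e, "learning_rate": lr, "batch_size": b}
    for e, lr, b in product(epochs_values, learning_rates, batch_sizes)
]

print(len(grid))  # 3 * 4 * 1 = 12
```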

## Import Libraries

In [2]:
import yanex.results as yr

## The `compare()` Method

The `compare()` method creates a pandas DataFrame with all experiment data, making it easy to analyze multiple experiments at once.

**Key features:**
- Returns a pandas DataFrame
- Multi-level columns: `(category, name)` structure
- Categories: `meta` (metadata), `param` (parameters), `metric` (metrics)
- Indexed by experiment ID

**How metric values are extracted:**
- For **multi-step metrics** (logged at each epoch/step): Returns the **final/last logged value**
- For **metrics logged occasionally**: Returns the **most recent occurrence**
- Perfect for comparing overall experiment performance across hyperparameter configurations
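To make the "last logged value" rule concrete, here is a small sketch in plain Python. It only illustrates the behavior described in the list above and is not yanex's implementation. The record layout, the `final_values` helper, and the numbers are made up for the example.

```python
def final_values(history):
    """Collapse per-step records into one value per metric: the last one that was logged."""
    final = {}
    for record in history:  # records are assumed to be in logging order
        for name, value in record.items():
            if value is not None:
                final[name] = value  # later records overwrite earlier ones
    return final


history = [
    {"step": 0, "train_loss": 0.90, "val_loss": 0.95},
    {"step": 1, "train_loss": 0.60},  # val_loss not logged at this step
    {"step": 2, "train_loss": 0.45, "val_loss": 0.70},
    {"step": 3, "train_loss": 0.40},  # val_loss not logged at this step
]

summary = final_values(history)
print(summary)
```

Here `train_loss` ends up as the value from step 3, while `val_loss` keeps its most recent occurrence from step 2.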

Let's create a DataFrame for our 12 training experiments:

In [40]:
# Create a comparison DataFrame for our training experiments
df = yr.compare(tags=["results-demo"])

print(f"DataFrame shape: {df.shape}")
print(f"Number of experiments: {len(df)}")

df

DataFrame shape: (12, 19)
Number of experiments: 12


category,meta,meta,meta,meta,meta,meta,param,param,param,metric,metric,metric,metric,metric,metric,metric,metric,metric,metric
name,script,name,started,duration,status,tags,batch_size,epochs,learning_rate,final_train_accuracy,last_updated,learning_rate,step,timestamp,total_epochs,train_accuracy,train_loss,val_accuracy,val_loss
experiment_id,,,,,,,,,,,,,,,,,,,
b05cd09c,train_model.py,sweep,2025-11-12 22:25:18,0 days 00:00:09,completed,"results-demo, sweep",32,30,0.1,0.940756,2025-11-12T22:25:27.949296,0.1,31,2025-11-12T22:25:27.951074,30,0.940756,0.295027,0.91334,0.458478
034200fb,train_model.py,sweep,2025-11-12 22:25:18,0 days 00:00:09,completed,"results-demo, sweep",32,30,0.01,0.926633,2025-11-12T22:25:27.954390,0.01,31,2025-11-12T22:25:27.955868,30,0.926633,0.243508,0.909512,0.352235
97710305,train_model.py,sweep,2025-11-12 22:25:18,0 days 00:00:09,completed,"results-demo, sweep",32,30,0.001,0.940816,2025-11-12T22:25:27.946561,0.001,31,2025-11-12T22:25:27.948276,30,0.940816,0.35798,0.935315,0.463488
cd0a7565,train_model.py,sweep,2025-11-12 22:25:18,0 days 00:00:09,completed,"results-demo, sweep",32,30,0.0001,0.903623,2025-11-12T22:25:27.956498,0.0001,31,2025-11-12T22:25:27.957207,30,0.903623,0.349994,0.889964,0.532462
964c39aa,train_model.py,sweep,2025-11-12 22:25:18,0 days 00:00:06,completed,"results-demo, sweep",32,20,0.1,0.961893,2025-11-12T22:25:24.866215,0.1,21,2025-11-12T22:25:24.867883,20,0.961893,0.426308,0.958098,0.498174
9cfff919,train_model.py,sweep,2025-11-12 22:25:18,0 days 00:00:06,completed,"results-demo, sweep",32,20,0.01,0.991475,2025-11-12T22:25:24.871731,0.01,21,2025-11-12T22:25:24.873857,20,0.991475,0.332429,0.912048,0.487776
b83a2b46,train_model.py,sweep,2025-11-12 22:25:18,0 days 00:00:06,completed,"results-demo, sweep",32,20,0.001,0.96208,2025-11-12T22:25:24.874323,0.001,21,2025-11-12T22:25:24.875038,20,0.96208,0.37352,0.86502,0.427178
a03bec22,train_model.py,sweep,2025-11-12 22:25:18,0 days 00:00:06,completed,"results-demo, sweep",32,20,0.0001,0.975811,2025-11-12T22:25:24.873428,0.0001,21,2025-11-12T22:25:24.874816,20,0.975811,0.354385,0.904226,0.539031
9046b032,train_model.py,sweep,2025-11-12 22:25:18,0 days 00:00:03,completed,"results-demo, sweep",32,10,0.1,0.920763,2025-11-12T22:25:21.804885,0.1,11,2025-11-12T22:25:21.807546,10,0.920763,0.652956,0.882242,0.767296
da43179d,train_model.py,sweep,2025-11-12 22:25:18,0 days 00:00:03,completed,"results-demo, sweep",32,10,0.01,0.970567,2025-11-12T22:25:21.802922,0.01,11,2025-11-12T22:25:21.806511,10,0.970567,0.626489,0.926669,0.718267


## Understanding the Multi-Level Column Structure

The DataFrame uses multi-level columns with format `(category, name)`:

- **`meta`** - Metadata columns: `name`, `status`, `started`, `duration`, `tags`, etc.
- **`param`** - Parameter columns: `epochs`, `learning_rate`, `batch_size`
- **`metric`** - Metric columns: `train_loss`, `train_accuracy`, etc., plus logging bookkeeping such as `step`, `timestamp`, and `last_updated`

In [None]:
# Show all column names
print("All columns:")
for col in df.columns:
    print(f"  {col}")

All columns:
  ('meta', 'script')
  ('meta', 'name')
  ('meta', 'started')
  ('meta', 'duration')
  ('meta', 'status')
  ('meta', 'tags')
  ('param', 'batch_size')
  ('param', 'epochs')
  ('param', 'learning_rate')
  ('metric', 'final_train_accuracy')
  ('metric', 'last_updated')
  ('metric', 'learning_rate')
  ('metric', 'step')
  ('metric', 'timestamp')
  ('metric', 'total_epochs')
  ('metric', 'train_accuracy')
  ('metric', 'train_loss')
  ('metric', 'val_accuracy')
  ('metric', 'val_loss')
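If you want to experiment with this column layout without running any experiments, a toy DataFrame with the same two column levels can be built directly in pandas. This is a sketch with made-up values. The level names `category` and `name` follow the headers shown in the output above, and the experiment IDs are placeholders.

```python
import pandas as pd

# Two column levels, named like the compare() output: (category, name)
columns = pd.MultiIndex.from_tuples(
    [
        ("meta", "status"),
        ("param", "epochs"),
        ("param", "learning_rate"),
        ("metric", "train_loss"),
    ],
    names=["category", "name"],
)

toy = pd.DataFrame(
    [["completed", 10, 0.1, 0.65], ["completed", 20, 0.01, 0.33]],
    index=pd.Index(["exp-a", "exp-b"], name="experiment_id"),  # placeholder IDs
    columns=columns,
)

print(toy.columns.nlevels)                                      # number of column levels
print(toy.columns.get_level_values("category").unique().tolist())
print(toy["param"].columns.tolist())                            # names under one category
```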


### Flattening to Single-Level Columns

If you prefer to work with a flat column structure, yanex provides a utility function to convert multi-level columns into simple string names:

- Metadata columns keep their original names (`name`, `status`, etc.)
- Parameter columns get prefixed: `param_epochs`, `param_learning_rate`
- Metric columns get prefixed: `metric_train_loss`, `metric_train_accuracy`

In [43]:
# Flatten the multi-level columns to simple string names
from yanex.results.dataframe import flatten_dataframe_columns

flat_df = flatten_dataframe_columns(df)

print("Flattened DataFrame columns:")
print(flat_df.columns.tolist()[:10], "...")  # Show first 10 columns

print("\nFlattened DataFrame preview:")
flat_df.head()

Flattened DataFrame columns:
['script', 'name', 'started', 'duration', 'status', 'tags', 'param_batch_size', 'param_epochs', 'param_learning_rate', 'metric_final_train_accuracy'] ...

Flattened DataFrame preview:


experiment_id,script,name,started,duration,status,tags,param_batch_size,param_epochs,param_learning_rate,metric_final_train_accuracy,metric_last_updated,metric_learning_rate,metric_step,metric_timestamp,metric_total_epochs,metric_train_accuracy,metric_train_loss,metric_val_accuracy,metric_val_loss
b05cd09c,train_model.py,sweep,2025-11-12 22:25:18,0 days 00:00:09,completed,"results-demo, sweep",32,30,0.1,0.940756,2025-11-12T22:25:27.949296,0.1,31,2025-11-12T22:25:27.951074,30,0.940756,0.295027,0.91334,0.458478
034200fb,train_model.py,sweep,2025-11-12 22:25:18,0 days 00:00:09,completed,"results-demo, sweep",32,30,0.01,0.926633,2025-11-12T22:25:27.954390,0.01,31,2025-11-12T22:25:27.955868,30,0.926633,0.243508,0.909512,0.352235
97710305,train_model.py,sweep,2025-11-12 22:25:18,0 days 00:00:09,completed,"results-demo, sweep",32,30,0.001,0.940816,2025-11-12T22:25:27.946561,0.001,31,2025-11-12T22:25:27.948276,30,0.940816,0.35798,0.935315,0.463488
cd0a7565,train_model.py,sweep,2025-11-12 22:25:18,0 days 00:00:09,completed,"results-demo, sweep",32,30,0.0001,0.903623,2025-11-12T22:25:27.956498,0.0001,31,2025-11-12T22:25:27.957207,30,0.903623,0.349994,0.889964,0.532462
964c39aa,train_model.py,sweep,2025-11-12 22:25:18,0 days 00:00:06,completed,"results-demo, sweep",32,20,0.1,0.961893,2025-11-12T22:25:24.866215,0.1,21,2025-11-12T22:25:24.867883,20,0.961893,0.426308,0.958098,0.498174
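The naming rule described in this section (metadata names unchanged, `param_` and `metric_` prefixes otherwise) is simple enough to sketch by hand. The function below is a hypothetical reimplementation for illustration only. It is not the source of `flatten_dataframe_columns` and may differ from it in edge cases.

```python
import pandas as pd


def flatten_columns_sketch(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return a copy whose (category, name) columns become flat strings."""
    flat = frame.copy()
    flat.columns = [
        name if category == "meta" else f"{category}_{name}"
        for category, name in frame.columns  # iterating a MultiIndex yields tuples
    ]
    return flat


columns = pd.MultiIndex.from_tuples(
    [("meta", "status"), ("param", "epochs"), ("metric", "train_loss")],
    names=["category", "name"],
)
toy = pd.DataFrame(
    [["completed", 10, 0.65]],
    index=pd.Index(["exp-a"], name="experiment_id"),  # placeholder ID
    columns=columns,
)

flat_toy = flatten_columns_sketch(toy)
print(flat_toy.columns.tolist())  # ['status', 'param_epochs', 'metric_train_loss']
```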


## Accessing Columns by Category

Access columns using the `(category, name)` tuple notation:

In [45]:
# Access metadata columns
print("Experiment statuses:")
print(df[("meta", "status")].value_counts())

print("\nExperiment durations:")
print(df[("meta", "duration")])

Experiment statuses:
(meta, status)
completed    12
Name: count, dtype: int64

Experiment durations:
experiment_id
b05cd09c   0 days 00:00:09
034200fb   0 days 00:00:09
97710305   0 days 00:00:09
cd0a7565   0 days 00:00:09
964c39aa   0 days 00:00:06
9cfff919   0 days 00:00:06
b83a2b46   0 days 00:00:06
a03bec22   0 days 00:00:06
9046b032   0 days 00:00:03
da43179d   0 days 00:00:03
0e6d780d   0 days 00:00:03
f52288d7   0 days 00:00:03
Name: (meta, duration), dtype: timedelta64[ns]


In [47]:
# Access parameter columns using .xs() "cross-section" method

print("Training configurations:")
params_df = df.xs("param", axis=1)
params_df

Training configurations:


experiment_id,batch_size,epochs,learning_rate
b05cd09c,32,30,0.1
034200fb,32,30,0.01
97710305,32,30,0.001
cd0a7565,32,30,0.0001
964c39aa,32,20,0.1
9cfff919,32,20,0.01
b83a2b46,32,20,0.001
a03bec22,32,20,0.0001
9046b032,32,10,0.1
da43179d,32,10,0.01
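A few variations on `.xs()` are worth knowing. The sketch below uses a toy DataFrame with made-up values and placeholder IDs. It shows that selecting a top-level key with `df["param"]` gives the same columns, that `drop_level=False` keeps both levels, and that selecting on the `name` level finds `learning_rate` under both `param` and `metric`, as in the real column listing earlier in this notebook.

```python
import pandas as pd

columns = pd.MultiIndex.from_tuples(
    [
        ("meta", "status"),
        ("param", "epochs"),
        ("param", "learning_rate"),
        ("metric", "learning_rate"),
        ("metric", "train_loss"),
    ],
    names=["category", "name"],
)
toy = pd.DataFrame(
    [["completed", 10, 0.1, 0.1, 0.65], ["completed", 20, 0.01, 0.01, 0.33]],
    index=pd.Index(["exp-a", "exp-b"], name="experiment_id"),
    columns=columns,
)

via_xs = toy.xs("param", axis=1)                          # drops the "category" level
via_getitem = toy["param"]                                # top-level key selection
kept = toy.xs("param", axis=1, drop_level=False)          # keeps (category, name) tuples
by_name = toy.xs("learning_rate", axis=1, level="name")   # select on the second level

print(via_xs.columns.tolist())
print(via_xs.equals(via_getitem))
print(kept.columns.tolist())
print(by_name.columns.tolist())
```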


## Finding the Best Experiment with Pandas

Use pandas operations to find experiments with optimal metrics:

In [49]:
# Find experiment with highest training accuracy
best_idx = df[("metric", "train_accuracy")].idxmax()
best_exp = df.loc[best_idx]

print("Best Experiment:")
print("=" * 60)
print(f"ID: {best_idx}")
print(f"Status: {best_exp[('meta', 'status')]}")
print("\nHyperparameters:")
print(f"  Epochs: {best_exp[('param', 'epochs')]}")
print(f"  Learning rate: {best_exp[('param', 'learning_rate')]:.6f}")
print(f"  Batch size: {best_exp[('param', 'batch_size')]}")
print("\nFinal Performance:")
print(f"  Train Loss: {best_exp[('metric', 'train_loss')]:.4f}")
print(f"  Train Accuracy: {best_exp[('metric', 'train_accuracy')]:.4f}")
print(f"\nDuration: {best_exp[('meta', 'duration')]}")

Best Experiment:
ID: 9cfff919
Status: completed

Hyperparameters:
  Epochs: 20
  Learning rate: 0.010000
  Batch size: 32

Final Performance:
  Train Loss: 0.3324
  Train Accuracy: 0.9915

Duration: 0 days 00:00:06


In [50]:
# Find experiment with lowest training loss
best_loss_idx = df[("metric", "train_loss")].idxmin()

print(f"Experiment with lowest loss: {best_loss_idx}")
print(f"  Train Loss: {df.loc[best_loss_idx, ('metric', 'train_loss')]:.4f}")
print(f"  Epochs: {df.loc[best_loss_idx, ('param', 'epochs')]}")
print(f"  Learning rate: {df.loc[best_loss_idx, ('param', 'learning_rate')]:.6f}")

Experiment with lowest loss: 034200fb
  Train Loss: 0.2435
  Epochs: 30
  Learning rate: 0.010000
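`idxmax()` and `idxmin()` return a single index label. When you want a short list rather than one winner, `nlargest()` and `nsmallest()` on the column are a natural next step. The sketch below rebuilds a flat DataFrame from values printed in this notebook's outputs so that it runs without yanex. The flat column names are chosen for the example.

```python
import pandas as pd

# experiment_id -> (epochs, learning_rate, train_loss, train_accuracy), copied from this notebook's outputs
rows = {
    "9cfff919": (20, 0.01, 0.332429, 0.991475),
    "a03bec22": (20, 0.0001, 0.354385, 0.975811),
    "da43179d": (10, 0.01, 0.626489, 0.970567),
    "b83a2b46": (20, 0.001, 0.373520, 0.962080),
    "964c39aa": (20, 0.1, 0.426308, 0.961893),
    "f52288d7": (10, 0.0001, 0.750290, 0.960950),
    "97710305": (30, 0.001, 0.357980, 0.940816),
    "b05cd09c": (30, 0.1, 0.295027, 0.940756),
    "034200fb": (30, 0.01, 0.243508, 0.926633),
    "0e6d780d": (10, 0.001, 0.631879, 0.923451),
    "9046b032": (10, 0.1, 0.652956, 0.920763),
    "cd0a7565": (30, 0.0001, 0.349994, 0.903623),
}
runs = pd.DataFrame.from_dict(
    rows, orient="index", columns=["epochs", "learning_rate", "train_loss", "train_accuracy"]
)
runs.index.name = "experiment_id"

top3 = runs["train_accuracy"].nlargest(3)  # three highest accuracies, best first
low3 = runs["train_loss"].nsmallest(3)     # three lowest losses, best first

print(top3)
print(low3)
```

The two shortlists overlap only partly, which matches the result above: the run with the highest training accuracy is not the run with the lowest training loss.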


## Sorting and Ranking

Sort experiments by metrics to see rankings:

In [53]:
# Sort by training accuracy (descending)
sorted_df = df.sort_values(("metric", "train_accuracy"), ascending=False)

print("Experiments sorted by Training Accuracy:")
print("=" * 60)

sorted_df = sorted_df[
    [
        ("param", "epochs"),
        ("param", "learning_rate"),
        ("metric", "train_loss"),
        ("metric", "train_accuracy"),
    ]
]

print(sorted_df)

Experiments sorted by Training Accuracy:
category       param                   metric               
name          epochs learning_rate train_loss train_accuracy
experiment_id                                               
9cfff919          20        0.0100   0.332429       0.991475
a03bec22          20        0.0001   0.354385       0.975811
da43179d          10        0.0100   0.626489       0.970567
b83a2b46          20        0.0010   0.373520       0.962080
964c39aa          20        0.1000   0.426308       0.961893
f52288d7          10        0.0001   0.750290       0.960950
97710305          30        0.0010   0.357980       0.940816
b05cd09c          30        0.1000   0.295027       0.940756
034200fb          30        0.0100   0.243508       0.926633
0e6d780d          10        0.0010   0.631879       0.923451
9046b032          10        0.1000   0.652956       0.920763
cd0a7565          30        0.0001   0.349994       0.903623
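When several experiments tie on the primary metric, a second sort key breaks the tie, and `rank()` turns values into explicit positions. This is a sketch on a toy DataFrame with made-up values. It assumes the same `(category, name)` column layout as `df`.

```python
import pandas as pd

columns = pd.MultiIndex.from_tuples(
    [("metric", "train_accuracy"), ("metric", "train_loss")],
    names=["category", "name"],
)
toy = pd.DataFrame(
    [[0.95, 0.40], [0.97, 0.35], [0.95, 0.30]],
    index=pd.Index(["a", "b", "c"], name="experiment_id"),
    columns=columns,
)

# Highest accuracy first; among equal accuracies, lowest loss first
ranked = toy.sort_values(
    [("metric", "train_accuracy"), ("metric", "train_loss")],
    ascending=[False, True],
)

# Rank 1 is the best accuracy; tied values share the lowest rank of their group
accuracy_rank = toy[("metric", "train_accuracy")].rank(ascending=False, method="min")

print(ranked.index.tolist())
print(accuracy_rank.to_dict())
```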


## Grouping by Hyperparameters

Group experiments by parameter values to analyze effects:

In [62]:
# Group by number of epochs and calculate statistics
grouped = df.groupby(("param", "epochs"))[
    [("metric", "train_loss"), ("metric", "train_accuracy")]
]

print("Performance by Number of Epochs:")
print("=" * 60)
grouped.agg(
    [
        "mean",
        "median",
        "std",
        "min",
        "max",
        ("p95", lambda x: x.quantile(0.95)),
        "count",
    ]
)

Performance by Number of Epochs:


category,metric,metric,metric,metric,metric,metric,metric,metric,metric,metric,metric,metric,metric,metric
name,train_loss,train_loss,train_loss,train_loss,train_loss,train_loss,train_loss,train_accuracy,train_accuracy,train_accuracy,train_accuracy,train_accuracy,train_accuracy,train_accuracy
"(param, epochs)",mean,median,std,min,max,p95,count,mean,median,std,min,max,p95,count
10,0.665404,0.642417,0.057732,0.626489,0.75029,0.73569,4,0.943933,0.9422,0.02553,0.920763,0.970567,0.969124,4
20,0.371661,0.363953,0.040114,0.332429,0.426308,0.41839,4,0.972815,0.968946,0.014044,0.961893,0.991475,0.989125,4
30,0.311627,0.322511,0.053343,0.243508,0.35798,0.356782,4,0.927957,0.933695,0.017541,0.903623,0.940816,0.940807,4


In [63]:
# Group by learning rate and calculate statistics
lr_grouped = df.groupby(("param", "learning_rate"))[[("metric", "train_accuracy")]]

print("\nPerformance by Learning Rate:")
print("=" * 60)
lr_grouped.agg(
    [
        "mean",
        "median",
        "std",
        "min",
        "max",
        ("p95", lambda x: x.quantile(0.95)),
        "count",
    ]
)


Performance by Learning Rate:


category,metric,metric,metric,metric,metric,metric,metric
name,train_accuracy,train_accuracy,train_accuracy,train_accuracy,train_accuracy,train_accuracy,train_accuracy
"(param, learning_rate)",mean,median,std,min,max,p95,count
0.0001,0.946795,0.96095,0.038119,0.903623,0.975811,0.974325,3
0.001,0.942116,0.940816,0.019347,0.923451,0.96208,0.959954,3
0.01,0.962892,0.970567,0.033095,0.926633,0.991475,0.989384,3
0.1,0.941137,0.940756,0.020568,0.920763,0.961893,0.959779,3
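For a single metric, pandas named aggregation gives the result columns readable names directly. The sketch below works on a flat DataFrame (column names `epochs` and `train_loss` are chosen for the example) built from the training losses printed in this notebook, so the numbers can be checked against the epochs table above.

```python
import pandas as pd

# Final training losses per epoch setting, copied from this notebook's outputs
losses_by_epochs = {
    10: [0.652956, 0.626489, 0.631879, 0.750290],
    20: [0.426308, 0.332429, 0.373520, 0.354385],
    30: [0.295027, 0.243508, 0.357980, 0.349994],
}
runs = pd.DataFrame(
    [(epochs, loss) for epochs, losses in losses_by_epochs.items() for loss in losses],
    columns=["epochs", "train_loss"],
)

# Keyword names become the output column names
summary = runs.groupby("epochs")["train_loss"].agg(
    mean="mean",
    p95=lambda s: s.quantile(0.95),
    count="count",
)

print(summary.round(6))
```

With only four runs per group, `p95` is an interpolation between the two largest values rather than a stable percentile estimate, so treat it as a rough indicator.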


## Filtering the DataFrame

Use pandas filtering to focus on specific subsets:

In [67]:
# Filter for experiments with 30 epochs
long_training = df[df[("param", "epochs")] == 30]

print(f"Experiments with 30 epochs: {len(long_training)}")
print("\nTheir performance (in descending order):")
long_training[[("param", "learning_rate"), ("metric", "train_accuracy")]].sort_values(
    ("metric", "train_accuracy"), ascending=False
)

Experiments with 30 epochs: 4

Their performance (in descending order):


category,param,metric
name,learning_rate,train_accuracy
experiment_id,,
97710305,0.001,0.940816
b05cd09c,0.1,0.940756
034200fb,0.01,0.926633
cd0a7565,0.0001,0.903623


In [None]:
# Filter for high-accuracy experiments (>95%)
high_accuracy = df[df[("metric", "train_accuracy")] > 0.95]

print(f"High-accuracy experiments (>95%): {len(high_accuracy)}")
print("\nTheir configurations:")
print(
    high_accuracy[
        [("param", "epochs"), ("param", "learning_rate"), ("metric", "train_accuracy")]
    ]
)
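Conditions can be combined with `&` and `|`. Each condition needs its own parentheses because these operators bind more tightly than comparisons. The sketch below uses a toy DataFrame holding six rows taken from this notebook's outputs. `isin()` and `between()` are standard pandas Series methods.

```python
import pandas as pd

columns = pd.MultiIndex.from_tuples(
    [("param", "epochs"), ("param", "learning_rate"), ("metric", "train_accuracy")],
    names=["category", "name"],
)
toy = pd.DataFrame(
    [
        [20, 0.01, 0.991475],
        [20, 0.0001, 0.975811],
        [10, 0.01, 0.970567],
        [30, 0.001, 0.940816],
        [30, 0.01, 0.926633],
        [10, 0.1, 0.920763],
    ],
    index=pd.Index(
        ["9cfff919", "a03bec22", "da43179d", "97710305", "034200fb", "9046b032"],
        name="experiment_id",
    ),
    columns=columns,
)

epochs = toy[("param", "epochs")]
lr = toy[("param", "learning_rate")]
acc = toy[("metric", "train_accuracy")]

long_and_accurate = toy[(epochs >= 20) & (acc > 0.95)]           # both conditions must hold
selected_lrs = toy[lr.isin([0.01, 0.001])]                       # membership test
lr_001_high = toy[lr.isin([0.01]) & acc.between(0.95, 1.0)]      # inclusive range check

print(long_and_accurate.index.tolist())
print(selected_lrs.index.tolist())
print(lr_001_high.index.tolist())
```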

## Finding Optimal Hyperparameters

Identify which parameter values consistently produce good results:

In [69]:
# For each learning rate, show average performance
lr_performance = (
    df.groupby(("param", "learning_rate"))
    .agg(
        {
            ("metric", "train_accuracy"): ["mean", "std", "max"],
            ("metric", "train_loss"): ["mean", "min"],
        }
    )
    .round(4)
)

print("Learning Rate Analysis:")
print("=" * 80)
display(lr_performance)

# Find best learning rate
best_lr = lr_performance[("metric", "train_accuracy")]["mean"].idxmax()
print(f"\n✨ Best learning rate: {best_lr:.6f}")

Learning Rate Analysis:


category,metric,metric,metric,metric,metric
name,train_accuracy,train_accuracy,train_accuracy,train_loss,train_loss
"(param, learning_rate)",mean,std,max,mean,min
0.0001,0.9468,0.0381,0.9758,0.4849,0.35
0.001,0.9421,0.0193,0.9621,0.4545,0.358
0.01,0.9629,0.0331,0.9915,0.4008,0.2435
0.1,0.9411,0.0206,0.9619,0.4581,0.295



✨ Best learning rate: 0.010000


In [70]:
# Mean and spread of final training accuracy for each epoch count
epochs_effect = (
    df.groupby(("param", "epochs"))
    .agg(
        {
            ("metric", "train_accuracy"): ["mean", "std"],
        }
    )
    .round(4)
)

print("Effect of Training Duration:")
print("=" * 60)
print(epochs_effect)

best_epochs = epochs_effect[("metric", "train_accuracy")]["mean"].idxmax()
print(f"\n✨ Best epoch count: {best_epochs}")

Effect of Training Duration:
category                metric        
name            train_accuracy        
                          mean     std
(param, epochs)                       
10                      0.9439  0.0255
20                      0.9728  0.0140
30                      0.9280  0.0175

✨ Best epoch count: 20
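Group means look at one parameter at a time. Pivoting the results into an epochs × learning rate grid shows both parameters together. The sketch below uses a flat DataFrame built from the accuracies printed in this notebook. The column names are chosen for the example.

```python
import pandas as pd

# (epochs, learning_rate, train_accuracy), copied from this notebook's outputs
records = [
    (20, 0.01, 0.991475), (20, 0.0001, 0.975811), (10, 0.01, 0.970567),
    (20, 0.001, 0.962080), (20, 0.1, 0.961893), (10, 0.0001, 0.960950),
    (30, 0.001, 0.940816), (30, 0.1, 0.940756), (30, 0.01, 0.926633),
    (10, 0.001, 0.923451), (10, 0.1, 0.920763), (30, 0.0001, 0.903623),
]
runs = pd.DataFrame(records, columns=["epochs", "learning_rate", "train_accuracy"])

# Rows: epochs, columns: learning rate, cells: final training accuracy
grid = runs.pivot(index="epochs", columns="learning_rate", values="train_accuracy")

print(grid)
print("Best epochs by row mean:", grid.mean(axis=1).idxmax())
print("Best learning rate by column mean:", grid.mean(axis=0).idxmax())
```

Each cell holds exactly one run, so small differences between cells may be run-to-run noise. Repeating the sweep with several seeds would be needed before drawing firm conclusions.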


## Creating Custom Views

Extract specific columns for cleaner analysis:

In [72]:
# Create a simplified view with just key columns
simple_df = df[
    [
        ("param", "epochs"),
        ("param", "learning_rate"),
        ("metric", "train_loss"),
        ("metric", "train_accuracy"),
        ("meta", "duration"),
    ]
]

# Flatten column names for easier access
simple_df.columns = [
    "epochs",
    "learning_rate",
    "train_loss",
    "train_accuracy",
    "duration",
]

print("Simplified view:")
simple_df.sort_values("train_accuracy", ascending=False)

Simplified view:


experiment_id,epochs,learning_rate,train_loss,train_accuracy,duration
9cfff919,20,0.01,0.332429,0.991475,0 days 00:00:06
a03bec22,20,0.0001,0.354385,0.975811,0 days 00:00:06
da43179d,10,0.01,0.626489,0.970567,0 days 00:00:03
b83a2b46,20,0.001,0.37352,0.96208,0 days 00:00:06
964c39aa,20,0.1,0.426308,0.961893,0 days 00:00:06
f52288d7,10,0.0001,0.75029,0.96095,0 days 00:00:03
97710305,30,0.001,0.35798,0.940816,0 days 00:00:09
b05cd09c,30,0.1,0.295027,0.940756,0 days 00:00:09
034200fb,30,0.01,0.243508,0.926633,0 days 00:00:09
0e6d780d,10,0.001,0.631879,0.923451,0 days 00:00:03
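An alternative to typing the new names by hand is to drop the `category` level. This only works when the remaining names are unique. In this notebook's DataFrame, `learning_rate` appears under both `param` and `metric`, so dropping the level on the full frame would produce duplicate column names. The sketch below uses a toy DataFrame with made-up values to show both cases.

```python
import pandas as pd

columns = pd.MultiIndex.from_tuples(
    [
        ("param", "epochs"),
        ("param", "learning_rate"),
        ("metric", "learning_rate"),
        ("metric", "train_loss"),
    ],
    names=["category", "name"],
)
toy = pd.DataFrame(
    [[10, 0.1, 0.1, 0.65], [20, 0.01, 0.01, 0.33]],
    index=pd.Index(["exp-a", "exp-b"], name="experiment_id"),
    columns=columns,
)

# Select columns whose names stay unique, then drop the top level
view = toy[[("param", "epochs"), ("param", "learning_rate"), ("metric", "train_loss")]].copy()
view.columns = view.columns.droplevel("category")

# On the full frame the same step would create a duplicate "learning_rate"
has_collision = bool(toy.columns.droplevel("category").duplicated().any())

print(view.columns.tolist())
print(has_collision)
```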


## Statistical Summary

Get quick statistics across all experiments:

In [74]:
# Summary statistics for metrics
metric_cols = [("metric", "train_loss"), ("metric", "train_accuracy")]
print("Training Metrics Summary:")
print("=" * 60)

df[metric_cols].describe()

Training Metrics Summary:


category,metric,metric
name,train_loss,train_accuracy
count,12.0,12.0
mean,0.449564,0.948235
std,0.167899,0.026295
min,0.243508,0.903623
25%,0.345603,0.925838
50%,0.36575,0.950883
75%,0.627836,0.964202
max,0.75029,0.991475


## Key Takeaways

✅ **`compare()` Method:**
- Creates pandas DataFrame from experiments
- Multi-level columns: `(category, name)`
- Categories: `meta`, `param`, `metric`
- Indexed by experiment ID

✅ **DataFrame Operations:**
- Access columns: `df[("category", "name")]`
- Find best/worst: `idxmax()`, `idxmin()`
- Sort: `sort_values()`
- Filter: Boolean indexing `df[df[col] > threshold]`

✅ **Grouping and Aggregation:**
- Group by parameters: `groupby(("param", "name"))`
- Aggregate statistics: `agg(['mean', 'std', 'min', 'max'])`
- Find optimal values: Compare means across groups

✅ **Use Cases:**
- Finding optimal hyperparameters
- Analyzing parameter effects
- Ranking experiments
- Statistical analysis across experiments

## Next Steps

Continue to **Notebook 03: Analysis and Visualization** to learn:
- Creating visualizations with matplotlib
- Advanced analysis patterns
- Training curve visualization
- Cleaning up and deleting experiments