# Benchmark Results Plotting

Plot benchmark results stored on Google Drive as grouped bar charts.

**Features:**
- Models on the x-axis, with bars grouped by benchmark (AMC23/AIME25) and eval type (Greedy/Avg@32)
- Solid bars = Greedy, Hatched bars = Avg@32
- Green = AMC23, Blue = AIME25
- Auto-discovers models from result files
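
The visual encoding listed above can be sketched with plain matplotlib. This is only an illustration of the color and hatch scheme, not the code inside `plot_benchmarks`; the function name `sketch_grouped_bars`, the model names, and all score values are made-up placeholders.

```python
import matplotlib.pyplot as plt
import numpy as np

# Placeholder data for illustration only; these are NOT real benchmark results.
demo_models = ["model-a", "model-b"]
demo_scores = {
    ("AMC23", "Greedy"): [0.50, 0.60],
    ("AMC23", "Avg@32"): [0.45, 0.55],
    ("AIME25", "Greedy"): [0.20, 0.30],
    ("AIME25", "Avg@32"): [0.15, 0.25],
}

# Color encodes the benchmark, hatch encodes the eval type.
COLORS = {"AMC23": "tab:green", "AIME25": "tab:blue"}
HATCHES = {"Greedy": "", "Avg@32": "//"}


def sketch_grouped_bars(models, scores):
    """Draw one bar per (benchmark, eval type) series for each model."""
    fig, ax = plt.subplots(figsize=(8, 4))
    x = np.arange(len(models))
    width = 0.8 / len(scores)  # leave a gap between model groups
    for i, ((bench, eval_type), values) in enumerate(scores.items()):
        # Center the group of bars on each model's tick position.
        offset = (i - (len(scores) - 1) / 2) * width
        ax.bar(
            x + offset,
            values,
            width,
            color=COLORS[bench],
            hatch=HATCHES[eval_type],
            edgecolor="black",
            label=f"{bench} {eval_type}",
        )
    ax.set_xticks(x)
    ax.set_xticklabels(models)
    ax.set_ylabel("Score")
    ax.legend()
    return fig


sketch_fig = sketch_grouped_bars(demo_models, demo_scores)
```

Keeping color and hatch as two independent lookups means a new benchmark or eval type only needs one new dictionary entry.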

## Setup

In [None]:
# Mount Google Drive
from google.colab import drive
drive.mount('/content/drive')

In [None]:
# Import plotting functions
import sys
sys.path.insert(0, '/content/drive/MyDrive/SRL-reasoning/plots')

from plot_benchmarks import (
    plot_benchmark_results,
    list_available_results,
    DRIVE_RESULTS_FOLDER,
)

print(f"Results folder: {DRIVE_RESULTS_FOLDER}")
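
If the import above fails, the usual cause is that Drive is not mounted or the plots folder is somewhere else. A small check along these lines can confirm which paths exist before importing. `missing_paths` is a hypothetical helper written for this sketch, not part of `plot_benchmarks`.

```python
import os


def missing_paths(paths):
    """Return the subset of the given paths that do not exist on disk."""
    return [p for p in paths if not os.path.exists(p)]


# The plots folder used in the import cell; adjust if your Drive layout differs.
required = ["/content/drive/MyDrive/SRL-reasoning/plots"]

for path in missing_paths(required):
    print(f"Missing: {path}")
```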

## View Available Results

In [None]:
# See what results are available
list_available_results()

## Plot All Results

Generate a bar chart with all models, benchmarks, and eval types.

In [None]:
# Plot all results - saves PDF locally by default
fig = plot_benchmark_results(
    benchmarks='all',
    eval_types='all',
    models='all',
    save_pdf=True,
    output_filename='all_benchmarks.pdf',
)

## Custom Plots

Examples of filtering by benchmark, eval type, or model.

In [None]:
# Plot only AMC23 results
fig = plot_benchmark_results(
    benchmarks=['amc23'],
    eval_types='all',
    models='all',
    save_pdf=True,
    output_filename='amc23_results.pdf',
    title='AMC23 Performance',
)

In [None]:
# Plot only AIME25 results
fig = plot_benchmark_results(
    benchmarks=['aime25'],
    eval_types='all',
    models='all',
    save_pdf=True,
    output_filename='aime25_results.pdf',
    title='AIME25 Performance',
)

In [None]:
# Plot only Greedy results (both benchmarks)
fig = plot_benchmark_results(
    benchmarks='all',
    eval_types=['greedy'],
    models='all',
    save_pdf=True,
    output_filename='greedy_results.pdf',
    title='Greedy Decoding Performance',
)

In [None]:
# Plot only Avg@32 results (both benchmarks)
fig = plot_benchmark_results(
    benchmarks='all',
    eval_types=['avg32'],
    models='all',
    save_pdf=True,
    output_filename='avg32_results.pdf',
    title='Avg@32 Performance',
)

## Save to Google Drive

Set `save_to_drive=True` to save the PDF directly to Drive. Without `drive_save_path`, the file goes to the same folder as the results.

In [None]:
# Save plot to Drive (same folder as results)
fig = plot_benchmark_results(
    benchmarks='all',
    eval_types='all',
    models='all',
    save_pdf=True,
    save_to_drive=True,
    output_filename='benchmark_results.pdf',
)

In [None]:
# Save to custom Drive path
fig = plot_benchmark_results(
    benchmarks='all',
    eval_types='all',
    models='all',
    save_pdf=True,
    save_to_drive=True,
    drive_save_path='/content/drive/MyDrive/plots',
    output_filename='benchmark_results.pdf',
)

## Display Only (No Save)

Set `save_pdf=False` to display the plot without saving it.

In [None]:
# Just display, don't save
fig = plot_benchmark_results(
    benchmarks='all',
    eval_types='all',
    models='all',
    save_pdf=False,
)
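
A plot that was only displayed can still be written out afterwards. The sketch below assumes the object returned by `plot_benchmark_results` is a standard matplotlib `Figure`, which this notebook does not confirm. `save_figure` is a hypothetical helper, and the demo figure stands in for the real plot.

```python
import os
import tempfile

import matplotlib.pyplot as plt


def save_figure(fig, directory, filename):
    """Save a matplotlib figure to directory/filename and return the full path."""
    os.makedirs(directory, exist_ok=True)  # create the target folder if needed
    path = os.path.join(directory, filename)
    fig.savefig(path, bbox_inches="tight")  # format is inferred from the extension
    return path


# Stand-in figure; in the notebook this would be the returned `fig`
# (assuming it is a matplotlib Figure).
demo_fig, demo_ax = plt.subplots()
demo_ax.bar(["a", "b"], [1, 2])

saved_path = save_figure(demo_fig, tempfile.mkdtemp(), "demo.pdf")
print(f"Saved to: {saved_path}")
```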

## Filter by Specific Models

In [None]:
# Plot specific models only
# Use model short names (without repo prefix) or full paths
fig = plot_benchmark_results(
    benchmarks='all',
    eval_types='all',
    models=['Qwen3-4B-Instruct-2507'],  # Add more models to this list
    save_pdf=False,
)
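
One way to accept both short names and full paths is to compare only the last path component. The sketch below shows that idea under stated assumptions: `short_name` and `filter_models` are hypothetical helpers, the model list is illustrative, and `plot_benchmarks` may match names differently.

```python
def short_name(model_id):
    """Strip any repo prefix, e.g. 'org/model' becomes 'model'."""
    return model_id.rstrip("/").split("/")[-1]


def filter_models(available, requested):
    """Keep the available models whose short name matches a requested name."""
    wanted = {short_name(m) for m in requested}
    return [m for m in available if short_name(m) in wanted]


# Illustrative model identifiers, not a list read from the results folder.
available_models = ["Qwen/Qwen3-4B-Instruct-2507", "example-org/other-model"]

print(filter_models(available_models, ["Qwen3-4B-Instruct-2507"]))
print(filter_models(available_models, ["Qwen/Qwen3-4B-Instruct-2507"]))
```

Both calls select the same model, since the repo prefix is dropped before comparison.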