# MATH Dataset: Pre-generate Solutions

**Step 1 of 2**: Query LLMs to generate solutions, weak scores, and ground truth labels.

This notebook writes a timestamped JSON file to `RESULTS_DIR`, which `math_experiment.ipynb` then consumes.
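As a rough sketch of the hand-off to the second notebook, the newest output file could be located and loaded like this. The helper name `load_latest_pregenerated` is hypothetical (it is not part of `src`), the `pregenerated_YYYYMMDD_HHMMSS.json` naming is inferred from the file name printed at the end of this notebook, and nothing is assumed about the JSON schema.

```python
import json
from pathlib import Path


def load_latest_pregenerated(results_dir="../results"):
    """Return (path, parsed JSON) for the newest pregenerated_*.json file.

    Hypothetical helper: assumes file names embed a YYYYMMDD_HHMMSS
    timestamp, so lexicographic order matches chronological order.
    """
    candidates = sorted(Path(results_dir).glob("pregenerated_*.json"))
    if not candidates:
        raise FileNotFoundError(f"No pregenerated_*.json files in {results_dir}")
    latest = candidates[-1]
    with latest.open(encoding="utf-8") as f:
        return latest, json.load(f)
```

Sorting by name rather than modification time keeps the choice stable if files are copied between machines.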

In [1]:
# Install dependencies (Colab)
!pip install openai datasets numpy



In [2]:
import sys
sys.path.insert(0, '..')  # Add repo root to path

import numpy as np
from src.config import make_experiment_config
from src.llm import LLMWrapper
from src.data.math_dataset import sample_and_save_datasets
from src.data.pregeneration import pregenerate_all_data, append_pregenerated_data, print_pregeneration_stats

## Configuration

In [3]:
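# API keys: fill in here, or leave empty and set the matching environment variable.
# Avoid committing real keys to version control.
OPENAI_API_KEY = ""  # or set OPENAI_API_KEY env var
DEEPSEEK_API_KEY = ""  # or set DEEPSEEK_API_KEY env var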

# Experiment settings
RANDOM_SEED = 42
EXAMPLES_PER_DIFFICULTY = 200
DIFFICULTY_LEVELS = [2, 3, 5]
NUM_ATTEMPTS = 5

RESULTS_DIR = "../results"

# Create config
config = make_experiment_config(
    openai_api_key=OPENAI_API_KEY,
    deepseek_api_key=DEEPSEEK_API_KEY,
    use_deepseek_weak_verifier=True,
    use_math_shepherd=False,
    generator_model="gpt-4o-mini",
    strong_verifier_model="gpt-4o-mini",
)

llm = LLMWrapper(config)
print(f"Weak verifier: {type(llm.weak_verifier).__name__}")

Weak verifier: DeepSeekWeakVerifier
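The configuration cell leaves both keys empty and points to environment variables instead. Whether `make_experiment_config` performs that fallback itself is not shown here, so the following is only a minimal sketch of doing it explicitly before building the config. `resolve_api_key` is a hypothetical helper, not part of `src.config`.

```python
import os


def resolve_api_key(explicit_value, env_var):
    """Prefer a non-empty explicit value; otherwise read the environment.

    Hypothetical helper for illustration only.
    """
    key = explicit_value or os.environ.get(env_var, "")
    if not key:
        # Fail early rather than at the first API call.
        raise ValueError(f"No API key: set {env_var} or pass it explicitly")
    return key
```

With this in place, `OPENAI_API_KEY = resolve_api_key(OPENAI_API_KEY, "OPENAI_API_KEY")` would surface a missing key before any requests are sent.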


## Load Dataset

In [4]:
np.random.seed(RANDOM_SEED)

datasets = sample_and_save_datasets(
    seed=RANDOM_SEED,
    examples_per_difficulty=EXAMPLES_PER_DIFFICULTY,
    difficulty_levels=DIFFICULTY_LEVELS,
)

for diff, probs in datasets.items():
    print(f"  Difficulty {diff}: {len(probs)} problems")

Loading MATH dataset (levels [2])...
✓ Loaded 200 problems (levels [2])
Loading MATH dataset (levels [3])...
✓ Loaded 200 problems (levels [3])
Loading MATH dataset (levels [5])...
✓ Loaded 200 problems (levels [5])
  Difficulty 2: 200 problems
  Difficulty 3: 200 problems
  Difficulty 5: 200 problems
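The cell above draws a fixed number of problems per difficulty level under a fixed seed. The internals of `sample_and_save_datasets` are not shown, so the sketch below only illustrates the general idea of seeded per-level sampling. The function name `sample_by_difficulty` and the integer `"level"` field are assumptions made for illustration.

```python
import random


def sample_by_difficulty(problems, levels, per_level, seed):
    """Draw up to per_level problems for each difficulty level, reproducibly.

    Illustrative only: assumes each problem is a dict with an integer
    "level" key, which may differ from the real dataset's format.
    """
    rng = random.Random(seed)  # local RNG, leaves global random state untouched
    sampled = {}
    for level in levels:
        pool = [p for p in problems if p["level"] == level]
        sampled[level] = rng.sample(pool, min(per_level, len(pool)))
    return sampled
```

Using a local `random.Random(seed)` instead of a global seed means the same call returns the same sample regardless of what other code has consumed from the global generator.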


## Pre-generate Solutions

In [7]:
pregenerated_data = pregenerate_all_data(
    datasets=datasets,
    llm=llm,
    num_problems=1,  # smoke test with 1 problem per difficulty; use EXAMPLES_PER_DIFFICULTY for the full run
    num_attempts=NUM_ATTEMPTS,
    difficulties=DIFFICULTY_LEVELS,
    save_dir=RESULTS_DIR,
    checkpoint_freq=10,
    random_seed=RANDOM_SEED,
)


DIFFICULTY 2
  Problem 1/1 | Correct: 5/5 | Avg weak: 0.996 | Rate: 0.1 attempts/sec
    → Checkpoint saved

DIFFICULTY 3
  Problem 1/1 | Correct: 5/5 | Avg weak: 0.994 | Rate: 0.2 attempts/sec
    → Checkpoint saved

DIFFICULTY 5
  Problem 1/1 | Correct: 0/5 | Avg weak: 0.248 | Rate: 0.1 attempts/sec
    → Checkpoint saved

PRE-GENERATION COMPLETE
Total problems: 3
Total attempts: 15
Time: 122.8s (0.1 attempts/sec)

--- Summary Statistics ---

Difficulty 2:
  Problems: 1
  Total attempts: 5
  Overall accuracy: 100.0%
  Problems solvable (≥1 correct): 1/1 (100.0%)
  Weak score - mean: 0.996, std: 0.005

Difficulty 3:
  Problems: 1
  Total attempts: 5
  Overall accuracy: 100.0%
  Problems solvable (≥1 correct): 1/1 (100.0%)
  Weak score - mean: 0.994, std: 0.008

Difficulty 5:
  Problems: 1
  Total attempts: 5
  Overall accuracy: 0.0%
  Problems solvable (≥1 correct): 0/1 (0.0%)
  Weak score - mean: 0.248, std: 0.015

✓ Final data saved to: ../results/pregenerated_20260206_151818.json
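The summary above reports, per difficulty, the overall accuracy, the number of problems with at least one correct attempt, and the mean and standard deviation of the weak scores. A minimal sketch of how such numbers could be computed from per-attempt records follows. The `summarize` function and the `(is_correct, weak_score)` tuple layout are assumptions, as is the use of the population standard deviation; `print_pregeneration_stats` may do this differently.

```python
import statistics


def summarize(attempts_by_problem):
    """Summarize attempts for one difficulty level.

    attempts_by_problem: list of problems, each a list of
    (is_correct, weak_score) tuples. This layout is hypothetical.
    """
    flat = [a for attempts in attempts_by_problem for a in attempts]
    if not flat:
        raise ValueError("No attempts to summarize")
    scores = [score for _, score in flat]
    return {
        "problems": len(attempts_by_problem),
        "total_attempts": len(flat),
        # Fraction of all attempts judged correct.
        "accuracy": sum(ok for ok, _ in flat) / len(flat),
        # Number of problems with at least one correct attempt.
        "solvable": sum(any(ok for ok, _ in attempts) for attempts in attempts_by_problem),
        "weak_mean": statistics.fmean(scores),
        # Population standard deviation (an assumption, see lead-in).
        "weak_std": statistics.pstdev(scores),
    }
```

Accuracy is computed over attempts while solvability is computed over problems, so a level can have low accuracy and still be mostly solvable if each problem has at least one correct attempt.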


## (Optional) Append More Data

In [None]:
# Uncomment to append more problems to an existing file:
#
# pregenerated_data = append_pregenerated_data(
#     existing_path="../results/pregenerated_XXXXXXXX_XXXXXX.json",
#     datasets=datasets,
#     llm=llm,
#     num_additional_problems=50,
#     num_attempts=NUM_ATTEMPTS,
#     difficulties=DIFFICULTY_LEVELS,
# )