A framework for generating verified synthetic training data using best-of-N candidate selection with multiple LLM providers.
Given a dataset of questions, BestOfN generates N candidate responses per question using Claude or OpenAI models, then applies domain-specific verifiers to select the best response. The result is high-quality training data with verified reasoning chains.
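The selection step described above can be sketched in a few lines. This is a minimal illustration, not the project's pipeline: `best_of_n`, `Candidate`, the `threshold` parameter, and the toy generator and scorer are all hypothetical names, and the convention that a verifier returns a score in [0, 1] is an assumption.

```python
import random
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Candidate:
    text: str
    score: float  # assumed convention: verifier score in [0, 1]


def best_of_n(
    question: str,
    generate: Callable[[str], str],
    score: Callable[[str, str], float],
    n: int = 4,
    threshold: float = 0.5,
) -> Optional[Candidate]:
    """Sample n responses, score each one, and return the top scorer.

    Returns None when no candidate reaches `threshold`, so questions
    without a verified answer can be dropped from the training set.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    candidates = []
    for _ in range(n):
        text = generate(question)
        candidates.append(Candidate(text, score(question, text)))
    best = max(candidates, key=lambda c: c.score)
    return best if best.score >= threshold else None


# Toy stand-ins for an LLM call and a verifier (both hypothetical).
def toy_generate(question: str) -> str:
    return random.choice(["4", "5", "22"])


def toy_score(question: str, answer: str) -> float:
    return 1.0 if answer == "4" else 0.0


if __name__ == "__main__":
    random.seed(0)
    print(best_of_n("What is 2 + 2?", toy_generate, toy_score, n=8))
```

Returning `None` instead of the least-bad candidate keeps unverified responses out of the output, which is the point of pairing generation with a verifier.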
bestofn/
├── common/ # Shared utilities (schemas, API retry, LLM judge, etc.)
├── verifiers/ # Pluggable verification system (math, code, tool, spatial, ...)
├── claude_gen/ # Claude generation pipeline (extended thinking, tool use)
├── openai_gen/ # OpenAI generation pipeline (responses API, structured output)
├── scripts/ # Dataset generators and data processing pipelines
└── tests/ # Test suite (400+ tests)
The verification system uses a registry pattern to dispatch verifiers based on dataset split names. Each verifier inherits from Verifier and implements domain-specific validation.
| Verifier | Domain | Method |
|---|---|---|
| MathVerifier | Math/STEM | Symbolic equivalence via SymPy |
| CodeVerifier | Python/JS | Docker-sandboxed execution |
| ToolVerifier | CLI/HTTP tool use | Schema + execution validation |
| OpenAPIToolVerifier | API tool calls | OpenAPI spec validation |
| SpatialVerifier | Hamiltonian paths | Deterministic path verification (2D/3D) |
| PolyominoVerifier | Tiling puzzles | Multi-strategy placement parsing |
| InstructionFollowingVerifier | General | LLM-backed instruction checking |
| StructuredOutputVerifier | JSON/schema | JSON schema validation |
| RefusalClassifier | Safety | Hybrid pattern + LLM refusal detection |
| PersonaVerifier | Personality | Style/character consistency checking |
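As a rough illustration of the registry pattern mentioned before the table, the sketch below registers a verifier class under a name and maps dataset splits to it. The function names mirror the package's public helpers, but the internals are assumptions: the `register` decorator, the `ExactMatchVerifier` class, the `toy_split` split name, and the boolean return type of `verify` are all hypothetical.

```python
from abc import ABC, abstractmethod
from typing import Dict, Type


class Verifier(ABC):
    """Base class; each subclass implements one domain-specific check."""

    @abstractmethod
    def verify(self, question: str, candidate: str, reference: str) -> bool:
        ...


_REGISTRY: Dict[str, Type[Verifier]] = {}   # verifier name -> class
_SPLIT_TO_NAME: Dict[str, str] = {}         # dataset split -> verifier name


def register(name: str, *splits: str):
    """Class decorator that records a verifier and the splits it handles."""
    def decorator(cls: Type[Verifier]) -> Type[Verifier]:
        _REGISTRY[name] = cls
        for split in splits:
            _SPLIT_TO_NAME[split] = name
        return cls
    return decorator


@register("exact", "toy_split")  # hypothetical verifier and split name
class ExactMatchVerifier(Verifier):
    def verify(self, question: str, candidate: str, reference: str) -> bool:
        return candidate.strip() == reference.strip()


def get_verifier(name: str) -> Verifier:
    if name not in _REGISTRY:
        raise KeyError(f"no verifier registered under {name!r}")
    return _REGISTRY[name]()


def get_verifier_for_split(split: str) -> Verifier:
    if split not in _SPLIT_TO_NAME:
        raise KeyError(f"no verifier registered for split {split!r}")
    return get_verifier(_SPLIT_TO_NAME[split])
```

With this layout, adding a new domain only requires defining a decorated subclass; the dispatch functions do not change.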
from verifiers import get_verifier, get_verifier_for_split
# By name
verifier = get_verifier('math')
result = verifier.verify(question, candidate_answer, reference_answer)
# By dataset split (auto-dispatches)
verifier = get_verifier_for_split('gsm8k')

The scripts/ directory contains generators for synthetic training datasets:
- Spatial Reasoning (generate_spatial_reasoning_dataset.py): Hamiltonian path puzzles on 2D grids (3x3-8x8) and 3D cubes (3x3x3-4x4x4), with obstacles and impossible variants
- Polyomino Tiling (generate_polyomino_tiling_dataset.py): Tetromino/pentomino placement puzzles with 23 piece types, 6 difficulty levels, and impossibility proofs
- Terminal Gym (generate_terminal_gym_dataset.py, generate_terminal_gym_llm.py): Bash/CLI tool-use trajectories with a sandboxed filesystem, 16 task subcategories, and optional plan-then-implement mode
Each generator produces JSONL output that can be converted to Harmony training format using the corresponding convert_*_to_harmony.py script.
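To make the spatial reasoning puzzles above concrete, here is a minimal sketch of a deterministic check for a 2D Hamiltonian path with obstacles. It is not the project's SpatialVerifier: the function name, the `(row, col)` cell encoding, and the assumption that moves are orthogonal (no diagonals) are all illustrative choices.

```python
from typing import Iterable, List, Tuple

Cell = Tuple[int, int]  # (row, col); hypothetical encoding


def is_hamiltonian_path(
    path: List[Cell],
    rows: int,
    cols: int,
    obstacles: Iterable[Cell] = (),
) -> bool:
    """Check that `path` visits every free cell exactly once.

    Assumes 4-connectivity: consecutive cells must differ by one step
    horizontally or vertically.
    """
    blocked = set(obstacles)
    free = {(r, c) for r in range(rows) for c in range(cols)} - blocked

    # Same length plus same set of cells means no repeats and no gaps.
    if len(path) != len(free) or set(path) != free:
        return False

    # Every step must move to an orthogonally adjacent cell.
    return all(
        abs(r1 - r2) + abs(c1 - c2) == 1
        for (r1, c1), (r2, c2) in zip(path, path[1:])
    )


if __name__ == "__main__":
    snake = [(0, 0), (0, 1), (1, 1), (1, 0)]
    print(is_hamiltonian_path(snake, rows=2, cols=2))
```

Because the check is purely combinatorial, it needs no reference answer: any path that passes is a valid solution, which suits puzzles with many correct answers.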
Training data uses a structured multi-channel format with special tokens:
<|start|>system<|message|>You are a helpful assistant.<|end|>
<|start|>user<|message|>Solve this puzzle...<|end|>
<|start|>assistant<|channel|>analysis<|message|>Let me think step by step...<|end|>
<|start|>assistant<|channel|>final<|message|>The answer is...<|end|>
Channels (analysis, final, planning) enable structured reasoning traces with selective loss masking during training.
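The format above can be rendered from structured messages with a small helper. The sketch below follows the token layout shown in the example; the `Message` class, the function names, and the segment-level mask are assumptions for illustration. A real training setup would typically apply the mask per token after tokenization.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class Message:
    role: str
    content: str
    channel: Optional[str] = None  # only assistant messages carry a channel


def render_message(msg: Message) -> str:
    """Render one message using the special tokens shown above."""
    header = f"<|start|>{msg.role}"
    if msg.channel is not None:
        header += f"<|channel|>{msg.channel}"
    return f"{header}<|message|>{msg.content}<|end|>"


def render_with_mask(
    messages: Sequence[Message],
    trained_channels: Sequence[str] = ("analysis", "final"),
) -> List[Tuple[str, bool]]:
    """Return (rendered_text, contributes_to_loss) per message.

    Assumed policy: only assistant messages in `trained_channels`
    contribute to the loss; system and user turns are masked out.
    """
    return [
        (render_message(m), m.role == "assistant" and m.channel in trained_channels)
        for m in messages
    ]


if __name__ == "__main__":
    conversation = [
        Message("system", "You are a helpful assistant."),
        Message("user", "Solve this puzzle..."),
        Message("assistant", "Let me think step by step...", channel="analysis"),
        Message("assistant", "The answer is...", channel="final"),
    ]
    for text, trained in render_with_mask(conversation):
        print(trained, text)
```

Passing `trained_channels=("final",)` would train only on final answers while keeping the analysis text as context.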
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt

export ANTHROPIC_API_KEY=your-key # For Claude generation
export OPENAI_API_KEY=your-key # For OpenAI generation

# Claude pipeline
python -m claude_gen.generate --config experiments/baseline/baseline.yaml
# OpenAI pipeline
python -m openai_gen.generate --config experiments/baseline/baseline.yaml

# Spatial reasoning (2D)
python scripts/generate_spatial_reasoning_dataset.py --grid-sizes 3 4 5 --num-per-size 50
# Spatial reasoning (3D)
python scripts/generate_spatial_reasoning_dataset.py --dimensions 3 --grid-sizes 3 4 --num-per-size 50
# Polyomino tiling
python scripts/generate_polyomino_tiling_dataset.py --num-puzzles 100 --difficulty-levels 1 2 3
# Terminal gym (templated)
python scripts/generate_terminal_gym_dataset.py --num-tasks 100

python -m pytest tests/ -x --tb=short

MIT License
Copyright (c) 2026
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.