BestOfN

A framework for generating verified synthetic training data using best-of-N candidate selection with multiple LLM providers.

Given a dataset of questions, BestOfN generates N candidate responses per question using Claude or OpenAI models, then applies domain-specific verifiers to select the best response. The result is high-quality training data with verified reasoning chains.
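The selection step described above can be sketched in a few lines. This is a minimal illustration, not the repository's pipeline: `best_of_n`, `make_cycling_generator`, `exact_match_score`, and the `threshold` parameter are hypothetical names, and the generator is a stand-in for a real Claude or OpenAI call.

```python
import itertools
from typing import Callable, List, Optional, Tuple


def best_of_n(
    question: str,
    generate: Callable[[str], str],
    score: Callable[[str, str], float],
    n: int = 4,
    threshold: float = 0.5,
) -> Optional[Tuple[str, float]]:
    """Sample n candidates; return the top-scoring one, or None if it scores below threshold."""
    candidates: List[str] = [generate(question) for _ in range(n)]
    scored = [(c, score(question, c)) for c in candidates]
    best = max(scored, key=lambda pair: pair[1])  # first maximum wins on ties
    return best if best[1] >= threshold else None


def make_cycling_generator(answers: List[str]) -> Callable[[str], str]:
    """Hypothetical stand-in for an LLM call: cycles through canned answers."""
    pool = itertools.cycle(answers)
    return lambda question: next(pool)


def exact_match_score(question: str, candidate: str) -> float:
    """Hypothetical stand-in for a domain verifier: 1.0 only for the answer "4"."""
    return 1.0 if candidate == "4" else 0.0


result = best_of_n(
    "What is 2 + 2?",
    make_cycling_generator(["3", "5", "4"]),
    exact_match_score,
    n=4,
)
print(result)  # ('4', 1.0)
```

Returning `None` when no candidate clears the threshold lets a caller drop the question instead of keeping an unverified response; whether the real pipeline does this is an assumption.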

Repository Structure

bestofn/
├── common/              # Shared utilities (schemas, API retry, LLM judge, etc.)
├── verifiers/           # Pluggable verification system (math, code, tool, spatial, ...)
├── claude_gen/          # Claude generation pipeline (extended thinking, tool use)
├── openai_gen/          # OpenAI generation pipeline (responses API, structured output)
├── scripts/             # Dataset generators and data processing pipelines
└── tests/               # Test suite (400+ tests)

Verifiers

The verification system uses a registry pattern to dispatch verifiers based on dataset split names. Each verifier inherits from Verifier and implements domain-specific validation.

| Verifier | Domain | Method |
|---|---|---|
| `MathVerifier` | Math/STEM | Symbolic equivalence via SymPy |
| `CodeVerifier` | Python/JS | Docker-sandboxed execution |
| `ToolVerifier` | CLI/HTTP tool use | Schema + execution validation |
| `OpenAPIToolVerifier` | API tool calls | OpenAPI spec validation |
| `SpatialVerifier` | Hamiltonian paths | Deterministic path verification (2D/3D) |
| `PolyominoVerifier` | Tiling puzzles | Multi-strategy placement parsing |
| `InstructionFollowingVerifier` | General | LLM-backed instruction checking |
| `StructuredOutputVerifier` | JSON/schema | JSON schema validation |
| `RefusalClassifier` | Safety | Hybrid pattern + LLM refusal detection |
| `PersonaVerifier` | Personality | Style/character consistency checking |
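As an illustration of what deterministic path verification can look like for the 2D case, the sketch below checks that a proposed path visits every free cell exactly once and only moves between edge-adjacent cells. It is not the code of `SpatialVerifier`: the function name, the `(row, col)` cell encoding, and the 4-neighbour adjacency rule are assumptions.

```python
from typing import Iterable, List, Set, Tuple

Cell = Tuple[int, int]  # (row, col); this encoding is an assumption


def is_hamiltonian_path(
    rows: int,
    cols: int,
    path: List[Cell],
    obstacles: Iterable[Cell] = (),
) -> bool:
    """Return True if path visits every non-obstacle cell exactly once via edge-adjacent steps."""
    blocked: Set[Cell] = set(obstacles)
    free = {(r, c) for r in range(rows) for c in range(cols)} - blocked

    # Same length plus same set of cells means no repeats, no gaps, no obstacle hits.
    if len(path) != len(free) or set(path) != free:
        return False

    # Each step must move exactly one cell horizontally or vertically.
    return all(
        abs(r1 - r2) + abs(c1 - c2) == 1
        for (r1, c1), (r2, c2) in zip(path, path[1:])
    )


print(is_hamiltonian_path(2, 2, [(0, 0), (0, 1), (1, 1), (1, 0)]))  # True
print(is_hamiltonian_path(2, 2, [(0, 0), (1, 1), (0, 1), (1, 0)]))  # False (diagonal step)
```

Because the check is pure set and arithmetic logic, it gives the same verdict on every run, which is what makes this kind of verifier suitable for filtering candidates without an LLM judge.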

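from verifiers import get_verifier, get_verifier_for_split

# Illustrative inputs; replace with a real question and answers
question = "What is 2 + 3?"
candidate_answer = "5"
reference_answer = "5"

# Look up a verifier by name
verifier = get_verifier('math')
result = verifier.verify(question, candidate_answer, reference_answer)

# Look up a verifier by dataset split (dispatches to the matching verifier)
verifier = get_verifier_for_split('gsm8k')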
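For readers unfamiliar with the registry pattern, here is a standalone toy version of the idea. It does not reproduce the `verifiers` package: `register`, `lookup`, `lookup_for_split`, the split-to-verifier mapping, and the boolean return value of `verify` are all assumptions made for the example.

```python
from typing import Callable, Dict, Type


class Verifier:
    """Toy base class; the real base class may expose a richer interface."""

    def verify(self, question: str, candidate_answer: str, reference_answer: str) -> bool:
        raise NotImplementedError


_REGISTRY: Dict[str, Type[Verifier]] = {}
# Hypothetical mapping from dataset split name to registered verifier name.
_SPLIT_TO_VERIFIER: Dict[str, str] = {"gsm8k": "math"}


def register(name: str) -> Callable[[Type[Verifier]], Type[Verifier]]:
    """Class decorator that records a verifier class under a short name."""
    def decorator(cls: Type[Verifier]) -> Type[Verifier]:
        _REGISTRY[name] = cls
        return cls
    return decorator


@register("math")
class ExactMatchMathVerifier(Verifier):
    """Simplified stand-in: compares stripped strings instead of symbolic forms."""

    def verify(self, question: str, candidate_answer: str, reference_answer: str) -> bool:
        return candidate_answer.strip() == reference_answer.strip()


def lookup(name: str) -> Verifier:
    """Instantiate the verifier registered under name."""
    if name not in _REGISTRY:
        raise ValueError(f"no verifier registered as {name!r}")
    return _REGISTRY[name]()


def lookup_for_split(split: str) -> Verifier:
    """Resolve a dataset split to its verifier, then instantiate it."""
    if split not in _SPLIT_TO_VERIFIER:
        raise ValueError(f"no verifier mapped to split {split!r}")
    return lookup(_SPLIT_TO_VERIFIER[split])


print(lookup_for_split("gsm8k").verify("What is 2 + 3?", " 5 ", "5"))  # True
```

The benefit of this layout is that adding a new domain only requires defining a class and one mapping entry; the dispatch code itself does not change.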

Dataset Generators

The scripts/ directory contains generators for synthetic training datasets:

Spatial Reasoning (generate_spatial_reasoning_dataset.py) — Hamiltonian path puzzles on 2D grids (3x3-8x8) and 3D cubes (3x3x3-4x4x4), with obstacles and impossible variants
Polyomino Tiling (generate_polyomino_tiling_dataset.py) — Tetromino/pentomino placement puzzles with 23 piece types, 6 difficulty levels, and impossibility proofs
Terminal Gym (generate_terminal_gym_dataset.py, generate_terminal_gym_llm.py) — Bash/CLI tool-use trajectories with a sandboxed filesystem, 16 task subcategories, and optional plan-then-implement mode

Each generator produces JSONL output that can be converted to Harmony training format using the corresponding convert_*_to_harmony.py script.
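JSONL stores one JSON object per line, so generator output can be streamed and appended without loading the whole file. The sketch below shows a round trip using only the standard library; the record fields (`id`, `question`, `answer`) are hypothetical and are not the generators' actual schema.

```python
import io
import json
from typing import Any, Dict, Iterable, List, TextIO


def write_jsonl(records: Iterable[Dict[str, Any]], fh: TextIO) -> None:
    """Write each record as a single JSON line."""
    for record in records:
        fh.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_jsonl(fh: TextIO) -> List[Dict[str, Any]]:
    """Parse one JSON object per non-empty line."""
    return [json.loads(line) for line in fh if line.strip()]


# Hypothetical records; real generator output will have its own fields.
puzzles = [
    {"id": 0, "question": "Find a path on a 3x3 grid.", "answer": "..."},
    {"id": 1, "question": "Tile the board.", "answer": "impossible"},
]

buffer = io.StringIO()  # stands in for an open file
write_jsonl(puzzles, buffer)
buffer.seek(0)
loaded = read_jsonl(buffer)
print(len(loaded))  # 2
```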

Harmony Format

Training data uses a structured multi-channel format with special tokens:

<|start|>system<|message|>You are a helpful assistant.<|end|>
<|start|>user<|message|>Solve this puzzle...<|end|>
<|start|>assistant<|channel|>analysis<|message|>Let me think step by step...<|end|>
<|start|>assistant<|channel|>final<|message|>The answer is...<|end|>

Channels (analysis, final, planning) enable structured reasoning traces with selective loss masking during training.
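To make the token layout concrete, the sketch below renders a list of messages into the form shown above and derives a per-message training mask. The `Message` class, both function names, the newline between messages, and the choice to train only on the `final` channel are assumptions for illustration; the actual `convert_*_to_harmony.py` scripts may differ.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Message:
    role: str
    content: str
    channel: Optional[str] = None  # only assistant messages carry a channel here


def render_harmony(messages: Sequence[Message]) -> str:
    """Serialise messages using the <|start|>, <|channel|>, <|message|>, <|end|> tokens."""
    rendered = []
    for m in messages:
        header = f"<|start|>{m.role}"
        if m.channel is not None:
            header += f"<|channel|>{m.channel}"
        rendered.append(f"{header}<|message|>{m.content}<|end|>")
    # Joining with newlines mirrors the example layout; this separator is an assumption.
    return "\n".join(rendered)


def loss_mask(messages: Sequence[Message], trained_channels: Sequence[str] = ("final",)) -> List[bool]:
    """True for messages that would contribute to the loss under this hypothetical policy."""
    return [m.role == "assistant" and m.channel in trained_channels for m in messages]


conversation = [
    Message("system", "You are a helpful assistant."),
    Message("user", "Solve this puzzle..."),
    Message("assistant", "Let me think step by step...", channel="analysis"),
    Message("assistant", "The answer is...", channel="final"),
]

print(render_harmony(conversation))
print(loss_mask(conversation))  # [False, False, False, True]
```

Passing `trained_channels=("analysis", "final")` would include the reasoning trace in the loss as well, which is one way the channel split can support selective masking.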

Quick Start

Installation

python -m venv venv
source venv/bin/activate
pip install -r requirements.txt

API Keys

export ANTHROPIC_API_KEY=your-key    # For Claude generation
export OPENAI_API_KEY=your-key       # For OpenAI generation

Generate Candidates

# Claude pipeline
python -m claude_gen.generate --config experiments/baseline/baseline.yaml

# OpenAI pipeline
python -m openai_gen.generate --config experiments/baseline/baseline.yaml

Generate Datasets

# Spatial reasoning (2D)
python scripts/generate_spatial_reasoning_dataset.py --grid-sizes 3 4 5 --num-per-size 50

# Spatial reasoning (3D)
python scripts/generate_spatial_reasoning_dataset.py --dimensions 3 --grid-sizes 3 4 --num-per-size 50

# Polyomino tiling
python scripts/generate_polyomino_tiling_dataset.py --num-puzzles 100 --difficulty-levels 1 2 3

# Terminal gym (templated)
python scripts/generate_terminal_gym_dataset.py --num-tasks 100

Run Tests

python -m pytest tests/ -x --tb=short

License

MIT License

Copyright (c) 2026

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
