# Sudoku: Pre-generate Tree Data

This notebook generates the tree-structured candidate data for Sudoku experiments.
Each puzzle gets a tree of depth `n` with branching factor `m`.

**Output:** `results/sudoku_tree_data.json`
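
As a quick sanity check on tree size, the sketch below counts the candidate nodes in one tree. It assumes the root (the puzzle itself) is not counted and that every node has exactly `m` children down to depth `n`. The helper name `nodes_per_tree` is hypothetical and is not part of `src.sudoku`.

```python
def nodes_per_tree(n: int, m: int) -> int:
    """Count candidate nodes in a full tree of depth n with branching factor m.

    Assumes the root (the unsolved puzzle) is excluded, so depth d
    contributes m**d nodes for d = 1..n.
    """
    return sum(m ** d for d in range(1, n + 1))


print(nodes_per_tree(3, 3))      # 3 + 9 + 27 = 39
print(5 * nodes_per_tree(3, 3))  # 195
```

With `n=3, m=3` this gives 39 nodes per puzzle and 195 nodes across 5 puzzles, which agrees with the `x/39 correct` and `Total nodes: 195` figures in the generation log.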

In [2]:
import sys, os
sys.path.insert(0, os.path.abspath('..'))

from src.sudoku import (
    SudokuConfig, SudokuGenerator, StrongVerifier, WeakVerifier,
    load_sudoku_dataset_hf, generate_tree_dataset, analyze_tree_data,
)

In [3]:
# Configure (reads API keys from env vars)
config = SudokuConfig(
    provider='deepseek',
    generator_model='deepseek-chat',
    weak_verifier_model='deepseek-chat',
    num_problems=500,
)

print(f'Provider: {config.provider}')
print(f'Generator: {config.generator_model}')

Provider: deepseek
Generator: deepseek-chat


In [4]:
# Load dataset
dataset = load_sudoku_dataset_hf(config.num_problems)
print(f'Loaded {len(dataset)} puzzles')



Loading dataset from HuggingFace...
✅ Loaded 500 puzzles from HuggingFace
Loaded 500 puzzles


In [5]:
# Initialize components
generator = SudokuGenerator(config)
weak_verifier = WeakVerifier(config)
strong_verifier = StrongVerifier()

print('All components initialized')

All components initialized


In [8]:
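# Generate tree data. Puzzles already saved at save_path are loaded first,
# so num_problems acts as the target total rather than the number of new puzzles.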
os.makedirs('../results', exist_ok=True)

trees = generate_tree_dataset(
    dataset=dataset,
    generator=generator,
    weak_verifier=weak_verifier,
    strong_verifier=strong_verifier,
    num_problems=5,
    n=3, m=3,  # tree depth n, branching factor m
    puzzle_parallelism=3,
    node_parallelism=8,
    save_path='../results/sudoku_tree_data.json',
    checkpoint_every=10,
)

analyze_tree_data(trees)

📂 Loading existing data from ../results/sudoku_tree_data.json...
   Found 2 existing puzzles

TREE GENERATION
  Existing: 2 puzzles
  Target: 5 puzzles
  To generate: 3 puzzles
  Parallelization: 3 puzzles × 8 nodes


🔄 Batch 1/1 (3 puzzles)
  ✓ Puzzle puzzle_2 done (3 total) - 23/39 correct, 48.3s
  ✓ Puzzle puzzle_4 done (4 total) - 13/39 correct, 69.0s
  ✓ Puzzle puzzle_3 done (5 total) - 0/39 correct, 82.1s

✅ COMPLETE
  Previously had: 2
  Newly generated: 3
  Total now: 5
  Accuracy: 18.46%
  Time: 82.1s

✓ Saved 5 puzzles to ../results/sudoku_tree_data.json

ANALYSIS
Puzzles: 5
Total nodes: 195
Accuracy: 18.46%

By Depth:
  Depth 1: accuracy=33.33%, weak_score=0.467
  Depth 2: accuracy=17.78%, weak_score=0.156
  Depth 3: accuracy=17.04%, weak_score=0.230


{'total_nodes': 195, 'accuracy': 0.18461538461538463}
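
The per-depth figures can be reconciled with the overall accuracy by weighting each depth by its node count. The sketch below assumes each of the 5 puzzles contributes `3**d` nodes at depth `d` (full trees with `n=3, m=3`, root excluded). The helper name `overall_accuracy` is illustrative only and is not part of `src.sudoku`.

```python
def overall_accuracy(per_depth_accuracy, num_puzzles, m):
    """Return (node-weighted mean accuracy, total node count).

    per_depth_accuracy maps depth d -> accuracy at that depth (0..1).
    Assumes num_puzzles * m**d nodes at depth d (full trees, root excluded).
    """
    total_nodes = 0
    total_correct = 0.0
    for depth, acc in per_depth_accuracy.items():
        nodes = num_puzzles * m ** depth
        total_nodes += nodes
        total_correct += acc * nodes
    return total_correct / total_nodes, total_nodes


# Per-depth accuracies copied (rounded) from the analysis output above.
acc, nodes = overall_accuracy({1: 0.3333, 2: 0.1778, 3: 0.1704}, num_puzzles=5, m=3)
print(nodes)         # 195
print(f"{acc:.2%}")  # 18.46%
```

The unweighted mean of the three depth accuracies would be about 22.7%, which overstates the overall figure because depth 3 holds 135 of the 195 nodes.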