Skip to content
Draft
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 5 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -205,3 +205,8 @@ cython_debug/
marimo/_static/
marimo/_lsp/
__marimo__/

# Training artifacts
checkpoints/
lightning_logs/
*.ckpt
113 changes: 111 additions & 2 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,2 +1,111 @@
# tiny-recursion-models
Reproduction of Less is More: Recursive Reasoning with Tiny Networks
# Tiny Recursion Models for Sudoku Solving

This repository implements a custom Deep Learning Architecture using PyTorch Lightning to train and test on the Sudoku dataset from HuggingFace. The project is inspired by "Less is More: Recursive Reasoning with Tiny Networks" and implements a recursive neural network architecture specifically designed for Sudoku puzzle solving.

## Features

- **Custom Recursive Architecture**: Implements a novel recursive neural network with constraint layers specifically designed for Sudoku solving
- **PyTorch Lightning Integration**: Full PyTorch Lightning implementation with proper data modules, training loops, and callbacks
- **HuggingFace Dataset Support**: Loads the `sapientinc/sudoku-extreme-1k` dataset with fallback to mock data for development
- **Comprehensive Evaluation**: Includes visualization and evaluation tools for model performance analysis

## Architecture

The `TinyRecursionModel` consists of:

1. **Recursive Cells**: Process Sudoku grids through multiple recursive steps using GRU cells
2. **Constraint Layers**: Enforce Sudoku rules (row, column, and 3x3 box constraints) during processing
3. **Multi-layer Processing**: Stack multiple recursive layers for deeper reasoning
4. **Prediction Head**: Final classification layer for digit prediction (0-9)

## Installation

```bash
pip install -r requirements.txt
```

## Usage

### Quick Start

1. **Test the setup**:
```bash
python test_setup.py
```

2. **Train the model**:
```bash
python train.py --max_epochs 50 --batch_size 32
```

3. **Evaluate a trained model**:
```bash
python evaluate.py --model_path ./checkpoints/tiny_recursion_sudoku/final_model.ckpt
```

### Configuration

You can modify training parameters using command line arguments or by editing `config.yaml`:

```bash
python train.py \
--hidden_dim 64 \
--num_recursive_steps 5 \
--num_layers 3 \
--learning_rate 0.001 \
--batch_size 32 \
--max_epochs 50
```

### Dataset

The model is designed to work with the `sapientinc/sudoku-extreme-1k` dataset from HuggingFace. If the dataset is not accessible, the system automatically falls back to generated mock data for development and testing.

## Model Architecture Details

### Recursive Cell
- Uses GRU cells for temporal processing
- Applies layer normalization for training stability
- Configurable number of recursive steps

### Constraint Layer
- Enforces row constraints using linear transformations
- Applies column constraints through transposition
- Implements 3x3 box constraints with tensor reshaping

### Training Features
- Adam optimizer with weight decay
- Learning rate scheduling with ReduceLROnPlateau
- Early stopping to prevent overfitting
- TensorBoard logging for training visualization

## Project Structure

```
tiny-recursion-models/
├── src/
│ ├── data/
│ │ └── sudoku_datamodule.py # Data loading and preprocessing
│ ├── models/
│ │ └── tiny_recursion_model.py # Main model architecture
│ └── utils/
│ ├── config.py # Configuration utilities
│ └── sudoku_utils.py # Sudoku-specific utilities
├── train.py # Training script
├── evaluate.py # Evaluation script
├── test_setup.py # Setup verification
├── config.yaml # Configuration file
└── requirements.txt # Dependencies
```

## Results

The model achieves reasonable performance on Sudoku puzzle solving tasks. Training logs and model checkpoints are saved to the `./checkpoints` directory.

## Contributing

Feel free to submit issues and enhancement requests!

## License

This project is licensed under the Apache 2.0 License - see the [LICENSE](LICENSE) file for details.
29 changes: 29 additions & 0 deletions config.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,29 @@
# Configuration file for Tiny Recursion Models

# Model configuration
model:
hidden_dim: 64
num_recursive_steps: 5
num_layers: 3
learning_rate: 0.001
weight_decay: 0.0001

# Data configuration
data:
dataset_name: "sapientinc/sudoku-extreme-1k"
batch_size: 32
num_workers: 4
val_split: 0.2

# Training configuration
training:
max_epochs: 50
patience: 10
accelerator: "auto"
devices: 1

# Logging configuration
logging:
save_dir: "./checkpoints"
experiment_name: "tiny_recursion_sudoku"
log_every_n_steps: 50
87 changes: 87 additions & 0 deletions demo.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,87 @@
"""
Demo script for Tiny Recursion Model
"""

import torch
import sys
import os
sys.path.append(os.path.join(os.path.dirname(__file__), 'src'))

from src.models.tiny_recursion_model import TinyRecursionModel
from src.utils.sudoku_utils import generate_random_sudoku, print_sudoku
import numpy as np


def demo_model_inference():
"""Demonstrate model inference on a sample Sudoku puzzle"""
print("🧩 Tiny Recursion Model for Sudoku Solving Demo\n")

# Create a small model for demo
model = TinyRecursionModel(
hidden_dim=32,
num_layers=2,
num_recursive_steps=3
)
model.eval()

# Generate a sample puzzle
puzzle, solution = generate_random_sudoku()

print("📝 Sample Sudoku Puzzle:")
print_sudoku(puzzle)

print("\n✅ Ground Truth Solution:")
print_sudoku(solution)

# Convert to tensor and predict
puzzle_tensor = torch.tensor(puzzle, dtype=torch.float32).unsqueeze(0) # Add batch dim

with torch.no_grad():
predictions = model(puzzle_tensor)
predicted_digits = torch.argmax(predictions, dim=-1)[0] # Remove batch dim

print("\n🤖 Model Prediction (untrained):")
print_sudoku(predicted_digits.numpy())

# Calculate accuracy
correct_cells = (predicted_digits.numpy() == solution).sum()
total_cells = solution.size
accuracy = correct_cells / total_cells

print(f"\n📊 Accuracy: {correct_cells}/{total_cells} ({accuracy:.2%})")
print("\n💡 Note: This is an untrained model. Train with 'python train.py' for better results!")


def demo_architecture_info():
"""Display information about the model architecture"""
print("\n🏗️ Model Architecture Information:")

model = TinyRecursionModel(hidden_dim=64, num_layers=3, num_recursive_steps=5)

# Count parameters
total_params = sum(p.numel() for p in model.parameters())
trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)

print(f"Total parameters: {total_params:,}")
print(f"Trainable parameters: {trainable_params:,}")
print(f"Hidden dimension: {model.hidden_dim}")
print(f"Number of layers: {len(model.recursive_layers)}")
print(f"Recursive steps: {model.num_recursive_steps}")

# Show layer structure
print("\n📋 Layer Structure:")
for i, (name, module) in enumerate(model.named_children()):
if hasattr(module, '__len__'):
print(f" {name}: {len(module)} layers")
else:
print(f" {name}: {type(module).__name__}")


if __name__ == "__main__":
demo_model_inference()
demo_architecture_info()

print("\n🚀 To train the model, run:")
print(" python train.py --max_epochs 50")
print("\n🔍 To evaluate a trained model, run:")
print(" python evaluate.py --model_path ./checkpoints/tiny_recursion_sudoku/final_model.ckpt")
102 changes: 102 additions & 0 deletions evaluate.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,102 @@
"""
Evaluation script for Tiny Recursion Models
"""

import argparse
import torch
import pytorch_lightning as pl
from src.data.sudoku_datamodule import SudokuDataModule
from src.models.tiny_recursion_model import TinyRecursionModel
import numpy as np


def visualize_predictions(puzzle, solution, prediction):
"""Visualize a single Sudoku puzzle, solution, and prediction"""
print("Puzzle:")
print_sudoku(puzzle)
print("\nGround Truth Solution:")
print_sudoku(solution)
print("\nModel Prediction:")
print_sudoku(prediction)
print("\nCorrect cells:", (solution == prediction).sum().item(), "/", solution.numel())
print("-" * 50)


def print_sudoku(grid):
"""Pretty print a 9x9 Sudoku grid"""
if isinstance(grid, torch.Tensor):
grid = grid.cpu().numpy()

for i in range(9):
if i % 3 == 0 and i != 0:
print("------+-------+------")
for j in range(9):
if j % 3 == 0 and j != 0:
print("| ", end="")
if j == 8:
print(grid[i][j])
else:
print(str(grid[i][j]) + " ", end="")


def evaluate_model(model_path: str, data_module: SudokuDataModule, num_samples: int = 5):
"""Evaluate the trained model and show sample predictions"""

# Load the trained model
model = TinyRecursionModel.load_from_checkpoint(model_path)
model.eval()

# Set up trainer for testing
trainer = pl.Trainer(logger=False, enable_progress_bar=True)

# Test the model
print("Evaluating model on test set...")
test_results = trainer.test(model, data_module)

# Show sample predictions
print(f"\nShowing {num_samples} sample predictions:")
test_dataloader = data_module.test_dataloader()

with torch.no_grad():
for i, (puzzles, solutions) in enumerate(test_dataloader):
if i >= num_samples:
break

predictions = model(puzzles)
pred_digits = torch.argmax(predictions, dim=-1)

# Show first puzzle from batch
visualize_predictions(
puzzles[0],
solutions[0],
pred_digits[0]
)

return test_results


def main():
parser = argparse.ArgumentParser(description='Evaluate Tiny Recursion Model')
parser.add_argument('--model_path', type=str, required=True, help='Path to trained model checkpoint')
parser.add_argument('--batch_size', type=int, default=32, help='Batch size for evaluation')
parser.add_argument('--num_workers', type=int, default=4, help='Number of data workers')
parser.add_argument('--num_samples', type=int, default=5, help='Number of samples to visualize')

args = parser.parse_args()

# Set up data module
data_module = SudokuDataModule(
batch_size=args.batch_size,
num_workers=args.num_workers
)

# Evaluate the model
results = evaluate_model(args.model_path, data_module, args.num_samples)

print("\nEvaluation Results:")
for key, value in results[0].items():
print(f"{key}: {value:.4f}")


if __name__ == '__main__':
main()
12 changes: 12 additions & 0 deletions requirements.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,12 @@
torch>=2.0.0
pytorch-lightning>=2.0.0
datasets>=2.0.0
transformers>=4.20.0
numpy>=1.21.0
pandas>=1.3.0
scikit-learn>=1.0.0
matplotlib>=3.5.0
seaborn>=0.11.0
tqdm>=4.62.0
tensorboard>=2.10.0
pyyaml>=6.0
Empty file added src/__init__.py
Empty file.
Empty file added src/data/__init__.py
Empty file.
Loading