MathSmith: Towards Extremely Hard Mathematical Reasoning by Forging Synthetic Problems with a Reinforced Policy
MathSmith is a framework for enhancing mathematical reasoning capabilities of large language models by generating challenging synthetic problems from scratch. Unlike methods that modify existing problems, MathSmith creates novel problems through a reinforced policy, ensuring diversity and scalability.
Model variants:
- MathSmith-HC-Qwen3-8B: trained with the complexity + consistency reward
- MathSmith-Hard-Qwen3-8B: trained with the complexity-only reward

Evaluation settings:
- ShortCoT (Qwen3 series): 1.7B | 8B | 14B | 32B
- LongCoT: Qwen3-8B | DS-Qwen-7B
The MathSmith framework consists of four main stages:
1. Concept Collection: Randomly sample concept–explanation pairs from PlanetMath to ensure data independence.
2. Supervised Fine-tuning (SFT): Train the model on the collected concept–explanation pairs to establish foundational understanding.
3. Reinforcement Learning (RL): Optimize the model using GRPO with rewards based on:
   - Structural validity
   - Reasoning complexity
   - Answer consistency
4. Weakness-Focused Self-Improvement: Iteratively identify and address model weaknesses by generating targeted problem variants.
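The three RL reward terms above can be illustrated with a minimal sketch. This is a hypothetical composite reward in the spirit of the GRPO stage, not the repository's implementation (the actual reward functions live in `rl-stage/reward_func/`); the proxies and weights here are assumptions for illustration only.

```python
# Hypothetical sketch of a composite GRPO-style reward combining the three
# terms named above. All proxies and weights are illustrative assumptions.
from collections import Counter


def structural_validity(problem: str) -> float:
    # Assumption: a structurally valid problem is non-empty and poses a question.
    return 1.0 if problem.strip().endswith("?") else 0.0


def reasoning_complexity(solution_tokens: int, target: int = 2048) -> float:
    # Proxy: longer reasoning traces (up to a target budget) score higher.
    return min(solution_tokens / target, 1.0)


def answer_consistency(sampled_answers: list) -> float:
    # Fraction of independently sampled answers that agree with the majority.
    if not sampled_answers:
        return 0.0
    _, count = Counter(sampled_answers).most_common(1)[0]
    return count / len(sampled_answers)


def composite_reward(problem, solution_tokens, sampled_answers,
                     weights=(0.2, 0.4, 0.4)):
    # Weighted sum of the three reward terms (weights are illustrative).
    return (weights[0] * structural_validity(problem)
            + weights[1] * reasoning_complexity(solution_tokens)
            + weights[2] * answer_consistency(sampled_answers))
```

A problem that is well-formed, elicits a 1024-token solution, and gets 2-of-3 agreeing sampled answers would score `0.2 + 0.2 + 0.267 ≈ 0.667` under these weights.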
```bash
git clone https://github.com/Jasaxion/MathSmith.git
cd MathSmith
pip install -r requirements.txt
```

Collect concept–explanation pairs from PlanetMath:

```bash
cd data_collect/planetmath_process
# Follow the instructions there to process the PlanetMath data
```

We provide pre-processed concept–explanation pairs from PlanetMath in `./data_collect/sampled_concept/collect_planetmath_grouped_deduplicated.jsonl`.
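The processed pairs are stored as JSON Lines, one record per line. Below is a minimal, self-contained sketch of loading and randomly sampling such records; the field names `concept` and `explanation` are assumptions about the schema, and the demo writes a tiny stand-in file rather than reading the real one.

```python
# Minimal sketch: load concept-explanation pairs from a JSONL file and
# randomly sample one, mirroring the Concept Collection stage.
# Field names ("concept", "explanation") are assumed, not confirmed.
import json
import os
import random
import tempfile


def load_pairs(path):
    # One JSON object per line; skip blank lines.
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Self-contained demo with a tiny stand-in file (not the repo's dataset):
demo_records = [
    {"concept": "Cauchy sequence",
     "explanation": "A sequence whose terms become arbitrarily close."},
    {"concept": "Euler's totient function",
     "explanation": "Counts integers up to n coprime to n."},
]
path = os.path.join(tempfile.mkdtemp(), "pairs.jsonl")
with open(path, "w", encoding="utf-8") as f:
    for rec in demo_records:
        f.write(json.dumps(rec) + "\n")

pairs = load_pairs(path)
sampled = random.sample(pairs, k=1)  # random concept sampling
```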
Generate mathematical problems using the trained model:

```bash
python QM_sampler.py
```

Evaluate on the benchmarks (GSM8K, MATH-500, AIME2024, AIME2025, OlympiadBench):

```bash
cd evaluate
bash eval.sh
```

Run the weakness-focused self-improvement pipeline:
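For math benchmarks like those above, grading typically means extracting the final boxed answer from a model response and comparing it to the reference. The sketch below illustrates that common pattern; it is an assumption about the general technique, not the matcher actually used by the `evaluate/` scripts.

```python
# Hypothetical answer grader: pull the last \boxed{...} from a response
# and compare it to the reference string. Deliberately simple; does not
# handle nested braces or symbolic equivalence.
import re


def extract_boxed(response):
    # Return the contents of the last \boxed{...}, or None if absent.
    matches = re.findall(r"\\boxed\{([^{}]*)\}", response)
    return matches[-1].strip() if matches else None


def is_correct(response, reference):
    pred = extract_boxed(response)
    return pred is not None and pred == reference.strip()
```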
```bash
cd self-improvement
bash self_improve.sh
```

Project structure:

```
MathSmith/
├── data_collect/       # Concept collection and data processing
├── sft-stage/          # Supervised fine-tuning scripts
├── rl-stage/           # Reinforcement learning training
│   ├── train_script/   # RL training scripts
│   └── reward_func/    # Reward function implementations
├── answer_sampler/     # Answer generation for problems
├── evaluate/           # Evaluation scripts and benchmarks
├── self-improvement/   # Weakness-focused improvement pipeline
├── utils/              # Utility functions
└── QM_sampler.py       # Problem generation script
```
To train a MathSmith problem-synthesis model from scratch, complete the two training stages below.
SFT stage (produces the MathSmith cold-start model):

```bash
cd sft-stage
# Configure MathSmith_Questioner-Qwen3-8B.yaml
# Run SFT training
```

RL stage (customize the reward and train the HC or Hard variant):

```bash
cd rl-stage/train_script
bash rl_mathsmith.sh
```

MathSmith consistently outperforms baselines across five benchmarks under both short and long chain-of-thought settings:
- Easy & Medium: GSM8K, MATH-500
- Hard: AIME2024, AIME2025, OlympiadBench
If you find this work useful, please cite:
```bibtex
@article{zhan2025mathsmith,
  title={MathSmith: Towards Extremely Hard Mathematical Reasoning by Forging Synthetic Problems with a Reinforced Policy},
  author={Zhan, Shaoxiong and Lai, Yanlin and Lu, Ziyu and Lin, Dahua and Yang, Ziqing and Tan, Fei},
  journal={arXiv preprint arXiv:2508.05592},
  year={2025}
}
```



