This repository contains the implementation of our DoBeVi proposed in the paper: LLM-based Automated Theorem Proving Hinges on Scalable Synthetic Data Generation [📄 arXiv:2505.12031]
We thank LeanDojo for its significant contributions to the ATP community, upon which our DoBeVi framework is proudly built.
llm_base_atp/
├── README.md # Project documentation
├── Visual/ # The visualization of tree-search proof results for MiniF2F
├── DoBeVi/ # Core implementation of DoBeVi
│ ├── requirements.txt # Python dependencies
│ └── src/ # Source code
│ ├── __init__.py # Package initializer
│ ├── dojo/ # Lightweight LeanDojo wrapper for Lean 4 interaction
│ ├── search/ # Proof search algorithms and logic
│ ├── eval.py # Entry point for evaluation
│ ├── config.py # Configuration settings
│ ├── utils.py # Utility functions
│ └── .env.template # Template for environment setup
conda create -n dobevi python=3.11 -y
conda activate dobevipip install -r requirements.txtconda install -c conda-forge graphvizPlease download the pretrained policy model from Hugging Face.
Ensure you have a Lean 4 repository that builds successfully using:
lake buildCopy and edit the template config file to match your local setup:
cd src/
cp .env.template .envYou’ll need to set values for:
- Benchmark project path
- Model path
- Tree search budget
- Output path for results
python -m eval- Release code for synthetic data generation
- Add support for whole-proof methods and fine-tuning
We welcome issues, feature requests, and feedback from the community. Please feel free to open an issue if you encounter any problems or have suggestions!
If you use this work in your research, please cite:
@article{lai2025llm,
title={LLM-based Automated Theorem Proving Hinges on Scalable Synthetic Data Generation},
author={Lai, Junyu and Zhang, Jiakun and Xu, Shuo and Chen, Taolue and Wang, Zihang and Yang, Yao and Zhang, Jiarui and Cao, Chun and Xu, Jingwei},
journal={arXiv preprint arXiv:2505.12031},
year={2025}
}