Optimizing the Cost-Quality of Agentic Theorem Provers in Lean

Clone Repository

To clone the repository, one can run the following command:

git clone --recurse-submodules git@github.com:eth-sri/optimizing-lean-agents.git

Install Lean

For this project, we use Lean version 4.9.0. To set up the environment in the way we did, one can run the following commands:

curl https://elan.lean-lang.org/elan-init.sh -sSf | sh -s -- -y
source ~/.elan/env
cd mathlib4 && lake build && cd ..
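As a quick sanity check on the Lean version, the following sketch reads the toolchain pin from the mathlib4 submodule and compares it with 4.9.0. It assumes the submodule contains a lean-toolchain file in the usual elan format (for example leanprover/lean4:v4.9.0); the helper name toolchain_version is ours and is not part of this repository.

```python
from pathlib import Path

EXPECTED_LEAN = "4.9.0"  # version stated in this README


def toolchain_version(toolchain_text: str) -> str:
    """Extract the bare version from a toolchain string like 'leanprover/lean4:v4.9.0'."""
    tag = toolchain_text.strip().rsplit(":", 1)[-1]
    return tag[1:] if tag.startswith("v") else tag


# Assumed location of the pin; nothing happens if the file is absent.
toolchain_file = Path("mathlib4/lean-toolchain")
if toolchain_file.exists():
    found = toolchain_version(toolchain_file.read_text())
    if found == EXPECTED_LEAN:
        print(f"Lean toolchain matches {EXPECTED_LEAN}")
    else:
        print(f"Expected Lean {EXPECTED_LEAN}, found {found}")
```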

To verify that the Lean environment is set up correctly, one can run the following command to check that the Lean math library compiles:

uv run python lean_compiler/repl_scheduler.py

The output should include the following line if the Lean environment is set up correctly:

Progress: 1/1 proofs processed. REPL errors: 0
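When the check is run from a script, the progress line can be matched programmatically. The sketch below is a minimal example that assumes the output format is exactly the one shown in the line above; the function name repl_check_passed is hypothetical and not part of the repository.

```python
import re

# Pattern inferred from the single example line in this README (an assumption).
PROGRESS_RE = re.compile(
    r"Progress: (\d+)/(\d+) proofs processed\. REPL errors: (\d+)"
)


def repl_check_passed(output: str) -> bool:
    """Return True if a progress line reports all proofs processed with zero REPL errors."""
    for match in PROGRESS_RE.finditer(output):
        done, total, errors = (int(group) for group in match.groups())
        if done == total and errors == 0:
            return True
    return False


sample = "Progress: 1/1 proofs processed. REPL errors: 0"
print(repl_check_passed(sample))  # True
```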

Quick Start

The following are the relevant commands to set up the uv environment and the relevant environment variables:

# Python deps (agent = API clients, gpu = vLLM + torch)
uv sync --extra agent --extra gpu

# Env variables
export SCRATCH=./scratch && mkdir -p $SCRATCH/results
export TOGETHER_API_KEY=<your_key>
export CUDA_VISIBLE_DEVICES=<your-available-gpu>
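Before launching a run, it can help to confirm that the variables exported above are visible to Python. The following is a small sketch; whether every variable is needed depends on the run (for instance, the API key presumably only matters for API-backed models), and the helper name missing_env_vars is ours.

```python
import os

# Variables exported in the Quick Start section above.
EXPECTED_VARS = ("SCRATCH", "TOGETHER_API_KEY", "CUDA_VISIBLE_DEVICES")


def missing_env_vars(env=None):
    """Return the names of expected variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in EXPECTED_VARS if not env.get(name)]


# Example with a hand-written environment instead of os.environ.
example_env = {"SCRATCH": "./scratch", "TOGETHER_API_KEY": "placeholder"}
print(missing_env_vars(example_env))  # ['CUDA_VISIBLE_DEVICES']
```

Calling missing_env_vars() with no argument checks the real process environment.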

Data Generation

Whole-Proof Generation

To run whole-proof generation, one can configure the run in configs/hydra/prover/config.yaml and then run it with

uv run python prover/runner.py --config configs/hydra/prover/config.yaml

Agent

To run the agent data collection, one configures the run in configs/hydra/seed_prover/config.yaml, configures the prover module in configs/hydra/seed_prover/prover/unified.yaml, and then runs

uv run python seed_prover/hydra_runner.py

Simulations

Note that all the example configs target the example problems in dataset/. The simulations are based on the example train-test split in dataset/example_problems_train_test_split.txt. For reproducibility, we have included our version of the Putnam Lean formalizations along with our train-test splits in dataset/putnam_rewrite_solved_train_test_split.txt. In proof simulations, the train-test split can be configured via simulation.problem_split.path in the config files.
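To illustrate what the dotted key simulation.problem_split.path refers to, the sketch below treats a config as a nested dictionary and switches the split file from the example split to the Putnam split. It assumes the dotted key corresponds to nested levels in the YAML file; the helper set_dotted is hypothetical and not part of the repository.

```python
def set_dotted(config: dict, dotted_key: str, value) -> dict:
    """Set config[a][b][c] = value for a dotted key 'a.b.c', creating levels as needed."""
    node = config
    *parents, leaf = dotted_key.split(".")
    for part in parents:
        node = node.setdefault(part, {})
    node[leaf] = value
    return config


# Assumed nested layout of the relevant part of a simulation config.
config = {
    "simulation": {
        "problem_split": {"path": "dataset/example_problems_train_test_split.txt"}
    }
}

set_dotted(
    config,
    "simulation.problem_split.path",
    "dataset/putnam_rewrite_solved_train_test_split.txt",
)
print(config["simulation"]["problem_split"]["path"])
```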

Our agent

Running the action-routing Lean agent takes three steps: feature tracking, training the quality estimator model, and running the sweep. The configs for these steps are in configs/proof_simulation/example/. Example commands to run them are:

uv run python scripts/proof_simulation/sweep.py --config configs/proof_simulation/example/fixed_feature_tracker_full_router.yaml
uv run python scripts/proof_simulation/train_cost_logistic.py --config configs/proof_simulation/example/train_onestage.yaml
uv run python scripts/proof_simulation/sweep.py --config configs/proof_simulation/example/sweep_onestage.yaml
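The three commands above depend on each other, so it can be convenient to chain them and stop at the first failure. The following sketch wraps them in a small driver; the step labels and the function names build_commands and run_pipeline are ours, and the mapping of each label to a command is inferred from the config file names. By default it only prints the commands.

```python
import shlex
import subprocess

CONFIG_DIR = "configs/proof_simulation/example"

# (label, script, config file), in the order given in this README.
STEPS = [
    ("feature tracking", "scripts/proof_simulation/sweep.py",
     "fixed_feature_tracker_full_router.yaml"),
    ("train quality estimator", "scripts/proof_simulation/train_cost_logistic.py",
     "train_onestage.yaml"),
    ("sweep", "scripts/proof_simulation/sweep.py", "sweep_onestage.yaml"),
]


def build_commands():
    """Return (label, argv) pairs for the three pipeline steps."""
    return [
        (label, ["uv", "run", "python", script, "--config", f"{CONFIG_DIR}/{config}"])
        for label, script, config in STEPS
    ]


def run_pipeline(dry_run=True):
    """Print each command; when dry_run is False, also execute it and stop on failure."""
    for label, cmd in build_commands():
        print(f"[{label}] {shlex.join(cmd)}")
        if not dry_run:
            # check=True raises CalledProcessError if a step exits non-zero.
            subprocess.run(cmd, check=True)


run_pipeline(dry_run=True)
```

Calling run_pipeline(dry_run=False) from the repository root would execute the steps in sequence.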

Fixed-Step Baseline

To run the fixed-step baseline (and the whole-proof baseline to evaluate the performance of the data plane), one can adjust the configuration file found in configs/proof_simulation/example/sweep_fixed_test.yaml and then run the following command:

uv run python scripts/proof_simulation/sweep.py --config configs/proof_simulation/example/sweep_fixed_test.yaml

For whole-proof generation, one can run:

uv run python scripts/proof_simulation/sweep.py --config configs/proof_simulation/example/sweep_fixed_whole_proof.yaml

Noisy Oracle Router

Similarly, to run the noisy oracle router, one can use the configs defined in configs/proof_simulation/oracle and run the feature tracker, training, and sweep steps exactly as described for our agent above.

Graphical User Interface

We developed a GUI to analyze the data from the agent data collection and the simulations. To run the GUI, one can run the following command:

uv run --extra gui streamlit run analysis_gui/seed/app.py --server.headless=True

The GUI supports visual analysis of the runs from the agent data collection step, as well as trajectory analysis and plot generation for the simulations.

Citation

If you find our work useful, please consider citing our paper:

@article{rögnvaldsson2026optimizingcostqualitytradeoffagentic,
      title={Optimizing the Cost-Quality Tradeoff of Agentic Theorem Provers in Lean}, 
      author={Kári Rögnvaldsson and Chenhao Sun and Jasper Dekoninck and Martin Vechev},
      year={2026},
      eprint={2606.04883},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2606.04883}, 
}
