Lean Eval

This repository is a comparator-based Lean benchmark for formal mathematics. Benchmark authors write trusted problem statements once in shared Lean modules, and the tooling generates one comparator workspace per problem under generated/.

A submission is scored entirely by comparator results: a problem counts as solved iff comparator accepts the submitted solution.

The main user-facing entrypoint is:

lake exe lean-eval --help

Quick Start For Benchmark Problem Contributors

Use this path if you are adding or editing benchmark problems.

1. Install and fetch dependencies

lake exe cache get
lake build

2. Add or edit a trusted theorem

Put the statement in one of the shared modules under LeanEval/ and mark it with @[eval_problem].

@[eval_problem]
theorem my_new_problem : ... := by
  sorry

Current source modules live in topic folders such as:

LeanEval/NumberTheory/
LeanEval/Topology/
LeanEval/ComplexAnalysis/
LeanEval/EasyProblems.lean

3. Add the manifest entry

Each tagged declaration must be listed by exactly one entry in manifests/problems.toml. The holes array names every @[eval_problem]-tagged declaration in the module that the problem owns; for the common single-theorem case it has one element.

[[problem]]
id = "my_new_problem"
title = "My new problem"
test = false
module = "LeanEval.SomeModule"
holes = ["my_new_problem"]
submitter = "Your Name"
notes = "Optional notes."
source = "Optional citation or URL."
informal_solution = "Optional proof sketch or reference."

The required fields are:

id
title
test
module
holes
submitter

Multi-hole problems

A problem may bundle several defs, instances and theorems — list them all in holes. Comparator then asks the participant to fill every listed declaration in their Submission.lean. Two conventions:

Name every instance. The generator addresses holes by their declaration name, and Lean's auto-generated names for anonymous instances (e.g. instTopologicalSpaceJacobian) are not stable. Use instance instAddCommGroup : ... := sorry rather than instance : ... := sorry.
Use holes even for one declaration. There is no theorem = "..." shorthand; a singleton hole is just holes = ["my_thm"].

See LeanEval/Sandbox/DefHoleExample.lean and LeanEval/Sandbox/InstanceHoleExample.lean for the smallest working examples, or LeanEval/Geometry/JacobianChallenge.lean for a realistic multi-hole problem.

4. Validate the authored source of truth (optional)

lake exe lean-eval validate-manifest
lake exe lean-eval check-problem-build

validate-manifest checks that @[eval_problem] declarations and manifest entries match. check-problem-build builds the problem modules so warning-producing Lean changes do not slip through. Both are cheap and catch the most common mistakes before a CI roundtrip.

5. Open a PR

That's it — push your branch and open a PR. CI regenerates the comparator workspaces under generated/ and verifies they build, so you do not need to commit anything under generated/ yourself. Once the PR merges, a separate workflow regenerates generated/ on main and pushes the result.

If CI fails with a generation or build error, you'll need to fix the source. The fastest local equivalent is:

lake exe lean-eval generate --problem my_new_problem
lake exe lean-eval check-generated-builds --problem my_new_problem

Quick Start For Solvers

Use this path if you want to prove benchmark problems without touching the trusted benchmark files.

1. Pick a problem

Generated workspaces live under generated/, one directory per problem. The current catalog is summarized in generated/index.json.

Examples:

generated/two_plus_two/
generated/list_append_singleton_length/
generated/cyclotomic_integer_house_le_two/

2. Create your local workspace

Copy a clean starter workspace:

lake exe lean-eval start-problem two_plus_two

That creates workspaces/two_plus_two/ by default. You can also pass a destination:

lake exe lean-eval start-problem two_plus_two /tmp/two_plus_two

3. Install workspace dependencies

cd workspaces/two_plus_two
lake update

4. Write your proof

Solver-owned files are:

Submission.lean
Submission/Helpers.lean
any additional Lean files you add under Submission/

Trusted files you should not edit in the normal solver workflow are:

Challenge.lean
Solution.lean
config.json
lakefile.toml

Challenge.lean contains the benchmark statement. Solution.lean is the fixed bridge that tells comparator to check your theorem from the Submission namespace.

5. Run comparator locally

lake test

If the comparator binary is not on your PATH, set it explicitly:

COMPARATOR_BIN=/path/to/comparator lake test

You can verify the installation against the starter problem from the repo root with:

lake exe lean-eval check-comparator-installation

Comparator setup also requires the upstream external tools, including landrun and lean4export. Install landrun from its git main branch (go install github.com/zouuup/landrun/cmd/landrun@main); the latest tagged release (v0.1.15) is missing fixes that comparator's sandbox relies on.

CI pins lean4export to tag v4.30.0-rc2 and comparator to commit 71b52ec29e06d4b7d882726553b1ceb99a2499e0 (which adds support for def-shaped holes).

6. Check your local score

From the repo root:

lake exe lean-eval run-eval
lake exe lean-eval run-eval --json

The scorer prefers workspaces/<problem-id>/ when present and falls back to generated/<problem-id>/ otherwise.

Submission Rules

Participants may use Mathlib freely.

If a proof needs helper code that is not already in Mathlib, that code must be included inside the submission workspace itself. Multi-file submissions are allowed through Submission.lean and extra local modules under Submission/.

For benchmark-repo submissions, validate changed paths with:

lake exe lean-eval validate-submission --file generated/two_plus_two/Solution.lean

The current validator accepts:

modifications to generated/<problem-id>/Solution.lean and generated/<problem-id>/Submission.lean
additions, modifications, deletions, renames, or copies of .lean files under generated/<problem-id>/Submission/
additions (only) of markdown files anywhere inside generated/<problem-id>/, other than the generated README.md
additions (only) of a top-level generated/<problem-id>/LICENCE or generated/<problem-id>/LICENSE file

In practice, solvers should normally work in Submission.lean and Submission/.

Repository Layout

LeanEval/: trusted authored problem statements
manifests/problems.toml: problem metadata
generated/: generated comparator workspaces
scripts/: generation, validation, and scoring helpers
PLAN.md: deferred design and roadmap notes

End-To-End Repo Checks

For a local health pass over the repository:

lake exe lean-eval validate-manifest
lake exe lean-eval check-problem-build
lake exe lean-eval generate --check
lake exe lean-eval check-generated-builds
lake exe lean-eval run-eval

There is also an end-to-end workflow self-check:

lake exe lean-eval check-eval-workflow

Name		Name	Last commit message	Last commit date
Latest commit History 88 Commits
.github		.github
EvalTools		EvalTools
LeanEval		LeanEval
docs		docs
generated		generated
manifests		manifests
scripts		scripts
tests		tests
.gitignore		.gitignore
AGENTS.md		AGENTS.md
EvalTools.lean		EvalTools.lean
LANDRUN.md		LANDRUN.md
LeanEval.lean		LeanEval.lean
PLAN.md		PLAN.md
README.md		README.md
lake-manifest.json		lake-manifest.json
lakefile.toml		lakefile.toml
lean-toolchain		lean-toolchain

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Lean Eval

Quick Start For Benchmark Problem Contributors

1. Install and fetch dependencies

2. Add or edit a trusted theorem

3. Add the manifest entry

Multi-hole problems

4. Validate the authored source of truth (optional)

5. Open a PR

Quick Start For Solvers

1. Pick a problem

2. Create your local workspace

3. Install workspace dependencies

4. Write your proof

5. Run comparator locally

6. Check your local score

Submission Rules

Repository Layout

End-To-End Repo Checks

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Lean Eval

Quick Start For Benchmark Problem Contributors

1. Install and fetch dependencies

2. Add or edit a trusted theorem

3. Add the manifest entry

Multi-hole problems

4. Validate the authored source of truth (optional)

5. Open a PR

Quick Start For Solvers

1. Pick a problem

2. Create your local workspace

3. Install workspace dependencies

4. Write your proof

5. Run comparator locally

6. Check your local score

Submission Rules

Repository Layout

End-To-End Repo Checks

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages