This project aims to automate and accelerate progress toward the k-server conjecture, a central open problem in competitive analysis, using open-ended AI-driven discovery.
The core task is to discover a potential function that satisfies a large, graph-structured system of linear inequalities. The evaluation procedure is sound but incomplete:
- Any violated inequality definitively refutes a candidate.
- Satisfying all inequalities does not constitute a formal proof of the conjecture.
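The soundness/incompleteness asymmetry above can be summarized in a few lines. This is a hypothetical illustration only; the function name and return strings are invented for this sketch and do not appear in the evaluator.

```python
# Hypothetical sketch of the sound-but-incomplete verdict described above.
def verdict(num_violations: int) -> str:
    if num_violations > 0:
        # Sound: a single violated inequality definitively refutes the candidate.
        return "refuted"
    # Incomplete: satisfying everything is strong evidence, not a formal proof.
    return "satisfied (not a proof)"
```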
Nevertheless, a candidate that satisfies all constraints would provide strong evidence toward a valid proof. To the best of our knowledge, no known potential satisfies all constraints under our formulation for the open circle case with $$k = 4$$.
As such, a successful candidate would already represent a meaningful contribution—and could potentially lead to a full theoretical result when paired with a formal proof.
For additional mathematical background, see the documentation under docs/. A paper link can be added here once the public version is finalized.
Among previously human-discovered solutions for the open circle case with $$k = 4$$, the best achieves:
- 17 violations out of ~7 million inequalities
Using the tools in this repository together with coding agents such as Codex, we discovered a potential with:
- 3 violations out of ~7 million inequalities
One reproducible path to that result is the staged search in examples/search_n7_async_pipeline/: a Codex-developed, human-guided coefficient search for a fixed n=7 index matrix. It first filters candidates on circle_k4_m6, then on hard circle_taxi_k4_m6 edges, then on timeout-limited random taxi edges, and finally runs the full taxi evaluation. Starting from random candidates, this procedure can find a 3-violation coefficient vector in roughly 20-30 minutes.
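The staged idea can be sketched abstractly as a sequence of increasingly expensive filters. This is a hypothetical sketch: `staged_filter`, `evaluate_on`, and the stage tuples are illustrative stand-ins, not the actual API of the pipeline in examples/search_n7_async_pipeline/.

```python
# Hypothetical sketch of a staged filtering search; the real pipeline lives in
# examples/search_n7_async_pipeline/ and does not use this exact API.
def staged_filter(candidates, evaluate_on, stages):
    """Keep candidates whose violation count passes each (metric, cap) stage.

    evaluate_on(candidate, metric) -> number of violated inequalities.
    Stages are ordered cheapest-first so expensive metrics see few candidates.
    """
    survivors = list(candidates)
    for metric, cap in stages:
        survivors = [c for c in survivors if evaluate_on(c, metric) <= cap]
        if not survivors:
            break  # everything was refuted at this stage
    return survivors
```

Ordering the stages cheapest-first is what makes the full (expensive) evaluation affordable: by the time a candidate reaches it, the cheap metrics have already refuted most of the population.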
This problem is an extremely challenging optimization task:
- The search space is vast
- Improvements are incremental and rare
- Each evaluation requires checking millions of inequalities
These factors make large-scale exploration computationally and algorithmically demanding.
This repository may be useful as:
- Research tooling: for researchers exploring potential-function approaches to the $$k$$-server problem, particularly the open circle $$k = 4$$ case and beyond.
- AI discovery benchmark: a benchmark for agentic, code-based mathematical discovery, where progress is measured via automatically computed violations.
A candidate is typically implemented as a Python class defining a potential function:

```python
class Potential:
    def __init__(
        self,
        context,   # k-server parameters (metric space, number of servers, etc.)
        **kwargs,  # optional hyperparameters
    ):
        self.context = context
        ...

    def __call__(
        self,
        wf: np.ndarray,  # representation of a graph node
    ) -> float:
        ...
```

See examples/ for both human-designed and AI-discovered potentials.
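As a minimal concrete instance of this interface, a constant-zero potential (the same trivial candidate used in the quick-start prompt later in this README) might look as follows. The class name here is illustrative; follow the naming expected by your evaluator setup.

```python
import numpy as np

# Trivial candidate matching the interface above: always returns zero.
# It is well-defined everywhere but violates many inequalities, so it is
# useful only as a smoke test of the evaluation pipeline.
class ZeroPotential:
    def __init__(self, context, **kwargs):
        self.context = context  # k-server parameters (unused here)

    def __call__(self, wf: np.ndarray) -> float:
        return 0.0
```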
The staged n=7 Codex search example also documents the exact reproduction command and search-family settings.
The evaluator is a Python program that:
- Computes potential values
- Checks inequalities over a precomputed dataset of instances (see metrics/)
Each instance corresponds to a metric space $$M$$ and a graph $$G = (V, E)$$:
- Nodes $$v \in V$$ are parameterized by vectors $$w \in \mathbb{R}^d$$.
- Edges $$(u, v) \in E$$ are parameterized by requests $$r \in M$$.
The goal is to find a potential that satisfies the linear inequality associated with every edge of every instance.
The evaluator reports:
- Number of violations
- Additional diagnostic statistics
See tools/ for details.
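Conceptually, the reported violation count is an edge-by-edge check. The sketch below is hypothetical: the slack function and tolerance are stand-ins for the evaluator's actual constraint logic.

```python
# Hypothetical sketch of violation counting; edge_slack and tol are
# illustrative stand-ins, not the evaluator's real interface.
def count_violations(potential, edges, edge_slack, tol=1e-9):
    """Count edges whose linear inequality is violated.

    edge_slack(potential, u, v, r) -> slack of the edge's inequality;
    negative slack (beyond the tolerance) means a violation.
    """
    return sum(1 for (u, v, r) in edges if edge_slack(potential, u, v, r) < -tol)
```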
This repository uses Git LFS to store large precomputed metric datasets.
Before cloning the repository, ensure Git LFS is installed and initialized:
```shell
git lfs install
git clone <repo-url>
cd <repo>
git lfs pull
```

Without Git LFS, the files in metrics/ will not be downloaded correctly, and the evaluator will not function.
The evaluator depends on the k-servers package, which can be installed in an environment separate from the agent:
```shell
pip install -e ./k-servers
```

This separation is recommended when running agents in isolated or containerized environments.
To prepare a workspace with benchmark documentation and source materials:
```shell
bash agents/setup.sh <target-dir>
```

This setup is agent-agnostic and copies only the necessary benchmark-facing docs and papers into the target directory.
You have two main options:
1. Direct invocation (simplest): instruct your coding agent to run:

```shell
/abs/path/to/tools/evaluator/evaluate.py
```
This can be added to agent context files (e.g., AGENTS.md, CLAUDE.md) so that the agent can call the evaluator when needed.
2. MCP integration
Alternatively, you can expose the evaluator via MCP using the entrypoint:

```shell
tools/evaluator/evaluate_mcp_server.py
```
This allows agents to interact with the evaluator as a tool rather than invoking it directly.
For Codex, we provide a convenience setup script that performs both:
- the base workspace setup
- MCP configuration
```shell
pip install tomli tomli_w mcp
bash agents/codex/setup.sh <target-dir>
```

For additional details (including split-environment setups), see agents/codex/README.md.
We also provide an additional autoresearch layering that can be applied on top of an existing workspace setup. It adds an AGENTS.md file plus an autoresearch-style program.md.
```shell
bash agents/autoresearch/setup.sh <target-dir>
```

See agents/autoresearch/README.md.
After setup:
- Enter the target directory.
- Start the agent.
- Run a simple test task.
Example prompt:
Implement a very simple potential that always returns zero and evaluate it on circle_k3_m6.pickle
Expected output (truncated):
```json
{
  "correct": true,
  "error": null,
  "combined_score": 0.7285714285714286,
  "public": {
    "0/violations_k": 570,
    "0/violations_k_score": 0.7285714285714286
  }
}
```

For discovery systems (e.g., AlphaEvolve), the main evaluator entry point is tools/evaluator/evaluate.py. Key parameters:
- --program_path: path to the candidate implementation
- --metrics_names: comma-separated list of metric files
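An agent or harness can consume the evaluator's JSON report programmatically. The sketch below is based on the truncated example output earlier in this README; field names beyond those shown there are not guaranteed.

```python
import json

# Shape taken from the truncated example output shown above.
report = json.loads("""
{
  "correct": true,
  "error": null,
  "combined_score": 0.7285714285714286,
  "public": {
    "0/violations_k": 570,
    "0/violations_k_score": 0.7285714285714286
  }
}
""")

# A lower violation count means a better candidate.
violations = report["public"]["0/violations_k"]
score = report["combined_score"]
```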
The tasks/ directory contains reusable task definitions, including:
- Goals
- Implementation constraints
- Hints to reduce the search space
Predefined tasks:
You can build a fully reproducible environment using Docker.
```shell
docker build -t k-server-env:latest -f docker/Dockerfile .
docker build -t generic-k-server:latest -f agents/Dockerfile .
```

Or, if you specifically want the Codex image:

```shell
docker build -t codex-k-server:latest -f agents/codex/Dockerfile .
```

- k-server-env: base environment with dependencies and tooling
- generic-k-server: base environment plus a generic coding-agent workspace bootstrap, usable for tools such as Claude
- codex-k-server: environment configured for running Codex agents
```shell
docker run --rm -it \
  -v $HOME/.codex/auth.json:/run/secrets/codex-auth.json \
  codex-k-server:latest
```

This mounts your Codex authentication file into the container and launches an interactive session.
To layer autoresearch into the same image, invoke its entrypoint directly:
```shell
docker run --rm -it <BASE_IMAGE> /home/kserver/k-server-bench/agents/autoresearch/entrypoint.sh
```

where <BASE_IMAGE> might be codex-k-server:latest or generic-k-server:latest.
Start here:
- docs/concepts.md: core mathematical concepts and scoring
- docs/task-model.md: task composition and sweeps
- docs/evaluators.md: evaluator interfaces
- docs/experiments.md: experiment structure
- docs/: detailed documentation
- k-servers/: core Python package
- tasks/: task definitions and hints
- experiments/: experiment configurations
- tools/: evaluators and utilities
- metrics/: benchmark datasets
- examples/: example runs and candidates
- tests/: basic validation tests
- agents/: agent integrations
- docker/: reproducible environments