TRACE Benchmark

TRACE (Tracking Resource Allocation Consistency Evaluation) is a benchmark that measures how well LLMs track state over time under conservation constraints.

Steel Thread (Quick Start)

Mock (no network):
- export MOCK_STEEL_THREAD=1
- python steel_thread.py --model openai/gpt-3.5-turbo --save-prompt
Live (OpenRouter):
- export OPENROUTER_API_KEY=your_key
- python steel_thread.py --model openai/gpt-3.5-turbo --save-prompt

Outputs: steel_thread_results.json plus optional steel_thread_prompt.txt and steel_thread_response.txt.
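The two modes above differ only in which environment variable is set before the same command runs. A minimal Python sketch of that difference follows. The helper names (build_command, build_env, run_steel_thread) are hypothetical and not part of the repository. Only the script name, the two flags, and the two environment variables come from the commands above. Clearing MOCK_STEEL_THREAD for a live run is an assumption about how the script detects mock mode.

```python
import os
import subprocess
import sys
from typing import Dict, List, Optional


def build_command(model: str, save_prompt: bool = True) -> List[str]:
    """Build the argv for steel_thread.py using only the flags shown above."""
    cmd = [sys.executable, "steel_thread.py", "--model", model]
    if save_prompt:
        cmd.append("--save-prompt")
    return cmd


def build_env(
    mock: bool,
    api_key: Optional[str] = None,
    base: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    """Return a copy of the environment prepared for a mock or live run."""
    env = dict(os.environ if base is None else base)
    if mock:
        # Mock mode: no network access is needed.
        env["MOCK_STEEL_THREAD"] = "1"
    else:
        if not api_key:
            raise ValueError("a live run needs an OpenRouter API key")
        # Assumption: an inherited mock flag would still force mock mode,
        # so it is removed before a live run.
        env.pop("MOCK_STEEL_THREAD", None)
        env["OPENROUTER_API_KEY"] = api_key
    return env


def run_steel_thread(model: str, mock: bool = True, api_key: Optional[str] = None) -> None:
    """Run the steel thread from the repository root; raises if it exits non-zero."""
    subprocess.run(build_command(model), env=build_env(mock, api_key), check=True)
```

Calling run_steel_thread("openai/gpt-3.5-turbo") from the repository root would correspond to the mock commands above.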

Validate Results

Validate a single JSON file or a JSONL stream against the schema:

python tools/validate_results.py steel_thread_results.json
python tools/validate_results.py path/to/results.jsonl

Schema: schemas/trace-result.schema.json.
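The difference between the two input forms can be sketched in Python. This is not the code in tools/validate_results.py. The function iter_records is a hypothetical helper that only shows how a single JSON file and a JSONL stream might be read into records, assuming the file extension distinguishes them. A schema check would then be applied to each record it yields.

```python
import json
from pathlib import Path
from typing import Any, Iterator


def iter_records(path: str) -> Iterator[Any]:
    """Yield one record from a .json file, or one record per line from a .jsonl file."""
    p = Path(path)
    if p.suffix == ".jsonl":
        with p.open(encoding="utf-8") as fh:
            for line_no, line in enumerate(fh, start=1):
                line = line.strip()
                if not line:
                    # Blank lines are skipped rather than treated as errors.
                    continue
                try:
                    yield json.loads(line)
                except json.JSONDecodeError as exc:
                    # Report the offending line so a bad record is easy to find.
                    raise ValueError(f"{path}:{line_no}: invalid JSON: {exc}") from exc
    else:
        # Any other extension is treated as a single JSON document.
        with p.open(encoding="utf-8") as fh:
            yield json.load(fh)
```

Reading line by line keeps memory use flat for long JSONL streams, and the line number in the error message points to the record that failed to parse.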

Tests & CI

Run locally: pytest -v
CI runs on every push and pull request (badge above), using mock mode so that no network access is required.

For full plans and roadmap, see PRD.md and milestones/.
