TRACE (Tracking Resource Allocation Consistency Evaluation) benchmarks LLMs on state tracking under conservation constraints over time.
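As a toy illustration of what a conservation constraint means here: the benchmark's real task format is not described in this section, so the state layout, the `(source, destination, amount)` transfer tuple, and the names `apply_transfers` and `is_conserved` are all hypothetical. The sketch tracks a state over a sequence of transfers and checks that the total amount of resource never changes.

```python
from typing import Dict, List, Tuple

# Hypothetical transfer format: (source, destination, amount).
Transfer = Tuple[str, str, int]


def apply_transfers(state: Dict[str, int], transfers: List[Transfer]) -> Dict[str, int]:
    """Return a new state after moving resources between holders."""
    new_state = dict(state)  # leave the caller's state untouched
    for src, dst, amount in transfers:
        # Reject moves that would create resources or overdraw a holder.
        if amount < 0 or new_state.get(src, 0) < amount:
            raise ValueError(f"invalid transfer {src}->{dst}: {amount}")
        new_state[src] = new_state.get(src, 0) - amount
        new_state[dst] = new_state.get(dst, 0) + amount
    return new_state


def is_conserved(before: Dict[str, int], after: Dict[str, int]) -> bool:
    """Conservation holds when the total across all holders is unchanged."""
    return sum(before.values()) == sum(after.values())
```

A model that reports a final state whose total differs from the initial total has broken the constraint, no matter how plausible each individual value looks.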
- Mock (no network):
    export MOCK_STEEL_THREAD=1
    python steel_thread.py --model openai/gpt-3.5-turbo --save-prompt
- Live (OpenRouter):
    export OPENROUTER_API_KEY=your_key
    python steel_thread.py --model openai/gpt-3.5-turbo --save-prompt
Outputs: steel_thread_results.json plus optional steel_thread_prompt.txt and steel_thread_response.txt.
Validate a single JSON file or a JSONL stream against the schema:
    python tools/validate_results.py steel_thread_results.json
    python tools/validate_results.py path/to/results.jsonl
Schema: schemas/trace-result.schema.json.
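The internals of `tools/validate_results.py` are not shown here. As a rough sketch of how one entry point could accept both a single JSON file and a JSONL stream, the following uses only the standard library. The names `iter_records` and `collect_errors`, the extension-based dispatch, and the pluggable `check` callback (a stand-in for real schema validation) are assumptions for illustration, not the tool's actual code.

```python
import json
from pathlib import Path
from typing import Any, Callable, Iterator, List, Tuple


def iter_records(path: str) -> Iterator[Tuple[int, Any]]:
    """Yield (line_number, record) pairs from a .json or .jsonl file."""
    p = Path(path)
    text = p.read_text(encoding="utf-8")
    if p.suffix == ".jsonl":
        # JSONL: one JSON document per line; blank lines are skipped.
        for lineno, line in enumerate(text.splitlines(), start=1):
            if line.strip():
                yield lineno, json.loads(line)
    else:
        # Anything else is treated as a single JSON document.
        yield 1, json.loads(text)


def collect_errors(path: str, check: Callable[[Any], List[str]]) -> List[str]:
    """Run a hypothetical per-record check and prefix messages with location."""
    errors: List[str] = []
    for lineno, record in iter_records(path):
        for message in check(record):
            errors.append(f"{path}:{lineno}: {message}")
    return errors
```

Reporting the line number with each message makes it easier to find the offending record in a long JSONL stream.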
- Run locally:
    pytest -v
- CI runs on push/PR (badge above) using mock mode.
For full plans and roadmap, see PRD.md and milestones/.