Skip to content

stanford-iris-lab/meta-harness-tbench2-artifact

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

1 Commit
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Meta-Harness

Agent scaffold for Terminal-Bench 2.0, built on top of Terminus-KIRA by KRAFTON AI and Harbor's Terminus-2 framework.

Results

76.4% on Terminal-Bench 2.0 (89 tasks × 5 trials, Claude Opus 4.6).

Split N Score
Easy 4 100.0
Medium 55 81.1
Hard 30 64.7
All 89 76.4

Usage

pip install harbor

export ANTHROPIC_API_KEY=<your-key>

harbor run \
  --agent-import-path agent:AgentHarness \
  -d terminal-bench@2.0 \
  -m anthropic/claude-opus-4-6 \
  -e runloop \
  -n 20 \
  --n-attempts 5

Method

Meta-Harness extends the Terminus-KIRA agent with environment bootstrapping: before the agent loop starts, it gathers a snapshot of the sandbox environment (working directory, file listing, available languages/tools, package managers, memory) and injects it into the initial prompt. This saves 2-5 early exploration turns that the agent normally spends on ls, which python3, etc.

The agent was discovered through automated harness evolution. More details coming soon.

Acknowledgements

We thank KRAFTON AI for compute support.

About

Meta-Harness: 76.4% on Terminal-Bench 2.0 (Claude Opus 4.6)

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages