
Haize Annotate

Overview

This repository contains a custom Claude Code skill for exploring and annotating agent trace data.

  • SKILL.md - Full instructions Claude uses when running the workflow
  • references/ - Deep dives on ingestion patterns, rubric design, etc.
  • scripts/ - Helper scripts and a simple API server the agent uses while assisting with annotation workflows

Quick Start

1. Install the skill

# Clone the repository
git clone git@github.com:haizelabs/annotate.git

# Move the skill to the Claude skills directory
mv annotate/annotate_skill ~/.claude/skills/annotate_skill
cd ~/.claude/skills/annotate_skill

# Create and activate a virtual environment
uv venv
source .venv/bin/activate

# Install dependencies
uv pip install -r requirements.txt
(cd frontend && yarn install)

2. Navigate to a directory with agent traces

# If you don't have logs yet, use the example data from the cloned repo:
cd /path/to/annotate/tests/example_research_agent

# OR navigate to your own agent traces directory:
# cd /path/to/your/agent/data

# Set your API key (required for AI judge setup)
export OPENAI_API_KEY=...

# Start Claude Code
claude

Note: Any supported Pydantic AI model can power this tool. To change the underlying model, set the HAIZE_ANNOTATE_MODEL_NAME environment variable, e.g. "openai:gpt-4.1".

3. Trigger the skill

Once Claude Code is running, activate the skill:

> hey claude use annotate

Claude will guide you through:

  1. Ingesting your raw traces into a normalized format
  2. Configuring what you want to evaluate (pass/fail? pairwise ranking? scoring?)
  3. Annotating based on your bespoke configuration, with assistance from an AI judge

The skill handles:

  • giving Claude the relevant setup scripts and tools to navigate your trace data
  • distilling and filtering raw agent transcripts into the specific information relevant for annotation

⚠️ Important Notes

Browser Interaction: Claude will automatically open an AI interaction visualizer in your browser during the annotation process. This visualizer provides a detailed view of the agent's interactions and decisions. Use this as your reference when providing feedback, as it shows the full context of what the AI agent did.

Code execution: During the ingestion phase, Claude will generate a custom ingest.py script containing an ingest() function that transforms your raw trace data. You must review this generated code before running it - it contains custom parsing logic written for your specific data format. Only run the script if you understand what it does and agree with the transformation logic.
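
To set expectations, a generated script might look roughly like the sketch below. This is a hypothetical illustration, not the skill's actual output: the input file name, field mapping, and output layout are assumptions and will differ with your trace format.

# .haize_annotations/ingest.py (hypothetical sketch - the generated script will differ)
import json
from pathlib import Path

RAW_TRACES = Path("traces.jsonl")                      # illustrative input file
OUT_DIR = Path(".haize_annotations/ingested_data")

def ingest():
    """Normalize raw trace records into per-interaction JSON files."""
    OUT_DIR.mkdir(parents=True, exist_ok=True)
    with RAW_TRACES.open() as f:
        for i, line in enumerate(f):
            raw = json.loads(line)
            # Illustrative field mapping - real logic depends on your schema
            record = {
                "id": raw.get("session_id", f"trace-{i}"),
                "messages": raw.get("messages", []),
                "tool_calls": raw.get("tool_calls", []),
            }
            (OUT_DIR / f"{record['id']}.json").write_text(json.dumps(record, indent=2))

if __name__ == "__main__":
    ingest()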

What Gets Created

When you use this skill, a .haize_annotations/ directory is created in your working directory:

your-project/
├── .haize_annotations/
│   ├── ingest.py              # Custom ingestion script
│   ├── ingested_data/         # Ingested agent interactions
│   ├── feedback_config.json   # Evaluation criteria
│   └── test_cases/            # Annotated cases
└── <your-raw-traces.jsonl>

All state lives here - delete it to start fresh.
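
The contents of these files are produced interactively as you run the skill. Purely as an illustration of the kind of thing feedback_config.json might capture (the field names here are invented, not the skill's actual schema), a pass/fail configuration could look like:

# Hypothetical illustration only - these field names are invented, not the skill's schema
import json
from pathlib import Path

config = {
    "mode": "pass_fail",                      # or "pairwise", "scoring"
    "criteria": [
        "The agent answered the user's research question",
        "No tool call results were fabricated",
    ],
}

Path(".haize_annotations").mkdir(exist_ok=True)
Path(".haize_annotations/feedback_config.json").write_text(json.dumps(config, indent=2))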

Dependencies

The skill needs Python and Node.js dependencies:

# Install Python backend dependencies
cd ~/.claude/skills/annotate_skill

# Create and activate virtual environment (if not already done)
uv venv
source .venv/bin/activate

uv pip install -r requirements.txt

# Install frontend dependencies
(cd frontend && yarn install)

Next steps

Once you've landed on an annotation experience and judge you find useful, ask Claude for next steps (a rough sketch of the rubric-export idea follows the examples below):

  • Example: > next steps
  • Example: > for next steps help me export the rubric to my current eval setup
  • Example: > for next steps i want to convert this to a quick web-app we can use for annotations
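
As a rough sketch of the rubric-export idea, the snippet below shows one way a rubric like the hypothetical feedback_config.json above could be plugged into your own eval harness as an LLM judge. Nothing here is generated by the skill; the file paths and config fields are assumptions.

# Hypothetical sketch - assumes the illustrative feedback_config.json fields above
import json
from pathlib import Path
from openai import OpenAI

config = json.loads(Path(".haize_annotations/feedback_config.json").read_text())
transcript = Path("transcript_to_judge.txt").read_text()   # a trace you want to evaluate

criteria = "\n".join(f"- {c}" for c in config["criteria"])
client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4.1",
    messages=[
        {"role": "system", "content": "Judge the transcript against these criteria, "
                                      f"then answer PASS or FAIL with a short justification:\n{criteria}"},
        {"role": "user", "content": transcript},
    ],
)
print(response.choices[0].message.content)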

Limitations

  • Traces can't be too large - they currently have to fit within a single LLM call for summarization purposes
  • If the source data is simply missing some information (e.g. a session id), there's not much the tool can do - the ingestion script is very basic and cannot reconstruct missing fields or perform intelligent analysis
  • This is a lightweight, local-only tool; we cap the number of test cases at 100 at any given time
