Open schema + CLI for crowdsourcing agent traces to Hugging Face Hub.
Every coding session with an AI agent produces action trajectories, tool-use sequences, and reasoning chains. These are the most valuable dataset nobody is collecting in the open. opentraces captures them automatically, scans for secrets, and publishes structured JSONL datasets to HuggingFace Hub. Private by default. You control what leaves your machine.
Sharing coding agent sessions risks leaking secrets and PII. opentraces applies context-aware scanning and redaction, but no redaction is perfect. Read the security docs before use.
- Parse agent sessions (Claude Code, Cursor, Cline, Codex, Hermes)
- Scan every field for secrets, API keys, paths, and PII
- Redact detected secrets with
[REDACTED]or hashed path segments - Enrich with git signals, attribution, cost estimates, dependency metadata
- Review in the browser or terminal before anything leaves your machine
- Push approved traces as sharded JSONL to a Hugging Face dataset
If you use coding agents for open source work, please share your sessions.
Public session data helps improve coding agents with real-world tasks, tool use, failures, and fixes instead of toy benchmarks. For the full explanation, see opentraces.ai.
Published datasets are tagged opentraces and agent-traces, findable via:
pip install opentracesOr from source:
python3 -m venv .venv
source .venv/bin/activate
pip install -e packages/opentraces-schema
pip install -e ".[dev]"push needs huggingface-cli:
pip install "huggingface_hub[cli]"
huggingface-cli loginWhen logging in, create a token at https://huggingface.co/settings/tokens with write scope.
Paste this into your coding agent to set up trace collection:
I want you to set up opentraces in this project for trace collection.
Step 1 - Install:
pipx install opentraces
Step 2 - Authenticate:
Run `opentraces auth status` to check if already logged in.
If not authenticated, ask me to run `opentraces login --token` myself,
I need to paste a HuggingFace access token with write scope
(from https://huggingface.co/settings/tokens).
Step 3 - Choose agent:
Ask me which coding agent I'm using. Supported agents can be
listed with `opentraces init --help` (the --agent option).
If I'm using you (the agent running this prompt), detect which
agent you are and suggest that. I can pick multiple.
Step 4 - Initialize and install skill:
opentraces init --agent <chosen-agent> --review-policy review --start-fresh
This sets up automatic trace collection with manual review before
anything is shared, and installs the opentraces agent skill into
.agents/skills/opentraces/ (plus a symlink in .<agent>/skills/)
so you have the full CLI reference for future sessions. If your agent
already has past sessions for this repo, use `--import-existing` to bring
that backlog into the inbox immediately, or `--start-fresh` to begin from now on.
After setup, the workflow is:
- `opentraces web` to inspect traces before sharing
- `opentraces commit --all` to commit inbox traces
- `opentraces push` to publish committed traces to HuggingFace
# Authenticate and initialize
opentraces login --token
opentraces init --review-policy review
# Review traces in the browser
opentraces web
# Commit reviewed traces
opentraces commit --all
# Publish to HuggingFace Hub
opentraces push --repo your-username/my-tracesEvery string field in every trace record is scanned using context-aware rules:
| Field type | Scan mode | Notes |
|---|---|---|
| Message content, system prompts | Full scan | Regex + entropy + classifier |
| Reasoning content | Regex only | No entropy (too noisy) |
| Tool inputs | Full or regex | Depends on tool type |
| Tool results, observations | Regex only | High-entropy output expected |
| Patches, diffs | Full scan | Truncated when very large |
A second pass runs over the serialized JSONL output, so redaction does not depend on field shape alone.
- Embedded reasoning: LLM thinking blocks may contain paraphrased secrets
- Non-standard secret formats: only common API key and token patterns are matched
- Contextual PII: names and emails in free text require manual review
The security pipeline targets the common case reliably. For everything else, the review step exists.
See the full scanning docs and security tiers.
The trace format is defined in packages/opentraces-schema/. Each JSONL line is a self-contained TraceRecord covering one complete agent session: steps (TAO loops), tool calls, outcome signals, attribution, and security metadata.
Designed for the people who consume traces, not just the tools that produce them:
- Training / SFT , clean message sequences with role labels, tool-use as tool_call/tool_result pairs, outcome signals
- RL / RLHF , trajectory-level reward signals, step-level annotations, decision point identification
- Telemetry , token counts, latency, model identifiers, cache hit rates, cost estimates per step
- Code attribution (experimental) , file and line-level attribution linking each edit back to the agent step that produced it
The schema builds on public standards:
| Standard | Relationship |
|---|---|
| ATIF | Trajectory structure (superset) |
| Agent Trace | Code attribution |
| ADP | Training-pipeline interoperability |
| OTel GenAI | Observability alignment |
Every schema version ships with a rationale document explaining design decisions: RATIONALE-0.2.0.md.
| Section | What's inside |
|---|---|
| Installation | Install, verify, upgrade |
| Authentication | Hugging Face login and credentials |
| Quick Start | Init, inbox, commit, push |
| Commands | Full CLI reference |
| Supported Agents | Claude Code, Cursor, Cline, Codex, Hermes |
| Security | Review policy, scanning, redaction |
| Schema | TraceRecord, steps, outcome, attribution |
| Workflow | Parse, review, assess, push, consume |
| CI/CD | Headless automation and token auth |
| Contributing | Local dev and schema changes |
| Package | Description |
|---|---|
| opentraces | CLI: parse, scan, review, push |
| opentraces-schema | Standalone Pydantic v2 schema models |
| opentraces-ui | Design system: tokens, components, logo assets |
packages/
opentraces-schema/ Schema package (Pydantic v2 models)
opentraces-ui/ Design system (tokens, components)
src/opentraces/
parsers/ Agent session parsers
hooks/ Claude Code hook scripts (on_stop, on_compact)
security/ Secret scanning, anonymization, classification
enrichment/ Git signals, attribution, metrics
quality/ Trace quality assessment, upload gates
clients/ Browser and terminal review frontends
upload/ HF Hub sharded upload
pipeline.py Shared enrichment + security pipeline
web/
viewer/ React inbox UI
site/ Next.js marketing site + MkDocs documentation
tests/ Test suite
Schema feedback, questions, and proposals are welcome via GitHub Issues. For schema changes, include what you would change, why it matters for your use case, and how it relates to existing standards. See the VERSION-POLICY.md for how changes are versioned.
python3 -m venv .venv
source .venv/bin/activate
pip install -e packages/opentraces-schema
pip install -e ".[dev]"
pytest tests/ -vMIT