Provider Plugin Contract
How budi supports a new AI coding agent. The current set is Claude Code, Codex CLI, Copilot CLI, Copilot Chat (VS Code-family), and Cursor; everything below is what a contributor needs to know to add a sixth.
This page describes the shape of the extension surface. The live-path rationale (why a Provider trait at all, why JSONL tailing is the only live path) is pinned in JSONL Tailing as Live Ingestion (ADR-0089). The attribution contract every row must uphold is in SOUL.md.
Every supported agent implements Provider, defined in crates/budi-core/src/provider.rs. It owns four things and nothing else:
| Method | Returns | Purpose |
|---|---|---|
| `discover_files()` | `Vec<DiscoveredFile>` | One-shot enumeration of all transcript files this agent has ever written. Used by `budi db import` for historical backfill. |
| `parse_file(path, content, offset)` | `Vec<ParsedMessage>` (+ new offset) | Incremental parse of one transcript file, starting at the stored byte offset. Returns whatever new messages it found; the shared pipeline turns those into canonical rows. |
| `watch_roots()` | `Vec<PathBuf>` | The directories the daemon's filesystem tailer subscribes to via `notify`. New files matching this provider's transcript shape are picked up automatically. |
| `sync_direct(...)` (optional) | (provider-specific) | Only for agents with a real Usage API. Currently used by Cursor to pull per-request cost/token truth-up from the dashboard API (Cursor Usage API Contract). |
Everything else — cost calculation, repo / branch / ticket attribution, tool-outcome inference, deduplication, cloud sync — runs in the shared pipeline. Providers are parsers, not full ingestion paths. That is the load-bearing rule: a new agent should be a one-file change under crates/budi-core/src/providers/ plus its registration.
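To make the four-method surface concrete, here is a minimal, self-contained sketch of what the trait's shape could look like. Only the method names come from the table above; the exact signatures, the fields on `DiscoveredFile` and `ParsedMessage`, the `String` error type, and the toy `ExampleProvider` with its stand-in `session_id,output_tokens` line format are all assumptions for illustration, not the definitions in `provider.rs`. The part worth noticing is the offset handling: only newline-terminated lines are consumed, so a half-written trailing line is left for the next tailer pass.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical: one transcript file found during backfill discovery.
#[derive(Debug, Clone, PartialEq)]
pub struct DiscoveredFile {
    pub path: PathBuf,
}

/// Hypothetical: metadata-only message. There is deliberately no prompt or response text.
#[derive(Debug, Clone, PartialEq)]
pub struct ParsedMessage {
    pub session_id: String,
    pub output_tokens: u64,
}

/// Sketch of the four-method surface; the real signatures live in provider.rs.
pub trait Provider {
    /// One-shot enumeration for historical backfill.
    fn discover_files(&self) -> Vec<DiscoveredFile>;
    /// Parse `content` from byte `offset`; return new messages plus the offset to store.
    fn parse_file(&self, path: &Path, content: &str, offset: usize) -> (Vec<ParsedMessage>, usize);
    /// Directories the tailer should watch.
    fn watch_roots(&self) -> Vec<PathBuf>;
    /// Optional billing-API truth-up; the default does nothing.
    fn sync_direct(&self) -> Result<(), String> {
        Ok(())
    }
}

/// Toy provider over a stand-in "session_id,output_tokens" line format (not real JSONL).
pub struct ExampleProvider {
    pub root: PathBuf,
}

impl Provider for ExampleProvider {
    fn discover_files(&self) -> Vec<DiscoveredFile> {
        // A real provider would walk its transcript directory here.
        Vec::new()
    }

    fn parse_file(&self, _path: &Path, content: &str, offset: usize) -> (Vec<ParsedMessage>, usize) {
        // An offset past the end (or inside a UTF-8 sequence) is left untouched for rotation handling.
        if !content.is_char_boundary(offset) {
            return (Vec::new(), offset);
        }
        let mut messages = Vec::new();
        let mut next = offset;
        // Consume only newline-terminated lines; a partial trailing line waits for the next pass.
        while let Some(len) = content[next..].find('\n') {
            let line = &content[next..next + len];
            next += len + 1;
            // Malformed lines are skipped, but the offset still moves past them.
            if let Some((session_id, tokens)) = line.split_once(',') {
                if let Ok(output_tokens) = tokens.trim().parse::<u64>() {
                    messages.push(ParsedMessage {
                        session_id: session_id.to_string(),
                        output_tokens,
                    });
                }
            }
        }
        (messages, next)
    }

    fn watch_roots(&self) -> Vec<PathBuf> {
        vec![self.root.clone()]
    }
}
```

Keeping `parse_file` a pure function of (content, offset) is one way to let the live tailer and `budi db import` share exactly the same parsing code.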
After a Provider hands back ParsedMessages, the daemon runs them through an ordered enricher chain. Order matters — each enricher depends on prior enrichers.
1. `IdentityEnricher`: stamps the row with `session_id`, `provider`, the daemon install ID, and a deterministic message identity.
2. `GitEnricher`: resolves `repo_id` and `git_branch` from the message's `cwd` (or from the per-line `gitBranch` the transcript carries natively); extracts `ticket_id` from the branch name, with companion `ticket_source` / `ticket_prefix` tags.
3. `ToolEnricher`: extracts tool-call outcomes from tool-result blocks; emits `tool_outcome` (`success` / `error` / `denied` / `retry`) with `tool_outcome_source` and `tool_outcome_confidence` siblings.
4. `FileEnricher`: extracts per-file attribution from file-aware tool arguments (`Read` / `Write` / `Edit` / `MultiEdit` / `Grep` / …); enforces the repo-root privacy boundary (no outside-of-repo paths, no file contents).
5. `CostEnricher`: looks up the price for `(model, provider)` via the single `pricing::lookup` call and writes `cost_cents` + `pricing_source`. See Model Pricing – Embedded Baseline and Runtime Refresh for the manifest contract.
6. `TagEnricher`: finalizes the row for the SQLite write.
A provider whose transcript already carries some of these fields still goes through the chain; enrichers no-op on rows whose fields are already populated. The chain is also where the cross-message tool-outcome correlation lives, so the live tailer and budi db import produce identical rows from identical inputs.
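A minimal sketch of the ordered-chain idea, assuming each enrichable field is an `Option` that an enricher fills only while it is still `None`. The `Row` fields, the `Enricher` trait, the split of `GitEnricher`'s work into `BranchEnricher` and `TicketEnricher`, and the ticket-matching rule are hypothetical simplifications, not the code in `pipeline/enrichers.rs`. It shows the two properties described above: a natively carried branch is left alone, and a step that reads the branch only works if it runs after the step that resolves it.

```rust
/// Hypothetical row with only the fields this sketch needs; `None` means "not yet enriched".
#[derive(Debug, Default, Clone)]
pub struct Row {
    pub cwd: String,
    pub git_branch: Option<String>,
    pub ticket_id: Option<String>,
}

pub trait Enricher {
    fn enrich(&self, row: &mut Row);
}

/// Simplified slice of GitEnricher: resolve the branch from `cwd` only if it is missing.
pub struct BranchEnricher {
    /// Hypothetical resolver standing in for the real git lookup.
    pub resolve: fn(&str) -> Option<String>,
}

impl Enricher for BranchEnricher {
    fn enrich(&self, row: &mut Row) {
        // No-op when the transcript already carried a branch natively.
        if row.git_branch.is_none() {
            row.git_branch = (self.resolve)(&row.cwd);
        }
    }
}

/// Simplified slice of GitEnricher: derive a ticket from whatever branch is present.
pub struct TicketEnricher;

impl Enricher for TicketEnricher {
    fn enrich(&self, row: &mut Row) {
        if row.ticket_id.is_none() {
            row.ticket_id = row.git_branch.as_deref().and_then(extract_ticket);
        }
    }
}

/// Hypothetical rule: first `LETTERS-DIGITS` pair, e.g. "feature/ABC-123-login" -> "ABC-123".
fn extract_ticket(branch: &str) -> Option<String> {
    for segment in branch.split('/') {
        let parts: Vec<&str> = segment.split('-').collect();
        for pair in parts.windows(2) {
            let (prefix, number) = (pair[0], pair[1]);
            let prefix_ok = !prefix.is_empty() && prefix.chars().all(|c| c.is_ascii_uppercase());
            let number_ok = !number.is_empty() && number.chars().all(|c| c.is_ascii_digit());
            if prefix_ok && number_ok {
                return Some(format!("{prefix}-{number}"));
            }
        }
    }
    None
}

/// Runs enrichers strictly in insertion order.
pub struct Pipeline {
    pub enrichers: Vec<Box<dyn Enricher>>,
}

impl Pipeline {
    pub fn run(&self, row: &mut Row) {
        for enricher in &self.enrichers {
            enricher.enrich(row);
        }
    }
}
```

Running the same chain with `TicketEnricher` first leaves `ticket_id` empty for any row whose branch had to be resolved, which is the kind of silent gap a fixed ordering prevents.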
| Provider id | Watch root(s) | Source file | Notes |
|---|---|---|---|
| `claude_code` | `~/.claude/projects/<project-hash>/` | `providers/claude_code.rs` | JSONL, one record per turn. Carries `sessionId`, model, token counts, `cwd`, and `gitBranch` natively. |
| `codex` | `~/.codex/sessions/<session-id>/` | `providers/codex.rs` | Same shape with slightly different field names, which the provider normalizes. |
| `copilot_cli` | `~/.copilot/session-state/` | `providers/copilot.rs` | Standalone Copilot CLI; unrelated to the VS Code extension. |
| `copilot_chat` | VS Code-family `workspaceStorage/` + `globalStorage/` across Code, Insiders, Exploration, VSCodium, Cursor, and remote-server installs | `providers/copilot_chat.rs` | Five envelope shapes, five token-key dispatches, plus the v3 output-only fallback for May-2026+ builds. See Copilot Chat Data Contract (ADR-0092). |
| `cursor` | `state.vscdb` (`cursorDiskKV` bubbles) + `~/.cursor/projects/*/agent-transcripts/` | `providers/cursor.rs` | Bubbles are primary as of 2026-04-23; the Usage API is a supplementary overage signal. See Cursor Usage API Contract (ADR-0090). |
A reference fixture lives next to each provider under the crate's tests/ directory. Fixtures are real (scrubbed) transcripts from the maintainer's own machine, not hand-written examples — regressions surface against shapes we actually see in production.
For a hypothetical new agent (call it gemini):
1. Discover the transcript path and shape. Confirm the agent actually writes parseable JSONL, or that its on-disk format is at least machine-readable. If it only keeps conversation state in memory or in an opaque binary blob, it is out of scope for the tailer path until it ships a transcript option. Cursor, whose on-disk state is `state.vscdb`, is the existing precedent for non-JSONL shapes.
2. Add `crates/budi-core/src/providers/gemini.rs` implementing `Provider`. Map the agent's fields into `ParsedMessage`. Uphold the attribution contract: RFC3339 UTC timestamps, a canonical `session_id`, and a normalized `git_branch` (no `refs/heads/` prefix, no detached `HEAD`).
3. Add a fixture under the crate's `tests/` tree containing a small, real (scrubbed) transcript. Write a parser test that runs `parse_file` against the fixture and asserts row counts and key fields.
4. Register the provider in the daemon's startup wiring.
5. If the agent's cost model is unusual (per-request flat fee, billing-API truth-up, etc.), extend the relevant downstream surface rather than baking it into the provider: `CostEnricher` for pricing, a `sync_direct` impl for billing-API reconciliation. The Copilot Chat GitHub Billing API truth-up in `sync/copilot_chat_billing.rs` is the existing precedent for that shape.
6. If the agent depends on an undocumented upstream API (as Cursor and Copilot Chat do), pin the contract as a wiki ADR before merging the parser. The ADR is what the next maintainer reaches for when the upstream shape shifts, and the parser is updated in lockstep with it.
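Step 2's attribution-contract rules are easy to get subtly wrong, so here is a minimal sketch of two checks a provider might apply before handing back a `ParsedMessage`. Both function names are hypothetical (budi may already have shared helpers for this), and the timestamp check only validates the shape `YYYY-MM-DDTHH:MM:SS[.fraction]Z`: it does not validate calendar values or convert other UTC offsets, which a real parser would need to do.

```rust
/// Hypothetical helper: canonical branch name, or `None` when there is nothing to attribute.
pub fn normalize_git_branch(raw: &str) -> Option<String> {
    let trimmed = raw.trim();
    // "refs/heads/main" and "main" must land on the same value.
    let name = trimmed.strip_prefix("refs/heads/").unwrap_or(trimmed);
    // A detached HEAD (or an empty value) is not a branch.
    if name.is_empty() || name == "HEAD" {
        return None;
    }
    Some(name.to_string())
}

/// Hypothetical shape check for "YYYY-MM-DDTHH:MM:SS[.fraction]Z".
/// It does not validate calendar values or convert other offsets to UTC.
pub fn is_rfc3339_utc(ts: &str) -> bool {
    const DIGIT_POSITIONS: [usize; 14] = [0, 1, 2, 3, 5, 6, 8, 9, 11, 12, 14, 15, 17, 18];
    let b = ts.as_bytes();
    // Shortest valid form is "YYYY-MM-DDTHH:MM:SSZ" (20 bytes), always ending in 'Z'.
    if b.len() < 20 || b[b.len() - 1] != b'Z' {
        return false;
    }
    let digits_ok = DIGIT_POSITIONS.iter().all(|&i| b[i].is_ascii_digit());
    let separators_ok =
        b[4] == b'-' && b[7] == b'-' && b[10] == b'T' && b[13] == b':' && b[16] == b':';
    if !(digits_ok && separators_ok) {
        return false;
    }
    // Everything between the seconds and the trailing 'Z': empty, or "." plus digits.
    let rest = &b[19..b.len() - 1];
    rest.is_empty() || (rest.len() > 1 && rest[0] == b'.' && rest[1..].iter().all(u8::is_ascii_digit))
}
```

Returning `None` for a detached `HEAD` rather than the literal string keeps such rows unattributed instead of grouping them under a fake branch.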
A new provider PR should not need to touch the cloud sync path, the schema, the auth code, or the dashboard UI. If it does, the design has drifted and we should fix the architecture before merging.
- No network calls. Providers run on the engineer's laptop; the only IO they do is reading transcript bytes from disk. Network truth-up belongs in `sync_direct`, which is a scheduled pull, not part of the live hot path.
- No reading of prompt or response content. Token counts, model id, timestamps, `cwd`, `gitBranch`, and the tool-call arguments needed for file attribution: yes. Prompt text, response text, embedded code blocks, and tool-result bodies: no. The privacy boundary is enforced in the parser, not at the upload boundary.
- No mutation of the transcript file. Even fixing what looks like a malformed line gets us into territory where we are silently editing the engineer's data on disk.
- No outside-of-repo file paths. `file_attribution::attribute_files` strips absolute paths against the message's `cwd` / resolved repo root and drops anything that cannot be proven to sit inside the repo root. This runs after the provider, but providers should hand back raw candidate paths from tool arguments rather than pre-resolving them.
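To illustrate the repo-root boundary that the last rule leaves to the shared pipeline, here is a minimal sketch of mapping one raw tool-argument path to a repo-relative path. `repo_relative` is a hypothetical helper, not `file_attribution::attribute_files`, and it assumes `repo_root` is already an absolute, normalized path. It is purely lexical (it collapses `.` and `..` but does not resolve symlinks), so the real implementation may be stricter.

```rust
use std::path::{Component, Path, PathBuf};

/// Hypothetical helper (not file_attribution::attribute_files): map a raw tool-argument path
/// to a repo-relative path, or drop it if it cannot be shown to sit inside `repo_root`.
/// Purely lexical: symlinks are not resolved.
pub fn repo_relative(candidate: &str, cwd: &Path, repo_root: &Path) -> Option<PathBuf> {
    let raw = Path::new(candidate);
    // Relative tool arguments are interpreted against the message's cwd.
    let joined = if raw.is_absolute() { raw.to_path_buf() } else { cwd.join(raw) };

    // Collapse "." and ".." without touching the filesystem.
    let mut normalized = PathBuf::new();
    for component in joined.components() {
        match component {
            Component::CurDir => {}
            Component::ParentDir => {
                // Climbing above the filesystem root is malformed; drop the path.
                if !normalized.pop() {
                    return None;
                }
            }
            other => normalized.push(other.as_os_str()),
        }
    }

    // Keep only paths strictly inside the repo root; the root itself is not a file.
    normalized
        .strip_prefix(repo_root)
        .ok()
        .filter(|rel| !rel.as_os_str().is_empty())
        .map(Path::to_path_buf)
}
```

`strip_prefix` compares whole path components, so a sibling directory such as `/home/dev/repo-other` is not mistaken for a child of `/home/dev/repo`.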
- `crates/budi-core/src/provider.rs`: trait definition
- `crates/budi-core/src/providers/`: five reference implementations
- `crates/budi-core/src/pipeline/mod.rs`: `Pipeline` struct, `default_pipeline()` ordering, cross-message tool-outcome correlation
- `crates/budi-core/src/pipeline/enrichers.rs`: all enricher implementations
- `crates/budi-core/src/file_attribution.rs`: repo-relative file-path extractor; enforces privacy limits
- `crates/budi-core/src/jsonl.rs`: shared JSONL parser, `ParsedMessage` struct
- Per-agent transcript-rotation handling and tailer mechanics → Daemon Lifecycle and Autostart
- Cost-pricing resolution and immutable history → Model Pricing – Embedded Baseline and Runtime Refresh (ADR-0091)
- Cursor- and Copilot Chat-specific upstream contracts → Cursor Usage API Contract (ADR-0090), Copilot Chat Data Contract (ADR-0092)
- Cloud sync of the resulting rows → Cloud Data Contract and Privacy Boundary (ADR-0083)