Copilot Chat – JetBrains Storage Contract
- Date: 2026-05-11
- Status: Accepted
- Issue: #716
- Milestone: 8.4.3
- Related: ADR-0092 (Copilot Chat data contract — this ADR amends §2.1 by adding a JetBrains host class), ADR-0088 §7 (host-scoped vs. provider-scoped surfaces)
ADR-0092 §2.1 pins the local-tail path roots for Copilot Chat at five VS Code-family directory names under each OS's application-support root (Code, Code - Insiders, Code - Exploration, VSCodium, Cursor). That contract covered every Copilot Chat host that mattered at the time the ADR was accepted — VS Code stable, Insiders, the Exploration channel, VSCodium, Cursor, and every remote/dev-container shape of each.
The 8.4.x train added JetBrains as a first-class host. The budi-jetbrains plugin shipped to the JetBrains Marketplace on 2026-05-08 (listing: https://plugins.jetbrains.com/plugin/31662-budi) and renders the statusline against whatever the daemon attributes to surface=jetbrains. The classifier at crates/budi-core/src/surface.rs::infer_copilot_chat_surface was already wired in #701 to return jetbrains for JetBrains-shaped paths, but the corresponding watch_roots() discovery in crates/budi-core/src/providers/copilot_chat.rs continued to iterate only VS Code-family roots. The JetBrains classifier therefore never fired against a real path in production — the statusline rendered zeros for every JetBrains-only user.
Before a JetBrains-side parser ticket can land, the storage shape needs the same evidence-based treatment ADR-0092 §2.3 gave to the VS Code side. Without a captured fixture and an explicit data contract, the parser would be guessing at file framing, entity names, and value-encoding strategies from documentation alone — and the JetBrains shape diverges materially enough from VS Code's plain-JSON tail that the existing contract cannot be stretched to cover it.
The Copilot Chat provider's JetBrains-side discovery and parser are scoped to the storage shape described below. This ADR is the spec a forthcoming JetBrains parser ticket implements against; any change in the GitHub Copilot for JetBrains plugin's storage layout is handled by amending this ADR in the same PR as the parser update.
Provider id stays copilot_chat. Per ADR-0088 §7, the surface dimension carries the host distinction: VS Code-family rows surface as surface=vscode or surface=cursor; JetBrains rows surface as surface=jetbrains. This ADR does not introduce a new provider — JetBrains is a host of the same Copilot Chat provider, no different in identity from how Cursor is a host of the same Copilot Chat provider for VS Code-family files.
The provider's watch_roots() is extended with a JetBrains-side root in addition to the five VS Code-family roots already pinned in ADR-0092 §2.1.
- macOS / Linux: ~/.config/github-copilot/
- Windows: %LOCALAPPDATA%\github-copilot\ (needs an on-Windows capture pass to confirm; the on-disk layout is otherwise expected to be identical)
This root is not under Application Support/JetBrains/<Product><Year>/ — the assumption in the original #716 ticket text turned out to be wrong. The GitHub Copilot plugin uses an XDG-style root keyed off the GitHub identity, identical to the path the GitHub CLI publishes under, plus a per-IDE-flavor sub-slug.
~/.config/github-copilot/
├── apps.json               # OAuth tokens. Treat as secret; never read by budi.
├── versions.json           # {"copilot-intellij":"<plugin-version>"}
├── copilot-intellij.db     # SQLite single-table state (first-boot flag etc.). Not interesting.
├── intellij/               # Shared cross-IDE settings (instruction markdown, mcp.json)
└── <ide-slug>/             # ic, iu, ws, and others per JetBrains product family
    ├── chat-sessions/<session-id>/
    ├── chat-agent-sessions/<session-id>/
    ├── chat-edit-sessions/<session-id>/
    └── bg-agent-sessions/<session-id>/
Known IDE slugs observed so far: ic (IntelliJ IDEA Community), iu (IntelliJ IDEA Ultimate), ws (WebStorm), intellij (shared cross-IDE settings — not a session-bearing slug). PyCharm, GoLand, RustRover, PhpStorm etc. will introduce additional slugs that the discovery code must enumerate by directory-listing rather than a hardcoded allowlist — the slug set is open by design, the same way ADR-0092 §2.1's VS Code-family allowlist would be wrong to pin closed.
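The directory-listing discovery described above could look roughly like the following. This is a minimal sketch, not the budi implementation: the function name `discover_ide_slugs` and the `NON_SESSION_DIRS` constant are hypothetical, and the only exclusion rule assumed is "keep directories, skip `intellij/`".

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Shared cross-IDE settings directory; it holds no sessions.
/// (Hypothetical constant name, used for illustration only.)
const NON_SESSION_DIRS: &[&str] = &["intellij"];

/// Lists candidate IDE slugs under the Copilot root by directory listing.
/// Top-level files (apps.json, versions.json, copilot-intellij.db) drop out
/// because only directories are kept.
pub fn discover_ide_slugs(root: &Path) -> io::Result<Vec<String>> {
    let mut slugs = Vec::new();
    for entry in fs::read_dir(root)? {
        let entry = entry?;
        if !entry.file_type()?.is_dir() {
            continue;
        }
        // Skip names that are not valid UTF-8 rather than failing the walk.
        let Ok(name) = entry.file_name().into_string() else {
            continue;
        };
        if NON_SESSION_DIRS.contains(&name.as_str()) {
            continue;
        }
        slugs.push(name);
    }
    // read_dir order is platform-dependent; sort for stable output.
    slugs.sort();
    Ok(slugs)
}
```

Because nothing is matched against an allowlist, a slug introduced by a new JetBrains product is picked up on the next listing without a code change.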
<session-id> is a 27-character base62-shaped (ASCII alphanumeric) opaque identifier (e.g. 36WZJbBx05NpO28apIrHaBmmyCJ; the 0, O and I in this example rule out a base58 alphabet). The same <session-id> may appear under multiple IDE slugs concurrently when the same chat conversation is opened from different JetBrains products. These are independent stores, not symlinked, and must be tailed independently.
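A discovery walk may want to filter out directory names that cannot be session ids. The sketch below assumes the identifier is exactly 27 ASCII alphanumeric characters, which is inferred from the single example above rather than from any published format; `looks_like_session_id` is a hypothetical helper name.

```rust
/// Length of the example identifier quoted in this ADR.
const SESSION_ID_LEN: usize = 27;

/// Returns true when `name` has the shape of the observed session ids.
/// Assumption: 27 ASCII letters or digits; the real alphabet is unconfirmed.
pub fn looks_like_session_id(name: &str) -> bool {
    name.len() == SESSION_ID_LEN && name.bytes().all(|b| b.is_ascii_alphanumeric())
}
```

A shape check like this is only a cheap pre-filter; whether a directory is a real session is still decided by the store files inside it.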
Each <session-id>/ directory contains a binary dual-store layout:
<session-id>/
├── 00000000000.xd            # Xodus log file (binary, JetBrains' embedded entity store)
├── xd.lck                    # ASCII Xodus lockfile; first line embeds host name & PID
├── copilot-chat-nitrite.db   # Nitrite NoSQL document store (MVStore-backed). May be absent on legacy sessions.
└── blobs/
    └── version               # 4-byte version stamp (observed: 00 00 00 01)
This is fundamentally different from the VS Code-side contract (ADR-0092 §2.3), which is plain newline-delimited JSON the parser can stream-read with serde_json::from_str. The JetBrains side requires either a Java/Kotlin bridge, a reimplemented Xodus log reader, or — pragmatically — a parse-on-top of strings(1)-extracted byte patterns at the cost of robustness. The choice between these is parser-ticket scope, not ADR scope.
Xodus log (00000000000.xd): written by JetBrains' embedded transactional key/value store, accessed via kotlinx-dnq ORM. Entity types observed across the per-directory schemas:
| Directory | Xodus entity types |
|---|---|
| chat-sessions/ | XdChatSession, XdClient, XdSelectedModel, XdMigration |
| chat-agent-sessions/ | XdAgentSession, XdMigration |
| chat-edit-sessions/ | TBD (same Xodus scaffold; capture in follow-up) |
| bg-agent-sessions/ | TBD (observed only under iu/ so far) |
XdChatSession properties observed: activeAt, createdAt, editorName, editorPluginVersion, editorVersion, modifiedAt, nameSource, projectName. XdSelectedModel properties observed: modelName, scope. XdClient carries a per-install UUID; treat it as PII.
Nitrite store (copilot-chat-nitrite.db, copilot-agent-sessions-nitrite.db, copilot-edit-sessions-nitrite.db): Nitrite NoSQL documents on top of H2 MVStore 2.2.224. Header line (ASCII at offset 0): H:2,block:8,blockSize:1000,.... Document values are wrapped in NitriteDocument (LinkedHashMap-backed) and persisted via standard Java serialization (ac ed 00 05 framing). Collections observed:
| File | Nitrite collection (FQCN) |
|---|---|
| copilot-chat-nitrite.db | com.github.copilot.chat.session.persistence.nitrite.entity.NtSelectedModel |
| copilot-agent-sessions-nitrite.db | NtAgentSession, NtAgentTurn, NtAgentWorkingSetItem |
| copilot-edit-sessions-nitrite.db | TBD |
NtSelectedModel fields observed: scope, modelName, _revision, _modified, $nitrite_id. NtAgentTurn is the most likely candidate for per-message attribution but its inner schema needs a non-empty fixture capture before this ADR can pin it; the parser ticket is the right place for that follow-up.
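The two byte-level signatures mentioned above (the ASCII MVStore header at offset 0 and the `ac ed 00 05` Java-serialization framing) can be recognized without decoding anything. The following is a sketch under stated assumptions: the helper names are hypothetical, and only the `H:2,` prefix of the header is checked because the remainder of the header line is elided in this ADR.

```rust
/// ASCII prefix of the MVStore file header quoted in this ADR ("H:2,block:8,...").
const MVSTORE_HEADER_PREFIX: &[u8] = b"H:2,";

/// Java Object Serialization stream magic (0xACED) followed by version 0x0005.
const JAVA_SERIAL_MAGIC: [u8; 4] = [0xAC, 0xED, 0x00, 0x05];

/// True when the buffer starts with the MVStore header prefix.
pub fn has_mvstore_header(bytes: &[u8]) -> bool {
    bytes.starts_with(MVSTORE_HEADER_PREFIX)
}

/// Byte offsets at which a Java-serialized stream appears to begin.
/// This is a plain pattern scan; it does not validate the stream that follows.
pub fn java_stream_offsets(bytes: &[u8]) -> Vec<usize> {
    let mut offsets = Vec::new();
    for (i, window) in bytes.windows(JAVA_SERIAL_MAGIC.len()).enumerate() {
        if window == JAVA_SERIAL_MAGIC.as_slice() {
            offsets.push(i);
        }
    }
    offsets
}
```

A scan like this can tell a parser where serialized documents probably start, but a four-byte pattern can also occur by chance inside unrelated page data, so the offsets are candidates rather than guarantees.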
Critically, no promptTokens / outputTokens / per-message token counts were observed in either the Xodus or Nitrite schema-string inventory on any captured session — empty or populated. The JetBrains plugin appears to persist session metadata and selected-model state locally, but not the per-turn token telemetry the VS Code-side surface emits to usage.* and result.metadata.* keys (ADR-0092 §2.3).
This has direct consequences for the data contract:
- Local-tail attribution for JetBrains is best-effort metadata-only. A JetBrains parser can attribute "a session existed at this time, with this model selected, under this project" but cannot dollarize the per-turn cost from local data alone.
- GitHub Billing API reconciliation is the primary token source for JetBrains sessions, not the supplementary truth-up role it plays for VS Code-family sessions. The reconciliation path in crates/budi-core/src/sync/copilot_chat_billing.rs already handles individually-licensed users; org-managed-license JetBrains users will see zero billing-API token rows in the same way they already do on VS Code, since the upstream API itself is empty for those users (ADR-0092 §3).
- The statusline rolling 1d/7d/30d aggregate will not reflect JetBrains usage in real time the way it does for VS Code-family local tails. JetBrains contributions land only after the next billing-API reconciliation pass. This is an acceptable trade for 8.4.x: the alternative (parse the binary stores) is parser-ticket scope and may be revisited in 8.5+ if the local stores turn out to carry token data in fields not yet inspected.
If a future inspection of NtAgentTurn reveals token fields, this section is amended in lockstep with the parser change that consumes them — the surface contract and the code must never disagree.
The classifier at crates/budi-core/src/surface.rs::infer_copilot_chat_surface already returns surface::JETBRAINS for paths under ~/.config/github-copilot/ (the unit test at crates/budi-core/src/providers/copilot_chat.rs::surface_jetbrains_path_classifier_returns_jetbrains_placeholder pins this). This ADR does not change classifier behavior; it only describes the storage shape the discovery code will walk before the classifier ever sees a path.
The "placeholder" framing in the existing classifier comments is retained until a JetBrains-side parser actually emits rows. Until then, the placeholder language honestly describes the state of the system: the classifier knows what surface=jetbrains means; the discovery code has the path; the parser is the missing piece.
- A new fixture (crates/budi-core/src/providers/copilot_chat/fixtures/jetbrains_copilot_1_5_53_243_empty_session/ plus .shape.md and .expected.json) anchors the next parser ticket against ground truth instead of synthetic shapes.
- The JetBrains-side watch_roots() extension can land as a small follow-up against ADR-0092 §2.1; the path root is now pinned here.
- The statusline-only "Partial" status on the JetBrains row of the README's "Supported agents" table is the correct level of honesty until the parser lands. When local-store parsing or billing-API-only reconciliation produces non-zero JetBrains rows in production, that row promotes to "Supported" in the same PR as the parser merges.
- ADR-0092's §2.1 path-root contract continues to be the VS Code-family side's authoritative spec. This ADR sits alongside it as the JetBrains-side companion rather than amending §2.1 in place, because the binary dual-store shape diverges far enough from the plain-JSON contract that section-level amendment would obscure rather than clarify.
- Does NtAgentTurn carry token counts in its serialized document body, or are local-only files strictly metadata?
- What's the xd.lck concurrency contract: does Copilot for JetBrains release the Xodus lock when idle, allowing read-only opens while the IDE is running, or must the daemon defer reads until IDE shutdown?
- Confirm Windows path (%APPDATA% vs %LOCALAPPDATA%) once a Windows capture is feasible.
- Confirm the IDE-slug discovery pattern (directory-listing under ~/.config/github-copilot/ excluding intellij/ and known top-level files) is forward-compatible with new JetBrains products as the plugin ships to them.
Amendment 2026-05-11 — #757: dual-store probe accepts either .xd or .nitrite.db as the existence marker
Post-acceptance smoke testing in v8.4.4 surfaced a behavioral fact §4 did not anticipate: recent versions of the GitHub Copilot for JetBrains plugin skip the Xodus log entirely on new sessions and persist conversation state to the Nitrite store only. A real-world chat-sessions/<session-id>/ captured on 2026-05-11 carried copilot-chat-nitrite.db (mtime matched the most recent prompt) and no 00000000000.xd at all. The original parser shape — which bailed when .xd was missing — therefore emitted zero rows for every post-migration JetBrains session, leaving the surface=jetbrains rollup at $0.00 even when reconciliation rows existed upstream.
The parser's existence-check accepts either of the two stores. Probe order:
- 00000000000.xd: legacy shape; if present and populated, use it (this preserves the pre-#757 behavior for old sessions verbatim).
- copilot-chat-nitrite.db, copilot-agent-sessions-nitrite.db, copilot-chat-edit-sessions-nitrite.db: current shape; first hit wins.
Either path supplies the same one-row-per-populated-session signal: a single assistant-role ParsedMessage with surface=jetbrains, zero token counts (cost reconciles via the GitHub Billing API per §5 above), and a deterministic UUID derived from the session-id + path.
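The probe order above can be sketched as a small function. This is an illustration only: `SessionStore` and `probe_session_dir` are hypothetical names, the file names are copied from the list above, and "populated" is approximated as "a non-empty regular file", which is an assumption rather than the parser's actual test.

```rust
use std::path::{Path, PathBuf};

/// Which store the existence check settled on (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum SessionStore {
    Xodus(PathBuf),
    Nitrite(PathBuf),
}

const XODUS_LOG: &str = "00000000000.xd";

/// Nitrite candidates in probe order; first hit wins.
const NITRITE_DBS: [&str; 3] = [
    "copilot-chat-nitrite.db",
    "copilot-agent-sessions-nitrite.db",
    "copilot-chat-edit-sessions-nitrite.db",
];

/// Probes a `<session-id>/` directory: legacy Xodus log first, then Nitrite.
pub fn probe_session_dir(dir: &Path) -> Option<SessionStore> {
    let xd = dir.join(XODUS_LOG);
    // Assumption: a zero-length .xd does not count as populated.
    let xd_populated = std::fs::metadata(&xd)
        .map(|m| m.is_file() && m.len() > 0)
        .unwrap_or(false);
    if xd_populated {
        return Some(SessionStore::Xodus(xd));
    }
    NITRITE_DBS
        .iter()
        .map(|name| dir.join(name))
        .find(|path| path.is_file())
        .map(SessionStore::Nitrite)
}
```

Returning `None` for a directory with neither store keeps the "no rows" case explicit instead of treating a missing `.xd` as an error.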
Nitrite writes its collection class names verbatim into the MVStore catalog. The byte-scan looks for these suffixes (the FQCN prefix is the same for every entry — only the class-name tail is matched so the scan stays robust to future Java-package renames):
| Marker | Meaning |
|---|---|
| NtChatSession | Chat session record (chat-sessions/) |
| NtAgentSession | Agent session record (chat-agent-sessions/) |
| NtEditSession | Edit session record (chat-edit-sessions/) |
| NtTurn | Per-turn record under a chat session |
| NtAgentTurn | Per-turn record under an agent session |
| NtEditTurn | Per-turn record under an edit session |
NtSelectedModel is not in this set: it is the per-session model preference Nitrite writes the moment the user opens a chat pane, before any prompt has been sent. Treating it as a populated marker would synthesize fake assistant turns for every empty chat tab.
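A naive version of the marker byte-scan might look like this. It is a sketch, assuming a plain substring search over the whole file contents; the constant and function names are hypothetical, and the real scan may restrict itself to the MVStore catalog region.

```rust
/// Class-name tails that mark a session as populated (from the table above).
/// NtSelectedModel is deliberately absent.
const POPULATED_MARKERS: [&[u8]; 6] = [
    b"NtChatSession",
    b"NtAgentSession",
    b"NtEditSession",
    b"NtTurn",
    b"NtAgentTurn",
    b"NtEditTurn",
];

/// Plain byte-substring search; `needle` must be non-empty.
fn contains_bytes(haystack: &[u8], needle: &[u8]) -> bool {
    haystack.windows(needle.len()).any(|window| window == needle)
}

/// True when any populated-session marker occurs in the Nitrite file bytes.
pub fn is_populated_nitrite(bytes: &[u8]) -> bool {
    POPULATED_MARKERS
        .iter()
        .any(|marker| contains_bytes(bytes, marker))
}
```

Matching only the class-name tail means a file that mentions nothing but `...entity.NtSelectedModel` stays classified as empty, which is the behavior the exclusion above is meant to guarantee.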
The parser still does not extract token counts from Nitrite — §5's conclusion stands. The byte-scan is a "this session is non-empty" signal, not a full MVStore + Java-serialization decoder. Full per-turn extraction (parsing the BSON-like document bodies into role / content / tokens / model) remains parser-ticket scope and pairs naturally with the open question on NtAgentTurn above. The amendment closes the regression where Nitrite-only sessions emitted no rows at all; deeper extraction is a future ADR amendment.
Phase 1 of the JetBrains repo-attribution work (#766, v8.4.7 / v8.4.8) shipped a byte-walker for the Xodus log's XdChatSession.projectName property. On the 8.4.8 smoke-test machine that path resolved 3 of 23 surface=jetbrains sessions to a repo_id. The remaining 20 are agent sessions whose data lives only in copilot-agent-sessions-nitrite.db with no .xd (or with a .xd that carries no projectName property), so Phase 1 cannot reach them.
Phase 2 (#778) extends the byte-walker into the Nitrite store to recover whatever file:// URIs the plugin happened to persist. The recovered URIs feed a longest-common-prefix → .git-walk → resolve_repo_id chain that lights up additional sessions without requiring a real MVStore + Java-serialization decoder.
Across 98 real Nitrite DBs surveyed on 2026-05-12, only 3 carried any file:// token at all. Where present, the URIs live inside an escape-encoded JSON blob in NtAgentTurn.stringContent — specifically the currentFileUri key of the per-turn model-state snapshot:
currentFileUri\\\":\\\"file:///Users/me/_projects/Verkada-Web/src/x.tsx\\\",
The byte-walker scans for the literal file:// token, reads bytes until the next non-path-safe character (backslash, quote, brace, control byte, etc.), percent-decodes the result, and dedupes. The output feeds resolve_workspace_from_paths which:
- Computes the longest common directory prefix across all recovered URIs (filename components are dropped).
- Walks the prefix upward until a .git directory is found.
- Resolves the checkout via crate::repo_id::resolve_repo_id and reads the current branch from .git/HEAD.
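The scan and the first resolver step can be sketched as below. This is a hedged illustration, not the shipped byte-walker: `extract_file_paths`, `common_dir_prefix` and the exact set of terminator bytes are assumptions, and the `.git` walk and `resolve_repo_id` call are left out because their signatures are not given here.

```rust
use std::collections::BTreeSet;
use std::path::{Path, PathBuf};

const TOKEN: &[u8] = b"file://";

/// Assumption: a path byte is printable ASCII other than backslash, quote or brace.
fn is_path_safe(b: u8) -> bool {
    b.is_ascii_graphic() && !matches!(b, b'\\' | b'"' | b'{' | b'}')
}

fn hex_val(b: u8) -> Option<u8> {
    (b as char).to_digit(16).map(|d| d as u8)
}

/// Decodes %XX escapes; malformed escapes are kept verbatim.
fn percent_decode(raw: &[u8]) -> String {
    let mut out = Vec::with_capacity(raw.len());
    let mut i = 0;
    while i < raw.len() {
        if raw[i] == b'%' && i + 2 < raw.len() {
            if let (Some(hi), Some(lo)) = (hex_val(raw[i + 1]), hex_val(raw[i + 2])) {
                out.push(hi * 16 + lo);
                i += 3;
                continue;
            }
        }
        out.push(raw[i]);
        i += 1;
    }
    String::from_utf8_lossy(&out).into_owned()
}

/// Scans for `file://`, reads up to the first non-path-safe byte,
/// percent-decodes, and returns the deduplicated paths in sorted order.
pub fn extract_file_paths(bytes: &[u8]) -> Vec<String> {
    let mut seen = BTreeSet::new();
    let mut i = 0;
    while i + TOKEN.len() <= bytes.len() {
        if &bytes[i..i + TOKEN.len()] == TOKEN {
            let start = i + TOKEN.len();
            let len = bytes[start..].iter().take_while(|b| is_path_safe(**b)).count();
            if len > 0 {
                seen.insert(percent_decode(&bytes[start..start + len]));
            }
            i = start + len;
        } else {
            i += 1;
        }
    }
    seen.into_iter().collect()
}

/// Longest common directory prefix; the filename component of each path is dropped.
pub fn common_dir_prefix(paths: &[String]) -> Option<PathBuf> {
    let mut dirs = paths.iter().filter_map(|p| Path::new(p).parent());
    let mut prefix = dirs.next()?.to_path_buf();
    for dir in dirs {
        prefix = prefix
            .components()
            .zip(dir.components())
            .take_while(|(a, b)| a == b)
            .map(|(a, _)| a)
            .collect();
    }
    Some(prefix)
}
```

Comparing whole path components rather than raw characters avoids treating `/repo/app` and `/repo/apple` as sharing the directory `/repo/app`. Windows-style URIs such as `file:///C:/...` would come out with a leading slash in this sketch and would need extra handling.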
The Phase 2 byte-walker only fires on sessions that happen to carry a currentFileUri model-state snapshot. The ticket's original assumption — that NtAgentWorkingSetItem documents would carry per-file URIs in a top-level stringContent TC_STRING — does not hold up on real fixtures. Inspecting the working-set markers in 8.4.8 captures shows each NtAgentWorkingSetItem instance only persists created_at / uuid / last_modified_at plus an opaque _revision cursor; the actual file-reference payload (presumably the fileUrl index field referenced by the Nitrite catalog) lives in a separate MVStore segment that the byte walker cannot reach.
Deferred to a future ADR amendment:
- Full MVStore page-segment + Java-ObjectInputStream decoder for NtAgentWorkingSetItem documents, which would let the resolver light up every agent session that has ever opened a file, not just those that captured a currentFileUri model-state snapshot.
- LZF-decompression support for MVStore pages, should a future plugin version begin writing compressed pages. The current heuristic looks for format:3 (uncompressed default) and proceeds; pages outside that format will not yield URIs.
The #778 ticket asked for "≥ 12 of 23" sessions resolved post-Phase-2. The honest delivered bar against this data shape is +1 session beyond Phase 1 — the one additional agent session that happens to carry a currentFileUri snapshot for a repo not already covered by Phase 1. The data simply does not contain enough file:// tokens to reach the speculative bar. Sessions that do not write a currentFileUri cleanly fall through to the Phase 1 projectName heuristic (where present) or remain with repo_id = NULL (the honest "we don't know" signal — the dashboard renders Repo: (unknown) rather than guessing).
Fixture: crates/budi-core/src/providers/copilot_chat/fixtures/jetbrains_nitrite_working_set_phase2/ (size-preserving redacted from a real session, see the matching .shape.md).
The 8.5.2 polish release (umbrella #798) prompted a keep-vs-retire decision for the two byte-walker phases shipped in v8.4.x / v8.5.0, ahead of the Phase 3 MVStore + Java-serialization decoder planned for 8.6.0 (#789).
The data the decision rests on (from the 8.4.8 smoke-test machine):
- Phase 1 (PR #766, Xodus XdChatSession.projectName byte-scan): resolved 3 of 23 surface=jetbrains sessions.
- Phase 2 (PR #778, Nitrite currentFileUri JSON-blob byte-scan): resolved 0 additional sessions over Phase 1 in the production capture (the §"Amendment 2026-05-12" "+1 session" figure was the honest hypothetical against the data shape; it was not observed end-to-end in the post-#778 smoke run).
Both byte-walkers are retired conditionally — kept in tree, but every entry-point function, struct, and call-site block carries a retire-with: #789 annotation so the Phase 3 implementor knows exactly what gets deleted when the real decoder lands. Rationale per phase:
- Phase 1: real value (3-of-23) as a low-cost fast-path. Even after Phase 3 ships, it may remain useful if the decoder is heavier than the heuristic on cold caches; the call is deferred to whoever lands #789. Until then, do not delete.
- Phase 2: marginal-to-zero value, but deleting it now without re-running the original smoke capture risks an unmeasured regression on the +1 hypothetical session (or on any future plugin version whose default-on currentFileUri snapshot shape we haven't seen yet). The Phase 3 decoder will subsume it; deletion belongs in that PR, not this one.
A retire-with: #789 comment promises:
- The annotated function/block is eligible for deletion in the PR that closes #789.
- The Phase 3 implementor is not obligated to delete it — they may keep it as a fast-path if it earns its keep against the decoded baseline. The "retire-with" is a permission to delete, not a mandate.
- Removing an annotation without deleting the code requires a fresh ADR amendment explaining why the heuristic graduated from "retire-with" to "permanent".
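When the Phase 3 PR is prepared, it can help to list every annotated site mechanically. The helper below is a hypothetical audit sketch, not part of budi: it assumes the annotation appears in source text as the literal `retire-with: #<issue>` and returns the 1-based line numbers that carry it.

```rust
/// Returns 1-based line numbers in `source` that carry a
/// `retire-with: #<issue>` annotation for exactly this issue number.
pub fn retire_annotation_lines(source: &str, issue: u32) -> Vec<usize> {
    let needle = format!("retire-with: #{issue}");
    source
        .lines()
        .enumerate()
        .filter(|(_, line)| {
            line.match_indices(&needle).any(|(at, matched)| {
                // Reject longer issue numbers, e.g. #7890 when looking for #789.
                !line[at + matched.len()..].starts_with(|c: char| c.is_ascii_digit())
            })
        })
        .map(|(index, _)| index + 1)
        .collect()
}
```

Run over the provider source, the result gives the Phase 3 implementor a checklist of blocks that are eligible for deletion, which can be compared against the annotated sites recorded in this ADR.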
Phase 1:
- extract_xodus_project_name (function-level rustdoc)
- the xd_candidate block inside parse_session_dir

Phase 2:
- extract_nitrite_workspace_paths (function-level rustdoc)
- resolve_phase2_workspace (function-level rustdoc; companion resolver)
- the phase2 block inside parse_session_dir
- No smoke-test re-run was performed for this release. The resolution numbers are quoted from the 8.4.8 and #778 captures, not freshly measured against 8.5.1. Re-measurement happens as part of the Phase 3 PR, against the same fixture set, so the decoder's delta over the heuristics is honestly comparable.
- Phase 1 and Phase 2 are not "frozen": small fixes (e.g. new path-safe terminator bytes, fixture additions) remain in scope. The retire-with annotation marks the block, not the behavior — bug-fixes inside an annotated block do not need a fresh ADR amendment.
Last verified against code on 2026-05-14.
Wiki cross-references audited on 2026-05-23 — broken relative paths from the retired in-tree docs/adr/ directory were rewritten to wiki links; references to historical ADRs 0081 / 0082 / 0086 / 0088 were left as plain text. Content claims against current source were not re-verified in this pass.