feat: parse GitHub.copilot-chat/transcripts/*.jsonl event-stream format #64

@hora7ce

Summary

VS Code with the GitHub Copilot Chat extension writes session data in two distinct locations:

  1. workspaceStorage/<hash>/GitHub.copilot-chat/ — the existing format already parsed by the extension (per-session JSON blobs).
  2. GitHub.copilot-chat/transcripts/*.jsonl — a newer event-stream format written at the user-data level, alongside the workspace storage directory.

The extension currently only reads format 1. Format 2 is silently ignored, so sessions recorded there never appear in the dashboard.

The transcript format

Each .jsonl file is a newline-delimited stream of typed events:

type                      Meaning
session.start             New conversation begins
user.message              User turn
assistant.message         AI response
tool.execution_start      Tool call begins (includes toolName)
tool.execution_complete   Tool call finishes
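
To make the event list concrete, here is a minimal sketch of how a single line of a transcript could be typed and validated. Only the `type` values and the `toolName` field come from this issue; the `TranscriptEvent` interface, the `parseEventLine` helper, and the open-ended index signature for unknown payload fields are assumptions made for illustration.

```typescript
// The five event types listed in this issue.
type TranscriptEventType =
  | "session.start"
  | "user.message"
  | "assistant.message"
  | "tool.execution_start"
  | "tool.execution_complete";

// Hypothetical event shape: only `type` and `toolName` are documented here,
// so every other payload field is left as `unknown`.
interface TranscriptEvent {
  type: TranscriptEventType;
  toolName?: string;
  [key: string]: unknown;
}

const KNOWN_TYPES: ReadonlySet<string> = new Set([
  "session.start",
  "user.message",
  "assistant.message",
  "tool.execution_start",
  "tool.execution_complete",
]);

// Parses one line of a .jsonl file. Returns null for blank lines,
// invalid JSON, non-object values, and unrecognized event types.
function parseEventLine(line: string): TranscriptEvent | null {
  const trimmed = line.trim();
  if (trimmed === "") return null;
  try {
    const value: unknown = JSON.parse(trimmed);
    if (typeof value !== "object" || value === null) return null;
    const type = (value as { type?: unknown }).type;
    if (typeof type !== "string" || !KNOWN_TYPES.has(type)) return null;
    return value as TranscriptEvent;
  } catch {
    return null; // malformed JSON
  }
}
```

Returning null instead of throwing keeps one bad line from aborting the whole file, which fits the "return null gracefully" criterion further down.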

Who is affected

  • WSL, VS Code Server, Remote SSH, and Dev Container users are the most likely to hit this. Their workspaceStorage path is nested under ~/.vscode-server/ (discovery of that path was fixed in #63, "fix: discover .vscode-server workspaceStorage dirs for WSL / Remote SSH / devcontainer"), but their transcripts live at ~/.vscode-server/data/User/GitHub.copilot-chat/transcripts/, which is still not read.
  • Desktop users writing to the standard user-data directory are equally affected if their sessions happen to be stored in the newer format.
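
Since the transcripts directory sits alongside workspaceStorage, one possible way to locate it is to derive it from a workspaceStorage path that has already been discovered. The sketch below assumes exactly that sibling layout, as in the ~/.vscode-server example above. The helper name `transcriptsDirFor` is hypothetical, and `path.posix` is used only so the result is the same on every platform.

```typescript
import * as path from "node:path";

// Hypothetical helper: given ".../User/workspaceStorage", return the sibling
// ".../User/GitHub.copilot-chat/transcripts". Assumes the layout described
// in this issue; it does not check that the directory exists.
function transcriptsDirFor(workspaceStorageDir: string): string {
  const userDir = path.posix.dirname(workspaceStorageDir);
  return path.posix.join(userDir, "GitHub.copilot-chat", "transcripts");
}

const remoteDir = transcriptsDirFor("~/.vscode-server/data/User/workspaceStorage");
console.log(remoteDir);
```

A real implementation would likely use the platform-specific `path` functions and expand `~` before touching the file system.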

Proposed solution

Add two functions to parser-vscode.ts:

  • listTranscriptFiles(dir: string): string[] — finds all *.jsonl files under a transcripts/ subfolder.
  • parseTranscriptFile(filePath: string): SessionData | null — reads the event stream and returns a SessionData object (same shape the rest of the pipeline expects), grouping events into turns and deduplicating tool names.
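
As a rough illustration of the two functions, the sketch below shows one way they could behave. It is not the project's implementation: the real SessionData shape is not shown in this issue, so the `SessionData` and `TurnSketch` interfaces here are hypothetical stand-ins, and `parseTranscriptText` is an assumed pure helper split out so the grouping logic can be tested without touching disk. The sketch also assumes a turn starts at each user.message and that the search under transcripts/ is non-recursive.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical stand-ins for the project's real types.
interface TurnSketch {
  toolNames: string[]; // deduplicated, in first-seen order
  assistantMessageCount: number;
}
interface SessionData {
  requests: TurnSketch[];
}

// Lists *.jsonl files directly inside <dir>/transcripts.
// Returns [] when the directory is missing or unreadable.
function listTranscriptFiles(dir: string): string[] {
  const transcriptsDir = path.join(dir, "transcripts");
  try {
    return fs
      .readdirSync(transcriptsDir, { withFileTypes: true })
      .filter((entry) => entry.isFile() && entry.name.endsWith(".jsonl"))
      .map((entry) => path.join(transcriptsDir, entry.name))
      .sort();
  } catch {
    return [];
  }
}

// Groups events into turns. Malformed lines are skipped; events that
// arrive before the first user.message are ignored.
function parseTranscriptText(text: string): SessionData | null {
  const requests: TurnSketch[] = [];
  let current: TurnSketch | null = null;

  for (const line of text.split(/\r?\n/)) {
    if (line.trim() === "") continue;
    let parsed: unknown;
    try {
      parsed = JSON.parse(line);
    } catch {
      continue;
    }
    if (typeof parsed !== "object" || parsed === null) continue;
    const event = parsed as { type?: unknown; toolName?: unknown };

    if (event.type === "user.message") {
      current = { toolNames: [], assistantMessageCount: 0 };
      requests.push(current);
    } else if (event.type === "assistant.message" && current !== null) {
      current.assistantMessageCount += 1;
    } else if (event.type === "tool.execution_start" && current !== null) {
      const name = event.toolName;
      if (typeof name === "string" && !current.toolNames.includes(name)) {
        current.toolNames.push(name);
      }
    }
  }

  // No user turns at all: treat the file as empty.
  return requests.length > 0 ? { requests } : null;
}

// Reads a transcript from disk; unreadable files yield null.
function parseTranscriptFile(filePath: string): SessionData | null {
  try {
    return parseTranscriptText(fs.readFileSync(filePath, "utf8"));
  } catch {
    return null;
  }
}
```

Keeping the grouping logic in a pure function means the multi-turn, empty-session, and deduplication cases can each be covered with a short inline string instead of fixture files.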

Wire both into the existing processWorkspaceEntry / processWorkspaceEntryAsync functions so the dashboard picks up sessions from either format transparently.

Acceptance criteria

  • parseTranscriptFile correctly maps event-stream turns to SessionData.requests[]
  • Tool names are deduplicated per turn
  • Empty/malformed files return null gracefully
  • Existing tests remain green
  • New unit tests cover: multi-turn grouping, empty session, tool deduplication, and full flow
