feat(bridge): Phase 2 — Vision + Context (GB screenshot → structured state) #2

@gHashTag

TASK: Phase 2 — Vision + Context

Status from Phase 0+1

  • but mcp running — 66 tools loaded
  • ✅ GitButler MCP server running
  • tools/list returns a 500 error (SDK issue in @hono/mcp@^0.2.3)
  • trios-mcp-bridge scaffolded at packages/browseros-agent/apps/trios-mcp-bridge/

Mission

Fix the SDK issue and implement the vision pipeline:
screenshot → LLM → structured GitButler state JSON

PHI LOOP: edit spec → seal hash → gen → test → verdict → experience → skill commit → git commit


Step 1: Fix @hono/mcp SDK 500 error

Diagnose and fix tools/list returning 500.

# Check current version
cat packages/browseros-agent/apps/trios-mcp-bridge/package.json | grep hono

# Option A: pin to working version
npm install @hono/mcp@0.1.x

# Option B: switch to official MCP SDK
npm install @modelcontextprotocol/sdk

# Verify fix
curl http://localhost:9005/mcp/tools/list
# Expected: JSON array of tool names

Deliverable: tools/list returns 200 + JSON list. Log fix to .trinity.
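
In MCP, tools/list is a JSON-RPC 2.0 method rather than a REST path, so a plain GET may not exercise it, depending on how the bridge routes requests. The following is a minimal sketch of a POST-based check. It assumes the bridge exposes a single MCP endpoint at http://localhost:9005/mcp (this path is a guess; only /mcp/tools/list appears above) and that the server answers with plain JSON without requiring a prior initialize handshake.

```typescript
// Sketch only: the endpoint URL is an assumption, not confirmed by the bridge code.
const MCP_ENDPOINT = 'http://localhost:9005/mcp';

interface JsonRpcRequest {
  jsonrpc: '2.0';
  id: number;
  method: string;
  params?: Record<string, unknown>;
}

// Build a JSON-RPC 2.0 request for the MCP tools/list method.
export function buildToolsListRequest(id: number): JsonRpcRequest {
  return { jsonrpc: '2.0', id, method: 'tools/list' };
}

// Extract tool names from a tools/list response; throws on a JSON-RPC error.
export function toolNamesFromResponse(body: unknown): string[] {
  const res = body as {
    error?: { message?: string };
    result?: { tools?: { name: string }[] };
  };
  if (res.error) {
    throw new Error(`tools/list failed: ${res.error.message ?? 'unknown error'}`);
  }
  return (res.result?.tools ?? []).map((t) => t.name);
}

// POST the request and return the tool names.
export async function listTools(endpoint: string = MCP_ENDPOINT): Promise<string[]> {
  const response = await fetch(endpoint, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Accept: 'application/json, text/event-stream',
    },
    body: JSON.stringify(buildToolsListRequest(1)),
  });
  if (!response.ok) throw new Error(`HTTP ${response.status}`);
  return toolNamesFromResponse(await response.json());
}
```

If the server replies with an event stream instead of JSON, or requires a session, listTools would need to be adapted; the two pure helpers stay the same.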


Step 2: take_gitbutler_screenshot tool

Calls BrowserOS CDP to capture the GitButler app window:

// tools/take_gitbutler_screenshot.ts
// Signature only; the implementation is part of this task.
export declare function takeGitButlerScreenshot(): Promise<{
  base64: string;                   // PNG, base64-encoded
  timestamp: string;                // ISO 8601
  window: 'gitbutler' | 'unknown';  // which window was captured
}>;

Logic:

  1. Find GitButler window via CDP tab list (match title GitButler)
  2. Capture screenshot via Page.captureScreenshot
  3. Return base64 PNG

Fallback: if CDP fails → try screencapture -x (macOS) as secondary, log warning.
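
The lookup and capture steps above can be sketched as three pure helpers. This is an illustration under assumptions: the target list is taken to come from the CDP HTTP endpoint /json/list, the helper names are hypothetical, and the WebSocket transport that actually sends the command is left out.

```typescript
// Shape of one entry from the CDP HTTP target list (GET /json/list).
interface CdpTarget {
  id: string;
  type: string;
  title: string;
  webSocketDebuggerUrl?: string;
}

interface ScreenshotResult {
  base64: string;
  timestamp: string;
  window: 'gitbutler' | 'unknown';
}

// Pick the first target whose title mentions GitButler (case-insensitive).
export function pickGitButlerTarget(targets: CdpTarget[]): CdpTarget | undefined {
  return targets.find((t) => t.title.toLowerCase().includes('gitbutler'));
}

// Build the CDP command that asks the page for a PNG screenshot.
export function captureCommand(id: number) {
  return { id, method: 'Page.captureScreenshot', params: { format: 'png' } };
}

// Wrap the base64 payload returned by Page.captureScreenshot in the tool's
// return shape. A missing payload is reported as window 'unknown'.
export function toScreenshotResult(
  data: string | undefined,
  now: Date = new Date(),
): ScreenshotResult {
  return {
    base64: data ?? '',
    timestamp: now.toISOString(),
    window: data ? 'gitbutler' : 'unknown',
  };
}
```

Keeping these helpers free of I/O means the title matching and result shaping can be unit-tested without a running browser.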


Step 3: analyze_gitbutler_ui tool (vision core)

Sends screenshot to LLM, returns structured state:

// tools/analyze_gitbutler_ui.ts
export interface FileChange { path: string; status: 'modified' | 'added' | 'deleted'; staged: boolean }
export interface Stack { name: string; commits: number }

export interface GitButlerUIState {
  activeBranch: string | null;    // e.g. "feature-xyz"; null if it cannot be determined
  changedFiles: FileChange[];
  stacks: Stack[];
  hasUncommitted: boolean;
  rawSummary: string;             // LLM free-form summary
}

// Signature only; the implementation is part of this task.
export declare function analyzeGitButlerUI(
  base64png: string
): Promise<GitButlerUIState>;

LLM prompt (system):

You are analyzing a GitButler UI screenshot.
Return ONLY valid JSON matching the GitButlerUIState schema.
Extract: active virtual branch name, list of changed files with their staged status, stack names.
If you cannot determine a field, use null.
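
Models do not always honor "ONLY valid JSON", so the reply is worth validating before it is treated as GitButlerUIState. Below is a minimal sketch of such a parser. The function name parseUIState, the nullable activeBranch, and the fallback for a missing hasUncommitted are assumptions made for illustration, not part of the spec above.

```typescript
type FileStatus = 'modified' | 'added' | 'deleted';
interface FileChange { path: string; status: FileStatus; staged: boolean }
interface Stack { name: string; commits: number }
interface GitButlerUIState {
  activeBranch: string | null;
  changedFiles: FileChange[];
  stacks: Stack[];
  hasUncommitted: boolean;
  rawSummary: string;
}

const STATUSES: readonly string[] = ['modified', 'added', 'deleted'];
const FENCE = '`'.repeat(3); // a Markdown code fence, built without a literal

// Remove a surrounding Markdown code fence if the model added one.
function stripFence(text: string): string {
  const pattern = new RegExp('^' + FENCE + '(?:json)?\\s*([\\s\\S]*?)\\s*' + FENCE + '$');
  const match = text.trim().match(pattern);
  return match ? match[1] : text.trim();
}

// Parse and validate the LLM reply; malformed entries are dropped, and
// JSON.parse throws if the reply is not JSON at all.
export function parseUIState(llmOutput: string): GitButlerUIState {
  const raw = JSON.parse(stripFence(llmOutput)) as Record<string, unknown>;
  const files: any[] = Array.isArray(raw.changedFiles) ? raw.changedFiles : [];
  const stacks: any[] = Array.isArray(raw.stacks) ? raw.stacks : [];

  const changedFiles = files.filter(
    (f): f is FileChange =>
      typeof f?.path === 'string' &&
      STATUSES.includes(f?.status) &&
      typeof f?.staged === 'boolean',
  );
  const validStacks = stacks.filter(
    (s): s is Stack => typeof s?.name === 'string' && typeof s?.commits === 'number',
  );

  return {
    activeBranch: typeof raw.activeBranch === 'string' ? raw.activeBranch : null,
    changedFiles,
    stacks: validStacks,
    // Assumption: if the model omits the flag, infer it from the file list.
    hasUncommitted:
      typeof raw.hasUncommitted === 'boolean' ? raw.hasUncommitted : changedFiles.length > 0,
    rawSummary: typeof raw.rawSummary === 'string' ? raw.rawSummary : '',
  };
}
```

Dropping invalid entries instead of throwing is one possible design; a stricter variant could reject the whole reply and retry the LLM call.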

Step 4: commit_visible_changes tool (high-level)

Chains screenshot → analyze → stage → commit:

// tools/commit_visible_changes.ts
// Signature only; the implementation is part of this task.
export declare function commitVisibleChanges(params: {
  message: string;
  stageAll?: boolean;  // default: true (stage all unstaged files)
}): Promise<{ sha: string; branch: string; filesCommitted: string[] }>;

Flow:

  1. takeGitButlerScreenshot()
  2. analyzeGitButlerUI(screenshot) → get changed files
  3. Call but mcp tool stage_changes with file list
  4. Call but mcp tool create_commit with message
  5. Return commit SHA + summary
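
The flow above can be sketched with injected dependencies so that it is testable without a live GitButler instance. The Deps interface, the argument names files and message, and the sha field on the create_commit result are all hypothetical; the real but mcp tool schemas should be checked via tools/list.

```typescript
interface FileChange { path: string; status: 'modified' | 'added' | 'deleted'; staged: boolean }
interface UIState { activeBranch: string | null; changedFiles: FileChange[] }

// Hypothetical dependency bundle so the flow can be exercised with stubs.
interface Deps {
  takeScreenshot(): Promise<{ base64: string }>;
  analyze(base64png: string): Promise<UIState>;
  callTool(name: string, args: Record<string, unknown>): Promise<Record<string, unknown>>;
}

export async function commitVisibleChanges(
  params: { message: string; stageAll?: boolean },
  deps: Deps,
): Promise<{ sha: string; branch: string; filesCommitted: string[] }> {
  const stageAll = params.stageAll ?? true;

  // Steps 1 and 2: capture the window and extract the visible state.
  const shot = await deps.takeScreenshot();
  const state = await deps.analyze(shot.base64);

  // Step 3: stage whatever is not staged yet (only when stageAll is set).
  const unstaged = state.changedFiles.filter((f) => !f.staged).map((f) => f.path);
  if (stageAll && unstaged.length > 0) {
    await deps.callTool('stage_changes', { files: unstaged });
  }

  // Files that end up in the commit: everything if stageAll, else staged only.
  const filesCommitted = state.changedFiles
    .filter((f) => stageAll || f.staged)
    .map((f) => f.path);
  if (filesCommitted.length === 0) {
    throw new Error('No changes visible to commit');
  }

  // Steps 4 and 5: commit and report.
  const result = await deps.callTool('create_commit', { message: params.message });
  return {
    sha: String(result.sha ?? ''),
    branch: state.activeBranch ?? 'unknown',
    filesCommitted,
  };
}
```

Refusing to commit when no files are visible guards against an empty commit caused by a misread screenshot.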

Step 5: Tests

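// tests/vision.test.ts
import { readFileSync } from 'node:fs';
import { describe, it, expect } from 'vitest'; // or 'bun:test' when running bun test
import { analyzeGitButlerUI } from '../tools/analyze_gitbutler_ui';

describe('analyze_gitbutler_ui', () => {
  it('parses staged files from screenshot', async () => {
    // Read the fixture PNG directly as a base64 string.
    const fixture = readFileSync('tests/fixtures/gb_screenshot.png', 'base64');
    const state = await analyzeGitButlerUI(fixture);
    expect(state.changedFiles.length).toBeGreaterThan(0);
    expect(state.activeBranch).toBeTruthy();
  });
});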

Add fixture: save a real GitButler screenshot to tests/fixtures/gb_screenshot.png.


Experience log

echo "[$(date -u +%Y-%m-%dT%H:%M:%SZ)] PHASE2: vision pipeline complete | tools: take_screenshot + analyze_ui + commit_visible | SDK fix: @hono/mcp patched" \
  >> experience/trios/phase2_vision_context.trinity
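
Since Law L7 rules out .sh files, the same log line can also be written from TypeScript. This is a small sketch using Node's fs module; the helper names are hypothetical, and the timestamp format mirrors the date -u call above.

```typescript
import { appendFileSync, mkdirSync } from 'node:fs';
import { dirname } from 'node:path';

// Format a UTC timestamp like `date -u +%Y-%m-%dT%H:%M:%SZ` (no milliseconds).
export function utcStamp(now: Date = new Date()): string {
  return now.toISOString().replace(/\.\d{3}Z$/, 'Z');
}

// Build one log line in the "[timestamp] message" shape used above.
export function experienceLine(message: string, now: Date = new Date()): string {
  return `[${utcStamp(now)}] ${message}\n`;
}

// Append the line to the .trinity file, creating the directory if needed.
export function appendExperience(path: string, message: string): void {
  mkdirSync(dirname(path), { recursive: true });
  appendFileSync(path, experienceLine(message));
}
```

A call such as appendExperience('experience/trios/phase2_vision_context.trinity', 'PHASE2: vision pipeline complete') would then replace the echo command.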

Laws

Law   Rule
L1    Closes #2 in every PR
L3    ASCII-only, English identifiers
L4    Tests for every tool
L7    NO .sh files

Success Criteria

  • tools/list returns 200 (SDK fixed)
  • take_gitbutler_screenshot returns base64 PNG via MCP POST
  • analyze_gitbutler_ui returns GitButlerUIState JSON from real screenshot
  • commit_visible_changes({ message: "msg" }) stages and commits, then returns the SHA
  • All tests pass (bun test or npx vitest)
  • experience/trios/phase2_vision_context.trinity committed

Commit template

git commit -m "feat(bridge): Phase 2 vision pipeline (screenshot → LLM → GB state)

- fix: @hono/mcp SDK 500 on tools/list
- feat: take_gitbutler_screenshot via CDP
- feat: analyze_gitbutler_ui → GitButlerUIState JSON
- feat: commit_visible_changes high-level tool
- tests: fixture-based vision tests
- experience: phase2_vision_context.trinity sealed

Closes #2"

Agent returns:

✅ TASK phase2-vision-context COMPLETED | SHA: xxxxxxxxx

φ² + 1/φ² = 3 | TRINITY | GO.
