feat(pdf-server): get_viewer_state interact action #590

Merged: ochafik merged 2 commits into main from ochafik/pdf-get-viewer-state on Apr 2, 2026

ochafik commented Apr 2, 2026

Summary

New interact action get_viewer_state that returns a JSON snapshot of the live viewer:

{
  "currentPage": 3,
  "pageCount": 12,
  "zoom": 126,
  "displayMode": "fullscreen",
  "selectedAnnotationIds": [],
  "selection": {
    "text": "the selected text",
    "contextBefore": "…up to 200 chars before…",
    "contextAfter": "…up to 200 chars after…",
    "boundingRect": { "x": 72.4, "y": 318.1, "width": 211.6, "height": 13.2 }
  }
}
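
As a reading aid, the snapshot above could be modelled and validated on the consumer side roughly as follows. This is a minimal sketch, not the PR's code: the interface names and `parseViewerState` are hypothetical, `displayMode` is typed as a plain string because only "fullscreen" and "inline" appear in this PR, and string annotation ids are an assumption (the sample array is empty).

```typescript
// Hypothetical types inferred from the sample JSON; not taken from the PR's source.
interface Rect { x: number; y: number; width: number; height: number }

interface ViewerSelection {
  text: string;
  contextBefore: string;
  contextAfter: string;
  boundingRect: Rect;
}

interface ViewerState {
  currentPage: number;
  pageCount: number;
  zoom: number;
  displayMode: string;             // "fullscreen" and "inline" are the values seen in this PR
  selectedAnnotationIds: string[]; // assumption: ids are strings
  selection: ViewerSelection | null;
}

// Parse the text payload and check the top-level shape before trusting it.
function parseViewerState(text: string): ViewerState {
  const raw: unknown = JSON.parse(text);
  if (typeof raw !== "object" || raw === null) {
    throw new Error("viewer state must be a JSON object");
  }
  const s = raw as Record<string, unknown>;
  for (const key of ["currentPage", "pageCount", "zoom"]) {
    if (typeof s[key] !== "number") throw new Error(`${key} must be a number`);
  }
  if (typeof s["displayMode"] !== "string") throw new Error("displayMode must be a string");
  if (!Array.isArray(s["selectedAnnotationIds"])) {
    throw new Error("selectedAnnotationIds must be an array");
  }
  // selection must be present: either null or an object.
  if (s["selection"] !== null && typeof s["selection"] !== "object") {
    throw new Error("selection must be an object or null");
  }
  return s as unknown as ViewerState;
}
```

Callers would then branch on `state.selection === null` before reading `text` or `boundingRect`.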

selection is null when nothing is selected (or the selection isn't in the text-layer). boundingRect is in PDF points, top-left origin / y-down — same coord system add_annotations takes, so the model can highlight what's selected without a second round-trip.
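
For readers more used to the default PDF user space (origin at the bottom-left, y increasing upward), the sketch below shows how a rectangle in that space relates to the top-left / y-down convention described above. It assumes an unrotated page whose height in points is known and whose origin is at (0, 0); `PdfRect` and `toTopLeftRect` are hypothetical names, not part of the PR.

```typescript
// Hypothetical helper; assumes an unrotated page with its origin at (0, 0).
interface PdfRect { x: number; y: number; width: number; height: number }

// Convert a bottom-left-origin, y-up rect into a top-left-origin, y-down rect.
// Only y changes: the new y is the distance from the page top to the rect's top edge.
function toTopLeftRect(r: PdfRect, pageHeightPt: number): PdfRect {
  return {
    x: r.x,
    y: pageHeightPt - (r.y + r.height),
    width: r.width,
    height: r.height,
  };
}
```

Because `boundingRect` is already reported in the top-left convention, no such flip should be needed when passing it on to add_annotations; the helper only illustrates what the convention means.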

Why: the viewer already pushes selection passively via setModelContext (<pdf-selection> tags), but not all hosts surface model-context. This is an explicit pull.

Wiring: new PdfCommand variant → processCommands case → handleGetViewerState → new app-only submit_viewer_state tool (mirrors submit_save_data) → waitForViewerState → text content block.
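
The last two hops of that chain (the viewer submitting its state, the server waiting for it) follow a request/response correlation pattern. The sketch below illustrates that pattern under stated assumptions: the class, its method names, the request-id keying, and the timeout default are all hypothetical, and the actual waitForViewerState implementation is not shown in this PR description.

```typescript
// Hypothetical sketch of the wait/submit pairing; not the PR's implementation.
type PendingEntry = {
  resolve: (stateJson: string) => void;
  timer: ReturnType<typeof setTimeout>;
};

class ViewerStateWaiters {
  private pending = new Map<string, PendingEntry>();

  // Server side: called after queueing the command for the viewer.
  wait(requestId: string, timeoutMs = 5000): Promise<string> {
    return new Promise<string>((resolve, reject) => {
      const timer = setTimeout(() => {
        this.pending.delete(requestId);
        reject(new Error(`viewer state request ${requestId} timed out`));
      }, timeoutMs);
      this.pending.set(requestId, { resolve, timer });
    });
  }

  // Called when the viewer reports back (the role submit_viewer_state plays).
  // Returns false if nobody is waiting for this id.
  submit(requestId: string, stateJson: string): boolean {
    const entry = this.pending.get(requestId);
    if (!entry) return false;
    clearTimeout(entry.timer);
    this.pending.delete(requestId);
    entry.resolve(stateJson);
    return true;
  }

  get size(): number {
    return this.pending.size;
  }
}
```

The resolved string would then be returned to the model as the text content block mentioned above.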

Description drift fixed: display_pdf's "follow-up actions go through interact" list was missing save_as. Added that and get_viewer_state. The interact description itself already covered every enum action.

Test Plan

  • npm run --workspace examples/pdf-server build
  • npm test — 264 pass / 0 fail
  • e2e (pdf-annotations.spec.ts): two new tests
    • no selection → asserts selection: null, currentPage: 1, displayMode: "inline", numeric pageCount/zoom
    • programmatically select first text-layer span → asserts selection.text matches and boundingRect is present

New interact action that returns a JSON snapshot of the live viewer:
{currentPage, pageCount, zoom, displayMode, selectedAnnotationIds,
 selection: {text, contextBefore, contextAfter, boundingRect} | null}.

The viewer already pushes selection passively via setModelContext as
<pdf-selection> tags, but not all hosts surface model-context. This gives
the model an explicit pull.

selection.boundingRect is a single bbox in PDF points (top-left origin,
y-down) so it can be fed straight back into add_annotations. selection is
null when nothing is selected or the selection is outside the text-layer.

Wiring: new PdfCommand variant -> processCommands case ->
handleGetViewerState -> submit_viewer_state (new app-only tool, mirrors
submit_save_data) -> waitForViewerState -> text content block.

Also fills a gap in the display_pdf description: it listed interact
actions but was missing save_as; added that and get_viewer_state.

e2e: two tests covering selection:null and a programmatic text-layer
selection.
pkg-pr-new bot commented Apr 2, 2026

@modelcontextprotocol/ext-apps

npm i https://pkg.pr.new/@modelcontextprotocol/ext-apps@590

@modelcontextprotocol/server-basic-preact

npm i https://pkg.pr.new/@modelcontextprotocol/server-basic-preact@590

@modelcontextprotocol/server-basic-react

npm i https://pkg.pr.new/@modelcontextprotocol/server-basic-react@590

@modelcontextprotocol/server-basic-solid

npm i https://pkg.pr.new/@modelcontextprotocol/server-basic-solid@590

@modelcontextprotocol/server-basic-svelte

npm i https://pkg.pr.new/@modelcontextprotocol/server-basic-svelte@590

@modelcontextprotocol/server-basic-vanillajs

npm i https://pkg.pr.new/@modelcontextprotocol/server-basic-vanillajs@590

@modelcontextprotocol/server-basic-vue

npm i https://pkg.pr.new/@modelcontextprotocol/server-basic-vue@590

@modelcontextprotocol/server-budget-allocator

npm i https://pkg.pr.new/@modelcontextprotocol/server-budget-allocator@590

@modelcontextprotocol/server-cohort-heatmap

npm i https://pkg.pr.new/@modelcontextprotocol/server-cohort-heatmap@590

@modelcontextprotocol/server-customer-segmentation

npm i https://pkg.pr.new/@modelcontextprotocol/server-customer-segmentation@590

@modelcontextprotocol/server-debug

npm i https://pkg.pr.new/@modelcontextprotocol/server-debug@590

@modelcontextprotocol/server-map

npm i https://pkg.pr.new/@modelcontextprotocol/server-map@590

@modelcontextprotocol/server-pdf

npm i https://pkg.pr.new/@modelcontextprotocol/server-pdf@590

@modelcontextprotocol/server-scenario-modeler

npm i https://pkg.pr.new/@modelcontextprotocol/server-scenario-modeler@590

@modelcontextprotocol/server-shadertoy

npm i https://pkg.pr.new/@modelcontextprotocol/server-shadertoy@590

@modelcontextprotocol/server-sheet-music

npm i https://pkg.pr.new/@modelcontextprotocol/server-sheet-music@590

@modelcontextprotocol/server-system-monitor

npm i https://pkg.pr.new/@modelcontextprotocol/server-system-monitor@590

@modelcontextprotocol/server-threejs

npm i https://pkg.pr.new/@modelcontextprotocol/server-threejs@590

@modelcontextprotocol/server-transcript

npm i https://pkg.pr.new/@modelcontextprotocol/server-transcript@590

@modelcontextprotocol/server-video-resource

npm i https://pkg.pr.new/@modelcontextprotocol/server-video-resource@590

@modelcontextprotocol/server-wiki-explorer

npm i https://pkg.pr.new/@modelcontextprotocol/server-wiki-explorer@590

commit: 1836155

readLastToolResult clicked .last() before the interact result panel
existed (callInteract doesn't block), so it expanded the display_pdf
panel instead. Wait for the expected panel count first.

Also: basic-host renders the full CallToolResult JSON, with the state
double-escaped inside content[0].text. Parse instead of regex-matching.
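
A minimal sketch of the parse-twice approach described above, assuming the panel text is the serialized CallToolResult and the state sits in a text block at content[0]. The function and interface names are hypothetical; the e2e helper in the repo may look different.

```typescript
// Hypothetical helper illustrating the double parse; not the spec file's actual code.
interface CallToolResultLike {
  content: Array<{ type: string; text?: string }>;
}

function viewerStateFromPanel(panelText: string): Record<string, unknown> {
  // First parse: the outer CallToolResult JSON rendered by the host.
  const result = JSON.parse(panelText) as CallToolResultLike;
  const first = result.content?.[0];
  if (!first || first.type !== "text" || typeof first.text !== "string") {
    throw new Error("expected a text content block at content[0]");
  }
  // Second parse: the state JSON that was embedded as an escaped string.
  return JSON.parse(first.text) as Record<string, unknown>;
}
```

Parsing this way avoids writing regexes against escaped quotes, which is what makes matching on the rendered text fragile.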

playwright.config.ts: honor PW_CHANNEL env to use system Chrome locally
when the bundled chromium_headless_shell is broken.
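
One way the PW_CHANNEL override could be expressed is sketched below. `channelFromEnv` is a hypothetical helper and the committed playwright.config.ts may be written differently; `channel` itself is a standard Playwright `use` option (for example "chrome").

```typescript
// Hypothetical helper: derive Playwright's `channel` option from the environment.
// Returns an empty object when PW_CHANNEL is unset or blank, so the bundled
// browser stays the default.
function channelFromEnv(env: Record<string, string | undefined>): { channel?: string } {
  const channel = env["PW_CHANNEL"]?.trim();
  return channel ? { channel } : {};
}

// In playwright.config.ts this could be spread into the shared options:
//   use: { ...channelFromEnv(process.env) }
```

Running the suite with `PW_CHANNEL=chrome` would then select the system Chrome, while leaving the variable unset keeps the existing behavior.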
@ochafik ochafik merged commit 4fc9513 into main Apr 2, 2026
18 checks passed