feat(pdf-server): get_viewer_state interact action #590
Merged
New interact action that returns a JSON snapshot of the live viewer:
{currentPage, pageCount, zoom, displayMode, selectedAnnotationIds,
selection: {text, contextBefore, contextAfter, boundingRect} | null}.
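The snapshot shape above can be written down as types plus a small validating parser. This is only a sketch: the field names come from the description, but the field types (for example string annotation ids) and the `parseViewerState` helper are assumptions for illustration, not code from this PR.

```typescript
// Field names follow the PR description; the types are inferred, not confirmed.
interface Rect { x: number; y: number; width: number; height: number; }

interface ViewerSelection {
  text: string;
  contextBefore: string;
  contextAfter: string;
  boundingRect: Rect;
}

interface ViewerState {
  currentPage: number;
  pageCount: number;
  zoom: number;
  displayMode: string;
  selectedAnnotationIds: string[]; // assumption: ids are strings
  selection: ViewerSelection | null;
}

function isRect(v: unknown): v is Rect {
  if (typeof v !== "object" || v === null) return false;
  const r = v as Record<string, unknown>;
  return ["x", "y", "width", "height"].every((k) => typeof r[k] === "number");
}

// Hypothetical helper: parse the text content block and check the basics.
function parseViewerState(text: string): ViewerState {
  const raw: unknown = JSON.parse(text);
  if (typeof raw !== "object" || raw === null) {
    throw new Error("viewer state is not an object");
  }
  const s = raw as Record<string, unknown>;
  for (const key of ["currentPage", "pageCount", "zoom"]) {
    if (typeof s[key] !== "number") throw new Error(`${key} must be a number`);
  }
  if (typeof s["displayMode"] !== "string") {
    throw new Error("displayMode must be a string");
  }
  if (!Array.isArray(s["selectedAnnotationIds"])) {
    throw new Error("selectedAnnotationIds must be an array");
  }
  const sel = s["selection"];
  if (sel !== null) {
    // A missing selection key is rejected: the description says null, not absent.
    if (typeof sel !== "object") throw new Error("selection must be an object or null");
    const o = sel as Record<string, unknown>;
    if (typeof o["text"] !== "string" || !isRect(o["boundingRect"])) {
      throw new Error("selection is malformed");
    }
  }
  return raw as ViewerState;
}
```

Validating at the boundary keeps the `selection: null` case explicit, so callers have to handle it before touching `boundingRect`.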
The viewer already pushes selection passively via setModelContext as
<pdf-selection> tags, but not all hosts surface model-context. This gives
the model an explicit pull.
selection.boundingRect is a single bbox in PDF points (top-left origin,
y-down) so it can be fed straight back into add_annotations. selection is
null when nothing is selected or the selection is outside the text-layer.
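As a rough illustration of "fed straight back", the sketch below turns a snapshot into a highlight draft without touching the rectangle. The `HighlightDraft` shape and the `highlightFromSelection` helper are hypothetical, since the real `add_annotations` schema is not shown here. Using `currentPage` as the selection's page is also an assumption.

```typescript
interface Rect { x: number; y: number; width: number; height: number; }

// Only the parts of the snapshot this sketch needs.
interface ViewerStateLike {
  currentPage: number;
  selection: { text: string; boundingRect: Rect } | null;
}

// Hypothetical annotation shape; not the actual add_annotations schema.
interface HighlightDraft {
  kind: "highlight";
  page: number;
  rect: Rect;
  note: string;
}

function highlightFromSelection(state: ViewerStateLike): HighlightDraft | null {
  // Nothing selected, or the selection was outside the text-layer.
  if (state.selection === null) return null;
  const { text, boundingRect } = state.selection;
  return {
    kind: "highlight",
    page: state.currentPage, // assumption: the selection is on the current page
    rect: { ...boundingRect }, // same coordinate system, so no conversion
    note: text.slice(0, 80),
  };
}
```

Because both sides use PDF points with a top-left origin, the rectangle is copied as-is; the only branch is the `null` case.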
Wiring: new PdfCommand variant -> processCommands case ->
handleGetViewerState -> submit_viewer_state (new app-only tool, mirrors
submit_save_data) -> waitForViewerState -> text content block.
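The request/response half of that wiring can be sketched as a registry of pending promises: the server side waits on a request id, and the app-only submit tool resolves it. The `ViewerStateWaiters` class, its method names, and the timeout behaviour are illustrative assumptions, not the PR's actual `waitForViewerState` implementation.

```typescript
// Hypothetical sketch of a wait/submit pairing keyed by request id.
interface PendingEntry {
  resolve: (stateJson: string) => void;
  timer: ReturnType<typeof setTimeout>;
}

class ViewerStateWaiters {
  private pending = new Map<string, PendingEntry>();

  // Called by the interact handler after queuing the viewer command.
  wait(requestId: string, timeoutMs: number): Promise<string> {
    return new Promise<string>((resolve, reject) => {
      const timer = setTimeout(() => {
        this.pending.delete(requestId);
        reject(new Error(`viewer state request ${requestId} timed out`));
      }, timeoutMs);
      this.pending.set(requestId, { resolve, timer });
    });
  }

  // Called when the viewer submits its snapshot; returns false for unknown ids.
  submit(requestId: string, stateJson: string): boolean {
    const entry = this.pending.get(requestId);
    if (entry === undefined) return false;
    clearTimeout(entry.timer);
    this.pending.delete(requestId);
    entry.resolve(stateJson);
    return true;
  }

  get size(): number {
    return this.pending.size;
  }
}
```

Clearing the timer on submit means a late or duplicate submission is simply reported as unknown rather than resolving a stale request.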
Also fills a gap in the display_pdf description: it listed interact
actions but was missing save_as; added that and get_viewer_state.
e2e: two tests covering selection:null and a programmatic text-layer
selection.
@modelcontextprotocol/ext-apps
@modelcontextprotocol/server-basic-preact
@modelcontextprotocol/server-basic-react
@modelcontextprotocol/server-basic-solid
@modelcontextprotocol/server-basic-svelte
@modelcontextprotocol/server-basic-vanillajs
@modelcontextprotocol/server-basic-vue
@modelcontextprotocol/server-budget-allocator
@modelcontextprotocol/server-cohort-heatmap
@modelcontextprotocol/server-customer-segmentation
@modelcontextprotocol/server-debug
@modelcontextprotocol/server-map
@modelcontextprotocol/server-pdf
@modelcontextprotocol/server-scenario-modeler
@modelcontextprotocol/server-shadertoy
@modelcontextprotocol/server-sheet-music
@modelcontextprotocol/server-system-monitor
@modelcontextprotocol/server-threejs
@modelcontextprotocol/server-transcript
@modelcontextprotocol/server-video-resource
@modelcontextprotocol/server-wiki-explorer
commit: |
readLastToolResult clicked .last() before the interact result panel existed (callInteract doesn't block), so it expanded the display_pdf panel instead. Wait for the expected panel count first.

Also: basic-host renders the full CallToolResult JSON, with the state double-escaped inside content[0].text. Parse instead of regex-matching.

playwright.config.ts: honor PW_CHANNEL env to use system Chrome locally when the bundled chromium_headless_shell is broken.
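The "parse instead of regex-matching" step amounts to two `JSON.parse` calls: one for the rendered CallToolResult, one for the state string inside `content[0].text`. A minimal sketch, assuming the panel text is exactly the serialized result; `extractViewerState` is a hypothetical helper name, not the one used in the test suite.

```typescript
// Hypothetical helper: unwrap the viewer state from a rendered CallToolResult.
function extractViewerState(panelText: string): Record<string, unknown> {
  // First parse: the full CallToolResult as rendered by the host.
  const result = JSON.parse(panelText) as {
    content?: Array<{ type?: string; text?: string }>;
  };
  const first = result.content?.[0];
  if (first === undefined || typeof first.text !== "string") {
    throw new Error("no text content block in tool result");
  }
  // Second parse: the state itself, which is a JSON string inside the text block.
  return JSON.parse(first.text) as Record<string, unknown>;
}
```

Parsing twice sidesteps the escaped quotes that make a regex over the raw panel text brittle.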
Summary

New `interact` action `get_viewer_state` that returns a JSON snapshot of the live viewer:

```json
{
  "currentPage": 3,
  "pageCount": 12,
  "zoom": 126,
  "displayMode": "fullscreen",
  "selectedAnnotationIds": [],
  "selection": {
    "text": "the selected text",
    "contextBefore": "…up to 200 chars before…",
    "contextAfter": "…up to 200 chars after…",
    "boundingRect": { "x": 72.4, "y": 318.1, "width": 211.6, "height": 13.2 }
  }
}
```

`selection` is `null` when nothing is selected (or the selection isn't in the text-layer). `boundingRect` is in PDF points, top-left origin / y-down, the same coordinate system `add_annotations` takes, so the model can highlight what's selected without a second round-trip.

Why: the viewer already pushes selection passively via `setModelContext` (`<pdf-selection>` tags), but not all hosts surface model-context. This is an explicit pull.

Wiring: new `PdfCommand` variant → `processCommands` case → `handleGetViewerState` → new app-only `submit_viewer_state` tool (mirrors `submit_save_data`) → `waitForViewerState` → text content block.

Description drift fixed: `display_pdf`'s "follow-up actions go through interact" list was missing `save_as`. Added that and `get_viewer_state`. The `interact` description itself already covered every enum action.

Test Plan

- `npm run --workspace examples/pdf-server build` ✓
- `npm test`: 264 pass / 0 fail
- e2e (`pdf-annotations.spec.ts`): two new tests
  - no selection: `selection: null`, `currentPage: 1`, `displayMode: "inline"`, numeric `pageCount`/`zoom`
  - programmatic text-layer selection: `selection.text` matches and `boundingRect` is present