feat: @elasticpath/plasmic-mcp-registry + dev host sync + MCP hardening #150
Merged
Renumbered P10-P14: Dev Host Variant Sync (was P14) promoted to P10 top priority. Eval runner robustness, grader quality, scenario coverage, and infrastructure shifted to P11-P14.
…ant discovery

The Plasmic persisted project bundle does not contain code component variant data (e.g., "Selected", "Disabled" states). Studio gets this by connecting to the dev host via iframe, but the MCP has no browser. This adds an HTTP-based sync that fetches variant metadata from a running dev host on project.set and project.refresh.

New package: @elasticpath/plasmic-registry
- Reads globalThis.__PlasmicComponentRegistry (same global @plasmicapp/host writes to)
- Serializes component metadata, stripping functions and React elements
- Zero runtime dependencies, works in Node.js and browser
- 21 unit tests (serialize + read-registry)

Dev host API route: plasmicpkgs-dev/app/api/plasmic-registry
- Server-compatible registration file (plasmic-init-server.ts)
- GET /api/plasmic-registry returns full serialized component metadata

MCP sync module: packages/plasmic-mcp/src/devhost-sync.ts
- fetchDevHostRegistry(): HTTP fetch with 5s timeout, non-fatal on failure
- syncVariantMetadata(): populates codeComponentMeta.variants on matching CCs
- ensureVariantObjects(): creates Variant objects on wrapper components
- syncFromDevHost(): orchestrator called from project.set and project.refresh
- Flexible $dev suffix matching for component names
- 22 unit tests covering fetch, sync, variant creation, and full flow

Session/types extended with hostUrl, devHostSynced, syncedVariantComponents.
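The "HTTP fetch with 5s timeout, non-fatal on failure" behavior of fetchDevHostRegistry() could look roughly like the sketch below. It is not the code from devhost-sync.ts: the function name, the `FetchLike` and `RegistryPayload` types, and the injectable `fetchImpl` parameter are assumptions made so the example is self-contained. Only the route path and the 5s figure come from the commit message.

```typescript
// Hypothetical payload shape; the real response carries richer metadata.
type RegistryPayload = { components: unknown[] };

// Minimal subset of fetch() that the sketch relies on (assumed, for easy stubbing).
type FetchLike = (
  url: string,
  init?: { signal?: AbortSignal },
) => Promise<{ ok: boolean; json(): Promise<unknown> }>;

async function fetchRegistrySketch(
  hostUrl: string,
  fetchImpl: FetchLike = fetch,
  timeoutMs = 5000,
): Promise<RegistryPayload | null> {
  const controller = new AbortController();
  // Abort the request once the timeout elapses.
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const url = new URL("/api/plasmic-registry", hostUrl).toString();
    const res = await fetchImpl(url, { signal: controller.signal });
    if (!res.ok) return null;
    const body = await res.json();
    const components = (body as { components?: unknown } | null)?.components;
    // Reject payloads that do not have the expected top-level shape.
    return Array.isArray(components) ? { components } : null;
  } catch {
    // Non-fatal: timeouts, refused connections and bad JSON all yield null.
    return null;
  } finally {
    clearTimeout(timer);
  }
}
```

Returning `null` instead of throwing lets a caller such as project.set continue without variant data when no dev host is running.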
Add 65 new eval scenarios covering all 103 MCP actions (previously 38/103). Coverage now ~98% across all 8 STRAP domains:
- P11.1: 11 node scenarios (remove, move, clone, reorder, rich-text, attrs, visibility, image, detach-mixin, add/remove-animation)
- P11.2: 16 design scenarios (remove/duplicate-token, list/update/remove-mixin, list/update/remove-animation, list/create/update/remove-theme, set-active-theme, list/rename/remove-asset)
- P11.3: 12 data scenarios (update/remove-query, list/create/update/remove-data-token, list/create/update/remove-split, get-code-meta, list-functions)
- P11.4: 10 component scenarios (delete, convert-to-page, convert-to-component, list/add/update/remove-prop, list/add/remove-state)
- P11.5: 8 variant scenarios (list/create/add/remove-global-group, rename-global, update-screen, rename, remove)
- P11.6: 5 inspect scenarios (subtree, export, style-properties, preview-url, page-meta)
- P11.7: 2 interaction scenarios (update, remove)
- P11.8: 1 project scenario (set)
- P11.9: INDEX.md regenerated (66 simple + 50 medium + 19 complex)
…ngs, data loss, and misleading results

P12.1: Tool execution timeout. onToolCall wrapped in Promise.race with remaining wall-clock countdown. Hanging Plasmic API calls no longer block the entire eval process indefinitely.

P12.2: saveReport fallback. writeFileSync wrapped in try/catch with stderr JSON dump. Disk-full or permission errors no longer silently lose all eval results.

P12.3: Visual capture wall-clock cap. capture() wrapped in 30s Promise.race. Inner logic extracted to captureInner() so the timeout covers navigation + screenshot + mobile capture end-to-end.

P12.4: InMemoryTransport server cleanup. server.close() called before nulling in close(). MCP server and transport resources no longer leak across scenarios.

P12.5: MAX_TURNS exhaustion flag. New maxTurnsExhausted boolean on ConversationResult. Runner checks it alongside timedOut/incomplete. The 25-turn limit exit no longer silently looks like success.

P12.6: Playwright tracing unbounded growth. New resetTracing() method stops and restarts tracing after each successful capture. Trace buffer no longer grows for the entire eval run.

P12.7: console.error suppression leak. Dynamic imports and server creation wrapped in try/finally. console.error is always restored even when initialization throws.

P12.8: desktopPath on screenshot failure. desktopPath initialized as null in the nav-failure path, only set after successful screenshot write. The LLM judge no longer reads nonexistent files.

5 new tests: tool call timeout, MAX_TURNS exhaustion (3 cases), runner maxTurnsExhausted handling. Total: 1317 unit + 137 integration = 1454.
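As an illustration of the P12.2 fallback, a minimal sketch follows. It assumes a boolean return value and an injectable `write` parameter, neither of which is stated in the commit message; the actual saveReport signature may differ.

```typescript
import { writeFileSync } from "node:fs";

// Hypothetical signature: the real saveReport may take different arguments.
function saveReportSketch(
  path: string,
  report: unknown,
  write: (path: string, data: string) => void = writeFileSync,
): boolean {
  // Serialize first so the same string can go to disk or to stderr.
  const json = JSON.stringify(report, null, 2);
  try {
    write(path, json);
    return true;
  } catch (err) {
    // Disk-full or permission errors land here; dump the report so it is not lost.
    const reason = err instanceof Error ? err.message : String(err);
    process.stderr.write(`saveReport failed (${reason}); report follows\n${json}\n`);
    return false;
  }
}
```

Serializing before the `try` keeps the fallback path free of further work that could itself fail.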
…able eval results

P13.1: Property grader coerces style values to String() before .toLowerCase(); numeric values (e.g., lineHeight: 1.5) no longer throw TypeError.

P13.2: Existence grader uses exact name matching by default (params.exact !== false) across all entity types (component, page, node, token, variant, mixin). Prevents false positives where "Card" would incorrectly match "CreditCard".

P13.3: Existence grader searches only the relevant list: entityType "page" searches data.pages only, "component" searches data.components only. Prevents a component named "Contact" from satisfying a page existence check.

P13.4: Tool-params grader uses exact string matching by default. Pass substring: true for cases where substring matching is intentional. Prevents "red" from matching "bordered".

P13.5: LLM judge wraps readFileSync in try/catch; race conditions or permission errors no longer crash the judge.

P13.6: LLM judge score regex uses \d+ instead of \d; "SCORE: 10" is now correctly rejected instead of being parsed as score 1.

P13.7: Review-flags always adds low-quality when qualityScore <= 2, independently of judge-disagrees. Before this fix, success=true + score=1 would get only judge-disagrees but never low-quality.

P13.8: loadPreviousReport validates Array.isArray(report.scenarios); a report with missing or non-array scenarios no longer crashes applyReviewFlags.

P13.9: Data grader validates name, queryType, and event, not just count. A scenario asking for "add a REST query named fetchUsers" now validates the specific query exists.

New eval-llm-judge.test.ts (16 tests) for parseJudgeResponse and formatTranscriptForJudge. Total: 1356 unit tests across 29 suites.
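The P13.6 fix is easiest to see with the two patterns side by side. The sketch below is illustrative only: `parseScoreSketch` and the 1 to 5 range are assumptions (the commit message says only that "SCORE: 10" must be rejected), and the real parseJudgeResponse likely extracts more than the score.

```typescript
// Assumed valid range; the source only implies that 10 is out of range.
const MIN_SCORE = 1;
const MAX_SCORE = 5;

function parseScoreSketch(response: string): number | null {
  // \d+ captures the whole number, so "10" is seen as 10 rather than 1.
  const match = /SCORE:\s*(\d+)/i.exec(response);
  if (!match) return null;
  const score = Number(match[1]);
  return score >= MIN_SCORE && score <= MAX_SCORE ? score : null;
}

// For contrast, the single-digit pattern stops after the first digit.
const oldCapture = /SCORE:\s*(\d)/.exec("SCORE: 10")?.[1]; // "1"
```

With the greedy pattern the range check sees the full number and can reject it, whereas the old pattern silently produced a valid-looking score.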
…iable eval runs

P14.1: Partial re-run report merging. Skipped-as-passed results from previous report are merged into newly-run results, giving accurate full-suite success rates.

P14.2: Dirty-tree detection. getGitSha() appends "-dirty" when working tree has uncommitted changes, preventing incorrect skip logic on dirty trees.

P14.3: Scenario content hashing. SHA256 hash of scenario content stored in reports. Resume/skip compares hashes so modified scenarios are re-run, not skipped.

P14.4: CLI argument validation. --tier validated against known values with clear error; unrecognized flags produce warnings instead of being silently ignored.

P14.5: Scenario validator now loads both mock and integration scenarios, closing the gap where integration-only scenarios were never validated by eval:validate.

P14.6: Regression detection flag. "regression" review flag fires when a scenario was passing in the previous run but now fails.

P14.7: High-retry-count flag. "high-retries" review flag fires when a passing scenario required >3 error-retry cycles, surfacing fragile scenarios.

25 new tests added across 3 test files. All 1381 unit tests pass.
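The content-hash comparison in P14.3 might be sketched as follows. The `PreviousResult` shape and both function names are hypothetical; the commit message states only that a SHA256 of the scenario content is stored in reports and compared on resume/skip.

```typescript
import { createHash } from "node:crypto";

// Hypothetical report entry; the real report schema is not shown in the source.
interface PreviousResult {
  scenarioId: string;
  contentHash: string;
  passed: boolean;
}

// Hex-encoded SHA-256 of the raw scenario text.
function hashScenario(content: string): string {
  return createHash("sha256").update(content, "utf8").digest("hex");
}

// Skip only when the scenario passed before AND its content is unchanged.
function shouldSkip(
  scenarioId: string,
  content: string,
  previous: PreviousResult[],
): boolean {
  const prior = previous.find((r) => r.scenarioId === scenarioId);
  return (
    prior !== undefined &&
    prior.passed &&
    prior.contentHash === hashScenario(content)
  );
}
```

Hashing the content rather than relying on the scenario name means an edited scenario is re-run even if its identifier stays the same.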
…ved, 21 new tests

- Fix #22: resolveComponentUuid exact matching (matchEntityName instead of .includes)
- Fix #19: API client session state leak (clearSessionState on project.set)
- Fix #27: undo stack bounded to MAX_UNDO_DEPTH=50 (oldest dropped on overflow)
- Fix #24: MODEL_PRICING versioned IDs (claude-sonnet-4, claude-haiku-4, claude-opus-4)
- Fix #26: reporter test coverage (saveReport, printSummary, loadOverrides, saveOverride)

21 new tests across 4 files. All 1402 unit tests pass. Build and typecheck clean.
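The bounded undo stack from Fix #27 could be modeled as below. The class and its methods are hypothetical stand-ins for the real undo manager; only the constant MAX_UNDO_DEPTH=50 and the "oldest dropped on overflow" policy come from the commit message.

```typescript
const MAX_UNDO_DEPTH = 50;

// Hypothetical stand-in for the undo manager's internal stack.
class BoundedUndoStack<T> {
  private entries: T[] = [];
  private maxDepth: number;

  constructor(maxDepth: number = MAX_UNDO_DEPTH) {
    this.maxDepth = maxDepth;
  }

  push(entry: T): void {
    this.entries.push(entry);
    // On overflow, drop the oldest entry so memory stays bounded.
    if (this.entries.length > this.maxDepth) {
      this.entries.shift();
    }
  }

  // Most recent entry first, as an undo operation expects.
  pop(): T | undefined {
    return this.entries.pop();
  }

  get depth(): number {
    return this.entries.length;
  }
}
```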
findWrapperComponents() checked _type but real WAB model instances use a typeTag getter. Changed to typeTag ?? _type fallback pattern (matching edit-tools.ts convention). Added 10 integration tests against real WAB model classes verifying syncVariantMetadata, ensureVariantObjects, listVariants, and resolveVariant work with MobX-observed instances.
flattenWithPaths now tracks visited UUIDs with a Set and enforces a MAX_TREE_DEPTH=200 limit, preventing infinite recursion on malformed or corrupted WAB models (#23). 5 new tests cover self-referencing nodes, multi-node cycles, diamond graphs, and UUID-less node traversal. Created evals/.env.example documenting all required and optional environment variables for the eval system (#25), improving onboarding for new developers who previously had to read cli.ts source code. All 1407 unit + 147 integration tests pass.
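A cycle-safe traversal in the spirit of the flattenWithPaths change might look like this sketch. The `TreeNode` and `FlatEntry` shapes, the path format, and the choice to emit a shared node only once are assumptions; the commit message confirms only the visited-UUID Set and the MAX_TREE_DEPTH=200 limit.

```typescript
const MAX_TREE_DEPTH = 200;

// Hypothetical node shape; real WAB model nodes carry far more fields.
interface TreeNode {
  uuid?: string;
  name: string;
  children?: TreeNode[];
}

interface FlatEntry {
  path: string;
  node: TreeNode;
}

function flattenWithPathsSketch(root: TreeNode): FlatEntry[] {
  const out: FlatEntry[] = [];
  const visited = new Set<string>();

  const walk = (node: TreeNode, path: string, depth: number): void => {
    // The depth cap bounds recursion even for nodes that have no UUID.
    if (depth > MAX_TREE_DEPTH) return;
    if (node.uuid !== undefined) {
      // A UUID seen before means a cycle or a shared node: stop here.
      if (visited.has(node.uuid)) return;
      visited.add(node.uuid);
    }
    out.push({ path, node });
    (node.children ?? []).forEach((child, i) => walk(child, `${path}/${i}`, depth + 1));
  };

  walk(root, "0", 0);
  return out;
}
```

The two guards are complementary: the Set catches cycles among identified nodes, and the depth limit is the backstop for nodes that cannot be identified.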
Fix @elasticpath/plasmic-registry not in Yarn workspaces — the API route import would fail at runtime with module-not-found. Add package to root workspaces and plasmicpkgs-dev dependencies. Add fetchDevHostRegistry() timeout (AbortError) unit test. Add 6 API route handler tests in plasmicpkgs-dev covering response shape, variant data, serialization safety, empty registry, and error handling.
…on gap (P19)

Spec-vs-implementation audit found skill docs had zero mention of dev host sync, preventing users from discovering CC variant styling prerequisites. Also fixed plasmic-init-server.ts missing registerShopify that client had.

- plasmic.md: project.set/refresh docs + new Dev Host Variant Sync section
- plasmic-edit.md: variant workflow guidance for CC variant troubleshooting
- plasmic-inspect.md: variant.list note about dev host sync requirement
- plasmic-init-server.ts: add registerShopify matching client registration
- README.md: expand project.refresh re-sync behavior documentation
…ions (P20) Server handler test coverage was at ~37% (161/223 tests covering ~55/103 actions). Added 62 new tests across all 8 STRAP domains: component props/states (9 actions), node (7 actions), design (17 actions), data (9 actions), interaction (4 actions), variant globals (5 actions). Also added 47 missing edit-tools mock declarations, devhost-sync mock module, and fixed syncFromDevHost mock to return proper SyncResult shape. All 1617 tests pass across 31 suites.
Replace err:any with err:unknown type-narrowing in 16 catch blocks across server.ts, api-client.ts, save-manager.ts, and edit-tools.ts. Non-Error thrown values now produce readable messages instead of "undefined". Add undo-manager rollback when save fails after applying in-memory undo. Fix session recovery after failed reload in create-page/create/clone handlers. Distinguish ENOENT from JSON parse errors in auth file reading. Add process-level catch on main(). 6 new tests. All 1623 tests pass.
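The err:unknown narrowing described above typically follows a pattern like the one below. The helper name `errorMessage` is hypothetical, and the real catch blocks may inline this logic instead.

```typescript
// Turn any thrown value into a readable message without using `any`.
function errorMessage(err: unknown): string {
  if (err instanceof Error) return err.message;
  if (typeof err === "string") return err;
  try {
    // Plain objects become JSON; JSON.stringify yields undefined for some inputs.
    return JSON.stringify(err) ?? String(err);
  } catch {
    // Circular structures and similar cases fall back to String().
    return String(err);
  }
}

// Usage inside a catch block, where the variable is typed as unknown.
function runSafely(fn: () => void): string | null {
  try {
    fn();
    return null;
  } catch (err: unknown) {
    return errorMessage(err);
  }
}
```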
Spec audit found 3 gaps between the dev host variant sync spec and implementation. Fixed spec field name (syncedComponents → syncedVariantComponents) to match code. Added 2 unit tests verifying the full updateStyles → resolveVariant → CC variant path by key name and display name. Added 2 integration tests using synthetic wrapper components to exercise ensureVariantObjects creation and idempotency where the fixture lacks TplComponent-rooted wrappers. All 1478 unit + 149 integration tests pass.
…n, and security (P23)

Session state preservation: create-page/create/clone model reload now calls syncFromDevHost and passes hostUrl/devHostSynced/syncedVariantComponents to setSession, preventing silent loss of dev host variant data after component creation.

Null-safe revision handling: project.undo and project.end-batch now use optional chaining on result.save?.revisionNum, preventing TypeError when save result is null.

Input validation: Five update handlers (update-mixin, update-animation, update-data-token, update-split, interaction.update) now require at least one updatable field. component.extract now validates non-empty name.

Security: inspect.export sanitizes componentUuid to [a-zA-Z0-9_-] before constructing temp file path, preventing path traversal.

Response consistency: component.clone error uses JSON format, convert-to-page and convert-to-component include message field, variant.list wraps result with componentUuid/componentName, data.list-queries uses standard error format.

16 new tests. All 1494 unit + 149 integration tests pass.
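The inspect.export sanitization can be illustrated with a short sketch. The function name, the file-name pattern, and the decision to throw on an empty result are assumptions; the commit message specifies only the allowed character class and that it is applied before the temp file path is built.

```typescript
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical helper: strips everything outside [a-zA-Z0-9_-] before
// the value is used as part of a file name.
function safeExportPath(componentUuid: string): string {
  const safe = componentUuid.replace(/[^a-zA-Z0-9_-]/g, "");
  if (safe.length === 0) {
    throw new Error("componentUuid contains no valid characters");
  }
  // Dots and path separators are gone, so the result stays inside tmpdir().
  return join(tmpdir(), `plasmic-export-${safe}.json`);
}
```

Because `.` and `/` are not in the allowed set, an input such as `../../etc/passwd` collapses to `etcpasswd` and cannot escape the temp directory.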
Fix 13 bugs found via comprehensive audit: removeToken only replacing first token occurrence, isAncestorOf missing slot traversal allowing cycles, setImage crash on empty vsettings and CSS injection via unescaped URLs, dead code unreachable non-serializable detection, missing parameter validation in node.add/set-image/update-token, missing requireSession in end-batch/undo, obsolete tool name references across 6 files, deriveLayoutType ignoring reverse flex directions, and devhost-sync crash on malformed variant data. 12 new tests. All 1655 tests pass.
…stry support (P25)
Renames packages/plasmic-registry to packages/plasmic-mcp-registry and
adds readers for all five Plasmic globalThis registries: components,
contexts, functions, tokens, and traits.
New serializers (serializeContextMeta, serializeFunctionMeta) strip
non-serializable fields via JSON roundtrip. New readers (getContextRegistry,
getFunctionRegistry, getTokenRegistry, getTraitRegistry) follow the same
defensive pattern as the existing component reader. getFullRegistry()
returns all five in one call.
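A JSON roundtrip of the kind mentioned above can be sketched in a few lines. The helper name `toSerializable` and the `null` fallback are assumptions; the actual serializeContextMeta and serializeFunctionMeta may handle failures differently.

```typescript
// Hypothetical helper showing the roundtrip technique on arbitrary metadata.
function toSerializable(meta: unknown): unknown {
  try {
    // stringify omits function-valued and undefined properties;
    // parse then yields a plain-data copy.
    return JSON.parse(JSON.stringify(meta));
  } catch {
    // Circular references (or a top-level undefined) cannot be roundtripped.
    return null;
  }
}

const example = toSerializable({
  name: "fetchUsers",
  fn: () => [],
  params: [{ name: "id", type: "string" }],
});
// example is { name: "fetchUsers", params: [{ name: "id", type: "string" }] }
```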
Adds withPlasmicRegistry() Next.js config wrapper that auto-detects
Plasmic packages and adds them to serverExternalPackages to prevent
RSC boundary errors.
Updates plasmicpkgs-dev route to call getFullRegistry() and return the
full { components, contexts, functions, tokens, traits } response shape.
75 tests in plasmic-mcp-registry (54 new), 6 in plasmicpkgs-dev (updated),
1655 in plasmic-mcp (all passing).
…x (P26)

Parse all five Plasmic registries (components, contexts, functions, tokens, traits) in fetchDevHostRegistry with backward compatibility for old endpoints.

Add in-memory cache with 60s default TTL (configurable via PLASMIC_REGISTRY_CACHE_TTL_MS) to avoid redundant fetches across 5 call sites. Cache is cleared explicitly on project.refresh via clearRegistryCache().

Fix getCodeComponentVariantMetas to use typeTag ?? _type pattern, matching devhost-sync.ts, so variant resolution works on real WAB model instances.

Wrap plasmicpkgs-dev/next.config.js with withPlasmicRegistry() to auto-detect and externalize Plasmic packages for serverExternalPackages.
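The cache described above could be structured roughly as follows. The `RegistryCache` class, keying by host URL, and the injectable clock are assumptions made for the sketch; the 60s default and the PLASMIC_REGISTRY_CACHE_TTL_MS variable come from the commit message.

```typescript
const DEFAULT_TTL_MS = 60_000;

// Read the TTL from the environment, falling back to the 60s default.
function resolveTtlMs(env: Record<string, string | undefined>): number {
  const raw = env["PLASMIC_REGISTRY_CACHE_TTL_MS"];
  if (!raw) return DEFAULT_TTL_MS;
  const parsed = Number(raw);
  return Number.isFinite(parsed) && parsed >= 0 ? parsed : DEFAULT_TTL_MS;
}

// Hypothetical cache keyed by dev host URL.
class RegistryCache<T> {
  private entries = new Map<string, { value: T; expiresAt: number }>();
  private ttlMs: number;
  private now: () => number;

  constructor(ttlMs: number, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(hostUrl: string): T | undefined {
    const hit = this.entries.get(hostUrl);
    if (!hit) return undefined;
    if (this.now() >= hit.expiresAt) {
      // Expired: evict so the caller refetches.
      this.entries.delete(hostUrl);
      return undefined;
    }
    return hit.value;
  }

  set(hostUrl: string, value: T): void {
    this.entries.set(hostUrl, { value, expiresAt: this.now() + this.ttlMs });
  }

  // Counterpart of clearRegistryCache(): drop everything on explicit refresh.
  clear(): void {
    this.entries.clear();
  }
}
```

Injecting the clock keeps expiry testable without real waiting.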
Store full registry data (contexts, functions, tokens, traits) in the session after each dev host sync, making it available to MCP tool handlers. Previously the data was fetched but discarded after variant sync.

Enrichments:
- design.list-tokens: includes devHostTokens from registered packages
- data.list-functions: includes devHostFunctions from registered packages
- project.set/refresh: includes devHostRegistry summary with counts for contexts, functions, tokens, and traits

Also marks P5 defensive JSON handling as complete (already implemented).
… on node.add (P28)

Fix registryData being silently dropped from session after component.create-page, component.create, and component.clone operations. All three handlers now preserve registryData in their setSession calls, matching the existing correct pattern in project.set and project.refresh.

When adding code component instances via node.add, the MCP server now looks up the component in session.registryData to apply defaultStyles (e.g. width, padding) and validate parentComponentName constraints. Warnings are surfaced non-fatally in the JSON response. This enables the AI model to create properly styled component instances and receive guidance about component placement rules.
…y asymmetry (P29)

When plasmicElementToTpl creates a TplComponent, it now iterates registry props for slot-type entries with defaultValue. For each slot without explicit content, defaultValue PlasmicElement trees are recursively converted to TplNodes and wired as Arg+RenderExpr in the base variant setting. This ensures components like Button or Card render with meaningful placeholder content out of the box.

Also fixes missing registerShopify(PLASMIC) call in plasmic-init-client.tsx that was imported but never invoked, causing Shopify components to appear in the registry API but not in the canvas host.

5 new tests for slot defaultValue population covering: basic population, named slots, explicit children priority, missing model slots, and non-slot prop handling.
…nctions enrichment (P30)

Replace Record<string, unknown> type widening with spread pattern in design.list-tokens and data.list-functions handlers. The interface ListCustomFunctionsResult lacks an index signature, causing tsc --noEmit to fail. Both handlers now build enriched results via object spread, preserving type safety while allowing optional devHost* fields.
…k (P31)

Replace `session.registryData?: any` with `FullRegistryData | null` via strongly-typed interfaces (RegistryComponent, RegistryContext, RegistryFunction, RegistryToken, RegistryTrait) that mirror canonical types from the registry package without adding a runtime dependency. This eliminates cascading `as any` casts in server.ts, edit-tools.ts, and devhost-sync.ts.

Add PLASMIC_DEV_HOST_URL environment variable as fallback when the Plasmic project has no configured hostUrl, completing the spec requirement for env-based dev host configuration. Project settings always take priority.
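The precedence rule for the dev host URL (project setting first, environment variable as fallback) can be expressed in a small helper. The function name and its parameters are hypothetical; only the PLASMIC_DEV_HOST_URL variable and the priority order come from the commit message.

```typescript
// Hypothetical resolver: the project's configured hostUrl wins, and the
// environment variable is consulted only when the project has none.
function resolveHostUrl(
  projectHostUrl: string | null | undefined,
  env: Record<string, string | undefined>,
): string | null {
  const fromProject = projectHostUrl?.trim();
  if (fromProject) return fromProject;
  const fromEnv = env["PLASMIC_DEV_HOST_URL"]?.trim();
  return fromEnv ? fromEnv : null;
}
```

In a Node.js process the second argument would typically be `process.env`.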
Remove 18 as-any casts across the two largest source files:
- server.ts: 5 redundant parameter casts removed, toggle validation added for variant.create-global-group
- edit-tools.ts: 6 PlasmicElement union property casts removed (TS narrows correctly), EventHandler cast narrowed from any to typed, 5 readonly array includes patterns fixed

Zero as-any casts remain in server.ts and edit-tools.ts. 1701 tests passing, typecheck and build clean.
P33: data.get-code-meta now enriches response with devHostMeta containing the full component registration (typed props, variants, defaultStyles, parentComponentName) when a matching registry component is found. Uses $dev suffix-aware name matching.

P34: project.get-meta now includes devHostContexts (global context providers with props and globalActions) and devHostTraits when registry data is available. Previously these were synced and counted but never surfaced in any tool response.

Also removes the last as-any cast in production code (model-loader.ts narrowToSite), leaving zero as-any across all production source files. 1710 tests passing (31 suites), build and typecheck clean.
…istry

- Restore plasmic-register.ts with EP commerce registration (registerElasticPath)
- Restore plasmic-init-client.tsx to use @/plasmic-register side-effect import
- Remove redundant plasmic-init-server.ts (plasmic-register.ts serves both)
- Fix package.json: file: link for registry, restore EP dep and --port 3001
- Update route.ts to use @/plasmic-register + getFullRegistry()
- Add @/ path alias to vitest.config.ts for Next.js-style imports
- Update test mock from plasmic-init-server to plasmic-register
- Update PROMPT_plan.md for registry package scope
- Add capture.ts with withRegistryCapture() that wraps PLASMIC loader to also call @plasmicapp/host registration functions on the server, populating globalThis registries for the MCP API route
- Split webpack externals from serverExternalPackages patterns: @plasmicapp/host and @plasmicapp/query use serverExternalPackages only (preserves shared React for SSR), while @plasmicpkgs/* and @elasticpath/plasmic-* use both (webpack externals needed for monorepo packages where serverExternalPackages has no effect)
- Export registerAllPackages() from plasmic-register.ts so the API route can re-register with captured PLASMIC without polluting the shared registration file
- Add @plasmicapp/host as peer dependency of plasmic-mcp-registry
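The split between the two externalization mechanisms can be pictured with the classification sketch below. It is not the withPlasmicRegistry() implementation: `planExternals` and `ExternalsPlan` are hypothetical, and the sketch only sorts package names into the two groups named in the commit message without touching any Next.js or webpack configuration.

```typescript
// Patterns taken from the commit message; everything else is illustrative.
const SERVER_EXTERNAL_ONLY = [/^@plasmicapp\/host$/, /^@plasmicapp\/query$/];
const SERVER_AND_WEBPACK_EXTERNAL = [/^@plasmicpkgs\//, /^@elasticpath\/plasmic-/];

interface ExternalsPlan {
  serverExternalPackages: string[];
  webpackExternals: string[];
}

function planExternals(dependencyNames: string[]): ExternalsPlan {
  const plan: ExternalsPlan = { serverExternalPackages: [], webpackExternals: [] };
  for (const name of dependencyNames) {
    if (SERVER_AND_WEBPACK_EXTERNAL.some((re) => re.test(name))) {
      // Monorepo packages: listed in both places.
      plan.serverExternalPackages.push(name);
      plan.webpackExternals.push(name);
    } else if (SERVER_EXTERNAL_ONLY.some((re) => re.test(name))) {
      // node_modules packages: serverExternalPackages alone.
      plan.serverExternalPackages.push(name);
    }
    // Anything else is left to the bundler as usual.
  }
  return plan;
}
```

A config wrapper would then merge these two lists into whatever the consumer's existing Next.js config already declares.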
Summary

- `@elasticpath/plasmic-mcp-registry`: new package that reads all five Plasmic `globalThis` registries (components, contexts, functions, tokens, traits) and serializes metadata for HTTP transport. Includes `withPlasmicRegistry()` Next.js config wrapper and `withRegistryCapture()` for server-side registration.
- … `project.set`/`project.refresh`, enabling `variant.list` and `node.update-styles` for CC variant states

Key packages

`@elasticpath/plasmic-mcp-registry` (`packages/plasmic-mcp-registry/`)

Reads `globalThis.__PlasmicComponentRegistry` and four other registries, strips non-serializable fields (functions, React elements), and returns clean JSON.

Three exports:
- … (`@elasticpath/plasmic-mcp-registry`): `getFullRegistry()`, `withRegistryCapture()`, serialization helpers
- … (`@elasticpath/plasmic-mcp-registry/next`): `withPlasmicRegistry()` config wrapper

`withPlasmicRegistry()` solves RSC boundary errors via two mechanisms:
- `serverExternalPackages`: for packages in `node_modules` (`@plasmicapp/host`, `@plasmicapp/query`)
- webpack externals: for monorepo packages (`@plasmicpkgs/*`, `@elasticpath/plasmic-*`) where `serverExternalPackages` has no effect (Next.js #48739)

`withRegistryCapture()` wraps the PLASMIC loader so server-side registration calls also invoke `@plasmicapp/host`'s functions (which populate `globalThis`). The server loader's registration methods are no-ops for `globalThis`; this wrapper is only needed in the API route.

Consumer usage (`plasmicpkgs-dev/`)

`plasmic-register.ts` stays clean: no registry imports, just plain PLASMIC.

MCP server improvements

- … `project.set`/`project.refresh`

Test plan

- `cd packages/plasmic-mcp-registry && npx vitest`: all 79 registry tests pass
- `cd packages/plasmic-mcp && npm test`: all 1,655+ MCP server tests pass
- `cd plasmicpkgs-dev && npm run dev` then `curl localhost:3001/api/plasmic-registry`: returns 66 components, 4 contexts, 6 functions
- `/plasmic-host` loads without RSC errors or duplicate React warnings