Skip to content

feat: add plugin staged analysis pipeline#160

Merged
onlycastle merged 14 commits intomainfrom
feat/plugin-staged-analysis-pipeline-20260316
Mar 18, 2026
Merged

feat: add plugin staged analysis pipeline#160
onlycastle merged 14 commits intomainfrom
feat/plugin-staged-analysis-pipeline-20260316

Conversation

@onlycastle
Copy link
Copy Markdown
Owner

Summary

  • Add multi-stage analysis pipeline to the Claude Code plugin with decomposed skills, staged MCP tools, and multi-source session scanning
  • Introduce packages/shared for cross-package type definitions and scoring logic
  • Update web app API routes, dashboard, and CLI for plugin pipeline compatibility

Changes

  • feat(plugin): add staged analysis types and infrastructure - Core types, stage DB, hook utils
  • feat(plugin): add multi-source scanner and analysis engine - Session scanner, evaluation assembler, deterministic scoring
  • feat(plugin): add session hooks and MCP tools - Session-start hook, get-domain-results, get/save-stage-output tools
  • feat(plugin): add analysis skills and plugin config - 6 new decomposed analysis skills
  • build(plugin): update compiled dist output - Rebuilt for marketplace compatibility
  • feat: add shared package and test infrastructure - Cross-package schemas, scoring, test fixtures + unit tests
  • refactor: update web app and CLI for plugin analysis pipeline - API routes, dashboard, orchestrator updates
  • docs: update architecture and plugin documentation - Architecture, deployment, plugin docs

Test Plan

  • TypeScript compilation passes (npx tsc --noEmit)
  • Plugin dist builds cleanly (npm run build)
  • Unit tests pass (npm test)
  • Manual verification of plugin analysis flow

Generated with Claude Code using /ship-it

onlycastle and others added 8 commits March 16, 2026 23:41
Add core type definitions, stage database, hook utilities, debounce
helper, and results database updates for the multi-stage analysis
pipeline.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add multi-source session scanner, evaluation assembler, and update
data extractor, deterministic scorer/mapper, background analyzer,
and report template for the staged analysis pipeline.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add session-start hook, get-domain-results, get-stage-output, and
save-stage-output MCP tools. Update existing hooks and tools for
the staged analysis pipeline.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add decomposed analysis skills (classify-type, generate-weekly-insights,
summarize-projects, summarize-sessions, translate-report, verify-evidence)
and update existing skills and plugin manifest.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Rebuild dist/ to match source changes for marketplace compatibility.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add packages/shared for cross-package type definitions and utilities.
Add plugin parity test fixtures and unit tests for API and plugin.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Update API routes, dashboard pages, analysis orchestrator, evaluation
assembler, and local database to support the staged plugin pipeline.
Update CLI index and uploader for compatibility.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Update agent architecture docs, deployment guide, plugin docs, and
user flows for the staged analysis pipeline. Update AGENTS.md and
README.md with new plugin capabilities.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@onlycastle
Copy link
Copy Markdown
Owner Author

Code Review: PR #160 -- Plugin Staged Analysis Pipeline

Reviewed 80+ source files across plugin core, shared package, web app, CLI, hooks, MCP tools, skills, and tests. Focused on CLAUDE.md compliance, bug scan, security, and pattern adherence.


What Was Done Well

  • Clean architectural decomposition. Extracting scoring, type-mapping, and evaluation assembly into @betterprompt/shared is the right move for cross-package parity. The plugin types file now re-exports from shared, keeping a single source of truth.
  • Deferred queue pattern is sound. The SessionEnd hook marks analysis as "pending" instead of spawning a detached process, and SessionStart injects context so the next Claude Code session runs the pipeline. This sidesteps the "no LLM host" problem cleanly.
  • Good test coverage for new code. The session hooks lifecycle test, canonical analysis assembly test, parity fixture test, and get-domain-results test cover the critical data paths. The parity fixture approach (frozen JSON fixtures validated against both plugin and server assembly) is particularly strong.
  • No Fallback Policy compliance in the critical data path. `extractPhase1DataFromParsedSessions` throws when no sessions are provided. `extract_data` tool re-throws errors after calling `markAnalysisFailed`. The background analyzer is properly deprecated with an explicit throw.
  • Zod validation at MCP boundaries. `save_stage_output` and `save_domain_results` both validate inputs against shared schemas before persisting. This prevents corrupted data from reaching the DB.
  • English-only compliance. All code, comments, prompts, and documentation are in English. The only non-English content is test fixture translation data, which is correct (testing Korean translation overlay).

Issues Found

Important (Should Fix)

1. `readCachedParsedSessions` silently returns empty array on error
File: `packages/plugin/lib/core/multi-source-session-scanner.ts:78-83`

export async function readCachedParsedSessions(): Promise<ParsedSession[]> {
  try {
    const raw = await readFile(PARSED_SESSIONS_CACHE, 'utf-8');
    return JSON.parse(raw) as ParsedSession[];
  } catch {
    return []; // <-- silent fallback
  }
}

This catches ALL errors (corrupt JSON, permission denied, out of memory) and returns an empty array. The downstream `extract_data` tool checks for `sessions.length === 0` and returns a user-facing "no data" message, which masks the real error. If the cache file exists but is corrupted, the user sees "No cached parsed sessions. Call scan_sessions first" instead of learning about the corruption.

The CLAUDE.md No Fallback Policy says errors must propagate. The "file not found" case is the only expected error here. Consider narrowing the catch to distinguish file-not-found (return `[]`) from unexpected errors (rethrow).

2. Sync route auth weakness: Bearer token used as email lookup without verification
File: `app/api/analysis/sync/route.ts:274-283`

if (authHeader?.startsWith('Bearer ')) {
  const token = authHeader.slice(7);
  const userByEmail = findUserByEmail(token);
  userId = userByEmail?.id ?? getCurrentUserFromRequest().id;
}

The Bearer token is treated as a raw email string for user lookup. If someone sends `Authorization: Bearer admin@company.com`, they can write analysis records as that user. This bypasses actual authentication. The comment says "Token could be an email or a user ID" but there is no cryptographic verification of either. For a self-hosted deployment this may be acceptable, but the fallback to `getCurrentUserFromRequest().id` when the email lookup fails means a typo doesn't fail -- it silently writes to the wrong user.

3. Dual DB singleton risk between `stage-db.ts` and `results-db.ts`
Files: `packages/plugin/lib/stage-db.ts:18` and `packages/plugin/lib/results-db.ts:28`

Both modules open independent `better-sqlite3` connections to the same `results.db` file. While WAL mode permits concurrent readers, having two writer connections to the same DB file from the same process can cause SQLITE_BUSY if writes overlap. The `stage_outputs` table has a foreign key to `analysis_runs`, so the `results-db` must create the run before `stage-db` can write to it. If the connections are not coordinated, the foreign key might fail or a write-write collision could occur.

Consider sharing a single DB connection between the two modules, or at minimum documenting the ordering constraint.

4. `save_domain_results` schema has `severity` enum mismatch
File: `packages/shared/src/schemas/domain-result.ts:39` defines severity as `z.enum(['critical', 'high', 'medium', 'low'])`.
File: `app/api/analysis/sync/route.ts:36` defines the plugin interface severity as `'low' | 'medium' | 'high'`.

The legacy `PluginDomainResult` interface in the sync route is missing `'critical'` from its severity union. This means if a plugin sends a domain result with `severity: 'critical'`, the sync route's TypeScript types will accept it at runtime (since it's not validated against the interface), but the intent mismatch could cause confusion. This is a minor type-level inconsistency rather than a runtime bug, since the sync route uses the canonical run path now, but the legacy interface should be updated for correctness.


Suggestions (Nice to Have)

5. `debounce.ts` silent catches for filesystem operations are justified but undocumented
The session counting functions (`countClaudeSessions`, `countCursorSessions`, etc.) swallow errors for missing directories. This is intentional (the directories may not exist on all machines), but the No Fallback Policy section of CLAUDE.md could trip up future reviewers. Consider adding a brief inline comment referencing the policy exception, e.g., `// Justified silent fallback: directory may not exist on this platform`.

6. `readState()` returns `DEFAULT_STATE` on parse error
File: `packages/plugin/lib/debounce.ts:72-78`

If the state file is corrupted, this returns default state rather than surfacing the problem. This is a pragmatic choice (the state file is ephemeral), but worth a comment explaining why this is an exception to the No Fallback Policy.

7. Consider adding input schema to `get_stage_output` tool registration
File: `packages/plugin/mcp/server.ts:170-172`

The `get_stage_output` tool accepts an optional `stage` string parameter, but unlike `save_stage_output` there is no enum constraint on the stage name. Adding a Zod enum validation matching `STAGE_NAMES` would give the LLM better guidance on valid stage names.

8. `generate_report` HTTP server serves all reports from `latest.html` on refresh
File: `packages/plugin/mcp/tools/generate-report.ts:100-105`

After the first request, subsequent requests re-read `latest.html` from disk. If another analysis completes and overwrites `latest.html`, refreshing the page shows the new report instead of the originally generated one. This is probably the intended behavior but worth confirming.

9. Gemini schema nesting depth compliance -- confirmed OK
The shared schemas in `packages/shared/src/schemas/` respect the 4-level object nesting limit. The deepest path I found is: `Phase1Output > sessionMetrics > frictionSignals` (3 object levels) and `CanonicalAnalysisRun > stageOutputs > weeklyInsights > stats` (4 object levels). Arrays with nested objects don't count toward the limit per CLAUDE.md. No violations found.

10. Structured JSON convention -- confirmed OK
No pipe-delimited or semicolon-delimited string fields were introduced in any schema. All data fields use proper structured JSON types.


Architecture Summary

The PR successfully decomposes the monolithic analysis pipeline into a staged plugin-local architecture. The key data flow change is:

Before: SessionEnd hook -> spawn background process -> upload raw sessions to server -> server runs Gemini pipeline -> results stored on server

After: SessionEnd hook -> mark "pending" in state file -> next SessionStart injects context -> Claude Code runs local skill-based pipeline stage by stage -> results stored in local SQLite -> optional sync to team server

The `@betterprompt/shared` package becomes the canonical schema authority, with both the plugin and the web app importing from it. The evaluation assembly logic that previously lived only in the server is now shared, enabling the parity test fixtures to verify that plugin-assembled evaluations match server-assembled ones.

The 8 atomic commits are well-structured, each with a clear scope (core types, analysis engine, hooks/tools, skills, dist, shared package, web app/CLI, docs).


Verdict

Approve with reservations. Issues #1 (silent fallback in cache reader), #2 (auth weakness in sync route), and #3 (dual DB singletons) should be addressed. None are blocking for a merge if the team is aware, but #2 especially should get a tracking issue if not fixed in this PR.

onlycastle and others added 2 commits March 16, 2026 23:54
1. No Fallback: readCachedParsedSessions now only returns [] for ENOENT
   (cache miss), all other errors propagate per No Fallback Policy.

2. Auth: sync route validates BETTERPROMPT_AUTH_TOKEN as shared secret
   instead of accepting raw email as Bearer token. User identity moved
   to X-User-Email header, gated behind token validation.

3. Dual DB: stage-db now reuses the shared connection from results-db
   instead of opening an independent connection to the same file.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The @betterprompt/shared package dist/ is gitignored (root dist/ rule)
so CI has no compiled output to resolve. Add a workspace build step
after npm ci to compile shared types before typecheck, tests, and build.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@onlycastle
Copy link
Copy Markdown
Owner Author

Code Review

Found 6 issues (filtered from 19 raw findings across 5 parallel review agents, threshold ≥ 80/100):


1. Auth bypass when BETTERPROMPT_AUTH_TOKEN is not configured (Score: 90 — Security)

When the env var is unset, the token check at line 278 is skipped entirely, but X-User-Email header resolution still runs at line 289. Any unauthenticated caller who knows a valid email can store analysis results under another user's account. Fix: gate X-User-Email identity resolution behind the token check, or require the token in all deployments.

if (expectedToken) {
const providedToken = authHeader?.startsWith('Bearer ') ? authHeader.slice(7) : null;
if (!providedToken || providedToken !== expectedToken) {
return NextResponse.json(
{ error: 'unauthorized', message: 'Invalid or missing auth token.' },
{ status: 401 },
);
}
}
// Resolve user: check for email in X-User-Email header, fall back to local user
const emailHeader = request.headers.get('x-user-email');
let userId: string;
if (emailHeader) {
const userByEmail = findUserByEmail(emailHeader);
userId = userByEmail?.id ?? getCurrentUserFromRequest().id;
} else {
userId = getCurrentUserFromRequest().id;
}


2. force: true recovery on SessionEnd corrupts mid-analysis state (Score: 85 — Logic Bug)

handleSessionEndHook calls recoverStaleAnalysisState({ force: true }) unconditionally at line 92. Because force: true bypasses the 30-minute staleness window, any running analysis state is immediately transitioned to failed — even if /analyze is actively running. This corrupts the analysis lifecycle if a sub-session ends while analysis is in-flight.

deps.recoverStaleAnalysisState({
force: true,
reason: 'Recovered stale running state on SessionEnd hook startup.',
});


3. write-content.md prompt references fields that don't exist in schema (Score: 85 — Prompt-Schema Desync)

The "Focus Area Constraints" section references impactScore and relatedDomains (lines 114-115), but TopFocusAreaSchema in packages/shared/src/schemas/stage-outputs.ts only defines title, description, relatedQualities, and actions. The JSON example was updated but the constraints prose was not. (CLAUDE.md: "Prompt–Schema Constraint Sync")

- `impactScore` (0.0-1.0): How much improvement the developer could see by addressing this area
- `relatedDomains`: Array of domain keys that contributed evidence to this focus area


4. DomainGrowthArea.recommendation has .min(50) with no matching prompt instruction (Score: 80 — CLAUDE.md Violation)

The schema defines recommendation: z.string().min(50) but none of the five domain analysis skills include a corresponding "MINIMUM 50 characters" instruction for recommendation. (CLAUDE.md: "Every Zod constraint that affects content length MUST have a matching natural-language instruction in the worker's prompt")

recommendation: z.string().min(50),


5. PluginDomainResult interface missing 'critical' severity (Score: 80 — Type Inconsistency, Prior Review)

The local interface defines severity: 'low' | 'medium' | 'high' but the canonical DomainGrowthAreaSchema in shared includes 'critical'. This was flagged in the prior review (issue #4) and remains unaddressed.

severity: 'low' | 'medium' | 'high';


6. CLAUDE.md Commands section lists stale CLI command (Score: 80 — Doc Accuracy)

npx betterprompt-cli --auto is listed as "Non-interactive analysis (skips all prompts)" but the CLI now only prints a plugin migration notice and exits with code 1. The --auto flag is not recognized in the new main().

npx betterprompt-cli --auto # Non-interactive analysis (skips all prompts)


13 additional findings below threshold (score < 80)
Score Category Summary
75 previous-feedback readState() silent fallback needs No Fallback Policy comment
75 previous-feedback countClaudeSessions/countCursorSessions silent catches need inline comment
60 previous-feedback get_stage_output missing z.enum constraint for stage param
60 history-context Missing gzip decode in sync route (old CLI clients)
50 claude-md Korean example values in translate-report.md (contextually intentional)
50 bug ensureColumn SQL injection sink (latent, all callers hardcoded)
50 bug Cursor state.vscdb lock contention causes undercounted sessions
50 code-comments sync/route.ts type attribution comment points to re-export barrel
40 history-context uploader.ts URL normalization is unreachable dead code
40 code-comments uploadForAnalysis JSDoc describes removed streaming flow
40 code-comments ~200 lines dead code in uploader.ts with stale comments
40 code-comments runAnalysis() dead code with stale JSDoc
80 code-comments CLAUDE.md GOOGLE_GEMINI_API_KEY "Required" may be stale (nuanced)

Generated with Claude Code using /ship-it

If useful, react with a thumbs-up. Otherwise, thumbs-down.

onlycastle and others added 4 commits March 17, 2026 15:57
When BETTERPROMPT_AUTH_TOKEN is not configured, the X-User-Email header
was still trusted for identity resolution, allowing unauthenticated
callers to store analysis results under any known user's account.
Now X-User-Email is only read after successful token validation.

Also adds missing 'critical' severity to PluginDomainResult interface
to match the canonical DomainGrowthAreaSchema.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
force: true bypassed the 30-minute staleness check, causing any
running analysis to be immediately marked as failed when a sub-session
ended. Changed to force: false so only genuinely stale states are
recovered, preventing corruption of in-flight analysis runs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- write-content.md: replace stale impactScore/relatedDomains references
  with relatedQualities/actions to match TopFocusAreaSchema
- All 5 analyze-*.md skills: add severity and recommendation field
  instructions (MINIMUM 50 characters) to match DomainGrowthAreaSchema

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
CLI now prints a plugin migration notice and exits. The --auto flag
is no longer recognized by the new main() entry point.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@onlycastle onlycastle merged commit 50ae1f3 into main Mar 18, 2026
1 check passed
@onlycastle onlycastle deleted the feat/plugin-staged-analysis-pipeline-20260316 branch March 18, 2026 00:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant