feat: add plugin staged analysis pipeline by onlycastle · Pull Request #160 · onlycastle/BetterPrompt

onlycastle · 2026-03-17T06:43:29Z

Summary

Add multi-stage analysis pipeline to the Claude Code plugin with decomposed skills, staged MCP tools, and multi-source session scanning
Introduce packages/shared for cross-package type definitions and scoring logic
Update web app API routes, dashboard, and CLI for plugin pipeline compatibility

Changes

feat(plugin): add staged analysis types and infrastructure - Core types, stage DB, hook utils
feat(plugin): add multi-source scanner and analysis engine - Session scanner, evaluation assembler, deterministic scoring
feat(plugin): add session hooks and MCP tools - Session-start hook, get-domain-results, get/save-stage-output tools
feat(plugin): add analysis skills and plugin config - 6 new decomposed analysis skills
build(plugin): update compiled dist output - Rebuilt for marketplace compatibility
feat: add shared package and test infrastructure - Cross-package schemas, scoring, test fixtures + unit tests
refactor: update web app and CLI for plugin analysis pipeline - API routes, dashboard, orchestrator updates
docs: update architecture and plugin documentation - Architecture, deployment, plugin docs

Test Plan

TypeScript compilation passes (npx tsc --noEmit)
Plugin dist builds cleanly (npm run build)
Unit tests pass (npm test)
Manual verification of plugin analysis flow

Generated with Claude Code using /ship-it

Add core type definitions, stage database, hook utilities, debounce helper, and results database updates for the multi-stage analysis pipeline. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Add multi-source session scanner, evaluation assembler, and update data extractor, deterministic scorer/mapper, background analyzer, and report template for the staged analysis pipeline. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Add session-start hook, get-domain-results, get-stage-output, and save-stage-output MCP tools. Update existing hooks and tools for the staged analysis pipeline. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Add decomposed analysis skills (classify-type, generate-weekly-insights, summarize-projects, summarize-sessions, translate-report, verify-evidence) and update existing skills and plugin manifest. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Rebuild dist/ to match source changes for marketplace compatibility. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Add packages/shared for cross-package type definitions and utilities. Add plugin parity test fixtures and unit tests for API and plugin. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Update API routes, dashboard pages, analysis orchestrator, evaluation assembler, and local database to support the staged plugin pipeline. Update CLI index and uploader for compatibility. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Update agent architecture docs, deployment guide, plugin docs, and user flows for the staged analysis pipeline. Update AGENTS.md and README.md with new plugin capabilities. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

onlycastle · 2026-03-17T06:47:29Z

Code Review: PR #160 -- Plugin Staged Analysis Pipeline

Reviewed 80+ source files across plugin core, shared package, web app, CLI, hooks, MCP tools, skills, and tests. Focused on CLAUDE.md compliance, bug scan, security, and pattern adherence.

What Was Done Well

Clean architectural decomposition. Extracting scoring, type-mapping, and evaluation assembly into @betterprompt/shared is the right move for cross-package parity. The plugin types file now re-exports from shared, keeping a single source of truth.
Deferred queue pattern is sound. The SessionEnd hook marks analysis as "pending" instead of spawning a detached process, and SessionStart injects context so the next Claude Code session runs the pipeline. This sidesteps the "no LLM host" problem cleanly.
Good test coverage for new code. The session hooks lifecycle test, canonical analysis assembly test, parity fixture test, and get-domain-results test cover the critical data paths. The parity fixture approach (frozen JSON fixtures validated against both plugin and server assembly) is particularly strong.
No Fallback Policy compliance in the critical data path. `extractPhase1DataFromParsedSessions` throws when no sessions are provided. `extract_data` tool re-throws errors after calling `markAnalysisFailed`. The background analyzer is properly deprecated with an explicit throw.
Zod validation at MCP boundaries. `save_stage_output` and `save_domain_results` both validate inputs against shared schemas before persisting. This prevents corrupted data from reaching the DB.
English-only compliance. All code, comments, prompts, and documentation are in English. The only non-English content is test fixture translation data, which is correct (testing Korean translation overlay).

Issues Found

Important (Should Fix)

1. `readCachedParsedSessions` silently returns empty array on error
File: `packages/plugin/lib/core/multi-source-session-scanner.ts:78-83`

export async function readCachedParsedSessions(): Promise<ParsedSession[]> {
  try {
    const raw = await readFile(PARSED_SESSIONS_CACHE, 'utf-8');
    return JSON.parse(raw) as ParsedSession[];
  } catch {
    return []; // <-- silent fallback
  }
}

This catches ALL errors (corrupt JSON, permission denied, out of memory) and returns an empty array. The downstream `extract_data` tool checks for `sessions.length === 0` and returns a user-facing "no data" message, which masks the real error. If the cache file exists but is corrupted, the user sees "No cached parsed sessions. Call scan_sessions first" instead of learning about the corruption.

The CLAUDE.md No Fallback Policy says errors must propagate. The "file not found" case is the only expected error here. Consider narrowing the catch to distinguish file-not-found (return `[]`) from unexpected errors (rethrow).

2. Sync route auth weakness: Bearer token used as email lookup without verification
File: `app/api/analysis/sync/route.ts:274-283`

if (authHeader?.startsWith('Bearer ')) {
  const token = authHeader.slice(7);
  const userByEmail = findUserByEmail(token);
  userId = userByEmail?.id ?? getCurrentUserFromRequest().id;
}

The Bearer token is treated as a raw email string for user lookup. If someone sends `Authorization: Bearer admin@company.com`, they can write analysis records as that user. This bypasses actual authentication. The comment says "Token could be an email or a user ID" but there is no cryptographic verification of either. For a self-hosted deployment this may be acceptable, but the fallback to `getCurrentUserFromRequest().id` when the email lookup fails means a typo doesn't fail -- it silently writes to the wrong user.

3. Dual DB singleton risk between `stage-db.ts` and `results-db.ts`
Files: `packages/plugin/lib/stage-db.ts:18` and `packages/plugin/lib/results-db.ts:28`

Both modules open independent `better-sqlite3` connections to the same `results.db` file. While WAL mode permits concurrent readers, having two writer connections to the same DB file from the same process can cause SQLITE_BUSY if writes overlap. The `stage_outputs` table has a foreign key to `analysis_runs`, so the `results-db` must create the run before `stage-db` can write to it. If the connections are not coordinated, the foreign key might fail or a write-write collision could occur.

Consider sharing a single DB connection between the two modules, or at minimum documenting the ordering constraint.

4. `save_domain_results` schema has `severity` enum mismatch
File: `packages/shared/src/schemas/domain-result.ts:39` defines severity as `z.enum(['critical', 'high', 'medium', 'low'])`.
File: `app/api/analysis/sync/route.ts:36` defines the plugin interface severity as `'low' | 'medium' | 'high'`.

The legacy `PluginDomainResult` interface in the sync route is missing `'critical'` from its severity union. This means if a plugin sends a domain result with `severity: 'critical'`, the sync route's TypeScript types will accept it at runtime (since it's not validated against the interface), but the intent mismatch could cause confusion. This is a minor type-level inconsistency rather than a runtime bug, since the sync route uses the canonical run path now, but the legacy interface should be updated for correctness.

Suggestions (Nice to Have)

5. `debounce.ts` silent catches for filesystem operations are justified but undocumented
The session counting functions (`countClaudeSessions`, `countCursorSessions`, etc.) swallow errors for missing directories. This is intentional (the directories may not exist on all machines), but the No Fallback Policy section of CLAUDE.md could trip up future reviewers. Consider adding a brief inline comment referencing the policy exception, e.g., `// Justified silent fallback: directory may not exist on this platform`.

6. `readState()` returns `DEFAULT_STATE` on parse error
File: `packages/plugin/lib/debounce.ts:72-78`

If the state file is corrupted, this returns default state rather than surfacing the problem. This is a pragmatic choice (the state file is ephemeral), but worth a comment explaining why this is an exception to the No Fallback Policy.

7. Consider adding input schema to `get_stage_output` tool registration
File: `packages/plugin/mcp/server.ts:170-172`

The `get_stage_output` tool accepts an optional `stage` string parameter, but unlike `save_stage_output` there is no enum constraint on the stage name. Adding a Zod enum validation matching `STAGE_NAMES` would give the LLM better guidance on valid stage names.

8. `generate_report` HTTP server serves all reports from `latest.html` on refresh
File: `packages/plugin/mcp/tools/generate-report.ts:100-105`

After the first request, subsequent requests re-read `latest.html` from disk. If another analysis completes and overwrites `latest.html`, refreshing the page shows the new report instead of the originally generated one. This is probably the intended behavior but worth confirming.

9. Gemini schema nesting depth compliance -- confirmed OK
The shared schemas in `packages/shared/src/schemas/` respect the 4-level object nesting limit. The deepest path I found is: `Phase1Output > sessionMetrics > frictionSignals` (3 object levels) and `CanonicalAnalysisRun > stageOutputs > weeklyInsights > stats` (4 object levels). Arrays with nested objects don't count toward the limit per CLAUDE.md. No violations found.

10. Structured JSON convention -- confirmed OK
No pipe-delimited or semicolon-delimited string fields were introduced in any schema. All data fields use proper structured JSON types.

Architecture Summary

The PR successfully decomposes the monolithic analysis pipeline into a staged plugin-local architecture. The key data flow change is:

Before: SessionEnd hook -> spawn background process -> upload raw sessions to server -> server runs Gemini pipeline -> results stored on server

After: SessionEnd hook -> mark "pending" in state file -> next SessionStart injects context -> Claude Code runs local skill-based pipeline stage by stage -> results stored in local SQLite -> optional sync to team server

The `@betterprompt/shared` package becomes the canonical schema authority, with both the plugin and the web app importing from it. The evaluation assembly logic that previously lived only in the server is now shared, enabling the parity test fixtures to verify that plugin-assembled evaluations match server-assembled ones.

The 8 atomic commits are well-structured, each with a clear scope (core types, analysis engine, hooks/tools, skills, dist, shared package, web app/CLI, docs).

Verdict

Approve with reservations. Issues #1 (silent fallback in cache reader), #2 (auth weakness in sync route), and #3 (dual DB singletons) should be addressed. None are blocking for a merge if the team is aware, but #2 especially should get a tracking issue if not fixed in this PR.

1. No Fallback: readCachedParsedSessions now only returns [] for ENOENT (cache miss), all other errors propagate per No Fallback Policy. 2. Auth: sync route validates BETTERPROMPT_AUTH_TOKEN as shared secret instead of accepting raw email as Bearer token. User identity moved to X-User-Email header, gated behind token validation. 3. Dual DB: stage-db now reuses the shared connection from results-db instead of opening an independent connection to the same file. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The @betterprompt/shared package dist/ is gitignored (root dist/ rule) so CI has no compiled output to resolve. Add a workspace build step after npm ci to compile shared types before typecheck, tests, and build. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

onlycastle · 2026-03-17T22:08:10Z

Code Review

Found 6 issues (filtered from 19 raw findings across 5 parallel review agents, threshold ≥ 80/100):

1. Auth bypass when BETTERPROMPT_AUTH_TOKEN is not configured (Score: 90 — Security)

When the env var is unset, the token check at line 278 is skipped entirely, but X-User-Email header resolution still runs at line 289. Any unauthenticated caller who knows a valid email can store analysis results under another user's account. Fix: gate X-User-Email identity resolution behind the token check, or require the token in all deployments.

BetterPrompt/app/api/analysis/sync/route.ts

Lines 278 to 297 in 5a098c9

    
           if (expectedToken) { 
        
             const providedToken = authHeader?.startsWith('Bearer ') ? authHeader.slice(7) : null; 
        
             if (!providedToken || providedToken !== expectedToken) { 
        
               return NextResponse.json( 
        
                 { error: 'unauthorized', message: 'Invalid or missing auth token.' }, 
        
                 { status: 401 }, 
        
               ); 
        
             } 
        
           } 
        
           // Resolve user: check for email in X-User-Email header, fall back to local user 
        
           const emailHeader = request.headers.get('x-user-email'); 
        
           let userId: string; 
        
           if (emailHeader) { 
        
             const userByEmail = findUserByEmail(emailHeader); 
        
             userId = userByEmail?.id ?? getCurrentUserFromRequest().id; 
        
           } else { 
        
             userId = getCurrentUserFromRequest().id; 
        
           }

2. force: true recovery on SessionEnd corrupts mid-analysis state (Score: 85 — Logic Bug)

handleSessionEndHook calls recoverStaleAnalysisState({ force: true }) unconditionally at line 92. Because force: true bypasses the 30-minute staleness window, any running analysis state is immediately transitioned to failed — even if /analyze is actively running. This corrupts the analysis lifecycle if a sub-session ends while analysis is in-flight.

BetterPrompt/packages/plugin/hooks/post-session-handler.ts

Lines 92 to 95 in 5a098c9

    
           deps.recoverStaleAnalysisState({ 
        
             force: true, 
        
             reason: 'Recovered stale running state on SessionEnd hook startup.', 
        
           });

3. write-content.md prompt references fields that don't exist in schema (Score: 85 — Prompt-Schema Desync)

The "Focus Area Constraints" section references impactScore and relatedDomains (lines 114-115), but TopFocusAreaSchema in packages/shared/src/schemas/stage-outputs.ts only defines title, description, relatedQualities, and actions. The JSON example was updated but the constraints prose was not. (CLAUDE.md: "Prompt–Schema Constraint Sync")

BetterPrompt/packages/plugin/skills/write-content.md

Lines 114 to 115 in 5a098c9

    
           - `impactScore` (0.0-1.0): How much improvement the developer could see by addressing this area 
        
           - `relatedDomains`: Array of domain keys that contributed evidence to this focus area

4. DomainGrowthArea.recommendation has .min(50) with no matching prompt instruction (Score: 80 — CLAUDE.md Violation)

The schema defines recommendation: z.string().min(50) but none of the five domain analysis skills include a corresponding "MINIMUM 50 characters" instruction for recommendation. (CLAUDE.md: "Every Zod constraint that affects content length MUST have a matching natural-language instruction in the worker's prompt")

BetterPrompt/packages/shared/src/schemas/domain-result.ts

Line 40 in 5a098c9

recommendation: z.string().min(50),

5. PluginDomainResult interface missing 'critical' severity (Score: 80 — Type Inconsistency, Prior Review)

The local interface defines severity: 'low' | 'medium' | 'high' but the canonical DomainGrowthAreaSchema in shared includes 'critical'. This was flagged in the prior review (issue #4) and remains unaddressed.

BetterPrompt/app/api/analysis/sync/route.ts

Line 35 in 5a098c9

severity: 'low' | 'medium' | 'high';

6. CLAUDE.md Commands section lists stale CLI command (Score: 80 — Doc Accuracy)

npx betterprompt-cli --auto is listed as "Non-interactive analysis (skips all prompts)" but the CLI now only prints a plugin migration notice and exits with code 1. The --auto flag is not recognized in the new main().

BetterPrompt/CLAUDE.md

Line 31 in 5a098c9

npx betterprompt-cli --auto # Non-interactive analysis (skips all prompts)

13 additional findings below threshold (score < 80)

Score	Category	Summary
75	previous-feedback	`readState()` silent fallback needs No Fallback Policy comment
75	previous-feedback	`countClaudeSessions`/`countCursorSessions` silent catches need inline comment
60	previous-feedback	`get_stage_output` missing `z.enum` constraint for stage param
60	history-context	Missing gzip decode in sync route (old CLI clients)
50	claude-md	Korean example values in `translate-report.md` (contextually intentional)
50	bug	`ensureColumn` SQL injection sink (latent, all callers hardcoded)
50	bug	Cursor `state.vscdb` lock contention causes undercounted sessions
50	code-comments	`sync/route.ts` type attribution comment points to re-export barrel
40	history-context	`uploader.ts` URL normalization is unreachable dead code
40	code-comments	`uploadForAnalysis` JSDoc describes removed streaming flow
40	code-comments	~200 lines dead code in `uploader.ts` with stale comments
40	code-comments	`runAnalysis()` dead code with stale JSDoc
80	code-comments	CLAUDE.md `GOOGLE_GEMINI_API_KEY` "Required" may be stale (nuanced)

Generated with Claude Code using /ship-it

_{If useful, react with a thumbs-up. Otherwise, thumbs-down.}

When BETTERPROMPT_AUTH_TOKEN is not configured, the X-User-Email header was still trusted for identity resolution, allowing unauthenticated callers to store analysis results under any known user's account. Now X-User-Email is only read after successful token validation. Also adds missing 'critical' severity to PluginDomainResult interface to match the canonical DomainGrowthAreaSchema. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

force: true bypassed the 30-minute staleness check, causing any running analysis to be immediately marked as failed when a sub-session ended. Changed to force: false so only genuinely stale states are recovered, preventing corruption of in-flight analysis runs. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- write-content.md: replace stale impactScore/relatedDomains references with relatedQualities/actions to match TopFocusAreaSchema - All 5 analyze-*.md skills: add severity and recommendation field instructions (MINIMUM 50 characters) to match DomainGrowthAreaSchema Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

CLI now prints a plugin migration notice and exits. The --auto flag is no longer recognized by the new main() entry point. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

onlycastle and others added 8 commits March 16, 2026 23:41

feat(plugin): add staged analysis types and infrastructure

f7da6b6

Add core type definitions, stage database, hook utilities, debounce helper, and results database updates for the multi-stage analysis pipeline. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

feat(plugin): add session hooks and MCP tools

a35613d

Add session-start hook, get-domain-results, get-stage-output, and save-stage-output MCP tools. Update existing hooks and tools for the staged analysis pipeline. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

build(plugin): update compiled dist output

d813907

Rebuild dist/ to match source changes for marketplace compatibility. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

feat: add shared package and test infrastructure

26f7b46

Add packages/shared for cross-package type definitions and utilities. Add plugin parity test fixtures and unit tests for API and plugin. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

onlycastle and others added 2 commits March 16, 2026 23:54

onlycastle and others added 4 commits March 17, 2026 15:57

docs: remove stale betterprompt-cli command from CLAUDE.md

bb53f01

CLI now prints a plugin migration notice and exits. The --auto flag is no longer recognized by the new main() entry point. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

onlycastle merged commit 50ae1f3 into main Mar 18, 2026
1 check passed

onlycastle deleted the feat/plugin-staged-analysis-pipeline-20260316 branch March 18, 2026 00:11

onlycastle mentioned this pull request Mar 18, 2026

chore: remove CLI package and migrate scanner to plugin #162

Merged

3 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: add plugin staged analysis pipeline#160

feat: add plugin staged analysis pipeline#160
onlycastle merged 14 commits intomainfrom
feat/plugin-staged-analysis-pipeline-20260316

onlycastle commented Mar 17, 2026

Uh oh!

onlycastle commented Mar 17, 2026

Uh oh!

onlycastle commented Mar 17, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

onlycastle commented Mar 17, 2026

Summary

Changes

Test Plan

Uh oh!

onlycastle commented Mar 17, 2026

Code Review: PR #160 -- Plugin Staged Analysis Pipeline

What Was Done Well

Issues Found

Important (Should Fix)

Suggestions (Nice to Have)

Architecture Summary

Verdict

Uh oh!

onlycastle commented Mar 17, 2026

Code Review

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant