feat(core): AST-based pattern analysis via tree-sitter queries #16

Merged: prosdev merged 8 commits into main from feat/mcp-phase1-ast-patterns on Mar 31, 2026

prosdev commented Mar 31, 2026

Summary

Adds AST-based pattern detection to dev_patterns using tree-sitter queries. Replaces regex-only detection with 12 S-expression queries across 3 categories, covering the full JS/TS ecosystem (.ts, .tsx, .js, .jsx).

What it does

Error handling — detects patterns regex can't:

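| Pattern                          | Before (regex) | After (AST) |
|----------------------------------|----------------|-------------|
| throw new Error(...)             | Yes            | Yes         |
| Result<T> / { ok: true }         | Yes            | Yes         |
| try { } catch { }                | No             | Yes         |
| promise.catch(handler)           | No             | Yes         |
| await inside try/catch           | No             | Yes         |
| class AppError extends BaseError | No             | Yes         |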
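
For orientation, here is a small fixture containing one instance of each construct listed above. It is a hypothetical sample written for this description, not a file from the test suite; all names (`parsePort`, `loadPort`, and so on) are made up for illustration.

```typescript
// Hypothetical fixture: one instance of each error-handling construct in the table.
class BaseError extends Error {}

// Error class: a class that extends another error type.
class AppError extends BaseError {}

// Result-style return type.
type Result<T> = { ok: true; value: T } | { ok: false; error: string };

function parsePort(raw: string): Result<number> {
  const port = Number(raw);
  return Number.isInteger(port)
    ? { ok: true, value: port }
    : { ok: false, error: `invalid port: ${raw}` };
}

// throw statement.
function mustParsePort(raw: string): number {
  const parsed = parsePort(raw);
  if (!parsed.ok) throw new AppError(parsed.error);
  return parsed.value;
}

// try/catch with an await inside the try block.
async function loadPort(read: () => Promise<string>): Promise<number> {
  try {
    const raw = await read();
    return mustParsePort(raw);
  } catch {
    return 8080;
  }
}

// promise.catch(handler).
function loadPortWithCatch(read: () => Promise<string>): Promise<number> {
  return read().then(mustParsePort).catch(() => 8080);
}
```

Per the table, a regex-only pass would be expected to pick up the `throw` and the `Result` shape here, while the last three constructs and the error class need the AST queries.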

Import style — more precise detection:

  • Dynamic import() — now counts as ESM (was invisible)
  • require() — AST-precise (regex was fragile)
  • Re-exports — distinguished from regular imports

Type coverage — catches what regex misses:

  • Arrow function return types ((): Type => ...) — regex is fragile on these
  • Accurate denominator via arrow-total + function-total queries
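
A minimal sketch of the ratio this implies, assuming the query results arrive as a name-to-count map keyed by the query names used in this PR. The function name `typeCoverage` and the choice to return `undefined` for a file with no functions are assumptions made for illustration.

```typescript
// Query names follow the PR text; the exact keys in the codebase are an assumption.
type AstCounts = ReadonlyMap<string, number>;

function typeCoverage(counts: AstCounts): number | undefined {
  // Numerator: functions that declare a return type.
  const typed =
    (counts.get("arrow-return-type") ?? 0) +
    (counts.get("function-return-type") ?? 0);
  // Denominator: every function, typed or not.
  const total =
    (counts.get("arrow-total") ?? 0) + (counts.get("function-total") ?? 0);
  // No functions at all: report no ratio rather than 0% or 100%.
  return total === 0 ? undefined : typed / total;
}

const sampleCounts = new Map([
  ["arrow-return-type", 1],
  ["function-return-type", 2],
  ["arrow-total", 4],
  ["function-total", 2],
]);
console.log(typeCoverage(sampleCounts)); // 0.5
```

Dividing by the total queries rather than by the typed matches alone is what keeps the ratio from being inflated.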

Architecture

PatternAnalysisService
    │ calls runAllAstQueries() ONCE per file
    ▼
PatternMatcher (interface — 1 method)
    └── WasmPatternMatcher (web-tree-sitter, WASM)
          └── runQueries() — tree.delete() + query.delete() in finally

Extractors are synchronous pure functions that take a pre-computed map of AST query results (query name to match count).
Rules are S-expression string constants (no query builder; YAGNI).
Designed for a future swap to @ast-grep/napi if bulk-scanning performance matters.
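
The shape described above might look roughly like the following sketch. The method name `match`, its signature, the two query strings, and the `CannedMatcher` stand-in are all assumptions for illustration; they are not copied from the implementation, and the stand-in exists only so the example runs without web-tree-sitter.

```typescript
type AstCounts = ReadonlyMap<string, number>;

// Hypothetical single-method interface; the real name and signature may differ.
interface PatternMatcher {
  match(
    source: string,
    filePath: string,
    queries: Record<string, string>,
  ): Promise<AstCounts>;
}

// Rules are plain S-expression strings. These two are illustrative only.
const QUERIES: Record<string, string> = {
  "try-catch": "(try_statement) @match",
  throw: "(throw_statement) @match",
};

// Called once per file: one parse, all queries, one map of counts.
async function runAllAstQueries(
  matcher: PatternMatcher,
  source: string,
  filePath: string,
): Promise<AstCounts> {
  return matcher.match(source, filePath, QUERIES);
}

// Extractors are synchronous and pure: they only read the pre-computed counts.
function usesTryCatch(counts: AstCounts): boolean {
  return (counts.get("try-catch") ?? 0) > 0;
}

function usesThrow(counts: AstCounts): boolean {
  return (counts.get("throw") ?? 0) > 0;
}

// Stand-in matcher returning canned results and counting how often it is called.
class CannedMatcher implements PatternMatcher {
  calls = 0;
  private readonly canned: AstCounts;

  constructor(canned: AstCounts) {
    this.canned = canned;
  }

  async match(): Promise<AstCounts> {
    this.calls += 1;
    return this.canned;
  }
}

const cannedMatcher = new CannedMatcher(new Map([["try-catch", 2], ["throw", 0]]));
runAllAstQueries(cannedMatcher, "/* source */", "example.ts").then((counts) => {
  console.log(usesTryCatch(counts), usesThrow(counts), cannedMatcher.calls); // true false 1
});
```

Keeping the interface to a single method is what would make a later swap to another backend a matter of writing one new class.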

Changes

New files

  • packages/core/src/pattern-matcher/ — PatternMatcher interface, 12 rules, WASM implementation
  • packages/core/src/pattern-matcher/__tests__/wasm-matcher.test.ts — 51 tests

Modified files

  • packages/core/src/scanner/tree-sitter.ts — add TS/TSX/JS languages + runQueries()
  • packages/dev-agent/scripts/copy-wasm.js — bundle 3 new WASM grammars
  • packages/core/src/services/pattern-analysis-service.ts — AST-enhanced extractors + runAllAstQueries
  • packages/core/src/services/pattern-analysis-types.ts — optional patternMatcher in config
  • packages/mcp-server/ — wire PatternMatcher through InspectAdapter

Bundle size

+5.3MB WASM (TS 2.3MB + TSX 2.3MB + JS 647KB). Acceptable for CLI tool.

Test plan

Automated (51 tests in wasm-matcher.test.ts)

  • 10 positive tests with exact match counts (count === 1)
  • 10 negative tests (one per query, count === 0)
  • 3 total count queries (arrow-total, function-total, no functions)
  • 3 language routing tests (TSX fixture, JSX→javascript, unsupported)
  • 3 edge cases (empty source, malformed TS, invalid S-expression)
  • 1 performance sanity (552 lines + 12 queries in 39ms)
  • 5 resolveLanguage extension routing tests
  • 7 extractErrorHandlingWithAst merge logic tests
  • 4 extractImportStyleWithAst merge logic tests
  • 5 extractTypeCoverageWithAst merge logic tests (accurate denominator)
  • 49 existing pattern-analysis-service tests pass unchanged (regex fallback)
  • 1628 total tests pass, 0 failures
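
The language routing tests above imply an extension-to-grammar mapping along these lines. This is a sketch under assumptions: the table contents are inferred from the grammars named in this PR, returning `undefined` for unsupported files is a guess at the real behaviour, and `analysisPath` is a hypothetical helper showing where the regex fallback would kick in.

```typescript
type TreeSitterLanguage = "go" | "typescript" | "tsx" | "javascript";

// Assumed mapping: .jsx shares the javascript grammar, .tsx has its own.
const EXTENSION_TO_LANGUAGE: Record<string, TreeSitterLanguage> = {
  ".go": "go",
  ".ts": "typescript",
  ".tsx": "tsx",
  ".js": "javascript",
  ".jsx": "javascript",
};

function resolveLanguage(filePath: string): TreeSitterLanguage | undefined {
  const dot = filePath.lastIndexOf(".");
  if (dot === -1) return undefined;
  return EXTENSION_TO_LANGUAGE[filePath.slice(dot).toLowerCase()];
}

// Hypothetical helper: AST only when a matcher is configured and the file is supported.
function analysisPath(filePath: string, hasMatcher: boolean): "ast" | "regex" {
  return hasMatcher && resolveLanguage(filePath) !== undefined ? "ast" : "regex";
}

console.log(resolveLanguage("src/App.jsx")); // javascript
console.log(resolveLanguage("README.md")); // undefined
console.log(analysisPath("README.md", true)); // regex
```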

Manual smoke test

pnpm build                          # Bundles 4 WASM grammars (go, ts, tsx, js)
pnpm test -- wasm-matcher           # 51 tests, 710ms ✓
pnpm test -- pattern-analysis       # 49 passed, 1 skipped ✓

# Full MCP flow (requires Antfly running):
dev index
# In Cursor/Claude Code:
# > Use dev_patterns with filePath "packages/core/src/services/pattern-analysis-service.ts"
# > Use dev_patterns with filePath "packages/core/src/services/pattern-analysis-service.ts" format "json"

Known test gap

  • InspectAdapter test does not construct with a real PatternMatcher — AST path is not exercised through the MCP adapter layer. Tracked in .claude/scratchpad.md for follow-up.

Review findings addressed

| Round         | Finding                                 | Fix                                                                  |
|---------------|-----------------------------------------|----------------------------------------------------------------------|
| Code review   | Query objects not deleted (WASM leak)   | q.delete() in finally                                                |
| Code review   | Type coverage ratio inflation           | Added arrow-total + function-total denominator queries               |
| Code review   | try-catch alone classified as 'throw'   | Falls through to regex instead                                       |
| Code review   | Import count asymmetry                  | Documented with comment                                              |
| Test coverage | Extractors never tested with real AST   | 16 extractor merge logic tests added                                 |
| Test coverage | Denominator queries untested            | 3 tests for arrow-total/function-total                               |
| Senior review | Same source parsed 3x                   | runAllAstQueries parses once, extractors take pre-computed map       |
| Senior review | runQueries doing too much               | Extractors are now sync pure functions                               |
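
The WASM leak fix boils down to the cleanup pattern sketched below. `FakeHandle` and `FakeQuery` are stand-ins invented for this example so it runs without web-tree-sitter; they only mimic objects that must be freed explicitly. Treating a failing query as a zero count is also an assumption about what "error recovery" means here.

```typescript
// Tracks how many fake WASM-backed objects are still allocated.
let liveObjects = 0;

class FakeHandle {
  private freed = false;

  constructor() {
    liveObjects += 1;
  }

  delete(): void {
    if (!this.freed) {
      this.freed = true;
      liveObjects -= 1;
    }
  }
}

class FakeQuery extends FakeHandle {
  readonly pattern: string;

  constructor(pattern: string) {
    super();
    this.pattern = pattern;
  }

  matches(): unknown[] {
    if (this.pattern === "(boom)") throw new Error("query failed");
    return [{}];
  }
}

function countMatches(patterns: string[]): Map<string, number> {
  const counts = new Map<string, number>();
  const tree = new FakeHandle(); // stands in for the parsed tree
  try {
    for (const pattern of patterns) {
      const query = new FakeQuery(pattern);
      try {
        counts.set(pattern, query.matches().length);
      } catch {
        counts.set(pattern, 0); // assumed recovery: a failing query counts as zero
      } finally {
        query.delete(); // runs whether or not matches() threw
      }
    }
    return counts;
  } finally {
    tree.delete(); // always release the tree, even on early return
  }
}

const cleanupResult = countMatches(["(try_statement) @m", "(boom)"]);
console.log(cleanupResult.get("(boom)"), liveObjects); // 0 0
```

Putting each `delete()` in a `finally` block means a thrown query error cannot strand an allocation on the WASM heap.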

Research that informed this

  • Studied ast-grep architecture (custom matcher on tree-sitter AST, not S-expression queries)
  • Validated all 12 S-expressions against actual tree-sitter-typescript grammar by parsing real snippets
  • Three plan reviews (architecture, WASM/AST specialist, SDET) before implementation
  • Decided against ast-grep CLI (shell injection, temp files) and NAPI (premature — WASM is 39ms for 552 lines)

Generated with Claude Code

prosdev and others added 8 commits March 31, 2026 02:34
Replace ast-grep CLI approach with tree-sitter queries using WASM we
already bundle. Key changes: PatternMatcher interface for future NAPI
swap, PatternRule DSL translated to S-expressions, zero new dependencies.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add typescript, tsx, and javascript to TreeSitterLanguage type and
SUPPORTED_LANGUAGES. Covers the full Node/frontend ecosystem:
.ts, .tsx, .js, .jsx files.

Bundle size: +5.3MB WASM (TS 2.3MB + TSX 2.3MB + JS 647KB).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
PatternMatcher interface with WasmPatternMatcher implementation:
- 5 error handling queries (try-catch, throw, promise.catch, await-in-try, error-class)
- 3 import style queries (dynamic-import, re-export, require)
- 2 type coverage queries (arrow-return-type, function-return-type)

All S-expressions verified against actual tree-sitter-typescript grammar.
runQueries() handles parser creation, tree.delete() cleanup, and error recovery.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
10 positive (exact counts), 10 negative (one per query), 3 language
routing (TSX fixture, JSX→javascript, unsupported), 3 edge cases
(empty, malformed, invalid query), 1 performance sanity (552 lines
+ 10 queries in 35ms), 5 resolveLanguage extension tests.

32 tests total. All S-expressions verified against real tree-sitter
parsing — no mocks.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Wire PatternMatcher through InspectAdapter to PatternAnalysisService.
Three AST-enhanced extractors with regex fallback:
- extractErrorHandlingWithAst: detects try/catch, promise.catch, error classes
- extractImportStyleWithAst: detects dynamic imports, precise require
- extractTypeCoverageWithAst: detects arrow function return types

All extractors fall back to regex when PatternMatcher is not configured
or file extension is unsupported. Existing 49 tests pass unchanged.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Fix Query object WASM leak: call q.delete() in finally after matches()
- Fix type coverage ratio inflation: add arrow-total and function-total
  queries for accurate denominator (was counting only typed functions)
- Fix try/catch-alone classification: fall through to regex instead of
  mapping to 'throw' style (try-catch is a mechanism, not a style)
- Add comment explaining AST/regex import count asymmetry

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Fill the test coverage gaps identified in review:
- 3 tests for arrow-total/function-total denominator queries
- 7 tests for extractErrorHandlingWithAst merge logic
- 4 tests for extractImportStyleWithAst merge logic
- 5 tests for extractTypeCoverageWithAst merge logic (accurate denominator)
- Fallback tests verifying matcher=undefined produces identical regex output

51 tests total in wasm-matcher.test.ts (was 32). 1628 tests pass overall.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…puted AST

Address review findings:
1. runQueries doing too much: extracted runAllAstQueries() that parses
   once and runs all 12 queries. Extractors now take a pre-computed
   Map<string, number> instead of matcher+filePath (pure functions).
2. 3x parsing eliminated: analyzeFileFromIndex and analyzeFileWithDocs
   call runAllAstQueries once, pass results to all 3 extractors.
3. Extractors are now synchronous pure functions — easier to test,
   no async overhead, no hidden parsing.

Tests updated to match new signatures. Same 51 test count, same coverage.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
prosdev force-pushed the feat/mcp-phase1-ast-patterns branch from f94c7c3 to 5e36957 on March 31, 2026 10:06
prosdev merged commit 2bac9dd into main on Mar 31, 2026
1 check passed