fix(parity): log per-file reasons for native orchestrator drops (#1011)#1024
fix(parity): log per-file reasons for native orchestrator drops (#1011)#1024carlos-alm merged 2 commits intomainfrom
Conversation
The native orchestrator's WASM backfill warning was a single line — "dropped N file(s); backfilling via WASM" — which made it impossible to tell which files were dropped or why. With 49 drops on the 3.9.5 self- build, users could not distinguish legitimate parser limits (languages with no Rust extractor, e.g. .fs/.gleam/.clj/.jl/.R/.sol/.cu) from real extractor regressions in natively-supported languages. Add a `classifyNativeDrops` helper in `src/domain/parser.ts` that buckets each missing file by extension and reason: - `unsupported-by-native`: extension has no Rust extractor — legitimate parser limit per the issue's acceptance criteria. Logged at info level. - `native-extractor-failure`: extension IS in the Rust addon yet the file was still dropped — points at a real native bug. Logged at warn level with explicit "likely a Rust extractor bug" framing so it stays loud. `backfillNativeDroppedFiles` in `pipeline.ts` now uses the helper to emit per-extension counts and up to 3 sample paths per extension (capped at 6 extensions to keep the log line readable). The classifier set `NATIVE_SUPPORTED_EXTENSIONS` mirrors `LanguageKind::from_extension` in `crates/codegraph-core/src/parser_registry.rs` and carries a comment flagging the sync requirement — the native addon ships as a separate npm package, so JS has no runtime way to discover its language coverage. Adds unit tests covering the classification, sample paths, case-insensitive extensions, and empty input. Note: the Node 24 vitest config has a pre-existing incompatibility (`--strip-types is not allowed in NODE_OPTIONS`); test logic was verified locally via a standalone Node script that mirrors the unit tests, and the PR's CI on Node 22 will run the vitest suite normally. docs check acknowledged: internal logging refactor; no user-visible language support, command, or roadmap deliverable changes.
|
Claude finished @carlos-alm's task —— View job PR Review: Fix Native Orchestrator Drop LoggingReviewing PR #1024 that improves logging for native orchestrator drops. Todo List
Root Cause AnalysisProblem Being Solved: Issue #1011 - Replace the vague Why This Approach: The PR chooses to improve logging rather than add 11 new Rust extractors. This is justified because:
Risk Assessment: Low risk. This is purely additive logging with no functional changes to parsing logic. Critical ConcernsNone identified. This is a well-implemented logging improvement that: ✅ No Configuration RelaxationsNo changes to linting rules, test timeouts, coverage thresholds, or build configs. ✅ Test Integrity Maintained
✅ Clean Code Qualityparser.ts:427-467 - parser.ts:488-513 - pipeline.ts:747-760 - pipeline.ts:800-814 - Integration point provides clear differentiation:
The logging approach ensures real regressions stay loud while legitimate limitations become quieter info messages. Backlog Compliance
Final Recommendation
|
Greptile SummaryReplaces the coarse native-orchestrator drop warning with per-extension, per-reason log messages: Confidence Score: 5/5Safe to merge — logic is correct, well-tested, and the only finding is a robustness nit in the drift guard test. No P0 or P1 issues found. The single P2 comment concerns a fragile boundary condition in the drift guard regex that could produce spurious test failures but cannot cause a mis-classification in production. tests/parsers/native-drop-classification.test.ts — drift guard body-slice fallback could be unbounded if from_extension is the last pub fn in the file. Important Files Changed
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[backfillNativeDroppedFiles] --> B{file in existing DB?}
B -- yes --> C[skip]
B -- no --> D{WASM grammar installed?}
D -- no --> C
D -- yes --> E[add to missingRel / missingAbs]
E --> F[classifyNativeDrops]
F --> G{extension in NATIVE_SUPPORTED_EXTENSIONS?}
G -- yes --> H[native-extractor-failure bucket]
G -- no --> I[unsupported-by-native bucket]
H --> J[warn log + formatDropExtensionSummary]
I --> K[info log + formatDropExtensionSummary]
J --> L[parseFilesAuto via WASM backfill]
K --> L
Reviews (2): Last reviewed commit: "test(parity): cover formatDropExtensionS..." | Re-trigger Greptile |
| /** | ||
| * Lowercase file extensions covered by the native Rust addon. | ||
| * | ||
| * Mirrors `LanguageKind::from_extension` in | ||
| * `crates/codegraph-core/src/parser_registry.rs`. Used to classify why the | ||
| * native orchestrator dropped a file: extensions outside this set are a | ||
| * legitimate parser limit (no Rust extractor exists), while extensions inside | ||
| * it indicate a real native bug (parse/read/extract failure). | ||
| * | ||
| * Keep this list in sync with the Rust enum — the native addon is a separate | ||
| * npm package, so JS has no runtime way to discover its language coverage. | ||
| */ | ||
| export const NATIVE_SUPPORTED_EXTENSIONS: ReadonlySet<string> = new Set([ | ||
| '.js', | ||
| '.jsx', | ||
| '.mjs', | ||
| '.cjs', | ||
| '.ts', | ||
| '.tsx', | ||
| '.py', | ||
| '.pyi', | ||
| '.tf', | ||
| '.hcl', | ||
| '.go', | ||
| '.rs', | ||
| '.java', | ||
| '.cs', | ||
| '.rb', | ||
| '.rake', | ||
| '.gemspec', | ||
| '.php', | ||
| '.phtml', | ||
| '.c', | ||
| '.h', | ||
| '.cpp', | ||
| '.cc', | ||
| '.cxx', | ||
| '.hpp', | ||
| '.kt', | ||
| '.kts', | ||
| '.swift', | ||
| '.scala', | ||
| '.sh', | ||
| '.bash', | ||
| '.ex', | ||
| '.exs', | ||
| '.lua', | ||
| '.dart', | ||
| '.zig', | ||
| '.hs', | ||
| '.ml', | ||
| '.mli', | ||
| ]); |
There was a problem hiding this comment.
Silent mis-classification risk on native addon version drift
NATIVE_SUPPORTED_EXTENSIONS is keyed to one specific snapshot of LanguageKind::from_extension. If the Rust addon gains a new language (or drops one) between addon releases without a matching JS update, drops will be silently mis-classified: a real native failure shows up as unsupported-by-native (info, quiet) instead of native-extractor-failure (warn, loud). The inverse case — removed support — would spam false native-extractor-failure warnings. There's no runtime assertion that the two lists agree, so the drift won't be caught until a user notices wrong log levels. Consider adding a CI step or a startup assertion that cross-checks the set against the native addon's own exported metadata if the addon exposes it; if it doesn't, at minimum add an integration test that verifies the current addon version is the one this set was generated from.
There was a problem hiding this comment.
Fixed in cbbc9ae — added a drift guard test that parses crates/codegraph-core/src/parser_registry.rs and asserts NATIVE_SUPPORTED_EXTENSIONS agrees with the Rust LanguageKind::from_extension arms. The native addon doesn't expose its own metadata, so source-level cross-check at CI time is the cheapest way to catch drift before users see mis-classified log levels. If parser_registry.rs adds or removes an extension, the test fails loudly with a list of mismatches.
| import { describe, expect, it } from 'vitest'; | ||
| import { classifyNativeDrops, NATIVE_SUPPORTED_EXTENSIONS } from '../../src/domain/parser.js'; | ||
|
|
||
| describe('classifyNativeDrops', () => { | ||
| it('groups WASM-only languages under unsupported-by-native', () => { | ||
| const { byReason, totals } = classifyNativeDrops([ | ||
| 'src/a.fs', | ||
| 'src/b.gleam', | ||
| 'src/c.clj', | ||
| 'src/d.jl', | ||
| 'src/e.R', | ||
| 'src/f.erl', | ||
| 'src/g.sol', | ||
| 'src/h.cu', | ||
| 'src/i.groovy', | ||
| 'src/j.v', | ||
| 'src/k.m', | ||
| ]); | ||
| expect(totals['unsupported-by-native']).toBe(11); | ||
| expect(totals['native-extractor-failure']).toBe(0); | ||
| expect(byReason['unsupported-by-native'].get('.fs')).toEqual(['src/a.fs']); | ||
| expect(byReason['unsupported-by-native'].get('.gleam')).toEqual(['src/b.gleam']); | ||
| expect(byReason['unsupported-by-native'].get('.r')).toEqual(['src/e.R']); | ||
| }); | ||
|
|
||
| it('flags natively-supported extensions as native-extractor-failure', () => { | ||
| const { byReason, totals } = classifyNativeDrops([ | ||
| 'src/a.ts', | ||
| 'src/b.py', | ||
| 'src/c.go', | ||
| 'src/d.rs', | ||
| ]); | ||
| expect(totals['native-extractor-failure']).toBe(4); | ||
| expect(totals['unsupported-by-native']).toBe(0); | ||
| expect(byReason['native-extractor-failure'].get('.ts')).toEqual(['src/a.ts']); | ||
| expect(byReason['native-extractor-failure'].get('.py')).toEqual(['src/b.py']); | ||
| }); | ||
|
|
||
| it('handles a mix of supported and unsupported extensions', () => { | ||
| const { byReason, totals } = classifyNativeDrops([ | ||
| 'src/a.ts', | ||
| 'src/b.fs', | ||
| 'src/c.fs', | ||
| 'src/d.gleam', | ||
| ]); | ||
| expect(totals['native-extractor-failure']).toBe(1); | ||
| expect(totals['unsupported-by-native']).toBe(3); | ||
| expect(byReason['unsupported-by-native'].get('.fs')).toEqual(['src/b.fs', 'src/c.fs']); | ||
| expect(byReason['unsupported-by-native'].get('.gleam')).toEqual(['src/d.gleam']); | ||
| }); | ||
|
|
||
| it('lowercases extensions so .R and .r share a bucket', () => { | ||
| const { byReason, totals } = classifyNativeDrops(['scripts/a.R', 'scripts/b.r']); | ||
| expect(totals['unsupported-by-native']).toBe(2); | ||
| expect(byReason['unsupported-by-native'].get('.r')).toEqual(['scripts/a.R', 'scripts/b.r']); | ||
| }); | ||
|
|
||
| it('returns empty buckets when no files are passed', () => { | ||
| const { byReason, totals } = classifyNativeDrops([]); | ||
| expect(totals['native-extractor-failure']).toBe(0); | ||
| expect(totals['unsupported-by-native']).toBe(0); | ||
| expect(byReason['native-extractor-failure'].size).toBe(0); | ||
| expect(byReason['unsupported-by-native'].size).toBe(0); | ||
| }); | ||
|
|
||
| it('exposes the native-supported extension set for callers', () => { | ||
| expect(NATIVE_SUPPORTED_EXTENSIONS.has('.ts')).toBe(true); | ||
| expect(NATIVE_SUPPORTED_EXTENSIONS.has('.py')).toBe(true); | ||
| expect(NATIVE_SUPPORTED_EXTENSIONS.has('.fs')).toBe(false); | ||
| expect(NATIVE_SUPPORTED_EXTENSIONS.has('.gleam')).toBe(false); | ||
| }); | ||
| }); |
There was a problem hiding this comment.
formatDropExtensionSummary cap logic is not unit-tested
The non-trivial formatting function (MAX_EXTS = 6, MAX_SAMPLES = 3, +N more extension(s) display) lives in pipeline.ts but has no dedicated unit tests. The current test suite only covers classifyNativeDrops. A regression in the cap logic (e.g., off-by-one in the +N more calculation, or the sort being reversed) would produce silently truncated log lines without any test failure. Adding a small test or extracting and testing formatDropExtensionSummary directly would cover that path.
There was a problem hiding this comment.
Fixed in cbbc9ae — moved formatDropExtensionSummary from pipeline.ts to parser.ts (next to classifyNativeDrops) and exported it. Added 6 unit tests covering: empty buckets, under-cap rendering, the +N more per-extension sample cap, the exact-cap boundary, the +N more extension(s) suffix when the extension cap is hit, and descending-count ordering so a future regression in the sort or off-by-one in the +N math fails loudly.
Codegraph Impact Analysis5 functions changed → 5 callers affected across 3 files
|
…RTED_EXTENSIONS drift (#1024) Address Greptile P2 review feedback on #1024: - Move `formatDropExtensionSummary` from `pipeline.ts` to `parser.ts` next to `classifyNativeDrops` and export it so the cap logic can be unit-tested. - Add tests for the formatter covering: empty buckets, under-cap rendering, per-extension sample cap with `+N more`, exact-cap boundary, extension cap with `+N more extension(s)` suffix, and descending-count ordering. - Add a drift guard test that parses `crates/codegraph-core/src/parser_registry.rs` and asserts the JS `NATIVE_SUPPORTED_EXTENSIONS` set agrees with the Rust `LanguageKind::from_extension` arms. The native addon doesn't expose its own metadata, so source-level cross-check at CI time is the cheapest way to catch drift before users see mis-classified log levels. Impact: 1 functions changed, 3 affected
Summary
Fixes #1011. Replaces the coarse
Native orchestrator dropped N file(s); backfilling via WASM for engine paritywarning with per-extension counts and sample paths, so users can tell legitimate parser limits from real native bugs.What changed
src/domain/parser.ts: AddsNATIVE_SUPPORTED_EXTENSIONS(mirrorsLanguageKind::from_extensionincrates/codegraph-core/src/parser_registry.rs) and a pureclassifyNativeDrops(relPaths)helper that buckets dropped files by reason × extension.src/domain/graph/builder/pipeline.ts:backfillNativeDroppedFilesnow classifies drops:unsupported-by-native(no Rust extractor — legitimate parser limit) → logged at info level.native-extractor-failure(extension IS in the addon yet the file was still dropped) → logged at warn level with explicit "likely a Rust extractor bug" framing so it stays loud.tests/parsers/native-drop-classification.test.ts: Unit tests covering classification, mixed inputs, case-insensitive extensions, and empty input.Why this approach
The issue offers two acceptance paths: fix the gap (add native extractors for 11 WASM-only languages) or log per-file drop reasons that are legitimate parser limits. The WASM-only languages (
.fs,.gleam,.clj,.jl,.R,.erl,.sol,.m,.cu,.groovy,.v— ~48 of the 49 dropped files on a 3.9.5 self-build) are precisely the legitimate limits the issue describes; the second criterion fits the scope. Adding 11 Rust extractors is a much larger effort and a separate concern.The categorization makes any future regression in a natively-supported language stand out as a
warnrather than blending into the existing baseline noise — the CI parity gate from #1014 still catches the count-level threshold.Sample output
Before:
After (info — legitimate gap):
After (warn — real bug):
Test plan
tests/parsers/native-drop-classification.test.ts(logic verified locally via standalone Node script — pre-existing Node 24 vitest config bug--strip-types is not allowed in NODE_OPTIONSblocks running tests on this machine; CI on Node 22 will run them)npx tsc --noEmit— cleannpx biome checkon touched files — only pre-existing unused-function warnings in unrelated parser internalsOut of scope
Adding native Rust extractors for
.fs/.gleam/.clj/.jl/.R/.erl/.sol/.m/.cu/.groovy/.v/.sv— separate, larger effort. Once any of those are added on the Rust side, removing the corresponding entries fromNATIVE_SUPPORTED_EXTENSIONSwill automatically reclassify those drops as failures (good).