Skip to content

test(parsers): gate LANGUAGE_REGISTRY ↔ NATIVE_SUPPORTED_EXTENSIONS parity#1154

Open
carlos-alm wants to merge 3 commits into
mainfrom
feat/1121-language-registry-native-parity
Open

test(parsers): gate LANGUAGE_REGISTRY ↔ NATIVE_SUPPORTED_EXTENSIONS parity#1154
carlos-alm wants to merge 3 commits into
mainfrom
feat/1121-language-registry-native-parity

Conversation

@carlos-alm
Copy link
Copy Markdown
Contributor

Summary

Closes #1121.

The acceptance criterion from #1071 called for a CI gate that fires when a JS LANGUAGE_REGISTRY entry has no corresponding Rust extractor (or sits on an explicit allowlist). The existing drift guard in tests/parsers/native-drop-classification.test.ts covers NATIVE_SUPPORTED_EXTENSIONS ↔ parser_registry.rs (link 2 ↔ 3), but the link from LANGUAGE_REGISTRY to NATIVE_SUPPORTED_EXTENSIONS (link 1 ↔ 2) had no test. A WASM-only language added to the registry would silently be classified as unsupported-by-native and only the JS backfill would parse it — exactly the regression class #1071 was meant to prevent.

Adds a new describe block (LANGUAGE_REGISTRY ↔ NATIVE_SUPPORTED_EXTENSIONS parity) with three tests:

  1. Every LANGUAGE_REGISTRY extension has a Rust extractor (or appears on the documented WASM_ONLY_ALLOWLIST).
  2. Allowlist entries can't reference extensions that already have a Rust extractor (catches stale entries left behind after a Rust port).
  3. Allowlist entries must reference extensions actually declared in LANGUAGE_REGISTRY (catches typos and dead config).

WASM_ONLY_ALLOWLIST is an empty Set<string> today, documented in-code so future contributors know when to add to it (intentional WASM-only languages with a tracked Rust port) vs. when to port directly.

Test plan

  • npx vitest run tests/parsers/native-drop-classification.test.ts — 16/16 passing (13 existing + 3 new).
  • Verified the parity logic catches simulated drift (added .toy to a registry copy and confirmed the detector flagged it) before removing the throwaway test.
  • npx biome check tests/parsers/native-drop-classification.test.ts — clean.
  • CI green.

…arity

Closes #1121.

The drift guard between NATIVE_SUPPORTED_EXTENSIONS and parser_registry.rs
covered link 2 ↔ 3, but link 1 ↔ 2 (LANGUAGE_REGISTRY ↔
NATIVE_SUPPORTED_EXTENSIONS) had no test. A WASM-only language added to
the registry would silently degrade the native engine without flagging.

Adds three tests with an explicit WASM_ONLY_ALLOWLIST (currently empty) so
the allowlist itself can't rot — entries must reference a real registry
extension and must not duplicate a language that's already been ported.
@claude
Copy link
Copy Markdown

claude Bot commented May 18, 2026

Claude finished @carlos-alm's task —— View job


PR Review - Final Assessment

Reviewing PR #1154: test(parsers): gate LANGUAGE_REGISTRY ↔ NATIVE_SUPPORTED_EXTENSIONS parity

Todo List

  • Read the modified test file to understand implementation
  • Analyze against Phase 1: Root Cause Analysis
  • Analyze against Phase 2: Configuration Change Detection
  • Analyze against Phase 3: Test Integrity
  • Analyze against Phase 4: Code Quality
  • Analyze against Phase 5: Backlog Compliance
  • Provide final recommendation and rating

Phase 1: Root Cause Analysis ✅

Problem Being Solved: Real issue from #1121 - there was a gap in drift detection between the JS LANGUAGE_REGISTRY (single source of truth for supported languages) and NATIVE_SUPPORTED_EXTENSIONS (hand-maintained mirror of Rust enum coverage). A WASM-only language could be silently added to the registry without a corresponding Rust extractor, causing the native engine to degrade silently without triggering CI warnings.

Why This Approach: Justified architectural fix. The existing drift guard only covered NATIVE_SUPPORTED_EXTENSIONS ↔ parser_registry.rs (link 2↔3), but the critical link 1↔2 (LANGUAGE_REGISTRYNATIVE_SUPPORTED_EXTENSIONS) was unguarded. This would allow regressions exactly as described in #1071.

Risk Assessment: Low risk - pure additive testing with well-defined allowlist mechanism for intentional exceptions.

Phase 2: Configuration Change Detection ✅

HIGH-RISK Changes: None detected. This PR only adds tests - no linting rules disabled, no timeouts increased, no build configuration modified, no TypeScript strict mode changes.

Assessment: Clean - purely additive test coverage with no configuration relaxation.

Phase 3: Test Integrity ✅

RED FLAGS: None detected. The tests are properly structured and strict:

Strong assertions: Tests use .toEqual([]) which fails on any unexpected entries
Comprehensive coverage: Three complementary tests cover all drift scenarios
No weakened expectations: Tests maintain strict parity requirements
Clear documentation: Extensive comments explain the architectural purpose

Test Quality Assessment:

  • Test 1 (lines 239-256): Ensures every LANGUAGE_REGISTRY extension either has native support OR is explicitly allowlisted
  • Test 2 (lines 258-266): Prevents stale allowlist entries after Rust ports are completed
  • Test 3 (lines 268-282): Catches typos and orphaned allowlist entries

All tests are properly isolated and test exactly what they claim to test.

Phase 4: Code Quality ✅

Implementation Quality: Excellent

Clean TypeScript: Proper typing with ReadonlySet<string>, correct imports
Logical structure: Tests follow natural dependency order (parity → staleness → orphans)
Error messages: Comprehensive error messages with actionable guidance (lines 250-254, 263-264, 279-280)
Code organization: Well-integrated into existing test structure
Documentation: Extensive inline documentation explaining architectural context

Minor Code Quality Notes:

  • Line 237: Empty allowlist is intentional and well-documented
  • Line 243: Proper case normalization with .toLowerCase()
  • Lines 246-249: Efficient filtering chains that are readable

Security/Performance: No concerns - pure test code with no external dependencies or side effects.

Phase 5: Backlog Compliance ✅

Zero-dep: ✓ — No new runtime dependencies added. Uses existing test infrastructure (vitest, existing imports)

Foundation-aligned: ✓ — Fully aligned. Strengthens the dual-engine architecture (P1: always-current) by ensuring native engine coverage matches the JS registry. No cloud APIs, external DBs, or agent behavior.

Problem-fit: 5/5 — Directly prevents AI agent confusion from inconsistent engine results. When the native engine silently degrades due to missing extractors, agents receive inconsistent dependency graphs depending on engine selection, leading to hallucinated dependencies and incorrect blast radius analysis. This gate prevents exactly that scenario.

Breaking: No — Purely additive test coverage with no behavior changes to public APIs, CLI output, or database schema.

Tier: 1 (Zero-dep ✓ + Foundation-aligned ✓)

Critical Concerns

None. This is a high-quality defensive improvement that strengthens the core architecture.

Final Recommendation

Root Cause Analysis

Problem Being Solved: CI gap allowing silent native engine degradation via WASM-only language additions
Why This Approach: Closes architectural drift detection gap with proper allowlist mechanism
Risk Assessment: Very low - additive testing only

Backlog Compliance

  • Zero-dep: ✓ — No new runtime dependencies
  • Foundation-aligned: ✓ — Strengthens dual-engine architecture
  • Problem-fit: 5 — Directly prevents AI agent confusion from engine inconsistency
  • Breaking: No — Purely additive
  • Tier: 1

Critical Concerns

None

Final Recommendation

  • Rating: ⭐⭐⭐⭐⭐ (5/5)
  • Action: APPROVE
  • Reasoning: Exemplary defensive programming. Addresses real architectural vulnerability with comprehensive test coverage, clear documentation, and proper escape hatch design. The allowlist mechanism is well-designed for the rare case where WASM-only support is temporarily needed. This directly improves AI agent reliability by preventing silent engine parity drift.

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 18, 2026

Greptile Summary

Adds a new describe block to native-drop-classification.test.ts that gates the LANGUAGE_REGISTRYNATIVE_SUPPORTED_EXTENSIONS link — the previously untested segment that would allow a WASM-only language to silently degrade native parsing. An empty WASM_ONLY_ALLOWLIST with clear contribution guidance is included for future intentional exceptions.

  • Test 1 verifies every registry extension has a Rust extractor or is on the allowlist; it applies toLowerCase() to registry extensions but does not normalise the allowlist before the Set.has() lookup, unlike the parallel fix already applied in tests 2 and 3.
  • Tests 2 & 3 correctly normalise allowlist entries before comparison (addressing feedback from the prior review round), catching stale and orphaned allowlist entries respectively.
  • registryExts is computed identically in both test 1 and test 3; hoisting it to the describe scope would remove the duplication.

Confidence Score: 5/5

Test-only change adding three new CI gates; no production code is modified and all three tests pass today with an empty allowlist.

The change is purely additive test code that closes a documented gap. The new gates correctly catch forward drift and the normalisation fixes from the prior round are in place for tests 2 and 3. One minor asymmetry remains in test 1's allowlist lookup, but it would only produce a false positive rather than silently missing a regression.

No files require special attention; the single changed file is low-risk test code.

Important Files Changed

Filename Overview
tests/parsers/native-drop-classification.test.ts Adds three new tests gating LANGUAGE_REGISTRY ↔ NATIVE_SUPPORTED_EXTENSIONS parity; logic is sound with one minor asymmetry in the allowlist normalization for test 1.

Fix All in Claude Code

Reviews (4): Last reviewed commit: "Merge branch 'main' into feat/1121-langu..." | Re-trigger Greptile

it('WASM_ONLY_ALLOWLIST does not list extensions that already have a Rust extractor', () => {
// Catches stale allowlist entries: once a language is ported to Rust the
// allowlist line should be deleted, not left behind as dead config.
const stale = [...WASM_ONLY_ALLOWLIST].filter((ext) => NATIVE_SUPPORTED_EXTENSIONS.has(ext));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Allowlist entries in tests 2 and 3 are compared as-is, while registryExts (used in test 1 and test 3) and NATIVE_SUPPORTED_EXTENSIONS are all lowercase. If a future contributor adds a mixed-case entry such as .TS to the allowlist, test 2 would silently miss it as stale (NATIVE_SUPPORTED_EXTENSIONS.has('.TS') → false) and test 3 would wrongly report it as an orphan (registryExts.has('.TS') → false because registryExts was built with .toLowerCase()). Normalising allowlist entries to lowercase in these two filters keeps all three tests consistent.

Suggested change
const stale = [...WASM_ONLY_ALLOWLIST].filter((ext) => NATIVE_SUPPORTED_EXTENSIONS.has(ext));
const stale = [...WASM_ONLY_ALLOWLIST]
.map((ext) => ext.toLowerCase())
.filter((ext) => NATIVE_SUPPORTED_EXTENSIONS.has(ext));

Fix in Claude Code

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 57eb769 — added .map((ext) => ext.toLowerCase()) before the NATIVE_SUPPORTED_EXTENSIONS filter so a mixed-case allowlist entry can't slip past the staleness check.

registryExts.add(ext.toLowerCase());
}
}
const orphans = [...WASM_ONLY_ALLOWLIST].filter((ext) => !registryExts.has(ext));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Same normalisation gap as in test 2 — without .toLowerCase() here, a capitalised allowlist entry would be falsely reported as an orphan even though a matching (lowercase) extension exists in registryExts.

Suggested change
const orphans = [...WASM_ONLY_ALLOWLIST].filter((ext) => !registryExts.has(ext));
const orphans = [...WASM_ONLY_ALLOWLIST]
.map((ext) => ext.toLowerCase())
.filter((ext) => !registryExts.has(ext));

Fix in Claude Code

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 57eb769 — added .map((ext) => ext.toLowerCase()) before the registryExts filter so the orphan check matches the lowercase form actually stored in registryExts.

Lowercase allowlist entries before comparing against `registryExts` and
`NATIVE_SUPPORTED_EXTENSIONS` so a mixed-case future entry can't slip
past the staleness and orphan checks. Both reference sets are already
lowercased, so without this normalization a capitalized allowlist entry
would be silently skipped by test 2 and falsely flagged by test 3.
@carlos-alm
Copy link
Copy Markdown
Contributor Author

@greptileai

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add CI gate: LANGUAGE_REGISTRY ↔ NATIVE_SUPPORTED_EXTENSIONS consistency

1 participant