test: add comprehensive test suite with 1000 test cases #3

Open
seedquan wants to merge 2 commits into iamtouchskyer:main from seedquan:feat/comprehensive-test-suite

@seedquan

Summary

  • Add 8 test files with 1000 test cases covering all modules
  • Uses Node.js built-in node:test + node:assert — zero external dependencies
  • All 1000 tests pass in ~4 seconds

Test Coverage

| File | Tests | Coverage |
| --- | --- | --- |
| eval-parser.test.mjs | 300 | Severity detection, file refs, verdicts, fix lines, reasoning, hedging, findings count, edge cases |
| flow-commands.test.mjs | 300 | cmdRoute, cmdInit, cmdValidate, cmdTransition, cmdValidateChain (full state machine) |
| eval-commands.test.mjs | 150 | cmdVerify, cmdSynthesize, cmdReport, cmdDiff with oscillation detection |
| viz-commands.test.mjs | 50 | getMarker, cmdViz, cmdReplayData |
| flow-templates.test.mjs | 50 | Structure validation, edge completeness, limit ranges |
| opc-cli.test.mjs | 50 | version, help, install, uninstall via child process |
| verify-devil-advocate.test.mjs | 50 | Challenge/verdict parsing, quality checks via python3 |
| integration.test.mjs | 50 | End-to-end flows, error recovery |

Run

node --test tests/*.test.mjs

Test plan

  • All 1000 tests pass locally
  • No external dependencies added
  • Tests are independent (no shared state)
  • File I/O tests use temp directories with cleanup

🤖 Generated with Claude Code

iris and others added 2 commits April 11, 2026 00:24
Add 8 test files covering all modules with node:test + node:assert:

- eval-parser (300 tests): severity detection, file refs, verdicts,
  fix lines, reasoning, hedging, findings count, edge cases
- flow-commands (300 tests): cmdRoute, cmdInit, cmdValidate,
  cmdTransition, cmdValidateChain with full state machine coverage
- eval-commands (150 tests): cmdVerify, cmdSynthesize, cmdReport,
  cmdDiff with oscillation detection
- viz-commands (50 tests): getMarker, cmdViz, cmdReplayData
- flow-templates (50 tests): structure validation, edge completeness
- opc-cli (50 tests): version, help, install, uninstall via child process
- verify-devil-advocate (50 tests): challenge/verdict parsing, quality checks
- integration (50 tests): end-to-end flows, error recovery

All 1000 tests pass in ~4 seconds. Zero external dependencies.

Run with: node --test tests/*.test.mjs

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Add 7 verification test files with deep boundary, property, and
integration testing:

- verify-parser-boundaries (200): regex edge cases, encoding stress,
  fuzzy inputs, large-scale, regression patterns
- verify-parser-properties (150): idempotency, count consistency,
  ordering, verdict/file-ref/severity/hedging invariants
- verify-flow-state-machine (200): exhaustive route table, state
  invariants, limit exhaustion, concurrent state, full traversals
- verify-handshake-schema (150): field types, enum boundaries,
  artifact paths, evidence rules, cross-field, malformed JSON
- verify-synthesis-diff (150): verdict logic, role extraction,
  diff normalization, oscillation thresholds, report generation
- verify-viz-replay (50): marker transitions, viz consistency,
  replay data completeness
- verify-e2e-scenarios (100): happy paths, fail loops, oscillation,
  max limits, devil's advocate, report-replay round-trips

Bug found: cmdValidate crashes on null JSON input (V685)

All 2000 tests pass in ~4 seconds.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
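
The crash noted above (V685) is consistent with a common JavaScript pitfall: `JSON.parse('null')` succeeds and returns `null`, so any later property access throws a `TypeError`. The body of `cmdValidate` is not shown in this thread, so the following is only a sketch of the kind of guard that avoids such a crash; `parseHandshake` and the `status` field are hypothetical names.

```javascript
// Hypothetical input parsing; not the project's actual cmdValidate.
function parseHandshake(text) {
  let data;
  try {
    data = JSON.parse(text);
  } catch (err) {
    return { ok: false, error: `invalid JSON: ${err.message}` };
  }
  // JSON.parse also accepts "null", numbers, strings and arrays, so check
  // the shape before reading any field.
  if (data === null || typeof data !== 'object' || Array.isArray(data)) {
    return { ok: false, error: 'expected a JSON object' };
  }
  return { ok: true, data };
}

console.log(parseHandshake('null').ok);            // false
console.log(parseHandshake('{"status":"ok"}').ok); // true
```

Returning a structured error instead of throwing lets a CLI command report the problem and exit with a non-zero status rather than a stack trace.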
iamtouchskyer added a commit that referenced this pull request Apr 19, 2026
Addresses 3 ITERATE findings from U1.6r contract + semantics reviewers:

1. fireArtifactEmit: recordSuccess was unconditionally resetting _failStreak
   after per-item write failures, so circuit-breaker would never trip on
   persistent write failures. Track anyItemFailed and only call recordSuccess
   when every item in the call succeeded. (semantics F1, contract #2)

2. fireArtifactEmit: accept ArrayBufferView (Uint8Array, DataView) in addition
   to string / Buffer. Modern APIs (crypto.subtle, TextEncoder, Playwright)
   commonly return Uint8Array — tight Buffer.isBuffer check was silently
   dropping them with a misleading WARN. (semantics F2)

3. cmdExtensionArtifact: add nodeCapabilities to stdout JSON for consistency
   with cmdExtensionVerdict. (contract #1)

4. CONTRIBUTING.md: document executeRun + artifactEmit hooks with sample
   skeleton + hook surface summary table. (contract #3)

Regression tests: 4 new tests (Uint8Array accepted, _failStreak persists
across calls, success reset is all-or-nothing, CLI JSON includes
nodeCapabilities). Total 118/118 extension tests, 22/22 suite files green.
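
Items 1 and 2 above can be illustrated with a small sketch. This is not the project's `fireArtifactEmit`: the `breaker` object, `isWritablePayload`, `emitArtifacts`, and the `writeItem` callback are hypothetical, and counting an unsupported payload as a failed item is an assumption made here for brevity. Only `_failStreak`, `recordSuccess`, and `anyItemFailed` are names taken from the commit message.

```javascript
// Hypothetical circuit-breaker state.
const breaker = {
  _failStreak: 0,
  recordFailure() { this._failStreak += 1; },
  recordSuccess() { this._failStreak = 0; },
};

// Accept strings and any ArrayBufferView (Buffer, Uint8Array, DataView, ...).
// Buffer is a Uint8Array subclass, so ArrayBuffer.isView covers it too.
function isWritablePayload(value) {
  return typeof value === 'string' || ArrayBuffer.isView(value);
}

// `writeItem` is an assumed callback that throws when a write fails.
function emitArtifacts(items, writeItem) {
  let anyItemFailed = false;
  for (const item of items) {
    try {
      // Assumption: an unsupported payload counts as a failed item here.
      if (!isWritablePayload(item)) throw new TypeError('unsupported payload type');
      writeItem(item);
    } catch {
      anyItemFailed = true;
      breaker.recordFailure();
    }
  }
  // Reset the streak only when every item in this call succeeded.
  if (!anyItemFailed) breaker.recordSuccess();
  return !anyItemFailed;
}

const failing = () => { throw new Error('disk full'); };
emitArtifacts(['a', new Uint8Array([1, 2])], failing);
emitArtifacts(['b'], failing);
console.log(breaker._failStreak); // 3
```

With an unconditional `recordSuccess()` at the end of `emitArtifacts`, the streak would return to 0 after every call and a threshold check on `_failStreak` could never fire.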
iamtouchskyer added a commit that referenced this pull request Apr 19, 2026
…llow-up)

Reviewer B (U2.8d) found two real bugs in the U2.8c JSON sidecar fix:

🔴 #2 dedup key collision: `${ext}|${hook}|${kind}|${message}` doesn't
   escape `|`. Two genuinely different failures collide silently:
     A: ext="a|b", hook="c"   →  "a|b|c|error|msg"
     B: ext="a",   hook="b|c" →  "a|b|c|error|msg"
   Fix: use JSON.stringify on a tuple `[ext,hook,kind,message]` — keys
   are unambiguous regardless of field contents.

🔴 #5 droppedTotal overwrite: the field name promises accumulation
   ("droppedTotal") but the code wrote `dropped` from the current call,
   silently resetting prior cap-overflow signal across CLI invocations.
   Fix: read priorDropped from sidecar, write `priorDropped + dropped`.
   Markdown view's "N earlier failure record(s) dropped" message now
   reflects the lifetime total, not just the last call.
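
The key collision in finding #2 can be reproduced in a few lines. The two key formats and the A/B inputs come from the commit message; the object shape and the `dedupe` helper are assumptions for illustration and are not taken from the actual sidecar code.

```javascript
// Original key: fields joined with an unescaped "|".
const naiveKey = (f) => `${f.ext}|${f.hook}|${f.kind}|${f.message}`;

// Fixed key: JSON-encode the tuple so field boundaries are unambiguous.
const tupleKey = (f) => JSON.stringify([f.ext, f.hook, f.kind, f.message]);

const a = { ext: 'a|b', hook: 'c', kind: 'error', message: 'msg' };
const b = { ext: 'a', hook: 'b|c', kind: 'error', message: 'msg' };

console.log(naiveKey(a) === naiveKey(b)); // true: the two failures collide
console.log(tupleKey(a) === tupleKey(b)); // false

// Hypothetical dedup step: a Map keyed by the tuple encoding.
function dedupe(failures) {
  const byKey = new Map();
  for (const f of failures) byKey.set(tupleKey(f), f);
  return [...byKey.values()];
}

console.log(dedupe([a, b, a]).length); // 2
```

JSON string escaping quotes every field, so no combination of field contents can shift a boundary between two fields.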

Verification:
- New unit tests:
    6.1 pipe-collision: A and B above both preserved (length=2)
    7.1 droppedTotal accumulates 5+3+0 = 8 across three CLI invocations
- test-run2-failure-merge.sh: 11/11 pass (was 9/9)
- Full suite: 27/27 still pass — no regression
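
The accumulation checked by test 7.1 can be sketched as a pure merge step. This is an illustration under assumptions, not the project's code: `mergeSidecar`, the `cap` parameter, and the choice to keep the newest entries are hypothetical; only `failures`, `droppedTotal`, `priorDropped`, and `dropped` come from the commit message.

```javascript
// Hypothetical merge of a prior sidecar with the failures of the current call.
function mergeSidecar(prior, incoming, cap) {
  // A missing or malformed prior total is treated as 0.
  const priorDropped = Number.isInteger(prior?.droppedTotal) ? prior.droppedTotal : 0;
  const all = [...(prior?.failures ?? []), ...incoming];
  const dropped = Math.max(0, all.length - cap);
  return {
    failures: all.slice(dropped),         // assumption: keep the newest `cap` entries
    droppedTotal: priorDropped + dropped, // accumulate, never overwrite
  };
}

const mk = (n) => Array.from({ length: n }, (_, i) => ({ message: `f${i}` }));

let sidecar = mergeSidecar(undefined, mk(7), 2); // drops 5
sidecar = mergeSidecar(sidecar, mk(3), 2);       // drops 3
sidecar = mergeSidecar(sidecar, [], 2);          // drops 0
console.log(sidecar.droppedTotal); // 8
```

Writing `dropped` alone in the last line of the return value would report 0 after the third call, which is the overwrite described in finding #5.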

Out-of-scope (acknowledged, deferred):
- #3 R-M-W race under concurrent CLI lanes sharing runDir: documented
  single-writer invariant assumption; future work if multi-lane CI lands.
- #4 schema drift on top-level unknown fields: per-entry fields already
  preserved (we spread the whole entry); top-level only carries failures+
  droppedTotal so drift surface is bounded.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>