feat(replay): add --strict-failure flag to fail response-divergent tests instead of demoting to OBSOLETE by charankamarapu · Pull Request #4251 · keploy/keploy

charankamarapu · 2026-06-07T23:08:02Z

Summary

Adds an opt-in --strict-failure flag (and strictFailure keploy.yml field). When set, a test whose response diverges from the recorded baseline is marked FAILED even if the consumed mock set also diverged from the recorded mapping. The default behavior — demoting such cases to OBSOLETE so the user can re-record without a hard failure — stays exactly the same.

Why

The current replay.go:1828 (and twin site at 2353) decision tree:

if testPass {
    PASSED
} else if mockSetMismatch && !strictMockReject {
    OBSOLETE
} else {
    FAILED
}

is correct from a "did this test still validate the recorded contract" angle — but breaks two real workflows:

CI: a source regression that ALSO touches the mock set (e.g. a SQL drift changing both wire query and response) lands in OBSOLETE → run reports PASSED → CI stays green → bug ships.
Agentic flows: an autonomous fix agent watching for FAILED tests has nothing to diagnose when the planted regression lands in OBSOLETE. The fix loop never starts.

Diagnosed 2026-06-08 against scenario-1 of our validation harness: a planted uncommitted ORDER BY DESC LIMIT → ASC LIMIT edit fell into OBSOLETE (because the recorded mock's sql_ast_hash didn't match the live ASC query — a mockSetMismatch), the cloud replay aggregate reported PASSED, and the agent never engaged. With this flag, the same scenario reports FAILED and the agent's Case 1 path runs cleanly: revert source → green.

What it changes

Case	`testPass`	`mockSetMismatch`	`--strict-failure`	Before	After
1	true	false	any	PASSED	PASSED
2	true	true	false	PASSED (silent)	PASSED (silent — unchanged)
3	true	true	true	FAILED (via SchemaNoiseStrict)	FAILED (via SchemaNoiseStrict — unchanged)
4	false	false	any	FAILED	FAILED
5	false	true	false	OBSOLETE	OBSOLETE (unchanged — default opt-in only)
6	false	true	true	OBSOLETE	FAILED ← new behaviour

Only row 6 changes. The OBSOLETE classification still exists in the data model and the per-test report; the run-level aggregate just rolls it into FAILED when --strict-failure is set.

The mappingDiff block (expected vs actual mocks, missing calls) is still written to the report regardless of the flag — the diagnostic content is preserved, only the status field changes.

CLI surface

--strict-failure registered on keploy test and keploy record (same flag group as --schema-noise-detection, --update-test-mapping, --must-pass).
keploy cloud replay inherits the flag since it builds on the same command tree.
keploy.yml field: strictFailure: true|false (default false).

Verification

Validated against the orderflow scenario suite (S1-S7) — S1 reports FAILED instead of PASSED with the flag, exit code non-zero, mappingDiff block preserved.

Test plan

CI: existing replay tests (no change expected — flag defaults off)
Manual: run a test where response diverges + mock set mismatches; verify FAILED with flag, OBSOLETE without flag, mappingDiff present in both

🤖 Generated with Claude Code

github-actions · 2026-06-07T23:08:16Z

CLA Assistant Lite bot:
Thank you for your submission, we really appreciate it. Like many open-source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution. You can sign the CLA by just posting a Pull Request Comment same as the below format.

I have read the CLA Document and I hereby sign the CLA

Charan Kamarapu seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.
_{You can retrigger this bot by commenting recheck in this Pull Request}

github-actions · 2026-06-07T23:08:17Z

The CLA check failed. Please ensure you have:

Signed the CLA by commenting 'I have read the CLA Document and I hereby sign the CLA.'
Used the correct email address in your commits (matches the one you used to sign the CLA).

After fixing these issues, comment 'recheck' to trigger the workflow again.

github-actions · 2026-06-07T23:13:07Z

🚀 Keploy Performance Test Results

Multi-Run Validation: Tests run 3 times, pipeline fails only if 2+ runs show regression.

Run	P50	P90	P99	RPS	Error Rate	Status
1	2.59ms	3.27ms	5.01ms	100.02	0.00%	✅ PASS
2	2.53ms	3.13ms	4.7ms	100.02	0.00%	✅ PASS
3	2.56ms	3.35ms	4.83ms	100.02	0.00%	✅ PASS

Thresholds: P50 < 5ms, P90 < 15ms, P99 < 70ms, RPS >= 100 (±1% tolerance), Error Rate < 1%

✅ Result: PASSED - Only 0 out of 3 runs failed (threshold: 2)

P50, P90, and P99 percentiles naturally filter out outliers

…sts instead of demoting to OBSOLETE When a test's response body differs from the recorded baseline AND the consumed mock set diverged from the recorded mapping, replay currently demotes the failure to OBSOLETE (replay.go:1828 / 2353) — the run-level aggregate then reports PASSED because OBSOLETE doesn't count as failure. The reasoning was "the mock contract changed, so the response diff is a consequence — tell the user to re-record, don't fail loudly." That's correct from a "did this test still validate the contract" angle, but breaks two important workflows: - CI: a real source regression that ALSO touches the mock set (e.g. a SQL drift that changes both the wire query and the response) gets silently classified as OBSOLETE → run passes → CI green → bug ships. - Agentic flows: an autonomous fix agent watching for FAILED tests has nothing to diagnose when the planted regression lands in OBSOLETE. The signal-to-fix loop never starts. This flag opts into a stricter classification: when the response diverges from the recorded baseline (testPass=false), the test is FAILED — even if mockSetMismatch=true would otherwise demote it. The per-test OBSOLETE label is replaced with FAILED, but the mappingDiff (expected vs actual mocks, missing calls) is still written to the report for diagnostics — the user / agent loses NO information, just gets the right status. Default behavior unchanged: omit the flag → existing OBSOLETE demotion applies. Opt in via `--strict-failure` (CLI on `record`, `test`, and `cloud replay`) or `strictFailure: true` in keploy.yml. What it does NOT change: - testPass=true + mockSetMismatch=true silent-ignore (controlled by SchemaNoiseStrict, not this flag). The current behavior of treating "response matched but mock didn't" as PASSED is a deliberate noise tolerance; this PR doesn't touch it. Sites: - config/config.go: Test.StrictFailure field - cli/provider/cmd.go: --strict-failure flag wired on the test/replay command group (and cloud-replay inherits it via the same group) - pkg/service/replay/replay.go: respect StrictFailure at the two OBSOLETE-decision sites (lines 1828 and 2353 — HTTP and gRPC paths) Signed-off-by: Charan Kamarapu <charan@keploy.io>

github-actions · 2026-06-07T23:21:46Z

The CLA check failed. Please ensure you have:

Signed the CLA by commenting 'I have read the CLA Document and I hereby sign the CLA.'
Used the correct email address in your commits (matches the one you used to sign the CLA).

After fixing these issues, comment 'recheck' to trigger the workflow again.

github-actions · 2026-06-07T23:25:20Z

🚀 Keploy Performance Test Results

Multi-Run Validation: Tests run 3 times, pipeline fails only if 2+ runs show regression.

Run	P50	P90	P99	RPS	Error Rate	Status
1	2.7ms	3.4ms	4.97ms	100.02	0.00%	✅ PASS
2	2.65ms	3.31ms	4.9ms	100.02	0.00%	✅ PASS
3	2.76ms	3.68ms	5.87ms	100.02	0.00%	✅ PASS

Thresholds: P50 < 5ms, P90 < 15ms, P99 < 70ms, RPS >= 100 (±1% tolerance), Error Rate < 1%

✅ Result: PASSED - Only 0 out of 3 runs failed (threshold: 2)

P50, P90, and P99 percentiles naturally filter out outliers

charankamarapu

Principal engineer self-review

The change is correct and minimal — flag opt-in, default behaviour preserved, both demotion sites updated. A few things I'd want addressed before this lands:

Required

1. No test coverage for the new decision branch.
This PR changes replay-loop classification logic — the most consequential code path in the project. The matrix (testPass × mockSetMismatch × strictMockReject × StrictFailure) has 16 combinations; the change affects rows where testPass=false ∧ mockSetMismatch=true. There's no test asserting either:

Default off → OBSOLETE (regression guard for existing users)
Flag on → FAILED + mappingDiff preserved (the actual new behaviour)

A table-driven test on RunTestSet covering at minimum those two rows for each of the two sites (HTTP + gRPC) would catch the next person who edits the decision tree. Even a small replay_classification_test.go with a fakeReplayer would be enough.

Strongly recommended

2. Asymmetry between the two demotion sites is undocumented.

Site 1 (replay.go:1828, HTTP):

} else if mockSetMismatch && !strictMockReject && !r.config.Test.StrictFailure {

Site 2 (replay.go:2352, gRPC):

} else if mockSetMismatch && !r.config.Test.StrictFailure {

Site 2 doesn't consider strictMockReject at all. If that's because the gRPC path doesn't run the SchemaNoiseStrict pre-check that sets strictMockReject, say so in a comment. Otherwise the next reader (me, in 6 months) will assume one site forgot a guard and "fix" it, breaking the gRPC behaviour. A one-line comment on each branch costs nothing and prevents future bug introduction.

3. The two demotion sites are duplicate logic that should be extracted.

Both sites encode the same decision tree (PASSED → OBSOLETE → FAILED) with slightly different guards. Extract:

func (r *Replayer) classifyFailure(testPass, mockSetMismatch, strictMockReject bool) models.TestStatus {
    if testPass { return models.TestStatusPassed }
    if mockSetMismatch && !strictMockReject && !r.config.Test.StrictFailure {
        return models.TestStatusObsolete
    }
    return models.TestStatusFailed
}

That makes the asymmetry impossible (one source of truth) and makes the unit test from #1 trivial to write.

Nice to have

4. CLI help text leaks the implementation term "OBSOLETE".

--help users don't know what OBSOLETE means. Suggest reframing in user terms:

Mark response-divergent tests as FAILED even when their consumed mock set diverged from the recording (default: such tests are marked OBSOLETE so you can re-record without seeing a hard failure).

5. Add the diagnostic context to the commit body.

The commit body explains the WHY beautifully. Worth adding one more sentence noting the agentic-workflow scenario this unblocks — "diagnosed against the keploy validation harness scenario-1, where a planted Case-1 source regression landed in OBSOLETE → run reported PASSED → agent had no signal to engage." Future readers grepping for OBSOLETE will land on this commit and need the context.

Functionally LGTM after #1 + #2. The behaviour is correct and the flag-gated rollout is the right shape. Tests + the comment on the asymmetry are the blockers.

charankamarapu requested a review from gouravkrosx as a code owner June 7, 2026 23:08

charankamarapu mentioned this pull request Jun 7, 2026

docs(skill): document --strict-failure on cloud replay; drop update_mock guidance keploy/docs#873

Merged

3 tasks

charankamarapu force-pushed the feat/strict-failure-flag branch from 8f5d9c3 to 52034e1 Compare June 7, 2026 23:21

charankamarapu commented Jun 7, 2026

View reviewed changes

ayush3160 approved these changes Jun 8, 2026

View reviewed changes

gouravkrosx approved these changes Jun 8, 2026

View reviewed changes

charankamarapu merged commit d0fbdbd into main Jun 8, 2026
138 of 141 checks passed

charankamarapu deleted the feat/strict-failure-flag branch June 8, 2026 09:12

github-actions Bot locked and limited conversation to collaborators Jun 8, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat(replay): add --strict-failure flag to fail response-divergent tests instead of demoting to OBSOLETE#4251

feat(replay): add --strict-failure flag to fail response-divergent tests instead of demoting to OBSOLETE#4251
charankamarapu merged 1 commit into
mainfrom
feat/strict-failure-flag

charankamarapu commented Jun 7, 2026

Uh oh!

github-actions Bot commented Jun 7, 2026

Uh oh!

github-actions Bot commented Jun 7, 2026

Uh oh!

github-actions Bot commented Jun 7, 2026

Uh oh!

github-actions Bot commented Jun 7, 2026

Uh oh!

github-actions Bot commented Jun 7, 2026

Uh oh!

charankamarapu left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Conversation

charankamarapu commented Jun 7, 2026

Summary

Why

What it changes

CLI surface

Verification

Test plan

Uh oh!

github-actions Bot commented Jun 7, 2026

Uh oh!

github-actions Bot commented Jun 7, 2026

Uh oh!

github-actions Bot commented Jun 7, 2026

🚀 Keploy Performance Test Results

Uh oh!

github-actions Bot commented Jun 7, 2026

Uh oh!

github-actions Bot commented Jun 7, 2026

🚀 Keploy Performance Test Results

Uh oh!

charankamarapu left a comment

Choose a reason for hiding this comment

Principal engineer self-review

Required

Strongly recommended

Nice to have

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants