feat(ci): harden e2e diff bot (coverage, regression, retries, safety) #2593
Merged
- Changed-line coverage: report what % of the PR's changed source lines the tests actually execute (parses a coverprofile against the diff's new line ranges), turning "N tests added" into an adequacy signal.
- Regression gate: run `go build ./...` after the agent and fold it into the pass criterion, so the agent can't green its tests while breaking the build elsewhere.
- Best-of-N: the agent is stochastic, so `/e2e diff xN` retries up to N times (≤5) until a run passes, resetting the tree between attempts and keeping the best; a single run is labelled as one sample.
- Safety: pass the triggering login through an env var instead of interpolating it into the shell; make provider pricing configurable via repo variables (cost was a hardcoded placeholder).
Closes the gaps from review:
- `go build ./...` regression gate: folded into pass.
- `/e2e diff xN` retries up to N (≤5) until a pass; single run labelled a sample.

All smoke-tested locally (coverage 50%-partial case, by-assertion vs compile-only, retry loop).
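The retry behaviour can be illustrated with a short Go sketch. It is an illustration under assumptions rather than the bot's actual implementation: `parseAttempts`, `bestOfN`, `result`, and `better` are hypothetical names, a request above the cap is assumed to be clamped to 5 rather than rejected, and "best" is assumed to mean a passing run first, then higher changed-line coverage.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

const maxAttempts = 5 // cap stated in the PR description (≤5)

// commandRe matches "/e2e diff" with an optional " xN" suffix.
var commandRe = regexp.MustCompile(`^/e2e diff(?: x(\d{1,3}))?$`)

// parseAttempts returns the number of attempts requested by a comment.
// ok is false when the comment is not an "/e2e diff" command.
func parseAttempts(comment string) (n int, ok bool) {
	m := commandRe.FindStringSubmatch(strings.TrimSpace(comment))
	if m == nil {
		return 0, false
	}
	if m[1] == "" {
		return 1, true // no suffix: a single run, i.e. one sample
	}
	n, err := strconv.Atoi(m[1])
	if err != nil {
		return 0, false
	}
	if n < 1 {
		n = 1
	}
	if n > maxAttempts {
		n = maxAttempts // assumption: clamp instead of rejecting
	}
	return n, true
}

// result is a hypothetical summary of one agent run.
type result struct {
	passed   bool
	coverage float64 // changed-line coverage in percent
}

// better reports whether a should replace b as the best run so far.
func better(a, b result) bool {
	if a.passed != b.passed {
		return a.passed
	}
	return a.coverage > b.coverage
}

// bestOfN runs attempt up to n times, calling reset before every retry,
// and stops at the first passing run. It returns the best run and the
// number of runs actually made.
func bestOfN(n int, attempt func(i int) result, reset func()) (best result, runs int) {
	for i := 1; i <= n; i++ {
		if i > 1 {
			reset() // e.g. restore the working tree between attempts
		}
		r := attempt(i)
		runs = i
		if i == 1 || better(r, best) {
			best = r
		}
		if r.passed {
			break
		}
	}
	return best, runs
}

func main() {
	n, ok := parseAttempts("/e2e diff x3")
	if !ok {
		fmt.Println("not an e2e diff command")
		return
	}
	// Stand-in for the stochastic agent: only the third attempt passes.
	attempt := func(i int) result { return result{passed: i == 3, coverage: 40 + 10*float64(i)} }
	best, runs := bestOfN(n, attempt, func() {})
	fmt.Printf("runs=%d passed=%v coverage=%.0f%%\n", runs, best.passed, best.coverage)
}
```

Stopping at the first pass keeps cost bounded, and returning the run count lets the report distinguish a single sample from a best-of-N result.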