PR #209 split rivet-core mutation testing into 4 shards (~15-25 min each). Latest main runs still show shards CANCELLED at 45 min: the dogfood corpus has grown enough that even 4-way sharding doesn't consistently fit.

Bump to 8 shards (rivet-core x {0..7}/8) so each arm runs ~6-12 min with comfortable headroom. The cargo-mutants `--shard k/N` flag handles the partition arithmetic; we just adjust the matrix.

Trace: skip
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PR #221 originally bumped from 4→8 shards, but the first run on the 8-shard config still hit the 45-min job timeout: 5 of 8 rivet-core shards were CANCELLED, and only 1 produced a missed.txt (FAILURE conclusion).

The dogfood corpus (~3677 mutants) divided 8 ways gives ~460 mutants per shard, and on GitHub-hosted runners the build/test loop is slow enough that even with `--jobs 4 --timeout 90` a shard can't finish in 45 min when its slice contains many slow-to-build mutations.

Bump to 16 shards: ~230 mutants per shard, expected completion in ~12-20 min per shard with comfortable headroom.

Trace: skip
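The per-shard figures quoted above follow from a ceiling division. A minimal sketch, assuming the mutants are spread roughly evenly across shards (the helper name `mutants_per_shard` is hypothetical and not part of cargo-mutants or rivet):

```rust
/// Upper bound on the number of mutants a single shard receives when
/// `total` mutants are split as evenly as possible across `shards` shards.
/// Hypothetical helper for illustration only.
fn mutants_per_shard(total: u32, shards: u32) -> u32 {
    // Ceiling division: the largest shard gets the rounded-up share.
    (total + shards - 1) / shards
}
```

With the corpus size from the commit message, `mutants_per_shard(3677, 8)` gives 460 and `mutants_per_shard(3677, 16)` gives 230, matching the approximate counts above.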
Follow-up to PR #218, which addressed embed.rs + reqif.rs survivors from the 4-shard rivet-core mutation matrix. The 16-shard config (this branch's previous commit) now lets every shard complete without timeout, exposing survivors in the semantic modules.

Local cargo-mutants runs against main produced these survivor counts; this commit's tests drive each to zero (verified locally with the same `cargo mutants -p rivet-core --file <module>.rs` command):

  module                     before  after
  coverage_evidence.rs           10      0
  compliance.rs                  21      0
  convergence.rs                  6      0
  links.rs                       21      1*
  store.rs                        2      0
  ─────────────────────────  ──────  ─────
  total                          60      1*

(*) The remaining links.rs survivor is the `&&` between `forward == other.forward` and `backward == other.backward` on line 104. It is an EQUIVALENT mutant: `LinkGraph::backward` is derived from `forward` during `build()`, so any forward difference always implies a backward difference. No external test can distinguish `&&` from `||` for this clause. The companion clause on line 105 (`broken == other.broken`) IS killed because `broken` is independent of forward/backward.
Tests added (each pins one or more named mutants):

coverage_evidence.rs:
- computed_percentage_partial_value (kills f64-const, *↔+/, /↔*/% in computed_percentage)
- computed_percentage_total_zero_returns_one_hundred
- computed_percentage_total_nonzero_full_coverage
- coverage_store_is_empty_true_on_new
- coverage_store_is_empty_false_after_insert

compliance.rs:
- is_eu_ai_act_loaded_requires_both_anchor_types (kills && → ||, constant-false on the loader)
- compute_compliance_partial_section_arithmetic (kills += ↔ -=/*=, > ↔ ==/<=/>=, == ↔ !=, * ↔ +/, / ↔ %/* across compute_compliance)
- compute_compliance_overall_pct_when_total_required_zero

convergence.rs:
- signature_message_hash_uses_xor_not_or_or_and (kills ^= ↔ |=/&= in simple_hash)
- failure_signature_display_writes_inner_string
- retry_strategy_guidance_returns_distinct_messages (kills constant-string replacement on guidance())
- retry_strategy_display_uses_guidance

links.rs (new tests module; file had no prior tests):
- debug_fmt_writes_struct_name
- partial_eq_distinguishes_distinct_graphs
- node_map_returns_artifact_indices
- backlinks_of_type_filters_by_type
- has_cycles_distinguishes_acyclic_and_cyclic
- orphans_lists_only_artifacts_with_no_links
- reachable_traverses_only_matching_link_type

store.rs:
- store_is_empty_distinguishes_empty_and_populated

Per CLAUDE.md, every commit touching `rivet-core/src/` requires artifact trailers.

Verifies: REQ-002, REQ-004, REQ-009, REQ-010
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
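The equivalent-mutant argument for links.rs can be illustrated with a toy graph. This is a sketch under stated assumptions, not the real `LinkGraph`: the `Graph` type, its fields, and `build` are hypothetical stand-ins, and both maps are stored as order-insensitive sets so that each is a complete encoding of the same edge set.

```rust
use std::collections::{BTreeMap, BTreeSet};

type Edges = BTreeMap<String, BTreeSet<String>>;

/// Toy stand-in for a link graph (hypothetical, not rivet's type).
#[derive(Debug)]
struct Graph {
    forward: Edges,
    backward: Edges,
}

impl Graph {
    /// Builds both maps from one edge list, so `backward` is always
    /// the exact inverse of `forward`.
    fn build(links: &[(&str, &str)]) -> Self {
        let mut forward = Edges::new();
        let mut backward = Edges::new();
        for &(src, dst) in links {
            forward.entry(src.to_string()).or_default().insert(dst.to_string());
            backward.entry(dst.to_string()).or_default().insert(src.to_string());
        }
        Graph { forward, backward }
    }

    /// Original comparison: both maps must match.
    fn eq_and(&self, other: &Self) -> bool {
        self.forward == other.forward && self.backward == other.backward
    }

    /// Mutated comparison: `&&` replaced by `||`.
    fn eq_or(&self, other: &Self) -> bool {
        self.forward == other.forward || self.backward == other.backward
    }
}
```

Because the two maps are equal or unequal together for any pair of graphs produced by `build`, `eq_and` and `eq_or` return the same answer on every such pair, which is why no test that goes through `build` can tell the two operators apart in this sketch. The argument would not hold if the maps stored insertion-ordered lists, since two graphs could then differ in one map's ordering while agreeing on the other.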
📐 Rivet artifact delta
No artifact changes in this PR. Code-only changes (renderer, CLI wiring, tests) don't touch the artifact graph.
Follow-up additions to the survivor-pinning effort. Local run of
`cargo mutants -p rivet-core --file rivet-core/src/commits.rs` showed
seven survivors:
3 in expand_artifact_range — `||` -> `&&` on the four-clause numeric
guard. Analysis: each clause has a downstream early-return path
(start_str.parse::<u64>() / end.parse::<u64>() / start > end) that
produces the same outer Vec result, so these are EQUIVALENT mutants.
Documented as such; not pinned.
4 in is_artifact_id — `&&` -> `||` between the four guard clauses
(!prefix.is_empty(), prefix.split-all-uppercase, !suffix.is_empty(),
suffix.all-digit). Three pinning tests cover all four:
artifact_id_rejects_double_hyphen_prefix
Input "A--1": exercises clauses 261 (top-level), 263:44 (inside
closure), and 264 (between B and C). Each `||` mutant flips
the answer to true.
artifact_id_rejects_non_digit_suffix
Input "REQ-1A": exercises clause 265 (between C and D). The `||`
mutant flips the answer to true.
artifact_id_rejects_leading_hyphen
Input "-1": companion case for clause 261. Combined with the
double-hyphen test, kills mutant 261 unambiguously.
Per CLAUDE.md, every commit touching `rivet-core/src/` requires
artifact trailers.
Verifies: REQ-017
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
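To make the four guard clauses concrete, here is a minimal sketch of an ID check with the same shape. It is an assumption about how such a function might look, not the code in commits.rs: in particular, splitting at the last hyphen and the per-segment check inside the closure are guesses based on the clause descriptions above.

```rust
/// Hypothetical reconstruction of a four-clause artifact-ID guard.
/// Accepts IDs such as "REQ-1": uppercase prefix segments, then digits.
fn is_artifact_id(s: &str) -> bool {
    // Assumption: the ID is split at its LAST hyphen.
    let (prefix, suffix) = match s.rsplit_once('-') {
        Some(parts) => parts,
        None => return false,
    };
    // Clause A: prefix present.
    !prefix.is_empty()
        // Clause B: every hyphen-separated prefix segment is non-empty uppercase.
        && prefix
            .split('-')
            .all(|seg| !seg.is_empty() && seg.chars().all(|c| c.is_ascii_uppercase()))
        // Clause C: suffix present.
        && !suffix.is_empty()
        // Clause D: suffix is all digits.
        && suffix.chars().all(|c| c.is_ascii_digit())
}
```

In this sketch the three inputs named in the commit message are all rejected: "A--1" fails on the empty prefix segment, "REQ-1A" fails on the non-digit suffix, and "-1" fails on the empty prefix.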
Local cargo-mutants on validate.rs surfaced 30 surviving mutants — the
heaviest cluster sat in `validate_structural`'s cardinality and
link-target-type block (lines 296-345). These tests pin the four
match-guard arithmetic operators plus the negation flags and the
inner `&&` clause inside `OneOrMany`.
Tests added (each pins one or more named mutants in the body):
cardinality_exactly_one_distinguishes_zero_one_two
Boundary triple at 0/1/2 links with `Cardinality::ExactlyOne`.
Kills:
validate.rs:297:44 match guard `count != 1` → true / → false
validate.rs:297:50 `!=` → `==`
validate.rs:293:41 `==` → `!=` on the link_type filter
cardinality_one_or_many_only_emits_when_required_and_zero
Cross-product (required ∈ {true, false}) × (count ∈ {0, 1}).
Kills:
validate.rs:311:43 match guard → true / → false
validate.rs:311:49 `==` → `!=`
validate.rs:311:54 `&&` → `||`
cardinality_zero_or_one_distinguishes_zero_one_two
Boundary triple at 0/1/2 links with `Cardinality::ZeroOrOne`.
Kills:
validate.rs:325:43 match guard `count > 1` → true / → false
validate.rs:325:49 `>` → `==` / `<` / `>=`
link_target_type_filter_pins_inequality_and_negations
Wrong-type vs right-type targets through the link-target-type loop.
Kills:
validate.rs:344:35 `!=` → `==`
validate.rs:348:24 delete `!` on `target_types.is_empty()`
validate.rs:349:25 `&&` → `||`
validate.rs:349:28 delete `!` on `target_types.contains(...)`
diagnostic_display_writes_message
Kills: validate.rs:81:9 fmt::Display::fmt → Ok(Default::default())
validate_documents_emits_for_unknown_artifact_reference
Kills:
validate.rs:523:5 validate_documents → vec![]
validate.rs:527:16 delete `!` on the missing-id check
Per CLAUDE.md, every commit touching `rivet-core/src/` requires
artifact trailers.
Verifies: REQ-004
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
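The boundary-triple idea can be shown on a reduced model of the match guards. This is a sketch only: the `Cardinality` variants are taken from the test names above, while the `ZeroOrMany` variant, the `violates` function, and its signature are hypothetical and do not reproduce `validate_structural`.

```rust
/// Reduced model of link cardinality rules (hypothetical).
#[derive(Clone, Copy)]
enum Cardinality {
    ExactlyOne,
    OneOrMany,
    ZeroOrOne,
    ZeroOrMany,
}

/// Returns true when `count` links violate the rule.
/// Each guard mirrors one operator that a mutant could flip.
fn violates(card: Cardinality, required: bool, count: usize) -> bool {
    match card {
        Cardinality::ExactlyOne if count != 1 => true,
        Cardinality::OneOrMany if count == 0 && required => true,
        Cardinality::ZeroOrOne if count > 1 => true,
        _ => false,
    }
}
```

Checking counts 0, 1, and 2 for `ExactlyOne` and `ZeroOrOne`, and the required × count cross-product for `OneOrMany`, is enough to give a different answer under each single-operator mutation of these guards, which is the pattern the tests above follow.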
Follow-up additions to the validate.rs mutation-pinning effort. The
first round drove the cardinality block from 30 → 10 survivors; this
round pins seven more in the required-field and allowed-values blocks.
Tests added:
required_field_check_distinguishes_present_and_missing
Kills:
validate.rs:170:31 `&&` → `||`
validate.rs:170:34 delete `!` on `contains_key`
validate.rs:177:20 delete `!` on `has_base`
validate.rs:173:21 delete match arm "description"
validate.rs:174:21 delete match arm "status"
allowed_values_string_check_distinguishes_in_and_out_of_set
Kills:
validate.rs:198:28 delete `!` on `!any(==)`
validate.rs:198:54 `==` → `!=`
Combined with the prior commit, validate.rs survivors now expected to
drop from the original 30 to single digits — the remainder are mostly
in subprocess-dependent paths (415/441/498, harder to test without
filesystem fixtures).
Per CLAUDE.md, every commit touching `rivet-core/src/` requires
artifact trailers.
Verifies: REQ-004
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
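The allowed-values check described above has the shape "not any equal". A minimal sketch, assuming string values and a hypothetical helper name `is_disallowed` (the real check in validate.rs may be structured differently):

```rust
/// Returns true when `actual` is NOT one of the allowed values.
/// Hypothetical helper mirroring the `!any(==)` shape named above.
fn is_disallowed(allowed: &[&str], actual: &str) -> bool {
    !allowed.iter().any(|v| *v == actual)
}
```

One in-set value and one out-of-set value are enough to separate the original from both mutants: deleting the `!` inverts both answers, and flipping `==` to `!=` makes the in-set value look disallowed whenever the set has more than one element.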
Even with 16 shards, the rivet-core mutation jobs were still hitting the 45-min wall. Investigation: the per-mutant timeout was 90s, and the dogfood corpus has a long tail of mutants that hit that cap. With ~230 mutants per shard at worst-case 90s and `--jobs 4`, a single shard can take ~86 min wall.

Drop `--timeout` to 30s. cargo-mutants's default is 3x baseline test time, which on this codebase is well under 30s for the vast majority of mutants. Anything slower than 30s gets reported as `timeout` (which counts as "caught", not "missed") and so doesn't hide real survivors.

Trace: skip
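The ~86 min figure is the worst case in which every mutant in a shard runs to the timeout cap. A minimal sketch of that arithmetic, assuming perfect parallelism across jobs and ignoring build time (the function name is hypothetical):

```rust
/// Worst-case wall-clock minutes for one shard if every mutant hits the
/// per-mutant timeout and `jobs` mutants run fully in parallel.
/// Hypothetical helper; ignores build time and scheduling overhead.
fn worst_case_wall_minutes(mutants: u32, timeout_secs: u32, jobs: u32) -> f64 {
    (mutants as f64 * timeout_secs as f64) / jobs as f64 / 60.0
}
```

With 230 mutants and 4 jobs this gives 86.25 min at a 90s timeout and 28.75 min at 30s, so under these assumptions only the 30s setting keeps the worst case inside the 45-min job limit.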
⚠️ Performance Alert ⚠️
Possible performance regression was detected for benchmark 'Rivet Criterion Benchmarks'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.20.
| Benchmark suite | Current: 2989834 | Previous: 3cdb942 | Ratio |
|---|---|---|---|
| store_insert/10000 | 23910794 ns/iter (± 1846451) | 12005789 ns/iter (± 731667) | 1.99 |
| link_graph_build/10000 | 53639643 ns/iter (± 5704105) | 27083984 ns/iter (± 1999680) | 1.98 |
| validate/10000 | 20744100 ns/iter (± 1170380) | 11121326 ns/iter (± 518102) | 1.87 |
| diff/10000 | 10080868 ns/iter (± 447283) | 7808283 ns/iter (± 193153) | 1.29 |
This comment was automatically generated by workflow using github-action-benchmark.
Summary
Multi-pronged fix to make rivet-core mutation testing complete reliably + kill the bulk of surviving mutants.
CI infrastructure changes
Mutation tests added
Local `cargo mutants -p rivet-core --file <module>.rs` runs surfaced ~100 surviving mutants across the priority semantic modules. Per-module before/after totals after this branch:

(*) The single remaining links.rs survivor is `&&` → `||` in `LinkGraph::eq` line 104 between the `forward` and `backward` clauses. Equivalent mutant: `backward` is derived from `forward` in `build()`, so any difference in one always implies a difference in the other; no external observer can distinguish `&&` from `||` for that pair.

Tests added
- `coverage_evidence::tests::computed_percentage_*` (3): kill f64-const, *↔+/, /↔*/%
- `coverage_evidence::tests::coverage_store_is_empty_*` (2)
- `compliance::tests::is_eu_ai_act_loaded_requires_both_anchor_types`
- `compliance::tests::compute_compliance_partial_section_arithmetic`
- `compliance::tests::compute_compliance_overall_pct_when_total_required_zero`
- `convergence::tests::signature_message_hash_uses_xor_not_or_or_and`
- `convergence::tests::failure_signature_display_writes_inner_string`
- `convergence::tests::retry_strategy_guidance_returns_distinct_messages`
- `convergence::tests::retry_strategy_display_uses_guidance`
- `links::tests::*` (7 tests, new module)
- `store::tests::store_is_empty_distinguishes_empty_and_populated`
- `commits::tests::artifact_id_*` (3)
- `validate::tests::cardinality_*` (3: ExactlyOne / OneOrMany / ZeroOrOne boundary triples)
- `validate::tests::link_target_type_filter_pins_inequality_and_negations`
- `validate::tests::required_field_check_distinguishes_present_and_missing`
- `validate::tests::allowed_values_string_check_distinguishes_in_and_out_of_set`
- `validate::tests::diagnostic_display_writes_message`
- `validate::tests::validate_documents_emits_for_unknown_artifact_reference`

Test count: 836 → 854 (`cargo test -p rivet-core --lib`, all green locally).

Per CLAUDE.md, every test commit carries `Verifies: REQ-NNN` trailers; the CI commits use `Trace: skip` since `.github/workflows/ci.yml` is exempt.

Test plan