fix: make fuzz run without fuzzy harness errors by anyasabo · Pull Request #678 · hashicorp/raft

anyasabo · 2026-05-05T23:31:48Z

Hi friends long time no see. I was trying to run make fuzz locally and it panicked relatively quickly which was surprising. I then saw it wasn't because it found a bug bug, just a bug in the harness :/ The fix did not seem that complicated though. I wonder if we should run these regularly (or least annually?) or remove them

clanker generated pr desc below

Summary

This PR fixes fuzzy-harness failures that could make make fuzz fail for harness reasons unrelated to Raft correctness.

Specifically, it:

adds missing pre-vote RPC decoding in the fuzzy transport
makes fuzzy leadership transfer fail via Future.Error() when no leader exists, instead of panicking
adds/strengthens leadership-transfer fuzzy tests to assert the no-leader path is handled explicitly

Problem

make fuzz was not reliably usable as a correctness signal because the harness had two gaps:

Missing pre-vote decode path in fuzzy/transport.go
- The transport handled RequestVote, AppendEntries, etc., but did not decode RequestPreVoteRequest inside sendRPC.
- This is a historical integration gap: pre-vote support was introduced later (PR #530, commit 181475c), while the fuzzy transport code path predates that work. The RequestPreVote method was added, but the decode switch was not updated for the new request type.
- In pre-vote election paths this produced errors such as:
  - unexpected request type: *raft.RequestPreVoteRequest
  - RPC does not have a header
- Result: election traffic could fail in the harness even when Raft behavior itself was not the issue.
Nil dereference during leadership transfer in fuzzy/cluster.go
- cluster.leadershipTransfer() assumed a leader existed and dereferenced ldr.raft.
- Under realistic fuzz timing (no stable leader yet), this panicked instead of returning a normal failure signal.
- Result: test process crashes (false-negative fuzz failure mode) instead of actionable test output.

Impact of the bug

The impact is on test reliability and signal quality:

make fuzz could fail due to harness defects, not Raft logic.
Panics terminate runs early, reducing explored interleavings.
Engineers can waste time triaging harness noise rather than real protocol problems.
CI/local fuzz runs lose trustworthiness as an early regression detector.

Why this fix is necessary

Fuzzing is only useful when the harness both:

models expected protocol traffic correctly, and
reports failures deterministically.

Without this fix, pre-vote election traffic is mis-modeled and no-leader transfer scenarios crash instead of returning explicit errors.

Why this is more correct

This change aligns the harness with Raft expectations and Go API contracts:

Transport correctness: pre-vote requests are decoded and routed like other supported RPC types.
Failure-semantics correctness: no-leader transfer returns an error future instead of panicking.
Regression protection: tests now assert explicit no-leader behavior.

In short, failures become deterministic and diagnosable test failures rather than runtime panics.

Scope / Risk

Scope is intentionally narrow:

fuzzy/cluster.go
fuzzy/leadershiptransfer_test.go
fuzzy/transport.go

No Raft core algorithm changes, no CI workflow changes.

Test Plan

go test -count=1 -run 'TestRaft_FuzzyLeadershipTransfer|TestRaft_FuzzyLeadershipTransferWithoutLeader' ./fuzzy
make fuzz

hashicorp-cla-app · 2026-05-05T23:32:01Z

All committers have signed the CLA.

hashicorp-cla-app · 2026-05-05T23:32:03Z

Thank you for your submission! We require that all contributors sign our Contributor License Agreement ("CLA") before we can accept the contribution. Read and sign the agreement

Learn more about why HashiCorp requires a CLA and what the CLA includes

_{Have you signed the CLA already but the status is still pending? Recheck it.}

tgross

Hi @anyasabo! I took this for a spin and it LGTM. Do you want to mark it ready for review? (Also, if we remove the LLM authorship the CLA bot will be happy.)

I've opened #681 to follow-up on the data race we're seeing in the back compat test, as that cannot be related to this PR.

anyasabo · 2026-05-29T21:38:59Z

@tgross done, sorry for the time. took another pass and lgtm too and removed llm authorship

tgross · 2026-06-01T23:13:25Z

Sorry about the failed checks here @anyasabo. #687 was just merged to main and if you rebase on that you should be good-to-go to have this run.

tgross mentioned this pull request May 19, 2026

data race in TestRaft_ReplaceUpgrade back compat test #681

Open

tgross reviewed May 19, 2026

View reviewed changes

tgross mentioned this pull request May 19, 2026

fix data race in error logs for replication #682

Open

rm llm

cada209

anyasabo force-pushed the fix/fuzzy-harness-make-fuzz branch from 471717c to cada209 Compare May 29, 2026 21:36

anyasabo marked this pull request as ready for review May 29, 2026 21:37

anyasabo requested review from a team as code owners May 29, 2026 21:37

tgross added this to Nomad - Community Issues Triage Jun 1, 2026

github-project-automation Bot moved this to Needs Triage in Nomad - Community Issues Triage Jun 1, 2026

tgross self-assigned this Jun 1, 2026

tgross moved this from Needs Triage to In Progress in Nomad - Community Issues Triage Jun 1, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: make fuzz run without fuzzy harness errors#678

fix: make fuzz run without fuzzy harness errors#678
anyasabo wants to merge 1 commit into
hashicorp:mainfrom
anyasabo:fix/fuzzy-harness-make-fuzz

anyasabo commented May 5, 2026 •

edited

Loading

Uh oh!

hashicorp-cla-app Bot commented May 5, 2026 •

edited

Loading

Uh oh!

hashicorp-cla-app Bot commented May 5, 2026

Uh oh!

tgross left a comment

Uh oh!

anyasabo commented May 29, 2026

Uh oh!

tgross commented Jun 1, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

anyasabo commented May 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Problem

Impact of the bug

Why this fix is necessary

Why this is more correct

Scope / Risk

Test Plan

Uh oh!

hashicorp-cla-app Bot commented May 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

hashicorp-cla-app Bot commented May 5, 2026

Uh oh!

tgross left a comment

Choose a reason for hiding this comment

Uh oh!

anyasabo commented May 29, 2026

Uh oh!

tgross commented Jun 1, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

anyasabo commented May 5, 2026 •

edited

Loading

hashicorp-cla-app Bot commented May 5, 2026 •

edited

Loading