Skip to content

fix: make fuzz run without fuzzy harness errors#678

Open
anyasabo wants to merge 1 commit into
hashicorp:mainfrom
anyasabo:fix/fuzzy-harness-make-fuzz
Open

fix: make fuzz run without fuzzy harness errors#678
anyasabo wants to merge 1 commit into
hashicorp:mainfrom
anyasabo:fix/fuzzy-harness-make-fuzz

Conversation

@anyasabo
Copy link
Copy Markdown

@anyasabo anyasabo commented May 5, 2026

Hi friends long time no see. I was trying to run make fuzz locally and it panicked relatively quickly which was surprising. I then saw it wasn't because it found a bug bug, just a bug in the harness :/ The fix did not seem that complicated though. I wonder if we should run these regularly (or least annually?) or remove them

clanker generated pr desc below


Summary

This PR fixes fuzzy-harness failures that could make make fuzz fail for harness reasons unrelated to Raft correctness.

Specifically, it:

  • adds missing pre-vote RPC decoding in the fuzzy transport
  • makes fuzzy leadership transfer fail via Future.Error() when no leader exists, instead of panicking
  • adds/strengthens leadership-transfer fuzzy tests to assert the no-leader path is handled explicitly

Problem

make fuzz was not reliably usable as a correctness signal because the harness had two gaps:

  1. Missing pre-vote decode path in fuzzy/transport.go

    • The transport handled RequestVote, AppendEntries, etc., but did not decode RequestPreVoteRequest inside sendRPC.
    • This is a historical integration gap: pre-vote support was introduced later (PR #530, commit 181475c), while the fuzzy transport code path predates that work. The RequestPreVote method was added, but the decode switch was not updated for the new request type.
    • In pre-vote election paths this produced errors such as:
      • unexpected request type: *raft.RequestPreVoteRequest
      • RPC does not have a header
    • Result: election traffic could fail in the harness even when Raft behavior itself was not the issue.
  2. Nil dereference during leadership transfer in fuzzy/cluster.go

    • cluster.leadershipTransfer() assumed a leader existed and dereferenced ldr.raft.
    • Under realistic fuzz timing (no stable leader yet), this panicked instead of returning a normal failure signal.
    • Result: test process crashes (false-negative fuzz failure mode) instead of actionable test output.

Impact of the bug

The impact is on test reliability and signal quality:

  • make fuzz could fail due to harness defects, not Raft logic.
  • Panics terminate runs early, reducing explored interleavings.
  • Engineers can waste time triaging harness noise rather than real protocol problems.
  • CI/local fuzz runs lose trustworthiness as an early regression detector.

Why this fix is necessary

Fuzzing is only useful when the harness both:

  • models expected protocol traffic correctly, and
  • reports failures deterministically.

Without this fix, pre-vote election traffic is mis-modeled and no-leader transfer scenarios crash instead of returning explicit errors.

Why this is more correct

This change aligns the harness with Raft expectations and Go API contracts:

  • Transport correctness: pre-vote requests are decoded and routed like other supported RPC types.
  • Failure-semantics correctness: no-leader transfer returns an error future instead of panicking.
  • Regression protection: tests now assert explicit no-leader behavior.

In short, failures become deterministic and diagnosable test failures rather than runtime panics.

Scope / Risk

Scope is intentionally narrow:

  • fuzzy/cluster.go
  • fuzzy/leadershiptransfer_test.go
  • fuzzy/transport.go

No Raft core algorithm changes, no CI workflow changes.

Test Plan

  • go test -count=1 -run 'TestRaft_FuzzyLeadershipTransfer|TestRaft_FuzzyLeadershipTransferWithoutLeader' ./fuzzy
  • make fuzz

@hashicorp-cla-app
Copy link
Copy Markdown

hashicorp-cla-app Bot commented May 5, 2026

CLA assistant check
All committers have signed the CLA.

@hashicorp-cla-app
Copy link
Copy Markdown

CLA assistant check

Thank you for your submission! We require that all contributors sign our Contributor License Agreement ("CLA") before we can accept the contribution. Read and sign the agreement

Learn more about why HashiCorp requires a CLA and what the CLA includes

Have you signed the CLA already but the status is still pending? Recheck it.

Copy link
Copy Markdown
Member

@tgross tgross left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @anyasabo! I took this for a spin and it LGTM. Do you want to mark it ready for review? (Also, if we remove the LLM authorship the CLA bot will be happy.)

I've opened #681 to follow-up on the data race we're seeing in the back compat test, as that cannot be related to this PR.

@anyasabo anyasabo force-pushed the fix/fuzzy-harness-make-fuzz branch from 471717c to cada209 Compare May 29, 2026 21:36
@anyasabo anyasabo marked this pull request as ready for review May 29, 2026 21:37
@anyasabo anyasabo requested review from a team as code owners May 29, 2026 21:37
@anyasabo
Copy link
Copy Markdown
Author

@tgross done, sorry for the time. took another pass and lgtm too and removed llm authorship

@tgross tgross self-assigned this Jun 1, 2026
@tgross tgross moved this from Needs Triage to In Progress in Nomad - Community Issues Triage Jun 1, 2026
@tgross
Copy link
Copy Markdown
Member

tgross commented Jun 1, 2026

Sorry about the failed checks here @anyasabo. #687 was just merged to main and if you rebase on that you should be good-to-go to have this run.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Development

Successfully merging this pull request may close these issues.

2 participants