eve-k pillar: upgrade longhorn-manager to v1.9.1 and fix BackupTargetName on failover by andrewd-zededa · Pull Request #5765 · lf-edge/eve

andrewd-zededa · 2026-04-07T22:31:29Z

Description

Upgrades github.com/longhorn/longhorn-manager from v1.6.0 to v1.9.1 in
pkg/pillar and fixes a failover regression that surfaces when volumes have
been migrated forward from older longhorn versions.

Fix: BackupTargetName empty on migrated volumes (kubeapi/longhorninfo.go)

Longhorn v1.9.x introduces a webhook validator that rejects any volume
Update() where Spec.BackupTargetName is empty:

"backup target name cannot be empty when creating a volume or updating
from an existing backup target"

Volumes migrated from longhorn < v1.7 predate the BackupTargetName field
and carry an empty value. longhornVolumeSetNode() — called during failover
— hits this validator and fails. The fix sets BackupTargetName to
"default" when the field is empty before calling Update(). The
default BackupTarget CR is always present in a longhorn installation,
even when no external backup target is configured (empty URL).
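The defaulting step described above can be sketched as follows. This is a minimal illustration, not the code from kubeapi/longhorninfo.go: the `Volume`, `VolumeSpec`, `VolumeUpdater`, and `strictUpdater` types and the `setNode` function are hypothetical stand-ins for the longhorn CR and clientset, and the assumption that the function assigns `Spec.NodeID` is inferred from its name only.

```go
package main

import (
	"errors"
	"fmt"
)

// defaultBackupTargetName is the name of the BackupTarget CR that the PR
// description says is always present in a longhorn installation.
const defaultBackupTargetName = "default"

// VolumeSpec and Volume are hypothetical stand-ins for the longhorn Volume CR;
// only the fields needed for this sketch are modeled.
type VolumeSpec struct {
	NodeID           string // assumption: the field changed during failover
	BackupTargetName string
}

type Volume struct {
	Name string
	Spec VolumeSpec
}

// VolumeUpdater is a hypothetical interface standing in for the clientset
// call that performs Update().
type VolumeUpdater interface {
	Update(v *Volume) error
}

// setNode illustrates the idea behind the fix: fill in the backup target
// name when it is unset, then update the volume. An existing non-empty
// name is left untouched.
func setNode(u VolumeUpdater, v *Volume, nodeID string) error {
	if v.Spec.BackupTargetName == "" {
		v.Spec.BackupTargetName = defaultBackupTargetName
	}
	v.Spec.NodeID = nodeID
	if err := u.Update(v); err != nil {
		return fmt.Errorf("update volume %s: %w", v.Name, err)
	}
	return nil
}

// strictUpdater imitates the v1.9.x validator behavior described above by
// rejecting any update whose BackupTargetName is empty.
type strictUpdater struct{}

func (strictUpdater) Update(v *Volume) error {
	if v.Spec.BackupTargetName == "" {
		return errors.New("backup target name cannot be empty")
	}
	return nil
}
```

Calling `setNode(strictUpdater{}, &Volume{Name: "pvc-1"}, "node-b")` succeeds in this sketch because the empty name is replaced before the update is attempted; without the defaulting step the same call would be rejected.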

Dependency bump (go.mod / vendor)

go mod tidy and go mod vendor applied. Notable transitive bumps:

k8s.io/* v0.32.5 → v0.33.3
sigs.k8s.io/controller-runtime v0.16.1 → v0.20.4
sigs.k8s.io/structured-merge-diff/v4 v4.4.3 → v4.7.0
prometheus/client_golang v1.19.1 → v1.22.0

PR dependencies

None.

How to test and validate this PR

1. Set up a 2+ node HV=k cluster running longhorn.
2. Simulate or identify volumes originally created with longhorn < v1.7
   (i.e., volumes whose Spec.BackupTargetName is empty).
3. Trigger a failover (graceful node reboot or power-off of the designated node).
4. Confirm that longhornVolumeSetNode no longer logs
   "backup target name cannot be empty" and the volume successfully
   migrates to the surviving node.
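Two of the checks above (finding affected volumes and confirming the error is gone) could be automated along these lines. This is only a sketch under stated assumptions: `volumeInfo`, `emptyBackupTarget`, and `hasValidatorError` are hypothetical names, and how the volume list and log lines are collected from the cluster is left out.

```go
package main

import "strings"

// volumeInfo is a hypothetical, simplified view of a longhorn Volume,
// carrying only the fields this check needs.
type volumeInfo struct {
	Name             string
	BackupTargetName string
}

// emptyBackupTarget returns the names of volumes whose BackupTargetName is
// empty, i.e. the volumes the test steps above ask you to identify.
func emptyBackupTarget(vols []volumeInfo) []string {
	var names []string
	for _, v := range vols {
		if v.BackupTargetName == "" {
			names = append(names, v.Name)
		}
	}
	return names
}

// validatorMsg is the error text quoted in the PR description.
const validatorMsg = "backup target name cannot be empty"

// hasValidatorError reports whether any collected log line contains the
// validator's error message.
func hasValidatorError(logLines []string) bool {
	for _, line := range logLines {
		if strings.Contains(line, validatorMsg) {
			return true
		}
	}
	return false
}
```

Before the failover, `emptyBackupTarget` should return at least one volume for the test to be meaningful; after it, `hasValidatorError` over the collected logs should return false.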

Changelog notes

Optimize failover of HV=k applications by trimming the time spent migrating volumes.

PR Backports

16.0-stable: To be determined.
14.5-stable: To be determined.
13.4-stable: To be determined.

Checklist

I've provided a proper description
I've added the proper documentation
I've tested my PR on amd64 device
I've tested my PR on arm64 device
I've written the test verification instructions
I've set the proper labels to this PR

And the last but not least:

I've checked the boxes above, or I've provided a good reason why I didn't
check them.

Please, check the boxes above after submitting the PR in interactive mode.

codecov · 2026-04-07T23:07:55Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 29.87%. Comparing base (2281599) to head (a420786).
⚠️ Report is 486 commits behind head on master.

Additional details and impacted files

@@             Coverage Diff             @@
##           master    #5765       +/-   ##
===========================================
+ Coverage   19.52%   29.87%   +10.34%     
===========================================
  Files          19       18        -1     
  Lines        3021     2417      -604     
===========================================
+ Hits          590      722      +132     
+ Misses       2310     1549      -761     
- Partials      121      146       +25


andrewd-zededa · 2026-04-08T16:13:35Z

rebased off latest master

naiming-zededa

LGTM

zedi-pramodh

LGTM, just a few comments on the document.

Also please explain how de-scheduler works.

zedi-pramodh · 2026-04-09T20:15:50Z

We still need some documentation on de-scheduler.

andrewd-zededa · 2026-04-09T20:27:57Z

> We still need some documentation on de-scheduler.

Done, expanded on the policy and trigger under the '### Failback handling' section.

zedi-pramodh

LGTM

andrewd-zededa · 2026-04-10T21:41:18Z

/rerun red

rene · 2026-04-11T16:23:46Z

@andrewd-zededa , pls, rebase on top of master.

Volumes migrated from longhorn < v1.7 may have an empty BackupTargetName. The v1.9.x webhook validator rejects any Update() where BackupTargetName is empty, producing "backup target name cannot be empty" errors during failover. Set it to "default" when unset before calling Update().

Also clarify failover.md: kubevirt virtualization support means HV=k.

Document Kubernetes object state timeline for VMIRs app failover in docs/failover.md, covering node NotReady through new VMI Running, including EVE-specific tolerateSec=15 and logcollectInterval=10s timing, best-case timing summary, and descheduler-based failback handling.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Andrew Durbin <andrewd@zededa.com>

Bump github.com/longhorn/longhorn-manager v1.6.0 → v1.9.1 with go mod tidy and go mod vendor. Transitive bumps include:

- k8s.io/* v0.32.5 → v0.33.3
- sigs.k8s.io/controller-runtime v0.16.1 → v0.20.4
- sigs.k8s.io/structured-merge-diff/v4 v4.4.3 → v4.7.0
- prometheus/client_golang v1.19.1 → v1.22.0
- gorilla/websocket, onsi/gomega, and several others

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Andrew Durbin <andrewd@zededa.com>

andrewd-zededa · 2026-04-13T13:32:26Z

The yetus failure is not clear, no listing even runs on this PR.

andrewd-zededa · 2026-04-13T15:17:10Z

Rebased. yetus does not even seem to run, which may be due to all the vendor changes.

eriknordmark · 2026-04-13T20:51:54Z

FWIW all 4 eden smoke tests fail with the known issue: FAIL: TestHWInventory

github-actions Bot requested review from eriknordmark, naiming-zededa and zedi-pramodh April 7, 2026 22:34

andrewd-zededa force-pushed the lh-mgr-mod-1.9.1 branch from 06c8480 to af63985 Compare April 8, 2026 16:13

andrewd-zededa force-pushed the lh-mgr-mod-1.9.1 branch from af63985 to 02b9878 Compare April 9, 2026 17:42

andrewd-zededa marked this pull request as ready for review April 9, 2026 17:51

naiming-zededa approved these changes Apr 9, 2026

zedi-pramodh reviewed Apr 9, 2026

Comment thread pkg/pillar/docs/failover.md Outdated

zedi-pramodh reviewed Apr 9, 2026

Comment thread pkg/pillar/docs/failover.md Outdated

zedi-pramodh reviewed Apr 9, 2026

Comment thread pkg/pillar/docs/failover.md

zedi-pramodh reviewed Apr 9, 2026

andrewd-zededa force-pushed the lh-mgr-mod-1.9.1 branch from 02b9878 to 61fce74 Compare April 9, 2026 19:58

github-actions Bot requested review from naiming-zededa and zedi-pramodh April 9, 2026 20:01

andrewd-zededa force-pushed the lh-mgr-mod-1.9.1 branch from 61fce74 to 39d4781 Compare April 9, 2026 20:03

andrewd-zededa force-pushed the lh-mgr-mod-1.9.1 branch from 39d4781 to 63469d5 Compare April 9, 2026 20:27

zedi-pramodh approved these changes Apr 9, 2026

andrewd-zededa and others added 2 commits April 13, 2026 07:05

andrewd-zededa force-pushed the lh-mgr-mod-1.9.1 branch from 63469d5 to a420786 Compare April 13, 2026 13:21

github-actions Bot requested a review from zedi-pramodh April 13, 2026 13:24

eriknordmark merged commit 6eac75b into lf-edge:master Apr 13, 2026
58 of 66 checks passed
