
[BUG] v2 volume becomes faulted and detached after deleting one replica during full restoration #7597

Closed
derekbit opened this issue Jan 9, 2024 · 2 comments
Assignees
Labels

area/resilience, area/v2-data-engine (SPDK), area/volume-backup-restore, kind/bug, require/auto-e2e-test, require/backport, require/qa-review-coverage
Milestone

v1.6.0
Comments

derekbit (Member) commented Jan 9, 2024

Describe the bug

v2 volume becomes faulted and detached after deleting one replica during full restoration.

To Reproduce

  1. Create a v2 volume with 3 replicas
  2. Create a backup
  3. Restore a v2 volume from the backup
  4. Delete one replica during restoration.
  5. The volume becomes faulted and detached after restoration.
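The reproduce steps above can be sketched as Longhorn custom resources. This is a minimal illustration only, not the reporter's exact procedure: names such as `demo-v2` and `demo-v2-restore` are placeholders, and the `dataEngine` field is an assumption based on the v1beta2 Volume CRD, so field names should be checked against the installed Longhorn version.

```yaml
# Step 1: create a v2 volume with 3 replicas (hypothetical names/values)
apiVersion: longhorn.io/v1beta2
kind: Volume
metadata:
  name: demo-v2
  namespace: longhorn-system
spec:
  dataEngine: v2          # assumption: selects the v2 (SPDK) data engine
  numberOfReplicas: 3
  size: "2147483648"      # 2 GiB
---
# Step 3: restore a new v2 volume from the backup created in step 2
apiVersion: longhorn.io/v1beta2
kind: Volume
metadata:
  name: demo-v2-restore
  namespace: longhorn-system
spec:
  dataEngine: v2
  numberOfReplicas: 3
  size: "2147483648"
  fromBackup: "<backup URL from step 2>"   # elided; copy from the Backup CR status
```

Step 4 then amounts to deleting one Replica custom resource (for example, `kubectl -n longhorn-system delete replica <replica-name>`) while the restore is still in progress.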

Expected behavior

The restoration completes successfully, and the volume does not become faulted.

Support bundle for troubleshooting

Environment

  • Longhorn version:
  • Impacted volume (PV):
  • Installation method (e.g. Rancher Catalog App/Helm/Kubectl):
  • Kubernetes distro (e.g. RKE/K3s/EKS/OpenShift) and version:
    • Number of control plane nodes in the cluster:
    • Number of worker nodes in the cluster:
  • Node config
    • OS type and version:
    • Kernel version:
    • CPU per node:
    • Memory per node:
    • Disk type (e.g. SSD/NVMe/HDD):
    • Network bandwidth between the nodes (Gbps):
  • Underlying Infrastructure (e.g. on AWS/GCE, EKS/GKE, VMWare/KVM, Baremetal):
  • Number of Longhorn volumes in the cluster:

Additional context


@derekbit derekbit added kind/bug area/volume-backup-restore Volume backup restore area/v2-data-engine v2 data engine (SPDK) require/qa-review-coverage Require QA to review coverage require/backport Require backport. Only used when the specific versions to backport have not been definied. labels Jan 9, 2024
@derekbit derekbit added this to the v1.6.0 milestone Jan 9, 2024
@derekbit derekbit self-assigned this Jan 9, 2024
longhorn-io-github-bot commented Jan 9, 2024

Pre Ready-For-Testing Checklist

  • Where is the reproduce steps/test steps documented?
    The reproduce steps/test steps are at:

After the fix, the behavior is:

  • The restoration succeeds
  • Offline rebuilding is triggered right after the volume restoration finishes
  • The volume ends up detached, not faulted
  • Does the PR include the explanation for the fix or the feature?

  • Has the backend code been merged (Manager, Engine, Instance Manager, BackupStore, etc.) (including backport-needed/*)?
    The PR is at

longhorn/longhorn-manager#2439

  • Which areas/issues this PR might have potential impacts on?
    Area: v2 volume, restoration
    Issues

@derekbit derekbit added the area/resilience System or volume resilience label Jan 9, 2024
@derekbit derekbit changed the title [BUG] v2 volume becomes faulted and detached after deleting one replica on non-attached node during full restoration. [BUG] v2 volume becomes faulted and detached after deleting one replica during full restoration. Jan 9, 2024
@derekbit derekbit changed the title [BUG] v2 volume becomes faulted and detached after deleting one replica during full restoration. [BUG] v2 volume becomes faulted and detached after deleting one replica during full restoration Jan 9, 2024
@innobead innobead added the require/auto-e2e-test Require adding/updating auto e2e test cases if they can be automated label Jan 9, 2024
chriscchien (Contributor) commented

Verified as passing on Longhorn master (longhorn-manager 2ac44e) with the test steps.

When a replica is deleted while a v2 volume is restoring, the volume now finishes the restore first and then starts offline rebuilding; the volume becomes detached (ready for workload) in the end, with the data intact.
