
[BUG] v2 volume becomes faulted and detached after deleting one replica during full restoration #7597

Closed
derekbit opened this issue Jan 9, 2024 · 2 comments
Assignees
Labels

area/resilience, area/v2-data-engine (SPDK), area/volume-backup-restore, kind/bug, require/auto-e2e-test, require/backport, require/qa-review-coverage
Milestone

v1.6.0
Comments

derekbit (Member) commented Jan 9, 2024

Describe the bug

v2 volume becomes faulted and detached after deleting one replica during full restoration.

To Reproduce

  1. Create a v2 volume with 3 replicas
  2. Create a backup
  3. Restore a v2 volume from the backup
  4. Delete one replica during restoration.
  5. The volume becomes faulted and detached after restoration.
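The reproduce steps above can be sketched as Longhorn custom resources. This is a minimal illustration only, not the reporter's exact procedure: names such as `demo-v2` and `demo-v2-restore` are placeholders, and the `dataEngine` field is an assumption based on the v1beta2 Volume CRD, so field names should be checked against the installed Longhorn version.

```yaml
# Step 1: create a v2 volume with 3 replicas (hypothetical names/values)
apiVersion: longhorn.io/v1beta2
kind: Volume
metadata:
  name: demo-v2
  namespace: longhorn-system
spec:
  dataEngine: v2          # assumption: selects the v2 (SPDK) data engine
  numberOfReplicas: 3
  size: "2147483648"      # 2 GiB
---
# Step 3: restore a new v2 volume from the backup created in step 2
apiVersion: longhorn.io/v1beta2
kind: Volume
metadata:
  name: demo-v2-restore
  namespace: longhorn-system
spec:
  dataEngine: v2
  numberOfReplicas: 3
  size: "2147483648"
  fromBackup: "<backup URL from step 2>"   # elided; copy from the Backup CR status
```

Step 4 then amounts to deleting one Replica custom resource (for example, `kubectl -n longhorn-system delete replica <replica-name>`) while the restore is still in progress.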

Expected behavior

The restoration completes successfully, and the volume does not become faulted.

Support bundle for troubleshooting

Environment

  • Longhorn version:
  • Impacted volume (PV):
  • Installation method (e.g. Rancher Catalog App/Helm/Kubectl):
  • Kubernetes distro (e.g. RKE/K3s/EKS/OpenShift) and version:
    • Number of control plane nodes in the cluster:
    • Number of worker nodes in the cluster:
  • Node config
    • OS type and version:
    • Kernel version:
    • CPU per node:
    • Memory per node:
    • Disk type (e.g. SSD/NVMe/HDD):
    • Network bandwidth between the nodes (Gbps):
  • Underlying Infrastructure (e.g. on AWS/GCE, EKS/GKE, VMWare/KVM, Baremetal):
  • Number of Longhorn volumes in the cluster:

Additional context


@derekbit derekbit added kind/bug area/volume-backup-restore Volume backup restore area/v2-data-engine v2 data engine (SPDK) require/qa-review-coverage Require QA to review coverage require/backport Require backport. Only used when the specific versions to backport have not been definied. labels Jan 9, 2024
@derekbit derekbit added this to the v1.6.0 milestone Jan 9, 2024
@derekbit derekbit self-assigned this Jan 9, 2024
longhorn-io-github-bot commented Jan 9, 2024

Pre Ready-For-Testing Checklist

  • Where is the reproduce steps/test steps documented?
    The reproduce steps/test steps are at:

After the fix, the behavior is:

  • The restoration succeeds
  • Offline rebuilding is triggered right after the volume restoration finishes
  • The volume ends up detached, not faulted
  • Does the PR include the explanation for the fix or the feature?

  • Has the backend code been merged (Manager, Engine, Instance Manager, BackupStore, etc.) (including backport-needed/*)?
    The PR is at

longhorn/longhorn-manager#2439

  • Which areas/issues this PR might have potential impacts on?
    Area: v2 volume, restoration
    Issues

@derekbit derekbit added the area/resilience System or volume resilience label Jan 9, 2024
@derekbit derekbit changed the title [BUG] v2 volume becomes faulted and detached after deleting one replica on non-attached node during full restoration. [BUG] v2 volume becomes faulted and detached after deleting one replica during full restoration. Jan 9, 2024
@derekbit derekbit changed the title [BUG] v2 volume becomes faulted and detached after deleting one replica during full restoration. [BUG] v2 volume becomes faulted and detached after deleting one replica during full restoration Jan 9, 2024
@innobead innobead added the require/auto-e2e-test Require adding/updating auto e2e test cases if they can be automated label Jan 9, 2024
chriscchien (Contributor) commented

Verified as passing on Longhorn master (longhorn-manager 2ac44e) with the test steps.

When a replica is deleted while a v2 volume is restoring, the volume now finishes the restore first and then starts offline rebuilding; the volume becomes detached (ready for workload) in the end, with the data intact.
