[BUG] Protect spdkClient in ReplicaCreate #7752

Closed
derekbit opened this issue Jan 23, 2024 · 2 comments
Assignees
derekbit
Labels
kind/bug require/backport Require backport. Only used when the specific versions to backport have not been defined. require/qa-review-coverage Require QA to review coverage
Milestone
v1.6.0

Comments

@derekbit (Member)

Describe the bug

Protect spdkClient in ReplicaCreate with a lock, because the client can be replaced concurrently when there is a connection issue.
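A minimal Go sketch of this kind of lock protection, assuming the server holds one shared SPDK client that a request replaces after a connection error. All types and methods here (Server, spdkClient, BdevLvolCreate, getClient, replaceClient) are hypothetical stand-ins, not the actual longhorn-spdk-engine or longhorn-instance-manager code.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// spdkClient is a HYPOTHETICAL stand-in for the real SPDK JSON-RPC client.
type spdkClient struct {
	id     int
	broken bool // simulates a dead connection; never written after creation
}

// BdevLvolCreate is a HYPOTHETICAL RPC that fails on a broken connection.
func (c *spdkClient) BdevLvolCreate(name string) (string, error) {
	if c.broken {
		return "", errors.New("connection reset")
	}
	return fmt.Sprintf("lvol-%s@client%d", name, c.id), nil
}

// Server holds the shared client; clientLock guards reads and replacement.
type Server struct {
	clientLock sync.RWMutex
	client     *spdkClient
	nextID     int // only touched while holding the write lock
}

// getClient returns the current client under the read lock.
func (s *Server) getClient() *spdkClient {
	s.clientLock.RLock()
	defer s.clientLock.RUnlock()
	return s.client
}

// replaceClient swaps in a new client under the write lock, but only if the
// caller's client is still current, so concurrent failures reconnect once.
func (s *Server) replaceClient(old *spdkClient) *spdkClient {
	s.clientLock.Lock()
	defer s.clientLock.Unlock()
	if s.client == old {
		s.nextID++
		s.client = &spdkClient{id: s.nextID}
	}
	return s.client
}

// ReplicaCreate retries once with a fresh client after a connection error.
func (s *Server) ReplicaCreate(name string) (string, error) {
	c := s.getClient()
	out, err := c.BdevLvolCreate(name)
	if err == nil {
		return out, nil
	}
	c = s.replaceClient(c)
	return c.BdevLvolCreate(name)
}

func main() {
	s := &Server{client: &spdkClient{id: 0, broken: true}}
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			out, err := s.ReplicaCreate(fmt.Sprintf("r%d", i))
			fmt.Println(out, err)
		}(i)
	}
	wg.Wait()
	fmt.Println("client id:", s.getClient().id) // prints "client id: 1"
}
```

The compare-before-swap in replaceClient is a choice made for this sketch: when several concurrent ReplicaCreate calls fail on the same broken client, only the first one reconnects and the others reuse its replacement instead of racing to overwrite the shared field.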

To Reproduce

Expected behavior

Support bundle for troubleshooting

Environment

  • Longhorn version:
  • Impacted volume (PV):
  • Installation method (e.g. Rancher Catalog App/Helm/Kubectl):
  • Kubernetes distro (e.g. RKE/K3s/EKS/OpenShift) and version:
    • Number of control plane nodes in the cluster:
    • Number of worker nodes in the cluster:
  • Node config
    • OS type and version:
    • Kernel version:
    • CPU per node:
    • Memory per node:
    • Disk type (e.g. SSD/NVMe/HDD):
    • Network bandwidth between the nodes (Gbps):
  • Underlying Infrastructure (e.g. on AWS/GCE, EKS/GKE, VMWare/KVM, Baremetal):
  • Number of Longhorn volumes in the cluster:

Additional context

@derekbit added the kind/bug, require/qa-review-coverage, and require/backport labels Jan 23, 2024
@derekbit added this to the v1.6.0 milestone Jan 23, 2024
@derekbit self-assigned this Jan 23, 2024
@longhorn-io-github-bot commented Jan 23, 2024

Pre Ready-For-Testing Checklist

  • Where are the reproduce steps/test steps documented?
    The reproduce steps/test steps are at:

The error and the lock protection are not easy to trigger, so QA can simply validate v2 volume attachment and detachment and check for any unexpected behavior.

  • Does the PR include the explanation for the fix or the feature?

  • Has the backend code been merged (Manager, Engine, Instance Manager, BackupStore etc) (including backport-needed/*)?
    The PRs are at:

longhorn/longhorn-spdk-engine#104
longhorn/longhorn-instance-manager#379

  • Which areas/issues might this PR impact?
    Area: v2 volume
    Issues

@chriscchien (Contributor)

Verified pass on Longhorn master (longhorn-instance-manager 52decf) and v1.6.x (longhorn-instance-manager c100da).

The following v2 volume operations worked well on master-head and v1.6.x-head:

  • Attach / detach
  • Offline replica rebuilding
  • Crash the instance-manager pod during offline replica rebuilding; the rebuild succeeds.
  • Create backup
  • Restore the backup to v1 and v2 volume
