[BUG] Concurrent expanding volume and upgrade longhorn-engine image #2471

Open
jenting opened this issue Apr 12, 2021 · 2 comments
Labels
area/v1-data-engine v1 data engine (iSCSI tgt) component/longhorn-manager Longhorn manager (control plane) kind/bug severity/3 Function working but has a major issue w/ workaround

@jenting
Contributor

jenting commented Apr 12, 2021

Describe the bug

When a volume is expanded while its longhorn-engine image is being upgraded, the volume ends up in an inconsistent state:

  • The volume state is not ready
  • The volume is attached to a node
  • The volume can no longer be detached


To Reproduce
Steps to reproduce the behavior:

  1. Set Concurrent Automatic Engine Upgrade Per Node Limit to 0.
  2. Deploy Longhorn v1.1.1-rc1, but with engine image version v1.0.2.
  3. Detach the volume.
  4. Expand volume from 2Gi to 8Gi.
  5. While the volume is expanding, manually upgrade the Longhorn engine image from v1.0.2 to v1.1.1-rc1 through the UI.
  6. The volume state becomes NotReady and the volume stays attached to a node, but the detach icon cannot be clicked.

Expected behavior

  1. The volume should expand to 8Gi.
  2. The Longhorn engine image should be upgraded from v1.0.2 to v1.1.1-rc1.

Log

longhorn-manager-tc9jr longhorn-manager time="2021-04-12T06:46:18Z" level=info msg="Preparing to start frontend blockdev" controller=longhorn-engine engine=pvc-9b23038f-5e81-46c1-aaee-8dab6f95c356-e-a5376dbd node=jenting-k3s-worker-2
longhorn-manager-tc9jr longhorn-manager E0412 06:46:19.337584       1 engine_controller.go:693] failed to update status for engine pvc-9b23038f-5e81-46c1-aaee-8dab6f95c356-e-a5376dbd: failed to start frontend blockdev: error starting frontend tgt-blockdev: Failed to execute: /var/lib/longhorn/engine-binaries/longhornio-longhorn-engine-v1.1.1-rc1/longhorn [--url 10.42.2.42:10001 frontend start tgt-blockdev], output , stderr, time="2021-04-12T06:46:19Z" level=fatal msg="Error running frontend start command: failed to start frontend tgt-blockdev for volume 10.42.2.42:10001: rpc error: code = Unknown desc = failed to upgrade frontend: failed to reload socket connection at /dev/longhorn/pvc-9b23038f-5e81-46c1-aaee-8dab6f95c356: exit status 15"
longhorn-manager-tc9jr longhorn-manager , error exit status 1
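
For readers skimming the log, the nested error can be split into its layers to find the innermost cause. The snippet below is only a reading aid: it assumes the Go-style `outer: inner: cause` wrapping seen in the message, and `unwrap` is a hypothetical helper, not part of Longhorn.

```python
# Innermost part of the error message, copied from the log above.
message = (
    "failed to start frontend tgt-blockdev for volume 10.42.2.42:10001: "
    "rpc error: code = Unknown desc = failed to upgrade frontend: "
    "failed to reload socket connection at "
    "/dev/longhorn/pvc-9b23038f-5e81-46c1-aaee-8dab6f95c356: exit status 15"
)


def unwrap(msg):
    """Split a wrapped error ("outer: inner: cause") into its layers.

    Splitting on ": " (colon plus space) keeps "10.42.2.42:10001" intact.
    """
    return [part.strip() for part in msg.split(": ")]


layers = unwrap(message)
for depth, layer in enumerate(layers):
    # Indent each layer one step further than the layer that wraps it.
    print("  " * depth + layer)
```

The last layer printed is the root cause reported by the engine: reloading the socket connection for the block device exited with status 15 during the frontend upgrade.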


Environment:

  • Longhorn version: v1.1.1-rc1
  • Installation method (e.g. Rancher Catalog App/Helm/Kubectl): Helm
  • Kubernetes distro (e.g. RKE/K3s/EKS/OpenShift) and version: k3s v1.20.4+k3s1
    • Number of management nodes in the cluster: 1
    • Number of worker nodes in the cluster: 3
  • Node config
    • OS type and version: SLES SP15
    • CPU per node: 4
    • Memory per node: 8G
    • Disk type(e.g. SSD/NVMe): SSD
    • Network bandwidth between the nodes:
  • Underlying Infrastructure (e.g. on AWS/GCE, EKS/GKE, VMWare/KVM, Baremetal): VMWare
  • Number of Longhorn volumes in the cluster: 1

Additional context
If I swap reproduction steps 4 and 5 (i.e., upgrade the Longhorn engine image first and then expand the volume), everything works.
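
The ordering that works could be enforced by serializing the two operations. The following is a minimal sketch of that idea only; `Volume`, `expand`, and `finish_upgrade` are hypothetical names for illustration and do not reflect longhorn-manager's actual data model or API.

```python
from dataclasses import dataclass


@dataclass
class Volume:
    # Hypothetical stand-in for a volume record; not Longhorn's real schema.
    name: str
    size_gi: int
    current_engine_image: str
    desired_engine_image: str

    def upgrade_in_progress(self) -> bool:
        # An upgrade is pending while the running image differs from the target.
        return self.current_engine_image != self.desired_engine_image


def finish_upgrade(volume: Volume) -> None:
    """Mark the engine image upgrade as complete (illustrative only)."""
    volume.current_engine_image = volume.desired_engine_image


def expand(volume: Volume, new_size_gi: int) -> None:
    """Expand the volume, but only when no engine image upgrade is pending."""
    if volume.upgrade_in_progress():
        raise RuntimeError("engine image upgrade pending; finish it before expanding")
    if new_size_gi <= volume.size_gi:
        raise ValueError("new size must be larger than the current size")
    volume.size_gi = new_size_gi


vol = Volume("pvc-example", 2, "v1.0.2", "v1.1.1-rc1")
try:
    expand(vol, 8)  # rejected: the upgrade has not finished yet
except RuntimeError as err:
    print("rejected:", err)

finish_upgrade(vol)  # upgrade first ...
expand(vol, 8)       # ... then expand, matching the order that works
print(vol.size_gi)
```

A symmetric check on the upgrade path (refusing an engine upgrade while an expansion is running) would cover the order used in the reproduction steps.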

@innobead
Member

innobead commented Apr 13, 2021

cc @PhanLe1010 @khushboo-rancher

@innobead added the severity/3 Function working but has a major issue w/ workaround label Apr 13, 2021
@khushboo-rancher
Contributor

Users are expected to upgrade the Longhorn components and each volume's engine before performing any other operation. In my view, this should be a low severity/priority issue.

@innobead added this to the v1.1.2 milestone Apr 13, 2021
@innobead added the component/longhorn-manager Longhorn manager (control plane) label Apr 13, 2021
@innobead added the area/v1-data-engine v1 data engine (iSCSI tgt) label Apr 28, 2021
@innobead modified the milestones: v1.1.2, v1.2.0 Apr 29, 2021
@innobead modified the milestones: v1.2.0, v1.3.0 Aug 12, 2021
@innobead modified the milestones: v1.3.0, Backlog Oct 20, 2021