Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ENHANCEMENT] Take advantage of the attach/detach mechanism in Longhorn v1.5+ #4907

Closed
ejweber opened this issue Dec 22, 2023 · 5 comments
Closed
Assignees
Labels
area/longhorn issues depends on the upstream longhorn backport-needed/1.2.3 kind/enhancement Issues that improve or augment existing functionality priority/0 Must be fixed in this release
Milestone

Comments

@ejweber
Copy link

ejweber commented Dec 22, 2023

While looking into a few Harvester issues, I noticed #3670 and #3648. It appears from these (though I need to collect more context) that Harvester manually attached Longhorn volumes to nodes in order to ensure snapshots/backups can be taken.

This behavior makes sense for Longhorn <= v1.4.x. In those versions, Longhorn did not have any mechanism to attach the volume automatically to the node and perform the operation, so these operations were generally stuck until a volume was attached. However, in v1.5.x, Longhorn added a new attach/detach controller that automatically attaches volumes for these operations and then detaches them. See https://github.com/longhorn/longhorn/blob/master/enhancements/20221024-longhorn-volumeattachment.md and longhorn/longhorn#3726.

I don't know the full extent of Harvester's need to manipulate Longhorn volumes, but I think it may be possible to remove or reduce this functionality in Harvester. It may even be safer to do so to avoid situations in which Harvester's attach/detach logic fights Longhorn's.

Copying @PhanLe1010 and @james-munson, who I was discussing this with.

@ejweber ejweber added the kind/enhancement Issues that improve or augment existing functionality label Dec 22, 2023
@innobead innobead added priority/0 Must be fixed in this release area/longhorn issues depends on the upstream longhorn labels Dec 23, 2023
@bk201 bk201 added this to the v1.3.0 milestone Dec 25, 2023
@bk201
Copy link
Member

bk201 commented Dec 25, 2023

Thanks @ejweber for the heads-up. We'll evaluate the change.

@harvesterhci-io-github-bot
Copy link

harvesterhci-io-github-bot commented Dec 25, 2023

Pre Ready-For-Testing Checklist

  • If labeled: require/HEP Has the Harvester Enhancement Proposal PR submitted?

  • Where is the reproduce steps/test steps documented?
    The reproduce steps/test steps are at: refactor: remove mount logic in harvester #4910

  • Is there a workaround for the issue? If so, where is it documented?

  • Have the backend code been merged (harvester, harvester-installer, etc) (including backport-needed/*)?
    The PR is at: refactor: remove mount logic in harvester #4910

  • If labeled: area/ui Has the UI issue filed or ready to be merged?

  • If labeled: require/doc, require/knowledge-base Has the necessary document PR submitted or merged?

  • If NOT labeled: not-require/test-plan Has the e2e test plan been merged? Have QAs agreed on the automation test case? If only test case skeleton w/o implementation, have you created an implementation issue?

    • The automation skeleton PR is at:
    • The automation test case PR is at:
  • If the fix introduces the code for backward compatibility Has a separate issue been filed with the label release/obsolete-compatibility?
    The compatibility issue is filed at:

@harvesterhci-io-github-bot

Automation e2e test issue: harvester/tests#1027

@TachunLin
Copy link

TachunLin commented Jan 24, 2024

Verified fixed on master-6ca6acfd-head (24/01/23). Close this issue.

Result

$\color{green}{\textsf{PASS}}$ Case 1: snapshot can work on a stopped VM $~~$ * VM volumes are detached after we stop VM.
vokoscreenNG-2024-01-24_16-17-36.mp4
node1:~ # kubectl get volume -A
NAMESPACE         NAME                                       STATE      ROBUSTNESS   SCHEDULED   SIZE          NODE   AGE
longhorn-system   pvc-5a861225-920d-4059-b501-f02b2fd0ff27   detached   unknown                  10737418240          19m
  • Take the vm snapshot, The snapshot of vm in off state can be ready.
    image
$\color{green}{\textsf{PASS}}$ Case 2: restore a snapshot from detached volumes can work. $~~$
  • Restore the snapshot to a new VM. The new VM can be ready.
vokoscreenNG-2024-01-24_16-27-23.mp4
  • Restore the snapshot to replace the old VM. The old VM can be ready.
vokoscreenNG-2024-01-24_16-28-56.mp4
$\color{green}{\textsf{PASS}}$ Case 3: backup can work on a stopped VM. $~~$
  • VM volumes are detached after we stop VM.

    NAMESPACE         NAME                                       STATE      ROBUSTNESS   SCHEDULED   SIZE          NODE   AGE
    longhorn-system   pvc-d1226d97-ab90-4d40-92f9-960b668093c2   detached   unknown                  10737418240          5m12s
    
  • Take the vm backup, The backup of vm in off state can be ready.
    image

$\color{green}{\textsf{PASS}}$ Case 4: race condition doesn't break VMBackup $~~$
  • Take multiple backup on the VM in a short time. All backup can be ready.
vokoscreenNG-2024-01-24_16-50-13.mp4

image

$\color{green}{\textsf{PASS}}$ Case 5: restore a backup from detached volumes can work $~~$
  • Restore the backup to a new VM. The new VM can be ready.
vokoscreenNG-2024-01-24_17-04-17.mp4
  • Restore the backup to replace the old VM. (Retain Volume), The old VM can be ready.
vokoscreenNG-2024-01-24_17-06-41.mp4
  • Restore the backup to replace the old VM. (Delete Volume), The old VM can be ready.
vokoscreenNG-2024-01-24_17-08-05.mp4

Test Information

  • Test Environment: 3 nodes harvester on baremetal machine
  • Harvester version: master-6ca6acfd-head (24/01/23).

Verify Steps

Set the NFS backup target

Case 1: snapshot can work on a stopped VM.
  1. Create a VM.
  2. After the VM is ready, stop the VM.
  3. Check VM volumes are detached.
  4. Take a snapshot on the VM. The snapshot can be ready.
Case 2: restore a snapshot from detached volumes can work.
  1. Follow Case 1.
  2. Make sure VM volumes are detached.
  3. Restore the snapshot to a new VM. The new VM can be ready.
  4. Restore the snapshot to replace the old VM. The old VM can be ready.
Case 3: backup can work on a stopped VM
  1. Create a VM.
  2. After the VM is ready, stop the VM.
  3. Check VM volumes are detached.
  4. Take a backup on the VM. The backup can be ready.
Case 4: race condition doesn't break VMBackup
  1. Create a VM.
  2. After the VM is ready, stop the VM.
  3. Check VM volumes are detached.
  4. Take multiple backup on the VM in a short time. All backup can be ready.
Case 5: restore a backup from detached volumes can work
  1. Follow Case 3.
  2. Make sure VM volumes are detached.
  3. Restore the backup to a new VM. The new VM can be ready.
  4. Restore the backup to replace the old VM. The old VM can be ready.

@harvesterhci-io-github-bot

added backport-needed/1.2.3 issue: #5853.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/longhorn issues depends on the upstream longhorn backport-needed/1.2.3 kind/enhancement Issues that improve or augment existing functionality priority/0 Must be fixed in this release
Projects
None yet
Development

No branches or pull requests

6 participants