Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEATURE] Snapshot CRD #3144

Closed
HubbeKing opened this issue Oct 12, 2021 · 7 comments
Closed

[FEATURE] Snapshot CRD #3144

HubbeKing opened this issue Oct 12, 2021 · 7 comments
Assignees
Labels
area/api Longhorn manager public API highlight Important feature/issue to highlight kind/feature Feature request, new feature priority/0 Must be implement or fixed in this release (managed by PO) require/auto-e2e-test Require adding/updating auto e2e test cases if they can be automated require/lep Require adding/updating enhancement proposal require/manual-test-plan Require adding/updating manual test cases if they can't be automated
Milestone

Comments

@HubbeKing
Copy link

Is your feature request related to a problem? Please describe.
Snapshot RecurringJobs set to Retain X number of snapshots do not touch unrelated snapshots, so if one ever changes the name of the RecurringJob, the old snapshots will stick around forever. These then have to be manually deleted in the UI.
Having a CRD for snapshots would greatly simplify this, as one could prune snapshots using kubectl - much like how one can currently manage backups using kubectl due to the existance of the backups.longhorn.io CRD.

Describe the solution you'd like
A CRD for Snapshots. Something like snapshots.longhorn.io, and code in longhorn-manager to ensure that there exists a CRD object for each snapshot for each volume, and that they are in sync.

Describe alternatives you've considered
I suppose a browser automation framework might also work for pruning large numbers of snapshots, but this feels janky as all hell.

Additional context
Screenshot of v1.2.2 UI showing leftover snapshots from a deleted recurringjob, and a single snapshot from a new RecurringJob with retain set to 4:
https://i.imgur.com/SnlP23P.png

@HubbeKing HubbeKing added the kind/feature Feature request, new feature label Oct 12, 2021
@jenting
Copy link
Contributor

jenting commented Oct 15, 2021

I remember we discussed internally that Longhorn needs the snapshot CR. cc @joshimoo @shuo-wu

@jenting jenting modified the milestones: Planning, v1.3.0 Oct 15, 2021
@innobead innobead added priority/0 Must be implement or fixed in this release (managed by PO) area/api Longhorn manager public API highlight Important feature/issue to highlight labels Oct 15, 2021
@joshimoo
Copy link
Contributor

joshimoo commented Oct 21, 2021

Hey team! Please add your planning poker estimate with ZenHub @jenting @shuo-wu @PhanLe1010

20 points seems to be the max for the planning poker, but this requires quite a bit of work. (Engine, Replica, Manager modifications + refactoring along the way)

@PhanLe1010
Copy link
Contributor

PhanLe1010 commented May 13, 2022

Test Plan:

Innit

  1. Create volume: vol1, vol2
  2. Attach and write some data to it

Create

  1. Create new snapshot CR, volume attached

    1. Deloy the yaml
      apiVersion: longhorn.io/v1beta2
      kind: Snapshot
      metadata:
        name: snap001
        namespace: longhorn-system
      spec:
        volume: vol1
        createSnapshot: true    
      
    2. Verify that Longhorn provison a new snapshot snap001 inside Longhorn UI
    3. Run kubectl get snapshots -n longhorn-system snap001. Verify the status field of the snapshot CR is good (all fields make sense)
    4. Run kubectl describe snapshots -n longhorn-system snap001. Verify that the related events sequence is good
  2. Create new snapshot CR, volume detached

    1. Detach vol1
    2. Deloy the yaml
      apiVersion: longhorn.io/v1beta2
      kind: Snapshot
      metadata:
        name: snap002
        namespace: longhorn-system
      spec:
        volume: vol1
        createSnapshot: true    
      
    3. Run kubectl get snapshots -n longhorn-system snap002. Verify the status field contains the error message cannot take snapshot because the volume engine vol1-e-cbc27184 is not running
    4. Run kubectl describe snapshots -n longhorn-system snap002. Verify that the related events cotains the same error
    5. Attach vol1
    6. Verify the the snapshot is created inside Longhorn UI and the snapshot CR status looks good
  3. Create snapshot using UI

    1. Create a snapshot using Longhorn UI
    2. Verify the the snapshot CR is created with the correct status and related events
  4. Create snapshot using recurring job

    1. Setting up a snapshot recurring job with retain count 5 and period very minute
    2. Verify that recurring job is functioning correctly
    3. Delete the recurring job
  5. Create snapshot CR with duplicated name

    1. Deploying the following yaml
      apiVersion: longhorn.io/v1beta2
      kind: Snapshot
      metadata:
        name: snap002
        namespace: longhorn-system
      spec:
        volume: vol1
        createSnapshot: true    
      
    2. Verify that nothing change
    3. Deploying the following yaml
      apiVersion: longhorn.io/v1beta2
      kind: Snapshot
      metadata:
        name: snap002
        namespace: longhorn-system
      spec:
        volume: vol2
        createSnapshot: true    
      
    4. Verify that we cannot deploy the yaml because spec.volume is immutable
  6. Create snapshot CR for the next step

    1. Deploying the following yaml
      apiVersion: longhorn.io/v1beta2
      kind: Snapshot
      metadata:
        name: snap003
        namespace: longhorn-system
      spec:
        volume: vol1
        createSnapshot: true    
      

Delete

  1. Delete snapshot CR, volume attached

    1. Delete the snap001 CR
    2. Verify the snap001 CR is deleted and the corresponding snapshot is deleted in Longhorn UI
  2. Delete snapshot CR, volume detached

    1. Detach vol1
    2. Delete snap003 CR
    3. Verify the the snap003 CR is stuck in deleting because the volume is not attached
    4. Verify the related events contains cannot delete snapshot because the volume engine vol1-e-cbc27184 is not running by doing kubectl describe snapshots -n longhorn-system snap003
  3. Delete snapshot CR that is next to volume-head

    1. Attach vol1
    2. Verify the snap003 is still stuck in deleting because it is the snapshot next to volume-head thus cannot be removed
    3. Create a new snap004 by applying
      apiVersion: longhorn.io/v1beta2
      kind: Snapshot
      metadata:
        name: snap004
        namespace: longhorn-system
      spec:
        volume: vol1
        createSnapshot: true    
      
    4. Verify the snap004 is created and snap003 is deleted in both kubectl and Longhorn UI
  4. Delete snapshot using UI

    1. Create a new snapshot using Longhorn UI, assume that its name is abc
    2. Delete snap004 using Longhorn UI
    3. Verify the snap004 are removed in both kubectl and UI
  5. Delete snapshot that is next to volume-head using Longhorn UI

    1. delete snapshot abc
    2. verify that the snapshot abc is stuck in deleting
    3. Verify that snapshot abc CR status is correct (e.g status.markRemoved:true and status.readyToUse:false)

Others

  1. Deleting attached volume with existing snapshot CRs
    1. Create a few more snapshots for vol1
    2. Verify that corressponding snapshot CRs are creatred
    3. Delete vol1
    4. Verify that vol1is deleted together with all of its snapshots
  2. Deleting detached volume with existing snapshot CRs
    1. Attach vol2
    2. Create a few more snapshots for vol2
    3. Verify that corressponding snapshot CRs are creatred
    4. Detach vol2
    5. Delete vol2
    6. Verify that vol2is deleted together with all of its snapshots
  3. Uninstall Longhorn with existing snapshot CRs
    1. Longhorn manager pods are still runing
      1. Create a few volumes, attach them
      2. Create snapshot for them
      3. Detach some of them
      4. Unistall Longhorn
      5. Verify that the uninstallation succeed
    2. Longhorn manager pods are death
      1. Create a few volumes, attach them
      2. Create snapshot for them
      3. Detach some of them
      4. Delete Longhorn manager daemonset
      5. Unistall Longhorn
      6. Verify that the uninstallation succeed

@innobead innobead added the require/auto-e2e-test Require adding/updating auto e2e test cases if they can be automated label May 13, 2022
@innobead innobead added the require/doc Require updating the longhorn.io documentation label May 17, 2022
@longhorn-io-github-bot
Copy link

longhorn-io-github-bot commented May 18, 2022

Pre Ready-For-Testing Checklist

  • Where is the reproduce steps/test steps documented?
    The reproduce steps/test steps are at:

  • Is there a workaround for the issue? If so, where is it documented?
    The workaround is at:

  • Does the PR include the explanation for the fix or the feature?

  • Does the PR include deployment change (YAML/Chart)? If so, where are the PRs for both YAML file and Chart?
    The PR for the YAML change is at:
    The PR for the chart change is at: Update charts for Longhorn snapshot CRDs #3991

  • Have the backend code been merged (Manager, Engine, Instance Manager, BackupStore etc) (including backport-needed/*)?
    The PR is at

  • Support Longhorn snapshot CRD longhorn-manager#1304

  • Keep the owner check inside the isResponsibleFor in the worker goroutine only longhorn-manager#1334

  • Which areas/issues this PR might have potential impacts on?
    Area Snapshot/volume attachment/detachment
    Issues

  • If labeled: require/LEP Has the Longhorn Enhancement Proposal PR submitted?
    The LEP PR is at [outdated] Add LEP for Longhorn snapshot CRD feature #3876

  • If labeled: area/ui Has the UI issue filed or ready to be merged (including backport-needed/*)?
    The UI issue/PR is at

  • If labeled: require/doc Has the necessary document PR submitted or merged (including backport-needed/*)?
    The documentation issue/PR is at

  • If labeled: require/automation-e2e Has the end-to-end test plan been merged? Have QAs agreed on the automation test case? If only test case skeleton w/o implementation, have you created an implementation issue (including backport-needed/*)
    The automation skeleton PR is at
    The automation test case PR is at
    The issue of automation test case implementation is at (please create by the template)

  • If labeled: require/automation-engine Has the engine integration test been merged (including backport-needed/*)?
    The engine automation PR is at

  • If labeled: require/manual-test-plan Has the manual test plan been documented?
    The updated manual test plan is at [FEATURE] Snapshot CRD #3144 (comment)

  • If the fix introduces the code for backward compatibility Has a separate issue been filed with the label release/obsolete-compatibility?
    The compatibility issue is filed at

@PhanLe1010 PhanLe1010 added require/manual-test-plan Require adding/updating manual test cases if they can't be automated and removed require/doc Require updating the longhorn.io documentation labels May 18, 2022
@chriscchien
Copy link
Contributor

chriscchien commented May 18, 2022

HI @PhanLe1010 ,

I have quick questions about test plan steps Create -> 2 -> iii , is the snapshot name snap001 typo in command kubectl get snapshots -n longhorn-system snap001 ? and

In step Delete -> 3 -> ii, snapshot snap002 not next to volume-head because previously we tested recurring jobs with 5 retains, so there should be recurring job snapshots after to snap002, could you change the test steps let steps more smoothly? thank you

@PhanLe1010
Copy link
Contributor

steps Create -> 2 -> iii , is the snapshot name snap001 typo in command kubectl get snapshots -n longhorn-system snap001 ? and

Yeah, it is a typo. Fixed it. Thank you

In step Delete -> 3 -> ii, snapshot snap002 not next to volume-head because previously we tested recurring jobs with 5 retains, so there should be recurring job snapshots after to snap002, could you change the test steps let steps more smoothly? thank you

Done. Thank you

@chriscchien
Copy link
Contributor

chriscchien commented May 19, 2022

Validate on master-head 20220519
Result Pass

Follow test steps, below snapshot behavior in different scenarios all working as expected

Create

  • Create new snapshot CR, volume attached
  • Create new snapshot CR, volume detached
  • Create snapshot using UI
  • Create snapshot using recurring job
  • Create snapshot CR with duplicated name

Delete

  • Delete snapshot CR, volume attached
  • Delete snapshot CR, volume detached
  • Delete snapshot CR that is next to volume-head

Others

  • Deleting attached volume with existing snapshot CRs
  • Deleting detached volume with existing snapshot CRs
  • Uninstall Longhorn with existing snapshot CRs

@innobead innobead changed the title [FEATURE] CRD for snapshots [FEATURE] Snapshot CRD Jun 15, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/api Longhorn manager public API highlight Important feature/issue to highlight kind/feature Feature request, new feature priority/0 Must be implement or fixed in this release (managed by PO) require/auto-e2e-test Require adding/updating auto e2e test cases if they can be automated require/lep Require adding/updating enhancement proposal require/manual-test-plan Require adding/updating manual test cases if they can't be automated
Projects
Archived in project
Status: Closed
Development

No branches or pull requests

7 participants