Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Invalid backup is getting listed on the backup page #2912

Closed
khushboo-rancher opened this issue Aug 22, 2021 · 8 comments
Closed

[BUG] Invalid backup is getting listed on the backup page #2912

khushboo-rancher opened this issue Aug 22, 2021 · 8 comments
Assignees
Labels
area/volume-backup-restore Volume backup restore kind/bug kind/regression Regression which has worked before priority/2 Nice to fix in this release (managed by PO) severity/3 Function working but has a major issue w/ workaround
Milestone

Comments

@khushboo-rancher
Copy link
Contributor

Describe the bug
The backup which is not completed and interrupted in middle still shows on the backup page.

To Reproduce
Steps to reproduce the behavior:

  1. Create a volume and attach it to a pod.
  2. Write 1 Gi data into it.
  3. Click Create backup.
  4. Interrupt the backup by deleting the responsible replica.
  5. Check the backup store, it lists the backup with 0B and other details as empty.
  6. Try to restore it, it throws error 405

Expected behavior
The invalid backup should not appear in the backup page as v1.1.2

Environment:

  • Longhorn version: v1.2.0-rc1
  • Installation method (e.g. Rancher Catalog App/Helm/Kubectl): Kubectl
@khushboo-rancher khushboo-rancher added kind/bug kind/regression Regression which has worked before severity/3 Function working but has a major issue w/ workaround labels Aug 22, 2021
@innobead innobead added the priority/2 Nice to fix in this release (managed by PO) label Aug 23, 2021
@innobead innobead added this to the v1.2.0 milestone Aug 23, 2021
@jenting
Copy link
Contributor

jenting commented Aug 23, 2021

Could you please describe how to perform this?

Interrupt the backup by deleting the responsible replica.

BTW, after longhorn backup refactoring, if the backup config metadata be corrupted after backup creation, longhorn would not detect the backup config corrupted because:

  • the backup config metadata should not be overwritten again by longhorn once the backup completion, so usually it's corrupted by the user or the external file system corrupted
  • if we detect all the backups, it would cause the cost of S3 API to increase because longhorn have to read all the backups on each poll interval time
  • even we implement to check the backup config metadata corrupted, we did not check the data of a backup is being corrupted or not

So, it's better to verify the corrupted backup config metadata and its data when the user performs restoring from the backup volume.

@jenting jenting added the area/volume-backup-restore Volume backup restore label Aug 23, 2021
@khushboo-rancher
Copy link
Contributor Author

Could you please describe how to perform this?
Interrupt the backup by deleting the responsible replica.

You can hover mouse to the backup in progress icon and it will show the replica responsible for the backup, then you can delete that replica to interrupt the backup.

So, it's better to verify the corrupted backup config metadata and its data when the user performs restoring from the backup volume.

If the cost of checking this high, can we at least indicate that the backup is corrupted and can't be restored? As of now, all the options (restore, delete, etc) for this corrupted backup are available don't work.
cc: @innobead @yasker
Screen Shot 2021-08-23 at 10 40 02 AM

@yasker
Copy link
Member

yasker commented Aug 23, 2021

@khushboo-rancher it's a nice-to-have enhancement. Since we already know the backup size is 0 with no valid snapshot name/volume, it doesn't make sense to indicate it as restorable in the UI. And it should be deletable still.

@khushboo-rancher
Copy link
Contributor Author

Okay, makes sense. But delete is not working.

@jenting
Copy link
Contributor

jenting commented Aug 24, 2021

Okay, makes sense. But delete is not working.

okay, I'll check why delete is not working, thank you.

@jenting
Copy link
Contributor

jenting commented Aug 25, 2021

Because the Volume name is empty, and backupDelete endpoint needs volume name as the parameter.
So the request from GUI is without a volume name.

@smallteeths could you please get the volume name from
image
instead of from here?
image

Also, I'll create a new Backup state Ready. Could you please make the Operation buttons Restore and Get URL clickable only when the Backup state is Ready?
image

@longhorn-io-github-bot
Copy link

longhorn-io-github-bot commented Aug 25, 2021

Pre Ready-For-Testing Checklist

  • Where is the reproduce steps/test steps documented?
    The reproduce steps/test steps are at:

  • Is there a workaround for the issue? If so, where is it documented?
    The workaround is at:

  • Does the PR include the explanation for the fix or the feature?

  • Does the PR include deployment change (YAML/Chart)? If so, where are the PRs for both YAML file and Chart?
    The PR for the YAML change is at: Add a new backup state ready longhorn-manager#1005
    The PR for the chart change is at: Show backup state #2928

  • Have the backend code been merged (Manager, Engine, Instance Manager, BackupStore etc) (including backport-needed/*)?
    The PR is at Add a new backup state ready longhorn-manager#1005

  • Which areas/issues this PR might have potential impacts on?
    Area Backup store
    Issues Invalid page can't be deleted

  • If labeled: require/LEP Has the Longhorn Enhancement Proposal PR submitted?
    The LEP PR is at

  • If labeled: area/ui Has the UI issue filed or ready to be merged (including backport-needed/*)?
    The UI issue/PR is at fix(delete backup) Fixes the parameters for passing volumeName longhorn-ui#430

  • If labeled: require/doc Has the necessary document PR submitted or merged (including backport-needed/*)?
    The documentation issue/PR is at

  • If labeled: require/automation-e2e Has the end-to-end test plan been merged? Have QAs agreed on the automation test case? If only test case skeleton w/o implementation, have you created an implementation issue (including backport-needed/*)
    The automation skeleton PR is at
    The automation test case PR is at
    The issue of automation test case implementation is at (please create by the template)

  • If labeled: require/automation-engine Has the engine integration test been merged (including backport-needed/*)?
    The engine automation PR is at

  • If labeled: require/manual-test-plan Has the manual test plan been documented?
    The updated manual test plan is at

  • If the fix introduces the code for backward compatibility Has a separate issue been filed with the label release/obsolete-compatibility?
    The compatibility issue is filed at

@khushboo-rancher
Copy link
Contributor Author

Verified with Longhorn v1.2.0-rc2

Validation - Pass

The invalid backups can be deleted using Longhorn UI.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/volume-backup-restore Volume backup restore kind/bug kind/regression Regression which has worked before priority/2 Nice to fix in this release (managed by PO) severity/3 Function working but has a major issue w/ workaround
Projects
None yet
Development

No branches or pull requests

6 participants