[FEATURE] New setting to limit the concurrent volume restoring from backup #4558

Closed
c3y1huang opened this issue Sep 14, 2022 · 4 comments
Labels
area/recurring-job Longhorn recurring job related area/system-backup-restore Longhorn system backup restore area/volume-backup-restore Volume backup restore component/longhorn-manager Longhorn manager (control plane) kind/feature Feature request, new feature priority/0 Must be fixed in this release (managed by PO) require/auto-e2e-test Require adding/updating auto e2e test cases if they can be automated require/doc Require updating the longhorn.io documentation require/lep Require adding/updating enhancement proposal

Comments

@c3y1huang
Contributor

Is your feature request related to a problem? Please describe

We would like to support rolling out volumes with data from a backup or backing image. However, Longhorn currently places no limit on the number of volumes that can restore from backup concurrently.

Longhorn should have another setting similar to concurrent-replica-rebuild-per-node-limit.

Describe the solution you'd like

Introduce a new setting to limit the number of volumes that can be created from backup concurrently.
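
As a rough illustration of what a per-node limit on concurrent restores could look like, here is a minimal sketch. It is not Longhorn's implementation: the function name, the argument shapes, and the rule that a limit of 0 lets nothing start are all assumptions made for the example.

```python
from collections import Counter


def pick_restores_to_start(pending, restoring_per_node, limit):
    """Return the pending (volume, node) pairs allowed to start restoring.

    pending: list of (volume_name, node_name) pairs waiting to restore.
    restoring_per_node: dict of node_name -> restores already in progress.
    limit: maximum concurrent restores per node (assumed: 0 lets nothing start).
    All names here are hypothetical, chosen only for this sketch.
    """
    in_flight = Counter(restoring_per_node)
    allowed = []
    for volume, node in pending:
        # Admit a restore only while its node is still below the limit.
        if in_flight[node] < limit:
            in_flight[node] += 1
            allowed.append((volume, node))
    return allowed


pending = [("vol-1", "node-a"), ("vol-2", "node-a"), ("vol-3", "node-b")]
started = pick_restores_to_start(pending, {"node-a": 0, "node-b": 0}, limit=1)
print(started)  # [('vol-1', 'node-a'), ('vol-3', 'node-b')]
```

With a limit of 1, the second volume targeting node-a has to wait until the first one finishes, which is the behavior the request is after.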

Describe alternatives you've considered

Handle this in the rollout-load controller.

Additional context

#4388 (comment)

@longhorn-io-github-bot

longhorn-io-github-bot commented Dec 5, 2022

Pre Ready-For-Testing Checklist

@innobead changed the title from "[FEATURE] New setting to limit the concurrent volume restoring from backup." to "[FEATURE] New setting to limit the concurrent volume restoring from backup" on Dec 7, 2022
@innobead added the area/system-backup-restore and require/doc labels on Dec 7, 2022
@chriscchien
Contributor

Hi @c3y1huang

I have a 3-node cluster with 8 backups from different volumes, and Concurrent Volume Backup Restore Per Node Limit is set to 1.
After I select all backups on the Backup page and click Restore Latest Backup,
the number of concurrent restores somehow becomes greater than 3, as shown in the screenshot below.

Screenshot from 2022-12-09 17-03-15

In addition, I wrote a simple test script that counts the restoring volumes through the backend API. During the restore it also reports a number greater than 3, which matches what the UI displays.

import time


def test_count_restore_limit(client):
    # Poll the backend API once per second and print how many volumes
    # are mid-restore (progress strictly between 0 and 100).
    for _ in range(1000):
        restoring = 0
        try:
            volumes = client.list_volume()
            for v in volumes:
                if v.restoreStatus and v.restoreStatus[0].progress not in (0, 100):
                    restoring += 1
        except Exception:
            # A failed API call should not abort the polling loop.
            print("x")

        print(restoring)
        time.sleep(1)
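
The script above counts restoring volumes across the whole cluster. Because the setting is a per-node limit, grouping the count by node may make an overrun easier to spot. The sketch below works on plain dictionaries with a hypothetical `node` key; that shape is an assumption for illustration and is not the format returned by the Longhorn API.

```python
from collections import defaultdict


def count_restoring_per_node(volumes):
    """Count volumes that are mid-restore, grouped by node.

    volumes: list of dicts shaped like {"node": str, "progress": int}.
    This shape is hypothetical, not the Longhorn API response format.
    """
    counts = defaultdict(int)
    for v in volumes:
        # Same rule as the script above: 0 and 100 do not count as restoring.
        if 0 < v["progress"] < 100:
            counts[v["node"]] += 1
    return dict(counts)


sample = [
    {"node": "node-a", "progress": 40},
    {"node": "node-a", "progress": 100},
    {"node": "node-b", "progress": 10},
    {"node": "node-b", "progress": 55},
]
print(count_restoring_per_node(sample))  # {'node-a': 1, 'node-b': 2}
```

In this sample, node-b would be over a limit of 1 even though the cluster-wide total of 3 looks acceptable for three nodes.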

@chriscchien
Contributor

chriscchien commented Dec 12, 2022

I restored volumes with 1 replica several times, with Concurrent Volume Backup Restore Per Node Limit set to 1.

At the beginning, the number of restoring volumes was 3, which matched the setting (1 per node across 3 nodes). However, the number of restoring volumes can still exceed the Concurrent Volume Backup Restore Per Node Limit value. It does not happen every time, but it is not hard to reproduce (it reproduces every time when restoring volumes with 3 replicas).

Screenshot_20221212_084949

In addition, the last successful build of longhorn-engine was 6 days ago; I am not sure whether this has an impact.

Screenshot_20221212_085348

@chriscchien
Contributor

Verified in longhorn master aa3998 with steps
Result: Pass

  • Environment:
    • 3-node cluster
    • 8 backups from different volumes, each 2Gi in size
  1. The setting does not affect DR volumes: whether or not the setting is 0, all DR volumes restored at the same time.
  2. Set concurrent-volume-backup-restore-per-node-limit to 0, then restore all volumes: the volumes attach to nodes in maintenance mode and restoring never starts.
  3. Set concurrent-volume-backup-restore-per-node-limit to 1: volumes start restoring, and the number of restoring volumes on each node does not exceed the setting.
  4. Also tested concurrent-volume-backup-restore-per-node-limit = 2, restoring volumes with 1 and 3 replicas: all worked well, and the number of restoring volumes per node did not exceed the setting.
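
The expectations in the steps above can be written down as a small check. This is only a sketch under stated assumptions: the helper names are hypothetical, and it assumes each restoring volume counts against exactly one node, so the cluster-wide ceiling is the node count times the per-node limit.

```python
def cluster_restore_ceiling(node_count, per_node_limit):
    """Upper bound on concurrent restores cluster-wide (assumed model)."""
    return node_count * per_node_limit


def nodes_over_limit(restoring_per_node, per_node_limit):
    """Return the sorted names of nodes whose restore count exceeds the limit."""
    return sorted(
        node
        for node, count in restoring_per_node.items()
        if count > per_node_limit
    )


# 3 nodes, as in the environment above.
print(cluster_restore_ceiling(3, 0))  # 0: no restore should start
print(cluster_restore_ceiling(3, 1))  # 3
print(cluster_restore_ceiling(3, 2))  # 6

# Hypothetical observation with one node over a limit of 1.
print(nodes_over_limit({"node-a": 1, "node-b": 2, "node-c": 0}, 1))  # ['node-b']
```

An empty list from `nodes_over_limit` corresponds to the passing result reported for limits 1 and 2.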


4 participants