
[FEATURE] Replica scheduling with multiple factors #5149

Open
Vicente-Cheng opened this issue Dec 27, 2022 · 15 comments
Labels
  • area/volume-replica-scheduling: Volume replica scheduling related
  • component/longhorn-manager: Longhorn manager (control plane)
  • highlight: Important feature/issue to highlight
  • kind/feature: Feature request, new feature
  • priority/0: Must be fixed in this release (managed by PO)
  • require/lep: Require adding/updating enhancement proposal
  • require/manual-test-plan: Require adding/updating manual test cases if they can't be automated

@Vicente-Cheng

Vicente-Cheng commented Dec 27, 2022

Is your improvement request related to a feature? Please describe (👍 if you like this request)

The replica scheduler could take more factors into account, such as the replica count and the scheduled size on each disk.
Because of the thin-provisioning design, replicas may keep landing on the same disk while the other volumes are not yet in use, since disk usage is only counted once space is actually allocated.
Considering more factors during replica scheduling would improve data balance, performance, and placement.
It would also benefit a sharding implementation. (If sharding is implemented, balancing all disks across the nodes becomes important.)

Right now, replica placement is driven by existing implicit rules rather than explicit, tunable strategies, so users have no way to influence scheduling beyond node tags, node/disk scheduling settings, etc.

There is some discussion at longhorn/longhorn-manager#1397 (review), so it would be good to evaluate which strategies are worth designing.

Describe the solution you'd like

TBD with LEP

Provide different replica scheduling strategies based on conditions like the number of replicas, available disk space, etc.

Describe alternatives you've considered


Additional context

An example of the current behavior:
Nodes A, B, and C each have two disks: disk A (200 GiB) and disk B (150 GiB).

Replicas keep being scheduled to disk A until the total scheduled size exceeds the disk size multiplied by the over-commit ratio, instead of being balanced across the disks.
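
As a rough illustration of the arithmetic (hypothetical numbers, assuming a 200% storage over-provisioning setting):

# Hypothetical numbers: disk A is 200 GiB with 310 GiB of replicas already scheduled,
# disk B is 150 GiB and still empty, and over-provisioning is set to 200%.
DISK_A_SIZE=200; DISK_B_SIZE=150; OVER_PROVISIONING_PERCENT=200
SCHEDULED_A=310; SCHEDULED_B=0
A_LEFT=$(( DISK_A_SIZE * OVER_PROVISIONING_PERCENT / 100 - SCHEDULED_A ))   # 90 GiB still "schedulable"
B_LEFT=$(( DISK_B_SIZE * OVER_PROVISIONING_PERCENT / 100 - SCHEDULED_B ))   # 300 GiB untouched
echo "schedulable: disk A ${A_LEFT} GiB, disk B ${B_LEFT} GiB"
# A new replica can still land on disk A as long as A_LEFT > 0, so disk A keeps filling
# up; a multi-factor scheduler could also weigh the scheduled percentage and the replica
# count per disk so that disk B is preferred long before disk A saturates.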

@Vicente-Cheng Vicente-Cheng added the kind/improvement Request for improvement of existing function label Dec 27, 2022
@innobead innobead added component/longhorn-manager Longhorn manager (control plane) area/volume-replica-scheduling Volume replica scheduling related labels Dec 27, 2022
@innobead innobead added this to the v1.5.0 milestone Dec 27, 2022
@innobead innobead added the require/lep Require adding/updating enhancement proposal label Dec 27, 2022
@innobead
Member

cc @longhorn/dev

@derekbit
Member

derekbit commented Dec 27, 2022

Ref: [FEATURE] Different replica scheduling strategies

@innobead
Member

Ref: [FEATURE] Different replica scheduling strategies

Let's consolidate both issues, and use this one instead.

@innobead innobead changed the title [IMPROVEMENT] The replica scheduler may consider more factors [FEATURE] The replica scheduler may consider more factors Dec 27, 2022
@innobead innobead added kind/feature Feature request, new feature priority/0 Must be fixed in this release (managed by PO) and removed kind/improvement Request for improvement of existing function labels Dec 27, 2022

@innobead innobead added the highlight Important feature/issue to highlight label Jan 9, 2023
@innobead innobead assigned Vicente-Cheng and unassigned shuo-wu Jan 13, 2023
@innobead
Member

innobead commented Feb 6, 2023

We could consider node/disk anti-affinity or zone affinity for replica scheduling.

@Vicente-Cheng
Author

We could consider node/disk anti-affinity or zone affinity for replica scheduling.

Do you mean we could group nodes/disks for replica scheduling (like a whitelist), or use a blacklist for node/disk anti-affinity?

@joshimoo
Contributor

Consider addressing the following as part of the replica scheduler improvement:
#4826

@iosifnicolae2

iosifnicolae2 commented May 23, 2023

Schedule replicas on different disks on the same node

It would be great to have two soft anti-affinity options:

  • Replica Node Level Soft Anti-Affinity - to schedule replicas on different nodes
  • Replica Disk Level Soft Anti-Affinity - to schedule replicas on different disks

For example, on a single-node Kubernetes cluster where HA is not required but data loss is a concern, we would disable the two anti-affinity options above and effectively get a software RAID based on Longhorn.

  • the advantage of using Longhorn to implement the software RAID is that we can add a few more nodes and, in a few clicks, get an HA storage layer

Slack discussion: https://rancher-users.slack.com/archives/CC2UQM49Y/p1684827748167229
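
For reference, a rough sketch of how such toggles could look from kubectl. It assumes Longhorn's existing replica-soft-anti-affinity setting for the node level and that the Setting CR stores its value in a top-level value field; the disk-level setting name is purely hypothetical here and only illustrates the requested option:

# Node-level soft anti-affinity exists today as a Longhorn setting; "true" allows
# replicas of the same volume to be scheduled to the same node.
kubectl -n longhorn-system get settings.longhorn.io replica-soft-anti-affinity
kubectl -n longhorn-system patch settings.longhorn.io replica-soft-anti-affinity \
  --type merge -p '{"value": "true"}'

# Hypothetical disk-level counterpart (not an existing setting in this sketch); it
# would control whether those replicas may also share a single disk on that node.
kubectl -n longhorn-system patch settings.longhorn.io replica-disk-soft-anti-affinity \
  --type merge -p '{"value": "false"}'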

@iosifnicolae2

Configure volume replicas using Persistent Volume Claim

It would be extremely helpful to be able to configure the number of replicas using Persistent Volume Claim labels.

Currently, we have multiple StorageClasses with default replica counts, but when we want to change the number of replicas programmatically, we have to do it manually from the UI, which is not convenient to automate.

@ejweber
Contributor

ejweber commented May 24, 2023

It would be extremely helpful to be able to configure the number of replicas using Persistent Volume Claim labels.

Currently, we have multiple StorageClasses with default replica counts, but when we want to change the number of replicas programmatically, we have to do it manually from the UI, which is not convenient to automate.

You might consider modifying the spec of the Longhorn volume as an easily programmable way of changing the replica count. For example:

eweber@laptop:~/longhorn-manager> kubectl -n longhorn-system get volume -o jsonpath='{.spec.numberOfReplicas}{"\n"}' test
2
eweber@laptop:~/longhorn-manager> kubectl -n longhorn-system get replicas
NAME              STATE     NODE                                DISK                                   INSTANCEMANAGER                                     IMAGE                                    AGE
test-r-a0c5ef24   running   eweber-v124-worker-1ae51dbb-4pngn   9699a051-aff8-457d-a6f4-4c067615e7ed   instance-manager-bd54850e35239b37069374780ca0f9a3   longhornio/longhorn-engine:master-head   15s
test-r-a29095b9   running   eweber-v124-worker-1ae51dbb-pbxr9   4a758df2-1b96-4c11-a664-4f211f542923   instance-manager-4639bda14281d41f3af00d64bc364bb9   longhornio/longhorn-engine:master-head   15s
eweber@laptop:~/longhorn-manager> kubectl -n longhorn-system patch volume test --type merge -p '{"spec": {"numberOfReplicas": 3}}'
volume.longhorn.io/test patched
eweber@laptop:~/longhorn-manager> kubectl -n longhorn-system get volume -o jsonpath='{.spec.numberOfReplicas}{"\n"}' test
3
eweber@laptop:~/longhorn-manager> kubectl -n longhorn-system get replicas
NAME              STATE     NODE                                DISK                                   INSTANCEMANAGER                                     IMAGE                                    AGE
test-r-033140d7   running   eweber-v124-worker-1ae51dbb-ppvzp   711d4bcb-85ab-4cfc-ad06-fcf12de42916   instance-manager-faf47d6949e508720ae71ed3ee10e466   longhornio/longhorn-engine:master-head   13s
test-r-a0c5ef24   running   eweber-v124-worker-1ae51dbb-4pngn   9699a051-aff8-457d-a6f4-4c067615e7ed   instance-manager-bd54850e35239b37069374780ca0f9a3   longhornio/longhorn-engine:master-head   36s
test-r-a29095b9   running   eweber-v124-worker-1ae51dbb-pbxr9   4a758df2-1b96-4c11-a664-4f211f542923   instance-manager-4639bda14281d41f3af00d64bc364bb9   longhornio/longhorn-engine:master-head   36s

@iosifnicolae2

You might consider modifying the spec of the Longhorn volume as an easily programmable way of changing the replica count. For example: [...]

Hmm, got it, but we're deploying our application using Helm charts, so we define just the PVC and not the PV.

But for now, this is an acceptable solution, as we can implement some hooks to update the PV's replica count.
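
A minimal sketch of such a hook, assuming a dynamically provisioned Longhorn volume (whose Longhorn volume CR shares the PV's name) and hypothetical names:

# Resolve the PV bound to the PVC, then patch the Longhorn volume of the same name.
NAMESPACE=default; PVC_NAME=my-app-data; REPLICAS=2   # hypothetical values
PV_NAME=$(kubectl -n "$NAMESPACE" get pvc "$PVC_NAME" -o jsonpath='{.spec.volumeName}')
kubectl -n longhorn-system patch volume "$PV_NAME" --type merge \
  -p "{\"spec\": {\"numberOfReplicas\": $REPLICAS}}"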

Thank you!

@ejweber
Contributor

ejweber commented May 25, 2023

Hmm, got it, but we're deploying our application using Helm charts, so we define just the PVC and not the PV.

Got it. So, just to be clear, you're saying you want to:

  • Deploy the application with a PVC that has some number of replicas based on the StorageClass (e.g. 3).
  • Later, modify the deployment so that a different number of replicas are used (e.g. 2).
  • But your automation only knows about the PVC (not the underlying Longhorn volume), so you want to manipulate it directly?

@iosifnicolae2

Got it. So, just to be clear, you're saying you want to: [...]

Yes, we want to control the number of replicas from the PVC (the replica count from the StorageClass will be used if no replica-count label is defined on the PVC).
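
To make the request concrete, a sketch of what this could look like; the label name below is purely hypothetical and is not something Longhorn supports today:

kubectl apply -f - <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-app-data                        # hypothetical PVC
  labels:
    longhorn.io/number-of-replicas: "2"    # hypothetical label; this is the feature being requested
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: longhorn
  resources:
    requests:
      storage: 10Gi
EOF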

@innobead innobead assigned ChanYiLin and unassigned Vicente-Cheng Jun 30, 2023
@innobead
Member

@ChanYiLin Please help with this. You can discuss this with @Vicente-Cheng because he has some ideas about this.

@innobead innobead changed the title [FEATURE] The replica scheduler may consider more factors [FEATURE] Replica scheduling with multiple factors Jul 19, 2023
@innobead
Member

innobead commented Jul 19, 2023

Yes, we want to control the number of replicas from the PVC (the replica count from the StorageClass will be used if no replica-count label is defined on the PVC).

@c3y1huang this is similar to the recurring job applied to the PVC instead of the PV that you handled.

@innobead innobead modified the milestones: v1.6.0, v1.7.0 Sep 14, 2023
@innobead innobead modified the milestones: v1.7.0, v1.8.0 May 28, 2024
@innobead innobead added the require/manual-test-plan Require adding/updating manual test cases if they can't be automated label Jun 27, 2024
Project status: Resolved/Scheduled
Development: no branches or pull requests
8 participants