
[FEATURE] Replica scheduling with multiple factors #5149

Open
Vicente-Cheng opened this issue Dec 27, 2022 · 15 comments
Labels
  • area/volume-replica-scheduling: Volume replica scheduling related
  • component/longhorn-manager: Longhorn manager (control plane)
  • highlight: Important feature/issue to highlight
  • kind/feature: Feature request, new feature
  • priority/0: Must be fixed in this release (managed by PO)
  • require/lep: Require adding/updating enhancement proposal
  • require/manual-test-plan: Require adding/updating manual test cases if they can't be automated

@Vicente-Cheng

Vicente-Cheng commented Dec 27, 2022

Is your improvement request related to a feature? Please describe (👍 if you like this request)

The replica scheduler could take more factors into account, such as the replica count and the scheduled size on each disk.
Because of the thin-provisioning design, replicas may keep landing on the same disk while the other volumes are not yet in use, since disk usage is only counted once space is actually allocated.
Considering more factors during replica scheduling would improve data balance, performance, and placement.
It would also benefit a sharding implementation. (If sharding is implemented, balancing all disks across the nodes becomes important.)

Right now, replica placement is driven by existing implicit rules rather than explicit, tunable strategies, so users have no way to influence scheduling beyond node tags, node/disk scheduling settings, etc.

There is some discussion at longhorn/longhorn-manager#1397 (review), so it would be good to evaluate which strategies are worth designing.

Describe the solution you'd like

TBD with LEP

Provide different replica scheduling strategies based on conditions like the number of replicas, available disk space, etc.

Describe alternatives you've considered


Additional context

An example of the current behavior:
Nodes A, B, and C each have two disks: disk A (200 GiB) and disk B (150 GiB).

Replicas keep being scheduled to disk A until the total scheduled size exceeds the disk size multiplied by the over-commit ratio, instead of being balanced across the disks.
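
As a rough illustration of the arithmetic (hypothetical numbers, assuming a 200% storage over-provisioning setting):

# Hypothetical numbers: disk A is 200 GiB with 310 GiB of replicas already scheduled,
# disk B is 150 GiB and still empty, and over-provisioning is set to 200%.
DISK_A_SIZE=200; DISK_B_SIZE=150; OVER_PROVISIONING_PERCENT=200
SCHEDULED_A=310; SCHEDULED_B=0
A_LEFT=$(( DISK_A_SIZE * OVER_PROVISIONING_PERCENT / 100 - SCHEDULED_A ))   # 90 GiB still "schedulable"
B_LEFT=$(( DISK_B_SIZE * OVER_PROVISIONING_PERCENT / 100 - SCHEDULED_B ))   # 300 GiB untouched
echo "schedulable: disk A ${A_LEFT} GiB, disk B ${B_LEFT} GiB"
# A new replica can still land on disk A as long as A_LEFT > 0, so disk A keeps filling
# up; a multi-factor scheduler could also weigh the scheduled percentage and the replica
# count per disk so that disk B is preferred long before disk A saturates.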

@Vicente-Cheng Vicente-Cheng added the kind/improvement Request for improvement of existing function label Dec 27, 2022
@innobead innobead added component/longhorn-manager Longhorn manager (control plane) area/volume-replica-scheduling Volume replica scheduling related labels Dec 27, 2022
@innobead innobead added this to the v1.5.0 milestone Dec 27, 2022
@innobead innobead added the require/lep Require adding/updating enhancement proposal label Dec 27, 2022
@innobead
Member

cc @longhorn/dev

@derekbit
Member

derekbit commented Dec 27, 2022

Ref: [FEATURE] Different replica scheduling strategies

@innobead
Member

Ref: [FEATURE] Different replica scheduling strategies

Let's consolidate both issues, and use this one instead.

@innobead innobead changed the title [IMPROVEMENT] The replica scheduler may consider more factors [FEATURE] The replica scheduler may consider more factors Dec 27, 2022
@innobead innobead added kind/feature Feature request, new feature priority/0 Must be fixed in this release (managed by PO) and removed kind/improvement Request for improvement of existing function labels Dec 27, 2022

@innobead innobead added the highlight Important feature/issue to highlight label Jan 9, 2023
@innobead innobead assigned Vicente-Cheng and unassigned shuo-wu Jan 13, 2023
@innobead
Member

innobead commented Feb 6, 2023

We could consider node/disk anti-affinity or zone affinity for replica scheduling.

@Vicente-Cheng
Author

We could consider node/disk anti-affinity or zone affinity for replica scheduling.

Do you mean we could group nodes/disks for replica scheduling (like a whitelist), or use a blacklist for node/disk anti-affinity?

@joshimoo
Contributor

Consider addressing the following as part of the replica scheduler improvement:
#4826

@iosifnicolae2

iosifnicolae2 commented May 23, 2023

Schedule replicas on different disks on the same node

It would be great to have two soft anti-affinity options:

  • Replica Node Level Soft Anti-Affinity - to schedule replicas on different nodes
  • Replica Disk Level Soft Anti-Affinity - to schedule replicas on different disks

For example, on a single-node Kubernetes cluster where HA is not required but data loss is a concern, we would disable the two anti-affinity options above and effectively get a software RAID based on Longhorn.

  • the advantage of using Longhorn to implement the software RAID is that we can add a few more nodes and, in a few clicks, get an HA storage layer

Slack discussion: https://rancher-users.slack.com/archives/CC2UQM49Y/p1684827748167229
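
For reference, a rough sketch of how such toggles could look from kubectl. It assumes Longhorn's existing replica-soft-anti-affinity setting for the node level and that the Setting CR stores its value in a top-level value field; the disk-level setting name is purely hypothetical here and only illustrates the requested option:

# Node-level soft anti-affinity exists today as a Longhorn setting; "true" allows
# replicas of the same volume to be scheduled to the same node.
kubectl -n longhorn-system get settings.longhorn.io replica-soft-anti-affinity
kubectl -n longhorn-system patch settings.longhorn.io replica-soft-anti-affinity \
  --type merge -p '{"value": "true"}'

# Hypothetical disk-level counterpart (not an existing setting in this sketch); it
# would control whether those replicas may also share a single disk on that node.
kubectl -n longhorn-system patch settings.longhorn.io replica-disk-soft-anti-affinity \
  --type merge -p '{"value": "false"}'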

@iosifnicolae2

Configure volume replicas using Persistent Volume Claim

It would be extremely helpful to be able to configure the number of replicas using Persistent Volume Claim labels.

Currently, we have multiple StorageClasses with default replica counts, but when we want to change the number of replicas programmatically, we have to do it manually from the UI, which is not convenient to automate.

@ejweber
Contributor

ejweber commented May 24, 2023

It would be extremely helpful to be able to configure the number of replicas using Persistent Volume Claim labels.

Currently, we have multiple StorageClasses with default replica counts, but when we want to change the number of replicas programmatically, we have to do it manually from the UI, which is not convenient to automate.

You might consider modifying the spec of the Longhorn volume as an easily programmable way of changing the replica count. For example:

eweber@laptop:~/longhorn-manager> kubectl -n longhorn-system get volume -o jsonpath='{.spec.numberOfReplicas}{"\n"}' test
2
eweber@laptop:~/longhorn-manager> kubectl -n longhorn-system get replicas
NAME              STATE     NODE                                DISK                                   INSTANCEMANAGER                                     IMAGE                                    AGE
test-r-a0c5ef24   running   eweber-v124-worker-1ae51dbb-4pngn   9699a051-aff8-457d-a6f4-4c067615e7ed   instance-manager-bd54850e35239b37069374780ca0f9a3   longhornio/longhorn-engine:master-head   15s
test-r-a29095b9   running   eweber-v124-worker-1ae51dbb-pbxr9   4a758df2-1b96-4c11-a664-4f211f542923   instance-manager-4639bda14281d41f3af00d64bc364bb9   longhornio/longhorn-engine:master-head   15s
eweber@laptop:~/longhorn-manager> kubectl -n longhorn-system patch volume test --type merge -p '{"spec": {"numberOfReplicas": 3}}'
volume.longhorn.io/test patched
eweber@laptop:~/longhorn-manager> kubectl -n longhorn-system get volume -o jsonpath='{.spec.numberOfReplicas}{"\n"}' test
3
eweber@laptop:~/longhorn-manager> kubectl -n longhorn-system get replicas
NAME              STATE     NODE                                DISK                                   INSTANCEMANAGER                                     IMAGE                                    AGE
test-r-033140d7   running   eweber-v124-worker-1ae51dbb-ppvzp   711d4bcb-85ab-4cfc-ad06-fcf12de42916   instance-manager-faf47d6949e508720ae71ed3ee10e466   longhornio/longhorn-engine:master-head   13s
test-r-a0c5ef24   running   eweber-v124-worker-1ae51dbb-4pngn   9699a051-aff8-457d-a6f4-4c067615e7ed   instance-manager-bd54850e35239b37069374780ca0f9a3   longhornio/longhorn-engine:master-head   36s
test-r-a29095b9   running   eweber-v124-worker-1ae51dbb-pbxr9   4a758df2-1b96-4c11-a664-4f211f542923   instance-manager-4639bda14281d41f3af00d64bc364bb9   longhornio/longhorn-engine:master-head   36s

@iosifnicolae2

You might consider modifying the spec of the Longhorn volume as an easily programmable way of changing the replica count. For example: [...]

Hmm, got it, but we're deploying our application using Helm charts, so we define just the PVC and not the PV.

But for now, this is an acceptable solution, as we can implement some hooks to update the PV's replica count.
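
A minimal sketch of such a hook, assuming a dynamically provisioned Longhorn volume (whose Longhorn volume CR shares the PV's name) and hypothetical names:

# Resolve the PV bound to the PVC, then patch the Longhorn volume of the same name.
NAMESPACE=default; PVC_NAME=my-app-data; REPLICAS=2   # hypothetical values
PV_NAME=$(kubectl -n "$NAMESPACE" get pvc "$PVC_NAME" -o jsonpath='{.spec.volumeName}')
kubectl -n longhorn-system patch volume "$PV_NAME" --type merge \
  -p "{\"spec\": {\"numberOfReplicas\": $REPLICAS}}"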

Thank you!

@ejweber
Contributor

ejweber commented May 25, 2023

Hmm, got it, but we're deploying our application using Helm charts, so we define just the PVC and not the PV.

Got it. So, just to be clear, you're saying you want to:

  • Deploy the application with a PVC that has some number of replicas based on the StorageClass (e.g. 3).
  • Later, modify the deployment so that a different number of replicas are used (e.g. 2).
  • But your automation only knows about the PVC (not the underlying Longhorn volume), so you want to manipulate it directly?

@iosifnicolae2

Got it. So, just to be clear, you're saying you want to: [...]

Yes, we want to control the number of replicas from the PVC (the replica count from the StorageClass will be used if no replica-count label is defined on the PVC).
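
To make the request concrete, a sketch of what this could look like; the label name below is purely hypothetical and is not something Longhorn supports today:

kubectl apply -f - <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-app-data                        # hypothetical PVC
  labels:
    longhorn.io/number-of-replicas: "2"    # hypothetical label; this is the feature being requested
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: longhorn
  resources:
    requests:
      storage: 10Gi
EOF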

@innobead innobead assigned ChanYiLin and unassigned Vicente-Cheng Jun 30, 2023
@innobead
Member

@ChanYiLin Please help with this. You can discuss this with @Vicente-Cheng because he has some ideas about this.

@innobead innobead changed the title [FEATURE] The replica scheduler may consider more factors [FEATURE] Replica scheduling with multiple factors Jul 19, 2023
@innobead
Member

innobead commented Jul 19, 2023

Yes, we want to control the number of replicas from the PVC (the replica count from the StorageClass will be used if no replica-count label is defined on the PVC).

@c3y1huang this is similar to the recurring job applied to the PVC instead of the PV that you handled.

@innobead innobead modified the milestones: v1.6.0, v1.7.0 Sep 14, 2023
@innobead innobead modified the milestones: v1.7.0, v1.8.0 May 28, 2024
@innobead innobead added the require/manual-test-plan Require adding/updating manual test cases if they can't be automated label Jun 27, 2024
Project status: Resolved/Scheduled
Development: no branches or pull requests
8 participants