[FEATURE] Replica scheduling with multiple factors #5149
Comments
cc @longhorn/dev
Let's consolidate both issues and use this one instead.
We can consider node/disk anti-affinity for replica scheduling, or zone affinity.
Did you mean that we could group nodes/disks for replica scheduling (like a whitelist), or use a blacklist for node/disk anti-affinity?
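For reference, Longhorn already offers a whitelist-style mechanism through node and disk tags; a minimal sketch, assuming tags named `fast` (node) and `ssd` (disk) have already been applied to the cluster:

```yaml
# StorageClass that restricts replica scheduling to tagged nodes/disks.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-tagged
provisioner: driver.longhorn.io
parameters:
  numberOfReplicas: "3"
  nodeSelector: "fast"   # only schedule replicas on nodes tagged "fast"
  diskSelector: "ssd"    # only schedule replicas on disks tagged "ssd"
```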
Consider doing the below as part of the replica scheduler improvement. |
Schedule replicas on different disks on the same node: It would be great to have two soft anti-affinity options, one at the node level and one at the disk level.
For example, on a single-node Kubernetes cluster where HA is not required but data loss is a problem, we would disable the above two anti-affinity options and get a software-RAID-like setup based on Longhorn.
Slack discussion: https://rancher-users.slack.com/archives/CC2UQM49Y/p1684827748167229
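For context, a minimal sketch of how the existing node- and zone-level soft anti-affinity settings can be toggled with kubectl; the disk-level counterpart discussed here is shown only as a hypothetical:

```bash
# Existing setting: allow replicas of the same volume to be scheduled on the same node.
kubectl -n longhorn-system patch settings.longhorn.io replica-soft-anti-affinity \
  --type merge -p '{"value": "true"}'

# Existing setting: zone-level soft anti-affinity.
kubectl -n longhorn-system patch settings.longhorn.io replica-zone-soft-anti-affinity \
  --type merge -p '{"value": "true"}'

# Hypothetical disk-level counterpart requested above (the setting name is illustrative only):
# kubectl -n longhorn-system patch settings.longhorn.io replica-disk-soft-anti-affinity \
#   --type merge -p '{"value": "true"}'
```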
Configure volume replicas using Persistent Volume Claim: It would be extremely helpful to be able to configure the number of replicas using PersistentVolumeClaim labels. Currently, we have multiple StorageClasses with default replica counts, but when we want to change the number of replicas programmatically, we have to make the change manually from the UI, which is not convenient to automate.
You might consider modifying the spec of the Longhorn volume as an easy, programmatic way of changing the replica count. For example:
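A minimal sketch of this approach, assuming the Longhorn volume is named `pvc-example` (a placeholder; the real name usually matches the PV name):

```bash
# Patch the Longhorn Volume CR directly to change its replica count.
# "pvc-example" is a placeholder; substitute the actual volume/PV name.
kubectl -n longhorn-system patch volumes.longhorn.io pvc-example \
  --type merge -p '{"spec": {"numberOfReplicas": 3}}'
```

Longhorn then reconciles the volume, rebuilding or removing replicas to match the new count.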
Hmm, got it, but we're deploying our application using Helm charts, so we define just the PVC and not the PV. But for now, this is an acceptable solution, as we can implement some hooks to update the PV's replica count. Thank you!
Got it. So, just to be clear, you're saying you want to:
- set the replica count through a label on the PVC, and
- fall back to the StorageClass replica count when no such label is set?
Yes, we want to control the number of replicas from the PVC (the number of replicas from the StorageClass will be used if no replica count label is defined on the PVC).
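A purely hypothetical sketch of what this could look like; the label key below is made up for illustration and is not something Longhorn currently honors:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-pvc
  labels:
    longhorn.io/number-of-replicas: "2"   # hypothetical label key, not an existing feature
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: longhorn              # StorageClass default would apply if the label is absent
  resources:
    requests:
      storage: 10Gi
```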
@ChanYiLin Please help with this. You can discuss it with @Vicente-Cheng because he has some ideas about this.
@c3y1huang this is similar to the recurring job applied to the PVC instead of the PV that you have handled.
Is your improvement request related to a feature? Please describe (👍 if you like this request)
The replica scheduler may consider more aspects, like replica counts and the scheduled size.
Because of the thin-provisioning design, we may keep scheduling replicas to the same disk when the other volumes on it are not actively used, since disk usage is calculated only from the space that is actually allocated.
We could take more factors into account during replica scheduling to improve data balance, performance, and placement.
It would also benefit a sharding implementation. (If sharding is implemented, the balance across all the disks on the nodes would be important.)
Right now, where replicas are scheduled depends on existing implicit rules rather than explicit, tunable strategies, so users have no way to influence scheduling beyond node tags, enabling/disabling scheduling on nodes/disks, and so on.
There is some discussion at longhorn/longhorn-manager#1397 (review), so it would be good to evaluate which strategies are worth designing.
Describe the solution you'd like
TBD with LEP
Have different replica scheduling strategies based on some conditions like the number of replicas, available disk space, etc.
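Purely as an illustration of what a tunable strategy could look like (the setting name and values below are hypothetical and would be defined by the LEP):

```bash
# Hypothetical setting: neither the name nor the values exist in Longhorn today.
kubectl -n longhorn-system patch settings.longhorn.io replica-scheduling-strategy \
  --type merge -p '{"value": "balanced"}'   # e.g. "balanced" vs. "pack" vs. "least-scheduled"
```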
Describe alternatives you've considered
A clear and concise description of any alternative solutions or features you've considered.
Additional context
An example of the current behavior:
Nodes A, B, and C each have two disks: disk A (200 G) and disk B (150 G).
We would keep scheduling replicas to disk A until the total scheduled size exceeds the disk size multiplied by the over-provisioning (overcommit) ratio, rather than spreading replicas across disks in a more balanced way. For instance, with a 200% over-provisioning setting, disk A would keep receiving replicas until roughly 400 G has been scheduled on it, even while disk B stays almost empty.