Support Equivalence failure cache by template-uuid #538
NOTE: the result of pod affinity/anti-affinity can not be re-used; let's handle the other predicates in v0.4.
I would like to work on this issue. Per my understanding, we have to wait for the upstream issue kubernetes/kubernetes#72322 to be fixed, right? The change will add a map to the session that caches whether a template-uuid fits a node; actions like allocate will use the map to improve performance.
It's not necessary to wait for kubernetes/kubernetes#72322; we can build that feature here and propose it to upstream :)
Yes, we need to add a map to the session; and this map is used by
So we will calculate the pod-template-hash for the pod. How do we store the hash? I can come up with two approaches:
Which one do you prefer? Or is there a better approach?
Calculating the hash is really heavy; I'd prefer to let the operator/controller set an annotation for it. In kube-batch, only pods from the same PodGroup will check the eCache.
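A rough sketch of that annotation-based approach follows. The annotation key, type names, and function names here are hypothetical stand-ins for illustration, not kube-batch's actual API:

```go
package main

import "fmt"

// Pod is a simplified stand-in for v1.Pod (k8s.io/api/core/v1) so the
// sketch stays self-contained.
type Pod struct {
	Name        string
	Annotations map[string]string
}

// templateUUIDKey is a hypothetical annotation key; the real key would be
// settled in the kube-batch design.
const templateUUIDKey = "kube-batch.example.com/template-uuid"

// TemplateUUID returns the template identifier the operator/controller set
// on the pod, or "" when the pod carries no such annotation (and therefore
// skips the equivalence cache).
func TemplateUUID(pod *Pod) string {
	if pod.Annotations == nil {
		return ""
	}
	return pod.Annotations[templateUUIDKey]
}

func main() {
	p := &Pod{
		Name:        "tfjob-worker-0",
		Annotations: map[string]string{templateUUIDKey: "tmpl-123"},
	}
	fmt.Println(TemplateUUID(p)) // tmpl-123
}
```

Because the operator writes the annotation once per template, the scheduler only does a map lookup per pod instead of hashing the whole pod spec.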
And we need to pay attention to pod affinity/anti-affinity.
Do you mean the controllers in the controller manager?
No, we only support this feature in kube-batch; other operators, e.g. tf-operator, will decide whether to use it or not. A user can also add the annotation in the PodTemplate, but that is not convenient. For the upstream proposal, I'd suggest having the controller manager add this annotation to the pod template.
Ok, got it, thanks.
/assign
@zionwu: GitHub didn't allow me to assign the following users: zionwu. Note that only kubernetes-sigs members and repo collaborators can be assigned and that issues/PRs can only have 10 assignees at the same time. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
@zionwu, thanks very much! Please continue your work :)
I think it is a good solution to
Regarding the hash, how do we resolve conflicts?
@k82cn In my opinion a conflict is very rare; even if it happens, kubelet could deal with it well.
If the hashes conflict, the scheduler may not bind the pod.
The hash is stored in
/sig apps
@k82cn @hex108 Do we have a conclusion on how to get the hash? Do we let the controller set it as an annotation, or calculate it in kube-batch? I also checked the latest implementation of the eCache in the default scheduler; it uses the UID of the pod's controller as the hash:

```go
func GetEquivHash(pod *v1.Pod) types.UID {
	ownerReferences := pod.GetOwnerReferences()
	if ownerReferences != nil {
		return ownerReferences[0].UID
	}
	// If the pod has no controllerRef, return an empty UID.
	return ""
}
```

In my opinion this is also a good approach, even for a controller with multiple pod specs, like tf-operator. For a TFJob, if one of its pod specs is unschedulable, the whole job is unschedulable and we don't have to schedule the job's other pod specs. What do you think of this approach?
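To make the owner-UID idea concrete, here is a minimal sketch of how that key could back a failure cache. The types are simplified stand-ins for the k8s API objects, and the cache type and method names are invented for illustration:

```go
package main

import "fmt"

// Simplified stand-ins for the k8s API types, so the sketch compiles on its own.
type OwnerReference struct{ UID string }

type Pod struct{ OwnerReferences []OwnerReference }

// EquivKey follows the upstream GetEquivHash idea: the first ownerReference's
// UID identifies the equivalence class; "" means no class.
func EquivKey(pod *Pod) string {
	if len(pod.OwnerReferences) != 0 {
		return pod.OwnerReferences[0].UID
	}
	return ""
}

// FailureCache remembers which equivalence classes already failed the
// predicates on a node, so sibling tasks can skip re-evaluation.
type FailureCache struct {
	failed map[string]map[string]bool // class -> node -> failed
}

func NewFailureCache() *FailureCache {
	return &FailureCache{failed: map[string]map[string]bool{}}
}

func (c *FailureCache) MarkFailed(key, node string) {
	if key == "" {
		return // no equivalence class: nothing to share
	}
	if c.failed[key] == nil {
		c.failed[key] = map[string]bool{}
	}
	c.failed[key][node] = true
}

func (c *FailureCache) HasFailed(key, node string) bool {
	return key != "" && c.failed[key][node]
}

func main() {
	cache := NewFailureCache()
	pod := &Pod{OwnerReferences: []OwnerReference{{UID: "job-uid-1"}}}
	cache.MarkFailed(EquivKey(pod), "node-a")
	fmt.Println(cache.HasFailed(EquivKey(pod), "node-a")) // true
}
```

One design consequence: pods with no ownerReference get key "" and always bypass the cache, which matches the upstream snippet's empty-UID fallback.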
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
Stale issues rot after 30d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
Rotten issues close after 30d of inactivity. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
@fejta-bot: Closing this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Is this a BUG REPORT or FEATURE REQUEST?:
/kind feature
Description:
For batch workloads, there are several "similar" tasks in one job; if one task fails to fit a node, the other tasks will get the same result (except for pod affinity/anti-affinity). In kube-batch, we don't know which tasks are similar, but we can support a customized index, e.g. an annotation carrying the task template uuid.
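The feature described above could be sketched as a per-session map of predicate results, guarded so that tasks with pod affinity/anti-affinity never reuse results. All type, field, and function names here are hypothetical, not kube-batch's real API:

```go
package main

import "fmt"

// Task is a hypothetical task view; real kube-batch tasks live in the
// scheduler's own api package.
type Task struct {
	Name         string
	TemplateUUID string // value of the proposed annotation; "" = no reuse
	HasAffinity  bool   // affinity/anti-affinity results must not be reused
}

// sessionFitCache caches predicate results per (template uuid, node) for a
// single scheduling session; it is rebuilt when the next session opens.
type sessionFitCache map[string]map[string]bool

func (c sessionFitCache) lookup(t *Task, node string) (fits, ok bool) {
	if t.TemplateUUID == "" || t.HasAffinity {
		return false, false // result is not cacheable for this task
	}
	fits, ok = c[t.TemplateUUID][node]
	return fits, ok
}

func (c sessionFitCache) store(t *Task, node string, fits bool) {
	if t.TemplateUUID == "" || t.HasAffinity {
		return
	}
	if c[t.TemplateUUID] == nil {
		c[t.TemplateUUID] = map[string]bool{}
	}
	c[t.TemplateUUID][node] = fits
}

func main() {
	cache := sessionFitCache{}
	// worker-0 failed the predicates on node-a; record it once.
	cache.store(&Task{Name: "worker-0", TemplateUUID: "tmpl-1"}, "node-a", false)
	// worker-1 shares the template, so allocate can skip node-a immediately.
	fits, ok := cache.lookup(&Task{Name: "worker-1", TemplateUUID: "tmpl-1"}, "node-a")
	fmt.Println(fits, ok) // false true
}
```

Scoping the map to one session sidesteps invalidation: node and pod state can change between sessions, so the cache is simply thrown away rather than kept consistent.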