Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add kep for coscheduling base on CRD #42
Add kep for coscheduling base on CRD #42
Changes from all commits
122cdea
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about if the podgroup is made up of different types of pods? Like ps and trainer in tensorflow. How to set the MinResources ?
@cwdsuzhou
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the v1alpha1 version. I could add that if need and extend this CRD later. e.g
type PodGroup struct {
...
Groups [map]PodGroup
...
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think one PodGroup should describe a particular type of workload. In terms of tensorflow application, a good model is to have PS and Worker referencing their corresponding (Sub-)PodGroup, and then there can be a (Parent-)PodGroup to wire those two (Sub-)PodGroup(s).
(not quite sure it would work, need a PoC on this)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought per #42 (comment)
MinResources
was going to be removed?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As in, owned, like
OwnerReference
?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be more intuitive to use the more common terminology?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The canonical
meta.OwnerReference
may be used to exercise the concept of hierarchical PodGroups, i.e., a child PodGroup "ownedBy" the other parent PodGroup, so using the term "OccupiedBy" instatus
can distinguish from that semantics IMO - to represent the semantics of PodGroup <-> Workload mapping.@cwdsuzhou may comment further.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep ,this field is used for recording the related object e.g. workload or crd. But not absolutely same as the owner reference.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How is this "pod belonging to a pod group" relationship expressed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OccupiedBy
- which refers to the the group of pods' controller (i.e., Deployment or StatefulSet).@cwdsuzhou can confirm.