New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support custom labels for each replica #413
Conversation
User can specify labels to each replica. Labels size must equal to replica size, or failed at validation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we could remove the duplicate definition.
/ok-to-test |
If import "github.com/kubeflow/tf-operator/pkg/trainer", there is a import cycle, so I use []*map[string]string directly. P.S. This PR is for #340 |
@@ -102,6 +102,7 @@ type TFReplicaSpec struct { | |||
// TFPort is the port to use for TF services. | |||
TFPort *int32 `json:"tfPort,omitempty" protobuf:"varint,1,opt,name=tfPort"` | |||
TFReplicaType `json:"tfReplicaType"` | |||
TFExtraLabels []*map[string]string `json:"tfExtraLabels,omitempty" protobuf:"varint,4,opt,name=tfExtraLabels"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need TFExtraLabels? Since Replica contains a PodTemplateSpec, couldn't users just use the Metadata in the template spec?
https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.9/#podtemplatespec-v1-core
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think he/she wants to set different labels for different replica instances
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not just use map instead of *
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gaocegege Thanks.
@sdf611097 Can you provide more justification for the PR? Doesn't this make the code brittle?
What happens if a user wants to change the number of replicas? Do they need to update the list of labels?
The convention in K8s seems to be that the labels should all be the same e.g. ReplicaSet applies the same labels to all pods. Why would we want to do something different here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jlewi My team have a monitor tool which depends on the the label(which value equal to id from db), so I want this feature.
User can ignore TFExtraLabels so that each pod will have same label. Maybe I can remove the validation and avoid index out of bound in replica.go, and ignore labels which is more than replica size.
I didn't notice that labels should all be the same by K8s convention . If this PR is inappropriate, I think it is ok to reject this PR.
BTW, does it make sense that pods in same replicaSet have different environment variable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should be adding features/code to support one of use cases.
Is there a reason your monitoring tool can't use the existing labels?
Does your monitoring tool work with other types of built in K8s controllers? e.g. ReplicaSet, StatefulSet, etc... If not why do you want/need to treat TFJob controllers differently?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right, and this pr is not ok.
User can specify labels to each replica by set replicaSpec[i].tfExtraLabels.
Labels size must equal to replica size, or failed at validation.
This change is