Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API Review #64

Closed
jlewi opened this issue Oct 19, 2017 · 7 comments
Closed

API Review #64

jlewi opened this issue Oct 19, 2017 · 7 comments

Comments

@jlewi
Copy link
Contributor

jlewi commented Oct 19, 2017

Opening an issue to track API issues that should be addressed as part of moving from Alpha to Beta.

  • TfReplicaType
    - Should this be an enum?
    - Instead of specifying replica type should we specify features e.g. run forever, restart always, etc...

  • master vs. chief nomenclature

@mrry from tensorflow should be included on the eventual api review.

@ScorpioCPH
Copy link
Member

I also have some comments about API changes #249, maybe we can discuss this topic together :)

@gaocegege
Copy link
Member

  • Instead of specifying replica type should we specify features e.g. run forever, restart always, etc...

I think the type name tfReplicaType is a little redundant, maybe type is fine.

And I agree with @ScorpioCPH about the status:

Maybe we should revisit status representation: State/ReplicaState/TfReplicaStatus.

For example, I think State field in TFReplicaStatus could not express the state accurately since it is complicated, then TFReplicasStates is enough maybe.

@0xgj
Copy link
Contributor

0xgj commented Jan 14, 2018

I suggest we should use chief instead of master as #306 stated

* it's confusing using master when there is distributed master in tensorflow;
* due to issue #61 , in tensorflow 1.4, TF_CONFIG use chief instead of master;

@jlewi @ScorpioCPH @gaocegege

@jlewi
Copy link
Contributor Author

jlewi commented Jan 16, 2018

I agree we should probably get rid of "master". I actually think we should probably get rid of ReplicaType and introduce properties that control different behaviors of the replica such as termination policy. Users can then assign an arbitrary name to each replica.

@0xgj
Copy link
Contributor

0xgj commented Jan 16, 2018

+1, @jlewi we can use a property named job or name instead of replicaType, which can be assigned to any valid string. other terms we can use include task, group... by default job will be set to worker.

apiVersion: "tensorflow.org/v1alpha1"
kind: "TfJob"
metadata:
  name: "example-job"
spec:
  replicaSpecs:
    - replicas: 1
      name: chief
      template:
        spec:
          containers:
            - image: gcr.io/tf-on-k8s-dogfood/tf_sample:dc944ff
              name: chief   ## if i need a seperate chief, add volumes if we want persistent logs
          restartPolicy: OnFailure
    - replicas: 1
      name: worker
      template:
        spec:
          containers:
            - image: gcr.io/tf-on-k8s-dogfood/tf_sample:dc944ff
              name: worker
          restartPolicy: OnFailure
    - replicas: 2
        name: ps
        template:
          spec:
            containers:
            - image: gcr.io/tf-on-k8s-dogfood/tf_sample:dc944ff
              name: ps
          restartPolicy: OnFailure

@jlewi jlewi added this to the Kubecon Europe milestone Jan 25, 2018
@ScorpioCPH
Copy link
Member

I think the type name tfReplicaType is a little redundant, maybe type is fine.

It seems like we must use something to specify the Type of TFReplica.
+1 for type.

I don't think name or job can represent this exactly.

@jlewi
Copy link
Contributor Author

jlewi commented Mar 19, 2018

Closing this issue since the proposal for the v1alpha2 API has been approved.

@jlewi jlewi closed this as completed Mar 19, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants