What's the function of tfReplicaType: Master ? #1442

Closed
lxl910915 opened this issue Aug 29, 2018 · 5 comments

lxl910915 commented Aug 29, 2018

Why and when did Kubeflow remove Master?
In a doc, we see: "The master only acts as the chief and doesn't do any training." Once the master has started and initialized, how does it wait for the ps and worker replicas to complete?

@gaocegege
Member

Master has now been replaced by Chief.

@jlewi
Contributor

jlewi commented Aug 31, 2018

Earlier versions of TensorFlow used "Master" to indicate the process that coordinates distributed training. TensorFlow then changed to calling this "Chief".

We added "Chief" to be consistent with TensorFlow terminology, but we've kept "Master" for backwards compatibility with older versions of TF.
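
To illustrate the backwards-compatibility idea described above, here is a minimal sketch of how a tool could treat "Master" as a legacy alias for "Chief" when reading a job's replica specs. The helper `find_coordinator`, the constant `COORDINATOR_TYPES`, and the dictionary layout are hypothetical names made up for this example; they are not taken from the tf-operator source, and the operator's real behavior may differ.

```python
from typing import Mapping, Optional

# Hypothetical names for illustration only; not part of tf-operator.
# The preferred name comes first, the legacy alias second.
COORDINATOR_TYPES = ("Chief", "Master")


def find_coordinator(replica_specs: Mapping[str, object]) -> Optional[str]:
    """Return the replica type acting as coordinator, or None if neither is declared."""
    for replica_type in COORDINATOR_TYPES:
        if replica_type in replica_specs:
            return replica_type
    return None


# A job written with the older TensorFlow terminology.
legacy_job = {"Master": {"replicas": 1}, "PS": {"replicas": 2}, "Worker": {"replicas": 4}}
# The same job using the newer name.
current_job = {"Chief": {"replicas": 1}, "PS": {"replicas": 2}, "Worker": {"replicas": 4}}
# A job that declares only PS and Worker replicas.
workers_only_job = {"PS": {"replicas": 2}, "Worker": {"replicas": 4}}

print(find_coordinator(legacy_job))
print(find_coordinator(current_job))
print(find_coordinator(workers_only_job))
```

Checking "Chief" before "Master" means a spec that uses the newer name is resolved first, while older specs keep working unchanged. What should happen when neither type is present is left open in this sketch.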

@jlewi jlewi closed this as completed Aug 31, 2018
@lxl910915
Author

lxl910915 commented Sep 4, 2018

@jlewi
In version v1alpha2, has Master been removed? The tf_job YAML file has no master. We are very confused by this.

@johnugeorge
Member

As @jlewi said, the Master type is still kept for backwards compatibility: https://github.com/kubeflow/tf-operator/blob/master/pkg/apis/tensorflow/v1alpha2/types.go#L135

@lxl910915
Author

lxl910915 commented Sep 4, 2018

@johnugeorge
Do you mean that a tf_job only needs to contain ps and worker, with no "Master" and no "Chief"?
