Suppose I've installed TFJob in my k8s cluster #265

Closed
GoodJoey opened this issue Jan 4, 2018 · 2 comments


GoodJoey commented Jan 4, 2018

If I want to run distributed training with TFJob, what changes do I need to make in my code (say, train.py)?
If I understand correctly, I still need to set up the ClusterSpec, right? Thanks.

Contributor

jlewi commented Jan 5, 2018

You might want to look at this tutorial
#195 (comment)

The correct ClusterSpec will be provided as part of the TF_CONFIG environment variable. The code changes you make depend on which TensorFlow APIs you are using.

Higher-level APIs like tf.estimator.Estimator, I believe, automatically check the TF_CONFIG environment variable for the cluster spec, so you might not need to do any work.

If you are manually constructing TensorFlow servers, then you might have to get the cluster spec from TF_CONFIG and pass it through.
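For the manual case, a minimal sketch of reading the cluster spec out of TF_CONFIG might look like the following. It assumes TF_CONFIG is a JSON string with a `cluster` map (job name to list of `host:port` addresses) and a `task` entry holding `type` and `index`; the helper name `parse_tf_config` is made up for illustration and is not part of TFJob or TensorFlow.

```python
import json
import os


def parse_tf_config(env=None):
    """Return (cluster, job_name, task_index) read from TF_CONFIG.

    `parse_tf_config` is a hypothetical helper. It assumes TF_CONFIG
    looks like:
      {"cluster": {"ps": [...], "worker": [...]},
       "task": {"type": "worker", "index": 0}}
    """
    env = os.environ if env is None else env
    # An unset or empty TF_CONFIG is treated as "not distributed".
    tf_config = json.loads(env.get("TF_CONFIG") or "{}")
    cluster = tf_config.get("cluster") or {}
    task = tf_config.get("task") or {}
    return cluster, task.get("type"), int(task.get("index", 0))


# With the TensorFlow 1.x API, the result could then be passed through as:
#   cluster, job_name, task_index = parse_tf_config()
#   server = tf.train.Server(tf.train.ClusterSpec(cluster),
#                            job_name=job_name, task_index=task_index)
```

The TensorFlow calls are left as comments so the sketch runs without TensorFlow installed; when TF_CONFIG is missing, the helper returns an empty cluster so the same train.py can still run as a single process.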

For one such example, look at the tf-cnn example. It uses one of the TensorFlow models published by the TensorFlow team, which relies on low-level APIs that depend on a variety of flags being set to configure the job for distributed processing. So in this case we have a launcher script that parses TF_CONFIG to determine the values to set for the flags and then invokes the binary.
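A launcher along those lines could be sketched roughly as below. This is not the actual tf-cnn launcher: the function names are hypothetical, and the flag names (`--job_name`, `--task_index`, `--ps_hosts`, `--worker_hosts`) are assumptions about what the wrapped binary accepts, so adjust them to whatever your training script expects.

```python
import json


def flags_from_tf_config(tf_config_json):
    """Translate a TF_CONFIG JSON string into command-line flags.

    Hypothetical helper; the flag names are assumptions about the
    binary being launched, not a fixed TFJob convention.
    """
    tf_config = json.loads(tf_config_json or "{}")
    cluster = tf_config.get("cluster") or {}
    task = tf_config.get("task") or {}

    flags = []
    if task.get("type"):
        flags.append("--job_name=" + task["type"])
    flags.append("--task_index=%d" % int(task.get("index", 0)))
    # Emit one comma-separated host list per job type present in the cluster.
    for job in ("ps", "worker"):
        hosts = cluster.get(job)
        if hosts:
            flags.append("--%s_hosts=%s" % (job, ",".join(hosts)))
    return flags


def build_command(binary_args, tf_config_json):
    """Append the derived flags to the command that starts training."""
    return list(binary_args) + flags_from_tf_config(tf_config_json)


# A launcher could then run, for example:
#   import os, subprocess
#   cmd = build_command(["python", "train.py"], os.environ.get("TF_CONFIG"))
#   subprocess.check_call(cmd)
```

Keeping the flag translation in a pure function makes it easy to check the generated command before the launcher actually invokes the binary.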

Contributor

jlewi commented Jan 7, 2018

Please reopen this issue if you have further questions.

jlewi closed this as completed Jan 7, 2018