TFJob controller cannot terminate job #193
Comments
@bleachzk Thanks for filing this issue. I am not sure whether we support TerminationPolicy yet; maybe jlewi@ could give us more info.
@bleachzk Can you please provide the spec/status for your TFJob? Can you also clarify what your expected behavior is and what the observed behavior is? It's possible you're hitting kubeflow/training-operator#128.
@jlewi
That's kubeflow/training-operator#128. We originally let the jobs continue to run until the TFJob was deleted, so that the logs stayed accessible after the job terminated. We are working on fixing that.
Duplicate of kubeflow/training-operator#128
I tested kubeflow/tf-controller-examples/tf-cnn/tf_job_gpu.yaml by running kubectl:
The TFJob controller cannot terminate the job when the WORKER is done, based on the TerminationPolicy.
WORKER log:
logs-from-tensorflow-in-inception-171202-163257-gpu-1-worker-272u-0-fdx5x.txt