Tools for ML/TensorFlow on Kubernetes.
gabrielwen and k8s-ci-robot Verify pod names. (#911)
Latest commit 07b3f74 Jan 15, 2019
Type Name Latest commit message Commit time
build/images/tf_operator Install go manually Dec 5, 2018
cmd Delete v1alpha1 Nov 8, 2018
dashboard add frontend-dir & port to backend program (#841) Oct 8, 2018
docs dist-mnist: Move to examples (#660) Jun 15, 2018
examples Add mnist example with TF summary (#880) Dec 7, 2018
hack Fix build Nov 8, 2018
pkg Don't reinitialize replica statuses after TFJob completes (#897) Jan 9, 2019
py Verify pod names. (#911) Jan 15, 2019
test Verify pod names. (#911) Jan 15, 2019
vendor gopkg: Use version instead of branch (#872) Nov 19, 2018
.gcloudignore Adds gcloudignore (#510) Mar 30, 2018
.gitignore Add .swp to gitignore. (#912) Jan 13, 2019
.pylintrc .pylinrc: Add dist_mnist (#581) May 10, 2018
.style.yapf use yapf to format python code (#401) Feb 26, 2018
.travis.yml build backend in travis ci (#892) Dec 17, 2018
CHANGELOG.md Update changelog to include changes in v0.3. (#839) Oct 8, 2018
Gopkg.lock Upgrade to 1.10.1 (#874) Nov 21, 2018
Gopkg.toml Upgrade to 1.10.1 (#874) Nov 21, 2018
LICENSE Initial commit Jun 28, 2017
OWNERS Add richardsliu to OWNERS (#847) Oct 18, 2018
README.md CHANGELOG: Add (#693) Aug 29, 2018
developer_guide.md Fix spelling mistake in developer_guide.md (#884) Dec 10, 2018
e2e_testing.md Add documentation for writing E2E tests (#852) Oct 28, 2018
linter_config.json Delete v1alpha1 Nov 8, 2018
prow_config.yaml TF operator v1beta1 e2etests (#863) Nov 7, 2018
releasing.md Create a script to release the TFJob operator image (#515) Apr 2, 2018
submit_release_job.sh Create a script to release the TFJob operator image (#515) Apr 2, 2018
tf_job_design_doc.md Remove TensorBoard related code in operator (#391) Feb 27, 2018

README.md

K8s Custom Resource and Operator For TensorFlow jobs



Overview

TFJob provides a Kubernetes custom resource that makes it easy to run distributed or non-distributed TensorFlow jobs on Kubernetes.
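As a rough illustration of what such a custom resource can look like, the sketch below builds a small distributed TFJob manifest as a Python dict and prints it as JSON. The helper `make_tfjob`, the job name, and the image `example.com/dist-mnist:latest` are hypothetical. The `kubeflow.org/v1beta1` API version and the field names (`tfReplicaSpecs`, `PS`, `Worker`, a container named `tensorflow`) are assumptions based on the v1beta1 API referenced in this repository, so check the user guide for the version your cluster actually serves.

```python
import json


def make_tfjob(name, image, workers=2, ps=1, api_version="kubeflow.org/v1beta1"):
    """Build a TFJob manifest as a plain dict.

    Illustrative helper only; it is not part of this repository.
    """

    def replica(count):
        # Each replica type carries a replica count and an ordinary pod template.
        return {
            "replicas": count,
            "restartPolicy": "OnFailure",
            "template": {
                "spec": {
                    # Assumption: the operator expects a container named "tensorflow".
                    "containers": [{"name": "tensorflow", "image": image}],
                }
            },
        }

    return {
        "apiVersion": api_version,  # assumed API group/version
        "kind": "TFJob",
        "metadata": {"name": name},
        "spec": {
            "tfReplicaSpecs": {
                "PS": replica(ps),           # parameter servers
                "Worker": replica(workers),  # training workers
            }
        },
    }


# Hypothetical job name and image, used only for illustration.
manifest = make_tfjob("dist-mnist", "example.com/dist-mnist:latest")
print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and submitted with `kubectl apply -f <file>`, since kubectl accepts JSON as well as YAML manifests.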

Please refer to the user guide for more information.

Contributing

Please refer to the developer guide (developer_guide.md).

Change Log

Please refer to the CHANGELOG (CHANGELOG.md).

Community

This project is part of Kubeflow. To get in touch with the community, see the README in kubeflow/kubeflow.