Kubeflow Training Operator
Starting from v1.3, the Training Operator provides Kubernetes custom resources that make it easy to run distributed or non-distributed TensorFlow, PyTorch, Apache MXNet, XGBoost, and MPI jobs on Kubernetes.
Note: Before the v1.2 release, the Kubeflow Training Operator supported only TFJob on Kubernetes.
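To make the idea of a training custom resource concrete, here is a minimal sketch that builds a PyTorchJob manifest as a plain Python dict and prints it as JSON. The helper name `pytorch_job_manifest`, the job name, and the container image are hypothetical placeholders; the field layout follows the `kubeflow.org/v1` PyTorchJob resource as commonly documented, so check it against the API definition for the operator version you install.

```python
import json


def pytorch_job_manifest(name, image, workers=2, namespace="default"):
    """Build a minimal PyTorchJob manifest as a plain dict (illustrative helper)."""

    def replica(count):
        # One replica spec: how many pods to run and what each pod contains.
        return {
            "replicas": count,
            "restartPolicy": "OnFailure",
            "template": {
                "spec": {
                    # The training container is conventionally named "pytorch".
                    "containers": [{"name": "pytorch", "image": image}]
                }
            },
        }

    return {
        "apiVersion": "kubeflow.org/v1",
        "kind": "PyTorchJob",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "pytorchReplicaSpecs": {
                "Master": replica(1),
                "Worker": replica(workers),
            }
        },
    }


if __name__ == "__main__":
    # The image below is a placeholder, not a real published image.
    manifest = pytorch_job_manifest(
        "pytorch-demo", "example.com/my-training-image:latest"
    )
    print(json.dumps(manifest, indent=2))
```

Because kubectl accepts JSON as well as YAML, the printed manifest could be piped into `kubectl apply -f -` once the operator is installed.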
- For a complete reference of the custom resource definitions, please refer to the API Definition.
- For details of the all-in-one operator design, please refer to the All-in-one Kubeflow Training Operator.
- For details on its observability, please refer to the monitoring design doc.
- A Kubernetes cluster, version 1.23 or later, and kubectl
kubectl apply -k "github.com/kubeflow/training-operator/manifests/overlays/standalone"
kubectl apply -k "github.com/kubeflow/training-operator/manifests/overlays/standalone?ref=v1.5.0"
TensorFlow Release Only
If you prefer to use the original TensorFlow controllers, please check out the
v1.2-branch. Bug-fix patches are still accepted on this branch.
kubectl apply -k "github.com/kubeflow/training-operator/manifests/overlays/standalone?ref=v1.2.0"
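The install commands above differ only in the `ref` query parameter appended to the kustomize target. As a small sketch of that pattern, the hypothetical helper `install_command` below builds the kubectl argument list for a given ref; it only constructs and prints the commands and does not run them.

```python
import shlex

# Kustomize target taken from the install commands in this README.
BASE = "github.com/kubeflow/training-operator/manifests/overlays/standalone"


def install_command(ref=None):
    """Return the kubectl argument list for installing the standalone overlay.

    ref=None targets the default branch; otherwise ref is a tag or branch name.
    """
    target = BASE if ref is None else f"{BASE}?ref={ref}"
    return ["kubectl", "apply", "-k", target]


if __name__ == "__main__":
    for ref in (None, "v1.5.0", "v1.2.0"):
        # shlex.join quotes the target when it contains shell-special characters.
        print(shlex.join(install_command(ref)))
```

Passing the returned list to `subprocess.run(..., check=True)` would execute the install, assuming kubectl is on the PATH and configured for the target cluster.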
Python SDK for Kubeflow Training Operator
The Training Operator provides a Python SDK for the custom resources. More docs are available in the sdk/python folder.
Use pip to install the latest release of the SDK:
pip install kubeflow-training
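After installing, it can be useful to confirm which SDK release ended up in the environment. The sketch below uses only the standard library to look up the installed version of the `kubeflow-training` distribution; the helper name `sdk_version` is hypothetical and not part of the SDK.

```python
from importlib import metadata


def sdk_version(dist_name="kubeflow-training"):
    """Return the installed version string of dist_name, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = sdk_version()
    if version is None:
        print("kubeflow-training is not installed in this environment")
    else:
        print(f"kubeflow-training {version} is installed")
```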
Please refer to the following API documentation:
The following links provide information about getting involved in the community:
- Attend the AutoML and Training Working Group community meeting.
- Join our Slack channel.
- Check out who is using the Training Operator.
This project is part of Kubeflow, so please see the README in kubeflow/kubeflow to get in touch with the community.
Please refer to the DEVELOPMENT
Please refer to CHANGELOG
The following table lists the most recent few versions of the operator.
| Operator Version | API Version | Kubernetes Version |
This project started as a distributed training operator for TensorFlow. We later merged efforts from other Kubeflow training operators to provide a unified and simplified experience for both users and developers. We are very grateful to all who filed issues or helped resolve them, asked and answered questions, and took part in inspiring discussions. We would also like to thank everyone who has contributed to and maintained the original operators.