This documentation assumes you have a Kubernetes cluster already available.
If you need help setting up a Kubernetes cluster, please refer to Kubernetes Setup.
If you want to use GPUs, be sure to follow the Kubernetes instructions for enabling GPUs.
Arena doesn't have to run inside the Kubernetes cluster. It can also run on your laptop. If you can run kubectl
to manage the Kubernetes cluster from there, you can also use arena
to manage training jobs.
- Kubernetes >= 1.10
- helm version v2.8.2 or later
- tiller with the same version as helm should also be installed (https://docs.helm.sh/using_helm/#installing-tiller)
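The minimum versions listed above can be checked with a small comparison helper. The following is a minimal sketch, not part of arena: `version_ge` is a hypothetical function name, it assumes a `sort` that supports `-V` (GNU coreutils and recent BSD versions do), and the version string in the example is a literal placeholder rather than output read from helm.

```shell
# Sketch: compare dotted version strings against a required minimum.
# version_ge is a hypothetical helper, not part of arena, helm, or kubectl.

# Return 0 when version $1 is greater than or equal to version $2.
# The body runs in a subshell "( ... )" so its variables do not leak.
version_ge() (
    have="${1#v}"   # strip a leading "v", as in v2.8.2
    want="${2#v}"
    # sort -V orders version strings; if the lowest one is $want, then have >= want.
    lowest="$(printf '%s\n%s\n' "$have" "$want" | sort -V | head -n 1)"
    [ "$lowest" = "$want" ]
)

# Example with a placeholder version; substitute the version your client reports.
if version_ge "v2.9.1" "v2.8.2"; then
    echo "helm version is new enough"
else
    echo "helm version is too old"
fi
```

The same helper can be reused for the Kubernetes and Go minimums, since it compares components numerically (so 1.10 is treated as newer than 1.9).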
1. Prepare the kubeconfig file, either by running export KUBECONFIG=/etc/kubernetes/admin.conf
or by creating ~/.kube/config
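To see which of the two kubeconfig locations will take effect, a check along these lines may help. This is a minimal sketch: `resolve_kubeconfig` is a hypothetical helper name, and it assumes KUBECONFIG holds a single path rather than a colon-separated list.

```shell
# Sketch: report which kubeconfig file will be used.
# resolve_kubeconfig is a hypothetical helper used only for illustration.
resolve_kubeconfig() {
    # Prefer an explicit KUBECONFIG; otherwise fall back to ~/.kube/config.
    printf '%s\n' "${KUBECONFIG:-$HOME/.kube/config}"
}

config_path="$(resolve_kubeconfig)"
if [ -r "$config_path" ]; then
    echo "using kubeconfig: $config_path"
else
    # Neither location is usable yet; go back to step 1.
    echo "kubeconfig not readable: $config_path" >&2
fi
```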
2. Install kubectl client
Please follow the kubectl installation guide.
3. Install Helm client
- Download Helm client from github.com
- Unpack it (tar -zxvf helm-v2.8.2-linux-amd64.tgz)
- Find the helm binary in the unpacked directory, and move it to its desired destination (mv linux-amd64/helm /usr/local/bin/helm)
Then run helm list to check whether helm can manage the Kubernetes cluster successfully. An exit status of 0 means it can.
# helm list
# echo $?
0
4. Download the charts
mkdir /charts
git clone https://github.com/kubeflow/arena.git
cp -r arena/charts/* /charts
5. Install TFJob Controller
kubectl create -f arena/kubernetes-artifacts/jobmon/jobmon-role.yaml
kubectl create -f arena/kubernetes-artifacts/tf-operator/tf-operator.yaml
6. Install Dashboard
kubectl create -f arena/kubernetes-artifacts/dashboard/dashboard.yaml
7. Install MPIJob Controller
kubectl create -f arena/kubernetes-artifacts/mpi-operator/mpi-operator.yaml
8. Install arena
Prerequisites:
- Go >= 1.8
mkdir -p $GOPATH/src/github.com/kubeflow
cd $GOPATH/src/github.com/kubeflow
git clone https://github.com/kubeflow/arena.git
cd arena
make
The arena binary is located in the arena/bin directory. You may want to add this directory to $PATH.
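Adding arena/bin to $PATH can be done as in the sketch below. It assumes the clone location used in this step; `add_to_path` is a hypothetical helper, and the fallback to $HOME/go (the default GOPATH since Go 1.8) only applies when GOPATH is not set in the environment.

```shell
# Sketch: append the arena bin directory to PATH only if it is not already there.
# add_to_path is a hypothetical helper, not something arena provides.
add_to_path() {
    case ":$PATH:" in
        *":$1:"*) ;;            # already present, nothing to do
        *) PATH="$PATH:$1" ;;   # append at the end
    esac
}

# Assumes arena was cloned under $GOPATH/src/github.com/kubeflow as in the steps above.
arena_bin="${GOPATH:-$HOME/go}/src/github.com/kubeflow/arena/bin"
add_to_path "$arena_bin"
export PATH
```

To make the change persistent, the same lines can go in your shell startup file, for example ~/.bashrc.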
9. Install and configure kube-arbitrator for gang scheduling (optional)
kubectl create -f arena/kubernetes-artifacts/kube-batchd/kube-batched.yaml