A CLI for Kubeflow.

Arena

Overview

Arena is a command-line interface that lets data scientists run and monitor machine learning training jobs and check their results easily. It currently supports solo and distributed TensorFlow training. The backend is built on Kubernetes, Helm, and Kubeflow, but data scientists need very little Kubernetes knowledge to use it.

End users also need GPU resource and node management, so Arena provides a top command to check the available GPU resources in the Kubernetes cluster.

In short, Arena's goal is to make data scientists feel as if they are working on a single machine while actually harnessing the power of GPU clusters.

For the Chinese version, please refer to 中文文档

Setup

Follow the Installation guide.

User Guide

Arena is a command-line interface for running and monitoring machine learning training jobs and checking their results. It currently supports solo and distributed training.

Demo

Developing

Prerequisites:

  • Go >= 1.8

mkdir -p $GOPATH/src/github.com/kubeflow
cd $GOPATH/src/github.com/kubeflow
git clone https://github.com/kubeflow/arena.git
cd arena
make

The arena binary is located in the arena/bin directory. You may want to add this directory to $PATH.
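One way to put the binary on your PATH is sketched below. It assumes the checkout location used in the build steps above; the `$HOME/go` fallback for GOPATH and the `ARENA_BIN` variable name are illustrative assumptions, so adjust them to your environment.

```shell
# Assumption: arena was cloned under $GOPATH/src/github.com/kubeflow as in the build steps.
# Fall back to $HOME/go (Go's default GOPATH) when GOPATH is unset.
GOPATH="${GOPATH:-$HOME/go}"
ARENA_BIN="$GOPATH/src/github.com/kubeflow/arena/bin"  # hypothetical helper variable

# Append the directory only if it is not already on PATH.
case ":$PATH:" in
  *":$ARENA_BIN:"*) ;;
  *) export PATH="$PATH:$ARENA_BIN" ;;
esac
```

To make the change permanent, the same lines can go in your shell startup file (for example ~/.bashrc).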

CPU Profiling

# set profile rate (HZ)
export PROFILE_RATE=1000

# arena {command} --pprof
arena list --pprof
INFO[0000] Dump cpu profile file into /tmp/cpu_profile

You can then analyze the profile by following the guide Go CPU profiling: pprof and speedscope.
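As a quick first look, the dumped profile can also be inspected with the standard `go tool pprof` command. This is a minimal sketch that assumes the profile was written to /tmp/cpu_profile, as in the log line above, and that the Go toolchain is installed; the `PROFILE` variable is only illustrative.

```shell
PROFILE=/tmp/cpu_profile  # path reported by 'arena list --pprof' above

if [ -f "$PROFILE" ] && command -v go >/dev/null 2>&1; then
  # Print the functions that consumed the most CPU time, as a text table.
  go tool pprof -top "$PROFILE"
else
  echo "profile or Go toolchain not found; run 'arena list --pprof' first" >&2
fi
```

For an interactive view, `go tool pprof -http=:8080 "$PROFILE"` serves the same data in a browser.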

CLI Document

Please refer to arena.md

RoadMap

See the RoadMap.