A CLI for Kubeflow.

Arena

Overview

Arena is a command-line interface that lets data scientists run and monitor machine learning training jobs and check their results easily. It currently supports solo and distributed TensorFlow training. The backend is built on Kubernetes, Helm, and Kubeflow, but data scientists need very little knowledge of Kubernetes to use it.

End users also need GPU resource and node management, so Arena provides a top command to check available GPU resources in the Kubernetes cluster.

In short, Arena's goal is to make data scientists feel as if they are working on a single machine while actually using the power of GPU clusters.

For the Chinese version, please refer to 中文文档

Setup

To install Arena, follow the Installation guide.

User Guide

Arena is a command-line interface for running and monitoring machine learning training jobs and checking their results. It currently supports solo and distributed training.

Demo

Developing

Prerequisites:

  • Go >= 1.8

mkdir -p $GOPATH/src/github.com/kubeflow
cd $GOPATH/src/github.com/kubeflow
git clone https://github.com/kubeflow/arena.git
cd arena
make

The arena binary is placed in the arena/bin directory. You may want to add that directory to $PATH.
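
As a minimal sketch of that last step, the snippet below appends the build output directory to $PATH for the current shell session. It assumes the repository was cloned under $GOPATH as in the build steps above; the ARENA_BIN variable name and the fallback to $HOME/go (Go's default GOPATH when the variable is unset) are illustrative assumptions, not part of Arena itself.

```shell
# Assumption: the repo lives under $GOPATH/src/github.com/kubeflow/arena.
# If GOPATH is unset, fall back to $HOME/go, the Go toolchain's default.
ARENA_BIN="${GOPATH:-$HOME/go}/src/github.com/kubeflow/arena/bin"

# Append the directory for this shell session only.
export PATH="$PATH:$ARENA_BIN"

# Check whether the shell can now resolve the binary.
command -v arena || echo "arena not found; check that make succeeded"
```

To make the change permanent, the same export line can be added to your shell startup file.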

CPU Profiling

# set profile rate (HZ)
export PROFILE_RATE=1000

# arena {command} --pprof
arena list --pprof
INFO[0000] Dump cpu profile file into /tmp/cpu_profile

You can then analyze the profile by following the guide "Go CPU profiling: pprof and speedscope".
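
For a quick look without leaving the terminal, the following sketch prints the hottest functions from the dumped profile using the standard `go tool pprof` command. It assumes the profile was written to /tmp/cpu_profile as in the log line above and that a Go toolchain is on $PATH; depending on the Go version, pprof may also need the arena binary as an extra argument to resolve symbols.

```shell
# Path taken from the log line printed by `arena list --pprof`.
PROFILE=/tmp/cpu_profile

if command -v go >/dev/null 2>&1 && [ -s "$PROFILE" ]; then
  # -top prints a text table of functions sorted by CPU time.
  go tool pprof -top "$PROFILE" | head -n 20
else
  echo "skipping: go toolchain or $PROFILE not available"
fi
```

On Go 1.10 or later, `go tool pprof -http=:8080 /tmp/cpu_profile` opens the same data in a browser-based viewer instead.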

CLI Document

Please refer to arena.md

RoadMap

See RoadMap