This repo is for beginners who want to learn and use Submarine.


What is Hadoop Submarine?

Submarine is a new subproject of Apache Hadoop.

Submarine is a project that allows infra engineers and data scientists to run unmodified TensorFlow or PyTorch programs on YARN or Kubernetes.

Goals of Submarine:

  • Allow jobs easy access to data and models in HDFS and other storage systems.
  • Launch services to serve TensorFlow/PyTorch models.
  • Run distributed TensorFlow jobs with simple configs.
  • Run user-specified Docker images.
  • Specify GPUs and other resources.
  • Launch TensorBoard for training jobs if the user requests it.
  • Support customized DNS names for roles (like tensorboard.$user.$domain:6006).

Architecture

  • workbench

hello-submarine

There is no complete, easy-to-understand example for beginners, and since Submarine supports many open-source infrastructures, it is hard for engineers to deploy each runtime environment, not to mention data scientists.

This repo aims to let users easily deploy container orchestrators (like Hadoop YARN and Kubernetes) with Docker containers, provide a fully distributed deep learning example for each runtime, and offer a step-by-step tutorial for beginners.

Prerequisites

  • Ubuntu 18.04+
  • Docker
  • Memory > 8G

Before you start, here is what you need to know.

mini-submarine (Submarine in Docker)

A fast and easy way to deploy Submarine on your laptop.

With just a few commands, you are set up for experimentation and can run a complete Submarine experiment.

mini-submarine includes:

  • Standalone Hadoop v2.9.2
  • Standalone Zookeeper v3.4.14
  • Latest version of Apache Submarine
  • TensorFlow example (MNIST handwritten digit)

Build the Docker image locally

docker build --tag hello-submarine ./mini-submarine
docker run -it -h submarine-dev --name mini-submarine --net=bridge --privileged -P hello-submarine /bin/bash

Pull the image from Docker Hub

docker pull pingsutw/hello-submarine
docker run -it -h submarine-dev --name mini-submarine --net=bridge --privileged -P pingsutw/hello-submarine /bin/bash

Run the Submarine CTR library

pwd # /home/yarn/submarine
. ./venv/bin/activate

# change directory
cd ..
cd tests

# run locally
python run_deepfm.py -conf deepfm.json -task train
python run_deepfm.py -conf deepfm.json -task evaluate
# Model metrics :  {'auc': 0.64110434, 'loss': 0.4406755, 'global_step': 12}

# run distributedly
export SUBMARINE_VERSION=0.6.0-SNAPSHOT
export SUBMARINE_HADOOP_VERSION=2.9
export SUBMARINE_JAR=/opt/submarine-dist-${SUBMARINE_VERSION}-hadoop-${SUBMARINE_HADOOP_VERSION}/submarine-dist-${SUBMARINE_VERSION}-hadoop-${SUBMARINE_HADOOP_VERSION}/submarine-all-${SUBMARINE_VERSION}-hadoop-${SUBMARINE_HADOOP_VERSION}.jar
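The exports above expand into a fairly long jar path. A quick way to sanity-check the expansion before submitting the job (re-exporting the same versions so the snippet stands alone) is:

```shell
# Sanity-check the jar path the exports above resolve to. The /opt layout
# is the one assumed by the mini-submarine image; adjust if yours differs.
export SUBMARINE_VERSION=0.6.0-SNAPSHOT
export SUBMARINE_HADOOP_VERSION=2.9
SUBMARINE_JAR=/opt/submarine-dist-${SUBMARINE_VERSION}-hadoop-${SUBMARINE_HADOOP_VERSION}/submarine-dist-${SUBMARINE_VERSION}-hadoop-${SUBMARINE_HADOOP_VERSION}/submarine-all-${SUBMARINE_VERSION}-hadoop-${SUBMARINE_HADOOP_VERSION}.jar
echo "${SUBMARINE_JAR}"
# Warn early if the jar is not where the variables say it should be:
[ -f "${SUBMARINE_JAR}" ] || echo "jar not found: ${SUBMARINE_JAR}"
```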

java -cp $(${HADOOP_COMMON_HOME}/bin/hadoop classpath --glob):${SUBMARINE_JAR}:${HADOOP_CONF_PATH} \
 org.apache.submarine.client.cli.Cli job run --name deepfm-job-001 \
 --framework tensorflow \
 --verbose \
 --input_path "" \
 --num_workers 2 \
 --worker_resources memory=2G,vcores=4 \
 --num_ps 1 \
 --ps_resources memory=2G,vcores=4 \
 --worker_launch_cmd "myvenv.zip/venv/bin/python run_deepfm.py -conf=deepfm_distributed.json" \
 --ps_launch_cmd "myvenv.zip/venv/bin/python run_deepfm.py -conf=deepfm_distributed.json" \
 --insecure \
 --conf tony.containers.resources=../submarine/myvenv.zip#archive,${SUBMARINE_JAR},deepfm_distributed.json,run_deepfm.py

Submarine On Kubernetes

Deploy all Submarine components on K8s.

Prerequisites

Install Kind (local clusters for testing Kubernetes)

curl -Lo ./kind "https://github.com/kubernetes-sigs/kind/releases/download/v0.7.0/kind-$(uname)-amd64"
chmod +x ./kind
mv ./kind /some-dir-in-your-PATH/kind
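The download URL above pins kind v0.7.0 and hard-codes the platform suffix. A parameterized variant (the `KIND_VERSION` variable is illustrative, not part of the original instructions) makes upgrading a one-line change:

```shell
# Parameterized form of the kind download above; pin whichever release you need.
KIND_VERSION=v0.7.0
KIND_URL="https://github.com/kubernetes-sigs/kind/releases/download/${KIND_VERSION}/kind-$(uname)-amd64"
echo "${KIND_URL}"
# Uncomment to actually download and install:
# curl -Lo ./kind "${KIND_URL}" && chmod +x ./kind
```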

Create K8s Cluster

kind create cluster --image kindest/node:v1.15.6 --name k8s-submarine
kubectl create namespace submarine
# set submarine as the default namespace
kubectl config set-context --current --namespace=submarine

Install Kubectl

curl -LO https://storage.googleapis.com/kubernetes-release/release/`curl -s https://storage.googleapis.com/kubernetes-release/release/stable.txt`/bin/linux/amd64/kubectl
chmod +x ./kubectl
sudo mv ./kubectl /usr/local/bin/kubectl
kubectl version --client

Install Helm

curl https://helm.baltorepo.com/organization/signing.asc | sudo apt-key add -
sudo apt-get install apt-transport-https --yes
echo "deb https://baltocdn.com/helm/stable/debian/ all main" | sudo tee /etc/apt/sources.list.d/helm-stable-debian.list
sudo apt-get update
sudo apt-get install helm

Deploy Submarine On K8s

helm install submarine ./helm-charts/submarine

Expose the submarine-server service so that users can connect to the Submarine workbench.

kubectl port-forward svc/submarine-server 8080:8080
# open the workbench at http://localhost:8080
# Account: admin
# Password: admin

Create a TensorFlow distributed training experiment

curl -X POST -H "Content-Type: application/json" -d '
{
  "meta": {
    "name": "tf-mnist-json",
    "namespace": "submarine",
    "framework": "TensorFlow",
    "cmd": "python /var/tf_mnist/mnist_with_summaries.py --log_dir=/train/log --learning_rate=0.01 --batch_size=150",
    "envVars": {
      "ENV_1": "ENV1"
    }
  },
  "environment": {
    "image": "gcr.io/kubeflow-ci/tf-mnist-with-summaries:1.0"
  },
  "spec": {
    "Ps": {
      "replicas": 1,
      "resources": "cpu=1,memory=512M"
    },
    "Worker": {
      "replicas": 1,
      "resources": "cpu=1,memory=512M"
    }
  }
}
' http://127.0.0.1:32080/api/v1/experiment
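With a long inline JSON body, a malformed payload is an easy mistake. One way to catch it early (a sketch: write the same spec to a file, validate it with Python's stdlib `json.tool`, then POST the file) is:

```shell
# Write the experiment spec to a file and validate the JSON locally
# before POSTing it to the Submarine server.
cat > /tmp/tf-mnist-experiment.json <<'EOF'
{
  "meta": {
    "name": "tf-mnist-json",
    "namespace": "submarine",
    "framework": "TensorFlow",
    "cmd": "python /var/tf_mnist/mnist_with_summaries.py --log_dir=/train/log --learning_rate=0.01 --batch_size=150",
    "envVars": { "ENV_1": "ENV1" }
  },
  "environment": { "image": "gcr.io/kubeflow-ci/tf-mnist-with-summaries:1.0" },
  "spec": {
    "Ps":     { "replicas": 1, "resources": "cpu=1,memory=512M" },
    "Worker": { "replicas": 1, "resources": "cpu=1,memory=512M" }
  }
}
EOF
python3 -m json.tool /tmp/tf-mnist-experiment.json > /dev/null && echo "spec is valid JSON"
# Then submit it (requires the server to be reachable):
# curl -X POST -H "Content-Type: application/json" \
#   -d @/tmp/tf-mnist-experiment.json http://127.0.0.1:32080/api/v1/experiment
```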

Submarine On Hadoop

TBD
