# Job Loader

This notebook is used to load jobs in this repo to the `wandb/jobs` public project.
- You will need to be logged into wandb and have access to the `wandb` entity.

## Setup

Kill ports ahead of spinning up containers (you may need to restart docker)

In [1]:
# !lsof -i TCP:3307 | grep LISTEN | awk '{print $2}' | xargs kill -9
# !lsof -i TCP:8000 | grep LISTEN | awk '{print $2}' | xargs kill -9

In [2]:
import wandb
from pathlib import Path

In [3]:
%%capture
%env WANDB_API_KEY {wandb.api.api_key}
%env WANDB_ENTITY wandb
%env WANDB_PROJECT jobs

## Python jobs

Note: The SQL Query job depends on access to a database.  You can load this dummy database with the snippet below:

In [4]:
!sudo docker run -p 3307:3306 -d sakiladb/mysql:latest
%env MYSQL_USER sakila
%env MYSQL_PASSWORD p_ssW0rd

93dcdd9821437d243a2d7354ca2f7375a828ca5b7282c51f73d943af418b0cec
docker: Error response from daemon: driver failed programming external connectivity on endpoint happy_mirzakhani (c5bba04dd80bc6614f0560b5eb96766a3369965efeaec80d0cebc2ae5d2e7916): Bind for 0.0.0.0:3307 failed: port is already allocated.
env: MYSQL_USER=sakila
env: MYSQL_PASSWORD=p_ssW0rd


Run python jobs as usual:

In [5]:
python_jobs = list(Path('jobs').glob('**/*job.py'))
python_jobs

[PosixPath('jobs/sql_query_table/job.py'),
 PosixPath('jobs/sql_query_artifact/job.py'),
 PosixPath('jobs/github_actions_workflow_dispatch/job.py'),
 PosixPath('jobs/msft_teams_webhook/job.py'),
 PosixPath('jobs/hello_world/job.py'),
 PosixPath('jobs/http_webhook/job.py')]

In [6]:
for job in python_jobs:
    %env WANDB_NAME {job.parent.name}
    %env WANDB_JOBS_REPO_CONFIG {job.parent/'config.yml'}
    !pip install -r {job.parent/'requirements.txt'} --quiet
    !python {job}

env: WANDB_NAME=sql_query_table
env: WANDB_JOBS_REPO_CONFIG=jobs/sql_query_table/config.yml
[34m[1mwandb[0m: Currently logged in as: [33mmegatruong[0m ([33mwandb[0m). Use [1m`wandb login --relogin`[0m to force relogin
[34m[1mwandb[0m: Tracking run with wandb version 0.14.0
[34m[1mwandb[0m: Run data is saved locally in [35m[1m/home/ubuntu/launch-jobs/wandb/run-20230320_155706-fw511hph[0m
[34m[1mwandb[0m: Run [1m`wandb offline`[0m to turn off syncing.
[34m[1mwandb[0m: Syncing run [33msql_query_table[0m
[34m[1mwandb[0m: ⭐️ View project at [34m[4mhttps://wandb.ai/wandb/jobs[0m
[34m[1mwandb[0m: 🚀 View run at [34m[4mhttps://wandb.ai/wandb/jobs/runs/fw511hph[0m
[34m[1mwandb[0m: Waiting for W&B process to finish... [32m(success).[0m
[34m[1mwandb[0m: 🚀 View run [33msql_query_table[0m at: [34m[4mhttps://wandb.ai/wandb/jobs/runs/fw511hph[0m
[34m[1mwandb[0m: Synced 6 W&B file(s), 1 media file(s), 4 artifact file(s) and 1 other file(s)
[34m[

## Docker jobs
- These jobs touch AWS, so they mount the `.aws` directory.
- If you need to see the literal command, prepend `set -x &&` to the shell command

## Sagemaker Endpoints job

In [7]:
%env WANDB_NAME deploy_to_sagemaker_endpoints
%env WANDB_JOBS_REPO_CONFIG config_tensorflow.yml

!sudo docker build -t $WANDB_NAME jobs/deploy_to_sagemaker_endpoints && \
sudo docker run \
   -v $HOME/.aws:/root/.aws:ro \
   -e WANDB_API_KEY=$WANDB_API_KEY \
   -e WANDB_ENTITY=$WANDB_ENTITY \
   -e WANDB_PROJECT=$WANDB_PROJECT \
   -e WANDB_NAME=$WANDB_NAME \
   -e WANDB_RUN_GROUP=$WANDB_RUN_GROUP \
   -e WANDB_JOBS_REPO_CONFIG=$WANDB_JOBS_REPO_CONFIG \
   $WANDB_NAME

env: WANDB_NAME=deploy_to_sagemaker_endpoints
env: WANDB_JOBS_REPO_CONFIG=config_tensorflow.yml
[1A[1B[0G[?25l[+] Building 0.0s (0/1)                                                         
[?25h[1A[0G[?25l[+] Building 0.2s (2/3)                                                         
[34m => [internal] load build definition from Dockerfile                       0.0s
[0m[34m => => transferring dockerfile: 301B                                       0.0s
[0m[34m => [internal] load .dockerignore                                          0.0s
[0m[34m => => transferring context: 2B                                            0.0s
[0m => [internal] load metadata for docker.io/library/python:3.9-buster       0.1s
[?25h[1A[1A[1A[1A[1A[1A[0G[?25l[+] Building 0.3s (3/3)                                                         
[34m => [internal] load build definition from Dockerfile                       0.0s
[0m[34m => => transferring dockerfile: 301B                 

## Nvidia Triton Job

This job requires a running Triton Server.  You can start one with this snippet

In [8]:
# you may need this export on M1
# related: https://github.com/keras-team/keras-tuner/issues/317#issuecomment-640181692
%env LD_PRELOAD="/usr/lib/aarch64-linux-gnu/libgomp.so.1"

!sudo docker build -t tritonserver-wandb jobs/deploy_to_nvidia_triton/server && \
sudo docker run \
  -v $HOME/.aws:/root/.aws:ro \
  -p 8000:8000 \
  --rm --net=host -d \
  tritonserver-wandb

env: LD_PRELOAD="/usr/lib/aarch64-linux-gnu/libgomp.so.1"
ERROR: ld.so: object '"/usr/lib/aarch64-linux-gnu/libgomp.so.1"' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored.
[1A[1B[0G[?25l[+] Building 0.0s (0/2)                                                         
[?25h[1A[0G[?25l[+] Building 0.2s (3/4)                                                         
[34m => [internal] load .dockerignore                                          0.0s
[0m[34m => => transferring context: 2B                                            0.0s
[0m[34m => [internal] load build definition from Dockerfile                       0.0s
[0m[34m => => transferring dockerfile: 203B                                       0.0s
[0m => [internal] load metadata for nvcr.io/nvidia/tritonserver:22.11-py3     0.1s
[34m => [auth] nvidia/tritonserver:pull,push token for nvcr.io                 0.0s
[0m[?25h[1A[1A[1A[1A[1A[1A[1A[0G[?25l[+] Building 0.3s (3/4)      

Then launch the job

In [9]:
%env WANDB_NAME deploy_to_nvidia_triton
%env WANDB_JOBS_REPO_CONFIG config_tensorflow.yml

!sudo docker build -t $WANDB_NAME jobs/deploy_to_nvidia_triton/deployer && \
sudo docker run \
   -v $HOME/.aws:/root/.aws:ro \
   -e WANDB_API_KEY=$WANDB_API_KEY \
   -e WANDB_ENTITY=$WANDB_ENTITY \
   -e WANDB_PROJECT=$WANDB_PROJECT \
   -e WANDB_NAME=$WANDB_NAME \
   -e WANDB_RUN_GROUP=$WANDB_RUN_GROUP \
   -e WANDB_JOBS_REPO_CONFIG=$WANDB_JOBS_REPO_CONFIG \
   --rm --net=host \
   $WANDB_NAME

env: WANDB_NAME=deploy_to_nvidia_triton
env: WANDB_JOBS_REPO_CONFIG=config_tensorflow.yml
ERROR: ld.so: object '"/usr/lib/aarch64-linux-gnu/libgomp.so.1"' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored.
[1A[1B[0G[?25l[+] Building 0.0s (0/1)                                                         
[?25h[1A[0G[?25l[+] Building 0.2s (2/3)                                                         
[34m => [internal] load build definition from Dockerfile                       0.0s
[0m[34m => => transferring dockerfile: 415B                                       0.0s
[0m[34m => [internal] load .dockerignore                                          0.0s
[0m[34m => => transferring context: 2B                                            0.0s
[0m => [internal] load metadata for docker.io/library/python:3.9-buster       0.1s
[?25h[1A[1A[1A[1A[1A[1A[0G[?25l[+] Building 0.3s (2/3)                                                         
[34m => 

## Nvidia Tensor RT Conversion Job
This job requires a GPU.

In [10]:
%env WANDB_NAME optimize_with_nvidia_tensorrt
%env WANDB_JOBS_REPO_CONFIG config.yml

!sudo docker build -t $WANDB_NAME jobs/optimize_with_tensor_rt && \
sudo docker run \
    --gpus all \
    --runtime=nvidia \
    -e WANDB_API_KEY=$WANDB_API_KEY \
    -e WANDB_ENTITY=$WANDB_ENTITY \
    -e WANDB_PROJECT=$WANDB_PROJECT \
    -e WANDB_NAME=$WANDB_NAME \
    -e WANDB_RUN_GROUP=$WANDB_RUN_GROUP \
    -e WANDB_JOBS_REPO_CONFIG=$WANDB_JOBS_REPO_CONFIG \
    $WANDB_NAME

env: WANDB_NAME=optimize_with_nvidia_tensorrt
env: WANDB_JOBS_REPO_CONFIG=config.yml
ERROR: ld.so: object '"/usr/lib/aarch64-linux-gnu/libgomp.so.1"' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored.
[1A[1B[0G[?25l[+] Building 0.0s (0/1)                                                         
[?25h[1A[0G[?25l[+] Building 0.2s (3/4)                                                         
[34m => [internal] load build definition from Dockerfile                       0.0s
[0m[34m => => transferring dockerfile: 203B                                       0.0s
[0m[34m => [internal] load .dockerignore                                          0.0s
[0m[34m => => transferring context: 2B                                            0.0s
[0m => [internal] load metadata for nvcr.io/nvidia/tensorflow:22.12-tf2-py3   0.1s
[34m => [auth] nvidia/tensorflow:pull,push token for nvcr.io                   0.0s
[0m[?25h[1A[1A[1A[1A[1A[1A[1A[0G[?25l[+