# Job Loader

This notebook loads the jobs in this repo into the public `wandb/jobs` project.
- You need to be logged in to wandb and have access to the `wandb` entity.

## Setup

Kill any processes listening on the ports the containers need before starting them (you may need to restart Docker):

In [1]:
# !lsof -i TCP:3307 | grep LISTEN | awk '{print $2}' | xargs kill -9
# !lsof -i TCP:8000 | grep LISTEN | awk '{print $2}' | xargs kill -9
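Before killing anything, it can help to check whether the ports are actually occupied. The following is a minimal sketch using only the standard library; `port_in_use` is a hypothetical helper name, not something defined in this repo, and it only detects listeners reachable on the given host.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something accepts TCP connections on host:port.

    Hypothetical helper for illustration; not part of this repo.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0
```

For example, `port_in_use(3307)` and `port_in_use(8000)` cover the two ports targeted by the `lsof` commands above.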

In [2]:
import wandb
from pathlib import Path

In [3]:
%%capture
%env WANDB_API_KEY {wandb.api.api_key}
%env WANDB_ENTITY megatruong
%env WANDB_PROJECT jobz

## Python jobs

Note: The SQL Query job needs access to a database. You can start a dummy database with the snippet below:

In [4]:
!docker run -p 3307:3306 -d sakiladb/mysql:latest
%env MYSQL_USER sakila
%env MYSQL_PASSWORD p_ssW0rd

e1ba379b3d85e36c8d8896b472290d6c2c4bf6fa2b85b24235a977c81f227df2
docker: Error response from daemon: driver failed programming external connectivity on endpoint quizzical_mccarthy (9b20ce06f6467848340a26d4da4ea653d7762223c862ac12ed95dd70663fca71): Bind for 0.0.0.0:3307 failed: port is already allocated.
env: MYSQL_USER=sakila
env: MYSQL_PASSWORD=p_ssW0rd


Run Python jobs as usual:

In [7]:
python_jobs = list(Path('jobs').glob('**/*job.py'))
python_jobs = python_jobs[:1]
python_jobs

[PosixPath('jobs/openai_eval/job.py')]
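The discovery step relies on a per-job directory layout: a `job.py` next to `config.yml`, `requirements.txt`, and an optional `bootstrap.sh`. A rough sketch of that layout as a data structure follows. `JobSpec` and `discover_jobs` are hypothetical names introduced here, and the pattern is narrowed to files named exactly `job.py` as an assumption (the notebook's `**/*job.py` would also match names such as `my_job.py`).

```python
from dataclasses import dataclass
from pathlib import Path
from typing import List, Optional


@dataclass
class JobSpec:
    """Files that make up one Python job (hypothetical helper type)."""
    name: str
    script: Path
    config: Path
    requirements: Path
    bootstrap: Optional[Path]


def discover_jobs(root: Path) -> List[JobSpec]:
    """Collect every job.py under root together with its sibling files."""
    specs = []
    # Assumption: only files named exactly job.py count as job entry points.
    for script in sorted(root.glob("**/job.py")):
        job_dir = script.parent
        bootstrap = job_dir / "bootstrap.sh"
        specs.append(
            JobSpec(
                name=job_dir.name,
                script=script,
                config=job_dir / "config.yml",
                requirements=job_dir / "requirements.txt",
                # bootstrap.sh is optional, mirroring the is_file() check in the loop.
                bootstrap=bootstrap if bootstrap.is_file() else None,
            )
        )
    return specs
```

Calling `discover_jobs(Path('jobs'))` from the repo root would then give one entry per job directory.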

In [58]:
for job in python_jobs:
    %env WANDB_NAME {job.parent.name}
    %env WANDB_JOBS_REPO_CONFIG {job.parent/'config.yml'}
    if (job.parent/'bootstrap.sh').is_file():
        !bash {job.parent}/bootstrap.sh
    !pip install -r {job.parent/'requirements.txt'}
    !python {job}

env: WANDB_NAME=openai_eval
env: WANDB_JOBS_REPO_CONFIG=jobs/openai_eval/config.yml
Obtaining file:///Users/andrewtruong/repos/launch-jobs/evals
  Installing build dependencies ... done
  Checking if build backend supports build_editable ... done
  Getting requirements to build editable ... done
  Preparing editable metadata (pyproject.toml) ... done
Building wheels for collected packages: evals
  Building editable for evals (pyproject.toml) ... done
  Created wheel for evals: filename=evals-0.1.1-0.editable-py3-none-any.whl size=4278 sha256=de17ee2073e505929ef6473d7ffdd440fe55b644b6e64eb30218eb8178aeb19f
  Stored in directory: /private/var/folders/tz/w7nbszhj74s3s9lyvpv71mz80000gn/T/pip-ephem-wheel-cache-2_3j0aq_/wheels/c6/a9/b2/f4a8a1f184ea24764056d59bb9c6dc4b8b66e9c63e0668e034
Successfully built evals
Installing collected packages: evals
  Attempting uninstall: evals
    Found existing installation: evals 0.1.1
    Uninstalling e
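
The loop above can also be sketched in plain Python, which makes it usable outside a notebook and stops at the first failing step (the `!` shell escapes keep going after an error). `plan_job` and `run_job` are hypothetical helpers, and using `sys.executable` rather than whatever `python` and `pip` resolve to on `PATH` is a deliberate assumption.

```python
import os
import subprocess
import sys
from pathlib import Path
from typing import Dict, List, Tuple


def plan_job(job: Path) -> Tuple[Dict[str, str], List[List[str]]]:
    """Return the extra environment and the commands for one job.py."""
    job_dir = job.parent
    env = {
        "WANDB_NAME": job_dir.name,
        "WANDB_JOBS_REPO_CONFIG": str(job_dir / "config.yml"),
    }
    commands: List[List[str]] = []
    # bootstrap.sh is optional and runs first when present.
    if (job_dir / "bootstrap.sh").is_file():
        commands.append(["bash", str(job_dir / "bootstrap.sh")])
    commands.append(
        [sys.executable, "-m", "pip", "install", "-r", str(job_dir / "requirements.txt")]
    )
    commands.append([sys.executable, str(job)])
    return env, commands


def run_job(job: Path) -> None:
    """Run the planned commands in order, raising on the first failure."""
    env, commands = plan_job(job)
    for cmd in commands:
        subprocess.run(cmd, env={**os.environ, **env}, check=True)
```

Separating the plan from the execution makes it possible to print or inspect the commands before anything is installed.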

## Docker jobs
- These jobs touch AWS, so they mount the `.aws` directory.
- To see the literal command, prepend `set -x &&` to the shell command.
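
The Docker cells below all follow the same shape: mount `.aws` read-only, pass the `WANDB_*` variables through, then name the image. As an illustration, that shape can be assembled in Python and printed as a literal command. `docker_run_argv` and `PASSTHROUGH` are hypothetical names, and `sudo` is left out here as an assumption about the local Docker setup.

```python
import shlex
from pathlib import Path
from typing import List, Mapping, Optional, Sequence

# Variables the notebook's docker cells forward into the container.
PASSTHROUGH = (
    "WANDB_API_KEY",
    "WANDB_ENTITY",
    "WANDB_PROJECT",
    "WANDB_NAME",
    "WANDB_RUN_GROUP",
    "WANDB_JOBS_REPO_CONFIG",
)


def docker_run_argv(
    image: str,
    env: Mapping[str, str],
    extra_args: Sequence[str] = (),
    mount_aws: bool = True,
    home: Optional[Path] = None,
) -> List[str]:
    """Build a `docker run` argument list (hypothetical helper)."""
    argv = ["docker", "run"]
    if mount_aws:
        home = home or Path.home()
        # Read-only mount so the container can use, but not modify, AWS credentials.
        argv += ["-v", f"{home}/.aws:/root/.aws:ro"]
    for name in PASSTHROUGH:
        # An unset variable becomes an empty value, as it would in the shell.
        argv += ["-e", f"{name}={env.get(name, '')}"]
    argv += list(extra_args)
    argv.append(image)
    return argv


def literal_command(argv: Sequence[str]) -> str:
    """Render the argument list as a copy-pasteable shell command."""
    return shlex.join(argv)
```

Printing `literal_command(docker_run_argv(...))` serves a similar purpose to `set -x`: it shows the fully expanded command before it runs.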

## SageMaker Endpoints job

In [7]:
%env WANDB_NAME deploy_to_sagemaker_endpoints
%env WANDB_JOBS_REPO_CONFIG config_tensorflow.yml

!sudo docker build -t $WANDB_NAME jobs/deploy_to_sagemaker_endpoints && \
sudo docker run \
   -v $HOME/.aws:/root/.aws:ro \
   -e WANDB_API_KEY=$WANDB_API_KEY \
   -e WANDB_ENTITY=$WANDB_ENTITY \
   -e WANDB_PROJECT=$WANDB_PROJECT \
   -e WANDB_NAME=$WANDB_NAME \
   -e WANDB_RUN_GROUP=$WANDB_RUN_GROUP \
   -e WANDB_JOBS_REPO_CONFIG=$WANDB_JOBS_REPO_CONFIG \
   $WANDB_NAME

env: WANDB_NAME=deploy_to_sagemaker_endpoints
env: WANDB_JOBS_REPO_CONFIG=config_tensorflow.yml
[+] Building 0.0s (0/1)
[+] Building 0.2s (2/3)
 => [internal] load build definition from Dockerfile                       0.0s
 => => transferring dockerfile: 301B                                       0.0s
 => [internal] load .dockerignore                                          0.0s
 => => transferring context: 2B                                            0.0s
 => [internal] load metadata for docker.io/library/python:3.9-buster       0.1s
[+] Building 0.3s (3/3)
 => [internal] load build definition from Dockerfile                       0.0s
 => => transferring dockerfile: 301B

## NVIDIA Triton job

This job requires a running Triton server. You can start one with this snippet:

In [8]:
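# You may need this export on M1.
# Related: https://github.com/keras-team/keras-tuner/issues/317#issuecomment-640181692
# %env takes the value literally, so it must not be quoted; the quoted form
# leads to the ld.so "cannot be preloaded" error in the recorded outputs.
%env LD_PRELOAD=/usr/lib/aarch64-linux-gnu/libgomp.so.1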

!sudo docker build -t tritonserver-wandb jobs/deploy_to_nvidia_triton/server && \
sudo docker run \
  -v $HOME/.aws:/root/.aws:ro \
  -p 8000:8000 \
  --rm --net=host -d \
  tritonserver-wandb

env: LD_PRELOAD="/usr/lib/aarch64-linux-gnu/libgomp.so.1"
ERROR: ld.so: object '"/usr/lib/aarch64-linux-gnu/libgomp.so.1"' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored.
[+] Building 0.0s (0/2)
[+] Building 0.2s (3/4)
 => [internal] load .dockerignore                                          0.0s
 => => transferring context: 2B                                            0.0s
 => [internal] load build definition from Dockerfile                       0.0s
 => => transferring dockerfile: 203B                                       0.0s
 => [internal] load metadata for nvcr.io/nvidia/tritonserver:22.11-py3     0.1s
 => [auth] nvidia/tritonserver:pull,push token for nvcr.io                 0.0s
[+] Building 0.3s (3/4)

Then launch the job:

In [9]:
%env WANDB_NAME deploy_to_nvidia_triton
%env WANDB_JOBS_REPO_CONFIG config_tensorflow.yml

!sudo docker build -t $WANDB_NAME jobs/deploy_to_nvidia_triton/deployer && \
sudo docker run \
   -v $HOME/.aws:/root/.aws:ro \
   -e WANDB_API_KEY=$WANDB_API_KEY \
   -e WANDB_ENTITY=$WANDB_ENTITY \
   -e WANDB_PROJECT=$WANDB_PROJECT \
   -e WANDB_NAME=$WANDB_NAME \
   -e WANDB_RUN_GROUP=$WANDB_RUN_GROUP \
   -e WANDB_JOBS_REPO_CONFIG=$WANDB_JOBS_REPO_CONFIG \
   --rm --net=host \
   $WANDB_NAME

env: WANDB_NAME=deploy_to_nvidia_triton
env: WANDB_JOBS_REPO_CONFIG=config_tensorflow.yml
ERROR: ld.so: object '"/usr/lib/aarch64-linux-gnu/libgomp.so.1"' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored.
[+] Building 0.0s (0/1)
[+] Building 0.2s (2/3)
 => [internal] load build definition from Dockerfile                       0.0s
 => => transferring dockerfile: 415B                                       0.0s
 => [internal] load .dockerignore                                          0.0s
 => => transferring context: 2B                                            0.0s
 => [internal] load metadata for docker.io/library/python:3.9-buster       0.1s
[+] Building 0.3s (2/3)
 =>

## NVIDIA TensorRT conversion job
This job requires a GPU.
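
A quick way to confirm the host sees a GPU before building the image is to ask `nvidia-smi` for its device list. This is only a sketch: `list_gpus` is a hypothetical helper, and it checks the host driver only. Running with `--gpus all` additionally requires the NVIDIA Container Toolkit to be set up for Docker.

```python
import shutil
import subprocess
from typing import List


def list_gpus(executable: str = "nvidia-smi") -> List[str]:
    """Return one line per GPU reported by `nvidia-smi -L`, or [] if unavailable."""
    path = shutil.which(executable)
    if path is None:
        # The driver tools are not installed or not on PATH.
        return []
    result = subprocess.run([path, "-L"], capture_output=True, text=True)
    if result.returncode != 0:
        return []
    return [line for line in result.stdout.splitlines() if line.strip()]
```

An empty list from `list_gpus()` suggests the conversion job is unlikely to run on this machine.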

In [10]:
%env WANDB_NAME optimize_with_nvidia_tensorrt
%env WANDB_JOBS_REPO_CONFIG config.yml

!sudo docker build -t $WANDB_NAME jobs/optimize_with_tensor_rt && \
sudo docker run \
    --gpus all \
    --runtime=nvidia \
    -e WANDB_API_KEY=$WANDB_API_KEY \
    -e WANDB_ENTITY=$WANDB_ENTITY \
    -e WANDB_PROJECT=$WANDB_PROJECT \
    -e WANDB_NAME=$WANDB_NAME \
    -e WANDB_RUN_GROUP=$WANDB_RUN_GROUP \
    -e WANDB_JOBS_REPO_CONFIG=$WANDB_JOBS_REPO_CONFIG \
    $WANDB_NAME

env: WANDB_NAME=optimize_with_nvidia_tensorrt
env: WANDB_JOBS_REPO_CONFIG=config.yml
ERROR: ld.so: object '"/usr/lib/aarch64-linux-gnu/libgomp.so.1"' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored.
[+] Building 0.0s (0/1)
[+] Building 0.2s (3/4)
 => [internal] load build definition from Dockerfile                       0.0s
 => => transferring dockerfile: 203B                                       0.0s
 => [internal] load .dockerignore                                          0.0s
 => => transferring context: 2B                                            0.0s
 => [internal] load metadata for nvcr.io/nvidia/tensorflow:22.12-tf2-py3   0.1s
 => [auth] nvidia/tensorflow:pull,push token for nvcr.io                   0.0s
[+