## Merlin Setup

This notebook builds a custom NVIDIA Merlin container image that can be used with user-managed notebooks in Vertex AI Workbench.

### Set up variables and paths, and create the Artifact Registry repository

In [1]:
VERSION="22.09" # keep as a string: a float such as 22.10 would be formatted as "22.1"
REPO_NAME="workbench"
REGION="us-central1"
PROJECT="hybrid-vertex" # TODO: update with your project_id
IMAGE_ID="tensorflow"
MERLIN_IMAGE_NAME=f"{REGION}-docker.pkg.dev/{PROJECT}/{REPO_NAME}/merlin-{IMAGE_ID}-{VERSION}"

MERLIN_CONTAINER="merlin_container"

!mkdir {MERLIN_CONTAINER}

!gcloud beta artifacts repositories create {REPO_NAME} \
    --repository-format=docker \
    --location=$REGION

mkdir: cannot create directory ‘merlin_container’: File exists
ERROR: (gcloud.beta.artifacts.repositories.create) ALREADY_EXISTS: the repository already exists


In [2]:
!gcloud config set project $PROJECT

Updated property [core/project].
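
The output of the first cell shows that re-running it fails on `mkdir` because the directory already exists. The sketch below shows one way to make the local part of the setup re-runnable in plain Python, and to guard against a numeric version (the float `22.10` would be formatted as `22.1`). The helper names `merlin_image_uri` and `ensure_dir` are hypothetical and not part of the original notebook; the demo directory is a temporary one so the sketch does not touch `merlin_container`.

```python
import os
import tempfile


def merlin_image_uri(region, project, repo_name, image_id, version):
    """Compose the Artifact Registry image URI in the same shape as MERLIN_IMAGE_NAME."""
    if not isinstance(version, str):
        # A float version would silently lose trailing zeros when formatted.
        raise TypeError("pass the version as a string, e.g. '22.09'")
    return f"{region}-docker.pkg.dev/{project}/{repo_name}/merlin-{image_id}-{version}"


def ensure_dir(path):
    """Create the directory if needed; do nothing if it already exists."""
    os.makedirs(path, exist_ok=True)
    return path


uri = merlin_image_uri("us-central1", "hybrid-vertex", "workbench", "tensorflow", "22.09")
workdir = ensure_dir(os.path.join(tempfile.mkdtemp(), "merlin_container"))
ensure_dir(workdir)  # second call is a no-op instead of an error
print(uri)
```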


#### Find the service account for your tenant project (usually PROJECT_NUMBER-compute@developer.gserviceaccount.com)

In [3]:
# !gcloud projects add-iam-policy-binding hybrid-vertex --member=serviceAccount:xxxxxxxx-compute@developer.gserviceaccount.com --role=roles/artifactregistry.admin

In [4]:
!gcloud auth configure-docker {REGION}-docker.pkg.dev --quiet


{
  "credHelpers": {
    "gcr.io": "gcloud",
    "us.gcr.io": "gcloud",
    "eu.gcr.io": "gcloud",
    "asia.gcr.io": "gcloud",
    "staging-k8s.gcr.io": "gcloud",
    "marketplace.gcr.io": "gcloud",
    "us-central1-docker.pkg.dev": "gcloud"
  }
}
Adding credentials for: us-central1-docker.pkg.dev
gcloud credential helpers already registered correctly.
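
The JSON printed above is the `credHelpers` section that `gcloud auth configure-docker` registers for Docker. As a quick sanity check, the sketch below parses such a config and confirms that a given registry host is mapped to the `gcloud` helper. The function name `has_cred_helper` is hypothetical, and the assumption that the config normally lives in `~/.docker/config.json` should be verified on your machine; here the sample is an inline string copied from the output above.

```python
import json


def has_cred_helper(docker_config_text, registry_host):
    """Return True if the registry host is mapped to the gcloud credential helper."""
    config = json.loads(docker_config_text)
    return config.get("credHelpers", {}).get(registry_host) == "gcloud"


# Sample copied from the cell output above (shortened to two entries).
sample_config = """
{
  "credHelpers": {
    "gcr.io": "gcloud",
    "us-central1-docker.pkg.dev": "gcloud"
  }
}
"""

print(has_cred_helper(sample_config, "us-central1-docker.pkg.dev"))
print(has_cred_helper(sample_config, "europe-west4-docker.pkg.dev"))
```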


### Create a Docker image derived from the NVIDIA Merlin container

In [5]:
%%writefile {MERLIN_CONTAINER}/jupyter_notebook_config.py
c.NotebookApp.ip = '*'
c.NotebookApp.token = ''
c.NotebookApp.password = ''
c.NotebookApp.open_browser = False
c.NotebookApp.port = 8080
c.NotebookApp.terminado_settings = {'shell_command': ['/bin/bash']}
c.NotebookApp.allow_origin_pat = (
    r'(^https://8080-dot-[0-9]+-dot-devshell\.appspot\.com$)|'
    r'(^https://colab\.research\.google\.com$)|'
    r'((https?://)?[0-9a-z]+-dot-(?:us|asia|europe|northamerica|southamerica)-?[0-9a-z]+\.notebooks\.googleusercontent\.com)')
c.NotebookApp.allow_remote_access = True
c.NotebookApp.disable_check_xsrf = False

Overwriting merlin_container/jupyter_notebook_config.py
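
The `allow_origin_pat` setting is a regular expression that decides which request origins the server accepts. The sketch below compiles the pattern from the config cell, written with raw strings and with the final dot escaped, and checks it against a few origins. The sample origins are made up for illustration, and the use of `re.match` (anchored at the start of the string) is an assumption about how the server applies the pattern.

```python
import re

# Pattern taken from jupyter_notebook_config.py above.
ALLOW_ORIGIN_PAT = re.compile(
    r'(^https://8080-dot-[0-9]+-dot-devshell\.appspot\.com$)|'
    r'(^https://colab\.research\.google\.com$)|'
    r'((https?://)?[0-9a-z]+-dot-(?:us|asia|europe|northamerica|southamerica)'
    r'-?[0-9a-z]+\.notebooks\.googleusercontent\.com)'
)


def origin_allowed(origin):
    """Return True if the origin matches the allow-list pattern."""
    return ALLOW_ORIGIN_PAT.match(origin) is not None


# Hypothetical origins, for illustration only.
samples = [
    "https://colab.research.google.com",
    "https://8080-dot-12345-dot-devshell.appspot.com",
    "https://abc123-dot-us-central1.notebooks.googleusercontent.com",
    "https://evil.example.com",
]
for origin in samples:
    print(origin, origin_allowed(origin))
```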


In [9]:
%%writefile {MERLIN_CONTAINER}/Dockerfile
FROM nvcr.io/nvidia/merlin/merlin-tensorflow:22.09
# install the Google Cloud SDK
RUN echo "deb [signed-by=/usr/share/keyrings/cloud.google.gpg] http://packages.cloud.google.com/apt cloud-sdk main" \
      | tee -a /etc/apt/sources.list.d/google-cloud-sdk.list \
    && curl https://packages.cloud.google.com/apt/doc/apt-key.gpg \
      | apt-key --keyring /usr/share/keyrings/cloud.google.gpg add - \
    && apt-get update -y \
    && apt-get install google-cloud-sdk -y
EXPOSE 8080

# copy the dependencies file to the working directory
COPY merlin_container/requirements.txt .
# install dependencies
RUN pip install -r requirements.txt
# copy the Jupyter config; the trailing slash makes Docker treat /root/.jupyter/ as a directory and create it if missing
COPY merlin_container/jupyter_notebook_config.py /root/.jupyter/

ENV pwd=""
ENTRYPOINT exec jupyter-lab --ip=0.0.0.0 --port=8080 --no-browser --allow-root --ServerApp.allow_origin="*" --NotebookApp.token="$pwd" --NotebookApp.password="$pwd"

Overwriting merlin_container/Dockerfile


In [10]:
%%writefile {MERLIN_CONTAINER}/requirements.txt
fastapi
git+https://github.com/NVIDIA-Merlin/models.git
gsutil
gcsfs
matplotlib
google-cloud-aiplatform

Overwriting merlin_container/requirements.txt
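
The `COPY` paths in the Dockerfile are relative to the build context, which is the notebook's working directory because the build below is run with `.` as the context. The sketch below extracts the `COPY` sources from a Dockerfile and reports any that are missing from a context directory. The helpers `copy_sources` and `missing_sources` are hypothetical, and the parsing is deliberately naive: it handles only the simple shell form of `COPY` used here, not the JSON-array form or multi-stage `--from` semantics.

```python
import os


def copy_sources(dockerfile_text):
    """Return the source paths of all simple shell-form COPY instructions."""
    sources = []
    for line in dockerfile_text.splitlines():
        parts = line.split()
        if parts and parts[0].upper() == "COPY":
            # Drop flags such as --chown=...; the last argument is the destination.
            args = [p for p in parts[1:] if not p.startswith("--")]
            sources.extend(args[:-1])
    return sources


def missing_sources(dockerfile_text, context_dir="."):
    """Return COPY sources that do not exist under the build context."""
    return [s for s in copy_sources(dockerfile_text)
            if not os.path.exists(os.path.join(context_dir, s))]


# Excerpt of the Dockerfile written above.
dockerfile_text = """\
FROM nvcr.io/nvidia/merlin/merlin-tensorflow:22.09
# copy the dependencies file to the working directory
COPY merlin_container/requirements.txt .
COPY merlin_container/jupyter_notebook_config.py /root/.jupyter
"""

sources = copy_sources(dockerfile_text)
print(sources)
```

In the notebook, calling `missing_sources(open("merlin_container/Dockerfile").read())` before the build would flag a file that was not written yet.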


In [None]:
!docker build . -f $MERLIN_CONTAINER/Dockerfile -t $MERLIN_IMAGE_NAME
!docker push $MERLIN_IMAGE_NAME
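
The two shell lines above run independently, so the push is still attempted when the build fails. A minimal sketch of the same two steps with `subprocess`, where `check=True` stops at the first failing command, is shown below. The function names `docker_commands` and `build_and_push` are hypothetical, and the `run` parameter exists only so the sketch can be exercised without Docker installed.

```python
import subprocess


def docker_commands(container_dir, image_name):
    """Return the build and push commands used in the cell above, as argument lists."""
    build = ["docker", "build", ".", "-f", f"{container_dir}/Dockerfile", "-t", image_name]
    push = ["docker", "push", image_name]
    return [build, push]


def build_and_push(container_dir, image_name, run=subprocess.run):
    """Run build, then push; a non-zero exit raises and skips the remaining steps."""
    for cmd in docker_commands(container_dir, image_name):
        run(cmd, check=True)


# Dry run with a recording stand-in for subprocess.run (no Docker required).
recorded = []
build_and_push(
    "merlin_container",
    "us-central1-docker.pkg.dev/hybrid-vertex/workbench/merlin-tensorflow-22.09",
    run=lambda cmd, check: recorded.append(cmd),
)
for cmd in recorded:
    print(" ".join(cmd))
```

In the notebook itself, `build_and_push(MERLIN_CONTAINER, MERLIN_IMAGE_NAME)` would run the real commands.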