## Merlin Setup

The goal of this notebook is to build a custom Merlin container image that can be used with user-managed notebooks in Vertex AI Workbench.

### Setup variables, paths, and create artifact registry

In [1]:
VERSION="22.07" # keep as a string so a tag such as 22.10 is not rendered as 22.1
REPO_NAME="workbench"
REGION="us-central1"
PROJECT="hybrid-vertex" # TODO: update with your project_id
IMAGE_ID="tensorflow"
MERLIN_IMAGE_NAME=f"{REGION}-docker.pkg.dev/{PROJECT}/{REPO_NAME}/merlin-{IMAGE_ID}-{VERSION}"

MERLIN_CONTAINER="merlin_container"

!mkdir -p {MERLIN_CONTAINER}

!gcloud beta artifacts repositories create {REPO_NAME} \
    --repository-format=docker \
    --location=$REGION

ERROR: (gcloud.beta.artifacts.repositories.create) ALREADY_EXISTS: the repository already exists


In [2]:
!gcloud config set project $PROJECT

Updated property [core/project].
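
The image name assembled in the first cell follows the Artifact Registry Docker path layout `LOCATION-docker.pkg.dev/PROJECT/REPOSITORY/IMAGE`. The following is a minimal sketch of the same construction as a function; `build_image_uri` is a hypothetical helper name, not part of the original notebook. It also illustrates why the version is safer as a string: a float such as `22.10` would be rendered as `22.1`.

```python
def build_image_uri(region: str, project: str, repo: str,
                    image_id: str, version: str) -> str:
    """Return the Artifact Registry path for the Merlin image (hypothetical helper)."""
    # Reject floats: 22.10 would silently turn into the tag "22.1".
    if not isinstance(version, str):
        raise TypeError("version must be a string, e.g. '22.07'")
    return f"{region}-docker.pkg.dev/{project}/{repo}/merlin-{image_id}-{version}"


uri = build_image_uri("us-central1", "hybrid-vertex", "workbench", "tensorflow", "22.07")
print(uri)

# Float formatting drops the trailing zero, which would change the tag.
print(str(22.10))
```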


#### Find the service account for your tenant project - usually PROJECT_NUMBER-compute@developer.gserviceaccount.com

In [3]:
# !gcloud projects add-iam-policy-binding hybrid-vertex --member=serviceAccount:xxxxxxxx-compute@developer.gserviceaccount.com --role=roles/artifactregistry.admin
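
The default Compute Engine service account is named after the project *number*, which can be looked up with `gcloud projects describe PROJECT --format="value(projectNumber)"`. As a rough sketch, the binding command above could be assembled as follows; the helper names and the project number `123456789012` are placeholders, not values from this project.

```python
def default_compute_sa(project_number: str) -> str:
    """Build the default Compute Engine service account email (hypothetical helper)."""
    return f"{project_number}-compute@developer.gserviceaccount.com"


def iam_binding_args(project_id: str, member_email: str,
                     role: str = "roles/artifactregistry.admin") -> list:
    """Return the gcloud argument list for the IAM binding (hypothetical helper)."""
    return [
        "gcloud", "projects", "add-iam-policy-binding", project_id,
        f"--member=serviceAccount:{member_email}",
        f"--role={role}",
    ]


sa = default_compute_sa("123456789012")  # placeholder project number
cmd = iam_binding_args("hybrid-vertex", sa)
print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once the placeholder number is replaced with the real one.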

In [4]:
!gcloud auth configure-docker {REGION}-docker.pkg.dev --quiet


{
  "credHelpers": {
    "gcr.io": "gcloud",
    "us.gcr.io": "gcloud",
    "eu.gcr.io": "gcloud",
    "asia.gcr.io": "gcloud",
    "staging-k8s.gcr.io": "gcloud",
    "marketplace.gcr.io": "gcloud"
  }
}
Adding credentials for: us-central1-docker.pkg.dev
Docker configuration file updated.
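
`gcloud auth configure-docker` records a credential helper for the registry host in the Docker client configuration, as the output above shows. The sketch below checks that the entry exists, assuming the configuration lives at the default `~/.docker/config.json`; `has_cred_helper` and `load_docker_config` are hypothetical helpers.

```python
import json
from pathlib import Path


def has_cred_helper(config: dict, registry: str, helper: str = "gcloud") -> bool:
    """Return True if `registry` is mapped to `helper` under credHelpers."""
    return config.get("credHelpers", {}).get(registry) == helper


def load_docker_config(path: Path = Path.home() / ".docker" / "config.json") -> dict:
    """Load the Docker client config, or an empty dict if the file is missing."""
    return json.loads(path.read_text()) if path.exists() else {}


# Sample data shaped like the credHelpers section printed above.
sample = {"credHelpers": {"gcr.io": "gcloud", "us-central1-docker.pkg.dev": "gcloud"}}
print(has_cred_helper(sample, "us-central1-docker.pkg.dev"))
```

On the build machine, `has_cred_helper(load_docker_config(), "us-central1-docker.pkg.dev")` performs the same check against the real file.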


### Create a Docker image derived from the NVIDIA Merlin container

In [5]:
%%writefile {MERLIN_CONTAINER}/jupyter_notebook_config.py
c.NotebookApp.ip = '*'
c.NotebookApp.token = ''
c.NotebookApp.password = ''
c.NotebookApp.open_browser = False
c.NotebookApp.port = 8080
c.NotebookApp.terminado_settings = {'shell_command': ['/bin/bash']}
c.NotebookApp.allow_origin_pat = (
r'(^https://8080-dot-[0-9]+-dot-devshell\.appspot\.com$)|'
r'(^https://colab\.research\.google\.com$)|'
r'((https?://)?[0-9a-z]+-dot-(?:us|asia|europe|northamerica|southamerica)-?[0-9a-z]+\.notebooks\.googleusercontent.com)')
c.NotebookApp.allow_remote_access = True
c.NotebookApp.disable_check_xsrf = False

Writing merlin_container/jupyter_notebook_config.py
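
To see which origins the `allow_origin_pat` setting admits, the pattern can be exercised directly with Python's `re` module. This is only a sketch: it assumes the server applies the pattern with `re.match` against the request's `Origin` header, and the sample origins are made up for illustration.

```python
import re

# Same pattern as in jupyter_notebook_config.py, written as raw strings.
ALLOW_ORIGIN_PAT = (
    r'(^https://8080-dot-[0-9]+-dot-devshell\.appspot\.com$)|'
    r'(^https://colab\.research\.google\.com$)|'
    r'((https?://)?[0-9a-z]+-dot-(?:us|asia|europe|northamerica|southamerica)-?[0-9a-z]+'
    r'\.notebooks\.googleusercontent.com)'
)


def origin_allowed(origin: str) -> bool:
    """Return True if the origin matches the pattern from its first character."""
    return re.match(ALLOW_ORIGIN_PAT, origin) is not None


# Made-up origins for illustration only.
samples = [
    "https://colab.research.google.com",
    "https://8080-dot-12345-dot-devshell.appspot.com",
    "https://abc123-dot-us-central1.notebooks.googleusercontent.com",
    "https://example.com",
]
for origin in samples:
    print(origin, origin_allowed(origin))
```

Note that the third alternative has no trailing `$`, so under `re.match` a longer origin that merely starts with a matching prefix is also accepted.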


In [6]:
%%writefile {MERLIN_CONTAINER}/Dockerfile
FROM nvcr.io/nvidia/merlin/merlin-tensorflow:22.07
RUN echo "deb [signed-by=/usr/share/keyrings/cloud.google.gpg] http://packages.cloud.google.com/apt cloud-sdk main" | tee -a /etc/apt/sources.list.d/google-cloud-sdk.list && curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key --keyring /usr/share/keyrings/cloud.google.gpg  add - && apt-get update -y && apt-get install google-cloud-sdk -y
RUN pip install google-cloud-aiplatform google-cloud-pipeline-components google-cloud-bigquery-storage kfp ipykernel
EXPOSE 8080
             
# copy the dependencies file to the working directory
COPY merlin_container/requirements.txt .

# install dependencies
RUN pip install -r requirements.txt
#RUN mkdir /root/.jupyter
             
COPY merlin_container/jupyter_notebook_config.py /root/.jupyter/
USER jupyter
ENV pwd="/home/jupyter"
ENTRYPOINT exec jupyter-lab --ip=0.0.0.0 --port=8080 --no-browser --allow-root --ServerApp.allow_origin="*" --NotebookApp.token="$pwd" --NotebookApp.password="$pwd"

Writing merlin_container/Dockerfile
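
The base image tag in the Dockerfile is hardcoded, while the pushed image name is derived from `IMAGE_ID` and `VERSION`. One way to keep the two in sync is to generate the `FROM` line from the same variables. The sketch below assumes the NGC image path always follows `nvcr.io/nvidia/merlin/merlin-{IMAGE_ID}:{VERSION}`; `render_from_line` and `replace_from_line` are hypothetical helpers.

```python
def render_from_line(image_id: str, version: str) -> str:
    """Build the FROM line for a Merlin base image (assumed path layout)."""
    return f"FROM nvcr.io/nvidia/merlin/merlin-{image_id}:{version}"


def replace_from_line(dockerfile_text: str, image_id: str, version: str) -> str:
    """Swap every FROM line in the Dockerfile text for the rendered one."""
    new_lines = [
        render_from_line(image_id, version) if line.startswith("FROM ") else line
        for line in dockerfile_text.splitlines()
    ]
    return "\n".join(new_lines) + "\n"


original = "FROM nvcr.io/nvidia/merlin/merlin-tensorflow:22.07\nEXPOSE 8080\n"
updated = replace_from_line(original, "tensorflow", "22.08")  # example tag only
print(updated)
```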


In [7]:
%%writefile {MERLIN_CONTAINER}/requirements.txt
gcsfs
gsutil
google-cloud-aiplatform

Writing merlin_container/requirements.txt
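
`google-cloud-aiplatform` appears both in this file and in the `pip install` line of the Dockerfile; the duplicate is harmless but redundant. The sketch below is one quick way to spot such overlaps, assuming plain package names without version specifiers; `duplicated_packages` is a hypothetical helper.

```python
def duplicated_packages(pip_install_line: str, requirements_text: str) -> list:
    """Return packages named in both a `pip install ...` line and a requirements file."""
    # Skip the leading "pip install" tokens and any option flags.
    installed = {
        tok.lower() for tok in pip_install_line.split()[2:] if not tok.startswith("-")
    }
    # Ignore blank lines and comments in the requirements text.
    required = {
        line.strip().lower()
        for line in requirements_text.splitlines()
        if line.strip() and not line.strip().startswith("#")
    }
    return sorted(installed & required)


pip_line = ("pip install google-cloud-aiplatform google-cloud-pipeline-components "
            "google-cloud-bigquery-storage kfp ipykernel")
requirements = "gcsfs\ngsutil\ngoogle-cloud-aiplatform\n"
dupes = duplicated_packages(pip_line, requirements)
print(dupes)
```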


In [8]:
!docker build . -f $MERLIN_CONTAINER/Dockerfile -t $MERLIN_IMAGE_NAME
!docker push $MERLIN_IMAGE_NAME

Sending build context to Docker daemon  2.813MB
Step 1/10 : FROM nvcr.io/nvidia/merlin/merlin-tensorflow:22.07
22.07: Pulling from nvidia/merlin/merlin-tensorflow