## Merlin Setup

The goal of this notebook is to build a user-managed Merlin container image that can be used in Vertex AI Workbench.

### Setup variables, paths, and create artifact registry

In [1]:
VERSION="23.12" # Merlin release tag, matching nvcr.io/nvidia/merlin/merlin-tensorflow:23.12 in the Dockerfile below
REPO_NAME="workbench"
REGION="us-central1"
PROJECT="wortz-project-352116" # TODO: update with your project_id
IMAGE_ID="merlin-training"
MERLIN_IMAGE_NAME=f"{REGION}-docker.pkg.dev/{PROJECT}/{REPO_NAME}/merlin-{IMAGE_ID}-{VERSION}"

MERLIN_CONTAINER="merlin_container"

!mkdir {MERLIN_CONTAINER}

!gcloud beta artifacts repositories create {REPO_NAME} \
    --repository-format=docker \
    --location=$REGION

mkdir: cannot create directory ‘merlin_container’: File exists
ERROR: (gcloud.beta.artifacts.repositories.create) ALREADY_EXISTS: the repository already exists


In [2]:
!gcloud config set project $PROJECT

Updated property [core/project].
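
The image name assembled in the first cell follows the Artifact Registry convention `LOCATION-docker.pkg.dev/PROJECT/REPOSITORY/IMAGE`. A minimal sketch of the same construction as a reusable function is shown below; the helper name `artifact_registry_image` is hypothetical and not part of any Google library.

```python
def artifact_registry_image(region: str, project: str, repo: str, image: str) -> str:
    """Return an Artifact Registry Docker image path (without a tag)."""
    parts = {"region": region, "project": project, "repo": repo, "image": image}
    for name, value in parts.items():
        # Each component is a single path segment, so it must not be empty or contain '/'.
        if not value or "/" in value:
            raise ValueError(f"{name} must be a non-empty string without '/': {value!r}")
    return f"{region}-docker.pkg.dev/{project}/{repo}/{image}"


# Same values as the first cell. Keeping the version as a string avoids
# float formatting surprises (for example, str(23.10) is '23.1').
VERSION = "23.12"
IMAGE_ID = "merlin-training"
image_name = artifact_registry_image(
    "us-central1", "wortz-project-352116", "workbench", f"merlin-{IMAGE_ID}-{VERSION}"
)
print(image_name)
```

Because `IMAGE_ID` already starts with `merlin-`, the resulting image is named `merlin-merlin-training-23.12`, which matches the tag shown in the build output later in this notebook.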


#### Find the service account for your tenant project - usually PROJECT_NUMBER-compute@developer.gserviceaccount.com

In [3]:
# !gcloud projects add-iam-policy-binding hybrid-vertex --member=serviceAccount:xxxxxxxx-compute@developer.gserviceaccount.com --role=roles/artifactregistry.admin
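
The default Compute Engine service account is named after the numeric project number rather than the project ID, as the `679926387543-compute@...` address in the error output further down illustrates. The sketch below shows one way to derive that address; both function names are hypothetical, and `lookup_project_number` assumes an authenticated `gcloud` CLI is on the path.

```python
import subprocess


def default_compute_service_account(project_number: str) -> str:
    """Build the default Compute Engine service account email for a project number."""
    if not project_number.isdigit():
        raise ValueError(f"expected a numeric project number, got {project_number!r}")
    return f"{project_number}-compute@developer.gserviceaccount.com"


def lookup_project_number(project_id: str) -> str:
    """Ask gcloud for the numeric project number (requires an authenticated gcloud)."""
    result = subprocess.run(
        ["gcloud", "projects", "describe", project_id, "--format=value(projectNumber)"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


# "123456789012" is a placeholder project number, not a real project.
print(default_compute_service_account("123456789012"))
```

In the notebook, `default_compute_service_account(lookup_project_number(PROJECT))` would give the value to pass as `--member=serviceAccount:...`.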

In [4]:
# !gcloud auth configure-docker us-central1-docker.pkg.dev --quiet

### Create a Docker image derived from the NVIDIA Merlin container

In [5]:
%%writefile {MERLIN_CONTAINER}/jupyter_notebook_config.py
c.NotebookApp.ip = '*'
c.NotebookApp.token = ''
c.NotebookApp.password = ''
c.NotebookApp.open_browser = False
c.NotebookApp.port = 8080
c.NotebookApp.terminado_settings = {'shell_command': ['/bin/bash']}
c.NotebookApp.allow_origin_pat = (
    r'(^https://8080-dot-[0-9]+-dot-devshell\.appspot\.com$)|'
    r'(^https://colab\.research\.google\.com$)|'
    r'((https?://)?[0-9a-z]+-dot-(?:us|asia|europe|northamerica|southamerica)-?[0-9a-z]+\.notebooks\.googleusercontent\.com)')
c.NotebookApp.allow_remote_access = True
c.NotebookApp.disable_check_xsrf = False
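
Before baking the config into an image, it can help to check which origins `allow_origin_pat` accepts. The sketch below assumes the server applies the pattern with `re.match`-style anchoring at the start of the origin; the sample hostnames are made up for illustration.

```python
import re

# Same pattern as in jupyter_notebook_config.py, written with raw strings
# so the backslashes reach the regex engine unchanged.
ALLOW_ORIGIN_PAT = (
    r'(^https://8080-dot-[0-9]+-dot-devshell\.appspot\.com$)|'
    r'(^https://colab\.research\.google\.com$)|'
    r'((https?://)?[0-9a-z]+-dot-(?:us|asia|europe|northamerica|southamerica)'
    r'-?[0-9a-z]+\.notebooks\.googleusercontent\.com)'
)


def origin_allowed(origin: str) -> bool:
    """Return True if the origin matches the allow-list pattern."""
    return re.match(ALLOW_ORIGIN_PAT, origin) is not None


# Hypothetical origins, used only to exercise each branch of the pattern.
samples = [
    "https://8080-dot-1234567-dot-devshell.appspot.com",
    "https://colab.research.google.com",
    "https://abc123-dot-us-central1.notebooks.googleusercontent.com",
    "https://evil.example.com",
]
for origin in samples:
    print(f"{origin}: {origin_allowed(origin)}")
```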

In [6]:
%%writefile {MERLIN_CONTAINER}/Dockerfile
FROM nvcr.io/nvidia/merlin/merlin-tensorflow:23.12

EXPOSE 8080

COPY merlin_container/requirements.txt .

COPY merlin_container/jupyter_notebook_config.py /root/.jupyter

ENV pwd=""
ENTRYPOINT exec jupyter-lab --ip=0.0.0.0 --port=8080 --no-browser --allow-root --ServerApp.allow_origin="*" --NotebookApp.token="$pwd" --NotebookApp.password="$pwd"

Overwriting merlin_container/Dockerfile


In [7]:
!pwd

/home/jupyter/merlin-gcp


In [8]:
%%writefile {MERLIN_CONTAINER}/requirements.txt
gsutil
gcsfs
matplotlib

Overwriting merlin_container/requirements.txt
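
The `COPY` sources in the Dockerfile are resolved relative to the build context, not to the Dockerfile's own directory, so the paths `merlin_container/...` only resolve when the build runs from the parent directory. The sketch below illustrates this with a small checker; `missing_copy_sources` is a hypothetical helper, it handles only simple single-line `COPY` instructions, and it uses a shortened copy of the Dockerfile above.

```python
import shlex
import tempfile
from pathlib import Path


def missing_copy_sources(dockerfile_text: str, context_dir: Path) -> list:
    """List COPY sources that do not exist relative to the build context."""
    missing = []
    for line in dockerfile_text.splitlines():
        parts = shlex.split(line, comments=True)
        if len(parts) >= 3 and parts[0].upper() == "COPY":
            # Skip flags such as --chown=...; the last argument is the destination.
            sources = [p for p in parts[1:-1] if not p.startswith("--")]
            missing += [s for s in sources if not (context_dir / s).exists()]
    return missing


DOCKERFILE = """\
FROM nvcr.io/nvidia/merlin/merlin-tensorflow:23.12
COPY merlin_container/requirements.txt .
COPY merlin_container/jupyter_notebook_config.py /root/.jupyter
"""

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    container = root / "merlin_container"
    container.mkdir()
    (container / "requirements.txt").write_text("gcsfs\n")
    (container / "jupyter_notebook_config.py").write_text("")
    from_parent = missing_copy_sources(DOCKERFILE, root)       # context = parent directory
    from_subdir = missing_copy_sources(DOCKERFILE, container)  # context = merlin_container/

print(from_parent)
print(from_subdir)
```

With the parent directory as context nothing is missing; with `merlin_container/` itself as context both sources are reported, which is worth keeping in mind when choosing the source directory for a remote build.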


### Quick option: build with Cloud Build

In [9]:
# !gcloud builds submit $MERLIN_CONTAINER --dir $MERLIN_CONTAINER -t $MERLIN_IMAGE_NAME
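# No separate push step is needed: `gcloud builds submit -t` pushes the built image to the registry (`gcloud push` is not a gcloud command).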

#### Or build locally and push

In [10]:
!docker build . -f $MERLIN_CONTAINER/Dockerfile -t $MERLIN_IMAGE_NAME
!docker push $MERLIN_IMAGE_NAME

Sending build context to Docker daemon  3.571MB
Step 1/6 : FROM nvcr.io/nvidia/merlin/merlin-tensorflow:23.12
 ---> 7a1aca1855e2
Step 2/6 : EXPOSE 8080
 ---> Using cache
 ---> 3c4710b6916c
Step 3/6 : COPY merlin_container/requirements.txt .
 ---> Using cache
 ---> 5254c77a85fd
Step 4/6 : COPY merlin_container/jupyter_notebook_config.py /root/.jupyter
 ---> Using cache
 ---> 31eb19a0da47
Step 5/6 : ENV pwd=""
 ---> Using cache
 ---> 0a898a1d9be6
Step 6/6 : ENTRYPOINT exec jupyter-lab --ip=0.0.0.0 --port=8080 --no-browser --allow-root --ServerApp.allow_origin="*" --NotebookApp.token="$pwd" --NotebookApp.password="$pwd"
 ---> Using cache
 ---> 5a655b382881
Successfully built 5a655b382881
Successfully tagged us-central1-docker.pkg.dev/wortz-project-352116/workbench/merlin-merlin-training-23.12:latest
Using default tag: latest
The push refers to repository [us-central1-docker.pkg.dev/wortz-project-352116/workbench/merlin-merlin-training-23.12]

a6ed31a1: Preparing
b4e5d75f: Prepari

# New
_____
### [Updated instructions from 6.8.24](https://cloud.google.com/vertex-ai/docs/workbench/instances/create-custom-container)

#### [Required roles](https://cloud.google.com/vertex-ai/docs/workbench/instances/create-custom-container#permissions)
To ensure that your user account has the necessary permissions to create a Vertex AI Workbench instance, ask your administrator to grant your user account the Notebooks Runner (roles/notebooks.runner) IAM role on the project. For more information about granting roles, see Manage access.

Your administrator might also be able to give your user account the required permissions through custom roles or other predefined roles.

In [11]:
INSTANCE_NAME = 'hybrid-merlin-workbench'

In [12]:
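##### One-time permissions
# Note: the default Compute Engine service account is PROJECT_NUMBER-compute@developer.gserviceaccount.com,
# so the member below needs the project number rather than the project ID held in $PROJECT.
# The caller also needs permission to read and set the project's IAM policy.
! gcloud projects add-iam-policy-binding $PROJECT \
    --member=serviceAccount:$PROJECT-compute@developer.gserviceaccount.com \
    --role=roles/notebooks.runner

ERROR: (gcloud.projects.add-iam-policy-binding) User [679926387543-compute@developer.gserviceaccount.com] does not have permission to access projects instance [wortz-project-352116:getIamPolicy] (or it may not exist): The caller does not have permission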


In [13]:
! gcloud workbench instances create $INSTANCE_NAME \
    --project=$PROJECT \
    --location=$REGION-b \
    --container-repository=$MERLIN_IMAGE_NAME \
    --container-tag=latest \
    --machine-type=n1-standard-2 \
    --accelerator-type=NVIDIA_TESLA_T4 \
    --accelerator-core-count=1

Waiting for operation on Instance [hybrid-merlin-workbench] to be created with [projects/wortz-project-352116/locations/us-central1-b/operations/operation-1718663469672-61b1d8599267d-06fc42d5-45f93865]...done.
Created workbench instance hybrid-merlin-workbench [https://notebooks.googleapis.com/v2/projects/wortz-project-352116/locations/us-central1-b/operations/operation-1718663469672-61b1d8599267d-06fc42d5-45f93865].
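
If the instance needs to be created from a script rather than a `!` cell, the same command can be assembled as an argument list. This is a sketch only: `workbench_create_command` is a hypothetical helper, and it uses exactly the flags from the cell above with the same default values.

```python
import shlex


def workbench_create_command(
    instance: str,
    project: str,
    zone: str,
    image: str,
    tag: str = "latest",
    machine_type: str = "n1-standard-2",
    accelerator_type: str = "NVIDIA_TESLA_T4",
    accelerator_count: int = 1,
) -> list:
    """Build the argv list for `gcloud workbench instances create`."""
    return [
        "gcloud", "workbench", "instances", "create", instance,
        f"--project={project}",
        f"--location={zone}",
        f"--container-repository={image}",
        f"--container-tag={tag}",
        f"--machine-type={machine_type}",
        f"--accelerator-type={accelerator_type}",
        f"--accelerator-core-count={accelerator_count}",
    ]


cmd = workbench_create_command(
    instance="hybrid-merlin-workbench",
    project="wortz-project-352116",
    zone="us-central1-b",  # the cell above derives this as $REGION-b
    image="us-central1-docker.pkg.dev/wortz-project-352116/workbench/merlin-merlin-training-23.12",
)
print(shlex.join(cmd))
# To actually create the instance: subprocess.run(cmd, check=True)
```

Passing a list to `subprocess.run` avoids shell quoting issues when names contain unusual characters.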


#### You should now be able to start the Workbench instance and follow the [Merlin MovieLens getting-started example](https://nvidia-merlin.github.io/Merlin/stable/examples/getting-started-movielens/01-Download-Convert.html)


```python
# External dependencies
import os

from merlin.core.utils import download_file

# Get dataframe library - cudf or pandas
from merlin.core.dispatch import get_lib
df_lib = get_lib()
```
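
The comment in the cell above describes the dataframe library as "cudf or pandas". As a rough illustration of that kind of fallback (this is not Merlin's implementation, and `pick_dataframe_backend` is a hypothetical name), the sketch below reports which of the two libraries is importable in the current environment without actually importing it.

```python
import importlib.util


def pick_dataframe_backend(preferred=("cudf", "pandas")):
    """Return the name of the first importable library in `preferred`, or None."""
    for name in preferred:
        # find_spec returns None when a top-level module cannot be found.
        if importlib.util.find_spec(name) is not None:
            return name
    return None


backend = pick_dataframe_backend()
print(f"dataframe backend available: {backend}")
```

On a GPU-enabled Merlin container one would expect `cudf` to be reported; on a CPU-only environment the result depends on what is installed.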