# Python Custom Containers
### IN ACTIVE DEVELOPMENT - not complete

Containers are helpful.

> TL;DR
> Use containers to bring together **software** and training **code** so that you can easily launch jobs on different **compute** with different **parameters** to simplify the operations of ML training.

At the point in our workflow where we train an ML model a lot of things come together to make it happen:
- **compute** is running: CPUs, memory, networking, GPUs on one or more instances
- **software** is running on the compute
    - the required packages are installed with the software
- training **code**/script is launched with the software
- training **data** is read by the training code
- **parameters** that the code uses to configure the training run

It's tempting to develop in an IDE, like JupyterLab here, and then just make the VM behind it much larger.  The notebook here is running in JupyterLab hosted on **compute** running **software** and is being used to author **code** that reads **data** using **parameters** set as Python variables.  One of the issues with this is that typing these words just cost `$$$$` and this instance might not be able to run this notebook 10 times in parallel with different **parameters**.  

A better way?  Keep using an environment like this to develop our **code** and make sure it works, but use smaller **compute** and **data** during development.  Then launch a separate, managed job that runs the full training.  How? What if we could instruct a service to take the list of inputs above, run a job, and only charge for the compute used for the duration of training? That is exactly what Vertex AI Training is used for.  It also makes the natural next steps in scaling training easier:
- specify distributed training, pools of compute instances
- manage hyperparameter tuning with multiple parallel training jobs focusing in on the right values for hyperparameters
- run many training jobs at the same time without managing compute while still controlling the cost of scaling

Vertex AI has a [list of provided pre-built training containers](https://cloud.google.com/vertex-ai/docs/training/pre-built-containers) for the most popular frameworks.  They are made available in multiple release versions of common frameworks, both with and without [CUDA](https://developer.nvidia.com/cuda-toolkit) already configured and set up for GPU-based training.

For Vertex AI Training Custom Jobs you:
- specify the **compute** to use as input parameters or as worker pool specs
- provide a URI for a container with the **software** to use on each worker
- provide training **code** in one of three ways
    - as a link to a Python script (file.py)
    - as a GCS URI for a Python Source Distribution
    - as a starting point to code already included on the container with the **software**
- provide **data**
    - as a **parameter** specifying the location the **code** can use to read/retrieve it
    - or build the logic for connecting to the data source into the **code**
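
As a rough illustration of how these inputs fit together, the sketch below builds one worker pool spec as a plain dictionary.  The field names (`machine_spec`, `replica_count`, `container_spec`, `image_uri`, `args`) follow the Vertex AI worker pool spec layout; the helper name `make_worker_pool_spec`, the machine type, the image URI, and the arguments are all hypothetical values chosen for this example.

```python
from typing import Dict, List, Optional


def make_worker_pool_spec(
    image_uri: str,
    machine_type: str = "n1-standard-4",
    replica_count: int = 1,
    args: Optional[List[str]] = None,
) -> Dict:
    """Build one worker pool spec as a plain dictionary.

    compute    -> machine_spec and replica_count
    software   -> container_spec.image_uri (the custom container)
    parameters -> container_spec.args (passed to the container entrypoint)
    """
    return {
        "machine_spec": {"machine_type": machine_type},
        "replica_count": replica_count,
        "container_spec": {
            "image_uri": image_uri,
            "args": list(args or []),
        },
    }


# hypothetical image URI and parameters, for illustration only
spec = make_worker_pool_spec(
    image_uri="us-central1-docker.pkg.dev/my-project/my-repo/my-trainer",
    args=["--epochs=10", "--batch_size=64"],
)
print(spec)
```

A list containing a dictionary like this is the kind of value that could be passed as `worker_pool_specs` when creating an `aiplatform.CustomJob`; the training notebook linked in the workflows below covers launching jobs.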

If we learn the skill of building a derivative container that packages our desired **software**, installs additional packages, and also holds a copy of our **code**, and maybe even our **parameters**, then this ML training job becomes very simple to incorporate in our workflow!

That is this notebook's goal.

---
## Notes:

**Prerequisite:**

This notebook depends on ML training code prepared in multiple forms by the [Python Packages](./Python%20Packages.ipynb) notebook.  Please run that notebook first before running this notebook. 

**We will use [Cloud Build](https://cloud.google.com/build) to construct containers.**

- [API Overview](https://cloud.google.com/build/docs/api)
    - REST API, gcloud CLI, and Client Libraries for Go, Java, Node.js, and Python
- [Python Client for Cloud Build API](https://github.com/googleapis/python-cloudbuild)
- [Python Client Library Documentation](https://cloud.google.com/python/docs/reference/cloudbuild/latest)

**We will store built containers in [Artifact Registry](https://cloud.google.com/artifact-registry).**

- [API Overview](https://cloud.google.com/artifact-registry/docs/apis)
- [Python Client for Artifact Registry API](https://github.com/googleapis/python-artifact-registry)
- [Python Client Library Documentation](https://cloud.google.com/python/docs/reference/artifactregistry/latest)

**Notes on Python and Google Cloud:**

Google Cloud APIs can be used with the [Google Cloud Python Client](https://github.com/googleapis/google-cloud-python).  The client has [libraries](https://github.com/googleapis/google-cloud-python#libraries) for Google Cloud services.  The documentation for each library is centralized in the [Python Cloud Client Libraries](https://cloud.google.com/python/docs/reference) reference documentation.
- Also helpful: [Getting started with Python](https://cloud.google.com/python/docs/getting-started) in Google Cloud

---
## Setup

### Package Installs (if needed)

This notebook uses the Python Clients for
- Google Service Usage
    - to enable APIs (Artifact Registry and Cloud Build)
- Artifact Registry
    - to create repositories for Python packages and Docker containers
- Cloud Build
    - to build custom Docker containers

The cells below check whether the required Python libraries are installed.  If a library is missing, the cell prints a message naming the package and runs the associated `pip install` command.  These installs must be completed before continuing with this notebook.

In [135]:
try:
    import google.cloud.service_usage_v1
except ImportError:
    print('You need to pip install google-cloud-service-usage')
    !pip install google-cloud-service-usage -q

In [136]:
try:
    import google.cloud.artifactregistry_v1
except ImportError:
    print('You need to pip install google-cloud-artifact-registry')
    !pip install google-cloud-artifact-registry -q

In [137]:
try:
    import google.cloud.devtools.cloudbuild
except ImportError:
    print('You need to pip install google-cloud-build')
    !pip install google-cloud-build

### Environment

inputs:

In [138]:
project = !gcloud config get-value project
PROJECT_ID = project[0]
PROJECT_ID

'statmike-mlops-349915'

In [139]:
REGION = 'us-central1'
EXPERIMENT = 'containers'
SERIES = 'tips'

packages:

In [204]:
import os, shutil
import pkg_resources
from google.cloud import service_usage_v1
from google.cloud.devtools import cloudbuild_v1
from google.cloud import artifactregistry_v1
from google.cloud import aiplatform
from google.cloud import storage

clients:

In [141]:
su_client = service_usage_v1.ServiceUsageClient()
ar_client = artifactregistry_v1.ArtifactRegistryClient()
cb_client = cloudbuild_v1.CloudBuildClient()
aiplatform.init(project = PROJECT_ID, location = REGION)
gcs = storage.Client()

parameters:

In [142]:
DIR = f'temp/{EXPERIMENT}'

environment:

In [143]:
# remove directory named DIR if exists
shutil.rmtree(DIR, ignore_errors = True)

# create directory DIR
os.makedirs(DIR)

# check for existence of DIR
print('DIR exists? ', os.path.exists(DIR))

# list contents of directory one level higher than DIR
os.listdir(DIR + '/../')

DIR exists?  True


['job-parms', 'gcs', 'containers', 'multiprocess', 'packages']

---
## Enable APIs

Using Cloud Build and Artifact Registry requires enabling these APIs for the Google Cloud Project.

Options for enabling these APIs.  In this notebook, option (2) is used.
 1. Use the APIs & Services page in the console: https://console.cloud.google.com/apis
     - `+ Enable APIs and Services`
     - Search for Cloud Build and Enable
     - Search for Artifact Registry and Enable
 2. Use [Google Service Usage](https://cloud.google.com/service-usage/docs) API from Python
     - [Python Client For Service Usage](https://github.com/googleapis/python-service-usage)
     - [Python Client Library Documentation](https://cloud.google.com/python/docs/reference/serviceusage/latest)
     
The following code cells use the Service Usage Client to:
- get the state of the service
- if 'DISABLED':
    - Try enabling the service and return the state after trying
- if 'ENABLED' print the state for confirmation

### Artifact Registry

In [144]:
artifactregistry = su_client.get_service(
    request = service_usage_v1.GetServiceRequest(
        name = f'projects/{PROJECT_ID}/services/artifactregistry.googleapis.com'
    )
).state.name


if artifactregistry == 'DISABLED':
    print(f'Artifact Registry is currently {artifactregistry} for project: {PROJECT_ID}')
    print(f'Trying to Enable...')
    operation = su_client.enable_service(
        request = service_usage_v1.EnableServiceRequest(
            name = f'projects/{PROJECT_ID}/services/artifactregistry.googleapis.com'
        )
    )
    response = operation.result()
    if response.service.state.name == 'ENABLED':
        print(f'Artifact Registry is now enabled for project: {PROJECT_ID}')
    else:
        print(response)
else:
    print(f'Artifact Registry already enabled for project: {PROJECT_ID}')

Artifact Registry already enabled for project: statmike-mlops-349915


### Cloud Build

In [145]:
cloudbuild = su_client.get_service(
    request = service_usage_v1.GetServiceRequest(
        name = f'projects/{PROJECT_ID}/services/cloudbuild.googleapis.com'
    )
).state.name


if cloudbuild == 'DISABLED':
    print(f'Cloud Build is currently {cloudbuild} for project: {PROJECT_ID}')
    print(f'Trying to Enable...')
    operation = su_client.enable_service(
        request = service_usage_v1.EnableServiceRequest(
            name = f'projects/{PROJECT_ID}/services/cloudbuild.googleapis.com'
        )
    )
    response = operation.result()
    if response.service.state.name == 'ENABLED':
        print(f'Cloud Build is now enabled for project: {PROJECT_ID}')
    else:
        print(response)
else:
    print(f'Cloud Build already enabled for project: {PROJECT_ID}')

Cloud Build already enabled for project: statmike-mlops-349915


---
## Setup Artifact Registry

Artifact Registry organizes artifacts with repositories.  Each repository contains packages and is designated to hold a particular format of package: Docker images, Python Packages and [others](https://cloud.google.com/artifact-registry/docs/supported-formats#package).

### List Repositories

This may be empty if no repositories have been created for this project

In [146]:
for repo in ar_client.list_repositories(parent = f'projects/{PROJECT_ID}/locations/{REGION}'):
    print(repo.name)

projects/statmike-mlops-349915/locations/us-central1/repositories/statmike-mlops-349915
projects/statmike-mlops-349915/locations/us-central1/repositories/statmike-mlops-349915-python


In [147]:
repo

name: "projects/statmike-mlops-349915/locations/us-central1/repositories/statmike-mlops-349915-python"
format_: PYTHON
description: "A repository for the statmike-mlops-349915 experiment that holds Python Packages."
labels {
  key: "experiment"
  value: "packages"
}
labels {
  key: "series"
  value: "tips"
}
create_time {
  seconds: 1663696036
  nanos: 276742000
}
update_time {
  seconds: 1663768642
  nanos: 771288000
}

### Create Docker Image Repository

Create an Artifact Registry Repository to hold Docker Images created by this notebook.

In [148]:
docker_repo = None
for repo in ar_client.list_repositories(parent = f'projects/{PROJECT_ID}/locations/{REGION}'):
    if f'{PROJECT_ID}-docker' in repo.name:
        docker_repo = repo
        print(f'Retrieved existing repo: {docker_repo.name}')

if not docker_repo:
    operation = ar_client.create_repository(
        request = artifactregistry_v1.CreateRepositoryRequest(
            parent = f'projects/{PROJECT_ID}/locations/{REGION}',
            repository_id = f'{PROJECT_ID}-docker',
            repository = artifactregistry_v1.Repository(
                description = f'A repository for the {EXPERIMENT} experiment that holds docker images.',
                name = f'{PROJECT_ID}-docker',
                format_ = artifactregistry_v1.Repository.Format.DOCKER,
                labels = {'series': SERIES, 'experiment': EXPERIMENT}
            )
        )
    )
    print('Creating Repository ...')
    docker_repo = operation.result()
    print(f'Completed creating repo: {docker_repo.name}')

Creating Repository ...
Completed creating repo: projects/statmike-mlops-349915/locations/us-central1/repositories/statmike-mlops-349915-docker


In [149]:
docker_repo.name, docker_repo.format_.name

('projects/statmike-mlops-349915/locations/us-central1/repositories/statmike-mlops-349915-docker',
 'DOCKER')
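
The lookup above matches a repository when its ID appears anywhere in the resource name, so a repository such as `...-docker-archive` would also match.  The sketch below shows a stricter variant that compares the final path segment exactly.  The helper name `find_repository` and the sample names are hypothetical; the objects only need a `.name` attribute shaped like the repository names listed earlier.

```python
from types import SimpleNamespace
from typing import Iterable, Optional


def find_repository(repos: Iterable, repository_id: str) -> Optional[object]:
    """Return the first repo whose resource name ends with /repositories/<repository_id>."""
    suffix = f"/repositories/{repository_id}"
    for repo in repos:
        if repo.name.endswith(suffix):
            return repo
    return None


# stand-in objects shaped like the Repository messages listed above
repos = [
    SimpleNamespace(name="projects/p/locations/us-central1/repositories/p-docker-archive"),
    SimpleNamespace(name="projects/p/locations/us-central1/repositories/p-docker"),
]
match = find_repository(repos, "p-docker")
print(match.name)
```

In the notebook, `repos` would be the result of `ar_client.list_repositories(...)` and `repository_id` would be `f'{PROJECT_ID}-docker'`.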

### Create Python Package Repository

Create an Artifact Registry Repository to hold Python Packages created by this notebook.

In [150]:
python_repo = None
for repo in ar_client.list_repositories(parent = f'projects/{PROJECT_ID}/locations/{REGION}'):
    if f'{PROJECT_ID}-python' in repo.name:
        python_repo = repo
        print(f'Retrieved existing repo: {python_repo.name}')

if not python_repo:
    operation = ar_client.create_repository(
        request = artifactregistry_v1.CreateRepositoryRequest(
            parent = f'projects/{PROJECT_ID}/locations/{REGION}',
            repository_id = f'{PROJECT_ID}-python',
            repository = artifactregistry_v1.Repository(
                description = f'A repository for the {PROJECT_ID} experiment that holds Python Packages.',
                name = f'{PROJECT_ID}-python',
                format_ = artifactregistry_v1.Repository.Format.PYTHON,
                labels = {'series': SERIES, 'experiment': EXPERIMENT}
            )
        )
    )
    print('Creating Repository ...')
    python_repo = operation.result()
    print(f'Completed creating repo: {python_repo.name}')

Retrieved existing repo: projects/statmike-mlops-349915/locations/us-central1/repositories/statmike-mlops-349915-python


In [151]:
python_repo.name, python_repo.format_.name

('projects/statmike-mlops-349915/locations/us-central1/repositories/statmike-mlops-349915-python',
 'PYTHON')

### List Repositories

In [152]:
for repo in ar_client.list_repositories(parent = f'projects/{PROJECT_ID}/locations/{REGION}'):
    print(repo.name)

projects/statmike-mlops-349915/locations/us-central1/repositories/statmike-mlops-349915
projects/statmike-mlops-349915/locations/us-central1/repositories/statmike-mlops-349915-docker
projects/statmike-mlops-349915/locations/us-central1/repositories/statmike-mlops-349915-python


---
## Training Code

ML Training code can take the form of a single script, a folder of scripts/modules, or a Python Package distribution file. These could be located on the local disk, in GCS buckets, in a GitHub repository, or in a repository like Artifact Registry.  This notebook will explore many workflows for getting training code in all these forms and locations into a custom container.  

The example training code used here is from the [05 - TensorFlow](../05%20-%20TensorFlow/readme.md) series, which has a model training file named [05_train.py](../05%20-%20TensorFlow/05_train.py).

This training code has already been prepared as local files, GCS hosted files, and GitHub hosted files, and the source distribution is stored in Artifact Registry as a Python Package.

**Please pause here and review and run the tip in the [Python Packages](./Python%20Packages.ipynb) notebook first to prepare these versions for use in this notebook.**

### Local: Files/Folders

In [153]:
for root, dirs, files in os.walk('code'):
    for f in files:
        print(os.path.join(root, f))

code/tips_trainer/pyproject.toml
code/tips_trainer/.ipynb_checkpoints/pyproject-checkpoint.toml
code/tips_trainer/src/tips_trainer/__init__.py
code/tips_trainer/src/tips_trainer/train.py
code/tips_trainer/src/tips_trainer.egg-info/top_level.txt
code/tips_trainer/src/tips_trainer.egg-info/SOURCES.txt
code/tips_trainer/src/tips_trainer.egg-info/requires.txt
code/tips_trainer/src/tips_trainer.egg-info/dependency_links.txt
code/tips_trainer/src/tips_trainer.egg-info/PKG-INFO
code/tips_trainer/dist/tips_trainer-0.1.tar.gz
code/tips_trainer/dist/tips_trainer-0.1-py3-none-any.whl


### GCS Bucket: Files/Folders/Source Distributions

In [154]:
bucket = gcs.lookup_bucket(PROJECT_ID)

In [155]:
for blob in list(bucket.list_blobs(prefix = f'{SERIES}/code')):
    print(blob.name)

tips/code/tips_trainer/.ipynb_checkpoints/pyproject-checkpoint.toml
tips/code/tips_trainer/dist/tips_trainer-0.1-py3-none-any.whl
tips/code/tips_trainer/dist/tips_trainer-0.1.tar.gz
tips/code/tips_trainer/pyproject.toml
tips/code/tips_trainer/src/tips_trainer.egg-info/PKG-INFO
tips/code/tips_trainer/src/tips_trainer.egg-info/SOURCES.txt
tips/code/tips_trainer/src/tips_trainer.egg-info/dependency_links.txt
tips/code/tips_trainer/src/tips_trainer.egg-info/requires.txt
tips/code/tips_trainer/src/tips_trainer.egg-info/top_level.txt
tips/code/tips_trainer/src/tips_trainer/__init__.py
tips/code/tips_trainer/src/tips_trainer/train.py


### Artifact Registry: Python Package Distributions

In [156]:
ar_client.list_packages(
    parent = python_repo.name
)

ListPackagesPager<packages {
  name: "projects/statmike-mlops-349915/locations/us-central1/repositories/statmike-mlops-349915-python/packages/tips-trainer"
  create_time {
    seconds: 1663768642
    nanos: 502115000
  }
  update_time {
    seconds: 1663768642
    nanos: 771288000
  }
}
>

---
## Creating a Custom Container with Cloud Build

Cloud Build creates and manages the build on GCP.  The API creates a build by providing:
- location of the source
- instructions
- location to store the built artifacts

The instruction part of Cloud Build has options:
- Dockerfile
- Build Config file (YAML or JSON)
- Cloud Native Buildpacks

This notebook uses the Python Client for Cloud Build and does not reference any local files.  For that reason, the first step is creating a Dockerfile for the workflow and storing it in GCS.  The next step is running Cloud Build and using the client to specify the build config rather than a config file.  The steps of the build config start with getting the code (git clone, or copy from GCS) and copying the Dockerfile.
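
The build config described above can be sketched as a plain dictionary with two steps (copy the source from GCS, then run `docker build`) plus the list of images to push.  The builder image names match the ones used in the workflows below; the helper name `make_build_config` and the bucket, path, and image values are hypothetical.

```python
from typing import Dict, List


def make_build_config(bucket: str, source_path: str, image_uri: str) -> Dict[str, List]:
    """Describe a two-step build: copy the source from GCS, then docker build."""
    return {
        "steps": [
            {
                # copy Dockerfile, requirements, and code into the build workspace
                "name": "gcr.io/cloud-builders/gsutil",
                "args": ["cp", "-r", f"gs://{bucket}/{source_path}/*", "/workspace"],
            },
            {
                # build the image from the Dockerfile found in /workspace
                "name": "gcr.io/cloud-builders/docker",
                "args": ["build", "-t", image_uri, "/workspace"],
            },
        ],
        # images listed here are pushed after all steps succeed
        "images": [image_uri],
    }


# hypothetical values, for illustration only
config = make_build_config(
    bucket="my-bucket",
    source_path="tips/containers/workflow_1",
    image_uri="us-central1-docker.pkg.dev/my-project/my-repo/tips_trainer_workflow_1",
)
print(config)
```

The keys mirror the fields of the `Build` message, so a dictionary like this could likely be expanded into `cloudbuild_v1.Build(**config)`; treat that as an assumption to verify against the client library documentation.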

Other options:
If you store the Dockerfile(s) in a repository folder, then you could go as far as having GitHub trigger a build on commit.  It could also simplify the process below by not needing the extra steps of storing the Dockerfile separately and copying it before building.  

There are multiple stand-alone subsections here that illustrate different workflows for building a custom container depending on where and how the training code is stored.

- [Workflow 1: copy script to container](#workflow1)
- [Workflow 2: copy folder to container](#workflow2)
- [Workflow 3: copy package to container](#workflow3)
- [Workflow 4: pip install package from GCS to container](#workflow4)
- [Workflow 5: pip install package from GitHub to container](#workflow5)
- [Workflow 6: pip install package from Artifact Registry to container](#workflow6)

### Common Parameters for All Workflows

Choose a Vertex AI Pre-Built container for ML Training from the list [here](https://cloud.google.com/vertex-ai/docs/training/pre-built-containers) and store it in the `TRAIN_IMAGE` parameter below.  Also, create a parameter that references the Artifact Registry repository for Docker images created above. 

In [291]:
TRAIN_IMAGE = 'us-docker.pkg.dev/vertex-ai/training/tf-cpu.2-7:latest'
REPOSITORY = f"{REGION}-docker.pkg.dev/{PROJECT_ID}/{docker_repo.name.split('/')[-1]}"
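
The `REPOSITORY` value combines the region, project, and repository ID into the Docker host path used by Artifact Registry.  As a small sketch, the same value can be derived entirely from the repository's resource name, which avoids relying on separate `REGION` and `PROJECT_ID` variables staying in sync with it.  The helper name `docker_repository_uri` and the sample resource name are hypothetical.

```python
def docker_repository_uri(resource_name: str) -> str:
    """Convert 'projects/P/locations/L/repositories/R' to 'L-docker.pkg.dev/P/R'."""
    parts = resource_name.split("/")
    if len(parts) != 6 or parts[0::2] != ["projects", "locations", "repositories"]:
        raise ValueError(f"unexpected repository resource name: {resource_name!r}")
    _, project, _, location, _, repository = parts
    return f"{location}-docker.pkg.dev/{project}/{repository}"


# hypothetical resource name, shaped like docker_repo.name above
uri = docker_repository_uri(
    "projects/my-project/locations/us-central1/repositories/my-project-docker"
)
print(uri)
```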

---
### Workflow 1: copy script to container
<a id = 'workflow1'></a>

This workflow builds a custom container using a Vertex AI Pre-Built container for training as the base.  The code below stores the `requirements.txt`, `Dockerfile` and training script in the project GCS bucket and then launches a Cloud Build job that copies these files and runs the Docker build.

To see an example Vertex AI Training Custom Job running with the custom container created here go to [Python Training - workflow 1](./Python%20Training.ipynb#workflow1)

In [247]:
WORKFLOW = 'workflow_1' # name for this workflow
CONTAINER = f'tips_trainer_{WORKFLOW}' # name for custom container
SOURCEPATH = f'{SERIES}/{EXPERIMENT}/{WORKFLOW}' # in gcs

#### Source Code
A copy of just the training code script that will be copied into the custom container.

In [232]:
bucket.copy_blob(
    blob = bucket.get_blob(blob_name = f'{SERIES}/code/tips_trainer/src/tips_trainer/train.py'),
    destination_bucket = bucket,
    new_name = f'{SOURCEPATH}/train.py'
)

<Blob: statmike-mlops-349915, tips/containers/workflow_1/train.py, 1663861926053053>

#### Requirements.txt
A list of requirements for the training code to run.  These packages will be added/updated on the pre-built container when the `Dockerfile` instructions run `pip install ...`.

In [250]:
requirements = f"""
tensorflow_io
google-cloud-aiplatform>={aiplatform.__version__}
protobuf=={pkg_resources.get_distribution('protobuf').version}
"""

In [251]:
blob = bucket.blob(f'{SOURCEPATH}/requirements.txt')
blob.upload_from_string(requirements)
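
The cell above reads the installed `protobuf` version with `pkg_resources`, which setuptools has deprecated as an API.  A minimal alternative sketch using the standard library's `importlib.metadata` follows.  The helper name `pinned_requirements` is hypothetical, and it only covers the exact-pin (`==`) case; packages that are not installed locally are listed without a pin.

```python
from importlib import metadata
from typing import Callable, Iterable


def pinned_requirements(
    packages: Iterable[str],
    lookup: Callable[[str], str] = metadata.version,
) -> str:
    """Return requirements.txt text, pinning each package to the locally installed version.

    Packages that are not installed locally are listed without a pin.
    """
    lines = []
    for package in packages:
        try:
            lines.append(f"{package}=={lookup(package)}")
        except metadata.PackageNotFoundError:
            lines.append(package)
    return "\n".join(lines) + "\n"


# the second name is deliberately not a real installed package
requirements_text = pinned_requirements(["protobuf", "a-package-that-is-not-installed"])
print(requirements_text)
```

The resulting text could be uploaded with `blob.upload_from_string(...)` in the same way as the `requirements` string above.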

#### Dockerfile
The Docker build instructions.  This copies the `requirements.txt` and training script into the container, runs `pip install ...` for the requirements, and sets the entrypoint for the container to the training script.

In [248]:
dockerfile = f"""
FROM {TRAIN_IMAGE}
WORKDIR /training
# copy requirements and install them
COPY requirements.txt ./
RUN pip install --no-cache-dir --upgrade pip \
  && pip install --no-cache-dir -r requirements.txt
## Copies the trainer code to the docker image
COPY train.py .
## Sets up the entry point to invoke the trainer
ENTRYPOINT ["python", "train.py"]
"""

In [249]:
blob = bucket.blob(f'{SOURCEPATH}/Dockerfile')
blob.upload_from_string(dockerfile)
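
One detail of the `dockerfile` string above: inside a regular (non-raw) triple-quoted Python string, a backslash followed by a newline is consumed as a line continuation, so the uploaded file has the whole `RUN` command on one line.  That is still a valid Dockerfile.  The sketch below renders the same instructions from parameters and escapes the backslash so the continuation survives in the output.  The helper name `render_dockerfile` and its parameters are hypothetical.

```python
import json
from typing import List


def render_dockerfile(
    base_image: str, copy_source: str, copy_dest: str, entrypoint: List[str]
) -> str:
    """Render Dockerfile text for a trainer image built on top of `base_image`."""
    lines = [
        f"FROM {base_image}",
        "WORKDIR /training",
        "# copy requirements and install them",
        "COPY requirements.txt ./",
        # the doubled backslash emits a single literal backslash in the file
        "RUN pip install --no-cache-dir --upgrade pip \\",
        "  && pip install --no-cache-dir -r requirements.txt",
        "# copy the trainer code into the image",
        f"COPY {copy_source} {copy_dest}",
        "# exec-form entrypoint that invokes the trainer",
        f"ENTRYPOINT {json.dumps(entrypoint)}",
    ]
    return "\n".join(lines) + "\n"


dockerfile_text = render_dockerfile(
    base_image="us-docker.pkg.dev/vertex-ai/training/tf-cpu.2-7:latest",
    copy_source="train.py",
    copy_dest=".",
    entrypoint=["python", "train.py"],
)
print(dockerfile_text)
```

Using `json.dumps` for the entrypoint keeps it in the JSON-array (exec) form that Docker expects, with double quotes around each element.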

#### Build Custom Container
Use the Cloud Build client to construct and run the build instructions.  Here the files collected in GCS are copied to the build instance, then the Docker build is run in the folder with the `Dockerfile`.  The resulting image is pushed to Artifact Registry (set up above).

In [280]:
# setup the build config with empty list of steps - these will be added sequentially
build = cloudbuild_v1.Build(
    steps = []
)
# retrieve the source
build.steps.append(
    {
        'name': 'gcr.io/cloud-builders/gsutil',
        'args': ['cp', '-r', f'gs://{PROJECT_ID}/{SOURCEPATH}/*', '/workspace']
    }
)
# docker build
build.steps.append(
    {
        'name': 'gcr.io/cloud-builders/docker',
        'args': ['build', '-t', f'{REPOSITORY}/{CONTAINER}', '/workspace']
    }    
)
# docker push: images listed here are pushed to the repository after the steps complete
build.images = [f"{REPOSITORY}/{CONTAINER}"]

In [284]:
build

steps {
  name: "gcr.io/cloud-builders/gsutil"
  args: "cp"
  args: "-r"
  args: "gs://statmike-mlops-349915/tips/containers/workflow_1/*"
  args: "/workspace"
}
steps {
  name: "gcr.io/cloud-builders/docker"
  args: "build"
  args: "-t"
  args: "us-central1-docker.pkg.dev/statmike-mlops-349915/statmike-mlops-349915-docker/tips_trainer_workflow_1"
  args: "/workspace"
}
images: "us-central1-docker.pkg.dev/statmike-mlops-349915/statmike-mlops-349915-docker/tips_trainer_workflow_1"

In [285]:
operation = cb_client.create_build(
    project_id = PROJECT_ID,
    build = build
)

In [292]:
response = operation.result()
response.status, response.artifacts

(<Status.SUCCESS: 3>,
 images: "us-central1-docker.pkg.dev/statmike-mlops-349915/statmike-mlops-349915-docker/tips_trainer_workflow_1")

In [290]:
print(f"Review the Custom Container with Artifact Registry in the Google Cloud Console:\nhttps://console.cloud.google.com/artifacts/docker/{PROJECT_ID}/{REGION}/{PROJECT_ID}-docker?project={PROJECT_ID}")

Review the Custom Container with Artifact Registry in the Google Cloud Console:
https://console.cloud.google.com/artifacts/docker/statmike-mlops-349915/us-central1/statmike-mlops-349915-docker?project=statmike-mlops-349915


### Workflow 2: copy folder to container
<a id = 'workflow2'></a>

This workflow builds a custom container using a Vertex AI Pre-Built container for training as the base.  The code below stores the `requirements.txt`, `Dockerfile` and training script folder in the project GCS bucket and then launches a Cloud Build job that copies these files and runs the Docker build.

To see an example Vertex AI Training Custom Job running with the custom container created here go to [Python Training - workflow 2](./Python%20Training.ipynb#workflow2)

In [293]:
WORKFLOW = 'workflow_2' # name for this workflow
CONTAINER = f'tips_trainer_{WORKFLOW}' # name for custom container
SOURCEPATH = f'{SERIES}/{EXPERIMENT}/{WORKFLOW}' # in gcs

#### Source Code
A copy of the folder of training code that will be copied into the custom container.

In [300]:
for blob in list(bucket.list_blobs(prefix = f'{SERIES}/code/tips_trainer/src/tips_trainer/')):
    foldername = '/'.join(blob.name.split('/')[-2:])
    bucket.copy_blob(
        blob = blob,
        destination_bucket = bucket,
        new_name = f"{SOURCEPATH}/{foldername}"
    )

In [301]:
for blob in list(bucket.list_blobs(prefix = SOURCEPATH)):
    print(blob.name)

tips/containers/workflow_2/tips_trainer/__init__.py
tips/containers/workflow_2/tips_trainer/train.py
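
The copy loop above keeps only the last two path components of each blob name, which works for a flat package folder but would flatten any nested subfolders.  A minimal sketch of a naming helper that preserves the full path below the package folder is shown here.  The helper name `destination_name` and the nested `utils/io.py` example are hypothetical.

```python
def destination_name(blob_name: str, source_root: str, dest_root: str) -> str:
    """Map a blob under `source_root` to `dest_root`, keeping the last folder of
    `source_root` and everything below it."""
    source_root = source_root.rstrip("/")
    if not blob_name.startswith(source_root + "/"):
        raise ValueError(f"{blob_name!r} is not under {source_root!r}")
    # parent is everything before the final folder of source_root ('' if none)
    parent, _, _ = source_root.rpartition("/")
    relative = blob_name[len(parent) + 1:] if parent else blob_name
    return f"{dest_root.rstrip('/')}/{relative}"


source_root = "tips/code/tips_trainer/src/tips_trainer"
dest_root = "tips/containers/workflow_2"
flat = destination_name(f"{source_root}/train.py", source_root, dest_root)
nested = destination_name(f"{source_root}/utils/io.py", source_root, dest_root)
print(flat)
print(nested)
```

The returned string could be used as the `new_name` argument in the `bucket.copy_blob(...)` call.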


#### Requirements.txt
A list of requirements for the training code to run.  These packages will be added/updated on the pre-built container when the `Dockerfile` instructions run `pip install ...`.

In [361]:
requirements = f"""
tensorflow_io
google-cloud-aiplatform>={aiplatform.__version__}
protobuf=={pkg_resources.get_distribution('protobuf').version}
"""

In [362]:
blob = bucket.blob(f'{SOURCEPATH}/requirements.txt')
blob.upload_from_string(requirements)

#### Dockerfile
The Docker build instructions.  This copies the `requirements.txt` and training code folder into the container, runs `pip install ...` for the requirements, and sets the entrypoint for the container to the training module.

Note: the `COPY` statement is moving multiple files, so the destination must end in `/` here.

In [372]:
dockerfile = f"""
FROM {TRAIN_IMAGE}
WORKDIR /training
# copy requirements and install them
COPY requirements.txt ./
RUN pip install --no-cache-dir --upgrade pip \
  && pip install --no-cache-dir -r requirements.txt
## Copies the trainer code to the docker image
COPY tips_trainer/* ./tips_trainer/
## Sets up the entry point to invoke the trainer
ENTRYPOINT ["python", "-m", "tips_trainer.train"]
"""

In [373]:
blob = bucket.blob(f'{SOURCEPATH}/Dockerfile')
blob.upload_from_string(dockerfile)

#### Build Custom Container
Use the Cloud Build client to construct and run the build instructions.  Here the files collected in GCS are copied to the build instance, then the Docker build is run in the folder with the `Dockerfile`.  The resulting image is pushed to Artifact Registry (set up above).

In [374]:
# setup the build config with empty list of steps - these will be added sequentially
build = cloudbuild_v1.Build(
    steps = []
)
# retrieve the source
build.steps.append(
    {
        'name': 'gcr.io/cloud-builders/gsutil',
        'args': ['cp', '-r', f'gs://{PROJECT_ID}/{SOURCEPATH}/*', '/workspace']
    }
)
# docker build
build.steps.append(
    {
        'name': 'gcr.io/cloud-builders/docker',
        'args': ['build', '-t', f'{REPOSITORY}/{CONTAINER}', '/workspace']
    }    
)
# docker push: images listed here are pushed to the repository after the steps complete
build.images = [f"{REPOSITORY}/{CONTAINER}"]

In [375]:
build

steps {
  name: "gcr.io/cloud-builders/gsutil"
  args: "cp"
  args: "-r"
  args: "gs://statmike-mlops-349915/tips/containers/workflow_2/*"
  args: "/workspace"
}
steps {
  name: "gcr.io/cloud-builders/docker"
  args: "build"
  args: "-t"
  args: "us-central1-docker.pkg.dev/statmike-mlops-349915/statmike-mlops-349915-docker/tips_trainer_workflow_2"
  args: "/workspace"
}
images: "us-central1-docker.pkg.dev/statmike-mlops-349915/statmike-mlops-349915-docker/tips_trainer_workflow_2"

In [376]:
operation = cb_client.create_build(
    project_id = PROJECT_ID,
    build = build
)

In [377]:
response = operation.result()
response.status, response.artifacts

(<Status.SUCCESS: 3>,
 images: "us-central1-docker.pkg.dev/statmike-mlops-349915/statmike-mlops-349915-docker/tips_trainer_workflow_2")

In [378]:
print(f"Review the Custom Container with Artifact Registry in the Google Cloud Console:\nhttps://console.cloud.google.com/artifacts/docker/{PROJECT_ID}/{REGION}/{PROJECT_ID}-docker?project={PROJECT_ID}")

Review the Custom Container with Artifact Registry in the Google Cloud Console:
https://console.cloud.google.com/artifacts/docker/statmike-mlops-349915/us-central1/statmike-mlops-349915-docker?project=statmike-mlops-349915


### Workflow 3: copy package to container
<a id = 'workflow3'></a>

This workflow builds a custom container using a Vertex AI Pre-Built container for training as the base.  The code below stores the Python package distribution and `Dockerfile` in the project GCS bucket and then launches a Cloud Build job that copies these files and runs the Docker build.

To see an example Vertex AI Training Custom Job running with the custom container created here go to [Python Training - workflow 3](./Python%20Training.ipynb#workflow3)

### Workflow 4: pip install package from GCS to container
<a id = 'workflow4'></a>

This workflow builds a custom container using a Vertex AI Pre-Built container for training as the base.  The code below stores the `Dockerfile` in the project GCS bucket and then launches a Cloud Build job that copies it and runs the Docker build.

To see an example Vertex AI Training Custom Job running with the custom container created here go to [Python Training - workflow 4](./Python%20Training.ipynb#workflow4)

### Workflow 5: pip install package from GitHub to container
<a id = 'workflow5'></a>

This workflow builds a custom container using a Vertex AI Pre-Built container for training as the base.  The code below stores the `Dockerfile` in the project GCS bucket and then launches a Cloud Build job that copies it and runs the Docker build.

To see an example Vertex AI Training Custom Job running with the custom container created here go to [Python Training - workflow 5](./Python%20Training.ipynb#workflow5)

### Workflow 6: pip install package from Artifact Registry to container
<a id = 'workflow6'></a>

This workflow builds a custom container using a Vertex AI Pre-Built container for training as the base.  The code below stores the `Dockerfile` in the project GCS bucket and then launches a Cloud Build job that copies it and runs the Docker build.

To see an example Vertex AI Training Custom Job running with the custom container created here go to [Python Training - workflow 6](./Python%20Training.ipynb#workflow6)