# Python Client for Cloud Build
### IN ACTIVE DEVELOPMENT - not complete

Containers are helpful.

> TL;DR
> Use containers to bring together **software** and training **code** so that you can easily launch jobs on different **compute** with different **parameters** to simplify the operations of ML training.

When we train an ML model, several things come together to make it happen:
- **compute** is running: CPUs, memory, networking, GPUs on one or more instances
- **software** is running on the compute
    - the required packages are installed with the software
- training **code**/script is launched with the software
- training **data** is read by the training code
- **parameters** that the code uses to configure the training run

It's tempting to develop in an IDE, like JupyterLab here, and then just make the VM behind it much larger.  This notebook is running in JupyterLab hosted on **compute** running **software** and is being used to author **code** that reads **data** using **parameters** set as Python variables.  One issue with that approach is that even typing these words costs `$$$$`, and this instance might not be able to run this notebook 10 times in parallel with different **parameters**.  

A better way?  Keep using an environment like this to develop our **code** and make sure it works, but use smaller **compute** and **data** during development.  Then launch a separate, managed job that runs the full training.  How? What if we could instruct a service to take the list of inputs above, run a job, and only charge for the compute used for the duration of training? That is exactly what Vertex AI Training is used for.  It also helps scale training as a next step:
- specify distributed training, pools of compute instances
- manage hyperparameter tuning with multiple parallel training jobs focusing in on the right values for hyperparameters
- run many training jobs at the same time without managing compute, while still controlling the cost of this scale

Vertex AI has a [list of provided pre-built training containers](https://cloud.google.com/vertex-ai/docs/training/pre-built-containers) for the most popular frameworks.  They are made available in multiple release versions of the frameworks and with/without [CUDA](https://developer.nvidia.com/cuda-toolkit) already configured and setup for GPU based training.

For Vertex AI Training Custom Jobs you:
- specify the **compute** to use in parameters or as worker pool specs
- provide a URI for a container with the **software** to use
- provide training **code** in one of three ways
    - as a link to a Python script (file.py)
    - as URI to GCS for a Python Source Distribution
    - as a starting point to code already included on the container with the **software**
- provide **data**
    - as a **parameter** specifying the location the **code** can use to retrieve it
    - or build the logic for connecting to the data source into the **code**

If we learn the skill of building a derivative container that packages our desired **software**, installs additional packages, and also holds a copy of our **code** and maybe even our **parameters**, then ML training jobs become very simple to incorporate in our workflow!

That is this notebook's goal.

---
**We will use [Cloud Build](https://cloud.google.com/build) to construct containers.**

- [API Overview](https://cloud.google.com/build/docs/api)
    - REST API, gcloud CLI, and Client Libraries for Go, Java, Node.js, and Python
- [Python Client for Cloud Build API](https://github.com/googleapis/python-cloudbuild)
- [Python Client Library Documentation](https://cloud.google.com/python/docs/reference/cloudbuild/latest)
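
As a rough picture of what a build request looks like, here is a minimal sketch of a Cloud Build configuration expressed as a plain dict, using the same `steps` and `images` fields found in a `cloudbuild.yaml` file. The helper name `docker_build_config` and the image URI are hypothetical; this notebook may end up structuring its builds differently.

```python
def docker_build_config(image_uri, context_dir="."):
    """Return a minimal Cloud Build config that builds and pushes one image."""
    return {
        "steps": [
            {
                # the Docker builder image provided by Cloud Build
                "name": "gcr.io/cloud-builders/docker",
                "args": ["build", "-t", image_uri, context_dir],
            }
        ],
        # images listed here are pushed to the registry after all steps succeed
        "images": [image_uri],
    }


config = docker_build_config(
    "us-central1-docker.pkg.dev/my-project/my-repo/trainer:latest"  # hypothetical URI
)
print(config)
```

A dict like this could be supplied as the `build` argument of the Cloud Build client's `create_build` method, together with a source location for the Dockerfile and code.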

---
**We will store built containers in [Artifact Registry](https://cloud.google.com/artifact-registry).**

- [API Overview](https://cloud.google.com/artifact-registry/docs/apis)
- [Python Client for Artifact Registry API](https://github.com/googleapis/python-artifact-registry)
- [Python Client Library Documentation](https://cloud.google.com/python/docs/reference/artifactregistry/latest)

---
**Notes on Python and Google Cloud:**

Google Cloud APIs can be used with the [Google Cloud Python Client](https://github.com/googleapis/google-cloud-python).  The client has [libraries](https://github.com/googleapis/google-cloud-python#libraries) for Google Cloud services.  The documentation for each library is centralized in the [Python Cloud Client Libraries](https://cloud.google.com/python/docs/reference) reference documentation.
- Also helpful: [Getting started with Python](https://cloud.google.com/python/docs/getting-started) in Google Cloud

---
## Package Installs (if needed)

This notebook uses the Python Clients for
- Google Service Usage
    - to enable APIs (Artifact Registry and Cloud Build)
- Artifact Registry
    - to create repositories for Python packages and Docker containers
- Cloud Build
    - to build custom Docker containers

The cells below check whether the required Python libraries are installed.  If one is missing, the cell prints a message and runs the associated pip install command.  These installs must be completed before continuing with this notebook.

In [123]:
try:
    import google.cloud.service_usage_v1
except ImportError:
    print('You need to pip install google-cloud-service-usage')
    !pip install google-cloud-service-usage -q

In [124]:
try:
    import google.cloud.artifactregistry_v1
except ImportError:
    print('You need to pip install google-cloud-artifact-registry')
    !pip install google-cloud-artifact-registry -q

In [125]:
try:
    import google.cloud.devtools.cloudbuild
except ImportError:
    print('You need to pip install google-cloud-build')
    !pip install google-cloud-build
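
The three checks above follow the same pattern, so they could be folded into one loop. This is a minimal sketch using only the standard library; the helper name `missing_packages` is hypothetical, and it only reports what is missing rather than installing anything.

```python
import importlib.util

# module name -> pip package name, matching the three checks above
REQUIRED = {
    "google.cloud.service_usage_v1": "google-cloud-service-usage",
    "google.cloud.artifactregistry_v1": "google-cloud-artifact-registry",
    "google.cloud.devtools.cloudbuild_v1": "google-cloud-build",
}


def missing_packages(required):
    """Return the pip package names whose module cannot be found."""
    missing = []
    for module, package in required.items():
        try:
            found = importlib.util.find_spec(module) is not None
        except (ImportError, ValueError):
            # raised when a parent package (for example `google`) is absent
            found = False
        if not found:
            missing.append(package)
    return missing


for package in missing_packages(REQUIRED):
    print(f"You need to: pip install {package}")
```

Note that `find_spec` imports parent packages while searching, so this is a check to run once at the top of a notebook rather than in a hot loop.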

---
## Setup

inputs:

In [4]:
project = !gcloud config get-value project
PROJECT_ID = project[0]
PROJECT_ID

'statmike-mlops-349915'

In [32]:
REGION = 'us-central1'
EXPERIMENT = 'build'
SERIES = 'tips'

packages:

In [97]:
import os, shutil

from google.cloud import service_usage_v1
from google.cloud.devtools import cloudbuild_v1
from google.cloud import artifactregistry_v1
from google.cloud import aiplatform

clients:

In [98]:
su_client = service_usage_v1.ServiceUsageClient()
ar_client = artifactregistry_v1.ArtifactRegistryClient()
cb_client = cloudbuild_v1.CloudBuildClient()
aiplatform.init(project = PROJECT_ID, location = REGION)

parameters:

In [19]:
# the outputs below show this directory as temp/tips_build
DIR = f'temp/{SERIES}_{EXPERIMENT}'

environment:

In [85]:
# remove directory named DIR if exists
shutil.rmtree(DIR, ignore_errors = True)

# create directory DIR
os.makedirs(DIR)

# check for existence of DIR
print('DIR exists? ', os.path.exists(DIR))

# list contents of directory one level higher than DIR
os.listdir(DIR + '/../')

True


['job-parms', 'tips_build', '.ipynb_checkpoints', 'multi', 'gcs']

---
## Enable APIs

Using Cloud Build and Artifact Registry requires enabling these APIs for the Google Cloud Project.

Options for enabling these APIs.  In this notebook, option (2) is used.
 1. Use the APIs & Services page in the console: https://console.cloud.google.com/apis
     - `+ Enable APIs and Services`
     - Search for Cloud Build and Enable
     - Search for Artifact Registry and Enable
 2. Use [Google Service Usage](https://cloud.google.com/service-usage/docs) API from Python
     - [Python Client For Service Usage](https://github.com/googleapis/python-service-usage)
     - [Python Client Library Documentation](https://cloud.google.com/python/docs/reference/serviceusage/latest)
     
The following code cells use the Service Usage Client to:
- get the state of the service
- if 'DISABLED':
    - Try enabling the service and return the state after trying
- if 'ENABLED' print the state for confirmation

### Artifact Registry

In [24]:
artifactregistry = su_client.get_service(
    request = service_usage_v1.GetServiceRequest(
        name = f'projects/{PROJECT_ID}/services/artifactregistry.googleapis.com'
    )
).state.name


if artifactregistry == 'DISABLED':
    print(f'Artifact Registry is currently {artifactregistry} for project: {PROJECT_ID}')
    print(f'Trying to Enable...')
    operation = su_client.enable_service(
        request = service_usage_v1.EnableServiceRequest(
            name = f'projects/{PROJECT_ID}/services/artifactregistry.googleapis.com'
        )
    )
    response = operation.result()
    if response.service.state.name == 'ENABLED':
        print(f'Artifact Registry is now enabled for project: {PROJECT_ID}')
    else:
        print(response)
else:
    print(f'Artifact Registry already enabled for project: {PROJECT_ID}')

Artifact Registry already enabled for project: statmike-mlops-349915


### Cloud Build

In [25]:
cloudbuild = su_client.get_service(
    request = service_usage_v1.GetServiceRequest(
        name = f'projects/{PROJECT_ID}/services/cloudbuild.googleapis.com'
    )
).state.name


if cloudbuild == 'DISABLED':
    print(f'Cloud Build is currently {cloudbuild} for project: {PROJECT_ID}')
    print(f'Trying to Enable...')
    operation = su_client.enable_service(
        request = service_usage_v1.EnableServiceRequest(
            name = f'projects/{PROJECT_ID}/services/cloudbuild.googleapis.com'
        )
    )
    response = operation.result()
    if response.service.state.name == 'ENABLED':
        print(f'Cloud Build is now enabled for project: {PROJECT_ID}')
    else:
        print(response)
else:
    print(f'Cloud Build already enabled for project: {PROJECT_ID}')

Cloud Build already enabled for project: statmike-mlops-349915
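
The two cells above repeat the same get-then-enable logic. Below is a minimal sketch of a shared helper, assuming the client accepts a plain dict as `request` (the generated Google Cloud clients generally do). The name `ensure_service_enabled` is hypothetical and not part of the Service Usage library.

```python
def ensure_service_enabled(client, project_id, service):
    """Enable `service` for `project_id` if it is DISABLED; return the final state name."""
    name = f"projects/{project_id}/services/{service}"
    state = client.get_service(request={"name": name}).state.name
    if state == "DISABLED":
        # enable_service returns a long-running operation; result() waits for it
        operation = client.enable_service(request={"name": name})
        state = operation.result().service.state.name
    return state


# Usage with the client and project defined earlier in this notebook:
# for service in ("artifactregistry.googleapis.com", "cloudbuild.googleapis.com"):
#     print(service, ensure_service_enabled(su_client, PROJECT_ID, service))
```

Passing the client in as an argument keeps the helper easy to exercise with a stand-in object before pointing it at a real project.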


## Setup Artifact Registry

Artifact Registry organizes artifacts with repositories.  Each repository contains packages and is designated to hold a particular format of package: Docker images, Python packages and [others](https://cloud.google.com/artifact-registry/docs/supported-formats#package).

### List Repositories

This may be empty if no repositories have been created for this project.

In [30]:
for repo in ar_client.list_repositories(parent = f'projects/{PROJECT_ID}/locations/{REGION}'):
    print(repo.name)

projects/statmike-mlops-349915/locations/us-central1/repositories/statmike-mlops-349915


In [31]:
repo

name: "projects/statmike-mlops-349915/locations/us-central1/repositories/statmike-mlops-349915"
format_: DOCKER
description: "Vertex AI Training Custom Containers"
create_time {
  seconds: 1655254405
  nanos: 113143000
}
update_time {
  seconds: 1663071981
  nanos: 599247000
}
maven_config {
}

### Create Docker Image Repository

Create an Artifact Registry Repository to hold Docker Images created by this notebook.

In [69]:
docker_repo = None
for repo in ar_client.list_repositories(parent = f'projects/{PROJECT_ID}/locations/{REGION}'):
    if f'{PROJECT_ID}-{SERIES}-{EXPERIMENT}-docker' in repo.name:
        docker_repo = repo
        print(f'Retrieved existing repo: {docker_repo.name}')

if not docker_repo:
    operation = ar_client.create_repository(
        request = artifactregistry_v1.CreateRepositoryRequest(
            parent = f'projects/{PROJECT_ID}/locations/{REGION}',
            repository_id = f'{PROJECT_ID}-{SERIES}-{EXPERIMENT}-docker',
            repository = artifactregistry_v1.Repository(
                description = f'A repository for the {EXPERIMENT} experiment that holds docker images.',
                name = f'{PROJECT_ID}-{SERIES}-{EXPERIMENT}-docker',
                format_ = artifactregistry_v1.Repository.Format.DOCKER,
                labels = {'series': SERIES, 'experiment': EXPERIMENT}
            )
        )
    )
    print('Creating Repository ...')
    # store the result in docker_repo so later cells can use it after a fresh create
    docker_repo = operation.result()
    print(f'Completed creating repo: {docker_repo.name}')

Retrieved existing repo: projects/statmike-mlops-349915/locations/us-central1/repositories/statmike-mlops-349915-tips-build-docker


In [75]:
docker_repo.name, docker_repo.format_.name

('projects/statmike-mlops-349915/locations/us-central1/repositories/statmike-mlops-349915-tips-build-docker',
 'DOCKER')

### Create Python Package Repository

Create an Artifact Registry Repository to hold Python Packages created by this notebook.

In [65]:
python_repo = None
for repo in ar_client.list_repositories(parent = f'projects/{PROJECT_ID}/locations/{REGION}'):
    if f'{PROJECT_ID}-{SERIES}-{EXPERIMENT}-python' in repo.name:
        python_repo = repo
        print(f'Retrieved existing repo: {python_repo.name}')

if not python_repo:
    operation = ar_client.create_repository(
        request = artifactregistry_v1.CreateRepositoryRequest(
            parent = f'projects/{PROJECT_ID}/locations/{REGION}',
            repository_id = f'{PROJECT_ID}-{SERIES}-{EXPERIMENT}-python',
            repository = artifactregistry_v1.Repository(
                description = f'A repository for the {EXPERIMENT} experiment that holds Python Packages.',
                name = f'{PROJECT_ID}-{SERIES}-{EXPERIMENT}-python',
                format_ = artifactregistry_v1.Repository.Format.PYTHON,
                labels = {'series': SERIES, 'experiment': EXPERIMENT}
            )
        )
    )
    print('Creating Repository ...')
    python_repo = operation.result()
    print(f'Completed creating repo: {python_repo.name}')

Creating Repository ...
Completed Creating repo: projects/statmike-mlops-349915/locations/us-central1/repositories/statmike-mlops-349915-tips-build-python


In [74]:
python_repo.name, python_repo.format_.name

('projects/statmike-mlops-349915/locations/us-central1/repositories/statmike-mlops-349915-tips-build-python',
 'PYTHON')
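
The lookups above use a substring test (`in repo.name`), which would also match a repository whose ID merely contains the target ID. A minimal sketch of an exact match on the full resource name follows; `find_repository` is a hypothetical helper, and the stand-in objects below replace a live `list_repositories()` call.

```python
from types import SimpleNamespace


def find_repository(repos, parent, repository_id):
    """Return the repo whose full resource name matches exactly, else None."""
    target = f"{parent}/repositories/{repository_id}"
    for repo in repos:
        if repo.name == target:
            return repo
    return None


# Stand-in objects with a `.name` attribute (hypothetical project and repositories).
parent = "projects/my-project/locations/us-central1"
repos = [
    SimpleNamespace(name=f"{parent}/repositories/my-project-tips-build-docker-v2"),
    SimpleNamespace(name=f"{parent}/repositories/my-project-tips-build-docker"),
]
match = find_repository(repos, parent, "my-project-tips-build-docker")
print(match.name)
```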

### List Repositories

In [76]:
for repo in ar_client.list_repositories(parent = f'projects/{PROJECT_ID}/locations/{REGION}'):
    print(repo.name)

projects/statmike-mlops-349915/locations/us-central1/repositories/statmike-mlops-349915
projects/statmike-mlops-349915/locations/us-central1/repositories/statmike-mlops-349915-tips-build-docker
projects/statmike-mlops-349915/locations/us-central1/repositories/statmike-mlops-349915-tips-build-python
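
The resource names printed above are not what Docker or pip use directly. This sketch converts a repository resource name into the host-style addresses Artifact Registry exposes for each format; the helper names and the image name `trainer` are hypothetical.

```python
def _parts(repo_name):
    """Split projects/{project}/locations/{location}/repositories/{repo}."""
    _, project, _, location, _, repo = repo_name.split("/")
    return project, location, repo


def docker_image_uri(repo_name, image, tag="latest"):
    """Image URI for a Docker-format repository."""
    project, location, repo = _parts(repo_name)
    return f"{location}-docker.pkg.dev/{project}/{repo}/{image}:{tag}"


def python_index_url(repo_name):
    """pip index URL for a Python-format repository."""
    project, location, repo = _parts(repo_name)
    return f"https://{location}-python.pkg.dev/{project}/{repo}/simple/"


docker_name = "projects/my-project/locations/us-central1/repositories/my-docker-repo"  # hypothetical
python_name = "projects/my-project/locations/us-central1/repositories/my-python-repo"  # hypothetical
print(docker_image_uri(docker_name, "trainer"))
print(python_index_url(python_name))
```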


## Training Code

brief description here

link to other tip that created file here

need to specify requirement for other tip

list directory:

In [122]:
for root, dirs, files in os.walk(DIR):
    for d in dirs:
        print(os.path.join(root, d))
    for f in files:
        # join files to root only; reusing d here would insert a stale directory name
        print(os.path.join(root, f))

temp/tips_build/trainer
temp/tips_build/trainer/.ipynb_checkpoints
temp/tips_build/trainer/src
temp/tips_build/trainer/dist
temp/tips_build/trainer/pyproject.toml
temp/tips_build/trainer/src/trainer.egg-info
temp/tips_build/trainer/src/trainer
temp/tips_build/trainer/src/trainer.egg-info/top_level.txt
temp/tips_build/trainer/src/trainer.egg-info/SOURCES.txt
temp/tips_build/trainer/src/trainer.egg-info/requires.txt
temp/tips_build/trainer/src/trainer.egg-info/dependency_links.txt
temp/tips_build/trainer/src/trainer.egg-info/PKG-INFO
temp/tips_build/trainer/src/trainer/__init__.py
temp/tips_build/trainer/src/trainer/train.py
temp/tips_build/trainer/dist/trainer-0.1-py3-none-any.whl
temp/tips_build/trainer/dist/trainer-0.1.tar.gz


This directory now has three key items:
- a single training file: `{DIR}/trainer/src/trainer/train.py`
- a folder of training code: `{DIR}/trainer/src/trainer*`
    - with a starting point of `train.py`
- a source distribution: `{DIR}/trainer/dist/trainer-0.1.tar.gz`
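
A small sketch for confirming those three items are present before moving on, assuming the `trainer` folder layout shown in the directory listing; `training_artifacts` is a hypothetical helper and `temp/tips_build` stands in for `DIR`.

```python
from pathlib import Path


def training_artifacts(base_dir):
    """Return the expected paths of the three key items under base_dir."""
    trainer = Path(base_dir) / "trainer"
    return {
        "script": trainer / "src" / "trainer" / "train.py",
        "package_dir": trainer / "src" / "trainer",
        "sdist": trainer / "dist" / "trainer-0.1.tar.gz",
    }


for label, path in training_artifacts("temp/tips_build").items():
    print(f"{label}: {path} (exists: {path.exists()})")
```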

## Python Packages in Artifact Registry

The goal is to show multiple ways of getting training code into custom containers.
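
Two of those ways can be contrasted as Dockerfile text: copying a script into the image versus pip installing a source distribution. This is a minimal sketch that only generates the text; the function names, the base image `python:3.10-slim`, the `pandas` package, and the `/training` working directory are assumptions for illustration, not choices made by this notebook.

```python
def dockerfile_copy_script(base_image, packages, script="train.py"):
    """Dockerfile text: install packages, then copy a single script into the image."""
    lines = [f"FROM {base_image}", "WORKDIR /training"]
    if packages:
        lines.append("RUN pip install --no-cache-dir " + " ".join(packages))
    lines.append(f"COPY {script} ./{script}")
    lines.append(f'ENTRYPOINT ["python", "{script}"]')
    return "\n".join(lines) + "\n"


def dockerfile_install_sdist(base_image, sdist, module):
    """Dockerfile text: copy a source distribution in and pip install it."""
    lines = [
        f"FROM {base_image}",
        "WORKDIR /training",
        f"COPY {sdist} ./{sdist}",
        # installing the distribution also installs its declared requirements
        f"RUN pip install --no-cache-dir ./{sdist}",
        f'ENTRYPOINT ["python", "-m", "{module}"]',
    ]
    return "\n".join(lines) + "\n"


base = "python:3.10-slim"  # assumed base image for illustration
print(dockerfile_copy_script(base, ["pandas"]))
print(dockerfile_install_sdist(base, "trainer-0.1.tar.gz", "trainer.train"))
```

The first variant keeps the package list in the Dockerfile, while the second lets the distribution's own metadata declare its requirements.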


---
## Remove

- Artifact Registry Repositories
- Cloud Build Assets

---
---
enable cloud build
    list?
enable cloud artifact registry
    list, create registry

build container with pip installs
run with script on aiplatform

build container with pip and copy of script
run on aiplatform

build source distro from code
build container with source distro
run on aiplatform

build container from GitHub repo + folder + commitID
run on aiplatform


paths
- local > container
- local > package > AR > container
- GitHub > container
- GitHub > local > container

thoughts:
- container with file
- container with distro, install
- basically copy in versus pip install in dockerfile


separate python package to new tip
- results: file, folder, distro
- store in local, gcs
- run aiplatform custom jobs:
    - file from local
    - source from gcs

---
---