# Python Training - Vertex AI Training Custom Jobs

ML Training with Python code as a Vertex AI Training Custom Job

Why?  This notebook is an IDE that also happens to have:
- **compute**: CPU, Memory, GPU
- **software**: container running with Python and loaded packages like TensorFlow, PyTorch, ...
- **code**: user-written instruction for ML training

But scaling this notebook instance to run our ML training code has limitations:
- paying `$$$$` while typing and troubleshooting
- running training code multiple times with different data sources
- running training code with multiple configurations of hyperparameters for tuning
- automating training code execution based on time or events

Rather than scaling this notebook up to larger **compute**, we want to launch a fit-for-purpose job that runs our training **code** using the **software** of choice on the **compute** needed to handle the size of our training data.  Vertex AI Training Custom Jobs make that simple.

Our training code can be in many locations and forms:
- local files
    - single script
    - folders/modules
    - Python Package Distribution
- GCS Bucket
    - single script
    - folders/modules
    - Python Package Distribution
- GitHub
    - single script
    - folders/modules
    - Python Package Distribution
- Repository
    - Python Package hosted on Artifact Registry
    
Vertex AI Training Custom Jobs can use training code from:
- local files: single script
- GCS Bucket: Python Source Distribution
- Custom Container
    - Built with code originating at any of the locations and forms above!
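
On the job side, these options come down to two shapes of worker pool spec: one that points at a container image and one that points at a Python package on GCS. The following is a minimal sketch that builds both as plain dictionaries without calling any API. The helper names, bucket path, package file name, module name, and image names are hypothetical placeholders; the field names are assumed to follow the Vertex AI worker pool schema (`container_spec` and `python_package_spec`).

```python
from typing import Any, Dict, List


def container_pool(image_uri: str, args: List[str],
                   machine_type: str = "n1-standard-4") -> List[Dict[str, Any]]:
    """Worker pool that runs a custom container image."""
    return [{
        "replica_count": 1,
        "machine_spec": {"machine_type": machine_type},
        "container_spec": {"image_uri": image_uri, "args": list(args)},
    }]


def package_pool(executor_image_uri: str, package_uris: List[str],
                 python_module: str, args: List[str],
                 machine_type: str = "n1-standard-4") -> List[Dict[str, Any]]:
    """Worker pool that installs a Python source distribution from GCS
    into a prebuilt image and runs one of its modules."""
    return [{
        "replica_count": 1,
        "machine_spec": {"machine_type": machine_type},
        "python_package_spec": {
            "executor_image_uri": executor_image_uri,
            "package_uris": list(package_uris),
            "python_module": python_module,
            "args": list(args),
        },
    }]


# Hypothetical values, for illustration only.
from_container = container_pool(
    "us-central1-docker.pkg.dev/my-project/my-repo/my-trainer", ["--epochs=10"])
from_package = package_pool(
    "us-docker.pkg.dev/vertex-ai/training/tf-cpu.2-7:latest",
    ["gs://my-bucket/path/my_trainer-0.1.tar.gz"],
    "my_trainer.train",
    ["--epochs=10"])
```

Either list can be passed as `worker_pool_specs` when defining an `aiplatform.CustomJob`.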

<p align="center" width="100%">
    <img src="../architectures/overview/training.png" width="45%">
    &nbsp; &nbsp; &nbsp; &nbsp;
    <img src="../architectures/overview/training2.png" width="45%">
</p>

---

**Prerequisites:**

The examples below use:
- the code in various formats created in the [Python Packages](./Python%20Packages.ipynb) notebook
- the custom containers created in multiple workflows by the [Python Custom Containers](./Python%20Custom%20Containers.ipynb) notebook



---
## Setup

inputs:

In [1]:
project = !gcloud config get-value project
PROJECT_ID = project[0]
PROJECT_ID

'statmike-mlops-349915'

In [2]:
REGION = 'us-central1'
EXPERIMENT = 'training'
SERIES = 'tips'

packages:

In [3]:
import os, shutil
import pkg_resources
from datetime import datetime

from google.cloud import aiplatform
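
The `pkg_resources` import is used later in this notebook to pin the `protobuf` version passed to a training job. `pkg_resources` is deprecated in recent setuptools releases, so the following is a minimal sketch of the same lookup using the standard library's `importlib.metadata` (Python 3.8+). The helper name `pinned_requirement` is hypothetical and not part of the original notebook.

```python
from importlib import metadata


def pinned_requirement(package: str) -> str:
    """Return 'package==<installed version>', or the bare package name
    when the package is not installed in the current environment."""
    try:
        return f"{package}=={metadata.version(package)}"
    except metadata.PackageNotFoundError:
        return package


# Prints a pinned requirement if protobuf is installed, else just 'protobuf'.
print(pinned_requirement("protobuf"))
```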

clients:

In [4]:
aiplatform.init(project = PROJECT_ID, location = REGION)

parameters:

In [5]:
DIR = f'temp/{EXPERIMENT}'

In [138]:
SERVICE_ACCOUNT = !gcloud config list --format='value(core.account)' 
SERVICE_ACCOUNT = SERVICE_ACCOUNT[0]
SERVICE_ACCOUNT

'1026793852137-compute@developer.gserviceaccount.com'

List the service account's current roles:

In [139]:
!gcloud projects get-iam-policy $PROJECT_ID --filter="bindings.members:$SERVICE_ACCOUNT" --format='table(bindings.role)' --flatten="bindings[].members"

ROLE
roles/bigquery.admin
roles/owner
roles/run.admin
roles/storage.objectAdmin


>Note: If the resulting list is missing [roles/storage.objectAdmin](https://cloud.google.com/storage/docs/access-control/iam-roles), [revisit the setup notebook](../00%20-%20Setup/00%20-%20Environment%20Setup.ipynb#permissions) and add this role to the service account using the instructions provided there.

environment:

In [71]:
# remove directory named DIR if exists
shutil.rmtree(DIR, ignore_errors = True)

# create directory DIR
os.makedirs(DIR)

# check for existence of DIR
print('DIR exists? ', os.path.exists(DIR))

DIR exists?  True


---
## Vertex AI Training Custom Jobs Example Workflows

Vertex AI Training Custom Jobs can use:
- a local script
- GCS housed Python source distribution
- a custom container
    - all the workflows from the [Python Custom Containers](./Python%20Custom%20Containers.ipynb) notebook

This section shows examples of running Vertex AI Custom Jobs in many different workflows.  It also shows how to use the workflow and test the training script locally, in the notebook instance.

**Examples**

- [Local Script](#script)
- [Python Source Distribution](#source)
- [Custom Container - Workflow 1 - Copy Script To Container](#workflow1)
- [Custom Container - Workflow 2 - Copy Folder To Container](#workflow2)
- [Custom Container - Workflow 3 - Copy Package To Container](#workflow3)
- [Custom Container - Workflow 4 - pip install package from GCS to container](#workflow4)
- [Custom Container - Workflow 5 - pip install package from GitHub to container](#workflow5)
- [Custom Container - Workflow 6 - pip install package from Artifact Registry to container](#workflow6)
- [Running in Notebook](#notebook)

---
### Common Prep for Examples

#### Inputs & Parameters

In [120]:
# Model Training
VAR_TARGET = 'Class'
VAR_OMIT = 'transaction_id' # add more variables to the string with space delimiters
EPOCHS = 10
BATCH_SIZE = 100

# source data
BQ_PROJECT = PROJECT_ID
BQ_DATASET = 'fraud'
BQ_TABLE = 'fraud_prepped'

# Experiment Tracking
FRAMEWORK = 'tf'
TASK = 'classification'
MODEL_TYPE = 'dnn'
EXPERIMENT_NAME = f'experiment-{SERIES}-{EXPERIMENT}-{FRAMEWORK}-{TASK}-{MODEL_TYPE}'

# Resources
TRAIN_COMPUTE = 'n1-standard-4'
TRAIN_IMAGE = 'us-docker.pkg.dev/vertex-ai/training/tf-cpu.2-7:latest'
REPOSITORY = f"{REGION}-docker.pkg.dev/{PROJECT_ID}/{PROJECT_ID}-docker"

# parameters
BUCKET = PROJECT_ID
URI = f"gs://{BUCKET}/{SERIES}/{EXPERIMENT}"
DIR = f"temp/{EXPERIMENT}"

#### TensorBoard

The example test jobs below are based on jobs in the `05 - TensorFlow` series and take advantage of Vertex AI Experiments and managed TensorBoard.  This section creates a TensorBoard instance and gets other inputs for the jobs:

In [121]:
tb = aiplatform.Tensorboard.list(filter=f"labels.series={SERIES}")
if tb:
    tb = tb[0]
else: 
    tb = aiplatform.Tensorboard.create(display_name = SERIES, labels = {'series' : f'{SERIES}'})

In [122]:
tb.resource_name

'projects/1026793852137/locations/us-central1/tensorboards/7360834523774320640'

#### Vertex AI Experiments

The code in this section initializes the experiment that represents this notebook.  Throughout the notebook sections, the model training and evaluation information will be logged to the experiment as an experiment run using:

- [.log_params](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform#google_cloud_aiplatform_log_params)
- [.log_metrics](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform#google_cloud_aiplatform_log_metrics)
- [.log_time_series_metrics](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform#google_cloud_aiplatform_log_time_series_metrics)

In [123]:
aiplatform.init(experiment = EXPERIMENT_NAME, experiment_tensorboard = tb.resource_name)
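
As a rough illustration of how the logging calls listed above fit together, the sketch below wraps one experiment run. It assumes `aiplatform.init(experiment=...)` has already been called, as in the cell above. The helper names `to_loggable_params` and `log_run` are hypothetical; the sketch also assumes parameter values should be `int`, `float`, or `str`, so anything else is converted to a string first.

```python
from typing import Any, Dict, Union

Loggable = Union[int, float, str]


def to_loggable_params(params: Dict[str, Any]) -> Dict[str, Loggable]:
    """Keep int, float, and str values as-is; convert anything else to str."""
    return {
        key: value if isinstance(value, (int, float, str)) else str(value)
        for key, value in params.items()
    }


def log_run(run_name: str, params: Dict[str, Any], metrics: Dict[str, float]) -> None:
    """Log parameters and summary metrics to one experiment run."""
    # Imported here so the helper above can be used without the SDK installed.
    from google.cloud import aiplatform

    aiplatform.start_run(run_name)
    aiplatform.log_params(to_loggable_params(params))
    aiplatform.log_metrics(metrics)
    aiplatform.end_run()


# Local check of the conversion only; no cloud call is made here.
print(to_loggable_params({"epochs": 10, "var_omit": ["transaction_id"]}))
```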

#### Vertex AI Training Custom Job Parameters

In [124]:
CMDARGS = [
    "--epochs=" + str(EPOCHS),
    "--batch_size=" + str(BATCH_SIZE),
    "--var_target=" + VAR_TARGET,
    "--var_omit=" + VAR_OMIT,
    "--project_id=" + PROJECT_ID,
    "--bq_project=" + BQ_PROJECT,
    "--bq_dataset=" + BQ_DATASET,
    "--bq_table=" + BQ_TABLE,
    "--region=" + REGION,
    "--experiment=" + EXPERIMENT,
    "--series=" + SERIES,
    "--experiment_name=" + EXPERIMENT_NAME,
    "--run_name="
]

MACHINE_SPEC = {
    "machine_type": TRAIN_COMPUTE,
    "accelerator_count": 0
}

WORKER_POOL_SPEC = [
    {
        "replica_count": 1,
        "machine_spec": MACHINE_SPEC,
        "container_spec": {
            "image_uri": '', # will be filled in below by the workflow
            "command": [],
            "args": [] # will be filled in below by the workflow
        }
    }
]
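
The entries in `CMDARGS` arrive in the training code as command-line arguments. The training script itself is not shown in this section, so the following is only a sketch of how such a script could read them with `argparse`; the function name `parse_args` and the defaults are assumptions, while the flag names mirror the list above.

```python
import argparse
from typing import List, Optional

# String-valued flags taken from the CMDARGS list.
STRING_FLAGS = [
    "var_target", "var_omit", "project_id", "bq_project", "bq_dataset",
    "bq_table", "region", "experiment", "series", "experiment_name", "run_name",
]


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse the flags built in CMDARGS (hypothetical entry-point helper)."""
    parser = argparse.ArgumentParser(description="training entry point (sketch)")
    parser.add_argument("--epochs", type=int, default=10)
    parser.add_argument("--batch_size", type=int, default=100)
    for flag in STRING_FLAGS:
        parser.add_argument(f"--{flag}", type=str, default="")
    return parser.parse_args(argv)


example = parse_args([
    "--epochs=10",
    "--batch_size=100",
    "--var_target=Class",
    "--var_omit=transaction_id",
    "--run_name=",
])

# var_omit is a space-delimited string, so split it into a list of columns.
omit_columns = example.var_omit.split()
print(example.epochs, example.var_target, omit_columns)
```

Because each list element is passed as a single argument, a value such as `--var_omit=col_a col_b` stays intact and can be split inside the script.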

---
<a id = 'workflow1'></a>
### Custom Container - Workflow 1 - Copy Script To Container

The custom container used here was created by [Python Custom Containers - Workflow 1](./Python%20Custom%20Containers.ipynb#workflow1).

> This is a modified version of notebook [05c - Vertex AI Custom Model - TensorFlow - Custom Job With Custom Container](../05%20-%20TensorFlow/05c%20-%20Vertex%20AI%20Custom%20Model%20-%20TensorFlow%20-%20Custom%20Job%20With%20Custom%20Container.ipynb).


Job Parameters:

In [125]:
WORKFLOW = 'workflow_1'
TIMESTAMP = datetime.now().strftime("%Y%m%d%H%M%S")
RUN_NAME = f"run-{WORKFLOW.replace('_', '-')}-{TIMESTAMP}"


CMDARGS[-1] = "--run_name=" + RUN_NAME
WORKER_POOL_SPEC[0]['container_spec']['image_uri'] = f"{REPOSITORY}/tips_trainer_{WORKFLOW}"
WORKER_POOL_SPEC[0]['container_spec']['args'] = CMDARGS

WORKER_POOL_SPEC

[{'replica_count': 1,
  'machine_spec': {'machine_type': 'n1-standard-4', 'accelerator_count': 0},
  'container_spec': {'image_uri': 'us-central1-docker.pkg.dev/statmike-mlops-349915/statmike-mlops-349915-docker/tips_trainer_workflow_1',
   'command': [],
   'args': ['--epochs=10',
    '--batch_size=100',
    '--var_target=Class',
    '--var_omit=transaction_id',
    '--project_id=statmike-mlops-349915',
    '--bq_project=statmike-mlops-349915',
    '--bq_dataset=fraud',
    '--bq_table=fraud_prepped',
    '--region=us-central1',
    '--experiment=training',
    '--series=tips',
    '--experiment_name=experiment-tips-training-tf-classification-dnn',
    '--run_name=run-workflow-1-20220924193043']}}]
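
Note that `WORKER_POOL_SPEC` and `CMDARGS` are shared, mutable objects: each workflow below overwrites the same `image_uri` and the same last element of the same list. That is fine when cells run in order, but an earlier spec displayed after a later update will show the later values. As an alternative, the sketch below builds an independent spec per workflow. The helper name `build_spec` and the repository string are hypothetical, and the helper assumes the last argument is the `--run_name=` placeholder, as in `CMDARGS`.

```python
import copy
from datetime import datetime
from typing import Any, Dict, List, Optional, Tuple

Spec = List[Dict[str, Any]]


def build_spec(template: Spec, cmdargs: List[str], repository: str,
               workflow: str, timestamp: Optional[str] = None) -> Tuple[Spec, str]:
    """Return a fresh worker pool spec and run name without touching the inputs."""
    timestamp = timestamp or datetime.now().strftime("%Y%m%d%H%M%S")
    run_name = f"run-{workflow.replace('_', '-')}-{timestamp}"

    args = list(cmdargs)                      # copy so the base list is unchanged
    args[-1] = "--run_name=" + run_name       # last entry is the run-name placeholder

    spec = copy.deepcopy(template)            # copy nested dicts and lists
    spec[0]["container_spec"]["image_uri"] = f"{repository}/tips_trainer_{workflow}"
    spec[0]["container_spec"]["args"] = args
    return spec, run_name


TEMPLATE: Spec = [{
    "replica_count": 1,
    "machine_spec": {"machine_type": "n1-standard-4", "accelerator_count": 0},
    "container_spec": {"image_uri": "", "command": [], "args": []},
}]
BASE_ARGS = ["--epochs=10", "--run_name="]
REPO = "us-central1-docker.pkg.dev/my-project/my-repo"  # hypothetical repository

spec_1, run_1 = build_spec(TEMPLATE, BASE_ARGS, REPO, "workflow_1", "20220924193043")
spec_2, run_2 = build_spec(TEMPLATE, BASE_ARGS, REPO, "workflow_2", "20220924193049")
print(run_1, run_2)
```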

Define the `aiplatform.CustomJob`:

In [17]:
customJob = aiplatform.CustomJob(
    display_name = f'{EXPERIMENT}_{SERIES}_{WORKFLOW}_{TIMESTAMP}',
    worker_pool_specs = WORKER_POOL_SPEC,
    base_output_dir = f"{URI}/{WORKFLOW}/{TIMESTAMP}",
    staging_bucket = f"{URI}/{WORKFLOW}/{TIMESTAMP}",
    labels = {'series' : f'{SERIES}', 'experiment' : f'{EXPERIMENT}', 'experiment_name' : f'{EXPERIMENT_NAME}', 'run_name' : f'{RUN_NAME}'}
)
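
When `base_output_dir` is set, Vertex AI passes derived output locations to the training code through the environment variables `AIP_MODEL_DIR`, `AIP_CHECKPOINT_DIR`, and `AIP_TENSORBOARD_LOG_DIR`. Whether the training script in this project reads them is not shown here, so the following is only a sketch of how a script could pick them up, with local fallbacks for runs outside a job. The helper name `output_dirs` and the fallback root are assumptions.

```python
import os
from typing import Dict


def output_dirs(default_root: str = "temp/training") -> Dict[str, str]:
    """Read the job-provided output locations, falling back to local paths."""
    return {
        "model": os.environ.get("AIP_MODEL_DIR", f"{default_root}/model"),
        "checkpoints": os.environ.get("AIP_CHECKPOINT_DIR", f"{default_root}/checkpoints"),
        "logs": os.environ.get("AIP_TENSORBOARD_LOG_DIR", f"{default_root}/logs"),
    }


# Inside a Custom Job these resolve to paths under base_output_dir;
# in the notebook they resolve to the local fallbacks.
print(output_dirs())
```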

Run the job:

In [18]:
customJob.run(
    service_account = SERVICE_ACCOUNT,
    tensorboard = tb.resource_name
)

Creating CustomJob
CustomJob created. Resource name: projects/1026793852137/locations/us-central1/customJobs/3255733919315656704
To use this CustomJob in another session:
custom_job = aiplatform.CustomJob.get('projects/1026793852137/locations/us-central1/customJobs/3255733919315656704')
View Custom Job:
https://console.cloud.google.com/ai/platform/locations/us-central1/training/3255733919315656704?project=1026793852137
View Tensorboard:
https://us-central1.tensorboard.googleusercontent.com/experiment/projects+1026793852137+locations+us-central1+tensorboards+7360834523774320640+experiments+3255733919315656704
CustomJob projects/1026793852137/locations/us-central1/customJobs/3255733919315656704 current state:
JobState.JOB_STATE_QUEUED
CustomJob projects/1026793852137/locations/us-central1/customJobs/3255733919315656704 current state:
JobState.JOB_STATE_PENDING
CustomJob projects/1026793852137/locations/us-central1/customJobs/3255733919315656704 current state:
JobState.JOB_STATE_PENDING

Review the Job:

In [19]:
customJob.display_name

'training_tips_workflow_1_20220922193615'

In [20]:
customJob.resource_name

'projects/1026793852137/locations/us-central1/customJobs/3255733919315656704'

In [21]:
job_link = f"https://console.cloud.google.com/vertex-ai/locations/{REGION}/training/{customJob.resource_name.split('/')[-1]}/cpu?cloudshell=false&project={PROJECT_ID}"
board_link = f"https://{REGION}.tensorboard.googleusercontent.com/experiment/{tb.resource_name.replace('/', '+')}+experiments+{customJob.resource_name.split('/')[-1]}"

print(f'Review the Job here:\n{job_link}')
print(f'Review the TensorBoard From the Job here:\n{board_link}')
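
The same two links are built in every workflow section, so the string handling can be factored into a small function. The sketch below uses only string operations on the resource names; the function name `job_links` is hypothetical, and the example values are the resource names printed elsewhere in this notebook.

```python
from typing import Tuple


def job_links(job_resource_name: str, tensorboard_resource_name: str,
              region: str, project_id: str) -> Tuple[str, str]:
    """Build the console link and the TensorBoard link for a custom job."""
    job_id = job_resource_name.split("/")[-1]  # trailing numeric job ID
    job_link = (
        f"https://console.cloud.google.com/vertex-ai/locations/{region}"
        f"/training/{job_id}/cpu?cloudshell=false&project={project_id}"
    )
    board_link = (
        f"https://{region}.tensorboard.googleusercontent.com/experiment/"
        f"{tensorboard_resource_name.replace('/', '+')}+experiments+{job_id}"
    )
    return job_link, board_link


job_link, board_link = job_links(
    "projects/1026793852137/locations/us-central1/customJobs/1381392049399398400",
    "projects/1026793852137/locations/us-central1/tensorboards/7360834523774320640",
    "us-central1",
    "statmike-mlops-349915",
)
print(f"Review the Job here:\n{job_link}")
print(f"Review the TensorBoard From the Job here:\n{board_link}")
```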

---
<a id = 'workflow2'></a>
### Custom Container - Workflow 2 - Copy Folder To Container

The custom container used here was created by [Python Custom Containers - Workflow 2](./Python%20Custom%20Containers.ipynb#workflow2).

> This is a modified version of notebook [05c - Vertex AI Custom Model - TensorFlow - Custom Job With Custom Container](../05%20-%20TensorFlow/05c%20-%20Vertex%20AI%20Custom%20Model%20-%20TensorFlow%20-%20Custom%20Job%20With%20Custom%20Container.ipynb).

Job Parameters:

In [126]:
WORKFLOW = 'workflow_2'
TIMESTAMP = datetime.now().strftime("%Y%m%d%H%M%S")
RUN_NAME = f"run-{WORKFLOW.replace('_', '-')}-{TIMESTAMP}"


CMDARGS[-1] = "--run_name=" + RUN_NAME
WORKER_POOL_SPEC[0]['container_spec']['image_uri'] = f"{REPOSITORY}/tips_trainer_{WORKFLOW}"
WORKER_POOL_SPEC[0]['container_spec']['args'] = CMDARGS

WORKER_POOL_SPEC

[{'replica_count': 1,
  'machine_spec': {'machine_type': 'n1-standard-4', 'accelerator_count': 0},
  'container_spec': {'image_uri': 'us-central1-docker.pkg.dev/statmike-mlops-349915/statmike-mlops-349915-docker/tips_trainer_workflow_2',
   'command': [],
   'args': ['--epochs=10',
    '--batch_size=100',
    '--var_target=Class',
    '--var_omit=transaction_id',
    '--project_id=statmike-mlops-349915',
    '--bq_project=statmike-mlops-349915',
    '--bq_dataset=fraud',
    '--bq_table=fraud_prepped',
    '--region=us-central1',
    '--experiment=training',
    '--series=tips',
    '--experiment_name=experiment-tips-training-tf-classification-dnn',
    '--run_name=run-workflow-2-20220924193049']}}]

Define the `aiplatform.CustomJob`:

In [27]:
customJob = aiplatform.CustomJob(
    display_name = f'{EXPERIMENT}_{SERIES}_{WORKFLOW}_{TIMESTAMP}',
    worker_pool_specs = WORKER_POOL_SPEC,
    base_output_dir = f"{URI}/{WORKFLOW}/{TIMESTAMP}",
    staging_bucket = f"{URI}/{WORKFLOW}/{TIMESTAMP}",
    labels = {'series' : f'{SERIES}', 'experiment' : f'{EXPERIMENT}', 'experiment_name' : f'{EXPERIMENT_NAME}', 'run_name' : f'{RUN_NAME}'}
)

Run the job:

In [28]:
customJob.run(
    service_account = SERVICE_ACCOUNT,
    tensorboard = tb.resource_name
)

Creating CustomJob
CustomJob created. Resource name: projects/1026793852137/locations/us-central1/customJobs/1381392049399398400
To use this CustomJob in another session:
custom_job = aiplatform.CustomJob.get('projects/1026793852137/locations/us-central1/customJobs/1381392049399398400')
View Custom Job:
https://console.cloud.google.com/ai/platform/locations/us-central1/training/1381392049399398400?project=1026793852137
View Tensorboard:
https://us-central1.tensorboard.googleusercontent.com/experiment/projects+1026793852137+locations+us-central1+tensorboards+7360834523774320640+experiments+1381392049399398400
CustomJob projects/1026793852137/locations/us-central1/customJobs/1381392049399398400 current state:
JobState.JOB_STATE_PENDING
CustomJob projects/1026793852137/locations/us-central1/customJobs/1381392049399398400 current state:
JobState.JOB_STATE_PENDING
CustomJob projects/1026793852137/locations/us-central1/customJobs/1381392049399398400 current state:
JobState.JOB_STATE_PENDING


Review the Job:

In [29]:
customJob.display_name

'training_tips_workflow_2_20220922212304'

In [30]:
customJob.resource_name

'projects/1026793852137/locations/us-central1/customJobs/1381392049399398400'

In [32]:
job_link = f"https://console.cloud.google.com/vertex-ai/locations/{REGION}/training/{customJob.resource_name.split('/')[-1]}/cpu?cloudshell=false&project={PROJECT_ID}"
board_link = f"https://{REGION}.tensorboard.googleusercontent.com/experiment/{tb.resource_name.replace('/', '+')}+experiments+{customJob.resource_name.split('/')[-1]}"

print(f'Review the Job here:\n{job_link}')
print(f'Review the TensorBoard From the Job here:\n{board_link}')

Review the Job here:
https://console.cloud.google.com/vertex-ai/locations/us-central1/training/1381392049399398400/cpu?cloudshell=false&project=statmike-mlops-349915
Review the TensorBoard From the Job here:
https://us-central1.tensorboard.googleusercontent.com/experiment/projects+1026793852137+locations+us-central1+tensorboards+7360834523774320640+experiments+1381392049399398400


---
<a id = 'workflow3'></a>
### Custom Container - Workflow 3 - Copy Package To Container

The custom container used here was created by [Python Custom Containers - Workflow 3](./Python%20Custom%20Containers.ipynb#workflow3).

> This is a modified version of notebook [05c - Vertex AI Custom Model - TensorFlow - Custom Job With Custom Container](../05%20-%20TensorFlow/05c%20-%20Vertex%20AI%20Custom%20Model%20-%20TensorFlow%20-%20Custom%20Job%20With%20Custom%20Container.ipynb).

Job Parameters:

In [104]:
WORKFLOW = 'workflow_3'
TIMESTAMP = datetime.now().strftime("%Y%m%d%H%M%S")
RUN_NAME = f"run-{WORKFLOW.replace('_', '-')}-{TIMESTAMP}"


CMDARGS[-1] = "--run_name=" + RUN_NAME
WORKER_POOL_SPEC[0]['container_spec']['image_uri'] = f"{REPOSITORY}/tips_trainer_{WORKFLOW}"
WORKER_POOL_SPEC[0]['container_spec']['args'] = CMDARGS

WORKER_POOL_SPEC

[{'replica_count': 1,
  'machine_spec': {'machine_type': 'n1-standard-4', 'accelerator_count': 0},
  'container_spec': {'image_uri': 'us-central1-docker.pkg.dev/statmike-mlops-349915/statmike-mlops-349915-docker/tips_trainer_workflow_3',
   'command': [],
   'args': ['--epochs=10',
    '--batch_size=100',
    '--var_target=Class',
    '--var_omit=transaction_id',
    '--project_id=statmike-mlops-349915',
    '--bq_project=statmike-mlops-349915',
    '--bq_dataset=fraud',
    '--bq_table=fraud_prepped',
    '--region=us-central1',
    '--experiment=training',
    '--series=tips',
    '--experiment_name=experiment-tips-training-tf-classification-dnn',
    '--run_name=run-workflow-3-20220924190426']}}]

Define the `aiplatform.CustomJob`:

In [34]:
customJob = aiplatform.CustomJob(
    display_name = f'{EXPERIMENT}_{SERIES}_{WORKFLOW}_{TIMESTAMP}',
    worker_pool_specs = WORKER_POOL_SPEC,
    base_output_dir = f"{URI}/{WORKFLOW}/{TIMESTAMP}",
    staging_bucket = f"{URI}/{WORKFLOW}/{TIMESTAMP}",
    labels = {'series' : f'{SERIES}', 'experiment' : f'{EXPERIMENT}', 'experiment_name' : f'{EXPERIMENT_NAME}', 'run_name' : f'{RUN_NAME}'}
)

Run the job:

In [35]:
customJob.run(
    service_account = SERVICE_ACCOUNT,
    tensorboard = tb.resource_name
)

Creating CustomJob
CustomJob created. Resource name: projects/1026793852137/locations/us-central1/customJobs/4519934796746457088
To use this CustomJob in another session:
custom_job = aiplatform.CustomJob.get('projects/1026793852137/locations/us-central1/customJobs/4519934796746457088')
View Custom Job:
https://console.cloud.google.com/ai/platform/locations/us-central1/training/4519934796746457088?project=1026793852137
View Tensorboard:
https://us-central1.tensorboard.googleusercontent.com/experiment/projects+1026793852137+locations+us-central1+tensorboards+7360834523774320640+experiments+4519934796746457088
CustomJob projects/1026793852137/locations/us-central1/customJobs/4519934796746457088 current state:
JobState.JOB_STATE_PENDING
CustomJob projects/1026793852137/locations/us-central1/customJobs/4519934796746457088 current state:
JobState.JOB_STATE_PENDING
CustomJob projects/1026793852137/locations/us-central1/customJobs/4519934796746457088 current state:
JobState.JOB_STATE_PENDING


Review the Job:

In [40]:
customJob.display_name

'training_tips_workflow_3_20220923184924'

In [41]:
customJob.resource_name

'projects/1026793852137/locations/us-central1/customJobs/4519934796746457088'

In [43]:
job_link = f"https://console.cloud.google.com/vertex-ai/locations/{REGION}/training/{customJob.resource_name.split('/')[-1]}/cpu?cloudshell=false&project={PROJECT_ID}"
board_link = f"https://{REGION}.tensorboard.googleusercontent.com/experiment/{tb.resource_name.replace('/', '+')}+experiments+{customJob.resource_name.split('/')[-1]}"

print(f'Review the Job here:\n{job_link}')
print(f'Review the TensorBoard From the Job here:\n{board_link}')

Review the Job here:
https://console.cloud.google.com/vertex-ai/locations/us-central1/training/4519934796746457088/cpu?cloudshell=false&project=statmike-mlops-349915
Review the TensorBoard From the Job here:
https://us-central1.tensorboard.googleusercontent.com/experiment/projects+1026793852137+locations+us-central1+tensorboards+7360834523774320640+experiments+4519934796746457088


---
<a id = 'workflow4'></a>
### Custom Container - Workflow 4 - pip install package from GCS to container

The custom container used here was created by [Python Custom Containers - Workflow 4](./Python%20Custom%20Containers.ipynb#workflow4).

> This is a modified version of notebook [05c - Vertex AI Custom Model - TensorFlow - Custom Job With Custom Container](../05%20-%20TensorFlow/05c%20-%20Vertex%20AI%20Custom%20Model%20-%20TensorFlow%20-%20Custom%20Job%20With%20Custom%20Container.ipynb).

In [105]:
WORKFLOW = 'workflow_4'
TIMESTAMP = datetime.now().strftime("%Y%m%d%H%M%S")
RUN_NAME = f"run-{WORKFLOW.replace('_', '-')}-{TIMESTAMP}"


CMDARGS[-1] = "--run_name=" + RUN_NAME
WORKER_POOL_SPEC[0]['container_spec']['image_uri'] = f"{REPOSITORY}/tips_trainer_{WORKFLOW}"
WORKER_POOL_SPEC[0]['container_spec']['args'] = CMDARGS

WORKER_POOL_SPEC

[{'replica_count': 1,
  'machine_spec': {'machine_type': 'n1-standard-4', 'accelerator_count': 0},
  'container_spec': {'image_uri': 'us-central1-docker.pkg.dev/statmike-mlops-349915/statmike-mlops-349915-docker/tips_trainer_workflow_4',
   'command': [],
   'args': ['--epochs=10',
    '--batch_size=100',
    '--var_target=Class',
    '--var_omit=transaction_id',
    '--project_id=statmike-mlops-349915',
    '--bq_project=statmike-mlops-349915',
    '--bq_dataset=fraud',
    '--bq_table=fraud_prepped',
    '--region=us-central1',
    '--experiment=training',
    '--series=tips',
    '--experiment_name=experiment-tips-training-tf-classification-dnn',
    '--run_name=run-workflow-4-20220924190435']}}]

Define the `aiplatform.CustomJob`:

In [63]:
customJob = aiplatform.CustomJob(
    display_name = f'{EXPERIMENT}_{SERIES}_{WORKFLOW}_{TIMESTAMP}',
    worker_pool_specs = WORKER_POOL_SPEC,
    base_output_dir = f"{URI}/{WORKFLOW}/{TIMESTAMP}",
    staging_bucket = f"{URI}/{WORKFLOW}/{TIMESTAMP}",
    labels = {'series' : f'{SERIES}', 'experiment' : f'{EXPERIMENT}', 'experiment_name' : f'{EXPERIMENT_NAME}', 'run_name' : f'{RUN_NAME}'}
)

Run the job:

In [64]:
customJob.run(
    service_account = SERVICE_ACCOUNT,
    tensorboard = tb.resource_name
)

Creating CustomJob
CustomJob created. Resource name: projects/1026793852137/locations/us-central1/customJobs/7963077449359556608
To use this CustomJob in another session:
custom_job = aiplatform.CustomJob.get('projects/1026793852137/locations/us-central1/customJobs/7963077449359556608')
View Custom Job:
https://console.cloud.google.com/ai/platform/locations/us-central1/training/7963077449359556608?project=1026793852137
View Tensorboard:
https://us-central1.tensorboard.googleusercontent.com/experiment/projects+1026793852137+locations+us-central1+tensorboards+7360834523774320640+experiments+7963077449359556608
CustomJob projects/1026793852137/locations/us-central1/customJobs/7963077449359556608 current state:
JobState.JOB_STATE_QUEUED
CustomJob projects/1026793852137/locations/us-central1/customJobs/7963077449359556608 current state:
JobState.JOB_STATE_PENDING
CustomJob projects/1026793852137/locations/us-central1/customJobs/7963077449359556608 current state:
JobState.JOB_STATE_PENDING

Review the Job:

In [65]:
customJob.display_name

'training_tips_workflow_4_20220924025409'

In [66]:
customJob.resource_name

'projects/1026793852137/locations/us-central1/customJobs/7963077449359556608'

In [68]:
job_link = f"https://console.cloud.google.com/vertex-ai/locations/{REGION}/training/{customJob.resource_name.split('/')[-1]}/cpu?cloudshell=false&project={PROJECT_ID}"
board_link = f"https://{REGION}.tensorboard.googleusercontent.com/experiment/{tb.resource_name.replace('/', '+')}+experiments+{customJob.resource_name.split('/')[-1]}"

print(f'Review the Job here:\n{job_link}')
print(f'Review the TensorBoard From the Job here:\n{board_link}')

Review the Job here:
https://console.cloud.google.com/vertex-ai/locations/us-central1/training/7963077449359556608/cpu?cloudshell=false&project=statmike-mlops-349915
Review the TensorBoard From the Job here:
https://us-central1.tensorboard.googleusercontent.com/experiment/projects+1026793852137+locations+us-central1+tensorboards+7360834523774320640+experiments+7963077449359556608


---
<a id = 'workflow5'></a>
### Custom Container - Workflow 5 - pip install package from GitHub to container

The custom container used here was created by [Python Custom Containers - Workflow 5](./Python%20Custom%20Containers.ipynb#workflow5).

> This is a modified version of notebook [05c - Vertex AI Custom Model - TensorFlow - Custom Job With Custom Container](../05%20-%20TensorFlow/05c%20-%20Vertex%20AI%20Custom%20Model%20-%20TensorFlow%20-%20Custom%20Job%20With%20Custom%20Container.ipynb).

In [106]:
WORKFLOW = 'workflow_5'
TIMESTAMP = datetime.now().strftime("%Y%m%d%H%M%S")
RUN_NAME = f"run-{WORKFLOW.replace('_', '-')}-{TIMESTAMP}"


CMDARGS[-1] = "--run_name=" + RUN_NAME
WORKER_POOL_SPEC[0]['container_spec']['image_uri'] = f"{REPOSITORY}/tips_trainer_{WORKFLOW}"
WORKER_POOL_SPEC[0]['container_spec']['args'] = CMDARGS

WORKER_POOL_SPEC

[{'replica_count': 1,
  'machine_spec': {'machine_type': 'n1-standard-4', 'accelerator_count': 0},
  'container_spec': {'image_uri': 'us-central1-docker.pkg.dev/statmike-mlops-349915/statmike-mlops-349915-docker/tips_trainer_workflow_5',
   'command': [],
   'args': ['--epochs=10',
    '--batch_size=100',
    '--var_target=Class',
    '--var_omit=transaction_id',
    '--project_id=statmike-mlops-349915',
    '--bq_project=statmike-mlops-349915',
    '--bq_dataset=fraud',
    '--bq_table=fraud_prepped',
    '--region=us-central1',
    '--experiment=training',
    '--series=tips',
    '--experiment_name=experiment-tips-training-tf-classification-dnn',
    '--run_name=run-workflow-5-20220924190445']}}]

Define the `aiplatform.CustomJob`:

In [45]:
customJob = aiplatform.CustomJob(
    display_name = f'{EXPERIMENT}_{SERIES}_{WORKFLOW}_{TIMESTAMP}',
    worker_pool_specs = WORKER_POOL_SPEC,
    base_output_dir = f"{URI}/{WORKFLOW}/{TIMESTAMP}",
    staging_bucket = f"{URI}/{WORKFLOW}/{TIMESTAMP}",
    labels = {'series' : f'{SERIES}', 'experiment' : f'{EXPERIMENT}', 'experiment_name' : f'{EXPERIMENT_NAME}', 'run_name' : f'{RUN_NAME}'}
)

Run the job:

In [46]:
customJob.run(
    service_account = SERVICE_ACCOUNT,
    tensorboard = tb.resource_name
)

Creating CustomJob
CustomJob created. Resource name: projects/1026793852137/locations/us-central1/customJobs/1618490736813015040
To use this CustomJob in another session:
custom_job = aiplatform.CustomJob.get('projects/1026793852137/locations/us-central1/customJobs/1618490736813015040')
View Custom Job:
https://console.cloud.google.com/ai/platform/locations/us-central1/training/1618490736813015040?project=1026793852137
View Tensorboard:
https://us-central1.tensorboard.googleusercontent.com/experiment/projects+1026793852137+locations+us-central1+tensorboards+7360834523774320640+experiments+1618490736813015040
CustomJob projects/1026793852137/locations/us-central1/customJobs/1618490736813015040 current state:
JobState.JOB_STATE_PENDING
CustomJob projects/1026793852137/locations/us-central1/customJobs/1618490736813015040 current state:
JobState.JOB_STATE_PENDING
CustomJob projects/1026793852137/locations/us-central1/customJobs/1618490736813015040 current state:
JobState.JOB_STATE_PENDING


Review the Job:

In [51]:
customJob.display_name

'training_tips_workflow_5_20220923191307'

In [52]:
customJob.resource_name

'projects/1026793852137/locations/us-central1/customJobs/1618490736813015040'

In [54]:
job_link = f"https://console.cloud.google.com/vertex-ai/locations/{REGION}/training/{customJob.resource_name.split('/')[-1]}/cpu?cloudshell=false&project={PROJECT_ID}"
board_link = f"https://{REGION}.tensorboard.googleusercontent.com/experiment/{tb.resource_name.replace('/', '+')}+experiments+{customJob.resource_name.split('/')[-1]}"

print(f'Review the Job here:\n{job_link}')
print(f'Review the TensorBoard From the Job here:\n{board_link}')

Review the Job here:
https://console.cloud.google.com/vertex-ai/locations/us-central1/training/1618490736813015040/cpu?cloudshell=false&project=statmike-mlops-349915
Review the TensorBoard From the Job here:
https://us-central1.tensorboard.googleusercontent.com/experiment/projects+1026793852137+locations+us-central1+tensorboards+7360834523774320640+experiments+1618490736813015040


---
<a id = 'workflow6'></a>
### Custom Container - Workflow 6 - pip install package from Artifact Registry to container

The custom container used here was created by [Python Custom Containers - Workflow 6](./Python%20Custom%20Containers.ipynb#workflow6).

> This is a modified version of notebook [05c - Vertex AI Custom Model - TensorFlow - Custom Job With Custom Container](../05%20-%20TensorFlow/05c%20-%20Vertex%20AI%20Custom%20Model%20-%20TensorFlow%20-%20Custom%20Job%20With%20Custom%20Container.ipynb).

In [107]:
WORKFLOW = 'workflow_6'
TIMESTAMP = datetime.now().strftime("%Y%m%d%H%M%S")
RUN_NAME = f"run-{WORKFLOW.replace('_', '-')}-{TIMESTAMP}"


CMDARGS[-1] = "--run_name=" + RUN_NAME
WORKER_POOL_SPEC[0]['container_spec']['image_uri'] = f"{REPOSITORY}/tips_trainer_{WORKFLOW}"
WORKER_POOL_SPEC[0]['container_spec']['args'] = CMDARGS

WORKER_POOL_SPEC

[{'replica_count': 1,
  'machine_spec': {'machine_type': 'n1-standard-4', 'accelerator_count': 0},
  'container_spec': {'image_uri': 'us-central1-docker.pkg.dev/statmike-mlops-349915/statmike-mlops-349915-docker/tips_trainer_workflow_6',
   'command': [],
   'args': ['--epochs=10',
    '--batch_size=100',
    '--var_target=Class',
    '--var_omit=transaction_id',
    '--project_id=statmike-mlops-349915',
    '--bq_project=statmike-mlops-349915',
    '--bq_dataset=fraud',
    '--bq_table=fraud_prepped',
    '--region=us-central1',
    '--experiment=training',
    '--series=tips',
    '--experiment_name=experiment-tips-training-tf-classification-dnn',
    '--run_name=run-workflow-6-20220924190455']}}]

Define the `aiplatform.CustomJob`:

In [56]:
customJob = aiplatform.CustomJob(
    display_name = f'{EXPERIMENT}_{SERIES}_{WORKFLOW}_{TIMESTAMP}',
    worker_pool_specs = WORKER_POOL_SPEC,
    base_output_dir = f"{URI}/{WORKFLOW}/{TIMESTAMP}",
    staging_bucket = f"{URI}/{WORKFLOW}/{TIMESTAMP}",
    labels = {'series' : f'{SERIES}', 'experiment' : f'{EXPERIMENT}', 'experiment_name' : f'{EXPERIMENT_NAME}', 'run_name' : f'{RUN_NAME}'}
)

Run the job:

In [57]:
customJob.run(
    service_account = SERVICE_ACCOUNT,
    tensorboard = tb.resource_name
)

Creating CustomJob
CustomJob created. Resource name: projects/1026793852137/locations/us-central1/customJobs/2791256227277963264
To use this CustomJob in another session:
custom_job = aiplatform.CustomJob.get('projects/1026793852137/locations/us-central1/customJobs/2791256227277963264')
View Custom Job:
https://console.cloud.google.com/ai/platform/locations/us-central1/training/2791256227277963264?project=1026793852137
View Tensorboard:
https://us-central1.tensorboard.googleusercontent.com/experiment/projects+1026793852137+locations+us-central1+tensorboards+7360834523774320640+experiments+2791256227277963264
CustomJob projects/1026793852137/locations/us-central1/customJobs/2791256227277963264 current state:
JobState.JOB_STATE_QUEUED
CustomJob projects/1026793852137/locations/us-central1/customJobs/2791256227277963264 current state:
JobState.JOB_STATE_PENDING
CustomJob projects/1026793852137/locations/us-central1/customJobs/2791256227277963264 current state:
JobState.JOB_STATE_PENDING

Review the Job:

In [58]:
customJob.display_name

'training_tips_workflow_6_20220924020627'

In [59]:
customJob.resource_name

'projects/1026793852137/locations/us-central1/customJobs/2791256227277963264'

In [60]:
job_link = f"https://console.cloud.google.com/vertex-ai/locations/{REGION}/training/{customJob.resource_name.split('/')[-1]}/cpu?cloudshell=false&project={PROJECT_ID}"
board_link = f"https://{REGION}.tensorboard.googleusercontent.com/experiment/{tb.resource_name.replace('/', '+')}+experiments+{customJob.resource_name.split('/')[-1]}"

print(f'Review the Job here:\n{job_link}')
print(f'Review the TensorBoard From the Job here:\n{board_link}')

---
<a id = 'script'></a>
### Local Script

Run a single file training script with `aiplatform.CustomJob.from_local_script()`

Notes:
- This uses a single file `filename.py` from the local directory, not a GCS URI
- When you run `aiplatform.CustomJob.from_local_script()`, it responds with a message confirming that the local script was copied to the GCS URI provided in the `staging_bucket` parameter.

This is a modified version of notebook [05a - Vertex AI Custom Model - TensorFlow - Custom Job With Python File](../05%20-%20TensorFlow/05a%20-%20Vertex%20AI%20Custom%20Model%20-%20TensorFlow%20-%20Custom%20Job%20With%20Python%20File.ipynb) that uses the local script for this project.

In [108]:
WORKFLOW = 'workflow_script'
TIMESTAMP = datetime.now().strftime("%Y%m%d%H%M%S")
RUN_NAME = f"run-{WORKFLOW.replace('_', '-')}-{TIMESTAMP}"

CMDARGS[-1] = "--run_name=" + RUN_NAME
CMDARGS

['--epochs=10',
 '--batch_size=100',
 '--var_target=Class',
 '--var_omit=transaction_id',
 '--project_id=statmike-mlops-349915',
 '--bq_project=statmike-mlops-349915',
 '--bq_dataset=fraud',
 '--bq_table=fraud_prepped',
 '--region=us-central1',
 '--experiment=training',
 '--series=tips',
 '--experiment_name=experiment-tips-training-tf-classification-dnn',
 '--run_name=run-workflow-script-20220924190856']
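
Updating `CMDARGS[-1]` works only while `--run_name` happens to be the last entry in the list. A position-independent alternative is sketched below. The helper name `set_arg` is hypothetical and not part of this project's code, and it moves the updated flag to the end of the list rather than editing it in place.

```python
def set_arg(args, name, value):
    """Return a new argument list with exactly one `--name=value` entry.

    Any existing `--name=...` entries are dropped and the new one is appended,
    so the result does not depend on where the flag sat in the input list.
    """
    prefix = f"--{name}="
    updated = [arg for arg in args if not arg.startswith(prefix)]
    updated.append(prefix + str(value))
    return updated


example_args = ["--epochs=10", "--run_name=run-old", "--batch_size=100"]
print(set_arg(example_args, "run_name", "run-workflow-script-20220924190856"))
```

In the notebook this would be used as `CMDARGS = set_arg(CMDARGS, "run_name", RUN_NAME)`.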

In [113]:
customJob = aiplatform.CustomJob.from_local_script(
    display_name = f'{EXPERIMENT}_{SERIES}_{WORKFLOW}_{TIMESTAMP}',
    script_path = "./code/tips_trainer/src/tips_trainer/train.py",
    container_uri = TRAIN_IMAGE,
    args = CMDARGS,
    requirements = ['tensorflow_io', f'google-cloud-aiplatform>={aiplatform.__version__}', f"protobuf=={pkg_resources.get_distribution('protobuf').version}"],
    replica_count = 1,
    machine_type = TRAIN_COMPUTE,
    accelerator_count = 0,
    base_output_dir = f"{URI}/{WORKFLOW}/{TIMESTAMP}",
    staging_bucket = f"{URI}/{WORKFLOW}/{TIMESTAMP}",
    labels = {'series' : f'{SERIES}', 'experiment' : f'{EXPERIMENT}', 'experiment_name' : f'{EXPERIMENT_NAME}', 'run_name' : f'{RUN_NAME}'}
)

Training script copied to:
gs://statmike-mlops-349915/tips/training/workflow_script/20220924190856/aiplatform-2022-09-24-19:10:21.717-aiplatform_custom_trainer_script-0.1.tar.gz.


In [114]:
customJob.run(
    service_account = SERVICE_ACCOUNT,
    tensorboard = tb.resource_name
)

Creating CustomJob
CustomJob created. Resource name: projects/1026793852137/locations/us-central1/customJobs/398367081516498944
To use this CustomJob in another session:
custom_job = aiplatform.CustomJob.get('projects/1026793852137/locations/us-central1/customJobs/398367081516498944')
View Custom Job:
https://console.cloud.google.com/ai/platform/locations/us-central1/training/398367081516498944?project=1026793852137
View Tensorboard:
https://us-central1.tensorboard.googleusercontent.com/experiment/projects+1026793852137+locations+us-central1+tensorboards+7360834523774320640+experiments+398367081516498944
CustomJob projects/1026793852137/locations/us-central1/customJobs/398367081516498944 current state:
JobState.JOB_STATE_QUEUED
CustomJob projects/1026793852137/locations/us-central1/customJobs/398367081516498944 current state:
JobState.JOB_STATE_PENDING
CustomJob projects/1026793852137/locations/us-central1/customJobs/398367081516498944 current state:
JobState.JOB_STATE_PENDING

In [115]:
job_link = f"https://console.cloud.google.com/vertex-ai/locations/{REGION}/training/{customJob.resource_name.split('/')[-1]}/cpu?cloudshell=false&project={PROJECT_ID}"
print(f'Review the Job here:\n{job_link}')

Review the Job here:
https://console.cloud.google.com/vertex-ai/locations/us-central1/training/398367081516498944/cpu?cloudshell=false&project=statmike-mlops-349915


In [116]:
print(f'Review the model output here:\nhttps://console.cloud.google.com/storage/browser/{PROJECT_ID}/{SERIES}/{EXPERIMENT}/{WORKFLOW}/{TIMESTAMP}?project={PROJECT_ID}')

Review the model output here:
https://console.cloud.google.com/storage/browser/statmike-mlops-349915/tips/training/models/20220924190856?project=statmike-mlops-349915
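
The storage link above is assembled on the assumption that the bucket name equals `PROJECT_ID`. A sketch that derives the console link directly from a `gs://` URI, such as the value passed to `base_output_dir`, avoids that assumption. The helper name `gcs_console_link` is hypothetical, and the URL pattern is the one used in the cell above.

```python
def gcs_console_link(gcs_uri: str, project_id: str) -> str:
    """Convert a gs://bucket/path URI into a Cloud Storage browser URL."""
    scheme = "gs://"
    if not gcs_uri.startswith(scheme):
        raise ValueError(f"Expected a gs:// URI, got: {gcs_uri!r}")
    # Drop the scheme and any trailing slash; bucket/path is kept as-is.
    path = gcs_uri[len(scheme):].rstrip("/")
    return f"https://console.cloud.google.com/storage/browser/{path}?project={project_id}"


print(gcs_console_link(
    "gs://statmike-mlops-349915/tips/training/workflow_script/20220924190856",
    "statmike-mlops-349915",
))
```

In the notebook this could be called as `gcs_console_link(f"{URI}/{WORKFLOW}/{TIMESTAMP}", PROJECT_ID)`.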


<a id = 'source'></a>
### Python Source Distribution

Run a Python Source Distribution with `aiplatform.CustomJob(..., worker_pool_specs = )` by specifying `python_package_spec = ` in the `worker_pool_specs`.

Notes:
- This uses a Python Source Distribution which is in the format `.tar.gz`, a compressed tarball
- This project has a prepared Python Source Distribution in `./code/tips_trainer/dist/` which was created by [Python Packages](./Python%20Packages.ipynb)
- The `python_package_spec` parameter of the `worker_pool_specs` has a subparameter `package_uris` that accepts a list of up to 100 source distributions.  These must be provided as GCS URIs like `gs://bucketname/path_to_file.tar.gz`
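
The constraints in the notes above can be checked before a job is submitted. The sketch below is a local pre-flight check only. The helper name `validate_package_uris` is hypothetical, and the limit of 100 and the `gs://` requirement are taken from the notes rather than enforced by any SDK call shown here.

```python
MAX_PACKAGE_URIS = 100  # limit stated in the notes above


def validate_package_uris(package_uris):
    """Return the URIs as a list, or raise ValueError if they break the stated rules."""
    uris = list(package_uris)
    if not uris:
        raise ValueError("package_uris must contain at least one source distribution")
    if len(uris) > MAX_PACKAGE_URIS:
        raise ValueError(f"package_uris allows at most {MAX_PACKAGE_URIS} entries, got {len(uris)}")
    for uri in uris:
        # Local paths are rejected: the distributions must already be in GCS.
        if not uri.startswith("gs://"):
            raise ValueError(f"package URI must be a GCS URI (gs://...), got: {uri!r}")
    return uris


print(validate_package_uris(["gs://bucketname/path_to_file.tar.gz"]))
```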

This is a modified version of notebook [05b - Vertex AI Custom Model - TensorFlow - Custom Job With Python Source Distribution](../05%20-%20TensorFlow/05b%20-%20Vertex%20AI%20Custom%20Model%20-%20TensorFlow%20-%20Custom%20Job%20With%20Python%20Source%20Distribution.ipynb) that uses the source distribution stored in GCS for this project.

In [133]:
WORKFLOW = 'workflow_source'
TIMESTAMP = datetime.now().strftime("%Y%m%d%H%M%S")
RUN_NAME = f"run-{WORKFLOW.replace('_', '-')}-{TIMESTAMP}"


CMDARGS[-1] = "--run_name=" + RUN_NAME

# copy the spec so WORKER_POOL_SPEC itself is not modified,
# then remove container_spec and replace with python_package_spec:
import copy
SOURCE_WORKER_POOL_SPEC = copy.deepcopy(WORKER_POOL_SPEC)
SOURCE_WORKER_POOL_SPEC[0].pop('container_spec', None)
SOURCE_WORKER_POOL_SPEC[0]['python_package_spec'] = {
            "executor_image_uri": TRAIN_IMAGE,
            "package_uris": [f"gs://{PROJECT_ID}/{SERIES}/code/tips_trainer/dist/tips_trainer-0.1.tar.gz"],
            "python_module": "tips_trainer.train",
            "args": CMDARGS
}

SOURCE_WORKER_POOL_SPEC

[{'replica_count': 1,
  'machine_spec': {'machine_type': 'n1-standard-4', 'accelerator_count': 0},
  'python_package_spec': {'executor_image_uri': 'us-docker.pkg.dev/vertex-ai/training/tf-cpu.2-7:latest',
   'package_uris': ['gs://statmike-mlops-349915/tips/code/tips_trainer/dist/tips_trainer-0.1.tar.gz'],
   'python_module': 'tips_trainer.train',
   'args': ['--epochs=10',
    '--batch_size=100',
    '--var_target=Class',
    '--var_omit=transaction_id',
    '--project_id=statmike-mlops-349915',
    '--bq_project=statmike-mlops-349915',
    '--bq_dataset=fraud',
    '--bq_table=fraud_prepped',
    '--region=us-central1',
    '--experiment=training',
    '--series=tips',
    '--experiment_name=experiment-tips-training-tf-classification-dnn',
    '--run_name=run-workflow-source-20220924194755']}}]

In [134]:
customJob = aiplatform.CustomJob(
    display_name = f'{EXPERIMENT}_{SERIES}_{WORKFLOW}_{TIMESTAMP}',
    worker_pool_specs = SOURCE_WORKER_POOL_SPEC,
    base_output_dir = f"{URI}/{WORKFLOW}/{TIMESTAMP}",
    staging_bucket = f"{URI}/{WORKFLOW}/{TIMESTAMP}",
    labels = {'series' : f'{SERIES}', 'experiment' : f'{EXPERIMENT}', 'experiment_name' : f'{EXPERIMENT_NAME}', 'run_name' : f'{RUN_NAME}'}
)

In [135]:
customJob.run(
    service_account = SERVICE_ACCOUNT,
    tensorboard = tb.resource_name
)

Creating CustomJob
CustomJob created. Resource name: projects/1026793852137/locations/us-central1/customJobs/7138003923876446208
To use this CustomJob in another session:
custom_job = aiplatform.CustomJob.get('projects/1026793852137/locations/us-central1/customJobs/7138003923876446208')
View Custom Job:
https://console.cloud.google.com/ai/platform/locations/us-central1/training/7138003923876446208?project=1026793852137
View Tensorboard:
https://us-central1.tensorboard.googleusercontent.com/experiment/projects+1026793852137+locations+us-central1+tensorboards+7360834523774320640+experiments+7138003923876446208
CustomJob projects/1026793852137/locations/us-central1/customJobs/7138003923876446208 current state:
JobState.JOB_STATE_QUEUED
CustomJob projects/1026793852137/locations/us-central1/customJobs/7138003923876446208 current state:
JobState.JOB_STATE_PENDING
CustomJob projects/1026793852137/locations/us-central1/customJobs/7138003923876446208 current state:
JobState.JOB_STATE_PENDING

In [136]:
job_link = f"https://console.cloud.google.com/vertex-ai/locations/{REGION}/training/{customJob.resource_name.split('/')[-1]}/cpu?cloudshell=false&project={PROJECT_ID}"
print(f'Review the Job here:\n{job_link}')

Review the Job here:
https://console.cloud.google.com/vertex-ai/locations/us-central1/training/7138003923876446208/cpu?cloudshell=false&project=statmike-mlops-349915


In [138]:
print(f'Review the model output here:\nhttps://console.cloud.google.com/storage/browser/{PROJECT_ID}/{SERIES}/{EXPERIMENT}/{WORKFLOW}/{TIMESTAMP}?project={PROJECT_ID}')

Review the model output here:
https://console.cloud.google.com/storage/browser/statmike-mlops-349915/tips/training/workflow_source/20220924194755?project=statmike-mlops-349915


<a id = 'notebook'></a>
### Notebook - local code


Use the training script in the local notebook environment.  While the script is authored and packaged for running in a Vertex AI Training Custom Job it can also be used locally.  This is helpful for testing.  

The job will be launched like any Python job with `python -m tips_trainer.train <list_of_args_here>`.  Choices for making the `tips_trainer.train` module/file/script available are:
- pip install from Artifact Registry: `pip install --index-url https://{REGION}-python.pkg.dev/{PROJECT_ID}/{PROJECT_ID}-python/simple tips-trainer`
- pip install from local directory: `pip install ./code/tips_trainer/dist/*.whl`
- pip install from GitHub: `pip install https://github.com/statmike/vertex-ai-mlops/blob/main/Tips/code/tips_trainer/dist/tips_trainer-0.1-py3-none-any.whl?raw=true`
- run from local directory: `./code/tips_trainer/src` or copy to directory of choice
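
For the last option, the module can also be launched from Python instead of a shell command. The sketch below passes the arguments as a list, so no shell quoting is involved. The helper names `build_local_command` and `run_local` are hypothetical, and it assumes the package source sits under `./code/tips_trainer/src` as described above.

```python
import subprocess
import sys


def build_local_command(module, args):
    """Build the argv list for `python -m <module> <args...>` using the current interpreter."""
    return [sys.executable, "-m", module, *args]


def run_local(module, args, cwd):
    """Run the module from `cwd`; raises CalledProcessError on a non-zero exit code."""
    return subprocess.run(build_local_command(module, args), cwd=cwd, check=True)


# Example (not executed here, since it needs the package source and project access):
# run_local("tips_trainer.train", CMDARGS, cwd="./code/tips_trainer/src")
print(build_local_command("tips_trainer.train", ["--epochs=10", "--batch_size=100"]))
```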

Notes:
- Vertex AI Training Jobs set [environment variables for Cloud Storage locations](https://cloud.google.com/vertex-ai/docs/training/code-requirements#environment-variables).  Since this example runs the training code in the local notebook instance rather than in a custom job, these need to be set manually.
    - `AIP_MODEL_DIR` - this extends `base_output_directory` with `/model`
    - `AIP_TENSORBOARD_LOG_DIR` - this extends `base_output_directory` with `/logs`
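
The mapping from a base output directory to these two variables can be sketched as a small helper. The function names `local_output_dirs` and `set_local_training_env` are hypothetical, and the suffixes `/model` and `/logs` mirror the values this notebook sets manually rather than anything read back from Vertex AI.

```python
import os


def local_output_dirs(base_output_dir):
    """Derive the two AIP_* locations this notebook uses from a base output directory."""
    base = base_output_dir.rstrip("/")
    return {
        "AIP_MODEL_DIR": f"{base}/model",
        "AIP_TENSORBOARD_LOG_DIR": f"{base}/logs",
    }


def set_local_training_env(base_output_dir, environ=os.environ):
    """Write the derived locations into `environ` (defaults to the process environment)."""
    dirs = local_output_dirs(base_output_dir)
    environ.update(dirs)
    return dirs


# Demonstrate against a plain dict so the real environment is left untouched.
demo_env = {}
set_local_training_env("gs://bucketname/tips/training/workflow_nb_local/20220924180350", demo_env)
print(demo_env)
```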

In [84]:
WORKFLOW = 'workflow_nb_local'
TIMESTAMP = datetime.now().strftime("%Y%m%d%H%M%S")
RUN_NAME = f"run-{WORKFLOW.replace('_', '-')}-{TIMESTAMP}"


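# replace any existing --run_name argument rather than appending a duplicate:
CMDARGS = [arg for arg in CMDARGS if not arg.startswith("--run_name=")]
CMDARGS.append("--run_name=" + RUN_NAME)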

Set the environment variables the code expects:

In [85]:
base_output_dir = f"{URI}/{WORKFLOW}/{TIMESTAMP}"

os.environ["AIP_MODEL_DIR"] = base_output_dir + '/model'
os.environ["AIP_TENSORBOARD_LOG_DIR"] = base_output_dir + '/logs'

In [88]:
%%bash
echo $AIP_MODEL_DIR
echo $AIP_TENSORBOARD_LOG_DIR

gs://statmike-mlops-349915/tips/training/workflow_nb_local/20220924180350/model
gs://statmike-mlops-349915/tips/training/workflow_nb_local/20220924180350/logs


Run the training code locally:

In [92]:
!cd ./code/tips_trainer/src && python -m tips_trainer.train {(' ').join(CMDARGS)}

2022-09-24 18:08:42.932394: W tensorflow_io/core/kernels/audio_video_mp3_kernels.cc:271] libmp3lame.so.0 or lame functions are not available
2022-09-24 18:08:42.932813: I tensorflow_io/core/kernels/cpu_check.cc:128] Your CPU supports instructions that this TensorFlow IO binary was not compiled to use: AVX2 FMA
Associating projects/1026793852137/locations/us-central1/metadataStores/default/contexts/experiment-tips-training-tf-classification-dnn-run-workflow-nb-local-20220924180350 to Experiment: experiment-tips-training-tf-classification-dnn
2022-09-24 18:08:56.776553: I tensorflow/core/platform/profile_utils/cpu_utils.cc:104] CPU Frequency: 2299995000 Hz
2022-09-24 18:08:56.777145: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55869ed952c0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2022-09-24 18:08:56.777183: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version

In [93]:
print(f"Review the Cloud Storage contents for this job here:\nhttps://console.cloud.google.com/storage/browser/{PROJECT_ID}/{SERIES}/{EXPERIMENT}/{WORKFLOW}/{TIMESTAMP}?project={PROJECT_ID}")

Review the Cloud Storage contents for this job here:
https://console.cloud.google.com/storage/browser/statmike-mlops-349915/tips/training/workflow_nb_local/20220924180350?project=statmike-mlops-349915


In [100]:
print(f"Review the TensorBoard for this Experiment here:\nhttps://{REGION}.tensorboard.googleusercontent.com/experiment/{tb.resource_name.replace('/', '+')}+experiments+{EXPERIMENT_NAME}")

Review the TensorBoard for this Experiment here:
https://us-central1.tensorboard.googleusercontent.com/experiment/projects+1026793852137+locations+us-central1+tensorboards+7360834523774320640+experiments+experiment-tips-training-tf-classification-dnn
