## Configure environment settings

Set location paths, connections strings, and other environment settings. Make sure to update   `REGION`, and `ARTIFACT_STORE`  with the settings reflecting your lab environment. 

- `REGION` - the compute region for Vertex AI Training and Prediction
- `ARTIFACT_STORE` - A GCS bucket in the created in the same region.

Containerized Training Cloud Serving
- `Phase 1` - (Before ML Pipeline)

In [116]:
import os
import time

from google.cloud import aiplatform

In [117]:
REGION = "us-east1"

PROJECT_ID = !(gcloud config get-value core/project)
PROJECT_ID = PROJECT_ID[0]

ARTIFACT_STORE = f"gs://{PROJECT_ID}-beer-artifact-store"

DATA_ROOT = f"{ARTIFACT_STORE}/data"
JOB_DIR_ROOT = f"{ARTIFACT_STORE}/jobs"
TRAINING_FILE_PATH = f"{DATA_ROOT}/train.parquet"
VALIDATION_FILE_PATH = f"{DATA_ROOT}/valid.parquet"
API_ENDPOINT = f"{REGION}-aiplatform.googleapis.com"

In [118]:
os.environ["JOB_DIR_ROOT"] = JOB_DIR_ROOT
os.environ["TRAINING_FILE_PATH"] = TRAINING_FILE_PATH
os.environ["VALIDATION_FILE_PATH"] = VALIDATION_FILE_PATH
os.environ["PROJECT_ID"] = PROJECT_ID
os.environ["REGION"] = REGION

In [119]:
!gsutil ls | grep ^{ARTIFACT_STORE}/$ || gsutil mb -l {REGION} {ARTIFACT_STORE}

gs://qwiklabs-asl-04-5e165f533cac-beer-artifact-store/


### Package the script into a docker image.

Notice that we are installing specific versions of `scikit-learn` and `pandas` in the training image. This is done to make sure that the training runtime in the training container is aligned with the serving runtime in the serving container. 

Make sure to update the URI for the base image so that it points to your project's **Container Registry**.

In [120]:
TRAINING_APP_FOLDER = "training_app"
os.makedirs(TRAINING_APP_FOLDER, exist_ok=True)

In [121]:
%%writefile {TRAINING_APP_FOLDER}/Dockerfile

FROM gcr.io/deeplearning-platform-release/base-cpu
RUN pip install -U fire cloudml-hypertune implicit
WORKDIR /app
COPY train.py .

ENTRYPOINT ["python", "train.py"]

Overwriting training_app/Dockerfile


In [122]:
IMAGE_NAME = "trainer_image"
IMAGE_TAG = "latest"
IMAGE_URI = f"gcr.io/{PROJECT_ID}/{IMAGE_NAME}:{IMAGE_TAG}"

os.environ["IMAGE_URI"] = IMAGE_URI

In [123]:
!gcloud builds submit --tag $IMAGE_URI $TRAINING_APP_FOLDER

Creating temporary tarball archive of 4 file(s) totalling 15.7 KiB before compression.
Uploading tarball of [training_app] to [gs://qwiklabs-asl-04-5e165f533cac_cloudbuild/source/1654594411.17675-78d8543396d14197b1d0784745295a5d.tgz]
Created [https://cloudbuild.googleapis.com/v1/projects/qwiklabs-asl-04-5e165f533cac/locations/global/builds/9e6c9d34-03dc-4794-bf75-f9ac09fe55a3].
Logs are available at [https://console.cloud.google.com/cloud-build/builds/9e6c9d34-03dc-4794-bf75-f9ac09fe55a3?project=547029906128].
----------------------------- REMOTE BUILD OUTPUT ------------------------------
starting build "9e6c9d34-03dc-4794-bf75-f9ac09fe55a3"

FETCHSOURCE
Fetching storage object: gs://qwiklabs-asl-04-5e165f533cac_cloudbuild/source/1654594411.17675-78d8543396d14197b1d0784745295a5d.tgz#1654594411391223
Copying gs://qwiklabs-asl-04-5e165f533cac_cloudbuild/source/1654594411.17675-78d8543396d14197b1d0784745295a5d.tgz#1654594411391223...
/ [1 files][  3.1 KiB/  3.1 KiB]                      

## Submit an Vertex AI hyperparameter tuning job

In [124]:
TIMESTAMP = time.strftime("%Y%m%d_%H%M%S")
JOB_NAME = f"beer_recom_tuning_{TIMESTAMP}"
JOB_DIR = f"{JOB_DIR_ROOT}/{JOB_NAME}"

os.environ["JOB_NAME"] = JOB_NAME
os.environ["JOB_DIR"] = JOB_DIR

In [125]:
%%bash

CONFIG_YAML=config.yaml

gcloud ai hp-tuning-jobs create \
    --region=$REGION \
    --display-name=$JOB_NAME \
    --config=$CONFIG_YAML \
    --max-trial-count=5 \
    --parallel-trial-count=5

echo "JOB_NAME: $JOB_NAME"

JOB_NAME: beer_recom_tuning_20220607_093545


Using endpoint [https://us-east1-aiplatform.googleapis.com/]
Hyperparameter tuning job [1381921189370265600] submitted successfully.

Your job is still active. You may view the status of your job with the command

  $ gcloud ai hp-tuning-jobs describe 1381921189370265600 --region=us-east1

Job State: JOB_STATE_PENDING


In [126]:
jobs = aiplatform.HyperparameterTuningJob.list()
match = [job for job in jobs if job.display_name == JOB_NAME]
tuning_job = match[0] if match else None
JOB_NUM = str(tuning_job)[-19:]
print(JOB_NUM)
!gcloud ai hp-tuning-jobs describe $JOB_NUM --region=us-east1

1381921189370265600
Using endpoint [https://us-east1-aiplatform.googleapis.com/]
createTime: '2022-06-07T09:35:46.720471Z'
displayName: beer_recom_tuning_20220607_093545
maxTrialCount: 5
name: projects/547029906128/locations/us-east1/hyperparameterTuningJobs/1381921189370265600
parallelTrialCount: 5
startTime: '2022-06-07T09:35:46.868390Z'
state: JOB_STATE_QUEUED
studySpec:
  metrics:
  - goal: MAXIMIZE
    metricId: map_at_10
  parameters:
  - discreteValueSpec:
      values:
      - 16.0
      - 32.0
      - 64.0
      - 128.0
    parameterId: factors
  - integerValueSpec:
      maxValue: '100'
      minValue: '10'
    parameterId: iterations
    scaleType: UNIT_LINEAR_SCALE
  - doubleValueSpec:
      maxValue: 0.1
      minValue: 0.0001
    parameterId: regularization
    scaleType: UNIT_LOG_SCALE
trialJobSpec:
  workerPoolSpecs:
  - containerSpec:
      imageUri: gcr.io/qwiklabs-asl-04-5e165f533cac/trainer_image
    diskSpec:
      bootDiskSizeGb: 100
      bootDiskType: pd-ssd
   

### Retrieve HP-tuning results.

After the job completes you can review the results using GCP Console or programmatically using the following functions (note that this code supposes that the metrics that the hyperparameter tuning engine optimizes is maximized): 

In [129]:
def get_trials(job_name):
    #aiplatform.init(location="us-east1")
    jobs = aiplatform.HyperparameterTuningJob.list()
    match = [job for job in jobs if job.display_name == JOB_NAME]
    tuning_job = match[0] if match else None
    return tuning_job.trials if tuning_job else None


def get_best_trial(trials):
    metrics = [trial.final_measurement.metrics[0].value for trial in trials]
    best_trial = trials[metrics.index(max(metrics))]
    return best_trial


def retrieve_best_trial_from_job_name(jobname):
    trials = get_trials(jobname)
    best_trial = get_best_trial(trials)
    return best_trial

In [133]:
jobs = aiplatform.HyperparameterTuningJob.list()
match = [job for job in jobs if job.display_name == JOB_NAME]
print(match)
tuning_job = match[0] if match else None
print(tuning_job)
print(tuning_job.trials)
best_trial = retrieve_best_trial_from_job_name(JOB_NAME)

[<google.cloud.aiplatform.jobs.HyperparameterTuningJob object at 0x7f0df9745590> 
resource name: projects/547029906128/locations/us-east1/hyperparameterTuningJobs/1381921189370265600]
<google.cloud.aiplatform.jobs.HyperparameterTuningJob object at 0x7f0df9745590> 
resource name: projects/547029906128/locations/us-east1/hyperparameterTuningJobs/1381921189370265600
[id: "1"
state: INFEASIBLE
parameters {
  parameter_id: "factors"
  value {
    number_value: 64.0
  }
}
parameters {
  parameter_id: "iterations"
  value {
    number_value: 55.0
  }
}
parameters {
  parameter_id: "regularization"
  value {
    number_value: 0.003162277660168379
  }
}
start_time {
  seconds: 1654594556
  nanos: 21423181
}
end_time {
  seconds: 1654594654
}
, id: "2"
state: INFEASIBLE
parameters {
  parameter_id: "factors"
  value {
    number_value: 32.0
  }
}
parameters {
  parameter_id: "iterations"
  value {
    number_value: 36.0
  }
}
parameters {
  parameter_id: "regularization"
  value {
    number_val

IndexError: list index (0) out of range

## Retrain the model with the best hyperparameters

You can now retrain the model using the best hyperparameters and using combined training and validation splits as a training dataset.

### Configure and run the training job

In [134]:
FACTORS = int(best_trial.parameters[0].value)
ITERATIONS = int(best_trial.parameters[1].value)
REGULARIZATION = best_trial.parameters[2].value
ISTUNE = False

REGION = "us-east1"
PROJECT_ID = "qwiklabs-asl-04-5e165f533cac"
ARTIFACT_STORE = f"gs://{PROJECT_ID}-beer-artifact-store"
DATA_ROOT = os.path.join(ARTIFACT_STORE, "data")
JOB_DIR_ROOT = os.path.join(ARTIFACT_STORE, "jobs")

TIMESTAMP = time.strftime("%Y%m%d_%H%M%S")
JOB_NAME = f"JOB_VERTEX_{TIMESTAMP}"
JOB_DIR = f"{JOB_DIR_ROOT}/{JOB_NAME}"

MACHINE_TYPE="n1-standard-16"
REPLICA_COUNT=1

WORKER_POOL_SPEC = f"""\
machine-type={MACHINE_TYPE},\
replica-count={REPLICA_COUNT},\
container-image-uri={IMAGE_URI}\
"""

ARGS = f"""\
--factors={FACTORS},\
--regularization={REGULARIZATION},\
--iterations={ITERATIONS},\
--is_tune={ISTUNE}
"""

!gcloud ai custom-jobs create \
  --region={REGION} \
  --display-name={JOB_NAME} \
  --worker-pool-spec={WORKER_POOL_SPEC} \
  --args={ARGS}

#!gcloud ai custom-jobs create \
#  --region={REGION} \
#  --display-name={JOB_NAME} \
#  --worker-pool-spec={WORKER_POOL_SPEC} \

print("The model will be exported at:", JOB_DIR)

Using endpoint [https://us-east1-aiplatform.googleapis.com/]
CustomJob [projects/547029906128/locations/us-east1/customJobs/1535043576700862464] is submitted successfully.

Your job is still active. You may view the status of your job with the command

  $ gcloud ai custom-jobs describe projects/547029906128/locations/us-east1/customJobs/1535043576700862464

or continue streaming the logs with the command

  $ gcloud ai custom-jobs stream-logs projects/547029906128/locations/us-east1/customJobs/1535043576700862464
The model will be exported at: gs://qwiklabs-asl-04-5e165f533cac-beer-artifact-store/jobs/JOB_VERTEX_20220607_094936


In [135]:
#jobs = aiplatform.HyperparameterTuningJob.list()
jobs = aiplatform.CustomJob.list()
match = [job for job in jobs if job.display_name == JOB_NAME]
tuning_job = match[0] if match else None
JOB_NUM = str(tuning_job)[-19:]
print(JOB_NUM)
!gcloud ai custom-jobs describe projects/547029906128/locations/us-east1/customJobs/$JOB_NUM

1535043576700862464
Using endpoint [https://us-east1-aiplatform.googleapis.com/]
createTime: '2022-06-07T09:49:37.283054Z'
displayName: JOB_VERTEX_20220607_094936
jobSpec:
  workerPoolSpecs:
  - containerSpec:
      args:
      - --factors=32
      - --regularization=0.0007930078893541997
      - --iterations=100
      - --is_tune=False
      imageUri: gcr.io/qwiklabs-asl-04-5e165f533cac/trainer_image:latest
    diskSpec:
      bootDiskSizeGb: 100
      bootDiskType: pd-ssd
    machineSpec:
      machineType: n1-standard-16
    replicaCount: '1'
name: projects/547029906128/locations/us-east1/customJobs/1535043576700862464
startTime: '2022-06-07T09:49:37.481925Z'
state: JOB_STATE_PENDING
updateTime: '2022-06-07T09:49:37.716216Z'
