## Configure environment settings

Set location paths, connection strings, and other environment settings. Make sure to update `REGION` and `ARTIFACT_STORE` with the settings for your lab environment.

- `REGION` - the compute region for Vertex AI Training and Prediction.
- `ARTIFACT_STORE` - a Cloud Storage bucket created in the same region.
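Every path used in the following cells is derived from the project ID and the region. As a minimal sketch, the derivation can be written as a pure function so the values can be checked before anything touches Cloud Storage. `build_settings` is a hypothetical helper, not part of the lab, and `my-project` is a placeholder project ID.

```python
def build_settings(project_id: str, region: str) -> dict:
    """Derive the lab's paths from the project ID and region (hypothetical helper)."""
    # Bucket naming follows the convention used in this notebook.
    artifact_store = f"gs://{project_id}-beer-artifact-store"
    data_root = f"{artifact_store}/data"
    return {
        "REGION": region,
        "ARTIFACT_STORE": artifact_store,
        "DATA_ROOT": data_root,
        "JOB_DIR_ROOT": f"{artifact_store}/jobs",
        "TRAINING_FILE_PATH": f"{data_root}/train.parquet",
        "VALIDATION_FILE_PATH": f"{data_root}/valid.parquet",
        # Regional Vertex AI API endpoint for the chosen region.
        "API_ENDPOINT": f"{region}-aiplatform.googleapis.com",
    }


settings = build_settings("my-project", "us-east1")
for key, value in settings.items():
    print(f"{key}={value}")
```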

**Phase 1 (before the ML pipeline):** containerized training and Cloud serving.

In [23]:
import os
import time

from google.cloud import aiplatform

In [24]:
REGION = "us-east1"

PROJECT_ID = !(gcloud config get-value core/project)
PROJECT_ID = PROJECT_ID[0]

ARTIFACT_STORE = f"gs://{PROJECT_ID}-beer-artifact-store"

DATA_ROOT = f"{ARTIFACT_STORE}/data"
JOB_DIR_ROOT = f"{ARTIFACT_STORE}/jobs"
TRAINING_FILE_PATH = f"{DATA_ROOT}/train.parquet"
VALIDATION_FILE_PATH = f"{DATA_ROOT}/valid.parquet"
API_ENDPOINT = f"{REGION}-aiplatform.googleapis.com"

In [25]:
os.environ["JOB_DIR_ROOT"] = JOB_DIR_ROOT
os.environ["TRAINING_FILE_PATH"] = TRAINING_FILE_PATH
os.environ["VALIDATION_FILE_PATH"] = VALIDATION_FILE_PATH
os.environ["PROJECT_ID"] = PROJECT_ID
os.environ["REGION"] = REGION

In [26]:
!gsutil ls | grep ^{ARTIFACT_STORE}/$ || gsutil mb -l {REGION} {ARTIFACT_STORE}

gs://qwiklabs-asl-04-5e165f533cac-beer-artifact-store/


### Package the script into a Docker image.

The Dockerfile below installs `fire`, `cloudml-hypertune`, and `implicit` on top of the Deep Learning Containers base image. Because `pip install -U` pulls the latest releases, consider pinning exact versions so that the training runtime in the training container stays aligned with the serving runtime in the serving container.

Make sure the URI of the training image (`IMAGE_URI` below) points to your project's **Container Registry**.
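One way to get the training/serving alignment described above is to pin exact versions in the Dockerfile instead of using `pip install -U`. The sketch below renders such a Dockerfile from a dictionary of pins. The helper `render_dockerfile` is hypothetical, and the `X.Y.Z` strings are placeholders, not versions this lab was tested with; use the versions installed in your serving image.

```python
def render_dockerfile(
    pins: dict, base_image: str = "gcr.io/deeplearning-platform-release/base-cpu"
) -> str:
    """Render a training Dockerfile that installs exact package versions."""
    # "name==version" pins each package so rebuilding the image gives the same runtime.
    requirements = " ".join(f"{name}=={version}" for name, version in sorted(pins.items()))
    lines = [
        f"FROM {base_image}",
        f"RUN pip install {requirements}",
        "WORKDIR /app",
        "COPY train.py .",
        'ENTRYPOINT ["python", "train.py"]',
    ]
    return "\n".join(lines) + "\n"


# Placeholder versions for illustration only.
print(render_dockerfile({"fire": "X.Y.Z", "implicit": "X.Y.Z", "cloudml-hypertune": "X.Y.Z"}))
```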

In [27]:
TRAINING_APP_FOLDER = "training_app"
os.makedirs(TRAINING_APP_FOLDER, exist_ok=True)

In [28]:
%%writefile {TRAINING_APP_FOLDER}/Dockerfile

FROM gcr.io/deeplearning-platform-release/base-cpu
RUN pip install -U fire cloudml-hypertune implicit
WORKDIR /app
COPY train.py .

ENTRYPOINT ["python", "train.py"]

Overwriting training_app/Dockerfile


In [29]:
IMAGE_NAME = "trainer_image"
IMAGE_TAG = "latest"
IMAGE_URI = f"gcr.io/{PROJECT_ID}/{IMAGE_NAME}:{IMAGE_TAG}"

os.environ["IMAGE_URI"] = IMAGE_URI

In [30]:
!gcloud builds submit --tag $IMAGE_URI $TRAINING_APP_FOLDER

Creating temporary tarball archive of 4 file(s) totalling 12.3 KiB before compression.
Uploading tarball of [training_app] to [gs://qwiklabs-asl-04-5e165f533cac_cloudbuild/source/1654568701.931167-bad19bd8f3654931afdc32eb3b40a166.tgz]
Created [https://cloudbuild.googleapis.com/v1/projects/qwiklabs-asl-04-5e165f533cac/locations/global/builds/9badeee5-c921-45e1-97cc-8ee02ccc39df].
Logs are available at [https://console.cloud.google.com/cloud-build/builds/9badeee5-c921-45e1-97cc-8ee02ccc39df?project=547029906128].
----------------------------- REMOTE BUILD OUTPUT ------------------------------
starting build "9badeee5-c921-45e1-97cc-8ee02ccc39df"

FETCHSOURCE
Fetching storage object: gs://qwiklabs-asl-04-5e165f533cac_cloudbuild/source/1654568701.931167-bad19bd8f3654931afdc32eb3b40a166.tgz#1654568702179003
Copying gs://qwiklabs-asl-04-5e165f533cac_cloudbuild/source/1654568701.931167-bad19bd8f3654931afdc32eb3b40a166.tgz#1654568702179003...
/ [1 files][  2.7 KiB/  2.7 KiB]                   

## Submit a Vertex AI hyperparameter tuning job

In [31]:
TIMESTAMP = time.strftime("%Y%m%d_%H%M%S")
JOB_NAME = f"beer_recom_tuning_{TIMESTAMP}"
JOB_DIR = f"{JOB_DIR_ROOT}/{JOB_NAME}"

os.environ["JOB_NAME"] = JOB_NAME
os.environ["JOB_DIR"] = JOB_DIR
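The tuning job reads its study and trial settings from `config.yaml`, which is not shown in this notebook. As a sketch, the following writes a config consistent with the `studySpec` reported by `gcloud ai hp-tuning-jobs describe` further down. The machine type and the trial `args` (`--is_tune=True`) are assumptions, since the describe output is truncated before them; adjust them to whatever your `train.py` expects.

```python
import os

# Fall back to a placeholder URI when IMAGE_URI is not set in the environment.
image_uri = os.environ.get("IMAGE_URI", "gcr.io/your-project/trainer_image:latest")
machine_type = "n1-standard-4"  # assumption: not visible in the describe output

config_yaml = f"""\
studySpec:
  metrics:
  - metricId: map_at_10
    goal: MAXIMIZE
  parameters:
  - parameterId: factors
    discreteValueSpec:
      values: [16, 32, 64, 128]
  - parameterId: iterations
    integerValueSpec:
      minValue: 10
      maxValue: 100
    scaleType: UNIT_LINEAR_SCALE
  - parameterId: regularization
    doubleValueSpec:
      minValue: 0.0001
      maxValue: 0.1
    scaleType: UNIT_LOG_SCALE
trialJobSpec:
  workerPoolSpecs:
  - machineSpec:
      machineType: {machine_type}
    replicaCount: 1
    containerSpec:
      imageUri: {image_uri}
      args:
      - --is_tune=True  # assumption: flag name taken from the retraining job below
"""

# gcloud reads this file through --config=config.yaml.
with open("config.yaml", "w") as f:
    f.write(config_yaml)
print(config_yaml)
```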

In [32]:
%%bash

CONFIG_YAML=config.yaml

gcloud ai hp-tuning-jobs create \
    --region=$REGION \
    --display-name=$JOB_NAME \
    --config=$CONFIG_YAML \
    --max-trial-count=5 \
    --parallel-trial-count=5

echo "JOB_NAME: $JOB_NAME"

JOB_NAME: beer_recom_tuning_20220607_022720


Using endpoint [https://us-east1-aiplatform.googleapis.com/]
Hyperparameter tuning job [3073022849447886848] submitted successfully.

Your job is still active. You may view the status of your job with the command

  $ gcloud ai hp-tuning-jobs describe 3073022849447886848 --region=us-east1

Job State: JOB_STATE_PENDING


In [62]:
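# List tuning jobs in the lab region and pick the one submitted above.
jobs = aiplatform.HyperparameterTuningJob.list(project=PROJECT_ID, location=REGION)
match = [job for job in jobs if job.display_name == JOB_NAME]
tuning_job = match[0] if match else None
# `.name` is the numeric job ID (the last segment of the resource name).
JOB_NUM = tuning_job.name
print(JOB_NUM)
!gcloud ai hp-tuning-jobs describe $JOB_NUM --region=$REGION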

3073022849447886848
Using endpoint [https://us-east1-aiplatform.googleapis.com/]
createTime: '2022-06-07T02:27:21.658327Z'
displayName: beer_recom_tuning_20220607_022720
endTime: '2022-06-07T02:35:37Z'
maxTrialCount: 5
name: projects/547029906128/locations/us-east1/hyperparameterTuningJobs/3073022849447886848
parallelTrialCount: 5
startTime: '2022-06-07T02:27:28Z'
state: JOB_STATE_SUCCEEDED
studySpec:
  metrics:
  - goal: MAXIMIZE
    metricId: map_at_10
  parameters:
  - discreteValueSpec:
      values:
      - 16.0
      - 32.0
      - 64.0
      - 128.0
    parameterId: factors
  - integerValueSpec:
      maxValue: '100'
      minValue: '10'
    parameterId: iterations
    scaleType: UNIT_LINEAR_SCALE
  - doubleValueSpec:
      maxValue: 0.1
      minValue: 0.0001
    parameterId: regularization
    scaleType: UNIT_LOG_SCALE
trialJobSpec:
  workerPoolSpecs:
  - containerSpec:
      imageUri: gcr.io/qwiklabs-asl-04-5e165f533cac/trainer_image
    diskSpec:
      bootDiskSizeGb: 100
  

### Retrieve HP-tuning results.

After the job completes, you can review the results in the Google Cloud console or programmatically with the following functions. Note that this code assumes the metric optimized by the tuning engine is maximized (here `map_at_10`, with goal `MAXIMIZE`).

In [63]:
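def get_trials(job_name):
    """Return the trials of the tuning job whose display name is `job_name`."""
    jobs = aiplatform.HyperparameterTuningJob.list(project=PROJECT_ID, location=REGION)
    match = [job for job in jobs if job.display_name == job_name]
    tuning_job = match[0] if match else None
    return tuning_job.trials if tuning_job else None


def get_best_trial(trials):
    """Return the trial with the highest final metric value (assumes MAXIMIZE)."""
    # Skip trials (e.g. failed ones) that never reported a final measurement.
    trials = [trial for trial in trials if trial.final_measurement.metrics]
    metrics = [trial.final_measurement.metrics[0].value for trial in trials]
    best_trial = trials[metrics.index(max(metrics))]
    return best_trial


def retrieve_best_trial_from_job_name(jobname):
    trials = get_trials(jobname)
    best_trial = get_best_trial(trials)
    return best_trial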
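The selection logic above takes the first metric of each trial and picks the maximum, matching the `MAXIMIZE` goal of this study. A slightly more general sketch, on plain dictionaries so it runs without Vertex AI, also honors a `MINIMIZE` goal and skips trials that never reported a final measurement. The helper `best_trial_id` and the trial values are made up for illustration.

```python
def best_trial_id(trials, goal="MAXIMIZE"):
    """Return the ID of the best trial, ignoring trials that reported no metric."""
    # Trials that failed or were stopped may have no final measurement.
    measured = [t for t in trials if t.get("metric") is not None]
    if not measured:
        return None
    pick = max if goal == "MAXIMIZE" else min
    return pick(measured, key=lambda t: t["metric"])["id"]


# Hypothetical trials; trial "3" has no final measurement.
example_trials = [
    {"id": "1", "metric": 0.05},
    {"id": "2", "metric": 0.08},
    {"id": "3", "metric": None},
]
print(best_trial_id(example_trials))              # 2
print(best_trial_id(example_trials, "MINIMIZE"))  # 1
```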

In [64]:
best_trial = retrieve_best_trial_from_job_name(JOB_NAME)
print(best_trial)

id: "5"
state: SUCCEEDED
parameters {
  parameter_id: "factors"
  value {
    number_value: 32.0
  }
}
parameters {
  parameter_id: "iterations"
  value {
    number_value: 100.0
  }
}
parameters {
  parameter_id: "regularization"
  value {
    number_value: 0.0007930078893541997
  }
}
final_measurement {
  step_count: 1
  metrics {
    metric_id: "map_at_10"
    value: 0.08571391392946995
  }
}
start_time {
  seconds: 1654568853
  nanos: 410365532
}
end_time {
  seconds: 1654569105
}



## Retrain the model with the best hyperparameters

You can now retrain the model with the best hyperparameters, using the combined training and validation splits as the training dataset.

### Configure and run the training job

In [110]:
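# Look up each hyperparameter by its ID instead of relying on list order.
best_params = {param.parameter_id: param.value for param in best_trial.parameters}
FACTORS = best_params["factors"]
ITERATIONS = best_params["iterations"]
REGULARIZATION = best_params["regularization"]
ISTUNE = False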

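TIMESTAMP = time.strftime("%Y%m%d_%H%M%S")
JOB_NAME = f"JOB_VERTEX_{TIMESTAMP}"
# Note: JOB_DIR is only printed below; it is not passed to the job, so the
# export location is determined by train.py, not by this value.
JOB_DIR = f"{JOB_DIR_ROOT}/{JOB_NAME}"

MACHINE_TYPE = "n1-standard-16"
REPLICA_COUNT = 1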

WORKER_POOL_SPEC = f"""\
machine-type={MACHINE_TYPE},\
replica-count={REPLICA_COUNT},\
container-image-uri={IMAGE_URI}\
"""

ARGS = f"""\
--factors={FACTORS},\
--iterations={ITERATIONS},\
--regularization={REGULARIZATION},\
--is_tune={ISTUNE}\
"""

!gcloud ai custom-jobs create \
  --region={REGION} \
  --display-name={JOB_NAME} \
  --worker-pool-spec={WORKER_POOL_SPEC} \
  --args={ARGS}

print("The model will be exported at:", JOB_DIR)

Using endpoint [https://us-east1-aiplatform.googleapis.com/]
CustomJob [projects/547029906128/locations/us-east1/customJobs/8367004211421904896] is submitted successfully.

Your job is still active. You may view the status of your job with the command

  $ gcloud ai custom-jobs describe projects/547029906128/locations/us-east1/customJobs/8367004211421904896

or continue streaming the logs with the command

  $ gcloud ai custom-jobs stream-logs projects/547029906128/locations/us-east1/customJobs/8367004211421904896
The model will be exported at: gs://qwiklabs-asl-04-5e165f533cac-beer-artifact-store/jobs/JOB_VERTEX_20220607_091530
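The `ARGS` string is assembled by hand, so a stray trailing newline or a value containing a comma would corrupt the list that `gcloud` splits on commas. A minimal sketch builds the value from a dictionary instead. The helper `to_gcloud_args` is hypothetical; it also casts the integer-valued hyperparameters, which Vertex AI returned as floats (`--factors=32.0`, `--iterations=100.0` in the job spec below), in case `train.py` expects integers.

```python
def to_gcloud_args(params: dict, int_params=("factors", "iterations")) -> str:
    """Join hyperparameters into the comma-separated value that --args expects."""
    parts = []
    for name, value in params.items():
        # Cast parameters that are integers in the study spec but arrive as floats.
        if name in int_params:
            value = int(value)
        text = f"--{name}={value}"
        # gcloud splits --args on commas, so a comma inside a value would break it.
        if "," in text:
            raise ValueError(f"comma in argument: {text}")
        parts.append(text)
    return ",".join(parts)


args = to_gcloud_args({
    "factors": 32.0,
    "iterations": 100.0,
    "regularization": 0.0007930078893541997,
    "is_tune": False,
})
print(args)
```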


In [111]:
# List custom jobs in the lab region and pick the one submitted above.
jobs = aiplatform.CustomJob.list(project=PROJECT_ID, location=REGION)
match = [job for job in jobs if job.display_name == JOB_NAME]
custom_job = match[0] if match else None
# `.name` is the numeric job ID; --region lets gcloud resolve the full resource name.
JOB_NUM = custom_job.name
print(JOB_NUM)
!gcloud ai custom-jobs describe $JOB_NUM --region=$REGION

8367004211421904896
Using endpoint [https://us-east1-aiplatform.googleapis.com/]
createTime: '2022-06-07T09:15:31.900820Z'
displayName: JOB_VERTEX_20220607_091530
jobSpec:
  workerPoolSpecs:
  - containerSpec:
      args:
      - --factors=32.0
      - --iterations=100.0
      - --regularization=0.0007930078893541997
      - --is_tune=False
      imageUri: gcr.io/qwiklabs-asl-04-5e165f533cac/trainer_image:latest
    diskSpec:
      bootDiskSizeGb: 100
      bootDiskType: pd-ssd
    machineSpec:
      machineType: n1-standard-16
    replicaCount: '1'
name: projects/547029906128/locations/us-east1/customJobs/8367004211421904896
startTime: '2022-06-07T09:15:32.190339Z'
state: JOB_STATE_PENDING
updateTime: '2022-06-07T09:15:32.524134Z'
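Vertex AI resource names follow the pattern `projects/{project}/locations/{location}/{collection}/{id}`, as in the `name:` field of the describe output above. Parsing this name, rather than slicing a fixed number of characters from a string, keeps working if IDs change length. A minimal sketch; `parse_resource_name` is a hypothetical helper (with the Vertex AI SDK, `job.resource_name` returns the full name and `job.name` the trailing ID).

```python
import re

# projects/<project>/locations/<location>/<collection>/<id>
_RESOURCE_RE = re.compile(
    r"^projects/(?P<project>[^/]+)/locations/(?P<location>[^/]+)"
    r"/(?P<collection>[^/]+)/(?P<id>[^/]+)$"
)


def parse_resource_name(name: str) -> dict:
    """Split a Vertex AI job resource name into its components."""
    match = _RESOURCE_RE.match(name)
    if match is None:
        raise ValueError(f"not a job resource name: {name}")
    return match.groupdict()


parts = parse_resource_name(
    "projects/547029906128/locations/us-east1/customJobs/8367004211421904896"
)
print(parts["location"], parts["collection"], parts["id"])
# us-east1 customJobs 8367004211421904896
```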
