## Configure environment settings

Set location paths, connection strings, and other environment settings. Make sure to update `REGION` and `ARTIFACT_STORE` with the settings reflecting your lab environment.

- `REGION` - the compute region for Vertex AI Training and Prediction
- `ARTIFACT_STORE` - a GCS bucket created in the same region.
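
As a minimal sketch of how these settings fit together, the helper below derives the bucket URI, data paths, and API endpoint from a project ID and a region. The function name `build_settings` and the example project ID `my-project` are hypothetical; the path layout mirrors the cells in this notebook.

```python
def build_settings(project_id: str, region: str) -> dict:
    """Derive the storage paths and API endpoint used in this lab."""
    artifact_store = f"gs://{project_id}-beer-artifact-store"
    data_root = f"{artifact_store}/data"
    return {
        "REGION": region,
        "ARTIFACT_STORE": artifact_store,
        "DATA_ROOT": data_root,
        "JOB_DIR_ROOT": f"{artifact_store}/jobs",
        "TRAINING_FILE_PATH": f"{data_root}/train.parquet",
        "VALIDATION_FILE_PATH": f"{data_root}/valid.parquet",
        "API_ENDPOINT": f"{region}-aiplatform.googleapis.com",
    }


# "my-project" is a placeholder project ID, not a real project.
settings = build_settings("my-project", "us-east1")
for key, value in settings.items():
    print(f"{key} = {value}")
```

Keeping the derivation in one function makes it harder for the bucket and the region to drift apart when the settings are redefined later.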

Containerized Training Cloud Serving
- `Phase 1` - (Before ML Pipeline)

In [36]:
import os
import time

from google.cloud import aiplatform

In [37]:
REGION = "us-east1"

PROJECT_ID = !(gcloud config get-value core/project)
PROJECT_ID = PROJECT_ID[0]

ARTIFACT_STORE = f"gs://{PROJECT_ID}-beer-artifact-store"

DATA_ROOT = f"{ARTIFACT_STORE}/data"
JOB_DIR_ROOT = f"{ARTIFACT_STORE}/jobs"
TRAINING_FILE_PATH = f"{DATA_ROOT}/train.parquet"
VALIDATION_FILE_PATH = f"{DATA_ROOT}/valid.parquet"
API_ENDPOINT = f"{REGION}-aiplatform.googleapis.com"

In [38]:
os.environ["JOB_DIR_ROOT"] = JOB_DIR_ROOT
os.environ["TRAINING_FILE_PATH"] = TRAINING_FILE_PATH
os.environ["VALIDATION_FILE_PATH"] = VALIDATION_FILE_PATH
os.environ["PROJECT_ID"] = PROJECT_ID
os.environ["REGION"] = REGION

In [39]:
!gsutil ls | grep ^{ARTIFACT_STORE}/$ || gsutil mb -l {REGION} {ARTIFACT_STORE}

gs://qwiklabs-asl-04-5e165f533cac-beer-artifact-store/


### Package the script into a Docker image

The training image installs `fire`, `cloudml-hypertune`, and `implicit` on top of the base image. If the training runtime in the training container must stay aligned with the serving runtime in the serving container, pin these packages to specific versions.

Make sure to update the URI for the base image so that it points to your project's **Container Registry**.
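
The image's entry point runs `train.py`, which is not shown in this section. As a rough sketch of the command-line interface such a script needs, assuming the four flags that the retraining job in this notebook passes (`--factors`, `--regularization`, `--iterations`, `--is_tune`), the parser below uses the standard library's `argparse` instead of `fire` so that it runs without extra packages. The function names and default values are hypothetical.

```python
import argparse


def str_to_bool(text: str) -> bool:
    """Interpret 'True'/'False' style flag values such as --is_tune=False."""
    lowered = text.strip().lower()
    if lowered in {"true", "1", "yes"}:
        return True
    if lowered in {"false", "0", "no"}:
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {text!r}")


def parse_args(argv=None) -> argparse.Namespace:
    """Parse the hyperparameter flags; defaults here are placeholders."""
    parser = argparse.ArgumentParser(description="Hypothetical train.py CLI")
    parser.add_argument("--factors", type=int, default=32)
    parser.add_argument("--regularization", type=float, default=0.01)
    parser.add_argument("--iterations", type=int, default=50)
    parser.add_argument("--is_tune", type=str_to_bool, default=False)
    return parser.parse_args(argv)


args = parse_args(
    ["--factors=64", "--regularization=0.003", "--iterations=55", "--is_tune=True"]
)
print(args)
```

The explicit `str_to_bool` converter matters because `argparse` with `type=bool` would treat the string `"False"` as true.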

In [40]:
TRAINING_APP_FOLDER = "training_app"
os.makedirs(TRAINING_APP_FOLDER, exist_ok=True)

In [57]:
%%writefile {TRAINING_APP_FOLDER}/Dockerfile

FROM gcr.io/deeplearning-platform-release/base-cpu
RUN pip install -U fire cloudml-hypertune implicit
WORKDIR /app
COPY train.py .

ENTRYPOINT ["python", "train.py"]

Overwriting training_app/Dockerfile


In [58]:
IMAGE_NAME = "trainer_image"
IMAGE_TAG = "latest"
IMAGE_URI = f"gcr.io/{PROJECT_ID}/{IMAGE_NAME}:{IMAGE_TAG}"

os.environ["IMAGE_URI"] = IMAGE_URI

In [59]:
!gcloud builds submit --tag $IMAGE_URI $TRAINING_APP_FOLDER

Creating temporary tarball archive of 4 file(s) totalling 16.3 KiB before compression.
Uploading tarball of [training_app] to [gs://qwiklabs-asl-04-5e165f533cac_cloudbuild/source/1654655329.564066-ee0e5fe600cf442e8f7cf95d9c5ab6e5.tgz]
Created [https://cloudbuild.googleapis.com/v1/projects/qwiklabs-asl-04-5e165f533cac/locations/global/builds/3129b0a1-d610-4d25-90ee-f4c00ad774d2].
Logs are available at [https://console.cloud.google.com/cloud-build/builds/3129b0a1-d610-4d25-90ee-f4c00ad774d2?project=547029906128].
----------------------------- REMOTE BUILD OUTPUT ------------------------------
starting build "3129b0a1-d610-4d25-90ee-f4c00ad774d2"

FETCHSOURCE
Fetching storage object: gs://qwiklabs-asl-04-5e165f533cac_cloudbuild/source/1654655329.564066-ee0e5fe600cf442e8f7cf95d9c5ab6e5.tgz#1654655329829270
Copying gs://qwiklabs-asl-04-5e165f533cac_cloudbuild/source/1654655329.564066-ee0e5fe600cf442e8f7cf95d9c5ab6e5.tgz#1654655329829270...
/ [1 files][  3.2 KiB/  3.2 KiB]                   

## Submit a Vertex AI hyperparameter tuning job
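
The tuning job in this section reads its study and trial specification from `config.yaml`, which is not reproduced here. The sketch below builds a plausible equivalent as a Python dictionary, using the field names of the Vertex AI `HyperparameterTuningJob` resource. The metric name `map_at_10` and the three parameter names come from the trial output in this notebook; the value ranges, scale types, container arguments, and the helper name `build_tuning_config` are assumptions for illustration, not the lab's actual configuration.

```python
import json


def build_tuning_config(image_uri: str) -> dict:
    """Return a study/trial spec shaped like a hyperparameter tuning config."""
    return {
        "studySpec": {
            "metrics": [{"metricId": "map_at_10", "goal": "MAXIMIZE"}],
            "parameters": [
                {
                    # Assumed candidate values.
                    "parameterId": "factors",
                    "discreteValueSpec": {"values": [16, 32, 64]},
                },
                {
                    # Assumed integer range.
                    "parameterId": "iterations",
                    "integerValueSpec": {"minValue": 10, "maxValue": 100},
                    "scaleType": "UNIT_LINEAR_SCALE",
                },
                {
                    # Assumed range, searched on a log scale.
                    "parameterId": "regularization",
                    "doubleValueSpec": {"minValue": 1e-4, "maxValue": 1e-1},
                    "scaleType": "UNIT_LOG_SCALE",
                },
            ],
        },
        "trialJobSpec": {
            "workerPoolSpecs": [
                {
                    "machineSpec": {"machineType": "n1-standard-16"},
                    "replicaCount": 1,
                    "containerSpec": {
                        "imageUri": image_uri,
                        # Assumed flag value for tuning runs.
                        "args": ["--is_tune=True"],
                    },
                }
            ]
        },
    }


# Placeholder image URI; substitute the IMAGE_URI built earlier.
config = build_tuning_config("gcr.io/my-project/trainer_image:latest")
print(json.dumps(config, indent=2))
```

To pass a specification like this to `gcloud ai hp-tuning-jobs create --config`, it would need to be written out as YAML.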

In [44]:
TIMESTAMP = time.strftime("%Y%m%d_%H%M%S")
JOB_NAME = f"beer_recom_tuning_{TIMESTAMP}"
JOB_DIR = f"{JOB_DIR_ROOT}/{JOB_NAME}"

os.environ["JOB_NAME"] = JOB_NAME
os.environ["JOB_DIR"] = JOB_DIR

In [45]:
%%bash

CONFIG_YAML=config.yaml

gcloud ai hp-tuning-jobs create \
    --region=$REGION \
    --display-name=$JOB_NAME \
    --config=$CONFIG_YAML \
    --max-trial-count=5 \
    --parallel-trial-count=5

echo "JOB_NAME: $JOB_NAME"

JOB_NAME: beer_recom_tuning_20220608_015533


Using endpoint [https://us-east1-aiplatform.googleapis.com/]
Hyperparameter tuning job [8556929451957420032] submitted successfully.

Your job is still active. You may view the status of your job with the command

  $ gcloud ai hp-tuning-jobs describe 8556929451957420032 --region=us-east1

Job State: JOB_STATE_PENDING
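
The job starts in `JOB_STATE_PENDING`, and its trials are only complete once it reaches a terminal state. The sketch below is a generic polling loop for that wait. `get_state` is a hypothetical callable standing in for whatever returns the job's current state name (for example, a wrapper around the `describe` command shown above), and the set of terminal states is an assumption based on the `JOB_STATE_*` names that Vertex AI reports.

```python
import time

# Assumed terminal states; extend this set if your jobs can end in others.
TERMINAL_STATES = {"JOB_STATE_SUCCEEDED", "JOB_STATE_FAILED", "JOB_STATE_CANCELLED"}


def wait_for_job(get_state, poll_seconds=60.0, timeout_seconds=3600.0, sleep=time.sleep):
    """Poll get_state() until it returns a terminal state or the timeout elapses."""
    waited = 0.0
    while True:
        state = get_state()
        if state in TERMINAL_STATES:
            return state
        if waited >= timeout_seconds:
            raise TimeoutError(f"job still in {state} after {waited:.0f}s")
        sleep(poll_seconds)
        waited += poll_seconds


# Simulated job that is pending, then running, then succeeds.
fake_states = iter(["JOB_STATE_PENDING", "JOB_STATE_RUNNING", "JOB_STATE_SUCCEEDED"])
final_state = wait_for_job(
    lambda: next(fake_states), poll_seconds=0.0, sleep=lambda seconds: None
)
print(final_state)
```

Injecting `sleep` keeps the loop easy to exercise without real delays; in the notebook you would leave it at its default.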


In [31]:
aiplatform.init(location=REGION)
jobs = aiplatform.HyperparameterTuningJob.list()
print(jobs)
match = [job for job in jobs if job.display_name == JOB_NAME]
tuning_job = match[0] if match else None
# The job ID is the last segment of the full resource name.
JOB_NUM = tuning_job.resource_name.split("/")[-1] if tuning_job else None
print(JOB_NUM)
!gcloud ai hp-tuning-jobs describe $JOB_NUM --region=$REGION

[<google.cloud.aiplatform.jobs.HyperparameterTuningJob object at 0x7f4271c967d0> 
resource name: projects/547029906128/locations/us-east1/hyperparameterTuningJobs/1783304506159661056, <google.cloud.aiplatform.jobs.HyperparameterTuningJob object at 0x7f4271cf3250> 
resource name: projects/547029906128/locations/us-east1/hyperparameterTuningJobs/1515551434563649536, <google.cloud.aiplatform.jobs.HyperparameterTuningJob object at 0x7f4271c86e50> 
resource name: projects/547029906128/locations/us-east1/hyperparameterTuningJobs/7512094338407464960, <google.cloud.aiplatform.jobs.HyperparameterTuningJob object at 0x7f4271c87b10> 
resource name: projects/547029906128/locations/us-east1/hyperparameterTuningJobs/1049428873130803200, <google.cloud.aiplatform.jobs.HyperparameterTuningJob object at 0x7f4271c87910> 
resource name: projects/547029906128/locations/us-east1/hyperparameterTuningJobs/5553943294175608832, <google.cloud.aiplatform.jobs.HyperparameterTuningJob object at 0x7f4271c86510> 
res

### Retrieve HP-tuning results.

After the job completes, you can review the results in the Google Cloud console or programmatically using the following functions. Note that this code assumes the metric optimized by the hyperparameter tuning engine is to be maximized:

In [32]:
def get_trials(job_name):
    """Return the trials of the tuning job whose display name is job_name."""
    jobs = aiplatform.HyperparameterTuningJob.list()
    # Filter on the argument, not on the global JOB_NAME.
    match = [job for job in jobs if job.display_name == job_name]
    tuning_job = match[0] if match else None
    return tuning_job.trials if tuning_job else None


def get_best_trial(trials):
    """Return the trial with the highest value of its first reported metric."""
    metrics = [trial.final_measurement.metrics[0].value for trial in trials]
    best_trial = trials[metrics.index(max(metrics))]
    return best_trial


def retrieve_best_trial_from_job_name(jobname):
    """Look up a tuning job by display name and return its best trial."""
    trials = get_trials(jobname)
    best_trial = get_best_trial(trials)
    return best_trial
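
The helpers above assume that the metric is maximized and that it is the first metric in each trial's final measurement. As a hedged sketch of a variant that selects by metric name and supports either direction, the function below works on any objects exposing the same attributes as the trials printed in this notebook (`final_measurement.metrics` entries with `metric_id` and `value`). The names `metric_value`, `select_best_trial`, and `fake_trial`, the `goal` argument, and the `SimpleNamespace` stand-ins are illustrative assumptions, and the metric values are made up for the demonstration.

```python
from types import SimpleNamespace


def metric_value(trial, metric_id):
    """Return the named metric from a trial's final measurement."""
    for metric in trial.final_measurement.metrics:
        if metric.metric_id == metric_id:
            return metric.value
    raise KeyError(f"trial {trial.id} has no metric {metric_id!r}")


def select_best_trial(trials, metric_id, goal="MAXIMIZE"):
    """Pick the trial with the best metric value for the given goal."""
    if not trials:
        raise ValueError("no trials to choose from")
    chooser = max if goal == "MAXIMIZE" else min
    return chooser(trials, key=lambda trial: metric_value(trial, metric_id))


def fake_trial(trial_id, value):
    """Build a stand-in trial with a single map_at_10 measurement."""
    metric = SimpleNamespace(metric_id="map_at_10", value=value)
    measurement = SimpleNamespace(metrics=[metric])
    return SimpleNamespace(id=trial_id, final_measurement=measurement)


# Made-up metric values, for illustration only.
trials = [fake_trial("1", 0.085), fake_trial("2", 0.091), fake_trial("3", 0.079)]
best = select_best_trial(trials, "map_at_10")
print(best.id)
```

Raising on an empty trial list surfaces a missing or unfinished job early instead of failing later with an attribute error.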

In [33]:
jobs = aiplatform.HyperparameterTuningJob.list()
match = [job for job in jobs if job.display_name == JOB_NAME]
print(match)
tuning_job = match[0] if match else None
print(tuning_job)
print(tuning_job.trials)
best_trial = retrieve_best_trial_from_job_name(JOB_NAME)

[<google.cloud.aiplatform.jobs.HyperparameterTuningJob object at 0x7f4271d20350> 
resource name: projects/547029906128/locations/us-east1/hyperparameterTuningJobs/1515551434563649536]
<google.cloud.aiplatform.jobs.HyperparameterTuningJob object at 0x7f4271d20350> 
resource name: projects/547029906128/locations/us-east1/hyperparameterTuningJobs/1515551434563649536
[id: "1"
state: SUCCEEDED
parameters {
  parameter_id: "factors"
  value {
    number_value: 32.0
  }
}
parameters {
  parameter_id: "iterations"
  value {
    number_value: 55.0
  }
}
parameters {
  parameter_id: "regularization"
  value {
    number_value: 0.003162277660168379
  }
}
final_measurement {
  step_count: 1
  metrics {
    metric_id: "map_at_10"
    value: 0.08547677242622731
  }
}
start_time {
  seconds: 1654651842
  nanos: 316353021
}
end_time {
  seconds: 1654652051
}
, id: "2"
state: SUCCEEDED
parameters {
  parameter_id: "factors"
  value {
    number_value: 64.0
  }
}
parameters {
  parameter_id: "iterations

## Retrain the model with the best hyperparameters

You can now retrain the model using the best hyperparameters and using combined training and validation splits as a training dataset.
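
When carrying the best hyperparameters into the retraining job, looking each value up by `parameter_id` avoids depending on the order in which the trial lists its parameters. The sketch below assumes trial objects shaped like the ones printed earlier (`parameters` entries with `parameter_id` and `value`); the helper names and the `SimpleNamespace` stand-in are hypothetical. The stand-in values are the ones shown for trial 1 in the output above.

```python
from types import SimpleNamespace


def trial_parameters(trial) -> dict:
    """Map parameter_id -> value for one trial."""
    return {param.parameter_id: param.value for param in trial.parameters}


def build_training_args(trial, is_tune=False) -> list:
    """Build the container arguments for the retraining job."""
    params = trial_parameters(trial)
    return [
        f"--factors={int(params['factors'])}",
        f"--regularization={params['regularization']}",
        f"--iterations={int(params['iterations'])}",
        f"--is_tune={is_tune}",
    ]


# Stand-in for trial 1 as printed in the tuning results.
trial = SimpleNamespace(
    parameters=[
        SimpleNamespace(parameter_id="factors", value=32.0),
        SimpleNamespace(parameter_id="iterations", value=55.0),
        SimpleNamespace(parameter_id="regularization", value=0.003162277660168379),
    ]
)
training_args = build_training_args(trial)
# gcloud's --args flag takes a comma-separated list.
print(",".join(training_args))
```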

### Configure and run the training job

In [60]:
FACTORS = int(best_trial.parameters[0].value)
ITERATIONS = int(best_trial.parameters[1].value)
REGULARIZATION = best_trial.parameters[2].value
ISTUNE = False

REGION = "us-east1"
PROJECT_ID = "qwiklabs-asl-04-5e165f533cac"
ARTIFACT_STORE = f"gs://{PROJECT_ID}-beer-artifact-store"
DATA_ROOT = os.path.join(ARTIFACT_STORE, "data")
JOB_DIR_ROOT = os.path.join(ARTIFACT_STORE, "jobs")

TIMESTAMP = time.strftime("%Y%m%d_%H%M%S")
JOB_NAME = f"JOB_VERTEX_{TIMESTAMP}"
JOB_DIR = f"{JOB_DIR_ROOT}/{JOB_NAME}"

MACHINE_TYPE="n1-standard-16"
REPLICA_COUNT=1

WORKER_POOL_SPEC = f"""\
machine-type={MACHINE_TYPE},\
replica-count={REPLICA_COUNT},\
container-image-uri={IMAGE_URI}\
"""

ARGS = f"""\
--factors={FACTORS},\
--regularization={REGULARIZATION},\
--iterations={ITERATIONS},\
--is_tune={ISTUNE}\
"""

!gcloud ai custom-jobs create \
  --region={REGION} \
  --display-name={JOB_NAME} \
  --worker-pool-spec={WORKER_POOL_SPEC} \
  --args={ARGS}

print("The model will be exported at:", JOB_DIR)

Using endpoint [https://us-east1-aiplatform.googleapis.com/]
CustomJob [projects/547029906128/locations/us-east1/customJobs/8836152628854390784] is submitted successfully.

Your job is still active. You may view the status of your job with the command

  $ gcloud ai custom-jobs describe projects/547029906128/locations/us-east1/customJobs/8836152628854390784

or continue streaming the logs with the command

  $ gcloud ai custom-jobs stream-logs projects/547029906128/locations/us-east1/customJobs/8836152628854390784
The model will be exported at: gs://qwiklabs-asl-04-5e165f533cac-beer-artifact-store/jobs/JOB_VERTEX_20220608_023049


In [35]:
jobs = aiplatform.CustomJob.list()
match = [job for job in jobs if job.display_name == JOB_NAME]
custom_job = match[0] if match else None
# Use the full resource name rather than a hard-coded project number.
JOB_RESOURCE = custom_job.resource_name if custom_job else None
JOB_NUM = JOB_RESOURCE.split("/")[-1] if JOB_RESOURCE else None
print(JOB_NUM)
!gcloud ai custom-jobs describe $JOB_RESOURCE

3821394443777343488
Using endpoint [https://us-east1-aiplatform.googleapis.com/]
createTime: '2022-06-08T01:42:50.996217Z'
displayName: JOB_VERTEX_20220608_014249
jobSpec:
  workerPoolSpecs:
  - containerSpec:
      args:
      - --factors=32
      - --regularization=0.004571501226874493
      - --iterations=96
      - --is_tune=False
      imageUri: gcr.io/qwiklabs-asl-04-5e165f533cac/trainer_image:latest
    diskSpec:
      bootDiskSizeGb: 100
      bootDiskType: pd-ssd
    machineSpec:
      machineType: n1-standard-16
    replicaCount: '1'
name: projects/547029906128/locations/us-east1/customJobs/3821394443777343488
startTime: '2022-06-08T01:42:51.215900Z'
state: JOB_STATE_PENDING
updateTime: '2022-06-08T01:42:51.535750Z'
