# Improving Relevance of Recommendations

Our focus so far has been on functionality: We've deployed 3 types of recommendation model to our website (and enabled real-time event feedback via the `personalize-events` SDK) using the same original interactions dataset of just user ID, item ID, and timestamp.

So how can we improve the **accuracy/relevance** of our deployed models to boost performance?

This notebook demonstrates two tools available:

1. Adding **metadata** to help the model generalize and even tackle the "cold start" problem of recommending new items with no data.
2. Performing **hyperparameter optimization (HPO)** to tune the model's performance as best we can on given datasets.

The relative gains available are strongly dataset-dependent. In this notebook we'll start by focussing on our source data, and then bring in HPO second.

First, let's get our library imports and initial setup out of the way:

In [None]:
# We'll use this library for progress bars later, and it's not installed by default:
!pip install tqdm

In [None]:
%load_ext autoreload
%autoreload 2

# Python Built-Ins:
import csv
from datetime import datetime as dt
import json
import os
import random
import time
import warnings

# External Dependencies:
import boto3
from botocore import exceptions as botoexceptions
import requests
from tqdm import tqdm

# Local Dependencies:
import util

%store -r role_arn
%store -r dataset_group_arn
%store -r hrnn_solution_arn
%store -r hrnn_campaign_arn

personalize = boto3.client("personalize")
rekognition = boto3.client("rekognition")  # For optional item data enrichment from image
s3 = boto3.resource("s3")  # Cloud object storage (for our data!)

## Adding metadata

[Amazon Personalize](https://docs.aws.amazon.com/personalize/latest/dg/how-it-works-dataset-schema.html) supports 3 historical and one live-updating type of dataset in each **dataset group**:

* **Interactions (Required)**: Observed `user-item` interactions are at the core of Personalize's modelling philosophy, which focusses on directly analysing patterns to predict future interactions, rather than proxying via assumed labels like user demographics or item categories. In this respect, HRNN-based models could be considered an extension of the traditional field of [Collaborative Filtering](https://en.wikipedia.org/wiki/Collaborative_filtering).
* **Items (Optional)**: Item metadata which, *secondarily to the core interaction data*, could help the model understand relationships between different item IDs.
* **Users (Optional)**: User metadata which, *secondarily to the core interaction data*, could help the model understand relationships between different user IDs.
* ***Events (Optional)***: *Live additions* to the historical interactions dataset, as collected through the Personalize Events API.

Since ([as detailed in the developer guide](https://docs.aws.amazon.com/personalize/latest/dg/recording-events.html)) **Events** should map back to the **Interactions** schema, there are only 3 independent schemas for us to design.

All of these schemas support certain [required and reserved fields](https://docs.aws.amazon.com/personalize/latest/dg/how-it-works-dataset-schema.html) (like `ITEM_ID`) - but also **custom** fields.

As usual in machine learning we have the *freedom* to define extra fields to help the model (e.g. generalize observed patterns to other users or items with less data) - but the *responsibility* to add only **informative** data. Adding attributes which don't bring **value** to the modelling just introduces noise and makes it *harder* for the model to perform: The [feature selection](https://en.wikipedia.org/wiki/Feature_selection) problem.

### User data

As discussed earlier, the [UCSD Amazon Reviews datasets](https://nijianmo.github.io/amazon/index.html) our example is based on don't have any user metadata beyond the simple user ID - so we **won't create a "Users" dataset** in Amazon Personalize as we don't have any data.

To explore this option with custom datasets, you can use our **Item** metadata steps as a guide since the process is very similar.
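For illustration only, a Users schema might look something like the sketch below. The `AGE_GROUP` and `MEMBERSHIP_TIER` fields are hypothetical (our dataset has no such attributes); the overall shape simply mirrors the Items schema used later in this notebook.

```python
import json

# Hypothetical Users schema: USER_ID is the required field, while AGE_GROUP
# and MEMBERSHIP_TIER are made-up custom attributes for illustration only.
users_schema = {
    "type": "record",
    "name": "Users",
    "namespace": "com.amazonaws.personalize.schema",
    "fields": [
        {"name": "USER_ID", "type": "string"},
        {"name": "AGE_GROUP", "type": "string", "categorical": True},
        {"name": "MEMBERSHIP_TIER", "type": "string", "categorical": True},
    ],
    "version": "1.0",
}

# As with the other schemas, create_schema takes the definition as a JSON string:
users_schema_json = json.dumps(users_schema)
print(users_schema_json)
```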

### Interaction metadata

Interaction metadata is a little special because there are two ways it can be used:

* To help the model **better understand patterns** in the historic data, and/or;
* To provide **context** already known at the point we generate recommendations for users (see the ["Getting Real-Time Recommendations"](https://docs.aws.amazon.com/personalize/latest/dg/getting-real-time-recommendations.html) doc)

Contextual information such as device type or location might help us adapt recommendations in real-time, whereas retrospective information like whether the customer left a review for a purchase would only be useful for understanding historic patterns.
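As a rough sketch of the second usage, context can be passed alongside the user ID when requesting recommendations. The `DEVICE_TYPE` field and the campaign ARN below are hypothetical: our reviews data has no such field, and the model would need to have been trained on an Interactions schema that includes it.

```python
def build_recommendation_request(campaign_arn, user_id, context=None, num_results=10):
    """Assemble keyword arguments for a personalize-runtime get_recommendations call."""
    request = {
        "campaignArn": campaign_arn,
        "userId": str(user_id),
        "numResults": num_results,
    }
    if context:
        # Context keys should match contextual fields in the Interactions schema;
        # values are passed as strings.
        request["context"] = {key: str(value) for key, value in context.items()}
    return request


# DEVICE_TYPE is a hypothetical contextual field, and this ARN is a placeholder:
request = build_recommendation_request(
    "arn:aws:personalize:us-east-1:123456789012:campaign/example",
    "user-123",
    context={"DEVICE_TYPE": "mobile"},
)
print(request)

# With AWS credentials and a suitable campaign, the request could be sent as:
# import boto3
# response = boto3.client("personalize-runtime").get_recommendations(**request)
```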

Let's take another look at the raw interaction data:

In [None]:
n_interactions = 0
sample_interaction = None

for interaction in util.dataformat.data_folder_reader("data/raw/interactions"):
    n_interactions += 1

    # Reservoir sampling R-algorithm (simple, non-optimal) with k=1:
    if random.randint(1, n_interactions) <= 1:
        sample_interaction = interaction

print(f"\nGot {n_interactions} interactions total")
print("Sample interaction:")
print(sample_interaction)
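The sampling loop above is the `k=1` case of reservoir sampling, which picks a uniform random sample from a stream without knowing its length in advance. A minimal sketch of the general case follows; `reservoir_sample` is an illustrative helper, not part of the `util` package.

```python
import random


def reservoir_sample(iterable, k=1, rng=random):
    """Return up to k items drawn uniformly at random from a stream of unknown length."""
    reservoir = []
    for n_seen, element in enumerate(iterable, start=1):
        if len(reservoir) < k:
            # Fill the reservoir with the first k elements
            reservoir.append(element)
        else:
            # Element number n_seen replaces a random slot with probability k / n_seen
            slot = rng.randint(1, n_seen)
            if slot <= k:
                reservoir[slot - 1] = element
    return reservoir


sample = reservoir_sample(range(1000), k=3, rng=random.Random(42))
print(sample)
```

Only `k` items are held in memory at any time, which is why this approach suits large raw data folders read record by record.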

So is there anything here that might be helpful?

User or item attributes like `reviewerName` sometimes feature in interaction data to make analysis easier or because they might change over time - but for our modelling purposes they are probably best left for user and item metadata sets.

Since the UCSD dataset is **reviews** (rather than views or purchases), we have a whole host of extra information like:

* `overall`: a "stars" style score from 1.0 to 5.0
    * Useful for quantifying the customer's experience of the item
    * Likely predictive of repeat purchases *for categories where repeat purchase is likely* (e.g. consumables, short-lifespan)
* `vote`: review usefulness vote total (number represented as string - not always present)
    * Is the perceived insightfulness of the review relevant for us?
    * Potentially could be a marker for inauthentic/fraudulent behaviour
* `verified`: whether the reviewer had a verified purchase of the item (boolean)
    * Potentially could be a marker for inauthentic/fraudulent behaviour e.g. reviewing competitor or own products
* `summary` and `reviewText`: the actual text of the review
    * Useful for understanding detailed item feedback
    * Length (perhaps relative to users' average review length) could provide further insight into how strong the opinion is
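As a sketch of how these raw fields might be turned into numeric signals, the snippet below uses the field names listed above on a made-up record. It assumes (without having verified it across the whole dataset) that `vote` strings may contain thousands separators such as `"1,234"`; `summarize_review` is a hypothetical helper, not part of `util.dataformat`.

```python
def parse_vote(raw_vote):
    """Convert a vote count like "1,234" (or a missing value) to an integer."""
    if raw_vote is None or raw_vote == "":
        return 0
    # Assumption: any commas are thousands separators
    return int(str(raw_vote).replace(",", ""))


def summarize_review(review):
    """Pull out the numeric signals discussed above from one raw review record."""
    return {
        "rating": float(review["overall"]),
        "votes": parse_vote(review.get("vote")),
        "verified": bool(review.get("verified", False)),
        "text_length": len(review.get("reviewText") or ""),
    }


# A made-up record, for illustration only:
example = {"overall": 4.0, "vote": "1,234", "verified": True, "reviewText": "Great coffee"}
print(summarize_review(example))
```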

At the time of writing Amazon Personalize reserves `EVENT_TYPE` (e.g. review vs purchase vs view) and `EVENT_VALUE` (e.g. purchase price or review rating) fields in the **Interactions** schema: But the standard recipes do not automatically account for events of different types and values to train one composite model.

For our example website we'd like to avoid filtering the data (e.g. on only positive reviews) because this would exacerbate the sparsity of the data: We already know users only review a fraction of the products they purchase, and even e.g. leaving a bad review for a bag of ground coffee may indicate the customer's interest in other ground coffee products.

...So **we'll add the overall "stars" score** as our `EVENT_VALUE` field, recognising that it matches the intended use of the reserved field even though current models may not be affected by it.

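Every record in this dataset is a review, so we'll also fill the reserved `EVENT_TYPE` field with the constant value `review`.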

It could be argued that tagging reviews with some sort of authenticity score (e.g. `verified`) might be relevant for judging historic patterns, but probably it's more important to track down and remove fraudulent behaviour (e.g. using [Amazon Fraud Detector](https://aws.amazon.com/fraud-detector/)) than to improve the product recommendations the fraudsters see. If their interaction patterns are markedly different from typical users, this should already coach the algorithm to treat them differently.

There aren't really any fields that could give us useful contextual cues at runtime (e.g. device, location, etc.) - and even if there were, this is a reviews dataset so the review context might be quite different from the purchase context.

...Therefore for our example's interactions we'll add the `EVENT_TYPE` and `EVENT_VALUE` fields and leave it at that:

In [None]:
interactions_schema = {
    "type": "record",
    "name": "Interactions",
    "namespace": "com.amazonaws.personalize.schema",
    "fields": [
        { "name": "USER_ID", "type": "string" },
        { "name": "ITEM_ID", "type": "string" },
        { "name": "TIMESTAMP", "type": "long" },
        { "name": "EVENT_TYPE", "type": "string" },
        { "name": "EVENT_VALUE", "type": "float" },
    ],
    "version": "1.0",
}

try:
    create_schema_response = personalize.create_schema(
        name=f"{os.environ['CF_STACK_NAME']}-schema-interactions-extended",
        schema=json.dumps(interactions_schema)
    )

    interactions_schema_arn = create_schema_response["schemaArn"]
    print(json.dumps(create_schema_response, indent=2))

except botoexceptions.ClientError as err:
    # If the schema already exists, scrape the ARN from the message and check it:
    if err.response["Error"]["Code"] == "ResourceAlreadyExistsException":
        warnings.warn(
            "Schema already exists.\nScraping ARN from error message.\n"
            "To change the existing schema, delete it and re-create it."
        )
        msg = err.response["Error"]["Message"]
        interactions_schema_arn = msg[msg.index("arn:aws:personalize"):].partition(" ")[0]
        description = personalize.describe_schema(schemaArn=interactions_schema_arn)
        print(description)
    else:
        raise err
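Slicing the ARN out of the error message as above relies on the message wording, which could change. A slightly more defensive sketch uses a regular expression and fails loudly when no ARN is found. The ARN pattern and the sample message are assumptions for illustration, not a documented error format.

```python
import re

# Assumed ARN shape: arn:aws:personalize:<region>:<account>:schema/<name>
SCHEMA_ARN_PATTERN = re.compile(r"arn:aws:personalize:[a-z0-9-]*:\d*:schema/[A-Za-z0-9_-]+")


def extract_schema_arn(message):
    """Return the first Personalize schema ARN found in a message, or raise ValueError."""
    match = SCHEMA_ARN_PATTERN.search(message)
    if match is None:
        raise ValueError(f"No schema ARN found in message: {message!r}")
    return match.group(0)


# Hypothetical error message, for illustration only:
msg = "Another resource with Arn arn:aws:personalize:us-east-1:123456789012:schema/demo-schema already exists."
print(extract_schema_arn(msg))
```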

In [None]:
interactions_filename = "data/interactions-extended.csv"
with open(interactions_filename, "w", newline="") as f:  # newline="" as recommended by the csv docs
    writer = csv.DictWriter(
        f,
        dialect="unix",
        fieldnames=["USER_ID", "ITEM_ID", "TIMESTAMP", "EVENT_TYPE", "EVENT_VALUE"]
    )
    writer.writeheader()
    print("Writing interactions...")
    for event in util.dataformat.data_folder_reader("data/raw/interactions"):
        writer.writerow({
            "USER_ID": util.dataformat.get_interaction_user_id(event),
            "ITEM_ID": util.dataformat.get_interaction_item_id(event),
            "TIMESTAMP": util.dataformat.get_interaction_timestamp(event),
            "EVENT_TYPE": "review",
            "EVENT_VALUE": util.dataformat.get_interaction_value(event),
        })
    print("Done!")

In [None]:
# Upload to S3:
s3.Object(os.environ["STAGING_BUCKET"], interactions_filename).upload_file(interactions_filename)

In [None]:
try:
    # Typically this will fail because the old dataset already exists:
    create_dataset_response = personalize.create_dataset(
        name=f"{os.environ['CF_STACK_NAME']}-interactions",
        datasetType="INTERACTIONS",
        datasetGroupArn=dataset_group_arn,
        schemaArn=interactions_schema_arn,
    )

    interactions_dataset_arn = create_dataset_response["datasetArn"]
    print(json.dumps(create_dataset_response, indent=2))

except botoexceptions.ClientError as err:
    if err.response["Error"]["Code"] == "ResourceAlreadyExistsException":
        warnings.warn("Dataset already exists. Deleting old dataset to update schema...")
        old_dataset_arn = (
            dataset_group_arn.replace(":dataset-group/", ":dataset/")
            + "/INTERACTIONS"
        )

        # Delete the existing dataset and wait for the deletion to complete:
        personalize.delete_dataset(datasetArn=old_dataset_arn)
        status="DELETE PENDING"
        max_time = time.time() + 20*60 # 20 mins
        while status != "DELETED":
            assert time.time() < max_time, "Timed out waiting for old dataset to delete"
            try:
                description = personalize.describe_dataset(datasetArn=old_dataset_arn)
            except botoexceptions.ClientError as err:
                if err.response["Error"]["Code"] == "ResourceNotFoundException":
                    print("Existing dataset deleted")
                    status = "DELETED"
                else:
                    raise err
            time.sleep(15)  # Should only take a short while, so polling can be fast

        # Re-create the dataset with the new schema:
        create_dataset_response = personalize.create_dataset(
            name=f"{os.environ['CF_STACK_NAME']}-interactions",
            datasetType="INTERACTIONS",
            datasetGroupArn=dataset_group_arn,
            schemaArn=interactions_schema_arn,
        )
        # Record the new dataset's ARN, which the import job below needs:
        interactions_dataset_arn = create_dataset_response["datasetArn"]
        print(json.dumps(create_dataset_response, indent=2))
    else:  # Some other problem
        raise err

In [None]:
create_dataset_import_job_response = personalize.create_dataset_import_job(
    # (strftime %f is microseconds, so we trim 3 from the end for milliseconds)
    jobName=f"{os.environ['CF_STACK_NAME']}-interact-{dt.now().strftime('%Y-%m-%d-%H-%M-%S-%f')[:-3]}",
    datasetArn=interactions_dataset_arn,
    dataSource={
        "dataLocation": f"s3://{os.environ['STAGING_BUCKET']}/{interactions_filename}"
    },
    roleArn=role_arn,  # Remember the IAM role we created earlier?
)

interactions_dataset_import_job_arn = create_dataset_import_job_response["datasetImportJobArn"]
print(json.dumps(create_dataset_import_job_response, indent=2))
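To see what the timestamp trimming in the job name does, here is the same formatting applied to a fixed example time rather than `dt.now()`:

```python
from datetime import datetime

# A fixed example time: 03:04:05.678901 on 2 January 2020
example_time = datetime(2020, 1, 2, 3, 4, 5, 678901)

with_micros = example_time.strftime("%Y-%m-%d-%H-%M-%S-%f")
with_millis = with_micros[:-3]  # Drop the last 3 of the 6 microsecond digits

print(with_micros)  # 2020-01-02-03-04-05-678901
print(with_millis)  # 2020-01-02-03-04-05-678
```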

That import job will take some time to run, but rather than waiting on it now we'll start preparing and importing our Item data in parallel:

### Item data

As with the interactions, let's start by taking a look at the raw item data:

In [None]:
n_items = 0
sample_item = None

for item in util.dataformat.data_folder_reader("data/raw/items"):
    n_items += 1

    # Reservoir sampling R-algorithm (simple, non-optimal) with k=1:
    if random.randint(1, n_items) <= 1:
        sample_item = item

print(f"\nGot {n_items} items total")
print("Sample item:")
print(sample_item)

In this case there's a lot of potentially interesting data associated with the items - although we need to be careful of some inconsistent/missing fields and some occasional crawling errors in the raw data (e.g. where website HTML and JavaScript are given in description fields, instead of just the description text).

For our purposes, we'll try to create a simple `CATEGORY` field describing each item to improve the model's ability to generalize.

There are a couple of ways we might do this:

1. If (as for the UCSD data) category information is already provided in the dataset, we could just pass it through to an items CSV file.
2. Alternatively, other AI/ML models could be used to derive product features, such as using [Amazon Rekognition](https://aws.amazon.com/rekognition/) to create category tags based on [DetectLabels](https://docs.aws.amazon.com/rekognition/latest/dg/API_DetectLabels.html) results.

Here we'll show how to build up an items dataset via both methods **in the notebook**. In real-world architectures this feature engineering would typically be integrated into an **automated pipeline** on new product registration: Just like our CloudFormation stack automatically indexes products in ElasticSearch when they're added to the website's DynamoDB table.

In both cases (since multiple category labels might apply to a product) we'll use the format for multi-value categorical fields as detailed in the [Amazon Personalize docs](https://docs.aws.amazon.com/personalize/latest/dg/how-it-works-dataset-schema.html): bar-separated values along the lines of `category1|category2|category3`.
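A minimal sketch of producing that bar-separated format is shown below. `format_categories` is a hypothetical helper (the cells that follow join the labels inline instead); replacing stray bars inside a label and dropping duplicates are extra precautions rather than documented requirements.

```python
def format_categories(categories, max_categories=3):
    """Join category labels into a bar-separated multi-value string."""
    cleaned = []
    for category in categories:
        # Replace any bars inside a label so they can't be mistaken for separators
        label = str(category).replace("|", " ").strip()
        if label and label not in cleaned:
            cleaned.append(label)
    return "|".join(cleaned[:max_categories])


print(format_categories(["Grocery & Gourmet Food", "Beverages", "Coffee", "Ground Coffee"]))
```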

#### Method 1: Using existing categories

This method is simple enough, as we're simply constructing the CSV from the features already provided in the raw UCSD data:

In [None]:
%%time

max_cats_per_item = 3

# We set this differently between the two methods, so you won't overwrite by default:
items_filename = "data/items-basic.csv"

with open(items_filename, "w", newline="") as f:  # newline="" as recommended by the csv docs
    writer = csv.DictWriter(f, dialect="unix", fieldnames=["ITEM_ID", "CATEGORY"])
    writer.writeheader()
    for i, item in enumerate(tqdm(
        util.dataformat.data_folder_reader("data/raw/items"),
        total=n_items,
        desc="Writing items CSV",
        mininterval=0.2,
    )):
        item_id = util.dataformat.get_item_id(item)
        cats = item.get("category", item.get("categories", item.get("CATEGORY", [])))
        if isinstance(cats, str):
            cats = cats.split("|")
        writer.writerow({
            "ITEM_ID": item_id,
            # Only take up to the first max_cats_per_item categories:
            "CATEGORY": "|".join(cats[:max_cats_per_item]),
        })
print("Done")

#### Method 2: Using Amazon Rekognition

In this alternative method, we'll fetch the advertised image URL (the first one, if there are multiple) of each product and run it through the standard public [Amazon Rekognition](https://aws.amazon.com/rekognition/) model.

<div class="alert alert-warning">
    <p>
        <b>Warning:</b> Depending on the size of your dataset, this method could be <b>slow</b> and/or incur
        significant <b>cost</b>.
    </p>
    <p>
        Make sure you understand the
        <a href="https://aws.amazon.com/rekognition/pricing/">pricing of Amazon Rekognition</a>
        before continuing with this method. For example processing 50k items in
        <span class="code">us-east-1</span>
        could cost US&#36;50, excluding free tier allowances.
    </p>
</div>

In [None]:
%%time

max_cats_per_item = 3

# We set this differently between the two methods, so you won't overwrite by default:
items_filename = "data/items-rekognition.csv"

with open(items_filename, "w", newline="") as f:  # newline="" as recommended by the csv docs
    writer = csv.DictWriter(f, dialect="unix", fieldnames=["ITEM_ID", "CATEGORY"])
    writer.writeheader()
    for i, item in enumerate(tqdm(
        util.dataformat.data_folder_reader("data/raw/items"),
        total=n_items,
        desc="Analyzing item images",
        mininterval=0.2,
    )):
        item_id = util.dataformat.get_item_id(item)
        imgurl = util.dataformat.get_item_imgurl(item)
        if imgurl:
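            try:
                # Time out rather than hang on a slow host, and treat HTTP errors as failures:
                imgresponse = requests.get(imgurl, timeout=30)
                imgresponse.raise_for_status()
                img = imgresponse.content
            except requests.RequestException:
                warnings.warn(f"Couldn't get image for item ID {item_id}, URL {imgurl}")
                img = None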
            if not img:
                labels = []
            else:
                try:
                    rekresponse = rekognition.detect_labels(
                        Image={ "Bytes" : img },
                        MaxLabels=5,
                        MinConfidence=60,
                    )
                except botoexceptions.ClientError as err:
                    warnings.warn("Couldn't query Rekognition for item ID {}: {}".format(
                        item_id,
                        err.response["Error"]["Code"],
                    ))
                    rekresponse = None
                if not rekresponse:
                    labels = []
                else:
                    # Rekognition already sorts the detected labels by descending confidence, so we just keep the
                    # names of the first max_cats_per_item labels. (Each label also lists its "Parents", whose
                    # names could be appended here to capture broader categories too.)
                    labels = [label["Name"] for label in rekresponse["Labels"]][:max_cats_per_item]
        else:
            labels = []

        writer.writerow({
            "ITEM_ID": item_id,
            "CATEGORY": "|".join(labels),
        })

#### Uploading the data

Now that our local CSV is created, uploading the data to S3 and importing it to Amazon Personalize is just like for the **Interactions** dataset we provided first.

In [None]:
# Upload to S3:
s3.Object(os.environ["STAGING_BUCKET"], items_filename).upload_file(items_filename)

In [None]:
# Create the schema in Personalize:
items_schema = {
    "type": "record",
    "name": "Items",  # Note different name vs 'Interactions'
    "namespace": "com.amazonaws.personalize.schema",
    "fields": [
        { "name": "ITEM_ID", "type": "string" },
        # Per the docs, our custom string field should be marked as 'categorical'
        { "name": "CATEGORY", "type": "string", "categorical": True },
    ],
    "version": "1.0",
}

try:
    create_schema_response = personalize.create_schema(
        name=f"{os.environ['CF_STACK_NAME']}-schema-items",
        schema=json.dumps(items_schema)
    )

    items_schema_arn = create_schema_response["schemaArn"]
    print(json.dumps(create_schema_response, indent=2))

except botoexceptions.ClientError as err:
    # If the schema already exists, scrape the ARN from the message and check it:
    if err.response["Error"]["Code"] == "ResourceAlreadyExistsException":
        warnings.warn(
            "Schema already exists.\nScraping ARN from error message.\n"
            "To change the existing schema, delete it and re-create it."
        )
        msg = err.response["Error"]["Message"]
        items_schema_arn = msg[msg.index("arn:aws:personalize"):].partition(" ")[0]
        # To replace the schema instead, call personalize.delete_schema(schemaArn=items_schema_arn) and then
        # re-run this cell. (Deleting it here would make the describe_schema call below fail.)
        description = personalize.describe_schema(schemaArn=items_schema_arn)
        print(description)
    else:
        raise err

In [None]:
# Create the dataset:
try:
    create_dataset_response = personalize.create_dataset(
        name=f"{os.environ['CF_STACK_NAME']}-items",
        datasetType="ITEMS",
        datasetGroupArn=dataset_group_arn,
        schemaArn=items_schema_arn,
    )

    items_dataset_arn = create_dataset_response["datasetArn"]
    print(json.dumps(create_dataset_response, indent=2))

except botoexceptions.ClientError as err:
    # If the dataset already exists, infer the ARN and check it:
    if err.response["Error"]["Code"] == "ResourceAlreadyExistsException":
        warnings.warn(
            "Dataset already exists.\n"
            "To change the existing dataset's schema, delete it and re-create it."
        )
        items_dataset_arn = (
            dataset_group_arn.replace(":dataset-group/", ":dataset/")
            + "/ITEMS"
        )
        description = personalize.describe_dataset(datasetArn=items_dataset_arn)
        print(description)
    else:  # Some other problem
        raise err

In [None]:
print(f"Importing from {items_filename}")
create_dataset_import_job_response = personalize.create_dataset_import_job(
    # (strftime %f is microseconds, so we trim 3 from the end for milliseconds)
    jobName=f"{os.environ['CF_STACK_NAME']}-items-{dt.now().strftime('%Y-%m-%d-%H-%M-%S-%f')[:-3]}",
    datasetArn=items_dataset_arn,
    dataSource={
        "dataLocation": f"s3://{os.environ['STAGING_BUCKET']}/{items_filename}"
    },
    roleArn=role_arn,  # Remember the IAM role we created earlier?
)

items_dataset_import_job_arn = create_dataset_import_job_response["datasetImportJobArn"]
print(json.dumps(create_dataset_import_job_response, indent=2))

### Re-training the model

To create a new metadata-aware model from our new dataset versions, first we need to check that the imports are finished:

In [None]:
def dataset_import_is_done(description):
    status = description["datasetImportJob"]["status"]
    if status == "ACTIVE":
        return True
    elif "FAILED" in status:
        raise ValueError(
            f"Wait ended with unexpected status '{status}':\n{description}"
        )
    else:
        return False

print(f"Waiting for dataset import {interactions_dataset_import_job_arn}...")
util.progress.polling_spinner(
    lambda: personalize.describe_dataset_import_job(datasetImportJobArn=interactions_dataset_import_job_arn),
    dataset_import_is_done,
    fn_stringify_result = lambda desc: desc["datasetImportJob"]["status"],
    timeout_secs=60*60,
)

print(f"Waiting for dataset import {items_dataset_import_job_arn}...")
util.progress.polling_spinner(
    lambda: personalize.describe_dataset_import_job(datasetImportJobArn=items_dataset_import_job_arn),
    dataset_import_is_done,
    fn_stringify_result = lambda desc: desc["datasetImportJob"]["status"],
    timeout_secs=60*60,
)
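The implementation of `util.progress.polling_spinner` isn't shown in this notebook. As a rough idea of the pattern involved, here is a bare-bones polling helper run against a simulated import job. `poll_until` is a hypothetical stand-in with no spinner, not the actual utility.

```python
import time


def poll_until(fn_poll, fn_is_done, timeout_secs=60, interval_secs=5, sleep=time.sleep):
    """Call fn_poll() repeatedly until fn_is_done(result) returns True, or time out."""
    deadline = time.monotonic() + timeout_secs
    while True:
        result = fn_poll()
        if fn_is_done(result):
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Gave up after {timeout_secs} seconds")
        sleep(interval_secs)


# Simulated import job that becomes ACTIVE on the third poll:
statuses = iter(["CREATE PENDING", "CREATE IN_PROGRESS", "ACTIVE"])
final = poll_until(
    lambda: {"datasetImportJob": {"status": next(statuses)}},
    lambda desc: desc["datasetImportJob"]["status"] == "ACTIVE",
    interval_secs=0,
)
print(final["datasetImportJob"]["status"])
```

Letting the "is done" check raise on a failed status, as `dataset_import_is_done` does, ends the wait immediately instead of polling until the timeout.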

Next, we'll select a **metadata-aware recipe** - since the base HRNN recipe doesn't make use of item (or user) metadata sets - and create a new solution and solution version:

In [None]:
metadata_recipe_arn = "arn:aws:personalize:::recipe/aws-hrnn-metadata"

create_solution_response = personalize.create_solution(
    name=f"{os.environ['CF_STACK_NAME']}-soln-metadata",
    datasetGroupArn=dataset_group_arn,
    recipeArn=metadata_recipe_arn,
)

metadata_solution_arn = create_solution_response["solutionArn"]
%store metadata_solution_arn
print(json.dumps(create_solution_response, indent=2))

In [None]:
create_solution_version_response = personalize.create_solution_version(
    solutionArn=metadata_solution_arn
)

metadata_solution_version_arn = create_solution_version_response["solutionVersionArn"]
print(json.dumps(create_solution_version_response, indent=2))

In [None]:
# Or if you'd like to resume with an existing solution version ARN taken from the console:
#metadata_solution_version_arn = "arn:aws:personalize:us-east-1:387269085412:solution/thewsfooda-soln-metadata/68311135"

As before, we then just need to wait until our solution version is active (finished training): Then we'll be able to view its offline metrics.

In [None]:
def solution_version_is_ready(description):
    status = description["solutionVersion"]["status"]
    if status == "ACTIVE":
        return True
    elif "FAILED" in status:
        raise ValueError(
            f"Wait ended with unexpected status '{status}':\n{description}"
        )
    else:
        return False

print(f"Waiting for solution version {metadata_solution_version_arn}...")
util.progress.polling_spinner(
    lambda: personalize.describe_solution_version(solutionVersionArn=metadata_solution_version_arn),
    solution_version_is_ready,
    fn_stringify_result = lambda desc: desc["solutionVersion"]["status"],
    timeout_secs=3*60*60,
)

In [None]:
solution_metrics_response = personalize.get_solution_metrics(
    solutionVersionArn=metadata_solution_version_arn
)

print(json.dumps(solution_metrics_response, indent=2))

These metrics (either here through the API, or as shown in the Amazon Personalize console) can help us compare the metadata-aware solution with our first-cut plain HRNN model.

In our example (on the "Grocery and Gourmet Food" category), metric differences between metadata-enabled and plain HRNN recipes were pretty marginal.

**Using Method 1** (Category data pulled through from source), we saw:

- Exactly equal Precision@5 = 0.0144, and a nominally small uplift in nDCG@5 from 0.0662 to 0.0665.
- However, the metadata model was able to deliver this near-identical result relevance with an uplift of coverage from 0.1706 to 0.2345.
- This means that a bigger proportion (23%) of our product catalogue appeared in recommendations without sacrificing relevance, which we might generally see as a positive thing.

**Using Method 2** (Category data from Rekognition product image analysis), we saw:

- Significantly noisier "CATEGORY" annotations, which is to be expected as we're using a general-purpose computer vision model on images of unknown quality, rather than taking the curated category labels from Amazon.com! E.g. for B00006FWVX `Food|Sweets|Confectionery` versus `Grocery & Gourmet Food|Cooking & Baking|Food Coloring`.
- ...But interestingly, some performance metrics were a little improved! Precision@5 up to 0.0146, nDCG@5 at 0.0707. Given the small size of these changes, they could easily be simple noise indicating that the metadata is not significantly contributing to these metrics.
- Coverage at 0.2034 was higher than the base model, but not as good as method 1: This is more in line with our expectation that some kind of generalization power was derived from the extra data, but the quality of signal was lower in method 2 than method 1.

In general it's possible to enrich Amazon Personalize datasets with metadata, and possible to derive this metadata from other ML and AI models where appropriate: But we need to keep in mind that:

- Only **high-quality, useful fields** (that tell us information beyond what can already be seen from the correlations between items) are likely to cause big changes in our precision/relevance.
- These offline metrics are only **part of the story**, and the expanded model may still deliver useful results in production.
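To make the metrics discussed above more concrete, here is a sketch of textbook definitions of precision@K, nDCG@K and coverage applied to made-up data. Amazon Personalize's exact evaluation procedure (e.g. how interactions are held out) may differ, so treat this as an aid to intuition rather than a reproduction of the service's numbers.

```python
import math


def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommendations that are relevant."""
    return sum(1 for item in recommended[:k] if item in relevant) / k


def ndcg_at_k(recommended, relevant, k):
    """Discounted gain of the top-k list, normalized by the best possible ordering."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, item in enumerate(recommended[:k], start=1)
        if item in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0


def coverage(all_recommendations, catalogue_size):
    """Proportion of the catalogue appearing in at least one recommendation list."""
    unique_items = {item for recs in all_recommendations for item in recs}
    return len(unique_items) / catalogue_size


# Made-up recommendations and held-out interactions for two users:
recs = {"u1": ["a", "b", "c", "d", "e"], "u2": ["a", "f", "g", "b", "c"]}
held_out = {"u1": {"b"}, "u2": {"a", "c"}}

for user, items in recs.items():
    print(user, precision_at_k(items, held_out[user], 5), ndcg_at_k(items, held_out[user], 5))
print("coverage:", coverage(recs.values(), catalogue_size=20))
```

Note that precision and nDCG reward hits on held-out interactions, while coverage only measures how much of the catalogue is surfaced. A model can therefore improve coverage without changing precision, as we saw with Method 1.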

### Updating the campaign

There might be a few different ways we'd like to put our new solution version live to see real-world performance: for example, by first deploying to a separate "canary" campaign which only receives a subset of traffic. In this simple example though, we'll just update our existing campaign to point at the new solution version:

In [None]:
update_campaign_response = personalize.update_campaign(
    campaignArn=hrnn_campaign_arn,
    solutionVersionArn=metadata_solution_version_arn,
    minProvisionedTPS=1,
)
print(json.dumps(update_campaign_response, indent=2))

In [None]:
def campaign_is_ready(description):
    status = description["campaign"]["status"]
    if status == "ACTIVE":
        return True
    elif "FAILED" in status:
        raise ValueError(
            f"Wait ended with unexpected status '{status}':\n{description}"
        )
    else:
        return False

print(f"Waiting for campaign {hrnn_campaign_arn}...")
util.progress.polling_spinner(
    lambda: personalize.describe_campaign(campaignArn=hrnn_campaign_arn),
    campaign_is_ready,
    fn_stringify_result = lambda desc: desc["campaign"]["status"],
    timeout_secs=20*60,
)
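For comparison, the "canary" alternative mentioned earlier could be approximated in application code by routing a fixed share of users to a second campaign. The sketch below is one possible approach under stated assumptions, not a feature of Amazon Personalize: the ARNs are placeholders, and hashing the user ID keeps each user on the same campaign between requests.

```python
import hashlib


def choose_campaign(user_id, main_campaign_arn, canary_campaign_arn, canary_fraction=0.1):
    """Deterministically route a fixed share of users to the canary campaign."""
    digest = hashlib.sha256(str(user_id).encode("utf-8")).hexdigest()
    # Map the first 8 hex digits of the hash onto the interval [0, 1)
    bucket = int(digest[:8], 16) / 0x100000000
    return canary_campaign_arn if bucket < canary_fraction else main_campaign_arn


# Placeholder campaign identifiers, for illustration only:
MAIN = "arn:aws:personalize:us-east-1:123456789012:campaign/main"
CANARY = "arn:aws:personalize:us-east-1:123456789012:campaign/canary"

assignments = [choose_campaign(f"user-{i}", MAIN, CANARY) for i in range(10000)]
print("Canary share:", assignments.count(CANARY) / len(assignments))
```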

...and that's it! We:

- Added some metadata fields on our **interactions** and **items** datasets to help Personalize understand correlations and generalize behaviours
- Trained a new solution version using a metadata-aware recipe
- Updated our deployed HRNN campaign to point at the new metadata-enriched model

## Hyperparameter Tuning

Another way to optimize model performance is to tune the model **hyperparameters** while keeping the dataset the same.

Amazon Personalize draws a distinction between [**hyperparameter optimization**](https://docs.aws.amazon.com/personalize/latest/dg/customizing-solution-config-hpo.html) (tuning the parameters of a recipe to optimize performance) and [**AutoML**](https://docs.aws.amazon.com/personalize/latest/dg/training-deploying-solutions.html) (also trying *multiple* recipes to see which one can perform best). However as discussed already the different Personalize recipes often **solve different use cases**: So we'll focus on HPO here as our primary interest in performance optimization, and ignore AutoML.

To show HPO in action, we'll train another new solution using another new recipe: **HRNN-Coldstart**. HRNN-Coldstart works similarly to HRNN-Metadata (i.e. it's an HRNN-based recipe which is aware of the user and item metadata sets), but gives additional options to *force recommendation of new (unseen) items*.

HRNN-Coldstart provides a range of [HPO-tunable parameters](https://docs.aws.amazon.com/personalize/latest/dg/native-recipe-hrnn-coldstart.html), which if we wanted we could [define explicit search ranges for](https://docs.aws.amazon.com/personalize/latest/dg/customizing-solution-config-hpo.html) in the HPO request: But here we'll just trigger a search over the default hyperparameter ranges:

In [None]:
coldstart_recipe_arn = "arn:aws:personalize:::recipe/aws-hrnn-coldstart"

create_solution_response = personalize.create_solution(
    name=f"{os.environ['CF_STACK_NAME']}-soln-hpo",
    datasetGroupArn=dataset_group_arn,
    performHPO=True,
    performAutoML=False,  # As discussed above
    recipeArn=coldstart_recipe_arn,
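    # These featurization parameters control which items are treated as "cold" (new) items:
    # see the HRNN-Coldstart recipe docs linked above for their definitions.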
    solutionConfig={
        "featureTransformationParameters": {
            "cold_start_max_duration": "5.0",
            "cold_start_max_interactions": "15",
            "cold_start_relative_from": "latestItem",
        },
    },
)

hpo_solution_arn = create_solution_response["solutionArn"]
%store hpo_solution_arn
print(json.dumps(create_solution_response, indent=2))
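If we did want explicit search ranges rather than the defaults, the `solutionConfig` could also carry an `hpoConfig` along the lines of the sketch below. The hyperparameter names, ranges and job limits shown are illustrative assumptions: check the recipe documentation linked above for what is actually tunable before using anything like this.

```python
import json

# Illustrative values only: the names and ranges here are assumptions to be
# checked against the recipe documentation.
hpo_solution_config = {
    "hpoConfig": {
        "hpoResourceConfig": {
            # These limits are passed as strings
            "maxNumberOfTrainingJobs": "10",
            "maxParallelTrainingJobs": "2",
        },
        "algorithmHyperParameterRanges": {
            "integerHyperParameterRanges": [
                {"name": "hidden_dimension", "minValue": 32, "maxValue": 128},
                {"name": "bptt", "minValue": 8, "maxValue": 32},
            ],
            "categoricalHyperParameterRanges": [
                {"name": "recency_mask", "values": ["true", "false"]},
            ],
        },
    },
}
print(json.dumps(hpo_solution_config, indent=2))
```

A dictionary like this would be merged into the `solutionConfig` argument of `create_solution`, next to the `featureTransformationParameters` shown above.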

Because HPO (and AutoML) are simply additional parameters in the solution specification - the process from here of training and deploying a solution is just as we've seen already!

In [None]:
create_solution_version_response = personalize.create_solution_version(
    solutionArn=hpo_solution_arn
)

hpo_solution_version_arn = create_solution_version_response["solutionVersionArn"]
print(json.dumps(create_solution_version_response, indent=2))

In [None]:
print(f"Waiting for solution version {hpo_solution_version_arn}...")
util.progress.polling_spinner(
    lambda: personalize.describe_solution_version(solutionVersionArn=hpo_solution_version_arn),
    solution_version_is_ready,
    fn_stringify_result = lambda desc: desc["solutionVersion"]["status"],
    timeout_secs=3*60*60,
)

In [None]:
update_campaign_response = personalize.update_campaign(
    campaignArn=hrnn_campaign_arn,
    solutionVersionArn=hpo_solution_version_arn,
    minProvisionedTPS=1,
)
print(json.dumps(update_campaign_response, indent=2))

In [None]:
print(f"Waiting for campaign {hrnn_campaign_arn}...")
util.progress.polling_spinner(
    lambda: personalize.describe_campaign(campaignArn=hrnn_campaign_arn),
    campaign_is_ready,
    fn_stringify_result = lambda desc: desc["campaign"]["status"],
    timeout_secs=20*60,
)

## Recap

In this series of notebooks we set up 3 different types of real-time recommendation engine on our "AllStore" website, and started to investigate performance optimization via metadata and hyperparameter tuning.

While offline metrics (like coverage, precision @ N and nDCG reported in the Amazon Personalize console) are useful for drawing relative comparisons between models, the strong feedback loop between recommendation engines' output and user behaviour means the best way to understand the value of these solutions is via production pilot testing against measurable business outcomes!

We hope these worked examples have helped you understand how to explore recommendation engines with Amazon Personalize, and see what the service can do for you.