# Building Our First Recommendation Campaign

CloudFormation has created the infrastructure of our website for us (see the URL in the CloudFormation 'stack outputs' to visit it in your browser), but now it's up to us to enrich it with personalized recommendations!

This notebook will walk you through the steps to build an initial recommendation model to rank the products on the homepage, so it's more relevant to logged-in users.

In further notebooks, we'll add other kinds of recommendation models to the website and enrich our base models with metadata.

## About Python notebooks (if you're new to them)

"Notebooks" like this break code up into **cells**, alongside formatted comment cells like this one.

You don't *have* to use Amazon Personalize via notebooks or via Python - we're just using it as a way to interactively show the steps.

To run a cell you can press the triangular `Run` button at the top of the page - or press `Shift+Enter` while you have the cell selected.

The order in which cells ran is shown in square brackets next to the cell. `[ ]:` means the cell hasn't been run yet, and `[*]:` means it's currently running. `[1]:` means it ran first, and so on. Only one cell executes at a time, so if you start multiple they'll queue up and run in requested order.

Simply follow the instructions below and execute the code cells to get started.

## Understanding the stack

Let's review what's already been done for us, and what we'll need to do to get recommendations live:

- A **staging bucket** has been created in S3 for us to use for data preparation
- The **input files** provided during stack creation (`ProductSource`, `UserSource`, `InteractionSource`) have been copied to the bucket (in the `/raw` folder prefix)
- The **products** have been loaded into the **products table** of DynamoDB (and from there automatically synced to the site's Elasticsearch cluster)
    - ...So products are viewable and searchable in the website.
- The **users** have been loaded into the **users table** of DynamoDB
    - ...Populating the 'log in as' dropdown menu
- Three **Lambda functions** (plus several others) have been published as API endpoints to connect recommendations to the app:
    - `GetRecommendations`, for core/homepage recommendations by user
    - `GetRecommendationsByItem`, for personalized recommendations per user in the context of an item
    - `Rerank`, to re-rank the relevance of a set of candidate items to suggest to a user
- Another **Lambda function** has been published to *push live click-stream data back in* to Personalize recommendation campaigns.

We've already populated some helpful environment variables to navigate the stack, via this notebook's [lifecycle configuration script](https://docs.aws.amazon.com/sagemaker/latest/dg/notebook-lifecycle-config.html):

In [None]:
%load_ext autoreload
%autoreload 2

import os
for envvar in [
    "CF_STACK_NAME",
    "LAMBDA_GETITEMRECS_ARN",
    "LAMBDA_GETRECS_ARN",
    "LAMBDA_RERANK_ARN",
    "LAMBDA_POSTCLICK_ARN",
    "STAGING_BUCKET"
]:
    print(f"{envvar} = {os.environ[envvar]}")


**To add recommendations to the website**, what we need to do is:

1. **Prepare** the data in the correct format in S3
2. **Train** models and **deploy** them to "campaigns" in Amazon Personalize
3. **Update** the environment variable configurations of these Lambda functions, to point them at Personalize.

## Connecting to AWS and managing access

Since this notebook is running in an Amazon SageMaker notebook instance created by the CloudFormation stack, we've already been set up with the access we need to call relevant AWS services - with no need to provide credentials:

In [None]:
import boto3  # (The AWS SDK for Python)

personalize = boto3.client("personalize")  # The administrative API e.g. *training* models
personalize_runtime = boto3.client("personalize-runtime")  # The runtime API e.g. *using* models
s3 = boto3.resource("s3")  # Cloud object storage (for our data!)

# We'll quickly grab our AWS Account ID while we're here, which might be needed later:
account_id = boto3.client("sts").get_caller_identity()["Account"]

...However (since CloudFormation doesn't set up any Personalize stuff for us), we'll still need to **grant the Personalize service access to our staging bucket**.

We do this in two steps:

1. Creating a **Role** in the Identity & Access Management (IAM) service which has the required permissions, and which we allow Amazon Personalize to *assume* when running jobs.
2. Applying a **Bucket Policy** in Amazon S3 to grant the access from the S3 side.

### Creating the role:

In [None]:
import json
import time

from botocore import exceptions as botoexceptions

iam = boto3.client("iam")

role_name = f"{os.environ['CF_STACK_NAME']}-PersonalizeRole"
assume_role_policy_document = {
    "Version": "2012-10-17",
    "Statement": [{
        # Only Amazon Personalize service may assume this role:
        "Effect": "Allow",
        "Principal": { "Service": "personalize.amazonaws.com" },
        "Action": "sts:AssumeRole"
    }]
}

try:
    create_role_response = iam.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(assume_role_policy_document)
    )
    
    # We'll use a custom, inline policy tweaking the AWS-managed "AmazonPersonalizeFullAccess" policy to
    # be more specific about which S3 bucket can be accessed (only our staging bucket).
    iam.put_role_policy(
        RoleName=role_name,
        PolicyName="InlinePolicy",
        PolicyDocument=json.dumps({
            "Version": "2012-10-17",
            "Statement": [
                {
                    # Any action on any Personalize resources is permitted (quite loose!):
                    "Effect": "Allow",
                    "Action": ["personalize:*"],
                    "Resource": "*",
                },
                {
                    # Allow Personalize jobs/etc to log metrics in CloudWatch:
                    "Effect": "Allow",
                    "Action": [
                        "cloudwatch:PutMetricData",
                        "cloudwatch:ListMetrics",
                    ],
                    "Resource": "*",
                },
                {
                    # Grant access specifically to our S3 Staging bucket
                    "Effect": "Allow",
                    "Action": [
                        "s3:GetObject",
                        "s3:PutObject",
                        "s3:DeleteObject",
                        "s3:ListBucket",
                    ],
                    "Resource": [
                        f"arn:aws:s3:::{os.environ['STAGING_BUCKET']}",
                        f"arn:aws:s3:::{os.environ['STAGING_BUCKET']}/*",
                    ],
                },
                {
                    # Allow passing roles to Personalize service:
                    "Effect": "Allow",
                    "Action": ["iam:PassRole"],
                    "Resource": "*",
                    "Condition": {
                        "StringEquals": {
                            "iam:PassedToService": "personalize.amazonaws.com"
                        }
                    }
                }
            ]
        })
    )

    print("Waiting to allow new IAM role policy to propagate...")
    time.sleep(20)
    role_arn = create_role_response["Role"]["Arn"]
except botoexceptions.ClientError as err:
    if err.response["Error"]["Code"] == "EntityAlreadyExists":
        print("Using pre-existing role")
        role_arn = f"arn:aws:iam::{account_id}:role/{role_name}"
    else:  # Some other problem
        raise err

print(role_arn)
%store role_arn

### Applying the Bucket Policy:

In [None]:
bucket_policy = {
    "Version": "2012-10-17",
    "Id": "PersonalizeS3BucketAccessPolicy",
    "Statement": [
        {
            "Sid": "PersonalizeS3BucketAccessPolicy",
            "Effect": "Allow",
            "Principal": { "Service": "personalize.amazonaws.com" },
            "Action": [ "s3:GetObject", "s3:ListBucket" ],
            "Resource": [
                f"arn:aws:s3:::{os.environ['STAGING_BUCKET']}",
                f"arn:aws:s3:::{os.environ['STAGING_BUCKET']}/*"
            ]
        }
    ]
}

s3.BucketPolicy(os.environ["STAGING_BUCKET"]).put(Policy=json.dumps(bucket_policy))

## Setting up the project: Create a Dataset Group

The highest level construct in Amazon Personalize is the **dataset group**. You can think of it like a project: a container into which we'll put *up to 1 of each type of dataset* supported by the service, which we can then use to train some models and deploy some campaigns.


In [None]:
try:
    create_dataset_group_response = personalize.create_dataset_group(
        name=f"{os.environ['CF_STACK_NAME']}-dataset-group"
    )

    dataset_group_arn = create_dataset_group_response["datasetGroupArn"]
    print(json.dumps(create_dataset_group_response, indent=2))

except botoexceptions.ClientError as err:
    # If the dataset group already exists, we'll just use the existing:
    if err.response["Error"]["Code"] == "ResourceAlreadyExistsException":
        print("Dataset Group already exists - scraping ARN from error message")
        msg = err.response["Error"]["Message"]
        dataset_group_arn = msg[msg.index("arn:aws:personalize"):].partition(" ")[0]
        description = personalize.describe_dataset_group(datasetGroupArn=dataset_group_arn)
        print(description)
    else:  # Some other problem
        raise err

%store dataset_group_arn

In [None]:
# Or attach to existing by name:
# dataset_group_arn = next(
#     group["datasetGroupArn"]
#     for group in personalize.list_dataset_groups()["datasetGroups"]
#     if group["name"] == # TODO
# )
# print(dataset_group_arn)
# %store dataset_group_arn
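
The "already exists" fallback above depends on the error message embedding the ARN of the existing resource. As a minimal sketch of that string handling in isolation: the `extract_arn` helper and the sample message are made up for illustration, and the real message wording may differ.

```python
import re


def extract_arn(message, service="personalize"):
    """Return the first ARN for `service` found in `message`, or None."""
    match = re.search(rf"arn:aws:{re.escape(service)}:\S+", message)
    if match is None:
        return None
    # Drop any sentence punctuation that directly follows the ARN:
    return match.group(0).rstrip(".,;")


# Hypothetical error message, for illustration only:
sample_message = (
    "Another dataset group with the same name already exists: "
    "arn:aws:personalize:us-east-1:123456789012:dataset-group/demo-dataset-group"
)
print(extract_arn(sample_message))
```

Returning `None` when no ARN is present (rather than raising, as `str.index` would) makes it easier to fall back to a lookup by name.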

## Exploring the data

Let's download the raw data files from S3 and see what we've got.


In [None]:
!aws s3 sync --quiet s3://$STAGING_BUCKET/raw ./data/raw
!ls -lhR ./data/raw

# Ensure empty subfolders in case any datasets were not supplied to CloudFormation:
os.makedirs("data/raw/interactions", exist_ok=True)
os.makedirs("data/raw/items", exist_ok=True)
os.makedirs("data/raw/users", exist_ok=True)

This stack is designed to load data from the public [UCSD Amazon Reviews dataset](https://nijianmo.github.io/amazon/index.html) for `interactions` and `items` data, and takes custom dummy data for `users`.

We've supplied some utilities in the `util` folder to transparently handle the UCSD `json.gz` format as well as `csv` - so let's use those pre-built functions to explore the contents of the files, starting with item metadata:

<div class="alert alert-info">
    <p>
        <b>Note:</b> Because the items and interactions datasets are potentially very <b>large</b>, we'll
        perform streaming analyses where possible, rather than loading straight into memory.
    </p>
</div>

We'll use a **utility function** `data_folder_reader()`, which is able to loop over records from all the **CSV** and [**JSON-lines**](http://jsonlines.org/) files in the top level of a folder - optionally compressed e.g. as `.csv.gz` or `.json.gz` files.

You can explore the implementations of this and other processing functions in [util/dataformat.py](util/dataformat.py).
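
To give a feel for how such a streaming reader can work, here is a minimal sketch. It is *not* the code from `util/dataformat.py`: the `iter_folder_records` name and the file-extension rules are assumptions made for illustration.

```python
import csv
import gzip
import json
import os


def iter_folder_records(folder):
    """Yield one dict per record from the CSV / JSON-lines files directly inside `folder`."""
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if not os.path.isfile(path):
            continue  # Top level only: skip subfolders
        is_gzipped = name.endswith(".gz")
        base_name = name[:-3] if is_gzipped else name
        opener = gzip.open if is_gzipped else open
        if base_name.endswith(".csv"):
            with opener(path, "rt", newline="", encoding="utf-8") as f:
                yield from csv.DictReader(f)
        elif base_name.endswith(".json"):
            with opener(path, "rt", encoding="utf-8") as f:
                for line in f:
                    if line.strip():  # Ignore blank lines
                        yield json.loads(line)
```

Because it is a generator, only one record is held in memory at a time, however large the source files are.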

### Items

For the **Items** dataset, we'll:

* Count the number of items listed
* Build a dictionary in memory from item ID to human-readable title (for visualizing recommendations, later)
* Print out one item at random (using [Reservoir Sampling](https://en.wikipedia.org/wiki/Reservoir_sampling)) to get an idea of structure
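
As a standalone sketch of the reservoir sampling idea with k=1 (the `sample_one` helper is just for illustration): the n-th element replaces the current pick with probability 1/n, which leaves every element equally likely to be the final pick without knowing the stream length in advance.

```python
import random


def sample_one(stream, rng=random):
    """Pick one element uniformly at random from an iterable of unknown length."""
    chosen = None
    for n_seen, element in enumerate(stream, start=1):
        # Replace the current pick with probability 1 / n_seen:
        if rng.randint(1, n_seen) == 1:
            chosen = element
    return chosen


print(sample_one(iter(range(1000))))
```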


In [None]:
import random

import util  # Our local utilities including data_folder_reader

item_titles = {}
n_items = 0
sample_item = None

for item in util.dataformat.data_folder_reader("data/raw/items"):
    n_items += 1

    # Reservoir sampling R-algorithm (simple, non-optimal) with k=1:
    if random.randint(1, n_items) <= 1:
        sample_item = item

    item_titles[util.dataformat.get_item_id(item)] = util.dataformat.get_item_title(item)

print(f"\nGot {n_items} items total ({len(item_titles)} unique IDs)")
print("Sample item:")
print(sample_item)

Let's also take a look at some of our item titles:

In [None]:
import itertools
for item_id in itertools.islice(item_titles, 5):
    print(f"{item_id} -> {item_titles[item_id]}")

### Interactions

The core dataset we'll train our initial model on is the **interactions** between users and items.

For now we'll just count the records and print one at random to explore structure. Some extra checks will follow soon:

In [None]:
n_interactions = 0
sample_interaction = None

for interaction in util.dataformat.data_folder_reader("data/raw/interactions"):
    n_interactions += 1

    # Reservoir sampling R-algorithm (simple, non-optimal) with k=1:
    if random.randint(1, n_interactions) <= 1:
        sample_interaction = interaction

print(f"\nGot {n_interactions} interactions total")
print("Sample interaction:")
print(sample_interaction)

### Users

We don't have any user metadata available for the UCSD Amazon Reviews dataset: The pre-prepared "users" datasets just associate a small subset of "interesting" user IDs with some dummy metadata.

We'll take the same streaming approach to explore the set though, in case you want to work with bigger data. We'll build up a [set](https://docs.python.org/3/library/stdtypes.html#set-types-set-frozenset) of the IDs as we go for efficient querying later of whether a user is recognised.

In [None]:
n_users = 0
sample_user = None
user_set = set()

for user in util.dataformat.data_folder_reader("data/raw/users"):
    n_users += 1
    user_set.add(util.dataformat.get_user_id(user))

    # Reservoir sampling R-algorithm (simple, non-optimal) with k=1:
    if random.randint(1, n_users) <= 1:
        sample_user = user

print(f"\nGot {n_users} users total ({len(user_set)} unique IDs)")
print("Sample user:")
print(sample_user)

### Sanity checks

Now that we've loaded our items, interactions and users, we should do some quick checks that our dataset looks feasible to model with:

In [None]:
from collections import defaultdict

item_interactions = defaultdict(int)  # (dict where unset elements default to 0)
user_interactions = defaultdict(int)

no_items = 0
no_users = 0
unknown_items = 0
unknown_users = 0

print("Analysing interactions...")
for event in util.dataformat.data_folder_reader("data/raw/interactions"):
    item_id = util.dataformat.get_interaction_item_id(event)
    user_id = util.dataformat.get_interaction_user_id(event)

    if item_id is None:
        no_items += 1
    else:
        item_interactions[item_id] += 1
        if item_id not in item_titles:
            unknown_items += 1

    if user_id is None:
        no_users += 1
    else:
        user_interactions[user_id] += 1
        if user_id not in user_set:
            unknown_users += 1

print()
print(f"Could not determine item ID for {no_items} interactions")
print(f"Could not determine user ID for {no_users} interactions")
print(f"Unrecognised item ID (not in items dataset) for {unknown_items} interactions")
print(f"Unrecognised user ID (not in users dataset) for {unknown_users} interactions")
print()

n_items_interacted = sum(map(lambda iid: item_interactions[iid] > 0, item_titles))
print(f"{n_items_interacted} recognised items interacted with")
n_returning_users = sum(map(lambda uid: user_interactions[uid] > 1, user_set))
print(f"{n_returning_users} recognised users with 2+ interactions")

<div class="alert alert-warning">
    <b>Note:</b> we didn't do any hard/error-raising assertions here, but it's important to check your data
    meets (and, for good results, easily exceeds) the minimum specifications laid out in the
    <a href="https://docs.aws.amazon.com/personalize/latest/dg/limits.html">
        Amazon Personalize Service Limits doc</a>.
</div>
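
If you'd prefer an explicit check, something like the sketch below could be run against the `user_interactions` counts. The default thresholds (1,000 interactions, and 25 users with 2+ interactions each) are assumptions based on the minimums we understand the service to require at the time of writing, so confirm them against the limits doc linked above.

```python
def check_minimums(user_interactions, min_interactions=1000, min_users=25, min_per_user=2):
    """Return a list of human-readable problems (empty if all checks pass)."""
    total = sum(user_interactions.values())
    qualifying_users = sum(1 for n in user_interactions.values() if n >= min_per_user)
    problems = []
    if total < min_interactions:
        problems.append(f"Only {total} interactions (need {min_interactions}+)")
    if qualifying_users < min_users:
        problems.append(
            f"Only {qualifying_users} users with {min_per_user}+ interactions (need {min_users}+)"
        )
    return problems


# Toy counts, for illustration only:
print(check_minimums({"u1": 3, "u2": 1}))
```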

## Setting up a schema

Now we understand our source data, we need to [import it to Amazon Personalize](https://docs.aws.amazon.com/personalize/latest/dg/data-prep.html).

To do this we will:

- Format the data as a comma-separated values (CSV) file
- Provide a schema so that Amazon Personalize can interpret the data correctly
- Upload the CSV to an S3 bucket that Amazon Personalize can access (i.e. our staging bucket)

Each **dataset type** (interactions, user metadata, item metadata) has different [required fields and reserved field names](https://docs.aws.amazon.com/personalize/latest/dg/how-it-works-dataset-schema.html).

To keep things simple for our first model, we'll do the bare minimum: Just an interactions dataset including just the required fields.

In [None]:
import warnings

interactions_schema = {
    "type": "record",
    "name": "Interactions",
    "namespace": "com.amazonaws.personalize.schema",
    "fields": [
        { "name": "USER_ID", "type": "string" },
        { "name": "ITEM_ID", "type": "string" },
        { "name": "TIMESTAMP", "type": "long" },
    ],
    "version": "1.0",
}

try:
    create_schema_response = personalize.create_schema(
        name=f"{os.environ['CF_STACK_NAME']}-schema-interactions",
        schema=json.dumps(interactions_schema)
    )

    interactions_schema_arn = create_schema_response["schemaArn"]
    print(json.dumps(create_schema_response, indent=2))

except botoexceptions.ClientError as err:
    # If the schema already exists, scrape the ARN from the message and check it:
    if err.response["Error"]["Code"] == "ResourceAlreadyExistsException":
        warnings.warn(
            "Schema already exists.\nScraping ARN from error message.\n"
            "To change the existing schema, delete it and re-create it."
        )
        msg = err.response["Error"]["Message"]
        interactions_schema_arn = msg[msg.index("arn:aws:personalize"):].partition(" ")[0]
        description = personalize.describe_schema(schemaArn=interactions_schema_arn)
        print(description)
    else:  # Some other problem
        raise err

## Formatting and uploading the data

Next, we convert our data to a CSV (matching the schema) and upload it to S3:

In [None]:
import csv

interactions_filename = "data/interactions-minimal.csv"
with open(interactions_filename, "w", newline="") as f:  # (newline="" as advised by the csv module docs)
    writer = csv.DictWriter(f, dialect="unix", fieldnames=["USER_ID", "ITEM_ID", "TIMESTAMP"])
    writer.writeheader()
    print("Writing interactions...")
    for event in util.dataformat.data_folder_reader("data/raw/interactions"):
        writer.writerow({
            "USER_ID": util.dataformat.get_interaction_user_id(event),
            "ITEM_ID": util.dataformat.get_interaction_item_id(event),
            "TIMESTAMP": util.dataformat.get_interaction_timestamp(event),
        })
    print("Done!")
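
Before uploading, it can be worth spot-checking that the file really matches the schema, since rows with missing IDs or non-numeric timestamps may cause the import to fail later. A minimal sketch, where the `find_bad_rows` helper and the sample data are made up for illustration:

```python
import csv
import io

REQUIRED_FIELDS = ["USER_ID", "ITEM_ID", "TIMESTAMP"]


def find_bad_rows(csv_file, limit=10):
    """Return (line_number, reason) for up to `limit` rows that don't fit the schema."""
    reader = csv.DictReader(csv_file)
    if reader.fieldnames != REQUIRED_FIELDS:
        return [(1, f"unexpected header: {reader.fieldnames}")]
    problems = []
    for line_number, row in enumerate(reader, start=2):
        if not row["USER_ID"] or not row["ITEM_ID"]:
            problems.append((line_number, "missing USER_ID or ITEM_ID"))
        elif not (row["TIMESTAMP"] or "").lstrip("-").isdigit():
            problems.append((line_number, "TIMESTAMP is not an integer"))
        if len(problems) >= limit:
            break
    return problems


# In-memory sample standing in for the real file:
sample = io.StringIO(
    '"USER_ID","ITEM_ID","TIMESTAMP"\n'
    '"u1","i1","1577836800"\n'
    '"","i2","1577836801"\n'
    '"u3","i3","yesterday"\n'
)
print(find_bad_rows(sample))
```

To check the real file, pass an open file handle instead, e.g. `open(interactions_filename, newline="")`.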

In [None]:
# Upload to S3:
s3.Object(os.environ["STAGING_BUCKET"], interactions_filename).upload_file(interactions_filename)

## Importing the data to Amazon Personalize

Now our prep is done, let's put Personalize to work!

First, in case you ran through the last part **super** quickly and our **Dataset Group** is still creating, we'll check and wait for it to become active.

Here (and throughout these notebooks) we'll use the handy local `polling_spinner` function from our [util/](util) folder for a nice animated status during the wait:

In [None]:
def dataset_group_is_ready(description):
    status = description["datasetGroup"]["status"]
    if status == "ACTIVE":
        return True
    elif "FAILED" in status:
        raise ValueError(
            f"Wait ended with unexpected status '{status}':\n{description}"
        )
    else:
        return False

print(f"Waiting for dataset group {dataset_group_arn}...")
dataset_group_desc = util.progress.polling_spinner(
    # Function to poll (i.e. get the status object):
    lambda: personalize.describe_dataset_group(datasetGroupArn=dataset_group_arn),
    # Function to determine whether the result is "finished":
    dataset_group_is_ready,
    # Function to represent the result in print():
    fn_stringify_result = lambda desc: desc["datasetGroup"]["status"],
    # Raise an error if we're waiting too long:
    timeout_secs=15*60,
)
dataset_group_desc
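
The polling pattern itself is simple. As a rough sketch of the idea without the animation (this is not the `polling_spinner` implementation; `poll_until` and its parameters are hypothetical):

```python
import time


def poll_until(fn_poll, fn_is_finished, timeout_secs=60, interval_secs=5):
    """Call `fn_poll` repeatedly until `fn_is_finished(result)` is true, or time runs out."""
    deadline = time.monotonic() + timeout_secs
    while True:
        result = fn_poll()
        if fn_is_finished(result):
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Not finished after {timeout_secs}s: {result}")
        time.sleep(interval_secs)


# Stand-in for a describe_* call, which reports ACTIVE on the third poll:
statuses = iter(["CREATE PENDING", "CREATE IN_PROGRESS", "ACTIVE"])
final = poll_until(
    lambda: {"status": next(statuses)},
    lambda desc: desc["status"] == "ACTIVE",
    interval_secs=0,
)
print(final)
```

As with `dataset_group_is_ready` above, the "is finished" function can raise an error to stop the wait early on a failed status.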

Next, we **create the interactions dataset** container in the Dataset Group, referencing the schema we created earlier:

In [None]:
try:
    create_dataset_response = personalize.create_dataset(
        name=f"{os.environ['CF_STACK_NAME']}-interactions",
        datasetType="INTERACTIONS",
        datasetGroupArn=dataset_group_arn,
        schemaArn=interactions_schema_arn,
    )

    interactions_dataset_arn = create_dataset_response["datasetArn"]
    print(json.dumps(create_dataset_response, indent=2))

except botoexceptions.ClientError as err:
    # If the dataset already exists, infer the ARN and check it:
    if err.response["Error"]["Code"] == "ResourceAlreadyExistsException":
        warnings.warn(
            "Dataset already exists.\n"
            "To change the existing dataset's schema, delete it and re-create it."
        )
        interactions_dataset_arn = (
            dataset_group_arn.replace(":dataset-group/", ":dataset/")
            + "/INTERACTIONS"
        )
        description = personalize.describe_dataset(datasetArn=interactions_dataset_arn)
        print(description)
    else:  # Some other problem
        raise err
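
The ARN inference in the fallback branch above relies on a naming pattern (dataset ARN = dataset group ARN with `dataset-group` swapped for `dataset`, plus the dataset type), which is why the cell confirms it with `describe_dataset`. Here is that string manipulation in isolation, using a made-up ARN and a hypothetical helper name:

```python
def infer_interactions_dataset_arn(dataset_group_arn):
    """Guess the interactions dataset ARN from its dataset group ARN."""
    prefix, separator, group_name = dataset_group_arn.partition(":dataset-group/")
    if not separator or not group_name:
        raise ValueError(f"Not a dataset group ARN: {dataset_group_arn}")
    return f"{prefix}:dataset/{group_name}/INTERACTIONS"


# Made-up ARN, for illustration only:
example_arn = "arn:aws:personalize:us-east-1:123456789012:dataset-group/demo-dataset-group"
print(infer_interactions_dataset_arn(example_arn))
```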

Finally, we create a **dataset import job** to validate and import the data to Personalize.

This can take a few minutes to complete even with small datasets (due to infrastructure overheads) but scales well to larger imports.

In [None]:
from datetime import datetime as dt

create_dataset_import_job_response = personalize.create_dataset_import_job(
    # (strftime %f is microseconds, so we trim 3 from the end for milliseconds)
    jobName=f"{os.environ['CF_STACK_NAME']}-interact-{dt.now().strftime('%Y-%m-%d-%H-%M-%S-%f')[:-3]}",
    datasetArn=interactions_dataset_arn,
    dataSource={
        "dataLocation": f"s3://{os.environ['STAGING_BUCKET']}/{interactions_filename}"
    },
    roleArn=role_arn,  # Remember the IAM role we created earlier?
)

interactions_dataset_import_job_arn = create_dataset_import_job_response["datasetImportJobArn"]
print(json.dumps(create_dataset_import_job_response, indent=2))
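
The timestamp suffix keeps each job name unique across re-runs. The formatting can be checked on its own with a fixed date (the `timestamp_suffix` helper is just for illustration):

```python
from datetime import datetime


def timestamp_suffix(moment):
    """Format a datetime down to milliseconds, using only digits and hyphens."""
    # %f is six-digit microseconds, so dropping the last 3 characters leaves milliseconds:
    return moment.strftime("%Y-%m-%d-%H-%M-%S-%f")[:-3]


fixed_moment = datetime(2020, 1, 2, 3, 4, 5, 678901)
print(timestamp_suffix(fixed_moment))  # 2020-01-02-03-04-05-678
```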

In [None]:
def dataset_import_is_done(description):
    status = description["datasetImportJob"]["status"]
    if status == "ACTIVE":
        return True
    elif "FAILED" in status:
        raise ValueError(
            f"Wait ended with unexpected status '{status}':\n{description}"
        )
    else:
        return False

print(f"Waiting for dataset import {interactions_dataset_import_job_arn}...")
util.progress.polling_spinner(
    lambda: personalize.describe_dataset_import_job(datasetImportJobArn=interactions_dataset_import_job_arn),
    dataset_import_is_done,
    fn_stringify_result = lambda desc: desc["datasetImportJob"]["status"],
    timeout_secs=60*60,
)

## Training a model (a "Solution Version")

Now our data is imported, we can start training models. A trained model in Amazon Personalize is called a **Solution**, or more precisely a **Solution Version** - since models can be re-trained with new data, and a version history will be kept.

The **type (or architecture)** of model is determined by the **recipe** we select, so let's start by reviewing what's available:

In [None]:
personalize.list_recipes()

While some of these recipes might be comparable to each other, in many cases it's just a case of choosing the right tool for the right kind of use case:

* `Popularity-Count` is just a simple **baseline** model that recommends the most popular items.
* `SIMS` recommends items **similar to a given item** (e.g. 'customers also bought'), based on how items co-occur in users' interaction histories.
* `Personalized-Ranking` **re-ranks** a given list of candidate items, to reflect their relevance to the user.
* `HRNN`-based models recommend items relevant to a user.

For this first model we're just generating personalized homepage recommendations - so we'll choose `HRNN`. We haven't imported any extra metadata, so we'll be using vanilla HRNN rather than the more sophisticated HRNN-based recipes.
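
Rather than hard-coding a recipe ARN, you could look it up by name in the `list_recipes()` response. A minimal sketch, assuming the response carries a `recipes` list whose entries have `name` and `recipeArn` keys (the sample response below is hand-written and heavily trimmed):

```python
def recipe_arn_by_name(list_recipes_response, name):
    """Find a recipe's ARN by its name in a list_recipes()-style response."""
    for recipe in list_recipes_response.get("recipes", []):
        if recipe["name"] == name:
            return recipe["recipeArn"]
    raise KeyError(f"No recipe named '{name}' in response")


# Hand-written, trimmed sample response for illustration:
sample_response = {
    "recipes": [
        {"name": "aws-hrnn", "recipeArn": "arn:aws:personalize:::recipe/aws-hrnn"},
        {"name": "aws-sims", "recipeArn": "arn:aws:personalize:::recipe/aws-sims"},
    ]
}
print(recipe_arn_by_name(sample_response, "aws-hrnn"))
```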

### Creating the solution

In [None]:
hrnn_arn = "arn:aws:personalize:::recipe/aws-hrnn"

create_solution_response = personalize.create_solution(
    name=f"{os.environ['CF_STACK_NAME']}-soln-hrnn",
    datasetGroupArn=dataset_group_arn,
    recipeArn=hrnn_arn
)

hrnn_solution_arn = create_solution_response["solutionArn"]
%store hrnn_solution_arn
print(json.dumps(create_solution_response, indent=2))

### Creating the solution version (training a model)

Because creating the solution version trains the model, this next step will take 20 minutes or more.

As before, the second cell polls the status until it completes or fails.

In [None]:
create_solution_version_response = personalize.create_solution_version(
    solutionArn=hrnn_solution_arn
)

hrnn_solution_version_arn = create_solution_version_response["solutionVersionArn"]
print(json.dumps(create_solution_version_response, indent=2))

In [None]:
# Or if you'd like to resume with an existing solution version ARN taken from the console:
#hrnn_solution_version_arn = "arn:aws:personalize:us-east-1:387269085412:solution/thewsfooda-soln-hrnn/68311135"

In [None]:
def solution_version_is_ready(description):
    status = description["solutionVersion"]["status"]
    if status == "ACTIVE":
        return True
    elif "FAILED" in status:
        raise ValueError(
            f"Wait ended with unexpected status '{status}':\n{description}"
        )
    else:
        return False

print(f"Waiting for solution version {hrnn_solution_version_arn}...")
util.progress.polling_spinner(
    lambda: personalize.describe_solution_version(solutionVersionArn=hrnn_solution_version_arn),
    solution_version_is_ready,
    fn_stringify_result = lambda desc: desc["solutionVersion"]["status"],
    timeout_secs=3*60*60,
)

## Reviewing Solution Metrics

As part of the training process, Personalize calculates a range of validation metrics describing the solution's performance on the provided data.

These are useful for comparing candidate models offline (e.g. for tuning hyperparameters), or getting an initial idea of how a deployed solution might perform, but they're **not as concrete** as the kind of metrics we see in other ML applications.

This is because of the **interaction** between a deployed recommendation model and users' behavior: The items a user sees influence their decisions, so actual click-through-rate or revenue uplift can be quite different from model metrics.

In general these metrics are a good guide to understand candidate models, to be followed by live A/B testing cycles to measure their true impact. For more information on how the metrics are calculated and how to interpret them, see the ["Evaluating a Solution Version"](https://docs.aws.amazon.com/personalize/latest/dg/working-with-training-metrics.html) page in the Personalize Developer Guide.

In [None]:
solution_metrics_response = personalize.get_solution_metrics(
    solutionVersionArn=hrnn_solution_version_arn
)

print(json.dumps(solution_metrics_response, indent=2))
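
To build some intuition for ranking metrics of this kind, here are toy versions of precision at K and reciprocal rank for a single user. These are simplified textbook definitions on made-up data, not Personalize's exact evaluation procedure (see the doc linked above for that):

```python
def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommended items that the user actually interacted with."""
    top_k = recommended[:k]
    return sum(1 for item in top_k if item in relevant) / k


def reciprocal_rank(recommended, relevant):
    """1 / (position of the first relevant recommendation), or 0 if there is none."""
    for rank, item in enumerate(recommended, start=1):
        if item in relevant:
            return 1 / rank
    return 0.0


recommended = ["a", "b", "c", "d", "e"]  # Model output, best first
relevant = {"b", "e"}  # Held-out items the user really interacted with
print(precision_at_k(recommended, relevant, 5))  # 0.4
print(reciprocal_rank(recommended, relevant))  # 0.5
```

Averaging the reciprocal rank over many users gives a "mean reciprocal rank".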

## Deploying the model (to a "Campaign")

Amazon Personalize supports both batch and real-time recommendations, but for this website we'll be generating recommendations in real-time so we can respond dynamically to user feedback.

We deploy our trained solution version by creating a **campaign**.

Note that the campaign is billable for all the time it's deployed (regardless of requests). Here we provision the minimum capacity: There's more information about how capacity and auto-scaling work in the [CreateCampaign API doc](https://docs.aws.amazon.com/personalize/latest/dg/API_CreateCampaign.html).

Again, we'll wait for the deployment to become active before testing it out:

In [None]:
create_campaign_response = personalize.create_campaign(
    name=f"{os.environ['CF_STACK_NAME']}-camp-hrnn",
    solutionVersionArn=hrnn_solution_version_arn,
    minProvisionedTPS=1,
)

hrnn_campaign_arn = create_campaign_response["campaignArn"]
%store hrnn_campaign_arn
print(json.dumps(create_campaign_response, indent=2))

In [None]:
# Or if you'd like to resume with an existing campaign ARN taken from the console:
# hrnn_campaign_arn = "arn:aws:personalize:us-east-1:387269085412:campaign/thewsfooda-camp-hrnn"
# %store hrnn_campaign_arn

In [None]:
def campaign_is_ready(description):
    status = description["campaign"]["status"]
    if status == "ACTIVE":
        return True
    elif "FAILED" in status:
        raise ValueError(
            f"Wait ended with unexpected status '{status}':\n{description}"
        )
    else:
        return False

print(f"Waiting for campaign {hrnn_campaign_arn}...")
util.progress.polling_spinner(
    lambda: personalize.describe_campaign(campaignArn=hrnn_campaign_arn),
    campaign_is_ready,
    fn_stringify_result = lambda desc: desc["campaign"]["status"],
    timeout_secs=20*60,
)

### Getting sample recommendations

After the campaign is active, we can query it for recommendations.

Let's compare what 2 of our users, and an unknown user, would see:

In [None]:
# Alternatively, if you wanted to explore users for an interactions dataset where you don't have a list:

# candidate_users = []
# for uid, nevents in user_interactions.items():
#     if nevents >= 10 and nevents < 20:
#         candidate_users.append(uid)
#     if len(candidate_users) >= 40:
#         break

# ...and edit the test_users= line below

In [None]:
import pandas as pd

test_users = list(itertools.islice(user_set, 2)) + ["some-new-person"]
user_recs = []

for user_id in test_users:
    recs_response = personalize_runtime.get_recommendations(
        campaignArn=hrnn_campaign_arn,
        userId=str(user_id),
    )

    # Extract the item IDs from the response:
    recommended_ids = [item["itemId"] for item in recs_response["itemList"]]
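    # Remember we created the `item_titles` dict from item ID to title earlier?
    # (Fall back to the raw ID for any item that's missing from the items dataset)
    recommended_titles = [item_titles.get(item_id, item_id) for item_id in recommended_ids]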
    user_recs.append(pd.Series(recommended_titles, name=user_id))

pd.set_option("display.max_colwidth", 80)  # Increase a bit vs default
pd.concat(user_recs, axis=1).head(10)

<div class="alert alert-info">
    <p>
        <b>Note:</b> If you see the <i>same recommendations</i> for these users, the IDs in your
        <span class="code">users</span> file probably don't feature in your 
        <span class="code">interactions</span> data.
    </p>
    <p>
        You'll need to check through your interactions data to find some user IDs that do have history, and
        then add them to the <b>users table in DynamoDB</b> to make them appear on the website.
    </p>
</div>
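
With more than a handful of test users, spotting identical recommendation lists by eye gets tedious. A small sketch that groups users by the exact list they received (the helper name and sample data are made up):

```python
def users_with_identical_recs(recs_by_user):
    """Group user IDs that received exactly the same ordered list of item IDs."""
    groups = {}
    for user_id, rec_ids in recs_by_user.items():
        groups.setdefault(tuple(rec_ids), []).append(user_id)
    return [users for users in groups.values() if len(users) > 1]


# Made-up recommendations, for illustration only:
sample_recs = {
    "user-1": ["i1", "i2", "i3"],
    "user-2": ["i9", "i2", "i1"],
    "some-new-person": ["i1", "i2", "i3"],
}
print(users_with_identical_recs(sample_recs))
```

A known user landing in the same group as the unknown one suggests the model has no history for that user.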


## Enabling recommendations on the website

Now we've trained, deployed and briefly validated our model - let's link the website to it!

Most of the work has already been done for us by the CloudFormation solution. All we need to do is configure the Lambda function that handles homepage recommendation requests, pointing it at the newly deployed campaign.

In [None]:
# We'll be doing this in a couple of notebooks, so it's packaged in a util function.
# The implementation's not very complicated, feel free to go have a look!

util.lambdafn.update_lambda_envvar(os.environ["LAMBDA_GETRECS_ARN"], "CAMPAIGN_ARN", hrnn_campaign_arn)
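
For reference, an update like this generally needs to read the function's current environment, merge in the new value, and write the whole set back, because updating a function's configuration replaces its full map of environment variables. The sketch below is an assumption about how that could look, not the code in `util/lambdafn.py`; `set_lambda_envvar` is a hypothetical name and the client is passed in as an argument:

```python
def set_lambda_envvar(lambda_client, function_name, key, value):
    """Set one environment variable on a Lambda function, preserving the others."""
    config = lambda_client.get_function_configuration(FunctionName=function_name)
    # Copy the existing variables (if any) so they aren't lost in the update:
    variables = dict(config.get("Environment", {}).get("Variables", {}))
    variables[key] = value
    return lambda_client.update_function_configuration(
        FunctionName=function_name,
        Environment={"Variables": variables},
    )
```

It could be called with a `boto3.client("lambda")` client, a function name or ARN such as `os.environ["LAMBDA_GETRECS_ARN"]`, and the `"CAMPAIGN_ARN"` key.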

## Next steps

The homepage recommendations on our website are now powered by a basic HRNN model.

In further notebooks, we'll explore:

* Using the **Event Tracker** to push real-time feedback in to Personalize to update recommendations
* Other ways to embed personalized recommendations in our site, with **"Similar Items"** and **Re-ranking** models
* Using **metadata** to improve the relevance of our recommendations


In [None]:
# To disconnect the campaign we created here from the site, you could either delete the environment variable
# through the Lambda function, or uncomment and run the below:

# util.lambdafn.update_lambda_envvar(os.environ["LAMBDA_GETRECS_ARN"], "CAMPAIGN_ARN", "")
# print("Disabled per-user recommendations")

In [None]:
# Don't forget that campaigns provision inference capacity: they're billable for all the time they're active!
# To dispose of your campaigns after disconnecting them from the website, you can use code something like:
# personalize.delete_campaign(campaignArn=hrnn_campaign_arn)