# Interacting with Campaigns <a class="anchor" id="top"></a>

In this notebook, you will deploy and interact with campaigns in Amazon Personalize.

1. [Introduction](#intro)
1. [Create campaigns](#create)
1. [Interact with campaigns](#interact)
1. [Batch recommendations](#batch)
1. [Wrap up](#wrapup)

## Introduction <a class="anchor" id="intro"></a>
[Back to top](#top)

At this point, you should have several solutions and at least one solution version for each. Once a solution version is created, it is possible to get recommendations from it and to get a feel for its overall behavior.

You should also have deployed a campaign for each of these solution versions. Once they are active, there are resources for querying the recommendations, and helper functions to digest the output into something more human-readable. 

As you work with your customer on Amazon Personalize, you can modify the helper functions to fit the structure of their data input files to keep the additional rendering working.

To get started, once again, we need to import libraries, load values from previous notebooks, and load the SDK.

In [1]:
import time
from time import sleep
import json
from datetime import datetime
import uuid
import random

import boto3
import pandas as pd

In [2]:
%store -r

In [3]:
personalize = boto3.client('personalize')
personalize_runtime = boto3.client('personalize-runtime')

# Establish a connection to Personalize's event streaming
personalize_events = boto3.client(service_name='personalize-events')

## Interact with campaigns <a class="anchor" id="interact"></a>
[Back to top](#top)

Now that all campaigns are deployed and active, we can start to get recommendations via an API call. Each of the campaigns is based on a different recipe, which behave in slightly different ways because they serve different use cases. We will cover the campaigns in a different order than used in previous notebooks, in order to deal with the possible complexities in ascending order (i.e. simplest first).

First, let's create a supporting function to help make sense of the results returned by a Personalize campaign. Personalize returns only an `item_id`. This is great for keeping data compact, but it means you need to query a database or lookup table to get a human-readable result for the notebooks. We will create a helper function to return a human-readable result from the item metadata file.

Start by loading in the dataset which we can use for our lookup table.

In [86]:
# Create a dataframe for the items by reading in the correct source CSV
items_df = pd.read_csv(data_dir + 'item-meta.csv')
items_df = items_df.set_index('ITEM_ID')

# Render some sample data
items_df.head(5)

| ITEM_ID | net_votes | TIMESTAMP | Humor | Quotes | Motivation | Inspiration | American_History | Military_History | Leadership | Donald_Trump | World_History | Joe_Biden |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 6621543 | -1 | 1609459467000000000 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| 6621544 | 0 | 1609459473000000000 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| 6621546 | 2 | 1609459529000000000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 6621547 | 0 | 1609459534000000000 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| 6621548 | 2 | 1609459551000000000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |


By defining the ID column as the index column, it is trivial to return an item by just querying the ID. First, check the size of the lookup table.

In [87]:
items_df.shape

(387721, 12)

Looking up rows by ID directly with `.loc` works, but it would get messy to repeat this everywhere in our code, so the function below will clean that up.

In [92]:
def get_item_by_id(df, id):
    if id in df.index:
        rval = df.loc[id]
    else:
        rval = f'id {id} was not found'
    return rval
    

In [105]:
# get a random ITEM-ID
ran_items = items_df.sample(3).index.tolist()
print(ran_items)

[6917734, 7157431, 6890235]


In [106]:
for item in ran_items:
    print(get_item_by_id(items_df, item))

net_votes                             7
TIMESTAMP           1619009511000000000
Humor                                 0
Quotes                                0
Motivation                            0
Inspiration                           0
American_History                      0
Military_History                      1
Leadership                            0
Donald_Trump                          0
World_History                         1
Joe_Biden                             0
Name: 6917734, dtype: int64
net_votes                            16
TIMESTAMP           1628085306000000000
Humor                                 0
Quotes                                0
Motivation                            0
Inspiration                           0
American_History                      0
Military_History                      0
Leadership                            0
Donald_Trump                          0
World_History                         0
Joe_Biden                             0
Name: 715743

### SIMS

SIMS requires just an item as input, and it will return items which users interact with in similar ways to their interaction with the input item. In this particular case the item is a movie. 

The cells below will handle getting recommendations from SIMS and rendering the results. Let's see what the recommendations are for the first of the random items sampled above.

In [107]:
get_recommendations_response = personalize_runtime.get_recommendations(
    campaignArn = sims_campaign_arn,
    itemId = str(6917734),
)

In [108]:
item_list = get_recommendations_response['itemList']
print(item_list)

[{'itemId': '7110928'}, {'itemId': '6865970'}, {'itemId': '4392742'}, {'itemId': '70679'}, {'itemId': '6215594'}, {'itemId': '6810191'}, {'itemId': '4552036'}, {'itemId': '2997196'}, {'itemId': '1552905'}, {'itemId': '2684943'}, {'itemId': '6781635'}, {'itemId': '1109234'}, {'itemId': '1825081'}, {'itemId': '1219333'}, {'itemId': '7187056'}, {'itemId': '1982912'}, {'itemId': '6363786'}, {'itemId': '6846397'}, {'itemId': '4715403'}, {'itemId': '1158511'}, {'itemId': '2311319'}, {'itemId': '3028271'}, {'itemId': '6620550'}, {'itemId': '7162930'}, {'itemId': '6161281'}]


In [109]:
for item in item_list:
    print(get_item_by_id(items_df, int(item['itemId'])))
    

net_votes                           186
TIMESTAMP           1626393170000000000
Humor                                 0
Quotes                                0
Motivation                            0
Inspiration                           0
American_History                      0
Military_History                      0
Leadership                            0
Donald_Trump                          0
World_History                         0
Joe_Biden                             0
Name: 7110928, dtype: int64
net_votes                            46
TIMESTAMP           1617129538000000000
Humor                                 0
Quotes                                0
Motivation                            0
Inspiration                           0
American_History                      0
Military_History                      0
Leadership                            0
Donald_Trump                          0
World_History                         0
Joe_Biden                             0
Name: 686597

Congrats, this is your first list of recommendations! This list is fine, but it would be better to see the recommendations for similar movies render in a nice dataframe. Again, let's create a helper function to achieve this.
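A minimal sketch of such a helper, assuming the `items_df` lookup table and a Personalize runtime client like the ones defined earlier. The function name `get_similar_items_df` and its parameters are illustrative and not part of the original notebook; the client is passed in as an argument so the function can be exercised without AWS access.

```python
import pandas as pd


def get_similar_items_df(runtime_client, campaign_arn, items_df, item_id, num_results=10):
    """Return SIMS recommendations for item_id as a dataframe of item metadata.

    Hypothetical helper: the name and signature are assumptions.
    """
    response = runtime_client.get_recommendations(
        campaignArn=campaign_arn,
        itemId=str(item_id),       # Personalize expects string IDs
        numResults=num_results,
    )
    # Personalize returns IDs as strings; the lookup table is indexed by integers.
    recommended_ids = [int(entry["itemId"]) for entry in response["itemList"]]
    # Keep only IDs present in the lookup table, preserving the ranking order.
    known_ids = [i for i in recommended_ids if i in items_df.index]
    return items_df.loc[known_ids]
```

In this notebook it could be called as `get_similar_items_df(personalize_runtime, sims_campaign_arn, items_df, ran_items[0])`.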

Now, let's test the helper function with several different movies. Let's sample some data from our dataset to test our SIMS campaign. Grab 5 random movies from our dataframe.

Note: We are going to show similar titles, so you may want to re-run the sample until you recognize some of the movies listed

### User Personalization

HRNN is one of the more advanced algorithms provided by Amazon Personalize. It supports personalization of the items for a specific user based on their past behavior and can intake real time events in order to alter recommendations for a user without retraining. 

Since HRNN relies on having a sampling of users, let's load the user metadata and select 3 random users from it.

Now we render the recommendations for our 3 random users from above. After that, we will explore real-time interactions before moving on to Personalized Ranking.

Again, we create a helper function to render the results in a nice dataframe.

#### API call results

In [110]:
# Create a dataframe for the users by reading in the correct source CSV
users_df = pd.read_csv(data_dir + 'user-meta.csv')
users_df = users_df.set_index('USER_ID')

# Render some sample data
users_df.head(5)

| USER_ID | marital_status | age | gender | rank | TIMESTAMP |
|---|---|---|---|---|---|
| 604 | Married | 44 | Male | LTC | 1344606156000000000 |
| 605 | Married | 46 | Male | CPT | 1344609187000000000 |
| 607 | Married | 39 | Male | Capt | 1344622316000000000 |
| 610 | Married | 40 | Male | LTC | 1344633409000000000 |
| 619 | Married | 38 | Male | Lt Col | 1344795354000000000 |


In [111]:
ran_users = users_df.sample(3).index.tolist()
print(ran_users)

[208965, 820186, 1868667]


In [114]:
user_id = 1868667

get_recommendations_response = personalize_runtime.get_recommendations(
    campaignArn = userpersonalization_campaign_arn,
    userId = str(user_id),
)

item_list = get_recommendations_response['itemList']
for item in item_list:
    print(get_item_by_id(items_df, item['itemId']))


id 6215594 was not found
id 4392742 was not found
id 1825081 was not found
id 2684943 was not found
id 1158511 was not found
id 70679 was not found
id 1219333 was not found
id 6620550 was not found
id 1109234 was not found
id 2997196 was not found
id 1982912 was not found
id 6810191 was not found
id 4552036 was not found
id 2311319 was not found
id 6846397 was not found
id 6363786 was not found
id 6272032 was not found
id 3028271 was not found
id 4556332 was not found
id 6161281 was not found
id 2662304 was not found
id 1552905 was not found
id 4724787 was not found
id 6747302 was not found
id 6852494 was not found


Here we clearly see that the recommendations for each user are different. If you were to need a cache for these results, you could start by running the API calls through all your users and store the results, or you could use a batch export, which will be covered later in this notebook.
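Note that every lookup in the output above reports `was not found`. The likely cause is a type mismatch: Personalize returns `itemId` values as strings, while `items_df` is indexed by integers (the SIMS cell earlier converts with `int(...)` before the lookup). Below is a small sketch of a lookup that tolerates both, assuming integer item IDs as in this dataset; `get_item_by_id_safe` is a hypothetical name, not a function from the original notebook.

```python
import pandas as pd


def get_item_by_id_safe(df, item_id):
    """Look up item_id in df, accepting either an int or a numeric string."""
    try:
        key = int(item_id)         # normalize '6917734' and 6917734 to the same key
    except (TypeError, ValueError):
        return f"id {item_id} was not found"
    if key in df.index:
        return df.loc[key]
    return f"id {item_id} was not found"
```

With this variant, `get_item_by_id_safe(items_df, item['itemId'])` should resolve IDs regardless of whether they arrive as strings or integers.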

#### Real-time Events

The next topic is real-time events. Personalize has the ability to listen to events from your application in order to update the recommendations shown to the user. This is especially useful in media workloads, like video-on-demand, where a customer's intent may differ depending on whether they are watching with their children or on their own.

Additionally, the events that are recorded via this system are stored until you issue a delete call, and they are used as historical data alongside the other interaction data you provided when you train your next models.

We are not doing real-time events for this session.
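For reference only, here is a minimal sketch of what sending a single event could look like with a `personalize-events` client such as the one created at the top of this notebook. The tracking ID comes from an event tracker, which this notebook does not create, so `tracking_id` is a placeholder. The helper name `send_event` and the `click` event type are likewise assumptions for illustration.

```python
import time
import uuid


def send_event(events_client, tracking_id, user_id, item_id,
               session_id=None, event_type="click"):
    """Send one interaction event to Amazon Personalize (illustrative helper)."""
    # Reuse the caller's session ID if given, otherwise start a new session.
    session_id = session_id or str(uuid.uuid4())
    events_client.put_events(
        trackingId=tracking_id,          # placeholder: ID of an existing event tracker
        userId=str(user_id),
        sessionId=session_id,
        eventList=[{
            "sentAt": int(time.time()),  # event time as a Unix timestamp
            "eventType": event_type,     # assumed event type name
            "itemId": str(item_id),
        }],
    )
    return session_id
```

Returning the session ID lets later events from the same visit be grouped into one session.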


### Personalized Ranking

The core use case for personalized ranking is to take a collection of items and render them in priority or probable order of interest for a user. For a VOD application, you may want to dynamically render a personalized shelf/rail/carousel based on some information (director, location, superhero franchise, movie time period, etc.). This may not be information that you have in your metadata, so an item metadata filter will not work; however, you may have this information within your system to generate the item list.

To demonstrate this, we will use the same user from before and a random collection of items.

In [116]:
rerank_user = user_id
rerank_items = items_df.sample(25).index.tolist()

Now build a nice dataframe that shows the input data.

In [117]:
rerank_list = []
for item in rerank_items:
    post = get_item_by_id(items_df, item)
    rerank_list.append(post)
rerank_df = pd.DataFrame(rerank_list, columns = ['Un-Ranked'])
rerank_df

| | Un-Ranked |
|---|---|
| 6929500 | |
| 6862717 | |
| 7170470 | |
| 6926118 | |
| 6899673 | |
| 6864269 | |
| 6901798 | |
| 7135468 | |
| 6723949 | |
| 7085444 | |


Then make the personalized ranking API call.

In [118]:
# Convert user to string:
user_id = str(rerank_user)
rerank_item_list = []
for item in rerank_items:
    rerank_item_list.append(str(item))
    
# Get recommended reranking
get_recommendations_response_rerank = personalize_runtime.get_personalized_ranking(
        campaignArn = rerank_campaign_arn,
        userId = user_id,
        inputList = rerank_item_list
)

Now add the reranked items as a second column to the original dataframe, for a side-by-side comparison.

In [120]:
ranked_list = []
item_list = get_recommendations_response_rerank['personalizedRanking']
for item in item_list:
    post = get_item_by_id(items_df, item['itemId'])
    ranked_list.append(post)
ranked_df = pd.DataFrame(ranked_list, columns = ['Re-Ranked'])
rerank_df = pd.concat([rerank_df, ranked_df], axis=1)
rerank_df

| | Un-Ranked | Re-Ranked |
|---|---|---|
| 0 | | id 6723949 was not found |
| 1 | | id 6645958 was not found |
| 2 | | id 7085444 was not found |
| 3 | | id 6926118 was not found |
| 4 | | id 6862717 was not found |
| 5 | | id 6864269 was not found |
| 6 | | id 6899673 was not found |
| 7 | | id 7048347 was not found |
| 8 | | id 7099654 was not found |
| 9 | | id 7146544 was not found |


You can see above how each entry was re-ordered based on the model's understanding of the user. This is a popular task when you have a collection of items to surface to a user, such as a list of promotions.
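The empty `Un-Ranked` column and the `was not found` entries in the output above suggest the metadata lookups did not return a displayable value. A simpler comparison, sketched here under the assumption that showing the raw item IDs is enough, places the input order next to the ranked order. `build_rerank_comparison` is a hypothetical helper name.

```python
import pandas as pd


def build_rerank_comparison(input_ids, ranking_response):
    """Place the input item order next to the order returned by Personalize."""
    ranked_ids = [entry["itemId"] for entry in ranking_response["personalizedRanking"]]
    # Wrapping each list in a Series lets pandas align lists of unequal length.
    return pd.DataFrame({
        "Un-Ranked": pd.Series([str(i) for i in input_ids]),
        "Re-Ranked": pd.Series(ranked_ids),
    })
```

In this notebook it could be called as `build_rerank_comparison(rerank_items, get_recommendations_response_rerank)`.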

## Batch recommendations <a class="anchor" id="batch"></a>
[Back to top](#top)

There are many cases where you may want to have a larger dataset of exported recommendations. Recently, Amazon Personalize launched batch recommendations as a way to export a collection of recommendations to S3. In this example, we will walk through how to do this for the HRNN solution. For more information about batch recommendations, please see the [documentation](https://docs.aws.amazon.com/personalize/latest/dg/recommendations-batch.html). This feature applies to all recipes, but the output format will vary.

A simple implementation looks like this:

```python
import boto3

personalize_rec = boto3.client(service_name='personalize')

personalize_rec.create_batch_inference_job(
    solutionVersionArn = "Solution version ARN",
    jobName = "Batch job name",
    roleArn = "IAM role ARN",
    jobInput = 
       {"s3DataSource": {"path": "<S3 input path>"}},
    jobOutput = 
       {"s3DataDestination": {"path": "<S3 output path>"}}
)
```

The SDK import, the solution version ARN, and the role ARN have all been determined. This just leaves an input, an output, and a job name to be defined.

Starting with the input for HRNN, it looks like:


```JSON
{"userId": "4638"}
{"userId": "663"}
{"userId": "3384"}
```

This should yield an output that looks like this:

```JSON
{"input":{"userId":"4638"}, "output": {"recommendedItems": ["296", "1", "260", "318"]}}
{"input":{"userId":"663"}, "output": {"recommendedItems": ["1393", "3793", "2701", "3826"]}}
{"input":{"userId":"3384"}, "output": {"recommendedItems": ["8368", "5989", "40815", "48780"]}}
```

The output is a JSON Lines file. It consists of individual JSON objects, one per line. So we will need to put in more work later to digest the results in this format.
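As a small illustration of reading this format, the sketch below parses JSON Lines text into a dictionary keyed by user ID. It uses two lines of the sample output shown above as inline data; `parse_batch_output` is a hypothetical helper name.

```python
import json


def parse_batch_output(lines):
    """Map each userId to its list of recommended item IDs."""
    results = {}
    for line in lines:
        line = line.strip()
        if not line:          # skip blank lines, e.g. a trailing newline
            continue
        record = json.loads(line)
        results[record["input"]["userId"]] = record["output"]["recommendedItems"]
    return results


sample = """\
{"input":{"userId":"4638"}, "output": {"recommendedItems": ["296", "1", "260", "318"]}}
{"input":{"userId":"663"}, "output": {"recommendedItems": ["1393", "3793", "2701", "3826"]}}
"""
recommendations = parse_batch_output(sample.splitlines())
print(recommendations["663"])
```

Because each line is parsed independently, the same function works on an open file handle, which yields one line at a time.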

### Building the input file

When you are using the batch feature, you specify the users that you'd like to receive recommendations for when the job has completed. The cell below reuses the random users selected earlier, builds the input file, and saves it to disk. From there, you will upload it to S3 to use in the API call later.

In [123]:
# We will use the same users from before
print(ran_users)

# Write the file to disk
json_input_filename = "json_input.json"
with open(data_dir + "/" + json_input_filename, 'w') as json_input:
    for user_id in ran_users:
        json_input.write(json.dumps({"userId": str(user_id)}) + "\n")

[208965, 820186, 1868667]


In [124]:
# Showcase the input file:
!cat $data_dir"/"$json_input_filename

{"userId": "208965"}
{"userId": "820186"}
{"userId": "1868667"}


Upload the file to S3 and save the path as a variable for later.

In [131]:
# Upload files to S3
s3input_path = f's3://{s3bucket}/{s3prefix}/{json_input_filename}'
!aws s3 cp $data_dir/$json_input_filename $s3input_path


upload: data/json_input.json to s3://am-tmp2/rallypoint/json_input.json


Batch recommendations read the input from the file we've uploaded to S3. Similarly, batch recommendations will save the output to a file in S3. So we define the output path where the results should be saved.

In [134]:
# Define the output path
s3output_path = f's3://{s3bucket}/{s3prefix}/results/'
print(s3output_path)

s3://am-tmp2/rallypoint/results/


Now just make the call to kick off the batch export process.

In [135]:
batchInferenceJobArn = personalize.create_batch_inference_job (
    solutionVersionArn = userpersonalization_solution_version_arn,
    jobName = "VOD-POC-Batch-Inference-Job-UserPersonalization_" + str(round(time.time()*1000)),
    roleArn = role_arn,
    jobInput = 
     {"s3DataSource": {"path": s3input_path}},
    jobOutput = 
     {"s3DataDestination":{"path": s3output_path}}
)
batchInferenceJobArn = batchInferenceJobArn['batchInferenceJobArn']

Run the while loop below to track the status of the batch recommendation call. This can take around 30 minutes to complete, because Personalize needs to stand up the infrastructure to perform the task. We are testing the feature with a dataset of only 3 users, which is not an efficient use of this mechanism. Normally, you would only use this feature for bulk processing, in which case the efficiencies will become clear.

In [136]:
current_time = datetime.now()
print("Import Started on: ", current_time.strftime("%I:%M:%S %p"))

max_time = time.time() + 6*60*60 # 6 hours
while time.time() < max_time:
    describe_dataset_inference_job_response = personalize.describe_batch_inference_job(
        batchInferenceJobArn = batchInferenceJobArn
    )
    status = describe_dataset_inference_job_response["batchInferenceJob"]['status']
    print("DatasetInferenceJob: {}".format(status))
    
    if status == "ACTIVE" or status == "CREATE FAILED":
        break
        
    time.sleep(60)
    
current_time = datetime.now()
print("Import Completed on: ", current_time.strftime("%I:%M:%S %p"))

Import Started on:  08:57:11 PM
DatasetInferenceJob: CREATE PENDING
DatasetInferenceJob: CREATE IN_PROGRESS
DatasetInferenceJob: CREATE IN_PROGRESS
DatasetInferenceJob: CREATE IN_PROGRESS
DatasetInferenceJob: CREATE IN_PROGRESS
DatasetInferenceJob: CREATE IN_PROGRESS
DatasetInferenceJob: CREATE IN_PROGRESS
DatasetInferenceJob: CREATE IN_PROGRESS
DatasetInferenceJob: CREATE IN_PROGRESS
DatasetInferenceJob: CREATE IN_PROGRESS
DatasetInferenceJob: CREATE IN_PROGRESS
DatasetInferenceJob: CREATE IN_PROGRESS
DatasetInferenceJob: CREATE IN_PROGRESS
DatasetInferenceJob: CREATE IN_PROGRESS
DatasetInferenceJob: ACTIVE
Import Completed on:  09:11:12 PM


In [140]:
s3 = boto3.client('s3')
export_name = f'{json_input_filename}.out'
s3export_name = f'{s3prefix}/results/{export_name}'

s3.download_file(s3bucket, s3export_name, f'{data_dir}/{export_name}')

In [142]:
# Update DF rendering
pd.set_option('display.max_rows', 30)
with open(data_dir+"/"+export_name) as json_file:
    # Get the first line and parse it
    line = json.loads(json_file.readline())
    # Do the same for the other lines
    while line:
        # extract the user ID 
        col_header = "User: " + line['input']['userId']
        # Create a list for all the recommended items
        recommendation_list = []
        # Add all the entries
        for item in line['output']['recommendedItems']:
            post = get_item_by_id(items_df, item)
            recommendation_list.append(post)
        if 'bulk_recommendations_df' in locals():
            new_rec_DF = pd.DataFrame(recommendation_list, columns = [col_header])
            bulk_recommendations_df = bulk_recommendations_df.join(new_rec_DF)
        else:
            bulk_recommendations_df = pd.DataFrame(recommendation_list, columns=[col_header])
        try:
            line = json.loads(json_file.readline())
        except json.JSONDecodeError:
            line = None
bulk_recommendations_df

| | User: 820186 | User: 1868667 | User: 208965 |
|---|---|---|---|
| 0 | id 7162930 was not found | id 6215594 was not found | id 6215594 was not found |
| 1 | id 6643074 was not found | id 4392742 was not found | id 4392742 was not found |
| 2 | id 2994788 was not found | id 1825081 was not found | id 1109234 was not found |
| 3 | id 6165220 was not found | id 2684943 was not found | id 70679 was not found |
| 4 | id 7189043 was not found | id 1158511 was not found | id 2684943 was not found |
| 5 | id 7175586 was not found | id 70679 was not found | id 1825081 was not found |
| 6 | id 168585 was not found | id 1219333 was not found | id 2997196 was not found |
| 7 | id 2028006 was not found | id 6620550 was not found | id 6620550 was not found |
| 8 | id 1982912 was not found | id 1109234 was not found | id 4552036 was not found |
| 9 | id 4249254 was not found | id 2997196 was not found | id 1219333 was not found |


## Wrap up <a class="anchor" id="wrapup"></a>
[Back to top](#top)

With that you now have a fully working collection of models to tackle various recommendation and personalization scenarios, as well as the skills to manipulate customer data to better integrate with the service, and a knowledge of how to do all this over APIs and by leveraging open source data science tools.

Use these notebooks as a guide to getting started with your customers for POCs. As you find missing components, or discover new approaches, make a pull request and provide any additional helpful components that may be missing from this collection.

You'll want to make sure that you clean up all of the resources deployed during this POC. We have provided a separate notebook which shows you how to identify and delete the resources in `cleanup.ipynb`.