# Interacting with Campaigns <a class="anchor" id="top"></a>

In this notebook, you will deploy and interact with campaigns in Amazon Personalize.

1. [Introduction](#intro)
1. [Create campaigns](#create)
1. [Interact with campaigns](#interact)
1. [Batch recommendations](#batch)
1. [Wrap up](#wrapup)

## Introduction <a class="anchor" id="intro"></a>
[Back to top](#top)

At this point, you should have several solutions and at least one solution version for each. Once a solution version is created, you can get recommendations from it and get a feel for its overall behavior.

You should also have deployed a campaign for each of these solution versions. Once the campaigns are active, this notebook shows how to query them for recommendations and provides helper functions that turn the output into something more human-readable.

As you work with your customer on Amazon Personalize, you can modify the helper functions to fit the structure of their input data files so that the additional rendering keeps working.

To get started, once again, we need to import libraries, load values from previous notebooks, and load the SDK.

In [1]:
import time
from time import sleep
import json
from datetime import datetime
import uuid
import random
import boto3
import pandas as pd

In [2]:
%store -r

In [3]:
personalize = boto3.client('personalize')
personalize_runtime = boto3.client('personalize-runtime')

# Establish a connection to Personalize's event streaming
personalize_events = boto3.client(service_name='personalize-events')

## Interact with campaigns <a class="anchor" id="interact"></a>
[Back to top](#top)

Now that all campaigns are deployed and active, we can start to get recommendations via an API call. Each campaign is based on a different recipe, and the recipes behave in slightly different ways because they serve different use cases. We will cover the campaigns in a different order than in previous notebooks, so that we deal with the possible complexities in ascending order (simplest first).

First, let's create a supporting function to help make sense of the results returned by a Personalize campaign. Personalize returns only item IDs (`itemId`). This keeps the data compact, but it means you need to query a database or lookup table to get a human-readable result for the notebooks. We will create a helper function that returns a human-readable result from the posts dataset.

Start by loading in the dataset which we can use for our lookup table.

In [4]:
# Create a dataframe for the items by reading in the correct source CSV
items_df = pd.read_csv(f'{data_dir}/raw/posts.csv')
items_df = items_df.set_index('post_id')


In [5]:
# Render some sample data
items_df.head(5)

| post_id | type | ancestry | title | body | active | last_activity_at | profile_id | votes_count | created_at | updated_at | ... | comments_count | r_and_c_count | short_group_url | sponsored_post | root_type | best_of_rp | best_of_rp_setter_id | engagement_locked | command_post_type | qrc_groups |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 6621543 | Comment | 6084286/6099168 |  | **redacted contact** - and where do you get ... | 1 |  | 1580444 | 1 | 2021-01-01 00:04:27 | 2021-01-01 00:11:38 | ... | 0 | 0 |  | 0 | Question | 0 |  | 0 |  | Retirement,Leadership,Values,Promotions |
| 6621544 | Comment | 6084286/6619121 |  | col trinh, most people in the chain of command... | 1 |  | 1459261 | 0 | 2021-01-01 00:04:33 | 2021-01-01 00:04:33 | ... | 0 | 0 |  | 0 | Question | 0 |  | 0 |  | Retirement,Leadership,Values,Promotions |
| 6621546 | Response | 6620500 |  | great accomplishments in difficult times. wat... | 1 |  | 1652327 | 2 | 2021-01-01 00:05:29 | 2021-01-01 00:05:29 | ... | 0 | 0 |  | 0 | SharedLink | 0 |  | 0 |  | Awards,Physics,Theoretical Physics,Science |
| 6621547 | Comment | 6084286/6098802 |  | **redacted contact** - i certainly didn't vo... | 1 |  | 181760 | 0 | 2021-01-01 00:05:34 | 2021-01-01 00:05:34 | ... | 0 | 0 |  | 0 | Question | 0 |  | 0 |  | Retirement,Leadership,Values,Promotions |
| 6621548 | Comment | 1157569/6592815 |  | tanana alaska january #### operation jack fros... | 1 |  | 1631106 | 2 | 2021-01-01 00:05:51 | 2021-01-01 00:05:51 | ... | 0 | 0 |  | 0 | Question | 0 |  | 0 |  | Friends,Memories,Photography |


By defining the ID column as the index, you can look up a post directly by its ID with `items_df.loc[post_id]`. The cell below shows the size of the lookup table: 387,721 posts with 41 columns besides the `post_id` index.

In [88]:
items_df.shape

(387721, 41)

Calling `items_df.loc` directly works, but it raises a `KeyError` for IDs that are not in the table, and repeating that check everywhere in our code would get messy. The function below wraps the lookup.

In [107]:
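def get_item_by_id(item_id):
    """Return the row for a post ID as a Series, or an empty Series if the ID is unknown."""
    item_id = int(item_id)  # Personalize returns IDs as strings; the index holds integers
    if item_id in items_df.index:
        rval = items_df.loc[item_id]
    else:
        rval = pd.Series(dtype=object)  # explicit dtype avoids pandas' empty-Series warning
    return rval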

In [108]:
# Sample three random post IDs
ran_items = items_df.sample(3).index.tolist()
print(ran_items)

[6954093, 6881159, 7197119]


In [109]:
for item_id in ran_items:
    post = get_item_by_id(item_id)
    print(f'{item_id}:\t', end='')
    if post.empty:
        print('Not found.')
    else:
        print(post.body)

6954093:	the leftists would tell you yeah, but this is different....
6881159:	man great video but that guys accent sounds like people who cannot help but talk to you like your an idiot.
7197119:	 **redacted contact**  where are the party pics? i would like to see them.  i know you have them if you threw the party!  come in, just one!
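When you need to look up many IDs at once, such as a whole list of recommendations, a vectorized lookup can be handier than calling `get_item_by_id` in a loop. The sketch below is one way to do it, assuming the lookup table has a unique integer index and a `body` column, as `items_df` does; the function name `lookup_bodies` is hypothetical.

```python
import pandas as pd


def lookup_bodies(lookup_df, item_ids):
    """Return a Series of post bodies for item_ids, in the order given.

    Assumes lookup_df has a unique integer index and a 'body' column.
    Unknown IDs (and posts with an empty body) map to 'Not found.'.
    """
    ids = [int(i) for i in item_ids]         # Personalize returns IDs as strings
    bodies = lookup_df['body'].reindex(ids)  # one vectorized lookup; NaN for unknown IDs
    return bodies.fillna('Not found.')


# Usage with a tiny stand-in for items_df
demo_df = pd.DataFrame({'body': ['first post', 'second post']}, index=[101, 102])
print(lookup_bodies(demo_df, ['102', '999', '101']).tolist())
# ['second post', 'Not found.', 'first post']
```

In the notebook, `lookup_bodies(items_df, [rec['itemId'] for rec in item_list])` would return the bodies for a whole recommendation list in one call.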


### SIMS

SIMS requires just an item as input, and it returns items that users interact with in ways similar to how they interact with the input item. In this particular case the item is a post.

The cells below handle getting recommendations from SIMS and rendering the results. Let's see what the recommendations are for post 6944144.

In [98]:
get_recommendations_response = personalize_runtime.get_recommendations(
    campaignArn = sims_campaign_arn,
    itemId = str(6944144),
)

In [99]:
item_list = get_recommendations_response['itemList']
print(item_list)

[{'itemId': '7110928'}, {'itemId': '6865970'}, {'itemId': '6810191'}, {'itemId': '6781635'}, {'itemId': '7187056'}, {'itemId': '6846397'}, {'itemId': '7162930'}, {'itemId': '6852494'}, {'itemId': '7069922'}, {'itemId': '6896215'}, {'itemId': '6643074'}, {'itemId': '6747302'}, {'itemId': '7094780'}, {'itemId': '7076599'}, {'itemId': '6770828'}, {'itemId': '7008304'}, {'itemId': '7102876'}, {'itemId': '6830280'}, {'itemId': '7059659'}, {'itemId': '6870328'}, {'itemId': '6828448'}, {'itemId': '7050472'}, {'itemId': '6715995'}, {'itemId': '6905957'}, {'itemId': '6622424'}]


In [110]:
for rec_item in item_list:
    item_id = rec_item['itemId']
    print(f'{item_id}:\t', end='')
    post = get_item_by_id(item_id)
    if post.empty:
        print('Not found.')
    else:
        print(post.body)


7110928:	thanks to all who participated! this sweepstakes event has ended and all prizes have been awarded. please continue to share your stories and follow the rallysweeps page for the next event! https://rly.pt/rlyswp
6865970:	scroll updates will be posted when available
6810191:	final edit made ## august ####: i was promoted ## august #### with an mli recommendation for promotion to e#. i had to contact my s# since they didn't cut orders, but i am a sergeant as of yesterday. i'm looking forward to hopefully going to my first ever board in #-# months as i enter secondary zone for staff sergeant. thank you all for answers, and the answer ended up being yes! (just need to be fully qualified and fit mli specific tis/tig requirements while making points)    major edit for others ## june ####: mli has been rescinded effective ## july ####, those who received mli (p) will keep, but otherwise for e#->e# it's rescinded (e#->e# is still currently there). i will still give update when i'm on t

Congrats, this is your first list of recommendations! This list is usable, but the recommendations for similar posts would be easier to read if they were rendered in a dataframe. Again, a helper function can take care of this.

Then test the helper with several different posts: sample 5 random posts from the dataframe and request similar posts for each of them to exercise the SIMS campaign.

Note: Because we are showing similar posts, you may want to re-run the sample until you recognize some of the posts listed.
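A minimal sketch of such a helper follows. The name `recommendations_to_df` is hypothetical, and it assumes the lookup table is indexed by integer post ID with a `body` column, as `items_df` is here. It runs on a tiny stand-in table so no campaign is needed; the commented lines show how it might be called against the SIMS campaign.

```python
import pandas as pd


def recommendations_to_df(item_list, lookup_df, column_name):
    """Render a Personalize itemList as a one-column DataFrame of post bodies.

    item_list: dicts such as {'itemId': '7110928'}, as returned in 'itemList'.
    lookup_df: DataFrame indexed by integer post ID, with a 'body' column.
    """
    bodies = []
    for rec in item_list:
        item_id = int(rec['itemId'])  # IDs come back as strings
        if item_id in lookup_df.index:
            bodies.append(lookup_df.at[item_id, 'body'])
        else:
            bodies.append('Not found.')
    return pd.DataFrame({column_name: bodies})


# Offline demo with a tiny stand-in for items_df
demo_df = pd.DataFrame({'body': ['post A', 'post B']}, index=[1, 2])
demo_recs = [{'itemId': '2'}, {'itemId': '3'}, {'itemId': '1'}]
print(recommendations_to_df(demo_recs, demo_df, 'Similar to 42'))

# In the notebook (requires the deployed SIMS campaign):
# response = personalize_runtime.get_recommendations(
#     campaignArn=sims_campaign_arn, itemId='6944144')
# recommendations_to_df(response['itemList'], items_df, 'Similar to 6944144')
```

To compare several sampled posts side by side, you could build one such frame per post and combine them with `pd.concat(frames, axis=1)`.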

### User Personalization

The User-Personalization recipe, the successor to HRNN, is one of the more advanced recipes provided by Amazon Personalize. It personalizes items for a specific user based on their past behavior, and it can take in real-time events to adjust a user's recommendations without retraining.
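Real-time events are sent through the `personalize_events` client created at the top of this notebook, using its `put_events` call. The sketch below only builds one entry for the `eventList` argument. The helper name `build_event`, the default event type `'click'`, and the `tracking_id` variable (which would come from an event tracker created for the dataset group) are assumptions, not part of this notebook.

```python
from datetime import datetime, timezone


def build_event(item_id, event_type='click', sent_at=None):
    """Build one entry for the eventList argument of PutEvents.

    event_type 'click' is an assumption: use an event type that exists
    in your interactions data.
    """
    if sent_at is None:
        sent_at = datetime.now(timezone.utc)  # when the interaction happened
    return {
        'eventType': event_type,
        'itemId': str(item_id),  # Personalize IDs are strings
        'sentAt': sent_at,
    }


print(build_event(7050472, sent_at=datetime(2021, 1, 1, tzinfo=timezone.utc)))

# Sending it (requires an event tracker; tracking_id is hypothetical here):
# personalize_events.put_events(
#     trackingId=tracking_id,
#     userId='803091',
#     sessionId=str(uuid.uuid4()),
#     eventList=[build_event('7050472')],
# )
```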

To test it we need some users, so the cells below load the user metadata file and sample 3 random user IDs from it.

Then we request recommendations for one of those users and print the body of each recommended post. After that, we move on to Personalized Ranking.

#### API call results

In [49]:
# Create a dataframe for the users by reading in the user metadata CSV
users_df = pd.read_csv(f'{data_dir}/user-meta.csv')
users_df = users_df.set_index('USER_ID')

# Render some sample data
users_df.head(5)

| USER_ID | marital_status | age | gender | rank | TIMESTAMP |
|---|---|---|---|---|---|
| 604 | Married | 44 | Male | LTC | 1344606156000000000 |
| 605 | Married | 46 | Male | CPT | 1344609187000000000 |
| 607 | Married | 39 | Male | Capt | 1344622316000000000 |
| 610 | Married | 40 | Male | LTC | 1344633409000000000 |
| 619 | Married | 38 | Male | Lt Col | 1344795354000000000 |


In [111]:
ran_users = users_df.sample(3).index.tolist()
print(ran_users)

[1881050, 803091, 862333]


In [118]:

get_recommendations_response = personalize_runtime.get_recommendations(
    campaignArn = userpersonalization_campaign_arn,
    userId = str(803091),
)

item_list = get_recommendations_response['itemList']
print(item_list)

[{'itemId': '7050472', 'score': 0.0202207}, {'itemId': '7059659', 'score': 0.017084}, {'itemId': '6846397', 'score': 0.0152388}, {'itemId': '6770828', 'score': 0.0146097}, {'itemId': '6810191', 'score': 0.0144212}, {'itemId': '6828448', 'score': 0.0130757}, {'itemId': '7031063', 'score': 0.0130193}, {'itemId': '7031191', 'score': 0.0128974}, {'itemId': '6830280', 'score': 0.0120832}, {'itemId': '6643074', 'score': 0.0115107}, {'itemId': '6937853', 'score': 0.0113073}, {'itemId': '7008304', 'score': 0.0105913}, {'itemId': '7102876', 'score': 0.0105222}, {'itemId': '7162930', 'score': 0.0104653}, {'itemId': '6905957', 'score': 0.0080874}, {'itemId': '7076599', 'score': 0.0080449}, {'itemId': '7186507', 'score': 0.0067413}, {'itemId': '6853551', 'score': 0.0065115}, {'itemId': '7094780', 'score': 0.006439}, {'itemId': '7152032', 'score': 0.0062416}, {'itemId': '6852494', 'score': 0.0061744}, {'itemId': '6865970', 'score': 0.0061622}, {'itemId': '7187618', 'score': 0.0061174}, {'itemId': '

In [119]:
for item in item_list:
    item_id = item['itemId']
    print(f'{item_id}:\t', end='')
    post = get_item_by_id(item_id)
    if post.empty:
        print('Not found.')
    else:
        print(post.body)
print('\n\n')

7050472:	having been at ft bliss (dry heat) and now being at ft riley (very humid heat) when is it acceptable for soldiers to modify the uniform?  we are hitting heat cat # & # every day now and there's many solders (including my wife's unit) who work outside all day every day right under the sun which is tough.  up until recently there has been no issue with them (this specific unit is a uas platoon so they're on an air strip for hours) taking off their tops due to the extreme heat.  recently they were told army policy says you're allowed to roll sleeves not take off tops so no more taking them off. knowing a bit about how regs work there is no army policy but rather a regulation that leaves it up to the local commander to decide (which is usually overruled by crusty grumpy #sg's for no reason at all).    so my question is, because i can't find supporting regs to try and help out, when is it acceptable for soldiers to remove tops?  is it just never?  is it a local call from someone wi

Changing the `userId` in the cell above to each of the sampled users shows that the recommendations differ from user to user. If you need a cache for these results, you could run the API call for all of your users and store the results, or you could use batch recommendations, which are covered later in this notebook.
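A minimal sketch of the first option, assuming you want the item IDs per user in an in-memory dict: `cache_recommendations` (a hypothetical name) calls `get_recommendations` once per user with an explicit `numResults`. A stub client stands in for `personalize_runtime` so the example runs offline; in the notebook you would pass `personalize_runtime`, `userpersonalization_campaign_arn`, and `ran_users`.

```python
def cache_recommendations(runtime_client, campaign_arn, user_ids, num_results=25):
    """Call GetRecommendations once per user and return {user_id: [item_id, ...]}.

    runtime_client is a boto3 'personalize-runtime' client, or any object
    with a compatible get_recommendations method.
    """
    cache = {}
    for user_id in user_ids:
        response = runtime_client.get_recommendations(
            campaignArn=campaign_arn,
            userId=str(user_id),
            numResults=num_results,
        )
        cache[str(user_id)] = [rec['itemId'] for rec in response['itemList']]
    return cache


class StubRuntime:
    """Offline stand-in that mimics the shape of a GetRecommendations response."""

    def get_recommendations(self, campaignArn, userId, numResults):
        return {'itemList': [{'itemId': f'{userId}-{i}'} for i in range(numResults)]}


print(cache_recommendations(StubRuntime(), 'arn:example', [1, 2], num_results=2))
# {'1': ['1-0', '1-1'], '2': ['2-0', '2-1']}
```

One API call per user is fine for a handful of users; for a large user base, batch recommendations are usually the better fit.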

### Personalized Ranking

The core use case for personalized ranking is to take a collection of items and order them by their probable interest for a specific user. For a VOD application, for example, you may want to dynamically render a personalized shelf, rail, or carousel based on some attribute (director, location, superhero franchise, movie time period, etc.). This attribute may not be in your item metadata, so an item metadata filter will not work; however, you may have this information within your own system to generate the item list.
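As an illustration, the candidate list could come from information your own system holds. The sketch below assumes the `qrc_groups` column of `items_df` (comma-separated topic groups) plays that role; the function `candidates_for_group` is hypothetical. It returns IDs as strings because `get_personalized_ranking` takes `inputList` as a list of string IDs.

```python
import pandas as pd


def candidates_for_group(lookup_df, group, n=25, seed=None):
    """Sample up to n post IDs whose comma-separated qrc_groups contain `group`.

    Returns IDs as strings, the form get_personalized_ranking expects in inputList.
    """
    # Split "Retirement,Leadership,..." into a list; missing values match nothing
    tags = lookup_df['qrc_groups'].fillna('').str.split(',')
    mask = tags.apply(lambda groups: group in [g.strip() for g in groups])
    matches = lookup_df[mask]
    if matches.empty:
        return []
    picked = matches.sample(n=min(n, len(matches)), random_state=seed)
    return [str(post_id) for post_id in picked.index]


demo_df = pd.DataFrame(
    {'qrc_groups': ['Retirement,Leadership', 'Friends,Memories', None]},
    index=[101, 102, 103],
)
print(candidates_for_group(demo_df, 'Leadership'))  # ['101']
```

In the notebook, `candidates_for_group(items_df, 'Leadership')` could stand in for the random sample of posts.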

To demonstrate this, we will use a single user (ID 1717295) and a random collection of 25 posts.

In [None]:
rerank_user = '1717295'
rerank_items = items_df.sample(25).index.tolist()

Now build a nice dataframe that shows the input data.

In [None]:
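rerank_list = []
for item in rerank_items:
    post = get_item_by_id(item)
    # Keep just the post body so each row renders as readable text
    rerank_list.append('Not found.' if post.empty else post.body)
rerank_df = pd.DataFrame(rerank_list, columns = ['Un-Ranked'])
rerank_df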

Then make the personalized ranking API call.

In [None]:
# Convert user to string:
user_id = str(rerank_user)
rerank_item_list = []
for item in rerank_items:
    rerank_item_list.append(str(item))
    
# Get recommended reranking
get_recommendations_response_rerank = personalize_runtime.get_personalized_ranking(
        campaignArn = rerank_campaign_arn,
        userId = user_id,
        inputList = rerank_item_list
)

Now add the reranked items as a second column to the original dataframe, for a side-by-side comparison.

In [None]:
ranked_list = []
item_list = get_recommendations_response_rerank['personalizedRanking']
for item in item_list:
    post = get_item_by_id(item['itemId'])
    # Keep just the post body so each row renders as readable text
    ranked_list.append('Not found.' if post.empty else post.body)
ranked_df = pd.DataFrame(ranked_list, columns = ['Re-Ranked'])
rerank_df = pd.concat([rerank_df, ranked_df], axis=1)
rerank_df

You can see above how each entry was re-ordered based on the model's understanding of the user. This is a popular task when you have a collection of items to surface to a user, such as a list of promotions.
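To quantify the re-ordering rather than eyeballing it, you could pair each item's original position with its new one. This is a sketch with a hypothetical helper name; it assumes the response shape used above, a `personalizedRanking` list of dicts with an `itemId` key. In the notebook you would call `rank_changes(rerank_item_list, get_recommendations_response_rerank)`.

```python
def rank_changes(original_ids, ranking_response):
    """Return (item_id, old_position, new_position) tuples in re-ranked order.

    original_ids: the inputList sent to get_personalized_ranking.
    ranking_response: the dict it returned, with a 'personalizedRanking' list.
    Positions are 0-based.
    """
    old_pos = {str(item_id): i for i, item_id in enumerate(original_ids)}
    return [
        (entry['itemId'], old_pos[entry['itemId']], new_pos)
        for new_pos, entry in enumerate(ranking_response['personalizedRanking'])
    ]


demo_response = {'personalizedRanking': [{'itemId': 'b'}, {'itemId': 'c'}, {'itemId': 'a'}]}
print(rank_changes(['a', 'b', 'c'], demo_response))
# [('b', 1, 0), ('c', 2, 1), ('a', 0, 2)]
```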

## Batch recommendations <a class="anchor" id="batch"></a>
[Back to top](#top)

There are many cases where you may want a larger set of recommendations exported in bulk. Amazon Personalize supports batch recommendations as a way to export a collection of recommendations to S3. In this example, we will walk through how to do this for the User-Personalization solution. For more information about batch recommendations, please see the [documentation](https://docs.aws.amazon.com/personalize/latest/dg/recommendations-batch.html). This feature applies to all recipes, but the input and output formats vary by recipe.

A simple implementation looks like this:

```python
import boto3

personalize = boto3.client('personalize')

# Replace the placeholder strings with your own values
personalize.create_batch_inference_job(
    solutionVersionArn="<solution version ARN>",
    jobName="<batch job name>",
    roleArn="<IAM role ARN>",
    jobInput={"s3DataSource": {"path": "s3://<bucket>/<input file>"}},
    jobOutput={"s3DataDestination": {"path": "s3://<bucket>/<output prefix>/"}},
)
```

The SDK import, the solution version ARN, and the role ARN have all been determined already. That leaves only an input, an output, and a job name to define.

Starting with the input for the User-Personalization recipe, it looks like this:


```JSON
{"userId": "4638"}
{"userId": "663"}
{"userId": "3384"}
```
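The input shape depends on the recipe. The sketch below serializes one input line per recipe family; the shapes for SIMS (`itemId`) and Personalized-Ranking (`userId` plus `itemList`) reflect the batch recommendations documentation as we understand it, so verify them there before relying on them. The recipe labels and the function name `batch_input_line` are hypothetical.

```python
import json


def batch_input_line(recipe, user_id=None, item_id=None, item_list=None):
    """Serialize one JSON Lines record for a batch inference input file.

    Assumed shapes per recipe family (check the batch recommendations docs):
      'user-personalization': {"userId": ...}
      'sims':                 {"itemId": ...}
      'personalized-ranking': {"userId": ..., "itemList": [...]}
    """
    if recipe == 'user-personalization':
        record = {'userId': str(user_id)}
    elif recipe == 'sims':
        record = {'itemId': str(item_id)}
    elif recipe == 'personalized-ranking':
        record = {'userId': str(user_id), 'itemList': [str(i) for i in item_list]}
    else:
        raise ValueError(f'unknown recipe: {recipe}')
    return json.dumps(record)


print(batch_input_line('sims', item_id=6944144))  # {"itemId": "6944144"}
```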

This should yield an output that looks like this:

```JSON
{"input":{"userId":"4638"}, "output": {"recommendedItems": ["296", "1", "260", "318"]}}
{"input":{"userId":"663"}, "output": {"recommendedItems": ["1393", "3793", "2701", "3826"]}}
{"input":{"userId":"3384"}, "output": {"recommendedItems": ["8368", "5989", "40815", "48780"]}}
```

The output is a JSON Lines file: it consists of individual JSON objects, one per line, rather than a single JSON document. So we will need to put in a little more work later to digest the results in this format.
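A minimal parser for this format, assuming every line carries an `input.userId` and an `output.recommendedItems` field like the sample above (the function name `parse_batch_output` is hypothetical). It accepts any iterable of lines, so an open file object works as well as a list.

```python
import json


def parse_batch_output(lines):
    """Map each userId in a batch inference output (JSON Lines) to its recommended item IDs."""
    results = {}
    for raw in lines:
        raw = raw.strip()
        if not raw:  # tolerate blank lines, e.g. a trailing newline
            continue
        record = json.loads(raw)  # each line is a complete JSON object
        results[record['input']['userId']] = record['output']['recommendedItems']
    return results


sample_lines = [
    '{"input":{"userId":"4638"}, "output": {"recommendedItems": ["296", "1", "260", "318"]}}',
    '{"input":{"userId":"663"}, "output": {"recommendedItems": ["1393", "3793", "2701", "3826"]}}',
    '',
]
print(parse_batch_output(sample_lines))
# {'4638': ['296', '1', '260', '318'], '663': ['1393', '3793', '2701', '3826']}
```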

### Building the input file

When you use the batch feature, you specify the users you'd like to receive recommendations for when the job has completed. The cell below reuses the three random users sampled earlier, builds the input file, and saves it to disk. From there, you will upload it to S3 to use in the API call later.

In [None]:
# We will use the same users from before
print(ran_users)

# Write the file to disk, one JSON object per line (JSON Lines)
json_input_filename = "json_input.json"
with open(data_dir + "/" + json_input_filename, 'w') as json_input:
    for user_id in ran_users:
        json_input.write(json.dumps({"userId": str(user_id)}) + '\n')

In [None]:
# Showcase the input file:
!cat $data_dir"/"$json_input_filename

Upload the file to S3 and save the path as a variable for later.

In [None]:
# Upload files to S3
s3input_path = f's3://{s3bucket}/{s3prefix}/{json_input_filename}'
!aws s3 cp $data_dir/$json_input_filename $s3input_path


Batch recommendations read the input from the file we've uploaded to S3. Similarly, batch recommendations save the output to a file in S3, so we define the output path where the results should be saved.

In [None]:
# Define the output path
s3output_path = f's3://{s3bucket}/{s3prefix}/results/'
print(s3output_path)

Now just make the call to kick off the batch export process.

In [None]:
batchInferenceJobArn = personalize.create_batch_inference_job (
    solutionVersionArn = userpersonalization_solution_version_arn,
    jobName = "VOD-POC-Batch-Inference-Job-UserPersonalization_" + str(round(time.time()*1000)),
    roleArn = role_arn,
    jobInput = 
     {"s3DataSource": {"path": s3input_path}},
    jobOutput = 
     {"s3DataDestination":{"path": s3output_path}}
)
batchInferenceJobArn = batchInferenceJobArn['batchInferenceJobArn']

Run the while loop below to track the status of the batch recommendation call. This can take around 30 minutes to complete, because Personalize needs to stand up the infrastructure to perform the task. We are testing the feature with a dataset of only 3 users, which is not an efficient use of this mechanism. Normally, you would only use this feature for bulk processing, in which case the efficiencies will become clear.

In [None]:
current_time = datetime.now()
print("Batch inference job started on: ", current_time.strftime("%I:%M:%S %p"))

max_time = time.time() + 6*60*60 # 6 hours
while time.time() < max_time:
    describe_batch_inference_job_response = personalize.describe_batch_inference_job(
        batchInferenceJobArn = batchInferenceJobArn
    )
    status = describe_batch_inference_job_response["batchInferenceJob"]['status']
    print("BatchInferenceJob: {}".format(status))
    
    # Stop polling once the job reaches a terminal state
    if status == "ACTIVE" or status == "CREATE FAILED":
        break
        
    time.sleep(60)
    
current_time = datetime.now()
print("Stopped polling on: ", current_time.strftime("%I:%M:%S %p"))

In [None]:
s3 = boto3.client('s3')
export_name = f'{json_input_filename}.out'
s3export_name = f'{s3prefix}/results/{export_name}'

s3.download_file(s3bucket, s3export_name, f'{data_dir}/{export_name}')

In [None]:
# Update DataFrame rendering so all rows of the results are visible
pd.set_option('display.max_rows', 30)

columns = {}
with open(f'{data_dir}/{export_name}') as json_file:
    # The output is JSON Lines: one JSON object per line
    for raw_line in json_file:
        if not raw_line.strip():
            continue  # skip blank lines, e.g. a trailing newline
        line = json.loads(raw_line)
        # Extract the user ID for the column header
        col_header = "User: " + line['input']['userId']
        # Look up the body of each recommended post
        recommendation_list = []
        for item in line['output']['recommendedItems']:
            post = get_item_by_id(item)
            recommendation_list.append('Not found.' if post.empty else post.body)
        columns[col_header] = pd.Series(recommendation_list, dtype=object)

# One column per user; shorter lists are padded with NaN
bulk_recommendations_df = pd.DataFrame(columns)
bulk_recommendations_df

## Wrap up <a class="anchor" id="wrapup"></a>
[Back to top](#top)

With that you now have a fully working collection of models to tackle various recommendation and personalization scenarios, as well as the skills to manipulate customer data to better integrate with the service, and a knowledge of how to do all this over APIs and by leveraging open source data science tools.

Use these notebooks as a guide to getting started with your customers for POCs. As you find missing components, or discover new approaches, make a pull request and provide any additional helpful components that may be missing from this collection.

You'll want to make sure that you clean up all of the resources deployed during this POC. We have provided a separate notebook which shows you how to identify and delete the resources in `cleanup.ipynb`.