# Interacting with Recommenders and Campaigns <a class="anchor" id="top"></a>

In this notebook, you will deploy and interact with campaigns in Amazon Personalize.

1. [Introduction](#intro)
1. [Interact with Recommenders](#interact-recommenders)
1. [Interact with Campaigns](#interact-campaigns)
1. [Using Static and Dynamic Filters](#filters)
1. [Real-time Events](#real-time)
1. [Batch Recommendations](#batch)
1. [Wrap Up](#wrapup)

## Introduction <a class="anchor" id="intro"></a>
[Back to top](#top)

At this point, you should have 2 Recommenders and one deployed campaign. Once they are active, there are resources for querying the recommendations, and helper functions to digest the output into something more human-readable. 


In this Notebook we will interact with Recommenders and Campaigns and get recommendatiosn. We will interact with filters and send live data to Amazon Personalize to see the effect on recommendations.

![Workflow](images/image3.png)

To run this notebook, you need to have run the previous notebooks, `01_Data_Layer.ipynb`, and `02_Training_Layer.ipynb`, where you created a dataset and imported interaction, item, and user metadata data into Amazon Personalize, created recommenders, solutions and campaigns. At the end of that notebook, you saved some of the variable values, which you now need to load into this notebook.

As you work with your customer on Amazon Personalize, you can modify the helper functions to fit the structure of their data input files to keep the additional rendering working.

To get started, once again, we need to import libraries, load values from previous notebooks, and load the SDK.

In [61]:
import time
from time import sleep
import json
from datetime import datetime
import uuid
import random
import boto3
import pandas as pd

In [62]:
%store -r

In [7]:
personalize = boto3.client('personalize')
personalize_runtime = boto3.client('personalize-runtime')

# Establish a connection to Personalize's event streaming
personalize_events = boto3.client(service_name='personalize-events')

First, let's create a supporting function to help make sense of the results returned by a Personalize recommender or campaign. Personalize returns only an `item_id`. This is great for keeping data compact, but it means you need to query a database or lookup table to get a human-readable result for the notebooks. We will create a helper function to return a human-readable result from the Movielens dataset.

Start by loading in the dataset which we can use for our lookup table.

In [8]:
# Create a dataframe for the items by reading in the correct source CSV
items_df = pd.read_csv(dataset_dir + '/movies.csv', sep=',', usecols=[0,1], encoding='latin-1', dtype={'movieId': "object", 'title': "str"},index_col=0)

# Render some sample data
items_df.head(5)

Unnamed: 0_level_0,title
movieId,Unnamed: 1_level_1
1,Toy Story (1995)
2,Jumanji (1995)
3,Grumpier Old Men (1995)
4,Waiting to Exhale (1995)
5,Father of the Bride Part II (1995)


By defining the ID column as the index column it is trivial to return a movie by just querying the ID. Movie #589 should be Terminator 2: Judgment Day.

In [9]:
movieIdExample = 589
title = items_df.loc[movieIdExample]['title']
print(title)

Terminator 2: Judgment Day (1991)


That isn't terrible, but it would get messy to repeat this everywhere in our code, so the function below will clean that up.

In [10]:
def get_movie_by_id(movieId, movie_df=items_df):
    """
    This takes in an artist_id from Personalize so it will be a string,
    converts it to an int, and then does a lookup in a default or specified
    dataframe.
    
    A really broad try/except clause was added in case anything goes wrong.
    
    Feel free to add more debugging or filtering here to improve results if
    you hit an error.
    """
    try:
        return movie_df.loc[int(movieId)]['title']
    except:
        return "Error obtaining title"

Now let's test a few simple values to check our error catching.

In [11]:
# A known good id (The Princess Bride)
print(get_movie_by_id(movieId="1197"))
# A bad type of value
print(get_movie_by_id(movieId="987.9393939"))
# Really bad values
print(get_movie_by_id(movieId="Steve"))

Princess Bride, The (1987)
Error obtaining title
Error obtaining title


Great! Now we have a way of rendering results. 

## Interact with recommenders <a class="anchor" id="interact-recommenders"></a>
[Back to top](#top)

Now that the recommenders have been trained, lets have a look at the recommendations we can get for our users!

### "More like X" Recommender

'More like X' requires an item and a user as input, and it will return items which users interact with in similar ways to their interaction with the input item. In this particular case the item is a movie. 

The cells below will handle getting recommendations from the "More like X" Recommender and rendering the results. Let's see what the recommendations are for the first item we looked at earlier in this notebook (Terminator 2: Judgment Day).

We will be using the `recommenderArn`, the `itemId`, the `userId` as well as the number or results we want, `numResults`.

In [16]:
# First pick a user
testUserId = "1"

In [17]:
get_recommendations_response = personalize_runtime.get_recommendations(
    recommenderArn = recommender_more_like_x_arn,
    itemId = str(589),
    userId = testUserId,
    numResults = 20
)

In [18]:
itemList = get_recommendations_response['itemList']
for item in itemList:
    print(get_movie_by_id(movieId=item['itemId']))

Speed (1994)
Terminator, The (1984)
Independence Day (a.k.a. ID4) (1996)
Toy Story (1995)
Lion King, The (1994)
Mask, The (1994)
Braveheart (1995)
Seven (a.k.a. Se7en) (1995)
Firm, The (1993)
Matrix, The (1999)
Star Wars: Episode VI - Return of the Jedi (1983)
Die Hard (1988)
Fugitive, The (1993)
Interview with the Vampire: The Vampire Chronicles (1994)
Twelve Monkeys (a.k.a. 12 Monkeys) (1995)
Jurassic Park (1993)
Usual Suspects, The (1995)
Aliens (1986)
True Lies (1994)
Mrs. Doubtfire (1993)


Congrats, this is your first list of recommendations! This list is fine, but it would be better to see the recommendations for similar movies render in a nice dataframe. Again, let's create a helper function to achieve this.

In [22]:
# Update DF rendering
pd.set_option('display.max_rows', 30)

def get_new_recommendations_df(recommendations_df, movie_id, user_id):
    # Get the movie name
    movie_name = get_movie_by_id(movie_id)
    # Get the recommendations
    get_recommendations_response = personalize_runtime.get_recommendations(
        recommenderArn = recommender_more_like_x_arn,
        itemId = str(movie_id),
        userId = user_id
    )
    # Build a new dataframe of recommendations
    itemList = get_recommendations_response['itemList']
    recommendation_list = []
    for item in itemList:
        movie = get_movie_by_id(item['itemId'])
        recommendation_list.append(movie)
    new_rec_df = pd.DataFrame(recommendation_list, columns = [movie_name])
    # Add this dataframe to the old one
    recommendations_df = pd.concat([recommendations_df, new_rec_df], axis=1)
    return recommendations_df

Now, let's test the helper function with several different movies. Let's sample some data from our dataset to test our "More like X" Recommender. Grab 5 random movies from our dataframe.

Note: We are going to show similar titles, so you may want to re-run the sample until you recognize some of the movies listed

In [23]:
samples = items_df.sample(5)
samples

Unnamed: 0_level_0,title
movieId,Unnamed: 1_level_1
148731,La Sfida (1958)
65614,Island in the Sun (1957)
161752,Ernst ThÃ¤lmann - Sohn seiner Klasse (1954)
5086,"Five Heartbeats, The (1991)"
104542,Joyride (1997)


In [24]:
more_like_x_recommendations_df = pd.DataFrame()
movies = samples.index.tolist()

for movie in movies:
    more_like_x_recommendations_df = get_new_recommendations_df(more_like_x_recommendations_df, movie, testUserId)

more_like_x_recommendations_df

Unnamed: 0,La Sfida (1958),Island in the Sun (1957),Ernst ThÃ¤lmann - Sohn seiner Klasse (1954),"Five Heartbeats, The (1991)",Joyride (1997)
0,"Rudolph, the Red-Nosed Reindeer (1964)",Julia (1977),Ernst ThÃ¤lmann - FÃ¼hrer seiner Klasse (1955),Carmen Jones (1954),VeggieTales: Very Silly Songs (1997)
1,Invitation to the Dance (1956),"King and I, The (1956)",Kismet (1955),Zoolander (2001),Encounter in the Third Dimension (1999)
2,Yellow Submarine (1968),West Side Story (1961),A Hill in Korea (1956),Selena (1997),"General's Daughter, The (1999)"
3,Shinbone Alley (1970),Lolita (1962),Privilege (1967),"Amazing Spider-Man, The (2012)",Asunder (1998)
4,Head (1968),"Umbrellas of Cherbourg, The (Parapluies de Che...",I Could Go on Singing (1963),Soul Man (1986),"Geography of Fear, The (Pelon maantiede) (2000)"
5,Blue Hawaii (1961),Breathless (Ã bout de souffle) (1960),All Hands on Deck (1961),Akeelah and the Bee (2006),"Intruder, The (1999)"
6,Finian's Rainbow (1968),Brigadoon (1954),Kid Galahad (1962),Tombstone (1993),Cenizas del Paraiso (1997)
7,Seven Brides for Seven Brothers (1954),Marty (1955),Narrien illat (1970),Richard Pryor Here and Now (1983),'R Xmas (2001)
8,Animal Farm (1954),Guys and Dolls (1955),High Time (Big Daddy) (1960),Breakout (1975),Let The Devil Wear Black (1999)
9,"Wonderful World of the Brothers Grimm, The (1962)","Woman Is a Woman, A (femme est une femme, Une)...",Under Ten Flags (1960),Maria's Lovers (1984),Dead Heart (1996)


You may notice that some of the items look the same, hopefully not all of them do (this is more likely with a smaller # of interactions, which will be more common with the movielens small dataset). 

### "Top picks for you" Recommender

"Top picks for you" supports personalization of the items for a specific user based on their past behavior and can intake real time events in order to alter recommendations for a user without retraining. 

Since "Top picks for you" relies on having a sampling of users, let's load the data we need for that and select 3 random users. Since Movielens does not include user data, we will select 3 random numbers from the range of user id's in the dataset.

In [25]:
if not USE_FULL_MOVIELENS:
    users = random.sample(range(1, 600), 3)
else:
    users = random.sample(range(1, 162000), 3)
users

[28783, 84713, 109510]

Now we render the recommendations for our 3 random users from above. After that, we will explore real-time interactions before moving on to Personalized Ranking.

"Top picks for you" requires only a user as input, and it will return items that are relevant for that particular user. In this particular case the item is a movie.

The cells below will handle getting recommendations from the "Top picks for you" Recommender and rendering the results. 

We will be using the `recommenderArn`, the `userId` as well as the number or results we want, `numResults`.

Again, we create a helper function to render the results in a nice dataframe.

#### API call results

In [26]:
# Update DF rendering
pd.set_option('display.max_rows', 30)

def get_new_recommendations_df_users(recommendations_df, user_id):
    # Get the movie name
    #movie_name = get_movie_by_id(artist_ID)
    # Get the recommendations
    get_recommendations_response = personalize_runtime.get_recommendations(
        recommenderArn = recommender_top_picks_arn,
        userId = str(user_id),
        numResults = 20
    )
    # Build a new dataframe of recommendations
    itemList = get_recommendations_response['itemList']
    recommendation_list = []
    for item in itemList:
        movie = get_movie_by_id(item['itemId'])
        recommendation_list.append(movie)
    new_rec_df = pd.DataFrame(recommendation_list, columns = [user_id])
    # Add this dataframe to the old one
    recommendations_df = pd.concat([recommendations_df, new_rec_df], axis=1)
    return recommendations_df

In [27]:
recommendations_df_users = pd.DataFrame()

for user in users:
    recommendations_df_users = get_new_recommendations_df_users(recommendations_df_users, user)

recommendations_df_users

Unnamed: 0,28783,84713,109510
0,Grosse Pointe Blank (1997),On the Waterfront (1954),Love & Human Remains (1993)
1,"O Brother, Where Art Thou? (2000)",His Girl Friday (1940),In the Realm of the Senses (Ai no corrida) (1976)
2,Dazed and Confused (1993),Stalag 17 (1953),Nadja (1994)
3,Forrest Gump (1994),Casablanca (1942),Naked (1993)
4,Monty Python's Life of Brian (1979),It Happened One Night (1934),Exotica (1994)
5,Raising Arizona (1987),Double Indemnity (1944),Killing Zoe (1994)
6,Rushmore (1998),Key Largo (1948),Amateur (1994)
7,Groundhog Day (1993),Some Like It Hot (1959),Strange Days (1995)
8,This Is Spinal Tap (1984),"Philadelphia Story, The (1940)",Faces (1968)
9,Best in Show (2000),Modern Times (1936),Beauty of the Day (Belle de jour) (1967)


Here we clearly see that the recommendations for each user are different. If you were to need a cache for these results, you could start by running the API calls through all your users and store the results, or you could use a batch export, which will be covered later in this notebook.

## Interact with Campaigns <a class="anchor" id="interact-campaigns"></a>
[Back to top](#top)

Now that the reranking campaign is deployed and active, we can start to get recommendations via an API call. 

### Personalized Ranking

The core use case for personalized ranking is to take a collection of items and to render them in priority or probable order of interest for a user. For a VOD application you want dynamically render a personalized shelf/rail/carousel based on some information (director, location, superhero franchise, movie time period, etc...). This may not be information that you have in your metadata, so an item metadata filter will not work, however you may have this information within you system to generate the item list. 

To demonstrate this, we will use the same user from before and a random collection of items.

In [28]:
rerank_user = user
rerank_items = items_df.sample(25).index.tolist()

Now build a nice dataframe that shows the input data.

In [29]:
rerank_list = []
for item in rerank_items:
    movie = get_movie_by_id(item)
    rerank_list.append(movie)
rerank_df = pd.DataFrame(rerank_list, columns = ['Un-Ranked'])
rerank_df

Unnamed: 0,Un-Ranked
0,"Princess for Christmas, A (2011)"
1,Enron: The Smartest Guys in the Room (2005)
2,"Man There Was, A (Terje Vigen) (1917)"
3,The Outriders (1950)
4,Stonewall Uprising (2010)
5,"Zero Years, The (2005)"
6,Evil Thoughts (1976)
7,Saint Maybe (1998)
8,Zenda (2009)
9,Les Loups entre eux (1985)


Then make the personalized ranking API call.

In [30]:
rerank_item_list = []
for item in rerank_items:
    rerank_item_list.append(str(item))
    
# Get recommended reranking
get_recommendations_response_rerank = personalize_runtime.get_personalized_ranking(
        campaignArn = rerank_campaign_arn,
        userId = str(rerank_user),
        inputList = rerank_item_list
)

Now add the reranked items as a second column to the original dataframe, for a side-by-side comparison.

In [31]:
ranked_list = []
item_list = get_recommendations_response_rerank['personalizedRanking']
for item in item_list:
    movie = get_movie_by_id(item['itemId'])
    ranked_list.append(movie)
ranked_df = pd.DataFrame(ranked_list, columns = ['Re-Ranked'])
rerank_df = pd.concat([rerank_df, ranked_df], axis=1)
rerank_df

Unnamed: 0,Un-Ranked,Re-Ranked
0,"Princess for Christmas, A (2011)","Secret Agent, The (1996)"
1,Enron: The Smartest Guys in the Room (2005),Les Loups entre eux (1985)
2,"Man There Was, A (Terje Vigen) (1917)","Sometimes Happiness, Sometimes Sorrow (Kabhi K..."
3,The Outriders (1950),Talk About a Stranger (1952)
4,Stonewall Uprising (2010),Once Upon a Honeymoon (1942)
5,"Zero Years, The (2005)","Man There Was, A (Terje Vigen) (1917)"
6,Evil Thoughts (1976),The General Line (1929)
7,Saint Maybe (1998),The Seashell and the Clergyman (1928)
8,Zenda (2009),Evil Thoughts (1976)
9,Les Loups entre eux (1985),Enron: The Smartest Guys in the Room (2005)


You can see above how each entry was re-ordered based on the model's understanding of the user. This is a popular task when you have a collection of items to surface a user, a list of promotions for example.

## Using Static and Dynamic Filters <a class="anchor" id="filters"></a>
[Back to top](#top)

Lets interact with the static filters we created in the previous notebook, and utilize dynamic filters in realtime.

A few common use cases for dynamic filters in Video On Demand are:

Categorical filters based on Item Metadata (that arent range based) - Often your item metadata will have information about the title such as Genre, Keyword, Year, Director, Actor etc. Filtering on these can provide recommendations within that data, such as action movies, Steven Spielberg movies, Movies from 1995 etc.

Events - you may want to filter out certain events and provide results based on those events, such as moving a title from a "suggestions to watch" recommendation to a "watch again" recommendations.

Now lets apply item filters to see recommendations for one of these users within each decade of our static filters.


In [32]:
def get_new_recommendations_df_by_static_filter(recommendations_df, user_id, filter_arn):
    # Get the recommendations
    get_recommendations_response = personalize_runtime.get_recommendations(
        recommenderArn = recommender_top_picks_arn,
        userId = str(user_id),
        filterArn = filter_arn
    )
    # Build a new dataframe of recommendations
    item_list = get_recommendations_response['itemList']
    recommendation_list = []
    for item in item_list:
        movie = get_movie_by_id(item['itemId'])
        recommendation_list.append(movie)
    #print(recommendation_list)
    filter_name = filter_arn.split('/')[1]
    new_rec_DF = pd.DataFrame(recommendation_list, columns = [filter_name])
    # Add this dataframe to the old one
    recommendations_df = pd.concat([recommendations_df, new_rec_DF], axis=1)
    return recommendations_df

In [33]:
def get_new_recommendations_df_by_dynamic_filter(recommendations_df, user_id, genre_filter_arn, filter_values):
    # Get the recommendations
    get_recommendations_response = personalize_runtime.get_recommendations(
        recommenderArn = recommender_top_picks_arn,
        userId = str(user_id),
        filterArn = genre_filter_arn,
        filterValues = { "GENRE": "\"" + filter_values + "\""}
    )
    # Build a new dataframe of recommendations
    item_list = get_recommendations_response['itemList']
    recommendation_list = []
    for item in item_list:
        movie = get_movie_by_id(item['itemId'])
        recommendation_list.append(movie)
    filter_name = genre_filter_arn.split('/')[1]
    new_rec_DF = pd.DataFrame(recommendation_list, columns = [filter_values])
    # Add this dataframe to the old one
    recommendations_df = pd.concat([recommendations_df, new_rec_DF], axis=1)
    return recommendations_df

You can see the recommendations for movies within a given decade. Within a VOD application you could create Shelves (also known as rails or carousels) easily by using these filters. Depending on the information you have about your items, You could also filter on additional information such as keyword, year/decade etc.

In [34]:
recommendations_df_decade_shelves = pd.DataFrame()
for filter_arn in meta_filter_decade_arns:
    recommendations_df_decade_shelves = get_new_recommendations_df_by_static_filter(recommendations_df_decade_shelves, user, filter_arn)

recommendations_df_decade_shelves

Unnamed: 0,1950s,1960s,1970s,1980s,1990s,2000s,2010s
0,"Ballad of Narayama, The (Narayama Bushiko) (1958)",Faces (1968),In the Realm of the Senses (Ai no corrida) (1976),Blade Runner (1982),Love & Human Remains (1993),"Dark Knight, The (2008)",Inception (2010)
1,"20,000 Leagues Under the Sea (1954)",Beauty of the Day (Belle de jour) (1967),"Candidate, The (1972)",Coup de torchon (Clean Slate) (1981),Nadja (1994),"Lord of the Rings: The Fellowship of the Ring,...",Ad Astra (2019)
2,Dial M for Murder (1954),Purple Noon (Plein soleil) (1960),Star Wars: Episode IV - A New Hope (1977),"Fish Called Wanda, A (1988)",Naked (1993),Code Unknown (Code inconnu: RÃ©cit incomplet d...,Rubber (2010)
3,"World of Apu, The (Apur Sansar) (1959)","Umbrellas of Cherbourg, The (Parapluies de Che...",Bread and Chocolate (Pane e cioccolata) (1973),Heavy Metal (1981),Exotica (1994),Little Otik (OtesÃ¡nek) (2000),Uncle Boonmee Who Can Recall His Past Lives (L...
4,"Eyes Without a Face (Yeux sans visage, Les) (1...",Bonnie and Clyde (1967),Monty Python's Life of Brian (1979),Die Hard (1988),Killing Zoe (1994),Chopper (2000),It's Such a Beautiful Day (2012)
5,Rebel Without a Cause (1955),Dr. Strangelove or: How I Learned to Stop Worr...,"Clockwork Orange, A (1971)",Brazil (1985),Amateur (1994),Donnie Darko (2001),Unrivaled (2010)
6,Song of the Little Road (Pather Panchali) (1955),2001: A Space Odyssey (1968),Willy Wonka & the Chocolate Factory (1971),Akira (1988),Strange Days (1995),3 Nights (2001),Dragged Across Concrete (2018)
7,"Unvanquished, The (Aparajito) (1957)",Barbarella (1968),"Ruling Class, The (1972)",Star Wars: Episode VI - Return of the Jedi (1983),Romeo Is Bleeding (1993),Eternal Sunshine of the Spotless Mind (2004),Africa United (2010)
8,"Streetcar Named Desire, A (1951)",Spirits of the Dead (1968),"Tin Drum, The (Blechtrommel, Die) (1979)","Cook the Thief His Wife & Her Lover, The (1989)",Body Snatchers (1993),Joint Security Area (Gongdong gyeongbi guyeok ...,Under the Silver Lake (2018)
9,Rear Window (1954),"Sword in the Stone, The (1963)","Godfather, The (1972)",E.T. the Extra-Terrestrial (1982),Four Rooms (1995),Memento (2000),Tomorrow Ever After (2017)


In [35]:
# Create a dataframe for the items by reading in the correct source CSV
items_meta_df = pd.read_csv(data_dir + '/item-meta.csv', sep=',', index_col=0)

# Render some sample data
items_meta_df.head(10)

Unnamed: 0_level_0,GENRES,YEAR,CREATION_TIMESTAMP
ITEM_ID,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
1,Adventure|Animation|Children|Comedy|Fantasy,1995,1640995200
2,Adventure|Children|Fantasy,1995,1640995200
3,Comedy|Romance,1995,1640995200
4,Comedy|Drama|Romance,1995,1640995200
5,Comedy,1995,1640995200
6,Action|Crime|Thriller,1995,1640995200
7,Comedy|Romance,1995,1640995200
8,Adventure|Children,1995,1640995200
9,Action,1995,1640995200
10,Action|Adventure|Thriller,1995,1640995200


Now what we want to do is determine the genres to filter on, for that we need a list of all genres. First we will get all the unique values of the column GENRE, then split strings on | if they exist, everyone will then get added to a long list which will be converted to a set for efficiency. That set will then be made into a list so that it can be iterated, and we can then use the get recommendatioins API.

In [36]:
unique_genre_field_values = items_meta_df['GENRES'].unique()

genre_val_list = []

def process_for_bar_char(val, val_list):
    if '|' in val:
        values = val.split('|')
        for item in values:
            val_list.append(item)
    elif '(' in val:
        pass
    else:
        val_list.append(val)
    return val_list
    

for val in unique_genre_field_values:
    genre_val_list = process_for_bar_char(val, genre_val_list)

genres_to_filter = list(set(genre_val_list))

In [37]:
genres_to_filter

['Action',
 'Sci-Fi',
 'Fantasy',
 'Musical',
 'Comedy',
 'War',
 'Romance',
 'Mystery',
 'Film-Noir',
 'Drama',
 'Thriller',
 'Crime',
 'Children',
 'Adventure',
 'Western',
 'IMAX',
 'Documentary',
 'Animation',
 'Horror']

In [38]:
# Iterate through Genres
recommendations_df_genre_shelves = pd.DataFrame()
for genre in genres_to_filter:
    recommendations_df_genre_shelves = get_new_recommendations_df_by_dynamic_filter(recommendations_df_genre_shelves, user, genre_filter_arn , genre)
    
recommendations_df_genre_shelves

Unnamed: 0,Action,Sci-Fi,Fantasy,Musical,Comedy,War,Romance,Mystery,Film-Noir,Drama,Thriller,Crime,Children,Adventure,Western,IMAX,Documentary,Animation,Horror
0,Strange Days (1995),Strange Days (1995),"City of Lost Children, The (CitÃ© des enfants ...","Umbrellas of Cherbourg, The (Parapluies de Che...",Love & Human Remains (1993),Underground (1995),What Happened Was... (1994),Strange Days (1995),Bitter Moon (1992),Love & Human Remains (1993),Killing Zoe (1994),Amateur (1994),Snow White and the Seven Dwarfs (1937),"City of Lost Children, The (CitÃ© des enfants ...",Dead Man (1995),"Lion King, The (1994)",Crumb (1994),Ghost in the Shell (KÃ´kaku kidÃ´tai) (1995),Body Snatchers (1993)
1,Blade Runner (1982),Body Snatchers (1993),"Visitors, The (Visiteurs, Les) (1993)",Farinelli: il castrato (1994),Four Rooms (1995),Dr. Strangelove or: How I Learned to Stop Worr...,Bitter Moon (1992),Dead Man (1995),Devil in a Blue Dress (1995),In the Realm of the Senses (Ai no corrida) (1976),Amateur (1994),Killing Zoe (1994),Willy Wonka & the Chocolate Factory (1971),Star Wars: Episode IV - A New Hope (1977),Geronimo: An American Legend (1993),Apollo 13 (1995),Unzipped (1995),"Secret Adventures of Tom Thumb, The (1993)",Mute Witness (1994)
2,"Crow, The (1994)",Blade Runner (1982),"Crow, The (1994)",Snow White and the Seven Dwarfs (1937),What Happened Was... (1994),Beyond Rangoon (1995),"Umbrellas of Cherbourg, The (Parapluies de Che...",Clockers (1995),Suture (1993),Naked (1993),Strange Days (1995),Strange Days (1995),Wallace & Gromit: A Close Shave (1995),"Secret Adventures of Tom Thumb, The (1993)",Desperado (1995),Beauty and the Beast (1991),Heidi Fleiss: Hollywood Madam (1995),Heavy Metal (1981),From Dusk Till Dawn (1996)
3,Star Wars: Episode IV - A New Hope (1977),"City of Lost Children, The (CitÃ© des enfants ...",Reckless (1995),Willy Wonka & the Chocolate Factory (1971),Kicking and Screaming (1995),Hot Shots! Part Deux (1993),"House of the Spirits, The (1993)","City of Lost Children, The (CitÃ© des enfants ...",2 Days in the Valley (1996),Exotica (1994),Romeo Is Bleeding (1993),Romeo Is Bleeding (1993),"Sword in the Stone, The (1963)",Beyond Rangoon (1995),Wild Bill (1995),"Dark Knight, The (2008)","Endless Summer 2, The (1994)",Snow White and the Seven Dwarfs (1937),"Addiction, The (1995)"
4,Bad Company (1995),"Visitors, The (Visiteurs, Les) (1993)",Snow White and the Seven Dwarfs (1937),"Sword in the Stone, The (1963)",Blue in the Face (1995),Stalingrad (1993),Germinal (1993),Chungking Express (Chung Hing sam lam) (1994),Lost Highway (1997),Amateur (1994),Body Snatchers (1993),Little Odessa (1994),James and the Giant Peach (1996),2001: A Space Odyssey (1968),Bad Girls (1994),More (1998),"Celluloid Closet, The (1995)",Wallace & Gromit: A Close Shave (1995),Wes Craven's New Nightmare (Nightmare on Elm S...
5,Dead Presidents (1995),Star Wars: Episode IV - A New Hope (1977),Willy Wonka & the Chocolate Factory (1971),James and the Giant Peach (1996),Swimming with Sharks (1995),Land and Freedom (Tierra y libertad) (1995),Total Eclipse (1995),Devil in a Blue Dress (1995),Chinatown (1974),Nadja (1994),What Happened Was... (1994),Purple Noon (Plein soleil) (1960),Mary Poppins (1964),Heavy Metal (1981),Lone Star (1996),V for Vendetta (2006),Carmen Miranda: Bananas Is My Business (1994),Lesson Faust (1994),Heavy Metal (1981)
6,From Dusk Till Dawn (1996),Ghost in the Shell (KÃ´kaku kidÃ´tai) (1995),Dragonheart (1996),Mary Poppins (1964),Living in Oblivion (1995),"Tin Drum, The (Blechtrommel, Die) (1979)",Go Fish (1994),Kaspar Hauser (1993),Down by Law (1986),Killing Zoe (1994),Blade Runner (1982),Kiss of Death (1995),"NeverEnding Story III, The (1994)",Hard Target (1993),Wyatt Earp (1994),Across the Sea of Time (1995),The Show (1995),"Sword in the Stone, The (1963)",Serial Mom (1994)
7,Screamers (1995),Mystery Science Theater 3000: The Movie (1996),"Prophecy, The (1995)","Sound of Music, The (1965)",Trainspotting (1996),Forrest Gump (1994),I Like It Like That (1994),Wes Craven's New Nightmare (Nightmare on Elm S...,M (1931),Strange Days (1995),Kiss of Death (1995),Clockers (1995),"Aristocats, The (1970)",Dragonheart (1996),Maverick (1994),Inception (2010),Nico Icon (1995),James and the Giant Peach (1996),"Lawnmower Man, The (1992)"
8,Heavy Metal (1981),Screamers (1995),Lesson Faust (1994),Kansas City (1996),Unstrung Heroes (1995),Before the Rain (Pred dozhdot) (1994),Chungking Express (Chung Hing sam lam) (1994),Boxing Helena (1993),Miller's Crossing (1990),Faces (1968),Purple Noon (Plein soleil) (1960),Casino (1995),Fluke (1995),Bottle Rocket (1996),Tombstone (1993),Everest (1998),Hoop Dreams (1994),Akira (1988),Village of the Damned (1995)
9,Hot Shots! Part Deux (1993),"Clockwork Orange, A (1971)",Highlander III: The Sorcerer (a.k.a. Highlande...,Winnie the Pooh and the Blustery Day (1968),Underground (1995),All Things Fair (Lust och fÃ¤gring stor) (1995),Even Cowgirls Get the Blues (1993),"Prophecy, The (1995)","Grifters, The (1990)",Beauty of the Day (Belle de jour) (1967),Kalifornia (1993),Trainspotting (1996),E.T. the Extra-Terrestrial (1982),Barbarella (1968),"Good, the Bad and the Ugly, The (Buono, il bru...",Wings of Courage (1995),"Haunted World of Edward D. Wood Jr., The (1996)","Aristocats, The (1970)",Mary Reilly (1996)


## Real-time Events<a class="anchor" id="real-time"></a>
[Back to top](#top)

The next topic is real-time events. Personalize has the ability to listen to events from your application in order to update the recommendations shown to the user. This is especially useful in media workloads, like video-on-demand, where a customer's intent may differ based on if they are watching with their children or on their own.

Additionally the events that are recorded via this system are stored until a delete call from you is issued, and they are used as historical data alongside the other interaction data you provided when you train your next models.

Start by creating an event tracker that is attached to the dataset group. This event tracker will add information to the dataset and will influence the recommendations.

In [39]:
response = personalize.create_event_tracker(
    name='MovieTracker',
    datasetGroupArn=dataset_group_arn
)
print(response['eventTrackerArn'])
print(response['trackingId'])
trackingId = response['trackingId']
event_tracker_arn = response['eventTrackerArn']

arn:aws:personalize:us-east-1:061703360474:event-tracker/f216494c
ed187352-b361-4cc0-8fe0-3a8fb5e369a7


We will create some code that simulates a user interacting with a particular item. After running this code, you will get recommendations that differ from the results above.

We start by creating some methods for the simulation of real time events.

In [40]:
sessionDict = {}

def send_movie_click(userId, itemId, eventType):
    """
    Simulates a click as an envent
    to send an event to Amazon Personalize's Event Tracker
    """
    # Configure Session
    try:
        sessionId = sessionDict[str(userId)]
    except:
        sessionDict[str(userId)] = str(uuid.uuid1())
        sessionId = sessionDict[str(userId)]
        
    # Configure Properties:
    event = {
    "itemId": str(itemId),
    }
    event_json = json.dumps(event)
        
    # Make Call
    
    personalize_events.put_events(
    trackingId = trackingId,
    userId= str(userId),
    sessionId = sessionId,
    eventList = [{
        'sentAt': int(time.time()),
        'eventType': str(eventType),
        'properties': event_json
        }]
    )

def get_new_recommendations_df_users_real_time(recommendations_df, userId, itemId, eventType):
    # Get the artist name (header of column)
    movieName = get_movie_by_id(itemId)
    
    # Interact with different movies
    print('sending event ' + eventType + ' for ' + get_movie_by_id(itemId))
    send_movie_click(userId=userId, itemId=itemId,eventType=eventType)
    # Get the recommendations (note you should have a base recommendation DF created before)
    get_recommendations_response = personalize_runtime.get_recommendations(
        recommenderArn = recommender_top_picks_arn,
        userId = str(userId),
    )
    # Build a new dataframe of recommendations
    itemList = get_recommendations_response['itemList']
    recommendation_list = []
    for item in itemList:
        artist = get_movie_by_id(item['itemId'])
        recommendation_list.append(artist)
    new_rec_df = pd.DataFrame(recommendation_list, columns = [movieName])
    # Add this dataframe to the old one
    #recommendations_df = recommendations_df.join(new_rec_DF)
    recommendations_df = pd.concat([recommendations_df, new_rec_df], axis=1)
    return recommendations_df

At this point, we haven't generated any real-time events yet; we have only set up the code. To compare the recommendations before and after the real-time events, let's pick one user and generate the original recommendations for them.

In [41]:
# Get recommendations for the user
get_recommendations_response = personalize_runtime.get_recommendations(
        recommenderArn = recommender_top_picks_arn,
        userId = str(rerank_user),
    )

# Build a new dataframe for the recommendations
itemList = get_recommendations_response['itemList']
recommendationList = []
for item in item_list:
    artist = get_movie_by_id(item['itemId'])
    recommendationList.append(artist)
user_recommendations_df = pd.DataFrame(recommendationList, columns = [rerank_user])
user_recommendations_df

Unnamed: 0,109510
0,"Secret Agent, The (1996)"
1,Les Loups entre eux (1985)
2,"Sometimes Happiness, Sometimes Sorrow (Kabhi K..."
3,Talk About a Stranger (1952)
4,Once Upon a Honeymoon (1942)
5,"Man There Was, A (Terje Vigen) (1917)"
6,The General Line (1929)
7,The Seashell and the Clergyman (1928)
8,Evil Thoughts (1976)
9,Enron: The Smartest Guys in the Room (2005)


Ok, so now we have a list of recommendations for this user before we have applied any real-time events. Now let's pick 3 random artists which we will simulate our user interacting with, and then see how this changes the recommendations.

In [42]:
# Next generate 3 random movies
movies = items_df.sample(3).index.tolist()

In [43]:
# Note this will take about 15 seconds to complete due to the sleeps
for movie in movies:
    user_recommendations_df = get_new_recommendations_df_users_real_time(user_recommendations_df, rerank_user, movie,'click')
    time.sleep(5)
    

sending event click for Black Room, The (1935)
sending event click for Last Emperor, The (1987)
sending event click for Zombieland (2009)


Now we can look at how the click events changed the recommendations.

In [44]:
user_recommendations_df

Unnamed: 0,109510,"Black Room, The (1935)","Last Emperor, The (1987)",Zombieland (2009)
0,"Secret Agent, The (1996)",Love & Human Remains (1993),Love & Human Remains (1993),"Last Emperor, The (1987)"
1,Les Loups entre eux (1985),In the Realm of the Senses (Ai no corrida) (1976),In the Realm of the Senses (Ai no corrida) (1976),Bonnie and Clyde (1967)
2,"Sometimes Happiness, Sometimes Sorrow (Kabhi K...",Naked (1993),Naked (1993),Dial M for Murder (1954)
3,Talk About a Stranger (1952),Exotica (1994),Exotica (1994),Raging Bull (1980)
4,Once Upon a Honeymoon (1942),Nadja (1994),Nadja (1994),Blade Runner (1982)
5,"Man There Was, A (Terje Vigen) (1917)",Killing Zoe (1994),Killing Zoe (1994),Rebel Without a Cause (1955)
6,The General Line (1929),Amateur (1994),Amateur (1994),Mary Shelley's Frankenstein (Frankenstein) (1994)
7,The Seashell and the Clergyman (1928),Strange Days (1995),Strange Days (1995),Strange Days (1995)
8,Evil Thoughts (1976),Faces (1968),Faces (1968),"Fish Called Wanda, A (1988)"
9,Enron: The Smartest Guys in the Room (2005),Romeo Is Bleeding (1993),Romeo Is Bleeding (1993),Menace II Society (1993)


In the cell above, the first column after the index is the user's default recommendations from the "Top pics for you" recommender, and each column after that has as a header of the movie that they interacted with via a real time event, and the recommendations after this event occurred. 

The behavior may not shift very much or a lot; this is due to the relatively limited nature of this dataset and effect of a few random clicks. If you wanted to better understand this, try simulating clicking more movies to see the impact.

Now lets look at the event filters, which allow you to filter items based on the interaction data. For this dataset, it could be click or watch based on the data we imported, but could be based on whatever interaction schema you design (click, rate, like, watch, purchase etc.) 

We will create a new helper function to use the personalized ranking campaign, sice the Recommenders already filter out watched content.

In [45]:
def get_new_ranked_recommendations_df_by_static_filter(recommendations_df, user_id, rerank_item_list, filter_arn):
    
    # Get the recommendations
    get_recommendations_response = personalize_runtime.get_personalized_ranking(
        campaignArn = rerank_campaign_arn,
        userId = str(user_id),
        inputList = rerank_item_list,
        filterArn = filter_arn
    )
    # Build a new dataframe of recommendations
    item_list = get_recommendations_response['personalizedRanking']
    recommendation_list = []
    for item in item_list:
        movie = get_movie_by_id(item['itemId'])
        recommendation_list.append(movie)

    filter_name = filter_arn.split('/')[1]
    new_rec_df = pd.DataFrame(recommendation_list, columns = [filter_name])
    # Add this dataframe to the old one
    recommendations_df = pd.concat([recommendations_df, new_rec_df], axis=1)
    return recommendations_df

In [46]:
recommendations_df_events = pd.DataFrame()
for filter_arn in interaction_filter_arns:
    recommendations_df_events = get_new_ranked_recommendations_df_by_static_filter(recommendations_df_events, rerank_user, rerank_item_list, filter_arn)
    
recommendations_df_events

Unnamed: 0,watched,unwatched
0,,"Secret Agent, The (1996)"
1,,Les Loups entre eux (1985)
2,,Stonewall Uprising (2010)
3,,Once Upon a Honeymoon (1942)
4,,8 Dates (2008)
5,,Talk About a Stranger (1952)
6,,Never Forever (2007)
7,,The Seashell and the Clergyman (1928)
8,,"Sometimes Happiness, Sometimes Sorrow (Kabhi K..."
9,,Enron: The Smartest Guys in the Room (2005)


Now lets send a watch event in for the top 4 unwatched recommendations, which would simulate watching 4 movies. In a VOD application, you may choose to send in an event after they have watched a significant amount (over 75%) of a piece of content. Sending at 100% complete could miss people that stop short of the credits.

In [47]:
ranked_unwatched_recommendations_response = personalize_runtime.get_personalized_ranking(
    campaignArn = rerank_campaign_arn,
    userId = str(rerank_user),
    inputList = rerank_item_list,
    filterArn = filter_arn)

item_list = ranked_unwatched_recommendations_response['personalizedRanking'][:4]

for item in item_list:
    print('sending event watch for ' + get_movie_by_id(item['itemId']))
    send_movie_click(userId=rerank_user, itemId=item['itemId'], eventType='Watch')
    time.sleep(10)

sending event watch for Secret Agent, The (1996)
sending event watch for Les Loups entre eux (1985)
sending event watch for Stonewall Uprising (2010)
sending event watch for Once Upon a Honeymoon (1942)


Now we can look at the event filters to see the updated watched and unwatched recommendations 

In [48]:
recommendations_df_events = pd.DataFrame()
for filter_arn in interaction_filter_arns:
    recommendations_df_events = get_new_ranked_recommendations_df_by_static_filter(recommendations_df_events, rerank_user, rerank_item_list, filter_arn)
recommendations_df_events

Unnamed: 0,watched,unwatched
0,Once Upon a Honeymoon (1942),"Sometimes Happiness, Sometimes Sorrow (Kabhi K..."
1,"Secret Agent, The (1996)",The Seashell and the Clergyman (1928)
2,Stonewall Uprising (2010),Talk About a Stranger (1952)
3,Les Loups entre eux (1985),Never Forever (2007)
4,,"Man There Was, A (Terje Vigen) (1917)"
5,,The General Line (1929)
6,,8 Dates (2008)
7,,Enron: The Smartest Guys in the Room (2005)
8,,The Outriders (1950)
9,,Evil Thoughts (1976)


## Batch Recommendations <a class="anchor" id="batch"></a>
[Back to top](#top)

There are many cases where you may want to have a larger dataset of exported recommendations. Amazon Personalize launched batch recommendations as a way to export a collection of recommendations to S3. In this example, we will walk through how to do this for the Personalized Ranking solution. For more information about batch recommendations, please see the [documentation](https://docs.aws.amazon.com/personalize/latest/dg/recommendations-batch.html). This feature applies to all recipes, but the output format will vary.

A simple implementation looks like this:

```python
import boto3

personalize_rec = boto3.client(service_name='personalize')

personalize_rec.create_batch_inference_job (
    solutionVersionArn = "Solution version ARN",
    jobName = "Batch job name",
    roleArn = "IAM role ARN",
    jobInput = 
       {"s3DataSource": {"path": <S3 input path>}},
    jobOutput = 
       {"s3DataDestination": {"path": <S3 output path>}}
)
```

The SDK import, the solution version arn, and role arns have all been determined. This just leaves an input, an output, and a job name to be defined.

Starting with the input for Personalized Ranking, it looks like:


```JSON
{"userId": "891", "itemList": ["27", "886", "101"]}
{"userId": "445", "itemList": ["527", "55", "901"]}
{"userId": "71", "itemList": ["27", "351", "101"]}
```

This should yield an output that looks like this:

```JSON
{"input":{"userId":"891","itemList":["27","886","101"]},"output":{"recommendedItems":["27","101","886"],"scores":[0.48421,0.28133,0.23446]}}
{"input":{"userId":"445","itemList":["527","55","901"]},"output":{"recommendedItems":["901","527","55"],"scores":[0.46972,0.31011,0.22017]}}
{"input":{"userId":"71","itemList":["29","351","199"]},"output":{"recommendedItems":["351","29","199"],"scores":[0.68937,0.24829,0.06232]}}

```

The output is a JSON Lines file. It consists of individual JSON objects, one per line. So we will need to put in more work later to digest the results in this format.

### Building the input file

When you are using the batch feature, you specify the users that you'd like to receive recommendations for when the job has completed. The cell below will again select a few random users and will then build the file and save it to disk. From there, you will upload it to S3 to use in the API call later.

In [49]:
# We will use the same users from before
print (users)
# Write the file to disk
json_input_filename = "json_input.json"
with open(data_dir + "/" + json_input_filename, 'w') as json_input:
    for user_id in users:
        json_input.write('{"userId": "' + str(user_id) + '", "itemList":'+json.dumps(rerank_item_list)+'}\n')

[28783, 84713, 109510]


In [50]:
# Showcase the input file:
!cat $data_dir"/"$json_input_filename

{"userId": "28783", "itemList":["107563", "202599", "78629", "124703", "86985", "96522", "136568", "139693", "146090", "153740", "101319", "31340", "200934", "27362", "133616", "165441", "165727", "160976", "123077", "178839", "74127", "154009", "75335", "1040", "175395"]}
{"userId": "84713", "itemList":["107563", "202599", "78629", "124703", "86985", "96522", "136568", "139693", "146090", "153740", "101319", "31340", "200934", "27362", "133616", "165441", "165727", "160976", "123077", "178839", "74127", "154009", "75335", "1040", "175395"]}
{"userId": "109510", "itemList":["107563", "202599", "78629", "124703", "86985", "96522", "136568", "139693", "146090", "153740", "101319", "31340", "200934", "27362", "133616", "165441", "165727", "160976", "123077", "178839", "74127", "154009", "75335", "1040", "175395"]}


Upload the file to S3 and save the path as a variable for later.

In [51]:
# Upload files to S3
boto3.Session().resource('s3').Bucket(bucket_name).Object(json_input_filename).upload_file(data_dir+"/"+json_input_filename)
s3_input_path = "s3://" + bucket_name + "/" + json_input_filename
print(s3_input_path)

s3://061703360474-us-east-1-personalizepocvod/json_input.json


Batch recommendations read the input from the file we've uploaded to S3. Similarly, batch recommendations will save the output to file in S3. So we define the output path where the results should be saved.

In [52]:
# Define the output path
s3_output_path = "s3://" + bucket_name + "/"
print(s3_output_path)

s3://061703360474-us-east-1-personalizepocvod/


Now just make the call to kick off the batch export process.

In [53]:
batchInferenceJobArn = personalize.create_batch_inference_job (
    solutionVersionArn = rerank_solution_version_arn,
    jobName = "VOD-POC-Batch-Inference-Job-PersonalizedRanking_" + str(round(time.time()*1000)),
    roleArn = role_arn,
    jobInput = 
     {"s3DataSource": {"path": s3_input_path}},
    jobOutput = 
     {"s3DataDestination":{"path": s3_output_path}}
)
batchInferenceJobArn = batchInferenceJobArn['batchInferenceJobArn']

Run the while loop below to track the status of the batch recommendation call. This can take around 30 minutes to complete, because Personalize needs to stand up the infrastructure to perform the task. We are testing the feature with a dataset of only 3 users, which is not an efficient use of this mechanism. Normally, you would only use this feature for bulk processing, in which case the efficiencies will become clear.

In [54]:
current_time = datetime.now()
print("Import Started on: ", current_time.strftime("%I:%M:%S %p"))

max_time = time.time() + 6*60*60 # 6 hours
while time.time() < max_time:
    describe_dataset_inference_job_response = personalize.describe_batch_inference_job(
        batchInferenceJobArn = batchInferenceJobArn
    )
    status = describe_dataset_inference_job_response["batchInferenceJob"]['status']
    print("DatasetInferenceJob: {}".format(status))
    
    if status == "ACTIVE" or status == "CREATE FAILED":
        break
        
    time.sleep(60)
    
current_time = datetime.now()
print("Import Completed on: ", current_time.strftime("%I:%M:%S %p"))

Import Started on:  11:38:30 AM
DatasetInferenceJob: CREATE PENDING
DatasetInferenceJob: CREATE IN_PROGRESS
DatasetInferenceJob: CREATE IN_PROGRESS
DatasetInferenceJob: CREATE IN_PROGRESS
DatasetInferenceJob: CREATE IN_PROGRESS
DatasetInferenceJob: CREATE IN_PROGRESS
DatasetInferenceJob: CREATE IN_PROGRESS
DatasetInferenceJob: CREATE IN_PROGRESS
DatasetInferenceJob: CREATE IN_PROGRESS
DatasetInferenceJob: CREATE IN_PROGRESS
DatasetInferenceJob: CREATE IN_PROGRESS
DatasetInferenceJob: CREATE IN_PROGRESS
DatasetInferenceJob: CREATE IN_PROGRESS
DatasetInferenceJob: ACTIVE
Import Completed on:  11:51:31 AM


In [52]:
s3 = boto3.client('s3')
export_name = json_input_filename + ".out"
s3.download_file(bucket_name, export_name, data_dir+"/"+export_name)

# Update DF rendering
pd.set_option('display.max_rows', 30)
with open(data_dir+"/"+export_name) as json_file:
    # Get the first line and parse it
    line = json.loads(json_file.readline())
    # Do the same for the other lines
    while line:
        # extract the user ID 
        col_header = "User: " + line['input']['userId']
        # Create a list for all the artists
        recommendation_list = []
        # Add all the entries
        for item in line['output']['recommendedItems']:
            movie = get_movie_by_id(item)
            recommendation_list.append(movie)
        if 'bulk_recommendations_df' in locals():
            new_rec_DF = pd.DataFrame(recommendation_list, columns = [col_header])
            try:
                bulk_recommendations_df = bulk_recommendations_df.join(new_rec_DF)
            except:
                bulk_recommendations_df = bulk_recommendations_df
        else:
            bulk_recommendations_df = pd.DataFrame(recommendation_list, columns=[col_header])
        try:
            line = json.loads(json_file.readline())
        except:
            line = None
bulk_recommendations_df

Unnamed: 0,User: 526,User: 291,User: 426
0,Virunga (2014),"Golden Compass, The (2007)",My Boss's Daughter (2003)
1,PelÃ©: Birth of a Legend (2016),Casanova (2005),Scary Movie 2 (2001)
2,Magnolia (1999),PelÃ©: Birth of a Legend (2016),How the Grinch Stole Christmas! (1966)
3,American Drug War: The Last White Hope (2007),Moon (2009),Casanova (2005)
4,"Golden Compass, The (2007)",Magnolia (1999),"Golden Compass, The (2007)"
5,Trekkies (1997),Battle Royale 2: Requiem (Batoru rowaiaru II: ...,Moon (2009)
6,Moon (2009),Valiant (2005),Funny Farm (1988)
7,"Mark of Zorro, The (1940)",Scary Movie 2 (2001),Magnolia (1999)
8,Battle Royale 2: Requiem (Batoru rowaiaru II: ...,Virunga (2014),Valiant (2005)
9,Casanova (2005),My Boss's Daughter (2003),"Big Tease, The (1999)"


#### Batch Inference for Segmentation

First let's create the input file.  This is a file that lists each item id 

Reference - https://github.com/aws-samples/amazon-personalize-samples/blob/master/next_steps/core_use_cases/user_segmentation/user_segmentation_example.ipynb

In [66]:
interactions_df = pd.read_csv((data_dir+"/interactions.csv"))

In [67]:
interactions_df.head()

Unnamed: 0,USER_ID,ITEM_ID,TIMESTAMP,EVENT_TYPE
0,2262,1079,789652009,Click
1,2262,47,789652009,Click
2,2262,21,789652009,Click
3,2262,47,789652009,Watch
4,102689,47,822873600,Click


In [77]:
item_ids = interactions_df[(interactions_df['USER_ID']==426)&(interactions_df['EVENT_TYPE']=='Watch')]['ITEM_ID'].values

In [79]:
# Write the file to disk
json_input_filename = "json_input_items.json"
with open(data_dir + "/" + json_input_filename, 'w') as json_input:
    for item_id in item_ids:
        json_input.write('{"itemId": "'+str(item_id)+'"}\n')

In [82]:
# Upload files to S3
boto3.Session().resource('s3').Bucket(bucket_name).Object(json_input_filename).upload_file(data_dir+"/"+json_input_filename)
s3_input_path = "s3://" + bucket_name + "/" + json_input_filename
s3_output_path = "s3://"+bucket_name+"/output/"
print(s3_input_path)

s3://061703360474-us-east-1-personalizepocvod/json_input_items.json


In [83]:
# change the jobName if already exists
create_batch_segment_response = personalize.create_batch_segment_job(
    jobName = "notebook-query-demo-500",
    solutionVersionArn = itemaffinity_solution_version_arn,
    numResults = 500,
    jobInput =  {
        "s3DataSource": {
            "path": s3_input_path
        }
    },
    jobOutput = {
        "s3DataDestination": {
            "path": s3_output_path
        }
    },
    roleArn = role_arn # defined in Step 1.4.2
    )

batch_segment_job_arn = create_batch_segment_response['batchSegmentJobArn']
print(batch_segment_job_arn)



arn:aws:personalize:us-east-1:061703360474:batch-segment-job/notebook-query-demo-500


In [84]:
def wait_for_batch_segment_job(batch_segment_job_arn):
    max_time = time.time() + 3 * 60 * 60
    while time.time() < max_time:
        describe_job_response = personalize.describe_batch_segment_job(
            batchSegmentJobArn = batch_segment_job_arn
        )
        status = describe_job_response["batchSegmentJob"]["status"]
        print("Batch Segment Job: {}".format(status))

        start = describe_job_response["batchSegmentJob"]["creationDateTime"]
        end = describe_job_response["batchSegmentJob"]["lastUpdatedDateTime"]
        if status == "ACTIVE":
            print("Time took: {}".format(end - start))
            break
        if status == "CREATE FAILED":
            print("Time took: {}".format(end - start))
            print("Job Failed: {}".format(describe_job_response["batchSegmentJob"]["failureReason"]))
            break

        time.sleep(180)

In [85]:
wait_for_batch_segment_job(batch_segment_job_arn)

Batch Segment Job: CREATE IN_PROGRESS
Batch Segment Job: CREATE IN_PROGRESS
Batch Segment Job: CREATE IN_PROGRESS
Batch Segment Job: CREATE IN_PROGRESS
Batch Segment Job: CREATE IN_PROGRESS
Batch Segment Job: CREATE IN_PROGRESS
Batch Segment Job: CREATE IN_PROGRESS
Batch Segment Job: CREATE IN_PROGRESS
Batch Segment Job: CREATE IN_PROGRESS
Batch Segment Job: ACTIVE
Time took: 0:26:02.535000


In [88]:
output_file_name_s3 = s3_output_path+json_input_filename+".out"
output=pd.read_json(output_file_name_s3, lines = True)
prediction=output.apply(lambda x:pd.Series({'ITEM_ID':x['input']['itemId'],'USER_ID':x['output']['usersList']}),axis=1).set_index('ITEM_ID')['USER_ID']
output.sample(3)

Unnamed: 0,input,output,error
16,{'itemId': '3360'},"{'usersList': ['68211', '130751', '139043', '3...",
92,{'itemId': '2313'},"{'usersList': ['68241', '7897', '38099', '2288...",
24,{'itemId': '2502'},"{'usersList': ['24624', '65407', '117954', '71...",


## Wrap up <a class="anchor" id="wrapup"></a>
[Back to top](#top)

With that you now have a fully working collection of models to tackle various recommendation and personalization scenarios, as well as the skills to manipulate customer data to better integrate with the service, and a knowledge of how to do all this over APIs and by leveraging open source data science tools.

Use these notebooks as a guide to getting started with your customers for POCs. As you find missing components, or discover new approaches, make a pull request and provide any additional helpful components that may be missing from this collection.

You can choose to head to `04_Operations_Layer.ipynb` to go deeper into ML Ops and what a production solution can look like with an automation pipeline.

You'll want to make sure that you clean up all of the resources deployed during this POC. We have provided a separate notebook which shows you how to identify and delete the resources in `05_Clean_Up.ipynb`.

In [None]:
%store event_tracker_arn
%store batchInferenceJobArn