# Using Personalize campaigns on synthetic cars data
This notebook takes advantage of campaigns that have been built in the other notebooks.
There are specific sections for each model:

1. [Personalized Ranking](#Exercise-the-personalized-ranking-campaign)
2. [HRNN](#Exercise-the-hrnn-campaign)
3. [SIMS](#Exercise-the-SIMS-campaign)
4. [HRNN-Metadata](#Exercise-the-hrnn-metadata-campaign)
5. [Popularity Count](#Exercise-the-popularity-campaign)

In addition, we have a section for experimenting with 
[Personalize Event Tracker](#Use-real-time-events).

## Imports, overall settings, initialization

In [41]:
import json
import boto3
import time
import datetime
import pandas as pd
from sklearn.utils import shuffle

region             = '<your-region>'
account_num        = '<your-account>'
dataset_group_name = 'car-dg12'

dg_arn = 'arn:aws:personalize:{}:{}:dataset-group/{}'.format(region, 
                                                             account_num, 
                                                             dataset_group_name)

cars_filename         = 'car_items.csv'
users_filename        = 'users.csv'
interactions_filename = 'interactions.csv'
int_exp_filename      = 'interactions_expanded.csv'

ranking_arn           = 'arn:aws:personalize:{}:{}:campaign/car-personalized-ranking'.format(region, account_num)
sims_arn              = 'arn:aws:personalize:{}:{}:campaign/car-sims'.format(region, account_num)
hrnn_arn              = 'arn:aws:personalize:{}:{}:campaign/car-hrnn'.format(region, account_num)
hrnn_metadata_arn     = 'arn:aws:personalize:{}:{}:campaign/car-hrnn-metadata'.format(region, account_num)
pop_arn               = 'arn:aws:personalize:{}:{}:campaign/car-popularity-count'.format(region, account_num)

In [42]:
personalize           = boto3.client('personalize')
personalize_runtime   = boto3.client('personalize-runtime')
personalize_events    = boto3.client('personalize-events')

In [43]:
def show_item_interaction_history(int_df, item_id):
    _tmp_df = int_df[int_df.ITEM_ID == item_id].sort_values('TIMESTAMP')
    print(_tmp_df.shape)
    return _tmp_df[['USER_ID','ITEM_ID','WHEN',
                    'FAV','YEAR','GENDER','SALARY']]

In [44]:
def show_user_interaction_history(int_df, user_id):
    _tmp_df = int_df[int_df.USER_ID == int(user_id)].sort_values('TIMESTAMP')
    print(_tmp_df.shape)
    return _tmp_df[['USER_ID','ITEM_ID','WHEN',
                    'FAV','YEAR','PRICE','MILEAGE']]

In [45]:
def date_to_string(ts):
    return datetime.datetime.fromtimestamp(ts).strftime('%Y-%m-%d %H:%M:%S')

In [46]:
def is_campaign_active(c):
    _is_active = False
    
    try:
        _resp = personalize.describe_campaign(campaignArn = c)
        _campaign_status = _resp['campaign']['status']
        if _campaign_status == 'ACTIVE':
            _is_active = True
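    except Exception as e:
        # Any failure to describe the campaign (for example, it does not exist)
        # is treated as "not active".
        pass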
        
    return _is_active

In [47]:
int_expanded_df = pd.read_csv(int_exp_filename)

int_expanded_df['WHEN'] = int_expanded_df['TIMESTAMP'].apply(date_to_string)

NUM_CLUSTERS = len(int_expanded_df.FAV_CLUSTER.value_counts())
print('{} clusters'.format(NUM_CLUSTERS))

20 clusters


In [48]:
items_to_rank = int_expanded_df.sample(10)
items_to_rank.head(3)

Unnamed: 0,USER_ID,ITEM_ID,TIMESTAMP,SESSION_ID,MAKE,MODEL,YEAR,MILEAGE,PRICE,AGE,GENDER,LOCATION,SALARY,FAV_CLUSTER,FAV_MODEL,FAV,WHEN
560481,8204,23388,1563133894,67300,Toyota,Sienna,2015,63627,33566,47,MALE,22193,67190,8,4,NEWISH-Toyota-Sienna,2019-07-14 19:51:34
703608,26604,16998,1563136912,62960,Nissan,Altima,2014,79859,31228,40,MALE,89110,43073,12,6,NEWISH-Nissan-Altima,2019-07-14 20:41:52
439237,13529,15976,1563137417,22321,Nissan,Altima,2014,83670,31513,41,MALE,77479,96118,12,6,NEWISH-Nissan-Altima,2019-07-14 20:50:17


In [49]:
def print_item(item_id):
    tmp = int_expanded_df[int_expanded_df.ITEM_ID == item_id].iloc[0]
    print('Id: {}, Make: {}, Model: {}, Fav: {}, Year: {}, Age: {}'.format(item_id,
         tmp['MAKE'], tmp['MODEL'], tmp['FAV'], tmp['YEAR'], tmp['AGE']))

Skip ahead to try out various campaigns:

1. [Personalized Ranking](#Exercise-the-personalized-ranking-campaign)
2. [HRNN](#Exercise-the-hrnn-campaign)
3. [SIMS](#Exercise-the-SIMS-campaign)
4. [HRNN-Metadata](#Exercise-the-hrnn-metadata-campaign)
5. [Popularity Count](#Exercise-the-popularity-campaign)

In addition, we have a section for experimenting with 
[Personalize Event Tracker](#Use-real-time-events).

## Exercise the Personalized Ranking campaign
Here we want to see Personalize re-rank a set of search results. For our sample, we pass
a user who likes oldish cars and expect oldish cars to appear closer to the top. Likewise, we
pass a user who likes newish cars and expect the higher-ranked cars to be newish.

In [50]:
full_df = pd.DataFrame(columns=['USER_ID','FAV','FAV_CLUSTER'])

for i in range(NUM_CLUSTERS):
    tmp_df  = int_expanded_df[int_expanded_df.FAV_CLUSTER == i][['USER_ID','FAV','FAV_CLUSTER']].sample(1)
    full_df = pd.concat([full_df, tmp_df])
ranking_user_df = shuffle(full_df)
ranking_user_list = ranking_user_df['USER_ID'].values.astype(str).tolist()
ranking_user_df.head(NUM_CLUSTERS)

Unnamed: 0,USER_ID,FAV,FAV_CLUSTER
466009,18933,OLDISH-Ford-Fusion,11
199449,8181,NEWISH-Nissan-Leaf,14
671404,21675,OLDISH-Nissan-Altima,13
679564,15078,NEWISH-Ford-Mustang,18
568313,12686,NEWISH-Toyota-Prius,16
75343,17569,NEWISH-Toyota-Sienna,8
168027,16640,NEWISH-Ford-Fusion,10
442762,6356,NEWISH-Nissan-Altima,12
133409,16205,OLDISH-Toyota-Sienna,9
717693,19608,NEWISH-Ford-Explorer,2


In [51]:
full_df = pd.DataFrame(columns=['ITEM_ID','FAV'])

for i in range(NUM_CLUSTERS):
    tmp_df  = int_expanded_df[int_expanded_df.FAV_CLUSTER == i][['ITEM_ID','FAV']].sample(1)
    full_df = pd.concat([full_df, tmp_df])
ranking_item_df = shuffle(full_df)
ranking_item_list = ranking_item_df['ITEM_ID'].values.astype(str).tolist()
ranking_item_df.head(25)

Unnamed: 0,ITEM_ID,FAV
733924,23788,OLDISH-Toyota-Camry
208095,27238,OLDISH-Toyota-Rav4
44153,21581,NEWISH-Ford-Explorer
83292,19203,NEWISH-Toyota-Sienna
516877,23505,NEWISH-Toyota-Prius
232393,27903,OLDISH-Nissan-Rogue
421726,18194,NEWISH-Nissan-Altima
29485,31478,NEWISH-Ford-Fusion
716234,29513,OLDISH-Ford-Mustang
434412,25736,NEWISH-Nissan-Rogue


In [52]:
def print_ranking_target_df(user_id, input_df, target_cluster):
    print('\nRanking for user: {}'.format(user_id))
    
    _input_list = input_df['ITEM_ID'].values.astype(str).tolist()
    
    personalized_ranking_response = personalize_runtime.get_personalized_ranking(
        campaignArn = ranking_arn, userId = str(user_id), inputList = _input_list)
    
    i = 0
    _rank = len(_input_list)
    for item in personalized_ranking_response['personalizedRanking']:
        item_id = item['itemId']
        tmp = int_expanded_df[int_expanded_df.ITEM_ID == int(item_id)].iloc[0]
        _fav_cluster = tmp['FAV_CLUSTER']
        # Record only the first (best) position at which the target cluster appears
        if target_cluster == _fav_cluster and _rank == len(_input_list):
            _rank = i
        print('Id: {}, Make: {}, Model: {}, Year: {}, Price: {}, Fav: {}'.format(item_id,
         tmp['MAKE'], tmp['MODEL'], tmp['YEAR'], tmp['PRICE'], tmp['FAV']))
        i += 1
    return _rank

In [53]:
random_item_df   = shuffle(int_expanded_df[['ITEM_ID','FAV', 'FAV_CLUSTER']].sample(25))
random_item_list = random_item_df['ITEM_ID'].values.astype(str).tolist()
random_item_df.head(25)

Unnamed: 0,ITEM_ID,FAV,FAV_CLUSTER
482807,21409,OLDISH-Ford-Fusion,11
94114,25606,NEWISH-Nissan-Altima,12
385704,22626,OLDISH-Ford-Fusion,11
452947,20506,NEWISH-Toyota-Sienna,8
327788,21093,OLDISH-Toyota-Sienna,9
292301,28034,OLDISH-Toyota-Sienna,9
367100,30032,NEWISH-Ford-Fusion,10
422265,25672,NEWISH-Nissan-Altima,12
574942,30567,NEWISH-Nissan-Rogue,6
600068,18315,NEWISH-Toyota-Sienna,8


#### Try personalized ranking on a set of random items
Here we take some random items and see how well Personalize can re-rank them
for each of a set of users with a known bias to specific car clusters. Ideally,
we would find that a matching car would rise as close to the 0th rank as possible.
We use a curated list of users that cover each car cluster preference. 

Note that it is likely that some of the users will have a preference that is not
covered by the list of random items. In those cases, the best case is that Personalize
re-ranks the list simply based on popularity.
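
The rank metric described above can be illustrated offline, without calling Personalize. The following is a minimal sketch on made-up data: `first_match_rank` and `average_rank` are hypothetical helpers, not part of the notebook, and the cluster lists are invented. Unlike the cell below, this variant leaves out users whose preferred cluster is absent from the item set instead of counting them as rank 0; that is a design choice assumed here for illustration.

```python
def first_match_rank(ranked_clusters, target_cluster):
    """Return the 0-based position of the first item in the target cluster,
    or None when the cluster does not appear in the ranked list."""
    for position, cluster in enumerate(ranked_clusters):
        if cluster == target_cluster:
            return position
    return None


def average_rank(ranks):
    """Average the ranks that were found, ignoring users with no match."""
    found = [r for r in ranks if r is not None]
    return sum(found) / len(found) if found else None


# Hypothetical re-ranked lists: (cluster of each returned item, user's preferred cluster)
reranked = {
    'user_a': ([11, 11, 6, 8], 11),   # preferred cluster is at rank 0
    'user_b': ([6, 8, 12, 11], 12),   # preferred cluster is at rank 2
    'user_c': ([6, 8, 11, 11], 3),    # preferred cluster is not in the list
}

ranks = [first_match_rank(clusters, target) for clusters, target in reranked.values()]
print(ranks)                # [0, 2, None]
print(average_rank(ranks))  # 1.0
```

A lower average means the preferred cluster tends to surface nearer the top of the re-ranked list.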

In [54]:
if is_campaign_active(ranking_arn):
    rank_total = 0
    for i in range(NUM_CLUSTERS):
        user_fav = ranking_user_df.iloc[i]['FAV']
        user_fav_cluster = ranking_user_df.iloc[i]['FAV_CLUSTER']
        print('\nRanking for user that prefers: {}'.format(user_fav))
        rank = print_ranking_target_df(ranking_user_list[i], random_item_df, user_fav_cluster)
        if (random_item_df.shape[0] == rank):
            print('**desired cluster was not found in the item set')
            rank = 0 # reset to not penalize when no item was available
        else:
            print('**rank {}'.format(rank))
        rank_total += rank

    # Average over the users that were ranked (one per cluster), not over the items
    print('\nRank average: {:.2f}'.format(rank_total/NUM_CLUSTERS))
else:
    print('Personalized ranking campaign not active: {}'.format(ranking_arn))


Ranking for user that prefers: OLDISH-Ford-Fusion

Ranking for user: 18933
Id: 22626, Make: Ford, Model: Fusion, Year: 2012, Price: 19810, Fav: OLDISH-Ford-Fusion
Id: 29666, Make: Ford, Model: Fusion, Year: 2012, Price: 19293, Fav: OLDISH-Ford-Fusion
Id: 21409, Make: Ford, Model: Fusion, Year: 2011, Price: 21679, Fav: OLDISH-Ford-Fusion
Id: 37600, Make: Ford, Model: Fusion, Year: 2013, Price: 26797, Fav: OLDISH-Ford-Fusion
Id: 30567, Make: Nissan, Model: Rogue, Year: 2016, Price: 37873, Fav: NEWISH-Nissan-Rogue
Id: 30521, Make: Nissan, Model: Rogue, Year: 2012, Price: 22408, Fav: OLDISH-Nissan-Rogue
Id: 20506, Make: Toyota, Model: Sienna, Year: 2014, Price: 27000, Fav: NEWISH-Toyota-Sienna
Id: 19594, Make: Ford, Model: Fusion, Year: 2017, Price: 41639, Fav: NEWISH-Ford-Fusion
Id: 26415, Make: Toyota, Model: Sienna, Year: 2015, Price: 37346, Fav: NEWISH-Toyota-Sienna
Id: 25672, Make: Nissan, Model: Altima, Year: 2014, Price: 27550, Fav: NEWISH-Nissan-Altima
Id: 27148, Make: Nissan, Mod

#### Try personalized ranking on a curated set of items with each car cluster covered
Here we take a curated set of items, with one item for each car cluster. Personalize
should be able to re-rank in such a way that the specific item that would best match
the user rises to the 0th position.

In [55]:
if is_campaign_active(ranking_arn):
    rank_total = 0
    for i in range(NUM_CLUSTERS):
        user_fav = ranking_user_df.iloc[i]['FAV']
        user_fav_cluster = ranking_user_df.iloc[i]['FAV_CLUSTER']
        print('\nRanking for user that prefers: {}'.format(user_fav))
        rank = print_ranking_target_df(ranking_user_list[i], ranking_item_df, user_fav_cluster)
        if (ranking_item_df.shape[0] == rank):
            print('**desired cluster was not found in the item set')
            rank = 0 # reset to not penalize when no item was available
        else:
            print('**rank {}'.format(rank))
        rank_total += rank

    print('\nRank average: {:.2f}'.format(rank_total/len(ranking_item_list)))
else:
    print('Personalized ranking campaign not active: {}'.format(ranking_arn))


Ranking for user that prefers: OLDISH-Ford-Fusion

Ranking for user: 18933
Id: 23512, Make: Ford, Model: Fusion, Year: 2013, Price: 24448, Fav: OLDISH-Ford-Fusion
Id: 27238, Make: Toyota, Model: Rav4, Year: 2013, Price: 28215, Fav: OLDISH-Toyota-Rav4
Id: 25736, Make: Nissan, Model: Rogue, Year: 2017, Price: 38159, Fav: NEWISH-Nissan-Rogue
Id: 31478, Make: Ford, Model: Fusion, Year: 2016, Price: 39803, Fav: NEWISH-Ford-Fusion
Id: 27903, Make: Nissan, Model: Rogue, Year: 2012, Price: 26978, Fav: OLDISH-Nissan-Rogue
Id: 23788, Make: Toyota, Model: Camry, Year: 2012, Price: 24901, Fav: OLDISH-Toyota-Camry
Id: 19884, Make: Ford, Model: Explorer, Year: 2012, Price: 19775, Fav: OLDISH-Ford-Explorer
Id: 19203, Make: Toyota, Model: Sienna, Year: 2016, Price: 40728, Fav: NEWISH-Toyota-Sienna
Id: 29038, Make: Toyota, Model: Rav4, Year: 2015, Price: 28734, Fav: NEWISH-Toyota-Rav4
Id: 29513, Make: Ford, Model: Mustang, Year: 2010, Price: 22189, Fav: OLDISH-Ford-Mustang
Id: 23505, Make: Toyota, Mod

## Exercise the hrnn campaign
Here we try out the hrnn campaign. We ask Personalize for recommendations for a particular user. Our hope is that
it would detect that this user likes old or new cars and would return a list accordingly. 
Best case is that the recommended list of cars entirely matches the user's preferred car
cluster.
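
The "best case" above can be expressed as a match fraction: the share of recommended items that fall in the user's preferred cluster. Here is a small sketch, assuming the clusters of the recommended items have already been looked up; `match_fraction` is a hypothetical helper and the cluster values are invented. It also guards against an empty recommendation list, which would otherwise cause a division by zero.

```python
def match_fraction(recommended_clusters, preferred_cluster):
    """Return (fraction, matches, total) for one recommendation list."""
    total = len(recommended_clusters)
    matches = sum(1 for c in recommended_clusters if c == preferred_cluster)
    # An empty list yields 0.0 instead of raising ZeroDivisionError
    fraction = matches / total if total else 0.0
    return fraction, matches, total


# Hypothetical clusters for four recommended cars; the user prefers cluster 7
print('Matched {:.2f} ({}/{})'.format(*match_fraction([7, 7, 7, 6], 7)))  # Matched 0.75 (3/4)
print(match_fraction([], 7))  # (0.0, 0, 0)
```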

In [56]:
users_to_try = int_expanded_df.sample(5)
users_to_try[['USER_ID','FAV']].head(3)

Unnamed: 0,USER_ID,FAV
226896,11027,OLDISH-Nissan-Rogue
257842,16543,OLDISH-Nissan-Rogue
647315,10303,OLDISH-Nissan-Altima


In [57]:
show_user_interaction_history(int_expanded_df, users_to_try.iloc[0]['USER_ID'])

(50, 17)


Unnamed: 0,USER_ID,ITEM_ID,WHEN,FAV,YEAR,PRICE,MILEAGE
226900,11027,23058,2019-07-14 19:31:07,OLDISH-Nissan-Rogue,2013,28174,92399
226938,11027,32870,2019-07-14 19:33:07,OLDISH-Nissan-Rogue,2013,28129,91979
226921,11027,31840,2019-07-14 19:37:07,OLDISH-Nissan-Rogue,2010,16404,136320
226939,11027,18470,2019-07-14 19:37:35,OLDISH-Nissan-Rogue,2011,23450,122039
226899,11027,30354,2019-07-14 19:39:35,OLDISH-Nissan-Rogue,2013,24668,90112
226918,11027,29653,2019-07-14 19:43:07,OLDISH-Nissan-Rogue,2012,25890,109265
226925,11027,28201,2019-07-14 19:43:13,OLDISH-Nissan-Rogue,2013,29272,90132
226933,11027,32576,2019-07-14 19:43:35,OLDISH-Nissan-Rogue,2010,13138,136542
226934,11027,16233,2019-07-14 19:43:56,OLDISH-Nissan-Rogue,2011,24012,124874
226893,11027,23000,2019-07-14 19:45:13,OLDISH-Nissan-Rogue,2013,29680,91704


In [58]:
if is_campaign_active(hrnn_arn):
    for i in range(5):
        user_id     = str(users_to_try.iloc[i]['USER_ID'])
        fav         = users_to_try.iloc[i]['FAV']
        fav_cluster = users_to_try.iloc[i]['FAV_CLUSTER']

        print('Getting recommendations for user: {}, who likes: {}'.format(user_id, fav))
        response = personalize_runtime.get_recommendations(campaignArn=hrnn_arn, 
                                                           userId=user_id, 
                                                           numResults=10)
        items = response['itemList']

        match = 0
        actual_num_results = len(items)

        for item in items:
            _curr_item_id  = int(item['itemId'])
            _curr_cluster  = int_expanded_df[int_expanded_df.ITEM_ID == _curr_item_id].iloc[0]['FAV_CLUSTER']
            if fav_cluster == _curr_cluster:
                match += 1
            print_item(_curr_item_id)
        print('Matched {:.2f} ({}/{})'.format(match/actual_num_results, match, actual_num_results))
        print('')
else:
    print('HRNN campaign not active: {}'.format(hrnn_arn))

Getting recommendations for user: 11027, who likes: OLDISH-Nissan-Rogue
Id: 27952, Make: Nissan, Model: Rogue, Fav: OLDISH-Nissan-Rogue, Year: 2010, Age: 35
Id: 24274, Make: Nissan, Model: Rogue, Fav: OLDISH-Nissan-Rogue, Year: 2011, Age: 41
Id: 23126, Make: Nissan, Model: Rogue, Fav: OLDISH-Nissan-Rogue, Year: 2012, Age: 40
Id: 23447, Make: Nissan, Model: Rogue, Fav: OLDISH-Nissan-Rogue, Year: 2011, Age: 44
Id: 23342, Make: Nissan, Model: Rogue, Fav: OLDISH-Nissan-Rogue, Year: 2013, Age: 44
Id: 23468, Make: Nissan, Model: Rogue, Fav: OLDISH-Nissan-Rogue, Year: 2013, Age: 45
Id: 27630, Make: Nissan, Model: Rogue, Fav: OLDISH-Nissan-Rogue, Year: 2013, Age: 36
Id: 23988, Make: Nissan, Model: Rogue, Fav: OLDISH-Nissan-Rogue, Year: 2012, Age: 40
Id: 23604, Make: Nissan, Model: Rogue, Fav: OLDISH-Nissan-Rogue, Year: 2012, Age: 38
Id: 24715, Make: Nissan, Model: Rogue, Fav: OLDISH-Nissan-Rogue, Year: 2011, Age: 38
Matched 1.00 (10/10)

Getting recommendations for user: 16543, who likes: OLDI

## Exercise the SIMS campaign
Here we experiment with the SIMS campaign. We loop through a list of items that have at 
least some interactions historically. 
For each car, we would expect similar cars to be similar in age, make and model.
We leverage car clusters and would like to see Personalize generate a list of similar cars
that entirely come from the same car cluster.
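
Checking each similar item against a car cluster requires an item-to-cluster lookup. One possible approach, sketched below, is to build that lookup once as a dictionary rather than filtering the interactions table for every returned item. `build_cluster_lookup` is a hypothetical helper and the rows are invented; with the notebook's data one could presumably pass `zip(int_expanded_df.ITEM_ID, int_expanded_df.FAV_CLUSTER)`, which is an assumption about usage rather than something the notebook does.

```python
def build_cluster_lookup(pairs):
    """Map each item ID (as a string) to its cluster, keeping the first cluster seen."""
    lookup = {}
    for item_id, cluster in pairs:
        lookup.setdefault(str(item_id), cluster)
    return lookup


# Hypothetical (ITEM_ID, FAV_CLUSTER) rows; items repeat once per interaction
rows = [(101, 9), (101, 9), (102, 8), (103, 12), (103, 12)]
item_to_cluster = build_cluster_lookup(rows)

# Item IDs come back from the service as strings; unknown IDs map to None
similar_ids = ['103', '101', '999']
clusters = [item_to_cluster.get(i) for i in similar_ids]
print(clusters)  # [12, 9, None]
```

Keying the dictionary by string avoids converting every returned `itemId` back to an integer.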

In [59]:
items_to_try = int_expanded_df.sample(5)
items_to_try[['ITEM_ID','FAV']].head(5)

Unnamed: 0,ITEM_ID,FAV
164722,26336,OLDISH-Toyota-Sienna
315803,24397,NEWISH-Toyota-Sienna
442193,25575,NEWISH-Nissan-Altima
112884,32136,OLDISH-Ford-Fusion
272011,27600,NEWISH-Nissan-Rogue


In [60]:
if is_campaign_active(sims_arn):
    desired_num_results = 10
    for i in range(items_to_try.shape[0]):
        item_id     = str(items_to_try.iloc[i]['ITEM_ID'])
        fav         = items_to_try.iloc[i]['FAV']
        fav_cluster = items_to_try.iloc[i]['FAV_CLUSTER']

        print('Getting items similar to: {}, which is a: {}'.format(item_id, fav))
        response = personalize_runtime.get_recommendations(campaignArn=sims_arn, 
                                                           itemId=item_id, 
                                                           numResults=desired_num_results)
        items = response['itemList']
        
        match = 0
        actual_num_results = len(items)
        
        for item in items:
            _curr_item_id  = int(item['itemId'])
            _curr_cluster  = int_expanded_df[int_expanded_df.ITEM_ID == _curr_item_id].iloc[0]['FAV_CLUSTER']
            if fav_cluster == _curr_cluster:
                match += 1
            print_item(_curr_item_id)
        print('Matched {:.2f} ({}/{})'.format(match/actual_num_results, match, actual_num_results))
        print('')
else:
    print('SIMS campaign not active: {}'.format(sims_arn))

Getting items similar to: 26336, which is a: OLDISH-Toyota-Sienna
Id: 34692, Make: Toyota, Model: Sienna, Fav: OLDISH-Toyota-Sienna, Year: 2012, Age: 31
Id: 29313, Make: Toyota, Model: Sienna, Fav: OLDISH-Toyota-Sienna, Year: 2012, Age: 36
Id: 27488, Make: Toyota, Model: Sienna, Fav: OLDISH-Toyota-Sienna, Year: 2012, Age: 34
Id: 22481, Make: Toyota, Model: Sienna, Fav: OLDISH-Toyota-Sienna, Year: 2013, Age: 39
Id: 22107, Make: Toyota, Model: Sienna, Fav: OLDISH-Toyota-Sienna, Year: 2012, Age: 42
Id: 29519, Make: Toyota, Model: Sienna, Fav: OLDISH-Toyota-Sienna, Year: 2013, Age: 40
Id: 27807, Make: Toyota, Model: Sienna, Fav: OLDISH-Toyota-Sienna, Year: 2010, Age: 40
Id: 31566, Make: Toyota, Model: Sienna, Fav: OLDISH-Toyota-Sienna, Year: 2010, Age: 39
Id: 22115, Make: Toyota, Model: Sienna, Fav: OLDISH-Toyota-Sienna, Year: 2013, Age: 38
Id: 21367, Make: Toyota, Model: Sienna, Fav: OLDISH-Toyota-Sienna, Year: 2012, Age: 40
Matched 1.00 (10/10)

Getting items similar to: 24397, which is 

## Exercise the hrnn-metadata campaign
Here we try out the hrnn-metadata campaign. 
We ask Personalize for recommendations for a particular user. Our hope is that
it would detect that this user likes old or new cars and would return a list accordingly.

In [61]:
users_to_try = int_expanded_df.sample(10)
users_to_try[['USER_ID','FAV']].head(3)

Unnamed: 0,USER_ID,FAV
390331,6379,OLDISH-Ford-Fusion
600704,21032,NEWISH-Nissan-Rogue
302939,15057,OLDISH-Ford-Fusion


In [62]:
show_user_interaction_history(int_expanded_df, users_to_try.iloc[0]['USER_ID'])

(10, 17)


Unnamed: 0,USER_ID,ITEM_ID,WHEN,FAV,YEAR,PRICE,MILEAGE
390336,6379,22949,2019-07-14 20:05:19,OLDISH-Ford-Fusion,2013,22663,94973
390335,6379,34916,2019-07-14 20:07:19,OLDISH-Ford-Fusion,2009,18836,158615
390334,6379,25880,2019-07-14 20:11:19,OLDISH-Ford-Fusion,2013,28985,98333
390330,6379,25428,2019-07-14 20:17:19,OLDISH-Ford-Fusion,2010,17717,137904
390333,6379,24667,2019-07-14 20:25:19,OLDISH-Ford-Fusion,2011,18124,125048
390339,6379,25436,2019-07-14 20:35:19,OLDISH-Ford-Fusion,2012,27164,105167
390337,6379,30781,2019-07-14 20:47:19,OLDISH-Ford-Fusion,2013,24441,90840
390332,6379,23050,2019-07-14 21:01:19,OLDISH-Ford-Fusion,2011,21209,123529
390338,6379,22761,2019-07-14 21:17:19,OLDISH-Ford-Fusion,2009,12110,153553
390331,6379,25004,2019-07-14 21:35:19,OLDISH-Ford-Fusion,2011,21530,124538


In [63]:
if is_campaign_active(hrnn_metadata_arn):
    for i in range(10):
        user_id     = str(users_to_try.iloc[i]['USER_ID'])
        fav         = users_to_try.iloc[i]['FAV']
        fav_cluster = users_to_try.iloc[i]['FAV_CLUSTER']

        print('Getting recommendations for user: {}, who likes: {}'.format(user_id, fav))
        response = personalize_runtime.get_recommendations(campaignArn=hrnn_metadata_arn, 
                                                           userId=user_id, 
                                                           numResults=10)
        items = response['itemList']

        match = 0
        actual_num_results = len(items)
        for item in items:
            _curr_item_id  = int(item['itemId'])
            _curr_cluster  = int_expanded_df[int_expanded_df.ITEM_ID == _curr_item_id].iloc[0]['FAV_CLUSTER']
            if fav_cluster == _curr_cluster:
                match += 1
            print_item(_curr_item_id)
        print('Matched {:.2f} ({}/{})'.format(match/actual_num_results, match, actual_num_results))
        print('')
else:
    print('HRNN-metadata campaign not active: {}'.format(hrnn_metadata_arn))

Getting recommendations for user: 6379, who likes: OLDISH-Ford-Fusion
Id: 25454, Make: Ford, Model: Fusion, Fav: OLDISH-Ford-Fusion, Year: 2012, Age: 39
Id: 27115, Make: Ford, Model: Fusion, Fav: OLDISH-Ford-Fusion, Year: 2012, Age: 34
Id: 25574, Make: Ford, Model: Fusion, Fav: OLDISH-Ford-Fusion, Year: 2011, Age: 39
Id: 25461, Make: Ford, Model: Fusion, Fav: OLDISH-Ford-Fusion, Year: 2011, Age: 39
Id: 24702, Make: Ford, Model: Fusion, Fav: OLDISH-Ford-Fusion, Year: 2012, Age: 43
Id: 26931, Make: Ford, Model: Fusion, Fav: OLDISH-Ford-Fusion, Year: 2011, Age: 43
Id: 24270, Make: Ford, Model: Fusion, Fav: OLDISH-Ford-Fusion, Year: 2012, Age: 34
Id: 28520, Make: Ford, Model: Fusion, Fav: OLDISH-Ford-Fusion, Year: 2012, Age: 34
Id: 23827, Make: Ford, Model: Fusion, Fav: OLDISH-Ford-Fusion, Year: 2011, Age: 37
Id: 31942, Make: Ford, Model: Fusion, Fav: OLDISH-Ford-Fusion, Year: 2011, Age: 43
Matched 1.00 (10/10)

Getting recommendations for user: 21032, who likes: NEWISH-Nissan-Rogue
Id: 27

## Exercise the popularity campaign
Personalize provides a baseline recommender that ranks items by simple popularity.
Here we compare its results with our own definition of "popular".

Our popularity is driven simply by total count of
interactions for that item. We expect significant overlap between our list and the one from Personalize.
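
As a rough illustration of this comparison, the sketch below counts interactions per item with the standard library and measures the overlap with a second list of item IDs. `top_n_items` and `overlap` are hypothetical helpers and the IDs are invented. Both sides are normalized to strings, because the recommendation response carries item IDs as strings while the interaction data holds integers.

```python
from collections import Counter


def top_n_items(item_ids, n):
    """Return the n most frequent item IDs as strings, most popular first."""
    counts = Counter(str(i) for i in item_ids)
    return [item_id for item_id, _ in counts.most_common(n)]


def overlap(ours, theirs):
    """Split our list into items also present in theirs and items missing from it."""
    theirs_set = set(str(i) for i in theirs)
    shared = [i for i in ours if i in theirs_set]
    missing = [i for i in ours if i not in theirs_set]
    return shared, missing


# Hypothetical interaction log: one entry per interaction
interactions = [101, 101, 101, 102, 102, 103, 103, 104]
ours = top_n_items(interactions, 3)
shared, missing = overlap(ours, ['103', '102', '105'])

print(ours)     # ['101', '102', '103']
print(shared)   # ['102', '103']
print(missing)  # ['101']
```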

#### First let's get the results from Personalize

In [64]:
personalized_pop = []
pop_items = []
NUM_MOST_POPULAR = 10

if is_campaign_active(pop_arn):
    popularity_response = personalize_runtime.get_recommendations(campaignArn=pop_arn, 
                                                                  userId='0', 
                                                                  numResults=NUM_MOST_POPULAR)
    pop_items = popularity_response['itemList']
    for item in pop_items:
        print_item(int(item['itemId']))    
else:
    print('Popularity campaign not active: {}'.format(pop_arn))

for p in pop_items:
    personalized_pop.append(str(p['itemId']))

Id: 25645, Make: Ford, Model: Fusion, Fav: OLDISH-Ford-Fusion, Year: 2012, Age: 40
Id: 24564, Make: Ford, Model: Fusion, Fav: NEWISH-Ford-Fusion, Year: 2017, Age: 39
Id: 24343, Make: Toyota, Model: Sienna, Fav: OLDISH-Toyota-Sienna, Year: 2013, Age: 40
Id: 24189, Make: Ford, Model: Fusion, Fav: NEWISH-Ford-Fusion, Year: 2015, Age: 41
Id: 25372, Make: Ford, Model: Fusion, Fav: NEWISH-Ford-Fusion, Year: 2015, Age: 33
Id: 24794, Make: Ford, Model: Fusion, Fav: NEWISH-Ford-Fusion, Year: 2017, Age: 41
Id: 25897, Make: Toyota, Model: Sienna, Fav: OLDISH-Toyota-Sienna, Year: 2012, Age: 39
Id: 25762, Make: Toyota, Model: Sienna, Fav: OLDISH-Toyota-Sienna, Year: 2009, Age: 48
Id: 22883, Make: Ford, Model: Fusion, Fav: NEWISH-Ford-Fusion, Year: 2016, Age: 44
Id: 24259, Make: Nissan, Model: Altima, Fav: NEWISH-Nissan-Altima, Year: 2017, Age: 42


#### Now let's get the actual popularity counts of the historical interactions

In [65]:
most_popular = pd.DataFrame(int_expanded_df['ITEM_ID'].value_counts().reset_index())
most_popular.drop(['ITEM_ID'], axis=1, inplace=True)
ten_most_popular = most_popular.head(10)

#### Now compare the two lists

In [66]:
if is_campaign_active(pop_arn):
    print('We asked Personalize for {} most popular.'.format(NUM_MOST_POPULAR))
    print('{}'.format(personalized_pop))
    print('We computed it ourselves also')
    print(ten_most_popular['index'])

    overlap_items     = ten_most_popular[ten_most_popular['index'].isin(personalized_pop)]
    overlap_count     = overlap_items.shape[0]
    not_overlap_items = ten_most_popular[~ten_most_popular['index'].isin(personalized_pop)]
    not_overlap_count = not_overlap_items.shape[0]
    
    print('\nOf the actual most popular, {} are selected by Personalize also.'.format(overlap_count))
    print('\nPersonalize did not select {} of our top 10:'.format(not_overlap_count))
    print(not_overlap_items.head())

We asked Personalize for 10 most popular.
['25645', '24564', '24343', '24189', '25372', '24794', '25897', '25762', '22883', '24259']
We computed it ourselves also
0    24343
1    24564
2    25645
3    24189
4    25762
5    24567
6    25897
7    24794
8    25372
9    24259
Name: index, dtype: int64

Of the actual most popular, 9 are selected by Personalize also.

Personalize did not select 1 of our top 10:
   index
5  24567


## Use real time events
Here we use the Personalize event tracker to send new events on the fly after a campaign
has been deployed. We then show the impact on the recommendations.
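
Events can also be grouped so that several clicks travel in one request instead of one request per item. The sketch below shows one way to do that. It assumes a per-call limit of 10 events (check the service documentation before relying on it), and it takes the sending function as a parameter so it can be exercised with a stub; `build_click_events`, `chunked`, `send_clicks_batched`, and `fake_put_events` are hypothetical names, not part of the notebook or of boto3.

```python
import json
import time

MAX_EVENTS_PER_CALL = 10  # assumed per-call limit; verify against the service docs


def build_click_events(item_ids, event_type, sent_at=None):
    """Build one event dict per item, all sharing the same timestamp."""
    sent_at = time.time() if sent_at is None else sent_at
    return [{
        'sentAt': sent_at,
        'eventType': event_type,
        'properties': json.dumps({'itemId': str(item_id)}),
    } for item_id in item_ids]


def chunked(seq, size):
    """Yield consecutive slices of seq with at most size elements each."""
    for start in range(0, len(seq), size):
        yield seq[start:start + size]


def send_clicks_batched(put_events, tracking_id, user_id, session_id,
                        item_ids, event_type='EVENT_TYPE'):
    """Send clicks in batches through put_events; return the number of calls made."""
    calls = 0
    events = build_click_events(item_ids, event_type)
    for batch in chunked(events, MAX_EVENTS_PER_CALL):
        put_events(trackingId=tracking_id, userId=str(user_id),
                   sessionId=session_id, eventList=batch)
        calls += 1
    return calls


# Stub standing in for a real client method, so the sketch runs offline
sent = []
def fake_put_events(**kwargs):
    sent.append(kwargs)

n_calls = send_clicks_batched(fake_put_events, 'tracking-id', 20084,
                              'session-1', list(range(25)))
print(n_calls, [len(call['eventList']) for call in sent])  # 3 [10, 10, 5]
```

With a real client, the first argument would be the client's `put_events` method.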

In [67]:
def is_tracker_active(tracker_name):
    _is_active = False
    _event_tracker_arn = ''
    _tracking_id = ''

    resp = personalize.list_event_trackers()
    trackers = resp['eventTrackers']

    for t in trackers:
        if t['name'] == tracker_name:
            _is_active = True
            _event_tracker_arn = t['eventTrackerArn']
            d_resp = personalize.describe_event_tracker(eventTrackerArn = _event_tracker_arn)

            _tracking_id = d_resp['eventTracker']['trackingId']
    
    return _is_active, _event_tracker_arn, _tracking_id

In [68]:
(exists, tracker_arn, tracking_id) = is_tracker_active('CarClickTracker')
if not exists:
    response = personalize.create_event_tracker(
        name='CarClickTracker',
        datasetGroupArn=dg_arn
    )
    print(response['eventTrackerArn'])
    print(response['trackingId'])

    TRACKING_ID = response['trackingId']
else:
    TRACKING_ID = tracking_id

arn:aws:personalize:us-east-1:355151823911:event-tracker/c43b6269
f1615153-5dd5-4d17-9690-3d12b177486b


In [69]:
session_dict = {}

In [70]:
import uuid

def send_car_click(user_id, item_id, ts):
    """
    Simulates a click to send an event to Amazon Personalize's Event Tracker
    """
    # Configure Session: reuse the user's session ID, creating one on first use
    if user_id not in session_dict:
        session_dict[user_id] = str(uuid.uuid1())
    session_ID = session_dict[user_id]
        
    # Configure Properties:
    event = {
        'itemId': str(item_id)
    }
    event_json = json.dumps(event)
        
    # Make Call
    personalize_events.put_events(
        trackingId = TRACKING_ID,
        userId     = str(user_id),
        sessionId  = session_ID,
        eventList  = [{
            'sentAt': ts,
            'eventType': 'EVENT_TYPE',
            'properties': event_json
            }]
    )

In [71]:
def send_car_clicks(user_id, items):
    # TODO: send all events in a single array instead of one call for each item
    for item in items:
        send_car_click(user_id, item, time.time())

In [72]:
def recommend_cars(user_id, campaign_arn):
    response = personalize_runtime.get_recommendations(campaignArn=campaign_arn, 
                                                       userId=str(user_id), 
                                                       numResults=10)
    items = response['itemList']
    for item in items:
        print_item(int(item['itemId']))
    print('')

In [73]:
sample_user = int_expanded_df.sample(1).iloc[0]['USER_ID']
sample_user_cluster = int_expanded_df[int_expanded_df.USER_ID == sample_user].iloc[0]['FAV_CLUSTER']
sample_user_fav = int_expanded_df[int_expanded_df.USER_ID == sample_user].iloc[0]['FAV']
print('user: {}, cluster: {}, fav: {}'.format(sample_user, sample_user_cluster, sample_user_fav))

user: 20084, cluster: 5, fav: OLDISH-Toyota-Rav4


In [74]:
new_cluster = sample_user_cluster + 1
if (new_cluster == NUM_CLUSTERS):
    new_cluster = 0
new_fav = int_expanded_df[int_expanded_df.FAV_CLUSTER == new_cluster].iloc[0]['FAV']
print('new cluster: {}, new fav: {}'.format(new_cluster, new_fav))

new cluster: 6, new fav: NEWISH-Nissan-Rogue


In [75]:
print('Before any real time events, Personalize should recommend {} cars...\n'.format(sample_user_fav))

if is_campaign_active(hrnn_arn):
    print('First using {}'.format(hrnn_arn))
    recommend_cars(sample_user, hrnn_arn)
else:
    print('HRNN campaign not active: {}'.format(hrnn_arn))

if is_campaign_active(hrnn_metadata_arn):
    print('Next using {}'.format(hrnn_metadata_arn))
    recommend_cars(sample_user, hrnn_metadata_arn)
else:
    print('HRNN-metadata campaign not active: {}'.format(hrnn_metadata_arn))

Before any real time events, Personalize should recommend OLDISH-Toyota-Rav4 cars...

First using arn:aws:personalize:us-east-1:355151823911:campaign/car-hrnn
Id: 27591, Make: Toyota, Model: Rav4, Fav: OLDISH-Toyota-Rav4, Year: 2012, Age: 36
Id: 24350, Make: Toyota, Model: Rav4, Fav: OLDISH-Toyota-Rav4, Year: 2013, Age: 36
Id: 26241, Make: Toyota, Model: Rav4, Fav: OLDISH-Toyota-Rav4, Year: 2011, Age: 34
Id: 21550, Make: Toyota, Model: Rav4, Fav: OLDISH-Toyota-Rav4, Year: 2012, Age: 41
Id: 26691, Make: Toyota, Model: Rav4, Fav: OLDISH-Toyota-Rav4, Year: 2012, Age: 44
Id: 25267, Make: Toyota, Model: Rav4, Fav: OLDISH-Toyota-Rav4, Year: 2010, Age: 41
Id: 26156, Make: Toyota, Model: Rav4, Fav: OLDISH-Toyota-Rav4, Year: 2013, Age: 41
Id: 31348, Make: Toyota, Model: Rav4, Fav: OLDISH-Toyota-Rav4, Year: 2013, Age: 37
Id: 26376, Make: Toyota, Model: Rav4, Fav: OLDISH-Toyota-Rav4, Year: 2013, Age: 37
Id: 24560, Make: Toyota, Model: Rav4, Fav: OLDISH-Toyota-Rav4, Year: 2012, Age: 36

Next using

In [76]:
new_car_cluster = int_expanded_df[int_expanded_df.FAV_CLUSTER == new_cluster].sample(100)
new_car_cluster[['FAV','ITEM_ID','YEAR','PRICE']].head(3)

Unnamed: 0,FAV,ITEM_ID,YEAR,PRICE
723873,NEWISH-Nissan-Rogue,27302,2015,29248
428034,NEWISH-Nissan-Rogue,26148,2015,30333
269744,NEWISH-Nissan-Rogue,22280,2014,34087


In [77]:
new_items_clicked = new_car_cluster['ITEM_ID'].values
new_items_clicked

array([27302, 26148, 22280, 35956, 21984, 24302, 16258, 25933, 27294,
       21922, 20942, 27592, 25490, 31083, 26267, 18159, 28878, 22494,
       16648, 21750, 20569, 28455, 24187, 24775, 39886, 28921, 28103,
       21619, 22075, 24796,  9909, 24516, 31147, 19197, 27431, 23828,
       22697, 28921, 22799, 25812, 24659, 26798, 24186, 29332, 25572,
       22358, 21636, 26585, 23985, 15058, 30271, 21922, 23828, 31703,
       24483, 20824, 26820, 28107, 21159, 20155, 28244, 33956, 20106,
       16944, 20438, 27009, 28241, 30544, 25223, 30400, 18612, 20041,
       25950, 25404, 19569, 22572, 22678, 26034, 26554, 20569, 22454,
       20582, 35004, 24978, 28855, 28018, 28625, 18826, 22454, 26281,
       32344, 30400, 19414, 24465, 24315, 20826, 19639, 25812, 33461,
       20380])

In [78]:
send_car_clicks(sample_user, new_items_clicked)

In [79]:
int_expanded_df[int_expanded_df.USER_ID == sample_user]['FAV'].value_counts()

OLDISH-Toyota-Rav4    30
Name: FAV, dtype: int64

In [80]:
print('Now this same user has started to like {} cars.'.format(new_fav))
print('Lets see if Personalize picks up on this real time change in intent...')

if is_campaign_active(hrnn_arn):
    print('First using {}'.format(hrnn_arn))
    recommend_cars(sample_user, hrnn_arn)
else:
    print('HRNN campaign not active: {}'.format(hrnn_arn))

if is_campaign_active(hrnn_metadata_arn):
    print('Next using {}'.format(hrnn_metadata_arn))
    recommend_cars(sample_user, hrnn_metadata_arn)
else:
    print('HRNN-metadata campaign not active: {}'.format(hrnn_metadata_arn))

Now this same user has started to like NEWISH-Nissan-Rogue cars.
Lets see if Personalize picks up on this real time change in intent...
First using arn:aws:personalize:us-east-1:355151823911:campaign/car-hrnn
Id: 25404, Make: Nissan, Model: Rogue, Fav: NEWISH-Nissan-Rogue, Year: 2016, Age: 36
Id: 27397, Make: Nissan, Model: Rogue, Fav: NEWISH-Nissan-Rogue, Year: 2017, Age: 44
Id: 21847, Make: Nissan, Model: Rogue, Fav: NEWISH-Nissan-Rogue, Year: 2016, Age: 35
Id: 24437, Make: Nissan, Model: Rogue, Fav: NEWISH-Nissan-Rogue, Year: 2014, Age: 33
Id: 28368, Make: Nissan, Model: Rogue, Fav: NEWISH-Nissan-Rogue, Year: 2015, Age: 38
Id: 23625, Make: Nissan, Model: Rogue, Fav: NEWISH-Nissan-Rogue, Year: 2014, Age: 39
Id: 26028, Make: Nissan, Model: Rogue, Fav: NEWISH-Nissan-Rogue, Year: 2014, Age: 40
Id: 19899, Make: Nissan, Model: Rogue, Fav: NEWISH-Nissan-Rogue, Year: 2015, Age: 30
Id: 24903, Make: Nissan, Model: Rogue, Fav: NEWISH-Nissan-Rogue, Year: 2014, Age: 45
Id: 24302, Make: Nissan, M