# Module 5. Checking Performance Data and Cold Start Performance with Custom Metrics

In this module, we use the data that was set aside for testing in Module 1 to evaluate performance further with custom metrics.
We also check the performance of HRNN Coldstart. Because cold-start items are newly registered items, their performance is hard to predict.

In [9]:
import pandas as pd, numpy as np
import io
import scipy.sparse as ss
import json
import time
import os
import boto3
from botocore.exceptions import ClientError
from metrics import mean_reciprocal_rank, ndcg_at_k, precision_at_k
!pip install tqdm
from tqdm import tqdm_notebook


In [19]:
%store -r

In [11]:
# Configure the SDK to Personalize:
personalize = boto3.client('personalize')
personalize_runtime = boto3.client('personalize-runtime')

In [12]:
#read holdout data and coldstart data
df_holdout = pd.read_csv(validation_interaction_filename)
df_coldstart=pd.read_csv(coldstart_interation_filename)

## Evaluating the Test Dataset with Custom Metrics

In this part, we evaluate model performance using the dataset that was held out earlier.
For every unique user in the test dataset, we compare the ground-truth interactions in the test dataset with the results generated by the campaign.
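The comparison described above can be sketched on toy data. This is a minimal sketch, not the notebook's code: the small DataFrame and the `fake_recs` dictionary are made-up stand-ins for `df_holdout` and the campaign responses. Grouping once by `USER_ID` avoids filtering the whole DataFrame again for every user.

```python
import pandas as pd

# Toy stand-in for df_holdout (hypothetical values).
holdout = pd.DataFrame({
    "USER_ID": [1, 1, 2, 2, 2],
    "ITEM_ID": [48, 2294, 10, 20, 30],
})

# Hypothetical recommendations per user, standing in for campaign responses.
fake_recs = {1: [2294, 7, 48], 2: [99, 20, 5]}

# Build the ground-truth item set of every user in a single pass.
true_items_by_user = holdout.groupby("USER_ID")["ITEM_ID"].apply(set).to_dict()

# 1 where a recommended item is in the user's ground truth, else 0.
relevance = [
    [int(item in true_items_by_user[user]) for item in fake_recs[user]]
    for user in sorted(fake_recs)
]
print(relevance)
```

Each inner list keeps the order of the recommendations, which is what rank-aware metrics such as MRR and NDCG rely on.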


In [13]:
test_users = df_holdout['USER_ID'].unique()
df_holdout.head()


Unnamed: 0,USER_ID,ITEM_ID,EVENT_VALUE,TIMESTAMP,EVENT_TYPE
0,1,48,5,978824351,RATING
1,1,2294,4,978824291,RATING
2,1,1566,4,978824330,RATING
3,1,1907,4,978824330,RATING
4,1,783,4,978824291,RATING


In [14]:
relevance = []
for user_id in tqdm_notebook(test_users):
    true_items = set(df_holdout[df_holdout['USER_ID']==user_id]['ITEM_ID'].values)
    rec_response = personalize_runtime.get_recommendations(
        campaignArn = hrnn_campaign_arn,
        userId = str(user_id)
    )
    rec_items = [int(x['itemId']) for x in rec_response['itemList']]
    relevance.append([int(x in true_items) for x in rec_items])





In [15]:
print('mean_reciprocal_rank', np.mean([mean_reciprocal_rank(r) for r in relevance]))
print('precision_at_5', np.mean([precision_at_k(r, 5) for r in relevance]))
print('precision_at_10', np.mean([precision_at_k(r, 10) for r in relevance]))
print('precision_at_25', np.mean([precision_at_k(r, 25) for r in relevance]))
print('normalized_discounted_cumulative_gain_at_5', np.mean([ndcg_at_k(r, 5) for r in relevance]))
print('normalized_discounted_cumulative_gain_at_10', np.mean([ndcg_at_k(r, 10) for r in relevance]))
print('normalized_discounted_cumulative_gain_at_25', np.mean([ndcg_at_k(r, 25) for r in relevance]))

mean_reciprocal_rank 0.26350005593566467
precision_at_5 0.11198675496688741
precision_at_10 0.10013245033112583
precision_at_25 0.08117218543046358
normalized_discounted_cumulative_gain_at_5 0.18136740340044508
normalized_discounted_cumulative_gain_at_10 0.24044758214568512
normalized_discounted_cumulative_gain_at_25 0.34747869222672667
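The `metrics` module itself is not shown in this notebook. As a rough guide to what the numbers mean, the following is a minimal sketch of how such metrics are commonly computed from a binary relevance list. The function names with the `_sketch` suffix are hypothetical, and the actual `metrics.py` may differ in details (for example, in how the ideal ordering for NDCG is chosen).

```python
import numpy as np

def reciprocal_rank_sketch(relevance):
    """Return 1 / rank of the first relevant item, or 0.0 if there is none."""
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            return 1.0 / rank
    return 0.0

def precision_at_k_sketch(relevance, k):
    """Fraction of the top-k recommendations that are relevant."""
    return float(np.sum(relevance[:k])) / k

def dcg_at_k_sketch(relevance, k):
    """Discounted cumulative gain with a log2(position + 1) discount."""
    rel = np.asarray(relevance[:k], dtype=float)
    if rel.size == 0:
        return 0.0
    discounts = np.log2(np.arange(2, rel.size + 2))
    return float(np.sum(rel / discounts))

def ndcg_at_k_sketch(relevance, k):
    """DCG divided by the DCG of the same list sorted best-first (assumption)."""
    ideal = dcg_at_k_sketch(sorted(relevance, reverse=True), k)
    if ideal == 0.0:
        return 0.0
    return dcg_at_k_sketch(relevance, k) / ideal

example = [0, 1, 0, 1, 0]  # relevant items at ranks 2 and 4
print(reciprocal_rank_sketch(example))    # 0.5
print(precision_at_k_sketch(example, 5))  # 0.4
print(ndcg_at_k_sketch(example, 5))
```

Precision ignores the order inside the top k, while MRR and NDCG reward placing relevant items near the top of the list.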


## Cold Start Performance Test

In this part, we test recommendation performance on newly added items (ColdStart).


In [39]:
users = df_coldstart['USER_ID'].unique()
users.shape

(6039,)

In [40]:
relevance = []
for user_id in  tqdm_notebook(users[:1000]):

    true_items = set(df_coldstart[df_coldstart['USER_ID']==user_id]['ITEM_ID'].values)

    rec_response = personalize_runtime.get_recommendations(
            campaignArn = hrnn_coldstart_campaign_arn,
            userId = str(user_id)
        )
    rec_items = [int(x['itemId']) for x in rec_response['itemList']]
    relevance.append([int(x in true_items) for x in rec_items])





In [22]:
print('mean_reciprocal_rank', np.mean([mean_reciprocal_rank(r) for r in relevance]))
print('precision_at_5', np.mean([precision_at_k(r, 5) for r in relevance]))
print('precision_at_10', np.mean([precision_at_k(r, 10) for r in relevance]))
print('precision_at_25', np.mean([precision_at_k(r, 25) for r in relevance]))
print('normalized_discounted_cumulative_gain_at_5', np.mean([ndcg_at_k(r, 5) for r in relevance]))
print('normalized_discounted_cumulative_gain_at_10', np.mean([ndcg_at_k(r, 10) for r in relevance]))
print('normalized_discounted_cumulative_gain_at_25', np.mean([ndcg_at_k(r, 25) for r in relevance]))

mean_reciprocal_rank 0.2775733329091827
precision_at_5 0.13319999999999999
precision_at_10 0.12360000000000002
precision_at_25 0.11144
normalized_discounted_cumulative_gain_at_5 0.18656523234155697
normalized_discounted_cumulative_gain_at_10 0.23745924259361573
normalized_discounted_cumulative_gain_at_25 0.3622279100966482


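Compared with the earlier results, most of the metrics are higher (for example, precision_at_5 rises from about 0.112 to about 0.133), although normalized_discounted_cumulative_gain_at_10 is slightly lower.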

### A baseline

To see how good the Coldstart performance really is, we compare it with recommending cold items at random.
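As a sanity check on what a random baseline should yield, here is a minimal sketch on made-up data. The item IDs, the counts, and the helper `random_recommendations` are all hypothetical. With a fixed seed the run is repeatable, and the average precision should approach the share of cold items that are relevant to the user.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the baseline is repeatable

# Hypothetical data: 100 cold items, of which the user interacted with 10.
cold_items_toy = np.arange(1000, 1100)
true_items_toy = set(range(1000, 1010))

def random_recommendations(items, k, rng):
    """Draw k distinct items uniformly at random."""
    return rng.choice(items, size=k, replace=False)

precisions = []
for _ in range(2000):
    recs = random_recommendations(cold_items_toy, 25, rng)
    hits = [int(item in true_items_toy) for item in recs]
    precisions.append(sum(hits) / 25)

# The average should be close to 10 / 100 = 0.1, the share of relevant cold items.
print(np.mean(precisions))
```

Because a random ranking carries no ordering information, its precision is roughly the same at every cutoff, which is a useful check on a real baseline run.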

In [41]:
len(rec_items)

25

In [42]:
relevance = []
for user_id in  tqdm_notebook(users[:1000]):

    true_items = set(df_coldstart[df_coldstart['USER_ID']==user_id]['ITEM_ID'].values)
    rec_items = np.random.permutation(cold_items)[:25]
    relevance.append([int(x in true_items) for x in rec_items])





In [43]:
print('mean_reciprocal_rank', np.mean([mean_reciprocal_rank(r) for r in relevance]))
print('precision_at_5', np.mean([precision_at_k(r, 5) for r in relevance]))
print('precision_at_10', np.mean([precision_at_k(r, 10) for r in relevance]))
print('precision_at_25', np.mean([precision_at_k(r, 25) for r in relevance]))
print('normalized_discounted_cumulative_gain_at_5', np.mean([ndcg_at_k(r, 5) for r in relevance]))
print('normalized_discounted_cumulative_gain_at_10', np.mean([ndcg_at_k(r, 10) for r in relevance]))
print('normalized_discounted_cumulative_gain_at_25', np.mean([ndcg_at_k(r, 25) for r in relevance]))

mean_reciprocal_rank 0.1107151771408771
precision_at_5 0.03780000000000001
precision_at_10 0.038
precision_at_25 0.03792
normalized_discounted_cumulative_gain_at_5 0.07928200579353978
normalized_discounted_cumulative_gain_at_10 0.11349734090449269
normalized_discounted_cumulative_gain_at_25 0.1852170317212653


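The HRNN Cold Start model uses part of the item metadata, so it can recommend new items that have no interaction data. Even though genre was the only metadata available, the model reaches roughly 3 to 3.5 times the precision of random recommendation, and about 2 to 2.5 times the MRR and NDCG. Improving the quality of the metadata or reducing the share of cold-start items should lead to better performance.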
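The size of the gain over the random baseline can be checked with a few lines of arithmetic. The values below are copied, rounded to four decimals, from the two metric outputs above. The dictionary names and the shortened `ndcg_at_k` keys are only illustrative.

```python
# Metric values copied (rounded) from the HRNN Coldstart and random-baseline outputs.
hrnn_coldstart = {
    "mean_reciprocal_rank": 0.2776,
    "precision_at_5": 0.1332,
    "precision_at_10": 0.1236,
    "precision_at_25": 0.1114,
    "ndcg_at_5": 0.1866,
    "ndcg_at_10": 0.2375,
    "ndcg_at_25": 0.3622,
}
random_baseline = {
    "mean_reciprocal_rank": 0.1107,
    "precision_at_5": 0.0378,
    "precision_at_10": 0.0380,
    "precision_at_25": 0.0379,
    "ndcg_at_5": 0.0793,
    "ndcg_at_10": 0.1135,
    "ndcg_at_25": 0.1852,
}

# Ratio of the model metric to the random-baseline metric.
lift = {name: hrnn_coldstart[name] / random_baseline[name] for name in hrnn_coldstart}
for name, ratio in lift.items():
    print(f"{name}: {ratio:.2f}x")
```

The precision ratios come out largest, while the rank-weighted metrics show a smaller but still clear gain.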


## A quick test

In [37]:
# we had saved all the data before deleting the cold items

df=pd.read_csv(interaction_filename)
df = df.sort_values('TIMESTAMP', kind='mergesort').copy()

In [None]:
# A multi-character separator such as '::' needs the Python parsing engine.
items_all = pd.read_csv('./ml-1m/movies.dat', sep='::', engine='python', encoding='latin1', names=['ITEM_ID', '_TITLE', 'GENRE'])
del items_all['_TITLE']

user_id = users[100]
hist_items = df[df['USER_ID']==user_id]['ITEM_ID'].tail(5).values
items_all.set_index('ITEM_ID').loc[hist_items]

In [76]:
rec_response = personalize_runtime.get_recommendations(
            campaignArn = hrnn_coldstart_campaign_arn,
            userId = str(user_id)
        )
rec_items = [int(x['itemId']) for x in rec_response['itemList']]

items_all.set_index('ITEM_ID').loc[rec_items[:5]]



[1298, 1196, 1868, 1263, 1224, 2972, 647, 41, 3091, 3808, 1208, 2235, 3746, 1674, 866, 3257, 2322, 924, 1264, 2872, 24, 3503, 3700, 1306, 946]


ITEM_ID,GENRE
1298,Drama|Musical|War
1196,Action|Adventure|Drama|Sci-Fi|War
1868,Drama|War
1263,Drama|War
1224,Drama|War


In [77]:
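def is_cold_item(rec_items):
    # Count how many recommended items belong to the cold-start item set.
    # A bare `if np.where(...)` is always truthy because np.where returns a tuple,
    # so membership is tested against a set instead.
    cold_item_set = set(int(i) for i in cold_items)
    count = 0
    for item_id in rec_items:
        if item_id in cold_item_set:
            count += 1
            print("Item_id {} is Coldstart Item".format(item_id))
        else:
            print("Item_id {} is not Coldstart Item".format(item_id))
    print(count)
is_cold_item(rec_items)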


Item_id 1298 is Coldstart Item
Item_id 1196 is Coldstart Item
Item_id 1868 is Coldstart Item
Item_id 1263 is Coldstart Item
Item_id 1224 is Coldstart Item
Item_id 2972 is Coldstart Item
Item_id 647 is Coldstart Item
Item_id 41 is Coldstart Item
Item_id 3091 is Coldstart Item
Item_id 3808 is Coldstart Item
Item_id 1208 is Coldstart Item
Item_id 2235 is Coldstart Item
Item_id 3746 is Coldstart Item
Item_id 1674 is Coldstart Item
Item_id 866 is Coldstart Item
Item_id 3257 is Coldstart Item
Item_id 2322 is Coldstart Item
Item_id 924 is Coldstart Item
Item_id 1264 is Coldstart Item
Item_id 2872 is Coldstart Item
Item_id 24 is Coldstart Item
Item_id 3503 is Coldstart Item
Item_id 3700 is Coldstart Item
Item_id 1306 is Coldstart Item
Item_id 946 is Coldstart Item
25


This user selected many Action|Adventure|Thriller items, and the model learned this preference from the genre metadata, so it recommends Action|Adventure|Thriller items from among the cold items.
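Checking whether recommendations come from the cold item set can also be done in one vectorized step with `np.isin`. This is a minimal sketch with made-up ID lists. In the notebook the inputs would be `rec_items` and `cold_items`.

```python
import numpy as np

# Hypothetical ID lists for illustration only.
cold_items_toy = [1298, 1196, 1868, 41]
rec_items_toy = [1298, 7, 41, 1196, 500]

# Boolean mask: True where the recommended item is a cold item.
is_cold = np.isin(rec_items_toy, cold_items_toy)
print(is_cold.tolist())

# Number and share of recommendations that are cold items.
print(int(is_cold.sum()), "of", len(rec_items_toy), "recommended items are cold items")
print(float(is_cold.mean()))
```

The mask also makes it easy to list the items that are not cold, for example `np.asarray(rec_items_toy)[~is_cold]`.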

## Another quick test

In [33]:
user_id = users[2]
hist_items = df[df['USER_ID']==user_id]['ITEM_ID'].tail(5).values
items_all.set_index('ITEM_ID').loc[hist_items]

ITEM_ID,GENRE
1304,Action|Comedy|Western
3619,Comedy
1270,Comedy|Sci-Fi
1079,Comedy
1259,Adventure|Comedy|Drama


In [35]:
rec_response = personalize_runtime.get_recommendations(
            campaignArn = hrnn_coldstart_campaign_arn,
            userId = str(user_id)
        )
rec_items = [int(x['itemId']) for x in rec_response['itemList']]
items_all.set_index('ITEM_ID').loc[rec_items[:5]]

ITEM_ID,GENRE
2012,Comedy|Sci-Fi|Western
2054,Adventure|Children's|Comedy|Fantasy|Sci-Fi
258,Adventure|Children's|Comedy|Fantasy|Romance
673,Adventure|Animation|Children's|Comedy|Fantasy
368,Action|Comedy|Western


This second test again shows that the user mainly watched Comedy|Action titles and that the Amazon Personalize model recommends Comedy|Action items.