# 03 - Personalized Content-Based Recommender

This notebook builds on our previous notebooks to create a personalized content-based recommender. Briefly, a personalized recommender provides suggestions that are tailored to the user: every user gets a different set of recommendations based on their preferences. In a review-based recommender, a user's preferences can be represented by the words that the user mentions in their reviews. That way, if a user frequently talks about _room service_, it makes sense to recommend highly rated hotels whose reviews also mention _room service_.


<blockquote>
**NOTE**: At this point, you should have run the previous notebook to create the user and item profiles; they are prerequisites for this notebook.
</blockquote>

A non-personalized recommender, on the other hand, suggests the same set of items (e.g. restaurants) to all users. This means that the recommendations are not tailored to the user's preferences.
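To make the contrast concrete, here is a minimal sketch on made-up data. The item names, ratings, similarity scores, and both function names are hypothetical and only illustrate the idea; they are not taken from the dataset used in this notebook.

```python
import numpy as np

# Hypothetical toy data: three items with an average rating each.
item_names = ['Steakhouse', 'Thai Kitchen', 'Taco Stand']
average_rating = np.array([4.5, 4.0, 3.5])

# Hypothetical user-item similarity scores (rows: users, columns: items).
user_item_similarity = np.array([
    [0.1, 0.7, 0.2],   # user 0 mostly writes about Thai food
    [0.6, 0.1, 0.3],   # user 1 mostly writes about steak
])

def non_personalized(n):
    """Same ranking for everyone: the n highest-rated items."""
    top = np.argsort(average_rating)[::-1][:n]
    return [item_names[i] for i in top]

def personalized(user_idx, n):
    """Per-user ranking: the n items most similar to that user's profile."""
    top = np.argsort(user_item_similarity[user_idx])[::-1][:n]
    return [item_names[i] for i in top]

print(non_personalized(2))
print(personalized(0, 2))
print(personalized(1, 2))
```

The first call ignores who is asking, while the other two depend on the row of similarity scores that belongs to the user.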

The code in this notebook was run with the following configuration:

    pd.__version__: 0.21.0

In [1]:
import os

import pandas as pd
from IPython.display import display  # used below to render DataFrames
from pprint import pprint
from random import choice
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer

from sklearn.metrics.pairwise import cosine_similarity

Load reviews from disk

In [2]:
df_user_reviews = pd.read_csv('../data/user-reviews.csv')
df_item_reviews = pd.read_csv('../data/item-reviews.csv')

print('Number of users: {:,}'.format(len(df_user_reviews)))

Number of users: 11,165


## Create Bag-of-Words Representation

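Next, using the item reviews dataframe (i.e. `df_item_reviews`), we create a TF-IDF matrix where the rows represent items and the columns represent the words mentioned in all the reviews of those items. We then reuse the same fitted vectorizer to transform the user reviews, so that the user matrix has exactly the same columns as the item matrix.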

In [3]:
vectorizer = TfidfVectorizer()
item_word_matrix = vectorizer.fit_transform(df_item_reviews.review_text)
print('Item-Word TF-IDF matrix created with dimensions', item_word_matrix.shape)

Item-Word TF-IDF matrix created with dimensions (4174, 49225)


In [4]:
user_word_matrix = vectorizer.transform(df_user_reviews.review_text)
print('User-Word TF-IDF matrix created with dimensions', user_word_matrix.shape)

User-Word TF-IDF matrix created with dimensions (11165, 49225)
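The reason for calling `fit_transform` on the items but only `transform` on the users can be seen on a toy corpus. This is a minimal sketch; the texts and the `toy_*` names are made up for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy corpora standing in for the item and user review text.
item_texts = ['great steak and wine', 'spicy thai noodles', 'cheap tacos and beer']
user_texts = ['i love spicy noodles', 'wine and steak please', 'karaoke night']

toy_vectorizer = TfidfVectorizer()
# fit_transform learns the vocabulary and IDF weights from the items ...
toy_item_matrix = toy_vectorizer.fit_transform(item_texts)
# ... and transform reuses them, so both matrices share the same columns.
toy_user_matrix = toy_vectorizer.transform(user_texts)

print(toy_item_matrix.shape, toy_user_matrix.shape)

# Words that were not seen during fitting are ignored, so a user whose
# text contains no known words ends up with an all-zero row.
print(toy_user_matrix[2].nnz)
```

Because the columns line up, a user row and an item row can be compared directly, which is what the similarity computation later relies on.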


Let's also create lookups for item IDs and user IDs. Each one is a dictionary (or a lookup table) where

- the **keys** correspond to the row position of the item (or user) in the `df_item_reviews` (or `df_user_reviews`) dataframe, and
- the **values** correspond to the actual ID of the item (i.e. the restaurant) or the user

In [5]:
item_id_lookup = df_item_reviews.item_id.to_dict()  
user_id_lookup = df_user_reviews.user_id.to_dict()  

# Create a reverse lookup for finding a row location
# based on any given user ID
user_idx_lookup = {v:k for k,v in user_id_lookup.items()}

pprint(list(item_id_lookup.items())[:3])
pprint(list(user_id_lookup.items())[:3])
pprint(list(user_idx_lookup.items())[:3])

[(0, '--9e1ONYQuAa-CB_Rrw7Tw'),
 (1, '--cZ6Hhc9F7VkKXxHMVZSQ'),
 (2, '-0NhdsDJsdarxyDPR523ZQ')]
[(0, '--8g9UaBe0xQ4FD0q34h_A'),
 (1, '--KQJPdrU0Md97DiOliDzw'),
 (2, '-0S_XaK3Q_Mesal2Unta2w')]
[('--8g9UaBe0xQ4FD0q34h_A', 0),
 ('--KQJPdrU0Md97DiOliDzw', 1),
 ('-0S_XaK3Q_Mesal2Unta2w', 2)]
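As a small illustration of how these lookups behave, here is a sketch on a made-up dataframe. The IDs and `toy_*` names are hypothetical, and the sketch assumes the default 0..n-1 index that `pd.read_csv` produces when no index column is specified.

```python
import pandas as pd

# Hypothetical toy frame; the IDs are made up for illustration.
toy_df = pd.DataFrame({'item_id': ['a1', 'b2', 'c3']})

# Series.to_dict() maps each index label (here the default 0..n-1 row
# positions) to the value stored in that row.
toy_id_lookup = toy_df.item_id.to_dict()

# Inverting the dictionary gives ID -> row position. This is only safe
# when the IDs are unique; duplicates would silently overwrite each other.
toy_idx_lookup = {v: k for k, v in toy_id_lookup.items()}

print(toy_id_lookup)
print(toy_idx_lookup)
```

The row positions matter because the TF-IDF matrices are built in dataframe row order, so position `i` in a matrix corresponds to key `i` in the lookup.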


## Generate Recommendations

Compute a pairwise similarity matrix where 
- the rows correspond to users, and 
- the columns correspond to items

This can be an expensive computation, especially when the matrices are large: here the dense result holds one score for each of the 11,165 × 4,174 user-item pairs.

In [6]:
similarity_matrix = cosine_similarity(user_word_matrix, item_word_matrix)
print('Similarity matrix:', similarity_matrix.shape)

Similarity matrix: (11165, 4174)
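If the full matrix does not fit in memory, one possible workaround is to process the users in batches and keep only the top-n item indexes per user. The sketch below assumes that only the top-n items are needed; `top_n_in_batches` and its parameters are hypothetical names, not part of the original notebook.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

def top_n_in_batches(user_matrix, item_matrix, n=10, batch_size=1000):
    """Return an (n_users, n) array of item indexes, best match first.

    Only one (batch_size, n_items) block of similarities is held in
    memory at a time, instead of the full (n_users, n_items) matrix.
    """
    n = min(n, item_matrix.shape[0])
    top_idxs = []
    for start in range(0, user_matrix.shape[0], batch_size):
        block = cosine_similarity(user_matrix[start:start + batch_size], item_matrix)
        # Sort each row by decreasing similarity and keep the first n columns.
        top_idxs.append(np.argsort(-block, axis=1)[:, :n])
    return np.vstack(top_idxs)

# Tiny dense example; sparse TF-IDF matrices can be passed the same way.
toy_users = np.array([[1.0, 0.0], [0.0, 1.0], [2.0, 1.0]])
toy_items = np.array([[1.0, 0.0], [0.0, 1.0]])
print(top_n_in_batches(toy_users, toy_items, n=1, batch_size=2))
```

The trade-off is that the similarity scores of the discarded items are no longer available afterwards.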


In [7]:
recommendations = []
recommendation_size = 10

for user_id in df_user_reviews.user_id.unique():
    user_idx = user_idx_lookup[user_id]
    recommended_items = similarity_matrix[user_idx]

    # Sort the item indexes in decreasing order of similarity.
    # NOTE: the slice starts at 1, so it skips the single most similar item
    # and keeps recommendation_size - 1 items (9 here). Use
    # [:recommendation_size] instead to keep the top match.
    recommended_items_idxs = recommended_items.argsort()[::-1][1:recommendation_size]
    
    # Convert those index positions to actual ID values using our lookup table.
    recommended_items_ids = [item_id_lookup[item_idx] 
                             for item_idx in recommended_items_idxs]
    
    # Get the similarity scores for the recommendations.
    sim_scores = similarity_matrix[user_idx][recommended_items_idxs].tolist()

    recommendation = dict(
        user_id=user_id,
        sim_item_ids=','.join(recommended_items_ids),
        sim_scores=','.join([str(s) for s in sim_scores])
    )
    recommendations.append(recommendation)
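The effect of the slice used in the loop above can be checked on a small array. This is a sketch with made-up scores. Starting the slice at 1 is common in item-to-item recommenders, where the best match is the item itself; whether skipping the best match is intended for user-to-item recommendations is not stated in this notebook.

```python
import numpy as np

# Hypothetical similarity scores between one user and five items.
scores = np.array([0.2, 0.9, 0.5, 0.7, 0.1])
n = 3

ranked = scores.argsort()[::-1]   # item indexes, best match first

top_n = ranked[:n]                # keeps the best match, n items
skip_first = ranked[1:n]          # drops the best match, n - 1 items

print(ranked.tolist())
print(top_n.tolist())
print(skip_first.tolist())
```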

Let's convert our recommendations to a DataFrame, because we 💚 DataFrames. The columns are:
- `user_id`: The ID of the user.
- `sim_item_ids`: The comma-separated IDs of the items recommended to the user in `user_id`
- `sim_scores`: The comma-separated similarity scores between those items and the reviews of the user in `user_id`

In [8]:
df_recommendations = pd.DataFrame(recommendations)
display(df_recommendations.head(3))

| | sim_item_ids | sim_scores | user_id |
|---|---|---|---|
| 0 | dYMhfzyZyklXELmYq_wfKg,aT_SsfZ6GQgJGyuIv1Hapw,... | 0.34118541546198344,0.3106378646633447,0.30784... | --8g9UaBe0xQ4FD0q34h_A |
| 1 | 07gh-AImcEcWQ2bew-jprg,Po0QGzgwFrfnGJEi_akCjA,... | 0.4910988867424457,0.48006443950069755,0.46310... | --KQJPdrU0Md97DiOliDzw |
| 2 | XUA7xA7myMaCuN6G3xYdIA,hroo5nOO8b9QhHX0GLg7oA,... | 0.5710409413940867,0.5708601179783902,0.568313... | -0S_XaK3Q_Mesal2Unta2w |
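Storing the recommendations as comma-joined strings is compact, but the values have to be split again before they can be used. A minimal sketch of unpacking them into one row per recommendation follows; the `toy_*` names and IDs are made up, and it assumes that item IDs never contain a comma.

```python
import pandas as pd

# Hypothetical rows in the same shape as df_recommendations (made-up IDs).
toy_recs = pd.DataFrame({
    'user_id': ['u1', 'u2'],
    'sim_item_ids': ['a,b', 'c,d'],
    'sim_scores': ['0.9,0.8', '0.7,0.6'],
})

# Build one (user, item, score) row per recommendation.
long_rows = [
    {'user_id': row.user_id, 'item_id': item_id, 'similarity': float(score)}
    for row in toy_recs.itertuples(index=False)
    for item_id, score in zip(row.sim_item_ids.split(','), row.sim_scores.split(','))
]
toy_long = pd.DataFrame(long_rows, columns=['user_id', 'item_id', 'similarity'])
print(toy_long)
```

Converting the scores with `float` at this point also means they sort numerically later on.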


## Example Recommendations

Now, given any user ID, we should be able to recommend the top-N items (e.g. restaurants) whose reviews are most similar to the words that occur in that user's reviews.

In [9]:
df_restaurants = pd.read_csv('../data/items.csv')

# Let's randomly choose 3 users
random_user_ids = df_user_reviews.sample(3).user_id.values.tolist()

In [10]:
for user_id in random_user_ids:
    print('Generating recommendation for', user_id)
    row = df_recommendations.query('user_id == @user_id')
    df_mlt = pd.DataFrame({
        'item_id': row.sim_item_ids.values[0].split(','),
        'similarities': row.sim_scores.values[0].split(',')
    })

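    columns_to_show = ['item_name', 'categories', 'average_rating', 'similarities']
    # Join on item_id to attach the restaurant details to each recommendation.
    df_biz_info = pd.merge(df_restaurants, df_mlt, on='item_id', how='inner')
    # NOTE: sort_values is ascending by default, so the strongest match is
    # listed last; the similarities are still strings at this point.
    df_biz_info = df_biz_info[columns_to_show].sort_values(
        ['similarities', 'average_rating']
    )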
    display(df_biz_info)

    

Generating recommendation for Uea_V6r3pzynOXm_NoNBBw


| | item_name | categories | average_rating | similarities |
|---|---|---|---|---|
| 2 | First Food & Bar | American (New),American (Traditional),Nightlif... | 3.5 | 0.6572261666326642 |
| 0 | Tom Colicchio's Craftsteak | Steakhouses,Restaurants,Cheesesteaks,Food,Amer... | 4.0 | 0.6580692134279444 |
| 8 | TAO Asian Bistro | Asian Fusion,Bars,Restaurants,Lounges,Nightlife | 3.5 | 0.6598220355972241 |
| 3 | MIX | Bars,French,Lounges,American (New),Nightlife,R... | 4.0 | 0.6605622070949438 |
| 4 | Emeril's New Orleans Fish House | Restaurants,Seafood,American (New) | 3.5 | 0.6609762664729969 |
| 7 | Hash House A Go Go | Breakfast & Brunch,Restaurants,American (New) | 4.0 | 0.664533117190942 |
| 1 | Mon Ami Gabi | Restaurants,Steakhouses,French,Breakfast & Brunch | 4.0 | 0.6656831996134304 |
| 5 | Olives | Bars,American (New),Restaurants,Nightlife,Medi... | 4.0 | 0.66842145655084 |
| 6 | Lotus of Siam | Automotive,Car Dealers,Restaurants,Thai,Bars,W... | 4.0 | 0.6991324083949937 |


Generating recommendation for 7M_JCs91AO4BkXpbpG1VtQ


| | item_name | categories | average_rating | similarities |
|---|---|---|---|---|
| 6 | ARIA Café | Restaurants,Cafes,American (New),Breakfast & B... | 2.5 | 0.1399665924255301 |
| 1 | Ichiza | Japanese,Restaurants | 4.0 | 0.1425943931596048 |
| 4 | Capo's Italian Cuisine | Italian,Restaurants | 4.0 | 0.1461330292879036 |
| 8 | Border Grill | Tapas/Small Plates,Breakfast & Brunch,Mexican,... | 4.0 | 0.1503457971758564 |
| 7 | The Peppermill Restaurant & Fireside Lounge | Lounges,Nightlife,Restaurants,Breakfast & Brun... | 4.0 | 0.1521136195510767 |
| 2 | Firefly | Tapas Bars,Tapas/Small Plates,Restaurants | 4.5 | 0.1555087479908224 |
| 3 | La Feria | Restaurants,Peruvian,Latin American | 4.0 | 0.1575821383788005 |
| 5 | El Gordo Fine Foods | Mexican,Restaurants,Bakeries,Latin American,Fo... | 4.0 | 0.1588149089435419 |
| 0 | Jumbo Empanadas | Restaurants,Latin American | 4.0 | 0.2533590288562571 |


Generating recommendation for ylvQgW8feJeSQTaugGSrfw


| | item_name | categories | average_rating | similarities |
|---|---|---|---|---|
| 7 | Tamba | Halal,Gluten-Free,Indian,Restaurants,Pakistani... | 3.5 | 0.4026303375003038 |
| 2 | Copper | Indian,Restaurants,Pakistani | 4.0 | 0.4145634675478562 |
| 1 | Namaste Indian Cuisine | Indian,Restaurants | 4.0 | 0.4169828113397984 |
| 8 | Samosa Factory | Indian,Vegan,Restaurants,Vegetarian | 4.0 | 0.4261398169789404 |
| 6 | Mount Everest India's Cuisine | Indian,Restaurants | 4.5 | 0.4275723860717317 |
| 5 | India Garden | Pakistani,Indian,Restaurants | 3.5 | 0.4315480248016181 |
| 4 | Tamarind | Restaurants,Indian | 4.0 | 0.4584065578349521 |
| 3 | Origin India Restaurant & Bar | Restaurants,Pakistani,Indian,Nightlife,Vegetar... | 3.5 | 0.4648677720521537 |
| 0 | Mint Indian Bistro | Halal,Restaurants,Indian,Vegan | 4.0 | 0.465648745262431 |
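The tables above list the weakest match first, and the similarity values are strings produced by `str.split(',')`. If a best-first, numerically sorted view is preferred, one option is sketched below on made-up data; the `toy_*` frames are hypothetical stand-ins for `df_restaurants` and `df_mlt`.

```python
import pandas as pd

# Hypothetical stand-ins for df_restaurants and one user's recommendations.
toy_restaurants = pd.DataFrame({
    'item_id': ['a', 'b', 'c'],
    'item_name': ['Steakhouse', 'Thai Kitchen', 'Taco Stand'],
    'average_rating': [4.5, 4.0, 3.5],
})
toy_mlt = pd.DataFrame({
    'item_id': ['b', 'c'],
    'similarities': ['0.3', '5e-05'],   # strings, as produced by str.split(',')
})

# Cast to float so the sort is numeric rather than lexicographic
# (as strings, '5e-05' would sort after '0.3').
toy_mlt['similarities'] = toy_mlt['similarities'].astype(float)

toy_info = (
    pd.merge(toy_restaurants, toy_mlt, on='item_id', how='inner')
      .sort_values(['similarities', 'average_rating'], ascending=False)
      .reset_index(drop=True)
)
print(toy_info[['item_name', 'average_rating', 'similarities']])
```

With `ascending=False`, the item most similar to the user's reviews appears in the first row.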
