## Create the similarity matrix

In 3 simple steps:

1. Create the big users-items table

2. Replace NaNs with zeros

3. Compute pairwise cosine similarities

### 1. Create the big users-items table.

We are just reshaping (pivoting) the data so that users are rows, restaurants are columns, and each cell holds that user's rating of that restaurant. We need the data in this shape to compute similarities between users in step 3.

In [11]:
import pandas as pd

# rating_final.csv
url = 'https://drive.google.com/file/d/1ptu4AlEXO4qQ8GytxKHoeuS1y4l_zWkC/view?usp=sharing' 
path = 'https://drive.google.com/uc?export=download&id='+url.split('/')[-2]
frame = pd.read_csv(path)

frame.head(5)

|   | userID | placeID | rating | food_rating | service_rating |
|---|---|---|---|---|---|
| 0 | U1077 | 135085 | 2 | 2 | 2 |
| 1 | U1077 | 135038 | 2 | 2 | 1 |
| 2 | U1077 | 132825 | 2 | 2 | 2 |
| 3 | U1077 | 135060 | 1 | 2 | 2 |
| 4 | U1068 | 135104 | 1 | 1 | 2 |


In [12]:
# 'geoplaces2.csv'
url = 'https://drive.google.com/file/d/1ee3ib7LqGsMUksY68SD9yBItRvTFELxo/view?usp=sharing' 
path = 'https://drive.google.com/uc?export=download&id='+url.split('/')[-2]
# the file is not UTF-8: CP1252 decodes names such as 'El Rincón' correctly
# ('mbcs' is a Windows-only alternative)
geodata = pd.read_csv(path, encoding='CP1252')

places = geodata[['placeID', 'name']]

places.head(5)

|   | placeID | name |
|---|---|---|
| 0 | 134999 | Kiku Cuernavaca |
| 1 | 132825 | puesto de tacos |
| 2 | 135106 | El Rincón de San Francisco |
| 3 | 132667 | little pizza Emilio Portes Gil |
| 4 | 132613 | carnitas_mata |


In [13]:
users_items = pd.pivot_table(data=frame,
                             values='rating',
                             index='userID',
                             columns='placeID')

users_items.head()

| userID / placeID | 132560 | 132561 | 132564 | 132572 | 132583 | 132584 | 132594 | 132608 | 132609 | 132613 | ... | 135080 | 135081 | 135082 | 135085 | 135086 | 135088 | 135104 | 135106 | 135108 | 135109 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| U1001 |  |  |  |  |  |  |  |  |  |  | ... |  |  |  | 0.0 |  |  |  |  |  |  |
| U1002 |  |  |  |  |  |  |  |  |  |  | ... |  |  |  | 1.0 |  |  |  | 1.0 |  |  |
| U1003 |  |  |  |  |  |  |  |  |  |  | ... | 2.0 |  |  |  |  |  |  |  |  |  |
| U1004 |  |  |  |  |  |  |  |  |  |  | ... |  |  |  |  |  |  |  | 2.0 |  |  |
| U1005 |  |  |  |  |  |  |  |  |  |  | ... |  |  |  |  |  |  |  |  |  |  |
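To see what the pivot does on something small, here is a minimal sketch on a hypothetical three-rating table (the users `u1`, `u2` and places `p1`, `p2` are made up). Each (user, place) pair becomes one cell, and pairs that never occur become `NaN`, which is what the blank cells above are. If a pair occurred more than once, `pivot_table` would average it, since its default `aggfunc` is the mean.

```python
import pandas as pd

# Hypothetical long-format ratings: one row per (user, place) pair
toy = pd.DataFrame({
    "userID": ["u1", "u1", "u2"],
    "placeID": ["p1", "p2", "p1"],
    "rating": [2, 0, 1],
})

# Wide format: users as rows, places as columns, NaN where no rating exists
toy_matrix = pd.pivot_table(data=toy, values="rating",
                            index="userID", columns="placeID")
print(toy_matrix)
```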


### 2. Replace NaNs with zeros
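scikit-learn's `cosine_similarity` raises an error when its input contains NaNs, so we replace every missing rating with 0. A 0 adds nothing to the dot product between two users, which is why it works as a stand-in for "not rated".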

In [2]:
users_items.fillna(0, inplace=True)
users_items.head()

| userID / placeID | 132560 | 132561 | 132564 | 132572 | 132583 | 132584 | 132594 | 132608 | 132609 | 132613 | ... | 135080 | 135081 | 135082 | 135085 | 135086 | 135088 | 135104 | 135106 | 135108 | 135109 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| U1001 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| U1002 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 |
| U1003 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 2.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| U1004 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 2.0 | 0.0 | 0.0 |
| U1005 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
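A minimal NumPy sketch with two made-up rating vectors shows why the zeros matter: a single NaN turns the whole dot product (and therefore the cosine) into NaN, while a 0 simply drops that restaurant from the sum.

```python
import numpy as np

# Two hypothetical users' ratings of three places; NaN = not rated
a = np.array([2.0, np.nan, 1.0])
b = np.array([1.0, 2.0, np.nan])

# A single NaN term makes the whole dot product NaN
print(np.dot(a, b))  # nan

# With NaN replaced by 0, places that either user skipped add nothing:
# only place 0, rated by both users, contributes 2 * 1
a0, b0 = np.nan_to_num(a), np.nan_to_num(b)
print(np.dot(a0, b0))  # 2.0
```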


### 3. Compute cosine similarities

In [20]:
users_items.shape

(138, 130)

In [3]:
from sklearn.metrics.pairwise import cosine_similarity

user_similarities = pd.DataFrame(cosine_similarity(users_items),
                                 columns=users_items.index, 
                                 index=users_items.index)
user_similarities.head()

| userID | U1001 | U1002 | U1003 | U1004 | U1005 | U1006 | U1007 | U1008 | U1009 | U1010 | ... | U1129 | U1130 | U1131 | U1132 | U1133 | U1134 | U1135 | U1136 | U1137 | U1138 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| U1001 | 1.0 | 0.227921 | 0.166957 | 0.0 | 0.059761 | 0.111456 | 0.188982 | 0.0 | 0.106904 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.353553 | 0.0 | 0.083478 | 0.0 | 0.0 | 0.14825 | 0.0 |
| U1002 | 0.227921 | 1.0 | 0.266371 | 0.158362 | 0.095346 | 0.088911 | 0.075378 | 0.0 | 0.426401 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.402911 | 0.0 | 0.199778 | 0.0 | 0.322329 | 0.413919 | 0.355335 |
| U1003 | 0.166957 | 0.266371 | 1.0 | 0.0 | 0.0 | 0.325645 | 0.0 | 0.0 | 0.374817 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.118056 | 0.0 | 0.439024 | 0.0 | 0.059028 | 0.476463 | 0.208232 |
| U1004 | 0.0 | 0.158362 | 0.0 | 1.0 | 0.166091 | 0.07744 | 0.131306 | 0.0 | 0.037139 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.350931 | 0.0 | 0.0 | 0.0 | 0.280745 | 0.103005 | 0.0 |
| U1005 | 0.059761 | 0.095346 | 0.0 | 0.166091 | 1.0 | 0.0 | 0.237171 | 0.0 | 0.0 | 0.447214 | ... | 0.0 | 0.0 | 0.0 | 0.084515 | 0.0 | 0.0 | 0.0 | 0.0 | 0.124035 | 0.0 |
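For two users with rating vectors u and v, the cosine similarity is (u · v) / (‖u‖ ‖v‖). Below is a minimal NumPy sketch of the normalize-then-multiply computation that scikit-learn's `cosine_similarity` performs, on a hypothetical 3×3 zero-filled matrix. Rows without any non-zero rating are left as zeros (as scikit-learn's normalization does), so such a user gets similarity 0 with everyone, including themselves.

```python
import numpy as np

# Hypothetical zero-filled users x places matrix (rows = users)
R = np.array([
    [2.0, 0.0, 1.0],
    [1.0, 2.0, 0.0],
    [0.0, 0.0, 0.0],  # a user with no non-zero ratings
])

# Row norms; use 1 for all-zero rows so they stay zero instead of dividing by 0
norms = np.linalg.norm(R, axis=1, keepdims=True)
norms[norms == 0] = 1.0
unit_rows = R / norms

# Cosine similarity for every pair of users: dot products of unit-length rows
similarities = unit_rows @ unit_rows.T
print(similarities.round(3))
```

Users 0 and 1 share only place 0, so their similarity is (2 · 1) / (√5 · √5) = 0.4.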


## Building the recommender step by step:

Let's focus on a single user (`U1001`, the first user in the table) and compute recommendations only for them. Then we will build a function that computes recommendations for any user. We will follow these steps:

1. Compute the weights, i.e. how similar each other user is to `U1001`.

2. Find restaurants user `U1001` has not rated.

3. Compute the ratings user `U1001` would give to those unrated restaurants.

4. Find the top 5 restaurants from the rating predictions.

### 1. Compute the weights

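Each other user's weight is their cosine similarity to `U1001`, divided by the sum of all those similarities so that the weights add up to 1. We exclude `U1001` itself with `.query()`, since we only want to learn from other users.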

In [31]:
# compute the weights for one user
user_id = "U1001"

# similarities of every other user to U1001
other_similarities = user_similarities.query("userID != @user_id")[user_id]
# normalize so that the weights sum to 1
weights = other_similarities / other_similarities.sum()
weights.head(6)

userID
U1002    0.023329
U1003    0.017089
U1004    0.000000
U1005    0.006117
U1006    0.011408
U1007    0.019343
Name: U1001, dtype: float64

In [5]:
weights.sum()

1.0
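The division above breaks down in one edge case: if a user had no non-zero rating in common with any other user, every similarity would be 0, the sum would be 0, and all weights would come out as NaN. A minimal sketch of a guard, written as a hypothetical helper `normalize_weights` and tried on made-up similarities:

```python
import pandas as pd


def normalize_weights(similarities: pd.Series) -> pd.Series:
    """Hypothetical helper: scale similarities so that they sum to 1.

    Returns all-zero weights instead of NaN when every similarity is 0.
    """
    total = similarities.sum()
    if total == 0:
        return similarities * 0.0
    return similarities / total


# Made-up similarities of three other users to a target user
sims = pd.Series([0.2, 0.6, 0.2], index=["u2", "u3", "u4"])
print(normalize_weights(sims))

# A user with no overlap at all gets zero weights rather than NaN
print(normalize_weights(pd.Series([0.0, 0.0], index=["u2", "u3"])))
```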

### 2. Find restaurants user `U1001` has not rated.

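A restaurant counts as unrated when `U1001`'s entry in `users_items` is 0. We also drop `U1001`'s own row, so that the remaining rows line up with the weights from step 1, which exclude this user. Note that, since NaNs were replaced by zeros, a genuine rating of 0 also counts as unrated: `U1001` rated restaurant 135085 with a 0, so it is kept here and ends up as the top recommendation in step 4.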

In [6]:
users_items.loc[user_id,:]==0

placeID
132560    True
132561    True
132564    True
132572    True
132583    True
          ... 
135088    True
135104    True
135106    True
135108    True
135109    True
Name: U1001, Length: 130, dtype: bool

In [7]:
# select the restaurants the input user has not rated (value 0),
# keeping the rows of all other users
not_visited_restaurants = users_items.loc[users_items.index != user_id, users_items.loc[user_id, :] == 0]
not_visited_restaurants.T

| placeID / userID | U1002 | U1003 | U1004 | U1005 | U1006 | U1007 | U1008 | U1009 | U1010 | U1011 | ... | U1129 | U1130 | U1131 | U1132 | U1133 | U1134 | U1135 | U1136 | U1137 | U1138 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 132560 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 132561 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 132564 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 132572 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 132583 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 135088 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 135104 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 135106 | 1.0 | 0.0 | 2.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 135108 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
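One way around the "rated 0 vs. not rated" ambiguity is to build the unrated mask from the pivot table before `fillna`, while missing ratings are still NaN. A minimal sketch on hypothetical data in which user `u1` gave place `p2` a genuine 0 (all names here are made up):

```python
import pandas as pd

# Hypothetical ratings; u1 gave p2 a genuine rating of 0
toy = pd.DataFrame({
    "userID": ["u1", "u1", "u2", "u2"],
    "placeID": ["p1", "p2", "p2", "p3"],
    "rating": [2, 0, 1, 2],
})
toy_matrix = pd.pivot_table(data=toy, values="rating",
                            index="userID", columns="placeID")

toy_user = "u1"
# Built before fillna: True only where u1 has no rating at all
unrated_mask = toy_matrix.loc[toy_user].isna()
# Built after fillna, as in the walkthrough: also flags u1's genuine 0 for p2
zero_mask = toy_matrix.fillna(0).loc[toy_user] == 0

print(list(unrated_mask[unrated_mask].index))  # ['p3']
print(list(zero_mask[zero_mask].index))        # ['p2', 'p3']
```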


### 3. Compute the ratings user `U1001` would give to those unrated restaurants.

In [8]:
# dot product between the not-visited-restaurants and the weights
weighted_averages = pd.DataFrame(not_visited_restaurants.T.dot(weights), columns=["predicted_rating"])
weighted_averages.head()

| placeID | predicted_rating |
|---|---|
| 132560 | 0.0 |
| 132561 | 0.0 |
| 132564 | 0.0 |
| 132572 | 0.193427 |
| 132583 | 0.0 |
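The dot product above computes, for each unrated restaurant i, predicted(i) = Σ_v w_v · r_v,i: a weighted average of the other users' ratings r_v,i, using the weights w_v from step 1. `DataFrame.dot` matches the columns of `not_visited_restaurants.T` to the index of `weights` by label (and raises an error if they differ), which is why `U1001`'s row was dropped in step 2. A minimal sketch with made-up numbers:

```python
import pandas as pd

# Made-up weights of three other users (they sum to 1)
toy_weights = pd.Series([0.5, 0.3, 0.2], index=["u2", "u3", "u4"])

# Made-up zero-filled ratings of two unrated places (rows = other users)
toy_others = pd.DataFrame(
    {"p1": [2.0, 1.0, 0.0], "p2": [0.0, 0.0, 2.0]},
    index=["u2", "u3", "u4"],
)

# toy_others.T has users as columns; .dot aligns them with toy_weights.index
predicted = toy_others.T.dot(toy_weights)
print(predicted)
# p1: 0.5*2 + 0.3*1 + 0.2*0 = 1.3
# p2: 0.5*0 + 0.3*0 + 0.2*2 = 0.4
```

Since unrated entries are 0, a restaurant that only a few similar users have rated gets a low score even if those users rated it highly.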


### 4. Find the top 5 restaurants from the rating predictions

In [9]:
recommendations = weighted_averages.merge(places, left_index=True, right_on="placeID")
recommendations.sort_values("predicted_rating", ascending=False).head()

|     | predicted_rating | placeID | name |
|---|---|---|---|
| 121 | 0.878773 | 135085 | Tortas Locas Hipocampo |
| 65 | 0.742529 | 135052 | La Cantina Restaurante |
| 119 | 0.622755 | 135032 | Cafeteria y Restaurant El Pacifico |
| 60 | 0.549689 | 135038 | Restaurant la Chalita |
| 113 | 0.495248 | 135062 | Restaurante El Cielo Potosino |
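`DataFrame.nlargest` is a shorter way to write "sort descending, then take the first rows". A minimal sketch on made-up predictions and a made-up name table (the place IDs and names are hypothetical):

```python
import pandas as pd

# Made-up predictions indexed by placeID, and a made-up table of names
predicted = pd.DataFrame({"predicted_rating": [0.2, 0.9, 0.5]},
                         index=pd.Index([10, 11, 12], name="placeID"))
names = pd.DataFrame({"placeID": [10, 11, 12], "name": ["A", "B", "C"]})

# Attach the names, then keep the 2 highest predicted ratings
top = (predicted.merge(names, left_index=True, right_on="placeID")
       .nlargest(2, "predicted_rating"))
print(top)
```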


### Challenge:

1. Write a function that recommends the top `n` restaurants for a given `userID`.

2. Adapt the function to the movies dataset.

In [48]:
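n = 10
user_id = 'U1001'


def recommendations(n, user_id):
    """Return the top `n` restaurants predicted for `user_id`.

    Uses the global `frame` (ratings) and `places` (names) DataFrames
    loaded above.
    """
    # users x restaurants matrix, with 0 for missing ratings
    users_items = pd.pivot_table(data=frame,
                                 values='rating',
                                 index='userID',
                                 columns='placeID')
    users_items.fillna(0, inplace=True)

    # pairwise cosine similarities between users
    user_similarities = pd.DataFrame(cosine_similarity(users_items),
                                     columns=users_items.index,
                                     index=users_items.index)

    # similarities of all other users to `user_id`, normalized to sum to 1
    other_similarities = user_similarities.query('userID != @user_id')[user_id]
    weights = other_similarities / other_similarities.sum()

    # other users' ratings of the restaurants `user_id` has not rated (value 0)
    not_visited_restaurants = users_items.loc[users_items.index != user_id,
                                              users_items.loc[user_id, :] == 0]

    # predicted rating = weighted average of the other users' ratings
    weighted_averages = pd.DataFrame(not_visited_restaurants.T.dot(weights),
                                     columns=['predicted_rating'])

    # attach restaurant names and keep the n best predictions
    recommendations_for_user = weighted_averages.merge(places,
                                                       left_index=True,
                                                       right_on='placeID')
    return (recommendations_for_user
            .sort_values('predicted_rating', ascending=False)
            .head(n))


recommendations(n, user_id)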

|     | predicted_rating | placeID | name |
|---|---|---|---|
| 121 | 0.878773 | 135085 | Tortas Locas Hipocampo |
| 65 | 0.742529 | 135052 | La Cantina Restaurante |
| 119 | 0.622755 | 135032 | Cafeteria y Restaurant El Pacifico |
| 60 | 0.549689 | 135038 | Restaurant la Chalita |
| 113 | 0.495248 | 135062 | Restaurante El Cielo Potosino |
| 80 | 0.438097 | 132834 | Gorditas Doa Gloria |
| 54 | 0.431464 | 135060 | Restaurante Marisco Sam |
| 25 | 0.414378 | 135042 | Restaurant Oriental Express |
| 120 | 0.406709 | 135028 | La Virreina |
| 35 | 0.384146 | 135030 | Preambulo Wifi Zone Cafe |
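The function above reproduces the walkthrough, including its treatment of a 0 rating as "not rated" (which is why restaurant 135085, rated 0 by `U1001`, still tops the list). Below is a minimal sketch of a variant under different assumptions: the hypothetical `recommend_restaurants` takes the ratings and names as arguments instead of reading globals, finds unrated restaurants from the NaNs before filling, returns an empty result when the user shares no non-zero rating with anyone, and computes cosine similarity with NumPy (hypothetical helper `cosine_similarity_frame`) rather than scikit-learn. It is only tried here on a made-up four-restaurant dataset.

```python
import numpy as np
import pandas as pd


def cosine_similarity_frame(matrix: pd.DataFrame) -> pd.DataFrame:
    """Cosine similarity between every pair of rows, labelled by the row index."""
    values = matrix.to_numpy(dtype=float)
    norms = np.linalg.norm(values, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # all-zero rows stay zero instead of dividing by 0
    unit_rows = values / norms
    return pd.DataFrame(unit_rows @ unit_rows.T,
                        index=matrix.index, columns=matrix.index)


def recommend_restaurants(ratings_long: pd.DataFrame, places: pd.DataFrame,
                          user_id: str, n: int = 10) -> pd.DataFrame:
    """Hypothetical variant: top-n restaurants that `user_id` has not rated yet."""
    ratings = pd.pivot_table(data=ratings_long, values="rating",
                             index="userID", columns="placeID")
    # Unrated = NaN *before* filling, so a genuine 0 rating is not recommended again
    unrated = ratings.loc[user_id].isna()
    filled = ratings.fillna(0)

    # Similarities of every other user to user_id, normalized to sum to 1
    similarities = cosine_similarity_frame(filled)[user_id].drop(user_id)
    total = similarities.sum()
    if total == 0:  # no overlap with anyone: nothing to base a prediction on
        return pd.DataFrame(columns=["predicted_rating", "placeID", "name"])
    weights = similarities / total

    # Weighted average of the other users' ratings for each unrated restaurant
    others = filled.loc[weights.index, unrated[unrated].index]
    predicted = others.T.dot(weights).rename("predicted_rating").to_frame()

    return (predicted.merge(places, left_index=True, right_on="placeID")
            .sort_values("predicted_rating", ascending=False)
            .head(n))


# Made-up data: u1 rated place 2 with a genuine 0, so it must not come back
toy_ratings = pd.DataFrame({
    "userID": ["u1", "u1", "u2", "u2", "u2", "u3", "u3"],
    "placeID": [1, 2, 1, 2, 3, 1, 4],
    "rating": [2, 0, 2, 1, 2, 1, 2],
})
toy_places = pd.DataFrame({"placeID": [1, 2, 3, 4],
                           "name": ["Place A", "Place B", "Place C", "Place D"]})

top = recommend_restaurants(toy_ratings, toy_places, "u1", n=2)
print(top)
```

On the real data this would be called as `recommend_restaurants(frame, places, 'U1001', n=10)`; its output there has not been checked.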
