**Collaborative Filtering**

Collaborative methods work with the interaction matrix, also called the rating matrix in the rare case where users provide explicit ratings of items. The machine learning task is to learn a function that predicts the utility of each item to each user. The matrix is typically huge and very sparse: most of its values are missing.


Collaborative filtering methods collect and analyze a large amount of information on user behaviors, activities, or preferences, and predict what users will like based on their similarity to other users.

The fundamental assumption behind collaborative filtering is that similarities in user preferences can be exploited to recommend items to a user who has not yet seen or used them. In simpler terms, we assume that users who agreed in the past (purchased the same product or viewed the same movie) will agree in the future.

**The aim is to find all of these user and item dependencies in the matrix.**

Example (factoring a number, by analogy with factoring a matrix):

24 = 8 * 3 <br>
     ^   ^ <br>
     |   | <br>
     factors
     
<b>Matrix Factorization</b>: Matrix factorization finds two smaller rectangular matrices whose product approximates the big rating matrix (RM). These factors retain the dependencies and properties of the rating matrix. One matrix can be seen as the user matrix (UM), where rows represent users and columns are k latent factors. The other is the item matrix (IM), where rows are k latent factors and columns represent items. Here k is smaller than both the number of items and the number of users.
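
In symbols, using the abbreviations above (the dimension names m and n and the entry notation are introduced here only for illustration), with m users and n items:

$$
\mathrm{RM}_{m \times n} \approx \mathrm{UM}_{m \times k}\,\mathrm{IM}_{k \times n},
\qquad
\hat{r}_{ui} = \sum_{f=1}^{k} \mathrm{UM}_{uf}\,\mathrm{IM}_{fi}
$$

The two factors together hold k(m + n) numbers instead of mn. As a rough sense of scale, with 1,000 users, 500 items, and k = 10, that is 15,000 values in place of 500,000.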

My Approach:

- 1) Creating the user-restaurant matrix
- 2) Checking for sparsity
- 3) Handling sparsity using matrix factorization, where gradient descent is applied to lower the RMSE and fill in the missing values in the matrix.
- 4) Applying cosine similarity to calculate similarity scores between users.
- 5) Applying top-K nearest neighbours to find the top K recommended restaurants for the queried user ID.
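
Step 3 can be sketched roughly as follows. This is a minimal illustration on a made-up 4 x 4 matrix, not the code used in this notebook: the function names `factorize` and `rmse`, the toy ratings, and the hyperparameters (`k`, `steps`, `lr`, `reg`) are all assumptions chosen for the example. It runs plain full-batch gradient descent on the squared error over the observed cells only.

```python
import numpy as np

def factorize(ratings, k=2, steps=2000, lr=0.01, reg=0.02, seed=0):
    """Fit ratings ~ users @ items using only the observed (non-NaN) cells."""
    rng = np.random.default_rng(seed)
    n_users, n_items = ratings.shape
    users = rng.normal(scale=0.1, size=(n_users, k))   # user matrix (UM)
    items = rng.normal(scale=0.1, size=(k, n_items))   # item matrix (IM)
    observed = ~np.isnan(ratings)
    filled = np.where(observed, ratings, 0.0)
    for _ in range(steps):
        # Error on observed cells only; missing cells contribute nothing
        err = np.where(observed, filled - users @ items, 0.0)
        # Gradients of the squared error with a small L2 penalty
        users_grad = err @ items.T - reg * users
        items_grad = users.T @ err - reg * items
        users += lr * users_grad
        items += lr * items_grad
    return users, items

def rmse(ratings, users, items):
    """Root mean squared error over the observed cells."""
    observed = ~np.isnan(ratings)
    diff = (ratings - users @ items)[observed]
    return float(np.sqrt(np.mean(diff ** 2)))

# Toy user x item matrix; NaN marks a missing rating
toy = np.array([[5.0, 3.0, np.nan, 1.0],
                [4.0, np.nan, np.nan, 1.0],
                [1.0, 1.0, np.nan, 5.0],
                [np.nan, 1.0, 5.0, 4.0]])

users, items = factorize(toy)
completed = users @ items          # every cell now has a predicted value
print(round(rmse(toy, users, items), 3))
```

The product `users @ items` has no missing cells, which is what "filling in" the matrix means here; the RMSE is measured only on cells that had a rating to begin with.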

## Importing necessary settings and modules

In [30]:
import pandas as pd
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.metrics import pairwise_distances

In [13]:
bus_reviews = pd.read_csv('output/business_review.csv',encoding = "ISO-8859-1",index_col=0)

In [14]:
bus_reviews.head()

,review_id,user_id,business_id,stars,name,restuarant_stars,review_count,attributes,categories,city
0,3bMgLXMLzm89C_0mkbIFOA,j9hC9EmCsS2S2ZtbsK-l0g,5Q4Gw1pyZnG8IlFNozxIlw,2,Native New Yorker Restaurant,3.0,22,"{'RestaurantsTakeOut': 'True', 'Ambience': ""{'...",Restaurants,Gilbert
1,slhog3p6YaoVEej3USo2Iw,IJ1wbXUh_B5Yn6U1YWHy4g,5Q4Gw1pyZnG8IlFNozxIlw,1,Native New Yorker Restaurant,3.0,22,"{'RestaurantsTakeOut': 'True', 'Ambience': ""{'...",Restaurants,Gilbert
2,noxz5btWWJjSpbUBmregVQ,_L-JKT5OahgimFlVOo708w,5Q4Gw1pyZnG8IlFNozxIlw,4,Native New Yorker Restaurant,3.0,22,"{'RestaurantsTakeOut': 'True', 'Ambience': ""{'...",Restaurants,Gilbert
3,gmiMXHSb6loIDxopmOWHSA,x_I7IDsFeT4vVcEBZLqr7A,5Q4Gw1pyZnG8IlFNozxIlw,5,Native New Yorker Restaurant,3.0,22,"{'RestaurantsTakeOut': 'True', 'Ambience': ""{'...",Restaurants,Gilbert
4,xfgHmu5n7cg-uNoU2C1lZg,ceLFSre4hrzkT5VpSlt2Lg,5Q4Gw1pyZnG8IlFNozxIlw,3,Native New Yorker Restaurant,3.0,22,"{'RestaurantsTakeOut': 'True', 'Ambience': ""{'...",Restaurants,Gilbert


## User based collaborative filtering

While doing exploratory analysis, we saw that Phoenix has more restaurant reviews than any other city. So, we will keep the data for Phoenix only.

In [15]:
bus_reviews = bus_reviews[bus_reviews['city'] == 'Phoenix']

In [16]:
bus_reviews

,review_id,user_id,business_id,stars,name,restuarant_stars,review_count,attributes,categories,city
34,6FGgykl5quqggNouj7LE5A,Ndssfx9LA9WpIIvj67wJuw,DkaPGavPjjw9DWUmrZ19IQ,1,Church's Fried Chicken,2.5,8,"{'WiFi': ""u'no'"", 'RestaurantsGoodForGroups': ...",Restaurants,Phoenix
35,tuSFEbAFKTBZMFmOA3-Veg,i6NaEP-UmQ6hNiyBpYMQVA,DkaPGavPjjw9DWUmrZ19IQ,1,Church's Fried Chicken,2.5,8,"{'WiFi': ""u'no'"", 'RestaurantsGoodForGroups': ...",Restaurants,Phoenix
36,pbWi-Co8DC97sjtw06scEQ,OsFWc7PMDDACG9MMit7kGQ,DkaPGavPjjw9DWUmrZ19IQ,1,Church's Fried Chicken,2.5,8,"{'WiFi': ""u'no'"", 'RestaurantsGoodForGroups': ...",Restaurants,Phoenix
37,pGIl_9uxupp45yl8X_VIcw,oS0Z4spgZmf73OGw1YkLmA,DkaPGavPjjw9DWUmrZ19IQ,1,Church's Fried Chicken,2.5,8,"{'WiFi': ""u'no'"", 'RestaurantsGoodForGroups': ...",Restaurants,Phoenix
38,KMl_ym1NfhtRR_rUTYxsaw,LqU99zFTGxAqsNZwkj69gQ,DkaPGavPjjw9DWUmrZ19IQ,4,Church's Fried Chicken,2.5,8,"{'WiFi': ""u'no'"", 'RestaurantsGoodForGroups': ...",Restaurants,Phoenix
...,...,...,...,...,...,...,...,...,...,...
719,mHErAneJh1C3prKagRHXhg,NUycvb2CD5f4sel9XXA-MQ,SjxdlHEREXRcFE8xQCGBpQ,4,Qween Creek Olive Mill,3.0,5,{'HasTV': 'True'},Restaurants,Phoenix
720,aMPIl2nC2Oyvvl_KbiIQ1A,EtySygJ2sep1ELBBCaJgMQ,SjxdlHEREXRcFE8xQCGBpQ,1,Qween Creek Olive Mill,3.0,5,{'HasTV': 'True'},Restaurants,Phoenix
721,qZ5q2GwGwmX_S8fyxq1Fjg,1bYuoztlpZuYneVTCrRLzQ,SjxdlHEREXRcFE8xQCGBpQ,1,Qween Creek Olive Mill,3.0,5,{'HasTV': 'True'},Restaurants,Phoenix
722,YiJaYu00BXgNLeuT5pl7Dw,j_aAOJmTGn7DMhjqpNE14A,SjxdlHEREXRcFE8xQCGBpQ,5,Qween Creek Olive Mill,3.0,5,{'HasTV': 'True'},Restaurants,Phoenix


In [22]:
check = pd.pivot_table(bus_reviews,values='stars',index='user_id',columns='business_id')
check.head()

business_id,2eL6ete3g0EtVpuRx2sF9w,DkaPGavPjjw9DWUmrZ19IQ,EX3wTLh4gOVF7RQ0ulGYPw,JL42sdCzD5WV_gkNWublxA,SjxdlHEREXRcFE8xQCGBpQ,rPY4bukI1QdfJc-1OCAwLg
user_id,,,,,,
0rn5E5l1tVZZXlhTbU340g,,,,,,1.0
193UVQE-VD2gxYx4-bkzGw,,,,,,2.0
1bYuoztlpZuYneVTCrRLzQ,,,,,1.0,
1v_W1gC_vZo_X_dTVCwXvg,,,,,,1.0
1yw3JgmQQxwP662FOLZrWg,,5.0,,,,
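
The sparsity check from step 2 can be expressed as the fraction of empty cells in the pivoted matrix. A minimal sketch on a made-up array (the helper name `sparsity` and the toy values are assumptions for illustration):

```python
import numpy as np

def sparsity(matrix):
    """Fraction of cells in a user-item matrix that have no rating (NaN)."""
    return float(np.isnan(matrix).mean())

# Toy 3-user x 4-restaurant matrix; NaN marks "no review"
toy = np.array([[np.nan, 5.0, np.nan, np.nan],
                [1.0, np.nan, np.nan, np.nan],
                [np.nan, np.nan, np.nan, 2.0]])

print(sparsity(toy))  # 9 of 12 cells are missing -> 0.75
```

For the pivot table above, the same helper could be applied as `sparsity(check.to_numpy())`.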


In [24]:
# Replacing NaN by business (item) average
final_business = check.fillna(check.mean(axis=0))

# Replacing NaN by user Average
final_user = check.apply(lambda row: row.fillna(row.mean()), axis=1)

In [25]:
final_business

business_id,2eL6ete3g0EtVpuRx2sF9w,DkaPGavPjjw9DWUmrZ19IQ,EX3wTLh4gOVF7RQ0ulGYPw,JL42sdCzD5WV_gkNWublxA,SjxdlHEREXRcFE8xQCGBpQ,rPY4bukI1QdfJc-1OCAwLg
user_id,,,,,,
0rn5E5l1tVZZXlhTbU340g,1.0,2.625,2.916667,4.0,3.0,1.000000
193UVQE-VD2gxYx4-bkzGw,1.0,2.625,2.916667,4.0,3.0,2.000000
1bYuoztlpZuYneVTCrRLzQ,1.0,2.625,2.916667,4.0,1.0,2.208333
1v_W1gC_vZo_X_dTVCwXvg,1.0,2.625,2.916667,4.0,3.0,1.000000
1yw3JgmQQxwP662FOLZrWg,1.0,5.000,2.916667,4.0,3.0,2.208333
...,...,...,...,...,...,...
wMRM_m-FgorKlNhFZqpFxQ,1.0,2.625,2.916667,5.0,3.0,2.208333
xgI1C_yELWMragW3ynmzKQ,1.0,2.625,4.000000,4.0,3.0,2.208333
y2n8rM-M4mgUrhCObgdzIA,1.0,2.625,2.916667,4.0,3.0,1.000000
z_tqPytGQF_tDw_buHKJcw,1.0,2.625,4.000000,4.0,3.0,2.208333


In [27]:
final_user

business_id,2eL6ete3g0EtVpuRx2sF9w,DkaPGavPjjw9DWUmrZ19IQ,EX3wTLh4gOVF7RQ0ulGYPw,JL42sdCzD5WV_gkNWublxA,SjxdlHEREXRcFE8xQCGBpQ,rPY4bukI1QdfJc-1OCAwLg
user_id,,,,,,
0rn5E5l1tVZZXlhTbU340g,1.0,1.0,1.0,1.0,1.0,1.0
193UVQE-VD2gxYx4-bkzGw,2.0,2.0,2.0,2.0,2.0,2.0
1bYuoztlpZuYneVTCrRLzQ,1.0,1.0,1.0,1.0,1.0,1.0
1v_W1gC_vZo_X_dTVCwXvg,1.0,1.0,1.0,1.0,1.0,1.0
1yw3JgmQQxwP662FOLZrWg,5.0,5.0,5.0,5.0,5.0,5.0
...,...,...,...,...,...,...
wMRM_m-FgorKlNhFZqpFxQ,5.0,5.0,5.0,5.0,5.0,5.0
xgI1C_yELWMragW3ynmzKQ,4.0,4.0,4.0,4.0,4.0,4.0
y2n8rM-M4mgUrhCObgdzIA,1.0,1.0,1.0,1.0,1.0,1.0
z_tqPytGQF_tDw_buHKJcw,4.0,4.0,4.0,4.0,4.0,4.0


In [31]:
# user similarity on replacing NAN by user avg
b = cosine_similarity(final_user)
np.fill_diagonal(b, 0 )
similarity_with_user = pd.DataFrame(b,index=final_user.index)
similarity_with_user.columns=final_user.index
similarity_with_user.head()

user_id,0rn5E5l1tVZZXlhTbU340g,193UVQE-VD2gxYx4-bkzGw,1bYuoztlpZuYneVTCrRLzQ,1v_W1gC_vZo_X_dTVCwXvg,1yw3JgmQQxwP662FOLZrWg,2BF4W15wdzpaQQUKFsQe9Q,4bxhIKz9ePWqD93UkEjnbg,4qAD7LR2gsGgX3oNgeTIXQ,60JrvkRZ8v58-XkZ8B-CQw,7M1zIE6OzpySDlqLU6MnEg,...,p7HMe4REUwSHn9in8mUQMQ,qX_jPq81Eoug6JUPDtJigg,rATOBMt4cvaHb1m81fmvbg,t4jgQSUfubezTIZvTqn0tQ,v8DIR4v2PoH6CSHiRAci7g,wMRM_m-FgorKlNhFZqpFxQ,xgI1C_yELWMragW3ynmzKQ,y2n8rM-M4mgUrhCObgdzIA,z_tqPytGQF_tDw_buHKJcw,zj2ff8z23ejDFDAsEcs_BA
0rn5E5l1tVZZXlhTbU340g,0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0
193UVQE-VD2gxYx4-bkzGw,1.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0
1bYuoztlpZuYneVTCrRLzQ,1.0,1.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0
1v_W1gC_vZo_X_dTVCwXvg,1.0,1.0,1.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0
1yw3JgmQQxwP662FOLZrWg,1.0,1.0,1.0,1.0,0.0,1.0,1.0,1.0,1.0,1.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0
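
The all-ones result above follows from the fill strategy rather than from the users' tastes: in the rows shown, filling by the user's own average produces constant rows, and cosine similarity ignores magnitude, so any two positive constant vectors have similarity 1. A small sketch of that effect (the helper `cosine` and the variable names are assumptions for illustration; the second pair of rows is copied from the `final_business` output above):

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Rows filled with the user's own mean are constant vectors
harsh = np.full(6, 1.0)      # a user whose only rating was 1
generous = np.full(6, 5.0)   # a user whose only rating was 5
print(cosine(harsh, generous))   # 1.0 up to floating-point rounding

# Rows filled with the business mean keep some variation between users
row_a = np.array([1.0, 2.625, 2.916667, 4.0, 3.0, 1.0])
row_b = np.array([1.0, 5.0, 2.916667, 4.0, 3.0, 2.208333])
print(round(cosine(row_a, row_b), 4))   # about 0.954
```

So with user-average filling, a user who rated everything 1 looks identical to one who rated everything 5, and the neighbour ranking built from that matrix carries no real signal.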


In [36]:
# user similarity on replacing NAN by item(business) avg
cosine = cosine_similarity(final_business)
np.fill_diagonal(cosine, 0 )
similarity_with_business = pd.DataFrame(cosine,index=final_business.index)
similarity_with_business.columns=final_business.index
similarity_with_business.head()

user_id,0rn5E5l1tVZZXlhTbU340g,193UVQE-VD2gxYx4-bkzGw,1bYuoztlpZuYneVTCrRLzQ,1v_W1gC_vZo_X_dTVCwXvg,1yw3JgmQQxwP662FOLZrWg,2BF4W15wdzpaQQUKFsQe9Q,4bxhIKz9ePWqD93UkEjnbg,4qAD7LR2gsGgX3oNgeTIXQ,60JrvkRZ8v58-XkZ8B-CQw,7M1zIE6OzpySDlqLU6MnEg,...,p7HMe4REUwSHn9in8mUQMQ,qX_jPq81Eoug6JUPDtJigg,rATOBMt4cvaHb1m81fmvbg,t4jgQSUfubezTIZvTqn0tQ,v8DIR4v2PoH6CSHiRAci7g,wMRM_m-FgorKlNhFZqpFxQ,xgI1C_yELWMragW3ynmzKQ,y2n8rM-M4mgUrhCObgdzIA,z_tqPytGQF_tDw_buHKJcw,zj2ff8z23ejDFDAsEcs_BA
0rn5E5l1tVZZXlhTbU340g,0.0,0.989188,0.933538,1.0,0.953942,0.874478,0.96047,0.963084,0.989188,0.934044,...,0.974364,0.937688,0.983397,0.972696,0.983397,0.983397,0.979483,1.0,0.979483,1.0
193UVQE-VD2gxYx4-bkzGw,0.989188,0.0,0.955143,0.989188,0.962731,0.93616,0.990912,0.972088,1.0,0.958638,...,0.98566,0.958816,0.994433,0.990722,0.994433,0.994433,0.991264,0.989188,0.991264,0.989188
1bYuoztlpZuYneVTCrRLzQ,0.933538,0.955143,0.0,0.933538,0.936898,0.921203,0.956807,0.945661,0.955143,0.891085,...,0.953144,0.900415,0.962579,0.940424,0.962579,0.962579,0.957452,0.933538,0.957452,0.933538
1v_W1gC_vZo_X_dTVCwXvg,1.0,0.989188,0.933538,0.0,0.953942,0.874478,0.96047,0.963084,0.989188,0.934044,...,0.974364,0.937688,0.983397,0.972696,0.983397,0.983397,0.979483,1.0,0.979483,1.0
1yw3JgmQQxwP662FOLZrWg,0.953942,0.962731,0.936898,0.953942,0.0,0.897385,0.952495,0.921578,0.962731,0.947485,...,0.994462,0.939122,0.947252,0.961012,0.947252,0.947252,0.946153,0.953942,0.946153,0.953942


In [33]:
def find_n_neighbours(df, n):
    # For each row (user), return the labels of the n most similar users,
    # ordered from most to least similar, in columns top1..topn.
    top_cols = ['top{}'.format(i) for i in range(1, n + 1)]
    return df.apply(
        lambda x: pd.Series(x.sort_values(ascending=False).iloc[:n].index,
                            index=top_cols),
        axis=1)
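
For larger similarity matrices, the same ranking can be computed with NumPy directly instead of a row-wise `apply`. A minimal sketch, assuming a square similarity array and a parallel list of labels (`top_n_labels`, `sim`, and `labels` are hypothetical names, not part of this notebook); tied similarities may come out in a different order than with `sort_values`:

```python
import numpy as np

def top_n_labels(sim, labels, n):
    """Return, per row, the labels of the n largest similarities (descending)."""
    labels = np.asarray(labels)
    # argsort on the negated matrix orders each row from largest to smallest
    order = np.argsort(-sim, axis=1, kind="stable")[:, :n]
    return labels[order]

sim = np.array([[0.0, 0.9, 0.2],
                [0.9, 0.0, 0.5],
                [0.2, 0.5, 0.0]])
result = top_n_labels(sim, ["u1", "u2", "u3"], 2)
print(result)
```

Each row of `result` lists the two nearest neighbours of the corresponding user; the zeroed diagonal keeps a user from being returned as their own neighbour.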

In [34]:
# top 30 neighbours for each user
sim_user_30_u = find_n_neighbours(similarity_with_user,30)
sim_user_30_u.head()

user_id,top1,top2,top3,top4,top5,top6,top7,top8,top9,top10,...,top21,top22,top23,top24,top25,top26,top27,top28,top29,top30
0rn5E5l1tVZZXlhTbU340g,zj2ff8z23ejDFDAsEcs_BA,NUycvb2CD5f4sel9XXA-MQ,N8KY5W3UXAZrAGMYO2IDFw,N-Sy8I9byR2Q2gTI99nKxQ,LqU99zFTGxAqsNZwkj69gQ,Lasp35EpoURyJz5vhLlVHQ,HyeG4D_IL4oGA1NHg57D6A,F2LCVynfNbxG7IMqe1UO-A,Eu7rIZOFDUVb4XrOwReaZA,EtySygJ2sep1ELBBCaJgMQ,...,7M1zIE6OzpySDlqLU6MnEg,60JrvkRZ8v58-XkZ8B-CQw,4qAD7LR2gsGgX3oNgeTIXQ,4bxhIKz9ePWqD93UkEjnbg,2BF4W15wdzpaQQUKFsQe9Q,1yw3JgmQQxwP662FOLZrWg,1v_W1gC_vZo_X_dTVCwXvg,1bYuoztlpZuYneVTCrRLzQ,193UVQE-VD2gxYx4-bkzGw,NNXx0ccFbBLykwbmq9uqMQ
193UVQE-VD2gxYx4-bkzGw,zj2ff8z23ejDFDAsEcs_BA,z_tqPytGQF_tDw_buHKJcw,NNXx0ccFbBLykwbmq9uqMQ,N8KY5W3UXAZrAGMYO2IDFw,N-Sy8I9byR2Q2gTI99nKxQ,LqU99zFTGxAqsNZwkj69gQ,Lasp35EpoURyJz5vhLlVHQ,HyeG4D_IL4oGA1NHg57D6A,F2LCVynfNbxG7IMqe1UO-A,Eu7rIZOFDUVb4XrOwReaZA,...,8mtioD5uNnlVl-kHrNE4Iw,7M1zIE6OzpySDlqLU6MnEg,60JrvkRZ8v58-XkZ8B-CQw,4qAD7LR2gsGgX3oNgeTIXQ,4bxhIKz9ePWqD93UkEjnbg,2BF4W15wdzpaQQUKFsQe9Q,1yw3JgmQQxwP662FOLZrWg,1v_W1gC_vZo_X_dTVCwXvg,1bYuoztlpZuYneVTCrRLzQ,NUycvb2CD5f4sel9XXA-MQ
1bYuoztlpZuYneVTCrRLzQ,zj2ff8z23ejDFDAsEcs_BA,z_tqPytGQF_tDw_buHKJcw,NNXx0ccFbBLykwbmq9uqMQ,N8KY5W3UXAZrAGMYO2IDFw,N-Sy8I9byR2Q2gTI99nKxQ,LqU99zFTGxAqsNZwkj69gQ,Lasp35EpoURyJz5vhLlVHQ,HyeG4D_IL4oGA1NHg57D6A,F2LCVynfNbxG7IMqe1UO-A,Eu7rIZOFDUVb4XrOwReaZA,...,8mtioD5uNnlVl-kHrNE4Iw,7M1zIE6OzpySDlqLU6MnEg,60JrvkRZ8v58-XkZ8B-CQw,4qAD7LR2gsGgX3oNgeTIXQ,4bxhIKz9ePWqD93UkEjnbg,2BF4W15wdzpaQQUKFsQe9Q,1yw3JgmQQxwP662FOLZrWg,1v_W1gC_vZo_X_dTVCwXvg,193UVQE-VD2gxYx4-bkzGw,NUycvb2CD5f4sel9XXA-MQ
1v_W1gC_vZo_X_dTVCwXvg,zj2ff8z23ejDFDAsEcs_BA,z_tqPytGQF_tDw_buHKJcw,NNXx0ccFbBLykwbmq9uqMQ,N8KY5W3UXAZrAGMYO2IDFw,N-Sy8I9byR2Q2gTI99nKxQ,LqU99zFTGxAqsNZwkj69gQ,Lasp35EpoURyJz5vhLlVHQ,HyeG4D_IL4oGA1NHg57D6A,F2LCVynfNbxG7IMqe1UO-A,Eu7rIZOFDUVb4XrOwReaZA,...,8mtioD5uNnlVl-kHrNE4Iw,7M1zIE6OzpySDlqLU6MnEg,60JrvkRZ8v58-XkZ8B-CQw,4qAD7LR2gsGgX3oNgeTIXQ,4bxhIKz9ePWqD93UkEjnbg,2BF4W15wdzpaQQUKFsQe9Q,1yw3JgmQQxwP662FOLZrWg,1bYuoztlpZuYneVTCrRLzQ,193UVQE-VD2gxYx4-bkzGw,NUycvb2CD5f4sel9XXA-MQ
1yw3JgmQQxwP662FOLZrWg,zj2ff8z23ejDFDAsEcs_BA,Duv-ckttNBl_26Q9GynGGQ,OsFWc7PMDDACG9MMit7kGQ,z_tqPytGQF_tDw_buHKJcw,NUycvb2CD5f4sel9XXA-MQ,NNXx0ccFbBLykwbmq9uqMQ,N8KY5W3UXAZrAGMYO2IDFw,LqU99zFTGxAqsNZwkj69gQ,Lasp35EpoURyJz5vhLlVHQ,HyeG4D_IL4oGA1NHg57D6A,...,60JrvkRZ8v58-XkZ8B-CQw,1v_W1gC_vZo_X_dTVCwXvg,1bYuoztlpZuYneVTCrRLzQ,193UVQE-VD2gxYx4-bkzGw,PJkVXGBMk3N6i31XahdzkA,Ndssfx9LA9WpIIvj67wJuw,P_PhP-YftSTWN2OBkLWCEQ,i6NaEP-UmQ6hNiyBpYMQVA,y2n8rM-M4mgUrhCObgdzIA,xgI1C_yELWMragW3ynmzKQ


In [37]:
# top 30 neighbours for each user
sim_user_30_m = find_n_neighbours(similarity_with_business,30)
sim_user_30_m.head()

user_id,top1,top2,top3,top4,top5,top6,top7,top8,top9,top10,...,top21,top22,top23,top24,top25,top26,top27,top28,top29,top30
0rn5E5l1tVZZXlhTbU340g,zj2ff8z23ejDFDAsEcs_BA,1v_W1gC_vZo_X_dTVCwXvg,Duv-ckttNBl_26Q9GynGGQ,Eu7rIZOFDUVb4XrOwReaZA,HyeG4D_IL4oGA1NHg57D6A,Lasp35EpoURyJz5vhLlVHQ,NNXx0ccFbBLykwbmq9uqMQ,XIWbu1y3MoqtdPqdglNUEg,e0USjAiBMXlJMv0WidTkFQ,Vx8oytfIS-J2oP3rnr0AxA,...,wMRM_m-FgorKlNhFZqpFxQ,v8DIR4v2PoH6CSHiRAci7g,8mtioD5uNnlVl-kHrNE4Iw,rATOBMt4cvaHb1m81fmvbg,dJNWdoxJ2vwrwAp3qh7ixA,iPvikJY3_4ltliN-1yzmRA,NUycvb2CD5f4sel9XXA-MQ,xgI1C_yELWMragW3ynmzKQ,z_tqPytGQF_tDw_buHKJcw,p7HMe4REUwSHn9in8mUQMQ
193UVQE-VD2gxYx4-bkzGw,60JrvkRZ8v58-XkZ8B-CQw,N8KY5W3UXAZrAGMYO2IDFw,PXe-x5SAMzAEn8EOBCVgeA,P_PhP-YftSTWN2OBkLWCEQ,PJkVXGBMk3N6i31XahdzkA,dwiLbT2ZFasBYz1BYrwrwQ,UlZOzO4wiHa-DT31t5NruA,dJNWdoxJ2vwrwAp3qh7ixA,8mtioD5uNnlVl-kHrNE4Iw,F2LCVynfNbxG7IMqe1UO-A,...,RZXnKcQYDm68QtJcSlQ1vA,t4jgQSUfubezTIZvTqn0tQ,N-Sy8I9byR2Q2gTI99nKxQ,Eu7rIZOFDUVb4XrOwReaZA,HyeG4D_IL4oGA1NHg57D6A,Lasp35EpoURyJz5vhLlVHQ,Duv-ckttNBl_26Q9GynGGQ,1v_W1gC_vZo_X_dTVCwXvg,NNXx0ccFbBLykwbmq9uqMQ,zj2ff8z23ejDFDAsEcs_BA
1bYuoztlpZuYneVTCrRLzQ,EtySygJ2sep1ELBBCaJgMQ,8mtioD5uNnlVl-kHrNE4Iw,wMRM_m-FgorKlNhFZqpFxQ,v8DIR4v2PoH6CSHiRAci7g,rATOBMt4cvaHb1m81fmvbg,F2LCVynfNbxG7IMqe1UO-A,dJNWdoxJ2vwrwAp3qh7ixA,UlZOzO4wiHa-DT31t5NruA,xgI1C_yELWMragW3ynmzKQ,z_tqPytGQF_tDw_buHKJcw,...,p7HMe4REUwSHn9in8mUQMQ,9pNcdrQLWWrX0vEGGJlEbg,4qAD7LR2gsGgX3oNgeTIXQ,cNO-iv0SlxqUjZjs2MR1Tw,PBSk4cQN8gA8ysKFz32PLA,ActrvcM7JYa8Ck7hUfhFXA,UG4EKu13JRwzRix6ESINdg,BQ2622RQLC5QLCb3B_dW_A,RZXnKcQYDm68QtJcSlQ1vA,t4jgQSUfubezTIZvTqn0tQ
1v_W1gC_vZo_X_dTVCwXvg,zj2ff8z23ejDFDAsEcs_BA,e0USjAiBMXlJMv0WidTkFQ,Duv-ckttNBl_26Q9GynGGQ,Eu7rIZOFDUVb4XrOwReaZA,HyeG4D_IL4oGA1NHg57D6A,Lasp35EpoURyJz5vhLlVHQ,NNXx0ccFbBLykwbmq9uqMQ,Vx8oytfIS-J2oP3rnr0AxA,XIWbu1y3MoqtdPqdglNUEg,0rn5E5l1tVZZXlhTbU340g,...,wMRM_m-FgorKlNhFZqpFxQ,v8DIR4v2PoH6CSHiRAci7g,8mtioD5uNnlVl-kHrNE4Iw,rATOBMt4cvaHb1m81fmvbg,dJNWdoxJ2vwrwAp3qh7ixA,iPvikJY3_4ltliN-1yzmRA,NUycvb2CD5f4sel9XXA-MQ,xgI1C_yELWMragW3ynmzKQ,z_tqPytGQF_tDw_buHKJcw,LqU99zFTGxAqsNZwkj69gQ
1yw3JgmQQxwP662FOLZrWg,LqU99zFTGxAqsNZwkj69gQ,9pNcdrQLWWrX0vEGGJlEbg,p7HMe4REUwSHn9in8mUQMQ,N-Sy8I9byR2Q2gTI99nKxQ,193UVQE-VD2gxYx4-bkzGw,60JrvkRZ8v58-XkZ8B-CQw,PXe-x5SAMzAEn8EOBCVgeA,N8KY5W3UXAZrAGMYO2IDFw,P_PhP-YftSTWN2OBkLWCEQ,PJkVXGBMk3N6i31XahdzkA,...,1v_W1gC_vZo_X_dTVCwXvg,zj2ff8z23ejDFDAsEcs_BA,0rn5E5l1tVZZXlhTbU340g,e0USjAiBMXlJMv0WidTkFQ,y2n8rM-M4mgUrhCObgdzIA,Vx8oytfIS-J2oP3rnr0AxA,XIWbu1y3MoqtdPqdglNUEg,f-8BKEVPqiqRaNNmhjAU8Q,4bxhIKz9ePWqD93UkEjnbg,NUycvb2CD5f4sel9XXA-MQ
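
Once neighbours are known, step 5 of the approach needs a way to turn their ratings into recommendations. One common option, sketched here under assumptions, is a similarity-weighted average of the neighbours' ratings for each item the target user has not rated. The function `recommend`, the toy matrices, and the use of integer positions instead of the notebook's string IDs are all illustrative choices, not the notebook's own method.

```python
import numpy as np

def recommend(ratings, sim, user, k=2, top=2):
    """Rank the items `user` has not rated by a similarity-weighted average
    of the ratings given by the user's k most similar neighbours."""
    neighbours = np.argsort(-sim[user])[:k]            # k most similar users
    weights = sim[user, neighbours][:, None]           # shape (k, 1)
    rated = ~np.isnan(ratings[neighbours])             # who rated which item
    num = np.where(rated, ratings[neighbours] * weights, 0.0).sum(axis=0)
    den = np.where(rated, weights, 0.0).sum(axis=0)
    scores = np.full(ratings.shape[1], np.nan)
    np.divide(num, den, out=scores, where=den > 0)     # NaN if no neighbour rated it
    scores[~np.isnan(ratings[user])] = np.nan          # skip items already rated
    ranked = [int(i) for i in np.argsort(-scores) if not np.isnan(scores[i])]
    return ranked[:top], scores

# Toy data: 4 users x 4 items, NaN = not rated; similarity diagonal set to 0
ratings = np.array([[5.0, np.nan, np.nan, 1.0],
                    [4.0, 2.0, np.nan, 1.0],
                    [5.0, np.nan, 4.0, np.nan],
                    [1.0, 5.0, 2.0, 5.0]])
sim = np.array([[0.0, 0.9, 0.8, 0.1],
                [0.9, 0.0, 0.7, 0.2],
                [0.8, 0.7, 0.0, 0.3],
                [0.1, 0.2, 0.3, 0.0]])

top_items, scores = recommend(ratings, sim, user=0, k=2, top=2)
print(top_items)  # [2, 1]
```

Here user 0's two nearest neighbours are users 1 and 2; item 2 scores 4.0 (rated only by user 2) and item 1 scores 2.0 (rated only by user 1), so item 2 is recommended first.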


In [None]:
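def get_user_similar_movies(user1, user2):
    # Restaurants that both users reviewed (taken from bus_reviews),
    # with each user's stars side by side.
    cols = ['user_id', 'business_id', 'stars']
    common_business = bus_reviews.loc[bus_reviews.user_id == user1, cols].merge(
        bus_reviews.loc[bus_reviews.user_id == user2, cols],
        on="business_id",
        how="inner",
        suffixes=("_1", "_2"))
    # Attach restaurant names for readability
    names = bus_reviews[['business_id', 'name']].drop_duplicates()
    return common_business.merge(names, on='business_id')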