# Book Recommendations

One solution to my business problem is to recommend books to customers. With the help of recommendations:

- Customers do not need to spend too much time finding a suitable book.
- When they find books easily, they will buy more books.
- They will also recommend this online store to others and give it good reviews, which means more customers and more sales.
- When customers spend less time on the server, there are fewer technical problems on the website.

### Aim of This Notebook:

My aim in this notebook is to recommend books to customers in two different ways.

In [4]:
# dataframe and series 
import pandas as pd
import numpy as np
import scipy
import math

# sklearn imports 
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score,balanced_accuracy_score
from sklearn.model_selection import train_test_split

from nltk.corpus import stopwords
from scipy.sparse import csr_matrix
from sklearn.metrics.pairwise import cosine_similarity
from scipy.sparse.linalg import svds
from sklearn.preprocessing import MinMaxScaler
import sklearn.metrics.pairwise as pw
from sklearn.metrics.pairwise import pairwise_distances

# To plot
import matplotlib.pyplot as plt  
%matplotlib inline    
import matplotlib as mpl
import seaborn as sns

import random
from scipy import sparse
from scipy.stats import pearsonr
import re

In [5]:
pd.options.display.max_columns=100 # To see the hidden columns in dataframe

In [6]:
df = pd.read_csv('sample30.csv') # loading the full dataset

My data is too big to build a pivot table on my basic computer, which makes it impossible to work with in full. So, I will take roughly the top 1000 users and 1000 books and build my system on them.

## Choosing 1000 users - 1000 books

First, I would like to see the number of reviews per user and per book.

In [7]:
# creating a new column with the number of reviews per user
df['Reviewer_Count'] = df['id'].map(df['id'].value_counts())

In [8]:
# creating a new column with the number of reviews per book
df['Book_Count'] = df['brand'].map(df['brand'].value_counts())
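
The `map(value_counts())` pattern used in the two cells above can be checked on a tiny example. This is only a sketch: the toy frame and its values are made up, and the `groupby(...).transform("count")` line is shown as one possible equivalent, not as the method this notebook uses.

```python
import pandas as pd

# Toy data; the column names mirror the notebook, the values are invented.
toy = pd.DataFrame({
    "id": ["u1", "u1", "u2", "u3", "u3", "u3"],
    "brand": ["a", "b", "a", "a", "b", "c"],
})

# Pattern used above: look up each row's key in the value_counts() Series.
toy["Reviewer_Count"] = toy["id"].map(toy["id"].value_counts())

# Possible alternative: count rows per group and broadcast back to each row.
toy["Reviewer_Count_alt"] = toy.groupby("id")["id"].transform("count")

print(toy)
```

Both columns hold the same numbers, so either form can be used to attach a per-key count to every row.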

In [9]:
df = df.sort_values(by=['Reviewer_Count'],ascending=False)

In [10]:
df.reset_index(drop=True,inplace=True) 

In [11]:
df.isna().sum() # checking missing values, especially in the title column

id                          0
brand                       0
categories                  0
manufacturer              141
name                        0
reviews_date               46
reviews_didPurchase     13877
reviews_doRecommend      2404
reviews_rating              0
reviews_text                0
reviews_title               0
reviews_userCity        27881
reviews_userProvince    29642
user_sentiment              0
review_clean                1
Reviewer_Count              0
Book_Count                  0
dtype: int64

I want to present results with book titles, so I will drop the reviews that do not have titles. In this data, 'brand' holds the book ID. If a title is not found in the metadata, it is shown as null, so I will drop those rows.

In [13]:
df.dropna(subset=['reviews_title'], inplace=True) # dropping rows without a title

# Taking Samples 

I will keep two sampled datasets so that I can compare recommendation results at the end. In this notebook, I will use df_1000, which was meant to contain approximately 1000 users and 1000 books (the cells below show the actual counts).

In [14]:
df_100 = df.loc[df['Reviewer_Count']>150] 

In [15]:
df_1000 = df_100.loc[df_100['Book_Count']>150]

In [16]:
df_1000['id'].nunique()

30

In [17]:
df_1000['brand'].nunique()

29

In the cells above, I select users who have more than 150 reviews. Then, among those, I select books that have more than 150 reviews.
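
The two-step filter can also be written as one boolean mask. The sketch below uses invented toy data and a toy threshold of 2 instead of 150. Note that both count columns are computed on the full frame before filtering, as in the notebook.

```python
import pandas as pd

# Toy data with invented values; the threshold of 2 is only for illustration.
toy = pd.DataFrame({
    "id": ["u1", "u1", "u1", "u2"],
    "brand": ["a", "a", "b", "a"],
})
toy["Reviewer_Count"] = toy["id"].map(toy["id"].value_counts())
toy["Book_Count"] = toy["brand"].map(toy["brand"].value_counts())

# Keep rows whose user AND book both exceed the threshold.
subset = toy[(toy["Reviewer_Count"] > 2) & (toy["Book_Count"] > 2)]
print(subset)
```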

In [18]:
df_200=df.loc[df['Reviewer_Count']>100]

In [19]:
df_2000 = df_200.loc[df_200['Book_Count']>100]

In the same way, I also take users and books with more than 100 reviews each to keep a larger sample.

In [20]:
df_1000.to_csv('df_1000.csv',index = False) # writing to csv for later use

In [21]:
df_2000.to_csv('df_2000.csv',index = False) # writing to csv for later use

# Two Main Recommendation Systems

In [22]:
# building the user-item pivot table

pivot = pd.pivot_table(df_1000, index='id', columns=['brand'], values='reviews_rating').fillna(0)

pivot.head(5)

id,Aveeno,Avery,Burt's Bees,Chester's,Clear Scalp & Hair Therapy,Clorox,Coty,Disney,FOX,Hoover,Hormel,Just For Men,L'oreal Paris,Lionsgate,Lundberg,Lysol,Nexxus,Olay,Pendaflex,Sony Pictures,Storkcraft,Summit Entertainment,Tostitos,Universal Home Video,Vaseline,Warner Bros.,Warner Home Video,Warner Music Group,Windex
AV1YGDqsGV-KLJ3adc-O,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,4.12931
AV1l8zRZvKc47QAVhnAv,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,4.690852,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
AVpe31o71cnluZ0-YrSD,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,4.226721,0.0,0.0,0.0
AVpe41TqilAPnD_xQH3d,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,4.303831,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
AVpe59io1cnluZ0-ZgDU,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,4.491018,0.0,0.0,0.0,0.0,0.0
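
The non-integer ratings in the table (for example 4.12931) come from `pd.pivot_table` averaging all ratings that share the same `id` and `brand`, since its default `aggfunc` is the mean. A minimal sketch with invented values:

```python
import pandas as pd

# Toy reviews: user u1 rated brand "a" twice (5 and 4).
toy = pd.DataFrame({
    "id": ["u1", "u1", "u2"],
    "brand": ["a", "a", "b"],
    "reviews_rating": [5, 4, 3],
})

# The default aggfunc is the mean, so the duplicate ratings collapse to 4.5.
toy_pivot = pd.pivot_table(
    toy, index="id", columns="brand", values="reviews_rating"
).fillna(0)
print(toy_pivot)
```

After `fillna(0)`, a zero means "no rating", not a rating of zero.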


In [24]:
# converting the pivot table to a NumPy matrix
pivot_mat = pivot.to_numpy()

pivot_mat[:5]

array([[0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 4.12931034],
       [0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 4.69085174, 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        ],
       [0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
  

In [25]:
# The reviewer IDs are the index, so I convert them to a list
reviewer_id = list(pivot.index)
reviewer_id[:10]

['AV1YGDqsGV-KLJ3adc-O',
 'AV1l8zRZvKc47QAVhnAv',
 'AVpe31o71cnluZ0-YrSD',
 'AVpe41TqilAPnD_xQH3d',
 'AVpe59io1cnluZ0-ZgDU',
 'AVpe8gsILJeJML43y6Ed',
 'AVpe9W4D1cnluZ0-avf0',
 'AVpf0eb2LJeJML43EVSt',
 'AVpf2tw1ilAPnD_xjflC',
 'AVpf385g1cnluZ0-s0_t']

In [26]:
# Converting the pivot matrix to a sparse matrix
sparse_matrix = csr_matrix(pivot_mat)

sparse_matrix

<30x29 sparse matrix of type '<class 'numpy.float64'>'
	with 30 stored elements in Compressed Sparse Row format>

I created a sparse matrix because most of the values in the pivot matrix are zero (if a user has no rating for a book, it is shown as zero). Unlike a dense matrix, which stores every value, a sparse matrix stores only the non-zero values together with their row and column indices.
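
To make the storage difference concrete, here is a small sketch on an invented 3x3 matrix. It prints the three arrays that the CSR format keeps instead of the full grid.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Invented dense matrix with only two non-zero ratings.
dense = np.array([[0.0, 0.0, 4.0],
                  [0.0, 5.0, 0.0],
                  [0.0, 0.0, 0.0]])
sp = csr_matrix(dense)

print(sp.nnz)      # number of stored (non-zero) entries
print(sp.data)     # the stored values, row by row
print(sp.indices)  # column index of each stored value
print(sp.indptr)   # where each row starts and ends inside data/indices

# Fraction of cells that actually hold a value.
density = sp.nnz / (sp.shape[0] * sp.shape[1])
print(density)
```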

In [27]:
# number of latent factors for the user-item matrix

factor_n = 15

# matrix factorization (truncated SVD) of the user-item matrix

U, sigma, V = svds(sparse_matrix, k = factor_n) # V is returned already transposed, with shape (k, n_items)

In [28]:
# checking the dimensions of the factor matrices
print(U.shape)
print(V.shape)
sigma = np.diag(sigma) # turning the 1-D array of singular values into a diagonal matrix
print(sigma.shape)

(30, 15)
(15, 29)
(15, 15)
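
The shapes printed above follow from how `svds` returns its factors: left vectors of shape (rows, k), a 1-D array of k singular values, and the right vectors already transposed with shape (k, columns). The sketch below uses an invented rank-2 matrix, so keeping k=2 factors reproduces it almost exactly. The names `A`, `U_toy`, `s_toy` and `Vt_toy` are illustrative only.

```python
import numpy as np
from scipy.sparse.linalg import svds

# Invented 6x5 matrix of rank 2 (product of a 6x2 and a 2x5 matrix).
rng = np.random.default_rng(0)
A = rng.random((6, 2)) @ rng.random((2, 5))

# Truncated SVD with k=2 factors; k must be smaller than min(A.shape).
U_toy, s_toy, Vt_toy = svds(A, k=2)
print(U_toy.shape, s_toy.shape, Vt_toy.shape)

# Rebuild the matrix from its factors: U * diag(s) * V^T.
approx = U_toy @ np.diag(s_toy) @ Vt_toy
print(np.abs(approx - A).max())
```

With fewer factors than the true rank, the product is only an approximation, and those approximate entries are what a recommender uses as predicted scores.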


In [29]:
# reconstructing the rating matrix as U * sigma * V
pred_rating = np.dot(np.dot(U,sigma),V)

pred_rating[:5]

array([[ 5.02100879e-31,  2.26520916e-16, -9.07518785e-16,
         5.93685885e-16, -1.03638621e-30,  9.33329556e-16,
         4.62057869e-16, -1.98681156e-15,  7.18770972e-31,
        -3.18614236e-30,  1.96204673e-30,  2.18417118e-31,
        -1.09386277e-31,  4.43704023e-16,  3.96177805e-17,
        -2.04498503e-16, -2.44213711e-34, -5.21628567e-16,
         1.42056455e-31, -2.16982011e-16,  2.29712036e-31,
         3.53343471e-17,  1.38635963e-15, -8.54486683e-31,
        -3.92374789e-16, -2.94444388e-31, -1.66728141e-30,
         1.90421399e-15,  3.03248149e-30],
       [ 3.16197397e-16, -1.17199005e-17, -9.90792054e-17,
        -5.24005827e-16, -1.51183412e-16, -2.82026667e-16,
        -5.36378870e-16,  2.57006880e-17, -3.26664653e-16,
        -7.01231892e-17,  8.09307363e-19, -3.86363728e-16,
         6.74941313e-17,  1.41452571e-16,  3.87338385e-17,
        -2.83742804e-16, -2.07804526e-16,  4.69085174e+00,
        -5.34429660e-16, -1.64791434e-16, -6.17021634e-16,
         3.90

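Before recommending, I need to normalize the predicted ratings, because many entries are zero or very close to zero. I rescale the whole matrix to the range 0 to 1 with min-max normalization.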

In [30]:
# min-max normalization over the whole matrix
pred_rating_n = (pred_rating - pred_rating.min()) / (pred_rating.max() - pred_rating.min())
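
As a worked example of the formula in the cell above, on an invented 2x2 matrix: the smallest entry maps to 0, the largest to 1, and everything else falls in between. The minimum and maximum are taken over the whole matrix, not per user.

```python
import numpy as np

# Invented scores; min is 0.0 and max is 4.0.
scores = np.array([[0.0, 2.0],
                   [4.0, 1.0]])

# Same formula as above: (x - min) / (max - min), using global min and max.
scaled = (scores - scores.min()) / (scores.max() - scores.min())
print(scaled)
```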

In [31]:
pred_rating_n[:5]

array([[1.50045620e-15, 1.54682706e-15, 1.31467898e-15, 1.62198902e-15,
        1.50045620e-15, 1.69151713e-15, 1.59504359e-15, 1.09373807e-15,
        1.50045620e-15, 1.50045620e-15, 1.50045620e-15, 1.50045620e-15,
        1.50045620e-15, 1.59128639e-15, 1.50856632e-15, 1.45859353e-15,
        1.50045620e-15, 1.39367416e-15, 1.50045620e-15, 1.45603804e-15,
        1.50045620e-15, 1.50768946e-15, 1.78425644e-15, 1.50045620e-15,
        1.42013357e-15, 1.50045620e-15, 1.50045620e-15, 1.89026587e-15,
        1.50045620e-15],
       [1.56518464e-15, 1.49805703e-15, 1.48017380e-15, 1.39318751e-15,
        1.46950760e-15, 1.44272282e-15, 1.39065464e-15, 1.50571736e-15,
        1.43358502e-15, 1.48610136e-15, 1.50062188e-15, 1.42136409e-15,
        1.51427286e-15, 1.52941281e-15, 1.50838537e-15, 1.44237151e-15,
        1.45791675e-15, 9.60259381e-01, 1.39105366e-15, 1.46672192e-15,
        1.37414635e-15, 1.58041536e-15, 1.51826101e-15, 1.39142216e-15,
        1.55942924e-15, 1.66487292e-15,

In [32]:
# Putting the reconstructed matrix into a DataFrame (rows: books, columns: users)
pred_df = pd.DataFrame(pred_rating_n, columns = pivot.columns, index=reviewer_id).transpose()
pred_df.head(10)

brand,AV1YGDqsGV-KLJ3adc-O,AV1l8zRZvKc47QAVhnAv,AVpe31o71cnluZ0-YrSD,AVpe41TqilAPnD_xQH3d,AVpe59io1cnluZ0-ZgDU,AVpe8gsILJeJML43y6Ed,AVpe9W4D1cnluZ0-avf0,AVpf0eb2LJeJML43EVSt,AVpf2tw1ilAPnD_xjflC,AVpf385g1cnluZ0-s0_t,AVpf3VOfilAPnD_xjpun,AVpf4oLxLJeJML43FcxC,AVpf5Z1zLJeJML43FpB-,AVpf5olc1cnluZ0-tPrO,AVpf63aJLJeJML43F__Q,AVpf9pzn1cnluZ0-uNTM,AVpfBrUZilAPnD_xTUly,AVpfJP1C1cnluZ0-e3Xy,AVpfM_ytilAPnD_xXIJb,AVpfOmKwLJeJML435GM7,AVpfPPkEilAPnD_xX3cP,AVpfPaoqLJeJML435Xk9,AVpfPnrU1cnluZ0-g9rL,AVpfR5m0LJeJML436K3W,AVpfRTh1ilAPnD_xYic2,AVpfW8y_LJeJML437ySW,AVpfazX31cnluZ0-kbdl,AVpfcu821cnluZ0-k8ep,AVpfm8yiLJeJML43AYyu,AVpftikC1cnluZ0-p31V
Aveeno,1.500456e-15,1.565185e-15,1.500456e-15,1.500456e-15,1.500456e-15,1.500456e-15,1.500456e-15,1.35996e-15,1.275193e-15,1.500456e-15,1.547842e-15,1.511123e-15,1.251525e-15,1.205399e-15,1.368032e-15,1.537651e-15,1.36153e-15,1.548036e-15,1.453782e-15,1.500456e-15,1.500456e-15,1.500456e-15,1.709765e-15,1.582325e-15,1.324158e-15,1.500456e-15,1.500456e-15,1.564733e-15,1.500456e-15,1.500456e-15
Avery,1.546827e-15,1.498057e-15,1.467761e-15,1.535125e-15,2.531159e-15,1.458725e-15,1.731555e-15,1.503256e-15,1.482593e-15,1.736466e-15,1.51929e-15,1.0,1.544533e-15,1.520953e-15,1.553144e-15,1.498771e-15,1.483446e-15,1.519367e-15,1.510985e-15,1.505653e-15,1.507956e-15,1.496433e-15,1.520317e-15,1.469694e-15,1.504773e-15,1.571822e-15,1.562962e-15,1.573019e-15,1.482021e-15,1.530041e-15
Burt's Bees,1.314679e-15,1.480174e-15,1.278722e-15,6.082525e-16,1.275408e-15,1.214427e-15,1.642459e-15,1.468397e-15,1.345759e-15,1.794533e-15,1.573511e-15,1.555799e-15,1.471904e-15,1.544994e-15,0.9520256,1.491818e-15,1.345374e-15,1.573809e-15,1.63529e-15,1.483789e-15,1.402661e-15,1.743137e-15,1.667039e-15,1.527882e-15,1.533253e-15,1.047186e-15,1.280119e-15,1.583872e-15,1.500951e-15,1.471535e-15
Chester's,1.621989e-15,1.393188e-15,1.51851e-15,1.515075e-15,2.109314e-15,1.502894e-15,1.369358e-15,1.781669e-15,1.461514e-15,1.544286e-15,1.522792e-15,1.521233e-15,1.400404e-15,0.9865353,1.543436e-15,1.603486e-15,1.434982e-15,1.522883e-15,1.585798e-15,1.562513e-15,1.290178e-15,1.595392e-15,1.543607e-15,1.471526e-15,1.279423e-15,1.684177e-15,1.505847e-15,1.538157e-15,1.507376e-15,1.396482e-15
Clear Scalp & Hair Therapy,1.500456e-15,1.469508e-15,1.500456e-15,1.500456e-15,1.500456e-15,1.500456e-15,1.500456e-15,1.884402e-15,1.319749e-15,1.500456e-15,1.257021e-15,1.509409e-15,1.442788e-15,1.605924e-15,1.473121e-15,1.495221e-15,1.450778e-15,1.256028e-15,1.548929e-15,1.500456e-15,1.500456e-15,1.500456e-15,1.4182e-15,1.339815e-15,1.76235e-15,1.500456e-15,1.500456e-15,1.449041e-15,1.500456e-15,1.500456e-15
Clorox,1.691517e-15,1.442723e-15,1.359257e-15,1.369564e-15,1.514196e-15,1.475114e-15,1.59633e-15,1.422945e-15,1.533735e-15,1.592602e-15,0.9867911,1.519542e-15,1.48989e-15,1.522786e-15,1.570937e-15,1.470302e-15,1.33299e-15,0.9908155,1.53224e-15,1.357258e-15,1.534218e-15,1.451094e-15,1.39601e-15,1.579643e-15,1.51334e-15,1.680715e-15,1.382258e-15,1.468668e-15,1.475391e-15,1.810654e-15
Coty,1.595044e-15,1.390655e-15,1.528546e-15,1.571143e-15,1.605306e-15,1.133807e-15,1.992461e-15,1.501552e-15,1.794055e-15,2.07753e-15,1.326686e-15,1.48257e-15,1.57904e-15,1.432535e-15,1.345205e-15,1.517322e-15,0.9509897,1.325977e-15,1.254855e-15,1.470133e-15,1.397747e-15,1.794392e-15,1.468196e-15,1.491179e-15,1.536137e-15,1.381995e-15,1.764953e-15,1.444113e-15,1.495569e-15,1.436113e-15
Disney,1.093738e-15,1.505717e-15,1.715755e-15,1.531869e-15,1.941031e-15,1.466681e-15,2.58976e-15,1.464057e-15,1.455371e-15,8.967435e-16,1.513733e-15,1.504964e-15,1.549218e-15,1.27274e-15,1.533063e-15,1.482814e-15,1.535892e-15,1.513787e-15,1.504957e-15,1.659211e-15,1.371015e-15,2.157558e-15,1.564721e-15,1.621322e-15,0.957585,1.851941e-15,1.758599e-15,1.519347e-15,1.494073e-15,1.825613e-15
FOX,1.500456e-15,1.433585e-15,1.500456e-15,1.500456e-15,1.500456e-15,1.500456e-15,1.500456e-15,1.720598e-15,6.10875e-16,1.500456e-15,1.353851e-15,1.539807e-15,1.534935e-15,1.516826e-15,5.363595e-16,1.488107e-15,1.576755e-15,1.353253e-15,1.476766e-15,1.500456e-15,1.500456e-15,1.500456e-15,2.032561e-15,1.340173e-15,1.534599e-15,1.500456e-15,1.500456e-15,1.221861e-15,1.500456e-15,1.500456e-15
Hoover,1.500456e-15,1.486101e-15,1.500456e-15,1.500456e-15,1.500456e-15,1.500456e-15,1.500456e-15,5.891906e-16,1.869167e-15,1.500456e-15,1.605743e-15,1.757641e-15,1.447485e-15,1.356524e-15,1.650907e-15,1.399864e-15,2.021163e-15,1.606173e-15,1.134038e-15,1.500456e-15,1.500456e-15,1.500456e-15,2.514813e-15,9.61132e-16,2.661301e-15,1.500456e-15,1.500456e-15,1.64979e-15,1.500456e-15,1.500456e-15


Now I have a predicted score for each user and each item from the SVD latent factors. It is time for the recommendation function.

In [33]:
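def recommend_items(user_id, pred_df, items_df, items_to_ignore=None, top_list=20, verbose=False):
    '''Return the top_list items with the highest predicted score for user_id.

    items_df and verbose are not used; they are kept so that existing calls still work.
    '''
    # avoiding a mutable default argument
    items_to_ignore = items_to_ignore if items_to_ignore is not None else []

    # taking and sorting the user's predictions
    sorted_user_predictions = pred_df[user_id].sort_values(ascending=False) \
                                .reset_index().rename(columns={user_id: 'recStrength'})

    # dropping ignored items and keeping the top of the list
    recommendations_df = sorted_user_predictions[~sorted_user_predictions['brand'].isin(items_to_ignore)] \
                           .sort_values('recStrength', ascending=False) \
                           .head(top_list)

    return recommendations_df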
    


'recStrength' shows how strong the prediction is: the higher the value, the stronger the recommendation.
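
The ranking step can be tried on a tiny prediction table. This is a self-contained sketch, not the notebook's function: `toy_pred` and `top_n` are hypothetical names, and the scores are invented. The index is named `brand` so that `reset_index()` produces a `brand` column, as the notebook's function expects.

```python
import pandas as pd

# Invented predictions: rows are items (index named "brand"), columns are users.
toy_pred = pd.DataFrame(
    {"u1": [0.9, 0.1, 0.5], "u2": [0.2, 0.8, 0.3]},
    index=pd.Index(["a", "b", "c"], name="brand"),
)

def top_n(user_id, pred_df, items_to_ignore=(), n=2):
    """Rank items for one user by predicted score and skip ignored items."""
    ranked = (pred_df[user_id]
              .sort_values(ascending=False)
              .reset_index()
              .rename(columns={user_id: "recStrength"}))
    return ranked[~ranked["brand"].isin(items_to_ignore)].head(n)

# Item "a" has the highest score for u1 but is excluded here.
print(top_n("u1", toy_pred, items_to_ignore=["a"]))
```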

In [34]:
def recommender(user_id,pred_df,real_df,df):
    '''Merge the recommendations with the user's own reviews, keep the books the user
        has not reviewed yet (rows without a rating) and attach the review titles.'''
        
    recommend = recommend_items(user_id, pred_df,real_df)
    df_user= real_df.loc[real_df['id'] == user_id]
    new_df = df_user.merge(recommend, how = 'outer', left_on = 'brand', right_on = 'brand')
    rec_df = new_df.loc[new_df['reviews_rating'].isnull()] # books without a rating from this user
    df_rec_t = rec_df.loc[:, ['brand', 'recStrength']]
    df_last = pd.merge(df_rec_t,df[['reviews_title','brand']],on=['brand'], how='left') 
    return df_last

# Getting Recommendations 

When I try my recommender system, it gives me recommended books. The strength values may not look good, but that is expected because I am working with only a small subset of the original data. My aim is to build a working system; if it works for this subset, it can be run on more powerful computers.

In [35]:
recommender('AV1YGDqsGV-KLJ3adc-O',pred_df,df_1000,df)

Unnamed: 0,brand,recStrength,reviews_title
0,Warner Music Group,1.890266e-15,Great music!
1,Warner Music Group,1.890266e-15,CD purchase
2,Warner Music Group,1.890266e-15,Great album
3,Warner Music Group,1.890266e-15,Great
4,Warner Music Group,1.890266e-15,fill with hits
...,...,...,...
18328,Universal Home Video,1.500456e-15,Good movie
18329,Universal Home Video,1.500456e-15,Funny!
18330,Universal Home Video,1.500456e-15,Funny
18331,Universal Home Video,1.500456e-15,Great movie
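
The output above has one row per review title, so the same `brand` appears many times. One possible way to collapse it to one row per recommended item is sketched below on invented data. The name `toy_out` stands in for the returned frame and is not part of the notebook.

```python
import pandas as pd

# Invented stand-in for the recommender output: one row per review title.
toy_out = pd.DataFrame({
    "brand": ["x", "x", "y", "y", "y"],
    "recStrength": [0.7, 0.7, 0.4, 0.4, 0.4],
    "reviews_title": ["t1", "t2", "t3", "t4", "t5"],
})

# Keep the first row for each brand, then order by recommendation strength.
unique_recs = (toy_out.drop_duplicates(subset="brand")
               .sort_values("recStrength", ascending=False)
               .reset_index(drop=True))
print(unique_recs)
```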


# Another Technique

Now, I will use the cosine similarity function of the scikit-learn library directly and build a user-based recommender.

In [36]:
user_sparse_pivot = sparse.csr_matrix(pivot.fillna(0))
user_recommender = pw.cosine_similarity(user_sparse_pivot)

In [37]:
# mean-centering the similarities and scaling them by their range
pred_rating_n_cos = (user_recommender - user_recommender.mean())/ (user_recommender.max() - user_recommender.min())

In [38]:
user_recommender_df = pd.DataFrame(pred_rating_n_cos, columns=pivot.index.values,index = pivot.index.values)

This method gives a user-user matrix that shows how similar the users are to each other. Next, I will use it to get recommendations.

In [39]:
user_recommender_df.head(3)

Unnamed: 0,AV1YGDqsGV-KLJ3adc-O,AV1l8zRZvKc47QAVhnAv,AVpe31o71cnluZ0-YrSD,AVpe41TqilAPnD_xQH3d,AVpe59io1cnluZ0-ZgDU,AVpe8gsILJeJML43y6Ed,AVpe9W4D1cnluZ0-avf0,AVpf0eb2LJeJML43EVSt,AVpf2tw1ilAPnD_xjflC,AVpf385g1cnluZ0-s0_t,AVpf3VOfilAPnD_xjpun,AVpf4oLxLJeJML43FcxC,AVpf5Z1zLJeJML43FpB-,AVpf5olc1cnluZ0-tPrO,AVpf63aJLJeJML43F__Q,AVpf9pzn1cnluZ0-uNTM,AVpfBrUZilAPnD_xTUly,AVpfJP1C1cnluZ0-e3Xy,AVpfM_ytilAPnD_xXIJb,AVpfOmKwLJeJML435GM7,AVpfPPkEilAPnD_xX3cP,AVpfPaoqLJeJML435Xk9,AVpfPnrU1cnluZ0-g9rL,AVpfR5m0LJeJML436K3W,AVpfRTh1ilAPnD_xYic2,AVpfW8y_LJeJML437ySW,AVpfazX31cnluZ0-kbdl,AVpfcu821cnluZ0-k8ep,AVpfm8yiLJeJML43AYyu,AVpftikC1cnluZ0-p31V
AV1YGDqsGV-KLJ3adc-O,0.964444,-0.035556,-0.035556,-0.035556,-0.035556,-0.035556,-0.035556,-0.035556,-0.035556,-0.035556,-0.035556,-0.035556,-0.035556,-0.035556,-0.035556,-0.035556,-0.035556,-0.035556,-0.035556,-0.035556,-0.035556,-0.035556,-0.035556,-0.035556,-0.035556,-0.035556,-0.035556,-0.035556,-0.035556,-0.035556
AV1l8zRZvKc47QAVhnAv,-0.035556,0.964444,-0.035556,-0.035556,-0.035556,-0.035556,-0.035556,-0.035556,-0.035556,-0.035556,-0.035556,-0.035556,-0.035556,-0.035556,-0.035556,-0.035556,-0.035556,-0.035556,-0.035556,-0.035556,-0.035556,-0.035556,-0.035556,-0.035556,-0.035556,-0.035556,-0.035556,-0.035556,-0.035556,-0.035556
AVpe31o71cnluZ0-YrSD,-0.035556,-0.035556,0.964444,-0.035556,-0.035556,-0.035556,-0.035556,-0.035556,-0.035556,-0.035556,-0.035556,-0.035556,-0.035556,-0.035556,-0.035556,-0.035556,-0.035556,-0.035556,-0.035556,-0.035556,-0.035556,-0.035556,-0.035556,-0.035556,-0.035556,-0.035556,-0.035556,-0.035556,-0.035556,-0.035556
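
To see what the raw cosine similarities look like before the normalization step, here is a sketch on three invented users. Two of them rated only the same item, so their similarity is 1 even though the ratings differ. The third rated a different item, so the similarity is 0.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Invented user-item ratings: rows are users, columns are items.
ratings = np.array([[5.0, 0.0, 0.0],
                    [4.0, 0.0, 0.0],
                    [0.0, 3.0, 0.0]])

# Entry (i, j) is the cosine of the angle between the rating rows of users i and j.
sim = cosine_similarity(ratings)
print(sim)
```

Cosine similarity depends only on the direction of the rating vectors, not on their length, which is why users 0 and 1 come out as identical.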


In [40]:
def recommend2(user_recommender_df,user_id,df):
    '''This function first finds the 2 most similar users,
    then takes the mean of their ratings for each book and sorts the books by that mean.
    Note: it reads the global `pivot` table defined above.
    '''
    user_cosine_df = pd.DataFrame(user_recommender_df[user_id].sort_values(ascending=False))
    user_cosine_df.reset_index(level=0, inplace=True)
    user_cosine_df.columns = ['similar_user_id','cosine_sim']
    
    # the 2 most similar users (position 0 is expected to be the user themselves)
    similar_user = list(user_cosine_df['similar_user_id'][1:3].values)
    similar_user_df = pivot.T[[user_id] + similar_user].copy() # copy to avoid SettingWithCopyWarning
    similar_user_df['mean'] = similar_user_df[similar_user].mean(numeric_only=True,axis=1)
    similar_user_df.sort_values('mean', ascending=False,inplace = True)
    
    # taking the top 10 books from similar users that this user has not rated, and merging in the titles
    book_top10 = similar_user_df[similar_user_df[user_id]==0].head(10)
    df_last1 = pd.merge(book_top10,df[['reviews_title','brand']],on=['brand'], how='left') 
    return df_last1

In [42]:
recommend2(user_recommender_df,'AV1YGDqsGV-KLJ3adc-O',df)

Unnamed: 0,brand,AV1YGDqsGV-KLJ3adc-O,AVpfm8yiLJeJML43AYyu,AV1l8zRZvKc47QAVhnAv,mean,reviews_title
0,Olay,0.0,0.0,4.690852,2.345426,Excellent Product!
1,Olay,0.0,0.0,4.690852,2.345426,This product smells amazing
2,Olay,0.0,0.0,4.690852,2.345426,looking young healthy
3,Olay,0.0,0.0,4.690852,2.345426,The Fountain of Youth
4,Olay,0.0,0.0,4.690852,2.345426,You will be amazed at the actual benefits of O...
...,...,...,...,...,...,...
6985,Tostitos,0.0,0.0,0.000000,0.000000,love these
6986,Tostitos,0.0,0.0,0.000000,0.000000,Chips
6987,Tostitos,0.0,0.0,0.000000,0.000000,loving it
6988,Tostitos,0.0,0.0,0.000000,0.000000,Simply the Best


Looking at these two recommendation lists, we can see that they recommend different books: when the system changes, the results change. There are many ways to compare the results of recommendation systems. As a future plan, I will add an evaluation metric to compare my systems.
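
As one possible starting point for that future comparison, the sketch below computes precision at k: the share of the top k recommended items that the user actually liked in held-out data. The function name `precision_at_k` and the example lists are hypothetical. This metric is an assumption on my part and only one of many options.

```python
def precision_at_k(recommended, relevant, k):
    """Fraction of the first k recommended items that are in the relevant set."""
    top_k = list(recommended)[:k]
    if not top_k:
        return 0.0
    relevant_set = set(relevant)
    hits = sum(1 for item in top_k if item in relevant_set)
    return hits / len(top_k)

# Invented example: two recommenders scored against the same held-out items.
held_out = {"a", "c"}
print(precision_at_k(["a", "b", "c"], held_out, k=2))
print(precision_at_k(["b", "d", "a"], held_out, k=2))
```

Running the same metric on the output of both recommenders, for the same users and the same held-out reviews, would give a single number per system to compare.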