# Assignment - Beer Recommendation System

Description: `beer_data.csv`: each record includes a beer's name and the user's name, along with the rating he/she has given to the beer. All ratings are on a scale from 1 to 5, with 5 being the best rating.

### Purpose of the case study

#### Data Preparation

Choose only those beers that have at least N reviews (figure out an appropriate value of N using EDA).

#### Data exploration

1) What are the unique values of ratings? <br>
2) Visualise the rating values and notice:<br>
    a) The average beer ratings<br>
    b) The average user ratings<br>
    c) The average number of ratings given to the beers<br>
    d) The average number of ratings given by the users<br>

#### Recommendation Models

1) Divide your data into training and testing dataset.<br>
2) Build user-based and item-based models.<br>
3) Determine how similar the first 10 users are to each other and visualise it.<br>
4) Compute and visualise the similarity between the first 10 beers.<br>
5) Compare the performance of the two models using test data and suggest the one that should be deployed.<br>
6) Give the names of the top 5 beers that you would recommend to the users 'cokes', 'genog' and 'giblet' using both the models.<br>




In [None]:
import pandas as pd
import numpy as np

import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.model_selection import train_test_split
from sklearn.metrics.pairwise import pairwise_distances

# MinMaxScaler is used later to rescale predicted scores onto the 1-5 rating scale
from sklearn.preprocessing import MinMaxScaler

# hide warnings
import warnings
warnings.filterwarnings('ignore')

In [None]:
# Reading ratings file
ratings = pd.read_csv('beer_data.csv', encoding='latin-1')

In [None]:
# Show all columns and wider cell contents in notebook output
# ('display.max_columns' must be spelled out: the short 'max_columns' is ambiguous in recent pandas)
pd.set_option('display.max_columns', None)
pd.set_option('display.max_colwidth', 150)

In [None]:
ratings.head()

In [None]:
ratings.shape

#### Data Preparation

#### 1) Check for true duplicates (the same user reviewing the same beer more than once)

In [None]:
ratings_dup = ratings[ratings.duplicated(subset=['beer_beerid','review_profilename'], keep="last")]

In [None]:
ratings_dup.head()

In [None]:
ratings.loc[(ratings['review_profilename'] == 'AleWatcher' )
            & (ratings['beer_beerid'] == 52211 ) ]

In [None]:
print ('Original file dataframe:', ratings.shape , '; Duplicate dataframe:', ratings_dup.shape)

In [None]:
ratings.loc[(ratings['review_profilename'] == 'RedDiamond' )
            & (ratings['beer_beerid'] == 962 ) ]

In [None]:
ratings.loc[(ratings['review_profilename'] == 'barleywinefiend' )
            & (ratings['beer_beerid'] == 73647 ) ]

##### Drop these true duplicates
Assumption: when the same user reviewed the same beer more than once, the last review in the file is taken to be the latest one and is kept; the earlier ones are dropped.
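To make the rule concrete, here is a sketch on a hypothetical toy frame (column names follow `beer_data.csv`, the values are made up). `duplicated(keep='last')` flags every occurrence *except* the last one, so the flagged rows are the ones to discard, while `drop_duplicates(keep='last')` keeps exactly one row per (beer, user) pair, even when a pair occurs three times.

```python
import pandas as pd

# Hypothetical toy data: user 'a' reviewed beer 1 three times
toy = pd.DataFrame({
    'beer_beerid':        [1, 1, 1, 2, 2],
    'review_profilename': ['a', 'a', 'a', 'a', 'b'],
    'review_overall':     [3.0, 4.0, 4.5, 2.0, 5.0],
})
keys = ['beer_beerid', 'review_profilename']

# True for every occurrence except the last one of each (beer, user) pair
to_discard = toy[toy.duplicated(subset=keys, keep='last')]

# One row per (beer, user) pair, keeping the last occurrence
deduped = toy.drop_duplicates(subset=keys, keep='last')

print(to_discard['review_overall'].tolist())  # -> [3.0, 4.0]
print(deduped)
```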

In [None]:
# Earlier occurrences of each (beer, user) pair: these rows are discarded
ratings_dup = ratings[ratings.duplicated(subset=['beer_beerid','review_profilename'], keep="last")]
# Keep exactly one row per (beer, user) pair: the last occurrence
ratings_fin = ratings.drop_duplicates(subset=['beer_beerid','review_profilename'], keep="last")

In [None]:
print('Duplicate rows dropped :', ratings_dup.shape)
print('De-duplicated data frame :', ratings_fin.shape)
print('Rows removed :', ratings.shape[0] - ratings_fin.shape[0])

#### Verification post removal of true duplicates

In [None]:
ratings_fin.loc[(ratings_fin['review_profilename'] == 'AleWatcher' )
            & (ratings_fin['beer_beerid'] == 52211 ) ]

In [None]:
ratings_fin.loc[(ratings_fin['review_profilename'] == 'RedDiamond' )
            & (ratings_fin['beer_beerid'] == 962 ) ]

##### We no longer have `true duplicates`

#### 2) Check for NaN or null values

##### Compute the percentage of null values per column, store it in a dataframe, and keep the columns where the percentage is > 0

In [None]:
a = pd.DataFrame(round(100*(ratings_fin.isnull().sum()/len(ratings_fin.index)), 2)).reset_index()
a.columns = ['column_name', 'null_pct']
a = a.loc[ (a['null_pct'] > 0) , :]
a.sort_values(by='null_pct', ascending=False).head()

##### Only a small percentage of values are null, spread across different rows. Rather than imputing them, we drop those rows and then check what percentage of the data is lost

In [None]:
ratings_nan = ratings_fin.dropna(axis=0)


In [None]:
r1 = ratings_fin.shape[0]
r2 = ratings_nan.shape[0]
pct_dropped = 100 * (r1 - r2) / r1
pct_avail = 100 - pct_dropped
print("% of records dropped :", round(pct_dropped, 2))
print("% of records available :", round(pct_avail, 2))

In [None]:
# Drop NaN
ratings = ratings_fin.dropna(axis=0)

In [None]:
a = pd.DataFrame(round(100*(ratings.isnull().sum()/len(ratings.index)), 2)).reset_index()
a.columns = ['column_name', 'null_pct']
a = a.loc[ (a['null_pct'] > 0) , :]
a.sort_values(by='null_pct', ascending=False).head()

##### No more Null values

In [None]:
print('Total number of records after cleaning duplicates and NaN:', ratings.shape)

#### 3) Let's perform EDA

**Let's group by beer_beerid and bin the total number of reviews**

In [None]:
rb = ratings.groupby(['beer_beerid']).count()

bins = [0, 1, 5, 10, 50, 100, 500, 1000, 5000]
print(rb.groupby(pd.cut(rb['review_overall'], bins=bins)).size())


**Fewer than 10 ratings is very little for a beer, considering that 50+ beers have more than 500 reviews. We therefore set N = 10 and keep only beers with more than 10 reviews, so that every beer has a reasonable number of ratings before we build the model.**

**To apply the filter, we take the grouped counts, keep the beers with more than 10 reviews, set `beer_beerid` as the index of both the main and the grouped dataframe, and use `isin` to keep only the selected beers in the main dataframe.**
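Percentiles of the per-beer review counts are one way to sanity-check the choice of N before filtering. The sketch below uses a hypothetical toy frame and a toy cut-off of 2 (the notebook uses 10), and filters with `isin` on the `beer_beerid` column directly, which is equivalent to the index-based `isin` used in the notebook.

```python
import pandas as pd

# Hypothetical toy ratings: beer 10 has 4 reviews, beer 20 has 2, beer 30 has 1
ratings_toy = pd.DataFrame({
    'beer_beerid':        [10, 10, 10, 10, 20, 20, 30],
    'review_profilename': ['a', 'b', 'c', 'd', 'a', 'b', 'c'],
    'review_overall':     [4.0, 3.5, 5.0, 4.0, 2.0, 3.0, 4.5],
})

# Number of reviews per beer
reviews_per_beer = ratings_toy.groupby('beer_beerid').size()

# Percentiles of the review counts help pick the cut-off N
print(reviews_per_beer.quantile([0.25, 0.5, 0.75, 0.9]))

# Keep beers with more than N reviews (toy N = 2)
N = 2
kept_beers = reviews_per_beer[reviews_per_beer > N].index
filtered = ratings_toy[ratings_toy['beer_beerid'].isin(kept_beers)]
print(filtered.shape)
```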

In [None]:
# These are some of the highest reviews obtained for any beer_beerid
rb = ratings.groupby(['beer_beerid']).count()
rb.sort_values('review_overall', ascending= False).head(5)


In [None]:
# These are the lowest reviews obtained for any beer_beerid (single reviews)
rb.sort_values('review_overall', ascending= True).head(5)

In [None]:
selected_beers = pd.DataFrame(rb.loc[rb['review_overall'] > 10 , :] ).reset_index()
selected_beers.columns = ['beer_beerid', 'review_profilename' , 'review_overall']
print(selected_beers.shape)
i1 = ratings.set_index('beer_beerid').index
i2 = selected_beers.set_index('beer_beerid').index
tmp_df = ratings[i1.isin(i2)]

In [None]:
print('Original Dataframe :', ratings.shape)
print('After removing beers with 10 or fewer reviews, Dataframe :', tmp_df.shape)

#### Next, we exclude reviewers who rated 5 or fewer beers, so that reviewers who contributed very few reviews do not influence the model. A cut-off of 5 is reasonable

In [None]:
rb = tmp_df.groupby(['review_profilename']).count()

bins = [0, 1, 5, 10, 50, 100, 500, 1000, 5000]
print(rb.groupby(pd.cut(rb['review_overall'], bins=bins)).size())



#### We keep only reviewers who rated more than 5 beers: reviewers with so few ratings overlap little with other users, so their similarity scores would be unreliable
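A slightly shorter way to apply the same reviewer cut-off is `groupby(...).transform('count')`, which attaches each reviewer's count to every one of their rows, so no separate index and `isin` step is needed. A sketch on a hypothetical toy frame, with a toy cut-off of 2 (the notebook uses 5):

```python
import pandas as pd

# Hypothetical toy ratings: reviewer 'a' rated 3 beers, 'b' rated 2, 'c' rated 1
ratings_toy = pd.DataFrame({
    'beer_beerid':        [1, 2, 3, 1, 2, 3],
    'review_profilename': ['a', 'a', 'a', 'b', 'b', 'c'],
    'review_overall':     [4.0, 3.0, 5.0, 2.5, 4.0, 3.5],
})

# Per-row count of non-null beer IDs rated by that row's reviewer
n_rated = ratings_toy.groupby('review_profilename')['beer_beerid'].transform('count')

# Keep reviewers with more than the cut-off
MIN_BEERS = 2
active = ratings_toy[n_rated > MIN_BEERS]
print(active['review_profilename'].unique())
```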

In [None]:
profile_beer = tmp_df.groupby(['review_profilename']).count()
profile_beer.head(2)


In [None]:
selected_profiles = pd.DataFrame(profile_beer.loc[profile_beer['beer_beerid'] > 5 , :] ).reset_index()
selected_profiles.columns = ['review_profilename' , 'beer_beerid',  'review_overall']
i1 = tmp_df.set_index('review_profilename').index
i2 = selected_profiles.set_index('review_profilename').index
ratings = tmp_df[i1.isin(i2)]

In [None]:
print('Original Dataframe :', tmp_df.shape)
print('After removing reviewers with 5 or fewer reviews, Dataframe :', ratings.shape)

In [None]:
ratings.head(2)

**1) What are the unique values of ratings?**

In [None]:
a = pd.DataFrame(ratings.groupby(['review_overall'])['beer_beerid'].count()).reset_index()
a.columns = ['review_overall', 'count']
a.sort_values(by='count', ascending=False)

In [None]:
ratings.groupby(['review_overall'])['beer_beerid'].count().plot(kind='bar', figsize=(10,5))

In [None]:
print('Unique values for ratings are :', list(a['review_overall']))
# unique ratings are [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0] 

**2) Visualise the rating values and notice: <br>
a) The average beer ratings <br>
b) The average user ratings <br>
c) The average number of ratings given to the beers <br>
d) The average number of ratings given by the users <br>**
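All four quantities can be read off two `groupby(...).agg(['mean', 'count'])` calls. A sketch on a hypothetical toy frame; note that the mean of per-beer means weights every beer equally, so it generally differs from the plain mean over all ratings.

```python
import pandas as pd

# Hypothetical toy ratings
ratings_toy = pd.DataFrame({
    'beer_beerid':        [1, 1, 2, 2, 2, 3],
    'review_profilename': ['a', 'b', 'a', 'b', 'c', 'c'],
    'review_overall':     [4.0, 3.0, 5.0, 4.0, 3.0, 2.0],
})

# Per-beer and per-user mean rating and number of ratings
per_beer = ratings_toy.groupby('beer_beerid')['review_overall'].agg(['mean', 'count'])
per_user = ratings_toy.groupby('review_profilename')['review_overall'].agg(['mean', 'count'])

print('a) average beer rating        :', per_beer['mean'].mean())
print('b) average user rating        :', per_user['mean'].mean())
print('c) average # ratings per beer :', per_beer['count'].mean())
print('d) average # ratings per user :', per_user['count'].mean())
print('overall mean of all ratings   :', ratings_toy['review_overall'].mean())
```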

**a) The average beer ratings**

In [None]:
a = pd.DataFrame(ratings.groupby(['beer_beerid'])['review_overall'].mean()).reset_index()
a.columns = ['beer_beerid', 'average']
print ('** Top 5 average beer ratings with beer_beerid ** \n',
       a.sort_values(by='average', ascending=False).head(5))

print('** Bottom 5 average beer ratings with beer_beerid ** \n',
      a.sort_values(by='average', ascending=True).head(5))


**Distribution of The average beer ratings**

In [None]:
sns.set(font_scale=1.4)
sns.histplot(a['average'], kde=True)  # distplot is deprecated in recent seaborn
plt.show()

In [None]:
print('The average beer ratings:' , a['average'].mean())

**b) The average user ratings**

In [None]:
a = pd.DataFrame(ratings.groupby(['review_profilename'])['review_overall'].mean()).reset_index()
a.columns = ['review_profilename', 'average']
print ('** Top 5 reviewers with average ratings based on review_profilename ** \n',
       a.sort_values(by='average', ascending=False).head(5))

print('** Bottom 5 reviewers with average ratings based on review_profilename  ** \n',
      a.sort_values(by='average', ascending=True).head(5))



**Distribution of The average user ratings**

In [None]:
sns.set(font_scale=1.4)
sns.histplot(a['average'], kde=True)  # distplot is deprecated in recent seaborn
plt.show()

In [None]:
print('The average user ratings :',a['average'].mean())

**c) The average number of ratings given to the beers** 

In [None]:
a = pd.DataFrame(ratings.groupby(['beer_beerid'])['review_overall'].count()).reset_index()
a.columns = ['beer_beerid', 'count']
print ('**Top 5 number of beer ratings with beer_beerid ** \n', a.sort_values(by='count', ascending=False).head(5))
print('**Bottom 5 number of beer ratings with beer_beerid ** \n', a.sort_values(by='count', ascending=True).head(5))

In [None]:
sns.set(font_scale=1.4)
sns.histplot(a['count'], kde=True)  # distplot is deprecated in recent seaborn
plt.show()

In [None]:
print('The average number of ratings given to the beers :',a['count'].mean())

**d) The average number of ratings given by the users** 

In [None]:
a = pd.DataFrame(ratings.groupby(['review_profilename'])['review_overall'].count()).reset_index()
a.columns = ['review_profilename', 'count']
print ('** Top 5 reviewers with review_profilename ** \n', a.sort_values(by='count', ascending=False).head(5))
print('** Bottom 5 reviewers with review_profilename ** \n', a.sort_values(by='count', ascending=True).head(5))

In [None]:
sns.set(font_scale=1.4)
sns.histplot(a['count'], kde=True)  # distplot is deprecated in recent seaborn
plt.show()

In [None]:
print('The average number of ratings given by the users  :',a['count'].mean())

### Recommendation Models

1) Divide your data into training and testing dataset.

In [None]:
print('Dataset size before we begin split into test/train: ' , ratings.shape)

In [None]:
from sklearn.model_selection import train_test_split
train, test = train_test_split(ratings, test_size=0.30, random_state=49)

In [None]:
print(train.shape)
print(test.shape)

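**We use a simple random 70/30 split without stratification; with a dataset this large, the train and test sets are expected to have similar rating distributions.**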
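Whether or not the split is stratified, a random row-level split can leave some test users or beers with no ratings in the training set, and collaborative filtering cannot score them. A quick check, sketched on hypothetical toy frames (substituting `train` and `test` runs it on the real split):

```python
import pandas as pd

# Hypothetical train/test split of a toy ratings table
train_toy = pd.DataFrame({'beer_beerid': [1, 2, 3, 1],
                          'review_profilename': ['a', 'a', 'b', 'c'],
                          'review_overall': [4.0, 3.0, 5.0, 2.0]})
test_toy = pd.DataFrame({'beer_beerid': [2, 4],
                         'review_profilename': ['b', 'd'],
                         'review_overall': [3.5, 4.0]})

# Users / beers that occur in the test set but never in training
unseen_users = set(test_toy['review_profilename']) - set(train_toy['review_profilename'])
unseen_beers = set(test_toy['beer_beerid']) - set(train_toy['beer_beerid'])
print('test users missing from train:', unseen_users)
print('test beers missing from train:', unseen_beers)
```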

**2) Build user-based and item-based models. <br>
2a) User-based model**

In [None]:
dfuser = pd.pivot_table(train,index=['review_profilename'],values=['review_overall'],
               columns=['beer_beerid'],fill_value=0)

In [None]:
dfuser.head()

##### Copy the train and test datasets
These copies are used to build the masks for prediction and evaluation.
- `dummy_train` is used for recommendation: beers the user has already rated are marked 0 so they are excluded from the predictions, and beers the user has not rated are marked 1.
- `dummy_test` is used for evaluation: we only score predictions for beers the user rated in the test set, so these are marked 1 and everything else 0 (the opposite of `dummy_train`).
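The same 0/1 masks can be built directly from a pivot with `isna()`, which makes the convention explicit. A sketch on a hypothetical toy frame; passing `values='review_overall'` as a string (not a list) gives single-level columns, unlike the notebook's pivots.

```python
import pandas as pd

# Hypothetical toy training ratings
train_toy = pd.DataFrame({'beer_beerid': [1, 2, 1],
                          'review_profilename': ['a', 'a', 'b'],
                          'review_overall': [4.0, 3.0, 5.0]})

pivot = train_toy.pivot_table(index='review_profilename', columns='beer_beerid',
                              values='review_overall')

# dummy_train convention: rated -> 0, not rated -> 1 (only unrated beers are recommended)
mask_train = pivot.isna().astype(int)

# dummy_test convention is the opposite: rated -> 1, not rated -> 0
mask_rated = 1 - mask_train
print(mask_train)
```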

In [None]:
dummy_train = train.copy()
dummy_test = test.copy()

In [None]:
dummy_train['review_overall'] = dummy_train['review_overall'].apply(lambda x: 0 if x>=1 else 1)
dummy_test['review_overall'] = dummy_test['review_overall'].apply(lambda x: 1 if x>=1 else 0)

In [None]:
# Beers not rated by the user are marked as 1 for prediction
dummy_train =  pd.pivot_table(dummy_train,index=['review_profilename'],values=['review_overall'],
               columns=['beer_beerid'],fill_value=1)

dummy_test =  pd.pivot_table(dummy_test,index=['review_profilename'],values=['review_overall'],
               columns=['beer_beerid'],fill_value=0)

In [None]:
dummy_train.head()

In [None]:
dummy_test.head()

**Using similarity matrix - Cosine similarity**  

In [None]:
from sklearn.metrics.pairwise import pairwise_distances

# User Similarity Matrix
user_correlation = 1 - pairwise_distances(dfuser, metric='cosine')
user_correlation[np.isnan(user_correlation)] = 0
print(user_correlation)

In [None]:
user_correlation.shape

**Adjusted cosine: keep unrated entries as NaN (instead of filling them with 0) so that each user's mean is computed only over the beers that user actually rated**
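The centre-then-cosine step can be checked by hand on a tiny matrix. This sketch reimplements it in plain NumPy on hypothetical data; for users whose centred ratings are not all zero, it should agree with `1 - pairwise_distances(centred, metric='cosine')` up to rounding.

```python
import numpy as np
import pandas as pd

# Hypothetical user x beer ratings; NaN = not rated
R = pd.DataFrame({'b1': [5.0, 4.0, 1.0],
                  'b2': [3.0, np.nan, 2.0],
                  'b3': [np.nan, 2.0, 5.0]},
                 index=['u1', 'u2', 'u3'])

# Each user's mean over the beers they actually rated (pandas skips NaN by default)
user_mean = R.mean(axis=1)
centred = R.sub(user_mean, axis=0).fillna(0)   # unrated -> 0 after centring

# Cosine similarity between rows of the centred matrix
X = centred.to_numpy()
norms = np.linalg.norm(X, axis=1, keepdims=True)
norms[norms == 0] = 1.0                         # guard against all-zero rows
Xn = X / norms
sim = Xn @ Xn.T
print(np.round(sim, 3))
```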

In [None]:
dfuser_w_nan = pd.pivot_table(train,index=['review_profilename'],values=['review_overall'],
               columns=['beer_beerid'])

In [None]:
dfuser_w_nan.head()

**Normalising each user's beer ratings around a mean of 0**

In [None]:
mean = np.nanmean(dfuser_w_nan, axis=1)
df_subtracted = (dfuser_w_nan.T-mean).T

In [None]:
df_subtracted.head()

**Let's find Cosine similarity**

In [None]:
from sklearn.metrics.pairwise import pairwise_distances

# User Similarity Matrix
user_correlation = 1 - pairwise_distances(df_subtracted.fillna(0), metric='cosine')
user_correlation[np.isnan(user_correlation)] = 0
print(user_correlation)

**Prediction**

We predict using only users who are positively correlated with the current user, since we are interested in users with similar tastes. Correlations below 0 are therefore set to 0.

In [None]:
user_correlation[user_correlation<0]=0
user_correlation

The predicted rating of a user for each beer (rated or not) is the sum of all users' ratings for that beer, weighted by their similarity to that user (the user's own ratings are included with weight 1).
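Because this prediction is a weighted *sum* rather than a weighted *average*, its scale grows with the number of similar neighbours, which is presumably why the evaluation cells rescale predictions with `MinMaxScaler`. A commonly used alternative divides by the total similarity of the neighbours who rated the beer, keeping predictions on the 1 to 5 scale. A sketch with made-up numbers for one user and one beer:

```python
import numpy as np

# Hypothetical similarities of one target user to three neighbours (negatives already set to 0)
sim = np.array([0.9, 0.4, 0.0])
# Those neighbours' ratings for one beer; 0 would mean "not rated"
ratings_for_beer = np.array([4.0, 2.0, 5.0])

# What the notebook computes: an unnormalised weighted sum
weighted_sum = sim @ ratings_for_beer

# Normalised alternative: divide by the similarity mass of neighbours who rated the beer
rated = ratings_for_beer > 0
weighted_avg = weighted_sum / sim[rated].sum()

print(weighted_sum)  # approximately 4.4
print(weighted_avg)  # approximately 3.385
```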

In [None]:
user_predicted_ratings = np.dot(user_correlation, dfuser_w_nan.fillna(0))
user_predicted_ratings

In [None]:
user_predicted_ratings.shape

In [None]:
user_final_rating = np.multiply(user_predicted_ratings,dummy_train)
user_final_rating.head(10)

**Find the top 5 recommendations for the first user**

In [None]:
user_final_rating.iloc[0].sort_values(ascending=False)[0:5]

**2b) Item-based model**

Using Correlation

We take the transpose of the rating matrix so that the ratings are normalised around the mean of each beer (`beer_beerid`). In the user-based model, we took the mean for each user instead.

In [None]:
# Keep unrated entries as NaN (no fill_value) so each beer's mean uses only actual ratings
dfitem = pd.pivot_table(train,index=['review_profilename'],values=['review_overall'],
               columns=['beer_beerid']).T

In [None]:
dfitem.head()

Normalising the ratings of each beer around a mean of 0

In [None]:
mean = np.nanmean(dfitem, axis=1)
df_subtracted = (dfitem.T-mean).T

In [None]:
df_subtracted.head()

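**Cosine similarity** <br>
Because each beer's ratings are centred on that beer's mean before the cosine similarity is computed, the result is the Pearson correlation of those vectors, so the cosine and correlation metrics give the same value.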
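The equivalence is easy to verify numerically: the cosine similarity of two mean-centred vectors equals their Pearson correlation. A sketch with made-up ratings of two beers by the same five users:

```python
import numpy as np

# Two hypothetical beers rated by the same five users (fully observed)
beer_a = np.array([5.0, 4.0, 3.0, 4.0, 2.0])
beer_b = np.array([4.0, 4.0, 2.0, 5.0, 1.0])

# Centre each beer's ratings on its own mean
a_c = beer_a - beer_a.mean()
b_c = beer_b - beer_b.mean()

# Cosine similarity of the centred vectors
cosine_centred = a_c @ b_c / (np.linalg.norm(a_c) * np.linalg.norm(b_c))

# Pearson correlation of the raw vectors
pearson = np.corrcoef(beer_a, beer_b)[0, 1]

print(cosine_centred, pearson)  # equal up to floating-point rounding
```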

In [None]:
from sklearn.metrics.pairwise import pairwise_distances

# Item Similarity Matrix
item_correlation = 1 - pairwise_distances(df_subtracted.fillna(0), metric='cosine')
item_correlation[np.isnan(item_correlation)] = 0
print(item_correlation)

In [None]:
# Let's choose positive correlation
item_correlation[item_correlation<0]=0
item_correlation

**Prediction**

In [None]:
item_predicted_ratings = np.dot((dfitem.fillna(0).T),item_correlation)
item_predicted_ratings

In [None]:
item_predicted_ratings.shape

In [None]:
dummy_train.shape

**Filtering the rating only for the beers not rated by the users for recommendation**

In [None]:
item_final_rating = np.multiply(item_predicted_ratings,dummy_train)
item_final_rating.head()

**Top 5 item-based recommendations for the first user**

In [None]:
item_final_rating.iloc[0].sort_values(ascending=False)[0:5]

**5) Compare the performance of the two models using test data and suggest the one that should be deployed**

**User similarity**

In [None]:
T_dfuser = pd.pivot_table(test,index=['review_profilename'],values=['review_overall'],
               columns=['beer_beerid'])

mean = np.nanmean(T_dfuser, axis=1)
test_df_subtracted = (T_dfuser.T-mean).T

# User Similarity Matrix
test_user_correlation = 1 - pairwise_distances(test_df_subtracted.fillna(0), metric='cosine')
test_user_correlation[np.isnan(test_user_correlation)] = 0
print(test_user_correlation)

In [None]:
test_user_correlation.shape

In [None]:
test_user_correlation[test_user_correlation<0]=0
test_user_predicted_ratings = np.dot(test_user_correlation, T_dfuser.fillna(0))
test_user_predicted_ratings

**Test Prediction**

In [None]:
test_user_final_rating = np.multiply(test_user_predicted_ratings,dummy_test)
test_user_final_rating.head()

**Calculate Root Mean Square error/RMSE**
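RMSE is $\mathrm{RMSE}=\sqrt{\frac{1}{|\Omega|}\sum_{(u,i)\in\Omega}(r_{ui}-\hat r_{ui})^2}$, where $\Omega$ is the set of (user, beer) pairs that have an actual test rating. A sketch on hypothetical toy arrays, scoring only the cells where a real rating exists:

```python
import numpy as np

# Hypothetical actual test ratings (NaN = not in the test set) and predictions
actual = np.array([[4.0, np.nan, 3.0],
                   [np.nan, 5.0, 2.0]])
predicted = np.array([[3.5, 4.8, 3.0],
                      [1.0, 4.0, 3.0]])

# Only score cells where a real test rating exists
mask = ~np.isnan(actual)
errors = actual[mask] - predicted[mask]
rmse = np.sqrt(np.mean(errors ** 2))
print(rmse)  # 0.75
```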

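In [None]:
from sklearn.preprocessing import MinMaxScaler

# Keep only predictions for beers the user rated in the test set (zeros -> NaN)
X = test_user_final_rating.copy()
X = X[X > 0]

# Rescale the unnormalised weighted sums onto the 1-5 rating scale (column-wise, NaN ignored)
scaler = MinMaxScaler(feature_range=(1, 5))
y = scaler.fit_transform(X)

print(y)

In [None]:
# Actual test ratings in the same user x beer layout
test_ = pd.pivot_table(test, index=['review_profilename'], values=['review_overall'],
                       columns=['beer_beerid'])

In [None]:
# Number of (user, beer) pairs that received a prediction
total_non_nan = np.count_nonzero(~np.isnan(y))

In [None]:
# RMSE over the predicted pairs; np.nansum skips the NaN cells
rmse = np.sqrt(np.nansum((test_.values - y) ** 2) / total_non_nan)
print('RMSE for User-based model :  ', rmse)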

**Using Item Similarity**

In [None]:
T_dfitem = pd.pivot_table(test,index=['review_profilename'],values=['review_overall'],
               columns=['beer_beerid']).T


mean = np.nanmean(T_dfitem, axis=1)
test_df_subtracted = (T_dfitem.T-mean).T

test_item_correlation = 1 - pairwise_distances(test_df_subtracted.fillna(0), metric='cosine')
test_item_correlation[np.isnan(test_item_correlation)] = 0
test_item_correlation[test_item_correlation<0]=0

In [None]:
test_item_correlation.shape

In [None]:
T_dfitem.shape

In [None]:
test_item_predicted_ratings = (np.dot(test_item_correlation, T_dfitem.fillna(0))).T
test_item_final_rating = np.multiply(test_item_predicted_ratings,dummy_test)
test_item_final_rating.head()

In [None]:
from sklearn.preprocessing import MinMaxScaler

# Keep only predictions for beers the user rated in the test set (zeros -> NaN)
X = test_item_final_rating.copy()
X = X[X > 0]

# Rescale the unnormalised weighted sums onto the 1-5 rating scale (column-wise, NaN ignored)
scaler = MinMaxScaler(feature_range=(1, 5))
y = scaler.fit_transform(X)

# Actual test ratings in the same user x beer layout
test_ = pd.pivot_table(test, index=['review_profilename'], values=['review_overall'],
                       columns=['beer_beerid'])

# Number of (user, beer) pairs that received a prediction
total_non_nan = np.count_nonzero(~np.isnan(y))

In [None]:
# RMSE over the predicted pairs; np.nansum skips the NaN cells
rmse = np.sqrt(np.nansum((test_.values - y) ** 2) / total_non_nan)
print('RMSE for Item-based model :  ', rmse)

**Answer to 5): In our run, the RMSE of the user-based model was `1.84` and that of the item-based model `2.26`. Since the user-based model has the lower RMSE, it is the one we suggest deploying.**
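A caveat when reading these figures: the test cells rebuild the similarity matrix from the test ratings themselves, and each user's similarity to itself is 1, so a user's own test rating feeds into its own prediction. A more conventional check keeps the similarities learned on `train` and scores only held-out ratings. The sketch below illustrates that idea for a user-based model on a hypothetical toy matrix; `centred_cosine` is a helper written for this sketch, and it uses a normalised weighted average and excludes each user from its own neighbourhood, so it is not the notebook's method.

```python
import numpy as np

def centred_cosine(R):
    """Row-wise cosine similarity of mean-centred ratings; NaN = unrated."""
    means = np.nanmean(R, axis=1, keepdims=True)
    C = np.nan_to_num(R - means)            # unrated -> 0 after centring
    norms = np.linalg.norm(C, axis=1, keepdims=True)
    norms[norms == 0] = 1.0
    Cn = C / norms
    return Cn @ Cn.T

# Hypothetical user x beer training ratings (NaN = unrated) and held-out test cells
train_R = np.array([[5.0, 4.0, np.nan, 1.0],
                    [4.0, 5.0, 2.0, np.nan],
                    [1.0, np.nan, 5.0, 4.0]])
test_cells = [(0, 2, 2.0), (1, 3, 2.0)]     # (user, beer, actual rating)

sim = centred_cosine(train_R)               # learned on training data only
np.fill_diagonal(sim, 0.0)                  # a user is not its own neighbour
sim[sim < 0] = 0.0                          # keep positive neighbours, as in the notebook

sq_errors = []
for u, i, actual in test_cells:
    rated = ~np.isnan(train_R[:, i])        # neighbours who rated beer i in train
    weights = sim[u, rated]
    if weights.sum() == 0:
        continue                            # no usable neighbours: skip this cell
    pred = weights @ train_R[rated, i] / weights.sum()
    sq_errors.append((actual - pred) ** 2)

rmse = np.sqrt(np.mean(sq_errors))
print(len(sq_errors), rmse)
```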

**3) Determine how similar the first 10 users are to each other and visualise it.**

**We use a seaborn clustermap to show the similarity between the first 10 users. See the dendrogram on the left; you can cut it according to the number of similar clusters needed.** <br>

**The logic is as follows (the final rating matrix has MultiIndex columns, so we build the matrix row by row):** <br>
1) Iterate through the first ten rows (users) of `user_final_rating`, hence range(0,10) <br>
2) For each user, sort the predicted ratings in descending order and take the top 10 values <br>
3) Append the values (including the user name) to a list and then reshape the list <br>
4) Change the datatype of the elements from object to float <br>
5) Derive the cluster map. The dendrogram shows how closely the elements are related; cut it according to the number of similar clusters needed. <br>
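The steps above compare users through their ten highest *predicted* scores, which does not take into account which beers those scores belong to. A more direct measure of how similar the first 10 users are is the corresponding 10 x 10 block of the user similarity matrix computed earlier (`user_correlation`, whose rows and columns follow `dfuser.index`). A self-contained sketch, with a random hypothetical matrix standing in for `user_correlation`:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Stand-in for user_correlation: symmetric, 1 on the diagonal (hypothetical values)
A = rng.uniform(0, 1, size=(12, 12))
user_sim = (A + A.T) / 2
np.fill_diagonal(user_sim, 1.0)
user_names = [f'user_{k}' for k in range(12)]   # stand-in for dfuser.index

# 10 x 10 block for the first 10 users, labelled for plotting
first10 = pd.DataFrame(user_sim[:10, :10], index=user_names[:10], columns=user_names[:10])

# For each user, the most similar *other* user among the first 10
off_diag = first10.mask(np.eye(10, dtype=bool))
print(off_diag.idxmax(axis=1))
```

With the real matrix, `pd.DataFrame(user_correlation[:10, :10], index=dfuser.index[:10], columns=dfuser.index[:10])` passed to `sns.heatmap(..., annot=True)` would visualise the same block.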


**Using Correlation metric**

In [None]:
udf_  = []
for x in range(0, 10):
    tmp = user_final_rating.iloc[x]
    udf_.append(tmp.name)                          # user name (review_profilename)
    top_vals = tmp.sort_values(ascending=False)    # sort once per user
    for y in range(0, 10):
        udf_.append(top_vals.iloc[y])              # y-th highest predicted rating

udf_ = pd.DataFrame(np.array(udf_).reshape(10,11))
udf_.columns = ['review_profilename', '1','2','3','4','5','6','7','8','9','10']
udf_ = udf_.set_index('review_profilename')
udf_ = udf_.astype('float')
sns.clustermap(udf_, metric="correlation", cmap="mako", col_cluster=False)

**Single linkage clustering method**

In [None]:
# Reuse the udf_ matrix built above, clustered with single linkage
sns.clustermap(udf_, method="single", cmap="mako", col_cluster=False)

**The same comparison can be shown as a correlation matrix and heatmap of the top 10 predicted values for the first 10 users. Transposing `udf_` before calling `.corr()` puts the 10 users on both axes; the colours indicate how strongly each pair of users' top-10 values are correlated.**

In [None]:
plt.subplots(figsize=(10,10))
# Transpose so that .corr() correlates users (rows of udf_) rather than rank positions
sns.heatmap(udf_.T.corr(), annot=True, linewidths=.5)

**4) Compute and visualise the similarity between the first 10 beers.**

**We use a seaborn clustermap to show the similarity between the first 10 beers. See the dendrogram on the left; you can cut it according to the number of similar clusters needed.**  
**The logic is as follows (the final rating matrix has MultiIndex columns, so we build the matrix row by row):**<br>
1) Transpose `item_final_rating` so that beers are rows, and iterate through the first ten rows, hence range(0,10)<br> 
2) For each beer, sort its predicted ratings across users in descending order and take the top 10 values <br>
3) Append the values (including the beer ID) to a list and then reshape the list <br>
4) Change the datatype of the elements from object to float <br>
5) Derive the cluster map. The dendrogram shows how closely the elements are related; cut it according to the number of similar clusters needed. <br>
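Likewise, the similarity between the first 10 beers can be read straight from `item_correlation`, whose rows and columns follow `dfitem.index` (the beer ID is the second level of that index). The sketch below, on a hypothetical stand-in matrix, lists the most similar pairs among the first 10 beers, using the upper triangle so each pair appears once.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
# Stand-in for item_correlation (hypothetical values), symmetric with unit diagonal
B = rng.uniform(0, 1, size=(15, 15))
item_sim = (B + B.T) / 2
np.fill_diagonal(item_sim, 1.0)
beer_ids = np.arange(100, 115)                  # stand-in for the beer_beerid labels

block = item_sim[:10, :10]
rows, cols = np.triu_indices(10, k=1)           # each unordered pair once, no diagonal
pairs = pd.DataFrame({'beer_a': beer_ids[rows],
                      'beer_b': beer_ids[cols],
                      'similarity': block[rows, cols]})
print(pairs.sort_values('similarity', ascending=False).head(5))
```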


In [None]:
idf_  = []
item_T = item_final_rating.T                        # beers as rows
for x in range(0, 10):
    tmp = item_T.iloc[x]
    idf_.append(str(tmp.name[1]))                   # beer_beerid (second level of the MultiIndex)
    top_vals = tmp.sort_values(ascending=False)     # sort once per beer
    for y in range(0, 10):
        idf_.append(top_vals.iloc[y])               # y-th highest predicted rating for this beer

idf_ = pd.DataFrame(np.array(idf_).reshape(10,11))
idf_.columns = ['beer_beerid', '1','2','3','4','5','6','7','8','9','10']
idf_ = idf_.set_index('beer_beerid')
idf_ = idf_.astype('float')
sns.clustermap(idf_, cmap="mako", col_cluster=False)

**The same comparison can be shown as a correlation matrix and heatmap of the top 10 predicted values for the first 10 beers. Transposing `idf_` before calling `.corr()` puts the 10 beers on both axes; the colours indicate how strongly each pair of beers' top-10 values are correlated.**

In [None]:
plt.subplots(figsize=(10,10))
# Transpose so that .corr() correlates beers (rows of idf_) rather than rank positions
sns.heatmap(idf_.T.corr(), annot=True, linewidths=.5)

**6)Give the names of the top 5 beers that you would recommend to the users 'cokes', 'genog' and 'giblet' using both the models.** 
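The look-ups that follow repeat the same pattern for each user and each model. A small helper keeps it in one place and makes a missing user (for example, one removed by the review-count filters) fail with a clear message. `top_n` and the toy `scores` frame are hypothetical; with the notebook's matrices the call would be `top_n(user_final_rating, 'cokes')`.

```python
import pandas as pd

def top_n(final_rating: pd.DataFrame, user: str, n: int = 5) -> pd.Series:
    """Top-n beers for `user` from a users x beers score matrix (already masked)."""
    if user not in final_rating.index:
        raise KeyError(f'{user!r} is not in the training data after filtering')
    return final_rating.loc[user].nlargest(n)

# Hypothetical masked score matrix: 0 = already rated by that user
scores = pd.DataFrame([[0.0, 2.5, 1.0, 3.0],
                       [1.5, 0.0, 4.0, 0.5]],
                      index=['cokes', 'genog'],
                      columns=[11, 22, 33, 44])

for user in ['cokes', 'genog']:
    print(user, top_n(scores, user, n=2).index.tolist())
```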

**Recommendation based on User-prediction model**

In [None]:
# locate the index positions:
print ('Cokes index pos:', user_final_rating.index.get_loc('cokes'))
print ('Genog index pos:', user_final_rating.index.get_loc('genog'))
print ('Giblet index pos:',user_final_rating.index.get_loc('giblet'))

In [None]:
# Get top 5 predictions based on user similarity model
print (user_final_rating.iloc[user_final_rating.index.get_loc('cokes')].sort_values(ascending=False)[0:5])
print (user_final_rating.iloc[user_final_rating.index.get_loc('genog')].sort_values(ascending=False)[0:5])
print (user_final_rating.iloc[user_final_rating.index.get_loc('giblet')].sort_values(ascending=False)[0:5])

**Recommendation based on Item-prediction model**

In [None]:
# locate the index positions:
print ('Cokes index pos:', item_final_rating.index.get_loc('cokes'))
print ('Genog index pos:', item_final_rating.index.get_loc('genog'))
print ('Giblet index pos:',item_final_rating.index.get_loc('giblet'))

In [None]:
# Get top 5 predictions based on item similarity model
print (item_final_rating.iloc[item_final_rating.index.get_loc('cokes')].sort_values(ascending=False)[0:5])
print (item_final_rating.iloc[item_final_rating.index.get_loc('genog')].sort_values(ascending=False)[0:5])
print (item_final_rating.iloc[item_final_rating.index.get_loc('giblet')].sort_values(ascending=False)[0:5])