<img src="https://wallpaperaccess.com/full/1308159.jpg" alt="Amazon.com" class="center">

- Domain - E-commerce
- Context - Every day, a million products are recommended to users of e-commerce
websites based on popularity and other metrics. For the most popular e-commerce
website, recommendations reportedly boost average order value by 50%, increase
revenue by 300%, and improve conversion. Beyond being a powerful tool for
increasing revenue, product recommendations have become so essential that
customers now expect similar features on every other e-commerce site.
- Data Description -
The first three columns are userId, productId, and rating; the fourth column is
timestamp. The timestamp column can be discarded, since it is not used in this
case study.
- Source - Amazon Reviews data (http://jmcauley.ucsd.edu/data/amazon/) The
repository has several datasets. For this case study, we are using the Electronics
dataset.
- Learning Outcomes <br/>
Exploratory Data Analysis<br/>
Data Wrangling <br/>
Build a Popularity recommender model <br/>
Build Collaborative Filtering model <br/>

- Objective - To build a recommendation system that recommends at least five (5)
new products based on the user's habits.



<center style="background-color:tomato"> Load libraries </center>

In [None]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline


<center style="background-color:tomato">  1. Read and explore the given dataset </center>

In [None]:
df = pd.read_csv('../input/amazon-electronic-product-recommendation/ratings_Electronics (1).csv', names=['userId', 'productId','rating','timestamp'])
df.head()

In [None]:

df.info()

In [None]:
df.shape

In [None]:
df.rating.describe()

<center style="background-color:tomato"> 2. Take a subset of the dataset to make it less sparse/ denser </center>

In [None]:
df.drop_duplicates(inplace=True)

In [None]:
df.drop(columns=["timestamp"], inplace=True)

In [None]:
# take the first 10% of the rows (a positional slice, not a random sample)
df = df[:int(len(df) * .1)]
df.shape

In [None]:
# find minimum and maximum ratings
print('The minimum rating is: %d' %(df['rating'].min()))
print('The maximum rating is: %d' %(df['rating'].max()))

In [None]:
df.groupby('userId')['rating'].mean().sort_values(ascending=False).head(10)  

In [None]:
# check the Rating distribution in the range 1-5 for the Data given 

with sns.axes_style('white'):
    g = sns.catplot(x="rating", data=df, aspect=2.0, kind='count')
    g.set_ylabels("Total number of ratings")

In [None]:
print("Total data ")
print("*"*50)
print("\nTotal no of ratings :",df.shape[0])
print("Total No of Users   :", len(np.unique(df.userId)))
print("Total No of products  :", len(np.unique(df.productId)))

In [None]:
#Keep only the users who have given at least 50 ratings

counts1 = df['userId'].value_counts()
#print(counts1)
Data_new = df[df['userId'].isin(counts1[counts1 >= 50].index)]
#counts1
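
The same "keep only active users" filter can be tried on a handful of rows to see exactly which records survive. The sketch below is illustrative only: the toy data, the helper name `keep_active_users`, and the threshold of 2 are assumptions made for the example, and it uses `groupby(...).transform("count")` instead of `value_counts()`.

```python
import pandas as pd

# Toy ratings (made-up data): u1 has 3 ratings, u2 has 1, u3 has 2.
toy = pd.DataFrame({
    "userId":    ["u1", "u1", "u1", "u2", "u3", "u3"],
    "productId": ["p1", "p2", "p3", "p1", "p2", "p4"],
    "rating":    [5.0, 4.0, 3.0, 2.0, 4.0, 1.0],
})

def keep_active_users(ratings, min_ratings):
    """Return only the rows of users who gave at least `min_ratings` ratings."""
    # transform("count") broadcasts each user's rating count back onto every row
    per_user = ratings.groupby("userId")["rating"].transform("count")
    return ratings[per_user >= min_ratings].copy()

active = keep_active_users(toy, min_ratings=2)
print(active)
```

With a threshold of 2, the single rating from `u2` is dropped and the five rows from `u1` and `u3` remain.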

In [None]:
#highest rated products from the selected records. 

Data_new.groupby('productId')['rating'].mean().sort_values(ascending=False) 

In [None]:
#Calculate the density of the rating matrix

final_ratings_matrix = Data_new.pivot(index = 'userId', columns ='productId', values = 'rating').fillna(0)
print('Shape of final_ratings_matrix: ', final_ratings_matrix.shape)

given_num_of_ratings = np.count_nonzero(final_ratings_matrix)
print('given_num_of_ratings = ', given_num_of_ratings)
possible_num_of_ratings = final_ratings_matrix.shape[0] * final_ratings_matrix.shape[1]
print('possible_num_of_ratings = ', possible_num_of_ratings)
density = (given_num_of_ratings/possible_num_of_ratings)
density *= 100
print ('density: {:4.2f}%'.format(density))
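
As a sanity check on the density figure, the same ratio can be computed on a tiny example without building the full user-by-product matrix. This is a minimal sketch on made-up data; the helper name `rating_density` is hypothetical. It relies on the fact that ratings lie in 1-5, so counting non-zero cells after `fillna(0)` is the same as counting observed (user, product) pairs.

```python
import pandas as pd

# Toy ratings (made-up data): 3 users, 4 products, 5 observed ratings.
toy = pd.DataFrame({
    "userId":    ["u1", "u1", "u2", "u3", "u3"],
    "productId": ["p1", "p2", "p2", "p3", "p4"],
    "rating":    [5.0, 3.0, 4.0, 2.0, 1.0],
})

def rating_density(ratings):
    """Percentage of (user, product) cells that actually hold a rating."""
    n_given = len(ratings.drop_duplicates(["userId", "productId"]))
    n_possible = ratings["userId"].nunique() * ratings["productId"].nunique()
    return 100.0 * n_given / n_possible

toy_density = rating_density(toy)
print("density: {:4.2f}%".format(toy_density))
```

Here 5 of the 3 x 4 = 12 possible cells are filled, so the density is 5/12, about 41.67%.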

In [None]:
final_ratings_matrix.head()

In [None]:
# Matrix with one row per 'Product' and one column per 'user' for Item-based CF
final_ratings_matrix_T = final_ratings_matrix.transpose()
final_ratings_matrix_T.head()

<center style="background-color:tomato">  3. Build Popularity Recommender model. </center>

In [None]:
#Count of user_id for each unique product as recommendation score 
Data_new_grouped = Data_new.groupby('productId').agg({'userId': 'count'}).reset_index()
Data_new_grouped.rename(columns = {'userId': 'score'},inplace=True)
Data_new_grouped.head()

In [None]:
#Sort the products on recommendation score 
train_data_sort = Data_new_grouped.sort_values(['score', 'productId'], ascending=[False, True])

In [None]:
#Generate a recommendation rank based upon score 
train_data_sort['Rank'] = train_data_sort['score'].rank(ascending=False, method='first')

In [None]:
#Get the top 5 recommendations 
popularity_recommendations = train_data_sort.head(5) 
popularity_recommendations 

In [None]:
# Use popularity based recommender model to make predictions
def recommend(user_id):
    # Work on a copy so the shared top-5 table is not modified between calls
    user_recommendations = popularity_recommendations.copy()

    # Add the userId column, at the front, for which the recommendations are generated
    user_recommendations.insert(0, 'userId', user_id)

    return user_recommendations

In [None]:
find_recom = [15,21,53]   # This list is user choice.
for i in find_recom:
    print("Here is the recommendation for the userId: %d\n" %(i))
    print(recommend(i))    
    print("\n") 
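
The popularity steps above (count ratings per product, sort, rank, take the top N) can also be wrapped into small side-effect-free helpers. The following is a sketch on made-up data, not the notebook's actual pipeline; `popularity_top_n` and `recommend_for` are hypothetical names introduced here.

```python
import pandas as pd

def popularity_top_n(ratings, n=5):
    """Rank products by how many users rated them and return the top n."""
    scores = (ratings.groupby("productId")["userId"]
              .count()
              .rename("score")
              .reset_index())
    # Highest score first; ties are broken alphabetically by productId
    ranked = scores.sort_values(["score", "productId"],
                                ascending=[False, True]).head(n).copy()
    ranked["Rank"] = range(1, len(ranked) + 1)
    return ranked

def recommend_for(user_id, top_products):
    """Attach a userId column to a copy, leaving the shared table untouched."""
    out = top_products.copy()
    out.insert(0, "userId", user_id)
    return out

# Toy ratings (made-up data): p2 has 3 ratings, p1 has 2, p3 has 1.
toy = pd.DataFrame({
    "userId":    ["u1", "u2", "u3", "u1", "u2", "u3"],
    "productId": ["p2", "p2", "p2", "p1", "p1", "p3"],
    "rating":    [5.0, 4.0, 4.0, 3.0, 5.0, 2.0],
})

top = popularity_top_n(toy, n=2)
recs = recommend_for(15, top)
print(recs)
```

Because the model is non-personalised, every user id passed to `recommend_for` receives the same two products, `p2` and then `p1`.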

In [None]:
no_of_ratings_per_product = Data_new.groupby(by='productId')['rating'].count().sort_values(ascending=False)

fig = plt.figure(figsize=plt.figaspect(.5))
ax = plt.gca()
plt.plot(no_of_ratings_per_product.values)
plt.title('Ratings per Product')
plt.xlabel('Product')
plt.ylabel('No of ratings per product')
ax.set_xticklabels([])

plt.show()

In [None]:
# Top 30 most-rated products

popular_products = pd.DataFrame(Data_new.groupby('productId')['rating'].count())
most_popular = popular_products.sort_values('rating', ascending=False)
most_popular.head(30).plot(kind = "bar")
plt.title("Number of ratings per product (top 30)")
plt.show()

<center style="background-color:tomato"> 4. Split the data randomly into a train and test dataset.  </center>

In [None]:
from surprise import KNNWithMeans
from surprise import Dataset
from surprise import accuracy
from surprise import Reader
import os
from surprise.model_selection import train_test_split
from collections import defaultdict

In [None]:
#Reading the dataset (Surprise expects the columns in the order user, item, rating)
reader = Reader(rating_scale=(1, 5))
data1 = Dataset.load_from_df(Data_new[['userId', 'productId', 'rating']], reader)
data1

In [None]:
#Splitting the dataset
trainset, testset = train_test_split(data1, test_size=0.3,random_state=123)

In [None]:
trainset.ur

<center style="background-color:tomato"> 5. Build Collaborative Filtering model </center>

In [None]:
# Use user_based true/false to switch between user-based or item-based collaborative filtering
algo = KNNWithMeans(k=50, sim_options={'name': 'pearson_baseline', 'user_based': True})
algo.fit(trainset)

In [None]:
# run the trained model against the testset
test_pred = algo.test(testset)

In [None]:
test_pred

<center style="background-color:tomato"> Get top - K ( K = 5) recommendations. </center>

In [None]:
def get_top_n(predictions, n=5):
    # First map the predictions to each user.
    top_n = defaultdict(list)
    for uid, iid, true_r, est, _ in predictions:
        top_n[uid].append((iid, est))

    # Then sort the predictions for each user and retrieve the n highest ones.
    for uid, user_ratings in top_n.items():
        user_ratings.sort(key=lambda x: x[1], reverse=True)
        top_n[uid] = user_ratings[:n]

    return top_n

In [None]:
top_n = get_top_n(test_pred, n=5)
top_n
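
To see what the top-N grouping does without fitting a model, it can be run on a few hand-written prediction tuples in the same `(uid, iid, r_ui, est, details)` layout that `algo.test` returns. The sketch below is a variant that uses `heapq.nlargest` instead of a full sort; `top_n_per_user` and the toy predictions are assumptions made for illustration.

```python
import heapq
from collections import defaultdict

def top_n_per_user(predictions, n=5):
    """Group predictions by user and keep the n highest estimated ratings."""
    by_user = defaultdict(list)
    for uid, iid, _true_r, est, _details in predictions:
        by_user[uid].append((iid, est))
    # nlargest returns the n best (item, estimate) pairs, highest first
    return {uid: heapq.nlargest(n, items, key=lambda pair: pair[1])
            for uid, items in by_user.items()}

# Hand-written predictions (made-up values) in Surprise's tuple layout
toy_predictions = [
    ("u1", "p1", 4.0, 3.9, {}),
    ("u1", "p2", 5.0, 4.6, {}),
    ("u1", "p3", 2.0, 2.1, {}),
    ("u2", "p1", 3.0, 3.2, {}),
    ("u2", "p4", 4.0, 4.4, {}),
]

toy_top = top_n_per_user(toy_predictions, n=2)
for uid, items in toy_top.items():
    print(uid, [iid for iid, _ in items])
```

One caveat: predictions produced from the test set only cover products each user has already rated in the held-out data. To score products a user has not rated yet, Surprise's `trainset.build_anti_testset()` could supply the candidate pairs, though on a dataset of this size that list may be very large.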

In [None]:
# Print the recommended items for each user
for uid, user_ratings in top_n.items():
    print(uid, [iid for (iid, _) in user_ratings])

In [None]:
uid = "A231WM2Z2JL0U3"  # raw user id (as in the ratings file). They are **strings**!
iid = "B00004RC2D"  # raw item id (as in the ratings file). They are **strings**!

# Get a prediction for a specific user and item. r_ui (the true rating) is
# optional and only echoed in the output, so it is omitted here.
pred = algo.predict(uid, iid, verbose=True)

In [None]:
pred = pd.DataFrame(test_pred)
pred[pred['uid'] == 'A231WM2Z2JL0U3'][['iid', 'r_ui','est']].sort_values(by = 'est',ascending = False).head(10)

<center style="background-color:tomato">6. Evaluate the above model </center>

In [None]:
# get RMSE
print("User-based Model : Test Set")
accuracy.rmse(test_pred, verbose=True)
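
For reference, RMSE is the square root of the mean squared difference between the true ratings and the estimated ratings. A minimal worked sketch on made-up (true, estimated) pairs, with the hypothetical helper name `rmse`:

```python
import math

def rmse(pairs):
    """Root mean squared error over (true_rating, estimated_rating) pairs."""
    squared_errors = [(true_r - est) ** 2 for true_r, est in pairs]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Made-up pairs: errors are 1, 0, -2, 2, so squared errors are 1, 0, 4, 4
toy_pairs = [(5.0, 4.0), (3.0, 3.0), (1.0, 3.0), (4.0, 2.0)]
toy_rmse = rmse(toy_pairs)
print("RMSE: {:.4f}".format(toy_rmse))
```

The squared errors sum to 9, their mean is 2.25, and the square root gives an RMSE of 1.5. On a 1-5 rating scale, lower values mean the estimates sit closer to the true ratings.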

<center style="background-color:tomato"> 8. Summarise </center>

- Collaborative filtering (here the neighbourhood-based KNNWithMeans model) is a personalised recommender system: the recommendations are based on the users' past ratings and do not depend on any additional information about users or products.

- The popularity-based recommender system is non-personalised: its recommendations are based on frequency counts, which may not suit an individual user. The popularity-based model recommended the same set of 5 products to every user, whereas the collaborative filtering model recommended an entirely different list for each user based on that user's past ratings.

- Use the KNN-with-means recommender when user ratings are available, and the popularity-based recommender for a cold start (a new user with no rating history).
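
The last point, falling back to popularity for a cold start, can be sketched as a small dispatch function. Everything here is illustrative: `recommend_with_fallback`, the per-user dictionary, and the toy product ids are assumptions, standing in for the collaborative filtering top-N lists and the popularity ranking.

```python
def recommend_with_fallback(user_id, cf_top_n, popular_items, n=5):
    """Use per-user CF recommendations if available, else popular items."""
    personalised = cf_top_n.get(user_id)
    if personalised:
        # Known user: return the item ids of the highest estimated ratings
        return [iid for iid, _est in personalised[:n]]
    # Cold start: no rating history, so fall back to the popularity ranking
    return list(popular_items[:n])

# Made-up inputs for illustration only
toy_cf_top_n = {"u1": [("p7", 4.8), ("p3", 4.5), ("p9", 4.1)]}
toy_popular = ["p2", "p1", "p5", "p4"]

known_user_recs = recommend_with_fallback("u1", toy_cf_top_n, toy_popular, n=2)
new_user_recs = recommend_with_fallback("u99", toy_cf_top_n, toy_popular, n=2)
print(known_user_recs, new_user_recs)
```

A known user gets a personalised list, while an unseen user id receives the top of the popularity ranking.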