# Description

## Context
Online E-commerce websites like Amazon, Flipkart uses different recommendation models to provide different suggestions to different users. Amazon currently uses item-to-item collaborative filtering, which scales to massive data sets and produces high-quality recommendations in real-time.

## Objective
Build a recommendation system to recommend products to customers based on their previous ratings for other products. Apply the concepts and techniques you have learned in the previous weeks and summarise your insights at the end.

 

## Dataset
We are using the Electronics dataset from the Amazon Reviews data repository, which has several datasets.

## Attribute Information

**userId:** Every user identified with a unique id

**productId:** Every product identified with a unique id

**Rating:** Rating of the corresponding product by the corresponding user

**timestamp:** Time of the rating ( ignore this column for this exercise)

In [1]:
%matplotlib inline

import pandas as pd
from sklearn.model_selection import train_test_split
import numpy as np
import time
import sklearn.externals
import joblib

from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

# To supress warnings
import warnings

warnings.filterwarnings("ignore")

## Load Electronics dataset

In [4]:
df = pd.read_csv('ratings_Electronics.csv', names=['userId', 'productId', 'rating', 'timestamp'])

## Explore Data

In [6]:
df.head()

Unnamed: 0,userId,productId,Rating,timestamp
0,AKM1MP6P0OYPR,132793040,5.0,1365811200
1,A2CX7LUOHB2NDG,321732944,5.0,1341100800
2,A2NWSAGRHCP8N5,439886341,1.0,1367193600
3,A2WNBOD3WNDNKT,439886341,3.0,1374451200
4,A1GI0U4ZRJA8WN,439886341,1.0,1334707200


In [7]:
# there are 7824482 rows and 4 feature
df.shape

(7824482, 4)

In [9]:
len(df['productId'].unique())

476002

In [14]:
len(df['userId'].unique())

4201696

Observation:
- There are `476002` unique product.
- There are `4201696` unique user.

## Take a subset of the dataset to make it less sparse (or denser) (For example, keep the users only who have given 50 or more ratings.)

## Split the data randomly into train and test datasets

In [10]:
train_data, test_data = train_test_split(df, test_size = 0.20, random_state=0)


In [12]:
train_data.shape

(6259585, 4)

In [13]:
test_data.shape

(1564897, 4)

## Build Popularity Recommender model

In [15]:
#Count of user_id for each unique song as recommendation score 
train_data_grouped = train_data.groupby('productId').agg({'userId': 'count'}).reset_index()
train_data_grouped.rename(columns = {'userId': 'score'},inplace=True)
train_data_grouped.head()

Unnamed: 0,productId,score
0,321732944,1
1,439886341,3
2,511189877,4
3,528881469,24
4,558835155,1


In [16]:
#Sort the songs on recommendation score 
train_data_sort = train_data_grouped.sort_values(['score', 'productId'], ascending = [0,1]) 
      
#Generate a recommendation rank based upon score 
train_data_sort['Rank'] = train_data_sort['score'].rank(ascending=0, method='first') 
          
#Get the top 5 recommendations 
popularity_recommendations = train_data_sort.head(5) 
popularity_recommendations 

Unnamed: 0,productId,score,Rank
285236,B0074BW614,14642,1.0
395341,B00DR0PDNE,13139,2.0
302531,B007WTAJTO,11383,3.0
95915,B0019EHU8G,9786,4.0
274350,B006GWO5WK,9770,5.0


## Build Collaborative Filtering model

## Evaluate both models (Once the model is trained on the training data, it can be used to compute the error (RMSE) on predictions made on the test data.)

## Get top K (K = 5) recommendations (Since our goal is to recommend new products to each user based on his/her habits, we will recommend 5 new products.)

## Summarize your insights