# RS - Project

## Domain:
Smartphone, Electronics

## Context:
India is the second largest market globally for smartphones after China. About 134 million smartphones were sold in India in 2017, and sales are estimated to reach about 442 million in 2022. India ranked second in Asia Pacific for the average time smartphone users spend on the mobile web. The combination of very high sales volumes and this consumer behaviour has made India a very attractive market for foreign vendors. According to consumer behaviour data, 97% of consumers turn to a search engine when they are buying a product, versus 15% who turn to social media. If a seller succeeds in showing smartphones that match a user's behaviour or choices at the right place, there is a 90% chance that the user will enquire about them. This case study aims to build a recommendation system based on individual consumers' behaviour or choices.

## Data Description:

• author : name of the person who gave the rating <br>
• country : country the person who gave the rating belongs to <br>
• date : date of the rating <br>
• domain : website from which the rating was taken <br>
• extract : text of the review <br>
• lang : language in which the rating was given <br>
• phone_url : URL path identifying the phone (e.g. /cellphones/samsung-galaxy-s8/) <br>
• product : name of the product/mobile phone for which the rating was given <br>
• score : rating given in the review, on a 10-point scale <br>
• score_max : maximum possible score on the rating scale (10 for every row in this data) <br>
• source : source from which the rating was taken <br>

## Project Objective:
We will build a recommendation system using popularity-based and collaborative filtering methods to recommend mobile phones to a user: the most popular phones overall and phones personalised to that user, respectively.

In [1]:
# Importing the libraries
!pip install scikit-surprise  # the Surprise library is published on PyPI as scikit-surprise
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
from sklearn import preprocessing
from collections import defaultdict
from surprise import SVD
from surprise import KNNWithMeans
from surprise import Dataset
from surprise import accuracy
from surprise import Reader
from surprise.model_selection import cross_validate
from surprise.model_selection import train_test_split

# Suppressing Warnings
import warnings
warnings.filterwarnings('ignore')





# Smartphone, Electronics

## 1. Import the necessary libraries, read the provided CSVs into data frames, and perform the steps below.

### A. Merge all the provided CSVs into one data-frame.

In [2]:
#Loading Data files
ph1 = pd.read_csv('phone_user_review_file_1.csv')
ph2 = pd.read_csv('phone_user_review_file_2.csv')
ph3 = pd.read_csv('phone_user_review_file_3.csv')
ph4 = pd.read_csv('phone_user_review_file_4.csv')
ph5 = pd.read_csv('phone_user_review_file_5.csv')
ph6 = pd.read_csv('phone_user_review_file_6.csv')   

In [3]:
ph1.head().T

Unnamed: 0,0,1,2,3,4
phone_url,/cellphones/samsung-galaxy-s8/,/cellphones/samsung-galaxy-s8/,/cellphones/samsung-galaxy-s8/,/cellphones/samsung-galaxy-s8/,/cellphones/samsung-galaxy-s8/
date,5/2/2017,4/28/2017,5/4/2017,5/2/2017,5/11/2017
lang,en,en,en,en,en
country,us,us,us,us,us
source,Verizon Wireless,Phone Arena,Amazon,Samsung,Verizon Wireless
domain,verizonwireless.com,phonearena.com,amazon.com,samsung.com,verizonwireless.com
score,10.0,10.0,6.0,9.2,4.0
score_max,10.0,10.0,10.0,10.0,10.0
extract,As a diehard Samsung fan who has had every Sam...,Love the phone. the phone is sleek and smooth ...,Adequate feel. Nice heft. Processor's still sl...,Never disappointed. One of the reasons I've be...,I've now found that i'm in a group of people t...
author,CarolAnn35,james0923,R. Craig,Buster2020,S Ate Mine


In [4]:
ph2.head().T

Unnamed: 0,0,1,2,3,4
phone_url,/cellphones/leagoo-lead-7/,/cellphones/leagoo-lead-7/,/cellphones/leagoo-lead-7/,/cellphones/leagoo-lead-7/,/cellphones/leagoo-lead-7/
date,4/15/2015,5/23/2015,4/27/2015,4/22/2015,4/18/2015
lang,en,en,en,en,en
country,us,gb,gb,gb,gb
source,Amazon,Amazon,Amazon,Amazon,Amazon
domain,amazon.com,amazon.co.uk,amazon.co.uk,amazon.co.uk,amazon.co.uk
score,2.0,10.0,8.0,10.0,10.0
score_max,10.0,10.0,10.0,10.0,10.0
extract,"The telephone headset is of poor quality , not...",This is my first smartphone so I have nothing ...,Great phone. Battery life not great but seems ...,Best 90 quid I've ever spent on a smart phone,I m happy with this phone.it s very good.thx team
author,luis,Mark Lavin,tracey,Reuben Ingram,viorel


In [5]:
ph3.head().T

Unnamed: 0,0,1,2,3,4
phone_url,/cellphones/samsung-galaxy-s-iii-slim-sm-g3812/,/cellphones/samsung-galaxy-s-iii-slim-sm-g3812/,/cellphones/samsung-galaxy-s-iii-slim-sm-g3812/,/cellphones/samsung-galaxy-s-iii-slim-sm-g3812/,/cellphones/samsung-galaxy-s-iii-slim-sm-g3812/
date,11/7/2015,10/2/2015,9/2/2015,9/2/2015,9/1/2015
lang,pt,pt,pt,pt,pt
country,br,br,br,br,br
source,Submarino,Submarino,Submarino,Submarino,Colombo
domain,submarino.com.br,submarino.com.br,submarino.com.br,submarino.com.br,colombo.com.br
score,6.0,10.0,10.0,8.0,8.0
score_max,10.0,10.0,10.0,10.0,10.0
extract,"recomendo, eu comprei um, a um ano, e agora co...",Comprei um pouco desconfiada do site e do celu...,"Muito bom o produto, obvio que tem versões mel...",Unica ressalva fica para a camera que poderia ...,Rapidez e atenção na entrega. O aparelho é mui...
author,herlington tesch,Luisa Silva Marieta,Cyrus,Marcela Santa Clara Brito,Claudine Maria Kuhn Walendorff


In [6]:
ph4.head().T

Unnamed: 0,0,1,2,3,4
phone_url,/cellphones/samsung-s7262-duos-galaxy-ace/,/cellphones/samsung-s7262-duos-galaxy-ace/,/cellphones/samsung-s7262-duos-galaxy-ace/,/cellphones/samsung-s7262-duos-galaxy-ace/,/cellphones/samsung-s7262-duos-galaxy-ace/
date,3/11/2015,17/11/2015,29/10/2015,29/10/2015,29/10/2015
lang,en,en,en,en,en
country,us,in,in,in,in
source,Amazon,Zopper,Amazon,Amazon,Amazon
domain,amazon.com,zopper.com,amazon.in,amazon.in,amazon.in
score,2.0,10.0,4.0,6.0,10.0
score_max,10.0,10.0,10.0,10.0,10.0
extract,was not conpatable with my phone as stated. I ...,Decent Functions and Easy to Operate Pros:- Th...,Not Good Phone such price. Hang too much and v...,not bad for features,Excellent product
author,Frances DeSimone,Expert Review,Amazon Customer,Amazon Customer,NHK


In [7]:
ph5.head().T

Unnamed: 0,0,1,2,3,4
phone_url,/cellphones/karbonn-k1616/,/cellphones/karbonn-k1616/,/cellphones/karbonn-k1616/,/cellphones/karbonn-k1616/,/cellphones/karbonn-k1616/
date,7/13/2016,7/13/2016,7/13/2016,4/25/2014,4/23/2013
lang,en,en,en,en,en
country,in,in,in,in,in
source,91 Mobiles,91 Mobiles,91 Mobiles,Naaptol,Naaptol
domain,91mobiles.com,91mobiles.com,91mobiles.com,naaptol.com,naaptol.com
score,2.0,6.0,4.0,10.0,10.0
score_max,10.0,10.0,10.0,10.0,10.0
extract,I bought 1 month before. currently speaker is ...,"I just bought one week back, I have Airtel con...",one problem in this handset opera is not worki...,here Karbonn comes up with an another excellen...,"What a phone, all so on Naaptol my god 23% off..."
author,venkatesh,Venkat,krrish,BRIJESH CHAUHAN,Suraj CHAUHAN


In [8]:
ph6.head().T

Unnamed: 0,0,1,2,3,4
phone_url,/cellphones/samsung-instinct-sph-m800/,/cellphones/samsung-instinct-sph-m800/,/cellphones/samsung-instinct-sph-m800/,/cellphones/samsung-instinct-sph-m800/,/cellphones/samsung-instinct-sph-m800/
date,9/16/2011,2/13/2014,12/30/2011,10/18/2008,9/6/2008
lang,en,en,en,en,en
country,us,us,us,us,us
source,Phone Arena,Amazon,Phone Scoop,HandCellPhone,Reviewed.com
domain,phonearena.com,amazon.com,phonescoop.com,handcellphone.com,reviewed.com
score,8.0,6.0,9.0,4.0,6.0
score_max,10.0,10.0,10.0,10.0,10.0
extract,I've had the phone for awhile and it's a prett...,to be clear it is not the sellers fault that t...,Well i love this phone. i have had ton of phon...,I have had my Instinct for several months now ...,i have had this instinct phone for about two m...
author,ajabrams95,Stephanie,snickers,A4C,betaBgood


In [9]:
ph1.shape

(374910, 11)

In [10]:
ph2.shape

(114925, 11)

In [11]:
ph3.shape

(312961, 11)

In [12]:
ph4.shape

(98284, 11)

In [13]:
ph5.shape

(350216, 11)

In [14]:
ph6.shape

(163837, 11)

In [15]:
ph_merge = pd.concat([ph1,ph2,ph3,ph4,ph5,ph6],axis=0)

In [16]:
ph_copy = ph_merge.copy()

### B. Explore and understand the data, and share at least 2 observations.

In [17]:
# Check the data type and non-null count of each column
ph_merge.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1415133 entries, 0 to 163836
Data columns (total 11 columns):
 #   Column     Non-Null Count    Dtype  
---  ------     --------------    -----  
 0   phone_url  1415133 non-null  object 
 1   date       1415133 non-null  object 
 2   lang       1415133 non-null  object 
 3   country    1415133 non-null  object 
 4   source     1415133 non-null  object 
 5   domain     1415133 non-null  object 
 6   score      1351644 non-null  float64
 7   score_max  1351644 non-null  float64
 8   extract    1395772 non-null  object 
 9   author     1351931 non-null  object 
 10  product    1415132 non-null  object 
dtypes: float64(2), object(9)
memory usage: 129.6+ MB


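#### All columns are of type object (strings) except score and score_max, which are float64.

#### score, score_max, extract, author and product have fewer non-null values than the 1,415,133 rows, so these columns contain missing values.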

In [18]:
ph_merge.shape

(1415133, 11)

In [19]:
ph_merge.describe()

| | score | score_max |
|---|---|---|
| count | 1351644.0 | 1351644.0 |
| mean | 8.00706 | 10.0 |
| std | 2.616121 | 0.0 |
| min | 0.2 | 10.0 |
| 25% | 7.2 | 10.0 |
| 50% | 9.2 | 10.0 |
| 75% | 10.0 | 10.0 |
| max | 10.0 | 10.0 |


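#### The mean score is about 8.0 out of 10, with a standard deviation of about 2.62. The median (9.2) is above the mean, so ratings are skewed towards the top of the scale, and score_max is 10 for every row.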

In [20]:
ph_merge.head(2)

Unnamed: 0,phone_url,date,lang,country,source,domain,score,score_max,extract,author,product
0,/cellphones/samsung-galaxy-s8/,5/2/2017,en,us,Verizon Wireless,verizonwireless.com,10.0,10.0,As a diehard Samsung fan who has had every Sam...,CarolAnn35,Samsung Galaxy S8
1,/cellphones/samsung-galaxy-s8/,4/28/2017,en,us,Phone Arena,phonearena.com,10.0,10.0,Love the phone. the phone is sleek and smooth ...,james0923,Samsung Galaxy S8


### D. Check for missing values. Impute the missing values, if any.

In [21]:
#check for missing values
ph_merge.isnull().values.any() # If there are any null values in data set

True

In [22]:
null_counts = ph_merge.isnull().sum()  # number of null values in each column
print(null_counts)

phone_url        0
date             0
lang             0
country          0
source           0
domain           0
score        63489
score_max    63489
extract      19361
author       63202
product          1
dtype: int64


In [23]:
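# Fill the missing values in the numeric columns 'score' and 'score_max' with their medians.
# numeric_only=True is needed in pandas >= 2.0, where median() no longer skips text columns.
ph_merge = ph_merge.fillna(ph_merge.median(numeric_only=True))

# Drop the remaining rows with missing 'extract', 'author' or 'product'
ph_merge = ph_merge.dropna()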

### C. Round off scores to the nearest integers.

In [24]:
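# Round to the nearest integer before casting: astype(int) on its own truncates (8.7 -> 8)
ph_merge['score'] = ph_merge['score'].round().astype(int)
ph_merge['score_max'] = ph_merge['score_max'].round().astype(int)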
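To see why the rounding step matters, here is a minimal stdlib sketch comparing truncation (what `astype(int)` does on its own) with rounding, using fractional scores like the 9.2 seen above. The helper names are hypothetical. pandas' `Series.round()`, like Python's built-in `round()`, sends exact .5 ties to the even integer. Note also that the smallest score, 0.2, becomes 0 either way, which is below the `rating_scale=(1, 10)` later passed to Surprise's `Reader`.

```python
import math


def truncate_score(score):
    """What .astype(int) does on its own: drop the fractional part."""
    return int(score)


def round_half_up(score):
    """Nearest integer, with exact .5 ties rounded up (fine for non-negative scores)."""
    return math.floor(score + 0.5)


scores = [9.2, 8.7, 6.5, 0.2]
print([truncate_score(s) for s in scores])  # [9, 8, 6, 0]
print([round(s) for s in scores])           # [9, 9, 6, 0]  ties go to the even integer
print([round_half_up(s) for s in scores])   # [9, 9, 7, 0]
```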

In [25]:
ph_merge.shape

(1336416, 11)

### E. Check for duplicate values and remove them, if any.

In [26]:
ph_merge_drop = ph_merge.drop_duplicates()

### G. Drop irrelevant features. Keep features like Author, Product, and Score.

In [27]:
# Keep only author, product and score: the other columns (including score_max) do not contribute to popularity or to user-item ratings
ph_merge_drop = ph_merge_drop.drop(columns=['phone_url', 'date', 'lang', 'country', 'source', 'domain', 'score_max', 'extract'])

In [28]:
ph_merge_copy = ph_merge_drop.copy()

In [29]:
ph_merge_drop.shape

(1331600, 3)

### F. Keep only 1 million data samples. Use random state=612.

In [30]:
ph = ph_merge_drop.sample(n=1000000, random_state=612)

## 2. Answer the following questions.

### A. Identify the most rated features.

In [31]:
# Products with the highest mean score
ph.groupby('product')['score'].mean().sort_values(ascending=False).head()  

product
Smartphone Sony Xperia E1 Desbloqueado Vivo Android 4.3 Tela 4 4GB 3G Wi-Fi Câmera 3MP - Branco                    10.0
Samsung Smartphone Samsung Galaxy S5 Desbloqueado Branco Android 4.4.2 4G Câmera 16 MP Memória Interna 16 GB       10.0
Samsung Smartphone Samsung Galaxy S5 Duos Desbloqueado/ Dual Chip / Branco / 4G / 16 MP / Android 4.4              10.0
Samsung Smartphone Samsung Galaxy S5 Desbloqueado/ Branco / 4G / 16 MP / Android 4.4.2 / 16 GB / USB 3.0           10.0
Samsung Smartphone Samsung Galaxy S5 Desbloqueado Vivo Preto Android 4.4.2 4G Câmera 16 MP Memória Interna 16GB    10.0
Name: score, dtype: float64

### B. Identify the users with the most reviews.

In [32]:
(ph['author'].value_counts()).head()

Amazon Customer    57765
Cliente Amazon     14564
e-bit               6309
Client d'Amazon     5720
Amazon Kunde        3624
Name: author, dtype: int64
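The top "authors" above look like placeholder names rather than individuals: "Amazon Customer", "Cliente Amazon", "Client d'Amazon" and "Amazon Kunde" appear to be the same default name in different languages, and "e-bit" looks like a source name. Treating each as one user would merge many unrelated people into a single profile for collaborative filtering. A minimal sketch of excluding them, using hypothetical toy rows and a hand-picked (not exhaustive) list of names:

```python
from collections import Counter

# Hand-picked from the value_counts output above; illustrative, not exhaustive
PLACEHOLDER_AUTHORS = {
    "Amazon Customer", "Cliente Amazon", "e-bit", "Client d'Amazon", "Amazon Kunde",
}


def drop_placeholder_authors(rows):
    """Return the (author, product, score) rows whose author is not a placeholder name."""
    return [row for row in rows if row[0] not in PLACEHOLDER_AUTHORS]


# Hypothetical toy rows, not taken from the dataset
rows = [
    ("Amazon Customer", "OnePlus 3", 10),
    ("Amazon Customer", "Nokia C6", 2),
    ("Paul B", "OnePlus 3", 8),
    ("Cliente Amazon", "Moto G", 9),
]
print(Counter(author for author, _, _ in rows).most_common(1))  # [('Amazon Customer', 2)]
print(drop_placeholder_authors(rows))  # [('Paul B', 'OnePlus 3', 8)]
```

In pandas the same filter would be `ph[~ph['author'].isin(PLACEHOLDER_AUTHORS)]`.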

In [33]:
# The products with the most reviews
ph['product'].value_counts().head()

Lenovo Vibe K4 Note (White,16GB)     3908
Lenovo Vibe K4 Note (Black, 16GB)    3234
OnePlus 3 (Graphite, 64 GB)          3128
OnePlus 3 (Soft Gold, 64 GB)         2643
Huawei P8lite zwart / 16 GB          1994
Name: product, dtype: int64

### C. Select the data with products having more than 50 ratings and users who have given more than 50 ratings. Report the shape of the final dataset.

In [34]:
# extracting authors who gave greater than 50 ratings
ph1 = pd.DataFrame(columns=['author', 'a_count'])
ph1['author']=ph['author'].value_counts().index.tolist() 
ph1['a_count'] = list(ph['author'].value_counts() > 50)

In [35]:
# get names of indexes for which count column value is False
index_names = ph1[ ph1['a_count'] == False ].index 
# drop these row indexes from dataFrame 
ph1.drop(index_names, inplace = True) 
ph1

Unnamed: 0,author,a_count
0,Amazon Customer,True
1,Cliente Amazon,True
2,e-bit,True
3,Client d'Amazon,True
4,Amazon Kunde,True
...,...,...
674,Jens,True
675,cemdiler,True
676,Валерия,True
677,Ann,True


In [36]:
# extracting product that got more than 50 ratings
ph2 = pd.DataFrame(columns=['product', 'p_count'])
ph2['product']=ph['product'].value_counts().index.tolist() 
ph2['p_count'] = list(ph['product'].value_counts() > 50)

In [37]:
# get names of indexes for which count column value is False
index_names = ph2[ ph2['p_count'] == False ].index 
# drop these row indexes from dataFrame 
ph2.drop(index_names, inplace = True)

In [38]:
ph2

Unnamed: 0,product,p_count
0,"Lenovo Vibe K4 Note (White,16GB)",True
1,"Lenovo Vibe K4 Note (Black, 16GB)",True
2,"OnePlus 3 (Graphite, 64 GB)",True
3,"OnePlus 3 (Soft Gold, 64 GB)",True
4,Huawei P8lite zwart / 16 GB,True
...,...,...
4341,Alcatel OT-708,True
4342,Apple iPhone 5 32GB wei??,True
4343,Apple iPhone 5c 16GB Blue SIM-Free Smartphone,True
4344,"Samsung Galaxy S4 Mini, 16GB (Verizon Wireless)",True


In [39]:
# selecting data rows where product is having more than 50 ratings.  
ph3 = ph[ph['product'].isin(ph2['product'])] 
ph3

Unnamed: 0,score,author,product
104246,10,Paul B,Samsung i897 Captivate Android Smartphone Gala...
78693,10,Yuvraj,"Blu Win JR LTE (Grey, 4GB)"
8816,2,Joyce D. Pratt,"BLU Vivo XL Smartphone - 5.5"" 4G LTE - GSM Unl..."
116623,10,David B,Samsung S3350 Chat 335 Sim Free Mobile Phone
35333,10,Sebastian,"Samsung E1190 Handy (3,6 cm (1,43 Zoll) Displa..."
...,...,...,...
87173,8,Javier,Huawei Ascend Y330 - Smartphone libre Android ...
281625,8,Patrix,"Huawei Ascend G510 Smartphone Touch, Fotocamer..."
110881,2,Amazon Customer,"Apple iPhone 5C Factory Unlocked Cellphone, 8G..."
36197,10,majere1975,"Samsung Smartphone Galaxy S Advance, Display 4..."


In [40]:
# Select the rows from ph3 whose author has given more than 50 ratings,
# so that we get the data with products having more than 50 ratings and users who have given more than 50 ratings
ph4 = ph3[ph3['author'].isin(ph1['author'])]
ph4

Unnamed: 0,score,author,product
35333,10,Sebastian,"Samsung E1190 Handy (3,6 cm (1,43 Zoll) Displa..."
290678,8,sara,"Samsung SM-N910F Galaxy Note 4 Smartphone, 32 ..."
101404,10,Евгений,Sony Xperia Z1 Compact (лайм)
223332,8,Amazon Customer,Motorola Moto G 3rd Generation SIM-Free Smartp...
361379,10,e-bit,Smartphone Motorola Moto G 4 Play XT1603
...,...,...,...
21110,2,Amazon customer,Tracfone Motorola Moto E Android Prepaid Phone...
321740,8,Qantas,Sony Ericsson K810i Cyber-shot
269553,9,Capyto,Samsung M150 Cep Telefonu
87173,8,Javier,Huawei Ascend Y330 - Smartphone libre Android ...


In [41]:
# Shape of the final dataset.
ph4.shape

(108983, 3)
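The two-step filter above can be written more directly by counting ratings per author and per product and keeping the rows where both counts exceed the threshold. A stdlib sketch on (author, product, score) triples; the function name and toy data are hypothetical:

```python
from collections import Counter


def filter_active(rows, min_user_ratings=50, min_item_ratings=50):
    """Keep (author, product, score) rows whose author has more than min_user_ratings
    ratings and whose product has more than min_item_ratings ratings.

    Both counts are taken on the input rows, as in the cells above.
    """
    user_counts = Counter(author for author, _, _ in rows)
    item_counts = Counter(product for _, product, _ in rows)
    return [
        (author, product, score)
        for author, product, score in rows
        if user_counts[author] > min_user_ratings
        and item_counts[product] > min_item_ratings
    ]


# Hypothetical toy rows; thresholds lowered to 1 so the effect is visible
rows = [("a", "x", 5), ("a", "y", 4), ("b", "x", 3)]
print(filter_active(rows, min_user_ratings=1, min_item_ratings=1))  # [('a', 'x', 5)]
```

With pandas, an equivalent expression on the sample `ph` would be `ph[ph['author'].map(ph['author'].value_counts()).gt(50) & ph['product'].map(ph['product'].value_counts()).gt(50)]`, which should select the same rows as `ph4`.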

## 3. Build a popularity-based model and recommend the top 5 mobile phones.

In [42]:
# Mean score per product
ratings_mean_count = pd.DataFrame(ph.groupby('product')['score'].mean()) 

In [43]:
# Number of ratings per product
ratings_mean_count['rating_counts'] = ph.groupby('product')['score'].count()

In [44]:
# Recommend the 5 mobile phones with the highest mean score, breaking ties by the number of ratings
ratings_mean_count.sort_values(by=['score','rating_counts'], ascending=[False,False]).head()

| product | score | rating_counts |
|---|---|---|
| Samsung Galaxy Note5 | 10.0 | 144 |
| Nokia Smartphone Nokia Lumia 520 Desbloqueado Oi Preto Windows Phone 8 Câmera 5MP 3G Wi-Fi Memória Interna 8G GPS | 10.0 | 132 |
| Motorola Smartphone Motorola Moto X Desbloqueado Preto Android 4.2.2 Câmera 10MP e Frontal 2MP Memória Interna de 16GB GSM | 10.0 | 131 |
| Samsung Smartphone Galaxy Win Duos Branco Desbloqueado Dual Chip Câmera 5MP Processador Quad Core 1.2 Ghz Android 4.1 3G Wi- Fi e Memória 8GB | 10.0 | 127 |
| Motorola Smartphone Motorola Moto G Dual Chip Desbloqueado TIM Android 4.3 Tela 4.5 8GB 3G Wi-Fi Câmera 5MP - Preto | 10.0 | 126 |
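Sorting by raw mean score rewards products with few ratings (the top entries in section 2A have a mean of exactly 10.0). One common remedy, which this notebook does not use, is a damped (Bayesian-style) mean that pulls each product's mean towards the global mean: (C * m + sum of scores) / (C + n), where m is the global mean, n the product's rating count and C a prior weight. A stdlib sketch; the value of C (`prior_weight`) and the toy data are assumptions for illustration:

```python
from collections import defaultdict


def damped_means(rows, prior_weight=50):
    """Rank products by a damped mean of their (product, score) rows.

    damped = (prior_weight * global_mean + sum_of_scores) / (prior_weight + n_ratings)
    Products with few ratings are pulled towards the global mean.
    """
    global_mean = sum(score for _, score in rows) / len(rows)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for product, score in rows:
        sums[product] += score
        counts[product] += 1
    damped = {
        p: (prior_weight * global_mean + sums[p]) / (prior_weight + counts[p])
        for p in sums
    }
    return sorted(damped.items(), key=lambda kv: kv[1], reverse=True)


# Toy data: "A" has a single 10, "B" has a hundred 9s, "C" a hundred 5s
rows = [("A", 10)] + [("B", 9)] * 100 + [("C", 5)] * 100
ranking = damped_means(rows, prior_weight=10)
print([p for p, _ in ranking])  # ['B', 'A', 'C']
```

Ranked by raw mean, "A" would come first on the strength of one rating; the damped mean puts the consistently well-rated "B" on top.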


In [45]:
data_pb = ph
ph

Unnamed: 0,score,author,product
104246,10,Paul B,Samsung i897 Captivate Android Smartphone Gala...
78693,10,Yuvraj,"Blu Win JR LTE (Grey, 4GB)"
109329,10,Pankaj Bhalla,"Lenovo P780 (Deep Black, 4GB)"
64164,6,Bgrazina,Samsung Galaxy XCover 2
8816,2,Joyce D. Pratt,"BLU Vivo XL Smartphone - 5.5"" 4G LTE - GSM Unl..."
...,...,...,...
70406,4,Dudls,Nokia 301 Dual
16189,8,Cintaaa__,LG Viewty KU990
99081,10,ALBERT M. MASSILLON,BLU Dash JR K Smartphone - Unlocked - Black
102484,2,Amazon Customer,Samsung Galaxy S6 SM-G920F 32GB (FACTORY UNLOC...


## 4. Build a collaborative filtering model using SVD. 

In [46]:
# Reorder the columns to user id, item id and rating, the order Surprise's Dataset.load_from_df expects
columns_titles = ['author','product','score']
ph_merge_svd = ph_merge_drop.reindex(columns=columns_titles)

In [47]:
# Keep only 5000 data samples. Use random state=612
ph_merge_svd_data = ph_merge_svd.sample(n=5000, random_state=612)

In [48]:
reader = Reader(rating_scale=(1, 10))
data = Dataset.load_from_df(ph_merge_svd_data,reader = reader)

In [49]:
trainset = data.build_full_trainset()

In [50]:
trainset.ur

defaultdict(list,
            {0: [(0, 10.0)],
             1: [(1, 10.0)],
             2: [(2, 10.0)],
             3: [(3, 6.0)],
             4: [(4, 2.0)],
             5: [(5, 10.0)],
             6: [(6, 10.0), (1363, 10.0)],
             7: [(7, 10.0)],
             8: [(8, 8.0), (465, 9.0)],
             9: [(9, 8.0)],
             10: [(10, 10.0)],
             11: [(11, 2.0)],
             12: [(12, 8.0)],
             13: [(13, 8.0)],
             14: [(14, 10.0)],
             15: [(15, 10.0)],
             16: [(16, 2.0)],
             17: [(17, 8.0)],
             18: [(18, 10.0)],
             19: [(19, 9.0)],
             20: [(20, 8.0)],
             21: [(21, 10.0),
              (909, 9.0),
              (2202, 6.0),
              (2551, 10.0),
              (3378, 9.0),
              (3614, 10.0)],
             22: [(22, 2.0)],
             23: [(23, 10.0)],
             24: [(24, 8.0)],
             25: [(25, 10.0)],
             26: [(26, 10.0)],
             27:

In [51]:
algo = SVD()
algo.fit(trainset)

<surprise.prediction_algorithms.matrix_factorization.SVD at 0x27e008f0ac0>
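Surprise's `SVD` predicts a rating as `r_hat(u, i) = mu + b_u + b_i + q_i . p_u` (global mean, user bias, item bias, and the dot product of latent factor vectors) and fits the parameters by stochastic gradient descent. The sketch below reimplements that idea in plain Python to show the mechanics; it is not Surprise's code, and its hyperparameters are illustrative (Surprise's defaults are n_factors=100, n_epochs=20, lr_all=0.005, reg_all=0.02). It also shows that a user with no training ratings gets `mu + b_i` for every item, so items are ranked by their bias alone. Users with a single rating are close to that situation, which is a plausible reason the SVD recommendations look so similar across users.

```python
import random


def train_biased_mf(ratings, n_factors=2, n_epochs=200, lr=0.01, reg=0.02, seed=612):
    """Fit r_hat(u, i) = mu + b_u + b_i + q_i . p_u by SGD and return a predict function.

    ratings: list of (user, item, rating) triples.
    """
    rng = random.Random(seed)
    mu = sum(r for _, _, r in ratings) / len(ratings)
    # sorted() keeps initialisation deterministic (set order depends on string hashing)
    users = sorted({u for u, _, _ in ratings})
    items = sorted({i for _, i, _ in ratings})
    b_u = {u: 0.0 for u in users}
    b_i = {i: 0.0 for i in items}
    p = {u: [rng.gauss(0, 0.1) for _ in range(n_factors)] for u in users}
    q = {i: [rng.gauss(0, 0.1) for _ in range(n_factors)] for i in items}

    def predict(u, i):
        # Unknown users or items contribute no bias and no factor term
        est = mu + b_u.get(u, 0.0) + b_i.get(i, 0.0)
        if u in p and i in q:
            est += sum(pf * qf for pf, qf in zip(p[u], q[i]))
        return est

    for _ in range(n_epochs):
        for u, i, r in ratings:
            err = r - predict(u, i)
            b_u[u] += lr * (err - reg * b_u[u])
            b_i[i] += lr * (err - reg * b_i[i])
            for f in range(n_factors):
                puf, qif = p[u][f], q[i][f]
                p[u][f] += lr * (err * qif - reg * puf)
                q[i][f] += lr * (err * puf - reg * qif)
    return predict


# Hypothetical toy ratings on the 10-point scale
ratings = [
    ("ann", "OnePlus 3", 10), ("ann", "Nokia C6", 4),
    ("bob", "OnePlus 3", 9), ("bob", "Moto G", 8),
    ("cy", "Nokia C6", 2), ("cy", "Moto G", 6),
]
predict = train_biased_mf(ratings)
train_rmse = (sum((r - predict(u, i)) ** 2 for u, i, r in ratings) / len(ratings)) ** 0.5
print(f"training RMSE: {train_rmse:.2f}")
# A user with no ratings gets mu + b_i, so items are ranked by item bias alone
items = ["OnePlus 3", "Nokia C6", "Moto G"]
print(sorted(items, key=lambda i: predict("new_user", i), reverse=True))
```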

In [52]:
# Then predict ratings for all pairs (u, i) that are NOT in the training set.
testset = trainset.build_anti_testset()

In [53]:
predictions = algo.test(testset)

In [54]:
predictions

[Prediction(uid='Paul B', iid='Blu Win JR LTE (Grey, 4GB)', r_ui=8.0086, est=8.467213717892571, details={'was_impossible': False}),
 Prediction(uid='Paul B', iid='Lenovo P780 (Deep Black, 4GB)', r_ui=8.0086, est=8.361581709557907, details={'was_impossible': False}),
 Prediction(uid='Paul B', iid='Samsung Galaxy XCover 2', r_ui=8.0086, est=8.195330167161272, details={'was_impossible': False}),
 Prediction(uid='Paul B', iid='BLU Vivo XL Smartphone - 5.5" 4G LTE - GSM Unlocked - Solid Gold', r_ui=8.0086, est=8.198071887882056, details={'was_impossible': False}),
 Prediction(uid='Paul B', iid='Samsung S3350 Chat 335 Sim Free Mobile Phone', r_ui=8.0086, est=8.458669889125334, details={'was_impossible': False}),
 Prediction(uid='Paul B', iid='Samsung E1190 Handy (3,6 cm (1,43 Zoll) Display, Dual-Band) titan gray', r_ui=8.0086, est=8.401057263048063, details={'was_impossible': False}),
 Prediction(uid='Paul B', iid='LG Nexus 4 Smartphone, Nero [Italia]', r_ui=8.0086, est=8.662481054415625, de

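#### The output above lists every (user, item) pair that is not in the training set, with its estimated rating `est`. Note that `r_ui` is not a real rating here: `build_anti_testset()` fills it with the global mean rating (8.0086).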

## 8. Try and recommend top 5 products for test users.

In [55]:
def get_top_n(predictions, n=5):
    # First map the predictions to each user.
    top_n = defaultdict(list)
    for uid, iid, true_r, est, _ in predictions:
        top_n[uid].append((iid, est))

    # Then sort the predictions for each user and retrieve the n highest ones.
    for uid, user_ratings in top_n.items():
        user_ratings.sort(key=lambda x: x[1], reverse=True)
        top_n[uid] = user_ratings[:n]

    return top_n

In [56]:
top_n = get_top_n(predictions, n=5)

In [57]:
top_n 

defaultdict(list,
            {'Paul B': [('Samsung G935 Galaxy S7 Edge Smartphone da 32GB, Argento [Italia]',
               9.205105231396642),
              ('OnePlus 3T (Gunmetal, 6GB RAM + 64GB memory)',
               9.155369844077224),
              ('OnePlus 3 (Graphite, 64 GB)', 8.94804785794504),
              ('Huawei P8 lite Smartphone, Display 5.0" IPS, Dual Sim, Processore Octa-Core, Memoria 16 GB, Fotocamera 13 MP, Android 5.0, Bianco',
               8.923320095465483),
              ('OnePlus One (Sandstone Black, 64GB)', 8.86442993018417)],
             'Yuvraj': [('OnePlus 3T (Gunmetal, 6GB RAM + 64GB memory)',
               9.19550427792605),
              ('Samsung Galaxy S7 edge Smartphone, 13,9 cm (5,5 Zoll) Display, LTE (4G)',
               9.06972316645103),
              ('OnePlus 3 (Graphite, 64 GB)', 8.990353924972373),
              ('Lenovo Motorola Moto G Smartphone, 4,5 pollici display HD, processore Qualcomm, memoria 16GB, MicroSIM, Android 4.3 OS, f

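#### Above are the top 5 predicted items and their estimated ratings for each test user. The same handful of phones (OnePlus 3T, Huawei P8 lite, Samsung Galaxy S7 edge, Moto G) recur for almost every user, most likely because most users have only one rating in the 5,000-row sample, so the item biases dominate the predictions.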

In [58]:
# Print the recommended items for each user
for uid, user_ratings in top_n.items():
    print(uid, [iid for (iid, _) in user_ratings])

Paul B ['Samsung G935 Galaxy S7 Edge Smartphone da 32GB, Argento [Italia]', 'OnePlus 3T (Gunmetal, 6GB RAM + 64GB memory)', 'OnePlus 3 (Graphite, 64 GB)', 'Huawei P8 lite Smartphone, Display 5.0" IPS, Dual Sim, Processore Octa-Core, Memoria 16 GB, Fotocamera 13 MP, Android 5.0, Bianco', 'OnePlus One (Sandstone Black, 64GB)']
Yuvraj ['OnePlus 3T (Gunmetal, 6GB RAM + 64GB memory)', 'Samsung Galaxy S7 edge Smartphone, 13,9 cm (5,5 Zoll) Display, LTE (4G)', 'OnePlus 3 (Graphite, 64 GB)', 'Lenovo Motorola Moto G Smartphone, 4,5 pollici display HD, processore Qualcomm, memoria 16GB, MicroSIM, Android 4.3 OS, fotocamera da 5 MP, Nero [Germania]', 'Lenovo Motorola Moto G 4G (2 Generazione) Smartphone, Display 5 Pollici, LTE, Fotocamera 8 MP, Memoria 8 GB, Android 5 Lollipop, Nero [Italia]']
Pankaj Bhalla ['Huawei P8 lite Smartphone, Display 5.0" IPS, Dual Sim, Processore Octa-Core, Memoria 16 GB, Fotocamera 13 MP, Android 5.0, Bianco', 'OnePlus 3T (Gunmetal, 6GB RAM + 64GB memory)', 'Lenovo Mo

Felix ['Huawei P8 lite Smartphone, Display 5.0" IPS, Dual Sim, Processore Octa-Core, Memoria 16 GB, Fotocamera 13 MP, Android 5.0, Bianco', 'Samsung Galaxy S7 32GB (T-Mobile)', 'Samsung Galaxy S7 edge Smartphone, 13,9 cm (5,5 Zoll) Display, LTE (4G)', 'Lenovo Motorola Moto G 4G (2 Generazione) Smartphone, Display 5 Pollici, LTE, Fotocamera 8 MP, Memoria 8 GB, Android 5 Lollipop, Nero [Italia]', 'OnePlus 3T (Gunmetal, 6GB RAM + 64GB memory)']
CClark ['Samsung Galaxy S7 edge Smartphone, 13,9 cm (5,5 Zoll) Display, LTE (4G)', 'Huawei P8 lite Smartphone, Display 5.0" IPS, Dual Sim, Processore Octa-Core, Memoria 16 GB, Fotocamera 13 MP, Android 5.0, Bianco', 'Samsung Galaxy S7 32GB (T-Mobile)', 'Samsung Galaxy S6 32GB (Verizon)', 'Lenovo Motorola Moto G Smartphone, 4,5 pollici display HD, processore Qualcomm, memoria 16GB, MicroSIM, Android 4.3 OS, fotocamera da 5 MP, Nero [Germania]']
David ['OnePlus 3T (Gunmetal, 6GB RAM + 64GB memory)', 'Samsung Galaxy S7 32GB (T-Mobile)', 'Huawei P8 lit

kds199315 ['Huawei P8 lite Smartphone, Display 5.0" IPS, Dual Sim, Processore Octa-Core, Memoria 16 GB, Fotocamera 13 MP, Android 5.0, Bianco', 'Samsung Galaxy S7 edge Smartphone, 13,9 cm (5,5 Zoll) Display, LTE (4G)', 'Lenovo Motorola Moto G 4G (2 Generazione) Smartphone, Display 5 Pollici, LTE, Fotocamera 8 MP, Memoria 8 GB, Android 5 Lollipop, Nero [Italia]', 'Lenovo Motorola Moto G Smartphone, 4,5 pollici display HD, processore Qualcomm, memoria 16GB, MicroSIM, Android 4.3 OS, fotocamera da 5 MP, Nero [Germania]', 'Samsung Galaxy S7 32GB (T-Mobile)']
Peter Marxbauer ['Huawei P8 lite Smartphone, Display 5.0" IPS, Dual Sim, Processore Octa-Core, Memoria 16 GB, Fotocamera 13 MP, Android 5.0, Bianco', 'OnePlus 3T (Gunmetal, 6GB RAM + 64GB memory)', 'Samsung Galaxy S7 edge Smartphone, 13,9 cm (5,5 Zoll) Display, LTE (4G)', 'Lenovo Motorola Moto G 4G (2 Generazione) Smartphone, Display 5 Pollici, LTE, Fotocamera 8 MP, Memoria 8 GB, Android 5 Lollipop, Nero [Italia]', 'Samsung Galaxy S7 3

patrick ['Huawei P8 lite Smartphone, Display 5.0" IPS, Dual Sim, Processore Octa-Core, Memoria 16 GB, Fotocamera 13 MP, Android 5.0, Bianco', 'OnePlus 3T (Gunmetal, 6GB RAM + 64GB memory)', 'OnePlus 3 (Graphite, 64 GB)', 'Lenovo Motorola Moto G 4G (2 Generazione) Smartphone, Display 5 Pollici, LTE, Fotocamera 8 MP, Memoria 8 GB, Android 5 Lollipop, Nero [Italia]', 'Lenovo Motorola Moto G Smartphone, 4,5 pollici display HD, processore Qualcomm, memoria 16GB, MicroSIM, Android 4.3 OS, fotocamera da 5 MP, Nero [Germania]']
RG-GONZA ['Huawei P8 lite Smartphone, Display 5.0" IPS, Dual Sim, Processore Octa-Core, Memoria 16 GB, Fotocamera 13 MP, Android 5.0, Bianco', 'Samsung Galaxy S7 edge Smartphone, 13,9 cm (5,5 Zoll) Display, LTE (4G)', 'OnePlus 3T (Gunmetal, 6GB RAM + 64GB memory)', 'Samsung Galaxy S7 32GB (T-Mobile)', 'OnePlus 3 (Graphite, 64 GB)']
Daniel R Smith ['Huawei P8 lite Smartphone, Display 5.0" IPS, Dual Sim, Processore Octa-Core, Memoria 16 GB, Fotocamera 13 MP, Android 5.0, 

txmate ['Samsung Galaxy S7 edge 32GB (AT&T)', 'Lenovo Motorola Moto G 4G (2 Generazione) Smartphone, Display 5 Pollici, LTE, Fotocamera 8 MP, Memoria 8 GB, Android 5 Lollipop, Nero [Italia]', 'OnePlus 3 (Graphite, 64 GB)', 'Samsung Galaxy Express I8730', 'Lenovo Motorola Moto G Smartphone, 4,5 pollici display HD, processore Qualcomm, memoria 16GB, MicroSIM, Android 4.3 OS, fotocamera da 5 MP, Nero [Germania]']
kal4ak ['OnePlus 3T (Gunmetal, 6GB RAM + 64GB memory)', 'Samsung G935 Galaxy S7 Edge Smartphone da 32GB, Argento [Italia]', 'Huawei P8 lite Smartphone, Display 5.0" IPS, Dual Sim, Processore Octa-Core, Memoria 16 GB, Fotocamera 13 MP, Android 5.0, Bianco', 'Samsung Galaxy S7 edge Smartphone, 13,9 cm (5,5 Zoll) Display, LTE (4G)', 'Lenovo Motorola Moto G Smartphone, 4,5 pollici display HD, processore Qualcomm, memoria 16GB, MicroSIM, Android 4.3 OS, fotocamera da 5 MP, Nero [Germania]']
Lee ['Samsung Galaxy S7 32GB (T-Mobile)', 'OnePlus 3T (Gunmetal, 6GB RAM + 64GB memory)', 'Leno

Czeresnia ['Lenovo Motorola Moto G Smartphone, 4,5 pollici display HD, processore Qualcomm, memoria 16GB, MicroSIM, Android 4.3 OS, fotocamera da 5 MP, Nero [Germania]', 'OnePlus 3T (Gunmetal, 6GB RAM + 64GB memory)', 'Huawei P8 lite Smartphone, Display 5.0" IPS, Dual Sim, Processore Octa-Core, Memoria 16 GB, Fotocamera 13 MP, Android 5.0, Bianco', 'Lenovo Motorola Moto G 4G (2 Generazione) Smartphone, Display 5 Pollici, LTE, Fotocamera 8 MP, Memoria 8 GB, Android 5 Lollipop, Nero [Italia]', 'Samsung Galaxy S7 edge 32GB (Verizon)']
MIchele ['Huawei P8 lite Smartphone, Display 5.0" IPS, Dual Sim, Processore Octa-Core, Memoria 16 GB, Fotocamera 13 MP, Android 5.0, Bianco', 'Samsung Galaxy S7 edge Smartphone, 13,9 cm (5,5 Zoll) Display, LTE (4G)', 'Lenovo Motorola Moto G Smartphone, 4,5 pollici display HD, processore Qualcomm, memoria 16GB, MicroSIM, Android 4.3 OS, fotocamera da 5 MP, Nero [Germania]', 'Samsung Galaxy S7 Smartphone, 12,9 cm (5,1 Zoll) Display, LTE (4G)', 'Samsung Galaxy 

## 5. Evaluate the collaborative model. Print RMSE value for SVD

In [59]:
print("SVD Model : Test Set")
accuracy.rmse(predictions, verbose=True)

SVD Model : Test Set
RMSE: 0.3367


0.3366969769065758

In [60]:
cross_validate(algo, data, measures=['RMSE'], cv=3, verbose=False)

{'test_rmse': array([2.56720483, 2.54275215, 2.64241449]),
 'fit_time': (0.25313353538513184, 0.20103812217712402, 0.20013070106506348),
 'test_time': (0.029204845428466797,
  0.016809463500976562,
  0.013332366943359375)}

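#### The RMSE of 0.3367 above is not a valid accuracy estimate: the anti-testset's `r_ui` values are placeholders equal to the global mean, so it only measures how far the predictions stray from that mean. The 3-fold cross-validated RMSE (2.567, 2.543 and 2.642, about 2.58 on average) compares predictions against real held-out ratings and is the meaningful figure.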
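RMSE is the square root of the mean of (r_ui - est)² over the predictions. A small stdlib sketch makes the anti-testset problem concrete: scoring estimates against the placeholder global mean rewards predictions that stay near the mean, and a model that always predicts the mean gets a perfect 0. The estimates are a few `est` values from the SVD prediction output above, rounded; the helper name is hypothetical.

```python
import math


def rmse(pairs):
    """Root-mean-square error over (true_rating, estimated_rating) pairs."""
    return math.sqrt(sum((true - est) ** 2 for true, est in pairs) / len(pairs))


global_mean = 8.0086  # the r_ui placeholder in the anti-testset predictions above
estimates = [8.47, 8.36, 8.20, 8.46]  # a few est values from that output, rounded

# Scored against the placeholder, RMSE only measures distance from the global mean
print(round(rmse([(global_mean, e) for e in estimates]), 3))

# A "model" that ignores the data and always answers the global mean scores perfectly
print(rmse([(global_mean, global_mean)] * len(estimates)))  # 0.0
```

For a held-out SVD figure, the pattern used in the KNN section applies: `trainset, testset = train_test_split(data, test_size=.15)`, `algo.fit(trainset)`, then `accuracy.rmse(algo.test(testset))`.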

In [61]:
def get_Iu(uid):
    """ return the number of items rated by given user
    args: 
      uid: the id of the user
    returns: 
      the number of items rated by the user
    """
    try:
        return len(trainset.ur[trainset.to_inner_uid(uid)])
    except ValueError: # user was not part of the trainset
        return 0
    
def get_Ui(iid):
    """ return number of users that have rated given item
    args:
      iid: the raw id of the item
    returns:
      the number of users that have rated the item.
    """
    try: 
        return len(trainset.ir[trainset.to_inner_iid(iid)])
    except ValueError:
        return 0
    
bf = pd.DataFrame(predictions, columns=['uid', 'iid', 'rui', 'est', 'details'])
bf['Iu'] = bf.uid.apply(get_Iu)
bf['Ui'] = bf.iid.apply(get_Ui)
bf['err'] = abs(bf.est - bf.rui)
best_predictions = bf.sort_values(by='err')[:10]
worst_predictions = bf.sort_values(by='err')[-10:]


In [62]:
best_predictions

Unnamed: 0,uid,iid,rui,est,details,Iu,Ui,err
5906033,kemal Hebano,BLU Star 4.5 US GSM (White),8.0086,8.0086,{'was_impossible': False},1,1,7.043609e-08
15163967,Dolce86,Мобильный телефон LG K220 X Power Dual Sim Black,8.0086,8.0086,{'was_impossible': False},1,1,1.153592e-07
7453499,linpanboys,Samsung SGH i450 - Blue Smartphone,8.0086,8.0086,{'was_impossible': False},1,1,1.326811e-07
6725001,D. Lawson,"Samsung Galaxy S Plus I9001 Smartphone (10,16 ...",8.0086,8.0086,{'was_impossible': False},1,1,2.192662e-07
5728737,asti,Samsung Galaxy S7 32GB UK SIM-Free Smartphone ...,8.0086,8.0086,{'was_impossible': False},1,2,2.196556e-07
3168365,AnnettaBirbetta,SAMSUNG Wave 3,8.0086,8.0086,{'was_impossible': False},1,1,2.548671e-07
15376910,Glaucia Camilo Ferreira,"Sony Xperia P Smartphone (10,2 cm (4 Zoll) Tou...",8.0086,8.0086,{'was_impossible': False},1,3,3.164126e-07
2058754,zeba,Samsung GT-I5500,8.0086,8.0086,{'was_impossible': False},1,1,3.381626e-07
16741176,Suelen,Samsung GALAXY S6 G920 32GB Unlocked GSM 4G LT...,8.0086,8.0086,{'was_impossible': False},1,1,3.457664e-07
7197499,mircan,Nokia C6,8.0086,8.0086,{'was_impossible': False},1,1,3.664029e-07


## 4. Build an item-based collaborative filtering model using KNNWithMeans from Surprise

In [64]:
# Read dataset.
reader = Reader(rating_scale=(1, 10))
data_I = Dataset.load_from_df(ph_merge_svd_data,reader = reader)

In [65]:
trainset_I, testset_I = train_test_split(data_I, test_size=.15)

In [66]:
# Set user_based=True for user-based or user_based=False for item-based collaborative filtering
algo = KNNWithMeans(k=50, sim_options={'name': 'pearson_baseline', 'user_based': False})
algo.fit(trainset_I)

Estimating biases using als...
Computing the pearson_baseline similarity matrix...
Done computing similarity matrix.


<surprise.prediction_algorithms.knns.KNNWithMeans at 0x280676d0be0>
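For intuition, here is a minimal pure-Python sketch of the item-based mean-centred k-NN estimate that KNNWithMeans is built on: the target phone's mean rating plus a similarity-weighted average of how far the user's ratings of the k most similar phones deviate from those phones' means, $\hat r_{ui} = \mu_i + \frac{\sum_j \mathrm{sim}(i,j)\,(r_{uj}-\mu_j)}{\sum_j \mathrm{sim}(i,j)}$. The function name `knn_with_means_item` and its dict-based inputs are hypothetical; the similarity scores are taken as given rather than computed with `pearson_baseline`, only positive similarities contribute, and the target item is assumed to appear in `ratings`.

```python
from typing import Dict, List, Tuple

def knn_with_means_item(
    user: str,
    item: str,
    ratings: Dict[Tuple[str, str], float],
    sim: Dict[Tuple[str, str], float],
    k: int = 50,
) -> float:
    """Item-based mean-centred k-NN estimate of the rating `user` would give `item`."""
    # Mean rating of every item across all users.
    per_item: Dict[str, List[float]] = {}
    for (_, i), r in ratings.items():
        per_item.setdefault(i, []).append(r)
    means = {i: sum(rs) / len(rs) for i, rs in per_item.items()}

    # Items this user has rated, with their similarity to the target item
    # (looked up in either key order; missing pairs count as 0).
    neighbours = [
        (sim.get((item, j), sim.get((j, item), 0.0)), j, r)
        for (u, j), r in ratings.items()
        if u == user and j != item
    ]
    # Keep the k most similar rated items; only positive similarities contribute.
    neighbours = sorted(neighbours, key=lambda t: t[0], reverse=True)[:k]
    positive = [(s, j, r) for s, j, r in neighbours if s > 0]

    num = sum(s * (r - means[j]) for s, j, r in positive)
    den = sum(s for s, _, _ in positive)
    # With no usable neighbours, fall back to the item's own mean.
    return means[item] + (num / den if den else 0.0)
```

The user-based variant (`'user_based': True`) swaps the roles: it starts from the user's mean and weights deviations of similar users' ratings of the same phone.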

In [67]:
# Run the trained model against the test set.
test_pred_I = algo.test(testset_I)

In [68]:
test_pred_I

[Prediction(uid='Amazon Customer', iid='Samsung Galaxy S5 SM-G900T 4G LTE 16GB Smartphone, Black (T-Mobile)', r_ui=2.0, est=7.98164705882353, details={'was_impossible': True, 'reason': 'User and/or item is unknown.'}),
 Prediction(uid='bragina_ok', iid='Blackberry Мобильный телефон Blackberry Q5', r_ui=8.0, est=7.98164705882353, details={'was_impossible': True, 'reason': 'User and/or item is unknown.'}),
 Prediction(uid='Ramkumar', iid='Motorola Moto E 2nd Generation (4G, White)', r_ui=10.0, est=7.98164705882353, details={'was_impossible': True, 'reason': 'User and/or item is unknown.'}),
 Prediction(uid='Picachu', iid='Apple iPhone 5 16Go Blanc', r_ui=9.0, est=7.98164705882353, details={'was_impossible': True, 'reason': 'User and/or item is unknown.'}),
 Prediction(uid='Jimary', iid='SONY Xperia Z2 Noir', r_ui=9.0, est=7.98164705882353, details={'was_impossible': True, 'reason': 'User and/or item is unknown.'}),
 Prediction(uid='Amazon Customer', iid='Sprint LG Volt White (Sprint Prep

## 8. Try and recommend top 5 products for test users.

In [69]:
def get_top_n(predictions, n=5):
    """Return the n highest-rated (item, estimate) pairs for each user."""
    # First map the predictions to each user.
    top_n = defaultdict(list)
    for uid, iid, true_r, est, _ in predictions:
        top_n[uid].append((iid, est))

    # Then sort the predictions for each user and keep the n highest ones.
    for uid, user_ratings in top_n.items():
        user_ratings.sort(key=lambda x: x[1], reverse=True)
        top_n[uid] = user_ratings[:n]

    return top_n

In [70]:
top_n = get_top_n(test_pred_I, n=5)

In [71]:
top_n 

defaultdict(list,
            {'Amazon Customer': [('OnePlus 3T (Gunmetal, 6GB RAM + 64GB memory)',
               10),
              ('Motorola Moto Z Play with Style Mod (Black, 32GB)', 10),
              ('Apple iPhone 5s (Silver, 16GB)', 10),
              ('Motorola Moto G 3rd Generation SIM-Free Smartphone 2 GB RAM/16 GB ROM',
               10),
              ('VIVO V5 (Crown gold, 32 GB) (4 GB RAM)', 10)],
             'bragina_ok': [('Blackberry Мобильный телефон Blackberry Q5',
               7.98164705882353)],
             'Ramkumar': [('Motorola Moto E 2nd Generation (4G, White)',
               7.98164705882353)],
             'Picachu': [('Apple iPhone 5 16Go Blanc', 7.98164705882353)],
             'Jimary': [('SONY Xperia Z2 Noir', 7.98164705882353)],
             'Firma, Doksy': [('Microsoft Lumia 435 oranžová Dual SIM',
               7.98164705882353)],
             'DJ': [('Huawei P9 Lite - Smartphone libre Android (4G, pantalla 5.2", Octa-core, 2 GB RAM, 16 GB, cá

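#### Above are the top 5 (or fewer) predicted items and their estimated ratings for each test user. These candidates come from the test set, i.e. phones the user has already rated, so a user with only one test rating gets only one item.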
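Because `get_top_n` only re-ranks test-set pairs, every suggested phone is one the author already rated. To recommend phones a user has not rated, the surprise FAQ scores the complement of the training ratings, built with `trainset.build_anti_testset()` and passed to `algo.test`. Below is a minimal stdlib sketch of that idea; `recommend_unseen`, `rated`, `all_items` and the `predict` callable are hypothetical names, with `predict` standing in for something like `lambda u, i: algo.predict(u, i).est`. On this dataset the user-by-item complement is large, so in practice it may need restricting to a subset of users.

```python
from typing import Callable, Dict, List, Set, Tuple

def recommend_unseen(
    rated: Dict[str, Set[str]],
    all_items: Set[str],
    predict: Callable[[str, str], float],
    n: int = 5,
) -> Dict[str, List[Tuple[str, float]]]:
    """Top-n (item, estimate) pairs per user, scored only over items the user has not rated."""
    top_n: Dict[str, List[Tuple[str, float]]] = {}
    for user, seen in rated.items():
        # Candidate pairs: every item this user has not rated (an "anti-testset").
        candidates = sorted(all_items - seen)  # sorted for a deterministic tie order
        scored = [(item, predict(user, item)) for item in candidates]
        scored.sort(key=lambda x: x[1], reverse=True)
        top_n[user] = scored[:n]
    return top_n
```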

In [72]:
# Print the recommended items for each user
for uid, user_ratings in top_n.items():
    print(uid, [iid for (iid, _) in user_ratings])

Amazon Customer ['OnePlus 3T (Gunmetal, 6GB RAM + 64GB memory)', 'Motorola Moto Z Play with Style Mod (Black, 32GB)', 'Apple iPhone 5s (Silver, 16GB)', 'Motorola Moto G 3rd Generation SIM-Free Smartphone 2 GB RAM/16 GB ROM', 'VIVO V5 (Crown gold, 32 GB) (4 GB RAM)']
bragina_ok ['Blackberry Мобильный телефон Blackberry Q5']
Ramkumar ['Motorola Moto E 2nd Generation (4G, White)']
Picachu ['Apple iPhone 5 16Go Blanc']
Jimary ['SONY Xperia Z2 Noir']
Firma, Doksy ['Microsoft Lumia 435 oranžová Dual SIM']
DJ ['Huawei P9 Lite - Smartphone libre Android (4G, pantalla 5.2", Octa-core, 2 GB RAM, 16 GB, cámara 13 MP), color blanco']
H.C.de Kort ['Apple iPhone 6S 128GB Space Gray']
Кирилл Бубликов ['Samsung I9100 Galaxy S II (Black)']
kesang tsh bhutia ['Nokia Lumia 525 (Black)']
Winger777  ['Samsung Galaxy S7 edge 32GB (Sprint)']
ineso ['Samsung SGH-U900 Soul']
roma22 ['HTC Touch']
Christian M. ['Microsoft Lumia 650 Smartphone (5 Zoll (12,7 cm) Touch-Display, 16 GB Speicher, Windows 10) weiß']
st

## 5. Evaluate the collaborative model. Print RMSE value for Item-based CF.

In [73]:
# get RMSE
print("Item-based Model : Test Set")
accuracy.rmse(test_pred_I, verbose=True)

Item-based Model : Test Set
RMSE: 2.5574


2.557425322736379
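The RMSE reported by `accuracy.rmse` is the square root of the mean squared difference between the true rating `r_ui` and the estimate `est` over all test predictions. A minimal stdlib sketch of that computation follows (the function name `rmse` is hypothetical); it could be checked against the value above with `rmse((p.r_ui, p.est) for p in test_pred_I)`.

```python
import math
from typing import Iterable, Tuple

def rmse(pairs: Iterable[Tuple[float, float]]) -> float:
    """Root mean squared error over (true rating, estimated rating) pairs."""
    errors = [(true_r - est) ** 2 for true_r, est in pairs]
    if not errors:
        raise ValueError("no predictions to evaluate")
    return math.sqrt(sum(errors) / len(errors))
```

Squaring before averaging penalises large misses heavily, so a handful of 2-star ratings predicted near 8 can dominate the score on a 1 to 10 scale.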

## 4. Build a user-based collaborative filtering model using KNNWithMeans from surprise

In [74]:
reader = Reader(rating_scale=(1, 10))
data_U = Dataset.load_from_df(ph_merge_svd_data, reader=reader)

In [75]:
trainset_U, testset_U = train_test_split(data_U, test_size=.15)

In [76]:
# user_based=True selects user-based CF (similarities between authors); False selects item-based CF.
algo = KNNWithMeans(k=50, sim_options={'name': 'pearson_baseline', 'user_based': True})
algo.fit(trainset_U)

Estimating biases using als...
Computing the pearson_baseline similarity matrix...
Done computing similarity matrix.


<surprise.prediction_algorithms.knns.KNNWithMeans at 0x2802f9e9730>

In [77]:
# We can now query for specific predictions.
uid = 'Frances DeSimone'  # raw user id
iid = 'Samsung Galaxy Star Pro DUOS S7262 Unlocked Ce.'  # raw item id

In [78]:
# get a prediction for specific users and items.
pred = algo.predict(uid, iid, verbose=True)

user: Frances DeSimone item: Samsung Galaxy Star Pro DUOS S7262 Unlocked Ce. r_ui = None   est = 8.03   {'was_impossible': True, 'reason': 'User and/or item is unknown.'}


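For author Frances DeSimone and item Samsung Galaxy Star Pro DUOS S7262 Unlocked Ce., the estimated rating is 8.03. Because `was_impossible` is True (the user and/or item is not in the training set), this is surprise's default estimate, the mean rating of the training set, rather than a personalised prediction.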
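The `'was_impossible': True` detail means no neighbours exist for the pair, so a neighbour-based estimate cannot be formed. A minimal sketch of this kind of fallback, assuming it simply returns the training-set global mean; `predict_with_fallback` and the `estimate` callable (a stand-in for the neighbour-based computation) are hypothetical names.

```python
from typing import Callable, Dict, Tuple

def predict_with_fallback(
    user: str,
    item: str,
    ratings: Dict[Tuple[str, str], float],
    estimate: Callable[[str, str], float],
) -> Tuple[float, dict]:
    """Return (estimate, details), falling back to the training mean for unknown users/items."""
    known_users = {u for u, _ in ratings}
    known_items = {i for _, i in ratings}
    global_mean = sum(ratings.values()) / len(ratings)
    if user not in known_users or item not in known_items:
        # Cold start: every unknown pair receives this same value.
        return global_mean, {"was_impossible": True, "reason": "User and/or item is unknown."}
    return estimate(user, item), {"was_impossible": False}
```

This is why many test predictions in this notebook share one identical `est` value.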

In [79]:
# run the trained model against the testset
test_pred_U = algo.test(testset_U)

## 8. Try and recommend top 5 products for test users.

In [80]:
def get_top_n(predictions, n=5):
    """Return the n highest-rated (item, estimate) pairs for each user."""
    # First map the predictions to each user.
    top_n = defaultdict(list)
    for uid, iid, true_r, est, _ in predictions:
        top_n[uid].append((iid, est))

    # Then sort the predictions for each user and keep the n highest ones.
    for uid, user_ratings in top_n.items():
        user_ratings.sort(key=lambda x: x[1], reverse=True)
        top_n[uid] = user_ratings[:n]

    return top_n

In [81]:
top_n = get_top_n(test_pred_U, n=5)

In [82]:
top_n 

defaultdict(list,
            {'Traumelfenkind': [('Samsung U800 Soul b', 8.028705882352941)],
             'capitanslaff': [('Gigaset Siemens C45', 8.028705882352941)],
             'lolitadollie': [('Samsung Glyde SCH-U940 (Verizon Wireless)',
               8.028705882352941)],
             'Amazon Customer': [('Motorola Moto G, 4th Gen (White, 2 GB, 16 GB)',
               10),
              ('Lenovo Vibe K4 Note (White,16GB)', 8.47283054864455),
              ('Lenovo Vibe K4 Note (White,16GB)', 8.47283054864455),
              ('Lenovo Vibe K4 Note (White,16GB)', 8.47283054864455),
              ('Lenovo Vibe K4 Note (White,16GB)', 8.47283054864455)],
             'Знаменитый': [('Samsung S5250 Wave 525', 8.028705882352941)],
             'AznGrl': [('LG KP500 Cookie Unlocked Phone with 3.2 MP Camera (Brown)--International Version with No Warranty',
               8.028705882352941)],
             'Amazon Kunde': [('Samsung Galaxy J5 DUOS Smartphone (13,2 cm (5,2 Zoll) Touch-Disp

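#### Above are the top 5 (or fewer) predicted items and their estimated ratings for each test user. 'Lenovo Vibe K4 Note (White,16GB)' repeats for 'Amazon Customer' because the test set holds several ratings of that phone under the same author name.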
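`get_top_n` keeps every test prediction, so repeated (user, item) pairs, such as the Lenovo Vibe K4 Note entries for 'Amazon Customer' above, can fill several of the five slots. Below is a minimal sketch of a variant that lists each item once, with its highest estimate. The name `get_top_n_unique` is hypothetical; it expects the same 5-field tuples as surprise's `Prediction` objects.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

def get_top_n_unique(predictions: Iterable[tuple], n: int = 5) -> Dict[str, List[Tuple[str, float]]]:
    """Top-n (item, estimate) pairs per user, listing each item at most once."""
    best: Dict[str, Dict[str, float]] = defaultdict(dict)
    for uid, iid, _true_r, est, _details in predictions:
        # Keep only the highest estimate for each (user, item) pair.
        if est > best[uid].get(iid, float("-inf")):
            best[uid][iid] = est
    top_n: Dict[str, List[Tuple[str, float]]] = {}
    for uid, item_scores in best.items():
        ranked = sorted(item_scores.items(), key=lambda x: x[1], reverse=True)
        top_n[uid] = ranked[:n]
    return top_n
```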

In [83]:
# Print the recommended items for each user
for uid, user_ratings in top_n.items():
    print(uid, [iid for (iid, _) in user_ratings])

Traumelfenkind ['Samsung U800 Soul b']
capitanslaff ['Gigaset Siemens C45']
lolitadollie ['Samsung Glyde SCH-U940 (Verizon Wireless)']
Amazon Customer ['Motorola Moto G, 4th Gen (White, 2 GB, 16 GB)', 'Lenovo Vibe K4 Note (White,16GB)', 'Lenovo Vibe K4 Note (White,16GB)', 'Lenovo Vibe K4 Note (White,16GB)', 'Lenovo Vibe K4 Note (White,16GB)']
Знаменитый ['Samsung S5250 Wave 525']
AznGrl ['LG KP500 Cookie Unlocked Phone with 3.2 MP Camera (Brown)--International Version with No Warranty']
Amazon Kunde ['Samsung Galaxy J5 DUOS Smartphone (13,2 cm (5,2 Zoll) Touch-Display, 16 GB Speicher, Android 6.0) weiß', 'Caterpillar B25 Outdoor Handy (Dual-Sim, 2 Megapixel Kamera, Freisprechfunktion, UKW Radio) schwarz', 'HTC U Play Smartphone (13,2 cm (5,2 Zoll), 16 MP Frontkamera, 32GB Speicher, Android) Schwarz']
Игорь ['Телефон LG P705 Optimus L7 Black', 'Смартфон NOKIA 5230 XpressMusic Black Silver', 'LG Optimus L7 P705 Black']
Richcop1973 ['Motorola Renegade V950']
mielpopz ['Samsung GALAXY Mini

## 6. Predict score (average rating) for test users.

In [84]:
test_pred_U

[Prediction(uid='Traumelfenkind', iid='Samsung U800 Soul b', r_ui=10.0, est=8.028705882352941, details={'was_impossible': True, 'reason': 'User and/or item is unknown.'}),
 Prediction(uid='capitanslaff', iid='Gigaset Siemens C45', r_ui=10.0, est=8.028705882352941, details={'was_impossible': True, 'reason': 'User and/or item is unknown.'}),
 Prediction(uid='lolitadollie', iid='Samsung Glyde SCH-U940 (Verizon Wireless)', r_ui=2.0, est=8.028705882352941, details={'was_impossible': True, 'reason': 'User and/or item is unknown.'}),
 Prediction(uid='Amazon Customer', iid='Samsung Galaxy Note 5 N920G 32GB Factory Unlocked Phone - Retail Packaging - Black Sapphire (International Version)', r_ui=8.0, est=8.028705882352941, details={'was_impossible': True, 'reason': 'User and/or item is unknown.'}),
 Prediction(uid='Знаменитый', iid='Samsung S5250 Wave 525', r_ui=10.0, est=8.028705882352941, details={'was_impossible': True, 'reason': 'User and/or item is unknown.'}),
 Prediction(uid='AznGrl', ii

#### Above are the predictions for each user-item pair in the test set, with the true rating (`r_ui`) and the estimated rating (`est`).
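Many predictions above carry `'was_impossible': True` and share the same `est`, the fallback used when the author and/or phone is absent from the training set. A small sketch for measuring how common that is (the name `impossible_share` is hypothetical), e.g. `impossible_share(test_pred_U)`; a high share would suggest the RMSE mostly reflects the fallback value rather than neighbour-based estimates.

```python
from typing import Iterable

def impossible_share(predictions: Iterable[tuple]) -> float:
    """Fraction of predictions whose details dict has was_impossible=True."""
    preds = list(predictions)
    if not preds:
        return 0.0
    # Index 4 is the `details` field of a surprise Prediction (uid, iid, r_ui, est, details).
    impossible = sum(1 for p in preds if p[4].get("was_impossible", False))
    return impossible / len(preds)
```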

## 5. Evaluate the collaborative model. Print RMSE value for User-based CF.

In [85]:
print("User-based Model : Test Set")
accuracy.rmse(test_pred_U, verbose=True)

User-based Model : Test Set
RMSE: 2.7310


2.7309583916206255

## 9. Try other techniques (Example: cross validation) to get better results.

In [86]:
# Cross-validation gives a more reliable RMSE estimate than a single train/test split.
cross_validate(algo, data_U, measures=['RMSE'], cv=3, verbose=False)

Estimating biases using als...
Computing the pearson_baseline similarity matrix...
Done computing similarity matrix.
Estimating biases using als...
Computing the pearson_baseline similarity matrix...
Done computing similarity matrix.
Estimating biases using als...
Computing the pearson_baseline similarity matrix...
Done computing similarity matrix.


{'test_rmse': array([2.58839668, 2.53890518, 2.66688721]),
 'fit_time': (0.18178558349609375, 0.19212961196899414, 0.18471646308898926),
 'test_time': (0.01683831214904785,
  0.010133504867553711,
  0.008011341094970703)}
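The call above reports one RMSE per fold; their mean is about 2.60. Cross-validation does not itself improve the model: it trains on k-1 folds, tests on the held-out fold, and repeats, giving a steadier error estimate than one split, which is what makes it useful for comparing settings such as different `k` or similarity measures (surprise offers `GridSearchCV` for that). A minimal sketch of k-fold evaluation, with a hypothetical global-mean baseline standing in for KNNWithMeans; the function name and `seed` parameter are illustrative.

```python
import math
import random
from typing import List, Tuple

def kfold_rmse_global_mean(
    data: List[Tuple[str, str, float]], k: int = 3, seed: int = 0
) -> List[float]:
    """Per-fold RMSE of a global-mean baseline under k-fold cross-validation."""
    rows = list(data)
    random.Random(seed).shuffle(rows)
    # Deal the shuffled rows into k disjoint folds.
    folds = [rows[f::k] for f in range(k)]
    scores = []
    for f in range(k):
        test = folds[f]
        train = [row for g in range(k) if g != f for row in folds[g]]
        # "Fit": the baseline predicts the mean training rating for every pair.
        mean = sum(r for _, _, r in train) / len(train)
        mse = sum((r - mean) ** 2 for _, _, r in test) / len(test)
        scores.append(math.sqrt(mse))
    return scores
```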

## 7. Report your findings and inferences.

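#### Samsung Galaxy Note5 is the most popular product.
#### Amazon Customer is the most active author (wrote the most reviews).
#### Lenovo Vibe K4 Note (White,16GB) was rated by the most authors.
#### On separate random 85/15 splits, the item-based KNNWithMeans model (RMSE 2.56) did better than the user-based model (RMSE 2.73).
#### The 3-fold cross-validated RMSE of the user-based model was about 2.60 (folds: 2.59, 2.54, 2.67).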

## 10. In what business scenarios should you use popularity-based recommendation systems?
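#### Ans. A popularity-based recommendation system relies on popularity, trends and frequency counts of the most purchased (or most rated) items, so every user sees the same recommendations. It suits cases with little or no history about a user, such as new visitors (the cold-start problem), or where what is trending matters most. It is used by travel companies selling holiday packages in a season, and by Google News and other news websites to show Top Stories with images.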
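A minimal sketch of a popularity-based ranking, assuming (author, product, score) rows like the notebook's data: count ratings per product and break ties by mean score. The function name `most_popular`, the `min_count` threshold and the count-then-mean ordering are illustrative assumptions; the notebook's own popularity model may rank differently.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

def most_popular(
    ratings: Iterable[Tuple[str, str, float]], n: int = 5, min_count: int = 1
) -> List[Tuple[str, int, float]]:
    """Return (item, rating count, mean score) for the n most-rated items."""
    counts: Dict[str, int] = defaultdict(int)
    totals: Dict[str, float] = defaultdict(float)
    for _user, item, score in ratings:
        counts[item] += 1
        totals[item] += score
    candidates = [
        (item, counts[item], totals[item] / counts[item])
        for item in counts
        if counts[item] >= min_count
    ]
    # Most ratings first; mean score breaks ties. The same list is shown to every user.
    candidates.sort(key=lambda x: (x[1], x[2]), reverse=True)
    return candidates[:n]
```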


## 11. In what business scenarios should you use CF-based recommendation systems?
#### Ans. Collaborative filtering is used to build intelligent recommender systems that give better recommendations as more information about users is collected. It is a personalised recommender system: recommendations are based on the past behaviour of the user and of users with similar tastes, so it works best when there is enough rating or purchase history per user. Websites such as Amazon, YouTube and Netflix use collaborative filtering as part of their sophisticated recommendation systems.

## 12. What other possible methods can you think of that could further improve the recommendations for different users?
#### Ans. Apart from popularity-based and collaborative filtering methods, content-based, demographic, utility-based, knowledge-based and hybrid recommendation systems can be used, depending on user needs.
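As one illustration of the hybrid idea, here is a minimal sketch of a weighted hybrid. It relies on a popularity score for users with little history (the cold-start case behind the many `was_impossible` predictions in this notebook) and shifts toward the CF estimate as the user rates more phones. The function name, the linear weighting and the `threshold` value are hypothetical choices, not a tuned method.

```python
def hybrid_score(
    cf_estimate: float,
    popularity_score: float,
    user_rating_count: int,
    threshold: int = 5,
) -> float:
    """Blend a CF estimate with a popularity score, trusting CF more as the user rates more items."""
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    # Weight on CF grows linearly from 0 (no history) to 1 (threshold ratings or more).
    w = min(1.0, user_rating_count / threshold)
    return w * cf_estimate + (1.0 - w) * popularity_score
```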