***DOMAIN:*** Smartphone, Electronics<br>
***CONTEXT:*** India is the second largest market globally for smartphones after China. About 134 million smartphones were sold across India
in the year 2017 and is estimated to increase to about 442 million in 2022. India ranked second in the average time spent on mobile web by
smartphone users across Asia Pacific. The combination of very high sales volumes and the average smartphone consumer behaviour has
made India a very attractive market for foreign vendors. As per Consumer behaviour, 97% of consumers turn to a search engine when they
are buying a product vs. 15% who turn to social media. If a seller succeeds to publish smartphones based on user’s behaviour/choice at the
right place, there are 90% chances that user will enquire for the same. This Case Study is targeted to build a recommendation system
based on individual consumer’s behaviour or choice.

***DATA DESCRIPTION:***<br>
***• author :*** name of the person who gave the rating<br>
***• country :*** country the person who gave the rating belongs to<br>
***• data :*** date of the rating<br>
***• domain:*** website from which the rating was taken from<br>
***• extract:*** rating content<br>
***• language:*** language in which the rating was given<br>
***• product:*** name of the product/mobile phone for which the rating was given<br>
***• score:*** average rating for the phone<br>
***• score_max:*** highest rating given for the phone<br>
***• source:*** source from where the rating was taken<br>

**PROJECT OBJECTIVE:** We will build a recommendation system using popularity based and collaborative filtering methods to recommend mobile phones to a user which are most popular and personalised respectively..


**1. Import the necessary libraries and read the provided CSVs as a data frame**

In [1]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

from sklearn.model_selection import train_test_split

***Merge the provided CSVs into one data-frame.***

In [2]:
csv_file_list = ["phone_user_review_file_1.csv", "phone_user_review_file_2.csv","phone_user_review_file_3.csv","phone_user_review_file_4.csv","phone_user_review_file_5.csv","phone_user_review_file_6.csv"]

list_of_dataframes = []
for filename in csv_file_list:
    print(filename)
    list_of_dataframes.append(pd.read_csv(filename,encoding='latin1'))

phones_df = pd.concat(list_of_dataframes)


phone_user_review_file_1.csv
phone_user_review_file_2.csv
phone_user_review_file_3.csv
phone_user_review_file_4.csv
phone_user_review_file_5.csv
phone_user_review_file_6.csv


***Check a few observations and shape of the data-frame***

In [3]:
phones_df.head()

Unnamed: 0,phone_url,date,lang,country,source,domain,score,score_max,extract,author,product
0,/cellphones/samsung-galaxy-s8/,5/2/2017,en,us,Verizon Wireless,verizonwireless.com,10.0,10.0,As a diehard Samsung fan who has had every Sam...,CarolAnn35,Samsung Galaxy S8
1,/cellphones/samsung-galaxy-s8/,4/28/2017,en,us,Phone Arena,phonearena.com,10.0,10.0,Love the phone. the phone is sleek and smooth ...,james0923,Samsung Galaxy S8
2,/cellphones/samsung-galaxy-s8/,5/4/2017,en,us,Amazon,amazon.com,6.0,10.0,Adequate feel. Nice heft. Processor's still sl...,R. Craig,"Samsung Galaxy S8 (64GB) G950U 5.8"" 4G LTE Unl..."
3,/cellphones/samsung-galaxy-s8/,5/2/2017,en,us,Samsung,samsung.com,9.2,10.0,Never disappointed. One of the reasons I've be...,Buster2020,Samsung Galaxy S8 64GB (AT&T)
4,/cellphones/samsung-galaxy-s8/,5/11/2017,en,us,Verizon Wireless,verizonwireless.com,4.0,10.0,I've now found that i'm in a group of people t...,S Ate Mine,Samsung Galaxy S8


In [4]:
row,col = phones_df.shape
print("Number of rows: {}".format(row))
print("Number of columns: {}".format(col))

Number of rows: 1415133
Number of columns: 11


In [5]:
phones_df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1415133 entries, 0 to 163836
Data columns (total 11 columns):
 #   Column     Non-Null Count    Dtype  
---  ------     --------------    -----  
 0   phone_url  1415133 non-null  object 
 1   date       1415133 non-null  object 
 2   lang       1415133 non-null  object 
 3   country    1415133 non-null  object 
 4   source     1415133 non-null  object 
 5   domain     1415133 non-null  object 
 6   score      1351644 non-null  float64
 7   score_max  1351644 non-null  float64
 8   extract    1395772 non-null  object 
 9   author     1351931 non-null  object 
 10  product    1415132 non-null  object 
dtypes: float64(2), object(9)
memory usage: 129.6+ MB


In [6]:
phones_df.isna().sum()

phone_url        0
date             0
lang             0
country          0
source           0
domain           0
score        63489
score_max    63489
extract      19361
author       63202
product          1
dtype: int64

There are few NA's in score, score_max, extract and author columns. Dropping off the records having NA's

In [7]:
phones_cl = phones_df.dropna()

***Round off scores to the nearest integers.***

In [8]:
phones_cl.loc[:, ('score', 'score_max')]

Unnamed: 0,score,score_max
0,10.0,10.0
1,10.0,10.0
2,6.0,10.0
3,9.2,10.0
4,4.0,10.0
...,...,...
163832,2.0,10.0
163833,10.0,10.0
163834,2.0,10.0
163835,8.0,10.0


In [9]:
phones_cl.loc[:, ('score', 'score_max')] = phones_cl.loc[:, ('score', 'score_max')].round()

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  isetter(loc, v)


***Check for missing values. Impute the missing values if there is any***<br>
MIssing values are dropped as the number of records are 63000+ out of 1400000 records.

***Check for duplicate values and remove them if there is any.***

In [10]:
phones_cl.drop_duplicates()

Unnamed: 0,phone_url,date,lang,country,source,domain,score,score_max,extract,author,product
0,/cellphones/samsung-galaxy-s8/,5/2/2017,en,us,Verizon Wireless,verizonwireless.com,10.0,10.0,As a diehard Samsung fan who has had every Sam...,CarolAnn35,Samsung Galaxy S8
1,/cellphones/samsung-galaxy-s8/,4/28/2017,en,us,Phone Arena,phonearena.com,10.0,10.0,Love the phone. the phone is sleek and smooth ...,james0923,Samsung Galaxy S8
2,/cellphones/samsung-galaxy-s8/,5/4/2017,en,us,Amazon,amazon.com,6.0,10.0,Adequate feel. Nice heft. Processor's still sl...,R. Craig,"Samsung Galaxy S8 (64GB) G950U 5.8"" 4G LTE Unl..."
3,/cellphones/samsung-galaxy-s8/,5/2/2017,en,us,Samsung,samsung.com,9.0,10.0,Never disappointed. One of the reasons I've be...,Buster2020,Samsung Galaxy S8 64GB (AT&T)
4,/cellphones/samsung-galaxy-s8/,5/11/2017,en,us,Verizon Wireless,verizonwireless.com,4.0,10.0,I've now found that i'm in a group of people t...,S Ate Mine,Samsung Galaxy S8
...,...,...,...,...,...,...,...,...,...,...,...
163832,/cellphones/alcatel-ot-club_1187/,5/12/2000,de,de,Ciao,ciao.de,2.0,10.0,Weil mein Onkel bei ALcatel arbeitet habe ich ...,david.paul,Alcatel Club Plus Handy
163833,/cellphones/alcatel-ot-club_1187/,5/11/2000,de,de,Ciao,ciao.de,10.0,10.0,Hy Liebe Leserinnen und Leser!! Ich habe seit ...,Christiane14,Alcatel Club Plus Handy
163834,/cellphones/alcatel-ot-club_1187/,5/4/2000,de,de,Ciao,ciao.de,2.0,10.0,"Jetzt hat wohl Alcatell gedacht ,sie machen wa...",michaelawr,Alcatel Club Plus Handy
163835,/cellphones/alcatel-ot-club_1187/,5/1/2000,de,de,Ciao,ciao.de,8.0,10.0,Ich bin seit 2 Jahren (stolzer) Besitzer eines...,claudia0815,Alcatel Club Plus Handy


4500+ records has been removed as duplicates.

***Keep only 1000000 data samples. Use random state=612.***<br>
80% of the data has to be taken after clean up to keep 1000000 records. 

In [11]:
phones_sampled = phones_cl.sample(n=1000000,random_state=612)

***Drop irrelevant features. Keep features like Author, Product, and Score***

In [12]:
phones_fl = phones_sampled[['author','product','score']]

In [13]:
phones_fl.head(5)

Unnamed: 0,author,product,score
292711,Giuseppe Calavaro,"Alcatel One Touch 20-04G Telefono Cellulare, Nero",6.0
78482,Buraian22,Huawei M750,2.0
126183,badamyan.karen,Nokia C7-00,10.0
32139,Amazon Customer,Binatone SM800 Touch Screen Big Button Sim Fre...,10.0
17325,unknown,Samsung Samsung Galaxy A5 2016 - Wit,6.0


***Identify the most rated features***

In [14]:
phones_fl[phones_fl['score'].values == phones_fl['score'].median()]

Unnamed: 0,author,product,score
86389,Petras,Samsung SGH-X700,9.0
149640,BruceDude,S46,9.0
160977,pklat,Samsung GALAXY A3 (2016) A310F white Android S...,9.0
85798,J.R90,Samsung Galaxy S6 edge zwart / 32 GB,9.0
107910,Maris123654,Samsung Galaxy S6 zwart / 32 GB,9.0
...,...,...,...
14503,nixter1029,Samsung Galaxy S7 edge 32GB (Verizon),9.0
81273,Peter Janssens,Sony Xperia Go Geel,9.0
76059,Selina-1991,Samsung Galaxy Note 3 zwart / 32 GB - Overzicht,9.0
114166,EdH,Huawei P8 grijs / 16 GB,9.0


***Identify the users with most number of reviews.***

In [15]:
phones_fl[['author','score']].groupby(by='author',axis=0,sort=False).count().sort_values('score',ascending=False).head(10)

Unnamed: 0_level_0,score
author,Unnamed: 1_level_1
Amazon Customer,60237
Cliente Amazon,15034
e-bit,6865
Client d'Amazon,5963
Amazon Kunde,3691
einer Kundin,2054
Anonymous,2048
einem Kunden,1493
unknown,1357
Anonymous,1151


***Select the data with products having more than 50 ratings and users who have given more than 50 ratings. Report the shape of the final
dataset.***

In [16]:
phones_fl.groupby(by=['product'],axis=0).count()>50

Unnamed: 0_level_0,author,score
product,Unnamed: 1_level_1,Unnamed: 2_level_1
"'Smartphone Meizu Pro 5, 5,7 pouces avec Exynos 7420 Octa 8 Core Processeur. mÃ©moire RAM 4 Go et 64 Go mÃ©moire...",False,False
"'Sony Xperia X (F5122) â White â Dual Sim (Google Android 6.0.1, 5 Display, 2 x CORTEX A72 1.8 GHz + 4 x cortex-a53...",False,False
"'Sony Xperia X (F5122) â rosa â Dual Sim (Google Android 6.0.1, 5 Display, 2 x CORTEX A72 1.8 GHz + 4 x cortex-a53...",False,False
"(CUBOT) GT88 5.5"" qHD 1.3GHz MTK6572 2-Core Android 4.2.2 3G Phone 8MP CAM 512MB RAM 4GB ROM",False,False
"(DG300 Versione Aggiornata)5'' DOOGEE VOYAGER2 DG310 Dual Flashlights IPS Screen 3G Smartphone Android 4.4 MTK6582 1.3GHz Quad Core Telefono Cellulare Dual SIM 8G ROM OTG OTA GPS WIFI, BIANCO",False,False
...,...,...
Ø³Ø§ÙØ³ÙÙØ¬ Ø¬Ø§ÙÙØ³Ù J1 2016 SM-J120H - Ø´Ø±ÙØ­ØªÙÙ Ø§ØªØµØ§ÙØ 8 Ø¬ÙØ¬Ø§Ø 1 Ø¬ÙØ¬Ø§ Ø±Ø§ÙØ Ø§ÙØ¬ÙÙ Ø§ÙØ«Ø§ÙØ«Ø Ø§Ø³ÙØ¯,False,False
Ø³Ø§ÙØ³ÙÙØ¬ Ø¬Ø§ÙÙØ³Ù J1 2016 SM-J120H - Ø´Ø±ÙØ­ØªÙÙ Ø§ØªØµØ§ÙØ 8 Ø¬ÙØ¬Ø§Ø 1 Ø¬ÙØ¬Ø§ Ø±Ø§ÙØ Ø§ÙØ¬ÙÙ Ø§ÙØ«Ø§ÙØ«Ø Ø°ÙØ¨Ù,False,False
"Ø³Ø§ÙØ³ÙÙØ¬ Ø¬Ø§ÙÙØ³Ù J1 SM-J120FD Ø¨Ø´Ø±ÙØ­ØªÙ Ø§ØªØµØ§Ù - 8 Ø¬ÙØ¬Ø§, Ø§ÙØ¬ÙÙ Ø§ÙØ±Ø§Ø¨Ø¹ Ø§Ù ØªÙ Ø§Ù, Ø§Ø³ÙØ¯",False,False
"Ø³Ø§ÙØ³ÙÙØ¬ Ø¬Ø§ÙÙØ³Ù J1 SM-J120FD Ø¨Ø´Ø±ÙØ­ØªÙ Ø§ØªØµØ§Ù - 8 Ø¬ÙØ¬Ø§, Ø§ÙØ¬ÙÙ Ø§ÙØ±Ø§Ø¨Ø¹ Ø§Ù ØªÙ Ø§Ù, Ø°ÙØ¨Ù",False,False


***51961 products are having rating more than 50****

In [17]:
phones_fl.groupby(by=['author'],axis=0).count()>50

Unnamed: 0_level_0,product,score
author,Unnamed: 1_level_1,Unnamed: 2_level_1
efef,False,False
!!!!!!!!!!!!!!!!!!!!!!!!!!!!,False,False
"!!!JOSE""ANTONIO""",False,False
!!:. PuNi$heR .:!!,False,False
!&#34;Â§,False,False
...,...,...
æ­¦è¡å¤§å¸«,False,False
ç«è¯ããªã§,False,False
è¥ççç,False,False
"éº¦ç¬ç¬,",False,False


***625104 users has given more than 50 rating*** 

***Build a popularity based model and recommend top 5 mobile phones.***

In [18]:
phones_fl.groupby('product')['score'].mean().sort_values(ascending=False).head(5)

product
æ©æç½æ C168i                                                                                                                        10.0
Samsung Galaxy S7 32GB SAMSUNG Galaxy S7 32GB - Vit                                                                                       10.0
HTC Butterfly S 901s Gray 16GB Factory Unlocked SmartPhone GSM 850 / 900 / 1800 / 1900 HSDPA 850 / 900 / 1900 / 2100                      10.0
Samsung Galaxy S6 edge SM-G925F 128GB 4G Green - smartphones (Single SIM, Android, NanoSIM, GSM, HSPA, LTE)                               10.0
Samsung Galaxy S6 edge SM-G925F - Smartphone de 5.1" (12,954 cm, 2560 x 1440 pixeles, SAMOLED, 2,1 GHz, 1,5 GHz, 3072 MB), color verde    10.0
Name: score, dtype: float64

***Build a collaborative filtering model using SVD. You can use SVD from surprise or build it from scratch(Note: Incase you’re building it from scratch you
can limit your data points to 5000 samples if you face memory issues). Build a collaborative filtering model using kNNWithMeans from surprise. You
can try both user-based and item-based model.***

***Taking 5000 rows only***

In [19]:
svd_input = phones_fl.sample(n=5000,random_state=612)

In [20]:
svd_input.shape[0]

5000

In [35]:
svd_input.drop_duplicates(inplace=True)

In [41]:
svd_input = svd_input.sort_values('score').drop_duplicates(subset=['product','author'],keep='last')

In [42]:
svd_input['product'].value_counts().count()

3765

In [43]:
svd_input['author'].value_counts().count()

4332

In [45]:
svd_input.set_index('product')

Unnamed: 0_level_0,author,score
product,Unnamed: 1_level_1,Unnamed: 2_level_1
Palm Pre,Tommy,1.0
C61,cokeacola644,1.0
Rumor / Scoop / UX-260,in2theunknown,1.0
T-Mobile MDA Vario II,Gast,1.0
SGH-D415 / SGH-D410,extremebase,1.0
...,...,...
"Samsung Galaxy S5 SM-G900T 4G LTE 16GB Smartphone, Black (T-Mobile)",Jacamo,10.0
"Samsung Galaxy Core Plus Smartphone (10,9 cm (4,3 Zoll) TFT-Touchscreen, 5 Megapixel Kamera, WiFi, NFC, S Beam, Android 4.2.2) weiÃ",Karlheinz Hame,10.0
"Samsung Galaxy Alpha (11,90 cm (4,7 Zoll) Super-AMOLED-Display, Octa-Core-Prozessor, 12-Megapixel-Kamera, Android...",KÃ¤ufer,10.0
Ð¢ÐµÐ»ÐµÑÐ¾Ð½ LG E960 Google Nexus 4 Black,ÐÑÐ¸ÑÑÐ¸Ð½Ð°,10.0


In [48]:
score_mat = svd_input.pivot(index='product',columns='author',values='score')

In [50]:
normalised_mat = score_mat - np.asarray([(np.mean(score_mat, 1))]).T

In [56]:
np.sqrt(score_mat.shape[0] - 1)

61.35144660071187

In [53]:
A = normalised_mat.T / np.sqrt(score_mat.shape[0] - 1)

In [54]:
A

product,**** PacK Exclusif A&d COFFRET WIKO **** Housse WIKO CINK FIVE Coque wiko cink five protection flip cover wiko...,1006,2014 Newest DOOGEE DAGGER DG550 5.5'' Unlocked Octa Core 1.7Ghz Android 4.2.9 OS 3G Smartphone -- 5-Point-Touch...,3100 / 3120,5-Zoll- Android 4.2 Cubot P9 3G Smart Phone MTK6572 Dual Core 1.3GHz QHD IPS Schirm 512MB RAM 4GB ROM GPS 8MP...,5.5-Inch Unlocked Lenovo A850 3G Smartphone-(960x540) Quad Core 4GB MT6582m 1331MHz Android 4.2 Dual Camera +Dual SIM -Black (Rooted + Google Play),6555,7290,8801 / 8800,?????????????? ?????????????? Samsung E2202 Black,...,Ð¡Ð¾ÑÐ¾Ð²ÑÐ¹ ÑÐµÐ»ÐµÑÐ¾Ð½ Sonim XP 3300 Force,Ð¡Ð¾ÑÐ¾Ð²ÑÐ¹ ÑÐµÐ»ÐµÑÐ¾Ð½ Sony Xperia Tipo Dual ST21i2,Ð¢ÐµÐ»ÐµÑÐ¾Ð½ LG E960 Google Nexus 4 Black,×××¤×× ×¡××××¨× Apple iPhone 7 Plus 32GB SimFree,×××¤×× ×¡××××¨× G4 H815 ×× ×¢××¨ LG,×××¤×× ×¡××××¨× LG Nexus 5 16GB D821,×××¤×× ×¡××××¨× Meizu M3 Note 16GB,×××¤×× ×¡××××¨× Samsung Galaxy S6 Edge SM-G925F 64GB,×××¤×× ×¡××××¨× Samsung Galaxy S6 SM-G920F 64GB,×××¤×× ×¡××××¨× Xiaomi Redmi 4A 16GB
author,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
*Darkest*Star*,,,,,,,,,,,...,,,,,,,,,,
//@ndzo,,,,,,,,,,,...,,,,,,,,,,
03102255mvp,,,,,,,,,,,...,,,,,,,,,,
100Ð¿ÑÐ´Ð¾Ð²,,,,,,,,,,,...,,,,,,,,,,
19Dennis87,,,,,,,,,,,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
ÑÐ¾Ð¿Ð¿Ð° ÑÑÐ°,,,,,,,,,,,...,,,,,,,,,,
ÑÐ»Ñ,,,,,,,,,,,...,,,,,,,,,,
×ª×××¨ ×¨×¢××ª,,,,,,,,,,,...,,,,,,,,,0.0,
â Francis Mazzucco [Spain Premium Top Reviewer] âââââ,,,,,,,,,,,...,,,,,,,,,,


In [57]:
U, S, V = np.linalg.svd(A)

LinAlgError: SVD did not converge

In [None]:
phones_cl["lang"].value_counts()

In [None]:
demo_array = np.arange(10,21)
subset_demo_array = demo_array[0:7]
subset_demo_array[:]= 101
subset_demo_array