# DOMAIN: Smartphone, Electronics
## •CONTEXT: India is the second-largest smartphone market in the world after China. About 134 million smartphones were sold in India in 2017, and sales are estimated to reach about 442 million in 2022. India ranked second in average time spent on the mobile web by smartphone users across Asia Pacific. The combination of very high sales volumes and the behaviour of the average smartphone consumer has made India a very attractive market for foreign vendors. According to consumer-behaviour data, 97% of consumers turn to a search engine when they are buying a product, versus 15% who turn to social media. If a seller succeeds in showing smartphones that match a user's behaviour or preferences in the right place, there is a 90% chance that the user will enquire about them. This case study aims to build a recommendation system based on individual consumers' behaviour or choices.

## •DATA DESCRIPTION: 
### •author : name of the person who gave the rating
### •country : country the person who gave the rating belongs to
### •date : date of the rating 
### •domain: website from which the rating was taken 
### •extract: rating content 
### •language: language in which the rating was given 
### •product: name of the product/mobile phone for which the rating was given 
### •score: average rating for the phone 
### •score_max: highest rating given for the phone 
### •source: source from where the rating was taken 

## •PROJECT OBJECTIVE: We will build a recommendation system that uses popularity-based and collaborative filtering methods to recommend, respectively, the most popular and personalised mobile phones to a user.

### Steps and tasks: [Total Score: 60 points]
### 1. Import the necessary libraries, read the provided CSVs as data frames and perform the steps below. [15 Marks]

#### A. Merge all the provided CSVs into one dataFrame. [2 Marks]
#### B. Explore, understand the Data and share at least 2 observations. [2 Marks]
#### C. Round off scores to the nearest integers. [3 Marks]
#### D. Check for missing values. Impute the missing values, if any. [2 Marks]
#### E. Check for duplicate values and remove them, if any. [2 Marks]
#### F. Keep only 1 million data samples. Use random_state=612. [2 Marks]
#### G. Drop irrelevant features. Keep features like Author, Product, and Score. [2 Marks]

In [1]:
# Load required packages
import pandas as pd
import numpy as np
import os
from surprise import Dataset
from surprise import Reader
from surprise import SVD
from surprise import KNNWithMeans
from surprise import accuracy
from collections import defaultdict
from surprise.model_selection import cross_validate
from surprise.model_selection import train_test_split
import warnings
warnings.filterwarnings('ignore')


In [2]:
# 1.A. Merge all the provided CSVs into one dataFrame. [2 Marks]
# Return the names of all CSV files in the current working directory
def find_csv_files(suffix=".csv"):
    return [filename for filename in os.listdir('.') if filename.endswith(suffix)]

# Read every CSV and concatenate them in a single call (concatenating inside a loop copies the data on every iteration)
df_master = pd.concat([pd.read_csv(f) for f in find_csv_files()])

In [3]:
# Reset index of the combined dataframe
df_master.reset_index(drop=True, inplace=True)

# Display combined dataframe basic information
df_master.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1415133 entries, 0 to 1415132
Data columns (total 11 columns):
 #   Column     Non-Null Count    Dtype  
---  ------     --------------    -----  
 0   phone_url  1415133 non-null  object 
 1   date       1415133 non-null  object 
 2   lang       1415133 non-null  object 
 3   country    1415133 non-null  object 
 4   source     1415133 non-null  object 
 5   domain     1415133 non-null  object 
 6   score      1351644 non-null  float64
 7   score_max  1351644 non-null  float64
 8   extract    1395772 non-null  object 
 9   author     1351931 non-null  object 
 10  product    1415132 non-null  object 
dtypes: float64(2), object(9)
memory usage: 118.8+ MB


In [4]:
# 1.B Explore, understand the Data and share at least 2 observations.
df_master.shape

(1415133, 11)

In [5]:
df_master.isnull().sum()

phone_url        0
date             0
lang             0
country          0
source           0
domain           0
score        63489
score_max    63489
extract      19361
author       63202
product          1
dtype: int64

In [6]:
# Observations
# 1. There are 11 columns and about 1.4 million records to analyse, i.e. a large dataset.
# 2. A significant number of values are missing in the score, score_max, extract and author columns (see the info() and isnull() outputs above); product has a single missing value.

In [7]:
# 1.C Round off scores to the nearest integers. [3 Marks]
df_master['score']=df_master['score'].apply(np.round)
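`np.round`, like pandas' `Series.round`, rounds exact halves to the nearest even integer ("banker's rounding"). A score of 4.5 therefore becomes 4, while 5.5 becomes 6. The sketch below runs on made-up scores and contrasts that behaviour with round-half-up, in case the latter is what "nearest integer" is meant to be. The sample values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical sample scores, not taken from the dataset
scores = pd.Series([4.5, 5.5, 6.4, 7.6, 8.5])

# NumPy/pandas rounding: exact halves go to the nearest even integer
half_to_even = scores.apply(np.round)

# Round-half-up alternative: add 0.5 and take the floor (valid for non-negative scores)
half_up = np.floor(scores + 0.5)

print(pd.DataFrame({'score': scores, 'half_to_even': half_to_even, 'half_up': half_up}))
```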

In [8]:
df_master

| | phone_url | date | lang | country | source | domain | score | score_max | extract | author | product |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | /cellphones/samsung-galaxy-s8/ | 5/2/2017 | en | us | Verizon Wireless | verizonwireless.com | 10.0 | 10.0 | As a diehard Samsung fan who has had every Sam... | CarolAnn35 | Samsung Galaxy S8 |
| 1 | /cellphones/samsung-galaxy-s8/ | 4/28/2017 | en | us | Phone Arena | phonearena.com | 10.0 | 10.0 | Love the phone. the phone is sleek and smooth ... | james0923 | Samsung Galaxy S8 |
| 2 | /cellphones/samsung-galaxy-s8/ | 5/4/2017 | en | us | Amazon | amazon.com | 6.0 | 10.0 | Adequate feel. Nice heft. Processor's still sl... | R. Craig | Samsung Galaxy S8 (64GB) G950U 5.8" 4G LTE Unl... |
| 3 | /cellphones/samsung-galaxy-s8/ | 5/2/2017 | en | us | Samsung | samsung.com | 9.0 | 10.0 | Never disappointed. One of the reasons I've be... | Buster2020 | Samsung Galaxy S8 64GB (AT&T) |
| 4 | /cellphones/samsung-galaxy-s8/ | 5/11/2017 | en | us | Verizon Wireless | verizonwireless.com | 4.0 | 10.0 | I've now found that i'm in a group of people t... | S Ate Mine | Samsung Galaxy S8 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1415128 | /cellphones/sony-ericsson-z710i/ | 8/7/2006 | fr | fr | GraphMobile | graphmobile.com | 10.0 | 10.0 | Pour info il est sur amazon.de a 212.99€ s'il ... | NaN | Sony-Ericsson Z710i |
| 1415129 | /cellphones/sony-ericsson-z710i/ | 8/5/2006 | fr | fr | GraphMobile | graphmobile.com | 9.0 | 10.0 | Habitué à samsung sony nous sort 1 jolie clam ... | NaN | Sony-Ericsson Z710i |
| 1415130 | /cellphones/sony-ericsson-z710i/ | 7/19/2006 | fr | fr | GraphMobile | graphmobile.com | 10.0 | 10.0 | Pour les gens qui ne regarde pas Il fait mp3, ... | NaN | Sony-Ericsson Z710i |
| 1415131 | /cellphones/sony-ericsson-z710i/ | 7/9/2006 | fr | fr | GraphMobile | graphmobile.com | 9.0 | 10.0 | C vrai que sans le mp3 c moyen... | NaN | Sony-Ericsson Z710i |


In [9]:
# 1.D Check for missing values. Impute the missing values, if any. 
df_master['score_max'].value_counts()

10.0    1351644
Name: score_max, dtype: int64

In [10]:
# score_max takes a single value (10.0) with no variation, so drop this column.
df_master.drop('score_max',axis=1,inplace=True)

In [11]:
# Only one row is missing the product name; inspect it before deleting it
df_master[df_master['product'].isna()]

| | phone_url | date | lang | country | source | domain | score | extract | author | product |
|---|---|---|---|---|---|---|---|---|---|---|
| 802795 | /cellphones/samsung-galaxy-s-iii/ | 1/22/2014 | de | de | Amazon | amazon.de | 10.0 | Bestes Smartphone was ich bisher hatte :) öafk... | NaN | NaN |


In [12]:
df_master.dropna(axis=0, subset=['product'], inplace=True)

In [13]:
# Drop all rows where author is null (the author identifies the user for collaborative filtering)
df_master.dropna(axis=0, subset=['author'], inplace=True)

In [14]:
df_master.isnull().sum()

phone_url        0
date             0
lang             0
country          0
source           0
domain           0
score        60893
extract      15515
author           0
product          0
dtype: int64

In [15]:
# Where score is missing, replace it with the mean score rounded to the nearest integer
df_master['score'] = df_master['score'].fillna(np.round(df_master['score'].mean()))

In [16]:
# Where extract is missing, replace it with a blank string
df_master['extract'] = df_master['extract'].fillna(' ')

In [17]:
#1.E Check for duplicate values and remove them, if any. [2 Marks]
# Keep the first occurrence of every fully duplicated row
df_master = df_master.drop_duplicates(keep='first')

In [18]:
df_master.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1346881 entries, 0 to 1415125
Data columns (total 10 columns):
 #   Column     Non-Null Count    Dtype  
---  ------     --------------    -----  
 0   phone_url  1346881 non-null  object 
 1   date       1346881 non-null  object 
 2   lang       1346881 non-null  object 
 3   country    1346881 non-null  object 
 4   source     1346881 non-null  object 
 5   domain     1346881 non-null  object 
 6   score      1346881 non-null  float64
 7   extract    1346881 non-null  object 
 8   author     1346881 non-null  object 
 9   product    1346881 non-null  object 
dtypes: float64(1), object(9)
memory usage: 113.0+ MB


In [19]:
#1.F Keep only 1 Million data samples. Use random state=612. [2 Marks]
df_subSet = df_master.sample(1000000, random_state=612).copy().reset_index(drop=True)

In [20]:
#1.G Drop irrelevant features. Keep features like Author, Product, and Score. [2 Marks]
df_analysis = df_subSet[["country", "author", "product", "score"]].copy()

### 2. Answer the following questions. [10 Marks]
#### A. Identify the most rated products. [3 Marks]
#### B. Identify the users with the most reviews. [3 Marks]
#### C. Select the data with products having more than 50 ratings and users who have given more than 50 ratings. Report the shape of the final dataset. [4 Marks]

In [21]:
#2.A Identify the most rated products.
df_analysis['product'].value_counts()[:10]

Lenovo Vibe K4 Note (White,16GB)                3847
Lenovo Vibe K4 Note (Black, 16GB)               3197
OnePlus 3 (Graphite, 64 GB)                     3014
OnePlus 3 (Soft Gold, 64 GB)                    2666
Samsung Galaxy Express I8730                    2019
Huawei P8lite zwart / 16 GB                     1971
Lenovo Vibe K5 (Gold, VoLTE update)             1897
Samsung Galaxy S6 zwart / 32 GB                 1760
Lenovo Vibe K5 (Grey, VoLTE update)             1566
Lenovo Used Lenovo Zuk Z1 (Space Grey, 64GB)    1460
Name: product, dtype: int64

In [22]:
#2.B Identify the users with the most reviews.
df_analysis['author'].value_counts()[:10]

Amazon Customer    57169
Cliente Amazon     14349
e-bit               6254
Client d'Amazon     5703
Amazon Kunde        3557
Anonymous           2051
einer Kundin        1894
einem Kunden        1435
unknown             1283
Anonymous           1083
Name: author, dtype: int64

In [23]:
#2.C Select the data with products having more than 50 ratings and users who have given more than 50 ratings.
# Report the shape of the final dataset. [4 Marks]
# Count the ratings per product and per author, so that products and users with more than 50 ratings can be kept.

df_analysis['p_count'] = df_analysis.groupby(['product'])['score'].transform('count')
df_analysis['a_count'] = df_analysis.groupby(['author'])['score'].transform('count')

In [24]:
df_gt_50_reviews = df_analysis[(df_analysis['p_count']>50) & (df_analysis['a_count']>50)]
df_gt_50_reviews

| | country | author | product | score | p_count | a_count |
|---|---|---|---|---|---|---|
| 1 | in | Amazon Customer | Asus Zenfone 2 Laser ZE500KL (Black, 16GB) | 4.0 | 82 | 57169 |
| 12 | fr | Client d'Amazon | Buyus Etui Housse Luxe Portefeuille Samsung Ga... | 10.0 | 64 | 5703 |
| 15 | nl | unknown | Samsung Samsung Galaxy A5 2016 - Wit | 10.0 | 65 | 1283 |
| 21 | gb | Amazon Customer | Samsung Galaxy S7 Edge 32GB UK SIM-Free Smartp... | 10.0 | 148 | 57169 |
| 29 | in | Amazon Customer | OnePlus 3T (Soft Gold, 6GB RAM + 64GB memory) | 10.0 | 1000 | 57169 |
| ... | ... | ... | ... | ... | ... | ... |
| 999979 | de | Tim | Samsung Galaxy S Duos S7562 Smartphone (Qualco... | 8.0 | 88 | 167 |
| 999980 | gb | Amazon Customer | Doro PhoneEasy 612i GSM Sim Free Mobile Phone ... | 10.0 | 176 | 57169 |
| 999984 | de | Amazon-Kunde | HTC Desire X Smartphone (1 GHz Dual-Core Proze... | 10.0 | 191 | 383 |
| 999985 | fr | Client d'Amazon | EasyAcc Coque Samsung Galaxy A3 2016, EasyAcc ... | 10.0 | 63 | 5703 |


In [25]:
df_gt_50_reviews.shape

(108776, 6)
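The filter in cell In [24] makes a single pass. The counts come from `df_analysis` before any rows are removed, so after rare users are dropped, some products in `df_gt_50_reviews` can be left with 50 or fewer ratings, and vice versa. If both thresholds must hold in the final dataset, one option is to repeat the filter until nothing changes. This is a choice we are adding, not something the task requires. The sketch below shows the difference on a hypothetical toy frame, using a threshold of 1 instead of 50. The helper name `filter_min_counts` is illustrative.

```python
import pandas as pd

def filter_min_counts(df, min_count, user_col='author', item_col='product'):
    """Repeatedly keep rows whose user and item both have more than `min_count` ratings."""
    while True:
        item_counts = df.groupby(item_col)[user_col].transform('count')
        user_counts = df.groupby(user_col)[item_col].transform('count')
        keep = (item_counts > min_count) & (user_counts > min_count)
        if keep.all():
            return df
        df = df[keep]

# Hypothetical ratings: u4 rated only pC, and pC was otherwise rated only by u3
toy = pd.DataFrame({
    'author':  ['u1', 'u1', 'u2', 'u2', 'u3', 'u3', 'u4'],
    'product': ['pA', 'pB', 'pA', 'pB', 'pA', 'pC', 'pC'],
    'score':   [10, 8, 9, 7, 6, 5, 4],
})

# Single pass, as in cell In [24] (threshold 1 instead of 50)
p_count = toy.groupby('product')['score'].transform('count')
a_count = toy.groupby('author')['score'].transform('count')
single_pass = toy[(p_count > 1) & (a_count > 1)]

iterative = filter_min_counts(toy, min_count=1)
print(len(single_pass), len(iterative))
```

After the single pass, product `pC` is left with one rating. The iterative version removes it, and then removes `u3`, who is left with only one rating in turn. On the real data, the iterative filter may shrink the dataset noticeably more than the single pass.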

## 3. Build a popularity based model and recommend top 5 mobile phones. [5 Marks]

In [26]:
# Build a popularity based model and recommend top 5 mobile phones. [5 Marks]
# Display the top 5 mobile phones by rating across all countries in the data.
# Compute the mean score and the number of ratings for each product.
# Sorting by mean score first and then by number of ratings breaks ties in favour of products rated by more users,
# so a product with a single 10/10 rating does not outrank a product with many 10/10 ratings.
ratings_mean_count = df_analysis.groupby('product')['score'].agg(score='mean', rating_counts='count')
ratings_mean_count.sort_values(by=['score', 'rating_counts'], ascending=[False, False]).head()

# However, the reviews span a wide date range, so popularity should ideally be computed per month or year.
# A phone that was popular in one time period should not necessarily be recommended in another.

| product | score | rating_counts |
|---|---|---|
| Samsung Galaxy Note5 | 10.0 | 149 |
| Motorola Smartphone Motorola Moto X Desbloqueado Preto Android 4.2.2 Câmera 10MP e Frontal 2MP Memória Interna de 16GB GSM | 10.0 | 142 |
| Nokia Smartphone Nokia Lumia 520 Desbloqueado Oi Preto Windows Phone 8 Câmera 5MP 3G Wi-Fi Memória Interna 8G GPS | 10.0 | 133 |
| Motorola Smartphone Motorola Moto G Dual Chip Desbloqueado TIM Android 4.3 Tela 4.5 8GB 3G Wi-Fi Câmera 5MP - Preto | 10.0 | 130 |
| Samsung Smartphone Dual Chip Samsung Galaxy SIII Duos Desbloqueado Claro Azul Android 4.1 3G/Wi-Fi Câmera 5MP | 10.0 | 128 |
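Sorting by mean score first still lets a product with two perfect scores rank above a product with hundreds of 9s. A common alternative is a damped (Bayesian) mean. It shrinks each product's mean towards the global mean, and the fewer ratings a product has, the stronger the shrinkage. This technique is swapped in here and is not what the cell above does. The sketch below uses hypothetical data. The function name `damped_popularity` and the damping weight `m` (the number of "virtual" ratings at the global mean) are assumptions to be tuned.

```python
import pandas as pd

def damped_popularity(df, m=50, item_col='product', score_col='score'):
    """Rank items by a mean rating shrunk towards the global mean.

    m behaves like m extra ratings at the global mean (a hypothetical default).
    """
    global_mean = df[score_col].mean()
    stats = df.groupby(item_col)[score_col].agg(mean='mean', count='count')
    stats['damped'] = (stats['count'] * stats['mean'] + m * global_mean) / (stats['count'] + m)
    return stats.sort_values('damped', ascending=False)

# Hypothetical ratings: 'Phone A' has two perfect scores, 'Phone B' has one hundred 9s
toy = pd.DataFrame({
    'product': ['Phone A'] * 2 + ['Phone B'] * 100 + ['Phone C'] * 50,
    'score':   [10] * 2 + [9] * 100 + [5] * 50,
})
ranking = damped_popularity(toy, m=10)
print(ranking)
```

A plain mean-then-count sort puts Phone A first. The damped mean puts Phone B first, because two ratings carry little evidence. For the period-specific popularity suggested in the comment above, the same function could be applied to the rows of a single month or year. That would require keeping the `date` column, which `df_analysis` drops.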


## 4. Build a collaborative filtering model using SVD. You can use SVD from surprise or build it from scratch (Note: in case you're building it from scratch, you can limit your data points to 5000 samples if you face memory issues). Build a collaborative filtering model using KNNWithMeans from surprise. You can try both user-based and item-based models. [10 Marks]

In [27]:
reader = Reader(rating_scale=(1, 10))
data = Dataset.load_from_df(df_gt_50_reviews[['author', 'product', 'score']], reader)
algo_svd = SVD(n_epochs=10)

In [28]:
trainset, testset = train_test_split(data, test_size=.2,random_state=612)

In [29]:
# Train the SVD model
algo_svd.fit(trainset)

<surprise.prediction_algorithms.matrix_factorization.SVD at 0x7f87ee1d8550>
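Surprise documents its `SVD` as the matrix-factorisation algorithm popularised by Simon Funk during the Netflix Prize. It predicts r_hat(u, i) = mu + b_u + b_i + q_i · p_u and fits the biases and factors by stochastic gradient descent on the regularised squared error. The NumPy sketch below implements that idea from scratch on hypothetical toy triples; it is not surprise's code. The function names and every hyper-parameter value (2 factors, 300 epochs, learning rate 0.01, regularisation 0.02) are assumptions. Surprise's defaults differ, for example 100 factors and 20 epochs. Unlike surprise's `predict`, this sketch does not clip estimates to the rating scale.

```python
import numpy as np

def fit_funk_svd(ratings, n_users, n_items, n_factors=2, n_epochs=300,
                 lr=0.01, reg=0.02, seed=612):
    """Fit r_hat = mu + b_u + b_i + q_i . p_u by SGD on (user, item, rating) triples.

    All hyper-parameter defaults are illustrative assumptions, not surprise's defaults.
    """
    rng = np.random.default_rng(seed)
    mu = np.mean([r for _, _, r in ratings])        # global mean rating
    bu, bi = np.zeros(n_users), np.zeros(n_items)   # user and item biases
    P = rng.normal(0, 0.1, (n_users, n_factors))    # user factors p_u
    Q = rng.normal(0, 0.1, (n_items, n_factors))    # item factors q_i
    for _ in range(n_epochs):
        for u, i, r in ratings:
            err = r - (mu + bu[u] + bi[i] + Q[i] @ P[u])
            bu[u] += lr * (err - reg * bu[u])
            bi[i] += lr * (err - reg * bi[i])
            # Update both factor vectors from their values before this step
            P[u], Q[i] = (P[u] + lr * (err * Q[i] - reg * P[u]),
                          Q[i] + lr * (err * P[u] - reg * Q[i]))
    return mu, bu, bi, P, Q

def predict(model, u, i):
    """Estimated rating of item i by user u."""
    mu, bu, bi, P, Q = model
    return mu + bu[u] + bi[i] + Q[i] @ P[u]

# Hypothetical (user index, item index, rating) triples on a 1-10 scale
ratings = [(0, 0, 10), (0, 1, 8), (1, 0, 9), (1, 2, 4), (2, 1, 7), (2, 2, 3)]
model = fit_funk_svd(ratings, n_users=3, n_items=3)
print(predict(model, 0, 2))  # estimate for a (user, item) pair absent from the training triples
```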

In [30]:
# Build a user-user collaborative filtering model with KNNWithMeans
algo_user = KNNWithMeans(k=50, sim_options={'name': 'pearson_baseline', 'user_based': True})
algo_user.fit(trainset)

Estimating biases using als...
Computing the pearson_baseline similarity matrix...
Done computing similarity matrix.


<surprise.prediction_algorithms.knns.KNNWithMeans at 0x7f87ee1d8e80>

In [31]:
# Build item-item collaborative filter model
algo_item = KNNWithMeans(k=50, sim_options={'name': 'pearson_baseline', 'user_based': False})
algo_item.fit(trainset)

Estimating biases using als...
Computing the pearson_baseline similarity matrix...
Done computing similarity matrix.


<surprise.prediction_algorithms.knns.KNNWithMeans at 0x7f87ee1d8f40>
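`KNNWithMeans` predicts a user's mean rating plus a similarity-weighted average of how the k most similar users deviated from their own means on the target item:

r_hat(u, i) = mu_u + sum over v of sim(u, v) * (r_vi - mu_v) / sum over v of sim(u, v)

The NumPy sketch below applies this rule to a hypothetical dense toy matrix, where NaN means "unrated". It differs from the models above in two ways. First, it uses plain Pearson correlation on co-rated items rather than the `pearson_baseline` similarity. Second, it only counts neighbours with positive similarity, which is our understanding of surprise's behaviour, not a verified detail. The function names are illustrative.

```python
import numpy as np

def pearson_sim(R, u, v):
    """Pearson correlation of users u and v over the items both have rated."""
    both = ~np.isnan(R[u]) & ~np.isnan(R[v])
    if both.sum() < 2:
        return 0.0
    a = R[u, both] - R[u, both].mean()
    b = R[v, both] - R[v, both].mean()
    denom = np.sqrt((a ** 2).sum() * (b ** 2).sum())
    return float(a @ b / denom) if denom > 0 else 0.0

def knn_with_means_predict(R, u, i, k=50):
    """User-based KNNWithMeans-style estimate of R[u, i] (rows = users, NaN = unrated)."""
    means = np.nanmean(R, axis=1)  # each user's mean rating
    # Candidate neighbours: other users who have rated item i
    candidates = [v for v in range(R.shape[0]) if v != u and not np.isnan(R[v, i])]
    # Keep the k most similar candidates
    neighbours = sorted(((pearson_sim(R, u, v), v) for v in candidates), reverse=True)[:k]
    num = den = 0.0
    for sim, v in neighbours:
        if sim > 0:  # only positively similar neighbours contribute
            num += sim * (R[v, i] - means[v])
            den += sim
    return means[u] + (num / den if den > 0 else 0.0)

# Hypothetical user x item rating matrix on a 1-10 scale
nan = np.nan
R = np.array([
    [10, 8, nan, 6],
    [9, 7, 4, 5],
    [4, 6, 8, nan],
    [10, 9, 5, 7],
])
est = knn_with_means_predict(R, u=0, i=2, k=2)
print(round(est, 2))
```

In this example, users 1 and 3 rate like user 0 and both scored item 2 below their own means, so the estimate falls below user 0's mean of 8. Calling `knn_with_means_predict(R.T, u=i, i=u)` gives the item-based variant: items become the rows, and neighbours are the items that user u has rated.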

## 5. Evaluate the collaborative model. Print RMSE value.

In [32]:
# Get prediction for svd based collaborative filtering model
# get RMSE for svd 
svd_pred = algo_svd.test(testset)
print("SVD-based Model : Test Set")
accuracy.rmse(svd_pred, verbose=True)

# get RMSE for user-user collaborative filtering model
usr_pred = algo_user.test(testset)
print("\nUser-User Model : Test Set")
accuracy.rmse(usr_pred, verbose=True)

# get RMSE for item-item collaborative filtering model
itm_pred = algo_item.test(testset)
print("\nItem-Item Model : Test Set")
accuracy.rmse(itm_pred, verbose=True)

SVD-based Model : Test Set
RMSE: 2.6185

User-User Model : Test Set
RMSE: 2.7538

Item-Item Model : Test Set
RMSE: 2.6826


2.682600168588943
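The RMSE that `accuracy.rmse` reports is the square root of the mean squared difference between the true rating `r_ui` and the estimate `est`, taken over all test predictions. The sketch below computes it directly from a DataFrame with those two columns, and also for estimates rounded to the nearest integer, so that the effect of rounding can be measured rather than assumed. The prediction rows and the helper name `rmse` are hypothetical.

```python
import numpy as np
import pandas as pd

def rmse(true, est):
    """Root mean squared error between two equal-length sequences."""
    true, est = np.asarray(true, dtype=float), np.asarray(est, dtype=float)
    return float(np.sqrt(np.mean((true - est) ** 2)))

# Hypothetical predictions with the same columns as pd.DataFrame(svd_pred)
pred_df = pd.DataFrame({'r_ui': [10.0, 6.0, 8.0, 4.0], 'est': [9.2, 7.4, 8.1, 6.6]})

print('RMSE (raw estimates):    ', rmse(pred_df['r_ui'], pred_df['est']))
print('RMSE (rounded estimates):', rmse(pred_df['r_ui'], np.round(pred_df['est'])))
```

On these made-up numbers, rounding happens to increase the error. On the real test predictions, the comparison has to be run to know which way it goes.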

## 6. Predict score (average rating) for test users. [2 Marks]

In [33]:
# SVD 
svd_pred_df = pd.DataFrame(svd_pred)
svd_pred_df.head(5)

| | uid | iid | r_ui | est | details |
|---|---|---|---|---|---|
| 0 | Francesco | Huawei P9 Lite Smartphone, LTE, Display 5.2'' ... | 10.0 | 9.151644 | {'was_impossible': False} |
| 1 | Deepak | Nokia 215 (Dual SIM, Black) | 6.0 | 7.543952 | {'was_impossible': False} |
| 2 | . | Samsung Galaxy S3 I9300 Unlocked 16GB (Marble ... | 6.0 | 7.265435 | {'was_impossible': False} |
| 3 | Amazon Customer | OnePlus 3T (Gunmetal, 6GB RAM + 64GB memory) | 10.0 | 8.52353 | {'was_impossible': False} |
| 4 | John | Sony Ericsson W580 | 8.0 | 8.17337 | {'was_impossible': False} |


In [34]:
#User-User CF
usr_pred_df = pd.DataFrame(usr_pred)
usr_pred_df.head(5)

| | uid | iid | r_ui | est | details |
|---|---|---|---|---|---|
| 0 | Francesco | Huawei P9 Lite Smartphone, LTE, Display 5.2'' ... | 10.0 | 9.191791 | {'actual_k': 50, 'was_impossible': False} |
| 1 | Deepak | Nokia 215 (Dual SIM, Black) | 6.0 | 9.666667 | {'actual_k': 1, 'was_impossible': False} |
| 2 | . | Samsung Galaxy S3 I9300 Unlocked 16GB (Marble ... | 6.0 | 6.0 | {'actual_k': 1, 'was_impossible': False} |
| 3 | Amazon Customer | OnePlus 3T (Gunmetal, 6GB RAM + 64GB memory) | 10.0 | 8.68 | {'actual_k': 50, 'was_impossible': False} |
| 4 | John | Sony Ericsson W580 | 8.0 | 8.183266 | {'actual_k': 7, 'was_impossible': False} |


In [35]:
#Item-Item CF
itm_pred_df = pd.DataFrame(itm_pred)
itm_pred_df.head(5)

| | uid | iid | r_ui | est | details |
|---|---|---|---|---|---|
| 0 | Francesco | Huawei P9 Lite Smartphone, LTE, Display 5.2'' ... | 10.0 | 9.260736 | {'actual_k': 50, 'was_impossible': False} |
| 1 | Deepak | Nokia 215 (Dual SIM, Black) | 6.0 | 8.752487 | {'actual_k': 7, 'was_impossible': False} |
| 2 | . | Samsung Galaxy S3 I9300 Unlocked 16GB (Marble ... | 6.0 | 6.028074 | {'actual_k': 6, 'was_impossible': False} |
| 3 | Amazon Customer | OnePlus 3T (Gunmetal, 6GB RAM + 64GB memory) | 10.0 | 8.68 | {'actual_k': 50, 'was_impossible': False} |
| 4 | John | Sony Ericsson W580 | 8.0 | 7.029029 | {'actual_k': 7, 'was_impossible': False} |


## 7. Report your findings and inferences.

#### 1. Based on the accuracy (RMSE) scores, SVD has the lowest RMSE of the three models.
#### 2. However, RMSE values of about 2.6 to 2.75 on a 1-10 rating scale are still very high.
#### 3. Item-item CF takes the longest time to compute.
#### 4. Since exact rating prediction is not the objective here, only the overall recommendation, all three models do a reasonable job.
#### 5. Users have generally given high ratings (around 8), so the data is biased towards high scores. However, the recommendations are less affected by this bias than the rating predictions.


## 8. Try and recommend top 5 products for test users.

In [36]:
top_5_prod_recos = svd_pred_df.groupby('uid').head(5).reset_index(drop=True)

In [37]:
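# Use the full option names; the short form 'max_columns' matches several options in recent pandas versions
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)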
top_5_prod_recos.sort_values('uid')

| | uid | iid | r_ui | est | details |
|---|---|---|---|---|---|
| 1299 | # | Sony Ericsson W350i | 4.0 | 7.865478 | {'was_impossible': False} |
| 1296 | # | Samsung GT-S5230 Star | 8.0 | 8.66158 | {'was_impossible': False} |
| 1442 | # | Nokia 6288 | 8.0 | 8.612038 | {'was_impossible': False} |
| 1240 | # | Nokia 5228 | 8.0 | 8.116358 | {'was_impossible': False} |
| 739 | # | Sony Ericsson Xperia X8 | 8.0 | 8.497432 | {'was_impossible': False} |
| 2 | . | Samsung Galaxy S3 I9300 Unlocked 16GB (Marble ... | 6.0 | 7.265435 | {'was_impossible': False} |
| 2180 | . | Samsung Galaxy Ace 2, 96.5 mm (3.8 "), 480 x 8... | 10.0 | 7.35336 | {'was_impossible': False} |
| 1059 | . | Samsung Galaxy Ace S5830i Smartphone (8,9 cm (... | 8.0 | 7.140434 | {'was_impossible': False} |
| 1975 | . | Samsung Galaxy S7 goud, roze / 32 GB | 8.0 | 7.897246 | {'was_impossible': False} |
| 1372 | ???????? | Sony Xperia V (?????�??????) | 9.0 | 9.33568 | {'was_impossible': False} |
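`groupby('uid').head(5)` keeps each user's first five test rows in test-set order. Those are not necessarily the five items with the highest estimated rating. The sketch below ranks each user's candidate items by `est` before taking the top n. It runs on a hypothetical prediction frame with the same `uid`, `iid` and `est` columns; the helper name `top_n_per_user` is illustrative. In practice, recommendations would usually be scored on items the user has not rated yet rather than on the test set, for example the pairs produced by surprise's `trainset.build_anti_testset()`.

```python
import pandas as pd

def top_n_per_user(pred_df, n=5):
    """Return the n items with the highest estimated rating for every user."""
    return (pred_df.sort_values(['uid', 'est'], ascending=[True, False])
                   .groupby('uid')
                   .head(n)
                   .reset_index(drop=True))

# Hypothetical predictions, not taken from svd_pred_df
preds = pd.DataFrame({
    'uid': ['A', 'A', 'A', 'B', 'B'],
    'iid': ['p1', 'p2', 'p3', 'p1', 'p4'],
    'est': [7.1, 9.3, 8.2, 6.0, 8.8],
})
print(top_n_per_user(preds, n=2))
```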


## 9. Try other techniques (Example: cross validation) to get better results. [3 Marks]

In [38]:
from surprise import NMF
from surprise import KNNBaseline
from surprise import KNNBasic
from surprise import KNNWithZScore
from surprise import BaselineOnly
from surprise import CoClustering
benchmark = []
# Cross-validate each algorithm and record its mean RMSE and timings
for algorithm in [SVD(), KNNWithMeans(k=50, sim_options={'name': 'pearson_baseline', 'user_based': True}),
                  NMF(), KNNBaseline(), KNNBasic(), KNNWithZScore(), BaselineOnly(), CoClustering()]:
    # Perform 5-fold cross validation
    results = cross_validate(algorithm, data, measures=['RMSE'], cv=5, verbose=False)

    # Average each metric over the folds and attach the algorithm's class name
    tmp_res = pd.DataFrame.from_dict(results).mean(axis=0)
    tmp_res = pd.concat([tmp_res, pd.Series([type(algorithm).__name__], index=['Algorithm'])])
    benchmark.append(tmp_res)

pd.DataFrame(benchmark).set_index('Algorithm').sort_values('test_rmse')

Estimating biases using als...
Computing the pearson_baseline similarity matrix...
Done computing similarity matrix.
Estimating biases using als...
Computing the pearson_baseline similarity matrix...
Done computing similarity matrix.
Estimating biases using als...
Computing the pearson_baseline similarity matrix...
Done computing similarity matrix.
Estimating biases using als...
Computing the pearson_baseline similarity matrix...
Done computing similarity matrix.
Estimating biases using als...
Computing the pearson_baseline similarity matrix...
Done computing similarity matrix.
Estimating biases using als...
Computing the msd similarity matrix...
Done computing similarity matrix.
Estimating biases using als...
Computing the msd similarity matrix...
Done computing similarity matrix.
Estimating biases using als...
Computing the msd similarity matrix...
Done computing similarity matrix.
Estimating biases using als...
Computing the msd similarity matrix...
Done computing similarity matrix.

| Algorithm | test_rmse | fit_time | test_time |
|---|---|---|---|
| BaselineOnly | 2.586091 | 0.247751 | 0.080693 |
| SVD | 2.645955 | 5.159474 | 0.163024 |
| CoClustering | 2.646058 | 2.054477 | 0.140885 |
| KNNBaseline | 2.667041 | 0.640033 | 3.101351 |
| KNNBasic | 2.67315 | 0.425397 | 2.750316 |
| KNNWithZScore | 2.676785 | 0.538523 | 2.952492 |
| KNNWithMeans | 2.736032 | 0.838428 | 2.900748 |
| NMF | 3.205908 | 5.39808 | 0.132282 |


### Observation
1. The algorithm with the lowest RMSE is BaselineOnly, followed by SVD.
2. BaselineOnly also takes the least time to fit and test.
3. An RMSE of about 2.6 is still very high.
4. However, rounding the predicted ratings to the nearest integer decreases the error.
5. Item-item CF was skipped because it took a very long time to compute earlier.
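BaselineOnly predicts r_hat(u, i) = mu + b_u + b_i. It has no user-item interaction term at all, only a global mean and per-user and per-product offsets. The NumPy sketch below estimates those offsets by alternating least squares. To our understanding, this mirrors surprise's default baseline settings: item biases, then user biases, with reg_i = 10, reg_u = 15 and 10 epochs. Treat it as an approximation rather than surprise's exact code. The function name and the toy ratings are hypothetical.

```python
import numpy as np

def fit_baseline_als(ratings, n_users, n_items, reg_u=15, reg_i=10, n_epochs=10):
    """Estimate user and item biases for r_hat = mu + b_u + b_i by alternating least squares."""
    users = np.array([u for u, _, _ in ratings])
    items = np.array([i for _, i, _ in ratings])
    r = np.array([x for _, _, x in ratings], dtype=float)
    mu = r.mean()
    bu, bi = np.zeros(n_users), np.zeros(n_items)
    for _ in range(n_epochs):
        # Item biases given the current user biases (regularised mean residual)
        for i in range(n_items):
            mask = items == i
            bi[i] = (r[mask] - mu - bu[users[mask]]).sum() / (reg_i + mask.sum())
        # User biases given the updated item biases
        for u in range(n_users):
            mask = users == u
            bu[u] = (r[mask] - mu - bi[items[mask]]).sum() / (reg_u + mask.sum())
    return mu, bu, bi

# Hypothetical ratings: user 0 rates generously, user 1 harshly; item 0 is liked more than item 1
ratings = [(0, 0, 10), (0, 1, 10), (1, 0, 6), (1, 1, 4), (2, 0, 9), (2, 1, 7)]
mu, bu, bi = fit_baseline_als(ratings, n_users=3, n_items=2)
print(mu + bu[0] + bi[1])  # baseline estimate for user 0 on item 1
```

The large regularisation terms shrink the biases of users and items with few ratings towards zero, so their predictions stay close to the global mean.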

## 10. In what business scenario should you use popularity based recommendation systems? [2 Marks]

1. Popularity based recommendation systems work on trends, i.e. what is in demand.
2. We use them when we face the cold-start problem, i.e. a new user without any history, to whom we can recommend the most popular items on the site.
3. They work even on day 1, when there is not enough information about an individual user's likes and dislikes.
4. Items that have become popular in a short amount of time should also be recommended by a popularity based system, e.g. the top-selling books this month.

e.g.
1. Trending videos for newcomers to a site
2. Most popular news when the user's preferences are not known
3. Most sold products on e-commerce sites for a new user

## 11. In what business scenario should you use CF based recommendation systems? [2 Marks]

1. Collaborative filtering is a personalised recommendation approach that identifies similarities between users (based on their tastes) to serve relevant product recommendations.
2. For example, with Netflix movies: if your preferences match those of other users (call them digital clones or neighbours) who have watched the same movies and given similar ratings, then on your next login the service shows you movies your neighbours have watched that you have not seen yet.
3. If a business already has a good amount of data about users' historical choices, CF based recommendation systems should be used. They help with customer retention and make purchases faster, since customers spend less time discovering what they need.

## 12. What other possible methods can you think of which can further improve the recommendations for different users?

1. Use a hybrid of different recommendation approaches to cover different perspectives, e.g. similar tastes, newness, mood changes etc.
2. Suggest bundles using market basket analysis (the Apriori algorithm).
3. Use a demographics or location based recommendation system, i.e. "trending near you", e.g. when a user has to visit a nearby hotel.
4. Use a feature based recommendation system that focuses on key product features to recommend a product.
5. Use a knowledge / certification based recommendation system, e.g. when selling used cars, the most important thing is whether a subject-matter expert (SME) can confirm that the car has passed all the certification tests.
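Apriori begins by counting how often items occur together in the same basket (their support). It keeps only itemsets above a minimum support before growing larger ones. The plain-Python sketch below covers just the pair-counting step, without Apriori's candidate pruning or the later passes, on hypothetical baskets. This dataset contains ratings rather than orders, so real baskets would have to come from purchase data. The function name `frequent_pairs` and the 0.5 threshold are illustrative.

```python
from collections import Counter
from itertools import combinations

def frequent_pairs(baskets, min_support=0.5):
    """Return item pairs whose support (share of baskets containing both) is at least min_support."""
    pair_counts = Counter()
    for basket in baskets:
        # Count each unordered pair at most once per basket
        for pair in combinations(sorted(set(basket)), 2):
            pair_counts[pair] += 1
    n = len(baskets)
    return {pair: count / n for pair, count in pair_counts.items() if count / n >= min_support}

# Hypothetical baskets
baskets = [
    ['phone', 'case', 'charger'],
    ['phone', 'case'],
    ['phone', 'screen guard'],
    ['case', 'charger'],
]
print(frequent_pairs(baskets, min_support=0.5))
```

Pairs such as ('case', 'phone') that clear the threshold are candidates for a "frequently bought together" bundle.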