# Project Recommendation Systems #

---

DOMAIN: Smartphone, Electronics
• CONTEXT: India is the second largest market globally for smartphones after China. About 134 million smartphones were sold across Indiain the year 2017 and is estimated to increase to about 442 million in 2022. India ranked second in the average time spent on mobile web bysmartphone users across Asia Pacific. The combination of very high sales volumes and the average smartphone consumer behaviour hasmade India a very attractive market for foreign vendors. As per Consumer behaviour, 97% of consumers turn to a search engine when theyare buying a product vs. 15% who turn to social media. If a seller succeeds to publish smartphones based on user’s behaviour/choice at theright place, there are 90% chances that user will enquire for the same. This Case Study is targeted to build a recommendation systembased on individual consumer’s behaviour or choice.

DATA DESCRIPTION:
- author : name of the person who gave the rating
- country : country the person who gave the rating belongs to
- data : date of the rating
- domain: website from which the rating was taken from
- extract: rating content
- language: language in which the rating was given
- product: name of the product/mobile phone for which the rating was given
- score: average rating for the phone
- score_max: highest rating given for the phone
- source: source from where the rating was taken

PROJECT OBJECTIVE: We will build a recommendation system using popularity based and collaborative filtering methods to recommend mobile phones to a user which are most popular and personalised respectively.



In [1]:
# Load the Libraries
import numpy  as np
import pandas as pd
import scipy.stats as stats
import matplotlib.pyplot as plt
import seaborn as sns

import warnings
warnings.filterwarnings('ignore')

class color:
   BOLD = '\033[1m'
   END = '\033[0m'


*** 1. Import the necessary libraries and read the provided CSVs as a data frame and perform the below steps. ***

***Q 1A. Merge all the provided CSVs into one dataFrame ***

***Ans 1A.***

In [2]:
# Load the libraries 
import glob
import os

In [3]:
# Read file with encoding to avoid parsing errors as content contains unicode or UTF-8 contents
filepath = 'RecommendationSystem'

combined_path = glob.glob(os.path.join(filepath, "phone_*.csv"))
df_of_file    = ( pd.read_csv( f, sep=',', encoding='latin1' ) for f in combined_path)
df_merged     = pd.concat(df_of_file, ignore_index=True)


In [4]:
df_merged.shape

(1415144, 14)


***Q 1B. . Explore, understand the Data and share at least 2 observations.***

***Ans 2A.***

In [5]:
df_merged.head

<bound method NDFrame.head of                           Samsung Galaxy Note Edge (Black)  Unnamed: 1  \
0                                                      NaN         NaN   
1                                                      NaN         NaN   
2                                                      NaN         NaN   
3                                                      NaN         NaN   
4        Samsung Galaxy Note510.0149Motorola Smartphone...         NaN   
...                                                    ...         ...   
1415139                                                NaN         NaN   
1415140                                                NaN         NaN   
1415141                                                NaN         NaN   
1415142                                                NaN         NaN   
1415143                                                NaN         NaN   

         Unnamed: 2                          phone_url       date lang  \
0      

***Observations***
- There are 11 features in the csv file
- The feature 'phone_url' needs to be trimmed or removed or hot encoded asthe orginal value does not provide any correlation to the data
-  Simillarly the feature 'extract' is simillar to review comments and does not add added value or correlation
- The features 'lang', 'country', 'source' and 'domain' needs to hot encoded with numerical values for better correlation


***Q 1C. Round off scores to the nearest integers. ***

***Ans 1C.***



In [6]:
# find the median value
Scr_med_val = df_merged[ 'score' ].median( )
# Assign the median value to nulls
df_merged[ 'score' ].fillna( value = Scr_med_val, inplace = True )



In [7]:
# Check if there are any nulls or NA
df_merged['score'].isnull().sum()

0

In [8]:
# Chaneg the datatype from float to int to round the value
df_merged['score'] = df_merged['score'].astype(int)

In [9]:
df_merged[ 'score' ].head()

0    9
1    9
2    9
3    9
4    9
Name: score, dtype: int32

*** Q 1D. Check for missing values. Impute the missing values, if any***

***Ans 1D***

In [10]:
# check the null values across the columns
df_merged.isnull().sum()

Samsung Galaxy Note Edge (Black)    1415138
Unnamed: 1                          1415139
Unnamed: 2                          1415139
phone_url                                11
date                                     11
lang                                     11
country                                  11
source                                   11
domain                                   11
score                                     0
score_max                             63500
extract                               19372
author                                63213
product                                  12
dtype: int64

In [11]:
# Score null values are already filled in.  Fill the the null score max feature of the row using the core value of the respective row
df_merged[ 'score_max' ].fillna( df_merged[ 'score' ] , inplace = True )

In [12]:
df_merged.isnull().sum()

Samsung Galaxy Note Edge (Black)    1415138
Unnamed: 1                          1415139
Unnamed: 2                          1415139
phone_url                                11
date                                     11
lang                                     11
country                                  11
source                                   11
domain                                   11
score                                     0
score_max                                 0
extract                               19372
author                                63213
product                                  12
dtype: int64

In [13]:
df_merged[ df_merged[ 'product' ].isnull() ] 

Unnamed: 0,Samsung Galaxy Note Edge (Black),Unnamed: 1,Unnamed: 2,phone_url,date,lang,country,source,domain,score,score_max,extract,author,product
0,,,,,,,,,,9,9.0,,,
1,,,,,,,,,,9,9.0,,,
2,,,,,,,,,,9,9.0,,,
3,,,,,,,,,,9,9.0,,,
4,Samsung Galaxy Note510.0149Motorola Smartphone...,,,,,,,,,9,9.0,,,
5,,,,,,,,,,9,9.0,,,
6,Samsung Galaxy Note5,10.0,149.0,,,,,,,9,9.0,,,
7,Motorola Smartphone Motorola Moto X Desbloquea...,10.0,134.0,,,,,,,9,9.0,,,
8,Motorola Smartphone Motorola Moto G Dual Chip ...,10.0,130.0,,,,,,,9,9.0,,,
9,Nokia Smartphone Nokia Lumia 520 Desbloqueado ...,10.0,128.0,,,,,,,9,9.0,,,


***Q 1E. Check for duplicate values and remove them, if any***

***Ans 1E***

In [14]:
# Use the drop_duplicates function to drop the duplicate rows
df_merged_wo_dup = df_merged.drop_duplicates()

In [15]:
# Check the dataframe after dropping the duplicate rows
df_merged_wo_dup.shape

(1408720, 14)


***Q F. Keep only 1 Million data samples. Use random state=612.***

***Ans. 1F***

Remove the records where the author value is null -  around 63K records

Then take a sample of 1 Million records/ rows from remaining rows

In [16]:
df_merged_wo_dup.dropna( axis = 0, subset = [ 'author' ], inplace=True )

In [17]:
df_merged_wo_dup.shape

(1346896, 14)

In [18]:
# Use the sample function and specify the number or rows and the random state to maintain integrity across re-runs
df_merged_fin = df_merged_wo_dup.sample( n=1000000, random_state=612 )

In [19]:
df_merged_fin.shape

(1000000, 14)


***Q 1G. Drop irrelevant features. Keep features like Author, Product, and Score***

***Ans 1G***

drop 'phone_url' features as they dot influence the recommendation systems

In [20]:
# drop the feature phone_url 
df_merged_fin = df_merged_fin.drop( [ 'phone_url' ] , axis = 1 )

In [21]:
df_merged_fin.shape

(1000000, 13)

----

***Q 2A Identify the most rated products.***

***Ans 2A.***


In [22]:
# Group by product and author to find the top rated product
most_rt_prd = df_merged_fin.groupby( 'product' )[ 'author' ].count( )
most_rt_prd.sort_values( ascending = False, inplace = True )
most_rt_prd.head( 10 )

product
Lenovo Vibe K4 Note (White,16GB)                3839
Lenovo Vibe K4 Note (Black, 16GB)               3205
OnePlus 3 (Graphite, 64 GB)                     3013
OnePlus 3 (Soft Gold, 64 GB)                    2666
Samsung Galaxy Express I8730                    1997
Huawei P8lite zwart / 16 GB                     1971
Lenovo Vibe K5 (Gold, VoLTE update)             1905
Samsung Galaxy S6 zwart / 32 GB                 1760
Lenovo Vibe K5 (Grey, VoLTE update)             1542
Lenovo Used Lenovo Zuk Z1 (Space Grey, 64GB)    1453
Name: author, dtype: int64

***Q 2B. Identify the users with most number of reviews***

***Ans 2B***

In [23]:
# Group by author vs product (score also can be used) to find the users providing most reviews
auth_most_num_rev = df_merged_fin.groupby( 'author' )[ 'product' ].count( )
auth_most_num_rev.sort_values(ascending = False, inplace = True)
auth_most_num_rev.head(10)

author
Amazon Customer    57287
Cliente Amazon     14348
e-bit               6265
Client d'Amazon     5706
Amazon Kunde        3576
Anonymous           2063
einer Kundin        1905
einem Kunden        1441
unknown             1277
Anonymous           1095
Name: product, dtype: int64


***Q 2C. Select the data with products having more than 50 ratings and users who have given more than 50 ratings. Report the shape of the final
dataset.***

***Ans 2C.***

In [24]:
# Group by product and author agsint score to find the highest combonation of product and users (author) vs score
most_rt_prd1 = df_merged_fin.groupby( ['product', 'author' ] )[ 'score' ].count( )
most_rt_prd1.sort_values( ascending = False, inplace = True)
most_rt_prd1.head( 10 )

product                                       author         
Lenovo Vibe K4 Note (White,16GB)              Amazon Customer    2280
Lenovo Vibe K4 Note (Black, 16GB)             Amazon Customer    1844
OnePlus 3 (Graphite, 64 GB)                   Amazon Customer    1357
OnePlus 3 (Soft Gold, 64 GB)                  Amazon Customer    1317
Lenovo Vibe K5 (Gold, VoLTE update)           Amazon Customer    1189
Lenovo Vibe K5 (Grey, VoLTE update)           Amazon Customer    1011
Lenovo Used Lenovo Zuk Z1 (Space Grey, 64GB)  Amazon Customer     861
YU Yuphoria YU5010A (Black+Silver)            Amazon Customer     664
Lenovo Vibe K5 (Silver, 16GB)                 Amazon Customer     655
OnePlus 3T (Gunmetal, 6GB RAM + 64GB memory)  Amazon Customer     626
Name: score, dtype: int64

In [25]:
# filter the grouped data for review 
most_rt_prd_lst = most_rt_prd1[ most_rt_prd1 > 50].index.to_list()

In [26]:
len(most_rt_prd_lst)

222

In [27]:
Df_flt_mot_rated = pd.DataFrame()

for val in most_rt_prd_lst:
    #print(val[0] + ' : ' + val[1])
    Df_flt_mot_rated = Df_flt_mot_rated.append( df_merged_fin[ (df_merged_fin['product'] == val[0] ) &  ( df_merged_fin['author'] == val[1]) ] )
    #print(Df_flt_mot_rated.shape)
    
    

In [28]:
# Report the shape of the final dataframe
Df_flt_mot_rated.shape

(40002, 13)


----

***Q 3. Build a popularity based model and recommend top 5 mobile phones.**

***Ans 3.***


In [29]:
rcmd_mob_phn = pd.DataFrame( df_merged_fin.groupby( 'product' )[ 'score' ].mean() )


In [30]:
rcmd_mob_phn['scorecount'] = pd.DataFrame( df_merged_fin.groupby( 'product' )[ 'score' ].count() )


In [31]:
rcmd_mob_phn.sort_values(by=['score','scorecount'], ascending=[False, False] , inplace=True)
rcmd_mob_phn.head()

Unnamed: 0_level_0,score,scorecount
product,Unnamed: 1_level_1,Unnamed: 2_level_1
Samsung Galaxy Note5,10.0,149
Motorola Smartphone Motorola Moto X Desbloqueado Preto Android 4.2.2 CÃ¢mera 10MP e Frontal 2MP MemÃ³ria Interna de 16GB GSM,10.0,134
Motorola Smartphone Motorola Moto G Dual Chip Desbloqueado TIM Android 4.3 Tela 4.5 8GB 3G Wi-Fi CÃ¢mera 5MP - Preto,10.0,130
Nokia Smartphone Nokia Lumia 520 Desbloqueado Oi Preto Windows Phone 8 CÃ¢mera 5MP 3G Wi-Fi MemÃ³ria Interna 8G GPS,10.0,128
Samsung Smartphone Galaxy Win Duos Branco Desbloqueado Dual Chip CÃ¢mera 5MP Processador Quad Core 1.2 Ghz Android 4.1 3G Wi- Fi e MemÃ³ria 8GB,10.0,122



---

***Q 4.Build a collaborative filtering model using SVD. You can use SVD from surprise or build it from scratch(Note: Incase you’re building it from scratch you
can limit your data points to 5000 samples if you face memory issues). Build a collaborative filtering model using kNNWithMeans from surprise. You
can try both user-based and item-based model.***

***Ans 4.***

In [32]:
# Load the suprise nd KNNWithMeans libraries
from surprise import KNNWithMeans
from surprise import Dataset
from surprise import accuracy
from surprise import Reader
from surprise.model_selection import train_test_split


In [33]:
df_red_sam = df_merged_fin.sample( n=50000, random_state=5 )

In [34]:
#Transform the data into SVD data format or map to SVD feature headers
reader = Reader(rating_scale=(1, 10))

svddata = Dataset.load_from_df(df_red_sam[['author', 'product', 'score']], reader)




In [35]:
#Split the svd dataset into training and test (75 : 25)
trainset, testset = train_test_split(svddata, test_size=.25, random_state=5 )

In [36]:
# Use item-based collaborative filtering
KNN_Mn_algo = KNNWithMeans(k=25, sim_options={'name': 'pearson_baseline', 'user_based': False})
KNN_Mn_algo.fit(trainset)

Estimating biases using als...
Computing the pearson_baseline similarity matrix...
Done computing similarity matrix.


<surprise.prediction_algorithms.knns.KNNWithMeans at 0x257066d2d60>

In [37]:
# execute the model with the testset
test_pred = KNN_Mn_algo.test(testset)

In [38]:
test_pred

[Prediction(uid='Deacon', iid='Sony Ericsson Xperia X10 Mini', r_ui=10.0, est=5.4, details={'actual_k': 0, 'was_impossible': False}),
 Prediction(uid='Ð\x95Ñ\x80Ð½Ð°Ð·Ð°Ñ\x80Ð¾Ð² Ð\x9cÐ¸Ñ\x85Ð°Ð¸Ð»', iid='Huawei Ascend Y511', r_ui=10.0, est=8.03376, details={'was_impossible': True, 'reason': 'User and/or item is unknown.'}),
 Prediction(uid='Ian Fonseca', iid='OnePlus X (Onyx, 16GB)', r_ui=10.0, est=8.03376, details={'was_impossible': True, 'reason': 'User and/or item is unknown.'}),
 Prediction(uid='Amazon Customer', iid='Apple iPhone 4S AT&T Cellphone, 16GB, Black', r_ui=10.0, est=7.0, details={'actual_k': 0, 'was_impossible': False}),
 Prediction(uid='vshalamay', iid='Nokia 6700 Classic', r_ui=4.0, est=8.03376, details={'was_impossible': True, 'reason': 'User and/or item is unknown.'}),
 Prediction(uid='sander1980', iid='Nokia 1100', r_ui=8.0, est=8.03376, details={'was_impossible': True, 'reason': 'User and/or item is unknown.'}),
 Prediction(uid='knoocker', iid='Lenovo P780 4Gb', 

In [39]:
# Use user-based collaborative filtering
KNN_Mn_algo = KNNWithMeans(k=20, sim_options={'name': 'pearson_baseline', 'user_based': True})
KNN_Mn_algo.fit(trainset)

Estimating biases using als...
Computing the pearson_baseline similarity matrix...
Done computing similarity matrix.


<surprise.prediction_algorithms.knns.KNNWithMeans at 0x257066dff40>

In [40]:
# execute the model with the testset
test_pred = KNN_Mn_algo.test(testset)

In [41]:
test_pred

[Prediction(uid='Deacon', iid='Sony Ericsson Xperia X10 Mini', r_ui=10.0, est=10, details={'actual_k': 0, 'was_impossible': False}),
 Prediction(uid='Ð\x95Ñ\x80Ð½Ð°Ð·Ð°Ñ\x80Ð¾Ð² Ð\x9cÐ¸Ñ\x85Ð°Ð¸Ð»', iid='Huawei Ascend Y511', r_ui=10.0, est=8.03376, details={'was_impossible': True, 'reason': 'User and/or item is unknown.'}),
 Prediction(uid='Ian Fonseca', iid='OnePlus X (Onyx, 16GB)', r_ui=10.0, est=8.03376, details={'was_impossible': True, 'reason': 'User and/or item is unknown.'}),
 Prediction(uid='Amazon Customer', iid='Apple iPhone 4S AT&T Cellphone, 16GB, Black', r_ui=10.0, est=7.010813352139163, details={'actual_k': 0, 'was_impossible': False}),
 Prediction(uid='vshalamay', iid='Nokia 6700 Classic', r_ui=4.0, est=8.03376, details={'was_impossible': True, 'reason': 'User and/or item is unknown.'}),
 Prediction(uid='sander1980', iid='Nokia 1100', r_ui=8.0, est=8.03376, details={'was_impossible': True, 'reason': 'User and/or item is unknown.'}),
 Prediction(uid='knoocker', iid='Lenov


---

***Q 5. Evaluate the collaborative model. Print RMSE value.***

***Ans 5.***


In [42]:
# Calculate RMSE against the test data
print("User-based Collobortive model : validating RMSE using test data")
accuracy.rmse(test_pred, verbose=True)

User-based Collobortive model : validating RMSE using test data
RMSE: 2.6049


2.604856498451057

---

***Q 6.Predict score (average rating) for test users.***
                                                           
***Ans 6.***
                                                           

In [43]:
 #df_red_sam.to_csv("RecommendationSystem\Phone_reduced_data.csv")

In [44]:
# Predict score for test user 'Amazon Kunde' who has not predicted Samsung Galaxy Note Edge (Black)
authortest = 'Amazon Kunde'  
producttest = 'Samsung Galaxy Note Edge (Black)'


In [45]:
predtn = KNN_Mn_algo.predict( authortest, producttest, verbose=True)

user: Amazon Kunde item: Samsung Galaxy Note Edge (Black) r_ui = None   est = 7.58   {'actual_k': 0, 'was_impossible': False}


In [46]:
authortest = 'e-bit'  
producttest = 'YU Yuphoria YU5010A (Black+Silver)'
predtn = KNN_Mn_algo.predict( authortest, producttest, verbose=True)

user: e-bit      item: YU Yuphoria YU5010A (Black+Silver) r_ui = None   est = 8.86   {'actual_k': 0, 'was_impossible': False}



---

***Q 7. Report your findings and inferences***

***Ans 7.***

- The prediction for model provides a common rating when user - item modekl ised than item-item is used
- The median rating of predicted values are close to 8 rating
- The input data has same product listed duplicate based on encoded, capacity, color  information and might have skewed the prediction data of a product

---

***Q 8. Try and recommend top 5 products for test users.***

***Ans 8***
Authors 'Amazon Kunde'  and 'e-bit' is identified for recommendation

In [47]:
# Obtain the list of unique products
uniq_prod = df_red_sam['product'].unique( )

In [48]:
# Get the list of products reviewed by the user Amazon Kunde
user_rev_prod = df_red_sam.loc[ df_red_sam[ 'author' ]=='Amazon Kunde', 'product']

In [49]:
# remove the rated movies for the recommendations
net_prod_for_recmd = np.setdiff1d( uniq_prod , user_rev_prod )

In [50]:
top_prod_recmd = []
#Iterate through all the products and predict the score
for prod in net_prod_for_recmd:
    top_prod_recmd.append( ( prod, KNN_Mn_algo.predict(uid='Amazon Kunde', iid=prod ).est ) )

#Sort the predircts for the user Amazon Kunde to display the top 5 products                                             
pd.DataFrame(top_prod_recmd , columns = [ 'Product', 'score' ] ).sort_values( 'score', ascending=False ).head( 5 )

Unnamed: 0,Product,score
4516,"Honor 6 - Smartphone libre (pantalla de 5"", 16...",10.0
8857,Nexus LG Nexus 5 UK Smartphone - White (16GB),10.0
7413,Microsoft Lumia 435 UK SIM-Free Smartphone - B...,10.0
300,"Acer Liquid E1 Duo Smartphone 11,4 cm (4,5 Zol...",10.0
7686,Microsoft Nokia 3510 Handy pleasure,10.0


In [51]:
# Predict for user e-bit
# Get the list of products reviewed by the user Amazon Kunde
user_rev_prod = df_red_sam.loc[ df_red_sam[ 'author' ]=='e-bit', 'product']

In [52]:
# remove the rated movies for the recommendations
net_prod_for_recmd = np.setdiff1d( uniq_prod , user_rev_prod )

In [53]:
top_prod_recmd = []
#Iterate through all the products and predict the score
for prod in net_prod_for_recmd:
    top_prod_recmd.append( ( prod, KNN_Mn_algo.predict(uid='e-bit', iid=prod ).est ) )

#Sort the predircts for the user Amazon Kunde to display the top 5 products                                             
pd.DataFrame(top_prod_recmd , columns = [ 'Product', 'score' ] ).sort_values( 'score', ascending=False ).head( 5 )

Unnamed: 0,Product,score
12150,"Samsung Galaxy A3 (2016) - Smartphone de 4.7"" ...",9.52957
7098,Lenovo Motorola Moto G 4G (2 Generazione) Smar...,9.196237
0,(DG300 Versione Aggiornata)5'' DOOGEE VOYAGER2...,8.862903
11954,Samsung GALAXY Trend Lite - night black - 3G 4...,8.862903
11959,Samsung GOOGLE NEXUS S UNLOCKED CELL PHONE,8.862903



*** Q 9. Try other techniques (Example: cross validation) to get better results.***

***Ans 9.***


In [54]:
from surprise.model_selection import cross_validate

In [55]:
cross_validate(KNN_Mn_algo, svddata, measures=['RMSE', 'MAE', 'MSE'], cv=3, verbose=True)

Estimating biases using als...
Computing the pearson_baseline similarity matrix...
Done computing similarity matrix.
Estimating biases using als...
Computing the pearson_baseline similarity matrix...
Done computing similarity matrix.
Estimating biases using als...
Computing the pearson_baseline similarity matrix...
Done computing similarity matrix.
Evaluating RMSE, MAE, MSE of algorithm KNNWithMeans on 3 split(s).

                  Fold 1  Fold 2  Fold 3  Mean    Std     
RMSE (testset)    2.6059  2.6505  2.6265  2.6276  0.0182  
MAE (testset)     1.9694  2.0029  1.9942  1.9888  0.0142  
MSE (testset)     6.7905  7.0252  6.8985  6.9047  0.0959  
Fit time          11.55   11.52   10.23   11.10   0.62    
Test time         0.18    0.19    0.20    0.19    0.01    


{'test_rmse': array([2.60585342, 2.65050663, 2.62650906]),
 'test_mae': array([1.96939948, 2.00292833, 1.99419252]),
 'test_mse': array([6.79047206, 7.0251854 , 6.89854984]),
 'fit_time': (11.546854734420776, 11.52302861213684, 10.226139783859253),
 'test_time': (0.18176913261413574, 0.1926116943359375, 0.20339632034301758)}

In [56]:
# Trying the same using SVD alogorithm
from surprise import SVD
svd_algo = SVD()

In [57]:
cross_validate(svd_algo, svddata, measures=['RMSE', 'MAE', 'MSE'], cv=3, verbose=True)

Evaluating RMSE, MAE, MSE of algorithm SVD on 3 split(s).

                  Fold 1  Fold 2  Fold 3  Mean    Std     
RMSE (testset)    2.5478  2.5326  2.5299  2.5368  0.0079  
MAE (testset)     1.9541  1.9553  1.9462  1.9519  0.0041  
MSE (testset)     6.4915  6.4143  6.4004  6.4354  0.0401  
Fit time          1.71    1.30    1.32    1.44    0.19    
Test time         0.07    0.07    0.15    0.10    0.04    


{'test_rmse': array([2.54784673, 2.53264333, 2.52989227]),
 'test_mae': array([1.95408467, 1.9553431 , 1.94618482]),
 'test_mse': array([6.49152296, 6.41428222, 6.40035491]),
 'fit_time': (1.711954116821289, 1.2959108352661133, 1.3212840557098389),
 'test_time': (0.07474899291992188, 0.06598663330078125, 0.14551329612731934)}

ANS: The SVD alogorithm (Fold 1) has a better RMSE compared to KNNWithMeans Algorithm 


---

***Q 10. In what business scenario you should use popularity based Recommendation Systems ***

***Ans 10.***
Recommendation Systems can be used in
- Ecommerce Sites that shows recommendation based on product purchased.
- Entertainment sites that stream media like movies, songs and videos  etc can show recommendations based on popularity especially including cold start sceanrios
- Financial sites that can recommend products based on volume popularity
- News ans social modeia sites or apps can recommend news and posts that are most viewed or liked



---

***Q 11. In what business scenario you should use CF based Recommendation Systems***

***Ans. 11***

The Colorboratie Filtering can be used in 
- Ecommerce sites to recommend based on user-item based coloborative fiktering using items purchased by users having simillar profile or purchase patterns
- Streaming sites and media sites can recommend movies, songs or videos based on user preferences and simillar users interest in generes 
- Item-item based coloborative fiktering is used to recommend a product that has simillar features against the products previously purchased by the user




---

***Q 12. What other possible methods can you think of which can further improve the recommendation for different users***

***Ans 12***

The recommendation system should
- Focus on  recommending selected products that has high sale probability simillar to pareto principle 20% of the products result in 80% of sales
- Analyze the popularity ratings holoisticly looking at the avg users rated that product and using a threhold limit to avoid small user base rating higher to increase sales dubiously
- Build related, allied  products or accessory mappings to recommend products that can be be purchased together