# Content-Based Recommender System

A recommender system needs a user profile, and this dataset does not include one.
Later in this notebook, I will build one by rating a few phones from the items dataset according to my own opinion, and then I will generate recommendations for that profile.

Both content-based and collaborative filtering are options. However, our items are cellphones and the dataset is small, so it is hard to find other users whose tastes resemble our profile. I therefore think a content-based method is the better choice.




You can find the dataset and its description [here](https://www.kaggle.com/grikomsn/amazon-cell-phones-reviews).

First, I'm going to prepare the data:

In [1]:
import warnings
warnings.filterwarnings('ignore')
import pandas as pd
import numpy as np

In [2]:
Items_df = pd.read_csv('items.csv')
reviews_df = pd.read_csv('reviews.csv')
Items_df.head()

Unnamed: 0,asin,brand,title,url,image,rating,reviewUrl,totalReviews,price,originalPrice
0,B0000SX2UC,,Dual-Band / Tri-Mode Sprint PCS Phone w/ Voice...,https://www.amazon.com/Dual-Band-Tri-Mode-Acti...,https://m.media-amazon.com/images/I/2143EBQ210...,3.0,https://www.amazon.com/product-reviews/B0000SX2UC,14,0.0,0.0
1,B0009N5L7K,Motorola,Motorola I265 phone,https://www.amazon.com/Motorola-i265-I265-phon...,https://m.media-amazon.com/images/I/419WBAVDAR...,3.0,https://www.amazon.com/product-reviews/B0009N5L7K,7,49.95,0.0
2,B000SKTZ0S,Motorola,MOTOROLA C168i AT&T CINGULAR PREPAID GOPHONE C...,https://www.amazon.com/MOTOROLA-C168i-CINGULAR...,https://m.media-amazon.com/images/I/71b+q3ydkI...,2.7,https://www.amazon.com/product-reviews/B000SKTZ0S,22,99.99,0.0
3,B001AO4OUC,Motorola,Motorola i335 Cell Phone Boost Mobile,https://www.amazon.com/Motorola-i335-Phone-Boo...,https://m.media-amazon.com/images/I/710UO8gdT+...,3.3,https://www.amazon.com/product-reviews/B001AO4OUC,21,0.0,0.0
4,B001DCJAJG,Motorola,Motorola V365 no contract cellular phone AT&T,https://www.amazon.com/Motorola-V365-contract-...,https://m.media-amazon.com/images/I/61LYNCVrrK...,3.1,https://www.amazon.com/product-reviews/B001DCJAJG,12,149.99,0.0


In [3]:
Items_df = Items_df[["asin","title" ,"brand", "rating","price"]]
print(Items_df.shape)
Items_df.head(5)

(720, 5)


Unnamed: 0,asin,title,brand,rating,price
0,B0000SX2UC,Dual-Band / Tri-Mode Sprint PCS Phone w/ Voice...,,3.0,0.0
1,B0009N5L7K,Motorola I265 phone,Motorola,3.0,49.95
2,B000SKTZ0S,MOTOROLA C168i AT&T CINGULAR PREPAID GOPHONE C...,Motorola,2.7,99.99
3,B001AO4OUC,Motorola i335 Cell Phone Boost Mobile,Motorola,3.3,0.0
4,B001DCJAJG,Motorola V365 no contract cellular phone AT&T,Motorola,3.1,149.99


In [4]:
Items_df = Items_df[Items_df["price"] != 0]
print(Items_df.shape)
Items_df.head(5)

(596, 5)


Unnamed: 0,asin,title,brand,rating,price
1,B0009N5L7K,Motorola I265 phone,Motorola,3.0,49.95
2,B000SKTZ0S,MOTOROLA C168i AT&T CINGULAR PREPAID GOPHONE C...,Motorola,2.7,99.99
4,B001DCJAJG,Motorola V365 no contract cellular phone AT&T,Motorola,3.1,149.99
9,B002WTC1NG,Motorola Barrage V860 Phone (Verizon Wireless),Motorola,3.6,139.99
10,B0033SFV5A,Verizon or PagePlus Samsung Smooth U350 Great ...,Samsung,3.3,64.99


In [5]:
Items_df = Items_df.dropna()
print(Items_df.shape)
Items_df.head(5)

(593, 5)


Unnamed: 0,asin,title,brand,rating,price
1,B0009N5L7K,Motorola I265 phone,Motorola,3.0,49.95
2,B000SKTZ0S,MOTOROLA C168i AT&T CINGULAR PREPAID GOPHONE C...,Motorola,2.7,99.99
4,B001DCJAJG,Motorola V365 no contract cellular phone AT&T,Motorola,3.1,149.99
9,B002WTC1NG,Motorola Barrage V860 Phone (Verizon Wireless),Motorola,3.6,139.99
10,B0033SFV5A,Verizon or PagePlus Samsung Smooth U350 Great ...,Samsung,3.3,64.99


In [6]:
Items_df.dtypes

asin       object
title      object
brand      object
rating    float64
price     float64
dtype: object

In [7]:
Items_df.describe()

Unnamed: 0,rating,price
count,593.0,593.0
mean,3.760034,283.762648
std,0.723896,185.576207
min,1.0,1.0
25%,3.5,148.98
50%,3.9,229.99
75%,4.2,389.28
max,5.0,999.99


In [8]:
# Cast brand to string and trim surrounding whitespace in one step
Items_df['brand'] = Items_df['brand'].astype(str).str.strip()
Items_df = Items_df.reset_index(drop=True)
Items_df.head(5)

Unnamed: 0,asin,title,brand,rating,price
0,B0009N5L7K,Motorola I265 phone,Motorola,3.0,49.95
1,B000SKTZ0S,MOTOROLA C168i AT&T CINGULAR PREPAID GOPHONE C...,Motorola,2.7,99.99
2,B001DCJAJG,Motorola V365 no contract cellular phone AT&T,Motorola,3.1,149.99
3,B002WTC1NG,Motorola Barrage V860 Phone (Verizon Wireless),Motorola,3.6,139.99
4,B0033SFV5A,Verizon or PagePlus Samsung Smooth U350 Great ...,Samsung,3.3,64.99


I want to categorize the **price** and the average **rating** columns:

In [9]:
Items_df_2 = Items_df.copy()
bins=[0, 2, 4, 5]
labels=['low', 'medium', 'high']
Items_df_2['rating'] = pd.cut(Items_df_2['rating'], bins, labels=labels)
Items_df_2.head(5)

Unnamed: 0,asin,title,brand,rating,price
0,B0009N5L7K,Motorola I265 phone,Motorola,medium,49.95
1,B000SKTZ0S,MOTOROLA C168i AT&T CINGULAR PREPAID GOPHONE C...,Motorola,medium,99.99
2,B001DCJAJG,Motorola V365 no contract cellular phone AT&T,Motorola,medium,149.99
3,B002WTC1NG,Motorola Barrage V860 Phone (Verizon Wireless),Motorola,medium,139.99
4,B0033SFV5A,Verizon or PagePlus Samsung Smooth U350 Great ...,Samsung,medium,64.99


In [10]:
bins=[0, 300, 600,1000]
labels=['economical', 'expensive', 'very expensive']
Items_df_2['price'] = pd.cut(Items_df_2['price'], bins, labels=labels)
Items_df_2.head(5)

Unnamed: 0,asin,title,brand,rating,price
0,B0009N5L7K,Motorola I265 phone,Motorola,medium,economical
1,B000SKTZ0S,MOTOROLA C168i AT&T CINGULAR PREPAID GOPHONE C...,Motorola,medium,economical
2,B001DCJAJG,Motorola V365 no contract cellular phone AT&T,Motorola,medium,economical
3,B002WTC1NG,Motorola Barrage V860 Phone (Verizon Wireless),Motorola,medium,economical
4,B0033SFV5A,Verizon or PagePlus Samsung Smooth U350 Great ...,Samsung,medium,economical
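
The bin edges matter here: by default `pd.cut` uses right-closed intervals, so with bins `[0, 2, 4, 5]` a rating of exactly 4.0 lands in `medium`, not `high`. A minimal sketch on made-up ratings (the values are illustrative and not taken from the dataset):

```python
import pandas as pd

# Illustrative ratings, not taken from the dataset
ratings = pd.Series([1.0, 2.0, 2.1, 4.0, 4.1, 5.0])

# Same bins and labels as the rating cell; intervals are (0, 2], (2, 4], (4, 5]
binned = pd.cut(ratings, bins=[0, 2, 4, 5], labels=['low', 'medium', 'high'])
result = binned.astype(str).tolist()
print(result)
```

A value of exactly 0 would fall outside the first interval and become `NaN` unless `include_lowest=True` is passed, which is one reason zero prices are better filtered out before binning.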


In [11]:
Items_df_features = Items_df_2.copy()
Items_df_features = pd.get_dummies(Items_df_features, columns=['brand'])
Items_df_features = pd.get_dummies(Items_df_features, columns=['rating'])
Items_df_features = pd.get_dummies(Items_df_features, columns=['price'])

Items_df_features.head(5)

Unnamed: 0,asin,title,brand_ASUS,brand_Apple,brand_Google,brand_HUAWEI,brand_Motorola,brand_Nokia,brand_OnePlus,brand_Samsung,brand_Sony,brand_Xiaomi,rating_low,rating_medium,rating_high,price_economical,price_expensive,price_very expensive
0,B0009N5L7K,Motorola I265 phone,0,0,0,0,1,0,0,0,0,0,0,1,0,1,0,0
1,B000SKTZ0S,MOTOROLA C168i AT&T CINGULAR PREPAID GOPHONE C...,0,0,0,0,1,0,0,0,0,0,0,1,0,1,0,0
2,B001DCJAJG,Motorola V365 no contract cellular phone AT&T,0,0,0,0,1,0,0,0,0,0,0,1,0,1,0,0
3,B002WTC1NG,Motorola Barrage V860 Phone (Verizon Wireless),0,0,0,0,1,0,0,0,0,0,0,1,0,1,0,0
4,B0033SFV5A,Verizon or PagePlus Samsung Smooth U350 Great ...,0,0,0,0,0,0,0,1,0,0,0,1,0,1,0,0
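
The three `get_dummies` calls can also be written as one. The sketch below uses a tiny made-up frame (the `asin` values and rows are hypothetical) and passes `dtype=int`, since recent pandas versions return boolean columns by default:

```python
import pandas as pd

# Tiny made-up frame with the same three categorical columns
toy = pd.DataFrame({
    'asin': ['A1', 'A2', 'A3'],
    'brand': ['Apple', 'Nokia', 'Apple'],
    'rating': ['medium', 'low', 'high'],
    'price': ['economical', 'economical', 'expensive'],
})

# One call encodes all three columns; dtype=int keeps the 0/1 display
encoded = pd.get_dummies(toy, columns=['brand', 'rating', 'price'], dtype=int)
print(encoded.columns.tolist())
```

Each original column is replaced by one indicator column per distinct value, which is the representation the rest of the notebook relies on.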


Here, I will create a user profile for someone who likes Apple iPhones, expressed through the ratings they give to five phones:

In [12]:
userInput = [
            {'title':'Apple iPhone 7, 32GB, Rose Gold - For AT&T / T-Mobile (Renewed)', 'user_rating':4.5},
            {'title':'Apple iPhone Xs Max, 256GB, Space Gray - Fully Unlocked (Renewed)', 'user_rating':5},
            {'title':'Nokia 3 - Android 9.0 Pie - 16 GB - Unlocked Smartphone (AT&T/T-Mobile/Metropcs/Cricket/Mint) - 5.0" HD Screen - Copper', 'user_rating':2},
            {'title':'Motorola Moto G (1st Generation) Unlocked Cellphone, 8GB, White', 'user_rating':3},
            {'title':'Samsung Galaxy A80 SM-A805F/DS Dual Sim (Factory Unlocked) 6.7" 128GB 8GB RAM (Ghost White)', 'user_rating':4}
         ] 
inputItems = pd.DataFrame(userInput)
inputItems


Unnamed: 0,title,user_rating
0,"Apple iPhone 7, 32GB, Rose Gold - For AT&T / T...",4.5
1,"Apple iPhone Xs Max, 256GB, Space Gray - Fully...",5.0
2,Nokia 3 - Android 9.0 Pie - 16 GB - Unlocked S...,2.0
3,Motorola Moto G (1st Generation) Unlocked Cell...,3.0
4,Samsung Galaxy A80 SM-A805F/DS Dual Sim (Facto...,4.0


Now I'm going to build a straightforward recommendation system:

In [13]:
inputId = Items_df[Items_df['title'].isin(inputItems['title'].tolist())]
inputId.head()

Unnamed: 0,asin,title,brand,rating,price
40,B00K0NS0P4,Motorola Moto G (1st Generation) Unlocked Cell...,Motorola,4.0,209.75
259,B079HB518K,"Apple iPhone 7, 32GB, Rose Gold - For AT&T / T...",Apple,3.9,199.0
377,B07KFNRQ5S,"Apple iPhone Xs Max, 256GB, Space Gray - Fully...",Apple,4.1,664.99
387,B07L78G3D2,Nokia 3 - Android 9.0 Pie - 16 GB - Unlocked S...,Nokia,2.7,79.0
518,B07V5NSD8N,Samsung Galaxy A80 SM-A805F/DS Dual Sim (Facto...,Samsung,4.7,499.99


In [14]:
inputItems = pd.merge(inputId, inputItems , on='title')
inputItems.head()

Unnamed: 0,asin,title,brand,rating,price,user_rating
0,B00K0NS0P4,Motorola Moto G (1st Generation) Unlocked Cell...,Motorola,4.0,209.75,3.0
1,B079HB518K,"Apple iPhone 7, 32GB, Rose Gold - For AT&T / T...",Apple,3.9,199.0,4.5
2,B07KFNRQ5S,"Apple iPhone Xs Max, 256GB, Space Gray - Fully...",Apple,4.1,664.99,5.0
3,B07L78G3D2,Nokia 3 - Android 9.0 Pie - 16 GB - Unlocked S...,Nokia,2.7,79.0,2.0
4,B07V5NSD8N,Samsung Galaxy A80 SM-A805F/DS Dual Sim (Facto...,Samsung,4.7,499.99,4.0
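
The merge matches on the exact title string, so a rated title with a typo would silently drop out. A hedged sketch of one way to check for that, using a hypothetical two-row catalogue and the `indicator` option of `merge`:

```python
import pandas as pd

# Hypothetical catalogue and ratings; the second title has a typo on purpose
catalogue = pd.DataFrame({'asin': ['A1', 'A2'],
                          'title': ['Phone one', 'Phone two']})
rated = pd.DataFrame({'title': ['Phone one', 'Phone tw0'],
                      'user_rating': [4.5, 2.0]})

# A left merge with indicator=True flags ratings that matched no catalogue row
check = rated.merge(catalogue, on='title', how='left', indicator=True)
unmatched = check.loc[check['_merge'] == 'left_only', 'title'].tolist()
print(unmatched)
```

In this notebook all five rated titles are found, so the check would come back empty.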


In [15]:
# Keep only asin, title and user_rating (the positional axis argument is not accepted by recent pandas)
inputItems = inputItems.drop(columns=['brand', 'rating', 'price'])
inputItems.head()

Unnamed: 0,asin,title,user_rating
0,B00K0NS0P4,Motorola Moto G (1st Generation) Unlocked Cell...,3.0
1,B079HB518K,"Apple iPhone 7, 32GB, Rose Gold - For AT&T / T...",4.5
2,B07KFNRQ5S,"Apple iPhone Xs Max, 256GB, Space Gray - Fully...",5.0
3,B07L78G3D2,Nokia 3 - Android 9.0 Pie - 16 GB - Unlocked S...,2.0
4,B07V5NSD8N,Samsung Galaxy A80 SM-A805F/DS Dual Sim (Facto...,4.0


In [16]:
userItems = Items_df_features[Items_df_features['asin'].isin(inputItems['asin'].tolist())]
userItems

Unnamed: 0,asin,title,brand_ASUS,brand_Apple,brand_Google,brand_HUAWEI,brand_Motorola,brand_Nokia,brand_OnePlus,brand_Samsung,brand_Sony,brand_Xiaomi,rating_low,rating_medium,rating_high,price_economical,price_expensive,price_very expensive
40,B00K0NS0P4,Motorola Moto G (1st Generation) Unlocked Cell...,0,0,0,0,1,0,0,0,0,0,0,1,0,1,0,0
259,B079HB518K,"Apple iPhone 7, 32GB, Rose Gold - For AT&T / T...",0,1,0,0,0,0,0,0,0,0,0,1,0,1,0,0
377,B07KFNRQ5S,"Apple iPhone Xs Max, 256GB, Space Gray - Fully...",0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,1
387,B07L78G3D2,Nokia 3 - Android 9.0 Pie - 16 GB - Unlocked S...,0,0,0,0,0,1,0,0,0,0,0,1,0,1,0,0
518,B07V5NSD8N,Samsung Galaxy A80 SM-A805F/DS Dual Sim (Facto...,0,0,0,0,0,0,0,1,0,0,0,0,1,0,1,0


In [17]:
userItems = userItems.reset_index(drop=True)
userItems

Unnamed: 0,asin,title,brand_ASUS,brand_Apple,brand_Google,brand_HUAWEI,brand_Motorola,brand_Nokia,brand_OnePlus,brand_Samsung,brand_Sony,brand_Xiaomi,rating_low,rating_medium,rating_high,price_economical,price_expensive,price_very expensive
0,B00K0NS0P4,Motorola Moto G (1st Generation) Unlocked Cell...,0,0,0,0,1,0,0,0,0,0,0,1,0,1,0,0
1,B079HB518K,"Apple iPhone 7, 32GB, Rose Gold - For AT&T / T...",0,1,0,0,0,0,0,0,0,0,0,1,0,1,0,0
2,B07KFNRQ5S,"Apple iPhone Xs Max, 256GB, Space Gray - Fully...",0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,1
3,B07L78G3D2,Nokia 3 - Android 9.0 Pie - 16 GB - Unlocked S...,0,0,0,0,0,1,0,0,0,0,0,1,0,1,0,0
4,B07V5NSD8N,Samsung Galaxy A80 SM-A805F/DS Dual Sim (Facto...,0,0,0,0,0,0,0,1,0,0,0,0,1,0,1,0


In [18]:
# Keep only the one-hot feature columns
userfeatureTable = userItems.drop(columns=['asin', 'title'])
userfeatureTable

Unnamed: 0,brand_ASUS,brand_Apple,brand_Google,brand_HUAWEI,brand_Motorola,brand_Nokia,brand_OnePlus,brand_Samsung,brand_Sony,brand_Xiaomi,rating_low,rating_medium,rating_high,price_economical,price_expensive,price_very expensive
0,0,0,0,0,1,0,0,0,0,0,0,1,0,1,0,0
1,0,1,0,0,0,0,0,0,0,0,0,1,0,1,0,0
2,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,1
3,0,0,0,0,0,1,0,0,0,0,0,1,0,1,0,0
4,0,0,0,0,0,0,0,1,0,0,0,0,1,0,1,0


In [19]:
inputItems['user_rating']

0    3.0
1    4.5
2    5.0
3    2.0
4    4.0
Name: user_rating, dtype: float64

In [20]:
#Dot product of the one-hot features with the user's ratings to get the feature weights
userProfile = userfeatureTable.transpose().dot(inputItems['user_rating'])
#The user profile
userProfile

brand_ASUS              0.0
brand_Apple             9.5
brand_Google            0.0
brand_HUAWEI            0.0
brand_Motorola          3.0
brand_Nokia             2.0
brand_OnePlus           0.0
brand_Samsung           4.0
brand_Sony              0.0
brand_Xiaomi            0.0
rating_low              0.0
rating_medium           9.5
rating_high             9.0
price_economical        9.5
price_expensive         4.0
price_very expensive    5.0
dtype: float64
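
Each weight in the profile is simply the sum of the ratings given to the phones that have that feature. The sketch below reproduces the brand weights from the five rated phones, restricted to the four brands that appear among them:

```python
import pandas as pd

# One-hot brand columns for the five rated phones, in the order of inputItems
features = pd.DataFrame({
    'brand_Apple':    [0, 1, 1, 0, 0],
    'brand_Motorola': [1, 0, 0, 0, 0],
    'brand_Nokia':    [0, 0, 0, 1, 0],
    'brand_Samsung':  [0, 0, 0, 0, 1],
})
user_rating = pd.Series([3.0, 4.5, 5.0, 2.0, 4.0])

# Each feature weight is the sum of the ratings of the items that have it
profile = features.T.dot(user_rating)
print(profile)
```

The Apple weight of 9.5 comes from the two iPhone ratings, 4.5 + 5.0. Note that this only works because the feature rows and the ratings are in the same order.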

In [21]:
#Now let's get the features of every item in our original dataframe
featureTable = Items_df_features.set_index(Items_df_features['asin'])
featureTable.head(2)

asin,asin,title,brand_ASUS,brand_Apple,brand_Google,brand_HUAWEI,brand_Motorola,brand_Nokia,brand_OnePlus,brand_Samsung,brand_Sony,brand_Xiaomi,rating_low,rating_medium,rating_high,price_economical,price_expensive,price_very expensive
B0009N5L7K,B0009N5L7K,Motorola I265 phone,0,0,0,0,1,0,0,0,0,0,0,1,0,1,0,0
B000SKTZ0S,B000SKTZ0S,MOTOROLA C168i AT&T CINGULAR PREPAID GOPHONE C...,0,0,0,0,1,0,0,0,0,0,0,1,0,1,0,0


In [22]:
#And drop the unnecessary information
featureTable = featureTable.drop(columns=['asin', 'title'])
featureTable.head(3)

asin,brand_ASUS,brand_Apple,brand_Google,brand_HUAWEI,brand_Motorola,brand_Nokia,brand_OnePlus,brand_Samsung,brand_Sony,brand_Xiaomi,rating_low,rating_medium,rating_high,price_economical,price_expensive,price_very expensive
B0009N5L7K,0,0,0,0,1,0,0,0,0,0,0,1,0,1,0,0
B000SKTZ0S,0,0,0,0,1,0,0,0,0,0,0,1,0,1,0,0
B001DCJAJG,0,0,0,0,1,0,0,0,0,0,0,1,0,1,0,0


In [23]:
featureTable.shape

(593, 16)

In [24]:
#Multiply the features by the weights and then take the weighted average
recommendationTable_df = ((featureTable*userProfile).sum(axis=1))/(userProfile.sum())
recommendationTable_df.head()

asin
B0009N5L7K    0.396396
B000SKTZ0S    0.396396
B001DCJAJG    0.396396
B002WTC1NG    0.396396
B0033SFV5A    0.414414
dtype: float64
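
To see where a score such as 0.396396 comes from, the calculation can be done by hand for a single phone. The sketch below copies the non-zero profile weights from the output of cell In [20] and scores a medium-rated, economical Motorola phone:

```python
import pandas as pd

# Non-zero profile weights copied from the output of cell In [20]
profile = pd.Series({
    'brand_Apple': 9.5, 'brand_Motorola': 3.0, 'brand_Nokia': 2.0,
    'brand_Samsung': 4.0, 'rating_medium': 9.5, 'rating_high': 9.0,
    'price_economical': 9.5, 'price_expensive': 4.0,
    'price_very expensive': 5.0,
})

# A medium-rated, economical Motorola phone has these three features set
active = ['brand_Motorola', 'rating_medium', 'price_economical']
score = profile[active].sum() / profile.sum()
print(round(score, 6))
```

The numerator is 3.0 + 9.5 + 9.5 = 22 and the denominator is the sum of all weights, 55.5. Swapping in `brand_Apple` gives 28.5 / 55.5, which matches the top score of 0.513514.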

In [25]:
#Sort our recommendations in descending order
recommendationTable_df = recommendationTable_df.sort_values(ascending=False)
#Just a peek at the values
recommendationTable_df.head()

asin
B077NK4TZ7    0.513514
B079HB518K    0.513514
B07SVPKTYK    0.513514
B077NJQPGB    0.513514
B06XR8G1TX    0.513514
dtype: float64

The best recommendations for the user profile we created are:

In [26]:
#The final recommendation table
Items_df.loc[Items_df['asin'].isin(recommendationTable_df.head(20).keys())]

Unnamed: 0,asin,title,brand,rating,price
53,B00V8STWY8,"Apple iPad Air MF529LL/A (32GB, Wi-Fi + at&T, ...",Apple,2.7,219.95
69,B01AUOS8BI,"Apple iPhone 6S Plus, 128GB, Rose Gold - For A...",Apple,2.5,239.98
71,B01CR1FQMG,"Apple iPhone 6S, 64GB, Rose Gold - For AT&T / ...",Apple,3.8,160.08
116,B01N4IHGHI,"Apple iPhone 6S, 16GB, Gold - For AT&T (Renewed)",Apple,3.7,150.0
117,B01N4R20RS,"Apple iPhone 7, 32GB, Black - Fully Unlocked (...",Apple,3.6,209.94
122,B01N9YO1DS,"Apple iPhone 7, 128GB, Gold - For AT&T / T-Mob...",Apple,3.5,215.0
130,B06X9X15Y8,"Apple iPhone 7 Plus, 128GB, Silver - Fully Unl...",Apple,3.9,299.99
131,B06XGLHP8V,"Apple iPhone 7 Plus, GSM Unlocked, 128GB - Ros...",Apple,3.5,288.0
134,B06XR1K6HR,"Apple iPhone 6S Plus, 64GB, Rose Gold - For AT...",Apple,3.8,249.99
135,B06XR8G1TX,"Apple iPhone 6S, 64GB, Space Gray - Fully Unlo...",Apple,3.5,156.88
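
Filtering with `isin` returns rows in the catalogue's own order, so the ranking by score is not visible in the table. One way to keep the ranking, sketched here on hypothetical scores and a hypothetical three-row catalogue, is to merge the sorted scores with the item table:

```python
import pandas as pd

# Hypothetical scores and catalogue, standing in for the real tables
scores = pd.Series({'A3': 0.51, 'A1': 0.40, 'A2': 0.23})
items = pd.DataFrame({
    'asin': ['A1', 'A2', 'A3'],
    'title': ['Phone one', 'Phone two', 'Phone three'],
})

# Rank first, then look titles up by asin so the score order survives
top = scores.sort_values(ascending=False).head(2).rename('score').reset_index()
top.columns = ['asin', 'score']
ranked = top.merge(items, on='asin', how='left')
print(ranked)
```

Many phones share exactly the same score in this notebook, so the order among tied items is arbitrary either way.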


The worst recommendations for the same user profile are:

In [27]:
Items_df.loc[Items_df['asin'].isin(recommendationTable_df.tail(20).keys())]

Unnamed: 0,asin,title,brand,rating,price
16,B0096DERAG,Motorola MC75A Hand Held Computer Windows Mobi...,Motorola,1.0,499.95
297,B07D9TTLZG,"OnePlus Factory Unlocked Phone - 6.28"" Screen ...",OnePlus,1.0,426.17
298,B07DCB61LG,OnePlus 6 A6000 64GB/6GB Mirror Black - Dual B...,OnePlus,4.4,479.0
310,B07FMD7MRX,"Motorola Moto G6 Play 16GB - 5.7"" 4G LTE Unloc...",Motorola,2.0,154.97
342,B07HC74RMG,Samsung Galaxy S9 Enterprise Edition 64 GB Unl...,Samsung,1.0,826.69
386,B07L6RCH5W,"Google Pixel 3, Verizon, 64 GB - Clearly White...",Google,1.4,310.99
403,B07ND4ZN2X,Google Pixel 3 128GB Unlocked - White (Renewed),Google,4.6,429.0
406,B07NLMCDN4,Samsung Galaxy Note9 Smartphone 6.4in AT&T And...,Samsung,1.0,469.0
471,B07R4PP7FF,Xiaomi Mi 9 64GB + 6GB RAM - 48MP Ultra High R...,Xiaomi,4.5,334.9
483,B07RSSVMH8,Motorola G7 Play 32GB GSM Nano-SIM Phone w/ 13...,Motorola,1.0,121.08
