# **E-Commerce Recommendation System (Retailrocket)**

*M. Ruben (June 2019)*
***
<div style="text-align: justify">This notebook builds a recommendation system for an e-commerce dataset collected from a real-world e-commerce website. Using this dataset, three recommendation models were developed with the Collaborative Filtering technique.</div>
***
**Acknowledgement**: The original dataset can be found in [this link](https://www.kaggle.com/retailrocket/ecommerce-dataset). In addition, the collaborative filtering technique used in this notebook is based on the following [blog post](https://www.analyticsvidhya.com/blog/2018/06/comprehensive-guide-recommendation-engine-python/).
***
# Contents<a id = 'toc'></a>
- **[1. Modules](#1)** 
- **[2. Data Exploration](#2)** 
- **[3. Collaborative Filtering based on Items Viewed](#3)** 
    * [**3.1** *Data Wrangling*](#3.1)  
    * [**3.2** *Calculating similarity and event score*](#3.2)
    * [**3.3** *Example of Top 5 recommended items based on result (user-user)*](#3.3)
    * [**3.4** *Example of Top 5 recommended items based on result (item-item)*](#3.4)<br><br>  
- **[4. Collaborative Filtering based on Items Bought](#4)**
    * [**4.1** *Data Wrangling*](#4.1)  
    * [**4.2** *Calculating similarity and event score*](#4.2)
    * [**4.3** *Example of Top 5 recommended items based on result (user-user)*](#4.3)
    * [**4.4** *Example of Top 5 recommended items based on result (item-item)*](#4.4)<br><br>  
- **[5. Collaborative Filtering based on All Events](#5)**
    * [**5.1** *Data Wrangling*](#5.1)  
    * [**5.2** *Calculating similarity and event score*](#5.2)
    * [**5.3** *Example of Top 5 recommended items based on result (user-user)*](#5.3)
    * [**5.4** *Example of Top 5 recommended items based on result (item-item)*](#5.4)<br><br>  
- **[6. Concluding Remarks](#6)**

## <a id = '1'></a> 1. Modules

The modules are loaded in this section. The modules used in this notebook are Pandas, NumPy and Scikit-Learn.

In [10]:
import pandas as pd
import numpy as np
from sklearn.metrics.pairwise import pairwise_distances 

## <a id = '2'></a> 2. Data Exploration

<div style="text-align: justify">This section briefly explores the Retailrocket dataset. The dataset contains four CSV files: 'category_tree.csv', 'events.csv', 'item_properties_part1.csv' and 'item_properties_part2.csv'. The first file, 'category_tree.csv', contains only information pertaining to the category of each item/product. The second file, 'events.csv', contains a record of each event that a visitor triggers when interacting with the site. Finally, 'item_properties_part1.csv' and 'item_properties_part2.csv' contain information about the properties of the items/products. The files 'category_tree.csv', 'item_properties_part1.csv' and 'item_properties_part2.csv' would be useful for creating a Content-Based Filtering recommendation system, whereas 'events.csv' can be used to develop a Collaborative Filtering recommendation system. The data in 'item_properties_part1.csv' and 'item_properties_part2.csv', however, were hashed for confidentiality reasons, which makes it hard to judge the context of the item properties. Therefore, this notebook focuses only on creating a recommendation system using the Collaborative Filtering technique, based on the data in 'events.csv'.<br><br/>

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;The code cell below shows the data exploration conducted before building the recommendation system. The data was first imported from the 'events.csv' file using pandas.read_csv(). The first five rows of the imported dataframe were then shown using pandas.DataFrame.head. There are five columns in the dataframe: 'timestamp', 'visitorid', 'event', 'itemid' and 'transactionid'. The 'timestamp' column holds the time at which the event occurred, as a Unix timestamp. For the purpose of this notebook, the 'timestamp' information is not used. The 'visitorid' column holds the ID assigned to a particular visitor, that is, a user of the website. The 'event' column shows the type of interaction, herein referred to as an event, that a user performed on the website. There are three possible types of event: 'view', 'addtocart' and 'transaction'. A 'view' event is recorded when a visitor views an item, and an 'addtocart' event is recorded if the visitor then adds the item to the cart. If the visitor proceeds to complete the purchase of the item, a 'transaction' event is recorded. In that case, an ID is assigned to that specific event and stored in the 'transactionid' column. A transaction ID is only assigned when a 'transaction' event occurs, which accounts for the NaN values in the 'transactionid' column for non-transaction events.<br><br/>

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;A total of 2,664,312 'view' events and 69,332 'addtocart' events were recorded. However, only 22,457 transactions were recorded, meaning that only about one third of the 'addtocart' events were converted into a 'transaction' event. In total, there are 1,407,580 unique visitors and 235,061 unique items in this dataframe. Based on these event types, there are three ways to develop a model. The first uses only the 'view' events, the second uses only the 'transaction' events, and the third uses all events, assigning a score according to the type of event. The first method is shown in Section 3, the second in Section 4 and the third in Section 5.
</div> 
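The ratios quoted above can be checked with a few lines of arithmetic. This is only a sketch using the event counts reported in this section; the variable names are introduced here for illustration, and the ratios compare raw event counts rather than tracking individual carts through to purchase.

```python
# Event counts as reported in the data exploration output of this notebook
n_view = 2_664_312
n_addtocart = 69_332
n_transaction = 22_457

# Ratio of each stage's event count to the previous stage's event count
view_to_cart = n_addtocart / n_view
cart_to_transaction = n_transaction / n_addtocart

print(f"addtocart events per view event:        {view_to_cart:.4f}")
print(f"transaction events per addtocart event: {cart_to_transaction:.4f}")
```

The second ratio comes out close to one third, which is the figure used in the text.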

 [Back to table of contents.](#toc)

In [11]:
# import event data
events = pd.read_csv('CSV/events.csv')

# Show the first five events
print('First five events :')
display(events.head())

# Count the number of view, addtocart and transaction events
print('\nNumber of unique views, addtocarts and transactions :')
print(events.event.value_counts())

# Calculate number of unique visitors
print('\nNumber of unique visitors :')
print(events.visitorid.nunique())

# Calculate number of unique items
print('\nNumber of unique items :')
print(events.itemid.nunique())

First five events :


|   | timestamp | visitorid | event | itemid | transactionid |
|---|---|---|---|---|---|
| 0 | 1433221332117 | 257597 | view | 355908 | NaN |
| 1 | 1433224214164 | 992329 | view | 248676 | NaN |
| 2 | 1433221999827 | 111016 | view | 318965 | NaN |
| 3 | 1433221955914 | 483717 | view | 253185 | NaN |
| 4 | 1433221337106 | 951259 | view | 367447 | NaN |



Number of unique views, addtocarts and transactions :
view           2664312
addtocart        69332
transaction      22457
Name: event, dtype: int64

Number of unique visitors :
1407580

Number of unique items :
235061


 [Back to table of contents.](#toc)

## <a id = '3'></a> 3. Collaborative Filtering based on Items Viewed

<a id = '3.1'></a> 
***
**3.1** *Data Wrangling* 
***
<div style="text-align: justify"> A new dataframe was created by extracting only the rows containing a 'view' event. The 'timestamp', 'event' and 'transactionid' columns were dropped from the dataframe because that information was not needed at that point. Due to technical constraints (memory error), the dataframe had to be reduced in size, so only the first 1000 unique visitors were considered for this model. The numbers of unique visitors and unique items in the reduced dataframe were then calculated and used to create a zero-filled matrix that serves as the visitor-item matrix. A for loop was then used to fill in the visitor-item matrix with the information of which items were viewed by each visitor.<br><br/>
</div>
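A rough size estimate suggests why the full data likely triggers a memory error. The sketch below assumes a dense `float64` matrix (the default for `np.zeros`) covering every unique visitor and item counted in Section 2; the variable names are illustrative.

```python
import numpy as np

# Unique visitor and item counts reported in Section 2
n_visitors = 1_407_580
n_items = 235_061

# np.zeros defaults to float64, which takes 8 bytes per cell
bytes_per_cell = np.dtype(np.float64).itemsize
dense_bytes = n_visitors * n_items * bytes_per_cell

print(f"Full dense visitor-item matrix: {dense_bytes / 1e12:.2f} TB")
```

At this scale the dense matrix would need terabytes of memory, so restricting the model to a subset of visitors is one simple way to keep it tractable.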

In [12]:
# Extract view events then drop timestamp, event and transactionid columns
view_events = events[events['event'].str.contains('view')].drop(
    ['timestamp', 'event', 'transactionid'], axis=1)

# Extract view events of the first 1000 unique visitors
view_visitor_id = view_events.visitorid.unique()
fot_view_vis_id = view_visitor_id[:1000]

# Extract view events of first one thousand visitors (fotv) from view_events dataframe
view_events_fotv = view_events[view_events['visitorid'].isin(fot_view_vis_id)]

# Count number of unique visitors and unique items
view_items_list = view_events_fotv.itemid.unique()
n_view_items = len(view_items_list)
view_vis_list = view_events_fotv.visitorid.unique()
n_view_vis = len(view_vis_list)

# Create visitor_item matrix
view_vi_matrix = np.zeros((n_view_vis, n_view_items))
for row in view_events_fotv.itertuples(index = False):
    
    # Get the order of visitor among the unique visitors
    id_vis = row[0]
    m = np.where(view_vis_list == id_vis)
    
    # Get the order of item among the unique items
    id_ite = row[1]
    n = np.where(view_items_list == id_ite)
    
    # Assign events score(no view = 0, view = 1)
    view_vi_matrix[m[0][0], n[0][0]] = 1
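
The row-by-row loop above calls `np.where` twice per event, which becomes slow as the data grows. As a possible alternative, the sketch below builds the same kind of 0/1 matrix with `pd.factorize`, which returns integer codes in order of first appearance (the same order as `unique()`). The function name `build_view_matrix` and the toy data are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

def build_view_matrix(df):
    """Build a 0/1 visitor-item matrix from a frame with visitorid and itemid columns."""
    # Integer position of each visitor/item, numbered in order of first appearance
    vis_codes, vis_ids = pd.factorize(df['visitorid'])
    item_codes, item_ids = pd.factorize(df['itemid'])

    matrix = np.zeros((len(vis_ids), len(item_ids)))
    # Mark every (visitor, item) pair that occurs at least once
    matrix[vis_codes, item_codes] = 1
    return matrix, np.asarray(vis_ids), np.asarray(item_ids)

# Toy data (hypothetical): visitor 10 viewed items 5 and 7, and so on
toy = pd.DataFrame({'visitorid': [10, 20, 10, 30],
                    'itemid':    [5, 5, 7, 9]})
toy_matrix, toy_visitors, toy_items = build_view_matrix(toy)
print(toy_matrix)
```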

 [Back to table of contents.](#toc)

<a id = '3.2'></a> 
***
**3.2** *Calculating similarity and event score*
***
<div style="text-align: justify">Two types of Collaborative Filtering can be employed here: Item-Item Collaborative Filtering and User-User Collaborative Filtering. Both types are demonstrated in this notebook. Using the visitor-item matrix created in the previous section, the item-item and user-user similarity matrices were calculated with scikit-learn's pairwise_distances function and the cosine metric. Note that this function returns the cosine distance, that is, 1 minus the cosine similarity. Afterwards, a function was defined to calculate the score for each item. The scores for both types of Collaborative Filtering were then predicted.<br><br/>
</div>
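To make the cosine measure concrete, here is a minimal NumPy-only sketch of cosine similarity between the rows of a small matrix. The function name and toy matrix are illustrative assumptions; scikit-learn's `pairwise_distances(..., metric='cosine')` returns 1 minus these values.

```python
import numpy as np

def cosine_similarity_matrix(matrix):
    """Cosine similarity between every pair of rows of `matrix`."""
    norms = np.linalg.norm(matrix, axis=1, keepdims=True)
    # Leave all-zero rows unchanged instead of dividing by zero
    norms[norms == 0] = 1.0
    unit_rows = matrix / norms
    return unit_rows @ unit_rows.T

# Toy visitor-item matrix (hypothetical): 3 visitors, 3 items
toy = np.array([[1.0, 1.0, 0.0],
                [1.0, 0.0, 0.0],
                [0.0, 0.0, 1.0]])

user_sim = cosine_similarity_matrix(toy)      # user-user similarity
item_sim = cosine_similarity_matrix(toy.T)    # item-item similarity
user_dist = 1.0 - user_sim                    # the corresponding cosine distance
print(np.round(user_sim, 3))
```

Visitors 0 and 1 share one viewed item and get a similarity of about 0.707, while visitor 2 shares nothing with the others and gets 0.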

In [13]:
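# Calculate pairwise cosine measures for user-user and item-item.
# Note: pairwise_distances returns the cosine distance (1 - cosine similarity),
# so the arrays below hold distances although they are named and used as similarities.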
view_vis_similarity = pairwise_distances(view_vi_matrix, metric = 'cosine')
view_item_similarity = pairwise_distances(view_vi_matrix.T, metric = 'cosine')

# Defining function to calculate predicted recommendation ranking based on similarity
def score(matrix, similarity, type = 'visitor'):
    if type == 'visitor':
        # Calculate mean user view for each item
        mean_user_event = matrix.mean(axis = 0)
        
        # Calculate difference between event and mean event
        event_diff = (matrix - mean_user_event[np.newaxis, :])
        
        # Predict score for ranking based on user-user
        scr = mean_user_event[np.newaxis, :] + similarity.dot(event_diff) / np.array([np.abs(similarity.sum(axis=1))]).T
        
    elif type == 'item':
        # Predict score for ranking based on item-item
        scr = matrix.dot(similarity) / np.array([np.abs(similarity.sum(axis=1))])
        
    return scr

# visitor-item predicted view score based on user-user
vis_view_ranking = score(view_vi_matrix, view_vis_similarity, type = 'visitor')

# visitor-item predicted view score based on item-item
item_view_ranking = score(view_vi_matrix, view_item_similarity, type = 'item')
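
Written out, the `score` function above appears to compute the following, where $r_{u,i}$ is the entry of the visitor-item matrix for visitor $u$ and item $i$, $\bar{r}_i$ is the column mean returned by `matrix.mean(axis = 0)`, and $s$ is the array passed as `similarity`. The symbols are notation introduced here for illustration and do not come from the notebook.

```latex
\hat{r}^{\mathrm{user}}_{u,i}
  = \bar{r}_i
  + \frac{\sum_{v} s_{u,v}\,\bigl(r_{v,i} - \bar{r}_i\bigr)}
         {\bigl|\sum_{v} s_{u,v}\bigr|},
\qquad
\hat{r}^{\mathrm{item}}_{u,i}
  = \frac{\sum_{j} r_{u,j}\, s_{j,i}}
         {\bigl|\sum_{j} s_{i,j}\bigr|}
```

In both cases the prediction is a weighted average of observed entries, with the weights taken from the corresponding row of `similarity`.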

 [Back to table of contents.](#toc)

<a id = '3.3'></a> 
***
**3.3** *Example of Top 5 recommended items based on result (user-user)*
***
<div style="text-align: justify">This section shows an example of the items recommended by User-User Collaborative Filtering based on the 'view' events.<br><br/>
</div>

In [14]:
# First visitor for example
first_user = view_vis_list[0]

# Top 5 recommended item for the 1st visitor (user-user)
index_sort_view_vis = np.flip(np.argsort(vis_view_ranking[0, :]))
itemid_score_sort_view_vis = np.zeros((len(index_sort_view_vis)), dtype=int)

# Create array for recommendation score
for m in range(len(index_sort_view_vis)):
    
    # Returns the index of the item in the original unique item list
    index = index_sort_view_vis[m]
    
    # Returns the itemid according to the index
    itemid_score_sort_view_vis[m] = view_items_list[index]
    
# Print out top 5 recommended item for the first user
print('Top 5 recommended item for the 1st user with visitorid =', first_user, 'are itemid =', 
     itemid_score_sort_view_vis[0], ',', itemid_score_sort_view_vis[1], ',',
     itemid_score_sort_view_vis[2],',', itemid_score_sort_view_vis[3], 
     ',',itemid_score_sort_view_vis[4], '. (user-user collaborative filtering)')

Top 5 recommended item for the 1st user with visitorid = 257597 are itemid = 37029 , 315543 , 369447 , 12217 , 355994 . (user-user collaborative filtering)
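
The sort-then-lookup loop above can also be written as a small helper. The sketch below is one possible version; the name `top_n_items` and the optional `seen` mask (to skip items the visitor has already interacted with) are assumptions added for illustration and are not part of the notebook's method.

```python
import numpy as np

def top_n_items(scores, item_ids, n=5, seen=None):
    """Return the ids of the n highest-scoring items, best first."""
    scores = np.asarray(scores, dtype=float).copy()
    if seen is not None:
        # Push already-seen items to the bottom of the ranking
        scores[np.asarray(seen, dtype=bool)] = -np.inf
    order = np.argsort(scores)[::-1][:n]
    return np.asarray(item_ids)[order]

# Toy scores for four items (hypothetical values)
toy_scores = np.array([0.1, 0.9, 0.5, 0.7])
toy_item_ids = np.array([11, 22, 33, 44])

top_two = top_n_items(toy_scores, toy_item_ids, n=2)
top_two_unseen = top_n_items(toy_scores, toy_item_ids, n=2,
                             seen=[False, True, False, False])
print(top_two, top_two_unseen)
```

With the notebook's variables, a call such as `top_n_items(vis_view_ranking[0, :], view_items_list)` would return the same five item ids that the loop prints, provided the scores have no ties.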


 [Back to table of contents.](#toc)

<a id = '3.4'></a> 
***
**3.4** *Example of Top 5 recommended items based on result (item-item)*
***
<div style="text-align: justify">This section shows an example of the items recommended by Item-Item Collaborative Filtering based on the 'view' events.<br><br/>
</div>

In [15]:
# Top 5 recommended item for the 1st visitor (item_item)
index_sort_view_item = np.flip(np.argsort(item_view_ranking[0, :]))
itemid_score_sort_view_item = np.zeros((len(index_sort_view_item)), dtype=int)

# Create array for recommendation score
for m in range(len(index_sort_view_item)):
    
    # Returns the index of the item in the original unique item list
    index = index_sort_view_item[m]
    
    # Returns the itemid according to the index
    itemid_score_sort_view_item[m] = view_items_list[index]
    
# Print out top 5 recommended item for the first user
print('Top 5 recommended item for the 1st user with visitorid =', first_user, 'are itemid =', 
     itemid_score_sort_view_item[0], ',', itemid_score_sort_view_item[1], ',',
     itemid_score_sort_view_item[2],',', itemid_score_sort_view_item[3], 
     ',',itemid_score_sort_view_item[4], '. (item-item collaborative filtering)')

Top 5 recommended item for the 1st user with visitorid = 257597 are itemid = 186702 , 143293 , 60980 , 185115 , 119736 . (item-item collaborative filtering)


 [Back to table of contents.](#toc)

## <a id = '4'></a> 4. Collaborative Filtering based on Items Bought

<a id = '4.1'></a> 
***
**4.1** *Data Wrangling* 
***
<div style="text-align: justify"> A new dataframe was created by extracting only the rows containing a 'transaction' event. The 'timestamp', 'event' and 'transactionid' columns were dropped from the dataframe because that information was not needed at that point. Due to technical constraints (memory error), the dataframe had to be reduced in size, so only the first 1000 unique visitors were considered for this model. The numbers of unique visitors and unique items in the reduced dataframe were then calculated and used to create a zero-filled matrix that serves as the visitor-item matrix. A for loop was then used to fill in the visitor-item matrix with the information of which items were bought by each visitor.<br><br/>
</div>

In [16]:
# Extract buy events then drop timestamp, event and transactionid columns
buy_events = events[events['event'].str.contains('transaction')].drop(
    ['timestamp', 'event', 'transactionid'], axis=1)

# Extract buy events of the first 1000 unique visitors
buy_visitor_id = buy_events.visitorid.unique()
fot_buy_vis_id = buy_visitor_id[:1000]

# Extract buy events of first one thousand visitors (fotv) from buy_events dataframe
buy_events_fotv = buy_events[buy_events['visitorid'].isin(fot_buy_vis_id)]

# Count number of unique visitors and unique items
buy_items_list = buy_events_fotv.itemid.unique()
n_buy_items = len(buy_items_list)
buy_vis_list = buy_events_fotv.visitorid.unique()
n_buy_vis = len(buy_vis_list)

# Create visitor_item matrix
buy_vi_matrix = np.zeros((n_buy_vis, n_buy_items))
for row in buy_events_fotv.itertuples(index = False):
    
    # Get the order of visitor among the unique visitors
    id_vis = row[0]
    m = np.where(buy_vis_list == id_vis)
    
    # Get the order of item among the unique items
    id_ite = row[1]
    n = np.where(buy_items_list == id_ite)
    
    # Assign events score(no transaction = 0, transaction = 1)
    buy_vi_matrix[m[0][0], n[0][0]] = 1
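
The extraction step at the top of this cell matches event names with `str.contains`, which is a substring test. A stricter variant, sketched below under the assumption that an exact match is what is wanted, compares the column for equality. The helper name `events_of_type` and the toy frame are hypothetical.

```python
import pandas as pd

def events_of_type(events, kind):
    """Return the visitorid/itemid pairs of rows whose event equals `kind` exactly."""
    mask = events['event'] == kind
    return events.loc[mask, ['visitorid', 'itemid']]

# Toy events frame (hypothetical) with the same columns as events.csv
toy_events = pd.DataFrame({
    'timestamp': [1, 2, 3],
    'visitorid': [10, 10, 20],
    'event': ['view', 'transaction', 'addtocart'],
    'itemid': [5, 5, 7],
    'transactionid': [None, 99.0, None],
})

toy_buys = events_of_type(toy_events, 'transaction')
print(toy_buys)
```

For the three event names in this dataset both approaches select the same rows, since none of the names contains another.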

 [Back to table of contents.](#toc)

<a id = '4.2'></a> 
***
**4.2** *Calculating similarity and event score*
***
<div style="text-align: justify">Two types of Collaborative Filtering can be employed here: Item-Item Collaborative Filtering and User-User Collaborative Filtering. Both types are demonstrated in this notebook. Using the visitor-item matrix created in the previous section, the item-item and user-user similarity matrices were calculated with the cosine metric. The function defined in Section 3.2 was used again to calculate the score for each item. The scores for both types of Collaborative Filtering were then predicted.<br><br/>
</div>

In [17]:
# Calculate cosine similarity for user-user and item-item
buy_vis_similarity = pairwise_distances(buy_vi_matrix, metric = 'cosine')
buy_item_similarity = pairwise_distances(buy_vi_matrix.T, metric = 'cosine')

# visitor-item predicted buy score based on user-user
vis_buy_ranking = score(buy_vi_matrix, buy_vis_similarity, type = 'visitor')

# visitor-item predicted buy score based on item-item
item_buy_ranking = score(buy_vi_matrix, buy_item_similarity, type = 'item')

 [Back to table of contents.](#toc)

<a id = '4.3'></a> 
***
**4.3** *Example of Top 5 recommended items based on result (user-user)*
***
<div style="text-align: justify">This section shows an example of the items recommended by User-User Collaborative Filtering based on the 'transaction' events.<br><br/>
</div>

In [18]:
# First visitor for example
first_user = buy_vis_list[0]

# Top 5 recommended item for the 1st visitor (user-user)
index_sort_buy_vis = np.flip(np.argsort(vis_buy_ranking[0, :]))
itemid_score_sort_buy_vis = np.zeros((len(index_sort_buy_vis)), dtype=int)

# Create array for recommendation score
for m in range(len(index_sort_buy_vis)):
    
    # Returns the index of the item in the original unique item list
    index = index_sort_buy_vis[m]
    
    # Returns the itemid according to the index
    itemid_score_sort_buy_vis[m] = buy_items_list[index]
    
# Print out top 5 recommended item for the first user
print('Top 5 recommended item for the 1st user with visitorid =', first_user, 'are itemid =', 
     itemid_score_sort_buy_vis[0], ',', itemid_score_sort_buy_vis[1], ',',
     itemid_score_sort_buy_vis[2],',', itemid_score_sort_buy_vis[3], 
     ',',itemid_score_sort_buy_vis[4], '. (user-user collaborative filtering)')

Top 5 recommended item for the 1st user with visitorid = 599528 are itemid = 119736 , 420960 , 460553 , 309778 , 7943 . (user-user collaborative filtering)


 [Back to table of contents.](#toc)

<a id = '4.4'></a> 
***
**4.4** *Example of Top 5 recommended items based on result (item-item)*
***
<div style="text-align: justify">This section shows an example of the items recommended by Item-Item Collaborative Filtering based on the 'transaction' events.<br><br/>
</div>

In [19]:
# Top 5 recommended item for the 1st visitor (item_item)
index_sort_buy_item = np.flip(np.argsort(item_buy_ranking[0, :]))
itemid_score_sort_buy_item = np.zeros((len(index_sort_buy_item)), dtype=int)

# Create array for recommendation score
for m in range(len(index_sort_buy_item)):
    
    # Returns the index of the item in the original unique item list
    index = index_sort_buy_item[m]
    
    # Returns the itemid according to the index
    itemid_score_sort_buy_item[m] = buy_items_list[index]
    
# Print out top 5 recommended item for the first user
print('Top 5 recommended item for the 1st user with visitorid =', first_user, 'are itemid =', 
     itemid_score_sort_buy_item[0], ',', itemid_score_sort_buy_item[1], ',',
     itemid_score_sort_buy_item[2],',', itemid_score_sort_buy_item[3], 
     ',',itemid_score_sort_buy_item[4], '. (item-item collaborative filtering)')

Top 5 recommended item for the 1st user with visitorid = 599528 are itemid = 360825 , 450565 , 360831 , 361152 , 459354 . (item-item collaborative filtering)


 [Back to table of contents.](#toc)

## <a id = '5'></a> 5. Collaborative Filtering based on All Events

<a id = '5.1'></a> 
***
**5.1** *Data Wrangling* 
***
<div style="text-align: justify"> A new dataframe was created from all events. Only the 'timestamp' and 'transactionid' columns were dropped, because that information was not needed at that point; the 'event' column was kept because it determines the score. Due to technical constraints (memory error), the dataframe had to be reduced in size, so only the first 1000 unique visitors were considered for this model. The numbers of unique visitors and unique items in the reduced dataframe were then calculated and used to create a zero-filled matrix that serves as the visitor-item matrix. A for loop was then used to fill in the visitor-item matrix with an event score for each visitor-item pair (view = 1, addtocart = 2, transaction = 3).<br><br/>
</div>

In [20]:
# Drop timestamp and transactionid columns (the event column is kept for scoring)
all_events = events.drop(
    ['timestamp', 'transactionid'], axis=1)

# Extract all events of the first 1000 unique visitors
all_visitor_id = all_events.visitorid.unique()
fot_all_vis_id = all_visitor_id[:1000]

# Extract all events of first one thousand visitors (fotv) from all_events dataframe
all_events_fotv = all_events[all_events['visitorid'].isin(fot_all_vis_id)]

# Count number of unique visitors and unique items
all_items_list = all_events_fotv.itemid.unique()
n_all_items = len(all_items_list)
all_vis_list = all_events_fotv.visitorid.unique()
n_all_vis = len(all_vis_list)

# Create visitor_item matrix
all_vi_matrix = np.zeros((n_all_vis, n_all_items))
for row in all_events_fotv.itertuples(index = False):
    
    # Get the order of visitor among the unique visitors
    id_vis = row[0]
    m = np.where(all_vis_list == id_vis)
    
    # Get the order of item among the unique items
    id_ite = row[2]
    n = np.where(all_items_list == id_ite)
    
    if row[1] == 'view':
        # Assign events score(view = 1, addtocart = 2, transaction = 3)
        all_vi_matrix[m[0][0], n[0][0]] = 1
        
    elif row[1] == 'addtocart':
        # Assign events score(view = 1, addtocart = 2, transaction = 3)
        all_vi_matrix[m[0][0], n[0][0]] = 2

    elif row[1] == 'transaction':
        # Assign events score(view = 1, addtocart = 2, transaction = 3)
        all_vi_matrix[m[0][0], n[0][0]] = 3
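
In the loop above, each event overwrites whatever score the same visitor-item pair already holds, so the final value depends on the order of the rows. If the intention is to keep the strongest interaction, one option is to take the maximum score per pair. The sketch below shows this with `np.maximum.at`; the names `EVENT_WEIGHTS` and `build_weighted_matrix` and the toy data are assumptions for illustration, and every event is assumed to be one of the three known types.

```python
import numpy as np
import pandas as pd

# Scores used in this section: view = 1, addtocart = 2, transaction = 3
EVENT_WEIGHTS = {'view': 1, 'addtocart': 2, 'transaction': 3}

def build_weighted_matrix(df):
    """Visitor-item matrix holding the highest event score seen for each pair."""
    weights = df['event'].map(EVENT_WEIGHTS).to_numpy()
    vis_codes, vis_ids = pd.factorize(df['visitorid'])
    item_codes, item_ids = pd.factorize(df['itemid'])

    matrix = np.zeros((len(vis_ids), len(item_ids)))
    # Unbuffered in-place maximum, so repeated pairs keep their largest score
    np.maximum.at(matrix, (vis_codes, item_codes), weights)
    return matrix, np.asarray(vis_ids), np.asarray(item_ids)

# Toy data (hypothetical): visitor 1 buys item 5 and views it again afterwards
toy = pd.DataFrame({
    'visitorid': [1, 1, 2, 1],
    'event': ['transaction', 'view', 'addtocart', 'view'],
    'itemid': [5, 5, 5, 7],
})
toy_matrix, toy_visitors, toy_items = build_weighted_matrix(toy)
print(toy_matrix)
```

Here visitor 1 keeps a score of 3 for item 5 even though a later 'view' row appears for the same pair.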


 [Back to table of contents.](#toc)

<a id = '5.2'></a> 
***
**5.2** *Calculating similarity and event score*
***
<div style="text-align: justify">Two types of Collaborative Filtering can be employed here: Item-Item Collaborative Filtering and User-User Collaborative Filtering. Both types are demonstrated in this notebook. Using the visitor-item matrix created in the previous section, the item-item and user-user similarity matrices were calculated with the cosine metric. The function defined in Section 3.2 was used again to calculate the score for each item. The scores for both types of Collaborative Filtering were then predicted.<br><br/>
</div>

In [21]:
# Calculate cosine similarity for user-user and item-item
all_vis_similarity = pairwise_distances(all_vi_matrix, metric = 'cosine')
all_item_similarity = pairwise_distances(all_vi_matrix.T, metric = 'cosine')

# visitor-item predicted event score based on user-user
vis_all_ranking = score(all_vi_matrix, all_vis_similarity, type = 'visitor')

# visitor-item predicted event score based on item-item
item_all_ranking = score(all_vi_matrix, all_item_similarity, type = 'item')

 [Back to table of contents.](#toc)

<a id = '5.3'></a> 
***
**5.3** *Example of Top 5 recommended items based on result (user-user)*
***
<div style="text-align: justify">This section shows an example of the items recommended by User-User Collaborative Filtering based on all types of events.<br><br/>
</div>

In [22]:
# First visitor for example
first_user = all_vis_list[0]

# Top 5 recommended item for the 1st visitor (user-user)
index_sort_all_vis = np.flip(np.argsort(vis_all_ranking[0, :]))
itemid_score_sort_all_vis = np.zeros((len(index_sort_all_vis)), dtype=int)

# Create array for recommendation score
for m in range(len(index_sort_all_vis)):
    
    # Returns the index of the item in the original unique item list
    index = index_sort_all_vis[m]
    
    # Returns the itemid according to the index
    itemid_score_sort_all_vis[m] = all_items_list[index]
    
# Print out top 5 recommended item for the first user
print('Top 5 recommended item for the 1st user with visitorid =', first_user, 'are itemid =', 
     itemid_score_sort_all_vis[0], ',', itemid_score_sort_all_vis[1], ',',
     itemid_score_sort_all_vis[2],',', itemid_score_sort_all_vis[3], 
     ',',itemid_score_sort_all_vis[4], '. (user-user collaborative filtering)')

Top 5 recommended item for the 1st user with visitorid = 257597 are itemid = 257040 , 37029 , 119736 , 369447 , 12217 . (user-user collaborative filtering)
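
Since visitor 257597 is also the example in Section 3.3, the two user-user lists can be compared directly. The sketch below simply intersects the item ids printed in the two outputs; the variable names are illustrative.

```python
# Top 5 item ids printed for visitorid 257597 in Section 3.3 (view events only)
top5_view_user = [37029, 315543, 369447, 12217, 355994]

# Top 5 item ids printed for the same visitor in this section (all events)
top5_all_user = [257040, 37029, 119736, 369447, 12217]

# Items recommended by both models
shared = set(top5_view_user) & set(top5_all_user)
print(f"{len(shared)} of 5 items appear in both lists: {sorted(shared)}")
```

Three of the five recommendations are shared for this visitor, which is a single example and not a general measure of agreement between the models.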


 [Back to table of contents.](#toc)

<a id = '5.4'></a> 
***
**5.4** *Example of Top 5 recommended items based on result (item-item)*
***
<div style="text-align: justify">This section shows an example of the items recommended by Item-Item Collaborative Filtering based on all types of events.<br><br/>
</div>

In [23]:
# Top 5 recommended item for the 1st visitor (item_item)
index_sort_all_item = np.flip(np.argsort(item_all_ranking[0, :]))
itemid_score_sort_all_item = np.zeros((len(index_sort_all_item)), dtype=int)

# Create array for recommendation score
for m in range(len(index_sort_all_item)):
    
    # Returns the index of the item in the original unique item list
    index = index_sort_all_item[m]
    
    # Returns the itemid according to the index
    itemid_score_sort_all_item[m] = all_items_list[index]
    
# Print out top 5 recommended item for the first user
print('Top 5 recommended item for the 1st user with visitorid =', first_user, 'are itemid =', 
     itemid_score_sort_all_item[0], ',', itemid_score_sort_all_item[1], ',',
     itemid_score_sort_all_item[2],',', itemid_score_sort_all_item[3], 
     ',',itemid_score_sort_all_item[4], '. (item-item collaborative filtering)')

Top 5 recommended item for the 1st user with visitorid = 257597 are itemid = 119736 , 301721 , 186702 , 439963 , 123548 . (item-item collaborative filtering)


 [Back to table of contents.](#toc)

## <a id = '6'></a> 6. Concluding Remarks

<div style="text-align: justify">This notebook documents an attempt to develop a recommendation system based on the Retailrocket dataset. Based on the available data, a Collaborative Filtering technique was chosen for the recommendation system. There are two types of Collaborative Filtering: one based on the similarity between users who clicked an item (User-User Collaborative Filtering) and one based on the similarity between items clicked by a user (Item-Item Collaborative Filtering). Both are presented in this notebook. Due to a technical issue (memory error) the data had to be downsampled, but the general approach remains the same. In conclusion, Item-Item Collaborative Filtering was less computationally intensive than User-User Collaborative Filtering. If resource allocation is a factor, Item-Item Collaborative Filtering is the recommended approach.
</div>

 [Back to table of contents.](#toc)