# E-Commerce Recommender System with TensorFlow

### Objective

In this notebook we will build a recommender system for e-commerce data with TensorFlow.

### Procedure

1. Recommender Systems (Overview)
2. Matrix Factorization for Recommender Systems
    1. Dataset preparation and baseline
    2. Matrix factorization
    3. Implicit feedback datasets
    4. SGD-based matrix factorization
    5. Bayesian personalized ranking
3. RNN for Recommender Systems
    1. Data preparation and baseline
    2. RNN recommender systems in TensorFlow

### Dataset and Problem Description

* Dataset: [UCI Online Retail Dataset](http://archive.ics.uci.edu/ml/datasets/online+retail)

### Topics Covered

* Basics of Recommender Systems
* Matrix Factorization for Recommender Systems
* Bayesian Personalized Ranking
* Advanced Recommender Systems based on Recurrent Neural Nets

### After this project we should be able to

* Prepare data for training a recommender system
* Build recommender models with TensorFlow
* Perform a simple evaluation of the quality of these models

### Theoretical Foundations

* Recommender systems suggest items that customers are likely to want, which helps a business sell more products. Modern recommendation engines rely on machine learning techniques to do this.

**What is a Recommendation Engine?**

A recommendation engine filters data using different algorithms and recommends the most relevant items to a user. It first captures a user's behavior and then, based on that past behavior, recommends products the user is likely to buy.

Our e-commerce recommender system will follow the "customers who bought X also bought Y" pattern.
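To make the "bought X also bought Y" idea concrete, here is a minimal sketch that counts how often items appear together in the same basket. It is only an illustration on hypothetical toy baskets; the item names, `co_counts`, and `also_bought` are made up for this example, and simple co-occurrence counting is not the matrix factorization approach used later in the notebook.

```python
from collections import Counter, defaultdict
from itertools import permutations

# Hypothetical toy baskets; each inner list is one transaction.
baskets = [
    ["lantern", "candle", "holder"],
    ["lantern", "candle"],
    ["candle", "holder"],
    ["lantern", "mug"],
]

# co_counts[x][y] = number of baskets that contain both x and y
co_counts = defaultdict(Counter)
for basket in baskets:
    for x, y in permutations(set(basket), 2):
        co_counts[x][y] += 1


def also_bought(item, k=2):
    """Return up to k items most often bought together with `item`."""
    return [other for other, _ in co_counts[item].most_common(k)]


# "candle" shares two baskets with "lantern", more than any other item
print(also_bought("lantern", k=1))
```

Counting pairs like this does not personalize the result to a user, which is one reason the rest of the notebook moves on to learned models.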


## Step 1: Recommender Systems

* **Recommender system:** A recommender system (RS) takes a list of candidate items and ranks them according to the preferences of a particular user. The ranked list is called a personalized ranking, or a **recommendation**.
    * Recommendations are usually based on historical data such as clicks, visits, and transaction history.
    * Machine learning uses this historical data to find patterns in user behavior and produce the best recommendations.
    * This helps companies sell more products.
* In this notebook we will implement several recommender system algorithms with TensorFlow.

We use the UCI Online Retail dataset:

* About 25,900 transactions, each containing around 20 items
* Roughly 540,000 purchased line items (rows) in total
* The transactions were made by about 4,300 users

Features of the dataset:

* InvoiceNo
* StockCode
* Description
* Quantity
* InvoiceDate
* UnitPrice
* CustomerID
* Country



## Step 2: Matrix Factorization for Recommender Systems

In this section we will do the following:

1. Define the problem
2. Establish a few baselines
3. Implement the classical matrix factorization algorithm
4. Implement Bayesian Personalized Ranking

## Step 2.1: Data Preparation and Baseline

Steps:

1. Read the Excel data
2. Save the data as a pickle file and load it back from the pickle file
3. Clean the data
    1. Column names contain capital letters, so lowercase them
    2. Filter out transactions that are returns
    3. Mark transactions from unknown users (customer ID set to -1)

In [3]:
import tensorflow as tf
import pandas as pd
import numpy as np
import scipy.sparse as sp
from tqdm import tqdm

In [4]:
# Read data
df = pd.read_excel('../data/raw/Online Retail.xlsx')

In [5]:
# Reading the Excel file takes time, so we save the dataframe to a pickle file
import pickle
with open('../data/processed/df_retail.bin', 'wb') as f_out:
    pickle.dump(df, f_out)

In [6]:
# Pickle file is faster to read, we will now use Pickled version
with open('../data/processed/df_retail.bin', 'rb') as f_in:
    df = pickle.load(f_in)
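
The save-then-load pattern above can also be folded into one helper that only reads the slow source when no cache exists yet. This is a sketch under assumptions: `load_cached` and its parameters are hypothetical names, and it uses pandas' own `DataFrame.to_pickle` and `pd.read_pickle` rather than the `pickle` module used in the cells above. As with any pickle, only load files you created yourself.

```python
from pathlib import Path

import pandas as pd


def load_cached(cache_path, read_raw):
    """Load a DataFrame from a pickle cache, building the cache on first use.

    `read_raw` is a zero-argument callable that returns the DataFrame,
    e.g. lambda: pd.read_excel('../data/raw/Online Retail.xlsx').
    """
    cache_path = Path(cache_path)
    if cache_path.exists():
        # fast path: the cache already exists
        return pd.read_pickle(cache_path)
    # slow path: read the raw source once and store it for next time
    df = read_raw()
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    df.to_pickle(cache_path)
    return df
```

A call such as `load_cached('../data/processed/df_retail.bin', lambda: pd.read_excel('../data/raw/Online Retail.xlsx'))` would then replace the three cells above.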


In [7]:
df.head()

| | InvoiceNo | StockCode | Description | Quantity | InvoiceDate | UnitPrice | CustomerID | Country |
|---|---|---|---|---|---|---|---|---|
| 0 | 536365 | 85123A | WHITE HANGING HEART T-LIGHT HOLDER | 6 | 2010-12-01 08:26:00 | 2.55 | 17850.0 | United Kingdom |
| 1 | 536365 | 71053 | WHITE METAL LANTERN | 6 | 2010-12-01 08:26:00 | 3.39 | 17850.0 | United Kingdom |
| 2 | 536365 | 84406B | CREAM CUPID HEARTS COAT HANGER | 8 | 2010-12-01 08:26:00 | 2.75 | 17850.0 | United Kingdom |
| 3 | 536365 | 84029G | KNITTED UNION FLAG HOT WATER BOTTLE | 6 | 2010-12-01 08:26:00 | 3.39 | 17850.0 | United Kingdom |
| 4 | 536365 | 84029E | RED WOOLLY HOTTIE WHITE HEART. | 6 | 2010-12-01 08:26:00 | 3.39 | 17850.0 | United Kingdom |


The data has a few problems at this point:

* Column names contain capital letters, so we lowercase them
* Some transactions are returns, which are not of interest to us
* Some transactions belong to unknown users



In [8]:
df.columns = df.columns.str.lower() # make lowercase
# remove transactions that are returns (invoice numbers starting with 'C')
df = df[~df.invoiceno.astype('str').str.startswith('C')].reset_index(drop=True)
# mark transactions from unknown users by assigning -1 as their customer ID
df.customerid = df.customerid.fillna(-1).astype('int32')
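
To see what the cleaning cell does row by row, here is a small sketch on a hypothetical three-row frame: one normal purchase, one return, and one purchase with a missing customer ID. The values are invented for illustration; only the column names and the cleaning steps mirror the cell above.

```python
import numpy as np
import pandas as pd

# Hypothetical rows: a normal purchase, a return, an anonymous purchase.
toy = pd.DataFrame({
    "InvoiceNo": [536365, "C536379", 536380],
    "CustomerID": [17850.0, 14527.0, np.nan],
})

toy.columns = toy.columns.str.lower()

# returns are the invoices whose number starts with 'C'
is_return = toy.invoiceno.astype("str").str.startswith("C")
toy = toy[~is_return].reset_index(drop=True)

# unknown customers are kept, but marked with -1
toy["customerid"] = toy.customerid.fillna(-1).astype("int32")

print(toy)
```

The return row is dropped, while the anonymous row stays in the frame with `customerid == -1`, so it can still contribute to item popularity counts.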

Now we will encode all item IDs (stockcode) with integers. 

We will do it by building a mapping from each code to some unique index number

In [9]:
stockcode_values = df.stockcode.astype('str')
stockcodes = sorted(set(stockcode_values))
stockcodes = {c: i for (i, c) in enumerate(stockcodes)}
df.stockcode = stockcode_values.map(stockcodes).astype('int32')
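
Once items are encoded as integers, it helps to keep the reverse mapping around so that recommended indices can be turned back into stock codes. The sketch below assumes the same sorted-set encoding as the cell above; `code_to_idx` and `idx_to_code` are hypothetical names and the codes are a toy sample.

```python
# Hypothetical stock codes, encoded the same way as in the cell above.
codes = ["85123A", "71053", "84406B", "71053"]

# forward mapping: stock code -> integer index (sorted for reproducibility)
code_to_idx = {c: i for i, c in enumerate(sorted(set(codes)))}
# reverse mapping: integer index -> stock code
idx_to_code = {i: c for c, i in code_to_idx.items()}

encoded = [code_to_idx[c] for c in codes]
decoded = [idx_to_code[i] for i in encoded]

print(encoded)  # [2, 0, 1, 0]
```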

Now we will split the dataset into training, validation, and test parts. That gives us three subsets:

* Training set: before 2011.10.09 (around 10 months of data, approximately 378,500 rows)
* Validation set: between 2011.10.09 and 2011.11.09 (one month of data, approximately 64,500 rows)
* Test set: after 2011.11.09 (also one month, approximately 89,000 rows)

Since we have e-commerce transactions data, the most sensible way to do the split is based on time. So we will use:

In [16]:
df_train = df[df.invoicedate < '2011-10-09']
# the upper bound is exclusive so that no row falls into both validation and test
df_val = df[(df.invoicedate >= '2011-10-09') & (df.invoicedate < '2011-11-09')]
df_test = df[df.invoicedate >= '2011-11-09']
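
One thing worth checking in any time-based split is that every row lands in exactly one subset. A simple way to guarantee this is to use half-open intervals (inclusive start, exclusive end). The sketch below shows this on a hypothetical toy frame whose timestamps sit right on the boundaries; the variable names `val_start` and `test_start` are assumptions made for the example.

```python
import pandas as pd

# Hypothetical timestamps, including two that fall exactly on a boundary.
toy = pd.DataFrame({
    "invoicedate": pd.to_datetime([
        "2011-10-08 17:00", "2011-10-09 00:00",
        "2011-11-08 23:59", "2011-11-09 00:00", "2011-12-01 09:30",
    ]),
})

val_start = pd.Timestamp("2011-10-09")
test_start = pd.Timestamp("2011-11-09")

train = toy[toy.invoicedate < val_start]
val = toy[(toy.invoicedate >= val_start) & (toy.invoicedate < test_start)]
test = toy[toy.invoicedate >= test_start]

# every row lands in exactly one split
assert len(train) + len(val) + len(test) == len(toy)
```

With a closed upper bound on the validation range, the row stamped `2011-11-09 00:00` would be counted in both validation and test.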

In this section, we will consider the following (very simplified) recommendation scenario:

1. The user enters the website.
2. We present five recommendations.
3. The user assesses the list, maybe buys some things from it, and then continues shopping as usual.





In [17]:
# Set a baseline: count how many times each item was bought, then
# take the 5 most frequent items and recommend them to all users
# (recommendation by popularity)
top = df_train.stockcode.value_counts().head(5).index.values

In [19]:
top # these are the top 5 product stock codes that got bought
    # from the dataset (top 5 products purchased)

array([3527, 3506, 1347, 2730,  180])

In [14]:
num_groups = len(df_val.invoiceno.drop_duplicates())
base = np.tile(top, num_groups).reshape(-1, 5)

In [15]:
base

array([[3527, 3506, 1347, 2730,  180],
       [3527, 3506, 1347, 2730,  180],
       [3527, 3506, 1347, 2730,  180],
       ...,
       [3527, 3506, 1347, 2730,  180],
       [3527, 3506, 1347, 2730,  180],
       [3527, 3506, 1347, 2730,  180]])
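
The popularity baseline can be traced by hand on a tiny example. This sketch uses hypothetical item IDs and `np.bincount` in place of pandas' `value_counts`, so it is an equivalent illustration rather than the notebook's exact code; `k` and `num_groups` are made-up values.

```python
import numpy as np

# Hypothetical encoded item IDs, one entry per purchased line item.
train_items = np.array([3, 1, 3, 2, 3, 1, 0])

# Count purchases per item and keep the k most frequent ones.
k = 2
counts = np.bincount(train_items)               # counts[i] = purchases of item i
top_k = np.argsort(-counts, kind="stable")[:k]  # most popular first

# Repeat the same recommendation row once per transaction.
num_groups = 3
baseline = np.tile(top_k, num_groups).reshape(-1, k)

print(top_k)     # [3 1]
print(baseline)  # three identical rows: [3 1]
```

Every transaction receives the same row, which is exactly why this is only a baseline: nothing in it depends on the user.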

In [23]:
# Find where each transaction ends and the
# next one starts
def group_indptr(df):
    # At each row, we compare the current invoice number
    # with the previous one (obtained with the shift() method);
    # if they differ, we record the row position
    indptr, = np.where(df.invoiceno != df.invoiceno.shift())
    indptr = np.append(indptr, len(df)).astype('int32')
    return indptr


In [24]:
# Pointers array for the validation set
val_indptr = group_indptr(df_val)
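
A small worked example may make the pointer array easier to read. The sketch repeats `group_indptr` so that it is self-contained and applies it to a hypothetical six-row frame. Note the assumption that rows of the same invoice are contiguous; otherwise one invoice would be split into several groups.

```python
import numpy as np
import pandas as pd


def group_indptr(df):
    # same logic as the function above, repeated to keep the sketch runnable
    indptr, = np.where(df.invoiceno != df.invoiceno.shift())
    return np.append(indptr, len(df)).astype('int32')


# Hypothetical data: three invoices with 2, 3 and 1 rows.
toy = pd.DataFrame({
    "invoiceno": [10, 10, 11, 11, 11, 12],
    "stockcode": [5, 7, 5, 2, 9, 7],
})

indptr = group_indptr(toy)
print(indptr)  # [0 2 5 6]

# items of the second transaction are rows indptr[1] .. indptr[2] - 1
second = toy.stockcode.values[indptr[1]:indptr[2]]
print(second)  # [5 2 9]
```

Transaction `i` always occupies the half-open row range `indptr[i]:indptr[i + 1]`, which is the convention the precision function relies on.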

In [25]:
from numba import njit

'''

The logic of this function is straightforward.

For each transaction, we check how many items we predicted correctly
and accumulate the count in the 'tp' variable.

At the end, we divide 'tp' by the total number of predictions, which
is the size of the prediction matrix: the number of transactions
times 5 in our case.

@njit is a decorator that tells numba to optimize the code. Numba
analyzes the function and uses a just-in-time (JIT) compiler to
translate it to native code.

Once the function is compiled, it can run orders of magnitude faster
than the interpreted Python version, with speed comparable to native
code written in C.

'''
@njit
def precision(group_indptr, true_items, predicted_items):
    tp = 0  # number of true positives (correct predictions)

    # n transactions, m recommendations per transaction
    n, m = predicted_items.shape

    for i in range(n):
        group_start = group_indptr[i]
        group_end = group_indptr[i + 1]

        # items of a single transaction, which spans
        # multiple rows of the dataframe
        group_true_items = true_items[group_start:group_end]

        # count the bought items that appear among the recommendations
        for item in group_true_items:
            for j in range(m):
                if item == predicted_items[i, j]:
                    tp = tp + 1
                    break  # this item is matched; move on to the next one

    # number of correct predictions / total number of predictions
    return tp / (n * m)
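
To sanity-check the metric on a case small enough to verify by hand, a plain-Python reference version can be useful. The sketch below is an assumption-labeled illustration: `precision_reference` is a hypothetical name, it does not use numba, and it assumes the recommendations within one row are distinct.

```python
import numpy as np


def precision_reference(group_indptr, true_items, predicted_items):
    """Plain-Python version of the metric, for checking small cases by hand."""
    n, m = predicted_items.shape
    tp = 0
    for i in range(n):
        # items actually bought in transaction i
        bought = true_items[group_indptr[i]:group_indptr[i + 1]]
        # the m items recommended for transaction i
        recommended = set(predicted_items[i].tolist())
        tp += sum(1 for item in bought.tolist() if item in recommended)
    return tp / (n * m)


# Two transactions: the first bought items [1, 2], the second [3, 4, 5].
indptr = np.array([0, 2, 5], dtype="int32")
true_items = np.array([1, 2, 3, 4, 5], dtype="int32")
predicted = np.array([[1, 9], [3, 4]], dtype="int32")

# 1 hit in the first row + 2 hits in the second = 3 hits out of 4 slots
print(precision_reference(indptr, true_items, predicted))  # 0.75
```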


In [28]:
'''

Now we will check the precision of this baseline

'''

val_items = df_val.stockcode.values
precision(val_indptr, val_items, base)



0.0642299794661191

Executing this code should produce about 0.064. That is, 6.4% of the recommendations we made were correct: the user ended up buying the recommended item in only 6.4% of cases.

Next, we will try to improve on this baseline with matrix factorization.

## Step 2.2: Matrix factorization

We will use matrix factorization for our recommender system. It is powerful, scalable, and easy to implement and deploy.

We will minimize a cost function in the style of SVD-based factorization, with an added regularization term.

Regularization keeps the optimization from overfitting the model to the training data, so that the learned weights are not thrown off by a few unusual data points.
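
The exact formula is not reproduced in this notebook, so as an assumption, one common form of a regularized matrix factorization objective is

$$\min_{P,\,Q} \sum_{(u,i)\ \text{observed}} \left(r_{ui} - p_u^\top q_i\right)^2 + \lambda \left(\lVert P \rVert_F^2 + \lVert Q \rVert_F^2\right)$$

where $p_u$ and $q_i$ are the rows of the user and item factor matrices $P$ and $Q$. The sketch below evaluates this objective with NumPy on hypothetical toy values; `regularized_loss`, `mask`, and `lam` are names chosen for the example, and bias terms are left out for brevity.

```python
import numpy as np


def regularized_loss(R, mask, P, Q, lam):
    """Squared error on observed entries plus an L2 penalty on the factors.

    R    : (n_users, n_items) ratings matrix
    mask : same shape, 1 where the entry is observed, 0 otherwise
    P, Q : user and item factor matrices with k columns each
    lam  : regularization strength
    """
    err = mask * (R - P @ Q.T)  # errors on unobserved entries are zeroed out
    penalty = np.sum(P ** 2) + np.sum(Q ** 2)
    return np.sum(err ** 2) + lam * penalty


# Hypothetical 2x2 example with one latent factor.
R = np.array([[5.0, 0.0], [0.0, 3.0]])
mask = np.array([[1, 0], [0, 1]])
P = np.array([[1.0], [1.0]])
Q = np.array([[2.0], [1.0]])

print(regularized_loss(R, mask, P, Q, lam=0.0))  # 13.0
```

With `lam=0.0` only the fit error (9 + 4) remains; a larger `lam` adds a cost proportional to the size of the factors, which is what discourages overfitting.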

## Step 2.3: Implicit Feedback Datasets

For a recommender system to work well, we first need to collect a lot of data. This data comes in two main forms: explicit feedback and implicit feedback.

Explicit feedback is given by users directly, for example when a website asks a user to rate a movie from 1 to 5 stars.

Implicit feedback is data the system collects without users stating their preferences: browsing history, clicks, time spent on a page, and other interaction information.

Our dataset tells us what the users previously bought, but it does not tell us what they do not like. **We do not know whether users did not buy an item because they did not like it or simply because they did not know the item existed.**

Luckily, we can still apply matrix factorization to implicit datasets.

1. Use ALS from the `implicit` library to get a stronger baseline than the previous one. First, prepare the data in the format `implicit` expects.

In [29]:
# Construct the user-item matrix X: translate both users and items
# into integer IDs, so that each user maps to a row of X
# and each item to a column of X
df_train_user = df_train[df_train.customerid != -1].reset_index(drop=True)
customers = sorted(set(df_train_user.customerid))
customers = {c: i for (i, c) in enumerate(customers)}
df_train_user.customerid = df_train_user.customerid.map(customers)

# apply the same mapping to the validation set; users not seen in training get -1
# (work on an explicit copy so that pandas does not emit SettingWithCopyWarning)
df_val = df_val.copy()
df_val.customerid = df_val.customerid.apply(lambda c: customers.get(c, -1))

# use the integer codes to construct the matrix X
uid = df_train_user.customerid.values.astype('int32')
iid = df_train_user.stockcode.values.astype('int32')
ones = np.ones_like(uid, dtype='uint8')

X_train = sp.csr_matrix((ones, (uid, iid)))
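
One detail of building a sparse matrix from `(data, (row, col))` triples is worth seeing on a toy case: SciPy sums entries that share the same `(row, col)` position, so the result holds purchase counts rather than 0/1 flags. The sketch below uses hypothetical IDs; passing `shape` explicitly and binarizing with `X > 0` are choices made for this example, not steps taken from the notebook.

```python
import numpy as np
import scipy.sparse as sp

# Hypothetical interactions: user 0 bought item 0 twice and item 2 once,
# user 1 bought item 1 once.
uid = np.array([0, 0, 0, 1], dtype="int32")
iid = np.array([0, 0, 2, 1], dtype="int32")
ones = np.ones_like(uid, dtype="uint8")

# An explicit shape keeps the matrix size independent of which IDs occur.
X = sp.csr_matrix((ones, (uid, iid)), shape=(2, 3))
counts = X.toarray()
print(counts)  # [[2 0 1]
               #  [0 1 0]]

# Collapse counts to a 0/1 "has bought" indicator.
X_binary = (X > 0).astype("uint8")
print(X_binary.toarray())  # [[1 0 1]
                           #  [0 1 0]]
```

Because the values are stored as `uint8`, a very large number of repeat purchases of one item by one user may exceed what that type can hold, so a wider integer type could be a safer choice when counts matter.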
