# Intro

Recommender systems can be used to evaluate cross-selling opportunities in retail marketing. For further reading on the key concepts of "recommender systems", "cross-selling", "collaborative filtering" and "deep learning" and their applications, see related papers such as:

1. Kamakura, W. A., Wedel, M., de Rosa, F., Mazzon, J. A. (2003). Cross-selling through database marketing: A mixed data factor analyzer for data augmentation and prediction. International Journal of Research in Marketing, 20, 45–65.

2. Knott, A., Hayes, A., Neslin, S. A. (2002). Next-product-to-buy models for cross-selling applications. Journal of Interactive Marketing, 16(3), 59–75.

3. Thuring, F., Nielsen, J. P., Guillén, M., Bolancé, C. (2012). Selecting prospects for cross-selling financial products using multivariate credibility. Expert Systems with Applications, 39, 8809–8816.

4. Zhang, S., Yao, L., Sun, A., Tay, Y. (2018). Deep Learning based Recommender System: A Survey and New Perspectives. ACM Computing Surveys, 1(1), 1–35.

5. Shi, Y., Larson, M., Hanjalic, A. (2014). Collaborative filtering beyond the user-item matrix: A survey of the state of the art and future challenges. ACM Computing Surveys, 47(1), 45. DOI: http://dx.doi.org/10.1145/2556270

6. Hidasi, B., Karatzoglou, A. (2018). Recurrent Neural Networks with Top-k Gains for Session-based Recommendations. In The 27th ACM International Conference on Information and Knowledge Management (CIKM ’18), October 22–26, 2018, Torino, Italy. ACM, New York, NY, USA, 10 pages. DOI: https://doi.org/10.1145/3269206.3271761


In this notebook, we mainly aim to predict the next products customers will buy from implicit feedback (their purchase history). We largely follow the methodology of [Lazy Programmer Inc.'s](https://www.udemy.com/course/recommender-systems/) Udemy course on recommender systems, which we strongly recommend.

The chosen dataset is the well-known Online Retail II dataset. For detailed information, please visit:

[https://www.kaggle.com/mashlyn/online-retail-ii-uci](https://www.kaggle.com/mashlyn/online-retail-ii-uci)

and for the original source:

[UCI Repository](https://archive.ics.uci.edu/ml/datasets/Online+Retail+II)

# Libraries

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import tensorflow as tf

from tensorflow import keras
from tensorflow.keras import layers

print(tf.version.VERSION)
print(keras.__version__)

# Data & Preparation

In [None]:
data = pd.read_csv("../input/online-retail-ii-uci/online_retail_II.csv",
                   parse_dates=["InvoiceDate"],
                   dtype={"Customer ID":"object"})

In [None]:
df = data.copy()
df.head()

## Brief data cleaning

Dropping rows with missing values, cancellations/returns (negative quantities), test items and postage charges:

In [None]:
df = df.dropna()
df = df.drop(df[df["Quantity"]<0].index)
df = df.drop(df[df["StockCode"].str.contains("TEST")].index)
df = df.drop(df[df["StockCode"]=="POST"].index)

df = df.sort_values("InvoiceDate")

## Common Functions

The first function returns the distinct elements of a list. The second generates, for each customer, an input sequence of purchased products and a target sequence of length `n_target` made of the products purchased after that input sequence. The last one draws negative samples: products the customer has not purchased.

In [None]:
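def unique(list1):
    # Distinct elements in order of first appearance (a plain set would lose
    # the purchase order, so the "last" product would be arbitrary)
    return list(dict.fromkeys(list1))

def generate_sequence(serie, n_target):
    # Split each distinct-product sequence into an input part and the last
    # n_target products as the target; shorter sequences are skipped
    input_sequence = []
    output_sequence = []
    for x in serie:
        x = unique(x)
        if len(x) > n_target:
            input_sequence.append(x[:-n_target])
            output_sequence.append(x[-n_target:])
    return input_sequence, output_sequence

def neg(x, corp, sample_size=1):
    # Randomly draw sample_size * len(x) products from corp that are not in x
    diff = np.setdiff1d(corp, list(x))
    ind = np.random.permutation(len(diff))
    return diff[ind[:int(sample_size * len(x))]]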
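To make the helpers concrete, here is a small self-contained run on toy purchase histories. It is a sketch under stated assumptions: products are plain strings, the helper names are hypothetical stand-ins for `unique`, `generate_sequence` and `neg`, and duplicates are removed with `dict.fromkeys`, which keeps first-purchase order so the held-out target really is the most recently introduced product.

```python
import numpy as np

def unique_ordered(items):
    # Drop repeats but keep the order of first appearance
    return list(dict.fromkeys(items))

def split_sequences(histories, n_target):
    # Hypothetical helper mirroring generate_sequence: the last n_target distinct
    # products become the target, everything before them the input
    inputs, targets = [], []
    for history in histories:
        seq = unique_ordered(history)
        if len(seq) > n_target:
            inputs.append(seq[:-n_target])
            targets.append(seq[-n_target:])
    return inputs, targets

def sample_negatives(bought, corpus, sample_size=1, rng=None):
    # Randomly pick sample_size * len(bought) products the customer never bought
    rng = np.random.default_rng() if rng is None else rng
    pool = np.setdiff1d(corpus, list(bought))
    n = min(int(sample_size * len(bought)), len(pool))
    return rng.permutation(pool)[:n]

histories = [["A", "B", "A", "C"], ["B"], ["D", "E", "D", "F", "G"]]
corpus = np.array(["A", "B", "C", "D", "E", "F", "G", "H"])

inputs, targets = split_sequences(histories, n_target=1)
print(inputs)   # [['A', 'B'], ['D', 'E', 'F']]
print(targets)  # [['C'], ['G']]

# Excluding the held-out target too keeps it from being labelled "not purchased"
print(sample_negatives(inputs[0] + targets[0], corpus, sample_size=1,
                       rng=np.random.default_rng(0)))  # 3 products drawn from D..H
```

Note that the single-product history `["B"]` is skipped, because nothing would remain as input once its only product is held out as the target.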

The following cells build each customer's purchase sequence (in purchase-time order) and count the distinct products each customer purchased.

In [None]:
by_customer = df.groupby("Customer ID", as_index=False).agg(
    {"StockCode": [lambda x: list(x)]}
)
sequential_df = by_customer["StockCode"].rename(
    columns={"<lambda>":"purchase_sequence"}
)
sequential_df["CustomerID"] = by_customer["Customer ID"]
sequential_df["product_count"] = sequential_df["purchase_sequence"].apply(
    lambda x: len(unique(list(x)))
)

We chose some hyperparameter values arbitrarily, but it is good practice to look at summary statistics first, such as the number of distinct products each customer purchased:

In [None]:
sequential_df["product_count"].describe()

In [None]:
n_target = 1
n_frequency = 3
prod_embedding_size = 16
user_embedding_size = 16

corp = sequential_df.explode("purchase_sequence")["purchase_sequence"].unique()
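# Keep customers with more than n_frequency distinct products; .copy() avoids
# SettingWithCopyWarning when new columns are added below
frequent_df = sequential_df[sequential_df["product_count"] > n_frequency].copy()

input_seq, output_seq = generate_sequence(
    frequent_df["purchase_sequence"],
    n_target
)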

frequent_df["input_sequence"] = input_seq
frequent_df["output_sequence"] = output_seq
frequent_df = frequent_df[["CustomerID", "input_sequence", "output_sequence"]]
frequent_df = frequent_df.explode("input_sequence")
frequent_df["purchase"] = 1
frequent_df = frequent_df.set_index("CustomerID", drop=True)
frequent_df.head(10)
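The reshaping above turns each customer's input sequence into one row per (customer, product) pair with `purchase = 1`, while the target products are kept aside. A pure-Python sketch of the same filter-then-explode logic on hypothetical toy data, assuming the threshold is applied to the number of distinct products, as with `product_count`:

```python
# Hypothetical toy data: customer id -> ordered purchase history (may repeat)
histories = {
    "c1": ["A", "B", "A", "C", "D"],
    "c2": ["B", "C"],
    "c3": ["E", "F", "G", "H", "E"],
}
n_frequency = 3  # keep customers with MORE than n_frequency distinct products
n_target = 1     # last n_target distinct products are held out as targets

rows = []     # (customer, product, purchase) triples used for training
targets = {}  # customer -> held-out target products
for cust, history in histories.items():
    distinct = list(dict.fromkeys(history))  # first-purchase order
    if len(distinct) <= n_frequency:
        continue
    targets[cust] = distinct[-n_target:]
    # "explode": one positive row per input product
    rows.extend((cust, prod, 1) for prod in distinct[:-n_target])

print(rows)     # [('c1', 'A', 1), ('c1', 'B', 1), ('c1', 'C', 1), ('c3', 'E', 1), ('c3', 'F', 1), ('c3', 'G', 1)]
print(targets)  # {'c1': ['D'], 'c3': ['H']}
```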

## Negative Sampling

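Since all instances prepared so far represent positive-only (implicit) feedback, we supply some negative information to the model. Negative instances are drawn at random from the products a particular customer has not purchased. The `sample_size` argument of `neg` sets how many negatives are drawn per purchased product: `sample_size=1` means one randomly selected non-purchased product per purchased product. The cell below uses `sample_size=5`.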
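With `sample_size = s`, each customer contributes `s` negatives per positive, so positives make up roughly `1 / (1 + s)` of the rows (about 17% for the `s = 5` used below). A minimal sketch on a hypothetical 50-product catalogue; the `labelled_pairs` helper and its `exclude` argument are assumptions added here, the latter to keep held-out targets out of the negative pool:

```python
import numpy as np

def labelled_pairs(cust, bought, exclude, catalogue, sample_size, rng):
    # Positives: every product the customer bought (label 1)
    pos = [(cust, p, 1) for p in bought]
    # Negatives: random products outside `bought` and `exclude` (label 0);
    # `exclude` is meant for held-out targets, so they are never labelled 0
    pool = np.setdiff1d(catalogue, list(bought) + list(exclude))
    n_neg = min(int(sample_size * len(bought)), len(pool))
    neg = [(cust, p, 0) for p in rng.permutation(pool)[:n_neg]]
    return pos + neg

rng = np.random.default_rng(42)
catalogue = np.array([f"P{i:02d}" for i in range(50)])
pairs = labelled_pairs("c1", ["P01", "P02", "P03"], ["P04"], catalogue,
                       sample_size=5, rng=rng)
labels = [y for _, _, y in pairs]
print(len(pairs), sum(labels) / len(labels))  # 3 positives + 15 negatives -> positive rate 1/6
```

The imbalance matters when reading the losses later: a model that always predicts the base rate already achieves a low MSE, so ranking metrics are a better guide than the loss curves alone.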

In [None]:
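# Group each customer's purchased (input) products and keep the held-out target
new_df = frequent_df.reset_index().groupby("CustomerID").agg(
    {"input_sequence": (lambda x: list(x)), "output_sequence": "first"}
)
# Draw 5 negatives per purchased product; the target is excluded from the pool
# so it is never labelled "not purchased"
new_df["neg"] = (new_df["input_sequence"] + new_df["output_sequence"]).apply(
    lambda y: neg(y, corp, 5)
)
ndf = new_df.explode("neg")[["neg"]]
ndf["purchase"] = 0
ndf = ndf.rename(columns={"neg": "input_sequence"})

pdf = frequent_df[["input_sequence", "purchase"]]

# DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
sample_df = pd.concat([pdf, ndf])
sample_df = sample_df.reset_index()
sample_df = sample_df.sort_values("CustomerID", ignore_index=True)

display(sample_df.info())
display(sample_df.head(50))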

## Encoding

As a last step, we encode customer and product IDs as contiguous integer indices, following the method in the [keras.io](https://keras.io/examples/structured_data/collaborative_filtering_movielens/) examples. Here we only split the data into training and validation sets; holding out some samples in advance as a separate test set would be better practice.
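The encoding maps each raw ID to an index in `0..n-1`, which is what an `Embedding` layer expects as input. A pure-Python sketch with hypothetical IDs (illustrative values, not taken from the dataset) that builds the forward and inverse maps and also holds out a small test split before training, as suggested above:

```python
import random

# Hypothetical raw ids as they might appear in the data (strings, with repeats)
customer_ids = ["13085.0", "12346.0", "13085.0", "18102.0"]
product_ids = ["85048", "79323P", "85048", "22041"]

# Map each distinct raw id to a contiguous index (order of first appearance)
cust2idx = {c: i for i, c in enumerate(dict.fromkeys(customer_ids))}
prod2idx = {p: i for i, p in enumerate(dict.fromkeys(product_ids))}
idx2prod = {i: p for p, i in prod2idx.items()}  # inverse map for decoding predictions

cust_encoded = [cust2idx[c] for c in customer_ids]
prod_encoded = [prod2idx[p] for p in product_ids]
print(cust_encoded)  # [0, 1, 0, 2]
print(prod_encoded)  # [0, 1, 0, 2]

# Hold out a test set before any training, with a fixed seed for reproducibility
rows = list(zip(cust_encoded, prod_encoded))
random.Random(52).shuffle(rows)
n_test = max(1, len(rows) // 4)
test_rows, train_rows = rows[:n_test], rows[n_test:]
```

Shuffling matters here because Keras' `validation_split` takes the last fraction of the provided arrays, before any shuffling, rather than a random subset.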

In [None]:
cust_ids = sample_df["CustomerID"].unique().tolist()
cust2cust_encoded = {x: i for i, x in enumerate(cust_ids)}
cust_encoded2cust = {i: x for i, x in enumerate(cust_ids)}
prod_ids = corp
prod2prod_encoded = {x: i for i, x in enumerate(prod_ids)}
prod_encoded2prod = {i: x for i, x in enumerate(prod_ids)}
sample_df["cust"] = sample_df["CustomerID"].map(cust2cust_encoded)
sample_df["prod"] = sample_df["input_sequence"].map(prod2prod_encoded)

num_custs = len(cust2cust_encoded)
num_prods = len(prod2prod_encoded)
sample_df["purchase"] = sample_df["purchase"].values.astype(np.float32)

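print(
    "Number of Customers: {}, Number of Products: {}, Purchases: {}, Non-purchases: {}".format(
        num_custs, num_prods,
        int((sample_df["purchase"] == 1).sum()), int((sample_df["purchase"] == 0).sum())
    )
)

# Shuffle rows: Keras' validation_split takes the last samples without shuffling
sample_df = sample_df.sample(frac=1, random_state=52)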

# Matrix Factorization

In the first part, we implement a collaborative filtering model that learns embedding layers for customers and products. The Python code is mostly adapted from the following notebooks:

* [colinmorris-1](https://www.kaggle.com/colinmorris/embedding-layers)
* [colinmorris-2](https://www.kaggle.com/colinmorris/matrix-factorization)
* [rajmehra03-1](https://www.kaggle.com/rajmehra03/a-detailed-explanation-of-keras-embedding-layer)
* [rajmehra03-2](https://www.kaggle.com/rajmehra03/cf-based-recsys-by-low-rank-matrix-factorization)
* [rounakbanik](https://www.kaggle.com/rounakbanik/movie-recommender-systems)
* [keras.io examples](https://keras.io/examples/structured_data/collaborative_filtering_movielens/)

and the evaluation follows the instructions from:

[jamesloy](https://www.kaggle.com/jamesloy/deep-learning-based-recommender-systems)
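The model below scores a (customer, product) pair as the dot product of their embeddings plus a customer bias and a product bias: `score = p[u] · q[i] + b_u[u] + b_i[i]`. A NumPy sketch of that forward pass with random, untrained tables and hypothetical sizes, useful for checking what `Dot(axes=2)` followed by `Add` computes:

```python
import numpy as np

rng = np.random.default_rng(0)
num_custs, num_prods, dim = 4, 6, 16  # hypothetical sizes

# Embedding tables: one row per customer / product
P = rng.normal(scale=0.05, size=(num_custs, dim))  # customer vectors
Q = rng.normal(scale=0.05, size=(num_prods, dim))  # product vectors
b_u = rng.normal(scale=0.05, size=num_custs)       # customer biases
b_i = rng.normal(scale=0.05, size=num_prods)       # product biases

def mf_score(cust, prod):
    # Row-wise dot product plus both biases, vectorised over a batch of pairs
    return np.sum(P[cust] * Q[prod], axis=1) + b_u[cust] + b_i[prod]

cust = np.array([0, 0, 3])
prod = np.array([1, 5, 2])
scores = mf_score(cust, prod)

# Scoring one customer against the whole catalogue is a single matrix-vector product
all_scores = Q @ P[0] + b_u[0] + b_i
print(scores.shape, all_scores.shape)  # (3,) (6,)
```

In the Keras version the embeddings have shape `(batch, 1, dim)`, so `Dot(axes=2)` contracts the embedding dimension and leaves `(batch, 1, 1)`, which `Flatten` turns into one score per pair.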



In [None]:
cust_input = layers.Input(shape=(1,), name="cust_id", dtype=tf.int32)
prod_input = layers.Input(shape=(1,), name="prod_id", dtype=tf.int32)

# Customer and product embeddings plus one scalar bias per customer/product
cust_embedding = layers.Embedding(num_custs, user_embedding_size, name="cust_emb")
cust_bias_layer = layers.Embedding(num_custs, 1, name="cust_bias")
prod_embedding = layers.Embedding(num_prods, prod_embedding_size, name="prod_emb")
prod_bias_layer = layers.Embedding(num_prods, 1, name="prod_bias")

cust_vector = cust_embedding(cust_input)   # (batch, 1, user_embedding_size)
cust_bias = cust_bias_layer(cust_input)    # (batch, 1, 1)
prod_vector = prod_embedding(prod_input)   # (batch, 1, prod_embedding_size)
prod_bias = prod_bias_layer(prod_input)    # (batch, 1, 1)

# Score = <cust_vector, prod_vector> + cust_bias + prod_bias
# (Dot requires user_embedding_size == prod_embedding_size)
dot_cust_product = layers.Dot(name="Dot", axes=2)([cust_vector, prod_vector])
output = layers.Add(name="Add")([dot_cust_product, cust_bias, prod_bias])
output = layers.Flatten(name="Flat")(output)

model_X = keras.Model([cust_input, prod_input], output, name="model_x")

model_X.compile(loss=tf.keras.losses.MeanSquaredError(),
                optimizer=keras.optimizers.Adam(learning_rate=0.0001))

model_X.summary()

In [None]:
es = keras.callbacks.EarlyStopping(monitor="val_loss",
                                   mode="min",
                                   verbose=1,
                                   patience=5)

history = model_X.fit([sample_df["cust"].values, sample_df["prod"].values],
                      sample_df["purchase"].values,
                      batch_size=256,
                      epochs=20,
                      verbose=1,
                      validation_split=0.1,
                      callbacks=[es])  # without this, early stopping never runs

In [None]:
plt.plot(history.history["loss"])
plt.plot(history.history["val_loss"])
plt.title("embedding loss")
plt.ylabel("loss")
plt.xlabel("epoch")
plt.legend(["train", "val"], loc="best")
plt.show()

# Multi-Layer Perceptron

In [None]:
hidden_units = [128, 64]
 
user_id_input = layers.Input(shape=(1,), name="user_id", dtype=tf.int32)
prod_id_input = layers.Input(shape=(1,), name="prod_id", dtype=tf.int32)
# input_length is omitted: it is optional in Keras 2 and deprecated in Keras 3
user_embedded = layers.Embedding(num_custs,
                                 user_embedding_size,
                                 embeddings_regularizer=keras.regularizers.l2(1e-7),
                                 name="user_embedding")(user_id_input)
prod_embedded = layers.Embedding(num_prods,
                                 prod_embedding_size,
                                 embeddings_regularizer=keras.regularizers.l2(1e-6),
                                 name="prod_embedding")(prod_id_input)
 
concatenated = layers.Concatenate(name="concat")([user_embedded, prod_embedded])
out = layers.Flatten(name="flat")(concatenated)
 
for n_hidden in hidden_units:
    out = layers.Dense(n_hidden,
                       activation="relu",
                       kernel_regularizer=keras.regularizers.l2(0.001))(out)
    out = layers.Dropout(0.4)(out)
    out = layers.BatchNormalization()(out)

out = layers.Dense(1, activation="sigmoid", name="prediction")(out)
 
model_Y = keras.Model(inputs=[user_id_input, prod_id_input],
                      outputs=out, name="model_y")

model_Y.compile(loss=keras.losses.MeanSquaredError(),
                optimizer=keras.optimizers.Adam(learning_rate=0.0001))
 
model_Y.summary()

In [None]:
es = keras.callbacks.EarlyStopping(monitor="val_loss",
                                   mode="min",
                                   verbose=1,
                                   patience=5)

history = model_Y.fit([sample_df["cust"].values, sample_df["prod"].values],
                      sample_df["purchase"].values,
                      batch_size=256,
                      epochs=20,
                      verbose=1,
                      validation_split=0.1,
                      callbacks=[es])  # without this, early stopping never runs

In [None]:
plt.plot(history.history["loss"])
plt.plot(history.history["val_loss"])
plt.title("mlp loss")
plt.ylabel("loss")
plt.xlabel("epoch")
plt.legend(["train", "val"], loc="best")
plt.show()

# Neural Collaborative Filtering

Neural Collaborative Filtering (NCF) is a neural-network-based recommendation framework proposed by He et al. (2017). They argue that a neural network can learn the user-item interaction function, the key factor in collaborative filtering, directly from implicit feedback. The Python code was mostly developed with the help of excellent notebooks such as:

[fuzzywizard](https://www.kaggle.com/fuzzywizard/rec-sys-collaborative-filtering-dl-techniques#4-Matrix-Factorization-using-Deep-Learning-(Keras))

[rajmehra03](https://www.kaggle.com/rajmehra03/cf-based-recsys-by-low-rank-matrix-factorization)

For detailed information about NCF, please see:

He, X., Liao, L., Zhang, H., Nie, L., Hu, X., Chua, T. (2017). Neural Collaborative Filtering. In WWW '17: Proceedings of the 26th International Conference on World Wide Web, 173–182. DOI: http://dx.doi.org/10.1145/3038912.3052569
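In the paper, the NeuMF variant fuses the two branches differently from the `model_Z` cell below, which adds the scalar MF score to the MLP's sigmoid output. In NeuMF, the generalized matrix factorization (GMF) branch keeps the element-wise product `p_u * q_i` as a vector, the MLP branch processes the concatenation `[p_u; q_i]`, and the two resulting vectors are concatenated and passed through a single sigmoid output unit; the two branches also use separate embeddings. The NumPy forward pass below sketches that fusion with random, untrained weights and hypothetical layer sizes. It illustrates the paper's idea and is not this notebook's model:

```python
import numpy as np

rng = np.random.default_rng(0)
num_custs, num_prods = 5, 8
d_gmf, d_mlp, hidden = 8, 8, [16, 8]  # hypothetical sizes

def relu(x):
    return np.maximum(x, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Separate embedding tables for the GMF and MLP branches, as in NeuMF
P_gmf = rng.normal(scale=0.1, size=(num_custs, d_gmf))
Q_gmf = rng.normal(scale=0.1, size=(num_prods, d_gmf))
P_mlp = rng.normal(scale=0.1, size=(num_custs, d_mlp))
Q_mlp = rng.normal(scale=0.1, size=(num_prods, d_mlp))

# MLP tower weights: the input is the concatenation [p_u; q_i]
sizes = [2 * d_mlp] + hidden
W = [rng.normal(scale=0.1, size=(a, b)) for a, b in zip(sizes[:-1], sizes[1:])]
bias = [np.zeros(b) for b in hidden]
h = rng.normal(scale=0.1, size=d_gmf + hidden[-1])  # output-layer weights

def neumf(cust, prod):
    gmf = P_gmf[cust] * Q_gmf[prod]                 # element-wise product, (batch, d_gmf)
    x = np.concatenate([P_mlp[cust], Q_mlp[prod]], axis=1)
    for w, b in zip(W, bias):
        x = relu(x @ w + b)                         # (batch, hidden[-1]) after the loop
    fused = np.concatenate([gmf, x], axis=1)        # fusion by concatenation
    return sigmoid(fused @ h)                       # one probability per pair

probs = neumf(np.array([0, 1, 4]), np.array([2, 2, 7]))
print(probs.shape)  # (3,)
```

A practical consequence of the concatenation-then-sigmoid design is that the output stays in (0, 1), whereas adding an unbounded MF score to a sigmoid output does not.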

In [None]:
hidden_units = [128, 64]

user_id_input = layers.Input(shape=(1,), name="user_id", dtype=tf.int32)
prod_id_input = layers.Input(shape=(1,), name="prod_id", dtype=tf.int32)
# input_length is omitted: it is optional in Keras 2 and deprecated in Keras 3
user_embedded = layers.Embedding(num_custs,
                                 user_embedding_size,
                                 embeddings_regularizer=keras.regularizers.l2(1e-7),
                                 name="user_emb")(user_id_input)
cust_bias = layers.Embedding(num_custs, 1, name="cust_bias")(user_id_input)

# The product embedding must match the user embedding size for the Dot layer
prod_embedded = layers.Embedding(num_prods,
                                 user_embedding_size,
                                 embeddings_regularizer=keras.regularizers.l2(1e-6),
                                 name="prod_emb")(prod_id_input)
prod_bias = layers.Embedding(num_prods, 1, name="prod_bias")(prod_id_input)
 
dot_cust_product = layers.Dot(name="Dot", axes=2)([user_embedded, prod_embedded])
X = layers.Add(name="Add")([dot_cust_product, cust_bias, prod_bias])
X = layers.Flatten(name="Flat")(X)

concatenated = layers.Concatenate(name="concat")([user_embedded, prod_embedded])
Y = layers.Flatten(name="flat")(concatenated)
 
for n_hidden in hidden_units:
    Y = layers.Dense(n_hidden,
                     activation="relu",
                     kernel_regularizer=keras.regularizers.l2(0.001))(Y)
    Y = layers.Dropout(0.4)(Y)
    Y = layers.BatchNormalization()(Y)

Y = layers.Dense(1, activation="sigmoid", name="prediction")(Y)

Z = layers.Add(name="final")([X, Y])

model_Z = keras.Model(inputs = [user_id_input, prod_id_input],
                      outputs = Z, name="model_z")
 
model_Z.compile(loss=keras.losses.MeanSquaredError(),
                optimizer=keras.optimizers.Adam(learning_rate=0.0001))
 
model_Z.summary()

In [None]:
es = keras.callbacks.EarlyStopping(monitor="val_loss",
                                   mode="min",
                                   verbose=1,
                                   patience=5)

history = model_Z.fit([sample_df["cust"].values, sample_df["prod"].values],
                      sample_df["purchase"].values,
                      batch_size=256,
                      epochs=20,
                      verbose=1,
                      validation_split=0.1,
                      callbacks=[es])  # without this, early stopping never runs

In [None]:
plt.plot(history.history["loss"])
plt.plot(history.history["val_loss"])
plt.title("ncf loss")
plt.ylabel("loss")
plt.xlabel("epoch")
plt.legend(["train", "val"], loc="best")
plt.show()

# Model Evaluation

We measure model performance by scoring a set of candidate products with each model and ranking the outputs. For each customer, the candidate set consists of 49 products the customer did not purchase plus the target product stored in the `output_sequence` variable. If the target product appears among the top k model outputs, we count this event as a hit.

Hidasi and Karatzoglou (2018) define "recall@k" as "the proportion of cases having the desired item amongst the top-k items in all test cases." With a single target per test case, the hit rate described above is exactly recall@k. Another evaluation metric is "MRR@k", the average of the reciprocal ranks of the target items; the reciprocal rank is set to zero if the rank is above k.
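For a single test case, both metrics only need the rank of the target among the scored candidates. A small sketch with hypothetical scores; the helper names are illustrative, and ties are broken in the target's favour, an assumption the text above does not specify:

```python
import numpy as np

def rank_of_target(scores, target_idx):
    # 1-based rank of the target among candidates, highest score first
    # (ties broken in favour of the target)
    return int(np.sum(scores > scores[target_idx])) + 1

def recall_and_mrr_at_k(score_lists, target_idxs, k):
    # Recall@k: share of cases with the target in the top k
    # MRR@k: mean of 1/rank, counting 0 when rank > k
    hits, rr_sum = 0, 0.0
    for scores, t in zip(score_lists, target_idxs):
        rank = rank_of_target(np.asarray(scores), t)
        if rank <= k:
            hits += 1
            rr_sum += 1.0 / rank
    n = len(target_idxs)
    return hits / n, rr_sum / n

# Three hypothetical test cases with 5 candidate scores each
score_lists = [
    [0.9, 0.1, 0.3, 0.2, 0.05],  # target at index 0 -> rank 1
    [0.2, 0.8, 0.7, 0.1, 0.4],   # target at index 2 -> rank 2
    [0.6, 0.5, 0.4, 0.3, 0.01],  # target at index 4 -> rank 5
]
target_idxs = [0, 2, 4]

recall, mrr = recall_and_mrr_at_k(score_lists, target_idxs, k=2)
print(recall, mrr)  # recall = 2/3, mrr = (1 + 1/2) / 3 = 0.5
```

As a sanity check for the numbers below, a random scorer over 50 candidates has expected Recall@10 = 10/50 = 0.2 and MRR@10 = H_10 / 50 ≈ 0.059, where H_10 is the 10th harmonic number.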

In [None]:
def get_metrics(model, k, size=1000, n_neg=49):
    # Recall@k and MRR@k over the first `size` customers of sample_df: the
    # held-out target is ranked among up to n_neg distinct non-purchased products
    hit = 0
    mrr = 0.0
    cust_list = sample_df["CustomerID"].unique()[:size]
    all_products = frequent_df["input_sequence"]
    for cust_id in cust_list:
        cust_encoded = cust2cust_encoded[cust_id]
        cust_rows = frequent_df[frequent_df.index == cust_id]
        targets = list(cust_rows["output_sequence"].values[0])
        t = targets[0]
        excluded = set(cust_rows["input_sequence"]) | set(targets)
        # First n_neg distinct products this customer did not purchase
        negatives = all_products[~all_products.isin(excluded)].drop_duplicates().iloc[:n_neg]
        candidates = list(set(negatives).union(targets))
        prod_batch = np.array([[prod2prod_encoded[p]] for p in candidates], dtype="int32")
        cust_batch = np.full((len(candidates), 1), cust_encoded, dtype="int32")
        y_pred = model.predict([cust_batch, prod_batch], verbose=0).flatten()
        # Top-k candidates by descending predicted score
        top_k = [candidates[i] for i in np.argsort(-y_pred)[:k]]
        if t in top_k:
            hit += 1
            mrr += 1.0 / (top_k.index(t) + 1)

    return hit / len(cust_list), mrr / len(cust_list)

In [None]:
recallx, mrrx = get_metrics(model_X, 10)
print("Recall@10:", recallx)
print("MRR@10:", mrrx)

In [None]:
recally, mrry = get_metrics(model_Y, 10)
print("Recall@10:", recally)
print("MRR@10:", mrry)

In [None]:
recallz, mrrz = get_metrics(model_Z, 10)
print("Recall@10:", recallz)
print("MRR@10:", mrrz)

# Critiques

Please critique this study and point out flaws other than hyperparameter tuning. Any comment is more valuable than upvotes for this new notebook. For comparable metrics, please see https://medium.com/decathlondevelopers/building-a-rnn-recommendation-engine-with-tensorflow-505644aa9ff3, where the authors developed a model for more than 10,000 different products.

Some hyperparameters that should be tuned:

* Number of hidden layers and units per layer in *hidden_units*
* *prod_embedding_size* and *user_embedding_size*
* Regularizers, learning rate, activation functions, batch size, dropout rates
* *n_frequency*: customers must have purchased more than this many distinct products to be included
* *sample_size*: number of negative samples per positive sample

Some open questions:

* Can the prediction performance of this model be improved?
* Is negative sampling needed? Is there room for improvement by adjusting the negative sample size?
* Are there more suitable or effective techniques to measure model performance?
* Can the execution time be shortened?
* Are there other effective ways to predict the next product to buy using deep learning?

Sorry for my English.
Thanks in advance!