# Intro

Recommender systems can be used to evaluate cross-selling opportunities in retail marketing. For further reading on the key concepts of "recommender systems", "cross-selling", "collaborative filtering" and "deep learning", and their applications, see related papers such as:

1. Kamakura, W. A., Wedel, M., de Rosa, F., Mazzon, J. A. (2003), Cross-selling through database marketing: A mixed data factor analyzer for data augmentation and prediction. International Journal of Research in Marketing, 20, 45–65.

2. Knott, A., Hayes, A., & Neslin, S. A. (2002), Next-product-to-buy models for cross-selling applications. Journal of Interactive Marketing, 16(3), 59–75.

3. Thuring F., Nielsen J.P., Guillén M., Bolancé C., (2012), Selecting prospects for cross-selling financial products using multivariate credibility, Expert Systems with Applications 39, 8809–8816.

4. Zhang S., Yao L., Sun A., Tay Y., (2018), Deep Learning based Recommender System: A Survey and New Perspectives. ACM Comput. Surv. 1(1), 1-35.

5. Shi Y., Larson M., Hanjalic A., (2014), Collaborative filtering beyond the user-item matrix: A survey of the state of the art and future challenges. ACM Comput. Surv. 47(1) 45. DOI: http://dx.doi.org/10.1145/2556270

6. Hidasi B., Karatzoglou A., (2018), Recurrent Neural Networks with Top-k Gains for Session-based Recommendations. In The 27th ACM International Conference on Information and Knowledge Management (CIKM ’18), October 22–26, 2018, Torino, Italy. ACM, New York, NY, USA, 10 pages. DOI: https://doi.org/10.1145/3269206.3271761

We implement a collaborative filtering system using embedding layers for user-item instances. The main aim is to predict each customer's next product to buy using implicit feedback from purchase preferences. The Python code is mostly adapted from the following notebooks:

* [colinmorris-1](https://www.kaggle.com/colinmorris/embedding-layers)
* [colinmorris-2](https://www.kaggle.com/colinmorris/matrix-factorization)
* [rajmehra03-1](https://www.kaggle.com/rajmehra03/a-detailed-explanation-of-keras-embedding-layer)
* [rajmehra03-2](https://www.kaggle.com/rajmehra03/cf-based-recsys-by-low-rank-matrix-factorization)
* [rounakbanik](https://www.kaggle.com/rounakbanik/movie-recommender-systems)
* [keras.io examples](https://keras.io/examples/structured_data/collaborative_filtering_movielens/)

We evaluate the system following the instructions from:

[jamesloy](https://www.kaggle.com/jamesloy/deep-learning-based-recommender-systems)

The dataset chosen is the well-known Online Retail II. For detailed information please visit:

[https://www.kaggle.com/mashlyn/online-retail-ii-uci](https://www.kaggle.com/mashlyn/online-retail-ii-uci)

and for the original source:

[UCI Repository](https://archive.ics.uci.edu/ml/datasets/Online+Retail+II)

# Libraries

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import tensorflow as tf

from sklearn.model_selection import train_test_split
from tensorflow import keras
from tensorflow.keras import layers

# Data & Preparation

In [None]:
data = pd.read_csv("../input/online-retail-ii-uci/online_retail_II.csv",
                   parse_dates=["InvoiceDate"],
                   dtype={"Customer ID":"object"})

In [None]:
df = data.copy()
df.head()

## A brief of data cleaning

Dropping rows with missing values, negative quantities, and irrelevant stock codes (test entries and postage).

In [None]:
df = df.dropna()
df = df.drop(df[df["Quantity"]<0].index)
df = df.drop(df[df["StockCode"].str.contains("TEST")].index)
df = df.drop(df[df["StockCode"]=="POST"].index)

df = df.sort_values("InvoiceDate")

## Common Functions

The first function returns the unique elements of a list. The second generates product purchase sequences, each paired with a target sequence of length "n_target" that occurs after the purchased products. The last one generates negative samples.

In [None]:
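def unique(list1):
    # dict.fromkeys drops duplicates while keeping first-occurrence order;
    # a plain set() would scramble the chronological purchase order that
    # generate_sequence relies on when it slices off the last n_target items.
    return list(dict.fromkeys(list1))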

def generate_sequence(serie, n_target):
    input_sequence = []
    output_sequence = []
    for x in serie:
        x = unique(x)
        if len(x)>n_target:
            input_sequence.append(x[:-n_target])
            output_sequence.append(x[-n_target:])
    return input_sequence, output_sequence

def agg(x, corp, sample_size=1):
    diff = np.setdiff1d(corp, list(x))
    ind = np.random.permutation(len(diff))
    return diff[ind[:int(sample_size*len(x))]]
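
As a small illustration of the splitting step, the sketch below runs a self-contained copy of `generate_sequence` on two made-up purchase lists (the product codes `"A"` to `"D"` are hypothetical). It assumes an order-preserving de-duplication via `dict.fromkeys`, so that the last `n_target` items really are the most recent first-time purchases.

```python
def unique_in_order(items):
    # Assumption: keep first-occurrence order while dropping duplicates
    return list(dict.fromkeys(items))

def generate_sequence(series, n_target):
    input_sequence, output_sequence = [], []
    for purchases in series:
        distinct = unique_in_order(purchases)
        # Customers with too few distinct products are skipped
        if len(distinct) > n_target:
            input_sequence.append(distinct[:-n_target])
            output_sequence.append(distinct[-n_target:])
    return input_sequence, output_sequence

# Hypothetical purchase histories, already sorted by date
toy_purchases = [["A", "B", "A", "C", "D"], ["B", "B"]]
inputs, targets = generate_sequence(toy_purchases, n_target=1)
print(inputs)   # [['A', 'B', 'C']]
print(targets)  # [['D']]
```

The second customer bought only one distinct product, so no input/target pair is produced for them.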

In the following cells, we build each customer's purchase sequence and count the distinct products in it.

In [None]:
by_customer = df.groupby("Customer ID", as_index=False).agg(
    {"StockCode": [lambda x: list(x)]}
)
sequential_df = by_customer["StockCode"].rename(
    columns={"<lambda>":"purchase_sequence"}
)
sequential_df["CustomerID"] = by_customer["Customer ID"]
sequential_df["product_count"] = sequential_df["purchase_sequence"].apply(
    lambda x: len(unique(list(x)))
)

We choose some hyperparameter values arbitrarily, but it is good practice to look at some statistics first, such as the number of distinct products purchased per customer shown below.

In [None]:
sequential_df["product_count"].describe()

In [None]:
n_target = 1
n_embedding = 16
n_frequency = 3
corp = sequential_df.explode("purchase_sequence")["purchase_sequence"].unique()
# .copy() avoids pandas' SettingWithCopyWarning on the column assignments below
frequent_df = sequential_df[sequential_df["product_count"] > n_frequency].copy()

input_seq, output_seq = generate_sequence(
    frequent_df["purchase_sequence"],
    n_target
    )

frequent_df["input_sequence"] = input_seq
frequent_df["output_sequence"] = output_seq
frequent_df = frequent_df[["CustomerID", "input_sequence", "output_sequence"]]
frequent_df = frequent_df.explode("input_sequence")
frequent_df["purchase"] = 1
frequent_df = frequent_df.set_index("CustomerID", drop=True)
frequent_df.head(10)

## Negative Sampling & Merging

Since all instances prepared so far represent positive-only feedback, we supply some negative information to the model. Negative instances are chosen from the products a particular customer has not purchased.
> sample_size=1 

means that, for each customer, one non-purchased product is selected at random for every purchased product, so positives and negatives are balanced.
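
The sampling ratio can be illustrated on toy data. The sketch below is a plain-Python stand-in for the notebook's `agg` helper (the function name `sample_negatives`, the catalogue, and the seed are made up for the example); it draws `sample_size * len(purchased)` products from those the customer has not bought.

```python
import random

def sample_negatives(purchased, catalogue, sample_size=1, seed=0):
    # Products the customer has not bought are the negative candidates
    not_purchased = sorted(set(catalogue) - set(purchased))
    n_samples = min(int(sample_size * len(purchased)), len(not_purchased))
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    return rng.sample(not_purchased, n_samples)

catalogue = ["A", "B", "C", "D", "E", "F"]  # hypothetical product codes
purchased = ["A", "C"]
negatives = sample_negatives(purchased, catalogue, sample_size=1)
print(len(negatives))  # 2
```

With `sample_size=2` the same customer would receive four negatives, which is one way to experiment with the positive-to-negative ratio.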

In [None]:
new_df = frequent_df.reset_index().groupby("CustomerID").agg({"input_sequence": (lambda x: list(x))})
new_df["agg"] = new_df["input_sequence"].apply(lambda y: agg(y, corp, 1))
ndf = new_df.explode("agg")[["agg"]]
ndf["purchase"] = 0
ndf = ndf.rename(columns={"agg":"input_sequence"})

pdf = frequent_df[["input_sequence", "purchase"]]

sample_df = pd.concat([pdf, ndf])  # DataFrame.append was removed in pandas 2.0
sample_df = sample_df.reset_index()
sample_df = sample_df.sort_values("CustomerID", ignore_index=True)

sample_df.info()  # info() prints directly and returns None
display(sample_df.head(50))

## Encoding & Splitting

As a last step we encode the customer and product identifiers as integer indices. The method is taken from the [keras.io](https://keras.io/examples/structured_data/collaborative_filtering_movielens/) examples. We split the data into train and validation sets only; a better practice is to hold out some samples in advance as test data.

In [None]:
cust_ids = sample_df["CustomerID"].unique().tolist()
cust2cust_encoded = {x: i for i, x in enumerate(cust_ids)}
cust_encoded2cust = {i: x for i, x in enumerate(cust_ids)}
prod_ids = corp
prod2prod_encoded = {x: i for i, x in enumerate(prod_ids)}
prod_encoded2prod = {i: x for i, x in enumerate(prod_ids)}
sample_df["cust"] = sample_df["CustomerID"].map(cust2cust_encoded)
sample_df["prod"] = sample_df["input_sequence"].map(prod2prod_encoded)

num_custs = len(cust2cust_encoded)
num_prods = len(prod2prod_encoded)
sample_df["purchase"] = sample_df["purchase"].values.astype(np.float32)

print(
    "Number of Customers: {}, Number of Products: {}, Purchase: {}, Not Purchase: {}".format(
        num_custs, num_prods, 1, 0
    )
)

sample_df = sample_df.sample(frac=1, random_state=52)
X = sample_df[["cust", "prod"]].values
y = sample_df["purchase"].values

train_indices = int(0.8 * sample_df.shape[0])
X_train, X_val, y_train, y_val = (X[:train_indices],
                                  X[train_indices:],
                                  y[:train_indices],
                                  y[train_indices:])
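
The hold-out idea can be sketched as a three-way split. This is only an illustration on a toy range of row indices; the 70/15/15 proportions and the helper name `three_way_split` are assumptions, not values used in this notebook.

```python
import random

def three_way_split(n_rows, train_frac=0.7, val_frac=0.15, seed=52):
    # Shuffle row indices once, then cut them into train / validation / test
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_train = int(train_frac * n_rows)
    n_val = int(val_frac * n_rows)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # never seen during training or tuning
    return train, val, test

train_idx, val_idx, test_idx = three_way_split(100)
print(len(train_idx), len(val_idx), len(test_idx))  # 70 15 15
```

The resulting index lists could then be used to slice `X` and `y`, keeping the test rows untouched until the final evaluation.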

# Baseline Model & Training

In [None]:
class RecommenderNet(keras.Model):
    def __init__(self, num_custs, num_prods, embedding_size, **kwargs):
        super(RecommenderNet, self).__init__(**kwargs)
        self.num_custs = num_custs
        self.num_prods = num_prods
        self.embedding_size = embedding_size
        self.cust_embedding = layers.Embedding(
            num_custs,
            embedding_size,
            embeddings_initializer="he_normal",
            embeddings_regularizer=keras.regularizers.l2(1e-7),
        )
        self.cust_bias = layers.Embedding(num_custs, 1)
        self.prod_embedding = layers.Embedding(
            num_prods,
            embedding_size,
            embeddings_initializer="he_normal",
            embeddings_regularizer=keras.regularizers.l2(1e-6),
        )
        self.prod_bias = layers.Embedding(num_prods, 1)

    def call(self, inputs):
        cust_vector = self.cust_embedding(inputs[:, 0])
        cust_bias = self.cust_bias(inputs[:, 0])
        prod_vector = self.prod_embedding(inputs[:, 1])
        prod_bias = self.prod_bias(inputs[:, 1])
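        # Row-wise dot product: one score per (customer, product) pair.
        # tf.tensordot(..., 2) would also contract the batch axis and return a single scalar.
        dot_cust_product = tf.reduce_sum(cust_vector * prod_vector, axis=1, keepdims=True)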
        x = dot_cust_product + cust_bias + prod_bias
        return tf.nn.sigmoid(x)

model = RecommenderNet(num_custs, num_prods, n_embedding)

model.compile(loss=tf.keras.losses.BinaryCrossentropy(),
              optimizer=keras.optimizers.Adam(learning_rate=0.001))
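
For reference, the score this kind of model is meant to produce for one (customer, product) pair is a sigmoid of the dot product of the two embeddings plus the two bias terms. The plain-Python sketch below uses made-up 3-dimensional embeddings and biases (all numbers are hypothetical) to show the arithmetic for a single pair.

```python
import math

def pair_score(cust_vec, prod_vec, cust_bias, prod_bias):
    # Dot product of the two embeddings, shifted by both biases
    logit = sum(c * p for c, p in zip(cust_vec, prod_vec)) + cust_bias + prod_bias
    return 1.0 / (1.0 + math.exp(-logit))  # sigmoid maps the logit into (0, 1)

cust_vec = [0.5, -1.0, 0.25]  # hypothetical customer embedding
prod_vec = [1.0, 0.5, 2.0]    # hypothetical product embedding
score = pair_score(cust_vec, prod_vec, cust_bias=0.1, prod_bias=-0.1)
print(round(score, 3))  # 0.622
```

In a batch, this computation should be done independently for every row, so that each pair gets its own score.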

In [None]:
es = keras.callbacks.EarlyStopping(monitor="val_loss",
                                   mode="min",
                                   verbose=1,
                                   patience=5)

history = model.fit(X_train, y_train,
                    batch_size=256,
                    epochs=20,
                    verbose=1,
                    validation_data=(X_val, y_val),
                    callbacks=[es])  # without this, the EarlyStopping callback is never used

In [None]:
plt.plot(history.history["loss"])
plt.plot(history.history["val_loss"])
plt.title("embedding loss")
plt.ylabel("loss")
plt.xlabel("epoch")
plt.legend(["train", "val"], loc="upper left")
plt.show()

# Model Evaluation

We measure the model performance by providing candidate products to the model and evaluating the outputs. The candidate set consists of up to 49 products selected from those the customer has not purchased, plus the target product, which is represented in the output_sequence variable. If the target product occurs in the top 20 of the model outputs, we count this event as a hit.

Hidasi and Karatzoglou (2018) define the evaluation metric "recall@20" as "the proportion of cases having the desired item amongst the top-20 items in all test cases."
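
To make the metric concrete, the sketch below computes a hit for a single test case and then recall@k over a few made-up cases. The helper names `hit_at_k` and `recall_at_k`, the scores, and the product codes are all hypothetical.

```python
def hit_at_k(scores, target, k):
    # scores: dict mapping candidate product -> predicted score
    ranked = sorted(scores, key=scores.get, reverse=True)
    return target in ranked[:k]

def recall_at_k(cases, k):
    # cases: list of (scores, target) pairs, one per test customer
    hits = sum(hit_at_k(scores, target, k) for scores, target in cases)
    return hits / len(cases)

cases = [
    ({"A": 0.9, "B": 0.2, "C": 0.6}, "C"),  # target ranked 2nd -> hit at k=2
    ({"A": 0.1, "B": 0.8, "C": 0.7}, "A"),  # target ranked 3rd -> miss at k=2
]
print(recall_at_k(cases, k=2))  # 0.5
```

Note that if a candidate list holds 50 products, a model that ranks them at random would already place the target in the top 20 about 40% of the time, which is a useful baseline to keep in mind when reading the result.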

## Output Examples

In [None]:
cust_id = sample_df["CustomerID"].sample(1).iloc[0]
cust_encoder = cust2cust_encoded.get(cust_id)
purchased = frequent_df[(frequent_df.index==cust_id) & (frequent_df["purchase"]==1)]

candidates = frequent_df[~frequent_df["input_sequence"].isin(purchased["input_sequence"].values)]["input_sequence"][:49]
candidates = set(candidates).intersection(set(prod2prod_encoded.keys()))
candidates = candidates.union(set(frequent_df[frequent_df.index==cust_id]["output_sequence"].values[0]))
candidates = [[prod2prod_encoded.get(x)] for x in list(candidates)]

cust_prod_array = np.hstack(([[cust_encoder]] * len(candidates), candidates))

vals = model.predict(cust_prod_array).flatten()
top_ratings_indices = vals.argsort()[-20:][::-1]

recommended_prod_ids = [prod_encoded2prod.get(candidates[x][0]) for x in top_ratings_indices]

print("Showing recommendations for user: {}".format(cust_id))
print("====" * 12)
print("Products purchased by customer")
print("----" * 8)
print(frequent_df[frequent_df.index==cust_id])

print("----" * 8)
print("Top 20 product recommendations")
print("----" * 8)
print(recommended_prod_ids)

In [None]:
counter = 0
size = 100
for s in range(size):
    cust_id = sample_df["CustomerID"].unique()[s]
    cust_encoder = cust2cust_encoded.get(cust_id)
    purchased = frequent_df[(frequent_df.index==cust_id) & (frequent_df["purchase"]==1)]

    candidates = frequent_df[~frequent_df["input_sequence"].isin(purchased["input_sequence"].values)]["input_sequence"][:49]
    candidates = set(candidates).intersection(set(prod2prod_encoded.keys()))
    candidates = candidates.union(set(frequent_df[frequent_df.index==cust_id]["output_sequence"].values[0]))
    candidates = [[prod2prod_encoded.get(x)] for x in list(candidates)]

    cust_prod_array = np.hstack(([[cust_encoder]] * len(candidates), candidates))

    vals = model.predict(cust_prod_array).flatten()
    top_ratings_indices = vals.argsort()[-20:][::-1]

    recommended_prod_ids = [prod_encoded2prod.get(candidates[x][0]) for x in top_ratings_indices]
    target_prod_ids = frequent_df.loc[(frequent_df.index==cust_id), "output_sequence"].values[0]
    if len(np.setdiff1d(target_prod_ids, recommended_prod_ids)) < n_target:
        counter = counter + 1
        
print("recall@20 for the first", size, "customers:", counter/size)

# Critique

Please critique this study and point out any flaws other than hyperparameter tuning. Any comment is more valuable than upvotes for this fresh notebook. To compare metrics, please see: https://medium.com/decathlondevelopers/building-a-rnn-recommendation-engine-with-tensorflow-505644aa9ff3. They developed a model for more than 10,000 different products.

Some open questions:

* Can the prediction performance of this model be improved?
* Is there a need for negative sampling? Is there room for improvement by adjusting the negative sample size?
* Are there more suitable or effective techniques to measure the performance of the model?
* Can execution time be shortened?
* Are there other effective ways to predict the next product to buy using deep learning?

Sorry for language...
Thanks in advance