## SKU TO VEC

### Reference
https://arxiv.org/pdf/1103.0398.pdf

http://sebastianruder.com/word-embeddings-1/index.html#continuousbagofwordscbow

## Word embedding models

Any feed-forward neural network that takes words from a vocabulary as input, embeds them as vectors in a lower-dimensional space, and fine-tunes those vectors through back-propagation yields word embeddings as the weights of its first layer, which is usually called the Embedding Layer.

The main difference between such a network, which produces word embeddings as a by-product, and a method such as word2vec, whose explicit goal is to generate word embeddings, is computational complexity. Generating word embeddings with a very deep architecture is simply too computationally expensive for a large vocabulary. This is the main reason why it took until 2013 for word embeddings to explode onto the NLP stage; computational complexity is a key trade-off for word embedding models.

Such models are typically built from three kinds of layers:

#### 1 Embedding Layer:
A layer that generates word embeddings by multiplying an index vector with a word embedding matrix.
#### 2 Intermediate Layer(s):
One or more layers that produce an intermediate representation of the input, e.g. a fully-connected layer that applies a non-linearity to the concatenation of the word embeddings of the n previous words.
#### 3 Cost Layer:
The final layer that produces a probability distribution over the words in the vocabulary V.
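
The three components can be sketched in a few lines of NumPy. This is only an illustration under assumed toy sizes: `vocab_size`, `embed_dim`, `n_context`, `hidden_dim` and the `tanh` non-linearity are hypothetical choices, not values taken from this notebook.

```python
import numpy as np

rng = np.random.default_rng(0)

vocab_size = 10   # |V|, hypothetical toy vocabulary size
embed_dim = 4     # dimension of each word embedding
n_context = 3     # number of previous words fed to the model
hidden_dim = 8    # width of the intermediate layer

# 1. Embedding layer: a one-hot index vector times the embedding matrix
#    selects one row, so it is equivalent to a plain row lookup.
embedding_matrix = rng.normal(scale=0.05, size=(vocab_size, embed_dim))
context_ids = np.array([2, 5, 7])
one_hot = np.eye(vocab_size)[context_ids]          # (n_context, vocab_size)
embedded = one_hot @ embedding_matrix              # (n_context, embed_dim)
assert np.allclose(embedded, embedding_matrix[context_ids])

# 2. Intermediate layer: non-linearity over the concatenated embeddings.
w_hidden = rng.normal(scale=0.05, size=(n_context * embed_dim, hidden_dim))
hidden = np.tanh(embedded.reshape(-1) @ w_hidden)  # (hidden_dim,)

# 3. Cost layer: softmax turns the scores into a distribution over V.
w_out = rng.normal(scale=0.05, size=(hidden_dim, vocab_size))
logits = hidden @ w_out
probs = np.exp(logits - logits.max())              # subtract max for stability
probs /= probs.sum()
```

The softmax over all of V in the last step is the expensive part for a large vocabulary, which is the complexity trade-off discussed above.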

In [1]:
import tensorflow as tf
import numpy as np
%matplotlib inline
import matplotlib.pyplot as plt

## CBOW model

<img src="img/cbow.png">

## Pairwise ranking criterion

   <img src="img/margin_1_loss.png">
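
As a rough illustration, and assuming the figure shows the margin-1 hinge form of the pairwise ranking criterion used in the first reference, the per-pair loss could be written as follows. The function name and the `margin` parameter are hypothetical.

```python
def pairwise_ranking_loss(score_pos, score_neg, margin=1.0):
    """Hinge loss for one (positive, negative) pair.

    The loss is zero once the positive example outscores the
    negative one by at least `margin`.
    """
    return max(0.0, margin - score_pos + score_neg)


# Positive beats negative by more than the margin: no loss.
well_ranked = pairwise_ranking_loss(score_pos=2.0, score_neg=0.5)

# Negative scores higher than positive: the pair is penalised.
badly_ranked = pairwise_ranking_loss(score_pos=0.25, score_neg=0.5)
```

The criterion only asks for correct relative ordering of scores, so no normalisation over the whole vocabulary is needed.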
    

## Imports, config variables, and data generators

In [14]:
# Global config variables
batch_size = 16 # 128
num_classes = 10 # number of skus and zuids ~700.000 
state_size = 4 # 32, 64, 128
learning_rate = 0.1 
d_win = 1
n_negative = 7
n_positive = 1

#layer_1 
n_layer_1 = 8

#layer_2
n_layer_2 = 4

In [3]:
def gen_data(size=128):
    pass

def gen_batch(raw_data, batch_size, num_steps):
    pass

def gen_epochs(n, num_steps):
    for i in range(n):
        yield gen_batch(gen_data(), batch_size, num_steps)
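
The generators above are left as stubs. One possible way to produce training batches is sketched below, under the assumption that each example is a `(user, positive item, negative item)` index triple. The function names are hypothetical and the data is random placeholder data, not real transactions.

```python
import numpy as np


def sample_triples(n_triples, n_users, n_items, seed=0):
    """Draw random (user, positive item, negative item) index triples.

    Placeholder data only: real triples would come from transaction logs.
    """
    rng = np.random.default_rng(seed)
    users = rng.integers(0, n_users, size=n_triples)
    pos = rng.integers(0, n_items, size=n_triples)
    # Shift by 1..n_items-1 modulo n_items so negative != positive.
    neg = (pos + rng.integers(1, n_items, size=n_triples)) % n_items
    return np.stack([users, pos, neg], axis=1)


def iter_batches(triples, batch_size):
    """Yield (u, i, j) index columns of shape [batch_size, 1].

    A final incomplete batch is dropped so every batch has the same shape.
    """
    for start in range(0, len(triples) - batch_size + 1, batch_size):
        batch = triples[start:start + batch_size]
        yield batch[:, 0:1], batch[:, 1:2], batch[:, 2:3]


triples = sample_triples(n_triples=40, n_users=10, n_items=10)
batches = list(iter_batches(triples, batch_size=16))
```

Each yielded column has shape `[batch_size, 1]`, which matches index placeholders declared with that shape.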

### Model

If we treat the continuous sequence of a user's actions in a transaction as a sentence, each SKU plays the role of a word.
<img src="img/model_v1.png">
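
To make the sentence analogy concrete, the sketch below turns one session of SKUs into `(target, context)` pairs using a symmetric window of `d_win` items on each side. The helper name and the example SKU identifiers are hypothetical.

```python
def context_pairs(session, d_win=1):
    """Return (target_sku, context_skus) for every position in one session."""
    pairs = []
    for pos, target in enumerate(session):
        left = session[max(0, pos - d_win):pos]      # up to d_win items before
        right = session[pos + 1:pos + 1 + d_win]     # up to d_win items after
        pairs.append((target, left + right))
    return pairs


# One user's session, treated like a sentence of SKU "words".
session = ["sku_a", "sku_b", "sku_c", "sku_d"]
pairs = context_pairs(session, d_win=1)
# pairs[1] == ("sku_b", ["sku_a", "sku_c"])
```

Items at the start or end of a session simply get a shorter context.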

### Helper-functions for creating new variables

In [4]:
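def new_weights(shape):
    # Small random initial weights drawn from a truncated normal distribution.
    return tf.Variable(tf.truncated_normal(shape, stddev=0.05))

def new_biases(length):
    # Small positive constant as the initial bias value.
    return tf.Variable(tf.constant(0.05, shape=[length]))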

def new_fc_layer(input,          # The previous layer.
                 num_inputs,     # Num. inputs from prev. layer.
                 num_outputs,    # Num. outputs.
                 use_relu=True): # Use Rectified Linear Unit (ReLU)?

    # Create new weights and biases.
    weights = new_weights(shape=[num_inputs, num_outputs])
    biases = new_biases(length=num_outputs)

    # Calculate the layer as the matrix multiplication of
    # the input and weights, and then add the bias-values.
    layer = tf.matmul(input, weights) + biases

    # Use ReLU?
    if use_relu:
        layer = tf.nn.relu(layer)

    return layer

## Bayesian Personalized Ranking (BPR)

Loss function:

<img src="img/BPR_loss.png">

with:

<img src="img/BPR_loss_2.png">
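
For reference, here is a NumPy sketch of the loss, assuming the figures show the usual BPR form: the negative log-sigmoid of `x_uij = x_ui - x_uj`, with `x_ui` the dot product of a user vector and an item vector. The function name is hypothetical and any regularisation term is left out.

```python
import numpy as np


def bpr_loss(user_vecs, pos_vecs, neg_vecs):
    """Mean of -ln sigmoid(x_ui - x_uj) over a batch of triples.

    All inputs have shape (batch, dim); scores are row-wise dot products.
    """
    x_ui = np.sum(user_vecs * pos_vecs, axis=1)
    x_uj = np.sum(user_vecs * neg_vecs, axis=1)
    x_uij = x_ui - x_uj
    # ln sigmoid(x) = -ln(1 + exp(-x)), computed stably with logaddexp.
    log_sigmoid = -np.logaddexp(0.0, -x_uij)
    return -np.mean(log_sigmoid)


u = np.array([[1.0, 0.0]])
item_liked = np.array([[1.0, 0.0]])
item_other = np.array([[0.0, 1.0]])

loss_correct = bpr_loss(u, item_liked, item_other)   # liked item ranked first
loss_reversed = bpr_loss(u, item_other, item_liked)  # liked item ranked last
```

The loss shrinks as the positive item's score moves further above the negative item's score.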


In [19]:
# Placeholders
u_index = tf.placeholder(tf.int32, [batch_size, 1], name = 'zuid')
x_i_index = tf.placeholder(tf.int32, [batch_size, 1], name = 'xi')
x_j_index = tf.placeholder(tf.int32, [batch_size, 1], name = 'xj')

In [20]:
# Variables

#lookup table 
embedding_skus = new_weights([num_classes, state_size])
embedding_zuid = new_weights([num_classes, state_size])
# input vectors
u = tf.nn.embedding_lookup(embedding_zuid, u_index)
x_i = tf.nn.embedding_lookup(embedding_skus, x_i_index)
x_j = tf.nn.embedding_lookup(embedding_skus, x_j_index)

## Model 

<img src="img/BPR_loss_2.png">

In [28]:
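# u, x_i and x_j have shape [batch_size, 1, state_size], so the batched
# matmul computes one dot product per example with shape [batch_size, 1, 1].
x_ui = tf.matmul(u, x_i, transpose_a=False, transpose_b=True)
# Flatten to one score per example: shape [batch_size].
x_ui = tf.reshape(x_ui, [-1])
x_uj = tf.matmul(u, x_j, transpose_a=False, transpose_b=True)
x_uj = tf.reshape(x_uj, [-1])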

<img src="img/BPR_loss.png">

In [45]:
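# Score difference clipped at 1. Maximizing this is a margin-1 (hinge-style)
# surrogate; it is not the log-sigmoid form ln(sigmoid(x_ui - x_uj)).
x_uij = tf.minimum(x_ui - x_uj, 1.0)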

## Cost-function to be optimized

In [46]:
loss = - tf.reduce_mean(x_uij)
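
This cost can be related to the margin-1 pairwise ranking criterion. Writing $d = x_{ui} - x_{uj}$ for the score difference of one triple, a short derivation (a sketch, not taken from the figures above) gives:

$$
-\min(d,\, 1) \;=\; \max(-d,\, -1) \;=\; \max(0,\, 1 - d) - 1
$$

Minimizing `loss` is therefore the same as minimizing the mean hinge loss $\max(0, 1 - d)$ shifted by a constant. A log-sigmoid BPR objective would use $-\ln \sigma(d)$ in its place.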

## Optimization Method

In [47]:
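# Note: the step size is fixed at 1e-4 here; the global `learning_rate`
# from the config cell is not used.
optimizer = tf.train.AdamOptimizer(learning_rate=1e-4).minimize(loss)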

## Support function AUC estimator

<img src="img/AUC.png">

In [48]:
# AUC for each user:

# average AUC:
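
The cell above is still empty. A minimal NumPy sketch of the two quantities follows, assuming the figure defines the per-user AUC as the fraction of (positive, negative) item pairs in which the positive item gets the higher score. The function names and toy scores are hypothetical.

```python
import numpy as np


def user_auc(scores, positive_items):
    """Fraction of (positive, negative) item pairs ranked correctly for one user."""
    is_pos = np.zeros(len(scores), dtype=bool)
    is_pos[list(positive_items)] = True
    pos_scores = scores[is_pos]
    neg_scores = scores[~is_pos]
    # Compare every positive score with every negative score.
    correct = pos_scores[:, None] > neg_scores[None, :]
    return float(correct.mean())


def average_auc(score_matrix, positives_per_user):
    """Mean per-user AUC; users with no positives or no negatives are skipped."""
    n_items = score_matrix.shape[1]
    aucs = [
        user_auc(score_matrix[user], positives)
        for user, positives in positives_per_user.items()
        if 0 < len(positives) < n_items
    ]
    return float(np.mean(aucs))


# Toy predicted scores: 2 users x 4 items.
score_matrix = np.array([[0.9, 0.1, 0.5, 0.3],
                         [0.2, 0.8, 0.4, 0.6]])
positives_per_user = {0: [0, 2], 1: [0]}
mean_auc = average_auc(score_matrix, positives_per_user)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to every positive item being scored above every negative item.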


## TensorFlow Run