# DVBPR Model – Deep Visually-Aware Bayesian Personalized Ranking

The script ___original-source-code/DVBPR/main.py___ contains both the definition of the DVBPR model and its training. When translating it into Keras-based code, we separate the model definition from the training pipeline.

> DVBPR jointly learns user latent factors and extracts task-guided visual features from implicit feedback for fashion recommendation.

The modules we need to build are the following.
* ___Convolutional Neural Network (CNN)___ 
    > Our last layer has K dimensions. Here, instead of seeking a final layer that can be adapted to general-purpose prediction tasks, we hope to learn a representation whose dimensions explain the variance in users’ fashion preferences.

* [___Convolutional Siamese Network___](https://sorenbouma.github.io/blog/oneshot/)

###  Run-through: How to build models in Keras

There are two ways to build Keras models: __sequential__ and __functional__.

* The sequential API allows you to create models __layer-by-layer__ for most problems. It is limited in that it ___does not___ allow you to create models that share layers or have multiple inputs or outputs.

* Alternatively, the functional API gives you ___more flexibility___, since you can define models where __layers connect to more than just the previous and next layers__. In fact, you can connect a layer to any other layer. As a result, building complex networks such as siamese networks becomes possible.

## Siamese leg – Convolutional Neural Network

###  Run-through: CNN Layers

* __Batch Normalization__ normalizes the activations of a given input volume before passing it to the next layer in the network. In practice it often reduces the number of epochs needed to train a CNN and helps stabilize training.
* __POOL__ layers progressively reduce the spatial size (i.e. width and height) of the input volume to a layer. It is common to insert POOL layers between consecutive CONV layers in a CNN architecture.
* __Dropout__ randomly disconnects neurons between layers during training, which forces the network to be more robust. It typically reduces overfitting and helps the network generalize to unseen images.
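
To make the three layer types concrete, here is a minimal pure-Python sketch of what each one computes on toy data. The helper names (batch_norm, max_pool_2x2, dropout) are illustrative and not part of Keras. Real Keras layers work on 4-D tensors, and batch normalization additionally learns a scale and a shift, which this sketch leaves out.

```python
import math
import random


def batch_norm(values, eps=1e-5):
    """Normalize a list of activations to zero mean and (roughly) unit variance."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(var + eps) for v in values]


def max_pool_2x2(grid):
    """2x2 max pooling with stride 2 on a 2-D list with even height and width."""
    return [[max(grid[r][c], grid[r][c + 1], grid[r + 1][c], grid[r + 1][c + 1])
             for c in range(0, len(grid[0]), 2)]
            for r in range(0, len(grid), 2)]


def dropout(values, rate, rng):
    """Inverted dropout: zero each activation with probability `rate`
    and rescale the survivors by 1 / (1 - rate)."""
    keep = 1.0 - rate
    return [v / keep if rng.random() >= rate else 0.0 for v in values]


normalized = batch_norm([1.0, 2.0, 3.0, 4.0])
pooled = max_pool_2x2([[1, 2, 5, 6],
                       [3, 4, 7, 8],
                       [9, 8, 1, 0],
                       [7, 6, 2, 3]])
dropped = dropout([1.0] * 10, rate=0.5, rng=random.Random(0))
print(pooled)  # [[4, 8], [9, 3]]
```

Each 2x2 block of the 4x4 grid collapses to its maximum, which is why pooling halves the width and height.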

Via the __Sequential model API__, ___layers are added piecewise via the Sequential object___.

In [1]:
# Libraries
import numpy as np

from keras import backend as K
from keras.models import Sequential
from keras.layers import ZeroPadding2D, MaxPooling2D
from keras.layers.normalization import BatchNormalization
from keras.layers.convolutional import Conv2D
from keras.layers.core import Activation
from keras.layers.core import Dropout
from keras.layers.core import Flatten, Dense
from keras.regularizers import l2

Using TensorFlow backend.


In [2]:
def CNN(width,
        height,
        depth,
        latent_dim,
        w_init="RandomNormal",
        cnn_w_regularizer=None,
        fc_w_regularizer=None,
        b_init="RandomNormal"):
        """
        Build the CNN.

            :param width (int): Image width in pixels.
            :param height (int): Image height in pixels.
            :param depth (int): Number of image channels.
            :param latent_dim (int): Dimension of the latent space, i.e. the size of the image embedding.
            :param w_init="RandomNormal" (str): The kernel initializer.
            :param cnn_w_regularizer=None: Kernel regularizer for the CONV layers, e.g. l2(1e-3).
            :param fc_w_regularizer=None: Kernel regularizer for the fully-connected layers.
            :param b_init="RandomNormal" (str): The bias initializer.
            :return: The Sequential model that maps an image to its embedding.
        """

        # Initialize the model along with the input shape to be
        # "channels last" and the channels dimension itself
        model = Sequential(name='cnn')
        input_shape = (height, width, depth)
        chan_dim = -1

        # if we are using "channels first", update the input shape
        # and channels dimension
        if K.image_data_format() == "channels_first":
            input_shape = (depth, height, width)
            chan_dim = 1

        # conv1
        #
        # Our first CONV layer learns a total of 64 filters, each
        # of size 11x11, applied with 4x4 strides to reduce
        # the spatial dimensions of the volume.
        # A max-pooling layer follows.
        model.add(Conv2D(64, (11, 11),
                         strides=(4, 4),
                         padding="valid",
                         kernel_initializer=w_init,
                         kernel_regularizer=cnn_w_regularizer,
                         bias_initializer=b_init,
                         input_shape=input_shape))
        model.add(Activation("relu"))
        model.add(MaxPooling2D(pool_size=(2, 2),
                               padding="same"))

        # conv2
        #
        # Here we stack one more CONV layer on top,
        # which learns a total of 256 filters of size 5x5.
        # A max-pooling layer follows.
        model.add(ZeroPadding2D(padding=(2, 2)))
        model.add(Conv2D(256, (5, 5),
                         strides=(1, 1),
                         kernel_initializer=w_init,
                         kernel_regularizer=cnn_w_regularizer,
                         bias_initializer=b_init))
        model.add(Activation("relu"))
        model.add(MaxPooling2D(pool_size=(2, 2),
                               padding="same"))

        # conv3
        #
        # Stack one more CONV layer, keeping 256 total learned filters
        # but decreasing the size of each filter to 3x3
        model.add(ZeroPadding2D(padding=(1, 1)))
        model.add(Conv2D(256, (3, 3),
                         strides=(1, 1),
                         kernel_initializer=w_init,
                         kernel_regularizer=cnn_w_regularizer,
                         bias_initializer=b_init))
        model.add(Activation("relu"))

        # Two more CONV layers, same filter size and number
        #
        # conv4
        model.add(ZeroPadding2D(padding=(1, 1)))
        model.add(Conv2D(256, (3, 3),
                         strides=(1, 1),
                         kernel_initializer=w_init,
                         kernel_regularizer=cnn_w_regularizer,
                         bias_initializer=b_init))
        model.add(Activation("relu"))

        # conv5
        model.add(ZeroPadding2D(padding=(1, 1)))
        model.add(Conv2D(256, (3, 3),
                         strides=(1, 1),
                         kernel_initializer=w_init,
                         kernel_regularizer=cnn_w_regularizer,
                         bias_initializer=b_init))
        model.add(Activation("relu"))
        model.add(MaxPooling2D(pool_size=(2, 2),
                               padding="same"))

        # Two fully-connected layers on top of each other
        #
        # full1
        model.add(Flatten())
        model.add(Dense(4096,
                        kernel_initializer=w_init,
                        kernel_regularizer=fc_w_regularizer,
                        bias_initializer=b_init))
        model.add(Activation("relu"))
        model.add(Dropout(0.5))

        # full2
        model.add(Dense(4096,
                        kernel_initializer=w_init,
                        kernel_regularizer=fc_w_regularizer,
                        bias_initializer=b_init))
        model.add(Activation("relu"))
        model.add(Dropout(0.5))

        # full3
        model.add(Dense(latent_dim,
                        kernel_initializer=w_init,
                        kernel_regularizer=fc_w_regularizer,
                        bias_initializer=b_init))

        # No classifier layer (e.g. see softmax below) is added,
        # since the goal here is an embedding model, not solving a prediction task
        # model.add(Dense(classes))
        # model.add(Activation("softmax"))

        # Return the constructed network architecture
        return model

#### Let's display it!

In [3]:
conv_net = CNN(width=224,
               height=224,
               depth=3,
               latent_dim=100,
               cnn_w_regularizer=l2(1e-3),
               fc_w_regularizer=l2(1e-3))

In [4]:
print("\nSIAMESE LEG - CNN")
conv_net.summary()


SIAMESE LEG - CNN
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_1 (Conv2D)            (None, 54, 54, 64)        23296     
_________________________________________________________________
activation_1 (Activation)    (None, 54, 54, 64)        0         
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 27, 27, 64)        0         
_________________________________________________________________
zero_padding2d_1 (ZeroPaddin (None, 31, 31, 64)        0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 27, 27, 256)       409856    
_________________________________________________________________
activation_2 (Activation)    (None, 27, 27, 256)       0         
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 14, 14, 256)       0 
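
The output shapes and parameter counts in the summary can be checked by hand. The sketch below reproduces them with the standard size rules for a "valid" convolution and for "same" max pooling. The helper names are illustrative. The final two values (spatial size 7 and 12544 flattened features) are derived with the same rules for the layers that the truncated summary does not show.

```python
import math


def conv_out(size, kernel, stride=1, pad=0):
    """Spatial output size of a 'valid' convolution after `pad` zeros per side."""
    return (size + 2 * pad - kernel) // stride + 1


def pool_out_same(size, stride=2):
    """Spatial output size of max pooling with padding='same'.

    With 'same' padding the size depends only on the stride; the default
    stride of MaxPooling2D equals its pool size (2 here).
    """
    return math.ceil(size / stride)


def conv_params(kernel, in_channels, filters):
    """Kernel weights plus one bias per filter."""
    return kernel * kernel * in_channels * filters + filters


conv1_size = conv_out(224, kernel=11, stride=4)    # conv2d_1
pool1_size = pool_out_same(conv1_size)             # max_pooling2d_1
conv2_size = conv_out(pool1_size, kernel=5, pad=2) # zero_padding2d_1 + conv2d_2
pool2_size = pool_out_same(conv2_size)             # max_pooling2d_2

size = pool2_size
for _ in range(3):                                 # conv3, conv4, conv5 (3x3, pad 1)
    size = conv_out(size, kernel=3, pad=1)
size = pool_out_same(size)                         # last max pooling
flat_features = size * size * 256                  # input width of the first Dense layer

print(conv1_size, pool1_size, conv2_size, pool2_size)   # 54 27 27 14
print(conv_params(11, 3, 64), conv_params(5, 64, 256))  # 23296 409856
print(size, flat_features)                              # 7 12544
```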

## ConvSiameseNet

Since we want the same parameters to be used for both inputs:

1. We define the twin network’s architecture once, as a Sequential() model.
2. We call it on each of the two input layers.
3. The two encoded outputs are merged through their absolute (L1) distance.
4. An output layer is added.
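
The four steps above can be sketched without Keras. In this toy version a small linear map stands in for the CNN, and the names make_encoder, shared_weights and theta_u are hypothetical. The point is only that calling one encoder on both inputs means both legs use the same weights.

```python
def make_encoder(weights):
    """Return a toy linear 'network' that closes over one weight matrix."""
    def encode(x):
        return [sum(w * v for w, v in zip(row, x)) for row in weights]
    return encode


# Step 1: define the twin architecture once (a single set of weights).
shared_weights = [[1.0, 0.0, -1.0],
                  [0.5, 2.0, 0.0]]
encoder = make_encoder(shared_weights)

# Step 2: call the same encoder on each of the two inputs.
left_input = [1.0, 2.0, 3.0]
right_input = [0.0, 1.0, 5.0]
encoded_l = encoder(left_input)
encoded_r = encoder(right_input)

# Step 3: merge the two encodings through the element-wise absolute distance.
l1_distance = [abs(a - b) for a, b in zip(encoded_l, encoded_r)]

# Step 4: output layer, here a dot product with a hypothetical user vector.
theta_u = [0.2, 0.1]
score = sum(t * d for t, d in zip(theta_u, l1_distance))

print(encoded_l, encoded_r, l1_distance)  # [-2.0, 4.5] [-5.0, 2.0] [3.0, 2.5]
```

Because the merge uses an absolute value, swapping the left and right inputs leaves the distance, and therefore the score, unchanged.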

In this case, the __Keras functional API__ needs to be used, since it provides a more flexible way of defining models.

> Specifically, it allows you to define multiple input or output models as well as models that __share layers__. More than that, it allows you to define ad hoc acyclic network graphs.

> ___Models are defined by creating instances of layers and connecting them directly to each other in pairs, and then defining a Model that specifies the layers to act as the input and output to the model, via the parameters inputs and outputs, respectively.___

In [5]:
# Libraries
from keras.models import Model
from keras.layers import Input, Lambda, Concatenate, Dot

In [6]:
def ConvSiameseNet(users_dim,
                   width,
                   height,
                   depth,
                   latent_dim,
                   w_init="RandomNormal",
                   cnn_w_regularizer=None,
                   fc_w_regularizer=None,
                   u_w_regularizer=None,
                   b_init="RandomNormal"):

        # Define the input
        #   Unlike the Sequential model, you must create and define
        #   a standalone "Input" layer that specifies the shape of input
        #   data. The input layer takes a "shape" argument, which is a
        #   tuple that indicates the dimensionality of the input data.
        user_input = Input((1,))
        user_E = Input((users_dim * latent_dim, latent_dim),
                       name="user_matrix")

        # Same axis order as the CNN's input_shape ("channels last")
        image_shape = (height, width, depth)
        left_input = Input(image_shape,
                           name="observed_image")
        right_input = Input(image_shape,
                            name="non_observed_image")

        # Build convnet to use in each siamese 'leg'
        conv_net = CNN(width,
                       height,
                       depth,
                       latent_dim,
                       w_init,
                       cnn_w_regularizer,
                       fc_w_regularizer,
                       b_init)

        # Connecting layers
        #   The layers in the model are connected pairwise.
        #   Each layer instance is created first and then called on the
        #   tensor(s) it takes as input, which specifies where its input
        #   comes from. The layers are created here and called further
        #   below.
        #
        # merge the two encoded inputs through the L1 distance
        L1_distance = Lambda(lambda tensors: K.abs(tensors[0] - tensors[1]),
                             name="score_difference")

        # user's preferences theta_u
        theta_user = []
        for u in range(users_dim):
            theta_user.append(Dense(latent_dim,
                                    kernel_initializer=w_init,
                                    kernel_regularizer=u_w_regularizer,
                                    bias_initializer=b_init,
                                    name="user_%d_preferences" % u))

        # concatenate all users' preferences vectors to get the
        # users' preferences matrix Theta
        concatenate = Concatenate(axis=-1,
                                  name="theta")

        # single user's preferences theta_u
        user_preference = Dot(axes=1,
                              name="theta_u")

        # preference layer
        preference_relationship = Dot(axes=1,
                                      name="score_rank")

        # Apply the pipeline to the inputs
        #
        # call the convnet Sequential model on each of the input tensors
        # so params will be shared
        encoded_l = conv_net(left_input)
        encoded_r = conv_net(right_input)

        # merge the two encoded inputs through the L1 distance
        L1_dist = L1_distance([encoded_l, encoded_r])

        # concatenate user's preferences theta_u to get the preferences
        # matrix Theta
        theta_urs = []
        for u in range(users_dim):
            theta_urs.append(theta_user[u](user_input))
        theta = concatenate(theta_urs)

        # retrieve the single user preference
        theta_ur = user_preference([user_E, theta])

        # get the preference score
        prediction = preference_relationship([theta_ur, L1_dist])

        # Create the model
        #   After creating all of your model layers and connecting them
        #   together, you must then define the model.
        #   As with the Sequential API, the model is the thing that you can
        #   summarize, fit, evaluate, and use to make predictions.
        #   Keras provides a "Model" class that you can use to create a model
        #   from your created layers. It requires that you only specify the
        #   input and output layers.
        model = Model(inputs=[user_input, user_E, left_input, right_input],
                      outputs=prediction)
        return model

In [7]:
siamese_conv_net = ConvSiameseNet(users_dim=5,
                                  width=224,
                                  height=224,
                                  depth=3,
                                  latent_dim=100,
                                  cnn_w_regularizer=l2(1e-3),
                                  fc_w_regularizer=l2(1e-3),
                                  u_w_regularizer=l2(1.0))

In [8]:
print("\nCONVOLUTIONAL SIAMESE NET - example with users_dim=5")
siamese_conv_net.summary()


CONVOLUTIONAL SIAMESE NET - example with users_dim=5
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_1 (InputLayer)            (None, 1)            0                                            
__________________________________________________________________________________________________
user_0_preferences (Dense)      (None, 100)          200         input_1[0][0]                    
__________________________________________________________________________________________________
user_1_preferences (Dense)      (None, 100)          200         input_1[0][0]                    
__________________________________________________________________________________________________
user_2_preferences (Dense)      (None, 100)          200         input_1[0][0]                    
_______________________________________________________
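
The role of the user_matrix input is not spelled out in the code above. One reading, offered here as an assumption and not as the author's stated design, is that it acts as a selector. The theta layer concatenates all users' preference vectors into one vector of length users_dim * latent_dim, and the Dot layer with axes=1 contracts that axis against a matrix of shape (users_dim * latent_dim, latent_dim). If the matrix holds an identity block in the rows of user u and zeros elsewhere, the result is exactly theta_u. The pure-Python sketch below checks this arithmetic on toy sizes; selector_matrix and dot_axis1 are hypothetical helpers.

```python
def selector_matrix(user, users_dim, latent_dim):
    """(users_dim * latent_dim) x latent_dim matrix with an identity block
    in the rows belonging to `user` and zeros everywhere else."""
    rows = users_dim * latent_dim
    return [[1.0 if r == user * latent_dim + c else 0.0
             for c in range(latent_dim)]
            for r in range(rows)]


def dot_axis1(matrix, vector):
    """Contract the row axis of `matrix` with `vector`, mirroring what
    Dot(axes=1) is expected to compute for a single sample."""
    cols = len(matrix[0])
    return [sum(matrix[r][c] * vector[r] for r in range(len(vector)))
            for c in range(cols)]


users_dim, latent_dim = 3, 2
# Toy per-user preference vectors, concatenated as the "theta" layer does.
theta_per_user = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
theta = [value for vec in theta_per_user for value in vec]

user_E = selector_matrix(user=1, users_dim=users_dim, latent_dim=latent_dim)
theta_u = dot_axis1(user_E, theta)
print(theta_u)  # [0.3, 0.4]
```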

In [9]:
def DVBPR(users_dim,
          width,
          height,
          depth,
          latent_dim,
          w_init="RandomNormal",
          cnn_w_regularizer=None,
          fc_w_regularizer=None,
          u_w_regularizer=None,
          b_init="RandomNormal"):

        # Define the input
        #   Unlike the Sequential model, you must create and define
        #   a standalone "Input" layer that specifies the shape of input
        #   data. The input layer takes a "shape" argument, which is a
        #   tuple that indicates the dimensionality of the input data.
        user_input = Input((1,))
        user_E = Input((users_dim * latent_dim, latent_dim),
                       name="user_matrix")

        # Same axis order as the CNN's input_shape ("channels last")
        image_shape = (height, width, depth)
        image_input = Input(image_shape,
                            name="observed_image")

        # Build the convnet that embeds the item image
        conv_net = CNN(width,
                       height,
                       depth,
                       latent_dim,
                       w_init,
                       cnn_w_regularizer,
                       fc_w_regularizer,
                       b_init)

        # Connecting layers
        #   The layers in the model are connected pairwise.
        #   Each layer instance is created first and then called on the
        #   tensor(s) it takes as input, which specifies where its input
        #   comes from. The layers are created here and called further
        #   below.
        #
        # user's preferences theta_u
        theta_user = []
        for u in range(users_dim):
            theta_user.append(Dense(latent_dim,
                                    kernel_initializer=w_init,
                                    kernel_regularizer=u_w_regularizer,
                                    bias_initializer=b_init,
                                    name="user_%d_preferences" % u))

        # concatenate all users' preferences vectors to get the
        # users' preferences matrix Theta
        concatenate = Concatenate(axis=-1,
                                  name="theta")

        # single user's preferences theta_u
        user_preference = Dot(axes=1,
                              name="theta_u")

        # preference layer
        preference_relationship = Dot(axes=1,
                                      name="score_rank")

        # Apply the pipeline to the inputs
        #
        # call the convnet Sequential model on the image input tensor
        # to get the item embedding
        encoded_i = conv_net(image_input)

        # concatenate user's preferences theta_u to get the preferences
        # matrix Theta
        theta_urs = []
        for u in range(users_dim):
            theta_urs.append(theta_user[u](user_input))
        theta = concatenate(theta_urs)

        # retrieve the single user preference
        theta_ur = user_preference([user_E, theta])

        # get the preference score
        prediction = preference_relationship([theta_ur, encoded_i])

        # Create the model
        #   After creating all of your model layers and connecting them
        #   together, you must then define the model.
        #   As with the Sequential API, the model is the thing that you can
        #   summarize, fit, evaluate, and use to make predictions.
        #   Keras provides a "Model" class that you can use to create a model
        #   from your created layers. It requires that you only specify the
        #   input and output layers.
        model = Model(inputs=[user_input, user_E, image_input],
                      outputs=prediction)
        return model

In [10]:
dvbpr = DVBPR(users_dim=5,
              width=224,
              height=224,
              depth=3,
              latent_dim=100,
              cnn_w_regularizer=l2(1e-3),
              fc_w_regularizer=l2(1e-3),
              u_w_regularizer=l2(1.0))

In [11]:
print("\nDEEP VISUAL BAYESIAN PERSONALIZED RANKING - example with users_dim=5")
dvbpr.summary()


DEEP VISUAL BAYESIAN PERSONALIZED RANKING - example with users_dim=5
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_2 (InputLayer)            (None, 1)            0                                            
__________________________________________________________________________________________________
user_0_preferences (Dense)      (None, 100)          200         input_2[0][0]                    
__________________________________________________________________________________________________
user_1_preferences (Dense)      (None, 100)          200         input_2[0][0]                    
__________________________________________________________________________________________________
user_2_preferences (Dense)      (None, 100)          200         input_2[0][0]                    
_______________________________________
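
The output of the DVBPR model above (the score_rank layer) is a dot product between the selected user's preference vector and the CNN embedding of one image. As a rough illustration of how such scores could be used to rank candidate items for one user, here is a pure-Python sketch. The user vector, the item names and the embeddings are made-up values, and preference_score is a hypothetical helper that stands in for the trained network.

```python
def preference_score(theta_u, item_embedding):
    """Dot product between a user's preference vector and an item embedding."""
    return sum(t * e for t, e in zip(theta_u, item_embedding))


# Hypothetical user vector and item embeddings (latent_dim = 3).
theta_u = [1.0, 0.0, 2.0]
item_embeddings = {
    "dress": [0.5, 1.0, 1.0],
    "jacket": [2.0, 3.0, 0.0],
    "shoes": [0.0, 0.0, 0.5],
}

scores = {name: preference_score(theta_u, emb)
          for name, emb in item_embeddings.items()}
# Highest score first: the item this user is predicted to prefer most.
ranking = sorted(scores, key=scores.get, reverse=True)
print(scores)   # {'dress': 2.5, 'jacket': 2.0, 'shoes': 1.0}
print(ranking)  # ['dress', 'jacket', 'shoes']
```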

In [1]:
from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())

[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 13875212510363869932
]
