This section defines the overall model of the system introduced by the paper, along with each individual component that makes up the system.

In [1]:
#import itertools
import math
import numpy as np
import os
import pdb
from scipy import signal
import torch
import torch.nn as nn
from torch.nn import init
import torch.nn.functional as F
import torch.optim as optim

![Overall System](system.png)

The figure above is an overview of the system that the researchers implemented for transductive inference. The system consists of three modules: a feature extractor, an rPPG estimator and a synthetic gradient generator. In this section, each module is implemented along with the ordinal regression loss, and the modules are then combined to create the overall system.

# Sub Models

In [2]:
#from collections import OrderedDict

![Layer Breakdown](system_layer_breakdown.png)

The individual modules are built from the layer breakdown shown above.

## Convolutional Encoder

Used for transductive learning

| Layer | Output Size | Kernel Size |
| :-: | :-: | :-: |
| Conv2DBlock | 60x32x32x32 | 3x3 |
| Conv2DBlock | 60x48x16x16 | 3x3 |
| Conv2DBlock | 60x64x8x8 | 3x3 |
| Conv2DBlock | 60x80x4x4 | 3x3 |
| Conv2DBlock | 60x120x2x2 | 3x3 |
| AvgPool | 60x120 | 2x2 |

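Each Conv2DBlock is composed of a Conv2D layer, batch normalization, average pooling and a ReLU activation. In the code below, the 2D convolution is applied to every frame at once by using `nn.Conv3d` with a kernel of size 1 along the time axis.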

Output size is defined as T x Channels x K x K, where
> T - number of input stream frames <br>
> Channels - number of channels <br>
> K x K - size of the cropped and reshaped input face image <br>
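
To sanity-check the table above, the sketch below traces the output size of each block in plain Python. It assumes T = 60 frames and a 64x64 input crop (inferred from the 32x32 output of the first block after 2x2 pooling, not stated explicitly here); the helper name `encoder_shapes` is hypothetical.

```python
def encoder_shapes(t=60, k=64, channels=(32, 48, 64, 80, 120)):
    """Trace (T, C, K, K) after each Conv2DBlock, then after the final AvgPool.

    Assumes each block keeps K with a padded 3x3 convolution and then
    halves it with 2x2 average pooling; k=64 is an assumed input size.
    """
    shapes = []
    for c in channels:
        k = k // 2  # 2x2 average pooling halves each spatial side
        shapes.append((t, c, k, k))
    # The last AvgPool collapses the remaining 2x2 map to one value per channel
    shapes.append((t, channels[-1]))
    return shapes


for shape in encoder_shapes():
    print("x".join(str(d) for d in shape))
```

Under these assumptions the printed sizes match the rows of the table, ending with 60x120.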

In [3]:
class Convolutional_Encoder(nn.Module):
    def __init__(self, no_input_channel, isTrain, device):

        # Initialise the parent nn.Module so that the layers assigned below are registered
        # The individual layers are defined as per the image 'Layer Breakdown'
        super(Convolutional_Encoder, self).__init__()


        # Define the convolutional layers via the number of input channels, number of output channels,
        # kernel size, stride and padding
        self.conv = nn.Conv3d
        # Define kernel size, stride and padding
        # A kernel of size 1 along the time axis makes this a per-frame 2D convolution
        kernel = (1, 3, 3)
        stride = (1, 1, 1)
        padding = (0, 1, 1)
        # Number of output channels in previous layer = number of input channels in next layer
        # layer 1
        self.conv1 = self.conv(no_input_channel, 32, kernel, stride, padding)
        # layer 2
        self.conv2 = self.conv(32, 48, kernel, stride, padding)
        # layer 3
        self.conv3 = self.conv(48, 64, kernel, stride, padding)
        # layer 4
        self.conv4 = self.conv(64, 80, kernel, stride, padding)
        # layer 5
        self.conv5 = self.conv(80, 120, kernel, stride, padding)


        # Define the BatchNorm layer
        # layer 1
        self.bn1 = nn.BatchNorm3d(32)
        # layer 2
        self.bn2 = nn.BatchNorm3d(48)
        # layer 3
        self.bn3 = nn.BatchNorm3d(64)
        # layer 4
        self.bn4 = nn.BatchNorm3d(80)
        # layer 5
        self.bn5 = nn.BatchNorm3d(120)


        # Defining the pooling window (value received from the paper)
        # Stored on self so that forward can use it
        self.pool = (1, 2, 2)


        # Defining the ReLU block
        self.ReLU = nn.ReLU(inplace=True)


    # Forward propagation
    def forward(self, x):
        # x is the stream of input face images
        # Assumed ordering: (batch, T, channels, K, K), so shape[1] is the number of frames
        window = x.shape[1]
        # Move channels ahead of time, since nn.Conv3d expects (batch, channels, T, K, K)
        x = x.permute(0, 2, 1, 3, 4)

        # Implementing 5 separate Conv2DBlock
        # Conv2DBlock = conv2D + BatchNorm + Average Pooling in 3D space + ReLU
        # 1st Conv2DBlock
        x = self.conv1(x)
        x = self.bn1(x)
        x = F.avg_pool3d(x, self.pool)
        x = self.ReLU(x)
        # 2nd Conv2DBlock
        x = self.conv2(x)
        x = self.bn2(x)
        x = F.avg_pool3d(x, self.pool)
        x = self.ReLU(x)
        # 3rd Conv2DBlock
        x = self.conv3(x)
        x = self.bn3(x)
        x = F.avg_pool3d(x, self.pool)
        x = self.ReLU(x)
        # 4th Conv2DBlock
        x = self.conv4(x)
        x = self.bn4(x)
        x = F.avg_pool3d(x, self.pool)
        x = self.ReLU(x)
        # 5th Conv2DBlock
        x = self.conv5(x)
        x = self.bn5(x)
        x = F.avg_pool3d(x, self.pool)
        x = self.ReLU(x)

        # Implementing the AvgPool Block (collapses the remaining K x K map per frame)
        x = F.adaptive_avg_pool3d(x, (window, 1, 1))

        # Rearrange the tensor to have the same ordering as before: (batch, T, channels, 1, 1)
        x = x.permute(0, 2, 1, 3, 4)

        # Flatten the trailing dimensions to obtain (batch, T, 120)
        x = x.reshape(x.size(0), x.size(1), -1)

        return x

    # Return copies of the gradients of the convolution and BatchNorm weights
    # (only available after a backward pass has been run)
    def return_grad(self):
        c1 = self.conv1.weight.grad.data.clone()
        b1 = self.bn1.weight.grad.data.clone()
        c2 = self.conv2.weight.grad.data.clone()
        b2 = self.bn2.weight.grad.data.clone()
        c3 = self.conv3.weight.grad.data.clone()
        b3 = self.bn3.weight.grad.data.clone()
        c4 = self.conv4.weight.grad.data.clone()
        b4 = self.bn4.weight.grad.data.clone()
        c5 = self.conv5.weight.grad.data.clone()
        b5 = self.bn5.weight.grad.data.clone()
        
        return {'c1': c1, 'c2': c2, 'c3': c3, 'c4': c4, 'c5': c5,
                'b1': b1, 'b2': b2, 'b3': b3, 'b4': b4, 'b5': b5}

## rPPG Estimator

Used to infer the rPPG signal <br>
Posed as an Ordinal Regression Task

| Layer | Output Size |
| :-: | :-: |
| Bidirectional LSTM | 60x120 |
| Linear | 60x80 |
| Ordinal | 60x40 |

Output size is defined as T x Channels, where
> T - number of input stream frames <br>
> Channels - number of channels <br>
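
The sizes in the table follow directly from the layer widths. The sketch below reproduces them in plain Python, assuming 60 hidden units per LSTM direction and an ordinal layer that consumes its inputs in pairs, as in the code of this section. The helper names `estimator_shapes` and `lstm_state_shape` are hypothetical; the state-shape rule is the one documented for `torch.nn.LSTM`.

```python
def estimator_shapes(t=60, hidden_size=60, linear_out=80):
    """Return the (T, channels) size after each layer of the estimator."""
    lstm_out = 2 * hidden_size      # forward and backward outputs are concatenated
    ordinal_out = linear_out // 2   # inputs are consumed in pairs, one pair per class
    return [(t, lstm_out), (t, linear_out), (t, ordinal_out)]


def lstm_state_shape(num_layers, batch_size, hidden_size=60, bidirectional=True):
    """Shape expected for each of the initial hidden and cell states."""
    num_directions = 2 if bidirectional else 1
    return (num_layers * num_directions, batch_size, hidden_size)


print(estimator_shapes())      # [(60, 120), (60, 80), (60, 40)]
print(lstm_state_shape(2, 4))  # (4, 4, 60)
```

The second helper is useful when preparing the `h` and `c` tensors that the estimator expects before its first forward pass.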

In [4]:
class rPPG_Estimator(nn.Module):
    def __init__(self, no_input_channel, num_layers, isTrain, device, num_classes=40, h=None, c=None):

        # Initialise the parent nn.Module so that the layers assigned below are registered
        super(rPPG_Estimator, self).__init__()

        # LSTM layer (60 hidden units per direction, so 120 output features)
        self.lstm = nn.LSTM(input_size=120, hidden_size=60, num_layers=num_layers, batch_first=True, bidirectional=True)
        # Linear Layer
        self.lin = nn.Linear(120, 80)
        # Ordinal Layer
        self.orl = OrdinalRegressionLayer()

        # Hidden and cell states of the LSTM, carried over between calls
        # They must be set (here or through feed_hc) before forward is called
        self.h, self.c = h, c


    # Forward propagation
    def forward(self, x):
        # Compact the LSTM weights into one contiguous chunk of memory
        self.lstm.flatten_parameters()
        # LSTM layer
        x, (self.h, self.c) = self.lstm(x, (self.h.data, self.c.data))
        # Linear layer
        x = self.lin(x)
        # Ordinal layer - used to gain the condition and estimator variable
        # condition is the decoded ordinal label of each frame (the number of thresholds exceeded)
        # estimator holds the per-threshold probabilities behind our rPPG estimation
        condition, estimator = self.orl(x)
        condition = condition.squeeze(2)
        return condition, estimator

    # Store the LSTM hidden and cell states passed in as data = (h, c)
    def feed_hc(self, data):
        self.h = data[0].data
        self.c = data[1].data

    # Return copies of the gradients of the linear and LSTM weights
    # (only available after a backward pass has been run)
    def return_grad(self):
        dt_lstm = {}
        lin_grad = self.lin.weight.grad.data.clone()
        for name, param in self.lstm.named_parameters():
            dt_lstm[name] = param.grad.data.clone()
        # Returned as a dict; the key names 'lin' and 'lstm' are an assumption
        return {'lin': lin_grad, 'lstm': dt_lstm}

### Ordinal Regression Loss
The ordinal regression output is computed by a custom module, `OrdinalRegressionLayer`, based on the ordinal regression approach introduced by the authors of 'Predicting Progression of Alzheimer's Disease Using Ordinal Regression'.<br>
The code is taken from the GitHub repository of the paper's researchers.

In [5]:
class OrdinalRegressionLayer(nn.Module):
    def __init__(self):
        super(OrdinalRegressionLayer, self).__init__()

    def forward(self, x):
        """
        :param x: N x T x C, N is batch_size, T is number of frames, C is channels of features
        :return: decode_c is the decoded ordinal label for each frame, size is N x T x 1
        ord_c1 is the ordinal probabilities for each frame,
        size is N x T x K (C = 2K, K is the number of ordinal intervals)
        """
        x = x.permute(0, 2, 1)
        N, C, W = x.size()
        # Channels are consumed in pairs, so there are C // 2 ordinal intervals
        ord_num = C // 2

        # Replace the per-interval loop with matrix operations for speed
        A = x[:, ::2, :].clone()
        B = x[:, 1::2, :].clone()
        A = A.view(N, 1, ord_num * W)
        B = B.view(N, 1, ord_num * W)
        C = torch.cat((A, B), dim=1)
        # prevent nans
        C = torch.clamp(C, min=1e-8, max=1e8)
        ord_c = F.softmax(C, dim=1)
        
        ord_c1 = ord_c[:, 1, :].clone()
        ord_c1 = ord_c1.view(-1, ord_num, W)
        decode_c = torch.sum((ord_c1 > 0.5), dim=1).view(-1, 1, W)
        ord_c1 = ord_c1.permute(0, 2, 1)
        decode_c = decode_c.permute(0, 2, 1)
        return decode_c, ord_c1
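
For intuition, the computation performed by the layer above can be written for a single frame in plain Python. This is a minimal sketch rather than the authors' implementation: `decode_ordinal` is a hypothetical name, and it takes one frame's 2K values, where each even-indexed entry is paired with the odd-indexed entry that follows it.

```python
import math


def decode_ordinal(logits, lo=1e-8, hi=1e8):
    """Decode one frame's 2K values into (label, probabilities).

    Each consecutive pair (even index, odd index) gets a two-way softmax;
    the odd entry's probability is kept, and the label counts how many
    of those probabilities exceed 0.5.
    """
    probs = []
    for a, b in zip(logits[0::2], logits[1::2]):
        a = min(max(a, lo), hi)  # same clamp range as the layer above
        b = min(max(b, lo), hi)
        # Two-way softmax over (a, b), keeping the probability of b
        probs.append(1.0 / (1.0 + math.exp(a - b)))
    label = sum(p > 0.5 for p in probs)
    return label, probs


label, probs = decode_ordinal([0.2, 1.5, 0.1, 0.9, 1.0, 0.3, 2.0, 0.1])
print(label)  # 2
```

In this example the first two pairs favour their odd entry and the last two do not, so the decoded label is 2. The closed-form sigmoid used here is only meant for small illustrative values; very large differences between `a` and `b` would overflow `math.exp`.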

## Synthetic Gradient Generator

Used to generate synthetic gradients for updating the feature extractor during transductive inference

| Layer | Output Size | Kernel Size |
| :-: | :-: | :-: |
| Conv1DBlock | 40x120 | 3 |
| Conv1DBlock | 20x120 | 3 |
| Conv1DBlock | 40x120 | 3 |
| Conv1DBlock | 60x120 | 3 |

Each Conv1DBlock is composed of a Conv1D layer, batch normalization and a ReLU activation.

Output size is defined as T x Channels, where
> T - number of input stream frames <br>
> Channels - number of channels <br>
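
In the code for this module, the first dimension (60) is passed to the 1D convolutions as the channel count, and the length of 120 is preserved by every layer. The sketch below checks this with the standard output-length formulas for a 1D convolution and a 1D transposed convolution, assuming kernel size 3, stride 1 and padding 1 as in the code that follows; the helper names are hypothetical.

```python
def conv1d_out_len(length, kernel=3, stride=1, padding=1):
    """Output length of a 1D convolution (dilation 1)."""
    return (length + 2 * padding - kernel) // stride + 1


def conv_transpose1d_out_len(length, kernel=3, stride=1, padding=1):
    """Output length of a 1D transposed convolution (dilation 1, no output padding)."""
    return (length - 1) * stride - 2 * padding + kernel


length = 120
sizes = []
for out_channels in (40, 20, 40):
    length = conv1d_out_len(length)
    sizes.append((out_channels, length))
length = conv_transpose1d_out_len(length)
sizes.append((60, length))
print(sizes)  # [(40, 120), (20, 120), (40, 120), (60, 120)]
```

With these settings both formulas return the input length unchanged, which is why only the first dimension varies across the rows of the table.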

In [6]:
class Synthetic_Gradient_Generator(nn.Module):
    def __init__(self, number_input_channels, isTrain, device):

        # Initialise the parent nn.Module so that the layers assigned below are registered
        super(Synthetic_Gradient_Generator, self).__init__()

        # layer 1
        self.layer1 = nn.Sequential(
            nn.Conv1d(60, 40, kernel_size=3, padding=1),
            nn.BatchNorm1d(40),
            nn.ReLU())
        # layer 2
        self.layer2 = nn.Sequential(
            nn.Conv1d(40, 20, kernel_size=3, padding=1),
            nn.BatchNorm1d(20),
            nn.ReLU())
        # layer 3
        self.layer3 = nn.Sequential(
            nn.Conv1d(20, 40, kernel_size=3, padding=1),
            nn.BatchNorm1d(40),
            nn.ReLU())
        # layer 4
        self.layer4 = nn.Sequential(
            nn.ConvTranspose1d(40, 60, kernel_size=3, padding=1))
    
    # Forward propagation
    def forward(self, x):
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)
        
        return x