# Ladder network
- Based on: https://github.com/abhiskk/ladder/blob/master/ladder/
- An unsqueeze operation was added in **bn_hat_z_layers**   

## Watch out:
- Check whether the einsum usage is correct
- Check if the batch norm done in bn_hat_z_layers (decoder) is done in the correct axes
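
One way to approach the einsum check: when one operand is a tensor of ones and the contracted axis has size 1, the contraction reduces to a broadcast over the batch axis. The sketch below compares the pattern used in `bn_gamma_beta` with plain broadcasting. The shapes are illustrative assumptions, not values from the notebook's run.

```python
import torch

torch.manual_seed(0)

# Illustrative shapes (assumptions): batch 4, 3 channels, 5x5 feature map.
b, c, h, w = 4, 3, 5, 5
x = torch.randn(b, c, h, w)
beta = torch.randn(1, c, h, w)

# Pattern used in bn_gamma_beta: contract a size-1 axis against a tensor of ones.
ones = torch.ones(b, h, w, 1)
via_einsum = x + torch.einsum('bhwi,iohw->bohw', ones, beta)

# Plain broadcasting over the batch axis.
via_broadcast = x + beta

same = torch.allclose(via_einsum, via_broadcast)
print(same)
```

If the two results agree, the einsum is only replicating the parameter along the batch axis, and `x + beta` would be a simpler way to write it.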

In [1]:
import numpy as np
import torch
from torch.nn.parameter import Parameter
from torch.autograd import Variable
from torch.optim import Adam
from torch.optim import SGD

In [2]:
import torchvision.datasets as dset
import pandas as pd
from pandas import Series
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
plt.rcParams.update({'font.size':18})
%matplotlib inline

In [3]:
import utils_CLR
from sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score
from tqdm import tqdm_notebook

![title](ladder_network_overview.jpg)

# Algorithm overview

### I. For each epoch:
### 1. forward_noise (on labelled data)
- create noise
- create dirty data (h) $h = x + noise$
- **store** the bottom h (h1) (without gradients)  $\overset{\sim}{z}$ (retrieved in 7)
- For each encoder we send forward_noise(h):
  - linear transformation: $y=xA^T$ (z_pre)
  - Batch_Normalization (z_pre_norm)
  - Noise addition $\overset{\sim}{z}$
  - **Store** a copy of $\overset{\sim}{z}$ (retrieved in 3)
  - Batch Norm correction
  - Activation (h)
  - Return (h)
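
The per-layer steps above can be sketched for a fully connected layer as follows. This is a simplified illustration rather than the notebook's `Encoder_Conv`: the function name `noisy_layer`, the layer sizes, and the broadcastable `gamma`/`beta` tensors are assumptions.

```python
import torch

def noisy_layer(h, linear, bn, gamma, beta, noise_std):
    """One noisy encoder step: linear -> BN -> add noise -> shift/scale -> ReLU."""
    z_pre = linear(h)                          # y = x A^T
    z_pre_norm = bn(z_pre)                     # batch normalization (no affine terms)
    tilde_z = z_pre_norm + noise_std * torch.randn_like(z_pre_norm)  # noisy z
    h_out = torch.relu(gamma * (tilde_z + beta))   # BN correction, then activation
    return h_out, tilde_z                      # tilde_z is kept for the decoder

torch.manual_seed(0)
linear = torch.nn.Linear(8, 4, bias=False)
bn = torch.nn.BatchNorm1d(4, affine=False)
gamma = torch.ones(1, 4)
beta = torch.zeros(1, 4)

x = torch.randn(16, 8)
h0 = x + 0.2 * torch.randn_like(x)             # dirty input: h = x + noise
h1, tilde_z = noisy_layer(h0, linear, bn, gamma, beta, noise_std=0.2)
print(h1.shape, tilde_z.shape)
```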

### 2. forward_noise (on unlabelled data)
- same as 1

### 3. get_encoders_tilde_z
 - Get all $\overset{\sim}{z}$ obtained in forward_noise
 - Append them in an array and reverse it
 
### 4. forward_clean (unlabelled_data)
- For each encoder we send forward_clean(h):
  - linear transformation: $y=xA^T$ (z_pre)
  - **Store** a copy of (**buffer_z_pre**) (retrieved in 5)
  - Batch_Normalization (z)
  - **Store** a copy of (**buffer_z**) (retrieved in 6)
  - Batch_Normalization Gamma_Beta (z_gb)   <-check why
  - Activation (h)
  - Return (h)
  
### 5. get_encoders_z_pre (created in 4)
 - Get all **buffer_z_pre** obtained in forward_clean
 - Append them in an array and reverse it
 
### 6. get_encoders_z (created in 4)
 - Get all **buffer_z** obtained in forward_clean
 - Append them in an array and reverse it
 
### 7. buffer_tilde_z_bottom (created in 1-2)
-  Retrieve the bottom $\overset{\sim}{z}$ stored by forward_noise (see 1)

### 8. forward_decoders( $\overset{\sim}{z}$_unlabelled (from 3),    output_noise_unlabelled (from 2), bottom $\overset{\sim}{z}$_unlabelled (from 2))
### - returns $\hat{z}$ (weighted ($\upsilon$) sum of $\overset{\sim}{z}$ and a prior $\mu$) 
### $\hat{z} = \upsilon * \overset{\sim}{z} + (1 - \upsilon) * \mu$

- Batch Normalization on encoder_output  
- for each decoder:
  - get the corresponding $\overset{\sim}{z}$ from $\boldsymbol{\overset{\sim}{z}}$ (tilde_z from 3)
  - apply u=decoder.forward($\overset{\sim}{z}$, u) (in the first case u is encoder_output)
    - apply the combinator function g(tilde_z, u). It combines the lateral noisy activation signal $\overset{\sim}{z}$ and the reconstruction from the layer above, $\hat{z}^{(l+1)}$
    - In the first case the combinator will use the last $\overset{\sim}{z}$ and the encoder_output
    - In the combinator the functions $\mu$ and $\upsilon$ are modelled as expressive nonlinearities with trainable parameters ($a1_i$...$a10_i$), e.g.:
    - $\mu$ = a1 \* sigmoid(a2 \* output + a3) + a4 \* output + a5
    - $\upsilon$ = a6 \* sigmoid(a7 \* output + a8) + a9 \* output + a10
    - $\mu$ and $\upsilon$ have the same shape as u (output for the first case)
    - $\hat{z} = (\overset{\sim}{z} - \mu) * \upsilon + \mu$    (equivalent to first formula in (8))
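
A minimal sketch of the combinator, assuming the parameters are stored in a dict `a` keyed 1 to 10 and broadcast against `u` (a hypothetical layout; the notebook keeps them as separate attributes `a1`...`a10`). It also checks numerically that the two ways of writing $\hat{z}$ agree.

```python
import torch

def combinator(tilde_z, u, a):
    """g(tilde_z, u): mix the lateral signal tilde_z with the top-down signal u."""
    mu = a[1] * torch.sigmoid(a[2] * u + a[3]) + a[4] * u + a[5]
    v = a[6] * torch.sigmoid(a[7] * u + a[8]) + a[9] * u + a[10]
    hat_z = (tilde_z - mu) * v + mu
    return hat_z, mu, v

torch.manual_seed(0)
shape = (1, 3, 4, 4)   # parameters broadcast over the batch axis
a = {i: torch.randn(shape, dtype=torch.float64) for i in range(1, 11)}
tilde_z = torch.randn(5, 3, 4, 4, dtype=torch.float64)
u = torch.randn(5, 3, 4, 4, dtype=torch.float64)

hat_z, mu, v = combinator(tilde_z, u, a)

# Weighted-sum form: hat_z = v * tilde_z + (1 - v) * mu
weighted = v * tilde_z + (1 - v) * mu
forms_agree = torch.allclose(hat_z, weighted)
print(forms_agree)
```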
    
### 9-10. Append unlabelled to preactivations and to activations
### 11. Batch norm $\hat{z}$ using z_pre  
  - It uses mean(z_pre) but the std is obtained with random noise (?) 
  - Warning: I added an unsqueeze operation to match mm dimensions
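
As a sketch of how this normalization could be written with tensor broadcasting only (so it stays on whatever device the tensors live on), the snippet below uses per-position statistics over the batch axis, as `bn_hat_z_layers` does. It is not the notebook's method: it uses the plain batch variance of `z_pre` and does not add the random noise mentioned above. The function name `normalize_with_stats` is hypothetical.

```python
import torch

def normalize_with_stats(hat_z, z_pre, eps=1e-10):
    """Normalize hat_z with the batch mean and variance of z_pre."""
    mean = z_pre.mean(dim=0, keepdim=True)                 # shape (1, C, H, W)
    var = z_pre.var(dim=0, unbiased=False, keepdim=True)   # population variance, like np.var
    return (hat_z - mean) / torch.sqrt(var + eps)

torch.manual_seed(0)
z_pre = torch.randn(32, 3, 4, 4) * 2.0 + 1.0

# Sanity check: normalizing z_pre with its own statistics should give
# roughly zero mean and unit variance along the batch axis.
out = normalize_with_stats(z_pre, z_pre)
print(out.mean(dim=0).abs().max().item(), out.var(dim=0, unbiased=False).mean().item())
```

If per-channel statistics are intended instead (which is how `torch.nn.BatchNorm2d` computes them), the reduction would run over `dim=(0, 2, 3)` rather than `dim=0`.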

# Classes

### Encoder

In [4]:
class Encoder_Conv(torch.nn.Module):
    def __init__(self, d_in, d_out, activation_type,
                 train_bn_scaling, noise_level, use_cuda):
        super(Encoder_Conv, self).__init__()
        self.d_in = d_in
        self.d_out = d_out
        self.activation_type = activation_type
        self.train_bn_scaling = train_bn_scaling
        self.noise_level = noise_level
        self.use_cuda = use_cuda

        # Encoder
        # Encoder only uses W matrix, no bias
        #self.linear = torch.nn.Linear(d_in, d_out, bias=False)
        #self.linear.weight.data = torch.randn(self.linear.weight.data.size()) / np.sqrt(d_in)
        self.conv = torch.nn.Conv2d(d_in, d_out, kernel_size=3, stride=2, padding=1, bias=False)
        self.conv.weight.data = torch.randn(self.conv.weight.data.size()) / np.sqrt(d_in)

        # Batch Normalization
        # For ReLU the batch-norm scaling (Gamma) is redundant, hence only Beta is trained
        # For Softmax both Beta and Gamma are trained
        # (Beta and Gamma themselves are created in bn_gamma_beta)
        self.bn_normalize_clean = torch.nn.BatchNorm2d(d_out, affine=False)
        self.bn_normalize = torch.nn.BatchNorm2d(d_out, affine=False)
        
        

        # Activation
        if activation_type == 'relu':
            self.activation = torch.nn.ReLU()
        elif activation_type == 'softmax':
            self.activation = torch.nn.Softmax()
        elif activation_type == 'AdaptiveAvgPool':
            self.activation = torch.nn.AdaptiveAvgPool2d((1))
        else:
            raise ValueError("invalid Activation type")

        # buffer for z_pre, z which will be used in decoder cost
        self.buffer_z_pre = None
        self.buffer_z = None
        # buffer for tilde_z which will be used by decoder for reconstruction
        self.buffer_tilde_z = None
        
        #Get shapes after convolution to use them later in the decoder
        self.conv_shapes = None

    def bn_gamma_beta(self, x, d_out, printout = False):
        if self.use_cuda:
            ones = Parameter(torch.ones(x.size()[0], x.size()[-1], x.size()[-1], 1).cuda())
        else:
            ones = Parameter(torch.ones(x.size()[0], x.size()[-1], x.size()[-1], 1))
            
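        # NOTE: bn_beta is re-created (and zeroed) on every call, so any update
        # made by the optimizer is discarded at the next forward pass.
        if self.use_cuda:
            self.bn_beta = Parameter(torch.cuda.FloatTensor(1, d_out, x.size()[-1], x.size()[-1]))
        else:
            self.bn_beta = Parameter(torch.FloatTensor(1, d_out, x.size()[-1], x.size()[-1]))
        self.bn_beta.data.zero_()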
        #t = x + ones.mm(self.bn_beta)
        mult_out = torch.einsum('bhwi,iohw->bohw', (ones, self.bn_beta)) #batch_height_width_in,in_out_height_width
        if printout: print(f'  {np.shape(self.bn_beta.cpu().detach().numpy())} = bn_beta')
        t = x + mult_out
        
        if self.train_bn_scaling:
            # batch-normalization scaling
            if self.use_cuda:
                self.bn_gamma = Parameter(torch.cuda.FloatTensor(1, d_out, x.size()[-1], x.size()[-1]))
                self.bn_gamma.data = torch.ones(self.bn_gamma.size()).cuda()
            else:
                self.bn_gamma = Parameter(torch.FloatTensor(1, d_out, x.size()[-1], x.size()[-1]))
                self.bn_gamma.data = torch.ones(self.bn_gamma.size())
        
        if self.train_bn_scaling:
            #print(f'ones = {np.shape(ones)}, bn_gamma = {np.shape(self.bn_gamma)}')
            mult_out = torch.einsum('bhwl,lchw->bchw', (ones, self.bn_gamma))
            if printout: print(f'  {np.shape(self.bn_gamma.cpu().detach().numpy())} = bn_gamma')
            t = torch.mul(t, mult_out)
        return t

    def forward_clean(self, h, printout = False):
        if printout:  print(f'{np.shape(h.cpu().detach().numpy())} = h')
        z_pre = self.conv(h) #MOD v0.3.0 
        #Store z_pre, z to be used in calculation of reconstruction cost
        self.buffer_z_pre = z_pre.detach().clone()
        if printout:  print('conv2d')
        if printout:  print(f'{np.shape(z_pre.cpu().detach().numpy())} = z_pre')
        if printout:  print(f'{list(np.shape(z_pre.cpu().detach().numpy()))[-1]} = shape')
        self.conv_shapes = list(np.shape(z_pre.cpu().detach().numpy()))[-1]
        z = self.bn_normalize_clean(z_pre)
        if printout:  print('BN')
        self.buffer_z = z.detach().clone()
        z_gb = self.bn_gamma_beta(z, self.d_out, printout= printout) #MOD v0.3.0
        if str(self.activation) == 'Softmax()':
            z_gb = z_gb.view(-1, z_gb.size(1))
        if printout:  print(f'{str(self.activation)}')
        h = self.activation(z_gb)
        if printout:print(f'{np.shape(h.cpu().detach().numpy())} = h')
        return h

    def forward_noise(self, tilde_h, printout = False):
        # z_pre will be used in the decoder cost
        if printout: print(f'{np.shape(tilde_h.cpu().detach().numpy())} = tilde_h')
        z_pre = self.conv(tilde_h) #MOD v0.3.0
        if printout: print('conv2d')
        if printout: print(f'{np.shape(z_pre.cpu().detach().numpy())} = z_pre')
        z_pre_norm = self.bn_normalize(z_pre)
        if printout: print('BN')
        # Add noise
        noise = np.random.normal(loc=0.0, scale=self.noise_level, size=z_pre_norm.size())
        if self.use_cuda:
            noise = Variable(torch.cuda.FloatTensor(noise))
        else:
            noise = Variable(torch.FloatTensor(noise))
        if printout: print('add noise')
        # tilde_z will be used by decoder for reconstruction
        tilde_z = z_pre_norm + noise
        # store tilde_z in buffer
        self.buffer_tilde_z = tilde_z
        if printout: print(f'{np.shape(tilde_z.cpu().detach().numpy())} = tilde_z')
        if printout: print('BN correction')
        z = self.bn_gamma_beta(tilde_z, self.d_out, printout=False) #MOD v0.3.0
        if str(self.activation) == 'Softmax()':
            z = z.view(-1, z.size(1))
        if printout: print(f'{str(self.activation)}')
        h = self.activation(z)
        if printout: print(f'{np.shape(h.cpu().detach().numpy())} = h')
        return h

In [5]:
class StackedEncoders_Conv(torch.nn.Module):
    def __init__(self, d_in, d_encoders, activation_types,
                 train_batch_norms, noise_std, use_cuda):
        super(StackedEncoders_Conv, self).__init__()
        self.buffer_tilde_z_bottom = None
        self.encoders_ref = []
        self.encoders = torch.nn.Sequential()
        self.noise_level = noise_std
        self.use_cuda = use_cuda
        n_encoders = len(d_encoders)
        
        for i in range(n_encoders):
            if i == 0:
                d_input = d_in
            else:
                d_input = d_encoders[i - 1]
            d_output = d_encoders[i]
            activation = activation_types[i]
            train_batch_norm = train_batch_norms[i]
            encoder_ref = "encoder_" + str(i)
            encoder = Encoder_Conv(d_input, d_output, activation, train_batch_norm, noise_std, use_cuda)
            self.encoders_ref.append(encoder_ref)
            self.encoders.add_module(encoder_ref, encoder)

    def forward_clean(self, x, printout = False):
        h = x
        for e_ref in self.encoders_ref:
            encoder = getattr(self.encoders, e_ref)
            if printout: print(f'\n{str(e_ref)}')
            h = encoder.forward_clean(h, printout)
        return h

    def forward_noise(self, x, printout = False):
        noise = np.random.normal(loc=0.0, scale=self.noise_level, size=x.size())
        if self.use_cuda:
            
            noise = Variable(torch.cuda.FloatTensor(noise))
            #noise = Variable(torch.FloatTensor(noise)).cuda()
        else:
            noise = Variable(torch.FloatTensor(noise))
        h = x + noise
        self.buffer_tilde_z_bottom = h.clone()
        # pass through encoders
        for e_ref in self.encoders_ref:
            encoder = getattr(self.encoders, e_ref)
            if printout: print(f'\n{str(e_ref)}')
            h = encoder.forward_noise(h, printout)
        
        return h

    def get_encoders_tilde_z(self, reverse=True, printout = False):
        tilde_z_layers = []
        for e_ref in self.encoders_ref:
            encoder = getattr(self.encoders, e_ref)
            tilde_z = encoder.buffer_tilde_z.clone()
            tilde_z_layers.append(tilde_z)
        if reverse:
            tilde_z_layers.reverse()
        if printout: [print(f'tilde_z = {np.shape(i.cpu().detach().numpy())}') for i in tilde_z_layers]
        return tilde_z_layers

    def get_encoders_z_pre(self, reverse=True, printout = False):
        z_pre_layers = []
        for e_ref in self.encoders_ref:
            encoder = getattr(self.encoders, e_ref)
            z_pre = encoder.buffer_z_pre.clone()
            z_pre_layers.append(z_pre)
        if reverse:
            z_pre_layers.reverse()
        if printout: [print(f'z_pre_layers = {np.shape(i.cpu().detach().numpy())}') for i in z_pre_layers]
        return z_pre_layers

    def get_encoders_z(self, reverse=True, printout = False):
        z_layers = []
        for e_ref in self.encoders_ref:
            encoder = getattr(self.encoders, e_ref)
            z = encoder.buffer_z.clone()
            z_layers.append(z)
        if reverse:
            z_layers.reverse()
        if printout: [print(f'z_layers = {np.shape(i.cpu().detach().numpy())}') for i in z_layers]
        return z_layers
    
    def get_shapes_after_conv(self, reverse = True, printout = False):
        conv_shapes_layers = []
        for e_ref in self.encoders_ref:
            encoder = getattr(self.encoders, e_ref)
            conv_shapes = encoder.conv_shapes
            conv_shapes_layers.append(conv_shapes)
        if printout: [print(i) for i in conv_shapes_layers]
        if reverse:
            conv_shapes_layers.reverse()
        return conv_shapes_layers
        

## Output

### #1 Forward noise labelled
**encoder_0**   
(99, 1, 28, 28) = $\tilde{h}$   
conv2d   
(99, 100, 14, 14) = z_pre   
BN   
add noise   
(99, 100, 14, 14) = $\tilde{z}$   
BN correction   
  (1, 100, 14, 14) = bn_beta   
ReLU()   
(99, 100, 14, 14) = h   

**encoder_1**   
(99, 100, 14, 14) = $\tilde{h}$   
conv2d   
(99, 50, 7, 7) = z_pre   
BN   
add noise   
(99, 50, 7, 7) = $\tilde{z}$   
BN correction   
  (1, 50, 7, 7) = bn_beta   
ReLU()   
(99, 50, 7, 7) = h   

**encoder_2**   
(99, 50, 7, 7) = $\tilde{h}$   
conv2d   
(99, 25, 4, 4) = z_pre   
BN   
add noise   
(99, 25, 4, 4) = $\tilde{z}$   
BN correction   
  (1, 25, 4, 4) = bn_beta   
ReLU()   
(99, 25, 4, 4) = h   

**encoder_3**   
(99, 25, 4, 4) = $\tilde{h}$   
conv2d   
(99, 25, 2, 2) = z_pre   
BN   
add noise   
(99, 25, 2, 2) = $\tilde{z}$   
BN correction   
  (1, 25, 2, 2) = bn_beta   
ReLU()   
(99, 25, 2, 2) = h   

**encoder_4**   
(99, 25, 2, 2) = $\tilde{h}$   
conv2d   
(99, 10, 1, 1) = z_pre   
BN   
add noise   
(99, 10, 1, 1) = $\tilde{z}$   
BN correction   
  (1, 10, 1, 1) = bn_beta   
  (1, 10, 1, 1) = bn_gamma softmax   
Softmax()   
(99, 10) = h

### #2 Forward noise unlabelled
encoder_0   
(143, 1, 28, 28) = $\tilde{h}$   
conv2d   
(143, 100, 14, 14) = z_pre   
BN   
add noise   
(143, 100, 14, 14) = $\tilde{z}$   
BN correction   
  (1, 100, 14, 14) = bn_beta   
ReLU()   
(143, 100, 14, 14) = h   

encoder_1   
(143, 100, 14, 14) = $\tilde{h}$   
conv2d   
(143, 50, 7, 7) = z_pre   
BN   
add noise   
(143, 50, 7, 7) = $\tilde{z}$   
BN correction   
  (1, 50, 7, 7) = bn_beta   
ReLU()   
(143, 50, 7, 7) = h   

encoder_2   
(143, 50, 7, 7) = $\tilde{h}$   
conv2d   
(143, 25, 4, 4) = z_pre   
BN   
add noise   
(143, 25, 4, 4) = $\tilde{z}$   
BN correction   
  (1, 25, 4, 4) = bn_beta   
ReLU()   
(143, 25, 4, 4) = h   

encoder_3   
(143, 25, 4, 4) = $\tilde{h}$   
conv2d   
(143, 25, 2, 2) = z_pre   
BN   
add noise   
(143, 25, 2, 2) = $\tilde{z}$   
BN correction   
  (1, 25, 2, 2) = bn_beta   
ReLU()   
(143, 25, 2, 2) = h   

encoder_4   
(143, 25, 2, 2) = $\tilde{h}$   
conv2d   
(143, 10, 1, 1) = z_pre   
BN   
add noise   
(143, 10, 1, 1) = $\tilde{z}$   
BN correction   
  (1, 10, 1, 1) = bn_beta   
  (1, 10, 1, 1) = bn_gamma   
Softmax()   
(143, 10) = h   
   
   

### #3 get $\tilde{z}$  (from noise encoder)
$\tilde{z} $ = (143, 10, 1, 1)   
$\tilde{z} $ = (143, 25, 2, 2)   
$\tilde{z} $ = (143, 25, 4, 4)   
$\tilde{z} $ = (143, 50, 7, 7)   
$\tilde{z} $ = (143, 100, 14, 14)   
   
### #5 get ${z}$ pre BN   (from clean encoder)
z_pre_layers = (143, 10, 1, 1)   
z_pre_layers = (143, 25, 2, 2)   
z_pre_layers = (143, 25, 4, 4)   
z_pre_layers = (143, 50, 7, 7)   
z_pre_layers = (143, 100, 14, 14)   

### #6 get ${z}$ after BN   (from clean encoder)   
z_layers = (143, 10, 1, 1)   
z_layers = (143, 25, 2, 2)   
z_layers = (143, 25, 4, 4)   
z_layers = (143, 50, 7, 7)   
z_layers = (143, 100, 14, 14)   

### #7 get bottom $\tilde{z}$ (from noise)
z_bottom = (143, 1, 28, 28)

---
### For Decoder we need:   
#3 $\tilde{z}$  
#7 bottom $\tilde{z}$   
output_noise_unlabelled = (143, 10)

### #4 Forward clean unlabelled
encoder_0   
(143, 1, 28, 28) = h   
conv2d   
(143, 100, 14, 14) = z_pre   
BN   
  (1, 100, 14, 14) = bn_beta   
ReLU()   
(143, 100, 14, 14) = h   

encoder_1   
(143, 100, 14, 14) = h   
conv2d   
(143, 50, 7, 7) = z_pre   
BN   
  (1, 50, 7, 7) = bn_beta   
ReLU()   
(143, 50, 7, 7) = h   

encoder_2   
(143, 50, 7, 7) = h   
conv2d   
(143, 25, 4, 4) = z_pre   
BN   
  (1, 25, 4, 4) = bn_beta   
ReLU()   
(143, 25, 4, 4) = h   

encoder_3   
(143, 25, 4, 4) = h   
conv2d   
(143, 25, 2, 2) = z_pre   
BN   
  (1, 25, 2, 2) = bn_beta   
ReLU()   
(143, 25, 2, 2) = h   

encoder_4   
(143, 25, 2, 2) = h   
conv2d   
(143, 10, 1, 1) = z_pre   
BN   
  (1, 10, 1, 1) = bn_beta   
  (1, 10, 1, 1) = bn_gamma   
Softmax()   
(143, 10) = h

### Decoder outputs
### #8 hat_z_layers_unlabelled   
$\hat{z} = (143, 10, 1, 1)$   
$\hat{z} = (143, 25, 2, 2)$   
$\hat{z} = (143, 25, 4, 4)$   
$\hat{z} = (143, 50, 7, 7)$   
$\hat{z} = (143, 100, 14, 14)$   
$\hat{z} = (143, 1, 28, 28)$   

### #9 z_pre_layers_unlabelled    
z_pre = (143, 10, 1, 1)   
z_pre = (143, 25, 2, 2)   
z_pre = (143, 25, 4, 4)   
z_pre = (143, 50, 7, 7)   
z_pre = (143, 100, 14, 14)   
z_pre = (143, 1, 28, 28)   

### #10 z_layers_unlabelled    
z = (143, 10, 1, 1)   
z = (143, 25, 2, 2)   
z = (143, 25, 4, 4)   
z = (143, 50, 7, 7)   
z = (143, 100, 14, 14)   
z = (143, 1, 28, 28)    

### #11.bn_hat_z_layers_unlabelled
bn $\hat{z} = (143, 10, 1, 1)$   
bn $\hat{z} = (143, 25, 2, 2)$   
bn $\hat{z} = (143, 25, 4, 4)$   
bn $\hat{z} = (143, 50, 7, 7)$   
bn $\hat{z} = (143, 100, 14, 14)$   
bn $\hat{z} = (143, 1, 28, 28)$   
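
The decoder has to undo the stride-2 convolutions, and a stride-2 transposed convolution can reach more than one output size from the same input (both 7 and 8 from 4, for example), which is why the target size is passed explicitly. The sketch below uses the output-size formula from the PyTorch `ConvTranspose2d` documentation with the notebook's kernel size 3, stride 2 and padding 1. The helper name is hypothetical.

```python
def conv_transpose_out_range(size_in, kernel=3, stride=2, padding=1):
    """Smallest and largest output size reachable by varying output_padding."""
    smallest = (size_in - 1) * stride - 2 * padding + (kernel - 1) + 1
    largest = smallest + stride - 1   # output_padding may be at most stride - 1
    return smallest, largest

# Spatial sizes seen by the decoder, from the top of the ladder down to the image.
sizes = [1, 2, 4, 7, 14, 28]
for size_in, target in zip(sizes[:-1], sizes[1:]):
    lo, hi = conv_transpose_out_range(size_in)
    print(f'{size_in} -> {target}: reachable range [{lo}, {hi}]')
```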

## Decoder

In [6]:
class Decoder(torch.nn.Module):
    def __init__(self, d_in, d_out, tensor_shape, use_cuda):
        super(Decoder, self).__init__()

        self.d_in = d_in
        self.d_out = d_out
        self.use_cuda = use_cuda

        if self.use_cuda:
            self.a1 = Parameter(0. * torch.ones(1, d_in, tensor_shape, tensor_shape).cuda())
            self.a2 = Parameter(1. * torch.ones(1, d_in, tensor_shape, tensor_shape).cuda())
            self.a3 = Parameter(0. * torch.ones(1, d_in, tensor_shape, tensor_shape).cuda())
            self.a4 = Parameter(0. * torch.ones(1, d_in, tensor_shape, tensor_shape).cuda())
            self.a5 = Parameter(0. * torch.ones(1, d_in, tensor_shape, tensor_shape).cuda())

            self.a6 = Parameter(0. * torch.ones(1, d_in, tensor_shape, tensor_shape).cuda())
            self.a7 = Parameter(1. * torch.ones(1, d_in, tensor_shape, tensor_shape).cuda())
            self.a8 = Parameter(0. * torch.ones(1, d_in, tensor_shape, tensor_shape).cuda())
            self.a9 = Parameter(0. * torch.ones(1, d_in, tensor_shape, tensor_shape).cuda())
            self.a10 = Parameter(0. * torch.ones(1, d_in, tensor_shape, tensor_shape).cuda())
        else:
            self.a1 = Parameter(0. * torch.ones(1, d_in, tensor_shape, tensor_shape))
            self.a2 = Parameter(1. * torch.ones(1, d_in, tensor_shape, tensor_shape))
            self.a3 = Parameter(0. * torch.ones(1, d_in, tensor_shape, tensor_shape))
            self.a4 = Parameter(0. * torch.ones(1, d_in, tensor_shape, tensor_shape))
            self.a5 = Parameter(0. * torch.ones(1, d_in, tensor_shape, tensor_shape))

            self.a6 = Parameter(0. * torch.ones(1, d_in, tensor_shape, tensor_shape))
            self.a7 = Parameter(1. * torch.ones(1, d_in, tensor_shape, tensor_shape))
            self.a8 = Parameter(0. * torch.ones(1, d_in, tensor_shape, tensor_shape))
            self.a9 = Parameter(0. * torch.ones(1, d_in, tensor_shape, tensor_shape))
            self.a10 = Parameter(0. * torch.ones(1, d_in, tensor_shape, tensor_shape))


        if self.d_out is not None:
            self.V = torch.nn.ConvTranspose2d(d_in, d_out, kernel_size = 3, stride = 2, padding=1, bias=False)
            self.V.weight.data = torch.randn(self.V.weight.data.size()) / np.sqrt(d_in)
            # batch-normalization for u
            self.bn_normalize = torch.nn.BatchNorm2d(d_out, affine=False)

        # buffer for hat_z_l to be used for cost calculation
        self.buffer_hat_z_l = None

    def g(self, tilde_z_l, u_l, printout):
        if self.use_cuda:
            ones = Parameter(torch.ones(tilde_z_l.size()[0], 1).cuda())
        else:
            ones = Parameter(torch.ones(tilde_z_l.size()[0], 1))
        if printout: print(f'ones = {np.shape(ones)}, tilde_z = {np.shape(tilde_z_l)}') 
        if printout: print(f'a1 = {np.shape(self.a1.detach().cpu().numpy())}, a2 = {np.shape(self.a2.detach().cpu().numpy())}, \
a3 = {np.shape(self.a3.detach().cpu().numpy())}, a4 = {np.shape(self.a4.detach().cpu().numpy())}, a5 = {np.shape(self.a5.detach().cpu().numpy())}')
        if printout: print(f'a6 = {np.shape(self.a6.detach().cpu().numpy())}, a7 = {np.shape(self.a7.detach().cpu().numpy())}, \
a8 = {np.shape(self.a8.detach().cpu().numpy())}, a9 = {np.shape(self.a9.detach().cpu().numpy())}, a10 = {np.shape(self.a10.detach().cpu().numpy())}')
        
        b_a1 = torch.einsum('cghw,bc->bghw', (self.a1, ones)) #channels_groups(classes)_height_width, batch_channels
        b_a2 = torch.einsum('cghw,bc->bghw', (self.a2, ones)) 
        b_a3 = torch.einsum('cghw,bc->bghw', (self.a3, ones)) 
        b_a4 = torch.einsum('cghw,bc->bghw', (self.a4, ones)) 
        b_a5 = torch.einsum('cghw,bc->bghw', (self.a5, ones)) 

        b_a6 = torch.einsum('cghw,bc->bghw', (self.a6, ones)) 
        b_a7 = torch.einsum('cghw,bc->bghw', (self.a7, ones)) 
        b_a8 = torch.einsum('cghw,bc->bghw', (self.a8, ones)) 
        b_a9 = torch.einsum('cghw,bc->bghw', (self.a9, ones)) 
        b_a10 = torch.einsum('cghw,bc->bghw', (self.a10, ones)) 
        
        if printout: print(f'b_a1 = {np.shape(b_a1.detach().cpu().numpy())}, b_a2 = {np.shape(b_a2.detach().cpu().numpy())}, \
b_a3 = {np.shape(b_a3.detach().cpu().numpy())}, b_a4 = {np.shape(b_a4.detach().cpu().numpy())}')
        if printout: print(f'b_a5 = {np.shape(b_a5.detach().cpu().numpy())}, b_a6 = {np.shape(b_a6.detach().cpu().numpy())}, b_a7 = {np.shape(b_a7.detach().cpu().numpy())}, \
b_a8 = {np.shape(b_a8.detach().cpu().numpy())}')
        if printout: print(f'b_a9 = {np.shape(b_a9.detach().cpu().numpy())}, b_a10 = {np.shape(b_a10.detach().cpu().numpy())}')

        if printout: print(f'u_l = {np.shape(u_l.detach().cpu().numpy())}')
        if printout: print(f'torch.mul(b_a2, u_l) = {np.shape(torch.mul(b_a2, u_l).detach().cpu().numpy())}')
        
        mu_l = torch.mul(b_a1, torch.sigmoid(torch.mul(b_a2, u_l) + b_a3)) + \
               torch.mul(b_a4, u_l) + \
               b_a5

        v_l = torch.mul(b_a6, torch.sigmoid(torch.mul(b_a7, u_l) + b_a8)) + \
              torch.mul(b_a9, u_l) + \
              b_a10

        hat_z_l = torch.mul(tilde_z_l - mu_l, v_l) + mu_l
        
        if printout: print(f'mu_l = {np.shape(mu_l)}, v_l = {np.shape(v_l)}')
        return hat_z_l

    def forward(self, tilde_z_l, u_l, tensor_shape, printout):
        # hat_z_l will be used for calculating decoder costs
        hat_z_l = self.g(tilde_z_l, u_l, printout)
        # store hat_z_l in buffer for cost calculation
        self.buffer_hat_z_l = hat_z_l
        if printout: print(f'hat_z (before conv) = {np.shape(hat_z_l.detach().cpu().numpy())}')
        if self.d_out is not None:
            t = self.V(hat_z_l, output_size=(-1,-1,tensor_shape,tensor_shape))
            if printout: print(f't (after conv) = {np.shape(t.detach().cpu().numpy())}')
            u_l_below = self.bn_normalize(t)
            
#             t = self.V.forward(hat_z_l)
#             print(f't (after conv) = {np.shape(t.detach().cpu().numpy())}')
#             u_l_below = self.bn_normalize(t)
            return u_l_below
        else:
            return None

In [13]:
class StackedDecoders_Conv(torch.nn.Module):
    def __init__(self, d_in, d_decoders, image_size, tensor_shapes, use_cuda):
        super(StackedDecoders_Conv, self).__init__()
        self.bn_u_top = torch.nn.BatchNorm1d(d_in, affine=False) 
        self.decoders_ref = []
        self.decoders = torch.nn.Sequential()
        self.use_cuda = use_cuda
        n_decoders = len(d_decoders)
        for i in range(n_decoders):
            if i == 0:
                d_input = d_in
            else:
                d_input = d_decoders[i - 1]
            d_output = d_decoders[i]
            tensor_shape = tensor_shapes[i]
            decoder_ref = "decoder_" + str(i)
            decoder = Decoder(d_input, d_output, tensor_shape, use_cuda)
            self.decoders_ref.append(decoder_ref)
            self.decoders.add_module(decoder_ref, decoder)

        self.bottom_decoder = Decoder(image_size, None, tensor_shapes[-1], use_cuda)
        self.tensor_shapes = tensor_shapes
        
    def forward(self, tilde_z_layers, u_top, tilde_z_bottom, printout = False):
        # Note that tilde_z_layers should be in reversed order of encoders
        hat_z = []
        u = self.bn_u_top(u_top)
        for i in range(len(self.decoders_ref)):
            d_ref = self.decoders_ref[i]
            decoder = getattr(self.decoders, d_ref)
            tilde_z = tilde_z_layers[i]
            if i == 0: 
                u.unsqueeze_(-1)
                u.unsqueeze_(-1)
            if printout: print(f'u before decoder = {np.shape(u.detach().cpu().numpy())}')
            tensor_shape = self.tensor_shapes[i+1]
            u = decoder.forward(tilde_z, u, tensor_shape, printout)
            if printout: print(f'u after decoder = {np.shape(u.detach().cpu().numpy())}')
            if printout: print('')
            hat_z.append(decoder.buffer_hat_z_l)
        self.bottom_decoder.forward(tilde_z_bottom, u, self.tensor_shapes[-1], printout)
        hat_z_bottom = self.bottom_decoder.buffer_hat_z_l.clone()
        hat_z.append(hat_z_bottom)
        return hat_z

    def bn_hat_z_layers(self, hat_z_layers, z_pre_layers, printout = False):
        # TODO: Calculate batchnorm using GPU Tensors.
        assert len(hat_z_layers) == len(z_pre_layers)
        hat_z_layers_normalized = []
        for i, (hat_z, z_pre) in enumerate(zip(hat_z_layers, z_pre_layers)):
            tensor_shape = self.tensor_shapes[i]
            if printout: print(f'hat_z = {np.shape(hat_z.detach().cpu().numpy())}, z_pre = {np.shape(z_pre.detach().cpu().numpy())}')
            if self.use_cuda:
                ones = Variable(torch.ones(z_pre.size()[0], 1, tensor_shape, tensor_shape).cuda())
            else:
                ones = Variable(torch.ones(z_pre.size()[0], 1, tensor_shape, tensor_shape))
            if printout: print(f'ones = {np.shape(ones.detach().cpu().numpy())}')
            mean = torch.mean(z_pre, 0)
            mean.unsqueeze_(0) # <---- ADDED BY OMM
            if printout: print(f'mean = {np.shape(mean.detach().cpu().numpy())}')
            noise_var = np.random.normal(loc=0.0, scale=1 - 1e-10, size=z_pre.size())
            if printout: print(f'z_pre = {np.shape(z_pre.detach().cpu().numpy())}')
            if printout: print(f'noise_var = {np.shape(noise_var)}')
            if self.use_cuda: # add the tensor_shape into the reshape function
                var = np.var(z_pre.data.cpu().numpy() + noise_var, axis=0).reshape(1, z_pre.size()[1], tensor_shape, tensor_shape)
            else:
                var = np.var(z_pre.data.numpy() + noise_var, axis=0).reshape(1, z_pre.size()[1], tensor_shape, tensor_shape)
            var = Variable(torch.FloatTensor(var))
            if printout: print(f'var = {np.shape(var.detach().cpu().numpy())}')
            if self.use_cuda:
                hat_z = hat_z.cpu()
                ones = ones.cpu()
                mean = mean.cpu()
            onesmmmean = torch.einsum('bchw,cghw->bghw',(ones,mean)) # batch_channel_height_width, channels_groups(classes)_height_width
            if printout: print(f'onesmmmean = {np.shape(onesmmmean.detach().cpu().numpy())}')
            torchsqrt = torch.sqrt(var + 1e-10)
            if printout: print(f'torchsqrt = {np.shape(torchsqrt.detach().cpu().numpy())}')
            ones_torchsqrt = torch.einsum('bchw,vohw->bohw',(ones,torchsqrt)) #batch_channel_height_width, variance_channels
            if printout: print(f'ones_torchsqrt = {np.shape(ones_torchsqrt.detach().cpu().numpy())}')
            #hat_z_normalized = torch.div(hat_z - ones.mm(mean), ones.mm(torch.sqrt(var + 1e-10)))
            hat_z_normalized = torch.div(hat_z - onesmmmean, ones_torchsqrt)
            if self.use_cuda:
                hat_z_normalized = hat_z_normalized.cuda()
            hat_z_layers_normalized.append(hat_z_normalized)
            if printout: print('')
        return hat_z_layers_normalized

---

## Ladder

In [None]:
# class Ladder(torch.nn.Module):
#     def __init__(self, encoder_sizes, decoder_sizes, encoder_activations, encoder_train_bn_scaling,
#                 noise_std, use_cuda):
#         super(Ladder, self).__init__()
#         self.use_cuda = use_cuda
#         decoder_in = encoder_sizes[-1]
#         encoder_in = decoder_sizes[-1]
#         self.se = StackedEncoders(encoder_in, encoder_sizes, encoder_activations,
#                                   encoder_train_bn_scaling, noise_std, use_cuda)
#         self.de = StackedDecoders(decoder_in, decoder_sizes, encoder_in, use_cuda)
        
#         self.bn_image = torch.nn.BatchNorm1d(encoder_in, affine=False)
        
#     def forward_encoders_clean(self, data):
#         return self.se.forward_clean(data)
        
#     def forward_encoders_noise(self, data):
#         return self.se.forward_noise(data)
    
#     def forward_decoders(self, tilde_z_layers, encoder_output, tilde_z_bottom):
#         return self.de.forward(tilde_z_layers, encoder_output, tilde_z_bottom)
        
#     def get_encoders_tilde_z(self, reverse=True):
#         return self.se.get_encoders_tilde_z(reverse)

#     def get_encoders_z_pre(self, reverse=True):
#         return self.se.get_encoders_z_pre(reverse)

#     def get_encoder_tilde_z_bottom(self):
#         return self.se.buffer_tilde_z_bottom.clone()

#     def get_encoders_z(self, reverse=True):
#         return self.se.get_encoders_z(reverse)

#     def decoder_bn_hat_z_layers(self, hat_z_layers, z_pre_layers):
#         return self.de.bn_hat_z_layers(hat_z_layers, z_pre_layers)

---

## Data

In [8]:
root='data/'
train_set = dset.MNIST(root=root, train=True, download=True)
test_set = dset.MNIST(root=root, train=False, download=True)

In [9]:
# Load MNIST and shuffle it
X_train = train_set.data
y_train = train_set.targets
# Work with a subset of the samples
X_train = X_train[:16000]
y_train = y_train[:16000]
# Add channel dimension
X_train = X_train.view(-1,1,28,28)
# Scale pixel values to [0, 1]
X_train = X_train.float() / 255.
# Flatten the rows and columns
# X_train = X_train.reshape(X_train.shape[0],X_train.shape[1]*X_train.shape[2])
randomize = np.arange(X_train.shape[0])
np.random.shuffle(randomize)
# Split into labelled and unlabelled data according to a ratio
labeled_ratio = .1
X_train_labelled = X_train[randomize[:int(np.round(X_train.shape[0] * labeled_ratio))]] 
y_train = y_train[randomize[:int(np.round(X_train.shape[0] * labeled_ratio))]]
X_train_unlabelled = X_train[randomize[int(np.round(X_train.shape[0] * labeled_ratio)):]] 
# Covert to tensor
X_train_labelled = X_train_labelled.type(torch.Tensor)
X_train_unlabelled = X_train_unlabelled.type(torch.Tensor)
#X_train = np.expand_dims(X_train,-1)
print(f'X_train_labelled = {np.shape(X_train_labelled)}, {labeled_ratio*100:.0f}% of {X_train.shape[0]}')
print(f'X_train_unlabelled = {np.shape(X_train_unlabelled)}, {(1-labeled_ratio)*100:.0f}% of {X_train.shape[0]}')
print(f'labels = {np.shape(y_train)}')

X_train_labelled = torch.Size([1600, 1, 28, 28]), 10% of 16000
X_train_unlabelled = torch.Size([14400, 1, 28, 28]), 90% of 16000
labels = torch.Size([1600])
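
The split above shuffles with `np.random.shuffle` before any seed is set in this notebook, so the labelled subset probably changes between runs. A minimal sketch of a reproducible split follows; the helper name `split_labelled_unlabelled` and the `seed` argument are illustrative assumptions, not part of the original code.

```python
import numpy as np

def split_labelled_unlabelled(n_samples, labeled_ratio, seed=0):
    """Return disjoint index arrays for the labelled and unlabelled subsets."""
    rng = np.random.default_rng(seed)    # seeded generator, so the shuffle is reproducible
    order = rng.permutation(n_samples)   # random ordering of all sample indices
    n_labelled = int(np.round(n_samples * labeled_ratio))
    return order[:n_labelled], order[n_labelled:]

labelled_idx, unlabelled_idx = split_labelled_unlabelled(16000, 0.1)
print(len(labelled_idx), len(unlabelled_idx))
```

The two index arrays can then be used to index `X_train` and `y_train` in the same way as `randomize` in the cell above.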


In [10]:
# Test data
X_test = test_set.test_data
y_test = test_set.test_labels
X_test = np.multiply(X_test, 1./255)
X_test = X_test.view(-1,1,28,28)
#X_test = X_test.reshape(X_test.shape[0],X_test.shape[1]*X_test.shape[2])
X_test = X_test.type(torch.Tensor)
X_test = torch.FloatTensor(X_test)
y_test = torch.LongTensor(y_test)
X_test = Variable(X_test, requires_grad = False)
y_test = Variable(y_test, requires_grad = False)

In [None]:
# fig, ax = plt.subplots(1,2)
# ax[0].imshow(np.reshape(X_train[0],[28,28]))
# ax[1].imshow(np.reshape(X_test[0],[28,28]))

In [None]:
# # For debugging
# X_train_labelled = X_train_labelled[:300]
# X_train_unlabelled = X_train_unlabelled[:300]
# y_train = y_train[:300]
# X_test = X_test[:300]
# y_test = y_test[:300]

In [None]:
details = '' 

---

## Step by step

In [11]:
tensor_shapes = []
input_height = np.shape(X_train.detach().cpu().numpy())[-1]
tensor_shapes.append(input_height)
while input_height >1:
    input_height/=2
    input_height = np.ceil(input_height)
    tensor_shapes.append(int(input_height))
tensor_shapes.reverse()
print(tensor_shapes)

[1, 2, 4, 7, 14, 28]
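
The list above is obtained by halving the image side and rounding up until it reaches 1. A small sketch of the same computation in plain Python follows. The helper names are illustrative, and the claim that a stride-2 convolution with `kernel=3, padding=1` produces the same sizes is an assumption about one possible encoder configuration, not something taken from `StackedEncoders_Conv`.

```python
import math

def halving_sizes(size):
    """Spatial sizes from `size` down to 1, halving with ceil at each step."""
    sizes = [size]
    while size > 1:
        size = math.ceil(size / 2)
        sizes.append(size)
    return sizes[::-1]  # smallest first, as in tensor_shapes

def conv_out(n, kernel=3, stride=2, padding=1):
    """Output size of a standard convolution along one axis (dilation 1)."""
    return (n + 2 * padding - kernel) // stride + 1

print(halving_sizes(28))
print([conv_out(n) for n in (28, 14, 7, 4, 2)])
```

With those assumed parameters, `conv_out(n)` equals `ceil(n / 2)` for every positive `n`, which is why the two printed sequences describe the same ladder of sizes.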


In [15]:
noise_std = 0.2
encoder_sizes = [100, 50, 25, 25, 10]
encoder_activations =  ['relu','relu','relu','relu', 'softmax']
encoder_train_bn_scaling = [False, False,False, False, True]
use_cuda = True
learning_rate = 0.02
my_encoder = StackedEncoders_Conv(1, encoder_sizes, encoder_activations,
                                  encoder_train_bn_scaling, noise_std, use_cuda)

decoder_sizes = [25, 25, 50, 100, 1]
unsupervised_costs_lambda = [0.1, 0.1, 10., 1000.]

my_decoder = StackedDecoders_Conv(10, decoder_sizes, 1, tensor_shapes, use_cuda) 

if use_cuda:
    my_encoder.cuda()
    my_decoder.cuda()
    
optimizer = Adam(my_encoder.parameters(), lr = learning_rate)

batch_size = 100
batch_times = np.shape(X_train_labelled)[0]//batch_size
batch_size_unlabelled = int(np.shape(X_train_unlabelled)[0] / batch_size) # v0.2.6

loss = []
accuracies = []
y_pred_all = []
for epoch in tqdm_notebook(range(1)):
    
    for i in range(batch_times):
        if i >= 1: continue  # Debug: only the first batch runs; comment this line out to train on all batches
                    
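        # NOTE: slice ends are already exclusive, so the trailing -1 makes every batch
        # one sample short (99 labelled, 143 unlabelled); the shapes printed below reflect that.
        batch_train_labelled_images = torch.FloatTensor(X_train_labelled[i*batch_size:(i+1)*batch_size-1])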
        batch_train_unlabelled_images = torch.FloatTensor(X_train_unlabelled[i*batch_size_unlabelled:(i+1)*batch_size_unlabelled-1])
        batch_train_labelled_labels = torch.LongTensor(y_train[i*batch_size:(i+1)*batch_size-1])
        
        if use_cuda:
            batch_train_labelled_images = batch_train_labelled_images.cuda()
            batch_train_unlabelled_images = batch_train_unlabelled_images.cuda()
            batch_train_labelled_labels = batch_train_labelled_labels.cuda()
        
        labelled_data = Variable(batch_train_labelled_images, requires_grad=False)
        unlabelled_data = Variable(batch_train_unlabelled_images, requires_grad=False)
        labelled_target = Variable(batch_train_labelled_labels)

        my_encoder.train()
        optimizer.zero_grad()

        #1 forward noise labelled
        output_noise_labelled = my_encoder.forward_noise(labelled_data, printout = False)
        
        #2 forward noise unlabelled
        output_noise_unlabelled = my_encoder.forward_noise(unlabelled_data, printout = False)        

        #3 get_encoders_tilde_z
        tilde_z_layers_unlabelled = my_encoder.get_encoders_tilde_z(reverse=True, printout = False)

        #4 forward_encoders_clean
        output_clean_unlabelled = my_encoder.forward_clean(unlabelled_data, printout = False)

        #5 get_encoders_z_pre
        z_pre_layers_unlabelled = my_encoder.get_encoders_z_pre(reverse=True, printout = False)

        #6 get_encoders_z
        z_layers_unlabelled = my_encoder.get_encoders_z(reverse=True, printout = False)

        #7 get_encoder_tilde_z_bottom
        tilde_z_bottom_unlabelled = my_encoder.buffer_tilde_z_bottom.clone()
        
        #Extra get tensor shapes after convolutions in encoder
        tensor_shapes_after_convs = my_encoder.get_shapes_after_conv(printout = False)
        
        #DECODERS
        #8 
        hat_z_layers_unlabelled = my_decoder.forward(tilde_z_layers_unlabelled, 
                                                        output_noise_unlabelled,
                                                        tilde_z_bottom_unlabelled, printout = False)
        #9  
        z_pre_layers_unlabelled.append(unlabelled_data)
        #10
        z_layers_unlabelled.append(unlabelled_data)
        #11
        bn_hat_z_layers_unlabelled = my_decoder.bn_hat_z_layers(hat_z_layers_unlabelled, 
                                                                   z_pre_layers_unlabelled, printout = False)

        loss_supervised = torch.nn.CrossEntropyLoss()
        cost = loss_supervised(output_noise_labelled, labelled_target)

        cost.backward()
        optimizer.step()

        loss.append(cost.item())
    
    my_encoder.eval()
    with torch.no_grad():
        if use_cuda:
            X_test = X_test.cuda()
        
        X_test_cuda = Variable(X_test)
        y_pred = my_encoder.forward_clean(X_test_cuda)
        y_pred = np.squeeze(y_pred.data.cpu().numpy())
        y_pred_all.append(y_pred)
    
    pred = np.argmax(output_noise_labelled.detach().cpu().numpy(), axis=1)
    accuracies.append(accuracy_score(labelled_target.cpu().numpy(), pred))


In [21]:
for i in bn_hat_z_layers_unlabelled: print(f'bn_hat_z_layers_unlabelled = {np.shape(i.detach().cpu().numpy())}')

bn_hat_z_layers_unlabelled = (143, 10, 1, 1)
bn_hat_z_layers_unlabelled = (143, 25, 2, 2)
bn_hat_z_layers_unlabelled = (143, 25, 4, 4)
bn_hat_z_layers_unlabelled = (143, 50, 7, 7)
bn_hat_z_layers_unlabelled = (143, 100, 14, 14)
bn_hat_z_layers_unlabelled = (143, 1, 28, 28)


In [None]:
for i in hat_z_layers_unlabelled: print(f'hat_z = {np.shape(i.detach().cpu().numpy())}')

In [None]:
for i in z_pre_layers_unlabelled: print(f'z_pre = {np.shape(i.detach().cpu().numpy())}')

In [None]:
for i in z_layers_unlabelled: print(f'z = {np.shape(i.detach().cpu().numpy())}')

In [None]:
kernel_size = 4
padding  = 1
stride = 2
a = torch.ones((143,1,1,1))
print(a.shape)
my_conv2d = torch.nn.ConvTranspose2d(1, 25, kernel_size = kernel_size, padding = padding, stride = stride)
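# NOTE: assigning output_size as an attribute has no effect on the result;
# ConvTranspose2d only uses output_size when it is passed to the call, e.g. my_conv2d(a, output_size=...).
my_conv2d.output_size = (-1, -1, 2, 2)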
a = my_conv2d(a)
print(a.shape)

my_conv2d = torch.nn.ConvTranspose2d(25, 50, kernel_size = kernel_size, padding = padding, stride = stride)
my_conv2d.output_size = (-1, -1, 4, 4)
a = my_conv2d(a)
print(a.shape)

my_conv2d = torch.nn.ConvTranspose2d(50, 50, kernel_size = kernel_size, padding = padding, stride = stride)
my_conv2d.output_size = (-1, -1, 7, 7)
a = my_conv2d(a)
print(a.shape)

my_conv2d = torch.nn.ConvTranspose2d(50, 100, kernel_size = kernel_size, padding = padding, stride = stride)
my_conv2d.output_size = (-1, -1, 14, 14)
a = my_conv2d(a)
print(a.shape)

my_conv2d = torch.nn.ConvTranspose2d(100, 1, kernel_size = kernel_size, padding = padding, stride = stride)
my_conv2d.output_size = (-1, -1, 28, 28)
a = my_conv2d(a)
print(a.shape)

In [None]:
kernel_size = 4
padding  = 1
stride = 2
a = torch.ones((143,1,1,1))
print(a.shape)
my_conv2d = torch.nn.ConvTranspose2d(1, 25, kernel_size = kernel_size, padding = padding, stride = stride)
a = my_conv2d(a)
print(a.shape)

my_conv2d = torch.nn.ConvTranspose2d(25, 50, kernel_size = kernel_size, padding = padding, stride = stride)
a = my_conv2d(a)
print(a.shape)

my_conv2d = torch.nn.ConvTranspose2d(50, 50, kernel_size = kernel_size, padding = padding, stride = stride)
a = my_conv2d(a)
print(a.shape)

my_conv2d = torch.nn.ConvTranspose2d(50, 100, kernel_size = kernel_size, padding = padding, stride = stride)
a = my_conv2d(a)
print(a.shape)

my_conv2d = torch.nn.ConvTranspose2d(100, 1, kernel_size = kernel_size, padding = padding, stride = stride)
a = my_conv2d(a)
print(a.shape)
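
The shapes printed by the two cells above can be predicted without running PyTorch. For a transposed convolution with dilation 1, the output size along one axis is `(n - 1) * stride - 2 * padding + kernel + output_padding`. The sketch below applies that formula; the helper name and the per-layer kernel list are illustrative assumptions, not the configuration used in `StackedDecoders_Conv`.

```python
def conv_transpose_out(n, kernel, stride, padding, output_padding=0):
    """Output size of a transposed convolution along one axis (dilation 1)."""
    return (n - 1) * stride - 2 * padding + kernel + output_padding

# kernel=4, stride=2, padding=1 exactly doubles the size at every layer.
size, doubled = 1, []
for _ in range(5):
    size = conv_transpose_out(size, kernel=4, stride=2, padding=1)
    doubled.append(size)
print(doubled)

# Hypothetical kernel sizes that would reach the target shapes 2, 4, 7, 14, 28.
size, matched = 1, []
for kernel in (4, 4, 3, 4, 4):
    size = conv_transpose_out(size, kernel=kernel, stride=2, padding=1)
    matched.append(size)
print(matched)
```

With a fixed `kernel=4` the chain overshoots the encoder shapes from the third layer on (8 instead of 7), so the 4 to 7 step seems to need different parameters, such as the `kernel=3` used in the second loop.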

In [None]:
import torch

In [None]:
a=torch.ones((143, 10, 1, 1))*3
a.shape

In [None]:
b=torch.ones((143, 10, 1, 1))*2
b.shape

In [None]:
c = torch.mul(a,b)
c.shape

In [None]:
torch.mul(b_a2, u_l)

In [None]:
a = np.ones((1,10,1,1))
b = np.ones((143,1))

In [None]:
c = np.einsum('cChw,bc->bChw',a,b)

In [None]:
np.shape(c)
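
One way to check the einsum used above is to compare it with plain broadcasting. In `'cChw,bc->bChw'` the summed index `c` has length 1 for these shapes, so the result should equal an element-wise product of `a` with `b` reshaped to `(143, 1, 1, 1)`. The sketch below uses random values instead of ones so that the comparison is not trivially true; the variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.normal(size=(1, 10, 1, 1))   # one value per channel
b = rng.normal(size=(143, 1))        # one scalar per batch item

c_einsum = np.einsum('cChw,bc->bChw', a, b)
c_broadcast = a * b[:, :, None, None]   # (1,10,1,1) * (143,1,1,1) -> (143,10,1,1)

print(c_einsum.shape)
print(np.allclose(c_einsum, c_broadcast))
```

If the two results agree for the shapes used in the decoder, the einsum can be replaced by the broadcasted product, which is usually easier to read.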

In [None]:
mu_l = torch.mul(b_a1, torch.sigmoid(torch.mul(b_a2, u_l) + b_a3)) + \
               torch.mul(b_a4, u_l) + \
               b_a5
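
For reference, the expression above has the form of the mean term of the denoising function in the Ladder network paper. Assuming `b_a1` ... `b_a5` play the role of $a_1 \dots a_5$ and `u_l` is the batch-normalized signal $u$ coming from the layer above, it can be written as:

$$\mu(u) = a_1\,\mathrm{sigmoid}(a_2 u + a_3) + a_4 u + a_5$$

In the paper a second term of the same form, $v(u) = a_6\,\mathrm{sigmoid}(a_7 u + a_8) + a_9 u + a_{10}$, is combined with the noisy activation $\overset{\sim}{z}$ to give the reconstruction:

$$\hat{z} = \left(\overset{\sim}{z} - \mu(u)\right) v(u) + \mu(u)$$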

In [None]:
for i in tensor_shapes_after_convs: print(i)

In [None]:
accuracies_test = []
for i in y_pred_all:
    #pred_proba = i.data.cpu()
    pred = np.argmax(i, 1)
    accuracies_test.append(accuracy_score(y_test.numpy(),pred))
fig, ax = plt.subplots(1,2,figsize=(11,6))
ax[0].plot(accuracies, label = 'train')
ax[0].plot(accuracies_test, label = 'test')
ax[0].legend()
ax[1].plot(loss)

In [None]:
from sklearn.metrics import confusion_matrix
confusion_matrix(y_test.cpu().numpy(), pred)

---

# Ladder

In [None]:
seed = 42
np.random.seed(seed)
torch.manual_seed(seed)

In [None]:
# Configure the Ladder
noise_std = 0.2
encoder_sizes = [1000, 500, 250, 250, 250, 10]
decoder_sizes = [250, 250, 250, 500, 1000, 784]
unsupervised_costs_lambda = [0.1, 0.1, 0.1, 0.1, 0.1, 10., 1000.]  # 0.1, 0.1, 0.1, 0.1, 0.1, 10., 1000.
encoder_activations = ["relu", "relu", "relu", "relu", "relu", "softmax"]
encoder_train_bn_scaling = [False, False, False, False, False, True]

use_cuda = True 

ladder = Ladder(encoder_sizes, decoder_sizes, encoder_activations,
                    encoder_train_bn_scaling, noise_std, use_cuda)

epochs = 50
decay_epoch = 15
learning_rate = 0.02
initial_learning_rate = learning_rate
optimizer = Adam(ladder.parameters(), lr = learning_rate)
loss_supervised = torch.nn.CrossEntropyLoss()
loss_unsupervised = torch.nn.MSELoss()

if use_cuda:
    ladder.cuda()

batch_size = 100
batch_times = np.shape(X_train_labelled)[0]//batch_size
batch_size_unlabelled = int(np.shape(X_train_unlabelled)[0] / batch_size)

supervised_costs = []
unsupervised_costs = []
cost_main = []
y_pred_all = []

agg_cost = 0.
agg_supervised_cost = 0.
agg_unsupervised_cost = 0.
num_batches = 0

for epoch in range(epochs):
    
    if epoch > decay_epoch:
        ratio = float(epochs - epoch) / (epochs - decay_epoch)
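        # Scale the initial rate; multiplying the running rate would compound the decay every epoch
        learning_rate = initial_learning_rate * ratio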
        optimizer = Adam(ladder.parameters(), lr = learning_rate)
        
    for i in range(batch_times):
        
        if (i+1)%100==0 or i==batch_times-1: print(f'epoch = {epoch}, batch = {i+1}')
    
        ladder.train()
        optimizer.zero_grad()
        
        # Slice ends are exclusive, so no -1 is needed to get full batches
        batch_train_labelled_images = torch.FloatTensor(X_train_labelled[i*batch_size:(i+1)*batch_size])
        batch_train_unlabelled_images = torch.FloatTensor(X_train_unlabelled[i*batch_size_unlabelled:(i+1)*batch_size_unlabelled])
        batch_train_labelled_labels = torch.LongTensor(y_train[i*batch_size:(i+1)*batch_size])
        
        if use_cuda:
            batch_train_labelled_images = batch_train_labelled_images.cuda()
            batch_train_unlabelled_images = batch_train_unlabelled_images.cuda()
            batch_train_labelled_labels = batch_train_labelled_labels.cuda()
        
        labelled_data = Variable(batch_train_labelled_images, requires_grad=False)
        unlabelled_data = Variable(batch_train_unlabelled_images, requires_grad=False)
        labelled_target = Variable(batch_train_labelled_labels)
        
        # encoders
        output_noise_labelled = ladder.forward_encoders_noise(labelled_data)            # 1
        output_noise_unlabelled = ladder.forward_encoders_noise(unlabelled_data)        # 2
        tilde_z_layers_unlabelled = ladder.get_encoders_tilde_z(reverse=True)           # 3
        output_clean_unlabelled = ladder.forward_encoders_clean(unlabelled_data)        # 4
        z_pre_layers_unlabelled = ladder.get_encoders_z_pre(reverse=True)               # 5
        z_layers_unlabelled = ladder.get_encoders_z(reverse=True)                       # 6
        tilde_z_bottom_unlabelled = ladder.get_encoder_tilde_z_bottom()                 # 7
        # decoders
        hat_z_layers_unlabelled = ladder.forward_decoders(tilde_z_layers_unlabelled,    
                                                      output_noise_unlabelled,
                                                      tilde_z_bottom_unlabelled)
        z_pre_layers_unlabelled.append(unlabelled_data)
        z_layers_unlabelled.append(unlabelled_data)
        
        bn_hat_z_layers_unlabelled = ladder.decoder_bn_hat_z_layers(hat_z_layers_unlabelled, 
                                                                z_pre_layers_unlabelled)
        # costs
        cost_supervised = loss_supervised.forward(output_noise_labelled, labelled_target)
        cost_unsupervised = 0.
        assert len(z_layers_unlabelled) == len(bn_hat_z_layers_unlabelled)
        for cost_lambda, z, bn_hat_z in zip(unsupervised_costs_lambda, z_layers_unlabelled, bn_hat_z_layers_unlabelled):
            c = cost_lambda * loss_unsupervised.forward(bn_hat_z, z)
            cost_unsupervised += c
    
        # backprop
        cost = cost_supervised + cost_unsupervised
        cost.backward()                  
        optimizer.step()
    
        agg_cost += cost.item()
        agg_supervised_cost += cost_supervised.item()
        agg_unsupervised_cost += cost_unsupervised.item()
        num_batches += 1
        
        supervised_costs.append(cost_supervised.item())
        unsupervised_costs.append(cost_unsupervised.item())
        cost_main.append(cost.item())
        
    # eval
    ladder.eval()
    with torch.no_grad(): # So we don't run out of memory 
        if use_cuda:
            X_test = X_test.cuda()
        
        X_test_cuda = Variable(X_test)
        y_pred = ladder.forward_encoders_clean(X_test_cuda)
        y_pred_all.append(y_pred)
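
The decay ratio in the loop above falls linearly from 1 towards 0 between `decay_epoch` and `epochs`. The sketch below, with illustrative helper names, compares two ways of applying it: scaling the initial learning rate (a linear ramp) and scaling the running learning rate (a compounded product). Which one is intended is an assumption; the linear ramp is the usual reading of such a ratio.

```python
def linear_decay_lr(initial_lr, epoch, epochs, decay_epoch):
    """Initial rate until decay_epoch, then a linear ramp towards 0."""
    if epoch <= decay_epoch:
        return initial_lr
    return initial_lr * (epochs - epoch) / (epochs - decay_epoch)

def compounded_lr(initial_lr, epoch, epochs, decay_epoch):
    """Rate obtained when the running value is multiplied by the ratio each epoch."""
    lr = initial_lr
    for e in range(decay_epoch + 1, epoch + 1):
        lr *= (epochs - e) / (epochs - decay_epoch)
    return lr

for epoch in (15, 16, 30, 49):
    print(epoch,
          linear_decay_lr(0.02, epoch, epochs=50, decay_epoch=15),
          compounded_lr(0.02, epoch, epochs=50, decay_epoch=15))
```

Both schedules agree at the first decayed epoch, but the compounded one shrinks much faster afterwards, so most of the later epochs would train with a learning rate close to zero.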

In [None]:
plt.figure(figsize=(14,9))
accuracies = []
for i in y_pred_all:
    pred_proba = i.data.cpu()
    pred = np.argmax(pred_proba, 1)
    accuracies.append(accuracy_score(y_test.numpy(),pred))
plt.plot(np.arange(1,len(accuracies)+1),accuracies)
plt.ylabel('Accuracy', fontsize=18)
plt.xlabel('Epochs', fontsize=18)
plt.title('Test Acc', fontsize=18)
plt.xticks(fontsize=18)
plt.yticks(fontsize=18)
notes = mpatches.Patch(color='#1F77B4', label=(
    'cost.backward(),\n'
    f'ini_lr = {initial_learning_rate:.3f}, last_lambda = {unsupervised_costs_lambda[-1]}\n'
    f'decay_ep = {decay_epoch},\n'
    f'labelled = {np.shape(X_train_labelled)[0]} ({labeled_ratio*100:.0f}%)\n'
    f'unlabelled = {np.shape(X_train_unlabelled)[0]} ({(1-labeled_ratio)*100:.0f}%)\n'
    f'test = {np.shape(X_test)[0]}'))
plt.legend(handles=[notes], fontsize=18)
plt.savefig(f'ladder_mnist_acc_{int(labeled_ratio*100)}%_label_{initial_learning_rate:.3f}_lr_{int(unsupervised_costs_lambda[-1])}_lamda_{details}.png', bbox_inches='tight')

In [None]:
plt.figure(figsize=(14,9))
name_cost_main, = plt.plot(np.arange(1,len(cost_main)+1), cost_main, label='main_cost')
name_supervised_costs, = plt.plot(np.arange(1,len(supervised_costs)+1), supervised_costs, label='supervised_cost')
name_unsupervised_costs, = plt.plot(np.arange(1,len(unsupervised_costs)+1), unsupervised_costs, label='unsupervised_cost')
plt.ylabel('Loss', fontsize=18)
plt.xlabel('Mini Batch Iterations', fontsize=18)
plt.title('Training loss', fontsize=18)
plt.xticks(fontsize=18)
plt.yticks(fontsize=18)
plt.legend(handles=[name_cost_main, name_supervised_costs, name_unsupervised_costs], fontsize=18)
plt.savefig(f'ladder_mnist_loss_{int(labeled_ratio*100)}%_label_{initial_learning_rate:.3f}_lr_{int(unsupervised_costs_lambda[-1])}_lamda_{details}.png', bbox_inches='tight')

In [None]:
my_dict = {'accuracies':accuracies,'cost_main':cost_main,'supervised_costs':supervised_costs,
           'unsupervised_costs':unsupervised_costs}
df = pd.DataFrame(dict([ (k,Series(v)) for k,v in my_dict.items() ]))
df.to_csv(f'ladder_mnist_{int(labeled_ratio*100)}%_label_{initial_learning_rate:.3f}_lr_{int(unsupervised_costs_lambda[-1])}_lamda_{details}.csv',index=False)

In [None]:
df.head()

---

# Just encoder

In [None]:
use_cuda = True 

epochs = 50
decay_epoch = 15
learning_rate = 0.02
initial_learning_rate = learning_rate
optimizer = Adam(just_encoder.parameters(), lr = learning_rate)
loss_supervised = torch.nn.CrossEntropyLoss()

if use_cuda:
    just_encoder.cuda()

batch_size = 100
batch_times = np.shape(X_train_labelled)[0]//batch_size

supervised_costs = []
y_pred_all = []

agg_supervised_cost = 0.
num_batches = 0

for epoch in range(epochs):
    if epoch > decay_epoch:
        ratio = float(epochs - epoch) / (epochs - decay_epoch)
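        # Scale the initial rate; multiplying the running rate would compound the decay every epoch
        learning_rate = initial_learning_rate * ratio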
        optimizer = Adam(just_encoder.parameters(), lr = learning_rate)
        
    for i in range(batch_times):
        
        if (i+1)%100==0 or i==batch_times-1: print(f'epoch = {epoch}, batch = {i+1}')
    
        just_encoder.train()
        optimizer.zero_grad()
        
        # Slice ends are exclusive, so no -1 is needed to get full batches
        batch_train_labelled_images = torch.FloatTensor(X_train_labelled[i*batch_size:(i+1)*batch_size])
        batch_train_labelled_labels = torch.LongTensor(y_train[i*batch_size:(i+1)*batch_size])
        
        if use_cuda:
            batch_train_labelled_images = batch_train_labelled_images.cuda()
            batch_train_labelled_labels = batch_train_labelled_labels.cuda()
        
        labelled_data = Variable(batch_train_labelled_images, requires_grad=False)
        labelled_target = Variable(batch_train_labelled_labels)
        
        # encoders
        output_noise_labelled = just_encoder.forward_clean(labelled_data)      
        
        # costs
        cost_supervised = loss_supervised.forward(output_noise_labelled, labelled_target)
        
        # backprop
        cost_supervised.backward()                  
        optimizer.step()
    
        agg_supervised_cost += cost_supervised.item()
        num_batches += 1
        
        supervised_costs.append(cost_supervised.item())
        
    # eval
    just_encoder.eval()
    with torch.no_grad(): # So we don't run out of memory 
        if use_cuda:
            X_test = X_test.cuda()
        
        X_test_cuda = Variable(X_test)
        y_pred = just_encoder.forward_clean(X_test_cuda)
        y_pred_all.append(y_pred)
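
The mini-batch indexing used in the training loops can be checked in isolation. Because the end of a Python slice is exclusive, `data[i*bs:(i+1)*bs]` already returns `bs` items, while an extra `-1` on the end index drops the last sample of every batch. A minimal sketch, with an illustrative helper name and a plain list standing in for the tensors:

```python
def batch_slices(n_samples, batch_size):
    """Yield (start, stop) index pairs covering full batches only."""
    for i in range(n_samples // batch_size):
        yield i * batch_size, (i + 1) * batch_size

data = list(range(1600))   # stand-in for the 1600 labelled samples

full = [len(data[start:stop]) for start, stop in batch_slices(len(data), 100)]
short = [len(data[i * 100:(i + 1) * 100 - 1]) for i in range(16)]

print(full[:3], sum(full))    # every sample is used once
print(short[:3], sum(short))  # one sample per batch is skipped
```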

In [None]:
details_just_encoders = details + '_just_encoders'

In [None]:
plt.figure(figsize=(14,9))
accuracies = []
for i in y_pred_all:
    pred_proba = i.data.cpu()
    pred = np.argmax(pred_proba, 1)
    accuracies.append(accuracy_score(y_test.numpy(),pred))
plt.plot(np.arange(1,len(accuracies)+1),accuracies)
plt.ylabel('Accuracy', fontsize=18)
plt.xlabel('Epochs', fontsize=18)
plt.title('Test Acc (just encoder)', fontsize=18)
plt.xticks(fontsize=18)
plt.yticks(fontsize=18)
notes = mpatches.Patch(color='#1F77B4', label=(
    'cost.backward(),\n'
    f'ini_lr = {initial_learning_rate:.3f}, last_lambda = {unsupervised_costs_lambda[-1]}\n'
    f'decay_ep = {decay_epoch},\n'
    f'labelled = {np.shape(X_train_labelled)[0]} ({labeled_ratio*100:.0f}%)\n'
    f'unlabelled = {np.shape(X_train_unlabelled)[0]} ({(1-labeled_ratio)*100:.0f}%)\n'
    f'test = {np.shape(X_test)[0]}'))
plt.legend(handles=[notes], fontsize=18)
#plt.savefig(f'ladder_mnist_acc_{int(labeled_ratio*100)}%_label_{initial_learning_rate:.3f}_lr_{int(unsupervised_costs_lambda[-1])}_lamda_{details_just_encoders}.png', bbox_inches='tight')

In [None]:
plt.figure(figsize=(14,9))
name_supervised_costs, = plt.plot(np.arange(1,len(supervised_costs)+1), supervised_costs, label='supervised_cost')
plt.ylabel('Loss', fontsize=18)
plt.xlabel('Mini Batch Iterations', fontsize=18)
plt.title('Training loss (just encoder)', fontsize=18)
plt.xticks(fontsize=18)
plt.yticks(fontsize=18)
plt.legend(handles=[name_supervised_costs], fontsize=18)
#plt.savefig(f'ladder_mnist_loss_{int(labeled_ratio*100)}%_label_{initial_learning_rate:.3f}_lr_{int(unsupervised_costs_lambda[-1])}_lamda_{details_just_encoders}.png', bbox_inches='tight')

In [None]:
my_dict = {'accuracies':accuracies,'supervised_costs':supervised_costs}
df = pd.DataFrame(dict([ (k,Series(v)) for k,v in my_dict.items() ]))
df.to_csv(f'ladder_mnist_{int(labeled_ratio*100)}%_label_{initial_learning_rate:.3f}_lr_{int(unsupervised_costs_lambda[-1])}_lamda_{details_just_encoders}.csv',index=False)

---

# Compare ladder vs encoder

In [None]:
def ladder_vs_encoder(df_ladder, df_encoder, train_samples):
    plt.figure(figsize=(14,9))
    name_ladder, = plt.plot(df_ladder['accuracies'].values, label = 'ladder')
    name_encoder, = plt.plot(df_encoder['accuracies'].values, label = 'encoder')
    plt.ylabel('Acc', fontsize=18)
    plt.xlabel('Epochs', fontsize=18)
    plt.title('Ladder vs Encoder (Test Acc)', fontsize=18)
    plt.xticks(fontsize=18)
    plt.yticks(fontsize=18)
    notes = mpatches.Patch(hatch='.',color='#FFFFFF', label=f'{train_samples} train samples')
    plt.legend(handles=[name_ladder, name_encoder, notes], fontsize=18)
    plt.ylim([.3, 1])
    #plt.savefig(f'ladder_vs_encoder_{train_samples}_samples.png', bbox_inches='tight')

In [None]:
train_samples = 1000
results_path = 'ladder results - small subsets/'
df_encoder = pd.read_csv(results_path + f'ladder_mnist_10%_label_0.020_lr_1000_lamda_{train_samples}_train_samples_just_encoders.csv')
df_ladder = pd.read_csv(results_path + f'ladder_mnist_10%_label_0.020_lr_1000_lamda_{train_samples}_train_samples.csv')
ladder_vs_encoder(df_ladder, df_encoder, train_samples)

In [None]:
train_samples = 2000
results_path = 'ladder results - small subsets/'
df_encoder = pd.read_csv(results_path + f'ladder_mnist_10%_label_0.020_lr_1000_lamda_{train_samples}_train_samples_just_encoders.csv')
df_ladder = pd.read_csv(results_path + f'ladder_mnist_10%_label_0.020_lr_1000_lamda_{train_samples}_train_samples.csv')
ladder_vs_encoder(df_ladder, df_encoder, train_samples)

In [None]:
train_samples = 4000
results_path = 'ladder results - small subsets/'
df_encoder = pd.read_csv(results_path + f'ladder_mnist_10%_label_0.020_lr_1000_lamda_{train_samples}_train_samples_just_encoders.csv')
df_ladder = pd.read_csv(results_path + f'ladder_mnist_10%_label_0.020_lr_1000_lamda_{train_samples}_train_samples.csv')
ladder_vs_encoder(df_ladder, df_encoder, train_samples)

In [None]:
train_samples = 8000
results_path = 'ladder results - small subsets/'
df_encoder = pd.read_csv(results_path + f'ladder_mnist_10%_label_0.020_lr_1000_lamda_{train_samples}_train_samples_just_encoders.csv')
df_ladder = pd.read_csv(results_path + f'ladder_mnist_10%_label_0.020_lr_1000_lamda_{train_samples}_train_samples.csv')
ladder_vs_encoder(df_ladder, df_encoder, train_samples)

In [None]:
train_samples = 16000
results_path = 'ladder results - small subsets/'
df_encoder = pd.read_csv(results_path + f'ladder_mnist_10%_label_0.020_lr_1000_lamda_{train_samples}_train_samples_just_encoders.csv')
df_ladder = pd.read_csv(results_path + f'ladder_mnist_10%_label_0.020_lr_1000_lamda_{train_samples}_train_samples.csv')
ladder_vs_encoder(df_ladder, df_encoder, train_samples)

In [None]:
train_samples = 32000
results_path = 'ladder results - small subsets/'
df_encoder = pd.read_csv(results_path + f'ladder_mnist_10%_label_0.020_lr_1000_lamda_{train_samples}_train_samples_just_encoders.csv')
df_ladder = pd.read_csv(results_path + f'ladder_mnist_10%_label_0.020_lr_1000_lamda_{train_samples}_train_samples.csv')
ladder_vs_encoder(df_ladder, df_encoder, train_samples)
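
The six comparison cells above differ only in `train_samples`, so they could be collapsed into one loop. The sketch below only builds the two CSV paths per run; the helper name `result_paths` is an illustrative assumption, and the file-name pattern is copied from the cells above.

```python
def result_paths(train_samples, results_path='ladder results - small subsets/'):
    """Return the (ladder, just-encoder) CSV paths for one training-set size."""
    stem = f'ladder_mnist_10%_label_0.020_lr_1000_lamda_{train_samples}_train_samples'
    return results_path + stem + '.csv', results_path + stem + '_just_encoders.csv'

for train_samples in (1000, 2000, 4000, 8000, 16000, 32000):
    ladder_csv, encoder_csv = result_paths(train_samples)
    print(ladder_csv)
    print(encoder_csv)
    # In the notebook, the plot would then be produced with:
    # ladder_vs_encoder(pd.read_csv(ladder_csv), pd.read_csv(encoder_csv), train_samples)
```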