
## Biweekly report
### Pragna
### Mandadi
### Comparative Analysis of CNN vs MCDNN and MCDNN vs MCDNN with Dropout

In this biweekly report, I start with basic implementations of several neural networks and use them to build a Multi-column Deep Neural Network (MCDNN) inspired by the paper: https://arxiv.org/pdf/1202.2745.pdf Working through these implementations familiarizes me with the details of each network, which will help in future reports when I tweak and experiment with them. 
I have also done a comparative analysis of CNN vs MCDNN and MCDNN vs MCDNN with dropout on a reduced MNIST dataset, due to constraints on time and resources.

In [None]:
import os
import sys
import timeit
import time
import gzip
import pickle

import numpy

import theano
import theano.tensor as T
from theano.tensor.signal import pool
from theano.tensor.nnet import conv


For the implementation of the neural networks I chose the Theano library. I initially chose Theano because it is fast on problems involving large amounts of data when a GPU is available. Unfortunately I could not get access to a GPU, and since I had already written most of my code I stayed with it. There are other benefits, though. Theano takes symbolic expressions and compiles them into efficient code that uses numpy and some native libraries. It is mainly designed to handle the types of computation required by the large neural network algorithms used in Deep Learning. Theano is a sort of hybrid between numpy and sympy: an attempt to combine the two into one powerful library. Some advantages of Theano are as follows:  

__Stability Optimization__: Theano can detect some numerically unstable expressions and evaluate them by more stable means

__Symbolic Differentiation__: Theano can automatically build symbolic graphs for computing gradients.
These features are why it is a very popular library in the field of Deep Learning. 
This also gave me an opportunity to familiarize myself with a new library. I referred to the following links to learn Theano: 
https://www.geeksforgeeks.org/theano-in-python/
http://ir.hit.edu.cn/~jguo/docs/notes/a_simple_tutorial_on_theano.pdf
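
To make the stability point above concrete, here is a minimal NumPy sketch (not Theano code) of the kind of expression that is numerically unstable when evaluated literally. The choice of log(1 + exp(x)) as the example and the two function names are my own assumptions for illustration. Whether Theano rewrites any particular graph this way is not shown here.

```python
import numpy as np

def softplus_naive(x):
    # Literal transcription of log(1 + exp(x)); exp(x) overflows for large x.
    return np.log(1.0 + np.exp(x))

def softplus_stable(x):
    # Mathematically identical: log(exp(0) + exp(x)), evaluated without overflow.
    return np.logaddexp(0.0, x)

x = np.array([-5.0, 0.0, 1000.0])
with np.errstate(over="ignore"):   # silence the expected overflow warning
    naive = softplus_naive(x)
stable = softplus_stable(x)

print(naive)    # the last entry overflows to inf
print(stable)   # the last entry stays finite
```

The two versions agree wherever the naive one does not overflow, so a stable rewrite changes only the numerical behaviour, not the meaning of the expression.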

In [None]:
!pip3 install theano

In [3]:
!pip install --upgrade https://github.com/Theano/Theano/archive/master.zip



Collecting https://github.com/Theano/Theano/archive/master.zip
  Using cached https://github.com/Theano/Theano/archive/master.zip


In [None]:
pip install --upgrade https://github.com/Lasagne/Lasagne/archive/master.zip

I have implemented a typical hidden layer of an MLP. The units are fully connected and I have used tanh as the activation function, taking inspiration from the GitHub repo (https://github.com/xanwerneck/ml_mnist), which implements the MCDNN exactly as described in the paper mentioned earlier. 
W is initialized with W_values, which are sampled uniformly from the interval [-sqrt(6./(n_in+n_out)), sqrt(6./(n_in+n_out))] using a random number generator.

In [8]:
class HiddenLayer(object):
    def __init__(self, rng, input, n_in, n_out, W=None, b=None,
                 activation=T.tanh):
        self.input = input
        if W is None:
            W_values = numpy.asarray(
                rng.uniform(
                    low=-numpy.sqrt(6. / (n_in + n_out)),
                    high=numpy.sqrt(6. / (n_in + n_out)),
                    size=(n_in, n_out)
                ),
                dtype=theano.config.floatX
            )
        else:
            W_values = numpy.asarray(W, dtype=theano.config.floatX)

        if b is None:
            b_values = numpy.zeros((n_out,), dtype=theano.config.floatX)
        else:
            b_values = numpy.asarray(b, dtype=theano.config.floatX)
        self.W = theano.shared(value=W_values, name='W', borrow=True)
        self.b = theano.shared(value=b_values, name='b', borrow=True)
        lin_output = T.dot(input, self.W) + self.b
        self.output = (
            lin_output if activation is None
            else activation(lin_output)
        )
        self.params = [self.W, self.b]
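
As a quick sanity check of what the layer computes, here is a NumPy-only sketch of the same initialization and forward pass, tanh(x·W + b). The layer sizes, the seed and the random fake input are assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.RandomState(1234)   # arbitrary seed (assumption)
n_in, n_out = 784, 500              # illustrative layer sizes (assumption)

# Same bound as in HiddenLayer: uniform in [-sqrt(6/(n_in+n_out)), +sqrt(6/(n_in+n_out))]
bound = np.sqrt(6.0 / (n_in + n_out))
W = rng.uniform(low=-bound, high=bound, size=(n_in, n_out))
b = np.zeros(n_out)

x = rng.rand(3, n_in)               # mini-batch of 3 fake inputs, one per row
output = np.tanh(x.dot(W) + b)      # fully connected layer with tanh activation

print(output.shape)                 # (3, 500)
```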

I have implemented the dropout version of the hidden layer for the comparative analysis later. Dropout is a technique for addressing overfitting: units are randomly dropped during training, which approximates combining the predictions of many different thinned networks. 

In [126]:
class DropoutHiddenLayer(HiddenLayer):
    def __init__(self, rng, input, n_in, n_out,
                 activation, dropout_rate, W=None, b=None):
        super(DropoutHiddenLayer, self).__init__(
                rng=rng, input=input, n_in=n_in, n_out=n_out, W=W, b=b,
                activation=activation)

        self.output = _dropout_from_layer(rng, self.output, p=dropout_rate)

A multilayer perceptron is a feedforward artificial neural network model that has one or more layers of hidden units with nonlinear activations. Intermediate layers usually use tanh or the sigmoid function as the activation function (defined here by a HiddenLayer class), while the top layer is a softmax layer (defined here by a LogisticRegression class). One can stack many such hidden layers, making the architecture deep. 
I have implemented the MLP for experimental purposes; I was also keen on implementing as many neural nets as possible in Theano to gain expertise in both the library and the concepts of neural networks.

In [24]:
class MLP(object):
    def __init__(self, rng, input, n_in, n_hidden, n_out):
        self.hiddenLayer = HiddenLayer(
            rng=rng,
            input=input,
            n_in=n_in,
            n_out=n_hidden,
            activation=T.tanh
        )
        self.logRegressionLayer = LogisticRegression(
            input=self.hiddenLayer.output,
            n_in=n_hidden,
            n_out=n_out
        )
        self.L1 = (
            abs(self.hiddenLayer.W).sum()
            + abs(self.logRegressionLayer.W).sum()
        )
        self.L2_sqr = (
            (self.hiddenLayer.W ** 2).sum()
            + (self.logRegressionLayer.W ** 2).sum()
        )
        self.negative_log_likelihood = (
            self.logRegressionLayer.negative_log_likelihood
        )
        self.errors = self.logRegressionLayer.errors
        self.params = self.hiddenLayer.params + self.logRegressionLayer.params
        self.input = input

Logistic regression is a probabilistic, linear classifier. It is parametrized by a weight matrix $W$ and a bias vector $b$. Classification is done by projecting data points onto a set of hyperplanes, the distance to which is used to determine a class membership probability.
Mathematically, this can be written as:

$$
\begin{aligned}
P(Y=i \mid x, W, b) &= \mathrm{softmax}_i(W x + b) \\
                    &= \frac{e^{W_i x + b_i}}{\sum_j e^{W_j x + b_j}}
\end{aligned}
$$

The prediction of the model is then obtained by taking the argmax of the vector whose i'th element is $P(Y=i \mid x)$.
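
The softmax formula and the argmax prediction can be sketched in plain NumPy as follows. The tiny weight matrix and inputs are made-up values for illustration. Note that the sketch stores one example per row and computes x·W, which is the convention the code in this notebook uses, whereas the formula above is written as W x.

```python
import numpy as np

def softmax(z):
    # Subtracting the row-wise max does not change the result
    # but avoids overflow in exp for large scores.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

W = np.array([[1.0, 0.0, -1.0],
              [0.0, 2.0, 0.0]])    # shape (n_in=2, n_out=3), made-up values
b = np.zeros(3)
x = np.array([[1.0, 0.0],
              [0.0, 1.0]])         # two inputs, one per row

p_y_given_x = softmax(x.dot(W) + b)   # class membership probabilities
y_pred = p_y_given_x.argmax(axis=1)   # predicted class per input

print(p_y_given_x.sum(axis=1))        # each row sums to 1
print(y_pred)                         # [0 1]
```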

In [11]:
from PIL import Image
import scipy.misc

import pdb

SS = 28
class LogisticRegression(object):
    def __init__(self, input, n_in, n_out, W=None, b=None):
        if W is None:
            W_values = numpy.zeros((n_in, n_out), dtype=theano.config.floatX)
        else:
            W_values = numpy.asarray(W, dtype=theano.config.floatX)

        if b is None:
            b_values = numpy.zeros((n_out,), dtype=theano.config.floatX)
        else:
            b_values = numpy.asarray(b, dtype=theano.config.floatX)

        self.W = theano.shared(value=W_values, name='W', borrow=True)
        self.b = theano.shared(value=b_values, name='b', borrow=True)
        self.p_y_given_x = T.nnet.softmax(T.dot(input, self.W) + self.b)
        self.y_pred = T.argmax(self.p_y_given_x, axis=1)
        self.params = [self.W, self.b]
        
    

The function negative_log_likelihood below returns the mean of the negative log-likelihood of the predictions of this model under a given target distribution.
The errors function returns a float representing the number of errors in the minibatch divided by the total number of examples in the minibatch.

In [14]:
class LogisticRegression(LogisticRegression):
    def negative_log_likelihood(self, y):
        return -T.mean(T.log(self.p_y_given_x)[T.arange(y.shape[0]), y])
    def errors(self, y):
        if y.ndim != self.y_pred.ndim:
            raise TypeError(
                'y should have the same shape as self.y_pred',
                ('y', y.type, 'y_pred', self.y_pred.type)
            )
        if y.dtype.startswith('int'):
            return T.mean(T.neq(self.y_pred, y))
        else:
            raise NotImplementedError()
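
The two quantities defined above can be checked by hand on a small example. This is a NumPy sketch with made-up probabilities and labels, using the same indexing trick (`[arange(n), y]`) to pick the probability assigned to the correct class of each example.

```python
import numpy as np

# Made-up class probabilities for 4 examples and 3 classes
p_y_given_x = np.array([[0.70, 0.20, 0.10],
                        [0.10, 0.80, 0.10],
                        [0.30, 0.30, 0.40],
                        [0.25, 0.50, 0.25]])
y = np.array([0, 1, 0, 2])            # made-up true labels

# Mean negative log-likelihood of the correct classes
nll = -np.mean(np.log(p_y_given_x[np.arange(y.shape[0]), y]))

# Zero-one error: fraction of examples whose argmax differs from the label
y_pred = p_y_given_x.argmax(axis=1)
error_rate = np.mean(y_pred != y)

print(nll)
print(error_rate)                     # 0.5 (the last two examples are misclassified)
```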
        

ConvOp is the main workhorse for implementing a convolutional layer in Theano. It is used by conv2d (imported above from theano.tensor.nnet.conv), which takes two symbolic inputs:
- a 4D tensor corresponding to a mini-batch of input images. The shape of the tensor is: [mini-batch size, number of input feature maps, image height, image width].
- a 4D tensor corresponding to the weight matrix W. The shape of the tensor is: [number of feature maps at layer m, number of feature maps at layer m-1, filter height, filter width].

conv2d is the standard operator for convolutional neural networks working with batches of multi-channel 2D images.
Another important concept of CNNs is max-pooling, which is a form of non-linear down-sampling. Max-pooling partitions the input image into a set of non-overlapping rectangles and, for each such sub-region, outputs the maximum value.
Max-pooling is useful in vision for two reasons:
- By eliminating non-maximal values, it reduces computation for upper layers.
- It provides a form of translation invariance. Imagine cascading a max-pooling layer with a convolutional layer. There are 8 directions in which one can translate the input image by a single pixel. If max-pooling is done over a 2x2 region, 3 out of these 8 possible configurations will produce exactly the same output at the convolutional layer. For max-pooling over a 3x3 window, this jumps to 5/8.

Max-pooling is done in Theano by way of theano.tensor.signal.pool.pool_2d, the function used in the code below. It takes as input an N dimensional tensor (where N >= 2) and a downscaling factor and performs max-pooling over the 2 trailing dimensions of the tensor.
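
To illustrate non-overlapping max-pooling over the two trailing dimensions, here is a NumPy sketch. The function name `max_pool_2d_numpy` is hypothetical and this is not the Theano implementation. It drops any rows or columns that do not fill a complete window, which is how I understand `ignore_border=True` to behave.

```python
import numpy as np

def max_pool_2d_numpy(image, pool=(2, 2)):
    """Non-overlapping max-pooling over the two trailing dimensions.

    Rows/columns that do not fill a complete pooling window are dropped.
    """
    ph, pw = pool
    h, w = image.shape[-2:]
    oh, ow = h // ph, w // pw
    trimmed = image[..., :oh * ph, :ow * pw]
    # Split each spatial axis into (number of windows, window size)
    windows = trimmed.reshape(image.shape[:-2] + (oh, ph, ow, pw))
    # Take the maximum inside each window
    return windows.max(axis=(-3, -1))

image = np.arange(16, dtype=float).reshape(4, 4)
pooled = max_pool_2d_numpy(image)
print(pooled)
# [[ 5.  7.]
#  [13. 15.]]
```

The same function works unchanged on a 4D mini-batch of feature maps, because only the last two axes are pooled.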

The LeNetConvPoolLayer class implements a {convolution + max-pooling} layer.

In [68]:
class LeNetConvPoolLayer(object):
    def __init__(self, rng, input, filter_shape, image_shape, poolsize=(2, 2), W=None, b=None):
        assert image_shape[1] == filter_shape[1]
        self.input = input
        fan_in = numpy.prod(filter_shape[1:])
        fan_out = (filter_shape[0] * numpy.prod(filter_shape[2:]) /
                   numpy.prod(poolsize))
        if W is None:
            W_bound = numpy.sqrt(6. / (fan_in + fan_out))
            W_values = numpy.asarray(
                rng.uniform(low=-W_bound, high=W_bound, size=filter_shape),
                dtype=theano.config.floatX
            )
        else:
            W_values = W

        if b is None:
            b_values = numpy.zeros((filter_shape[0],), dtype=theano.config.floatX)
        else:
            b_values = b

        self.W = theano.shared(value=W_values, name='W', borrow=True)
        self.b = theano.shared(value=b_values, name='b', borrow=True)

        conv_out = conv.conv2d(
            input=input,
            filters=self.W,
            filter_shape=filter_shape,
            image_shape=image_shape
        )

        if (poolsize[0] == 1 and poolsize[1] == 1):
            pooled_out = conv_out
        else:
            pooled_out = pool.pool_2d(
                input=conv_out,
                ds=poolsize,
                ignore_border=True
            )

        self.output = T.tanh(pooled_out + self.b.dimshuffle('x', 0, 'x', 'x'))
        self.params = [self.W, self.b]
        self.input = input


Below is the dropout version of LeNetConvPoolLayer

In [128]:
class DropoutLeNetConvPoolLayer(LeNetConvPoolLayer):
    def __init__(self, rng, input, filter_shape, image_shape, poolsize,
               dropout_rate, W=None, b=None):
        super(DropoutLeNetConvPoolLayer, self).__init__(
          rng=rng, input=input, filter_shape=filter_shape, image_shape=image_shape,
          poolsize=poolsize, W=W, b=b)

        self.output = _dropout_from_layer(rng, self.output, p=dropout_rate)

This function takes a layer (which can be either a layer of units in an MLP or a layer of feature maps in a CNN) and drops units from the layer with probability p (or, in the case of a CNN, pixels from feature maps with probability p).

In [125]:
def _dropout_from_layer(rng, layer, p):
    """p is the probability of dropping a unit
    """
    srng = theano.tensor.shared_randomstreams.RandomStreams(
            rng.randint(999999))
    # p=1-p because 1's indicate keep and p is probability of dropping
    mask = srng.binomial(n=1, p=1-p, size=layer.shape)
    # The cast is important because
    # int * float32 = float64 which pulls things off the gpu
    output = layer * T.cast(mask, theano.config.floatX)
    return output
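
The masking step can be tried out without Theano. Below is a NumPy sketch of the same idea, with a hypothetical function name `dropout_numpy`. Like the function above, it only multiplies by a binary keep-mask and does not rescale the surviving activations.

```python
import numpy as np

def dropout_numpy(rng, layer, p):
    # p is the probability of dropping a unit,
    # so each unit is kept (mask value 1) with probability 1 - p.
    mask = rng.binomial(n=1, p=1 - p, size=layer.shape)
    # Cast the integer mask so the output keeps the input dtype
    return layer * mask.astype(layer.dtype)

rng = np.random.RandomState(0)                   # arbitrary seed (assumption)
layer = np.ones((1000, 100), dtype=np.float32)   # fake activations, all ones
dropped = dropout_numpy(rng, layer, p=0.5)

kept_fraction = dropped.mean()                   # close to 1 - p for an all-ones input
print(kept_fraction)
```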

For my experiments I have used the MNIST dataset, which the load_data function loads. The MNIST database (Modified National Institute of Standards and Technology database) is a large database of handwritten digits that is commonly used for training various image processing systems. The database is also widely used for training and testing in the field of machine learning. You can download the dataset in pickled format from: http://www.iro.umontreal.ca/~lisa/deep/data/mnist/mnist.pkl.gz
Once the dataset is loaded, I reduce the training set from the usual 50000 examples to 10000, and the validation and test sets from 10000 examples each to 2000. I tried to preserve the ratio of training to validation to test data even though I shrank the whole dataset because of limited computational resources.
Once the train, validation and test sets are determined, the data must be processed according to the normalization width of the particular DNN column it will be fed to. For that we use prepare_digits, which in turn calls the normalize_digit function to resize the images (28x28 by default) and add padding if necessary.
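
The padding step is the simplest part to see in isolation. Here is a NumPy sketch of zero-padding a 28x28 digit to the 29x29 input size used by the DNN columns later in this report. The helper name `center_pad` is hypothetical, it only handles the case where the target is at least as large as the image, and it uses floor division where the notebook code uses `round`.

```python
import numpy as np

def center_pad(image, end_size):
    # Zero-pad a square image to end_size x end_size.
    # When the padding is odd, the extra row/column goes at the end.
    padding = end_size - image.shape[0]
    before = padding // 2
    after = padding - before
    return np.pad(image, ((before, after), (before, after)), mode="constant")

digit = np.ones((28, 28), dtype=np.float32)   # stand-in for an MNIST digit
padded = center_pad(digit, 29)

print(padded.shape)                           # (29, 29)
```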

In [122]:
def prepare_digits(sets, end_size, digit_normalized_width):
        set_x, set_y = sets[0], sets[1]
        out = numpy.ndarray((set_x.shape[0], end_size**2), dtype=numpy.float32)

        for i in range(0,set_x.shape[0]):
            x = set_x[i].reshape((SS,SS))
            if digit_normalized_width and (set_y[i] - 1): # don't normalize images of digit '1'
                out[i] = normalize_digit(x, digit_normalized_width, end_size).reshape(end_size**2)
            else:
                out[i] = pad_image(x, end_size).reshape(end_size**2)
        return out

def pad_image(x, end_size):
        cs = x.shape[0]
        padding = end_size - cs
        bp = round(padding / 2)
        ap = padding - bp
        pads = (bp,ap)
        if bp + ap > 0:
            return numpy.pad(x,(pads,pads),mode='constant').reshape(end_size**2)
        else:
            si = -bp
            ei = cs + ap
            return x[si:ei, si:ei].reshape(end_size**2)
from PIL import Image
def normalize_digit(x, digit_normalized_width, end_size):
        width_diff = digit_normalized_width - sum(sum(x) != 0)
        if width_diff:
            nd = SS + width_diff # new dim
            new_size = nd, nd
            im = Image.fromarray(x)
            # LANCZOS is the current name of the ANTIALIAS filter (ANTIALIAS was removed in Pillow 10)
            normalized_image = im.resize(new_size, Image.LANCZOS)
            x = numpy.array(normalized_image.getdata(), dtype=numpy.float32).reshape((nd,nd)) / 255
        return pad_image(x, end_size)
    
def subtract_channel_mean(dataset, image_shape, channel_means, accuracy_dtype):
        orig_shape = dataset[0].shape
        full_shape = (dataset[0].shape[0], image_shape[0], image_shape[1], image_shape[2])
        xs = numpy.asarray(dataset[0].reshape(full_shape), dtype=accuracy_dtype)
        xs[:,:,:,0] -= channel_means[0]
        xs[:,:,:,1] -= channel_means[1]
        xs[:,:,:,2] -= channel_means[2]
        return (xs.reshape(orig_shape), dataset[1])

def divide_channel_max(dataset, image_shape, channel_maxes):
        orig_shape = dataset[0].shape
        full_shape = (dataset[0].shape[0], image_shape[0], image_shape[1], image_shape[2])
        xs = numpy.asarray(dataset[0].reshape(full_shape), dtype='float32')
        xs[:,:,:,0] /= channel_maxes[0]
        xs[:,:,:,1] /= channel_maxes[1]
        xs[:,:,:,2] /= channel_maxes[2]
        return (xs.reshape(orig_shape), dataset[1])

def load_data(dataset, digit_normalized_width=0, digit_out_image_size=SS,
              conserve_gpu_memory=False, center=0, normalize=0, image_shape=None, y_values_only=False):
    data_dir, data_file = os.path.split(dataset)
    data_ext = '.'.join(data_file.split('.')[1:])
    input_pixel_max = 1 if data_file == 'mnist.pkl.gz' else 255

    if data_file == 'mnist.pkl.gz':
        if data_dir == "" and not os.path.isfile(dataset):
            new_path = os.path.join(
                os.path.split(os.path.abspath(''))[0],
                "data",
                dataset
            )
            if os.path.isfile(new_path) or data_file == 'mnist.pkl.gz':
                dataset = new_path

        if (not os.path.isfile(dataset)) and data_file == 'mnist.pkl.gz':
            import urllib.request
            origin = (
                'http://www.iro.umontreal.ca/~lisa/deep/data/mnist/mnist.pkl.gz'
            )
            print('Downloading data from %s' % origin)
            print(data_dir)
            urllib.request.urlretrieve(origin, dataset)

    print('... loading data')
    if data_file == 'mnist.pkl.gz':
        f = gzip.open(dataset, 'rb')
        u = pickle._Unpickler(f)
        u.encoding = 'latin1'
        train_set, valid_set, test_set = u.load()
        train_set = (train_set[0][:10000],train_set[1][:10000])
        valid_set = (valid_set[0][:2000],valid_set[1][:2000])
        test_set = (test_set[0][:2000],test_set[1][:2000])
        if digit_normalized_width or (digit_out_image_size != SS):
            if digit_normalized_width:
                print('... normalizing digits to width %i with extra padding %i' % (digit_normalized_width, digit_out_image_size - SS))
            else:
                print('... (un)padding digits from %i -> %i' % (SS, digit_out_image_size))
            train_set = (prepare_digits(train_set, digit_out_image_size, digit_normalized_width), train_set[1])
            valid_set = (prepare_digits(valid_set, digit_out_image_size, digit_normalized_width), valid_set[1])
            test_set =  (prepare_digits(test_set, digit_out_image_size, digit_normalized_width),  test_set[1])
        else:
            print('... skipping digit normalization and image padding')

        f.close()
    elif data_ext == 'npz':
        with numpy.load(dataset) as archive:
            train_set = (archive['arr_0'], archive['arr_1'])
            valid_set = (archive['arr_2'], archive['arr_3'])
            test_set =  (archive['arr_4'], archive['arr_5'])
    else:
        raise ValueError("unsupported data extension %s" % data_ext)

    if y_values_only:
        print('... returning y values')
        return (train_set[1], valid_set[1], test_set[1])

    accuracy_dtype = int
    if center == 1:
        assert(image_shape)
        print('... subtracting channel means')
        channel_means = numpy.mean(train_set[0].reshape(train_set[0].shape[0], *image_shape), axis=(0,1,2))
        train_set = subtract_channel_mean(train_set, image_shape, channel_means, accuracy_dtype)
        valid_set = subtract_channel_mean(valid_set, image_shape, channel_means, accuracy_dtype)
        test_set = subtract_channel_mean(test_set, image_shape, channel_means, accuracy_dtype)
    elif center == 2:
        print('... subtracting mean images')
        raise NotImplementedError()

    if not input_pixel_max == 1:
        if normalize == 1:
            print('... normalizing with max channel pixel values')
            channel_maxes = numpy.array(255 - channel_means, dtype=accuracy_dtype)
            train_set = divide_channel_max(train_set, image_shape, channel_maxes)
            valid_set = divide_channel_max(valid_set, image_shape, channel_maxes)
            test_set = divide_channel_max(test_set, image_shape, channel_maxes)
        elif normalize == 2:
            print('... normalizing with image std deviations')
            raise NotImplementedError()

    print('... sharing data')
    def share_dataset(data_xy, borrow=True, image_shape=None, conserve_gpu_memory=False):
        data_x, data_y = data_xy
        if image_shape:
            data_x = data_x.reshape(data_x.shape[0], *image_shape)
            data_x = numpy.rollaxis(data_x, 3, 1)

        if conserve_gpu_memory:
            shared_x = theano.tensor._shared(numpy.asarray(data_x,
                                               dtype=theano.config.floatX),
                                 borrow=borrow)
            shared_y = theano.tensor._shared(numpy.asarray(data_y,
                                               dtype=theano.config.floatX),
                                 borrow=borrow)
        else:
            shared_x = theano.shared(numpy.asarray(data_x,
                                                   dtype=theano.config.floatX),
                                     borrow=borrow)
            shared_y = theano.shared(numpy.asarray(data_y,
                                                   dtype=theano.config.floatX),
                                     borrow=borrow)
        return shared_x, T.cast(shared_y, 'int32')

    test_set_x, test_set_y = share_dataset(test_set, image_shape=image_shape, conserve_gpu_memory=conserve_gpu_memory)
    valid_set_x, valid_set_y = share_dataset(valid_set, image_shape=image_shape, conserve_gpu_memory=conserve_gpu_memory)
    train_set_x, train_set_y = share_dataset(train_set, image_shape=image_shape, conserve_gpu_memory=conserve_gpu_memory)

    rval = [(train_set_x, train_set_y), (valid_set_x, valid_set_y),
            (test_set_x, test_set_y)]
    return rval

    

        

In [168]:
def evaluate_lenet5(learning_rate=0.1, n_epochs=50,
                    dataset='mnist.pkl.gz',
                    nkerns=[20, 50], batch_size=100):

    rng = numpy.random.RandomState(23455)

    datasets = load_data(dataset)

    train_set_x, train_set_y = datasets[0]
    valid_set_x, valid_set_y = datasets[1]
    test_set_x, test_set_y = datasets[2]
    n_train_batches = train_set_x.get_value(borrow=True).shape[0]
    n_valid_batches = valid_set_x.get_value(borrow=True).shape[0]
    n_test_batches = test_set_x.get_value(borrow=True).shape[0]
    n_train_batches //= batch_size
    n_valid_batches //= batch_size
    n_test_batches //= batch_size
    index = T.lscalar()
    x = T.matrix('x')    # rasterized images, one per row
    y = T.ivector('y')   # integer class labels
    print('... building the model')

    layer0_input = x.reshape((batch_size, 1, 28, 28))

    layer0 = LeNetConvPoolLayer(
        rng,
        input=layer0_input,
        image_shape=(batch_size, 1, 28, 28),
        filter_shape=(nkerns[0], 1, 5, 5),
        poolsize=(2, 2)
    )

    layer1 = LeNetConvPoolLayer(
        rng,
        input=layer0.output,
        image_shape=(batch_size, nkerns[0], 12, 12),
        filter_shape=(nkerns[1], nkerns[0], 5, 5),
        poolsize=(2, 2)
    )

    layer2_input = layer1.output.flatten(2)

    layer2 = HiddenLayer(
        rng,
        input=layer2_input,
        n_in=nkerns[1] * 4 * 4,
        n_out=500,
        activation=T.tanh
    )
    
    layer3 = LogisticRegression(input=layer2.output, n_in=500, n_out=10)

    cost = layer3.negative_log_likelihood(y)

    test_model = theano.function(
        [index],
        layer3.errors(y),
        givens={
            x: test_set_x[index * batch_size: (index + 1) * batch_size],
            y: test_set_y[index * batch_size: (index + 1) * batch_size]
        }
    )

    validate_model = theano.function(
        [index],
        layer3.errors(y),
        givens={
            x: valid_set_x[index * batch_size: (index + 1) * batch_size],
            y: valid_set_y[index * batch_size: (index + 1) * batch_size]
        }
    )

    params = layer3.params + layer2.params + layer1.params + layer0.params

    grads = T.grad(cost, params)

    updates = [
        (param_i, param_i - learning_rate * grad_i)
        for param_i, grad_i in zip(params, grads)
    ]

    train_model = theano.function(
        [index],
        cost,
        updates=updates,
        givens={
            x: train_set_x[index * batch_size: (index + 1) * batch_size],
            y: train_set_y[index * batch_size: (index + 1) * batch_size]
        }
    )
    print('... training')
    patience = 10000
    patience_increase = 2 
    improvement_threshold = 0.995 
    validation_frequency = min(n_train_batches, patience // 2)
                

    best_validation_loss = numpy.inf
    best_iter = 0
    test_score = 0.
    start_time = timeit.default_timer()

    epoch = 0
    done_looping = False

    while (epoch < n_epochs) and (not done_looping):
        epoch = epoch + 1
        for minibatch_index in range(n_train_batches):

            iter = (epoch - 1) * n_train_batches + minibatch_index

            if iter % 100 == 0:
                print('training @ iter = ', iter)
            cost_ij = train_model(minibatch_index)

            if (iter + 1) % validation_frequency == 0:

                # compute zero-one loss on validation set
                validation_losses = [validate_model(i) for i
                                     in range(n_valid_batches)]
                this_validation_loss = numpy.mean(validation_losses)
                print('epoch %i, minibatch %i/%i, validation error %f %%' %
                      (epoch, minibatch_index + 1, n_train_batches,
                       this_validation_loss * 100.))

                # if we got the best validation score until now
                if this_validation_loss < best_validation_loss:

                    #improve patience if loss improvement is good enough
                    if this_validation_loss < best_validation_loss *  \
                       improvement_threshold:
                        patience = max(patience, iter * patience_increase)

                    # save best validation score and iteration number
                    best_validation_loss = this_validation_loss
                    best_iter = iter

                    # test it on the test set
                    test_losses = [
                        test_model(i)
                        for i in range(n_test_batches)
                    ]
                    test_score = numpy.mean(test_losses)
                    print(('     epoch %i, minibatch %i/%i, test error of '
                           'best model %f %%') %
                          (epoch, minibatch_index + 1, n_train_batches,
                           test_score * 100.))

            if patience <= iter:
                done_looping = True
                break

    end_time = timeit.default_timer()
    print('Optimization complete.')
    print('Best validation score of %f %% obtained at iteration %i, '
          'with test performance %f %%' %
          (best_validation_loss * 100., best_iter + 1, test_score * 100.))
    print(('The code for file ' +
           os.path.split(os.path.abspath(''))[1] +
           ' ran for %.2fm' % ((end_time - start_time) / 60.)),
          file=sys.stderr)
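
The early-stopping rule inside the training loop above is easier to follow in isolation. This is a simplified pure-Python sketch under two assumptions: the function name `run_with_patience` is hypothetical, and validation happens at every iteration rather than every `validation_frequency` iterations. It replays a list of validation losses through the same patience logic.

```python
def run_with_patience(validation_losses, patience=4, patience_increase=2,
                      improvement_threshold=0.995):
    """Return (best_loss, best_iter, last_iter) for a sequence of losses."""
    best_loss = float("inf")
    best_iter = 0
    for it, loss in enumerate(validation_losses):
        if loss < best_loss:
            # Extend patience only when the improvement is large enough
            if loss < best_loss * improvement_threshold:
                patience = max(patience, it * patience_increase)
            best_loss = loss
            best_iter = it
        # Stop once we have run out of patience
        if patience <= it:
            return best_loss, best_iter, it
    return best_loss, best_iter, len(validation_losses) - 1

# The loss plateaus after iteration 2, so training stops at iteration 4
plateau = [0.5, 0.4, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3]
print(run_with_patience(plateau))             # (0.3, 2, 4)
```

With steadily improving losses, patience keeps growing and the loop runs to the end of the list.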


Below is a demonstration of training the CNN (built from LeNetConvPoolLayer) on the MNIST dataset with a batch size of 100 over 50 epochs. The results show a best validation error of 1.8%, with the validation error converging towards the final epochs, and a test error of 2.65% for the trained model.

In [169]:
evaluate_lenet5()

... loading data
... skipping digit normalization and image padding
... sharing data
... building the model


  pooled_out = pool.pool_2d(


... training
training @ iter =  0
epoch 1, minibatch 100/100, validation error 12.450000 %
     epoch 1, minibatch 100/100, test error of best model 13.650000 %
training @ iter =  100
epoch 2, minibatch 100/100, validation error 8.550000 %
     epoch 2, minibatch 100/100, test error of best model 10.100000 %
training @ iter =  200
epoch 3, minibatch 100/100, validation error 6.500000 %
     epoch 3, minibatch 100/100, test error of best model 8.050000 %
training @ iter =  300
epoch 4, minibatch 100/100, validation error 5.000000 %
     epoch 4, minibatch 100/100, test error of best model 6.450000 %
training @ iter =  400
epoch 5, minibatch 100/100, validation error 4.300000 %
     epoch 5, minibatch 100/100, test error of best model 5.800000 %
training @ iter =  500
epoch 6, minibatch 100/100, validation error 3.700000 %
     epoch 6, minibatch 100/100, test error of best model 4.850000 %
training @ iter =  600
epoch 7, minibatch 100/100, validation error 3.450000 %
     epoch 7, minib

Below is the implementation of one DNN column of the MCDNN. Each DNN column is initialized with the normalized width and the amount of distortion to apply to the images in the data. I have implemented the DNN class both with and without dropout. The dropout rates I have used are 20% for the convolutional layers and 50% for the hidden layer.
Each DNN column starts with convolution layers with max-pooling, which reduce the size of the images and create feature maps. After the convolution layers, a hidden layer transforms the output into something the logistic regression layer can use, which in turn gives the probability of each class that a particular image can belong to.
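
The feature-map sizes that appear in the column (29, 13 and finally 3) follow from simple arithmetic. Here is a small sketch, assuming a 'valid' convolution followed by non-overlapping pooling that drops incomplete windows. The helper name `conv_pool_output_size` is hypothetical.

```python
def conv_pool_output_size(size, filter_size, pool_size):
    # A 'valid' convolution shrinks each side by filter_size - 1;
    # non-overlapping pooling then divides it, dropping any remainder.
    conv_size = size - filter_size + 1
    return conv_size // pool_size

s0 = conv_pool_output_size(29, 4, 2)   # 29 -> 26 -> 13
s1 = conv_pool_output_size(s0, 5, 3)   # 13 -> 9 -> 3
print(s0, s1)                          # 13 3
```

This is why the hidden layer of the column takes `nkerns[1] * 3 * 3` inputs. The same arithmetic for the earlier LeNet-style network (28x28 input, 5x5 filters, 2x2 pooling) gives 12 and then 4.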

In [133]:
class DNNColumn(object):

    def __init__(self, ds=None, nkerns=[32, 48], batch_size=100, normalized_width=0, distortion=0,
                    params=[None, None,None, None,None, None,None, None],dropout=False):
        layer3_W, layer3_b, layer2_W, layer2_b, layer1_W, layer1_b, layer0_W, layer0_b = params    
        train_set_x, train_set_y = ds[0]
        valid_set_x, valid_set_y = ds[1]
        test_set_x, test_set_y   = ds[2]
        dropout_rates = [0.2, 0.2, 0.2, 0.5]
        self.n_train_batches  = train_set_x.get_value(borrow=True).shape[0]
        self.n_valid_batches  = valid_set_x.get_value(borrow=True).shape[0]
        self.n_test_batches   = test_set_x.get_value(borrow=True).shape[0]
        self.n_train_batches //= batch_size
        self.n_valid_batches //= batch_size
        self.n_test_batches  //= batch_size
        index = T.lscalar()
        learning_rate = T.fscalar()
        rng = numpy.random.RandomState(23455)
        print('... building the dnn column')
        x = T.matrix('x')
        y = T.ivector('y')
        layer0_input = x.reshape((batch_size, 1, 29, 29))
        if dropout:
            layer0 = DropoutLeNetConvPoolLayer(
                rng,
                input=layer0_input,
                image_shape=(batch_size, 1, 29, 29),
                filter_shape=(nkerns[0], 1, 4, 4),
                poolsize=(2, 2),
                dropout_rate=dropout_rates[1]
            )
            layer1 = DropoutLeNetConvPoolLayer(
                rng,
                input=layer0.output,
                image_shape=(batch_size, nkerns[0], 13, 13),
                filter_shape=(nkerns[1], nkerns[0], 5, 5),
                poolsize=(3, 3),
                dropout_rate= dropout_rates[2]
            )
        else:
            layer0 = LeNetConvPoolLayer(
                rng,
                input=layer0_input,
                image_shape=(batch_size, 1, 29, 29),
                filter_shape=(nkerns[0], 1, 4, 4),
                poolsize=(2, 2),
                W=layer0_W,
                b=layer0_b
            )
            layer1 = LeNetConvPoolLayer(
                rng,
                input=layer0.output,
                image_shape=(batch_size, nkerns[0], 13, 13),
                filter_shape=(nkerns[1], nkerns[0], 5, 5),
                poolsize=(3, 3),
                W=layer1_W,
                b=layer1_b
            )

        layer2_input = layer1.output.flatten(2)
        if dropout:
            layer2 = DropoutHiddenLayer(
                rng,
                input=layer2_input,
                n_in=nkerns[1] * 3 * 3,
                n_out=150,
                dropout_rate=dropout_rates[3],
                activation=T.tanh
            )
        else:
            layer2 = HiddenLayer(
                rng,
                input=layer2_input,
                n_in=nkerns[1] * 3 * 3,
                n_out=150,
                W=layer2_W,
                b=layer2_b,
                activation=T.tanh
            )
        
        layer3 = LogisticRegression(
            input=layer2.output, 
            n_in=150, 
            n_out=10,
            W=layer3_W,
            b=layer3_b
        )

        cost = layer3.negative_log_likelihood(y)
        self.test_model = theano.function(
            [index],
            layer3.errors(y),
            givens={
                x: test_set_x[index * batch_size: (index + 1) * batch_size],
                y: test_set_y[index * batch_size: (index + 1) * batch_size]
            }
        )
        self.valid_output_batch = theano.function(
            [index],
            layer3.p_y_given_x,
            givens={
                x: valid_set_x[index * batch_size: (index + 1) * batch_size]
            }
        )
        self.test_output_batch = theano.function(
            [index],
            layer3.p_y_given_x,
            givens={
                x: test_set_x[index * batch_size: (index + 1) * batch_size]
            }
        )
        self.validate_model = theano.function(
            [index],
            layer3.errors(y),
            givens={
                x: valid_set_x[index * batch_size: (index + 1) * batch_size],
                y: valid_set_y[index * batch_size: (index + 1) * batch_size]
            }
        )
        self.params = layer3.params + layer2.params + layer1.params + layer0.params
        self.column_params = [nkerns, batch_size, normalized_width, distortion]

        grads  = T.grad(cost, self.params)

        updates = [
            (param_i, param_i - learning_rate * grad_i)
            for param_i, grad_i in zip(self.params, grads)
        ]

        # train the model
        self.train_model = theano.function(
            [index, learning_rate],
            cost,
            updates=updates,
            givens={
                x: train_set_x[index * batch_size: (index + 1) * batch_size],
                y: train_set_y[index * batch_size: (index + 1) * batch_size]
            }
        )

In [135]:
class DNNColumn(DNNColumn):
    def valid_outputs(self):
        test_losses = [
            self.valid_output_batch(i)
            for i in range(self.n_valid_batches)
        ]
        return numpy.concatenate(test_losses)

    def test_outputs(self):
        test_losses = [
            self.test_output_batch(i)
            for i in range(self.n_test_batches)
        ]
        return numpy.concatenate(test_losses)

    def train_column(self, n_epochs=800,init_learning_rate=0.001):
        ######################
        # TRAIN MODEL COLUMN #
        ######################
        print('... training')
        # early-stopping parameters
        patience = 10000
        patience_increase = 2  
        improvement_threshold = 0.995  
        validation_frequency = min(self.n_train_batches, patience // 2)

        best_validation_loss = numpy.inf
        best_iter = 0
        test_score = 0.  # never updated below, so the final report always prints 0 % test performance
        start_time = timeit.default_timer()

        epoch = 0
        done_looping = False
        
        while (epoch < n_epochs) and (not done_looping):
            current_learning_rate = max(numpy.array([init_learning_rate * 0.993**epoch, 0.00003], dtype=numpy.float32))
            epoch = epoch + 1

            for minibatch_index in range(self.n_train_batches):

                iter = (epoch - 1) * self.n_train_batches + minibatch_index

                if iter % 100 == 0:
                    print('training @ iter = ', iter)
                
                cost_ij = self.train_model(minibatch_index, current_learning_rate)

                if (iter + 1) % validation_frequency == 0:

                    validation_losses = [self.validate_model(i) for i
                                         in range(self.n_valid_batches)]
                    this_validation_loss = numpy.mean(validation_losses)
                    print('epoch %i, minibatch %i/%i, validation error %f %%' %
                          (epoch, minibatch_index + 1, self.n_train_batches,
                           this_validation_loss * 100.))

                    if this_validation_loss < best_validation_loss:

                        if this_validation_loss < best_validation_loss *  \
                           improvement_threshold:
                            patience = max(patience, iter * patience_increase)

                        best_validation_loss = this_validation_loss
                        best_iter = iter


                if patience <= iter:
                    done_looping = True
                    break

        end_time = timeit.default_timer()
        print('Optimization complete.')
        print('Best validation score of %f %% obtained at iteration %i, '
              'with test performance %f %%' %
              (best_validation_loss * 100., best_iter + 1, test_score * 100.))
        print('The code for file ' +
              os.path.split(os.path.abspath(''))[1] +
              ' ran for %.2fm' % ((end_time - start_time) / 60.),
              file=sys.stderr)

    def save(self, filename=None,dropout=False):
        """
        Save the weights, then the column hyperparameters, as two consecutive pickles.
        self.params is ordered from the last layer's W, b to the first layer's W, b,
        so the values must be loaded back in that same order.
        """
        name = filename or 'DNN_%iLayers_t%i' % (len(self.params) // 2, int(time.time()))

        print('Saving Model as "%s"...' % name)
        directory = '/Users/Dropoutmodels/' if dropout else '/Users/models/'
        with open(directory + name + '.pkl', 'wb') as f:
            pickle.dump([param.get_value(borrow=True) for param in self.params], f, -1)
            pickle.dump(self.column_params, f, -1)
 



In [130]:
def train_mcdnn_column(normalized_width=0, n_epochs=800, trail=0,dropout=False):
    print('... train %i column of normalization %i' % (trail, normalized_width))
    print('... num_epochs %i' % (n_epochs))
    
    datasets = load_data(dataset='mnist.pkl.gz', digit_normalized_width=normalized_width, digit_out_image_size=29)
    column = DNNColumn(ds=datasets, normalized_width=normalized_width, dropout=dropout)
    column.train_column(n_epochs=n_epochs, init_learning_rate=0.1)
    
    filename = 'mcdnn_nm%i_trail%i_Layers_time_%i' % (normalized_width, trail, int(time.time()))
    column.save(filename,dropout)
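
To get a feel for the training schedule used in `train_column`, here is a small standalone sketch of the learning-rate decay, `max(init_learning_rate * 0.993**epoch, 0.00003)`. The function name `learning_rate_at` is hypothetical. The figure of 100 minibatches per epoch is taken from the training logs in this report.

```python
def learning_rate_at(epoch, init_learning_rate, decay=0.993, floor=0.00003):
    """Exponentially decayed learning rate, clipped from below at `floor`."""
    return max(init_learning_rate * decay ** epoch, floor)

# Learning rate at a few epochs when starting from 0.1, as train_mcdnn_column does
for epoch in (0, 10, 49):
    print(epoch, learning_rate_at(epoch, init_learning_rate=0.1))

# 50 epochs x 100 minibatches per epoch, compared with the initial patience of 10000
n_iterations = 50 * 100
print('early stopping can trigger:', n_iterations >= 10000)
```

With only 50 epochs the learning rate has decayed to roughly 70% of its starting value, and the iteration count stays below the initial patience, so the early-stopping branch should not fire in these short runs.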

Below we train the MCDNN in a reduced capacity. The MCDNN was reported as state of the art on the MNIST dataset, with a 0.23% error rate. The principle of the algorithm is to create a committee of 35 columns, each trained on a differently preprocessed version of the training set.
Our 35 columns are divided as follows:
    5 columns trained per normalization
    7 normalization widths: [0, 10, 12, 14, 16, 18, 20]
        (0 corresponds to the dataset without normalization)
Below, however, I have trained only 1 column per normalization, because of limited resources and because training is very time-consuming. The number of epochs was originally supposed to be 800, but I took it down to 50, which is also the number of epochs I trained the CNN for.
The results don't align with the MCDNN claims.
The best validation error among the individual columns is 1.65%, but the validation error of the ensemble of all the columns was greater than 1.8%, which is more than the CNN's validation error, and the ensemble's test error was 4.65%, which is again more.
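
The committee step itself is simple: each column outputs class probabilities, the probabilities are averaged across columns, and the class with the highest average wins. A minimal NumPy sketch with made-up numbers (3 columns, 4 images, 3 classes, all hypothetical and not results from this report):

```python
import numpy

# Toy stand-ins for column.test_outputs(): shape (n_columns, n_images, n_classes)
column_outputs = numpy.array([
    [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.1, 0.2, 0.7], [0.4, 0.4, 0.2]],
    [[0.5, 0.4, 0.1], [0.6, 0.3, 0.1], [0.2, 0.2, 0.6], [0.3, 0.5, 0.2]],
    [[0.7, 0.2, 0.1], [0.1, 0.6, 0.3], [0.3, 0.3, 0.4], [0.2, 0.6, 0.2]],
])
true_labels = numpy.array([0, 1, 2, 0])

avg_output = column_outputs.mean(axis=0)        # average probabilities over columns
predictions = avg_output.argmax(axis=1)         # most probable class per image
error = numpy.mean(predictions != true_labels)  # fraction of misclassified images

print(predictions, error)
```

In this toy case the second column alone misclassifies the second image, but the average over the three columns recovers the right class, which is the effect the committee relies on.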

In [124]:
import gzip
for nm in [0,10,12,14,16,18,20]:
    train_mcdnn_column(nm, n_epochs=50, trail=0)

... train 0 column of normalization 0
... num_epochs 50
... loading data
... (un)padding digits from 28 -> 29
... sharing data
... building the dnn column


  pooled_out = pool.pool_2d(


... training
training @ iter =  0
epoch 1, minibatch 100/100, validation error 11.400000 %
training @ iter =  100
epoch 2, minibatch 100/100, validation error 6.850000 %
training @ iter =  200
epoch 3, minibatch 100/100, validation error 5.050000 %
training @ iter =  300
epoch 4, minibatch 100/100, validation error 4.200000 %
training @ iter =  400
epoch 5, minibatch 100/100, validation error 3.400000 %
training @ iter =  500
epoch 6, minibatch 100/100, validation error 3.100000 %
training @ iter =  600
epoch 7, minibatch 100/100, validation error 2.950000 %
training @ iter =  700
epoch 8, minibatch 100/100, validation error 2.750000 %
training @ iter =  800
epoch 9, minibatch 100/100, validation error 2.450000 %
training @ iter =  900
epoch 10, minibatch 100/100, validation error 2.400000 %
training @ iter =  1000
epoch 11, minibatch 100/100, validation error 2.400000 %
training @ iter =  1100
epoch 12, minibatch 100/100, validation error 2.400000 %
training @ iter =  1200
epoch 13, m

epoch 46, minibatch 100/100, validation error 28.650000 %
training @ iter =  4600
epoch 47, minibatch 100/100, validation error 26.000000 %
training @ iter =  4700
epoch 48, minibatch 100/100, validation error 23.900000 %
training @ iter =  4800
epoch 49, minibatch 100/100, validation error 22.300000 %
training @ iter =  4900
epoch 50, minibatch 100/100, validation error 20.400000 %
Optimization complete.
Best validation score of 20.400000 % obtained at iteration 5000, with test performance 0.000000 %
<ipykernel.iostream.OutStream object at 0x7ffed0646790> The code for file pragnamandadi ran for 51.01m
Saving Model as "mcdnn_nm10_trail0_Layers_time_1631069810"...
... train 0 column of normalization 12
... num_epochs 50
... loading data
... normalizing digits to width 12 with extra padding 1
... sharing data
... building the dnn column
... training
training @ iter =  0
epoch 1, minibatch 100/100, validation error 79.050000 %
training @ iter =  100
epoch 2, minibatch 100/100, validation 

epoch 36, minibatch 100/100, validation error 7.800000 %
training @ iter =  3600
epoch 37, minibatch 100/100, validation error 7.400000 %
training @ iter =  3700
epoch 38, minibatch 100/100, validation error 7.200000 %
training @ iter =  3800
epoch 39, minibatch 100/100, validation error 7.150000 %
training @ iter =  3900
epoch 40, minibatch 100/100, validation error 7.000000 %
training @ iter =  4000
epoch 41, minibatch 100/100, validation error 6.900000 %
training @ iter =  4100
epoch 42, minibatch 100/100, validation error 6.750000 %
training @ iter =  4200
epoch 43, minibatch 100/100, validation error 6.550000 %
training @ iter =  4300
epoch 44, minibatch 100/100, validation error 6.550000 %
training @ iter =  4400
epoch 45, minibatch 100/100, validation error 6.350000 %
training @ iter =  4500
epoch 46, minibatch 100/100, validation error 6.200000 %
training @ iter =  4600
epoch 47, minibatch 100/100, validation error 6.050000 %
training @ iter =  4700
epoch 48, minibatch 100/100,

epoch 26, minibatch 100/100, validation error 9.700000 %
training @ iter =  2600
epoch 27, minibatch 100/100, validation error 9.450000 %
training @ iter =  2700
epoch 28, minibatch 100/100, validation error 9.150000 %
training @ iter =  2800
epoch 29, minibatch 100/100, validation error 8.750000 %
training @ iter =  2900
epoch 30, minibatch 100/100, validation error 8.500000 %
training @ iter =  3000
epoch 31, minibatch 100/100, validation error 8.200000 %
training @ iter =  3100
epoch 32, minibatch 100/100, validation error 7.900000 %
training @ iter =  3200
epoch 33, minibatch 100/100, validation error 7.700000 %
training @ iter =  3300
epoch 34, minibatch 100/100, validation error 7.350000 %
training @ iter =  3400
epoch 35, minibatch 100/100, validation error 7.150000 %
training @ iter =  3500
epoch 36, minibatch 100/100, validation error 7.050000 %
training @ iter =  3600
epoch 37, minibatch 100/100, validation error 6.850000 %
training @ iter =  3700
epoch 38, minibatch 100/100,

In [164]:
import pdb
import glob
def test_columns():
    dataset='mnist.pkl.gz'
    # create data hash that will be filled with data from different normalizations
    all_datasets = {}
    # instantiate multiple columns
    columns = []
    models = glob.glob('/Users/models/*.pkl')
    print('... Starting to test %i columns' % len(models))
    print(models)
    for model in models:
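        # load model params (weights first, then the column hyperparameters)
        with open(model, 'rb') as f:
            # latin1 is only needed for pickles written by Python 2; harmless otherwise
            params = pickle.load(f, encoding='latin1')
            nkerns, batch_size, normalized_width, distortion = pickle.load(f)
        if normalized_width in all_datasets: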
            datasets = all_datasets[normalized_width]
        else:
            datasets = load_data(dataset, normalized_width, 29)
            all_datasets[normalized_width] = datasets
        # no distortion during testing
        columns.append(DNNColumn(datasets, nkerns, batch_size, normalized_width, 0, params))
    print('... Forward propagating %i columns' % len(models))
    # call test on all of them, receiving one array of 10 class probabilities per image from each column
#     if valid_test=='V':
#         model_outputs = [column.valid_outputs() for column in columns] 
#         position_ds   = 1 
#     else:
    model_outputs = [column.test_outputs() for column in columns]      
    position_ds   = 2
    # average the class probabilities over the columns
    avg_output = numpy.mean(model_outputs, axis=0)
    # argmax over them
    predictions = numpy.argmax(avg_output, axis=1)
    # compare predictions with true labels
    pred = T.ivector('pred')

    true_labels = list(all_datasets.values())[0][position_ds][1][:]

    error = theano.function([pred], T.mean(T.neq(pred, true_labels)))
    acc = error(predictions.astype(dtype=numpy.int32))
    print('....')
    print('Error across %i columns: %f %%' % (len(models), 100*acc))
    return [predictions, acc]
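
The loading code relies on each `.pkl` file holding two consecutive pickles, written in that order by `save`: first the list of weights, then the column hyperparameters. A minimal sketch of that round trip, using an in-memory buffer and plain lists as stand-ins for the real file and weight arrays (both are assumptions made for illustration):

```python
import io
import pickle

params = [[0.1, 0.2], [0.3]]            # stand-in for the list of weight arrays
column_params = [[32, 48], 100, 12, 0]  # nkerns, batch_size, normalized_width, distortion

buffer = io.BytesIO()                   # in-memory stand-in for the .pkl file
pickle.dump(params, buffer, -1)         # first pickle: the weights
pickle.dump(column_params, buffer, -1)  # second pickle: the hyperparameters

buffer.seek(0)                          # rewind, as if reopening the file
loaded_params = pickle.load(buffer, encoding='latin1')
nkerns, batch_size, normalized_width, distortion = pickle.load(buffer)

print(loaded_params, nkerns, batch_size, normalized_width, distortion)
```

Each call to `pickle.load` consumes exactly one pickled object, so the two reads must happen in the same order as the two writes.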

In [165]:
test_columns()

... Starting to test 7 columns
['/Users/models/mcdnn_nm20_trail0_Layers_time_1631081119.pkl', '/Users/models/mcdnn_nm0_trail0_Layers_time_1631066744.pkl', '/Users/models/mcdnn_nm10_trail0_Layers_time_1631069810.pkl', '/Users/models/mcdnn_nm12_trail0_Layers_time_1631072234.pkl', '/Users/models/mcdnn_nm16_trail0_Layers_time_1631077284.pkl', '/Users/models/mcdnn_nm18_trail0_Layers_time_1631079209.pkl', '/Users/models/mcdnn_nm14_trail0_Layers_time_1631074611.pkl']
... loading data
... normalizing digits to width 20 with extra padding 1
... sharing data
... building the dnn column


  pooled_out = pool.pool_2d(


... loading data
... (un)padding digits from 28 -> 29
... sharing data
... building the dnn column
... loading data
... normalizing digits to width 10 with extra padding 1
... sharing data
... building the dnn column
... loading data
... normalizing digits to width 12 with extra padding 1
... sharing data
... building the dnn column
... loading data
... normalizing digits to width 16 with extra padding 1
... sharing data
... building the dnn column
... loading data
... normalizing digits to width 18 with extra padding 1
... sharing data
... building the dnn column
... loading data
... normalizing digits to width 14 with extra padding 1
... sharing data
... building the dnn column
... Forward propagating 7 columns
....
Error across 7 columns: 4.650000 %


[array([7, 2, 1, ..., 3, 9, 5]), array(0.0465)]

Below is the exact same implementation as above but with dropout. The results I got for dropout are inconclusive, maybe because of the reduced training data. I got a higher test error, 8.85%; this could be because of the information lost by dropping neurons.
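
For reference, here is a generic sketch of what dropout does at training time versus test time. It uses the "inverted dropout" variant in NumPy, which is an assumption made for illustration and not necessarily how `DropoutHiddenLayer` in this report is implemented. The function name `dropout_forward` is hypothetical.

```python
import numpy

def dropout_forward(activations, rate, rng, training):
    """Inverted dropout: zero units with probability `rate` while training,
    rescale the survivors, and pass activations through unchanged at test time."""
    if not training:
        return activations
    keep_prob = 1.0 - rate
    mask = rng.binomial(n=1, p=keep_prob, size=activations.shape)  # 1 = keep, 0 = drop
    return activations * mask / keep_prob

rng = numpy.random.RandomState(0)
h = numpy.ones((4, 150))                           # toy hidden-layer activations
train_out = dropout_forward(h, 0.5, rng, training=True)
test_out = dropout_forward(h, 0.5, rng, training=False)

print(numpy.unique(train_out), train_out.mean(), test_out.mean())
```

During training roughly half of the units are zeroed and the rest are doubled, so the expected activation stays the same, while at test time every unit is used.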

In [136]:
import gzip
for nm in [0,10,12,14,16,18,20]:
    train_mcdnn_column(nm, n_epochs=50, trail=0, dropout = True)



... train 0 column of normalization 0
... num_epochs 50
... loading data
... (un)padding digits from 28 -> 29
... sharing data
... building the dnn column


  pooled_out = pool.pool_2d(


... training
training @ iter =  0
epoch 1, minibatch 100/100, validation error 56.850000 %
training @ iter =  100
epoch 2, minibatch 100/100, validation error 28.250000 %
training @ iter =  200
epoch 3, minibatch 100/100, validation error 14.600000 %
training @ iter =  300
epoch 4, minibatch 100/100, validation error 11.200000 %
training @ iter =  400
epoch 5, minibatch 100/100, validation error 8.650000 %
training @ iter =  500
epoch 6, minibatch 100/100, validation error 7.150000 %
training @ iter =  600
epoch 7, minibatch 100/100, validation error 6.500000 %
training @ iter =  700
epoch 8, minibatch 100/100, validation error 5.650000 %
training @ iter =  800
epoch 9, minibatch 100/100, validation error 5.200000 %
training @ iter =  900
epoch 10, minibatch 100/100, validation error 5.400000 %
training @ iter =  1000
epoch 11, minibatch 100/100, validation error 5.350000 %
training @ iter =  1100
epoch 12, minibatch 100/100, validation error 4.600000 %
training @ iter =  1200
epoch 13

epoch 46, minibatch 100/100, validation error 78.250000 %
training @ iter =  4600
epoch 47, minibatch 100/100, validation error 78.550000 %
training @ iter =  4700
epoch 48, minibatch 100/100, validation error 78.150000 %
training @ iter =  4800
epoch 49, minibatch 100/100, validation error 78.700000 %
training @ iter =  4900
epoch 50, minibatch 100/100, validation error 77.550000 %
Optimization complete.
Best validation score of 77.200000 % obtained at iteration 3400, with test performance 0.000000 %
<ipykernel.iostream.OutStream object at 0x7ffed0646790> The code for file pragnamandadi ran for 33.01m
Saving Model as "mcdnn_nm10_trail0_Layers_time_1631086774"...
... train 0 column of normalization 12
... num_epochs 50
... loading data
... normalizing digits to width 12 with extra padding 1
... sharing data
... building the dnn column
... training
training @ iter =  0
epoch 1, minibatch 100/100, validation error 79.800000 %
training @ iter =  100
epoch 2, minibatch 100/100, validation 

epoch 35, minibatch 100/100, validation error 47.100000 %
training @ iter =  3500
epoch 36, minibatch 100/100, validation error 41.850000 %
training @ iter =  3600
epoch 37, minibatch 100/100, validation error 44.750000 %
training @ iter =  3700
epoch 38, minibatch 100/100, validation error 47.100000 %
training @ iter =  3800
epoch 39, minibatch 100/100, validation error 35.500000 %
training @ iter =  3900
epoch 40, minibatch 100/100, validation error 36.350000 %
training @ iter =  4000
epoch 41, minibatch 100/100, validation error 33.500000 %
training @ iter =  4100
epoch 42, minibatch 100/100, validation error 28.900000 %
training @ iter =  4200
epoch 43, minibatch 100/100, validation error 34.400000 %
training @ iter =  4300
epoch 44, minibatch 100/100, validation error 24.050000 %
training @ iter =  4400
epoch 45, minibatch 100/100, validation error 24.100000 %
training @ iter =  4500
epoch 46, minibatch 100/100, validation error 23.150000 %
training @ iter =  4600
epoch 47, miniba

epoch 24, minibatch 100/100, validation error 56.850000 %
training @ iter =  2400
epoch 25, minibatch 100/100, validation error 60.450000 %
training @ iter =  2500
epoch 26, minibatch 100/100, validation error 54.050000 %
training @ iter =  2600
epoch 27, minibatch 100/100, validation error 55.800000 %
training @ iter =  2700
epoch 28, minibatch 100/100, validation error 48.050000 %
training @ iter =  2800
epoch 29, minibatch 100/100, validation error 49.100000 %
training @ iter =  2900
epoch 30, minibatch 100/100, validation error 39.450000 %
training @ iter =  3000
epoch 31, minibatch 100/100, validation error 38.450000 %
training @ iter =  3100
epoch 32, minibatch 100/100, validation error 33.600000 %
training @ iter =  3200
epoch 33, minibatch 100/100, validation error 31.800000 %
training @ iter =  3300
epoch 34, minibatch 100/100, validation error 33.650000 %
training @ iter =  3400
epoch 35, minibatch 100/100, validation error 29.750000 %
training @ iter =  3500
epoch 36, miniba

In [166]:
import pdb
import glob
def test_dropout_columns():
    dataset='mnist.pkl.gz'
    # create data hash that will be filled with data from different normalizations
    all_datasets = {}
    # instantiate multiple columns
    columns = []
    models = glob.glob('/Users/Dropoutmodels/*.pkl')
    print('... Starting to test %i columns' % len(models))
    print(models)
    for model in models:
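        # load model params (weights first, then the column hyperparameters)
        with open(model, 'rb') as f:
            # latin1 is only needed for pickles written by Python 2; harmless otherwise
            params = pickle.load(f, encoding='latin1')
            nkerns, batch_size, normalized_width, distortion = pickle.load(f)
        if normalized_width in all_datasets: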
            datasets = all_datasets[normalized_width]
        else:
            datasets = load_data(dataset, normalized_width, 29)
            all_datasets[normalized_width] = datasets
        # no distortion during testing
        columns.append(DNNColumn(datasets, nkerns, batch_size, normalized_width, 0, params))
    print('... Forward propagating %i columns' % len(models))
    # call test on all of them, receiving one array of 10 class probabilities per image from each column
#     if valid_test=='V':
#         model_outputs = [column.valid_outputs() for column in columns] 
#         position_ds   = 1 
#     else:
    model_outputs = [column.test_outputs() for column in columns]      
    position_ds   = 2
    # average the class probabilities over the columns
    avg_output = numpy.mean(model_outputs, axis=0)
    # argmax over them
    predictions = numpy.argmax(avg_output, axis=1)
    # compare predictions with true labels
    pred = T.ivector('pred')

    true_labels = list(all_datasets.values())[0][position_ds][1][:]

    error = theano.function([pred], T.mean(T.neq(pred, true_labels)))
    acc = error(predictions.astype(dtype=numpy.int32))
    print('....')
    print('Error across %i columns: %f %%' % (len(models), 100*acc))
    return [predictions, acc]

In [167]:
test_dropout_columns()

... Starting to test 7 columns
['/Users/Dropoutmodels/mcdnn_nm14_trail0_Layers_time_1631090599.pkl', '/Users/Dropoutmodels/mcdnn_nm16_trail0_Layers_time_1631092438.pkl', '/Users/Dropoutmodels/mcdnn_nm12_trail0_Layers_time_1631088740.pkl', '/Users/Dropoutmodels/mcdnn_nm0_trail0_Layers_time_1631084788.pkl', '/Users/Dropoutmodels/mcdnn_nm10_trail0_Layers_time_1631086774.pkl', '/Users/Dropoutmodels/mcdnn_nm18_trail0_Layers_time_1631094280.pkl', '/Users/Dropoutmodels/mcdnn_nm20_trail0_Layers_time_1631096121.pkl']
... loading data
... normalizing digits to width 14 with extra padding 1
... sharing data
... building the dnn column


  pooled_out = pool.pool_2d(


... loading data
... normalizing digits to width 16 with extra padding 1
... sharing data
... building the dnn column
... loading data
... normalizing digits to width 12 with extra padding 1
... sharing data
... building the dnn column
... loading data
... (un)padding digits from 28 -> 29
... sharing data
... building the dnn column
... loading data
... normalizing digits to width 10 with extra padding 1
... sharing data
... building the dnn column
... loading data
... normalizing digits to width 18 with extra padding 1
... sharing data
... building the dnn column
... loading data
... normalizing digits to width 20 with extra padding 1
... sharing data
... building the dnn column
... Forward propagating 7 columns
....
Error across 7 columns: 8.850000 %


[array([7, 2, 1, ..., 3, 9, 5]), array(0.0885)]

_Conclusion_
In conclusion, even though the claim is that the MCDNN performs better than a regular CNN, I wasn't able to reproduce that, due to the reduced number of columns trained per normalization and the reduced number of epochs. In my analysis, if you have limited computational resources the CNN still works better, with a 2.65% test error. I would expect a fully trained MCDNN to bring that below 1.5%, but in my opinion the extra computational cost is not worth it if you are looking for a cost-effective model.