
Convolutional Neural Network Tutorial


This is a short tutorial on developing a Convolutional Neural Network using Google's TensorFlow library. The dataset used is MNIST, one of the most famous machine learning datasets.

MNIST dataset

MNIST is one of the most famous datasets for machine learning. It contains images of handwritten digits, from 0 to 9, with a training set of 60,000 examples and a test set of 10,000 examples. The digits have been size-normalized and centered in a fixed-size image.

An example of an image would be:

Example of a MNIST image

You can check all the information on Yann LeCun's webpage.

Deep Learning?

Nowadays there is a real boom in Deep Learning, and in my opinion that is no surprise: many problems have achieved incredible results using Deep Learning techniques.

But, what is really Deep Learning?

In everyday language, it is like throwing a lot of things into a box and getting amazing results. At least, that is how you might think about it. Put more formally:

Deep Learning is a branch of machine learning based on a set of algorithms that attempt to model high level abstractions in data. [1]

It is based on Neural Networks (NN), like other machine learning algorithms, but it differs from them on a very specific, important point: Feature Extraction.

Normally, in other machine learning techniques, you have to do the feature extraction manually and then train the NN, which is an automatic process. With Deep Learning techniques, the feature extraction can also be made automatic, without needing labeled data for it; labels are only needed for the training part.

Convolutional Neural Network (CNN) [2]

A CNN is a type of neural network for processing data that has a known, grid-like topology. It uses a mathematical operation called convolution. So:

CNNs are simply neural networks that use convolution in place of general matrix multiplication in at least one of their layers.

Example of the architecture of a CNN: [3]

So the architecture is basically some layers for the convolution operation, some layers for the pooling operation, and finally a fully connected layer that gives us the output.

There are some operations that are basic in a CNN:

  • Convolution: A convolution is an integral that expresses the amount of overlap of one function g as it is shifted over another function f. It therefore "blends" one function with another. [4]

  • Pooling: A pooling function replaces the output of the net at a certain location with a summary statistic of the nearby outputs. For example, one widely used function, and the one that is going to be used in the code, is max pooling (Zhou and Chellappa, 1988), which reports the maximum output within a rectangular neighborhood.
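As a rough illustration of both operations, here is a toy sketch in plain Python (made-up numbers, not part of the tutorial code):

```python
# Toy example of convolution and max pooling, in plain Python

def convolve(f, g):
    """1-D convolution of f with g (full mode): g is slid across f
    and the overlapping products are summed at each shift."""
    n = len(f) + len(g) - 1
    return [sum(f[i] * g[k - i]
                for i in range(len(f)) if 0 <= k - i < len(g))
            for k in range(n)]

def max_pool_2x2(fm):
    """2x2 max pooling with stride 2: each 2x2 block of the
    feature map is replaced by its maximum."""
    return [[max(fm[r][c], fm[r][c + 1], fm[r + 1][c], fm[r + 1][c + 1])
             for c in range(0, len(fm[0]), 2)]
            for r in range(0, len(fm), 2)]

print(convolve([1, 2, 3], [0, 1, 0.5]))  # [0, 1, 2.5, 4.0, 1.5]

fm = [[1, 3, 2, 1],
      [4, 6, 5, 7],
      [8, 2, 0, 1],
      [3, 4, 9, 2]]
print(max_pool_2x2(fm))  # [[6, 7], [8, 9]]
```

Note how the pooling output is half the size of the input in each dimension: this is the same shrinking that the CNN below applies to the MNIST images.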

Also, combining CNNs with techniques for preventing overfitting (e.g. dropout) gives us great results against the overfitting problem.

CNNs have been tremendously successful in practical applications. If you want to learn more about the mathematical background, I strongly recommend reading the Deep Learning book's [2] chapter about CNNs.



The tutorial uses Python 3 and TensorFlow [5]. The code is based on the TensorFlow tutorials about CNNs [6] and the sentdex YouTube channel [7] [8].

The code can be found in the code folder.


First we are going to import the Tensorflow library and the MNIST data set:

import tensorflow as tf

from tensorflow.examples.tutorials.mnist import input_data

Then we are going to load the data into a variable and set the number of classes and the batch size. The batch size defines the number of samples that are propagated through the network at once. So, if you have 1,000 examples and your batch size is 100, training takes the first 100 examples and trains the network, then takes the next 100 and trains the network, and so on:

"""Here we load the dataset"""
mnist = input_data.read_data_sets("/tmp/data/", one_hot = True)

"""Number of classes in the dataset; in this case 10, because we have digits
from 0 to 9"""
n_classes = 10
batch_size = 128
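To make the batch-size arithmetic above concrete, here is a tiny sketch with made-up numbers (1,000 examples, batch size 100; not part of the tutorial code):

```python
# Hypothetical numbers, just to illustrate the batching idea
n_examples = 1000
batch = 100

# One pass over the data (one epoch) performs this many training steps
n_batches = n_examples // batch
print(n_batches)  # 10

# The indices of the examples fed to the network at each step
steps = [list(range(i, i + batch)) for i in range(0, n_examples, batch)]
print(len(steps), len(steps[0]))  # 10 100
```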

Placeholders are used to inject data into a computation graph. A placeholder exists solely to serve as the target of feeds; it is not initialized and contains no data.

x = tf.placeholder('float', [None, 784])
y = tf.placeholder('float')

The keep probability is going to be used for the dropout. This is the probability of keeping the connection between two neurons, or between the input and a neuron.

# Probability of keeping a connection; keep_prob is a placeholder fed at run time
keep_rate = 0.6
keep_prob = tf.placeholder(tf.float32)

Now we are going to declare two functions: the first one creates variables for the weights, and the second one for the biases. We wrap them in functions to keep the code clearer:

""" This is going to be used for creating the weights and the biases"""
def weight_variable(shape):
  initial = tf.truncated_normal(shape, stddev=0.1)
  return tf.Variable(initial)

def bias_variable(shape):
  initial = tf.constant(0.1, shape=shape)
  return tf.Variable(initial)

Then we are going to define the convolution and max pool operations. Both are built into TensorFlow. Our convolutions use a stride of one and are zero-padded so that the output is the same size as the input. Our pooling is plain max pooling over 2x2 blocks:

def conv2d(data, weights):
    return tf.nn.conv2d(data, weights, strides=[1, 1, 1, 1], padding='SAME') # Stride of 1; 'SAME' zero padding keeps the output size

def maxpool2d(data):
    """2x2 window, moving 2 at a time in each spatial dimension"""
    return tf.nn.max_pool(data, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')

Then we are going to define the architecture of our CNN. It is going to have 2 convolutional layers, using a ReLU as the activation function and applying max pooling after each convolutional layer, followed by a fully connected layer that gives us the output. Also, we are going to use the dropout technique to prevent overfitting.

The first convolutional layer will compute 32 features for each 5x5 patch. Its weight tensor will have a shape of [5, 5, 1, 32]: the first two dimensions are the patch size, the next is the number of input channels, and the last is the number of output channels. We will also have a bias vector with a component for each output channel. The second layer will compute 64 features for each 5x5 patch, with a bias of 64 components. Then, the fully connected layer takes the resulting 7x7 feature maps and has 1024 neurons. Finally, the last layer goes from 1024 neurons to the number of classes.
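The 7x7 above comes from the two pooling layers: each 2x2 max pool with stride 2 halves the spatial size, so the 28x28 input becomes 14x14 and then 7x7. A quick sanity check of that arithmetic:

```python
size = 28      # MNIST images are 28x28 pixels
size //= 2     # after the first 2x2 max pool: 14x14
size //= 2     # after the second 2x2 max pool: 7x7
print(size)              # 7
print(size * size * 64)  # 3136 = 7*7*64 inputs to the fully connected layer
```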

To apply the layer, we first reshape x to a 4d tensor, with the second and third dimensions corresponding to image width and height, and the final dimension corresponding to the number of color channels.

Then, we apply the convolutional and the pooling operations to the first and second layers. Then, for the fully connected layer, we reshape the tensor from the pooling layer into a batch of vectors, multiply by a weight matrix, add a bias, and apply a ReLU [9].

Finally, we apply the dropout technique to prevent overfitting, and with a matmul operation we get the output.

def convolutional_neural_network(data):

    """Here we are going to create the weights and biases variables for generating our neural network"""

    w_conv1 = weight_variable([5, 5, 1, 32])
    b_conv1 = bias_variable([32])

    w_conv2 = weight_variable([5, 5, 32, 64])
    b_conv2 = bias_variable([64])

    w_fc1 = weight_variable([7 * 7 * 64, 1024])
    b_fc1 = bias_variable([1024])

    w_fc2 = weight_variable([1024, n_classes])
    b_fc2 = bias_variable([n_classes])

    x_image = tf.reshape(data, shape=[-1, 28, 28, 1]) #Reshape the image

    #First convolutional layer
    h_conv1 = tf.nn.relu(conv2d(x_image, w_conv1) + b_conv1)
    h_pool1 = maxpool2d(h_conv1)

    #Second convolutional layer
    h_conv2 = tf.nn.relu(conv2d(h_pool1, w_conv2) + b_conv2)
    h_pool2 = maxpool2d(h_conv2)

    h_pool2_flat = tf.reshape(h_pool2,[-1,7*7*64])

    fc = tf.nn.relu(tf.matmul(h_pool2_flat, w_fc1) + b_fc1)

    """Dropout randomly drops connections during training,
    which helps prevent overfitting in bigger
    neural networks"""

    fc_dropout = tf.nn.dropout(fc, keep_prob) #Compute dropout, controlled by the keep_prob placeholder
    #Final layer with a softmax
    y = tf.nn.softmax(tf.matmul(fc_dropout, w_fc2)+ b_fc2)

    return y

Finally, we have to train our CNN. We need to start the TensorFlow session and initialize the variables. Also, we use the AdamOptimizer [10] instead of plain gradient descent. In the range, you can use as many iterations as you want, but the time it takes will depend on your machine's computing power; in my case, the computer freezes if I try to compute very large networks with lots of epochs. Below, we can see an example of how it would look:

'''Here is the main where we are going to train the neural network'''
sess = tf.InteractiveSession()
#First we get the prediction
prediction = convolutional_neural_network(x)
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y * tf.log(prediction), reduction_indices=[1]))
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
correct_prediction = tf.equal(tf.argmax(prediction, 1), tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
sess.run(tf.global_variables_initializer()) #Initialize the variables
for i in range(20): #Here you can do as many iterations as you want
    batch = mnist.train.next_batch(50) #For your own data, you have to implement this function
    if i % 5 == 0:
        train_accuracy = accuracy.eval(feed_dict={x: batch[0], y: batch[1], keep_prob: 1.0})
        print("step %d, training accuracy %g" % (i, train_accuracy))
    train_step.run(feed_dict={x: batch[0], y: batch[1], keep_prob: 0.5}) #Train with dropout
print("test accuracy %g" % accuracy.eval(feed_dict={x: mnist.test.images, y: mnist.test.labels, keep_prob: 1.0}))

And then we will have our CNN ready to be used.



[2] Ian Goodfellow, Yoshua Bengio and Aaron Courville, Deep Learning, MIT Press, 2016.









