# Image Classification

## MNIST Dataset In TensorFlow

In [2]:
from tensorflow.examples.tutorials.mnist import input_data
mnist_data = input_data.read_data_sets('MNIST_data', one_hot=True)

Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz


### Build a perceptron

In [3]:
input_size = 784
no_classes = 10
batch_size = 100
total_batches = 200

In [7]:
import tensorflow as tf

# x_input is the placeholder where the images will be fed later.
# y_input is the placeholder where the one-hot labels or targets will be supplied.

x_input = tf.placeholder(tf.float32, shape=[None, input_size])
y_input = tf.placeholder(tf.float32, shape=[None, no_classes])

The None in the shape argument means that the first dimension can be of any size, as the batch size has not been fixed yet. The second dimension is the length of the flattened image vector for x_input and the number of classes for y_input. Both placeholders are declared as tf.float32, so the data is fed as floats. Next, we can define the perceptron.

The weight variables are initialized from a random normal distribution with shape [input_size, no_classes]. The input size is 784 because each 28 x 28 image is flattened into a single vector. The number of classes is 10, one per digit in the dataset. The bias variable is also initialized from a random normal distribution, with size equal to the number of classes. The weights and bias are defined as follows:


In [8]:
weights = tf.Variable(tf.random_normal([input_size, no_classes]))
bias = tf.Variable(tf.random_normal([no_classes]))

The variables could also be initialized to zeros, but a random normal distribution gives steadier training. The inputs are then multiplied by the weights and added to the bias to produce logits, as shown next:

In [9]:
logits = tf.matmul(x_input, weights) + bias
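The matrix product above maps a batch of shape [batch, 784] to logits of shape [batch, 10]. The following is a minimal pure-Python sketch of the same arithmetic on a toy problem with 3 features and 2 classes. The helper `affine_logits` and all toy values are illustrative and not part of the notebook:

```python
def affine_logits(inputs, weights, bias):
    """Compute inputs @ weights + bias for a batch of row vectors."""
    logits = []
    for row in inputs:
        # One logit per class: dot product of the row with a weight column, plus the bias.
        logits.append([
            sum(x * weights[i][c] for i, x in enumerate(row)) + bias[c]
            for c in range(len(bias))
        ])
    return logits


# Toy batch: 2 examples with 3 features each (MNIST uses 784 features).
toy_inputs = [[1.0, 0.0, 2.0],
              [0.0, 1.0, 1.0]]
# Weights have shape [features, classes] = [3, 2].
toy_weights = [[0.5, -1.0],
               [1.0, 0.0],
               [-0.5, 2.0]]
toy_bias = [0.1, -0.1]

toy_logits = affine_logits(toy_inputs, toy_weights, toy_bias)
```

Each row of `toy_logits` holds one score per class, just as each row of logits in the notebook holds 10 scores.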

The logits produced by the perceptron have to be compared against the one-hot labels y_input. The tf.nn.softmax_cross_entropy_with_logits API from TensorFlow does this for us. The loss is computed by averaging the cross-entropies over the batch. The loss is then minimized by gradient descent, using tf.train.GradientDescentOptimizer with a learning rate of 0.5.

In [10]:
softmax_cross_entropy = tf.nn.softmax_cross_entropy_with_logits(labels=y_input, logits=logits)
loss_operation = tf.reduce_mean(softmax_cross_entropy)
optimiser = tf.train.GradientDescentOptimizer(learning_rate=0.5).minimize(loss_operation)

Instructions for updating:

Future major versions of TensorFlow will allow gradients to flow
into the labels input on backprop by default.

See @{tf.nn.softmax_cross_entropy_with_logits_v2}.
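To make the loss concrete, here is a rough pure-Python sketch of what softmax cross-entropy followed by a mean computes. It is not TensorFlow's implementation: the helper names and toy values are assumptions for illustration, and the log-softmax is written in the usual max-shifted form for numerical stability:

```python
import math


def log_softmax(logits):
    """Log of the softmax probabilities, shifted by the max for stability."""
    shift = max(logits)
    log_total = math.log(sum(math.exp(v - shift) for v in logits))
    return [v - shift - log_total for v in logits]


def cross_entropy(one_hot_label, logits):
    """Cross-entropy between a one-hot label and softmax(logits)."""
    log_probs = log_softmax(logits)
    return -sum(t * lp for t, lp in zip(one_hot_label, log_probs))


def mean_loss(labels, batch_logits):
    """Average the per-example cross-entropies over the batch."""
    losses = [cross_entropy(t, z) for t, z in zip(labels, batch_logits)]
    return sum(losses) / len(losses)


# Toy batch with 3 classes: a confident correct guess and a uniform guess.
toy_labels = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
toy_logits = [[2.0, 1.0, 0.1], [0.0, 0.0, 0.0]]
toy_loss = mean_loss(toy_labels, toy_logits)
```

A uniform guess over 3 classes costs log(3), while the confident correct guess costs less, so the mean loss falls as the logits move towards the labels.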



### Start training the model

In [11]:
session = tf.Session()
session.run(tf.global_variables_initializer())

Over a loop, read the data in batches and train the model. Training is carried out by running the session with the required tensors.

In [12]:
for batch_no in range(total_batches):
    mnist_batch = mnist_data.train.next_batch(batch_size)
    _, loss_value = session.run([optimiser, loss_operation], feed_dict={
        x_input: mnist_batch[0],
        y_input: mnist_batch[1]
    })
    print(loss_value)

13.3438425
9.544711
8.468291
7.0211115
5.966028
6.1674266
6.9207563
5.8695483
6.3952875
6.582772
5.7185273
5.5500426
5.666357
4.4614797
5.2273693
4.0060983
4.638402
5.0813637
3.0039093
4.169534
4.7116485
3.3631196
3.7953691
3.5868924
3.3116663
3.4326982
2.7202184
3.1401513
3.2643642
3.4223897
2.8351662
2.6343439
3.3724825
2.0103126
2.8256006
2.5645308
2.6122391
3.1199296
2.882144
3.4239647
1.8731222
2.5002882
1.9491075
2.3378136
2.641608
2.3660817
2.0220249
2.2781289
1.6602964
2.154785
1.8930233
2.0421805
2.2042398
2.8818164
1.8688518
2.548379
1.761872
2.0188308
1.6740566
2.2351923
1.9016485
1.9110863
1.8927827
1.5942504
1.9978166
2.1053681
1.5109371
1.8436159
2.1211126
1.7985716
2.147211
2.0203032
1.9330999
1.6550827
2.5906522
1.7457494
2.0940406
1.7813103
1.7686172
1.5329977
1.7671307
1.9661092
0.9769882
2.4779758
1.4414487
1.6319892
1.4561086
1.0990819
2.053479
1.6655915
1.0395226
1.3257146
1.7741635
1.0457492
1.6963199
1.2840796
1.4430994
1.3676748
1.7735982
1.7072165
1.4904449
1.8

In [14]:
predictions = tf.argmax(logits, 1)
correct_predictions = tf.equal(predictions, tf.argmax(y_input, 1))
accuracy_operation = tf.reduce_mean(
    tf.cast(correct_predictions, tf.float32))
test_images, test_labels = mnist_data.test.images, mnist_data.test.labels
accuracy_value = session.run(accuracy_operation, feed_dict={
    x_input: test_images,
    y_input: test_labels
})
print('Accuracy : ', accuracy_value)
session.close()

Accuracy :  0.8087
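The accuracy computation above takes the arg-max of each logit row, compares it with the arg-max of the one-hot label, and averages the matches. A small pure-Python sketch of the same idea follows; the helpers and toy values are illustrative only:

```python
def argmax(values):
    """Index of the largest entry (the first one on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def accuracy(batch_logits, one_hot_labels):
    """Fraction of rows whose highest logit matches the one-hot label."""
    hits = [
        argmax(logits) == argmax(label)
        for logits, label in zip(batch_logits, one_hot_labels)
    ]
    return sum(hits) / len(hits)


toy_logits = [[0.1, 2.0, -1.0],
              [3.0, 0.2, 0.5],
              [0.0, 0.1, 0.4],
              [1.5, 1.0, 0.2]]
toy_labels = [[0, 1, 0],
              [1, 0, 0],
              [0, 1, 0],
              [0, 0, 1]]
toy_accuracy = accuracy(toy_logits, toy_labels)
```

Two of the four toy predictions match their labels, so `toy_accuracy` is 0.5.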


### Build a multilayer convolutional network

In [16]:
def add_variable_summary(tf_variable, summary_name):
    with tf.name_scope(summary_name + '_summary'):
        mean = tf.reduce_mean(tf_variable)
        tf.summary.scalar('Mean', mean)
        with tf.name_scope('standard_deviation'):
            standard_deviation = tf.sqrt(tf.reduce_mean(
                tf.square(tf_variable - mean)))
        tf.summary.scalar('StandardDeviation', standard_deviation)
        tf.summary.scalar('Maximum', tf.reduce_max(tf_variable))
        tf.summary.scalar('Minimum', tf.reduce_min(tf_variable))
        tf.summary.histogram('Histogram', tf_variable)

The variable summary function writes the summaries of a variable. There are five statistics added to the summaries: mean, standard deviation, maximum, minimum and histogram. Summaries can be either a scalar or a histogram. 
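As a rough illustration of the scalar statistics, the following pure-Python sketch computes the same quantities for a short list of numbers. Note that the standard deviation in add_variable_summary is the population form (mean of squared deviations, with no n - 1 correction). The helper name and the toy values are assumptions:

```python
import math


def summary_statistics(values):
    """Return the scalar statistics that add_variable_summary records."""
    mean = sum(values) / len(values)
    # Population standard deviation: mean of squared deviations, no n - 1 correction.
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return {
        'Mean': mean,
        'StandardDeviation': std,
        'Maximum': max(values),
        'Minimum': min(values),
    }


toy_stats = summary_statistics([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
```

For these toy values the mean is 5 and the standard deviation is 2.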

Unlike the previous model, we will reshape the MNIST data into a square and use it like a two-dimensional image. The following command reshapes each image to 28 pixels by 28 pixels with a single channel:

In [17]:
x_input_reshape = tf.reshape(x_input, [-1, 28, 28, 1],
       name='input_reshape')

The dimension -1 denotes that the batch size can be any number. Note the argument called name, which will be reflected in the TensorBoard graph for ease of understanding. Next, we define a helper for a 2D convolution layer that takes the input, the number of filters, the kernel size, and the activation. This function can be reused in further examples and uses Rectified Linear Unit (ReLU) activation by default.

In [18]:
def convolution_layer(input_layer, filters, kernel_size=[3, 3],
                         activation=tf.nn.relu):
    layer = tf.layers.conv2d(
           inputs=input_layer,
           filters=filters,
           kernel_size=kernel_size,
           activation=activation,
    )
    add_variable_summary(layer, 'convolution')
    return layer

There are default parameters for kernel_size and activation.

The summaries are added to the layer within the function and the layer is returned. Whenever the function is called, input_layer has to be passed as a parameter. This definition keeps the rest of our code simple and small. In a very similar way, we will define a function for the pooling layer:

In [19]:
def pooling_layer(input_layer, pool_size=[2, 2], strides=2):
    layer = tf.layers.max_pooling2d(
           inputs=input_layer,
           pool_size=pool_size,
           strides=strides
    )
    add_variable_summary(layer, 'pooling')
    return layer
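To show what the pooling layer does to a single feature map, here is a minimal pure-Python sketch of 2 x 2 max pooling with stride 2 and no padding, which matches the defaults used above. The function `max_pool_2d` and the toy map are illustrative, not TensorFlow code:

```python
def max_pool_2d(feature_map, pool_size=2, stride=2):
    """Max pooling over a 2D list with no padding."""
    height, width = len(feature_map), len(feature_map[0])
    pooled = []
    for top in range(0, height - pool_size + 1, stride):
        row = []
        for left in range(0, width - pool_size + 1, stride):
            # Collect the values under the current window and keep the largest.
            window = [
                feature_map[top + dy][left + dx]
                for dy in range(pool_size)
                for dx in range(pool_size)
            ]
            row.append(max(window))
        pooled.append(row)
    return pooled


toy_map = [[1, 3, 2, 0],
           [4, 2, 1, 5],
           [0, 1, 7, 2],
           [3, 2, 1, 6]]
toy_pooled = max_pool_2d(toy_map)
```

The 4 x 4 map shrinks to 2 x 2, with each output holding the maximum of one window. When the input size is odd, the last row and column are dropped.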

In [20]:
def dense_layer(input_layer, units, activation=tf.nn.relu):
    layer = tf.layers.dense(
           inputs=input_layer,
           units=units,
           activation=activation
    )
    add_variable_summary(layer, 'dense')
    return layer

Another convolution layer can be added to transform the downsampled features from the first convolution layer into better features. After the second pooling, we reshape the activations into a flat vector so that they can be fed through dense layers:

In [21]:
convolution_layer_1 = convolution_layer(x_input_reshape, 64)
pooling_layer_1 = pooling_layer(convolution_layer_1)
convolution_layer_2 = convolution_layer(pooling_layer_1, 128)
pooling_layer_2 = pooling_layer(convolution_layer_2)
flattened_pool = tf.reshape(pooling_layer_2, [-1, 5 * 5 * 128],
                               name='flattened_pool')
dense_layer_bottleneck = dense_layer(flattened_pool, 1024)
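The 5 * 5 * 128 in the reshape follows from the layer sizes. The sketch below traces the spatial size through the stack, assuming the unpadded ('valid') convolutions and poolings that tf.layers uses by default. The helper names are illustrative:

```python
def conv_output_size(size, kernel_size=3, stride=1):
    """Spatial size after an unpadded convolution."""
    return (size - kernel_size) // stride + 1


def pool_output_size(size, pool_size=2, stride=2):
    """Spatial size after unpadded max pooling."""
    return (size - pool_size) // stride + 1


size = 28                       # MNIST images are 28 x 28
size = conv_output_size(size)   # 26 after the first 3 x 3 convolution
size = pool_output_size(size)   # 13 after the first 2 x 2 pooling
size = conv_output_size(size)   # 11 after the second 3 x 3 convolution
size = pool_output_size(size)   # 5 after the second pooling (the odd row and column are dropped)

# The second convolution has 128 filters, so each image flattens to 5 * 5 * 128 values.
flattened_units = size * size * 128
```

If the kernel size, padding, or number of layers changes, the flattened size has to be recomputed in the same way.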

In [22]:
dropout_bool = tf.placeholder(tf.bool)
dropout_layer = tf.layers.dropout(
           inputs=dense_layer_bottleneck,
           rate=0.4,
           training=dropout_bool
)
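The dropout layer above zeroes a fraction of the bottleneck activations during training and is switched off at evaluation time through dropout_bool. The following is a minimal sketch of that behaviour, assuming the inverted-dropout scaling of 1 / (1 - rate) for the kept units. The `dropout` helper and the toy values are illustrative only:

```python
import random


def dropout(activations, rate, training, rng=random):
    """Inverted dropout: zero each unit with probability `rate` while training."""
    if not training:
        # At evaluation time the activations pass through unchanged.
        return list(activations)
    keep_scale = 1.0 / (1.0 - rate)
    return [
        0.0 if rng.random() < rate else value * keep_scale
        for value in activations
    ]


toy_activations = [1.0, 2.0, 3.0, 4.0, 5.0]
rng = random.Random(0)  # fixed seed so the sketch is repeatable
train_output = dropout(toy_activations, rate=0.4, training=True, rng=rng)
eval_output = dropout(toy_activations, rate=0.4, training=False)
```

Scaling the kept units keeps the expected activation the same in training and evaluation, which is why no rescaling is needed when dropout is turned off.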

The output of the dropout layer is fed to another dense layer, which produces the logits. The logits layer is the final layer, with one activation per class. The activation should be highest for the target class, so the predicted class is obtained by taking the maximum over those 10 activations:

In [23]:
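# No activation on the final layer: the softmax cross-entropy loss expects
# unscaled logits, and the default ReLU would clip negative values to zero.
logits = dense_layer(dropout_layer, no_classes, activation=None)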

Now the logits can be passed through the softmax layer followed by the cross-entropy calculation as before. 

In [24]:
with tf.name_scope('loss'):
    softmax_cross_entropy = tf.nn.softmax_cross_entropy_with_logits(
        labels=y_input, logits=logits)
    loss_operation = tf.reduce_mean(softmax_cross_entropy, name='loss')
    tf.summary.scalar('loss', loss_operation)

This loss function can be optimized with the optimizers in the tf.train API. Here, we will use the Adam optimizer. The learning rate need not be defined, as the default (0.001) works well for most cases:

In [25]:
with tf.name_scope('optimiser'):
    optimiser = tf.train.AdamOptimizer().minimize(loss_operation)
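For intuition about what the optimizer does with each gradient, here is a sketch of one Adam update for a single scalar parameter. It is written in the textbook form of the algorithm, which may differ in small details from TensorFlow's implementation. The defaults mirror those of tf.train.AdamOptimizer, and the function name and toy values are assumptions:

```python
import math


def adam_step(param, grad, m, v, t, learning_rate=0.001,
              beta1=0.9, beta2=0.999, epsilon=1e-8):
    """One Adam update for a single scalar parameter (textbook form)."""
    m = beta1 * m + (1.0 - beta1) * grad          # running mean of gradients
    v = beta2 * v + (1.0 - beta2) * grad * grad   # running mean of squared gradients
    m_hat = m / (1.0 - beta1 ** t)                # bias correction for early steps
    v_hat = v / (1.0 - beta2 ** t)
    param = param - learning_rate * m_hat / (math.sqrt(v_hat) + epsilon)
    return param, m, v


# First step (t=1) from param=1.0 with a gradient of 2.0.
param, m, v = 1.0, 0.0, 0.0
param, m, v = adam_step(param, grad=2.0, m=m, v=v, t=1)
```

On the first step the update size is close to the learning rate regardless of the gradient's magnitude, which is part of why the default often works without tuning.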

The accuracy is calculated as before, but name scopes are added for the correct predictions and the accuracy calculation:

In [26]:
with tf.name_scope('accuracy'):
    with tf.name_scope('correct_prediction'):
        predictions = tf.argmax(logits, 1)
        correct_predictions = tf.equal(predictions, tf.argmax(y_input, 1))
        with tf.name_scope('accuracy'):
            accuracy_operation = tf.reduce_mean(
                tf.cast(correct_predictions, tf.float32))
tf.summary.scalar('accuracy', accuracy_operation)

<tf.Tensor 'accuracy_1:0' shape=() dtype=string>

The next step is to start the session and initialize the variables as in the previous section.

In [27]:
session = tf.Session()
session.run(tf.global_variables_initializer())

In [30]:
merged_summary_operation = tf.summary.merge_all()
train_summary_writer = tf.summary.FileWriter('train', session.graph)
test_summary_writer = tf.summary.FileWriter('test')

In [31]:
test_images, test_labels = mnist_data.test.images, mnist_data.test.labels
for batch_no in range(total_batches):
    mnist_batch = mnist_data.train.next_batch(batch_size)
    train_images, train_labels = mnist_batch[0], mnist_batch[1]
    _, merged_summary = session.run(
        [optimiser, merged_summary_operation],
        feed_dict={
            x_input: train_images,
            y_input: train_labels,
            dropout_bool: True
        })
    train_summary_writer.add_summary(merged_summary, batch_no)
    if batch_no % 10 == 0:
        # Every 10 batches, evaluate on the test set with dropout disabled.
        merged_summary, _ = session.run(
            [merged_summary_operation, accuracy_operation],
            feed_dict={
                x_input: test_images,
                y_input: test_labels,
                dropout_bool: False
            })
        test_summary_writer.add_summary(merged_summary, batch_no)

<img src='tf.png'>

## MNIST Dataset In Keras

### Prepare data

In [47]:
batch_size = 128
no_classes = 10
epochs = 2
image_height, image_width = 28, 28

In [48]:
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

Reshape the images to add a channel dimension, and define the input shape for the convolution using the code given:

In [49]:
x_train = x_train.reshape(x_train.shape[0], image_height, image_width, 1)
x_test = x_test.reshape(x_test.shape[0], image_height, image_width, 1)
input_shape = (image_height, image_width, 1)

Convert the pixel values to float:

In [50]:
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')

Normalize the pixel values to the range [0, 1]:

In [51]:
x_train /= 255
x_test /= 255

Convert the class labels to one-hot encoded vectors:

In [52]:
y_train = tf.keras.utils.to_categorical(y_train, no_classes)
y_test = tf.keras.utils.to_categorical(y_test, no_classes)
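As a rough picture of what the one-hot conversion produces, the pure-Python sketch below turns integer labels into rows with a single 1.0 at the label's index. The `one_hot` helper is illustrative and is not the Keras implementation:

```python
def one_hot(labels, num_classes):
    """Turn integer class labels into one-hot rows."""
    encoded = []
    for label in labels:
        row = [0.0] * num_classes
        row[label] = 1.0  # mark the position of the true class
        encoded.append(row)
    return encoded


toy_encoded = one_hot([3, 0, 9], num_classes=10)
```

Each row has length 10 and sums to 1, which is the label format that the categorical cross-entropy loss expects.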

### Build model

In [53]:
def simple_cnn(input_shape):
    model = tf.keras.models.Sequential()
    model.add(tf.keras.layers.Conv2D(
           filters=64,
           kernel_size=(3, 3),
           activation='relu',
           input_shape=input_shape
    ))
    model.add(tf.keras.layers.Conv2D(
           filters=128,
           kernel_size=(3, 3),
           activation='relu'
    ))
    model.add(tf.keras.layers.MaxPooling2D(pool_size=(2, 2)))
    model.add(tf.keras.layers.Dropout(rate=0.3))
    model.add(tf.keras.layers.Flatten())
    model.add(tf.keras.layers.Dense(units=1024, activation='relu'))
    model.add(tf.keras.layers.Dropout(rate=0.3))
    model.add(tf.keras.layers.Dense(units=no_classes,
                                    activation='softmax'))
    model.compile(loss=tf.keras.losses.categorical_crossentropy,
                  optimizer=tf.keras.optimizers.Adam(),
                  metrics=['accuracy'])
    return model


simple_cnn_model = simple_cnn(input_shape)

In [54]:
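# Pass the arguments by keyword: the fifth positional parameter of fit() is
# verbose, so the validation tuple must be given as validation_data.
simple_cnn_model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs,
                     validation_data=(x_test, y_test))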
train_loss, train_accuracy = simple_cnn_model.evaluate(x_train, y_train, verbose=0)
print('Train data loss:', train_loss)
print('Train data accuracy:', train_accuracy)

Epoch 1/2
Epoch 2/2
Train data loss: 0.02458886965283115
Train data accuracy: 0.99195


In [55]:
test_loss, test_accuracy = simple_cnn_model.evaluate(
       x_test, y_test, verbose=0)
print('Test data loss:', test_loss)
print('Test data accuracy:', test_accuracy)

Test data loss: 0.04100198608621722
Test data accuracy: 0.9864


## Training a model for cats versus dogs