# Deep learning for computer vision

This chapter introduces convolutional neural networks, also known as convnets, the type of deep learning model that is now used almost universally in computer vision applications. We’ll learn to apply convnets to image-classification problems.

### Introduction to convnets

Let’s take a practical look at a simple convnet that classifies MNIST digits.
We'll use a stack of Conv2D and MaxPooling2D layers to show what a basic convnet looks like.

In [2]:
# Loading the MNIST dataset in Keras
import keras
from keras.datasets import mnist
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()



In [6]:
# Instantiating a small convnet
from keras import layers
from keras import models

model = models.Sequential()
# 32 filters with a 3x3 kernel; input_shape (28, 28, 1) is the format of MNIST images
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)))
# 2x2 max pooling
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))

# Adding a classifier on top of the convnet
model.add(layers.Flatten())
model.add(layers.Dense(64, activation='relu')) 
model.add(layers.Dense(10, activation='softmax'))



Let's display the architecture of the convnet:

In [7]:
model.summary()

The output of every Conv2D and MaxPooling2D layer is a 3D tensor of shape (height, width, channels).
- The width and height dimensions tend to shrink as we go deeper in the network.
- The number of channels is controlled by the first argument passed to the Conv2D layers (32 or 64).
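To see where these shapes come from, here is a minimal sketch that traces them by hand. It assumes the Keras defaults used in the model above: `'valid'` padding and stride 1 for Conv2D, and a stride equal to the pool size for MaxPooling2D. The helper names (`conv2d_shape`, `maxpool2d_shape`, `trace_shapes`) are made up for this illustration and are not part of Keras.

```python
# Trace feature-map shapes through the small convnet defined above.
# Assumption: 'valid' padding and stride 1 for Conv2D, and
# non-overlapping windows for MaxPooling2D (the Keras defaults).

def conv2d_shape(shape, filters, kernel=3):
    """Output shape of a valid-padding, stride-1 convolution."""
    height, width, _ = shape
    return (height - kernel + 1, width - kernel + 1, filters)

def maxpool2d_shape(shape, pool=2):
    """Output shape of non-overlapping max pooling (floor division)."""
    height, width, channels = shape
    return (height // pool, width // pool, channels)

def trace_shapes(input_shape=(28, 28, 1)):
    """Return the list of shapes, starting with the input shape."""
    shapes = [input_shape]
    shapes.append(conv2d_shape(shapes[-1], 32))   # Conv2D(32, (3, 3))
    shapes.append(maxpool2d_shape(shapes[-1]))    # MaxPooling2D((2, 2))
    shapes.append(conv2d_shape(shapes[-1], 64))   # Conv2D(64, (3, 3))
    shapes.append(maxpool2d_shape(shapes[-1]))    # MaxPooling2D((2, 2))
    shapes.append(conv2d_shape(shapes[-1], 64))   # Conv2D(64, (3, 3))
    return shapes

for shape in trace_shapes():
    print(shape)
```

Each 3x3 convolution trims two pixels from the height and width, and each 2x2 pooling halves them (rounding down), so 28 becomes 26, 13, 11, 5 and finally 3.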


The final step was to feed the last output tensor (of shape (3, 3, 64)) into a densely connected classifier. Such classifiers process vectors, which are 1D, whereas the current output is a 3D tensor. So we first flattened the 3D output to 1D and then added a few Dense layers on top.
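The flattening step can be sketched in plain Python. This is only an illustration of the idea, not how Keras implements `Flatten`: it assumes a single (height, width, channels) tensor stored as nested lists and unrolled in row-major order, and the `flatten` helper is a hypothetical name.

```python
def flatten(tensor):
    """Unroll a nested (height, width, channels) list into one flat list."""
    return [value for row in tensor for pixel in row for value in pixel]

height, width, channels = 3, 3, 64

# Fill every position with a unique integer so the ordering stays visible.
tensor = [[[(h * width + w) * channels + c for c in range(channels)]
           for w in range(width)]
          for h in range(height)]

vector = flatten(tensor)
print(len(vector))  # 3 * 3 * 64 = 576

# Weights plus biases of the two Dense layers that sit on top of this vector.
dense_64_params = len(vector) * 64 + 64
dense_10_params = 64 * 10 + 10
print(dense_64_params, dense_10_params)
```

The 3D output becomes a vector of 576 values, which is the input size the first Dense layer sees.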

In [8]:
# Training the convnet on MNIST images
train_images = train_images.reshape((60000, 28, 28, 1))
train_images = train_images.astype('float32') / 255
test_images = test_images.reshape((10000, 28, 28, 1))
test_images = test_images.astype('float32') / 255

from keras.utils import to_categorical
train_labels = to_categorical(train_labels)
test_labels = to_categorical(test_labels)

model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(train_images, train_labels, epochs=5, batch_size=64)

Epoch 1/5
938/938 ━━━━━━━━━━━━━━━━━━━━ 65s 67ms/step - accuracy: 0.8730 - loss: 0.4034
Epoch 2/5
938/938 ━━━━━━━━━━━━━━━━━━━━ 60s 64ms/step - accuracy: 0.9840 - loss: 0.0504
Epoch 3/5
938/938 ━━━━━━━━━━━━━━━━━━━━ 75s 57ms/step - accuracy: 0.9884 - loss: 0.0356
Epoch 4/5
938/938 ━━━━━━━━━━━━━━━━━━━━ 57s 61ms/step - accuracy: 0.9919 - loss: 0.0251
Epoch 5/5
938/938 ━━━━━━━━━━━━━━━━━━━━ 55s 59ms/step - accuracy: 0.9949 - loss: 0.0171


<keras.src.callbacks.history.History at 0x12d77ab20>
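The training cell above does two preprocessing steps before fitting: it scales pixel values from the 0-255 range to the 0-1 range, and it one-hot encodes the labels with `to_categorical`. Here is a rough pure-Python sketch of what those two steps do. The `one_hot` and `scale_pixels` helpers and the sample values are hypothetical, written only for illustration; they are not the Keras implementation.

```python
def one_hot(labels, num_classes=10):
    """Turn integer class labels into one-hot vectors."""
    vectors = []
    for label in labels:
        row = [0.0] * num_classes
        row[label] = 1.0  # a single 1 at the index of the class
        vectors.append(row)
    return vectors

def scale_pixels(pixels):
    """Map 8-bit pixel intensities (0-255) to floats in [0, 1]."""
    return [pixel / 255 for pixel in pixels]

# Made-up example values, not taken from the dataset.
encoded = one_hot([5, 0, 4])
scaled = scale_pixels([0, 128, 255])

for row in encoded:
    print(row)
print(scaled)
```

One-hot labels are what the `categorical_crossentropy` loss expects: one target value per output unit of the final 10-way softmax layer.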

Let's evaluate the model on the test data:

In [9]:
# Evaluating the model
test_loss, test_acc = model.evaluate(test_images, test_labels)

313/313 ━━━━━━━━━━━━━━━━━━━━ 4s 11ms/step - accuracy: 0.9857 - loss: 0.0441


### The key characteristic of convnets

The fundamental difference between a densely connected layer and a convolution layer is this: Dense layers learn global patterns in their input feature space, whereas convolution layers learn local patterns.
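A quick back-of-the-envelope comparison makes the difference concrete. The sketch below counts how many input values one output unit looks at and how many parameters the layer needs, for a 28x28x1 MNIST image. The Dense layer with 32 units applied directly to the flattened image is a hypothetical layer used only for comparison; it is not part of the model above.

```python
input_height, input_width, input_channels = 28, 28, 1
units = 32  # number of Dense units, and number of Conv2D filters

# Dense: every unit is connected to every pixel of the flattened image.
dense_inputs_per_unit = input_height * input_width * input_channels
dense_params = dense_inputs_per_unit * units + units  # weights + biases

# Conv2D with a 3x3 kernel: every output value only sees a 3x3 window,
# and the same filter weights are reused at every window position.
kernel_size = 3
conv_inputs_per_output = kernel_size * kernel_size * input_channels
conv_params = conv_inputs_per_output * units + units  # weights + biases

print(dense_inputs_per_unit, dense_params)
print(conv_inputs_per_output, conv_params)
```

The Dense unit takes in all 784 pixels at once (a global pattern), while each convolution output depends on just 9 neighbouring pixels (a local pattern).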

This key characteristic gives convnets two interesting properties:
- The patterns they learn are translation-invariant. After learning a certain pattern in the lower-right corner of a picture, a convnet can recognize it anywhere: for example, in the upper-left corner. A densely connected model would have to learn the pattern anew if it appeared at a new location.
- They can learn spatial hierarchies of patterns. A first convolution layer will learn small local patterns such as edges, a second convolution layer will learn larger patterns made of the features of the first layer, and so on. This allows convnets to efficiently learn increasingly complex and abstract visual concepts, because the visual world is fundamentally spatially hierarchical.
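The first property can be illustrated with a toy example. The sketch below slides one fixed 3x3 filter over a small image, the way a convolution layer does (without bias or activation), and checks the strongest response when the same pattern sits in the lower-right corner and in the upper-left corner. The cross-shaped pattern, the 8x8 image size and all function names are assumptions chosen for the illustration.

```python
def cross_correlate(image, kernel):
    """Valid-padding, stride-1 sliding-window sum of products."""
    k = len(kernel)
    out_h = len(image) - k + 1
    out_w = len(image[0]) - k + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(k) for b in range(k))
             for j in range(out_w)]
            for i in range(out_h)]

def place_pattern(pattern, top, left, size=8):
    """Return a size x size image of zeros with the pattern pasted in."""
    image = [[0] * size for _ in range(size)]
    for a, row in enumerate(pattern):
        for b, value in enumerate(row):
            image[top + a][left + b] = value
    return image

def strongest_response(image, kernel):
    """Return (peak value, (row, col) of the first peak) of the response map."""
    response = cross_correlate(image, kernel)
    best = max(max(row) for row in response)
    for i, row in enumerate(response):
        for j, value in enumerate(row):
            if value == best:
                return best, (i, j)

# A cross-shaped pattern; the filter is tuned to exactly this pattern.
pattern = [[0, 1, 0],
           [1, 1, 1],
           [0, 1, 0]]
kernel = pattern

lower_right = place_pattern(pattern, top=5, left=5)
upper_left = place_pattern(pattern, top=0, left=0)

print(strongest_response(lower_right, kernel))
print(strongest_response(upper_left, kernel))
```

The peak response has the same value in both cases; only its position in the response map changes. Because the filter weights are shared across all positions, nothing has to be relearned when the pattern moves.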
