<a href="https://colab.research.google.com/github/nat-smithh/ag-sci/blob/main/landscapes.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# landscape classification

In this case study, we will be exploring the use of convolution neural networks to classify images of landscapes. Identifying landscape or vegetation or land-use types from image data has important applications in agriculture and natural-resource management.

In this example, we'll be using a data set of images from the following possible types or classes:

* buildings representing cities or other human habitations
* forests
* glacier or ice-covered landscapes
* mountains
* sea or ocean
* streets or paved areas

I've packaged the image data into separate training and validation sub-sets, available for download at zenodo.

The following code cell will download the image data sets, confirm the number of images in each sub-set, and package the data sets into tensorflow Dataset objects for neural network training.

In [1]:
import tensorflow as tf
import pathlib

# download training data
train_url = 'https://zenodo.org/record/5512793/files/train.tgz'
train_dir = tf.keras.utils.get_file(origin=train_url, fname='train', untar=True)
train_dir = pathlib.Path(train_dir)

# download validation data
valid_url = 'https://zenodo.org/record/5512793/files/valid.tgz'
valid_dir = tf.keras.utils.get_file(origin=valid_url, fname='valid', untar=True)
valid_dir = pathlib.Path(valid_dir)

# print number of training and validation images
train_image_count = len(list(train_dir.glob('*/*.jpg')))
valid_image_count = len(list(valid_dir.glob('*/*.jpg')))
print(train_image_count, valid_image_count)

# package images into tensorflow dataset objects
train_data = tf.keras.preprocessing.image_dataset_from_directory(train_dir,
                                                                 image_size=(150,150),
                                                                 batch_size=32)
valid_data = tf.keras.preprocessing.image_dataset_from_directory(valid_dir,
                                                                 image_size=(150,150),
                                                                 batch_size=32)
# print tensorflow dataset objects
print(train_data, valid_data)

Downloading data from https://zenodo.org/record/5512793/files/train.tgz
[1m200908942/200908942[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m70s[0m 0us/step
Downloading data from https://zenodo.org/record/5512793/files/valid.tgz
[1m42810482/42810482[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m18s[0m 0us/step
0 0
Found 14034 files belonging to 1 classes.
Found 3000 files belonging to 1 classes.
<_PrefetchDataset element_spec=(TensorSpec(shape=(None, 150, 150, 3), dtype=tf.float32, name=None), TensorSpec(shape=(None,), dtype=tf.int32, name=None))> <_PrefetchDataset element_spec=(TensorSpec(shape=(None, 150, 150, 3), dtype=tf.float32, name=None), TensorSpec(shape=(None,), dtype=tf.int32, name=None))>


It should take a few seconds to download the data sets.

You should see that there are 14,034 total image files in the training data sub-set, and 3000 images in the validation sub-set. In both cases, there are 6 possible classes or landscape types.

Notice that the shape of the training and validation Dataset objects is the same:

    ((None, 150, 150, 3), (None, ))

That is, the image data (ignoring the batch dimension of None) consists of 150x150 pixel images with 3 color channels (ie, typical RGB image data). The labels are integer-valued class labels, which is pretty standard for image classification problems.

We'll need to remember that the images are 150x150x3, so we can specify the correct input shape for our neural network.

Also, we'll need to use SparseCategoricalCrossentropy loss when we fit our model to the training data, because the category labels are not one-hot encoded.

To start off, we'll build a very simple convolution neural network consisting of a single convolution layer with a single 3x3 filter and ReLU activation.

As a 'trick', we're going to rescale our image data to be on the [0,1] scale *automatically* in our neural network. We'll do this by specifying a tf.keras.layers.experimental.preprocessing.Rescaling layer as the first layer in the network.

Because our image data has pixel values between 0 and 255, we'll need to 'rescale' them by a factor of:

    1.0/255

which we specify as the scaling factor for the Rescaling layer. We'll also need to specify the input shape of the network when we create the Rescaling layer, because it's the first layer in the network.

After 'flattening' the output of the convolution layer, we create a Dense output layer with 6 units (because there are 6 possible landscape classes), and softmax activation.

We're going to opt for the Adam optimizer in this case, as it will help our model fit run faster (ie, fewer epochs). With this much data, we don't want to wait around for the slower SGD optimizer to reach a good model fit. The Adam optimizer is typically 'faster' than SGD, and it has been widely used for training image classification networks.

Make sure we specify SparseCategoricalCrossentropy loss, record the model's accuracy as it trains, and we'll train for 20 epochs.

Make sure you use GPU resources for this run, or it will take a *long* time! Click on the downward-facing arrow in the upper right corner of colab, select "View resources", and then click "Change Runtime Type". Select "GPU" from the "Hardware acceleration" drop-down, and save. Now your model fit will run on a GPU.

In [2]:
# build model
model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Rescaling(1.0/255, input_shape=[150,150,3]))
model.add(tf.keras.layers.Conv2D(filters=1, kernel_size=(3,3), activation=tf.keras.activations.relu))
model.add(tf.keras.layers.Flatten())
model.add(tf.keras.layers.Dense(units=6, activation=tf.keras.activations.softmax))

model.summary()

# compile model
model.compile(optimizer=tf.keras.optimizers.Adam(),
              loss=tf.keras.losses.SparseCategoricalCrossentropy(),
              metrics=['accuracy'])

# fit model
model.fit(train_data, epochs=20, validation_data=valid_data)

  super().__init__(**kwargs)


Epoch 1/20
[1m439/439[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m11s[0m 17ms/step - accuracy: 0.9859 - loss: 0.1242 - val_accuracy: 1.0000 - val_loss: 6.4045e-04
Epoch 2/20
[1m439/439[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m7s[0m 16ms/step - accuracy: 1.0000 - loss: 2.1790e-04 - val_accuracy: 1.0000 - val_loss: 1.1036e-05
Epoch 3/20
[1m439/439[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m10s[0m 16ms/step - accuracy: 1.0000 - loss: 1.3681e-05 - val_accuracy: 1.0000 - val_loss: 1.6857e-06
Epoch 4/20
[1m439/439[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m6s[0m 14ms/step - accuracy: 1.0000 - loss: 4.0611e-06 - val_accuracy: 1.0000 - val_loss: 5.6982e-07
Epoch 5/20
[1m439/439[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m7s[0m 16ms/step - accuracy: 1.0000 - loss: 2.0695e-06 - val_accuracy: 1.0000 - val_loss: 2.5918e-07
Epoch 6/20
[1m439/439[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m6s[0m 14ms/step - accuracy: 1.0000 - loss: 1.1524e-06 - val_accuracy: 1.0000

<keras.src.callbacks.history.History at 0x7c2d16bd9ee0>

This model has 131,458 trainable parameters, nearly all of them in the Dense output layer.

You'll notice that, with the Adam optimizer, the model reaches >0.95 accuracy on the *training* data after only a few epochs of training. But, the accuracy on the *validation* data stays *very* low (around 0.38 in my case)!

There is clearly an 'overfitting' problem. This makes sense, given that we have 14,034 training images and 131,458 model parameters!

Let's try adding a Dropout layer to reduce overfitting.

In the following code cell, we remove 90% of the outputs from the convolution layer, before flattening the data and sending it to the Dense output layer.

In [3]:
# build model
model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Rescaling(1.0/255, input_shape=[150,150,3]))
model.add(tf.keras.layers.Conv2D(filters=1, kernel_size=(3,3), activation=tf.keras.activations.relu))
model.add(tf.keras.layers.Dropout(rate=0.9))
model.add(tf.keras.layers.Flatten())
model.add(tf.keras.layers.Dense(units=6, activation=tf.keras.activations.softmax))

model.summary()

# compile model
model.compile(optimizer=tf.keras.optimizers.Adam(),
              loss=tf.keras.losses.SparseCategoricalCrossentropy(),
              metrics=['accuracy'])

# fit model
model.fit(train_data, epochs=20, validation_data=valid_data)

Epoch 1/20
[1m439/439[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m9s[0m 17ms/step - accuracy: 0.9920 - loss: 0.0232 - val_accuracy: 1.0000 - val_loss: 1.0092e-07
Epoch 2/20
[1m439/439[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m7s[0m 15ms/step - accuracy: 1.0000 - loss: 9.3382e-08 - val_accuracy: 1.0000 - val_loss: 8.1571e-08
Epoch 3/20
[1m439/439[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m6s[0m 13ms/step - accuracy: 1.0000 - loss: 7.5604e-08 - val_accuracy: 1.0000 - val_loss: 5.6859e-08
Epoch 4/20
[1m439/439[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m7s[0m 16ms/step - accuracy: 1.0000 - loss: 5.9503e-08 - val_accuracy: 1.0000 - val_loss: 4.3827e-08
Epoch 5/20
[1m439/439[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m6s[0m 13ms/step - accuracy: 1.0000 - loss: 5.2212e-08 - val_accuracy: 1.0000 - val_loss: 3.2980e-08
Epoch 6/20
[1m439/439[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m10s[0m 13ms/step - accuracy: 1.0000 - loss: 7.1099e-08 - val_accuracy: 1.0000 

<keras.src.callbacks.history.History at 0x7c2d1410f9e0>

Well, we appear to have alleviated model overfitting to the training data; the model's accuracy on the training and validation data sub-sets is much more similar.

But, accuracy is pretty *low*, overall. In my case, I achieved a final model accuracy of 0.49 on the training data and 0.50 on the validation data. So, about 50% of the images are being correctly classified, but at least our model isn't overfitting.

Let's see if we can improve model accuracy, without exacerbating overfitting.

The first thing we'll try is to add more 'filters' to the convolution layer.

Try setting the number of filters to 32, rather than using a single filter, and see if that improves accuracy on both the training and validation data. You can edit the following code cell to increase the number of convolution filters.

There's no 'magic' number of convolution filters that works in all situations, and I'm not aware of any reasonable 'formula' for deriving an 'optimal' number of convolution filters for a specific problem. Rather, most network designers will choose a 'convenient' number of convolution filters, based loosely on what has been used successfully in the past. Commonly-used values are typically in the range of 32-128, although sometimes you'll see more than 128 filters for some large-scale image analysis problems. And yes, the number of filters is typically a power of 2.

In our case, we chose 32 filters, based on the following 'intuition'.

* 16 filters might be 'too small' to capture a sufficient number of 'features' in the image data, and
* much more than 32 filters might increase the parameter count in our model 'too much', given the relatively small amount of training data.

In [5]:
# build model
model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Rescaling(1.0/255, input_shape=[150,150,3]))
model.add(tf.keras.layers.Conv2D(filters=32, kernel_size=(3,3), activation=tf.keras.activations.relu))
model.add(tf.keras.layers.Dropout(rate=0.9))
model.add(tf.keras.layers.Flatten())
model.add(tf.keras.layers.Dense(units=6, activation=tf.keras.activations.softmax))

model.summary()

# compile model
model.compile(optimizer=tf.keras.optimizers.Adam(),
              loss=tf.keras.losses.SparseCategoricalCrossentropy(),
              metrics=['accuracy'])

# fit model
model.fit(train_data, epochs=20, validation_data=valid_data)

Epoch 1/20
[1m439/439[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m15s[0m 25ms/step - accuracy: 0.9886 - loss: 0.0271 - val_accuracy: 1.0000 - val_loss: 0.0000e+00
Epoch 2/20
[1m439/439[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m14s[0m 17ms/step - accuracy: 1.0000 - loss: 0.0000e+00 - val_accuracy: 1.0000 - val_loss: 0.0000e+00
Epoch 3/20
[1m439/439[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m10s[0m 16ms/step - accuracy: 1.0000 - loss: 0.0000e+00 - val_accuracy: 1.0000 - val_loss: 0.0000e+00
Epoch 4/20
[1m439/439[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m6s[0m 14ms/step - accuracy: 1.0000 - loss: 0.0000e+00 - val_accuracy: 1.0000 - val_loss: 0.0000e+00
Epoch 5/20
[1m439/439[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m10s[0m 15ms/step - accuracy: 1.0000 - loss: 0.0000e+00 - val_accuracy: 1.0000 - val_loss: 0.0000e+00
Epoch 6/20
[1m439/439[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m7s[0m 17ms/step - accuracy: 1.0000 - loss: 0.0000e+00 - val_accuracy: 1.00

<keras.src.callbacks.history.History at 0x7c2cdffca030>

Well, now we've got the accuracy on the training data back up over 0.95, but the validation accuracy is still suffering a bit. In my case, it was 0.67 after 20 epochs of training.

We could try increasing the dropout rate in our Dropout layer, to try to reduce the observed overfitting, but this would likely negatively impact our training accuracy.

As an alternative, we'll use a type of neural network layer widely used in convolution networks called a MaxPooling layer.

Briefly, "MaxPooling" looks across a contiguous block of inputs and outputs the maximum value within that block. It's similar to taking an 'average' of a bunch of inputs, but instead of averaging all the values, it just takes the maximum over all inputs in the block.

Conceptually, MaxPooling is similar to a convolution: both methods consider a contiguous 'block' within a larger input 'image' and produce an output that is dependent on the inputs within the 'block'. Unlike a convolution layer, however, MaxPooling layers have *no* trainable parameters - they simply transmit the maximum value across a block of inputs.

So, MaxPooling can be used to 'decrease' the 'size' of 'image' data, without adding trainable parameters to the model.

MaxPooling layers are often used in convolution networks precisely for the purpose of 'compressing' the image data, without losing the main 'features' of the data, and without requiring any trainable parameters.

Tensorflow implements 2-dimensional MaxPooling as a tf.keras.layers.MaxPool2D object, which can be added to a neural network just like any other type of layer. By default, MaxPool2D objects 'pool' information from a 2x2 contiguous 'block' of inputs, so they decrease the image 'size' by a factor of 2 in both height and width.

We can create a MaxPool2D object and add it to our model using the python code:

    model.add(tf.keras.layers.MaxPool2D())

Try adding a MaxPool2D layer to the model immediately following the convolution layer but before the dropout layer. Make sure you use 32 filters in the convolution layer!

In [7]:
# build model
model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Rescaling(1.0/255, input_shape=[150,150,3]))
model.add(tf.keras.layers.Conv2D(filters=32, kernel_size=(3,3), activation=tf.keras.activations.relu))
model.add(tf.keras.layers.MaxPool2D())
model.add(tf.keras.layers.Dropout(rate=0.9))
model.add(tf.keras.layers.Flatten())
model.add(tf.keras.layers.Dense(units=6, activation=tf.keras.activations.softmax))

model.summary()

# compile model
model.compile(optimizer=tf.keras.optimizers.Adam(),
              loss=tf.keras.losses.SparseCategoricalCrossentropy(),
              metrics=['accuracy'])

# fit model
model.fit(train_data, epochs=20, validation_data=valid_data)

Epoch 1/20
[1m439/439[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m12s[0m 22ms/step - accuracy: 0.9905 - loss: 0.0264 - val_accuracy: 1.0000 - val_loss: 0.0000e+00
Epoch 2/20
[1m439/439[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m6s[0m 14ms/step - accuracy: 1.0000 - loss: 0.0000e+00 - val_accuracy: 1.0000 - val_loss: 0.0000e+00
Epoch 3/20
[1m439/439[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m7s[0m 17ms/step - accuracy: 1.0000 - loss: 0.0000e+00 - val_accuracy: 1.0000 - val_loss: 0.0000e+00
Epoch 4/20
[1m439/439[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m10s[0m 17ms/step - accuracy: 1.0000 - loss: 0.0000e+00 - val_accuracy: 1.0000 - val_loss: 0.0000e+00
Epoch 5/20
[1m439/439[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m10s[0m 17ms/step - accuracy: 1.0000 - loss: 0.0000e+00 - val_accuracy: 1.0000 - val_loss: 0.0000e+00
Epoch 6/20
[1m439/439[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m7s[0m 15ms/step - accuracy: 1.0000 - loss: 0.0000e+00 - val_accuracy: 1.000

<keras.src.callbacks.history.History at 0x7c2cbe5d7b90>

Well, accuracy on the *training* data appears to have declined a bit (in my case, final training accuracy after 20 epochs was 0.85), and validation accuracy actually *increased* a tad (0.74, in my case).

So, we're 'moving' in the 'right' directions.

If we take a 'step back' and consider the *structure* of our little convolution network, we can see that we've created a 'block' of network layers consisting of the following:

1. a convolution layer with 32 filters
2. a MaxPooling layer
3. a dropout layer (with dropout rate 0.9)

Right now, the output from this 'block' is flattened and then sent to the 'decision' or 'classification' layer (ie, the output layer that actually 'classifies' the images).

However, we could consider 'replicating' this 'block' of 3 layers multiple times, in order to create a 'deeper' modular network. Many very deep neural networks are built this way: they are composed of a sequence of replicated 'blocks'. In our case, we could 'replicate' our convolution-pooling-dropout 'block' many times, transforming:

    rescaling
    convolution
    pooling
    dropout
    flatten
    output

into:

    rescaling
    convolution
    pooling
    dropout
    convolution
    pooling
    dropout
    convolution
    pooling
    dropout
    ...
    flatten
    output

Technically, there's nothing preventing us from replicating the convolution-pooling-dropout 'module' as many times as we want, although we'd want to keep the model's parameter count relatively low. Also, as both the convolution and pooling layers decrease the height and width of the 'image' data, at some point we'd wind up with a 1x1 'image' (although we could use 'same' padding to prevent this).

For now, let's try implementing a model with 4 total convolution-pooling-dropout modules. Make sure you use 32 3x3 convolution filters in each of the convolution layers, and use 0.9 as the dropout rate in each of the dropout layers.

In [8]:
# build model
model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Rescaling(1.0/255, input_shape=[150,150,3]))

model.add(tf.keras.layers.Conv2D(filters=32, kernel_size=(3,3), activation=tf.keras.activations.relu))
model.add(tf.keras.layers.MaxPool2D())
model.add(tf.keras.layers.Dropout(rate=0.9))

model.add(tf.keras.layers.Flatten())
model.add(tf.keras.layers.Dense(units=6, activation=tf.keras.activations.softmax))

model.summary()

# compile model
model.compile(optimizer=tf.keras.optimizers.Adam(),
              loss=tf.keras.losses.SparseCategoricalCrossentropy(),
              metrics=['accuracy'])

# fit model
model.fit(train_data, epochs=20, validation_data=valid_data)

Epoch 1/20
[1m439/439[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m11s[0m 20ms/step - accuracy: 0.9915 - loss: 0.0220 - val_accuracy: 1.0000 - val_loss: 0.0000e+00
Epoch 2/20
[1m439/439[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m7s[0m 15ms/step - accuracy: 1.0000 - loss: 0.0000e+00 - val_accuracy: 1.0000 - val_loss: 0.0000e+00
Epoch 3/20
[1m439/439[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m11s[0m 17ms/step - accuracy: 1.0000 - loss: 0.0000e+00 - val_accuracy: 1.0000 - val_loss: 0.0000e+00
Epoch 4/20
[1m439/439[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m8s[0m 18ms/step - accuracy: 1.0000 - loss: 0.0000e+00 - val_accuracy: 1.0000 - val_loss: 0.0000e+00
Epoch 5/20
[1m439/439[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m9s[0m 16ms/step - accuracy: 1.0000 - loss: 0.0000e+00 - val_accuracy: 1.0000 - val_loss: 0.0000e+00
Epoch 6/20
[1m439/439[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m8s[0m 18ms/step - accuracy: 1.0000 - loss: 0.0000e+00 - val_accuracy: 1.0000

<keras.src.callbacks.history.History at 0x7c2ca097bd70>

Did you notice that your 'deeper' network actually has *fewer* parameters!

This is because the vast majority of the trainable parameters typically comes from the final Dense output layer's connections. By reducing the size of the 'image' data (in this case, the final 'image' is 7x7), we've *dramatically* reduced the number of trainable parameters in the output layer.

But wow, our network is *not* performing very well. After 20 epochs of training, my network achieved only 0.49 accuracy on the training data, and the accuracy on the validation data was 0.18.

By 'compressing' the image data through multiple rounds of MaxPooling *and* using a *very* strong dropout rate after *every* convolution-pooling block, we've effectively crippled our neural network's capacity!

Let's try *removing* all those Dropout layers, and see how the network behaves!

In [9]:
# build model
model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Rescaling(1.0/255, input_shape=[150,150,3]))

model.add(tf.keras.layers.Conv2D(filters=32, kernel_size=(3,3), activation=tf.keras.activations.relu))
model.add(tf.keras.layers.MaxPool2D())
model.add(tf.keras.layers.Conv2D(filters=32, kernel_size=(3,3), activation=tf.keras.activations.relu))
model.add(tf.keras.layers.MaxPool2D())
model.add(tf.keras.layers.Conv2D(filters=32, kernel_size=(3,3), activation=tf.keras.activations.relu))
model.add(tf.keras.layers.MaxPool2D())
model.add(tf.keras.layers.Conv2D(filters=32, kernel_size=(3,3), activation=tf.keras.activations.relu))
model.add(tf.keras.layers.MaxPool2D())

model.add(tf.keras.layers.Flatten())
model.add(tf.keras.layers.Dense(units=6, activation=tf.keras.activations.softmax))

model.summary()

# compile model
model.compile(optimizer=tf.keras.optimizers.Adam(),
              loss=tf.keras.losses.SparseCategoricalCrossentropy(),
              metrics=['accuracy'])

# fit model
model.fit(train_data, epochs=20, validation_data=valid_data)

Epoch 1/20
[1m439/439[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m17s[0m 29ms/step - accuracy: 0.9849 - loss: 0.0776 - val_accuracy: 1.0000 - val_loss: 0.0000e+00
Epoch 2/20
[1m439/439[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m13s[0m 20ms/step - accuracy: 1.0000 - loss: 0.0000e+00 - val_accuracy: 1.0000 - val_loss: 0.0000e+00
Epoch 3/20
[1m439/439[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m9s[0m 20ms/step - accuracy: 1.0000 - loss: 0.0000e+00 - val_accuracy: 1.0000 - val_loss: 0.0000e+00
Epoch 4/20
[1m439/439[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m9s[0m 19ms/step - accuracy: 1.0000 - loss: 0.0000e+00 - val_accuracy: 1.0000 - val_loss: 0.0000e+00
Epoch 5/20
[1m439/439[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m9s[0m 20ms/step - accuracy: 1.0000 - loss: 0.0000e+00 - val_accuracy: 1.0000 - val_loss: 0.0000e+00
Epoch 6/20
[1m439/439[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m9s[0m 20ms/step - accuracy: 1.0000 - loss: 0.0000e+00 - val_accuracy: 1.0000

<keras.src.callbacks.history.History at 0x7c2c7bf32f00>

Our model has essentially the same number of trainable parameters with or without the Dropout layers, but removing Dropout definitely *improved* accuracy on both training and validation data.

When trying to design a convolution network from scratch, it can be challenging to make reliable design decisions. Often times, network 'design' becomes a fairly haphazard 'random walk' through network 'space' - you try something, see how it works, and then try something else. Often times, strategies that improve accuracy on their own don't necessarily 'play well' together, so you can take a few steps forward, followed by a giant leap *backward* as you try to design a reliable neural network.

## transfer learning

An alternative to building a neural network 'from scratch' is to use a pre-existing neural network and then 'tweak' it to solve your specific problem.

One of the main ways to do this is called "transfer learning".

In general, transfer learning is the process of taking a neural network that has been trained to solve one problem and 'transferring' it to solve a different problem. In practice, this is almost always done by 'grafting' part of the original network onto a new neural network, which is then trained to solve the specific problem at hand.

In order to use transfer learning, the original network to be 'grafted' onto a new network to solve a new problem must meet a couple criteria:

*  the original network must be trained to solve a problem that is relevant to the new problem at hand, and
*  the original network must consist of at least 2 'modules': one that 'extracts relevant features' to solve the new problem, and another module that 'solves' the original problem.

The idea behind transfer learning is to 'separate' the 'generalizable module' from the original network from the 'specialized module' that solves the old problem. This 'generalizable module' is then 'grafted' onto a new 'specialized module' that can be trained to solve the new problem.

Transfer learning is widely used in image analysis problems using convolution neural networks.

Convolution neural networks naturally fit the two 'criteria' for transfer learning to be effective. If we think about the structure of a typical convolution network, it consists of 2 modules:

*  the 'convolution blocks' that process image data into 'features' that can be used for image classification, and
*  the 'decision layers' that classify an image, based on its extracted 'features'.

Transfer learning works by 'grafting' the 'feature extraction' layers from the convolution network onto a *new* set of decision layers, which can then be trained to classify *new* images, based on the original 'features' that are extracted using the existing convolution blocks.

Transfer learning only works if the original 'features' extracted from image data are *relevant* for the new classification problem. Fortunately, in many cases convolution networks can extract very generalizable features from image data. So, we can train *extremely* large convolution networks using 'standardized' image data sets, and then 'transfer' the extracted features to solve new image classification problems.

The Tensorflow library provides easy access to a relatively large number of pre-trained neural networks that can be used for transfer learning.

In this case, we'll use the MobileNetV2 network as the 'donor' for our image analysis problem. The paper describing MobileNetV2 can be found [here](https://arxiv.org/abs/1801.04381). For our purposes, the exact architecture of MobileNetV2 is not important; MobileNetV2 is simply one of a number of 'state-of-the-art' deep-learning models for image analysis. And it is provided as a pre-trained model in Tensorflow, so it makes it easy to incorporate the feature extraction layers from MobileNetV2 into a new transfer-learning model.

To download the MobileNetV2 'base model', we just need to instantiate a model using the Tensorflow constructor:

    tf.keras.applications.MobileNetV2(...)

And we'll need to specify a few things about how we want the pre-trained model to work.

First, we need to specify the appropriate input_shape of the pre-trained model, which must match the shape of our image data. Our images have shape (150,150,3), so we'll need to specify the:

    input_shape=(150,150,3)

option when we create our pre-trained model.

We'll also need to tell tensorflow that we *only* want the 'feature extraction' module from the pre-trained network, so tensorflow should 'throw away' the existing decision layers. To specify this, we set the:

    include_top=False

option.

Finally, we need to specify which specific network weights to use for the pre-trained model. In our case, we'll use the weights inferred using the "ImageNet" data set, which is a very large data set of images widely used to train and evaluate image-analysis networks. At the time of this writing, ImageNet consists of over 14 million training images organized into more than 21,000 classes, so it can be used to train very 'deep' neural networks. More information about the ImageNet data set can be found [here](https://www.image-net.org/). To specify network weights inferred from the ImageNet data set, we set the:

    weights='imagenet'

option.

To use the MobileNetV2 pre-trained network in transfer learning, we have to 'freeze' the network's pre-trained parameters, so they don't get updated during the model fitting process. To do this, we simply set:

    base_model.trainable = False

Which tells tensorflow *not* to update the pre-trained model's parameters during the model.fit(...) process.

Just like any other tensorflow model, we can see a summary of the base_model's parameters.

In [10]:
import tensorflow as tf

# download pre-trained convolution network
base_model = tf.keras.applications.MobileNetV2(input_shape=(150,150,3),
                                               include_top=False,
                                               weights='imagenet')
# turn off training for the base model
base_model.trainable = False

# print summary of base model
base_model.summary()

  base_model = tf.keras.applications.MobileNetV2(input_shape=(150,150,3),


Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/mobilenet_v2/mobilenet_v2_weights_tf_dim_ordering_tf_kernels_1.0_224_no_top.h5
[1m9406464/9406464[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 0us/step


As you can see, MobileNetV2 is a very deep and complex image-analysis network. There are over 2 million parameters in the MobileNetV2 model, and by setting the model to not-trainable, *none* of those parameters will be changed during the model fitting process. So, by using a pre-trained network and transfer learning, we gain a *lot* of statistical power without having to fit all those parameters to our small data set.

If you scroll back, you'll see that you get a 'warning' message when you create the MobileNetV2 network using an image shape of (150,150,3). This is because the MobileNetV2 network has not been pre-trained on images of this shape. In fact, 150x150 images are very unusual for neural-network training. Remember how computer scientists really like powers of 2! Well, most 'standard' image analysis networks are trained using images of 64x64, 128x128, etc. In this case, the parameters of our pre-trained network are not an *ideal* fit for our 150x150 images, so tensorflow is warning us of this potential problem. We'll ignore it for now, but in practice we might consider down-sampling our images to a more 'standard' size, such as 128x128.

Now we have the 'feature extraction' module that we need to 'graft' onto a new neural network to solve our specific image classification problem.

Fortunately, tensorflow allows us to use the entire MobileNetV2 object just like any other tf.keras.layers.Layer object, so we can add the entire pre-trained feature-extraction network to a new model, just as if it were a single neural-network layer!

The code cell below incorporates the MobileNetV2 network (captured as the python variable, "base_model") into our existing image classification network.

Notice that the input layer is still our Rescaling layer that scales our image data to be between 0 and 1. And the output layers include the Flatten and Dense classification layers, as before. We also added a single MaxPool2D layer *after* the MobileNetV2 pre-trained network.

In [11]:
# build complete inference model
model = tf.keras.Sequential()
model.add(tf.keras.layers.Rescaling(1.0/255, input_shape=[150,150,3]))
model.add(base_model)
model.add(tf.keras.layers.MaxPool2D())
model.add(tf.keras.layers.Flatten())
model.add(tf.keras.layers.Dense(units=6, activation=tf.keras.activations.softmax))

model.summary()

As you can see from the complete model summary, there are 2,288,710 total parameters in our new network, the vast majority of which (2,257,984) come from the pre-trained MobileNetV2 sub-network. These pre-trained parameters are 'frozen', so our entire model only has 30,726 *trainable* parameters.

The code cell below contains a complete end-to-end transfer learning example using this model and our original landscape image data.


In [12]:
import tensorflow as tf
import pathlib

# download training data
train_url = 'https://zenodo.org/record/5512793/files/train.tgz'
train_dir = tf.keras.utils.get_file(origin=train_url, fname='train', untar=True)
train_dir = pathlib.Path(train_dir)

# download validation data
valid_url = 'https://zenodo.org/record/5512793/files/valid.tgz'
valid_dir = tf.keras.utils.get_file(origin=valid_url, fname='valid', untar=True)
valid_dir = pathlib.Path(valid_dir)

# print number of training and validation images
train_image_count = len(list(train_dir.glob('*/*.jpg')))
valid_image_count = len(list(valid_dir.glob('*/*.jpg')))
print(train_image_count, valid_image_count)

# package images into tensorflow dataset objects
train_data = tf.keras.preprocessing.image_dataset_from_directory(train_dir,
                                                                 image_size=(150,150),
                                                                 batch_size=32)
valid_data = tf.keras.preprocessing.image_dataset_from_directory(valid_dir,
                                                                 image_size=(150,150),
                                                                 batch_size=32)
# print tensorflow dataset objects
print(train_data, valid_data)

# download pre-trained convolution network
base_model = tf.keras.applications.MobileNetV2(input_shape=(150,150,3),
                                               include_top=False,
                                               weights='imagenet')
# turn off training for the base model
base_model.trainable = False

# build complete inference model
model = tf.keras.Sequential()
model.add(tf.keras.layers.Rescaling(1.0/255, input_shape=[150,150,3]))
model.add(base_model)
model.add(tf.keras.layers.MaxPool2D())
model.add(tf.keras.layers.Flatten())
model.add(tf.keras.layers.Dense(units=6, activation=tf.keras.activations.softmax))

model.summary()

# compile model
model.compile(optimizer=tf.keras.optimizers.Adam(),
              loss=tf.keras.losses.SparseCategoricalCrossentropy(),
              metrics=['accuracy'])

# fit model
model.fit(train_data, epochs=20, validation_data=valid_data)

0 0
Found 14034 files belonging to 1 classes.
Found 3000 files belonging to 1 classes.
<_PrefetchDataset element_spec=(TensorSpec(shape=(None, 150, 150, 3), dtype=tf.float32, name=None), TensorSpec(shape=(None,), dtype=tf.int32, name=None))> <_PrefetchDataset element_spec=(TensorSpec(shape=(None, 150, 150, 3), dtype=tf.float32, name=None), TensorSpec(shape=(None,), dtype=tf.int32, name=None))>


  base_model = tf.keras.applications.MobileNetV2(input_shape=(150,150,3),


Epoch 1/20
[1m439/439[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m47s[0m 74ms/step - accuracy: 0.9854 - loss: 0.0687 - val_accuracy: 1.0000 - val_loss: 0.0000e+00
Epoch 2/20
[1m439/439[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m9s[0m 20ms/step - accuracy: 1.0000 - loss: 0.0000e+00 - val_accuracy: 1.0000 - val_loss: 0.0000e+00
Epoch 3/20
[1m439/439[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m10s[0m 19ms/step - accuracy: 1.0000 - loss: 0.0000e+00 - val_accuracy: 1.0000 - val_loss: 0.0000e+00
Epoch 4/20
[1m439/439[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m9s[0m 21ms/step - accuracy: 1.0000 - loss: 0.0000e+00 - val_accuracy: 1.0000 - val_loss: 0.0000e+00
Epoch 5/20
[1m439/439[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m9s[0m 21ms/step - accuracy: 1.0000 - loss: 0.0000e+00 - val_accuracy: 1.0000 - val_loss: 0.0000e+00
Epoch 6/20
[1m439/439[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m9s[0m 21ms/step - accuracy: 1.0000 - loss: 0.0000e+00 - val_accuracy: 1.0000

<keras.src.callbacks.history.History at 0x7c2d16a31790>

You'll notice that, even though most of those >2 million parameters are not trainable, the weights and bias terms still need to be used in the neural network calculations, so the training process is quite a bit slower.

However, the network converges fairly quickly to >0.95 accuracy on the training data. Unfortunately, the accuracy on the validation data is still a bit lower (0.88, in my case). While this might not be a *huge* cause for concern, it does suggest that our model is still 'overfitting' the training data a bit, even though *most* of our model's parameters are not being trained!

Like most 'real world' image analysis problems, we don't seem to have enough image data to reliably train a deep neural network, even using transfer learning!

## data augmentation

To address this 'data shortage', we'll use a technique called "data augmentation".

Data augmentation is a technique that 'modifies' existing training data so that the model 'sees' more training data than you actually have available. Basically, by modifying the existing images in some reasonable, stochastic way, we can generate *new* images that are similar to - but different from - the existing training data. By incorporating this 'data augmentation' process into the model fitting routines, we effectively *increase* the amount of training data being supplied to the neural network, which reduces the model's tendency to over-fit.

Fortunately, image data is relatively 'easy' to 'augment'. If you flip an image of a mountain horizontally, it's still an image of a mountain, and it's very *different* from the original image (at least to a computer!). Similarly, if we rotate an image of a forest a few degrees to the right or left, it still 'looks like' an image of a forest, but it provides *new* data for the neural network to learn from.

Data augmentation has been extensively studied in the context of image classification problems, and tensorflow has implemented standard 'data augmentation' operations that can be easily incorporated into nearly *any* neural network model, because they behave just like any other tf.keras.layers.Layer object!

For example, the:

    tf.keras.layers.RandomFlip

object will stochastically 'flip' an image. By setting the 'horizontal' option when we create this layer, we can ensure that images will be randomly 'flipped horizontally', but not vertically.

Similarly, the:

    tf.keras.layers.RandomRotation

'layer' will randomly 'rotate' an image right or left. By setting the maximum rotation option to 0.2 (a commonly-used value), we can ensure that our images don't get rotated 'too much' to be 'believable'.

There are many other 'data augmentation' operations available for image data, but these are probably the most common, and they are probably sufficient in our case to reduce overfitting during transfer learning.

The following code cell contains an end-to-end transfer learning example, including incorporating data augmentation into the neural-network training process.

In [13]:
import tensorflow as tf
import pathlib

# download training data
train_url = 'https://zenodo.org/record/5512793/files/train.tgz'
train_dir = tf.keras.utils.get_file(origin=train_url, fname='train', untar=True)
train_dir = pathlib.Path(train_dir)

# download validation data
valid_url = 'https://zenodo.org/record/5512793/files/valid.tgz'
valid_dir = tf.keras.utils.get_file(origin=valid_url, fname='valid', untar=True)
valid_dir = pathlib.Path(valid_dir)

# print number of training and validation images
train_image_count = len(list(train_dir.glob('*/*.jpg')))
valid_image_count = len(list(valid_dir.glob('*/*.jpg')))
print(train_image_count, valid_image_count)

# package images into tensorflow dataset objects
train_data = tf.keras.preprocessing.image_dataset_from_directory(train_dir,
                                                                 image_size=(150,150),
                                                                 batch_size=32)
valid_data = tf.keras.preprocessing.image_dataset_from_directory(valid_dir,
                                                                 image_size=(150,150),
                                                                 batch_size=32)
# print tensorflow dataset objects
print(train_data, valid_data)

# download pre-trained convolution network
base_model = tf.keras.applications.MobileNetV2(input_shape=(150,150,3),
                                               include_top=False,
                                               weights='imagenet')
# turn off training for the base model
base_model.trainable = False

# build complete inference model
model = tf.keras.Sequential()
model.add(tf.keras.layers.Rescaling(1.0/255, input_shape=[150,150,3]))
model.add(tf.keras.layers.RandomFlip('horizontal'))
model.add(tf.keras.layers.RandomRotation(0.2))
model.add(base_model)
model.add(tf.keras.layers.MaxPool2D())
model.add(tf.keras.layers.Flatten())
model.add(tf.keras.layers.Dense(units=6, activation=tf.keras.activations.softmax))

model.summary()

# compile model
model.compile(optimizer=tf.keras.optimizers.Adam(),
              loss=tf.keras.losses.SparseCategoricalCrossentropy(),
              metrics=['accuracy'])

# fit model
model.fit(train_data, epochs=20, validation_data=valid_data)

0 0
Found 14034 files belonging to 1 classes.
Found 3000 files belonging to 1 classes.
<_PrefetchDataset element_spec=(TensorSpec(shape=(None, 150, 150, 3), dtype=tf.float32, name=None), TensorSpec(shape=(None,), dtype=tf.int32, name=None))> <_PrefetchDataset element_spec=(TensorSpec(shape=(None, 150, 150, 3), dtype=tf.float32, name=None), TensorSpec(shape=(None,), dtype=tf.int32, name=None))>


  base_model = tf.keras.applications.MobileNetV2(input_shape=(150,150,3),


Epoch 1/20
[1m439/439[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m25s[0m 41ms/step - accuracy: 0.9858 - loss: 0.0527 - val_accuracy: 1.0000 - val_loss: 0.0000e+00
Epoch 2/20
[1m439/439[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m16s[0m 37ms/step - accuracy: 1.0000 - loss: 0.0000e+00 - val_accuracy: 1.0000 - val_loss: 0.0000e+00
Epoch 3/20
[1m439/439[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m20s[0m 36ms/step - accuracy: 1.0000 - loss: 0.0000e+00 - val_accuracy: 1.0000 - val_loss: 0.0000e+00
Epoch 4/20
[1m439/439[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m16s[0m 36ms/step - accuracy: 1.0000 - loss: 0.0000e+00 - val_accuracy: 1.0000 - val_loss: 0.0000e+00
Epoch 5/20
[1m439/439[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m16s[0m 36ms/step - accuracy: 1.0000 - loss: 0.0000e+00 - val_accuracy: 1.0000 - val_loss: 0.0000e+00
Epoch 6/20
[1m439/439[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m16s[0m 36ms/step - accuracy: 1.0000 - loss: 0.0000e+00 - val_accuracy: 1.

<keras.src.callbacks.history.History at 0x7c2c7a55e960>

By effectively providing *more* training data than we actually have available, 'data augmentation' appears to have reduced overfitting in our transfer-learning example.