# Machine Learning Zoomcamp - Capstone Project 1 - Terrain Image Classification

As described in the README.md file, in this project we'll train and tune a Convolutional Neural Network (CNN) model to classify terrain images.

The dataset is available on Zenodo at [this address](https://zenodo.org/records/7711810/files/EuroSAT_RGB.zip?download=1).

Let's download the dataset.

## Download dataset

In [1]:
!wget https://zenodo.org/records/7711810/files/EuroSAT_RGB.zip

--2024-12-27 19:23:21--  https://zenodo.org/records/7711810/files/EuroSAT_RGB.zip
Resolving zenodo.org (zenodo.org)... 188.185.45.92, 188.185.48.194, 188.185.43.25, ...
Connecting to zenodo.org (zenodo.org)|188.185.45.92|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 94658721 (90M) [application/octet-stream]
Saving to: ‘EuroSAT_RGB.zip’


2024-12-27 19:23:31 (9.62 MB/s) - ‘EuroSAT_RGB.zip’ saved [94658721/94658721]



After downloading the file, let's look at the contents of the directory:

In [None]:
!ls

We extract the data from the downloaded zip file:

In [2]:
!unzip -q EuroSAT_RGB.zip

Let's take another look at the directory and at the contents of the folder we just extracted:

In [None]:
!ls

In [None]:
!ls EuroSAT_RGB

We can see there are 10 different terrain classes, but the images are not yet split into train, validation and test sets, so we need to do that.

## Data Preparation and split

In [3]:
import os
import shutil
from sklearn.model_selection import train_test_split


In [4]:
categories = os.listdir('EuroSAT_RGB')
categories

['SeaLake',
 'River',
 'Forest',
 'HerbaceousVegetation',
 'AnnualCrop',
 'Industrial',
 'Pasture',
 'Residential',
 'PermanentCrop',
 'Highway']

For model training, we need to create folders for the train, validation and test sets.

Additionally, as we have 10 terrain classes, we need to create one subfolder per class inside each of the train, validation and test folders.

In [5]:
for dir_name in ['train', 'val', 'test']:
    for cat in categories:
        # makedirs also creates the parent split folder if it is missing
        os.makedirs(os.path.join(dir_name, cat), exist_ok=True)

In [None]:
!ls

If we look inside the train folder we should see folders for each of the categories

In [None]:
!ls train

Now we do the train/validation/test split: the images of each of the 10 classes are distributed among the matching subfolders of the train, val and test folders.

In [6]:
for cat in categories:
  # Collect the paths of all images in this class
  cat_dir = os.path.join('EuroSAT_RGB', cat)
  image_paths = [os.path.join(cat_dir, img) for img in os.listdir(cat_dir)]
  print(cat, len(image_paths))
  # Hold out 20% for test, then 25% of the remaining 80% for validation (60/20/20 overall)
  full_train_paths, test_paths = train_test_split(image_paths, test_size=0.2, random_state=42)
  train_paths, val_paths = train_test_split(full_train_paths, test_size=0.25, random_state=42)
  # Copy every image into its split/class folder
  for split, paths in [('train', train_paths), ('val', val_paths), ('test', test_paths)]:
    for path in paths:
      shutil.copy(path, os.path.join(split, cat, os.path.basename(path)))

SeaLake 3000
River 2500
Forest 3000
HerbaceousVegetation 3000
AnnualCrop 3000
Industrial 2500
Pasture 2000
Residential 3000
PermanentCrop 2500
Highway 2500


To make sure the images were split correctly, we can count the files in each folder/subfolder. Let's take the AnnualCrop subfolder as an example.

In [None]:
!ls test/AnnualCrop | wc -l

In [None]:
!ls train/AnnualCrop | wc -l

In [None]:
!ls val/AnnualCrop | wc -l

As can be seen, the files are distributed correctly across train, test and val.
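
As a cross-check, the expected number of files per split can be computed from the two `test_size` values used in the split cell (0.2, then 0.25 of the remainder). The sketch below assumes that the held-out portion is rounded up at each stage and the rest goes to the other side; the helper name `expected_split` is introduced here only for illustration.

```python
def expected_split(n_images, test_pct=20, val_pct_of_rest=25):
    """Return (train, val, test) counts for the two-stage split.

    Percentages are integers so the arithmetic stays exact.
    Assumption: the held-out part is rounded up at each stage.
    """
    n_test = -(-n_images * test_pct // 100)        # ceiling division
    n_rest = n_images - n_test
    n_val = -(-n_rest * val_pct_of_rest // 100)    # ceiling division
    n_train = n_rest - n_val
    return n_train, n_val, n_test

# Class sizes as printed by the split cell
class_sizes = {
    'SeaLake': 3000, 'River': 2500, 'Forest': 3000,
    'HerbaceousVegetation': 3000, 'AnnualCrop': 3000, 'Industrial': 2500,
    'Pasture': 2000, 'Residential': 3000, 'PermanentCrop': 2500, 'Highway': 2500,
}

for cat, n in class_sizes.items():
    print(cat, expected_split(n))

# Sum each column (train, val, test) over all classes
totals = tuple(sum(col) for col in zip(*(expected_split(n) for n in class_sizes.values())))
print('totals (train, val, test):', totals)
```

For AnnualCrop (3000 images) this gives 1800 train, 600 val and 600 test files, which is a 60/20/20 split.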

## Training and Tuning the Model

To train and tune the model, we'll vary three settings:
- Learning rate
- Dropout rate
- Data augmentation

In [None]:
# Import the libraries
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout
from tensorflow.keras.optimizers import Adam

In [None]:
!python --version

In [None]:
tf.__version__

### Basic parameters

In [8]:
# Define paths and parameters
main_dir = os.getcwd()
train_dir = os.path.join(main_dir, 'train')
test_dir = os.path.join(main_dir, 'test')
val_dir = os.path.join(main_dir, 'val')
img_width, img_height = 150, 150
batch_size = 32
num_classes = 10  # Adjust based on the number of classes
epochs = 35

### Image preparation

In [None]:
# Create ImageDataGenerator objects
train_datagen = ImageDataGenerator(rescale=1./255)
test_datagen = ImageDataGenerator(rescale=1./255)
validation_datagen = ImageDataGenerator(rescale=1./255)

In [None]:
# Load and preprocess data
train_generator = train_datagen.flow_from_directory(
    train_dir,
    target_size=(img_width, img_height),
    batch_size=batch_size,
    class_mode='categorical')

test_generator = test_datagen.flow_from_directory(
    test_dir,
    target_size=(img_width, img_height),
    batch_size=batch_size,
    class_mode='categorical')

validation_generator = validation_datagen.flow_from_directory(
    val_dir,
    target_size=(img_width, img_height),
    batch_size=batch_size,
    class_mode='categorical')

### Make Model function

To make it easy to test and tune the parameters, we define a function that creates the model from the learning rate and the dropout rate. Data augmentation is configured separately, through the ImageDataGenerator used for the training data.


In [None]:
def make_model(learning_rate, dropout_rate=0.5):
  optimizer = Adam(learning_rate=learning_rate)  # Set your desired learning rate here
  model = Sequential()
  model.add(Conv2D(32, (3, 3), input_shape=(img_width, img_height, 3), activation='relu'))
  model.add(MaxPooling2D(pool_size=(2, 2)))
  model.add(Conv2D(64, (3, 3), activation='relu'))
  model.add(MaxPooling2D(pool_size=(2, 2)))
  model.add(Flatten())
  model.add(Dense(128, activation='relu'))
  model.add(Dropout(dropout_rate))
  model.add(Dense(num_classes, activation='softmax'))

  model.compile(loss='categorical_crossentropy',
                optimizer=optimizer,
                metrics=['accuracy'])
  return model
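
To see how the layers in `make_model` shrink the 150x150 input, the sketch below traces the feature-map size by hand. It assumes the Keras defaults for the layers as written above (no padding and stride 1 for `Conv2D`, and a pooling stride equal to the 2x2 window); the helper names `conv_valid` and `pool` are made up for this illustration.

```python
def conv_valid(size, kernel=3):
    """Output size of an unpadded, stride-1 convolution."""
    return size - kernel + 1

def pool(size, window=2):
    """Output size of max pooling with stride equal to the window."""
    return size // window

size = 150
size = pool(conv_valid(size))    # Conv2D(32): 148, then pooling: 74
after_block1 = size
size = pool(conv_valid(size))    # Conv2D(64): 72, then pooling: 36
after_block2 = size

flat_features = size * size * 64                  # input width of Flatten -> Dense(128)
dense_params = flat_features * 128 + 128          # weights plus biases of Dense(128)

print(after_block1, after_block2)   # 74 36
print(flat_features)                # 82944
print(dense_params)                 # 10616960
```

Almost all trainable parameters sit in the first `Dense` layer, which is why the dropout layer is placed right after it.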

### Checkpoints

We are also going to set up checkpoints to save the models with the best accuracy.

As we are going to run different tests by changing the tuning parameters, we'll save the model under a different version name depending on the parameter being tested.

In this case, v1 corresponds to tuning the learning_rate.

In [None]:
checkpoint = keras.callbacks.ModelCheckpoint(
    'model_v1_{epoch:02d}_{val_accuracy:.3f}.h5.keras',
    save_best_only=True,
    monitor='val_accuracy',
    mode='max'
)
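
The checkpoint file name is a template with `{epoch}` and `{val_accuracy}` placeholders. As a rough illustration, assuming the callback fills them in with Python's `str.format`, the snippet below shows the name produced for one set of example values (the epoch and accuracy here are chosen only for illustration).

```python
template = 'model_v1_{epoch:02d}_{val_accuracy:.3f}.h5.keras'

# Example values for illustration only
filename = template.format(epoch=3, val_accuracy=0.7770)
print(filename)  # model_v1_03_0.777.h5.keras
```

The epoch is zero-padded to two digits and the accuracy is rounded to three decimals, so the files sort by epoch and the score is readable from the name.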

We are going to store the results of each model fit in a dictionary (scores) so we can have all the data at hand and select the best parameters

In [None]:
scores_lr = {}
for lr in [0.0001, 0.001]:
  print(lr)
  model = make_model(learning_rate=lr)
  history = model.fit(
    train_generator,
    steps_per_epoch=train_generator.samples // batch_size,
    epochs=epochs,
    validation_data=validation_generator,
    validation_steps=validation_generator.samples // batch_size,
    callbacks=[checkpoint])
  scores_lr[lr] = history.history
  test_loss, test_acc = model.evaluate(test_generator, steps=test_generator.samples // batch_size)
  print('Test accuracy:', test_acc)

In [None]:
from matplotlib import pyplot as plt

In [None]:
for lr, hist in scores_lr.items():
  plt.plot(hist['val_accuracy'], label=lr)
plt.title("Validation Accuracy by Learning Rate")
plt.xlabel("Epochs")
plt.ylabel("Accuracy")
plt.legend()

Based on this result, the best learning rate is **0.001**, as its validation accuracy is more stable.
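
Besides reading the plot, the stored histories can be summarized numerically. The sketch below is one possible way to do it: the `summarize` helper and the `example_scores` numbers are invented for illustration (they are not results from this notebook) and only mimic the layout of `history.history`.

```python
# Invented histories with the same layout as `history.history`
example_scores = {
    'run_a': {'val_accuracy': [0.60, 0.72, 0.78]},
    'run_b': {'val_accuracy': [0.65, 0.80, 0.79]},
}

def summarize(scores):
    """Map each run to (best val_accuracy, 1-based epoch where it occurred)."""
    summary = {}
    for key, hist in scores.items():
        accs = hist['val_accuracy']
        best_index = max(range(len(accs)), key=accs.__getitem__)
        summary[key] = (accs[best_index], best_index + 1)
    return summary

summary = summarize(example_scores)
best_run = max(summary, key=lambda key: summary[key][0])
print(summary)
print('best run:', best_run)
```

With the real data, the same helper could be called as `summarize(scores_lr)`.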

### Dropout Rate
Now we'll try different values for the dropout rate


In [None]:
checkpoint = keras.callbacks.ModelCheckpoint(
    'model_v2_{epoch:02d}_{val_accuracy:.3f}.h5.keras',
    save_best_only=True,
    monitor='val_accuracy',
    mode='max'
)

In [None]:
learning_rate = 0.001
scores_dr = {}

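for dr in [0, 0.1, 0.5]:
    print(dr)
    # Build a fresh model for each dropout rate; otherwise the previous model keeps training
    model = make_model(learning_rate=learning_rate, dropout_rate=dr)
    history = model.fit(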
        train_generator,
        steps_per_epoch=train_generator.samples // batch_size,
        epochs=epochs,
        validation_data=validation_generator,
        validation_steps=validation_generator.samples // batch_size,
        callbacks=[checkpoint])
    scores_dr[dr] = history.history
    test_loss, test_acc = model.evaluate(test_generator, steps=test_generator.samples // batch_size)
    print('Test accuracy:', test_acc)

In [None]:
for dr, hist in scores_dr.items():
  plt.plot(hist['val_accuracy'], label=dr)
plt.title("Validation Accuracy by Dropout Rate")
plt.xlabel("Epochs")
plt.ylabel("Accuracy")
plt.legend()

Based on these results, in the early epochs the best **dropout rate** is 0.1, while in the later epochs rates of 0 and 0.1 both give better results. For the final decision we use the test accuracy, which was best with a **dropout rate of 0.1**.

### Data Augmentation

The final tuning step is based on data augmentation.

For this, we pass additional parameters to the ImageDataGenerator used for the training data.

In [None]:
train_datagen = ImageDataGenerator(
    rescale=1./255,
    zoom_range=0.05,
    rotation_range=20,    
    horizontal_flip=True)


In [None]:
# Load and preprocess data
train_generator = train_datagen.flow_from_directory(
    train_dir,
    target_size=(img_width, img_height),
    batch_size=batch_size,
    class_mode='categorical')

test_generator = test_datagen.flow_from_directory(
    test_dir,
    target_size=(img_width, img_height),
    batch_size=batch_size,
    class_mode='categorical')

validation_generator = validation_datagen.flow_from_directory(
    val_dir,
    target_size=(img_width, img_height),
    batch_size=batch_size,
    class_mode='categorical')

In [None]:
checkpoint = keras.callbacks.ModelCheckpoint(
    'model_v3_{epoch:02d}_{val_accuracy:.3f}.h5.keras',
    save_best_only=True,
    monitor='val_accuracy',
    mode='max'
)

In [None]:
learning_rate = 0.0001
dropout_rate = 0.1
batch_size = 32
scores_aug = {}

model = make_model(learning_rate=learning_rate, dropout_rate=dropout_rate)
history = model.fit(
  train_generator,
  steps_per_epoch=train_generator.samples // batch_size,
  epochs=epochs,
  validation_data=validation_generator,
  validation_steps=validation_generator.samples // batch_size,
  callbacks=[checkpoint])
scores_aug['augment'] = history.history
test_loss, test_acc = model.evaluate(test_generator, steps=test_generator.samples // batch_size)
print('Test accuracy:', test_acc)

In [None]:
for aug, hist in scores_aug.items():
    plt.plot(hist['val_accuracy'], label=aug)
plt.title("Validation Accuracy with Data Augmentation")
plt.xlabel("Epochs")
plt.ylabel("Accuracy")
plt.legend()

With data augmentation, we got good validation accuracy, particularly in the later epochs.

## Final model definition

We'll train the model once more with the final parameters.

In [None]:
del model

In [9]:
train_datagen = ImageDataGenerator(
    rescale=1./255,
    zoom_range=0.02,
    rotation_range=15,    
    horizontal_flip=True)


In [10]:
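# Create ImageDataGenerator objects
# Note: this reassigns train_datagen without augmentation, replacing the one defined in the previous cell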
train_datagen = ImageDataGenerator(rescale=1./255)
test_datagen = ImageDataGenerator(rescale=1./255)
validation_datagen = ImageDataGenerator(rescale=1./255)

In [11]:
# Load and preprocess data
train_generator = train_datagen.flow_from_directory(
    train_dir,
    target_size=(img_width, img_height),
    batch_size=batch_size,
    class_mode='categorical')

test_generator = test_datagen.flow_from_directory(
    test_dir,
    target_size=(img_width, img_height),
    batch_size=batch_size,
    class_mode='categorical')

validation_generator = validation_datagen.flow_from_directory(
    val_dir,
    target_size=(img_width, img_height),
    batch_size=batch_size,
    class_mode='categorical')

Found 16200 images belonging to 10 classes.
Found 5400 images belonging to 10 classes.
Found 5400 images belonging to 10 classes.
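
The `fit` calls in this notebook pass `steps_per_epoch=train_generator.samples // batch_size`. As a quick worked example using the image counts reported above, the snippet below shows how many full batches that gives and how many images are left over in a partial batch.

```python
batch_size = 32
train_samples, val_samples = 16200, 5400   # from the "Found ... images" output

# divmod returns (number of full batches, images left over)
steps_per_epoch, train_leftover = divmod(train_samples, batch_size)
validation_steps, val_leftover = divmod(val_samples, batch_size)

print(steps_per_epoch, train_leftover)     # 506 8
print(validation_steps, val_leftover)      # 168 24
```

The 506 training steps match the step count shown in the training log.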


In [12]:
checkpoint = keras.callbacks.ModelCheckpoint(
    'model_vf_{epoch:02d}_{val_accuracy:.3f}.h5.keras',
    save_best_only=True,
    monitor='val_accuracy',
    mode='max'
)

In [13]:
def make_model(learning_rate, dropout_rate=0.5):
    optimizer = Adam(learning_rate=learning_rate)
    model = Sequential()

    # First convolutional block
    model.add(Conv2D(32, (3, 3), input_shape=(img_width, img_height, 3), activation='relu', padding='same')) 
    model.add(MaxPooling2D(pool_size=(2, 2)))

    # Second convolutional block
    model.add(Conv2D(64, (3, 3), activation='relu', padding='same')) 
    model.add(MaxPooling2D(pool_size=(2, 2)))

    # Third convolutional block (optional, for deeper models)
    model.add(Conv2D(128, (3, 3), activation='relu', padding='same')) 
    model.add(MaxPooling2D(pool_size=(2, 2))) 

    model.add(Flatten())
    model.add(Dense(128, activation='relu'))
    model.add(Dropout(dropout_rate))
    model.add(Dense(num_classes, activation='softmax'))

    model.compile(loss='categorical_crossentropy',
                  optimizer=optimizer,
                  metrics=['accuracy'])
    return model

In [14]:
learning_rate = 0.001
dropout_rate = 0.1
batch_size = 32
scores = {}

model = make_model(learning_rate=learning_rate, dropout_rate=dropout_rate)
history = model.fit(
  train_generator,
  steps_per_epoch=train_generator.samples // batch_size,
  epochs=epochs,
  validation_data=validation_generator,
  validation_steps=validation_generator.samples // batch_size,
  callbacks=[checkpoint])
scores['final'] = history.history
test_loss, test_acc = model.evaluate(test_generator, steps=test_generator.samples // batch_size)
print('Test accuracy:', test_acc)

Epoch 1/35
506/506 ━━━━━━━━━━━━━━━━━━━━ 32s 46ms/step - accuracy: 0.2926 - loss: 1.8168 - val_accuracy: 0.6168 - val_loss: 1.0061
Epoch 2/35
506/506 ━━━━━━━━━━━━━━━━━━━━ 2s 3ms/step - accuracy: 0.5938 - loss: 1.1767 - val_accuracy: 0.6250 - val_loss: 1.0654
Epoch 3/35
506/506 ━━━━━━━━━━━━━━━━━━━━ 19s 37ms/step - accuracy: 0.6403 - loss: 0.9674 - val_accuracy: 0.7770 - val_loss: 0.6530
Epoch 4/35
506/506 ━━━━━━━━━━━━━━━━━━━━ 0s 730us/step - accuracy: 0.7188 - loss: 0.7326 - val_accuracy: 0.8333 - val_loss: 0.5321
Epoch 5/35
506/506 ━━━━━━━━━━━━━━━━━━━━ 19s 37ms/step - accuracy: 0.7853 - loss: 0.6060 - val_accuracy: 0.7948 - val_loss: 0.5756
Epoch 6/35
506/506 ━━━━━━━━━━━━━━━━━━━━ 0s 41us/step - accuracy: 0.7812 - loss: 0.5273 - val_accuracy: 0.7500 - val_loss: 0.6865
Epoch 7/35
506/506 ━━━━━━━━━━━━━━━━━━━━ 19s 37ms/step - accuracy: 0.8426 - loss: 0.4499 - val_accuracy: 0.7801 - val_loss: 0.6390
Epoch 8/35
...

After running these epochs, the best model saved by the checkpoint had a **val_accuracy** of **0.958**.
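
Because the checkpoint file names embed the epoch and the validation accuracy, the best saved model can also be located by parsing the names. The following is only a sketch: the `best_checkpoint` helper and the example file names are hypothetical, written in the `model_vf_{epoch:02d}_{val_accuracy:.3f}.h5.keras` format used by the checkpoint.

```python
import re

# Matches names such as model_vf_04_0.833.h5.keras
PATTERN = re.compile(r'model_vf_(\d+)_(\d+\.\d+)\.h5\.keras$')

def best_checkpoint(filenames):
    """Return (filename, epoch, val_accuracy) for the highest val_accuracy."""
    parsed = []
    for name in filenames:
        match = PATTERN.search(name)
        if match:  # skip files that are not checkpoints
            parsed.append((name, int(match.group(1)), float(match.group(2))))
    if not parsed:
        raise ValueError('no checkpoint files matched the expected pattern')
    return max(parsed, key=lambda item: item[2])

# Hypothetical file names for illustration
example_files = [
    'model_vf_01_0.617.h5.keras',
    'model_vf_03_0.777.h5.keras',
    'model_vf_04_0.833.h5.keras',
    'notes.txt',
]
best = best_checkpoint(example_files)
print(best)
```

In the working directory, the list of candidates could come from `glob.glob('model_vf_*.keras')`.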

Now that we have the final model, let's get the class_indices for future model predictions.

In [15]:
class_indices = train_generator.class_indices
class_indices

{'AnnualCrop': 0,
 'Forest': 1,
 'HerbaceousVegetation': 2,
 'Highway': 3,
 'Industrial': 4,
 'Pasture': 5,
 'PermanentCrop': 6,
 'Residential': 7,
 'River': 8,
 'SeaLake': 9}
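
At prediction time the model returns one probability per class index, so this mapping has to be inverted to get back a class name. A minimal sketch, assuming a plain list of 10 probabilities as the model output; the `decode_prediction` helper and the probability values are made up for illustration.

```python
class_indices = {
    'AnnualCrop': 0, 'Forest': 1, 'HerbaceousVegetation': 2, 'Highway': 3,
    'Industrial': 4, 'Pasture': 5, 'PermanentCrop': 6, 'Residential': 7,
    'River': 8, 'SeaLake': 9,
}

# Invert the mapping: index -> class name
index_to_class = {index: name for name, index in class_indices.items()}

def decode_prediction(probabilities):
    """Return (class name, probability) for the highest-scoring class."""
    best_index = max(range(len(probabilities)), key=probabilities.__getitem__)
    return index_to_class[best_index], probabilities[best_index]

# Made-up softmax output for a single image
example_probs = [0.01, 0.02, 0.01, 0.05, 0.03, 0.02, 0.01, 0.05, 0.70, 0.10]
print(decode_prediction(example_probs))  # ('River', 0.7)
```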

## Converting the model to TF-Lite

To deploy the model as a service, we convert it to the TensorFlow Lite (TF-Lite) format.

In [5]:
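import glob
import os

import numpy as np
import tensorflow as tf

In [6]:
# After a kernel restart `model` is no longer defined, so reload the best checkpoint first.
# Assumption: with save_best_only=True, the most recently written model_vf_* file is the best one.
checkpoint_path = max(glob.glob('model_vf_*.keras'), key=os.path.getmtime)
model = tf.keras.models.load_model(checkpoint_path)

converter = tf.lite.TFLiteConverter.from_keras_model(model)

tflite_model = converter.convert()

with open('terrain-classification.tflite', 'wb') as f_out:
    f_out.write(tflite_model)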