# Machine Learning Zoomcamp

# Capstone Project 1 - Terrain Image Classification

## Just the Training

This notebook contains only the final training run, so anyone can execute it to obtain the model binaries.

The dataset is available on Zenodo at [this address](https://zenodo.org/records/7711810/files/EuroSAT_RGB.zip?download=1).

Let's download the dataset.

## Download dataset

In [1]:
!wget https://zenodo.org/records/7711810/files/EuroSAT_RGB.zip

--2024-12-30 16:20:45--  https://zenodo.org/records/7711810/files/EuroSAT_RGB.zip
Resolving zenodo.org (zenodo.org)... 188.185.43.25, 188.185.48.194, 188.185.45.92, ...
Connecting to zenodo.org (zenodo.org)|188.185.43.25|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 94658721 (90M) [application/octet-stream]
Saving to: ‘EuroSAT_RGB.zip’


2024-12-30 16:20:52 (14.7 MB/s) - ‘EuroSAT_RGB.zip’ saved [94658721/94658721]



We extract the data from the downloaded zip file.

In [2]:
!unzip -q EuroSAT_RGB.zip
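
In environments without `wget` or `unzip` (for example a plain Python script on Windows), the same two steps can be sketched with the standard library alone. The helper name `fetch_and_extract` is hypothetical and not part of the original notebook; the URL is the one used in the download cell above.

```python
import os
import urllib.request
import zipfile

DATASET_URL = "https://zenodo.org/records/7711810/files/EuroSAT_RGB.zip"


def fetch_and_extract(url, zip_path, dest_dir="."):
    """Download `url` to `zip_path` (skipped if the file already exists),
    unpack it into `dest_dir` and return the archive's member names."""
    if not os.path.exists(zip_path):
        urllib.request.urlretrieve(url, zip_path)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)
        return archive.namelist()


# Usage (downloads about 90 MB on the first call):
# fetch_and_extract(DATASET_URL, "EuroSAT_RGB.zip")
```

Skipping the download when the archive is already present makes the cell cheap to re-run.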

## Data Preparation and split

In [3]:
import os
import shutil
from sklearn.model_selection import train_test_split


In [4]:
categories = os.listdir('EuroSAT_RGB')
categories

['River',
 'AnnualCrop',
 'SeaLake',
 'Highway',
 'Residential',
 'HerbaceousVegetation',
 'PermanentCrop',
 'Industrial',
 'Forest',
 'Pasture']
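
The order returned by `os.listdir` is arbitrary, so it should not be read as the label order. Keras' `flow_from_directory`, used later in this notebook, assigns class indices from the alphanumerically sorted subfolder names. A minimal sketch of that mapping, assuming the ten folder names listed above (the variable `class_indices` here is built by hand for illustration, not read from a generator):

```python
categories = ['River', 'AnnualCrop', 'SeaLake', 'Highway', 'Residential',
              'HerbaceousVegetation', 'PermanentCrop', 'Industrial',
              'Forest', 'Pasture']

# Sorted folder names determine the index of each class in the softmax output
class_indices = {name: index for index, name in enumerate(sorted(categories))}

for name, index in class_indices.items():
    print(index, name)
```

With a real generator, the same mapping can be checked through its `class_indices` attribute.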

For model training, we need to create folders for the training, validation and test sets.

Additionally, as we have 10 different terrain classes, we need to create one subfolder per class inside each of the train, val and test folders.

In [5]:
for dir_name in ['train', 'val', 'test']:
    for cat in categories:
        # makedirs also creates the parent split folder if it is missing
        os.makedirs(os.path.join(dir_name, cat), exist_ok=True)

Now we do the train, validation and test split: the images of each of the 10 classes are distributed across the train, val and test folders and their class subfolders, in a 60/20/20 ratio.

In [6]:
for cat in categories:
    cat_dir = os.path.join('EuroSAT_RGB', cat)
    image_paths = [os.path.join(cat_dir, img) for img in os.listdir(cat_dir)]
    print(cat, len(image_paths))
    # 20% for test, then 25% of the remaining 80% (20% overall) for validation
    full_train_paths, test_paths = train_test_split(image_paths, test_size=0.2, random_state=42)
    train_paths, val_paths = train_test_split(full_train_paths, test_size=0.25, random_state=42)
    # Copy each image into <split>/<class>/, keeping its file name
    for split_name, paths in [('train', train_paths), ('val', val_paths), ('test', test_paths)]:
        for path in paths:
            shutil.copy(path, os.path.join(split_name, cat, os.path.basename(path)))

River 2500
AnnualCrop 3000
SeaLake 3000
Highway 2500
Residential 3000
HerbaceousVegetation 3000
PermanentCrop 2500
Industrial 2500
Forest 3000
Pasture 2000
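
The two-stage split above (20% held out for test, then 25% of the remainder for validation) can be checked with a few lines of arithmetic. This is a sketch under one assumption: the held-out part of each split is rounded up, which is how scikit-learn's `train_test_split` is understood to size it. The helper `split_sizes` is hypothetical, and exact fractions are used to avoid floating-point rounding surprises.

```python
import math
from fractions import Fraction

# Per-class image counts, taken from the output above
class_counts = {
    'River': 2500, 'AnnualCrop': 3000, 'SeaLake': 3000, 'Highway': 2500,
    'Residential': 3000, 'HerbaceousVegetation': 3000, 'PermanentCrop': 2500,
    'Industrial': 2500, 'Forest': 3000, 'Pasture': 2000,
}


def split_sizes(n, held_out_fraction):
    """Return (kept, held_out) sizes; the held-out part is rounded up (assumption)."""
    held_out = math.ceil(n * held_out_fraction)
    return n - held_out, held_out


totals = {'train': 0, 'val': 0, 'test': 0}
for n in class_counts.values():
    n_full_train, n_test = split_sizes(n, Fraction(1, 5))     # test_size=0.2
    n_train, n_val = split_sizes(n_full_train, Fraction(1, 4))  # test_size=0.25
    totals['train'] += n_train
    totals['val'] += n_val
    totals['test'] += n_test

print(totals)
```

Because every class is split separately, each class keeps the same 60/20/20 proportions in all three folders.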


## Training the model

In [7]:
# Import the libraries
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout
from tensorflow.keras.optimizers import Adam



### Basic parameters

In [8]:
# Define paths and parameters
main_dir = os.getcwd()
train_dir = os.path.join(main_dir, 'train')
test_dir = os.path.join(main_dir, 'test')
val_dir = os.path.join(main_dir, 'val')
img_width, img_height = 150, 150
batch_size = 32
num_classes = 10  # Adjust based on the number of classes
epochs = 35

### Image preparation

In [12]:
# Create ImageDataGenerator objects
train_datagen = ImageDataGenerator(
    rescale=1./255,
    zoom_range=0.02,
    rotation_range=15,
    horizontal_flip=True)

test_datagen = ImageDataGenerator(rescale=1./255)
validation_datagen = ImageDataGenerator(rescale=1./255)

In [13]:
# Load and preprocess data
train_generator = train_datagen.flow_from_directory(
    train_dir,
    target_size=(img_width, img_height),
    batch_size=batch_size,
    class_mode='categorical')

test_generator = test_datagen.flow_from_directory(
    test_dir,
    target_size=(img_width, img_height),
    batch_size=batch_size,
    class_mode='categorical')

validation_generator = validation_datagen.flow_from_directory(
    val_dir,
    target_size=(img_width, img_height),
    batch_size=batch_size,
    class_mode='categorical')

Found 16200 images belonging to 10 classes.
Found 5400 images belonging to 10 classes.
Found 5400 images belonging to 10 classes.
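
With a batch size of 32, these image counts do not divide evenly into batches. The training cell later in this notebook uses `samples // batch_size` for the step counts, so a short sketch of what that floor division yields may help; the dictionary `steps` is introduced here only for illustration.

```python
batch_size = 32
split_counts = {'train': 16200, 'val': 5400, 'test': 5400}

# divmod gives (number of full batches, images left over after those batches)
steps = {name: divmod(count, batch_size) for name, count in split_counts.items()}

for name, (full_batches, leftover) in steps.items():
    print(f'{name}: {full_batches} full batches, {leftover} images left over')
```

The leftover images form one extra, smaller batch that is not covered by the floor-divided step count.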


### Make Model function

To make it easy to test and tune the hyperparameters, we define a function that creates and compiles the model for a given learning rate and dropout rate.


In [15]:
def make_model(learning_rate, dropout_rate=0.5):
    optimizer = Adam(learning_rate=learning_rate)
    model = Sequential()

    # First convolutional block
    model.add(Conv2D(32, (3, 3), input_shape=(img_width, img_height, 3), activation='relu', padding='same'))
    model.add(MaxPooling2D(pool_size=(2, 2)))

    # Second convolutional block
    model.add(Conv2D(64, (3, 3), activation='relu', padding='same'))
    model.add(MaxPooling2D(pool_size=(2, 2)))

    # Third convolutional block (optional, for deeper models)
    model.add(Conv2D(128, (3, 3), activation='relu', padding='same'))
    model.add(MaxPooling2D(pool_size=(2, 2)))

    model.add(Flatten())
    model.add(Dense(128, activation='relu'))
    model.add(Dropout(dropout_rate))
    model.add(Dense(num_classes, activation='softmax'))

    model.compile(loss='categorical_crossentropy',
                  optimizer=optimizer,
                  metrics=['accuracy'])
    return model
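
To get a feel for the size of this network, the layer shapes and parameter counts can be worked out by hand. The sketch below assumes the 150x150 RGB input defined earlier, `padding='same'` convolutions (spatial size unchanged) and 2x2 max pooling with the default settings (spatial size halved, rounded down). The helper names are hypothetical.

```python
def conv_params(kernel, in_channels, out_channels):
    # One kernel x kernel filter per (input, output) channel pair, plus one bias per output
    return kernel * kernel * in_channels * out_channels + out_channels


def dense_params(n_inputs, n_outputs):
    # One weight per (input, output) pair, plus one bias per output
    return n_inputs * n_outputs + n_outputs


size, channels = 150, 3
total_params = 0
for filters in (32, 64, 128):
    total_params += conv_params(3, channels, filters)
    channels = filters
    size //= 2  # 2x2 max pooling: 150 -> 75 -> 37 -> 18

flattened = size * size * channels
total_params += dense_params(flattened, 128)  # hidden Dense layer
total_params += dense_params(128, 10)         # softmax output layer

print('flattened features:', flattened)
print('total parameters:', total_params)
```

Almost all of the parameters sit in the first `Dense` layer, which is why the size of the flattened feature map matters more than the number of convolutional filters. Calling `model.summary()` on the real model gives the authoritative figures.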

### Checkpoints

In [16]:
checkpoint = keras.callbacks.ModelCheckpoint(
    'model_vf_{epoch:02d}_{val_accuracy:.3f}.h5.keras',
    save_best_only=True,
    monitor='val_accuracy',
    mode='max'
)
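
The file name passed to `ModelCheckpoint` is a format template that is filled in with the epoch number and the logged metrics. The sketch below only imitates the `save_best_only=True`, `mode='max'` behaviour with plain Python, using the validation accuracies of the first six epochs from the training log further down; it is an illustration, not Keras code.

```python
template = 'model_vf_{epoch:02d}_{val_accuracy:.3f}.h5.keras'

# val_accuracy for epochs 1 to 6, copied from the training log
val_accuracy_per_epoch = [0.6337, 0.6250, 0.7294, 0.7083, 0.7522, 0.7083]

best = float('-inf')
saved = []
for epoch, val_accuracy in enumerate(val_accuracy_per_epoch, start=1):
    # A file is written only when the monitored metric improves
    if val_accuracy > best:
        best = val_accuracy
        saved.append(template.format(epoch=epoch, val_accuracy=val_accuracy))

print(saved)
```

The last file written is therefore the best model seen so far, and its name records both the epoch and the validation accuracy.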

In [17]:
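learning_rate = 0.001
dropout_rate = 0.1
scores = {}

model = make_model(learning_rate=learning_rate, dropout_rate=dropout_rate)
# Train with validation after each epoch; the checkpoint callback keeps the best model.
# Note: in the log below, every second epoch ends after a single batch, which suggests
# the generator runs out of data when steps_per_epoch is set explicitly. Omitting
# steps_per_epoch and validation_steps lets Keras infer them from the generators.
history = model.fit(
    train_generator,
    steps_per_epoch=train_generator.samples // batch_size,
    epochs=epochs,
    validation_data=validation_generator,
    validation_steps=validation_generator.samples // batch_size,
    callbacks=[checkpoint])
scores['final'] = history.history
# Evaluate the final model on the held-out test set
test_loss, test_acc = model.evaluate(test_generator, steps=test_generator.samples // batch_size)
print('Test accuracy:', test_acc)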

Epoch 1/35
506/506 ━━━━━━━━━━━━━━━━━━━━ 107s 195ms/step - accuracy: 0.3913 - loss: 1.6071 - val_accuracy: 0.6337 - val_loss: 1.0361
Epoch 2/35
506/506 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.4688 - loss: 1.5053 - val_accuracy: 0.6250 - val_loss: 0.9991
Epoch 3/35
506/506 ━━━━━━━━━━━━━━━━━━━━ 99s 194ms/step - accuracy: 0.6557 - loss: 0.9433 - val_accuracy: 0.7294 - val_loss: 0.7631
Epoch 4/35
506/506 ━━━━━━━━━━━━━━━━━━━━ 0s 49us/step - accuracy: 0.5625 - loss: 1.4181 - val_accuracy: 0.7083 - val_loss: 0.7057
Epoch 5/35
506/506 ━━━━━━━━━━━━━━━━━━━━ 140s 189ms/step - accuracy: 0.7227 - loss: 0.7747 - val_accuracy: 0.7522 - val_loss: 0.6807
Epoch 6/35
506/506 ━━━━━━━━━━━━━━━━━━━━ 1s 1000us/step - accuracy: 0.8750 - loss: 0.4727 - val_accuracy: 0.7083 - val_loss: 0.6339
Epoch 7/35
506/506 ━━━━━━━━━━━━━━━━━━━━ 94s 184ms/step - accuracy: 0.7620 - los