# 1. Introduction

## 1.1 Project Overview

The objective of this project is to develop a machine learning model capable of classifying images of animals into their respective categories. This classification task is crucial for applications in wildlife monitoring, automated animal detection in images, and educational tools in zoology and biodiversity.

## 1.2 Dataset

The dataset used for this project is the Animals10 dataset, which contains images of ten different animal classes. The dataset includes a total of 28,000 images with each class having a diverse set of images representing various poses, backgrounds, and lighting conditions. The ten animal classes are:

 * Dog
 * Cat
 * Horse
 * Spider
 * Butterfly
 * Chicken
 * Cow
 * Sheep
 * Squirrel
 * Elephant

The images are in JPEG format and come in various dimensions, reflecting real-world scenarios.


## 1.3 Tools and Technologies

* Python
* Python Libraries:
    * NumPy
    * Pandas
    * Scikit-Learn
    * Matplotlib
    * Seaborn
    * TensorFlow / Keras
    * PIL
* Jupyter Notebook
* Flask

## 1.4 Loading the Data


In [None]:
import zipfile
import os
import random
import shutil

!kaggle datasets download -d alessiocorrado99/animals10 -p /content

# Define the path to the zip file
zip_file_path = '/content/animals10.zip'

# Define the directory where you want to extract the files
extract_dir = '/content/animals10/'

# Create the extraction directory if it doesn't exist
os.makedirs(extract_dir, exist_ok=True)

# Extract the zip file
with zipfile.ZipFile(zip_file_path, 'r') as zip_ref:
    zip_ref.extractall(extract_dir)


animals_dir = '/content/animals10/raw-img'

# Verify that the directory exists
if not os.path.exists(animals_dir):
    raise ValueError(f"Directory {animals_dir} does not exist")


## 2. Data Exploration and Preprocessing
### 2.1 Data Exploration

Initially, I explored the dataset to understand its structure and characteristics. This included:

* Dataset Structure: The Animals10 dataset consists of images categorized into ten different animal classes, with each class stored in a separate directory. Each directory contains various images of the respective animal.


In [None]:
# Get class names
class_names = [class_name for class_name in os.listdir(animals_dir) if os.path.isdir(os.path.join(animals_dir, class_name))]


* Class Distribution: We counted the images in each class to check whether the classes are balanced, since a strong imbalance can bias the model toward the larger classes.



In [None]:
from PIL import Image

# Counting total images per class
image_class = {}
image_formats = set()
image_dimensions = []
for class_name in class_names:
    class_dir = os.path.join(animals_dir, class_name)
    # Compare extensions case-insensitively so files such as .JPG are not skipped
    file_list = [f for f in os.listdir(class_dir) if f.lower().endswith(('.jpg', '.png', '.jpeg'))]
    image_class[class_name] = len(file_list)

    # Check image formats and dimensions
    for file_name in file_list:
        img_path = os.path.join(class_dir, file_name)
        with Image.open(img_path) as img:
            image_formats.add(img.format)
            image_dimensions.append(img.size)


# Print total images per class
print("\n\nTotal images per class in the dataset:")
for class_name, count in image_class.items():
    print(f"{class_name} : {count}")

* Image Dimensions and Formats: We verified the dimensions and formats of the images to standardize preprocessing steps. The images were in JPEG format, but the dimensions varied, necessitating resizing during preprocessing.

In [None]:
import numpy as np

# Print image formats
print("\nImage formats in the dataset:")
print(image_formats)

# Print image dimensions
print("\nSample image dimensions:")
print(f"Min dimensions: {np.min(image_dimensions, axis=0)}")
print(f"Max dimensions: {np.max(image_dimensions, axis=0)}")
print(f"Average dimensions: {np.mean(image_dimensions, axis=0)}")

* Sample Visualization: We displayed one sample image from each class (the first file in each class directory) to get an intuition about the dataset and to spot anomalies or variations that might require specific preprocessing steps.

In [None]:
import matplotlib.pyplot as plt
from tensorflow.keras.preprocessing import image

# Plotting sample images from each class
fig, axes = plt.subplots(nrows=2, ncols=5, figsize=(20,10))
axes = axes.flatten()
for ax, class_name in zip(axes, class_names):
    class_dir = os.path.join(animals_dir, class_name)
    file_list = [f for f in os.listdir(class_dir) if os.path.isfile(os.path.join(class_dir, f))]
    if file_list:
        path_sample = os.path.join(class_dir, file_list[0])  # Selecting the first file
        img = image.load_img(path_sample, target_size=(200, 200))
        ax.imshow(img)
        ax.set_title(class_name)
        ax.axis('off')
    else:
        continue
plt.tight_layout()
plt.show()


### 2.2 Data Preprocessing

Data preprocessing steps included:

* Handling Missing Values: There were no missing values in the dataset since all data points are images stored in directories.

* Data Splitting: We split the data into training, validation, and test sets. We first moved 10% of the images from each class into a separate test directory, then split the remaining images into training (80%) and validation (20%) sets using the validation_split option of ImageDataGenerator.

* Image Resizing and Scaling: All images were resized to a uniform size of 200x200 pixels to ensure consistency. We also rescaled pixel values to the range [0, 1] by dividing by 255.0.

* Data Augmentation: To enhance model generalization, we applied several augmentation techniques:

    * Horizontal Flip: Randomly flipping images horizontally.
    * Zooming: Randomly zooming into images.
    * Width and Height Shifts: Randomly shifting images horizontally and vertically.
    * Shearing: Applying random shearing transformations.
    * Brightness Adjustment: Varying the brightness of images.
    * Rotation: Randomly rotating images within a specified range.
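
The step that moves 10% of each class into a test directory could look roughly like the sketch below. It is not the code used in the original run: the function name move_test_split, the fraction and seed parameters, and the directory layout are assumptions made for illustration. The test directory has to live outside the source directory, otherwise it would be picked up as an extra class.

```python
import os
import random
import shutil


def move_test_split(source_dir, test_dir, fraction=0.1, seed=42):
    """Move a random `fraction` of the files in each class folder into test_dir.

    Hypothetical helper: the name, parameters, and layout are assumptions.
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    moved = {}
    for class_name in sorted(os.listdir(source_dir)):
        class_dir = os.path.join(source_dir, class_name)
        if not os.path.isdir(class_dir):
            continue
        files = sorted(
            f for f in os.listdir(class_dir)
            if os.path.isfile(os.path.join(class_dir, f))
        )
        n_test = int(len(files) * fraction)
        target_dir = os.path.join(test_dir, class_name)
        os.makedirs(target_dir, exist_ok=True)
        # Sample without replacement and move the chosen files
        for file_name in rng.sample(files, n_test):
            shutil.move(
                os.path.join(class_dir, file_name),
                os.path.join(target_dir, file_name),
            )
        moved[class_name] = n_test
    return moved
```

Calling move_test_split(animals_dir, test_dir) once, with test_dir set to a sibling folder of animals_dir, would leave the remaining 90% in place for the training and validation generators.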

In [None]:
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Create ImageDataGenerators for training and validation sets
data_generator = ImageDataGenerator(
    rescale=1./255.0,
    validation_split=0.2,  # Split for training and validation
    horizontal_flip=True,
    zoom_range=0.2,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.2,
    brightness_range=[0.5, 1.5],
    rotation_range=30,
    fill_mode="nearest"
)

# Create ImageDataGenerator for test set
test_val_data_generator = ImageDataGenerator(rescale=1./255.0)

# Generate training set
train_set = data_generator.flow_from_directory(
    animals_dir,
    target_size=(200, 200),
    batch_size=64,
    class_mode='sparse',
    subset='training'
)

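# Generate validation set
# shuffle=False keeps the batch order aligned with val_set.classes,
# which the classification report and confusion matrix rely on
val_set = data_generator.flow_from_directory(
    animals_dir,
    target_size=(200, 200),
    batch_size=64,
    class_mode='sparse',
    subset='validation',
    shuffle=False
)

# Generate test set
# test_dir must point to the directory that holds the held-out test images
test_set = test_val_data_generator.flow_from_directory(
    test_dir,
    target_size=(200, 200),
    batch_size=64,
    class_mode='sparse',
    shuffle=False
)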


The original dataset contains the following classes:
ragno
cavallo
gallina
pecora
gatto
scoiattolo
cane
elefante
mucca
farfalla
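
The directory names are Italian, while Section 1.2 lists the classes in English. A small lookup table, sketched below, can translate labels for plots and reports. The names ITALIAN_TO_ENGLISH and to_english are illustrative and not part of the original notebook.

```python
# Italian directory names mapped to the English class names from Section 1.2
ITALIAN_TO_ENGLISH = {
    "cane": "dog",
    "cavallo": "horse",
    "elefante": "elephant",
    "farfalla": "butterfly",
    "gallina": "chicken",
    "gatto": "cat",
    "mucca": "cow",
    "pecora": "sheep",
    "ragno": "spider",
    "scoiattolo": "squirrel",
}


def to_english(names):
    """Translate directory names, leaving unknown names unchanged."""
    return [ITALIAN_TO_ENGLISH.get(name, name) for name in names]


print(to_english(["ragno", "cavallo", "gallina"]))
```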


Total images per class in the original dataset:
ragno : 4339
cavallo : 2361
gallina : 2789
pecora : 1638
gatto : 1502
scoiattolo : 1676
cane : 4377
elefante : 1302
mucca : 1680
farfalla : 1901
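
The counts above are uneven, so it can help to quantify the imbalance. The sketch below computes the ratio between the largest and smallest class and a set of inverse-frequency weights. Weighting classes this way is a common option and an assumption here; the original training run does not use class weights.

```python
# Per-class image counts, copied from the output above
image_counts = {
    "ragno": 4339, "cavallo": 2361, "gallina": 2789, "pecora": 1638,
    "gatto": 1502, "scoiattolo": 1676, "cane": 4377, "elefante": 1302,
    "mucca": 1680, "farfalla": 1901,
}

total_images = sum(image_counts.values())

# Ratio between the most and least populated class
imbalance_ratio = max(image_counts.values()) / min(image_counts.values())

# Inverse-frequency weights: total / (number of classes * class count)
class_weights = {
    name: total_images / (len(image_counts) * count)
    for name, count in image_counts.items()
}

print(f"Total images: {total_images}")
print(f"Largest / smallest class: {imbalance_ratio:.2f}")
for name, weight in sorted(class_weights.items()):
    print(f"{name}: {weight:.3f}")
```

To pass such weights to the class_weight argument of Keras fit, the keys would first have to be converted to the integer indices in train_set.class_indices.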


## 3. Model Development

### 3.1 Model Selection

The first phase of the project involved building a custom Convolutional Neural Network (CNN) architecture suited to image classification tasks.

In the first few iterations of the model, I started with a simpler structure with fewer layers.

In the final iterations, I noticed that precision, F1 score, and accuracy were poor on the validation and test sets, so I introduced ReLU activation.

ReLU is an element-wise operation (applied per pixel) that replaces all negative values in the feature map with zero, which introduces non-linearity into the network.
Other non-linear functions such as tanh or sigmoid can be used instead of ReLU, but ReLU has been found to perform better in most situations. The convolution and pooling layers gradually reduce the spatial dimensions through the network while still allowing the filters in the second layer to have a sufficient receptive field.
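
As a minimal illustration of the element-wise behavior described above, the following NumPy sketch applies ReLU to a small, made-up feature map. The helper name relu and the sample values are illustrative only.

```python
import numpy as np


def relu(feature_map):
    """Replace every negative value with zero, element by element."""
    return np.maximum(feature_map, 0)


# A made-up 2x2 feature map with positive and negative activations
feature_map = np.array([[-1.5, 2.0],
                        [0.0, -0.3]])
activated = relu(feature_map)
print(activated)
```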

The final CNN model included layers such as convolutional layers, batch normalization, ReLU activation, max pooling, and dropout to reduce overfitting. The model was trained with the Adam optimizer and sparse categorical cross-entropy loss function, achieving reasonable accuracy on both the training and validation datasets.

In [None]:
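import tensorflow as tf
from tensorflow.keras.layers import (
    BatchNormalization, Conv2D, Dense, Dropout, Flatten, MaxPooling2D
)

cnn_model = tf.keras.models.Sequential([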
    Conv2D(64, (3, 3), activation='relu', input_shape=(200, 200, 3)),
    BatchNormalization(),
    tf.keras.layers.ReLU(),
    Dropout(rate=0.15),

    Conv2D(128, (3, 3), activation='relu'),
    BatchNormalization(),
    tf.keras.layers.ReLU(),
    MaxPooling2D(pool_size=(2, 2)),
    Dropout(rate=0.20),

    Flatten(),
    Dense(256, activation='relu'),
    BatchNormalization(),
    tf.keras.layers.ReLU(),
    Dropout(rate=0.30),
    Dense(10, activation='softmax'),
]
)




### Model Summary


In [None]:

cnn_model.summary()


Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv2d (Conv2D)             (None, 198, 198, 64)      1792      
                                                                 
 batch_normalization (Batch  (None, 198, 198, 64)      256       
 Normalization)                                                  
                                                                 
 max_pooling2d (MaxPooling2  (None, 99, 99, 64)        0         
 D)                                                              
                                                                 
 dropout (Dropout)           (None, 99, 99, 64)        0         
                                                                 
 conv2d_1 (Conv2D)           (None, 97, 97, 128)       73856     
                                                                 
 batch_normalization_1 (Bat  (None, 97, 97, 128)       5


* Input Layer:
Input shape: (200, 200, 3) indicating images of size 200x200 pixels with 3 color channels (RGB).

* Convolutional Layers:

The first convolutional layer has 64 filters, each with a kernel size of (3, 3) and ReLU activation.
The second convolutional layer has 128 filters, also with a kernel size of (3, 3) and ReLU activation.
Both convolutional layers use 'valid' padding by default.

* Batch Normalization:

Batch normalization layers are added after each convolutional layer. They help in normalizing the activations of the previous layer, reducing internal covariate shift, and potentially speeding up training.

* Activation Layers (ReLU):

ReLU activation layers are added after batch normalization. They introduce non-linearity to the model, allowing it to learn complex patterns in the data.

* Pooling Layer:

Max pooling with a pool size of (2, 2) is applied after the second convolutional layer. This layer reduces the spatial dimensions of the feature maps, helping in reducing computational complexity and controlling overfitting.

* Dropout Layers:

Dropout layers are added after the activation layers. They randomly set a fraction of input units to 0 during training, which helps prevent overfitting by forcing the network to learn redundant representations.

* Flatten Layer:

The Flatten layer converts the 2D feature maps into a 1D vector, preparing the data for input into the fully connected layers.

* Fully Connected (Dense) Layers:

There are two fully connected layers with 256 and 10 neurons respectively.
The first uses a ReLU activation function, introducing non-linearity.
The last layer uses a softmax activation function, which suits multi-class classification tasks like this one.
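
The output shapes and parameter counts in the model summary can be checked by hand. The sketch below reproduces the values printed there (198, 99, 97 and 1792, 256, 73856), assuming 'valid' padding, stride 1, and a 2x2 pooling step between the two convolutions as shown in the summary. The helper names are illustrative.

```python
def conv_output_size(input_size, kernel_size, stride=1):
    """Spatial output size of a convolution with 'valid' padding."""
    return (input_size - kernel_size) // stride + 1


def conv_param_count(kernel_size, in_channels, filters):
    """Weights plus one bias per filter for a square-kernel Conv2D layer."""
    return (kernel_size * kernel_size * in_channels + 1) * filters


def batch_norm_param_count(channels):
    """Gamma, beta, moving mean, and moving variance per channel."""
    return 4 * channels


first_conv = conv_output_size(200, 3)          # 200x200 input, 3x3 kernel
after_pool = first_conv // 2                   # 2x2 max pooling halves each side
second_conv = conv_output_size(after_pool, 3)  # second 3x3 convolution

print(first_conv, after_pool, second_conv)
print(conv_param_count(3, 3, 64))      # first Conv2D on RGB input
print(batch_norm_param_count(64))      # first BatchNormalization
print(conv_param_count(3, 64, 128))    # second Conv2D
```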

### 3.2 Model Training and Evaluation

The model was compiled with the Adam optimizer at a learning rate of 0.0001 and the sparse categorical cross-entropy loss function, with accuracy tracked as the evaluation metric.



In [None]:
optimizer = tf.keras.optimizers.Adam(learning_rate=0.0001)
cnn_model.compile(optimizer=optimizer, loss='sparse_categorical_crossentropy', metrics=['accuracy'])
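
To make the loss concrete, here is a plain-Python sketch of sparse categorical cross-entropy: for each sample it takes the negative log of the probability assigned to the true class index, then averages. The function name and the sample probabilities are illustrative; Keras computes the same quantity with extra numerical safeguards.

```python
import math


def sparse_categorical_crossentropy(true_labels, predicted_probs):
    """Mean of -log(p[true class]) over all samples; labels are integer indices."""
    losses = [
        -math.log(probs[label])
        for label, probs in zip(true_labels, predicted_probs)
    ]
    return sum(losses) / len(losses)


# Two made-up samples over three classes
labels = [0, 2]
probs = [[0.7, 0.2, 0.1],
         [0.1, 0.1, 0.8]]
loss = sparse_categorical_crossentropy(labels, probs)

# A model that guesses uniformly over 10 classes has loss ln(10)
uniform_loss = sparse_categorical_crossentropy([3], [[0.1] * 10])

print(f"Example loss: {loss:.4f}")
print(f"Uniform-guess loss for 10 classes: {uniform_loss:.4f}")
```

The uniform-guess value of about 2.30 is a useful reference point when reading the loss values in the training log.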

**Early Stopping and ReduceLROnPlateau**

To monitor and control the training process, two callback functions were implemented: EarlyStopping and ReduceLROnPlateau.

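EarlyStopping was set to monitor training accuracy and halt training if it did not improve for 5 consecutive epochs, restoring the best weights seen.

ReduceLROnPlateau multiplies the learning rate by a factor of 0.2 whenever validation loss has not improved for 5 epochs, down to a lower limit (min_lr). This limit has to be below the initial learning rate of 0.0001; otherwise the callback never lowers the rate.



In [None]:
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau

early_stopping = EarlyStopping(monitor='accuracy', patience=5, restore_best_weights=True)
# 1e-6 is an assumed floor: the original value of 0.001 exceeds the initial
# learning rate (1e-4), so the rate would never have been reduced
reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.2, patience=5, min_lr=1e-6)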
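
The patience logic behind early stopping can be sketched in plain Python. This is a simplified model of the behavior, not the Keras implementation: it ignores options such as min_delta, and the function name epoch_of_stop is illustrative.

```python
def epoch_of_stop(metric_values, patience, mode="max"):
    """Return the 1-based epoch at which training would stop, or None.

    The counter resets whenever the metric improves and training stops
    once it has failed to improve for `patience` consecutive epochs.
    """
    best = None
    wait = 0
    for epoch, value in enumerate(metric_values, start=1):
        if mode == "max":
            improved = best is None or value > best
        else:
            improved = best is None or value < best
        if improved:
            best = value
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None


# Training accuracy from the first seven logged epochs: always improving
logged_accuracy = [0.4020, 0.4394, 0.4746, 0.4947, 0.5193, 0.5362, 0.5474]
print(epoch_of_stop(logged_accuracy, patience=5))

# A made-up curve that peaks at epoch 3 and then declines
print(epoch_of_stop([0.40, 0.44, 0.47, 0.46, 0.45, 0.44], patience=3))
```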

The model was trained using the fit method with the provided training set (train_set) and validated against the validation set (val_set) over 50 epochs.

Subsequently, the model's performance was evaluated on both the validation and test sets. The evaluation included measuring accuracy, along with generating classification reports to gain insights into the model's performance across different classes.

In [None]:
history = cnn_model.fit(train_set, validation_data=val_set, epochs=50, callbacks=[early_stopping, reduce_lr], verbose=2)

Epoch 1/50
295/295 - 912s - loss: 1.7654 - accuracy: 0.4020 - val_loss: 2.0458 - val_accuracy: 0.3682 - lr: 1.0000e-04 - 912s/epoch - 3s/step
Epoch 2/50
295/295 - 929s - loss: 1.6270 - accuracy: 0.4394 - val_loss: 1.8324 - val_accuracy: 0.4334 - lr: 1.0000e-04 - 929s/epoch - 3s/step
Epoch 3/50
295/295 - 936s - loss: 1.5382 - accuracy: 0.4746 - val_loss: 1.7231 - val_accuracy: 0.4576 - lr: 1.0000e-04 - 936s/epoch - 3s/step
Epoch 4/50
295/295 - 930s - loss: 1.4667 - accuracy: 0.4947 - val_loss: 1.7240 - val_accuracy: 0.4774 - lr: 1.0000e-04 - 930s/epoch - 3s/step
Epoch 5/50
295/295 - 925s - loss: 1.3967 - accuracy: 0.5193 - val_loss: 1.6860 - val_accuracy: 0.4804 - lr: 1.0000e-04 - 925s/epoch - 3s/step
Epoch 6/50
295/295 - 932s - loss: 1.3542 - accuracy: 0.5362 - val_loss: 1.5490 - val_accuracy: 0.5300 - lr: 1.0000e-04 - 932s/epoch - 3s/step
Epoch 7/50
295/295 - 946s - loss: 1.3119 - accuracy: 0.5474 - val_loss: 1.5973 - val_accuracy: 0.5111 - lr: 1.0000e-04 - 946s/epoch - 3s/step
Epoch 

## 4. Results

### 4.1 Model Performance Evaluation

The visualizations below illustrate the training and validation loss, as well as the training and validation accuracy over the course of the training epochs.

In [None]:
plt.figure(figsize=(10, 5))
plt.plot(history.history['loss'], label='Training Loss')
plt.plot(history.history['val_loss'], label='Validation Loss')
plt.title('Model Loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()
plt.show()

plt.figure(figsize=(10, 5))
plt.plot(history.history['accuracy'], label='Training Accuracy')
plt.plot(history.history['val_accuracy'], label='Validation Accuracy')
plt.title('Model Accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()
plt.show()


**Evaluating the model with the validation set**

In [None]:
val_loss, val_acc = cnn_model.evaluate(val_set)
print(f"Validation accuracy: {val_acc}")

**Evaluating the model with a test set**

In [None]:
test_loss, test_acc = cnn_model.evaluate(test_set)
print(f"Test accuracy: {test_acc}")

**Model's accuracy, precision, recall, and F1-score**

In [None]:
from sklearn.metrics import classification_report
from sklearn.metrics import accuracy_score


# Get the true labels from both validation and test sets.
# These only line up with the predictions below if the generators
# were created with shuffle=False.
validation_true_labels = val_set.classes
test_true_labels = test_set.classes

class_names = list(val_set.class_indices.keys())


validation_pred = cnn_model.predict(val_set)
validation_pred_labels = np.argmax(validation_pred, axis=1)

acc_val = accuracy_score(validation_true_labels, validation_pred_labels)
print(f"Validation Set - Accuracy: {acc_val:.4f}")

print("Validation Set - Classification report:")
print(classification_report(validation_true_labels, validation_pred_labels, target_names=class_names))
print(validation_pred_labels)




test_pred = cnn_model.predict(test_set)
test_pred_labels = np.argmax(test_pred, axis=1)

acc_test = accuracy_score(test_true_labels, test_pred_labels)
print(f"Test Set - Accuracy: {acc_test:.4f}")

print("Test Set - Classification report:")
print(classification_report(test_true_labels, test_pred_labels, target_names=class_names))
print(test_pred_labels)

### **Confusion matrix**

**Validation Set**

In [None]:
from sklearn.metrics import confusion_matrix
import seaborn as sns


#Confusion matrix
conf_matrix = confusion_matrix(validation_true_labels, validation_pred_labels)

plt.figure(figsize=(10, 8))
sns.heatmap(conf_matrix, annot=True, fmt='d', cmap='Blues', xticklabels=class_names, yticklabels=class_names)
plt.xlabel('Predicted Labels')
plt.ylabel('True Labels')
plt.title('Confusion Matrix - Validation Set')
plt.show()

**Test Set**

In [None]:

#Confusion matrix
conf_matrix = confusion_matrix(test_true_labels, test_pred_labels)

plt.figure(figsize=(10, 8))
sns.heatmap(conf_matrix, annot=True, fmt='d', cmap='Blues', xticklabels=class_names, yticklabels=class_names)
plt.xlabel('Predicted Labels')
plt.ylabel('True Labels')
plt.title('Confusion Matrix - Test Set')
plt.show()
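
Precision, recall, and F1-score per class can be read directly off a confusion matrix whose rows are true labels and whose columns are predicted labels, as in the heatmaps above. The sketch below uses a made-up 3x3 matrix, not results from this project, and the function names are illustrative.

```python
import numpy as np


def safe_divide(numerator, denominator):
    """Element-wise division that returns 0 where the denominator is 0."""
    result = np.zeros_like(numerator, dtype=float)
    np.divide(numerator, denominator, out=result, where=denominator > 0)
    return result


def per_class_metrics(conf_matrix):
    """Precision, recall, and F1 per class (rows = true, columns = predicted)."""
    conf_matrix = np.asarray(conf_matrix, dtype=float)
    true_positives = np.diag(conf_matrix).copy()
    predicted_totals = conf_matrix.sum(axis=0)  # column sums
    actual_totals = conf_matrix.sum(axis=1)     # row sums
    precision = safe_divide(true_positives, predicted_totals)
    recall = safe_divide(true_positives, actual_totals)
    f1 = safe_divide(2 * precision * recall, precision + recall)
    return precision, recall, f1


# Made-up confusion matrix for three classes
example_matrix = [[5, 1, 0],
                  [2, 6, 2],
                  [0, 1, 3]]
precision, recall, f1 = per_class_metrics(example_matrix)
print("Precision:", np.round(precision, 3))
print("Recall:   ", np.round(recall, 3))
print("F1-score: ", np.round(f1, 3))
```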

## 5. Transfer Learning

After evaluating the performance of the initial CNN model, transfer learning was employed to improve the model's accuracy and generalization capabilities.

The following pre-trained models were considered for transfer learning:

* **VGG16**: VGG16 is a popular deep learning model with 16 layers, known for its simplicity and effectiveness in image classification tasks. Its architecture consists of sequential convolutional layers followed by fully connected layers.

* **ResNet50**: ResNet50, a 50-layer residual network, introduces skip connections to address the vanishing gradient problem, enabling the training of deeper networks. It has shown superior performance in various image classification benchmarks.

* **InceptionV3**: InceptionV3, part of the Inception family of models, uses a more complex architecture with inception modules that allow the network to learn richer feature representations by combining multiple convolutional filters.

### 5.1 Removing the top layers and performing transfer learning using pre-trained models

First, I created a function to create the models according to a given pre-trained model.

In [None]:
from tensorflow.keras.models import Model


def model_creation(base_model):
    """Build a classifier on top of a frozen pre-trained base model."""
    # Freeze the base_model layers
    for layer in base_model.layers:
        layer.trainable = False

    # Add custom classification layers
    x = Flatten()(base_model.output)
    x = Dense(128, activation='relu')(x)
    x = Dropout(0.5)(x)
    output = Dense(len(class_names), activation='softmax')(x)

    # Create the model
    transfer_model = Model(inputs=base_model.input, outputs=output)
    return transfer_model
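
With the base model frozen, only the custom head is trained. Its size can be estimated by hand, as in the sketch below for VGG16 with a 200x200 input. The figures assume VGG16's standard layout without the top layers (five 2x2 pooling stages and 512 channels in the last block); the helper names are illustrative.

```python
def size_after_pooling(size, num_pools):
    """Halve the spatial size (rounding down) once per 2x2 pooling stage."""
    for _ in range(num_pools):
        size //= 2
    return size


def dense_param_count(inputs, units):
    """Weights plus one bias per unit for a Dense layer."""
    return (inputs + 1) * units


side = size_after_pooling(200, num_pools=5)   # 200 -> 100 -> 50 -> 25 -> 12 -> 6
flattened = side * side * 512                 # Flatten() output length
hidden_params = dense_param_count(flattened, 128)
output_params = dense_param_count(128, 10)

print(f"Feature map: {side}x{side}x512, flattened length {flattened}")
print(f"Trainable head parameters: {hidden_params + output_params}")
```

Most of the trainable parameters sit in the first Dense layer because Flatten() keeps every spatial position of the feature map.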


**VGG16**

Achieved good accuracy on the validation and test sets.
Benefits from a straightforward architecture, making it easier to fine-tune and interpret.

In [None]:
from keras.applications.vgg16 import VGG16
from keras.applications.vgg16 import preprocess_input

#Load the VGG16 pre-trained model with imagenet weights without the top layer
VGG_base = VGG16(weights='imagenet', include_top=False, input_shape=(200,200,3))

VGG_model = model_creation(VGG_base)

#Compile the model
optimizer = tf.keras.optimizers.Adam(learning_rate=0.0001)

VGG_model.compile(optimizer=optimizer, loss='sparse_categorical_crossentropy', metrics=['accuracy'])

# Train the model
history_VGG = VGG_model.fit(
    train_set,
    validation_data=val_set,
    epochs=15,
    callbacks=[early_stopping, reduce_lr],
    verbose=2
)


# Evaluate the model
test_loss, test_acc = VGG_model.evaluate(test_set)
print(f"VGG Model Test accuracy: {test_acc}")


**ResNet50**

Not included in the final comparison due to resource constraints, but generally expected to perform well due to its deeper architecture and skip connections.

In [None]:
from keras.applications.resnet import ResNet50

#Load the ResNet50 pre-trained model with imagenet weights without the top layer
#The input shape must match the 200x200 images produced by the data generators
resnet_base = ResNet50(weights='imagenet', include_top=False, input_shape=(200, 200, 3))

Resnet_model = model_creation(resnet_base)

#Compile the model
optimizer = tf.keras.optimizers.Adam(learning_rate=0.0001)

Resnet_model.compile(optimizer=optimizer, loss='sparse_categorical_crossentropy', metrics=['accuracy'])

history_resnet = Resnet_model.fit(
    train_set,
    validation_data=val_set,
    epochs=15,
    callbacks=[early_stopping, reduce_lr],
    verbose=2
)

# Evaluate the model
test_loss_resnet, test_acc_resnet = Resnet_model.evaluate(test_set)
print(f"Resnet 50 Model Test accuracy: {test_acc_resnet}")


**InceptionV3**

Showed competitive accuracy on the validation and test sets.
The inception modules provide a diverse set of feature representations, potentially leading to better performance in complex datasets.

In [None]:
from keras.applications.inception_v3 import InceptionV3


#Load the InceptionV3 pre-trained model with imagenet weights without the top layer
inception_base = InceptionV3(weights='imagenet', include_top=False, input_shape=(200,200,3))

Inception_model = model_creation(inception_base)

#Compile the model
optimizer = tf.keras.optimizers.Adam(learning_rate=0.0001)
Inception_model.compile(optimizer=optimizer, loss='sparse_categorical_crossentropy', metrics=['accuracy'])

history_inception = Inception_model.fit(
    train_set,
    validation_data=val_set,
    epochs=15,
    callbacks=[early_stopping, reduce_lr],
    verbose=2
)


# Evaluate the model
test_loss_inception , test_acc_inception  = Inception_model.evaluate(test_set)
print(f"InceptionV3 Model Test accuracy: {test_acc_inception }")


### 5.2 Transfer Learning Evaluation

**Evaluation of Pre-trained Models**

Each pre-trained model was fine-tuned on the Animals10 dataset, and their performances were compared based on validation accuracy and test accuracy. The key findings are summarized below:



**Final Model Selection**
The final model selected was VGG16 due to its balance of simplicity and performance. Additionally, VGG16 was further fine-tuned by unfreezing the last 20 layers of the model and retraining them to adapt specifically to the Animals10 dataset. This fine-tuning process further improved the model's accuracy.

* Performance: VGG16 demonstrated strong performance on the validation and test sets, indicating its ability to generalize well to new images.
* Simplicity: The straightforward architecture of VGG16 made it easier to fine-tune and interpret compared to more complex models.
* Resource Efficiency: VGG16 required fewer computational resources than deeper models like ResNet50, making it a practical choice for this project.

Overall, VGG16 provided a good balance of accuracy, interpretability, and computational efficiency, making it the ideal choice for the animal species classification task.

In [None]:
# Unfreeze some of the base model layers
for layer in VGG_model.layers[-20:]:  # Unfreeze the last 20 layers
    if not isinstance(layer, BatchNormalization):  # Optionally leave BatchNormalization layers frozen
        layer.trainable = True

# Recompile the model (necessary after changing layer trainability),
# using a fresh optimizer instance rather than one already bound to another model
optimizer = tf.keras.optimizers.Adam(learning_rate=0.0001)
VGG_model.compile(optimizer=optimizer, loss='sparse_categorical_crossentropy', metrics=['accuracy'])

# Continue training
history_final = VGG_model.fit(
    train_set,
    validation_data=val_set,
    epochs=15,
    callbacks=[early_stopping, reduce_lr],
    verbose=2
)

# Evaluate the model
test_loss_final, test_acc_final = VGG_model.evaluate(test_set)
print(f"Final Model Test accuracy: {test_acc_final}")


In [None]:
# Save the model to disk

model_dir = "./model"
model_version = 1
model_export_path = f"{model_dir}/{model_version}"

tf.saved_model.save(
    VGG_model,
    export_dir=model_export_path,
)

print(f"SavedModel files: {os.listdir(model_export_path)}")