### Import Libraries

In [21]:
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import cv2
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split

from keras.utils import to_categorical
# for CNN and NN models
from keras.models import Sequential, Model
from keras.layers import Conv2D, Input, Dropout, Activation, Dense, MaxPooling2D, Flatten, GlobalAveragePooling2D
from keras.optimizers import Adadelta
from keras.callbacks import ModelCheckpoint
from keras.callbacks import EarlyStopping

from keras.applications.inception_v3 import InceptionV3

### Importing image data

The dataset was retrieved from https://www.kaggle.com/datasets/imbikramsaha/caltech-101

The Caltech101 dataset contains images from 101 object categories (e.g., “helicopter”, “elephant” and “chair”) and a background category that holds images not belonging to any of the 101 object categories. Each object category has between about 40 and 800 images, and most classes have about 50. Each image is roughly 300×200 pixels.

In [3]:
# Build a table of file paths and their folder (class) names so the images can be read easily.
d = "101_ObjectCategories"
frames = []
for path in os.listdir(d):
    folder_path = os.path.join(d, path)  # portable replacement for manual "\\" joins
    files_list = [os.path.join(folder_path, f) for f in os.listdir(folder_path)]
    frames.append(pd.DataFrame({"files": files_list, "folder": path}))

# DataFrame.append was removed in pandas 2.0, so the per-folder frames are concatenated once.
final_df = pd.concat(frames)
final_df.to_csv("data.csv")

### Extracting the top 5 groups for processing

Because of the large number and size of the files, I filter the dataset and use only the five folders that contain the most files.

In [4]:
grouped_df = final_df.groupby("folder")["files"].count()

grouped_df.sort_values(ascending=False)


folder
airplanes            800
Motorbikes           798
BACKGROUND_Google    468
Faces                435
Faces_easy           435
watch                239
Leopards             200
bonsai               128
car_side             123
ketch                114
chandelier           107
hawksbill            100
grand_piano           99
brain                 98
butterfly             91
helicopter            88
menorah               87
starfish              86
kangaroo              86
trilobite             86
buddha                85
ewer                  85
sunflower             85
scorpion              84
revolver              82
laptop                81
ibis                  80
llama                 78
minaret               76
umbrella              75
                    ... 
pagoda                47
cougar_body           47
beaver                46
flamingo_head         45
pigeon                45
stapler               45
mandolin              43
cannon                43
brontosaurus      

In [5]:
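# Note: the leading `~` negates the mask, so this line drops the five largest folders; remove the `~` to keep only those five.
final_df = final_df[~(final_df["folder"].isin(["airplanes", "Motorbikes", "BACKGROUND_Google", "Faces", "Faces_easy"]))]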
num_classes = 5
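
The largest folders do not have to be typed in by hand. The following is a minimal sketch, assuming a dataframe with the same `files` and `folder` columns as `final_df`; the toy rows and the names `toy_df`, `top_folders` and `subset` are made up for illustration.

```python
import pandas as pd

# Toy stand-in for final_df: one row per image file (hypothetical data).
toy_df = pd.DataFrame({
    "files": [f"img_{i}.jpg" for i in range(10)],
    "folder": ["a"] * 4 + ["b"] * 3 + ["c"] * 2 + ["d"] * 1,
})

k = 2  # the notebook text uses 5

# value_counts counts the files per folder; nlargest keeps the k biggest.
top_folders = toy_df["folder"].value_counts().nlargest(k).index.tolist()

# Keep only rows from those folders and renumber the rows 0..N-1.
subset = toy_df[toy_df["folder"].isin(top_folders)].reset_index(drop=True)
num_classes = subset["folder"].nunique()
print(top_folders, len(subset), num_classes)
```

Deriving `num_classes` from the filtered data keeps it in step with whatever folders are actually selected.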

### Reading images

All images are read into the dataframe alongside their folder name and file path. Each image is loaded as an array and resized to 300×200 pixels (width × height). Cubic interpolation takes more neighboring pixels into account than nearest-neighbor or bilinear interpolation when computing each target pixel, which gives a smoother and more visually appealing result.

In [6]:
final_df["image"]= final_df["files"].apply(lambda x: cv2.resize(cv2.imread(str(x), cv2.IMREAD_COLOR),(300,200), interpolation=cv2.INTER_CUBIC)) 
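
To illustrate what "more neighboring pixels" means, here is a one-dimensional sketch that compares linear interpolation (2 neighbors) with cubic convolution interpolation (4 neighbors). In two dimensions the same idea uses a 2×2 versus a 4×4 neighborhood. The kernel coefficient `a=-0.5` is an assumption chosen for illustration, and OpenCV's internal implementation may use a different value; the function names are hypothetical.

```python
import math

def cubic_weight(x, a=-0.5):
    """Cubic convolution kernel; a=-0.5 is an assumed coefficient."""
    x = abs(x)
    if x <= 1:
        return (a + 2) * x**3 - (a + 3) * x**2 + 1
    if x < 2:
        return a * x**3 - 5 * a * x**2 + 8 * a * x - 4 * a
    return 0.0

def interp_linear(samples, pos):
    """Linear interpolation: blends the 2 nearest samples."""
    i = math.floor(pos)
    t = pos - i
    return (1 - t) * samples[i] + t * samples[i + 1]

def interp_cubic(samples, pos):
    """Cubic interpolation: blends the 4 nearest samples (indices i-1 .. i+2)."""
    i = math.floor(pos)
    total = 0.0
    for k in range(i - 1, i + 3):
        k_clamped = min(max(k, 0), len(samples) - 1)  # repeat edge samples
        total += samples[k_clamped] * cubic_weight(pos - k)
    return total

row = [10, 10, 200, 200, 10, 10]  # one row of pixel intensities
print(interp_linear(row, 1.25), interp_cubic(row, 1.25))
```

The four cubic weights always sum to 1, but the outer two can be negative, which is why cubic interpolation can slightly overshoot near sharp edges.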


In [7]:
final_df.head()

| | files | folder | image |
|---|---|---|---|
| 0 | C:\Users\info-06\Desktop\Game#\client work\bha... | accordion | [[[0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0], ... |
| 1 | C:\Users\info-06\Desktop\Game#\client work\bha... | accordion | [[[0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0], ... |
| 2 | C:\Users\info-06\Desktop\Game#\client work\bha... | accordion | [[[0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0], ... |
| 3 | C:\Users\info-06\Desktop\Game#\client work\bha... | accordion | [[[0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0], ... |
| 4 | C:\Users\info-06\Desktop\Game#\client work\bha... | accordion | [[[0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0], ... |


In [8]:
final_df.shape

(6209, 3)

In [9]:
# Pre-allocate a numpy array for the images, sized from the dataframe rather than hard-coded
X = np.empty((len(final_df), 200, 300, 3), dtype=np.uint8)
Y = []

In [10]:
# Use a running position instead of the dataframe index, which is not a contiguous 0..N-1 range
for i, (_, row) in enumerate(final_df.iterrows()):
    X[i] = row["image"]
    Y.append(row["folder"])

In [11]:
X.shape

(6209, 200, 300, 3)

### Modifying the data

The labels are encoded as integers. Each label is assigned a unique number so that the model can map each image array to its encoded class.

In [12]:
label_encoder = LabelEncoder()
Y_integer_encoded = label_encoder.fit_transform(Y)
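
As a rough illustration of what the encoding step produces, the sketch below reproduces the same kind of mapping with NumPy alone. The label list is a made-up example, and this is not scikit-learn's implementation, only an equivalent small-scale demonstration.

```python
import numpy as np

labels = ["watch", "bonsai", "watch", "ketch", "bonsai"]  # example folder names

# np.unique returns the sorted distinct labels and, for each input,
# the position of its label in that sorted list.
classes, encoded = np.unique(labels, return_inverse=True)

print(classes)   # sorted class names
print(encoded)   # one integer code per sample

# Indexing the class list with the codes recovers the original names.
decoded = classes[encoded]
```

Keeping the list of classes around is what makes it possible to translate a predicted integer back into a folder name later.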

In [13]:
# Because of the high volume of data, a few variables are deleted from memory so that the remaining processing can run

del Y
del final_df
del grouped_df
X_normalized = X.astype(np.float32)  # float32 needs half the memory of float64
del X
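
The memory pressure mentioned in the comment above can be estimated with simple arithmetic. This sketch assumes the array shape reported earlier, (6209, 200, 300, 3), and only compares how many bytes each dtype needs per value.

```python
n_images, height, width, channels = 6209, 200, 300, 3
n_values = n_images * height * width * channels

# Bytes per value for the dtypes relevant here.
for name, bytes_per_value in [("uint8", 1), ("float32", 4), ("float64", 8)]:
    gib = n_values * bytes_per_value / 2**30
    print(f"{name}: {gib:.2f} GiB")
```

Converting from `uint8` to a floating-point dtype multiplies the footprint by four or eight, which is why the original array is deleted right after the conversion.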

#### Normalization

The pixel values in an image typically range from 0 to 255 for an 8-bit image, where 0 represents black and 255 represents white. When you perform various image processing tasks or use machine learning algorithms, it can be beneficial to have pixel values in a standardized range, often between 0 and 1. Normalization is a common technique used to achieve this standardization.

In [15]:
# Scale pixel values from [0, 255] to [0, 1] in place
X_normalized /= 255.0

#### Converting the dependent variable to categories

In [17]:
Y_one_hot = to_categorical(Y_integer_encoded)
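
For intuition, one-hot encoding can be sketched in plain NumPy. This is an illustrative equivalent on a small made-up array, not the Keras implementation.

```python
import numpy as np

encoded = np.array([2, 0, 2, 1, 0])   # integer class codes
n_classes = int(encoded.max()) + 1

# Row i of the identity matrix is the one-hot vector for class i.
one_hot = np.eye(n_classes, dtype=np.float32)[encoded]
print(one_hot.shape)

# argmax along each row recovers the integer codes.
recovered = one_hot.argmax(axis=1)
```

Each row contains a single 1 in the column of its class, which is the target format that categorical cross-entropy expects.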

In [19]:
# Because of the high volume of data, a few variables are deleted from memory so that the remaining processing can run

del Y_integer_encoded
del label_encoder

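#### Split the dataset into train, validation and test sets

In [None]:
# Hold out 30% of the data, then split that portion evenly into validation and test sets (roughly 70/15/15),
# so that the X_test and Y_test used for evaluation are defined.
X_train, X_holdout, Y_train, Y_holdout = train_test_split(X_normalized, Y_one_hot, test_size=0.30, random_state=42)
X_validation, X_test, Y_validation, Y_test = train_test_split(X_holdout, Y_holdout, test_size=0.50, random_state=42)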

### Modeling

In [2]:
# Creating the neural network model
model = Sequential()
print("Input dimensions: ",X_train.shape[1:])

# These lines add the first convolutional layer to the model. 
# It consists of 32 filters of size 3x3. The 'relu' activation function is applied after each convolution operation, 
# and a max-pooling layer with a 2x2 pool size follows.
model.add(Conv2D(32, (3, 3), input_shape=X_train.shape[1:]))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

# The pattern is repeated for a second convolutional layer.
model.add(Conv2D(32, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

# This line adds a flattening layer that transforms the output of the convolutional layers into a one-dimensional vector. 
# This is necessary before feeding the data into fully connected layers.
model.add(Flatten())

# These lines add a fully connected layer with 256 neurons, followed by a ReLU activation function.
# This layer processes the flattened output from the previous layers.
model.add(Dense(256))
model.add(Activation('relu'))

# This code adds the output layer with a number of neurons equal to num_classes, 
# which represents the number of classes in the classification problem. 
# The softmax activation function is used to convert the network's final output into class probabilities.
model.add(Dense(num_classes))
model.add(Activation('softmax'))

# This line prints a summary of the entire network architecture, 
# including the number of parameters in each layer and the overall model size.
model.summary()

Input dimensions:  (200, 300, 3)
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_17 (Conv2D)           (None, 200, 300, 32)      896       
_________________________________________________________________
activation_25 (Activation)   (None, 200, 300, 32)      0         
_________________________________________________________________
max_pooling2d_17 (MaxPooling (None, 100, 150, 32)      0         
_________________________________________________________________
conv2d_18 (Conv2D)           (None, 100, 150, 32)      9248      
_________________________________________________________________
activation_26 (Activation)   (None, 100, 150, 32)      0         
_________________________________________________________________
max_pooling2d_18 (MaxPooling (None, 50, 50, 32)        0         
_________________________________________________________________
flatten_5 (Flatten)          (None, 4608)  

In [3]:
# compile the model to use categorical cross-entropy loss function and adam optimizer
model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

history = model.fit(X_train, Y_train,
                    batch_size=128,
                    epochs=10,
                    validation_data=(X_validation, Y_validation))

Train on 4346 samples, validate on 931 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


* model.compile() is a method used to configure the training process of the neural network.
* loss='categorical_crossentropy' specifies the loss function that the network will optimize during training. In this case, it's categorical cross-entropy, which is commonly used for multi-class classification problems.
* optimizer='adam' specifies the optimization algorithm to be used during training. 'Adam' is a popular choice because it adapts the learning rate of each parameter during training.
* metrics=['accuracy'] specifies that you want to monitor the accuracy of the model during training as an evaluation metric.
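
To make the loss and the metric concrete, here is a small worked sketch in plain Python. The probabilities are made up, and `eps` is an assumed guard against taking the logarithm of zero; Keras computes the same quantities over whole batches of tensors.

```python
import math

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Loss for one sample: -sum(y_true * log(y_pred))."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))

def accuracy(y_true_batch, y_pred_batch):
    """Fraction of samples whose most probable class is the true class."""
    hits = 0
    for y_true, y_pred in zip(y_true_batch, y_pred_batch):
        hits += y_true.index(max(y_true)) == y_pred.index(max(y_pred))
    return hits / len(y_true_batch)

y_true = [[1, 0, 0], [0, 1, 0]]                  # one-hot targets
y_pred = [[0.7, 0.2, 0.1], [0.5, 0.3, 0.2]]      # softmax outputs (made up)

losses = [categorical_crossentropy(t, p) for t, p in zip(y_true, y_pred)]
print(sum(losses) / len(losses), accuracy(y_true, y_pred))
```

Only the probability assigned to the true class contributes to the loss, so a confident wrong prediction is penalized far more than a hesitant one.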

In [4]:
loss, accuracy = model.evaluate(X_test, Y_test, verbose=0)
print('Test loss:', loss)
print('Test accuracy:', accuracy)

Test loss: 2.0693324206887
Test accuracy: 0.204356001922


The accuracy is very low, so different layer configurations are tried next to see whether it can be improved.

In [5]:
# creating the network with dropout layer
model = Sequential()
print("Input dimensions: ",X_train.shape[1:])

model.add(Conv2D(32, (3, 3), input_shape=X_train.shape[1:]))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Conv2D(32, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(256))
model.add(Activation('relu'))


model.add(Dense(num_classes))
model.add(Activation('softmax'))

model.summary()

Input dimensions:  (224, 224, 3)
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_19 (Conv2D)           (None, 222, 222, 32)      896       
_________________________________________________________________
activation_27 (Activation)   (None, 222, 222, 32)      0         
_________________________________________________________________
max_pooling2d_19 (MaxPooling (None, 111, 111, 32)      0         
_________________________________________________________________
dropout_15 (Dropout)         (None, 54, 54, 32)        0         
_________________________________________________________________
conv2d_20 (Conv2D)           (None, 109, 109, 32)      9248      
_________________________________________________________________
activation_28 (Activation)   (None, 109, 109, 32)      0         
_________________________________________________________________
max_pooling2d_20 (MaxPooling (None, 54, 54,
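
The shapes and parameter counts in a summary like this can be checked by hand. The sketch below assumes the 224×224×3 input reported above, 3×3 convolutions with no padding and a stride of 1, and 2×2 max pooling; the helper names are hypothetical.

```python
def conv_out(size, kernel=3):
    """Spatial size after a convolution with no padding and stride 1."""
    return size - kernel + 1

def pool_out(size, pool=2):
    """Spatial size after non-overlapping max pooling (remainder dropped)."""
    return size // pool

def conv_params(kernel, in_channels, filters):
    """Kernel weights plus one bias per filter."""
    return (kernel * kernel * in_channels + 1) * filters

size, channels = 224, 3          # square input assumed
shapes = []
for filters in (32, 32):
    size = conv_out(size)
    shapes.append(("conv", size, filters))
    size = pool_out(size)
    shapes.append(("pool", size, filters))
    channels = filters

flat = size * size * channels     # length of the flattened vector
dense_params = (flat + 1) * 256   # weights plus biases of Dense(256)
print(shapes, flat)
print(conv_params(3, 3, 32), conv_params(3, 32, 32), dense_params)
```

Almost all of the parameters sit in the first dense layer, because it is connected to every value of the flattened feature map.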

In [6]:
# compile the model to use categorical cross-entropy loss function and adam optimizer
model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

history = model.fit(X_train, Y_train,
                    batch_size=128,
                    epochs=10,
                    validation_data=(X_validation, Y_validation))


Train on 4346 samples, validate on 931 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


In [7]:
loss, accuracy = model.evaluate(X_test, Y_test, verbose=0)
print('Test loss:', loss)
print('Test accuracy:', accuracy)

Test loss: 3.891112345432
Test accuracy: 0.409592212343


The accuracy is higher, but the test loss has also risen (3.89 versus 2.07), which suggests the model is overfitting. Next, we will try a pre-trained model to see whether the performance can be improved further.

In [None]:
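# Note: transfer_learning_model is built in the InceptionV3 cell below, so run that cell first.
# Freeze the first 280 layers and leave the remaining layers trainable.
for layer in transfer_learning_model.layers[:280]:
    layer.trainable = False
for layer in transfer_learning_model.layers[280:]:
    layer.trainable = True

This code specifies which layers are trainable (their weights are updated during training) and which remain frozen (not updated) when a pre-trained model is used as the starting point for a new task. Changes to the trainable attribute take effect once the model is compiled afterwards.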

In [8]:
# InceptionV3 model
base_model = InceptionV3(weights='imagenet', include_top=False)

transfer_learning_arch = base_model.output
transfer_learning_arch = GlobalAveragePooling2D()(transfer_learning_arch)
transfer_learning_arch = Dense(1024, activation='relu')(transfer_learning_arch)
transfer_learning_arch = Dropout(0.4)(transfer_learning_arch)
transfer_learning_arch = Dense(512, activation='relu')(transfer_learning_arch)
transfer_learning_arch = Dropout(0.4)(transfer_learning_arch)
predictions = Dense(101, activation='softmax')(transfer_learning_arch)

transfer_learning_model = Model(inputs=base_model.input, outputs=predictions)
transfer_learning_model.summary()

____________________________________________________________________________________________________
Layer (type)                     Output Shape          Param #     Connected to                     
input_4 (InputLayer)             (None, None, None, 3) 0                                            
____________________________________________________________________________________________________
conv2d_283 (Conv2D)              (None, None, None, 32 864         input_4[0][0]                    
____________________________________________________________________________________________________
batch_normalization_283 (BatchNo (None, None, None, 32 96          conv2d_283[0][0]                 
____________________________________________________________________________________________________
activation_283 (Activation)      (None, None, None, 32 0           batch_normalization_283[0][0]    
___________________________________________________________________________________________

InceptionV3 architecture:

* Base Model Loading: It loads the InceptionV3 model with pre-trained weights from the ImageNet dataset. This base model is a powerful feature extractor capable of recognizing a wide range of visual patterns.

* Customized Layers: It builds additional layers on top of the InceptionV3 base model.

* Global Average Pooling: After the InceptionV3 layers, global average pooling is applied to reduce the spatial dimensions of the feature maps and summarize the features across all spatial locations.
* Dense Layers: Two fully connected (dense) layers with 1024 and 512 units and ReLU activation functions are added. These layers help in capturing high-level features specific to the new task.
* Dropout: Dropout layers with a dropout rate of 0.4 are inserted after each dense layer to reduce overfitting during training.
* Output Layer: The final layer consists of 101 units (assuming a classification task with 101 classes) with a softmax activation function, which converts the model's output into class probabilities.

* Model Compilation: The model is compiled in the following cell, which specifies the loss function, optimizer, and metrics for training.

* Model Summary: The model summary is printed, which provides an overview of the architecture, including the number of trainable parameters in each layer.
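
The pooling and output steps listed above can be illustrated with NumPy. This is a sketch under assumed sizes (a batch of 2 feature maps on an 8×8 grid with 2048 channels), and random weights stand in for the trained dense layers, so only the shapes are meaningful.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the base model's output (assumed sizes).
feature_maps = rng.random((2, 8, 8, 2048))

# Global average pooling: average each channel over all spatial positions.
pooled = feature_maps.mean(axis=(1, 2))

def softmax(logits):
    """Convert scores to probabilities that sum to 1 along the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

# Random weights replace the trained dense layers in this illustration.
logits = pooled @ rng.normal(size=(2048, 101))
probs = softmax(logits)
print(pooled.shape, probs.shape, probs.sum(axis=1))
```

Because the pooling averages over the spatial axes, the head works for any input resolution, which matches the `(None, None, None, 3)` input shape in the summary.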

In [9]:
opt=Adadelta(lr=1.0, rho=0.9, epsilon=1e-08, decay=0.0)
transfer_learning_model.compile(loss='categorical_crossentropy', optimizer=opt, metrics=['accuracy'])

callbacks = [ModelCheckpoint('transfer_learning_weights.h5', monitor='val_acc', save_best_only=True),
            EarlyStopping(monitor='val_loss', patience=4, verbose=1, mode='auto')]
transfer_learning_model.fit(X_train, Y_train, batch_size=32, epochs=15, verbose=1, validation_data=(X_validation,Y_validation), callbacks=callbacks)

Train on 6507 samples, validate on 2170 samples
Epoch 1/15
Epoch 2/15
Epoch 3/15
Epoch 4/15
Epoch 5/15
Epoch 6/15
Epoch 7/15
Epoch 8/15
Epoch 9/15
Epoch 10/15
Epoch 00009: early stopping


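Overall, this code compiles the model with the Adadelta optimizer and categorical cross-entropy loss, trains it for up to 15 epochs, saves the best weights (as measured by validation accuracy) with ModelCheckpoint, and stops early once the validation loss has not improved for 4 consecutive epochs.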
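
The patience rule behind early stopping can be sketched in a few lines of plain Python. This is a simplified illustration, not Keras's implementation (it ignores options such as a minimum improvement threshold), and the loss values are hypothetical rather than taken from the run above.

```python
def early_stopping_epoch(val_losses, patience=4):
    """Return the 0-based epoch at which training would stop, or None."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss      # improvement: remember it and reset the counter
            wait = 0
        else:
            wait += 1        # no improvement this epoch
            if wait >= patience:
                return epoch
    return None

# Hypothetical validation losses.
losses = [1.0, 0.8, 0.7, 0.75, 0.72, 0.71, 0.9, 0.6]
print(early_stopping_epoch(losses))
```

In this example the best loss occurs at epoch 2, and training stops four epochs later, before the final improvement is ever reached. A larger patience trades extra training time for a lower chance of stopping too soon.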

In [None]:

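# find_accuracy_per_category is assumed to be defined elsewhere; it is not shown in this notebook.
def find_average_accuracy_for_model(nn_model):
    # Evaluate the model passed in as an argument rather than a global one
    category_accuracy_dict = find_accuracy_per_category('./data', nn_model)
    average_accuracy = 0
    for category, scores in category_accuracy_dict.items():
        print(category, ":", scores[1])
        average_accuracy += scores[1]
    # Average over the categories that were actually evaluated
    average_accuracy /= len(category_accuracy_dict)
    print("Average accuracy : ", average_accuracy)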

In [10]:
find_average_accuracy_for_model(transfer_learning_model)

Average accuracy : 0.9425913
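
Averaging the accuracy per category is not the same as computing the overall accuracy when the classes are imbalanced. The sketch below shows the difference on made-up labels; `accuracy_per_category` is a hypothetical helper written for this illustration and is not the function used above.

```python
import numpy as np

def accuracy_per_category(y_true, y_pred):
    """Map each class id to the fraction of its samples predicted correctly."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return {
        int(c): float((y_pred[y_true == c] == c).mean())
        for c in np.unique(y_true)
    }

y_true = [0, 0, 0, 0, 0, 0, 1, 1]   # hypothetical integer labels
y_pred = [0, 0, 0, 0, 0, 0, 1, 0]   # hypothetical predictions

per_class = accuracy_per_category(y_true, y_pred)
macro_average = sum(per_class.values()) / len(per_class)
overall = float((np.asarray(y_true) == np.asarray(y_pred)).mean())
print(per_class, macro_average, overall)
```

The per-category average weights every class equally, so small classes count as much as large ones.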


The accuracy is much higher in the pretrained model.

The higher accuracy achieved with a pre-trained model compared to training a model from scratch is due to the following key reasons:

* Feature Extraction: Pre-trained models, such as InceptionV3 in the code, have already learned a wide range of useful features from a large and diverse dataset like ImageNet. These features include edge detectors, textures, shapes, and more. When we use a pre-trained model as a feature extractor, these learned features can be highly valuable for the specific task, reducing the need for the model to learn them from scratch. This is particularly important when only a limited amount of data is available for the target task.

* Transfer Learning: The pre-trained model serves as an excellent starting point for transfer learning. We are essentially leveraging the knowledge encoded in the pre-trained weights and fine-tuning the model for our specific classification task. Fine-tuning allows the model to adapt to the nuances of our dataset, learning task-specific information while retaining general knowledge acquired during pre-training.

* Regularization: Models built on a pre-trained base are often combined with regularization techniques, such as dropout (used here after each added dense layer) and weight decay, that help prevent overfitting. These regularization methods improve the model's ability to generalize to new, unseen data.

* Data Augmentation: Pre-trained models can benefit from data augmentation techniques, where you generate additional training examples by applying transformations like rotation, scaling, and cropping to your existing data. Data augmentation helps the model learn robust features that are invariant to such transformations.

* Optimization: The pre-trained model is initialized with weights that are already in a reasonable range, which can accelerate convergence during training. In contrast, training a neural network from scratch may require more careful initialization and tuning of hyperparameters.

* Reduced Training Time: Training a deep neural network from scratch can be computationally expensive and time-consuming, especially for large networks. Using a pre-trained model as a starting point allows you to train only a subset of layers, which typically requires fewer epochs and less computational resources.
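
Data augmentation is described in the list above but not applied in this notebook. As a hedged illustration of the idea, the sketch below applies a random horizontal flip and a random crop to one image with NumPy; the function name, crop size and random seed are assumptions made for this example.

```python
import numpy as np

def augment(image, rng, crop_h=180, crop_w=270):
    """Randomly flip an H x W x C image left-right and take a random crop."""
    if rng.random() < 0.5:
        image = image[:, ::-1, :]              # horizontal flip
    h, w = image.shape[:2]
    top = rng.integers(0, h - crop_h + 1)      # random crop origin
    left = rng.integers(0, w - crop_w + 1)
    return image[top:top + crop_h, left:left + crop_w, :]

rng = np.random.default_rng(42)
image = rng.random((200, 300, 3))              # stand-in for one normalized image
augmented = augment(image, rng)
print(augmented.shape)
```

A cropped image would still need to be resized back to the network's input size before training.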