## Computer Vision using a ResNet CNN architecture model

#### Matthew Yeseta, Master Data Science, Indiana (3.8/4.0)

In [1]:
import os
import random
import numpy as np
import matplotlib.pyplot as plt
import keras
import tensorflow as tf
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import roc_curve, auc, accuracy_score, classification_report




The DataLoader class collects and organizes image file paths from a directory tree for use in a machine learning pipeline. It is initialized with a base directory (base_dir) and gathers the path of every file whose directory path contains the substring 'images', then shuffles the resulting list. Shuffling randomizes the order of the files so that the later training, validation, and test slices are not biased by the on-disk ordering (for example, files grouped by class). No random seed is set, so the order, and therefore the split, differs from run to run.

The method _load_paths is private by convention, as the leading underscore indicates. It walks the directory structure from base_dir with os.walk and, for each file, appends the full path when the path of the containing directory includes 'images'. The check is a substring match on the whole directory path rather than an exact match on the folder name, and it does not filter by file extension. After gathering the paths, the method shuffles the list in place with random.shuffle.

Lastly, the split_data method divides the shuffled list of image paths into training, validation, and test subsets using index thresholds at 75% and 82% of the list length: the first 75% of the paths go to training, the next 7% to validation, and the remaining 18% to testing. The method returns a tuple containing these three lists. Holding out validation and test data makes it possible to tune the model and then estimate how well it generalizes to new, unseen images.
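
The threshold arithmetic described above can be sketched on its own, without touching the file system. The function name split_paths and the file names are hypothetical and only illustrate how the 0.75 and 0.82 cut points translate into slice sizes.

```python
def split_paths(paths, train_frac=0.75, val_end_frac=0.82):
    """Slice an already-shuffled list into train, validation, and test parts."""
    train_idx = int(len(paths) * train_frac)    # end of the training slice
    val_idx = int(len(paths) * val_end_frac)    # end of the validation slice
    return paths[:train_idx], paths[train_idx:val_idx], paths[val_idx:]


# Hypothetical file names, used only to show the resulting sizes.
paths = [f"img_{i:03d}.png" for i in range(200)]
train, val, test = split_paths(paths)
print(len(train), len(val), len(test))
```

With 200 paths this gives 150 training, 14 validation, and 36 test items. The three slices are disjoint and together cover the whole list.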

In [2]:
class DataLoader:
    def __init__(self, base_dir):
        self.base_dir = base_dir
        self.path_df = self._load_paths()

    def _load_paths(self):
        path_df = []
        for dirname, _, filenames in os.walk(self.base_dir):
            for filename in filenames:
                if 'images' in dirname:
                    path_df.append(os.path.join(dirname, filename))
        random.shuffle(path_df)
        return path_df

    def split_data(self):
        train_idx = int(len(self.path_df) * 0.75)
        val_idx = int(len(self.path_df) * 0.82)
        return (self.path_df[:train_idx], self.path_df[train_idx:val_idx], self.path_df[val_idx:])


The ImagePreprocessor class contains methods dedicated to processing images, specifically designed for use in a computer vision pipeline. This class operates as a static utility class, meaning it doesn't need to maintain any state and can be called directly on the class itself without needing an instance.

The first method, crop_image, takes a single image as input and performs cropping to remove any zero-padding. It identifies the non-zero regions of the image (those areas that actually contain image data as opposed to the padded zeros) and calculates the minimum and maximum boundaries of these regions along both axes. The image is then cropped to this identified bounding box, effectively removing any parts of the image that contain no information.
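
A small worked example makes the bounding-box idea concrete. This is a minimal sketch on a 2D array, assuming zeros mark the padding; the name crop_to_content is hypothetical. Python slice ends are exclusive, so the sketch adds 1 to the maximum indices to keep the last non-zero row and column.

```python
import numpy as np


def crop_to_content(image):
    """Return the smallest sub-array that contains every non-zero pixel."""
    rows, cols = np.nonzero(image)
    if rows.size == 0:
        # Nothing but padding: there is no bounding box, so return the input.
        return image
    # Slice ends are exclusive, hence the + 1 on the maximum indices.
    return image[rows.min():rows.max() + 1, cols.min():cols.max() + 1]


# A 5x6 array of zeros with a 2x3 block of content, standing in for a padded image.
padded = np.zeros((5, 6), dtype=np.uint8)
padded[1:3, 2:5] = 7
cropped = crop_to_content(padded)
print(cropped.shape)
```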

The second method, process_image_paths, processes a list of image file paths. For each path, it loads the image in grayscale, converts it into an array, and calls crop_image to remove the padding. The cropped image is then resized to a fixed 175x175 pixels and divided by 255 so that pixel values fall in the range [0, 1], which suits neural network input; the result is stored as float16 to save memory. The method also extracts the class label from the path. It takes the third-from-last path component, that is, the directory two levels above the file (the parent of the folder that holds the image), and it assumes '/' as the path separator. Finally, process_image_paths returns two arrays: the processed image data (X) and the corresponding labels (y), which serve as the inputs and targets for training and evaluation.
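
The label extraction can be checked in isolation. The sketch below assumes a layout of the form class folder / 'images' folder / file; the example path and the name label_from_path are hypothetical and not taken from the dataset.

```python
from pathlib import PurePosixPath


def label_from_path(path):
    """Return the third-from-last path component, assumed to be the class folder."""
    return PurePosixPath(path).parts[-3]


# Hypothetical path: <base>/<class folder>/images/<file>
example = "/data/class_a/images/scan_001.png"
label = label_from_path(example)
print(label)
```

For this path the result is 'class_a', the same value that path.split('/')[-3] returns.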

In [3]:
class ImagePreprocessor:
    @staticmethod
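    def crop_image(image):
        # Locate every non-zero pixel; rows are axis 0 and columns are axis 1.
        nonzero_indices = np.argwhere(image != 0)
        if nonzero_indices.size == 0:
            # An all-zero image has no bounding box, so return it unchanged.
            return image
        y_min, y_max = nonzero_indices[:, 0].min(), nonzero_indices[:, 0].max()
        x_min, x_max = nonzero_indices[:, 1].min(), nonzero_indices[:, 1].max()
        # Slice ends are exclusive, so add 1 to keep the last non-zero row and column.
        return image[y_min:y_max + 1, x_min:x_max + 1]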

    @staticmethod
    def process_image_paths(image_paths):
        X, y = [], []
        for path in image_paths:
            img = keras.preprocessing.image.img_to_array(keras.preprocessing.image.load_img(path, color_mode='grayscale'))
            cropped_img = ImagePreprocessor.crop_image(img)
            resized_img = tf.image.resize(cropped_img, (175, 175))
            X.append(np.array(resized_img / 255.0, dtype=np.float16))
            y.append(path.split('/')[-3])
        return np.array(X), np.array(y)


The ResNetModel class encapsulates the construction and setup of a convolutional neural network model specifically designed in the style of ResNet, which is particularly suited for image classification tasks.

When an object of this class is instantiated, the __init__ method builds the network by calling build_model. The build_model method defines a sequential model using Keras, a high-level neural networks API. The model starts with a 2D convolutional layer that has 512 filters, a kernel size of 7, and a stride of 2, and uses 'selu' (Scaled Exponential Linear Unit) as the activation function. The input shape corresponds to single-channel grayscale images of size 175x175. A dropout layer follows; to reduce overfitting, it randomly sets 20% of its input units to 0 during training.

Next comes a second 2D convolutional layer with 128 filters and a smaller kernel size of 3, followed by a max pooling layer with a pool size of 3, a stride of 1, and 'same' padding. Because the stride is 1 and the padding is 'same', this pooling layer keeps the spatial dimensions unchanged: it takes the maximum over each 3x3 neighborhood rather than downsampling. The downsampling in this model comes from the stride-2 first convolution and from the residual units.

The add_residual_units method is then called to append four residual units to the model: two with 128 filters followed by two with 64 filters, each with a stride of 2. Residual units are the defining component of ResNet architectures. Their skip connections let each unit learn a correction to its input rather than a full transformation, and they give gradients a short path through the network, which eases the training of deeper models. Because every unit here uses a stride of 2, each one halves the spatial resolution and routes its skip connection through a 1x1 convolution instead of a plain identity mapping.
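
To see how these strides shape the feature maps, the sketch below traces the spatial size through the layers described above. It relies on the rule that a layer with 'same' padding produces an output of size ceil(input / stride); the helper name same_padding_size is hypothetical. Dropout is omitted because it does not change the shape.

```python
import math


def same_padding_size(size, stride):
    """Output length of a conv or pooling layer that uses 'same' padding."""
    return math.ceil(size / stride)


# (layer name, stride) in the order the layers are stacked.
layers = [("conv 7x7", 2), ("conv 3x3", 1), ("max pool 3x3", 1)]
layers += [(f"residual unit {i + 1}", 2) for i in range(4)]

size = 175
trace = [("input", size)]
for name, stride in layers:
    size = same_padding_size(size, stride)
    trace.append((name, size))

for name, s in trace:
    print(f"{name:16s} {s} x {s}")

# The last residual unit has 64 filters, so Flatten sees size * size * 64 values.
flattened = size * size * 64
print("flattened length:", flattened)
```

Under these assumptions the side length goes 175, 88, 88, 88, 44, 22, 11, 6, and the flattened vector that feeds the first dense layer has 6 x 6 x 64 = 2304 entries.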

After the residual units, the architecture is completed by flattening the multi-dimensional feature maps into a one-dimensional vector, followed by two dense (fully connected) layers of 128 and 64 units with 'selu' activations, each followed by a dropout layer (40% and 30% respectively). These layers interpret the features extracted by the convolutional stages. The final dense layer has four units and a 'softmax' activation, suitable for multi-class classification: it outputs the probability of the input belonging to each of the four classes.

Finally, the model is compiled with the 'sparse_categorical_crossentropy' loss function, which accepts integer class labels such as those produced by LabelEncoder (no one-hot encoding is needed), and the 'adam' optimizer, a widely used default that adapts the step size for each parameter. Accuracy is tracked as a metric so that performance can be monitored during training and evaluation.
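
For a single sample, sparse categorical cross-entropy is the negative log of the probability that the softmax output assigns to the true class. The sketch below computes it by hand for a made-up four-class prediction; the probabilities are illustrative and not model output.

```python
import math


def sparse_categorical_crossentropy(probs, label):
    """Loss for one sample: -log of the probability given to the true class index."""
    return -math.log(probs[label])


# Made-up softmax output over four classes.
probs = [0.1, 0.7, 0.1, 0.1]

loss_if_true_is_1 = sparse_categorical_crossentropy(probs, 1)  # confident and correct
loss_if_true_is_0 = sparse_categorical_crossentropy(probs, 0)  # confident and wrong
print(round(loss_if_true_is_1, 3), round(loss_if_true_is_0, 3))
```

The loss is small when the true class receives a high probability and grows quickly as that probability approaches zero.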

In [4]:
class ResNetModel:
    def __init__(self):
        self.model = self.build_model()

    def build_model(self):
        model = keras.models.Sequential([
            keras.layers.Conv2D(512, 7, strides=2, padding="same", activation="selu", input_shape=(175, 175, 1)),
            keras.layers.Dropout(0.2),
            keras.layers.Conv2D(128, 3, strides=1, padding="same", activation="selu"),
            keras.layers.MaxPool2D(pool_size=3, strides=1, padding="same")
        ])
        
        self.add_residual_units(model)
        
        model.add(keras.layers.Flatten())
        model.add(keras.layers.Dense(128, activation='selu'))
        model.add(keras.layers.Dropout(0.4))
        model.add(keras.layers.Dense(64, activation='selu'))
        model.add(keras.layers.Dropout(0.3))
        model.add(keras.layers.Dense(4, activation='softmax'))
        
        model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
        return model

    def add_residual_units(self, model):
        filters = [128] * 2 + [64] * 2
        for f in filters:
            model.add(ResidualUnit(f, strides=2))


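The ResidualUnit class is a custom layer that extends Keras's Layer class. It represents a building block of a residual neural network, used in deeper networks to help mitigate the vanishing gradient problem. The constructor (__init__) takes the number of filters, the stride, and the activation function. The main path consists of a 3x3 convolution with the given stride, batch normalization, the activation, a second 3x3 convolution with stride 1, and another batch normalization. When the stride is greater than 1, the skip path applies a 1x1 convolution with the same stride followed by batch normalization so that its output shape matches the main path; otherwise the input is passed through unchanged. The call method adds the two paths and applies the activation to the sum. The default activation is selu, a scaled exponential linear unit that aims to improve learning in deep networks by self-normalizing the outputs.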
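
For reference, selu can be written out directly. The sketch below is a scalar version using the two standard constants of the activation; it is meant as an illustration, not as a replacement for keras.activations.selu.

```python
import math

# Standard SELU constants.
SELU_ALPHA = 1.6732632423543772
SELU_SCALE = 1.0507009873554805


def selu(x):
    """Scaled exponential linear unit for a single float."""
    if x > 0:
        return SELU_SCALE * x
    return SELU_SCALE * SELU_ALPHA * (math.exp(x) - 1.0)


for value in (-3.0, -1.0, 0.0, 1.0, 3.0):
    print(value, selu(value))
```

Positive inputs are scaled linearly, while negative inputs saturate toward -SELU_SCALE * SELU_ALPHA (about -1.758).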

In [5]:
class ResidualUnit(keras.layers.Layer):
    def __init__(self, filters, strides=1, activation="selu", **kwargs):
        super().__init__(**kwargs)
        self.activation = keras.activations.get(activation)
        self.main_layers = [
            keras.layers.Conv2D(filters, 3, strides=strides, padding="same"),
            keras.layers.BatchNormalization(),
            self.activation,
            keras.layers.Conv2D(filters, 3, strides=1, padding="same"),
            keras.layers.BatchNormalization()
        ]
        self.skip_layers = []
        if strides > 1:
            self.skip_layers = [
                keras.layers.Conv2D(filters, 1, strides=strides, padding="same"),
                keras.layers.BatchNormalization()
            ]

    def call(self, inputs):
        Z = inputs
        for layer in self.main_layers:
            Z = layer(Z)
        skip_Z = inputs
        for layer in self.skip_layers:
            skip_Z = layer(skip_Z)
        return self.activation(Z + skip_Z)



The MainWorkflow class is designed to encapsulate the entire process of data loading, model training, and evaluation within a computer vision context using a ResNet model. Upon initialization, the class sets up three main components: a DataLoader to manage data retrieval, a LabelEncoder to encode class labels for use in the model, and a ResNetModel which represents the neural network architecture.

The run method of MainWorkflow preprocesses the image data and partitions it into training, validation, and test sets. It does not call split_data; instead it takes fixed slices of the shuffled path list: the first 70 paths for training, the next 10 for validation, and all remaining paths for testing. The ImagePreprocessor loads each slice and converts it into arrays suitable for the model. The LabelEncoder is then fitted on the training labels and used to transform the labels of all three sets into integers, so every class that appears in the validation or test set must also appear in the training slice.

The display_results method is responsible for visualizing the outcomes of the model training and evaluation. It predicts class labels for the test data using the trained ResNet model and computes the accuracy against the true labels. A classification report is then generated to provide detailed performance metrics for each class. Additionally, this method plots the training history, showing both accuracy and loss metrics over the course of training, providing a visual representation of the model's training process. These plots help in understanding how well the model learned from the training data and how it performed on the validation data across training epochs.
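
The label handling around prediction can be sketched without the model. The example below is a plain-Python stand-in for what LabelEncoder and np.argmax do in this workflow: class names are sorted and mapped to integers, the predicted class is the index of the highest probability, and the indices are mapped back to names. The class names and probabilities are made up for illustration.

```python
# Made-up labels standing in for the folder names extracted from the paths.
labels = ["class_b", "class_a", "class_c", "class_a"]

# Like LabelEncoder, assign integer codes to the sorted unique class names.
classes = sorted(set(labels))
to_index = {name: i for i, name in enumerate(classes)}
y_true = [to_index[name] for name in labels]

# Made-up softmax outputs, one row per sample and one column per class.
probs = [
    [0.2, 0.7, 0.1],
    [0.8, 0.1, 0.1],
    [0.3, 0.4, 0.3],
    [0.6, 0.3, 0.1],
]

# Predicted class = index of the largest probability in each row (argmax).
y_pred = [max(range(len(row)), key=row.__getitem__) for row in probs]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
predicted_names = [classes[i] for i in y_pred]  # inverse transform back to names
print(y_true, y_pred, accuracy, predicted_names)
```

Here three of the four predictions match the true labels, so the accuracy is 0.75.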

In [6]:
class MainWorkflow:
    def __init__(self, base_dir):
        self.loader = DataLoader(base_dir)
        self.encoder = LabelEncoder()
        self.model = ResNetModel()

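    def run(self, epochs=10, batch_size=32):
        # Fixed-size slices of the shuffled path list: 70 train, 10 validation, rest test.
        X_train, y_train = ImagePreprocessor.process_image_paths(self.loader.path_df[:70])
        X_val, y_val = ImagePreprocessor.process_image_paths(self.loader.path_df[70:80])
        X_test, y_test = ImagePreprocessor.process_image_paths(self.loader.path_df[80:])

        # Fit the encoder on the training labels only, then reuse it for the other sets.
        y_train = self.encoder.fit_transform(y_train)
        y_val = self.encoder.transform(y_val)
        y_test = self.encoder.transform(y_test)

        # The original cell stops here without training. The fit call and the
        # epochs/batch_size defaults are assumptions added so the workflow runs end to end.
        history = self.model.model.fit(X_train, y_train, validation_data=(X_val, y_val),
                                       epochs=epochs, batch_size=batch_size)
        self.display_results(history.history, X_test, y_test)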

    def display_results(self, history, X_test, y_test):
        prob_pred_ResNet = self.model.model.predict(X_test)
        y_pred_ResNet = np.argmax(prob_pred_ResNet, axis=1)

        print('Accuracy on Test Set:', accuracy_score(y_test, y_pred_ResNet))
        print('Classification Report:')
        print(classification_report(self.encoder.inverse_transform(y_test), self.encoder.inverse_transform(y_pred_ResNet)))
        
        fig, axs = plt.subplots(1, 2, figsize=(20, 7))
        # suptitle sets a figure-level title; plt.title would only label the last subplot.
        fig.suptitle("ResNet Model")
        axs[0].plot(history['accuracy'], label='Accuracy')
        axs[0].plot(history['val_accuracy'], label='Val Accuracy')
        axs[0].legend()
        axs[0].grid()

        axs[1].plot(history['loss'], label='Loss')
        axs[1].plot(history['val_loss'], label='Val Loss')
        axs[1].legend()
        axs[1].grid()

        plt.show()

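Script entry point: the block below runs the workflow only when the file is executed directly, not when it is imported as a module.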

In [None]:

if __name__ == "__main__":
    workflow = MainWorkflow('/kaggle/input')
    workflow.run()
