# **Malaria Detection**

## <b>Problem Definition</b>
**The context:** Malaria is a life-threatening disease caused by Plasmodium parasites, which are transmitted through the bites of infected female Anopheles mosquitoes. Nearly half of the world's population is at risk of malaria. In 2019 there were an estimated 229 million cases and more than 400,000 deaths, 67% of which were children under five. The parasite can remain in the body for over a year without causing symptoms, which makes early detection critical. Traditional diagnosis relies on manual inspection of red blood cells (RBCs) under a microscope, which is labor-intensive, time-consuming, and prone to human error. Automating this process with Machine Learning (ML) and Deep Learning (DL) techniques has shown promise for improving diagnostic accuracy and efficiency. This project aims to develop an AI-based solution for accurate and early malaria detection.<br>


**The objectives:** The goal is to build an efficient computer vision model that can automatically detect malaria by analyzing images of red blood cells. The model should classify each cell image as either parasitized (infected with malaria) or uninfected, enabling fast and accurate diagnosis.<br>


**The key questions:** 
 - Can we accurately detect malaria-infected red blood cells using image data?
 - What deep learning architecture yields the best performance for malaria classification?
 - How can we optimize the model for both accuracy and computational efficiency?
 - What is the minimum amount of data or preprocessing required to achieve high accuracy?
 - How generalizable is the model across different datasets or imaging conditions?<br>


**The problem formulation:** This project aims to solve a binary image classification problem using data science and deep learning techniques. Specifically, the task is to develop a computer vision model that can:
 - Take as input an image of a red blood cell from a blood smear,
 - Automatically analyze the visual features,
 - And classify the image as either parasitized (malaria-infected) or uninfected.

The broader objective is to support rapid, accurate, and scalable malaria diagnosis in clinical and resource-constrained settings, thereby reducing dependence on manual microscopy and enabling timely treatment.

## <b>Data Description </b>

The dataset contains a total of 24,958 training images and 2,600 test images (in color), taken from microscopic images of blood smears. The images belong to the following two categories:<br>


**Parasitized:** The parasitized cells contain the Plasmodium parasite which causes malaria<br>
**Uninfected:** The uninfected cells are free of the Plasmodium parasites<br>


## <b>Important Notes</b>
- All the outputs in the notebook are just for reference and can be different if you follow a different approach.

- There are sections called **Think About It** in the notebook that will help you get a better understanding of the reasoning behind a particular technique/step. Interested learners can take alternative approaches if they want to explore different techniques. 

### <b>Loading libraries</b>

In [2]:
# Importing libraries required to load the data
import zipfile

import os

from PIL import Image

import numpy as np

import pandas as pd

import matplotlib.pyplot as plt

import seaborn as sns

from sklearn.model_selection import train_test_split

from sklearn.preprocessing import MinMaxScaler

# To ignore warnings
import warnings

warnings.filterwarnings('ignore')

# Remove the limit from the number of displayed columns and rows. It helps to see the entire dataframe while printing it
pd.set_option("display.max_columns", None)

pd.set_option("display.max_rows", 200)

### <b>Let us load the data</b>

**Note:** 
- You must download the dataset from the link provided on Olympus and upload it to your Google Drive before executing the code in the next cell.
- In case of any error, please make sure that the path of the file is correct as the path may be different for you.

In [3]:
# Storing the path of the zipped data file (update it if the file is stored elsewhere, e.g. on Google Drive)
path = 'cell_images.zip'

# The data is provided as a zip file so we need to extract the files from the zip file
with zipfile.ZipFile(path, 'r') as zip_ref:

    zip_ref.extractall()

The extracted folder contains separate folders for the train and test data, and each of them contains `parasitized` and `uninfected` subfolders with images of different sizes.

All images must be resized to the same size and converted to a 4D array of shape (number of images, height, width, channels) so that they can be used as input to the convolutional neural network. We also need to create labels for both types of images to be able to train and test the model.

Let's do the same for the training data first and then we will use the same code for the test data as well.
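Since the same code runs twice, it could also be wrapped in a single helper. A minimal sketch, assuming the `parasitized`/`uninfected` folder layout described above; `load_split` is a hypothetical name, and the `.convert("RGB")` call and the sorted file order are additions (the original loops do neither) that guarantee three channels and a reproducible order:

```python
import os

import numpy as np
from PIL import Image


def load_split(split_dir, size=64):
    """Load all images under split_dir/parasitized and split_dir/uninfected.

    Returns a 4D uint8 array of shape (n_images, size, size, 3) and a 1D label
    array (1 = parasitized, 0 = uninfected).
    """
    images, labels = [], []
    for folder_name, label in [("parasitized", 1), ("uninfected", 0)]:
        folder = os.path.join(split_dir, folder_name)
        for file_name in sorted(os.listdir(folder)):
            try:
                with Image.open(os.path.join(folder, file_name)) as raw:
                    # Force 3 channels so every array has the same shape (assumption)
                    img = raw.convert("RGB").resize((size, size))
                    images.append(np.array(img))
            except OSError:
                # Skip files PIL cannot read, e.g. stray non-image files
                continue
            labels.append(label)
    # Stack the list of (size, size, 3) arrays into one 4D array
    return np.stack(images), np.array(labels)
```

`train_images, train_labels = load_split('cell_images/train')` and `test_images, test_labels = load_split('cell_images/test')` would replace the two loops. Like them, it returns all parasitized images first, so the arrays still need shuffling before any order-dependent split.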

In [5]:
# Storing the path of the extracted "train" folder 
train_dir = 'cell_images/train'

# Size of image so that each image has the same size
SIZE = 64

# Empty list to store the training images after they are converted to NumPy arrays
train_images = []

# Empty list to store the training labels (0 - uninfected, 1 - parasitized)
train_labels = []

In [6]:
# We will run the same code for "parasitized" as well as "uninfected" folders within the "train" folder
for folder_name in ['/parasitized/', '/uninfected/']:
    
    # List the names of all image files in the folder
    images_path = os.listdir(train_dir + folder_name)

    for i, image_name in enumerate(images_path):
    
        try:
    
            # Opening each image using the path of that image
            image = Image.open(train_dir + folder_name + image_name)

            # Resizing each image to (64, 64)
            image = image.resize((SIZE, SIZE))

            # Converting images to arrays and appending that array to the empty list defined above
            train_images.append(np.array(image))

            # Creating labels for parasitized and uninfected images
            if folder_name == '/parasitized/':
            
                train_labels.append(1)
           
            else:
           
                train_labels.append(0)
        
        except Exception:

            # Skip any file that cannot be opened as an image
            pass

# Converting lists to arrays
train_images = np.array(train_images)

train_labels = np.array(train_labels)

In [9]:
# Storing the path of the extracted "test" folder 
test_dir = 'cell_images/test'

# Size of image so that each image has the same size (it must be same as the train image size)
SIZE = 64

# Empty list to store the testing images after they are converted to NumPy arrays
test_images = []

# Empty list to store the testing labels (0 - uninfected, 1 - parasitized)
test_labels = []

In [10]:
# We will run the same code for "parasitized" as well as "uninfected" folders within the "test" folder
for folder_name in ['/parasitized/', '/uninfected/']:
    
    # List the names of all image files in the folder
    images_path = os.listdir(test_dir + folder_name)

    for i, image_name in enumerate(images_path):

        try:
            # Opening each image using the path of that image
            image = Image.open(test_dir + folder_name + image_name)
            
            # Resizing each image to (64, 64)
            image = image.resize((SIZE, SIZE))
            
            # Converting images to arrays and appending that array to the empty list defined above
            test_images.append(np.array(image))
            
            # Creating labels for parasitized and uninfected images
            if folder_name == '/parasitized/':

                test_labels.append(1)

            else:

                test_labels.append(0)

        except Exception:

            # Skip any file that cannot be opened as an image
            pass

# Converting lists to arrays
test_images = np.array(test_images)

test_labels = np.array(test_labels)

### <b> Checking the shape of train and test images

In [13]:
print("Shape of training images:", train_images.shape)
print("Shape of testing images:", test_images.shape)

Shape of training images: (24958, 64, 64, 3)
Shape of testing images: (2600, 64, 64, 3)


### <b> Checking the shape of train and test labels

In [14]:
print("Shape of training labels:", train_labels.shape)
print("Shape of testing labels:", test_labels.shape)

Shape of training labels: (24958,)
Shape of testing labels: (2600,)


#### <b> 📊 Observations:

🔹 Training Data
 - `train_images.shape = (24958, 64, 64, 3)`: we have 24,958 RGB images, each of size 64x64 pixels.
 - `train_labels.shape = (24958,)`: there are 24,958 corresponding labels, so every image has exactly one label.

🔹 Testing Data
 - `test_images.shape = (2600, 64, 64, 3)`: the test set contains 2,600 RGB images, also of size 64x64.
 - `test_labels.shape = (2600,)`: there are 2,600 labels, again a 1-to-1 match with the test images.

<b> 📈 Insights:

✅ Data is well-aligned

The number of images matches the number of labels in both training and testing sets, so our dataset is properly structured for supervised learning.

📦 We're using RGB images

The 3 in the shape (last dimension) confirms all our images are in color (RGB), not grayscale.

🧪 We’re using a ~90/10 split

Total images = 24,958 (train) + 2,600 (test) = 27,558

Train set = ~90.6%

Test set = ~9.4%

✅ This is a reasonable and commonly used split for image classification tasks.

📐 Uniform Image Dimensions

All images are resized to a uniform 64x64, which gives the CNN a fixed input shape and keeps memory use and training time manageable.


### <b>Check the minimum and maximum range of pixel values for train and test images

In [15]:
print("Train images pixel range:", train_images.min(), "to", train_images.max())
print("Test images pixel range:", test_images.min(), "to", test_images.max())


Train images pixel range: 0 to 255
Test images pixel range: 0 to 255


#### <b> Observations:
 - The pixel values for both train and test images range from 0 to 255.
 - This indicates that the images are in standard 8-bit RGB format, where each channel (Red, Green, Blue) can take values from 0 (darkest) to 255 (brightest).
 - The pixel range is consistent across both datasets, which is important for model training and evaluation.

 <b> Insights:
  - Since the pixel values are not normalized, they should be scaled to the [0, 1] range before being fed to the neural network. This typically makes training more stable and can speed up convergence.
  - The train and test sets share the same pixel range, so the model will not see values at inference time that lie outside the range it was trained on. Matching minimum and maximum values alone do not show that the two distributions are similar, however.
  - The minimum and maximum confirm that all values lie in the expected 8-bit range, but this check alone cannot rule out corrupted or mislabeled images; a visual inspection of sample images is still needed.
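A slightly stronger, still informal, check than comparing minimum and maximum values is to compare per-channel means and standard deviations of the two sets. A sketch; `channel_stats` is a hypothetical helper, and the arrays below are synthetic stand-ins for `train_images` and `test_images`:

```python
import numpy as np


def channel_stats(images):
    """Per-channel mean and standard deviation of an (N, H, W, C) image array."""
    return images.mean(axis=(0, 1, 2)), images.std(axis=(0, 1, 2))


rng = np.random.default_rng(42)
# Synthetic stand-ins; on the real data a random subset keeps memory use low, e.g.
# sample = train_images[rng.choice(len(train_images), 2000, replace=False)]
fake_train = rng.integers(0, 256, size=(200, 64, 64, 3), dtype=np.uint8)
fake_test = rng.integers(0, 256, size=(50, 64, 64, 3), dtype=np.uint8)

for name, arr in [("train", fake_train), ("test", fake_test)]:
    mean, std = channel_stats(arr)
    print(name, "mean per channel:", mean.round(1), "std per channel:", std.round(1))
```

Large differences between the train and test rows would point to a shift in staining or imaging conditions that the min/max check cannot reveal.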



### <b> Count the number of images in each class (uninfected and parasitized)

In [22]:
print("Training label counts:")
print(pd.Series(train_labels).value_counts())
print(30*"*")
print("Test label counts:")
print(pd.Series(test_labels).value_counts())

Training label counts:
1    12582
0    12376
Name: count, dtype: int64
******************************
Test label counts:
1    1300
0    1300
Name: count, dtype: int64


### <b>Normalize the images

In [None]:
# Normalizing the train and test images by dividing them by 255 and converting them to float32 using the astype function
train_images = (train_images / 255).astype('float32')

test_images = (test_images / 255).astype('float32')

####<b> Observations and insights:
 - After dividing by 255, all pixel values lie in the [0, 1] range and are stored as float32.

###<b> Plot to check if the data is balanced

In [None]:
# You are free to use bar plot or pie-plot or count plot, etc. to plot the labels of train and test data and check if they are balanced



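A minimal sketch for the empty cell above, using a bar chart per split. `plot_class_balance` is a hypothetical helper; it expects the integer labels, so it has to run before the one-hot encoding step later in the notebook:

```python
import matplotlib.pyplot as plt
import numpy as np


def plot_class_balance(train_labels, test_labels):
    """Bar chart of class counts for integer labels (0 = uninfected, 1 = parasitized)."""
    fig, axes = plt.subplots(1, 2, figsize=(10, 4))
    counts = {}
    for ax, (name, labels) in zip(axes, [("Train", train_labels), ("Test", test_labels)]):
        # minlength=2 guarantees a count for both classes even if one is absent
        class_counts = np.bincount(np.asarray(labels), minlength=2)
        counts[name] = class_counts
        ax.bar(["Uninfected", "Parasitized"], class_counts, color=["tab:green", "tab:red"])
        ax.set_title(f"{name} label counts")
        ax.set_ylabel("Number of images")
    fig.tight_layout()
    return fig, counts
```

`fig, counts = plot_class_balance(train_labels, test_labels)` draws one panel per split and also returns the raw counts.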
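####<b> Observations and insights:
 - The training set is nearly balanced: 12,582 parasitized (about 50.4%) vs 12,376 uninfected (about 49.6%) images.
 - The test set is perfectly balanced, with 1,300 images per class.
 - Since the classes are balanced, accuracy is a meaningful headline metric, and resampling or class weighting is unlikely to be needed.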

### <b>Data Exploration</b>
Let's visualize the images from the train data

In [None]:
# This code will help you in visualizing both the parasitized and uninfected images
np.random.seed(42)

plt.figure(1, figsize = (16 , 16))

for n in range(1, 17):

    plt.subplot(4, 4, n)

    index = np.random.randint(0, train_images.shape[0])

    if train_labels[index] == 1: 

        plt.title('parasitized')

    else:
        plt.title('uninfected')

    plt.imshow(train_images[index])

    plt.axis('off')

####<b> Observations and insights: _____

###<b> Similarly visualize the images with subplot(6, 6) and figsize = (12, 12)

In [None]:
# Hint: Have a keen look into the number of iterations that the for loop should iterate



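The loop has to run 36 times (6 x 6) instead of 16. A minimal sketch as a reusable function; `show_random_grid` is a hypothetical name, it assumes integer labels (before one-hot encoding), and it draws indices with NumPy's `default_rng` rather than the global seed used above, so the selected images will differ:

```python
import matplotlib.pyplot as plt
import numpy as np


def show_random_grid(images, labels, rows=6, cols=6, figsize=(12, 12), seed=42):
    """Plot rows * cols randomly chosen images with their class names as titles."""
    rng = np.random.default_rng(seed)
    # One random index per subplot: 6 x 6 = 36 iterations
    indices = rng.integers(0, len(images), size=rows * cols)
    fig = plt.figure(figsize=figsize)
    for n, index in enumerate(indices, start=1):
        ax = fig.add_subplot(rows, cols, n)
        ax.set_title("parasitized" if labels[index] == 1 else "uninfected")
        ax.imshow(images[index])
        ax.axis("off")
    return fig
```

`show_random_grid(train_images, train_labels)` would reproduce the requested `subplot(6, 6)` layout with `figsize = (12, 12)`.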
####<b>Observations and insights:

###<b> Plotting the mean images for parasitized and uninfected

In [None]:
# Function to find and display the mean image
def find_mean_img(full_mat, title):

    # Average over all images; [0] drops the extra axis added by wrapping each image in a list
    mean_img = np.mean(full_mat, axis = 0)[0]

    # Display the mean image
    plt.imshow(mean_img)

    plt.title(f'Average {title}')

    plt.axis('off')

    plt.show()

    return mean_img

<b> Mean image for parasitized

In [None]:
# If the label = 1 then the image is parasitized and if the label = 0 then the image is uninfected
parasitized_data = []  # Create a list to store the parasitized data

for img, label in zip(train_images, train_labels):

    if label == 1:

        parasitized_data.append([img])

parasitized_mean = find_mean_img(np.array(parasitized_data), 'Parasitized')   # find the mean

<b> Mean image for uninfected

In [None]:
# Similarly write the code to find the mean image of uninfected




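As a sketch for the uninfected case, boolean indexing selects each class directly instead of building a Python list of wrapped images. `class_mean_images` is a hypothetical helper; it assumes integer labels (before one-hot encoding) and images already scaled to [0, 1], so that `plt.imshow` displays the float mean correctly. The absolute difference between the two means is an optional extra view:

```python
import numpy as np


def class_mean_images(images, labels):
    """Mean image of the parasitized (1) and uninfected (0) classes, via boolean masks."""
    labels = np.asarray(labels)
    parasitized_mean = images[labels == 1].mean(axis=0)
    uninfected_mean = images[labels == 0].mean(axis=0)
    return parasitized_mean, uninfected_mean


# Synthetic stand-in: 4 images of 2x2 pixels, the first two brighter than the rest
images = np.zeros((4, 2, 2, 3), dtype=np.float32)
images[:2] = 0.8
labels = np.array([1, 1, 0, 0])

parasitized_mean, uninfected_mean = class_mean_images(images, labels)
# Optional: average absolute difference across channels, one value per pixel
difference = np.abs(parasitized_mean - uninfected_mean).mean(axis=-1)
print(uninfected_mean.shape, difference)
```

In the notebook the call would be `class_mean_images(train_images, train_labels)`, followed by `plt.imshow(uninfected_mean)`; `plt.imshow(difference, cmap='hot')` highlights where the two averages differ most.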
####<b> Observations and insights: _____

### <b>Converting RGB to HSV of Images using OpenCV

###<b> Converting the train data

In [None]:
import cv2

gfx=[]   # to hold the HSV image array

for i in np.arange(0, 100, 1):

  a = cv2.cvtColor(train_images[i], cv2.COLOR_RGB2HSV)  # PIL loads images in RGB order, not BGR
  
  gfx.append(a)

gfx = np.array(gfx)

In [None]:
viewimage = np.random.randint(1, 100, 5)

fig, ax = plt.subplots(1, 5, figsize = (18, 18))

for t, i in zip(range(5), viewimage):

    Title = train_labels[i]

    ax[t].set_title(Title)

    ax[t].imshow(gfx[i])

    ax[t].set_axis_off()

# Adjusting the spacing once, after all subplots are drawn
fig.tight_layout()

###<b> Converting the test data

In [None]:
# Similarly you can visualize for the images in the test data

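One way to fill in the test-data cell, sketched with a swapped-in tool: `matplotlib.colors.rgb_to_hsv` instead of OpenCV. For float inputs, `cv2.cvtColor` returns hue in degrees (0 to 360), whereas `rgb_to_hsv` returns all three channels in [0, 1], so the result can go straight to `plt.imshow` (which still renders H, S, V as if they were R, G, B, i.e. in false color). `to_hsv` is a hypothetical helper:

```python
import numpy as np
from matplotlib.colors import rgb_to_hsv


def to_hsv(images, n=100):
    """Convert the first n RGB images to HSV, with all channels in [0, 1]."""
    batch = np.asarray(images[:n])
    if batch.dtype == np.uint8:
        # Raw 8-bit images: scale to [0, 1] first, as rgb_to_hsv expects
        batch = batch.astype(np.float32) / 255.0
    # Clip guards against tiny floating-point excursions outside [0, 1]
    return rgb_to_hsv(np.clip(batch, 0.0, 1.0))
```

`hsv_test = to_hsv(test_images)` followed by the same plotting loop used for `gfx` would show the test images.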
####<b>Observations and insights: _____

###<b> Processing Images using Gaussian Blurring

###<b> Gaussian Blurring on train data

In [None]:
gbx = []  # To hold the blurred images

for i in np.arange(0, 100, 1):

  b = cv2.GaussianBlur(train_images[i], (5, 5), 0)

  gbx.append(b)

gbx = np.array(gbx)

In [None]:
viewimage = np.random.randint(1, 100, 5)

fig, ax = plt.subplots(1, 5, figsize = (18, 18))

for t, i in zip(range(5), viewimage):

    Title = train_labels[i]

    ax[t].set_title(Title)

    ax[t].imshow(gbx[i])

    ax[t].set_axis_off()

# Adjusting the spacing once, after all subplots are drawn
fig.tight_layout()

###<b> Gaussian Blurring on test data

In [None]:
# Similarly you can apply Gaussian blurring for the images in the test data

####**Observations and insights: _____**

**Think About It:** Would blurring help us for this problem statement in any way? What else can we try?
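One way to reason about this question is to measure how much fine detail blurring removes. The sketch below uses the variance of a discrete Laplacian as an informal sharpness score (a common heuristic, not something from this notebook) and implements a separable Gaussian blur in NumPy so that it runs without OpenCV. Sigma 1.1 approximates what OpenCV derives for a 5 x 5 kernel when `sigmaX=0`, per the formula in the `cv2.getGaussianKernel` documentation; border handling may differ slightly (reflect padding here). If small parasite structures are what separates the classes, a large drop in sharpness suggests blurring may discard useful signal:

```python
import numpy as np


def gaussian_kernel_1d(size=5, sigma=1.1):
    """Normalized 1D Gaussian kernel."""
    x = np.arange(size) - (size - 1) / 2
    kernel = np.exp(-(x ** 2) / (2 * sigma ** 2))
    return kernel / kernel.sum()


def gaussian_blur(gray, size=5, sigma=1.1):
    """Separable Gaussian blur of a 2D array with reflect padding."""
    kernel = gaussian_kernel_1d(size, sigma)
    pad = size // 2
    padded = np.pad(gray.astype(np.float64), pad, mode="reflect")
    # Convolve every row, then every column; 'valid' mode strips the padding again
    rows = np.apply_along_axis(np.convolve, 1, padded, kernel, mode="valid")
    return np.apply_along_axis(np.convolve, 0, rows, kernel, mode="valid")


def laplacian_variance(gray):
    """Variance of the 4-neighbour Laplacian: higher means more fine detail."""
    g = gray.astype(np.float64)
    lap = g[:-2, 1:-1] + g[2:, 1:-1] + g[1:-1, :-2] + g[1:-1, 2:] - 4 * g[1:-1, 1:-1]
    return float(lap.var())


rng = np.random.default_rng(0)
image = rng.random((64, 64))  # stand-in for one grayscale cell image
print("sharpness before blur:", laplacian_variance(image))
print("sharpness after blur: ", laplacian_variance(gaussian_blur(image)))
```

For an RGB cell image, a grayscale version such as `train_images[i].mean(axis=-1)` can be passed in.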

###<B>One Hot Encoding on the train and test labels

In [None]:
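# Importing to_categorical here so that this cell does not depend on a later import
from tensorflow.keras.utils import to_categorical

# Encoding train labels (run this cell only once: re-running it would encode the already one-hot labels again)
train_labels = to_categorical(train_labels, 2)

# Similarly, encoding the test labels
test_labels = to_categorical(test_labels, 2)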
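For integer labels 0 and 1, `to_categorical(labels, 2)` produces the same rows as indexing into a 2 x 2 identity matrix, which is a handy way to see what the encoding does; `np.argmax(..., axis = 1)`, used later for the confusion matrix, reverses it. A NumPy sketch with made-up labels:

```python
import numpy as np

labels = np.array([1, 0, 0, 1])  # 1 = parasitized, 0 = uninfected (made-up values)

# Row i of the identity matrix is the one-hot vector for class i
one_hot = np.eye(2, dtype=np.float32)[labels]
print(one_hot)

# argmax along the class axis recovers the integer labels
recovered = one_hot.argmax(axis=1)
print(recovered)
```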

### **Base Model**

**Note:** The Base Model has been fully built and evaluated with all outputs shown to give an idea about the process of the creation and evaluation of the performance of a CNN architecture. A similar process can be followed in iterating to build better-performing CNN architectures.

###<b> Importing the required libraries for building and training our Model

In [None]:
# Importing the libraries needed to build and train the models
import random

import tensorflow as tf

from tensorflow.keras import backend, optimizers

from tensorflow.keras.utils import to_categorical

from tensorflow.keras.models import Sequential

from tensorflow.keras.layers import Conv2D, MaxPooling2D, Dense, Flatten, Dropout, BatchNormalization, LeakyReLU

from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint

# Clearing the backend
backend.clear_session()

# Fixing the seeds for the random number generators so that we get the same output every time
np.random.seed(42)

random.seed(42)

tf.random.set_seed(42)

###<b> Building the model

In [None]:
# Creating sequential model
model = Sequential()

model.add(Conv2D(filters = 32, kernel_size = 2, padding = "same", activation = "relu", input_shape = (64, 64, 3)))

model.add(MaxPooling2D(pool_size = 2))

model.add(Dropout(0.2))

model.add(Conv2D(filters = 32, kernel_size = 2, padding = "same", activation = "relu"))

model.add(MaxPooling2D(pool_size = 2))

model.add(Dropout(0.2))

model.add(Conv2D(filters = 32, kernel_size = 2, padding = "same", activation = "relu"))

model.add(MaxPooling2D(pool_size = 2))

model.add(Dropout(0.2))

model.add(Flatten())

model.add(Dense(512, activation = "relu"))

model.add(Dropout(0.4))

model.add(Dense(2, activation = "softmax")) # 2 represents output layer neurons 

model.summary()
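The shapes and parameter counts that `model.summary()` reports can be checked by hand. With `padding = "same"` each convolution keeps the spatial size and each 2 x 2 max pool halves it (64 → 32 → 16 → 8), so `Flatten` outputs 8 x 8 x 32 = 2,048 values. A Conv2D layer has k·k·C_in·C_out weights plus C_out biases, and a Dense layer has n_in·n_out weights plus n_out biases; pooling and dropout layers have no parameters. The hypothetical helpers below encode those two formulas for the base model:

```python
def conv2d_params(kernel_size, in_channels, filters):
    """Weights plus biases of a Conv2D layer with a square kernel."""
    return kernel_size * kernel_size * in_channels * filters + filters


def dense_params(n_in, n_out):
    """Weights plus biases of a Dense layer."""
    return n_in * n_out + n_out


size = 64
in_channels = 3
layers = []
for i in range(3):
    layers.append((f"conv{i + 1}", conv2d_params(2, in_channels, 32)))
    in_channels = 32
    size //= 2  # MaxPooling2D(pool_size = 2) halves height and width

flat = size * size * in_channels
layers.append(("dense_512", dense_params(flat, 512)))
layers.append(("dense_2", dense_params(512, 2)))

for name, count in layers:
    print(f"{name:>10}: {count:,}")
total = sum(count for _, count in layers)
print("flatten size:", flat, "total params:", f"{total:,}")
```

Almost all of the parameters (1,049,088 of 1,058,786) sit in the first Dense layer.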

###<b> Compiling the model

In [None]:
model.compile(loss = 'binary_crossentropy', optimizer = 'adam', metrics = ['accuracy'])
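The base model pairs a 2-unit softmax with `binary_crossentropy`, while `categorical_crossentropy` is the more usual pairing for one-hot labels. For two classes the two give the same value: Keras's binary cross-entropy averages the per-output losses over the last axis, so with one-hot target (0, 1) and softmax output (p0, p1), where p0 = 1 - p1, it equals -(1/2)[log(1 - p0) + log p1] = -log p1, which is exactly the categorical cross-entropy. A NumPy check, ignoring the small clipping epsilon Keras applies to predictions:

```python
import numpy as np


def binary_crossentropy(y_true, y_pred):
    """Element-wise BCE averaged over the last axis (one value per sample)."""
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred), axis=-1)


def categorical_crossentropy(y_true, y_pred):
    """CCE: minus the log-probability assigned to the true class."""
    return -np.sum(y_true * np.log(y_pred), axis=-1)


y_true = np.array([[0.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
logits = np.array([[0.2, 1.5], [2.0, -1.0], [0.3, 0.1]])
y_pred = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)  # softmax

print(binary_crossentropy(y_true, y_pred))
print(categorical_crossentropy(y_true, y_pred))
```

So for this two-class setup, switching to `categorical_crossentropy` should not materially change training; it mainly states the intent more directly.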

<b> Using Callbacks 

In [None]:
callbacks = [EarlyStopping(monitor = 'val_loss', patience = 2),
             ModelCheckpoint('.mdl_wts.hdf5', monitor = 'val_loss', save_best_only = True)]
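`EarlyStopping(monitor = 'val_loss', patience = 2)` stops training once `val_loss` has failed to improve for 2 consecutive epochs. A pure-Python sketch of that rule with invented loss values; `early_stopping_epoch` is a hypothetical helper. Note that, unless `restore_best_weights=True` is passed to `EarlyStopping`, Keras leaves the model with the weights from the last epoch it ran, not the best one; here the best model is only in the file written by `ModelCheckpoint`:

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return (last_epoch_run, best_epoch), both 0-based, for a minimized metric."""
    best = float("inf")
    best_epoch = 0
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:  # any strict decrease counts as an improvement (min_delta = 0)
            best, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch


val_losses = [0.52, 0.41, 0.44, 0.43, 0.30]
last, best = early_stopping_epoch(val_losses, patience=2)
print(f"training stops after epoch {last + 1}; best val_loss at epoch {best + 1}")
```

The lower loss of 0.30 in the fifth epoch is never reached, which is the trade-off a small `patience` makes.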

<b> Fit and train our Model

In [None]:
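# Shuffling the training data first: validation_split takes the last 20% of the arrays,
# which would otherwise contain only uninfected images (they were loaded after the parasitized ones)
perm = np.random.permutation(len(train_images))

train_images, train_labels = train_images[perm], train_labels[perm]

# Fit the model with a mini-batch size of 32 (the batch size can be tuned, typically to a power of 2)
history = model.fit(train_images, train_labels, batch_size = 32, callbacks = callbacks, validation_split = 0.2, epochs = 20, verbose = 1)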
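The Keras `Model.fit` documentation states that the validation data is selected from the last samples of `x` and `y`, before any shuffling. Because the loading loops append every parasitized image before any uninfected one, an unshuffled `validation_split = 0.2` would validate on uninfected cells only. A NumPy sketch using the class counts printed earlier (the exact index at which Keras cuts may differ by one sample):

```python
import numpy as np

# Label order produced by the loading loops: all parasitized (1) first, then uninfected (0)
labels = np.concatenate([np.ones(12582, dtype=int), np.zeros(12376, dtype=int)])

split_at = int(len(labels) * (1 - 0.2))  # roughly where validation_split = 0.2 cuts
print("classes in unshuffled validation part:", np.unique(labels[split_at:]))

rng = np.random.default_rng(42)
shuffled = labels[rng.permutation(len(labels))]
print("parasitized share after shuffling:", shuffled[split_at:].mean().round(3))
```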

###<b> Evaluating the model on test data

In [None]:
accuracy = model.evaluate(test_images, test_labels, verbose = 1)
print('\n', 'Test_Accuracy:-', accuracy[1])

<b> Plotting the confusion matrix

In [None]:
from sklearn.metrics import classification_report

from sklearn.metrics import confusion_matrix

pred = model.predict(test_images)

pred = np.argmax(pred, axis = 1) 

y_true = np.argmax(test_labels, axis = 1)

# Printing the classification report
print(classification_report(y_true, pred))

# Plotting the heatmap using confusion matrix
cm = confusion_matrix(y_true, pred)

plt.figure(figsize = (8, 5))

sns.heatmap(cm, annot = True,  fmt = '.0f', xticklabels = ['Uninfected', 'Parasitized'], yticklabels = ['Uninfected', 'Parasitized'])

plt.ylabel('Actual')

plt.xlabel('Predicted')

plt.show()
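`classification_report` derives its per-class numbers from the same counts as the confusion matrix. In scikit-learn's layout, rows are actual classes and columns are predicted classes, so for the parasitized class (label 1): precision = TP / (TP + FP) and recall = TP / (TP + FN). For diagnosis, recall on the parasitized class (the share of infected cells that are caught) is arguably the number to watch. A sketch with a made-up matrix; `class_metrics` is a hypothetical helper:

```python
import numpy as np


def class_metrics(cm, positive=1):
    """Precision, recall and F1 for one class from a square confusion matrix.

    Rows are actual classes and columns are predicted classes (scikit-learn layout).
    """
    cm = np.asarray(cm, dtype=float)
    tp = cm[positive, positive]
    fp = cm[:, positive].sum() - tp  # predicted positive but actually another class
    fn = cm[positive, :].sum() - tp  # actually positive but predicted as another class
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts: [[uninfected->uninfected, uninfected->parasitized],
#                       [parasitized->uninfected, parasitized->parasitized]]
cm = [[50, 10],
      [5, 35]]
p, r, f1 = class_metrics(cm, positive=1)
print(f"parasitized: precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```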

<b>Plotting the train and validation curves

In [None]:
# Function to plot train and validation accuracy 
def plot_accuracy(history):

    N = len(history.history["accuracy"])

    plt.figure(figsize = (7, 7))

    plt.plot(np.arange(0, N), history.history["accuracy"], label = "train_accuracy", ls = '--')

    plt.plot(np.arange(0, N), history.history["val_accuracy"], label = "val_accuracy", ls = '--')

    plt.title("Accuracy vs Epoch")
    
    plt.xlabel("Epochs")
    
    plt.ylabel("Accuracy")
    
    plt.legend(loc="upper left")

In [None]:
plot_accuracy(history)



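* Here we can observe that both the training and validation accuracy increase over the epochs.
* We can also notice that the validation accuracy is slightly higher than the training accuracy. Part of this gap is expected, because Dropout is active while the training accuracy is computed but not during validation. Note also that if the arrays are not shuffled before `fit`, the `validation_split` portion consists only of uninfected images, so the validation score would not be representative.

Next, let's build another model with a few more layers to see whether the performance improves. Try adding layers where needed and experimenting with different activation functions.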

###<b> Model 1
####<b> Trying to improve the performance of our model by adding new layers


In [None]:
backend.clear_session() # Clearing the backend for new model

###<b> Building the Model

In [None]:
# Creating sequential model
model1 = Sequential()




# Build the model here and add new layers





model1.summary()

###<b> Compiling the model

In [None]:
# Using the same loss and optimizer as the base model
model1.compile(loss = 'binary_crossentropy', optimizer = 'adam', metrics = ['accuracy'])

<b> Using Callbacks

In [None]:
callbacks = [EarlyStopping(monitor = 'val_loss', patience = 2),
             ModelCheckpoint('.mdl_wts.hdf5', monitor = 'val_loss', save_best_only = True)]

<b>Fit and Train the model

In [None]:
# Using the same training settings as the base model (tune as needed)
history1 = model1.fit(train_images, train_labels, batch_size = 32, callbacks = callbacks, validation_split = 0.2, epochs = 20, verbose = 1)

###<b> Evaluating the model

In [None]:
accuracy1 = model1.evaluate(test_images, test_labels, verbose = 1)

print('\n', 'Test_Accuracy:-', accuracy1[1])

<b> Plotting the confusion matrix

<b> Plotting the train and the validation curves

###<b>Think about it:</b><br>
Now let's build a model with LeakyReLU as the activation function.

*  Can the model performance be improved if we change the activation function to LeakyReLU?
*  Can BatchNormalization improve our model?

Let us build a model that uses BatchNormalization layers and LeakyReLU as the activation function.
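What the two layers compute, as a NumPy sketch: LeakyReLU passes positive inputs through and multiplies negative inputs by a small slope instead of zeroing them, so those units still receive a gradient. BatchNormalization in training mode standardizes each feature over the batch and then applies a learned scale (gamma) and shift (beta). The slope 0.1 and epsilon 1e-3 below are illustrative choices; Keras's layers define their own defaults, BatchNormalization after a convolution normalizes per channel over the batch, height and width axes, and at inference time it uses moving averages rather than batch statistics:

```python
import numpy as np


def leaky_relu(x, slope=0.1):
    """x for x > 0, slope * x otherwise."""
    return np.where(x > 0, x, slope * x)


def batch_norm_train(x, gamma=1.0, beta=0.0, eps=1e-3):
    """Normalize each feature (last axis) over the batch axis, then scale and shift."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta


print(leaky_relu(np.array([-2.0, 0.0, 3.0])))

rng = np.random.default_rng(0)
activations = rng.normal(loc=5.0, scale=3.0, size=(32, 4))  # batch of 32, 4 features
normed = batch_norm_train(activations)
print(normed.mean(axis=0).round(3), normed.std(axis=0).round(3))
```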

###<b> Model 2 with Batch Normalization

In [None]:
backend.clear_session() # Clearing the backend for new model

###<b> Building the Model

In [None]:
model2 = Sequential()

model2.add(Conv2D(32, (3, 3), input_shape = (64, 64, 3), padding = 'same'))

'''

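Complete this model using BatchNormalization layers and by using LeakyReLU as the activation function


'''

# Importing optimizers here in case it was not imported earlier
from tensorflow.keras import optimizers

adam = optimizers.Adam(learning_rate = 0.001)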

model2.summary()

###<b>Compiling the model

In [None]:
model2.compile(loss = "binary_crossentropy", optimizer = adam, metrics = ['accuracy'])

<b> Using callbacks

In [None]:
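# Creating the callbacks as in the base model: EarlyStopping halts training once val_loss stops improving,
# and ModelCheckpoint saves the best model seen so far
callbacks = [EarlyStopping(monitor = 'val_loss', patience = 2),
             ModelCheckpoint('.mdl_wts.hdf5', monitor = 'val_loss', save_best_only = True)]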

<b>Fit and train the model

In [None]:
history2 = model2.fit(train_images, train_labels, batch_size = 32, callbacks = callbacks, validation_split = 0.2, epochs = 20, verbose = 1)

<b>Plotting the train and validation accuracy

In [None]:
# Plotting the accuracies
plot_accuracy(history2)


###<b>Evaluating the model

In [None]:
# Evaluate the model to calculate the accuracy

accuracy = model2.evaluate(test_images, test_labels, verbose = 1)

print('\n', 'Test_Accuracy:-', accuracy[1])

####<b>Observations and insights: ____

<b> Generate the classification report and confusion matrix 

In [None]:
from sklearn.metrics import classification_report

from sklearn.metrics import confusion_matrix

pred = model2.predict(test_images)

pred = np.argmax(pred, axis = 1) 

y_true = np.argmax(test_labels, axis = 1)

# Printing the classification report
print(classification_report(y_true, pred))

# Plotting the heatmap using confusion matrix

cm = confusion_matrix(y_true, pred)

plt.figure(figsize = (8, 5))

sns.heatmap(cm, annot = True,  fmt = '.0f', xticklabels = ['Uninfected', 'Parasitized'], yticklabels = ['Uninfected', 'Parasitized'])

plt.ylabel('Actual')

plt.xlabel('Predicted')

plt.show()

###**Think About It :**<br>

* Can we improve the model with Image Data Augmentation?
* References to image data augmentation can be seen below:
  *   [Image Augmentation for Computer Vision](https://www.mygreatlearning.com/blog/understanding-data-augmentation/)
  *   [How to Configure Image Data Augmentation in Keras?](https://machinelearningmastery.com/how-to-configure-image-data-augmentation-when-training-deep-learning-neural-networks/)
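Data augmentation creates label-preserving variants of the training images on the fly, so the model sees slightly different cells in each epoch. As a simplified NumPy illustration (not what `ImageDataGenerator` does internally: its `rotation_range` and `zoom_range` use arbitrary angles and zoom factors with interpolation), random horizontal flips combined with 90-degree rotations already give up to 8 distinct orientations of a square cell image without any interpolation. `random_flip_rotate` and `augment_batch` are hypothetical helpers:

```python
import numpy as np


def random_flip_rotate(image, rng):
    """Randomly flip horizontally and rotate by a multiple of 90 degrees (square H, W, C image)."""
    if rng.random() < 0.5:
        image = image[:, ::-1]  # horizontal flip: reverse the width axis
    k = int(rng.integers(0, 4))
    return np.rot90(image, k=k, axes=(0, 1))


def augment_batch(images, seed=42):
    """Apply random_flip_rotate independently to every image in an (N, H, W, C) batch."""
    rng = np.random.default_rng(seed)
    return np.stack([random_flip_rotate(img, rng) for img in images])


rng = np.random.default_rng(0)
batch = rng.random((8, 64, 64, 3)).astype(np.float32)
augmented = augment_batch(batch)
print(augmented.shape, augmented.dtype)
```

Because flips and 90-degree rotations only rearrange pixels, the class of a cell is unchanged, which is what makes them safe augmentations here.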





###<b>Model 3 with Data Augmentation

In [None]:
backend.clear_session() # Clearing backend for new model

###<b> Using image data generator

In [None]:
from sklearn.model_selection import train_test_split

X_train, X_val, y_train, y_val = train_test_split(train_images, train_labels, test_size = 0.2, random_state = 42)

from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Using ImageDataGenerator to generate images
train_datagen = ImageDataGenerator(horizontal_flip = True, 
                                  zoom_range = 0.5, rotation_range = 30)

val_datagen  = ImageDataGenerator()

# Flowing training images using train_datagen generator
train_generator = train_datagen.flow(x = X_train, y = y_train, batch_size = 64, seed = 42, shuffle = True)


# Flowing validation images using val_datagen generator
val_generator =  val_datagen.flow(x = X_val, y = y_val, batch_size = 64, seed = 42, shuffle = True)

###**Think About It :**<br>

*  Check if the performance of the model can be improved by changing different parameters in the ImageDataGenerator.



####<B>Visualizing Augmented images

In [None]:
# Creating an iterable for images and labels from the training data
images, labels = next(train_generator)

# Plotting 16 images from the training data
fig, axes = plt.subplots(4, 4, figsize = (16, 16))

for (image, label, ax) in zip(images, labels, axes.flatten()):

    ax.imshow(image)

    if label[1] == 1: 

        ax.set_title('parasitized')

    else:

        ax.set_title('uninfected')

    ax.axis('off')

####<b>Observations and insights: ____

###<b>Building the Model

In [None]:
model3 = Sequential()

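# Build the model here

# Use this as the optimizer (optimizers is imported here in case it was not imported earlier)
from tensorflow.keras import optimizers

adam = optimizers.Adam(learning_rate = 0.001)

model3.compile(loss = 'binary_crossentropy', optimizer = adam, metrics = ['accuracy'])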

model3.summary()

<b>Using Callbacks

In [None]:
callbacks = [EarlyStopping(monitor = 'val_loss', patience = 2),
             ModelCheckpoint('.mdl_wts.hdf5', monitor = 'val_loss', save_best_only = True)]

<b> Fit and Train the model

In [None]:
# batch_size is not passed because the generators already yield batches of 64 images
history3 = model3.fit(train_generator,
                      validation_data = val_generator,
                      callbacks = callbacks,
                      epochs = 20, verbose = 1)

###<B>Evaluating the model

<b>Plot the train and validation accuracy

In [None]:
# Plotting the accuracies
plot_accuracy(history3)

In [None]:
# Evaluating the model on test data
accuracy3 = model3.evaluate(test_images, test_labels, verbose = 1)

print('\n', 'Test_Accuracy:-', accuracy3[1])

<B>Plotting the classification report and confusion matrix

<b> Now, let us try to use a pretrained model like VGG16 and check how it performs on our data.

### **Pre-trained model (VGG16)**

In [None]:
# Clearing backend
from tensorflow.keras import backend

backend.clear_session()

# Fixing the seed for random number generators
np.random.seed(42)

import random

random.seed(42)

tf.random.set_seed(42)

In [None]:
from tensorflow.keras.applications.vgg16 import VGG16

from tensorflow.keras import Model

vgg = VGG16(include_top = False, weights = 'imagenet', input_shape = (64, 64, 3))

vgg.summary()

In [None]:
transfer_layer = vgg.get_layer('block5_pool')

# Freezing the convolutional base so that the pretrained ImageNet weights are not updated
vgg.trainable = False

from tensorflow.keras.layers import BatchNormalization

# Add classification layers on top of it
x = Flatten()(transfer_layer.output)  # Flatten the output of the last (5th) block of the VGG16 model

x = Dense(256, activation = 'relu')(x)

# Similarly add a dense layer with 128 neurons
x = Dense(128, activation = 'relu')(x)

x = Dropout(0.3)(x)

# Add a dense layer with 64 neurons
x = Dense(64, activation = 'relu')(x)

x = BatchNormalization()(x)

pred = Dense(2, activation = 'softmax')(x)

model4 = Model(vgg.input, pred) # Initializing the model

###<b>Compiling the model

In [None]:
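# Compiling the model with the same loss and optimizer settings as Model 2
from tensorflow.keras import optimizers

model4.compile(loss = 'binary_crossentropy', optimizer = optimizers.Adam(learning_rate = 0.001), metrics = ['accuracy'])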

<b> using callbacks

In [None]:
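# Adding callbacks to the model (same as the base model)
callbacks = [EarlyStopping(monitor = 'val_loss', patience = 2),
             ModelCheckpoint('.mdl_wts.hdf5', monitor = 'val_loss', save_best_only = True)]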

<b>Fit and Train the model

In [None]:
# Fitting the model and running the model for 10 epochs
history4 = model4.fit(
            train_images, train_labels,
            epochs = 10,
            callbacks = callbacks,
            batch_size = 32,
            validation_split = 0.2,
            verbose = 1
)

<b>Plot the train and validation accuracy

In [None]:
# Plotting the accuracies
plot_accuracy(history4)

###**Observations and insights: _____**

*   What can be observed from the validation and train curves?

###<b> Evaluating the model

In [None]:
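# Evaluating the model on test data
accuracy4 = model4.evaluate(test_images, test_labels, verbose = 1)

print('\n', 'Test_Accuracy:-', accuracy4[1])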


<b>Plotting the classification report and confusion matrix

In [None]:
# Plot the confusion matrix and generate a classification report for the model

###<b>Think about it:</b>
*  What observations and insights can be drawn from the confusion matrix and classification report?
*  Choose the model with the best accuracy scores from all the above models and save it as a final model.


####<b> Observations and Conclusions drawn from the final model: _____



**Improvements that can be done:**<br>


*  Can the model performance be improved using other pre-trained models or different CNN architecture?
*  You can try to build a model using these HSV images and compare them with your other models.

#### **Insights**

####**Refined insights**:
- What are the most meaningful insights from the data relevant to the problem?

####**Comparison of various techniques and their relative performance**:
- How do different techniques perform? Which one is performing relatively better? Is there scope to improve the performance further?

####**Proposal for the final solution design**:
- What model do you propose to be adopted? Why is this the best solution to adopt?