# Practical aspects to take care of during model implementation

So far, we have seen various ways of building an image classification model. In
this section, we will learn about some of the practical considerations to take care
of when building models. The ones we will discuss are as follows:

- Dealing with imbalanced data
- The size of an object within an image when performing classification
- The difference between training and validation images
- The number of convolutional and pooling layers in a network
- Image sizes to train on GPUs
- Leveraging OpenCV utilities

## Dealing with imbalanced data

Imagine a scenario where you are trying to predict an object that occurs very rarely
within your dataset, let's say in 1% of the total images. For example, this can be the
task of predicting whether an X-ray image suggests a rare lung infection.

How do we measure the accuracy of a model that is trained to predict the rare lung
infection? If we simply predict the no-infection class for all images, the classification
accuracy is 99%, yet the model is useless. A confusion matrix, which shows how many
times the rare class occurred and how many times the model predicted it correctly,
comes in handy in this scenario. Thus, the right metrics to look at here are those
derived from the confusion matrix, such as the precision and recall of the rare class.

A typical confusion matrix looks as follows:

![imgs](./imgs/2.png)

In the preceding confusion matrix, 0 stands for no infection and 1 stands for infection.
Typically, we would fill in the count for each cell of the matrix to understand how
accurate our model is on each class, not just overall.
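To make the 99% accuracy trap concrete, here is a minimal sketch in plain Python that fills in a binary confusion matrix and derives accuracy, precision, and recall for the rare class. The layout (rows are actual classes, columns are predicted classes), the helper names, and the 1,000-image toy dataset are assumptions for illustration and may not match the figure above.

```python
def confusion_matrix(y_true, y_pred):
    """Return [[TN, FP], [FN, TP]] for binary labels (0 = no infection, 1 = infection)."""
    cm = [[0, 0], [0, 0]]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1  # row = actual class, column = predicted class
    return cm

def summarize(cm):
    (tn, fp), (fn, tp) = cm
    accuracy = (tn + tp) / (tn + fp + fn + tp)
    # Report 0.0 instead of dividing by zero when the model never predicts class 1
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall

# Hypothetical dataset: 1,000 X-rays, 10 of which (1%) show the rare infection
y_true = [1] * 10 + [0] * 990
always_healthy = [0] * 1000  # a "model" that predicts no infection for every image

cm = confusion_matrix(y_true, always_healthy)
print(cm)             # [[990, 0], [10, 0]]
print(summarize(cm))  # (0.99, 0.0, 0.0)
```

The 99% accuracy hides the fact that recall on the infected class is zero: every one of the 10 infected images lands in the false-negative cell.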

Next comes the question of ensuring that the model learns to identify the rare class.
Typically, the loss function (binary or categorical cross-entropy) ensures that the loss
values are high when the amount of misclassification is high. However, on top of the
loss function, we can also assign a higher weight to the rarely occurring class, thereby
explicitly telling the model that misclassifying rare-class images should be penalized
more heavily.
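The following sketch shows class weighting with a hand-written weighted binary cross-entropy in plain Python. The inverse-frequency formula `N / (num_classes * count_c)` is one common heuristic, not the only choice, and the helper names are hypothetical. In PyTorch, the same idea is exposed through the `weight` argument of `nn.CrossEntropyLoss` (and `pos_weight` of `nn.BCEWithLogitsLoss`).

```python
import math

def inverse_frequency_weights(labels, num_classes=2):
    """One common heuristic: weight_c = N / (num_classes * count_c)."""
    n = len(labels)
    counts = [labels.count(c) for c in range(num_classes)]
    return [n / (num_classes * count) for count in counts]

def weighted_bce(probs, labels, weights):
    """Mean binary cross-entropy with each sample scaled by the weight of its true class."""
    eps = 1e-7  # keeps log() away from 0
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1 - eps)
        loss = -(y * math.log(p) + (1 - y) * math.log(1 - p))
        total += weights[y] * loss
    return total / len(labels)

# Hypothetical labels: 10 infected (1) out of 1,000 X-rays
labels = [1] * 10 + [0] * 990
weights = inverse_frequency_weights(labels)  # roughly [0.505, 50.0]

# The same kind of mistake (0.9 confidence in the wrong class) on each class
miss_rare = weighted_bce([0.1], [1], weights)    # infected image predicted as healthy
miss_common = weighted_bce([0.9], [0], weights)  # healthy image predicted as infected
print(miss_rare / miss_common)  # about 99: the rare-class miss is penalized far more
```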

In addition to assigning class weights, we have already seen that image augmentation
and/or transfer learning help considerably in improving the accuracy of the model.
Furthermore, when augmenting images, we can over-sample the rare class images
to increase their share of the training data.
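Over-sampling can be sketched by drawing training indices with replacement, where each sample's weight is inversely proportional to its class frequency; this is the same idea behind PyTorch's `torch.utils.data.WeightedRandomSampler`. The plain-Python helper below is a hypothetical illustration, with a fixed seed for reproducibility.

```python
import random
from collections import Counter

def oversample_indices(labels, num_samples, seed=0):
    """Draw indices with replacement so each class is picked with roughly equal probability."""
    counts = Counter(labels)
    # Rare-class samples get proportionally larger weights
    sample_weights = [1.0 / counts[y] for y in labels]
    rng = random.Random(seed)
    return rng.choices(range(len(labels)), weights=sample_weights, k=num_samples)

labels = [1] * 10 + [0] * 990
idx = oversample_indices(labels, num_samples=1000)
drawn = Counter(labels[i] for i in idx)
print(drawn)  # roughly 500 of each class instead of 990 vs. 10
```

Because the 10 rare images are repeated many times, applying random augmentation to each drawn copy keeps the repeats from being identical.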

## The size of the object within an image

Imagine a scenario where the presence of a small patch within a large image dictates
the class of the image, for example, lung infection identification where the presence
of certain tiny nodules indicates an incident of the disease. In such a scenario, image
classification is likely to be inaccurate, as the object occupies only a small portion
of the entire image. Object detection (which we will study in the next chapter) comes
in handy in this scenario.

A high-level intuition to solve these problems would be to first divide the input
images into smaller grid cells (say, a 10 x 10 grid) and then identify whether each
grid cell contains the object of interest.
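The grid idea can be sketched as follows: split the image into a 10 x 10 grid and mark every cell that overlaps the object's bounding box. These per-cell labels are the kind of target a per-cell classifier would learn. The box format `(top, left, bottom, right)` in pixels (bottom and right exclusive), the helper names, and the nodule coordinates are all hypothetical.

```python
def grid_cells(height, width, n=10):
    """Split an image into an n x n grid of (top, left, bottom, right) pixel boxes."""
    cells = []
    for row in range(n):
        for col in range(n):
            top, bottom = row * height // n, (row + 1) * height // n
            left, right = col * width // n, (col + 1) * width // n
            cells.append((top, left, bottom, right))
    return cells

def cells_with_object(box, height, width, n=10):
    """Return (row, col) of every grid cell that overlaps the object's box."""
    hits = []
    for idx, (top, left, bottom, right) in enumerate(grid_cells(height, width, n)):
        overlaps = top < box[2] and box[0] < bottom and left < box[3] and box[1] < right
        if overlaps:
            hits.append(divmod(idx, n))  # idx = row * n + col
    return hits

# Hypothetical 1,000 x 1,000 X-ray with a small nodule straddling two cells
print(cells_with_object((180, 430, 220, 470), 1000, 1000))  # [(1, 4), (2, 4)]
```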


## Dealing with the difference between training and validation data

Imagine a scenario where you have built a model to predict whether the image of an
eye indicates that the person is likely to be suffering from diabetic retinopathy. To
build the model, you have collected data, curated it, cropped it, normalized it, and
then finally built a model that has very high accuracy on validation images. Now
suppose that when the model is used in a real setting (say, by a doctor or nurse),
it does not predict well. Let's understand a few possible reasons why:

<b>Are the images taken at the doctor's office similar to the images used to
train the model?</b>

- Training images and real-world images could be very different if you
built the model on a curated dataset that has all the preprocessing
done, while the images taken at the doctor's office are not curated.

- Images could be different if the device used to capture images at the
doctor's office has a different resolution from the device used to
collect the training images.

- Images can be different if the lighting conditions under which the
images are captured differ between the two places.

<b>Are the subjects (images) representative enough of the overall population?</b>

- Images are not representative if the model is trained on images of the
male population but tested on the female population, or, more
generally, if the training and real-world images correspond to
different demographics.

<b>Is the training and validation split done methodically?</b>

- Imagine a scenario where there are 10,000 images and the first 5,000
images belong to one class and the last 5,000 images belong to another
class. When building a model, if we do not shuffle the dataset and
instead split it into training and validation sets using consecutive
indices, one class will be over-represented during training and the
other during validation.
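The scenario above can be reproduced with a short sketch: a naive 80/20 split on consecutive indices leaves only one class in validation, while a stratified split (shuffle each class separately, then take the same fraction of each) keeps both classes balanced. The helper name and seed are hypothetical; scikit-learn's `train_test_split` offers the same behavior through its `stratify` argument.

```python
import random
from collections import Counter

# Hypothetical dataset ordered by class: first 5,000 labels are 0, last 5,000 are 1
labels = [0] * 5000 + [1] * 5000
indices = list(range(len(labels)))
n_train = 8000  # 80% training / 20% validation

# Naive split on consecutive indices
train_seq, val_seq = indices[:n_train], indices[n_train:]
print(Counter(labels[i] for i in train_seq))  # Counter({0: 5000, 1: 3000})
print(Counter(labels[i] for i in val_seq))    # Counter({1: 2000})

def stratified_split(labels, val_fraction, seed=0):
    """Shuffle each class separately and send the same fraction of it to validation."""
    rng = random.Random(seed)
    train, val = [], []
    for c in sorted(set(labels)):
        class_idx = [i for i, y in enumerate(labels) if y == c]
        rng.shuffle(class_idx)
        n_val = int(len(class_idx) * val_fraction)
        val += class_idx[:n_val]
        train += class_idx[n_val:]
    return train, val

train_idx, val_idx = stratified_split(labels, val_fraction=0.2)
print(Counter(labels[i] for i in val_idx))  # Counter({0: 1000, 1: 1000})
```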


In general, we need to ensure that the training, validation, and real-world images all
have similar data distributions before an end user leverages the system.
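Differences in lighting or capture devices often show up as a shift in simple pixel statistics, so one crude sanity check before deployment is to compare them between training images and images from the field. The sketch below is a hypothetical illustration in plain Python: images are grayscale lists of rows, and the 10% tolerance is an assumed threshold, not a standard value. A passing check does not prove the distributions match; it only catches gross mismatches.

```python
from statistics import mean, pstdev

def intensity_stats(images):
    """Mean and standard deviation of all pixel values in a set of grayscale images."""
    pixels = [p for img in images for row in img for p in row]
    return mean(pixels), pstdev(pixels)

def brightness_shift(train_images, field_images, tolerance=0.1):
    """Flag a relative shift in mean intensity above `tolerance` (an assumed threshold)."""
    train_mean, train_std = intensity_stats(train_images)
    field_mean, field_std = intensity_stats(field_images)
    relative_shift = abs(field_mean - train_mean) / train_mean  # assumes train_mean > 0
    return {
        "train": (train_mean, train_std),
        "field": (field_mean, field_std),
        "relative_shift": relative_shift,
        "flag": relative_shift > tolerance,
    }

# Hypothetical 4 x 4 images: curated training scans vs. darker scans from a clinic
train_images = [[[120, 130, 125, 135]] * 4 for _ in range(3)]
field_images = [[[80, 90, 85, 95]] * 4 for _ in range(3)]
report = brightness_shift(train_images, field_images)
print(report["relative_shift"], report["flag"])  # about 0.31, True
```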

## The number of nodes in the flatten layer

Consider a scenario where you are working on images that are 300 x 300 in
dimensions. Technically, we could stack more than five convolution and pooling
operations to obtain a final layer with as many features as we like. Furthermore,
we can have as many channels as we want within a CNN. In practice, though, we
would generally design a network so that it has 500–5,000 nodes in the flatten layer.

This is because a greater number of nodes in the flatten layer leads to a very high
number of parameters when the flatten layer is connected to the subsequent dense
layer before the final classification layer.
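To see the effect, the following sketch computes the flatten-layer size for a 300 x 300 input after five 2 x 2 max-pooling steps, assuming the convolutions use padding that preserves spatial size, and the parameter count of a 512-unit dense layer attached to it. The helper names and the 512-unit layer are hypothetical choices for illustration.

```python
def flatten_size(image_size, num_pool_layers, channels):
    """Flattened feature count after repeated 2 x 2 max-pooling (convs assumed size-preserving)."""
    side = image_size
    for _ in range(num_pool_layers):
        side //= 2  # 2 x 2 pooling with stride 2 halves each side, rounding down
    return side * side * channels

def dense_params(in_features, out_features):
    return in_features * out_features + out_features  # weights + biases

for channels in (64, 512):
    n = flatten_size(300, num_pool_layers=5, channels=channels)
    print(channels, n, dense_params(n, 512))
# 64 5184 2654720
# 512 41472 21234176
```

The spatial size shrinks 300 → 150 → 75 → 37 → 18 → 9, so the flatten layer has 9 x 9 x channels nodes; going from 64 to 512 channels multiplies the dense-layer parameters by roughly 8, to over 21 million.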

In general, it is good practice to use a pre-trained model to obtain the flatten layer,
so that the relevant filters (already learned on a large dataset) are activated as
appropriate. Furthermore, when leveraging pre-trained models, make sure to freeze
their parameters.
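Freezing in PyTorch means setting `requires_grad = False` on the backbone's parameters so that only the new head is trained. In the sketch below, a small `nn.Sequential` is a hypothetical stand-in for a pre-trained feature extractor (in practice you would load, for example, a torchvision model with pre-trained weights); the 32 x 32 input size and layer widths are assumptions chosen to keep the numbers small.

```python
import torch
import torch.nn as nn

def count_params(module, trainable_only=False):
    """Total parameters, or only those that will receive gradients."""
    return sum(p.numel() for p in module.parameters() if p.requires_grad or not trainable_only)

# Hypothetical stand-in for a pre-trained backbone
backbone = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
)
for param in backbone.parameters():
    param.requires_grad = False  # freeze: the optimizer will not update these

# For a 32 x 32 input, two 2 x 2 poolings leave 64 x 8 x 8 = 4,096 flattened features
model = nn.Sequential(backbone, nn.Flatten(), nn.Linear(64 * 8 * 8, 2))

# Only the trainable (head) parameters are handed to the optimizer
optimizer = torch.optim.Adam((p for p in model.parameters() if p.requires_grad), lr=1e-3)

x = torch.randn(1, 3, 32, 32)
print(model(x).shape)  # torch.Size([1, 2])
print(count_params(model), count_params(model, trainable_only=True))  # 27586 8194
```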

Generally, the number of trainable parameters in a CNN can be anywhere between 1
million and 10 million in a less complex classification exercise.

## Image size

Let's say we are working on images that are of very high dimensions – for example,
2,000 x 1,000 in shape. When working on such large images, we need to consider the
following possibilities:

- Can the images be resized to lower dimensions? Images of objects might
not lose information if resized; however, images of text documents might
lose considerable information if resized to a smaller size.

- Can we have a lower batch size so that the batch fits into GPU memory?
Typically, if we are working with large images, there is a good chance that
for the given batch size, the GPU memory is not sufficient to perform
computations on the batch of images.

- Do certain portions of the image contain the majority of the information,
and hence can the rest of the image be cropped?
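A quick back-of-the-envelope calculation shows why resizing and batch size matter. The sketch below computes only the memory of the float32 input tensor itself; activations and gradients inside the network usually need far more, so treat these figures as a lower bound. The helper name and the batch size of 32 are assumptions.

```python
def input_batch_megabytes(height, width, channels=3, batch_size=32, bytes_per_value=4):
    """Memory of the float32 input batch alone (activations and gradients need more)."""
    return height * width * channels * batch_size * bytes_per_value / 1e6

print(input_batch_megabytes(2000, 1000))                # 768.0
print(input_batch_megabytes(500, 250))                  # 48.0  (each side resized to 1/4)
print(input_batch_megabytes(2000, 1000, batch_size=4))  # 96.0
```

Shrinking each side by a factor of 4 cuts memory by 16, whereas keeping full resolution forces a much smaller batch.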

## Leveraging OpenCV utilities

Imagine a scenario where you have to move a model to production; less complexity is
generally preferable in such a scenario, sometimes even at the cost of accuracy. If
an OpenCV module already solves the problem that you are trying to solve, it should
generally be preferred over building a model (unless building a model from scratch
gives a considerable boost in accuracy over leveraging off-the-shelf modules).