# Malaria Project - Theoretical Background
## Detecting Malaria in Cell-Images using CNN and TF 2.0


### By:
- Lukas Wagner s0556753
- Laila Westphal s0556469

* [Introduction](#Introduction)
* [Convolutional Neural Networks](#Convolutional-Neural-Networks)
  * [Layers](#Layers)
  * [CNN architectures](#CNN-architectures)
  * [CNNs for (medical) image classification](#CNNs-for-(medical)-image-classification)
* [Training CNNS](#Training-CNNS)
  * [Activation Functions](#Activation-Functions)
  * [Weight Initialization](#Weight-Initialization)
  * [Regularization](#Regularization)
  * [Data Augmentation](#Data-Augmentation)
* [Evaluation of Training and Test Results](#Evaluation-of-Training-and-Test-Results)
* [Conclusion](#Conclusion)
* [References](#References)
* [License](#License)

# Introduction

The general topic of this project is to use Artificial Intelligence (AI), namely Convolutional Neural Networks (CNNs), to detect malaria in blood cell images.
Malaria is a life-threatening disease caused by Plasmodium parasites that are transmitted to humans through the bites of female Anopheles mosquitoes [1]. There are five parasite species responsible for malaria in humans. Two of them represent the biggest threat:

- P. falciparum, responsible for 99.7% of the estimated malaria cases in African countries.
- P. vivax, responsible for about 74.1% of the malaria cases in the Americas.

The World Malaria Report 2018 states that the WHO Region of the Americas is among the areas where malaria cases increased by more than 20% [2]. 84% of that increase is due to malaria cases reported in Venezuela.
Even though malaria can be cured and prevented, almost half of the world’s population was at risk of it in 2017, according to the World Health Organisation (WHO).
An article in the Korean Journal of Parasitology by N. Tangpukdee et al., indexed by the National Center for Biotechnology Information (NCBI), states that the microscopic diagnosis of malaria requires a well-trained microscopist [3]. In regions where malaria is no longer endemic, its diagnosis can be difficult because clinicians might not consider it as a possible cause. Microscopists might also fail to detect it, as they are not familiar with malaria and may not recognize the parasites.
An important aspect of the proposed topic is that it is a good example of how AI can be used to save human lives. A CNN can be trained to detect the disease with high accuracy. Once trained, it can be applied at low cost and might therefore make malaria diagnosis significantly easier.
Automatic identification of infected cells in general [4] and malaria-infected cells in particular [5] has been studied by various research groups. CNNs are known to produce highly accurate results for image recognition problems and require little input from human experts besides labelled image data. According to Saurabh Yadav (Medium), CNNs are “The most successful type of models for image analysis till date” [6].



# Convolutional Neural Networks

Regular neural networks flatten the image in order to process it. By doing so, the spatial structure of the image (width, height and depth) gets lost, whilst Convolutional Neural Networks (CNNs) preserve it. According to the course notes of Andrej Karpathy [1], CNNs explicitly assume that their inputs are images. Because of that, some properties can be encoded into the architecture of the CNN. This makes the forward pass more efficient to implement and reduces the number of parameters.

When talking about CNNs there are two important concepts to understand:
   - Shared Parameters
   - Receptive Fields
   
In regular neural networks with fully connected layers, every neuron has one learnable weight per input pixel. CNNs work with convolutions instead. To perform a convolution, a filter (or kernel) is slid, or convolved, across the image, computing an inner product between itself and the pixels of the image region it is currently "filtering". This is where the concept of shared parameters comes in: the same kernel weights are reused at every position of the image, so the network only has to learn one small set of parameters per kernel. The output of this convolution, the activation map, is then used as input for the next layer. Different kernels detect different features, and the deeper the network gets, the more specific these features become. Kernels at the start of the network detect general features like edges, circles and basic shapes. Kernels deeper within the network become more and more specific and are able to recognize features such as ears, legs, eyes or dogs.
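
To make the effect of parameter sharing more concrete, the following sketch compares the number of learnable parameters of a fully connected layer with that of a convolutional layer. The image size of 64x64x3, the 32 units and the 32 kernels are illustrative assumptions, and the function names are hypothetical.

```python
def dense_params(height, width, depth, units):
    """Weights plus biases of a fully connected layer on a flattened image."""
    inputs = height * width * depth
    return inputs * units + units


def conv_params(kernel_size, in_depth, num_kernels):
    """Weights plus biases of a convolutional layer (independent of image size)."""
    return kernel_size * kernel_size * in_depth * num_kernels + num_kernels


# Assumed example: a 64x64 RGB image, 32 dense units vs. 32 kernels of size 3x3.
print(dense_params(64, 64, 3, 32))  # 393248
print(conv_params(3, 3, 32))        # 896
```

Because the kernel weights are shared across all positions, the parameter count of the convolutional layer does not grow with the image size.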

The receptive field corresponds to the kernel size. Its size is crucial when it comes to visual tasks, as it needs to be large enough to capture the relevant content and ensure that no important information gets lost [2]. In a fully connected network the value of each neuron depends on the entire input, whilst in a CNN it only depends on a specific region of the image. This region is called the receptive field of the neuron. However, if the kernel gets too big, the output will shrink very quickly. This is where padding, which will be explained in more detail in the Layers section, comes in.

The following sections explain the different types of layers that CNNs are made of and introduce some CNN architectures that have been successful in the past as well as some that are state of the art today. Finally, some examples are given where CNNs have been used for medical image classification.

############################################################################################
[1] Andrej Karpathy. Convolutional Neural Networks (CNNs / ConvNets), part of the lecture notes of the course CS231n: Convolutional Neural Networks for Visual Recognition.
Retrieved from http://cs231n.github.io/convolutional-networks/

[2] Luo, W., Li, Y., Urtasun, R., & Zemel, R. (2016). Understanding the effective receptive field in deep convolutional neural networks. In Advances in neural information processing systems (pp. 4898-4906).



## Layers

CNNs are built from different types of layers: convolutional layers, pooling layers and fully connected layers. These layers are stacked on top of each other [1]. Fully connected layers are the ones used in regular neural networks, where every neuron of one layer is connected to all neurons in the following layer. Convolutional and pooling layers are introduced in this section.

**Convolutional Layers** are the ones performing the convolutions described above. They are, as Karpathy puts it, the core building block of CNNs. In order to explain what happens, several convolution parameters have to be introduced:
   - **Kernel size:** the spatial extent (width and height) of the kernel. Together with the kernel's depth, which always corresponds to the depth of the input, it determines the number of learnable parameters of the kernel.
   - **Stride:** the step width the kernel uses when it is convolved across the image.
   - **Padding:** extends the input at its borders (usually with zeros) so that the output can preserve the input dimensions.

Let's assume we have an input image I with dimensions [4x4x3] and a kernel of size [3x3]. As mentioned above, the depth of the kernel always corresponds to the depth of the input, which means that the dimension of the kernel k is [3x3x3]. For readability, only a single depth slice of I and k is shown here.

\begin{equation}
    I =
    \begin{matrix} 
    1 & 2 & 3 & 4\\ 
    0 & 1 & 2 & 3\\
    2 & 3 & 1 & 0\\
    1 & 0 & 0 & 1
    \end{matrix}, 
    \quad
    k =
    \begin{matrix} 
    1 & -1 & 0\\ 
    1 & 0 & 1\\
    0 & 1 & 1
    \end{matrix}
\end{equation}


The dimension of the output corresponds to the following equation:

\begin{equation*}
Dim_{Output} =
\frac{Dim_{Input} - Dim_{Kernel} + 2 * Padding}{Stride} + 1
\end{equation*}

The depth of the output corresponds to the number of kernels. Sticking to the above example and assuming there are 7 kernels with padding p = 0 and stride s = 1, we get:

\begin{equation*}
Dim_{Output} =
\frac{4 - 3 + 2 * 0}{1} + 1
=
\frac{1}{1} + 1=
2
\end{equation*}

The dimension of our output, or activation volume, will therefore be [2x2x7]. For the kernel k shown above, the corresponding [2x2] slice of the output is:

\begin{equation*}
Output =
\begin{matrix} 
5 & 4\\ 
2 & 3 
\end{matrix}
\end{equation*}
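
The worked example above can be reproduced with a short sketch in plain Python. It assumes a single depth slice, a square input and a square kernel, and it applies the kernel without flipping it, as deep learning frameworks commonly do. The function names are hypothetical.

```python
def conv_output_dim(dim_input, dim_kernel, padding=0, stride=1):
    """Spatial output size: (input - kernel + 2 * padding) / stride + 1."""
    return (dim_input - dim_kernel + 2 * padding) // stride + 1


def conv2d(image, kernel, stride=1):
    """Slide the kernel over the image and take the inner product at each position."""
    k = len(kernel)
    out = conv_output_dim(len(image), k, padding=0, stride=stride)
    result = []
    for row in range(out):
        result_row = []
        for col in range(out):
            total = 0
            for i in range(k):
                for j in range(k):
                    total += image[row * stride + i][col * stride + j] * kernel[i][j]
            result_row.append(total)
        result.append(result_row)
    return result


image = [[1, 2, 3, 4],
         [0, 1, 2, 3],
         [2, 3, 1, 0],
         [1, 0, 0, 1]]
kernel = [[1, -1, 0],
          [1, 0, 1],
          [0, 1, 1]]

print(conv_output_dim(4, 3, padding=0, stride=1))  # 2
print(conv2d(image, kernel))                       # [[5, 4], [2, 3]]
```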

If we now add zero-padding with p = 1 to the input, this is what the input looks like:

\begin{equation*}
I =
\begin{matrix} 
0 & 0 & 0 & 0 & 0 & 0\\
0 & 1 & 2 & 3 & 4 & 0\\ 
0 & 0 & 1 & 2 & 3 & 0\\
0 & 2 & 3 & 1 & 0 & 0\\
0 & 1 & 0 & 0 & 1 & 0\\
0 & 0 & 0 & 0 & 0 & 0
\end{matrix}
\end{equation*}

Let's calculate the Output dimension.

\begin{equation*}
Dim_{Output} =
\frac{4 - 3 + 2 * 1}{1} + 1
=
\frac{3}{1} + 1=
4
\end{equation*}

The output will be of dimension [4x4x7]. The slice produced by the kernel k is:



\begin{equation*}
Output =
\begin{matrix} 
3 & 7 & 11 & 6\\ 
5 & 5 & 4 & 1\\ 
4 & 2 & 3 & 1\\
-2 & 0 & 3 & 1
\end{matrix}
\end{equation*}
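
The padded case can be checked in the same way. The sketch below, again in plain Python with hypothetical helper names, surrounds a single depth slice of the input with one ring of zeros and then applies a stride-1 convolution without flipping the kernel.

```python
def zero_pad(image, padding):
    """Surround a 2D list with `padding` rings of zeros."""
    width = len(image[0]) + 2 * padding
    top = [[0] * width for _ in range(padding)]
    bottom = [[0] * width for _ in range(padding)]
    middle = [[0] * padding + list(row) + [0] * padding for row in image]
    return top + middle + bottom


def conv2d_valid(image, kernel):
    """Stride-1 convolution without padding (inner product at each position)."""
    k = len(kernel)
    out = len(image) - k + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(k) for j in range(k))
             for c in range(out)]
            for r in range(out)]


image = [[1, 2, 3, 4],
         [0, 1, 2, 3],
         [2, 3, 1, 0],
         [1, 0, 0, 1]]
kernel = [[1, -1, 0],
          [1, 0, 1],
          [0, 1, 1]]

padded = zero_pad(image, 1)  # 6x6 input
for row in conv2d_valid(padded, kernel):
    print(row)
# [3, 7, 11, 6]
# [5, 5, 4, 1]
# [4, 2, 3, 1]
# [-2, 0, 3, 1]
```

With p = 1 and a 3x3 kernel the output keeps the 4x4 size of the unpadded input.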


**Pooling Layers** are usually placed in between successive convolutional layers. They progressively downsample, i.e. reduce the spatial size of, the output. This is done to reduce the number of parameters, to reduce computation and also to reduce the risk of overfitting. The pooling layer goes over each depth slice of its input and performs, unless stated otherwise, a MAX operation.
The input of the pooling layer is of dimension [W<sub>I</sub> x H<sub>I</sub> x D<sub>I</sub>]. The required hyperparameters are the filter extent F and the stride S.
The pooling layer's output is of dimension [W<sub>P</sub> x H<sub>P</sub> x D<sub>P</sub>]:

\begin{equation}
    W_P = \frac{W_I - F}{S} + 1,
    \quad\quad
    H_P = \frac{H_I - F}{S} + 1,
    \quad\quad
    D_P = D_I
\end{equation}

According to Karpathy the most common form is a [2x2] filter with a stride of 2.
If we stick to our example this is what would happen if we apply max pooling:

\begin{equation}
    \begin{matrix} 
    3 & 7 & 11 & 6\\ 
    5 & 5 & 4 & 1\\ 
    4 & 2 & 3 & 1\\
    -2 & 0 & 3 & 1
    \end{matrix}
    \quad
    \longrightarrow
    \quad
    \begin{matrix} 
    7 & 11\\ 
    4 & 3
    \end{matrix}
\end{equation}
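
The pooling step can be sketched in plain Python as well. The example assumes a single square depth slice and uses the standard pooling size relation (W_I - F) / S + 1; the function names are hypothetical.

```python
def pool_output_dim(dim_input, extent, stride):
    """Pooled size: (input - F) / S + 1."""
    return (dim_input - extent) // stride + 1


def max_pool2d(feature_map, extent=2, stride=2):
    """Take the maximum of each extent x extent window of one depth slice."""
    out = pool_output_dim(len(feature_map), extent, stride)
    return [[max(feature_map[r * stride + i][c * stride + j]
                 for i in range(extent) for j in range(extent))
             for c in range(out)]
            for r in range(out)]


activation = [[3, 7, 11, 6],
              [5, 5, 4, 1],
              [4, 2, 3, 1],
              [-2, 0, 3, 1]]

print(pool_output_dim(4, extent=2, stride=2))  # 2
print(max_pool2d(activation))                  # [[7, 11], [4, 3]]
```

The depth is unchanged by pooling because the same operation is applied to every depth slice independently.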


############################################################################################
[1] Andrej Karpathy. Convolutional Neural Networks (CNNs / ConvNets), part of the lecture notes of the course CS231n: Convolutional Neural Networks for Visual Recognition.
Retrieved from http://cs231n.github.io/convolutional-networks/


## CNN architectures

Deciding which CNN architecture to use is quite difficult, as there is no definitive answer to that question. According to Andrej Karpathy's lecture notes (https://cs231n.github.io/neural-networks-1/), one should use a neural network as large as the available computational power allows, because the larger the network, the larger the space of representable functions.
Karpathy mentions that CNNs usually follow this architecture pattern:

`INPUT -> [[CONV -> RELU]*N -> POOL?]*M -> [FC -> RELU]*K -> FC`

- `0 <= N <= 3`
- `M >= 0`
- `0 <= K < 3` (usually)
- `*`: repetition
- `?`: optional

He also says that there are very few situations where a CNN needs to be trained from scratch and recommends taking whichever architecture currently works best on ImageNet, downloading a pretrained model and fine-tuning it on one's own data.
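
As a small illustration of how the pattern `INPUT -> [[CONV -> RELU]*N -> POOL?]*M -> [FC -> RELU]*K -> FC` unfolds for concrete values, the following sketch expands it into a flat list of layer names. The function and its `use_pool` flag are hypothetical helpers for illustration only; they do not build an actual TensorFlow model.

```python
def build_layer_pattern(n, m, k, use_pool=True):
    """Expand the architecture pattern for given N, M and K into layer names."""
    layers = ["INPUT"]
    for _ in range(m):
        layers += ["CONV", "RELU"] * n
        if use_pool:  # POOL is optional in the pattern
            layers.append("POOL")
    layers += ["FC", "RELU"] * k
    layers.append("FC")
    return layers


print(" -> ".join(build_layer_pattern(n=2, m=2, k=1)))
# INPUT -> CONV -> RELU -> CONV -> RELU -> POOL -> CONV -> RELU -> CONV -> RELU -> POOL -> FC -> RELU -> FC
```

With N = M = K = 0 the pattern collapses to `INPUT -> FC`, which is a plain linear classifier.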

Regarding layer sizing patterns, there are some recommendations:
 - **Input layer:** the spatial size should be divisible by 2, ideally several times.
 - **Conv layer:**
     - use small kernels K: 3x3 and at most 5x5
     - use stride S = 1
     - use zero-padding so that the output dimension corresponds to the input dimension, which is the case for stride S = 1 when the padding is chosen as
     \begin{equation}
         P = \frac{K-1}{2}
     \end{equation}
 - **Pooling layer:** most commonly max pooling with K = 2 and S = 2.
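
The padding rule P = (K - 1) / 2 can be verified with the output dimension formula from the Layers section. The sketch below assumes stride 1, odd kernel sizes and an illustrative input size of 32; the function names are hypothetical.

```python
def conv_output_dim(dim_input, dim_kernel, padding, stride=1):
    """Spatial output size: (input - kernel + 2 * padding) / stride + 1."""
    return (dim_input - dim_kernel + 2 * padding) // stride + 1


def same_padding(kernel_size):
    """P = (K - 1) / 2, valid for odd kernel sizes and stride 1."""
    return (kernel_size - 1) // 2


for kernel_size in (3, 5):
    p = same_padding(kernel_size)
    print(kernel_size, p, conv_output_dim(32, kernel_size, p))
# 3 1 32
# 5 2 32
```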



## CNNs for (medical) image classification

http://www.ece.uah.edu/~dwpan/papers/BHI2017.pdf
Dong, Y., Jiang, Z., Shen, H., Pan, W. D., Williams, L. A., Reddy, V. V., ... & Bryan, A. W. (2017, February). Evaluations of deep convolutional neural networks for automatic identification of malaria infected cells. In 2017 IEEE EMBS International Conference on Biomedical & Health Informatics (BHI) (pp. 101-104). IEEE.

http://static.aixpaper.com/pdf/d/d8/gs.2014.c24df31c2d.v1.pdf
Li, Q., Cai, W., Wang, X., Zhou, Y., Feng, D. D., & Chen, M. (2014, December). Medical image classification with convolutional neural network. In 2014 13th International Conference on Control Automation Robotics & Vision (ICARCV) (pp. 844-848). IEEE.

# Training CNNS

## Activation Functions

## Weight Initialization

## Regularization

## Data Augmentation

It is known that a bigger data set generally improves the accuracy of an image classifier simply by providing more training data. Lack of data can be a problem, especially in the medical sector, because of the highly sensitive nature of the data: privacy concerns regarding health-related data have the side effect that too little data is collected. Data augmentation can be a helpful way to close that gap. Training a CNN on a small data set risks overfitting, and this risk decreases as the data set grows.

There are two options for increasing the size of our malaria cell image data set by data augmentation. One is to make the set as big as possible, in the spirit of the trillion-word corpus that Google used for its speech and language models (Halevy et al.). The second is to deliberately decide on specific data augmentation techniques and extend the data set carefully with proven, well-chosen transformations. The first option is tempting, but unfortunately we don’t have the training resources for that amount of data. We will therefore choose the second option and decide on specific data augmentation techniques to carefully increase the amount of reliable data and hence improve the accuracy of our CNN.

The choice of data augmentation techniques depends on the data we are dealing with. Our data set consists of low-quality images of roughly the same structure and size. The images are quite homogeneous: they show few differences in viewing angle or colour, and they have no blurry parts or varying weather conditions. Still, they vary within a small range, and we will try to optimize the data set accordingly.

The malaria cell images do not have a top or bottom, left or right, which means the cell images come in arbitrary rotations. It is therefore beneficial to teach the CNN to identify cell images in every rotation, and we will rotate the original cell images by various angles. The same goes for flipping the images: there is no single correct orientation for a cell image, so we will extend the data set with a flipped version of every image. The colour of the cell images depends on the camera and its configuration. Some images are brighter, some are darker. By changing the colour palette of the duplicated images we will improve our data set even further.
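
The three transformations described above (rotation, flipping and brightness changes) can be sketched with plain NumPy on uint8 arrays. This is only a minimal stand-in for a dedicated augmentation library: it restricts rotations to multiples of 90 degrees, because arbitrary angles require interpolation, and the image size, the brightness range and the function name are assumptions made for illustration.

```python
import numpy as np


def augment(image, rng, max_brightness_shift=30):
    """Return a randomly rotated, flipped and brightness-shifted copy of a uint8 image."""
    # Rotate by a random multiple of 90 degrees (lossless, no interpolation).
    out = np.rot90(image, k=int(rng.integers(0, 4)))
    # Flip horizontally and/or vertically with probability 0.5 each.
    if rng.random() < 0.5:
        out = np.fliplr(out)
    if rng.random() < 0.5:
        out = np.flipud(out)
    # Shift the brightness in a wider integer type to avoid uint8 wrap-around.
    shift = int(rng.integers(-max_brightness_shift, max_brightness_shift + 1))
    return np.clip(out.astype(np.int16) + shift, 0, 255).astype(np.uint8)


rng = np.random.default_rng(seed=0)
# Random pixels as a stand-in for a 64x64 RGB cell image.
cell = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)
augmented = augment(cell, rng)
print(augmented.shape, augmented.dtype)  # (64, 64, 3) uint8
```

Calling `augment` several times on the same image yields different variants, which can be added to the training set alongside the original.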

Not yet decided:
The images will be processed as uint8 arrays. Uint8 arrays are very suitable for data augmentation tools. Our choice of image augmentation library will be imgaug. All needed features are available and easy to use.

Wang, J. & Perez, L. (2017) The Effectiveness of Data Augmentation in Image Classification using Deep Learning. Retrieved from http://cs231n.stanford.edu/reports/2017/pdfs/300.pdf

Halevy, A., Norvig, P. & Pereira, F. (2009) The Unreasonable Effectiveness of Data. Retrieved from https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/35179.pdf



# Evaluation of Training and Test Results

# Conclusion

# References
[1] WHO (2019, March 27). Malaria. Retrieved from https://www.who.int/news-room/fact-sheets/detail/malaria

[2] WHO (2018, November). World Malaria Report 2018. Retrieved from https://www.who.int/malaria/publications/world-malaria-report-2018/report/en/

[3] Tangpukdee, N., Duangdee, C., Wilairatana, P., & Krudsood, S. (2009). Malaria diagnosis: a brief review. The Korean journal of parasitology.

[4] Hirimutugoda, Y. M., & Wijayarathna, G. (2010). Image analysis system for detection of red cell disorders using artificial neural networks. Sri Lanka Journal of Bio-Medical Informatics.

[5] Dong, Y., Jiang, Z., Shen, H., Pan, W. D., Williams, L. A., Reddy, V. V., & Bryan, A. W. (2017). Evaluations of deep convolutional neural networks for automatic identification of malaria infected cells. 2017 IEEE EMBS International Conference on Biomedical & Health Informatics.

[6] Saurabh Yadav (2018, October 16). Brief Intro to Medical Image Analysis and Deep Learning. Retrieved from 
https://medium.com/@saurabh.yadav919/brief-intro-of-medical-image-analysis-and-deep-learning-810df940d2f7
