# Malaria Project - Theoretical Background
## Detecting Malaria in Cell-Images using CNN and TF 2.0


### By:
- Lukas Wagner s0556753
- Laila Westphal s0556469

* [Introduction](#Introduction)
* [Convolutional Neural Networks](#Convolutional-Neural-Networks)
  * [Layers](#Layers)
  * [CNN architectures](#CNN-architectures)
      * [Dense Convolutional Network (DenseNet)](#Dense-Convolutional-Network-(DenseNet))
  * [CNNs for cell image classification](#CNNs-for-cell-image-classification)
* [Training CNNS](#Training-CNNS)
  * [Activation Functions](#Activation-Functions)
  * [Weight Initialization](#Weight-Initialization)
  * [Regularization](#Regularization)
  * [Data Augmentation](#Data-Augmentation)
  * [Optimization](#Optimization)
* [Evaluation of Training and Test Results](#Evaluation-of-Training-and-Test-Results)
* [Conclusion](#Conclusion)
* [References](#References)
* [License](#License)

# Introduction

The goal of this project is to use Artificial Intelligence (AI), namely Convolutional Neural Networks (CNNs), to detect malaria in blood cell images.
Malaria is a life-threatening disease caused by Plasmodium parasites that are transmitted to humans through the bites of female Anopheles mosquitoes [1]. Five parasite species are responsible for malaria in humans. Two of them represent the biggest threat:

- P. falciparum, responsible for 99.7% of the estimated malaria cases in African countries.
- P. vivax, responsible for about 74.1% of the malaria cases in the Americas.

The World Malaria Report 2018 states that the WHO Region of the Americas is among the areas where malaria cases increased by more than 20% [2]. 84% of that increase is due to malaria cases reported in Venezuela.
Even though malaria can be cured and prevented, according to the World Health Organization (WHO), almost half of the world’s population was at risk of it in 2017. 
An article by N. Tangpukdee et al. in the Korean Journal of Parasitology, referenced by the National Center for Biotechnology Information (NCBI), states that the microscopic diagnosis of malaria requires a well-trained microscopist [3]. In regions where malaria is no longer endemic, its diagnosis can be difficult because clinicians might not consider it as a possible cause. Microscopists might also fail to detect it, as they are not familiar with malaria and might not recognize the parasites.
An important aspect of the proposed topic is that it is a great example of how AI can be used to save human lives. A CNN can be trained to detect the disease with high accuracy. Once trained, it can be used easily and at low cost, and might therefore facilitate malaria diagnosis significantly.
Automatic identification of infected cells in general [4] and malaria infected cells in particular [5] has been studied by various research groups. CNNs are known to produce highly accurate results for image recognition problems and require little input from human experts besides labelled image data. According to Saurabh Yadav (Medium), CNNs are “The most successful type of models for image analysis till date” [6].



# Convolutional Neural Networks

Regular Neural Networks flatten the image in order to process it. By doing so the spatial structure of the image (width, height and depth) gets lost, whilst Convolutional Neural Networks (CNNs) preserve it. According to the course notes of Andrej Karpathy[1], CNNs explicitly assume that their inputs are images. This allows certain properties to be encoded into the architecture of the CNN, which makes the forward pass more efficient to implement and reduces the number of parameters.

When talking about CNNs there are two important concepts to understand:
   - Shared Parameters
   - Receptive Fields
   
In Regular Neural Networks with Fully-Connected Layers, every neuron has one learnable parameter per pixel of the input. CNNs work with convolutions instead. To perform a convolution, a filter or kernel is slid (convolved) across the image, computing an inner product between itself and the pixels of the image region it is currently "filtering". This is where the concept of shared parameters comes in: the same kernel weights are applied at every position of the image, so the network learns one set of parameters per kernel instead of one per pixel. The output of this convolution, the activation map, is then used as input for the next layer. A layer usually contains several kernels, and the deeper the network gets, the more specialized its kernels become. The kernels at the start of the network detect general features like edges, circles and basic shapes. The kernels deeper within the network become more and more specific and are able to recognize features such as ears, legs, eyes or dogs.
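To make the effect of parameter sharing concrete, the following sketch counts the learnable parameters of a fully connected layer and of a convolutional layer. The helper names and the layer sizes (a 64x64x3 input, 32 units, 32 kernels of size 3x3) are assumptions chosen only for illustration; they are not taken from the project.

```python
def dense_params(height, width, depth, units):
    """Weights plus biases of a fully connected layer on a flattened image."""
    inputs = height * width * depth
    return inputs * units + units


def conv_params(kernel_size, in_depth, num_kernels):
    """Weights plus biases of a convolutional layer.

    The count does not depend on the image size, because each kernel
    is reused at every position of the image.
    """
    return kernel_size * kernel_size * in_depth * num_kernels + num_kernels


# Hypothetical sizes, chosen only for illustration.
fc = dense_params(height=64, width=64, depth=3, units=32)
conv = conv_params(kernel_size=3, in_depth=3, num_kernels=32)
print(fc, conv)  # 393248 896
```

Under these assumptions the convolutional layer needs several hundred times fewer parameters than the fully connected one.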

The receptive field corresponds to the kernel size. Its size is crucial when it comes to visual tasks, as it needs to be large enough to capture the relevant content and ensure that no important information gets lost.[2] In a Fully Connected Network the value of each neuron depends on the entire input, whilst in a CNN it only depends on a specific region of the image. This region is called the receptive field of the neuron. However, if the kernel gets too big, the output shrinks very quickly. This is where padding, which is explained in more detail in the Layers section, comes in.

The following sections explain the different types of layers that CNNs are made of and introduce some CNN architectures that have been successful in the past as well as some that are state of the art today. Finally, some examples are given where CNNs have been used for medical image classification.

############################################################################################
[1]Andrej Karpathy. Convolutional Neural Networks (CNNs / ConvNets), part of the lecture notes of course: CS231n Convolutional Neural Networks for Visual Recognition
Retrieved from http://cs231n.github.io/convolutional-networks/

[2]Luo, W., Li, Y., Urtasun, R., & Zemel, R. (2016). Understanding the effective receptive field in deep convolutional neural networks. In Advances in neural information processing systems (pp. 4898-4906).



## Layers

CNNs are built of different types of layers: Convolutional Layers, Pooling Layers and Fully Connected Layers. These layers are stacked on top of each other.[1] Fully Connected Layers are the ones used in Regular Neural Networks, where every neuron of one layer is connected to all neurons of the following layer. Convolutional and pooling layers are introduced in this section.

**Convolutional Layers** perform the convolutions described above. They are, as Karpathy puts it, the core building block of CNNs. To explain what happens, the following convolution parameters have to be introduced:
   - **Kernel-Size:** the width and height of the kernel. Together with its depth, which always corresponds to the depth of the input, it determines the number of learnable parameters of the kernel.
   - **Stride:** the step width the kernel uses when convolving across the image.
   - **Padding:** extends the input at its borders, typically with zeros, so that the output can preserve the spatial dimensions of the input.

Let's assume we have an input image I with dimensions [4x4x3] and a kernel of size [3x3]. As mentioned above, the depth of the kernel always corresponds to the depth of the input, which means that the dimension of the kernel k is [3x3x3]. For readability, only one depth slice of I and of k is shown.

\begin{equation}
    I =
    \begin{matrix} 
    1 & 2 & 3 & 4\\ 
    0 & 1 & 2 & 3\\
    2 & 3 & 1 & 0\\
    1 & 0 & 0 & 1
    \end{matrix}, 
    \quad
    k =
    \begin{matrix} 
    1 & -1 & 0\\ 
    1 & 0 & 1\\
    0 & 1 & 1
    \end{matrix}
\end{equation}


The dimension of the output corresponds to the following equation:

\begin{equation*}
Dim_{Output} =
\frac{Dim_{Input} - Dim_{Kernel} + 2 * Padding}{Stride} + 1
\end{equation*}
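The formula above can be wrapped in a small helper. This is only a sketch: the function name and the error handling for sizes that do not fit are assumptions made for this example.

```python
def conv_output_size(input_size, kernel_size, padding=0, stride=1):
    """Spatial output size of a convolution along one dimension."""
    span = input_size - kernel_size + 2 * padding
    if span < 0:
        raise ValueError("kernel is larger than the padded input")
    if span % stride != 0:
        raise ValueError("kernel does not fit the input evenly with this stride")
    return span // stride + 1


# 4x4 input and 3x3 kernel, as in the example of this section
print(conv_output_size(4, 3))             # 2
print(conv_output_size(4, 3, padding=1))  # 4
```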

The depth of the output corresponds to the number of kernels. Sticking to the above example and assuming 7 kernels, padding p = 0 and stride s = 1, we obtain:

\begin{equation*}
Dim_{Output} =
\frac{4 - 3 + 2 * 0}{1} + 1
=
\frac{1}{1} + 1=
2
\end{equation*}

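The dimension of our output, or activation volume, will therefore be [2x2x7]. Applying the kernel k to the depth slice of I shown above yields the following slice of the output: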

\begin{equation*}
Output =
\begin{matrix} 
5 & 4\\ 
2 & 3 
\end{matrix}
\end{equation*}

If we now add zero-padding with p = 1 to the input, this is what the input looks like:

\begin{equation*}
I =
\begin{matrix} 
0 & 0 & 0 & 0 & 0 & 0\\
0 & 1 & 2 & 3 & 4 & 0\\ 
0 & 0 & 1 & 2 & 3 & 0\\
0 & 2 & 3 & 1 & 0 & 0\\
0 & 1 & 0 & 0 & 1 & 0\\
0 & 0 & 0 & 0 & 0 & 0
\end{matrix}
\end{equation*}

Let's calculate the Output dimension.

\begin{equation*}
Dim_{Output} =
\frac{4 - 3 + 2 * 1}{1} + 1
=
\frac{3}{1} + 1=
4
\end{equation*}

The output will be of dimension [4x4x7]. For the kernel k, the corresponding slice looks as follows:



\begin{equation*}
Output =
\begin{matrix} 
3 & 7 & 11 & 6\\ 
5 & 5 & 4 & 1\\ 
4 & 2 & 3 & 1\\
-2 & 0 & 3 & 1
\end{matrix}
\end{equation*}
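Both worked examples can be reproduced with a few lines of plain Python. The sketch below is a naive implementation for a single depth slice; the function names are assumptions made for this example, and like most deep learning frameworks it slides the kernel without flipping it.

```python
def zero_pad(image, p):
    """Surround a 2D list with p rows and columns of zeros."""
    width = len(image[0]) + 2 * p
    padded = [[0] * width for _ in range(p)]
    for row in image:
        padded.append([0] * p + list(row) + [0] * p)
    padded.extend([0] * width for _ in range(p))
    return padded


def conv2d(image, kernel, padding=0, stride=1):
    """Slide the kernel over one depth slice and take the inner product at each position."""
    image = zero_pad(image, padding)
    k = len(kernel)
    out_h = (len(image) - k) // stride + 1
    out_w = (len(image[0]) - k) // stride + 1
    return [
        [
            sum(
                kernel[a][b] * image[i * stride + a][j * stride + b]
                for a in range(k)
                for b in range(k)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


I = [[1, 2, 3, 4], [0, 1, 2, 3], [2, 3, 1, 0], [1, 0, 0, 1]]
k = [[1, -1, 0], [1, 0, 1], [0, 1, 1]]

print(conv2d(I, k))             # [[5, 4], [2, 3]]
print(conv2d(I, k, padding=1))  # [[3, 7, 11, 6], [5, 5, 4, 1], [4, 2, 3, 1], [-2, 0, 3, 1]]
```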


**Pooling Layers** are usually placed between successive convolutional layers. They progressively downsample, i.e. reduce the spatial size of, the output. This is done to reduce the number of parameters and the amount of computation, and thereby also the risk of overfitting. The pooling layer goes over each depth slice of its input independently and performs, unless stated otherwise, a MAX operation. 
The input of the pooling layer is of dimension [W<sub>I</sub> x H<sub>I</sub> x D<sub>I</sub>]. The required hyperparameters are the filter extent F and the stride S.
The output of the pooling layer is of dimension [W<sub>P</sub> x H<sub>P</sub> x D<sub>P</sub>], where

\begin{equation}
    W_P = \frac{W_I - F}{S} + 1,
    \quad\quad
    H_P = \frac{H_I - F}{S} + 1,
    \quad\quad
    D_P = D_I
\end{equation}

According to Karpathy the most common form is a [2x2] filter with a stride of 2.
If we stick to our example this is what would happen if we apply max pooling:

\begin{equation}
    \begin{matrix} 
    3 & 7 & 11 & 6\\ 
    5 & 5 & 4 & 1\\ 
    4 & 2 & 3 & 1\\
    -2 & 0 & 3 & 1
    \end{matrix}
    \quad
    \longrightarrow
    \quad
    \begin{matrix} 
    7 & 11\\ 
    4 & 3
    \end{matrix}
\end{equation}
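The max pooling step of this example can be sketched in plain Python as follows. The function name is an assumption made for this example; the output size is computed as (W - F) / S + 1 per dimension.

```python
def max_pool(feature_map, size=2, stride=2):
    """Max pooling over one depth slice given as a 2D list."""
    out_h = (len(feature_map) - size) // stride + 1
    out_w = (len(feature_map[0]) - size) // stride + 1
    return [
        [
            max(
                feature_map[i * stride + a][j * stride + b]
                for a in range(size)
                for b in range(size)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


activation = [[3, 7, 11, 6], [5, 5, 4, 1], [4, 2, 3, 1], [-2, 0, 3, 1]]
print(max_pool(activation))  # [[7, 11], [4, 3]]
```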


###########################################################################################
[1]Andrej Karpathy. Convolutional Neural Networks (CNNs / ConvNets), part of the lecture notes of course: CS231n Convolutional Neural Networks for Visual Recognition
Retrieved from http://cs231n.github.io/convolutional-networks/


## CNN architectures

Deciding which CNN architecture to use is difficult, as there is no single answer to that question. According to Andrej Karpathy's lecture notes [https://cs231n.github.io/neural-networks-1/], one should use a neural network as large as the available computational power allows, because the larger the network, the larger the space of representable functions.
Karpathy mentions that CNNs usually follow this architecture pattern:

`INPUT -> [[CONV -> RELU]*N -> POOL?]*M -> [FC -> RELU]*K -> FC`

 - 0 <= N <= 3
 - M >= 0
 - 0 <= K < 3 (usually)
 - `*` : repetition
 - `?` : optional
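To see what the pattern expands to for concrete values, it can be unrolled into a flat list of layer names. The function below is a hypothetical helper written for this illustration only; it does not build a real network.

```python
def expand_pattern(n, m, k, pool=True):
    """Unroll INPUT -> [[CONV -> RELU]*n -> POOL?]*m -> [FC -> RELU]*k -> FC."""
    layers = ["INPUT"]
    for _ in range(m):
        layers += ["CONV", "RELU"] * n
        if pool:  # the POOL layer is optional in the pattern
            layers.append("POOL")
    layers += ["FC", "RELU"] * k
    layers.append("FC")
    return layers


print(" -> ".join(expand_pattern(n=2, m=2, k=1)))
# INPUT -> CONV -> RELU -> CONV -> RELU -> POOL -> CONV -> RELU -> CONV -> RELU -> POOL -> FC -> RELU -> FC
```

With n = m = k = 0 the pattern degenerates to `INPUT -> FC`, i.e. a linear classifier.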
He also says that there are very few situations where a CNN needs to be trained from scratch, and recommends downloading a model pretrained on ImageNet, using whichever architecture currently works best, and fine-tuning it on one's own data.

Regarding layer sizing patterns there are some recommendations:
 - **Input Layer:** its size should be divisible by 2 many times
  
 - **Conv Layer:**
     - use small kernels K: 3x3 and at most 5x5
     - use stride S = 1
     - use zero-padding so that the output dimension corresponds to the input dimension, which is the case when the padding P satisfies the following equation:
      
     \begin{equation}
         P = \frac{K-1}{2}
     \end{equation}
      
     
 - **Pooling Layer:** Here the most common way is to use K=2, S=2 and MAX-Pooling
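
As a quick check of the padding recommendation for the Conv Layer, assuming stride S = 1 and an odd kernel size K, substituting P = (K-1)/2 into the output dimension formula from the Layers section gives:

\begin{equation*}
Dim_{Output} =
\frac{Dim_{Input} - K + 2 * \frac{K-1}{2}}{1} + 1
=
Dim_{Input} - K + (K - 1) + 1
=
Dim_{Input}
\end{equation*}

For K = 3 this yields P = 1, which matches the padded example in the Layers section where a [4x4] input produced a [4x4] output.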
 
Over the years, different CNN architectures have reduced the top-5 error on ImageNet.[https://medium.com/datadriveninvestor/five-powerful-cnn-architectures-b939c9ddd57b] The first one to lower that error significantly, from 26% to 15.3%, was AlexNet in 2012. In 2014 VGGNet made its appearance in the same challenge, together with GoogLeNet/Inception, which won the competition with an error of 6.67%. ResNet appeared in 2015 and brought the error down to 3.57%. 
Today the state of the art might be ResNet or Google's Inception. New architectures keep appearing. One of them is DenseNet, which was used to detect infected malaria cells in the [densenet_implementation_kernel](https://github.com/lai-la/MalariaProject/blob/master/malaria_project_densenet_implementation.ipynb) belonging to this notebook.



## Dense Convolutional Network (DenseNet)

![DenseNetLayers.png](ressources/DenseNetLayers.png)

Our main focus is on TensorFlow and therefore Keras. Keras provides a DenseNet implementation, which we use in our malaria_project_densenet_implementation.ipynb kernel. The Keras documentation references the paper “Densely Connected Convolutional Networks” (CVPR 2017 Best Paper Award). 

DenseNet is a CNN architecture published in 2016. The main difference between traditional CNNs and DenseNet is how densely the layers are connected: every layer is connected to every other layer that follows it. 

Since all layers are connected, each one receives the feature maps of all preceding layers as additional inputs. Those feature maps are concatenated, so layer x has x inputs. The concatenated feature maps form the global state of the network. The global state is available to all layers in the network and therefore does not have to be recalculated or replicated over and over again. 
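The growth of the global state can be illustrated by counting feature maps. The sketch below assumes that every layer contributes the same number of new feature maps (called the growth rate in the DenseNet paper); the function name and the concrete numbers are hypothetical and only serve as an illustration.

```python
def dense_block(initial_channels, growth_rate, num_layers):
    """Track how many feature maps each layer of one dense block receives."""
    state = [initial_channels]      # channel counts of everything in the global state
    inputs_per_layer = []
    for _ in range(num_layers):
        # each layer sees the concatenation of all feature maps produced so far
        inputs_per_layer.append(sum(state))
        # and adds its own (narrow) output to the global state
        state.append(growth_rate)
    return inputs_per_layer, sum(state)


inputs, total = dense_block(initial_channels=64, growth_rate=32, num_layers=6)
print(inputs)  # [64, 96, 128, 160, 192, 224]
print(total)   # 256
```

Each layer only has to produce `growth_rate` new feature maps, while its input grows linearly with the depth of the block.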

Another advantage is that DenseNet needs fewer parameters, because the feature maps are passed directly to all following layers and do not have to be calculated again. Narrow layers with a low number of filters create small feature maps that are passed on and added to the global state. The short connections from input to output also make the network efficient to train.
Due to this parameter efficiency, less computational power is needed for training. Consequently, training the model becomes more efficient and takes less time.

Furthermore, DenseNet alleviates the vanishing gradient problem. The vanishing gradient problem describes how information about the input or the gradient is washed out and decreases to insignificance as it passes through many layers. Activation functions map their inputs into a small range, so even a large change in the input won’t make a significant difference in the output: the gradient is small. After several such mappings into an ever smaller range, the output can hardly be changed even by big changes in the weights. DenseNet addresses this issue by shortening the path of the gradient through the network. Overcoming the vanishing gradient problem allows the construction of deeper CNNs.

Optimizing the reuse of feature maps, and thereby the information flow through the network, also has a regularizing effect that reduces overfitting, even on small datasets.

Finally, the new architecture results in higher accuracy (see Table 2). 
![DenseNet-Accuracy.png](ressources/DenseNetBenchMarks.png "DenseNetBenchMarks")

###################################################################################
Huang, G., Liu, Z., van der Maaten, L., Weinberger, K. Q. (2016) Densely Connected Convolutional Networks. Retrieved from https://arxiv.org/abs/1608.06993


## CNNs for cell image classification

Classifying cells is a challenging task. It is often crucial for medical diagnosis, the prevention of diseases and also for personalized treatment. Correct classification with high precision is extremely difficult, not only for computer vision but also for specialists. [1]
Using CNNs for cell classification tasks has proven to be very effective in some cases. Different research groups have obtained results where the predictions made by the ConvNet outperformed those of specialists.[1]
The large number of papers on using CNNs for different kinds of cell image classification underlines the importance of the topic. Examples include classifying Human Epithelial-2 (HEp-2) cells, which is important for the diagnosis of several autoimmune diseases [3], differentiating between normal breast epithelial cells and an aggressive and a less aggressive form of breast cancer [1], and detecting malaria in blood cells [2].
Malaria, as mentioned before, is a global health threat.[5] It caused 438,000 deaths in 2014 and has an economic impact of about 12 billion dollars per year. Malaria is usually diagnosed visually by technicians who analyze blood smears under a microscope. The accuracy of the diagnosis therefore depends on the experience of the technician.
Since malaria is curable, it could be treated, prevented and controlled more effectively if its diagnosis were more accurate.
In the paper by Dong et al. a 17-layer CNN was used to detect malaria in blood smears. For training, 27,578 images with an equal number of uninfected and infected cells were used. Those images had been normalized and preprocessed to have the dimensions 44x44x3. 90% of the data was used for training and the remaining 10% for backpropagation validation. They compared the model to a transfer learning model and obtained much better results with the new model. The accuracy was 97.37% and the F1 score 97.36%.
All this shows that CNNs have enormous potential to help save lives and, if used in the right places, perhaps also to make good diagnosis affordable and accessible to a larger part of the world's population.

########################################################################################################################

http://www.ece.uah.edu/~dwpan/papers/BHI2017.pdf
[5] Dong, Y., Jiang, Z., Shen, H., Pan, W. D., Williams, L. A., Reddy, V. V., ... & Bryan, A. W. (2017, February). Evaluations of deep convolutional neural networks for automatic identification of malaria infected cells. In 2017 IEEE EMBS International Conference on Biomedical & Health Informatics (BHI) (pp. 101-104). IEEE.

https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0213626
[1] Oei, R. W., Hou, G., Liu, F., Zhong, J., Zhang, J., An, Z., ... & Yang, Y. (2019). Convolutional neural network for cell classification using microscope images of intracellular actin networks. PloS one, 14(3), e0213626.

https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=7822567
[2] Liang, Z., Powell, A., Ersoy, I., Poostchi, M., Silamut, K., Palaniappan, K., ... & Huang, J. X. (2016, December). CNN-based image analysis for malaria diagnosis. In 2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) (pp. 493-496). IEEE.

https://ieeexplore.ieee.org/iel7/7486633/7493185/07493483.pdf 
[3] Phan, H. T. H., Kumar, A., Kim, J., & Feng, D. (2016, April). Transfer learning of a convolutional neural network for HEp-2 cell image classification. In 2016 IEEE 13th International Symposium on Biomedical Imaging (ISBI) (pp. 1208-1211). IEEE.

# Training CNNS

## Activation Functions

Activation functions are usually placed after convolutional layers or at the end of the CNN. They are non-linearities and are responsible for deciding whether a neuron fires or not.[1] Their output is passed to the next layer as input.
There are different types of activation functions:
 - **ReLU** (Rectified Linear Unit) has been very popular lately. When using ReLU the activations are thresholded at 0.[2]
     
    \begin{equation}
        f(x) = \max(0,x)
    \end{equation}

     Its advantages are that it accelerates the convergence of stochastic gradient descent and that it doesn't involve any expensive operations.
     Its biggest disadvantage is that a lot of the nodes can "die" during training, meaning that they output 0 for every input and are no longer updated.
     
     
 - **Leaky ReLU:** tries to fix the problem of dying neurons faced by ReLU by using a small slope α of about 0.01 for negative inputs instead of setting them to 0. According to Karpathy, some people have obtained good results using it, but those results are not always consistent.
 
     \begin{equation}
        f(x) = \mathbb{1}(x < 0)(\alpha x) + \mathbb{1}(x \geq 0)(x)
    \end{equation}
    
 - **Maxout:** generalizes ReLU and Leaky ReLU. Neurons using Maxout benefit from the advantages of ReLU and are not faced with the "dying ReLU" problem.
 
     \begin{equation}
        \max(w^T_1x + b_1, w^T_2x + b_2)
    \end{equation}
 
According to Karpathy's lecture notes, ReLU should be used when working with CNNs.[2] If it causes problems, Leaky ReLU or Maxout should be tried, and Tanh can be tried as well. Sigmoid shouldn't be used.
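The three functions can be written down directly for a scalar input. This is a minimal sketch: the function names and the scalar form of Maxout (with weights w1, w2 and biases b1, b2 passed in explicitly) are simplifications assumed for this example.

```python
def relu(x):
    """Threshold the activation at 0."""
    return max(0.0, x)


def leaky_relu(x, alpha=0.01):
    """Like ReLU, but with a small slope alpha for negative inputs."""
    return x if x >= 0 else alpha * x


def maxout(x, w1, b1, w2, b2):
    """Maximum of two linear functions of a scalar input x."""
    return max(w1 * x + b1, w2 * x + b2)


for x in (-2.0, 0.0, 3.0):
    print(x, relu(x), leaky_relu(x), maxout(x, 1.0, 0.0, 0.01, 0.0))
```

With `w1 = 1`, `b1 = 0`, `b2 = 0`, Maxout reproduces ReLU for `w2 = 0` and Leaky ReLU for `w2 = 0.01`, which is the sense in which it generalizes both.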



################################################################################
[1]https://medium.com/@udemeudofia01/basic-overview-of-convolutional-neural-network-cnn-4fcc7dbb4f17
[2]https://cs231n.github.io/neural-networks-1/

## Weight Initialization
Before a Convolutional Neural Network can be trained, its weights have to be initialized. It is important to avoid initializing all weights with 0, since this would prevent the network from training[y]: the derivative of the loss function would be the same for every weight in a layer, so all weights would receive identical updates in every iteration and the neurons could never learn different features. 
Poorly scaled random weights are problematic as well, since they facilitate vanishing and exploding gradients. Vanishing gradients are explained in the DenseNet section. Exploding gradients result from large weights, which make the gradients grow as they are passed back through the layers. Because each update subtracts a multiple of the gradient from the weight, large gradients produce large steps towards the minimum, which can be so big that they overshoot the optimum instead of pinpointing it. The best practice for initializing weights therefore depends on the activation function. The ReLU activation function is comparatively robust and allows bigger weights. For ReLU, the weights of layer l can be drawn from a standard normal distribution and multiplied by the following factor, where size<sup>[l-1]</sup> is the number of units in the previous layer:

\begin{equation}
        \sqrt{2/size^{[l-1]}}
\end{equation}
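A minimal sketch of this initialization in plain Python, assuming that standard normal samples are scaled by the factor above. The function name, the layer sizes and the fixed seed are assumptions made for this example.

```python
import math
import random


def he_normal(fan_in, fan_out, seed=None):
    """Draw a fan_in x fan_out weight matrix from N(0, 1) scaled by sqrt(2 / fan_in)."""
    rng = random.Random(seed)
    scale = math.sqrt(2.0 / fan_in)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(fan_out)] for _ in range(fan_in)]


# Hypothetical layer with 512 inputs and 256 outputs
weights = he_normal(fan_in=512, fan_out=256, seed=0)
flat = [w for row in weights for w in row]
mean = sum(flat) / len(flat)
variance = sum((w - mean) ** 2 for w in flat) / len(flat)
print(variance, 2.0 / 512)  # the empirical variance should be close to 2 / fan_in
```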

[y]Doshi, N. (26.03.2018) Deep Learning Best Practice (1) - Weight Initialization. Retrieved from https://medium.com/usf-msds/deep-learning-best-practices-1-weight-initialization-14e5c0295b94

## Regularization

## Data Augmentation

It is known that a bigger data set generally improves the accuracy of an image classifier simply by providing more training data. Lack of data can be a problem, especially in the medical sector, because of the highly sensitive nature of the data. Privacy concerns regarding health-related data have the side effect that too little data is gathered. Data augmentation can be a helpful way to close that gap. A CNN trained on a small data set risks overfitting, and this risk decreases as the data set grows.

There are two options for increasing the size of our malaria cell image data set through data augmentation. One is to make the set as big as possible, as has been shown by Google while training Google Speech with a trillion-word corpus. The second is to thoughtfully choose specific data augmentation techniques and to enhance the data set carefully with proven and filtered features. The first option is tempting, but unfortunately we don’t have the training resources for that amount of data. We therefore choose the second option and decide on specific data augmentation techniques to carefully increase the amount of reliable data and hence improve the accuracy of our CNN.

The choice of data augmentation techniques depends on the data we are dealing with. Our dataset consists of low-quality images of roughly the same structure and size. The images are quite homogeneous: they show few differences in angle or colour, and they have no blurry parts or weather-related variation. Still, they vary within a small range, and we will try to optimize the data set accordingly. 

The malaria cell images do not have a top or bottom, left or right, which means that the cells can appear in any rotation. It is therefore beneficial to teach the CNN to identify cells in every rotation, so we rotate the original cell images by various angles. The same goes for flipping: there is no single right way of taking a cell image, so we add a flipped version of every image to the data set. The color of the cell images depends on the camera and its configuration. Some images are brighter, some are darker. By changing the color palette of the duplicated images we improve our data set even further.
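The three techniques can be sketched on a tiny grayscale image stored as a list of rows. This is a simplified illustration, not the augmentation pipeline of the project: the function names are hypothetical, rotations are restricted to multiples of 90 degrees, and only brightness is changed instead of the full color palette.

```python
def rotate90(image):
    """Rotate a 2D image (list of rows) by 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def flip_horizontal(image):
    """Mirror the image from left to right."""
    return [row[::-1] for row in image]


def adjust_brightness(image, factor):
    """Scale pixel intensities and clip them to the range 0..255."""
    return [[min(255, max(0, round(p * factor))) for p in row] for row in image]


def augment(image):
    """Return the image in all four rotations, each with and without a flip."""
    variants = [image]
    rotated = image
    for _ in range(3):
        rotated = rotate90(rotated)
        variants.append(rotated)
    variants += [flip_horizontal(v) for v in variants]
    return variants


cell = [[10, 20], [30, 40]]           # hypothetical 2x2 "cell image"
print(rotate90(cell))                 # [[30, 10], [40, 20]]
print(flip_horizontal(cell))          # [[20, 10], [40, 30]]
print(adjust_brightness(cell, 1.5))   # [[15, 30], [45, 60]]
print(len(augment(cell)))             # 8
```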

################################################################################

Wang, J. & Perez, L. (2017) The Effectiveness of Data Augmentation in Image Classification using Deep Learning. Retrieved from http://cs231n.stanford.edu/reports/2017/pdfs/300.pdf

Halevy, A., Norvig, P. & Pereira, F. (2009) The Unreasonable Effectiveness of Data. Retrieved from https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/35179.pdf



## Optimization

Frameworks like Keras make it easy to use optimizers. This section does not explain the maths behind the different algorithms in detail, but briefly introduces the most well-known ones to give an idea of what to use.
Optimizers are used to minimize the cost or loss. The most well-known one for optimizing Neural Networks is Gradient Descent.[1] There are three variants of Gradient Descent (GD):
 - Batch-GD: computes the gradient on the entire dataset, which means that the gradients for all training examples have to be computed in order to perform one update. It can be very slow and the dataset might not fit into memory. It also risks getting trapped in local minima or saddle points. It is not commonly used in practice as an optimizer for CNNs.
 - Stochastic-GD (SGD): performs one parameter update per training example. It is much faster and can be used for online learning. Because its updates fluctuate, it can eventually escape local minima and saddle points.
 - Mini-batch-GD: performs an update for every batch of usually fewer than 256 training examples. It combines the best of batch GD and SGD and is often used in Neural Networks.
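
The three variants differ only in how many training examples contribute to one update. The sketch below illustrates this on a toy problem, fitting y = w * x with a mean squared error loss; the function name, the toy data and the hyperparameters are assumptions made for this example.

```python
import random


def gradient_descent(data, batch_size, learning_rate=0.1, epochs=200, seed=0):
    """Fit y = w * x by minimising the mean squared error.

    batch_size == len(data) -> batch GD (one update per epoch)
    batch_size == 1         -> stochastic GD (one update per example)
    anything in between     -> mini-batch GD
    """
    rng = random.Random(seed)
    data = list(data)
    w = 0.0
    for _ in range(epochs):
        rng.shuffle(data)
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            # gradient of the mean squared error with respect to w on this batch
            grad = sum(2 * (w * x - y) * x for x, y in batch) / len(batch)
            w -= learning_rate * grad
    return w


points = [(x / 10, 3.0 * x / 10) for x in range(1, 11)]  # noiseless samples of y = 3x

w_batch = gradient_descent(points, batch_size=len(points))
w_sgd = gradient_descent(points, batch_size=1)
w_mini = gradient_descent(points, batch_size=4)
print(w_batch, w_sgd, w_mini)  # all three should end up close to 3
```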
 
There are also many GD optimization algorithms. Some of the most prominent ones are:
  - Momentum: accelerates SGD and reduces the oscillations usually observed with SGD. This is achieved by adding a fraction of the previous update to the current one. It is often explained with a ball rolling down a hill and accumulating velocity unless some resistance slows it down. This is basically what happens to the parameter update when using momentum: updates shrink for dimensions whose gradients change direction and grow for those whose gradients don't. The advantages are faster convergence and a lower risk of getting trapped in saddle points or local minima.
  - Adaptive Learning Rate Methods: the learning rate is adapted for each parameter individually. The idea behind algorithms like Adagrad is to find a more or less straight path leading to the minimum.
  - RMSProp: also an adaptive learning rate method. It has not been published but was introduced in a lecture by Geoff Hinton. RMS stands for Root Mean Square. It prevents the learning rate from dying early. RMSProp and AdaDelta are very similar.
  - Adam (Adaptive Moment Estimation): Adam is one of the most commonly used optimizers for image classification. The algorithm combines the adaptive learning rate with the momentum principle by keeping exponentially decaying averages of the past gradients, $m_t$, and of the past squared gradients, $v_t$, where $g_t$ is the gradient at time step $t$:
  \begin{equation}
  m_t = \beta _1m_{t-1} + ( 1 - \beta_1) g_t
  \end{equation}
   
  \begin{equation}
  v_t = \beta _2v_{t-1} + ( 1 - \beta_2) g^2_t
  \end{equation}
  
  $m_t$ and $v_t$ are initialized as vectors of zeros. They are therefore biased towards zero, especially during the initial time steps and when $\beta_1$ and $\beta_2$ are close to one. To counteract this, the following bias-corrected estimates are used, where $t$ is the current time step:
   
  \begin{equation}
      \hat{m}_t = 
      \frac{m_t}{(1 - \beta^t_1)},
      \quad
      \hat{v}_t =
      \frac{v_t}{(1 - \beta^t_2)}
  \end{equation}
   
The update of the parameters is performed according to the following equation:

\begin{equation}
    \theta_t = \theta_{t-1} - 
    \frac{\alpha \hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}
\end{equation}
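
The Adam update can be sketched for a single scalar parameter as follows. This is a minimal illustration, not the Keras implementation: the function name, the toy objective and the hyperparameter values are assumptions, and the bias correction uses the step count t as exponent.

```python
import math


def adam_minimize(grad_fn, theta, alpha=0.001, beta1=0.9, beta2=0.999,
                  eps=1e-8, steps=1000):
    """Minimise a function of one parameter given its gradient function."""
    m = 0.0  # decaying average of past gradients
    v = 0.0  # decaying average of past squared gradients
    for t in range(1, steps + 1):
        g = grad_fn(theta)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)  # bias-corrected first moment
        v_hat = v / (1 - beta2 ** t)  # bias-corrected second moment
        theta = theta - alpha * m_hat / (math.sqrt(v_hat) + eps)
    return theta


# Toy objective f(theta) = (theta - 2)^2 with gradient 2 * (theta - 2)
result = adam_minimize(lambda th: 2 * (th - 2.0), theta=0.0, alpha=0.05, steps=2000)
print(result)  # should be close to the minimum at theta = 2
```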


######################################################
[1]https://arxiv.org/pdf/1609.04747.pdf
Ruder, S. (2016). An overview of gradient descent optimization algorithms. arXiv preprint arXiv:1609.04747.

# Evaluation of Training and Test Results

# Conclusion

This project enabled us to get a better understanding of CNNs and how they can be used for classifying images. 

# References
[1] WHO (2019, March 27). Malaria. Retrieved from https://www.who.int/news-room/fact-sheets/detail/malaria

[2] WHO (2018, November). World Malaria Report 2018. Retrieved from https://www.who.int/malaria/publications/world-malaria-report-2018/report/en/

[3] Tangpukdee, N., Duangdee, C., Wilairatana, P., & Krudsood, S. (2009). Malaria diagnosis: a brief review. The Korean journal of parasitology.

[4] Hirimutugoda, Y. M., & Wijayarathna, G. (2010). Image analysis system for detection of red cell disorders using artificial neural networks. Sri Lanka Journal of Bio-Medical Informatics.

[5] Dong, Y., Jiang, Z., Shen, H., Pan, W. D., Williams, L. A., Reddy, V. V., & Bryan, A. W. (2017). Evaluations of deep convolutional neural networks for automatic identification of malaria infected cells. 2017 IEEE EMBS International Conference on Biomedical & Health Informatics.

[6] Saurabh Yadav (2018, October 16). Brief Intro to Medical Image Analysis and Deep Learning. Retrieved from 
https://medium.com/@saurabh.yadav919/brief-intro-of-medical-image-analysis-and-deep-learning-810df940d2f7
