<br>
<center><font size=7 color='#0020C2'>Malaria Detection in Red Blood Cells</font></center>  
    <br>
<center><font size=7 color='#0020C2'>Using Deep Learning Image Analysis</font></center>  
    <br>
<center><font size=7 color='#0020C2'>Stuart Huntley</font></center>

# Introduction

## Context

* **Malaria is a serious, mosquito-borne infectious disease caused by *Plasmodium* parasites**
* *Plasmodium* enters the bloodstream via bites from infected *Anopheles* mosquitoes
* *Plasmodium* can reside asymptomatically for more than a year in the human body
* After infection, the parasite invades red blood cells (RBCs), resulting in:
> * changes in the shape, permeability, and adhesiveness of the RBCs
> * destruction of infected and uninfected RBCs, which can cause severe anemia and cerebral malaria
* Worldwide:
> * almost 50% of the population is in danger of *Plasmodium* infections
> * an estimated 200 to more than 500 million people are infected annually
> * about half a million people (mostly infants and children) succumb to the illness
* Traditional diagnosis of malaria in the laboratory:
> * requires careful inspection of RBCs by an experienced professional
> * detection of infected cells is a tedious and time-consuming manual process
> * accuracy of detection is adversely impacted by inspector fatigue and inter-observer variability
* An automated system can help with rapid and accurate detection of *Plasmodium*-infected RBCs
* Applications of automated classification techniques using Machine Learning and Artificial Intelligence have consistently shown higher accuracy than manual classification
* The purpose of this study is to apply deep learning algorithms to build a model that detects infected cells with a high level of accuracy by analyzing images of RBCs

## Objective

* Build an efficient computer vision model to detect malaria
* The model should identify whether the RBC in an image is infected or not
* The model should then properly classify the sample as infected or uninfected

## Dataset for model creation

* The dataset consists of ~28,500 color images of individual RBCs
> * all images are in Portable Network Graphics (PNG) format
> * NOTE: the dimensions of individual pictures vary significantly and will need to be corrected during data preparation
* The dataset comes pre-divided:
> * 'Training' (~25,000 images) and 'Test' (2,600 images) datasets
> * The 'Training' and 'Test' datasets are each separated into two directories:
>   * One directory containing only images of 'Parasitized' RBCs
>   * A second directory containing only images of 'Uninfected' RBCs<br>

**Below are example images for the 'Parasitized' and 'Uninfected' RBCs**<br>
<img alt="RBC image samples"
     style="margin-left: 1%;  width: 70%" 
     src="https://drive.google.com/uc?id=1LRBD_K9qrkgITRnDfWuJbXrTRNpFX9CD"/>

# Python Libraries

In [1]:
# To ignore warnings
import warnings
warnings.filterwarnings('ignore')

# import normal machine learning tool libraries
import matplotlib.pyplot as plt
%matplotlib inline

import numpy as np
import pandas as pd
import seaborn as sns
import random
from random import shuffle

# to import data from images directories
import os

# Data Preparation

## Import the Data

**Data for this project is available for download at:**<br>
https://drive.google.com/file/d/1n3o1Xghpy9ufZwHkQFE5l5d9sUHQOUWM/view?pli=1

**The zip-compressed dataset was decompressed with a bash command at download time**<br>
**and stored in a folder "DL_Malaria_data/" next to the working directory:**

<pre>
parent_directory/
├─ project_working_directory/
│  ├─ Malaria_RBC_CNN_model.ipynb
├─ DL_Malaria_data/
   ├─ cell_images/
      ├─ test/
      │  ├─ parasitized/
      │  ├─ uninfected/
      ├─ train/
         ├─ parasitized/
         ├─ uninfected/
</pre>
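The notebook decompressed the archive with a bash command that is not shown. As a rough Python equivalent, the sketch below uses the standard-library `zipfile` module instead. The helper name `extract_dataset` and the archive file name in the usage comment are assumptions made for illustration; they do not come from the original workflow.

```python
import zipfile
from pathlib import Path


def extract_dataset(archive_path, target_dir):
    """Extract a zip archive into target_dir and return its top-level entries."""
    target = Path(target_dir)
    # Create the destination folder if it does not exist yet
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(target)
    # Sorted names of whatever now sits directly inside target_dir
    return sorted(p.name for p in target.iterdir())


# Hypothetical usage (the archive file name is an assumption):
# extract_dataset("../cell_images.zip", "../DL_Malaria_data")
```

If the archive matches the layout above, the returned list should contain `cell_images`.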


### Option 1: using Google Colab

**Note: The commands in this subsection are commented out as this was not the method used during creation of this notebook**

#### Mount Google Drive

In [2]:
#from google.colab import drive
#drive.mount('/content/drive')

#### Assign the locations of the four image datasets to variables

In [3]:
# Assign variables indicating the Google Colab locations of the four
# datasets (train-parasitized, train-uninfected, test-parasitized,
# and test-uninfected)
#train_parasitized = '/content/cell_images/train/parasitized/'
#train_uninfected = '/content/cell_images/train/uninfected/'
#test_parasitized = '/content/cell_images/test/parasitized/'
#test_uninfected = '/content/cell_images/test/uninfected/'

### Option 2: using a local computer

#### Specify the paths to the relevant folders

In [6]:
# Assign variables indicating the local drive locations of the four
# datasets (train-parasitized, train-uninfected, test-parasitized,
# and test-uninfected)
train_parasitized = '../DL_Malaria_data/cell_images/train/parasitized/'
train_uninfected = '../DL_Malaria_data/cell_images/train/uninfected/'
test_parasitized = '../DL_Malaria_data/cell_images/test/parasitized/'
test_uninfected = '../DL_Malaria_data/cell_images/test/uninfected/'
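Before loading any images, it may be worth confirming that each of the four directories exists and holds roughly the expected number of PNG files. The following is a minimal sketch using only the standard library; `count_images` and `summarize` are hypothetical helper names, not part of the original notebook.

```python
from pathlib import Path


def count_images(directory, suffix=".png"):
    """Return the number of files in `directory` with the given extension."""
    folder = Path(directory)
    if not folder.is_dir():
        raise FileNotFoundError(f"Directory not found: {folder}")
    # Compare extensions case-insensitively so '.PNG' is also counted
    return sum(
        1 for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() == suffix
    )


def summarize(directories):
    """Map each label in a {label: path} dict to its image count."""
    return {label: count_images(path) for label, path in directories.items()}


# Hypothetical usage with the path variables defined above:
# summarize({
#     "train_parasitized": train_parasitized,
#     "train_uninfected": train_uninfected,
#     "test_parasitized": test_parasitized,
#     "test_uninfected": test_uninfected,
# })
```

A missing directory raises `FileNotFoundError` immediately, which tends to surface path typos earlier than a failed image load would.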

### Set parameters for images

In [7]:
# Set the side length in pixels (64 x 64) used to standardize the size of all images
side_length = 64
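To see how far the original images deviate from this target size, one option is to read the width and height directly from each PNG header, where they are stored in the IHDR chunk. The sketch below uses only the standard library and does not resize anything; the helper names `png_dimensions` and `needs_resizing` are assumptions for illustration, and the actual resizing step would need an imaging library that this section does not name.

```python
import struct

# Every PNG file starts with this fixed 8-byte signature
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_dimensions(path):
    """Return (width, height) read from the IHDR chunk of a PNG file."""
    with open(path, "rb") as handle:
        header = handle.read(24)
    # Bytes 0-7: signature, 8-11: chunk length, 12-15: chunk type 'IHDR'
    if (
        len(header) < 24
        or header[:8] != PNG_SIGNATURE
        or header[12:16] != b"IHDR"
    ):
        raise ValueError(f"Not a valid PNG file: {path}")
    # Bytes 16-23: width and height as big-endian unsigned 32-bit integers
    width, height = struct.unpack(">II", header[16:24])
    return width, height


def needs_resizing(path, side_length=64):
    """Return True if the image is not already side_length x side_length."""
    return png_dimensions(path) != (side_length, side_length)
```

Reading only the first 24 bytes avoids decoding the pixel data, so a check like this stays fast even across tens of thousands of files.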

# Data Exploration

# Image Manipulation Exploration

# CNN Modeling Exploration

# Insights and Recommendations