# <center>**Ensemble Model for Improved Personalized Lung Cancer Risk Assessment and Malignant Nodule Detection.** </center> 

## Overview 
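1. Lung cancer is often fatal, and earlier detection is crucial for successful treatment.
2. Phase 1: a gradient boosted machine (GBM) that uses personalized patient characteristics is a more accurate screening tool than the previous screening guidelines.
   * It identifies patients who were previously misclassified.
   * Based on my research, race/ethnic group, BMI, and COPD were predictive of lung cancer risk.
3. Once a CT scan is available, finding nodules is difficult, like looking for a needle in a haystack.
   * Computer-aided detection (CAD) should be a good solution, but it is used only as a secondary tool because of its high false positive rate.
   * An ensemble of convolutional neural networks (CNNs) identifies lung nodules and predicts one-year malignancy.
   * It achieves a sensitivity of 90% with fewer than 2 false positives per scan.
4. An automated first screen identifies high-risk patients and locates their nodules.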

### By: Suraj Anand

### 2018-2019



# <center>Research</center>
**This notebook illustrates each method with figures.**

* [Gradient Boosted Machine](#GBM)
* [CT Scan Ensemble Overview](#CNN)
* [Augmentation](#Aug)
* [Convolutional Network Explanation](#Model)
* [Fully Convolutional Network for Candidate Regions](#FCN)
* [Four 3D CNN Discriminator Networks](#Disc)
* [Linear Classifier](#Linear)
* [Summary](#Summary)

<a id='GBM'></a>
## <center> Gradient Boosted Machine </center>

The GBM is an ensemble of 50 decision trees, trained on a dataset of 23,000 patients (12,000 with lung cancer).
The training algorithm is as follows:

![IMG](LC_FIGS/gbm.png)
![IMG](LC_FIGS/out.png)
![IMG](LC_FIGS/branch.png)

## Sensitivity of 89%, compared with 23% for the previous screening guidelines
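Gradient boosting builds its trees sequentially: each new tree is fitted to the residual errors (the negative gradient of the loss) of the trees before it, and its output is added with a small learning rate. The sketch below illustrates only that loop. It assumes a binary log-loss, uses depth-1 trees (stumps) rather than the deeper trees pictured, and runs on a tiny made-up feature table; the feature columns and `learning_rate=0.1` are illustrative assumptions, and `n_trees=50` simply mirrors the tree count above. The `sensitivity` helper shows how the reported metric is defined: true positives divided by all patients who actually have cancer.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_stump(X, residuals):
    """Find the (feature, threshold) split whose two leaf means best fit the residuals."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            if not left or not right:
                continue
            lv, rv = sum(left) / len(left), sum(right) / len(right)
            err = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
            if best is None or err < best[0]:
                best = (err, j, t, lv, rv)
    if best is None:  # every feature is constant: no useful split
        return lambda row: 0.0
    _, j, t, lv, rv = best
    return lambda row: lv if row[j] <= t else rv

def fit_gbm(X, y, n_trees=50, learning_rate=0.1):
    """Boosting for binary log-loss: each stump fits the negative gradient y - p."""
    pos_rate = sum(y) / len(y)
    base = math.log(pos_rate / (1 - pos_rate))  # initial log-odds
    scores, trees = [base] * len(y), []
    for _ in range(n_trees):
        residuals = [yi - sigmoid(s) for yi, s in zip(y, scores)]
        tree = fit_stump(X, residuals)
        trees.append(tree)
        scores = [s + learning_rate * tree(row) for s, row in zip(scores, X)]
    return base, trees, learning_rate

def predict_proba(model, row):
    base, trees, lr = model
    return sigmoid(base + lr * sum(tree(row) for tree in trees))

def sensitivity(y_true, y_pred):
    """True positives divided by all actual positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    return tp / sum(y_true)

# Hypothetical rows: [age, pack_years, has_copd]
X = [[55, 10, 0], [70, 40, 1], [62, 30, 1], [48, 5, 0], [66, 35, 0], [59, 8, 0]]
y = [0, 1, 1, 0, 1, 0]
model = fit_gbm(X, y)
preds = [int(predict_proba(model, row) >= 0.5) for row in X]
print(sensitivity(y, preds))
```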

<a id='CNN'></a>
# <center> Machine Learning CT Scan CNN Ensemble Overview </center>

### 1. Preprocessing of DICOM CT Scans
### 2. Fully Convolutional Network for Candidate Nodule Regions 
### 3. Four 3D CNN Discriminator Networks for Detection of Nodules with Malignant Potential
1. U-Net for coarse nodule segmentation
2. VGG-like scanning CNN for fine-grained estimation of nodule probability
3. VGG-like scanning CNN for fine-grained estimation of nodule probability (trained on false positives)
4. U-Net for anomalous tissue detection

* All networks except the 3rd are trained on the full training set.
* All networks except the 4th detect nodules and predict calcification, size, and luminosity.
 
### 4. Linear Classifier predicting development of lung cancer within one year

## **Preprocessing of the CT Scans**

* **Loading the DICOM files**, and adding missing metadata  
* **Converting the pixel values to *Hounsfield Units (HU)***, and identifying which tissues these values correspond to
* **Resampling** to an isotropic resolution to remove variation in scanner resolution
* **3D plotting**, since visualization is very useful for checking each step
* **Lung segmentation**
* **Normalization** of HU values to a fixed range
* **Zero centering** the scans.
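The HU conversion and resampling steps can be sketched as follows. Converting stored pixel values uses the standard DICOM rescale formula, `HU = pixel_value * RescaleSlope + RescaleIntercept`. The `padding_value=-2000` convention for pixels outside the scanner's field of view is an assumption that holds for some scanners, and the 1 mm isotropic target spacing is a hypothetical choice. `resample_shape` only computes the new voxel grid; the interpolation itself would typically be done with a library routine such as `scipy.ndimage.zoom`.

```python
def to_hounsfield(raw_slice, slope, intercept, padding_value=-2000):
    """Convert stored pixel values to HU: HU = raw * RescaleSlope + RescaleIntercept.

    Pixels equal to `padding_value` (outside the circular field of view on some
    scanners; -2000 is an assumed convention) are mapped to -1000 HU, i.e. air.
    """
    return [
        [-1000.0 if v == padding_value else v * slope + intercept for v in row]
        for row in raw_slice
    ]

def resample_shape(shape, spacing, new_spacing=(1.0, 1.0, 1.0)):
    """Voxel grid shape after resampling to `new_spacing` (mm per voxel), plus the
    spacing actually achieved once the shape is rounded to whole voxels."""
    new_shape = [max(1, round(n * s / ns)) for n, s, ns in zip(shape, spacing, new_spacing)]
    actual_spacing = [n * s / m for n, s, m in zip(shape, spacing, new_shape)]
    return new_shape, actual_spacing

# Hypothetical metadata: slope 1 and intercept -1024 are common for CT
hu = to_hounsfield([[0, 1024, -2000]], slope=1.0, intercept=-1024.0)
print(hu)  # [[-1024.0, 0.0, -1000.0]]

# Hypothetical scan: 120 slices of 512x512 pixels, 2.5 mm slices, 0.7 mm pixels
shape, spacing = resample_shape((120, 512, 512), (2.5, 0.7, 0.7))
print(shape)  # [300, 358, 358]
```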



### Hounsfield Units (HU)
![IMG](LC_FIGS/HU.png)

### Lung Segmentation
![GIF](whole-scan-downsampled5x.gif)

![GIF](lung_segmentation.gif)
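One simple way to obtain a lung mask from a single axial slice is thresholding plus a flood fill: low-density pixels connected to the image border are air outside the body, and the low-density pixels that remain are the lungs. This is a minimal 2D sketch, not necessarily the segmentation used for the animations above; the -320 HU threshold is an assumed value, and a real pipeline would work in 3D, keep only the largest components, and dilate the mask so that nodules attached to the lung wall are not cut away.

```python
from collections import deque

def segment_lungs(hu_slice, threshold=-320):
    """Rough 2D lung mask: low-density pixels (< threshold HU) that are NOT
    connected to the image border (border-connected air is outside the body)."""
    rows, cols = len(hu_slice), len(hu_slice[0])
    low = [[v < threshold for v in row] for row in hu_slice]
    outside = [[False] * cols for _ in range(rows)]
    queue = deque()
    # Seed the flood fill with every low-density pixel on the image border.
    for r in range(rows):
        for c in range(cols):
            if (r in (0, rows - 1) or c in (0, cols - 1)) and low[r][c]:
                outside[r][c] = True
                queue.append((r, c))
    # 4-connected breadth-first flood fill through low-density pixels.
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and low[nr][nc] and not outside[nr][nc]:
                outside[nr][nc] = True
                queue.append((nr, nc))
    return [[low[r][c] and not outside[r][c] for c in range(cols)] for r in range(rows)]

# Toy slice: air (A) around a ring of soft tissue (T) enclosing lung tissue (L)
A, T, L = -1000, 40, -800
slice_ = [
    [A, A, A, A, A],
    [A, T, T, T, A],
    [A, T, L, T, A],
    [A, T, T, T, A],
    [A, A, A, A, A],
]
mask = segment_lungs(slice_)
print(sum(map(sum, mask)))  # 1
```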

### Lung 3D Visualizations using the HU Units

Filtered the HU values to show bone:
![IMG](LC_FIGS/ribs.png)

Filtered the HU values to show the lungs:
![IMG](LC_FIGS/visualLung.png)

* **HU values are clipped to the range -1000 to 400 to simplify the problem.**
* **Scans are resampled with interpolation to account for variability between scanners.**
* **Scans are normalized.**
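The clipping, normalization, and zero-centering steps can be sketched on a flat list of voxel values. The -1000 to 400 HU window comes from the text; mapping it linearly to [0, 1] and subtracting a single mean are assumptions about how normalization and zero-centering were done. In practice the mean would be computed once over the training scans and reused for every scan, rather than from the toy values as here.

```python
MIN_HU, MAX_HU = -1000.0, 400.0  # clipping window stated in the text

def normalize(hu_values):
    """Clip HU values to [MIN_HU, MAX_HU] and rescale them linearly to [0, 1]."""
    out = []
    for v in hu_values:
        v = min(max(v, MIN_HU), MAX_HU)
        out.append((v - MIN_HU) / (MAX_HU - MIN_HU))
    return out

def zero_center(values, mean):
    """Subtract a dataset-wide mean so that inputs are centered near 0."""
    return [v - mean for v in values]

norm = normalize([-2000, -1000, -300, 400, 1500])
print(norm)  # [0.0, 0.0, 0.5, 1.0, 1.0]

mean = sum(norm) / len(norm)  # hypothetical: stands in for a training-set mean
centered = zero_center(norm, mean)
print(centered)  # [-0.5, -0.5, 0.0, 0.5, 0.5]
```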

<a id='Aug'></a>
## Augmentation

![IMG](LC_FIGS/augmentation.png)
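The figure does not list the exact transforms, so the following is only a sketch of common 3D augmentations for CT patches, assumed rather than taken from this project: random flips along each axis and random 90-degree rotations in the axial plane. The volume is a nested list indexed `[z][y][x]`; both transforms only rearrange voxels, so the voxel values themselves are unchanged.

```python
import random

def flip(volume, axis):
    """Mirror a 3D volume (nested lists indexed [z][y][x]) along one axis."""
    if axis == 0:
        return volume[::-1]
    if axis == 1:
        return [plane[::-1] for plane in volume]
    return [[row[::-1] for row in plane] for plane in volume]

def rotate_axial_90(volume):
    """Rotate every axial (y, x) plane by 90 degrees clockwise."""
    return [[list(row) for row in zip(*plane[::-1])] for plane in volume]

def augment(volume, rng):
    """Flip along each axis with probability 0.5, then apply 0-3 quarter turns."""
    for axis in range(3):
        if rng.random() < 0.5:
            volume = flip(volume, axis)
    for _ in range(rng.randrange(4)):
        volume = rotate_axial_90(volume)
    return volume

cube = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
print(rotate_axial_90(cube)[0])  # [[3, 1], [4, 2]]
aug = augment(cube, random.Random(0))
```

Seeding a `random.Random` instance keeps each augmentation reproducible without touching the global random state.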





## Hand-Labeled Large Pulmonary Masses to Improve the Classifier


Example of nodule detection on LUNA candidates, with a labeled pulmonary mass
![IMG](LC_FIGS/Annotated.png)

<a id='Model'></a>
## Model Construction

## How a Convolutional Neural Network Works
![IMG](LC_FIGS/convolutional.png)

![IMG](LC_FIGS/pool.png)
![IMG](LC_FIGS/ex2.png)
![IMG](LC_FIGS/ex1.png)
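The convolution and pooling operations pictured above can be written out directly. In this sketch, `conv2d` slides a kernel over the image and sums the element-wise products (as in most deep learning libraries, the kernel is not flipped, so strictly this is cross-correlation), `relu` zeroes negative responses, and `max_pool` keeps the largest value in each non-overlapping window. The 5x5 image and 3x3 kernel are toy values; the networks in this project are 3D, where the same idea applies with 3D kernels.

```python
def conv2d(image, kernel):
    """'Valid' 2D convolution as used in CNNs (no kernel flip, stride 1).
    Output size: (H - kh + 1) x (W - kw + 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b] for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

def relu(feature_map):
    """Replace negative activations with 0."""
    return [[max(0, v) for v in row] for row in feature_map]

def max_pool(feature_map, size=2):
    """Non-overlapping max pooling with a size x size window (stride = size)."""
    return [
        [
            max(feature_map[i + a][j + b] for a in range(size) for b in range(size))
            for j in range(0, len(feature_map[0]) - size + 1, size)
        ]
        for i in range(0, len(feature_map) - size + 1, size)
    ]

image = [
    [1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0],
]
kernel = [[1, 0, 1], [0, 1, 0], [1, 0, 1]]
feature_map = relu(conv2d(image, kernel))
print(feature_map)  # [[4, 3, 4], [2, 4, 3], [2, 3, 4]]
```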

Schematic of the convolution operation
![IMG](LC_FIGS/convolution_schematic.gif)

<a id='FCN'></a>

## Fully Convolutional Network for Candidate Nodule Regions 

* **Operates only in the spatial domain (no fully connected layers), which makes processing much faster**
* **Reduces the volume that the four discriminators must process**

Mask computed by the fully convolutional network
![IMG](LC_FIGS/mask.png)

Example of Candidate Nodule Region Generation
![IMG](LC_FIGS/candidate_nodules2.gif)
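Turning the FCN output into candidate regions can be sketched as thresholding the voxel probabilities and grouping the surviving voxels into connected components, each of which becomes one candidate for the discriminators. The 0.5 threshold, 6-connectivity, and `min_voxels` parameter are assumptions for illustration; a library function such as `scipy.ndimage.label` would normally do the labeling.

```python
from collections import deque

NEIGHBOURS = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))

def candidate_regions(prob, threshold=0.5, min_voxels=1):
    """Group voxels with FCN probability > threshold into 6-connected components
    and return (centroid_zyx, voxel_count) for each component."""
    depth, height, width = len(prob), len(prob[0]), len(prob[0][0])
    seen, regions = set(), []
    for z in range(depth):
        for y in range(height):
            for x in range(width):
                if prob[z][y][x] <= threshold or (z, y, x) in seen:
                    continue
                # Breadth-first search over this connected component.
                seen.add((z, y, x))
                queue, voxels = deque([(z, y, x)]), []
                while queue:
                    cz, cy, cx = queue.popleft()
                    voxels.append((cz, cy, cx))
                    for dz, dy, dx in NEIGHBOURS:
                        nz, ny, nx = cz + dz, cy + dy, cx + dx
                        if (0 <= nz < depth and 0 <= ny < height and 0 <= nx < width
                                and (nz, ny, nx) not in seen
                                and prob[nz][ny][nx] > threshold):
                            seen.add((nz, ny, nx))
                            queue.append((nz, ny, nx))
                if len(voxels) >= min_voxels:
                    centroid = tuple(sum(v[i] for v in voxels) / len(voxels) for i in range(3))
                    regions.append((centroid, len(voxels)))
    return regions

# Toy probability map with a single slice (z = 0)
prob = [[[0.9, 0.8, 0.1, 0.0],
         [0.1, 0.0, 0.0, 0.7],
         [0.0, 0.0, 0.0, 0.6]]]
regions = candidate_regions(prob)
print(regions)  # [((0.0, 0.0, 0.5), 2), ((0.0, 1.5, 3.0), 2)]
```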





<a id='Disc'></a>
## Four 3D CNN Discriminator Networks for Detection of Nodules with Malignant Potential, and Nodule Morphology Predictions 
1. U-Net for coarse nodule segmentation
2. VGG-like scanning CNN for fine-grained estimation of nodule probability
3. VGG-like scanning CNN for fine-grained estimation of nodule probability (trained on false positives)
4. U-Net for anomalous tissue detection

* All networks except the 3rd are trained on the full training set.
* All networks except the 4th detect nodules and predict calcification, size, and luminosity.
 
# To decrease the false positive rate, the third CNN is trained on a set of false positives
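Training a network on its own false positives is a form of hard negative mining. A minimal sketch, assuming that detections and annotated nodules are given as (z, y, x) centres in millimetres and that a detection counts as a false positive when it lies more than a hypothetical 10 mm from every annotated nodule. The mined positions become label-0 examples alongside the true nodules; the actual matching rule used in this project is not stated.

```python
import math

def mine_false_positives(candidates, true_nodules, match_radius_mm=10.0):
    """Return detections farther than `match_radius_mm` from every annotated
    nodule centre. Positions are (z, y, x) tuples in millimetres."""
    return [
        c for c in candidates
        if all(math.dist(c, t) > match_radius_mm for t in true_nodules)
    ]

def build_training_set(true_nodules, false_positives):
    """Label annotated nodules 1 and mined false positives 0 (hard negatives)."""
    return [(pos, 1) for pos in true_nodules] + [(pos, 0) for pos in false_positives]

# Hypothetical detector output and annotation for one scan
detections = [(10, 50, 50), (40, 80, 20), (12, 52, 49)]
nodules = [(11, 51, 50)]
fps = mine_false_positives(detections, nodules)
print(fps)  # [(40, 80, 20)]
training = build_training_set(nodules, fps)
```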

Example of nodule detection from the best discriminator
![IMG](LC_FIGS/final_nod.gif)

* size: 30
* calcification: [ 0. 0. 0. 0. 0. 1.]
* sphericity: [ 0. 0.25 0.75]
* malignancy: 18.08 (trained from squared)
* spiculation: 1.5
#### **The ensemble also predicts the z-location, calcification, and spiculation of each nodule, as seen above**
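The printed vectors look like per-category scores. The following sketch shows how such outputs could be decoded, under two assumptions not stated in the source: that the six calcification entries follow the order of the six LIDC-IDRI calcification categories, and that the three sphericity entries are probabilities over ordered bins. `argmax` gives a categorical label, while the expected bin index summarizes an ordinal output in a single number.

```python
# Assumed label order: the six LIDC-IDRI calcification categories (hypothetical mapping)
CALCIFICATION_LABELS = ["popcorn", "laminated", "solid", "non-central", "central", "absent"]

def argmax(scores):
    """Index of the largest score (the first one wins on ties)."""
    return max(range(len(scores)), key=lambda i: scores[i])

def decode_calcification(scores):
    """Most likely calcification category for a 6-element score vector."""
    return CALCIFICATION_LABELS[argmax(scores)]

def expected_bin(probs):
    """Expected bin index of an ordinal probability vector, e.g. sphericity bins."""
    return sum(i * p for i, p in enumerate(probs))

print(decode_calcification([0., 0., 0., 0., 0., 1.]))  # absent
print(expected_bin([0., 0.25, 0.75]))                    # 1.75
```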






<a id='Linear'></a>

## Linear Classifier predicting development of lung cancer within one year
![IMG](LC_FIGS/linreg.png)

# FINAL PREDICTION: 88% accuracy 
## The output is the probability of developing lung cancer within one year
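A natural choice for a linear classifier that outputs a probability is logistic regression. The sketch below assumes that is the model here and that each patient is summarized by two hypothetical features scaled to [0, 1], such as the highest nodule malignancy score and the largest nodule size reported by the ensemble; the source does not list the actual inputs. The weights are fitted by batch gradient descent on log-loss, and the toy data is not meant to reproduce the 88% accuracy reported above.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(X, y, lr=0.5, epochs=3000):
    """Fit w, b for p = sigmoid(w . x + b) by batch gradient descent on log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict(w, b, x):
    """Predicted probability of lung cancer within one year for features x."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

def accuracy(y_true, probs, threshold=0.5):
    return sum((p >= threshold) == bool(t) for t, p in zip(y_true, probs)) / len(y_true)

# Hypothetical features per patient: [max_malignancy, largest_nodule_size] in [0, 1]
X = [[0.9, 0.8], [0.8, 0.6], [0.7, 0.9], [0.1, 0.2], [0.2, 0.1], [0.3, 0.3]]
y = [1, 1, 1, 0, 0, 0]
w, b = train_logistic(X, y)
probs = [predict(w, b, x) for x in X]
print(accuracy(y, probs))
```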



<a id='Summary'></a>


