# A Machine Learning Approach to Lung Nodule Detection

## Shuyang Dai, Morgan Ringel, Lisa Sapozhnikov

Netids: sd301, ls258, mjr52

### Abstract

Identifying nodules in CT scan images of the human lung is crucial for detecting lung cancer early, while it is still treatable. Working with CT scans from the Lung Image Database Consortium (LIDC) [1], we break the images into smaller patches; only a few of these patches contain nodules (positive), while most do not (negative). Because of this class imbalance, simply training a CNN classifier would waste much of the available data. Instead, we build an autoencoder that is trained only on the negative patches. When the autoencoder receives a positive patch during testing, it cannot reconstruct the original image accurately because it has never seen a nodule. The autoencoder can therefore indicate whether an input test image contains a nodule through distinct ranges of reconstruction residuals for positive and negative images. Our autoencoder greatly outperforms a baseline CNN classifier, consistently producing much lower reconstruction residuals for healthy images and higher residuals for images containing nodules.

### Introduction

Lung cancer is the second most common cancer, and the leading cause of cancer-related death in both men and women. The chance of a man developing lung cancer over the course of his life is 1 in 15, versus 1 in 17 for a woman. A nodule is defined as an abnormal cell growth on the lung that is three centimeters (about 1.2 inches) in diameter or less. At that size, lung cancer is very curable. Lung nodules are extremely prevalent, with 150,000 being detected in the US each year, and are typically detected only by accident, through X-rays taken for other medical needs, because they are still too small to cause symptoms. While the majority of nodules are benign, almost 40% turn out to be cancerous. Given how common they are, it is a worthwhile goal to make detection of this tiny, potentially life-threatening abnormality more efficient.

In this project, we apply a convolutional neural network (CNN) as a baseline model to detect abnormal nodules in a CT scan of a human lung. CNNs have consistently outperformed traditional computer vision algorithms in image classification tasks [5]. However, CNNs do not perform well on imbalanced classes [6]. As an alternative to the CNN model, this project also proposes an autoencoder that is trained only on image patches without nodules. The autoencoder determines whether an input test image contains a nodule by considering the reconstruction residuals. This project builds, tests, and compares models that have the potential to be used as a computer-aided diagnostic tool for lung cancer.

### Background

Several machine learning approaches have already been used to detect and diagnose lung nodules. Ding et al. (2017) and Golan et al. (2016) both prioritized deep convolutional neural networks (dCNNs) as classification models. Golan et al. [4] used a sliding window to traverse a CT image, feeding each window into a dCNN containing 3 convolution and 3 pooling layers, resulting in a matrix representing the probability of a nodule existing in each window patch. This system yielded a sensitivity of 78.9% with 20 false positives per scan, or a sensitivity of 71.2% with 10 false positives per scan.

In a more complex approach, Ding et al. [2] first used a dCNN to classify images as “nodule candidates”, achieving a high true positive rate of 94.6%. Next, a dCNN containing six 3D convolutional layers, followed by Rectified Linear Unit (ReLU) activation layers, three 3D max-pooling layers, three fully connected layers, and a final 2-way softmax activation layer was used to filter out the false positives that had incorrectly been identified as nodule candidates. This system greatly increased specificity while retaining sensitivity, yielding a final FROC score of 0.891.

Song et al. (2017) [3] classified lung nodules as either malignant or benign, and compared the results generated by a CNN and a stacked autoencoder (SAE). The CNN slightly outperformed the SAE, with a sensitivity of 83.96% and a specificity of 84.32%, while the SAE displayed a sensitivity of 83.96% and a specificity of 81.35%.

Despite this slightly poorer performance, we were interested in an autoencoder for its image reconstruction abilities. Because images containing a nodule differ markedly from images lacking one, we expected an autoencoder’s dimensionality reduction to cope well with the high variance between the classes, and in particular to outperform a CNN on an imbalanced dataset.

### Data

Our training and testing data are both pulled from the Lung Image Database Consortium (LIDC) [1]. The CT scanning process collects 2-dimensional images from the top to the bottom of a human lung, one slice at a time. Our dataset covers 1,012 patients, each with 100 to 300 image slices. Each original scan is a 512 x 512 grayscale image and, for computational efficiency, is divided into 32 x 32 patches (16 patches along each axis, 256 per slice). A nodule is roughly circular and no larger than 32 x 32 pixels. Figure 1 shows two sample slices. The left has a big nodule shown in the red box while the right has a small nodule. These nodules are enlarged and shown in the middle.

<img src="final_images/fig1.png" width="600">
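
The patch division described above can be sketched as a reshape. This is a minimal illustration, assuming non-overlapping tiles and NumPy as the array library; the function name `tile_slice` and the synthetic input are hypothetical and are not taken from the project code.

```python
import numpy as np

def tile_slice(slice_2d: np.ndarray, patch: int = 32) -> np.ndarray:
    """Split a 2-D slice into non-overlapping patch x patch tiles."""
    height, width = slice_2d.shape
    if height % patch or width % patch:
        raise ValueError("slice dimensions must be multiples of the patch size")
    rows, cols = height // patch, width // patch
    # (rows, patch, cols, patch) -> (rows, cols, patch, patch) -> (rows*cols, patch, patch)
    tiles = slice_2d.reshape(rows, patch, cols, patch).swapaxes(1, 2)
    return tiles.reshape(rows * cols, patch, patch)

# Synthetic stand-in for one 512 x 512 grayscale CT slice (not LIDC data).
ct_slice = np.arange(512 * 512, dtype=np.float32).reshape(512, 512)
patches = tile_slice(ct_slice)
print(patches.shape)  # (256, 32, 32): 16 tiles along each axis
```

Tiles are returned in row-major order, so `patches[1]` is the tile immediately to the right of `patches[0]`.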

One problem we needed to solve was how to distinguish between a nodule and a blood vessel, since the two are virtually indistinguishable in a single 2D slice (shown in the blue boxes in Figure 1). To overcome this challenge, we considered the image in 3 dimensions. As shown in Figure 2, a nodule can be visualized as a ball and a blood vessel as a tilted cylinder. Thus, if we take k consecutive slices (here, k = 3), overlay them, and view them in 2 dimensions, a nodule looks different from a blood vessel. In other words, the circular cross-sections of a nodule share approximately the same center in every slice, while the center of a blood vessel's cross-section tends to shift from slice to slice.

<img src="final_images/fig2.png" width="700">
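
One way to picture the slice-overlay idea is to stack k neighbouring slices as the channels of a single patch. The sketch below assumes a window centred on the slice of interest and uses synthetic data; the helper name `stack_slices` is hypothetical, and the project's actual preprocessing may differ.

```python
import numpy as np

def stack_slices(volume: np.ndarray, center: int, k: int = 3) -> np.ndarray:
    """Return k consecutive slices around `center` as an (H, W, k) array."""
    half = k // 2
    start, stop = center - half, center - half + k
    if start < 0 or stop > volume.shape[0]:
        raise IndexError("not enough neighbouring slices around this index")
    # volume is (num_slices, H, W); move the slice axis last so it acts as channels.
    return np.moveaxis(volume[start:stop], 0, -1)

# Synthetic volume of 5 slices, 32 x 32 each (illustration only).
volume = np.zeros((5, 32, 32), dtype=np.float32)
for z in range(5):
    volume[z, 8, 8] = 1.0            # nodule-like: same position in every slice
    volume[z, 20, 10 + 2 * z] = 1.0  # vessel-like: position shifts slice by slice

patch = stack_slices(volume, center=2, k=3)
print(patch.shape)    # (32, 32, 3)
print(patch[8, 8])    # bright in all three channels
print(patch[20, 14])  # bright in the middle channel only
```

In the stacked patch, the nodule-like point lights up the same pixel in every channel, while the vessel-like point appears at a different pixel in each channel.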

### Methods

**Baseline Model (CNN Classifier)**

The data we used to train and test the CNN classifier consists of 5,000 32x32 3-channel patches from the CT images. We used 80% (4,000) for training and 20% (1,000) for testing. We artificially balanced both the training and testing datasets, with equal numbers of nodule and no-nodule patches, to give the CNN baseline the best possible performance. Note that a real-world application would not have balanced classes: the negative (no-nodule) class would dominate. The autoencoder model addresses this unbalanced condition.

<img src="final_images/fig3.png" width="700">

As shown in Figure 3 above, the CNN model architecture consisted of 3 convolutional layers and 1 fully connected layer. We applied batch normalization and a rectified linear unit (ReLU) after each layer. We used four-fold cross-validation and a mini-batch size of 50. This gave us (3x4000)/(4x50) = 60 iterations per epoch. We trained the model for 20 epochs, and then used the test data to generate an ROC curve and AUC measurement to evaluate performance.
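
The iteration count follows from holding out one of the four folds and dividing the remaining samples by the batch size. A small sketch of that arithmetic, with a hypothetical helper name:

```python
def iterations_per_epoch(num_samples: int, num_folds: int, batch_size: int) -> int:
    """Mini-batches per epoch when one fold is held out for validation."""
    train_samples = num_samples * (num_folds - 1) // num_folds
    # Ceiling division so a final partial batch still counts as an iteration.
    return -(-train_samples // batch_size)

per_epoch = iterations_per_epoch(num_samples=4000, num_folds=4, batch_size=50)
print(per_epoch)       # 60
print(per_epoch * 20)  # 1200 iterations over 20 epochs
```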

**Autoencoder Model**

One problem with the CNN model is that, because of the limited amount of data, it can easily overfit. While the number of nodule images is limited, we have many more images that do not contain a nodule. To make full use of the data and avoid the class imbalance issue, we propose an autoencoder model. Specifically, the model is trained only on non-nodule images. After training, the autoencoder should act like a dictionary containing only non-nodule information. Intuitively, when it receives an input unlike anything it has seen before, the difference shows up in the reconstruction.

The key difference between an autoencoder and a CNN classifier is that the former has both an encoder and a decoder, while the latter can be interpreted as an encoder only. In our autoencoder model, shown in Figure 4 below, the encoder has 3 convolutional layers with rectified linear unit (ReLU) activations that map the original image patch of size 32 x 32 x 3 to size 4 x 4 x 64, followed by 2 fully connected layers that map the result to a latent code z of size 64. The decoder begins with 2 fully connected layers that map the latent code z to a vector of size 1024, which is then reshaped to 4 x 4 x 64. Using 3 deconvolutional layers with ReLU, the reshaped tensor is mapped back to the size of the original input image, 32 x 32 x 3. Note that we use 2 fully connected layers rather than the single one in the CNN model because we have far more training data than before, which allows us to increase the model's flexibility.

<img src="final_images/fig4.png" width="700">
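
The tensor sizes quoted above can be checked with a short shape trace. The text does not give kernel sizes, strides, intermediate channel counts, or the hidden fully connected width, so the values used here (4 x 4 kernels, stride 2, padding 1, channels 16/32/64, hidden width 256) are assumptions chosen only because they reproduce the stated 32 -> 4 -> 32 spatial sizes.

```python
def conv_out(size: int, kernel: int = 4, stride: int = 2, padding: int = 1) -> int:
    """Spatial output size of a strided convolution."""
    return (size + 2 * padding - kernel) // stride + 1

def deconv_out(size: int, kernel: int = 4, stride: int = 2, padding: int = 1) -> int:
    """Spatial output size of a transposed convolution (no output padding)."""
    return (size - 1) * stride - 2 * padding + kernel

def trace_shapes(side: int = 32, latent: int = 64,
                 enc_channels=(16, 32, 64), dec_channels=(32, 16, 3),
                 hidden: int = 256):
    """List (stage, shape) pairs for the encoder followed by the decoder."""
    shapes = [("input", (side, side, 3))]
    for i, ch in enumerate(enc_channels, start=1):
        side = conv_out(side)
        shapes.append((f"conv{i}", (side, side, ch)))
    flat = side * side * enc_channels[-1]
    shapes += [("flatten", (flat,)), ("enc_fc1", (hidden,)), ("z", (latent,)),
               ("dec_fc1", (hidden,)), ("dec_fc2", (flat,)),
               ("reshape", (side, side, enc_channels[-1]))]
    for i, ch in enumerate(dec_channels, start=1):
        side = deconv_out(side)
        shapes.append((f"deconv{i}", (side, side, ch)))
    return shapes

for stage, shape in trace_shapes():
    print(f"{stage:>8}: {shape}")
```

With these assumed hyperparameters, each convolution halves the spatial size (32, 16, 8, 4) and each transposed convolution doubles it again, so the flattened encoder output has 4 x 4 x 64 = 1024 entries, matching the decoder's fully connected output.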

We randomly select patches that do not contain a nodule from every patient and use 128,000 of them as our input dataset to train the autoencoder, with a training batch size of 64. In every iteration, the model parameters are updated by backpropagating the gradient of the reconstruction loss (i.e., the Euclidean distance between the original image x and the reconstructed image x’) through each layer of the network. After 15 epochs of training, the model learns how to reconstruct images from the training set (i.e., images that do not contain a nodule).
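
As a rough sketch of the reconstruction loss, the per-image residual is the Euclidean norm of the difference between a patch and its reconstruction. The code below uses NumPy and a fake "reconstruction" in place of a trained model; averaging the residuals over the batch is an assumption, since the text does not say how the batch loss is reduced.

```python
import numpy as np

def reconstruction_residuals(x: np.ndarray, x_hat: np.ndarray) -> np.ndarray:
    """Per-image Euclidean distance ||x - x'||_2 for a batch of images."""
    diff = (x - x_hat).reshape(len(x), -1)  # flatten each image to a vector
    return np.sqrt((diff ** 2).sum(axis=1))

rng = np.random.default_rng(0)
x = rng.random((64, 32, 32, 3))  # one synthetic batch of 64 patches
x_hat = x + 0.01                 # stand-in reconstruction, off by 0.01 per pixel

residuals = reconstruction_residuals(x, x_hat)
batch_loss = residuals.mean()    # assumed reduction: mean over the batch
print(residuals.shape)           # (64,)

updates_per_epoch = 128000 // 64
print(updates_per_epoch)         # 2000 parameter updates per epoch
```

The same per-image residual is the quantity compared between nodule and non-nodule patches at test time.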

We use 2,000 non-nodule images and 2,000 nodule images for testing, none of which were used for training. Ideally, after training the model should only know how to reconstruct non-nodule images, and should therefore give high reconstruction residuals for the nodule images in the testing set.

### Results

**Baseline Model (CNN Classifier)**

Our baseline CNN classifier achieved decent performance, with an AUC score of 0.83, as shown in Figure 5 below.

<img src="final_images/fig5.png" width="700">

**Autoencoder Model**

We run our model with cross validation, store the reconstruction residual for each image (from testing sets with and without nodules), and compare the distributions of the two sets of residuals. 

Figure 6 shows healthy images reconstructed by the autoencoder; the reconstructions are similar in quality to the original images. Figure 7 shows nodule images reconstructed by the autoencoder. Since the autoencoder was trained only on healthy images, it does not know how to properly recreate nodule images, leading to distorted reconstructions.

<img src="final_images/fig6.png" width="600">

<img src="final_images/fig7.png" width="600">

Ideally, we would like to see a clear separation between the residual distribution of the nodule set and that of the non-nodule set. Figure 8 shows boxplots of the two distributions after every epoch. Only 10 epochs are shown because the model usually converges after 10 epochs (i.e., performance stops improving). At the 10th epoch, the bottom 75% of the non-nodule residuals and the top 75% of the nodule residuals do not overlap, meaning that it is possible to set a threshold on the residual to perform nodule classification. Figure 9 shows the ROC curve (AUC = 0.967) for the two distributions at the tenth epoch.

<img src="final_images/fig8.png" width="600">

<img src="final_images/fig9.png" width="600">
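
To illustrate how residuals turn into a classifier and an AUC, the sketch below thresholds made-up residual values and computes the AUC as the probability that a nodule residual exceeds a non-nodule residual. The numbers, threshold, and function names are hypothetical and are not the values behind Figures 8 and 9.

```python
def auc_from_residuals(negative, positive):
    """AUC = P(positive residual > negative residual), ties counted as 1/2."""
    wins = 0.0
    for p in positive:
        for n in negative:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(positive) * len(negative))

def classify(residual, threshold):
    """Flag a patch as containing a nodule when its residual exceeds the threshold."""
    return residual > threshold

# Made-up residuals for illustration only.
healthy = [0.8, 1.0, 1.1, 1.3, 2.4]
nodule = [1.2, 2.0, 2.6, 3.1, 3.5]

print(auc_from_residuals(healthy, nodule))  # 0.88

threshold = 1.5
sensitivity = sum(classify(r, threshold) for r in nodule) / len(nodule)
specificity = sum(not classify(r, threshold) for r in healthy) / len(healthy)
print(sensitivity, specificity)             # 0.8 0.8
```

Sweeping the threshold over all observed residuals and plotting sensitivity against 1 - specificity traces out an ROC curve; the pairwise comparison above gives the area under it directly.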

**Model Comparison**

The CNN classifier baseline had AUC = 0.83 while the autoencoder had AUC = 0.967, showing that the proposed autoencoder performed better for this application. The autoencoder did not require the classes to be artificially balanced to achieve this performance; it performed well with unbalanced classes.

### Conclusions

To detect lung nodules in CT scan images, we compared the performance of a CNN classifier and an autoencoder. Because of an imbalance in the dataset (many more healthy images than nodule images), the CNN could not be trained on the entire healthy dataset. The autoencoder, on the other hand, was trained on all the healthy images available and learned to reconstruct them very accurately, with a residual error distinctly lower than that of reconstructed nodule images.

The autoencoder greatly outperformed our CNN classifier, with the test data classification producing an AUC score of 0.967 based on 1000 data points evenly split between the two classes. We can next apply our autoencoder approach to more complex image detection goals, such as locating nodules within a full-size CT scan image, or classifying a nodule as benign or malignant.

### Roles

**Shuyang Dai**: Autoencoder, Video, Report

**Morgan Ringel**: Baseline CNN classifier, Video, Report

**Lisa Sapozhnikov**: Baseline CNN classifier, Video, Report

### References

[1] Armato, S. G., McLennan, G., Bidaut, L., McNitt-Gray, M. F., Meyer, C. R., Reeves, A. P., & Kazerooni, E. A. (2011). The lung image database consortium (LIDC) and image database resource initiative (IDRI): a completed reference database of lung nodules on CT scans. Medical Physics, 38(2), 915-931. https://wiki.cancerimagingarchive.net/display/Public/LIDC-IDRI

[2] Ding, J., Li, A., Hu, Z., Wang, L. (2017). Accurate pulmonary nodule detection in computed tomography images using deep convolutional neural networks. School of EECS, Peking University. https://arxiv.org/pdf/1706.04303.pdf

[3] Song, Q., Zhao, L., Luo, X., Dou, X. (2017). Using deep learning for classification of lung nodules on computed tomography images. Journal of Healthcare Engineering. downloads.hindawi.com/journals/jhe/2017/8314740.pdf

[4] Golan, R., Jacob, C., Denzinger, J. (2016). Lung nodule detection in CT images using deep convolutional neural networks. 2016 International Joint Conference on Neural Networks (IJCNN). https://ieeexplore.ieee.org/document/7727205/metrics

[5] Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). Imagenet classification with deep convolutional neural networks. Advances in neural information processing systems (pp. 1097-1105).

[6] Hensman, P., Masko, D. (2015). The impact of imbalanced training data for convolutional neural networks. KTH Royal Institute of Technology. https://www.kth.se/social/files/588617ebf2765401cfcc478c/PHensmanDMasko_dkand15.pdf

[7] Olson, Eric J. (2017). Lung nodules: Can they be cancerous? Mayo Clinic. https://www.mayoclinic.org/diseases-conditions/lung-cancer/expert-answers/lung-nodules/faq-20058445 

