# MIMIA Final Project

In this project, I work with lung CT scans from the Lung Nodule Analysis 2016 (LUNA16) challenge.
The goal is to implement the lung CT segmentation method published in the reference paper listed below. In addition to implementing the algorithm from the paper, I also explore other techniques, such as hole filling, to post-process the data so that the lungs are fully segmented.

1. J. Heuberger, A. Geissbuhler, H. Muller, "Lung CT segmentation for image retrieval", Medical Imaging and Telemedicine, 2005. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.470.962&rep=rep1&type=pdf
2. LUNA16 Competition. https://luna16.grand-challenge.org/download/
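
The hole-filling step mentioned in the introduction can be illustrated on a small 2D binary mask. The following is a minimal sketch, not the project's implementation: it flood-fills the background from the image border, and any background pixel that is never reached is treated as a hole and set to foreground. The function name `fill_holes` and the toy mask are illustrative assumptions; on real volumes, SimpleITK's `BinaryFillholeImageFilter` would be a more likely choice.

```python
from collections import deque


def fill_holes(mask):
    """Return a copy of a 2D binary mask (list of lists of 0/1) with enclosed holes filled."""
    rows, cols = len(mask), len(mask[0])
    reached = [[False] * cols for _ in range(rows)]
    queue = deque()

    # Seed the flood fill with every background pixel on the image border.
    for r in range(rows):
        for c in range(cols):
            on_border = r in (0, rows - 1) or c in (0, cols - 1)
            if on_border and mask[r][c] == 0:
                reached[r][c] = True
                queue.append((r, c))

    # Breadth-first flood fill through 4-connected background pixels.
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and not reached[nr][nc] and mask[nr][nc] == 0:
                reached[nr][nc] = True
                queue.append((nr, nc))

    # Background pixels never reached from the border are enclosed holes.
    return [[1 if mask[r][c] == 1 or not reached[r][c] else 0 for c in range(cols)]
            for r in range(rows)]


# Toy mask: a ring of foreground with a single-pixel hole in the middle.
toy_mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
filled = fill_holes(toy_mask)
```

In a lung mask, such enclosed holes typically correspond to vessels or nodules that a pure intensity threshold excludes from the lung region.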

In [1]:
import os
import sys
import time
from LevelSetSegmentation import *
from ConnectedThresholdSegmentation import *
from NeighborhoodConnected import *
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# Imported explicitly because later cells use sitk and os directly
import SimpleITK as sitk
%matplotlib inline
plt.style.use('ggplot')

### Setup
Set up the objects for the different segmentation algorithms. Implementation details can be found in the corresponding Python module files.

In [8]:
levelSet = LevelSetSegmentation([1,3,5])
connectedThreshold = ConnectedThresholdSegmentation([1,3,5])
print(levelSet.iters)
print(connectedThreshold.iters)

[1, 3, 5]
[1, 3, 5]


### Reference Paper Implementation
Generate segmentation results using connected thresholding and the algorithms from the reference paper.

In [None]:
tempImage = connectedThreshold.preprocess()
connectIters = [1, 2, 3, 4, 5, 10]
connectIterTimes = []
for iteration in connectIters:
    start = time.time()
    connectedThreshold.run(tempImage, iteration)
    end = time.time()
    connectIterTimes.append(end - start)
    # Report fractional seconds; %d would truncate short runs to 0
    print("%d iteration(s) took %.2f seconds" % (iteration, end - start))
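
For readers without access to the `ConnectedThresholdSegmentation` module, the idea behind connected thresholding is sketched below on a toy 2D image. This illustrates the general technique only and is not the module's actual code: starting from a seed, it collects every 4-connected pixel whose intensity lies within `[lower, upper]`. The function name, the toy intensities, and the threshold values are all hypothetical.

```python
from collections import deque


def connected_threshold(image, seed, lower, upper):
    """Grow a region from seed over 4-connected pixels with lower <= value <= upper."""
    rows, cols = len(image), len(image[0])
    segmented = [[0] * cols for _ in range(rows)]
    seed_row, seed_col = seed

    # If the seed itself is outside the intensity range, the region is empty.
    if not (lower <= image[seed_row][seed_col] <= upper):
        return segmented

    segmented[seed_row][seed_col] = 1
    queue = deque([(seed_row, seed_col)])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and segmented[nr][nc] == 0 and lower <= image[nr][nc] <= upper:
                segmented[nr][nc] = 1
                queue.append((nr, nc))
    return segmented


# Toy slice with made-up intensities: dark "lung" pixels surrounded by bright "tissue".
toy_slice = [
    [40,   40,   40, 40,   40],
    [40, -800, -820, 40, -790],
    [40, -810, -805, 40,   40],
    [40,   40,   40, 40,   40],
]
lung = connected_threshold(toy_slice, seed=(1, 1), lower=-1000, upper=-500)
```

The dark pixel at row 1, column 4 falls inside the intensity range but is not connected to the seed, so it is left out. This dependence on the seed is why seed placement matters for this family of methods.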

Write the segmented images and the run time for each number of iterations to a file for further analysis.

In [None]:
# with open("data.txt", "w") as f:
#     f.write("ConnectedThreshold\n")
#     f.write(",".join(map(str, connectIters)) + "\n")
#     f.write(",".join(map(str, connectIterTimes)) + "\n")
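
If the timings are meant to be reloaded later with pandas, a one-row-per-run CSV layout may be easier to work with than two comma-joined lines. A minimal sketch using the standard `csv` module follows; the function name `write_timings`, the column names, and the output path are assumptions and not part of the original notebook.

```python
import csv


def write_timings(path, algorithm, iterations, seconds):
    """Write one CSV row per run: algorithm name, iteration count, elapsed seconds."""
    if len(iterations) != len(seconds):
        raise ValueError("iterations and seconds must have the same length")
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["algorithm", "iterations", "seconds"])
        for n, t in zip(iterations, seconds):
            # Three decimals keep sub-second runs distinguishable.
            writer.writerow([algorithm, n, f"{t:.3f}"])
```

With the variables from the timing cell above, a call would look like `write_timings("timings.csv", "ConnectedThreshold", connectIters, connectIterTimes)`.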

Generate the segmentation using the level set method.

In [None]:
levelSetIters = [500, 800, 1000, 1200, 1500]
levelSetTimes = []
for iteration in levelSetIters:
    start = time.time()
    levelSet.run(iteration)
    end = time.time()
    levelSetTimes.append(end - start)
    # Report fractional seconds; %d would truncate short runs to 0
    print("%d iteration(s) took %.2f seconds" % (iteration, end - start))

### Evaluate the Performance of Segmentation
Read in the segmented images and use SimpleITK's LabelOverlapMeasuresImageFilter to evaluate the segmentation performance.

In [3]:
# Convert image into binary image
def convert2Binary(path):
    reader = sitk.ImageFileReader()
    reader.SetFileName( path )
    image = reader.Execute()

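    # Assuming the filter's default lower threshold of 0: pixels equal to 0
    # stay background (0) and every other pixel becomes foreground (255)
    binaryFilter = sitk.BinaryThresholdImageFilter()
    binaryFilter.SetUpperThreshold(0)
    binaryFilter.SetInsideValue(0)
    binaryFilter.SetOutsideValue(255)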
    image = binaryFilter.Execute(image)
    return image

# Evaluate the segmentation against the reference using LabelOverlapMeasuresImageFilter
def evaluate(refSeg, image):
    # Set the origin in order to align the two comparing images
    refSeg.SetOrigin(image.GetOrigin())
    overlapMeasureFilter = sitk.LabelOverlapMeasuresImageFilter()
    overlapMeasureFilter.Execute(refSeg, image)
    results = (overlapMeasureFilter.GetJaccardCoefficient(), overlapMeasureFilter.GetDiceCoefficient())
    print(fileName)  # NOTE: relies on the global fileName set by the calling loop
    print(results)
    print()
    return results
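
For reference, the two scores returned by `evaluate` have simple definitions on a reference mask A and a segmented mask B: Jaccard = |A ∩ B| / |A ∪ B| and Dice = 2 |A ∩ B| / (|A| + |B|), which implies Dice = 2 J / (1 + J). The sketch below computes both on flat 0/1 sequences in plain Python. It is only an illustration of the definitions, not a replacement for the SimpleITK filter, and the function names are hypothetical.

```python
def overlap_scores(reference, segmented):
    """Return (Jaccard, Dice) for two equally sized flat sequences of 0/1 labels."""
    if len(reference) != len(segmented):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(1 for a, b in zip(reference, segmented) if a and b)
    size_ref = sum(1 for a in reference if a)
    size_seg = sum(1 for b in segmented if b)
    union = size_ref + size_seg - intersection
    if union == 0:
        # Both masks empty: treated as perfect agreement (a convention chosen here).
        return 1.0, 1.0
    return intersection / union, 2 * intersection / (size_ref + size_seg)


def dice_from_jaccard(jaccard):
    """Convert a Jaccard coefficient to the equivalent Dice coefficient."""
    return 2 * jaccard / (1 + jaccard)


# Two 5-pixel masks that share 2 foreground pixels out of 4 in their union.
jaccard, dice = overlap_scores([1, 1, 1, 0, 0], [0, 1, 1, 1, 0])
```

The identity is consistent with the results reported below: a Jaccard coefficient of 0.8316 corresponds to a Dice coefficient of 0.9080. Since one score is a monotonic function of the other, the two always rank segmentations in the same order.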

In [5]:
refSegPath = "seg-lungs-LUNA16/1.3.6.1.4.1.14519.5.2.1.6279.6001.109002525524522225658609808059.mhd"
refSeg = convert2Binary(refSegPath)
# sitk.Show(refSeg)

directory = os.fsencode("ConnectedThreshold")
files = sorted(os.listdir(directory))
Jaccards = []
Dices = []
for file in files:
    fileName = os.fsdecode(file)
    if fileName.endswith("mhd"): 
        reader = sitk.ImageFileReader()
        reader.SetFileName(str(directory, 'utf-8') + "/" + fileName)
        image = reader.Execute()
        overlapMeasure = evaluate(refSeg, image)
        Jaccards.append(overlapMeasure[0])
        Dices.append(overlapMeasure[1])
        

1.mhd
(0.8315639955109462, 0.9080370629135097)

2.mhd
(0.8410021256428483, 0.9136351489536536)

3.mhd
(0.8465970804302524, 0.9169266965731339)

4.mhd
(0.8502037678321495, 0.919037981236324)

5.mhd
(0.8526791457356355, 0.9204822623477696)



In [7]:
measureDF = pd.DataFrame(data = {"JaccardCoefficient" : Jaccards, "DiceCoefficient" : Dices})
print(measureDF)
measureDF.plot()

   DiceCoefficient  JaccardCoefficient
0         0.908037            0.831564
1         0.913635            0.841002
2         0.916927            0.846597
3         0.919038            0.850204
4         0.920482            0.852679



### Grid Search for the Optimal Parameters
In this section, I perform a grid search, a common technique for tuning model parameters, to find the optimal parameters for the best-performing algorithm identified in the cells above.
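
A grid search of this kind can be sketched with `itertools.product`. The parameter names (`lower`, `upper`), the candidate values, and the scoring function below are placeholders chosen for illustration; in the notebook, the score would presumably come from running a segmentation with the given parameters and passing the result to `evaluate`.

```python
from itertools import product


def grid_search(param_grid, score_fn):
    """Evaluate score_fn on every parameter combination; return (best_params, best_score)."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Placeholder score that peaks at lower=-1000, upper=-400. A real run would
# return, for example, the Dice coefficient of the resulting segmentation.
def toy_score(lower, upper):
    return -abs(lower + 1000) - abs(upper + 400)


grid = {"lower": [-1100, -1000, -900], "upper": [-500, -400, -300]}
best_params, best_score = grid_search(grid, toy_score)
```

The cost grows with the product of the grid sizes (9 evaluations here), so with segmentation runs that take seconds to minutes each, a coarse grid is a reasonable starting point.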

### Apply the Optimal Parameters
In this section, I will apply the optimal parameters to the remaining images in the dataset and report the segmentation performance. (Open question: automatic seed identification?)