# UPDATE:

The notebook has been updated and **works much faster.**

### Each slice now takes approx. 120-220 ms to process, compared with 2.5-5 seconds before.

## Update 2: I'm working on computing lung volumes and chest circumferences; I'll post an update once it's done!

# Lung Segmentation using Marker-Controlled Watershed Transformation

Previously suggested segmentation methods use image thresholding based on HU (Hounsfield Units) or other methods that are prone to picking up regions we are not interested in. The **watershed transform** is a powerful segmentation algorithm based on [watersheds](https://science.howstuffworks.com/environmental/conservation/issues/watershed1.htm), in which we treat the image as a surface.


![Blobs](https://www.mathworks.com/company/newsletters/articles/the-watershed-transform-strategies-for-image-segmentation/_jcr_content/mainParsys/image_1.adapt.1200.high.jpg/1542750811892.jpg)![Catchment Basins](https://www.mathworks.com/company/newsletters/articles/the-watershed-transform-strategies-for-image-segmentation/_jcr_content/mainParsys/image_2.adapt.1200.high.gif/1542750811908.gif)

## Watershed Transformation
The basic idea behind watershed segmentation is that any grayscale image can be viewed as a topographic surface.
If we flood this surface from its minima and prevent the water coming from different sources from merging, we partition the image into two different sets: the catchment basins and the watershed lines.
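To make the flooding idea concrete, here is a toy sketch on a 1D elevation profile. It is a simplified priority-queue flooding written only for illustration (not the scikit-image implementation used later), and `flood_1d` is a hypothetical helper name. Each strict local minimum starts its own basin, the water always rises at the lowest waiting position, and a position reached by two different basins becomes a watershed (dam) position. Plateaus are deliberately not handled.

```python
import heapq


def flood_1d(elevation):
    """Flood a 1D elevation profile from its strict local minima.

    Returns one label per position: 1, 2, ... are catchment basins and
    -1 marks a watershed (dam) position where two basins meet.
    """
    n = len(elevation)
    labels = [0] * n  # 0 = not reached by the water yet

    def neighbours(i):
        return [j for j in (i - 1, i + 1) if 0 <= j < n]

    def enqueue_unlabelled_neighbours(i):
        for j in neighbours(i):
            if labels[j] == 0 and j not in queued:
                heapq.heappush(heap, (elevation[j], j))
                queued.add(j)

    # One basin per strict local minimum (plateaus are not handled here)
    next_label = 1
    for i in range(n):
        if all(elevation[i] < elevation[j] for j in neighbours(i)):
            labels[i] = next_label
            next_label += 1

    # The water rises evenly: always flood the lowest waiting position next
    heap, queued = [], set()
    for i in range(n):
        if labels[i] > 0:
            enqueue_unlabelled_neighbours(i)

    while heap:
        _, i = heapq.heappop(heap)
        basins = {labels[j] for j in neighbours(i) if labels[j] > 0}
        if len(basins) > 1:
            labels[i] = -1  # water from two basins meets here: build a dam
        else:
            labels[i] = basins.pop()
            enqueue_unlabelled_neighbours(i)
    return labels


profile = [5, 2, 3, 6, 4, 1, 4, 7]
print(flood_1d(profile))
```

For this profile the two valleys (heights 2 and 1) seed basins 1 and 2, and the peak of height 6 between them is marked -1, i.e. it becomes the watershed line.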

![Watershed](http://www.cmm.mines-paristech.fr/~beucher/lpe1.gif)![Final Watersheds](http://www.cmm.mines-paristech.fr/~beucher/ima3.gif)

> Image Source: [CMM](http://www.cmm.mines-paristech.fr/~beucher/wtshed.html)

We'll use `pydicom` to read the scans; feel free to use any other DICOM library:

In [None]:
!pip install pydicom

In [None]:
%matplotlib inline

import numpy as np
import pandas as pd
import pydicom
import os
import scipy.ndimage as ndimage
from skimage import measure, morphology, segmentation
import matplotlib.pyplot as plt

import time

Let's load the patients' data:

In [None]:
INPUT_FOLDER = '/kaggle/input/osic-pulmonary-fibrosis-progression/train/'

patients = os.listdir(INPUT_FOLDER)
patients.sort()

print("Some examples of patient IDs:")
print(",\n".join(patients[:5]))

Now let's load the scans:

This code for loading the scans is based on Franklin Heng's [Medium article.](https://medium.com/@hengloose/a-comprehensive-starter-guide-to-visualizing-and-analyzing-dicom-images-in-python-7a8430fcb7ed)

In [None]:
def load_scan(path):
    """
    Loads scans from a folder and into a list.
    
    Parameters: path (Folder path)
    
    Returns: slices (List of slices)
    """
    
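    # pydicom.read_file is deprecated (removed in pydicom 3.0); dcmread is the supported reader
    slices = [pydicom.dcmread(os.path.join(path, s)) for s in os.listdir(path)]
    slices.sort(key = lambda x: int(x.InstanceNumber))
    
    # Distance between the first two slices along the z-axis
    try:
        slice_thickness = np.abs(slices[0].ImagePositionPatient[2] - slices[1].ImagePositionPatient[2])
    except AttributeError:
        # Fall back to SliceLocation when ImagePositionPatient is missing
        slice_thickness = np.abs(slices[0].SliceLocation - slices[1].SliceLocation)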
        
    for s in slices:
        s.SliceThickness = slice_thickness
        
    return slices
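The sorting and spacing logic of `load_scan` can be checked without any DICOM files. The sketch below uses `SimpleNamespace` objects as hypothetical stand-ins for pydicom datasets (only the attributes `load_scan` reads are filled in), and `slice_spacing` is a hypothetical helper that repeats the same try/fallback computation. Strictly speaking, the value is the spacing between adjacent slice positions, which can differ from the nominal slice thickness when slices overlap or leave gaps.

```python
from types import SimpleNamespace

import numpy as np


def slice_spacing(slices):
    """Spacing along z between the first two slices, as load_scan computes it."""
    try:
        return np.abs(slices[0].ImagePositionPatient[2] - slices[1].ImagePositionPatient[2])
    except AttributeError:
        # Fall back to SliceLocation when ImagePositionPatient is missing
        return np.abs(slices[0].SliceLocation - slices[1].SliceLocation)


# Hypothetical stand-ins for pydicom datasets, listed out of order
with_position = [
    SimpleNamespace(InstanceNumber=3, ImagePositionPatient=[0.0, 0.0, -20.0]),
    SimpleNamespace(InstanceNumber=1, ImagePositionPatient=[0.0, 0.0, -10.0]),
    SimpleNamespace(InstanceNumber=2, ImagePositionPatient=[0.0, 0.0, -15.0]),
]
with_position.sort(key=lambda s: int(s.InstanceNumber))

without_position = [
    SimpleNamespace(InstanceNumber=2, SliceLocation=-12.5),
    SimpleNamespace(InstanceNumber=1, SliceLocation=-10.0),
]
without_position.sort(key=lambda s: int(s.InstanceNumber))

print([s.InstanceNumber for s in with_position], slice_spacing(with_position))
print(slice_spacing(without_position))
```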

## Hounsfield Units

The unit of measurement in CT scans is the Hounsfield Unit (HU), which is a measure of radiodensity. 

**Hounsfield units (HU)** are a dimensionless unit universally used in computed tomography (CT) scanning to express CT numbers in a standardized and convenient form. Hounsfield units are obtained from a linear transformation of the measured attenuation coefficients.

![HU Table](http://patentimages.storage.googleapis.com/WO2005055806A2/imgf000011_0001.png)

HU values can be computed from the stored pixel values of a DICOM image with a linear transformation:

$\large HU = m \cdot P + b$

where,

$m$ = `RescaleSlope` attribute of the DICOM image,

$b$ = `RescaleIntercept` attribute of the DICOM image,

$P$ = stored pixel value (an element of `pixel_array`)
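As a quick worked example of the formula, assume a slope of 1 and an intercept of -1024 (common values for CT, but assumed here rather than read from the OSIC scans). The four stored values below then map to air, aerated lung, water and soft tissue; the last lines preview the -400 HU threshold that `generate_markers` uses further down.

```python
import numpy as np

# Hypothetical stored pixel values; slope and intercept are typical CT
# values assumed for illustration, not read from this dataset.
P = np.array([24, 224, 1024, 1064], dtype=np.int16)
m = 1.0      # RescaleSlope
b = -1024.0  # RescaleIntercept

hu = m * P + b  # HU = m * P + b
# -1000 (air), -800 (typical aerated lung), 0 (water), 40 (soft tissue)
print(hu)

# generate_markers treats everything below -400 HU as candidate lung/air
lung_like = hu < -400
print(lung_like)
```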

In [None]:
def get_pixels_hu(scans):
    """
    Converts raw images to Hounsfield Units (HU).
    
    Parameters: scans (Raw images)
    
    Returns: image (NumPy array)
    """
    
    image = np.stack([s.pixel_array for s in scans])
    image = image.astype(np.int16)

    # Since the scanning equipment is cylindrical in nature and image output is square,
    # we set the out-of-scan pixels to 0
    image[image == -2000] = 0
    
    
    # HU = m*P + b
    intercept = scans[0].RescaleIntercept
    slope = scans[0].RescaleSlope
    
    if slope != 1:
        image = slope * image.astype(np.float64)
        image = image.astype(np.int16)
        
    image += np.int16(intercept)
    
    return np.array(image, dtype=np.int16)

Let's load the slices of one patient and convert them to HU:

In [None]:
test_patient_scans = load_scan(INPUT_FOLDER + patients[24])
test_patient_images = get_pixels_hu(test_patient_scans)

We'll take a single slice (index 12) to perform segmentation:

In [None]:
plt.imshow(test_patient_images[12], cmap='gray')
plt.title("Original Slice")
plt.show()

# Marker-Controlled Watershed Transformation

The **watershed transform** is a powerful segmentation algorithm, but it has one major drawback:

- **Over-segmentation:** every regional minimum forms its own catchment basin, so noise and fine texture split a single object into many small regions. Here is an example where steel grains are over-segmented by the watershed transformation:

![Steel Grains](https://www.mathworks.com/company/newsletters/articles/the-watershed-transform-strategies-for-image-segmentation/_jcr_content/mainParsys/image_9.adapt.1200.high.gif/1542750812181.gif)![Oversegmented](https://www.mathworks.com/company/newsletters/articles/the-watershed-transform-strategies-for-image-segmentation/_jcr_content/mainParsys/image_10.adapt.1200.high.gif/1542750812206.gif)

> **Left:** Steel Grains, **Right:** Oversegmented image as a result of using normal watershed transformation.
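The cause is easy to reproduce in 1D: a small ripple on an otherwise smooth profile creates many extra regional minima, and plain watershed would start a separate basin at each one. The helper `count_regional_minima` is a hypothetical name used only for this sketch, and the ripple is a deterministic alternating pattern standing in for image noise.

```python
import numpy as np


def count_regional_minima(profile):
    """Count interior positions strictly lower than both neighbours."""
    p = np.asarray(profile, dtype=float)
    inner = p[1:-1]
    return int(np.sum((inner < p[:-2]) & (inner < p[2:])))


x = np.linspace(-2.0, 2.0, 41)
smooth = (x ** 2 - 1) ** 2  # two valleys, at x = -1 and x = +1
ripple = np.where(np.arange(x.size) % 2 == 0, 0.05, -0.05)
noisy = smooth + ripple  # same valleys plus a small alternating ripple

print(count_regional_minima(smooth), count_regional_minima(noisy))
```

The smooth profile has two minima (two basins), while the rippled one has several more, each of which plain watershed would treat as a separate basin.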

To overcome this drawback, we use a marker-controlled watershed transformation, in which flooding starts only from markers that we create ourselves instead of from every regional minimum.

## About the Algorithm:

The image is treated as a topographic surface in which each grey value is the elevation of the surface at that location. Flooding then starts: water rises from the minimum grey values or, in the marker-controlled variant, only from the markers. Wherever floods from two different sources meet, a dam is built; these dams mark the boundaries between regions.
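The marker-controlled flooding can be sketched in a few lines with a priority queue. This is a simplified, illustrative version (4-connected, no dam pixels kept: every pixel joins the basin that reaches it first), not how scikit-image's watershed is implemented internally, and `marker_watershed` is a hypothetical helper name. As in scikit-image, a marker value of 0 means "unlabelled". The toy elevation map has a ridge in the middle column and an unmarked extra minimum at the bottom-right, which is absorbed by the right-hand basin instead of starting a third region.

```python
import heapq

import numpy as np


def marker_watershed(elevation, markers):
    """Flood `elevation` from the labelled pixels of `markers` (0 = unlabelled)."""
    elevation = np.asarray(elevation, dtype=float)
    labels = np.array(markers, dtype=int, copy=True)
    h, w = labels.shape
    heap = []
    counter = 0  # tie-breaker: equal elevations flood in discovery order
    for r, c in zip(*np.nonzero(labels)):
        heapq.heappush(heap, (elevation[r, c], counter, r, c))
        counter += 1
    while heap:
        # Always expand from the lowest waiting pixel: the water rises evenly
        _, _, r, c = heapq.heappop(heap)
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < h and 0 <= nc < w and labels[nr, nc] == 0:
                labels[nr, nc] = labels[r, c]  # first basin to arrive claims it
                heapq.heappush(heap, (elevation[nr, nc], counter, nr, nc))
                counter += 1
    return labels


elevation = np.array([
    [1, 1, 9, 1, 1],
    [1, 0, 9, 0, 1],
    [1, 1, 9, 1, 0],  # bottom-right 0 is a minimum without a marker
])
markers = np.zeros(elevation.shape, dtype=int)
markers[1, 1] = 1  # left marker
markers[1, 3] = 2  # right marker

labels = marker_watershed(elevation, markers)
print(labels)
```

In the notebook, the same role is played by scikit-image's watershed applied to the Sobel gradient, with `marker_watershed` from `generate_markers` as the markers.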


![Markers](http://www.cmm.mines-paristech.fr/~beucher/ima4.gif)
![Flood](http://www.cmm.mines-paristech.fr/~beucher/lpe2.gif)

## Marker Generation:

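To use marker-controlled watershed segmentation, we need two markers: an internal marker covering our region of interest (the lung tissue) and an external marker covering the region outside it.

The internal marker is obtained by thresholding the slice at -400 HU, removing regions connected to the image border, and keeping the two largest connected components (the lungs). The external marker is created by dilating the internal marker twice, with 10 and 55 iterations, and taking the XOR of the two results, which leaves a band around the lungs. The watershed marker superimposes both markers: internal pixels are labeled 255, external pixels 128, and all remaining pixels stay 0 (unlabeled).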
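The external-marker construction can be sketched without SciPy on a toy mask. The `binary_dilate` helper below is a hypothetical stand-in intended to mirror `ndimage.binary_dilation` with its default cross-shaped (4-connected) structuring element and a background border; the iteration counts are scaled down from the notebook's 10 and 55 so the ring stays visible in a 15×15 image.

```python
import numpy as np


def binary_dilate(mask, iterations=1):
    """Dilate a boolean mask with a 3x3 cross; outside the image counts as background."""
    out = np.asarray(mask, dtype=bool)
    for _ in range(iterations):
        p = np.pad(out, 1, mode="constant", constant_values=False)
        # A pixel becomes True if it or any of its 4 neighbours is True
        out = (p[1:-1, 1:-1] | p[:-2, 1:-1] | p[2:, 1:-1]
               | p[1:-1, :-2] | p[1:-1, 2:])
    return out


# Hypothetical "lung" blob standing in for the internal marker
internal = np.zeros((15, 15), dtype=bool)
internal[6:9, 6:9] = True

# Ring between the 2-step and 4-step dilations (10 and 55 in the notebook)
external = binary_dilate(internal, 4) ^ binary_dilate(internal, 2)

# Superimpose: 255 = inside, 128 = outside band, 0 = unlabelled
watershed_marker = np.zeros(internal.shape, dtype=int)
watershed_marker += internal * 255
watershed_marker += external * 128
print(watershed_marker)
```

Because the smaller dilation already covers the internal marker, the XOR leaves a ring that never touches it, so the superimposed marker contains only the values 0, 128 and 255.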

Some of the code is based on @arnavkj95's kernel: https://www.kaggle.com/arnavkj95/candidate-generation-and-luna16-preprocessing

In [None]:
def generate_markers(image):
    """
    Generates markers for a given image.
    
    Parameters: image
    
    Returns: Internal Marker, External Marker, Watershed Marker
    """
    
    #Creation of the internal Marker
    marker_internal = image < -400
    marker_internal = segmentation.clear_border(marker_internal)
    marker_internal_labels = measure.label(marker_internal)
    
    # Keep only the two largest connected regions (normally the two lungs)
    areas = [r.area for r in measure.regionprops(marker_internal_labels)]
    areas.sort()
    
    if len(areas) > 2:
        for region in measure.regionprops(marker_internal_labels):
            if region.area < areas[-2]:
                # Erase every pixel belonging to this smaller region
                marker_internal_labels[marker_internal_labels == region.label] = 0
    
    marker_internal = marker_internal_labels > 0
    
    # Creation of the External Marker
    external_a = ndimage.binary_dilation(marker_internal, iterations=10)
    external_b = ndimage.binary_dilation(marker_internal, iterations=55)
    marker_external = external_b ^ external_a
    
    # Creation of the Watershed Marker
    # np.int was removed in NumPy 1.24; use the slice's own shape instead of assuming 512x512
    marker_watershed = np.zeros(image.shape, dtype=int)
    marker_watershed += marker_internal * 255
    marker_watershed += marker_external * 128
    
    return marker_internal, marker_external, marker_watershed

Let's get our markers for the sample slice:

In [None]:
test_patient_internal, test_patient_external, test_patient_watershed = generate_markers(test_patient_images[12])

f, (ax1, ax2, ax3) = plt.subplots(1, 3, sharey=True, figsize=(15,15))

ax1.imshow(test_patient_internal, cmap='gray')
ax1.set_title("Internal Marker")
ax1.axis('off')

ax2.imshow(test_patient_external, cmap='gray')
ax2.set_title("External Marker")
ax2.axis('off')

ax3.imshow(test_patient_watershed, cmap='gray')
ax3.set_title("Watershed Marker")
ax3.axis('off')

plt.show()

## Sobel Gradient and Edge Outlining

The Sobel operator performs a 2D spatial gradient measurement on an image and so emphasizes regions of high spatial frequency that correspond to edges.

It consists of a pair of 3×3 convolution kernels, one for each direction:

![Conv Filters](http://homepages.inf.ed.ac.uk/rbf/HIPR2/figs/sobmasks.gif)

These kernels can be combined to find the absolute magnitude of the gradient at each point and the orientation of that gradient.

The gradient magnitude is given by:

$G = \sqrt{G_x^2 + G_y^2}$

and the orientation by $\theta = \arctan(G_y / G_x)$.

The Sobel gradient can be computed with `scipy.ndimage.sobel`, applied once along each image axis, and the two results combined with `np.hypot`.
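Here is a NumPy-only sketch of the two 3×3 kernels applied by correlation to a vertical step edge and combined with `np.hypot`, as `seperate_lungs` does. `correlate3x3` and `sobel_magnitude` are hypothetical helper names. The border is padded by repeating edge pixels, an assumption chosen for this sketch; for a 1-pixel border this matches the default `reflect` mode of `scipy.ndimage`, so `ndimage.sobel` should give the same magnitudes on this input.

```python
import numpy as np

# Sobel kernels: horizontal derivative (Gx) and its transpose (Gy)
KX = np.array([[-1, 0, 1],
               [-2, 0, 2],
               [-1, 0, 1]], dtype=float)
KY = KX.T


def correlate3x3(img, kernel):
    """Correlate a 2D image with a 3x3 kernel, replicating edge pixels."""
    padded = np.pad(img.astype(float), 1, mode="edge")
    h, w = img.shape
    out = np.zeros((h, w), dtype=float)
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out


def sobel_magnitude(img):
    gx = correlate3x3(img, KX)
    gy = correlate3x3(img, KY)
    return np.hypot(gx, gy)  # G = sqrt(Gx^2 + Gy^2)


# Vertical step edge: dark left half, bright right half
step = np.zeros((5, 6))
step[:, 3:] = 10.0

magnitude = sobel_magnitude(step)
print(magnitude)
```

The magnitude is non-zero only in the two columns on either side of the step, which is exactly the "high spatial frequency" region the operator is meant to emphasize.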

In [None]:
# Lists to store computation times and iterations
computation_times = []
iteration_titles = []

In [None]:
def seperate_lungs(image, iterations = 1):
    """
    Segments lungs using various techniques.
    
    Parameters: image (Scan image), iterations (how many times the black top-hat structuring element is grown; larger values fill larger gaps in the outline but take longer)
    
    Returns: 
        - Segmented Lung
        - Lung Filter
        - Outline Lung
        - Watershed Lung
        - Sobel Gradient
    """
    
    # Store the start time
    start = time.time()
    
    marker_internal, marker_external, marker_watershed = generate_markers(image)
    
    
    '''
    Creation of Sobel Gradient
    '''
    
    # Sobel-Gradient
    sobel_filtered_dx = ndimage.sobel(image, 1)
    sobel_filtered_dy = ndimage.sobel(image, 0)
    sobel_gradient = np.hypot(sobel_filtered_dx, sobel_filtered_dy)
    sobel_gradient *= 255.0 / np.max(sobel_gradient)
    
    
    '''
    Using the watershed algorithm
    
    
    We pass the image convolved with the Sobel operator and the watershed
    marker to segmentation.watershed and get a matrix labeled using the 
    watershed segmentation algorithm.
    '''
    # morphology.watershed was removed in scikit-image 0.19; segmentation.watershed replaces it
    watershed = segmentation.watershed(sobel_gradient, marker_watershed)
    
    '''
    Reducing the image to outlines after Watershed algorithm
    '''
    outline = ndimage.morphological_gradient(watershed, size=(3,3))
    outline = outline.astype(bool)
    
    
    '''
    Black Top-hat Morphology:
    
    The black top hat of an image is defined as its morphological closing
    minus the original image. This operation returns the dark spots of the
    image that are smaller than the structuring element. Note that dark 
    spots in the original image are bright spots after the black top hat.
    '''
    
    # Structuring element used for the filter
    blackhat_struct = [[0, 0, 1, 1, 1, 0, 0],
                       [0, 1, 1, 1, 1, 1, 0],
                       [1, 1, 1, 1, 1, 1, 1],
                       [1, 1, 1, 1, 1, 1, 1],
                       [1, 1, 1, 1, 1, 1, 1],
                       [0, 1, 1, 1, 1, 1, 0],
                       [0, 0, 1, 1, 1, 0, 0]]
    
    blackhat_struct = ndimage.iterate_structure(blackhat_struct, iterations)
    
    # Perform Black Top-hat filter
    outline += ndimage.black_tophat(outline, structure=blackhat_struct)
    
    '''
    Generate lung filter using internal marker and outline.
    '''
    lungfilter = np.bitwise_or(marker_internal, outline)
    # The scipy.ndimage.morphology namespace is deprecated; call binary_closing directly
    lungfilter = ndimage.binary_closing(lungfilter, structure=np.ones((5,5)), iterations=3)
    
    '''
    Segment lung using lungfilter and the image.
    '''
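    # Keep HU values inside the lung filter and set everything else to -2000
    # (no hard-coded 512x512 shape, so other slice sizes work too)
    segmented = np.where(lungfilter, image, -2000)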
    
    # Append computation time
    end = time.time()
    computation_times.append(end - start)
    iteration_titles.append("{num} iterations".format(num = iterations))
    
    
    return segmented, lungfilter, outline, watershed, sobel_gradient
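The black top-hat step is easiest to see on a single row of a binary outline. The sketch below is a 1D analogy with hypothetical helper names: closing with a flat window of radius 1 fills only the 1-pixel gap, and the black top-hat (closing minus original) returns exactly that gap. A radius of 2 also fills the 4-pixel gap, which is roughly what a larger `iterations` does when `ndimage.iterate_structure` enlarges `blackhat_struct`. In `seperate_lungs` the top-hat result is added to the boolean outline, which for boolean arrays acts as a logical OR, so the detected gaps are filled in.

```python
import numpy as np


def _window_stack(x, radius):
    """Stack the 2*radius+1 shifted copies of x covered by a flat window."""
    padded = np.pad(x, radius, mode="edge")
    return np.stack([padded[k:k + len(x)] for k in range(2 * radius + 1)])


def closing_1d(x, radius):
    """Grey closing with a flat window: dilation (max) then erosion (min)."""
    dilated = _window_stack(x, radius).max(axis=0)
    return _window_stack(dilated, radius).min(axis=0)


def black_tophat_1d(x, radius):
    """Closing minus the original: dark gaps narrower than the window."""
    return closing_1d(x, radius) - x


# Hypothetical outline row: a 1-pixel gap (index 3) and a 4-pixel gap (7-10)
outline_row = np.array([1, 1, 1, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1])

print(black_tophat_1d(outline_row, radius=1))  # only the narrow gap is found
print(black_tophat_1d(outline_row, radius=2))  # the wide gap is found as well
```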

## Comparison of iterations with time

We'll time the function for `iterations` from 1 to 8 on the same slice. `iterations = 1` is the default for the `seperate_lungs` function.

In [None]:
for itrs in range(1, 9):
    test_segmented, test_lungfilter, test_outline, test_watershed, test_sobel_gradient = seperate_lungs(test_patient_images[12], itrs)

In [None]:
import plotly.graph_objects as go

itr_dict = {'Iterations' : iteration_titles, 'Computation Times (in seconds)': computation_times}

# Highlight the default setting (iterations = 1) in red
colors = ['#30336b',] * len(iteration_titles)
colors[0] = '#eb4d4b'

fig = go.Figure(data=[go.Bar(
    x=itr_dict['Iterations'],
    y=itr_dict['Computation Times (in seconds)'],
    marker_color = colors
)])
# Label each bar with its time in seconds (three decimals)
fig.update_traces(texttemplate='%{y:.3f}', textposition='outside')

fig.update_layout(
    title = 'Iterations vs Computation Times',
    yaxis=dict(
        # 'titlefont' is deprecated in Plotly; set the title font via title.font
        title=dict(text='Computation Times (in seconds)', font=dict(size=16)),
        tickfont_size=14,
    ),
    autosize=False,
    width=800,
    height=800)

fig.show()


In [None]:
f, (ax1, ax2) = plt.subplots(1, 2, sharey=True, figsize = (12, 12))

ax1.imshow(test_sobel_gradient, cmap='gray')
ax1.set_title("Sobel Gradient")
ax1.axis('off')

ax2.imshow(test_watershed, cmap='gray')
ax2.set_title("Watershed")
ax2.axis('off')

plt.show()

In [None]:
f, (ax1, ax2, ax3) = plt.subplots(1, 3, sharey=True, figsize = (12, 12))

ax1.imshow(test_outline, cmap='gray')
ax1.set_title("Lung Outline")
ax1.axis('off')

ax2.imshow(test_lungfilter, cmap='gray')
ax2.set_title("Lung filter")
ax2.axis('off')

ax3.imshow(test_segmented, cmap='gray')
ax3.set_title("Segmented Lung")
ax3.axis('off')

plt.show()

In [None]:
f, (ax1, ax2) = plt.subplots(1, 2, sharey=True, figsize = (12, 12))

ax1.imshow(test_patient_images[12], cmap='gray')
ax1.set_title("Original Lung")
ax1.axis('off')

ax2.imshow(test_segmented, cmap='gray')
ax2.set_title("Segmented Lung")
ax2.axis('off')

plt.show()

# Conclusion:

In this kernel, we discussed:

- Idea behind Watershed Transformation
- Loading Scans
- Getting HU values from pixel data
- Drawback of Watershed Transformation
- Moving to Marker-Controlled Watershed Transformation
- Creating Internal, External and Watershed Markers
- Using Sobel Gradient
- Performing Black Top-hat Morphology
- Getting the segmented lung images

I think there's still a long way to go in this competition. These segmented lung images can be used to estimate lung volume or as input for further analysis.

Compared with plain HU thresholding, the watershed transformation is less likely to pick up regions outside our region of interest. The marker-based approach reduces the chance of over-segmentation and follows the original lung border closely.