Most of the cooler stuff in here is inspired by the following:<br>
https://www.kaggle.com/allunia/pulmonary-fibrosis-dicom-preprocessing
https://www.kaggle.com/ankasor/improved-lung-segmentation-using-watershed
https://www.kaggle.com/arnavkj95/candidate-generation-and-luna16-preprocessing

All the ugly bits are from me :)

I'm also somewhat new to this, so all commentary should be deemed speculative from this moment forward

We're given CT scans in this competition and are asked to help predict the presence of Pulmonary Embolism (https://en.wikipedia.org/wiki/Pulmonary_embolism)

Before the OSIC competition, I was not aware of the stark difference between CT scans and X-Rays. Basically, an X-Ray is a projection of the density of a 3-dimensional object onto 2-D, while a CT scan retains the third dimension. Within a single CT slice we can view each element as a pixel, but once slices are stacked together to encompass some volume, we refer to each element as a voxel. Each voxel has a value telling the average density of the matter at that particular point.

Did you say density?? Technically it's radiodensity, which is a function of both the material's density and its atomic number.

In this notebook we'll do the following: <br>
1) Briefly explore the data provided in the training dataframe <br>
2) Explore Dicoms <br>
3) Watershed segmentation(!) <br>
4) Another segmentation technique (to be decided) <br>
5) Compare the two techniques <br>
6) Sprinkle in (potentially) incorrect and useless musings :) <br>

There are some things to keep in mind when dealing with CT scans:<br>
1) There does not seem to be a standard across scanners: some manufacturers have specific settings in their scanners, and the people who operate them can also fiddle with these settings as they see fit (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6115360/). This increases the complexity of the problem because, as we will see, some scans are (way) bigger than others in terms of the number of slices and the slice thickness.<br>
2) CT scanners are really expensive, so the amount of CT data floating around, and thus the documentation on tackling problems related to it, is sparse compared to X-Rays (well, maybe this competition and OSIC will put a dent in that)

In [None]:
!conda install -c conda-forge gdcm -y

In [None]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import os
from pathlib import Path
import matplotlib.pyplot as plt
import pydicom
import cv2
import seaborn as sns
import gdcm
from skimage import measure, segmentation, morphology
from skimage.morphology import disk, opening, closing
from scipy import ndimage

In [None]:
input_path = Path('../input/rsna-str-pulmonary-embolism-detection')
os.listdir(input_path)

In [None]:
train_df = pd.read_csv(input_path/'train.csv')
test_df = pd.read_csv(input_path/'test.csv')
sub_df = pd.read_csv(input_path/'sample_submission.csv')

In [None]:
train_df.shape, test_df.shape, sub_df.shape

So we have 1,790,594 slices available for training and validation<br>
146,853 slices we're predicting on<br>
There are more rows in the submission file than slices in the test set (the extra rows hold the exam-level predictions), so be mindful of how you create the submission file!

In [None]:
train_df.head()

In [None]:
test_df.head()

In [None]:
sub_df.head()

In [None]:
list(train_df.columns)

Not entirely sure what negative_exam_for_pe (exam level) indicates yet; I'll come back to it

In [None]:
#let's compute the % of rows (one row per image/slice) with a particular attribute
positive = train_df['pe_present_on_image'].value_counts()[1] / len(train_df['pe_present_on_image'])
print("{0:.2f}% of the training images show Pulmonary Embolism visually".format(positive * 100))

motion_issue = (train_df['qa_motion'].value_counts()[1] / len(train_df))
print("{0:.2f}% of the images are noted that motion may have caused issues".format(motion_issue*100))

contrast_issue = (train_df['qa_contrast'].value_counts()[1]) / len(train_df)
print("{0:.2f}% of the images are noted with contrast issues".format(contrast_issue*100))


left_pe = (train_df['leftsided_pe'].value_counts())
left_pe_pct = left_pe[1] / len(train_df['leftsided_pe'])
print("{0:.2f}% of the images are noted with PE on left side".format(left_pe_pct*100))


right_pe = (train_df['rightsided_pe'].value_counts())
right_pe_pct = right_pe[1] / len(train_df['rightsided_pe'])
print("{0:.2f}% of the images are noted with PE on right side".format(right_pe_pct*100))

central_pe = (train_df['central_pe'].value_counts())
central_pe_pct = central_pe[1] / len(train_df['central_pe'])
print("{0:.2f}% of the images are noted with central PE".format(central_pe_pct*100))

chronic_pe = train_df['chronic_pe'].value_counts()
chronic_pe_pct = chronic_pe[1] / len(train_df['chronic_pe'])
print("{0:.2f}% of the images feature Chronic PE".format(chronic_pe_pct * 100))

acute_and_chronic = train_df['acute_and_chronic_pe'].value_counts()
acute_chr_pct = acute_and_chronic[1] / len(train_df['acute_and_chronic_pe'])
print("{0:.2f}% of the images are labeled both acute AND Chronic PE :(".format(acute_chr_pct * 100))

indeterminate = train_df['indeterminate'].value_counts()
indeterminate_pct = indeterminate[1] / len(train_df['indeterminate'])
print("{0:.2f}% of the images come from exams marked indeterminate (QA issues)".format(indeterminate_pct * 100))

In [None]:
#no information on this in competition description
train_df['flow_artifact'].value_counts() 

Should we remove the indeterminate ones for modeling?
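Most of the columns above (everything except pe_present_on_image, as far as I can tell) look like exam-level labels repeated on every image row of that exam, so the percentages above are weighted by how many slices each exam has. Below is a minimal pandas sketch of a per-exam view. It assumes `StudyInstanceUID` identifies the exam and that exam-level labels are constant within an exam; `exam_level_pct` is a hypothetical helper and the tiny frame is made up.

```python
import pandas as pd

def exam_level_pct(df, cols, exam_id='StudyInstanceUID'):
    """Percentage of exams (not slices) with each binary label set to 1.

    Assumes each label in `cols` is constant within an exam, so keeping
    the first row per exam loses nothing.
    """
    per_exam = df.groupby(exam_id)[cols].first()
    return per_exam.mean() * 100

# toy frame: exam 'a' has 3 slices labeled central_pe=1, exam 'b' has 1 slice labeled 0
toy = pd.DataFrame({
    'StudyInstanceUID': ['a', 'a', 'a', 'b'],
    'central_pe': [1, 1, 1, 0],
})
slice_weighted = toy['central_pe'].mean() * 100            # 75.0
per_exam = exam_level_pct(toy, ['central_pe'])['central_pe']  # 50.0
```

The slice-weighted number overstates the label here simply because the positive exam has more slices.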

In [None]:
training_path = input_path/'train'
len(os.listdir(training_path))

Ok, so we have 7,279 folders inside our training folder, which represent 7,279 unique scans, each with a specific number of slices. What's the distribution of these slice counts?

Inside each (study) folder of the training folder there is another (series) folder, and inside that folder you'll find the DICOMs.

In [None]:
%%time
scans_per_folder = []
for x in os.listdir(training_path):
    path = Path(str(training_path) + '/' + str(x))
    scans_per_folder.append(len(os.listdir(path)))

In [None]:
len(scans_per_folder), pd.Series(scans_per_folder).unique()

The two outputs above confirm that there is exactly one folder inside each folder of the training set

In [None]:
%%time
slices_per_scan = []
for x in os.listdir(training_path):
    path = Path(str(training_path) + '/' + str(x))
    for folder in os.listdir(path):
        scan_path = Path(str(path)+ '/' + str(folder))
        slices_per_scan.append(len(os.listdir(scan_path)))

The annoying nested folder structure makes a clean, easy-to-read list comprehension harder to write
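That said, `pathlib` can flatten the two-level walk with a `*/*` glob. A minimal sketch, assuming the `<study>/<series>/<slice>.dcm` layout described above; `count_slices` is a hypothetical helper, and the temporary directory tree only exists to check it.

```python
import tempfile
from pathlib import Path

def count_slices(train_dir):
    """Number of entries in every <study>/<series>/ folder under train_dir."""
    return [len(list(series.iterdir()))
            for series in sorted(Path(train_dir).glob('*/*'))
            if series.is_dir()]

# tiny self-check on a throwaway directory tree mimicking <study>/<series>/*.dcm
with tempfile.TemporaryDirectory() as tmp:
    for study, series, n in [('s1', 'a', 3), ('s2', 'b', 2)]:
        folder = Path(tmp, study, series)
        folder.mkdir(parents=True)
        for i in range(n):
            (folder / f'{i}.dcm').touch()
    demo_counts = count_slices(tmp)   # [3, 2]
```

On the real data this would be `slices_per_scan = count_slices(training_path)`.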

In [None]:
plt.title('Distribution of Slices per Scan')
plt.xlabel('Number of slices')
plt.ylabel('Frequency')
plt.hist(slices_per_scan, bins=50);

Ok, so most scans have somewhere in the range of 200-300 slices(!). Each slice is a 2-dimensional array of numbers, so when the slices are stacked adjacent to one another, we get volumetric information!

Let's create a function that extracts the DICOMs from one folder. We'll reverse the ordering because the slices towards the feet are at the start

In [None]:
#returns a list with the dicoms in order
def dcm_sort(scan_path, scan_folder):
    #build the full file path of every slice in the folder
    dcm_paths = [os.path.join(scan_path, file) for file in scan_folder]
    #read every slice in the folder
    dcm_stacked = [pydicom.dcmread(dcm) for dcm in dcm_paths]
    #sort by InstanceNumber, highest first
    dcm_stacked.sort(key=lambda x: int(x.InstanceNumber), reverse=True)
    #returning a python list of sorted dicoms
    return dcm_stacked

In [None]:
scan_path = training_path/'858a11d72ad0/7829612362e8'
scan_folder = os.listdir(scan_path)
print("There are {} slices in the selected scan".format(len(scan_folder)))

In [None]:
%%time
sorted_scan = dcm_sort(scan_path, scan_folder)

In [None]:
sorted_scan[0]

A whole lot of info in one file!

As a side note, DICOM was developed in the 80s, and just like almost everything else developed by and for the medical community, it's kind of filled with a ton of garbage :)

The following resource helps interpret some of it: http://dicom.nema.org/medical/dicom/2017d/output/chtml/part03/sect_C.7.6.2.html

(0020, 0013) Instance Number --> IS: "18" means this is the 18th image in this series (are they ordered spatially?)

These two determine voxel size:

(0018, 0050) Slice Thickness --> DS: "5.0" this is expressed in millimeters --> z-axis
(0028, 0030) Pixel Spacing --> DS: [0.683, 0.683] physical distance between center of each pixel. The pair of values indicates adjacent row spacing and adjacent column spacing --> x/y plane
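As a worked example, the physical voxel size follows directly from those two tags. This is a small sketch using the values printed in the header above; it assumes the slices are contiguous (spacing between slices equal to SliceThickness), which isn't guaranteed for every scan. `voxel_size_mm` is a hypothetical helper.

```python
def voxel_size_mm(pixel_spacing, slice_thickness):
    """(z, row, col) voxel edge lengths in mm from PixelSpacing and SliceThickness."""
    row_mm, col_mm = (float(v) for v in pixel_spacing)
    return float(slice_thickness), row_mm, col_mm

# values copied from the header printed above
z_mm, row_mm, col_mm = voxel_size_mm([0.683, 0.683], 5.0)
voxel_volume_mm3 = z_mm * row_mm * col_mm   # 5.0 * 0.683 * 0.683, roughly 2.33 mm^3
```

So each voxel in this scan is a flat box, about 7 times longer along z than along x or y.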

(0020, 1041) Slice Location --> DS: "82.0" relative position of the image plane expressed in millimeters

(0020, 0032) Image Position (Patient) --> DS: [-174.2187, -175.0000, 1773.500] gives the x, y, z coordinates in millimeters of the upper left-hand corner of the image (the center of the first voxel transmitted)
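Since Image Position (Patient) gives each slice's physical z coordinate, sorting on it is one way to answer the "are they ordered spatially?" question without relying on InstanceNumber. A minimal sketch, assuming DICOM's patient coordinate system where +z points towards the head; `SimpleNamespace` objects stand in for pydicom datasets, and `sort_by_z` is a hypothetical helper.

```python
from types import SimpleNamespace

def sort_by_z(slices, head_first=True):
    """Sort slices on the z coordinate of ImagePositionPatient (+z = towards the head)."""
    return sorted(slices, key=lambda s: float(s.ImagePositionPatient[2]),
                  reverse=head_first)

# hypothetical stand-ins for pydicom datasets, deliberately out of order
stack = [SimpleNamespace(InstanceNumber=n, ImagePositionPatient=[-174.2, -175.0, z])
         for n, z in [(1, 1600.0), (2, 1605.0), (3, 1595.0)]]

ordered_stack = sort_by_z(stack)
ordered = [s.InstanceNumber for s in ordered_stack]   # [2, 1, 3]
# distance between neighbouring slice centres along z, in mm
z_spacing = abs(float(ordered_stack[0].ImagePositionPatient[2])
                - float(ordered_stack[1].ImagePositionPatient[2]))  # 5.0
```

The z differences also give the actual slice spacing, which can differ from SliceThickness.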

We will also take into account the Rescale slope & intercept below in order to convert the pixel values into the Hounsfield scale shortly: https://en.wikipedia.org/wiki/Hounsfield_scale
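The conversion itself is linear: HU = raw * RescaleSlope + RescaleIntercept. Below is a small numpy sketch; the slope of 1, intercept of -1024 and the -2000 padding value are example assumptions (check the actual header values), and `to_hounsfield` is a hypothetical helper.

```python
import numpy as np

def to_hounsfield(raw, slope, intercept, padding_value=-2000, fill_hu=-1024):
    """HU = raw * RescaleSlope + RescaleIntercept; padded pixels are set to fill_hu."""
    raw = np.asarray(raw, dtype=np.float64)
    hu_values = raw * float(slope) + float(intercept)
    hu_values[raw == padding_value] = fill_hu   # outside the scanned field of view
    return hu_values

toy_hu = to_hounsfield([[-2000, 0], [1024, 2024]], slope=1, intercept=-1024)
# [[-1024, -1024], [0, 1000]]: padding -> air, raw 0 -> air, raw 1024 -> water
```

Note that with this intercept a raw value of 0 is air, not water; water only sits at 0 after rescaling.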

In [None]:
sorted_scan[0].PixelData[0:100]

Yikes, that's not something we can work with... luckily we can access the pixel information with the following:

In [None]:
sorted_scan[0].pixel_array[0:5]

Take note of those -2000 values in the corners of this slice; we'll come back to that shortly

We know there are 261 slices in this scan and now that we have them in order, let's pick something near the middle so that we can see more of the lungs(!)

In [None]:
plt.imshow(sorted_scan[100].pixel_array);

That's not very easy to see, so let's change the lens we're interpreting the pixel information through by choosing a better colormap

In [None]:
#let's concentrate on a section of 60 slices in this scan (the grid below shows the first 20)
middle_scan = sorted_scan[80:140]

fig,ax = plt.subplots(4,5, figsize=(12,8))
for n in range(4):
    for m in range(5):
        ax[n,m].imshow(middle_scan[n*5+m].pixel_array, cmap='Blues_r')

In [None]:
#let's take a look at a single slice's pixel distribution
plt.hist(middle_scan[20].pixel_array.flatten(), bins=50);

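As we mentioned, there are a ton of pixels at -2000... why?<br>
The scanner gives pixels outside its circular field of view (i.e. outside the reconstructed area) these extremely low values. Let's set them to 0. Keep in mind these are still raw stored values, not Hounsfield units: 0 only represents water after rescaling, and if the rescale intercept is -1024 (a common value), a raw 0 actually maps to about -1024 HU, i.e. air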

In [None]:
one_slice = middle_scan[20].pixel_array.copy() #copy so the dataset's cached pixel_array isn't modified in place
one_slice[one_slice <= -1000] = 0
plt.imshow(one_slice, cmap='Blues_r');

Ok, that looks better in terms of the border, but what's that weird thing at the bottom of the image? Is it the surface the patient is lying on? We'll have to make sure the segmentation technique we use can filter it out

Let's take into account the rescale intercept and slope(!)

In [None]:
one_slice = middle_scan[20]
one_slice.RescaleIntercept, one_slice.RescaleSlope

In [None]:
def scan_transformed_hu(dcm_sorted, threshold=-1000, replace=-1000):
    slices_stacked = np.stack([dcm.pixel_array for dcm in dcm_sorted])
    slices_stacked = slices_stacked.astype(float)
    
    #converts the unknown (padding) values to the desired replacement
    slices_stacked[slices_stacked <= threshold] = replace
    
    #turn into hounsfield scale: HU = raw * RescaleSlope + RescaleIntercept,
    #applied per slice in case the values differ between slices
    for i, dcm in enumerate(dcm_sorted):
        slices_stacked[i] = slices_stacked[i] * float(dcm.RescaleSlope) + float(dcm.RescaleIntercept)
    
    return np.array(slices_stacked, dtype=np.int16)

In [None]:
middle_slices_hu = scan_transformed_hu(middle_scan, replace=0)

fig,ax = plt.subplots(12,5, figsize=(20,20))
for n in range(12):
    for m in range(5):
        ax[n,m].imshow(middle_slices_hu[n*5+m], cmap='Blues_r')

Ok these look better but you can still see the difference in contrast between slices

Let's start some segmentation. The notebooks linked at the top wrap these segmentation steps into concise functions, but we'll step through them one line at a time

In [None]:
test_slice = middle_slices_hu[10]
fig, ax = plt.subplots(1,2, figsize=(12,3))
ax[0].imshow(test_slice)
ax[0].set_title('Slice #90') #middle_scan is slices 80-140 and this is the 10th one
ax[1].set_title('Pixel Distribution of Slice')
ax[1].hist(test_slice.flatten(), bins=50);

First we'll threshold the image: every pixel that meets a condition (here, being below a cutoff) is set to True and everything else to False. We know lung tissue should sit somewhere below -400 HU and air is around -1000 HU, so let's pick a cutoff a bit higher (-300 HU) to be safe. This will represent our internal marker
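On a toy row of illustrative HU values (made up for this sketch, not taken from the scan), the threshold is just an elementwise comparison. Note that air outside the body passes too, which is why `segmentation.clear_border` is applied next to drop regions touching the image edge.

```python
import numpy as np

# toy 1x6 "slice" in HU: air, lung-like, lung-like, soft tissue, water, bone
toy_slice_hu = np.array([[-1000, -850, -500, 40, 0, 700]])
toy_marker = toy_slice_hu < -300   # True where the pixel is air/lung-like
```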

In [None]:
internal_marker = test_slice < -300
internal_marker[203:207, 203:220] #just to show the discrepancy somewhere in the middle

In [None]:
#this represents the region we know definitely features lung tissue
plt.title('preliminary internal marker')
plt.imshow(segmentation.clear_border(internal_marker), cmap='gray');

In [None]:
internal_marker_labels = measure.label(segmentation.clear_border(internal_marker))
plt.imshow(internal_marker_labels, cmap='gray');

In [None]:
#explicating the next list comprehension
measure.regionprops(internal_marker_labels)[0:3]

In [None]:
areas = [x.area for x in measure.regionprops(internal_marker_labels)]
areas.sort()
areas

In [None]:
for region in measure.regionprops(internal_marker_labels):
    if region.area < areas[-2]:
        for coordinates in region.coords:
            internal_marker_labels[coordinates[0], coordinates[1]] = 0

In [None]:
marker_internal = internal_marker_labels > 0
plt.title('Internal marker')
plt.imshow(marker_internal, cmap='gray');

If you look closely, this is NOT the same as the preliminary marker above: specifically, the tiny white spots below the lungs are gone, since only the two largest regions were kept
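The pixel-by-pixel loop above can be vectorized with numpy. A minimal sketch, assuming a label image like the one `measure.label` returns (0 = background, regions numbered 1..N); `keep_largest_regions` is a hypothetical helper. Unlike the loop, it keeps exactly `n` regions when areas tie, and it doesn't fail when a slice has fewer than two regions.

```python
import numpy as np

def keep_largest_regions(labels, n=2):
    """Zero out every labeled region except the n largest (label 0 = background)."""
    sizes = np.bincount(labels.ravel())
    sizes[0] = 0                              # never keep the background
    keep = np.argsort(sizes)[::-1][:n]        # labels of the n biggest regions
    keep = keep[sizes[keep] > 0]              # ignore labels with no pixels
    return np.where(np.isin(labels, keep), labels, 0)

toy_labels = np.array([[1, 1, 0, 2, 2, 2],
                       [1, 1, 0, 2, 2, 2],
                       [0, 0, 3, 0, 0, 0]])
kept = keep_largest_regions(toy_labels)   # region 3 (a single pixel) is removed
```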

Now we want to generate the external marker, which is the area we know is outside our region of interest. <br>

How is this done??<br> With two morphological dilations of the internal marker (one with 10 iterations, one with 50), after which we take the difference between them<br>

What does that mean?<br>
Let's take a look first

In [None]:
external_a = ndimage.binary_dilation(marker_internal, iterations=10)
external_b = ndimage.binary_dilation(marker_internal, iterations=50)
marker_external = external_b ^ external_a
#since they're boolean, the sum counts the white pixels
external_a.sum(), external_b.sum() 

In [None]:
fig, ax = plt.subplots(1, 3, figsize=(20,8))
ax[0].imshow(external_a, cmap='gray')
ax[0].set_title('dilation of internal marker - 10 iterations')
ax[1].imshow(external_b, cmap='gray')
ax[1].set_title('dilation of internal marker - 50 iterations')
ax[2].set_title('external marker')
ax[2].imshow(marker_external, cmap='gray');

Ok, conceptually -- what just happened?

From the internal marker we set, we used the dilation operation to gradually enlarge the boundaries of the regions of foreground (white) pixels.<br>

1) Areas of foreground pixels grow in size, so the boundary of the lungs expands<br>

2) Holes within those regions become smaller: notice how, even with the 10-iteration dilation, the pockets of black inside the lungs are already gone<br>

3) By taking the difference (XOR) between the two dilations, we get a band that lies beyond the lungs themselves. It marks pixels we're confident are NOT lung, so only the region inside it gets pulled

This was originally implemented for a dataset/competition about finding nodules in the lungs.

Those can end up on the lining of the lungs themselves, so the authors wanted to make sure they captured that boundary fully. I'm unsure if that's needed here
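Here is a toy version of the two-dilation trick. It relies on scipy's default cross-shaped structuring element (used by `ndimage.binary_dilation` when no `structure` is given): dilating a single pixel r times gives a diamond of radius r, so the XOR of the 5- and 2-iteration dilations is a band that never touches the inner region. The small iteration counts are stand-ins for the notebook's 10 and 50.

```python
import numpy as np
from scipy import ndimage

# a single "lung" pixel in the middle of a 21x21 image
toy_internal = np.zeros((21, 21), dtype=bool)
toy_internal[10, 10] = True

inner = ndimage.binary_dilation(toy_internal, iterations=2)   # diamond of radius 2: 13 pixels
outer = ndimage.binary_dilation(toy_internal, iterations=5)   # diamond of radius 5: 61 pixels
toy_external = outer ^ inner   # the band between them: 61 - 13 = 48 pixels
```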


In [None]:
watershed_marker = np.zeros(marker_internal.shape, dtype=int)
watershed_marker += marker_internal * 255 #high intensity
watershed_marker += marker_external * 128 #medium intensity

The above items do the following:<br>
1) initialize an empty (all-zero) array<br>
2) set values known to be lungs as the highest intensity<br>
3) set values of the external marker to be of medium intensity<br>

In [None]:
plt.title('Watershed marker!')
plt.imshow(watershed_marker, cmap='gray');

Now we'll use Sobel kernels (https://en.wikipedia.org/wiki/Sobel_operator), which are just two 3x3 convolution kernels with fixed weights that we slide over the image to detect edges in the x and y directions
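To show what those two kernels compute, here is a pure-numpy sketch of the separable Sobel derivative: a [-1, 0, 1] difference along one axis combined with [1, 2, 1] smoothing along the other, which is the same idea `scipy.ndimage.sobel` uses. Mirrored-border padding is an assumption chosen for this sketch, and `sobel_numpy` is a hypothetical helper.

```python
import numpy as np

def sobel_numpy(img, axis):
    """Sobel derivative along `axis` (0 = down the rows, 1 = across the columns)."""
    p = np.pad(np.asarray(img, dtype=np.float64), 1, mode='symmetric')
    if axis == 0:
        diff = p[2:, :] - p[:-2, :]                           # row below minus row above
        return diff[:, :-2] + 2 * diff[:, 1:-1] + diff[:, 2:]  # [1, 2, 1] across columns
    diff = p[:, 2:] - p[:, :-2]                               # column right minus column left
    return diff[:-2, :] + 2 * diff[1:-1, :] + diff[2:, :]      # [1, 2, 1] down rows

# a horizontal ramp: brightness increases left to right, i.e. vertical edges
ramp = np.tile(np.arange(5.0), (4, 1))
gx = sobel_numpy(ramp, 1)          # interior values: 2 * (1 + 2 + 1) = 8
gy = sobel_numpy(ramp, 0)          # rows are identical, so all zeros
magnitude = np.hypot(gx, gy)       # same combination as sobel_grad below
```

The axis-1 derivative responds to vertical edges and the axis-0 derivative to horizontal ones.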

In [None]:
fig, ax = plt.subplots(1, 2, figsize=(20,8))
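#axis 0 = derivative down the rows, which responds to horizontal edges
ax[0].imshow(ndimage.sobel(test_slice.astype(float), 0), cmap='gray')
ax[0].set_title('horizontal edges (gradient along axis 0)')
#axis 1 = derivative across the columns, which responds to vertical edges
ax[1].imshow(ndimage.sobel(test_slice.astype(float), 1), cmap='gray')
ax[1].set_title('vertical edges (gradient along axis 1)');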

In [None]:
x_edges = ndimage.sobel(test_slice.astype(float), 1)
y_edges = ndimage.sobel(test_slice.astype(float), 0)
sobel_grad = np.hypot(x_edges, y_edges)
sobel_grad *= 255.0 / np.max(sobel_grad)
plt.title('sobel gradient')
plt.imshow(sobel_grad, cmap='gray');

So we used the Sobel operator to find edges in our image, and now we'll feed its output into the watershed segmentation algorithm:
https://scikit-image.org/docs/dev/auto_examples/segmentation/plot_watershed.html

The algorithm starts from the watershed markers we found earlier and treats the image topographically: each pixel's value is viewed as an elevation. It floods basins outward from the markers until basins coming from different markers meet, and those meeting points become the boundaries between segments
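To make the flooding idea concrete, here is a simplified priority-flood sketch (4-connectivity) in plain Python/numpy. It is not scikit-image's implementation: it labels every pixel, draws no watershed lines, and a tie goes to whichever basin got there first. `flood_watershed` is a hypothetical helper.

```python
import heapq
import numpy as np

def flood_watershed(elevation, markers):
    """Grow each nonzero marker label over `elevation`, lowest pixels first."""
    elevation = np.asarray(elevation)
    labels = np.array(markers, copy=True)
    rows, cols = labels.shape
    heap, order = [], 0   # `order` breaks elevation ties first-in, first-out
    for r, c in zip(*np.nonzero(labels)):
        heapq.heappush(heap, (elevation[r, c], order, r, c))
        order += 1
    while heap:
        _, _, r, c = heapq.heappop(heap)
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < rows and 0 <= nc < cols and labels[nr, nc] == 0:
                labels[nr, nc] = labels[r, c]   # claimed by the basin that reached it first
                heapq.heappush(heap, (elevation[nr, nc], order, nr, nc))
                order += 1
    return labels

toy_elev = np.array([[0, 1, 5, 1, 0]] * 3)   # a ridge down the middle column
toy_markers = np.zeros_like(toy_elev)
toy_markers[1, 0], toy_markers[1, 4] = 255, 128   # same marker values as above
toy_basins = flood_watershed(toy_elev, toy_markers)
```

Each side of the ridge ends up with its own marker's label, which is what lets the lung marker claim the lungs and the external marker claim the rest.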

In [None]:
img_watershed = segmentation.watershed(test_slice, watershed_marker)
watershed = segmentation.watershed(sobel_grad, watershed_marker)

In [None]:
fig, ax = plt.subplots(1,2, figsize=(20,8))
ax[0].imshow(img_watershed, cmap='gray')
ax[0].set_title('watershed seg w/ original img')
ax[1].set_title('watershed seg w/ sobel gradient')
ax[1].imshow(watershed, cmap='gray');

We can see the output via the sobel gradient image gives us smoother edges of the lungs vs the more jagged texture as seen with the original image

Next we reduce what we have to an outline!
This is done with scipy's morphological_gradient function, which computes a morphological dilation and a morphological erosion of the input and takes the difference of the two: https://scipy.github.io/devdocs/generated/scipy.ndimage.morphological_gradient.html#scipy.ndimage.morphological_gradient

Based on that link from earlier, erosion takes the binary image and eats away at the pixels in the boundary regions, shrinking the foreground. Since dilation does the opposite, the difference between the two outlines the lungs :)
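The same idea on a toy 3x3 square, written with numpy max/min filters over each 3x3 window (grey dilation and erosion). Edge padding and the helper names `grey_dilate`/`grey_erode` are assumptions for this sketch; the notebook itself uses `ndimage.morphological_gradient`.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def grey_dilate(img, k=3):
    """Max over each k x k neighbourhood (edges padded by repetition)."""
    p = np.pad(img, k // 2, mode='edge')
    return sliding_window_view(p, (k, k)).max(axis=(-2, -1))

def grey_erode(img, k=3):
    """Min over each k x k neighbourhood (edges padded by repetition)."""
    p = np.pad(img, k // 2, mode='edge')
    return sliding_window_view(p, (k, k)).min(axis=(-2, -1))

toy_square = np.zeros((7, 7), dtype=int)
toy_square[2:5, 2:5] = 1
# dilation grows the square to 5x5, erosion shrinks it to the centre pixel,
# so their difference is a 24-pixel ring around the square's edge
toy_outline = grey_dilate(toy_square) - grey_erode(toy_square)
```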

In [None]:
#let's try out different kernel sizes :)
fig, ax = plt.subplots(1, 3, figsize=(20,8))
ax[0].imshow(ndimage.morphological_gradient(watershed, size=(2,2)))
ax[0].set_title('outline derived from 2x2 kernel')
ax[1].imshow(ndimage.morphological_gradient(watershed, size=(3,3)))
ax[1].set_title('outline derived from 3x3 kernel')
ax[2].set_title('outline derived from 7x7 kernel')
ax[2].imshow(ndimage.morphological_gradient(watershed, size=(7, 7)));

We definitely don't want the 2x2 kernel: it can't even trace the bottom right section of this slice. The other implementations of this all use a 3x3 kernel. It might be interesting to see what happens with a larger kernel, but that 7x7 outline is bold and perhaps unnecessary

In [None]:
outline = ndimage.morphological_gradient(watershed, size=(3,3))

Now onto black top-hat (black hat) morphology... the scipy documentation provides no explanation, so OpenCV's documentation to the rescue :) <br>
https://docs.opencv.org/trunk/d9/d61/tutorial_py_morphological_ops.html <br>
Basically it's the difference between closing of the input image and the input image. What does closing mean?<br>

Closing is a dilation followed by an erosion; it closes small holes inside the foreground objects.<br>

Dilation: we still slide a kernel, and a pixel becomes 1 if at least ONE pixel under the kernel is 1, so the white region in the image grows.<br>

Erosion: a pixel is considered 1 only if all the pixels under the kernel are 1; otherwise it's eroded away.
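A small numpy sketch of closing and the black top-hat on a toy mask with a one-pixel hole. It uses a flat 3x3 square window and edge padding (assumptions for illustration; the notebook uses `ndimage.black_tophat` with the rounder `blackhat_struct`), and the helper names are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def _windows(img, k):
    """All k x k neighbourhoods of img, edges padded by repetition."""
    return sliding_window_view(np.pad(img, k // 2, mode='edge'), (k, k))

def closing(img, k=3):
    dilated = _windows(img, k).max(axis=(-2, -1))    # dilation: grow the white regions
    return _windows(dilated, k).min(axis=(-2, -1))   # erosion: shrink them back

def black_tophat(img, k=3):
    return closing(img, k) - img   # whatever closing filled in: small dark holes/gaps

toy_mask = np.ones((7, 7), dtype=int)
toy_mask[3, 3] = 0                       # a one-pixel hole inside the foreground
toy_closed = closing(toy_mask)           # hole filled: all ones
toy_blackhat = black_tophat(toy_mask)    # 1 only where the hole was
```

That's why adding the black top-hat to the outline below fills in small gaps along it.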

In [None]:
#openCV has cv2.getStructuringElement; here we write the roughly circular 7x7 element by hand
blackhat_struct = np.array([[0, 0, 1, 1, 1, 0, 0],
                            [0, 1, 1, 1, 1, 1, 0],
                            [1, 1, 1, 1, 1, 1, 1],
                            [1, 1, 1, 1, 1, 1, 1],
                            [1, 1, 1, 1, 1, 1, 1],
                            [0, 1, 1, 1, 1, 1, 0],
                            [0, 0, 1, 1, 1, 0, 0]])

blackhat_kernel = ndimage.iterate_structure(blackhat_struct, 8)

In [None]:
blackhat_outline = outline + ndimage.black_tophat(outline,
                                structure=blackhat_kernel)

plt.title('Blackhat outline')
plt.imshow(blackhat_outline, cmap='gray');

In [None]:
lung_filter = np.bitwise_or(marker_internal, blackhat_outline)
plt.title('lung filter')
plt.imshow(lung_filter, cmap='gray');

In [None]:
lung_filter = ndimage.binary_closing(lung_filter,
                structure=np.ones((5,5)), iterations=3)
plt.title('Lung Filter via internal marker and blackhat outline')
plt.imshow(lung_filter, cmap='gray');

In [None]:
#where you see a 1 in lung filter - put the actual pixel value
#everywhere else include -2000
plt.title('Our segmented slice!!')
plt.imshow(np.where(lung_filter == 1, test_slice, -2000), cmap='gray');

That looks pretty good! Of course it isn't exactly lining up along the border of lungs, but that's totally fine for the high level purposes we're trying for here. 

Ok, let's wrap all of that up into a function that operates over each slice in a scan one by one. To run it on a single 2-D slice, add a leading axis first (e.g. `test_slice[np.newaxis]`), since the function loops over the first axis

No commentary below - if confused, check the cells ran above :)

In [None]:
def gen_internal_marker(slices_s, threshold= -300):
    internal_marker = slices_s < threshold
    internal_marker_labels = measure.label(segmentation.clear_border(internal_marker))
    areas = [x.area for x in measure.regionprops(internal_marker_labels)]
    areas.sort()
    #only prune when there are more than two regions (avoids an IndexError on slices with fewer)
    if len(areas) > 2:
        for region in measure.regionprops(internal_marker_labels):
            if region.area < areas[-2]:
                for coordinates in region.coords:
                    internal_marker_labels[coordinates[0], coordinates[1]] = 0
    marker_internal = internal_marker_labels > 0

    return marker_internal

def gen_external_marker(internal_marker, iter_1 = 10, iter_2 = 50):
    external_a = ndimage.binary_dilation(internal_marker, 
                                         iterations=iter_1)
    external_b = ndimage.binary_dilation(internal_marker, 
                                         iterations=iter_2)
    external_marker = external_b ^ external_a
    return external_marker

def gen_watershed_marker(internal_marker, external_marker):
    watershed_marker = np.zeros(internal_marker.shape, dtype=int)
    watershed_marker += internal_marker * 255
    watershed_marker += external_marker * 128
    return watershed_marker

def gen_sobel_grad(one_slice):
    one_slice = one_slice.astype(float)  #avoid int16 overflow inside the Sobel filter
    x_edges = ndimage.sobel(one_slice, 1)
    y_edges = ndimage.sobel(one_slice, 0)
    sobel_grad = np.hypot(x_edges, y_edges)
    sobel_grad *= 255.0 / np.max(sobel_grad)
    return sobel_grad

def gen_blackhat_outline(watershed, blackhat_struct, b_hat_iters=1):
    outline = ndimage.morphological_gradient(watershed, size=(3,3))
    blackhat_kernel = ndimage.iterate_structure(blackhat_struct, 
                                               b_hat_iters)
    blackhat_outline = outline + ndimage.black_tophat(outline, 
                            structure=blackhat_kernel)
    return blackhat_outline


def gen_lung_filter(internal_marker, blackhat_outline,
                   kernel_size=(5,5), iterations=3):
    pre_filter = np.bitwise_or(internal_marker, blackhat_outline)
    lung_filter = ndimage.binary_closing(pre_filter,
                            structure=np.ones(kernel_size),
                            iterations=iterations)
    return lung_filter


def watershed_seg(slice_s, blackhat_struct, threshold=-300,
                  b_hat_iters=1, iter_1=10, iter_2=50):
    
    scan = [] #initialize an empty list
    for one_slice in slice_s:
        internal_marker = gen_internal_marker(one_slice, threshold)

        external_marker = gen_external_marker(internal_marker, iter_1, iter_2)
        
        watershed_marker = gen_watershed_marker(internal_marker,
                                               external_marker)
        
        sobel_grad = gen_sobel_grad(one_slice)
       
        watershed = segmentation.watershed(sobel_grad, 
                                           watershed_marker)
        
        blackhat_outline = gen_blackhat_outline(watershed,
                                blackhat_struct, b_hat_iters)
        
        lung_filter = gen_lung_filter(internal_marker,
                                      blackhat_outline)
        
        segmented_slice = np.where(lung_filter == 1, one_slice, -2000)
        scan.append(segmented_slice)
        
    return np.array(scan)

In [None]:
#just a reminder of the shape of the middle section of the scan we pulled earlier
middle_slices_hu.shape

Let's look at the differences between 1 and 6 iterations of the blackhat fxn -- we'll consider both the time and segmentation results

In [None]:
%%time
segmented_scan_1 = watershed_seg(middle_slices_hu, blackhat_struct,
                                b_hat_iters=1)

In [None]:
%%time
segmented_scan_6 = watershed_seg(middle_slices_hu, blackhat_struct,
                              b_hat_iters=6)

In [None]:
fig, ax = plt.subplots(1, 2, figsize=(20,8))
ax[0].imshow(segmented_scan_1[32], cmap='Blues_r')
ax[1].imshow(segmented_scan_6[32], cmap='Blues_r');

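7,279 scans in the train folder * 112 seconds means it would take about 226 hours to segment everything with 6 iterations, which isn't feasible. At one iteration it would still take just over a whole day on a Kaggle kernel(!). And those timings were for only the 60 middle slices; full scans have roughly 200-300 slices, so the real cost would be several times higher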
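A quick back-of-the-envelope check of that estimate, plus a per-slice version using the 1,790,594 training slices counted earlier. This assumes the 112 s figure was measured on the 60-slice `middle_slices_hu` chunk with 6 iterations and that run time scales linearly with the number of slices; `hours_needed` is a hypothetical helper.

```python
SECONDS_PER_HOUR = 3600

def hours_needed(seconds_per_chunk, slices_per_chunk, total_slices):
    """Extrapolate wall time, assuming cost grows linearly with slice count."""
    return seconds_per_chunk / slices_per_chunk * total_slices / SECONDS_PER_HOUR

per_scan_estimate = 7279 * 112 / SECONDS_PER_HOUR       # the estimate above, ~226 h
per_slice_estimate = hours_needed(112, 60, 1_790_594)   # ~928 h, i.e. over 38 days
```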

More iterations in the blackhat function expand the segmentation mask a bit, but at over 9 times the run time of a single iteration.
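One likely reason for the slowdown: `ndimage.iterate_structure` dilates the element by itself n - 1 times, so the 7x7 `blackhat_struct` grows by 6 pixels per side with each extra iteration, and the black top-hat has to look at a much larger neighbourhood for every pixel. This sketch only computes the side lengths under that reading of scipy's behaviour (it can be checked with `ndimage.iterate_structure(blackhat_struct, 6).shape`); `iterated_side` is a hypothetical helper, and it makes no claim about exact run times.

```python
def iterated_side(k, n):
    """Side length of a k x k structuring element after iterate_structure(s, n)."""
    return k if n < 2 else k + (n - 1) * (k - 1)

for n in (1, 6, 8):
    side = iterated_side(7, n)
    print(n, side, side * side)   # iterations, side length, cells in the bounding box
```

So 6 iterations means a 37x37 window instead of 7x7, and the 8 iterations used in the walkthrough above means 49x49.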

In [None]:
fig,ax = plt.subplots(12,5, figsize=(20,20))
for n in range(12):
    for m in range(5):
        ax[n,m].imshow(segmented_scan_1[n*5+m], cmap='Blues_r')

A note to keep in mind: we did this on the middle 60 slices of the scan above. Had we chosen one of the extremes, towards either the feet or the head, we would see little or no lung at all, and the segmentation technique might pick up things we don't want, like the table/surface the patient is lying on. This is NOT ideal and probably something we would want to guard against. However, in this competition the segmentation seems to work better than in the OSIC one; not sure why.

**To-Do List:**<br>
1) implement another segmentation method - most likely the one suggested by @[allunia](https://www.kaggle.com/allunia) in her awesome OSIC notebook!

2) compare the differences in the segmentations