IN THIS PART OF THE PROJECT, WE READ AND EXPLORE THE DATA (CT IMAGES) TO UNDERSTAND IT BETTER.

In [None]:
# here we are importing the required libraries for our data exploration:
import os
import glob
import pandas  as pd
import numpy   as np
import nibabel as nib
import matplotlib.pyplot as plt
import tensorflow as tf

In [None]:
# reading the metadata file:
data = pd.read_csv('../input/covid19-ct-scans/metadata.csv')
data.head()

OBSERVATIONS:

The metadata file above shows that we have four different types of images for each scan:

* Original Lung CT.
* Lung Mask.
* Covid Infection Mask.
* Lung Mask and Covid Infection mask.

In [None]:
# printing the shape of our data:
print(data.shape)

OBSERVATIONS:
* We can observe that the data contains 20 CT scans (one row per scan).
* Each scan comes with 4 different types of images (one column per type).

In [None]:
# Let us first create a function that reads a scan stored in NIfTI (.nii) format
# and returns it as a NumPy array:

def read_nii(filepath):
    """
        Method Name: read_nii
        
        Description: This method reads a CT scan or mask stored in the NIfTI (.nii) format using the nibabel library.
        
        Input Description: filepath: the location of the NIfTI file to be read.
                           
        Output: The volumetric data as a NumPy array. Depending on the file, this is an original CT scan, a lung mask,
                an infection mask, or a lung and infection mask.

        Written By:
        Version: 1.0
        Revisions: None

    """
    ct_scan = nib.load(filepath)
    # get_fdata() already returns a floating-point NumPy array
    array   = ct_scan.get_fdata()
    return array

In [None]:
# In this part of the code, we read the first CT scan and view one of its slices

# reading the first CT scan
sample_ct   = read_nii(data.loc[0,'ct_scan'])

# printing the shape of our data
print("The shape of our data is:", sample_ct.shape)

# displaying a single slice (index 121) of the scan
plt.imshow(sample_ct[..., 121], cmap = 'bone')

OBSERVATIONS:
* After reading our data, we can observe that the images appear rotated by 90 degrees.
* We shall modify our read_nii function to re-orient the data.
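
As a small illustration of how such a re-orientation behaves, the sketch below applies np.rot90 to a toy volume (made-up numbers, not real CT data). With its default arguments, np.rot90 rotates in the plane of the first two axes, so each slice along the last axis is rotated by 90 degrees counter-clockwise and the number of slices is unchanged. The name toy_volume is only an assumption for this example.

```python
import numpy as np

# toy volume with 2 rows, 3 columns and 4 slices (hypothetical data, not a real CT)
toy_volume = np.arange(24).reshape(2, 3, 4)

# rotate in the plane of the first two axes; the slice axis is left untouched
rotated = np.rot90(toy_volume)

print("shape before rotation:", toy_volume.shape)
print("shape after rotation: ", rotated.shape)

# every slice of the rotated volume equals the rotated version of the original slice
same = np.array_equal(rotated[..., 0], np.rot90(toy_volume[..., 0]))
print("slice 0 rotated on its own matches:", same)
```

Note that the rows and columns swap places, which matters for scans whose slices are not square.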

In [None]:
# editing the original 'read_nii' function to re-orient our data:
def read_nii_rotated(filepath):
    """
        Method Name: read_nii_rotated
        
        Description: This method reads a CT scan or mask stored in the NIfTI (.nii) format using the nibabel library
                     and re-orients it by rotating every slice by 90 degrees.
        
        Input Description: filepath: the location of the NIfTI file to be read.
                           
        Output: The re-oriented volumetric data as a NumPy array. Depending on the file, this is an original CT scan,
                a lung mask, an infection mask, or a lung and infection mask.

        Written By:
        Version: 1.0
        Revisions: None

    """
    ct_scan = nib.load(filepath)
    array   = ct_scan.get_fdata()
    # re-orient our data: np.rot90 rotates each slice by 90 degrees (it acts on the first two axes)
    array   = np.rot90(array)
    return array

In [None]:
# reading the data and confirming if the data has been rotated.
sample_ct_rot   = read_nii_rotated(data.loc[0,'ct_scan'])
plt.imshow(sample_ct_rot[..., 121], cmap = 'bone')

OBSERVATIONS:
* We are able to observe that the data has been re-oriented.
* Now that the data is correctly oriented, let us take a look at sample images of ct_scan, lung_mask, infection_mask and lung_and_infection_mask.

In [None]:
# reading samples of ct_scan, lung_mask, infection_mask, lung_and_infection_mask:
# here, we are only considering the first CT scan
# similarly, to view another CT we can replace the '0' in the code below with the required row index (e.g. 1, 2, 3).

# reading CT
sample_ct_rot   = read_nii_rotated(data.loc[0,'ct_scan'])
# reading lung mask
sample_lung_rot = read_nii_rotated(data.loc[0,'lung_mask'])
# reading infected mask
sample_infe_rot = read_nii_rotated(data.loc[0,'infection_mask'])
# reading lung and infected mask
sample_all_rot  = read_nii_rotated(data.loc[0,'lung_and_infection_mask'])

In [None]:
# we have read the sample images above.
# now let us create a method/function to plot image slices from the sample CT.

def plot_sample(array_list, color_map = 'nipy_spectral'):
    
    """
        Method Name: plot_sample
        
        Description: This method plots one slice of the sample CT together with its masks.
        
        Input Description: array_list: list of four 2D slices in the order
                                       [CT, lung mask, infection mask, lung and infection mask].
                           color_map: the colour map used to overlay the masks on the CT.

        Output: Images of original lung CT, Lung Mask, Infection Mask, Lung and Infection Mask.

        Written By: 
        Version: 1.0
        Revisions: None

    """

    fig = plt.figure(figsize=(18,15))

    plt.subplot(1,4,1)
    plt.imshow(array_list[0], cmap='bone')
    plt.title('Original CT Image')

    plt.subplot(1,4,2)
    plt.imshow(array_list[0], cmap='bone')
    plt.imshow(array_list[1], alpha=0.5, cmap=color_map)
    plt.title('Lung Masks')

    plt.subplot(1,4,3)
    plt.imshow(array_list[0], cmap='bone')
    plt.imshow(array_list[2], alpha=0.5, cmap=color_map)
    plt.title('Infection Mask')

    plt.subplot(1,4,4)
    plt.imshow(array_list[0], cmap='bone')
    plt.imshow(array_list[3], alpha=0.5, cmap=color_map)
    plt.title('Lung and Infection Mask')

    plt.show()

In [None]:
# plotting the sample images:
plot_sample([sample_ct_rot[...,111], sample_lung_rot[...,111], sample_infe_rot[...,111], sample_all_rot[...,111]])

OBSERVATIONS:
* We are able to observe the Original Lung CT in the first image.
* In the second image, we can observe the Lung Masks, which are marked in white and green.
* In the third image, we can see the infection masks, which are marked in white.
* The final image shows us a combination of lung and infection mask.

* In the above images we have Original Lung CT, Lung masks, Infection Masks, Lung and Infection Masks.
* The aim of our project is to segment the COVID infections from the Lung CT.
* Hence, let us consider only the original Lung CT and the Infection Masks for further analysis.
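
The different overlay colours arise because the colour map assigns one colour to each distinct value stored in a mask. A minimal sketch for listing those values is given below. The helper name label_counts is our own, and the toy mask only stands in for a real slice such as sample_lung_rot[...,111]; the labels actually present in the dataset are not assumed here.

```python
import numpy as np

def label_counts(mask):
    """Return a dictionary {label value: number of pixels} for a mask array."""
    values, counts = np.unique(mask, return_counts=True)
    return {int(value): int(count) for value, count in zip(values, counts)}

# toy mask (hypothetical): 0 = background, 1 and 2 = two labelled regions
toy_mask = np.array([[0, 0, 1],
                     [0, 2, 2],
                     [0, 0, 1]])

print(label_counts(toy_mask))
```

Running the same helper on each of the four sample arrays would show which label values each mask type contains.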

In [None]:
# Here we are extracting only the Original CT and the Infection Mask:
# We are doing this only for the 1st CT.
# We can select any of the 20 CTs by replacing the '0' in the code below with the required row index (e.g. 1, 2, 3 etc.)

#CT
sample_scan_rot1 = read_nii_rotated(data.loc[0,'ct_scan'])

#INFECTION
sample_infection_rot1 = read_nii_rotated(data.loc[0,'infection_mask'])


In [None]:
# We will need to modify the 'plot_sample' function to 'plot_sample_mod' by specifying the required images to be printed

def plot_sample_mod(array_list, color_map = 'nipy_spectral'):
    
    """
        Method Name: plot_sample_mod
        
        Description: This method plots one slice of the original CT next to the same slice with the infection mask overlaid.
        
        Input Description: array_list: list of two 2D slices in the order [CT, infection mask].
                           color_map: the colour map used to overlay the mask on the CT.

        Output: Images of original lung CT and Infection Mask.

        Written By: 
        Version: 1.0
        Revisions: None

    """

    fig = plt.figure(figsize=(18,15))

    plt.subplot(1,2,1)
    plt.imshow(array_list[0], cmap='bone')
    plt.title('Original CT Image')

    plt.subplot(1,2,2)
    plt.imshow(array_list[0], cmap='bone')
    plt.imshow(array_list[1], alpha=0.5, cmap=color_map)
    plt.title('Infection Mask')

    plt.show()

In [None]:
# creating a function to plot multiple image slice samples:

def multi_plot_sample(array_list, index_list):
    
    """
        Method Name: multi_plot_sample
        
        Description: This method plots several slices of the sample CT and its infection mask, one plot per slice index.
        
        Input Description: array_list: list of two 3D volumes in the order [CT, infection mask].
                           index_list: the slice indices to be plotted.

        Output: Images of original lung CT and Infection Mask.

        Written By:
        Version: 1.0
        Revisions: None

    """

    for index_value in index_list:
        plot_sample_mod([array_list[0][...,index_value], 
                         array_list[1][...,index_value]])

In [None]:
# plotting multiple slices of the Original Lung CT and the Infection Mask.

multi_plot_sample([sample_scan_rot1,sample_infection_rot1], index_list=[100,110,120,130,140,150, 160])

OBSERVATIONS:

* The plots for the Lung CT and the Infection Masks have been shown above.
* We can observe that the infections in the infection mask images are indicated in white.
* We have plotted the Original Lung CT and Infection Masks only for the first CT.
* Similarly, we can plot them for the remaining 19 CTs by specifying the required CT.
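
The slice indices above were chosen by hand. As a possible alternative, the sketch below finds the slices of a mask volume that contain at least one infected pixel, so that only those slices need to be plotted. The helper name slices_with_infection is hypothetical, and the toy volume stands in for a real mask such as sample_infection_rot1.

```python
import numpy as np

def slices_with_infection(mask_volume):
    """Return the indices along the last axis whose slice has any non-zero mask pixel."""
    has_infection = (mask_volume > 0).any(axis=(0, 1))
    return np.flatnonzero(has_infection).tolist()

# toy mask volume (hypothetical): 4 x 4 pixels, 5 slices, infection in slices 2 and 4
toy_volume = np.zeros((4, 4, 5))
toy_volume[1, 1, 2] = 1
toy_volume[0, 0, 4] = 1

print(slices_with_infection(toy_volume))
```

The returned list could be passed as index_list to multi_plot_sample.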

In [None]:
# Now that we have checked the data, let us look at the shape (slice size and number of slices) of each CT and Infection Mask.

for i in range(len(data)):
    ct_scan = read_nii_rotated(data.loc[i,'ct_scan'])
    infection_mask = read_nii_rotated(data.loc[i,'infection_mask'])
    print(ct_scan.shape)
    print(infection_mask.shape)
    print("-"*40)

OBSERVATIONS:

* We can observe the image slice size and the number of slices in each of the CTs.
* The slice sizes vary, which means that we will need to bring the images to a standard size.
* Also, the number of slices varies from CT to CT; this may have an implication for which segmentation model we use (2D or 3D).
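
One way to bring slices to a standard size is nearest-neighbour resampling, which has the advantage of keeping mask values unchanged. The sketch below is a NumPy-only illustration under that assumption; the helper name resize_nearest and the target size are our own choices, and an image library's resize function could equally be used in the actual pipeline.

```python
import numpy as np

def resize_nearest(slice_2d, out_shape):
    """Resize a 2D slice to out_shape by picking the nearest source pixel."""
    in_rows, in_cols = slice_2d.shape
    out_rows, out_cols = out_shape
    # for every output row/column, compute the index of the source row/column to copy
    row_index = (np.arange(out_rows) * in_rows / out_rows).astype(int)
    col_index = (np.arange(out_cols) * in_cols / out_cols).astype(int)
    return slice_2d[np.ix_(row_index, col_index)]

# toy 4 x 4 slice (hypothetical) reduced to 2 x 2
toy_slice = np.arange(16).reshape(4, 4)
small = resize_nearest(toy_slice, (2, 2))
print(small)
```

For the CT images themselves an interpolating resize usually gives smoother results; nearest-neighbour is mainly of interest for the masks.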

HOUNSFIELD UNIT:

* We are aware that 2D images are made up of pixels.
* Similarly, 3D images are made up of voxels, or volumetric pixels.
* In CT, the value stored in each voxel is expressed in Hounsfield Units (HU), a radiodensity scale on which air is about -1000 and water is 0; the values typically lie in the range of about -1000 to +2000.
* For our lung images, however, the HU values should lie between -1000 and about +400, since higher values mostly correspond to bone.
* Hence we shall check whether our data is in the required range so that it can be used for further processing and prediction.
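
Besides looking at histograms, the range can also be checked numerically. The sketch below reports the minimum, the maximum and the fraction of values inside the -1000 to +400 window mentioned above. The helper name hu_summary is hypothetical, and the toy slice stands in for a real slice such as sample_ct_rot[...,121].

```python
import numpy as np

def hu_summary(volume, hu_min=-1000.0, hu_max=400.0):
    """Return (minimum, maximum, fraction of values within [hu_min, hu_max])."""
    in_range = (volume >= hu_min) & (volume <= hu_max)
    return float(volume.min()), float(volume.max()), float(in_range.mean())

# toy slice (hypothetical values): three values inside the window, one above it
toy_slice = np.array([[-1000.0, -800.0],
                      [40.0, 1200.0]])

lowest, highest, fraction = hu_summary(toy_slice)
print("min:", lowest, "max:", highest, "fraction in range:", fraction)
```

If a scan turned out to have values outside the window, np.clip(volume, -1000, 400) would be one simple way to bring them into range.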


In [None]:
# Plotting histograms to know our HU range:

# We are doing this only for the 1st CT (sample_ct_rot).
# To inspect another of the 20 CTs, load it with read_nii_rotated using the required row index (e.g. 1, 2, 3 etc.)


def HU_plot():
    """
        Method Name: HU_plot
        
        Description: This method plots a histogram of the HU values for several slices of the sample CT.
        
        Input Description: None. The method uses the 'sample_ct_rot' volume loaded above.

        Output: One HU histogram per selected slice.

        Written By:
        Version: 1.0
        Revisions: None

    """
    # hand-picked slice indices spread across the scan:
    img_index = [10,50,100,120,170,210,260,300]
    
    for index in img_index:
        # skip indices beyond the number of slices in this scan
        if index >= sample_ct_rot.shape[-1]:
            continue
        # use the slice index itself, not its position in the list
        img_to_process = sample_ct_rot[...,index]
        plt.hist(img_to_process.flatten(), bins = 40)
        plt.title("Slice " + str(index))
        plt.xlabel("Hounsfield Units(HU)")
        plt.ylabel("Frequency")
        plt.show()

In [None]:
# Plotting the HU plot:

HU_plot()

* From the HU plots, we can observe that for the sample of slices we have considered, the HU values lie within the permissible range.
* We don't need to transform the data to bring the HU within the range.
* This is all that we shall cover in our data analysis part.

CONCLUSIONS:

* The original data was oriented such that we couldn't view it properly; hence, we re-oriented our data.
* We have 20 scans, each with an Original CT, Lung Mask, Infection Mask and Lung and Infection Mask, of which we are only considering the Original CT and the Infection Mask because of our problem statement.
* We are able to observe that the image slices are of different sizes and each CT has a different number of slices.
* For the slices we sampled, the data is in the required range of HU, so it can be used further.
* If we observe the CT and Infection Masks closely, we can see a lot of black area and the diaphragm present in the images. We can crop out only the Region of Interest and not use the rest; this may help in increasing the processing speed, as it will remove the unwanted parts of the images.
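
One possible way to crop the Region of Interest is to cut each slice to the bounding box of its lung mask. The sketch below illustrates the idea on toy arrays. The helper name crop_to_mask and the margin parameter are assumptions made for this example, not part of the analysis above.

```python
import numpy as np

def crop_to_mask(image, mask, margin=0):
    """Crop a 2D image to the bounding box of the non-zero mask pixels, plus an optional margin."""
    rows = np.flatnonzero((mask > 0).any(axis=1))
    cols = np.flatnonzero((mask > 0).any(axis=0))
    if rows.size == 0:
        # empty mask: nothing to crop to, so return the image unchanged
        return image
    top    = max(int(rows[0]) - margin, 0)
    bottom = min(int(rows[-1]) + margin + 1, image.shape[0])
    left   = max(int(cols[0]) - margin, 0)
    right  = min(int(cols[-1]) + margin + 1, image.shape[1])
    return image[top:bottom, left:right]

# toy 6 x 6 image and mask (hypothetical): the mask covers rows 2-3 and columns 1-4
toy_image = np.arange(36).reshape(6, 6)
toy_mask = np.zeros((6, 6))
toy_mask[2:4, 1:5] = 1

cropped = crop_to_mask(toy_image, toy_mask)
print(cropped.shape)
```

The same bounding box would have to be applied to the infection mask so that image and mask stay aligned.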

