<a href="https://colab.research.google.com/github/rachanabramhane/assignment/blob/main/CAPSTONE_PROJECT_.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Name - Rachana Bramhane

Capstone Project - CT scan image classification

**This example shows the steps needed to build a 3D convolutional neural network (CNN) to predict the presence of viral pneumonia in computed tomography (CT) scans. 2D CNNs are commonly used to process RGB images (3 channels). A 3D CNN is simply the 3D equivalent: it takes as input a 3D volume or a sequence of 2D frames (e.g. slices in a CT scan). 3D CNNs are a powerful model for learning representations of volumetric data.**


In [1]:
import glob
import pandas as pd
import os

This dataset consists of lung CT scans with COVID-19 related findings, as well as without such findings.

We will use the radiological findings associated with the CT scans as labels to build a classifier that predicts the presence of viral pneumonia. Hence, the task is a binary classification problem.

In [2]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


Loading data and preprocessing
CT scans store raw voxel intensity in Hounsfield units (HU). In this dataset the values range from -1024 to above 2000. Values above 400 HU correspond to bone of varying radiodensity, so 400 is used as the upper bound. A window of -1000 to 400 HU is commonly used to normalize CT scans.

To process the data, we do the following:

- We first rotate the volumes by 90 degrees, so the orientation is fixed.
- We scale the HU values to be between 0 and 1.
- We resize width, height and depth.

Here we define several helper functions to process the data. These functions will be used when building the training and validation datasets.
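The HU scaling step described above can be sketched as follows. This is a minimal illustration, assuming the -1000 to 400 HU window mentioned earlier; the function name `normalize_hu` and the sample values are hypothetical and not part of this notebook. The PNG slices loaded below are rescaled with `rescale=1./255` instead.

```python
import numpy as np

# Window bounds taken from the text above (in Hounsfield units).
HU_MIN, HU_MAX = -1000.0, 400.0

def normalize_hu(volume):
    """Clip HU values to the window and scale them linearly to [0, 1]."""
    volume = np.clip(volume.astype("float32"), HU_MIN, HU_MAX)
    return (volume - HU_MIN) / (HU_MAX - HU_MIN)

# Hypothetical voxel values: air, the lower bound, soft tissue, the upper bound, dense bone.
example = np.array([-1024, -1000, -300, 400, 2000])
normalized = normalize_hu(example)
print(normalized)
```

Values below -1000 map to 0 and values above 400 map to 1, so the network sees a fixed input range regardless of outliers in the scan.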

In [3]:
path=[]
label=[]
for i in glob.glob('/content/drive/MyDrive/capstone_project/'+'*/*.png'):
  path.append(i)
  label.append(i.split('/')[-2])

In [4]:
path

['/content/drive/MyDrive/capstone_project/covid/Covid (25).png',
 '/content/drive/MyDrive/capstone_project/covid/Covid (227).png',
 '/content/drive/MyDrive/capstone_project/covid/Covid (20).png',
 '/content/drive/MyDrive/capstone_project/covid/Covid (130).png',
 '/content/drive/MyDrive/capstone_project/covid/Covid (223).png',
 '/content/drive/MyDrive/capstone_project/covid/Covid (129).png',
 '/content/drive/MyDrive/capstone_project/covid/Covid (185).png',
 '/content/drive/MyDrive/capstone_project/covid/Covid (256).png',
 '/content/drive/MyDrive/capstone_project/covid/Covid (187).png',
 '/content/drive/MyDrive/capstone_project/covid/Covid (138).png',
 '/content/drive/MyDrive/capstone_project/covid/Covid (203).png',
 '/content/drive/MyDrive/capstone_project/covid/Covid (160).png',
 '/content/drive/MyDrive/capstone_project/covid/Covid (1266).png',
 '/content/drive/MyDrive/capstone_project/covid/Covid (146).png',
 '/content/drive/MyDrive/capstone_project/covid/Covid (214).png',
 '/content/

In [5]:
data = pd.DataFrame({"Path":path,"Label":label})
data

Unnamed: 0,Path,Label
0,/content/drive/MyDrive/capstone_project/covid/...,covid
1,/content/drive/MyDrive/capstone_project/covid/...,covid
2,/content/drive/MyDrive/capstone_project/covid/...,covid
3,/content/drive/MyDrive/capstone_project/covid/...,covid
4,/content/drive/MyDrive/capstone_project/covid/...,covid
...,...,...
2497,/content/drive/MyDrive/capstone_project/noncov...,noncovid
2498,/content/drive/MyDrive/capstone_project/noncov...,noncovid
2499,/content/drive/MyDrive/capstone_project/noncov...,noncovid
2500,/content/drive/MyDrive/capstone_project/noncov...,noncovid


In [6]:
data['Label'].value_counts()

covid       1273
noncovid    1229
Name: Label, dtype: int64

In [7]:
master_data=data.sample(frac=1)

In [8]:
from keras.models import Sequential,Model
from keras.layers import Dense,Flatten,Dropout
from keras.preprocessing.image import ImageDataGenerator

In [9]:
from keras.callbacks import ModelCheckpoint,EarlyStopping

Data augmentation
The CT scans are also augmented by rotating them at random angles during training. Each volume is a rank-3 tensor, so a batch has shape (samples, height, width, depth). To perform 3D convolutions on the data, we add a dimension of size 1 at axis 4, which gives the shape (samples, height, width, depth, 1). There are many kinds of preprocessing and augmentation techniques; this example shows a few simple ones to get started.
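The shape change described above can be illustrated with NumPy. This is only a sketch: the batch below is filled with zeros and its sizes are made up to keep the example small.

```python
import numpy as np

# Hypothetical batch: 2 volumes of 8 x 8 pixels with 4 slices each.
batch = np.zeros((2, 8, 8, 4), dtype="float32")

# Add a channel dimension of size 1 at axis 4.
batch_with_channel = np.expand_dims(batch, axis=4)

print(batch.shape)
print(batch_with_channel.shape)
```

The trailing axis plays the role of the channel dimension that 3D convolution layers expect.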

In [10]:
train_generator = ImageDataGenerator(
    rescale=1./255,
    horizontal_flip=True,
    width_shift_range=0.2,
    height_shift_range=0.2,
    validation_split=0.2
)

In [11]:
test_generator = ImageDataGenerator(
    rescale=1./255
)

In [12]:
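# Note: the keys "COVID" and "non-COVID" do not match the actual labels
# ("covid" and "noncovid"), so this call leaves the column unchanged.
master_data['Label'] = master_data['Label'].replace({"COVID":0,"non-COVID":1})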

In [13]:
master_data['Label'].unique()

array(['noncovid', 'covid'], dtype=object)

In [14]:
master_data.head(2)

Unnamed: 0,Path,Label
2341,/content/drive/MyDrive/capstone_project/noncov...,noncovid
1222,/content/drive/MyDrive/capstone_project/covid/...,covid


In [15]:
train_images = train_generator.flow_from_dataframe(
    dataframe=master_data,
    x_col='Path',
    y_col='Label',
    target_size=(224, 224),
    color_mode='rgb',
    class_mode='raw',
    batch_size=4,
    shuffle=True,
    subset='training'
)

Found 2002 validated image filenames.


In [16]:
val_images = train_generator.flow_from_dataframe(
    dataframe=master_data,
    x_col='Path',
    y_col='Label',
    target_size=(224, 224),
    color_mode='rgb',
    class_mode='raw',
    batch_size=4,
    shuffle=True,
    subset='validation'
)

Found 500 validated image filenames.


In [17]:
import tensorflow as tf
tf.__version__

'2.8.2'

In [18]:
# Import The Libraries 

from tensorflow.keras.layers import Input, Lambda, Dense, Flatten
from tensorflow.keras.models import Model
from tensorflow.keras.applications.resnet50 import ResNet50, preprocess_input
from tensorflow.keras.preprocessing import image
from tensorflow.keras.preprocessing.image import ImageDataGenerator, load_img
from tensorflow.keras.models import Sequential


import numpy as np
from glob import glob
import matplotlib.pyplot as plt

In [19]:
path= []
label = []
for i in glob('/content/drive/MyDrive/capstone_project/'+'*/*.png'):
    path.append(i)
    label.append(i.split('/')[-2])

In [20]:
import pandas as pd

In [21]:
# Path
capstone_project=("/content/drive/MyDrive/capstone_project")
covid_path=("/content/drive/MyDrive/capstone_project/covid")
noncovid_path=("/content/drive/MyDrive/capstone_project/noncovid")

In [22]:
# Set Resize variable
IMAGE_SIZE = [224, 224] # Target image size; the ImageNet-pretrained ResNet50 uses 224x224 inputs by default.

ResNet-50 is a convolutional neural network that is 50 layers deep. We can load a pretrained version of the network trained on more than a million images from the ImageNet database.

In [23]:
resnet = ResNet50(
    input_shape = IMAGE_SIZE + [3], # Append the channel dimension: (224, 224, 3).
    weights = 'imagenet', # Load weights pretrained on ImageNet.
    include_top = False   # Drop the original ImageNet classification head.
)

Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/resnet/resnet50_weights_tf_dim_ordering_tf_kernels_notop.h5


In [24]:
resnet.summary()

Model: "resnet50"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
 input_1 (InputLayer)           [(None, 224, 224, 3  0           []                               
                                )]                                                                
                                                                                                  
 conv1_pad (ZeroPadding2D)      (None, 230, 230, 3)  0           ['input_1[0][0]']                
                                                                                                  
 conv1_conv (Conv2D)            (None, 112, 112, 64  9472        ['conv1_pad[0][0]']              
                                )                                                                 
                                                                                           

In [25]:
# Keep every ResNet50 layer trainable, so the whole network is fine-tuned.
for layer in resnet.layers:
    layer.trainable = True

In [26]:
folders = glob("/content/drive/MyDrive/capstone_project" + '/*')
folders

['/content/drive/MyDrive/capstone_project/covid',
 '/content/drive/MyDrive/capstone_project/noncovid']

In [27]:
len(folders)

2

In [28]:
capstone_project_label = ['covid','noncovid']

In [29]:
x = Flatten()(resnet.output)

In [30]:
len(folders)

2

In [31]:
prediction = Dense(len(folders), activation = 'softmax')(x)

In [32]:
resnet.input

<KerasTensor: shape=(None, 224, 224, 3) dtype=float32 (created by layer 'input_1')>

In [33]:
# Create a model Object
model = Model(inputs = resnet.input, outputs = prediction)

Here the model accuracy and loss for the training and the validation sets are plotted. Since the validation set is class-balanced, accuracy provides an unbiased representation of the model's performance.

In [34]:
model.summary()

Model: "model"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
 input_1 (InputLayer)           [(None, 224, 224, 3  0           []                               
                                )]                                                                
                                                                                                  
 conv1_pad (ZeroPadding2D)      (None, 230, 230, 3)  0           ['input_1[0][0]']                
                                                                                                  
 conv1_conv (Conv2D)            (None, 112, 112, 64  9472        ['conv1_pad[0][0]']              
                                )                                                                 
                                                                                              

In [35]:
model.compile (
    loss = 'categorical_crossentropy',
    optimizer = 'adam',
    metrics = ['accuracy']
)

In [36]:
nb_classes = 2   # covid and noncovid
batch_size = 64
img_size = 224   # must match the 224x224 input shape the model was built with
nb_epochs = 30

In [37]:

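# Note: validation_split only takes effect when `subset` is passed to
# flow_from_dataframe; no subset is passed below, so all images are used.
train_datagen=ImageDataGenerator(rescale=1./255,
    validation_split=0.25,
    horizontal_flip = True,
    zoom_range = 0.3,
    width_shift_range = 0.3,
    height_shift_range=0.3
    )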

train_generator=train_datagen.flow_from_dataframe(
    dataframe=master_data,
    directory="/content/drive/MyDrive/capstone_project",
    x_col="Path",
    y_col="Label",
    batch_size=4,
    shuffle=True,
    class_mode="categorical",    
    target_size=(img_size,img_size))

Found 2502 validated image filenames belonging to 2 classes.


In [38]:
test_datagen = ImageDataGenerator(rescale=1./255)

test_generator=test_datagen.flow_from_dataframe(
    dataframe=master_data,
    directory="/content/drive/MyDrive/capstone_project",
    x_col="Path",
    y_col="Label",
    batch_size=batch_size,
    shuffle=True,
    class_mode="categorical",    
    target_size=(img_size,img_size))

Found 2502 validated image filenames belonging to 2 classes.


In [39]:
model.compile(optimizer='adam', 
              loss='categorical_crossentropy', 
              metrics=['accuracy'])

In [40]:
ckpt_path = 'new_model.h5'
checkpoint_cb = tf.keras.callbacks.ModelCheckpoint(ckpt_path,save_best_only=True)

In [41]:
early_stopping_cb = tf.keras.callbacks.EarlyStopping(patience=4)

In [42]:
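# model.fit accepts generators directly; fit_generator is deprecated in TF 2.x.
# Note: checkpoint_cb is not passed via `callbacks`, so no checkpoint is saved.
history = model.fit(
    train_generator,
    validation_data = test_generator,
    epochs = 10,
    steps_per_epoch = len(train_generator),
    validation_steps = len(test_generator)
)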

  


Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
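
The `history` object returned by training stores per-epoch metrics in `history.history`, which is what accuracy and loss curves are drawn from. The sketch below shows one way to plot them. The numbers in `example_history` are made up for illustration and are not results from this run, and the key names assume `metrics=['accuracy']` in TF 2.x. The helper `plot_history` is hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch also runs outside a notebook
import matplotlib.pyplot as plt

def plot_history(history_dict):
    """Plot training and validation curves for accuracy and loss side by side."""
    fig, axes = plt.subplots(1, 2, figsize=(12, 4))
    for ax, metric in zip(axes, ["accuracy", "loss"]):
        epochs = range(1, len(history_dict[metric]) + 1)
        ax.plot(epochs, history_dict[metric], label="train")
        ax.plot(epochs, history_dict["val_" + metric], label="validation")
        ax.set_title("Model " + metric)
        ax.set_xlabel("epoch")
        ax.set_ylabel(metric)
        ax.legend()
    return fig

# Made-up values, only to demonstrate the expected dictionary layout.
example_history = {
    "accuracy": [0.60, 0.70, 0.78],
    "val_accuracy": [0.58, 0.66, 0.71],
    "loss": [0.90, 0.65, 0.50],
    "val_loss": [0.95, 0.75, 0.68],
}
fig = plot_history(example_history)
```

In the notebook itself, the call would be `plot_history(history.history)` after training has finished.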


It is important to note that the number of samples is very small (only 200) and that we do not specify a random seed. As such, you can expect significant variance in the results. A full dataset consisting of over 1000 CT scans is also available. Using the full dataset, an accuracy of 90% was achieved. A variability of 5-6% in the classification performance is observed in both cases.
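
Because no random seed is specified, shuffling and augmentation differ from run to run. One common way to reduce this variance is to fix the seeds up front. The sketch below is an assumption about how that could be done, not something this notebook does; the helper `set_seeds` is hypothetical.

```python
import random
import numpy as np

def set_seeds(seed=42):
    """Seed the Python and NumPy random number generators."""
    random.seed(seed)
    np.random.seed(seed)
    # In the notebook, TensorFlow can be seeded as well with
    # tf.random.set_seed(seed), and the DataFrame shuffle can be made
    # repeatable with data.sample(frac=1, random_state=seed).

# The same seed gives the same shuffle order on every call.
set_seeds(0)
first_order = np.random.permutation(10)
set_seeds(0)
second_order = np.random.permutation(10)
print((first_order == second_order).all())
```

Seeding does not remove all nondeterminism (some GPU operations remain nondeterministic), but it makes repeated runs easier to compare.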

Make predictions on a single CT scan

In [43]:
prediction = model.predict(test_generator)

In [44]:
prediction

array([[9.9272984e-01, 7.2702011e-03],
       [8.0389363e-01, 1.9610636e-01],
       [5.4117537e-01, 4.5882460e-01],
       ...,
       [1.0000000e+00, 4.7372822e-08],
       [7.0604450e-01, 2.9395548e-01],
       [1.9008234e-01, 8.0991769e-01]], dtype=float32)

In [45]:
np.argmax(prediction, axis = 1)

array([0, 0, 0, ..., 0, 0, 1])

In [46]:
prediction = np.argmax(prediction, axis = 1)
prediction

array([0, 0, 0, ..., 0, 0, 1])
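
The integer predictions above can be turned back into class names. The sketch below uses only the rows of the probability array that are visible in the printed output, and it assumes the mapping `{"covid": 0, "noncovid": 1}`, which is what Keras produces when it assigns indices in sorted label order. In the notebook the actual mapping can be read from `test_generator.class_indices`.

```python
import numpy as np

# Visible rows of the probability array printed earlier.
probabilities = np.array([
    [9.9272984e-01, 7.2702011e-03],
    [8.0389363e-01, 1.9610636e-01],
    [5.4117537e-01, 4.5882460e-01],
    [1.0000000e+00, 4.7372822e-08],
    [7.0604450e-01, 2.9395548e-01],
    [1.9008234e-01, 8.0991769e-01],
])

# Assumed mapping; check test_generator.class_indices in the notebook.
class_indices = {"covid": 0, "noncovid": 1}
index_to_label = {index: name for name, index in class_indices.items()}

# Pick the most probable class per row and look up its name.
predicted_indices = np.argmax(probabilities, axis=1)
predicted_labels = [index_to_label[int(i)] for i in predicted_indices]
print(predicted_labels)
# ['covid', 'covid', 'covid', 'covid', 'covid', 'noncovid']
```

Note that `test_generator` was created with `shuffle=True`, so the order of these predictions does not follow the row order of `master_data`. Comparing predictions with true labels would need a generator created with `shuffle=False`.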