# COVID-19 Detection from X Ray Images of Lungs

# Accuracy Graph of DL Model

In [None]:
from IPython.display import Image
Image(filename='/kaggle/input/graph-of-model/plot.png', width=800) 

The dataset used to train this deep learning model is available as an [open-source Kaggle dataset of 5,000 COVID-19-positive lung X-ray images and 5,000 lung X-ray images of healthy persons](https://www.kaggle.com/nabeelsajid917/covid-19-x-ray-10000-images).

# Accessing the dataset from Kaggle
Download the whole directory available [here](https://www.kaggle.com/nabeelsajid917/covid-19-x-ray-10000-images).
* Create a new local Python environment and paste all the files into its main directory.
* Check the file named "requirements.txt". It lists all the libraries used in this process.
* Run the command ***pip install -r requirements.txt*** to install all the required libraries.
* Now open the "generate_images.py" file in that directory.

# Let's dive into generate_images.py

* Lines 2-9 import the necessary packages.

In [None]:
# import the necessary packages
from keras.preprocessing.image import ImageDataGenerator
from keras.preprocessing.image import img_to_array
from keras.preprocessing.image import load_img
import numpy as np
import argparse
import cv2
import os
from imutils import paths


* Lines 12-19 set up the command-line arguments.

In [None]:
# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-d", "--dataset", required=True,
	help="path to input dataset")
ap.add_argument("-o", "--output", required=True,
	help="path to output directory to store augmentation examples")
ap.add_argument("-t", "--total", type=int, default=100,
	help="# of training samples to generate")
args = vars(ap.parse_args())
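
To see what the parser produces without running the script from a shell, the sketch below rebuilds the same parser and feeds it an explicit argument list. The paths `dataset/covid` and `generated_dataset/covid` are taken from the commands used later in this notebook; passing a list to `parse_args` is for illustration only.

```python
import argparse

# rebuild the same parser as generate_images.py
ap = argparse.ArgumentParser()
ap.add_argument("-d", "--dataset", required=True,
    help="path to input dataset")
ap.add_argument("-o", "--output", required=True,
    help="path to output directory to store augmentation examples")
ap.add_argument("-t", "--total", type=int, default=100,
    help="# of training samples to generate")

# illustration only: pass the arguments as a list instead of reading sys.argv
args = vars(ap.parse_args(
    ["--dataset", "dataset/covid", "--output", "generated_dataset/covid"]))

# --total was not given, so it falls back to its default of 100
print(args)
```

Because of `vars(...)`, the result is a plain dictionary, which is why the rest of the script reads values as `args["dataset"]`, `args["output"]`, and `args["total"]`.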

* Line 22 collects all the image paths into a list.
* Line 23 initializes a list to store all the images.

In [None]:
imagePaths = list(paths.list_images(args["dataset"]))
data = []

* Lines 26-34 open the image paths one by one and then append each image to the data list.

In [None]:
# loop over the image paths
for imagePath in imagePaths:
	# extract the class label from the parent directory name
	# (not used further in this script)
	label = imagePath.split(os.path.sep)[-2]

	# load the image from disk as a NumPy array
	image = cv2.imread(imagePath)
	# add the image to the data list
	data.append(image)
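
The label extraction relies on the directory layout: the class name is the folder that directly contains the image. A minimal sketch, using a made-up file name (`img_001.jpg` is hypothetical):

```python
import os

# hypothetical path following the <dataset>/<class>/<file> layout
imagePath = os.path.join("dataset", "covid", "img_001.jpg")

# the second-to-last path component is the parent folder, i.e. the class
label = imagePath.split(os.path.sep)[-2]
print(label)
```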

* Lines 35-40 convert each image to an array so that data augmentation can be applied to it.

In [None]:
for image in data:
	# convert the already-loaded image to a float NumPy array and then
	# add a leading batch dimension
	print("[INFO] loading example image...")
	image = img_to_array(image)
	image = np.expand_dims(image, axis=0)

* Lines 44-52 set up the data augmentation.

In [None]:
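	# still inside the "for image in data" loop: build the augmentation
	# generator and reset the per-image counter
	aug = ImageDataGenerator(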
		rotation_range=30,
		zoom_range=0.15,
		width_shift_range=0.2,
		height_shift_range=0.2,
		shear_range=0.15,
		horizontal_flip=True,
		fill_mode="nearest")
	total = 0

* Lines 54-57 pass each image, one at a time, to the data augmentation object, which saves the generated images in "jpg" format. The rest of the code loops over the generated images and stops once the specified number of examples (100 by default) has been reached for that image.

In [None]:
	# construct the actual Python generator
	print("[INFO] generating images...")
	imageGen = aug.flow(image, batch_size=1, save_to_dir=args["output"],
		save_prefix="image", save_format="jpg")
	# loop over examples from our image data augmentation generator
	for image in imageGen:
		# increment our counter
		total += 1

		# if we have reached the specified number of examples, break
		# from the loop
		if total == args["total"]:
			break
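
The `break` matters because the iterator returned by `aug.flow` keeps yielding augmented batches indefinitely. The sketch below imitates that behaviour with a plain Python generator (`fake_flow` is a hypothetical stand-in, not part of Keras) to show how the counter stops the loop:

```python
import itertools

def fake_flow():
    # hypothetical stand-in for aug.flow(...): yields batches forever
    for i in itertools.count():
        yield "augmented batch {}".format(i)

target = 100  # plays the role of args["total"]
total = 0
for batch in fake_flow():
    # one augmented image would be written to disk per iteration
    total += 1
    # stop once the requested number of examples has been generated
    if total == target:
        break

print(total)
```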

* Run these commands one by one.

In [None]:
$ python generate_images.py --dataset dataset/covid --output generated_dataset/covid
$ python generate_images.py --dataset dataset/normal --output generated_dataset/normal

**NOTE**: *Wait for some time: these commands run the data augmentation and create around 10,000 images. On a fast machine this takes a few minutes*.

Now you will be able to see the image dataset in the "generated_dataset" folder (containing two folders, covid and normal), which we will use for the rest of this process.

# Let's Train the Deep Learning Model

* Now open the file "train_covid19.py" in an editor.

# Let's dive into train_covid19.py

* Lines 4-24 import the packages needed for this file.

In [None]:
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.applications import VGG16
from tensorflow.keras.layers import AveragePooling2D
from tensorflow.keras.layers import Dropout
from tensorflow.keras.layers import Flatten
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import Input
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.utils import to_categorical
from sklearn.preprocessing import LabelBinarizer
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
from imutils import paths
import matplotlib.pyplot as plt
import numpy as np
import argparse
import cv2
import os

* Lines 26-33 set up the command-line arguments.

In [None]:
ap = argparse.ArgumentParser()
ap.add_argument("-d", "--dataset", required=True,
	help="path to input dataset")
ap.add_argument("-p", "--plot", type=str, default="plot.png",
	help="path to output loss/accuracy plot")
ap.add_argument("-m", "--model", type=str, default="covid19.model",
	help="path to output trained model")
args = vars(ap.parse_args())

* Lines 37-39 set the initial learning rate, the number of epochs, and the batch size.

In [None]:
INIT_LR = 1e-3
EPOCHS = 100
BS = 128

* Lines 39-61 take each image from its path, convert it to RGB, resize it, and append the image to data and its label to labels.

In [None]:
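# grab the list of image paths and initialize the data and label lists
imagePaths = list(paths.list_images(args["dataset"]))
data = []
labels = []

# loop over the image paths
for imagePath in imagePaths:
	# extract the class label from the parent directory name
	label = imagePath.split(os.path.sep)[-2]

	# load the image, swap color channels, and resize it to be a fixed
	# 128x128 pixels while ignoring aspect ratio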
	image = cv2.imread(imagePath)
	image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
	image = cv2.resize(image, (128, 128))

	# update the data and labels lists, respectively
	data.append(image)
	labels.append(label)

* Lines 65-71 convert the data to a NumPy array, scale the pixel values to the range [0, 1], and one-hot encode the image labels.

In [None]:
data = np.array(data) / 255.0
labels = np.array(labels)

# perform one-hot encoding on the labels
lb = LabelBinarizer()
labels = lb.fit_transform(labels)
labels = to_categorical(labels)
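
With only two classes, `LabelBinarizer` returns a single 0/1 column, which is why `to_categorical` is applied afterwards to obtain one column per class. The pure-Python sketch below mimics that two-step result for a few made-up labels; it is a stand-in for illustration and does not call scikit-learn or Keras.

```python
# made-up labels, as they would be read from the folder names
labels = ["covid", "normal", "normal", "covid"]

# step 1 (what LabelBinarizer does for two classes): sort the class
# names and map each label to a single 0/1 value
classes = sorted(set(labels))
binary = [classes.index(name) for name in labels]

# step 2 (what to_categorical does): expand each integer into a
# one-hot row with one column per class
one_hot = [[1.0 if i == b else 0.0 for i in range(len(classes))]
           for b in binary]

print(classes)
print(one_hot)
```

Since the classes are sorted alphabetically, column 0 corresponds to "covid" and column 1 to "normal", which is the order used later by the classification report and the confusion matrix.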

* Lines 75-76 split the data into training and test sets, lines 79-81 set up data augmentation, and finally lines 85-86 load the VGG16 network that we will build on and train.

In [None]:
(trainX, testX, trainY, testY) = train_test_split(data, labels,
	test_size=0.20, stratify=labels, random_state=42)

# initialize the training data augmentation object
trainAug = ImageDataGenerator(
	rotation_range=15,
	fill_mode="nearest")

# load the VGG16 network, ensuring the head FC layer sets are left
# off
baseModel = VGG16(weights="imagenet", include_top=False,
	input_tensor=Input(shape=(128, 128, 3)))
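
The `stratify=labels` argument asks for a split that keeps the class proportions the same in the training and test sets. The sketch below illustrates the idea in plain Python on made-up labels; it is a simplified stand-in and not scikit-learn's actual implementation.

```python
import random

# made-up labels: 10 covid and 10 normal samples
labels = ["covid"] * 10 + ["normal"] * 10
rng = random.Random(42)

train_idx, test_idx = [], []
for cls in sorted(set(labels)):
    # shuffle the indices of this class, then hold out 20% of them
    idx = [i for i, name in enumerate(labels) if name == cls]
    rng.shuffle(idx)
    n_test = len(idx) * 20 // 100
    test_idx += idx[:n_test]
    train_idx += idx[n_test:]

# both splits keep the 50/50 class balance of the full dataset
print(len(train_idx), len(test_idx))
```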


* Lines 90-99 build the head of the model on top of VGG16, and lines 103-104 freeze the base layers so that they are not updated during the first training process.

In [None]:
headModel = baseModel.output
headModel = AveragePooling2D(pool_size=(4, 4))(headModel)
headModel = Flatten(name="flatten")(headModel)
headModel = Dense(64, activation="relu")(headModel)
headModel = Dropout(0.5)(headModel)
headModel = Dense(2, activation="softmax")(headModel)

# place the head FC model on top of the base model (this will become
# the actual model we will train)
model = Model(inputs=baseModel.input, outputs=headModel)

# loop over all layers in the base model and freeze them so they will
# *not* be updated during the first training process
for layer in baseModel.layers:
	layer.trainable = False
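
The `pool_size=(4, 4)` in the head matches the spatial size of the VGG16 feature map for a 128x128 input: VGG16's convolutional base contains five 2x2 max-pooling stages and ends with 512 channels. The arithmetic below is a plain-Python sketch of the resulting shapes and of the number of trainable parameters in the new head; it does not build the network.

```python
input_size = 128
num_pool_stages = 5        # VGG16 has five 2x2 max-pooling layers
channels = 512             # channels in VGG16's last convolutional block

# each pooling stage halves the spatial resolution
feature_size = input_size // (2 ** num_pool_stages)

# AveragePooling2D(pool_size=(4, 4)) reduces the 4x4 map to 1x1
pooled_size = feature_size // 4
flattened = pooled_size * pooled_size * channels

# Dense layers: one weight per input per unit, plus one bias per unit
dense64_params = flattened * 64 + 64
dense2_params = 64 * 2 + 2
head_params = dense64_params + dense2_params
print(feature_size, flattened, head_params)
```

With the base layers frozen, only these head parameters are updated during training.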

* Lines 107-110 compile the model and lines 113-119 train the head of the network.

In [None]:
print("[INFO] compiling model...")
opt = Adam(lr=INIT_LR, decay=INIT_LR / EPOCHS)
model.compile(loss="binary_crossentropy", optimizer=opt,
	metrics=["accuracy"])

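# train the head of the network
# (recent TensorFlow releases deprecate fit_generator in favor of fit,
# and the lr argument in favor of learning_rate)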
print("[INFO] training head...")
H = model.fit_generator(
	trainAug.flow(trainX, trainY, batch_size=BS),
	steps_per_epoch=len(trainX) // BS,
	validation_data=(testX, testY),
	validation_steps=len(testX) // BS,
	epochs=EPOCHS)
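
As a rough check on `steps_per_epoch` and on the effect of `decay`, the sketch below works through the arithmetic. It assumes about 10,000 generated images (the exact count depends on your run), the 80/20 split above, and the time-based decay that the legacy Keras optimizers apply on every update.

```python
INIT_LR = 1e-3
EPOCHS = 100
BS = 128

# assumption: about 10,000 generated images, 80% of them for training
num_train = 10000 * 80 // 100

# integer division drops the final partial batch
steps_per_epoch = num_train // BS

# time-based decay: lr = INIT_LR / (1 + decay * number_of_updates)
decay = INIT_LR / EPOCHS
updates = steps_per_epoch * EPOCHS
final_lr = INIT_LR / (1.0 + decay * updates)
print(steps_per_epoch, final_lr)
```

Under these assumptions the learning rate shrinks only slightly over the whole run, so the schedule is gentle.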



* Lines 121-145 evaluate the model, printing the classification report and the confusion matrix.

In [None]:
# make predictions on the testing set
print("[INFO] evaluating network...")
predIdxs = model.predict(testX, batch_size=BS)

# for each image in the testing set we need to find the index of the
# label with corresponding largest predicted probability
predIdxs = np.argmax(predIdxs, axis=1)

# show a nicely formatted classification report
print(classification_report(testY.argmax(axis=1), predIdxs,
	target_names=lb.classes_))

# compute the confusion matrix and use it to derive the raw
# accuracy, sensitivity, and specificity
cm = confusion_matrix(testY.argmax(axis=1), predIdxs)
total = sum(sum(cm))
acc = (cm[0, 0] + cm[1, 1]) / total
sensitivity = cm[0, 0] / (cm[0, 0] + cm[0, 1])
specificity = cm[1, 1] / (cm[1, 0] + cm[1, 1])

# show the confusion matrix, accuracy, sensitivity, and specificity
print(cm)
print("acc: {:.4f}".format(acc))
print("sensitivity: {:.4f}".format(sensitivity))
print("specificity: {:.4f}".format(specificity))
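
To make the three formulas concrete, the sketch below applies them to the confusion matrix printed in the training log further down in this notebook. It uses plain Python lists instead of the NumPy array returned by `confusion_matrix`, so the indexing syntax differs slightly from the script.

```python
# confusion matrix from the training log
# (rows: true class, columns: predicted class; order: covid, normal)
cm = [[13, 1],
      [0, 14]]

total = sum(sum(row) for row in cm)
acc = (cm[0][0] + cm[1][1]) / total
# row 0 is the "covid" class: fraction of COVID cases that were caught
sensitivity = cm[0][0] / (cm[0][0] + cm[0][1])
# row 1 is the "normal" class: fraction of healthy cases that were cleared
specificity = cm[1][1] / (cm[1][0] + cm[1][1])

print("acc: {:.4f}".format(acc))
print("sensitivity: {:.4f}".format(sensitivity))
print("specificity: {:.4f}".format(specificity))
```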

* Finally, lines 148-163 plot the loss and accuracy graph and save the model to disk.

In [None]:
# plot the training loss and accuracy
N = EPOCHS
plt.style.use("ggplot")
plt.figure()
plt.plot(np.arange(0, N), H.history["loss"], label="train_loss")
plt.plot(np.arange(0, N), H.history["val_loss"], label="val_loss")
plt.plot(np.arange(0, N), H.history["accuracy"], label="train_acc")
plt.plot(np.arange(0, N), H.history["val_accuracy"], label="val_acc")
plt.title("Training Loss and Accuracy on COVID-19 Dataset")
plt.xlabel("Epoch #")
plt.ylabel("Loss/Accuracy")
plt.legend(loc="lower left")
plt.savefig(args["plot"])

# serialize the model to disk
print("[INFO] saving COVID-19 detector model...")
model.save(args["model"], save_format="h5")

* Run the following command:

In [None]:
$ python train_covid19.py --dataset generated_dataset

* This will create a deep learning model file, "covid19.model", in the same directory.

Let's discuss the accuracy of our model using the training log and the plotted graph.

In [None]:
Train for 79 steps, validate on 102 samples
Epoch 1/100
11/11 [==============================] - 219s 20s/step - loss: 0.8125 - accuracy: 0.4706 - val_loss: 0.4332 - val_accuracy: 0.7500
Epoch 2/100
11/11 [==============================] - 255s 23s/step - loss: 0.6254 - accuracy: 0.6765 - val_loss: 0.3841 - val_accuracy: 0.7500
Epoch 3/100
11/11 [==============================] - 399s 36s/step - loss: 0.5812 - accuracy: 0.6569 - val_loss: 0.3528 - val_accuracy: 0.9000
Epoch 4/100
11/11 [==============================] - 398s 36s/step - loss: 0.5092 - accuracy: 0.7353 - val_loss: 0.3150 - val_accuracy: 0.8500
Epoch 5/100
11/11 [==============================] - 395s 36s/step - loss: 0.4465 - accuracy: 0.8137 - val_loss: 0.2848 - val_accuracy: 0.9000
Epoch 6/100
11/11 [==============================] - 397s 36s/step - loss: 0.3994 - accuracy: 0.8824 - val_loss: 0.2566 - val_accuracy: 0.9000
Epoch 7/100
11/11 [==============================] - 403s 37s/step - loss: 0.3739 - accuracy: 0.8529 - val_loss: 0.2346 - val_accuracy: 0.9000
Epoch 8/100
.
.
.
11/11 [==============================] - 221s 20s/step - loss: 0.0959 - accuracy: 0.9804 - val_loss: 0.0429 - val_accuracy: 1.0000
Epoch 99/100
11/11 [==============================] - 204s 19s/step - loss: 0.0893 - accuracy: 0.9706 - val_loss: 0.0228 - val_accuracy: 1.0000
Epoch 100/100
11/11 [==============================] - 202s 18s/step - loss: 0.0723 - accuracy: 0.9706 - val_loss: 0.0278 - val_accuracy: 1.0000
[INFO] evaluating network...
              precision    recall  f1-score   support

       covid       1.00      0.93      0.96        14
      normal       0.93      1.00      0.97        14

    accuracy                           0.96        28
   macro avg       0.97      0.96      0.96        28
weighted avg       0.97      0.96      0.96        28

[[13  1]
 [ 0 14]]

In [None]:
from IPython.display import Image
Image(filename='/kaggle/input/graph-of-model/plot.png', width=800) 

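* Accuracy: 96.43%
* Sensitivity: 92.86%
* Specificity: 100.00%

**Accuracy**
96.43% accuracy means the model classified 27 of the 28 test images in the confusion matrix correctly (validation accuracy reached 100% in the final epochs), which suggests this model can be used for detection of COVID-19 from X-rays.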

**Sensitivity**
92.86% sensitivity means that, of the patients who do have COVID-19, our model correctly identified them as “COVID-19 positive” 92.86% of the time.

**Specificity**
100% specificity means that, of the patients who do not have COVID-19, our model correctly identified them as “COVID-19 negative” 100.00% of the time.

**NOTE**: Though this model is almost 100% accurate, a lot of work still needs to be done, as I did not use other parameters such as geo-location and travel history to detect COVID-19. Using only X-ray images is not enough. If you want to use this for further research, kindly contact me.

Download the source code and dataset [here](https://www.kaggle.com/nabeelsajid917/covid-19-x-ray-10000-images).