# **CT Scan Image Classification**

**Dataset Information:**
1. This dataset contains 1252 CT scans that are positive for SARS-CoV-2 infection (COVID-19) and 1230 CT scans from patients not infected with SARS-CoV-2, for 2482 CT scans in total.
2. The data were collected from real patients in hospitals in São Paulo, Brazil.
3. The aim of the dataset is to encourage research and development of artificial intelligence methods that can identify whether a person is infected with SARS-CoV-2 by analyzing their CT scans.

## **Importing Libraries**

In [1]:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import os
from PIL import Image
import tensorflow as tf
from tensorflow.keras.preprocessing.image import img_to_array
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from sklearn.model_selection import train_test_split
from tensorflow.keras.applications import ResNet50
from tensorflow.keras.models import Model, load_model
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, confusion_matrix

## **1: Data Preprocessing**

### **1.1 Loading the images**

In [2]:
data_dir = 'data'

covid_dir = os.path.join(data_dir, 'COVID')
non_covid_dir = os.path.join(data_dir, 'non-COVID')

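def load_images(directory):
    """Load every file in `directory` as a fully decoded PIL image."""
    images = []
    # Sorted so the image order is reproducible across runs and platforms
    for filename in sorted(os.listdir(directory)):
        img_path = os.path.join(directory, filename)
        # Image.open is lazy and keeps the file handle open; copying inside
        # the `with` block decodes the pixels and lets the file be closed.
        with Image.open(img_path) as img:
            images.append(img.copy())
    return images

covid_images = load_images(covid_dir)
non_covid_images = load_images(non_covid_dir)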
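
Before preprocessing, it can be worth checking that the number of files on disk matches the counts given in the dataset description (1252 and 1230). The helper below is a minimal sketch of such a check. The name `count_image_files` and the extension list are hypothetical and not part of the original notebook.

```python
import os

# Hypothetical set of extensions; adjust to whatever the dataset actually uses.
IMAGE_EXTENSIONS = ('.png', '.jpg', '.jpeg')

def count_image_files(directory, extensions=IMAGE_EXTENSIONS):
    """Count files in `directory` whose name ends with an image extension."""
    return sum(
        1
        for name in os.listdir(directory)
        if name.lower().endswith(extensions)
    )
```

For example, `count_image_files(covid_dir)` would be expected to return 1252 if the folder holds exactly the scans described above and nothing else.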

### **1.2 Resizing images and Normalizing image data**

In [3]:
# Fixed size of (224, 224) for resizing the images
image_size = (224, 224)

# Converting images to RGB mode and resizing them
covid_images = [img.convert('RGB').resize(image_size) for img in covid_images]
non_covid_images = [img.convert('RGB').resize(image_size) for img in non_covid_images]

# Converting the images to arrays and normalizing the pixel values
covid_data = np.array([img_to_array(img)/255.0 for img in covid_images])
non_covid_data = np.array([img_to_array(img)/255.0 for img in non_covid_images])

# covid_data and non_covid_data now contain the preprocessed image data as
# float32 arrays of shape (N, 224, 224, 3) with pixel values scaled to [0, 1].
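
To see what these steps do to a single image, the sketch below runs the same RGB conversion, resize, and scaling on a synthetic grayscale image. The synthetic image and its 300x400 size are assumptions for illustration only. `np.asarray` is used here in place of `img_to_array` so the example runs without TensorFlow; for an RGB image both should yield a float32 array in height, width, channels order.

```python
import numpy as np
from PIL import Image

image_size = (224, 224)

# Synthetic 8-bit grayscale image standing in for a CT slice (assumption:
# the real scans may differ in size and colour mode).
rng = np.random.default_rng(0)
pixels = rng.integers(0, 256, size=(300, 400), dtype=np.uint8)
fake_scan = Image.fromarray(pixels)

# Same steps as the cell above: convert to RGB, resize, scale to [0, 1].
rgb = fake_scan.convert('RGB').resize(image_size)
array = np.asarray(rgb, dtype=np.float32) / 255.0

print(array.shape)  # (224, 224, 3)
print(array.dtype, float(array.min()) >= 0.0, float(array.max()) <= 1.0)
```

Converting to RGB triples a grayscale channel rather than adding information. It is done because ImageNet-pretrained networks such as ResNet50 expect three input channels.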

## **2: Data Augmentation**

In [4]:
# Defining all data augmentation parameters
datagen = ImageDataGenerator(
    rotation_range=20,        # Random rotation (±20 degrees)
    width_shift_range=0.2,    # Random horizontal shift (±20% of image width)
    height_shift_range=0.2,   # Random vertical shift (±20% of image height)
    shear_range=0.2,          # Random shear
    zoom_range=0.2,           # Random zoom
    horizontal_flip=True      # Random horizontal flip
)

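# No call to datagen.fit() is needed here: fit() only computes statistics for
# featurewise_center, featurewise_std_normalization, and zca_whitening, none of
# which are enabled above. The random transforms are applied on the fly when
# batches are drawn from the generator (for example with datagen.flow).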
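
The augmentation itself happens when batches are drawn from the generator. The sketch below shows one way to do that with `ImageDataGenerator.flow`, using small random arrays as stand-ins for the CT data. The array sizes, labels, and batch size are assumptions for illustration, and the geometric transforms require SciPy to be installed.

```python
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Same augmentation settings as the cell above
datagen = ImageDataGenerator(
    rotation_range=20,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True,
)

# Random stand-in data (assumption): 16 small RGB images with binary labels
rng = np.random.default_rng(0)
x = rng.random((16, 32, 32, 3)).astype('float32')
y = np.array([0, 1] * 8)

# flow() yields batches in which each image has been randomly transformed
batches = datagen.flow(x, y, batch_size=8, shuffle=False, seed=0)
x_batch, y_batch = next(batches)

print(x_batch.shape, y_batch.shape)  # (8, 32, 32, 3) (8,)
```

Each call to `next(batches)` applies a fresh set of random transforms, so the model sees slightly different versions of the same images in every epoch. The labels are passed through unchanged.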