<a href="https://colab.research.google.com/github/mirajjara/DeepLearningResearch/blob/main/DeepLearningResearch.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Miraj Jara - Custom-Made Convolutional Neural Network

The purpose of this notebook is to build a CNN that classifies breast ultrasound images into three categories: benign, malignant, and normal. The network is trained on a dataset of labeled images and evaluated on its classification accuracy.

To start, we clone the gt-big-data/cancer-detection repository from GitHub, which contains the Breast Ultrasound Images (BUSI) dataset in the folder `Dataset_BUSI_with_GT`, and set the path to that dataset. Listing its subfolders with `os.listdir(dataset_path)` should return the three categories: benign, malignant, and normal.

In [None]:
!git clone https://github.com/gt-big-data/cancer-detection.git

Cloning into 'cancer-detection'...
remote: Enumerating objects: 28036, done.
remote: Counting objects: 100% (4817/4817), done.
remote: Compressing objects: 100% (3969/3969), done.
remote: Total 28036 (delta 774), reused 4797 (delta 771), pack-reused 23219
Receiving objects: 100% (28036/28036), 237.50 MiB | 13.16 MiB/s, done.
Resolving deltas: 100% (4719/4719), done.
Updating files: 100% (1583/1583), done.


In [None]:
import os
dataset_path = '/content/cancer-detection/Dataset_BUSI_with_GT'
os.listdir(dataset_path)  # lists the subfolders (benign, malignant, normal)

['benign', 'malignant', 'normal']
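`os.listdir` returns entries in an arbitrary order (alphabetical here, but not guaranteed) and would also include any stray files. If the folder names are later used as class labels, a minimal sketch like the following, assuming every subdirectory of the dataset path is one class, gives a stable name-to-index mapping. The helper name `class_indices` is hypothetical.

```python
import os


def class_indices(dataset_path):
    """Map each class subfolder name to an integer index.

    Assumption: every subdirectory of dataset_path is one class. Plain files
    (e.g. a stray read-me) are ignored, and sorting makes the mapping
    independent of the order in which os.listdir returns entries.
    """
    classes = sorted(
        name for name in os.listdir(dataset_path)
        if os.path.isdir(os.path.join(dataset_path, name))
    )
    return {name: index for index, name in enumerate(classes)}


# Usage with the path from the cell above:
# class_indices('/content/cancer-detection/Dataset_BUSI_with_GT')
```

With the three subfolders listed above, this yields benign=0, malignant=1, normal=2.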

## About the Dataset

Data collection: Breast ultrasound images from women aged 25-75, collected in 2018

Number of patients: 600 female patients

Number of images: 780 images, with an average size of 500x500 pixels, in PNG format

Ground truth images: Included with the original images

Classes: Images are categorized into three classes: normal, benign, and malignant

Citation: If you use this dataset, please cite the following paper:
Al-Dhabyani W, Gomaa M, Khaled H, Fahmy A. Dataset of breast ultrasound images. Data in Brief. 2020 Feb;28:104863. DOI: 10.1016/j.dib.2019.104863
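Because the ground-truth masks sit in the same class folders as the ultrasound images, any loader that walks these folders picks up both. A minimal sketch for separating them, assuming mask file names contain `_mask` (the convention we expect in this dataset, e.g. `benign (1)_mask.png` next to `benign (1).png`; check the folder contents before relying on it). The function name `split_images_and_masks` is hypothetical.

```python
import os

IMAGE_EXTENSIONS = ('.png', '.jpg', '.jpeg', '.tif', '.bmp')


def split_images_and_masks(class_dir):
    """Return (images, masks): sorted lists of file names in class_dir.

    Assumption: ground-truth masks contain '_mask' in their file name.
    Non-image files (such as a read-me) are skipped.
    """
    images, masks = [], []
    for name in sorted(os.listdir(class_dir)):
        if not name.lower().endswith(IMAGE_EXTENSIONS):
            continue  # skip read-me and other non-image files
        if '_mask' in name.lower():
            masks.append(name)
        else:
            images.append(name)
    return images, masks


# Usage: count original images (not masks) per class.
# for cls in ['benign', 'malignant', 'normal']:
#     images, masks = split_images_and_masks(os.path.join(dataset_path, cls))
#     print(cls, len(images), 'images,', len(masks), 'masks')
```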

# Data Preprocessing



## Resizing Images

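The following code resizes every image in the dataset to 224x224 pixels, the input size the model expects, and saves the results in `/content/resized_images`, keeping the benign, malignant, and normal subfolders. Because the ground-truth masks are stored alongside the ultrasound images, they are resized and copied as well; files without an image extension are skipped.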

In [None]:
import os
import cv2

# Define the dataset path and output path
dataset_path = '/content/cancer-detection/Dataset_BUSI_with_GT'
output_path = '/content/resized_images'

# Create the output path if it doesn't exist
os.makedirs(output_path, exist_ok=True)

# Define the resize dimensions (width, height) expected by the model
resize_dim = (224, 224)

# Loop through each subfolder (benign, malignant, normal)
for subfolder in os.listdir(dataset_path):
    # Create a matching subfolder in the output path
    subfolder_path = os.path.join(output_path, subfolder)
    os.makedirs(subfolder_path, exist_ok=True)

    # Loop through each image in the subfolder
    for image_file in os.listdir(os.path.join(dataset_path, subfolder)):
        # Check if the file has an image extension
        if image_file.lower().endswith(('.png', '.jpg', '.jpeg', '.tif', '.bmp', '.gif')):
            # Load the image
            img_path = os.path.join(dataset_path, subfolder, image_file)
            img = cv2.imread(img_path)

            # Check if the image was loaded correctly
            if img is not None:
                # Resize the image
                img = cv2.resize(img, resize_dim)

                # Save the resized image
                output_file = os.path.join(subfolder_path, image_file)
                cv2.imwrite(output_file, img)
            else:
                print(f"Error: Unable to read image {img_path}")
        else:
            print(f"Skipping non-image file: {image_file}")

Skipping non-image file: read-me
Skipping non-image file: read-me
Skipping non-image file: readme
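The model in the next section is compiled with `categorical_crossentropy`, which expects each label as a one-hot vector rather than an integer class index. A minimal NumPy sketch of that encoding and of the loss formula it feeds, assuming the alphabetical class order benign=0, malignant=1, normal=2. The helpers `one_hot` and `categorical_crossentropy` are hypothetical illustrations of the formula, not Keras's exact implementation; in practice `tf.keras.utils.to_categorical` performs the same encoding.

```python
import numpy as np


def one_hot(labels, num_classes=3):
    """Convert integer class indices (e.g. benign=0, malignant=1, normal=2)
    into one-hot rows, the label format categorical_crossentropy expects."""
    labels = np.asarray(labels, dtype=int)
    encoded = np.zeros((labels.size, num_classes), dtype=np.float32)
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded


def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean over samples of -sum(y_true * log(y_pred)) across classes."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)  # avoid log(0)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=1)))


y = one_hot([0, 2, 1])  # rows: benign, normal, malignant
probs = np.array([[0.8, 0.1, 0.1],   # example softmax outputs for three images
                  [0.2, 0.2, 0.6],
                  [0.3, 0.4, 0.3]])
# Only the probability of the true class contributes:
# the mean of -log(0.8), -log(0.6) and -log(0.4).
print(categorical_crossentropy(y, probs))
```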


# Creating the Model

The model consists of 10 layers: 3 convolutional layers, 3 max-pooling layers, 1 flatten layer, 1 dropout layer, and 2 dense layers (a 128-unit hidden layer and a 3-unit softmax output layer).
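The output shapes follow from two rules, assuming Keras's defaults of `padding='valid'` and stride 1 for `Conv2D`, and a stride equal to the pool size for `MaxPooling2D`: a 3x3 convolution shrinks each spatial side by 2, and 2x2 pooling halves it, rounding down. A small pure-Python sketch that traces the shapes layer by layer (the function names are hypothetical):

```python
def conv_out(size, kernel=3, stride=1):
    """Spatial size after a 'valid' (unpadded) convolution."""
    return (size - kernel) // stride + 1


def pool_out(size, pool=2):
    """Spatial size after 'valid' max pooling with stride == pool size."""
    return (size - pool) // pool + 1


def trace_shapes(size=224, filters=(32, 64, 128)):
    """Return (layer, height, width, channels) for each conv/pool stage."""
    shapes = []
    for f in filters:
        size = conv_out(size)
        shapes.append(('conv', size, size, f))
        size = pool_out(size)
        shapes.append(('pool', size, size, f))
    return shapes


for layer, h, w, c in trace_shapes():
    print(f'{layer:5s} (None, {h}, {w}, {c})')
# Flatten then receives 26 * 26 * 128 = 86,528 features.
```

The traced sizes (222, 111, 109, 54, 52, 26) match the summary printed below.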

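`model.summary()` prints each layer's output shape: the spatial size and number of feature maps for the convolutional and pooling layers, and the number of units for the dense layers. It also lists the number of parameters (Param #) per layer. The model has about 11.2 million parameters in total, almost all of them in the first dense layer, which connects the 26 x 26 x 128 = 86,528 flattened features to 128 units.
All parameters are trainable (none are frozen), so every weight is updated during training.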
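The parameter counts can be checked by hand: a `Conv2D` layer has kernel_h x kernel_w x in_channels x filters weights plus one bias per filter, and a `Dense` layer has inputs x units weights plus one bias per unit. A short sketch applying those formulas to the layer sizes in the model definition below (helper names are hypothetical):

```python
def conv_params(kernel, in_channels, filters):
    """Weights (kernel x kernel x in_channels x filters) plus one bias per filter."""
    return kernel * kernel * in_channels * filters + filters


def dense_params(inputs, units):
    """Weights (inputs x units) plus one bias per unit."""
    return inputs * units + units


params = {
    'conv2d (32)': conv_params(3, 3, 32),             # RGB input, 3 channels
    'conv2d (64)': conv_params(3, 32, 64),
    'conv2d (128)': conv_params(3, 64, 128),
    'dense (128)': dense_params(26 * 26 * 128, 128),  # flattened 26x26x128 maps
    'dense (3)': dense_params(128, 3),
}
# Pooling, Flatten and Dropout layers have no trainable parameters.
for name, count in params.items():
    print(f'{name:13s} {count:>12,}')
print(f'{"total":13s} {sum(params.values()):>12,}')
```

The convolutional counts (896, 18,496 and 73,856) match the Param # column of the summary below.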

In [None]:
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras import optimizers

# Define the CNN model architecture
model = keras.Sequential([
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(224, 224, 3)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(128, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(128, activation='relu'),
    layers.Dropout(0.2),
    layers.Dense(3, activation='softmax')  # 3 classes: benign, malignant, normal
])

# Compile the model
model.compile(
    optimizer=optimizers.Adam(learning_rate=0.001),
    loss='categorical_crossentropy',
    metrics=['accuracy']
)

# Print the model summary
model.summary()

Model: "sequential_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv2d_3 (Conv2D)           (None, 222, 222, 32)      896       
                                                                 
 max_pooling2d_3 (MaxPoolin  (None, 111, 111, 32)      0         
 g2D)                                                            
                                                                 
 conv2d_4 (Conv2D)           (None, 109, 109, 64)      18496     
                                                                 
 max_pooling2d_4 (MaxPoolin  (None, 54, 54, 64)        0         
 g2D)                                                            
                                                                 
 conv2d_5 (Conv2D)           (None, 52, 52, 128)       73856     
                                                                 
 max_pooling2d_5 (MaxPoolin  (None, 26, 26, 128)      