# **About This Notebook**
In this notebook, I start with an explanation of what you need to know about prostate cancer and its detection, then describe the dataset, and finally build a model to detect prostate cancer.
# **I) Medical Point Of View:** 
In this section, we will talk about the general medical knowledge that you need to know about prostate cancer.

**1) What is Prostate Cancer?**

The prostate is a walnut-sized gland located just below the bladder and in front of the rectum in men. It plays a crucial role in the reproductive system by producing a fluid that, along with sperm from the testicles, makes up semen. The prostate surrounds the urethra, the tube that carries urine from the bladder and semen from the reproductive system out through the penis.

Prostate cancer is a condition where cells in the prostate gland multiply uncontrollably, forming a tumor. In its early stages, it may not cause noticeable harm, but as it progresses, it can lead to problems like difficulty urinating and discomfort. Without intervention, it can potentially spread to other parts of the body, becoming more dangerous. 

**2) Prostate biopsy test:**

When a doctor detects an abnormality in your prostate after examining it with initial tests, they collect a sample of prostate tissue. A prostate biopsy is often done using a thin needle that is inserted into the prostate to collect tissue. The tissue sample is analyzed in a lab to determine whether cancer cells are present.

**3) GLEASON score:**

After a biopsy confirms cancer, the next step is to assess the aggressiveness (grade) of the cancer cells. A pathologist examines the sample to gauge how much the cancer cells differ from healthy cells. A higher grade indicates a more aggressive cancer that is more likely to spread quickly. The Gleason score, commonly used to grade prostate cancer, is the sum of two pattern grades (the most common pattern and the second most common pattern in the sample) and ranges from 2 (less aggressive) to 10 (highly aggressive), although the lower end of the range is rarely used.

**4) ISUP grade:**

According to current guidelines by the International Society of Urological Pathology (ISUP), the Gleason scores are summarized into an ISUP grade on a scale from 1 to 5 according to the following rule:

Gleason score 6 = ISUP grade 1 

Gleason score 7 (3 + 4) = ISUP grade 2 

Gleason score 7 (4 + 3) = ISUP grade 3 

Gleason score 8 = ISUP grade 4 

Gleason score 9-10 = ISUP grade 5 

If there is no cancer in the sample, we use the label ISUP grade 0 in this competition.
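
The rule above can be written as a small lookup function. This is a minimal sketch, not part of the original pipeline: the function name `gleason_to_isup` is hypothetical, and it assumes the Gleason score is given as a string such as `'3+4'`, with `'negative'` or `'0+0'` marking samples without cancer (check the values printed from `train.gleason_score` before relying on these spellings).

```python
def gleason_to_isup(gleason_score: str) -> int:
    """Map a Gleason score string such as '3+4' to an ISUP grade (0-5)."""
    # Assumed spellings for "no cancer"; verify against the actual labels
    if gleason_score in ("negative", "0+0"):
        return 0
    primary, secondary = (int(part) for part in gleason_score.split("+"))
    total = primary + secondary
    if total <= 6:
        return 1
    if total == 7:
        # 3+4 and 4+3 share the same sum but map to different grades
        return 2 if primary == 3 else 3
    if total == 8:
        return 4
    return 5  # Gleason score 9-10


for score in ["negative", "3+3", "3+4", "4+3", "4+4", "4+5"]:
    print(score, "->", gleason_to_isup(score))
```

Note that the order of the two numbers only matters for a total of 7, which is why the function keeps the primary pattern separate instead of working with the sum alone.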

**5) GLEASON score Samples**

[A] Benign prostate glands with folded epithelium: The cytoplasm is pale and the nuclei small and regular. The glands are grouped together.

[B] Prostatic adenocarcinoma: Gleason Pattern 3 has no loss of glandular differentiation. Small glands infiltrate between benign glands. The cytoplasm is often dark and the nuclei enlarged with dark chromatin and some prominent nucleoli. Each epithelial unit is separate and has a lumen.

[C] Prostatic adenocarcinoma: Gleason Pattern 4 has partial loss of glandular differentiation. There is an attempt to form lumina but the tumor fails to form complete, well-developed glands. This microphotograph shows irregular cribriform cancer, i.e. epithelial sheets with multiple lumina. There are also some poorly formed small glands and some fused glands. All of these are included in Gleason Pattern 4.

[D] Prostatic adenocarcinoma: Gleason Pattern 5 has an almost complete loss of glandular differentiation. Dispersed single cancer cells are seen in the stroma. Gleason Pattern 5 may also contain solid sheets or strands of cancer cells. All microphotographs show hematoxylin and eosin stains at 20x lens magnification.


In [None]:
import os
# There are two ways to load the data from the PANDA dataset:
# Option 1: Load images using openslide
import openslide
# Option 2: Load images using skimage (requires that tifffile is installed)
import skimage.io
import random
import seaborn as sns
import cv2

# General packages
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from PIL import ImageStat
# `Image` here is IPython's display class; importing PIL's `Image` as well would be shadowed by it
from IPython.display import Image, display

# Plotly for the interactive viewer (see last section)
import plotly.graph_objs as go
import tensorflow as tf


In [None]:
print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))

In [None]:
# Location of the training images

BASE_PATH = '../input/prostate-cancer-grade-assessment'

# image and mask directories
data_dir = f'{BASE_PATH}/train_images'
mask_dir = f'{BASE_PATH}/train_label_masks'


# Location of training labels
train = pd.read_csv(f'{BASE_PATH}/train.csv').set_index('image_id')
test = pd.read_csv(f'{BASE_PATH}/test.csv')
submission = pd.read_csv(f'{BASE_PATH}/sample_submission.csv')

In [None]:
display(train.head())
print("Shape of training data :", train.shape)
print("unique data provider :", len(train.data_provider.unique()))
print("unique isup_grade(target) :", len(train.isup_grade.unique()))
print("unique gleason_score :", len(train.gleason_score.unique()))
print("unique gleason_score :", train.gleason_score.unique())
print("unique isup_grade(target) :", train.isup_grade.unique())

# Count the number of images per 'isup_grade' (class)
images_per_isup_grade = train['isup_grade'].value_counts()
print("\nNumber of images per ISUP grade (class):")
print(images_per_isup_grade)

# Optionally, if you also want to count images per 'gleason_score'
images_per_gleason_score = train['gleason_score'].value_counts()
print("\nNumber of images per Gleason score:")
print(images_per_gleason_score)


In [None]:
# Display images from an array of IDs
def display_images(slides): 
    f, ax = plt.subplots(5,3, figsize=(18,22))
    for i, slide in enumerate(slides):
        image = openslide.OpenSlide(os.path.join(data_dir, f'{slide}.tiff'))
        spacing = 1 / (float(image.properties['tiff.XResolution']) / 10000)
        patch = image.read_region((1780,1950), 0, (256, 256))
        ax[i//3, i%3].imshow(patch) 
        image.close()       
        ax[i//3, i%3].axis('off')
        
        image_id = slide
        data_provider = train.loc[slide, 'data_provider']
        isup_grade = train.loc[slide, 'isup_grade']
        gleason_score = train.loc[slide, 'gleason_score']
        ax[i//3, i%3].set_title(f"ID: {image_id}\nSource: {data_provider} ISUP: {isup_grade} Gleason: {gleason_score}")

    plt.show() 

**6) Quickly displaying a few images**

In the following sections we will load data from the slides with OpenSlide. The benefit of OpenSlide is that we can load arbitrary regions of the slide, without loading the whole image in memory. Want to interactively view a slide? We have added an interactive viewer to this notebook in the last section.

You can read more about the OpenSlide python bindings in the documentation: https://openslide.org/api/python/
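
Because OpenSlide reads regions by their top-left corner in level-0 coordinates, a slide can be covered patch by patch without loading it whole. The sketch below only generates those corner coordinates; the helper name `patch_origins` and its `stride` parameter are hypothetical and not part of this notebook. Each coordinate could then be passed to `read_region((x, y), 0, (256, 256))`, the same call used in `display_images`.

```python
def patch_origins(slide_width, slide_height, patch=256, stride=256):
    """Yield (x, y) top-left corners of patches that fit fully inside the slide."""
    for y in range(0, slide_height - patch + 1, stride):
        for x in range(0, slide_width - patch + 1, stride):
            yield (x, y)


# Example with made-up slide dimensions (real values come from `image.dimensions`)
origins = list(patch_origins(1000, 600))
print(len(origins), origins[0], origins[-1])
```

With a stride equal to the patch size the patches do not overlap; a smaller stride would give overlapping patches at the cost of more reads.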



In [None]:
images = [
    '07a7ef0ba3bb0d6564a73f4f3e1c2293',
    '037504061b9fba71ef6e24c48c6df44d',
    '035b1edd3d1aeeffc77ce5d248a01a53',
    '059cbf902c5e42972587c8d17d49efed',
    '06a0cbd8fd6320ef1aa6f19342af2e68',
    '06eda4a6faca84e84a781fee2d5f47e1',
    '0a4b7a7499ed55c71033cefb0765e93d',
    '0838c82917cd9af681df249264d2769c',
    '046b35ae95374bfb48cdca8d7c83233f',
    '074c3e01525681a275a42282cd21cbde',
    '05abe25c883d508ecc15b6e857e59f32',
    '05f4e9415af9fdabc19109c980daf5ad',
    '060121a06476ef401d8a21d6567dee6d',
    '068b0e3be4c35ea983f77accf8351cc8',
    '08f055372c7b8a7e1df97c6586542ac8'
]

display_images(images)

# **II) Building the model:** 

**1) Collecting data**
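
The collection step in this section skips patches whose mean brightness is above a threshold of 240. A mean can hide a small amount of tissue on a white background, so one possible alternative is to measure the fraction of non-white pixels directly. This is a minimal sketch of that alternative, not the method used in the cells of this section; the function name `tissue_fraction` and the choice of threshold are assumptions.

```python
import numpy as np


def tissue_fraction(rgb, white_threshold=240):
    """Return the fraction of pixels that are not near-white background."""
    rgb = np.asarray(rgb)
    # A pixel counts as background only when all three channels are near white
    background = (rgb >= white_threshold).all(axis=-1)
    return float(1.0 - background.mean())


# Synthetic 4x4 patch: top half stained tissue, bottom half white background
patch = np.full((4, 4, 3), 255, dtype=np.uint8)
patch[:2] = (180, 90, 150)
print(tissue_fraction(patch))
```

A patch could then be kept when its tissue fraction exceeds some minimum, for example 0.5, which would need tuning on real slides.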

In [None]:
def is_blank(patch, threshold=240):
    """Check if the patch is mostly blank (white)"""
    stat = ImageStat.Stat(patch)
    mean_brightness = sum(stat.mean) / 3  # Calculate mean brightness of the patch
    return mean_brightness > threshold

In [None]:
# Uses the `train` DataFrame loaded above (indexed by image_id)

collected_data_output_dir = './collected_images'

low_tiss_slides = [ '033e39459301e97e457232780a314ab7',
                    '0b6e34bf65ee0810c1a4bf702b667c88',
                    '3385a0f7f4f3e7e7b380325582b115c9',
                    '3790f55cad63053e956fb73027179707',
                    '5204134e82ce75b1109cc1913d81abc6',
                    'a08e24cff451d628df797efc4343e13c',
                    '379ZxmoZZwyfLZnABDtMdh3nwjqps6fov7'
                  ]
white_patches = 0
width, height = 256, 256  # Patch size
increase_factor = 2  # Factor to increase shift distance

for index, row in train.iterrows():
    if index in low_tiss_slides:
        continue
    slide = index
    image_path = os.path.join(data_dir, f'{slide}.tiff')
    image = openslide.OpenSlide(image_path)
    
    # Slide dimensions
    slide_width, slide_height = image.dimensions
    
    # Initial coordinates for the patch and initial shift
    x, y = 1780, 1950
    shift_x, shift_y = 100, 100  # Initial shift values
    
    patch = image.read_region((x, y), 0, (width, height)).convert("RGB")
    
    attempts = 0  # Number of shifts tried so far (informational only)

    # Shift the window until a non-blank patch is found; the shift-size check below ends the search
    while is_blank(patch):
        x += shift_x
        y += shift_y
        
        # Ensure x and y do not exceed slide dimensions
        if x + width > slide_width or y + height > slide_height:
            x = max(0, min(slide_width - width, x))  # Adjust x within bounds
            y = max(0, min(slide_height - height, y))  # Adjust y within bounds
            shift_x *= increase_factor  # Increase shift size
            shift_y *= increase_factor  # Increase shift size
            
        # Extract a new patch with the adjusted coordinates
        patch = image.read_region((x, y), 0, (width, height)).convert("RGB")
        
        # If we've adjusted the shift beyond a reasonable limit, stop trying for this slide
        if shift_x >= slide_width or shift_y >= slide_height:
            break
        
        attempts += 1
    
    if is_blank(patch):  # If still blank after adjustments, skip this slide
        white_patches += 1
        image.close()  # Release the slide handle before skipping
        continue
    
    # Define the directory based on the ISUP grade
    isup = train.loc[slide, 'isup_grade']
    label_dir = os.path.join(collected_data_output_dir, f'isup_{isup}')
    os.makedirs(label_dir, exist_ok=True)
    
    # Save the extracted patch
    patch.save(os.path.join(label_dir, f'{slide}.png'))
    image.close()
    
print(f"No suitable patch found for {white_patches} slides")

**2) Loading Data**
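
The datasets in this section are loaded with `label_mode='categorical'`, which means each label is a one-hot vector rather than a single integer. The sketch below shows what that encoding looks like for six classes, assuming the class index matches the ISUP grade (the loader orders the `isup_0` to `isup_5` subdirectories alphanumerically). The helper `one_hot` is hypothetical and only for illustration.

```python
import numpy as np


def one_hot(labels, num_classes):
    """Encode integer class labels as one-hot rows."""
    labels = np.asarray(labels)
    # Row i of the identity matrix is the one-hot vector for class i
    return np.eye(num_classes, dtype=np.float32)[labels]


encoded = one_hot([0, 3, 5], num_classes=6)
print(encoded)
```

This is the format that `CategoricalCrossentropy` and `CategoricalAccuracy` expect as targets.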

In [None]:
# Global declarations for data
batch_size = 32
image_size = (256, 256)
seed = 123  # Seed for reproducibility

# Load the training dataset
train_dataset = tf.keras.utils.image_dataset_from_directory(
    directory=collected_data_output_dir,
    validation_split=0.2,  # Use 20% of the images as the test set
    subset="training",
    seed=seed,
    image_size=image_size,
    batch_size=batch_size,
    label_mode='categorical'  # Use 'categorical' for multi-class classification
)

# Load the held-out dataset (used as validation data during training)
test_dataset = tf.keras.utils.image_dataset_from_directory(
    directory=collected_data_output_dir,
    validation_split=0.2,
    subset="validation",
    seed=seed,
    image_size=image_size,
    batch_size=batch_size,
    label_mode='categorical'
)

# The datasets are now ready for training and evaluation


**3) Creating Block functions**

In [None]:
def conv_block(filters, kernel_size, activation='relu', pool_size=(2, 2), use_batchnorm=True):
    block = tf.keras.Sequential()
    block.add(tf.keras.layers.Conv2D(filters, kernel_size, padding='same', use_bias=not use_batchnorm))
    if use_batchnorm:
        block.add(tf.keras.layers.BatchNormalization())
    block.add(tf.keras.layers.Activation(activation))
    block.add(tf.keras.layers.MaxPooling2D(pool_size=pool_size))
    return block


In [None]:
def dense_block(units, activation='relu', dropout_rate=None, use_batchnorm=True):
    block = tf.keras.Sequential()
    block.add(tf.keras.layers.Dense(units, use_bias=not use_batchnorm))
    if use_batchnorm:
        block.add(tf.keras.layers.BatchNormalization())
    block.add(tf.keras.layers.Activation(activation))
    if dropout_rate:
        block.add(tf.keras.layers.Dropout(dropout_rate))
    return block

**4) Assembling the model**
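
The model in this section stacks four convolutional blocks, each ending in a 2x2 max pooling layer. Since the convolutions use `padding='same'`, only the pooling changes the spatial size, so the feature map halves at every block. The sketch below traces those shapes in plain Python; the helper `feature_map_shapes` is hypothetical and does not build any layers.

```python
def feature_map_shapes(input_size=256, filters=(256, 128, 64, 32), pool=2):
    """Trace (height, width, channels) after each conv block."""
    shapes = []
    size = input_size
    for n_filters in filters:
        # 'same' convolution keeps the size; the pooling step divides it by `pool`
        size = size // pool
        shapes.append((size, size, n_filters))
    return shapes


shapes = feature_map_shapes()
print(shapes)
```

After the last block the feature map is 16 x 16 x 32, and global average pooling reduces it to a vector of 32 values that feeds the dense block.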

In [None]:
num_classes = 6

model = tf.keras.Sequential([
    # Add convolutional blocks
    tf.keras.Input(shape=(image_size[0],image_size[1], 3)),
    conv_block(256, (3, 3)),
    conv_block(128, (3, 3)),
    conv_block(64, (3, 3)),
    conv_block(32, (3, 3)),
    
    
    tf.keras.layers.GlobalAveragePooling2D(),
    
    # Add dense blocks
    dense_block(64, dropout_rate=0),
    
    # Output layer
    tf.keras.layers.Dense(num_classes, activation='softmax')
])

learning_rate = 0.001
optimizer = tf.keras.optimizers.Adam(learning_rate=learning_rate)

# Compile the model
model.compile(optimizer=optimizer,
              loss=tf.keras.losses.CategoricalCrossentropy(),
              metrics=[tf.keras.metrics.CategoricalAccuracy(), tf.keras.metrics.Precision(),
                       tf.keras.metrics.Recall()])

for x, y in train_dataset.take(1):
    print(x.shape, y.shape)


**5) Training and evaluation**
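
Accuracy treats every mistake the same, but ISUP grades are ordered: predicting grade 4 for a grade 5 slide is a smaller error than predicting grade 0. The PANDA challenge was scored with the quadratic weighted kappa, which penalizes predictions by the squared distance between grades. The following is a minimal NumPy sketch of that metric, not part of the original training code; it assumes integer grade labels, which for this model would come from taking the argmax of the one-hot labels and of the softmax outputs.

```python
import numpy as np


def quadratic_weighted_kappa(y_true, y_pred, num_classes=6):
    """Agreement between two gradings, penalizing squared grade distance."""
    y_true = np.asarray(y_true, dtype=int)
    y_pred = np.asarray(y_pred, dtype=int)

    # Observed confusion matrix: rows are true grades, columns are predictions
    observed = np.zeros((num_classes, num_classes), dtype=float)
    for t, p in zip(y_true, y_pred):
        observed[t, p] += 1

    # Quadratic penalty, scaled to 1 for the largest possible distance
    grades = np.arange(num_classes)
    weights = (grades[:, None] - grades[None, :]) ** 2 / (num_classes - 1) ** 2

    # Confusion matrix expected if true and predicted grades were independent
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / observed.sum()

    return 1.0 - (weights * observed).sum() / (weights * expected).sum()


kappa_perfect = quadratic_weighted_kappa([0, 1, 2, 3, 4, 5], [0, 1, 2, 3, 4, 5])
kappa_reversed = quadratic_weighted_kappa([0, 1, 2, 3, 4, 5], [5, 4, 3, 2, 1, 0])
print(kappa_perfect, kappa_reversed)
```

A value of 1 means perfect agreement and values near 0 mean agreement no better than chance. The sketch does not guard against the degenerate case where all labels and all predictions fall in a single class, which would divide by zero.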

In [None]:
history = model.fit(train_dataset, validation_data=test_dataset, epochs=15, verbose=1)

In [None]:
acc = history.history['categorical_accuracy']
val_acc = history.history['val_categorical_accuracy']

loss = history.history['loss']
val_loss = history.history['val_loss']

epochs_range = range(len(acc))  # One point per trained epoch

plt.figure(figsize=(8, 8))
plt.subplot(1, 2, 1)
plt.plot(epochs_range, acc, label='Training Accuracy')
plt.plot(epochs_range, val_acc, label='Validation Accuracy')
plt.legend(loc='lower right')
plt.title('Training and Validation Accuracy')

plt.subplot(1, 2, 2)
plt.plot(epochs_range, loss, label='Training Loss')
plt.plot(epochs_range, val_loss, label='Validation Loss')
plt.legend(loc='upper right')
plt.title('Training and Validation Loss')
plt.show()