
In [None]:

import kagglehub
paultimothymooney_chest_xray_pneumonia_path = kagglehub.dataset_download('paultimothymooney/chest-xray-pneumonia')

print('Data source import complete.')


# 🫁 Pneumonia Detection using DenseNet121
## Transfer Learning Approach on Chest X-Ray Images

### Introduction
In this notebook, we will build a Deep Learning model to classify Chest X-Ray images into two categories: **Normal** and **Pneumonia**.

We will use **DenseNet121**, a powerful Convolutional Neural Network (CNN) architecture pre-trained on the ImageNet dataset. Transfer learning allows us to leverage features learned from millions of images to achieve high accuracy on our specific medical dataset.

### Dataset
The dataset is organized into 3 folders (train, test, val) and contains subfolders for each image category (Pneumonia/Normal).

### Strategy
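1. **Data Augmentation:** Random transformations of the training images to reduce overfitting. Augmentation alone does not rebalance the classes, so class imbalance still needs separate handling (for example, class weights).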
2. **Transfer Learning:** Using DenseNet121 as the feature extractor.
3. **Metrics:** Focusing on **Recall** (Sensitivity) as it is crucial in medical diagnosis to minimize false negatives.
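
To make the focus on Recall concrete, here is a minimal sketch of how recall and precision follow from confusion-matrix counts. The counts are hypothetical and chosen only for illustration; they are not results from this notebook.

```python
def recall(tp, fn):
    """Recall (sensitivity): share of actual positives that were detected."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def precision(tp, fp):
    """Precision: share of predicted positives that were actually positive."""
    return tp / (tp + fp) if (tp + fp) else 0.0


# Hypothetical counts (not from this notebook):
# 90 pneumonia cases caught, 10 missed, 30 healthy patients flagged by mistake.
tp, fn, fp = 90, 10, 30

print(f"Recall:    {recall(tp, fn):.2f}")     # 0.90
print(f"Precision: {precision(tp, fp):.2f}")  # 0.75
```

Each missed pneumonia case is a false negative, so it lowers recall directly. This is why recall is the metric to watch here.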

In [None]:
import os
import random
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.applications import DenseNet121
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D, Dropout, BatchNormalization
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import ModelCheckpoint, EarlyStopping, ReduceLROnPlateau
from sklearn.metrics import confusion_matrix, classification_report

# Set random seeds for reproducibility (Python, NumPy, TensorFlow)
random.seed(42)
np.random.seed(42)
tf.random.set_seed(42)

# Hyperparameters
IMG_SIZE = 224
BATCH_SIZE = 32
EPOCHS = 20
LEARNING_RATE = 0.001

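# Dataset paths, built from the kagglehub download path returned in the first cell.
# On Kaggle this is expected to resolve under /kaggle/input/chest-xray-pneumonia.
BASE_DIR = os.path.join(paultimothymooney_chest_xray_pneumonia_path, 'chest_xray')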
TRAIN_DIR = os.path.join(BASE_DIR, 'train')
TEST_DIR = os.path.join(BASE_DIR, 'test')
VAL_DIR = os.path.join(BASE_DIR, 'val')

print("✅ Libraries imported and configuration set.")

## 1. Data Loading & Augmentation
We will use `ImageDataGenerator` to load images.
* **Training Set:** We apply data augmentation (rotation, zoom, shifts, horizontal flips) to make the model robust.
* **Test/Val Set:** We only rescale the pixel values (no augmentation).

In [None]:
# Data Augmentation for Training
train_datagen = ImageDataGenerator(
    rescale=1./255,
    rotation_range=20,
    zoom_range=0.2,
    width_shift_range=0.1,
    height_shift_range=0.1,
    horizontal_flip=True,
    fill_mode='nearest'
)

# Only Rescaling for Test
test_datagen = ImageDataGenerator(rescale=1./255)

# Load Data
print("Loading Training Data...")
train_generator = train_datagen.flow_from_directory(
    TRAIN_DIR,
    target_size=(IMG_SIZE, IMG_SIZE),
    batch_size=BATCH_SIZE,
    class_mode='binary',
    shuffle=True
)

print("Loading Test Data...")
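# Note: We use the test set for validation during training because the original 'val' folder is too small (16 images).
# Caveat: checkpointing and early stopping therefore see the test data, so the final test metrics are optimistic.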
test_generator = test_datagen.flow_from_directory(
    TEST_DIR,
    target_size=(IMG_SIZE, IMG_SIZE),
    batch_size=BATCH_SIZE,
    class_mode='binary',
    shuffle=False # Important for Confusion Matrix later
)

print(f"\nClass Indices: {train_generator.class_indices}")
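
Since the strategy mentions class imbalance, here is a minimal sketch of the common "balanced" class-weight heuristic, `n_samples / (n_classes * class_count)`. The counts below are illustrative placeholders, not the real dataset counts. In this notebook they could be taken from `np.bincount(train_generator.classes)`.

```python
def balanced_class_weights(counts):
    """Map each class label to n_samples / (n_classes * class_count)."""
    total = sum(counts.values())
    n_classes = len(counts)
    return {label: total / (n_classes * n) for label, n in counts.items()}


# Illustrative placeholder counts: class 0 (Normal) is the minority class.
counts = {0: 1000, 1: 3000}
class_weights = balanced_class_weights(counts)

for label, weight in class_weights.items():
    print(f"class {label}: weight {weight:.3f}")
```

The minority class gets the larger weight, so its errors count more in the loss. Keras `model.fit` has a `class_weight` argument that takes such a dictionary, but this notebook does not use it, so treat it as an optional extension.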

## 2. Model Construction (DenseNet121)
We will load the **DenseNet121** model with weights pre-trained on ImageNet.
* **Freeze Base Layers:** We freeze the convolutional base to keep the learned features.
* **Custom Head:** We add a Global Average Pooling layer, Batch Normalization, Dropout (0.5), a 128-unit Dense layer with ReLU, a second Dropout (0.3), and a final single-unit Dense layer with Sigmoid activation for binary classification.

In [None]:
def build_model():
    # Load pre-trained DenseNet121
    base_model = DenseNet121(weights='imagenet', include_top=False, input_shape=(IMG_SIZE, IMG_SIZE, 3))

    # Freeze the base model
    base_model.trainable = False

    # Add custom classification head
    x = base_model.output
    x = GlobalAveragePooling2D()(x)
    x = BatchNormalization()(x)
    x = Dropout(0.5)(x)
    x = Dense(128, activation='relu')(x)
    x = Dropout(0.3)(x)
    outputs = Dense(1, activation='sigmoid')(x)

    model = Model(inputs=base_model.input, outputs=outputs)

    # Compile the model
    # We use Recall as a key metric because False Negatives are dangerous in medicine.
    model.compile(optimizer=Adam(learning_rate=LEARNING_RATE),
                  loss='binary_crossentropy',
                  metrics=['accuracy', tf.keras.metrics.Recall(name="recall"), tf.keras.metrics.Precision(name="precision")])

    return model

model = build_model()
model.summary()
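
To illustrate what the final Sigmoid unit and the `binary_crossentropy` loss compute, here is a small plain-Python sketch. It is not the Keras implementation. The clipping constant `eps` and the example logits are assumptions chosen for the demonstration.

```python
import math


def sigmoid(z):
    """Squash a raw score (logit) into a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def binary_cross_entropy(y_true, p, eps=1e-7):
    """Loss for one sample; p is the predicted probability of class 1."""
    p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1.0 - p))


# Example logits for a sample whose true label is 1 (Pneumonia).
for logit in (-2.0, 0.0, 2.0):
    p = sigmoid(logit)
    loss = binary_cross_entropy(1, p)
    print(f"logit={logit:+.1f}  p(pneumonia)={p:.3f}  loss={loss:.3f}")
```

The loss shrinks as the predicted probability moves toward the true label, and a confident wrong prediction is penalized most.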

## 3. Training the Model
We use callbacks to ensure the best training performance:
* **ModelCheckpoint:** Saves the best model based on validation loss.
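* **EarlyStopping:** Stops training when validation loss has not improved for 5 epochs and restores the best weights.
* **ReduceLROnPlateau:** Multiplies the learning rate by 0.2 when validation loss has not improved for 2 epochs, down to a floor of 1e-6.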

In [None]:
# Callbacks
checkpoint = ModelCheckpoint(
    'best_densenet_model.keras',
    monitor='val_loss',
    save_best_only=True,
    mode='min',
    verbose=1
)

early_stopping = EarlyStopping(
    monitor='val_loss',
    patience=5,
    restore_best_weights=True,
    verbose=1
)

reduce_lr = ReduceLROnPlateau(
    monitor='val_loss',
    factor=0.2,
    patience=2,
    min_lr=1e-6,
    verbose=1
)

# Start Training
history = model.fit(
    train_generator,
    # Steps are inferred from the generators' lengths, so the final partial batch is not dropped
    epochs=EPOCHS,
    validation_data=test_generator,
    callbacks=[checkpoint, early_stopping, reduce_lr]
)
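
As a worked example of the `ReduceLROnPlateau` settings above, this sketch lists the learning rate after each successive reduction, using the rule `new_lr = max(old_lr * factor, min_lr)`. The helper name `reduce_lr_schedule` is hypothetical. How many reductions actually occur depends on how often validation loss stalls.

```python
def reduce_lr_schedule(initial_lr, factor, min_lr, n_reductions):
    """Learning rates after 0..n_reductions plateau-triggered reductions."""
    lrs = [initial_lr]
    for _ in range(n_reductions):
        lrs.append(max(lrs[-1] * factor, min_lr))
    return lrs


# Values from this notebook: LEARNING_RATE=0.001, factor=0.2, min_lr=1e-6
schedule = reduce_lr_schedule(1e-3, 0.2, 1e-6, 5)

for i, lr in enumerate(schedule):
    print(f"after {i} reduction(s): lr = {lr:.2e}")
```

With these settings the fifth reduction would fall below `min_lr`, so the rate is clamped at 1e-6.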

## 4. Evaluation & Visualization
Let's visualize the training curves (Accuracy and Loss) and generate a Confusion Matrix to see how well the model distinguishes between Normal and Pneumonia cases.

In [None]:
# Plot Training History
fig, ax = plt.subplots(1, 2, figsize=(15, 5))

# Accuracy
ax[0].plot(history.history['accuracy'], label='Train Accuracy')
ax[0].plot(history.history['val_accuracy'], label='Val Accuracy')
ax[0].set_title('Model Accuracy')
ax[0].set_xlabel('Epoch')
ax[0].set_ylabel('Accuracy')
ax[0].legend()

# Loss
ax[1].plot(history.history['loss'], label='Train Loss')
ax[1].plot(history.history['val_loss'], label='Val Loss')
ax[1].set_title('Model Loss')
ax[1].set_xlabel('Epoch')
ax[1].set_ylabel('Loss')
ax[1].legend()

plt.show()

In [None]:
# Generate Predictions
print("Generating predictions on test set...")
predictions = model.predict(test_generator)
y_pred = (predictions > 0.5).astype(int).reshape(-1)
y_true = test_generator.classes

# Confusion Matrix
cm = confusion_matrix(y_true, y_pred)
plt.figure(figsize=(8, 6))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', xticklabels=['Normal', 'Pneumonia'], yticklabels=['Normal', 'Pneumonia'])
plt.xlabel('Predicted')
plt.ylabel('Actual')
plt.title('Confusion Matrix')
plt.show()

# Classification Report
print(classification_report(y_true, y_pred, target_names=['Normal', 'Pneumonia']))
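
The 0.5 cut-off used above is a choice, not a requirement. The sketch below uses made-up probabilities (not model outputs) to show how lowering the threshold trades more false positives for higher recall. The helper name `recall_and_false_positives` is hypothetical.

```python
def recall_and_false_positives(y_true, probs, threshold):
    """Return (recall, false-positive count) when predicting 1 if p > threshold."""
    tp = sum(1 for y, p in zip(y_true, probs) if y == 1 and p > threshold)
    fn = sum(1 for y, p in zip(y_true, probs) if y == 1 and p <= threshold)
    fp = sum(1 for y, p in zip(y_true, probs) if y == 0 and p > threshold)
    rec = tp / (tp + fn) if (tp + fn) else 0.0
    return rec, fp


# Made-up example: 4 pneumonia cases (1) and 4 normal cases (0).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
probs = [0.95, 0.70, 0.45, 0.30, 0.60, 0.40, 0.20, 0.05]

for threshold in (0.5, 0.25):
    rec, fp = recall_and_false_positives(y_true, probs, threshold)
    print(f"threshold={threshold}: recall={rec:.2f}, false positives={fp}")
```

In this toy data, lowering the threshold from 0.5 to 0.25 raises recall from 0.50 to 1.00 and doubles the false positives from 1 to 2.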

### Conclusion
The classification report above shows the accuracy the model achieved on the test set.
* Look at the **Recall** for the "Pneumonia" class. A high recall means the model detected most of the pneumonia cases, i.e. it produced few false negatives.
* **Next Steps:** To improve accuracy further, we could perform **Fine-Tuning** (unfreezing the last block of DenseNet121 and training with a very low learning rate).
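
A rough sketch of the unfreezing step for fine-tuning follows. It assumes the last dense block's layers have names starting with `conv5`, which should be checked against `model.summary()`. The helper `unfreeze_by_prefix` is hypothetical and only needs an object with a `.layers` list, so it is demonstrated here on stand-in objects rather than the real model.

```python
from types import SimpleNamespace


def unfreeze_by_prefix(model, prefix="conv5"):
    """Set trainable=True on layers whose name starts with prefix; return the count."""
    changed = 0
    for layer in model.layers:
        if layer.name.startswith(prefix):
            layer.trainable = True
            changed += 1
    return changed


# Stand-in objects that mimic frozen layers (names are assumed, not verified here).
demo_model = SimpleNamespace(layers=[
    SimpleNamespace(name=name, trainable=False)
    for name in ("conv4_block24_concat", "conv5_block1_0_bn", "conv5_block16_2_conv")
])

print(unfreeze_by_prefix(demo_model))  # 2
print([(l.name, l.trainable) for l in demo_model.layers])
```

On the real model, call `unfreeze_by_prefix(model)` and then `model.compile(...)` again with a much smaller learning rate (for example `Adam(learning_rate=1e-5)`), because changes to `trainable` only take effect after recompiling.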