### Rationale Part 1: Dataset Preparation for Multi-Class Classification
- **Reason**: To perform multi-class classification, I built a dataset of labeled images from three classes: **normal**, **bacterial pneumonia**, and **viral pneumonia**.

In [None]:
import tensorflow as tf
import random

# Step 1: Prepare Data for Multi-Class Classification
# Separate the paths into normal, bacterial pneumonia and viral pneumonia lists
# (`image_paths` and `pneumonia_image_paths` come from an earlier cell)
normal_image_paths = [path for path in image_paths if 'normal' in path.lower()]
bacterial_image_paths = [path for path in pneumonia_image_paths if 'bacteria' in path.lower()]
viral_image_paths = [path for path in pneumonia_image_paths if 'virus' in path.lower()]



Here I divide the images into three categories based on keywords in their file paths:
- 'normal': Images that do not show signs of pneumonia.
- 'bacteria': Images with bacterial pneumonia.
- 'virus': Images with viral pneumonia.

This separation is necessary for building a multi-class classification model, and each category represents a unique label for classification.
By filtering the image paths based on keywords, separate datasets for each class can be prepared, which will help in assigning distinct labels for training.
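The keyword filter can be checked on a few file names. This is a minimal sketch: the paths below are hypothetical, and it only assumes that bacterial and viral files contain `bacteria` / `virus` in their path and that normal files sit under a folder whose name contains `normal`. `split_by_class` is an illustrative helper name.

```python
def split_by_class(image_paths, pneumonia_image_paths):
    """Split file paths into normal / bacterial / viral lists by keyword."""
    normal = [p for p in image_paths if 'normal' in p.lower()]
    bacterial = [p for p in pneumonia_image_paths if 'bacteria' in p.lower()]
    viral = [p for p in pneumonia_image_paths if 'virus' in p.lower()]
    return normal, bacterial, viral


# Hypothetical paths, for illustration only
all_paths = [
    "train/NORMAL/IM-0115-0001.jpeg",
    "train/PNEUMONIA/person1_bacteria_1.jpeg",
    "train/PNEUMONIA/person1_virus_6.jpeg",
]
pneumonia_paths = [p for p in all_paths if 'pneumonia' in p.lower()]

normal, bacterial, viral = split_by_class(all_paths, pneumonia_paths)
print(len(normal), len(bacterial), len(viral))  # 1 1 1
```

Substring matching is deliberately simple; note that a folder named, for example, `abnormal` would also match `'normal'`, so the keywords rely on the dataset's naming being consistent.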



In [None]:
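# Step 2: Sampling the Data to Reduce Dataset Size
# - Sample 40% from each category to reduce dataset size, which helps manage computational cost.
# - Taking the same fraction from every class keeps the original class proportions
#   (the subset is smaller, not balanced).
random.seed(42)  # fixed seed so the same subset is drawn on every run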
sample_size_normal = int(0.4 * len(normal_image_paths))
sample_size_bacterial = int(0.4 * len(bacterial_image_paths))
sample_size_viral = int(0.4 * len(viral_image_paths))

# Use `random.sample` to get a random subset from each category.
sampled_normal_image_paths = random.sample(normal_image_paths, sample_size_normal)
sampled_bacterial_image_paths = random.sample(bacterial_image_paths, sample_size_bacterial)
sampled_viral_image_paths = random.sample(viral_image_paths, sample_size_viral)

- The dataset is relatively large, and training on all of it in Colab is computationally intensive.

- To make training feasible with the available resources, I sample 40% of the images from each category.

- Random sampling keeps the selected subset representative of the original data while reducing computational requirements. Because the same fraction is taken from every class, the class proportions (and any class imbalance) are approximately preserved.

- `random.sample()` draws without replacement, so no image is selected twice, and a random (rather than first-N) selection avoids biasing the subset toward any particular ordering of the files.
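A small sketch of the per-class sampling with made-up class sizes, showing that taking 40% of each class shrinks the data but keeps the class ratio unchanged. The file names and counts are hypothetical; a local `random.Random(42)` instance is used here so the seed does not affect other code.

```python
import random


def sample_fraction(paths, fraction, rng):
    """Return a random subset (without replacement) of int(fraction * len(paths)) items."""
    return rng.sample(paths, int(fraction * len(paths)))


rng = random.Random(42)  # fixed seed so the same subset is drawn on every run

# Hypothetical class sizes, for illustration only
classes = {
    "normal": [f"normal_{i}.jpeg" for i in range(100)],
    "bacterial": [f"bacteria_{i}.jpeg" for i in range(200)],
    "viral": [f"virus_{i}.jpeg" for i in range(50)],
}
sampled = {name: sample_fraction(paths, 0.4, rng) for name, paths in classes.items()}
print({name: len(s) for name, s in sampled.items()})
# {'normal': 40, 'bacterial': 80, 'viral': 20}
```

The 100:200:50 ratio becomes 40:80:20, so if balanced classes were needed, a different count per class (or class weights during training) would be required.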

In [None]:
# Step 3: Combine the Sampled Image Paths
# - Combine the sampled image paths for normal, bacterial, and viral pneumonia images.
multi_class_image_paths = sampled_normal_image_paths + sampled_bacterial_image_paths + sampled_viral_image_paths


- After obtaining the sampled images for each class, combine them into a single list called multi_class_image_paths.

- This combination creates a unified dataset containing all three categories, which is essential for training the multi-class classifier.

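- This gives a single combined dataset that can be further processed and fed to the model. Note that the list is ordered by class (all normal, then bacterial, then viral images), so it has to be shuffled before being split into training and validation sets.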

In [None]:
# Step 4: Assign Labels to the Images
# - Assign numerical labels to each category.
# - 0: NORMAL, 1: BACTERIAL PNEUMONIA, 2: VIRAL PNEUMONIA
normal_labels = [0] * len(sampled_normal_image_paths)
bacterial_labels = [1] * len(sampled_bacterial_image_paths)
viral_labels = [2] * len(sampled_viral_image_paths)

- Each image class needs to be assigned a numerical label for training purposes.

- Assigning numerical labels allows the model to understand which category each image belongs to, which is essential for training the classifier.

In [None]:
# Step 5: Combine the Labels
# - Combine labels for normal, bacterial, and viral pneumonia images.
multi_class_labels = normal_labels + bacterial_labels + viral_labels

- After assigning labels to each category, combine them into a single list called multi_class_labels.

- This combined list ensures that each image in multi_class_image_paths has a corresponding label in multi_class_labels.

- Maintaining this one-to-one correspondence between images and labels allows the model to learn the relationships between input images and their respective categories.
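Because the paths and labels are built class by class, any later shuffle has to move each path together with its label. A minimal pure-Python sketch with hypothetical file names that shuffles `(path, label)` pairs rather than the two lists independently:

```python
import random

# Hypothetical sampled paths, for illustration only
sampled_normal = ["n1.jpeg", "n2.jpeg"]
sampled_bacterial = ["b1.jpeg", "b2.jpeg", "b3.jpeg"]
sampled_viral = ["v1.jpeg"]

paths = sampled_normal + sampled_bacterial + sampled_viral
labels = [0] * len(sampled_normal) + [1] * len(sampled_bacterial) + [2] * len(sampled_viral)
assert len(paths) == len(labels)  # one label per image

# Shuffle the pairs, not the lists separately, so each path keeps its label
pairs = list(zip(paths, labels))
random.Random(0).shuffle(pairs)
shuffled_paths, shuffled_labels = (list(t) for t in zip(*pairs))
print(list(zip(shuffled_paths, shuffled_labels)))
```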

### Rationale Part 2: Preprocessing and Augmentation
- **Reason**: Image preprocessing, such as resizing and normalizing, ensures all images are in a consistent format for training. Augmentation helps increase the robustness of the model by introducing slight variations that it may encounter in real scenarios, reducing overfitting.

In [None]:
# Define a preprocessing and augmentation function to resize, normalize and augment the images
def preprocess_and_augment_image2(image_path):

    # Load and decode the image
    image = tf.io.read_file(image_path)
    image = tf.image.decode_jpeg(image, channels=1)  # Decode as a single (grayscale) channel

    # Resize the image (tf.image.resize returns float32, still in the 0..255 range)
    image = tf.image.resize(image, [64, 64])  # Resize to 64x64 pixels

    # Normalize before augmenting so that max_delta=0.1 below means 10% of the range
    image = tf.cast(image, tf.float32) / 255.0  # Normalize to [0, 1]

    # Data augmentation
    image = tf.image.random_flip_left_right(image)  # Randomly flip horizontally
    image = tf.image.random_brightness(image, max_delta=0.1)  # Randomly adjust brightness
    image = tf.image.random_contrast(image, lower=0.8, upper=1.2)  # Randomly adjust contrast

    # Brightness/contrast changes can leave [0, 1]; clip back into range
    image = tf.clip_by_value(image, 0.0, 1.0)

    return image

- **Loading and Decoding the Image**:
  - `tf.io.read_file(image_path)` reads the image file from the given path.
  - `tf.image.decode_jpeg(image, channels=1)` decodes the JPEG bytes into a `uint8` tensor of shape (height, width, 1). `channels=1` makes the output a single **grayscale** channel.
  - Grayscale: Chest X-ray images are typically grayscale, so converting them to one channel helps reduce the model's complexity, as there's no need to process color information.

- **Resizing the Image**:
  - `tf.image.resize(image, [64, 64])` resizes the image to a standard 64x64 size, ensuring consistency across all images.
  - During training, all images need to have the **same dimensions** so that the model can process them in batches.
  - This also guarantees that the model's input layer always receives a fixed size.

- **Random Augmentation**:
  - `tf.image.random_flip_left_right(image)` randomly flips the image horizontally.
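  - `tf.image.random_brightness(image, max_delta=0.1)` adds a random offset in [-0.1, 0.1] to every pixel. The offset is absolute, so it is a meaningful change (up to 10% of the range) only once the image has been scaled to [0, 1]; on the 0 to 255 scale it would be negligible.
  - `tf.image.random_contrast(image, lower=0.8, upper=1.2)` scales each pixel's distance from the image mean by a random factor between 0.8 and 1.2.
  - Augmentation introduces variability that helps the model generalize and reduces overfitting. It is intended for training images; validation images would normally only be resized and normalized, so that validation metrics are measured on unaltered data.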

- **Normalization**:
  - `tf.image.resize` already returns `float32` values in the 0 to 255 range, so `tf.cast(image, tf.float32)` is only a safeguard; dividing by `255.0` scales the pixel values to **[0, 1]**.
  - Keeping inputs in a small, consistent range makes optimization more stable and helps the model converge during training.
  - The [0, 1] range also matches the decoder's `sigmoid` output, so reconstructions and inputs are on the same scale.
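The function above applies augmentation to every image it processes. A minimal sketch that separates the deterministic steps from the random ones, so the same loading code can serve both splits. `preprocess_image`, `augment_image` and `IMG_SIZE` are hypothetical names, and the JPEG written to a temporary file is synthetic noise standing in for an X-ray:

```python
import os
import tempfile

import tensorflow as tf

IMG_SIZE = 64  # matches the 64x64 input used in this notebook


def preprocess_image(image_path):
    """Read, decode, resize and scale one image to [0, 1]; no randomness."""
    image = tf.io.read_file(image_path)
    image = tf.image.decode_jpeg(image, channels=1)       # uint8, shape (H, W, 1)
    image = tf.image.resize(image, [IMG_SIZE, IMG_SIZE])  # float32, values still 0..255
    return image / 255.0                                  # float32 in [0, 1]


def augment_image(image):
    """Random flip, brightness and contrast on a [0, 1] image, clipped to [0, 1]."""
    image = tf.image.random_flip_left_right(image)
    image = tf.image.random_brightness(image, max_delta=0.1)  # up to +-10% of the range
    image = tf.image.random_contrast(image, lower=0.8, upper=1.2)
    return tf.clip_by_value(image, 0.0, 1.0)


# Usage on a synthetic JPEG written to a temporary file
tmp_path = os.path.join(tempfile.mkdtemp(), "synthetic.jpeg")
pixels = tf.random.uniform([100, 120, 1], minval=0, maxval=256, dtype=tf.int32)
tf.io.write_file(tmp_path, tf.io.encode_jpeg(tf.cast(pixels, tf.uint8)))

clean = preprocess_image(tmp_path)  # what a validation image would get
augmented = augment_image(clean)    # what a training image would get
print(clean.shape, clean.dtype)
```

In a `tf.data` pipeline that yields `(path, label)` pairs (an assumption), this would typically be used as `train_ds.map(lambda p, y: (augment_image(preprocess_image(p)), y))` for training and `val_ds.map(lambda p, y: (preprocess_image(p), y))` for validation.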

### Rationale Part 3: Data Pipeline Optimization and Splitting the Dataset
- **Reason**: To ensure a well-structured training process, the dataset is split into training and validation sets. Proper batching and optimization of the data pipeline improve training efficiency.

In [None]:
# Step 1: Split the Dataset and Shuffle
# Split the Dataset into training and validation sets
dataset_size = len(multi_class_image_paths)
train_size = int(0.8 * dataset_size)  # 80% for training
val_size = dataset_size - train_size  # 20% for validation

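# Shuffle once, then split. The paths were concatenated class by class
# (normal, bacterial, viral), so an unshuffled take/skip would put mostly
# viral images into the validation set. reshuffle_each_iteration=False keeps
# the same train/validation split every time the datasets are iterated.
multi_class_dataset_filtered = multi_class_dataset_filtered.shuffle(
    dataset_size, seed=42, reshuffle_each_iteration=False
)
train_dataset = multi_class_dataset_filtered.take(train_size)
val_dataset = multi_class_dataset_filtered.skip(train_size)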

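# Batch size shared by Step 2 (steps per epoch) and Step 3 (batching)
batch_size = 128

# Batching is done once, in Step 3, so these lines stay commented out
# train_dataset_batched = train_dataset.batch(batch_size, drop_remainder=True)
# val_dataset_batched = val_dataset.batch(batch_size, drop_remainder=True)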



- Determine the size of the entire dataset and split it into 80% for training (`train_size`) and 20% for validation (`val_size`).
- Use `.take(train_size)` and `.skip(train_size)` to create the training and validation datasets.
- Batching is left commented out here because Step 3 already batches both datasets; doing it here as well would be redundant rather than an error.

In [None]:
# Step 2: Calculate Steps per Epoch Based on Dataset Size and Batch Size
# Only the training split (train_size images) is iterated during training
steps_per_epoch = train_size // batch_size
validation_steps = val_size // batch_size


- Steps per Epoch:
  - `steps_per_epoch` is the number of batches the model processes in one complete pass through the training dataset.
  - It is calculated from the training split only (`train_size // batch_size`). Using the total number of sampled images instead would include the 20% held out for validation and make each "epoch" about 1.25 passes over the training data.
- Validation Steps:
  - Similarly, `validation_steps` (`val_size // batch_size`) determines how many validation batches are processed per epoch.
- Reason for the Calculation:
  - Because the datasets are repeated indefinitely in Step 3, Keras cannot detect the end of an epoch on its own; `steps_per_epoch` and `validation_steps` tell it where each epoch ends.
  - With these values, each epoch passes over the training data once (minus any incomplete final batch dropped by `drop_remainder=True`), so training progresses consistently.

In [None]:
# Step 3: Batch the Dataset and Repeat to Avoid Running Out of Data
# Batch the dataset and repeat to avoid running out of data
train_dataset = train_dataset.batch(batch_size, drop_remainder=True).repeat()
val_dataset = val_dataset.batch(batch_size, drop_remainder=True).repeat()


- Batching:
  - `train_dataset.batch(batch_size, drop_remainder=True)` splits the dataset into smaller **batches** of a specified size (`batch_size`). This allows the model to process multiple images at once, improving training speed and computational efficiency.
  - The `drop_remainder=True` parameter ensures that only full batches are used, preventing issues if the total number of images is not divisible by the batch size.

- Repeat:
  - `.repeat()` repeats the dataset indefinitely, so the model never runs out of training or validation data across epochs. Because the stream is infinite, `steps_per_epoch` and `validation_steps` must be passed to training to mark where each epoch ends.



In [None]:
# Step 4: Data Pipeline Optimization with Prefetching
# Data Pipeline Optimization (tf.data.AUTOTUNE is the current name of tf.data.experimental.AUTOTUNE)
train_dataset = train_dataset.prefetch(tf.data.AUTOTUNE)
val_dataset = val_dataset.prefetch(tf.data.AUTOTUNE)


- Prefetching:
  - `.prefetch(tf.data.AUTOTUNE)` improves input efficiency by overlapping data preparation with model training.
  - `tf.data.AUTOTUNE` lets tf.data tune the prefetch buffer size dynamically at runtime.
  - While the model is training on the current batch, the next batch is already being prepared.
  - This reduces idle time on the GPU/CPU, making training faster and more efficient.
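Steps 1 to 4 can be checked end to end on a toy dataset. In this sketch the integers 0 to 99 stand in for the ordered `(image, label)` pairs and `batch_size = 8` is a hypothetical value chosen so the numbers are easy to read. It shows why the data must be shuffled before `take`/`skip`, that the resulting split is disjoint, and that `steps_per_epoch` batches of the repeated dataset cover the training split exactly once:

```python
import tensorflow as tf

dataset_size = 100  # toy size; elements 0..99 stand in for (image, label) pairs
batch_size = 8      # small hypothetical batch size for the demo
train_size = int(0.8 * dataset_size)

# Without shuffling, skip() returns only the last 20% of the ordered data
unshuffled_val = [int(x) for x in tf.data.Dataset.range(dataset_size).skip(train_size)]

# Shuffle once with a fixed order, then split 80/20
toy = tf.data.Dataset.range(dataset_size).shuffle(
    dataset_size, seed=42, reshuffle_each_iteration=False
)
train_ds = toy.take(train_size)
val_ds = toy.skip(train_size)
train_ids = [int(x) for x in train_ds]
val_ids = [int(x) for x in val_ds]

# Batch, repeat and prefetch the training split, then take one epoch's worth
steps_per_epoch = train_size // batch_size
pipeline = (
    train_ds.batch(batch_size, drop_remainder=True)
    .repeat()
    .prefetch(tf.data.AUTOTUNE)
)
one_epoch = list(pipeline.take(steps_per_epoch))
print(len(train_ids), len(val_ids), steps_per_epoch, len(one_epoch))  # 80 20 10 10
```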

### Rationale Part 4: Multi-Class VAE and Classifier
- **Reason**: For this task, a **Variational Autoencoder (VAE)** was used to learn a latent representation of the input images. This latent space can then be used for reconstruction as well as classification. The **classifier** model is built on top of the latent vectors produced by the VAE to distinguish between the three classes (normal, bacterial pneumonia, viral pneumonia).

In [None]:
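from tensorflow.keras import layers, models

# Define activation choice with updated parameter
activation_choice = 'LeakyReLU'  # Options: 'LeakyReLU' or 'ReLU'

# Set activation function based on choice.
# The same stateless layer instance is reused throughout the encoder and decoder.
if activation_choice == 'LeakyReLU':
    activation = layers.LeakyReLU(negative_slope=0.01)  # Keras 3 argument name; older tf.keras uses `alpha`
else:
    activation = layers.ReLU()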

# Adjusted latent dimension for better feature capture
latent_dim2 = 128  # Increased to capture more complex features


- Using `LeakyReLU` instead of `ReLU` as the activation function helps mitigate "dying" neurons (neurons that get stuck outputting zero and therefore receive zero gradient). With a small negative slope (0.01), negative inputs still pass a small gradient, which can help the model learn subtle features in medical images.

- Using `latent_dim2 = 128` instead of `latent_dim = 64`: increasing the latent dimension gives the model more capacity to capture complex features, which can help with highly variable data such as medical images.
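The difference between the two activations is easiest to see in their gradients: for negative inputs `ReLU` passes no gradient at all, while `LeakyReLU` with `negative_slope=0.01` still passes 1% of it. A NumPy sketch (the function names are illustrative, not Keras APIs):

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def leaky_relu(x, negative_slope=0.01):
    # Same slope as layers.LeakyReLU(negative_slope=0.01) above
    return np.where(x > 0, x, negative_slope * x)


def relu_grad(x):
    return (x > 0).astype(float)


def leaky_relu_grad(x, negative_slope=0.01):
    return np.where(x > 0, 1.0, negative_slope)


x = np.array([-2.0, -0.5, 0.5, 2.0])
print(relu(x), leaky_relu(x))
print(relu_grad(x), leaky_relu_grad(x))
```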

In [None]:
# VAE Encoder
# Step 2.1: Define the encoder with Dropout and Batch Normalization
def build_encoder2(input_shape=(64, 64, 1), latent_dim=latent_dim2):
    encoder_inputs = layers.Input(shape=input_shape)
    x = layers.Conv2D(32, (3, 3), strides=2, padding='same')(encoder_inputs)
    x = layers.BatchNormalization()(x)
    x = activation(x)
    x = layers.Dropout(0.4)(x)
    x = layers.Conv2D(64, (3, 3), strides=2, padding='same')(x)
    x = layers.BatchNormalization()(x)
    x = activation(x)
    x = layers.Dropout(0.4)(x)
    x = layers.Flatten()(x)
    x = layers.Dense(64)(x)
    x = layers.BatchNormalization()(x)
    x = activation(x)
    x = layers.Dropout(0.4)(x)
    z_mean = layers.Dense(latent_dim, name="z_mean")(x)
    z_log_var = layers.Dense(latent_dim, name="z_log_var")(x)
    # `Sampling` is the custom sampling layer defined earlier in the notebook
    z = Sampling()([z_mean, z_log_var])
    encoder = models.Model(encoder_inputs, [z_mean, z_log_var, z], name="encoder2")
    return encoder

encoder2 = build_encoder2()

- Input Layer and Convolution:
  - The input shape is `(64, 64, 1)`, representing 64x64 pixel grayscale images.
  - Convolutional layers are used to extract relevant features from the images, while `BatchNormalization()` helps to stabilize and speed up the training process.
- LeakyReLU Activation:
  - The `LeakyReLU` activation keeps a small slope for negative inputs instead of setting them to zero (as `ReLU` does), so those units still receive gradient and are less likely to "die". This is especially useful in deeper networks.
- Dropout:
  - Dropout (`0.4`) is applied after each convolutional block and after the hidden dense layer to reduce **overfitting**, helping the model generalize better to unseen data.
- Latent Space:
  - `z_mean` and `z_log_var` parameterize a Gaussian distribution over the latent space for each input image.
  - The `Sampling` layer draws a latent vector (`z`) from this distribution. In a standard VAE this is done with the reparameterization trick, so that gradients can flow back through the sampling step to `z_mean` and `z_log_var`.
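`Sampling` itself is defined earlier in the notebook and not shown here. Assuming it follows the usual VAE formulation, it computes `z = z_mean + exp(0.5 * z_log_var) * epsilon` with `epsilon` drawn from a standard normal. A NumPy sketch of that computation (`sample_z` is a hypothetical name):

```python
import numpy as np


def sample_z(z_mean, z_log_var, rng):
    """Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, I)."""
    eps = rng.standard_normal(z_mean.shape)
    return z_mean + np.exp(0.5 * z_log_var) * eps  # exp(0.5 * log var) = std dev


rng = np.random.default_rng(0)
batch, latent_dim = 4, 128  # latent_dim matches latent_dim2
z_mean = np.zeros((batch, latent_dim))
z_log_var = np.full((batch, latent_dim), np.log(0.25))  # variance 0.25, std 0.5
z = sample_z(z_mean, z_log_var, rng)
print(z.shape)
```

The randomness lives entirely in `eps`, so `z` is a differentiable function of `z_mean` and `z_log_var`; that is what lets the encoder be trained by backpropagation.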

In [None]:
# VAE Decoder
# Step 2.2: Define the decoder
def build_decoder2(latent_dim=latent_dim2):
    latent_inputs = layers.Input(shape=(latent_dim,))
    x = layers.Dense(16 * 16 * 64)(latent_inputs)
    x = layers.BatchNormalization()(x)
    x = activation(x)
    x = layers.Dropout(0.4)(x)
    x = layers.Reshape((16, 16, 64))(x)
    x = layers.Conv2DTranspose(64, (3, 3), strides=2, padding='same')(x)
    x = layers.BatchNormalization()(x)
    x = activation(x)
    x = layers.Dropout(0.4)(x)
    x = layers.Conv2DTranspose(32, (3, 3), strides=2, padding='same')(x)
    x = layers.BatchNormalization()(x)
    x = activation(x)
    x = layers.Dropout(0.4)(x)
    decoder_outputs = layers.Conv2DTranspose(1, (3, 3), activation='sigmoid', padding='same')(x)
    decoder = models.Model(latent_inputs, decoder_outputs, name="decoder2")
    return decoder

decoder2 = build_decoder2()

- Latent Input and Dense Layers:
  - The decoder starts with a dense layer that maps the latent vector to 16 * 16 * 64 values, which are reshaped into a 16x16 feature map with 64 channels.
- Upsampling and Convolutional Layers:
  - The `Conv2DTranspose` layers, paired with `BatchNormalization` and the chosen activation (`LeakyReLU` here), gradually restore the spatial dimensions from 16x16 to 64x64.
  - The output layer uses a `sigmoid` activation to produce values between 0 and 1, matching the [0, 1] range of the normalized input images.
- Dropout:
  - Applying `Dropout` in the decoder helps prevent overfitting by ensuring the model does not rely too heavily on specific neurons during reconstruction.
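The encoder and decoder only fit together if their spatial sizes line up. With `padding='same'`, a stride-2 `Conv2D` halves the size (rounding up) and a stride-2 `Conv2DTranspose` doubles it. A small pure-Python trace of those rules (the helper names are hypothetical):

```python
def conv_same_out(size, stride):
    """Spatial size after Conv2D with padding='same': ceil(size / stride)."""
    return -(-size // stride)


def conv_transpose_same_out(size, stride):
    """Spatial size after Conv2DTranspose with padding='same': size * stride."""
    return size * stride


# Encoder: 64 -> Conv2D(stride 2) -> Conv2D(stride 2)
enc = [64]
for stride in (2, 2):
    enc.append(conv_same_out(enc[-1], stride))

# Decoder: Reshape to 16 -> Conv2DTranspose(stride 2) x 2 -> final Conv2DTranspose(stride 1)
dec = [16]
for stride in (2, 2, 1):
    dec.append(conv_transpose_same_out(dec[-1], stride))

print(enc, dec)  # [64, 32, 16] [16, 32, 64, 64]
```

The encoder's last feature map is 16x16 with 64 filters, which the decoder mirrors with `Dense(16 * 16 * 64)` followed by `Reshape((16, 16, 64))`, and the final stride-1 layer keeps the 64x64 output size.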

In [None]:
# Classifier for Latent Space
# Step 2.3: Build a simple classifier model for multi-class classification using the latent vectors.
def build_classifier(latent_dim=latent_dim2):
    classifier_inputs = layers.Input(shape=(latent_dim,))
    x = layers.Dense(64, activation='relu')(classifier_inputs)
    x = layers.Dropout(0.3)(x)
    x = layers.Dense(32, activation='relu')(x)
    x = layers.Dropout(0.3)(x)

    # Output shape: (batch_size, 3), one softmax probability per class
    classifier_outputs = layers.Dense(3, activation='softmax')(x)
    classifier = models.Model(classifier_inputs, classifier_outputs, name="classifier")
    return classifier

classifier = build_classifier()


- Latent Input:
  - The classifier takes the latent vector produced by the encoder (`latent_dim2 = 128` values) as input. This vector is a compressed, low-dimensional representation of the input image that encodes the features used for classification.
- Fully Connected Layers:
  - Two dense layers with 64 and 32 units learn non-linear combinations of the encoder's features that are specific to classification.
  - `ReLU` activation is used for non-linearity, helping the classifier to learn complex decision boundaries.
- Dropout Layers:
  - Dropout (0.3) is applied after each dense layer to prevent overfitting. Dropout randomly "drops" units during training, ensuring that the classifier does not become overly reliant on any specific features and learns generalized patterns.
- Output Layer:
  - The output layer has 3 units, corresponding to the three classes (normal, bacterial pneumonia, viral pneumonia).
  - `Softmax` activation ensures the output represents class probabilities, which is suitable for multi-class classification tasks.
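As a quick sanity check on the classifier's size, the trainable parameters of its three dense layers can be counted by hand (dropout adds none). The arithmetic below should agree with the total reported by `classifier.summary()`:

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a Dense layer."""
    return n_in * n_out + n_out


# latent_dim2 -> Dense(64) -> Dense(32) -> Dense(3); Dropout layers have no parameters
layer_sizes = [128, 64, 32, 3]
per_layer = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
print(per_layer, sum(per_layer))  # [8256, 2080, 99] 10435
```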

In [None]:
# Combining Encoder, Decoder, and Classifier
# Define input layer
inputs = tf.keras.Input(shape=(64, 64, 1))

# Get the outputs from the encoder, decoder, and classifier
z_mean, z_log_var, z = encoder2(inputs)
reconstruction_output = decoder2(z)
classification_output = classifier(z)

# Rename the outputs explicitly using Keras' Lambda layer
reconstruction_output = tf.keras.layers.Lambda(lambda x: x, name='reconstruction_output')(reconstruction_output)
classification_output = tf.keras.layers.Lambda(lambda x: x, name='classification_output')(classification_output)


- Combining Components:
  - The encoder, decoder, and classifier are combined into a single model that takes an image as input and outputs:
    - Reconstructed Image (via decoder).
    - Class Probabilities (via classifier).
- Input Layer:
  - The input layer takes an image with dimensions (64, 64, 1), representing a grayscale image of size 64x64 pixels.
- Encoder Output:
  - The input image is passed through the encoder, which outputs the latent vector (z) along with the mean (z_mean) and log variance (z_log_var) used for the sampling.
  - The latent vector (z) represents the key features extracted by the encoder.
- Linking Latent Vectors to Classifier:
  - The latent vector (z) is then passed to the classifier, which processes it through its fully connected layers and outputs a probability distribution over the three classes.
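  - This linkage between the encoder and classifier is what allows the model to take a raw input image and classify it after encoding it into a latent representation.
  - Because `z` is sampled rather than deterministic, repeated predictions on the same image can differ slightly; passing `z_mean` to the classifier instead is a common deterministic alternative.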
- Renaming Outputs:
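  - The identity `Lambda` layers give the outputs explicit names (`reconstruction_output`, `classification_output`) so that they match the keys used in the `loss` and `metrics` dictionaries when the model is compiled.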

In [None]:
from tensorflow.keras.optimizers import AdamW

# Create the model with the renamed outputs
vae_with_classifier_model = tf.keras.Model(
    inputs=inputs,
    outputs=[reconstruction_output, classification_output]
)

# Compile the model with correct output names in the loss function
vae_with_classifier_model.compile(
    optimizer=AdamW(learning_rate=0.001),
    loss={
        'classification_output': 'categorical_crossentropy',  # Classification loss
        'reconstruction_output': 'mse'  # Reconstruction loss
    },
    metrics={'classification_output': 'accuracy'}  # Track accuracy for classification
)

- Model Creation:
  - The VAE with Classifier model (vae_with_classifier_model) takes an input image and outputs both:
    1. Reconstructed Image (reconstruction_output), generated by the decoder.
    2. Class Probabilities (classification_output), generated by the classifier.
  - This structure allows the model to learn both tasks: reconstructing the input image and classifying it into one of the three categories.

- Loss Functions:
  - The model is trained with two loss functions:
    1. `categorical_crossentropy` is used for training the classifier to measure how well the model assigns the correct label to each input. The model is penalized based on the difference between the predicted class probabilities and the actual class label (one-hot encoded). This helps the model learn to classify images correctly.
    2. Mean Squared Error (`mse`) is used for training the VAE decoder, which measures how well the decoder is able to reconstruct the original input image from the latent vector.

- AdamW Optimizer:
  - `AdamW` is a variant of Adam with decoupled weight decay: the decay is applied directly to the weights instead of being added to the gradient. This acts as **regularization** and can reduce overfitting.
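Training this two-output model requires a target for each named output. How the notebook builds these targets is not shown in this section; one possible mapping, assuming the dataset yields `(image, integer_label)` pairs (`to_model_targets` and `NUM_CLASSES` are hypothetical names), is:

```python
import tensorflow as tf

NUM_CLASSES = 3


def to_model_targets(image, label):
    """Map (image, integer label) to the input and per-output targets of the model."""
    targets = {
        "reconstruction_output": image,                           # decoder should reproduce the input
        "classification_output": tf.one_hot(label, NUM_CLASSES),  # one-hot for categorical_crossentropy
    }
    return image, targets


# Toy stand-in: two random 64x64x1 "images" with integer labels
images = tf.random.uniform([2, 64, 64, 1])
labels = tf.constant([0, 2])
toy_ds = tf.data.Dataset.from_tensor_slices((images, labels)).map(to_model_targets)
```

Applied with `.map(to_model_targets)` before batching, a dataset shaped this way can be passed to `vae_with_classifier_model.fit(...)` together with `steps_per_epoch` and `validation_steps`; the dictionary keys must match the output names used in `compile`.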

### Rationale Part 5: Building the Classifier and Incorporating Multi-Class Outputs
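- **Reason**: The latent vectors from the encoder are used to perform classification. The classifier is not trained on its own: it is part of `vae_with_classifier_model`, so it is trained jointly with the encoder and decoder, and the classification loss also shapes the latent vectors to distinguish between normal, bacterial pneumonia, and viral pneumonia.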


### Overall Explanation of Classification Process
In this question, I employed a VAE followed by a classifier to classify chest X-ray images into three categories:
1. Normal
2. Bacterial Pneumonia
3. Viral Pneumonia

This approach has two major components:
- VAE Encoder: converts images into a latent vector representation.
- Classifier: takes the latent vector as input and outputs the probability of each class.

Here is a detailed explanation of the entire process:

### 1. Linking the Latent Vectors to Labels for Classification
#### 1.1 Encoder and Latent Space Representation
The encoder is the first part of the VAE. It compresses an input image into a latent vector in a lower-dimensional latent space. Here's how it works:
- Each input image (`64x64x1`) is passed through several convolutional layers, batch normalization, and activation functions (LeakyReLU here).
- The output of the convolutional layers is then flattened and transformed into two vectors: **`z_mean`** and **`z_log_var`**.
  - `z_mean`: Represents the mean of the latent distribution.
  - `z_log_var`: Represents the log variance of the latent distribution.

The purpose of the latent vector is to encode important features of the image in a compact form. The latent vector is computed by sampling from the distribution defined by `z_mean` and `z_log_var`. This sampled vector (`z`) represents the input image in compressed form; it is essentially a feature representation of the original image.
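In the standard VAE objective, each per-image distribution N(`z_mean`, exp(`z_log_var`)) is also pulled toward a standard normal by a Kullback-Leibler (KL) divergence term. The `compile` call in this section lists only the reconstruction and classification losses, so whether this term is included depends on code not shown here (for example, a `Sampling` layer that adds it with `add_loss`). A NumPy sketch of the closed-form KL term for a diagonal Gaussian, with `kl_to_standard_normal` as a hypothetical helper name:

```python
import numpy as np


def kl_to_standard_normal(z_mean, z_log_var):
    """KL(N(mu, sigma^2) || N(0, I)) per sample, summed over latent dimensions."""
    return -0.5 * np.sum(1 + z_log_var - np.square(z_mean) - np.exp(z_log_var), axis=-1)


# Zero mean and unit variance (log var 0) already match N(0, I): KL is 0
print(kl_to_standard_normal(np.zeros((2, 128)), np.zeros((2, 128))))
```

Any shift of `z_mean` away from 0, or of the variance away from 1, makes the term positive, which is what keeps the latent space smooth and centered.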


#### 1.2 Classifier for Latent Vectors
##### 1.2.1 Feeding the latent vector into the classifier model:
- The classifier is a simple feedforward neural network (with dense layers) that takes the latent vector (`z`) as input.
- The classifier then passes the latent vector through a few dense layers (fully connected layers) with `ReLU` activation and `Dropout` to learn non-linear relationships.
- Then, the output layer uses a `softmax` activation function to produce a probability distribution over the three classes.

##### 1.2.2 The output from the classifier:
- A vector with three values, each representing the probability that the input image belongs to one of the three classes (normal, bacterial pneumonia, viral pneumonia).


### 2. How the Model Learns to Distinguish Between the Three Classes
The training process involves teaching the model to learn the distinguishing features of each class, which happens in the following way:


#### 2.1 One-Hot Encoding of Labels
- The labels are one-hot encoded before being fed into the model.
  - Normal: `[1, 0, 0]`
  - Bacterial Pneumonia: `[0, 1, 0]`
  - Viral Pneumonia: `[0, 0, 1]`

This one-hot encoding provides the target for the classification part of the network: the `categorical_crossentropy` loss compares each one-hot vector with the classifier's softmax output.
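The mapping above can be reproduced with NumPy by indexing rows of an identity matrix; `tf.one_hot(labels, depth=3)` and `tf.keras.utils.to_categorical(labels, 3)` produce the same matrix. A small sketch with hypothetical labels:

```python
import numpy as np

labels = np.array([0, 1, 2, 1])  # 0: normal, 1: bacterial, 2: viral
one_hot = np.eye(3, dtype=np.float32)[labels]
print(one_hot)
```

If the labels were kept as integers instead, the matching loss would be `sparse_categorical_crossentropy`, which computes the same value without explicit one-hot vectors.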


#### 2.2 Convolutional Feature Extraction by the Encoder
- Convolutional Layers in the encoder are responsible for feature extraction.
  - Filters in these layers detect specific patterns such as edges, textures, shapes, etc.
  - By passing the images through several convolutional layers, the encoder extracts the most important features that are required to identify the differences between the classes.
  
- For example:
  - For normal chest X-rays, the model might learn to detect features that indicate clear lung fields.
  - For bacterial pneumonia, it might learn features like dense, localized opacity, which are common in bacterial infections.
  - For viral pneumonia, the features might include diffuse opacity patterns that are distinct from bacterial infections.


#### 2.3 Latent Space Representation
- The latent vector (`z`) is a compressed representation of the image. It captures the key features extracted by the encoder.
- The encoder is trained so that similar images tend to receive similar latent representations. Because the classifier is trained jointly with the encoder, the classification loss also pushes the latent vectors of different classes apart.
  - Ideally, images of normal lungs would have latent vectors close to each other.
  - Similarly, bacterial and viral pneumonia images would cluster in different regions of the latent space, while still sharing some similarities based on common pneumonia features.


#### 2.4 Classifier to Distinguish Classes
- The classifier receives the latent vector (`z`) and passes it through dense layers to map the latent features to class probabilities.
- The goal of the classifier is to find **decision boundaries** in the latent space that separate the three classes.
  - During training, the **loss function** (categorical cross-entropy) penalizes incorrect classifications.
  - For every image, training minimizes the cross-entropy between the predicted probabilities and the one-hot label, which is equivalent to maximizing the probability assigned to the true class.
  - Because the classifier is trained jointly with the encoder, gradients from this loss also flow back into the encoder. Over many epochs, the latent vectors come to encode the features (patterns, textures, opacities) that are most useful for identifying each image's class.
    - For example, opacity patterns might indicate pneumonia, while their distribution (localized vs. diffuse) could help distinguish bacterial from viral pneumonia.
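To make the penalty concrete: for a single image, categorical cross-entropy is minus the log of the probability assigned to the true class. A NumPy sketch using the bacterial-pneumonia output `[0.05, 0.85, 0.10]` discussed in this section, with a hypothetical confidently-wrong prediction for comparison:

```python
import numpy as np


def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean over the batch of -sum(y_true * log(y_pred))."""
    y_pred = np.clip(y_pred, eps, 1.0)  # avoid log(0)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=-1)))


y_true = np.array([[0.0, 1.0, 0.0]])  # bacterial pneumonia
confident_right = np.array([[0.05, 0.85, 0.10]])
confident_wrong = np.array([[0.85, 0.05, 0.10]])

loss_right = categorical_crossentropy(y_true, confident_right)
loss_wrong = categorical_crossentropy(y_true, confident_wrong)
print(loss_right, loss_wrong)  # about 0.163 and 2.996
```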


#### 2.5 Softmax Output for Classification
- The `softmax` function at the final output layer converts the output into a probability distribution over the three classes.
- The highest probability indicates the class the model thinks the image belongs to.
  - For example, for a **normal** image, the output might look like `[0.98, 0.01, 0.01]`, meaning the model assigns a **98% probability** to the normal class.
  - As another example, an output like `[0.05, 0.85, 0.10]` means the model assigns an 85% probability to bacterial pneumonia.
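These probabilities come from applying softmax to the classifier's final-layer scores (logits). A numerically stable NumPy version, applied to hypothetical logits:

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - np.max(logits, axis=-1, keepdims=True)  # avoids overflow in exp
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)


logits = np.array([0.5, 3.0, 1.2])  # hypothetical pre-softmax scores
probs = softmax(logits)
predicted_class = int(np.argmax(probs))  # 1 -> bacterial pneumonia
print(probs, predicted_class)
```

Subtracting the maximum logit does not change the result (softmax is invariant to adding a constant to every score) but keeps `np.exp` from overflowing on large inputs.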

I hope this explanation clarifies how the VAE-based classifier is implemented and how the labels are linked to the latent vectors to differentiate between the three classes.
