## Plaque Classification Model Training

#### Developer: Meera Srikrishna

This script implements a deep learning pipeline for classifying amyloid plaque images into three categories: `coarse`, `cored`, and `diffused`. It includes data preprocessing, model training using a Convolutional Neural Network (CNN), and K-Fold cross-validation evaluation.

## Steps:
1. **Import required libraries**: Load necessary Python packages for image processing, deep learning, and evaluation.
2. **Load and preprocess images**: Convert images to grayscale, normalize pixel values, and reshape for CNN input.
3. **Define and compile CNN model**: Construct a sequential CNN model with convolutional, pooling, and dense layers.
4. **Implement K-Fold cross-validation**: Train a fresh model on each stratified fold to estimate how consistently it performs on held-out data.
5. **Apply data augmentation**: Apply random rotations, shifts, shear, zoom, and horizontal flips to the training images to reduce overfitting.
6. **Train the model**: Utilize callbacks for early stopping, learning rate adjustments, and model checkpointing.
7. **Evaluate and save results**: Store training history, best models, and logs for further analysis.

#### 1. Import required libraries

In [None]:
import os
import numpy as np
import pandas as pd
import cv2
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.callbacks import ModelCheckpoint, CSVLogger, EarlyStopping, ReduceLROnPlateau
from sklearn.model_selection import StratifiedKFold

#### 2. Load and preprocess images

In [None]:
# Load images in sorted filename order and convert each to grayscale
data_dir = "path/to/input/data/images"
all_images = []
for filename in sorted(os.listdir(data_dir)):
    if filename.endswith(('.jpg', '.tif')):
        image = cv2.imread(os.path.join(data_dir, filename))
        if image is None:  # cv2.imread returns None when a file cannot be read
            raise ValueError(f"Could not read image: {filename}")
        # OpenCV loads images as BGR, so convert directly to grayscale
        image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
        all_images.append(image)

image_array = np.array(all_images)
# Uncomment to save the training image array for future use
#np.save('train_images_plaq_class_rep_grayscale.npy', image_array)
del image_array

# Load labels
label_array = pd.read_csv('path/to/labels.csv', header=None).values
# Uncomment to save the training label array for future use
#np.save('train_labels_plaq_class_rep.npy', label_array)
del label_array

# Load preprocessed data
train_data = np.load('train_images_plaq_class_rep_grayscale.npy')
train_label = np.load("train_labels_plaq_class_rep.npy")

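# Min-max normalize pixel values to [0, 1] using the global minimum and maximum
min_value = np.min(train_data)
max_value = np.max(train_data)
train_data = (train_data - min_value) / (max_value - min_value)
# Add a trailing channel axis; this assumes every image is 120x120 pixels
train_data = train_data.reshape(-1, 120, 120, 1)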

# Convert labels to categorical format
train_label_cat = to_categorical(train_label)
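
The preprocessing above can be checked on synthetic data before running it on the real arrays. The following is a minimal sketch, assuming 120x120 grayscale inputs and three classes as in this notebook. The random images, the label values, and the use of `np.eye` as a stand-in for `to_categorical` are illustrative assumptions, not part of the original pipeline.

```python
import numpy as np

# Synthetic stand-in for the loaded arrays: 6 grayscale images of 120x120 pixels
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(6, 120, 120), dtype=np.uint8)
labels = np.array([[0], [1], [2], [0], [1], [2]])  # column shape, as read from a CSV

# Min-max scaling with the global minimum and maximum, as in the cell above
lo, hi = images.min(), images.max()
scaled = (images.astype(np.float32) - lo) / (hi - lo)

# Add the trailing channel axis expected by Conv2D
scaled = scaled.reshape(-1, 120, 120, 1)

# One-hot encoding; np.eye is used here as a stand-in for to_categorical
one_hot = np.eye(3, dtype=np.float32)[labels.ravel()]

print(scaled.shape, scaled.min(), scaled.max())
print(one_hot.shape)
```

Note that the scaling divides by `hi - lo`, so an array in which every pixel has the same value would need separate handling.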

#### 3. Define and compile CNN model

In [None]:
def create_model():
    model = Sequential([
        Conv2D(16, (3,3), activation='relu', input_shape=(120,120,1)),
        MaxPooling2D(2, 2),
        Conv2D(32, (3,3), activation='relu'),
        MaxPooling2D(2,2),
        Conv2D(64, (3,3), activation='relu'),
        MaxPooling2D(2,2),
        Conv2D(128, (3,3), activation='relu'),
        MaxPooling2D(2,2),
        Conv2D(128, (3,3), activation='relu'),
        MaxPooling2D(2,2),
        Flatten(),
        Dense(512, activation='relu'),
        Dense(3, activation='softmax')
    ])
    model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
    return model
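
To see how the five convolution and pooling blocks shrink a 120x120 input, the spatial sizes can be worked out by hand. The helper below is a hypothetical sketch (it is not part of the notebook) and assumes the Keras defaults used above: unpadded 3x3 convolutions with stride 1, and 2x2 pooling with stride 2.

```python
def conv_pool_output_sizes(input_size, num_blocks, kernel=3, pool=2):
    """Return the spatial size after each Conv2D + MaxPooling2D block.

    Assumes 'valid' padding and stride 1 for the convolution, and a
    pooling stride equal to the pool size.
    """
    sizes = []
    size = input_size
    for _ in range(num_blocks):
        size = size - kernel + 1   # 3x3 convolution without padding
        size = size // pool        # 2x2 max pooling rounds down
        sizes.append(size)
    return sizes

sizes = conv_pool_output_sizes(120, num_blocks=5)
flatten_units = sizes[-1] * sizes[-1] * 128  # the last Conv2D has 128 filters
print(sizes, flatten_units)
```

Under these assumptions the feature map is reduced to 1x1 by the last block, so `Flatten` passes 128 values to `Dense(512)`. A different input size or number of blocks would change that figure.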

#### 4. Implement K-Fold cross-validation

In [None]:
num_folds = 3  # number of cross-validation folds; adjust as needed
kf = StratifiedKFold(n_splits=num_folds, shuffle=True, random_state=42)
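
`StratifiedKFold` stratifies on 1-D class labels and raises a `ValueError` when given a one-hot matrix, so one-hot labels need to be converted back to integers before splitting. The sketch below illustrates this on synthetic, imbalanced labels. The class counts and the placeholder feature array are assumptions chosen for the example.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Synthetic, imbalanced labels: 15 / 9 / 6 samples for classes 0 / 1 / 2
labels_int = np.array([0] * 15 + [1] * 9 + [2] * 6)
one_hot = np.eye(3)[labels_int]   # shape (30, 3), like to_categorical output
features = np.zeros((30, 1))      # placeholder; only the labels drive the split

# Recover 1-D integer labels from the one-hot matrix before splitting
recovered = np.argmax(one_hot, axis=1)

skf = StratifiedKFold(n_splits=3, shuffle=True, random_state=42)
val_counts = [np.bincount(recovered[val_idx], minlength=3).tolist()
              for _, val_idx in skf.split(features, recovered)]
print(val_counts)
```

Each validation fold keeps the same class proportions as the full label set, which is the point of stratifying when classes are imbalanced.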

#### 5. Apply data augmentation

In [None]:
datagen = ImageDataGenerator(
    rotation_range=40,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True,
    fill_mode='nearest')

#### 6-7. Train the model and save the results

In [None]:
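X, y = train_data, train_label_cat
# StratifiedKFold needs 1-D integer class labels, not one-hot vectors
y_int = np.argmax(y, axis=1)
fold = 0
for train_index, val_index in kf.split(X, y_int):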
    fold += 1
    print(f"Fold {fold}")
    X_train, X_val = X[train_index], X[val_index]
    y_train, y_val = y[train_index], y[val_index]
    train_datagen = datagen.flow(X_train, y_train, batch_size=4)
    model = create_model()
    checkpoint_filepath = f'fold_{fold}_best_model.h5'
    callbacks = [
        ModelCheckpoint(filepath=checkpoint_filepath, monitor='val_accuracy', save_best_only=True, mode='max'),
        CSVLogger(f'fold_{fold}_training.log'),
        EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True),
        ReduceLROnPlateau(monitor='val_loss', factor=0.2, patience=3, min_lr=1e-6)
    ]
    history = model.fit(train_datagen, epochs=100, validation_data=(X_val, y_val), callbacks=callbacks)
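    # history.history is a dict, so np.save pickles it; reload with np.load(..., allow_pickle=True).item()
    np.save(f'fold_{fold}_history.npy', history.history)
print("Training complete. Best models and logs saved.")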
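
Because each fold's history is saved as a pickled dict, reading it back takes one extra step. The following is a minimal sketch of loading a history file and finding the epoch with the best validation accuracy. The helper name `best_val_accuracy` is hypothetical and the demo values are made up for illustration; they are not results from this model.

```python
import os
import tempfile
import numpy as np

def best_val_accuracy(history_path):
    """Load a history dict saved with np.save and return (best_epoch, best_val_accuracy)."""
    # np.save wraps the dict in a 0-d object array, so unpickling must be allowed
    history = np.load(history_path, allow_pickle=True).item()
    val_acc = history['val_accuracy']
    best_epoch = int(np.argmax(val_acc)) + 1  # report epochs as 1-based
    return best_epoch, float(val_acc[best_epoch - 1])

# Demonstration with a made-up history dict (illustrative values, not real results)
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, 'fold_1_history.npy')
    np.save(demo_path, {'val_accuracy': [0.50, 0.75, 0.625], 'val_loss': [1.0, 0.7, 0.8]})
    best_epoch, best_acc = best_val_accuracy(demo_path)

print(best_epoch, best_acc)
```

Calling the helper on each `fold_{fold}_history.npy` file gives a per-fold summary that can be averaged to compare configurations. Only load pickled files from sources you trust, since unpickling can execute arbitrary code.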