In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


# Github Repo: https://github.com/parthsaxena/CS598-Final-Project

# Video: https://drive.google.com/file/d/15eu31xB9dxJO9vGj80GlUjBp_ub5Cdtu/view?usp=sharing

(This PDF does not show the full Python notebook, so please use the GitHub repo to view the rest of it.)

---

# Introduction

## Background of the Problem

### What Type of Problem

Alzheimer's Disease (AD) is the most common form of dementia and poses significant challenges in diagnosis and management. The early stages, such as Mild Cognitive Impairment (MCI), often progress to Alzheimer's, making early and accurate detection critical for effective intervention. This project addresses the problem of Alzheimer’s disease diagnosis using a multimodal deep learning approach, integrating different types of medical data to improve diagnostic accuracy.

### Importance and Meaning of Solving the Problem

Accurately diagnosing Alzheimer's Disease is crucial as it allows for earlier intervention strategies, which can significantly delay disease progression and improve quality of life. Moreover, with the aging global population, the incidence of AD is expected to rise, increasing the burden on healthcare systems. Thus, enhancing diagnostic techniques is not only beneficial for patient care but also economically vital.

### Difficulty of the Problem

Diagnosing AD is inherently challenging due to its subtle onset and the overlap of symptoms with normal aging and other forms of dementia. Traditional diagnostic methods rely heavily on clinical assessments and biomarkers, which might not capture the full spectrum of the disease's progression. The integration of multiple data types (e.g., imaging, genetic, and clinical data) introduces additional complexity, including data heterogeneity and alignment, making effective analysis challenging.

### State of the Art Methods and Effectiveness

Current state-of-the-art methods in AD diagnosis include various machine learning models that utilize single-modality data sources, typically imaging or genetic data. While these methods have shown promise, they often fail to capture the intricate patterns across different types of data that are indicative of AD. Multimodal learning approaches have begun to address this limitation by combining information from multiple sources, leading to improved diagnostic accuracy and robustness.

## Paper Explanation

### Proposal of the Paper

The paper proposes a novel Multimodal Alzheimer’s Disease Diagnosis framework (MADDi) that utilizes a multimodal attention-based deep learning architecture. The framework integrates imaging, genetic, and clinical data to enhance the diagnosis of Alzheimer's Disease and its precursors.

### Innovations of the Method

MADDi introduces a cross-modal attention mechanism that allows the model to effectively learn from and integrate multiple data modalities. This approach enables the model to capture interactions between different types of data, which is a significant advancement over traditional methods that typically concatenate features from separate modalities without exploring their interactions.

### Effectiveness of the Proposed Method

In the original study, MADDi achieved a state-of-the-art accuracy of 96.88% on a held-out test set for classifying control, MCI, and AD stages. This performance indicates a substantial improvement over previous models and underscores the effectiveness of leveraging cross-modal attention in multimodal learning setups.

### Contribution to the Research Regime

The paper’s contributions are significant as they address the critical challenge of integrating heterogeneous data types in a meaningful way. By demonstrating the effectiveness of cross-modal attention mechanisms, the study not only advances the field of Alzheimer’s diagnosis but also lays the groundwork for similar approaches in other complex, multimodal medical diagnostic tasks. This innovation opens new avenues for research into more effective, interpretable models that can better support clinical decision-making processes.

## Scopes of Reproducibility

### Hypothesis 1: Unimodal models provide a good baseline diagnostic accuracy.

**Hypothesis Description:**
The original paper posits that cross-modal attention significantly improves the model's ability to integrate and interpret data from different modalities (imaging, genetic, and clinical), leading to improved diagnostic accuracy for Alzheimer's disease and its precursor states. As a baseline for that claim, we test whether a single modality, such as clinical data or imaging alone, is sufficient to build an accurate model.

**Experiment to Test the Hypothesis:**
1. **Data Preparation:**
   - Utilize the same subset of the ADNI dataset as used in the original study, ensuring each subject has imaging, genetic, and clinical data available.
   - Follow the preprocessing steps described in the paper for each data modality.

2. **Model Implementation:**
   - Implement the unimodal models alone to measure their performance. Ensure that all other aspects of the model architecture remain constant to isolate the effect of the modality switch.

3. **Training:**
   - Train all models on the training subset of the ADNI dataset.

4. **Evaluation:**
   - Evaluate all models on a separate validation set from the ADNI dataset.
   - Compare the performance metrics (accuracy, F1-score, precision, recall) of the models.

5. **Analysis:**
   - Analyze the results to determine if the unimodal models are sufficient for a good diagnostic accuracy.

### Hypothesis 2: Integration of Multiple Data Modalities Leads to Higher Diagnostic Performance than Single Modality Approaches

**Hypothesis Description:**
The paper claims that the integration of multiple data modalities (imaging, genetic, clinical) using a multimodal deep learning approach significantly outperforms any single-modality model in diagnosing Alzheimer's disease stages.

**Experiment to Test the Hypothesis:**
1. **Data Preparation:**
   - Use the same instances from the ADNI dataset, ensuring availability of imaging, genetic, and clinical data.
   - Process each modality as per the guidelines detailed in the paper.

2. **Model Implementation:**
   - Develop one unimodal model for each of the three data types: imaging (using convolutional neural networks), genetic (using fully connected networks), and clinical (using fully connected networks).

3. **Training:**
   - Train all models separately on their respective datasets.

4. **Evaluation:**
   - Evaluate each model on a common validation set from the ADNI dataset.
   - Record and compare performance metrics for each model.

5. **Analysis:**
   - Determine whether the unimodal approach performs well in terms of accuracy, precision, recall, and F1-scores.

These experiments are designed to validate the central claims of the paper regarding the efficacy of multimodal and attention-based methods in improving the diagnosis of Alzheimer's disease using the ADNI dataset when compared to the baseline unimodal models.

# Methodology

## Environment

We conducted all implementations using Python version 3.11.6.

The required packages and dependencies are:

- TensorFlow
- Keras
- NumPy
- Pandas
- Matplotlib
- Seaborn
- scikit-learn
- scikit-image
- NiBabel
- os (standard)
- random (standard)
- math (standard)
- pickle (standard)
## Data

For our project, we will utilize the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset, which provides a comprehensive collection of genetic, imaging, and clinical data. Our data processing will involve several steps to ensure the data is compatible with our model requirements:

1. **Source of Data:** The data will be sourced from ADNI, specifically designed to assist in the study of Alzheimer’s disease progression. The dataset includes clinical assessments, imaging through MRI and PET scans, and genetic markers. For more information or to access the dataset, visit [ADNI's official site](https://adni.loni.usc.edu/).

2. **Statistics:** We will process and analyze over 1500 individual patient records. Each record contains multimodal data types, including demographic information, genetic markers, and imaging data.

3. **Data Processing:**
   - Clinical and genetic data will be normalized and categorized as needed.
   - Imaging data will be resized and standardized to ensure uniform input sizes for the neural network.
   - Missing values will be handled by imputation or removal, depending on their frequency and impact on the dataset’s integrity.

## Data Preprocessing

The figure linked below summarizes the data and shows which data were collected in the paper.

https://drive.google.com/file/d/1P85yipZm1QaKI1EjaAb5VOLSwE_6j-of/view?usp=sharing


### Combine all diagnoses

We combine the diagnoses from three sources (the MRI image metadata, the clinical data, and the diagnostic summary sheet) into two label sets: a ground truth containing only the cases where all sources agree, and a majority vote where at least two sources agree.
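As a rough illustration of the two labeling rules, the minimal sketch below applies them to a few hand-made label triples. The function names `strict_consensus` and `majority_vote` are hypothetical, the sample values are invented, and the handling of missing sources is simplified compared with the notebook cell that follows.

```python
from collections import Counter


def strict_consensus(labels):
    """Return the shared label if every source is present and agrees, else None."""
    if any(x is None for x in labels) or len(set(labels)) != 1:
        return None
    return labels[0]


def majority_vote(labels):
    """Return the label shared by at least two sources, else None."""
    present = [x for x in labels if x is not None]
    if not present:
        return None
    label, count = Counter(present).most_common(1)[0]
    return label if count >= 2 else None


# Invented (diagnosis sheet, image metadata, clinical) triples; None = missing.
examples = [(0, 0, 0), (1, 1, 2), (0, 1, 2), (2, None, 2)]
for triple in examples:
    print(triple, strict_consensus(triple), majority_vote(triple))
```

Under these assumptions, a subject enters the ground truth only when `strict_consensus` returns a label, while the majority-vote file keeps any subject for which two sources match.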

In [None]:
import pandas as pd
import math
clinical = pd.read_csv("data/ADSP_PHC_COGN.csv") # .rename(columns={"PHASE":"Phase"})
#this file is the metadata file that one can get from downloading MRI images from ADNI
img = pd.read_csv("data/ADNI1_Annual_2_Yr_3T_4_14_2024.csv")
comb = pd.read_csv("data/DXSUM_PDXCONV_ADNIALL.csv")[["RID", "PTID" , "PHASE"]]

def read_diagnose(file_path: str = 'data/DXSUM_PDXCONV_ADNIALL.csv', verbose=False):
    # Read diagnostic summary
    diagnostic_summary = pd.read_csv(file_path, index_col='PTID')
    diagnostic_summary = diagnostic_summary.sort_values(by=["update_stamp"], ascending=True)
    # Create dictionary
    diagnostic_dict: dict = {}
    for key, data in diagnostic_summary.iterrows():
        # Iterate for each row of the document
        phase: str = data['PHASE']
        diagnosis: float = -1.
        if phase == "ADNI1":
            diagnosis = data['DIAGNOSIS']
        elif phase == "ADNI2" or phase == "ADNIGO":
            dxchange = data['DIAGNOSIS']
            if dxchange == 1 or dxchange == 7 or dxchange == 9:
                diagnosis = 1.
            if dxchange == 2 or dxchange == 4 or dxchange == 8:
                diagnosis = 2.
            if dxchange == 3 or dxchange == 5 or dxchange == 6:
                diagnosis = 3.
        elif phase == "ADNI3":
            diagnosis = data['DIAGNOSIS']
        else:
            # print(f"ERROR: Not recognized study phase {phase}")
            # exit(1)
            pass
        # Update dictionary
        if not math.isnan(diagnosis):
            diagnostic_dict[key] = diagnosis
    if verbose:
        print_diagnostic_dict_summary(diagnostic_dict)
    return diagnostic_dict


def print_diagnostic_dict_summary(diagnostic_dict: dict):
    print(f"Number of diagnosed patients: {len(diagnostic_dict.items())}\n")
    n_NL = 0
    n_MCI = 0
    n_AD = 0
    for (key, data) in diagnostic_dict.items():
        if data == 1:
            n_NL += 1
        if data == 2:
            n_MCI += 1
        if data == 3:
            n_AD += 1
    print(f"Number of NL patients: {n_NL}\n"
          f"Number of MCI patients: {n_MCI}\n"
          f"Number of AD patients: {n_AD}\n")

d = read_diagnose()
print_diagnostic_dict_summary(d)

new = pd.DataFrame.from_dict(d, orient='index').reset_index()
clinical["year"] = clinical["EXAMDATE"].str[:4]
clinical["Subject"] = clinical["SUBJECT_KEY"].str.replace("ADNI_", "").str.replace("s", "S")
c = comb.merge(clinical, on = ["RID", "PHASE"])
c = c.drop("Subject", axis =1)
c = c.rename(columns = {"PTID":"Subject"})
img["year"] = img["Acq Date"].str[5:].str.replace("/", "")
img = img.replace(["CN", "MCI", "AD"], [ 0, 1, 2])
c["DX"] = c["DX"] -1
new[0] = new[0].astype(int) -1
new = new.rename(columns = {"index":"Subject", 0:"GroupN"})
m = new.merge(c, on = "Subject", how = "outer").merge(img, on = "Subject", how = "outer")
m[["GroupN", "DX", "Group"]]
m = m[["Subject", "GroupN", "Group", "DX", "PHASE"]].drop_duplicates()
m = m.dropna(subset = ["GroupN", "Group", "DX"], how="all").drop_duplicates()
m.loc[m["DX"].isna() & m["Group"].isna(), "Group"] = m.loc[m["DX"].isna() & m["Group"].isna(), "GroupN"]
m.loc[m["DX"].isna() & m["Group"].isna(), "DX"] = m.loc[m["DX"].isna() & m["Group"].isna(), "GroupN"]
m1 = m[m["GroupN"] == m["Group"]]
m3 = m[m["GroupN"] == m["DX"]]
m4 = m[m["Group"] == m["DX"]]
m2 = m1[m1["Group"] == m1["DX"]]
m1 = m1[["Subject", "GroupN", "Group", "DX", "PHASE"]]
m1.loc[m1["DX"].isna(), "DX"] = m1.loc[m1["DX"].isna(), "Group"]
m3 = m3[["Subject", "GroupN", "Group", "DX", "PHASE"]]
m3.loc[m3["Group"].isna(), "Group"] = m3.loc[m3["Group"].isna(), "GroupN"]
m4 = m4[["Subject", "GroupN", "Group", "DX", "PHASE"]]
m4[m4["GroupN"] != m4["DX"]]
m5 = pd.concat([m1,m3,m4])
i = m5[m5["Group"] == m5["GroupN"]]
i = i[i["Group"] == i["DX"]]
i = i.drop_duplicates()
i[["Subject", "Group", "PHASE"]].to_csv("ground_truth.csv")
m.update(m5[~m5.index.duplicated(keep='first')])
indexes = m.index
m["GROUP"] = -1

for i in indexes:
    row = m.loc[i]
    if (row["GroupN"] == row["Group"]):
        val = row["GroupN"]

        m.loc[i, "GROUP"] = val
    elif (row["GroupN"] == row["DX"]):
        val = row["GroupN"]
        m.loc[i, "GROUP"] = val

    elif (row["Group"] == row["DX"]):
        val = row["Group"]
        m.loc[i, "GROUP"] = val
m5 = m5[~m5.index.duplicated(keep='first')]
m[m["GROUP"] != -1]
m[["Subject", "GroupN", "Group", "DX", "GROUP", "PHASE"]].to_csv("diagnosis_full.csv")

Number of diagnosed patients: 3033

Number of NL patients: 1023
Number of MCI patients: 958
Number of AD patients: 879



### Create clinical dataset

We will now process the raw clinical CSV data from ADNI into a format that can be standardized with the other sources (imaging, genetic).

In [None]:
# %%
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from keras.models import Sequential
from keras.layers import Dense,Dropout,MaxPooling1D, Flatten,BatchNormalization, GaussianNoise,Conv1D
import matplotlib.pyplot as plt
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.utils import to_categorical
from sklearn.utils import compute_class_weight
from tensorflow.keras import initializers
from tensorflow.keras import regularizers
from tensorflow.keras.models import Sequential, save_model, load_model

#this was created in general/diagnosis_making notebook
diag = pd.read_csv("ground_truth.csv").drop("Unnamed: 0", axis=1)
# Below we are combining several clinical datasets.
demo = pd.read_csv("data/PTDEMOG.csv")
neuro = pd.read_csv("data/NEUROEXM.csv")


neuro.columns

clinical = pd.read_csv("data/ADSP_PHC_COGN.csv") #.rename(columns={"PHASE":"PHASE"})
clinical.head()
diag["Subject"].value_counts()

comb = pd.read_csv("data/DXSUM_PDXCONV_ADNIALL.csv")[["RID", "PTID" , "PHASE"]]
m = comb.merge(demo, on = ["RID", "PHASE"]).merge(neuro,on = ["RID", "PHASE"]).merge(clinical,on = ["RID", "PHASE"]).drop_duplicates()
m.columns = [c[:-2] if str(c).endswith(('_x','_y')) else c for c in m.columns]
m = m.loc[:,~m.columns.duplicated()]
diag = diag.rename(columns = {"Subject": "PTID"})
m = m.merge(diag, on = ["PTID", "PHASE"])
m["PTID"].value_counts()

t = m
t = t.drop(["ID",  "SITEID", "VISCODE", "VISCODE2", "USERDATE", "USERDATE2",
            "update_stamp",  "PTSOURCE","DX"], axis=1)

t.columns
t = t.fillna(-4)
t = t.replace("-4", -4)
cols_to_delete = t.columns[(t == -4).sum()/len(t) > .70]
t.drop(cols_to_delete, axis = 1, inplace = True)

len(t.columns)
print(t.columns)
print(cols_to_delete)

categorical = ['PTGENDER',
 'PTHOME',
 'PTMARRY',
 'PTEDUCAT',
 'PTPLANG',
 'NXVISUAL',
 'PTNOTRT',
 'NXTREMOR',
 'NXAUDITO',
 'PTHAND']

quant = ['PTDOBYY',
 'PHC_MEM',
 'PHC_EXF',
 'PTRACCAT',
 'AGE',
 'PTADDX',
 'PTETHCAT',
 'PTCOGBEG',
 'PHC_VSP',
 'PHC_LAN']

text = ["PTWORK", "CMMED", "PTDOB", "VISDATE"]
cols_left = list(set(t.columns) - set(categorical) - set(text)  - set(["label", "Group","GROUP", "PHASE", "RID", "PTID"]))
t[cols_left]

for col in cols_left:
    if len(t[col].value_counts()) < 10:
        print(col)
        categorical.append(col)

to_del = ["PTRTYR", "EXAMDATE", "SUBJECT_KEY"]
t = t.drop(to_del, axis=1)

quant = list(set(cols_left) - set(categorical) - set(text)  -set(to_del) - set(["label", "Group","GROUP", "PHASE", "RID", "PTID"]))
cols_left = list(set(cols_left) - set(categorical) - set(text) - set(quant) - set(to_del))

#after reviewing the meaning of each column, these are the final ones
l = ['RID', 'PTID', 'Group', 'PHASE', 'PTGENDER', 'PTDOBYY', 'PTHAND',
       'PTMARRY', 'PTEDUCAT', 'PTNOTRT', 'PTHOME', 'PTTLANG',
       'PTPLANG', 'PTCOGBEG', 'PTETHCAT', 'PTRACCAT', 'NXVISUAL',
       'NXAUDITO', 'NXTREMOR', 'NXCONSCI', 'NXNERVE', 'NXMOTOR', 'NXFINGER',
       'NXHEEL', 'NXSENSOR', 'NXTENDON', 'NXPLANTA', 'NXGAIT',
       'NXABNORM',  'PHC_MEM', 'PHC_EXF', 'PHC_LAN', 'PHC_VSP']

dfs = []

for col in categorical:
    dfs.append(pd.get_dummies(t[col], prefix = col, dtype=float))

cat = pd.concat(dfs, axis=1)
c = pd.concat([t[["PTID", "RID", "PHASE", "Group"]].reset_index(), cat.reset_index(), t[quant].reset_index()], axis=1).drop("index", axis=1) #tex
#removing repeating subjects, keeping the row with the highest diagnosis label (idxmax of Group), which is not necessarily the most recent visit
c = c.groupby('PTID',
                  group_keys=False).apply(lambda x: x.loc[x["Group"].astype(int).idxmax()]).drop("PTID", axis = 1).reset_index(inplace=False)
c.to_csv("clinical.csv")


#reading in the overlap test set
# ts = pd.read_csv("overlap_test_set.csv").rename(columns={"subject": "PTID"})
# #removing ids from the overlap test set
# c = c[~c["PTID"].isin(list(ts["PTID"].values))]

cols = list(set(c.columns) - set(["PTID","RID","subject", "ID","GROUP", "Group", "label", "PHASE", "SITEID", "VISCODE", "VISCODE2", "USERDATE", "USERDATE2", "update_stamp", "DX_x","DX_y", "Unnamed: 0"]))
X = c[cols].values
y = c["Group"].astype(int).values


X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=42)

# print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)
# print(X_train[:1])

import pickle

print("X_train shape: ", X_train.shape, "y_train shape: ", y_train.shape, "X_test shape: ", X_test.shape, "y_test shape: ", y_test.shape)

with open('X_train_c.pkl', 'wb') as f:
    pickle.dump(X_train, f)

with open('X_test_c.pkl', 'wb') as f:
    pickle.dump(X_test, f)

with open('y_train_c.pkl', 'wb') as f:
    pickle.dump(y_train, f)

with open('y_test_c.pkl', 'wb') as f:
    pickle.dump(y_test, f)


Index(['RID', 'PTID', 'PHASE', 'VISDATE', 'PTGENDER', 'PTDOB', 'PTDOBYY',
       'PTHAND', 'PTMARRY', 'PTEDUCAT', 'PTNOTRT', 'PTRTYR', 'PTHOME',
       'PTTLANG', 'PTPLANG', 'PTCOGBEG', 'PTADDX', 'PTETHCAT', 'PTRACCAT',
       'NXVISUAL', 'NXAUDITO', 'NXTREMOR', 'NXCONSCI', 'NXNERVE', 'NXMOTOR',
       'NXFINGER', 'NXHEEL', 'NXSENSOR', 'NXTENDON', 'NXPLANTA', 'NXGAIT',
       'NXOTHER', 'NXABNORM', 'SUBJECT_KEY', 'EXAMDATE', 'AGE', 'PHC_MEM',
       'PHC_EXF', 'PHC_LAN', 'PHC_VSP', 'Group'],
      dtype='object')
Index(['PTWORKHS', 'PTWORK', 'PTADBEG', 'PTIDENT', 'PTENGSPK', 'PTNLANG',
       'PTENGSPKAGE', 'PTCLANG', 'PTLANGSP', 'PTLANGWR', 'PTSPTIM',
       'PTSPOTTIM', 'PTLANGPR1', 'PTLANGSP1', 'PTLANGRD1', 'PTLANGWR1',
       'PTLANGUN1', 'PTLANGPR2', 'PTLANGSP2', 'PTLANGRD2', 'PTLANGWR2',
       'PTLANGUN2', 'PTLANGPR3', 'PTLANGSP3', 'PTLANGRD3', 'PTLANGWR3',
       'PTLANGUN3', 'PTLANGPR4', 'PTLANGSP4', 'PTLANGRD4', 'PTLANGWR4',
       'PTLANGUN4', 'PTLANGPR5', 'PTLANGSP5', 'PTLA

### Preprocess MRI images

The raw images provided by ADNI are already pre-processed with specific image correction steps. On top of these pre-processing steps, we create a full dataset that matches subjects from metadata with their MRI images that can be used for training/evaluation.
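The cell that follows reduces each 3D scan to three orthogonal center slices and scales the intensities by a high quantile. The sketch below shows just those two steps on a synthetic volume, as an illustration rather than the exact pipeline: the helper names `center_slices` and `normalize_by_quantile` are hypothetical, the random volume stands in for a real scan, and the resizing to 72×72 is omitted.

```python
import numpy as np


def center_slices(volume):
    """Return the three orthogonal center slices of a 3D volume."""
    n_i, n_j, n_k = volume.shape
    c_i, c_j, c_k = (n_i - 1) // 2, (n_j - 1) // 2, (n_k - 1) // 2
    return volume[c_i, :, :], volume[:, c_j, :], volume[:, :, c_k]


def normalize_by_quantile(arr, q=0.995):
    """Divide by the q-th quantile so that extreme bright voxels do not set the scale."""
    scale = np.quantile(arr, q)
    return arr / scale if scale > 0 else arr


# Synthetic stand-in for a loaded scan (assumption: real scans are 3D arrays).
rng = np.random.default_rng(0)
volume = rng.random((10, 12, 14))

slice_i, slice_j, slice_k = center_slices(volume)
print(slice_i.shape, slice_j.shape, slice_k.shape)

normalized = normalize_by_quantile(slice_i)
print(float(np.quantile(normalized, 0.995)))
```

Dividing by the 99.5th percentile instead of the maximum keeps a handful of outlier voxels from compressing the rest of the intensity range.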

In [None]:
import numpy as np
import skimage.transform as skTrans
import nibabel as nib
import pandas as pd
import os
import sys
import time


def normalize_img(img_array):
    maxes = np.quantile(img_array, 0.995, axis=(0, 1, 2))
    return img_array / maxes

def create_dataset(meta, meta_all, path_to_datadir):
    files = os.listdir(path_to_datadir)
    start = '_'
    end = '.nii'

    for file in files:
        if file.endswith(end):
        # if file != '.DS_Store':
            path = os.path.join(path_to_datadir, file)
            print(path)
            img_id = file.split(start)[-1].split(end)[0]
            idx = meta[meta["Image Data ID"] == img_id].index[0]
            im = nib.load(path).get_fdata()
            n_i, n_j, n_k = im.shape
            center_i = (n_i - 1) // 2
            center_j = (n_j - 1) // 2
            center_k = (n_k - 1) // 2
            im1 = skTrans.resize(im[center_i, :, :], (72, 72), order=1, preserve_range=True)
            im2 = skTrans.resize(im[:, center_j, :], (72, 72), order=1, preserve_range=True)
            im3 = skTrans.resize(im[:, :, center_k], (72, 72), order=1, preserve_range=True)
            im = np.array([im1, im2, im3]).T
            print(im.shape)
            label = meta.at[idx, "Group"]
            subject = meta.at[idx, "Subject"]
            norm_im = normalize_img(im)

            # Creating a temporary DataFrame and concatenating it
            temp_df = pd.DataFrame([{"img_array": norm_im, "label": label, "subject": subject}])
            meta_all = pd.concat([meta_all, temp_df], ignore_index=True)

    # Save the final DataFrame
    meta_all.to_pickle("mri_meta.pkl")

meta = pd.read_csv("meta.csv")
print(len(meta))
meta = meta[["Image Data ID", "Group", "Subject"]] #MCI = 0, CN =1, AD = 2
meta["Group"] = pd.factorize(meta["Group"])[0]
meta_all = pd.DataFrame(columns = ["img_array","label","subject"])
create_dataset(meta, meta_all, "imgs1/")


306
imgs1/ADNI_067_S_0290_MR_MPR__GradWarp__B1_Correction__N3__Scaled_Br_20080611160129027_S50875_I109187.nii
(72, 72, 3)
imgs1/ADNI_136_S_0184_MR_MPR____N3__Scaled_Br_20070215174801158_S12474_I40191.nii
(72, 72, 3)
imgs1/ADNI_130_S_0449_MR_MPR____N3__Scaled_Br_20071119102143821_S38619_I82686.nii
(72, 72, 3)
imgs1/ADNI_136_S_0429_MR_MPR____N3__Scaled_Br_20070215221039819_S15882_I40392.nii
(72, 72, 3)
imgs1/ADNI_023_S_0376_MR_MPR__GradWarp__B1_Correction__N3__Scaled_Br_20061201170258692_S13786_I31392.nii
(72, 72, 3)
imgs1/ADNI_130_S_0886_MR_MPR____N3__Scaled_Br_20080220160240264_S41511_I91173.nii
(72, 72, 3)
imgs1/ADNI_007_S_1206_MR_MPR__GradWarp__B1_Correction__N3__Scaled_Br_20070713113644677_S25703_I59955.nii
(72, 72, 3)
imgs1/ADNI_027_S_0307_MR_MPR__GradWarp__B1_Correction__N3__Scaled_Br_20061222185350351_S14336_I34168.nii
(72, 72, 3)
imgs1/ADNI_002_S_1018_MR_MPR____N3__Scaled_Br_20070217032215330_S24312_I40828.nii
(72, 72, 3)
imgs1/ADNI_032_S_0677_MR_MPR__GradWarp__B1_Correction__N3

### Split MRI image dataset for training

Since there can be multiple MRI images per subject, we first find the unique patient IDs, then randomly select a set of patients for the test set and keep the remaining patients for training, so that both sets can be used directly in our model.
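A subject-level split of this kind can be sketched as follows. This is a minimal illustration under stated assumptions, not the notebook's exact code: `split_by_subject` is a hypothetical helper, the records are invented, and a fixed seed is used only to make the example repeatable.

```python
import random


def split_by_subject(records, n_test_subjects, seed=0):
    """Pick test subjects first, then keep all of their scans out of training."""
    rng = random.Random(seed)
    by_subject = {}
    for record in records:
        by_subject.setdefault(record["subject"], []).append(record)

    subjects = sorted(by_subject)
    test_subjects = set(rng.sample(subjects, n_test_subjects))

    # One randomly chosen scan per test subject, so no patient repeats in the test set.
    test = [rng.choice(by_subject[s]) for s in sorted(test_subjects)]
    # Training keeps every scan of every remaining subject.
    train = [r for r in records if r["subject"] not in test_subjects]
    return train, test


# Invented records: five subjects with one to three scans each.
records = [
    {"subject": s, "scan": i}
    for s, n in [("A", 3), ("B", 1), ("C", 2), ("D", 2), ("E", 1)]
    for i in range(n)
]
train, test = split_by_subject(records, n_test_subjects=2, seed=0)
print(len(train), len(test))
```

Filtering on the subject ID, not on row positions, is what guarantees that no patient contributes scans to both sets.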

In [None]:
import pandas as pd
import random
#reading in a dataframe that contains image arrays, patient IDs ("subject"), and diagnosis
m2 = pd.read_pickle("mri_meta.pkl")

#cleaning patient IDs
m2["subject"] = m2["subject"].str.replace("s", "S").str.replace("\n", "")

#reading in the overlap test set
# ts = pd.read_csv("overlap_test_set.csv")

# #removing ids from the overlap test set
# m2 = m2[~m2["subject"].isin(list(ts["subject"].values))]


#there are 86 unique patients
subjects = list(set(m2["subject"].values))
len(subjects)

# We do not allow any repeated patients in the test set. Repetition is only allowed during training, and no patient should appear in both the training and test sets.
#selecting 9 patient IDs for the test set
picked_ids = random.sample(subjects, 9)


#creating the test set out of the patient IDs
test = pd.DataFrame(columns=["img_array", "subject", "label"])

for picked_id in picked_ids:
    # Sample a single entry where the 'subject' column matches 'picked_id'
    s = m2[m2["subject"] == picked_id].sample(n=1)
    # Concatenate the sampled DataFrame 's' with 'test'
    test = pd.concat([test, s], ignore_index=True)


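#creating the training set from every row whose subject is NOT in the test set
#(test was built with ignore_index=True, so its index no longer matches m2;
# filtering on the subject ID is what keeps test patients out of training)
train = m2[~m2["subject"].isin(picked_ids)]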


train[["img_array"]].to_pickle("img_train.pkl")
test[["img_array"]].to_pickle("img_test.pkl")

print(train[["img_array"]].shape)


train[["label"]].to_pickle("img_y_train.pkl")
test[["label"]].to_pickle("img_y_test.pkl")

(74, 1)


## Models + Training + Evaluation

Here is a [link](https://github.com/rsinghlab/MADDi) to the original paper's repository, and below is a citation to the original paper:

1.   Michal Golovanevsky, Carsten Eickhoff, Ritambhara Singh, Multimodal attention-based deep learning for Alzheimer’s disease diagnosis, Journal of the American Medical Informatics Association, Volume 29, Issue 12, December 2022, Pages 2014–2022, https://doi.org/10.1093/jamia/ocac168

### Metrics Descriptions

To evaluate performance, we use several key metrics to assess accuracy and reliability in classifying patients into one of three categories: control (no Alzheimer's disease), mild cognitive impairment (MCI; at risk but not diagnosed with Alzheimer's disease), and Alzheimer's disease. Each metric is described briefly below:

- **Test Accuracy:** This metric measures the overall effectiveness of the model in correctly predicting the categories across all test samples. It's the ratio of correct predictions to the total number of cases examined.
- **Precision:** Precision is calculated for each category and indicates the accuracy of positive predictions. It is defined as the ratio of true positives to the sum of true and false positives. This metric helps us understand the likelihood that a patient diagnosed with Alzheimer's by the model actually has the disease.
- **Recall:** Recall measures the model's ability to identify all relevant instances per category. For our case, it reflects the proportion of actual Alzheimer's patients correctly identified by the model out of all patients who actually have Alzheimer's.
- **F1-Score:** The F1-score combines precision and recall into a single metric by taking their harmonic mean. A higher F1-score indicates a more robust model.
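To make these definitions concrete, the sketch below computes them by hand for a small invented set of three-class predictions. The helper names `per_class_metrics` and `weighted_average` are hypothetical, and the labels are made up for illustration; they are not results from our models.

```python
def per_class_metrics(y_true, y_pred, labels):
    """Compute precision, recall, F1, and support for each class label."""
    metrics = {}
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        metrics[c] = {"precision": precision, "recall": recall,
                      "f1": f1, "support": tp + fn}
    return metrics


def weighted_average(metrics, key):
    """Average a per-class metric, weighting each class by its support."""
    total = sum(m["support"] for m in metrics.values())
    return sum(m[key] * m["support"] for m in metrics.values()) / total


# Invented labels: 0 = control, 1 = MCI, 2 = AD.
y_true = [0, 0, 0, 1, 1, 2, 2, 2]
y_pred = [0, 0, 1, 1, 2, 2, 2, 0]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
metrics = per_class_metrics(y_true, y_pred, labels=[0, 1, 2])

print("accuracy:", accuracy)
print("weighted precision:", weighted_average(metrics, "precision"))
print("weighted recall:", weighted_average(metrics, "recall"))
print("weighted F1:", weighted_average(metrics, "f1"))
```

One property worth noting: the support-weighted recall always equals the overall accuracy, because both count the correctly classified samples over the total number of samples.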

### Unimodal Clinical Model

**Model Description:** We use the architecture the paper describes for the clinical modality, with code adapted from MADDi. We could not use exactly the same data as the authors because of ADNI data/protocol updates, which led to slight deviations in our approach and results. For clinical data, we use a three-layer fully connected network, as the paper describes.

**Hyperparameters:** We use **100 epochs** with a **batch size of 32** and **learning rate of 0.0001** in our training. We use **3 dropout rates of 0.5, 0.3, and 0.2** in an attempt to reduce overfitting.

**Computational Requirements:** We are using the standard Google Colab provided GPU. From the results, we observe an **~0.8s** average runtime for each epoch. We only use one trial with one random seed with 100 epochs.

In [None]:
import pandas as pd
import numpy as np
import os
import random
import tensorflow as tf
from keras.models import Sequential
from keras.layers import Dense, Dropout, BatchNormalization, Input
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.utils import to_categorical
from sklearn.metrics import classification_report
import matplotlib.pyplot as plt

def reset_random_seeds(seed):
    os.environ['PYTHONHASHSEED'] = str(seed)
    tf.random.set_seed(seed)
    np.random.seed(seed)
    random.seed(seed)

def train():
    # Load the data
    X_train = pd.read_pickle("X_train_c.pkl")
    y_train = pd.read_pickle("y_train_c.pkl")
    X_test = pd.read_pickle("X_test_c.pkl")
    y_test = pd.read_pickle("y_test_c.pkl")

    # Adjust data types
    X_train = X_train.astype('float32')
    y_train = y_train.astype('int32')
    X_test = X_test.astype('float32')
    y_test = y_test.astype('int32')

    # Build the model
    model = Sequential()
    model.add(Input(shape=(113,)))  # Adjust the input shape to match the actual feature count
    model.add(Dense(128, activation="relu"))
    model.add(BatchNormalization())
    model.add(Dropout(0.5))
    model.add(Dense(64, activation="relu"))
    model.add(BatchNormalization())
    model.add(Dropout(0.3))
    model.add(Dense(50, activation="relu"))
    model.add(BatchNormalization())
    model.add(Dropout(0.2))
    model.add(Dense(3, activation="softmax"))

    # Compile the model
    model.compile(optimizer=Adam(learning_rate=0.0001), loss="sparse_categorical_crossentropy", metrics=["sparse_categorical_accuracy"])

    # Model summary
    model.summary()

    # Train the model
    history = model.fit(X_train, y_train, epochs=100, validation_split=0.1, batch_size=32, verbose=1)

    # Evaluate the model
    score = model.evaluate(X_test, y_test, verbose=0)
    print(f'Test loss: {score[0]} / Test accuracy: {score[1]}')

    # Predictions
    test_predictions = model.predict(X_test)
    test_label = to_categorical(y_test, 3)
    true_label = np.argmax(test_label, axis=1)
    predicted_label = np.argmax(test_predictions, axis=1)
    cr = classification_report(true_label, predicted_label, output_dict=True)
    print("Classification Report:", cr)

train()


Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense (Dense)               (None, 128)               14592     
                                                                 
 batch_normalization (Batch  (None, 128)               512       
 Normalization)                                                  
                                                                 
 dropout (Dropout)           (None, 128)               0         
                                                                 
 dense_1 (Dense)             (None, 64)                8256      
                                                                 
 batch_normalization_1 (Bat  (None, 64)                256       
 chNormalization)                                                
                                                                 
 dropout_1 (Dropout)         (None, 64)                0

### Evaluation Discussion for Clinical Model

With our limited data and training approach, we report an overall test accuracy of **~69.3%** across all 3 patient groups. Across these 3 groups, we obtain a weighted precision of **~73.1%**, a weighted recall of **~69.2%**, and an F1-score of **0.698**.

Our overall accuracy is roughly 11 percentage points below the authors' reported accuracy of 80.59%. This is likely because we use a smaller subset of the data than the authors' implementation.

### Unimodal MRI Imaging Model

The paper and its code use a larger (and different) subset of patient imaging data than we do. To ensure our code can run on Colab, we used only about 96 patients' worth of data. Similar to the paper, we used a three-layer CNN.

**Model Description:** We use the architecture the paper describes for the imaging modality, with code adapted from MADDi. We could not use exactly the same data as the authors because of ADNI data/protocol updates, which led to slight deviations in our approach and results. For imaging data, we use a three-layer convolutional neural network (CNN), as the paper describes.

**Hyperparameters:** We use **50 epochs** with a **batch size of 32** and a **learning rate of 0.001** in our training. We use **2 dropout rates of 0.5 and 0.3** in an attempt to reduce overfitting.

**Computational Requirements:** We are using the standard Google Colab provided GPU. From the results, we observe an average runtime of **~2.4s** per epoch. We conduct 4 trials with 4 random seeds, each with 50 epochs.

In [None]:
import os
import pickle
import random

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow import keras
from keras.models import Sequential
from keras.layers import Dense, Dropout, MaxPooling2D, Flatten, Conv2D
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.optimizers import Adam
from sklearn.metrics import classification_report


def reset_random_seeds(seed):
    os.environ['PYTHONHASHSEED']=str(seed)
    tf.random.set_seed(seed)
    np.random.seed(seed)
    random.seed(seed)


def train():

    with open("img_train.pkl", "rb") as fh:
        data = pickle.load(fh)
    X_train_ = pd.DataFrame(data)["img_array"]

    with open("img_test.pkl", "rb") as fh:
        data = pickle.load(fh)
    X_test_ = pd.DataFrame(data)["img_array"]

    with open("img_y_train.pkl", "rb") as fh:
        data = pickle.load(fh)
    y_train = np.array(pd.DataFrame(data)["label"].values.astype(np.float32)).flatten()

    with open("img_y_test.pkl", "rb") as fh:
        data = pickle.load(fh)
    y_test = np.array(pd.DataFrame(data)["label"].values.astype(np.float32)).flatten()


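    # Swap class labels 1 and 2 in the test labels, using -1 as a temporary placeholder.
    y_test[y_test == 2] = -1
    y_test[y_test == 1] = 2
    y_test[y_test == -1] = 1

    # Apply the same 1 <-> 2 swap to the training labels.
    y_train[y_train == 2] = -1
    y_train[y_train == 1] = 2
    y_train[y_train == -1] = 1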


    # Stack the per-sample image arrays into single NumPy arrays
    # (one row per sample, matching the model's 72x72x3 input shape).
    X_train = np.array(list(X_train_.values))
    X_test = np.array(list(X_test_.values))


    acc = []
    f1 = []
    precision = []
    recall = []
    seeds = [10, 20, 30, 40]
    for seed in seeds:
        reset_random_seeds(seed)
        model = Sequential()
        model.add(Conv2D(100, (3, 3),  activation='relu', input_shape=(72, 72, 3)))
        model.add(MaxPooling2D((2, 2)))
        model.add(Dropout(0.5))
        model.add(Conv2D(50, (3, 3), activation='relu'))
        model.add(MaxPooling2D((2, 2)))
        model.add(Dropout(0.3))
        model.add(Flatten())
        model.add(Dense(3, activation = "softmax"))


        model.compile(Adam(learning_rate = 0.001), "sparse_categorical_crossentropy", metrics = ["sparse_categorical_accuracy"])

        model.summary()


        history = model.fit(X_train, y_train, epochs=50, batch_size=32,validation_split=0.1, verbose=1)

        score = model.evaluate(X_test, y_test, verbose=0)
        print(f'Test loss: {score[0]} / Test accuracy: {score[1]}')
        acc.append(score[1])

        test_predictions = model.predict(X_test)
        test_label = to_categorical(y_test,3)

        true_label= np.argmax(test_label, axis =1)

        predicted_label= np.argmax(test_predictions, axis =1)

        cr = classification_report(true_label, predicted_label, output_dict=True)
        precision.append(cr["macro avg"]["precision"])
        recall.append(cr["macro avg"]["recall"])
        f1.append(cr["macro avg"]["f1-score"])

    print("Avg accuracy: " + str(np.array(acc).mean()))
    print("Avg precision: " + str(np.array(precision).mean()))
    print("Avg recall: " + str(np.array(recall).mean()))
    print("Avg f1: " + str(np.array(f1).mean()))
    print("Std accuracy: " + str(np.array(acc).std()))
    print("Std precision: " + str(np.array(precision).std()))
    print("Std recall: " + str(np.array(recall).std()))
    print("Std f1: " + str(np.array(f1).std()))
    print(acc)
    print(precision)
    print(recall)
    print(f1)

train()




Model: "sequential_4"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv2d_6 (Conv2D)           (None, 70, 70, 100)       2800      
                                                                 
 max_pooling2d_6 (MaxPoolin  (None, 35, 35, 100)       0         
 g2D)                                                            
                                                                 
 dropout_9 (Dropout)         (None, 35, 35, 100)       0         
                                                                 
 conv2d_7 (Conv2D)           (None, 33, 33, 50)        45050     
                                                                 
 max_pooling2d_7 (MaxPoolin  (None, 16, 16, 50)        0         
 g2D)                                                            
                                                                 
 dropout_10 (Dropout)        (None, 16, 16, 50)       



Model: "sequential_6"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv2d_10 (Conv2D)          (None, 70, 70, 100)       2800      
                                                                 
 max_pooling2d_10 (MaxPooli  (None, 35, 35, 100)       0         
 ng2D)                                                           
                                                                 
 dropout_13 (Dropout)        (None, 35, 35, 100)       0         
                                                                 
 conv2d_11 (Conv2D)          (None, 33, 33, 50)        45050     
                                                                 
 max_pooling2d_11 (MaxPooli  (None, 16, 16, 50)        0         
 ng2D)                                                           
                                                                 
 dropout_14 (Dropout)        (None, 16, 16, 50)       



Model: "sequential_7"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv2d_12 (Conv2D)          (None, 70, 70, 100)       2800      
                                                                 
 max_pooling2d_12 (MaxPooli  (None, 35, 35, 100)       0         
 ng2D)                                                           
                                                                 
 dropout_15 (Dropout)        (None, 35, 35, 100)       0         
                                                                 
 conv2d_13 (Conv2D)          (None, 33, 33, 50)        45050     
                                                                 
 max_pooling2d_13 (MaxPooli  (None, 16, 16, 50)        0         
 ng2D)                                                           
                                                                 
 dropout_16 (Dropout)        (None, 16, 16, 50)       
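The output shapes and parameter counts in the summaries above can be checked by hand. The sketch below recomputes them for a 72x72x3 input, assuming the Keras defaults used in our code ("valid" padding and stride 1 for `Conv2D`, pool size 2 for `MaxPooling2D`). The helper names are our own, and the flatten and dense figures are derived from the same arithmetic because they are cut off in the printed output.

```python
def conv2d_output(size, kernel=3, stride=1, padding=0):
    """Spatial output size of a convolution ("valid" padding when padding=0)."""
    return (size + 2 * padding - kernel) // stride + 1


def maxpool_output(size, pool=2):
    """Spatial output size of max pooling with stride equal to the pool size."""
    return size // pool


def conv2d_params(kernel, in_channels, filters):
    """Weights plus one bias per filter."""
    return (kernel * kernel * in_channels + 1) * filters


def dense_params(inputs, units):
    """Weights plus one bias per unit."""
    return inputs * units + units


side = 72
conv1_side = conv2d_output(side)        # first Conv2D, 100 filters
pool1_side = maxpool_output(conv1_side)
conv2_side = conv2d_output(pool1_side)  # second Conv2D, 50 filters
pool2_side = maxpool_output(conv2_side)

conv1_count = conv2d_params(3, 3, 100)
conv2_count = conv2d_params(3, 100, 50)
flat_size = pool2_side * pool2_side * 50
dense_count = dense_params(flat_size, 3)

print("spatial sizes:", conv1_side, pool1_side, conv2_side, pool2_side)
print("conv params:", conv1_count, conv2_count)
print("flattened features:", flat_size, "dense params:", dense_count)
```

The spatial sizes (70, 35, 33, 16) and the convolutional parameter counts (2800 and 45050) match the printed summaries.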

### Evaluation Discussion for Imaging Model

With our limited data and training approach, we report an overall test accuracy of **~94.4%** across all 3 patient groups. Across these 3 groups, we obtain a macro-averaged precision of **~96.6%**, a macro-averaged recall of **~95.8%**, and a macro-averaged F1-score of **0.95**.

Contrary to what we expected, our overall accuracy ended up roughly 2 percentage points higher than the accuracy the authors reported. We believe this is also because we used a different and smaller subset of the data, which likely led our model to overfit somewhat and may contribute to the higher accuracy.
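One way to probe the overfitting hypothesis is to compare training and validation accuracy per epoch. The sketch below assumes a dictionary shaped like the `history.history` attribute returned by Keras `model.fit` when `validation_split` is set, with keys named after the `sparse_categorical_accuracy` metric we compile with. The numbers in `example_history` are made up for illustration and are not results from our runs; `generalization_gap` is a hypothetical helper.

```python
def generalization_gap(history, metric="sparse_categorical_accuracy"):
    """Per-epoch difference between training and validation values of a metric."""
    train_values = history[metric]
    val_values = history["val_" + metric]
    return [t - v for t, v in zip(train_values, val_values)]


# Hypothetical values for illustration only; NOT measurements from our model.
example_history = {
    "sparse_categorical_accuracy": [0.60, 0.80, 0.95, 0.99],
    "val_sparse_categorical_accuracy": [0.58, 0.75, 0.80, 0.79],
}

gaps = generalization_gap(example_history)
for epoch, gap in enumerate(gaps, start=1):
    print(f"epoch {epoch}: train - val accuracy = {gap:.2f}")
```

A gap that keeps widening while validation accuracy plateaus would be consistent with overfitting; a small, stable gap would suggest looking for another explanation.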

# Results
We used the training routine from the MADDi repository to assess each individual modality and the collective model. The report with key metrics is also saved.

## Plans

For testing hypothesis 1, we will implement the models as described in the paper. We can then train and evaluate them against each other.

For testing hypothesis 2, we were unable to get a multimodal model working with adequate accuracy. The issue is outlined further in the Discussion section. Since a working multimodal model is crucial to evaluating hypothesis 2, we instead present the paper's results on this matter and discuss how they would translate to our own implementations.

### Results for the clinical and imaging models, respectively (explained above and discussed below):

https://drive.google.com/file/d/1--oRuwYDjus_LzwB42XJ02O1r8dw4nig/view?usp=sharing

https://drive.google.com/file/d/18cF8Nd-71RI6nWlgQmKs8-WlmnPwCrof/view?usp=drive_link

## Ablation studies

In the paper, ablation studies were conducted to evaluate the importance of the attention mechanisms in the proposed model architecture for diagnosing Alzheimer's disease. The attention mechanisms in question are self-attention and cross-modal attention. These attention modules are responsible for capturing intermodal interactions and highlighting relevant features for the model's decision-making.

By systematically disabling or including different attention mechanisms in the model, the authors were able to assess how each component contributes to the model's overall performance in AD diagnosis. The evaluation metrics used in the ablation studies include accuracy, precision, recall, and F1-score, which provide insight into the model's classification performance across the different conditions.
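The idea of systematically disabling or including attention mechanisms can be sketched as an enumeration of on/off configurations. The module names and the `ablation_configs` helper below are hypothetical and do not correspond to the MADDi repository's API; training and evaluating a model for each configuration is deliberately left out.

```python
from itertools import product

# Hypothetical flag names for the two attention mechanisms discussed above.
ATTENTION_MODULES = ("self_attention", "cross_modal_attention")


def ablation_configs(modules=ATTENTION_MODULES):
    """Enumerate every on/off combination of the given attention modules."""
    return [
        dict(zip(modules, flags))
        for flags in product([False, True], repeat=len(modules))
    ]


configs = ablation_configs()
for config in configs:
    enabled = [name for name, on in config.items() if on] or ["no attention"]
    print("variant with:", ", ".join(enabled))
```

Each configuration would then be trained and scored with the same metrics (accuracy, precision, recall, F1-score) so the variants can be compared directly.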



# Discussion



>The paper presents a detailed methodology for diagnosing Alzheimer's Disease using a multimodal deep learning approach, specifically introducing the MADDi framework. Overall, the methodology provides clear steps for data processing, model creation, and evaluation, which enhances the reproducibility of the research. The paper specifies the source of the data used in the research, which is the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset. Additionally, it provides guidelines for accessing the dataset, ensuring that other researchers can obtain the same data for replication purposes. The paper includes code snippets for various aspects of the research, such as data preprocessing, model creation, and evaluation. These code snippets serve as a reference for implementing the methodology and conducting the experiments, making it easier for other researchers to replicate the analysis.

>The process of reproducing the paper had both straightforward aspects and challenges. The clarity of the methodology and the availability of code snippets facilitated the implementation of the data processing steps and model creation. However, adapting the code to specific dataset characteristics and ensuring consistency with the paper's experimental setup posed challenges. Moreover, interpreting and comparing results required careful attention to detail and a thorough understanding of the proposed methodology. Specifically, reproducing the multimodal model's methodology proved to be quite difficult. Due to some data constraints, we were unable to get a multimodal model working with adequate accuracy (https://github.com/rsinghlab/MADDi/issues/17).

>To enhance reproducibility, future authors could provide more detailed documentation of dataset characteristics, preprocessing steps, and model hyperparameters. Standardizing the evaluation metrics and providing comprehensive explanations of experimental choices would also aid in replicating the research.

# References

1.   Michal Golovanevsky, Carsten Eickhoff, Ritambhara Singh, Multimodal attention-based deep learning for Alzheimer’s disease diagnosis, Journal of the American Medical Informatics Association, Volume 29, Issue 12, December 2022, Pages 2014–2022, https://doi.org/10.1093/jamia/ocac168
