# COGS 138 Final Project

## Names

- Fatima Enriquez
- Ashley Chavarria
- Tiffany Gunawan
- Hiroki Ito

## Research Question

#### How does the pattern of brain atrophy differ between patients with Alzheimer’s disease and healthy controls, and can these differences be used to prototype a model that predicts disease progression?

## Background & Prior Work

Our group chose to focus on brain atrophy in Alzheimer’s patients because we see a need for better diagnostic and prognostic tools for managing this debilitating condition. Alzheimer's is a very common type of dementia with progressive symptoms, starting with mild memory loss and possibly leading to the loss of the ability to carry on a conversation and respond to one's surroundings. According to the CDC, scientists do not fully know what causes Alzheimer’s disease, but multiple factors can affect each person differently: age, family history, changes in the brain, and possibly education, diet, and environment. Most of what is known concerns the symptoms of Alzheimer’s: memory problems are typically the first warning sign, followed by difficulty completing familiar tasks, misplacing things, and changes in mood and behavior. [(CDC, 2020)](https://www.cdc.gov/aging/aginginfo/alzheimers.htm)

No single test can determine whether a person is living with Alzheimer’s. Doctors combine medical history, neurological exams, cognitive assessments, brain imaging, and blood tests to reach an accurate diagnosis. More recently, biomarkers have been used to help diagnose Alzheimer’s disease, particularly through brain imaging. According to the National Institute on Aging, CT, MRI, and PET scans allow doctors to see different factors that may help in diagnosis. [(NIA, 2022)](https://www.nia.nih.gov/health/alzheimers-symptoms-and-diagnosis/how-biomarkers-help-diagnose-dementia#types_biomarkers_tests) Our group chose to focus on MRI for this project because of its versatility.
    
Magnetic resonance imaging is a noninvasive technique that uses magnetic fields and radio waves to produce detailed images of body structures. Like CT scans, MRIs can show areas of the brain that have shrunk. Repeated MRIs can also show how a person’s brain changes over time, which may provide evidence of shrinkage and can be used in many diagnoses. [(Johns Hopkins Medicine, 2024)](https://www.hopkinsmedicine.org/health/treatment-tests-and-therapies/magnetic-resonance-imaging-mri) This led our group to the following reasoning: if MRIs can reveal shrinkage in the brain and, according to the NIA, Alzheimer’s causes neuronal death that leads to tissue loss and shrinkage, then MRI could possibly be used as a predictive method for Alzheimer's disease.

## Installations

In [36]:
#pip install Pillow

In [35]:
pip install opencv-python

Defaulting to user installation because normal site-packages is not writeable
Note: you may need to restart the kernel to use updated packages.


## Imports: 

In [37]:
import glob
import imghdr  # deprecated since Python 3.11 and removed in Python 3.13
import os
from pathlib import Path

import cv2
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import sklearn
from PIL import Image

## Data Overview: 

The dataset below is exactly the same as the OASIS datasets...

In [38]:
#dementia_dataset=pd.read_csv('dementia_dataset.csv') 
#dementia_dataset=dementia_dataset.drop(columns=['Visit', 'Hand', 'MR Delay', 'SES', 'MMSE', 'EDUC', 'CDR', 'ASF'])
#dementia_dataset=dementia_dataset.sort_values('nWBV', ascending=False)

#### Dataset #1
- Name: Oasis_cross_sectional
- Link to dataset: https://sites.wustl.edu/oasisbrains/home/oasis-1/
- Number of observations: 436
- Number of variables: 12
- Variables of interest: gender (M/F), age (Age), volume (eTIV, nWBV)

In [56]:
oasis_cross_sectional=pd.read_csv('oasis_cross-sectional.csv')

In [68]:
oasis_cross_sectional.head()

Unnamed: 0,ID,M/F,Age,eTIV,nWBV
22,OAS1_0025_MR1,F,24,1240,0.893
44,OAS1_0049_MR1,F,20,1329,0.887
117,OAS1_0126_MR1,M,21,1582,0.885
408,OAS1_0450_MR1,M,19,1478,0.88
296,OAS1_0328_MR1,M,19,1453,0.878


In [58]:
oasis_cross_sectional.shape

(436, 12)

In [59]:
oasis_cross_sectional.dtypes

ID        object
M/F       object
Hand      object
Age        int64
Educ     float64
SES      float64
MMSE     float64
CDR      float64
eTIV       int64
nWBV     float64
ASF      float64
Delay    float64
dtype: object

In [60]:
oasis_cross_sectional.describe()

Unnamed: 0,Age,Educ,SES,MMSE,CDR,eTIV,nWBV,ASF,Delay
count,436.0,235.0,216.0,235.0,235.0,436.0,436.0,436.0,20.0
mean,51.357798,3.178723,2.490741,27.06383,0.285106,1481.919725,0.79167,1.198894,20.55
std,25.269862,1.31151,1.120593,3.69687,0.383405,158.740866,0.059937,0.128682,23.86249
min,18.0,1.0,1.0,14.0,0.0,1123.0,0.644,0.881,1.0
25%,23.0,2.0,2.0,26.0,0.0,1367.75,0.74275,1.11175,2.75
50%,54.0,3.0,2.0,29.0,0.0,1475.5,0.809,1.19,11.0
75%,74.0,4.0,3.0,30.0,0.5,1579.25,0.842,1.28425,30.75
max,96.0,5.0,5.0,30.0,2.0,1992.0,0.893,1.563,89.0


In [61]:
oasis_cross_sectional=oasis_cross_sectional.drop(columns=['Hand', 'Educ', 'SES', 'MMSE', 'CDR', 'ASF', 'Delay'])
oasis_cross_sectional=oasis_cross_sectional.sort_values('nWBV', ascending=False)
oasis_cross_sectional.head()

Unnamed: 0,ID,M/F,Age,eTIV,nWBV
22,OAS1_0025_MR1,F,24,1240,0.893
44,OAS1_0049_MR1,F,20,1329,0.887
117,OAS1_0126_MR1,M,21,1582,0.885
408,OAS1_0450_MR1,M,19,1478,0.88
296,OAS1_0328_MR1,M,19,1453,0.878


#### Dataset #2
- Name: Oasis_longitudinal_demographics
- Link to dataset: https://sites.wustl.edu/oasisbrains/home/oasis-2/
- Number of observations: 373
- Number of variables: 15
- Variables of interest: demented/nondemented (Group), gender (M/F), age (Age), volume (eTIV, nWBV)

In [63]:
oasis_longitudinal_demographics=pd.read_csv('oasis_longitudinal_demographics.csv')

In [69]:
oasis_longitudinal_demographics.head()

Unnamed: 0,Subject ID,MRI ID,Group,M/F,Age,eTIV,nWBV
116,OAS2_0055,OAS2_0055_MR1,Nondemented,M,65,1362,0.837
117,OAS2_0055,OAS2_0055_MR2,Nondemented,M,67,1365,0.827
55,OAS2_0030,OAS2_0030_MR1,Nondemented,F,60,1402,0.822
290,OAS2_0142,OAS2_0142_MR1,Nondemented,F,69,1380,0.819
56,OAS2_0030,OAS2_0030_MR2,Nondemented,F,62,1392,0.817


In [64]:
oasis_longitudinal_demographics.shape

(373, 15)

In [65]:
oasis_longitudinal_demographics.dtypes

Subject ID     object
MRI ID         object
Group          object
Visit           int64
MR Delay        int64
M/F            object
Hand           object
Age             int64
EDUC            int64
SES           float64
MMSE          float64
CDR           float64
eTIV            int64
nWBV          float64
ASF           float64
dtype: object

In [66]:
oasis_longitudinal_demographics.describe()

Unnamed: 0,Visit,MR Delay,Age,EDUC,SES,MMSE,CDR,eTIV,nWBV,ASF
count,373.0,373.0,373.0,373.0,354.0,371.0,373.0,373.0,373.0,373.0
mean,1.882038,595.104558,77.013405,14.597855,2.460452,27.342318,0.290885,1488.128686,0.729568,1.195461
std,0.922843,635.485118,7.640957,2.876339,1.134005,3.683244,0.374557,176.139286,0.037135,0.138092
min,1.0,0.0,60.0,6.0,1.0,4.0,0.0,1106.0,0.644,0.876
25%,1.0,0.0,71.0,12.0,2.0,27.0,0.0,1357.0,0.7,1.099
50%,2.0,552.0,77.0,15.0,2.0,29.0,0.0,1470.0,0.729,1.194
75%,2.0,873.0,82.0,16.0,3.0,30.0,0.5,1597.0,0.756,1.293
max,5.0,2639.0,98.0,23.0,5.0,30.0,2.0,2004.0,0.837,1.587


In [67]:
oasis_longitudinal_demographics=oasis_longitudinal_demographics.drop(columns=['Visit', 'MR Delay', 'Hand', 'EDUC', 'SES', 'MMSE', 'CDR', 'ASF'])
oasis_longitudinal_demographics=oasis_longitudinal_demographics.sort_values('nWBV', ascending=False)
oasis_longitudinal_demographics.head()

Unnamed: 0,Subject ID,MRI ID,Group,M/F,Age,eTIV,nWBV
116,OAS2_0055,OAS2_0055_MR1,Nondemented,M,65,1362,0.837
117,OAS2_0055,OAS2_0055_MR2,Nondemented,M,67,1365,0.827
55,OAS2_0030,OAS2_0030_MR1,Nondemented,F,60,1402,0.822
290,OAS2_0142,OAS2_0142_MR1,Nondemented,F,69,1380,0.819
56,OAS2_0030,OAS2_0030_MR2,Nondemented,F,62,1392,0.817
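
Because the longitudinal dataset contains repeated scans of the same subject, one possible way to quantify atrophy is the change in nWBV per year between a subject's first and last scan. The block below is only a minimal sketch of that idea, not part of our analysis pipeline: it uses the four rows for subjects OAS2_0055 and OAS2_0030 shown in the table above, and the function name `nwbv_change_per_year` is a hypothetical helper introduced here for illustration. It also assumes that age in whole years is an acceptable proxy for the time between scans.

```python
import pandas as pd

# Four rows copied from the head() output above (two scans each for two subjects)
rows = pd.DataFrame({
    "Subject ID": ["OAS2_0055", "OAS2_0055", "OAS2_0030", "OAS2_0030"],
    "Age": [65, 67, 60, 62],
    "nWBV": [0.837, 0.827, 0.822, 0.817],
})


def nwbv_change_per_year(df):
    """Hypothetical helper: nWBV change per year between first and last scan."""
    # Order scans by age within each subject so first()/last() are chronological
    ordered = df.sort_values(["Subject ID", "Age"])
    grouped = ordered.groupby("Subject ID")[["Age", "nWBV"]]
    first = grouped.first()
    last = grouped.last()

    years = last["Age"] - first["Age"]
    change = last["nWBV"] - first["nWBV"]

    # Subjects with no measurable age gap would divide by zero, so drop them
    valid = years > 0
    return (change[valid] / years[valid]).rename("nWBV_change_per_year")


rates = nwbv_change_per_year(rows)
print(rates)
```

A negative value means the normalized brain volume decreased between scans. The same function could be called on `oasis_longitudinal_demographics`, since that frame still has the `Subject ID`, `Age`, and `nWBV` columns after the drop above.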


In [45]:
## derived from -- https://medium.com/@mangesh8374/working-with-image-dataset-to-build-cnn-model-in-tensorflow-f3dba0f72bfa

# Raw string so the backslashes in the Windows path are not treated as escape sequences
data_dir = r'D:\Millet Classification\Millet_Dataset'
file_extensions = [".png", ".jpg", ".jpeg"]  # image file extensions from the downloaded images
file_types_accepted_by_tf = ["bmp", "gif", "jpeg", "png"]  # image types accepted by TensorFlow

# Report files whose actual content is not an accepted image type
for filepath in Path(data_dir).rglob("*"):
    if filepath.suffix.lower() in file_extensions:
        img_type = imghdr.what(filepath)
        if img_type is None:
            print(f"{filepath} is not an image")
        elif img_type not in file_types_accepted_by_tf:
            print(f"{filepath} is a {img_type}, not accepted by TensorFlow")


def jpg_to_jpeg(data_dir):
    """Re-save every .jpg/.JPG file in each subfolder of data_dir as .jpeg."""
    for dir_name in os.listdir(data_dir):
        class_dir = os.path.join(data_dir, dir_name)
        if not os.path.isdir(class_dir):
            continue  # skip stray files at the top level
        for file_name in os.listdir(class_dir):
            file_path = os.path.join(class_dir, file_name)
            if file_path.endswith((".jpg", ".JPG")):
                img = cv2.imread(file_path)
                if img is None:
                    continue  # unreadable image: leave the original file in place
                cv2.imwrite(file_path[:-4] + ".jpeg", img)
                os.remove(file_path)
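
The file-type check above relies on `imghdr`, which was deprecated in Python 3.11 and removed in Python 3.13. As a possible alternative, the sketch below performs a similar check with Pillow, which this notebook already imports. It assumes that Pillow's `format` attribute (for example `"PNG"` or `"JPEG"`) is an adequate stand-in for the `imghdr` type names. The helper names `detect_format` and `report_unusable_images` are hypothetical and are not from the tutorial.

```python
from pathlib import Path

from PIL import Image

# Pillow format names matching the types the tutorial lists as accepted by TensorFlow
ACCEPTED_FORMATS = {"BMP", "GIF", "JPEG", "PNG"}


def detect_format(filepath):
    """Return Pillow's format name for the file, or None if it cannot be read as an image."""
    try:
        with Image.open(filepath) as img:
            return img.format
    except OSError:  # Pillow's UnidentifiedImageError is a subclass of OSError
        return None


def report_unusable_images(data_dir, extensions=(".png", ".jpg", ".jpeg")):
    """Hypothetical helper: list (path, reason) pairs for files that fail the check."""
    problems = []
    for filepath in Path(data_dir).rglob("*"):
        if filepath.suffix.lower() in extensions:
            fmt = detect_format(filepath)
            if fmt is None:
                problems.append((filepath, "not an image"))
            elif fmt not in ACCEPTED_FORMATS:
                problems.append((filepath, f"{fmt} is not an accepted format"))
    return problems
```

Returning a list instead of printing makes it easier to count or filter the problem files before deciding what to delete or convert.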

##### source: 
https://stackoverflow.com/questions/51178166/iterate-through-folder-with-pillow-image-open

Save the cell below for when the images have finished uploading.

In [30]:
#directory_path_1='oasis_cross_section.csv'
#for filename in os.listdir(directory_path_1):
#    if filename.endswith('.jpg'):
#        print(filename)
        
#directory_path_2='oasis_longitudinal_demographics.csv'
#for filename in os.listdir(directory_path_2):
#    if filename.endswith('.jpg'):
#        print(filename)

##### source: 
https://stackoverflow.com/questions/51178166/iterate-through-folder-with-pillow-image-open

!! CAUTION !!

Removing the `#` in front of `img.show()` will open every image in the folder in a separate viewer window.

In [None]:
images = glob.glob("Datasets/Moderate_Demented/*.jpg")
for image in images:
    with open(image, 'rb') as file:
        img = Image.open(file)
        # img.show()
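
To inspect the images without opening a viewer window for each file, one option is to read them into NumPy arrays and optionally cap how many are loaded. The block below is a minimal sketch under the assumption that grayscale pixel arrays are what later steps would need. The function name `load_grayscale_images` and its `limit` parameter are hypothetical and introduced here only for illustration.

```python
import glob

import numpy as np
from PIL import Image


def load_grayscale_images(pattern, limit=None):
    """Hypothetical helper: load files matching a glob pattern as 2-D uint8 arrays."""
    paths = sorted(glob.glob(pattern))
    if limit is not None:
        paths = paths[:limit]  # cap the number of files that are read

    arrays = []
    for path in paths:
        with Image.open(path) as img:
            # convert("L") yields a single-channel 8-bit grayscale image
            arrays.append(np.asarray(img.convert("L")))
    return arrays
```

For example, `load_grayscale_images("Datasets/Moderate_Demented/*.jpg", limit=5)` would read at most five images from the folder used above, and each array's `.shape` gives the image height and width.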
