# Introduction to the Dataset
## What is Dementia?
Dementia isn't a single disease. It is a term for the symptoms that occur when brain function declines enough to interfere with a person's daily life and activities.
Dementia is caused when the brain is damaged by diseases, such as Alzheimer’s disease or a series of strokes. Alzheimer’s disease is the most common cause of dementia, but not the only one.
There is no one test to determine if someone has dementia. Doctors diagnose Alzheimer's and other types of dementia based on a careful medical history, a physical examination, laboratory tests, and the characteristic changes in thinking, day-to-day function and behavior associated with each type.

## Longitudinal MRI Data in Nondemented and Demented Older Adults
The Open Access Series of Imaging Studies (OASIS) is a project aimed at making MRI data sets of the brain freely available to the scientific community. By compiling and freely distributing MRI data sets, we hope to facilitate future discoveries in basic and clinical neuroscience. OASIS is made available by the Washington University Alzheimer’s Disease Research Center, Dr. Randy Buckner at the Howard Hughes Medical Institute (HHMI) at Harvard University, the Neuroinformatics Research Group (NRG) at Washington University School of Medicine, and the Biomedical Informatics Research Network (BIRN).

This dataset consists of longitudinal MRI data of people aged 60 to 96 including both men and women. Everyone was right-handed and scanned at least once. Some subjects were grouped as 'Demented' at the time of their initial visits and remained so throughout the study. The subjects grouped as 'Nondemented' at the time of their initial visit and subsequently characterized as 'Demented' at a later visit fall under the 'Converted' category.


Source: https://www.kaggle.com/jboysen/mri-and-alzheimers

# Research Questions
**1. Does age influence the likelihood of developing dementia?**

**2. What's the correlation between educational level and the probability of having dementia?**

**3. Does dementia affect the derived anatomical volumes of the brain (estimated total intracranial volume and normalized whole-brain volume)?**

# Step 1: Explain the Data

In [2]:
import pandas as pd
import altair as alt

In [3]:
df = pd.read_csv('oasis_longitudinal.csv')
df.head()

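| | Subject ID | MRI ID | Group | Visit | MR Delay | M/F | Hand | Age | EDUC | SES | MMSE | CDR | eTIV | nWBV | ASF |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | OAS2_0001 | OAS2_0001_MR1 | Nondemented | 1 | 0 | M | R | 87 | 14 | 2.0 | 27.0 | 0.0 | 1987 | 0.696 | 0.883 |
| 1 | OAS2_0001 | OAS2_0001_MR2 | Nondemented | 2 | 457 | M | R | 88 | 14 | 2.0 | 30.0 | 0.0 | 2004 | 0.681 | 0.876 |
| 2 | OAS2_0002 | OAS2_0002_MR1 | Demented | 1 | 0 | M | R | 75 | 12 | NaN | 23.0 | 0.5 | 1678 | 0.736 | 1.046 |
| 3 | OAS2_0002 | OAS2_0002_MR2 | Demented | 2 | 560 | M | R | 76 | 12 | NaN | 28.0 | 0.5 | 1738 | 0.713 | 1.01 |
| 4 | OAS2_0002 | OAS2_0002_MR3 | Demented | 3 | 1895 | M | R | 80 | 12 | NaN | 22.0 | 0.5 | 1698 | 0.701 | 1.034 |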


In [4]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 373 entries, 0 to 372
Data columns (total 15 columns):
Subject ID    373 non-null object
MRI ID        373 non-null object
Group         373 non-null object
Visit         373 non-null int64
MR Delay      373 non-null int64
M/F           373 non-null object
Hand          373 non-null object
Age           373 non-null int64
EDUC          373 non-null int64
SES           354 non-null float64
MMSE          371 non-null float64
CDR           373 non-null float64
eTIV          373 non-null int64
nWBV          373 non-null float64
ASF           373 non-null float64
dtypes: float64(5), int64(5), object(5)
memory usage: 43.8+ KB
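The `df.info()` output shows fewer non-null entries for SES (354) and MMSE (371) than for the other columns. A minimal sketch of how those gaps could be counted and located is below. It assumes a small frame called `sample` (a name introduced here for illustration) that re-types a few columns from the five rows printed by `df.head()`, so the snippet runs without the CSV file.

```python
import pandas as pd

# Three columns from the five rows printed by df.head(), re-typed so the
# snippet runs on its own. `sample` is a stand-in name, not the full df.
sample = pd.DataFrame({
    "Subject ID": ["OAS2_0001", "OAS2_0001", "OAS2_0002", "OAS2_0002", "OAS2_0002"],
    "SES": [2.0, 2.0, None, None, None],
    "MMSE": [27.0, 30.0, 23.0, 28.0, 22.0],
})

# Count missing values per column.
missing_per_column = sample.isna().sum()
print(missing_per_column)

# Rows with at least one missing value, to see which subjects are affected.
rows_with_gaps = sample[sample.isna().any(axis=1)]
print(rows_with_gaps["Subject ID"].unique())
```

Run against the full `df`, the same `isna().sum()` call should report 19 missing SES values and 2 missing MMSE values, matching the non-null counts above.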


## **What the variables stand for:**

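Subject ID
<br>
MRI ID
<br>
Group (Converted / Demented / Nondemented)
<br>
Visit - Visit number (1 for a subject's first scan, 2 for the second, and so on)
<br>
MR Delay - Delay since the subject's first scan (0 at the first visit; the values in the table above are consistent with days)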

**Demographics Info**

M/F - Gender
<br>
Hand - Handedness
<br>
Age
<br>
EDUC - Years of Education
<br>
SES - Socioeconomic Status as assessed by the Hollingshead Index of Social Position and classified into categories from 1 (highest status) to 5 (lowest status)

**Clinical Info**

MMSE - Mini-Mental State Examination score
- A 30-point questionnaire used extensively in clinical and research settings to measure cognitive impairment.
- Normal cognition (24–30 points); mild (19–23 points), moderate (10–18 points), or severe (≤9 points) cognitive impairment.
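
The MMSE bands above can be turned into a categorical column. The sketch below is one way to do it with `pd.cut`; the label strings and the variable names `mmse` and `bands` are my own choices, and the scores are the five MMSE values from `df.head()` plus a score of 4 to exercise the lowest band.

```python
import pandas as pd

# MMSE scores from the five rows shown by df.head(), plus a low score (4).
mmse = pd.Series([27.0, 30.0, 23.0, 28.0, 22.0, 4.0])

# Right-closed bins: (-1, 9] severe, (9, 18] moderate, (18, 23] mild, (23, 30] normal.
bands = pd.cut(
    mmse,
    bins=[-1, 9, 18, 23, 30],
    labels=["severe", "moderate", "mild", "normal"],
)
print(bands.tolist())
```

Missing MMSE values stay missing after `pd.cut`, so the two rows without a score would not be assigned to any band.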


CDR - Clinical Dementia Rating
- 0 = Normal
- 0.5 = Very Mild Dementia
- 1 = Mild Dementia
- 2 = Moderate Dementia
- 3 = Severe Dementia
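
For charts and tables it can help to show these labels instead of the numeric codes. A small sketch, assuming a lookup dictionary named `CDR_LABELS` (a name introduced here, not part of the dataset):

```python
import pandas as pd

# Lookup table built from the CDR scale listed above.
CDR_LABELS = {
    0.0: "Normal",
    0.5: "Very Mild Dementia",
    1.0: "Mild Dementia",
    2.0: "Moderate Dementia",
    3.0: "Severe Dementia",
}

# Example CDR codes; Series.map replaces each code with its label.
cdr = pd.Series([0.0, 0.5, 1.0, 2.0])
labels = cdr.map(CDR_LABELS)
print(labels.tolist())
```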

**Derived Anatomic Volumes**

eTIV - Estimated total intracranial volume, cm³
<br>
nWBV - Normalized whole-brain volume, expressed as the fraction (between 0 and 1) of all voxels in the atlas-masked image that are labeled as gray or white matter by the automated tissue segmentation process
<br>
ASF - Atlas scaling factor (unitless). Computed scaling factor that transforms native-space brain and skull to the atlas target (i.e., the determinant of the transform matrix)
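
In the five rows printed by `df.head()`, eTIV and ASF look inversely related: multiplying them gives nearly the same number each time. The sketch below checks this on those rows only. It is an observation from the sample, not a formula taken from the dataset documentation, and `sample` and `product` are names introduced here.

```python
import pandas as pd

# eTIV and ASF from the five rows printed by df.head().
sample = pd.DataFrame({
    "eTIV": [1987, 2004, 1678, 1738, 1698],
    "ASF": [0.883, 0.876, 1.046, 1.01, 1.034],
})

# If ASF rescales each head to a common atlas size, eTIV * ASF should be
# roughly constant across scans.
product = sample["eTIV"] * sample["ASF"]
print(product.round(1).tolist())
print(product.max() - product.min())
```

If the product is also near-constant on the full `df`, ASF carries essentially the same information as eTIV, which is worth knowing before using both in one analysis.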

# Step 2: Explore, Clean and Visualize the data

In [5]:
df.describe()

| | Visit | MR Delay | Age | EDUC | SES | MMSE | CDR | eTIV | nWBV | ASF |
|---|---|---|---|---|---|---|---|---|---|---|
| count | 373.0 | 373.0 | 373.0 | 373.0 | 354.0 | 371.0 | 373.0 | 373.0 | 373.0 | 373.0 |
| mean | 1.882038 | 595.104558 | 77.013405 | 14.597855 | 2.460452 | 27.342318 | 0.290885 | 1488.128686 | 0.729568 | 1.195461 |
| std | 0.922843 | 635.485118 | 7.640957 | 2.876339 | 1.134005 | 3.683244 | 0.374557 | 176.139286 | 0.037135 | 0.138092 |
| min | 1.0 | 0.0 | 60.0 | 6.0 | 1.0 | 4.0 | 0.0 | 1106.0 | 0.644 | 0.876 |
| 25% | 1.0 | 0.0 | 71.0 | 12.0 | 2.0 | 27.0 | 0.0 | 1357.0 | 0.7 | 1.099 |
| 50% | 2.0 | 552.0 | 77.0 | 15.0 | 2.0 | 29.0 | 0.0 | 1470.0 | 0.729 | 1.194 |
| 75% | 2.0 | 873.0 | 82.0 | 16.0 | 3.0 | 30.0 | 0.5 | 1597.0 | 0.756 | 1.293 |
| max | 5.0 | 2639.0 | 98.0 | 23.0 | 5.0 | 30.0 | 2.0 | 2004.0 | 0.837 | 1.587 |


In [6]:
df['Group'].value_counts()

Nondemented    190
Demented       146
Converted       37
Name: Group, dtype: int64

Since knowing whether a person converted to dementia during the study is irrelevant to my research questions, I will relabel every 'Converted' value in Group as 'Demented'.

In [7]:
df['Group'] = df['Group'].replace(['Converted'], ['Demented'])

In [8]:
df['Group'].value_counts()

Nondemented    190
Demented       183
Name: Group, dtype: int64
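
Each row of the table is one MRI scan, and a subject can appear in several rows, so `value_counts()` counts scans, not people. A minimal sketch of counting subjects per group is below. It uses the five rows from `df.head()`, re-typed into a frame called `sample` (a name introduced here) so it runs without the CSV file.

```python
import pandas as pd

# Subject ID and Group from the five rows printed by df.head().
sample = pd.DataFrame({
    "Subject ID": ["OAS2_0001", "OAS2_0001", "OAS2_0002", "OAS2_0002", "OAS2_0002"],
    "Group": ["Nondemented", "Nondemented", "Demented", "Demented", "Demented"],
})

# Scans per group: every row counts once.
scans_per_group = sample["Group"].value_counts()

# Subjects per group: each Subject ID counts once within its group.
subjects_per_group = sample.groupby("Group")["Subject ID"].nunique()

print(scans_per_group)
print(subjects_per_group)
```

In these five rows there are three 'Demented' scans but only one 'Demented' subject, which shows how far the two counts can drift apart.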

In [9]:
df['M/F'].value_counts()

F    213
M    160
Name: M/F, dtype: int64

## What does the distribution of age of people with dementia look like?

In [10]:
alt.Chart(df[df['Group'] == 'Demented']).mark_bar().encode(x=alt.X('Age', bin=True), y='count()')

## How does the age distribution change according to the level of dementia?
The level of dementia is measured by Clinical Dementia Rating (CDR):
- 0 = Normal
- 0.5 = Very Mild Dementia
- 1 = Mild Dementia
- 2 = Moderate Dementia
- 3 = Severe Dementia

In [11]:
df['CDR'].value_counts()

0.0    206
0.5    123
1.0     41
2.0      3
Name: CDR, dtype: int64

In [12]:
df[df['Group'] == 'Nondemented'].shape

(190, 15)

In [13]:
df.groupby('CDR').agg(mean_age = ('Age', 'mean'), min_age = ('Age', 'min'), max_age = ('Age', 'max'))

| CDR | mean_age | min_age | max_age |
|---|---|---|---|
| 0.0 | 77.15534 | 60 | 97 |
| 0.5 | 77.382114 | 62 | 92 |
| 1.0 | 74.609756 | 61 | 96 |
| 2.0 | 85.0 | 78 | 98 |


In [45]:
alt.Chart(df).mark_boxplot().encode(x='CDR', y='Age')

## What's the correlation between educational level and the probability of having dementia?

In [46]:
df[['EDUC', 'CDR']].corr()

| | EDUC | CDR |
|---|---|---|
| EDUC | 1.0 | -0.153121 |
| CDR | -0.153121 | 1.0 |
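
`corr()` computes Pearson correlation by default, which treats CDR as an evenly spaced numeric scale. Since CDR is an ordered rating, a rank-based (Spearman) correlation may be a useful cross-check. The sketch below computes it as the Pearson correlation of the ranks, using only the five rows from `df.head()` so that it runs on its own. The names `sample`, `pearson` and `spearman` are introduced here for illustration.

```python
import pandas as pd

# EDUC and CDR from the five rows printed by df.head().
sample = pd.DataFrame({
    "EDUC": [14, 14, 12, 12, 12],
    "CDR": [0.0, 0.0, 0.5, 0.5, 0.5],
})

# Pearson correlation on the raw values (the default used by DataFrame.corr()).
pearson = sample["EDUC"].corr(sample["CDR"])

# Spearman correlation: rank both columns (ties get average ranks),
# then take the Pearson correlation of the ranks.
spearman = sample["EDUC"].rank().corr(sample["CDR"].rank())

print(pearson, spearman)
```

With only two subjects in these rows, both coefficients come out at the extreme of -1, so the numbers here say nothing about the full dataset. The point is only the pattern of calls, which can be applied to `df` unchanged.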


In [37]:
alt.Chart(df).mark_bar().encode(x=alt.X('EDUC', bin=True), y='count()', color='Group', column='Group')
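
Because the research question asks about the probability of having dementia, the share of 'Demented' scans within each education band is a more direct summary than raw counts. A minimal sketch follows. The band edges (12, 16 and 23 years) are an assumption of mine, picked from the EDUC range in `df.describe()`, and the data are the five rows from `df.head()`.

```python
import pandas as pd

# EDUC and Group from the five rows printed by df.head().
sample = pd.DataFrame({
    "EDUC": [14, 14, 12, 12, 12],
    "Group": ["Nondemented", "Nondemented", "Demented", "Demented", "Demented"],
})

# True where the scan is labeled Demented; the mean of a boolean column
# is the share of True values.
is_demented = sample["Group"].eq("Demented")

# Assumed education bands: up to 12 years, 13 to 16 years, 17 to 23 years.
bands = pd.cut(sample["EDUC"], bins=[0, 12, 16, 23])

# Share of Demented scans per band; observed=True drops empty bands.
share = is_demented.groupby(bands, observed=True).mean()
print(share)
```

On the full `df`, each share would be based on scans, not subjects, so subjects with many visits weigh more heavily.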

## Does dementia affect the derived anatomical volumes of the brain (estimated total intracranial volume and normalized whole-brain volume)?

In [38]:
alt.Chart(df).mark_bar().encode(x=alt.X('eTIV', bin=True), y='count()', color='Group', column='Group')

In [48]:
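# Step-style histogram of eTIV across all scans.
# transform_fold reshapes the listed columns into long form: the column name
# goes into 'Experiment' and its value into 'Measurement'. Only eTIV is folded
# here, so the chart shows a single series.
alt.Chart(df).transform_fold(
    ['eTIV'],
    as_=['Experiment', 'Measurement']
).mark_area(
    opacity=0.3,
    interpolate='step'
).encode(
    alt.X('Measurement:Q', bin=alt.Bin(maxbins=100)),
    alt.Y('count()', stack=None),  # stack=None overlays series instead of stacking them
    alt.Color('Experiment:N')
)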

In [49]:
alt.Chart(df).mark_bar().encode(x=alt.X('nWBV', bin=True), y='count()', color='Group', column='Group')

In [50]:
df[['eTIV', 'nWBV', 'CDR']].corr()

| | eTIV | nWBV | CDR |
|---|---|---|---|
| eTIV | 1.0 | -0.210122 | 0.022819 |
| nWBV | -0.210122 | 1.0 | -0.344819 |
| CDR | 0.022819 | -0.344819 | 1.0 |
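
Alongside the correlations, comparing group averages gives a direct view of how the two volumes differ between 'Demented' and 'Nondemented' scans. A minimal sketch, again using the five rows from `df.head()` re-typed into a frame called `sample` (a name introduced here):

```python
import pandas as pd

# Group, eTIV and nWBV from the five rows printed by df.head().
sample = pd.DataFrame({
    "Group": ["Nondemented", "Nondemented", "Demented", "Demented", "Demented"],
    "eTIV": [1987, 2004, 1678, 1738, 1698],
    "nWBV": [0.696, 0.681, 0.736, 0.713, 0.701],
})

# Mean of each volume measure per group.
group_means = sample.groupby("Group")[["eTIV", "nWBV"]].mean()
print(group_means)
```

These five rows cover only two subjects and are not representative, so the averages printed here should not be read as a finding. The same `groupby` call on the full `df` gives the comparison that matters.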


# Answers to Research Questions

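1. Most scans rated very mild (CDR 0.5) or mild (CDR 1) dementia come from people in their 70s, with mean ages of about 77 and 75, so the onset of dementia in this sample seems to fall between the ages of 70 and 80. Scans rated normal (CDR 0) have a similar mean age (about 77), however, so this pattern may simply reflect the age makeup of the sample and is not strong evidence that people in their 70s are at higher risk.
2. There's a weak negative correlation (r ≈ -0.15) between years of education and Clinical Dementia Rating: in this sample, people with more education tend to have slightly lower dementia ratings.
3. Nondemented people seem to have higher eTIV and nWBV in general in the histograms, but only nWBV shows a clear relationship with the level of dementia: it has a moderate negative correlation with CDR (r ≈ -0.34), whereas eTIV is essentially uncorrelated with CDR (r ≈ 0.02).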

We cannot confirm from this analysis alone whether these findings hold in general: correlation does not establish causation, and confirming them would require further investigation of the science linking brain volume, educational level, age and dementia. The dataset is also relatively small (373 scans, with several scans per subject), so the results might be biased.