# Neuropsychological Scores Exploratory Data Analysis

The purpose of this notebook is to explore and clean select data from the neuropsychological examinations available in ADNI.

**Import libraries**

In [38]:
%matplotlib inline
import os
import numpy as np
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()

import statsmodels.api as sm
from statsmodels.regression.linear_model import OLS

# import custom dependencies
from ADNI_utilities import define_terms, describe_meta_data, paths_with_ext, append_meta_cols

In [39]:
# define figure defaults
mpl.rc('axes', labelsize=10, titlesize=14)
mpl.rc('figure', figsize=[6,4], titlesize=14)
mpl.rc('legend', fontsize=12)
mpl.rc('lines', linewidth=2, color='k')
mpl.rc('xtick', labelsize=10)
mpl.rc('ytick', labelsize=10)

Import the ADNI dictionary to get readable definitions of features.

In [40]:
# import adni dictionary
adni_dict_df = pd.read_csv("../data/study info/DATADIC.csv")

## Geriatric Depression Scale

The Geriatric Depression Scale (GDS) is calculated from a battery of questions designed to quantify a patient's depression.

In [41]:
# load the Geriatric Depression Scale results
gds_df = pd.read_csv("../data/Neuropsychological/GDSCALE.csv")

# build a dataframe of term definitions for the GDSCALE table
gds_dict = define_terms(gds_df, adni_dict_df, table_name="GDSCALE")
gds_dict

Unnamed: 0,FLDNAME,TYPE,TBLNAME,TEXT,CODE
0,,,,,
1,ID,N,GDSCALE,Record ID,"""crfname"",""Geriatric Depression Scale"",""indexe..."
2,RID,N,GDSCALE,Participant roster ID,
3,SITEID,N,GDSCALE,Site ID,
4,VISCODE,T,GDSCALE,Visit code,
5,VISCODE2,T,GDSCALE,Translated visit code,
6,USERDATE,S,GDSCALE,Date record created,
7,USERDATE2,S,GDSCALE,Date record last updated,
8,EXAMDATE,D,GDSCALE,Examination Date,
9,GDSOURCE,N,GDSCALE,Information Source,1=Participant Visit;2=Telephone Call


Most of the features here are raw answers to the individual questions. We'll just take the `GDTOTAL` score for future use.

In [42]:
# record columns
gds_cols = ["GDTOTAL"]

# standardize missingness and ensure float dtype
gds_df.replace({np.nan:-1, -4:-1}, inplace=True)
gds_df[gds_cols] = gds_df[gds_cols].astype(float)
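
To illustrate what the cleaning step above does, here is a minimal sketch on a toy dataframe. The values are made up, and it assumes (as the cell above implies) that both `NaN` and `-4` mark missing entries and should be collapsed to a single `-1` sentinel.

```python
import numpy as np
import pandas as pd

# toy data (hypothetical values): one NaN and one -4 stand in for missing scores
toy_df = pd.DataFrame({
    "RID": [1, 2, 3, 4],
    "GDTOTAL": [3.0, np.nan, -4.0, 1.0],
})

# map both missing markers to the single sentinel -1
cleaned = toy_df.replace({np.nan: -1, -4: -1})

# make sure the score column is stored as float
cleaned["GDTOTAL"] = cleaned["GDTOTAL"].astype(float)

# count entries that are now flagged as missing
n_missing = int((cleaned["GDTOTAL"] == -1).sum())
print(cleaned)
print("missing GDTOTAL entries:", n_missing)
```

Because `-1` is an ordinary number, any later summary statistics on the score would need to filter it out first.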

## Mini-Mental State Exam

In [43]:
# load the Mini-Mental State Exam results
mmse_df = pd.read_csv("../data/Neuropsychological/MMSE.csv", low_memory=False)

# build a dataframe of term definitions for the MMSE table
mmse_dict = define_terms(mmse_df, adni_dict_df, table_name="MMSE")
mmse_dict

Unnamed: 0,FLDNAME,TYPE,TBLNAME,TEXT,CODE
0,,,,,
1,ID,N,MMSE,Record ID,"""crfname"",""Mini Mental State Exam"",""indexes"",""..."
2,RID,N,MMSE,Participant roster ID,
3,SITEID,N,MMSE,Site ID,
4,VISCODE,T,MMSE,Visit code,
5,VISCODE2,T,MMSE,Translated visit code,
6,USERDATE,S,MMSE,Date record created,
7,USERDATE2,S,MMSE,Date record last updated,
8,EXAMDATE,D,MMSE,Examination Date,
9,MMDATE,N,MMSE,1. What is today's date?,1=Correct; 2=Incorrect


Most of the features here are raw answers to the individual questions. Again, we'll just save the total score, `MMSCORE`.

In [44]:
# record columns
mmse_cols = ["MMSCORE"]

# standardize missingness and ensure float dtype
mmse_df.replace({np.nan:-1, -4:-1}, inplace=True)
mmse_df[mmse_cols] = mmse_df[mmse_cols].astype(float)

## Modified Hachinski Ischemia Scale

In [45]:
# load the Modified Hachinski Ischemia Scale results
mhach_df = pd.read_csv("../data/Neuropsychological/MODHACH.csv", low_memory=False)

# build a dataframe of term definitions for the MODHACH table
mhach_dict = define_terms(mhach_df, adni_dict_df, table_name="MODHACH")
mhach_dict

Unnamed: 0,FLDNAME,TYPE,TBLNAME,TEXT,CODE
0,,,,,
1,ID,N,MODHACH,Record ID,"""crfname"",""Modified Hachinski"",""indexes"",""adni..."
2,RID,N,MODHACH,Participant roster ID,
3,SITEID,N,MODHACH,Site ID,
4,VISCODE,T,MODHACH,Visit code,
5,VISCODE2,T,MODHACH,Translated visit code,
6,USERDATE,S,MODHACH,Date record created,
7,USERDATE2,S,MODHACH,Date record last updated,
8,EXAMDATE,D,MODHACH,Examination Date,
9,HMONSET,N,MODHACH,1. Abrupt Onset of Dementia,2=Present - 2 points; 0=Absent


Most of the features here are raw answers to the individual questions. Again, we'll just save the total score, `HMSCORE`.

In [46]:
# record columns
mhach_cols = ["HMSCORE"]

# standardize missingness and ensure float dtype
mhach_df.replace({np.nan:-1, -4:-1}, inplace=True)
mhach_df[mhach_cols] = mhach_df[mhach_cols].astype(float)
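
The same two cleaning steps are repeated for all three tables, so they could be factored into one helper. The sketch below is only an illustration: `standardize_missing` and its parameters are hypothetical names, not part of `ADNI_utilities`, and it returns a copy rather than modifying the dataframe in place.

```python
import numpy as np
import pandas as pd

def standardize_missing(df, score_cols, missing_codes=(-4,), fill_value=-1):
    """Hypothetical helper: map NaN and the given codes to one sentinel,
    then cast the selected score columns to float. Returns a new dataframe."""
    to_replace = {np.nan: fill_value}
    to_replace.update({code: fill_value for code in missing_codes})
    out = df.replace(to_replace)
    out[score_cols] = out[score_cols].astype(float)
    return out

# toy example with made-up values
toy_mmse = pd.DataFrame({"RID": [10, 11, 12], "MMSCORE": [29, -4, np.nan]})
clean_mmse = standardize_missing(toy_mmse, ["MMSCORE"])
print(clean_mmse)
```

Under these assumptions, each cleaning cell would reduce to a single call such as `mmse_df = standardize_missing(mmse_df, mmse_cols)`.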

## Saving Data to File

With the columns from each data set hand-picked, the appropriate data types selected, and the missingness standardized, we can write the new cleaned dataframes to file.

In [47]:
# initialize the list of dataframes, their selected columns, and output names
all_dfs = [gds_df, mmse_df, mhach_df]
all_df_cols = [gds_cols, mmse_cols, mhach_cols]
df_names = ["depression","mmse","mhach"]

# iterate over dataframes
for i,df in enumerate(all_dfs):
    
    # ensure RID is in column list for indexing
    cols = all_df_cols[i]
    cols = append_meta_cols(df.columns, cols)
    
    # write data to csv
    to_write = df[cols]
    to_write.to_csv("../data/Cleaned/" + df_names[i] + "_clean.csv")
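
The export loop above can be tried end to end on toy data. In the sketch below, `append_meta_cols_sketch` is a hypothetical stand-in for `append_meta_cols` (whose actual behavior is not shown in this notebook); it is assumed to prepend identifying columns such as `RID` and `VISCODE` when they exist. The toy tables and the temporary output directory are likewise made up for illustration.

```python
import os
import tempfile
import pandas as pd

def append_meta_cols_sketch(df_columns, cols, meta_cols=("RID", "VISCODE")):
    """Hypothetical stand-in for append_meta_cols: prepend any identifying
    columns that exist in the dataframe and are not already selected."""
    meta = [c for c in meta_cols if c in df_columns and c not in cols]
    return meta + list(cols)

# toy tables (made-up values), keyed by output name
toy_tables = {
    "depression": (
        pd.DataFrame({"RID": [1, 2], "VISCODE": ["bl", "m06"],
                      "GDTOTAL": [2.0, -1.0], "GDSOURCE": [1, 1]}),
        ["GDTOTAL"],
    ),
    "mmse": (
        pd.DataFrame({"RID": [1, 2], "VISCODE": ["bl", "m06"],
                      "MMSCORE": [29.0, 27.0], "MMDATE": [1, 2]}),
        ["MMSCORE"],
    ),
}

# write each reduced table to a temporary directory and read it back
out_dir = tempfile.mkdtemp()
written = {}
for name, (df, cols) in toy_tables.items():
    keep = append_meta_cols_sketch(df.columns, cols)
    path = os.path.join(out_dir, name + "_clean.csv")
    df[keep].to_csv(path, index=False)
    written[name] = pd.read_csv(path)

print(written["depression"])
```

Unlike the loop above, this sketch passes `index=False`, which avoids an extra unnamed index column when the file is read back; whether the index should be kept is a choice left to the notebook.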