# Exploring the "clean" ADNI dataset using Pandas

The "clean" ADNI dataset is the dataset for year 5 (i.e. 5 years after baseline) cleaned by Xiao (Gavin) Gao in the following way:
1. Only particular ADNI datasets were used: ADNI-1, ADNI-GO and ADNI-2.
    - Different versions of ADNI used different versions of FreeSurfer, which in turn affected how intracranial volume (ICV) is calculated.
    - We use the average ICV over the first 3 years.
    - We also use the average volume per region, calculated over the first 3 years.
2. Images that didn't pass ADNI's quality control (QC) were not considered (see `ADNI_123_V4.mat`).
3. If the volume of a region increases by more than 10% between two visits, the volume is replaced by the upper limit (1.10 × the average calculated above).
4. Not yet done here, but performed during Gavin's calculations: if more than 10 regions in the brain go over the threshold (volume increased by more than 10%), the data is discarded.
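
The clipping and discard rules in steps 3 and 4 can be sketched with NumPy. This is only an illustration and not Gavin's actual code: the function name `clip_regions`, the array layout, the example values, and the reading of "over the threshold" as "above 1.10 × the average volume of the region" are all assumptions.

```python
import numpy as np

def clip_regions(volumes, average_volumes, max_regions_over=10):
    """Hypothetical sketch of cleaning steps 3 and 4.

    volumes:         1-D array, volume of each region at one visit
    average_volumes: 1-D array, average volume of each region (step 1)
    Returns (clipped_volumes, n_over, keep).
    """
    volumes = np.asarray(volumes, dtype=float)
    upper_limit = 1.10 * np.asarray(average_volumes, dtype=float)

    # Assumption: a region is "over the threshold" when it exceeds the upper limit
    over = volumes > upper_limit
    clipped = np.where(over, upper_limit, volumes)

    n_over = int(over.sum())
    keep = n_over <= max_regions_over  # step 4: discard if more than 10 regions are over
    return clipped, n_over, keep

# Made-up volumes for three regions
average = np.array([100.0, 200.0, 300.0])
visit = np.array([105.0, 250.0, 300.0])
clipped, n_over, keep = clip_regions(visit, average)
print(clipped, n_over, keep)
```

Only the second region exceeds its limit here, so it is clipped and the visit is kept.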

Let's start by importing the libraries we need.

In [24]:
from scipy.io import loadmat
import os
import numpy as np
from pathlib import Path
import pandas as pd

## Loading data into a dataframe

Let's load the data into a pandas dataframe.
First, let's find the file.

In [25]:
def path_to_file(filename):
    '''
    Returns the path to the file `filename`.
    Assumes the file is in the relative path '../data/'.
    '''
    # '__file__' is deliberately a string: notebooks do not define __file__,
    # so this resolves relative to the current working directory.
    here_dir = os.path.dirname(os.path.realpath('__file__'))
    par_dir = os.path.abspath(os.path.join(here_dir, os.pardir))
    file_path = os.path.join(par_dir, 'data', str(filename))
    return file_path
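
`Path` is imported above but not used. As an alternative sketch, the same lookup can be written with `pathlib`. The name `path_to_file_pathlib` is hypothetical, and like the function above it assumes the notebook runs from a directory that sits next to `data/`.

```python
from pathlib import Path

def path_to_file_pathlib(filename):
    """Hypothetical pathlib variant: <parent of working directory>/data/<filename>."""
    return Path.cwd().resolve().parent / 'data' / str(filename)

example_path = path_to_file_pathlib('vec_a2b_5y_clean.mat')
print(example_path)
```

The result is a `Path` object; wrap it in `str()` if a library expects a plain string.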

In [26]:
adni_5y = loadmat(path_to_file('vec_a2b_5y_clean.mat'))
# Extract the data matrix from the .mat file, ignoring the metadata entries
adni_5y = adni_5y['vec_a2b_5y']
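
`loadmat` returns a dictionary that mixes the saved variables with metadata entries, which is why the matrix has to be picked out by name. The sketch below shows this on a small stand-in file written with `savemat`; the file name `example.mat`, the variable name `example_matrix` and its values are made up and unrelated to the ADNI data.

```python
import os
import tempfile

import numpy as np
from scipy.io import loadmat, savemat

# Write a tiny stand-in .mat file (made-up values, not ADNI data)
with tempfile.TemporaryDirectory() as tmp_dir:
    mat_path = os.path.join(tmp_dir, 'example.mat')
    savemat(mat_path, {'example_matrix': np.arange(6.0).reshape(2, 3)})
    contents = loadmat(mat_path)

# Keys wrapped in double underscores are metadata, not saved variables
variable_names = [key for key in contents if not key.startswith('__')]
print(variable_names)
print(contents['example_matrix'].shape)
```

Listing the non-metadata keys this way is a quick check of which variable names a `.mat` file actually contains.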

To make a dataframe, we need the name of each column in the `adni_5y` matrix.

We have created a dictionary (`../data/dictionary_ADNI.csv`) that associates each column name with the column numbers it refers to. Let's use it to make a dataframe.
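
Judging from the parsing code in the next cell, each row of the dictionary file holds a column name followed by either a single column number or a `start:stop` range. The sketch below parses a few rows in that assumed layout. The column numbers are made up and not taken from the real dictionary, and `parse_column_spec` is a hypothetical helper.

```python
import csv
import io

# Made-up rows in the assumed "name,start:stop" layout
example_csv = "ID,0\nBaselineDx,1\nBaselineatrophy,4:90\n"

def parse_column_spec(spec):
    """Turn '4:90' into slice(4, 90) and '1' into slice(1, 2)."""
    bounds = [int(part) for part in spec.split(':')]
    start = bounds[0]
    stop = bounds[1] if len(bounds) > 1 else start + 1
    return slice(start, stop)

example_slices = {name: parse_column_spec(spec)
                  for name, spec in csv.reader(io.StringIO(example_csv))}
print(example_slices)
```

A single number becomes a one-column slice, so every entry can be applied to a row of the matrix in the same way.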

In [156]:
import csv

def make_dataframe(dictionary_file):
    def read_column_names_from_csv(dictionary_file):
        '''
        Returns a dictionary {column name: slice},
        where slice is the range of columns in the data matrix
        that holds the data for this column name.
        '''
        name_dict = {}
        # Use a context manager so the file is closed after reading
        with open(path_to_file(dictionary_file), 'r', newline='') as csv_file:
            for row in csv.reader(csv_file):
                legend, column_numbers = row
                # 'a:b' is a column range; a single number 'a' is one column
                bounds = [int(n) for n in column_numbers.split(':')]
                start = bounds[0]
                stop = bounds[1] if len(bounds) > 1 else start + 1
                name_dict[legend] = slice(start, stop)
        return name_dict

    names_dict = read_column_names_from_csv(dictionary_file)

    # Each cell holds the part of one row of adni_5y that belongs to the column name
    adni_5_dataframe = pd.DataFrame(
        {name: [row[column_slice] for row in adni_5y]
         for name, column_slice in names_dict.items()}
    )
    return adni_5_dataframe
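
Python slicing never raises an error for out-of-range bounds; it silently returns a shorter or empty array. Checking each slice against the width of the matrix can therefore catch a dictionary entry that points past the last column. A minimal sketch, where `find_out_of_range` and the example values are hypothetical:

```python
import numpy as np

def find_out_of_range(names_dict, n_columns):
    """Return the names whose slice extends past the last column (hypothetical helper)."""
    return [name for name, column_slice in names_dict.items()
            if column_slice.stop > n_columns]

example_matrix = np.zeros((2, 5))
example_names = {'a': slice(0, 1), 'b': slice(1, 5), 'c': slice(5, 6)}

print(example_matrix[0][example_names['c']])  # empty array, no error raised
out_of_range = find_out_of_range(example_names, example_matrix.shape[1])
print(out_of_range)
```

In the notebook, the same check could be run with the parsed dictionary and `adni_5y.shape[1]` before building the dataframe.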

Now we can use these column names to create a pandas dataframe:

In [154]:
df = make_dataframe('dictionary_ADNI.csv')

Unnamed: 0,ID,BaselineDx,1yDx,End-of-studyDx,Baselineatrophy,Futureatrophy,GeneticInfo(APOE4),blAge,gender,educationyear,...,Future_Age,sum_volume_all_regions-baseline,sum_volume_all_regions-future,baseline_ICV / sum_volume_all_regions-baseline,baseline_ICV / sum_volume_all_regions-future,average_ICSV_first_3_visits,num_regions_over_upper_limit-baseline,num_regions_over_upper_limit-future,end-of-study-conversion,num_months_from_baseline
0,[42.0],[3.0],[3.0],[3.0],"[0.3860435507843098, 0.4601034735884774, 0.357...","[0.6768182200209267, 0.5706051539585486, 0.598...",[0.0],[-0.31172552176246077],[1.0],[0.5709329194451874],...,[458625.0],[3.0755121809170016],[3.2823112564731534],[1511358.3333333333],[3.0],[1.0],[0.0],[60.0],[],[]
1,[42.0],[3.0],[3.0],[3.0],"[0.4350936215281894, 0.46318749016151184, 0.44...","[0.6124012178938631, 0.5875415976676359, 0.421...",[0.0],[0.0285923290611428],[1.0],[0.5709329194451874],...,[458582.0],[3.115673721100145],[3.2941109768809067],[1511358.3333333333],[1.0],[4.0],[0.0],[60.0],[],[]
2,[51.0],[2.0],[3.0],[3.0],"[0.7895316423854466, 0.26355683532737223, 0.83...","[0.8468920704458542, 0.336827062199337, 0.8858...",[2.0],[-1.1803127840265417],[1.0],[0.5709329194451874],...,[530504.0],[2.8290822738185],[3.148487099060516],[1628523.3333333333],[2.0],[1.0],[1.0],[60.0],[],[]
3,[51.0],[2.0],[3.0],[3.0],"[0.8191180578895447, 0.26355683532737223, 0.84...","[0.8937978793463865, 0.3537468098375079, 0.911...",[2.0],[-1.0157722052635965],[1.0],[0.5709329194451874],...,[496010.0],[2.818724842494007],[3.344741033446906],[1628523.3333333333],[5.0],[0.0],[1.0],[60.0],[],[]
4,[56.0],[1.0],[1.0],[2.0],"[0.3440695743998192, 0.7262904051589735, 0.683...","[0.3375758225059455, 0.6552234805023012, 0.683...",[0.0],[-1.0008133737312492],[2.0],[-1.2557801177303942],...,[486438.0],[2.6799897878134575],[2.6947524658846556],[1325962.5],[1.0],[2.0],[0.0],[60.0],[],[]
5,[56.0],[1.0],[1.0],[2.0],"[0.33185448789385874, 0.635152598992666, 0.779...","[0.40258643661734944, 0.6954987668154035, 0.70...",[0.0],[-0.8412190615132978],[2.0],[-1.2557801177303942],...,[493722.0],[2.6792872782303507],[2.6560493557102984],[1325962.5],[1.0],[1.0],[0.0],[60.0],[],[]
6,[59.0],[1.0],[1.0],[2.0],"[0.4394603169504152, 0.23113084742179837, 0.71...","[0.6786594631176207, 0.4444750236473095, 0.739...",[0.0],[-0.7873503551576799],[2.0],[-1.2557801177303942],...,[447729.0],[2.530798027606907],[2.7894998983760266],[1237642.5],[2.0],[2.0],[0.0],[60.0],[],[]
7,[59.0],[1.0],[1.0],[2.0],"[0.6455786587438823, 0.351151748500752, 0.6672...","[0.5929089681114119, 0.283092416900162, 0.6750...",[0.0],[-0.6277560429397285],[2.0],[-1.2557801177303942],...,[484212.0],[2.555015500808021],[2.592025806877979],[1237642.5],[4.0],[5.0],[0.0],[60.0],[],[]
8,[61.0],[1.0],[3.0],[3.0],"[0.6596419050556441, 0.4771095750692263, 0.489...","[0.7022902448704071, 0.4996858896714643, 0.562...",[0.0],[0.7088020898851344],[2.0],[-0.5250949028601616],...,[490977.5],[2.900062554341625],[3.00400731194403],[1473486.0],[3.0],[0.0],[1.0],[60.0],[],[]
9,[68.0],[1.0],[1.0],[1.0],"[0.49519301446265623, 0.42380606734464116, 0.1...","[0.5836867876394929, 0.5804742551919625, 0.237...",[0.0],[-0.21264222822884454],[2.0],[-1.6211227251655105],...,[505269.00000000006],[2.5363354381300542],[2.7181692656123104],[1375186.0],[8.0],[1.0],[0.0],[60.0],[],[]
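
Every cell in this dataframe holds an array, even for scalar columns such as `ID`. One possible follow-up, sketched here on a small made-up frame, is to unwrap single-element cells into plain numbers while leaving multi-valued and empty cells untouched. The helper `unwrap_singletons` and the example values are hypothetical.

```python
import numpy as np
import pandas as pd

def unwrap_singletons(frame):
    """Replace length-1 array cells with their scalar value (hypothetical helper)."""
    def unwrap(cell):
        values = np.asarray(cell)
        return values.item() if values.size == 1 else cell
    return pd.DataFrame({name: frame[name].map(unwrap) for name in frame.columns})

# Made-up frame with the same cell structure as the output above
example_df = pd.DataFrame({
    'ID': [np.array([42.0]), np.array([51.0])],
    'Baselineatrophy': [np.array([0.39, 0.46]), np.array([0.79, 0.26])],
})
flat_df = unwrap_singletons(example_df)
print(flat_df['ID'].tolist())
```

Scalar columns then behave like ordinary numeric columns, which makes filtering and grouping more convenient.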
