In [1]:
%matplotlib inline


Voxel-Based Morphometry on Oasis dataset
========================================

This example uses Voxel-Based Morphometry (VBM) to study the relationship
between aging and gray matter density.

The data come from the `OASIS <http://www.oasis-brains.org/>`_ project.
If you use it, you must agree to the data usage agreement available
on the website.

It has been run through a standard VBM pipeline (using SPM8 and
NewSegment) to create VBM maps, which we study here.

Predictive modeling analysis: VBM bio-markers of aging?
--------------------------------------------------------

We run a standard SVM-ANOVA nilearn pipeline to predict age from the VBM
data. We use only 100 subjects from the OASIS dataset to limit the memory
usage.

Note that for an actual predictive modeling study of aging, the study
should be run on the full set of subjects. Also, parameters such as the
smoothing applied to the data and the number of features selected by the
ANOVA step should be set by nested cross-validation, as they significantly
affect the prediction score.
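
As a rough illustration of nested cross-validation, the sketch below tunes the number of ANOVA-selected features in an inner loop and scores the whole procedure in an outer loop. It runs on synthetic data: the array shapes, the candidate ``k`` values, and the choice of ``SVR`` with ``f_regression`` are assumptions made for the example, not the settings used in this notebook.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.svm import SVR

# Synthetic stand-in for masked VBM data (assumption): 60 subjects, 200 voxels,
# with "age" depending on the first five voxels plus noise.
rng = np.random.RandomState(0)
X = rng.randn(60, 200)
age = 60 + 5 * X[:, :5].sum(axis=1) + rng.randn(60)

pipeline = Pipeline([
    ('anova', SelectKBest(f_regression)),
    ('svr', SVR(kernel='linear')),
])

# Inner loop: choose k using the training folds only.
inner_cv = KFold(n_splits=3, shuffle=True, random_state=0)
search = GridSearchCV(pipeline, {'anova__k': [5, 20, 100]}, cv=inner_cv)

# Outer loop: score the whole procedure, including the selection of k.
outer_cv = KFold(n_splits=4, shuffle=True, random_state=1)
nested_scores = cross_val_score(search, X, age, cv=outer_cv)
print("Nested CV R^2: %.2f +/- %.2f"
      % (nested_scores.mean(), nested_scores.std()))
```

The smoothing width could be tuned the same way, provided the masking step is refit inside each training fold; that part is not shown here.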

Brain mapping with mass univariate
-----------------------------------

SVM weights are very noisy, partly because heavy smoothing is detrimental
for the prediction here. A standard analysis using mass-univariate GLM
(here permuted to have exact correction for multiple comparisons) gives a
much clearer view of the important regions.
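
The idea behind a permuted mass-univariate test can be sketched with NumPy alone. This is not nilearn's implementation: the synthetic data, the use of a Pearson correlation as the per-voxel statistic, and the helper name ``max_stat_permutation_test`` are assumptions made for illustration. With a finite number of random permutations the correction is approximate rather than exact.

```python
import numpy as np

def max_stat_permutation_test(X, y, n_perm=500, seed=0):
    """Per-voxel |correlation| with y, plus family-wise corrected p-values
    taken from the permutation distribution of the maximum statistic."""
    rng = np.random.RandomState(seed)
    # Standardize so that a correlation reduces to a scaled dot product.
    Xc = (X - X.mean(axis=0)) / X.std(axis=0)
    yc = (y - y.mean()) / y.std()
    n = len(y)
    observed = np.abs(Xc.T @ yc) / n
    # Null distribution: largest |correlation| over voxels for shuffled y.
    max_null = np.empty(n_perm)
    for i in range(n_perm):
        max_null[i] = np.max(np.abs(Xc.T @ rng.permutation(yc)) / n)
    # Corrected p-value: share of permutations (plus the observed labeling)
    # whose maximum reaches the observed statistic.
    exceed = (max_null[:, None] >= observed[None, :]).sum(axis=0)
    return observed, (exceed + 1.0) / (n_perm + 1.0)

rng = np.random.RandomState(42)
X = rng.randn(80, 300)               # 80 subjects, 300 voxels (synthetic)
y = 3.0 * X[:, 0] + rng.randn(80)    # only voxel 0 is related to y
stats, p_corrected = max_stat_permutation_test(X, y)
print("voxels with corrected p < 0.05:", np.flatnonzero(p_corrected < 0.05))
```

Comparing each voxel against the maximum over all voxels is what controls the family-wise error rate without assuming independence between voxels.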

____




In [4]:
import numpy as np
import matplotlib.pyplot as plt
from nilearn import datasets
from nilearn.input_data import NiftiMasker
import random



Load Oasis dataset
-------------------



In [9]:
data_prefix = '/Volumes/Elements/OAS2_RAW_PART1/'
image_postfix = 'RAW/mpr-1.nifti.img'
scan_filenames = ['OAS2_0001_MR1',
            'OAS2_0002_MR1',
            'OAS2_0004_MR1',
            'OAS2_0005_MR1',
            'OAS2_0007_MR1',
            'OAS2_0008_MR1',
            'OAS2_0009_MR1',
            'OAS2_0010_MR1',
            'OAS2_0012_MR1',
            'OAS2_0013_MR1',
            'OAS2_0014_MR1',
            'OAS2_0016_MR1',
            'OAS2_0017_MR1',
            'OAS2_0018_MR1',
            'OAS2_0020_MR1',
            'OAS2_0021_MR1',
            'OAS2_0022_MR1',
            'OAS2_0023_MR1',
            'OAS2_0026_MR1',
            'OAS2_0027_MR1',
            'OAS2_0028_MR1',
            'OAS2_0029_MR1',
            'OAS2_0030_MR1',
            'OAS2_0031_MR1',
            'OAS2_0032_MR1',
            'OAS2_0034_MR1',
            'OAS2_0035_MR1',
            'OAS2_0036_MR1',
            'OAS2_0037_MR1',
            'OAS2_0039_MR1',
            'OAS2_0040_MR1',
            'OAS2_0041_MR1',
            'OAS2_0042_MR1',
            'OAS2_0043_MR1',
            'OAS2_0044_MR1',
            'OAS2_0045_MR1',
            'OAS2_0046_MR1',
            'OAS2_0047_MR1',
            'OAS2_0048_MR1',
            'OAS2_0049_MR1',
            'OAS2_0050_MR1',
            'OAS2_0051_MR1',
            'OAS2_0052_MR1',
            'OAS2_0053_MR1',
            'OAS2_0054_MR1',
            'OAS2_0055_MR1',
            'OAS2_0056_MR1',
            'OAS2_0057_MR1',
            'OAS2_0058_MR1',
            'OAS2_0060_MR1',
            'OAS2_0061_MR1',
            'OAS2_0062_MR1',
            'OAS2_0063_MR1',
            'OAS2_0064_MR1',
            'OAS2_0066_MR1',
            'OAS2_0067_MR1',
            'OAS2_0068_MR1',
            'OAS2_0069_MR1',
            'OAS2_0070_MR1',
            'OAS2_0071_MR1',
            'OAS2_0073_MR1',
            'OAS2_0075_MR1',
            'OAS2_0076_MR1',
            'OAS2_0077_MR1',
            'OAS2_0078_MR1',
            'OAS2_0079_MR1',
            'OAS2_0080_MR1',
            'OAS2_0081_MR1',
            'OAS2_0085_MR1',
            'OAS2_0086_MR1',
            'OAS2_0087_MR1',
            'OAS2_0088_MR1',
            'OAS2_0089_MR1',
            'OAS2_0090_MR1',
            'OAS2_0091_MR1',
            'OAS2_0092_MR1',
            'OAS2_0094_MR1',
            'OAS2_0095_MR1',
            'OAS2_0096_MR1',
            'OAS2_0097_MR1',
            'OAS2_0098_MR1',
            'OAS2_0099_MR1',
            'OAS2_0100_MR1',
            'OAS2_0101_MR1',
            'OAS2_0102_MR1',
            'OAS2_0103_MR1',
            'OAS2_0104_MR1',
            'OAS2_0105_MR1',
            'OAS2_0106_MR1',
            'OAS2_0108_MR1',
            'OAS2_0109_MR1',
            'OAS2_0111_MR1',
            'OAS2_0112_MR1',
            'OAS2_0113_MR1',
            'OAS2_0114_MR1',
            'OAS2_0116_MR1',
            'OAS2_0117_MR1',
            'OAS2_0118_MR1',
            'OAS2_0119_MR1',
            'OAS2_0120_MR1',
            'OAS2_0121_MR1',
            'OAS2_0122_MR1',
            'OAS2_0124_MR1',
            'OAS2_0126_MR1',
            'OAS2_0127_MR1',
            'OAS2_0128_MR1',
            'OAS2_0129_MR1',
            'OAS2_0131_MR1',
            'OAS2_0133_MR1',
            'OAS2_0134_MR1',
            'OAS2_0135_MR1',
            'OAS2_0137_MR1',
            'OAS2_0138_MR1',
            'OAS2_0139_MR1',
            'OAS2_0140_MR1',
            'OAS2_0141_MR1',
            'OAS2_0142_MR1',
            'OAS2_0143_MR1',
            'OAS2_0144_MR1',
            'OAS2_0145_MR1',
            'OAS2_0146_MR1',
            'OAS2_0147_MR1',
            'OAS2_0149_MR1',
            'OAS2_0150_MR1',
            'OAS2_0152_MR1',
            'OAS2_0154_MR1',
            'OAS2_0156_MR1',
            'OAS2_0157_MR1',
            'OAS2_0158_MR1',
            'OAS2_0159_MR1',
            'OAS2_0160_MR1',
            'OAS2_0161_MR1',
            'OAS2_0162_MR1',
            'OAS2_0164_MR1',
            'OAS2_0165_MR1',
            'OAS2_0169_MR1',
            'OAS2_0171_MR1',
            'OAS2_0172_MR1',
            'OAS2_0174_MR1',
            'OAS2_0175_MR1',
            'OAS2_0176_MR1',
            'OAS2_0177_MR1',
            'OAS2_0178_MR1',
            'OAS2_0179_MR1',
            'OAS2_0181_MR1',
            'OAS2_0182_MR1',
            'OAS2_0183_MR1',
            'OAS2_0184_MR1',
            'OAS2_0185_MR1',
            'OAS2_0186_MR1'
        ]

# Build the full path to each subject's first MPRAGE image
scan_filenames = np.asarray([data_prefix + name + "/" + image_postfix
                             for name in scan_filenames])

csv_data = np.recfromcsv('/Users/justinchang/Desktop/Cs168/CS168/oasis_longitudinal_filtered.csv')

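# Note that the non-MR1 scans have to be filtered out, so that the labels
# line up one-to-one with scan_filenames.

# recfromcsv returns byte strings, hence the decode() before comparing.
status = np.asarray([x.decode() == 'Demented' for x in csv_data['group']])
assert len(scan_filenames) == len(status)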




#csv_data = np.recfromcsv('/Users/justinchang/Desktop/Cs168/CS168/oasis_longitudinal.csv')
#age = np.asarray([100. if x.decode() == 'Nondemented' else 1. if x.decode() == 'Demented' else 50. for x in csv_data['group'][:len(gray_matter_map_filenames)]])

# Comparisons to recfromcsv data must be bytes.


  output = genfromtxt(fname, **kwargs)


AssertionError: 
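
The failed assertion above means that the label array and the list of scans have different lengths. One way to align them is to look up each scan's label by its session ID instead of relying on row order. The sketch below assumes the CSV has an ``MRI ID`` column (values such as ``OAS2_0001_MR1``) and a ``Group`` column; these column names and the inline sample rows are assumptions, so adjust them to the actual file.

```python
import csv
import io

# Tiny in-memory stand-in for the longitudinal CSV (hypothetical rows).
sample_csv = io.StringIO(
    "MRI ID,Group\n"
    "OAS2_0001_MR1,Nondemented\n"
    "OAS2_0001_MR2,Nondemented\n"
    "OAS2_0002_MR1,Demented\n"
    "OAS2_0002_MR2,Demented\n"
    "OAS2_0004_MR1,Nondemented\n"
)
scan_ids = ['OAS2_0001_MR1', 'OAS2_0002_MR1', 'OAS2_0004_MR1']

# Map every session ID to its diagnostic group, then look up only the
# sessions that are actually in the scan list (here, the MR1 visits).
group_by_id = {row['MRI ID']: row['Group']
               for row in csv.DictReader(sample_csv)}
missing = [s for s in scan_ids if s not in group_by_id]
if missing:
    raise KeyError("No CSV row for scans: %s" % missing)
status = [group_by_id[s] == 'Demented' for s in scan_ids]
print(status)
```

Because the labels are built from the scan list itself, their length always matches the number of scans, and a scan without a CSV row is reported explicitly.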

Preprocess data
----------------



In [8]:
import nibabel as nib
nifti_masker = NiftiMasker(
    standardize=False,
    smoothing_fwhm=2,
    memory='nilearn_cache')  # cache options

scans_masked = nifti_masker.fit_transform(scan_filenames)


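# Split the data into training, validation, and test sets

train_ratio = 0.8
validation_ratio = 0.0
test_ratio = 0.2

# Compare with a tolerance: floating-point sums are not always exact
assert np.isclose(train_ratio + validation_ratio + test_ratio, 1.0)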

zipped_data = list(zip(scans_masked, status))
random.shuffle(zipped_data)

shuffled_scans, shuffled_status = zip(*zipped_data)

# Boundaries of the three consecutive slices
n_samples = len(shuffled_scans)
train_end = int(n_samples * train_ratio)
validation_end = int(n_samples * (train_ratio + validation_ratio))

train_scans = shuffled_scans[:train_end]
validation_scans = shuffled_scans[train_end:validation_end]
test_scans = shuffled_scans[validation_end:]
train_status = shuffled_status[:train_end]
validation_status = shuffled_status[train_end:validation_end]
test_status = shuffled_status[validation_end:]



ValueError: File not found: '/Volumes/Elements/OAS2_RAW_PART1/OAS2_0100_MR1/RAW/mpr-1.nifti.img'
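
The ``File not found`` error above is raised only when the masker reaches the missing image. As a sketch, the paths can be checked up front with the standard library. The helper name ``split_existing`` and the temporary directory used for the demonstration are illustrative assumptions.

```python
import os
import tempfile

def split_existing(paths):
    """Return (existing, missing) lists, preserving the input order."""
    existing = [p for p in paths if os.path.isfile(p)]
    missing = [p for p in paths if not os.path.isfile(p)]
    return existing, missing

# Demonstration on a temporary directory rather than the real data drive.
with tempfile.TemporaryDirectory() as tmp:
    present = os.path.join(tmp, 'scan_a.img')
    absent = os.path.join(tmp, 'scan_b.img')
    open(present, 'w').close()
    existing, missing = split_existing([present, absent])

print("%d found, %d missing" % (len(existing), len(missing)))
```

If any scans are dropped this way, the corresponding labels have to be dropped too, so that scans and labels stay aligned.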

Prediction pipeline with ANOVA and RandomForestClassifier
----------------------------------------------------------



In [8]:
print("ANOVA + RandomForestClassifier")
# Define the prediction function to be used.
# Here we use a random forest classifier.
from sklearn.ensemble import RandomForestClassifier
# n_estimators: number of trees in the forest
# max_depth: maximum depth of each tree
# max_features: number of features considered when looking for the best split
RFC = RandomForestClassifier(max_depth=5, n_estimators=10, max_features=3)


# Dimension reduction
from sklearn.feature_selection import VarianceThreshold, SelectKBest, \
        f_classif

# Remove features with too low between-subject variance
variance_threshold = VarianceThreshold(threshold=.01)

# Here we use a classical univariate feature selection based on F-test,
# namely ANOVA. The target is a class label, so f_classif is the matching
# score function.
feature_selection = SelectKBest(f_classif, k=2000)

# We have our predictor (RandomForestClassifier), our feature selection
# (SelectKBest), and now, we can plug them together in a *pipeline* that
# performs the operations successively:
from sklearn.pipeline import Pipeline
anova_RFC = Pipeline([
            ('variance_threshold', variance_threshold),
            ('anova', feature_selection),
            ('RandomForestClassifier', RFC)])

### Fit and predict
anova_RFC.fit(train_scans, train_status)
status_pred = anova_RFC.predict(test_scans)

ANOVA + RandomForestClassifier


Evaluation
-----------
Report classification metrics on the held-out test set



In [None]:
import sklearn.metrics
print("Accuracy: " + str(sklearn.metrics.accuracy_score(test_status, status_pred)))
print("Precision: " + str(sklearn.metrics.precision_score(test_status, status_pred)))
print("Recall: " + str(sklearn.metrics.recall_score(test_status, status_pred)))
print("F1: " + str(sklearn.metrics.f1_score(test_status, status_pred)))
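
For reference, the four scores printed above can be derived from the confusion counts. The sketch below recomputes them in plain Python on made-up labels: the ``y_true`` and ``y_pred`` values are illustrative, not results from this notebook, and returning 0.0 when a denominator is zero is a choice made for the sketch.

```python
def binary_scores(y_true, y_pred):
    """Accuracy, precision, recall and F1 for boolean labels (True = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t and p for t, p in pairs)              # true positives
    fp = sum((not t) and p for t, p in pairs)        # false positives
    fn = sum(t and (not p) for t, p in pairs)        # false negatives
    tn = sum((not t) and (not p) for t, p in pairs)  # true negatives
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return accuracy, precision, recall, f1

y_true = [True, True, True, False, False, False, False, True]
y_pred = [True, False, True, False, True, False, False, True]
accuracy, precision, recall, f1 = binary_scores(y_true, y_pred)
print(accuracy, precision, recall, f1)
```

With imbalanced groups, accuracy alone can look good for a classifier that always predicts the majority class, which is why precision and recall are reported alongside it.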


=== ANOVA ===
Prediction accuracy: -0.502144

Massively univariate model
