### Displacement Invariant Transformer (**Simple** usage example)

For more complex usage, please check the documentation: *ToDo: add documentation*

In [1]:
import os
os.chdir("..")

from nifeatures.transform import DisplacementInvariantTransformer
from nifeatures.search import TransformerCV
from nilearn.datasets import load_mni152_brain_mask
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline
import pandas as pd
import numpy as np

In [2]:
# Generate 100 simulated fMRI activation maps;
MNI = load_mni152_brain_mask(resolution=2)
template_data = MNI.get_fdata()
data = []
target = []

for sample in range(100):
    noisy_data = template_data.copy()
    weight = np.random.normal(0, 1, 1)[0]
    noisy_data[noisy_data == 1] = np.random.normal(0, 1, np.sum(template_data == 1)) * weight
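    # Collect the flattened maps in a list (avoids re-copying the array on every iteration);
    data.append(noisy_data.flatten())
    target.append(weight)

# Stack the maps into a single (n_samples, n_voxels) array;
data = np.array(data)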
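
The simulation above fills every in-mask voxel with Gaussian noise scaled by a per-sample weight, and uses that weight as the regression target. For quick experiments without downloading the MNI template, the same construction can be sketched on a small synthetic mask. The toy mask shape and the `simulate_maps` helper are assumptions made for illustration; they are not part of nifeatures or nilearn.

```python
import numpy as np


def simulate_maps(mask, n_samples=100, seed=0):
    """Hypothetical helper: noise inside the mask, scaled by one weight per sample."""
    rng = np.random.default_rng(seed)
    in_mask = mask.ravel() == 1
    # One weight per sample; this is also the regression target.
    weights = rng.normal(0, 1, n_samples)
    maps = np.zeros((n_samples, mask.size))
    # Fill only the in-mask voxels; voxels outside the mask stay at zero.
    noise = rng.normal(0, 1, (n_samples, in_mask.sum()))
    maps[:, in_mask] = noise * weights[:, None]
    return maps, weights


# Toy "brain mask": a 6x6x6 cube of ones inside a 10x12x10 volume.
toy_mask = np.zeros((10, 12, 10))
toy_mask[2:8, 3:9, 2:8] = 1

toy_data, toy_target = simulate_maps(toy_mask)
print(toy_data.shape)  # (100, 1200)
```

Drawing all samples in one call avoids the Python loop entirely; the seeded generator only makes the toy data reproducible.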


### Check if transformer works (for debugging purposes!)

In [3]:
X_train, X_test, y_train, y_test = train_test_split(data, target)

dt = DisplacementInvariantTransformer()
dt.fit(X_train, y_train)

In [4]:
new_data = dt.transform(X_train)

# check shapes;
print("old shape: ", X_train.shape)
print("new shape: ", new_data.shape)

old shape:  (75, 1100385)
new shape:  (75, 100)


### Usage example without Hyperparameter Search

In [5]:
# The transformer is sklearn compatible.
# This means we can use it inside a sklearn pipeline:
model = Pipeline([
    ("trf", DisplacementInvariantTransformer()),
    ("model", Ridge())
])

# Fit the model;
X_train, X_test, y_train, y_test = train_test_split(data, target)
fit = model.fit(X_train, y_train)

# Predict test values and score performance;
prediction = fit.predict(X_test)
print("R2: ", np.corrcoef(prediction, y_test)[0, 1]**2)

R2:  0.11085223752044017
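
Note that the value printed as "R2" in this notebook is the squared Pearson correlation between predictions and targets. This is not always equal to the coefficient of determination, which also penalizes offset and scale errors. The following sketch, with made-up numbers and hypothetical helper names, shows how far the two can diverge:

```python
import numpy as np


def squared_pearson(y_true, y_pred):
    """Squared Pearson correlation, as computed in the cells of this notebook."""
    return np.corrcoef(y_pred, y_true)[0, 1] ** 2


def coefficient_of_determination(y_true, y_pred):
    """1 - SS_res / SS_tot, the usual definition of R^2."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


# Made-up example: predictions are perfectly correlated with the target,
# but shifted and rescaled.
y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = 2.0 * y_true + 5.0

r2_corr = squared_pearson(y_true, y_pred)              # approximately 1.0
r2_cod = coefficient_of_determination(y_true, y_pred)  # -45.0
print(r2_corr, r2_cod)
```

The squared correlation is invariant to linear rescaling of the predictions, so it can look optimistic when the predictions are biased.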


### Usage example with Hyperparameter Search

The transformer comes with its own hyperparameter search algorithm, called "TransformerCV".
It takes any sklearn-compatible hyperparameter search class (e.g. GridSearchCV) and embeds
it in a workflow tailored to the Displacement Invariant Transformer.

It should work with the Scikit-Learn hyperparameter optimizers:
https://scikit-learn.org/stable/modules/classes.html#module-sklearn.model_selection

and the Ray-Tune sklearn API:
https://docs.ray.io/en/latest/tune/api/sklearn.html
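
To illustrate what passing a search class as an argument can look like, here is a minimal sketch that uses only scikit-learn. The helper `fit_with_search` is hypothetical and is not part of nifeatures; how TransformerCV handles the `search` argument internally may differ.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV


def fit_with_search(estimator, params, X, y, search=GridSearchCV, **search_kwargs):
    """Hypothetical helper: instantiate the given search class and fit it."""
    return search(estimator, params, **search_kwargs).fit(X, y)


# Small synthetic regression problem (made-up data).
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + rng.normal(scale=0.1, size=60)

params = {"alpha": [0.5, 1.5]}

# The same helper works with different sklearn search classes.
grid = fit_with_search(Ridge(), params, X, y, search=GridSearchCV, cv=5)
rand = fit_with_search(
    Ridge(), params, X, y,
    search=RandomizedSearchCV, cv=5, n_iter=2, random_state=0,
)
print(grid.best_params_, rand.best_params_)
```

Both classes accept the estimator and the parameter dictionary as their first two arguments, which is what makes them interchangeable here.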

In [7]:
model = Pipeline([
    ("trf", DisplacementInvariantTransformer()),
    ("model", Ridge())
])

params = {
    "trf__n_peaks": [100, 200],
    "model__alpha": [0.5, 1.5]
}

# Fit the model;
X_train, X_test, y_train, y_test = train_test_split(data, target)
fit = TransformerCV(model, params, search=GridSearchCV, cv=5, n_jobs=4).fit(X_train, y_train)

# Predict test values and score performance;
prediction = fit.predict(X_test)
print("R2: ", np.corrcoef(prediction, y_test)[0, 1]**2)

R2:  0.10143705829819576


If no search algorithm is passed as a parameter, TransformerCV returns an
array containing the results of the transformer's fit for every possible
parameter combination (the precomputed data).

In [3]:
model = Ridge()

params = {
    "trf__n_peaks": [100, 200],
    "model__alpha": [0.5, 1.5]
}

# Fit the model;
X_train, X_test, y_train, y_test = train_test_split(data, target)
fit = TransformerCV(model, params, transform=True, cv=2, n_jobs=10).fit(X_train, y_train)
pd.DataFrame(fit)

ValueError: too many values to unpack (expected 2)
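
To see which parameter combinations a grid such as `params` expands to, scikit-learn's `ParameterGrid` can be used. This is only a sketch of the enumeration; whether TransformerCV builds its combinations in this way is an assumption.

```python
from sklearn.model_selection import ParameterGrid

params = {
    "trf__n_peaks": [100, 200],
    "model__alpha": [0.5, 1.5],
}

# Expand the grid into one dictionary per parameter combination.
combinations = list(ParameterGrid(params))
for combo in combinations:
    print(combo)

# Distinct values that concern only the transformer step (prefix "trf__").
trf_values = sorted({combo["trf__n_peaks"] for combo in combinations})
print(len(combinations), trf_values)
```

With two values for each of the two parameters, the grid expands to four combinations, of which only two differ in the transformer setting.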

The precomputed data contains the coordinates of the important voxels for each combination of hyperparameters. This allows us to evaluate the model on our data at a later point, skipping the transformer fitting step.

In [9]:
# Test performance for each set of coordinates and every combination of hyperparameters without refitting:
model = Pipeline([
    ("trf", DisplacementInvariantTransformer(precomputed=fit)),
    ("model", Ridge())
])

# To reuse the precomputed data, the search algorithm must be fit with the same
# number of folds and the same parameters. Passing different parameters to the
# search algorithm is allowed, but anything that has not been precomputed must
# then be computed, which costs additional time.
search = GridSearchCV(
    model,
    params,
    cv=5,
    n_jobs=4
).fit(X_train, y_train)

# Predict test values and score performance;
prediction = search.predict(X_test)
print("R2: ", np.corrcoef(prediction, y_test)[0, 1]**2)

R2:  0.0869031281559803
