# Loading Fake Timeseries Surface Data

This notebook explores how to load DataFiles into a Dataset and how to process them with Loaders.

This example requires some extra optional libraries, including nibabel and nilearn! Note: nilearn is not imported directly here, but importing SingleConnectivityMeasure raises an ImportError if nilearn is not installed.

We will also use fake data for this example, so no special datasets are required!

In [1]:
import BPt as bp
import nibabel as nib
import numpy as np
import pandas as pd
import os

In [2]:
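def save_fake_timeseries_data():
    '''Save fake surface timeseries data, one file per subject and hemisphere.'''

    # 20 subjects x 100 timepoints x 10242 surface vertices
    X = np.random.random(size=(20, 100, 10242))
    os.makedirs('fake_time_data', exist_ok=True)

    # np.save appends the .npy extension to each file name
    for x in range(len(X)):
        np.save('fake_time_data/' + str(x) + '_lh', X[x])
    # The right hemisphere files reuse the same random arrays
    for x in range(len(X)):
        np.save('fake_time_data/' + str(x) + '_rh', X[x])

save_fake_timeseries_data()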

In [3]:
# Init a Dataset
data = bp.Dataset()

Next, we want to load the files into the dataset as data files. There are a few different ways to do this, but we will use the method add_data_files, starting with the timeseries data.

First we need a dictionary mapping each desired column name to either file locations or a file glob (the glob is easier, so let's use that).

In [4]:
# The *'s just mean wildcard
files = {'timeseries_lh': 'fake_time_data/*_lh*',
         'timeseries_rh': 'fake_time_data/*_rh*'}

# Now let's try loading with 'auto' as the file to subject function
data.add_data_files(files, 'auto')

Unnamed: 0,timeseries_lh,timeseries_rh
13_lh,Loc(0),
9_lh,Loc(1),
8_lh,Loc(2),
2_lh,Loc(3),
16_lh,Loc(4),
11_lh,Loc(5),
6_lh,Loc(6),
7_lh,Loc(7),
1_lh,Loc(8),
17_lh,Loc(9),


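We can see 'auto' doesn't work for us: the index keeps the hemisphere suffix (e.g. 13_lh), so the left and right hemisphere files are not matched to the same subject. We can try writing our own function instead.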

In [5]:
def file_to_subj(loc):
    return loc.split('/')[-1].split('_')[0]

# Actually load it this time
data = data.add_data_files(files, file_to_subj)
data

Unnamed: 0,timeseries_lh,timeseries_rh
13,Loc(0),Loc(36)
9,Loc(1),Loc(27)
8,Loc(2),Loc(37)
2,Loc(3),Loc(35)
16,Loc(4),Loc(38)
11,Loc(5),Loc(20)
6,Loc(6),Loc(29)
7,Loc(7),Loc(31)
1,Loc(8),Loc(26)
17,Loc(9),Loc(39)
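
To see why the custom function lines the two hemispheres up, here is a minimal standalone check of what it returns for paths shaped like the ones saved above. The example paths are illustrative, and the stem_only helper is hypothetical: it only mimics the kind of label seen in the 'auto' output (such as 13_lh) and is not taken from BPt itself.

```python
import os


def file_to_subj(loc):
    # Keep only the file name, then the part before the first underscore
    return loc.split('/')[-1].split('_')[0]


def stem_only(loc):
    # Hypothetical stand-in: file name without its extension
    return os.path.splitext(os.path.basename(loc))[0]


paths = ['fake_time_data/13_lh.npy', 'fake_time_data/13_rh.npy']

# Both hemispheres map to the same subject label
subjects = [file_to_subj(p) for p in paths]
print(subjects)  # ['13', '13']

# Keeping the whole stem gives each hemisphere its own label
stems = [stem_only(p) for p in paths]
print(stems)  # ['13_lh', '13_rh']
```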


What's this though? Why are the files showing up as Loc(int)? What's going on is that the data files are really stored as just integers, see:

In [6]:
data['timeseries_lh']

13     0.0
9      1.0
8      2.0
2      3.0
16     4.0
11     5.0
6      6.0
7      7.0
1      8.0
17     9.0
19    10.0
15    11.0
10    12.0
3     13.0
14    14.0
0     15.0
18    16.0
5     17.0
4     18.0
12    19.0
Name: timeseries_lh, dtype: float64

They correspond to locations in a stored file mapping (note: you don't need to worry about any of this most of the time).

In [7]:
data.file_mapping[0], data.file_mapping[1], data.file_mapping[2]  

(DataFile(loc='/home/sage/BPt/Examples/Short_Examples/fake_time_data/13_lh.npy'),
 DataFile(loc='/home/sage/BPt/Examples/Short_Examples/fake_time_data/9_lh.npy'),
 DataFile(loc='/home/sage/BPt/Examples/Short_Examples/fake_time_data/8_lh.npy'))
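
The idea of a column of integers that point into a mapping can be sketched with plain Python and pandas. This is a simplified illustration only, not BPt's actual implementation; the names toy_mapping and toy_column are made up for the example.

```python
import pandas as pd

# Hypothetical mapping from integer key to file location
toy_mapping = {0: 'fake_time_data/13_lh.npy',
               1: 'fake_time_data/9_lh.npy'}

# The column itself holds only the integer keys, indexed by subject
toy_column = pd.Series([0, 1], index=['13', '9'], name='timeseries_lh')

# Resolving a subject's file is a lookup of the stored key
loc_for_13 = toy_mapping[int(toy_column['13'])]
print(loc_for_13)  # fake_time_data/13_lh.npy
```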

Let's now add a fake target to our dataset.

In [8]:
data['t'] = np.random.random(len(data))
data.set_target('t', inplace=True)
data

Unnamed: 0,timeseries_lh,timeseries_rh
13,Loc(0),Loc(36)
9,Loc(1),Loc(27)
8,Loc(2),Loc(37)
2,Loc(3),Loc(35)
16,Loc(4),Loc(38)
11,Loc(5),Loc(20)
6,Loc(6),Loc(29)
7,Loc(7),Loc(31)
1,Loc(8),Loc(26)
17,Loc(9),Loc(39)

Unnamed: 0,t
13,0.656648
9,0.298354
8,0.495359
2,0.41466
16,0.606687
11,0.453163
6,0.853856
7,0.044329
1,0.916036
17,0.865733


Next we will generate a Loader to apply a parcellation, then extract a measure of connectivity.

In [9]:
from BPt.extensions import SurfLabels

lh_parc = SurfLabels(labels='data/lh.aparc.annot', vectorize=False)
rh_parc = SurfLabels(labels='data/rh.aparc.annot', vectorize=False)

We can see how this object works on example data first.

In [10]:
ex_lh = data.file_mapping[0].load()
ex_lh.shape

(100, 10242)

In [11]:
trans = lh_parc.fit_transform(ex_lh)
trans.shape

(100, 35)

We essentially get a reduction from 10242 features (one per vertex) to 35 (one per parcel) at each of the 100 timepoints.
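
Conceptually, a parcellation of this kind groups vertices by label and summarizes each group at every timepoint. The sketch below uses plain NumPy with made-up labels and assumes the summary is a mean within each label; SurfLabels may handle details such as background labels or other summary strategies differently.

```python
import numpy as np

rng = np.random.default_rng(0)

n_timepoints, n_vertices, n_labels = 100, 1000, 35

# Hypothetical label per vertex; the first n_labels entries guarantee
# that every label occurs at least once
labels = np.concatenate([
    np.arange(n_labels),
    rng.integers(0, n_labels, size=n_vertices - n_labels),
])
timeseries = rng.random((n_timepoints, n_vertices))

# Average the vertices belonging to each label, per timepoint
parcellated = np.stack(
    [timeseries[:, labels == lab].mean(axis=1) for lab in range(n_labels)],
    axis=1,
)

print(parcellated.shape)  # (100, 35)
```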

Next, we want to transform the parcellated timeseries into a connectivity matrix (here a covariance matrix, since we pass kind='covariance').

In [12]:
from BPt.extensions import SingleConnectivityMeasure
scm = SingleConnectivityMeasure(kind='covariance', discard_diagonal=True, vectorize=True)

In [13]:
scm.fit_transform(trans).shape

(595,)

The single connectivity measure is just a wrapper designed to let the ConnectivityMeasure from nilearn work with a single subject's data at a time.
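
The output length of 595 follows from keeping one entry per pair of the 35 parcels: 35 * 34 / 2 = 595. The sketch below reproduces that shape with plain NumPy, using the ordinary sample covariance as a stand-in; nilearn's estimator and the exact ordering of the vectorized entries may differ.

```python
import numpy as np

rng = np.random.default_rng(0)
parcellated = rng.random((100, 35))  # timepoints x parcels

# Covariance between parcels (columns are the variables)
cov = np.cov(parcellated, rowvar=False)

# Keep the lower triangle without the diagonal, flattened to 1D
rows, cols = np.tril_indices(cov.shape[0], k=-1)
vectorized = cov[rows, cols]

print(cov.shape, vectorized.shape)  # (35, 35) (595,)
```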

Next, let's use the special input Pipe wrapper to compose these two objects into their own pipeline.

In [14]:
lh_loader = bp.Loader(bp.Pipe([lh_parc, scm]), scope='_lh')
rh_loader = bp.Loader(bp.Pipe([rh_parc, scm]), scope='_rh')

Define a simple pipeline with just our loader steps, then evaluate with mostly default settings.

In [15]:
pipeline = bp.Pipeline([lh_loader, rh_loader, bp.Model('linear')])

results = bp.evaluate(pipeline, data)
results

BPtEvaluator
------------
mean_scores = {'explained_variance': -0.3492082271322736, 'neg_mean_squared_error': -0.08532586202634963}
std_scores = {'explained_variance': 0.37944917198666483, 'neg_mean_squared_error': 0.025409784568717956}

Saved Attributes: ['estimators', 'preds', 'timing', 'train_subjects', 'val_subjects', 'feat_names', 'ps', 'mean_scores', 'std_scores', 'weighted_mean_scores', 'scores', 'fis_', 'coef_']

Available Methods: ['get_preds_dfs', 'get_fis', 'get_coef_', 'permutation_importance']

Evaluated with:
ProblemSpec(problem_type='regression',
            scorer={'explained_variance': make_scorer(explained_variance_score),
                    'neg_mean_squared_error': make_scorer(mean_squared_error, greater_is_better=False)},
            subjects='all', target='t')

Don't be discouraged that this didn't work; we are, after all, trying to predict random noise with random noise.
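
A negative explained variance simply means the predictions do worse than always predicting the mean of the target. Here is a small worked example with made-up numbers, computing the score by hand as 1 - Var(y - prediction) / Var(y); the helper name explained_variance is ours and is not part of BPt.

```python
import numpy as np


def explained_variance(y_true, y_pred):
    # 1 - Var(residuals) / Var(targets)
    return 1 - np.var(y_true - y_pred) / np.var(y_true)


y = np.array([1.0, 2.0, 3.0, 4.0])

# Always predicting the mean scores exactly zero
constant = np.full_like(y, y.mean())
print(explained_variance(y, constant))  # 0.0

# Predictions that move in the wrong direction score below zero
reversed_pred = y[::-1]
print(explained_variance(y, reversed_pred))  # -3.0
```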

In [16]:
# These are the steps of the pipeline
fold0_pipeline = results.estimators[0]
for step in fold0_pipeline.steps:
    print(step[0])

loader_pipe0
loader_pipe1
linear regressor


We can investigate individual pieces, or use special functions like get_X_transform_df:

In [17]:
results.get_X_transform_df(data, fold=0)

Unnamed: 0,timeseries_rh_0,timeseries_rh_1,timeseries_rh_2,timeseries_rh_3,timeseries_rh_4,timeseries_rh_5,timeseries_rh_6,timeseries_rh_7,timeseries_rh_8,timeseries_rh_9,...,timeseries_lh_585,timeseries_lh_586,timeseries_lh_587,timeseries_lh_588,timeseries_lh_589,timeseries_lh_590,timeseries_lh_591,timeseries_lh_592,timeseries_lh_593,timeseries_lh_594
0,-0.000165,4.6e-05,-7.7e-05,-7.5e-05,7.4e-05,-1.1e-05,-4.9e-05,4.7e-05,-2.4e-05,-2.4e-05,...,-8.290498e-06,-6e-06,-2.3e-05,1.610693e-06,1.5e-05,-6e-06,4.867083e-06,-0.0001215231,-0.00014,-4.8e-05
1,5.1e-05,2.7e-05,-1.1e-05,-3e-06,2.2e-05,3.3e-05,4.9e-05,7.2e-05,1e-05,-1.4e-05,...,9.147214e-06,-3.3e-05,-1.5e-05,4.817195e-06,1e-06,9e-06,-3.010718e-05,5.807162e-05,-7e-05,1.6e-05
2,-1.9e-05,-2.4e-05,-4e-06,2.7e-05,-5.4e-05,1.3e-05,6.4e-05,-0.000118,-6.5e-05,6.3e-05,...,-8.021237e-06,-5.9e-05,4e-06,-1.018778e-05,-2.6e-05,-3e-06,1.120659e-05,-3.87497e-05,5.7e-05,-8e-06
3,3.7e-05,2.7e-05,5e-05,8e-05,3.8e-05,9e-06,-9.4e-05,-0.000117,5.6e-05,-5e-06,...,2.637188e-07,-1.5e-05,-1.1e-05,-6.939784e-06,2.2e-05,5e-06,-2.519195e-05,0.0001219129,2.1e-05,7.4e-05
4,-3e-05,1.3e-05,-4.8e-05,-2e-06,4.3e-05,-2.1e-05,-2.1e-05,4.5e-05,1.5e-05,-8e-06,...,-4.193627e-05,-5e-06,-3.8e-05,-1.579288e-05,-1e-05,7e-06,-2.074608e-05,0.0001288912,4.8e-05,1.5e-05
5,-2.7e-05,1.2e-05,4.9e-05,-4e-05,0.000137,-2e-05,2.3e-05,5.7e-05,2e-05,1.8e-05,...,-2.317345e-05,4.7e-05,-2.1e-05,-3.256373e-06,1.3e-05,6e-06,-2.017995e-05,3.17479e-05,-4.4e-05,-5e-05
6,-3e-06,1.1e-05,3.7e-05,-7e-06,2.6e-05,3.4e-05,7e-06,-7.1e-05,-1.9e-05,-4e-06,...,1.230251e-05,6.5e-05,8e-06,8.041033e-07,1e-06,-2.6e-05,-1.401379e-05,2.662647e-05,-2e-05,3.2e-05
7,3.8e-05,1.9e-05,6e-06,1.7e-05,-0.000173,2.7e-05,-5.8e-05,0.00012,2.8e-05,-2.9e-05,...,-2.762708e-05,1.9e-05,1.5e-05,-5.296039e-06,-2.1e-05,1.7e-05,-3.512035e-06,-0.0001743649,1.5e-05,2e-06
8,-9e-06,7e-06,3.4e-05,-2e-06,3.2e-05,-1.1e-05,-2.1e-05,-0.000113,4e-05,2.4e-05,...,-1.286571e-06,-2.2e-05,-2.7e-05,2.031265e-05,-8e-06,3.5e-05,-5.331094e-06,-5.483645e-05,0.000103,-1.4e-05
9,6.2e-05,-2.2e-05,6e-05,1e-05,-1.7e-05,1.2e-05,-1.9e-05,9.3e-05,-2e-06,2.8e-05,...,-1.272615e-05,2.7e-05,-1.5e-05,-1.022682e-05,-4.4e-05,-6e-06,4.879025e-06,3.508208e-07,-6.9e-05,-2e-06
