
ReceptiveField is not pipelineable #5089

Open
nbara opened this issue Apr 4, 2018 · 13 comments
Comments

@nbara
Contributor

nbara commented Apr 4, 2018

Hi all,

I've been hitting a roadblock recently with the ReceptiveField module.

I'd like to use sklearn's cross-validation tools to find optimal hyperparameters for my data (a typical dichotic listening experiment, similar to the one in the tutorials):
https://mne-tools.github.io/dev/auto_tutorials/plot_receptive_field.html#compare-model-performance

In that example the cross-validation and hyperparameter tuning are done "manually" in a for loop. This can be extremely slow as it doesn't use multiprocessing.

However, when I tried using sklearn's GridSearchCV to find the best alpha, I ran into all kinds of nasty sklearn reshaping errors. It's probably due to the input data format required by ReceptiveField (namely (n_times, n_epochs, n_chans)), which is:

  • not compatible with the rest of the MNE estimators like Scaler (which takes (n_epochs, n_chans, n_times))
  • not compatible with sklearn's CV tools
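To make the mismatch concrete, here is a small numpy sketch (all sizes hypothetical; the shapes are the ones given in the ReceptiveField and Scaler docstrings):

```python
import numpy as np

# Hypothetical sizes for illustration
n_epochs, n_channels, n_times = 10, 32, 500

# Scaler (and the other MNE decoding estimators) work on arrays shaped
# (n_epochs, n_channels, n_times)
X_decoding = np.random.randn(n_epochs, n_channels, n_times)

# ReceptiveField wants (n_times, n_epochs, n_channels), so the same data
# has to be transposed before the two can share a Pipeline
X_rf = X_decoding.transpose(2, 0, 1)
print(X_rf.shape)  # (500, 10, 32)
```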

For instance, the code below fails with a reshape-related error in sklearn:

import numpy as np

from mne.decoding import ReceptiveField, Scaler
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.pipeline import Pipeline

# tmin, tmax, fs, ch_names, n_channels, n_splits, n_jobs, eeg and y
# are defined earlier in my script
pipe = Pipeline([
    ('scaler', Scaler(scalings='mean')),
    ('rf', ReceptiveField(tmin, tmax, fs, scoring='r2',
                          feature_names=ch_names[:n_channels])),
    ])
kf = KFold(n_splits=n_splits, shuffle=False)
parameters = {
    'rf__estimator': np.logspace(0, 9, 10),
    }
grid = GridSearchCV(pipe, parameters, cv=kf, n_jobs=n_jobs,
                    scoring='r2', verbose=2)
grid.fit(eeg, y)
res = grid.cv_results_
...........................................................................
/Users/nicolas/anaconda3/envs/py3/lib/python3.6/site-packages/sklearn/metrics/regression.py in _check_reg_targets(...)
     75     check_consistent_length(y_true, y_pred)
---> 76     y_true = check_array(y_true, ensure_2d=False)
     77     y_pred = check_array(y_pred, ensure_2d=False)

...........................................................................
/Users/nicolas/anaconda3/envs/py3/lib/python3.6/site-packages/sklearn/utils/validation.py in check_array(...)
    449         if not allow_nd and array.ndim >= 3:
    450             raise ValueError("Found array with dim %d. %s expected <= 2."
--> 451                              % (array.ndim, estimator_name))
        array.ndim = 3
        estimator_name = 'Estimator'

ValueError: Found array with dim 3. Estimator expected <= 2.

PS: actually I wanted to pass 'rf__estimator': [Ridge(alpha) for alpha in np.logspace(0, 9, 10)], but I'm not sure this is correct?
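For what it's worth, a list of pre-configured estimator instances is a legal grid value in sklearn, since GridSearchCV simply calls set_params with each candidate. A minimal sketch on plain 2-D data (no MNE involved; the step name 'reg' and all sizes are hypothetical):

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

rng = np.random.RandomState(0)
X = rng.randn(100, 5)
y = X @ rng.randn(5) + 0.1 * rng.randn(100)

# A pipeline step (here named 'reg') is itself a settable parameter, so a
# list of configured estimator instances is a valid search space
pipe = Pipeline([('reg', Ridge())])
params = {'reg': [Ridge(alpha=a) for a in np.logspace(-2, 2, 5)]}
grid = GridSearchCV(pipe, params, cv=3).fit(X, y)
print(type(grid.best_params_['reg']).__name__)  # Ridge
```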

@jona-sassenhagen
Contributor

I'm not sure you'll gain much from multiprocessing, the Ridge estimator will very efficiently use your cores anyways.

Also, couldn't you create a GridSearchCV for a pipeline and then pass the pipeline to ReceptiveField? Or does TimeDelayingRidge maybe help you here?

@nbara
Contributor Author

nbara commented Apr 4, 2018

I'm not sure you'll gain much from multiprocessing, the Ridge estimator will very efficiently use your cores anyways.

Are you sure? It's excruciatingly slow on a very fast server....

What I've been doing until now to speed things up a notch is using multiprocessing to run subjects in parallel (I have ~30):

import multiprocessing
pool = multiprocessing.Pool(processes=n_jobs)
pool.map(training_function, subjects_list)
pool.close()
pool.join()
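As an aside, sklearn's own n_jobs parallelism runs through joblib, so the same per-subject loop could equally be written with joblib.Parallel (training_function and subjects_list below are stand-ins for the real ones):

```python
from joblib import Parallel, delayed

def training_function(subject):
    # stand-in for the real per-subject fitting code
    return subject * 2

subjects_list = list(range(30))
results = Parallel(n_jobs=2)(
    delayed(training_function)(s) for s in subjects_list)
print(results[:3])  # [0, 2, 4]
```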

Also, couldn't you create a GridSearchCV for a pipeline and then pass the pipeline to ReceptiveField? Or does TimeDelayingRidge maybe help you here?

I will try your suggestion and let you know, thanks.

You're right that TDR is quite a bit faster, but unfortunately I've been getting noticeably better results with Ridge, which is why I'm using the latter.

Still, this doesn't address the issue that ReceptiveField does not accept the same input data layout as the other MNE estimators.

@larsoner
Member

larsoner commented Apr 4, 2018

You're right that TDR is quite a bit faster, but unfortunately I've been getting noticeably better results with Ridge which is why I'm using the latter.

Ridge and TimeDelayingRidge are meant to produce identical (to numerical precision) results. If they do not, we have some bug. Can you try to make a minimal example?
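A minimal example of that kind of numerical-equivalence check, sketched here with plain ridge regression (closed form vs. sklearn) rather than the MNE classes, which would need real time-series data; all sizes are hypothetical:

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.RandomState(0)
X = rng.randn(200, 10)
y = X @ rng.randn(10) + 0.1 * rng.randn(200)

alpha = 1.0
# Closed-form ridge solution w = (X'X + alpha*I)^(-1) X'y, no intercept
w_closed = np.linalg.solve(X.T @ X + alpha * np.eye(10), X.T @ y)
w_sklearn = Ridge(alpha=alpha, fit_intercept=False).fit(X, y).coef_

# "identical to numerical precision" means agreement at machine-epsilon
# scale, which np.allclose checks
print(np.allclose(w_closed, w_sklearn))  # True
```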

@nbara
Contributor Author

nbara commented Apr 4, 2018

Uh oh, now you're making me doubt, @larsoner :) I'll double-check the Ridge vs. TDR results and get back to you ASAP.

@jona-sassenhagen
Contributor

Are you sure? It's excruciatingly slow on a very fast server....

If you want to see if it's using your cores, you should check CPU usage.

@jona-sassenhagen
Contributor

Btw, as you know, there isn't much to tune with Ridge anyway. You can use RidgeCV to tune internally.

@nbara
Contributor Author

nbara commented Apr 5, 2018

Btw, as you know, there isn't much to tune with Ridge anyways. You can use RidgeCV to tune internally.

Currently my pipeline consists of a z-scorer (according to several papers this should be done), an RF, and a classifier (I have 2 concurrent streams, but this is done in many other papers as well).

Or what if I had too many channels and wanted to run a PCA before the RF? Currently I'm doing all this by hand, but shouldn't I be able to do it all at once in sklearn?

What I was hoping to achieve is something similar to this example:
http://www.martinos.org/mne/dev/auto_examples/decoding/plot_linear_model_patterns.html#sphx-glr-auto-examples-decoding-plot-linear-model-patterns-py
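What the linked example does is possible because every step consumes 2-D (n_samples, n_features) arrays. A minimal sklearn-only sketch of the z-score → PCA → regressor idea (all sizes hypothetical):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.RandomState(0)
X = rng.randn(300, 64)                          # e.g. (samples, channels)
y = X[:, :5].sum(axis=1) + 0.1 * rng.randn(300)

# z-score -> PCA -> ridge in one cross-validatable object; this works only
# because every step consumes 2-D arrays, which is the contract that
# ReceptiveField's 3-D input breaks
pipe = Pipeline([('scale', StandardScaler()),
                 ('pca', PCA(n_components=10)),
                 ('ridge', Ridge(alpha=1.0))])
pred = pipe.fit(X, y).predict(X)
print(pred.shape)  # (300,)
```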

@larsoner
Member

larsoner commented Apr 5, 2018

@choldgraf thoughts on this pipeline for ReceptiveField idea?

@choldgraf
Contributor

Mmmm, I'm not sure what the path forward to making this possible would be, since it's been a while since I've used the code. But I think something like this would be useful for sure. Happy to brainstorm w/ folks.

@jona-sassenhagen
Contributor

In retrospect, it seems very inelegant to me that the decoding and (implicit) encoding functionality expect data in different shapes.

@agramfort
Member

agramfort commented Jul 30, 2018 via email

@jona-sassenhagen
Contributor

Maybe add a kwarg "data_orientation" and a 3-release switch that goes:

1. data_orientation="e_t_c" is the default; warn that it will change to "e_c_t"
2. data_orientation="e_c_t" becomes the default
3. delete the keyword

?

That is, if consistency between decoding and encoding is actually agreed to be useful. Maybe it's not actually inherently desirable.
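That proposed first stage could be sketched roughly like this (the function name, kwarg values, and shapes are all hypothetical; this is not the actual MNE implementation):

```python
import warnings
import numpy as np

def _check_orientation(X, data_orientation=None):
    """Sketch of release 1 of the proposed switch (names hypothetical)."""
    if data_orientation is None:
        warnings.warn(
            "the default data_orientation 'e_t_c' will change to 'e_c_t'; "
            "pass it explicitly to silence this warning", DeprecationWarning)
        data_orientation = "e_t_c"
    if data_orientation == "e_c_t":
        # normalize internally to the old (epochs, times, chans) order
        X = X.transpose(0, 2, 1)
    return X

X = np.zeros((10, 500, 32))  # (epochs, times, chans)
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    out = _check_orientation(X)
print(out.shape == X.shape,
      issubclass(caught[0].category, DeprecationWarning))  # True True
```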

@choldgraf @larsoner @kingjr

@larsoner
Member

I don't have much of an opinion
