
[MRG] Improve the VAR module with: i) regression tests against statsmodels, ii) selecting order and iii) improved lags handling #46

Merged
adam2392 merged 27 commits into mne-tools:main from the var branch on Oct 26, 2021

Conversation

@adam2392 (Member) commented on Sep 18, 2021:

PR Description

Fixes: #45
Fixes: #28

TODO:

  1. Add tests for trend handling -> left for the future.
  2. Consolidate the way lags are handled in the dataset when storing as a Connectivity object

Merge checklist

Maintainer, please confirm the following before merging:

  • All comments resolved
  • This is not your own PR
  • All CIs are happy
  • PR title starts with [MRG]
  • whats_new.rst is updated
  • PR description includes phrase "closes <#issue-number>"

@adam2392 (Member, author) commented:

To handle lags of the VAR model and easily form the companion matrix of the VAR coefficient matrices, we will need to see whether this issue can be resolved in scipy: scipy/scipy#13124

This would form the multivariate block-companion matrix, which can then be leveraged to check the stability of the VAR model. This would be a useful attribute to expose to users in DynamicMixin.
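
For reference, forming the companion matrix and checking stability is straightforward once the lag coefficients are available. This is just a minimal NumPy sketch of the standard construction (not the DynamicMixin API), assuming coefs is a list of the p lag coefficient matrices A_1, ..., A_p, each of shape (n_channels, n_channels):

import numpy as np


def companion_matrix(coefs):
    """Stack VAR(p) coefficients [A_1, ..., A_p] into block-companion form.

    Each A_i has shape (n, n); the result has shape (n * p, n * p).
    """
    p = len(coefs)
    n = coefs[0].shape[0]
    # top block row: [A_1, A_2, ..., A_p]
    top = np.hstack(coefs)
    # sub-diagonal identity blocks shift the lagged state down by one
    bottom = np.hstack([np.eye(n * (p - 1)), np.zeros((n * (p - 1), n))])
    return np.vstack([top, bottom])


def is_stable(coefs):
    # stable iff all companion-matrix eigenvalues lie inside the unit circle
    eigvals = np.linalg.eigvals(companion_matrix(coefs))
    return bool(np.all(np.abs(eigvals) < 1))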

@adam2392 changed the title from "[WIP] Improve the VAR module with: i) regression tests against statsmodels, ii) selecting order and iii) improved lags/trend handling" to "[WIP] Improve the VAR module with: i) regression tests against statsmodels, ii) selecting order and iii) improved lags handling" on Sep 20, 2021
@adam2392 changed the title prefix from "[WIP]" to "[MRG]" on Sep 21, 2021
@larsoner (Member) left a review comment:

I'm going to stop where I am so far, because I'm noticing a lot of uncovered lines. Is codecov wrong here? If not, then I think there is a lot of code that should either be cut out, or tested properly if possible

Review threads (outdated, resolved) on: mne_connectivity/base.py, mne_connectivity/vector_ar/model_selection.py (×3), mne_connectivity/vector_ar/utils.py
@adam2392 (Member, author) commented:

> I'm going to stop where I am so far, because I'm noticing a lot of uncovered lines. Is codecov wrong here? If not, then I think there is a lot of code that should either be cut out, or tested properly if possible

Ah, sorry about that. I was just hoping to get your thoughts on the current implementation ported over from the statsmodels backend. I'll get rid of the unnecessary portions of the code, based on the discussion in #45.

@agramfort (Member) left a review comment:

The code you copied from statsmodels should be cleaned up to follow MNE's strict code style requirements.

thx !

Review threads (outdated, resolved) on: mne_connectivity/vector_ar/model_selection.py (×2), mne_connectivity/vector_ar/tests/test_var.py (×2)
adam2392 and others added 3 commits on September 25, 2021 (Co-authored-by: Alexandre Gramfort <alexandre.gramfort@m4x.org>)
@adam2392 (Member, author) commented:

Okay, I removed a bunch of the unnecessary code taken from statsmodels and added a few more tests. Checking CI now.

@agramfort (Member) commented:

Can you give me evidence that it works? Did you test it on simulations? Do you see a speed or memory gain?

@adam2392 (Member, author) commented:

Current memory profiler output:

Line #    Mem usage    Increment  Occurences   Line Contents
============================================================
    18    780.3 MiB    780.3 MiB           1   @profile
    19                                         def run_experiment(data, times):
    20                                             """Run RAM experiment.
    21                                         
    22                                             python -m memory_profiler ./benchmarks/bench_var.py
    23                                             """
    24                                             # compute time-varying var
    25   1112.3 MiB    332.0 MiB           1       conn = vector_auto_regression(data, times=times)


Filename: ./benchmarks/bench_var.py

Line #    Mem usage    Increment  Occurences   Line Contents
============================================================
    28   1116.4 MiB   1116.4 MiB           1   @profile
    29                                         def run_sm_experiment(sample_data):
    30                                             # statsmodels feeds in (n_samples, n_channels)
    31   1116.5 MiB      0.1 MiB           1       sm_var = VAR(endog=sample_data.squeeze().T)
    32   1166.9 MiB     50.4 MiB           1       sm_params = sm_var.fit(maxlags=1, trend='n')
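
For context, bench_var.py wraps both fits with memory_profiler's @profile decorator. A hypothetical skeleton of that file (the actual data loading and array sizes in the repo differ):

import numpy as np
from memory_profiler import profile
from statsmodels.tsa.api import VAR

from mne_connectivity import vector_auto_regression

# synthetic stand-in for the real benchmark data:
# shape (n_epochs, n_channels, n_times)
rng = np.random.default_rng(0)
data = rng.standard_normal((1, 20, 50_000))
times = np.arange(data.shape[-1])


@profile
def run_experiment(data, times):
    """Run RAM experiment.

    python -m memory_profiler ./benchmarks/bench_var.py
    """
    # compute time-varying var
    vector_auto_regression(data, times=times)


@profile
def run_sm_experiment(sample_data):
    """Run RAM experiment with statsmodels."""
    # statsmodels feeds in (n_samples, n_channels)
    sm_var = VAR(endog=sample_data.squeeze().T)
    sm_var.fit(maxlags=1, trend='n')


if __name__ == '__main__':
    run_experiment(data, times)
    run_sm_experiment(data)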

With a for loop over samples (TBD):

@agramfort (Member) commented on Oct 11, 2021 via email.

@adam2392 (Member, author) commented:

Sorry, can you unpack this?

The total RAM usage is basically comparable to that of statsmodels (a bit lower overall, probably because we're not storing as many variables).

In #28 (comment), though, you suggested using a for loop to prevent high RAM usage with a large number of n_chs/n_samples. I'm not sure how best to incorporate that suggestion with lags, so I was wondering if you had any tips on how to incorporate lags into the for loop over samples?

@agramfort (Member) commented on Oct 11, 2021 via email.

@adam2392 (Member, author) commented:

import numpy as np


def _test_forloop(X, lags, offset=0, l2_reg=0):
    """Fit VAR coefficients by accumulating the normal equations per sample."""
    # possibly offset the endogenous variable over the samples
    endog = X[offset:, :]

    # get the number of time points and equations (channels)
    n_times, n_equations = endog.shape

    # X.T @ X coefficient matrix of the normal equations
    n_channels = n_equations * lags
    XdotX = np.zeros((n_channels, n_channels))

    # X.T @ Y ordinate / dependent-variable matrix
    XdotY = np.zeros((n_channels, n_channels))

    # loop over sample points and aggregate the
    # necessary elements of the normal equations
    first_component = np.zeros((n_channels, 1))
    second_component = np.zeros((1, n_channels))
    y_component = np.zeros((1, n_channels))
    for idx in range(n_times - lags):
        for jdx in range(lags):
            block = slice(jdx * n_equations, (jdx + 1) * n_equations)
            first_component[block, :] = endog[idx + jdx, :][:, np.newaxis]
            second_component[:, block] = endog[idx + jdx, :][np.newaxis, :]
            y_component[:, block] = endog[idx + 1 + jdx, :][np.newaxis, :]

        # increment for X.T @ X
        XdotX += first_component @ second_component

        # increment for X.T @ Y
        XdotY += first_component @ y_component

    # solve the (optionally ridge-regularized) normal equations
    if l2_reg != 0:
        final_params = np.linalg.lstsq(
            XdotX + l2_reg * np.eye(n_channels), XdotY, rcond=1e-15)[0].T
    else:
        final_params = np.linalg.lstsq(XdotX, XdotY, rcond=1e-15)[0].T

    # format the final matrix as (lags * n_equations, n_equations),
    # reordering the lag blocks from last to first
    params = np.empty((lags * n_equations, n_equations))
    for idx in range(lags):
        start_col = n_equations * idx
        stop_col = n_equations * (idx + 1)
        start_row = n_equations * (lags - idx - 1)
        stop_row = n_equations * (lags - idx)
        params[start_row:stop_row, :] = \
            final_params[n_equations * (lags - 1):, start_col:stop_col].T
    return params
The for loop over the samples is considerably slower and, for some reason, uses a lot more memory. This was measured by running bench_var.py. It's probably slower because you end up with n_times * lags loop iterations; I'm not sure why more RAM would be used, though...

# for the current implementation
Line #    Mem usage    Increment  Occurences   Line Contents
============================================================
     8    784.3 MiB    784.3 MiB           1   @profile
     9                                         def run_experiment(data, times):
    10                                             """Run RAM experiment.
    11                                         
    12                                             python -m memory_profiler ./benchmarks/bench_var.py
    13                                             """
    14                                             # compute time-varying var
    15   1389.2 MiB    604.8 MiB           1       conn = vector_auto_regression(data, times=times, lags=5)

# statsmodels implementation
Line #    Mem usage    Increment  Occurences   Line Contents
============================================================
    18    780.2 MiB    780.2 MiB           1   @profile
    19                                         def run_sm_experiment(sample_data):
    20                                             """Run RAM experiment with statsmodels."""
    21                                             # statsmodels feeds in (n_samples, n_channels)
    22    780.3 MiB      0.1 MiB           1       sm_var = VAR(endog=sample_data.squeeze().T)
    23   1233.4 MiB    453.1 MiB           1       sm_params = sm_var.fit(maxlags=5, trend='n')

# for loop implementation
Line #    Mem usage    Increment  Occurences   Line Contents
============================================================
     8    789.0 MiB    789.0 MiB           1   @profile
     9                                         def run_experiment(data, times):
    10                                             """Run RAM experiment.
    11                                         
    12                                             python -m memory_profiler ./benchmarks/bench_var.py
    13                                             """
    14                                             # compute time-varying var
    15   2107.6 MiB   1318.6 MiB           1       conn = vector_auto_regression(data, times=times, lags=5)
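
For contrast with _test_forloop() above, the approach the PR ends up keeping builds the full lagged design matrix once and solves a single least-squares problem, which is roughly what statsmodels does internally. A minimal sketch of that idea (not the PR's exact code):

import numpy as np


def _vectorized_fit(X, lags, l2_reg=0):
    """Solve the VAR least-squares problem with one lstsq call.

    X has shape (n_times, n_equations); returns the stacked lag
    coefficients with shape (n_equations * lags, n_equations).
    """
    n_times, n_equations = X.shape
    # design matrix: row t holds [x_{t+lags-1}, ..., x_t], i.e. lag 1
    # through lag p of the target sample x_{t+lags}
    Z = np.hstack([X[lags - k - 1:n_times - k - 1] for k in range(lags)])
    y = X[lags:]
    if l2_reg != 0:
        # ridge-regularized normal equations
        A = Z.T @ Z + l2_reg * np.eye(n_equations * lags)
        return np.linalg.solve(A, Z.T @ y)
    return np.linalg.lstsq(Z, y, rcond=1e-15)[0]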

@adam2392 (Member, author) commented:

If that looks okay, then I can revert to the original method, clean up the code, and keep the bench_var.py file in there for future people to test against statsmodels.

@adam2392 (Member, author) commented:

Running bench_var.py, I see pretty similar RAM performance to statsmodels.

When we do a for loop over samples and lags, the code is significantly slower, so I think it's best to stick with an algorithm similar to the one statsmodels uses. This is good to go now from my end. Once you approve this, @agramfort, I can remove the _test_forloop() inside var.py, which shows how we compute the VAR model using your suggested for loop.

As a result, I also removed scikit-learn from our dependencies.

@agramfort (Member) left a review comment:

I trust your judgement @adam2392

If you have tests against statsmodels you should be good!

thx

@adam2392 merged commit 8373167 into mne-tools:main on Oct 26, 2021
@adam2392 deleted the var branch on October 26, 2021 at 23:57
tsbinns pushed a commit to tsbinns/mne-connectivity that referenced this pull request on Dec 15, 2023: [MRG] Improve the VAR module with: i) regression tests against statsmodels, ii) selecting order and iii) improved lags handling (mne-tools#46)

* Improve VAR ram usage

* Adding updated functionality from statsmodels

* Fix

* Fix whatsnew

* Clean rebase

* Clean rebase

* Clean up

* Adding utils function for block companion

* Adding companion matrix formulation

* Nest imports

* Fix docs

* Apply suggestions from code review

Co-authored-by: Alexandre Gramfort <alexandre.gramfort@m4x.org>

* Fix coverage

* Fix coverage

* Fix unit tests

* Update mne_connectivity/base.py

Co-authored-by: Eric Larson <larson.eric.d@gmail.com>

* Try again

* Fix unit tests coverage

* Fix

* Fix

* Adding benchmarks

* Fixing benchmarks

* Try again

* Add benchmarks to date

* Fix manifest

* Fix manifest

* Clean up sklearn

Co-authored-by: Alexandre Gramfort <alexandre.gramfort@m4x.org>
Co-authored-by: Eric Larson <larson.eric.d@gmail.com>
Successfully merging this pull request may close these issues:

  • Tests and consolidation of VAR options
  • RAM Usage in VAR Models