Wrong shape of variance-covariance matrix in ARDRegressor when parameters are pruned #18858

erikfransson · 2020-11-17T21:01:04Z

The documentation for ARDRegression (https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.ARDRegression.html) states that the shape of sigma_ is

sigma_ : array-like of shape (n_features, n_features)

However, if parameters are pruned from the model, the sigma_ matrix shrinks. For example, if 10 parameters are pruned, sigma_ has shape (n_features - 10, n_features - 10).

This could easily be fixed in the documentation, but it would of course be nice to have access to the full sigma_ matrix.

Short example to demonstrate the behavior

import numpy as np
from sklearn.linear_model import ARDRegression

# make dataset
N = 100
M = 50
np.random.seed(42)
X = np.random.normal(0, 1, (N, M))
y = np.random.normal(0, 1, (N))

for lam in [1e7, 1e4]:
    ardr = ARDRegression(threshold_lambda=lam)
    ardr.fit(X, y)
    n_kept = np.count_nonzero(ardr.coef_)
    print('Nonzero parameters: {} / {}'.format(n_kept, M))
    print('sigma_ shape: ', ardr.sigma_.shape)

with output

Nonzero parameters: 50 / 50
sigma_ shape:  (50, 50)

Nonzero parameters: 37 / 50
sigma_ shape:  (37, 37)


mokeddembillel · 2020-11-24T19:11:06Z

Hi, I'm new here and I would like to contribute to this issue. I just want to make sure I fully understand you:
can you please explain what you mean by "to have access to the full sigma_ matrix"?
Thank you

erikfransson · 2020-11-25T16:32:52Z

Hi,
Yes, in my linear model I have 50 parameters (weights), and sigma_ is the covariance matrix for these parameters. So I would expect sigma_[i][j] to be the covariance between parameter i and parameter j, and hence sigma_ to have shape (50, 50).
However, if ARDR prunes 10 parameters, the shape of sigma_ shrinks to (40, 40).

So if I now want to find the covariance between parameters 32 and 48, I don't know how to do it. I'm guessing I would have to re-index them, e.g. 32 --> 25 and 48 --> 39, based on which parameters were pruned.
By "full sigma_ matrix" I just meant the idea of keeping the shape (50, 50) and filling in zeros for the pruned parameters, so that no re-indexing is necessary.
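
As a possible workaround until this is handled in the library, the zero-padding idea can be sketched with plain NumPy. The helper name expand_sigma and the toy numbers below are hypothetical, not part of scikit-learn. The sketch assumes that the rows and columns of the reduced sigma_ follow the original order of the kept features.

```python
import numpy as np


def expand_sigma(sigma_reduced, keep_mask):
    """Embed a reduced covariance matrix into a full square matrix.

    Hypothetical helper (not a scikit-learn function). Rows and columns
    belonging to pruned features are filled with zeros.
    Assumption: sigma_reduced is ordered like the kept features.
    """
    keep_mask = np.asarray(keep_mask, dtype=bool)
    kept = np.flatnonzero(keep_mask)  # original indices of kept features
    if sigma_reduced.shape != (kept.size, kept.size):
        raise ValueError("keep_mask does not match the shape of sigma_reduced")
    full = np.zeros((keep_mask.size, keep_mask.size), dtype=sigma_reduced.dtype)
    full[np.ix_(kept, kept)] = sigma_reduced  # scatter into the kept block
    return full


# Toy data: 5 features, of which features 1 and 4 were pruned.
keep_mask = np.array([True, False, True, True, False])
sigma_reduced = np.array([[1.0, 0.1, 0.2],
                          [0.1, 2.0, 0.3],
                          [0.2, 0.3, 3.0]])

sigma_full = expand_sigma(sigma_reduced, keep_mask)
print(sigma_full.shape)   # (5, 5)
print(sigma_full[0, 3])   # covariance between original features 0 and 3
```

With a fitted model, one might call expand_sigma(ardr.sigma_, ardr.coef_ != 0), under the further assumption that every kept coefficient is non-zero (the same assumption the count_nonzero call in the example above relies on). The shape check in the helper raises an error if that assumption fails.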

erikfransson added the Documentation label Nov 17, 2020

cmarmo added the module:linear_model label Jan 19, 2021
