In [1]:
%load_ext autoreload
%autoreload 2

# Linear regression with laplace prior
 + In general, laplace prior gives sparse result for regression
     + However, it is difficult to deal with it well due to non-differential point at the origin.
         + $\log p(w) \equiv -1/\beta \sum_j |w_j| $, $|w_j|$ is non-differential at the origin.
 + By the way, non-differential point is eliminated by integrating $|w_j|$:
     + $E[|w_j|]$ does not have non-diffenrential point when the distribution is normal distribution.
     + It is achieved when we consider about the objective function of variational Bayes.
         + $\mathcal{F} := E[\log \frac{q(w)}{p(Y|X,w}p(w)]$
         + Here, $\mathcal{F}$ has a parameter that decides the form of $q(w) = N(w|m, \Sigma)$, $(m, \Sigma)$ is the parameter and optimized by it.
 + In this notebook, the approximated posterior distribution by Variational Bayes is studied.
     + The objective function is optimized by a gradient descent method.
         + Specifically, the Natural gradient descent is efficient method when we consider about a constrained parameter like positive definite matrix, positive real value, simplex, and so on.
         + Thus, we used the natural gradient descent.

# Formulation
+ Learning Model:
    + $p(y|x,w) = N(y|x \cdot w, 1), y \in mathbb{R}, x,w \in \mathbb{R}^M$
    + $p(w) \equiv \exp(-\frac{1}{\beta} \sum_j |w_j|)$, $\beta$ is hyperparameter.
+ Approximated Variational Posterior distribution:
    + $q(w) = N(w|m, \Sigma)$
        + $m \in \mathbb{R}^M, \Sigma \in \mathbb{R}^{M \times M}$ is the parameters to be optimized.

# In this notebook
+ We compare the following average generalization error:
$$
    G(n) = \frac{1}{L} \sum_{j=1}^L \| y - X \hat{w}(x^l, y^l) \|^2,
$$
where $\hat{w}$ is estimated parameter by $(x^l, y^l)$.  
We evaluate the error among Lasso, Ridge, and VB laplace(this calculation).

# Preliminary
## Import library

In [2]:
import pandas as pd
import numpy as np
from scipy.stats import norm
from scipy.stats import invwishart

In [3]:
from sklearn.linear_model import LassoCV, Lasso, LassoLarsCV
from sklearn.linear_model import RidgeCV, Ridge
from sklearn.linear_model import ARDRegression
from sklearn.base import BaseEstimator, RegressorMixin

## Data setting

In [27]:
## data setting
n = 100 # train size
M = 150 # # of features
n_zero_ind = M//2 # # of zero elements in the parameter
prob_seed = 20201110 # random seed

N = 10000 # test size

datasets = 100

## Problem setting

In [5]:
np.random.seed(prob_seed)
true_w = np.random.normal(scale = 3, size = M)
zero_ind = np.random.choice(M, size = n_zero_ind)
true_w[zero_ind] = 0

## Learning settings

In [6]:
ln_vb_params = {
    "pri_beta": 10,
    "pri_opt_flag": True,
    "variance": "diag",
    "iteration": 10000,
    "step": 0.2,
    "is_trace": False,
    "trace_step": 100
}
ln_lasso_params = {
    "fit_intercept": False,
    "cv": 5,
    "max_iter": 10000
}
ln_ridge_params = {
    "fit_intercept": False,
    "cv": 5
}
ln_ard_params = {
    "fit_intercept": False
}

## Classes

In [45]:
class VBLaplace(BaseEstimator, RegressorMixin):
    def __init__(
        self, pri_beta: float = 20, pri_opt_flag: bool = True, variance: str = "full",
        seed: int = -1, iteration: int = 1000, tol: float = 1e-8, step: float = 0.1,
        is_trace: bool = False, trace_step: int = 20
    ):
        self.pri_beta = pri_beta
        self.pri_opt_flag = pri_opt_flag
        self.variance = variance
        self.seed = seed
        self.iteration = iteration
        self.tol = tol
        self.step = step
        self.is_trace = is_trace
        self.trace_step = trace_step
        pass
    
    def _initialization(self, M: int):
        seed = self.seed
        
        if seed > 0:
            np.random.seed(seed)
        
        mean = np.random.normal(size = M)
        sigma = invwishart.rvs(df = M+2, scale = np.eye(M), size = 1)
        if self.variance == "diag":
            sigma = np.diag(np.diag(sigma))
        pri_beta = np.random.gamma(shape = 3, size = 1) if self.pri_opt_flag else self.pri_beta
        
        self.mean_ = mean
        self.sigma_ = sigma
        self.pri_beta_ = pri_beta
        pass
    
    def _obj_func(self, X:np.ndarray, y:np.ndarray, pri_beta:float, mean:np.ndarray, sigma:np.ndarray) -> float:
        """
        Calculate objective function.

        + Input:
            1. X: input matrix (n, M) matrix
            2. y: output vector (n, ) matrix
            3. mean: mean parameter of vb posterior
            4. sigma: covariance matrix of vb posterior

        + Output:
            value of the objective function.

        """

        n, M = X.shape

        sq_sigma_diag = np.sqrt(np.diag(sigma))
        log_2pi = np.log(2*np.pi)

        F = 0
        # const values
        F += -M/2*log_2pi -M/2 + M*log_2pi + n*M/2*log_2pi + M*np.log(2*pri_beta)

        F += ((y - X@mean)**2).sum()/2 - np.linalg.slogdet(sigma)[1]/2 + np.trace(X.T @ X @ sigma)/2

        # term obtained from laplace prior
        F += ((mean + 2*sq_sigma_diag*norm.pdf(-mean/sq_sigma_diag)-2*mean*norm.cdf(-mean/sq_sigma_diag))/pri_beta).sum()

        return F
    
    def fit(self, train_X:np.ndarray, train_Y:np.ndarray):
        pri_beta = self.pri_beta
        iteration = self.iteration
        step = self.step
        tol = self.tol
        
        is_trace = self.is_trace
        trace_step = self.trace_step
        
        M = train_X.shape[1]
        
        if not hasattr(self, "mean_"):
            self._initialization(M)
        
        est_mean = self.mean_
        est_sigma = self.sigma_
        est_pri_beta = self.pri_beta_
        
        # transformation to natural parameter
        theta1 = np.linalg.solve(est_sigma, est_mean)
        theta2 = -np.linalg.inv(est_sigma)/2        
        
        F = []
        
        cov_X = train_X.T @ train_X
        cov_YX = train_Y @ train_X
        for ite in range(iteration):
            sq_sigma_diag = np.sqrt(np.diag(est_sigma))

            # update mean and sigma by natural gradient
            dFdnu1 = theta1 - cov_YX
            dFdnu1 += (1 - 2*est_mean/sq_sigma_diag*norm.pdf(-est_mean/sq_sigma_diag) - 2*norm.cdf(-est_mean/sq_sigma_diag)) / est_pri_beta
            dFdnu2 = theta2 + cov_X/2
            if self.variance == "diag":
                dFdnu2 = np.diag(np.diag(dFdnu2))
            dFdnu2[np.diag_indices(M)] += 1/sq_sigma_diag*norm.pdf(-est_mean/sq_sigma_diag)/est_pri_beta
            
            
            theta1 += -step * dFdnu1
            theta2 += -step * dFdnu2
            est_sigma = -np.linalg.inv(theta2)/2
            est_mean = est_sigma @ theta1
            
            # update pri_beta by extreme value
            sq_sigma_diag = np.sqrt(np.diag(est_sigma))
            est_pri_beta = ((est_mean + 2*sq_sigma_diag*norm.pdf(-est_mean/sq_sigma_diag)-2*est_mean*norm.cdf(-est_mean/sq_sigma_diag))).mean() if self.pri_opt_flag else pri_beta
            current_F = self._obj_func(train_X, train_Y, est_pri_beta, est_mean, est_sigma)
            
#             print(np.allclose(np.diag(np.diag(theta2)), theta2))
#             print(np.allclose(np.diag(np.diag(est_sigma)), est_sigma))
            
            if is_trace and ite % trace_step == 0:
                print(current_F, (dFdnu1**2).sum(), (dFdnu2**2).sum())            
            
            if ite > 0 and np.abs(current_F - F[ite-1]) < tol:
                if is_trace:
                    print(current_F, (dFdnu1**2).sum(), (dFdnu2**2).sum())                            
                break
            else:
                F.append(current_F)
            pass
        
        
        self.F_ = F
        self.mean_ = est_mean
        self.sigma_ = est_sigma
        self.pri_beta_ = est_pri_beta
        
        return self
        pass
    
    def predict(self, test_X: np.ndarray):
        if not hasattr(self, "mean_"):
            raise ValueError("fit has not finished yet, should fit before predict.")
        return test_X @ self.mean_
        pass
        
    pass

In [46]:
class VBNormal(BaseEstimator, RegressorMixin):
    def __init__(
        self, pri_beta: float = 20, pri_opt_flag: bool = True,
        seed: int = -1, iteration: int = 1000, tol: float = 1e-8, step: float = 0.1,
        is_trace: bool = False, trace_step: int = 20
    ):
        self.pri_beta = pri_beta
        self.pri_opt_flag = pri_opt_flag
        self.seed = seed
        self.iteration = iteration
        self.tol = tol
        self.step = step
        self.is_trace = is_trace
        self.trace_step = trace_step
        pass
    
    def _initialization(self, M: int):
        seed = self.seed
        
        if seed > 0:
            np.random.seed(seed)
        
        mean = np.random.normal(size = M)
        sigma = invwishart.rvs(df = M+2, scale = np.eye(M), size = 1)
        pri_beta = np.random.gamma(shape = 3, size = 1) if self.pri_opt_flag else self.pri_beta
        
        self.mean_ = mean
        self.sigma_ = sigma
        self.pri_beta_ = pri_beta
        pass
    
    def _obj_func(self, X:np.ndarray, y:np.ndarray, pri_beta:float, mean:np.ndarray, sigma:np.ndarray) -> float:
        """
        Calculate objective function.

        + Input:
            1. X: input matrix (n, M) matrix
            2. y: output vector (n, ) matrix
            3. mean: mean parameter of vb posterior
            4. sigma: covariance matrix of vb posterior

        + Output:
            value of the objective function.

        """

        n, M = X.shape

        log_2pi = np.log(2*np.pi)

        F = 0
        # const values
        F += -M/2*log_2pi -M/2 + M*log_2pi + n*M/2*log_2pi + M*np.log(2*pri_beta)

        F += ((y - X@mean)**2).sum()/2 - np.linalg.slogdet(sigma)[1]/2 + np.trace(X.T @ X @ sigma)/2

        # term obtained from Normal prior
        F += pri_beta/2*(mean@mean + np.trace(sigma)) - M/2*np.log(pri_beta) + M/2*log_2pi
        
        return F
    
    def fit(self, train_X:np.ndarray, train_Y:np.ndarray):
        pri_beta = self.pri_beta
        iteration = self.iteration
        step = self.step
        tol = self.tol
        
        is_trace = self.is_trace
        trace_step = self.trace_step
        
        M = train_X.shape[1]
        
        if not hasattr(self, "mean_"):
            self._initialization(M)
        
        est_mean = self.mean_
        est_sigma = self.sigma_
        est_pri_beta = self.pri_beta_
                
        F = []
        XY_cov = train_Y@train_X
        X_cov = train_X.T@train_X
        
        for ite in range(iteration):
            sigma_inv = X_cov + est_pri_beta*np.eye(M)
            est_mean = np.linalg.solve(sigma_inv, XY_cov)
            est_sigma = np.linalg.inv(sigma_inv)
            
            # update pri_beta by extreme value
            est_pri_beta = M/(est_mean@est_mean + np.trace(est_sigma)) if self.pri_opt_flag else pri_beta
            current_F = self._obj_func(train_X, train_Y, est_pri_beta, est_mean, est_sigma)
            if is_trace and ite % trace_step == 0:
                print(current_F, (dFdnu1**2).sum(), (dFdnu2**2).sum())            
            
            if ite > 0 and np.abs(current_F - F[ite-1]) < tol:
                if is_trace:
                    print(current_F, (dFdnu1**2).sum(), (dFdnu2**2).sum())                            
                break
            else:
                F.append(current_F)
            pass
        
        
        self.F_ = F
        self.mean_ = est_mean
        self.sigma_ = est_sigma
        self.pri_beta_ = est_pri_beta
        
        return self
        pass
    
    def predict(self, test_X: np.ndarray):
        if not hasattr(self, "mean_"):
            raise ValueError("fit has not finished yet, should fit before predict.")
        return test_X @ self.mean_
        pass
        
    pass

In [47]:
class VBApproxLaplace(BaseEstimator, RegressorMixin):
    """
    Laplace prior is approximated by normal distribution, and approximated posterior distribution is obtained by the approximated laplace prior.
    """
    
    def __init__(
        self, pri_beta: float = 20, pri_opt_flag: bool = True,
        seed: int = -1, iteration: int = 1000, tol: float = 1e-8, step: float = 0.1,
        is_trace: bool = False, trace_step: int = 20
    ):
        self.pri_beta = pri_beta
        self.pri_opt_flag = pri_opt_flag
        self.seed = seed
        self.iteration = iteration
        self.tol = tol
        self.step = step
        self.is_trace = is_trace
        self.trace_step = trace_step
        pass
    
    def _initialization(self, M: int):
        seed = self.seed
        
        if seed > 0:
            np.random.seed(seed)
        
        mean = np.random.normal(size = M)
        sigma = invwishart.rvs(df = M+2, scale = np.eye(M), size = 1)
        pri_beta = np.random.gamma(shape = 3, size = 1) if self.pri_opt_flag else self.pri_beta
        
        self.mean_ = mean
        self.sigma_ = sigma
        self.pri_beta_ = pri_beta
        pass
    
    def _obj_func(self, y:np.ndarray, pri_beta:float, mean:np.ndarray, inv_sigma:np.ndarray, h_xi: np.ndarray, v_xi: np.ndarray) -> float:
        """
        Calculate objective function.

        + Input:
            1. X: input matrix (n, M) matrix
            2. y: output vector (n, ) matrix
            3. mean: mean parameter of vb posterior
            4. sigma: covariance matrix of vb posterior

        + Output:
            value of the objective function.

        """

        F = 0
        F += pri_beta/2*np.sqrt(h_xi).sum() + v_xi@h_xi - M*np.log(pri_beta/2)
        F += n/2*np.log(2*np.pi) + train_Y@train_Y/2 - mean @ (inv_sigma @ mean)/2 + np.linalg.slogdet(inv_sigma)[0]/2
        return F
    
    def fit(self, train_X:np.ndarray, train_Y:np.ndarray):
        iteration = self.iteration
        step = self.step
        tol = self.tol
        
        is_trace = self.is_trace
        trace_step = self.trace_step
        
        M = train_X.shape[1]
        
        if not hasattr(self, "mean_"):
            self._initialization(M)
        
        est_mean = self.mean_
        est_sigma = self.sigma_
        est_pri_beta = self.pri_beta_
                
        F = []
        X_cov = train_X.T@train_X
        XY_cov = train_X.T @ train_Y
        
        for ite in range(iteration):
            # update form of approximated laplace prior
            est_h_xi = est_mean**2 + np.diag(est_sigma)
            est_v_xi = -est_pri_beta/2/np.sqrt(est_h_xi)            
            
            # update posterior distribution
            inv_sigma = X_cov -2*np.diag(est_v_xi)
            est_mean = np.linalg.solve(inv_sigma, XY_cov)
            est_sigma = np.linalg.inv(inv_sigma)
            
            # update pri_beta by extreme value
            est_pri_beta = M/((est_mean**2 + np.diag(est_sigma))/(2*np.sqrt(est_h_xi))).sum() if self.pri_opt_flag else pri_beta
            
            current_F = self._obj_func(train_Y, est_pri_beta, est_mean, inv_sigma, est_h_xi, est_v_xi)
            if is_trace and ite % trace_step == 0:
                print(current_F)            
            
            if ite > 0 and np.abs(current_F - F[ite-1]) < tol:
                if is_trace:
                    print(current_F, (dFdnu1**2).sum(), (dFdnu2**2).sum())                            
                break
            else:
                F.append(current_F)
            pass
        
        
        self.F_ = F
        self.mean_ = est_mean
        self.sigma_ = est_sigma
        self.pri_beta_ = est_pri_beta
        
        return self
        pass
    
    def predict(self, test_X: np.ndarray):
        if not hasattr(self, "mean_"):
            raise ValueError("fit has not finished yet, should fit before predict.")
        return test_X @ self.mean_
        pass
    
    pass

# Experiment part
+ By some datasets are used for train and evaluate

In [48]:
score_func = lambda X, y, coef: 1 - ((y - X@coef)**2).sum() / ((y - y.mean())**2).sum()
score_vb_laplace_exact = np.zeros(datasets)
score_vb_laplace_approx = np.zeros(datasets)
score_vb_normal = np.zeros(datasets)
score_ard = np.zeros(datasets)
score_lasso = np.zeros(datasets)
score_ridge = np.zeros(datasets)

In [49]:
sq_error = lambda X, y, coef: ((y - X@coef)**2).mean()
sq_error_vb_laplace_exact = np.zeros(datasets)
sq_error_vb_laplace_approx = np.zeros(datasets)
sq_error_vb_normal = np.zeros(datasets)
sq_error_ard = np.zeros(datasets)
sq_error_lasso = np.zeros(datasets)
sq_error_ridge = np.zeros(datasets)

In [50]:
ln_vb_params

{'pri_beta': 10,
 'pri_opt_flag': True,
 'variance': 'diag',
 'iteration': 10000,
 'step': 0.2,
 'is_trace': True,
 'trace_step': 1}

In [51]:
ln_vb_params["variance"] = "diag"
ln_vb_params["is_trace"] = True
ln_vb_params["trace_step"] = 1

In [52]:
vb_laplace_exact_obj = VBLaplace(**ln_vb_params)

# data generation
train_X = np.random.normal(size = (n, M))
train_Y = train_X @ true_w + np.random.normal(size = n)

vb_laplace_exact_obj.fit(train_X, train_Y)

True
True
92849.20075302289 18875326.398489423 359540.5075198414
True
True
109975.05719224799 12050071.483889073 228639.24678796716
True
True
116892.51099156705 7713431.384784327 146326.77462441372
True
True
120562.93032500097 4936977.43210936 93654.6969691077
True
True
122793.43863572853 3159817.484014558 59943.23527331648
True
True
124262.00511404609 2022354.556807278 38366.419704742824
True
True
125280.48652512468 1294343.706577247 24556.265168057358
True
True
126012.50778262592 828400.0917172353 15717.133529282619
True
True
126552.23815685244 530187.5113582017 10059.68652097304
True
True
126957.71681295871 339326.7140303096 6438.663168839806
True
True
127266.6430873462 217173.10654467097 4121.043373610755
True
True
127504.53280414063 138993.2224516469 2637.660796757177
True
True
127689.2293920282 88957.15756121138 1688.2277532437847
True
True
127833.54198466054 56933.507459608576 1080.5466080308165
True
True
127946.86170817289 36438.02300742064 691.6022403025806
True
True
128036.19

VBLaplace(is_trace=True, iteration=10000, pri_beta=10, pri_opt_flag=True,
          seed=-1, step=0.2, tol=1e-08, trace_step=1, variance='diag')

In [37]:
vb_laplace_exact_obj.sigma_

array([[0.00993731, 0.        , 0.        , ..., 0.        , 0.        ,
        0.        ],
       [0.        , 0.00920756, 0.        , ..., 0.        , 0.        ,
        0.        ],
       [0.        , 0.        , 0.00794714, ..., 0.        , 0.        ,
        0.        ],
       ...,
       [0.        , 0.        , 0.        , ..., 0.01296585, 0.        ,
        0.        ],
       [0.        , 0.        , 0.        , ..., 0.        , 0.01229836,
        0.        ],
       [0.        , 0.        , 0.        , ..., 0.        , 0.        ,
        0.00969664]])

In [36]:
np.diag(vb_laplace_exact_obj.sigma_)

array([0.00993731, 0.00920756, 0.00794714, 0.01077332, 0.00865557,
       0.01218945, 0.01088755, 0.00849105, 0.01023444, 0.00936811,
       0.01237105, 0.00910487, 0.0105398 , 0.0106561 , 0.01101786,
       0.01198317, 0.00998481, 0.01096596, 0.01129393, 0.00859911,
       0.01088822, 0.01018645, 0.0124968 , 0.01088129, 0.00993674,
       0.01033463, 0.00807404, 0.00981549, 0.00842826, 0.0096859 ,
       0.01049741, 0.01145693, 0.01060514, 0.00943052, 0.00987806,
       0.00992774, 0.00845357, 0.00942973, 0.01004097, 0.01081384,
       0.01067888, 0.00950251, 0.01190983, 0.0109627 , 0.01155085,
       0.01207892, 0.00979357, 0.01085858, 0.00920896, 0.01014675,
       0.00818336, 0.00866962, 0.00897798, 0.00956785, 0.01058597,
       0.00977907, 0.00981974, 0.00906789, 0.01131356, 0.00816479,
       0.01190088, 0.00861629, 0.01254633, 0.01218215, 0.0093641 ,
       0.00993394, 0.01050513, 0.01027787, 0.00870318, 0.01063569,
       0.01004705, 0.00799553, 0.00936932, 0.00995082, 0.01174

In [31]:
ln_vb_params["variance"] = "full"
ln_vb_params["is_trace"] = True
ln_vb_params["trace_step"] = 1

In [32]:
vb_laplace_exact_obj = VBLaplace(**ln_vb_params)

# data generation
train_X = np.random.normal(size = (n, M))
train_Y = train_X @ true_w + np.random.normal(size = n)

vb_laplace_exact_obj.fit(train_X, train_Y)

34899.96078703485 44836549.35076478 1498066.7226087423
27089.439054738556 28652851.666948825 959396.1150594726
23041.523100337807 18333669.86063216 614571.6097237729
20526.658972785768 11733044.300308771 393241.28656629304
18836.793352662375 7508084.607331431 251625.7321074433
17652.457858222213 4804530.690146969 161032.27971021255
16802.456315779884 3074666.9545456795 103044.47072593545
16183.15708264365 1967700.0594764235 65985.11014243097
15728.18775647413 1259217.3434293328 42247.04476274988
15393.169921812807 805820.5272205375 27028.656781456222
15146.459957773823 515798.8262713634 17294.209168740246
14965.24576437733 330170.82957615214 11066.77847862587
14832.68507697147 211315.56884619815 7081.745684861961
14736.161825728375 135251.33406055975 4531.925537909295
14666.024158755012 86579.81252036244 2901.630594946606
14614.966308772853 55424.14756824791 1858.318412855615
14577.55141350264 35480.14412692711 1190.0991904774605
14549.770656696206 22712.826710415102 762.0774836883377


VBLaplace(is_trace=True, iteration=10000, pri_beta=10, pri_opt_flag=True,
          seed=-1, step=0.2, tol=1e-08, trace_step=1, variance='fule')

In [18]:
for dataset_ind in range(datasets):
    vb_laplace_exact_obj = VBLaplace(**ln_vb_params)
    vb_laplace_approx_obj = VBApproxLaplace(**ln_vb_params)
    vb_normal_obj = VBNormal(**ln_vb_params)
    lasso_obj = LassoCV(**ln_lasso_params)
    ridge_obj = RidgeCV(**ln_ridge_params)
    ard_obj = ARDRegression(**ln_ard_params)
    
    # data generation
    train_X = np.random.normal(size = (n, M))
    train_Y = train_X @ true_w + np.random.normal(size = n)

    lasso_obj.fit(train_X, train_Y)
    ridge_obj.fit(train_X, train_Y)
    ard_obj.fit(train_X, train_Y)
    vb_laplace_exact_obj.fit(train_X, train_Y)
    vb_normal_obj.fit(train_X, train_Y)
    vb_laplace_approx_obj.fit(train_X, train_Y)

    test_X = np.random.normal(size = (N, M))
    test_Y = test_X @ true_w + np.random.normal(size = N)
    
    ### evaluation by square error
    sq_error_lasso[dataset_ind] = sq_error(test_X, test_Y, lasso_obj.coef_)
    sq_error_ridge[dataset_ind] = sq_error(test_X, test_Y, ridge_obj.coef_)
    sq_error_ard[dataset_ind] = sq_error(test_X, test_Y, ard_obj.coef_)
    sq_error_vb_laplace_exact[dataset_ind] = sq_error(test_X, test_Y, vb_laplace_exact_obj.mean_)
    sq_error_vb_normal[dataset_ind] = sq_error(test_X, test_Y, vb_normal_obj.mean_)
    sq_error_vb_laplace_approx[dataset_ind] = sq_error(test_X, test_Y, vb_laplace_approx_obj.mean_)

    print(
        "sq_error:"
        , sq_error_lasso[dataset_ind]
        , sq_error_ridge[dataset_ind]
        , sq_error_ard[dataset_ind]
        , sq_error_vb_laplace_exact[dataset_ind]
        , sq_error_vb_normal[dataset_ind]
        , sq_error_vb_laplace_approx[dataset_ind]
    )    
    
    ### evaluation by R^2 score
    score_lasso[dataset_ind] = score_func(test_X, test_Y, lasso_obj.coef_)
    score_ridge[dataset_ind] = score_func(test_X, test_Y, ridge_obj.coef_)
    score_ard[dataset_ind] = score_func(test_X, test_Y, ard_obj.coef_)
    score_vb_laplace_exact[dataset_ind] = score_func(test_X, test_Y, vb_laplace_exact_obj.mean_)
    score_vb_normal[dataset_ind] = score_func(test_X, test_Y, vb_normal_obj.mean_)
    score_vb_laplace_approx[dataset_ind] = score_func(test_X, test_Y, vb_laplace_approx_obj.mean_)
    
    print(
        "R^2 score:"
        , score_lasso[dataset_ind]
        , score_ridge[dataset_ind]
        , score_ard[dataset_ind]
        , score_vb_laplace_exact[dataset_ind]
        , score_vb_normal[dataset_ind]
        , score_vb_laplace_approx[dataset_ind]
    )

TypeError: __init__() got an unexpected keyword argument 'variance'

In [79]:
print(
    sq_error_lasso.mean()
    , sq_error_ridge.mean()
    , sq_error_ard.mean()
    , sq_error_vb_laplace_exact.mean()
    , sq_error_vb_normal.mean()
    , sq_error_vb_laplace_approx.mean()
)

393.94768472453995 308.2709538942517 572.4249348002895 293.35773252079895 301.32563165797364 304.6139525695377


In [80]:
print(
    score_lasso.mean()
    , score_ridge.mean()
    , score_ard.mean()
    , score_vb_laplace_exact.mean()
    , score_vb_normal.mean()
    , score_vb_laplace_approx.mean()
)

0.5560140895408846 0.6525618684686846 0.3547590596218103 0.6693602376627976 0.660415787359578 0.6566254504698098


In [75]:
import matplotlib.pyplot as plt

In [20]:
true_w

array([ 0.        ,  0.        ,  0.        ,  0.        , -0.76668197,
        0.07758316,  0.        , -0.0995576 ,  0.        , -0.90644613,
        0.        ,  0.        , -1.16645995,  1.00565264, -1.27819984,
        0.30094245, -0.627204  ,  3.38047157,  1.22344481,  0.0473449 ,
       -0.03814966,  0.        ,  0.        ,  3.52029545,  0.        ,
        4.81596502,  0.        , -3.25347577,  0.66002526,  0.        ,
        0.        ,  0.        ,  0.        ,  2.63380714,  1.33677451,
        0.        , -2.69622796,  3.58475763,  0.        ,  0.        ,
        2.71306942,  0.        ,  1.88162727, -2.59235379,  3.48865255,
        0.        ,  0.        ,  0.        ,  2.00792993, -0.4410573 ,
        0.        , -4.15620415,  3.9016001 ,  0.        ,  6.5176337 ,
       -3.96303101, -0.03691128,  1.35713552,  2.36783035,  0.        ,
        0.        ,  0.18762913, -1.26910508, -3.83069906,  0.26426108,
       -1.0842188 ,  0.        ,  0.        ,  0.        ,  0.  

# Conclusion
+ We experimented the performance of the rigorously derived variational linear regression algorithm for the Laplace prior by comparing:
    1. Ordinal optimized Lasso by cross-validation
    2. Ordinal optimized Ridge by cross-validation
    3. Variational Bayes linear regression for the normal prior
    4. Bayesian ARD
    5. Variational Bayes linear regression for the approximated Laplace prior.
+ Results are as follows:
    1. n > M with non-zero elements: ridge, vb for the normal prior gives the best performance, although vb for the Laplace prior gives better performance.
    2. n > M with zero-elements: lasso, vb for the approximated Laplace gives the best performance. although vb for the Laplace prior also gives better performance.
    3. M > n with zero-elements: results is similar with 1.
    4. M > n with zero-elements: results is similar with 2.
    5. M >> n, especially # of non-zero elements is larger than # of samples, vb for the Laplace prior gives the best performance.
+ Summary of results:
    + Derived algorithm can estimate every case, and # of features are extremely larger. 