Following [Friedman et al. 2007, p. 6](http://arxiv.org/pdf/0708.1485.pdf), the elastic net problem is:

$$\min_{\beta} \frac{1}{2n} \sum_{i} \Big(y_i -\sum_{j} x_{ij} \beta_j\Big)^2 + \lambda \sum_{j} \Big(\alpha |\beta_j| + (1-\alpha) \frac{\beta_{j}^2}{2} \Big)$$

If each predictor is standardized so that $ \frac{1}{n}\sum_i x_{ij}^2 =1 $ and $\sum_i x_{ij}=0$, the coordinate-wise update is:

$$ \beta_j \leftarrow \frac{S\left(\frac{1}{n}\sum_i x_{ij} (y_i - \hat{y_i^j}) , \lambda \alpha \right)}{1+\lambda(1-\alpha)} $$

where the soft-thresholding operator is given by

$$S(\beta, \tau) = \text{sign}(\beta) (|\beta|-\tau)_{+}$$

with $(z)_{+} = \max(z, 0)$ denoting the positive part.

And $\hat{y_i^j} = \sum_{k \neq j } x_{ik} \beta_{k}$ is the fitted value for observation $i$ computed from the current coefficients with the $j$th predictor left out.
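
As a small illustration, here is a minimal NumPy sketch of soft-thresholding in its usual clipped form, sign(β)·max(|β| − τ, 0). The function name `soft_threshold` and the sample values are assumptions made for this example; they are not part of the notebook's code.

```python
import numpy as np

def soft_threshold(beta, tau):
    """Shrink beta toward zero by tau; values with |beta| <= tau become zero."""
    return np.sign(beta) * np.maximum(np.abs(beta) - tau, 0.0)

# Hypothetical inputs: two values inside the threshold band, two outside, and zero
values = np.array([-2.0, -0.3, 0.0, 0.4, 1.5])
shrunk = soft_threshold(values, 0.5)
print(shrunk)
```

Inputs whose magnitude is at most 0.5 map to zero, and the others move 0.5 closer to zero. Without the clipping at zero, small inputs would instead flip sign.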

In [1]:
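import sklearn
#Import the submodule explicitly so that sklearn.preprocessing.scale is available
import sklearn.preprocessing
import numpy as np
from sklearn import datasets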

In [2]:
def fElastic_net(x,y,alpha, l1_ratio, maxiter, tol):
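    '''
    Elastic net by cyclic coordinate descent on standardized data.
    -x: array of exogenous variables (n observations by k predictors)
    -y: array (vector) of the endogenous variable
    -alpha: penalization/shrinkage parameter (lambda in the formula above)
    -l1_ratio: weight put on the lasso (alpha in the formula above), 1-l1_ratio is the weight put on the ridge
    -maxiter: maximum number of iterations
    -tol: level of tolerance for convergence.
    '''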

    #1. Standardize the data#
    x =sklearn.preprocessing.scale(x, axis = 0)
    y = sklearn.preprocessing.scale(y)

    #2. Retrieve parameter of interest#
    #Sample size
    n = x.shape[0]
    #Number predictors
    k = x.shape[1]

    #3.Initialize the beta vector#
    betas = np.ones(k)
    betas_last = betas +10

    #4. Start outer loop
    for it in range(maxiter):
        #Check convergence#
        #Based on the relative change in betas since the previous sweep
        if np.linalg.norm(betas_last-betas)/np.linalg.norm(betas_last) <tol:
            print('converged  at iteration %d'%it,'with difference %d'%np.linalg.norm(betas_last-betas), 'and betas',betas)
            return betas
        else:#o.w. loop over coefficients
            betas_last =np.array(betas, copy= True)

            #Cycle around coordinates
            for j in range(k):
                #a.Calculate partial residuals#
                #Use a mask to extract all but the j column
                mask = np.ones(k,dtype=bool)
                mask[[j]]= False
                #Compute the partial residuals
                res_j = y - np.dot(x[:,mask],betas[mask])
                
                #b.Regress column j on the partial residuals and obtain the univariate beta_ols_j#
                beta_ols_j = np.dot(x[:,j],res_j)/n
                
                #c. Update using the soft threshold operator and adjust for ridge#
                betas[j] = (np.sign(beta_ols_j)*(np.abs(beta_ols_j)-alpha*l1_ratio))/(1+alpha*(1-l1_ratio))

    #maxiter reached without meeting the tolerance: return the last iterate
    return betas

### Consider the OLS case


In [3]:
#Load data to play with
diabetes = datasets.load_diabetes()
x = diabetes.data
y = diabetes.target
#Setup arguments
#Regularization parameter
alpha = 0 #no penalty, so the solution should match OLS
#Convex combination (irrelevant when alpha = 0)
l1_ratio = 1
#Maximum iterations
maxiter = 200000
#Tolerance
tol = 1e-5

fElastic_net(x,y,alpha, l1_ratio, maxiter, tol)

converged  at iteration 482 with difference 0 and betas [-0.00618541 -0.14813387  0.32109297  0.20037221 -0.48966253  0.29474712
  0.06256708  0.10941377  0.46417943  0.04177067]


array([-0.00618541, -0.14813387,  0.32109297,  0.20037221, -0.48966253,
        0.29474712,  0.06256708,  0.10941377,  0.46417943,  0.04177067])

In [4]:
import statsmodels.api as sm
#Standardize the data
x =sklearn.preprocessing.scale(x, axis = 0)
y = sklearn.preprocessing.scale(y)
#Run the model (no constant is added, since the data are centered)
mod = sm.OLS( y,x)
res = mod.fit()
print(res.summary())

                            OLS Regression Results                            
Dep. Variable:                      y   R-squared:                       0.518
Model:                            OLS   Adj. R-squared:                  0.507
Method:                 Least Squares   F-statistic:                     46.38
Date:                Mon, 11 Apr 2016   Prob (F-statistic):           2.68e-62
Time:                        00:55:37   Log-Likelihood:                -466.00
No. Observations:                 442   AIC:                             952.0
Df Residuals:                     432   BIC:                             992.9
Df Model:                          10                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
x1            -0.0062      0.037     -0.168      0.8

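The result is the same: with no penalty, the coordinate descent coefficients agree with the OLS estimates (for example, x1 is -0.0062 in both).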
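
The same agreement can be checked numerically rather than by eye. The following is a minimal sketch on synthetic data, not the diabetes set; the random seed, sample size, and true coefficients are assumptions chosen only for illustration. It runs the unpenalized coordinate update and compares it with `np.linalg.lstsq`.

```python
import numpy as np

# Hypothetical synthetic data, used only to illustrate the check
rng = np.random.default_rng(0)
n, k = 200, 4
X = rng.normal(size=(n, k))
y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + rng.normal(size=n)

# Standardize with the population standard deviation, so that sum(x_ij^2)/n = 1
X = (X - X.mean(axis=0)) / X.std(axis=0)
y = (y - y.mean()) / y.std()

# Unpenalized coordinate descent: regress each column on its partial residual
beta = np.zeros(k)
for _ in range(1000):
    for j in range(k):
        partial_resid = y - X @ beta + X[:, j] * beta[j]
        beta[j] = X[:, j] @ partial_resid / n

# Reference least-squares solution
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
same = np.allclose(beta, beta_ols, atol=1e-8)
print(same)
```

Using `np.allclose` avoids relying on the rounded coefficients shown in a printed summary table.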

## Compare with scikit-learn's ElasticNet for the regularized case

In [5]:
#Load data to play with
diabetes = datasets.load_diabetes()
x = diabetes.data
y = diabetes.target
x =sklearn.preprocessing.scale(x, axis = 0)
y = sklearn.preprocessing.scale(y)
#Setup arguments
#Regularization parameter (lambda in the formula, alpha in scikit-learn)
lambda1 = 1
#Convex combination (alpha in the formula, l1_ratio in scikit-learn)
alpha = .5
#Maximum iterations
maxiter = 200000
#Tolerance
tol = 1e-5

fElastic_net(x,y,lambda1, alpha, maxiter, tol)

converged  at iteration 15 with difference 0 and betas [-0.17231757 -0.29164263  0.09726452  0.02783835 -0.18492683 -0.14766446
  0.1045021   0.1439201   0.14110477 -0.03157433]


array([-0.17231757, -0.29164263,  0.09726452,  0.02783835, -0.18492683,
       -0.14766446,  0.1045021 ,  0.1439201 ,  0.14110477, -0.03157433])

In [6]:
from sklearn.linear_model import ElasticNet
enet = ElasticNet(alpha=1, l1_ratio=0.5).fit(x,y).coef_
enet

array([ 0.      ,  0.      ,  0.048895,  0.      ,  0.      ,  0.      ,
       -0.      ,  0.      ,  0.029379,  0.      ])

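The result is very different :( scikit-learn sets eight of the ten coefficients exactly to zero, while `fElastic_net` returns ten nonzero coefficients. Something must be wrong.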
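
One hypothesis worth checking: the update in `fElastic_net` multiplies `np.sign(beta_ols_j)` by `np.abs(beta_ols_j) - alpha*l1_ratio` without clipping that difference at zero, so a coefficient whose univariate estimate falls below the threshold flips sign instead of being set to zero. The sketch below applies the clipped soft threshold. The function name `elastic_net_cd`, the synthetic data, and the parameter values are assumptions for illustration, and the sketch has not been run against the diabetes data here.

```python
import numpy as np

def elastic_net_cd(x, y, lam, l1_ratio, maxiter=10000, tol=1e-10):
    """Cyclic coordinate descent for the elastic net (sketch).

    Assumes the columns of x and the vector y are already standardized.
    """
    n, k = x.shape
    betas = np.zeros(k)
    for _ in range(maxiter):
        betas_last = betas.copy()
        for j in range(k):
            # Partial residual: add back the contribution of column j
            res_j = y - x @ betas + x[:, j] * betas[j]
            beta_ols_j = x[:, j] @ res_j / n
            # Soft threshold clipped at zero, then ridge shrinkage
            shrunk = np.sign(beta_ols_j) * max(abs(beta_ols_j) - lam * l1_ratio, 0.0)
            betas[j] = shrunk / (1 + lam * (1 - l1_ratio))
        # Stop when no coefficient moved by more than tol in a full sweep
        if np.max(np.abs(betas - betas_last)) < tol:
            break
    return betas

# Hypothetical synthetic data: only the first and fourth predictors matter
rng = np.random.default_rng(0)
n, k = 300, 5
x = rng.normal(size=(n, k))
y = x @ np.array([2.0, 0.0, 0.0, -1.0, 0.0]) + rng.normal(size=n)
x = (x - x.mean(axis=0)) / x.std(axis=0)
y = (y - y.mean()) / y.std()

betas = elastic_net_cd(x, y, lam=0.5, l1_ratio=0.5)
print(betas)
```

With the clipping in place, the irrelevant predictors receive coefficients of exactly zero, which is the qualitative pattern in the scikit-learn output above. If the hypothesis holds, running the same update on the standardized diabetes data with `lam=1` and `l1_ratio=0.5` should bring the coefficients close to those of `ElasticNet(alpha=1, l1_ratio=0.5)`, since scikit-learn documents the same objective with a `1/(2n)` factor on the squared error.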