## Why the number of nonzeros differs: Lasso vs. Relaxed Lasso

In [4]:
include("lasso_glmnet.jl")
include("data_generator.jl")

#11 (generic function with 1 method)

In [41]:
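using Random, LinearAlgebra   # LinearAlgebra provides `norm`, used in the tuning loops below
Random.seed!(1)
n = 500        # number of observations
p = 100        # number of predictors
β_type = 2     # coefficient pattern passed to gen_beta
s = 5          # number of nonzero coefficients
ν = 0.7        # signal-to-noise ratio
ρ = 0.35       # predictor correlation parameter

X, Σ = gen_pred(n, p, ρ)
X_test = gen_pred(n, p, ρ)[1]

β = gen_beta(β_type, p, s)
σ2 = β' * Σ * β / ν;    # noise variance implied by the signal-to-noise ratio ν
y = gen_resp(X, β, Σ, ν);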

In [42]:
# Lasso tuning procedure
β_path_lasso , λgrid = lasso_glmnet(X, y, nlambda = 50, nrelax = 1, intercept = false)
test_lasso = zeros(size(β_path_lasso, 2))

for i in 1:size(β_path_lasso, 2)           
    β_test = β_path_lasso[:, i]
    test_lasso[i] = norm(X_test * β - X_test * β_test)^2 / norm(X_test * β)^2
end

β̂_lasso = β_path_lasso[:, argmin(test_lasso)] ;
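The loop above scores every column of the coefficient path by its relative test error, $\|X_{test}\beta - X_{test}\hat\beta\|^2 / \|X_{test}\beta\|^2$, and keeps the column with the smallest score. The same computation can be sketched more compactly as below. The helper names `relative_test_error` and `path_errors` and the tiny demo data are hypothetical and are not part of `lasso_glmnet.jl`.

```julia
using LinearAlgebra

# Hypothetical helper: relative test error of a single estimate β̂.
relative_test_error(X_test, β, β̂) = norm(X_test * (β - β̂))^2 / norm(X_test * β)^2

# Hypothetical helper: score every column of a coefficient path.
path_errors(X_test, β, β_path) =
    [relative_test_error(X_test, β, b) for b in eachcol(β_path)]

# Made-up data, only to illustrate the selection step.
X_demo = [1.0 0.0; 0.0 1.0; 1.0 1.0]
β_demo = [1.0, 2.0]
path_demo = [0.0 1.0 1.0;
             0.0 1.0 2.0]      # columns: null model, partial fit, exact fit

errs = path_errors(X_demo, β_demo, path_demo)
best = argmin(errs)            # index of the column with the smallest error
```

The null model scores exactly 1 under this metric, so values below 1 indicate an improvement over predicting zero.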

In [43]:
@show length(test_lasso)
@show count(!iszero, β̂_lasso)
@show argmin(test_lasso)
@show λgrid[argmin(test_lasso)]

length(test_lasso) = 42
count(!iszero, β̂_lasso) = 11
argmin(test_lasso) = 12
λgrid[argmin(test_lasso)] = 0.2807829056926784



In [61]:
# Relaxed Lasso tuning procedure
β_path_relaxo, relaxλgrid = lasso_glmnet(X, y, nlambda = 50, nrelax = 11, intercept = false)
test_relaxo = zeros(size(β_path_relaxo, 2))

for i in 1:size(β_path_relaxo, 2)
    β_test = β_path_relaxo[:, i]
    test_relaxo[i] = norm(X_test * β - X_test * β_test)^2 / norm(X_test * β)^2
end

β̂_relaxo = β_path_relaxo[:, argmin(test_relaxo)];

In [45]:
@show length(test_relaxo)
@show count(!iszero, β̂_relaxo)
@show argmin(test_relaxo)


length(test_relaxo) = 462
count(!iszero, β̂_relaxo) = 5
argmin(test_relaxo) = 76



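With `nrelax = 11`, the relaxed path has $462 = 42 \times 11$ columns: 11 values of $\gamma$ for each of the 42 values of $\lambda$. Assuming the columns are grouped by $\lambda$, $76 = 6 \times 11 + 10$, so the relaxed lasso selects the $7$-th $\lambda$ and the $10$-th $\gamma$.

In [46]:
@show relaxλgrid[7]

relaxλgrid[7] = 0.7186773029952344

For the same training and validation data, the relaxed lasso chooses $\lambda = 0.719$ (paired with the $10$-th of the 11 values of $\gamma$), while the lasso chooses the smaller $\lambda = 0.281$.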
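The position of a column in the relaxed path can be checked with integer division. The sketch below assumes that `lasso_glmnet` stores `nrelax` consecutive columns (one per $\gamma$) for each $\lambda$. That layout is an assumption about the included file, and `path_position` is a hypothetical helper name.

```julia
# Hypothetical helper: map a 1-based column index to (λ index, γ index),
# assuming columns are grouped by λ with `nrelax` columns per λ.
function path_position(col::Integer, nrelax::Integer)
    q, r = divrem(col - 1, nrelax)     # zero-based block and offset within the block
    return (λ_index = q + 1, γ_index = r + 1)
end

pos = path_position(76, 11)            # divrem(75, 11) == (6, 9)

# Round trip: rebuild the column index from the two positions.
col_back = (pos.λ_index - 1) * 11 + pos.γ_index
```

A quick sanity check on the assumed layout is that the number of columns, 462, is divisible by `nrelax = 11` and gives the 42 values of $\lambda$ seen for the plain lasso.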

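Tuning leads the relaxed lasso to a larger $\lambda$ than the lasso. A larger $\lambda$ sets more coefficients exactly to zero, and the relaxed lasso can offset the extra shrinkage of the surviving coefficients by taking a weighted average with the least-squares estimate restricted to the active set. This explains why the relaxed lasso has fewer nonzero coefficients in the figures.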
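The weighted average mentioned above can be sketched as follows, using the common convention $\hat\beta^{relax}(\lambda, \gamma) = \gamma\,\hat\beta^{lasso}(\lambda) + (1 - \gamma)\,\hat\beta^{LS}(\lambda)$, where $\hat\beta^{LS}$ is the least-squares fit on the lasso's active set. Whether `lasso_glmnet` uses exactly this convention is an assumption. `relaxed_blend` and the demo data are hypothetical.

```julia
using LinearAlgebra

# Hypothetical helper: blend a lasso solution with the least-squares fit
# restricted to the lasso's active set (assumed to be non-empty).
# γ = 1 returns the lasso solution, γ = 0 the restricted least-squares fit.
function relaxed_blend(X, y, β_lasso, γ)
    active = findall(!iszero, β_lasso)
    β_ls = zeros(length(β_lasso))
    β_ls[active] = X[:, active] \ y        # least squares on the active columns only
    return γ .* β_lasso .+ (1 - γ) .* β_ls
end

# Made-up data: y is fit exactly by the first two columns with coefficients (1, 2).
X_demo = [1.0 0.0 0.0;
          0.0 1.0 0.0;
          0.0 0.0 1.0;
          1.0 1.0 0.0]
y_demo = [1.0, 2.0, 0.0, 3.0]
β_lasso_demo = [0.5, 1.5, 0.0]             # pretend lasso output: shrunken, third entry zero

β_half = relaxed_blend(X_demo, y_demo, β_lasso_demo, 0.5)
```

The blend never changes the active set: coefficients that the lasso sets to zero stay zero, while the surviving ones move back toward their least-squares values as $\gamma$ decreases.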

In [64]:
using SparseArrays

In [65]:
sparse(β̂_lasso)

100-element SparseVector{Float64, Int64} with 11 stored entries:
  [1  ]  =  0.84479
  [2  ]  =  0.918044
  [3  ]  =  0.989172
  [4  ]  =  0.913061
  [5  ]  =  0.722364
  [26 ]  =  0.105994
  [30 ]  =  -0.0439563
  [70 ]  =  -0.0422064
  [76 ]  =  0.237095
  [80 ]  =  -0.0697346
  [82 ]  =  -0.0246187

In [66]:
sparse(β̂_relaxo)

100-element SparseVector{Float64, Int64} with 5 stored entries:
  [1  ]  =  0.998882
  [2  ]  =  1.04285
  [3  ]  =  1.09293
  [4  ]  =  1.02919
  [5  ]  =  0.882299

In [67]:
sparse(β)

100-element SparseVector{Float64, Int64} with 5 stored entries:
  [1  ]  =  1.0
  [2  ]  =  1.0
  [3  ]  =  1.0
  [4  ]  =  1.0
  [5  ]  =  1.0

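Indeed, the relaxed lasso recovers exactly the five true nonzero coefficients, with values close to the true $\beta$ (Euclidean distance about $0.16$). The lasso keeps six spurious coefficients and shrinks the true ones, which leaves it further from the true $\beta$ (distance about $0.44$).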
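The comparison can be made quantitative by measuring the Euclidean distance of each estimate from the true $\beta$. The sketch below rebuilds the three vectors from the values printed above, so it runs without the data generator. The variable names are chosen for illustration only.

```julia
using LinearAlgebra, SparseArrays

# Values copied from the printed sparse vectors above.
β_true   = sparsevec(collect(1:5), ones(5), 100)
β_lasso  = sparsevec([1, 2, 3, 4, 5, 26, 30, 70, 76, 80, 82],
                     [0.84479, 0.918044, 0.989172, 0.913061, 0.722364, 0.105994,
                      -0.0439563, -0.0422064, 0.237095, -0.0697346, -0.0246187], 100)
β_relaxo = sparsevec(collect(1:5), [0.998882, 1.04285, 1.09293, 1.02919, 0.882299], 100)

# Euclidean distance of each estimate from the true coefficient vector.
dist_lasso  = norm(β_lasso - β_true)
dist_relaxo = norm(β_relaxo - β_true)

@show dist_lasso dist_relaxo
```

This measures coefficient error rather than the relative test error used for tuning, so it is a complementary check and not a replacement for the tuning criterion.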