<div >
<img src = "../banner.jpg" />
</div>

<a target="_blank" href="https://colab.research.google.com/github/ignaciomsarmiento/BDML_202401/blob/main/Modulo03/Modulo03_Regularizacion.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>



# Regularization

## Ridge 

To fit a regularized model we can use the `glmnet` function from the package of the same name. Its `alpha` parameter tells `glmnet` whether to fit a ridge (`alpha = 0`), lasso (`alpha = 1`), or elastic net (`0 < alpha < 1`) model.
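
For reference, the role of `alpha` is easiest to see in the objective that `glmnet` minimizes for a continuous (Gaussian) response, as stated in the package documentation. Here $N$ is the number of observations, $x_i$ the predictor vector for observation $i$, and $\beta_0$, $\beta$ the intercept and slopes:

$$
\min_{\beta_0,\,\beta}\; \frac{1}{2N}\sum_{i=1}^{N}\left(y_i-\beta_0-x_i^{\top}\beta\right)^2
+\lambda\left[\frac{1-\alpha}{2}\,\lVert\beta\rVert_2^2+\alpha\,\lVert\beta\rVert_1\right]
$$

Setting $\alpha = 0$ leaves only the squared ($\ell_2$) penalty, which is ridge. Setting $\alpha = 1$ leaves only the absolute-value ($\ell_1$) penalty, which is lasso. Values in between mix the two.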

By default, `glmnet` will do two things that you should know:

1. Since regularized methods apply a penalty to the coefficients, we need to ensure our predictors are on a common scale. If they are not, the penalty treats predictors unevenly, because the size of each coefficient depends on the units of its predictor. By default, `glmnet` automatically standardizes your features. If you standardize your predictors before calling `glmnet`, you can turn this off with `standardize = FALSE`.

2. The regularization path is computed at a grid of values (on the log scale) for the regularization parameter $\lambda$. The algorithm is extremely fast!
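
The two defaults above can be checked on a small simulated example. This is only a sketch: the data and the names `X_sim`, `y_sim`, `fit_default`, and `fit_manual` are made up for illustration and are not part of the course dataset.

```r
library(glmnet)

set.seed(1)
n <- 100
# Two simulated predictors on very different scales (illustrative data only)
x1 <- rnorm(n)              # standard deviation around 1
x2 <- rnorm(n, sd = 1000)   # standard deviation around 1000
X_sim <- cbind(x1, x2)
y_sim <- 1 + 2 * x1 + 0.002 * x2 + rnorm(n)

# Default behaviour: glmnet standardizes internally and
# reports the coefficients back on the original scale
fit_default <- glmnet(X_sim, y_sim, alpha = 0)

# Alternative: scale by hand, then switch the internal standardization off
X_scaled <- scale(X_sim)
fit_manual <- glmnet(X_scaled, y_sim, alpha = 0, standardize = FALSE)

# The path is fit over a decreasing grid of lambda values
# (the default grid has nlambda = 100 points)
length(fit_default$lambda)
range(log(fit_default$lambda))
```

Note that `scale()` divides by the sample standard deviation with an `n - 1` denominator, so the hand-scaled fit is close to, but not numerically identical to, the default one.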

In [None]:
# install.packages("pacman")  # uncomment when running on Google Colab

In [None]:
require("pacman")
p_load("tidyverse","glmnet")


In [None]:
dta <-read.csv("https://raw.githubusercontent.com/ignaciomsarmiento/datasets/main/regularization_train.csv")

In [None]:
dim(dta)

`glmnet` has some drawbacks. The main one is that its arguments must be supplied as matrices and vectors. `caret`, in contrast, streamlines the process of creating predictive models by providing a uniform interface that, among other things, allows models to be specified with formulas.
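
When a formula is more convenient but we still want to call `glmnet` directly, one common workaround is to build the predictor matrix with base R's `model.matrix`. The sketch below uses a made-up data frame (`df_sim`, with a hypothetical factor `g`) just to show the mechanics.

```r
set.seed(1)
# Illustrative data frame with one numeric and one categorical predictor
df_sim <- data.frame(
  y  = rnorm(6),
  x1 = rnorm(6),
  g  = factor(c("a", "b", "c", "a", "b", "c"))
)

# model.matrix expands the formula into a numeric matrix (dummy variables for
# factors); the first column is the intercept, which glmnet adds on its own
X_mm <- model.matrix(y ~ x1 + g, data = df_sim)[, -1]
y_mm <- df_sim$y

colnames(X_mm)
```

The resulting `X_mm` and `y_mm` can be passed to `glmnet` as `x` and `y`.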

In [None]:
Xsmall<-as.matrix(dta[,2:7])
Xsmall

In [None]:
y<-dta[,1]

In [None]:
ridge1 <- glmnet(
  x = Xsmall,
  y = y,
  lambda=1,
  alpha = 0 #ridge
)

In [None]:
coef(ridge1)

In [None]:
summary(lm(y~Xsmall))

In [None]:
cor(Xsmall)

Let's look at the regularization path, which shows how strongly the coefficients are shrunk for different values of $\lambda$.

In [None]:
ridge2 <- glmnet(
  x = Xsmall,
  y = y,
  alpha = 0 #ridge
)

In [None]:
plot(ridge2, xvar = "lambda")
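
Besides the plot, the path can be inspected numerically. The following sketch uses simulated data (the names `X_sim`, `y_sim`, and `ridge_sim`, and the two penalty values, are illustrative assumptions) to pull coefficients at chosen values of $\lambda$ and to confirm that the slopes shrink as the penalty grows.

```r
library(glmnet)

set.seed(123)
# Simulated data: 5 predictors, only the first three matter
X_sim <- matrix(rnorm(200 * 5), nrow = 200)
y_sim <- drop(X_sim %*% c(3, -2, 1, 0, 0)) + rnorm(200)

ridge_sim <- glmnet(X_sim, y_sim, alpha = 0)

# Coefficients at two arbitrary penalty values
# (glmnet interpolates along the fitted path)
coef(ridge_sim, s = c(0.1, 10))

# L2 norm of the slope vector at every lambda on the grid;
# the grid is stored from the largest lambda to the smallest
l2_norm <- sqrt(colSums(as.matrix(ridge_sim$beta)^2))
head(data.frame(log_lambda = log(ridge_sim$lambda), l2_norm = l2_norm))
```

With ridge the coefficients move toward zero as $\lambda$ increases but are not set exactly to zero.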

## Lasso

In [None]:
lasso1 <- glmnet(
  x = Xsmall,
  y = y,
  lambda = 0.01,
  alpha = 1 #lasso
)


In [None]:
coef(lasso1)

In [None]:
lasso2 <- glmnet(
  x = Xsmall,
  y = y,
  alpha = 1 #lasso
)

plot(lasso2, xvar = "lambda")

In [None]:
# Coefficients along the lasso path, sorted by log(lambda)
data.frame(cbind(log_lambda = log(lasso2$lambda), t(as.matrix(lasso2$beta)))) %>% arrange(log_lambda)

# Predictive exercise with k>n

## Ridge

In [None]:
X<-as.matrix(dta[,-1])

In [None]:
dim(X)

In [None]:
cv_ridge <- cv.glmnet(
  x = X,
  y = y,
  alpha = 0 #ridge
)

In [None]:
cv_ridge

In [None]:
plot(cv_ridge)

In [None]:
log(cv_ridge$lambda.min)

In [None]:
coef(cv_ridge, s = "lambda.min")

## Lasso

In [None]:
cv_lasso <- cv.glmnet(
  x = X,
  y = y,
  alpha = 1 #lasso
)

In [None]:
coef(cv_lasso, s = "lambda.min")

In [None]:
plot(cv_lasso)

## Elastic Net

In [None]:
cv_en <- cv.glmnet(
  x = X,
  y = y,
  alpha = 0.75
)

In [None]:
plot(cv_en)
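
Each `cv.glmnet` call above draws its own random folds, so cross-validated errors for different values of `alpha` are not computed on the same splits. A possible way to make them comparable is to fix the fold assignment with the `foldid` argument. The sketch below uses simulated data with more predictors than observations; all object names are illustrative.

```r
library(glmnet)

set.seed(2024)
n <- 60
p <- 100  # more predictors than observations
X_sim <- matrix(rnorm(n * p), n, p)
beta  <- c(rep(2, 5), rep(0, p - 5))
y_sim <- drop(X_sim %*% beta) + rnorm(n)

# One shared assignment of observations to 10 folds
foldid <- sample(rep(1:10, length.out = n))

# Ridge, elastic net, and lasso evaluated on the same folds
alphas  <- c(0, 0.75, 1)
cv_fits <- lapply(alphas, function(a) {
  cv.glmnet(X_sim, y_sim, alpha = a, foldid = foldid)
})

# Smallest cross-validated MSE for each alpha
cv_mse <- sapply(cv_fits, function(fit) min(fit$cvm))
names(cv_mse) <- paste0("alpha_", alphas)
cv_mse
```

Each fit also stores `lambda.1se`, the largest $\lambda$ whose error is within one standard error of the minimum, which gives a more heavily regularized alternative to `lambda.min`.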

## Out of sample performance

In [None]:
dta_test<-read_csv("https://raw.githubusercontent.com/ignaciomsarmiento/datasets/main/regularization_test.csv")

In [None]:
Xtest<-as.matrix(dta_test[,-1])
ytest<-dta_test$y

In [None]:
yhat_ridge<-predict(cv_ridge, newx = Xtest, s = "lambda.min")

In [None]:
MSE_ridge <- summary(lm((ytest-yhat_ridge)^2~1))$coef[1]
MSE_ridge
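
The `summary(lm(... ~ 1))$coef[1]` expressions in this section compute a mean squared error: regressing the squared errors on a constant returns their sample mean as the intercept. A small check on simulated values (the names `y_true` and `y_pred` are made up for the example):

```r
set.seed(1)
# Simulated outcomes and predictions (illustrative only)
y_true <- rnorm(50)
y_pred <- y_true + rnorm(50, sd = 0.5)

sq_err <- (y_true - y_pred)^2

# Intercept of a constant-only regression on the squared errors
mse_lm <- summary(lm(sq_err ~ 1))$coef[1]

# Direct sample mean of the squared errors
mse_mean <- mean(sq_err)

all.equal(mse_lm, mse_mean)
```

The regression form additionally reports a standard error for the MSE in the second column of the coefficient table.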

In [None]:
yhat_lasso<-predict(cv_lasso, newx = Xtest, s = "lambda.min")

In [None]:
MSE_lasso<- summary(lm((ytest-yhat_lasso)^2~1))$coef[1]
MSE_lasso

In [None]:
yhat_en<-predict(cv_en, newx = Xtest, s = "lambda.min")

In [None]:
MSE_en<- summary(lm((ytest-yhat_en)^2~1))$coef[1]
MSE_en

In [None]:
yhat_en1se<-predict(cv_en, newx = Xtest, s = "lambda.1se")
MSE_en1se<- summary(lm((ytest-yhat_en1se)^2~1))$coef[1]
MSE_en1se

### `caret` for tuning alpha

In [None]:
p_load("caret")

In [None]:
set.seed(42)

tc_10 <- trainControl(method = "cv", number = 10)

en_caret <- train(
  x=X,
  y=y,
  method = "glmnet",
  trControl = tc_10,
  tuneLength=100
)

In [None]:
en_caret
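
Instead of letting `tuneLength` pick the candidate values, the grid can also be spelled out with `tuneGrid`, since `caret`'s `"glmnet"` method tunes the two parameters `alpha` and `lambda`. The sketch below runs on simulated data, and the grid values are arbitrary choices for illustration, not recommendations.

```r
library(caret)
library(glmnet)

set.seed(42)
n <- 100
p <- 20
X_sim <- matrix(rnorm(n * p), n, p)
colnames(X_sim) <- paste0("x", 1:p)  # caret expects named columns
y_sim <- drop(X_sim %*% c(rep(1.5, 4), rep(0, p - 4))) + rnorm(n)

# Explicit grid over the mixing parameter and the penalty
en_tune_grid <- expand.grid(
  alpha  = seq(0, 1, by = 0.25),
  lambda = 10^seq(-3, 1, length.out = 20)
)

en_grid <- train(
  x = X_sim,
  y = y_sim,
  method = "glmnet",
  trControl = trainControl(method = "cv", number = 5),
  tuneGrid = en_tune_grid
)

# Combination of alpha and lambda with the lowest cross-validated RMSE
en_grid$bestTune
```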

In [None]:
yhat_en_caret<-predict(en_caret, newdata = Xtest)

In [None]:
MSE_en_caret<- summary(lm((ytest-yhat_en_caret)^2~1))$coef[1]
MSE_en_caret

In [None]:
MSE_lasso