<div >
<img src = "figures_notebook/banner.png" />
</div>

# When Should We Use Deep Learning?

The goal is to predict a baseball player's 1987 salary (`Salary`, in thousands of dollars) from his 1986 performance statistics, using the `Hitters` data from the `ISLR2` package.

In [1]:
# install.packages("pacman") #run this line if you use Google Colab

In [2]:
require("pacman")
p_load("ISLR2","keras")

Gitters <- na.omit(Hitters)
n <- nrow(Gitters)
set.seed(13)
ntest <- trunc(n / 3)
testid <- sample(1:n, ntest)


Loading required package: pacman
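The split above holds out one third of the players as a test set, and negative indexing (`Gitters[-testid, ]`) selects the remaining rows for training. A minimal base-R check of that bookkeeping follows. It assumes the 263 complete rows that `na.omit(Hitters)` leaves, and it asserts no particular indices, because the rows drawn by `sample()` can depend on the R version's random-number settings.

```r
# Assumption: 263 complete rows remain after na.omit(Hitters)
n <- 263
set.seed(13)
ntest <- trunc(n / 3)               # hold out one third of the players
testid <- sample(1:n, ntest)        # row indices of the test set
train_rows <- seq_len(n)[-testid]   # the rows that Gitters[-testid, ] keeps

c(test = length(testid), train = length(train_rows))   # test 87, train 176
```

Because `sample()` draws without replacement by default, the two index sets are disjoint and together cover every row.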



In [3]:
head(Gitters[-testid, ])

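| | AtBat | Hits | HmRun | Runs | RBI | Walks | Years | CAtBat | CHits | CHmRun | CRuns | CRBI | CWalks | League | Division | PutOuts | Assists | Errors | Salary | NewLeague |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | `<fct>` | `<fct>` | `<int>` | `<int>` | `<int>` | `<dbl>` | `<fct>` |
| -Alan Ashby | 315 | 81 | 7 | 24 | 38 | 39 | 14 | 3449 | 835 | 69 | 321 | 414 | 375 | N | W | 632 | 43 | 10 | 475.0 | N |
| -Alvin Davis | 479 | 130 | 18 | 66 | 72 | 76 | 3 | 1624 | 457 | 63 | 224 | 266 | 263 | A | W | 880 | 82 | 14 | 480.0 | A |
| -Andre Dawson | 496 | 141 | 20 | 65 | 78 | 37 | 11 | 5628 | 1575 | 225 | 828 | 838 | 354 | N | E | 200 | 11 | 3 | 500.0 | N |
| -Andres Galarraga | 321 | 87 | 10 | 39 | 42 | 30 | 2 | 396 | 101 | 12 | 48 | 46 | 33 | N | E | 805 | 40 | 4 | 91.5 | N |
| -Al Newman | 185 | 37 | 1 | 23 | 8 | 21 | 2 | 214 | 42 | 1 | 30 | 9 | 24 | N | E | 76 | 127 | 7 | 70.0 | A |
| -Andres Thomas | 323 | 81 | 6 | 26 | 32 | 8 | 2 | 341 | 86 | 6 | 32 | 34 | 8 | N | W | 143 | 290 | 19 | 75.0 | N |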


## OLS

We begin by fitting a linear model to the training data and making predictions on the test data. The model has 20 parameters: an intercept and 19 coefficients.
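The count of 20 comes from how `model.matrix` (which `lm` calls internally) encodes the predictors: an intercept, one column per numeric variable, and one dummy column per non-baseline factor level. `Hitters` has 16 numeric predictors and three two-level factors (`League`, `Division`, `NewLeague`), giving 1 + 16 + 3 = 20. The sketch below illustrates the encoding on a small made-up data frame; `toy` and its columns are hypothetical.

```r
# Hypothetical toy data: two numeric predictors and one two-level factor
toy <- data.frame(
  a = c(1.2, 0.4, 3.1, 2.2, 0.9, 1.7),
  b = c(10, 12, 9, 11, 13, 8),
  league = factor(c("A", "N", "A", "N", "N", "A")),
  y = c(5.0, 6.1, 4.2, 5.8, 6.3, 4.9)
)

# With an intercept, the factor gets one dummy for its non-baseline level
X1 <- model.matrix(y ~ ., data = toy)
colnames(X1)   # "(Intercept)" "a" "b" "leagueN"

# Without an intercept (formula ending in "- 1"), both levels get a column
X0 <- model.matrix(y ~ . - 1, data = toy)
colnames(X0)   # "a" "b" "leagueA" "leagueN"

length(coef(lm(y ~ ., data = toy)))   # 4 parameters, one per column of X1
```

The same no-intercept coding is why the lasso design matrix built later with `Salary ~ . - 1` contains both `LeagueA` and `LeagueN`.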

In [4]:

lfit <- lm(Salary ~ ., data = Gitters[-testid, ])
summary(lfit)


Call:
lm(formula = Salary ~ ., data = Gitters[-testid, ])

Residuals:
    Min      1Q  Median      3Q     Max 
-741.19 -178.38  -35.56  126.66 1788.49 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)   
(Intercept)  208.4773   116.8454   1.784  0.07633 . 
AtBat         -1.6710     0.8002  -2.088  0.03840 * 
Hits           5.6020     3.0260   1.851  0.06602 . 
HmRun         16.4724     8.0076   2.057  0.04134 * 
Runs          -1.4194     3.9696  -0.358  0.72115   
RBI           -4.2062     3.4562  -1.217  0.22544   
Walks          7.6626     2.3529   3.257  0.00138 **
Years         -1.0588    16.3828  -0.065  0.94855   
CAtBat        -0.1222     0.1727  -0.708  0.48030   
CHits          0.3250     0.8296   0.392  0.69580   
CHmRun        -1.0436     2.0797  -0.502  0.61651   
CRuns          0.6036     0.9954   0.606  0.54512   
CRBI           0.7710     0.8915   0.865  0.38846   
CWalks        -0.5362     0.4149  -1.292  0.19813   
LeagueN       53.8149    86.0297   0.

In [5]:
lpred <- predict(lfit, Gitters[testid, ])

# Test-set mean absolute error (MAE)
with(Gitters[testid, ], mean(abs(lpred - Salary)))
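All three models are compared by mean absolute error on the test set, $\text{MAE} = \frac{1}{n}\sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert$, which here is measured in thousands of dollars of salary. A small helper makes the computation reusable; the name `mae` is our own, not a function from a package.

```r
# Mean absolute error between observed y and predictions yhat
mae <- function(y, yhat) {
  stopifnot(length(y) == length(yhat))
  mean(abs(y - yhat))
}

mae(c(100, 250, 400), c(120, 250, 370))   # (20 + 0 + 30) / 3 = 50 / 3
```

For example, `mae(Gitters$Salary[testid], lpred)` computes the same quantity as the cell above, and it also accepts the one-column matrices returned by `predict()` for the lasso and the neural network.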


## Lasso

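Next we fit the same linear model with lasso regularization. The tuning parameter $\lambda$ is selected by 10-fold cross-validation (the `cv.glmnet` default) on the training data, using mean absolute error (`type.measure = "mae"`) as the criterion.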
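With `alpha = 1`, the Gaussian lasso minimises $\frac{1}{2n}\sum_i (y_i - x_i^\top\beta)^2 + \lambda \sum_j \lvert\beta_j\rvert$, and `glmnet` solves it by cyclical coordinate descent, updating one coefficient at a time by soft-thresholding. The following is a simplified base-R sketch of that idea, not glmnet's implementation. It assumes the columns of `X` are standardised and `y` is centred (so no intercept is fitted), and it runs a fixed number of sweeps instead of checking convergence. `soft_threshold`, `lasso_cd` and the simulated data are hypothetical.

```r
# Soft-thresholding operator: S(z, g) = sign(z) * max(|z| - g, 0)
soft_threshold <- function(z, gamma) sign(z) * pmax(abs(z) - gamma, 0)

# Cyclical coordinate descent for
#   (1 / (2n)) * sum((y - X %*% beta)^2) + lambda * sum(abs(beta))
# Assumes centred y and standardised columns of X; no intercept, fixed sweeps.
lasso_cd <- function(X, y, lambda, n_iter = 200) {
  n <- nrow(X)
  p <- ncol(X)
  beta <- numeric(p)
  r <- y - drop(X %*% beta)          # current residual
  for (it in seq_len(n_iter)) {
    for (j in seq_len(p)) {
      xj <- X[, j]
      r <- r + xj * beta[j]          # partial residual without feature j
      z <- sum(xj * r) / n
      beta[j] <- soft_threshold(z, lambda) / (sum(xj^2) / n)
      r <- r - xj * beta[j]          # put the updated feature j back
    }
  }
  beta
}

# Simulated example (made-up data, not Hitters)
set.seed(42)
X <- scale(matrix(rnorm(200), nrow = 100, ncol = 2))
y <- drop(X %*% c(2, -1)) + rnorm(100)
y <- y - mean(y)

b_ols <- lasso_cd(X, y, lambda = 0, n_iter = 500)  # no penalty: least squares
b_mid <- lasso_cd(X, y, lambda = 0.5)              # shrunken towards zero
b_big <- lasso_cd(X, y, lambda = 100)              # every coefficient is zero
```

Setting `lambda = 0` recovers least squares, while a large enough `lambda` sets coefficients exactly to zero; that exact zeroing is what produces the dots in the `coef(cvfit)` output.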

In [6]:
p_load("glmnet")

x <- scale(model.matrix(Salary ~ . - 1, data = Gitters))
y <- Gitters$Salary
cvfit <- cv.glmnet(x[-testid, ], y[-testid],
                   type.measure = "mae")


cvfit


Call:  cv.glmnet(x = x[-testid, ], y = y[-testid], type.measure = "mae") 

Measure: Mean Absolute Error 

    Lambda Index Measure    SE Nonzero
min   6.25    39   236.6 29.74      13
1se  70.18    13   264.7 26.31       6
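The output reports two choices of $\lambda$: `lambda.min`, which minimises the cross-validated MAE, and `lambda.1se`, the largest $\lambda$ whose CV error is within one standard error (the `SE` column) of that minimum, which yields a sparser model (6 non-zero coefficients instead of 13). The rule can be sketched as follows. `select_lambda` is a hypothetical helper, and the numbers are made up for illustration rather than taken from `cvfit`.

```r
# Hypothetical helper mirroring the lambda.min / lambda.1se rule
select_lambda <- function(lambda, cvm, cvsd) {
  i_min <- which.min(cvm)                        # smallest CV error
  threshold <- cvm[i_min] + cvsd[i_min]          # one standard error above it
  c(lambda.min = lambda[i_min],
    lambda.1se = max(lambda[cvm <= threshold]))  # most regularised within 1 SE
}

# Made-up CV summary (not the values stored in cvfit)
lambda <- c(100, 50, 20, 10, 5, 1)
cvm    <- c(300, 270, 250, 240, 236, 238)
cvsd   <- c(25, 25, 26, 27, 30, 31)
select_lambda(lambda, cvm, cvsd)   # lambda.min = 5, lambda.1se = 20
```

Applied to the fitted object, `select_lambda(cvfit$lambda, cvfit$cvm, cvfit$cvsd)` should agree with `cvfit$lambda.min` and `cvfit$lambda.1se`, except possibly when several $\lambda$ values tie for the minimum.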

In [7]:
coef(cvfit)  # coefficients at s = "lambda.1se", the default for cv.glmnet objects

21 x 1 sparse Matrix of class "dgCMatrix"
                    s1
(Intercept) 530.701335
AtBat         .       
Hits         10.577456
HmRun         1.257239
Runs          .       
RBI           5.074517
Walks        57.358689
Years         .       
CAtBat        .       
CHits        35.712624
CHmRun        .       
CRuns        85.386016
CRBI          .       
CWalks        .       
LeagueA       .       
LeagueN       .       
DivisionW     .       
PutOuts       .       
Assists       .       
Errors        .       
NewLeagueN    .       

In [8]:
cpred <- predict(cvfit, x[testid, ], s = "lambda.min")

# Test-set mean absolute error (MAE)
mean(abs(y[testid] - cpred))

## Neural Networks

We fit a neural network with a single hidden layer of 50 ReLU units, followed by a dropout layer (rate 0.4) and a single linear output unit.
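Each hidden unit computes $\max(0, w_k^\top x + b_k)$, and the output is a linear combination of the 50 activations. During training, dropout sets each hidden activation to zero with probability 0.4 and rescales the survivors by $1/(1 - 0.4)$ (inverted dropout, as Keras does); at prediction time it does nothing. Below is a base-R sketch of the forward pass under those assumptions, with random weights standing in for the trained ones. `relu`, `forward` and all weight values are hypothetical.

```r
relu <- function(z) pmax(z, 0)   # keeps the matrix dimensions of z

# One-hidden-layer network: X (n x p) -> ReLU(X W1 + b1) (n x K) -> linear output
# Hypothetical helper; the weights below are random, not the trained Keras weights.
forward <- function(X, W1, b1, w2, b2, dropout_rate = 0, training = FALSE) {
  H <- relu(sweep(X %*% W1, 2, b1, "+"))      # hidden activations, n x K
  if (training && dropout_rate > 0) {
    keep <- runif(length(H)) >= dropout_rate  # keep each unit w.p. 1 - rate
    H <- H * keep / (1 - dropout_rate)        # inverted dropout rescaling
  }
  drop(H %*% w2 + b2)                         # one prediction per row
}

set.seed(1)
p <- 20; K <- 50; n <- 5
X  <- matrix(rnorm(n * p), n, p)
W1 <- matrix(rnorm(p * K, sd = 0.1), p, K); b1 <- rep(0, K)
w2 <- rnorm(K, sd = 0.1); b2 <- 0

pred_eval  <- forward(X, W1, b1, w2, b2)   # prediction mode: no dropout
pred_train <- forward(X, W1, b1, w2, b2, dropout_rate = 0.4, training = TRUE)
```

Prediction mode is deterministic, while every training-mode call draws a fresh dropout mask.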

In [9]:
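# Scaled design matrix (the same one used for the lasso)
x <- model.matrix(Salary ~ . - 1, data = Gitters) %>% scale()

# One hidden layer of 50 ReLU units, dropout (rate 0.4), linear output unit
modnn <- keras_model_sequential() %>%
  layer_dense(units = 50, activation = "relu",
              input_shape = ncol(x)) %>%
  layer_dropout(rate = 0.4) %>%
  layer_dense(units = 1)

# Squared-error loss with the RMSprop optimizer; MAE is reported as a metric
modnn %>% compile(loss = "mse",
                  optimizer = optimizer_rmsprop(),
                  metrics = list("mean_absolute_error")
)

# 600 epochs with mini-batches of 32; the test set is passed as
# validation_data only to track its MAE during training
history <- modnn %>% fit(
  x[-testid, ], y[-testid], epochs = 600, batch_size = 32,
  validation_data = list(x[testid, ], y[testid])
)

# Test-set predictions
npred <- predict(modnn, x[testid, ])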



In [10]:
summary(modnn)

Model: "sequential"
________________________________________________________________________________
Layer (type)                        Output Shape                    Param #     
dense_1 (Dense)                     (None, 50)                      1050        
________________________________________________________________________________
dropout (Dropout)                   (None, 50)                      0           
________________________________________________________________________________
dense (Dense)                       (None, 1)                       51          
Total params: 1,101
Trainable params: 1,101
Non-trainable params: 0
________________________________________________________________________________
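The parameter counts in the summary follow from the layer sizes: a dense layer has one weight per input/unit pair plus one bias per unit, and dropout has no parameters. With the 20 columns of `x` as inputs, a short check (the helper `dense_params` is hypothetical):

```r
# Hypothetical helper: weights (inputs x units) plus one bias per unit
dense_params <- function(n_in, n_units) n_in * n_units + n_units

hidden <- dense_params(20, 50)  # 20 columns of x -> 50 ReLU units: 1050
output <- dense_params(50, 1)   # 50 hidden units -> 1 output: 51
hidden + output                 # 1101; the dropout layer adds none
```

The network therefore fits 1,101 parameters, against 20 for the linear model.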


In [11]:
# Test-set mean absolute error (MAE)
mean(abs(y[testid] - npred))

So when, as here, a network with 1,101 parameters does not deliver clearly better test MAE than the 20-parameter linear model or the lasso, we are much better off following Occam's razor: when several methods give roughly equivalent performance, pick the simplest.
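One way to make the principle operational is to treat every model whose test MAE is within a chosen relative tolerance of the best as roughly equivalent, and then keep the one with the fewest parameters. The helper `pick_simplest` and the 5% tolerance are assumptions for illustration, and the MAE values below are placeholders, not results from this notebook. The parameter counts are 20 for OLS, 14 for the lasso at `lambda.min` (13 non-zero coefficients plus the intercept) and 1,101 for the network.

```r
# Hypothetical rule: among models within `tol` (relative) of the best test MAE,
# return the one with the fewest parameters.
pick_simplest <- function(mae, n_params, tol = 0.05) {
  ok <- mae <= min(mae) * (1 + tol)
  candidates <- names(mae)[ok]
  candidates[which.min(n_params[candidates])]
}

# Placeholder MAEs (relative units), not this notebook's results
mae_vals <- c(ols = 1.00, lasso = 0.98, nn = 0.97)
n_params <- c(ols = 20, lasso = 14, nn = 1101)
pick_simplest(mae_vals, n_params)   # "lasso"
```

With `tol = 0` the rule reduces to choosing the lowest MAE, which here would be the network; the tolerance is exactly where the judgement of "roughly equivalent" enters.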