In [1]:
library(tidyverse)
library(mlr)
library(mlbench)
library(e1071)
library(xgboost)
library(parallelMap)

# Count the missing values (NA) in each column of a data frame
count_na = function(df){
    sapply(df, function(x){sum(is.na(x))})
}


## MLR: Machine Learning in R

Reference site: https://mlr.mlr-org.com/

See also the basic tutorial [at this link](https://mlr.mlr-org.com/articles/tutorial/usecase_regression.html).

![workflow](imgs/Selection_047.png)

We will learn the workflow using the `BostonHousing` dataset. Its description is in the documentation of the [mlbench](https://www.rdocumentation.org/packages/mlbench/versions/2.1-1/topics/BostonHousing) package.

In [5]:
data(BostonHousing)
df = BostonHousing
head(df)

crim,zn,indus,chas,nox,rm,age,dis,rad,tax,ptratio,b,lstat,medv
<dbl>,<dbl>,<dbl>,<fct>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>
0.00632,18,2.31,0,0.538,6.575,65.2,4.09,1,296,15.3,396.9,4.98,24.0
0.02731,0,7.07,0,0.469,6.421,78.9,4.9671,2,242,17.8,396.9,9.14,21.6
0.02729,0,7.07,0,0.469,7.185,61.1,4.9671,2,242,17.8,392.83,4.03,34.7
0.03237,0,2.18,0,0.458,6.998,45.8,6.0622,3,222,18.7,394.63,2.94,33.4
0.06905,0,2.18,0,0.458,7.147,54.2,6.0622,3,222,18.7,396.9,5.33,36.2
0.02985,0,2.18,0,0.458,6.43,58.7,6.0622,3,222,18.7,394.12,5.21,28.7


In [7]:
?BostonHousing

## 1. Create the task

In [8]:
regr.task = makeRegrTask(data = df, target = 'medv')

In [9]:
regr.task

Supervised task: df
Type: regr
Target: medv
Observations: 506
Features:
   numerics     factors     ordered functionals 
         12           1           0           0 
Missings: FALSE
Has weights: FALSE
Has blocking: FALSE
Has coordinates: FALSE

## 2. Define the learner

Check the available learners on the [site](https://mlr.mlr-org.com/articles/tutorial/integrated_learners.html).
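The same list can be queried from R with mlr's `listLearners()`. A minimal sketch that filters regression learners accepting factor features (such as `chas`); the result is a data frame with one row per learner, and only its `class` and `package` columns are used here:

```r
library(mlr)

# Regression learners whose properties include "factors"
lrns = listLearners("regr", properties = "factors")

# Learner id (as passed to makeLearner) and the package implementing it
head(lrns[, c("class", "package")])
```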

In [10]:
svm_learner = makeLearner(cl='regr.svm', cost = 1)

In [11]:
svm_learner

Learner regr.svm from package e1071
Type: regr
Name: Support Vector Machines (libsvm); Short name: svm
Class: regr.svm
Properties: numerics,factors
Predict-Type: response
Hyperparameters: cost=1


## 3. Train the model

After the first two steps, we can define the resampling strategy and train the model.

Here we create two strategies: `Holdout` (70% train, 30% test) and 5-fold `Cross Validation`.

In [81]:
holdout = makeResampleDesc(method = 'Holdout', split = 0.7)
cv = makeResampleDesc(method = 'CV', iters = 5)

In [13]:
holdout
cv

Resample description: holdout with 0.70 split rate.
Predict: test
Stratification: FALSE

Resample description: cross-validation with 5 iterations.
Predict: test
Stratification: FALSE
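What the two resampling descriptions do can be sketched with base-R index bookkeeping. This only illustrates the idea; mlr's own implementation may differ in details such as how leftover rows are assigned to folds:

```r
# Base-R sketch of the two resampling schemes (not mlr's internal code)
n = 506          # observations in BostonHousing
set.seed(2019)

# Holdout with split = 0.7: a single random train/test partition
train_idx = sample(n, size = round(0.7 * n))
test_idx = setdiff(seq_len(n), train_idx)

# 5-fold CV: every observation lands in exactly one test fold
k = 5
fold_id = sample(rep(seq_len(k), length.out = n))
test_folds = split(seq_len(n), fold_id)
lengths(test_folds)
```

Holdout gives one noisy estimate from 152 test rows; CV uses every row for testing once and averages five estimates.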

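To train and evaluate, we use the `resample()` function: for each split it fits the learner on the training part and computes the chosen measures on the test part. It returns performance estimates, not a final fitted model.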

In [14]:
res_holdout = resample(learner = svm_learner, task = regr.task,
                       resampling = holdout, measures = list(mae, mse))

Resampling: holdout
Measures:             mae       mse       
[Resample] iter 1:    2.5856057 20.7799511


Aggregated Result: mae.test.mean=2.5856057,mse.test.mean=20.7799511
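mlr's `mae` and `mse` measures are the mean absolute error and the mean squared error of the test-set predictions. The sketch below computes both by hand on made-up numbers (not BostonHousing output) to make the formulas explicit. MSE is in squared units of `medv`, which is why it is much larger than MAE:

```r
# Hypothetical truth/prediction pairs, only to show the formulas
truth = c(24.0, 21.5, 34.5)
response = c(25.0, 20.5, 30.5)

err = truth - response            # -1, 1, 4
mae_value = mean(abs(err))        # (1 + 1 + 4) / 3 = 2
mse_value = mean(err^2)           # (1 + 1 + 16) / 3 = 6
```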




In [27]:
set.seed(2019)
res_cv = resample(svm_learner, regr.task, cv, mae)

Resampling: cross-validation
Measures:             mae       
[Resample] iter 1:    2.2784589 
[Resample] iter 2:    1.8464070 
[Resample] iter 3:    2.4946917 
[Resample] iter 4:    2.1577096 
[Resample] iter 5:    2.3452422 


Aggregated Result: mae.test.mean=2.2245019
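By default the `Aggregated Result` is the mean of the per-fold values (`test.mean`). Reproducing it from the five fold MAEs printed above:

```r
# Per-fold MAE values copied from the resample() log above
fold_mae = c(2.2784589, 1.8464070, 2.4946917, 2.1577096, 2.3452422)

mean(fold_mae)   # matches mae.test.mean = 2.2245019 up to rounding
sd(fold_mae)     # spread of the estimate across folds
```

In mlr the same numbers are stored in the result object: `res_cv$measures.test` has one row per fold and `res_cv$aggr` holds the aggregate.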




## 3.1 With hyperparameter tuning

In [28]:
parameters_svm = makeParamSet(makeNumericParam("cost",lower = 0.1,
                                              upper = 1),
                             makeNumericParam("gamma", lower = 0.1,
                                              upper = 1))

In [29]:
parameters_svm

         Type len Def   Constr Req Tunable Trafo
cost  numeric   -   - 0.1 to 1   -    TRUE     -
gamma numeric   -   - 0.1 to 1   -    TRUE     -
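Both ranges above are narrow and searched on a linear scale; the tuning log below ends at `cost = 0.946`, close to the upper bound of 1. A common alternative, not what this notebook ran, is to search on a log10 scale with a `trafo`, so the random search covers several orders of magnitude evenly. The bounds below are illustrative assumptions:

```r
library(mlr)   # also attaches ParamHelpers

# Values are sampled on the exponent and transformed with 10^x:
# cost in [0.01, 100], gamma in [0.001, 10]
parameters_svm_log = makeParamSet(
  makeNumericParam("cost",  lower = -2, upper = 2, trafo = function(x) 10^x),
  makeNumericParam("gamma", lower = -3, upper = 1, trafo = function(x) 10^x)
)

# Peek at a few random configurations on the transformed scale
set.seed(1)
design = generateRandomDesign(5, parameters_svm_log, trafo = TRUE)
design
```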

Next, define the search strategy; we will use `random search`. More details at this [link](https://mlr.mlr-org.com/articles/tutorial/tune.html).

In [31]:
ctrl  = makeTuneControlRandom(maxit = 100)
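Conceptually, `makeTuneControlRandom(maxit = 100)` draws 100 configurations from the box defined by `parameters_svm` and evaluates each one with the chosen resampling strategy. The base-R sketch below mimics that loop; the objective function is made up and only stands in for the cross-validated MAE:

```r
set.seed(2019)
maxit = 100

# Hypothetical stand-in for the CV error; in mlr this is the resampled MAE
objective = function(cost, gamma) (cost - 0.9)^2 + (gamma - 0.15)^2

# Random search: sample maxit points uniformly inside the box, keep the best
candidates = data.frame(
  cost  = runif(maxit, min = 0.1, max = 1),
  gamma = runif(maxit, min = 0.1, max = 1)
)
candidates$y = mapply(objective, candidates$cost, candidates$gamma)
best = candidates[which.min(candidates$y), ]
best
```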

To see which hyperparameters the SVM learner exposes, with their defaults and constraints, use `getLearnerParamSet()`:

In [39]:
getLearnerParamSet(svm_learner)

                   Type  len            Def                           Constr
type           discrete    - eps-regression     eps-regression,nu-regression
kernel         discrete    -         radial linear,polynomial,radial,sigmoid
degree          integer    -              3                         1 to Inf
gamma           numeric    -              -                         0 to Inf
coef0           numeric    -              0                      -Inf to Inf
cost            numeric    -              1                         0 to Inf
nu              numeric    -            0.5                      -Inf to Inf
cachesize       numeric    -             40                      -Inf to Inf
tolerance       numeric    -          0.001                         0 to Inf
epsilon         numeric    -              -                         0 to Inf
shrinking       logical    -           TRUE                                -
cross           integer    -              0                         0 to Inf

In [32]:
set.seed(2019)
tr = tuneParams(svm_learner, regr.task, cv, mae, parameters_svm, 
               ctrl)

[Tune] Started tuning learner regr.svm for parameter set:
         Type len Def   Constr Req Tunable Trafo
cost  numeric   -   - 0.1 to 1   -    TRUE     -
gamma numeric   -   - 0.1 to 1   -    TRUE     -
With control class: TuneControlRandom
Imputation value: Inf
[Tune-x] 1: cost=0.802; gamma=0.7
[Tune-y] 1: mae.test.mean=3.3302249; time: 0.0 min
[Tune-x] 2: cost=0.409; gamma=0.713
[Tune-y] 2: mae.test.mean=3.7629726; time: 0.0 min
[Tune-x] 3: cost=0.542; gamma=0.509
[Tune-y] 3: mae.test.mean=3.2051902; time: 0.0 min
[Tune-x] 4: cost=0.4; gamma=0.742
[Tune-y] 4: mae.test.mean=3.8274946; time: 0.0 min
[Tune-x] 5: cost=0.529; gamma=0.75
[Tune-y] 5: mae.test.mean=3.6524673; time: 0.0 min
[Tune-x] 6: cost=0.818; gamma=0.51
[Tune-y] 6: mae.test.mean=2.9913404; time: 0.0 min
[Tune-x] 7: cost=0.898; gamma=0.32
[Tune-y] 7: mae.test.mean=2.5550492; time: 0.0 min
[Tune-x] 8: cost=0.428; gamma=0.907
[Tune-y] 8: mae.test.mean=4.0515848; time: 0.0 min
[Tune-x] 9: cost=0.692; gamma=0.142
[Tune-y] 9

[Tune-x] 91: cost=0.245; gamma=0.986
[Tune-y] 91: mae.test.mean=4.6019590; time: 0.0 min
[Tune-x] 92: cost=0.449; gamma=0.162
[Tune-y] 92: mae.test.mean=2.5496946; time: 0.0 min
[Tune-x] 93: cost=0.691; gamma=0.765
[Tune-y] 93: mae.test.mean=3.5130059; time: 0.0 min
[Tune-x] 94: cost=0.92; gamma=0.518
[Tune-y] 94: mae.test.mean=2.9531818; time: 0.0 min
[Tune-x] 95: cost=0.239; gamma=0.937
[Tune-y] 95: mae.test.mean=4.5571643; time: 0.0 min
[Tune-x] 96: cost=0.589; gamma=0.945
[Tune-y] 96: mae.test.mean=3.8780895; time: 0.0 min
[Tune-x] 97: cost=0.377; gamma=0.834
[Tune-y] 97: mae.test.mean=4.0328014; time: 0.0 min
[Tune-x] 98: cost=0.239; gamma=0.817
[Tune-y] 98: mae.test.mean=4.3796056; time: 0.0 min
[Tune-x] 99: cost=0.425; gamma=0.639
[Tune-y] 99: mae.test.mean=3.6075700; time: 0.0 min
[Tune-x] 100: cost=0.924; gamma=0.99
[Tune-y] 100: mae.test.mean=3.6698739; time: 0.0 min
[Tune] Result: cost=0.946; gamma=0.139 : mae.test.mean=2.2119958


Best hyperparameters:

In [40]:
tr$x
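Besides `tr$x` (best configuration) and `tr$y` (its CV score), the full search history is stored in `tr$opt.path`. A self-contained sketch with a much smaller budget (5 configurations, 3 folds, chosen only so it runs quickly):

```r
library(mlr)
library(mlbench)

data(BostonHousing, package = "mlbench")
task = makeRegrTask(data = BostonHousing, target = "medv")
ps = makeParamSet(
  makeNumericParam("cost", lower = 0.1, upper = 1),
  makeNumericParam("gamma", lower = 0.1, upper = 1)
)

set.seed(2019)
tr_small = tuneParams(makeLearner("regr.svm"), task,
                      resampling = makeResampleDesc("CV", iters = 3),
                      measures = mae, par.set = ps,
                      control = makeTuneControlRandom(maxit = 5),
                      show.info = FALSE)

# One row per evaluated configuration, best (lowest MAE) first
path = as.data.frame(tr_small$opt.path)
path = path[order(path$mae.test.mean), c("cost", "gamma", "mae.test.mean")]
head(path, 3)
```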

# Now it's your turn!


![your_turn](imgs/avengers.jpg)




## Do the same with the [Soybean](https://www.rdocumentation.org/packages/mlbench/versions/2.1-1/topics/Soybean) dataset from the mlbench package.

### Follow the instructions below:

1. Create a holdout set and DO NOT USE IT DURING CROSS-VALIDATION
2. We will compare `xgboost` and `svm`
3. Create a learner for each technique
4. Use 5-fold CV as the resampling strategy
5. Use random search with 100 iterations as the tuning control
6. Find the best hyperparameters for each technique
7. Finally, train a model with the best hyperparameters of each technique and evaluate both on the set held out in item 1 to compare their performance
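The key constraint in items 1 and 4 is that the CV folds are drawn only from the training rows. A base-R sketch of that index bookkeeping (the notebook does the same thing with `makeResampleInstance()` and `subsetTask()` below; the row count 562 is the number of complete cases reported later):

```r
n = 562                                      # rows left after drop_na()
set.seed(25)
train_idx = sample(n, size = round(0.7 * n))
test_idx  = setdiff(seq_len(n), train_idx)   # item 1: untouched during tuning

# Item 4: 5 CV folds built from the training rows only
fold_id = sample(rep(1:5, length.out = length(train_idx)))
cv_test_folds = split(train_idx, fold_id)
```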

## 0. Creating dummy features (0/1 indicator columns for categorical variables)

In [48]:
data(Soybean,package = 'mlbench')
soy = createDummyFeatures(Soybean,target = "Class")
dim(Soybean)
dim(soy)
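`createDummyFeatures()` replaces each factor feature with one 0/1 column per level (its default `"1-of-n"` encoding), named `<column>.<level>`; this is where names such as `roots.2` in the SVM warnings further down come from. A toy example with hypothetical data:

```r
library(mlr)

toy = data.frame(
  y = factor(c("a", "b", "a")),          # target: left untouched
  roots = factor(c("0", "1", "2")),      # factor feature: gets one column per level
  x = c(1.5, 2.0, 3.5)                   # numeric feature: kept as is
)
toy_dummy = createDummyFeatures(toy, target = "y")
names(toy_dummy)
```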


In [52]:
table(soy$Class)


               2-4-d-injury         alternarialeaf-spot 
                         16                          91 
                anthracnose            bacterial-blight 
                         44                          20 
          bacterial-pustule                  brown-spot 
                         20                          92 
             brown-stem-rot                charcoal-rot 
                         44                          20 
              cyst-nematode diaporthe-pod-&-stem-blight 
                         14                          15 
      diaporthe-stem-canker                downy-mildew 
                         20                          20 
         frog-eye-leaf-spot            herbicide-injury 
                         91                           8 
     phyllosticta-leaf-spot            phytophthora-rot 
                         20                          88 
             powdery-mildew           purple-seed-stain 
                         20   

In [53]:
dim(drop_na(soy))
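`drop_na()` removes every row with at least one `NA`. Comparing the class table above with the task printed below, four classes disappear entirely (15 remain), which is what the "empty factor levels" warning refers to. The `count_na()` helper defined at the top of the notebook shows where the gaps are; this sketch uses the raw `Soybean` data rather than the dummy-encoded `soy`:

```r
library(mlbench)
library(tidyr)

data(Soybean, package = "mlbench")

count_na = function(df){
    sapply(df, function(x){sum(is.na(x))})
}

# Columns with the most missing values
head(sort(count_na(Soybean), decreasing = TRUE))

complete = drop_na(Soybean)
nrow(Soybean) - nrow(complete)      # rows removed by drop_na()

# Classes with no complete rows at all
lost = setdiff(levels(Soybean$Class), unique(as.character(complete$Class)))
lost
```

Since the `classif.xgboost` learner lists `missings` among its properties, keeping the incomplete rows for xgboost would be an alternative; the SVM would still need them removed or imputed.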

In [65]:
task = makeClassifTask(data = drop_na(soy), target = 'Class')
set.seed(25)
holdout = makeResampleInstance("Holdout",task, split = 0.7)
tsk_train = subsetTask(task, holdout$train.inds[[1]])
tsk_test = subsetTask(task, holdout$test.inds[[1]])

“Target column 'Class' contains empty factor levels”
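The holdout above is split at random, without stratification. With classes as small as 20 rows, a stratified split keeps each class's proportion similar in the training and test parts. A hedged variant passes `stratify = TRUE` (forwarded to `makeResampleDesc()`); note that it produces a different split from the one used in the rest of this notebook:

```r
library(mlr)
library(mlbench)
library(tidyr)

data(Soybean, package = "mlbench")
soy = createDummyFeatures(Soybean, target = "Class")
task = makeClassifTask(data = drop_na(soy), target = "Class")

# Stratified 70/30 holdout: the split is done within each class
set.seed(25)
holdout_strat = makeResampleInstance("Holdout", task, split = 0.7, stratify = TRUE)
tsk_test_strat = subsetTask(task, holdout_strat$test.inds[[1]])
table(getTaskTargets(tsk_test_strat))
```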

In [67]:
task

Supervised task: drop_na(soy)
Type: classif
Target: Class
Observations: 562
Features:
   numerics     factors     ordered functionals 
         99           0           0           0 
Missings: FALSE
Has weights: FALSE
Has blocking: FALSE
Has coordinates: FALSE
Classes: 15
   alternarialeaf-spot            anthracnose       bacterial-blight 
                    91                     44                     20 
     bacterial-pustule             brown-spot         brown-stem-rot 
                    20                     92                     44 
          charcoal-rot  diaporthe-stem-canker           downy-mildew 
                    20                     20                     20 
    frog-eye-leaf-spot phyllosticta-leaf-spot       phytophthora-rot 
                    91                     20                     20 
        powdery-mildew      purple-seed-stain   rhizoctonia-root-rot 
                    20                     20                     20 
Positive class: NA

In [68]:
tsk_test

Supervised task: drop_na(soy)
Type: classif
Target: Class
Observations: 169
Features:
   numerics     factors     ordered functionals 
         99           0           0           0 
Missings: FALSE
Has weights: FALSE
Has blocking: FALSE
Has coordinates: FALSE
Classes: 15
   alternarialeaf-spot            anthracnose       bacterial-blight 
                    27                     13                      4 
     bacterial-pustule             brown-spot         brown-stem-rot 
                     6                     30                     16 
          charcoal-rot  diaporthe-stem-canker           downy-mildew 
                     6                      5                      6 
    frog-eye-leaf-spot phyllosticta-leaf-spot       phytophthora-rot 
                    23                      7                      6 
        powdery-mildew      purple-seed-stain   rhizoctonia-root-rot 
                     5                      7                      8 
Positive class: NA

In [64]:
holdout$test.inds[[1]]

In [63]:
length(holdout$train.inds[[1]])

In [69]:
library(e1071)
library(xgboost)


Note: `e1071` masks `mlr::impute()` and `xgboost` masks `dplyr::slice()`; use the `pkg::fun()` form if you need the masked versions.


In [71]:
xgb_learner = makeLearner("classif.xgboost")
#Warning: https://stackoverflow.com/questions/55545145/what-does-the-warning-na-used-as-a-default-value-for-learner-parameter-missing
svm_learner = makeLearner("classif.svm")

“NA used as a default value for learner parameter missing.
ParamHelpers uses NA as a special value for dependent parameters.”

In [72]:
parameters_svm

         Type len Def   Constr Req Tunable Trafo
cost  numeric   -   - 0.1 to 1   -    TRUE     -
gamma numeric   -   - 0.1 to 1   -    TRUE     -

In [75]:
parameters_sxgb = makeParamSet(
                makeNumericParam("eta",0,1),
                makeNumericParam("lambda",0,200),
                makeIntegerParam("max_depth",1,20))
parameters_sxgb

             Type len Def   Constr Req Tunable Trafo
eta       numeric   -   -   0 to 1   -    TRUE     -
lambda    numeric   -   - 0 to 200   -    TRUE     -
max_depth integer   -   -  1 to 20   -    TRUE     -
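The `tuned_xgb` printout further down shows `nrounds=1`: mlr's `classif.xgboost` learner uses a single boosting round unless told otherwise, so the search above tunes a one-round model. A hedged variant adds `nrounds` to the search space; the range 10 to 200 is an arbitrary assumption, and larger values make each evaluation proportionally slower:

```r
library(mlr)

parameters_xgb_rounds = makeParamSet(
  makeNumericParam("eta", lower = 0, upper = 1),
  makeNumericParam("lambda", lower = 0, upper = 200),
  makeIntegerParam("max_depth", lower = 1, upper = 20),
  makeIntegerParam("nrounds", lower = 10, upper = 200)   # number of boosting rounds
)
getParamIds(parameters_xgb_rounds)
```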

In [74]:
getLearnerParamSet(xgb_learner)

                                Type  len             Def               Constr
booster                     discrete    -          gbtree gbtree,gblinear,dart
watchlist                    untyped    -          <NULL>                    -
eta                          numeric    -             0.3               0 to 1
gamma                        numeric    -               0             0 to Inf
max_depth                    integer    -               6             1 to Inf
min_child_weight             numeric    -               1             0 to Inf
subsample                    numeric    -               1               0 to 1
colsample_bytree             numeric    -               1               0 to 1
colsample_bylevel            numeric    -               1               0 to 1
num_parallel_tree            integer    -               1             1 to Inf
lambda                       numeric    -               1             0 to Inf
lambda_bias                  numeric    -           

In [76]:
tune_control = makeTuneControlRandom(maxit=100)

In [80]:
cv5 = makeResampleDesc(method = 'CV', iters = 5)
cv5

Resample description: cross-validation with 5 iterations.
Predict: test
Stratification: FALSE

In [109]:
set.seed(10)
tr_xgb = tuneParams(xgb_learner, tsk_train, cv5,acc,parameters_sxgb,tune_control)

[Tune] Started tuning learner classif.xgboost for parameter set:
             Type len Def   Constr Req Tunable Trafo
eta       numeric   -   -   0 to 1   -    TRUE     -
lambda    numeric   -   - 0 to 200   -    TRUE     -
max_depth integer   -   -  1 to 20   -    TRUE     -
With control class: TuneControlRandom
Imputation value: -0
[Tune-x] 1: eta=0.741; lambda=168; max_depth=4
[Tune-y] 1: acc.test.mean=0.6967868; time: 0.0 min
[Tune-x] 2: eta=0.157; lambda=39.6; max_depth=5
[Tune-y] 2: acc.test.mean=0.7248296; time: 0.0 min
[Tune-x] 3: eta=0.451; lambda=192; max_depth=8
[Tune-y] 3: acc.test.mean=0.6967868; time: 0.0 min
[Tune-x] 4: eta=0.117; lambda=36.2; max_depth=7
[Tune-y] 4: acc.test.mean=0.7273612; time: 0.0 min
[Tune-x] 5: eta=0.329; lambda=11.6; max_depth=1
[Tune-y] 5: acc.test.mean=0.5342746; time: 0.0 min
[Tune-x] 6: eta=0.39; lambda=189; max_depth=6
[Tune-y] 6: acc.test.mean=0.6967868; time: 0.0 min
[Tune-x] 7: eta=0.284; lambda=149; max_depth=1
[Tune-y] 7: acc.test.mean=0

[Tune-y] 79: acc.test.mean=0.7121714; time: 0.0 min
[Tune-x] 80: eta=0.959; lambda=12.5; max_depth=1
[Tune-y] 80: acc.test.mean=0.5111977; time: 0.0 min
[Tune-x] 81: eta=0.176; lambda=10.7; max_depth=15
[Tune-y] 81: acc.test.mean=0.8017202; time: 0.0 min
[Tune-x] 82: eta=0.743; lambda=195; max_depth=17
[Tune-y] 82: acc.test.mean=0.6967868; time: 0.0 min
[Tune-x] 83: eta=0.258; lambda=155; max_depth=6
[Tune-y] 83: acc.test.mean=0.6967868; time: 0.0 min
[Tune-x] 84: eta=0.41; lambda=47.1; max_depth=5
[Tune-y] 84: acc.test.mean=0.7121714; time: 0.0 min
[Tune-x] 85: eta=0.69; lambda=139; max_depth=3
[Tune-y] 85: acc.test.mean=0.6637131; time: 0.0 min
[Tune-x] 86: eta=0.731; lambda=66.5; max_depth=1
[Tune-y] 86: acc.test.mean=0.4092502; time: 0.0 min
[Tune-x] 87: eta=0.351; lambda=73.7; max_depth=18
[Tune-y] 87: acc.test.mean=0.7121714; time: 0.0 min
[Tune-x] 88: eta=0.694; lambda=18.1; max_depth=1
[Tune-y] 88: acc.test.mean=0.4934761; time: 0.0 min
[Tune-x] 89: eta=0.596; lambda=83.6; max_

In [89]:
tr_xgb$y

In [92]:
t0 = Sys.time()
tr_svm = tuneParams(svm_learner,tsk_train,cv5,list(acc),parameters_svm,tune_control)
t1 = Sys.time()

[Tune] Started tuning learner classif.svm for parameter set:
         Type len Def   Constr Req Tunable Trafo
cost  numeric   -   - 0.1 to 1   -    TRUE     -
gamma numeric   -   - 0.1 to 1   -    TRUE     -
With control class: TuneControlRandom
Imputation value: -0
[Tune-x] 1: cost=0.292; gamma=0.779
“Variable(s) ‘ext.decay.2’ and ‘fruit.pods.2’ and ‘roots.2’ constant. Cannot scale data.”[Tune-y] 1: acc.test.mean=0.1578384; time: 0.0 min
[Tune-x] 2: cost=0.51; gamma=0.184
“Variable(s) ‘ext.decay.2’ and ‘fruit.pods.2’ and ‘roots.2’ constant. Cannot scale data.”[Tune-y] 2: acc.test.mean=0.6948393; time: 0.0 min
[Tune-x] 3: cost=0.767; gamma=0.209
“Variable(s) ‘ext.decay.2’ and ‘fruit.pods.2’ and ‘roots.2’ constant. Cannot scale data.”[Tune-y] 3: acc.test.mean=0.7735475; time: 0.0 min
[Tune-x] 4: cost=0.402; gamma=0.65
“Variable(s) ‘ext.decay.2’ and ‘fruit.pods.2’ and ‘roots.2’ constant. Cannot scale data.”[Tune-y] 4: acc.test.mean=0.2341123; time: 0.0 min
[Tune-x] 5: cost=0.122; gamma=0

“Variable(s) ‘ext.decay.2’ and ‘fruit.pods.2’ and ‘roots.2’ constant. Cannot scale data.”[Tune-y] 9: acc.test.mean=0.8727361; time: 0.0 min
[Tune-x] 10: cost=0.745; gamma=0.362
“Variable(s) ‘ext.decay.2’ and ‘fruit.pods.2’ and ‘roots.2’ constant. Cannot scale data.”[Tune-y] 10: acc.test.mean=0.4937683; time: 0.0 min
[Tune-x] 11: cost=0.441; gamma=0.31
“Variable(s) ‘ext.decay.2’ and ‘fruit.pods.2’ and ‘roots.2’ constant. Cannot scale data.”[Tune-y] 11: acc.test.mean=0.4201233; time: 0.0 min
[Tune-x] 12: cost=0.171; gamma=0.119
“Variable(s) ‘ext.decay.2’ and ‘fruit.pods.2’ and ‘roots.2’ constant. Cannot scale data.”[Tune-y] 12: acc.test.mean=0.4608569; time: 0.0 min
[Tune-x] 13: cost=0.134; gamma=0.721
“Variable(s) ‘ext.decay.2’ and ‘fruit.pods.2’ and ‘roots.2’ constant. Cannot scale data.”[Tune-y] 13: acc.test.mean=0.1527751; time: 0.0 min
[Tune-x] 14: cost=0.692; gamma=0.107
“Variable(s) ‘ext.decay.2’ and ‘fruit.pods.2’ and ‘roots.2’ constant. Cannot scale data.”[Tune-y] 14: acc.test.m

“Variable(s) ‘ext.decay.2’ and ‘fruit.pods.2’ and ‘roots.2’ constant. Cannot scale data.”[Tune-y] 18: acc.test.mean=0.2647517; time: 0.0 min
[Tune-x] 19: cost=0.142; gamma=0.879
“Variable(s) ‘ext.decay.2’ and ‘fruit.pods.2’ and ‘roots.2’ constant. Cannot scale data.”[Tune-y] 19: acc.test.mean=0.1527751; time: 0.0 min
[Tune-x] 20: cost=0.359; gamma=0.126
“Variable(s) ‘ext.decay.2’ and ‘fruit.pods.2’ and ‘roots.2’ constant. Cannot scale data.”[Tune-y] 20: acc.test.mean=0.7354755; time: 0.0 min
[Tune-x] 21: cost=0.119; gamma=0.11
“Variable(s) ‘ext.decay.2’ and ‘fruit.pods.2’ and ‘roots.2’ constant. Cannot scale data.”[Tune-y] 21: acc.test.mean=0.4250893; time: 0.0 min
[Tune-x] 22: cost=0.644; gamma=0.842
“Variable(s) ‘ext.decay.2’ and ‘fruit.pods.2’ and ‘roots.2’ constant. Cannot scale data.”[Tune-y] 22: acc.test.mean=0.2596884; time: 0.0 min
[Tune-x] 23: cost=0.889; gamma=0.242
“Variable(s) ‘ext.decay.2’ and ‘fruit.pods.2’ and ‘roots.2’ constant. Cannot scale data.”[Tune-y] 23: acc.test.

“Variable(s) ‘ext.decay.2’ and ‘fruit.pods.2’ and ‘roots.2’ constant. Cannot scale data.”[Tune-y] 26: acc.test.mean=0.2851996; time: 0.0 min
[Tune-x] 27: cost=0.983; gamma=0.939
“Variable(s) ‘ext.decay.2’ and ‘fruit.pods.2’ and ‘roots.2’ constant. Cannot scale data.”[Tune-y] 27: acc.test.mean=0.3079844; time: 0.0 min
[Tune-x] 28: cost=0.883; gamma=0.327
“Variable(s) ‘ext.decay.2’ and ‘fruit.pods.2’ and ‘roots.2’ constant. Cannot scale data.”[Tune-y] 28: acc.test.mean=0.5573515; time: 0.0 min
[Tune-x] 29: cost=0.413; gamma=0.761
“Variable(s) ‘ext.decay.2’ and ‘fruit.pods.2’ and ‘roots.2’ constant. Cannot scale data.”[Tune-y] 29: acc.test.mean=0.2213892; time: 0.0 min
[Tune-x] 30: cost=0.158; gamma=0.479
“Variable(s) ‘ext.decay.2’ and ‘fruit.pods.2’ and ‘roots.2’ constant. Cannot scale data.”[Tune-y] 30: acc.test.mean=0.1527751; time: 0.0 min
[Tune-x] 31: cost=0.224; gamma=0.637
“Variable(s) ‘ext.decay.2’ and ‘fruit.pods.2’ and ‘roots.2’ constant. Cannot scale data.”[Tune-y] 31: acc.test

“Variable(s) ‘ext.decay.2’ and ‘fruit.pods.2’ and ‘roots.2’ constant. Cannot scale data.”[Tune-y] 35: acc.test.mean=0.2570269; time: 0.0 min
[Tune-x] 36: cost=0.933; gamma=0.24
“Variable(s) ‘ext.decay.2’ and ‘fruit.pods.2’ and ‘roots.2’ constant. Cannot scale data.”[Tune-y] 36: acc.test.mean=0.7557611; time: 0.0 min
[Tune-x] 37: cost=0.165; gamma=0.5
“Variable(s) ‘ext.decay.2’ and ‘fruit.pods.2’ and ‘roots.2’ constant. Cannot scale data.”[Tune-y] 37: acc.test.mean=0.1527751; time: 0.0 min
[Tune-x] 38: cost=0.832; gamma=0.874
“Variable(s) ‘ext.decay.2’ and ‘fruit.pods.2’ and ‘roots.2’ constant. Cannot scale data.”[Tune-y] 38: acc.test.mean=0.2979228; time: 0.0 min
[Tune-x] 39: cost=0.931; gamma=0.194
“Variable(s) ‘ext.decay.2’ and ‘fruit.pods.2’ and ‘roots.2’ constant. Cannot scale data.”[Tune-y] 39: acc.test.mean=0.8446608; time: 0.0 min
[Tune-x] 40: cost=0.352; gamma=0.949
“Variable(s) ‘ext.decay.2’ and ‘fruit.pods.2’ and ‘roots.2’ constant. Cannot scale data.”[Tune-y] 40: acc.test.me

“Variable(s) ‘ext.decay.2’ and ‘fruit.pods.2’ and ‘roots.2’ constant. Cannot scale data.”[Tune-y] 44: acc.test.mean=0.1705615; time: 0.0 min
[Tune-x] 45: cost=0.533; gamma=0.772
“Variable(s) ‘ext.decay.2’ and ‘fruit.pods.2’ and ‘roots.2’ constant. Cannot scale data.”[Tune-y] 45: acc.test.mean=0.2519961; time: 0.0 min
[Tune-x] 46: cost=0.658; gamma=0.891
“Variable(s) ‘ext.decay.2’ and ‘fruit.pods.2’ and ‘roots.2’ constant. Cannot scale data.”[Tune-y] 46: acc.test.mean=0.2519961; time: 0.0 min
[Tune-x] 47: cost=0.361; gamma=0.637
“Variable(s) ‘ext.decay.2’ and ‘fruit.pods.2’ and ‘roots.2’ constant. Cannot scale data.”[Tune-y] 47: acc.test.mean=0.2467381; time: 0.0 min
[Tune-x] 48: cost=0.136; gamma=0.153
“Variable(s) ‘ext.decay.2’ and ‘fruit.pods.2’ and ‘roots.2’ constant. Cannot scale data.”[Tune-y] 48: acc.test.mean=0.3996754; time: 0.0 min
[Tune-x] 49: cost=0.805; gamma=0.452
“Variable(s) ‘ext.decay.2’ and ‘fruit.pods.2’ and ‘roots.2’ constant. Cannot scale data.”[Tune-y] 49: acc.test

“Variable(s) ‘ext.decay.2’ and ‘fruit.pods.2’ and ‘roots.2’ constant. Cannot scale data.”[Tune-y] 52: acc.test.mean=0.2240506; time: 0.0 min
[Tune-x] 53: cost=0.15; gamma=0.857
“Variable(s) ‘ext.decay.2’ and ‘fruit.pods.2’ and ‘roots.2’ constant. Cannot scale data.”[Tune-y] 53: acc.test.mean=0.1527751; time: 0.0 min
[Tune-x] 54: cost=0.944; gamma=0.959
“Variable(s) ‘ext.decay.2’ and ‘fruit.pods.2’ and ‘roots.2’ constant. Cannot scale data.”[Tune-y] 54: acc.test.mean=0.2927945; time: 0.0 min
[Tune-x] 55: cost=0.53; gamma=0.82
“Variable(s) ‘ext.decay.2’ and ‘fruit.pods.2’ and ‘roots.2’ constant. Cannot scale data.”[Tune-y] 55: acc.test.mean=0.2341772; time: 0.0 min
[Tune-x] 56: cost=0.771; gamma=0.765
“Variable(s) ‘ext.decay.2’ and ‘fruit.pods.2’ and ‘roots.2’ constant. Cannot scale data.”[Tune-y] 56: acc.test.mean=0.3131451; time: 0.0 min
[Tune-x] 57: cost=0.997; gamma=0.775
“Variable(s) ‘ext.decay.2’ and ‘fruit.pods.2’ and ‘roots.2’ constant. Cannot scale data.”[Tune-y] 57: acc.test.me

“Variable(s) ‘ext.decay.2’ and ‘fruit.pods.2’ and ‘roots.2’ constant. Cannot scale data.”[Tune-y] 61: acc.test.mean=0.1553067; time: 0.0 min
[Tune-x] 62: cost=0.767; gamma=0.662
“Variable(s) ‘ext.decay.2’ and ‘fruit.pods.2’ and ‘roots.2’ constant. Cannot scale data.”[Tune-y] 62: acc.test.mean=0.3410906; time: 0.0 min
[Tune-x] 63: cost=0.553; gamma=0.897
“Variable(s) ‘ext.decay.2’ and ‘fruit.pods.2’ and ‘roots.2’ constant. Cannot scale data.”[Tune-y] 63: acc.test.mean=0.2240506; time: 0.0 min
[Tune-x] 64: cost=0.517; gamma=0.372
“Variable(s) ‘ext.decay.2’ and ‘fruit.pods.2’ and ‘roots.2’ constant. Cannot scale data.”[Tune-y] 64: acc.test.mean=0.4048036; time: 0.0 min
[Tune-x] 65: cost=0.85; gamma=0.73
“Variable(s) ‘ext.decay.2’ and ‘fruit.pods.2’ and ‘roots.2’ constant. Cannot scale data.”[Tune-y] 65: acc.test.mean=0.3334956; time: 0.0 min
[Tune-x] 66: cost=0.244; gamma=0.115
“Variable(s) ‘ext.decay.2’ and ‘fruit.pods.2’ and ‘roots.2’ constant. Cannot scale data.”[Tune-y] 66: acc.test.m

“Variable(s) ‘ext.decay.2’ and ‘fruit.pods.2’ and ‘roots.2’ constant. Cannot scale data.”[Tune-y] 70: acc.test.mean=0.5420967; time: 0.0 min
[Tune-x] 71: cost=0.419; gamma=0.596
“Variable(s) ‘ext.decay.2’ and ‘fruit.pods.2’ and ‘roots.2’ constant. Cannot scale data.”[Tune-y] 71: acc.test.mean=0.2672509; time: 0.0 min
[Tune-x] 72: cost=0.19; gamma=0.49
“Variable(s) ‘ext.decay.2’ and ‘fruit.pods.2’ and ‘roots.2’ constant. Cannot scale data.”[Tune-y] 72: acc.test.mean=0.1527751; time: 0.0 min
[Tune-x] 73: cost=0.604; gamma=0.596
“Variable(s) ‘ext.decay.2’ and ‘fruit.pods.2’ and ‘roots.2’ constant. Cannot scale data.”[Tune-y] 73: acc.test.mean=0.3308666; time: 0.0 min
[Tune-x] 74: cost=0.315; gamma=0.82
“Variable(s) ‘ext.decay.2’ and ‘fruit.pods.2’ and ‘roots.2’ constant. Cannot scale data.”[Tune-y] 74: acc.test.mean=0.1680299; time: 0.0 min
[Tune-x] 75: cost=0.377; gamma=0.431
“Variable(s) ‘ext.decay.2’ and ‘fruit.pods.2’ and ‘roots.2’ constant. Cannot scale data.”[Tune-y] 75: acc.test.me

“Variable(s) ‘ext.decay.2’ and ‘fruit.pods.2’ and ‘roots.2’ constant. Cannot scale data.”[Tune-y] 79: acc.test.mean=0.4658552; time: 0.0 min
[Tune-x] 80: cost=0.698; gamma=0.69
“Variable(s) ‘ext.decay.2’ and ‘fruit.pods.2’ and ‘roots.2’ constant. Cannot scale data.”[Tune-y] 80: acc.test.mean=0.3105485; time: 0.0 min
[Tune-x] 81: cost=0.656; gamma=0.89
“Variable(s) ‘ext.decay.2’ and ‘fruit.pods.2’ and ‘roots.2’ constant. Cannot scale data.”[Tune-y] 81: acc.test.mean=0.2519961; time: 0.0 min
[Tune-x] 82: cost=0.965; gamma=0.462
“Variable(s) ‘ext.decay.2’ and ‘fruit.pods.2’ and ‘roots.2’ constant. Cannot scale data.”[Tune-y] 82: acc.test.mean=0.4530672; time: 0.0 min
[Tune-x] 83: cost=0.246; gamma=0.529
“Variable(s) ‘ext.decay.2’ and ‘fruit.pods.2’ and ‘roots.2’ constant. Cannot scale data.”[Tune-y] 83: acc.test.mean=0.1857838; time: 0.0 min
[Tune-x] 84: cost=0.716; gamma=0.52
“Variable(s) ‘ext.decay.2’ and ‘fruit.pods.2’ and ‘roots.2’ constant. Cannot scale data.”[Tune-y] 84: acc.test.me

“Variable(s) ‘ext.decay.2’ and ‘fruit.pods.2’ and ‘roots.2’ constant. Cannot scale data.”[Tune-y] 87: acc.test.mean=0.3818890; time: 0.0 min
[Tune-x] 88: cost=0.624; gamma=0.817
“Variable(s) ‘ext.decay.2’ and ‘fruit.pods.2’ and ‘roots.2’ constant. Cannot scale data.”[Tune-y] 88: acc.test.mean=0.2622201; time: 0.0 min
[Tune-x] 89: cost=0.182; gamma=0.428
“Variable(s) ‘ext.decay.2’ and ‘fruit.pods.2’ and ‘roots.2’ constant. Cannot scale data.”[Tune-y] 89: acc.test.mean=0.1680299; time: 0.0 min
[Tune-x] 90: cost=0.862; gamma=0.759
“Variable(s) ‘ext.decay.2’ and ‘fruit.pods.2’ and ‘roots.2’ constant. Cannot scale data.”[Tune-y] 90: acc.test.mean=0.3283674; time: 0.0 min
[Tune-x] 91: cost=0.453; gamma=0.672
“Variable(s) ‘ext.decay.2’ and ‘fruit.pods.2’ and ‘roots.2’ constant. Cannot scale data.”[Tune-y] 91: acc.test.mean=0.2544953; time: 0.0 min
[Tune-x] 92: cost=0.83; gamma=0.783
“Variable(s) ‘ext.decay.2’ and ‘fruit.pods.2’ and ‘roots.2’ constant. Cannot scale data.”[Tune-y] 92: acc.test.

“Variable(s) ‘ext.decay.2’ and ‘fruit.pods.2’ and ‘roots.2’ constant. Cannot scale data.”[Tune-y] 96: acc.test.mean=0.3997079; time: 0.0 min
[Tune-x] 97: cost=0.853; gamma=0.129
“Variable(s) ‘ext.decay.2’ and ‘fruit.pods.2’ and ‘roots.2’ constant. Cannot scale data.”[Tune-y] 97: acc.test.mean=0.8828952; time: 0.0 min
[Tune-x] 98: cost=0.142; gamma=0.117
“Variable(s) ‘ext.decay.2’ and ‘fruit.pods.2’ and ‘roots.2’ constant. Cannot scale data.”[Tune-y] 98: acc.test.mean=0.4582928; time: 0.0 min
[Tune-x] 99: cost=0.266; gamma=0.915
“Variable(s) ‘ext.decay.2’ and ‘fruit.pods.2’ and ‘roots.2’ constant. Cannot scale data.”[Tune-y] 99: acc.test.mean=0.1553067; time: 0.0 min
[Tune-x] 100: cost=0.575; gamma=0.733
“Variable(s) ‘ext.decay.2’ and ‘fruit.pods.2’ and ‘roots.2’ constant. Cannot scale data.”[Tune-y] 100: acc.test.mean=0.2698799; time: 0.0 min
[Tune] Result: cost=0.853; gamma=0.129 : acc.test.mean=0.8828952
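`parallelMap` is loaded at the top of the notebook but never used. The SVM run above was timed with `Sys.time()` (the elapsed time is `t1 - t0`); since the configurations of a random search are independent, mlr can evaluate them in parallel. A minimal sketch on `iris` with a small budget, assuming 2 available CPU cores (socket mode works on all platforms):

```r
library(mlr)
library(parallelMap)

task = makeClassifTask(data = iris, target = "Species")
ps = makeParamSet(
  makeNumericParam("cost", lower = 0.1, upper = 1),
  makeNumericParam("gamma", lower = 0.1, upper = 1)
)

# Evaluate tuning configurations on 2 local worker processes
parallelStartSocket(cpus = 2, level = "mlr.tuneParams", show.info = FALSE)
tr_par = tuneParams(makeLearner("classif.svm"), task,
                    makeResampleDesc("CV", iters = 5), acc, ps,
                    makeTuneControlRandom(maxit = 10), show.info = FALSE)
parallelStop()
tr_par$x
```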


In [110]:
tr_xgb$x

In [94]:
tr_svm$x

## Train on the full training set

In [111]:
tuned_xgb = setHyperPars(xgb_learner,par.vals = tr_xgb$x)
tuned_svm = setHyperPars(svm_learner,par.vals = tr_svm$x)

In [112]:
tuned_xgb
tuned_svm

Learner classif.xgboost from package xgboost
Type: classif
Name: eXtreme Gradient Boosting; Short name: xgboost
Class: classif.xgboost
Properties: twoclass,multiclass,numerics,prob,weights,missings,featimp
Predict-Type: response
Hyperparameters: nrounds=1,verbose=0,eta=0.0759,lambda=5.69,max_depth=14


Learner classif.svm from package e1071
Type: classif
Name: Support Vector Machines (libsvm); Short name: svm
Class: classif.svm
Properties: twoclass,multiclass,numerics,factors,prob,class.weights
Predict-Type: response
Hyperparameters: cost=0.853,gamma=0.129


In [115]:
xgb_model = train(tuned_xgb, tsk_train)
svm_model = train(tuned_svm, tsk_train)

“Variable(s) ‘ext.decay.2’ and ‘fruit.pods.2’ and ‘roots.2’ constant. Cannot scale data.”
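The repeated warning comes from `e1071::svm()`, which scales features by default and cannot scale a column with zero variance: in the training rows, the dummy columns `ext.decay.2`, `fruit.pods.2` and `roots.2` are constant. One option is mlr's `makeRemoveConstantFeaturesWrapper()`, which drops constant columns at training time and applies the same selection at prediction time. Sketched below on `iris` with an artificial constant column, not on the Soybean data; the wrapped learner keeps the SVM hyperparameters, so it could be tuned with `parameters_svm` as before:

```r
library(mlr)

df = iris
df$const = 0                      # artificial constant column
task = makeClassifTask(data = df, target = "Species")

svm_rc = makeRemoveConstantFeaturesWrapper(makeLearner("classif.svm"))
mod = train(svm_rc, task)         # 'const' is dropped before svm() sees the data
pred = predict(mod, task)
performance(pred, measures = acc)
```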

## Evaluate on the test set from step 1

In [117]:
xgb_pred = predict(xgb_model, tsk_test)

svm_pred = predict(svm_model, tsk_test)

## Accuracy of the two models

In [118]:
mean(xgb_pred$data$response == xgb_pred$data$truth)

In [108]:
mean(svm_pred$data$response == svm_pred$data$truth)
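The manual `mean(response == truth)` is mlr's `acc` measure; `performance()` computes it (or any other measure, e.g. `mmce`, which equals 1 - accuracy) directly from a prediction object. A self-contained check on `iris`, not on the Soybean models:

```r
library(mlr)

task = makeClassifTask(data = iris, target = "Species")
set.seed(1)
ho = makeResampleInstance("Holdout", task, split = 0.7)

mod  = train(makeLearner("classif.svm"), task, subset = ho$train.inds[[1]])
pred = predict(mod, task = task, subset = ho$test.inds[[1]])

manual  = mean(pred$data$response == pred$data$truth)
builtin = performance(pred, measures = acc)   # named vector: c(acc = ...)
```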

## Confusion matrices of the two models

In [119]:
calculateConfusionMatrix(xgb_pred)

                        predicted
true                     alternarialeaf-spot anthracnose bacterial-blight
  alternarialeaf-spot                     27           0                0
  anthracnose                              0          13                0
  bacterial-blight                         0           0                4
  bacterial-pustule                        0           0                4
  brown-spot                               1           0                0
  brown-stem-rot                           0           0                0
  charcoal-rot                             0           0                0
  diaporthe-stem-canker                    0           0                0
  downy-mildew                             4           0                0
  frog-eye-leaf-spot                       5           0                0
  phyllosticta-leaf-spot                   0           0                0
  phytophthora-rot                         0           0                0
  po

In [122]:
cm = calculateConfusionMatrix(svm_pred,relative = TRUE)

In [125]:
cm$result

true/predicted,alternarialeaf-spot,anthracnose,bacterial-blight,bacterial-pustule,brown-spot,brown-stem-rot,charcoal-rot,diaporthe-stem-canker,downy-mildew,frog-eye-leaf-spot,phyllosticta-leaf-spot,phytophthora-rot,powdery-mildew,purple-seed-stain,rhizoctonia-root-rot,-err.-
alternarialeaf-spot,27,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
anthracnose,0,13,0,0,0,0,0,0,0,0,0,0,0,0,0,0
bacterial-blight,0,0,4,0,0,0,0,0,0,0,0,0,0,0,0,0
bacterial-pustule,0,0,0,6,0,0,0,0,0,0,0,0,0,0,0,0
brown-spot,0,0,0,0,28,0,0,0,0,2,0,0,0,0,0,2
brown-stem-rot,0,0,0,0,0,16,0,0,0,0,0,0,0,0,0,0
charcoal-rot,0,0,0,0,0,0,6,0,0,0,0,0,0,0,0,0
diaporthe-stem-canker,0,0,0,0,0,0,0,5,0,0,0,0,0,0,0,0
downy-mildew,0,0,0,0,0,0,0,0,6,0,0,0,0,0,0,0
frog-eye-leaf-spot,4,0,0,0,0,0,0,0,0,19,0,0,0,0,0,4
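With `relative = TRUE`, `cm$result` still holds absolute counts (as printed above); the row-normalized version, i.e. the share of each true class assigned to each predicted class, is stored in `cm$relative.row`. The same normalization in base R, on hypothetical labels:

```r
# Hypothetical labels, only to show the normalization
truth    = factor(c("a", "a", "a", "b", "b"), levels = c("a", "b"))
response = factor(c("a", "a", "b", "b", "b"), levels = c("a", "b"))

cm_abs = table(true = truth, predicted = response)
cm_row = prop.table(cm_abs, margin = 1)   # each row sums to 1
cm_row
```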
