diff --git a/devel/html/benchmark_experiments/index.html b/devel/html/benchmark_experiments/index.html
index 93bac277..c05239f6 100644
--- a/devel/html/benchmark_experiments/index.html
+++ b/devel/html/benchmark_experiments/index.html
@@ -439,7 +439,7 @@

Example: #> Stratification: FALSE #> predict.type: response #> threshold: -#> time (mean): 0.01 +#> time (mean): 0.02 #> id truth response iter set #> 180 180 M M 1 test #> 100 100 M R 1 test
diff --git a/devel/html/configureMlr/index.html b/devel/html/configureMlr/index.html
index 9dfc2739..28a87f8d 100644
--- a/devel/html/configureMlr/index.html
+++ b/devel/html/configureMlr/index.html
@@ -488,7 +488,7 @@

Example: Handl
## This call gives an error caused by the low number of observations in class "virginica"
 train("classif.qda", task = iris.task, subset = 1:104)
 #> Error in qda.default(x, grouping, ...): some group is too small for 'qda'
-#> Timing stopped at: 0.004 0 0.005
+#> Timing stopped at: 0.005 0 0.005
 
 ## Turn learner errors into warnings
 configureMlr(on.learner.error = "warn")
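As a hedged sketch (not part of this patch) of the behaviour documented in the hunk above: once learner errors are downgraded to warnings, the same call returns a FailureModel instead of stopping. The subset mirrors the example; isFailureModel and getFailureModelMsg are the mlr accessors assumed here.

## Sketch only: warn instead of stop, then inspect the failed model
configureMlr(on.learner.error = "warn")
mod = train("classif.qda", task = iris.task, subset = 1:104)
isFailureModel(mod)       # TRUE for a failed fit
getFailureModelMsg(mod)   # the qda error message as a string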
diff --git a/devel/html/cost_sensitive_classif/index.html b/devel/html/cost_sensitive_classif/index.html
index 6365c7e4..98c0a61b 100644
--- a/devel/html/cost_sensitive_classif/index.html
+++ b/devel/html/cost_sensitive_classif/index.html
@@ -472,6 +472,7 @@ 

Binary classification problems

credit.task = makeClassifTask(data = GermanCredit, target = "Class") credit.task = removeConstantFeatures(credit.task) #> Removing 2 columns: Purpose.Vacation,Personal.Female.Single + credit.task #> Supervised task: GermanCredit #> Type: classif @@ -560,9 +561,10 @@
i. Theoretical thresholding
which requires the ClassifTask object and the cost matrix (argument costs). It is expected that the rows of the cost matrix indicate true and the columns predicted class labels.

-
credit.costs = makeCostMeasure(id = "credit.costs", costs = costs, task = credit.task, best = 0, worst = 5)
+
credit.costs = makeCostMeasure(id = "credit.costs", name = "Credit costs", costs = costs, task = credit.task,
+  best = 0, worst = 5)
 credit.costs
-#> Name: credit.costs
+#> Name: Credit costs
 #> Performance measure: credit.costs
 #> Properties: classif,classif.multi,req.pred,req.truth,predtype.response,predtype.prob
 #> Minimize: TRUE
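The hunk only shows the makeCostMeasure() call and part of its printed output. As a hedged sketch (not part of the patch), the cost matrix it consumes could be built as follows; the concrete cost values are placeholders, only the convention stated above (rows = true classes, columns = predicted classes) is taken from the text.

## Sketch only: task setup and a labelled cost matrix
library(mlr)
data(GermanCredit, package = "caret")
credit.task = makeClassifTask(data = GermanCredit, target = "Class")
credit.task = removeConstantFeatures(credit.task)

## Rows correspond to true classes, columns to predicted classes
costs = matrix(c(0, 1, 5, 0), nrow = 2)
colnames(costs) = rownames(costs) = getTaskClassLevels(credit.task)

credit.costs = makeCostMeasure(id = "credit.costs", name = "Credit costs",
  costs = costs, task = credit.task, best = 0, worst = 5)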
@@ -605,7 +607,7 @@ 
i. Theoretical thresholding
#> mmce.aggr: 0.36 #> mmce.mean: 0.36 #> mmce.sd: 0.02 -#> Runtime: 0.323072 +#> Runtime: 0.385333

If we are also interested in the cross-validated performance for the default threshold values @@ -628,7 +630,7 @@

i. Theoretical thresholding
plotThreshVsPerf(d, mark.th = th)
-

plot of chunk unnamed-chunk-9

+

plot of chunk unnamed-chunk-9
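A hedged sketch (not part of the patch) of how the plotted data above is typically generated; r, credit.costs and th are assumed to be the resample result, cost measure and theoretical threshold from the preceding example.

## Sketch only: performance over all thresholds for the cross-validated predictions
d = generateThreshVsPerfData(r$pred, measures = list(credit.costs, mmce))
plotThreshVsPerf(d, mark.th = th)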

ii. Empirical thresholding

The idea of empirical thresholding (see Sheng and Ling, 2006) is to select cost-optimal threshold values for a given learning method based on the training data. @@ -653,7 +655,7 @@

ii. Empirical thresholding
#> mmce.aggr: 0.25 #> mmce.mean: 0.25 #> mmce.sd: 0.03 -#> Runtime: 0.343711 +#> Runtime: 0.370338 ## Tune the threshold based on the predicted probabilities on the 3 test data sets tune.res = tuneThreshold(pred = r$pred, measure = credit.costs) @@ -752,7 +754,7 @@
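The tuneThreshold() call in the hunk above works on cross-validated class probabilities. A hedged sketch (not part of the patch) of the full flow; the learner choice and resampling setup are assumptions.

## Sketch only: probabilities via cross-validation, then a cost-optimal threshold
lrn = makeLearner("classif.multinom", predict.type = "prob", trace = FALSE)
rdesc = makeResampleDesc("CV", iters = 3)
r = resample(lrn, credit.task, rdesc, measures = list(credit.costs, mmce), show.info = FALSE)

tune.res = tuneThreshold(pred = r$pred, measure = credit.costs)
tune.res$th    # tuned threshold
tune.res$perf  # corresponding credit.costs value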
i. Weighting
#> mmce.aggr: 0.35 #> mmce.mean: 0.35 #> mmce.sd: 0.02 -#> Runtime: 0.395105 +#> Runtime: 0.44081
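The runtime-only change above belongs to the weighting example. A hedged sketch (not part of the patch) of how a weighted-classes wrapper can be set up; the weight value is an assumption standing in for the theoretical cost ratio, and rdesc is the resampling description from the previous sketch.

## Sketch only: wrap a learner and up-weight the positive class
lrn = makeLearner("classif.multinom", trace = FALSE)
lrn = makeWeightedClassesWrapper(lrn, wcw.weight = 5)
r = resample(lrn, credit.task, rdesc, measures = list(credit.costs, mmce), show.info = FALSE)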

For classification methods like "classif.ksvm" (the support vector machine @@ -774,7 +776,7 @@

i. Weighting
#> mmce.aggr: 0.31 #> mmce.mean: 0.31 #> mmce.sd: 0.02 -#> Runtime: 0.546966 +#> Runtime: 0.61962

Just like the theoretical threshold, the theoretical weights may not always be suitable, @@ -857,7 +859,7 @@

ii. Over- and undersampling
#> mmce.aggr: 0.35 #> mmce.mean: 0.35 #> mmce.sd: 0.02 -#> Runtime: 0.696635 +#> Runtime: 0.812204
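The hunk above is the cross-validated result of an over-/undersampling wrapper; the sentence that follows introduces tuning the oversampling rate. A hedged sketch (not part of the patch) of such a tuning setup, with bounds, budget and learner as assumptions and rdesc reused from the earlier sketch:

## Sketch only: tune the oversampling rate osw.rate of an oversample wrapper
lrn = makeLearner("classif.multinom", trace = FALSE)
lrn = makeOversampleWrapper(lrn)
ps = makeParamSet(makeNumericParam("osw.rate", lower = 1, upper = 10))
ctrl = makeTuneControlRandom(maxit = 10L)
tune.res = tuneParams(lrn, credit.task, resampling = rdesc, par.set = ps,
  measures = list(credit.costs, mmce), control = ctrl, show.info = FALSE)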

Of course, we can also tune the oversampling rate. For this purpose we again have to create @@ -924,8 +926,8 @@

Multi-class problems

colnames(costs) = rownames(costs) = getTaskClassLevels(wf.task) ## Performance measure -wf.costs = makeCostMeasure(id = "wf.costs", costs = costs, task = wf.task, best = 0, - worst = 10) +wf.costs = makeCostMeasure(id = "wf.costs", name = "Waveform costs", costs = costs, task = wf.task, + best = 0, worst = 10)

In the multi-class case, both, thresholding and rebalancing correspond to cost matrices @@ -958,7 +960,7 @@

1. Thresholding

#> mmce.aggr: 0.30 #> mmce.mean: 0.30 #> mmce.sd: 0.00 -#> Runtime: 0.0716846 +#> Runtime: 0.0816503 ## Calculate thresholds as 1/(average costs of true classes) th = 2/rowSums(costs) @@ -1119,7 +1121,7 @@

Example-dependent misclassification costs #> Prediction: 150 observations #> predict.type: response #> threshold: -#> time: 0.05 +#> time: 0.06 #> id response #> 1 1 setosa #> 2 2 setosa
diff --git a/devel/html/create_learner/index.html b/devel/html/create_learner/index.html
index 33535d3e..dabb933d 100644
--- a/devel/html/create_learner/index.html
+++ b/devel/html/create_learner/index.html
@@ -329,7 +329,7 @@
  • Regression
  • -
  • Survival Analysis
  • +
  • Survival analysis
  • Clustering
  • @@ -516,7 +516,7 @@

    Regression

    Again most of the data is passed straight through to/from the train/predict functions of the learner.

    -

    Survival Analysis

    +

    Survival analysis

    For survival analysis, you have to return so-called linear predictors in order to compute the default measure for this task type, the cindex (for .learner$predict.type == "response"). For .learner$predict.type == "prob", diff --git a/devel/html/create_measure/index.html b/devel/html/create_measure/index.html index 744956ce..e2149ddd 100644 --- a/devel/html/create_measure/index.html +++ b/devel/html/create_measure/index.html @@ -369,7 +369,7 @@

    Performance measures and a #> { #> measureMSE(pred$data$truth, pred$data$response) #> } -#> <bytecode: 0x192b7000> +#> <bytecode: 0x1f0f1820> #> <environment: namespace:mlr> measureMSE @@ -377,7 +377,7 @@

    Performance measures and a #> { #> mean((response - truth)^2) #> } -#> <bytecode: 0xf8e6500> +#> <bytecode: 0x111b2dd8> #> <environment: namespace:mlr> @@ -412,7 +412,7 @@

    Performance measures and a test.mean$fun #> function (task, perf.test, perf.train, measure, group, pred) #> mean(perf.test) -#> <bytecode: 0x1e7ad968> +#> <bytecode: 0x1ab3c3d8> #> <environment: namespace:mlr> diff --git a/devel/html/feature_selection/index.html b/devel/html/feature_selection/index.html index 55b1d6b0..29925b34 100644 --- a/devel/html/feature_selection/index.html +++ b/devel/html/feature_selection/index.html @@ -504,13 +504,13 @@

Tuning the size of the feature subset

    The performance of all percentage values visited during tuning is:

    as.data.frame(res$opt.path)
     #>   fw.perc mse.test.mean dob eol error.message exec.time
    -#> 1     0.2      40.59578   1  NA          <NA>     0.270
    -#> 2    0.25      40.59578   2  NA          <NA>     0.235
    -#> 3     0.3      37.05592   3  NA          <NA>     0.238
    -#> 4    0.35      35.83712   4  NA          <NA>     0.240
    -#> 5     0.4      35.83712   5  NA          <NA>     0.237
    -#> 6    0.45      27.39955   6  NA          <NA>     0.238
    -#> 7     0.5      27.39955   7  NA          <NA>     0.244
    +#> 1     0.2      40.59578   1  NA          <NA>     0.280
    +#> 2    0.25      40.59578   2  NA          <NA>     0.269
    +#> 3     0.3      37.05592   3  NA          <NA>     0.289
    +#> 4    0.35      35.83712   4  NA          <NA>     0.258
    +#> 5     0.4      35.83712   5  NA          <NA>     0.256
    +#> 6    0.45      27.39955   6  NA          <NA>     0.258
    +#> 7     0.5      27.39955   7  NA          <NA>     0.256
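A hedged sketch (not part of the patch) of a setup that produces an opt.path like the one above: a filter wrapper whose fw.perc parameter is tuned over a grid. The filter method and the grid values are assumptions.

## Sketch only: tune the percentage of kept features for a filtered learner
lrn = makeFilterWrapper(learner = "regr.lm", fw.method = "chi.squared")
ps = makeParamSet(makeDiscreteParam("fw.perc", values = seq(0.2, 0.5, 0.05)))
rdesc = makeResampleDesc("CV", iters = 3)
res = tuneParams(lrn, task = bh.task, resampling = rdesc, par.set = ps,
  control = makeTuneControlGrid(), show.info = FALSE)
as.data.frame(res$opt.path)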
     

    The optimal percentage and the corresponding performance can be accessed as follows:

    @@ -553,12 +553,12 @@

    Tuning the size of the feature su #> Points on front: 13 head(as.data.frame(res$opt.path)) #> fw.threshold fpr.test.mean fnr.test.mean dob eol error.message exec.time -#> 1 0.4892321 0.3092818 0.2639033 1 NA <NA> 2.293 -#> 2 0.2481696 0.2045499 0.2319697 2 NA <NA> 2.296 -#> 3 0.7691875 0.5128000 0.3459740 3 NA <NA> 2.279 -#> 4 0.1470133 0.2045499 0.2319697 4 NA <NA> 2.348 -#> 5 0.5958241 0.5028216 0.5239538 5 NA <NA> 2.169 -#> 6 0.6892421 0.6323959 0.4480808 6 NA <NA> 2.091 +#> 1 0.4892321 0.3092818 0.2639033 1 NA <NA> 2.552 +#> 2 0.2481696 0.2045499 0.2319697 2 NA <NA> 2.617 +#> 3 0.7691875 0.5128000 0.3459740 3 NA <NA> 2.428 +#> 4 0.1470133 0.2045499 0.2319697 4 NA <NA> 2.610 +#> 5 0.5958241 0.5028216 0.5239538 5 NA <NA> 2.487 +#> 6 0.6892421 0.6323959 0.4480808 6 NA <NA> 2.469

The results can be visualized with function plotTuneMultiCritResult.
diff --git a/devel/html/index.html b/devel/html/index.html
index 8197e457..5fb55143 100644
--- a/devel/html/index.html
+++ b/devel/html/index.html
@@ -419,5 +419,5 @@

diff --git a/devel/html/integrated_learners/index.html b/devel/html/integrated_learners/index.html
index b2b82caa..864c4567 100644
--- a/devel/html/integrated_learners/index.html
+++ b/devel/html/integrated_learners/index.html
@@ -394,7 +394,7 @@

    Classification (70)

    X multiclass
    prob
    twoclass -sizehas been set to 3 by default. Doing bagging training of nnetif set bag=TRUE +size has been set to 3 by default. Doing bagging training of nnet if set bag=TRUE. classif.bartMachine
    bartmachine @@ -449,7 +449,7 @@

    Classification (70)

    X multiclass
    prob
    twoclass -xvalhas been set to 0 by default for speed. +xval has been set to 0 by default for speed. classif.bst
    bst @@ -460,7 +460,7 @@

    Classification (70)

twoclass -The argument learnerhas been renamed to Learnerdue to a name conflict with setHyerPars Learnerhas been set to lmby default. +The argument learner has been renamed to Learner due to a name conflict with setHyperPars. Learner has been set to lm by default. classif.cforest
    cforest @@ -482,7 +482,7 @@

    Classification (70)

    twoclass -centersset to 2 by default +centers set to 2 by default classif.ctree
    ctree @@ -504,7 +504,7 @@

    Classification (70)

    multiclass
    prob
    twoclass -outputset to softmaxby default +output set to softmax by default classif.dcSVM
    dcSVM @@ -581,7 +581,7 @@

    Classification (70)

    X prob
    twoclass -familyhas been set to Binomial()by default. Maximum number of boosting iterations is set via 'mstop', the actual number used for prediction is controlled by 'm'. +family has been set to Binomial() by default. Maximum number of boosting iterations is set via 'mstop', the actual number used for prediction is controlled by 'm'. classif.glmnet
    glmnet @@ -669,7 +669,7 @@

    Classification (70)

    class.weights
    multiclass
    prob
    twoclass -Kernel parameters have to be passed directly and not by using the kpar list in ksvm. Note that fithas been set to FALSEby default for speed. +Kernel parameters have to be passed directly and not by using the kpar list in ksvm. Note that fit has been set to FALSE by default for speed. classif.lda
    lda @@ -779,7 +779,7 @@

    Classification (70)

    X prob
    twoclass -penaltyhas been set to lasso and lambdato 0.1 by default. +penalty has been set to lasso and lambda to 0.1 by default. classif.lssvm
    lssvm @@ -790,7 +790,7 @@

    Classification (70)

    multiclass
    twoclass -fittedhas been set to FALSEby default for speed. +fitted has been set to FALSE by default for speed. classif.lvq1
    lvq1 @@ -812,7 +812,7 @@

    Classification (70)

    multiclass
    prob
    twoclass -keep.fittedhas been set to FALSEby default for speed and we use start.method='lvq' for more robust behavior / less technical crashes +keep.fitted has been set to FALSE by default for speed and we use start.method='lvq' for more robust behavior / less technical crashes classif.mlp
    mlp @@ -856,7 +856,7 @@

    Classification (70)

    prob
    twoclass -err.fcthas been set to ceto do classification. +err.fct has been set to ce to do classification. classif.nnet
    nnet @@ -867,7 +867,7 @@

    Classification (70)

    X multiclass
    prob
    twoclass -sizehas been set to 3 by default. +size has been set to 3 by default. classif.nnTrain
    nn.train @@ -878,7 +878,7 @@

    Classification (70)

    multiclass
    prob
    twoclass -outputset to softmaxby default +output set to softmax by default classif.nodeHarvest
    nodeHarvest @@ -911,7 +911,7 @@

    Classification (70)

    prob
    twoclass -threshold for prediction (threshold.predict has been set to 1by default +threshold for prediction (threshold.predict) has been set to 1 by default classif.PART
    part @@ -1010,7 +1010,7 @@

    Classification (70)

    multiclass
    prob
    twoclass - +By default, internal parallelization is switched off (num.threads = 1) and verbose output is disabled. Both settings are changeable. classif.rda
    rda @@ -1021,7 +1021,7 @@

    Classification (70)

    multiclass
    prob
    twoclass -estimate.errorhas been set to FALSEby default for speed. +estimate.error has been set to FALSE by default for speed. classif.rFerns
    rFerns @@ -1065,7 +1065,7 @@

    Classification (70)

    X X multiclass
    ordered
    prob
    twoclass -xvalhas been set to 0 by default for speed. +xval has been set to 0 by default for speed. classif.rrlda
    rrlda @@ -1087,7 +1087,7 @@

    Classification (70)

    multiclass
    prob
    twoclass -outputset to softmaxby default +output set to softmax by default classif.sda
    sda @@ -1171,7 +1171,7 @@

    Regression (49)

    X -sizehas been set to 3 by default. +size has been set to 3 by default. regr.bartMachine
    bartmachine @@ -1270,7 +1270,7 @@

    Regression (49)

    -The argument learnerhas been renamed to Learnerdue to a name conflict with setHyerPars +The argument learner has been renamed to Learner due to a name conflict with setHyerPars regr.btgp
    btgp @@ -1413,7 +1413,7 @@

    Regression (49)

    X X -distributionhas been set to gaussian by default. +distribution has been set to gaussian by default. regr.glmboost
    glmboost @@ -1468,7 +1468,7 @@

    Regression (49)

    -Kernel parameters have to be passed directly and not by using the kpar list in ksvm. Note that fithas been set to FALSEby default for speed. +Kernel parameters have to be passed directly and not by using the kpar list in ksvm. Note that fit has been set to FALSE by default for speed. regr.LiblineaRL2L1SVR
    liblinl2l1svr @@ -1534,7 +1534,7 @@

    Regression (49)

    X -sizehas been set to 3 by default. +size has been set to 3 by default. regr.nodeHarvest
    nodeHarvest @@ -1622,7 +1622,7 @@

    Regression (49)

    - +By default, internal parallelization is switched off (num.threads = 1) and verbose output is disabled. Both settings are changeable. regr.rknn
    rknn @@ -1644,7 +1644,7 @@

    Regression (49)

    X X ordered -xvalhas been set to 0 by default for speed. +xval has been set to 0 by default for speed. regr.rsm
    rsm @@ -1666,7 +1666,7 @@

    Regression (49)

    -Kernel parameters have to be passed directly and not by using the kpar list in rvm. Note that fithas been set to FALSEby default for speed. +Kernel parameters have to be passed directly and not by using the kpar list in rvm. Note that fit has been set to FALSE by default for speed. regr.svm
    svm @@ -1761,7 +1761,7 @@

    Survival analysis (9)

    X ordered
    rcens -familyhas been set to CoxPH()by default. Maximum number of boosting iterations is set via 'mstop', the actual number used for prediction is controlled by 'm'. +family has been set to CoxPH() by default. Maximum number of boosting iterations is set via 'mstop', the actual number used for prediction is controlled by 'm'. surv.glmnet
    glmnet @@ -1805,7 +1805,7 @@

    Survival analysis (9)

    prob
    rcens - +By default, internal parallelization is switched off (num.threads = 1) and verbose output is disabled. Both settings are changeable. surv.rpart
    rpart @@ -1816,7 +1816,7 @@

    Survival analysis (9)

    X X ordered
rcens -xvalhas been set to 0 by default for speed. +xval has been set to 0 by default for speed.
diff --git a/devel/html/measures/index.html b/devel/html/measures/index.html
index e91b3417..e88a76f3 100644
--- a/devel/html/measures/index.html
+++ b/devel/html/measures/index.html
@@ -613,7 +613,7 @@

    Classification

    multiclass.auc - Multiclass area under the curve -Calls pROC::multiclass.roc +Calls pROC::multiclass.roc. 1 0 @@ -893,7 +893,7 @@

    Cluster analysis

    db - Davies-Bouldin cluster separation measure -See ?clusterSim::index.DB +See ?clusterSim::index.DB. X 0 Inf @@ -907,7 +907,7 @@

    Cluster analysis

    dunn - Dunn index -See ?clValid::dunn +See ?clValid::dunn. Inf 0 @@ -921,7 +921,7 @@

    Cluster analysis

    G1 - Calinski-Harabasz pseudo F statistic -See ?clusterSim::index.G1 +See ?clusterSim::index.G1. Inf 0 @@ -935,7 +935,7 @@

    Cluster analysis

    G2 - Baker and Hubert adaptation of Goodman-Kruskal's gamma statistic -See ?clusterSim::index.G2 +See ?clusterSim::index.G2. Inf 0 @@ -949,7 +949,7 @@

    Cluster analysis

    silhouette - Rousseeuw's silhouette internal cluster quality index -See ?clusterSim::index.S +See ?clusterSim::index.S. Inf 0 diff --git a/devel/html/mkdocs/search_index.json b/devel/html/mkdocs/search_index.json index fd51e891..0495d25a 100644 --- a/devel/html/mkdocs/search_index.json +++ b/devel/html/mkdocs/search_index.json @@ -122,7 +122,7 @@ }, { "location": "/performance/index.html", - "text": "Evaluating Learner Performance\n\n\nThe quality of the predictions of a model in \nmlr\n can be assessed with respect to a\nnumber of different performance measures.\nIn order to calculate the performance measures, call \nperformance\n on the object\nreturned by \npredict\n and specify the desired performance measures.\n\n\nAvailable performance measures\n\n\nmlr\n provides a large number of performance measures for all types of learning problems.\nTypical performance measures for \nclassification\n are the mean misclassification error (\nmmce\n),\naccuracy (\nacc\n) or measures based on \nROC analysis\n.\nFor \nregression\n the mean of squared errors (\nmse\n) or mean of absolute errors (\nmae\n)\nare usually considered.\nFor \nclustering\n tasks, measures such as the Dunn index (\ndunn\n) are provided,\nwhile for \nsurvival\n predictions, the Concordance Index (\ncindex\n) is\nsupported, and for \ncost-sensitive\n predictions the misclassification penalty\n(\nmcp\n) and others. It is also possible to access the time to train the\nlearner (\ntimetrain\n), the time to compute the prediction (\ntimepredict\n) and their\nsum (\ntimeboth\n) as performance measures.\n\n\nTo see which performance measures are implemented, have a look at the\n\ntable of performance measures\n and the \nmeasures\n documentation page.\n\n\nIf you want to implement an additional measure or include a measure with\nnon-standard misclassification costs, see the section on\n\ncreating custom measures\n.\n\n\nListing measures\n\n\nThe properties and requirements of the individual measures are shown in the \ntable of performance measures\n.\n\n\nIf you would like a list of available measures with certain properties or suitable for a\ncertain learning \nTask\n use the function \nlistMeasures\n.\n\n\n## Performance measures for classification with multiple classes\nlistMeasures(\nclassif\n, properties = \nclassif.multi\n)\n#\n [1] \ntimepredict\n \nacc\n \nber\n \nfeatperc\n \n#\n [5] \nmmce\n \ntimeboth\n \ntimetrain\n \nmulticlass.auc\n\n## Performance measure suitable for the iris classification task\nlistMeasures(iris.task)\n#\n [1] \ntimepredict\n \nacc\n \nber\n \nfeatperc\n \n#\n [5] \nmmce\n \ntimeboth\n \ntimetrain\n \nmulticlass.auc\n\n\n\n\n\nFor convenience there exists a default measure for each type of learning problem, which is\ncalculated if nothing else is specified. 
As defaults we chose the most commonly used measures for the\nrespective types, e.g., the mean squared error (\nmse\n) for regression and the\nmisclassification rate (\nmmce\n) for classification.\nThe help page of function \ngetDefaultMeasure\n lists all defaults for all types of learning problems.\nThe function itself returns the default measure for a given task type, \nTask\n or \nLearner\n.\n\n\n## Get default measure for iris.task\ngetDefaultMeasure(iris.task)\n#\n Name: Mean misclassification error\n#\n Performance measure: mmce\n#\n Properties: classif,classif.multi,req.pred,req.truth\n#\n Minimize: TRUE\n#\n Best: 0; Worst: 1\n#\n Aggregated by: test.mean\n#\n Note:\n\n## Get the default measure for linear regression\ngetDefaultMeasure(makeLearner(\nregr.lm\n))\n#\n Name: Mean of squared errors\n#\n Performance measure: mse\n#\n Properties: regr,req.pred,req.truth\n#\n Minimize: TRUE\n#\n Best: 0; Worst: Inf\n#\n Aggregated by: test.mean\n#\n Note:\n\n\n\n\nCalculate performance measures\n\n\nIn the following example we fit a \ngradient boosting machine\n on a subset of the\n\nBostonHousing\n data set and calculate the default measure mean\nsquared error (\nmse\n) on the remaining observations.\n\n\nn = getTaskSize(bh.task)\nlrn = makeLearner(\nregr.gbm\n, n.trees = 1000)\nmod = train(lrn, task = bh.task, subset = seq(1, n, 2))\npred = predict(mod, task = bh.task, subset = seq(2, n, 2))\n\nperformance(pred)\n#\n mse \n#\n 42.68414\n\n\n\n\nThe following code computes the median of squared errors (\nmedse\n) instead.\n\n\nperformance(pred, measures = medse)\n#\n medse \n#\n 9.134965\n\n\n\n\nOf course, we can also calculate multiple performance measures at once by simply passing a\nlist of measures which can also include \nyour own measure\n.\n\n\nCalculate the mean squared error, median squared error and mean absolute error (\nmae\n).\n\n\nperformance(pred, measures = list(mse, medse, mae))\n#\n mse medse mae \n#\n 42.684141 9.134965 4.536750\n\n\n\n\nFor the other types of learning problems and measures, calculating the performance basically\nworks in the same way.\n\n\nRequirements of performance measures\n\n\nNote that in order to calculate some performance measures it is required that you pass the\n\nTask\n or the \nfitted model\n in addition to the \nPrediction\n.\n\n\nFor example in order to assess the time needed for training (\ntimetrain\n), the fitted\nmodel has to be passed.\n\n\nperformance(pred, measures = timetrain, model = mod)\n#\n timetrain \n#\n 0.121\n\n\n\n\nFor many performance measures in cluster analysis the \nTask\n is required.\n\n\nlrn = makeLearner(\ncluster.kmeans\n, centers = 3)\nmod = train(lrn, mtcars.task)\npred = predict(mod, task = mtcars.task)\n\n## Calculate the Dunn index\nperformance(pred, measures = dunn, task = mtcars.task)\n#\n dunn \n#\n 0.1462919\n\n\n\n\nMoreover, some measures require a certain type of prediction.\nFor example in binary classification in order to calculate the AUC (\nauc\n) -- the area\nunder the ROC (receiver operating characteristic) curve -- we have to make sure that posterior\nprobabilities are predicted.\nFor more information on ROC analysis, see the section on \nROC analysis\n.\n\n\nlrn = makeLearner(\nclassif.rpart\n, predict.type = \nprob\n)\nmod = train(lrn, task = sonar.task)\npred = predict(mod, task = sonar.task)\n\nperformance(pred, measures = auc)\n#\n auc \n#\n 0.9224018\n\n\n\n\nAlso bear in mind that many of the performance measures that are available for classification,\ne.g., the false positive rate 
(\nfpr\n), are only suitable for binary problems.\n\n\nAccess a performance measure\n\n\nPerformance measures in \nmlr\n are objects of class \nMeasure\n.\nIf you are interested in the properties or requirements of a single measure you can access it directly.\nSee the help page of \nMeasure\n for information on the individual slots.\n\n\n## Mean misclassification error\nstr(mmce)\n#\n List of 10\n#\n $ id : chr \nmmce\n\n#\n $ minimize : logi TRUE\n#\n $ properties: chr [1:4] \nclassif\n \nclassif.multi\n \nreq.pred\n \nreq.truth\n\n#\n $ fun :function (task, model, pred, feats, extra.args) \n#\n $ extra.args: list()\n#\n $ best : num 0\n#\n $ worst : num 1\n#\n $ name : chr \nMean misclassification error\n\n#\n $ note : chr \n\n#\n $ aggr :List of 3\n#\n ..$ id : chr \ntest.mean\n\n#\n ..$ name: chr \nTest mean\n\n#\n ..$ fun :function (task, perf.test, perf.train, measure, group, pred) \n#\n ..- attr(*, \nclass\n)= chr \nAggregation\n\n#\n - attr(*, \nclass\n)= chr \nMeasure\n\n\n\n\n\nBinary classification: Plot performance versus threshold\n\n\nAs you may recall (see the previous section on \nmaking predictions\n)\nin binary classification we can adjust the threshold used to map probabilities to class labels.\nHelpful in this regard is are the functions \ngenerateThreshVsPerfData\n and \nplotThreshVsPerf\n, which generate and plot, respectively, the learner performance versus the threshold.\n\n\nFor more performance plots and automatic threshold tuning see \nhere\n.\n\n\nIn the following example we consider the \nSonar\n data set and\nplot the false positive rate (\nfpr\n), the false negative rate (\nfnr\n)\nas well as the misclassification rate (\nmmce\n) for all possible threshold values.\n\n\nlrn = makeLearner(\nclassif.lda\n, predict.type = \nprob\n)\nn = getTaskSize(sonar.task)\nmod = train(lrn, task = sonar.task, subset = seq(1, n, by = 2))\npred = predict(mod, task = sonar.task, subset = seq(2, n, by = 2))\n\n## Performance for the default threshold 0.5\nperformance(pred, measures = list(fpr, fnr, mmce))\n#\n fpr fnr mmce \n#\n 0.2500000 0.3035714 0.2788462\n## Plot false negative and positive rates as well as the error rate versus the threshold\nd = generateThreshVsPerfData(pred, measures = list(fpr, fnr, mmce))\nplotThreshVsPerf(d)\n\n\n\n\n \n\n\nThere is an experimental \nggvis\n plotting function \nplotThreshVsPerfGGVIS\n which performs similarly\nto \nplotThreshVsPerf\n but instead of creating facetted subplots to visualize multiple learners and/or\nmultiple measures, one of them is mapped to an interactive sidebar which selects what to display.\n\n\nplotThreshVsPerfGGVIS(d)", + "text": "Evaluating Learner Performance\n\n\nThe quality of the predictions of a model in \nmlr\n can be assessed with respect to a\nnumber of different performance measures.\nIn order to calculate the performance measures, call \nperformance\n on the object\nreturned by \npredict\n and specify the desired performance measures.\n\n\nAvailable performance measures\n\n\nmlr\n provides a large number of performance measures for all types of learning problems.\nTypical performance measures for \nclassification\n are the mean misclassification error (\nmmce\n),\naccuracy (\nacc\n) or measures based on \nROC analysis\n.\nFor \nregression\n the mean of squared errors (\nmse\n) or mean of absolute errors (\nmae\n)\nare usually considered.\nFor \nclustering\n tasks, measures such as the Dunn index (\ndunn\n) are provided,\nwhile for \nsurvival\n predictions, the Concordance Index (\ncindex\n) is\nsupported, 
and for \ncost-sensitive\n predictions the misclassification penalty\n(\nmcp\n) and others. It is also possible to access the time to train the\nlearner (\ntimetrain\n), the time to compute the prediction (\ntimepredict\n) and their\nsum (\ntimeboth\n) as performance measures.\n\n\nTo see which performance measures are implemented, have a look at the\n\ntable of performance measures\n and the \nmeasures\n documentation page.\n\n\nIf you want to implement an additional measure or include a measure with\nnon-standard misclassification costs, see the section on\n\ncreating custom measures\n.\n\n\nListing measures\n\n\nThe properties and requirements of the individual measures are shown in the \ntable of performance measures\n.\n\n\nIf you would like a list of available measures with certain properties or suitable for a\ncertain learning \nTask\n use the function \nlistMeasures\n.\n\n\n## Performance measures for classification with multiple classes\nlistMeasures(\nclassif\n, properties = \nclassif.multi\n)\n#\n [1] \ntimepredict\n \nacc\n \nber\n \nfeatperc\n \n#\n [5] \nmmce\n \ntimeboth\n \ntimetrain\n \nmulticlass.auc\n\n## Performance measure suitable for the iris classification task\nlistMeasures(iris.task)\n#\n [1] \ntimepredict\n \nacc\n \nber\n \nfeatperc\n \n#\n [5] \nmmce\n \ntimeboth\n \ntimetrain\n \nmulticlass.auc\n\n\n\n\n\nFor convenience there exists a default measure for each type of learning problem, which is\ncalculated if nothing else is specified. As defaults we chose the most commonly used measures for the\nrespective types, e.g., the mean squared error (\nmse\n) for regression and the\nmisclassification rate (\nmmce\n) for classification.\nThe help page of function \ngetDefaultMeasure\n lists all defaults for all types of learning problems.\nThe function itself returns the default measure for a given task type, \nTask\n or \nLearner\n.\n\n\n## Get default measure for iris.task\ngetDefaultMeasure(iris.task)\n#\n Name: Mean misclassification error\n#\n Performance measure: mmce\n#\n Properties: classif,classif.multi,req.pred,req.truth\n#\n Minimize: TRUE\n#\n Best: 0; Worst: 1\n#\n Aggregated by: test.mean\n#\n Note:\n\n## Get the default measure for linear regression\ngetDefaultMeasure(makeLearner(\nregr.lm\n))\n#\n Name: Mean of squared errors\n#\n Performance measure: mse\n#\n Properties: regr,req.pred,req.truth\n#\n Minimize: TRUE\n#\n Best: 0; Worst: Inf\n#\n Aggregated by: test.mean\n#\n Note:\n\n\n\n\nCalculate performance measures\n\n\nIn the following example we fit a \ngradient boosting machine\n on a subset of the\n\nBostonHousing\n data set and calculate the default measure mean\nsquared error (\nmse\n) on the remaining observations.\n\n\nn = getTaskSize(bh.task)\nlrn = makeLearner(\nregr.gbm\n, n.trees = 1000)\nmod = train(lrn, task = bh.task, subset = seq(1, n, 2))\npred = predict(mod, task = bh.task, subset = seq(2, n, 2))\n\nperformance(pred)\n#\n mse \n#\n 42.68414\n\n\n\n\nThe following code computes the median of squared errors (\nmedse\n) instead.\n\n\nperformance(pred, measures = medse)\n#\n medse \n#\n 9.134965\n\n\n\n\nOf course, we can also calculate multiple performance measures at once by simply passing a\nlist of measures which can also include \nyour own measure\n.\n\n\nCalculate the mean squared error, median squared error and mean absolute error (\nmae\n).\n\n\nperformance(pred, measures = list(mse, medse, mae))\n#\n mse medse mae \n#\n 42.684141 9.134965 4.536750\n\n\n\n\nFor the other types of learning problems and measures, calculating the 
performance basically\nworks in the same way.\n\n\nRequirements of performance measures\n\n\nNote that in order to calculate some performance measures it is required that you pass the\n\nTask\n or the \nfitted model\n in addition to the \nPrediction\n.\n\n\nFor example in order to assess the time needed for training (\ntimetrain\n), the fitted\nmodel has to be passed.\n\n\nperformance(pred, measures = timetrain, model = mod)\n#\n timetrain \n#\n 0.111\n\n\n\n\nFor many performance measures in cluster analysis the \nTask\n is required.\n\n\nlrn = makeLearner(\ncluster.kmeans\n, centers = 3)\nmod = train(lrn, mtcars.task)\npred = predict(mod, task = mtcars.task)\n\n## Calculate the Dunn index\nperformance(pred, measures = dunn, task = mtcars.task)\n#\n dunn \n#\n 0.1462919\n\n\n\n\nMoreover, some measures require a certain type of prediction.\nFor example in binary classification in order to calculate the AUC (\nauc\n) -- the area\nunder the ROC (receiver operating characteristic) curve -- we have to make sure that posterior\nprobabilities are predicted.\nFor more information on ROC analysis, see the section on \nROC analysis\n.\n\n\nlrn = makeLearner(\nclassif.rpart\n, predict.type = \nprob\n)\nmod = train(lrn, task = sonar.task)\npred = predict(mod, task = sonar.task)\n\nperformance(pred, measures = auc)\n#\n auc \n#\n 0.9224018\n\n\n\n\nAlso bear in mind that many of the performance measures that are available for classification,\ne.g., the false positive rate (\nfpr\n), are only suitable for binary problems.\n\n\nAccess a performance measure\n\n\nPerformance measures in \nmlr\n are objects of class \nMeasure\n.\nIf you are interested in the properties or requirements of a single measure you can access it directly.\nSee the help page of \nMeasure\n for information on the individual slots.\n\n\n## Mean misclassification error\nstr(mmce)\n#\n List of 10\n#\n $ id : chr \nmmce\n\n#\n $ minimize : logi TRUE\n#\n $ properties: chr [1:4] \nclassif\n \nclassif.multi\n \nreq.pred\n \nreq.truth\n\n#\n $ fun :function (task, model, pred, feats, extra.args) \n#\n $ extra.args: list()\n#\n $ best : num 0\n#\n $ worst : num 1\n#\n $ name : chr \nMean misclassification error\n\n#\n $ note : chr \n\n#\n $ aggr :List of 3\n#\n ..$ id : chr \ntest.mean\n\n#\n ..$ name: chr \nTest mean\n\n#\n ..$ fun :function (task, perf.test, perf.train, measure, group, pred) \n#\n ..- attr(*, \nclass\n)= chr \nAggregation\n\n#\n - attr(*, \nclass\n)= chr \nMeasure\n\n\n\n\n\nBinary classification: Plot performance versus threshold\n\n\nAs you may recall (see the previous section on \nmaking predictions\n)\nin binary classification we can adjust the threshold used to map probabilities to class labels.\nHelpful in this regard is are the functions \ngenerateThreshVsPerfData\n and \nplotThreshVsPerf\n, which generate and plot, respectively, the learner performance versus the threshold.\n\n\nFor more performance plots and automatic threshold tuning see \nhere\n.\n\n\nIn the following example we consider the \nSonar\n data set and\nplot the false positive rate (\nfpr\n), the false negative rate (\nfnr\n)\nas well as the misclassification rate (\nmmce\n) for all possible threshold values.\n\n\nlrn = makeLearner(\nclassif.lda\n, predict.type = \nprob\n)\nn = getTaskSize(sonar.task)\nmod = train(lrn, task = sonar.task, subset = seq(1, n, by = 2))\npred = predict(mod, task = sonar.task, subset = seq(2, n, by = 2))\n\n## Performance for the default threshold 0.5\nperformance(pred, measures = list(fpr, fnr, mmce))\n#\n fpr fnr mmce 
\n#\n 0.2500000 0.3035714 0.2788462\n## Plot false negative and positive rates as well as the error rate versus the threshold\nd = generateThreshVsPerfData(pred, measures = list(fpr, fnr, mmce))\nplotThreshVsPerf(d)\n\n\n\n\n \n\n\nThere is an experimental \nggvis\n plotting function \nplotThreshVsPerfGGVIS\n which performs similarly\nto \nplotThreshVsPerf\n but instead of creating facetted subplots to visualize multiple learners and/or\nmultiple measures, one of them is mapped to an interactive sidebar which selects what to display.\n\n\nplotThreshVsPerfGGVIS(d)", "title": "Performance" }, { @@ -142,7 +142,7 @@ }, { "location": "/performance/index.html#calculate-performance-measures", - "text": "In the following example we fit a gradient boosting machine on a subset of the BostonHousing data set and calculate the default measure mean\nsquared error ( mse ) on the remaining observations. n = getTaskSize(bh.task)\nlrn = makeLearner( regr.gbm , n.trees = 1000)\nmod = train(lrn, task = bh.task, subset = seq(1, n, 2))\npred = predict(mod, task = bh.task, subset = seq(2, n, 2))\n\nperformance(pred)\n# mse \n# 42.68414 The following code computes the median of squared errors ( medse ) instead. performance(pred, measures = medse)\n# medse \n# 9.134965 Of course, we can also calculate multiple performance measures at once by simply passing a\nlist of measures which can also include your own measure . Calculate the mean squared error, median squared error and mean absolute error ( mae ). performance(pred, measures = list(mse, medse, mae))\n# mse medse mae \n# 42.684141 9.134965 4.536750 For the other types of learning problems and measures, calculating the performance basically\nworks in the same way. Requirements of performance measures Note that in order to calculate some performance measures it is required that you pass the Task or the fitted model in addition to the Prediction . For example in order to assess the time needed for training ( timetrain ), the fitted\nmodel has to be passed. performance(pred, measures = timetrain, model = mod)\n# timetrain \n# 0.121 For many performance measures in cluster analysis the Task is required. lrn = makeLearner( cluster.kmeans , centers = 3)\nmod = train(lrn, mtcars.task)\npred = predict(mod, task = mtcars.task)\n\n## Calculate the Dunn index\nperformance(pred, measures = dunn, task = mtcars.task)\n# dunn \n# 0.1462919 Moreover, some measures require a certain type of prediction.\nFor example in binary classification in order to calculate the AUC ( auc ) -- the area\nunder the ROC (receiver operating characteristic) curve -- we have to make sure that posterior\nprobabilities are predicted.\nFor more information on ROC analysis, see the section on ROC analysis . lrn = makeLearner( classif.rpart , predict.type = prob )\nmod = train(lrn, task = sonar.task)\npred = predict(mod, task = sonar.task)\n\nperformance(pred, measures = auc)\n# auc \n# 0.9224018 Also bear in mind that many of the performance measures that are available for classification,\ne.g., the false positive rate ( fpr ), are only suitable for binary problems.", + "text": "In the following example we fit a gradient boosting machine on a subset of the BostonHousing data set and calculate the default measure mean\nsquared error ( mse ) on the remaining observations. 
n = getTaskSize(bh.task)\nlrn = makeLearner( regr.gbm , n.trees = 1000)\nmod = train(lrn, task = bh.task, subset = seq(1, n, 2))\npred = predict(mod, task = bh.task, subset = seq(2, n, 2))\n\nperformance(pred)\n# mse \n# 42.68414 The following code computes the median of squared errors ( medse ) instead. performance(pred, measures = medse)\n# medse \n# 9.134965 Of course, we can also calculate multiple performance measures at once by simply passing a\nlist of measures which can also include your own measure . Calculate the mean squared error, median squared error and mean absolute error ( mae ). performance(pred, measures = list(mse, medse, mae))\n# mse medse mae \n# 42.684141 9.134965 4.536750 For the other types of learning problems and measures, calculating the performance basically\nworks in the same way. Requirements of performance measures Note that in order to calculate some performance measures it is required that you pass the Task or the fitted model in addition to the Prediction . For example in order to assess the time needed for training ( timetrain ), the fitted\nmodel has to be passed. performance(pred, measures = timetrain, model = mod)\n# timetrain \n# 0.111 For many performance measures in cluster analysis the Task is required. lrn = makeLearner( cluster.kmeans , centers = 3)\nmod = train(lrn, mtcars.task)\npred = predict(mod, task = mtcars.task)\n\n## Calculate the Dunn index\nperformance(pred, measures = dunn, task = mtcars.task)\n# dunn \n# 0.1462919 Moreover, some measures require a certain type of prediction.\nFor example in binary classification in order to calculate the AUC ( auc ) -- the area\nunder the ROC (receiver operating characteristic) curve -- we have to make sure that posterior\nprobabilities are predicted.\nFor more information on ROC analysis, see the section on ROC analysis . 
lrn = makeLearner( classif.rpart , predict.type = prob )\nmod = train(lrn, task = sonar.task)\npred = predict(mod, task = sonar.task)\n\nperformance(pred, measures = auc)\n# auc \n# 0.9224018 Also bear in mind that many of the performance measures that are available for classification,\ne.g., the false positive rate ( fpr ), are only suitable for binary problems.", "title": "Calculate performance measures" }, { @@ -157,12 +157,12 @@ }, { "location": "/resample/index.html", - "text": "Resampling\n\n\nIn order to assess the performance of a learning algorithm, resampling\nstrategies are usually used.\nThe entire data set is split into (multiple) training and test sets.\nYou train a learner on each training set, predict on the corresponding test set (sometimes\non the training set as well) and calculate some performance measure.\nThen the individual performance values are aggregated, typically by calculating the mean.\nThere exist various different resampling strategies, for example\ncross-validation and bootstrap, to mention just two popular approaches.\n\n\n\n\nIf you want to read up further details, the paper\n\nResampling Strategies for Model Assessment and Selection\n\nby Simon is proabably not a bad choice.\nBernd has also published a paper\n\nResampling methods for meta-model validation with recommendations for evolutionary computation\n\nwhich contains detailed descriptions and lots of statistical background information on resampling methods.\n\n\nIn \nmlr\n the resampling strategy can be chosen via the function \nmakeResampleDesc\n.\nThe supported resampling strategies are:\n\n\n\n\nCross-validation (\n\"CV\"\n),\n\n\nLeave-one-out cross-validation (\n\"LOO\"\"\n),\n\n\nRepeated cross-validation (\n\"RepCV\"\n),\n\n\nOut-of-bag bootstrap and other variants (\n\"Bootstrap\"\n),\n\n\nSubsampling, also called Monte-Carlo cross-validaton (\n\"Subsample\"\n),\n\n\nHoldout (training/test) (\n\"Holdout\"\n).\n\n\n\n\nThe \nresample\n function evaluates the performance of a \nLearner\n using\nthe specified resampling strategy for a given machine learning \nTask\n.\n\n\nIn the following example the performance of the\n\nCox proportional hazards model\n on the\n\nlung\n data set is calculated using \n3-fold cross-validation\n.\nGenerally, in \nK\n-fold cross-validation\n the data set \nD\n is partitioned into \nK\n subsets of\n(approximately) equal size.\nIn the \ni\n-th step of the \nK\n iterations, the \ni\n-th subset is\nused for testing, while the union of the remaining parts forms the training\nset.\nThe default performance measure in survival analysis is the concordance index (\ncindex\n).\n\n\n## Specify the resampling strategy (3-fold cross-validation)\nrdesc = makeResampleDesc(\nCV\n, iters = 3)\n\n## Calculate the performance\nr = resample(\nsurv.coxph\n, lung.task, rdesc)\n#\n [Resample] cross-validation iter: 1\n#\n [Resample] cross-validation iter: 2\n#\n [Resample] cross-validation iter: 3\n#\n [Resample] Result: cindex.test.mean=0.627\nr\n#\n Resample Result\n#\n Task: lung-example\n#\n Learner: surv.coxph\n#\n cindex.aggr: 0.63\n#\n cindex.mean: 0.63\n#\n cindex.sd: 0.05\n#\n Runtime: 0.167005\n## peak a little bit into r\nnames(r)\n#\n [1] \nlearner.id\n \ntask.id\n \nmeasures.train\n \nmeasures.test\n \n#\n [5] \naggr\n \npred\n \nmodels\n \nerr.msgs\n \n#\n [9] \nextract\n \nruntime\n\nr$aggr\n#\n cindex.test.mean \n#\n 0.6271182\nr$measures.test\n#\n iter cindex\n#\n 1 1 0.5783027\n#\n 2 2 0.6324074\n#\n 3 3 0.6706444\nr$measures.train\n#\n iter cindex\n#\n 1 1 NA\n#\n 2 2 
NA\n#\n 3 3 NA\n\n\n\n\nr$measures.test\n gives the value of the performance measure on the 3 individual test\ndata sets.\n\nr$aggr\n shows the aggregated performance value.\nIts name, \n\"cindex.test.mean\"\n, indicates the performance measure, \ncindex\n,\nand the method used to aggregate the 3 individual performances.\n\ntest.mean\n is the default method and, as the name implies, takes the mean over the\nperformances on the 3 test data sets.\nNo predictions on the training data sets were made and thus \nr$measures.train\n contains missing values.\n\n\nIf predictions for the training set are required, too, set \npredict = \"train\"\nor \npredict = \"both\"\n\nin \nmakeResampleDesc\n. This is necessary for some bootstrap methods (\nb632\n and \nb632+\n) and\nwe will see some examples later on.\n\n\nr$pred\n is an object of class \nResamplePrediction\n.\nJust as a \nPrediction\n object (see the section on \nmaking predictions\n)\n\nr$pred\n has an element called \n\"data\"\n which is a \ndata.frame\n that contains the\npredictions and in case of a supervised learning problem the true values of the target\nvariable.\n\n\nhead(r$pred$data)\n#\n id truth.time truth.event response iter set\n#\n 1 1 455 TRUE -0.4951788 1 test\n#\n 2 2 210 TRUE 0.9573824 1 test\n#\n 3 4 310 TRUE 0.8069059 1 test\n#\n 4 10 613 TRUE 0.1918188 1 test\n#\n 5 12 61 TRUE 0.6638736 1 test\n#\n 6 14 81 TRUE -0.1873917 1 test\n\n\n\n\nThe columns \niter\n and \nset\nindicate the resampling iteration and\nif an individual prediction was made on the test or the training data set.\n\n\nIn the above example the performance measure is the concordance index (\ncindex\n).\nOf course, it is possible to compute multiple performance measures at once by\npassing a list of measures\n(see also the previous section on \nevaluating learner performance\n).\n\n\nIn the following we estimate the Dunn index (\ndunn\n), the Davies-Bouldin cluster\nseparation measure (\ndb\n), and the time for training the learner (\ntimetrain\n)\nby \nsubsampling\n with 5 iterations.\nIn each iteration the data set \nD\n is randomly partitioned into a\ntraining and a test set according to a given percentage, e.g., 2/3\ntraining and 1/3 test set. If there is just one iteration, the strategy\nis commonly called \nholdout\n or \ntest sample estimation\n.\n\n\n## cluster iris feature data\ntask = makeClusterTask(data = iris[,-5])\n## Subsampling with 5 iterations and default split 2/3\nrdesc = makeResampleDesc(\nSubsample\n, iters = 5)\n## Subsampling with 5 iterations and 4/5 training data\nrdesc = makeResampleDesc(\nSubsample\n, iters = 5, split = 4/5)\n\n## Calculate the three performance measures\nr = resample(\ncluster.kmeans\n, task, rdesc, measures = list(dunn, db, timetrain))\n#\n [Resample] subsampling iter: 1\n#\n [Resample] subsampling iter: 2\n#\n [Resample] subsampling iter: 3\n#\n [Resample] subsampling iter: 4\n#\n [Resample] subsampling iter: 5\n#\n [Resample] Result: dunn.test.mean=0.274,db.test.mean=0.51,timetrain.test.mean=0.001\nr$aggr\n#\n dunn.test.mean db.test.mean timetrain.test.mean \n#\n 0.2738893 0.5103655 0.0010000\n\n\n\n\nStratified resampling\n\n\nFor classification, it is usually desirable to have the same proportion of the classes in all of the partitions of the original data set. Stratified resampling ensures this.\nThis is particularly useful in case of imbalanced classes and small data sets. 
Otherwise it may happen, for example,\nthat observations of less frequent classes are missing in some of the training sets which can\ndecrease the performance of the learner, or lead to model crashes\nIn order to conduct stratified resampling, set \nstratify = TRUE\n when calling \nmakeResampleDesc\n.\n\n\n## 3-fold cross-validation\nrdesc = makeResampleDesc(\nCV\n, iters = 3, stratify = TRUE)\n\nr = resample(\nclassif.lda\n, iris.task, rdesc)\n#\n [Resample] cross-validation iter: 1\n#\n [Resample] cross-validation iter: 2\n#\n [Resample] cross-validation iter: 3\n#\n [Resample] Result: mmce.test.mean=0.02\n\n\n\n\nStratification is also available for survival tasks.\nHere the stratification balances the censoring rate.\n\n\nSometimes it is required to also stratify on the input data, e.g. to ensure that all subgroups are represented in all training and test sets.\nTo stratify on the input columns, specify factor columns of your task data via \nstratify.cols\n\n\nrdesc = makeResampleDesc(\nCV\n, iters = 3, stratify.cols = \nchas\n)\nr = resample(\nregr.rpart\n, bh.task, rdesc)\n#\n [Resample] cross-validation iter: 1\n#\n [Resample] cross-validation iter: 2\n#\n [Resample] cross-validation iter: 3\n#\n [Resample] Result: mse.test.mean=23.2\n\n\n\n\nAccessing individual learner models\n\n\nIn each resampling iteration a \nLearner\n is fitted on the respective training set.\nBy default, the resulting \nWrappedModel\ns are not returned by \nresample\n.\nIf you want to keep them, set \nmodels = TRUE\n when calling \nresample\n.\n\n\n## 3-fold cross-validation\nrdesc = makeResampleDesc(\nCV\n, iters = 3)\n\nr = resample(\nclassif.lda\n, iris.task, rdesc, models = TRUE)\n#\n [Resample] cross-validation iter: 1\n#\n [Resample] cross-validation iter: 2\n#\n [Resample] cross-validation iter: 3\n#\n [Resample] Result: mmce.test.mean=0.02\nr$models\n#\n [[1]]\n#\n Model for learner.id=classif.lda; learner.class=classif.lda\n#\n Trained on: task.id = iris-example; obs = 100; features = 4\n#\n Hyperparameters: \n#\n \n#\n [[2]]\n#\n Model for learner.id=classif.lda; learner.class=classif.lda\n#\n Trained on: task.id = iris-example; obs = 100; features = 4\n#\n Hyperparameters: \n#\n \n#\n [[3]]\n#\n Model for learner.id=classif.lda; learner.class=classif.lda\n#\n Trained on: task.id = iris-example; obs = 100; features = 4\n#\n Hyperparameters:\n\n\n\n\nKeeping only certain information instead of entire \nmodels\n, for example the\nvariable importance in a regression tree, can be achieved using the \nextract\n argument.\nThe function passed to \nextract\n is applied to each \nmodel\n fitted on one of\nthe 3 training sets.\n\n\n## 3-fold cross-validation\nrdesc = makeResampleDesc(\nCV\n, iters = 3)\n\n## Extract the variable importance in a regression tree\nr = resample(\nregr.rpart\n, bh.task, rdesc,\n extract = function(x) x$learner.model$variable.importance)\n#\n [Resample] cross-validation iter: 1\n#\n [Resample] cross-validation iter: 2\n#\n [Resample] cross-validation iter: 3\n#\n [Resample] Result: mse.test.mean=30.3\nr$extract\n#\n [[1]]\n#\n rm lstat crim indus age ptratio \n#\n 15228.2872 10742.2277 3893.2744 3651.6232 2601.5262 2551.8492 \n#\n dis nox rad tax zn \n#\n 2498.2748 2419.5269 1014.2609 743.3742 308.8209 \n#\n \n#\n [[2]]\n#\n lstat nox age indus crim rm \n#\n 15725.19021 9323.20270 8474.23077 8358.67000 8251.74446 7332.59637 \n#\n zn dis tax rad ptratio b \n#\n 6151.29577 2741.12074 2055.67537 1216.01398 634.78381 71.00088 \n#\n \n#\n [[3]]\n#\n rm lstat age ptratio nox dis \n#\n 
15890.9279 13262.3672 4296.4175 3678.6651 3668.4944 3512.2753 \n#\n crim tax indus zn b rad \n#\n 3474.5883 2844.9918 1437.7900 1284.4714 578.6932 496.2382\n\n\n\n\nResample descriptions and resample instances\n\n\nAs shown above, the function \nmakeResampleDesc\n is used to specify the resampling strategy.\n\n\nrdesc = makeResampleDesc(\nCV\n, iters = 3)\nstr(rdesc)\n#\n List of 4\n#\n $ id : chr \ncross-validation\n\n#\n $ iters : int 3\n#\n $ predict : chr \ntest\n\n#\n $ stratify: logi FALSE\n#\n - attr(*, \nclass\n)= chr [1:2] \nCVDesc\n \nResampleDesc\n\n\n\n\n\nThe result \nrdesc\nis an object of class \nResampleDesc\n and contains,\nas the name implies, a description of the resampling strategy.\nIn principle, this is an instruction for drawing training and test sets including\nthe necessary parameters like the number of iterations, the sizes of the training and test\nsets etc.\n\n\nBased on this description, the data set is randomly partitioned into multiple training and\ntest sets.\nFor each iteration, we get a set of index vectors indicating the training and test examples.\nThese are stored in a \nResampleInstance\n.\n\n\nIf a \nResampleDesc\n is passed to \nresample\n, it is instantiated internally.\nNaturally, it is also possible to pass a \nResampleInstance\n directly.\n\n\nA \nResampleInstance\n can be created through the function\n\nmakeResampleInstance\n given a \nResampleDesc\n and either the size of\nthe data set at hand or the \nTask\n.\nIt basically performs the random drawing of indices to separate the data into training and\ntest sets according to the description.\n\n\n## Create a resample instance based an a task\nrin = makeResampleInstance(rdesc, task = iris.task)\nrin\n#\n Resample instance for 150 cases.\n#\n Resample description: cross-validation with 3 iterations.\n#\n Predict: test\n#\n Stratification: FALSE\n\n## Create a resample instance given the size of the data set\nrin = makeResampleInstance(rdesc, size = nrow(iris))\nstr(rin)\n#\n List of 5\n#\n $ desc :List of 4\n#\n ..$ id : chr \ncross-validation\n\n#\n ..$ iters : int 3\n#\n ..$ predict : chr \ntest\n\n#\n ..$ stratify: logi FALSE\n#\n ..- attr(*, \nclass\n)= chr [1:2] \nCVDesc\n \nResampleDesc\n\n#\n $ size : int 150\n#\n $ train.inds:List of 3\n#\n ..$ : int [1:100] 36 81 6 82 120 110 118 132 105 61 ...\n#\n ..$ : int [1:100] 6 119 120 110 121 118 99 100 29 127 ...\n#\n ..$ : int [1:100] 36 81 82 119 121 99 132 105 61 115 ...\n#\n $ test.inds :List of 3\n#\n ..$ : int [1:50] 2 3 4 5 7 9 11 16 22 24 ...\n#\n ..$ : int [1:50] 8 12 17 19 20 23 25 27 32 33 ...\n#\n ..$ : int [1:50] 1 6 10 13 14 15 18 21 29 31 ...\n#\n $ group : Factor w/ 0 levels: \n#\n - attr(*, \nclass\n)= chr \nResampleInstance\n\n\n## Access the indices of the training observations in iteration 3\nrin$train.inds[[3]]\n#\n [1] 36 81 82 119 121 99 132 105 61 115 17 42 4 71 5 79 30\n#\n [18] 113 138 19 150 77 58 92 114 133 8 109 33 145 22 111 97 24\n#\n [35] 7 44 3 20 134 96 16 43 149 9 46 32 139 87 2 11 52\n#\n [52] 86 40 141 142 72 54 48 83 64 90 112 148 129 137 116 143 69\n#\n [69] 84 25 80 37 38 75 130 126 135 107 146 26 12 98 55 124 60\n#\n [86] 63 117 23 67 73 28 106 76 50 144 59 47 102 56 27\n\n\n\n\nWhile having two separate objects, resample descriptions and instances as well as the \nresample\n\nfunction seems overly complicated, it has several advantages:\n\n\n\n\nResample instances allow for paired experiments, that is comparing the performance\n of several learners on exactly the same training and test sets.\n This is 
particularly useful if you want to add another method to a comparison experiment\n you already did.\n\n\n\n\nrdesc = makeResampleDesc(\nCV\n, iters = 3)\nrin = makeResampleInstance(rdesc, task = iris.task)\n\n## Calculate the performance of two learners based on the same resample instance\nr.lda = resample(\nclassif.lda\n, iris.task, rin, show.info = FALSE)\nr.rpart = resample(\nclassif.rpart\n, iris.task, rin, show.info = FALSE)\nr.lda$aggr\n#\n mmce.test.mean \n#\n 0.02666667\nr.rpart$aggr\n#\n mmce.test.mean \n#\n 0.06\n\n\n\n\n\n\nIt is easy to add other resampling methods later on. You can\n simply derive from the \nResampleInstance\n\n class, but you do not have to touch any methods that use the\n resampling strategy.\n\n\n\n\nAs mentioned above, when calling \nmakeResampleInstance\n the index sets are drawn randomly.\nMainly for \nholdout\n (\ntest sample\n) \nestimation\n you might want full control about the training\nand tests set and specify them manually.\nThis can be done using the function \nmakeFixedHoldoutInstance\n.\n\n\nrin = makeFixedHoldoutInstance(train.inds = 1:100, test.inds = 101:150, size = 150)\nrin\n#\n Resample instance for 150 cases.\n#\n Resample description: holdout with 0.67 split rate.\n#\n Predict: test\n#\n Stratification: FALSE\n\n\n\n\nAggregating performance values\n\n\nIn resampling we get (for each measure we wish to calculate) one performance\nvalue (on the test set, training set, or both) for each iteration.\nSubsequently, these are aggregated.\nAs mentioned above, mainly the mean over the performance values on the test data sets\n(\ntest.mean\n) is calculated.\n\n\nFor example, a 10-fold cross validation computes 10 values for the chosen\nperformance measure.\nThe aggregated value is the mean of these 10 numbers.\n\nmlr\n knows how to handle it because each \nMeasure\n knows how it is aggregated:\n\n\n## Mean misclassification error\nmmce$aggr\n#\n Aggregation function: test.mean\n\n## Root mean square error\nrmse$aggr\n#\n Aggregation function: test.rmse\n\n\n\n\nThe aggregation method of a \nMeasure\n can be changed via the function \nsetAggregation\n.\nSee the documentation of \naggregations\n for available methods.\n\n\nExample: Different measures and aggregations\n\n\ntest.median\n computes the median of the performance values on the test sets.\n\n\n## We use the mean error rate and the median of the true positive rates\nm1 = mmce\nm2 = setAggregation(tpr, test.median)\nrdesc = makeResampleDesc(\nCV\n, iters = 3)\nr = resample(\nclassif.rpart\n, sonar.task, rdesc, measures = list(m1, m2))\n#\n [Resample] cross-validation iter: 1\n#\n [Resample] cross-validation iter: 2\n#\n [Resample] cross-validation iter: 3\n#\n [Resample] Result: mmce.test.mean=0.293,tpr.test.median=0.735\nr$aggr\n#\n mmce.test.mean tpr.test.median \n#\n 0.2930987 0.7352941\n\n\n\n\nExample: Calculating the training error\n\n\nHere we calculate the mean misclassification error (\nmmce\n) on the training and the test\ndata sets. 
Note that we have to set \npredict = \"both\"\nwhen calling \nmakeResampleDesc\n\nin order to get predictions on both data sets, training and test.\n\n\nmmce.train.mean = setAggregation(mmce, train.mean)\nrdesc = makeResampleDesc(\nCV\n, iters = 3, predict = \nboth\n)\nr = resample(\nclassif.rpart\n, iris.task, rdesc, measures = list(mmce, mmce.train.mean))\n#\n [Resample] cross-validation iter: 1\n#\n [Resample] cross-validation iter: 2\n#\n [Resample] cross-validation iter: 3\n#\n [Resample] Result: mmce.test.mean=0.0467,mmce.train.mean=0.0367\nr$measures.train\n#\n iter mmce mmce\n#\n 1 1 0.04 0.04\n#\n 2 2 0.03 0.03\n#\n 3 3 0.04 0.04\nr$aggr\n#\n mmce.test.mean mmce.train.mean \n#\n 0.04666667 0.03666667\n\n\n\n\nExample: Bootstrap\n\n\nIn \nout-of-bag bootstrap estimation\n \nB\n new data sets \nD_1\n to \nD_B\n are drawn from the\ndata set \nD\n with replacement, each of the same size as \nD\n.\nIn the \ni\n-th iteration, \nD_i\n forms the training set, while the remaining elements from\n\nD\n, i.e., elements not in the training set, form the test set.\n\n\n\n\n\nThe variants \nb632\n and \nb632+\n calculate a convex combination of the training performance and\nthe out-of-bag bootstrap performance and thus require predictions on the training sets and an\nappropriate aggregation strategy.\n\n\nrdesc = makeResampleDesc(\nBootstrap\n, predict = \nboth\n, iters = 10)\nb632.mmce = setAggregation(mmce, b632)\nb632plus.mmce = setAggregation(mmce, b632plus)\nb632.mmce\n#\n Name: Mean misclassification error\n#\n Performance measure: mmce\n#\n Properties: classif,classif.multi,req.pred,req.truth\n#\n Minimize: TRUE\n#\n Best: 0; Worst: 1\n#\n Aggregated by: b632\n#\n Note:\n\nr = resample(\nclassif.rpart\n, iris.task, rdesc,\n measures = list(mmce, b632.mmce, b632plus.mmce), show.info = FALSE)\nhead(r$measures.train)\n#\n iter mmce mmce mmce\n#\n 1 1 0.026666667 0.026666667 0.026666667\n#\n 2 2 0.026666667 0.026666667 0.026666667\n#\n 3 3 0.006666667 0.006666667 0.006666667\n#\n 4 4 0.026666667 0.026666667 0.026666667\n#\n 5 5 0.033333333 0.033333333 0.033333333\n#\n 6 6 0.013333333 0.013333333 0.013333333\nr$aggr\n#\n mmce.test.mean mmce.b632 mmce.b632plus \n#\n 0.07051905 0.05389071 0.05496489\n\n\n\n\nConvenience functions\n\n\nWhen quickly trying out some learners, it can get tedious to write the \nR\n\ncode for generating a resample instance, setting the aggregation strategy and so\non. For this reason \nmlr\n provides some convenience functions for the\nfrequently used resampling strategies, for example \nholdout\n,\n\ncrossval\n or \nbootstrapB632\n. 
But note that you do not\nhave as much control and flexibility as when using \nresample\n with a resample\ndescription or instance.\n\n\nholdout(\nregr.lm\n, bh.task, measures = list(mse, mae))\ncrossval(\nclassif.lda\n, iris.task, iters = 3, measures = list(mmce, ber))", + "text": "Resampling\n\n\nIn order to assess the performance of a learning algorithm, resampling\nstrategies are usually used.\nThe entire data set is split into (multiple) training and test sets.\nYou train a learner on each training set, predict on the corresponding test set (sometimes\non the training set as well) and calculate some performance measure.\nThen the individual performance values are aggregated, typically by calculating the mean.\nThere exist various different resampling strategies, for example\ncross-validation and bootstrap, to mention just two popular approaches.\n\n\n\n\nIf you want to read up further details, the paper\n\nResampling Strategies for Model Assessment and Selection\n\nby Simon is proabably not a bad choice.\nBernd has also published a paper\n\nResampling methods for meta-model validation with recommendations for evolutionary computation\n\nwhich contains detailed descriptions and lots of statistical background information on resampling methods.\n\n\nIn \nmlr\n the resampling strategy can be chosen via the function \nmakeResampleDesc\n.\nThe supported resampling strategies are:\n\n\n\n\nCross-validation (\n\"CV\"\n),\n\n\nLeave-one-out cross-validation (\n\"LOO\"\"\n),\n\n\nRepeated cross-validation (\n\"RepCV\"\n),\n\n\nOut-of-bag bootstrap and other variants (\n\"Bootstrap\"\n),\n\n\nSubsampling, also called Monte-Carlo cross-validaton (\n\"Subsample\"\n),\n\n\nHoldout (training/test) (\n\"Holdout\"\n).\n\n\n\n\nThe \nresample\n function evaluates the performance of a \nLearner\n using\nthe specified resampling strategy for a given machine learning \nTask\n.\n\n\nIn the following example the performance of the\n\nCox proportional hazards model\n on the\n\nlung\n data set is calculated using \n3-fold cross-validation\n.\nGenerally, in \nK\n-fold cross-validation\n the data set \nD\n is partitioned into \nK\n subsets of\n(approximately) equal size.\nIn the \ni\n-th step of the \nK\n iterations, the \ni\n-th subset is\nused for testing, while the union of the remaining parts forms the training\nset.\nThe default performance measure in survival analysis is the concordance index (\ncindex\n).\n\n\n## Specify the resampling strategy (3-fold cross-validation)\nrdesc = makeResampleDesc(\nCV\n, iters = 3)\n\n## Calculate the performance\nr = resample(\nsurv.coxph\n, lung.task, rdesc)\n#\n [Resample] cross-validation iter: 1\n#\n [Resample] cross-validation iter: 2\n#\n [Resample] cross-validation iter: 3\n#\n [Resample] Result: cindex.test.mean=0.627\nr\n#\n Resample Result\n#\n Task: lung-example\n#\n Learner: surv.coxph\n#\n cindex.aggr: 0.63\n#\n cindex.mean: 0.63\n#\n cindex.sd: 0.05\n#\n Runtime: 0.126201\n## peak a little bit into r\nnames(r)\n#\n [1] \nlearner.id\n \ntask.id\n \nmeasures.train\n \nmeasures.test\n \n#\n [5] \naggr\n \npred\n \nmodels\n \nerr.msgs\n \n#\n [9] \nextract\n \nruntime\n\nr$aggr\n#\n cindex.test.mean \n#\n 0.6271182\nr$measures.test\n#\n iter cindex\n#\n 1 1 0.5783027\n#\n 2 2 0.6324074\n#\n 3 3 0.6706444\nr$measures.train\n#\n iter cindex\n#\n 1 1 NA\n#\n 2 2 NA\n#\n 3 3 NA\n\n\n\n\nr$measures.test\n gives the value of the performance measure on the 3 individual test\ndata sets.\n\nr$aggr\n shows the aggregated performance value.\nIts name, 
\n\"cindex.test.mean\"\n, indicates the performance measure, \ncindex\n,\nand the method used to aggregate the 3 individual performances.\n\ntest.mean\n is the default method and, as the name implies, takes the mean over the\nperformances on the 3 test data sets.\nNo predictions on the training data sets were made and thus \nr$measures.train\n contains missing values.\n\n\nIf predictions for the training set are required, too, set \npredict = \"train\"\nor \npredict = \"both\"\n\nin \nmakeResampleDesc\n. This is necessary for some bootstrap methods (\nb632\n and \nb632+\n) and\nwe will see some examples later on.\n\n\nr$pred\n is an object of class \nResamplePrediction\n.\nJust as a \nPrediction\n object (see the section on \nmaking predictions\n)\n\nr$pred\n has an element called \n\"data\"\n which is a \ndata.frame\n that contains the\npredictions and in case of a supervised learning problem the true values of the target\nvariable.\n\n\nhead(r$pred$data)\n#\n id truth.time truth.event response iter set\n#\n 1 1 455 TRUE -0.4951788 1 test\n#\n 2 2 210 TRUE 0.9573824 1 test\n#\n 3 4 310 TRUE 0.8069059 1 test\n#\n 4 10 613 TRUE 0.1918188 1 test\n#\n 5 12 61 TRUE 0.6638736 1 test\n#\n 6 14 81 TRUE -0.1873917 1 test\n\n\n\n\nThe columns \niter\n and \nset\nindicate the resampling iteration and\nif an individual prediction was made on the test or the training data set.\n\n\nIn the above example the performance measure is the concordance index (\ncindex\n).\nOf course, it is possible to compute multiple performance measures at once by\npassing a list of measures\n(see also the previous section on \nevaluating learner performance\n).\n\n\nIn the following we estimate the Dunn index (\ndunn\n), the Davies-Bouldin cluster\nseparation measure (\ndb\n), and the time for training the learner (\ntimetrain\n)\nby \nsubsampling\n with 5 iterations.\nIn each iteration the data set \nD\n is randomly partitioned into a\ntraining and a test set according to a given percentage, e.g., 2/3\ntraining and 1/3 test set. If there is just one iteration, the strategy\nis commonly called \nholdout\n or \ntest sample estimation\n.\n\n\n## cluster iris feature data\ntask = makeClusterTask(data = iris[,-5])\n## Subsampling with 5 iterations and default split 2/3\nrdesc = makeResampleDesc(\nSubsample\n, iters = 5)\n## Subsampling with 5 iterations and 4/5 training data\nrdesc = makeResampleDesc(\nSubsample\n, iters = 5, split = 4/5)\n\n## Calculate the three performance measures\nr = resample(\ncluster.kmeans\n, task, rdesc, measures = list(dunn, db, timetrain))\n#\n [Resample] subsampling iter: 1\n#\n [Resample] subsampling iter: 2\n#\n [Resample] subsampling iter: 3\n#\n [Resample] subsampling iter: 4\n#\n [Resample] subsampling iter: 5\n#\n [Resample] Result: dunn.test.mean=0.274,db.test.mean=0.51,timetrain.test.mean=0.001\nr$aggr\n#\n dunn.test.mean db.test.mean timetrain.test.mean \n#\n 0.2738893 0.5103655 0.0010000\n\n\n\n\nStratified resampling\n\n\nFor classification, it is usually desirable to have the same proportion of the classes in all of the partitions of the original data set. Stratified resampling ensures this.\nThis is particularly useful in case of imbalanced classes and small data sets. 
Otherwise it may happen, for example,\nthat observations of less frequent classes are missing in some of the training sets which can\ndecrease the performance of the learner, or lead to model crashes\nIn order to conduct stratified resampling, set \nstratify = TRUE\n when calling \nmakeResampleDesc\n.\n\n\n## 3-fold cross-validation\nrdesc = makeResampleDesc(\nCV\n, iters = 3, stratify = TRUE)\n\nr = resample(\nclassif.lda\n, iris.task, rdesc)\n#\n [Resample] cross-validation iter: 1\n#\n [Resample] cross-validation iter: 2\n#\n [Resample] cross-validation iter: 3\n#\n [Resample] Result: mmce.test.mean=0.02\n\n\n\n\nStratification is also available for survival tasks.\nHere the stratification balances the censoring rate.\n\n\nSometimes it is required to also stratify on the input data, e.g. to ensure that all subgroups are represented in all training and test sets.\nTo stratify on the input columns, specify factor columns of your task data via \nstratify.cols\n\n\nrdesc = makeResampleDesc(\nCV\n, iters = 3, stratify.cols = \nchas\n)\nr = resample(\nregr.rpart\n, bh.task, rdesc)\n#\n [Resample] cross-validation iter: 1\n#\n [Resample] cross-validation iter: 2\n#\n [Resample] cross-validation iter: 3\n#\n [Resample] Result: mse.test.mean=23.2\n\n\n\n\nAccessing individual learner models\n\n\nIn each resampling iteration a \nLearner\n is fitted on the respective training set.\nBy default, the resulting \nWrappedModel\ns are not returned by \nresample\n.\nIf you want to keep them, set \nmodels = TRUE\n when calling \nresample\n.\n\n\n## 3-fold cross-validation\nrdesc = makeResampleDesc(\nCV\n, iters = 3)\n\nr = resample(\nclassif.lda\n, iris.task, rdesc, models = TRUE)\n#\n [Resample] cross-validation iter: 1\n#\n [Resample] cross-validation iter: 2\n#\n [Resample] cross-validation iter: 3\n#\n [Resample] Result: mmce.test.mean=0.02\nr$models\n#\n [[1]]\n#\n Model for learner.id=classif.lda; learner.class=classif.lda\n#\n Trained on: task.id = iris-example; obs = 100; features = 4\n#\n Hyperparameters: \n#\n \n#\n [[2]]\n#\n Model for learner.id=classif.lda; learner.class=classif.lda\n#\n Trained on: task.id = iris-example; obs = 100; features = 4\n#\n Hyperparameters: \n#\n \n#\n [[3]]\n#\n Model for learner.id=classif.lda; learner.class=classif.lda\n#\n Trained on: task.id = iris-example; obs = 100; features = 4\n#\n Hyperparameters:\n\n\n\n\nKeeping only certain information instead of entire \nmodels\n, for example the\nvariable importance in a regression tree, can be achieved using the \nextract\n argument.\nThe function passed to \nextract\n is applied to each \nmodel\n fitted on one of\nthe 3 training sets.\n\n\n## 3-fold cross-validation\nrdesc = makeResampleDesc(\nCV\n, iters = 3)\n\n## Extract the variable importance in a regression tree\nr = resample(\nregr.rpart\n, bh.task, rdesc,\n extract = function(x) x$learner.model$variable.importance)\n#\n [Resample] cross-validation iter: 1\n#\n [Resample] cross-validation iter: 2\n#\n [Resample] cross-validation iter: 3\n#\n [Resample] Result: mse.test.mean=30.3\nr$extract\n#\n [[1]]\n#\n rm lstat crim indus age ptratio \n#\n 15228.2872 10742.2277 3893.2744 3651.6232 2601.5262 2551.8492 \n#\n dis nox rad tax zn \n#\n 2498.2748 2419.5269 1014.2609 743.3742 308.8209 \n#\n \n#\n [[2]]\n#\n lstat nox age indus crim rm \n#\n 15725.19021 9323.20270 8474.23077 8358.67000 8251.74446 7332.59637 \n#\n zn dis tax rad ptratio b \n#\n 6151.29577 2741.12074 2055.67537 1216.01398 634.78381 71.00088 \n#\n \n#\n [[3]]\n#\n rm lstat age ptratio nox dis \n#\n 
15890.9279 13262.3672 4296.4175 3678.6651 3668.4944 3512.2753 \n#\n crim tax indus zn b rad \n#\n 3474.5883 2844.9918 1437.7900 1284.4714 578.6932 496.2382\n\n\n\n\nResample descriptions and resample instances\n\n\nAs shown above, the function \nmakeResampleDesc\n is used to specify the resampling strategy.\n\n\nrdesc = makeResampleDesc(\nCV\n, iters = 3)\nstr(rdesc)\n#\n List of 4\n#\n $ id : chr \ncross-validation\n\n#\n $ iters : int 3\n#\n $ predict : chr \ntest\n\n#\n $ stratify: logi FALSE\n#\n - attr(*, \nclass\n)= chr [1:2] \nCVDesc\n \nResampleDesc\n\n\n\n\n\nThe result \nrdesc\nis an object of class \nResampleDesc\n and contains,\nas the name implies, a description of the resampling strategy.\nIn principle, this is an instruction for drawing training and test sets including\nthe necessary parameters like the number of iterations, the sizes of the training and test\nsets etc.\n\n\nBased on this description, the data set is randomly partitioned into multiple training and\ntest sets.\nFor each iteration, we get a set of index vectors indicating the training and test examples.\nThese are stored in a \nResampleInstance\n.\n\n\nIf a \nResampleDesc\n is passed to \nresample\n, it is instantiated internally.\nNaturally, it is also possible to pass a \nResampleInstance\n directly.\n\n\nA \nResampleInstance\n can be created through the function\n\nmakeResampleInstance\n given a \nResampleDesc\n and either the size of\nthe data set at hand or the \nTask\n.\nIt basically performs the random drawing of indices to separate the data into training and\ntest sets according to the description.\n\n\n## Create a resample instance based an a task\nrin = makeResampleInstance(rdesc, task = iris.task)\nrin\n#\n Resample instance for 150 cases.\n#\n Resample description: cross-validation with 3 iterations.\n#\n Predict: test\n#\n Stratification: FALSE\n\n## Create a resample instance given the size of the data set\nrin = makeResampleInstance(rdesc, size = nrow(iris))\nstr(rin)\n#\n List of 5\n#\n $ desc :List of 4\n#\n ..$ id : chr \ncross-validation\n\n#\n ..$ iters : int 3\n#\n ..$ predict : chr \ntest\n\n#\n ..$ stratify: logi FALSE\n#\n ..- attr(*, \nclass\n)= chr [1:2] \nCVDesc\n \nResampleDesc\n\n#\n $ size : int 150\n#\n $ train.inds:List of 3\n#\n ..$ : int [1:100] 36 81 6 82 120 110 118 132 105 61 ...\n#\n ..$ : int [1:100] 6 119 120 110 121 118 99 100 29 127 ...\n#\n ..$ : int [1:100] 36 81 82 119 121 99 132 105 61 115 ...\n#\n $ test.inds :List of 3\n#\n ..$ : int [1:50] 2 3 4 5 7 9 11 16 22 24 ...\n#\n ..$ : int [1:50] 8 12 17 19 20 23 25 27 32 33 ...\n#\n ..$ : int [1:50] 1 6 10 13 14 15 18 21 29 31 ...\n#\n $ group : Factor w/ 0 levels: \n#\n - attr(*, \nclass\n)= chr \nResampleInstance\n\n\n## Access the indices of the training observations in iteration 3\nrin$train.inds[[3]]\n#\n [1] 36 81 82 119 121 99 132 105 61 115 17 42 4 71 5 79 30\n#\n [18] 113 138 19 150 77 58 92 114 133 8 109 33 145 22 111 97 24\n#\n [35] 7 44 3 20 134 96 16 43 149 9 46 32 139 87 2 11 52\n#\n [52] 86 40 141 142 72 54 48 83 64 90 112 148 129 137 116 143 69\n#\n [69] 84 25 80 37 38 75 130 126 135 107 146 26 12 98 55 124 60\n#\n [86] 63 117 23 67 73 28 106 76 50 144 59 47 102 56 27\n\n\n\n\nWhile having two separate objects, resample descriptions and instances as well as the \nresample\n\nfunction seems overly complicated, it has several advantages:\n\n\n\n\nResample instances allow for paired experiments, that is comparing the performance\n of several learners on exactly the same training and test sets.\n This is 
particularly useful if you want to add another method to a comparison experiment\n you already did.\n\n\n\n\nrdesc = makeResampleDesc(\nCV\n, iters = 3)\nrin = makeResampleInstance(rdesc, task = iris.task)\n\n## Calculate the performance of two learners based on the same resample instance\nr.lda = resample(\nclassif.lda\n, iris.task, rin, show.info = FALSE)\nr.rpart = resample(\nclassif.rpart\n, iris.task, rin, show.info = FALSE)\nr.lda$aggr\n#\n mmce.test.mean \n#\n 0.02666667\nr.rpart$aggr\n#\n mmce.test.mean \n#\n 0.06\n\n\n\n\n\n\nIt is easy to add other resampling methods later on. You can\n simply derive from the \nResampleInstance\n\n class, but you do not have to touch any methods that use the\n resampling strategy.\n\n\n\n\nAs mentioned above, when calling \nmakeResampleInstance\n the index sets are drawn randomly.\nMainly for \nholdout\n (\ntest sample\n) \nestimation\n you might want full control about the training\nand tests set and specify them manually.\nThis can be done using the function \nmakeFixedHoldoutInstance\n.\n\n\nrin = makeFixedHoldoutInstance(train.inds = 1:100, test.inds = 101:150, size = 150)\nrin\n#\n Resample instance for 150 cases.\n#\n Resample description: holdout with 0.67 split rate.\n#\n Predict: test\n#\n Stratification: FALSE\n\n\n\n\nAggregating performance values\n\n\nIn resampling we get (for each measure we wish to calculate) one performance\nvalue (on the test set, training set, or both) for each iteration.\nSubsequently, these are aggregated.\nAs mentioned above, mainly the mean over the performance values on the test data sets\n(\ntest.mean\n) is calculated.\n\n\nFor example, a 10-fold cross validation computes 10 values for the chosen\nperformance measure.\nThe aggregated value is the mean of these 10 numbers.\n\nmlr\n knows how to handle it because each \nMeasure\n knows how it is aggregated:\n\n\n## Mean misclassification error\nmmce$aggr\n#\n Aggregation function: test.mean\n\n## Root mean square error\nrmse$aggr\n#\n Aggregation function: test.rmse\n\n\n\n\nThe aggregation method of a \nMeasure\n can be changed via the function \nsetAggregation\n.\nSee the documentation of \naggregations\n for available methods.\n\n\nExample: Different measures and aggregations\n\n\ntest.median\n computes the median of the performance values on the test sets.\n\n\n## We use the mean error rate and the median of the true positive rates\nm1 = mmce\nm2 = setAggregation(tpr, test.median)\nrdesc = makeResampleDesc(\nCV\n, iters = 3)\nr = resample(\nclassif.rpart\n, sonar.task, rdesc, measures = list(m1, m2))\n#\n [Resample] cross-validation iter: 1\n#\n [Resample] cross-validation iter: 2\n#\n [Resample] cross-validation iter: 3\n#\n [Resample] Result: mmce.test.mean=0.293,tpr.test.median=0.735\nr$aggr\n#\n mmce.test.mean tpr.test.median \n#\n 0.2930987 0.7352941\n\n\n\n\nExample: Calculating the training error\n\n\nHere we calculate the mean misclassification error (\nmmce\n) on the training and the test\ndata sets. 
Note that we have to set \npredict = \"both\"\nwhen calling \nmakeResampleDesc\n\nin order to get predictions on both data sets, training and test.\n\n\nmmce.train.mean = setAggregation(mmce, train.mean)\nrdesc = makeResampleDesc(\nCV\n, iters = 3, predict = \nboth\n)\nr = resample(\nclassif.rpart\n, iris.task, rdesc, measures = list(mmce, mmce.train.mean))\n#\n [Resample] cross-validation iter: 1\n#\n [Resample] cross-validation iter: 2\n#\n [Resample] cross-validation iter: 3\n#\n [Resample] Result: mmce.test.mean=0.0467,mmce.train.mean=0.0367\nr$measures.train\n#\n iter mmce mmce\n#\n 1 1 0.04 0.04\n#\n 2 2 0.03 0.03\n#\n 3 3 0.04 0.04\nr$aggr\n#\n mmce.test.mean mmce.train.mean \n#\n 0.04666667 0.03666667\n\n\n\n\nExample: Bootstrap\n\n\nIn \nout-of-bag bootstrap estimation\n \nB\n new data sets \nD_1\n to \nD_B\n are drawn from the\ndata set \nD\n with replacement, each of the same size as \nD\n.\nIn the \ni\n-th iteration, \nD_i\n forms the training set, while the remaining elements from\n\nD\n, i.e., elements not in the training set, form the test set.\n\n\n\n\n\nThe variants \nb632\n and \nb632+\n calculate a convex combination of the training performance and\nthe out-of-bag bootstrap performance and thus require predictions on the training sets and an\nappropriate aggregation strategy.\n\n\nrdesc = makeResampleDesc(\nBootstrap\n, predict = \nboth\n, iters = 10)\nb632.mmce = setAggregation(mmce, b632)\nb632plus.mmce = setAggregation(mmce, b632plus)\nb632.mmce\n#\n Name: Mean misclassification error\n#\n Performance measure: mmce\n#\n Properties: classif,classif.multi,req.pred,req.truth\n#\n Minimize: TRUE\n#\n Best: 0; Worst: 1\n#\n Aggregated by: b632\n#\n Note:\n\nr = resample(\nclassif.rpart\n, iris.task, rdesc,\n measures = list(mmce, b632.mmce, b632plus.mmce), show.info = FALSE)\nhead(r$measures.train)\n#\n iter mmce mmce mmce\n#\n 1 1 0.026666667 0.026666667 0.026666667\n#\n 2 2 0.026666667 0.026666667 0.026666667\n#\n 3 3 0.006666667 0.006666667 0.006666667\n#\n 4 4 0.026666667 0.026666667 0.026666667\n#\n 5 5 0.033333333 0.033333333 0.033333333\n#\n 6 6 0.013333333 0.013333333 0.013333333\nr$aggr\n#\n mmce.test.mean mmce.b632 mmce.b632plus \n#\n 0.07051905 0.05389071 0.05496489\n\n\n\n\nConvenience functions\n\n\nWhen quickly trying out some learners, it can get tedious to write the \nR\n\ncode for generating a resample instance, setting the aggregation strategy and so\non. For this reason \nmlr\n provides some convenience functions for the\nfrequently used resampling strategies, for example \nholdout\n,\n\ncrossval\n or \nbootstrapB632\n. But note that you do not\nhave as much control and flexibility as when using \nresample\n with a resample\ndescription or instance.\n\n\nholdout(\nregr.lm\n, bh.task, measures = list(mse, mae))\ncrossval(\nclassif.lda\n, iris.task, iters = 3, measures = list(mmce, ber))", "title": "Resampling" }, { "location": "/resample/index.html#resampling", - "text": "In order to assess the performance of a learning algorithm, resampling\nstrategies are usually used.\nThe entire data set is split into (multiple) training and test sets.\nYou train a learner on each training set, predict on the corresponding test set (sometimes\non the training set as well) and calculate some performance measure.\nThen the individual performance values are aggregated, typically by calculating the mean.\nThere exist various different resampling strategies, for example\ncross-validation and bootstrap, to mention just two popular approaches. 
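To make this procedure concrete, the steps for a single training/test split can be written out by hand with mlr's train, predict and performance functions; the resampling machinery described below automates exactly this loop plus the final aggregation. A minimal sketch, where the 2/3 split is just an illustrative choice:

## One manual holdout iteration on the iris task
n = nrow(iris)
train.inds = sample(n, size = round(2/3 * n))
test.inds = setdiff(seq_len(n), train.inds)
mod = train("classif.rpart", iris.task, subset = train.inds)
pred = predict(mod, task = iris.task, subset = test.inds)
performance(pred, measures = mmce)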
If you want to read up further details, the paper Resampling Strategies for Model Assessment and Selection \nby Simon is proabably not a bad choice.\nBernd has also published a paper Resampling methods for meta-model validation with recommendations for evolutionary computation \nwhich contains detailed descriptions and lots of statistical background information on resampling methods. In mlr the resampling strategy can be chosen via the function makeResampleDesc .\nThe supported resampling strategies are: Cross-validation ( \"CV\" ), Leave-one-out cross-validation ( \"LOO\"\" ), Repeated cross-validation ( \"RepCV\" ), Out-of-bag bootstrap and other variants ( \"Bootstrap\" ), Subsampling, also called Monte-Carlo cross-validaton ( \"Subsample\" ), Holdout (training/test) ( \"Holdout\" ). The resample function evaluates the performance of a Learner using\nthe specified resampling strategy for a given machine learning Task . In the following example the performance of the Cox proportional hazards model on the lung data set is calculated using 3-fold cross-validation .\nGenerally, in K -fold cross-validation the data set D is partitioned into K subsets of\n(approximately) equal size.\nIn the i -th step of the K iterations, the i -th subset is\nused for testing, while the union of the remaining parts forms the training\nset.\nThe default performance measure in survival analysis is the concordance index ( cindex ). ## Specify the resampling strategy (3-fold cross-validation)\nrdesc = makeResampleDesc( CV , iters = 3)\n\n## Calculate the performance\nr = resample( surv.coxph , lung.task, rdesc)\n# [Resample] cross-validation iter: 1\n# [Resample] cross-validation iter: 2\n# [Resample] cross-validation iter: 3\n# [Resample] Result: cindex.test.mean=0.627\nr\n# Resample Result\n# Task: lung-example\n# Learner: surv.coxph\n# cindex.aggr: 0.63\n# cindex.mean: 0.63\n# cindex.sd: 0.05\n# Runtime: 0.167005\n## peak a little bit into r\nnames(r)\n# [1] learner.id task.id measures.train measures.test \n# [5] aggr pred models err.msgs \n# [9] extract runtime \nr$aggr\n# cindex.test.mean \n# 0.6271182\nr$measures.test\n# iter cindex\n# 1 1 0.5783027\n# 2 2 0.6324074\n# 3 3 0.6706444\nr$measures.train\n# iter cindex\n# 1 1 NA\n# 2 2 NA\n# 3 3 NA r$measures.test gives the value of the performance measure on the 3 individual test\ndata sets. r$aggr shows the aggregated performance value.\nIts name, \"cindex.test.mean\" , indicates the performance measure, cindex ,\nand the method used to aggregate the 3 individual performances. test.mean is the default method and, as the name implies, takes the mean over the\nperformances on the 3 test data sets.\nNo predictions on the training data sets were made and thus r$measures.train contains missing values. If predictions for the training set are required, too, set predict = \"train\" or predict = \"both\" \nin makeResampleDesc . This is necessary for some bootstrap methods ( b632 and b632+ ) and\nwe will see some examples later on. r$pred is an object of class ResamplePrediction .\nJust as a Prediction object (see the section on making predictions ) r$pred has an element called \"data\" which is a data.frame that contains the\npredictions and in case of a supervised learning problem the true values of the target\nvariable. 
head(r$pred$data)\n# id truth.time truth.event response iter set\n# 1 1 455 TRUE -0.4951788 1 test\n# 2 2 210 TRUE 0.9573824 1 test\n# 3 4 310 TRUE 0.8069059 1 test\n# 4 10 613 TRUE 0.1918188 1 test\n# 5 12 61 TRUE 0.6638736 1 test\n# 6 14 81 TRUE -0.1873917 1 test The columns iter and set indicate the resampling iteration and\nif an individual prediction was made on the test or the training data set. In the above example the performance measure is the concordance index ( cindex ).\nOf course, it is possible to compute multiple performance measures at once by\npassing a list of measures\n(see also the previous section on evaluating learner performance ). In the following we estimate the Dunn index ( dunn ), the Davies-Bouldin cluster\nseparation measure ( db ), and the time for training the learner ( timetrain )\nby subsampling with 5 iterations.\nIn each iteration the data set D is randomly partitioned into a\ntraining and a test set according to a given percentage, e.g., 2/3\ntraining and 1/3 test set. If there is just one iteration, the strategy\nis commonly called holdout or test sample estimation . ## cluster iris feature data\ntask = makeClusterTask(data = iris[,-5])\n## Subsampling with 5 iterations and default split 2/3\nrdesc = makeResampleDesc( Subsample , iters = 5)\n## Subsampling with 5 iterations and 4/5 training data\nrdesc = makeResampleDesc( Subsample , iters = 5, split = 4/5)\n\n## Calculate the three performance measures\nr = resample( cluster.kmeans , task, rdesc, measures = list(dunn, db, timetrain))\n# [Resample] subsampling iter: 1\n# [Resample] subsampling iter: 2\n# [Resample] subsampling iter: 3\n# [Resample] subsampling iter: 4\n# [Resample] subsampling iter: 5\n# [Resample] Result: dunn.test.mean=0.274,db.test.mean=0.51,timetrain.test.mean=0.001\nr$aggr\n# dunn.test.mean db.test.mean timetrain.test.mean \n# 0.2738893 0.5103655 0.0010000", + "text": "In order to assess the performance of a learning algorithm, resampling\nstrategies are usually used.\nThe entire data set is split into (multiple) training and test sets.\nYou train a learner on each training set, predict on the corresponding test set (sometimes\non the training set as well) and calculate some performance measure.\nThen the individual performance values are aggregated, typically by calculating the mean.\nThere exist various different resampling strategies, for example\ncross-validation and bootstrap, to mention just two popular approaches. If you want to read up further details, the paper Resampling Strategies for Model Assessment and Selection \nby Simon is proabably not a bad choice.\nBernd has also published a paper Resampling methods for meta-model validation with recommendations for evolutionary computation \nwhich contains detailed descriptions and lots of statistical background information on resampling methods. In mlr the resampling strategy can be chosen via the function makeResampleDesc .\nThe supported resampling strategies are: Cross-validation ( \"CV\" ), Leave-one-out cross-validation ( \"LOO\"\" ), Repeated cross-validation ( \"RepCV\" ), Out-of-bag bootstrap and other variants ( \"Bootstrap\" ), Subsampling, also called Monte-Carlo cross-validaton ( \"Subsample\" ), Holdout (training/test) ( \"Holdout\" ). The resample function evaluates the performance of a Learner using\nthe specified resampling strategy for a given machine learning Task . 
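Any of the resampling strategies listed above can be plugged in at this point. As a brief sketch, repeated cross-validation and leave-one-out only differ in the description passed to resample (the reps and folds argument names for "RepCV" are assumed here):

## 2 times repeated 5-fold cross-validation
rdesc = makeResampleDesc("RepCV", reps = 2, folds = 5)
r = resample("classif.rpart", iris.task, rdesc)

## Leave-one-out cross-validation needs no further arguments
rdesc = makeResampleDesc("LOO")
r = resample("classif.rpart", iris.task, rdesc)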
In the following example the performance of the Cox proportional hazards model on the lung data set is calculated using 3-fold cross-validation .\nGenerally, in K -fold cross-validation the data set D is partitioned into K subsets of\n(approximately) equal size.\nIn the i -th step of the K iterations, the i -th subset is\nused for testing, while the union of the remaining parts forms the training\nset.\nThe default performance measure in survival analysis is the concordance index ( cindex ). ## Specify the resampling strategy (3-fold cross-validation)\nrdesc = makeResampleDesc( CV , iters = 3)\n\n## Calculate the performance\nr = resample( surv.coxph , lung.task, rdesc)\n# [Resample] cross-validation iter: 1\n# [Resample] cross-validation iter: 2\n# [Resample] cross-validation iter: 3\n# [Resample] Result: cindex.test.mean=0.627\nr\n# Resample Result\n# Task: lung-example\n# Learner: surv.coxph\n# cindex.aggr: 0.63\n# cindex.mean: 0.63\n# cindex.sd: 0.05\n# Runtime: 0.126201\n## peak a little bit into r\nnames(r)\n# [1] learner.id task.id measures.train measures.test \n# [5] aggr pred models err.msgs \n# [9] extract runtime \nr$aggr\n# cindex.test.mean \n# 0.6271182\nr$measures.test\n# iter cindex\n# 1 1 0.5783027\n# 2 2 0.6324074\n# 3 3 0.6706444\nr$measures.train\n# iter cindex\n# 1 1 NA\n# 2 2 NA\n# 3 3 NA r$measures.test gives the value of the performance measure on the 3 individual test\ndata sets. r$aggr shows the aggregated performance value.\nIts name, \"cindex.test.mean\" , indicates the performance measure, cindex ,\nand the method used to aggregate the 3 individual performances. test.mean is the default method and, as the name implies, takes the mean over the\nperformances on the 3 test data sets.\nNo predictions on the training data sets were made and thus r$measures.train contains missing values. If predictions for the training set are required, too, set predict = \"train\" or predict = \"both\" \nin makeResampleDesc . This is necessary for some bootstrap methods ( b632 and b632+ ) and\nwe will see some examples later on. r$pred is an object of class ResamplePrediction .\nJust as a Prediction object (see the section on making predictions ) r$pred has an element called \"data\" which is a data.frame that contains the\npredictions and in case of a supervised learning problem the true values of the target\nvariable. head(r$pred$data)\n# id truth.time truth.event response iter set\n# 1 1 455 TRUE -0.4951788 1 test\n# 2 2 210 TRUE 0.9573824 1 test\n# 3 4 310 TRUE 0.8069059 1 test\n# 4 10 613 TRUE 0.1918188 1 test\n# 5 12 61 TRUE 0.6638736 1 test\n# 6 14 81 TRUE -0.1873917 1 test The columns iter and set indicate the resampling iteration and\nif an individual prediction was made on the test or the training data set. In the above example the performance measure is the concordance index ( cindex ).\nOf course, it is possible to compute multiple performance measures at once by\npassing a list of measures\n(see also the previous section on evaluating learner performance ). In the following we estimate the Dunn index ( dunn ), the Davies-Bouldin cluster\nseparation measure ( db ), and the time for training the learner ( timetrain )\nby subsampling with 5 iterations.\nIn each iteration the data set D is randomly partitioned into a\ntraining and a test set according to a given percentage, e.g., 2/3\ntraining and 1/3 test set. If there is just one iteration, the strategy\nis commonly called holdout or test sample estimation . 
## cluster iris feature data\ntask = makeClusterTask(data = iris[,-5])\n## Subsampling with 5 iterations and default split 2/3\nrdesc = makeResampleDesc( Subsample , iters = 5)\n## Subsampling with 5 iterations and 4/5 training data\nrdesc = makeResampleDesc( Subsample , iters = 5, split = 4/5)\n\n## Calculate the three performance measures\nr = resample( cluster.kmeans , task, rdesc, measures = list(dunn, db, timetrain))\n# [Resample] subsampling iter: 1\n# [Resample] subsampling iter: 2\n# [Resample] subsampling iter: 3\n# [Resample] subsampling iter: 4\n# [Resample] subsampling iter: 5\n# [Resample] Result: dunn.test.mean=0.274,db.test.mean=0.51,timetrain.test.mean=0.001\nr$aggr\n# dunn.test.mean db.test.mean timetrain.test.mean \n# 0.2738893 0.5103655 0.0010000", "title": "Resampling" }, { @@ -192,7 +192,7 @@ }, { "location": "/benchmark_experiments/index.html", - "text": "Benchmark Experiments\n\n\nIn a benchmark experiment different learning methods are applied to one or several data sets\nwith the aim to compare and rank the algorithms with respect to one or more\nperformance measures.\n\n\nIn \nmlr\n a benchmark experiment can be conducted by calling function \nbenchmark\n on\na \nlist\n of \nLearner\ns and a \nlist\n of \nTask\ns.\n\nbenchmark\n basically executes \nresample\n for each combination of \nLearner\n\nand \nTask\n.\nYou can specify an individual resampling strategy for each \nTask\n and select one or\nmultiple performance measures to be calculated.\n\n\nExample: One task, two learners, prediction on a single test set\n\n\nWe start with a small example. Two learners, \nlinear discriminant analysis\n and\n\nclassification trees\n, are applied to one classification problem (\nsonar.task\n).\nAs resampling strategy we choose \n\"Holdout\"\n.\nThe performance is thus calculated on one single randomly sampled test data set.\n\n\nIn the example below we create a resample description (\nResampleDesc\n),\nwhich is automatically instantiated by \nbenchmark\n.\nThe instantiation is done only once for each \nTask\n, i.e., the same resample instance\n(\nResampleInstance\n) is used for all learners.\nIt is also possible to directly pass a \nResampleInstance\n.\n\n\nIf you would like to use a \nfixed test data set\n instead of a randomly selected one, you can\ncreate a suitable \nResampleInstance\n through function\n\nmakeFixedHoldoutInstance\n.\n\n\n## Two learners to be compared\nlrns = list(makeLearner(\nclassif.lda\n), makeLearner(\nclassif.rpart\n))\n\n## Choose the resampling strategy\nrdesc = makeResampleDesc(\nHoldout\n)\n\n## Conduct the benchmark experiment\nres = benchmark(lrns, sonar.task, rdesc)\n#\n Task: Sonar-example, Learner: classif.lda\n#\n [Resample] holdout iter: 1\n#\n [Resample] Result: mmce.test.mean= 0.3\n#\n Task: Sonar-example, Learner: classif.rpart\n#\n [Resample] holdout iter: 1\n#\n [Resample] Result: mmce.test.mean=0.286\n\nres\n#\n task.id learner.id mmce.test.mean\n#\n 1 Sonar-example classif.lda 0.3000000\n#\n 2 Sonar-example classif.rpart 0.2857143\n\n\n\n\nIn the printed table every row corresponds to one pair of \nTask\n and \nLearner\n.\nThe entries show the mean misclassification error (\nmmce\n), the default performance\nmeasure for classification, on the test data set.\n\n\nThe result \nres\n is an object of class \nBenchmarkResult\n. 
Basically, it contains a \nlist\n\nof lists of \nResampleResult\n objects, first ordered by \nTask\n and then by \nLearner\n.\n\n\nmlr\n provides several accessor functions, named \ngetBMR\nwhat_to_extract\n, that permit\nto retrieve information for further analyses. This includes for example the performances\nor predictions of the learning algorithms under consideration.\n\n\nLet's have a look at the benchmark result above.\n\ngetBMRPerformances\n returns individual performances in resampling runs, while\n\ngetBMRAggrPerformances\n gives the aggregated values.\n\n\ngetBMRPerformances(res)\n#\n $`Sonar-example`\n#\n $`Sonar-example`$classif.lda\n#\n iter mmce\n#\n 1 1 0.3\n#\n \n#\n $`Sonar-example`$classif.rpart\n#\n iter mmce\n#\n 1 1 0.2857143\n\ngetBMRAggrPerformances(res)\n#\n $`Sonar-example`\n#\n $`Sonar-example`$classif.lda\n#\n mmce.test.mean \n#\n 0.3 \n#\n \n#\n $`Sonar-example`$classif.rpart\n#\n mmce.test.mean \n#\n 0.2857143\n\n\n\n\nSince we used holdout as resampling strategy, individual and aggregated performance values\ncoincide.\n\n\nOften it is more convenient to work with \ndata.frame\ns. You can easily\nconvert the result structure by setting \nas.df = TRUE\n.\n\n\ngetBMRPerformances(res, as.df = TRUE)\n#\n task.id learner.id iter mmce\n#\n 1 Sonar-example classif.lda 1 0.3000000\n#\n 2 Sonar-example classif.rpart 1 0.2857143\n\ngetBMRAggrPerformances(res, as.df = TRUE)\n#\n task.id learner.id mmce.test.mean\n#\n 1 Sonar-example classif.lda 0.3000000\n#\n 2 Sonar-example classif.rpart 0.2857143\n\n\n\n\nFunction \ngetBMRPredictions\n returns the predictions.\nPer default, you get a \nlist\n of lists of \nResamplePrediction\n objects.\nIn most cases you might prefer the \ndata.frame\n version.\n\n\ngetBMRPredictions(res)\n#\n $`Sonar-example`\n#\n $`Sonar-example`$classif.lda\n#\n Resampled Prediction for:\n#\n Resample description: holdout with 0.67 split rate.\n#\n Predict: test\n#\n Stratification: FALSE\n#\n predict.type: response\n#\n threshold: \n#\n time (mean): 0.01\n#\n id truth response iter set\n#\n 180 180 M M 1 test\n#\n 100 100 M R 1 test\n#\n 53 53 R M 1 test\n#\n 89 89 R R 1 test\n#\n 92 92 R M 1 test\n#\n 11 11 R R 1 test\n#\n \n#\n $`Sonar-example`$classif.rpart\n#\n Resampled Prediction for:\n#\n Resample description: holdout with 0.67 split rate.\n#\n Predict: test\n#\n Stratification: FALSE\n#\n predict.type: response\n#\n threshold: \n#\n time (mean): 0.01\n#\n id truth response iter set\n#\n 180 180 M M 1 test\n#\n 100 100 M M 1 test\n#\n 53 53 R R 1 test\n#\n 89 89 R M 1 test\n#\n 92 92 R M 1 test\n#\n 11 11 R R 1 test\n\nhead(getBMRPredictions(res, as.df = TRUE))\n#\n task.id learner.id id truth response iter set\n#\n 1 Sonar-example classif.lda 180 M M 1 test\n#\n 2 Sonar-example classif.lda 100 M R 1 test\n#\n 3 Sonar-example classif.lda 53 R M 1 test\n#\n 4 Sonar-example classif.lda 89 R R 1 test\n#\n 5 Sonar-example classif.lda 92 R M 1 test\n#\n 6 Sonar-example classif.lda 11 R R 1 test\n\n\n\n\nIt is also easily possible to access results for certain learners or tasks via their\nIDs. 
For this purpose many \"getter\" functions have a \nlearner.ids\n and a \ntask.ids\n argument.\n\n\nhead(getBMRPredictions(res, learner.ids = \nclassif.rpart\n, as.df = TRUE))\n#\n task.id learner.id id truth response iter set\n#\n 180 Sonar-example classif.rpart 180 M M 1 test\n#\n 100 Sonar-example classif.rpart 100 M M 1 test\n#\n 53 Sonar-example classif.rpart 53 R R 1 test\n#\n 89 Sonar-example classif.rpart 89 R M 1 test\n#\n 92 Sonar-example classif.rpart 92 R M 1 test\n#\n 11 Sonar-example classif.rpart 11 R R 1 test\n\n\n\n\nAs you might recall, you can set the IDs of learners and tasks via the \nid\n option of\n\nmakeLearner\n and \nmake*Task\n.\nMoreover, you can conveniently change the ID of a \nLearner\n via function \nsetId\n.\n\n\nThe IDs of all \nLearner\ns, \nTask\ns and \nMeasure\ns in a benchmark\nexperiment can be retrieved as follows:\n\n\ngetBMRTaskIds(res)\n#\n [1] \nSonar-example\n\n\ngetBMRLearnerIds(res)\n#\n [1] \nclassif.lda\n \nclassif.rpart\n\n\ngetBMRMeasureIds(res)\n#\n [1] \nmmce\n\n\n\n\n\nMoreover, you can extract the employed \nLearner\ns and \nMeasure\ns.\n\n\ngetBMRLearners(res)\n#\n $classif.lda\n#\n Learner classif.lda from package MASS\n#\n Type: classif\n#\n Name: Linear Discriminant Analysis; Short name: lda\n#\n Class: classif.lda\n#\n Properties: twoclass,multiclass,numerics,factors,prob\n#\n Predict-Type: response\n#\n Hyperparameters: \n#\n \n#\n \n#\n $classif.rpart\n#\n Learner classif.rpart from package rpart\n#\n Type: classif\n#\n Name: Decision Tree; Short name: rpart\n#\n Class: classif.rpart\n#\n Properties: twoclass,multiclass,missings,numerics,factors,ordered,prob,weights\n#\n Predict-Type: response\n#\n Hyperparameters: xval=0\n\ngetBMRMeasures(res)\n#\n [[1]]\n#\n Name: Mean misclassification error\n#\n Performance measure: mmce\n#\n Properties: classif,classif.multi,req.pred,req.truth\n#\n Minimize: TRUE\n#\n Best: 0; Worst: 1\n#\n Aggregated by: test.mean\n#\n Note:\n\n\n\n\nBenchmark analysis and visualization\n\n\nExample: Compare lda, rpart and random Forest\n\n\nAs an introductory example, we compare three learners (\nlda\n, \nrpart\n\nand \nrandom forest\n).\nSince the default learner IDs are a little long, we choose shorter names.\n\n\nAs comparing their performance on only one task does not provide a generally valid answer, the\ncomparison is performed on several tasks, using \nmmce\n as primary performance\nmeasure.\nPackage \nmlbench\n provides additional \nTask\ns, that we will use for our validation.\n\n\nFor both tasks 10-fold cross-validation is chosen as resampling strategy.\nThis is achieved by passing a single resample description to \nbenchmark\n, which is then\ninstantiated automatically once for each \nTask\n. 
This way, the same instance is used for all\nlearners applied to one task.\n\n\nIt is also possible to choose a different resampling strategy for each \nTask\n by passing a\n\nlist\n of the same length as the number of tasks that can contain both\n\nResampleDesc\ns and \nResampleInstance\ns.\n\n\nIn this example additional to the mean misclassification error (\nmmce\n),\nthe balanced error rate (\nber\n) and accuracy (\nacc\n) are calculated.\n\n\n## Create a list of learners\nlrns = list(\n makeLearner(\nclassif.lda\n, id = \nlda\n),\n makeLearner(\nclassif.rpart\n, id = \nrpart\n),\n makeLearner(\nclassif.randomForest\n, id = \nrandomForest\n)\n)\n\n## Get additional Tasks from package mlbench\nring.task = convertMLBenchObjToTask(\nmlbench.ringnorm\n, n = 600)\nwave.task = convertMLBenchObjToTask(\nmlbench.waveform\n, n = 600)\n\ntasks = list(iris.task, sonar.task, pid.task, ring.task, wave.task)\nrdesc = makeResampleDesc(\nCV\n, iters = 10)\nmeas = list(mmce, ber, acc)\nres = benchmark(lrns, tasks, rdesc, meas, show.info = FALSE)\nres\n#\n task.id learner.id mmce.test.mean ber.test.mean\n#\n 1 iris-example lda 0.02000000 0.02222222\n#\n 2 iris-example rpart 0.08000000 0.07555556\n#\n 3 iris-example randomForest 0.05333333 0.05250000\n#\n 4 mlbench.ringnorm lda 0.35000000 0.34605671\n#\n 5 mlbench.ringnorm rpart 0.17333333 0.17313632\n#\n 6 mlbench.ringnorm randomForest 0.05833333 0.05806121\n#\n 7 mlbench.waveform lda 0.19000000 0.18257244\n#\n 8 mlbench.waveform rpart 0.28833333 0.28765247\n#\n 9 mlbench.waveform randomForest 0.16500000 0.16306057\n#\n 10 PimaIndiansDiabetes-example lda 0.22778537 0.27148893\n#\n 11 PimaIndiansDiabetes-example rpart 0.25133288 0.28967870\n#\n 12 PimaIndiansDiabetes-example randomForest 0.23685919 0.27543146\n#\n 13 Sonar-example lda 0.24619048 0.23986694\n#\n 14 Sonar-example rpart 0.30785714 0.31153361\n#\n 15 Sonar-example randomForest 0.17785714 0.17442696\n#\n acc.test.mean\n#\n 1 0.9800000\n#\n 2 0.9200000\n#\n 3 0.9466667\n#\n 4 0.6500000\n#\n 5 0.8266667\n#\n 6 0.9416667\n#\n 7 0.8100000\n#\n 8 0.7116667\n#\n 9 0.8350000\n#\n 10 0.7722146\n#\n 11 0.7486671\n#\n 12 0.7631408\n#\n 13 0.7538095\n#\n 14 0.6921429\n#\n 15 0.8221429\n\n\n\n\nThe table below shows the three selected \nperformance measures\n across all learners\nand tasks.\nIt can be easily used for plotting as well, but \nmlr\n also features some integrated plots\nthat operate on \nBenchmarkResult\ns.\n\n\nperf = getBMRPerformances(res, as.df = TRUE)\nhead(perf)\n#\n task.id learner.id iter mmce ber acc\n#\n 1 iris-example lda 1 0.0000000 0.0000000 1.0000000\n#\n 2 iris-example lda 2 0.1333333 0.1666667 0.8666667\n#\n 3 iris-example lda 3 0.0000000 0.0000000 1.0000000\n#\n 4 iris-example lda 4 0.0000000 0.0000000 1.0000000\n#\n 5 iris-example lda 5 0.0000000 0.0000000 1.0000000\n#\n 6 iris-example lda 6 0.0000000 0.0000000 1.0000000\n\n\n\n\nA closer look at the result reveals that the \nrandom Forest\n\noutperforms the \nclassification tree\n in every instance, while\n\nlinear discriminant analysis\n performs better than \nrpart\n most\nof the time. Additionally \nlda\n sometimes even beats the random forest implementation.\nWith increasing size of such \nbenchmark\n experiments, those tables become almost unreadable\nand hard to comprehend. 
In order to gain deeper insight into those results, several plotting\nstrategies are implemented in \nmlr\n.\n\n\nData generation and plotting\n\n\nAs explained in the \nvisualization\n section, \nmlr\n provides\n\ndata generation\n functions, e.g., \ngenerateRankMatrixAsBarData()\n\nthat can be used within \nplot functions\n like \nplotRankMatrixAsBar()\n.\nThe data generated by such data generation functions contains all information required for\nplotting later on.\nThis design additionally enables the user to use the generated data in order to create custom\nplots using other packages such as \nlattice\n, \nggplot2\n or the base \nplot\n\nfunctions.\nPlots are produced using \nggplot2\n or \nggvis\n, as these packages enable further customization,\nsuch as renaming plot elements or changing colors.\n\n\nAn example that demonstrates how a plot can be extended is given below.\nAs a first result, we might want to compare \nLearner\n performance across\n\nresample\n iterations, in order to get insight on how the individual\n\nLearner\ns perform within the different tasks. Additionally we color the boxes\naccording to the learner in order to facilitate distinction by adding additional\n\naesthetics\n.\n\n\nplotBenchmarkResult(res, measure = mmce, pretty.names = FALSE) +\n aes(color = learner.id) + facet_wrap(~ task.id, nrow = 2)\n\n\n\n\n \n\n\nAs no data transformation or computation was conducted for this plot, and all data used for\nplotting comes from \ngetBMRPerformances\n, no data generation function is needed.\n\n\nTaking into account the properties of such resample results, such as an eventual lack of\ncommensurability, and the hypothesis tests that have to be employed for such comparisons, a\n\nrank\n structure for plots might provide valuable additional insight.\n\n\nAlthough \nBenchmarkResult\ns usually provide a variety of measures, subsequent analysis is\nbest performed on one or few selected measures. In this case\n\nmean misclassification error\n is chosen as it is easy to compute and understand.\nIn order to break down the benchmark result, we can either display a\n\nrank matrix\n or plot it as a \nbar plot\n.\nAttention: Due to the plotting structure, ties that might occur in the aggregated\n\nBenchmarkResult\n are broken randomly.\n\n\n## Convert to a rank matrix\nm = convertBMRToRankMatrix(res, mmce)\nm\n#\n iris-example mlbench.ringnorm mlbench.waveform\n#\n lda 1 3 2\n#\n rpart 3 2 3\n#\n randomForest 2 1 1\n#\n PimaIndiansDiabetes-example Sonar-example\n#\n lda 1 2\n#\n rpart 3 3\n#\n randomForest 2 1\n\n## Plot the rank matrix\ng = generateRankMatrixAsBarData(res, mmce, pos = \ntile\n)\nplotRankMatrixAsBar(g)\n\n\n\n\n \n\n\nAs an additional visualization, we can compare the performances using a \nbenchmark summary plot\n.\nThis plot displays the \nLearner\ns' performance in comparison to the best or worst\nperforming learner within this task. It facilitates finding tasks where learners perform similarly.\nAdditionally it might allow to uncover structures within the experiment that might be hidden\notherwise.\n\n\ng = generateBenchmarkSummaryData(res, mmce, fill = \nbest\n)\nplotBenchmarkSummary(g)\n\n\n\n\n \n\n\nComparing learners using hypothesis tests\n\n\nMany researchers feel the need to display an algorithm's superiority by employing some sort\nof hypothesis testing. 
As non-parametric tests seem better suited for such benchmark results,\nthe tests provided in \nmlr\n are the \nOverall Friedman test\n and the\n\nFriedman-Nemenyi post hoc test\n.\n\n\nWhile the ad hoc \nFriedman test\n based on \nfriedman.test\n\nfrom the \nstats\n package tests the hypothesis whether there is a significant difference\nbetween the employed learners, the post hoc \nFriedman-Nemenyi test\n tests\nfor significant differences between all pairs of learners. \nNon parametric\n tests often\nhave less power than their \nparametric\n counterparts, but fewer assumptions about underlying\ndistributions have to be made. This often means many \ndata sets\n are needed in order to be\nable to show significant differences at reasonable significance levels.\n\n\nIn our example, we want to compare the three \nlearners\n on the selected data sets.\nFirst we want to test the hypothesis whether there is a difference between the learners.\n\n\nfriedmanTestBMR(res)\n#\n \n#\n Friedman rank sum test\n#\n \n#\n data: x and learner.id and task.id\n#\n Friedman chi-squared = 5.2, df = 2, p-value = 0.07427\n\n\n\n\nIn order to keep the computation time for this tutorial small, the \nLearner\ns\nare only evaluated on 5 tasks. This also means that we operate on a relatively low significance\nlevel \n\\alpha = 0.1\n.\nAs we can reject the null hypothesis of the Friedman test at a reasonable significance level\nwe might now want to test where these differences lie exactly.\n\n\nfriedmanPostHocTestBMR(res, p.value = 0.1)\n#\n \n#\n Pairwise comparisons using Nemenyi multiple comparison test \n#\n with q approximation for unreplicated blocked data \n#\n \n#\n data: x and learner.id and task.id \n#\n \n#\n lda rpart\n#\n rpart 0.254 - \n#\n randomForest 0.802 0.069\n#\n \n#\n P value adjustment method: none\n\n\n\n\nAt this level of significance, we can conclude that there is a significant\ndifference between the decision tree (\nrpart\n) and the\n\nrandom Forest\n.\n\n\nCritical differences diagram\n\n\nIn order to visualize differently performing learners, a\n\ncritical differences diagram\n can be plotted, using either the\nNemenyi test (\ntest = \"nemenyi\"\n) or the Bonferroni-Dunn test (\ntest = \"bd\"\n).\n\n\nInterpretation:\n\nLearners are drawn on the x-axis according to their mean rank.\n\n\n\n\nChoosing \ntest = \"nemenyi\"\n compares all pairs of \nLearner\ns to each other, thus\n the output is groups of not significantly different learners. The diagram connects all groups\n of learners where the mean ranks do not differ by more than the critical differences. Learners\n that are not connected by a bar are significantly different, and the learner(s) with the\n lower mean rank can be considered \"better\" at the chosen significance level.\n\n\nChoosing \ntest = \"bd\"\n performs a \npairwise comparison with a baseline\n. An interval which\n extends by the given \ncritical difference\n in both directions is drawn around the\n \nLearner\n chosen as baseline, though only comparisons with the baseline are\n possible. 
All learners within the interval are not significantly different, while the\n baseline can be considered better or worse than a given learner which is outside of the\n interval.\n\n\n\n\nCalculation:\n\nThe critical difference \nCD\n can be calculated by\n\nCD = q_\\alpha \\cdot \\sqrt{\\frac{k(k+1)}{6N}},\n\nwhere \nq_\\alpha\n comes from the studentized range statistic divided by \n\\sqrt{2}\n.\nFor details see \nDemsar (2006)\n.\n\n\n## Nemenyi test\ng = generateCritDifferencesData(res, p.value = 0.1, test = \nnemenyi\n)\nplotCritDifferences(g) + coord_cartesian(xlim = c(-1,5), ylim = c(0,2))\n\n\n\n\n \n\n\n## Bonferroni-Dunn test\ng = generateCritDifferencesData(res, p.value = 0.1, test = \nbd\n, baseline = \nrandomForest\n)\nplotCritDifferences(g) + coord_cartesian(xlim = c(-1,5), ylim = c(0,2))\n\n\n\n\n \n\n\nCustom plots\n\n\nThis section is dedicated to some custom plots not yet integrated into \nmlr\n, that might\nbe important to the researcher.\nInstead of just comparing mean performance values it is generally preferable to have a look\nat the distribution of performance values obtained in individual resampling iterations.\nThe individual performances on the 10 folds for every task and learner are retrieved below.\n\n\nperf = getBMRPerformances(res, as.df = TRUE)\n\n## Plot density for two examples\nqplot(mmce, colour = learner.id, facets = . ~ task.id,\n data = perf[perf$task.id %in% c(\niris-example\n, \nSonar-example\n),], geom = \ndensity\n)\n\n\n\n\n \n\n\nIn order to plot both performance measures in parallel, \nperf\n is reshaped to long format.\nBelow we generate grouped boxplots and densityplots for some tasks, learners and measures.\n\n\n## Compare ber and mmce\ndf = reshape2::melt(perf, id.vars = c(\ntask.id\n, \nlearner.id\n, \niter\n),\n measure.vars = c(\nacc\n, \nmmce\n, \nber\n))\ndf = df[df$variable != \nacc\n,]\nhead(df)\n#\n task.id learner.id iter variable value\n#\n 151 iris-example lda 1 mmce 0.0000000\n#\n 152 iris-example lda 2 mmce 0.1333333\n#\n 153 iris-example lda 3 mmce 0.0000000\n#\n 154 iris-example lda 4 mmce 0.0000000\n#\n 155 iris-example lda 5 mmce 0.0000000\n#\n 156 iris-example lda 6 mmce 0.0000000\n\nqplot(variable, value, data = df, colour = learner.id, geom = \nboxplot\n,\n xlab = \nmeasure\n, ylab = \nperformance\n) + facet_wrap(~ task.id, nrow = 2)\n\n\n\n\n \n\n\nIt might also be useful to assess if learner performances in single resampling iterations,\ni.e., in one fold, are related.\nThis might help to gain further insight, for example by having a closer look at bootstrap\nsamples where one learner performs exceptionally well while another one is fairly bad.\nMoreover, this might be useful for the construction of ensembles of learning algorithms.\nBelow, function \nggpairs\n from package \nGGally\n is used to generate a scatterplot\nmatrix of mean misclassification errors (\nmmce\n) on the \nSonar\n\ndata set.\n\n\nperf = getBMRPerformances(res, task.id = \nSonar-example\n, as.df = TRUE)\ndf = reshape(perf, direction = \nwide\n, v.names = c(\nacc\n, \nmmce\n, \nber\n), timevar = \nlearner.id\n,\n idvar = c(\ntask.id\n, \niter\n))\n\nhead(df)\n#\n task.id iter acc.lda mmce.lda ber.lda acc.rpart mmce.rpart\n#\n 1 Sonar-example 1 0.7142857 0.2857143 0.2777778 0.7142857 0.2857143\n#\n 2 Sonar-example 2 0.7619048 0.2380952 0.2222222 0.7619048 0.2380952\n#\n 3 Sonar-example 3 0.6666667 0.3333333 0.3194444 0.7142857 0.2857143\n#\n 4 Sonar-example 4 0.7619048 0.2380952 0.2163462 0.6666667 0.3333333\n#\n 5 Sonar-example 5 0.8571429 0.1428571 
0.1454545 0.7142857 0.2857143\n#\n 6 Sonar-example 6 0.6000000 0.4000000 0.3939394 0.5500000 0.4500000\n#\n ber.rpart acc.randomForest mmce.randomForest ber.randomForest\n#\n 1 0.2916667 0.8571429 0.14285714 0.15277778\n#\n 2 0.2222222 0.7619048 0.23809524 0.22222222\n#\n 3 0.2777778 0.7142857 0.28571429 0.27777778\n#\n 4 0.3413462 0.9523810 0.04761905 0.03846154\n#\n 5 0.2909091 0.8095238 0.19047619 0.20000000\n#\n 6 0.4393939 0.7500000 0.25000000 0.24747475\n\nGGally::ggpairs(df, c(4,7,10))\n\n\n\n\n \n\n\nFurther comments\n\n\n\n\nIn the examples shown in this section we applied \"raw\" learning algorithms, but often things\nare more complicated.\nAt the very least, many learners have hyperparameters that need to be tuned to get sensible\nresults.\nReliable performance estimates can be obtained by \nnested resampling\n,\ni.e., by doing the tuning in an\ninner resampling loop while estimating the performance in an outer loop.\nMoreover, you might want to combine learners with pre-processing steps like imputation, scaling,\noutlier removal, dimensionality reduction or feature selection and so on.\nAll this can be easily done by using \nmlr\n's wrapper functionality.\nThe general principle is explained in the section about \nwrapped learners\n in the\nAdvanced part of this tutorial. There are also several sections devoted to common pre-processing\nsteps.\n\n\nBenchmark experiments can very quickly become computationally demanding. \nmlr\n offers\nsome possibilities for \nparallelization\n.", + "text": "Benchmark Experiments\n\n\nIn a benchmark experiment different learning methods are applied to one or several data sets\nwith the aim to compare and rank the algorithms with respect to one or more\nperformance measures.\n\n\nIn \nmlr\n a benchmark experiment can be conducted by calling function \nbenchmark\n on\na \nlist\n of \nLearner\ns and a \nlist\n of \nTask\ns.\n\nbenchmark\n basically executes \nresample\n for each combination of \nLearner\n\nand \nTask\n.\nYou can specify an individual resampling strategy for each \nTask\n and select one or\nmultiple performance measures to be calculated.\n\n\nExample: One task, two learners, prediction on a single test set\n\n\nWe start with a small example. 
Two learners, \nlinear discriminant analysis\n and\n\nclassification trees\n, are applied to one classification problem (\nsonar.task\n).\nAs resampling strategy we choose \n\"Holdout\"\n.\nThe performance is thus calculated on one single randomly sampled test data set.\n\n\nIn the example below we create a resample description (\nResampleDesc\n),\nwhich is automatically instantiated by \nbenchmark\n.\nThe instantiation is done only once for each \nTask\n, i.e., the same resample instance\n(\nResampleInstance\n) is used for all learners.\nIt is also possible to directly pass a \nResampleInstance\n.\n\n\nIf you would like to use a \nfixed test data set\n instead of a randomly selected one, you can\ncreate a suitable \nResampleInstance\n through function\n\nmakeFixedHoldoutInstance\n.\n\n\n## Two learners to be compared\nlrns = list(makeLearner(\nclassif.lda\n), makeLearner(\nclassif.rpart\n))\n\n## Choose the resampling strategy\nrdesc = makeResampleDesc(\nHoldout\n)\n\n## Conduct the benchmark experiment\nres = benchmark(lrns, sonar.task, rdesc)\n#\n Task: Sonar-example, Learner: classif.lda\n#\n [Resample] holdout iter: 1\n#\n [Resample] Result: mmce.test.mean= 0.3\n#\n Task: Sonar-example, Learner: classif.rpart\n#\n [Resample] holdout iter: 1\n#\n [Resample] Result: mmce.test.mean=0.286\n\nres\n#\n task.id learner.id mmce.test.mean\n#\n 1 Sonar-example classif.lda 0.3000000\n#\n 2 Sonar-example classif.rpart 0.2857143\n\n\n\n\nIn the printed table every row corresponds to one pair of \nTask\n and \nLearner\n.\nThe entries show the mean misclassification error (\nmmce\n), the default performance\nmeasure for classification, on the test data set.\n\n\nThe result \nres\n is an object of class \nBenchmarkResult\n. Basically, it contains a \nlist\n\nof lists of \nResampleResult\n objects, first ordered by \nTask\n and then by \nLearner\n.\n\n\nmlr\n provides several accessor functions, named \ngetBMR\nwhat_to_extract\n, that permit\nto retrieve information for further analyses. This includes for example the performances\nor predictions of the learning algorithms under consideration.\n\n\nLet's have a look at the benchmark result above.\n\ngetBMRPerformances\n returns individual performances in resampling runs, while\n\ngetBMRAggrPerformances\n gives the aggregated values.\n\n\ngetBMRPerformances(res)\n#\n $`Sonar-example`\n#\n $`Sonar-example`$classif.lda\n#\n iter mmce\n#\n 1 1 0.3\n#\n \n#\n $`Sonar-example`$classif.rpart\n#\n iter mmce\n#\n 1 1 0.2857143\n\ngetBMRAggrPerformances(res)\n#\n $`Sonar-example`\n#\n $`Sonar-example`$classif.lda\n#\n mmce.test.mean \n#\n 0.3 \n#\n \n#\n $`Sonar-example`$classif.rpart\n#\n mmce.test.mean \n#\n 0.2857143\n\n\n\n\nSince we used holdout as resampling strategy, individual and aggregated performance values\ncoincide.\n\n\nOften it is more convenient to work with \ndata.frame\ns. 
You can easily\nconvert the result structure by setting \nas.df = TRUE\n.\n\n\ngetBMRPerformances(res, as.df = TRUE)\n#\n task.id learner.id iter mmce\n#\n 1 Sonar-example classif.lda 1 0.3000000\n#\n 2 Sonar-example classif.rpart 1 0.2857143\n\ngetBMRAggrPerformances(res, as.df = TRUE)\n#\n task.id learner.id mmce.test.mean\n#\n 1 Sonar-example classif.lda 0.3000000\n#\n 2 Sonar-example classif.rpart 0.2857143\n\n\n\n\nFunction \ngetBMRPredictions\n returns the predictions.\nPer default, you get a \nlist\n of lists of \nResamplePrediction\n objects.\nIn most cases you might prefer the \ndata.frame\n version.\n\n\ngetBMRPredictions(res)\n#\n $`Sonar-example`\n#\n $`Sonar-example`$classif.lda\n#\n Resampled Prediction for:\n#\n Resample description: holdout with 0.67 split rate.\n#\n Predict: test\n#\n Stratification: FALSE\n#\n predict.type: response\n#\n threshold: \n#\n time (mean): 0.02\n#\n id truth response iter set\n#\n 180 180 M M 1 test\n#\n 100 100 M R 1 test\n#\n 53 53 R M 1 test\n#\n 89 89 R R 1 test\n#\n 92 92 R M 1 test\n#\n 11 11 R R 1 test\n#\n \n#\n $`Sonar-example`$classif.rpart\n#\n Resampled Prediction for:\n#\n Resample description: holdout with 0.67 split rate.\n#\n Predict: test\n#\n Stratification: FALSE\n#\n predict.type: response\n#\n threshold: \n#\n time (mean): 0.01\n#\n id truth response iter set\n#\n 180 180 M M 1 test\n#\n 100 100 M M 1 test\n#\n 53 53 R R 1 test\n#\n 89 89 R M 1 test\n#\n 92 92 R M 1 test\n#\n 11 11 R R 1 test\n\nhead(getBMRPredictions(res, as.df = TRUE))\n#\n task.id learner.id id truth response iter set\n#\n 1 Sonar-example classif.lda 180 M M 1 test\n#\n 2 Sonar-example classif.lda 100 M R 1 test\n#\n 3 Sonar-example classif.lda 53 R M 1 test\n#\n 4 Sonar-example classif.lda 89 R R 1 test\n#\n 5 Sonar-example classif.lda 92 R M 1 test\n#\n 6 Sonar-example classif.lda 11 R R 1 test\n\n\n\n\nIt is also easily possible to access results for certain learners or tasks via their\nIDs. 
For this purpose many \"getter\" functions have a \nlearner.ids\n and a \ntask.ids\n argument.\n\n\nhead(getBMRPredictions(res, learner.ids = \nclassif.rpart\n, as.df = TRUE))\n#\n task.id learner.id id truth response iter set\n#\n 180 Sonar-example classif.rpart 180 M M 1 test\n#\n 100 Sonar-example classif.rpart 100 M M 1 test\n#\n 53 Sonar-example classif.rpart 53 R R 1 test\n#\n 89 Sonar-example classif.rpart 89 R M 1 test\n#\n 92 Sonar-example classif.rpart 92 R M 1 test\n#\n 11 Sonar-example classif.rpart 11 R R 1 test\n\n\n\n\nAs you might recall, you can set the IDs of learners and tasks via the \nid\n option of\n\nmakeLearner\n and \nmake*Task\n.\nMoreover, you can conveniently change the ID of a \nLearner\n via function \nsetId\n.\n\n\nThe IDs of all \nLearner\ns, \nTask\ns and \nMeasure\ns in a benchmark\nexperiment can be retrieved as follows:\n\n\ngetBMRTaskIds(res)\n#\n [1] \nSonar-example\n\n\ngetBMRLearnerIds(res)\n#\n [1] \nclassif.lda\n \nclassif.rpart\n\n\ngetBMRMeasureIds(res)\n#\n [1] \nmmce\n\n\n\n\n\nMoreover, you can extract the employed \nLearner\ns and \nMeasure\ns.\n\n\ngetBMRLearners(res)\n#\n $classif.lda\n#\n Learner classif.lda from package MASS\n#\n Type: classif\n#\n Name: Linear Discriminant Analysis; Short name: lda\n#\n Class: classif.lda\n#\n Properties: twoclass,multiclass,numerics,factors,prob\n#\n Predict-Type: response\n#\n Hyperparameters: \n#\n \n#\n \n#\n $classif.rpart\n#\n Learner classif.rpart from package rpart\n#\n Type: classif\n#\n Name: Decision Tree; Short name: rpart\n#\n Class: classif.rpart\n#\n Properties: twoclass,multiclass,missings,numerics,factors,ordered,prob,weights\n#\n Predict-Type: response\n#\n Hyperparameters: xval=0\n\ngetBMRMeasures(res)\n#\n [[1]]\n#\n Name: Mean misclassification error\n#\n Performance measure: mmce\n#\n Properties: classif,classif.multi,req.pred,req.truth\n#\n Minimize: TRUE\n#\n Best: 0; Worst: 1\n#\n Aggregated by: test.mean\n#\n Note:\n\n\n\n\nBenchmark analysis and visualization\n\n\nExample: Compare lda, rpart and random Forest\n\n\nAs an introductory example, we compare three learners (\nlda\n, \nrpart\n\nand \nrandom forest\n).\nSince the default learner IDs are a little long, we choose shorter names.\n\n\nAs comparing their performance on only one task does not provide a generally valid answer, the\ncomparison is performed on several tasks, using \nmmce\n as primary performance\nmeasure.\nPackage \nmlbench\n provides additional \nTask\ns, that we will use for our validation.\n\n\nFor both tasks 10-fold cross-validation is chosen as resampling strategy.\nThis is achieved by passing a single resample description to \nbenchmark\n, which is then\ninstantiated automatically once for each \nTask\n. 
This way, the same instance is used for all\nlearners applied to one task.\n\n\nIt is also possible to choose a different resampling strategy for each \nTask\n by passing a\n\nlist\n of the same length as the number of tasks that can contain both\n\nResampleDesc\ns and \nResampleInstance\ns.\n\n\nIn this example additional to the mean misclassification error (\nmmce\n),\nthe balanced error rate (\nber\n) and accuracy (\nacc\n) are calculated.\n\n\n## Create a list of learners\nlrns = list(\n makeLearner(\nclassif.lda\n, id = \nlda\n),\n makeLearner(\nclassif.rpart\n, id = \nrpart\n),\n makeLearner(\nclassif.randomForest\n, id = \nrandomForest\n)\n)\n\n## Get additional Tasks from package mlbench\nring.task = convertMLBenchObjToTask(\nmlbench.ringnorm\n, n = 600)\nwave.task = convertMLBenchObjToTask(\nmlbench.waveform\n, n = 600)\n\ntasks = list(iris.task, sonar.task, pid.task, ring.task, wave.task)\nrdesc = makeResampleDesc(\nCV\n, iters = 10)\nmeas = list(mmce, ber, acc)\nres = benchmark(lrns, tasks, rdesc, meas, show.info = FALSE)\nres\n#\n task.id learner.id mmce.test.mean ber.test.mean\n#\n 1 iris-example lda 0.02000000 0.02222222\n#\n 2 iris-example rpart 0.08000000 0.07555556\n#\n 3 iris-example randomForest 0.05333333 0.05250000\n#\n 4 mlbench.ringnorm lda 0.35000000 0.34605671\n#\n 5 mlbench.ringnorm rpart 0.17333333 0.17313632\n#\n 6 mlbench.ringnorm randomForest 0.05833333 0.05806121\n#\n 7 mlbench.waveform lda 0.19000000 0.18257244\n#\n 8 mlbench.waveform rpart 0.28833333 0.28765247\n#\n 9 mlbench.waveform randomForest 0.16500000 0.16306057\n#\n 10 PimaIndiansDiabetes-example lda 0.22778537 0.27148893\n#\n 11 PimaIndiansDiabetes-example rpart 0.25133288 0.28967870\n#\n 12 PimaIndiansDiabetes-example randomForest 0.23685919 0.27543146\n#\n 13 Sonar-example lda 0.24619048 0.23986694\n#\n 14 Sonar-example rpart 0.30785714 0.31153361\n#\n 15 Sonar-example randomForest 0.17785714 0.17442696\n#\n acc.test.mean\n#\n 1 0.9800000\n#\n 2 0.9200000\n#\n 3 0.9466667\n#\n 4 0.6500000\n#\n 5 0.8266667\n#\n 6 0.9416667\n#\n 7 0.8100000\n#\n 8 0.7116667\n#\n 9 0.8350000\n#\n 10 0.7722146\n#\n 11 0.7486671\n#\n 12 0.7631408\n#\n 13 0.7538095\n#\n 14 0.6921429\n#\n 15 0.8221429\n\n\n\n\nThe table below shows the three selected \nperformance measures\n across all learners\nand tasks.\nIt can be easily used for plotting as well, but \nmlr\n also features some integrated plots\nthat operate on \nBenchmarkResult\ns.\n\n\nperf = getBMRPerformances(res, as.df = TRUE)\nhead(perf)\n#\n task.id learner.id iter mmce ber acc\n#\n 1 iris-example lda 1 0.0000000 0.0000000 1.0000000\n#\n 2 iris-example lda 2 0.1333333 0.1666667 0.8666667\n#\n 3 iris-example lda 3 0.0000000 0.0000000 1.0000000\n#\n 4 iris-example lda 4 0.0000000 0.0000000 1.0000000\n#\n 5 iris-example lda 5 0.0000000 0.0000000 1.0000000\n#\n 6 iris-example lda 6 0.0000000 0.0000000 1.0000000\n\n\n\n\nA closer look at the result reveals that the \nrandom Forest\n\noutperforms the \nclassification tree\n in every instance, while\n\nlinear discriminant analysis\n performs better than \nrpart\n most\nof the time. Additionally \nlda\n sometimes even beats the random forest implementation.\nWith increasing size of such \nbenchmark\n experiments, those tables become almost unreadable\nand hard to comprehend. 
In order to gain deeper insight into those results, several plotting\nstrategies are implemented in \nmlr\n.\n\n\nData generation and plotting\n\n\nAs explained in the \nvisualization\n section, \nmlr\n provides\n\ndata generation\n functions, e.g., \ngenerateRankMatrixAsBarData()\n,\nwhich can be used within \nplot functions\n like \nplotRankMatrixAsBar()\n.\nThe data generated by these functions contains all information required for\nplotting later on.\nThis design also enables the user to create custom plots from the generated data\nusing other packages such as \nlattice\n, \nggplot2\n or the base \nplot\n\nfunctions.\nPlots are produced using \nggplot2\n or \nggvis\n, as these packages enable further customization,\nsuch as renaming plot elements or changing colors.\n\n\nAn example that demonstrates how a plot can be extended is given below.\nAs a first result, we might want to compare \nLearner\n performance across\n\nresample\n iterations, in order to get insight into how the individual\n\nLearner\ns perform within the different tasks. We also color the boxes\nby learner, adding further\n\naesthetics\n to make the learners easier to distinguish.\n\n\nplotBenchmarkResult(res, measure = mmce, pretty.names = FALSE) +\n aes(color = learner.id) + facet_wrap(~ task.id, nrow = 2)\n\n\n\n\n \n\n\nAs no data transformation or computation was conducted for this plot, and all data used for\nplotting comes from \ngetBMRPerformances\n, no data generation function is needed.\n\n\nTaking into account the properties of such resample results, such as a possible lack of\ncommensurability, and the hypothesis tests that have to be employed for such comparisons, a\n\nrank\n structure for plots might provide valuable additional insight.\n\n\nAlthough \nBenchmarkResult\ns usually provide a variety of measures, subsequent analysis is\nbest performed on one or a few selected measures. In this case\n\nmean misclassification error\n is chosen as it is easy to compute and understand.\nIn order to break down the benchmark result, we can either display a\n\nrank matrix\n or plot it as a \nbar plot\n.\nAttention: Due to the plotting structure, ties that might occur in the aggregated\n\nBenchmarkResult\n are broken randomly.\n\n\n## Convert to a rank matrix\nm = convertBMRToRankMatrix(res, mmce)\nm\n#\n iris-example mlbench.ringnorm mlbench.waveform\n#\n lda 1 3 2\n#\n rpart 3 2 3\n#\n randomForest 2 1 1\n#\n PimaIndiansDiabetes-example Sonar-example\n#\n lda 1 2\n#\n rpart 3 3\n#\n randomForest 2 1\n\n## Plot the rank matrix\ng = generateRankMatrixAsBarData(res, mmce, pos = \ntile\n)\nplotRankMatrixAsBar(g)\n\n\n\n\n \n\n\nAs an additional visualization, we can compare the performances using a \nbenchmark summary plot\n.\nThis plot displays the \nLearner\ns' performance in comparison to the best or worst\nperforming learner within each task. It facilitates finding tasks where learners perform similarly.\nIt might also help to uncover structures within the experiment that would otherwise remain\nhidden.\n\n\ng = generateBenchmarkSummaryData(res, mmce, fill = \nbest\n)\nplotBenchmarkSummary(g)\n\n\n\n\n \n\n\nComparing learners using hypothesis tests\n\n\nMany researchers feel the need to display an algorithm's superiority by employing some sort\nof hypothesis testing. 
As non-parametric tests seem better suited for such benchmark results,\nthe tests provided in \nmlr\n are the \nOverall Friedman test\n and the\n\nFriedman-Nemenyi post hoc test\n.\n\n\nWhile the overall \nFriedman test\n based on \nfriedman.test\n\nfrom the \nstats\n package tests whether there is any significant difference\nbetween the employed learners, the post hoc \nFriedman-Nemenyi test\n tests\nfor significant differences between all pairs of learners. \nNon-parametric\n tests often\nhave less power than their \nparametric\n counterparts, but fewer assumptions about underlying\ndistributions have to be made. This often means many \ndata sets\n are needed in order to be\nable to show significant differences at reasonable significance levels.\n\n\nIn our example, we want to compare the three \nlearners\n on the selected data sets.\nFirst we might want to test whether there is a difference between the learners at all.\n\n\nfriedmanTestBMR(res)\n#\n \n#\n Friedman rank sum test\n#\n \n#\n data: x and learner.id and task.id\n#\n Friedman chi-squared = 5.2, df = 2, p-value = 0.07427\n\n\n\n\nIn order to keep the computation time for this tutorial small, the \nLearner\ns\nare only evaluated on 5 tasks. This also means that we operate at a relatively lenient significance\nlevel \n\\alpha = 0.1\n.\nAs we can reject the null hypothesis of the Friedman test at this significance level,\nwe might now want to test where exactly these differences lie.\n\n\nfriedmanPostHocTestBMR(res, p.value = 0.1)\n#\n \n#\n Pairwise comparisons using Nemenyi multiple comparison test \n#\n with q approximation for unreplicated blocked data \n#\n \n#\n data: x and learner.id and task.id \n#\n \n#\n lda rpart\n#\n rpart 0.254 - \n#\n randomForest 0.802 0.069\n#\n \n#\n P value adjustment method: none\n\n\n\n\nAt this level of significance, we can conclude that there is a significant\ndifference between the decision tree (\nrpart\n) and the\n\nrandom Forest\n.\n\n\nCritical differences diagram\n\n\nIn order to visualize differently performing learners, a\n\ncritical differences diagram\n can be plotted, using either the\nNemenyi test (\ntest = \"nemenyi\"\n) or the Bonferroni-Dunn test (\ntest = \"bd\"\n).\n\n\nInterpretation:\n\nLearners are drawn on the x-axis according to their mean rank.\n\n\n\n\nChoosing \ntest = \"nemenyi\"\n compares all pairs of \nLearner\ns to each other, thus\n the output consists of groups of learners that are not significantly different. The diagram connects all groups\n of learners where the mean ranks do not differ by more than the critical difference. Learners\n that are not connected by a bar are significantly different, and the learner(s) with the\n lower mean rank can be considered \"better\" at the chosen significance level.\n\n\nChoosing \ntest = \"bd\"\n performs a \npairwise comparison with a baseline\n. An interval which\n extends by the given \ncritical difference\n in both directions is drawn around the\n \nLearner\n chosen as baseline, though only comparisons with the baseline are\n possible. 
All learners within the interval are not significantly different, while the\n baseline can be considered better or worse than a given learner which is outside of the\n interval.\n\n\n\n\nCalculation:\n\nThe critical difference \nCD\n can be calculated by\n\nCD = q_\\alpha \\cdot \\sqrt{\\frac{k(k+1)}{6N}},\n\nwhere \nq_\\alpha\n comes from the studentized range statistic divided by \n\\sqrt{2}\n.\nFor details see \nDemsar (2006)\n.\n\n\n## Nemenyi test\ng = generateCritDifferencesData(res, p.value = 0.1, test = \nnemenyi\n)\nplotCritDifferences(g) + coord_cartesian(xlim = c(-1,5), ylim = c(0,2))\n\n\n\n\n \n\n\n## Bonferroni-Dunn test\ng = generateCritDifferencesData(res, p.value = 0.1, test = \nbd\n, baseline = \nrandomForest\n)\nplotCritDifferences(g) + coord_cartesian(xlim = c(-1,5), ylim = c(0,2))\n\n\n\n\n \n\n\nCustom plots\n\n\nThis section is dedicated to some custom plots not yet integrated into \nmlr\n, that might\nbe important to the researcher.\nInstead of just comparing mean performance values it is generally preferable to have a look\nat the distribution of performance values obtained in individual resampling iterations.\nThe individual performances on the 10 folds for every task and learner are retrieved below.\n\n\nperf = getBMRPerformances(res, as.df = TRUE)\n\n## Plot density for two examples\nqplot(mmce, colour = learner.id, facets = . ~ task.id,\n data = perf[perf$task.id %in% c(\niris-example\n, \nSonar-example\n),], geom = \ndensity\n)\n\n\n\n\n \n\n\nIn order to plot both performance measures in parallel, \nperf\n is reshaped to long format.\nBelow we generate grouped boxplots and densityplots for some tasks, learners and measures.\n\n\n## Compare ber and mmce\ndf = reshape2::melt(perf, id.vars = c(\ntask.id\n, \nlearner.id\n, \niter\n),\n measure.vars = c(\nacc\n, \nmmce\n, \nber\n))\ndf = df[df$variable != \nacc\n,]\nhead(df)\n#\n task.id learner.id iter variable value\n#\n 151 iris-example lda 1 mmce 0.0000000\n#\n 152 iris-example lda 2 mmce 0.1333333\n#\n 153 iris-example lda 3 mmce 0.0000000\n#\n 154 iris-example lda 4 mmce 0.0000000\n#\n 155 iris-example lda 5 mmce 0.0000000\n#\n 156 iris-example lda 6 mmce 0.0000000\n\nqplot(variable, value, data = df, colour = learner.id, geom = \nboxplot\n,\n xlab = \nmeasure\n, ylab = \nperformance\n) + facet_wrap(~ task.id, nrow = 2)\n\n\n\n\n \n\n\nIt might also be useful to assess if learner performances in single resampling iterations,\ni.e., in one fold, are related.\nThis might help to gain further insight, for example by having a closer look at bootstrap\nsamples where one learner performs exceptionally well while another one is fairly bad.\nMoreover, this might be useful for the construction of ensembles of learning algorithms.\nBelow, function \nggpairs\n from package \nGGally\n is used to generate a scatterplot\nmatrix of mean misclassification errors (\nmmce\n) on the \nSonar\n\ndata set.\n\n\nperf = getBMRPerformances(res, task.id = \nSonar-example\n, as.df = TRUE)\ndf = reshape(perf, direction = \nwide\n, v.names = c(\nacc\n, \nmmce\n, \nber\n), timevar = \nlearner.id\n,\n idvar = c(\ntask.id\n, \niter\n))\n\nhead(df)\n#\n task.id iter acc.lda mmce.lda ber.lda acc.rpart mmce.rpart\n#\n 1 Sonar-example 1 0.7142857 0.2857143 0.2777778 0.7142857 0.2857143\n#\n 2 Sonar-example 2 0.7619048 0.2380952 0.2222222 0.7619048 0.2380952\n#\n 3 Sonar-example 3 0.6666667 0.3333333 0.3194444 0.7142857 0.2857143\n#\n 4 Sonar-example 4 0.7619048 0.2380952 0.2163462 0.6666667 0.3333333\n#\n 5 Sonar-example 5 0.8571429 0.1428571 
0.1454545 0.7142857 0.2857143\n#\n 6 Sonar-example 6 0.6000000 0.4000000 0.3939394 0.5500000 0.4500000\n#\n ber.rpart acc.randomForest mmce.randomForest ber.randomForest\n#\n 1 0.2916667 0.8571429 0.14285714 0.15277778\n#\n 2 0.2222222 0.7619048 0.23809524 0.22222222\n#\n 3 0.2777778 0.7142857 0.28571429 0.27777778\n#\n 4 0.3413462 0.9523810 0.04761905 0.03846154\n#\n 5 0.2909091 0.8095238 0.19047619 0.20000000\n#\n 6 0.4393939 0.7500000 0.25000000 0.24747475\n\nGGally::ggpairs(df, c(4,7,10))\n\n\n\n\n \n\n\nFurther comments\n\n\n\n\nIn the examples shown in this section we applied \"raw\" learning algorithms, but often things\nare more complicated.\nAt the very least, many learners have hyperparameters that need to be tuned to get sensible\nresults.\nReliable performance estimates can be obtained by \nnested resampling\n,\ni.e., by doing the tuning in an\ninner resampling loop while estimating the performance in an outer loop.\nMoreover, you might want to combine learners with pre-processing steps like imputation, scaling,\noutlier removal, dimensionality reduction or feature selection and so on.\nAll this can be easily done by using \nmlr\n's wrapper functionality.\nThe general principle is explained in the section about \nwrapped learners\n in the\nAdvanced part of this tutorial. There are also several sections devoted to common pre-processing\nsteps.\n\n\nBenchmark experiments can very quickly become computationally demanding. \nmlr\n offers\nsome possibilities for \nparallelization\n.", "title": "Benchmark Experiments" }, { @@ -202,7 +202,7 @@ }, { "location": "/benchmark_experiments/index.html#example-one-task-two-learners-prediction-on-a-single-test-set", - "text": "We start with a small example. Two learners, linear discriminant analysis and classification trees , are applied to one classification problem ( sonar.task ).\nAs resampling strategy we choose \"Holdout\" .\nThe performance is thus calculated on one single randomly sampled test data set. In the example below we create a resample description ( ResampleDesc ),\nwhich is automatically instantiated by benchmark .\nThe instantiation is done only once for each Task , i.e., the same resample instance\n( ResampleInstance ) is used for all learners.\nIt is also possible to directly pass a ResampleInstance . If you would like to use a fixed test data set instead of a randomly selected one, you can\ncreate a suitable ResampleInstance through function makeFixedHoldoutInstance . ## Two learners to be compared\nlrns = list(makeLearner( classif.lda ), makeLearner( classif.rpart ))\n\n## Choose the resampling strategy\nrdesc = makeResampleDesc( Holdout )\n\n## Conduct the benchmark experiment\nres = benchmark(lrns, sonar.task, rdesc)\n# Task: Sonar-example, Learner: classif.lda\n# [Resample] holdout iter: 1\n# [Resample] Result: mmce.test.mean= 0.3\n# Task: Sonar-example, Learner: classif.rpart\n# [Resample] holdout iter: 1\n# [Resample] Result: mmce.test.mean=0.286\n\nres\n# task.id learner.id mmce.test.mean\n# 1 Sonar-example classif.lda 0.3000000\n# 2 Sonar-example classif.rpart 0.2857143 In the printed table every row corresponds to one pair of Task and Learner .\nThe entries show the mean misclassification error ( mmce ), the default performance\nmeasure for classification, on the test data set. The result res is an object of class BenchmarkResult . Basically, it contains a list \nof lists of ResampleResult objects, first ordered by Task and then by Learner . 
mlr provides several accessor functions, named getBMR what_to_extract , that permit\nto retrieve information for further analyses. This includes for example the performances\nor predictions of the learning algorithms under consideration. Let's have a look at the benchmark result above. getBMRPerformances returns individual performances in resampling runs, while getBMRAggrPerformances gives the aggregated values. getBMRPerformances(res)\n# $`Sonar-example`\n# $`Sonar-example`$classif.lda\n# iter mmce\n# 1 1 0.3\n# \n# $`Sonar-example`$classif.rpart\n# iter mmce\n# 1 1 0.2857143\n\ngetBMRAggrPerformances(res)\n# $`Sonar-example`\n# $`Sonar-example`$classif.lda\n# mmce.test.mean \n# 0.3 \n# \n# $`Sonar-example`$classif.rpart\n# mmce.test.mean \n# 0.2857143 Since we used holdout as resampling strategy, individual and aggregated performance values\ncoincide. Often it is more convenient to work with data.frame s. You can easily\nconvert the result structure by setting as.df = TRUE . getBMRPerformances(res, as.df = TRUE)\n# task.id learner.id iter mmce\n# 1 Sonar-example classif.lda 1 0.3000000\n# 2 Sonar-example classif.rpart 1 0.2857143\n\ngetBMRAggrPerformances(res, as.df = TRUE)\n# task.id learner.id mmce.test.mean\n# 1 Sonar-example classif.lda 0.3000000\n# 2 Sonar-example classif.rpart 0.2857143 Function getBMRPredictions returns the predictions.\nPer default, you get a list of lists of ResamplePrediction objects.\nIn most cases you might prefer the data.frame version. getBMRPredictions(res)\n# $`Sonar-example`\n# $`Sonar-example`$classif.lda\n# Resampled Prediction for:\n# Resample description: holdout with 0.67 split rate.\n# Predict: test\n# Stratification: FALSE\n# predict.type: response\n# threshold: \n# time (mean): 0.01\n# id truth response iter set\n# 180 180 M M 1 test\n# 100 100 M R 1 test\n# 53 53 R M 1 test\n# 89 89 R R 1 test\n# 92 92 R M 1 test\n# 11 11 R R 1 test\n# \n# $`Sonar-example`$classif.rpart\n# Resampled Prediction for:\n# Resample description: holdout with 0.67 split rate.\n# Predict: test\n# Stratification: FALSE\n# predict.type: response\n# threshold: \n# time (mean): 0.01\n# id truth response iter set\n# 180 180 M M 1 test\n# 100 100 M M 1 test\n# 53 53 R R 1 test\n# 89 89 R M 1 test\n# 92 92 R M 1 test\n# 11 11 R R 1 test\n\nhead(getBMRPredictions(res, as.df = TRUE))\n# task.id learner.id id truth response iter set\n# 1 Sonar-example classif.lda 180 M M 1 test\n# 2 Sonar-example classif.lda 100 M R 1 test\n# 3 Sonar-example classif.lda 53 R M 1 test\n# 4 Sonar-example classif.lda 89 R R 1 test\n# 5 Sonar-example classif.lda 92 R M 1 test\n# 6 Sonar-example classif.lda 11 R R 1 test It is also easily possible to access results for certain learners or tasks via their\nIDs. For this purpose many \"getter\" functions have a learner.ids and a task.ids argument. head(getBMRPredictions(res, learner.ids = classif.rpart , as.df = TRUE))\n# task.id learner.id id truth response iter set\n# 180 Sonar-example classif.rpart 180 M M 1 test\n# 100 Sonar-example classif.rpart 100 M M 1 test\n# 53 Sonar-example classif.rpart 53 R R 1 test\n# 89 Sonar-example classif.rpart 89 R M 1 test\n# 92 Sonar-example classif.rpart 92 R M 1 test\n# 11 Sonar-example classif.rpart 11 R R 1 test As you might recall, you can set the IDs of learners and tasks via the id option of makeLearner and make*Task .\nMoreover, you can conveniently change the ID of a Learner via function setId . 
The IDs of all Learner s, Task s and Measure s in a benchmark\nexperiment can be retrieved as follows: getBMRTaskIds(res)\n# [1] Sonar-example \n\ngetBMRLearnerIds(res)\n# [1] classif.lda classif.rpart \n\ngetBMRMeasureIds(res)\n# [1] mmce Moreover, you can extract the employed Learner s and Measure s. getBMRLearners(res)\n# $classif.lda\n# Learner classif.lda from package MASS\n# Type: classif\n# Name: Linear Discriminant Analysis; Short name: lda\n# Class: classif.lda\n# Properties: twoclass,multiclass,numerics,factors,prob\n# Predict-Type: response\n# Hyperparameters: \n# \n# \n# $classif.rpart\n# Learner classif.rpart from package rpart\n# Type: classif\n# Name: Decision Tree; Short name: rpart\n# Class: classif.rpart\n# Properties: twoclass,multiclass,missings,numerics,factors,ordered,prob,weights\n# Predict-Type: response\n# Hyperparameters: xval=0\n\ngetBMRMeasures(res)\n# [[1]]\n# Name: Mean misclassification error\n# Performance measure: mmce\n# Properties: classif,classif.multi,req.pred,req.truth\n# Minimize: TRUE\n# Best: 0; Worst: 1\n# Aggregated by: test.mean\n# Note:", + "text": "We start with a small example. Two learners, linear discriminant analysis and classification trees , are applied to one classification problem ( sonar.task ).\nAs resampling strategy we choose \"Holdout\" .\nThe performance is thus calculated on one single randomly sampled test data set. In the example below we create a resample description ( ResampleDesc ),\nwhich is automatically instantiated by benchmark .\nThe instantiation is done only once for each Task , i.e., the same resample instance\n( ResampleInstance ) is used for all learners.\nIt is also possible to directly pass a ResampleInstance . If you would like to use a fixed test data set instead of a randomly selected one, you can\ncreate a suitable ResampleInstance through function makeFixedHoldoutInstance . ## Two learners to be compared\nlrns = list(makeLearner( classif.lda ), makeLearner( classif.rpart ))\n\n## Choose the resampling strategy\nrdesc = makeResampleDesc( Holdout )\n\n## Conduct the benchmark experiment\nres = benchmark(lrns, sonar.task, rdesc)\n# Task: Sonar-example, Learner: classif.lda\n# [Resample] holdout iter: 1\n# [Resample] Result: mmce.test.mean= 0.3\n# Task: Sonar-example, Learner: classif.rpart\n# [Resample] holdout iter: 1\n# [Resample] Result: mmce.test.mean=0.286\n\nres\n# task.id learner.id mmce.test.mean\n# 1 Sonar-example classif.lda 0.3000000\n# 2 Sonar-example classif.rpart 0.2857143 In the printed table every row corresponds to one pair of Task and Learner .\nThe entries show the mean misclassification error ( mmce ), the default performance\nmeasure for classification, on the test data set. The result res is an object of class BenchmarkResult . Basically, it contains a list \nof lists of ResampleResult objects, first ordered by Task and then by Learner . mlr provides several accessor functions, named getBMR what_to_extract , that permit\nto retrieve information for further analyses. This includes for example the performances\nor predictions of the learning algorithms under consideration. Let's have a look at the benchmark result above. getBMRPerformances returns individual performances in resampling runs, while getBMRAggrPerformances gives the aggregated values. 
getBMRPerformances(res)\n# $`Sonar-example`\n# $`Sonar-example`$classif.lda\n# iter mmce\n# 1 1 0.3\n# \n# $`Sonar-example`$classif.rpart\n# iter mmce\n# 1 1 0.2857143\n\ngetBMRAggrPerformances(res)\n# $`Sonar-example`\n# $`Sonar-example`$classif.lda\n# mmce.test.mean \n# 0.3 \n# \n# $`Sonar-example`$classif.rpart\n# mmce.test.mean \n# 0.2857143 Since we used holdout as resampling strategy, individual and aggregated performance values\ncoincide. Often it is more convenient to work with data.frame s. You can easily\nconvert the result structure by setting as.df = TRUE . getBMRPerformances(res, as.df = TRUE)\n# task.id learner.id iter mmce\n# 1 Sonar-example classif.lda 1 0.3000000\n# 2 Sonar-example classif.rpart 1 0.2857143\n\ngetBMRAggrPerformances(res, as.df = TRUE)\n# task.id learner.id mmce.test.mean\n# 1 Sonar-example classif.lda 0.3000000\n# 2 Sonar-example classif.rpart 0.2857143 Function getBMRPredictions returns the predictions.\nPer default, you get a list of lists of ResamplePrediction objects.\nIn most cases you might prefer the data.frame version. getBMRPredictions(res)\n# $`Sonar-example`\n# $`Sonar-example`$classif.lda\n# Resampled Prediction for:\n# Resample description: holdout with 0.67 split rate.\n# Predict: test\n# Stratification: FALSE\n# predict.type: response\n# threshold: \n# time (mean): 0.02\n# id truth response iter set\n# 180 180 M M 1 test\n# 100 100 M R 1 test\n# 53 53 R M 1 test\n# 89 89 R R 1 test\n# 92 92 R M 1 test\n# 11 11 R R 1 test\n# \n# $`Sonar-example`$classif.rpart\n# Resampled Prediction for:\n# Resample description: holdout with 0.67 split rate.\n# Predict: test\n# Stratification: FALSE\n# predict.type: response\n# threshold: \n# time (mean): 0.01\n# id truth response iter set\n# 180 180 M M 1 test\n# 100 100 M M 1 test\n# 53 53 R R 1 test\n# 89 89 R M 1 test\n# 92 92 R M 1 test\n# 11 11 R R 1 test\n\nhead(getBMRPredictions(res, as.df = TRUE))\n# task.id learner.id id truth response iter set\n# 1 Sonar-example classif.lda 180 M M 1 test\n# 2 Sonar-example classif.lda 100 M R 1 test\n# 3 Sonar-example classif.lda 53 R M 1 test\n# 4 Sonar-example classif.lda 89 R R 1 test\n# 5 Sonar-example classif.lda 92 R M 1 test\n# 6 Sonar-example classif.lda 11 R R 1 test It is also easily possible to access results for certain learners or tasks via their\nIDs. For this purpose many \"getter\" functions have a learner.ids and a task.ids argument. head(getBMRPredictions(res, learner.ids = classif.rpart , as.df = TRUE))\n# task.id learner.id id truth response iter set\n# 180 Sonar-example classif.rpart 180 M M 1 test\n# 100 Sonar-example classif.rpart 100 M M 1 test\n# 53 Sonar-example classif.rpart 53 R R 1 test\n# 89 Sonar-example classif.rpart 89 R M 1 test\n# 92 Sonar-example classif.rpart 92 R M 1 test\n# 11 Sonar-example classif.rpart 11 R R 1 test As you might recall, you can set the IDs of learners and tasks via the id option of makeLearner and make*Task .\nMoreover, you can conveniently change the ID of a Learner via function setId . The IDs of all Learner s, Task s and Measure s in a benchmark\nexperiment can be retrieved as follows: getBMRTaskIds(res)\n# [1] Sonar-example \n\ngetBMRLearnerIds(res)\n# [1] classif.lda classif.rpart \n\ngetBMRMeasureIds(res)\n# [1] mmce Moreover, you can extract the employed Learner s and Measure s. 
getBMRLearners(res)\n# $classif.lda\n# Learner classif.lda from package MASS\n# Type: classif\n# Name: Linear Discriminant Analysis; Short name: lda\n# Class: classif.lda\n# Properties: twoclass,multiclass,numerics,factors,prob\n# Predict-Type: response\n# Hyperparameters: \n# \n# \n# $classif.rpart\n# Learner classif.rpart from package rpart\n# Type: classif\n# Name: Decision Tree; Short name: rpart\n# Class: classif.rpart\n# Properties: twoclass,multiclass,missings,numerics,factors,ordered,prob,weights\n# Predict-Type: response\n# Hyperparameters: xval=0\n\ngetBMRMeasures(res)\n# [[1]]\n# Name: Mean misclassification error\n# Performance measure: mmce\n# Properties: classif,classif.multi,req.pred,req.truth\n# Minimize: TRUE\n# Best: 0; Worst: 1\n# Aggregated by: test.mean\n# Note:", "title": "Example: One task, two learners, prediction on a single test set" }, { @@ -262,7 +262,7 @@ }, { "location": "/configureMlr/index.html", - "text": "Configuring mlr\n\n\nIf you really know what you are doing you may think \nmlr\n is limiting you in certain ways.\n\nmlr\n is designed to make usage errors due to typos or invalid parameter values\nas unlikely as possible.\nBut sometimes you want to break those barriers and get full access.\nFor all available options, simply refer to the documentation of \nconfigureMlr\n.\n\n\nFunction \nconfigureMlr\n permits to set options globally for your current \nR\n session.\n\n\nIt is also possible to set options locally.\n\n\n\n\nAll options referring to the behavior of learners (these are all options except \nshow.info\n)\n can be set for an individual learner via the \nconfig\n argument of \nmakeLearner\n.\n The local precedes the global configuration.\n\n\nSome functions like \nresample\n, \nbenchmark\n, \nselectFeatures\n, \ntuneParams\n,\n and \ntuneParamsMultiCrit\n have a \nshow.info\n flag that controls if verbose messages\n are shown. The default value of \nshow.info\n can be set by \nconfigureMlr\n.\n\n\n\n\nExample: Reducing the output on the console\n\n\nYou are bothered by all the output on the console like in this example?\n\n\nrdesc = makeResampleDesc(\nHoldout\n)\nr = resample(\nclassif.multinom\n, iris.task, rdesc)\n#\n [Resample] holdout iter: 1\n#\n # weights: 18 (10 variable)\n#\n initial value 109.861229 \n#\n iter 10 value 12.256619\n#\n iter 20 value 3.638740\n#\n iter 30 value 3.228628\n#\n iter 40 value 2.951100\n#\n iter 50 value 2.806521\n#\n iter 60 value 2.739076\n#\n iter 70 value 2.522206\n#\n iter 80 value 2.485225\n#\n iter 90 value 2.381397\n#\n iter 100 value 2.360602\n#\n final value 2.360602 \n#\n stopped after 100 iterations\n#\n [Resample] Result: mmce.test.mean=0.02\n\n\n\n\nYou can suppress the output for this \nLearner\n and this \nresample\n call as follows:\n\n\nlrn = makeLearner(\nclassif.multinom\n, config = list(show.learner.output = FALSE))\nr = resample(lrn, iris.task, rdesc, show.info = FALSE)\n\n\n\n\n(Note that \nmultinom\n has a \ntrace\n switch that can alternatively be used to turn off\nthe progress messages.)\n\n\nTo globally suppress the output for all subsequent learners and calls to \nresample\n,\n\nbenchmark\n, etc. 
try the following:\n\n\nconfigureMlr(show.learner.output = FALSE, show.info = FALSE)\nr = resample(\nclassif.multinom\n, iris.task, rdesc)\n\n\n\n\nAccessing and resetting the configuration\n\n\nFunction \ngetMlrOptions\n returns a \nlist\n with the current configuration.\n\n\ngetMlrOptions()\n#\n $on.learner.error\n#\n [1] \nstop\n\n#\n \n#\n $on.learner.warning\n#\n [1] \nwarn\n\n#\n \n#\n $on.par.out.of.bounds\n#\n [1] \nstop\n\n#\n \n#\n $on.par.without.desc\n#\n [1] \nstop\n\n#\n \n#\n $show.info\n#\n [1] FALSE\n#\n \n#\n $show.learner.output\n#\n [1] FALSE\n\n\n\n\nTo restore the default configuration call \nconfigureMlr\n with an empty argument list.\n\n\nconfigureMlr()\n\n\n\n\ngetMlrOptions()\n#\n $on.learner.error\n#\n [1] \nstop\n\n#\n \n#\n $on.learner.warning\n#\n [1] \nwarn\n\n#\n \n#\n $on.par.out.of.bounds\n#\n [1] \nstop\n\n#\n \n#\n $on.par.without.desc\n#\n [1] \nstop\n\n#\n \n#\n $show.info\n#\n [1] TRUE\n#\n \n#\n $show.learner.output\n#\n [1] TRUE\n\n\n\n\nExample: Turning off parameter checking\n\n\nIt might happen that you want to access a new parameter of a \nLearner\n which\nis already available in \nmlr\n, but the parameter is not \"registered\" in the learner's\n\nparameter set\n yet.\nIn this case you might want to \ncontact us\n\nor \nopen an issue\n as well!\nBut until then you can turn off \nmlr\n's parameter checking.\nThe parameter setting will then be passed to the underlying function without further ado.\n\n\n## Support Vector Machine with linear kernel and new parameter 'newParam'\nlrn = makeLearner(\nclassif.ksvm\n, kernel = \nvanilladot\n, newParam = 3)\n#\n Error in setHyperPars2.Learner(learner, insert(par.vals, args)): classif.ksvm: Setting parameter newParam without available description object!\n#\n You can switch off this check by using configureMlr!\n\n## Turn off parameter checking completely\nconfigureMlr(on.par.without.desc = \nquiet\n)\nlrn = makeLearner(\nclassif.ksvm\n, kernel = \nvanilladot\n, newParam = 3)\ntrain(lrn, iris.task)\n#\n Setting default kernel parameters\n#\n Model for learner.id=classif.ksvm; learner.class=classif.ksvm\n#\n Trained on: task.id = iris-example; obs = 150; features = 4\n#\n Hyperparameters: fit=FALSE,kernel=vanilladot,newParam=\nnumeric\n\n\n## Option \nquiet\n also masks typos\nlrn = makeLearner(\nclassif.ksvm\n, kernl = \nvanilladot\n)\ntrain(lrn, iris.task)\n#\n Model for learner.id=classif.ksvm; learner.class=classif.ksvm\n#\n Trained on: task.id = iris-example; obs = 150; features = 4\n#\n Hyperparameters: fit=FALSE,kernl=\ncharacter\n\n\n## Alternatively turn off parameter checking, but still see warnings\nconfigureMlr(on.par.without.desc = \nwarn\n)\nlrn = makeLearner(\nclassif.ksvm\n, kernl = \nvanilladot\n, newParam = 3)\n#\n Warning in setHyperPars2.Learner(learner, insert(par.vals, args)): classif.ksvm: Setting parameter kernl without available description object!\n#\n You can switch off this check by using configureMlr!\n#\n Warning in setHyperPars2.Learner(learner, insert(par.vals, args)): classif.ksvm: Setting parameter newParam without available description object!\n#\n You can switch off this check by using configureMlr!\n\ntrain(lrn, iris.task)\n#\n Model for learner.id=classif.ksvm; learner.class=classif.ksvm\n#\n Trained on: task.id = iris-example; obs = 150; features = 4\n#\n Hyperparameters: fit=FALSE,kernl=\ncharacter\n,newParam=\nnumeric\n\n\n\n\n\nExample: Handling errors in an underlying learning method\n\n\nIf an underlying learning method throws an error the default behavior of 
\nmlr\n is to\ngenerate an exception as well.\nHowever, in some situations, for example if you conduct a \nbenchmark study\n\nwith multiple data sets and learners, you usually don't want the whole experiment stopped due\nto one error.\nThe following example shows how to prevent this:\n\n\n## This call gives an error caused by the low number of observations in class \nvirginica\n\ntrain(\nclassif.qda\n, task = iris.task, subset = 1:104)\n#\n Error in qda.default(x, grouping, ...): some group is too small for 'qda'\n#\n Timing stopped at: 0.004 0 0.005\n\n## Turn learner errors into warnings\nconfigureMlr(on.learner.error = \nwarn\n)\nmod = train(\nclassif.qda\n, task = iris.task, subset = 1:104)\n#\n Warning in train(\nclassif.qda\n, task = iris.task, subset = 1:104): Could not train learner classif.qda: Error in qda.default(x, grouping, ...) : \n#\n some group is too small for 'qda'\n\nmod\n#\n Model for learner.id=classif.qda; learner.class=classif.qda\n#\n Trained on: task.id = iris-example; obs = 104; features = 4\n#\n Hyperparameters: \n#\n Training failed: Error in qda.default(x, grouping, ...) : \n#\n some group is too small for 'qda'\n#\n \n#\n Training failed: Error in qda.default(x, grouping, ...) : \n#\n some group is too small for 'qda'\n\n## mod is an object of class FailureModel\nisFailureModel(mod)\n#\n [1] TRUE\n\n## Retrieve the error message\ngetFailureModelMsg(mod)\n#\n [1] \nError in qda.default(x, grouping, ...) : \\n some group is too small for 'qda'\\n\n\n\n## predict and performance return NA's\npred = predict(mod, iris.task)\npred\n#\n Prediction: 150 observations\n#\n predict.type: response\n#\n threshold: \n#\n time: NA\n#\n id truth response\n#\n 1 1 setosa \nNA\n\n#\n 2 2 setosa \nNA\n\n#\n 3 3 setosa \nNA\n\n#\n 4 4 setosa \nNA\n\n#\n 5 5 setosa \nNA\n\n#\n 6 6 setosa \nNA\n\n\nperformance(pred)\n#\n mmce \n#\n NA\n\n\n\n\nInstead of an exception, a warning is issued and a \nFailureModel\n is created.\nFunction \ngetFailureModelMsg\n extracts the error message.\nAll further steps like prediction and performance calculation work and return \nNA's\n.", + "text": "Configuring mlr\n\n\nIf you really know what you are doing you may think \nmlr\n is limiting you in certain ways.\n\nmlr\n is designed to make usage errors due to typos or invalid parameter values\nas unlikely as possible.\nBut sometimes you want to break those barriers and get full access.\nFor all available options, simply refer to the documentation of \nconfigureMlr\n.\n\n\nFunction \nconfigureMlr\n permits to set options globally for your current \nR\n session.\n\n\nIt is also possible to set options locally.\n\n\n\n\nAll options referring to the behavior of learners (these are all options except \nshow.info\n)\n can be set for an individual learner via the \nconfig\n argument of \nmakeLearner\n.\n The local precedes the global configuration.\n\n\nSome functions like \nresample\n, \nbenchmark\n, \nselectFeatures\n, \ntuneParams\n,\n and \ntuneParamsMultiCrit\n have a \nshow.info\n flag that controls if verbose messages\n are shown. 
The default value of \nshow.info\n can be set by \nconfigureMlr\n.\n\n\n\n\nExample: Reducing the output on the console\n\n\nYou are bothered by all the output on the console like in this example?\n\n\nrdesc = makeResampleDesc(\nHoldout\n)\nr = resample(\nclassif.multinom\n, iris.task, rdesc)\n#\n [Resample] holdout iter: 1\n#\n # weights: 18 (10 variable)\n#\n initial value 109.861229 \n#\n iter 10 value 12.256619\n#\n iter 20 value 3.638740\n#\n iter 30 value 3.228628\n#\n iter 40 value 2.951100\n#\n iter 50 value 2.806521\n#\n iter 60 value 2.739076\n#\n iter 70 value 2.522206\n#\n iter 80 value 2.485225\n#\n iter 90 value 2.381397\n#\n iter 100 value 2.360602\n#\n final value 2.360602 \n#\n stopped after 100 iterations\n#\n [Resample] Result: mmce.test.mean=0.02\n\n\n\n\nYou can suppress the output for this \nLearner\n and this \nresample\n call as follows:\n\n\nlrn = makeLearner(\nclassif.multinom\n, config = list(show.learner.output = FALSE))\nr = resample(lrn, iris.task, rdesc, show.info = FALSE)\n\n\n\n\n(Note that \nmultinom\n has a \ntrace\n switch that can alternatively be used to turn off\nthe progress messages.)\n\n\nTo globally suppress the output for all subsequent learners and calls to \nresample\n,\n\nbenchmark\n, etc. try the following:\n\n\nconfigureMlr(show.learner.output = FALSE, show.info = FALSE)\nr = resample(\nclassif.multinom\n, iris.task, rdesc)\n\n\n\n\nAccessing and resetting the configuration\n\n\nFunction \ngetMlrOptions\n returns a \nlist\n with the current configuration.\n\n\ngetMlrOptions()\n#\n $on.learner.error\n#\n [1] \nstop\n\n#\n \n#\n $on.learner.warning\n#\n [1] \nwarn\n\n#\n \n#\n $on.par.out.of.bounds\n#\n [1] \nstop\n\n#\n \n#\n $on.par.without.desc\n#\n [1] \nstop\n\n#\n \n#\n $show.info\n#\n [1] FALSE\n#\n \n#\n $show.learner.output\n#\n [1] FALSE\n\n\n\n\nTo restore the default configuration call \nconfigureMlr\n with an empty argument list.\n\n\nconfigureMlr()\n\n\n\n\ngetMlrOptions()\n#\n $on.learner.error\n#\n [1] \nstop\n\n#\n \n#\n $on.learner.warning\n#\n [1] \nwarn\n\n#\n \n#\n $on.par.out.of.bounds\n#\n [1] \nstop\n\n#\n \n#\n $on.par.without.desc\n#\n [1] \nstop\n\n#\n \n#\n $show.info\n#\n [1] TRUE\n#\n \n#\n $show.learner.output\n#\n [1] TRUE\n\n\n\n\nExample: Turning off parameter checking\n\n\nIt might happen that you want to access a new parameter of a \nLearner\n which\nis already available in \nmlr\n, but the parameter is not \"registered\" in the learner's\n\nparameter set\n yet.\nIn this case you might want to \ncontact us\n\nor \nopen an issue\n as well!\nBut until then you can turn off \nmlr\n's parameter checking.\nThe parameter setting will then be passed to the underlying function without further ado.\n\n\n## Support Vector Machine with linear kernel and new parameter 'newParam'\nlrn = makeLearner(\nclassif.ksvm\n, kernel = \nvanilladot\n, newParam = 3)\n#\n Error in setHyperPars2.Learner(learner, insert(par.vals, args)): classif.ksvm: Setting parameter newParam without available description object!\n#\n You can switch off this check by using configureMlr!\n\n## Turn off parameter checking completely\nconfigureMlr(on.par.without.desc = \nquiet\n)\nlrn = makeLearner(\nclassif.ksvm\n, kernel = \nvanilladot\n, newParam = 3)\ntrain(lrn, iris.task)\n#\n Setting default kernel parameters\n#\n Model for learner.id=classif.ksvm; learner.class=classif.ksvm\n#\n Trained on: task.id = iris-example; obs = 150; features = 4\n#\n Hyperparameters: fit=FALSE,kernel=vanilladot,newParam=\nnumeric\n\n\n## Option \nquiet\n also masks 
typos\nlrn = makeLearner(\nclassif.ksvm\n, kernl = \nvanilladot\n)\ntrain(lrn, iris.task)\n#\n Model for learner.id=classif.ksvm; learner.class=classif.ksvm\n#\n Trained on: task.id = iris-example; obs = 150; features = 4\n#\n Hyperparameters: fit=FALSE,kernl=\ncharacter\n\n\n## Alternatively turn off parameter checking, but still see warnings\nconfigureMlr(on.par.without.desc = \nwarn\n)\nlrn = makeLearner(\nclassif.ksvm\n, kernl = \nvanilladot\n, newParam = 3)\n#\n Warning in setHyperPars2.Learner(learner, insert(par.vals, args)): classif.ksvm: Setting parameter kernl without available description object!\n#\n You can switch off this check by using configureMlr!\n#\n Warning in setHyperPars2.Learner(learner, insert(par.vals, args)): classif.ksvm: Setting parameter newParam without available description object!\n#\n You can switch off this check by using configureMlr!\n\ntrain(lrn, iris.task)\n#\n Model for learner.id=classif.ksvm; learner.class=classif.ksvm\n#\n Trained on: task.id = iris-example; obs = 150; features = 4\n#\n Hyperparameters: fit=FALSE,kernl=\ncharacter\n,newParam=\nnumeric\n\n\n\n\n\nExample: Handling errors in an underlying learning method\n\n\nIf an underlying learning method throws an error the default behavior of \nmlr\n is to\ngenerate an exception as well.\nHowever, in some situations, for example if you conduct a \nbenchmark study\n\nwith multiple data sets and learners, you usually don't want the whole experiment stopped due\nto one error.\nThe following example shows how to prevent this:\n\n\n## This call gives an error caused by the low number of observations in class \nvirginica\n\ntrain(\nclassif.qda\n, task = iris.task, subset = 1:104)\n#\n Error in qda.default(x, grouping, ...): some group is too small for 'qda'\n#\n Timing stopped at: 0.005 0 0.005\n\n## Turn learner errors into warnings\nconfigureMlr(on.learner.error = \nwarn\n)\nmod = train(\nclassif.qda\n, task = iris.task, subset = 1:104)\n#\n Warning in train(\nclassif.qda\n, task = iris.task, subset = 1:104): Could not train learner classif.qda: Error in qda.default(x, grouping, ...) : \n#\n some group is too small for 'qda'\n\nmod\n#\n Model for learner.id=classif.qda; learner.class=classif.qda\n#\n Trained on: task.id = iris-example; obs = 104; features = 4\n#\n Hyperparameters: \n#\n Training failed: Error in qda.default(x, grouping, ...) : \n#\n some group is too small for 'qda'\n#\n \n#\n Training failed: Error in qda.default(x, grouping, ...) : \n#\n some group is too small for 'qda'\n\n## mod is an object of class FailureModel\nisFailureModel(mod)\n#\n [1] TRUE\n\n## Retrieve the error message\ngetFailureModelMsg(mod)\n#\n [1] \nError in qda.default(x, grouping, ...) 
: \\n some group is too small for 'qda'\\n\n\n\n## predict and performance return NA's\npred = predict(mod, iris.task)\npred\n#\n Prediction: 150 observations\n#\n predict.type: response\n#\n threshold: \n#\n time: NA\n#\n id truth response\n#\n 1 1 setosa \nNA\n\n#\n 2 2 setosa \nNA\n\n#\n 3 3 setosa \nNA\n\n#\n 4 4 setosa \nNA\n\n#\n 5 5 setosa \nNA\n\n#\n 6 6 setosa \nNA\n\n\nperformance(pred)\n#\n mmce \n#\n NA\n\n\n\n\nInstead of an exception, a warning is issued and a \nFailureModel\n is created.\nFunction \ngetFailureModelMsg\n extracts the error message.\nAll further steps like prediction and performance calculation work and return \nNA's\n.", "title": "Configuration" }, { @@ -287,12 +287,12 @@ }, { "location": "/configureMlr/index.html#example-handling-errors-in-an-underlying-learning-method", - "text": "If an underlying learning method throws an error the default behavior of mlr is to\ngenerate an exception as well.\nHowever, in some situations, for example if you conduct a benchmark study \nwith multiple data sets and learners, you usually don't want the whole experiment stopped due\nto one error.\nThe following example shows how to prevent this: ## This call gives an error caused by the low number of observations in class virginica \ntrain( classif.qda , task = iris.task, subset = 1:104)\n# Error in qda.default(x, grouping, ...): some group is too small for 'qda'\n# Timing stopped at: 0.004 0 0.005\n\n## Turn learner errors into warnings\nconfigureMlr(on.learner.error = warn )\nmod = train( classif.qda , task = iris.task, subset = 1:104)\n# Warning in train( classif.qda , task = iris.task, subset = 1:104): Could not train learner classif.qda: Error in qda.default(x, grouping, ...) : \n# some group is too small for 'qda'\n\nmod\n# Model for learner.id=classif.qda; learner.class=classif.qda\n# Trained on: task.id = iris-example; obs = 104; features = 4\n# Hyperparameters: \n# Training failed: Error in qda.default(x, grouping, ...) : \n# some group is too small for 'qda'\n# \n# Training failed: Error in qda.default(x, grouping, ...) : \n# some group is too small for 'qda'\n\n## mod is an object of class FailureModel\nisFailureModel(mod)\n# [1] TRUE\n\n## Retrieve the error message\ngetFailureModelMsg(mod)\n# [1] Error in qda.default(x, grouping, ...) 
: \\n some group is too small for 'qda'\\n \n\n## predict and performance return NA's\npred = predict(mod, iris.task)\npred\n# Prediction: 150 observations\n# predict.type: response\n# threshold: \n# time: NA\n# id truth response\n# 1 1 setosa NA \n# 2 2 setosa NA \n# 3 3 setosa NA \n# 4 4 setosa NA \n# 5 5 setosa NA \n# 6 6 setosa NA \n\nperformance(pred)\n# mmce \n# NA Instead of an exception, a warning is issued and a FailureModel is created.\nFunction getFailureModelMsg extracts the error message.\nAll further steps like prediction and performance calculation work and return NA's .", + "text": "If an underlying learning method throws an error the default behavior of mlr is to\ngenerate an exception as well.\nHowever, in some situations, for example if you conduct a benchmark study \nwith multiple data sets and learners, you usually don't want the whole experiment stopped due\nto one error.\nThe following example shows how to prevent this: ## This call gives an error caused by the low number of observations in class virginica \ntrain( classif.qda , task = iris.task, subset = 1:104)\n# Error in qda.default(x, grouping, ...): some group is too small for 'qda'\n# Timing stopped at: 0.005 0 0.005\n\n## Turn learner errors into warnings\nconfigureMlr(on.learner.error = warn )\nmod = train( classif.qda , task = iris.task, subset = 1:104)\n# Warning in train( classif.qda , task = iris.task, subset = 1:104): Could not train learner classif.qda: Error in qda.default(x, grouping, ...) : \n# some group is too small for 'qda'\n\nmod\n# Model for learner.id=classif.qda; learner.class=classif.qda\n# Trained on: task.id = iris-example; obs = 104; features = 4\n# Hyperparameters: \n# Training failed: Error in qda.default(x, grouping, ...) : \n# some group is too small for 'qda'\n# \n# Training failed: Error in qda.default(x, grouping, ...) : \n# some group is too small for 'qda'\n\n## mod is an object of class FailureModel\nisFailureModel(mod)\n# [1] TRUE\n\n## Retrieve the error message\ngetFailureModelMsg(mod)\n# [1] Error in qda.default(x, grouping, ...) 
: \\n some group is too small for 'qda'\\n \n\n## predict and performance return NA's\npred = predict(mod, iris.task)\npred\n# Prediction: 150 observations\n# predict.type: response\n# threshold: \n# time: NA\n# id truth response\n# 1 1 setosa NA \n# 2 2 setosa NA \n# 3 3 setosa NA \n# 4 4 setosa NA \n# 5 5 setosa NA \n# 6 6 setosa NA \n\nperformance(pred)\n# mmce \n# NA Instead of an exception, a warning is issued and a FailureModel is created.\nFunction getFailureModelMsg extracts the error message.\nAll further steps like prediction and performance calculation work and return NA's .", "title": "Example: Handling errors in an underlying learning method" }, { "location": "/wrapper/index.html", - "text": "Wrapper\n\n\nWrappers can be employed to extend integrated \nlearners\n with new functionality.\nThe broad scope of operations and methods which are implemented as wrappers underline the flexibility of the wrapping approach:\n\n\n\n\nData preprocessing\n\n\nImputation\n\n\nBagging\n\n\nTuning\n\n\nFeature selection\n\n\nCost-sensitive classification\n\n\nOver- and undersampling\n for imbalanced classification problems\n\n\nMulticlass extension\n for binary-class learners\n\n\nMultilabel binary relevance wrapper\n for multilabel\nclassification with the binary relevance method\n\n\n\n\nAll these operations and methods have a few things in common:\nFirst, they all wrap around \nmlr\n \nlearners\n and they return a new learner.\nTherefore learners can be wrapped multiple times.\nSecond, they are implemented using a \ntrain\n (pre-model hook) and \npredict\n (post-model hook) method.\n\n\nExample: Bagging wrapper\n\n\nIn this section we exemplary describe the bagging wrapper to create a random forest which supports weights.\nTo achieve that we combine several decision trees from the \nrpart\n package to create our own custom random forest.\n\n\nFirst, we create a weighted toy task.\n\n\ndata(iris)\ntask = makeClassifTask(data = iris, target = \nSpecies\n, weights = as.integer(iris$Species))\n\n\n\n\nNext, we use \nmakeBaggingWrapper\n to create the base learners and the bagged learner.\nWe choose to set equivalents of \nntree\n (100 base learners) and \nmtry\n (proportion of randomly selected features).\n\n\nbase.lrn = makeLearner(\nclassif.rpart\n)\nwrapped.lrn = makeBaggingWrapper(base.lrn, bw.iters = 100, bw.feats = 0.5)\nprint(wrapped.lrn)\n#\n Learner classif.rpart.bagged from package rpart\n#\n Type: classif\n#\n Name: ; Short name: \n#\n Class: BaggingWrapper\n#\n Properties: twoclass,multiclass,missings,numerics,factors,ordered,prob,weights\n#\n Predict-Type: response\n#\n Hyperparameters: xval=0,bw.iters=100,bw.feats=0.5\n\n\n\n\nAs we can see in the output, the wrapped learner inherited all properties from the base learner, especially the \"weights\" attribute is still present.\nWe can use this newly constructed learner like all base learners, i.e. 
we can use it in \ntrain\n, \nbenchmark\n, \nresample\n, etc.\n\n\nbenchmark(tasks = task, learners = list(base.lrn, wrapped.lrn))\n#\n Task: iris, Learner: classif.rpart\n#\n [Resample] cross-validation iter: 1\n#\n [Resample] cross-validation iter: 2\n#\n [Resample] cross-validation iter: 3\n#\n [Resample] cross-validation iter: 4\n#\n [Resample] cross-validation iter: 5\n#\n [Resample] cross-validation iter: 6\n#\n [Resample] cross-validation iter: 7\n#\n [Resample] cross-validation iter: 8\n#\n [Resample] cross-validation iter: 9\n#\n [Resample] cross-validation iter: 10\n#\n [Resample] Result: mmce.test.mean=0.0667\n#\n Task: iris, Learner: classif.rpart.bagged\n#\n [Resample] cross-validation iter: 1\n#\n [Resample] cross-validation iter: 2\n#\n [Resample] cross-validation iter: 3\n#\n [Resample] cross-validation iter: 4\n#\n [Resample] cross-validation iter: 5\n#\n [Resample] cross-validation iter: 6\n#\n [Resample] cross-validation iter: 7\n#\n [Resample] cross-validation iter: 8\n#\n [Resample] cross-validation iter: 9\n#\n [Resample] cross-validation iter: 10\n#\n [Resample] Result: mmce.test.mean=0.06\n#\n task.id learner.id mmce.test.mean\n#\n 1 iris classif.rpart 0.06666667\n#\n 2 iris classif.rpart.bagged 0.06000000\n\n\n\n\nThat far we are quite happy with our new learner.\nBut we hope for a better performance by tuning some hyperparameters of both the decision trees and bagging wrapper.\nLet's have a look at the available hyperparameters of the fused learner:\n\n\ngetParamSet(wrapped.lrn)\n#\n Type len Def Constr Req Tunable Trafo\n#\n bw.iters integer - 10 1 to Inf - TRUE -\n#\n bw.replace logical - TRUE - - TRUE -\n#\n bw.size numeric - - 0 to 1 - TRUE -\n#\n bw.feats numeric - 0.667 0 to 1 - TRUE -\n#\n minsplit integer - 20 1 to Inf - TRUE -\n#\n minbucket integer - - 1 to Inf - TRUE -\n#\n cp numeric - 0.01 0 to 1 - TRUE -\n#\n maxcompete integer - 4 0 to Inf - TRUE -\n#\n maxsurrogate integer - 5 0 to Inf - TRUE -\n#\n usesurrogate discrete - 2 0,1,2 - TRUE -\n#\n surrogatestyle discrete - 0 0,1 - TRUE -\n#\n maxdepth integer - 30 1 to 30 - TRUE -\n#\n xval integer - 10 0 to Inf - TRUE -\n#\n parms untyped - - - - FALSE -\n\n\n\n\nWe choose to tune the parameters \nminsplit\n and \nbw.feats\n for the \nmmce\n using a \nrandom search\n in a 3-fold CV:\n\n\nctrl = makeTuneControlRandom(maxit = 10)\nrdesc = makeResampleDesc(\nCV\n, iters = 3)\npar.set = makeParamSet(\n makeIntegerParam(\nminsplit\n, lower = 1, upper = 10),\n makeNumericParam(\nbw.feats\n, lower = 0.25, upper = 1)\n)\ntuned.lrn = makeTuneWrapper(wrapped.lrn, rdesc, mmce, par.set, ctrl)\nprint(tuned.lrn)\n#\n Learner classif.rpart.bagged.tuned from package rpart\n#\n Type: classif\n#\n Name: ; Short name: \n#\n Class: TuneWrapper\n#\n Properties: numerics,factors,ordered,missings,weights,prob,twoclass,multiclass\n#\n Predict-Type: response\n#\n Hyperparameters: xval=0,bw.iters=100,bw.feats=0.5\n\n\n\n\nCalling the train method of the newly constructed learner performs the following steps:\n\n\n\n\nThe tuning wrapper sets parameters for the underlying model in slot \n$next.learner\n and calls its train method.\n\n\nNext learner is the bagging wrapper. 
The passed down argument \nbw.feats\n is used in the bagging wrapper training function, the argument \nminsplit\n gets passed down to \n$next.learner\n.\n The base wrapper function calls the base learner \nbw.iters\n times and stores the resulting models.\n\n\nThe bagged models are evaluated using the mean \nmmce\n (default aggregation for this performance measure) and new parameters are selected using the tuning method.\n\n\nThis is repeated until the tuner terminates. Output is a tuned bagged learner.\n\n\n\n\nlrn = train(tuned.lrn, task = task)\n#\n [Tune] Started tuning learner classif.rpart.bagged for parameter set:\n#\n Type len Def Constr Req Tunable Trafo\n#\n minsplit integer - - 1 to 10 - TRUE -\n#\n bw.feats numeric - - 0.25 to 1 - TRUE -\n#\n With control class: TuneControlRandom\n#\n Imputation value: 1\n#\n [Tune-x] 1: minsplit=5; bw.feats=0.935\n#\n [Tune-y] 1: mmce.test.mean=0.0467; time: 0.1 min; memory: 160Mb use, 478Mb max\n#\n [Tune-x] 2: minsplit=9; bw.feats=0.675\n#\n [Tune-y] 2: mmce.test.mean=0.0467; time: 0.1 min; memory: 160Mb use, 478Mb max\n#\n [Tune-x] 3: minsplit=2; bw.feats=0.847\n#\n [Tune-y] 3: mmce.test.mean=0.0467; time: 0.1 min; memory: 160Mb use, 478Mb max\n#\n [Tune-x] 4: minsplit=4; bw.feats=0.761\n#\n [Tune-y] 4: mmce.test.mean=0.0467; time: 0.1 min; memory: 160Mb use, 478Mb max\n#\n [Tune-x] 5: minsplit=6; bw.feats=0.338\n#\n [Tune-y] 5: mmce.test.mean=0.0867; time: 0.1 min; memory: 160Mb use, 478Mb max\n#\n [Tune-x] 6: minsplit=1; bw.feats=0.637\n#\n [Tune-y] 6: mmce.test.mean=0.0467; time: 0.1 min; memory: 160Mb use, 478Mb max\n#\n [Tune-x] 7: minsplit=1; bw.feats=0.998\n#\n [Tune-y] 7: mmce.test.mean=0.0467; time: 0.1 min; memory: 160Mb use, 478Mb max\n#\n [Tune-x] 8: minsplit=4; bw.feats=0.698\n#\n [Tune-y] 8: mmce.test.mean=0.0467; time: 0.1 min; memory: 160Mb use, 478Mb max\n#\n [Tune-x] 9: minsplit=3; bw.feats=0.836\n#\n [Tune-y] 9: mmce.test.mean=0.0467; time: 0.1 min; memory: 160Mb use, 478Mb max\n#\n [Tune-x] 10: minsplit=10; bw.feats=0.529\n#\n [Tune-y] 10: mmce.test.mean=0.0533; time: 0.1 min; memory: 160Mb use, 478Mb max\n#\n [Tune] Result: minsplit=1; bw.feats=0.998 : mmce.test.mean=0.0467\nprint(lrn)\n#\n Model for learner.id=classif.rpart.bagged.tuned; learner.class=TuneWrapper\n#\n Trained on: task.id = iris; obs = 150; features = 4\n#\n Hyperparameters: xval=0,bw.iters=100,bw.feats=0.5", + "text": "Wrapper\n\n\nWrappers can be employed to extend integrated \nlearners\n with new functionality.\nThe broad scope of operations and methods which are implemented as wrappers underline the flexibility of the wrapping approach:\n\n\n\n\nData preprocessing\n\n\nImputation\n\n\nBagging\n\n\nTuning\n\n\nFeature selection\n\n\nCost-sensitive classification\n\n\nOver- and undersampling\n for imbalanced classification problems\n\n\nMulticlass extension\n for binary-class learners\n\n\nMultilabel binary relevance wrapper\n for multilabel\nclassification with the binary relevance method\n\n\n\n\nAll these operations and methods have a few things in common:\nFirst, they all wrap around \nmlr\n \nlearners\n and they return a new learner.\nTherefore learners can be wrapped multiple times.\nSecond, they are implemented using a \ntrain\n (pre-model hook) and \npredict\n (post-model hook) method.\n\n\nExample: Bagging wrapper\n\n\nIn this section we exemplary describe the bagging wrapper to create a random forest which supports weights.\nTo achieve that we combine several decision trees from the \nrpart\n package to create our own custom random 
forest.\n\n\nFirst, we create a weighted toy task.\n\n\ndata(iris)\ntask = makeClassifTask(data = iris, target = \nSpecies\n, weights = as.integer(iris$Species))\n\n\n\n\nNext, we use \nmakeBaggingWrapper\n to create the base learners and the bagged learner.\nWe choose to set equivalents of \nntree\n (100 base learners) and \nmtry\n (proportion of randomly selected features).\n\n\nbase.lrn = makeLearner(\nclassif.rpart\n)\nwrapped.lrn = makeBaggingWrapper(base.lrn, bw.iters = 100, bw.feats = 0.5)\nprint(wrapped.lrn)\n#\n Learner classif.rpart.bagged from package rpart\n#\n Type: classif\n#\n Name: ; Short name: \n#\n Class: BaggingWrapper\n#\n Properties: twoclass,multiclass,missings,numerics,factors,ordered,prob,weights\n#\n Predict-Type: response\n#\n Hyperparameters: xval=0,bw.iters=100,bw.feats=0.5\n\n\n\n\nAs we can see in the output, the wrapped learner inherited all properties from the base learner, especially the \"weights\" attribute is still present.\nWe can use this newly constructed learner like all base learners, i.e. we can use it in \ntrain\n, \nbenchmark\n, \nresample\n, etc.\n\n\nbenchmark(tasks = task, learners = list(base.lrn, wrapped.lrn))\n#\n Task: iris, Learner: classif.rpart\n#\n [Resample] cross-validation iter: 1\n#\n [Resample] cross-validation iter: 2\n#\n [Resample] cross-validation iter: 3\n#\n [Resample] cross-validation iter: 4\n#\n [Resample] cross-validation iter: 5\n#\n [Resample] cross-validation iter: 6\n#\n [Resample] cross-validation iter: 7\n#\n [Resample] cross-validation iter: 8\n#\n [Resample] cross-validation iter: 9\n#\n [Resample] cross-validation iter: 10\n#\n [Resample] Result: mmce.test.mean=0.0667\n#\n Task: iris, Learner: classif.rpart.bagged\n#\n [Resample] cross-validation iter: 1\n#\n [Resample] cross-validation iter: 2\n#\n [Resample] cross-validation iter: 3\n#\n [Resample] cross-validation iter: 4\n#\n [Resample] cross-validation iter: 5\n#\n [Resample] cross-validation iter: 6\n#\n [Resample] cross-validation iter: 7\n#\n [Resample] cross-validation iter: 8\n#\n [Resample] cross-validation iter: 9\n#\n [Resample] cross-validation iter: 10\n#\n [Resample] Result: mmce.test.mean=0.06\n#\n task.id learner.id mmce.test.mean\n#\n 1 iris classif.rpart 0.06666667\n#\n 2 iris classif.rpart.bagged 0.06000000\n\n\n\n\nThat far we are quite happy with our new learner.\nBut we hope for a better performance by tuning some hyperparameters of both the decision trees and bagging wrapper.\nLet's have a look at the available hyperparameters of the fused learner:\n\n\ngetParamSet(wrapped.lrn)\n#\n Type len Def Constr Req Tunable Trafo\n#\n bw.iters integer - 10 1 to Inf - TRUE -\n#\n bw.replace logical - TRUE - - TRUE -\n#\n bw.size numeric - - 0 to 1 - TRUE -\n#\n bw.feats numeric - 0.667 0 to 1 - TRUE -\n#\n minsplit integer - 20 1 to Inf - TRUE -\n#\n minbucket integer - - 1 to Inf - TRUE -\n#\n cp numeric - 0.01 0 to 1 - TRUE -\n#\n maxcompete integer - 4 0 to Inf - TRUE -\n#\n maxsurrogate integer - 5 0 to Inf - TRUE -\n#\n usesurrogate discrete - 2 0,1,2 - TRUE -\n#\n surrogatestyle discrete - 0 0,1 - TRUE -\n#\n maxdepth integer - 30 1 to 30 - TRUE -\n#\n xval integer - 10 0 to Inf - TRUE -\n#\n parms untyped - - - - FALSE -\n\n\n\n\nWe choose to tune the parameters \nminsplit\n and \nbw.feats\n for the \nmmce\n using a \nrandom search\n in a 3-fold CV:\n\n\nctrl = makeTuneControlRandom(maxit = 10)\nrdesc = makeResampleDesc(\nCV\n, iters = 3)\npar.set = makeParamSet(\n makeIntegerParam(\nminsplit\n, lower = 1, upper = 10),\n 
makeNumericParam(\nbw.feats\n, lower = 0.25, upper = 1)\n)\ntuned.lrn = makeTuneWrapper(wrapped.lrn, rdesc, mmce, par.set, ctrl)\nprint(tuned.lrn)\n#\n Learner classif.rpart.bagged.tuned from package rpart\n#\n Type: classif\n#\n Name: ; Short name: \n#\n Class: TuneWrapper\n#\n Properties: numerics,factors,ordered,missings,weights,prob,twoclass,multiclass\n#\n Predict-Type: response\n#\n Hyperparameters: xval=0,bw.iters=100,bw.feats=0.5\n\n\n\n\nCalling the train method of the newly constructed learner performs the following steps:\n\n\n\n\nThe tuning wrapper sets parameters for the underlying model in slot \n$next.learner\n and calls its train method.\n\n\nNext learner is the bagging wrapper. The passed down argument \nbw.feats\n is used in the bagging wrapper training function, the argument \nminsplit\n gets passed down to \n$next.learner\n.\n The base wrapper function calls the base learner \nbw.iters\n times and stores the resulting models.\n\n\nThe bagged models are evaluated using the mean \nmmce\n (default aggregation for this performance measure) and new parameters are selected using the tuning method.\n\n\nThis is repeated until the tuner terminates. Output is a tuned bagged learner.\n\n\n\n\nlrn = train(tuned.lrn, task = task)\n#\n [Tune] Started tuning learner classif.rpart.bagged for parameter set:\n#\n Type len Def Constr Req Tunable Trafo\n#\n minsplit integer - - 1 to 10 - TRUE -\n#\n bw.feats numeric - - 0.25 to 1 - TRUE -\n#\n With control class: TuneControlRandom\n#\n Imputation value: 1\n#\n [Tune-x] 1: minsplit=5; bw.feats=0.935\n#\n [Tune-y] 1: mmce.test.mean=0.0467; time: 0.1 min; memory: 160Mb use, 522Mb max\n#\n [Tune-x] 2: minsplit=9; bw.feats=0.675\n#\n [Tune-y] 2: mmce.test.mean=0.0467; time: 0.1 min; memory: 160Mb use, 522Mb max\n#\n [Tune-x] 3: minsplit=2; bw.feats=0.847\n#\n [Tune-y] 3: mmce.test.mean=0.0467; time: 0.1 min; memory: 160Mb use, 522Mb max\n#\n [Tune-x] 4: minsplit=4; bw.feats=0.761\n#\n [Tune-y] 4: mmce.test.mean=0.0467; time: 0.1 min; memory: 160Mb use, 522Mb max\n#\n [Tune-x] 5: minsplit=6; bw.feats=0.338\n#\n [Tune-y] 5: mmce.test.mean=0.0867; time: 0.1 min; memory: 160Mb use, 522Mb max\n#\n [Tune-x] 6: minsplit=1; bw.feats=0.637\n#\n [Tune-y] 6: mmce.test.mean=0.0467; time: 0.1 min; memory: 160Mb use, 522Mb max\n#\n [Tune-x] 7: minsplit=1; bw.feats=0.998\n#\n [Tune-y] 7: mmce.test.mean=0.0467; time: 0.1 min; memory: 160Mb use, 522Mb max\n#\n [Tune-x] 8: minsplit=4; bw.feats=0.698\n#\n [Tune-y] 8: mmce.test.mean=0.0467; time: 0.1 min; memory: 160Mb use, 522Mb max\n#\n [Tune-x] 9: minsplit=3; bw.feats=0.836\n#\n [Tune-y] 9: mmce.test.mean=0.0467; time: 0.1 min; memory: 160Mb use, 522Mb max\n#\n [Tune-x] 10: minsplit=10; bw.feats=0.529\n#\n [Tune-y] 10: mmce.test.mean=0.0533; time: 0.1 min; memory: 160Mb use, 522Mb max\n#\n [Tune] Result: minsplit=1; bw.feats=0.998 : mmce.test.mean=0.0467\nprint(lrn)\n#\n Model for learner.id=classif.rpart.bagged.tuned; learner.class=TuneWrapper\n#\n Trained on: task.id = iris; obs = 150; features = 4\n#\n Hyperparameters: xval=0,bw.iters=100,bw.feats=0.5", "title": "Wrapped Learners" }, { @@ -302,12 +302,12 @@ }, { "location": "/wrapper/index.html#example-bagging-wrapper", - "text": "In this section we exemplary describe the bagging wrapper to create a random forest which supports weights.\nTo achieve that we combine several decision trees from the rpart package to create our own custom random forest. First, we create a weighted toy task. 
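A hedged aside, not part of the original example: the weights passed below are simply the integer class codes, i.e. setosa = 1, versicolor = 2, virginica = 3, a toy choice purely for demonstration. Once the task has been built by the call that follows, they can be inspected with getTaskWeights; since the first six iris rows are all setosa, the expected output would be

## assumes the task object created in the next code chunk
head(getTaskWeights(task))
#> [1] 1 1 1 1 1 1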
data(iris)\ntask = makeClassifTask(data = iris, target = Species , weights = as.integer(iris$Species)) Next, we use makeBaggingWrapper to create the base learners and the bagged learner.\nWe choose to set equivalents of ntree (100 base learners) and mtry (proportion of randomly selected features). base.lrn = makeLearner( classif.rpart )\nwrapped.lrn = makeBaggingWrapper(base.lrn, bw.iters = 100, bw.feats = 0.5)\nprint(wrapped.lrn)\n# Learner classif.rpart.bagged from package rpart\n# Type: classif\n# Name: ; Short name: \n# Class: BaggingWrapper\n# Properties: twoclass,multiclass,missings,numerics,factors,ordered,prob,weights\n# Predict-Type: response\n# Hyperparameters: xval=0,bw.iters=100,bw.feats=0.5 As we can see in the output, the wrapped learner inherited all properties from the base learner, especially the \"weights\" attribute is still present.\nWe can use this newly constructed learner like all base learners, i.e. we can use it in train , benchmark , resample , etc. benchmark(tasks = task, learners = list(base.lrn, wrapped.lrn))\n# Task: iris, Learner: classif.rpart\n# [Resample] cross-validation iter: 1\n# [Resample] cross-validation iter: 2\n# [Resample] cross-validation iter: 3\n# [Resample] cross-validation iter: 4\n# [Resample] cross-validation iter: 5\n# [Resample] cross-validation iter: 6\n# [Resample] cross-validation iter: 7\n# [Resample] cross-validation iter: 8\n# [Resample] cross-validation iter: 9\n# [Resample] cross-validation iter: 10\n# [Resample] Result: mmce.test.mean=0.0667\n# Task: iris, Learner: classif.rpart.bagged\n# [Resample] cross-validation iter: 1\n# [Resample] cross-validation iter: 2\n# [Resample] cross-validation iter: 3\n# [Resample] cross-validation iter: 4\n# [Resample] cross-validation iter: 5\n# [Resample] cross-validation iter: 6\n# [Resample] cross-validation iter: 7\n# [Resample] cross-validation iter: 8\n# [Resample] cross-validation iter: 9\n# [Resample] cross-validation iter: 10\n# [Resample] Result: mmce.test.mean=0.06\n# task.id learner.id mmce.test.mean\n# 1 iris classif.rpart 0.06666667\n# 2 iris classif.rpart.bagged 0.06000000 That far we are quite happy with our new learner.\nBut we hope for a better performance by tuning some hyperparameters of both the decision trees and bagging wrapper.\nLet's have a look at the available hyperparameters of the fused learner: getParamSet(wrapped.lrn)\n# Type len Def Constr Req Tunable Trafo\n# bw.iters integer - 10 1 to Inf - TRUE -\n# bw.replace logical - TRUE - - TRUE -\n# bw.size numeric - - 0 to 1 - TRUE -\n# bw.feats numeric - 0.667 0 to 1 - TRUE -\n# minsplit integer - 20 1 to Inf - TRUE -\n# minbucket integer - - 1 to Inf - TRUE -\n# cp numeric - 0.01 0 to 1 - TRUE -\n# maxcompete integer - 4 0 to Inf - TRUE -\n# maxsurrogate integer - 5 0 to Inf - TRUE -\n# usesurrogate discrete - 2 0,1,2 - TRUE -\n# surrogatestyle discrete - 0 0,1 - TRUE -\n# maxdepth integer - 30 1 to 30 - TRUE -\n# xval integer - 10 0 to Inf - TRUE -\n# parms untyped - - - - FALSE - We choose to tune the parameters minsplit and bw.feats for the mmce using a random search in a 3-fold CV: ctrl = makeTuneControlRandom(maxit = 10)\nrdesc = makeResampleDesc( CV , iters = 3)\npar.set = makeParamSet(\n makeIntegerParam( minsplit , lower = 1, upper = 10),\n makeNumericParam( bw.feats , lower = 0.25, upper = 1)\n)\ntuned.lrn = makeTuneWrapper(wrapped.lrn, rdesc, mmce, par.set, ctrl)\nprint(tuned.lrn)\n# Learner classif.rpart.bagged.tuned from package rpart\n# Type: classif\n# Name: ; Short name: \n# Class: TuneWrapper\n# 
Properties: numerics,factors,ordered,missings,weights,prob,twoclass,multiclass\n# Predict-Type: response\n# Hyperparameters: xval=0,bw.iters=100,bw.feats=0.5 Calling the train method of the newly constructed learner performs the following steps: The tuning wrapper sets parameters for the underlying model in slot $next.learner and calls its train method. Next learner is the bagging wrapper. The passed down argument bw.feats is used in the bagging wrapper training function, the argument minsplit gets passed down to $next.learner .\n The base wrapper function calls the base learner bw.iters times and stores the resulting models. The bagged models are evaluated using the mean mmce (default aggregation for this performance measure) and new parameters are selected using the tuning method. This is repeated until the tuner terminates. Output is a tuned bagged learner. lrn = train(tuned.lrn, task = task)\n# [Tune] Started tuning learner classif.rpart.bagged for parameter set:\n# Type len Def Constr Req Tunable Trafo\n# minsplit integer - - 1 to 10 - TRUE -\n# bw.feats numeric - - 0.25 to 1 - TRUE -\n# With control class: TuneControlRandom\n# Imputation value: 1\n# [Tune-x] 1: minsplit=5; bw.feats=0.935\n# [Tune-y] 1: mmce.test.mean=0.0467; time: 0.1 min; memory: 160Mb use, 478Mb max\n# [Tune-x] 2: minsplit=9; bw.feats=0.675\n# [Tune-y] 2: mmce.test.mean=0.0467; time: 0.1 min; memory: 160Mb use, 478Mb max\n# [Tune-x] 3: minsplit=2; bw.feats=0.847\n# [Tune-y] 3: mmce.test.mean=0.0467; time: 0.1 min; memory: 160Mb use, 478Mb max\n# [Tune-x] 4: minsplit=4; bw.feats=0.761\n# [Tune-y] 4: mmce.test.mean=0.0467; time: 0.1 min; memory: 160Mb use, 478Mb max\n# [Tune-x] 5: minsplit=6; bw.feats=0.338\n# [Tune-y] 5: mmce.test.mean=0.0867; time: 0.1 min; memory: 160Mb use, 478Mb max\n# [Tune-x] 6: minsplit=1; bw.feats=0.637\n# [Tune-y] 6: mmce.test.mean=0.0467; time: 0.1 min; memory: 160Mb use, 478Mb max\n# [Tune-x] 7: minsplit=1; bw.feats=0.998\n# [Tune-y] 7: mmce.test.mean=0.0467; time: 0.1 min; memory: 160Mb use, 478Mb max\n# [Tune-x] 8: minsplit=4; bw.feats=0.698\n# [Tune-y] 8: mmce.test.mean=0.0467; time: 0.1 min; memory: 160Mb use, 478Mb max\n# [Tune-x] 9: minsplit=3; bw.feats=0.836\n# [Tune-y] 9: mmce.test.mean=0.0467; time: 0.1 min; memory: 160Mb use, 478Mb max\n# [Tune-x] 10: minsplit=10; bw.feats=0.529\n# [Tune-y] 10: mmce.test.mean=0.0533; time: 0.1 min; memory: 160Mb use, 478Mb max\n# [Tune] Result: minsplit=1; bw.feats=0.998 : mmce.test.mean=0.0467\nprint(lrn)\n# Model for learner.id=classif.rpart.bagged.tuned; learner.class=TuneWrapper\n# Trained on: task.id = iris; obs = 150; features = 4\n# Hyperparameters: xval=0,bw.iters=100,bw.feats=0.5", + "text": "In this section we exemplary describe the bagging wrapper to create a random forest which supports weights.\nTo achieve that we combine several decision trees from the rpart package to create our own custom random forest. First, we create a weighted toy task. data(iris)\ntask = makeClassifTask(data = iris, target = Species , weights = as.integer(iris$Species)) Next, we use makeBaggingWrapper to create the base learners and the bagged learner.\nWe choose to set equivalents of ntree (100 base learners) and mtry (proportion of randomly selected features). 
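A hedged side note that is not part of the worked example: makeBaggingWrapper additionally exposes bw.replace and bw.size (both visible in the getParamSet output further down), which control whether the bootstrap samples are drawn with replacement and how large each sample is relative to the training set. A self-contained sketch of such a call might look like

## sketch only: 50 iterations, subsampling 80% of the observations without replacement
makeBaggingWrapper(makeLearner("classif.rpart"), bw.iters = 50,
  bw.replace = FALSE, bw.size = 0.8, bw.feats = 0.5)

The example itself only sets bw.iters and bw.feats.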
base.lrn = makeLearner( classif.rpart )\nwrapped.lrn = makeBaggingWrapper(base.lrn, bw.iters = 100, bw.feats = 0.5)\nprint(wrapped.lrn)\n# Learner classif.rpart.bagged from package rpart\n# Type: classif\n# Name: ; Short name: \n# Class: BaggingWrapper\n# Properties: twoclass,multiclass,missings,numerics,factors,ordered,prob,weights\n# Predict-Type: response\n# Hyperparameters: xval=0,bw.iters=100,bw.feats=0.5 As we can see in the output, the wrapped learner inherited all properties from the base learner, especially the \"weights\" attribute is still present.\nWe can use this newly constructed learner like all base learners, i.e. we can use it in train , benchmark , resample , etc. benchmark(tasks = task, learners = list(base.lrn, wrapped.lrn))\n# Task: iris, Learner: classif.rpart\n# [Resample] cross-validation iter: 1\n# [Resample] cross-validation iter: 2\n# [Resample] cross-validation iter: 3\n# [Resample] cross-validation iter: 4\n# [Resample] cross-validation iter: 5\n# [Resample] cross-validation iter: 6\n# [Resample] cross-validation iter: 7\n# [Resample] cross-validation iter: 8\n# [Resample] cross-validation iter: 9\n# [Resample] cross-validation iter: 10\n# [Resample] Result: mmce.test.mean=0.0667\n# Task: iris, Learner: classif.rpart.bagged\n# [Resample] cross-validation iter: 1\n# [Resample] cross-validation iter: 2\n# [Resample] cross-validation iter: 3\n# [Resample] cross-validation iter: 4\n# [Resample] cross-validation iter: 5\n# [Resample] cross-validation iter: 6\n# [Resample] cross-validation iter: 7\n# [Resample] cross-validation iter: 8\n# [Resample] cross-validation iter: 9\n# [Resample] cross-validation iter: 10\n# [Resample] Result: mmce.test.mean=0.06\n# task.id learner.id mmce.test.mean\n# 1 iris classif.rpart 0.06666667\n# 2 iris classif.rpart.bagged 0.06000000 That far we are quite happy with our new learner.\nBut we hope for a better performance by tuning some hyperparameters of both the decision trees and bagging wrapper.\nLet's have a look at the available hyperparameters of the fused learner: getParamSet(wrapped.lrn)\n# Type len Def Constr Req Tunable Trafo\n# bw.iters integer - 10 1 to Inf - TRUE -\n# bw.replace logical - TRUE - - TRUE -\n# bw.size numeric - - 0 to 1 - TRUE -\n# bw.feats numeric - 0.667 0 to 1 - TRUE -\n# minsplit integer - 20 1 to Inf - TRUE -\n# minbucket integer - - 1 to Inf - TRUE -\n# cp numeric - 0.01 0 to 1 - TRUE -\n# maxcompete integer - 4 0 to Inf - TRUE -\n# maxsurrogate integer - 5 0 to Inf - TRUE -\n# usesurrogate discrete - 2 0,1,2 - TRUE -\n# surrogatestyle discrete - 0 0,1 - TRUE -\n# maxdepth integer - 30 1 to 30 - TRUE -\n# xval integer - 10 0 to Inf - TRUE -\n# parms untyped - - - - FALSE - We choose to tune the parameters minsplit and bw.feats for the mmce using a random search in a 3-fold CV: ctrl = makeTuneControlRandom(maxit = 10)\nrdesc = makeResampleDesc( CV , iters = 3)\npar.set = makeParamSet(\n makeIntegerParam( minsplit , lower = 1, upper = 10),\n makeNumericParam( bw.feats , lower = 0.25, upper = 1)\n)\ntuned.lrn = makeTuneWrapper(wrapped.lrn, rdesc, mmce, par.set, ctrl)\nprint(tuned.lrn)\n# Learner classif.rpart.bagged.tuned from package rpart\n# Type: classif\n# Name: ; Short name: \n# Class: TuneWrapper\n# Properties: numerics,factors,ordered,missings,weights,prob,twoclass,multiclass\n# Predict-Type: response\n# Hyperparameters: xval=0,bw.iters=100,bw.feats=0.5 Calling the train method of the newly constructed learner performs the following steps: The tuning wrapper sets parameters for the underlying model 
in slot $next.learner and calls its train method. Next learner is the bagging wrapper. The passed down argument bw.feats is used in the bagging wrapper training function, the argument minsplit gets passed down to $next.learner .\n The base wrapper function calls the base learner bw.iters times and stores the resulting models. The bagged models are evaluated using the mean mmce (default aggregation for this performance measure) and new parameters are selected using the tuning method. This is repeated until the tuner terminates. Output is a tuned bagged learner. lrn = train(tuned.lrn, task = task)\n# [Tune] Started tuning learner classif.rpart.bagged for parameter set:\n# Type len Def Constr Req Tunable Trafo\n# minsplit integer - - 1 to 10 - TRUE -\n# bw.feats numeric - - 0.25 to 1 - TRUE -\n# With control class: TuneControlRandom\n# Imputation value: 1\n# [Tune-x] 1: minsplit=5; bw.feats=0.935\n# [Tune-y] 1: mmce.test.mean=0.0467; time: 0.1 min; memory: 160Mb use, 522Mb max\n# [Tune-x] 2: minsplit=9; bw.feats=0.675\n# [Tune-y] 2: mmce.test.mean=0.0467; time: 0.1 min; memory: 160Mb use, 522Mb max\n# [Tune-x] 3: minsplit=2; bw.feats=0.847\n# [Tune-y] 3: mmce.test.mean=0.0467; time: 0.1 min; memory: 160Mb use, 522Mb max\n# [Tune-x] 4: minsplit=4; bw.feats=0.761\n# [Tune-y] 4: mmce.test.mean=0.0467; time: 0.1 min; memory: 160Mb use, 522Mb max\n# [Tune-x] 5: minsplit=6; bw.feats=0.338\n# [Tune-y] 5: mmce.test.mean=0.0867; time: 0.1 min; memory: 160Mb use, 522Mb max\n# [Tune-x] 6: minsplit=1; bw.feats=0.637\n# [Tune-y] 6: mmce.test.mean=0.0467; time: 0.1 min; memory: 160Mb use, 522Mb max\n# [Tune-x] 7: minsplit=1; bw.feats=0.998\n# [Tune-y] 7: mmce.test.mean=0.0467; time: 0.1 min; memory: 160Mb use, 522Mb max\n# [Tune-x] 8: minsplit=4; bw.feats=0.698\n# [Tune-y] 8: mmce.test.mean=0.0467; time: 0.1 min; memory: 160Mb use, 522Mb max\n# [Tune-x] 9: minsplit=3; bw.feats=0.836\n# [Tune-y] 9: mmce.test.mean=0.0467; time: 0.1 min; memory: 160Mb use, 522Mb max\n# [Tune-x] 10: minsplit=10; bw.feats=0.529\n# [Tune-y] 10: mmce.test.mean=0.0533; time: 0.1 min; memory: 160Mb use, 522Mb max\n# [Tune] Result: minsplit=1; bw.feats=0.998 : mmce.test.mean=0.0467\nprint(lrn)\n# Model for learner.id=classif.rpart.bagged.tuned; learner.class=TuneWrapper\n# Trained on: task.id = iris; obs = 150; features = 4\n# Hyperparameters: xval=0,bw.iters=100,bw.feats=0.5", "title": "Example: Bagging wrapper" }, { "location": "/preproc/index.html", - "text": "Data Preprocessing\n\n\nData preprocessing refers to any transformation of the data done before applying a learning\nalgorithm.\nThis comprises for example finding and resolving inconsistencies, imputation of missing values,\nidentifying, removing or replacing outliers, discretizing numerical data or generating numerical\ndummy variables for categorical data, any kind of transformation like standardization of predictors\nor Box-Cox, dimensionality reduction and feature extraction and/or selection.\n\n\nmlr\n offers several options for data preprocessing.\nSome of the following simple methods to change a \nTask\n were already mentioned in the\nsection about \nlearning tasks\n:\n\n\n\n\ncapLargeValues\n: Convert large/infinite numeric values in a \ndata.frame\n\n or \nTask\n.\n\n\ncreateDummyFeatures\n: Generate dummy variables for factor features in a\n \ndata.frame\n or \nTask\n.\n\n\ndropFeatures\n: Remove some features from a \nTask\n.\n\n\njoinClassLevels\n: Only for classification: Merge existing classes in a \nTask\n to\n new, larger 
classes.\n\n\nmergeSmallFactorLevels\n: Merge infrequent levels of factor features in a \nTask\n.\n\n\nnormalizeFeatures\n: Normalize features in a \nTask\n by different methods, e.g.,\n standardization or scaling to a certain range.\n\n\nremoveConstantFeatures\n: Remove constant features from a \nTask\n.\n\n\nsubsetTask\n: Remove observations and/or features from a \nTask\n.\n\n\n\n\nMoreover, there are tutorial sections devoted to\n\n\n\n\nFeature selection\n and\n\n\nImputation of missing values\n.\n\n\n\n\nFusing learners with preprocessing\n\n\nmlr\n's wrapper functionality permits to combine learners with preprocessing steps.\nThis means that the preprocessing \"belongs\" to the learner and is done any time the learner\nis trained or predictions are made.\n\n\nThis is, on the one hand, very practical.\nYou don't need to change any data or learning \nTask\ns and it's quite easy to combine\ndifferent learners with different preprocessing steps.\n\n\nOn the other hand this helps to avoid a common mistake in evaluating the performance of a\nlearner with preprocessing:\nPreprocessing is often seen as completely independent of the later applied learning algorithms.\nWhen estimating the performance of the a learner, e.g., by cross-validation all preprocessing\nis done beforehand on the full data set and only training/predicting the learner is done on the\ntrain/test sets.\nDepending on what exactly is done as preprocessing this can lead to overoptimistic results.\nFor example if imputation by the mean is done on the whole data set before evaluating the learner\nperformance you are using information from the test data during training, which can cause\noveroptimistic performance results.\n\n\nTo clarify things one should distinguish between \ndata-dependent\n and \ndata-independent\n\npreprocessing steps:\nData-dependent steps in some way learn from the data and give different results when applied to\ndifferent data sets. 
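For instance (a minimal sketch in plain R, not taken from the tutorial): mean imputation is data-dependent because the fill-in value is learned from whatever data the step happens to see.

x = c(1, 2, NA, 10)
mean(x[1:2], na.rm = TRUE)  ## fill-in value learned from a subset of the data
#> [1] 1.5
mean(x, na.rm = TRUE)       ## fill-in value learned from the full data
#> [1] 4.333333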
Data-independent steps always lead to the same results.\nClearly, correcting errors in the data or removing data columns like Ids that should\nnot be used for learning, is data-independent.\nImputation of missing values by the mean, as mentioned above, is data-dependent.\nImputation by a fixed constant, however, is not.\n\n\nTo get a honest estimate of learner performance combined with preprocessing, all data-dependent\npreprocessing steps must be included in the resampling.\nThis is automatically done when fusing a learner with preprocessing.\n\n\nTo this end \nmlr\n provides two \nwrappers\n:\n\n\n\n\nmakePreprocWrapperCaret\n is an interface to all preprocessing options offered by \ncaret\n's\n \npreProcess\n function.\n\n\nmakePreprocWrapper\n permits to write your own custom preprocessing methods by defining\n the actions to be taken before training and before prediction.\n\n\n\n\nAs mentioned above the specified preprocessing steps then \"belong\" to the wrapped \nLearner\n.\nIn contrast to the preprocessing options listed above like \nnormalizeFeatures\n\n\n\n\nthe \nTask\n itself remains unchanged,\n\n\nthe preprocessing is not done globally, i.e., for the whole data set, but for every pair of\n training/test data sets in, e.g., resampling,\n\n\nany parameters controlling the preprocessing as, e.g., the percentage of outliers to be removed\n can be \ntuned\n together with the base learner parameters.\n\n\n\n\nWe start with some examples for \nmakePreprocWrapperCaret\n.\n\n\nPreprocessing with makePreprocWrapperCaret\n\n\nmakePreprocWrapperCaret\n is an interface to \ncaret\n's \npreProcess\n\nfunction that provides many different options like imputation of missing values,\ndata transformations as scaling the features to a certain range or Box-Cox and dimensionality\nreduction via Independent or Principal Component Analysis.\nFor all possible options see the help page of function \npreProcess\n.\n\n\nNote that the usage of \nmakePreprocWrapperCaret\n is slightly different than that of\n\npreProcess\n.\n\n\n\n\nmakePreprocWrapperCaret\n takes (almost) the same formal arguments as \npreProcess\n,\n but their names are prefixed by \nppc.\n.\n\n\nThe only exception: \nmakePreprocWrapperCaret\n does not have a \nmethod\n argument. 
Instead\n all preprocessing options that would be passed to \npreProcess\n's \nmethod\n\n argument are given as individual logical parameters to \nmakePreprocWrapperCaret\n.\n\n\n\n\nFor example the following call to \npreProcess\n\n\npreProcess(x, method = c(\nknnImpute\n, \npca\n), pcaComp = 10)\n\n\n\n\nwith \nx\n being a \nmatrix\n or \ndata.frame\n\nwould thus translate into\n\n\nmakePreprocWrapperCaret(learner, ppc.knnImpute = TRUE, ppc.pca = TRUE, ppc.pcaComp = 10)\n\n\n\n\nwhere \nlearner\n is a \nmlr\n \nLearner\n or the name of a learner class like\n\n\"classif.lda\"\n.\n\n\nIf you enable multiple preprocessing options (like knn imputation and principal component\nanalysis above) these are executed in a certain order detailed on the help page of function\n\npreProcess\n.\n\n\nIn the following we show an example where principal components analysis (PCA) is used for\ndimensionality reduction.\nThis should never be applied blindly, but can be beneficial with learners that get problems\nwith high dimensionality or those that can profit from rotating the data.\n\n\nWe consider the \nsonar.task\n, which poses a binary classification problem with 208 observations\nand 60 features.\n\n\nsonar.task\n#\n Supervised task: Sonar-example\n#\n Type: classif\n#\n Target: Class\n#\n Observations: 208\n#\n Features:\n#\n numerics factors ordered \n#\n 60 0 0 \n#\n Missings: FALSE\n#\n Has weights: FALSE\n#\n Has blocking: FALSE\n#\n Classes: 2\n#\n M R \n#\n 111 97 \n#\n Positive class: M\n\n\n\n\nBelow we fuse \nquadratic discriminant analysis\n from package \nMASS\n with a principal\ncomponents preprocessing step.\nThe threshold is set to 0.9, i.e., the principal components necessary to explain a cumulative\npercentage of 90% of the total variance are kept.\nThe data are automatically standardized prior to PCA.\n\n\nlrn = makePreprocWrapperCaret(\nclassif.qda\n, ppc.pca = TRUE, ppc.thresh = 0.9)\nlrn\n#\n Learner classif.qda.preproc from package MASS\n#\n Type: classif\n#\n Name: ; Short name: \n#\n Class: PreprocWrapperCaret\n#\n Properties: twoclass,multiclass,numerics,factors,prob\n#\n Predict-Type: response\n#\n Hyperparameters: ppc.BoxCox=FALSE,ppc.YeoJohnson=FALSE,ppc.expoTrans=FALSE,ppc.center=TRUE,ppc.scale=TRUE,ppc.range=FALSE,ppc.knnImpute=FALSE,ppc.bagImpute=FALSE,ppc.medianImpute=FALSE,ppc.pca=TRUE,ppc.ica=FALSE,ppc.spatialSign=FALSE,ppc.thresh=0.9,ppc.na.remove=TRUE,ppc.k=5,ppc.fudge=0.2,ppc.numUnique=3\n\n\n\n\nThe wrapped learner is trained on the \nsonar.task\n.\nBy inspecting the underlying \nqda\n model, we see that the first 22\nprincipal components have been used for training.\n\n\nmod = train(lrn, sonar.task)\nmod\n#\n Model for learner.id=classif.qda.preproc; learner.class=PreprocWrapperCaret\n#\n Trained on: task.id = Sonar-example; obs = 208; features = 60\n#\n Hyperparameters: ppc.BoxCox=FALSE,ppc.YeoJohnson=FALSE,ppc.expoTrans=FALSE,ppc.center=TRUE,ppc.scale=TRUE,ppc.range=FALSE,ppc.knnImpute=FALSE,ppc.bagImpute=FALSE,ppc.medianImpute=FALSE,ppc.pca=TRUE,ppc.ica=FALSE,ppc.spatialSign=FALSE,ppc.thresh=0.9,ppc.na.remove=TRUE,ppc.k=5,ppc.fudge=0.2,ppc.numUnique=3\n\ngetLearnerModel(mod)\n#\n Model for learner.id=classif.qda; learner.class=classif.qda\n#\n Trained on: task.id = Sonar-example; obs = 208; features = 22\n#\n Hyperparameters:\n\ngetLearnerModel(mod, more.unwrap = TRUE)\n#\n Call:\n#\n qda(f, data = getTaskData(.task, .subset, recode.target = \ndrop.levels\n))\n#\n \n#\n Prior probabilities of groups:\n#\n M R \n#\n 0.5336538 0.4663462 \n#\n \n#\n Group 
means:\n#\n PC1 PC2 PC3 PC4 PC5 PC6\n#\n M 0.5976122 -0.8058235 0.9773518 0.03794232 -0.04568166 -0.06721702\n#\n R -0.6838655 0.9221279 -1.1184128 -0.04341853 0.05227489 0.07691845\n#\n PC7 PC8 PC9 PC10 PC11 PC12\n#\n M 0.2278162 -0.01034406 -0.2530606 -0.1793157 -0.04084466 -0.0004789888\n#\n R -0.2606969 0.01183702 0.2895848 0.2051963 0.04673977 0.0005481212\n#\n PC13 PC14 PC15 PC16 PC17 PC18\n#\n M -0.06138758 -0.1057137 0.02808048 0.05215865 -0.07453265 0.03869042\n#\n R 0.07024765 0.1209713 -0.03213333 -0.05968671 0.08528994 -0.04427460\n#\n PC19 PC20 PC21 PC22\n#\n M -0.01192247 0.006098658 0.01263492 -0.001224809\n#\n R 0.01364323 -0.006978877 -0.01445851 0.001401586\n\n\n\n\nBelow the performances of \nqda\n with and without PCA preprocessing are compared\nin a \nbenchmark experiment\n.\nNote that we use stratified resampling to prevent errors in \nqda\n due to a too\nsmall number of observations from either class.\n\n\nrin = makeResampleInstance(\nCV\n, iters = 3, stratify = TRUE, task = sonar.task)\nres = benchmark(list(makeLearner(\nclassif.qda\n), lrn), sonar.task, rin, show.info = FALSE)\nres\n#\n task.id learner.id mmce.test.mean\n#\n 1 Sonar-example classif.qda 0.3941339\n#\n 2 Sonar-example classif.qda.preproc 0.2643202\n\n\n\n\nPCA preprocessing in this case turns out to be really beneficial for the\nperformance of Quadratic Discriminant Analysis.\n\n\nJoint tuning of preprocessing options and learner parameters\n\n\nLet's see if we can optimize this a bit.\nThe threshold value of 0.9 above was chosen arbitrarily and led to 22 out of 60 principal\ncomponents.\nBut maybe a lower or higher number of principal components should be used.\nMoreover, \nqda\n has several options that control how the class covariance matrices\nor class probabilities are estimated.\n\n\nThose preprocessing and learner parameters can be \ntuned\n jointly.\nBefore doing this let's first get an overview of all the parameters of the wrapped learner\nusing function \ngetParamSet\n.\n\n\ngetParamSet(lrn)\n#\n Type len Def Constr Req\n#\n ppc.BoxCox logical - FALSE - -\n#\n ppc.YeoJohnson logical - FALSE - -\n#\n ppc.expoTrans logical - FALSE - -\n#\n ppc.center logical - TRUE - -\n#\n ppc.scale logical - TRUE - -\n#\n ppc.range logical - FALSE - -\n#\n ppc.knnImpute logical - FALSE - -\n#\n ppc.bagImpute logical - FALSE - -\n#\n ppc.medianImpute logical - FALSE - -\n#\n ppc.pca logical - FALSE - -\n#\n ppc.ica logical - FALSE - -\n#\n ppc.spatialSign logical - FALSE - -\n#\n ppc.thresh numeric - 0.95 0 to Inf -\n#\n ppc.pcaComp integer - - 1 to Inf -\n#\n ppc.na.remove logical - TRUE - -\n#\n ppc.k integer - 5 1 to Inf -\n#\n ppc.fudge numeric - 0.2 0 to Inf -\n#\n ppc.numUnique integer - 3 1 to Inf -\n#\n method discrete - moment moment,mle,mve,t -\n#\n nu numeric - 5 2 to Inf Y\n#\n predict.method discrete - plug-in plug-in,predictive,debiased -\n#\n Tunable Trafo\n#\n ppc.BoxCox TRUE -\n#\n ppc.YeoJohnson TRUE -\n#\n ppc.expoTrans TRUE -\n#\n ppc.center TRUE -\n#\n ppc.scale TRUE -\n#\n ppc.range TRUE -\n#\n ppc.knnImpute TRUE -\n#\n ppc.bagImpute TRUE -\n#\n ppc.medianImpute TRUE -\n#\n ppc.pca TRUE -\n#\n ppc.ica TRUE -\n#\n ppc.spatialSign TRUE -\n#\n ppc.thresh TRUE -\n#\n ppc.pcaComp TRUE -\n#\n ppc.na.remove TRUE -\n#\n ppc.k TRUE -\n#\n ppc.fudge TRUE -\n#\n ppc.numUnique TRUE -\n#\n method TRUE -\n#\n nu TRUE -\n#\n predict.method TRUE -\n\n\n\n\nThe parameters prefixed by \nppc.\n belong to preprocessing. 
\nmethod\n, \nnu\n and \npredict.method\n\nare \nqda\n parameters.\n\n\nInstead of tuning the PCA threshold (\nppc.thresh\n) we tune the number of principal\ncomponents (\nppc.pcaComp\n) directly.\nMoreover, for \nqda\n we try two different ways to estimate the posterior probabilities\n(parameter \npredict.method\n): the usual plug-in estimates and unbiased estimates.\n\n\nWe perform a grid search and set the resolution to 10.\nThis is for demonstration. You might want to use a finer resolution.\n\n\nps = makeParamSet(\n makeIntegerParam(\nppc.pcaComp\n, lower = 1, upper = getTaskNFeats(sonar.task)),\n makeDiscreteParam(\npredict.method\n, values = c(\nplug-in\n, \ndebiased\n))\n)\nctrl = makeTuneControlGrid(resolution = 10)\nres = tuneParams(lrn, sonar.task, rin, par.set = ps, control = ctrl, show.info = FALSE)\nres\n#\n Tune result:\n#\n Op. pars: ppc.pcaComp=8; predict.method=plug-in\n#\n mmce.test.mean=0.192\n\nas.data.frame(res$opt.path)[1:3]\n#\n ppc.pcaComp predict.method mmce.test.mean\n#\n 1 1 plug-in 0.4757074\n#\n 2 8 plug-in 0.1920635\n#\n 3 14 plug-in 0.2162871\n#\n 4 21 plug-in 0.2643202\n#\n 5 27 plug-in 0.2454106\n#\n 6 34 plug-in 0.2645273\n#\n 7 40 plug-in 0.2742581\n#\n 8 47 plug-in 0.3173223\n#\n 9 53 plug-in 0.3512767\n#\n 10 60 plug-in 0.3941339\n#\n 11 1 debiased 0.5336094\n#\n 12 8 debiased 0.2450656\n#\n 13 14 debiased 0.2403037\n#\n 14 21 debiased 0.2546584\n#\n 15 27 debiased 0.3075224\n#\n 16 34 debiased 0.3172533\n#\n 17 40 debiased 0.3125604\n#\n 18 47 debiased 0.2979986\n#\n 19 53 debiased 0.3079365\n#\n 20 60 debiased 0.3654244\n\n\n\n\nThere seems to be a preference for a lower number of principal components (\n27) for both \n\"plug-in\"\n\nand \n\"debiased\"\n with \n\"plug-in\"\n achieving slightly lower error rates.\n\n\nWriting a custom preprocessing wrapper\n\n\nIf the options offered by \nmakePreprocWrapperCaret\n are not enough, you can write your own\npreprocessing wrapper using function \nmakePreprocWrapper\n.\n\n\nAs described in the tutorial section about \nwrapped learners\n wrappers are\nimplemented using a \ntrain\n and a \npredict\n method.\nIn case of preprocessing wrappers these methods specify how to transform the data before\ntraining and before prediction and are \ncompletely user-defined\n.\n\n\nBelow we show how to create a preprocessing wrapper that centers and scales the data before\ntraining/predicting.\nSome learning methods as, e.g., k nearest neighbors, support vector machines or neural networks\nusually require scaled features.\nMany, but not all, have a built-in scaling option where the training data set is scaled before\nmodel fitting and the test data set is scaled accordingly, that is by using the scaling\nparameters from the training stage, before making predictions.\nIn the following we show how to add a scaling option to a \nLearner\n by coupling\nit with function \nscale\n.\n\n\nNote that we chose this simple example for demonstration.\nCentering/scaling the data is also possible with \nmakePreprocWrapperCaret\n.\n\n\nSpecifying the train function\n\n\nThe \ntrain\n function has to be a function with the following arguments:\n\n\n\n\ndata\n is a \ndata.frame\n with columns for all features and\n the target variable.\n\n\ntarget\n is a string and denotes the name of the target variable in \ndata\n.\n\n\nargs\n is a \nlist\n of further arguments and parameters that influence the\n preprocessing.\n\n\n\n\nIt must return a \nlist\n with elements \n$data\n and \n$control\n,\nwhere \n$data\n is the preprocessed data set and 
\n$control\n stores all information required\nto preprocess the data before prediction.\n\n\nThe \ntrain\n function for the scaling example is given below. It calls \nscale\n on the\nnumerical features and returns the scaled training data and the corresponding scaling parameters.\n\n\nargs\n contains the \ncenter\n and \nscale\n arguments of function \nscale\n\nand slot \n$control\n stores the scaling parameters to be used in the prediction stage.\n\n\nRegarding the latter note that the \ncenter\n and \nscale\n arguments of \nscale\n\ncan be either a logical value or a numeric vector of length equal to the number of the numeric\ncolumns in \ndata\n, respectively.\nIf a logical value was passed to \nargs\n we store the column means and standard deviations/\nroot mean squares in the \n$center\n and \n$scale\n slots of the returned \n$control\n object.\n\n\ntrainfun = function(data, target, args = list(center, scale)) {\n ## Identify numerical features\n cns = colnames(data)\n nums = setdiff(cns[sapply(data, is.numeric)], target)\n ## Extract numerical features from the data set and call scale\n x = as.matrix(data[, nums, drop = FALSE])\n x = scale(x, center = args$center, scale = args$scale)\n ## Store the scaling parameters in control\n ## These are needed to preprocess the data before prediction\n control = args\n if (is.logical(control$center) \n control$center)\n control$center = attr(x, \nscaled:center\n)\n if (is.logical(control$scale) \n control$scale)\n control$scale = attr(x, \nscaled:scale\n)\n ## Recombine the data\n data = data[, setdiff(cns, nums), drop = FALSE]\n data = cbind(data, as.data.frame(x))\n return(list(data = data, control = control))\n}\n\n\n\n\nSpecifying the predict function\n\n\nThe \npredict\n function has the following arguments:\n\n\n\n\ndata\n is a \ndata.frame\n containing \nonly\n feature values\n (as for prediction the target values naturally are not known).\n\n\ntarget\n is a string indicating the name of the target variable.\n\n\nargs\n are the \nargs\n that were passed to the \ntrain\n function.\n\n\ncontrol\n is the object returned by the \ntrain\n function.\n\n\n\n\nIt returns the preprocessed data.\n\n\nIn our scaling example the \npredict\n function scales the numerical features using the\nparameters from the training stage stored in \ncontrol\n.\n\n\npredictfun = function(data, target, args, control) {\n ## Identify numerical features\n cns = colnames(data)\n nums = cns[sapply(data, is.numeric)]\n ## Extract numerical features from the data set and call scale\n x = as.matrix(data[, nums, drop = FALSE])\n x = scale(x, center = control$center, scale = control$scale)\n ## Recombine the data\n data = data[, setdiff(cns, nums), drop = FALSE] \n data = cbind(data, as.data.frame(x))\n return(data)\n}\n\n\n\n\nCreating the preprocessing wrapper\n\n\nBelow we create a preprocessing wrapper with a \nregression neural network\n (which\nitself does not have a scaling option) as base learner.\n\n\nThe \ntrain\n and \npredict\n functions defined above are passed to \nmakePreprocWrapper\n via\nthe \ntrain\n and \npredict\n arguments.\n\npar.vals\n is a \nlist\n of parameter values that is relayed to the \nargs\n\nargument of the \ntrain\n function.\n\n\nlrn = makeLearner(\nregr.nnet\n, trace = FALSE, decay = 1e-02)\nlrn = makePreprocWrapper(lrn, train = trainfun, predict = predictfun,\n par.vals = list(center = TRUE, scale = TRUE))\nlrn\n#\n Learner regr.nnet.preproc from package nnet\n#\n Type: regr\n#\n Name: ; Short name: \n#\n Class: PreprocWrapper\n#\n 
Properties: numerics,factors,weights\n#\n Predict-Type: response\n#\n Hyperparameters: size=3,trace=FALSE,decay=0.01\n\n\n\n\nLet's compare the cross-validated mean squared error (\nmse\n) on the\n\nBoston Housing data set\n with and without scaling.\n\n\nrdesc = makeResampleDesc(\nCV\n, iters = 3)\n\nr = resample(lrn, bh.task, resampling = rdesc, show.info = FALSE)\nr\n#\n Resample Result\n#\n Task: BostonHousing-example\n#\n Learner: regr.nnet.preproc\n#\n mse.aggr: 20.62\n#\n mse.mean: 20.62\n#\n mse.sd: 8.53\n#\n Runtime: 0.218279\n\nlrn = makeLearner(\nregr.nnet\n, trace = FALSE, decay = 1e-02)\nr = resample(lrn, bh.task, resampling = rdesc, show.info = FALSE)\nr\n#\n Resample Result\n#\n Task: BostonHousing-example\n#\n Learner: regr.nnet\n#\n mse.aggr: 55.06\n#\n mse.mean: 55.06\n#\n mse.sd: 19.40\n#\n Runtime: 0.177695\n\n\n\n\nJoint tuning of preprocessing and learner parameters\n\n\nOften it's not clear which preprocessing options work best with a certain learning algorithm.\nAs already shown for the number of principal components in \nmakePreprocWrapperCaret\n we can\n\ntune\n them easily together with other hyperparameters of the learner.\n\n\nIn our scaling example we can try if \nnnet\n works best with both centering and\nscaling the data or if it's better to omit one of the two operations or do no preprocessing\nat all.\nIn order to tune \ncenter\n and \nscale\n we have to add appropriate \nLearnerParam\ns\nto the \nparameter set\n of the wrapped learner.\n\n\nAs mentioned above \nscale\n allows for numeric and logical \ncenter\n and \nscale\n\narguments. As we want to use the latter option we declare \ncenter\n and \nscale\n as logical\nlearner parameters.\n\n\nlrn = makeLearner(\nregr.nnet\n, trace = FALSE)\nlrn = makePreprocWrapper(lrn, train = trainfun, predict = predictfun,\n par.set = makeParamSet(\n makeLogicalLearnerParam(\ncenter\n),\n makeLogicalLearnerParam(\nscale\n)\n ),\n par.vals = list(center = TRUE, scale = TRUE))\n\nlrn\n#\n Learner regr.nnet.preproc from package nnet\n#\n Type: regr\n#\n Name: ; Short name: \n#\n Class: PreprocWrapper\n#\n Properties: numerics,factors,weights\n#\n Predict-Type: response\n#\n Hyperparameters: size=3,trace=FALSE,center=TRUE,scale=TRUE\n\ngetParamSet(lrn)\n#\n Type len Def Constr Req Tunable Trafo\n#\n center logical - - - - TRUE -\n#\n scale logical - - - - TRUE -\n#\n size integer - 3 0 to Inf - TRUE -\n#\n maxit integer - 100 1 to Inf - TRUE -\n#\n linout logical - FALSE - Y TRUE -\n#\n entropy logical - FALSE - Y TRUE -\n#\n softmax logical - FALSE - Y TRUE -\n#\n censored logical - FALSE - Y TRUE -\n#\n skip logical - FALSE - - TRUE -\n#\n rang numeric - 0.7 -Inf to Inf - TRUE -\n#\n decay numeric - 0 0 to Inf - TRUE -\n#\n Hess logical - FALSE - - TRUE -\n#\n trace logical - TRUE - - FALSE -\n#\n MaxNWts integer - 1000 1 to Inf - TRUE -\n#\n abstoll numeric - 0.0001 -Inf to Inf - TRUE -\n#\n reltoll numeric - 1e-08 -Inf to Inf - TRUE -\n\n\n\n\nNow we do a simple grid search for the \ndecay\n parameter of \nnnet\n and the\n\ncenter\n and \nscale\n parameters.\n\n\nrdesc = makeResampleDesc(\nHoldout\n)\nps = makeParamSet(\n makeDiscreteParam(\ndecay\n, c(0, 0.05, 0.1)),\n makeLogicalParam(\ncenter\n),\n makeLogicalParam(\nscale\n)\n)\nctrl = makeTuneControlGrid()\nres = tuneParams(lrn, bh.task, rdesc, par.set = ps, control = ctrl, show.info = FALSE)\n\nres\n#\n Tune result:\n#\n Op. 
pars: decay=0.05; center=FALSE; scale=TRUE\n#\n mse.test.mean=14.8\n\nas.data.frame(res$opt.path)\n#\n decay center scale mse.test.mean dob eol error.message exec.time\n#\n 1 0 TRUE TRUE 49.38128 1 NA \nNA\n 0.069\n#\n 2 0.05 TRUE TRUE 20.64761 2 NA \nNA\n 0.082\n#\n 3 0.1 TRUE TRUE 22.42986 3 NA \nNA\n 0.078\n#\n 4 0 FALSE TRUE 96.25474 4 NA \nNA\n 0.034\n#\n 5 0.05 FALSE TRUE 14.84306 5 NA \nNA\n 0.082\n#\n 6 0.1 FALSE TRUE 16.65383 6 NA \nNA\n 0.078\n#\n 7 0 TRUE FALSE 40.51518 7 NA \nNA\n 0.079\n#\n 8 0.05 TRUE FALSE 68.00069 8 NA \nNA\n 0.077\n#\n 9 0.1 TRUE FALSE 55.42210 9 NA \nNA\n 0.085\n#\n 10 0 FALSE FALSE 96.25474 10 NA \nNA\n 0.032\n#\n 11 0.05 FALSE FALSE 56.25758 11 NA \nNA\n 0.083\n#\n 12 0.1 FALSE FALSE 42.85529 12 NA \nNA\n 0.079\n\n\n\n\nPreprocessing wrapper functions\n\n\nIf you have written a preprocessing wrapper that you might want to use from time to time\nit's a good idea to encapsulate it in an own function as shown below.\nIf you think your preprocessing method is something others might want to use as well and should\nbe integrated into \nmlr\n just \ncontact us\n.\n\n\nmakePreprocWrapperScale = function(learner) {\n trainfun = function(data, target, args = list(center, scale)) {\n cns = colnames(data)\n nums = setdiff(cns[sapply(data, is.numeric)], target)\n x = as.matrix(data[, nums, drop = FALSE])\n x = scale(x, center = args$center, scale = args$scale)\n control = args\n if (is.logical(control$center) \n control$center)\n control$center = attr(x, \nscaled:center\n)\n if (is.logical(control$scale) \n control$scale)\n control$scale = attr(x, \nscaled:scale\n)\n data = data[, setdiff(cns, nums), drop = FALSE]\n data = cbind(data, as.data.frame(x))\n return(list(data = data, control = control))\n }\n predictfun = function(data, target, args, control) {\n cns = colnames(data)\n nums = cns[sapply(data, is.numeric)]\n x = as.matrix(data[, nums, drop = FALSE])\n x = scale(x, center = control$center, scale = control$scale)\n data = data[, setdiff(cns, nums), drop = FALSE] \n data = cbind(data, as.data.frame(x))\n return(data)\n }\n makePreprocWrapper(\n learner,\n train = trainfun,\n predict = predictfun,\n par.set = makeParamSet(\n makeLogicalLearnerParam(\ncenter\n),\n makeLogicalLearnerParam(\nscale\n)\n ),\n par.vals = list(center = TRUE, scale = TRUE)\n )\n}\n\nmakePreprocWrapperScale(\nclassif.lda\n)\n#\n Learner classif.lda.preproc from package MASS\n#\n Type: classif\n#\n Name: ; Short name: \n#\n Class: PreprocWrapper\n#\n Properties: numerics,factors,prob,twoclass,multiclass\n#\n Predict-Type: response\n#\n Hyperparameters: center=TRUE,scale=TRUE", + "text": "Data Preprocessing\n\n\nData preprocessing refers to any transformation of the data done before applying a learning\nalgorithm.\nThis comprises for example finding and resolving inconsistencies, imputation of missing values,\nidentifying, removing or replacing outliers, discretizing numerical data or generating numerical\ndummy variables for categorical data, any kind of transformation like standardization of predictors\nor Box-Cox, dimensionality reduction and feature extraction and/or selection.\n\n\nmlr\n offers several options for data preprocessing.\nSome of the following simple methods to change a \nTask\n were already mentioned in the\nsection about \nlearning tasks\n:\n\n\n\n\ncapLargeValues\n: Convert large/infinite numeric values in a \ndata.frame\n\n or \nTask\n.\n\n\ncreateDummyFeatures\n: Generate dummy variables for factor features in a\n \ndata.frame\n or \nTask\n.\n\n\ndropFeatures\n: 
Remove some features from a \nTask\n.\n\n\njoinClassLevels\n: Only for classification: Merge existing classes in a \nTask\n to\n new, larger classes.\n\n\nmergeSmallFactorLevels\n: Merge infrequent levels of factor features in a \nTask\n.\n\n\nnormalizeFeatures\n: Normalize features in a \nTask\n by different methods, e.g.,\n standardization or scaling to a certain range.\n\n\nremoveConstantFeatures\n: Remove constant features from a \nTask\n.\n\n\nsubsetTask\n: Remove observations and/or features from a \nTask\n.\n\n\n\n\nMoreover, there are tutorial sections devoted to\n\n\n\n\nFeature selection\n and\n\n\nImputation of missing values\n.\n\n\n\n\nFusing learners with preprocessing\n\n\nmlr\n's wrapper functionality permits to combine learners with preprocessing steps.\nThis means that the preprocessing \"belongs\" to the learner and is done any time the learner\nis trained or predictions are made.\n\n\nThis is, on the one hand, very practical.\nYou don't need to change any data or learning \nTask\ns and it's quite easy to combine\ndifferent learners with different preprocessing steps.\n\n\nOn the other hand this helps to avoid a common mistake in evaluating the performance of a\nlearner with preprocessing:\nPreprocessing is often seen as completely independent of the later applied learning algorithms.\nWhen estimating the performance of the a learner, e.g., by cross-validation all preprocessing\nis done beforehand on the full data set and only training/predicting the learner is done on the\ntrain/test sets.\nDepending on what exactly is done as preprocessing this can lead to overoptimistic results.\nFor example if imputation by the mean is done on the whole data set before evaluating the learner\nperformance you are using information from the test data during training, which can cause\noveroptimistic performance results.\n\n\nTo clarify things one should distinguish between \ndata-dependent\n and \ndata-independent\n\npreprocessing steps:\nData-dependent steps in some way learn from the data and give different results when applied to\ndifferent data sets. 
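For instance (a minimal sketch in plain R, not taken from the tutorial): mean imputation is data-dependent because the fill-in value is learned from whatever data the step happens to see.

x = c(1, 2, NA, 10)
mean(x[1:2], na.rm = TRUE)  ## fill-in value learned from a subset of the data
#> [1] 1.5
mean(x, na.rm = TRUE)       ## fill-in value learned from the full data
#> [1] 4.333333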
Data-independent steps always lead to the same results.\nClearly, correcting errors in the data or removing data columns like Ids that should\nnot be used for learning, is data-independent.\nImputation of missing values by the mean, as mentioned above, is data-dependent.\nImputation by a fixed constant, however, is not.\n\n\nTo get a honest estimate of learner performance combined with preprocessing, all data-dependent\npreprocessing steps must be included in the resampling.\nThis is automatically done when fusing a learner with preprocessing.\n\n\nTo this end \nmlr\n provides two \nwrappers\n:\n\n\n\n\nmakePreprocWrapperCaret\n is an interface to all preprocessing options offered by \ncaret\n's\n \npreProcess\n function.\n\n\nmakePreprocWrapper\n permits to write your own custom preprocessing methods by defining\n the actions to be taken before training and before prediction.\n\n\n\n\nAs mentioned above the specified preprocessing steps then \"belong\" to the wrapped \nLearner\n.\nIn contrast to the preprocessing options listed above like \nnormalizeFeatures\n\n\n\n\nthe \nTask\n itself remains unchanged,\n\n\nthe preprocessing is not done globally, i.e., for the whole data set, but for every pair of\n training/test data sets in, e.g., resampling,\n\n\nany parameters controlling the preprocessing as, e.g., the percentage of outliers to be removed\n can be \ntuned\n together with the base learner parameters.\n\n\n\n\nWe start with some examples for \nmakePreprocWrapperCaret\n.\n\n\nPreprocessing with makePreprocWrapperCaret\n\n\nmakePreprocWrapperCaret\n is an interface to \ncaret\n's \npreProcess\n\nfunction that provides many different options like imputation of missing values,\ndata transformations as scaling the features to a certain range or Box-Cox and dimensionality\nreduction via Independent or Principal Component Analysis.\nFor all possible options see the help page of function \npreProcess\n.\n\n\nNote that the usage of \nmakePreprocWrapperCaret\n is slightly different than that of\n\npreProcess\n.\n\n\n\n\nmakePreprocWrapperCaret\n takes (almost) the same formal arguments as \npreProcess\n,\n but their names are prefixed by \nppc.\n.\n\n\nThe only exception: \nmakePreprocWrapperCaret\n does not have a \nmethod\n argument. 
Instead\n all preprocessing options that would be passed to \npreProcess\n's \nmethod\n\n argument are given as individual logical parameters to \nmakePreprocWrapperCaret\n.\n\n\n\n\nFor example the following call to \npreProcess\n\n\npreProcess(x, method = c(\nknnImpute\n, \npca\n), pcaComp = 10)\n\n\n\n\nwith \nx\n being a \nmatrix\n or \ndata.frame\n\nwould thus translate into\n\n\nmakePreprocWrapperCaret(learner, ppc.knnImpute = TRUE, ppc.pca = TRUE, ppc.pcaComp = 10)\n\n\n\n\nwhere \nlearner\n is a \nmlr\n \nLearner\n or the name of a learner class like\n\n\"classif.lda\"\n.\n\n\nIf you enable multiple preprocessing options (like knn imputation and principal component\nanalysis above) these are executed in a certain order detailed on the help page of function\n\npreProcess\n.\n\n\nIn the following we show an example where principal components analysis (PCA) is used for\ndimensionality reduction.\nThis should never be applied blindly, but can be beneficial with learners that get problems\nwith high dimensionality or those that can profit from rotating the data.\n\n\nWe consider the \nsonar.task\n, which poses a binary classification problem with 208 observations\nand 60 features.\n\n\nsonar.task\n#\n Supervised task: Sonar-example\n#\n Type: classif\n#\n Target: Class\n#\n Observations: 208\n#\n Features:\n#\n numerics factors ordered \n#\n 60 0 0 \n#\n Missings: FALSE\n#\n Has weights: FALSE\n#\n Has blocking: FALSE\n#\n Classes: 2\n#\n M R \n#\n 111 97 \n#\n Positive class: M\n\n\n\n\nBelow we fuse \nquadratic discriminant analysis\n from package \nMASS\n with a principal\ncomponents preprocessing step.\nThe threshold is set to 0.9, i.e., the principal components necessary to explain a cumulative\npercentage of 90% of the total variance are kept.\nThe data are automatically standardized prior to PCA.\n\n\nlrn = makePreprocWrapperCaret(\nclassif.qda\n, ppc.pca = TRUE, ppc.thresh = 0.9)\nlrn\n#\n Learner classif.qda.preproc from package MASS\n#\n Type: classif\n#\n Name: ; Short name: \n#\n Class: PreprocWrapperCaret\n#\n Properties: twoclass,multiclass,numerics,factors,prob\n#\n Predict-Type: response\n#\n Hyperparameters: ppc.BoxCox=FALSE,ppc.YeoJohnson=FALSE,ppc.expoTrans=FALSE,ppc.center=TRUE,ppc.scale=TRUE,ppc.range=FALSE,ppc.knnImpute=FALSE,ppc.bagImpute=FALSE,ppc.medianImpute=FALSE,ppc.pca=TRUE,ppc.ica=FALSE,ppc.spatialSign=FALSE,ppc.thresh=0.9,ppc.na.remove=TRUE,ppc.k=5,ppc.fudge=0.2,ppc.numUnique=3\n\n\n\n\nThe wrapped learner is trained on the \nsonar.task\n.\nBy inspecting the underlying \nqda\n model, we see that the first 22\nprincipal components have been used for training.\n\n\nmod = train(lrn, sonar.task)\nmod\n#\n Model for learner.id=classif.qda.preproc; learner.class=PreprocWrapperCaret\n#\n Trained on: task.id = Sonar-example; obs = 208; features = 60\n#\n Hyperparameters: ppc.BoxCox=FALSE,ppc.YeoJohnson=FALSE,ppc.expoTrans=FALSE,ppc.center=TRUE,ppc.scale=TRUE,ppc.range=FALSE,ppc.knnImpute=FALSE,ppc.bagImpute=FALSE,ppc.medianImpute=FALSE,ppc.pca=TRUE,ppc.ica=FALSE,ppc.spatialSign=FALSE,ppc.thresh=0.9,ppc.na.remove=TRUE,ppc.k=5,ppc.fudge=0.2,ppc.numUnique=3\n\ngetLearnerModel(mod)\n#\n Model for learner.id=classif.qda; learner.class=classif.qda\n#\n Trained on: task.id = Sonar-example; obs = 208; features = 22\n#\n Hyperparameters:\n\ngetLearnerModel(mod, more.unwrap = TRUE)\n#\n Call:\n#\n qda(f, data = getTaskData(.task, .subset, recode.target = \ndrop.levels\n))\n#\n \n#\n Prior probabilities of groups:\n#\n M R \n#\n 0.5336538 0.4663462 \n#\n \n#\n Group 
means:\n#\n PC1 PC2 PC3 PC4 PC5 PC6\n#\n M 0.5976122 -0.8058235 0.9773518 0.03794232 -0.04568166 -0.06721702\n#\n R -0.6838655 0.9221279 -1.1184128 -0.04341853 0.05227489 0.07691845\n#\n PC7 PC8 PC9 PC10 PC11 PC12\n#\n M 0.2278162 -0.01034406 -0.2530606 -0.1793157 -0.04084466 -0.0004789888\n#\n R -0.2606969 0.01183702 0.2895848 0.2051963 0.04673977 0.0005481212\n#\n PC13 PC14 PC15 PC16 PC17 PC18\n#\n M -0.06138758 -0.1057137 0.02808048 0.05215865 -0.07453265 0.03869042\n#\n R 0.07024765 0.1209713 -0.03213333 -0.05968671 0.08528994 -0.04427460\n#\n PC19 PC20 PC21 PC22\n#\n M -0.01192247 0.006098658 0.01263492 -0.001224809\n#\n R 0.01364323 -0.006978877 -0.01445851 0.001401586\n\n\n\n\nBelow the performances of \nqda\n with and without PCA preprocessing are compared\nin a \nbenchmark experiment\n.\nNote that we use stratified resampling to prevent errors in \nqda\n due to a too\nsmall number of observations from either class.\n\n\nrin = makeResampleInstance(\nCV\n, iters = 3, stratify = TRUE, task = sonar.task)\nres = benchmark(list(makeLearner(\nclassif.qda\n), lrn), sonar.task, rin, show.info = FALSE)\nres\n#\n task.id learner.id mmce.test.mean\n#\n 1 Sonar-example classif.qda 0.3941339\n#\n 2 Sonar-example classif.qda.preproc 0.2643202\n\n\n\n\nPCA preprocessing in this case turns out to be really beneficial for the\nperformance of Quadratic Discriminant Analysis.\n\n\nJoint tuning of preprocessing options and learner parameters\n\n\nLet's see if we can optimize this a bit.\nThe threshold value of 0.9 above was chosen arbitrarily and led to 22 out of 60 principal\ncomponents.\nBut maybe a lower or higher number of principal components should be used.\nMoreover, \nqda\n has several options that control how the class covariance matrices\nor class probabilities are estimated.\n\n\nThose preprocessing and learner parameters can be \ntuned\n jointly.\nBefore doing this let's first get an overview of all the parameters of the wrapped learner\nusing function \ngetParamSet\n.\n\n\ngetParamSet(lrn)\n#\n Type len Def Constr Req\n#\n ppc.BoxCox logical - FALSE - -\n#\n ppc.YeoJohnson logical - FALSE - -\n#\n ppc.expoTrans logical - FALSE - -\n#\n ppc.center logical - TRUE - -\n#\n ppc.scale logical - TRUE - -\n#\n ppc.range logical - FALSE - -\n#\n ppc.knnImpute logical - FALSE - -\n#\n ppc.bagImpute logical - FALSE - -\n#\n ppc.medianImpute logical - FALSE - -\n#\n ppc.pca logical - FALSE - -\n#\n ppc.ica logical - FALSE - -\n#\n ppc.spatialSign logical - FALSE - -\n#\n ppc.thresh numeric - 0.95 0 to Inf -\n#\n ppc.pcaComp integer - - 1 to Inf -\n#\n ppc.na.remove logical - TRUE - -\n#\n ppc.k integer - 5 1 to Inf -\n#\n ppc.fudge numeric - 0.2 0 to Inf -\n#\n ppc.numUnique integer - 3 1 to Inf -\n#\n method discrete - moment moment,mle,mve,t -\n#\n nu numeric - 5 2 to Inf Y\n#\n predict.method discrete - plug-in plug-in,predictive,debiased -\n#\n Tunable Trafo\n#\n ppc.BoxCox TRUE -\n#\n ppc.YeoJohnson TRUE -\n#\n ppc.expoTrans TRUE -\n#\n ppc.center TRUE -\n#\n ppc.scale TRUE -\n#\n ppc.range TRUE -\n#\n ppc.knnImpute TRUE -\n#\n ppc.bagImpute TRUE -\n#\n ppc.medianImpute TRUE -\n#\n ppc.pca TRUE -\n#\n ppc.ica TRUE -\n#\n ppc.spatialSign TRUE -\n#\n ppc.thresh TRUE -\n#\n ppc.pcaComp TRUE -\n#\n ppc.na.remove TRUE -\n#\n ppc.k TRUE -\n#\n ppc.fudge TRUE -\n#\n ppc.numUnique TRUE -\n#\n method TRUE -\n#\n nu TRUE -\n#\n predict.method TRUE -\n\n\n\n\nThe parameters prefixed by \nppc.\n belong to preprocessing. 
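As a quick aside (a sketch that is not part of the original example; lrn is assumed to be the PCA-wrapped qda learner from above and lrn10 is just an illustrative name), the ppc.-prefixed parameters can be set like any other hyperparameter, for instance fixing the number of retained components directly:

lrn10 = setHyperPars(lrn, ppc.pcaComp = 10)
getHyperPars(lrn10)[c("ppc.pca", "ppc.pcaComp")]

According to caret's documentation an explicit pcaComp takes precedence over the variance threshold, so ppc.thresh would then effectively be ignored.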
\nmethod\n, \nnu\n and \npredict.method\n\nare \nqda\n parameters.\n\n\nInstead of tuning the PCA threshold (\nppc.thresh\n) we tune the number of principal\ncomponents (\nppc.pcaComp\n) directly.\nMoreover, for \nqda\n we try two different ways to estimate the posterior probabilities\n(parameter \npredict.method\n): the usual plug-in estimates and unbiased estimates.\n\n\nWe perform a grid search and set the resolution to 10.\nThis is for demonstration. You might want to use a finer resolution.\n\n\nps = makeParamSet(\n makeIntegerParam(\nppc.pcaComp\n, lower = 1, upper = getTaskNFeats(sonar.task)),\n makeDiscreteParam(\npredict.method\n, values = c(\nplug-in\n, \ndebiased\n))\n)\nctrl = makeTuneControlGrid(resolution = 10)\nres = tuneParams(lrn, sonar.task, rin, par.set = ps, control = ctrl, show.info = FALSE)\nres\n#\n Tune result:\n#\n Op. pars: ppc.pcaComp=8; predict.method=plug-in\n#\n mmce.test.mean=0.192\n\nas.data.frame(res$opt.path)[1:3]\n#\n ppc.pcaComp predict.method mmce.test.mean\n#\n 1 1 plug-in 0.4757074\n#\n 2 8 plug-in 0.1920635\n#\n 3 14 plug-in 0.2162871\n#\n 4 21 plug-in 0.2643202\n#\n 5 27 plug-in 0.2454106\n#\n 6 34 plug-in 0.2645273\n#\n 7 40 plug-in 0.2742581\n#\n 8 47 plug-in 0.3173223\n#\n 9 53 plug-in 0.3512767\n#\n 10 60 plug-in 0.3941339\n#\n 11 1 debiased 0.5336094\n#\n 12 8 debiased 0.2450656\n#\n 13 14 debiased 0.2403037\n#\n 14 21 debiased 0.2546584\n#\n 15 27 debiased 0.3075224\n#\n 16 34 debiased 0.3172533\n#\n 17 40 debiased 0.3125604\n#\n 18 47 debiased 0.2979986\n#\n 19 53 debiased 0.3079365\n#\n 20 60 debiased 0.3654244\n\n\n\n\nThere seems to be a preference for a lower number of principal components (\n27) for both \n\"plug-in\"\n\nand \n\"debiased\"\n with \n\"plug-in\"\n achieving slightly lower error rates.\n\n\nWriting a custom preprocessing wrapper\n\n\nIf the options offered by \nmakePreprocWrapperCaret\n are not enough, you can write your own\npreprocessing wrapper using function \nmakePreprocWrapper\n.\n\n\nAs described in the tutorial section about \nwrapped learners\n wrappers are\nimplemented using a \ntrain\n and a \npredict\n method.\nIn case of preprocessing wrappers these methods specify how to transform the data before\ntraining and before prediction and are \ncompletely user-defined\n.\n\n\nBelow we show how to create a preprocessing wrapper that centers and scales the data before\ntraining/predicting.\nSome learning methods as, e.g., k nearest neighbors, support vector machines or neural networks\nusually require scaled features.\nMany, but not all, have a built-in scaling option where the training data set is scaled before\nmodel fitting and the test data set is scaled accordingly, that is by using the scaling\nparameters from the training stage, before making predictions.\nIn the following we show how to add a scaling option to a \nLearner\n by coupling\nit with function \nscale\n.\n\n\nNote that we chose this simple example for demonstration.\nCentering/scaling the data is also possible with \nmakePreprocWrapperCaret\n.\n\n\nSpecifying the train function\n\n\nThe \ntrain\n function has to be a function with the following arguments:\n\n\n\n\ndata\n is a \ndata.frame\n with columns for all features and\n the target variable.\n\n\ntarget\n is a string and denotes the name of the target variable in \ndata\n.\n\n\nargs\n is a \nlist\n of further arguments and parameters that influence the\n preprocessing.\n\n\n\n\nIt must return a \nlist\n with elements \n$data\n and \n$control\n,\nwhere \n$data\n is the preprocessed data set and 
\n$control\n stores all information required\nto preprocess the data before prediction.\n\n\nThe \ntrain\n function for the scaling example is given below. It calls \nscale\n on the\nnumerical features and returns the scaled training data and the corresponding scaling parameters.\n\n\nargs\n contains the \ncenter\n and \nscale\n arguments of function \nscale\n\nand slot \n$control\n stores the scaling parameters to be used in the prediction stage.\n\n\nRegarding the latter note that the \ncenter\n and \nscale\n arguments of \nscale\n\ncan be either a logical value or a numeric vector of length equal to the number of the numeric\ncolumns in \ndata\n, respectively.\nIf a logical value was passed to \nargs\n we store the column means and standard deviations/\nroot mean squares in the \n$center\n and \n$scale\n slots of the returned \n$control\n object.\n\n\ntrainfun = function(data, target, args = list(center, scale)) {\n ## Identify numerical features\n cns = colnames(data)\n nums = setdiff(cns[sapply(data, is.numeric)], target)\n ## Extract numerical features from the data set and call scale\n x = as.matrix(data[, nums, drop = FALSE])\n x = scale(x, center = args$center, scale = args$scale)\n ## Store the scaling parameters in control\n ## These are needed to preprocess the data before prediction\n control = args\n if (is.logical(control$center) \n control$center)\n control$center = attr(x, \nscaled:center\n)\n if (is.logical(control$scale) \n control$scale)\n control$scale = attr(x, \nscaled:scale\n)\n ## Recombine the data\n data = data[, setdiff(cns, nums), drop = FALSE]\n data = cbind(data, as.data.frame(x))\n return(list(data = data, control = control))\n}\n\n\n\n\nSpecifying the predict function\n\n\nThe \npredict\n function has the following arguments:\n\n\n\n\ndata\n is a \ndata.frame\n containing \nonly\n feature values\n (as for prediction the target values naturally are not known).\n\n\ntarget\n is a string indicating the name of the target variable.\n\n\nargs\n are the \nargs\n that were passed to the \ntrain\n function.\n\n\ncontrol\n is the object returned by the \ntrain\n function.\n\n\n\n\nIt returns the preprocessed data.\n\n\nIn our scaling example the \npredict\n function scales the numerical features using the\nparameters from the training stage stored in \ncontrol\n.\n\n\npredictfun = function(data, target, args, control) {\n ## Identify numerical features\n cns = colnames(data)\n nums = cns[sapply(data, is.numeric)]\n ## Extract numerical features from the data set and call scale\n x = as.matrix(data[, nums, drop = FALSE])\n x = scale(x, center = control$center, scale = control$scale)\n ## Recombine the data\n data = data[, setdiff(cns, nums), drop = FALSE] \n data = cbind(data, as.data.frame(x))\n return(data)\n}\n\n\n\n\nCreating the preprocessing wrapper\n\n\nBelow we create a preprocessing wrapper with a \nregression neural network\n (which\nitself does not have a scaling option) as base learner.\n\n\nThe \ntrain\n and \npredict\n functions defined above are passed to \nmakePreprocWrapper\n via\nthe \ntrain\n and \npredict\n arguments.\n\npar.vals\n is a \nlist\n of parameter values that is relayed to the \nargs\n\nargument of the \ntrain\n function.\n\n\nlrn = makeLearner(\nregr.nnet\n, trace = FALSE, decay = 1e-02)\nlrn = makePreprocWrapper(lrn, train = trainfun, predict = predictfun,\n par.vals = list(center = TRUE, scale = TRUE))\nlrn\n#\n Learner regr.nnet.preproc from package nnet\n#\n Type: regr\n#\n Name: ; Short name: \n#\n Class: PreprocWrapper\n#\n 
Properties: numerics,factors,weights\n#\n Predict-Type: response\n#\n Hyperparameters: size=3,trace=FALSE,decay=0.01\n\n\n\n\nLet's compare the cross-validated mean squared error (\nmse\n) on the\n\nBoston Housing data set\n with and without scaling.\n\n\nrdesc = makeResampleDesc(\nCV\n, iters = 3)\n\nr = resample(lrn, bh.task, resampling = rdesc, show.info = FALSE)\nr\n#\n Resample Result\n#\n Task: BostonHousing-example\n#\n Learner: regr.nnet.preproc\n#\n mse.aggr: 20.62\n#\n mse.mean: 20.62\n#\n mse.sd: 8.53\n#\n Runtime: 0.185565\n\nlrn = makeLearner(\nregr.nnet\n, trace = FALSE, decay = 1e-02)\nr = resample(lrn, bh.task, resampling = rdesc, show.info = FALSE)\nr\n#\n Resample Result\n#\n Task: BostonHousing-example\n#\n Learner: regr.nnet\n#\n mse.aggr: 55.06\n#\n mse.mean: 55.06\n#\n mse.sd: 19.40\n#\n Runtime: 0.150321\n\n\n\n\nJoint tuning of preprocessing and learner parameters\n\n\nOften it's not clear which preprocessing options work best with a certain learning algorithm.\nAs already shown for the number of principal components in \nmakePreprocWrapperCaret\n we can\n\ntune\n them easily together with other hyperparameters of the learner.\n\n\nIn our scaling example we can try if \nnnet\n works best with both centering and\nscaling the data or if it's better to omit one of the two operations or do no preprocessing\nat all.\nIn order to tune \ncenter\n and \nscale\n we have to add appropriate \nLearnerParam\ns\nto the \nparameter set\n of the wrapped learner.\n\n\nAs mentioned above \nscale\n allows for numeric and logical \ncenter\n and \nscale\n\narguments. As we want to use the latter option we declare \ncenter\n and \nscale\n as logical\nlearner parameters.\n\n\nlrn = makeLearner(\nregr.nnet\n, trace = FALSE)\nlrn = makePreprocWrapper(lrn, train = trainfun, predict = predictfun,\n par.set = makeParamSet(\n makeLogicalLearnerParam(\ncenter\n),\n makeLogicalLearnerParam(\nscale\n)\n ),\n par.vals = list(center = TRUE, scale = TRUE))\n\nlrn\n#\n Learner regr.nnet.preproc from package nnet\n#\n Type: regr\n#\n Name: ; Short name: \n#\n Class: PreprocWrapper\n#\n Properties: numerics,factors,weights\n#\n Predict-Type: response\n#\n Hyperparameters: size=3,trace=FALSE,center=TRUE,scale=TRUE\n\ngetParamSet(lrn)\n#\n Type len Def Constr Req Tunable Trafo\n#\n center logical - - - - TRUE -\n#\n scale logical - - - - TRUE -\n#\n size integer - 3 0 to Inf - TRUE -\n#\n maxit integer - 100 1 to Inf - TRUE -\n#\n linout logical - FALSE - Y TRUE -\n#\n entropy logical - FALSE - Y TRUE -\n#\n softmax logical - FALSE - Y TRUE -\n#\n censored logical - FALSE - Y TRUE -\n#\n skip logical - FALSE - - TRUE -\n#\n rang numeric - 0.7 -Inf to Inf - TRUE -\n#\n decay numeric - 0 0 to Inf - TRUE -\n#\n Hess logical - FALSE - - TRUE -\n#\n trace logical - TRUE - - FALSE -\n#\n MaxNWts integer - 1000 1 to Inf - TRUE -\n#\n abstoll numeric - 0.0001 -Inf to Inf - TRUE -\n#\n reltoll numeric - 1e-08 -Inf to Inf - TRUE -\n\n\n\n\nNow we do a simple grid search for the \ndecay\n parameter of \nnnet\n and the\n\ncenter\n and \nscale\n parameters.\n\n\nrdesc = makeResampleDesc(\nHoldout\n)\nps = makeParamSet(\n makeDiscreteParam(\ndecay\n, c(0, 0.05, 0.1)),\n makeLogicalParam(\ncenter\n),\n makeLogicalParam(\nscale\n)\n)\nctrl = makeTuneControlGrid()\nres = tuneParams(lrn, bh.task, rdesc, par.set = ps, control = ctrl, show.info = FALSE)\n\nres\n#\n Tune result:\n#\n Op. 
pars: decay=0.05; center=FALSE; scale=TRUE\n#\n mse.test.mean=14.8\n\nas.data.frame(res$opt.path)\n#\n decay center scale mse.test.mean dob eol error.message exec.time\n#\n 1 0 TRUE TRUE 49.38128 1 NA \nNA\n 0.064\n#\n 2 0.05 TRUE TRUE 20.64761 2 NA \nNA\n 0.071\n#\n 3 0.1 TRUE TRUE 22.42986 3 NA \nNA\n 0.066\n#\n 4 0 FALSE TRUE 96.25474 4 NA \nNA\n 0.028\n#\n 5 0.05 FALSE TRUE 14.84306 5 NA \nNA\n 0.070\n#\n 6 0.1 FALSE TRUE 16.65383 6 NA \nNA\n 0.060\n#\n 7 0 TRUE FALSE 40.51518 7 NA \nNA\n 0.069\n#\n 8 0.05 TRUE FALSE 68.00069 8 NA \nNA\n 0.065\n#\n 9 0.1 TRUE FALSE 55.42210 9 NA \nNA\n 0.069\n#\n 10 0 FALSE FALSE 96.25474 10 NA \nNA\n 0.027\n#\n 11 0.05 FALSE FALSE 56.25758 11 NA \nNA\n 0.069\n#\n 12 0.1 FALSE FALSE 42.85529 12 NA \nNA\n 0.067\n\n\n\n\nPreprocessing wrapper functions\n\n\nIf you have written a preprocessing wrapper that you might want to use from time to time\nit's a good idea to encapsulate it in an own function as shown below.\nIf you think your preprocessing method is something others might want to use as well and should\nbe integrated into \nmlr\n just \ncontact us\n.\n\n\nmakePreprocWrapperScale = function(learner) {\n trainfun = function(data, target, args = list(center, scale)) {\n cns = colnames(data)\n nums = setdiff(cns[sapply(data, is.numeric)], target)\n x = as.matrix(data[, nums, drop = FALSE])\n x = scale(x, center = args$center, scale = args$scale)\n control = args\n if (is.logical(control$center) \n control$center)\n control$center = attr(x, \nscaled:center\n)\n if (is.logical(control$scale) \n control$scale)\n control$scale = attr(x, \nscaled:scale\n)\n data = data[, setdiff(cns, nums), drop = FALSE]\n data = cbind(data, as.data.frame(x))\n return(list(data = data, control = control))\n }\n predictfun = function(data, target, args, control) {\n cns = colnames(data)\n nums = cns[sapply(data, is.numeric)]\n x = as.matrix(data[, nums, drop = FALSE])\n x = scale(x, center = control$center, scale = control$scale)\n data = data[, setdiff(cns, nums), drop = FALSE] \n data = cbind(data, as.data.frame(x))\n return(data)\n }\n makePreprocWrapper(\n learner,\n train = trainfun,\n predict = predictfun,\n par.set = makeParamSet(\n makeLogicalLearnerParam(\ncenter\n),\n makeLogicalLearnerParam(\nscale\n)\n ),\n par.vals = list(center = TRUE, scale = TRUE)\n )\n}\n\nmakePreprocWrapperScale(\nclassif.lda\n)\n#\n Learner classif.lda.preproc from package MASS\n#\n Type: classif\n#\n Name: ; Short name: \n#\n Class: PreprocWrapper\n#\n Properties: numerics,factors,prob,twoclass,multiclass\n#\n Predict-Type: response\n#\n Hyperparameters: center=TRUE,scale=TRUE", "title": "Preprocessing" }, { @@ -327,7 +327,7 @@ }, { "location": "/preproc/index.html#writing-a-custom-preprocessing-wrapper", - "text": "If the options offered by makePreprocWrapperCaret are not enough, you can write your own\npreprocessing wrapper using function makePreprocWrapper . As described in the tutorial section about wrapped learners wrappers are\nimplemented using a train and a predict method.\nIn case of preprocessing wrappers these methods specify how to transform the data before\ntraining and before prediction and are completely user-defined . 
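To make this contract concrete before the full scaling example, here is a minimal sketch of a do-nothing wrapper that only illustrates the required signatures and return values (it is not part of the original tutorial; identityTrain, identityPredict and lrn.id are purely illustrative names):

identityTrain = function(data, target, args) {
  ## Return the data unchanged; $control can carry anything needed at prediction time
  list(data = data, control = list())
}
identityPredict = function(data, target, args, control) {
  ## Nothing was stored in control, so the data are returned as is
  data
}
lrn.id = makePreprocWrapper("classif.lda", train = identityTrain, predict = identityPredict)

Training and predicting with lrn.id should behave exactly like plain classif.lda, which is a convenient way to check that a custom wrapper is wired up correctly before adding real preprocessing logic.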
Below we show how to create a preprocessing wrapper that centers and scales the data before\ntraining/predicting.\nSome learning methods as, e.g., k nearest neighbors, support vector machines or neural networks\nusually require scaled features.\nMany, but not all, have a built-in scaling option where the training data set is scaled before\nmodel fitting and the test data set is scaled accordingly, that is by using the scaling\nparameters from the training stage, before making predictions.\nIn the following we show how to add a scaling option to a Learner by coupling\nit with function scale . Note that we chose this simple example for demonstration.\nCentering/scaling the data is also possible with makePreprocWrapperCaret . Specifying the train function The train function has to be a function with the following arguments: data is a data.frame with columns for all features and\n the target variable. target is a string and denotes the name of the target variable in data . args is a list of further arguments and parameters that influence the\n preprocessing. It must return a list with elements $data and $control ,\nwhere $data is the preprocessed data set and $control stores all information required\nto preprocess the data before prediction. The train function for the scaling example is given below. It calls scale on the\nnumerical features and returns the scaled training data and the corresponding scaling parameters. args contains the center and scale arguments of function scale \nand slot $control stores the scaling parameters to be used in the prediction stage. Regarding the latter note that the center and scale arguments of scale \ncan be either a logical value or a numeric vector of length equal to the number of the numeric\ncolumns in data , respectively.\nIf a logical value was passed to args we store the column means and standard deviations/\nroot mean squares in the $center and $scale slots of the returned $control object. trainfun = function(data, target, args = list(center, scale)) {\n ## Identify numerical features\n cns = colnames(data)\n nums = setdiff(cns[sapply(data, is.numeric)], target)\n ## Extract numerical features from the data set and call scale\n x = as.matrix(data[, nums, drop = FALSE])\n x = scale(x, center = args$center, scale = args$scale)\n ## Store the scaling parameters in control\n ## These are needed to preprocess the data before prediction\n control = args\n if (is.logical(control$center) control$center)\n control$center = attr(x, scaled:center )\n if (is.logical(control$scale) control$scale)\n control$scale = attr(x, scaled:scale )\n ## Recombine the data\n data = data[, setdiff(cns, nums), drop = FALSE]\n data = cbind(data, as.data.frame(x))\n return(list(data = data, control = control))\n} Specifying the predict function The predict function has the following arguments: data is a data.frame containing only feature values\n (as for prediction the target values naturally are not known). target is a string indicating the name of the target variable. args are the args that were passed to the train function. control is the object returned by the train function. It returns the preprocessed data. In our scaling example the predict function scales the numerical features using the\nparameters from the training stage stored in control . 
predictfun = function(data, target, args, control) {\n ## Identify numerical features\n cns = colnames(data)\n nums = cns[sapply(data, is.numeric)]\n ## Extract numerical features from the data set and call scale\n x = as.matrix(data[, nums, drop = FALSE])\n x = scale(x, center = control$center, scale = control$scale)\n ## Recombine the data\n data = data[, setdiff(cns, nums), drop = FALSE] \n data = cbind(data, as.data.frame(x))\n return(data)\n} Creating the preprocessing wrapper Below we create a preprocessing wrapper with a regression neural network (which\nitself does not have a scaling option) as base learner. The train and predict functions defined above are passed to makePreprocWrapper via\nthe train and predict arguments. par.vals is a list of parameter values that is relayed to the args \nargument of the train function. lrn = makeLearner( regr.nnet , trace = FALSE, decay = 1e-02)\nlrn = makePreprocWrapper(lrn, train = trainfun, predict = predictfun,\n par.vals = list(center = TRUE, scale = TRUE))\nlrn\n# Learner regr.nnet.preproc from package nnet\n# Type: regr\n# Name: ; Short name: \n# Class: PreprocWrapper\n# Properties: numerics,factors,weights\n# Predict-Type: response\n# Hyperparameters: size=3,trace=FALSE,decay=0.01 Let's compare the cross-validated mean squared error ( mse ) on the Boston Housing data set with and without scaling. rdesc = makeResampleDesc( CV , iters = 3)\n\nr = resample(lrn, bh.task, resampling = rdesc, show.info = FALSE)\nr\n# Resample Result\n# Task: BostonHousing-example\n# Learner: regr.nnet.preproc\n# mse.aggr: 20.62\n# mse.mean: 20.62\n# mse.sd: 8.53\n# Runtime: 0.218279\n\nlrn = makeLearner( regr.nnet , trace = FALSE, decay = 1e-02)\nr = resample(lrn, bh.task, resampling = rdesc, show.info = FALSE)\nr\n# Resample Result\n# Task: BostonHousing-example\n# Learner: regr.nnet\n# mse.aggr: 55.06\n# mse.mean: 55.06\n# mse.sd: 19.40\n# Runtime: 0.177695 Joint tuning of preprocessing and learner parameters Often it's not clear which preprocessing options work best with a certain learning algorithm.\nAs already shown for the number of principal components in makePreprocWrapperCaret we can tune them easily together with other hyperparameters of the learner. In our scaling example we can try if nnet works best with both centering and\nscaling the data or if it's better to omit one of the two operations or do no preprocessing\nat all.\nIn order to tune center and scale we have to add appropriate LearnerParam s\nto the parameter set of the wrapped learner. As mentioned above scale allows for numeric and logical center and scale \narguments. As we want to use the latter option we declare center and scale as logical\nlearner parameters. 
lrn = makeLearner( regr.nnet , trace = FALSE)\nlrn = makePreprocWrapper(lrn, train = trainfun, predict = predictfun,\n par.set = makeParamSet(\n makeLogicalLearnerParam( center ),\n makeLogicalLearnerParam( scale )\n ),\n par.vals = list(center = TRUE, scale = TRUE))\n\nlrn\n# Learner regr.nnet.preproc from package nnet\n# Type: regr\n# Name: ; Short name: \n# Class: PreprocWrapper\n# Properties: numerics,factors,weights\n# Predict-Type: response\n# Hyperparameters: size=3,trace=FALSE,center=TRUE,scale=TRUE\n\ngetParamSet(lrn)\n# Type len Def Constr Req Tunable Trafo\n# center logical - - - - TRUE -\n# scale logical - - - - TRUE -\n# size integer - 3 0 to Inf - TRUE -\n# maxit integer - 100 1 to Inf - TRUE -\n# linout logical - FALSE - Y TRUE -\n# entropy logical - FALSE - Y TRUE -\n# softmax logical - FALSE - Y TRUE -\n# censored logical - FALSE - Y TRUE -\n# skip logical - FALSE - - TRUE -\n# rang numeric - 0.7 -Inf to Inf - TRUE -\n# decay numeric - 0 0 to Inf - TRUE -\n# Hess logical - FALSE - - TRUE -\n# trace logical - TRUE - - FALSE -\n# MaxNWts integer - 1000 1 to Inf - TRUE -\n# abstoll numeric - 0.0001 -Inf to Inf - TRUE -\n# reltoll numeric - 1e-08 -Inf to Inf - TRUE - Now we do a simple grid search for the decay parameter of nnet and the center and scale parameters. rdesc = makeResampleDesc( Holdout )\nps = makeParamSet(\n makeDiscreteParam( decay , c(0, 0.05, 0.1)),\n makeLogicalParam( center ),\n makeLogicalParam( scale )\n)\nctrl = makeTuneControlGrid()\nres = tuneParams(lrn, bh.task, rdesc, par.set = ps, control = ctrl, show.info = FALSE)\n\nres\n# Tune result:\n# Op. pars: decay=0.05; center=FALSE; scale=TRUE\n# mse.test.mean=14.8\n\nas.data.frame(res$opt.path)\n# decay center scale mse.test.mean dob eol error.message exec.time\n# 1 0 TRUE TRUE 49.38128 1 NA NA 0.069\n# 2 0.05 TRUE TRUE 20.64761 2 NA NA 0.082\n# 3 0.1 TRUE TRUE 22.42986 3 NA NA 0.078\n# 4 0 FALSE TRUE 96.25474 4 NA NA 0.034\n# 5 0.05 FALSE TRUE 14.84306 5 NA NA 0.082\n# 6 0.1 FALSE TRUE 16.65383 6 NA NA 0.078\n# 7 0 TRUE FALSE 40.51518 7 NA NA 0.079\n# 8 0.05 TRUE FALSE 68.00069 8 NA NA 0.077\n# 9 0.1 TRUE FALSE 55.42210 9 NA NA 0.085\n# 10 0 FALSE FALSE 96.25474 10 NA NA 0.032\n# 11 0.05 FALSE FALSE 56.25758 11 NA NA 0.083\n# 12 0.1 FALSE FALSE 42.85529 12 NA NA 0.079 Preprocessing wrapper functions If you have written a preprocessing wrapper that you might want to use from time to time\nit's a good idea to encapsulate it in an own function as shown below.\nIf you think your preprocessing method is something others might want to use as well and should\nbe integrated into mlr just contact us . 
makePreprocWrapperScale = function(learner) {\n trainfun = function(data, target, args = list(center, scale)) {\n cns = colnames(data)\n nums = setdiff(cns[sapply(data, is.numeric)], target)\n x = as.matrix(data[, nums, drop = FALSE])\n x = scale(x, center = args$center, scale = args$scale)\n control = args\n if (is.logical(control$center) control$center)\n control$center = attr(x, scaled:center )\n if (is.logical(control$scale) control$scale)\n control$scale = attr(x, scaled:scale )\n data = data[, setdiff(cns, nums), drop = FALSE]\n data = cbind(data, as.data.frame(x))\n return(list(data = data, control = control))\n }\n predictfun = function(data, target, args, control) {\n cns = colnames(data)\n nums = cns[sapply(data, is.numeric)]\n x = as.matrix(data[, nums, drop = FALSE])\n x = scale(x, center = control$center, scale = control$scale)\n data = data[, setdiff(cns, nums), drop = FALSE] \n data = cbind(data, as.data.frame(x))\n return(data)\n }\n makePreprocWrapper(\n learner,\n train = trainfun,\n predict = predictfun,\n par.set = makeParamSet(\n makeLogicalLearnerParam( center ),\n makeLogicalLearnerParam( scale )\n ),\n par.vals = list(center = TRUE, scale = TRUE)\n )\n}\n\nmakePreprocWrapperScale( classif.lda )\n# Learner classif.lda.preproc from package MASS\n# Type: classif\n# Name: ; Short name: \n# Class: PreprocWrapper\n# Properties: numerics,factors,prob,twoclass,multiclass\n# Predict-Type: response\n# Hyperparameters: center=TRUE,scale=TRUE", + "text": "If the options offered by makePreprocWrapperCaret are not enough, you can write your own\npreprocessing wrapper using function makePreprocWrapper . As described in the tutorial section about wrapped learners wrappers are\nimplemented using a train and a predict method.\nIn case of preprocessing wrappers these methods specify how to transform the data before\ntraining and before prediction and are completely user-defined . Below we show how to create a preprocessing wrapper that centers and scales the data before\ntraining/predicting.\nSome learning methods as, e.g., k nearest neighbors, support vector machines or neural networks\nusually require scaled features.\nMany, but not all, have a built-in scaling option where the training data set is scaled before\nmodel fitting and the test data set is scaled accordingly, that is by using the scaling\nparameters from the training stage, before making predictions.\nIn the following we show how to add a scaling option to a Learner by coupling\nit with function scale . Note that we chose this simple example for demonstration.\nCentering/scaling the data is also possible with makePreprocWrapperCaret . Specifying the train function The train function has to be a function with the following arguments: data is a data.frame with columns for all features and\n the target variable. target is a string and denotes the name of the target variable in data . args is a list of further arguments and parameters that influence the\n preprocessing. It must return a list with elements $data and $control ,\nwhere $data is the preprocessed data set and $control stores all information required\nto preprocess the data before prediction. The train function for the scaling example is given below. It calls scale on the\nnumerical features and returns the scaled training data and the corresponding scaling parameters. args contains the center and scale arguments of function scale \nand slot $control stores the scaling parameters to be used in the prediction stage. 
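To see where these stored parameters come from, note that base R's scale attaches them to its result as attributes, which is exactly what the train function copies into $control. A small standalone illustration (not part of the original example):

x = scale(matrix(c(1, 2, 3, 10, 20, 30), ncol = 2))
attr(x, "scaled:center")  ## column means, here 2 and 20
attr(x, "scaled:scale")   ## column standard deviations, here 1 and 10

The $control slot simply stores these values for reuse at prediction time.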
Regarding the latter note that the center and scale arguments of scale \ncan be either a logical value or a numeric vector of length equal to the number of the numeric\ncolumns in data , respectively.\nIf a logical value was passed to args we store the column means and standard deviations/\nroot mean squares in the $center and $scale slots of the returned $control object. trainfun = function(data, target, args = list(center, scale)) {\n ## Identify numerical features\n cns = colnames(data)\n nums = setdiff(cns[sapply(data, is.numeric)], target)\n ## Extract numerical features from the data set and call scale\n x = as.matrix(data[, nums, drop = FALSE])\n x = scale(x, center = args$center, scale = args$scale)\n ## Store the scaling parameters in control\n ## These are needed to preprocess the data before prediction\n control = args\n if (is.logical(control$center) control$center)\n control$center = attr(x, scaled:center )\n if (is.logical(control$scale) control$scale)\n control$scale = attr(x, scaled:scale )\n ## Recombine the data\n data = data[, setdiff(cns, nums), drop = FALSE]\n data = cbind(data, as.data.frame(x))\n return(list(data = data, control = control))\n} Specifying the predict function The predict function has the following arguments: data is a data.frame containing only feature values\n (as for prediction the target values naturally are not known). target is a string indicating the name of the target variable. args are the args that were passed to the train function. control is the object returned by the train function. It returns the preprocessed data. In our scaling example the predict function scales the numerical features using the\nparameters from the training stage stored in control . predictfun = function(data, target, args, control) {\n ## Identify numerical features\n cns = colnames(data)\n nums = cns[sapply(data, is.numeric)]\n ## Extract numerical features from the data set and call scale\n x = as.matrix(data[, nums, drop = FALSE])\n x = scale(x, center = control$center, scale = control$scale)\n ## Recombine the data\n data = data[, setdiff(cns, nums), drop = FALSE] \n data = cbind(data, as.data.frame(x))\n return(data)\n} Creating the preprocessing wrapper Below we create a preprocessing wrapper with a regression neural network (which\nitself does not have a scaling option) as base learner. The train and predict functions defined above are passed to makePreprocWrapper via\nthe train and predict arguments. par.vals is a list of parameter values that is relayed to the args \nargument of the train function. lrn = makeLearner( regr.nnet , trace = FALSE, decay = 1e-02)\nlrn = makePreprocWrapper(lrn, train = trainfun, predict = predictfun,\n par.vals = list(center = TRUE, scale = TRUE))\nlrn\n# Learner regr.nnet.preproc from package nnet\n# Type: regr\n# Name: ; Short name: \n# Class: PreprocWrapper\n# Properties: numerics,factors,weights\n# Predict-Type: response\n# Hyperparameters: size=3,trace=FALSE,decay=0.01 Let's compare the cross-validated mean squared error ( mse ) on the Boston Housing data set with and without scaling. 
rdesc = makeResampleDesc( CV , iters = 3)\n\nr = resample(lrn, bh.task, resampling = rdesc, show.info = FALSE)\nr\n# Resample Result\n# Task: BostonHousing-example\n# Learner: regr.nnet.preproc\n# mse.aggr: 20.62\n# mse.mean: 20.62\n# mse.sd: 8.53\n# Runtime: 0.185565\n\nlrn = makeLearner( regr.nnet , trace = FALSE, decay = 1e-02)\nr = resample(lrn, bh.task, resampling = rdesc, show.info = FALSE)\nr\n# Resample Result\n# Task: BostonHousing-example\n# Learner: regr.nnet\n# mse.aggr: 55.06\n# mse.mean: 55.06\n# mse.sd: 19.40\n# Runtime: 0.150321 Joint tuning of preprocessing and learner parameters Often it's not clear which preprocessing options work best with a certain learning algorithm.\nAs already shown for the number of principal components in makePreprocWrapperCaret we can tune them easily together with other hyperparameters of the learner. In our scaling example we can try if nnet works best with both centering and\nscaling the data or if it's better to omit one of the two operations or do no preprocessing\nat all.\nIn order to tune center and scale we have to add appropriate LearnerParam s\nto the parameter set of the wrapped learner. As mentioned above scale allows for numeric and logical center and scale \narguments. As we want to use the latter option we declare center and scale as logical\nlearner parameters. lrn = makeLearner( regr.nnet , trace = FALSE)\nlrn = makePreprocWrapper(lrn, train = trainfun, predict = predictfun,\n par.set = makeParamSet(\n makeLogicalLearnerParam( center ),\n makeLogicalLearnerParam( scale )\n ),\n par.vals = list(center = TRUE, scale = TRUE))\n\nlrn\n# Learner regr.nnet.preproc from package nnet\n# Type: regr\n# Name: ; Short name: \n# Class: PreprocWrapper\n# Properties: numerics,factors,weights\n# Predict-Type: response\n# Hyperparameters: size=3,trace=FALSE,center=TRUE,scale=TRUE\n\ngetParamSet(lrn)\n# Type len Def Constr Req Tunable Trafo\n# center logical - - - - TRUE -\n# scale logical - - - - TRUE -\n# size integer - 3 0 to Inf - TRUE -\n# maxit integer - 100 1 to Inf - TRUE -\n# linout logical - FALSE - Y TRUE -\n# entropy logical - FALSE - Y TRUE -\n# softmax logical - FALSE - Y TRUE -\n# censored logical - FALSE - Y TRUE -\n# skip logical - FALSE - - TRUE -\n# rang numeric - 0.7 -Inf to Inf - TRUE -\n# decay numeric - 0 0 to Inf - TRUE -\n# Hess logical - FALSE - - TRUE -\n# trace logical - TRUE - - FALSE -\n# MaxNWts integer - 1000 1 to Inf - TRUE -\n# abstoll numeric - 0.0001 -Inf to Inf - TRUE -\n# reltoll numeric - 1e-08 -Inf to Inf - TRUE - Now we do a simple grid search for the decay parameter of nnet and the center and scale parameters. rdesc = makeResampleDesc( Holdout )\nps = makeParamSet(\n makeDiscreteParam( decay , c(0, 0.05, 0.1)),\n makeLogicalParam( center ),\n makeLogicalParam( scale )\n)\nctrl = makeTuneControlGrid()\nres = tuneParams(lrn, bh.task, rdesc, par.set = ps, control = ctrl, show.info = FALSE)\n\nres\n# Tune result:\n# Op. 
pars: decay=0.05; center=FALSE; scale=TRUE\n# mse.test.mean=14.8\n\nas.data.frame(res$opt.path)\n# decay center scale mse.test.mean dob eol error.message exec.time\n# 1 0 TRUE TRUE 49.38128 1 NA NA 0.064\n# 2 0.05 TRUE TRUE 20.64761 2 NA NA 0.071\n# 3 0.1 TRUE TRUE 22.42986 3 NA NA 0.066\n# 4 0 FALSE TRUE 96.25474 4 NA NA 0.028\n# 5 0.05 FALSE TRUE 14.84306 5 NA NA 0.070\n# 6 0.1 FALSE TRUE 16.65383 6 NA NA 0.060\n# 7 0 TRUE FALSE 40.51518 7 NA NA 0.069\n# 8 0.05 TRUE FALSE 68.00069 8 NA NA 0.065\n# 9 0.1 TRUE FALSE 55.42210 9 NA NA 0.069\n# 10 0 FALSE FALSE 96.25474 10 NA NA 0.027\n# 11 0.05 FALSE FALSE 56.25758 11 NA NA 0.069\n# 12 0.1 FALSE FALSE 42.85529 12 NA NA 0.067 Preprocessing wrapper functions If you have written a preprocessing wrapper that you might want to use from time to time\nit's a good idea to encapsulate it in an own function as shown below.\nIf you think your preprocessing method is something others might want to use as well and should\nbe integrated into mlr just contact us . makePreprocWrapperScale = function(learner) {\n trainfun = function(data, target, args = list(center, scale)) {\n cns = colnames(data)\n nums = setdiff(cns[sapply(data, is.numeric)], target)\n x = as.matrix(data[, nums, drop = FALSE])\n x = scale(x, center = args$center, scale = args$scale)\n control = args\n if (is.logical(control$center) control$center)\n control$center = attr(x, scaled:center )\n if (is.logical(control$scale) control$scale)\n control$scale = attr(x, scaled:scale )\n data = data[, setdiff(cns, nums), drop = FALSE]\n data = cbind(data, as.data.frame(x))\n return(list(data = data, control = control))\n }\n predictfun = function(data, target, args, control) {\n cns = colnames(data)\n nums = cns[sapply(data, is.numeric)]\n x = as.matrix(data[, nums, drop = FALSE])\n x = scale(x, center = control$center, scale = control$scale)\n data = data[, setdiff(cns, nums), drop = FALSE] \n data = cbind(data, as.data.frame(x))\n return(data)\n }\n makePreprocWrapper(\n learner,\n train = trainfun,\n predict = predictfun,\n par.set = makeParamSet(\n makeLogicalLearnerParam( center ),\n makeLogicalLearnerParam( scale )\n ),\n par.vals = list(center = TRUE, scale = TRUE)\n )\n}\n\nmakePreprocWrapperScale( classif.lda )\n# Learner classif.lda.preproc from package MASS\n# Type: classif\n# Name: ; Short name: \n# Class: PreprocWrapper\n# Properties: numerics,factors,prob,twoclass,multiclass\n# Predict-Type: response\n# Hyperparameters: center=TRUE,scale=TRUE", "title": "Writing a custom preprocessing wrapper" }, { @@ -367,7 +367,7 @@ }, { "location": "/tune/index.html", - "text": "Tuning Hyperparameters\n\n\nMany machine learning algorithms have hyperparameters that need to be set.\nIf selected by the user they can be specified as explained in the section about\n\nLearners\n -- simply pass them to \nmakeLearner\n.\nOften suitable parameter values are not obvious and it is preferable to tune the hyperparameters,\nthat is automatically identify values that lead to the best performance.\n\n\nBasics\n\n\nFor tuning you have to specify\n\n\n\n\nthe search space,\n\n\nthe optimization algorithm,\n\n\nan evaluation method, i.e., a resampling strategy and a performance measure.\n\n\n\n\nThe last point is already covered in this tutorial in the parts about the\n\nevaluation of learning methods\n and \nresampling\n.\n\n\nBelow we show how to specify the search space and optimization algorithm, how to do the\ntuning and how to access the tuning result, using the example of a grid search.\n\n\nThroughout this 
section we consider classification examples. For the other types of learning\nproblems tuning works analogously.\n\n\nGrid search with manual discretization\n\n\nA grid search is one of the standard -- albeit slow -- ways to choose an\nappropriate set of parameters from a given range of values.\n\n\nWe use the \niris classification task\n for illustration and tune the\nhyperparameters of an SVM (function \nksvm\n from the \nkernlab\n package)\nwith a radial basis kernel.\n\n\nFirst, we create a \nParamSet\n object, which describes the\nparameter space we wish to search.\nThis is done via function \nmakeParamSet\n.\nWe wish to tune the cost parameter \nC\n and the RBF kernel parameter \nsigma\n of the\n\nksvm\n function.\nSince we will use a grid search strategy, we add discrete parameters to the parameter set.\nThe specified \nvalues\n have to be vectors of feasible settings and the complete grid simply is\ntheir cross-product.\nEvery entry in the parameter set has to be named according to the corresponding parameter\nof the underlying \nR\n function.\n\n\nPlease note that whenever parameters in the underlying \nR\n functions should be\npassed in a \nlist\n structure, \nmlr\n tries to give you direct access to\neach parameter and get rid of the list structure.\nThis is the case with the \nkpar\n argument of \nksvm\n which is a list of\nkernel parameters like \nsigma\n.\n\n\nps = makeParamSet(\n makeDiscreteParam(\nC\n, values = 2^(-2:2)),\n makeDiscreteParam(\nsigma\n, values = 2^(-2:2))\n)\n\n\n\n\nAdditional to the parameter set, we need an instance of a \nTuneControl\n object.\nThese describe the optimization strategy to be used and its settings.\nHere we choose a grid search:\n\n\nctrl = makeTuneControlGrid()\n\n\n\n\nWe will use 3-fold cross-validation to assess the quality of a specific parameter setting.\nFor this we need to create a resampling description just like in the \nresampling\n\npart of the tutorial.\n\n\nrdesc = makeResampleDesc(\nCV\n, iters = 3L)\n\n\n\n\nFinally, by combining all the previous pieces, we can tune the SVM parameters by calling\n\ntuneParams\n.\n\n\nres = tuneParams(\nclassif.ksvm\n, task = iris.task, resampling = rdesc, par.set = ps,\n control = ctrl)\n#\n [Tune] Started tuning learner classif.ksvm for parameter set:\n#\n Type len Def Constr Req Tunable Trafo\n#\n C discrete - - 0.25,0.5,1,2,4 - TRUE -\n#\n sigma discrete - - 0.25,0.5,1,2,4 - TRUE -\n#\n With control class: TuneControlGrid\n#\n Imputation value: 1\n#\n [Tune-x] 1: C=0.25; sigma=0.25\n#\n [Tune-y] 1: mmce.test.mean=0.0467; time: 0.0 min; memory: 156Mb use, 478Mb max\n#\n [Tune-x] 2: C=0.5; sigma=0.25\n#\n [Tune-y] 2: mmce.test.mean=0.0467; time: 0.0 min; memory: 156Mb use, 478Mb max\n#\n [Tune-x] 3: C=1; sigma=0.25\n#\n [Tune-y] 3: mmce.test.mean=0.04; time: 0.0 min; memory: 156Mb use, 478Mb max\n#\n [Tune-x] 4: C=2; sigma=0.25\n#\n [Tune-y] 4: mmce.test.mean=0.0467; time: 0.0 min; memory: 156Mb use, 478Mb max\n#\n [Tune-x] 5: C=4; sigma=0.25\n#\n [Tune-y] 5: mmce.test.mean=0.0467; time: 0.0 min; memory: 156Mb use, 478Mb max\n#\n [Tune-x] 6: C=0.25; sigma=0.5\n#\n [Tune-y] 6: mmce.test.mean=0.06; time: 0.0 min; memory: 156Mb use, 478Mb max\n#\n [Tune-x] 7: C=0.5; sigma=0.5\n#\n [Tune-y] 7: mmce.test.mean=0.04; time: 0.0 min; memory: 156Mb use, 478Mb max\n#\n [Tune-x] 8: C=1; sigma=0.5\n#\n [Tune-y] 8: mmce.test.mean=0.04; time: 0.0 min; memory: 156Mb use, 478Mb max\n#\n [Tune-x] 9: C=2; sigma=0.5\n#\n [Tune-y] 9: mmce.test.mean=0.0467; time: 0.0 min; memory: 156Mb use, 478Mb max\n#\n 
[Tune-x] 10: C=4; sigma=0.5\n#\n [Tune-y] 10: mmce.test.mean=0.0467; time: 0.0 min; memory: 156Mb use, 478Mb max\n#\n [Tune-x] 11: C=0.25; sigma=1\n#\n [Tune-y] 11: mmce.test.mean=0.0533; time: 0.0 min; memory: 156Mb use, 478Mb max\n#\n [Tune-x] 12: C=0.5; sigma=1\n#\n [Tune-y] 12: mmce.test.mean=0.04; time: 0.0 min; memory: 156Mb use, 478Mb max\n#\n [Tune-x] 13: C=1; sigma=1\n#\n [Tune-y] 13: mmce.test.mean=0.0467; time: 0.0 min; memory: 156Mb use, 478Mb max\n#\n [Tune-x] 14: C=2; sigma=1\n#\n [Tune-y] 14: mmce.test.mean=0.0467; time: 0.0 min; memory: 156Mb use, 478Mb max\n#\n [Tune-x] 15: C=4; sigma=1\n#\n [Tune-y] 15: mmce.test.mean=0.0533; time: 0.0 min; memory: 156Mb use, 478Mb max\n#\n [Tune-x] 16: C=0.25; sigma=2\n#\n [Tune-y] 16: mmce.test.mean=0.0533; time: 0.0 min; memory: 156Mb use, 478Mb max\n#\n [Tune-x] 17: C=0.5; sigma=2\n#\n [Tune-y] 17: mmce.test.mean=0.04; time: 0.0 min; memory: 156Mb use, 478Mb max\n#\n [Tune-x] 18: C=1; sigma=2\n#\n [Tune-y] 18: mmce.test.mean=0.0333; time: 0.0 min; memory: 156Mb use, 478Mb max\n#\n [Tune-x] 19: C=2; sigma=2\n#\n [Tune-y] 19: mmce.test.mean=0.04; time: 0.0 min; memory: 156Mb use, 478Mb max\n#\n [Tune-x] 20: C=4; sigma=2\n#\n [Tune-y] 20: mmce.test.mean=0.0467; time: 0.0 min; memory: 156Mb use, 478Mb max\n#\n [Tune-x] 21: C=0.25; sigma=4\n#\n [Tune-y] 21: mmce.test.mean=0.113; time: 0.0 min; memory: 156Mb use, 478Mb max\n#\n [Tune-x] 22: C=0.5; sigma=4\n#\n [Tune-y] 22: mmce.test.mean=0.0667; time: 0.0 min; memory: 156Mb use, 478Mb max\n#\n [Tune-x] 23: C=1; sigma=4\n#\n [Tune-y] 23: mmce.test.mean=0.0533; time: 0.0 min; memory: 156Mb use, 478Mb max\n#\n [Tune-x] 24: C=2; sigma=4\n#\n [Tune-y] 24: mmce.test.mean=0.06; time: 0.0 min; memory: 156Mb use, 478Mb max\n#\n [Tune-x] 25: C=4; sigma=4\n#\n [Tune-y] 25: mmce.test.mean=0.0667; time: 0.0 min; memory: 156Mb use, 478Mb max\n#\n [Tune] Result: C=1; sigma=2 : mmce.test.mean=0.0333\nres\n#\n Tune result:\n#\n Op. pars: C=1; sigma=2\n#\n mmce.test.mean=0.0333\n\n\n\n\ntuneParams\n simply performs the cross-validation for every element of the\ncross-product and selects the parameter setting with the best mean performance.\nAs no performance measure was specified, by default the error rate (\nmmce\n) is\nused.\n\n\nNote that each \nmeasure\n \"knows\" if it is minimized or maximized during tuning.\n\n\n## error rate\nmmce$minimize\n#\n [1] TRUE\n\n## accuracy\nacc$minimize\n#\n [1] FALSE\n\n\n\n\nOf course, you can pass other measures and also a list of measures to \ntuneParams\n.\nIn the latter case the first measure is optimized during tuning, the others are simply evaluated.\nIf you are interested in optimizing several measures simultaneously have a look at the\nparagraph about multi-criteria tuning below.\n\n\nIn the example below we calculate the accuracy (\nacc\n) instead of the error\nrate.\nWe use function \nsetAggregation\n, as described in the section on \nresampling\n,\nto additionally obtain the standard deviation of the accuracy.\n\n\nres = tuneParams(\nclassif.ksvm\n, task = iris.task, resampling = rdesc, par.set = ps,\n control = ctrl, measures = list(acc, setAggregation(acc, test.sd)), show.info = FALSE)\nres\n#\n Tune result:\n#\n Op. 
pars: C=0.25; sigma=0.25\n#\n acc.test.mean=0.953,acc.test.sd=0.0306\n\n\n\n\nAccessing the tuning result\n\n\nThe result object \nTuneResult\n allows you to access the best found settings \n$x\n and their\nestimated performance \n$y\n.\n\n\nres$x\n#\n $C\n#\n [1] 0.25\n#\n \n#\n $sigma\n#\n [1] 0.25\nres$y\n#\n acc.test.mean acc.test.sd \n#\n 0.9533333 0.0305505\n\n\n\n\nMoreover, we can inspect all points evaluated during the search by accessing the\n\n$opt.path\n (see also the documentation of \nOptPath\n).\n\n\nres$opt.path\n#\n Optimization path\n#\n Dimensions: x = 2/2, y = 2\n#\n Length: 25\n#\n Add x values transformed: FALSE\n#\n Error messages: TRUE. Errors: 0 / 25.\n#\n Exec times: TRUE. Range: 0.077 - 0.092. 0 NAs.\nopt.grid = as.data.frame(res$opt.path)\nhead(opt.grid)\n#\n C sigma acc.test.mean acc.test.sd dob eol error.message exec.time\n#\n 1 0.25 0.25 0.9533333 0.03055050 1 NA \nNA\n 0.091\n#\n 2 0.5 0.25 0.9466667 0.02309401 2 NA \nNA\n 0.081\n#\n 3 1 0.25 0.9533333 0.01154701 3 NA \nNA\n 0.083\n#\n 4 2 0.25 0.9533333 0.01154701 4 NA \nNA\n 0.082\n#\n 5 4 0.25 0.9533333 0.01154701 5 NA \nNA\n 0.081\n#\n 6 0.25 0.5 0.9333333 0.01154701 6 NA \nNA\n 0.085\n\n\n\n\nA quick visualization of the performance values on the search grid can be accomplished as follows:\n\n\nlibrary(ggplot2)\ng = ggplot(opt.grid, aes(x = C, y = sigma, fill = acc.test.mean, label = round(acc.test.sd, 3)))\ng + geom_tile() + geom_text(color = \nwhite\n)\n\n\n\n\n \n\n\nThe colors of the tiles display the achieved accuracy, the tile labels show the standard deviation.\n\n\nUsing the optimal parameter values\n\n\nAfter tuning you can generate a \nLearner\n with optimal hyperparameter settings\nas follows:\n\n\nlrn = setHyperPars(makeLearner(\nclassif.ksvm\n), par.vals = res$x)\nlrn\n#\n Learner classif.ksvm from package kernlab\n#\n Type: classif\n#\n Name: Support Vector Machines; Short name: ksvm\n#\n Class: classif.ksvm\n#\n Properties: twoclass,multiclass,numerics,factors,prob,class.weights\n#\n Predict-Type: response\n#\n Hyperparameters: fit=FALSE,C=0.25,sigma=0.25\n\n\n\n\nThen you can proceed as usual.\nHere we refit and predict the learner on the complete \niris\n data\nset.\n\n\nm = train(lrn, iris.task)\npredict(m, task = iris.task)\n#\n Prediction: 150 observations\n#\n predict.type: response\n#\n threshold: \n#\n time: 0.01\n#\n id truth response\n#\n 1 1 setosa setosa\n#\n 2 2 setosa setosa\n#\n 3 3 setosa setosa\n#\n 4 4 setosa setosa\n#\n 5 5 setosa setosa\n#\n 6 6 setosa setosa\n\n\n\n\nGrid search without manual discretization\n\n\nWe can also specify the true numeric parameter types of \nC\n and \nsigma\n when creating the\nparameter set and use the \nresolution\n option of \nmakeTuneControlGrid\n to\nautomatically discretize them.\n\n\nNote how we also make use of the \ntrafo\n option when creating the parameter set to easily\noptimize on a log-scale.\n\n\nTrafos work like this: All optimizers basically see the parameters on their\noriginal scale (from -12 to 12) in this case and produce values on this scale during the search.\nRight before they are passed to the learning algorithm, the transformation function is applied.\n\n\nps = makeParamSet(\n makeNumericParam(\nC\n, lower = -12, upper = 12, trafo = function(x) 2^x),\n makeNumericParam(\nsigma\n, lower = -12, upper = 12, trafo = function(x) 2^x)\n)\nctrl = makeTuneControlGrid(resolution = 3L)\nrdesc = makeResampleDesc(\nCV\n, iters = 2L)\nres = tuneParams(\nclassif.ksvm\n, iris.task, rdesc, par.set = ps, control = ctrl)\n#\n 
[Tune] Started tuning learner classif.ksvm for parameter set:\n#\n Type len Def Constr Req Tunable Trafo\n#\n C numeric - - -12 to 12 - TRUE Y\n#\n sigma numeric - - -12 to 12 - TRUE Y\n#\n With control class: TuneControlGrid\n#\n Imputation value: 1\n#\n [Tune-x] 1: C=0.000244; sigma=0.000244\n#\n [Tune-y] 1: mmce.test.mean=0.527; time: 0.0 min; memory: 156Mb use, 478Mb max\n#\n [Tune-x] 2: C=1; sigma=0.000244\n#\n [Tune-y] 2: mmce.test.mean=0.527; time: 0.0 min; memory: 156Mb use, 478Mb max\n#\n [Tune-x] 3: C=4.1e+03; sigma=0.000244\n#\n [Tune-y] 3: mmce.test.mean=0.04; time: 0.0 min; memory: 156Mb use, 478Mb max\n#\n [Tune-x] 4: C=0.000244; sigma=1\n#\n [Tune-y] 4: mmce.test.mean=0.527; time: 0.0 min; memory: 156Mb use, 478Mb max\n#\n [Tune-x] 5: C=1; sigma=1\n#\n [Tune-y] 5: mmce.test.mean=0.04; time: 0.0 min; memory: 156Mb use, 478Mb max\n#\n [Tune-x] 6: C=4.1e+03; sigma=1\n#\n [Tune-y] 6: mmce.test.mean=0.0667; time: 0.0 min; memory: 156Mb use, 478Mb max\n#\n [Tune-x] 7: C=0.000244; sigma=4.1e+03\n#\n [Tune-y] 7: mmce.test.mean=0.567; time: 0.0 min; memory: 156Mb use, 478Mb max\n#\n [Tune-x] 8: C=1; sigma=4.1e+03\n#\n [Tune-y] 8: mmce.test.mean=0.687; time: 0.0 min; memory: 156Mb use, 478Mb max\n#\n [Tune-x] 9: C=4.1e+03; sigma=4.1e+03\n#\n [Tune-y] 9: mmce.test.mean=0.687; time: 0.0 min; memory: 156Mb use, 478Mb max\n#\n [Tune] Result: C=1; sigma=1 : mmce.test.mean=0.04\nres\n#\n Tune result:\n#\n Op. pars: C=1; sigma=1\n#\n mmce.test.mean=0.04\n\n\n\n\nNote that \nres$opt.path\n contains the parameter values \non the original scale\n.\n\n\nas.data.frame(res$opt.path)\n#\n C sigma mmce.test.mean dob eol error.message exec.time\n#\n 1 -12 -12 0.52666667 1 NA \nNA\n 0.054\n#\n 2 0 -12 0.52666667 2 NA \nNA\n 0.055\n#\n 3 12 -12 0.04000000 3 NA \nNA\n 0.052\n#\n 4 -12 0 0.52666667 4 NA \nNA\n 0.054\n#\n 5 0 0 0.04000000 5 NA \nNA\n 0.053\n#\n 6 12 0 0.06666667 6 NA \nNA\n 0.057\n#\n 7 -12 12 0.56666667 7 NA \nNA\n 0.059\n#\n 8 0 12 0.68666667 8 NA \nNA\n 0.061\n#\n 9 12 12 0.68666667 9 NA \nNA\n 0.056\n\n\n\n\nIn order to get the \ntransformed\n parameter values instead, use function\n\ntrafoOptPath\n.\n\n\nas.data.frame(trafoOptPath(res$opt.path))\n#\n C sigma mmce.test.mean dob eol\n#\n 1 2.441406e-04 2.441406e-04 0.52666667 1 NA\n#\n 2 1.000000e+00 2.441406e-04 0.52666667 2 NA\n#\n 3 4.096000e+03 2.441406e-04 0.04000000 3 NA\n#\n 4 2.441406e-04 1.000000e+00 0.52666667 4 NA\n#\n 5 1.000000e+00 1.000000e+00 0.04000000 5 NA\n#\n 6 4.096000e+03 1.000000e+00 0.06666667 6 NA\n#\n 7 2.441406e-04 4.096000e+03 0.56666667 7 NA\n#\n 8 1.000000e+00 4.096000e+03 0.68666667 8 NA\n#\n 9 4.096000e+03 4.096000e+03 0.68666667 9 NA\n\n\n\n\nIterated F-Racing for mixed spaces and dependencies\n\n\nThe package supports a larger number of tuning algorithms, which can all be looked up and\nselected via \nTuneControl\n. One of the cooler algorithms is iterated F-racing from the \n\nirace\n package. 
This not only works for arbitrary parameter types (numeric, integer,\ndiscrete, logical), but also for so-called dependent / hierarchical parameters:\n\n\nps = makeParamSet(\n makeNumericParam(\nC\n, lower = -12, upper = 12, trafo = function(x) 2^x),\n makeDiscreteParam(\nkernel\n, values = c(\nvanilladot\n, \npolydot\n, \nrbfdot\n)),\n makeNumericParam(\nsigma\n, lower = -12, upper = 12, trafo = function(x) 2^x,\n requires = quote(kernel == \nrbfdot\n)),\n makeIntegerParam(\ndegree\n, lower = 2L, upper = 5L,\n requires = quote(kernel == \npolydot\n))\n)\nctrl = makeTuneControlIrace(maxExperiments = 200L)\nrdesc = makeResampleDesc(\nHoldout\n)\nres = tuneParams(\nclassif.ksvm\n, iris.task, rdesc, par.set = ps, control = ctrl, show.info = FALSE)\nprint(head(as.data.frame(res$opt.path)))\n#\n C kernel sigma degree mmce.test.mean dob eol\n#\n 1 3.148837 polydot NA 5 0.08 1 NA\n#\n 2 3.266305 vanilladot NA NA 0.02 2 NA\n#\n 3 -3.808213 vanilladot NA NA 0.04 3 NA\n#\n 4 1.694097 rbfdot 6.580514 NA 0.48 4 NA\n#\n 5 11.995501 polydot NA 2 0.08 5 NA\n#\n 6 -5.731782 vanilladot NA NA 0.14 6 NA\n#\n error.message exec.time\n#\n 1 \nNA\n 0.033\n#\n 2 \nNA\n 0.034\n#\n 3 \nNA\n 0.031\n#\n 4 \nNA\n 0.035\n#\n 5 \nNA\n 0.060\n#\n 6 \nNA\n 0.031\n\n\n\n\nSee how we made the kernel parameters like \nsigma\n and \ndegree\n dependent on the \nkernel\n\nselection parameters? This approach allows you to tune parameters of multiple kernels at once, \nefficiently concentrating on the ones which work best for your given data set.\n\n\nTuning across whole model spaces with ModelMultiplexer\n\n\nWe can now take the following example even one step further. If we use the\n\nModelMultiplexer\n we can tune over different model classes at once,\njust as we did with the SVM kernels above.\n\n\nbase.learners = list(\n makeLearner(\nclassif.ksvm\n),\n makeLearner(\nclassif.randomForest\n)\n)\nlrn = makeModelMultiplexer(base.learners)\n\n\n\n\nFunction \nmakeModelMultiplexerParamSet\n offers a simple way to contruct parameter set for tuning:\nThe parameter names are prefixed automatically and the \nrequires\n element is set, too,\nto make all paramaters subordinate to \nselected.learner\n.\n\n\nps = makeModelMultiplexerParamSet(lrn,\n makeNumericParam(\nsigma\n, lower = -12, upper = 12, trafo = function(x) 2^x),\n makeIntegerParam(\nntree\n, lower = 1L, upper = 500L)\n)\nprint(ps)\n#\n Type len Def\n#\n selected.learner discrete - -\n#\n classif.ksvm.sigma numeric - -\n#\n classif.randomForest.ntree integer - -\n#\n Constr Req Tunable\n#\n selected.learner classif.ksvm,classif.randomForest - TRUE\n#\n classif.ksvm.sigma -12 to 12 Y TRUE\n#\n classif.randomForest.ntree 1 to 500 Y TRUE\n#\n Trafo\n#\n selected.learner -\n#\n classif.ksvm.sigma Y\n#\n classif.randomForest.ntree -\nrdesc = makeResampleDesc(\nCV\n, iters = 2L)\nctrl = makeTuneControlIrace(maxExperiments = 200L)\nres = tuneParams(lrn, iris.task, rdesc, par.set = ps, control = ctrl, show.info = FALSE)\nprint(head(as.data.frame(res$opt.path)))\n#\n selected.learner classif.ksvm.sigma classif.randomForest.ntree\n#\n 1 classif.ksvm -3.673815 NA\n#\n 2 classif.ksvm 6.361006 NA\n#\n 3 classif.randomForest NA 487\n#\n 4 classif.ksvm 3.165340 NA\n#\n 5 classif.randomForest NA 125\n#\n 6 classif.randomForest NA 383\n#\n mmce.test.mean dob eol error.message exec.time\n#\n 1 0.04666667 1 NA \nNA\n 0.056\n#\n 2 0.75333333 2 NA \nNA\n 0.055\n#\n 3 0.03333333 3 NA \nNA\n 0.094\n#\n 4 0.24000000 4 NA \nNA\n 0.060\n#\n 5 0.04000000 5 NA \nNA\n 0.056\n#\n 6 0.04000000 6 NA 
\nNA\n 0.079\n\n\n\n\nMulti-criteria evaluation and optimization\n\n\nDuring tuning you might want to optimize multiple, potentially conflicting, performance measures\nsimultaneously.\n\n\nIn the following example we aim to minimize both, the false positive and the false negative rates\n(\nfpr\n and \nfnr\n).\nWe again tune the hyperparameters of an SVM (function \nksvm\n) with a radial\nbasis kernel and use the \nsonar classification task\n for illustration.\nAs search strategy we choose a random search.\n\n\nFor all available multi-criteria tuning algorithms see \nTuneMultiCritControl\n.\n\n\nps = makeParamSet(\n makeNumericParam(\nC\n, lower = -12, upper = 12, trafo = function(x) 2^x),\n makeNumericParam(\nsigma\n, lower = -12, upper = 12, trafo = function(x) 2^x)\n)\nctrl = makeTuneMultiCritControlRandom(maxit = 30L)\nrdesc = makeResampleDesc(\nHoldout\n)\nres = tuneParamsMultiCrit(\nclassif.ksvm\n, task = sonar.task, resampling = rdesc, par.set = ps,\n measures = list(fpr, fnr), control = ctrl, show.info = FALSE)\nres\n#\n Tune multicrit result:\n#\n Points on front: 5\nhead(as.data.frame(trafoOptPath(res$opt.path)))\n#\n C sigma fpr.test.mean fnr.test.mean dob eol\n#\n 1 1.052637e-01 0.003374481 0.0000000 1.00000000 1 NA\n#\n 2 1.612578e+02 14.303163917 0.0000000 1.00000000 2 NA\n#\n 3 3.697931e+03 0.026982462 0.1851852 0.06976744 3 NA\n#\n 4 2.331471e+02 11.791412207 0.0000000 1.00000000 4 NA\n#\n 5 2.078857e-02 0.010218565 0.0000000 1.00000000 5 NA\n#\n 6 3.382767e+02 2.187025359 0.0000000 1.00000000 6 NA\n\n\n\n\nThe results can be visualized with function \nplotTuneMultiCritResult\n.\nThe plot shows the false positive and false negative rates for all parameter settings evaluated\nduring tuning. Points on the Pareto front are slightly increased.\n\n\nplotTuneMultiCritResult(res)\n\n\n\n\n \n\n\nFurther comments\n\n\n\n\n\n\nTuning works for all other tasks like regression, survival analysis and so on in a completely\n similar fashion.\n\n\n\n\n\n\nIn longer running tuning experiments it is very annoying if the computation stops due to\n numerical or other errors. 
Have a look at \non.learner.error\n in \nconfigureMlr\n as well as\n the examples given in section \nConfigure mlr\n of this tutorial.\n You might also want to inform yourself about \nimpute.val\n in \nTuneControl\n.\n\n\n\n\n\n\nAs we continually optimize over the same data during tuning, the estimated\nperformance value might be optimistically biased.\nA clean approach to ensure unbiased performance estimation is \nnested resampling\n,\nwhere we embed the whole model selection process into an outer resampling loop.", + "text": "Tuning Hyperparameters\n\n\nMany machine learning algorithms have hyperparameters that need to be set.\nIf selected by the user they can be specified as explained in the section about\n\nLearners\n -- simply pass them to \nmakeLearner\n.\nOften suitable parameter values are not obvious and it is preferable to tune the hyperparameters,\nthat is automatically identify values that lead to the best performance.\n\n\nBasics\n\n\nFor tuning you have to specify\n\n\n\n\nthe search space,\n\n\nthe optimization algorithm,\n\n\nan evaluation method, i.e., a resampling strategy and a performance measure.\n\n\n\n\nThe last point is already covered in this tutorial in the parts about the\n\nevaluation of learning methods\n and \nresampling\n.\n\n\nBelow we show how to specify the search space and optimization algorithm, how to do the\ntuning and how to access the tuning result, using the example of a grid search.\n\n\nThroughout this section we consider classification examples. For the other types of learning\nproblems tuning works analogously.\n\n\nGrid search with manual discretization\n\n\nA grid search is one of the standard -- albeit slow -- ways to choose an\nappropriate set of parameters from a given range of values.\n\n\nWe use the \niris classification task\n for illustration and tune the\nhyperparameters of an SVM (function \nksvm\n from the \nkernlab\n package)\nwith a radial basis kernel.\n\n\nFirst, we create a \nParamSet\n object, which describes the\nparameter space we wish to search.\nThis is done via function \nmakeParamSet\n.\nWe wish to tune the cost parameter \nC\n and the RBF kernel parameter \nsigma\n of the\n\nksvm\n function.\nSince we will use a grid search strategy, we add discrete parameters to the parameter set.\nThe specified \nvalues\n have to be vectors of feasible settings and the complete grid simply is\ntheir cross-product.\nEvery entry in the parameter set has to be named according to the corresponding parameter\nof the underlying \nR\n function.\n\n\nPlease note that whenever parameters in the underlying \nR\n functions should be\npassed in a \nlist\n structure, \nmlr\n tries to give you direct access to\neach parameter and get rid of the list structure.\nThis is the case with the \nkpar\n argument of \nksvm\n which is a list of\nkernel parameters like \nsigma\n.\n\n\nps = makeParamSet(\n makeDiscreteParam(\nC\n, values = 2^(-2:2)),\n makeDiscreteParam(\nsigma\n, values = 2^(-2:2))\n)\n\n\n\n\nAdditional to the parameter set, we need an instance of a \nTuneControl\n object.\nThese describe the optimization strategy to be used and its settings.\nHere we choose a grid search:\n\n\nctrl = makeTuneControlGrid()\n\n\n\n\nWe will use 3-fold cross-validation to assess the quality of a specific parameter setting.\nFor this we need to create a resampling description just like in the \nresampling\n\npart of the tutorial.\n\n\nrdesc = makeResampleDesc(\nCV\n, iters = 3L)\n\n\n\n\nFinally, by combining all the previous pieces, we can tune the SVM 
parameters by calling\n\ntuneParams\n.\n\n\nres = tuneParams(\nclassif.ksvm\n, task = iris.task, resampling = rdesc, par.set = ps,\n control = ctrl)\n#\n [Tune] Started tuning learner classif.ksvm for parameter set:\n#\n Type len Def Constr Req Tunable Trafo\n#\n C discrete - - 0.25,0.5,1,2,4 - TRUE -\n#\n sigma discrete - - 0.25,0.5,1,2,4 - TRUE -\n#\n With control class: TuneControlGrid\n#\n Imputation value: 1\n#\n [Tune-x] 1: C=0.25; sigma=0.25\n#\n [Tune-y] 1: mmce.test.mean=0.0467; time: 0.0 min; memory: 156Mb use, 479Mb max\n#\n [Tune-x] 2: C=0.5; sigma=0.25\n#\n [Tune-y] 2: mmce.test.mean=0.0467; time: 0.0 min; memory: 156Mb use, 479Mb max\n#\n [Tune-x] 3: C=1; sigma=0.25\n#\n [Tune-y] 3: mmce.test.mean=0.04; time: 0.0 min; memory: 156Mb use, 479Mb max\n#\n [Tune-x] 4: C=2; sigma=0.25\n#\n [Tune-y] 4: mmce.test.mean=0.0467; time: 0.0 min; memory: 156Mb use, 479Mb max\n#\n [Tune-x] 5: C=4; sigma=0.25\n#\n [Tune-y] 5: mmce.test.mean=0.0467; time: 0.0 min; memory: 156Mb use, 479Mb max\n#\n [Tune-x] 6: C=0.25; sigma=0.5\n#\n [Tune-y] 6: mmce.test.mean=0.06; time: 0.0 min; memory: 156Mb use, 479Mb max\n#\n [Tune-x] 7: C=0.5; sigma=0.5\n#\n [Tune-y] 7: mmce.test.mean=0.04; time: 0.0 min; memory: 156Mb use, 479Mb max\n#\n [Tune-x] 8: C=1; sigma=0.5\n#\n [Tune-y] 8: mmce.test.mean=0.04; time: 0.0 min; memory: 156Mb use, 479Mb max\n#\n [Tune-x] 9: C=2; sigma=0.5\n#\n [Tune-y] 9: mmce.test.mean=0.0467; time: 0.0 min; memory: 156Mb use, 479Mb max\n#\n [Tune-x] 10: C=4; sigma=0.5\n#\n [Tune-y] 10: mmce.test.mean=0.0467; time: 0.0 min; memory: 156Mb use, 479Mb max\n#\n [Tune-x] 11: C=0.25; sigma=1\n#\n [Tune-y] 11: mmce.test.mean=0.0533; time: 0.0 min; memory: 156Mb use, 479Mb max\n#\n [Tune-x] 12: C=0.5; sigma=1\n#\n [Tune-y] 12: mmce.test.mean=0.04; time: 0.0 min; memory: 156Mb use, 479Mb max\n#\n [Tune-x] 13: C=1; sigma=1\n#\n [Tune-y] 13: mmce.test.mean=0.0467; time: 0.0 min; memory: 156Mb use, 479Mb max\n#\n [Tune-x] 14: C=2; sigma=1\n#\n [Tune-y] 14: mmce.test.mean=0.0467; time: 0.0 min; memory: 156Mb use, 479Mb max\n#\n [Tune-x] 15: C=4; sigma=1\n#\n [Tune-y] 15: mmce.test.mean=0.0533; time: 0.0 min; memory: 156Mb use, 479Mb max\n#\n [Tune-x] 16: C=0.25; sigma=2\n#\n [Tune-y] 16: mmce.test.mean=0.0533; time: 0.0 min; memory: 156Mb use, 479Mb max\n#\n [Tune-x] 17: C=0.5; sigma=2\n#\n [Tune-y] 17: mmce.test.mean=0.04; time: 0.0 min; memory: 156Mb use, 479Mb max\n#\n [Tune-x] 18: C=1; sigma=2\n#\n [Tune-y] 18: mmce.test.mean=0.0333; time: 0.0 min; memory: 156Mb use, 479Mb max\n#\n [Tune-x] 19: C=2; sigma=2\n#\n [Tune-y] 19: mmce.test.mean=0.04; time: 0.0 min; memory: 156Mb use, 479Mb max\n#\n [Tune-x] 20: C=4; sigma=2\n#\n [Tune-y] 20: mmce.test.mean=0.0467; time: 0.0 min; memory: 156Mb use, 479Mb max\n#\n [Tune-x] 21: C=0.25; sigma=4\n#\n [Tune-y] 21: mmce.test.mean=0.113; time: 0.0 min; memory: 156Mb use, 479Mb max\n#\n [Tune-x] 22: C=0.5; sigma=4\n#\n [Tune-y] 22: mmce.test.mean=0.0667; time: 0.0 min; memory: 156Mb use, 479Mb max\n#\n [Tune-x] 23: C=1; sigma=4\n#\n [Tune-y] 23: mmce.test.mean=0.0533; time: 0.0 min; memory: 156Mb use, 479Mb max\n#\n [Tune-x] 24: C=2; sigma=4\n#\n [Tune-y] 24: mmce.test.mean=0.06; time: 0.0 min; memory: 156Mb use, 479Mb max\n#\n [Tune-x] 25: C=4; sigma=4\n#\n [Tune-y] 25: mmce.test.mean=0.0667; time: 0.0 min; memory: 156Mb use, 479Mb max\n#\n [Tune] Result: C=1; sigma=2 : mmce.test.mean=0.0333\nres\n#\n Tune result:\n#\n Op. 
pars: C=1; sigma=2\n#\n mmce.test.mean=0.0333\n\n\n\n\ntuneParams\n simply performs the cross-validation for every element of the\ncross-product and selects the parameter setting with the best mean performance.\nAs no performance measure was specified, by default the error rate (\nmmce\n) is\nused.\n\n\nNote that each \nmeasure\n \"knows\" if it is minimized or maximized during tuning.\n\n\n## error rate\nmmce$minimize\n#\n [1] TRUE\n\n## accuracy\nacc$minimize\n#\n [1] FALSE\n\n\n\n\nOf course, you can pass other measures and also a list of measures to \ntuneParams\n.\nIn the latter case the first measure is optimized during tuning, the others are simply evaluated.\nIf you are interested in optimizing several measures simultaneously have a look at the\nparagraph about multi-criteria tuning below.\n\n\nIn the example below we calculate the accuracy (\nacc\n) instead of the error\nrate.\nWe use function \nsetAggregation\n, as described in the section on \nresampling\n,\nto additionally obtain the standard deviation of the accuracy.\n\n\nres = tuneParams(\nclassif.ksvm\n, task = iris.task, resampling = rdesc, par.set = ps,\n control = ctrl, measures = list(acc, setAggregation(acc, test.sd)), show.info = FALSE)\nres\n#\n Tune result:\n#\n Op. pars: C=0.25; sigma=0.25\n#\n acc.test.mean=0.953,acc.test.sd=0.0306\n\n\n\n\nAccessing the tuning result\n\n\nThe result object \nTuneResult\n allows you to access the best found settings \n$x\n and their\nestimated performance \n$y\n.\n\n\nres$x\n#\n $C\n#\n [1] 0.25\n#\n \n#\n $sigma\n#\n [1] 0.25\nres$y\n#\n acc.test.mean acc.test.sd \n#\n 0.9533333 0.0305505\n\n\n\n\nMoreover, we can inspect all points evaluated during the search by accessing the\n\n$opt.path\n (see also the documentation of \nOptPath\n).\n\n\nres$opt.path\n#\n Optimization path\n#\n Dimensions: x = 2/2, y = 2\n#\n Length: 25\n#\n Add x values transformed: FALSE\n#\n Error messages: TRUE. Errors: 0 / 25.\n#\n Exec times: TRUE. Range: 0.067 - 0.085. 
0 NAs.\nopt.grid = as.data.frame(res$opt.path)\nhead(opt.grid)\n#\n C sigma acc.test.mean acc.test.sd dob eol error.message exec.time\n#\n 1 0.25 0.25 0.9533333 0.03055050 1 NA \nNA\n 0.071\n#\n 2 0.5 0.25 0.9466667 0.02309401 2 NA \nNA\n 0.069\n#\n 3 1 0.25 0.9533333 0.01154701 3 NA \nNA\n 0.067\n#\n 4 2 0.25 0.9533333 0.01154701 4 NA \nNA\n 0.067\n#\n 5 4 0.25 0.9533333 0.01154701 5 NA \nNA\n 0.068\n#\n 6 0.25 0.5 0.9333333 0.01154701 6 NA \nNA\n 0.069\n\n\n\n\nA quick visualization of the performance values on the search grid can be accomplished as follows:\n\n\nlibrary(ggplot2)\ng = ggplot(opt.grid, aes(x = C, y = sigma, fill = acc.test.mean, label = round(acc.test.sd, 3)))\ng + geom_tile() + geom_text(color = \nwhite\n)\n\n\n\n\n \n\n\nThe colors of the tiles display the achieved accuracy, the tile labels show the standard deviation.\n\n\nUsing the optimal parameter values\n\n\nAfter tuning you can generate a \nLearner\n with optimal hyperparameter settings\nas follows:\n\n\nlrn = setHyperPars(makeLearner(\nclassif.ksvm\n), par.vals = res$x)\nlrn\n#\n Learner classif.ksvm from package kernlab\n#\n Type: classif\n#\n Name: Support Vector Machines; Short name: ksvm\n#\n Class: classif.ksvm\n#\n Properties: twoclass,multiclass,numerics,factors,prob,class.weights\n#\n Predict-Type: response\n#\n Hyperparameters: fit=FALSE,C=0.25,sigma=0.25\n\n\n\n\nThen you can proceed as usual.\nHere we refit and predict the learner on the complete \niris\n data\nset.\n\n\nm = train(lrn, iris.task)\npredict(m, task = iris.task)\n#\n Prediction: 150 observations\n#\n predict.type: response\n#\n threshold: \n#\n time: 0.01\n#\n id truth response\n#\n 1 1 setosa setosa\n#\n 2 2 setosa setosa\n#\n 3 3 setosa setosa\n#\n 4 4 setosa setosa\n#\n 5 5 setosa setosa\n#\n 6 6 setosa setosa\n\n\n\n\nGrid search without manual discretization\n\n\nWe can also specify the true numeric parameter types of \nC\n and \nsigma\n when creating the\nparameter set and use the \nresolution\n option of \nmakeTuneControlGrid\n to\nautomatically discretize them.\n\n\nNote how we also make use of the \ntrafo\n option when creating the parameter set to easily\noptimize on a log-scale.\n\n\nTrafos work like this: All optimizers basically see the parameters on their\noriginal scale (from -12 to 12) in this case and produce values on this scale during the search.\nRight before they are passed to the learning algorithm, the transformation function is applied.\n\n\nps = makeParamSet(\n makeNumericParam(\nC\n, lower = -12, upper = 12, trafo = function(x) 2^x),\n makeNumericParam(\nsigma\n, lower = -12, upper = 12, trafo = function(x) 2^x)\n)\nctrl = makeTuneControlGrid(resolution = 3L)\nrdesc = makeResampleDesc(\nCV\n, iters = 2L)\nres = tuneParams(\nclassif.ksvm\n, iris.task, rdesc, par.set = ps, control = ctrl)\n#\n [Tune] Started tuning learner classif.ksvm for parameter set:\n#\n Type len Def Constr Req Tunable Trafo\n#\n C numeric - - -12 to 12 - TRUE Y\n#\n sigma numeric - - -12 to 12 - TRUE Y\n#\n With control class: TuneControlGrid\n#\n Imputation value: 1\n#\n [Tune-x] 1: C=0.000244; sigma=0.000244\n#\n [Tune-y] 1: mmce.test.mean=0.527; time: 0.0 min; memory: 156Mb use, 479Mb max\n#\n [Tune-x] 2: C=1; sigma=0.000244\n#\n [Tune-y] 2: mmce.test.mean=0.527; time: 0.0 min; memory: 156Mb use, 479Mb max\n#\n [Tune-x] 3: C=4.1e+03; sigma=0.000244\n#\n [Tune-y] 3: mmce.test.mean=0.04; time: 0.0 min; memory: 156Mb use, 479Mb max\n#\n [Tune-x] 4: C=0.000244; sigma=1\n#\n [Tune-y] 4: mmce.test.mean=0.527; time: 0.0 min; memory: 156Mb use, 
479Mb max\n#\n [Tune-x] 5: C=1; sigma=1\n#\n [Tune-y] 5: mmce.test.mean=0.04; time: 0.0 min; memory: 156Mb use, 479Mb max\n#\n [Tune-x] 6: C=4.1e+03; sigma=1\n#\n [Tune-y] 6: mmce.test.mean=0.0667; time: 0.0 min; memory: 156Mb use, 479Mb max\n#\n [Tune-x] 7: C=0.000244; sigma=4.1e+03\n#\n [Tune-y] 7: mmce.test.mean=0.567; time: 0.0 min; memory: 156Mb use, 479Mb max\n#\n [Tune-x] 8: C=1; sigma=4.1e+03\n#\n [Tune-y] 8: mmce.test.mean=0.687; time: 0.0 min; memory: 156Mb use, 479Mb max\n#\n [Tune-x] 9: C=4.1e+03; sigma=4.1e+03\n#\n [Tune-y] 9: mmce.test.mean=0.687; time: 0.0 min; memory: 156Mb use, 479Mb max\n#\n [Tune] Result: C=1; sigma=1 : mmce.test.mean=0.04\nres\n#\n Tune result:\n#\n Op. pars: C=1; sigma=1\n#\n mmce.test.mean=0.04\n\n\n\n\nNote that \nres$opt.path\n contains the parameter values \non the original scale\n.\n\n\nas.data.frame(res$opt.path)\n#\n C sigma mmce.test.mean dob eol error.message exec.time\n#\n 1 -12 -12 0.52666667 1 NA \nNA\n 0.048\n#\n 2 0 -12 0.52666667 2 NA \nNA\n 0.050\n#\n 3 12 -12 0.04000000 3 NA \nNA\n 0.046\n#\n 4 -12 0 0.52666667 4 NA \nNA\n 0.050\n#\n 5 0 0 0.04000000 5 NA \nNA\n 0.047\n#\n 6 12 0 0.06666667 6 NA \nNA\n 0.048\n#\n 7 -12 12 0.56666667 7 NA \nNA\n 0.049\n#\n 8 0 12 0.68666667 8 NA \nNA\n 0.049\n#\n 9 12 12 0.68666667 9 NA \nNA\n 0.049\n\n\n\n\nIn order to get the \ntransformed\n parameter values instead, use function\n\ntrafoOptPath\n.\n\n\nas.data.frame(trafoOptPath(res$opt.path))\n#\n C sigma mmce.test.mean dob eol\n#\n 1 2.441406e-04 2.441406e-04 0.52666667 1 NA\n#\n 2 1.000000e+00 2.441406e-04 0.52666667 2 NA\n#\n 3 4.096000e+03 2.441406e-04 0.04000000 3 NA\n#\n 4 2.441406e-04 1.000000e+00 0.52666667 4 NA\n#\n 5 1.000000e+00 1.000000e+00 0.04000000 5 NA\n#\n 6 4.096000e+03 1.000000e+00 0.06666667 6 NA\n#\n 7 2.441406e-04 4.096000e+03 0.56666667 7 NA\n#\n 8 1.000000e+00 4.096000e+03 0.68666667 8 NA\n#\n 9 4.096000e+03 4.096000e+03 0.68666667 9 NA\n\n\n\n\nIterated F-Racing for mixed spaces and dependencies\n\n\nThe package supports a larger number of tuning algorithms, which can all be looked up and\nselected via \nTuneControl\n. One of the cooler algorithms is iterated F-racing from the \n\nirace\n package. 
This not only works for arbitrary parameter types (numeric, integer,\ndiscrete, logical), but also for so-called dependent / hierarchical parameters:\n\n\nps = makeParamSet(\n makeNumericParam(\nC\n, lower = -12, upper = 12, trafo = function(x) 2^x),\n makeDiscreteParam(\nkernel\n, values = c(\nvanilladot\n, \npolydot\n, \nrbfdot\n)),\n makeNumericParam(\nsigma\n, lower = -12, upper = 12, trafo = function(x) 2^x,\n requires = quote(kernel == \nrbfdot\n)),\n makeIntegerParam(\ndegree\n, lower = 2L, upper = 5L,\n requires = quote(kernel == \npolydot\n))\n)\nctrl = makeTuneControlIrace(maxExperiments = 200L)\nrdesc = makeResampleDesc(\nHoldout\n)\nres = tuneParams(\nclassif.ksvm\n, iris.task, rdesc, par.set = ps, control = ctrl, show.info = FALSE)\nprint(head(as.data.frame(res$opt.path)))\n#\n C kernel sigma degree mmce.test.mean dob eol\n#\n 1 3.148837 polydot NA 5 0.08 1 NA\n#\n 2 3.266305 vanilladot NA NA 0.02 2 NA\n#\n 3 -3.808213 vanilladot NA NA 0.04 3 NA\n#\n 4 1.694097 rbfdot 6.580514 NA 0.48 4 NA\n#\n 5 11.995501 polydot NA 2 0.08 5 NA\n#\n 6 -5.731782 vanilladot NA NA 0.14 6 NA\n#\n error.message exec.time\n#\n 1 \nNA\n 0.031\n#\n 2 \nNA\n 0.027\n#\n 3 \nNA\n 0.029\n#\n 4 \nNA\n 0.030\n#\n 5 \nNA\n 0.028\n#\n 6 \nNA\n 0.031\n\n\n\n\nSee how we made the kernel parameters like \nsigma\n and \ndegree\n dependent on the \nkernel\n\nselection parameters? This approach allows you to tune parameters of multiple kernels at once, \nefficiently concentrating on the ones which work best for your given data set.\n\n\nTuning across whole model spaces with ModelMultiplexer\n\n\nWe can now take the following example even one step further. If we use the\n\nModelMultiplexer\n we can tune over different model classes at once,\njust as we did with the SVM kernels above.\n\n\nbase.learners = list(\n makeLearner(\nclassif.ksvm\n),\n makeLearner(\nclassif.randomForest\n)\n)\nlrn = makeModelMultiplexer(base.learners)\n\n\n\n\nFunction \nmakeModelMultiplexerParamSet\n offers a simple way to contruct parameter set for tuning:\nThe parameter names are prefixed automatically and the \nrequires\n element is set, too,\nto make all paramaters subordinate to \nselected.learner\n.\n\n\nps = makeModelMultiplexerParamSet(lrn,\n makeNumericParam(\nsigma\n, lower = -12, upper = 12, trafo = function(x) 2^x),\n makeIntegerParam(\nntree\n, lower = 1L, upper = 500L)\n)\nprint(ps)\n#\n Type len Def\n#\n selected.learner discrete - -\n#\n classif.ksvm.sigma numeric - -\n#\n classif.randomForest.ntree integer - -\n#\n Constr Req Tunable\n#\n selected.learner classif.ksvm,classif.randomForest - TRUE\n#\n classif.ksvm.sigma -12 to 12 Y TRUE\n#\n classif.randomForest.ntree 1 to 500 Y TRUE\n#\n Trafo\n#\n selected.learner -\n#\n classif.ksvm.sigma Y\n#\n classif.randomForest.ntree -\nrdesc = makeResampleDesc(\nCV\n, iters = 2L)\nctrl = makeTuneControlIrace(maxExperiments = 200L)\nres = tuneParams(lrn, iris.task, rdesc, par.set = ps, control = ctrl, show.info = FALSE)\nprint(head(as.data.frame(res$opt.path)))\n#\n selected.learner classif.ksvm.sigma classif.randomForest.ntree\n#\n 1 classif.ksvm -3.673815 NA\n#\n 2 classif.ksvm 6.361006 NA\n#\n 3 classif.randomForest NA 487\n#\n 4 classif.ksvm 3.165340 NA\n#\n 5 classif.randomForest NA 125\n#\n 6 classif.randomForest NA 383\n#\n mmce.test.mean dob eol error.message exec.time\n#\n 1 0.04666667 1 NA \nNA\n 0.051\n#\n 2 0.75333333 2 NA \nNA\n 0.054\n#\n 3 0.03333333 3 NA \nNA\n 0.079\n#\n 4 0.24000000 4 NA \nNA\n 0.053\n#\n 5 0.04000000 5 NA \nNA\n 0.049\n#\n 6 0.04000000 6 NA 
\nNA\n 0.071\n\n\n\n\nMulti-criteria evaluation and optimization\n\n\nDuring tuning you might want to optimize multiple, potentially conflicting, performance measures\nsimultaneously.\n\n\nIn the following example we aim to minimize both, the false positive and the false negative rates\n(\nfpr\n and \nfnr\n).\nWe again tune the hyperparameters of an SVM (function \nksvm\n) with a radial\nbasis kernel and use the \nsonar classification task\n for illustration.\nAs search strategy we choose a random search.\n\n\nFor all available multi-criteria tuning algorithms see \nTuneMultiCritControl\n.\n\n\nps = makeParamSet(\n makeNumericParam(\nC\n, lower = -12, upper = 12, trafo = function(x) 2^x),\n makeNumericParam(\nsigma\n, lower = -12, upper = 12, trafo = function(x) 2^x)\n)\nctrl = makeTuneMultiCritControlRandom(maxit = 30L)\nrdesc = makeResampleDesc(\nHoldout\n)\nres = tuneParamsMultiCrit(\nclassif.ksvm\n, task = sonar.task, resampling = rdesc, par.set = ps,\n measures = list(fpr, fnr), control = ctrl, show.info = FALSE)\nres\n#\n Tune multicrit result:\n#\n Points on front: 5\nhead(as.data.frame(trafoOptPath(res$opt.path)))\n#\n C sigma fpr.test.mean fnr.test.mean dob eol\n#\n 1 1.052637e-01 0.003374481 0.0000000 1.00000000 1 NA\n#\n 2 1.612578e+02 14.303163917 0.0000000 1.00000000 2 NA\n#\n 3 3.697931e+03 0.026982462 0.1851852 0.06976744 3 NA\n#\n 4 2.331471e+02 11.791412207 0.0000000 1.00000000 4 NA\n#\n 5 2.078857e-02 0.010218565 0.0000000 1.00000000 5 NA\n#\n 6 3.382767e+02 2.187025359 0.0000000 1.00000000 6 NA\n\n\n\n\nThe results can be visualized with function \nplotTuneMultiCritResult\n.\nThe plot shows the false positive and false negative rates for all parameter settings evaluated\nduring tuning. Points on the Pareto front are slightly increased.\n\n\nplotTuneMultiCritResult(res)\n\n\n\n\n \n\n\nFurther comments\n\n\n\n\n\n\nTuning works for all other tasks like regression, survival analysis and so on in a completely\n similar fashion.\n\n\n\n\n\n\nIn longer running tuning experiments it is very annoying if the computation stops due to\n numerical or other errors. Have a look at \non.learner.error\n in \nconfigureMlr\n as well as\n the examples given in section \nConfigure mlr\n of this tutorial.\n You might also want to inform yourself about \nimpute.val\n in \nTuneControl\n.\n\n\n\n\n\n\nAs we continually optimize over the same data during tuning, the estimated\nperformance value might be optimistically biased.\nA clean approach to ensure unbiased performance estimation is \nnested resampling\n,\nwhere we embed the whole model selection process into an outer resampling loop.", "title": "Tuning" }, { @@ -377,17 +377,17 @@ }, { "location": "/tune/index.html#basics", - "text": "For tuning you have to specify the search space, the optimization algorithm, an evaluation method, i.e., a resampling strategy and a performance measure. The last point is already covered in this tutorial in the parts about the evaluation of learning methods and resampling . Below we show how to specify the search space and optimization algorithm, how to do the\ntuning and how to access the tuning result, using the example of a grid search. Throughout this section we consider classification examples. For the other types of learning\nproblems tuning works analogously. Grid search with manual discretization A grid search is one of the standard -- albeit slow -- ways to choose an\nappropriate set of parameters from a given range of values. 
We use the iris classification task for illustration and tune the\nhyperparameters of an SVM (function ksvm from the kernlab package)\nwith a radial basis kernel. First, we create a ParamSet object, which describes the\nparameter space we wish to search.\nThis is done via function makeParamSet .\nWe wish to tune the cost parameter C and the RBF kernel parameter sigma of the ksvm function.\nSince we will use a grid search strategy, we add discrete parameters to the parameter set.\nThe specified values have to be vectors of feasible settings and the complete grid simply is\ntheir cross-product.\nEvery entry in the parameter set has to be named according to the corresponding parameter\nof the underlying R function. Please note that whenever parameters in the underlying R functions should be\npassed in a list structure, mlr tries to give you direct access to\neach parameter and get rid of the list structure.\nThis is the case with the kpar argument of ksvm which is a list of\nkernel parameters like sigma . ps = makeParamSet(\n makeDiscreteParam( C , values = 2^(-2:2)),\n makeDiscreteParam( sigma , values = 2^(-2:2))\n) Additional to the parameter set, we need an instance of a TuneControl object.\nThese describe the optimization strategy to be used and its settings.\nHere we choose a grid search: ctrl = makeTuneControlGrid() We will use 3-fold cross-validation to assess the quality of a specific parameter setting.\nFor this we need to create a resampling description just like in the resampling \npart of the tutorial. rdesc = makeResampleDesc( CV , iters = 3L) Finally, by combining all the previous pieces, we can tune the SVM parameters by calling tuneParams . res = tuneParams( classif.ksvm , task = iris.task, resampling = rdesc, par.set = ps,\n control = ctrl)\n# [Tune] Started tuning learner classif.ksvm for parameter set:\n# Type len Def Constr Req Tunable Trafo\n# C discrete - - 0.25,0.5,1,2,4 - TRUE -\n# sigma discrete - - 0.25,0.5,1,2,4 - TRUE -\n# With control class: TuneControlGrid\n# Imputation value: 1\n# [Tune-x] 1: C=0.25; sigma=0.25\n# [Tune-y] 1: mmce.test.mean=0.0467; time: 0.0 min; memory: 156Mb use, 478Mb max\n# [Tune-x] 2: C=0.5; sigma=0.25\n# [Tune-y] 2: mmce.test.mean=0.0467; time: 0.0 min; memory: 156Mb use, 478Mb max\n# [Tune-x] 3: C=1; sigma=0.25\n# [Tune-y] 3: mmce.test.mean=0.04; time: 0.0 min; memory: 156Mb use, 478Mb max\n# [Tune-x] 4: C=2; sigma=0.25\n# [Tune-y] 4: mmce.test.mean=0.0467; time: 0.0 min; memory: 156Mb use, 478Mb max\n# [Tune-x] 5: C=4; sigma=0.25\n# [Tune-y] 5: mmce.test.mean=0.0467; time: 0.0 min; memory: 156Mb use, 478Mb max\n# [Tune-x] 6: C=0.25; sigma=0.5\n# [Tune-y] 6: mmce.test.mean=0.06; time: 0.0 min; memory: 156Mb use, 478Mb max\n# [Tune-x] 7: C=0.5; sigma=0.5\n# [Tune-y] 7: mmce.test.mean=0.04; time: 0.0 min; memory: 156Mb use, 478Mb max\n# [Tune-x] 8: C=1; sigma=0.5\n# [Tune-y] 8: mmce.test.mean=0.04; time: 0.0 min; memory: 156Mb use, 478Mb max\n# [Tune-x] 9: C=2; sigma=0.5\n# [Tune-y] 9: mmce.test.mean=0.0467; time: 0.0 min; memory: 156Mb use, 478Mb max\n# [Tune-x] 10: C=4; sigma=0.5\n# [Tune-y] 10: mmce.test.mean=0.0467; time: 0.0 min; memory: 156Mb use, 478Mb max\n# [Tune-x] 11: C=0.25; sigma=1\n# [Tune-y] 11: mmce.test.mean=0.0533; time: 0.0 min; memory: 156Mb use, 478Mb max\n# [Tune-x] 12: C=0.5; sigma=1\n# [Tune-y] 12: mmce.test.mean=0.04; time: 0.0 min; memory: 156Mb use, 478Mb max\n# [Tune-x] 13: C=1; sigma=1\n# [Tune-y] 13: mmce.test.mean=0.0467; time: 0.0 min; memory: 156Mb use, 478Mb max\n# [Tune-x] 14: C=2; sigma=1\n# [Tune-y] 14: 
mmce.test.mean=0.0467; time: 0.0 min; memory: 156Mb use, 478Mb max\n# [Tune-x] 15: C=4; sigma=1\n# [Tune-y] 15: mmce.test.mean=0.0533; time: 0.0 min; memory: 156Mb use, 478Mb max\n# [Tune-x] 16: C=0.25; sigma=2\n# [Tune-y] 16: mmce.test.mean=0.0533; time: 0.0 min; memory: 156Mb use, 478Mb max\n# [Tune-x] 17: C=0.5; sigma=2\n# [Tune-y] 17: mmce.test.mean=0.04; time: 0.0 min; memory: 156Mb use, 478Mb max\n# [Tune-x] 18: C=1; sigma=2\n# [Tune-y] 18: mmce.test.mean=0.0333; time: 0.0 min; memory: 156Mb use, 478Mb max\n# [Tune-x] 19: C=2; sigma=2\n# [Tune-y] 19: mmce.test.mean=0.04; time: 0.0 min; memory: 156Mb use, 478Mb max\n# [Tune-x] 20: C=4; sigma=2\n# [Tune-y] 20: mmce.test.mean=0.0467; time: 0.0 min; memory: 156Mb use, 478Mb max\n# [Tune-x] 21: C=0.25; sigma=4\n# [Tune-y] 21: mmce.test.mean=0.113; time: 0.0 min; memory: 156Mb use, 478Mb max\n# [Tune-x] 22: C=0.5; sigma=4\n# [Tune-y] 22: mmce.test.mean=0.0667; time: 0.0 min; memory: 156Mb use, 478Mb max\n# [Tune-x] 23: C=1; sigma=4\n# [Tune-y] 23: mmce.test.mean=0.0533; time: 0.0 min; memory: 156Mb use, 478Mb max\n# [Tune-x] 24: C=2; sigma=4\n# [Tune-y] 24: mmce.test.mean=0.06; time: 0.0 min; memory: 156Mb use, 478Mb max\n# [Tune-x] 25: C=4; sigma=4\n# [Tune-y] 25: mmce.test.mean=0.0667; time: 0.0 min; memory: 156Mb use, 478Mb max\n# [Tune] Result: C=1; sigma=2 : mmce.test.mean=0.0333\nres\n# Tune result:\n# Op. pars: C=1; sigma=2\n# mmce.test.mean=0.0333 tuneParams simply performs the cross-validation for every element of the\ncross-product and selects the parameter setting with the best mean performance.\nAs no performance measure was specified, by default the error rate ( mmce ) is\nused. Note that each measure \"knows\" if it is minimized or maximized during tuning. ## error rate\nmmce$minimize\n# [1] TRUE\n\n## accuracy\nacc$minimize\n# [1] FALSE Of course, you can pass other measures and also a list of measures to tuneParams .\nIn the latter case the first measure is optimized during tuning, the others are simply evaluated.\nIf you are interested in optimizing several measures simultaneously have a look at the\nparagraph about multi-criteria tuning below. In the example below we calculate the accuracy ( acc ) instead of the error\nrate.\nWe use function setAggregation , as described in the section on resampling ,\nto additionally obtain the standard deviation of the accuracy. res = tuneParams( classif.ksvm , task = iris.task, resampling = rdesc, par.set = ps,\n control = ctrl, measures = list(acc, setAggregation(acc, test.sd)), show.info = FALSE)\nres\n# Tune result:\n# Op. pars: C=0.25; sigma=0.25\n# acc.test.mean=0.953,acc.test.sd=0.0306 Accessing the tuning result The result object TuneResult allows you to access the best found settings $x and their\nestimated performance $y . res$x\n# $C\n# [1] 0.25\n# \n# $sigma\n# [1] 0.25\nres$y\n# acc.test.mean acc.test.sd \n# 0.9533333 0.0305505 Moreover, we can inspect all points evaluated during the search by accessing the $opt.path (see also the documentation of OptPath ). res$opt.path\n# Optimization path\n# Dimensions: x = 2/2, y = 2\n# Length: 25\n# Add x values transformed: FALSE\n# Error messages: TRUE. Errors: 0 / 25.\n# Exec times: TRUE. Range: 0.077 - 0.092. 
0 NAs.\nopt.grid = as.data.frame(res$opt.path)\nhead(opt.grid)\n# C sigma acc.test.mean acc.test.sd dob eol error.message exec.time\n# 1 0.25 0.25 0.9533333 0.03055050 1 NA NA 0.091\n# 2 0.5 0.25 0.9466667 0.02309401 2 NA NA 0.081\n# 3 1 0.25 0.9533333 0.01154701 3 NA NA 0.083\n# 4 2 0.25 0.9533333 0.01154701 4 NA NA 0.082\n# 5 4 0.25 0.9533333 0.01154701 5 NA NA 0.081\n# 6 0.25 0.5 0.9333333 0.01154701 6 NA NA 0.085 A quick visualization of the performance values on the search grid can be accomplished as follows: library(ggplot2)\ng = ggplot(opt.grid, aes(x = C, y = sigma, fill = acc.test.mean, label = round(acc.test.sd, 3)))\ng + geom_tile() + geom_text(color = white ) The colors of the tiles display the achieved accuracy, the tile labels show the standard deviation. Using the optimal parameter values After tuning you can generate a Learner with optimal hyperparameter settings\nas follows: lrn = setHyperPars(makeLearner( classif.ksvm ), par.vals = res$x)\nlrn\n# Learner classif.ksvm from package kernlab\n# Type: classif\n# Name: Support Vector Machines; Short name: ksvm\n# Class: classif.ksvm\n# Properties: twoclass,multiclass,numerics,factors,prob,class.weights\n# Predict-Type: response\n# Hyperparameters: fit=FALSE,C=0.25,sigma=0.25 Then you can proceed as usual.\nHere we refit and predict the learner on the complete iris data\nset. m = train(lrn, iris.task)\npredict(m, task = iris.task)\n# Prediction: 150 observations\n# predict.type: response\n# threshold: \n# time: 0.01\n# id truth response\n# 1 1 setosa setosa\n# 2 2 setosa setosa\n# 3 3 setosa setosa\n# 4 4 setosa setosa\n# 5 5 setosa setosa\n# 6 6 setosa setosa Grid search without manual discretization We can also specify the true numeric parameter types of C and sigma when creating the\nparameter set and use the resolution option of makeTuneControlGrid to\nautomatically discretize them. Note how we also make use of the trafo option when creating the parameter set to easily\noptimize on a log-scale. Trafos work like this: All optimizers basically see the parameters on their\noriginal scale (from -12 to 12) in this case and produce values on this scale during the search.\nRight before they are passed to the learning algorithm, the transformation function is applied. 
ps = makeParamSet(\n makeNumericParam( C , lower = -12, upper = 12, trafo = function(x) 2^x),\n makeNumericParam( sigma , lower = -12, upper = 12, trafo = function(x) 2^x)\n)\nctrl = makeTuneControlGrid(resolution = 3L)\nrdesc = makeResampleDesc( CV , iters = 2L)\nres = tuneParams( classif.ksvm , iris.task, rdesc, par.set = ps, control = ctrl)\n# [Tune] Started tuning learner classif.ksvm for parameter set:\n# Type len Def Constr Req Tunable Trafo\n# C numeric - - -12 to 12 - TRUE Y\n# sigma numeric - - -12 to 12 - TRUE Y\n# With control class: TuneControlGrid\n# Imputation value: 1\n# [Tune-x] 1: C=0.000244; sigma=0.000244\n# [Tune-y] 1: mmce.test.mean=0.527; time: 0.0 min; memory: 156Mb use, 478Mb max\n# [Tune-x] 2: C=1; sigma=0.000244\n# [Tune-y] 2: mmce.test.mean=0.527; time: 0.0 min; memory: 156Mb use, 478Mb max\n# [Tune-x] 3: C=4.1e+03; sigma=0.000244\n# [Tune-y] 3: mmce.test.mean=0.04; time: 0.0 min; memory: 156Mb use, 478Mb max\n# [Tune-x] 4: C=0.000244; sigma=1\n# [Tune-y] 4: mmce.test.mean=0.527; time: 0.0 min; memory: 156Mb use, 478Mb max\n# [Tune-x] 5: C=1; sigma=1\n# [Tune-y] 5: mmce.test.mean=0.04; time: 0.0 min; memory: 156Mb use, 478Mb max\n# [Tune-x] 6: C=4.1e+03; sigma=1\n# [Tune-y] 6: mmce.test.mean=0.0667; time: 0.0 min; memory: 156Mb use, 478Mb max\n# [Tune-x] 7: C=0.000244; sigma=4.1e+03\n# [Tune-y] 7: mmce.test.mean=0.567; time: 0.0 min; memory: 156Mb use, 478Mb max\n# [Tune-x] 8: C=1; sigma=4.1e+03\n# [Tune-y] 8: mmce.test.mean=0.687; time: 0.0 min; memory: 156Mb use, 478Mb max\n# [Tune-x] 9: C=4.1e+03; sigma=4.1e+03\n# [Tune-y] 9: mmce.test.mean=0.687; time: 0.0 min; memory: 156Mb use, 478Mb max\n# [Tune] Result: C=1; sigma=1 : mmce.test.mean=0.04\nres\n# Tune result:\n# Op. pars: C=1; sigma=1\n# mmce.test.mean=0.04 Note that res$opt.path contains the parameter values on the original scale . as.data.frame(res$opt.path)\n# C sigma mmce.test.mean dob eol error.message exec.time\n# 1 -12 -12 0.52666667 1 NA NA 0.054\n# 2 0 -12 0.52666667 2 NA NA 0.055\n# 3 12 -12 0.04000000 3 NA NA 0.052\n# 4 -12 0 0.52666667 4 NA NA 0.054\n# 5 0 0 0.04000000 5 NA NA 0.053\n# 6 12 0 0.06666667 6 NA NA 0.057\n# 7 -12 12 0.56666667 7 NA NA 0.059\n# 8 0 12 0.68666667 8 NA NA 0.061\n# 9 12 12 0.68666667 9 NA NA 0.056 In order to get the transformed parameter values instead, use function trafoOptPath . as.data.frame(trafoOptPath(res$opt.path))\n# C sigma mmce.test.mean dob eol\n# 1 2.441406e-04 2.441406e-04 0.52666667 1 NA\n# 2 1.000000e+00 2.441406e-04 0.52666667 2 NA\n# 3 4.096000e+03 2.441406e-04 0.04000000 3 NA\n# 4 2.441406e-04 1.000000e+00 0.52666667 4 NA\n# 5 1.000000e+00 1.000000e+00 0.04000000 5 NA\n# 6 4.096000e+03 1.000000e+00 0.06666667 6 NA\n# 7 2.441406e-04 4.096000e+03 0.56666667 7 NA\n# 8 1.000000e+00 4.096000e+03 0.68666667 8 NA\n# 9 4.096000e+03 4.096000e+03 0.68666667 9 NA", + "text": "For tuning you have to specify the search space, the optimization algorithm, an evaluation method, i.e., a resampling strategy and a performance measure. The last point is already covered in this tutorial in the parts about the evaluation of learning methods and resampling . Below we show how to specify the search space and optimization algorithm, how to do the\ntuning and how to access the tuning result, using the example of a grid search. Throughout this section we consider classification examples. For the other types of learning\nproblems tuning works analogously. 
Grid search with manual discretization A grid search is one of the standard -- albeit slow -- ways to choose an\nappropriate set of parameters from a given range of values. We use the iris classification task for illustration and tune the\nhyperparameters of an SVM (function ksvm from the kernlab package)\nwith a radial basis kernel. First, we create a ParamSet object, which describes the\nparameter space we wish to search.\nThis is done via function makeParamSet .\nWe wish to tune the cost parameter C and the RBF kernel parameter sigma of the ksvm function.\nSince we will use a grid search strategy, we add discrete parameters to the parameter set.\nThe specified values have to be vectors of feasible settings and the complete grid simply is\ntheir cross-product.\nEvery entry in the parameter set has to be named according to the corresponding parameter\nof the underlying R function. Please note that whenever parameters in the underlying R functions should be\npassed in a list structure, mlr tries to give you direct access to\neach parameter and get rid of the list structure.\nThis is the case with the kpar argument of ksvm which is a list of\nkernel parameters like sigma . ps = makeParamSet(\n makeDiscreteParam( C , values = 2^(-2:2)),\n makeDiscreteParam( sigma , values = 2^(-2:2))\n) Additional to the parameter set, we need an instance of a TuneControl object.\nThese describe the optimization strategy to be used and its settings.\nHere we choose a grid search: ctrl = makeTuneControlGrid() We will use 3-fold cross-validation to assess the quality of a specific parameter setting.\nFor this we need to create a resampling description just like in the resampling \npart of the tutorial. rdesc = makeResampleDesc( CV , iters = 3L) Finally, by combining all the previous pieces, we can tune the SVM parameters by calling tuneParams . 
res = tuneParams( classif.ksvm , task = iris.task, resampling = rdesc, par.set = ps,\n control = ctrl)\n# [Tune] Started tuning learner classif.ksvm for parameter set:\n# Type len Def Constr Req Tunable Trafo\n# C discrete - - 0.25,0.5,1,2,4 - TRUE -\n# sigma discrete - - 0.25,0.5,1,2,4 - TRUE -\n# With control class: TuneControlGrid\n# Imputation value: 1\n# [Tune-x] 1: C=0.25; sigma=0.25\n# [Tune-y] 1: mmce.test.mean=0.0467; time: 0.0 min; memory: 156Mb use, 479Mb max\n# [Tune-x] 2: C=0.5; sigma=0.25\n# [Tune-y] 2: mmce.test.mean=0.0467; time: 0.0 min; memory: 156Mb use, 479Mb max\n# [Tune-x] 3: C=1; sigma=0.25\n# [Tune-y] 3: mmce.test.mean=0.04; time: 0.0 min; memory: 156Mb use, 479Mb max\n# [Tune-x] 4: C=2; sigma=0.25\n# [Tune-y] 4: mmce.test.mean=0.0467; time: 0.0 min; memory: 156Mb use, 479Mb max\n# [Tune-x] 5: C=4; sigma=0.25\n# [Tune-y] 5: mmce.test.mean=0.0467; time: 0.0 min; memory: 156Mb use, 479Mb max\n# [Tune-x] 6: C=0.25; sigma=0.5\n# [Tune-y] 6: mmce.test.mean=0.06; time: 0.0 min; memory: 156Mb use, 479Mb max\n# [Tune-x] 7: C=0.5; sigma=0.5\n# [Tune-y] 7: mmce.test.mean=0.04; time: 0.0 min; memory: 156Mb use, 479Mb max\n# [Tune-x] 8: C=1; sigma=0.5\n# [Tune-y] 8: mmce.test.mean=0.04; time: 0.0 min; memory: 156Mb use, 479Mb max\n# [Tune-x] 9: C=2; sigma=0.5\n# [Tune-y] 9: mmce.test.mean=0.0467; time: 0.0 min; memory: 156Mb use, 479Mb max\n# [Tune-x] 10: C=4; sigma=0.5\n# [Tune-y] 10: mmce.test.mean=0.0467; time: 0.0 min; memory: 156Mb use, 479Mb max\n# [Tune-x] 11: C=0.25; sigma=1\n# [Tune-y] 11: mmce.test.mean=0.0533; time: 0.0 min; memory: 156Mb use, 479Mb max\n# [Tune-x] 12: C=0.5; sigma=1\n# [Tune-y] 12: mmce.test.mean=0.04; time: 0.0 min; memory: 156Mb use, 479Mb max\n# [Tune-x] 13: C=1; sigma=1\n# [Tune-y] 13: mmce.test.mean=0.0467; time: 0.0 min; memory: 156Mb use, 479Mb max\n# [Tune-x] 14: C=2; sigma=1\n# [Tune-y] 14: mmce.test.mean=0.0467; time: 0.0 min; memory: 156Mb use, 479Mb max\n# [Tune-x] 15: C=4; sigma=1\n# [Tune-y] 15: mmce.test.mean=0.0533; time: 0.0 min; memory: 156Mb use, 479Mb max\n# [Tune-x] 16: C=0.25; sigma=2\n# [Tune-y] 16: mmce.test.mean=0.0533; time: 0.0 min; memory: 156Mb use, 479Mb max\n# [Tune-x] 17: C=0.5; sigma=2\n# [Tune-y] 17: mmce.test.mean=0.04; time: 0.0 min; memory: 156Mb use, 479Mb max\n# [Tune-x] 18: C=1; sigma=2\n# [Tune-y] 18: mmce.test.mean=0.0333; time: 0.0 min; memory: 156Mb use, 479Mb max\n# [Tune-x] 19: C=2; sigma=2\n# [Tune-y] 19: mmce.test.mean=0.04; time: 0.0 min; memory: 156Mb use, 479Mb max\n# [Tune-x] 20: C=4; sigma=2\n# [Tune-y] 20: mmce.test.mean=0.0467; time: 0.0 min; memory: 156Mb use, 479Mb max\n# [Tune-x] 21: C=0.25; sigma=4\n# [Tune-y] 21: mmce.test.mean=0.113; time: 0.0 min; memory: 156Mb use, 479Mb max\n# [Tune-x] 22: C=0.5; sigma=4\n# [Tune-y] 22: mmce.test.mean=0.0667; time: 0.0 min; memory: 156Mb use, 479Mb max\n# [Tune-x] 23: C=1; sigma=4\n# [Tune-y] 23: mmce.test.mean=0.0533; time: 0.0 min; memory: 156Mb use, 479Mb max\n# [Tune-x] 24: C=2; sigma=4\n# [Tune-y] 24: mmce.test.mean=0.06; time: 0.0 min; memory: 156Mb use, 479Mb max\n# [Tune-x] 25: C=4; sigma=4\n# [Tune-y] 25: mmce.test.mean=0.0667; time: 0.0 min; memory: 156Mb use, 479Mb max\n# [Tune] Result: C=1; sigma=2 : mmce.test.mean=0.0333\nres\n# Tune result:\n# Op. pars: C=1; sigma=2\n# mmce.test.mean=0.0333 tuneParams simply performs the cross-validation for every element of the\ncross-product and selects the parameter setting with the best mean performance.\nAs no performance measure was specified, by default the error rate ( mmce ) is\nused. 
Note that each measure \"knows\" if it is minimized or maximized during tuning. ## error rate\nmmce$minimize\n# [1] TRUE\n\n## accuracy\nacc$minimize\n# [1] FALSE Of course, you can pass other measures and also a list of measures to tuneParams .\nIn the latter case the first measure is optimized during tuning, the others are simply evaluated.\nIf you are interested in optimizing several measures simultaneously have a look at the\nparagraph about multi-criteria tuning below. In the example below we calculate the accuracy ( acc ) instead of the error\nrate.\nWe use function setAggregation , as described in the section on resampling ,\nto additionally obtain the standard deviation of the accuracy. res = tuneParams( classif.ksvm , task = iris.task, resampling = rdesc, par.set = ps,\n control = ctrl, measures = list(acc, setAggregation(acc, test.sd)), show.info = FALSE)\nres\n# Tune result:\n# Op. pars: C=0.25; sigma=0.25\n# acc.test.mean=0.953,acc.test.sd=0.0306 Accessing the tuning result The result object TuneResult allows you to access the best found settings $x and their\nestimated performance $y . res$x\n# $C\n# [1] 0.25\n# \n# $sigma\n# [1] 0.25\nres$y\n# acc.test.mean acc.test.sd \n# 0.9533333 0.0305505 Moreover, we can inspect all points evaluated during the search by accessing the $opt.path (see also the documentation of OptPath ). res$opt.path\n# Optimization path\n# Dimensions: x = 2/2, y = 2\n# Length: 25\n# Add x values transformed: FALSE\n# Error messages: TRUE. Errors: 0 / 25.\n# Exec times: TRUE. Range: 0.067 - 0.085. 0 NAs.\nopt.grid = as.data.frame(res$opt.path)\nhead(opt.grid)\n# C sigma acc.test.mean acc.test.sd dob eol error.message exec.time\n# 1 0.25 0.25 0.9533333 0.03055050 1 NA NA 0.071\n# 2 0.5 0.25 0.9466667 0.02309401 2 NA NA 0.069\n# 3 1 0.25 0.9533333 0.01154701 3 NA NA 0.067\n# 4 2 0.25 0.9533333 0.01154701 4 NA NA 0.067\n# 5 4 0.25 0.9533333 0.01154701 5 NA NA 0.068\n# 6 0.25 0.5 0.9333333 0.01154701 6 NA NA 0.069 A quick visualization of the performance values on the search grid can be accomplished as follows: library(ggplot2)\ng = ggplot(opt.grid, aes(x = C, y = sigma, fill = acc.test.mean, label = round(acc.test.sd, 3)))\ng + geom_tile() + geom_text(color = white ) The colors of the tiles display the achieved accuracy, the tile labels show the standard deviation. Using the optimal parameter values After tuning you can generate a Learner with optimal hyperparameter settings\nas follows: lrn = setHyperPars(makeLearner( classif.ksvm ), par.vals = res$x)\nlrn\n# Learner classif.ksvm from package kernlab\n# Type: classif\n# Name: Support Vector Machines; Short name: ksvm\n# Class: classif.ksvm\n# Properties: twoclass,multiclass,numerics,factors,prob,class.weights\n# Predict-Type: response\n# Hyperparameters: fit=FALSE,C=0.25,sigma=0.25 Then you can proceed as usual.\nHere we refit and predict the learner on the complete iris data\nset. m = train(lrn, iris.task)\npredict(m, task = iris.task)\n# Prediction: 150 observations\n# predict.type: response\n# threshold: \n# time: 0.01\n# id truth response\n# 1 1 setosa setosa\n# 2 2 setosa setosa\n# 3 3 setosa setosa\n# 4 4 setosa setosa\n# 5 5 setosa setosa\n# 6 6 setosa setosa Grid search without manual discretization We can also specify the true numeric parameter types of C and sigma when creating the\nparameter set and use the resolution option of makeTuneControlGrid to\nautomatically discretize them. Note how we also make use of the trafo option when creating the parameter set to easily\noptimize on a log-scale. 
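To make the log-scale idea concrete, here is a tiny standalone illustration (not taken from the original example): the optimizer works on the interval from -12 to 12, while the learning algorithm receives 2^x.

## The optimizer proposes values on the untransformed scale [-12, 12];
## right before training, the trafo maps them to the scale the learner sees.
trafo = function(x) 2^x
trafo(c(-12, 0, 12))
## approximately 0.000244, 1, 4096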
Trafos work like this: All optimizers basically see the parameters on their\noriginal scale (from -12 to 12) in this case and produce values on this scale during the search.\nRight before they are passed to the learning algorithm, the transformation function is applied. ps = makeParamSet(\n makeNumericParam( C , lower = -12, upper = 12, trafo = function(x) 2^x),\n makeNumericParam( sigma , lower = -12, upper = 12, trafo = function(x) 2^x)\n)\nctrl = makeTuneControlGrid(resolution = 3L)\nrdesc = makeResampleDesc( CV , iters = 2L)\nres = tuneParams( classif.ksvm , iris.task, rdesc, par.set = ps, control = ctrl)\n# [Tune] Started tuning learner classif.ksvm for parameter set:\n# Type len Def Constr Req Tunable Trafo\n# C numeric - - -12 to 12 - TRUE Y\n# sigma numeric - - -12 to 12 - TRUE Y\n# With control class: TuneControlGrid\n# Imputation value: 1\n# [Tune-x] 1: C=0.000244; sigma=0.000244\n# [Tune-y] 1: mmce.test.mean=0.527; time: 0.0 min; memory: 156Mb use, 479Mb max\n# [Tune-x] 2: C=1; sigma=0.000244\n# [Tune-y] 2: mmce.test.mean=0.527; time: 0.0 min; memory: 156Mb use, 479Mb max\n# [Tune-x] 3: C=4.1e+03; sigma=0.000244\n# [Tune-y] 3: mmce.test.mean=0.04; time: 0.0 min; memory: 156Mb use, 479Mb max\n# [Tune-x] 4: C=0.000244; sigma=1\n# [Tune-y] 4: mmce.test.mean=0.527; time: 0.0 min; memory: 156Mb use, 479Mb max\n# [Tune-x] 5: C=1; sigma=1\n# [Tune-y] 5: mmce.test.mean=0.04; time: 0.0 min; memory: 156Mb use, 479Mb max\n# [Tune-x] 6: C=4.1e+03; sigma=1\n# [Tune-y] 6: mmce.test.mean=0.0667; time: 0.0 min; memory: 156Mb use, 479Mb max\n# [Tune-x] 7: C=0.000244; sigma=4.1e+03\n# [Tune-y] 7: mmce.test.mean=0.567; time: 0.0 min; memory: 156Mb use, 479Mb max\n# [Tune-x] 8: C=1; sigma=4.1e+03\n# [Tune-y] 8: mmce.test.mean=0.687; time: 0.0 min; memory: 156Mb use, 479Mb max\n# [Tune-x] 9: C=4.1e+03; sigma=4.1e+03\n# [Tune-y] 9: mmce.test.mean=0.687; time: 0.0 min; memory: 156Mb use, 479Mb max\n# [Tune] Result: C=1; sigma=1 : mmce.test.mean=0.04\nres\n# Tune result:\n# Op. pars: C=1; sigma=1\n# mmce.test.mean=0.04 Note that res$opt.path contains the parameter values on the original scale . as.data.frame(res$opt.path)\n# C sigma mmce.test.mean dob eol error.message exec.time\n# 1 -12 -12 0.52666667 1 NA NA 0.048\n# 2 0 -12 0.52666667 2 NA NA 0.050\n# 3 12 -12 0.04000000 3 NA NA 0.046\n# 4 -12 0 0.52666667 4 NA NA 0.050\n# 5 0 0 0.04000000 5 NA NA 0.047\n# 6 12 0 0.06666667 6 NA NA 0.048\n# 7 -12 12 0.56666667 7 NA NA 0.049\n# 8 0 12 0.68666667 8 NA NA 0.049\n# 9 12 12 0.68666667 9 NA NA 0.049 In order to get the transformed parameter values instead, use function trafoOptPath . as.data.frame(trafoOptPath(res$opt.path))\n# C sigma mmce.test.mean dob eol\n# 1 2.441406e-04 2.441406e-04 0.52666667 1 NA\n# 2 1.000000e+00 2.441406e-04 0.52666667 2 NA\n# 3 4.096000e+03 2.441406e-04 0.04000000 3 NA\n# 4 2.441406e-04 1.000000e+00 0.52666667 4 NA\n# 5 1.000000e+00 1.000000e+00 0.04000000 5 NA\n# 6 4.096000e+03 1.000000e+00 0.06666667 6 NA\n# 7 2.441406e-04 4.096000e+03 0.56666667 7 NA\n# 8 1.000000e+00 4.096000e+03 0.68666667 8 NA\n# 9 4.096000e+03 4.096000e+03 0.68666667 9 NA", "title": "Basics" }, { "location": "/tune/index.html#iterated-f-racing-for-mixed-spaces-and-dependencies", - "text": "The package supports a larger number of tuning algorithms, which can all be looked up and\nselected via TuneControl . One of the cooler algorithms is iterated F-racing from the irace package. 
This not only works for arbitrary parameter types (numeric, integer,\ndiscrete, logical), but also for so-called dependent / hierarchical parameters: ps = makeParamSet(\n makeNumericParam( C , lower = -12, upper = 12, trafo = function(x) 2^x),\n makeDiscreteParam( kernel , values = c( vanilladot , polydot , rbfdot )),\n makeNumericParam( sigma , lower = -12, upper = 12, trafo = function(x) 2^x,\n requires = quote(kernel == rbfdot )),\n makeIntegerParam( degree , lower = 2L, upper = 5L,\n requires = quote(kernel == polydot ))\n)\nctrl = makeTuneControlIrace(maxExperiments = 200L)\nrdesc = makeResampleDesc( Holdout )\nres = tuneParams( classif.ksvm , iris.task, rdesc, par.set = ps, control = ctrl, show.info = FALSE)\nprint(head(as.data.frame(res$opt.path)))\n# C kernel sigma degree mmce.test.mean dob eol\n# 1 3.148837 polydot NA 5 0.08 1 NA\n# 2 3.266305 vanilladot NA NA 0.02 2 NA\n# 3 -3.808213 vanilladot NA NA 0.04 3 NA\n# 4 1.694097 rbfdot 6.580514 NA 0.48 4 NA\n# 5 11.995501 polydot NA 2 0.08 5 NA\n# 6 -5.731782 vanilladot NA NA 0.14 6 NA\n# error.message exec.time\n# 1 NA 0.033\n# 2 NA 0.034\n# 3 NA 0.031\n# 4 NA 0.035\n# 5 NA 0.060\n# 6 NA 0.031 See how we made the kernel parameters like sigma and degree dependent on the kernel \nselection parameters? This approach allows you to tune parameters of multiple kernels at once, \nefficiently concentrating on the ones which work best for your given data set.", + "text": "The package supports a larger number of tuning algorithms, which can all be looked up and\nselected via TuneControl . One of the cooler algorithms is iterated F-racing from the irace package. This not only works for arbitrary parameter types (numeric, integer,\ndiscrete, logical), but also for so-called dependent / hierarchical parameters: ps = makeParamSet(\n makeNumericParam( C , lower = -12, upper = 12, trafo = function(x) 2^x),\n makeDiscreteParam( kernel , values = c( vanilladot , polydot , rbfdot )),\n makeNumericParam( sigma , lower = -12, upper = 12, trafo = function(x) 2^x,\n requires = quote(kernel == rbfdot )),\n makeIntegerParam( degree , lower = 2L, upper = 5L,\n requires = quote(kernel == polydot ))\n)\nctrl = makeTuneControlIrace(maxExperiments = 200L)\nrdesc = makeResampleDesc( Holdout )\nres = tuneParams( classif.ksvm , iris.task, rdesc, par.set = ps, control = ctrl, show.info = FALSE)\nprint(head(as.data.frame(res$opt.path)))\n# C kernel sigma degree mmce.test.mean dob eol\n# 1 3.148837 polydot NA 5 0.08 1 NA\n# 2 3.266305 vanilladot NA NA 0.02 2 NA\n# 3 -3.808213 vanilladot NA NA 0.04 3 NA\n# 4 1.694097 rbfdot 6.580514 NA 0.48 4 NA\n# 5 11.995501 polydot NA 2 0.08 5 NA\n# 6 -5.731782 vanilladot NA NA 0.14 6 NA\n# error.message exec.time\n# 1 NA 0.031\n# 2 NA 0.027\n# 3 NA 0.029\n# 4 NA 0.030\n# 5 NA 0.028\n# 6 NA 0.031 See how we made the kernel parameters like sigma and degree dependent on the kernel \nselection parameters? This approach allows you to tune parameters of multiple kernels at once, \nefficiently concentrating on the ones which work best for your given data set.", "title": "Iterated F-Racing for mixed spaces and dependencies" }, { "location": "/tune/index.html#tuning-across-whole-model-spaces-with-modelmultiplexer", - "text": "We can now take the following example even one step further. If we use the ModelMultiplexer we can tune over different model classes at once,\njust as we did with the SVM kernels above. 
base.learners = list(\n makeLearner( classif.ksvm ),\n makeLearner( classif.randomForest )\n)\nlrn = makeModelMultiplexer(base.learners) Function makeModelMultiplexerParamSet offers a simple way to contruct parameter set for tuning:\nThe parameter names are prefixed automatically and the requires element is set, too,\nto make all paramaters subordinate to selected.learner . ps = makeModelMultiplexerParamSet(lrn,\n makeNumericParam( sigma , lower = -12, upper = 12, trafo = function(x) 2^x),\n makeIntegerParam( ntree , lower = 1L, upper = 500L)\n)\nprint(ps)\n# Type len Def\n# selected.learner discrete - -\n# classif.ksvm.sigma numeric - -\n# classif.randomForest.ntree integer - -\n# Constr Req Tunable\n# selected.learner classif.ksvm,classif.randomForest - TRUE\n# classif.ksvm.sigma -12 to 12 Y TRUE\n# classif.randomForest.ntree 1 to 500 Y TRUE\n# Trafo\n# selected.learner -\n# classif.ksvm.sigma Y\n# classif.randomForest.ntree -\nrdesc = makeResampleDesc( CV , iters = 2L)\nctrl = makeTuneControlIrace(maxExperiments = 200L)\nres = tuneParams(lrn, iris.task, rdesc, par.set = ps, control = ctrl, show.info = FALSE)\nprint(head(as.data.frame(res$opt.path)))\n# selected.learner classif.ksvm.sigma classif.randomForest.ntree\n# 1 classif.ksvm -3.673815 NA\n# 2 classif.ksvm 6.361006 NA\n# 3 classif.randomForest NA 487\n# 4 classif.ksvm 3.165340 NA\n# 5 classif.randomForest NA 125\n# 6 classif.randomForest NA 383\n# mmce.test.mean dob eol error.message exec.time\n# 1 0.04666667 1 NA NA 0.056\n# 2 0.75333333 2 NA NA 0.055\n# 3 0.03333333 3 NA NA 0.094\n# 4 0.24000000 4 NA NA 0.060\n# 5 0.04000000 5 NA NA 0.056\n# 6 0.04000000 6 NA NA 0.079", + "text": "We can now take the following example even one step further. If we use the ModelMultiplexer we can tune over different model classes at once,\njust as we did with the SVM kernels above. base.learners = list(\n makeLearner( classif.ksvm ),\n makeLearner( classif.randomForest )\n)\nlrn = makeModelMultiplexer(base.learners) Function makeModelMultiplexerParamSet offers a simple way to contruct parameter set for tuning:\nThe parameter names are prefixed automatically and the requires element is set, too,\nto make all paramaters subordinate to selected.learner . 
ps = makeModelMultiplexerParamSet(lrn,\n makeNumericParam( sigma , lower = -12, upper = 12, trafo = function(x) 2^x),\n makeIntegerParam( ntree , lower = 1L, upper = 500L)\n)\nprint(ps)\n# Type len Def\n# selected.learner discrete - -\n# classif.ksvm.sigma numeric - -\n# classif.randomForest.ntree integer - -\n# Constr Req Tunable\n# selected.learner classif.ksvm,classif.randomForest - TRUE\n# classif.ksvm.sigma -12 to 12 Y TRUE\n# classif.randomForest.ntree 1 to 500 Y TRUE\n# Trafo\n# selected.learner -\n# classif.ksvm.sigma Y\n# classif.randomForest.ntree -\nrdesc = makeResampleDesc( CV , iters = 2L)\nctrl = makeTuneControlIrace(maxExperiments = 200L)\nres = tuneParams(lrn, iris.task, rdesc, par.set = ps, control = ctrl, show.info = FALSE)\nprint(head(as.data.frame(res$opt.path)))\n# selected.learner classif.ksvm.sigma classif.randomForest.ntree\n# 1 classif.ksvm -3.673815 NA\n# 2 classif.ksvm 6.361006 NA\n# 3 classif.randomForest NA 487\n# 4 classif.ksvm 3.165340 NA\n# 5 classif.randomForest NA 125\n# 6 classif.randomForest NA 383\n# mmce.test.mean dob eol error.message exec.time\n# 1 0.04666667 1 NA NA 0.051\n# 2 0.75333333 2 NA NA 0.054\n# 3 0.03333333 3 NA NA 0.079\n# 4 0.24000000 4 NA NA 0.053\n# 5 0.04000000 5 NA NA 0.049\n# 6 0.04000000 6 NA NA 0.071", "title": "Tuning across whole model spaces with ModelMultiplexer" }, { @@ -402,7 +402,7 @@ }, { "location": "/feature_selection/index.html", - "text": "Feature Selection\n\n\nOften, data sets include a large number of features.\nThe technique of extracting a subset of relevant features is called feature selection.\nFeature selection can enhance the interpretability of the model, speed up the learning\nprocess and improve the learner performance.\nThere exist different approaches to identify the relevant features.\n\nmlr\n supports \nfilter\n and \nwrapper methods\n.\n\n\nFilter methods\n\n\nFilter methods assign an importance value to each feature.\nBased on these values the features can be ranked and a feature subset can be selected.\n\n\nCalculating the feature importance\n\n\nDifferent methods for calculating the feature importance are built into \nmlr\n's function\n\ngenerateFilterValuesData\n (\ngetFilterValues\n has been deprecated in favor of \ngenerateFilterValuesData\n.). Currently, classification, regression and survival analysis tasks\nare supported. A table showing all available methods can be found \nhere\n.\n\n\nFunction \ngenerateFilterValuesData\n requires the \nTask\n and a character string specifying the filter\nmethod.\n\n\nfv = generateFilterValuesData(iris.task, method = \ninformation.gain\n)\nfv\n#\n FilterValues:\n#\n Task: iris-example\n#\n name type information.gain\n#\n 1 Sepal.Length numeric 0.4521286\n#\n 2 Sepal.Width numeric 0.2672750\n#\n 3 Petal.Length numeric 0.9402853\n#\n 4 Petal.Width numeric 0.9554360\n\n\n\n\nfv\n is a \nFilterValues\n object and \nfv$data\n contains a \ndata.frame\n\nthat gives the importance values for all features. 
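As a side note not contained in the original text: besides the linked table, the available filter methods can also be listed from within R.

## List the filter methods that are registered in mlr.
listFilterMethods()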
Optionally, a vector of filter methods can be\npassed.\n\n\nfv2 = generateFilterValuesData(iris.task, method = c(\ninformation.gain\n, \nchi.squared\n))\nfv2$data\n#\n name type information.gain chi.squared\n#\n 1 Sepal.Length numeric 0.4521286 0.6288067\n#\n 2 Sepal.Width numeric 0.2672750 0.4922162\n#\n 3 Petal.Length numeric 0.9402853 0.9346311\n#\n 4 Petal.Width numeric 0.9554360 0.9432359\n\n\n\n\nA bar plot of importance values for the individual features can be obtained using\nfunction \nplotFilterValues\n.\n\n\nplotFilterValues(fv2)\n\n\n\n\n \n\n\nBy default \nplotFilterValues\n will create facetted subplots if multiple filter methods are passed as input to\n\ngenerateFilterValuesData\n.\n\n\nThere is also an experimental \nggvis\n plotting function, \nplotFilterValuesGGVIS\n. This takes the same\narguments as \nplotFilterValues\n and produces a \nshiny\n application\nthat allows the interactive selection of the displayed filter method, the number of features selected, and the sorting method (e.g., ascending or descending).\n\n\nplotFilterValuesGGVIS(fv2)\n\n\n\n\nAccording to the \n\"information.gain\"\n measure, \nPetal.Width\n and \nPetal.Length\n\ncontain the most information about the target variable \nSpecies\n.\n\n\nSelecting a feature subset\n\n\nWith \nmlr\n's function \nfilterFeatures\n you can create a new \nTask\n by leaving out\nfeatures of lower importance.\n\n\nThere are several ways to select a feature subset based on feature importance values:\n\n\n\n\nKeep a certain \nabsolute number\n (\nabs\n) of features with highest importance.\n\n\nKeep a certain \npercentage\n (\nperc\n) of features with highest importance.\n\n\nKeep all features whose importance exceeds a certain \nthreshold value\n (\nthreshold\n).\n\n\n\n\nFunction \nfilterFeatures\n supports these three methods as shown in the following example.\nMoreover, you can either specify the \nmethod\n for calculating the feature importance or you can\nuse previously computed importance values via argument \nfval\n.\n\n\n## Keep the 2 most important features\nfiltered.task = filterFeatures(iris.task, method = \ninformation.gain\n, abs = 2)\n\n## Keep the 25% most important features\nfiltered.task = filterFeatures(iris.task, fval = fv, perc = 0.25)\n\n## Keep all features with importance greater than 0.5\nfiltered.task = filterFeatures(iris.task, fval = fv, threshold = 0.5)\nfiltered.task\n#\n Supervised task: iris-example\n#\n Type: classif\n#\n Target: Species\n#\n Observations: 150\n#\n Features:\n#\n numerics factors ordered \n#\n 2 0 0 \n#\n Missings: FALSE\n#\n Has weights: FALSE\n#\n Has blocking: FALSE\n#\n Classes: 3\n#\n setosa versicolor virginica \n#\n 50 50 50 \n#\n Positive class: NA\n\n\n\n\nFuse a learner with a filter method\n\n\nOften feature selection based on a filter method is part of the data preprocessing and in\na subsequent step a learning method is applied to the filtered data.\nIn a proper experimental setup you might want to automate the selection of the\nfeatures so that it can be part of the validation method of your choice.\nA \nLearner\n can be fused with a filter method by function \nmakeFilterWrapper\n.\nThe resulting \nLearner\n has the additional class attribute \nFilterWrapper\n.\n\n\nIn the following example we calculate the 10-fold cross-validated error rate (\nmmce\n)\nof the \nk nearest neighbor classifier\n with preceding feature selection on the\n\niris\n data set.\nWe use \n\"information.gain\"\n as importance measure and select the 2 features with\nhighest 
importance.\nIn each resampling iteration feature selection is carried out on the corresponding training\ndata set before fitting the learner.\n\n\nlrn = makeFilterWrapper(learner = \nclassif.fnn\n, fw.method = \ninformation.gain\n, fw.abs = 2)\nrdesc = makeResampleDesc(\nCV\n, iters = 10)\nr = resample(learner = lrn, task = iris.task, resampling = rdesc, show.info = FALSE, models = TRUE)\nr$aggr\n#\n mmce.test.mean \n#\n 0.04\n\n\n\n\nYou may want to know which features have been used. Luckily, we have called\n\nresample\n with the argument \nmodels = TRUE\n, which means that \nr$models\n\ncontains a \nlist\n of \nmodels\n fitted in the individual resampling iterations.\nIn order to access the selected feature subsets we can call \ngetFilteredFeatures\n on each model.\n\n\nsfeats = sapply(r$models, getFilteredFeatures)\ntable(sfeats)\n#\n sfeats\n#\n Petal.Length Petal.Width \n#\n 10 10\n\n\n\n\nThe selection of features seems to be very stable.\nThe features \nSepal.Length\n and \nSepal.Width\n did not make it into a single fold.\n\n\nTuning the size of the feature subset\n\n\nIn the above examples the number/percentage of features to select or the threshold value\nhave been arbitrarily chosen.\nIf filtering is a preprocessing step before applying a learning method optimal values\nwith regard to the learner performance can be found by \ntuning\n.\n\n\nIn the following regression example we consider the \nBostonHousing\n data set.\nWe use a \nlinear regression model\n and determine the optimal percentage value for feature selection\nsuch that the 3-fold cross-validated \nmean squared error\n of the learner is minimal.\nAs search strategy for tuning a grid search is used.\n\n\nlrn = makeFilterWrapper(learner = \nregr.lm\n, fw.method = \nchi.squared\n)\nps = makeParamSet(makeDiscreteParam(\nfw.perc\n, values = seq(0.2, 0.5, 0.05)))\nrdesc = makeResampleDesc(\nCV\n, iters = 3)\nres = tuneParams(lrn, task = bh.task, resampling = rdesc, par.set = ps,\n control = makeTuneControlGrid())\n#\n [Tune] Started tuning learner regr.lm.filtered for parameter set:\n#\n Type len Def Constr Req Tunable Trafo\n#\n fw.perc discrete - - 0.2,0.25,0.3,0.35,0.4,0.45,0.5 - TRUE -\n#\n With control class: TuneControlGrid\n#\n Imputation value: Inf\n#\n [Tune-x] 1: fw.perc=0.2\n#\n [Tune-y] 1: mse.test.mean=40.6; time: 0.0 min; memory: 139Mb use, 585Mb max\n#\n [Tune-x] 2: fw.perc=0.25\n#\n [Tune-y] 2: mse.test.mean=40.6; time: 0.0 min; memory: 139Mb use, 585Mb max\n#\n [Tune-x] 3: fw.perc=0.3\n#\n [Tune-y] 3: mse.test.mean=37.1; time: 0.0 min; memory: 139Mb use, 585Mb max\n#\n [Tune-x] 4: fw.perc=0.35\n#\n [Tune-y] 4: mse.test.mean=35.8; time: 0.0 min; memory: 139Mb use, 585Mb max\n#\n [Tune-x] 5: fw.perc=0.4\n#\n [Tune-y] 5: mse.test.mean=35.8; time: 0.0 min; memory: 139Mb use, 585Mb max\n#\n [Tune-x] 6: fw.perc=0.45\n#\n [Tune-y] 6: mse.test.mean=27.4; time: 0.0 min; memory: 139Mb use, 585Mb max\n#\n [Tune-x] 7: fw.perc=0.5\n#\n [Tune-y] 7: mse.test.mean=27.4; time: 0.0 min; memory: 139Mb use, 585Mb max\n#\n [Tune] Result: fw.perc=0.5 : mse.test.mean=27.4\nres\n#\n Tune result:\n#\n Op. 
pars: fw.perc=0.5\n#\n mse.test.mean=27.4\n\n\n\n\nThe performance of all percentage values visited during tuning is:\n\n\nas.data.frame(res$opt.path)\n#\n fw.perc mse.test.mean dob eol error.message exec.time\n#\n 1 0.2 40.59578 1 NA \nNA\n 0.270\n#\n 2 0.25 40.59578 2 NA \nNA\n 0.235\n#\n 3 0.3 37.05592 3 NA \nNA\n 0.238\n#\n 4 0.35 35.83712 4 NA \nNA\n 0.240\n#\n 5 0.4 35.83712 5 NA \nNA\n 0.237\n#\n 6 0.45 27.39955 6 NA \nNA\n 0.238\n#\n 7 0.5 27.39955 7 NA \nNA\n 0.244\n\n\n\n\nThe optimal percentage and the corresponding performance can be accessed as follows:\n\n\nres$x\n#\n $fw.perc\n#\n [1] 0.5\nres$y\n#\n mse.test.mean \n#\n 27.39955\n\n\n\n\nAfter tuning we can generate a new wrapped learner with the optimal percentage value for\nfurther use.\n\n\nlrn = makeFilterWrapper(learner = \nregr.lm\n, fw.method = \nchi.squared\n, fw.perc = res$x$fw.perc)\nmod = train(lrn, bh.task)\nmod\n#\n Model for learner.id=regr.lm.filtered; learner.class=FilterWrapper\n#\n Trained on: task.id = BostonHousing-example; obs = 506; features = 13\n#\n Hyperparameters: fw.method=chi.squared,fw.perc=0.5\n\ngetFilteredFeatures(mod)\n#\n [1] \ncrim\n \nzn\n \nrm\n \ndis\n \nrad\n \nlstat\n\n\n\n\n\nHere is another example using \nmulti-criteria tuning\n.\nWe consider \nlinear discriminant analysis\n with precedent feature selection based on\nthe Chi-squared statistic of independence (\n\"chi.squared\"\n) on the \nSonar\n\ndata set and tune the threshold value.\nDuring tuning both, the false positive and the false negative rate (\nfpr\n and\n\nfnr\n), are minimized. As search strategy we choose a random search\n(see \nmakeTuneMultiCritControlRandom\n).\n\n\nlrn = makeFilterWrapper(learner = \nclassif.lda\n, fw.method = \nchi.squared\n)\nps = makeParamSet(makeNumericParam(\nfw.threshold\n, lower = 0.1, upper = 0.9))\nrdesc = makeResampleDesc(\nCV\n, iters = 10)\nres = tuneParamsMultiCrit(lrn, task = sonar.task, resampling = rdesc, par.set = ps,\n measures = list(fpr, fnr), control = makeTuneMultiCritControlRandom(maxit = 50L),\n show.info = FALSE)\nres\n#\n Tune multicrit result:\n#\n Points on front: 13\nhead(as.data.frame(res$opt.path))\n#\n fw.threshold fpr.test.mean fnr.test.mean dob eol error.message exec.time\n#\n 1 0.4892321 0.3092818 0.2639033 1 NA \nNA\n 2.293\n#\n 2 0.2481696 0.2045499 0.2319697 2 NA \nNA\n 2.296\n#\n 3 0.7691875 0.5128000 0.3459740 3 NA \nNA\n 2.279\n#\n 4 0.1470133 0.2045499 0.2319697 4 NA \nNA\n 2.348\n#\n 5 0.5958241 0.5028216 0.5239538 5 NA \nNA\n 2.169\n#\n 6 0.6892421 0.6323959 0.4480808 6 NA \nNA\n 2.091\n\n\n\n\nThe results can be visualized with function \nplotTuneMultiCritResult\n.\nThe plot shows the false positive and false negative rates for all parameter values visited\nduring tuning. 
The size of the points on the Pareto front is slightly increased.\n\n\nplotTuneMultiCritResult(res)\n\n\n\n\n \n\n\nWrapper methods\n\n\nWrapper methods use the performance of a learning algorithm to assess the usefulness of\na feature set.\nIn order to select a feature subset a learner is trained repeatedly on different feature subsets\nand the subset which leads to the best learner performance is chosen.\n\n\nIn order to use the wrapper approach we have to decide:\n\n\n\n\nHow to assess the performance: This involves choosing a performance measure that serves\n as feature selection criterion and a resampling strategy.\n\n\nWhich learning method to use.\n\n\nHow to search the space of possible feature subsets.\n\n\n\n\nThe search strategy is defined by functions following the naming convention\n\nmakeFeatSelControl\nsearch_strategy\n.\nThe following search strategies are available:\n\n\n\n\nExhaustive search (\nmakeFeatSelControlExhaustive\n),\n\n\nGenetic algorithm (\nmakeFeatSelControlGA\n),\n\n\nRandom search (\nmakeFeatSelControlRandom\n),\n\n\nDeterministic forward or backward search (\nmakeFeatSelControlSequential\n).\n\n\n\n\nSelect a feature subset\n\n\nFeature selection can be conducted with function \nselectFeatures\n.\n\n\nIn the following example we perform an exhaustive search on the\n\nWisconsin Prognostic Breast Cancer\n data set.\nAs learning method we use the \nCox proportional hazards model\n.\nThe performance is assessed by the holdout estimate of the concordance index\n(\ncindex\n).\n\n\n## Specify the search strategy\nctrl = makeFeatSelControlRandom(maxit = 20L)\nctrl\n#\n FeatSel control: FeatSelControlRandom\n#\n Same resampling instance: TRUE\n#\n Imputation value: \nworst\n\n#\n Max. features: \nnot used\n\n#\n Max. iterations: 20\n#\n Tune threshold: FALSE\n#\n Further arguments: prob=0.5\n\n\n\n\nctrl\n is a \nFeatSelControl\n object that contains information about the search strategy\nand potential parameter values.\n\n\n## Resample description\nrdesc = makeResampleDesc(\nHoldout\n)\n\n## Select features\nsfeats = selectFeatures(learner = \nsurv.coxph\n, task = wpbc.task, resampling = rdesc,\n control = ctrl, show.info = FALSE)\nsfeats\n#\n FeatSel result:\n#\n Features (17): mean_radius, mean_area, mean_smoothness, mean_concavepoints, mean_symmetry, mean_fractaldim, SE_texture, SE_perimeter, SE_smoothness, SE_compactness, SE_concavity, SE_concavepoints, worst_area, worst_compactness, worst_concavepoints, tsize, pnodes\n#\n cindex.test.mean=0.714\n\n\n\n\nsfeats\nis a \nFeatSelResult\n object.\nThe selected features and the corresponding performance can be accessed as follows:\n\n\nsfeats$x\n#\n [1] \nmean_radius\n \nmean_area\n \nmean_smoothness\n \n#\n [4] \nmean_concavepoints\n \nmean_symmetry\n \nmean_fractaldim\n \n#\n [7] \nSE_texture\n \nSE_perimeter\n \nSE_smoothness\n \n#\n [10] \nSE_compactness\n \nSE_concavity\n \nSE_concavepoints\n \n#\n [13] \nworst_area\n \nworst_compactness\n \nworst_concavepoints\n\n#\n [16] \ntsize\n \npnodes\n\nsfeats$y\n#\n cindex.test.mean \n#\n 0.713799\n\n\n\n\nIn a second example we fit a simple linear regression model to the \nBostonHousing\n\ndata set and use a sequential search to find a feature set that minimizes the mean squared\nerror (\nmse\n).\n\nmethod = \"sfs\"\n indicates that we want to conduct a sequential forward search where features\nare added to the model until the performance cannot be improved anymore.\nSee the documentation page \nmakeFeatSelControlSequential\n for other available\nsequential search 
methods.\nThe search is stopped if the improvement is smaller than \nalpha = 0.02\n.\n\n\n## Specify the search strategy\nctrl = makeFeatSelControlSequential(method = \nsfs\n, alpha = 0.02)\n\n## Select features\nrdesc = makeResampleDesc(\nCV\n, iters = 10)\nsfeats = selectFeatures(learner = \nregr.lm\n, task = bh.task, resampling = rdesc, control = ctrl,\n show.info = FALSE)\nsfeats\n#\n FeatSel result:\n#\n Features (11): crim, zn, chas, nox, rm, dis, rad, tax, ptratio, b, lstat\n#\n mse.test.mean=23.7\n\n\n\n\nFurther information about the sequential feature selection process can be obtained by\nfunction \nanalyzeFeatSelResult\n.\n\n\nanalyzeFeatSelResult(sfeats)\n#\n Features : 11\n#\n Performance : mse.test.mean=23.7\n#\n crim, zn, chas, nox, rm, dis, rad, tax, ptratio, b, lstat\n#\n \n#\n Path to optimum:\n#\n - Features: 0 Init : Perf = 84.831 Diff: NA *\n#\n - Features: 1 Add : lstat Perf = 38.894 Diff: 45.936 *\n#\n - Features: 2 Add : rm Perf = 31.279 Diff: 7.6156 *\n#\n - Features: 3 Add : ptratio Perf = 28.108 Diff: 3.1703 *\n#\n - Features: 4 Add : dis Perf = 27.48 Diff: 0.62813 *\n#\n - Features: 5 Add : nox Perf = 26.079 Diff: 1.4008 *\n#\n - Features: 6 Add : b Perf = 25.563 Diff: 0.51594 *\n#\n - Features: 7 Add : chas Perf = 25.132 Diff: 0.43097 *\n#\n - Features: 8 Add : zn Perf = 24.792 Diff: 0.34018 *\n#\n - Features: 9 Add : rad Perf = 24.599 Diff: 0.19327 *\n#\n - Features: 10 Add : tax Perf = 24.082 Diff: 0.51706 *\n#\n - Features: 11 Add : crim Perf = 23.732 Diff: 0.35 *\n#\n \n#\n Stopped, because no improving feature was found.\n\n\n\n\nFuse a learner with feature selection\n\n\nA \nLearner\n can be fused with a feature selection strategy (i.e., a search\nstrategy, a performance measure and a resampling strategy) by function \nmakeFeatSelWrapper\n.\nDuring training features are selected according to the specified selection scheme. 
Then, the\nlearner is trained on the selected feature subset.\n\n\nrdesc = makeResampleDesc(\nCV\n, iters = 3)\nlrn = makeFeatSelWrapper(\nsurv.coxph\n, resampling = rdesc,\n control = makeFeatSelControlRandom(maxit = 10), show.info = FALSE)\nmod = train(lrn, task = wpbc.task)\nmod\n#\n Model for learner.id=surv.coxph.featsel; learner.class=FeatSelWrapper\n#\n Trained on: task.id = wpbc-example; obs = 194; features = 32\n#\n Hyperparameters:\n\n\n\n\nThe result of the feature selection can be extracted by function \ngetFeatSelResult\n.\n\n\nsfeats = getFeatSelResult(mod)\nsfeats\n#\n FeatSel result:\n#\n Features (19): mean_radius, mean_texture, mean_perimeter, mean_area, mean_smoothness, mean_compactness, mean_concavepoints, mean_fractaldim, SE_compactness, SE_concavity, SE_concavepoints, SE_symmetry, worst_texture, worst_perimeter, worst_area, worst_concavepoints, worst_symmetry, tsize, pnodes\n#\n cindex.test.mean=0.631\n\n\n\n\nThe selected features are:\n\n\nsfeats$x\n#\n [1] \nmean_radius\n \nmean_texture\n \nmean_perimeter\n \n#\n [4] \nmean_area\n \nmean_smoothness\n \nmean_compactness\n \n#\n [7] \nmean_concavepoints\n \nmean_fractaldim\n \nSE_compactness\n \n#\n [10] \nSE_concavity\n \nSE_concavepoints\n \nSE_symmetry\n \n#\n [13] \nworst_texture\n \nworst_perimeter\n \nworst_area\n \n#\n [16] \nworst_concavepoints\n \nworst_symmetry\n \ntsize\n \n#\n [19] \npnodes\n\n\n\n\n\nThe 5-fold cross-validated performance of the learner specified above can be computed as\nfollows:\n\n\nout.rdesc = makeResampleDesc(\nCV\n, iters = 5)\n\nr = resample(learner = lrn, task = wpbc.task, resampling = out.rdesc, models = TRUE,\n show.info = FALSE)\nr$aggr\n#\n cindex.test.mean \n#\n 0.632357\n\n\n\n\nThe selected feature sets in the individual resampling iterations can be extracted as follows:\n\n\nlapply(r$models, getFeatSelResult)\n#\n [[1]]\n#\n FeatSel result:\n#\n Features (18): mean_texture, mean_area, mean_smoothness, mean_compactness, mean_concavity, mean_symmetry, SE_radius, SE_compactness, SE_concavity, SE_concavepoints, SE_fractaldim, worst_radius, worst_smoothness, worst_compactness, worst_concavity, worst_symmetry, tsize, pnodes\n#\n cindex.test.mean=0.66\n#\n \n#\n [[2]]\n#\n FeatSel result:\n#\n Features (12): mean_area, mean_compactness, mean_symmetry, mean_fractaldim, SE_perimeter, SE_area, SE_concavity, SE_symmetry, worst_texture, worst_smoothness, worst_fractaldim, tsize\n#\n cindex.test.mean=0.652\n#\n \n#\n [[3]]\n#\n FeatSel result:\n#\n Features (14): mean_compactness, mean_symmetry, mean_fractaldim, SE_radius, SE_perimeter, SE_smoothness, SE_concavity, SE_concavepoints, SE_fractaldim, worst_concavity, worst_concavepoints, worst_symmetry, worst_fractaldim, pnodes\n#\n cindex.test.mean=0.607\n#\n \n#\n [[4]]\n#\n FeatSel result:\n#\n Features (18): mean_radius, mean_texture, mean_perimeter, mean_compactness, mean_concavity, SE_texture, SE_area, SE_smoothness, SE_concavity, SE_symmetry, SE_fractaldim, worst_radius, worst_compactness, worst_concavepoints, worst_symmetry, worst_fractaldim, tsize, pnodes\n#\n cindex.test.mean=0.653\n#\n \n#\n [[5]]\n#\n FeatSel result:\n#\n Features (14): mean_radius, mean_texture, mean_compactness, mean_concavepoints, mean_symmetry, SE_texture, SE_compactness, SE_symmetry, SE_fractaldim, worst_radius, worst_smoothness, worst_compactness, worst_concavity, pnodes\n#\n cindex.test.mean=0.626", + "text": "Feature Selection\n\n\nOften, data sets include a large number of features.\nThe technique of extracting a subset of relevant features is called 
feature selection.\nFeature selection can enhance the interpretability of the model, speed up the learning\nprocess and improve the learner performance.\nThere exist different approaches to identify the relevant features.\n\nmlr\n supports \nfilter\n and \nwrapper methods\n.\n\n\nFilter methods\n\n\nFilter methods assign an importance value to each feature.\nBased on these values the features can be ranked and a feature subset can be selected.\n\n\nCalculating the feature importance\n\n\nDifferent methods for calculating the feature importance are built into \nmlr\n's function\n\ngenerateFilterValuesData\n (\ngetFilterValues\n has been deprecated in favor of \ngenerateFilterValuesData\n.). Currently, classification, regression and survival analysis tasks\nare supported. A table showing all available methods can be found \nhere\n.\n\n\nFunction \ngenerateFilterValuesData\n requires the \nTask\n and a character string specifying the filter\nmethod.\n\n\nfv = generateFilterValuesData(iris.task, method = \ninformation.gain\n)\nfv\n#\n FilterValues:\n#\n Task: iris-example\n#\n name type information.gain\n#\n 1 Sepal.Length numeric 0.4521286\n#\n 2 Sepal.Width numeric 0.2672750\n#\n 3 Petal.Length numeric 0.9402853\n#\n 4 Petal.Width numeric 0.9554360\n\n\n\n\nfv\n is a \nFilterValues\n object and \nfv$data\n contains a \ndata.frame\n\nthat gives the importance values for all features. Optionally, a vector of filter methods can be\npassed.\n\n\nfv2 = generateFilterValuesData(iris.task, method = c(\ninformation.gain\n, \nchi.squared\n))\nfv2$data\n#\n name type information.gain chi.squared\n#\n 1 Sepal.Length numeric 0.4521286 0.6288067\n#\n 2 Sepal.Width numeric 0.2672750 0.4922162\n#\n 3 Petal.Length numeric 0.9402853 0.9346311\n#\n 4 Petal.Width numeric 0.9554360 0.9432359\n\n\n\n\nA bar plot of importance values for the individual features can be obtained using\nfunction \nplotFilterValues\n.\n\n\nplotFilterValues(fv2)\n\n\n\n\n \n\n\nBy default \nplotFilterValues\n will create facetted subplots if multiple filter methods are passed as input to\n\ngenerateFilterValuesData\n.\n\n\nThere is also an experimental \nggvis\n plotting function, \nplotFilterValuesGGVIS\n. 
This takes the same\narguments as \nplotFilterValues\n and produces a \nshiny\n application\nthat allows the interactive selection of the displayed filter method, the number of features selected, and the sorting method (e.g., ascending or descending).\n\n\nplotFilterValuesGGVIS(fv2)\n\n\n\n\nAccording to the \n\"information.gain\"\n measure, \nPetal.Width\n and \nPetal.Length\n\ncontain the most information about the target variable \nSpecies\n.\n\n\nSelecting a feature subset\n\n\nWith \nmlr\n's function \nfilterFeatures\n you can create a new \nTask\n by leaving out\nfeatures of lower importance.\n\n\nThere are several ways to select a feature subset based on feature importance values:\n\n\n\n\nKeep a certain \nabsolute number\n (\nabs\n) of features with highest importance.\n\n\nKeep a certain \npercentage\n (\nperc\n) of features with highest importance.\n\n\nKeep all features whose importance exceeds a certain \nthreshold value\n (\nthreshold\n).\n\n\n\n\nFunction \nfilterFeatures\n supports these three methods as shown in the following example.\nMoreover, you can either specify the \nmethod\n for calculating the feature importance or you can\nuse previously computed importance values via argument \nfval\n.\n\n\n## Keep the 2 most important features\nfiltered.task = filterFeatures(iris.task, method = \ninformation.gain\n, abs = 2)\n\n## Keep the 25% most important features\nfiltered.task = filterFeatures(iris.task, fval = fv, perc = 0.25)\n\n## Keep all features with importance greater than 0.5\nfiltered.task = filterFeatures(iris.task, fval = fv, threshold = 0.5)\nfiltered.task\n#\n Supervised task: iris-example\n#\n Type: classif\n#\n Target: Species\n#\n Observations: 150\n#\n Features:\n#\n numerics factors ordered \n#\n 2 0 0 \n#\n Missings: FALSE\n#\n Has weights: FALSE\n#\n Has blocking: FALSE\n#\n Classes: 3\n#\n setosa versicolor virginica \n#\n 50 50 50 \n#\n Positive class: NA\n\n\n\n\nFuse a learner with a filter method\n\n\nOften feature selection based on a filter method is part of the data preprocessing and in\na subsequent step a learning method is applied to the filtered data.\nIn a proper experimental setup you might want to automate the selection of the\nfeatures so that it can be part of the validation method of your choice.\nA \nLearner\n can be fused with a filter method by function \nmakeFilterWrapper\n.\nThe resulting \nLearner\n has the additional class attribute \nFilterWrapper\n.\n\n\nIn the following example we calculate the 10-fold cross-validated error rate (\nmmce\n)\nof the \nk nearest neighbor classifier\n with preceding feature selection on the\n\niris\n data set.\nWe use \n\"information.gain\"\n as importance measure and select the 2 features with\nhighest importance.\nIn each resampling iteration feature selection is carried out on the corresponding training\ndata set before fitting the learner.\n\n\nlrn = makeFilterWrapper(learner = \nclassif.fnn\n, fw.method = \ninformation.gain\n, fw.abs = 2)\nrdesc = makeResampleDesc(\nCV\n, iters = 10)\nr = resample(learner = lrn, task = iris.task, resampling = rdesc, show.info = FALSE, models = TRUE)\nr$aggr\n#\n mmce.test.mean \n#\n 0.04\n\n\n\n\nYou may want to know which features have been used. 
Luckily, we have called\n\nresample\n with the argument \nmodels = TRUE\n, which means that \nr$models\n\ncontains a \nlist\n of \nmodels\n fitted in the individual resampling iterations.\nIn order to access the selected feature subsets we can call \ngetFilteredFeatures\n on each model.\n\n\nsfeats = sapply(r$models, getFilteredFeatures)\ntable(sfeats)\n#\n sfeats\n#\n Petal.Length Petal.Width \n#\n 10 10\n\n\n\n\nThe selection of features seems to be very stable.\nThe features \nSepal.Length\n and \nSepal.Width\n did not make it into a single fold.\n\n\nTuning the size of the feature subset\n\n\nIn the above examples the number/percentage of features to select or the threshold value\nhave been arbitrarily chosen.\nIf filtering is a preprocessing step before applying a learning method optimal values\nwith regard to the learner performance can be found by \ntuning\n.\n\n\nIn the following regression example we consider the \nBostonHousing\n data set.\nWe use a \nlinear regression model\n and determine the optimal percentage value for feature selection\nsuch that the 3-fold cross-validated \nmean squared error\n of the learner is minimal.\nAs search strategy for tuning a grid search is used.\n\n\nlrn = makeFilterWrapper(learner = \nregr.lm\n, fw.method = \nchi.squared\n)\nps = makeParamSet(makeDiscreteParam(\nfw.perc\n, values = seq(0.2, 0.5, 0.05)))\nrdesc = makeResampleDesc(\nCV\n, iters = 3)\nres = tuneParams(lrn, task = bh.task, resampling = rdesc, par.set = ps,\n control = makeTuneControlGrid())\n#\n [Tune] Started tuning learner regr.lm.filtered for parameter set:\n#\n Type len Def Constr Req Tunable Trafo\n#\n fw.perc discrete - - 0.2,0.25,0.3,0.35,0.4,0.45,0.5 - TRUE -\n#\n With control class: TuneControlGrid\n#\n Imputation value: Inf\n#\n [Tune-x] 1: fw.perc=0.2\n#\n [Tune-y] 1: mse.test.mean=40.6; time: 0.0 min; memory: 139Mb use, 585Mb max\n#\n [Tune-x] 2: fw.perc=0.25\n#\n [Tune-y] 2: mse.test.mean=40.6; time: 0.0 min; memory: 139Mb use, 585Mb max\n#\n [Tune-x] 3: fw.perc=0.3\n#\n [Tune-y] 3: mse.test.mean=37.1; time: 0.0 min; memory: 139Mb use, 585Mb max\n#\n [Tune-x] 4: fw.perc=0.35\n#\n [Tune-y] 4: mse.test.mean=35.8; time: 0.0 min; memory: 139Mb use, 585Mb max\n#\n [Tune-x] 5: fw.perc=0.4\n#\n [Tune-y] 5: mse.test.mean=35.8; time: 0.0 min; memory: 139Mb use, 585Mb max\n#\n [Tune-x] 6: fw.perc=0.45\n#\n [Tune-y] 6: mse.test.mean=27.4; time: 0.0 min; memory: 139Mb use, 585Mb max\n#\n [Tune-x] 7: fw.perc=0.5\n#\n [Tune-y] 7: mse.test.mean=27.4; time: 0.0 min; memory: 139Mb use, 585Mb max\n#\n [Tune] Result: fw.perc=0.5 : mse.test.mean=27.4\nres\n#\n Tune result:\n#\n Op. 
pars: fw.perc=0.5\n#\n mse.test.mean=27.4\n\n\n\n\nThe performance of all percentage values visited during tuning is:\n\n\nas.data.frame(res$opt.path)\n#\n fw.perc mse.test.mean dob eol error.message exec.time\n#\n 1 0.2 40.59578 1 NA \nNA\n 0.280\n#\n 2 0.25 40.59578 2 NA \nNA\n 0.269\n#\n 3 0.3 37.05592 3 NA \nNA\n 0.289\n#\n 4 0.35 35.83712 4 NA \nNA\n 0.258\n#\n 5 0.4 35.83712 5 NA \nNA\n 0.256\n#\n 6 0.45 27.39955 6 NA \nNA\n 0.258\n#\n 7 0.5 27.39955 7 NA \nNA\n 0.256\n\n\n\n\nThe optimal percentage and the corresponding performance can be accessed as follows:\n\n\nres$x\n#\n $fw.perc\n#\n [1] 0.5\nres$y\n#\n mse.test.mean \n#\n 27.39955\n\n\n\n\nAfter tuning we can generate a new wrapped learner with the optimal percentage value for\nfurther use.\n\n\nlrn = makeFilterWrapper(learner = \nregr.lm\n, fw.method = \nchi.squared\n, fw.perc = res$x$fw.perc)\nmod = train(lrn, bh.task)\nmod\n#\n Model for learner.id=regr.lm.filtered; learner.class=FilterWrapper\n#\n Trained on: task.id = BostonHousing-example; obs = 506; features = 13\n#\n Hyperparameters: fw.method=chi.squared,fw.perc=0.5\n\ngetFilteredFeatures(mod)\n#\n [1] \ncrim\n \nzn\n \nrm\n \ndis\n \nrad\n \nlstat\n\n\n\n\n\nHere is another example using \nmulti-criteria tuning\n.\nWe consider \nlinear discriminant analysis\n with precedent feature selection based on\nthe Chi-squared statistic of independence (\n\"chi.squared\"\n) on the \nSonar\n\ndata set and tune the threshold value.\nDuring tuning both, the false positive and the false negative rate (\nfpr\n and\n\nfnr\n), are minimized. As search strategy we choose a random search\n(see \nmakeTuneMultiCritControlRandom\n).\n\n\nlrn = makeFilterWrapper(learner = \nclassif.lda\n, fw.method = \nchi.squared\n)\nps = makeParamSet(makeNumericParam(\nfw.threshold\n, lower = 0.1, upper = 0.9))\nrdesc = makeResampleDesc(\nCV\n, iters = 10)\nres = tuneParamsMultiCrit(lrn, task = sonar.task, resampling = rdesc, par.set = ps,\n measures = list(fpr, fnr), control = makeTuneMultiCritControlRandom(maxit = 50L),\n show.info = FALSE)\nres\n#\n Tune multicrit result:\n#\n Points on front: 13\nhead(as.data.frame(res$opt.path))\n#\n fw.threshold fpr.test.mean fnr.test.mean dob eol error.message exec.time\n#\n 1 0.4892321 0.3092818 0.2639033 1 NA \nNA\n 2.552\n#\n 2 0.2481696 0.2045499 0.2319697 2 NA \nNA\n 2.617\n#\n 3 0.7691875 0.5128000 0.3459740 3 NA \nNA\n 2.428\n#\n 4 0.1470133 0.2045499 0.2319697 4 NA \nNA\n 2.610\n#\n 5 0.5958241 0.5028216 0.5239538 5 NA \nNA\n 2.487\n#\n 6 0.6892421 0.6323959 0.4480808 6 NA \nNA\n 2.469\n\n\n\n\nThe results can be visualized with function \nplotTuneMultiCritResult\n.\nThe plot shows the false positive and false negative rates for all parameter values visited\nduring tuning. 
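The Pareto-optimal settings themselves can also be pulled out of the result object; a small sketch, assuming the multi-criteria tune result exposes the front via x (a list of parameter settings) and y (a matrix of the corresponding fpr/fnr values):

## Sketch: inspect a threshold value on the Pareto front and the front's performances
res$x[[1]]
head(res$y)
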
The size of the points on the Pareto front is slightly increased.\n\n\nplotTuneMultiCritResult(res)\n\n\n\n\n \n\n\nWrapper methods\n\n\nWrapper methods use the performance of a learning algorithm to assess the usefulness of\na feature set.\nIn order to select a feature subset a learner is trained repeatedly on different feature subsets\nand the subset which leads to the best learner performance is chosen.\n\n\nIn order to use the wrapper approach we have to decide:\n\n\n\n\nHow to assess the performance: This involves choosing a performance measure that serves\n as feature selection criterion and a resampling strategy.\n\n\nWhich learning method to use.\n\n\nHow to search the space of possible feature subsets.\n\n\n\n\nThe search strategy is defined by functions following the naming convention\n\nmakeFeatSelControl\nsearch_strategy\n.\nThe following search strategies are available:\n\n\n\n\nExhaustive search (\nmakeFeatSelControlExhaustive\n),\n\n\nGenetic algorithm (\nmakeFeatSelControlGA\n),\n\n\nRandom search (\nmakeFeatSelControlRandom\n),\n\n\nDeterministic forward or backward search (\nmakeFeatSelControlSequential\n).\n\n\n\n\nSelect a feature subset\n\n\nFeature selection can be conducted with function \nselectFeatures\n.\n\n\nIn the following example we perform an exhaustive search on the\n\nWisconsin Prognostic Breast Cancer\n data set.\nAs learning method we use the \nCox proportional hazards model\n.\nThe performance is assessed by the holdout estimate of the concordance index\n(\ncindex\n).\n\n\n## Specify the search strategy\nctrl = makeFeatSelControlRandom(maxit = 20L)\nctrl\n#\n FeatSel control: FeatSelControlRandom\n#\n Same resampling instance: TRUE\n#\n Imputation value: \nworst\n\n#\n Max. features: \nnot used\n\n#\n Max. iterations: 20\n#\n Tune threshold: FALSE\n#\n Further arguments: prob=0.5\n\n\n\n\nctrl\n is a \nFeatSelControl\n object that contains information about the search strategy\nand potential parameter values.\n\n\n## Resample description\nrdesc = makeResampleDesc(\nHoldout\n)\n\n## Select features\nsfeats = selectFeatures(learner = \nsurv.coxph\n, task = wpbc.task, resampling = rdesc,\n control = ctrl, show.info = FALSE)\nsfeats\n#\n FeatSel result:\n#\n Features (17): mean_radius, mean_area, mean_smoothness, mean_concavepoints, mean_symmetry, mean_fractaldim, SE_texture, SE_perimeter, SE_smoothness, SE_compactness, SE_concavity, SE_concavepoints, worst_area, worst_compactness, worst_concavepoints, tsize, pnodes\n#\n cindex.test.mean=0.714\n\n\n\n\nsfeats\nis a \nFeatSelResult\n object.\nThe selected features and the corresponding performance can be accessed as follows:\n\n\nsfeats$x\n#\n [1] \nmean_radius\n \nmean_area\n \nmean_smoothness\n \n#\n [4] \nmean_concavepoints\n \nmean_symmetry\n \nmean_fractaldim\n \n#\n [7] \nSE_texture\n \nSE_perimeter\n \nSE_smoothness\n \n#\n [10] \nSE_compactness\n \nSE_concavity\n \nSE_concavepoints\n \n#\n [13] \nworst_area\n \nworst_compactness\n \nworst_concavepoints\n\n#\n [16] \ntsize\n \npnodes\n\nsfeats$y\n#\n cindex.test.mean \n#\n 0.713799\n\n\n\n\nIn a second example we fit a simple linear regression model to the \nBostonHousing\n\ndata set and use a sequential search to find a feature set that minimizes the mean squared\nerror (\nmse\n).\n\nmethod = \"sfs\"\n indicates that we want to conduct a sequential forward search where features\nare added to the model until the performance cannot be improved anymore.\nSee the documentation page \nmakeFeatSelControlSequential\n for other available\nsequential search 
methods.\nThe search is stopped if the improvement is smaller than \nalpha = 0.02\n.\n\n\n## Specify the search strategy\nctrl = makeFeatSelControlSequential(method = \nsfs\n, alpha = 0.02)\n\n## Select features\nrdesc = makeResampleDesc(\nCV\n, iters = 10)\nsfeats = selectFeatures(learner = \nregr.lm\n, task = bh.task, resampling = rdesc, control = ctrl,\n show.info = FALSE)\nsfeats\n#\n FeatSel result:\n#\n Features (11): crim, zn, chas, nox, rm, dis, rad, tax, ptratio, b, lstat\n#\n mse.test.mean=23.7\n\n\n\n\nFurther information about the sequential feature selection process can be obtained by\nfunction \nanalyzeFeatSelResult\n.\n\n\nanalyzeFeatSelResult(sfeats)\n#\n Features : 11\n#\n Performance : mse.test.mean=23.7\n#\n crim, zn, chas, nox, rm, dis, rad, tax, ptratio, b, lstat\n#\n \n#\n Path to optimum:\n#\n - Features: 0 Init : Perf = 84.831 Diff: NA *\n#\n - Features: 1 Add : lstat Perf = 38.894 Diff: 45.936 *\n#\n - Features: 2 Add : rm Perf = 31.279 Diff: 7.6156 *\n#\n - Features: 3 Add : ptratio Perf = 28.108 Diff: 3.1703 *\n#\n - Features: 4 Add : dis Perf = 27.48 Diff: 0.62813 *\n#\n - Features: 5 Add : nox Perf = 26.079 Diff: 1.4008 *\n#\n - Features: 6 Add : b Perf = 25.563 Diff: 0.51594 *\n#\n - Features: 7 Add : chas Perf = 25.132 Diff: 0.43097 *\n#\n - Features: 8 Add : zn Perf = 24.792 Diff: 0.34018 *\n#\n - Features: 9 Add : rad Perf = 24.599 Diff: 0.19327 *\n#\n - Features: 10 Add : tax Perf = 24.082 Diff: 0.51706 *\n#\n - Features: 11 Add : crim Perf = 23.732 Diff: 0.35 *\n#\n \n#\n Stopped, because no improving feature was found.\n\n\n\n\nFuse a learner with feature selection\n\n\nA \nLearner\n can be fused with a feature selection strategy (i.e., a search\nstrategy, a performance measure and a resampling strategy) by function \nmakeFeatSelWrapper\n.\nDuring training features are selected according to the specified selection scheme. 
Then, the\nlearner is trained on the selected feature subset.\n\n\nrdesc = makeResampleDesc(\nCV\n, iters = 3)\nlrn = makeFeatSelWrapper(\nsurv.coxph\n, resampling = rdesc,\n control = makeFeatSelControlRandom(maxit = 10), show.info = FALSE)\nmod = train(lrn, task = wpbc.task)\nmod\n#\n Model for learner.id=surv.coxph.featsel; learner.class=FeatSelWrapper\n#\n Trained on: task.id = wpbc-example; obs = 194; features = 32\n#\n Hyperparameters:\n\n\n\n\nThe result of the feature selection can be extracted by function \ngetFeatSelResult\n.\n\n\nsfeats = getFeatSelResult(mod)\nsfeats\n#\n FeatSel result:\n#\n Features (19): mean_radius, mean_texture, mean_perimeter, mean_area, mean_smoothness, mean_compactness, mean_concavepoints, mean_fractaldim, SE_compactness, SE_concavity, SE_concavepoints, SE_symmetry, worst_texture, worst_perimeter, worst_area, worst_concavepoints, worst_symmetry, tsize, pnodes\n#\n cindex.test.mean=0.631\n\n\n\n\nThe selected features are:\n\n\nsfeats$x\n#\n [1] \nmean_radius\n \nmean_texture\n \nmean_perimeter\n \n#\n [4] \nmean_area\n \nmean_smoothness\n \nmean_compactness\n \n#\n [7] \nmean_concavepoints\n \nmean_fractaldim\n \nSE_compactness\n \n#\n [10] \nSE_concavity\n \nSE_concavepoints\n \nSE_symmetry\n \n#\n [13] \nworst_texture\n \nworst_perimeter\n \nworst_area\n \n#\n [16] \nworst_concavepoints\n \nworst_symmetry\n \ntsize\n \n#\n [19] \npnodes\n\n\n\n\n\nThe 5-fold cross-validated performance of the learner specified above can be computed as\nfollows:\n\n\nout.rdesc = makeResampleDesc(\nCV\n, iters = 5)\n\nr = resample(learner = lrn, task = wpbc.task, resampling = out.rdesc, models = TRUE,\n show.info = FALSE)\nr$aggr\n#\n cindex.test.mean \n#\n 0.632357\n\n\n\n\nThe selected feature sets in the individual resampling iterations can be extracted as follows:\n\n\nlapply(r$models, getFeatSelResult)\n#\n [[1]]\n#\n FeatSel result:\n#\n Features (18): mean_texture, mean_area, mean_smoothness, mean_compactness, mean_concavity, mean_symmetry, SE_radius, SE_compactness, SE_concavity, SE_concavepoints, SE_fractaldim, worst_radius, worst_smoothness, worst_compactness, worst_concavity, worst_symmetry, tsize, pnodes\n#\n cindex.test.mean=0.66\n#\n \n#\n [[2]]\n#\n FeatSel result:\n#\n Features (12): mean_area, mean_compactness, mean_symmetry, mean_fractaldim, SE_perimeter, SE_area, SE_concavity, SE_symmetry, worst_texture, worst_smoothness, worst_fractaldim, tsize\n#\n cindex.test.mean=0.652\n#\n \n#\n [[3]]\n#\n FeatSel result:\n#\n Features (14): mean_compactness, mean_symmetry, mean_fractaldim, SE_radius, SE_perimeter, SE_smoothness, SE_concavity, SE_concavepoints, SE_fractaldim, worst_concavity, worst_concavepoints, worst_symmetry, worst_fractaldim, pnodes\n#\n cindex.test.mean=0.607\n#\n \n#\n [[4]]\n#\n FeatSel result:\n#\n Features (18): mean_radius, mean_texture, mean_perimeter, mean_compactness, mean_concavity, SE_texture, SE_area, SE_smoothness, SE_concavity, SE_symmetry, SE_fractaldim, worst_radius, worst_compactness, worst_concavepoints, worst_symmetry, worst_fractaldim, tsize, pnodes\n#\n cindex.test.mean=0.653\n#\n \n#\n [[5]]\n#\n FeatSel result:\n#\n Features (14): mean_radius, mean_texture, mean_compactness, mean_concavepoints, mean_symmetry, SE_texture, SE_compactness, SE_symmetry, SE_fractaldim, worst_radius, worst_smoothness, worst_compactness, worst_concavity, pnodes\n#\n cindex.test.mean=0.626", "title": "Feature Selection" }, { @@ -412,7 +412,7 @@ }, { "location": "/feature_selection/index.html#filter-methods", - "text": "Filter methods assign an 
importance value to each feature.\nBased on these values the features can be ranked and a feature subset can be selected. Calculating the feature importance Different methods for calculating the feature importance are built into mlr 's function generateFilterValuesData ( getFilterValues has been deprecated in favor of generateFilterValuesData .). Currently, classification, regression and survival analysis tasks\nare supported. A table showing all available methods can be found here . Function generateFilterValuesData requires the Task and a character string specifying the filter\nmethod. fv = generateFilterValuesData(iris.task, method = information.gain )\nfv\n# FilterValues:\n# Task: iris-example\n# name type information.gain\n# 1 Sepal.Length numeric 0.4521286\n# 2 Sepal.Width numeric 0.2672750\n# 3 Petal.Length numeric 0.9402853\n# 4 Petal.Width numeric 0.9554360 fv is a FilterValues object and fv$data contains a data.frame \nthat gives the importance values for all features. Optionally, a vector of filter methods can be\npassed. fv2 = generateFilterValuesData(iris.task, method = c( information.gain , chi.squared ))\nfv2$data\n# name type information.gain chi.squared\n# 1 Sepal.Length numeric 0.4521286 0.6288067\n# 2 Sepal.Width numeric 0.2672750 0.4922162\n# 3 Petal.Length numeric 0.9402853 0.9346311\n# 4 Petal.Width numeric 0.9554360 0.9432359 A bar plot of importance values for the individual features can be obtained using\nfunction plotFilterValues . plotFilterValues(fv2) By default plotFilterValues will create facetted subplots if multiple filter methods are passed as input to generateFilterValuesData . There is also an experimental ggvis plotting function, plotFilterValuesGGVIS . This takes the same\narguments as plotFilterValues and produces a shiny application\nthat allows the interactive selection of the displayed filter method, the number of features selected, and the sorting method (e.g., ascending or descending). plotFilterValuesGGVIS(fv2) According to the \"information.gain\" measure, Petal.Width and Petal.Length \ncontain the most information about the target variable Species . Selecting a feature subset With mlr 's function filterFeatures you can create a new Task by leaving out\nfeatures of lower importance. There are several ways to select a feature subset based on feature importance values: Keep a certain absolute number ( abs ) of features with highest importance. Keep a certain percentage ( perc ) of features with highest importance. Keep all features whose importance exceeds a certain threshold value ( threshold ). Function filterFeatures supports these three methods as shown in the following example.\nMoreover, you can either specify the method for calculating the feature importance or you can\nuse previously computed importance values via argument fval . 
## Keep the 2 most important features\nfiltered.task = filterFeatures(iris.task, method = information.gain , abs = 2)\n\n## Keep the 25% most important features\nfiltered.task = filterFeatures(iris.task, fval = fv, perc = 0.25)\n\n## Keep all features with importance greater than 0.5\nfiltered.task = filterFeatures(iris.task, fval = fv, threshold = 0.5)\nfiltered.task\n# Supervised task: iris-example\n# Type: classif\n# Target: Species\n# Observations: 150\n# Features:\n# numerics factors ordered \n# 2 0 0 \n# Missings: FALSE\n# Has weights: FALSE\n# Has blocking: FALSE\n# Classes: 3\n# setosa versicolor virginica \n# 50 50 50 \n# Positive class: NA Fuse a learner with a filter method Often feature selection based on a filter method is part of the data preprocessing and in\na subsequent step a learning method is applied to the filtered data.\nIn a proper experimental setup you might want to automate the selection of the\nfeatures so that it can be part of the validation method of your choice.\nA Learner can be fused with a filter method by function makeFilterWrapper .\nThe resulting Learner has the additional class attribute FilterWrapper . In the following example we calculate the 10-fold cross-validated error rate ( mmce )\nof the k nearest neighbor classifier with preceding feature selection on the iris data set.\nWe use \"information.gain\" as importance measure and select the 2 features with\nhighest importance.\nIn each resampling iteration feature selection is carried out on the corresponding training\ndata set before fitting the learner. lrn = makeFilterWrapper(learner = classif.fnn , fw.method = information.gain , fw.abs = 2)\nrdesc = makeResampleDesc( CV , iters = 10)\nr = resample(learner = lrn, task = iris.task, resampling = rdesc, show.info = FALSE, models = TRUE)\nr$aggr\n# mmce.test.mean \n# 0.04 You may want to know which features have been used. Luckily, we have called resample with the argument models = TRUE , which means that r$models \ncontains a list of models fitted in the individual resampling iterations.\nIn order to access the selected feature subsets we can call getFilteredFeatures on each model. sfeats = sapply(r$models, getFilteredFeatures)\ntable(sfeats)\n# sfeats\n# Petal.Length Petal.Width \n# 10 10 The selection of features seems to be very stable.\nThe features Sepal.Length and Sepal.Width did not make it into a single fold. Tuning the size of the feature subset In the above examples the number/percentage of features to select or the threshold value\nhave been arbitrarily chosen.\nIf filtering is a preprocessing step before applying a learning method optimal values\nwith regard to the learner performance can be found by tuning . In the following regression example we consider the BostonHousing data set.\nWe use a linear regression model and determine the optimal percentage value for feature selection\nsuch that the 3-fold cross-validated mean squared error of the learner is minimal.\nAs search strategy for tuning a grid search is used. 
lrn = makeFilterWrapper(learner = regr.lm , fw.method = chi.squared )\nps = makeParamSet(makeDiscreteParam( fw.perc , values = seq(0.2, 0.5, 0.05)))\nrdesc = makeResampleDesc( CV , iters = 3)\nres = tuneParams(lrn, task = bh.task, resampling = rdesc, par.set = ps,\n control = makeTuneControlGrid())\n# [Tune] Started tuning learner regr.lm.filtered for parameter set:\n# Type len Def Constr Req Tunable Trafo\n# fw.perc discrete - - 0.2,0.25,0.3,0.35,0.4,0.45,0.5 - TRUE -\n# With control class: TuneControlGrid\n# Imputation value: Inf\n# [Tune-x] 1: fw.perc=0.2\n# [Tune-y] 1: mse.test.mean=40.6; time: 0.0 min; memory: 139Mb use, 585Mb max\n# [Tune-x] 2: fw.perc=0.25\n# [Tune-y] 2: mse.test.mean=40.6; time: 0.0 min; memory: 139Mb use, 585Mb max\n# [Tune-x] 3: fw.perc=0.3\n# [Tune-y] 3: mse.test.mean=37.1; time: 0.0 min; memory: 139Mb use, 585Mb max\n# [Tune-x] 4: fw.perc=0.35\n# [Tune-y] 4: mse.test.mean=35.8; time: 0.0 min; memory: 139Mb use, 585Mb max\n# [Tune-x] 5: fw.perc=0.4\n# [Tune-y] 5: mse.test.mean=35.8; time: 0.0 min; memory: 139Mb use, 585Mb max\n# [Tune-x] 6: fw.perc=0.45\n# [Tune-y] 6: mse.test.mean=27.4; time: 0.0 min; memory: 139Mb use, 585Mb max\n# [Tune-x] 7: fw.perc=0.5\n# [Tune-y] 7: mse.test.mean=27.4; time: 0.0 min; memory: 139Mb use, 585Mb max\n# [Tune] Result: fw.perc=0.5 : mse.test.mean=27.4\nres\n# Tune result:\n# Op. pars: fw.perc=0.5\n# mse.test.mean=27.4 The performance of all percentage values visited during tuning is: as.data.frame(res$opt.path)\n# fw.perc mse.test.mean dob eol error.message exec.time\n# 1 0.2 40.59578 1 NA NA 0.270\n# 2 0.25 40.59578 2 NA NA 0.235\n# 3 0.3 37.05592 3 NA NA 0.238\n# 4 0.35 35.83712 4 NA NA 0.240\n# 5 0.4 35.83712 5 NA NA 0.237\n# 6 0.45 27.39955 6 NA NA 0.238\n# 7 0.5 27.39955 7 NA NA 0.244 The optimal percentage and the corresponding performance can be accessed as follows: res$x\n# $fw.perc\n# [1] 0.5\nres$y\n# mse.test.mean \n# 27.39955 After tuning we can generate a new wrapped learner with the optimal percentage value for\nfurther use. lrn = makeFilterWrapper(learner = regr.lm , fw.method = chi.squared , fw.perc = res$x$fw.perc)\nmod = train(lrn, bh.task)\nmod\n# Model for learner.id=regr.lm.filtered; learner.class=FilterWrapper\n# Trained on: task.id = BostonHousing-example; obs = 506; features = 13\n# Hyperparameters: fw.method=chi.squared,fw.perc=0.5\n\ngetFilteredFeatures(mod)\n# [1] crim zn rm dis rad lstat Here is another example using multi-criteria tuning .\nWe consider linear discriminant analysis with precedent feature selection based on\nthe Chi-squared statistic of independence ( \"chi.squared\" ) on the Sonar \ndata set and tune the threshold value.\nDuring tuning both, the false positive and the false negative rate ( fpr and fnr ), are minimized. As search strategy we choose a random search\n(see makeTuneMultiCritControlRandom ). 
lrn = makeFilterWrapper(learner = classif.lda , fw.method = chi.squared )\nps = makeParamSet(makeNumericParam( fw.threshold , lower = 0.1, upper = 0.9))\nrdesc = makeResampleDesc( CV , iters = 10)\nres = tuneParamsMultiCrit(lrn, task = sonar.task, resampling = rdesc, par.set = ps,\n measures = list(fpr, fnr), control = makeTuneMultiCritControlRandom(maxit = 50L),\n show.info = FALSE)\nres\n# Tune multicrit result:\n# Points on front: 13\nhead(as.data.frame(res$opt.path))\n# fw.threshold fpr.test.mean fnr.test.mean dob eol error.message exec.time\n# 1 0.4892321 0.3092818 0.2639033 1 NA NA 2.293\n# 2 0.2481696 0.2045499 0.2319697 2 NA NA 2.296\n# 3 0.7691875 0.5128000 0.3459740 3 NA NA 2.279\n# 4 0.1470133 0.2045499 0.2319697 4 NA NA 2.348\n# 5 0.5958241 0.5028216 0.5239538 5 NA NA 2.169\n# 6 0.6892421 0.6323959 0.4480808 6 NA NA 2.091 The results can be visualized with function plotTuneMultiCritResult .\nThe plot shows the false positive and false negative rates for all parameter values visited\nduring tuning. The size of the points on the Pareto front is slightly increased. plotTuneMultiCritResult(res)", + "text": "Filter methods assign an importance value to each feature.\nBased on these values the features can be ranked and a feature subset can be selected. Calculating the feature importance Different methods for calculating the feature importance are built into mlr 's function generateFilterValuesData ( getFilterValues has been deprecated in favor of generateFilterValuesData .). Currently, classification, regression and survival analysis tasks\nare supported. A table showing all available methods can be found here . Function generateFilterValuesData requires the Task and a character string specifying the filter\nmethod. fv = generateFilterValuesData(iris.task, method = information.gain )\nfv\n# FilterValues:\n# Task: iris-example\n# name type information.gain\n# 1 Sepal.Length numeric 0.4521286\n# 2 Sepal.Width numeric 0.2672750\n# 3 Petal.Length numeric 0.9402853\n# 4 Petal.Width numeric 0.9554360 fv is a FilterValues object and fv$data contains a data.frame \nthat gives the importance values for all features. Optionally, a vector of filter methods can be\npassed. fv2 = generateFilterValuesData(iris.task, method = c( information.gain , chi.squared ))\nfv2$data\n# name type information.gain chi.squared\n# 1 Sepal.Length numeric 0.4521286 0.6288067\n# 2 Sepal.Width numeric 0.2672750 0.4922162\n# 3 Petal.Length numeric 0.9402853 0.9346311\n# 4 Petal.Width numeric 0.9554360 0.9432359 A bar plot of importance values for the individual features can be obtained using\nfunction plotFilterValues . plotFilterValues(fv2) By default plotFilterValues will create facetted subplots if multiple filter methods are passed as input to generateFilterValuesData . There is also an experimental ggvis plotting function, plotFilterValuesGGVIS . This takes the same\narguments as plotFilterValues and produces a shiny application\nthat allows the interactive selection of the displayed filter method, the number of features selected, and the sorting method (e.g., ascending or descending). plotFilterValuesGGVIS(fv2) According to the \"information.gain\" measure, Petal.Width and Petal.Length \ncontain the most information about the target variable Species . Selecting a feature subset With mlr 's function filterFeatures you can create a new Task by leaving out\nfeatures of lower importance. 
There are several ways to select a feature subset based on feature importance values: Keep a certain absolute number ( abs ) of features with highest importance. Keep a certain percentage ( perc ) of features with highest importance. Keep all features whose importance exceeds a certain threshold value ( threshold ). Function filterFeatures supports these three methods as shown in the following example.\nMoreover, you can either specify the method for calculating the feature importance or you can\nuse previously computed importance values via argument fval . ## Keep the 2 most important features\nfiltered.task = filterFeatures(iris.task, method = information.gain , abs = 2)\n\n## Keep the 25% most important features\nfiltered.task = filterFeatures(iris.task, fval = fv, perc = 0.25)\n\n## Keep all features with importance greater than 0.5\nfiltered.task = filterFeatures(iris.task, fval = fv, threshold = 0.5)\nfiltered.task\n# Supervised task: iris-example\n# Type: classif\n# Target: Species\n# Observations: 150\n# Features:\n# numerics factors ordered \n# 2 0 0 \n# Missings: FALSE\n# Has weights: FALSE\n# Has blocking: FALSE\n# Classes: 3\n# setosa versicolor virginica \n# 50 50 50 \n# Positive class: NA Fuse a learner with a filter method Often feature selection based on a filter method is part of the data preprocessing and in\na subsequent step a learning method is applied to the filtered data.\nIn a proper experimental setup you might want to automate the selection of the\nfeatures so that it can be part of the validation method of your choice.\nA Learner can be fused with a filter method by function makeFilterWrapper .\nThe resulting Learner has the additional class attribute FilterWrapper . In the following example we calculate the 10-fold cross-validated error rate ( mmce )\nof the k nearest neighbor classifier with preceding feature selection on the iris data set.\nWe use \"information.gain\" as importance measure and select the 2 features with\nhighest importance.\nIn each resampling iteration feature selection is carried out on the corresponding training\ndata set before fitting the learner. lrn = makeFilterWrapper(learner = classif.fnn , fw.method = information.gain , fw.abs = 2)\nrdesc = makeResampleDesc( CV , iters = 10)\nr = resample(learner = lrn, task = iris.task, resampling = rdesc, show.info = FALSE, models = TRUE)\nr$aggr\n# mmce.test.mean \n# 0.04 You may want to know which features have been used. Luckily, we have called resample with the argument models = TRUE , which means that r$models \ncontains a list of models fitted in the individual resampling iterations.\nIn order to access the selected feature subsets we can call getFilteredFeatures on each model. sfeats = sapply(r$models, getFilteredFeatures)\ntable(sfeats)\n# sfeats\n# Petal.Length Petal.Width \n# 10 10 The selection of features seems to be very stable.\nThe features Sepal.Length and Sepal.Width did not make it into a single fold. Tuning the size of the feature subset In the above examples the number/percentage of features to select or the threshold value\nhave been arbitrarily chosen.\nIf filtering is a preprocessing step before applying a learning method optimal values\nwith regard to the learner performance can be found by tuning . 
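Which fw.* settings of such a wrapped learner are available for tuning (fw.abs, fw.perc and fw.threshold) can be checked via its parameter set; a quick hedged sketch (output omitted), which should list the fw.* parameters alongside those of the base learner:

## Sketch: show the hyperparameters of a FilterWrapper, including the fw.* settings
getParamSet(makeFilterWrapper(learner = "regr.lm", fw.method = "chi.squared"))
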
In the following regression example we consider the BostonHousing data set.\nWe use a linear regression model and determine the optimal percentage value for feature selection\nsuch that the 3-fold cross-validated mean squared error of the learner is minimal.\nAs search strategy for tuning a grid search is used. lrn = makeFilterWrapper(learner = regr.lm , fw.method = chi.squared )\nps = makeParamSet(makeDiscreteParam( fw.perc , values = seq(0.2, 0.5, 0.05)))\nrdesc = makeResampleDesc( CV , iters = 3)\nres = tuneParams(lrn, task = bh.task, resampling = rdesc, par.set = ps,\n control = makeTuneControlGrid())\n# [Tune] Started tuning learner regr.lm.filtered for parameter set:\n# Type len Def Constr Req Tunable Trafo\n# fw.perc discrete - - 0.2,0.25,0.3,0.35,0.4,0.45,0.5 - TRUE -\n# With control class: TuneControlGrid\n# Imputation value: Inf\n# [Tune-x] 1: fw.perc=0.2\n# [Tune-y] 1: mse.test.mean=40.6; time: 0.0 min; memory: 139Mb use, 585Mb max\n# [Tune-x] 2: fw.perc=0.25\n# [Tune-y] 2: mse.test.mean=40.6; time: 0.0 min; memory: 139Mb use, 585Mb max\n# [Tune-x] 3: fw.perc=0.3\n# [Tune-y] 3: mse.test.mean=37.1; time: 0.0 min; memory: 139Mb use, 585Mb max\n# [Tune-x] 4: fw.perc=0.35\n# [Tune-y] 4: mse.test.mean=35.8; time: 0.0 min; memory: 139Mb use, 585Mb max\n# [Tune-x] 5: fw.perc=0.4\n# [Tune-y] 5: mse.test.mean=35.8; time: 0.0 min; memory: 139Mb use, 585Mb max\n# [Tune-x] 6: fw.perc=0.45\n# [Tune-y] 6: mse.test.mean=27.4; time: 0.0 min; memory: 139Mb use, 585Mb max\n# [Tune-x] 7: fw.perc=0.5\n# [Tune-y] 7: mse.test.mean=27.4; time: 0.0 min; memory: 139Mb use, 585Mb max\n# [Tune] Result: fw.perc=0.5 : mse.test.mean=27.4\nres\n# Tune result:\n# Op. pars: fw.perc=0.5\n# mse.test.mean=27.4 The performance of all percentage values visited during tuning is: as.data.frame(res$opt.path)\n# fw.perc mse.test.mean dob eol error.message exec.time\n# 1 0.2 40.59578 1 NA NA 0.280\n# 2 0.25 40.59578 2 NA NA 0.269\n# 3 0.3 37.05592 3 NA NA 0.289\n# 4 0.35 35.83712 4 NA NA 0.258\n# 5 0.4 35.83712 5 NA NA 0.256\n# 6 0.45 27.39955 6 NA NA 0.258\n# 7 0.5 27.39955 7 NA NA 0.256 The optimal percentage and the corresponding performance can be accessed as follows: res$x\n# $fw.perc\n# [1] 0.5\nres$y\n# mse.test.mean \n# 27.39955 After tuning we can generate a new wrapped learner with the optimal percentage value for\nfurther use. lrn = makeFilterWrapper(learner = regr.lm , fw.method = chi.squared , fw.perc = res$x$fw.perc)\nmod = train(lrn, bh.task)\nmod\n# Model for learner.id=regr.lm.filtered; learner.class=FilterWrapper\n# Trained on: task.id = BostonHousing-example; obs = 506; features = 13\n# Hyperparameters: fw.method=chi.squared,fw.perc=0.5\n\ngetFilteredFeatures(mod)\n# [1] crim zn rm dis rad lstat Here is another example using multi-criteria tuning .\nWe consider linear discriminant analysis with precedent feature selection based on\nthe Chi-squared statistic of independence ( \"chi.squared\" ) on the Sonar \ndata set and tune the threshold value.\nDuring tuning both, the false positive and the false negative rate ( fpr and fnr ), are minimized. As search strategy we choose a random search\n(see makeTuneMultiCritControlRandom ). 
lrn = makeFilterWrapper(learner = classif.lda , fw.method = chi.squared )\nps = makeParamSet(makeNumericParam( fw.threshold , lower = 0.1, upper = 0.9))\nrdesc = makeResampleDesc( CV , iters = 10)\nres = tuneParamsMultiCrit(lrn, task = sonar.task, resampling = rdesc, par.set = ps,\n measures = list(fpr, fnr), control = makeTuneMultiCritControlRandom(maxit = 50L),\n show.info = FALSE)\nres\n# Tune multicrit result:\n# Points on front: 13\nhead(as.data.frame(res$opt.path))\n# fw.threshold fpr.test.mean fnr.test.mean dob eol error.message exec.time\n# 1 0.4892321 0.3092818 0.2639033 1 NA NA 2.552\n# 2 0.2481696 0.2045499 0.2319697 2 NA NA 2.617\n# 3 0.7691875 0.5128000 0.3459740 3 NA NA 2.428\n# 4 0.1470133 0.2045499 0.2319697 4 NA NA 2.610\n# 5 0.5958241 0.5028216 0.5239538 5 NA NA 2.487\n# 6 0.6892421 0.6323959 0.4480808 6 NA NA 2.469 The results can be visualized with function plotTuneMultiCritResult .\nThe plot shows the false positive and false negative rates for all parameter values visited\nduring tuning. The size of the points on the Pareto front is slightly increased. plotTuneMultiCritResult(res)", "title": "Filter methods" }, { @@ -422,7 +422,7 @@ }, { "location": "/nested_resampling/index.html", - "text": "Nested Resampling\n\n\nIn order to obtain honest performance estimates for a learner all parts of the model building\nlike preprocessing and model selection steps should be included in the resampling, i.e.,\nrepeated for every pair of training/test data.\nFor steps that themselves require resampling like \nparameter tuning\n or\n\nfeature selection\n (via the wrapper approach) this results in two\nnested resampling loops.\n\n\n\nThe graphic above illustrates nested resampling for parameter tuning with 3-fold cross-validation\nin the outer and 4-fold cross-validation in the inner loop.\n\n\nIn the outer resampling loop, we have three pairs of training/test sets.\nOn each of these outer training sets parameter tuning is done, thereby executing the inner\nresampling loop.\nThis way, we get one set of selected hyperparameters for each outer training set.\nThen the learner is fitted on each outer training set using the corresponding selected\nhyperparameters and its performance is evaluated on the outer test sets.\n\n\nIn \nmlr\n, you can get nested resampling for free without programming any looping by\nusing the \nwrapper functionality\n. This works as follows:\n\n\n\n\nGenerate a wrapped \nLearner\n via function \nmakeTuneWrapper\n or \nmakeFeatSelWrapper\n.\n Specify the inner resampling strategy using their \nresampling\n argument.\n\n\nCall function \nresample\n (see also the section about \nresampling\n) and\n pass the outer resampling strategy to its \nresampling\n argument.\n\n\n\n\nYou can freely combine different inner and outer resampling strategies.\n\n\nThe outer strategy can be a resample description (\nResampleDesc\n) or a\nresample instance (\nResampleInstance\n).\nA common setup is prediction and performance evaluation on a fixed outer test set. 
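As a minimal sketch of such a fixed outer split (the train/test indices and the task size of 150 are invented for illustration):

## Sketch: a fixed holdout split used as the outer resampling instance
outer.fixed = makeFixedHoldoutInstance(train.inds = 1:100, test.inds = 101:150, size = 150)
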
This can\nbe achieved by using function \nmakeFixedHoldoutInstance\n to generate the outer\n\nResampleInstance\n.\n\n\nThe inner resampling strategy should preferably be a \nResampleDesc\n, as the sizes\nof the outer training sets might differ.\nPer default, the inner resample description is instantiated once for every outer training set.\nThis way during tuning/feature selection all parameter or feature sets are compared\non the same inner training/test sets to reduce variance.\nYou can also turn this off using the \nsame.resampling.instance\n argument of \nmakeTuneControl*\n\nor \nmakeFeatSelControl*\n.\n\n\nNested resampling is computationally expensive.\nFor this reason in the examples shown below we use relatively small search spaces and a low\nnumber of resampling iterations. In practice, you normally have to increase both.\nAs this is computationally intensive you might want to have a look at section\n\nparallelization\n.\n\n\nTuning\n\n\nAs you might recall from the tutorial page about \ntuning\n, you need to define a search space by\nfunction \nmakeParamSet\n, a search strategy by \nmakeTuneControl*\n,\nand a method to evaluate hyperparameter settings (i.e., the inner resampling strategy and a performance measure).\n\n\nBelow is a classification example.\nWe evaluate the performance of a support vector machine (\nksvm\n) with tuned\ncost parameter \nC\n and RBF kernel parameter \nsigma\n.\nWe use 3-fold cross-validation in the outer and subsampling with 2 iterations in the inner\nloop.\nFor tuning a grid search is used to find the hyperparameters with lowest error rate\n(\nmmce\n is the default measure for classification).\nThe wrapped \nLearner\n is generated by calling \nmakeTuneWrapper\n.\n\n\nNote that in practice the parameter set should be larger.\nA common recommendation is \n2^(-12:12)\n for both \nC\n and \nsigma\n.\n\n\n## Tuning in inner resampling loop\nps = makeParamSet(\n makeDiscreteParam(\nC\n, values = 2^(-2:2)),\n makeDiscreteParam(\nsigma\n, values = 2^(-2:2))\n)\nctrl = makeTuneControlGrid()\ninner = makeResampleDesc(\nSubsample\n, iters = 2)\nlrn = makeTuneWrapper(\nclassif.ksvm\n, resampling = inner, par.set = ps, control = ctrl, show.info = FALSE)\n\n## Outer resampling loop\nouter = makeResampleDesc(\nCV\n, iters = 3)\nr = resample(lrn, iris.task, resampling = outer, extract = getTuneResult, show.info = FALSE)\n\nr\n#\n Resample Result\n#\n Task: iris-example\n#\n Learner: classif.ksvm.tuned\n#\n mmce.aggr: 0.05\n#\n mmce.mean: 0.05\n#\n mmce.sd: 0.03\n#\n Runtime: 18.9149\n\n\n\n\nYou can obtain the error rates on the 3 outer test sets by:\n\n\nr$measures.test\n#\n iter mmce\n#\n 1 1 0.02\n#\n 2 2 0.06\n#\n 3 3 0.08\n\n\n\n\nAccessing the tuning result\n\n\nWe have kept the results of the tuning for further evaluations.\nFor example one might want to find out, if the best obtained configurations vary for the\ndifferent outer splits.\nAs storing entire models may be expensive (but possible by setting \nmodels = TRUE\n) we used\nthe \nextract\n option of \nresample\n.\nFunction \ngetTuneResult\n returns, among other things, the optimal hyperparameter values and\nthe \noptimization path\n for each iteration of the outer resampling loop.\nNote that the performance values shown when printing \nr$extract\n are the aggregated performances\nresulting from inner resampling on the outer training set for the best hyperparameter configurations\n(not to be confused with \nr$measures.test\n shown above).\n\n\nr$extract\n#\n [[1]]\n#\n Tune result:\n#\n Op. 
pars: C=2; sigma=0.25\n#\n mmce.test.mean=0.0147\n#\n \n#\n [[2]]\n#\n Tune result:\n#\n Op. pars: C=4; sigma=0.25\n#\n mmce.test.mean= 0\n#\n \n#\n [[3]]\n#\n Tune result:\n#\n Op. pars: C=4; sigma=0.25\n#\n mmce.test.mean=0.0735\n\nnames(r$extract[[1]])\n#\n [1] \nlearner\n \ncontrol\n \nx\n \ny\n \nthreshold\n \nopt.path\n\n\n\n\n\nWe can compare the optimal parameter settings obtained in the 3 resampling iterations.\nAs you can see, the optimal configuration usually depends on the data. You may\nbe able to identify a \nrange\n of parameter settings that achieve good\nperformance though, e.g., the values for \nC\n should be at least 1 and the values\nfor \nsigma\n should be between 0 and 1.\n\n\nWith function \ngetNestedTuneResultsOptPathDf\n you can extract the optimization paths\nfor the 3 outer cross-validation iterations for further inspection and analysis.\nThese are stacked in one \ndata.frame\n with column \niter\n indicating the\nresampling iteration.\n\n\nopt.paths = getNestedTuneResultsOptPathDf(r)\nhead(opt.paths, 10)\n#\n C sigma mmce.test.mean dob eol error.message exec.time iter\n#\n 1 0.25 0.25 0.05882353 1 NA \nNA\n 0.046 1\n#\n 2 0.5 0.25 0.04411765 2 NA \nNA\n 0.045 1\n#\n 3 1 0.25 0.04411765 3 NA \nNA\n 0.046 1\n#\n 4 2 0.25 0.01470588 4 NA \nNA\n 0.047 1\n#\n 5 4 0.25 0.05882353 5 NA \nNA\n 0.046 1\n#\n 6 0.25 0.5 0.05882353 6 NA \nNA\n 0.046 1\n#\n 7 0.5 0.5 0.01470588 7 NA \nNA\n 0.046 1\n#\n 8 1 0.5 0.02941176 8 NA \nNA\n 0.045 1\n#\n 9 2 0.5 0.01470588 9 NA \nNA\n 0.046 1\n#\n 10 4 0.5 0.05882353 10 NA \nNA\n 0.049 1\n\n\n\n\nBelow we visualize the \nopt.path\ns for the 3 outer resampling iterations.\n\n\ng = ggplot(opt.paths, aes(x = C, y = sigma, fill = mmce.test.mean))\ng + geom_tile() + facet_wrap(~ iter)\n\n\n\n\n \n\n\nAnother useful function is \ngetNestedTuneResultsX\n, which extracts the best found hyperparameter\nsettings for each outer resampling iteration.\n\n\ngetNestedTuneResultsX(r)\n#\n C sigma\n#\n 1 2 0.25\n#\n 2 4 0.25\n#\n 3 4 0.25\n\n\n\n\nFeature selection\n\n\nAs you might recall from the section about \nfeature selection\n, \nmlr\n\nsupports the filter and the wrapper approach.\n\n\nWrapper methods\n\n\nWrapper methods use the performance of a learning algorithm to assess the usefulness of a\nfeature set. In order to select a feature subset a learner is trained repeatedly on different\nfeature subsets and the subset which leads to the best learner performance is chosen.\n\n\nFor feature selection in the inner resampling loop, you need to choose a search strategy\n(function \nmakeFeatSelControl*\n), a performance measure and the inner\nresampling strategy. 
Then use function \nmakeFeatSelWrapper\n to bind everything together.\n\n\nBelow we use sequential forward selection with linear regression on the\n\nBostonHousing\n data set (\nbh.task\n).\n\n\n## Feature selection in inner resampling loop\ninner = makeResampleDesc(\nCV\n, iters = 3)\nlrn = makeFeatSelWrapper(\nregr.lm\n, resampling = inner,\n control = makeFeatSelControlSequential(method = \nsfs\n), show.info = FALSE)\n\n## Outer resampling loop\nouter = makeResampleDesc(\nSubsample\n, iters = 2)\nr = resample(learner = lrn, task = bh.task, resampling = outer, extract = getFeatSelResult,\n show.info = FALSE)\n\nr\n#\n Resample Result\n#\n Task: BostonHousing-example\n#\n Learner: regr.lm.featsel\n#\n mse.aggr: 31.70\n#\n mse.mean: 31.70\n#\n mse.sd: 4.79\n#\n Runtime: 49.0522\n\nr$measures.test\n#\n iter mse\n#\n 1 1 35.08611\n#\n 2 2 28.31215\n\n\n\n\nAccessing the selected features\n\n\nThe result of the feature selection can be extracted by function \ngetFeatSelResult\n.\nIt is also possible to keep whole \nmodels\n by setting \nmodels = TRUE\n\nwhen calling \nresample\n.\n\n\nr$extract\n#\n [[1]]\n#\n FeatSel result:\n#\n Features (10): crim, zn, indus, nox, rm, dis, rad, tax, ptratio, lstat\n#\n mse.test.mean=20.2\n#\n \n#\n [[2]]\n#\n FeatSel result:\n#\n Features (9): zn, nox, rm, dis, rad, tax, ptratio, b, lstat\n#\n mse.test.mean=22.6\n\n## Selected features in the first outer resampling iteration\nr$extract[[1]]$x\n#\n [1] \ncrim\n \nzn\n \nindus\n \nnox\n \nrm\n \ndis\n \nrad\n \n#\n [8] \ntax\n \nptratio\n \nlstat\n\n\n## Resampled performance of the selected feature subset on the first inner training set\nr$extract[[1]]$y\n#\n mse.test.mean \n#\n 20.15939\n\n\n\n\nAs for tuning, you can extract the optimization paths.\nThe resulting \ndata.frame\ns contain, among others, binary columns for\nall features, indicating if they were included in the linear regression model, and the\ncorresponding performances.\n\n\nopt.paths = lapply(r$extract, function(x) as.data.frame(x$opt.path))\nhead(opt.paths[[1]])\n#\n crim zn indus chas nox rm age dis rad tax ptratio b lstat mse.test.mean\n#\n 1 0 0 0 0 0 0 0 0 0 0 0 0 0 80.33019\n#\n 2 1 0 0 0 0 0 0 0 0 0 0 0 0 65.95316\n#\n 3 0 1 0 0 0 0 0 0 0 0 0 0 0 69.15417\n#\n 4 0 0 1 0 0 0 0 0 0 0 0 0 0 55.75473\n#\n 5 0 0 0 1 0 0 0 0 0 0 0 0 0 80.48765\n#\n 6 0 0 0 0 1 0 0 0 0 0 0 0 0 63.06724\n#\n dob eol error.message exec.time\n#\n 1 1 2 \nNA\n 0.022\n#\n 2 2 2 \nNA\n 0.038\n#\n 3 2 2 \nNA\n 0.034\n#\n 4 2 2 \nNA\n 0.034\n#\n 5 2 2 \nNA\n 0.034\n#\n 6 2 2 \nNA\n 0.034\n\n\n\n\nAn easy-to-read version of the optimization path for sequential feature selection can be\nobtained with function \nanalyzeFeatSelResult\n.\n\n\nanalyzeFeatSelResult(r$extract[[1]])\n#\n Features : 10\n#\n Performance : mse.test.mean=20.2\n#\n crim, zn, indus, nox, rm, dis, rad, tax, ptratio, lstat\n#\n \n#\n Path to optimum:\n#\n - Features: 0 Init : Perf = 80.33 Diff: NA *\n#\n - Features: 1 Add : lstat Perf = 36.451 Diff: 43.879 *\n#\n - Features: 2 Add : rm Perf = 27.289 Diff: 9.1623 *\n#\n - Features: 3 Add : ptratio Perf = 24.004 Diff: 3.2849 *\n#\n - Features: 4 Add : nox Perf = 23.513 Diff: 0.49082 *\n#\n - Features: 5 Add : dis Perf = 21.49 Diff: 2.023 *\n#\n - Features: 6 Add : crim Perf = 21.12 Diff: 0.37008 *\n#\n - Features: 7 Add : indus Perf = 20.82 Diff: 0.29994 *\n#\n - Features: 8 Add : rad Perf = 20.609 Diff: 0.21054 *\n#\n - Features: 9 Add : tax Perf = 20.209 Diff: 0.40059 *\n#\n - Features: 10 Add : zn Perf = 20.159 Diff: 0.049441 *\n#\n \n#\n 
Stopped, because no improving feature was found.\n\n\n\n\nFilter methods with tuning\n\n\nFilter methods assign an importance value to each feature.\nBased on these values you can select a feature subset by either keeping all features with importance\nhigher than a certain threshold or by keeping a fixed number or percentage of the highest ranking features.\nOften, neither the theshold nor the number or percentage of features is known in advance\nand thus tuning is necessary.\n\n\nIn the example below the threshold value (\nfw.threshold\n) is tuned in the inner resampling loop.\nFor this purpose the base \nLearner\n \n\"regr.lm\"\n is wrapped two times.\nFirst, \nmakeFilterWrapper\n is used to fuse linear regression with a feature filtering\npreprocessing step. Then a tuning step is added by \nmakeTuneWrapper\n.\n\n\n## Tuning of the percentage of selected filters in the inner loop\nlrn = makeFilterWrapper(learner = \nregr.lm\n, fw.method = \nchi.squared\n)\nps = makeParamSet(makeDiscreteParam(\nfw.threshold\n, values = seq(0, 1, 0.2)))\nctrl = makeTuneControlGrid()\ninner = makeResampleDesc(\nCV\n, iters = 3)\nlrn = makeTuneWrapper(lrn, resampling = inner, par.set = ps, control = ctrl, show.info = FALSE)\n\n## Outer resampling loop\nouter = makeResampleDesc(\nCV\n, iters = 3)\nr = resample(learner = lrn, task = bh.task, resampling = outer, models = TRUE, show.info = FALSE)\nr\n#\n Resample Result\n#\n Task: BostonHousing-example\n#\n Learner: regr.lm.filtered.tuned\n#\n mse.aggr: 25.39\n#\n mse.mean: 25.39\n#\n mse.sd: 8.35\n#\n Runtime: 8.5303\n\n\n\n\nAccessing the selected features and optimal percentage\n\n\nIn the above example we kept the complete \nmodel\ns.\n\n\nBelow are some examples that show how to extract information from the \nmodel\ns.\n\n\nr$models\n#\n [[1]]\n#\n Model for learner.id=regr.lm.filtered.tuned; learner.class=TuneWrapper\n#\n Trained on: task.id = BostonHousing-example; obs = 337; features = 13\n#\n Hyperparameters: fw.method=chi.squared\n#\n \n#\n [[2]]\n#\n Model for learner.id=regr.lm.filtered.tuned; learner.class=TuneWrapper\n#\n Trained on: task.id = BostonHousing-example; obs = 338; features = 13\n#\n Hyperparameters: fw.method=chi.squared\n#\n \n#\n [[3]]\n#\n Model for learner.id=regr.lm.filtered.tuned; learner.class=TuneWrapper\n#\n Trained on: task.id = BostonHousing-example; obs = 337; features = 13\n#\n Hyperparameters: fw.method=chi.squared\n\n\n\n\nThe result of the feature selection can be extracted by function \ngetFilteredFeatures\n.\nAlmost always all 13 features are selected.\n\n\nlapply(r$models, function(x) getFilteredFeatures(x$learner.model$next.model))\n#\n [[1]]\n#\n [1] \ncrim\n \nzn\n \nindus\n \nchas\n \nnox\n \nrm\n \nage\n \n#\n [8] \ndis\n \nrad\n \ntax\n \nptratio\n \nb\n \nlstat\n \n#\n \n#\n [[2]]\n#\n [1] \ncrim\n \nzn\n \nindus\n \nnox\n \nrm\n \nage\n \ndis\n \n#\n [8] \nrad\n \ntax\n \nptratio\n \nb\n \nlstat\n \n#\n \n#\n [[3]]\n#\n [1] \ncrim\n \nzn\n \nindus\n \nchas\n \nnox\n \nrm\n \nage\n \n#\n [8] \ndis\n \nrad\n \ntax\n \nptratio\n \nb\n \nlstat\n\n\n\n\n\nBelow the \ntune results\n and \noptimization paths\n\nare accessed.\n\n\nres = lapply(r$models, getTuneResult)\nres\n#\n [[1]]\n#\n Tune result:\n#\n Op. pars: fw.threshold=0\n#\n mse.test.mean=24.9\n#\n \n#\n [[2]]\n#\n Tune result:\n#\n Op. pars: fw.threshold=0.4\n#\n mse.test.mean=27.2\n#\n \n#\n [[3]]\n#\n Tune result:\n#\n Op. 
pars: fw.threshold=0\n#\n mse.test.mean=19.7\n\nopt.paths = lapply(res, function(x) as.data.frame(x$opt.path))\nopt.paths[[1]]\n#\n fw.threshold mse.test.mean dob eol error.message exec.time\n#\n 1 0 24.89160 1 NA \nNA\n 0.560\n#\n 2 0.2 25.18817 2 NA \nNA\n 0.273\n#\n 3 0.4 25.18817 3 NA \nNA\n 0.262\n#\n 4 0.6 32.15930 4 NA \nNA\n 0.245\n#\n 5 0.8 90.89848 5 NA \nNA\n 0.233\n#\n 6 1 90.89848 6 NA \nNA\n 0.218\n\n\n\n\nBenchmark experiments\n\n\nIn a benchmark experiment multiple learners are compared on one or several tasks\n(see also the section about \nbenchmarking\n).\nNested resampling in benchmark experiments is achieved the same way as in resampling:\n\n\n\n\nFirst, use \nmakeTuneWrapper\n or \nmakeFeatSelWrapper\n to generate wrapped \nLearner\ns\n with the inner resampling strategies of your choice.\n\n\nSecond, call \nbenchmark\n and specify the outer resampling strategies for all tasks.\n\n\n\n\nThe inner resampling strategies should be \nresample descriptions\n.\nYou can use different inner resampling strategies for different wrapped learners.\nFor example it might be practical to do fewer subsampling or bootstrap iterations for slower\nlearners.\n\n\nIf you have larger benchmark experiments you might want to have a look at the section\nabout \nparallelization\n.\n\n\nAs mentioned in the section about \nbenchmark experiments\n you can also use\ndifferent resampling strategies for different learning tasks by passing a\n\nlist\n of resampling descriptions or instances to \nbenchmark\n.\n\n\nWe will see three examples to show different benchmark settings:\n\n\n\n\nTwo data sets + two classification algorithms + tuning\n\n\nOne data set + two regression algorithms + feature selection\n\n\nOne data set + two regression algorithms + feature filtering + tuning\n\n\n\n\nExample 1: Two tasks, two learners, tuning\n\n\nBelow is a benchmark experiment with two data sets, \niris\n and\n\nsonar\n, and two \nLearner\ns,\n\nksvm\n and \nkknn\n, that are both tuned.\n\n\nAs inner resampling strategies we use holdout for \nksvm\n and subsampling\nwith 3 iterations for \nkknn\n.\nAs outer resampling strategies we take holdout for the \niris\n and bootstrap\nwith 2 iterations for the \nsonar\n data (\nsonar.task\n).\nWe consider the accuracy (\nacc\n), which is used as tuning criterion, and also\ncalculate the balanced error rate (\nber\n).\n\n\n## List of learning tasks\ntasks = list(iris.task, sonar.task)\n\n## Tune svm in the inner resampling loop\nps = makeParamSet(\n makeDiscreteParam(\nC\n, 2^(-1:1)),\n makeDiscreteParam(\nsigma\n, 2^(-1:1)))\nctrl = makeTuneControlGrid()\ninner = makeResampleDesc(\nHoldout\n)\nlrn1 = makeTuneWrapper(\nclassif.ksvm\n, resampling = inner, par.set = ps, control = ctrl,\n show.info = FALSE)\n\n## Tune k-nearest neighbor in inner resampling loop\nps = makeParamSet(makeDiscreteParam(\nk\n, 3:5))\nctrl = makeTuneControlGrid()\ninner = makeResampleDesc(\nSubsample\n, iters = 3)\nlrn2 = makeTuneWrapper(\nclassif.kknn\n, resampling = inner, par.set = ps, control = ctrl,\n show.info = FALSE)\n\n## Learners\nlrns = list(lrn1, lrn2)\n\n## Outer resampling loop\nouter = list(makeResampleDesc(\nHoldout\n), makeResampleDesc(\nBootstrap\n, iters = 2))\nres = benchmark(lrns, tasks, outer, measures = list(acc, ber), show.info = FALSE)\nres\n#\n task.id learner.id acc.test.mean ber.test.mean\n#\n 1 iris-example classif.ksvm.tuned 0.9400000 0.05882353\n#\n 2 iris-example classif.kknn.tuned 0.9200000 0.08683473\n#\n 3 Sonar-example classif.ksvm.tuned 0.5289307 0.50000000\n#\n 
4 Sonar-example classif.kknn.tuned 0.8077080 0.19549714\n\n\n\n\nThe \nprint\n method for the \nBenchmarkResult\n shows the aggregated performances\nfrom the outer resampling loop.\n\n\nAs you might recall, \nmlr\n offers several accessor function to extract information from\nthe benchmark result.\nThese are listed on the help page of \nBenchmarkResult\n and many examples are shown on the\ntutorial page about \nbenchmark experiments\n.\n\n\nThe performance values in individual outer resampling runs can be obtained by \ngetBMRPerformances\n.\nNote that, since we used different outer resampling strategies for the two tasks, the number\nof rows per task differ.\n\n\ngetBMRPerformances(res, as.df = TRUE)\n#\n task.id learner.id iter acc ber\n#\n 1 iris-example classif.ksvm.tuned 1 0.9400000 0.05882353\n#\n 2 iris-example classif.kknn.tuned 1 0.9200000 0.08683473\n#\n 3 Sonar-example classif.ksvm.tuned 1 0.5373134 0.50000000\n#\n 4 Sonar-example classif.ksvm.tuned 2 0.5205479 0.50000000\n#\n 5 Sonar-example classif.kknn.tuned 1 0.8208955 0.18234767\n#\n 6 Sonar-example classif.kknn.tuned 2 0.7945205 0.20864662\n\n\n\n\nThe results from the parameter tuning can be obtained through function \ngetBMRTuneResults\n.\n\n\ngetBMRTuneResults(res)\n#\n $`iris-example`\n#\n $`iris-example`$classif.ksvm.tuned\n#\n $`iris-example`$classif.ksvm.tuned[[1]]\n#\n Tune result:\n#\n Op. pars: C=0.5; sigma=0.5\n#\n mmce.test.mean=0.0588\n#\n \n#\n \n#\n $`iris-example`$classif.kknn.tuned\n#\n $`iris-example`$classif.kknn.tuned[[1]]\n#\n Tune result:\n#\n Op. pars: k=3\n#\n mmce.test.mean=0.049\n#\n \n#\n \n#\n \n#\n $`Sonar-example`\n#\n $`Sonar-example`$classif.ksvm.tuned\n#\n $`Sonar-example`$classif.ksvm.tuned[[1]]\n#\n Tune result:\n#\n Op. pars: C=1; sigma=2\n#\n mmce.test.mean=0.343\n#\n \n#\n $`Sonar-example`$classif.ksvm.tuned[[2]]\n#\n Tune result:\n#\n Op. pars: C=2; sigma=0.5\n#\n mmce.test.mean= 0.2\n#\n \n#\n \n#\n $`Sonar-example`$classif.kknn.tuned\n#\n $`Sonar-example`$classif.kknn.tuned[[1]]\n#\n Tune result:\n#\n Op. pars: k=4\n#\n mmce.test.mean=0.11\n#\n \n#\n $`Sonar-example`$classif.kknn.tuned[[2]]\n#\n Tune result:\n#\n Op. 
pars: k=3\n#\n mmce.test.mean=0.0667\n\n\n\n\nAs for several other accessor functions a clearer representation as \ndata.frame\n\ncan be achieved by setting \nas.df = TRUE\n.\n\n\ngetBMRTuneResults(res, as.df = TRUE)\n#\n task.id learner.id iter C sigma mmce.test.mean k\n#\n 1 iris-example classif.ksvm.tuned 1 0.5 0.5 0.05882353 NA\n#\n 2 iris-example classif.kknn.tuned 1 NA NA 0.04901961 3\n#\n 3 Sonar-example classif.ksvm.tuned 1 1.0 2.0 0.34285714 NA\n#\n 4 Sonar-example classif.ksvm.tuned 2 2.0 0.5 0.20000000 NA\n#\n 5 Sonar-example classif.kknn.tuned 1 NA NA 0.10952381 4\n#\n 6 Sonar-example classif.kknn.tuned 2 NA NA 0.06666667 3\n\n\n\n\nIt is also possible to extract the tuning results for individual tasks and learners and,\nas shown in earlier examples, inspect the \noptimization path\n.\n\n\ntune.res = getBMRTuneResults(res, task.ids = \nSonar-example\n, learner.ids = \nclassif.ksvm.tuned\n,\n as.df = TRUE)\ntune.res\n#\n task.id learner.id iter C sigma mmce.test.mean\n#\n 1 Sonar-example classif.ksvm.tuned 1 1 2.0 0.3428571\n#\n 2 Sonar-example classif.ksvm.tuned 2 2 0.5 0.2000000\n\ngetNestedTuneResultsOptPathDf(res$results[[\nSonar-example\n]][[\nclassif.ksvm.tuned\n]])\n#\n C sigma mmce.test.mean dob eol error.message exec.time iter\n#\n 1 0.5 0.5 0.3428571 1 NA \nNA\n 0.048 1\n#\n 2 1 0.5 0.3428571 2 NA \nNA\n 0.049 1\n#\n 3 2 0.5 0.3428571 3 NA \nNA\n 0.046 1\n#\n 4 0.5 1 0.3428571 4 NA \nNA\n 0.048 1\n#\n 5 1 1 0.3428571 5 NA \nNA\n 0.048 1\n#\n 6 2 1 0.3428571 6 NA \nNA\n 0.049 1\n#\n 7 0.5 2 0.3428571 7 NA \nNA\n 0.048 1\n#\n 8 1 2 0.3428571 8 NA \nNA\n 0.052 1\n#\n 9 2 2 0.3428571 9 NA \nNA\n 0.048 1\n#\n 10 0.5 0.5 0.2142857 1 NA \nNA\n 0.051 2\n#\n 11 1 0.5 0.2142857 2 NA \nNA\n 0.048 2\n#\n 12 2 0.5 0.2000000 3 NA \nNA\n 0.047 2\n#\n 13 0.5 1 0.2142857 4 NA \nNA\n 0.050 2\n#\n 14 1 1 0.2142857 5 NA \nNA\n 0.048 2\n#\n 15 2 1 0.2142857 6 NA \nNA\n 0.050 2\n#\n 16 0.5 2 0.2142857 7 NA \nNA\n 0.051 2\n#\n 17 1 2 0.2142857 8 NA \nNA\n 0.051 2\n#\n 18 2 2 0.2142857 9 NA \nNA\n 0.051 2\n\n\n\n\nExample 2: One task, two learners, feature selection\n\n\nLet's see how we can do \nfeature selection\n in\na benchmark experiment:\n\n\n## Feature selection in inner resampling loop\nctrl = makeFeatSelControlSequential(method = \nsfs\n)\ninner = makeResampleDesc(\nSubsample\n, iters = 2)\nlrn = makeFeatSelWrapper(\nregr.lm\n, resampling = inner, control = ctrl, show.info = FALSE)\n\n## Learners\nlrns = list(makeLearner(\nregr.rpart\n), lrn)\n\n## Outer resampling loop\nouter = makeResampleDesc(\nSubsample\n, iters = 2)\nres = benchmark(tasks = bh.task, learners = lrns, resampling = outer, show.info = FALSE)\n\nres\n#\n task.id learner.id mse.test.mean\n#\n 1 BostonHousing-example regr.rpart 25.86232\n#\n 2 BostonHousing-example regr.lm.featsel 25.07465\n\n\n\n\nThe selected features can be extracted by function \ngetBMRFeatSelResults\n.\n\n\ngetBMRFeatSelResults(res)\n#\n $`BostonHousing-example`\n#\n $`BostonHousing-example`$regr.rpart\n#\n NULL\n#\n \n#\n $`BostonHousing-example`$regr.lm.featsel\n#\n $`BostonHousing-example`$regr.lm.featsel[[1]]\n#\n FeatSel result:\n#\n Features (8): crim, zn, chas, nox, rm, dis, ptratio, lstat\n#\n mse.test.mean=26.7\n#\n \n#\n $`BostonHousing-example`$regr.lm.featsel[[2]]\n#\n FeatSel result:\n#\n Features (10): crim, zn, nox, rm, dis, rad, tax, ptratio, b, lstat\n#\n mse.test.mean=24.3\n\n\n\n\nYou can access results for individual learners and tasks and inspect them further.\n\n\nfeats = getBMRFeatSelResults(res, learner.id = 
\nregr.lm.featsel\n)\nfeats = feats$`BostonHousing-example`$`regr.lm.featsel`\n\n## Selected features in the first outer resampling iteration\nfeats[[1]]$x\n#\n [1] \ncrim\n \nzn\n \nchas\n \nnox\n \nrm\n \ndis\n \nptratio\n\n#\n [8] \nlstat\n\n\n## Resampled performance of the selected feature subset on the first inner training set\nfeats[[1]]$y\n#\n mse.test.mean \n#\n 26.72574\n\n\n\n\nAs for tuning, you can extract the optimization paths. The resulting \ndata.frame\ns\ncontain, among others, binary columns for all features, indicating if they were included in the\nlinear regression model, and the corresponding performances.\n\nanalyzeFeatSelResult\n gives a clearer overview.\n\n\nopt.paths = lapply(feats, function(x) as.data.frame(x$opt.path))\nhead(opt.paths[[1]])\n#\n crim zn indus chas nox rm age dis rad tax ptratio b lstat mse.test.mean\n#\n 1 0 0 0 0 0 0 0 0 0 0 0 0 0 90.16159\n#\n 2 1 0 0 0 0 0 0 0 0 0 0 0 0 82.85880\n#\n 3 0 1 0 0 0 0 0 0 0 0 0 0 0 79.55202\n#\n 4 0 0 1 0 0 0 0 0 0 0 0 0 0 70.02071\n#\n 5 0 0 0 1 0 0 0 0 0 0 0 0 0 86.93409\n#\n 6 0 0 0 0 1 0 0 0 0 0 0 0 0 76.32457\n#\n dob eol error.message exec.time\n#\n 1 1 2 \nNA\n 0.018\n#\n 2 2 2 \nNA\n 0.023\n#\n 3 2 2 \nNA\n 0.023\n#\n 4 2 2 \nNA\n 0.024\n#\n 5 2 2 \nNA\n 0.025\n#\n 6 2 2 \nNA\n 0.023\n\nanalyzeFeatSelResult(feats[[1]])\n#\n Features : 8\n#\n Performance : mse.test.mean=26.7\n#\n crim, zn, chas, nox, rm, dis, ptratio, lstat\n#\n \n#\n Path to optimum:\n#\n - Features: 0 Init : Perf = 90.162 Diff: NA *\n#\n - Features: 1 Add : lstat Perf = 42.646 Diff: 47.515 *\n#\n - Features: 2 Add : ptratio Perf = 34.52 Diff: 8.1263 *\n#\n - Features: 3 Add : rm Perf = 30.454 Diff: 4.066 *\n#\n - Features: 4 Add : dis Perf = 29.405 Diff: 1.0495 *\n#\n - Features: 5 Add : nox Perf = 28.059 Diff: 1.3454 *\n#\n - Features: 6 Add : chas Perf = 27.334 Diff: 0.72499 *\n#\n - Features: 7 Add : zn Perf = 26.901 Diff: 0.43296 *\n#\n - Features: 8 Add : crim Perf = 26.726 Diff: 0.17558 *\n#\n \n#\n Stopped, because no improving feature was found.\n\n\n\n\nExample 3: One task, two learners, feature filtering with tuning\n\n\nHere is a minimal example for feature filtering with tuning of the feature subset size.\n\n\n## Feature filtering with tuning in the inner resampling loop\nlrn = makeFilterWrapper(learner = \nregr.lm\n, fw.method = \nchi.squared\n)\nps = makeParamSet(makeDiscreteParam(\nfw.abs\n, values = seq_len(getTaskNFeats(bh.task))))\nctrl = makeTuneControlGrid()\ninner = makeResampleDesc(\nCV\n, iter = 2)\nlrn = makeTuneWrapper(lrn, resampling = inner, par.set = ps, control = ctrl,\n show.info = FALSE)\n\n## Learners\nlrns = list(makeLearner(\nregr.rpart\n), lrn)\n\n## Outer resampling loop\nouter = makeResampleDesc(\nSubsample\n, iter = 3)\nres = benchmark(tasks = bh.task, learners = lrns, resampling = outer, show.info = FALSE)\n\nres\n#\n task.id learner.id mse.test.mean\n#\n 1 BostonHousing-example regr.rpart 22.11687\n#\n 2 BostonHousing-example regr.lm.filtered.tuned 23.76666\n\n\n\n\n## Performances on individual outer test data sets\ngetBMRPerformances(res, as.df = TRUE)\n#\n task.id learner.id iter mse\n#\n 1 BostonHousing-example regr.rpart 1 23.55486\n#\n 2 BostonHousing-example regr.rpart 2 20.03453\n#\n 3 BostonHousing-example regr.rpart 3 22.76121\n#\n 4 BostonHousing-example regr.lm.filtered.tuned 1 27.51086\n#\n 5 BostonHousing-example regr.lm.filtered.tuned 2 24.87820\n#\n 6 BostonHousing-example regr.lm.filtered.tuned 3 18.91091", + "text": "Nested Resampling\n\n\nIn order to obtain honest performance 
estimates for a learner all parts of the model building\nlike preprocessing and model selection steps should be included in the resampling, i.e.,\nrepeated for every pair of training/test data.\nFor steps that themselves require resampling like \nparameter tuning\n or\n\nfeature selection\n (via the wrapper approach) this results in two\nnested resampling loops.\n\n\n\nThe graphic above illustrates nested resampling for parameter tuning with 3-fold cross-validation\nin the outer and 4-fold cross-validation in the inner loop.\n\n\nIn the outer resampling loop, we have three pairs of training/test sets.\nOn each of these outer training sets parameter tuning is done, thereby executing the inner\nresampling loop.\nThis way, we get one set of selected hyperparameters for each outer training set.\nThen the learner is fitted on each outer training set using the corresponding selected\nhyperparameters and its performance is evaluated on the outer test sets.\n\n\nIn \nmlr\n, you can get nested resampling for free without programming any looping by\nusing the \nwrapper functionality\n. This works as follows:\n\n\n\n\nGenerate a wrapped \nLearner\n via function \nmakeTuneWrapper\n or \nmakeFeatSelWrapper\n.\n Specify the inner resampling strategy using their \nresampling\n argument.\n\n\nCall function \nresample\n (see also the section about \nresampling\n) and\n pass the outer resampling strategy to its \nresampling\n argument.\n\n\n\n\nYou can freely combine different inner and outer resampling strategies.\n\n\nThe outer strategy can be a resample description (\nResampleDesc\n) or a\nresample instance (\nResampleInstance\n).\nA common setup is prediction and performance evaluation on a fixed outer test set. This can\nbe achieved by using function \nmakeFixedHoldoutInstance\n to generate the outer\n\nResampleInstance\n.\n\n\nThe inner resampling strategy should preferably be a \nResampleDesc\n, as the sizes\nof the outer training sets might differ.\nPer default, the inner resample description is instantiated once for every outer training set.\nThis way during tuning/feature selection all parameter or feature sets are compared\non the same inner training/test sets to reduce variance.\nYou can also turn this off using the \nsame.resampling.instance\n argument of \nmakeTuneControl*\n\nor \nmakeFeatSelControl*\n.\n\n\nNested resampling is computationally expensive.\nFor this reason in the examples shown below we use relatively small search spaces and a low\nnumber of resampling iterations. 
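As a side note, the same.resampling.instance switch mentioned above is simply an argument of the control objects; a minimal sketch (control type and maxit chosen arbitrarily):\n\n## Draw fresh inner splits for each evaluated configuration instead of reusing one instantiation\nctrl = makeTuneControlRandom(maxit = 10L, same.resampling.instance = FALSE)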
In practice, you normally have to increase both.\nAs this is computationally intensive you might want to have a look at section\n\nparallelization\n.\n\n\nTuning\n\n\nAs you might recall from the tutorial page about \ntuning\n, you need to define a search space by\nfunction \nmakeParamSet\n, a search strategy by \nmakeTuneControl*\n,\nand a method to evaluate hyperparameter settings (i.e., the inner resampling strategy and a performance measure).\n\n\nBelow is a classification example.\nWe evaluate the performance of a support vector machine (\nksvm\n) with tuned\ncost parameter \nC\n and RBF kernel parameter \nsigma\n.\nWe use 3-fold cross-validation in the outer and subsampling with 2 iterations in the inner\nloop.\nFor tuning a grid search is used to find the hyperparameters with lowest error rate\n(\nmmce\n is the default measure for classification).\nThe wrapped \nLearner\n is generated by calling \nmakeTuneWrapper\n.\n\n\nNote that in practice the parameter set should be larger.\nA common recommendation is \n2^(-12:12)\n for both \nC\n and \nsigma\n.\n\n\n## Tuning in inner resampling loop\nps = makeParamSet(\n makeDiscreteParam(\nC\n, values = 2^(-2:2)),\n makeDiscreteParam(\nsigma\n, values = 2^(-2:2))\n)\nctrl = makeTuneControlGrid()\ninner = makeResampleDesc(\nSubsample\n, iters = 2)\nlrn = makeTuneWrapper(\nclassif.ksvm\n, resampling = inner, par.set = ps, control = ctrl, show.info = FALSE)\n\n## Outer resampling loop\nouter = makeResampleDesc(\nCV\n, iters = 3)\nr = resample(lrn, iris.task, resampling = outer, extract = getTuneResult, show.info = FALSE)\n\nr\n#\n Resample Result\n#\n Task: iris-example\n#\n Learner: classif.ksvm.tuned\n#\n mmce.aggr: 0.05\n#\n mmce.mean: 0.05\n#\n mmce.sd: 0.03\n#\n Runtime: 23.346\n\n\n\n\nYou can obtain the error rates on the 3 outer test sets by:\n\n\nr$measures.test\n#\n iter mmce\n#\n 1 1 0.02\n#\n 2 2 0.06\n#\n 3 3 0.08\n\n\n\n\nAccessing the tuning result\n\n\nWe have kept the results of the tuning for further evaluations.\nFor example one might want to find out, if the best obtained configurations vary for the\ndifferent outer splits.\nAs storing entire models may be expensive (but possible by setting \nmodels = TRUE\n) we used\nthe \nextract\n option of \nresample\n.\nFunction \ngetTuneResult\n returns, among other things, the optimal hyperparameter values and\nthe \noptimization path\n for each iteration of the outer resampling loop.\nNote that the performance values shown when printing \nr$extract\n are the aggregated performances\nresulting from inner resampling on the outer training set for the best hyperparameter configurations\n(not to be confused with \nr$measures.test\n shown above).\n\n\nr$extract\n#\n [[1]]\n#\n Tune result:\n#\n Op. pars: C=2; sigma=0.25\n#\n mmce.test.mean=0.0147\n#\n \n#\n [[2]]\n#\n Tune result:\n#\n Op. pars: C=4; sigma=0.25\n#\n mmce.test.mean= 0\n#\n \n#\n [[3]]\n#\n Tune result:\n#\n Op. pars: C=4; sigma=0.25\n#\n mmce.test.mean=0.0735\n\nnames(r$extract[[1]])\n#\n [1] \nlearner\n \ncontrol\n \nx\n \ny\n \nthreshold\n \nopt.path\n\n\n\n\n\nWe can compare the optimal parameter settings obtained in the 3 resampling iterations.\nAs you can see, the optimal configuration usually depends on the data. 
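The runtime above is dominated by the two nested loops; as mentioned earlier, these can be parallelized, for example with the parallelMap package (a sketch, assuming parallelMap is installed and reusing the lrn and outer objects from above):\n\nlibrary(parallelMap)\n## Evaluate the inner tuning grid on 2 worker processes\nparallelStartSocket(2, level = "mlr.tuneParams")\nr = resample(lrn, iris.task, resampling = outer, extract = getTuneResult, show.info = FALSE)\nparallelStop()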
You may\nbe able to identify a \nrange\n of parameter settings that achieve good\nperformance though, e.g., the values for \nC\n should be at least 1 and the values\nfor \nsigma\n should be between 0 and 1.\n\n\nWith function \ngetNestedTuneResultsOptPathDf\n you can extract the optimization paths\nfor the 3 outer cross-validation iterations for further inspection and analysis.\nThese are stacked in one \ndata.frame\n with column \niter\n indicating the\nresampling iteration.\n\n\nopt.paths = getNestedTuneResultsOptPathDf(r)\nhead(opt.paths, 10)\n#\n C sigma mmce.test.mean dob eol error.message exec.time iter\n#\n 1 0.25 0.25 0.05882353 1 NA \nNA\n 0.048 1\n#\n 2 0.5 0.25 0.04411765 2 NA \nNA\n 0.045 1\n#\n 3 1 0.25 0.04411765 3 NA \nNA\n 0.045 1\n#\n 4 2 0.25 0.01470588 4 NA \nNA\n 0.046 1\n#\n 5 4 0.25 0.05882353 5 NA \nNA\n 0.045 1\n#\n 6 0.25 0.5 0.05882353 6 NA \nNA\n 0.047 1\n#\n 7 0.5 0.5 0.01470588 7 NA \nNA\n 0.046 1\n#\n 8 1 0.5 0.02941176 8 NA \nNA\n 0.045 1\n#\n 9 2 0.5 0.01470588 9 NA \nNA\n 0.045 1\n#\n 10 4 0.5 0.05882353 10 NA \nNA\n 0.046 1\n\n\n\n\nBelow we visualize the \nopt.path\ns for the 3 outer resampling iterations.\n\n\ng = ggplot(opt.paths, aes(x = C, y = sigma, fill = mmce.test.mean))\ng + geom_tile() + facet_wrap(~ iter)\n\n\n\n\n \n\n\nAnother useful function is \ngetNestedTuneResultsX\n, which extracts the best found hyperparameter\nsettings for each outer resampling iteration.\n\n\ngetNestedTuneResultsX(r)\n#\n C sigma\n#\n 1 2 0.25\n#\n 2 4 0.25\n#\n 3 4 0.25\n\n\n\n\nFeature selection\n\n\nAs you might recall from the section about \nfeature selection\n, \nmlr\n\nsupports the filter and the wrapper approach.\n\n\nWrapper methods\n\n\nWrapper methods use the performance of a learning algorithm to assess the usefulness of a\nfeature set. In order to select a feature subset a learner is trained repeatedly on different\nfeature subsets and the subset which leads to the best learner performance is chosen.\n\n\nFor feature selection in the inner resampling loop, you need to choose a search strategy\n(function \nmakeFeatSelControl*\n), a performance measure and the inner\nresampling strategy. 
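Besides the sequential strategy used below, further makeFeatSelControl* strategies are available; a sketch (iteration counts arbitrary):\n\nctrl.seq = makeFeatSelControlSequential(method = "sfs")  ## sequential forward search\nctrl.rnd = makeFeatSelControlRandom(maxit = 20L)  ## random search\nctrl.ga = makeFeatSelControlGA(maxit = 10L)  ## genetic algorithm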
Then use function \nmakeFeatSelWrapper\n to bind everything together.\n\n\nBelow we use sequential forward selection with linear regression on the\n\nBostonHousing\n data set (\nbh.task\n).\n\n\n## Feature selection in inner resampling loop\ninner = makeResampleDesc(\nCV\n, iters = 3)\nlrn = makeFeatSelWrapper(\nregr.lm\n, resampling = inner,\n control = makeFeatSelControlSequential(method = \nsfs\n), show.info = FALSE)\n\n## Outer resampling loop\nouter = makeResampleDesc(\nSubsample\n, iters = 2)\nr = resample(learner = lrn, task = bh.task, resampling = outer, extract = getFeatSelResult,\n show.info = FALSE)\n\nr\n#\n Resample Result\n#\n Task: BostonHousing-example\n#\n Learner: regr.lm.featsel\n#\n mse.aggr: 31.70\n#\n mse.mean: 31.70\n#\n mse.sd: 4.79\n#\n Runtime: 52.561\n\nr$measures.test\n#\n iter mse\n#\n 1 1 35.08611\n#\n 2 2 28.31215\n\n\n\n\nAccessing the selected features\n\n\nThe result of the feature selection can be extracted by function \ngetFeatSelResult\n.\nIt is also possible to keep whole \nmodels\n by setting \nmodels = TRUE\n\nwhen calling \nresample\n.\n\n\nr$extract\n#\n [[1]]\n#\n FeatSel result:\n#\n Features (10): crim, zn, indus, nox, rm, dis, rad, tax, ptratio, lstat\n#\n mse.test.mean=20.2\n#\n \n#\n [[2]]\n#\n FeatSel result:\n#\n Features (9): zn, nox, rm, dis, rad, tax, ptratio, b, lstat\n#\n mse.test.mean=22.6\n\n## Selected features in the first outer resampling iteration\nr$extract[[1]]$x\n#\n [1] \ncrim\n \nzn\n \nindus\n \nnox\n \nrm\n \ndis\n \nrad\n \n#\n [8] \ntax\n \nptratio\n \nlstat\n\n\n## Resampled performance of the selected feature subset on the first inner training set\nr$extract[[1]]$y\n#\n mse.test.mean \n#\n 20.15939\n\n\n\n\nAs for tuning, you can extract the optimization paths.\nThe resulting \ndata.frame\ns contain, among others, binary columns for\nall features, indicating if they were included in the linear regression model, and the\ncorresponding performances.\n\n\nopt.paths = lapply(r$extract, function(x) as.data.frame(x$opt.path))\nhead(opt.paths[[1]])\n#\n crim zn indus chas nox rm age dis rad tax ptratio b lstat mse.test.mean\n#\n 1 0 0 0 0 0 0 0 0 0 0 0 0 0 80.33019\n#\n 2 1 0 0 0 0 0 0 0 0 0 0 0 0 65.95316\n#\n 3 0 1 0 0 0 0 0 0 0 0 0 0 0 69.15417\n#\n 4 0 0 1 0 0 0 0 0 0 0 0 0 0 55.75473\n#\n 5 0 0 0 1 0 0 0 0 0 0 0 0 0 80.48765\n#\n 6 0 0 0 0 1 0 0 0 0 0 0 0 0 63.06724\n#\n dob eol error.message exec.time\n#\n 1 1 2 \nNA\n 0.022\n#\n 2 2 2 \nNA\n 0.033\n#\n 3 2 2 \nNA\n 0.032\n#\n 4 2 2 \nNA\n 0.031\n#\n 5 2 2 \nNA\n 0.034\n#\n 6 2 2 \nNA\n 0.031\n\n\n\n\nAn easy-to-read version of the optimization path for sequential feature selection can be\nobtained with function \nanalyzeFeatSelResult\n.\n\n\nanalyzeFeatSelResult(r$extract[[1]])\n#\n Features : 10\n#\n Performance : mse.test.mean=20.2\n#\n crim, zn, indus, nox, rm, dis, rad, tax, ptratio, lstat\n#\n \n#\n Path to optimum:\n#\n - Features: 0 Init : Perf = 80.33 Diff: NA *\n#\n - Features: 1 Add : lstat Perf = 36.451 Diff: 43.879 *\n#\n - Features: 2 Add : rm Perf = 27.289 Diff: 9.1623 *\n#\n - Features: 3 Add : ptratio Perf = 24.004 Diff: 3.2849 *\n#\n - Features: 4 Add : nox Perf = 23.513 Diff: 0.49082 *\n#\n - Features: 5 Add : dis Perf = 21.49 Diff: 2.023 *\n#\n - Features: 6 Add : crim Perf = 21.12 Diff: 0.37008 *\n#\n - Features: 7 Add : indus Perf = 20.82 Diff: 0.29994 *\n#\n - Features: 8 Add : rad Perf = 20.609 Diff: 0.21054 *\n#\n - Features: 9 Add : tax Perf = 20.209 Diff: 0.40059 *\n#\n - Features: 10 Add : zn Perf = 20.159 Diff: 0.049441 *\n#\n \n#\n Stopped, 
because no improving feature was found.\n\n\n\n\nFilter methods with tuning\n\n\nFilter methods assign an importance value to each feature.\nBased on these values you can select a feature subset by either keeping all features with importance\nhigher than a certain threshold or by keeping a fixed number or percentage of the highest ranking features.\nOften, neither the theshold nor the number or percentage of features is known in advance\nand thus tuning is necessary.\n\n\nIn the example below the threshold value (\nfw.threshold\n) is tuned in the inner resampling loop.\nFor this purpose the base \nLearner\n \n\"regr.lm\"\n is wrapped two times.\nFirst, \nmakeFilterWrapper\n is used to fuse linear regression with a feature filtering\npreprocessing step. Then a tuning step is added by \nmakeTuneWrapper\n.\n\n\n## Tuning of the percentage of selected filters in the inner loop\nlrn = makeFilterWrapper(learner = \nregr.lm\n, fw.method = \nchi.squared\n)\nps = makeParamSet(makeDiscreteParam(\nfw.threshold\n, values = seq(0, 1, 0.2)))\nctrl = makeTuneControlGrid()\ninner = makeResampleDesc(\nCV\n, iters = 3)\nlrn = makeTuneWrapper(lrn, resampling = inner, par.set = ps, control = ctrl, show.info = FALSE)\n\n## Outer resampling loop\nouter = makeResampleDesc(\nCV\n, iters = 3)\nr = resample(learner = lrn, task = bh.task, resampling = outer, models = TRUE, show.info = FALSE)\nr\n#\n Resample Result\n#\n Task: BostonHousing-example\n#\n Learner: regr.lm.filtered.tuned\n#\n mse.aggr: 25.39\n#\n mse.mean: 25.39\n#\n mse.sd: 8.35\n#\n Runtime: 9.75966\n\n\n\n\nAccessing the selected features and optimal percentage\n\n\nIn the above example we kept the complete \nmodel\ns.\n\n\nBelow are some examples that show how to extract information from the \nmodel\ns.\n\n\nr$models\n#\n [[1]]\n#\n Model for learner.id=regr.lm.filtered.tuned; learner.class=TuneWrapper\n#\n Trained on: task.id = BostonHousing-example; obs = 337; features = 13\n#\n Hyperparameters: fw.method=chi.squared\n#\n \n#\n [[2]]\n#\n Model for learner.id=regr.lm.filtered.tuned; learner.class=TuneWrapper\n#\n Trained on: task.id = BostonHousing-example; obs = 338; features = 13\n#\n Hyperparameters: fw.method=chi.squared\n#\n \n#\n [[3]]\n#\n Model for learner.id=regr.lm.filtered.tuned; learner.class=TuneWrapper\n#\n Trained on: task.id = BostonHousing-example; obs = 337; features = 13\n#\n Hyperparameters: fw.method=chi.squared\n\n\n\n\nThe result of the feature selection can be extracted by function \ngetFilteredFeatures\n.\nAlmost always all 13 features are selected.\n\n\nlapply(r$models, function(x) getFilteredFeatures(x$learner.model$next.model))\n#\n [[1]]\n#\n [1] \ncrim\n \nzn\n \nindus\n \nchas\n \nnox\n \nrm\n \nage\n \n#\n [8] \ndis\n \nrad\n \ntax\n \nptratio\n \nb\n \nlstat\n \n#\n \n#\n [[2]]\n#\n [1] \ncrim\n \nzn\n \nindus\n \nnox\n \nrm\n \nage\n \ndis\n \n#\n [8] \nrad\n \ntax\n \nptratio\n \nb\n \nlstat\n \n#\n \n#\n [[3]]\n#\n [1] \ncrim\n \nzn\n \nindus\n \nchas\n \nnox\n \nrm\n \nage\n \n#\n [8] \ndis\n \nrad\n \ntax\n \nptratio\n \nb\n \nlstat\n\n\n\n\n\nBelow the \ntune results\n and \noptimization paths\n\nare accessed.\n\n\nres = lapply(r$models, getTuneResult)\nres\n#\n [[1]]\n#\n Tune result:\n#\n Op. pars: fw.threshold=0\n#\n mse.test.mean=24.9\n#\n \n#\n [[2]]\n#\n Tune result:\n#\n Op. pars: fw.threshold=0.4\n#\n mse.test.mean=27.2\n#\n \n#\n [[3]]\n#\n Tune result:\n#\n Op. 
pars: fw.threshold=0\n#\n mse.test.mean=19.7\n\nopt.paths = lapply(res, function(x) as.data.frame(x$opt.path))\nopt.paths[[1]]\n#\n fw.threshold mse.test.mean dob eol error.message exec.time\n#\n 1 0 24.89160 1 NA \nNA\n 0.584\n#\n 2 0.2 25.18817 2 NA \nNA\n 0.283\n#\n 3 0.4 25.18817 3 NA \nNA\n 0.263\n#\n 4 0.6 32.15930 4 NA \nNA\n 0.251\n#\n 5 0.8 90.89848 5 NA \nNA\n 0.241\n#\n 6 1 90.89848 6 NA \nNA\n 0.228\n\n\n\n\nBenchmark experiments\n\n\nIn a benchmark experiment multiple learners are compared on one or several tasks\n(see also the section about \nbenchmarking\n).\nNested resampling in benchmark experiments is achieved the same way as in resampling:\n\n\n\n\nFirst, use \nmakeTuneWrapper\n or \nmakeFeatSelWrapper\n to generate wrapped \nLearner\ns\n with the inner resampling strategies of your choice.\n\n\nSecond, call \nbenchmark\n and specify the outer resampling strategies for all tasks.\n\n\n\n\nThe inner resampling strategies should be \nresample descriptions\n.\nYou can use different inner resampling strategies for different wrapped learners.\nFor example it might be practical to do fewer subsampling or bootstrap iterations for slower\nlearners.\n\n\nIf you have larger benchmark experiments you might want to have a look at the section\nabout \nparallelization\n.\n\n\nAs mentioned in the section about \nbenchmark experiments\n you can also use\ndifferent resampling strategies for different learning tasks by passing a\n\nlist\n of resampling descriptions or instances to \nbenchmark\n.\n\n\nWe will see three examples to show different benchmark settings:\n\n\n\n\nTwo data sets + two classification algorithms + tuning\n\n\nOne data set + two regression algorithms + feature selection\n\n\nOne data set + two regression algorithms + feature filtering + tuning\n\n\n\n\nExample 1: Two tasks, two learners, tuning\n\n\nBelow is a benchmark experiment with two data sets, \niris\n and\n\nsonar\n, and two \nLearner\ns,\n\nksvm\n and \nkknn\n, that are both tuned.\n\n\nAs inner resampling strategies we use holdout for \nksvm\n and subsampling\nwith 3 iterations for \nkknn\n.\nAs outer resampling strategies we take holdout for the \niris\n and bootstrap\nwith 2 iterations for the \nsonar\n data (\nsonar.task\n).\nWe consider the accuracy (\nacc\n), which is used as tuning criterion, and also\ncalculate the balanced error rate (\nber\n).\n\n\n## List of learning tasks\ntasks = list(iris.task, sonar.task)\n\n## Tune svm in the inner resampling loop\nps = makeParamSet(\n makeDiscreteParam(\nC\n, 2^(-1:1)),\n makeDiscreteParam(\nsigma\n, 2^(-1:1)))\nctrl = makeTuneControlGrid()\ninner = makeResampleDesc(\nHoldout\n)\nlrn1 = makeTuneWrapper(\nclassif.ksvm\n, resampling = inner, par.set = ps, control = ctrl,\n show.info = FALSE)\n\n## Tune k-nearest neighbor in inner resampling loop\nps = makeParamSet(makeDiscreteParam(\nk\n, 3:5))\nctrl = makeTuneControlGrid()\ninner = makeResampleDesc(\nSubsample\n, iters = 3)\nlrn2 = makeTuneWrapper(\nclassif.kknn\n, resampling = inner, par.set = ps, control = ctrl,\n show.info = FALSE)\n\n## Learners\nlrns = list(lrn1, lrn2)\n\n## Outer resampling loop\nouter = list(makeResampleDesc(\nHoldout\n), makeResampleDesc(\nBootstrap\n, iters = 2))\nres = benchmark(lrns, tasks, outer, measures = list(acc, ber), show.info = FALSE)\nres\n#\n task.id learner.id acc.test.mean ber.test.mean\n#\n 1 iris-example classif.ksvm.tuned 0.9400000 0.05882353\n#\n 2 iris-example classif.kknn.tuned 0.9200000 0.08683473\n#\n 3 Sonar-example classif.ksvm.tuned 0.5289307 0.50000000\n#\n 
4 Sonar-example classif.kknn.tuned 0.8077080 0.19549714\n\n\n\n\nThe \nprint\n method for the \nBenchmarkResult\n shows the aggregated performances\nfrom the outer resampling loop.\n\n\nAs you might recall, \nmlr\n offers several accessor function to extract information from\nthe benchmark result.\nThese are listed on the help page of \nBenchmarkResult\n and many examples are shown on the\ntutorial page about \nbenchmark experiments\n.\n\n\nThe performance values in individual outer resampling runs can be obtained by \ngetBMRPerformances\n.\nNote that, since we used different outer resampling strategies for the two tasks, the number\nof rows per task differ.\n\n\ngetBMRPerformances(res, as.df = TRUE)\n#\n task.id learner.id iter acc ber\n#\n 1 iris-example classif.ksvm.tuned 1 0.9400000 0.05882353\n#\n 2 iris-example classif.kknn.tuned 1 0.9200000 0.08683473\n#\n 3 Sonar-example classif.ksvm.tuned 1 0.5373134 0.50000000\n#\n 4 Sonar-example classif.ksvm.tuned 2 0.5205479 0.50000000\n#\n 5 Sonar-example classif.kknn.tuned 1 0.8208955 0.18234767\n#\n 6 Sonar-example classif.kknn.tuned 2 0.7945205 0.20864662\n\n\n\n\nThe results from the parameter tuning can be obtained through function \ngetBMRTuneResults\n.\n\n\ngetBMRTuneResults(res)\n#\n $`iris-example`\n#\n $`iris-example`$classif.ksvm.tuned\n#\n $`iris-example`$classif.ksvm.tuned[[1]]\n#\n Tune result:\n#\n Op. pars: C=0.5; sigma=0.5\n#\n mmce.test.mean=0.0588\n#\n \n#\n \n#\n $`iris-example`$classif.kknn.tuned\n#\n $`iris-example`$classif.kknn.tuned[[1]]\n#\n Tune result:\n#\n Op. pars: k=3\n#\n mmce.test.mean=0.049\n#\n \n#\n \n#\n \n#\n $`Sonar-example`\n#\n $`Sonar-example`$classif.ksvm.tuned\n#\n $`Sonar-example`$classif.ksvm.tuned[[1]]\n#\n Tune result:\n#\n Op. pars: C=1; sigma=2\n#\n mmce.test.mean=0.343\n#\n \n#\n $`Sonar-example`$classif.ksvm.tuned[[2]]\n#\n Tune result:\n#\n Op. pars: C=2; sigma=0.5\n#\n mmce.test.mean= 0.2\n#\n \n#\n \n#\n $`Sonar-example`$classif.kknn.tuned\n#\n $`Sonar-example`$classif.kknn.tuned[[1]]\n#\n Tune result:\n#\n Op. pars: k=4\n#\n mmce.test.mean=0.11\n#\n \n#\n $`Sonar-example`$classif.kknn.tuned[[2]]\n#\n Tune result:\n#\n Op. 
pars: k=3\n#\n mmce.test.mean=0.0667\n\n\n\n\nAs for several other accessor functions a clearer representation as \ndata.frame\n\ncan be achieved by setting \nas.df = TRUE\n.\n\n\ngetBMRTuneResults(res, as.df = TRUE)\n#\n task.id learner.id iter C sigma mmce.test.mean k\n#\n 1 iris-example classif.ksvm.tuned 1 0.5 0.5 0.05882353 NA\n#\n 2 iris-example classif.kknn.tuned 1 NA NA 0.04901961 3\n#\n 3 Sonar-example classif.ksvm.tuned 1 1.0 2.0 0.34285714 NA\n#\n 4 Sonar-example classif.ksvm.tuned 2 2.0 0.5 0.20000000 NA\n#\n 5 Sonar-example classif.kknn.tuned 1 NA NA 0.10952381 4\n#\n 6 Sonar-example classif.kknn.tuned 2 NA NA 0.06666667 3\n\n\n\n\nIt is also possible to extract the tuning results for individual tasks and learners and,\nas shown in earlier examples, inspect the \noptimization path\n.\n\n\ntune.res = getBMRTuneResults(res, task.ids = \nSonar-example\n, learner.ids = \nclassif.ksvm.tuned\n,\n as.df = TRUE)\ntune.res\n#\n task.id learner.id iter C sigma mmce.test.mean\n#\n 1 Sonar-example classif.ksvm.tuned 1 1 2.0 0.3428571\n#\n 2 Sonar-example classif.ksvm.tuned 2 2 0.5 0.2000000\n\ngetNestedTuneResultsOptPathDf(res$results[[\nSonar-example\n]][[\nclassif.ksvm.tuned\n]])\n#\n C sigma mmce.test.mean dob eol error.message exec.time iter\n#\n 1 0.5 0.5 0.3428571 1 NA \nNA\n 0.056 1\n#\n 2 1 0.5 0.3428571 2 NA \nNA\n 0.052 1\n#\n 3 2 0.5 0.3428571 3 NA \nNA\n 0.048 1\n#\n 4 0.5 1 0.3428571 4 NA \nNA\n 0.049 1\n#\n 5 1 1 0.3428571 5 NA \nNA\n 0.048 1\n#\n 6 2 1 0.3428571 6 NA \nNA\n 0.047 1\n#\n 7 0.5 2 0.3428571 7 NA \nNA\n 0.049 1\n#\n 8 1 2 0.3428571 8 NA \nNA\n 0.052 1\n#\n 9 2 2 0.3428571 9 NA \nNA\n 0.048 1\n#\n 10 0.5 0.5 0.2142857 1 NA \nNA\n 0.052 2\n#\n 11 1 0.5 0.2142857 2 NA \nNA\n 0.052 2\n#\n 12 2 0.5 0.2000000 3 NA \nNA\n 0.048 2\n#\n 13 0.5 1 0.2142857 4 NA \nNA\n 0.049 2\n#\n 14 1 1 0.2142857 5 NA \nNA\n 0.049 2\n#\n 15 2 1 0.2142857 6 NA \nNA\n 0.051 2\n#\n 16 0.5 2 0.2142857 7 NA \nNA\n 0.056 2\n#\n 17 1 2 0.2142857 8 NA \nNA\n 0.056 2\n#\n 18 2 2 0.2142857 9 NA \nNA\n 0.051 2\n\n\n\n\nExample 2: One task, two learners, feature selection\n\n\nLet's see how we can do \nfeature selection\n in\na benchmark experiment:\n\n\n## Feature selection in inner resampling loop\nctrl = makeFeatSelControlSequential(method = \nsfs\n)\ninner = makeResampleDesc(\nSubsample\n, iters = 2)\nlrn = makeFeatSelWrapper(\nregr.lm\n, resampling = inner, control = ctrl, show.info = FALSE)\n\n## Learners\nlrns = list(makeLearner(\nregr.rpart\n), lrn)\n\n## Outer resampling loop\nouter = makeResampleDesc(\nSubsample\n, iters = 2)\nres = benchmark(tasks = bh.task, learners = lrns, resampling = outer, show.info = FALSE)\n\nres\n#\n task.id learner.id mse.test.mean\n#\n 1 BostonHousing-example regr.rpart 25.86232\n#\n 2 BostonHousing-example regr.lm.featsel 25.07465\n\n\n\n\nThe selected features can be extracted by function \ngetBMRFeatSelResults\n.\n\n\ngetBMRFeatSelResults(res)\n#\n $`BostonHousing-example`\n#\n $`BostonHousing-example`$regr.rpart\n#\n NULL\n#\n \n#\n $`BostonHousing-example`$regr.lm.featsel\n#\n $`BostonHousing-example`$regr.lm.featsel[[1]]\n#\n FeatSel result:\n#\n Features (8): crim, zn, chas, nox, rm, dis, ptratio, lstat\n#\n mse.test.mean=26.7\n#\n \n#\n $`BostonHousing-example`$regr.lm.featsel[[2]]\n#\n FeatSel result:\n#\n Features (10): crim, zn, nox, rm, dis, rad, tax, ptratio, b, lstat\n#\n mse.test.mean=24.3\n\n\n\n\nYou can access results for individual learners and tasks and inspect them further.\n\n\nfeats = getBMRFeatSelResults(res, learner.id = 
\nregr.lm.featsel\n)\nfeats = feats$`BostonHousing-example`$`regr.lm.featsel`\n\n## Selected features in the first outer resampling iteration\nfeats[[1]]$x\n#\n [1] \ncrim\n \nzn\n \nchas\n \nnox\n \nrm\n \ndis\n \nptratio\n\n#\n [8] \nlstat\n\n\n## Resampled performance of the selected feature subset on the first inner training set\nfeats[[1]]$y\n#\n mse.test.mean \n#\n 26.72574\n\n\n\n\nAs for tuning, you can extract the optimization paths. The resulting \ndata.frame\ns\ncontain, among others, binary columns for all features, indicating if they were included in the\nlinear regression model, and the corresponding performances.\n\nanalyzeFeatSelResult\n gives a clearer overview.\n\n\nopt.paths = lapply(feats, function(x) as.data.frame(x$opt.path))\nhead(opt.paths[[1]])\n#\n crim zn indus chas nox rm age dis rad tax ptratio b lstat mse.test.mean\n#\n 1 0 0 0 0 0 0 0 0 0 0 0 0 0 90.16159\n#\n 2 1 0 0 0 0 0 0 0 0 0 0 0 0 82.85880\n#\n 3 0 1 0 0 0 0 0 0 0 0 0 0 0 79.55202\n#\n 4 0 0 1 0 0 0 0 0 0 0 0 0 0 70.02071\n#\n 5 0 0 0 1 0 0 0 0 0 0 0 0 0 86.93409\n#\n 6 0 0 0 0 1 0 0 0 0 0 0 0 0 76.32457\n#\n dob eol error.message exec.time\n#\n 1 1 2 \nNA\n 0.019\n#\n 2 2 2 \nNA\n 0.024\n#\n 3 2 2 \nNA\n 0.024\n#\n 4 2 2 \nNA\n 0.024\n#\n 5 2 2 \nNA\n 0.027\n#\n 6 2 2 \nNA\n 0.024\n\nanalyzeFeatSelResult(feats[[1]])\n#\n Features : 8\n#\n Performance : mse.test.mean=26.7\n#\n crim, zn, chas, nox, rm, dis, ptratio, lstat\n#\n \n#\n Path to optimum:\n#\n - Features: 0 Init : Perf = 90.162 Diff: NA *\n#\n - Features: 1 Add : lstat Perf = 42.646 Diff: 47.515 *\n#\n - Features: 2 Add : ptratio Perf = 34.52 Diff: 8.1263 *\n#\n - Features: 3 Add : rm Perf = 30.454 Diff: 4.066 *\n#\n - Features: 4 Add : dis Perf = 29.405 Diff: 1.0495 *\n#\n - Features: 5 Add : nox Perf = 28.059 Diff: 1.3454 *\n#\n - Features: 6 Add : chas Perf = 27.334 Diff: 0.72499 *\n#\n - Features: 7 Add : zn Perf = 26.901 Diff: 0.43296 *\n#\n - Features: 8 Add : crim Perf = 26.726 Diff: 0.17558 *\n#\n \n#\n Stopped, because no improving feature was found.\n\n\n\n\nExample 3: One task, two learners, feature filtering with tuning\n\n\nHere is a minimal example for feature filtering with tuning of the feature subset size.\n\n\n## Feature filtering with tuning in the inner resampling loop\nlrn = makeFilterWrapper(learner = \nregr.lm\n, fw.method = \nchi.squared\n)\nps = makeParamSet(makeDiscreteParam(\nfw.abs\n, values = seq_len(getTaskNFeats(bh.task))))\nctrl = makeTuneControlGrid()\ninner = makeResampleDesc(\nCV\n, iter = 2)\nlrn = makeTuneWrapper(lrn, resampling = inner, par.set = ps, control = ctrl,\n show.info = FALSE)\n\n## Learners\nlrns = list(makeLearner(\nregr.rpart\n), lrn)\n\n## Outer resampling loop\nouter = makeResampleDesc(\nSubsample\n, iter = 3)\nres = benchmark(tasks = bh.task, learners = lrns, resampling = outer, show.info = FALSE)\n\nres\n#\n task.id learner.id mse.test.mean\n#\n 1 BostonHousing-example regr.rpart 22.11687\n#\n 2 BostonHousing-example regr.lm.filtered.tuned 23.76666\n\n\n\n\n## Performances on individual outer test data sets\ngetBMRPerformances(res, as.df = TRUE)\n#\n task.id learner.id iter mse\n#\n 1 BostonHousing-example regr.rpart 1 23.55486\n#\n 2 BostonHousing-example regr.rpart 2 20.03453\n#\n 3 BostonHousing-example regr.rpart 3 22.76121\n#\n 4 BostonHousing-example regr.lm.filtered.tuned 1 27.51086\n#\n 5 BostonHousing-example regr.lm.filtered.tuned 2 24.87820\n#\n 6 BostonHousing-example regr.lm.filtered.tuned 3 18.91091", "title": "Nested Resampling" }, { @@ -432,22 +432,22 @@ }, { "location": 
"/nested_resampling/index.html#tuning", - "text": "As you might recall from the tutorial page about tuning , you need to define a search space by\nfunction makeParamSet , a search strategy by makeTuneControl* ,\nand a method to evaluate hyperparameter settings (i.e., the inner resampling strategy and a performance measure). Below is a classification example.\nWe evaluate the performance of a support vector machine ( ksvm ) with tuned\ncost parameter C and RBF kernel parameter sigma .\nWe use 3-fold cross-validation in the outer and subsampling with 2 iterations in the inner\nloop.\nFor tuning a grid search is used to find the hyperparameters with lowest error rate\n( mmce is the default measure for classification).\nThe wrapped Learner is generated by calling makeTuneWrapper . Note that in practice the parameter set should be larger.\nA common recommendation is 2^(-12:12) for both C and sigma . ## Tuning in inner resampling loop\nps = makeParamSet(\n makeDiscreteParam( C , values = 2^(-2:2)),\n makeDiscreteParam( sigma , values = 2^(-2:2))\n)\nctrl = makeTuneControlGrid()\ninner = makeResampleDesc( Subsample , iters = 2)\nlrn = makeTuneWrapper( classif.ksvm , resampling = inner, par.set = ps, control = ctrl, show.info = FALSE)\n\n## Outer resampling loop\nouter = makeResampleDesc( CV , iters = 3)\nr = resample(lrn, iris.task, resampling = outer, extract = getTuneResult, show.info = FALSE)\n\nr\n# Resample Result\n# Task: iris-example\n# Learner: classif.ksvm.tuned\n# mmce.aggr: 0.05\n# mmce.mean: 0.05\n# mmce.sd: 0.03\n# Runtime: 18.9149 You can obtain the error rates on the 3 outer test sets by: r$measures.test\n# iter mmce\n# 1 1 0.02\n# 2 2 0.06\n# 3 3 0.08 Accessing the tuning result We have kept the results of the tuning for further evaluations.\nFor example one might want to find out, if the best obtained configurations vary for the\ndifferent outer splits.\nAs storing entire models may be expensive (but possible by setting models = TRUE ) we used\nthe extract option of resample .\nFunction getTuneResult returns, among other things, the optimal hyperparameter values and\nthe optimization path for each iteration of the outer resampling loop.\nNote that the performance values shown when printing r$extract are the aggregated performances\nresulting from inner resampling on the outer training set for the best hyperparameter configurations\n(not to be confused with r$measures.test shown above). r$extract\n# [[1]]\n# Tune result:\n# Op. pars: C=2; sigma=0.25\n# mmce.test.mean=0.0147\n# \n# [[2]]\n# Tune result:\n# Op. pars: C=4; sigma=0.25\n# mmce.test.mean= 0\n# \n# [[3]]\n# Tune result:\n# Op. pars: C=4; sigma=0.25\n# mmce.test.mean=0.0735\n\nnames(r$extract[[1]])\n# [1] learner control x y threshold opt.path We can compare the optimal parameter settings obtained in the 3 resampling iterations.\nAs you can see, the optimal configuration usually depends on the data. You may\nbe able to identify a range of parameter settings that achieve good\nperformance though, e.g., the values for C should be at least 1 and the values\nfor sigma should be between 0 and 1. With function getNestedTuneResultsOptPathDf you can extract the optimization paths\nfor the 3 outer cross-validation iterations for further inspection and analysis.\nThese are stacked in one data.frame with column iter indicating the\nresampling iteration. 
opt.paths = getNestedTuneResultsOptPathDf(r)\nhead(opt.paths, 10)\n# C sigma mmce.test.mean dob eol error.message exec.time iter\n# 1 0.25 0.25 0.05882353 1 NA NA 0.046 1\n# 2 0.5 0.25 0.04411765 2 NA NA 0.045 1\n# 3 1 0.25 0.04411765 3 NA NA 0.046 1\n# 4 2 0.25 0.01470588 4 NA NA 0.047 1\n# 5 4 0.25 0.05882353 5 NA NA 0.046 1\n# 6 0.25 0.5 0.05882353 6 NA NA 0.046 1\n# 7 0.5 0.5 0.01470588 7 NA NA 0.046 1\n# 8 1 0.5 0.02941176 8 NA NA 0.045 1\n# 9 2 0.5 0.01470588 9 NA NA 0.046 1\n# 10 4 0.5 0.05882353 10 NA NA 0.049 1 Below we visualize the opt.path s for the 3 outer resampling iterations. g = ggplot(opt.paths, aes(x = C, y = sigma, fill = mmce.test.mean))\ng + geom_tile() + facet_wrap(~ iter) Another useful function is getNestedTuneResultsX , which extracts the best found hyperparameter\nsettings for each outer resampling iteration. getNestedTuneResultsX(r)\n# C sigma\n# 1 2 0.25\n# 2 4 0.25\n# 3 4 0.25", + "text": "As you might recall from the tutorial page about tuning , you need to define a search space by\nfunction makeParamSet , a search strategy by makeTuneControl* ,\nand a method to evaluate hyperparameter settings (i.e., the inner resampling strategy and a performance measure). Below is a classification example.\nWe evaluate the performance of a support vector machine ( ksvm ) with tuned\ncost parameter C and RBF kernel parameter sigma .\nWe use 3-fold cross-validation in the outer and subsampling with 2 iterations in the inner\nloop.\nFor tuning a grid search is used to find the hyperparameters with lowest error rate\n( mmce is the default measure for classification).\nThe wrapped Learner is generated by calling makeTuneWrapper . Note that in practice the parameter set should be larger.\nA common recommendation is 2^(-12:12) for both C and sigma . ## Tuning in inner resampling loop\nps = makeParamSet(\n makeDiscreteParam( C , values = 2^(-2:2)),\n makeDiscreteParam( sigma , values = 2^(-2:2))\n)\nctrl = makeTuneControlGrid()\ninner = makeResampleDesc( Subsample , iters = 2)\nlrn = makeTuneWrapper( classif.ksvm , resampling = inner, par.set = ps, control = ctrl, show.info = FALSE)\n\n## Outer resampling loop\nouter = makeResampleDesc( CV , iters = 3)\nr = resample(lrn, iris.task, resampling = outer, extract = getTuneResult, show.info = FALSE)\n\nr\n# Resample Result\n# Task: iris-example\n# Learner: classif.ksvm.tuned\n# mmce.aggr: 0.05\n# mmce.mean: 0.05\n# mmce.sd: 0.03\n# Runtime: 23.346 You can obtain the error rates on the 3 outer test sets by: r$measures.test\n# iter mmce\n# 1 1 0.02\n# 2 2 0.06\n# 3 3 0.08 Accessing the tuning result We have kept the results of the tuning for further evaluations.\nFor example one might want to find out, if the best obtained configurations vary for the\ndifferent outer splits.\nAs storing entire models may be expensive (but possible by setting models = TRUE ) we used\nthe extract option of resample .\nFunction getTuneResult returns, among other things, the optimal hyperparameter values and\nthe optimization path for each iteration of the outer resampling loop.\nNote that the performance values shown when printing r$extract are the aggregated performances\nresulting from inner resampling on the outer training set for the best hyperparameter configurations\n(not to be confused with r$measures.test shown above). r$extract\n# [[1]]\n# Tune result:\n# Op. pars: C=2; sigma=0.25\n# mmce.test.mean=0.0147\n# \n# [[2]]\n# Tune result:\n# Op. pars: C=4; sigma=0.25\n# mmce.test.mean= 0\n# \n# [[3]]\n# Tune result:\n# Op. 
pars: C=4; sigma=0.25\n# mmce.test.mean=0.0735\n\nnames(r$extract[[1]])\n# [1] learner control x y threshold opt.path We can compare the optimal parameter settings obtained in the 3 resampling iterations.\nAs you can see, the optimal configuration usually depends on the data. You may\nbe able to identify a range of parameter settings that achieve good\nperformance though, e.g., the values for C should be at least 1 and the values\nfor sigma should be between 0 and 1. With function getNestedTuneResultsOptPathDf you can extract the optimization paths\nfor the 3 outer cross-validation iterations for further inspection and analysis.\nThese are stacked in one data.frame with column iter indicating the\nresampling iteration. opt.paths = getNestedTuneResultsOptPathDf(r)\nhead(opt.paths, 10)\n# C sigma mmce.test.mean dob eol error.message exec.time iter\n# 1 0.25 0.25 0.05882353 1 NA NA 0.048 1\n# 2 0.5 0.25 0.04411765 2 NA NA 0.045 1\n# 3 1 0.25 0.04411765 3 NA NA 0.045 1\n# 4 2 0.25 0.01470588 4 NA NA 0.046 1\n# 5 4 0.25 0.05882353 5 NA NA 0.045 1\n# 6 0.25 0.5 0.05882353 6 NA NA 0.047 1\n# 7 0.5 0.5 0.01470588 7 NA NA 0.046 1\n# 8 1 0.5 0.02941176 8 NA NA 0.045 1\n# 9 2 0.5 0.01470588 9 NA NA 0.045 1\n# 10 4 0.5 0.05882353 10 NA NA 0.046 1 Below we visualize the opt.path s for the 3 outer resampling iterations. g = ggplot(opt.paths, aes(x = C, y = sigma, fill = mmce.test.mean))\ng + geom_tile() + facet_wrap(~ iter) Another useful function is getNestedTuneResultsX , which extracts the best found hyperparameter\nsettings for each outer resampling iteration. getNestedTuneResultsX(r)\n# C sigma\n# 1 2 0.25\n# 2 4 0.25\n# 3 4 0.25", "title": "Tuning" }, { "location": "/nested_resampling/index.html#feature-selection", - "text": "As you might recall from the section about feature selection , mlr \nsupports the filter and the wrapper approach. Wrapper methods Wrapper methods use the performance of a learning algorithm to assess the usefulness of a\nfeature set. In order to select a feature subset a learner is trained repeatedly on different\nfeature subsets and the subset which leads to the best learner performance is chosen. For feature selection in the inner resampling loop, you need to choose a search strategy\n(function makeFeatSelControl* ), a performance measure and the inner\nresampling strategy. Then use function makeFeatSelWrapper to bind everything together. Below we use sequential forward selection with linear regression on the BostonHousing data set ( bh.task ). ## Feature selection in inner resampling loop\ninner = makeResampleDesc( CV , iters = 3)\nlrn = makeFeatSelWrapper( regr.lm , resampling = inner,\n control = makeFeatSelControlSequential(method = sfs ), show.info = FALSE)\n\n## Outer resampling loop\nouter = makeResampleDesc( Subsample , iters = 2)\nr = resample(learner = lrn, task = bh.task, resampling = outer, extract = getFeatSelResult,\n show.info = FALSE)\n\nr\n# Resample Result\n# Task: BostonHousing-example\n# Learner: regr.lm.featsel\n# mse.aggr: 31.70\n# mse.mean: 31.70\n# mse.sd: 4.79\n# Runtime: 49.0522\n\nr$measures.test\n# iter mse\n# 1 1 35.08611\n# 2 2 28.31215 Accessing the selected features The result of the feature selection can be extracted by function getFeatSelResult .\nIt is also possible to keep whole models by setting models = TRUE \nwhen calling resample . 
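A minimal sketch of that model-based alternative, assuming the lrn, bh.task and outer objects defined above; getFeatSelResult is then applied to each fitted model instead of using the extract argument:

## Keep the fitted models and read the selected features from each one (sketch)
r2 = resample(lrn, bh.task, resampling = outer, models = TRUE, show.info = FALSE)
lapply(r2$models, getFeatSelResult)

The extract-based result used in the remainder of this section is shown below.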
r$extract\n# [[1]]\n# FeatSel result:\n# Features (10): crim, zn, indus, nox, rm, dis, rad, tax, ptratio, lstat\n# mse.test.mean=20.2\n# \n# [[2]]\n# FeatSel result:\n# Features (9): zn, nox, rm, dis, rad, tax, ptratio, b, lstat\n# mse.test.mean=22.6\n\n## Selected features in the first outer resampling iteration\nr$extract[[1]]$x\n# [1] crim zn indus nox rm dis rad \n# [8] tax ptratio lstat \n\n## Resampled performance of the selected feature subset on the first inner training set\nr$extract[[1]]$y\n# mse.test.mean \n# 20.15939 As for tuning, you can extract the optimization paths.\nThe resulting data.frame s contain, among others, binary columns for\nall features, indicating if they were included in the linear regression model, and the\ncorresponding performances. opt.paths = lapply(r$extract, function(x) as.data.frame(x$opt.path))\nhead(opt.paths[[1]])\n# crim zn indus chas nox rm age dis rad tax ptratio b lstat mse.test.mean\n# 1 0 0 0 0 0 0 0 0 0 0 0 0 0 80.33019\n# 2 1 0 0 0 0 0 0 0 0 0 0 0 0 65.95316\n# 3 0 1 0 0 0 0 0 0 0 0 0 0 0 69.15417\n# 4 0 0 1 0 0 0 0 0 0 0 0 0 0 55.75473\n# 5 0 0 0 1 0 0 0 0 0 0 0 0 0 80.48765\n# 6 0 0 0 0 1 0 0 0 0 0 0 0 0 63.06724\n# dob eol error.message exec.time\n# 1 1 2 NA 0.022\n# 2 2 2 NA 0.038\n# 3 2 2 NA 0.034\n# 4 2 2 NA 0.034\n# 5 2 2 NA 0.034\n# 6 2 2 NA 0.034 An easy-to-read version of the optimization path for sequential feature selection can be\nobtained with function analyzeFeatSelResult . analyzeFeatSelResult(r$extract[[1]])\n# Features : 10\n# Performance : mse.test.mean=20.2\n# crim, zn, indus, nox, rm, dis, rad, tax, ptratio, lstat\n# \n# Path to optimum:\n# - Features: 0 Init : Perf = 80.33 Diff: NA *\n# - Features: 1 Add : lstat Perf = 36.451 Diff: 43.879 *\n# - Features: 2 Add : rm Perf = 27.289 Diff: 9.1623 *\n# - Features: 3 Add : ptratio Perf = 24.004 Diff: 3.2849 *\n# - Features: 4 Add : nox Perf = 23.513 Diff: 0.49082 *\n# - Features: 5 Add : dis Perf = 21.49 Diff: 2.023 *\n# - Features: 6 Add : crim Perf = 21.12 Diff: 0.37008 *\n# - Features: 7 Add : indus Perf = 20.82 Diff: 0.29994 *\n# - Features: 8 Add : rad Perf = 20.609 Diff: 0.21054 *\n# - Features: 9 Add : tax Perf = 20.209 Diff: 0.40059 *\n# - Features: 10 Add : zn Perf = 20.159 Diff: 0.049441 *\n# \n# Stopped, because no improving feature was found. Filter methods with tuning Filter methods assign an importance value to each feature.\nBased on these values you can select a feature subset by either keeping all features with importance\nhigher than a certain threshold or by keeping a fixed number or percentage of the highest ranking features.\nOften, neither the theshold nor the number or percentage of features is known in advance\nand thus tuning is necessary. In the example below the threshold value ( fw.threshold ) is tuned in the inner resampling loop.\nFor this purpose the base Learner \"regr.lm\" is wrapped two times.\nFirst, makeFilterWrapper is used to fuse linear regression with a feature filtering\npreprocessing step. Then a tuning step is added by makeTuneWrapper . 
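The same double wrapping works for other filter parameters. As a hedged sketch, one could tune the percentage of retained features via fw.perc instead of the threshold, otherwise analogous to the example below:

## Tune the percentage of features kept by the filter instead of the threshold (sketch)
lrn.perc = makeFilterWrapper(learner = "regr.lm", fw.method = "chi.squared")
ps.perc = makeParamSet(makeDiscreteParam("fw.perc", values = seq(0.2, 1, 0.2)))
lrn.perc = makeTuneWrapper(lrn.perc, resampling = makeResampleDesc("CV", iters = 3),
  par.set = ps.perc, control = makeTuneControlGrid(), show.info = FALSE)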
## Tuning of the percentage of selected filters in the inner loop\nlrn = makeFilterWrapper(learner = regr.lm , fw.method = chi.squared )\nps = makeParamSet(makeDiscreteParam( fw.threshold , values = seq(0, 1, 0.2)))\nctrl = makeTuneControlGrid()\ninner = makeResampleDesc( CV , iters = 3)\nlrn = makeTuneWrapper(lrn, resampling = inner, par.set = ps, control = ctrl, show.info = FALSE)\n\n## Outer resampling loop\nouter = makeResampleDesc( CV , iters = 3)\nr = resample(learner = lrn, task = bh.task, resampling = outer, models = TRUE, show.info = FALSE)\nr\n# Resample Result\n# Task: BostonHousing-example\n# Learner: regr.lm.filtered.tuned\n# mse.aggr: 25.39\n# mse.mean: 25.39\n# mse.sd: 8.35\n# Runtime: 8.5303 Accessing the selected features and optimal percentage In the above example we kept the complete model s. Below are some examples that show how to extract information from the model s. r$models\n# [[1]]\n# Model for learner.id=regr.lm.filtered.tuned; learner.class=TuneWrapper\n# Trained on: task.id = BostonHousing-example; obs = 337; features = 13\n# Hyperparameters: fw.method=chi.squared\n# \n# [[2]]\n# Model for learner.id=regr.lm.filtered.tuned; learner.class=TuneWrapper\n# Trained on: task.id = BostonHousing-example; obs = 338; features = 13\n# Hyperparameters: fw.method=chi.squared\n# \n# [[3]]\n# Model for learner.id=regr.lm.filtered.tuned; learner.class=TuneWrapper\n# Trained on: task.id = BostonHousing-example; obs = 337; features = 13\n# Hyperparameters: fw.method=chi.squared The result of the feature selection can be extracted by function getFilteredFeatures .\nAlmost always all 13 features are selected. lapply(r$models, function(x) getFilteredFeatures(x$learner.model$next.model))\n# [[1]]\n# [1] crim zn indus chas nox rm age \n# [8] dis rad tax ptratio b lstat \n# \n# [[2]]\n# [1] crim zn indus nox rm age dis \n# [8] rad tax ptratio b lstat \n# \n# [[3]]\n# [1] crim zn indus chas nox rm age \n# [8] dis rad tax ptratio b lstat Below the tune results and optimization paths \nare accessed. res = lapply(r$models, getTuneResult)\nres\n# [[1]]\n# Tune result:\n# Op. pars: fw.threshold=0\n# mse.test.mean=24.9\n# \n# [[2]]\n# Tune result:\n# Op. pars: fw.threshold=0.4\n# mse.test.mean=27.2\n# \n# [[3]]\n# Tune result:\n# Op. pars: fw.threshold=0\n# mse.test.mean=19.7\n\nopt.paths = lapply(res, function(x) as.data.frame(x$opt.path))\nopt.paths[[1]]\n# fw.threshold mse.test.mean dob eol error.message exec.time\n# 1 0 24.89160 1 NA NA 0.560\n# 2 0.2 25.18817 2 NA NA 0.273\n# 3 0.4 25.18817 3 NA NA 0.262\n# 4 0.6 32.15930 4 NA NA 0.245\n# 5 0.8 90.89848 5 NA NA 0.233\n# 6 1 90.89848 6 NA NA 0.218", + "text": "As you might recall from the section about feature selection , mlr \nsupports the filter and the wrapper approach. Wrapper methods Wrapper methods use the performance of a learning algorithm to assess the usefulness of a\nfeature set. In order to select a feature subset a learner is trained repeatedly on different\nfeature subsets and the subset which leads to the best learner performance is chosen. For feature selection in the inner resampling loop, you need to choose a search strategy\n(function makeFeatSelControl* ), a performance measure and the inner\nresampling strategy. Then use function makeFeatSelWrapper to bind everything together. Below we use sequential forward selection with linear regression on the BostonHousing data set ( bh.task ). 
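How greedy the sequential search is can be adjusted through the control object. For example, the following sketch relies on the alpha argument of makeFeatSelControlSequential, i.e., the minimal improvement required to add a further feature (an assumption worth checking against the function's help page):

## Require a larger minimum improvement per forward step (sketch)
ctrl.strict = makeFeatSelControlSequential(method = "sfs", alpha = 0.02)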
## Feature selection in inner resampling loop\ninner = makeResampleDesc( CV , iters = 3)\nlrn = makeFeatSelWrapper( regr.lm , resampling = inner,\n control = makeFeatSelControlSequential(method = sfs ), show.info = FALSE)\n\n## Outer resampling loop\nouter = makeResampleDesc( Subsample , iters = 2)\nr = resample(learner = lrn, task = bh.task, resampling = outer, extract = getFeatSelResult,\n show.info = FALSE)\n\nr\n# Resample Result\n# Task: BostonHousing-example\n# Learner: regr.lm.featsel\n# mse.aggr: 31.70\n# mse.mean: 31.70\n# mse.sd: 4.79\n# Runtime: 52.561\n\nr$measures.test\n# iter mse\n# 1 1 35.08611\n# 2 2 28.31215 Accessing the selected features The result of the feature selection can be extracted by function getFeatSelResult .\nIt is also possible to keep whole models by setting models = TRUE \nwhen calling resample . r$extract\n# [[1]]\n# FeatSel result:\n# Features (10): crim, zn, indus, nox, rm, dis, rad, tax, ptratio, lstat\n# mse.test.mean=20.2\n# \n# [[2]]\n# FeatSel result:\n# Features (9): zn, nox, rm, dis, rad, tax, ptratio, b, lstat\n# mse.test.mean=22.6\n\n## Selected features in the first outer resampling iteration\nr$extract[[1]]$x\n# [1] crim zn indus nox rm dis rad \n# [8] tax ptratio lstat \n\n## Resampled performance of the selected feature subset on the first inner training set\nr$extract[[1]]$y\n# mse.test.mean \n# 20.15939 As for tuning, you can extract the optimization paths.\nThe resulting data.frame s contain, among others, binary columns for\nall features, indicating if they were included in the linear regression model, and the\ncorresponding performances. opt.paths = lapply(r$extract, function(x) as.data.frame(x$opt.path))\nhead(opt.paths[[1]])\n# crim zn indus chas nox rm age dis rad tax ptratio b lstat mse.test.mean\n# 1 0 0 0 0 0 0 0 0 0 0 0 0 0 80.33019\n# 2 1 0 0 0 0 0 0 0 0 0 0 0 0 65.95316\n# 3 0 1 0 0 0 0 0 0 0 0 0 0 0 69.15417\n# 4 0 0 1 0 0 0 0 0 0 0 0 0 0 55.75473\n# 5 0 0 0 1 0 0 0 0 0 0 0 0 0 80.48765\n# 6 0 0 0 0 1 0 0 0 0 0 0 0 0 63.06724\n# dob eol error.message exec.time\n# 1 1 2 NA 0.022\n# 2 2 2 NA 0.033\n# 3 2 2 NA 0.032\n# 4 2 2 NA 0.031\n# 5 2 2 NA 0.034\n# 6 2 2 NA 0.031 An easy-to-read version of the optimization path for sequential feature selection can be\nobtained with function analyzeFeatSelResult . analyzeFeatSelResult(r$extract[[1]])\n# Features : 10\n# Performance : mse.test.mean=20.2\n# crim, zn, indus, nox, rm, dis, rad, tax, ptratio, lstat\n# \n# Path to optimum:\n# - Features: 0 Init : Perf = 80.33 Diff: NA *\n# - Features: 1 Add : lstat Perf = 36.451 Diff: 43.879 *\n# - Features: 2 Add : rm Perf = 27.289 Diff: 9.1623 *\n# - Features: 3 Add : ptratio Perf = 24.004 Diff: 3.2849 *\n# - Features: 4 Add : nox Perf = 23.513 Diff: 0.49082 *\n# - Features: 5 Add : dis Perf = 21.49 Diff: 2.023 *\n# - Features: 6 Add : crim Perf = 21.12 Diff: 0.37008 *\n# - Features: 7 Add : indus Perf = 20.82 Diff: 0.29994 *\n# - Features: 8 Add : rad Perf = 20.609 Diff: 0.21054 *\n# - Features: 9 Add : tax Perf = 20.209 Diff: 0.40059 *\n# - Features: 10 Add : zn Perf = 20.159 Diff: 0.049441 *\n# \n# Stopped, because no improving feature was found. 
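If, after inspecting these paths, a single final model is wanted, one option is to refit the learner on one of the selected feature subsets. Below is a minimal sketch using the subset from the first outer iteration; whether to reuse such a subset or to rerun selectFeatures on the complete data is a judgment call:

## Refit on the features selected in the first outer resampling iteration (sketch)
bh.sub = subsetTask(bh.task, features = r$extract[[1]]$x)
final.mod = train(makeLearner("regr.lm"), bh.sub)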
Filter methods with tuning Filter methods assign an importance value to each feature.\nBased on these values you can select a feature subset by either keeping all features with importance\nhigher than a certain threshold or by keeping a fixed number or percentage of the highest ranking features.\nOften, neither the theshold nor the number or percentage of features is known in advance\nand thus tuning is necessary. In the example below the threshold value ( fw.threshold ) is tuned in the inner resampling loop.\nFor this purpose the base Learner \"regr.lm\" is wrapped two times.\nFirst, makeFilterWrapper is used to fuse linear regression with a feature filtering\npreprocessing step. Then a tuning step is added by makeTuneWrapper . ## Tuning of the percentage of selected filters in the inner loop\nlrn = makeFilterWrapper(learner = regr.lm , fw.method = chi.squared )\nps = makeParamSet(makeDiscreteParam( fw.threshold , values = seq(0, 1, 0.2)))\nctrl = makeTuneControlGrid()\ninner = makeResampleDesc( CV , iters = 3)\nlrn = makeTuneWrapper(lrn, resampling = inner, par.set = ps, control = ctrl, show.info = FALSE)\n\n## Outer resampling loop\nouter = makeResampleDesc( CV , iters = 3)\nr = resample(learner = lrn, task = bh.task, resampling = outer, models = TRUE, show.info = FALSE)\nr\n# Resample Result\n# Task: BostonHousing-example\n# Learner: regr.lm.filtered.tuned\n# mse.aggr: 25.39\n# mse.mean: 25.39\n# mse.sd: 8.35\n# Runtime: 9.75966 Accessing the selected features and optimal percentage In the above example we kept the complete model s. Below are some examples that show how to extract information from the model s. r$models\n# [[1]]\n# Model for learner.id=regr.lm.filtered.tuned; learner.class=TuneWrapper\n# Trained on: task.id = BostonHousing-example; obs = 337; features = 13\n# Hyperparameters: fw.method=chi.squared\n# \n# [[2]]\n# Model for learner.id=regr.lm.filtered.tuned; learner.class=TuneWrapper\n# Trained on: task.id = BostonHousing-example; obs = 338; features = 13\n# Hyperparameters: fw.method=chi.squared\n# \n# [[3]]\n# Model for learner.id=regr.lm.filtered.tuned; learner.class=TuneWrapper\n# Trained on: task.id = BostonHousing-example; obs = 337; features = 13\n# Hyperparameters: fw.method=chi.squared The result of the feature selection can be extracted by function getFilteredFeatures .\nAlmost always all 13 features are selected. lapply(r$models, function(x) getFilteredFeatures(x$learner.model$next.model))\n# [[1]]\n# [1] crim zn indus chas nox rm age \n# [8] dis rad tax ptratio b lstat \n# \n# [[2]]\n# [1] crim zn indus nox rm age dis \n# [8] rad tax ptratio b lstat \n# \n# [[3]]\n# [1] crim zn indus chas nox rm age \n# [8] dis rad tax ptratio b lstat Below the tune results and optimization paths \nare accessed. res = lapply(r$models, getTuneResult)\nres\n# [[1]]\n# Tune result:\n# Op. pars: fw.threshold=0\n# mse.test.mean=24.9\n# \n# [[2]]\n# Tune result:\n# Op. pars: fw.threshold=0.4\n# mse.test.mean=27.2\n# \n# [[3]]\n# Tune result:\n# Op. 
pars: fw.threshold=0\n# mse.test.mean=19.7\n\nopt.paths = lapply(res, function(x) as.data.frame(x$opt.path))\nopt.paths[[1]]\n# fw.threshold mse.test.mean dob eol error.message exec.time\n# 1 0 24.89160 1 NA NA 0.584\n# 2 0.2 25.18817 2 NA NA 0.283\n# 3 0.4 25.18817 3 NA NA 0.263\n# 4 0.6 32.15930 4 NA NA 0.251\n# 5 0.8 90.89848 5 NA NA 0.241\n# 6 1 90.89848 6 NA NA 0.228", "title": "Feature selection" }, { "location": "/nested_resampling/index.html#benchmark-experiments", - "text": "In a benchmark experiment multiple learners are compared on one or several tasks\n(see also the section about benchmarking ).\nNested resampling in benchmark experiments is achieved the same way as in resampling: First, use makeTuneWrapper or makeFeatSelWrapper to generate wrapped Learner s\n with the inner resampling strategies of your choice. Second, call benchmark and specify the outer resampling strategies for all tasks. The inner resampling strategies should be resample descriptions .\nYou can use different inner resampling strategies for different wrapped learners.\nFor example it might be practical to do fewer subsampling or bootstrap iterations for slower\nlearners. If you have larger benchmark experiments you might want to have a look at the section\nabout parallelization . As mentioned in the section about benchmark experiments you can also use\ndifferent resampling strategies for different learning tasks by passing a list of resampling descriptions or instances to benchmark . We will see three examples to show different benchmark settings: Two data sets + two classification algorithms + tuning One data set + two regression algorithms + feature selection One data set + two regression algorithms + feature filtering + tuning Example 1: Two tasks, two learners, tuning Below is a benchmark experiment with two data sets, iris and sonar , and two Learner s, ksvm and kknn , that are both tuned. As inner resampling strategies we use holdout for ksvm and subsampling\nwith 3 iterations for kknn .\nAs outer resampling strategies we take holdout for the iris and bootstrap\nwith 2 iterations for the sonar data ( sonar.task ).\nWe consider the accuracy ( acc ), which is used as tuning criterion, and also\ncalculate the balanced error rate ( ber ). 
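Note that the code below keeps mlr's default mmce as the inner tuning criterion (as also visible in the tune results further down). If the accuracy should drive the inner tuning explicitly, the measures argument of makeTuneWrapper can be set; a sketch, otherwise identical to lrn1 below and reusing its ps, ctrl and inner objects:

## Tune on accuracy in the inner loop (sketch)
lrn1.acc = makeTuneWrapper("classif.ksvm", resampling = inner, par.set = ps,
  control = ctrl, measures = list(acc), show.info = FALSE)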
## List of learning tasks\ntasks = list(iris.task, sonar.task)\n\n## Tune svm in the inner resampling loop\nps = makeParamSet(\n makeDiscreteParam( C , 2^(-1:1)),\n makeDiscreteParam( sigma , 2^(-1:1)))\nctrl = makeTuneControlGrid()\ninner = makeResampleDesc( Holdout )\nlrn1 = makeTuneWrapper( classif.ksvm , resampling = inner, par.set = ps, control = ctrl,\n show.info = FALSE)\n\n## Tune k-nearest neighbor in inner resampling loop\nps = makeParamSet(makeDiscreteParam( k , 3:5))\nctrl = makeTuneControlGrid()\ninner = makeResampleDesc( Subsample , iters = 3)\nlrn2 = makeTuneWrapper( classif.kknn , resampling = inner, par.set = ps, control = ctrl,\n show.info = FALSE)\n\n## Learners\nlrns = list(lrn1, lrn2)\n\n## Outer resampling loop\nouter = list(makeResampleDesc( Holdout ), makeResampleDesc( Bootstrap , iters = 2))\nres = benchmark(lrns, tasks, outer, measures = list(acc, ber), show.info = FALSE)\nres\n# task.id learner.id acc.test.mean ber.test.mean\n# 1 iris-example classif.ksvm.tuned 0.9400000 0.05882353\n# 2 iris-example classif.kknn.tuned 0.9200000 0.08683473\n# 3 Sonar-example classif.ksvm.tuned 0.5289307 0.50000000\n# 4 Sonar-example classif.kknn.tuned 0.8077080 0.19549714 The print method for the BenchmarkResult shows the aggregated performances\nfrom the outer resampling loop. As you might recall, mlr offers several accessor function to extract information from\nthe benchmark result.\nThese are listed on the help page of BenchmarkResult and many examples are shown on the\ntutorial page about benchmark experiments . The performance values in individual outer resampling runs can be obtained by getBMRPerformances .\nNote that, since we used different outer resampling strategies for the two tasks, the number\nof rows per task differ. getBMRPerformances(res, as.df = TRUE)\n# task.id learner.id iter acc ber\n# 1 iris-example classif.ksvm.tuned 1 0.9400000 0.05882353\n# 2 iris-example classif.kknn.tuned 1 0.9200000 0.08683473\n# 3 Sonar-example classif.ksvm.tuned 1 0.5373134 0.50000000\n# 4 Sonar-example classif.ksvm.tuned 2 0.5205479 0.50000000\n# 5 Sonar-example classif.kknn.tuned 1 0.8208955 0.18234767\n# 6 Sonar-example classif.kknn.tuned 2 0.7945205 0.20864662 The results from the parameter tuning can be obtained through function getBMRTuneResults . getBMRTuneResults(res)\n# $`iris-example`\n# $`iris-example`$classif.ksvm.tuned\n# $`iris-example`$classif.ksvm.tuned[[1]]\n# Tune result:\n# Op. pars: C=0.5; sigma=0.5\n# mmce.test.mean=0.0588\n# \n# \n# $`iris-example`$classif.kknn.tuned\n# $`iris-example`$classif.kknn.tuned[[1]]\n# Tune result:\n# Op. pars: k=3\n# mmce.test.mean=0.049\n# \n# \n# \n# $`Sonar-example`\n# $`Sonar-example`$classif.ksvm.tuned\n# $`Sonar-example`$classif.ksvm.tuned[[1]]\n# Tune result:\n# Op. pars: C=1; sigma=2\n# mmce.test.mean=0.343\n# \n# $`Sonar-example`$classif.ksvm.tuned[[2]]\n# Tune result:\n# Op. pars: C=2; sigma=0.5\n# mmce.test.mean= 0.2\n# \n# \n# $`Sonar-example`$classif.kknn.tuned\n# $`Sonar-example`$classif.kknn.tuned[[1]]\n# Tune result:\n# Op. pars: k=4\n# mmce.test.mean=0.11\n# \n# $`Sonar-example`$classif.kknn.tuned[[2]]\n# Tune result:\n# Op. pars: k=3\n# mmce.test.mean=0.0667 As for several other accessor functions a clearer representation as data.frame \ncan be achieved by setting as.df = TRUE . 
getBMRTuneResults(res, as.df = TRUE)\n# task.id learner.id iter C sigma mmce.test.mean k\n# 1 iris-example classif.ksvm.tuned 1 0.5 0.5 0.05882353 NA\n# 2 iris-example classif.kknn.tuned 1 NA NA 0.04901961 3\n# 3 Sonar-example classif.ksvm.tuned 1 1.0 2.0 0.34285714 NA\n# 4 Sonar-example classif.ksvm.tuned 2 2.0 0.5 0.20000000 NA\n# 5 Sonar-example classif.kknn.tuned 1 NA NA 0.10952381 4\n# 6 Sonar-example classif.kknn.tuned 2 NA NA 0.06666667 3 It is also possible to extract the tuning results for individual tasks and learners and,\nas shown in earlier examples, inspect the optimization path . tune.res = getBMRTuneResults(res, task.ids = Sonar-example , learner.ids = classif.ksvm.tuned ,\n as.df = TRUE)\ntune.res\n# task.id learner.id iter C sigma mmce.test.mean\n# 1 Sonar-example classif.ksvm.tuned 1 1 2.0 0.3428571\n# 2 Sonar-example classif.ksvm.tuned 2 2 0.5 0.2000000\n\ngetNestedTuneResultsOptPathDf(res$results[[ Sonar-example ]][[ classif.ksvm.tuned ]])\n# C sigma mmce.test.mean dob eol error.message exec.time iter\n# 1 0.5 0.5 0.3428571 1 NA NA 0.048 1\n# 2 1 0.5 0.3428571 2 NA NA 0.049 1\n# 3 2 0.5 0.3428571 3 NA NA 0.046 1\n# 4 0.5 1 0.3428571 4 NA NA 0.048 1\n# 5 1 1 0.3428571 5 NA NA 0.048 1\n# 6 2 1 0.3428571 6 NA NA 0.049 1\n# 7 0.5 2 0.3428571 7 NA NA 0.048 1\n# 8 1 2 0.3428571 8 NA NA 0.052 1\n# 9 2 2 0.3428571 9 NA NA 0.048 1\n# 10 0.5 0.5 0.2142857 1 NA NA 0.051 2\n# 11 1 0.5 0.2142857 2 NA NA 0.048 2\n# 12 2 0.5 0.2000000 3 NA NA 0.047 2\n# 13 0.5 1 0.2142857 4 NA NA 0.050 2\n# 14 1 1 0.2142857 5 NA NA 0.048 2\n# 15 2 1 0.2142857 6 NA NA 0.050 2\n# 16 0.5 2 0.2142857 7 NA NA 0.051 2\n# 17 1 2 0.2142857 8 NA NA 0.051 2\n# 18 2 2 0.2142857 9 NA NA 0.051 2 Example 2: One task, two learners, feature selection Let's see how we can do feature selection in\na benchmark experiment: ## Feature selection in inner resampling loop\nctrl = makeFeatSelControlSequential(method = sfs )\ninner = makeResampleDesc( Subsample , iters = 2)\nlrn = makeFeatSelWrapper( regr.lm , resampling = inner, control = ctrl, show.info = FALSE)\n\n## Learners\nlrns = list(makeLearner( regr.rpart ), lrn)\n\n## Outer resampling loop\nouter = makeResampleDesc( Subsample , iters = 2)\nres = benchmark(tasks = bh.task, learners = lrns, resampling = outer, show.info = FALSE)\n\nres\n# task.id learner.id mse.test.mean\n# 1 BostonHousing-example regr.rpart 25.86232\n# 2 BostonHousing-example regr.lm.featsel 25.07465 The selected features can be extracted by function getBMRFeatSelResults . getBMRFeatSelResults(res)\n# $`BostonHousing-example`\n# $`BostonHousing-example`$regr.rpart\n# NULL\n# \n# $`BostonHousing-example`$regr.lm.featsel\n# $`BostonHousing-example`$regr.lm.featsel[[1]]\n# FeatSel result:\n# Features (8): crim, zn, chas, nox, rm, dis, ptratio, lstat\n# mse.test.mean=26.7\n# \n# $`BostonHousing-example`$regr.lm.featsel[[2]]\n# FeatSel result:\n# Features (10): crim, zn, nox, rm, dis, rad, tax, ptratio, b, lstat\n# mse.test.mean=24.3 You can access results for individual learners and tasks and inspect them further. feats = getBMRFeatSelResults(res, learner.id = regr.lm.featsel )\nfeats = feats$`BostonHousing-example`$`regr.lm.featsel`\n\n## Selected features in the first outer resampling iteration\nfeats[[1]]$x\n# [1] crim zn chas nox rm dis ptratio \n# [8] lstat \n\n## Resampled performance of the selected feature subset on the first inner training set\nfeats[[1]]$y\n# mse.test.mean \n# 26.72574 As for tuning, you can extract the optimization paths. 
The resulting data.frame s\ncontain, among others, binary columns for all features, indicating if they were included in the\nlinear regression model, and the corresponding performances. analyzeFeatSelResult gives a clearer overview. opt.paths = lapply(feats, function(x) as.data.frame(x$opt.path))\nhead(opt.paths[[1]])\n# crim zn indus chas nox rm age dis rad tax ptratio b lstat mse.test.mean\n# 1 0 0 0 0 0 0 0 0 0 0 0 0 0 90.16159\n# 2 1 0 0 0 0 0 0 0 0 0 0 0 0 82.85880\n# 3 0 1 0 0 0 0 0 0 0 0 0 0 0 79.55202\n# 4 0 0 1 0 0 0 0 0 0 0 0 0 0 70.02071\n# 5 0 0 0 1 0 0 0 0 0 0 0 0 0 86.93409\n# 6 0 0 0 0 1 0 0 0 0 0 0 0 0 76.32457\n# dob eol error.message exec.time\n# 1 1 2 NA 0.018\n# 2 2 2 NA 0.023\n# 3 2 2 NA 0.023\n# 4 2 2 NA 0.024\n# 5 2 2 NA 0.025\n# 6 2 2 NA 0.023\n\nanalyzeFeatSelResult(feats[[1]])\n# Features : 8\n# Performance : mse.test.mean=26.7\n# crim, zn, chas, nox, rm, dis, ptratio, lstat\n# \n# Path to optimum:\n# - Features: 0 Init : Perf = 90.162 Diff: NA *\n# - Features: 1 Add : lstat Perf = 42.646 Diff: 47.515 *\n# - Features: 2 Add : ptratio Perf = 34.52 Diff: 8.1263 *\n# - Features: 3 Add : rm Perf = 30.454 Diff: 4.066 *\n# - Features: 4 Add : dis Perf = 29.405 Diff: 1.0495 *\n# - Features: 5 Add : nox Perf = 28.059 Diff: 1.3454 *\n# - Features: 6 Add : chas Perf = 27.334 Diff: 0.72499 *\n# - Features: 7 Add : zn Perf = 26.901 Diff: 0.43296 *\n# - Features: 8 Add : crim Perf = 26.726 Diff: 0.17558 *\n# \n# Stopped, because no improving feature was found. Example 3: One task, two learners, feature filtering with tuning Here is a minimal example for feature filtering with tuning of the feature subset size. ## Feature filtering with tuning in the inner resampling loop\nlrn = makeFilterWrapper(learner = regr.lm , fw.method = chi.squared )\nps = makeParamSet(makeDiscreteParam( fw.abs , values = seq_len(getTaskNFeats(bh.task))))\nctrl = makeTuneControlGrid()\ninner = makeResampleDesc( CV , iter = 2)\nlrn = makeTuneWrapper(lrn, resampling = inner, par.set = ps, control = ctrl,\n show.info = FALSE)\n\n## Learners\nlrns = list(makeLearner( regr.rpart ), lrn)\n\n## Outer resampling loop\nouter = makeResampleDesc( Subsample , iter = 3)\nres = benchmark(tasks = bh.task, learners = lrns, resampling = outer, show.info = FALSE)\n\nres\n# task.id learner.id mse.test.mean\n# 1 BostonHousing-example regr.rpart 22.11687\n# 2 BostonHousing-example regr.lm.filtered.tuned 23.76666 ## Performances on individual outer test data sets\ngetBMRPerformances(res, as.df = TRUE)\n# task.id learner.id iter mse\n# 1 BostonHousing-example regr.rpart 1 23.55486\n# 2 BostonHousing-example regr.rpart 2 20.03453\n# 3 BostonHousing-example regr.rpart 3 22.76121\n# 4 BostonHousing-example regr.lm.filtered.tuned 1 27.51086\n# 5 BostonHousing-example regr.lm.filtered.tuned 2 24.87820\n# 6 BostonHousing-example regr.lm.filtered.tuned 3 18.91091", + "text": "In a benchmark experiment multiple learners are compared on one or several tasks\n(see also the section about benchmarking ).\nNested resampling in benchmark experiments is achieved the same way as in resampling: First, use makeTuneWrapper or makeFeatSelWrapper to generate wrapped Learner s\n with the inner resampling strategies of your choice. Second, call benchmark and specify the outer resampling strategies for all tasks. 
The inner resampling strategies should be resample descriptions .\nYou can use different inner resampling strategies for different wrapped learners.\nFor example it might be practical to do fewer subsampling or bootstrap iterations for slower\nlearners. If you have larger benchmark experiments you might want to have a look at the section\nabout parallelization . As mentioned in the section about benchmark experiments you can also use\ndifferent resampling strategies for different learning tasks by passing a list of resampling descriptions or instances to benchmark . We will see three examples to show different benchmark settings: Two data sets + two classification algorithms + tuning One data set + two regression algorithms + feature selection One data set + two regression algorithms + feature filtering + tuning Example 1: Two tasks, two learners, tuning Below is a benchmark experiment with two data sets, iris and sonar , and two Learner s, ksvm and kknn , that are both tuned. As inner resampling strategies we use holdout for ksvm and subsampling\nwith 3 iterations for kknn .\nAs outer resampling strategies we take holdout for the iris and bootstrap\nwith 2 iterations for the sonar data ( sonar.task ).\nWe consider the accuracy ( acc ), which is used as tuning criterion, and also\ncalculate the balanced error rate ( ber ). ## List of learning tasks\ntasks = list(iris.task, sonar.task)\n\n## Tune svm in the inner resampling loop\nps = makeParamSet(\n makeDiscreteParam( C , 2^(-1:1)),\n makeDiscreteParam( sigma , 2^(-1:1)))\nctrl = makeTuneControlGrid()\ninner = makeResampleDesc( Holdout )\nlrn1 = makeTuneWrapper( classif.ksvm , resampling = inner, par.set = ps, control = ctrl,\n show.info = FALSE)\n\n## Tune k-nearest neighbor in inner resampling loop\nps = makeParamSet(makeDiscreteParam( k , 3:5))\nctrl = makeTuneControlGrid()\ninner = makeResampleDesc( Subsample , iters = 3)\nlrn2 = makeTuneWrapper( classif.kknn , resampling = inner, par.set = ps, control = ctrl,\n show.info = FALSE)\n\n## Learners\nlrns = list(lrn1, lrn2)\n\n## Outer resampling loop\nouter = list(makeResampleDesc( Holdout ), makeResampleDesc( Bootstrap , iters = 2))\nres = benchmark(lrns, tasks, outer, measures = list(acc, ber), show.info = FALSE)\nres\n# task.id learner.id acc.test.mean ber.test.mean\n# 1 iris-example classif.ksvm.tuned 0.9400000 0.05882353\n# 2 iris-example classif.kknn.tuned 0.9200000 0.08683473\n# 3 Sonar-example classif.ksvm.tuned 0.5289307 0.50000000\n# 4 Sonar-example classif.kknn.tuned 0.8077080 0.19549714 The print method for the BenchmarkResult shows the aggregated performances\nfrom the outer resampling loop. As you might recall, mlr offers several accessor function to extract information from\nthe benchmark result.\nThese are listed on the help page of BenchmarkResult and many examples are shown on the\ntutorial page about benchmark experiments . The performance values in individual outer resampling runs can be obtained by getBMRPerformances .\nNote that, since we used different outer resampling strategies for the two tasks, the number\nof rows per task differ. 
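If only the aggregated values per task and learner are needed, getBMRAggrPerformances can be used instead; a one-line sketch:

## Aggregated outer performances per task and learner (sketch)
getBMRAggrPerformances(res, as.df = TRUE)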
getBMRPerformances(res, as.df = TRUE)\n# task.id learner.id iter acc ber\n# 1 iris-example classif.ksvm.tuned 1 0.9400000 0.05882353\n# 2 iris-example classif.kknn.tuned 1 0.9200000 0.08683473\n# 3 Sonar-example classif.ksvm.tuned 1 0.5373134 0.50000000\n# 4 Sonar-example classif.ksvm.tuned 2 0.5205479 0.50000000\n# 5 Sonar-example classif.kknn.tuned 1 0.8208955 0.18234767\n# 6 Sonar-example classif.kknn.tuned 2 0.7945205 0.20864662 The results from the parameter tuning can be obtained through function getBMRTuneResults . getBMRTuneResults(res)\n# $`iris-example`\n# $`iris-example`$classif.ksvm.tuned\n# $`iris-example`$classif.ksvm.tuned[[1]]\n# Tune result:\n# Op. pars: C=0.5; sigma=0.5\n# mmce.test.mean=0.0588\n# \n# \n# $`iris-example`$classif.kknn.tuned\n# $`iris-example`$classif.kknn.tuned[[1]]\n# Tune result:\n# Op. pars: k=3\n# mmce.test.mean=0.049\n# \n# \n# \n# $`Sonar-example`\n# $`Sonar-example`$classif.ksvm.tuned\n# $`Sonar-example`$classif.ksvm.tuned[[1]]\n# Tune result:\n# Op. pars: C=1; sigma=2\n# mmce.test.mean=0.343\n# \n# $`Sonar-example`$classif.ksvm.tuned[[2]]\n# Tune result:\n# Op. pars: C=2; sigma=0.5\n# mmce.test.mean= 0.2\n# \n# \n# $`Sonar-example`$classif.kknn.tuned\n# $`Sonar-example`$classif.kknn.tuned[[1]]\n# Tune result:\n# Op. pars: k=4\n# mmce.test.mean=0.11\n# \n# $`Sonar-example`$classif.kknn.tuned[[2]]\n# Tune result:\n# Op. pars: k=3\n# mmce.test.mean=0.0667 As for several other accessor functions a clearer representation as data.frame \ncan be achieved by setting as.df = TRUE . getBMRTuneResults(res, as.df = TRUE)\n# task.id learner.id iter C sigma mmce.test.mean k\n# 1 iris-example classif.ksvm.tuned 1 0.5 0.5 0.05882353 NA\n# 2 iris-example classif.kknn.tuned 1 NA NA 0.04901961 3\n# 3 Sonar-example classif.ksvm.tuned 1 1.0 2.0 0.34285714 NA\n# 4 Sonar-example classif.ksvm.tuned 2 2.0 0.5 0.20000000 NA\n# 5 Sonar-example classif.kknn.tuned 1 NA NA 0.10952381 4\n# 6 Sonar-example classif.kknn.tuned 2 NA NA 0.06666667 3 It is also possible to extract the tuning results for individual tasks and learners and,\nas shown in earlier examples, inspect the optimization path . 
tune.res = getBMRTuneResults(res, task.ids = Sonar-example , learner.ids = classif.ksvm.tuned ,\n as.df = TRUE)\ntune.res\n# task.id learner.id iter C sigma mmce.test.mean\n# 1 Sonar-example classif.ksvm.tuned 1 1 2.0 0.3428571\n# 2 Sonar-example classif.ksvm.tuned 2 2 0.5 0.2000000\n\ngetNestedTuneResultsOptPathDf(res$results[[ Sonar-example ]][[ classif.ksvm.tuned ]])\n# C sigma mmce.test.mean dob eol error.message exec.time iter\n# 1 0.5 0.5 0.3428571 1 NA NA 0.056 1\n# 2 1 0.5 0.3428571 2 NA NA 0.052 1\n# 3 2 0.5 0.3428571 3 NA NA 0.048 1\n# 4 0.5 1 0.3428571 4 NA NA 0.049 1\n# 5 1 1 0.3428571 5 NA NA 0.048 1\n# 6 2 1 0.3428571 6 NA NA 0.047 1\n# 7 0.5 2 0.3428571 7 NA NA 0.049 1\n# 8 1 2 0.3428571 8 NA NA 0.052 1\n# 9 2 2 0.3428571 9 NA NA 0.048 1\n# 10 0.5 0.5 0.2142857 1 NA NA 0.052 2\n# 11 1 0.5 0.2142857 2 NA NA 0.052 2\n# 12 2 0.5 0.2000000 3 NA NA 0.048 2\n# 13 0.5 1 0.2142857 4 NA NA 0.049 2\n# 14 1 1 0.2142857 5 NA NA 0.049 2\n# 15 2 1 0.2142857 6 NA NA 0.051 2\n# 16 0.5 2 0.2142857 7 NA NA 0.056 2\n# 17 1 2 0.2142857 8 NA NA 0.056 2\n# 18 2 2 0.2142857 9 NA NA 0.051 2 Example 2: One task, two learners, feature selection Let's see how we can do feature selection in\na benchmark experiment: ## Feature selection in inner resampling loop\nctrl = makeFeatSelControlSequential(method = sfs )\ninner = makeResampleDesc( Subsample , iters = 2)\nlrn = makeFeatSelWrapper( regr.lm , resampling = inner, control = ctrl, show.info = FALSE)\n\n## Learners\nlrns = list(makeLearner( regr.rpart ), lrn)\n\n## Outer resampling loop\nouter = makeResampleDesc( Subsample , iters = 2)\nres = benchmark(tasks = bh.task, learners = lrns, resampling = outer, show.info = FALSE)\n\nres\n# task.id learner.id mse.test.mean\n# 1 BostonHousing-example regr.rpart 25.86232\n# 2 BostonHousing-example regr.lm.featsel 25.07465 The selected features can be extracted by function getBMRFeatSelResults . getBMRFeatSelResults(res)\n# $`BostonHousing-example`\n# $`BostonHousing-example`$regr.rpart\n# NULL\n# \n# $`BostonHousing-example`$regr.lm.featsel\n# $`BostonHousing-example`$regr.lm.featsel[[1]]\n# FeatSel result:\n# Features (8): crim, zn, chas, nox, rm, dis, ptratio, lstat\n# mse.test.mean=26.7\n# \n# $`BostonHousing-example`$regr.lm.featsel[[2]]\n# FeatSel result:\n# Features (10): crim, zn, nox, rm, dis, rad, tax, ptratio, b, lstat\n# mse.test.mean=24.3 You can access results for individual learners and tasks and inspect them further. feats = getBMRFeatSelResults(res, learner.id = regr.lm.featsel )\nfeats = feats$`BostonHousing-example`$`regr.lm.featsel`\n\n## Selected features in the first outer resampling iteration\nfeats[[1]]$x\n# [1] crim zn chas nox rm dis ptratio \n# [8] lstat \n\n## Resampled performance of the selected feature subset on the first inner training set\nfeats[[1]]$y\n# mse.test.mean \n# 26.72574 As for tuning, you can extract the optimization paths. The resulting data.frame s\ncontain, among others, binary columns for all features, indicating if they were included in the\nlinear regression model, and the corresponding performances. analyzeFeatSelResult gives a clearer overview. 
opt.paths = lapply(feats, function(x) as.data.frame(x$opt.path))\nhead(opt.paths[[1]])\n# crim zn indus chas nox rm age dis rad tax ptratio b lstat mse.test.mean\n# 1 0 0 0 0 0 0 0 0 0 0 0 0 0 90.16159\n# 2 1 0 0 0 0 0 0 0 0 0 0 0 0 82.85880\n# 3 0 1 0 0 0 0 0 0 0 0 0 0 0 79.55202\n# 4 0 0 1 0 0 0 0 0 0 0 0 0 0 70.02071\n# 5 0 0 0 1 0 0 0 0 0 0 0 0 0 86.93409\n# 6 0 0 0 0 1 0 0 0 0 0 0 0 0 76.32457\n# dob eol error.message exec.time\n# 1 1 2 NA 0.019\n# 2 2 2 NA 0.024\n# 3 2 2 NA 0.024\n# 4 2 2 NA 0.024\n# 5 2 2 NA 0.027\n# 6 2 2 NA 0.024\n\nanalyzeFeatSelResult(feats[[1]])\n# Features : 8\n# Performance : mse.test.mean=26.7\n# crim, zn, chas, nox, rm, dis, ptratio, lstat\n# \n# Path to optimum:\n# - Features: 0 Init : Perf = 90.162 Diff: NA *\n# - Features: 1 Add : lstat Perf = 42.646 Diff: 47.515 *\n# - Features: 2 Add : ptratio Perf = 34.52 Diff: 8.1263 *\n# - Features: 3 Add : rm Perf = 30.454 Diff: 4.066 *\n# - Features: 4 Add : dis Perf = 29.405 Diff: 1.0495 *\n# - Features: 5 Add : nox Perf = 28.059 Diff: 1.3454 *\n# - Features: 6 Add : chas Perf = 27.334 Diff: 0.72499 *\n# - Features: 7 Add : zn Perf = 26.901 Diff: 0.43296 *\n# - Features: 8 Add : crim Perf = 26.726 Diff: 0.17558 *\n# \n# Stopped, because no improving feature was found. Example 3: One task, two learners, feature filtering with tuning Here is a minimal example for feature filtering with tuning of the feature subset size. ## Feature filtering with tuning in the inner resampling loop\nlrn = makeFilterWrapper(learner = regr.lm , fw.method = chi.squared )\nps = makeParamSet(makeDiscreteParam( fw.abs , values = seq_len(getTaskNFeats(bh.task))))\nctrl = makeTuneControlGrid()\ninner = makeResampleDesc( CV , iter = 2)\nlrn = makeTuneWrapper(lrn, resampling = inner, par.set = ps, control = ctrl,\n show.info = FALSE)\n\n## Learners\nlrns = list(makeLearner( regr.rpart ), lrn)\n\n## Outer resampling loop\nouter = makeResampleDesc( Subsample , iter = 3)\nres = benchmark(tasks = bh.task, learners = lrns, resampling = outer, show.info = FALSE)\n\nres\n# task.id learner.id mse.test.mean\n# 1 BostonHousing-example regr.rpart 22.11687\n# 2 BostonHousing-example regr.lm.filtered.tuned 23.76666 ## Performances on individual outer test data sets\ngetBMRPerformances(res, as.df = TRUE)\n# task.id learner.id iter mse\n# 1 BostonHousing-example regr.rpart 1 23.55486\n# 2 BostonHousing-example regr.rpart 2 20.03453\n# 3 BostonHousing-example regr.rpart 3 22.76121\n# 4 BostonHousing-example regr.lm.filtered.tuned 1 27.51086\n# 5 BostonHousing-example regr.lm.filtered.tuned 2 24.87820\n# 6 BostonHousing-example regr.lm.filtered.tuned 3 18.91091", "title": "Benchmark experiments" }, { "location": "/cost_sensitive_classif/index.html", - "text": "Cost-Sensitive Classification\n\n\nIn \nregular classification\n the aim is to minimize the misclassification rate and\nthus all types of misclassification errors are deemed equally severe.\nA more general setting is \ncost-sensitive classification\n where the costs caused by different\nkinds of errors are not assumed to be equal and the objective is to minimize the expected costs.\n\n\nIn case of \nclass-dependent costs\n the costs depend on the true and predicted class label.\nThe costs \nc(k, l)\n for predicting class \nk\n if the true label is \nl\n are usually organized\ninto a \nK \\times K\n cost matrix where \nK\n is the number of classes.\nNaturally, it is assumed that the cost of predicting the correct class label \ny\n is minimal\n(that is \nc(y, y) \\leq c(k, y)\n for all \nk = 
1,\\ldots,K\n).\n\n\nA further generalization of this scenario are \nexample-dependent misclassification costs\n where\neach example \n(x, y)\n is coupled with an individual cost vector of length \nK\n. Its \nk\n-th\ncomponent expresses the cost of assigning \nx\n to class \nk\n.\nA real-world example is fraud detection where the costs do not only depend on the true and\npredicted status fraud/non-fraud, but also on the amount of money involved in each case.\nNaturally, the cost of predicting the true class label \ny\n is assumed to be minimum.\nThe true class labels are redundant information, as they can be easily inferred from the\ncost vectors.\nMoreover, given the cost vector, the expected costs do not depend on the true class label \ny\n.\nThe classification problem is therefore completely defined by the feature values \nx\n and the\ncorresponding cost vectors.\n\n\nIn the following we show ways to handle cost-sensitive classification problems in \nmlr\n.\nSome of the functionality is currently experimental, and there may be changes in the future.\n\n\nClass-dependent misclassification costs\n\n\nThere are some classification methods that can accomodate misclassification costs\ndirectly.\nOne example is \nrpart\n.\n\n\nAlternatively, we can use cost-insensitive methods and manipulate the predictions or the\ntraining data in order to take misclassification costs into account.\n\nmlr\n supports \nthresholding\n and \nrebalancing\n.\n\n\n\n\n\n\nThresholding\n:\n The thresholds used to turn posterior probabilities into class labels are chosen such that\n the costs are minimized.\n This requires a \nLearner\n that can predict posterior probabilities.\n During training the costs are not taken into account.\n\n\n\n\n\n\nRebalancing\n:\n The idea is to change the proportion of the classes in the training data set in order to\n account for costs during training, either by \nweighting\n or by \nsampling\n.\n Rebalancing does not require that the \nLearner\n can predict probabilities.\n\n\ni. For \nweighting\n we need a \nLearner\n that supports class weights or observation\n weights.\n\n\nii. 
If the \nLearner\n cannot deal with weights the proportion of classes can\n be changed by \nover-\n and \nundersampling\n.\n\n\n\n\n\n\nWe start with binary classification problems and afterwards deal with multi-class problems.\n\n\nBinary classification problems\n\n\nThe positive and negative classes are labeled \n1\n and \n-1\n, respectively, and we consider the\nfollowing cost matrix where the rows indicate true classes and the columns predicted classes:\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\ntrue\\pred.\n\n\n\n\n+1\n\n\n\n\n\n\n-1\n\n\n\n\n\n\n\n\n\n\n+1\n\n\n\n\n\n\nc(+1,+1)\n\n\n\n\n\n\nc(-1,+1)\n\n\n\n\n\n\n\n\n\n\n-1\n\n\n\n\n\n\nc(+1,-1)\n\n\n\n\n\n\nc(-1,-1)\n\n\n\n\n\n\n\n\n\n\nOften, the diagonal entries are zero or the cost matrix is rescaled to achieve zeros in the diagonal\n(see for example \nO'Brien et al, 2008\n).\n\n\nA well-known cost-sensitive classification problem is posed by the\n\nGerman Credit data set\n\n(see also the \nUCI Machine Learning Repository\n).\nThe corresponding cost matrix (though \nElkan (2001)\n\nargues that this matrix is economically unreasonable) is given as:\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\ntrue\\pred.\n\n\nBad\n\n\nGood\n\n\n\n\n\n\nBad\n\n\n0\n\n\n5\n\n\n\n\n\n\nGood\n\n\n1\n\n\n0\n\n\n\n\n\n\n\n\nAs in the table above, the rows indicate true and the columns predicted classes.\n\n\nIn case of class-dependent costs it is sufficient to generate an ordinary \nClassifTask\n.\nA \nCostSensTask\n is only needed if the costs are example-dependent.\nIn the \nR\n code below we create the \nClassifTask\n, remove two constant features from the\ndata set and generate the cost matrix.\nPer default, Bad is the positive class.\n\n\ndata(GermanCredit, package = \ncaret\n)\ncredit.task = makeClassifTask(data = GermanCredit, target = \nClass\n)\ncredit.task = removeConstantFeatures(credit.task)\n#\n Removing 2 columns: Purpose.Vacation,Personal.Female.Single\ncredit.task\n#\n Supervised task: GermanCredit\n#\n Type: classif\n#\n Target: Class\n#\n Observations: 1000\n#\n Features:\n#\n numerics factors ordered \n#\n 59 0 0 \n#\n Missings: FALSE\n#\n Has weights: FALSE\n#\n Has blocking: FALSE\n#\n Classes: 2\n#\n Bad Good \n#\n 300 700 \n#\n Positive class: Bad\n\ncosts = matrix(c(0, 1, 5, 0), 2)\ncolnames(costs) = rownames(costs) = getTaskClassLevels(credit.task)\ncosts\n#\n Bad Good\n#\n Bad 0 5\n#\n Good 1 0\n\n\n\n\n1. Thresholding\n\n\nWe start by fitting a \nlogistic regression model\n to the\n\nGerman credit data set\n and predict posterior probabilities.\n\n\n## Train and predict posterior probabilities\nlrn = makeLearner(\nclassif.multinom\n, predict.type = \nprob\n, trace = FALSE)\nmod = train(lrn, credit.task)\npred = predict(mod, task = credit.task)\npred\n#\n Prediction: 1000 observations\n#\n predict.type: prob\n#\n threshold: Bad=0.50,Good=0.50\n#\n time: 0.01\n#\n id truth prob.Bad prob.Good response\n#\n 1 1 Good 0.03525092 0.9647491 Good\n#\n 2 2 Bad 0.63222363 0.3677764 Bad\n#\n 3 3 Good 0.02807414 0.9719259 Good\n#\n 4 4 Good 0.25182703 0.7481730 Good\n#\n 5 5 Bad 0.75193275 0.2480673 Bad\n#\n 6 6 Good 0.26230149 0.7376985 Good\n\n\n\n\nThe default thresholds for both classes are 0.5.\nBut according to the cost matrix we should predict class Good only if we are very sure that Good\nis indeed the correct label. Therefore we should increase the threshold for class Good and decrease the\nthreshold for class Bad.\n\n\ni. 
Theoretical thresholding\n\n\nThe theoretical threshold for the \npositive\n class can be calculated from the cost matrix as\n\nt^* = \\frac{c(+1,-1) - c(-1,-1)}{c(+1,-1) - c(+1,+1) + c(-1,+1) - c(-1,-1)}.\n\nFor more details see \nElkan (2001)\n.\n\n\nBelow the theoretical threshold for the \nGerman credit example\n\nis calculated and used to predict class labels.\nSince the diagonal of the cost matrix is zero the formula given above simplifies accordingly.\n\n\n## Calculate the theoretical threshold for the positive class\nth = costs[2,1]/(costs[2,1] + costs[1,2])\nth\n#\n [1] 0.1666667\n\n\n\n\nAs you may recall you can change thresholds in \nmlr\n either before training by using the\n\npredict.threshold\n option of \nmakeLearner\n or after prediction by calling \nsetThreshold\n\non the \nPrediction\n object.\n\n\nAs we already have a prediction we use the \nsetThreshold\n function. It returns an altered\n\nPrediction\n object with class predictions for the theoretical threshold.\n\n\n## Predict class labels according to the theoretical threshold\npred.th = setThreshold(pred, th)\npred.th\n#\n Prediction: 1000 observations\n#\n predict.type: prob\n#\n threshold: Bad=0.17,Good=0.83\n#\n time: 0.01\n#\n id truth prob.Bad prob.Good response\n#\n 1 1 Good 0.03525092 0.9647491 Good\n#\n 2 2 Bad 0.63222363 0.3677764 Bad\n#\n 3 3 Good 0.02807414 0.9719259 Good\n#\n 4 4 Good 0.25182703 0.7481730 Bad\n#\n 5 5 Bad 0.75193275 0.2480673 Bad\n#\n 6 6 Good 0.26230149 0.7376985 Bad\n\n\n\n\nIn order to calculate the average costs over the entire data set we first need to create a new\nperformance \nMeasure\n. This can be done through function \nmakeCostMeasure\n\nwhich requires the \nClassifTask\n object and the cost matrix (argument \ncosts\n).\nIt is expected that the rows of the cost matrix indicate true and the columns predicted\nclass labels.\n\n\ncredit.costs = makeCostMeasure(id = \ncredit.costs\n, costs = costs, task = credit.task, best = 0, worst = 5)\ncredit.costs\n#\n Name: credit.costs\n#\n Performance measure: credit.costs\n#\n Properties: classif,classif.multi,req.pred,req.truth,predtype.response,predtype.prob\n#\n Minimize: TRUE\n#\n Best: 0; Worst: 5\n#\n Aggregated by: test.mean\n#\n Note:\n\n\n\n\nThen the average costs can be computed by function \nperformance\n.\nBelow we compare the average costs and the error rate (\nmmce\n) of the learning algorithm\nwith both default thresholds 0.5 and theoretical thresholds.\n\n\n## Performance with default thresholds 0.5\nperformance(pred, measures = list(credit.costs, mmce))\n#\n credit.costs mmce \n#\n 0.774 0.214\n\n## Performance with theoretical thresholds\nperformance(pred.th, measures = list(credit.costs, mmce))\n#\n credit.costs mmce \n#\n 0.478 0.346\n\n\n\n\nThese performance values may be overly optimistic as we used the same data set for training\nand prediction, and resampling strategies should be preferred.\nIn the \nR\n code below we make use of the \npredict.threshold\n argument of \nmakeLearner\n to set\nthe threshold before doing a 3-fold cross-validation on the \ncredit.task\n.\nNote that we create a \nResampleInstance\n (\nrin\n) that is used throughout\nthe next several code chunks to get comparable performance values.\n\n\n## Cross-validated performance with theoretical thresholds\nrin = makeResampleInstance(\nCV\n, iters = 3, task = credit.task)\nlrn = makeLearner(\nclassif.multinom\n, predict.type = \nprob\n, predict.threshold = th, trace = FALSE)\nr = resample(lrn, credit.task, resampling = rin, measures = 
list(credit.costs, mmce), show.info = FALSE)\nr\n#\n Resample Result\n#\n Task: GermanCredit\n#\n Learner: classif.multinom\n#\n credit.costs.aggr: 0.56\n#\n credit.costs.mean: 0.56\n#\n credit.costs.sd: 0.03\n#\n mmce.aggr: 0.36\n#\n mmce.mean: 0.36\n#\n mmce.sd: 0.02\n#\n Runtime: 0.323072\n\n\n\n\nIf we are also interested in the cross-validated performance for the default threshold values\nwe can call \nsetThreshold\n on the \nresample prediction\n \nr$pred\n.\n\n\n## Cross-validated performance with default thresholds\nperformance(setThreshold(r$pred, 0.5), measures = list(credit.costs, mmce))\n#\n credit.costs mmce \n#\n 0.8521695 0.2480205\n\n\n\n\nTheoretical thresholding is only reliable if the predicted posterior probabilities are correct.\nIf there is bias the thresholds have to be shifted accordingly.\n\n\nUseful in this regard is function \nplotThreshVsPerf\n that you can use to plot the average costs\nas well as any other performance measure versus possible threshold values for the positive\nclass in \n[0,1]\n. The underlying data is generated by \ngenerateThreshVsPerfData\n.\n\n\nThe following plots show the cross-validated costs and error rate (\nmmce\n).\nThe theoretical threshold \nth\n calculated above is indicated by the vertical line.\nAs you can see from the left-hand plot the theoretical threshold seems a bit large.\n\n\nd = generateThreshVsPerfData(r, measures = list(credit.costs, mmce))\nplotThreshVsPerf(d, mark.th = th)\n\n\n\n\n \n\n\nii. Empirical thresholding\n\n\nThe idea of \nempirical thresholding\n (see \nSheng and Ling, 2006\n)\nis to select cost-optimal threshold values for a given learning method based on the training data.\nIn contrast to \ntheoretical thresholding\n it suffices if the estimated posterior probabilities\nare order-correct.\n\n\nIn order to determine optimal threshold values you can use \nmlr\n's function \ntuneThreshold\n.\nAs tuning the threshold on the complete training data set can lead to overfitting, you should\nuse resampling strategies.\nBelow we perform 3-fold cross-validation and use \ntuneThreshold\n to calculate threshold values\nwith lowest average costs over the 3 test data sets.\n\n\nlrn = makeLearner(\nclassif.multinom\n, predict.type = \nprob\n, trace = FALSE)\n\n## 3-fold cross-validation\nr = resample(lrn, credit.task, resampling = rin, measures = list(credit.costs, mmce), show.info = FALSE)\nr\n#\n Resample Result\n#\n Task: GermanCredit\n#\n Learner: classif.multinom\n#\n credit.costs.aggr: 0.85\n#\n credit.costs.mean: 0.85\n#\n credit.costs.sd: 0.17\n#\n mmce.aggr: 0.25\n#\n mmce.mean: 0.25\n#\n mmce.sd: 0.03\n#\n Runtime: 0.343711\n\n## Tune the threshold based on the predicted probabilities on the 3 test data sets\ntune.res = tuneThreshold(pred = r$pred, measure = credit.costs)\ntune.res\n#\n $th\n#\n [1] 0.1115426\n#\n \n#\n $perf\n#\n credit.costs \n#\n 0.507004\n\n\n\n\ntuneThreshold\n returns the optimal threshold value for the positive class and the corresponding\nperformance.\nAs expected the tuned threshold is smaller than the theoretical threshold.\n\n\n2. 
Rebalancing\n\n\nIn order to minimize the average costs, observations from the less costly class should be\ngiven higher importance during training.\nThis can be achieved by \nweighting\n the classes, provided that the learner under consideration\nhas a 'class weights' or an 'observation weights' argument.\nTo find out which learning methods support either type of weights have a look at the\n\nlist of integrated learners\n in the Appendix or use \nlistLearners\n.\n\n\n## Learners that accept observation weights\nlistLearners(\nclassif\n, properties = \nweights\n)\n#\n [1] \nclassif.ada\n \nclassif.avNNet\n \nclassif.binomial\n \n#\n [4] \nclassif.blackboost\n \nclassif.cforest\n \nclassif.ctree\n \n#\n [7] \nclassif.extraTrees\n \nclassif.gbm\n \nclassif.glmboost\n \n#\n [10] \nclassif.glmnet\n \nclassif.logreg\n \nclassif.lqa\n \n#\n [13] \nclassif.multinom\n \nclassif.nnet\n \nclassif.plr\n \n#\n [16] \nclassif.probit\n \nclassif.rpart\n \nclassif.xgboost\n\n\n## Learners that can deal with class weights\nlistLearners(\nclassif\n, properties = \nclass.weights\n)\n#\n [1] \nclassif.ksvm\n \nclassif.LiblineaRL1L2SVC\n \n#\n [3] \nclassif.LiblineaRL1LogReg\n \nclassif.LiblineaRL2L1SVC\n \n#\n [5] \nclassif.LiblineaRL2LogReg\n \nclassif.LiblineaRL2SVC\n \n#\n [7] \nclassif.LiblineaRMultiClassSVC\n \nclassif.randomForest\n \n#\n [9] \nclassif.svm\n\n\n\n\n\nAlternatively, \nover- and undersampling\n techniques can be used.\n\n\ni. Weighting\n\n\nJust as \ntheoretical thresholds\n, \ntheoretical weights\n can be calculated from the\ncost matrix.\nIf \nt\n indicates the target threshold and \nt_0\n the original threshold for the positive class the\nproportion of observations in the positive class has to be multiplied by\n\n\\frac{1-t}{t} \\frac{t_0}{1-t_0}.\n\nAlternatively, the proportion of observations in the negative class can be multiplied by\nthe inverse.\nA proof is given by \nElkan (2001)\n.\n\n\nIn most cases, the original threshold is \nt_0 = 0.5\n and thus the second factor vanishes.\nIf additionally the target threshold \nt\n equals the theoretical threshold \nt^*\n the\nproportion of observations in the positive class has to be multiplied by\n\n\\frac{1-t^*}{t^*} = \\frac{c(-1,+1) - c(+1,+1)}{c(+1,-1) - c(-1,-1)}.\n\n\n\n\nFor the \ncredit example\n the theoretical threshold corresponds to a\nweight of 5 for the positive class.\n\n\n## Weight for positive class corresponding to theoretical treshold\nw = (1 - th)/th\nw\n#\n [1] 5\n\n\n\n\nA unified and convenient way to assign class weights to a \nLearner\n (and tune\nthem) is provided by function \nmakeWeightedClassesWrapper\n. The class weights are specified\nusing argument \nwcw.weight\n.\nFor learners that support observation weights a suitable weight vector is then generated\ninternally during training or resampling.\nIf the learner can deal with class weights, the weights are basically passed on to the\nappropriate learner parameter. The advantage of using the wrapper in this case is the unified\nway to specify the class weights.\n\n\nBelow is an example using learner \n\"classif.multinom\"\n (\nmultinom\n from\npackage \nnnet\n) which accepts observation weights.\nFor binary classification problems it is sufficient to specify the weight \nw\n for the positive\nclass. 
The negative class then automatically receives weight 1.\n\n\n## Weighted learner\nlrn = makeLearner(\nclassif.multinom\n, trace = FALSE)\nlrn = makeWeightedClassesWrapper(lrn, wcw.weight = w)\nlrn\n#\n Learner weightedclasses.classif.multinom from package nnet\n#\n Type: classif\n#\n Name: ; Short name: \n#\n Class: WeightedClassesWrapper\n#\n Properties: twoclass,multiclass,numerics,factors,prob\n#\n Predict-Type: response\n#\n Hyperparameters: trace=FALSE,wcw.weight=5\n\nr = resample(lrn, credit.task, rin, measures = list(credit.costs, mmce), show.info = FALSE)\nr\n#\n Resample Result\n#\n Task: GermanCredit\n#\n Learner: weightedclasses.classif.multinom\n#\n credit.costs.aggr: 0.53\n#\n credit.costs.mean: 0.53\n#\n credit.costs.sd: 0.04\n#\n mmce.aggr: 0.35\n#\n mmce.mean: 0.35\n#\n mmce.sd: 0.02\n#\n Runtime: 0.395105\n\n\n\n\nFor classification methods like \n\"classif.ksvm\"\n (the support vector machine\n\nksvm\n in package \nkernlab\n) that support class weights you can pass them\ndirectly.\n\n\nlrn = makeLearner(\nclassif.ksvm\n, class.weights = c(Bad = w, Good = 1))\n\n\n\n\nOr, more conveniently, you can again use \nmakeWeightedClassesWrapper\n.\n\n\nlrn = makeWeightedClassesWrapper(\nclassif.ksvm\n, wcw.weight = w)\nr = resample(lrn, credit.task, rin, measures = list(credit.costs, mmce), show.info = FALSE)\nr\n#\n Resample Result\n#\n Task: GermanCredit\n#\n Learner: weightedclasses.classif.ksvm\n#\n credit.costs.aggr: 0.58\n#\n credit.costs.mean: 0.58\n#\n credit.costs.sd: 0.04\n#\n mmce.aggr: 0.31\n#\n mmce.mean: 0.31\n#\n mmce.sd: 0.02\n#\n Runtime: 0.546966\n\n\n\n\nJust like the theoretical threshold, the theoretical weights may not always be suitable,\ntherefore you can tune the weight for the positive class as shown in the following example.\nCalculating the theoretical weight beforehand may help to narrow down the search interval.\n\n\nlrn = makeLearner(\nclassif.multinom\n, trace = FALSE)\nlrn = makeWeightedClassesWrapper(lrn)\nps = makeParamSet(makeDiscreteParam(\nwcw.weight\n, seq(4, 12, 0.5)))\nctrl = makeTuneControlGrid()\ntune.res = tuneParams(lrn, credit.task, resampling = rin, par.set = ps,\n measures = list(credit.costs, mmce), control = ctrl, show.info = FALSE)\ntune.res\n#\n Tune result:\n#\n Op. pars: wcw.weight=7.5\n#\n credit.costs.test.mean=0.501,mmce.test.mean=0.381\n\nas.data.frame(tune.res$opt.path)[1:3]\n#\n wcw.weight credit.costs.test.mean mmce.test.mean\n#\n 1 4 0.5650291 0.3330127\n#\n 2 4.5 0.5550251 0.3430167\n#\n 3 5 0.5260320 0.3460197\n#\n 4 5.5 0.5130070 0.3530147\n#\n 5 6 0.5160100 0.3640137\n#\n 6 6.5 0.5160160 0.3720157\n#\n 7 7 0.5040250 0.3760167\n#\n 8 7.5 0.5010040 0.3810038\n#\n 9 8 0.5100130 0.3900128\n#\n 10 8.5 0.5100070 0.3940108\n#\n 11 9 0.5110080 0.4030078\n#\n 12 9.5 0.5160130 0.4080128\n#\n 13 10 0.5260140 0.4180138\n#\n 14 10.5 0.5240060 0.4200098\n#\n 15 11 0.5319991 0.4280029\n#\n 16 11.5 0.5289901 0.4330019\n#\n 17 12 0.5249801 0.4369999\n\n\n\n\nii. 
Over- and undersampling\n\n\nIf the \nLearner\n supports neither observation nor class weights the proportions\nof the classes in the training data can be changed by over- or undersampling.\n\n\nIn the \nGermanCredit data set\n the positive class Bad should receive\na theoretical weight of \nw = (1 - th)/th = 5\n.\nThis can be achieved by oversampling class Bad with a \nrate\n of 5 (see also the documentation\nof function \noversample\n).\n\n\ncredit.task.over = oversample(credit.task, rate = w)\nlrn = makeLearner(\nclassif.multinom\n, trace = FALSE)\nmod = train(lrn, credit.task.over)\npred = predict(mod, task = credit.task)\nperformance(pred, measures = list(credit.costs, mmce))\n#\n credit.costs mmce \n#\n 0.439 0.323\n\n\n\n\nNote that in the above example the learner was trained on the oversampled task \ncredit.task.over\n.\nIn order to get the training performance on the original task predictions were calculated for \ncredit.task\n.\n\n\nWe usually prefer resampled performance values, but simply calling \nresample\n on the oversampled\ntask does not work since predictions have to be based on the original task.\nThe solution is to create a wrapped \nLearner\n via function\n\nmakeOversampleWrapper\n.\nInternally, \noversample\n is called before training, but predictions are done on the original data.\n\n\nlrn = makeLearner(\nclassif.multinom\n, trace = FALSE)\nlrn = makeOversampleWrapper(lrn, osw.rate = w)\nlrn\n#\n Learner classif.multinom.oversampled from package mlr,nnet\n#\n Type: classif\n#\n Name: ; Short name: \n#\n Class: OversampleWrapper\n#\n Properties: numerics,factors,weights,prob,twoclass,multiclass\n#\n Predict-Type: response\n#\n Hyperparameters: trace=FALSE,osw.rate=5\n\nr = resample(lrn, credit.task, rin, measures = list(credit.costs, mmce), show.info = FALSE)\nr\n#\n Resample Result\n#\n Task: GermanCredit\n#\n Learner: classif.multinom.oversampled\n#\n credit.costs.aggr: 0.56\n#\n credit.costs.mean: 0.56\n#\n credit.costs.sd: 0.05\n#\n mmce.aggr: 0.35\n#\n mmce.mean: 0.35\n#\n mmce.sd: 0.02\n#\n Runtime: 0.696635\n\n\n\n\nOf course, we can also tune the oversampling rate. For this purpose we again have to create\nan \nOversampleWrapper\n.\nOptimal values for parameter \nosw.rate\n can be obtained using function \ntuneParams\n.\n\n\nlrn = makeLearner(\nclassif.multinom\n, trace = FALSE)\nlrn = makeOversampleWrapper(lrn)\nps = makeParamSet(makeDiscreteParam(\nosw.rate\n, seq(3, 7, 0.25)))\nctrl = makeTuneControlGrid()\ntune.res = tuneParams(lrn, credit.task, rin, par.set = ps, measures = list(credit.costs, mmce),\n control = ctrl, show.info = FALSE)\ntune.res\n#\n Tune result:\n#\n Op. 
pars: osw.rate=7\n#\n credit.costs.test.mean=0.506,mmce.test.mean=0.37\n\n\n\n\nMulti-class problems\n\n\nWe consider the \nwaveform\n data set from package \nmlbench\n and\nadd an artificial cost matrix:\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\ntrue\\pred.\n\n\n1\n\n\n2\n\n\n3\n\n\n\n\n\n\n1\n\n\n0\n\n\n30\n\n\n80\n\n\n\n\n\n\n2\n\n\n5\n\n\n0\n\n\n4\n\n\n\n\n\n\n3\n\n\n10\n\n\n8\n\n\n0\n\n\n\n\n\n\n\n\nWe start by creating the \nTask\n, the cost matrix and the corresponding performance measure.\n\n\n## Task\ndf = mlbench::mlbench.waveform(500)\nwf.task = makeClassifTask(id = \nwaveform\n, data = as.data.frame(df), target = \nclasses\n)\n\n## Cost matrix\ncosts = matrix(c(0, 5, 10, 30, 0, 8, 80, 4, 0), 3)\ncolnames(costs) = rownames(costs) = getTaskClassLevels(wf.task)\n\n## Performance measure\nwf.costs = makeCostMeasure(id = \nwf.costs\n, costs = costs, task = wf.task, best = 0,\n worst = 10)\n\n\n\n\nIn the multi-class case, both, \nthresholding\n and \nrebalancing\n correspond to cost matrices\nof a certain structure where \nc(k,l) = c(l)\n for \nk\n, \nl = 1, \\ldots, K\n, \nk \\neq l\n.\nThis condition means that the cost of misclassifying an observation is independent of the\npredicted class label\n(see \nDomingos, 1999\n).\nGiven a cost matrix of this type, theoretical thresholds and weights can be derived\nin a similar manner as in the binary case.\nObviously, the cost matrix given above does not have this special structure.\n\n\n1. Thresholding\n\n\nGiven a vector of positive threshold values as long as the number of classes \nK\n, the predicted\nprobabilities for all classes are adjusted by dividing them by the corresponding threshold value.\nThen the class with the highest adjusted probability is predicted.\nThis way, as in the binary case, classes with a low threshold are preferred to classes\nwith a larger threshold.\n\n\nAgain this can be done by function \nsetThreshold\n as shown in the following example (or\nalternatively by the \npredict.threshold\n option of \nmakeLearner\n).\nNote that the threshold vector needs to have names that correspond to the class labels.\n\n\nlrn = makeLearner(\nclassif.rpart\n, predict.type = \nprob\n)\nrin = makeResampleInstance(\nCV\n, iters = 3, task = wf.task)\nr = resample(lrn, wf.task, rin, measures = list(wf.costs, mmce), show.info = FALSE)\nr\n#\n Resample Result\n#\n Task: waveform\n#\n Learner: classif.rpart\n#\n wf.costs.aggr: 8.40\n#\n wf.costs.mean: 8.40\n#\n wf.costs.sd: 1.57\n#\n mmce.aggr: 0.30\n#\n mmce.mean: 0.30\n#\n mmce.sd: 0.00\n#\n Runtime: 0.0716846\n\n## Calculate thresholds as 1/(average costs of true classes)\nth = 2/rowSums(costs)\nnames(th) = getTaskClassLevels(wf.task)\nth\n#\n 1 2 3 \n#\n 0.01818182 0.22222222 0.11111111\n\npred.th = setThreshold(r$pred, threshold = th)\nperformance(pred.th, measures = list(wf.costs, mmce))\n#\n wf.costs mmce \n#\n 5.7498377 0.3598947\n\n\n\n\nThe threshold vector \nth\n in the above example is chosen according to the average costs\nof the true classes 55, 4.5 and 9.\nMore exactly, \nth\n corresponds to an artificial cost matrix of the structure mentioned\nabove with off-diagonal elements \nc(2,1) = c(3,1) = 55\n, \nc(1,2) = c(3,2) = 4.5\n and\n\nc(1,3) = c(2,3) = 9\n.\nThis threshold vector may be not optimal but leads to smaller total costs on the data set than\nthe default.\n\n\nii. 
Empirical thresholding\n\n\nAs in the binary case it is possible to tune the threshold vector using function \ntuneThreshold\n.\nSince the scaling of the threshold vector does not change the predicted class labels\n\ntuneThreshold\n returns threshold values that lie in [0,1] and sum to unity.\n\n\ntune.res = tuneThreshold(pred = r$pred, measure = wf.costs)\ntune.res\n#\n $th\n#\n 1 2 3 \n#\n 0.03734676 0.31826144 0.64439179 \n#\n \n#\n $perf\n#\n [1] 4.275197\n\n\n\n\nFor comparison we show the standardized version of the theoretically motivated threshold\nvector chosen above.\n\n\nth/sum(th)\n#\n 1 2 3 \n#\n 0.05172414 0.63218391 0.31609195\n\n\n\n\n2. Rebalancing\n\n\ni. Weighting\n\n\nIn the multi-class case you have to pass a vector of weights as long as the number of classes\n\nK\n to function \nmakeWeightedClassesWrapper\n.\nThe weight vector can be tuned using function \ntuneParams\n.\n\n\nlrn = makeLearner(\nclassif.multinom\n, trace = FALSE)\nlrn = makeWeightedClassesWrapper(lrn)\n\nps = makeParamSet(makeNumericVectorParam(\nwcw.weight\n, len = 3, lower = 0, upper = 1))\nctrl = makeTuneControlRandom()\n\ntune.res = tuneParams(lrn, wf.task, resampling = rin, par.set = ps,\n measures = list(wf.costs, mmce), control = ctrl, show.info = FALSE)\ntune.res\n#\n Tune result:\n#\n Op. pars: wcw.weight=0.871,0.105,0.0306\n#\n wf.costs.test.mean= 2.7,mmce.test.mean=0.242\n\n\n\n\nExample-dependent misclassification costs\n\n\nIn case of example-dependent costs we have to create a special \nTask\n via function\n\nmakeCostSensTask\n.\nFor this purpose the feature values \nx\n and an \nn \\times K\n\n\ncost\n matrix that contains\nthe cost vectors for all \nn\n examples in the data set are required.\n\n\nWe use the \niris\n data and generate an artificial cost matrix\n(see \nBeygelzimer et al., 2005\n).\n\n\ndf = iris\ncost = matrix(runif(150 * 3, 0, 2000), 150) * (1 - diag(3))[df$Species,] + runif(150, 0, 10)\ncolnames(cost) = levels(iris$Species)\nrownames(cost) = rownames(iris)\ndf$Species = NULL\n\ncostsens.task = makeCostSensTask(id = \niris\n, data = df, cost = cost)\ncostsens.task\n#\n Supervised task: iris\n#\n Type: costsens\n#\n Observations: 150\n#\n Features:\n#\n numerics factors ordered \n#\n 4 0 0 \n#\n Missings: FALSE\n#\n Has blocking: FALSE\n#\n Classes: 3\n#\n setosa, versicolor, virginica\n\n\n\n\nmlr\n provides several \nwrappers\n to turn regular classification or regression methods\ninto \nLearner\ns that can deal with example-dependent costs.\n\n\n\n\nmakeCostSensClassifWrapper\n (wraps a classification \nLearner\n):\n This is a naive approach where the costs are coerced into class labels by choosing the\n class label with minimum cost for each example. 
Then a regular classification method is\n used.\n\n\nmakeCostSensRegrWrapper\n (wraps a regression \nLearner\n):\n An individual regression model is fitted for the costs of each class.\n In the prediction step first the costs are predicted for all classes and then the class with\n the lowest predicted costs is selected.\n\n\nmakeCostSensWeightedPairsWrapper\n (wraps a classification \nLearner\n):\n This is also known as \ncost-sensitive one-vs-one\n (CS-OVO) and the most sophisticated of\n the currently supported methods.\n For each pair of classes, a binary classifier is fitted.\n For each observation the class label is defined as the element of the pair with minimal costs.\n During fitting, the observations are weighted with the absolute difference in costs.\n Prediction is performed by simple voting.\n\n\n\n\nIn the following example we use the third method. We create the wrapped \nLearner\n\nand train it on the \nCostSensTask\n defined above.\n\n\nlrn = makeLearner(\nclassif.multinom\n, trace = FALSE)\nlrn = makeCostSensWeightedPairsWrapper(lrn)\nlrn\n#\n Learner costsens.classif.multinom from package nnet\n#\n Type: costsens\n#\n Name: ; Short name: \n#\n Class: CostSensWeightedPairsWrapper\n#\n Properties: twoclass,multiclass,numerics,factors\n#\n Predict-Type: response\n#\n Hyperparameters: trace=FALSE\n\nmod = train(lrn, costsens.task)\nmod\n#\n Model for learner.id=costsens.classif.multinom; learner.class=CostSensWeightedPairsWrapper\n#\n Trained on: task.id = iris; obs = 150; features = 4\n#\n Hyperparameters: trace=FALSE\n\n\n\n\nThe models corresponding to the individual pairs can be accessed by function\n\ngetLearnerModel\n.\n\n\ngetLearnerModel(mod)\n#\n [[1]]\n#\n Model for learner.id=classif.multinom; learner.class=classif.multinom\n#\n Trained on: task.id = feats; obs = 150; features = 4\n#\n Hyperparameters: trace=FALSE\n#\n \n#\n [[2]]\n#\n Model for learner.id=classif.multinom; learner.class=classif.multinom\n#\n Trained on: task.id = feats; obs = 150; features = 4\n#\n Hyperparameters: trace=FALSE\n#\n \n#\n [[3]]\n#\n Model for learner.id=classif.multinom; learner.class=classif.multinom\n#\n Trained on: task.id = feats; obs = 150; features = 4\n#\n Hyperparameters: trace=FALSE\n\n\n\n\nmlr\n provides some performance measures for example-specific cost-sensitive classification.\nIn the following example we calculate the mean costs of the predicted class labels\n(\nmeancosts\n) and the misclassification penalty (\nmcp\n).\nThe latter measure is the average difference between the costs caused by the predicted\nclass labels, i.e., \nmeancosts\n, and the costs resulting from choosing the\nclass with lowest cost for each observation.\nIn order to compute these measures the costs for the test observations are required and\ntherefore the \nTask\n has to be passed to \nperformance\n.\n\n\npred = predict(mod, task = costsens.task)\npred\n#\n Prediction: 150 observations\n#\n predict.type: response\n#\n threshold: \n#\n time: 0.05\n#\n id response\n#\n 1 1 setosa\n#\n 2 2 setosa\n#\n 3 3 setosa\n#\n 4 4 setosa\n#\n 5 5 setosa\n#\n 6 6 setosa\n\nperformance(pred, measures = list(meancosts, mcp), task = costsens.task)\n#\n meancosts mcp \n#\n 163.0830 158.2944", + "text": "Cost-Sensitive Classification\n\n\nIn \nregular classification\n the aim is to minimize the misclassification rate and\nthus all types of misclassification errors are deemed equally severe.\nA more general setting is \ncost-sensitive classification\n where the costs caused by different\nkinds of errors are not 
assumed to be equal and the objective is to minimize the expected costs.\n\n\nIn case of \nclass-dependent costs\n the costs depend on the true and predicted class label.\nThe costs \nc(k, l)\n for predicting class \nk\n if the true label is \nl\n are usually organized\ninto a \nK \\times K\n cost matrix where \nK\n is the number of classes.\nNaturally, it is assumed that the cost of predicting the correct class label \ny\n is minimal\n(that is \nc(y, y) \\leq c(k, y)\n for all \nk = 1,\\ldots,K\n).\n\n\nA further generalization of this scenario are \nexample-dependent misclassification costs\n where\neach example \n(x, y)\n is coupled with an individual cost vector of length \nK\n. Its \nk\n-th\ncomponent expresses the cost of assigning \nx\n to class \nk\n.\nA real-world example is fraud detection where the costs do not only depend on the true and\npredicted status fraud/non-fraud, but also on the amount of money involved in each case.\nNaturally, the cost of predicting the true class label \ny\n is assumed to be minimum.\nThe true class labels are redundant information, as they can be easily inferred from the\ncost vectors.\nMoreover, given the cost vector, the expected costs do not depend on the true class label \ny\n.\nThe classification problem is therefore completely defined by the feature values \nx\n and the\ncorresponding cost vectors.\n\n\nIn the following we show ways to handle cost-sensitive classification problems in \nmlr\n.\nSome of the functionality is currently experimental, and there may be changes in the future.\n\n\nClass-dependent misclassification costs\n\n\nThere are some classification methods that can accomodate misclassification costs\ndirectly.\nOne example is \nrpart\n.\n\n\nAlternatively, we can use cost-insensitive methods and manipulate the predictions or the\ntraining data in order to take misclassification costs into account.\n\nmlr\n supports \nthresholding\n and \nrebalancing\n.\n\n\n\n\n\n\nThresholding\n:\n The thresholds used to turn posterior probabilities into class labels are chosen such that\n the costs are minimized.\n This requires a \nLearner\n that can predict posterior probabilities.\n During training the costs are not taken into account.\n\n\n\n\n\n\nRebalancing\n:\n The idea is to change the proportion of the classes in the training data set in order to\n account for costs during training, either by \nweighting\n or by \nsampling\n.\n Rebalancing does not require that the \nLearner\n can predict probabilities.\n\n\ni. For \nweighting\n we need a \nLearner\n that supports class weights or observation\n weights.\n\n\nii. 
If the \nLearner\n cannot deal with weights the proportion of classes can\n be changed by \nover-\n and \nundersampling\n.\n\n\n\n\n\n\nWe start with binary classification problems and afterwards deal with multi-class problems.\n\n\nBinary classification problems\n\n\nThe positive and negative classes are labeled \n1\n and \n-1\n, respectively, and we consider the\nfollowing cost matrix where the rows indicate true classes and the columns predicted classes:\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\ntrue\\pred.\n\n\n\n\n+1\n\n\n\n\n\n\n-1\n\n\n\n\n\n\n\n\n\n\n+1\n\n\n\n\n\n\nc(+1,+1)\n\n\n\n\n\n\nc(-1,+1)\n\n\n\n\n\n\n\n\n\n\n-1\n\n\n\n\n\n\nc(+1,-1)\n\n\n\n\n\n\nc(-1,-1)\n\n\n\n\n\n\n\n\n\n\nOften, the diagonal entries are zero or the cost matrix is rescaled to achieve zeros in the diagonal\n(see for example \nO'Brien et al, 2008\n).\n\n\nA well-known cost-sensitive classification problem is posed by the\n\nGerman Credit data set\n\n(see also the \nUCI Machine Learning Repository\n).\nThe corresponding cost matrix (though \nElkan (2001)\n\nargues that this matrix is economically unreasonable) is given as:\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\ntrue\\pred.\n\n\nBad\n\n\nGood\n\n\n\n\n\n\nBad\n\n\n0\n\n\n5\n\n\n\n\n\n\nGood\n\n\n1\n\n\n0\n\n\n\n\n\n\n\n\nAs in the table above, the rows indicate true and the columns predicted classes.\n\n\nIn case of class-dependent costs it is sufficient to generate an ordinary \nClassifTask\n.\nA \nCostSensTask\n is only needed if the costs are example-dependent.\nIn the \nR\n code below we create the \nClassifTask\n, remove two constant features from the\ndata set and generate the cost matrix.\nPer default, Bad is the positive class.\n\n\ndata(GermanCredit, package = \ncaret\n)\ncredit.task = makeClassifTask(data = GermanCredit, target = \nClass\n)\ncredit.task = removeConstantFeatures(credit.task)\n#\n Removing 2 columns: Purpose.Vacation,Personal.Female.Single\n\ncredit.task\n#\n Supervised task: GermanCredit\n#\n Type: classif\n#\n Target: Class\n#\n Observations: 1000\n#\n Features:\n#\n numerics factors ordered \n#\n 59 0 0 \n#\n Missings: FALSE\n#\n Has weights: FALSE\n#\n Has blocking: FALSE\n#\n Classes: 2\n#\n Bad Good \n#\n 300 700 \n#\n Positive class: Bad\n\ncosts = matrix(c(0, 1, 5, 0), 2)\ncolnames(costs) = rownames(costs) = getTaskClassLevels(credit.task)\ncosts\n#\n Bad Good\n#\n Bad 0 5\n#\n Good 1 0\n\n\n\n\n1. Thresholding\n\n\nWe start by fitting a \nlogistic regression model\n to the\n\nGerman credit data set\n and predict posterior probabilities.\n\n\n## Train and predict posterior probabilities\nlrn = makeLearner(\nclassif.multinom\n, predict.type = \nprob\n, trace = FALSE)\nmod = train(lrn, credit.task)\npred = predict(mod, task = credit.task)\npred\n#\n Prediction: 1000 observations\n#\n predict.type: prob\n#\n threshold: Bad=0.50,Good=0.50\n#\n time: 0.01\n#\n id truth prob.Bad prob.Good response\n#\n 1 1 Good 0.03525092 0.9647491 Good\n#\n 2 2 Bad 0.63222363 0.3677764 Bad\n#\n 3 3 Good 0.02807414 0.9719259 Good\n#\n 4 4 Good 0.25182703 0.7481730 Good\n#\n 5 5 Bad 0.75193275 0.2480673 Bad\n#\n 6 6 Good 0.26230149 0.7376985 Good\n\n\n\n\nThe default thresholds for both classes are 0.5.\nBut according to the cost matrix we should predict class Good only if we are very sure that Good\nis indeed the correct label. Therefore we should increase the threshold for class Good and decrease the\nthreshold for class Bad.\n\n\ni. 
Theoretical thresholding\n\n\nThe theoretical threshold for the \npositive\n class can be calculated from the cost matrix as\n\nt^* = \\frac{c(+1,-1) - c(-1,-1)}{c(+1,-1) - c(+1,+1) + c(-1,+1) - c(-1,-1)}.\n\nFor more details see \nElkan (2001)\n.\n\n\nBelow the theoretical threshold for the \nGerman credit example\n\nis calculated and used to predict class labels.\nSince the diagonal of the cost matrix is zero the formula given above simplifies accordingly.\n\n\n## Calculate the theoretical threshold for the positive class\nth = costs[2,1]/(costs[2,1] + costs[1,2])\nth\n#\n [1] 0.1666667\n\n\n\n\nAs you may recall you can change thresholds in \nmlr\n either before training by using the\n\npredict.threshold\n option of \nmakeLearner\n or after prediction by calling \nsetThreshold\n\non the \nPrediction\n object.\n\n\nAs we already have a prediction we use the \nsetThreshold\n function. It returns an altered\n\nPrediction\n object with class predictions for the theoretical threshold.\n\n\n## Predict class labels according to the theoretical threshold\npred.th = setThreshold(pred, th)\npred.th\n#\n Prediction: 1000 observations\n#\n predict.type: prob\n#\n threshold: Bad=0.17,Good=0.83\n#\n time: 0.01\n#\n id truth prob.Bad prob.Good response\n#\n 1 1 Good 0.03525092 0.9647491 Good\n#\n 2 2 Bad 0.63222363 0.3677764 Bad\n#\n 3 3 Good 0.02807414 0.9719259 Good\n#\n 4 4 Good 0.25182703 0.7481730 Bad\n#\n 5 5 Bad 0.75193275 0.2480673 Bad\n#\n 6 6 Good 0.26230149 0.7376985 Bad\n\n\n\n\nIn order to calculate the average costs over the entire data set we first need to create a new\nperformance \nMeasure\n. This can be done through function \nmakeCostMeasure\n\nwhich requires the \nClassifTask\n object and the cost matrix (argument \ncosts\n).\nIt is expected that the rows of the cost matrix indicate true and the columns predicted\nclass labels.\n\n\ncredit.costs = makeCostMeasure(id = \ncredit.costs\n, name = \nCredit costs\n, costs = costs, task = credit.task,\n best = 0, worst = 5)\ncredit.costs\n#\n Name: Credit costs\n#\n Performance measure: credit.costs\n#\n Properties: classif,classif.multi,req.pred,req.truth,predtype.response,predtype.prob\n#\n Minimize: TRUE\n#\n Best: 0; Worst: 5\n#\n Aggregated by: test.mean\n#\n Note:\n\n\n\n\nThen the average costs can be computed by function \nperformance\n.\nBelow we compare the average costs and the error rate (\nmmce\n) of the learning algorithm\nwith both default thresholds 0.5 and theoretical thresholds.\n\n\n## Performance with default thresholds 0.5\nperformance(pred, measures = list(credit.costs, mmce))\n#\n credit.costs mmce \n#\n 0.774 0.214\n\n## Performance with theoretical thresholds\nperformance(pred.th, measures = list(credit.costs, mmce))\n#\n credit.costs mmce \n#\n 0.478 0.346\n\n\n\n\nThese performance values may be overly optimistic as we used the same data set for training\nand prediction, and resampling strategies should be preferred.\nIn the \nR\n code below we make use of the \npredict.threshold\n argument of \nmakeLearner\n to set\nthe threshold before doing a 3-fold cross-validation on the \ncredit.task\n.\nNote that we create a \nResampleInstance\n (\nrin\n) that is used throughout\nthe next several code chunks to get comparable performance values.\n\n\n## Cross-validated performance with theoretical thresholds\nrin = makeResampleInstance(\nCV\n, iters = 3, task = credit.task)\nlrn = makeLearner(\nclassif.multinom\n, predict.type = \nprob\n, predict.threshold = th, trace = FALSE)\nr = resample(lrn, credit.task, resampling = 
rin, measures = list(credit.costs, mmce), show.info = FALSE)\nr\n#\n Resample Result\n#\n Task: GermanCredit\n#\n Learner: classif.multinom\n#\n credit.costs.aggr: 0.56\n#\n credit.costs.mean: 0.56\n#\n credit.costs.sd: 0.03\n#\n mmce.aggr: 0.36\n#\n mmce.mean: 0.36\n#\n mmce.sd: 0.02\n#\n Runtime: 0.385333\n\n\n\n\nIf we are also interested in the cross-validated performance for the default threshold values\nwe can call \nsetThreshold\n on the \nresample prediction\n \nr$pred\n.\n\n\n## Cross-validated performance with default thresholds\nperformance(setThreshold(r$pred, 0.5), measures = list(credit.costs, mmce))\n#\n credit.costs mmce \n#\n 0.8521695 0.2480205\n\n\n\n\nTheoretical thresholding is only reliable if the predicted posterior probabilities are correct.\nIf there is bias the thresholds have to be shifted accordingly.\n\n\nUseful in this regard is function \nplotThreshVsPerf\n that you can use to plot the average costs\nas well as any other performance measure versus possible threshold values for the positive\nclass in \n[0,1]\n. The underlying data is generated by \ngenerateThreshVsPerfData\n.\n\n\nThe following plots show the cross-validated costs and error rate (\nmmce\n).\nThe theoretical threshold \nth\n calculated above is indicated by the vertical line.\nAs you can see from the left-hand plot the theoretical threshold seems a bit large.\n\n\nd = generateThreshVsPerfData(r, measures = list(credit.costs, mmce))\nplotThreshVsPerf(d, mark.th = th)\n\n\n\n\n \n\n\nii. Empirical thresholding\n\n\nThe idea of \nempirical thresholding\n (see \nSheng and Ling, 2006\n)\nis to select cost-optimal threshold values for a given learning method based on the training data.\nIn contrast to \ntheoretical thresholding\n it suffices if the estimated posterior probabilities\nare order-correct.\n\n\nIn order to determine optimal threshold values you can use \nmlr\n's function \ntuneThreshold\n.\nAs tuning the threshold on the complete training data set can lead to overfitting, you should\nuse resampling strategies.\nBelow we perform 3-fold cross-validation and use \ntuneThreshold\n to calculate threshold values\nwith lowest average costs over the 3 test data sets.\n\n\nlrn = makeLearner(\nclassif.multinom\n, predict.type = \nprob\n, trace = FALSE)\n\n## 3-fold cross-validation\nr = resample(lrn, credit.task, resampling = rin, measures = list(credit.costs, mmce), show.info = FALSE)\nr\n#\n Resample Result\n#\n Task: GermanCredit\n#\n Learner: classif.multinom\n#\n credit.costs.aggr: 0.85\n#\n credit.costs.mean: 0.85\n#\n credit.costs.sd: 0.17\n#\n mmce.aggr: 0.25\n#\n mmce.mean: 0.25\n#\n mmce.sd: 0.03\n#\n Runtime: 0.370338\n\n## Tune the threshold based on the predicted probabilities on the 3 test data sets\ntune.res = tuneThreshold(pred = r$pred, measure = credit.costs)\ntune.res\n#\n $th\n#\n [1] 0.1115426\n#\n \n#\n $perf\n#\n credit.costs \n#\n 0.507004\n\n\n\n\ntuneThreshold\n returns the optimal threshold value for the positive class and the corresponding\nperformance.\nAs expected the tuned threshold is smaller than the theoretical threshold.\n\n\n2. 
Rebalancing\n\n\nIn order to minimize the average costs, observations from the less costly class should be\ngiven higher importance during training.\nThis can be achieved by \nweighting\n the classes, provided that the learner under consideration\nhas a 'class weights' or an 'observation weights' argument.\nTo find out which learning methods support either type of weights have a look at the\n\nlist of integrated learners\n in the Appendix or use \nlistLearners\n.\n\n\n## Learners that accept observation weights\nlistLearners(\nclassif\n, properties = \nweights\n)\n#\n [1] \nclassif.ada\n \nclassif.avNNet\n \nclassif.binomial\n \n#\n [4] \nclassif.blackboost\n \nclassif.cforest\n \nclassif.ctree\n \n#\n [7] \nclassif.extraTrees\n \nclassif.gbm\n \nclassif.glmboost\n \n#\n [10] \nclassif.glmnet\n \nclassif.logreg\n \nclassif.lqa\n \n#\n [13] \nclassif.multinom\n \nclassif.nnet\n \nclassif.plr\n \n#\n [16] \nclassif.probit\n \nclassif.rpart\n \nclassif.xgboost\n\n\n## Learners that can deal with class weights\nlistLearners(\nclassif\n, properties = \nclass.weights\n)\n#\n [1] \nclassif.ksvm\n \nclassif.LiblineaRL1L2SVC\n \n#\n [3] \nclassif.LiblineaRL1LogReg\n \nclassif.LiblineaRL2L1SVC\n \n#\n [5] \nclassif.LiblineaRL2LogReg\n \nclassif.LiblineaRL2SVC\n \n#\n [7] \nclassif.LiblineaRMultiClassSVC\n \nclassif.randomForest\n \n#\n [9] \nclassif.svm\n\n\n\n\n\nAlternatively, \nover- and undersampling\n techniques can be used.\n\n\ni. Weighting\n\n\nJust as \ntheoretical thresholds\n, \ntheoretical weights\n can be calculated from the\ncost matrix.\nIf \nt\n indicates the target threshold and \nt_0\n the original threshold for the positive class the\nproportion of observations in the positive class has to be multiplied by\n\n\\frac{1-t}{t} \\frac{t_0}{1-t_0}.\n\nAlternatively, the proportion of observations in the negative class can be multiplied by\nthe inverse.\nA proof is given by \nElkan (2001)\n.\n\n\nIn most cases, the original threshold is \nt_0 = 0.5\n and thus the second factor vanishes.\nIf additionally the target threshold \nt\n equals the theoretical threshold \nt^*\n the\nproportion of observations in the positive class has to be multiplied by\n\n\\frac{1-t^*}{t^*} = \\frac{c(-1,+1) - c(+1,+1)}{c(+1,-1) - c(-1,-1)}.\n\n\n\n\nFor the \ncredit example\n the theoretical threshold corresponds to a\nweight of 5 for the positive class.\n\n\n## Weight for positive class corresponding to theoretical treshold\nw = (1 - th)/th\nw\n#\n [1] 5\n\n\n\n\nA unified and convenient way to assign class weights to a \nLearner\n (and tune\nthem) is provided by function \nmakeWeightedClassesWrapper\n. The class weights are specified\nusing argument \nwcw.weight\n.\nFor learners that support observation weights a suitable weight vector is then generated\ninternally during training or resampling.\nIf the learner can deal with class weights, the weights are basically passed on to the\nappropriate learner parameter. The advantage of using the wrapper in this case is the unified\nway to specify the class weights.\n\n\nBelow is an example using learner \n\"classif.multinom\"\n (\nmultinom\n from\npackage \nnnet\n) which accepts observation weights.\nFor binary classification problems it is sufficient to specify the weight \nw\n for the positive\nclass. 
The negative class then automatically receives weight 1.\n\n\n## Weighted learner\nlrn = makeLearner(\nclassif.multinom\n, trace = FALSE)\nlrn = makeWeightedClassesWrapper(lrn, wcw.weight = w)\nlrn\n#\n Learner weightedclasses.classif.multinom from package nnet\n#\n Type: classif\n#\n Name: ; Short name: \n#\n Class: WeightedClassesWrapper\n#\n Properties: twoclass,multiclass,numerics,factors,prob\n#\n Predict-Type: response\n#\n Hyperparameters: trace=FALSE,wcw.weight=5\n\nr = resample(lrn, credit.task, rin, measures = list(credit.costs, mmce), show.info = FALSE)\nr\n#\n Resample Result\n#\n Task: GermanCredit\n#\n Learner: weightedclasses.classif.multinom\n#\n credit.costs.aggr: 0.53\n#\n credit.costs.mean: 0.53\n#\n credit.costs.sd: 0.04\n#\n mmce.aggr: 0.35\n#\n mmce.mean: 0.35\n#\n mmce.sd: 0.02\n#\n Runtime: 0.44081\n\n\n\n\nFor classification methods like \n\"classif.ksvm\"\n (the support vector machine\n\nksvm\n in package \nkernlab\n) that support class weights you can pass them\ndirectly.\n\n\nlrn = makeLearner(\nclassif.ksvm\n, class.weights = c(Bad = w, Good = 1))\n\n\n\n\nOr, more conveniently, you can again use \nmakeWeightedClassesWrapper\n.\n\n\nlrn = makeWeightedClassesWrapper(\nclassif.ksvm\n, wcw.weight = w)\nr = resample(lrn, credit.task, rin, measures = list(credit.costs, mmce), show.info = FALSE)\nr\n#\n Resample Result\n#\n Task: GermanCredit\n#\n Learner: weightedclasses.classif.ksvm\n#\n credit.costs.aggr: 0.58\n#\n credit.costs.mean: 0.58\n#\n credit.costs.sd: 0.04\n#\n mmce.aggr: 0.31\n#\n mmce.mean: 0.31\n#\n mmce.sd: 0.02\n#\n Runtime: 0.61962\n\n\n\n\nJust like the theoretical threshold, the theoretical weights may not always be suitable,\ntherefore you can tune the weight for the positive class as shown in the following example.\nCalculating the theoretical weight beforehand may help to narrow down the search interval.\n\n\nlrn = makeLearner(\nclassif.multinom\n, trace = FALSE)\nlrn = makeWeightedClassesWrapper(lrn)\nps = makeParamSet(makeDiscreteParam(\nwcw.weight\n, seq(4, 12, 0.5)))\nctrl = makeTuneControlGrid()\ntune.res = tuneParams(lrn, credit.task, resampling = rin, par.set = ps,\n measures = list(credit.costs, mmce), control = ctrl, show.info = FALSE)\ntune.res\n#\n Tune result:\n#\n Op. pars: wcw.weight=7.5\n#\n credit.costs.test.mean=0.501,mmce.test.mean=0.381\n\nas.data.frame(tune.res$opt.path)[1:3]\n#\n wcw.weight credit.costs.test.mean mmce.test.mean\n#\n 1 4 0.5650291 0.3330127\n#\n 2 4.5 0.5550251 0.3430167\n#\n 3 5 0.5260320 0.3460197\n#\n 4 5.5 0.5130070 0.3530147\n#\n 5 6 0.5160100 0.3640137\n#\n 6 6.5 0.5160160 0.3720157\n#\n 7 7 0.5040250 0.3760167\n#\n 8 7.5 0.5010040 0.3810038\n#\n 9 8 0.5100130 0.3900128\n#\n 10 8.5 0.5100070 0.3940108\n#\n 11 9 0.5110080 0.4030078\n#\n 12 9.5 0.5160130 0.4080128\n#\n 13 10 0.5260140 0.4180138\n#\n 14 10.5 0.5240060 0.4200098\n#\n 15 11 0.5319991 0.4280029\n#\n 16 11.5 0.5289901 0.4330019\n#\n 17 12 0.5249801 0.4369999\n\n\n\n\nii. 
Over- and undersampling\n\n\nIf the \nLearner\n supports neither observation nor class weights the proportions\nof the classes in the training data can be changed by over- or undersampling.\n\n\nIn the \nGermanCredit data set\n the positive class Bad should receive\na theoretical weight of \nw = (1 - th)/th = 5\n.\nThis can be achieved by oversampling class Bad with a \nrate\n of 5 (see also the documentation\nof function \noversample\n).\n\n\ncredit.task.over = oversample(credit.task, rate = w)\nlrn = makeLearner(\nclassif.multinom\n, trace = FALSE)\nmod = train(lrn, credit.task.over)\npred = predict(mod, task = credit.task)\nperformance(pred, measures = list(credit.costs, mmce))\n#\n credit.costs mmce \n#\n 0.439 0.323\n\n\n\n\nNote that in the above example the learner was trained on the oversampled task \ncredit.task.over\n.\nIn order to get the training performance on the original task predictions were calculated for \ncredit.task\n.\n\n\nWe usually prefer resampled performance values, but simply calling \nresample\n on the oversampled\ntask does not work since predictions have to be based on the original task.\nThe solution is to create a wrapped \nLearner\n via function\n\nmakeOversampleWrapper\n.\nInternally, \noversample\n is called before training, but predictions are done on the original data.\n\n\nlrn = makeLearner(\nclassif.multinom\n, trace = FALSE)\nlrn = makeOversampleWrapper(lrn, osw.rate = w)\nlrn\n#\n Learner classif.multinom.oversampled from package mlr,nnet\n#\n Type: classif\n#\n Name: ; Short name: \n#\n Class: OversampleWrapper\n#\n Properties: numerics,factors,weights,prob,twoclass,multiclass\n#\n Predict-Type: response\n#\n Hyperparameters: trace=FALSE,osw.rate=5\n\nr = resample(lrn, credit.task, rin, measures = list(credit.costs, mmce), show.info = FALSE)\nr\n#\n Resample Result\n#\n Task: GermanCredit\n#\n Learner: classif.multinom.oversampled\n#\n credit.costs.aggr: 0.56\n#\n credit.costs.mean: 0.56\n#\n credit.costs.sd: 0.05\n#\n mmce.aggr: 0.35\n#\n mmce.mean: 0.35\n#\n mmce.sd: 0.02\n#\n Runtime: 0.812204\n\n\n\n\nOf course, we can also tune the oversampling rate. For this purpose we again have to create\nan \nOversampleWrapper\n.\nOptimal values for parameter \nosw.rate\n can be obtained using function \ntuneParams\n.\n\n\nlrn = makeLearner(\nclassif.multinom\n, trace = FALSE)\nlrn = makeOversampleWrapper(lrn)\nps = makeParamSet(makeDiscreteParam(\nosw.rate\n, seq(3, 7, 0.25)))\nctrl = makeTuneControlGrid()\ntune.res = tuneParams(lrn, credit.task, rin, par.set = ps, measures = list(credit.costs, mmce),\n control = ctrl, show.info = FALSE)\ntune.res\n#\n Tune result:\n#\n Op. 
pars: osw.rate=7\n#\n credit.costs.test.mean=0.506,mmce.test.mean=0.37\n\n\n\n\nMulti-class problems\n\n\nWe consider the \nwaveform\n data set from package \nmlbench\n and\nadd an artificial cost matrix:\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\ntrue\\pred.\n\n\n1\n\n\n2\n\n\n3\n\n\n\n\n\n\n1\n\n\n0\n\n\n30\n\n\n80\n\n\n\n\n\n\n2\n\n\n5\n\n\n0\n\n\n4\n\n\n\n\n\n\n3\n\n\n10\n\n\n8\n\n\n0\n\n\n\n\n\n\n\n\nWe start by creating the \nTask\n, the cost matrix and the corresponding performance measure.\n\n\n## Task\ndf = mlbench::mlbench.waveform(500)\nwf.task = makeClassifTask(id = \nwaveform\n, data = as.data.frame(df), target = \nclasses\n)\n\n## Cost matrix\ncosts = matrix(c(0, 5, 10, 30, 0, 8, 80, 4, 0), 3)\ncolnames(costs) = rownames(costs) = getTaskClassLevels(wf.task)\n\n## Performance measure\nwf.costs = makeCostMeasure(id = \nwf.costs\n, name = \nWaveform costs\n, costs = costs, task = wf.task,\n best = 0, worst = 10)\n\n\n\n\nIn the multi-class case, both, \nthresholding\n and \nrebalancing\n correspond to cost matrices\nof a certain structure where \nc(k,l) = c(l)\n for \nk\n, \nl = 1, \\ldots, K\n, \nk \\neq l\n.\nThis condition means that the cost of misclassifying an observation is independent of the\npredicted class label\n(see \nDomingos, 1999\n).\nGiven a cost matrix of this type, theoretical thresholds and weights can be derived\nin a similar manner as in the binary case.\nObviously, the cost matrix given above does not have this special structure.\n\n\n1. Thresholding\n\n\nGiven a vector of positive threshold values as long as the number of classes \nK\n, the predicted\nprobabilities for all classes are adjusted by dividing them by the corresponding threshold value.\nThen the class with the highest adjusted probability is predicted.\nThis way, as in the binary case, classes with a low threshold are preferred to classes\nwith a larger threshold.\n\n\nAgain this can be done by function \nsetThreshold\n as shown in the following example (or\nalternatively by the \npredict.threshold\n option of \nmakeLearner\n).\nNote that the threshold vector needs to have names that correspond to the class labels.\n\n\nlrn = makeLearner(\nclassif.rpart\n, predict.type = \nprob\n)\nrin = makeResampleInstance(\nCV\n, iters = 3, task = wf.task)\nr = resample(lrn, wf.task, rin, measures = list(wf.costs, mmce), show.info = FALSE)\nr\n#\n Resample Result\n#\n Task: waveform\n#\n Learner: classif.rpart\n#\n wf.costs.aggr: 8.40\n#\n wf.costs.mean: 8.40\n#\n wf.costs.sd: 1.57\n#\n mmce.aggr: 0.30\n#\n mmce.mean: 0.30\n#\n mmce.sd: 0.00\n#\n Runtime: 0.0816503\n\n## Calculate thresholds as 1/(average costs of true classes)\nth = 2/rowSums(costs)\nnames(th) = getTaskClassLevels(wf.task)\nth\n#\n 1 2 3 \n#\n 0.01818182 0.22222222 0.11111111\n\npred.th = setThreshold(r$pred, threshold = th)\nperformance(pred.th, measures = list(wf.costs, mmce))\n#\n wf.costs mmce \n#\n 5.7498377 0.3598947\n\n\n\n\nThe threshold vector \nth\n in the above example is chosen according to the average costs\nof the true classes 55, 4.5 and 9.\nMore exactly, \nth\n corresponds to an artificial cost matrix of the structure mentioned\nabove with off-diagonal elements \nc(2,1) = c(3,1) = 55\n, \nc(1,2) = c(3,2) = 4.5\n and\n\nc(1,3) = c(2,3) = 9\n.\nThis threshold vector may be not optimal but leads to smaller total costs on the data set than\nthe default.\n\n\nii. 
Empirical thresholding\n\n\nAs in the binary case it is possible to tune the threshold vector using function \ntuneThreshold\n.\nSince the scaling of the threshold vector does not change the predicted class labels\n\ntuneThreshold\n returns threshold values that lie in [0,1] and sum to unity.\n\n\ntune.res = tuneThreshold(pred = r$pred, measure = wf.costs)\ntune.res\n#\n $th\n#\n 1 2 3 \n#\n 0.03734676 0.31826144 0.64439179 \n#\n \n#\n $perf\n#\n [1] 4.275197\n\n\n\n\nFor comparison we show the standardized version of the theoretically motivated threshold\nvector chosen above.\n\n\nth/sum(th)\n#\n 1 2 3 \n#\n 0.05172414 0.63218391 0.31609195\n\n\n\n\n2. Rebalancing\n\n\ni. Weighting\n\n\nIn the multi-class case you have to pass a vector of weights as long as the number of classes\n\nK\n to function \nmakeWeightedClassesWrapper\n.\nThe weight vector can be tuned using function \ntuneParams\n.\n\n\nlrn = makeLearner(\nclassif.multinom\n, trace = FALSE)\nlrn = makeWeightedClassesWrapper(lrn)\n\nps = makeParamSet(makeNumericVectorParam(\nwcw.weight\n, len = 3, lower = 0, upper = 1))\nctrl = makeTuneControlRandom()\n\ntune.res = tuneParams(lrn, wf.task, resampling = rin, par.set = ps,\n measures = list(wf.costs, mmce), control = ctrl, show.info = FALSE)\ntune.res\n#\n Tune result:\n#\n Op. pars: wcw.weight=0.871,0.105,0.0306\n#\n wf.costs.test.mean= 2.7,mmce.test.mean=0.242\n\n\n\n\nExample-dependent misclassification costs\n\n\nIn case of example-dependent costs we have to create a special \nTask\n via function\n\nmakeCostSensTask\n.\nFor this purpose the feature values \nx\n and an \nn \\times K\n\n\ncost\n matrix that contains\nthe cost vectors for all \nn\n examples in the data set are required.\n\n\nWe use the \niris\n data and generate an artificial cost matrix\n(see \nBeygelzimer et al., 2005\n).\n\n\ndf = iris\ncost = matrix(runif(150 * 3, 0, 2000), 150) * (1 - diag(3))[df$Species,] + runif(150, 0, 10)\ncolnames(cost) = levels(iris$Species)\nrownames(cost) = rownames(iris)\ndf$Species = NULL\n\ncostsens.task = makeCostSensTask(id = \niris\n, data = df, cost = cost)\ncostsens.task\n#\n Supervised task: iris\n#\n Type: costsens\n#\n Observations: 150\n#\n Features:\n#\n numerics factors ordered \n#\n 4 0 0 \n#\n Missings: FALSE\n#\n Has blocking: FALSE\n#\n Classes: 3\n#\n setosa, versicolor, virginica\n\n\n\n\nmlr\n provides several \nwrappers\n to turn regular classification or regression methods\ninto \nLearner\ns that can deal with example-dependent costs.\n\n\n\n\nmakeCostSensClassifWrapper\n (wraps a classification \nLearner\n):\n This is a naive approach where the costs are coerced into class labels by choosing the\n class label with minimum cost for each example. 
Then a regular classification method is\n used.\n\n\nmakeCostSensRegrWrapper\n (wraps a regression \nLearner\n):\n An individual regression model is fitted for the costs of each class.\n In the prediction step first the costs are predicted for all classes and then the class with\n the lowest predicted costs is selected.\n\n\nmakeCostSensWeightedPairsWrapper\n (wraps a classification \nLearner\n):\n This is also known as \ncost-sensitive one-vs-one\n (CS-OVO) and the most sophisticated of\n the currently supported methods.\n For each pair of classes, a binary classifier is fitted.\n For each observation the class label is defined as the element of the pair with minimal costs.\n During fitting, the observations are weighted with the absolute difference in costs.\n Prediction is performed by simple voting.\n\n\n\n\nIn the following example we use the third method. We create the wrapped \nLearner\n\nand train it on the \nCostSensTask\n defined above.\n\n\nlrn = makeLearner(\nclassif.multinom\n, trace = FALSE)\nlrn = makeCostSensWeightedPairsWrapper(lrn)\nlrn\n#\n Learner costsens.classif.multinom from package nnet\n#\n Type: costsens\n#\n Name: ; Short name: \n#\n Class: CostSensWeightedPairsWrapper\n#\n Properties: twoclass,multiclass,numerics,factors\n#\n Predict-Type: response\n#\n Hyperparameters: trace=FALSE\n\nmod = train(lrn, costsens.task)\nmod\n#\n Model for learner.id=costsens.classif.multinom; learner.class=CostSensWeightedPairsWrapper\n#\n Trained on: task.id = iris; obs = 150; features = 4\n#\n Hyperparameters: trace=FALSE\n\n\n\n\nThe models corresponding to the individual pairs can be accessed by function\n\ngetLearnerModel\n.\n\n\ngetLearnerModel(mod)\n#\n [[1]]\n#\n Model for learner.id=classif.multinom; learner.class=classif.multinom\n#\n Trained on: task.id = feats; obs = 150; features = 4\n#\n Hyperparameters: trace=FALSE\n#\n \n#\n [[2]]\n#\n Model for learner.id=classif.multinom; learner.class=classif.multinom\n#\n Trained on: task.id = feats; obs = 150; features = 4\n#\n Hyperparameters: trace=FALSE\n#\n \n#\n [[3]]\n#\n Model for learner.id=classif.multinom; learner.class=classif.multinom\n#\n Trained on: task.id = feats; obs = 150; features = 4\n#\n Hyperparameters: trace=FALSE\n\n\n\n\nmlr\n provides some performance measures for example-specific cost-sensitive classification.\nIn the following example we calculate the mean costs of the predicted class labels\n(\nmeancosts\n) and the misclassification penalty (\nmcp\n).\nThe latter measure is the average difference between the costs caused by the predicted\nclass labels, i.e., \nmeancosts\n, and the costs resulting from choosing the\nclass with lowest cost for each observation.\nIn order to compute these measures the costs for the test observations are required and\ntherefore the \nTask\n has to be passed to \nperformance\n.\n\n\npred = predict(mod, task = costsens.task)\npred\n#\n Prediction: 150 observations\n#\n predict.type: response\n#\n threshold: \n#\n time: 0.06\n#\n id response\n#\n 1 1 setosa\n#\n 2 2 setosa\n#\n 3 3 setosa\n#\n 4 4 setosa\n#\n 5 5 setosa\n#\n 6 6 setosa\n\nperformance(pred, measures = list(meancosts, mcp), task = costsens.task)\n#\n meancosts mcp \n#\n 163.0830 158.2944", "title": "Cost-Sensitive Classification" }, { @@ -457,12 +457,12 @@ }, { "location": "/cost_sensitive_classif/index.html#class-dependent-misclassification-costs", - "text": "There are some classification methods that can accomodate misclassification costs\ndirectly.\nOne example is rpart . 
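As a hedged aside that is not part of the original tutorial text: rpart can take such a cost matrix directly through its parms argument, which mlr exposes as a hyperparameter of "classif.rpart". The short sketch below assumes the GermanCredit task credit.task and the 2x2 cost matrix costs that are created further down in this section; rpart expects the rows of the loss matrix to refer to the true classes and its diagonal to be zero, which the matrix used here satisfies.

## Sketch only (assumes 'credit.task' and 'costs' from later in this section):
## hand the misclassification costs straight to rpart via its 'parms' argument.
library(mlr)
lrn.direct = makeLearner("classif.rpart", parms = list(loss = costs))
mod.direct = train(lrn.direct, credit.task)
pred.direct = predict(mod.direct, task = credit.task)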
Alternatively, we can use cost-insensitive methods and manipulate the predictions or the\ntraining data in order to take misclassification costs into account. mlr supports thresholding and rebalancing . Thresholding :\n The thresholds used to turn posterior probabilities into class labels are chosen such that\n the costs are minimized.\n This requires a Learner that can predict posterior probabilities.\n During training the costs are not taken into account. Rebalancing :\n The idea is to change the proportion of the classes in the training data set in order to\n account for costs during training, either by weighting or by sampling .\n Rebalancing does not require that the Learner can predict probabilities. i. For weighting we need a Learner that supports class weights or observation\n weights. ii. If the Learner cannot deal with weights the proportion of classes can\n be changed by over- and undersampling . We start with binary classification problems and afterwards deal with multi-class problems. Binary classification problems The positive and negative classes are labeled 1 and -1 , respectively, and we consider the\nfollowing cost matrix where the rows indicate true classes and the columns predicted classes: true\\pred. +1 -1 +1 c(+1,+1) c(-1,+1) -1 c(+1,-1) c(-1,-1) Often, the diagonal entries are zero or the cost matrix is rescaled to achieve zeros in the diagonal\n(see for example O'Brien et al, 2008 ). A well-known cost-sensitive classification problem is posed by the German Credit data set \n(see also the UCI Machine Learning Repository ).\nThe corresponding cost matrix (though Elkan (2001) \nargues that this matrix is economically unreasonable) is given as: true\\pred. Bad Good Bad 0 5 Good 1 0 As in the table above, the rows indicate true and the columns predicted classes. In case of class-dependent costs it is sufficient to generate an ordinary ClassifTask .\nA CostSensTask is only needed if the costs are example-dependent.\nIn the R code below we create the ClassifTask , remove two constant features from the\ndata set and generate the cost matrix.\nPer default, Bad is the positive class. data(GermanCredit, package = caret )\ncredit.task = makeClassifTask(data = GermanCredit, target = Class )\ncredit.task = removeConstantFeatures(credit.task)\n# Removing 2 columns: Purpose.Vacation,Personal.Female.Single\ncredit.task\n# Supervised task: GermanCredit\n# Type: classif\n# Target: Class\n# Observations: 1000\n# Features:\n# numerics factors ordered \n# 59 0 0 \n# Missings: FALSE\n# Has weights: FALSE\n# Has blocking: FALSE\n# Classes: 2\n# Bad Good \n# 300 700 \n# Positive class: Bad\n\ncosts = matrix(c(0, 1, 5, 0), 2)\ncolnames(costs) = rownames(costs) = getTaskClassLevels(credit.task)\ncosts\n# Bad Good\n# Bad 0 5\n# Good 1 0 1. Thresholding We start by fitting a logistic regression model to the German credit data set and predict posterior probabilities. 
## Train and predict posterior probabilities\nlrn = makeLearner( classif.multinom , predict.type = prob , trace = FALSE)\nmod = train(lrn, credit.task)\npred = predict(mod, task = credit.task)\npred\n# Prediction: 1000 observations\n# predict.type: prob\n# threshold: Bad=0.50,Good=0.50\n# time: 0.01\n# id truth prob.Bad prob.Good response\n# 1 1 Good 0.03525092 0.9647491 Good\n# 2 2 Bad 0.63222363 0.3677764 Bad\n# 3 3 Good 0.02807414 0.9719259 Good\n# 4 4 Good 0.25182703 0.7481730 Good\n# 5 5 Bad 0.75193275 0.2480673 Bad\n# 6 6 Good 0.26230149 0.7376985 Good The default thresholds for both classes are 0.5.\nBut according to the cost matrix we should predict class Good only if we are very sure that Good\nis indeed the correct label. Therefore we should increase the threshold for class Good and decrease the\nthreshold for class Bad. i. Theoretical thresholding The theoretical threshold for the positive class can be calculated from the cost matrix as t^* = \\frac{c(+1,-1) - c(-1,-1)}{c(+1,-1) - c(+1,+1) + c(-1,+1) - c(-1,-1)}. \nFor more details see Elkan (2001) . Below the theoretical threshold for the German credit example \nis calculated and used to predict class labels.\nSince the diagonal of the cost matrix is zero the formula given above simplifies accordingly. ## Calculate the theoretical threshold for the positive class\nth = costs[2,1]/(costs[2,1] + costs[1,2])\nth\n# [1] 0.1666667 As you may recall you can change thresholds in mlr either before training by using the predict.threshold option of makeLearner or after prediction by calling setThreshold \non the Prediction object. As we already have a prediction we use the setThreshold function. It returns an altered Prediction object with class predictions for the theoretical threshold. ## Predict class labels according to the theoretical threshold\npred.th = setThreshold(pred, th)\npred.th\n# Prediction: 1000 observations\n# predict.type: prob\n# threshold: Bad=0.17,Good=0.83\n# time: 0.01\n# id truth prob.Bad prob.Good response\n# 1 1 Good 0.03525092 0.9647491 Good\n# 2 2 Bad 0.63222363 0.3677764 Bad\n# 3 3 Good 0.02807414 0.9719259 Good\n# 4 4 Good 0.25182703 0.7481730 Bad\n# 5 5 Bad 0.75193275 0.2480673 Bad\n# 6 6 Good 0.26230149 0.7376985 Bad In order to calculate the average costs over the entire data set we first need to create a new\nperformance Measure . This can be done through function makeCostMeasure \nwhich requires the ClassifTask object and the cost matrix (argument costs ).\nIt is expected that the rows of the cost matrix indicate true and the columns predicted\nclass labels. credit.costs = makeCostMeasure(id = credit.costs , costs = costs, task = credit.task, best = 0, worst = 5)\ncredit.costs\n# Name: credit.costs\n# Performance measure: credit.costs\n# Properties: classif,classif.multi,req.pred,req.truth,predtype.response,predtype.prob\n# Minimize: TRUE\n# Best: 0; Worst: 5\n# Aggregated by: test.mean\n# Note: Then the average costs can be computed by function performance .\nBelow we compare the average costs and the error rate ( mmce ) of the learning algorithm\nwith both default thresholds 0.5 and theoretical thresholds. 
## Performance with default thresholds 0.5\nperformance(pred, measures = list(credit.costs, mmce))\n# credit.costs mmce \n# 0.774 0.214\n\n## Performance with theoretical thresholds\nperformance(pred.th, measures = list(credit.costs, mmce))\n# credit.costs mmce \n# 0.478 0.346 These performance values may be overly optimistic as we used the same data set for training\nand prediction, and resampling strategies should be preferred.\nIn the R code below we make use of the predict.threshold argument of makeLearner to set\nthe threshold before doing a 3-fold cross-validation on the credit.task .\nNote that we create a ResampleInstance ( rin ) that is used throughout\nthe next several code chunks to get comparable performance values. ## Cross-validated performance with theoretical thresholds\nrin = makeResampleInstance( CV , iters = 3, task = credit.task)\nlrn = makeLearner( classif.multinom , predict.type = prob , predict.threshold = th, trace = FALSE)\nr = resample(lrn, credit.task, resampling = rin, measures = list(credit.costs, mmce), show.info = FALSE)\nr\n# Resample Result\n# Task: GermanCredit\n# Learner: classif.multinom\n# credit.costs.aggr: 0.56\n# credit.costs.mean: 0.56\n# credit.costs.sd: 0.03\n# mmce.aggr: 0.36\n# mmce.mean: 0.36\n# mmce.sd: 0.02\n# Runtime: 0.323072 If we are also interested in the cross-validated performance for the default threshold values\nwe can call setThreshold on the resample prediction r$pred . ## Cross-validated performance with default thresholds\nperformance(setThreshold(r$pred, 0.5), measures = list(credit.costs, mmce))\n# credit.costs mmce \n# 0.8521695 0.2480205 Theoretical thresholding is only reliable if the predicted posterior probabilities are correct.\nIf there is bias the thresholds have to be shifted accordingly. Useful in this regard is function plotThreshVsPerf that you can use to plot the average costs\nas well as any other performance measure versus possible threshold values for the positive\nclass in [0,1] . The underlying data is generated by generateThreshVsPerfData . The following plots show the cross-validated costs and error rate ( mmce ).\nThe theoretical threshold th calculated above is indicated by the vertical line.\nAs you can see from the left-hand plot the theoretical threshold seems a bit large. d = generateThreshVsPerfData(r, measures = list(credit.costs, mmce))\nplotThreshVsPerf(d, mark.th = th) ii. Empirical thresholding The idea of empirical thresholding (see Sheng and Ling, 2006 )\nis to select cost-optimal threshold values for a given learning method based on the training data.\nIn contrast to theoretical thresholding it suffices if the estimated posterior probabilities\nare order-correct. In order to determine optimal threshold values you can use mlr 's function tuneThreshold .\nAs tuning the threshold on the complete training data set can lead to overfitting, you should\nuse resampling strategies.\nBelow we perform 3-fold cross-validation and use tuneThreshold to calculate threshold values\nwith lowest average costs over the 3 test data sets. 
lrn = makeLearner( classif.multinom , predict.type = prob , trace = FALSE)\n\n## 3-fold cross-validation\nr = resample(lrn, credit.task, resampling = rin, measures = list(credit.costs, mmce), show.info = FALSE)\nr\n# Resample Result\n# Task: GermanCredit\n# Learner: classif.multinom\n# credit.costs.aggr: 0.85\n# credit.costs.mean: 0.85\n# credit.costs.sd: 0.17\n# mmce.aggr: 0.25\n# mmce.mean: 0.25\n# mmce.sd: 0.03\n# Runtime: 0.343711\n\n## Tune the threshold based on the predicted probabilities on the 3 test data sets\ntune.res = tuneThreshold(pred = r$pred, measure = credit.costs)\ntune.res\n# $th\n# [1] 0.1115426\n# \n# $perf\n# credit.costs \n# 0.507004 tuneThreshold returns the optimal threshold value for the positive class and the corresponding\nperformance.\nAs expected the tuned threshold is smaller than the theoretical threshold. 2. Rebalancing In order to minimize the average costs, observations from the less costly class should be\ngiven higher importance during training.\nThis can be achieved by weighting the classes, provided that the learner under consideration\nhas a 'class weights' or an 'observation weights' argument.\nTo find out which learning methods support either type of weights have a look at the list of integrated learners in the Appendix or use listLearners . ## Learners that accept observation weights\nlistLearners( classif , properties = weights )\n# [1] classif.ada classif.avNNet classif.binomial \n# [4] classif.blackboost classif.cforest classif.ctree \n# [7] classif.extraTrees classif.gbm classif.glmboost \n# [10] classif.glmnet classif.logreg classif.lqa \n# [13] classif.multinom classif.nnet classif.plr \n# [16] classif.probit classif.rpart classif.xgboost \n\n## Learners that can deal with class weights\nlistLearners( classif , properties = class.weights )\n# [1] classif.ksvm classif.LiblineaRL1L2SVC \n# [3] classif.LiblineaRL1LogReg classif.LiblineaRL2L1SVC \n# [5] classif.LiblineaRL2LogReg classif.LiblineaRL2SVC \n# [7] classif.LiblineaRMultiClassSVC classif.randomForest \n# [9] classif.svm Alternatively, over- and undersampling techniques can be used. i. Weighting Just as theoretical thresholds , theoretical weights can be calculated from the\ncost matrix.\nIf t indicates the target threshold and t_0 the original threshold for the positive class the\nproportion of observations in the positive class has to be multiplied by \\frac{1-t}{t} \\frac{t_0}{1-t_0}. \nAlternatively, the proportion of observations in the negative class can be multiplied by\nthe inverse.\nA proof is given by Elkan (2001) . In most cases, the original threshold is t_0 = 0.5 and thus the second factor vanishes.\nIf additionally the target threshold t equals the theoretical threshold t^* the\nproportion of observations in the positive class has to be multiplied by \\frac{1-t^*}{t^*} = \\frac{c(-1,+1) - c(+1,+1)}{c(+1,-1) - c(-1,-1)}. For the credit example the theoretical threshold corresponds to a\nweight of 5 for the positive class. ## Weight for positive class corresponding to theoretical treshold\nw = (1 - th)/th\nw\n# [1] 5 A unified and convenient way to assign class weights to a Learner (and tune\nthem) is provided by function makeWeightedClassesWrapper . The class weights are specified\nusing argument wcw.weight .\nFor learners that support observation weights a suitable weight vector is then generated\ninternally during training or resampling.\nIf the learner can deal with class weights, the weights are basically passed on to the\nappropriate learner parameter. 
The advantage of using the wrapper in this case is the unified\nway to specify the class weights. Below is an example using learner \"classif.multinom\" ( multinom from\npackage nnet ) which accepts observation weights.\nFor binary classification problems it is sufficient to specify the weight w for the positive\nclass. The negative class then automatically receives weight 1. ## Weighted learner\nlrn = makeLearner( classif.multinom , trace = FALSE)\nlrn = makeWeightedClassesWrapper(lrn, wcw.weight = w)\nlrn\n# Learner weightedclasses.classif.multinom from package nnet\n# Type: classif\n# Name: ; Short name: \n# Class: WeightedClassesWrapper\n# Properties: twoclass,multiclass,numerics,factors,prob\n# Predict-Type: response\n# Hyperparameters: trace=FALSE,wcw.weight=5\n\nr = resample(lrn, credit.task, rin, measures = list(credit.costs, mmce), show.info = FALSE)\nr\n# Resample Result\n# Task: GermanCredit\n# Learner: weightedclasses.classif.multinom\n# credit.costs.aggr: 0.53\n# credit.costs.mean: 0.53\n# credit.costs.sd: 0.04\n# mmce.aggr: 0.35\n# mmce.mean: 0.35\n# mmce.sd: 0.02\n# Runtime: 0.395105 For classification methods like \"classif.ksvm\" (the support vector machine ksvm in package kernlab ) that support class weights you can pass them\ndirectly. lrn = makeLearner( classif.ksvm , class.weights = c(Bad = w, Good = 1)) Or, more conveniently, you can again use makeWeightedClassesWrapper . lrn = makeWeightedClassesWrapper( classif.ksvm , wcw.weight = w)\nr = resample(lrn, credit.task, rin, measures = list(credit.costs, mmce), show.info = FALSE)\nr\n# Resample Result\n# Task: GermanCredit\n# Learner: weightedclasses.classif.ksvm\n# credit.costs.aggr: 0.58\n# credit.costs.mean: 0.58\n# credit.costs.sd: 0.04\n# mmce.aggr: 0.31\n# mmce.mean: 0.31\n# mmce.sd: 0.02\n# Runtime: 0.546966 Just like the theoretical threshold, the theoretical weights may not always be suitable,\ntherefore you can tune the weight for the positive class as shown in the following example.\nCalculating the theoretical weight beforehand may help to narrow down the search interval. lrn = makeLearner( classif.multinom , trace = FALSE)\nlrn = makeWeightedClassesWrapper(lrn)\nps = makeParamSet(makeDiscreteParam( wcw.weight , seq(4, 12, 0.5)))\nctrl = makeTuneControlGrid()\ntune.res = tuneParams(lrn, credit.task, resampling = rin, par.set = ps,\n measures = list(credit.costs, mmce), control = ctrl, show.info = FALSE)\ntune.res\n# Tune result:\n# Op. pars: wcw.weight=7.5\n# credit.costs.test.mean=0.501,mmce.test.mean=0.381\n\nas.data.frame(tune.res$opt.path)[1:3]\n# wcw.weight credit.costs.test.mean mmce.test.mean\n# 1 4 0.5650291 0.3330127\n# 2 4.5 0.5550251 0.3430167\n# 3 5 0.5260320 0.3460197\n# 4 5.5 0.5130070 0.3530147\n# 5 6 0.5160100 0.3640137\n# 6 6.5 0.5160160 0.3720157\n# 7 7 0.5040250 0.3760167\n# 8 7.5 0.5010040 0.3810038\n# 9 8 0.5100130 0.3900128\n# 10 8.5 0.5100070 0.3940108\n# 11 9 0.5110080 0.4030078\n# 12 9.5 0.5160130 0.4080128\n# 13 10 0.5260140 0.4180138\n# 14 10.5 0.5240060 0.4200098\n# 15 11 0.5319991 0.4280029\n# 16 11.5 0.5289901 0.4330019\n# 17 12 0.5249801 0.4369999 ii. Over- and undersampling If the Learner supports neither observation nor class weights the proportions\nof the classes in the training data can be changed by over- or undersampling. In the GermanCredit data set the positive class Bad should receive\na theoretical weight of w = (1 - th)/th = 5 .\nThis can be achieved by oversampling class Bad with a rate of 5 (see also the documentation\nof function oversample ). 
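A hedged side note, not part of the original tutorial: instead of inflating class Bad one can equivalently shrink class Good with undersample (or makeUndersampleWrapper when resampling), using the inverse rate 1/w; the original oversampling example follows right below. The object names credit.task and w are the ones from this section.

## Sketch only: the mirror image of the oversampling approach shown below.
library(mlr)
## keep roughly 20 % of the larger class Good, mimicking a weight of 5 for Bad
credit.task.under = undersample(credit.task, rate = 1/w)
## wrapped variant for use inside resample()
lrn.under = makeUndersampleWrapper(makeLearner("classif.multinom", trace = FALSE), usw.rate = 1/w)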
credit.task.over = oversample(credit.task, rate = w)\nlrn = makeLearner( classif.multinom , trace = FALSE)\nmod = train(lrn, credit.task.over)\npred = predict(mod, task = credit.task)\nperformance(pred, measures = list(credit.costs, mmce))\n# credit.costs mmce \n# 0.439 0.323 Note that in the above example the learner was trained on the oversampled task credit.task.over .\nIn order to get the training performance on the original task predictions were calculated for credit.task . We usually prefer resampled performance values, but simply calling resample on the oversampled\ntask does not work since predictions have to be based on the original task.\nThe solution is to create a wrapped Learner via function makeOversampleWrapper .\nInternally, oversample is called before training, but predictions are done on the original data. lrn = makeLearner( classif.multinom , trace = FALSE)\nlrn = makeOversampleWrapper(lrn, osw.rate = w)\nlrn\n# Learner classif.multinom.oversampled from package mlr,nnet\n# Type: classif\n# Name: ; Short name: \n# Class: OversampleWrapper\n# Properties: numerics,factors,weights,prob,twoclass,multiclass\n# Predict-Type: response\n# Hyperparameters: trace=FALSE,osw.rate=5\n\nr = resample(lrn, credit.task, rin, measures = list(credit.costs, mmce), show.info = FALSE)\nr\n# Resample Result\n# Task: GermanCredit\n# Learner: classif.multinom.oversampled\n# credit.costs.aggr: 0.56\n# credit.costs.mean: 0.56\n# credit.costs.sd: 0.05\n# mmce.aggr: 0.35\n# mmce.mean: 0.35\n# mmce.sd: 0.02\n# Runtime: 0.696635 Of course, we can also tune the oversampling rate. For this purpose we again have to create\nan OversampleWrapper .\nOptimal values for parameter osw.rate can be obtained using function tuneParams . lrn = makeLearner( classif.multinom , trace = FALSE)\nlrn = makeOversampleWrapper(lrn)\nps = makeParamSet(makeDiscreteParam( osw.rate , seq(3, 7, 0.25)))\nctrl = makeTuneControlGrid()\ntune.res = tuneParams(lrn, credit.task, rin, par.set = ps, measures = list(credit.costs, mmce),\n control = ctrl, show.info = FALSE)\ntune.res\n# Tune result:\n# Op. pars: osw.rate=7\n# credit.costs.test.mean=0.506,mmce.test.mean=0.37 Multi-class problems We consider the waveform data set from package mlbench and\nadd an artificial cost matrix: true\\pred. 1 2 3 1 0 30 80 2 5 0 4 3 10 8 0 We start by creating the Task , the cost matrix and the corresponding performance measure. ## Task\ndf = mlbench::mlbench.waveform(500)\nwf.task = makeClassifTask(id = waveform , data = as.data.frame(df), target = classes )\n\n## Cost matrix\ncosts = matrix(c(0, 5, 10, 30, 0, 8, 80, 4, 0), 3)\ncolnames(costs) = rownames(costs) = getTaskClassLevels(wf.task)\n\n## Performance measure\nwf.costs = makeCostMeasure(id = wf.costs , costs = costs, task = wf.task, best = 0,\n worst = 10) In the multi-class case, both, thresholding and rebalancing correspond to cost matrices\nof a certain structure where c(k,l) = c(l) for k , l = 1, \\ldots, K , k \\neq l .\nThis condition means that the cost of misclassifying an observation is independent of the\npredicted class label\n(see Domingos, 1999 ).\nGiven a cost matrix of this type, theoretical thresholds and weights can be derived\nin a similar manner as in the binary case.\nObviously, the cost matrix given above does not have this special structure. 1. 
Thresholding Given a vector of positive threshold values as long as the number of classes K , the predicted\nprobabilities for all classes are adjusted by dividing them by the corresponding threshold value.\nThen the class with the highest adjusted probability is predicted.\nThis way, as in the binary case, classes with a low threshold are preferred to classes\nwith a larger threshold. Again this can be done by function setThreshold as shown in the following example (or\nalternatively by the predict.threshold option of makeLearner ).\nNote that the threshold vector needs to have names that correspond to the class labels. lrn = makeLearner( classif.rpart , predict.type = prob )\nrin = makeResampleInstance( CV , iters = 3, task = wf.task)\nr = resample(lrn, wf.task, rin, measures = list(wf.costs, mmce), show.info = FALSE)\nr\n# Resample Result\n# Task: waveform\n# Learner: classif.rpart\n# wf.costs.aggr: 8.40\n# wf.costs.mean: 8.40\n# wf.costs.sd: 1.57\n# mmce.aggr: 0.30\n# mmce.mean: 0.30\n# mmce.sd: 0.00\n# Runtime: 0.0716846\n\n## Calculate thresholds as 1/(average costs of true classes)\nth = 2/rowSums(costs)\nnames(th) = getTaskClassLevels(wf.task)\nth\n# 1 2 3 \n# 0.01818182 0.22222222 0.11111111\n\npred.th = setThreshold(r$pred, threshold = th)\nperformance(pred.th, measures = list(wf.costs, mmce))\n# wf.costs mmce \n# 5.7498377 0.3598947 The threshold vector th in the above example is chosen according to the average costs\nof the true classes 55, 4.5 and 9.\nMore exactly, th corresponds to an artificial cost matrix of the structure mentioned\nabove with off-diagonal elements c(2,1) = c(3,1) = 55 , c(1,2) = c(3,2) = 4.5 and c(1,3) = c(2,3) = 9 .\nThis threshold vector may be not optimal but leads to smaller total costs on the data set than\nthe default. ii. Empirical thresholding As in the binary case it is possible to tune the threshold vector using function tuneThreshold .\nSince the scaling of the threshold vector does not change the predicted class labels tuneThreshold returns threshold values that lie in [0,1] and sum to unity. tune.res = tuneThreshold(pred = r$pred, measure = wf.costs)\ntune.res\n# $th\n# 1 2 3 \n# 0.03734676 0.31826144 0.64439179 \n# \n# $perf\n# [1] 4.275197 For comparison we show the standardized version of the theoretically motivated threshold\nvector chosen above. th/sum(th)\n# 1 2 3 \n# 0.05172414 0.63218391 0.31609195 2. Rebalancing i. Weighting In the multi-class case you have to pass a vector of weights as long as the number of classes K to function makeWeightedClassesWrapper .\nThe weight vector can be tuned using function tuneParams . lrn = makeLearner( classif.multinom , trace = FALSE)\nlrn = makeWeightedClassesWrapper(lrn)\n\nps = makeParamSet(makeNumericVectorParam( wcw.weight , len = 3, lower = 0, upper = 1))\nctrl = makeTuneControlRandom()\n\ntune.res = tuneParams(lrn, wf.task, resampling = rin, par.set = ps,\n measures = list(wf.costs, mmce), control = ctrl, show.info = FALSE)\ntune.res\n# Tune result:\n# Op. pars: wcw.weight=0.871,0.105,0.0306\n# wf.costs.test.mean= 2.7,mmce.test.mean=0.242", + "text": "There are some classification methods that can accomodate misclassification costs\ndirectly.\nOne example is rpart . Alternatively, we can use cost-insensitive methods and manipulate the predictions or the\ntraining data in order to take misclassification costs into account. mlr supports thresholding and rebalancing . 
Thresholding :\n The thresholds used to turn posterior probabilities into class labels are chosen such that\n the costs are minimized.\n This requires a Learner that can predict posterior probabilities.\n During training the costs are not taken into account. Rebalancing :\n The idea is to change the proportion of the classes in the training data set in order to\n account for costs during training, either by weighting or by sampling .\n Rebalancing does not require that the Learner can predict probabilities. i. For weighting we need a Learner that supports class weights or observation\n weights. ii. If the Learner cannot deal with weights the proportion of classes can\n be changed by over- and undersampling . We start with binary classification problems and afterwards deal with multi-class problems. Binary classification problems The positive and negative classes are labeled 1 and -1 , respectively, and we consider the\nfollowing cost matrix where the rows indicate true classes and the columns predicted classes: true\\pred. +1 -1 +1 c(+1,+1) c(-1,+1) -1 c(+1,-1) c(-1,-1) Often, the diagonal entries are zero or the cost matrix is rescaled to achieve zeros in the diagonal\n(see for example O'Brien et al, 2008 ). A well-known cost-sensitive classification problem is posed by the German Credit data set \n(see also the UCI Machine Learning Repository ).\nThe corresponding cost matrix (though Elkan (2001) \nargues that this matrix is economically unreasonable) is given as: true\\pred. Bad Good Bad 0 5 Good 1 0 As in the table above, the rows indicate true and the columns predicted classes. In case of class-dependent costs it is sufficient to generate an ordinary ClassifTask .\nA CostSensTask is only needed if the costs are example-dependent.\nIn the R code below we create the ClassifTask , remove two constant features from the\ndata set and generate the cost matrix.\nPer default, Bad is the positive class. data(GermanCredit, package = caret )\ncredit.task = makeClassifTask(data = GermanCredit, target = Class )\ncredit.task = removeConstantFeatures(credit.task)\n# Removing 2 columns: Purpose.Vacation,Personal.Female.Single\n\ncredit.task\n# Supervised task: GermanCredit\n# Type: classif\n# Target: Class\n# Observations: 1000\n# Features:\n# numerics factors ordered \n# 59 0 0 \n# Missings: FALSE\n# Has weights: FALSE\n# Has blocking: FALSE\n# Classes: 2\n# Bad Good \n# 300 700 \n# Positive class: Bad\n\ncosts = matrix(c(0, 1, 5, 0), 2)\ncolnames(costs) = rownames(costs) = getTaskClassLevels(credit.task)\ncosts\n# Bad Good\n# Bad 0 5\n# Good 1 0 1. Thresholding We start by fitting a logistic regression model to the German credit data set and predict posterior probabilities. ## Train and predict posterior probabilities\nlrn = makeLearner( classif.multinom , predict.type = prob , trace = FALSE)\nmod = train(lrn, credit.task)\npred = predict(mod, task = credit.task)\npred\n# Prediction: 1000 observations\n# predict.type: prob\n# threshold: Bad=0.50,Good=0.50\n# time: 0.01\n# id truth prob.Bad prob.Good response\n# 1 1 Good 0.03525092 0.9647491 Good\n# 2 2 Bad 0.63222363 0.3677764 Bad\n# 3 3 Good 0.02807414 0.9719259 Good\n# 4 4 Good 0.25182703 0.7481730 Good\n# 5 5 Bad 0.75193275 0.2480673 Bad\n# 6 6 Good 0.26230149 0.7376985 Good The default thresholds for both classes are 0.5.\nBut according to the cost matrix we should predict class Good only if we are very sure that Good\nis indeed the correct label. 
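To put the cost values reported below into perspective it helps to look at the two trivial baselines first. The following quick calculation is only an illustrative sketch, using nothing but the class frequencies (300 Bad, 700 Good) and the cost matrix from above:

## Average costs of the two trivial classifiers (illustrative sketch)
## Always predicting "Good": every Bad observation is misclassified at cost 5
300 * costs["Bad", "Good"] / 1000
# [1] 1.5
## Always predicting "Bad": every Good observation is misclassified at cost 1
700 * costs["Good", "Bad"] / 1000
# [1] 0.7

Any sensible thresholding or rebalancing strategy should therefore push the average costs clearly below 0.7, the costs of simply predicting Bad for everyone.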
Therefore we should increase the threshold for class Good and decrease the\nthreshold for class Bad. i. Theoretical thresholding The theoretical threshold for the positive class can be calculated from the cost matrix as t^* = \\frac{c(+1,-1) - c(-1,-1)}{c(+1,-1) - c(+1,+1) + c(-1,+1) - c(-1,-1)}. \nFor more details see Elkan (2001) . Below the theoretical threshold for the German credit example \nis calculated and used to predict class labels.\nSince the diagonal of the cost matrix is zero the formula given above simplifies accordingly. ## Calculate the theoretical threshold for the positive class\nth = costs[2,1]/(costs[2,1] + costs[1,2])\nth\n# [1] 0.1666667 As you may recall you can change thresholds in mlr either before training by using the predict.threshold option of makeLearner or after prediction by calling setThreshold \non the Prediction object. As we already have a prediction we use the setThreshold function. It returns an altered Prediction object with class predictions for the theoretical threshold. ## Predict class labels according to the theoretical threshold\npred.th = setThreshold(pred, th)\npred.th\n# Prediction: 1000 observations\n# predict.type: prob\n# threshold: Bad=0.17,Good=0.83\n# time: 0.01\n# id truth prob.Bad prob.Good response\n# 1 1 Good 0.03525092 0.9647491 Good\n# 2 2 Bad 0.63222363 0.3677764 Bad\n# 3 3 Good 0.02807414 0.9719259 Good\n# 4 4 Good 0.25182703 0.7481730 Bad\n# 5 5 Bad 0.75193275 0.2480673 Bad\n# 6 6 Good 0.26230149 0.7376985 Bad In order to calculate the average costs over the entire data set we first need to create a new\nperformance Measure . This can be done through function makeCostMeasure \nwhich requires the ClassifTask object and the cost matrix (argument costs ).\nIt is expected that the rows of the cost matrix indicate true and the columns predicted\nclass labels. credit.costs = makeCostMeasure(id = credit.costs , name = Credit costs , costs = costs, task = credit.task,\n best = 0, worst = 5)\ncredit.costs\n# Name: Credit costs\n# Performance measure: credit.costs\n# Properties: classif,classif.multi,req.pred,req.truth,predtype.response,predtype.prob\n# Minimize: TRUE\n# Best: 0; Worst: 5\n# Aggregated by: test.mean\n# Note: Then the average costs can be computed by function performance .\nBelow we compare the average costs and the error rate ( mmce ) of the learning algorithm\nwith both default thresholds 0.5 and theoretical thresholds. ## Performance with default thresholds 0.5\nperformance(pred, measures = list(credit.costs, mmce))\n# credit.costs mmce \n# 0.774 0.214\n\n## Performance with theoretical thresholds\nperformance(pred.th, measures = list(credit.costs, mmce))\n# credit.costs mmce \n# 0.478 0.346 These performance values may be overly optimistic as we used the same data set for training\nand prediction, and resampling strategies should be preferred.\nIn the R code below we make use of the predict.threshold argument of makeLearner to set\nthe threshold before doing a 3-fold cross-validation on the credit.task .\nNote that we create a ResampleInstance ( rin ) that is used throughout\nthe next several code chunks to get comparable performance values. 
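Before running the cross-validation announced above, a brief aside: the unsimplified formula for t^* can be wrapped in a small helper function, which is convenient whenever the diagonal of the cost matrix is not zero. This is only a sketch and theoreticalThreshold is a hypothetical helper, not a function provided by mlr; it assumes the usual convention that rows of costs denote true and columns predicted classes.

## Sketch: theoretical threshold for a general 2 x 2 cost matrix
theoreticalThreshold = function(costs, positive, negative) {
  (costs[negative, positive] - costs[negative, negative]) /
    (costs[negative, positive] - costs[positive, positive] +
       costs[positive, negative] - costs[negative, negative])
}
theoreticalThreshold(costs, positive = "Bad", negative = "Good")
# [1] 0.1666667

For the credit costs this reproduces the value of th computed above.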
## Cross-validated performance with theoretical thresholds\nrin = makeResampleInstance( CV , iters = 3, task = credit.task)\nlrn = makeLearner( classif.multinom , predict.type = prob , predict.threshold = th, trace = FALSE)\nr = resample(lrn, credit.task, resampling = rin, measures = list(credit.costs, mmce), show.info = FALSE)\nr\n# Resample Result\n# Task: GermanCredit\n# Learner: classif.multinom\n# credit.costs.aggr: 0.56\n# credit.costs.mean: 0.56\n# credit.costs.sd: 0.03\n# mmce.aggr: 0.36\n# mmce.mean: 0.36\n# mmce.sd: 0.02\n# Runtime: 0.385333 If we are also interested in the cross-validated performance for the default threshold values\nwe can call setThreshold on the resample prediction r$pred . ## Cross-validated performance with default thresholds\nperformance(setThreshold(r$pred, 0.5), measures = list(credit.costs, mmce))\n# credit.costs mmce \n# 0.8521695 0.2480205 Theoretical thresholding is only reliable if the predicted posterior probabilities are correct.\nIf there is bias the thresholds have to be shifted accordingly. Useful in this regard is function plotThreshVsPerf that you can use to plot the average costs\nas well as any other performance measure versus possible threshold values for the positive\nclass in [0,1] . The underlying data is generated by generateThreshVsPerfData . The following plots show the cross-validated costs and error rate ( mmce ).\nThe theoretical threshold th calculated above is indicated by the vertical line.\nAs you can see from the left-hand plot the theoretical threshold seems a bit large. d = generateThreshVsPerfData(r, measures = list(credit.costs, mmce))\nplotThreshVsPerf(d, mark.th = th) ii. Empirical thresholding The idea of empirical thresholding (see Sheng and Ling, 2006 )\nis to select cost-optimal threshold values for a given learning method based on the training data.\nIn contrast to theoretical thresholding it suffices if the estimated posterior probabilities\nare order-correct. In order to determine optimal threshold values you can use mlr 's function tuneThreshold .\nAs tuning the threshold on the complete training data set can lead to overfitting, you should\nuse resampling strategies.\nBelow we perform 3-fold cross-validation and use tuneThreshold to calculate threshold values\nwith lowest average costs over the 3 test data sets. lrn = makeLearner( classif.multinom , predict.type = prob , trace = FALSE)\n\n## 3-fold cross-validation\nr = resample(lrn, credit.task, resampling = rin, measures = list(credit.costs, mmce), show.info = FALSE)\nr\n# Resample Result\n# Task: GermanCredit\n# Learner: classif.multinom\n# credit.costs.aggr: 0.85\n# credit.costs.mean: 0.85\n# credit.costs.sd: 0.17\n# mmce.aggr: 0.25\n# mmce.mean: 0.25\n# mmce.sd: 0.03\n# Runtime: 0.370338\n\n## Tune the threshold based on the predicted probabilities on the 3 test data sets\ntune.res = tuneThreshold(pred = r$pred, measure = credit.costs)\ntune.res\n# $th\n# [1] 0.1115426\n# \n# $perf\n# credit.costs \n# 0.507004 tuneThreshold returns the optimal threshold value for the positive class and the corresponding\nperformance.\nAs expected the tuned threshold is smaller than the theoretical threshold. 2. 
Rebalancing In order to minimize the average costs, observations from the class with the more costly misclassifications should be\ngiven higher importance during training.\nThis can be achieved by weighting the classes, provided that the learner under consideration\nhas a 'class weights' or an 'observation weights' argument.\nTo find out which learning methods support either type of weights, have a look at the list of integrated learners in the Appendix or use listLearners . ## Learners that accept observation weights\nlistLearners( classif , properties = weights )\n# [1] classif.ada classif.avNNet classif.binomial \n# [4] classif.blackboost classif.cforest classif.ctree \n# [7] classif.extraTrees classif.gbm classif.glmboost \n# [10] classif.glmnet classif.logreg classif.lqa \n# [13] classif.multinom classif.nnet classif.plr \n# [16] classif.probit classif.rpart classif.xgboost \n\n## Learners that can deal with class weights\nlistLearners( classif , properties = class.weights )\n# [1] classif.ksvm classif.LiblineaRL1L2SVC \n# [3] classif.LiblineaRL1LogReg classif.LiblineaRL2L1SVC \n# [5] classif.LiblineaRL2LogReg classif.LiblineaRL2SVC \n# [7] classif.LiblineaRMultiClassSVC classif.randomForest \n# [9] classif.svm Alternatively, over- and undersampling techniques can be used. i. Weighting Just as theoretical thresholds , theoretical weights can be calculated from the\ncost matrix.\nIf t indicates the target threshold and t_0 the original threshold for the positive class, the\nproportion of observations in the positive class has to be multiplied by \\frac{1-t}{t} \\frac{t_0}{1-t_0}. \nAlternatively, the proportion of observations in the negative class can be multiplied by\nthe inverse.\nA proof is given by Elkan (2001) . In most cases, the original threshold is t_0 = 0.5 and thus the second factor vanishes.\nIf additionally the target threshold t equals the theoretical threshold t^* , the\nproportion of observations in the positive class has to be multiplied by \\frac{1-t^*}{t^*} = \\frac{c(-1,+1) - c(+1,+1)}{c(+1,-1) - c(-1,-1)}. For the credit example the theoretical threshold corresponds to a\nweight of 5 for the positive class. ## Weight for positive class corresponding to theoretical threshold\nw = (1 - th)/th\nw\n# [1] 5 A unified and convenient way to assign class weights to a Learner (and tune\nthem) is provided by function makeWeightedClassesWrapper . The class weights are specified\nusing argument wcw.weight .\nFor learners that support observation weights a suitable weight vector is then generated\ninternally during training or resampling.\nIf the learner can deal with class weights, the weights are basically passed on to the\nappropriate learner parameter. 
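As a small side note (again only an illustrative sketch, not needed for the following code), the same relation links the empirically tuned threshold from the thresholding section to a class weight: with original threshold t_0 = 0.5 a target threshold t corresponds to the weight (1 - t)/t.

## Weight implied by the tuned threshold of about 0.11 found above (sketch)
(1 - 0.1115426)/0.1115426
# roughly 7.97, in line with the grid-tuned weight of 7.5 found below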
## Weighted learner\nlrn = makeLearner( classif.multinom , trace = FALSE)\nlrn = makeWeightedClassesWrapper(lrn, wcw.weight = w)\nlrn\n# Learner weightedclasses.classif.multinom from package nnet\n# Type: classif\n# Name: ; Short name: \n# Class: WeightedClassesWrapper\n# Properties: twoclass,multiclass,numerics,factors,prob\n# Predict-Type: response\n# Hyperparameters: trace=FALSE,wcw.weight=5\n\nr = resample(lrn, credit.task, rin, measures = list(credit.costs, mmce), show.info = FALSE)\nr\n# Resample Result\n# Task: GermanCredit\n# Learner: weightedclasses.classif.multinom\n# credit.costs.aggr: 0.53\n# credit.costs.mean: 0.53\n# credit.costs.sd: 0.04\n# mmce.aggr: 0.35\n# mmce.mean: 0.35\n# mmce.sd: 0.02\n# Runtime: 0.44081 For classification methods like \"classif.ksvm\" (the support vector machine ksvm in package kernlab ) that support class weights you can pass them\ndirectly. lrn = makeLearner( classif.ksvm , class.weights = c(Bad = w, Good = 1)) Or, more conveniently, you can again use makeWeightedClassesWrapper . lrn = makeWeightedClassesWrapper( classif.ksvm , wcw.weight = w)\nr = resample(lrn, credit.task, rin, measures = list(credit.costs, mmce), show.info = FALSE)\nr\n# Resample Result\n# Task: GermanCredit\n# Learner: weightedclasses.classif.ksvm\n# credit.costs.aggr: 0.58\n# credit.costs.mean: 0.58\n# credit.costs.sd: 0.04\n# mmce.aggr: 0.31\n# mmce.mean: 0.31\n# mmce.sd: 0.02\n# Runtime: 0.61962 Just like the theoretical threshold, the theoretical weights may not always be suitable,\ntherefore you can tune the weight for the positive class as shown in the following example.\nCalculating the theoretical weight beforehand may help to narrow down the search interval. lrn = makeLearner( classif.multinom , trace = FALSE)\nlrn = makeWeightedClassesWrapper(lrn)\nps = makeParamSet(makeDiscreteParam( wcw.weight , seq(4, 12, 0.5)))\nctrl = makeTuneControlGrid()\ntune.res = tuneParams(lrn, credit.task, resampling = rin, par.set = ps,\n measures = list(credit.costs, mmce), control = ctrl, show.info = FALSE)\ntune.res\n# Tune result:\n# Op. pars: wcw.weight=7.5\n# credit.costs.test.mean=0.501,mmce.test.mean=0.381\n\nas.data.frame(tune.res$opt.path)[1:3]\n# wcw.weight credit.costs.test.mean mmce.test.mean\n# 1 4 0.5650291 0.3330127\n# 2 4.5 0.5550251 0.3430167\n# 3 5 0.5260320 0.3460197\n# 4 5.5 0.5130070 0.3530147\n# 5 6 0.5160100 0.3640137\n# 6 6.5 0.5160160 0.3720157\n# 7 7 0.5040250 0.3760167\n# 8 7.5 0.5010040 0.3810038\n# 9 8 0.5100130 0.3900128\n# 10 8.5 0.5100070 0.3940108\n# 11 9 0.5110080 0.4030078\n# 12 9.5 0.5160130 0.4080128\n# 13 10 0.5260140 0.4180138\n# 14 10.5 0.5240060 0.4200098\n# 15 11 0.5319991 0.4280029\n# 16 11.5 0.5289901 0.4330019\n# 17 12 0.5249801 0.4369999 ii. Over- and undersampling If the Learner supports neither observation nor class weights the proportions\nof the classes in the training data can be changed by over- or undersampling. In the GermanCredit data set the positive class Bad should receive\na theoretical weight of w = (1 - th)/th = 5 .\nThis can be achieved by oversampling class Bad with a rate of 5 (see also the documentation\nof function oversample ). 
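The mirror-image strategy, undersampling the majority class Good with rate 1/w, works analogously. A minimal sketch is given here before the oversampling code from the example, assuming that the companion functions undersample and makeUndersampleWrapper (with parameter usw.rate) behave like their oversampling counterparts:

## Sketch: undersample class "Good" instead of oversampling class "Bad"
lrn = makeLearner("classif.multinom", trace = FALSE)
lrn = makeUndersampleWrapper(lrn, usw.rate = 1/w)
r = resample(lrn, credit.task, rin, measures = list(credit.costs, mmce), show.info = FALSE)

Both variants lead to roughly the same class ratio of about 2.1 : 1 in favor of Bad during training.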
credit.task.over = oversample(credit.task, rate = w)\nlrn = makeLearner( classif.multinom , trace = FALSE)\nmod = train(lrn, credit.task.over)\npred = predict(mod, task = credit.task)\nperformance(pred, measures = list(credit.costs, mmce))\n# credit.costs mmce \n# 0.439 0.323 Note that in the above example the learner was trained on the oversampled task credit.task.over .\nIn order to get the training performance on the original task predictions were calculated for credit.task . We usually prefer resampled performance values, but simply calling resample on the oversampled\ntask does not work since predictions have to be based on the original task.\nThe solution is to create a wrapped Learner via function makeOversampleWrapper .\nInternally, oversample is called before training, but predictions are done on the original data. lrn = makeLearner( classif.multinom , trace = FALSE)\nlrn = makeOversampleWrapper(lrn, osw.rate = w)\nlrn\n# Learner classif.multinom.oversampled from package mlr,nnet\n# Type: classif\n# Name: ; Short name: \n# Class: OversampleWrapper\n# Properties: numerics,factors,weights,prob,twoclass,multiclass\n# Predict-Type: response\n# Hyperparameters: trace=FALSE,osw.rate=5\n\nr = resample(lrn, credit.task, rin, measures = list(credit.costs, mmce), show.info = FALSE)\nr\n# Resample Result\n# Task: GermanCredit\n# Learner: classif.multinom.oversampled\n# credit.costs.aggr: 0.56\n# credit.costs.mean: 0.56\n# credit.costs.sd: 0.05\n# mmce.aggr: 0.35\n# mmce.mean: 0.35\n# mmce.sd: 0.02\n# Runtime: 0.812204 Of course, we can also tune the oversampling rate. For this purpose we again have to create\nan OversampleWrapper .\nOptimal values for parameter osw.rate can be obtained using function tuneParams . lrn = makeLearner( classif.multinom , trace = FALSE)\nlrn = makeOversampleWrapper(lrn)\nps = makeParamSet(makeDiscreteParam( osw.rate , seq(3, 7, 0.25)))\nctrl = makeTuneControlGrid()\ntune.res = tuneParams(lrn, credit.task, rin, par.set = ps, measures = list(credit.costs, mmce),\n control = ctrl, show.info = FALSE)\ntune.res\n# Tune result:\n# Op. pars: osw.rate=7\n# credit.costs.test.mean=0.506,mmce.test.mean=0.37 Multi-class problems We consider the waveform data set from package mlbench and\nadd an artificial cost matrix: true\\pred. 1 2 3 1 0 30 80 2 5 0 4 3 10 8 0 We start by creating the Task , the cost matrix and the corresponding performance measure. ## Task\ndf = mlbench::mlbench.waveform(500)\nwf.task = makeClassifTask(id = waveform , data = as.data.frame(df), target = classes )\n\n## Cost matrix\ncosts = matrix(c(0, 5, 10, 30, 0, 8, 80, 4, 0), 3)\ncolnames(costs) = rownames(costs) = getTaskClassLevels(wf.task)\n\n## Performance measure\nwf.costs = makeCostMeasure(id = wf.costs , name = Waveform costs , costs = costs, task = wf.task,\n best = 0, worst = 10) In the multi-class case, both, thresholding and rebalancing correspond to cost matrices\nof a certain structure where c(k,l) = c(l) for k , l = 1, \\ldots, K , k \\neq l .\nThis condition means that the cost of misclassifying an observation is independent of the\npredicted class label\n(see Domingos, 1999 ).\nGiven a cost matrix of this type, theoretical thresholds and weights can be derived\nin a similar manner as in the binary case.\nObviously, the cost matrix given above does not have this special structure. 1. 
Thresholding Given a vector of positive threshold values as long as the number of classes K , the predicted\nprobabilities for all classes are adjusted by dividing them by the corresponding threshold value.\nThen the class with the highest adjusted probability is predicted.\nThis way, as in the binary case, classes with a low threshold are preferred to classes\nwith a larger threshold. Again this can be done by function setThreshold as shown in the following example (or\nalternatively by the predict.threshold option of makeLearner ).\nNote that the threshold vector needs to have names that correspond to the class labels. lrn = makeLearner( classif.rpart , predict.type = prob )\nrin = makeResampleInstance( CV , iters = 3, task = wf.task)\nr = resample(lrn, wf.task, rin, measures = list(wf.costs, mmce), show.info = FALSE)\nr\n# Resample Result\n# Task: waveform\n# Learner: classif.rpart\n# wf.costs.aggr: 8.40\n# wf.costs.mean: 8.40\n# wf.costs.sd: 1.57\n# mmce.aggr: 0.30\n# mmce.mean: 0.30\n# mmce.sd: 0.00\n# Runtime: 0.0816503\n\n## Calculate thresholds as 1/(average costs of true classes)\nth = 2/rowSums(costs)\nnames(th) = getTaskClassLevels(wf.task)\nth\n# 1 2 3 \n# 0.01818182 0.22222222 0.11111111\n\npred.th = setThreshold(r$pred, threshold = th)\nperformance(pred.th, measures = list(wf.costs, mmce))\n# wf.costs mmce \n# 5.7498377 0.3598947 The threshold vector th in the above example is chosen according to the average costs\nof the true classes 55, 4.5 and 9.\nMore exactly, th corresponds to an artificial cost matrix of the structure mentioned\nabove with off-diagonal elements c(2,1) = c(3,1) = 55 , c(1,2) = c(3,2) = 4.5 and c(1,3) = c(2,3) = 9 .\nThis threshold vector may be not optimal but leads to smaller total costs on the data set than\nthe default. ii. Empirical thresholding As in the binary case it is possible to tune the threshold vector using function tuneThreshold .\nSince the scaling of the threshold vector does not change the predicted class labels tuneThreshold returns threshold values that lie in [0,1] and sum to unity. tune.res = tuneThreshold(pred = r$pred, measure = wf.costs)\ntune.res\n# $th\n# 1 2 3 \n# 0.03734676 0.31826144 0.64439179 \n# \n# $perf\n# [1] 4.275197 For comparison we show the standardized version of the theoretically motivated threshold\nvector chosen above. th/sum(th)\n# 1 2 3 \n# 0.05172414 0.63218391 0.31609195 2. Rebalancing i. Weighting In the multi-class case you have to pass a vector of weights as long as the number of classes K to function makeWeightedClassesWrapper .\nThe weight vector can be tuned using function tuneParams . lrn = makeLearner( classif.multinom , trace = FALSE)\nlrn = makeWeightedClassesWrapper(lrn)\n\nps = makeParamSet(makeNumericVectorParam( wcw.weight , len = 3, lower = 0, upper = 1))\nctrl = makeTuneControlRandom()\n\ntune.res = tuneParams(lrn, wf.task, resampling = rin, par.set = ps,\n measures = list(wf.costs, mmce), control = ctrl, show.info = FALSE)\ntune.res\n# Tune result:\n# Op. pars: wcw.weight=0.871,0.105,0.0306\n# wf.costs.test.mean= 2.7,mmce.test.mean=0.242", "title": "Class-dependent misclassification costs" }, { "location": "/cost_sensitive_classif/index.html#example-dependent-misclassification-costs", - "text": "In case of example-dependent costs we have to create a special Task via function makeCostSensTask .\nFor this purpose the feature values x and an n \\times K cost matrix that contains\nthe cost vectors for all n examples in the data set are required. 
We use the iris data and generate an artificial cost matrix\n(see Beygelzimer et al., 2005 ). df = iris\ncost = matrix(runif(150 * 3, 0, 2000), 150) * (1 - diag(3))[df$Species,] + runif(150, 0, 10)\ncolnames(cost) = levels(iris$Species)\nrownames(cost) = rownames(iris)\ndf$Species = NULL\n\ncostsens.task = makeCostSensTask(id = iris , data = df, cost = cost)\ncostsens.task\n# Supervised task: iris\n# Type: costsens\n# Observations: 150\n# Features:\n# numerics factors ordered \n# 4 0 0 \n# Missings: FALSE\n# Has blocking: FALSE\n# Classes: 3\n# setosa, versicolor, virginica mlr provides several wrappers to turn regular classification or regression methods\ninto Learner s that can deal with example-dependent costs. makeCostSensClassifWrapper (wraps a classification Learner ):\n This is a naive approach where the costs are coerced into class labels by choosing the\n class label with minimum cost for each example. Then a regular classification method is\n used. makeCostSensRegrWrapper (wraps a regression Learner ):\n An individual regression model is fitted for the costs of each class.\n In the prediction step first the costs are predicted for all classes and then the class with\n the lowest predicted costs is selected. makeCostSensWeightedPairsWrapper (wraps a classification Learner ):\n This is also known as cost-sensitive one-vs-one (CS-OVO) and the most sophisticated of\n the currently supported methods.\n For each pair of classes, a binary classifier is fitted.\n For each observation the class label is defined as the element of the pair with minimal costs.\n During fitting, the observations are weighted with the absolute difference in costs.\n Prediction is performed by simple voting. In the following example we use the third method. We create the wrapped Learner \nand train it on the CostSensTask defined above. lrn = makeLearner( classif.multinom , trace = FALSE)\nlrn = makeCostSensWeightedPairsWrapper(lrn)\nlrn\n# Learner costsens.classif.multinom from package nnet\n# Type: costsens\n# Name: ; Short name: \n# Class: CostSensWeightedPairsWrapper\n# Properties: twoclass,multiclass,numerics,factors\n# Predict-Type: response\n# Hyperparameters: trace=FALSE\n\nmod = train(lrn, costsens.task)\nmod\n# Model for learner.id=costsens.classif.multinom; learner.class=CostSensWeightedPairsWrapper\n# Trained on: task.id = iris; obs = 150; features = 4\n# Hyperparameters: trace=FALSE The models corresponding to the individual pairs can be accessed by function getLearnerModel . 
getLearnerModel(mod)\n# [[1]]\n# Model for learner.id=classif.multinom; learner.class=classif.multinom\n# Trained on: task.id = feats; obs = 150; features = 4\n# Hyperparameters: trace=FALSE\n# \n# [[2]]\n# Model for learner.id=classif.multinom; learner.class=classif.multinom\n# Trained on: task.id = feats; obs = 150; features = 4\n# Hyperparameters: trace=FALSE\n# \n# [[3]]\n# Model for learner.id=classif.multinom; learner.class=classif.multinom\n# Trained on: task.id = feats; obs = 150; features = 4\n# Hyperparameters: trace=FALSE mlr provides some performance measures for example-specific cost-sensitive classification.\nIn the following example we calculate the mean costs of the predicted class labels\n( meancosts ) and the misclassification penalty ( mcp ).\nThe latter measure is the average difference between the costs caused by the predicted\nclass labels, i.e., meancosts , and the costs resulting from choosing the\nclass with lowest cost for each observation.\nIn order to compute these measures the costs for the test observations are required and\ntherefore the Task has to be passed to performance . pred = predict(mod, task = costsens.task)\npred\n# Prediction: 150 observations\n# predict.type: response\n# threshold: \n# time: 0.05\n# id response\n# 1 1 setosa\n# 2 2 setosa\n# 3 3 setosa\n# 4 4 setosa\n# 5 5 setosa\n# 6 6 setosa\n\nperformance(pred, measures = list(meancosts, mcp), task = costsens.task)\n# meancosts mcp \n# 163.0830 158.2944", + "text": "In case of example-dependent costs we have to create a special Task via function makeCostSensTask .\nFor this purpose the feature values x and an n \\times K cost matrix that contains\nthe cost vectors for all n examples in the data set are required. We use the iris data and generate an artificial cost matrix\n(see Beygelzimer et al., 2005 ). df = iris\ncost = matrix(runif(150 * 3, 0, 2000), 150) * (1 - diag(3))[df$Species,] + runif(150, 0, 10)\ncolnames(cost) = levels(iris$Species)\nrownames(cost) = rownames(iris)\ndf$Species = NULL\n\ncostsens.task = makeCostSensTask(id = iris , data = df, cost = cost)\ncostsens.task\n# Supervised task: iris\n# Type: costsens\n# Observations: 150\n# Features:\n# numerics factors ordered \n# 4 0 0 \n# Missings: FALSE\n# Has blocking: FALSE\n# Classes: 3\n# setosa, versicolor, virginica mlr provides several wrappers to turn regular classification or regression methods\ninto Learner s that can deal with example-dependent costs. makeCostSensClassifWrapper (wraps a classification Learner ):\n This is a naive approach where the costs are coerced into class labels by choosing the\n class label with minimum cost for each example. Then a regular classification method is\n used. makeCostSensRegrWrapper (wraps a regression Learner ):\n An individual regression model is fitted for the costs of each class.\n In the prediction step first the costs are predicted for all classes and then the class with\n the lowest predicted costs is selected. makeCostSensWeightedPairsWrapper (wraps a classification Learner ):\n This is also known as cost-sensitive one-vs-one (CS-OVO) and the most sophisticated of\n the currently supported methods.\n For each pair of classes, a binary classifier is fitted.\n For each observation the class label is defined as the element of the pair with minimal costs.\n During fitting, the observations are weighted with the absolute difference in costs.\n Prediction is performed by simple voting. In the following example we use the third method. 
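Before turning to that example, the second, regression-based approach can be sketched on the same task for comparison, using the measures meancosts and mcp explained below. This is purely illustrative; since the cost matrix is drawn randomly, the resulting performance values vary from run to run.

## Sketch: regression-based cost-sensitive wrapper around a linear model
rlrn = makeCostSensRegrWrapper(makeLearner("regr.lm"))
rmod = train(rlrn, costsens.task)
performance(predict(rmod, task = costsens.task), measures = list(meancosts, mcp),
  task = costsens.task)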
We create the wrapped Learner \nand train it on the CostSensTask defined above. lrn = makeLearner( classif.multinom , trace = FALSE)\nlrn = makeCostSensWeightedPairsWrapper(lrn)\nlrn\n# Learner costsens.classif.multinom from package nnet\n# Type: costsens\n# Name: ; Short name: \n# Class: CostSensWeightedPairsWrapper\n# Properties: twoclass,multiclass,numerics,factors\n# Predict-Type: response\n# Hyperparameters: trace=FALSE\n\nmod = train(lrn, costsens.task)\nmod\n# Model for learner.id=costsens.classif.multinom; learner.class=CostSensWeightedPairsWrapper\n# Trained on: task.id = iris; obs = 150; features = 4\n# Hyperparameters: trace=FALSE The models corresponding to the individual pairs can be accessed by function getLearnerModel . getLearnerModel(mod)\n# [[1]]\n# Model for learner.id=classif.multinom; learner.class=classif.multinom\n# Trained on: task.id = feats; obs = 150; features = 4\n# Hyperparameters: trace=FALSE\n# \n# [[2]]\n# Model for learner.id=classif.multinom; learner.class=classif.multinom\n# Trained on: task.id = feats; obs = 150; features = 4\n# Hyperparameters: trace=FALSE\n# \n# [[3]]\n# Model for learner.id=classif.multinom; learner.class=classif.multinom\n# Trained on: task.id = feats; obs = 150; features = 4\n# Hyperparameters: trace=FALSE mlr provides some performance measures for example-specific cost-sensitive classification.\nIn the following example we calculate the mean costs of the predicted class labels\n( meancosts ) and the misclassification penalty ( mcp ).\nThe latter measure is the average difference between the costs caused by the predicted\nclass labels, i.e., meancosts , and the costs resulting from choosing the\nclass with lowest cost for each observation.\nIn order to compute these measures the costs for the test observations are required and\ntherefore the Task has to be passed to performance . 
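The relation between the two measures can also be checked by hand: mcp is meancosts minus the average costs of an "oracle" that always picks the cheapest class for each observation. The following one-liner (a sketch using the cost matrix generated above) computes that oracle value; the original performance call follows below.

## Average costs of always choosing the cost-minimal class per observation (sketch);
## mcp reported below equals meancosts minus this value
mean(apply(cost, 1, min))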
pred = predict(mod, task = costsens.task)\npred\n# Prediction: 150 observations\n# predict.type: response\n# threshold: \n# time: 0.06\n# id response\n# 1 1 setosa\n# 2 2 setosa\n# 3 3 setosa\n# 4 4 setosa\n# 5 5 setosa\n# 6 6 setosa\n\nperformance(pred, measures = list(meancosts, mcp), task = costsens.task)\n# meancosts mcp \n# 163.0830 158.2944", "title": "Example-dependent misclassification costs" }, { @@ -487,7 +487,7 @@ }, { "location": "/roc_analysis/index.html", - "text": "ROC Analysis and Performance Curves\n\n\nFor binary scoring classifiers a \nthreshold\n (in the following also called \ncutoff\n) value\ncontrols how predicted posterior probabilities are turned into class labels.\nROC curves and other performance plots serve to visualize and analyse the relationship between\none or two performance measures and the threshold.\n\n\nThis section is mainly devoted to \nreceiver operating characteristic\n (ROC) curves that\nplot the \ntrue positive rate\n (sensitivity) on the vertical axis against the \nfalse positive rate\n\n(1 - specificity, fall-out) on the horizontal axis for all possible threshold values.\nCreating other performance plots like \nlift charts\n or \nprecision/recall graphs\n works\nanalogously and is only shown briefly.\n\n\nIn addition to performance visualization ROC curves are helpful in\n\n\n\n\ndetermining an optimal decision threshold for given class prior probabilities and\n misclassification costs (for alternatives see also the sections about\n \ncost-sensitive classification\n and\n \nimbalanced classification problems\n in this tutorial),\n\n\nidentifying regions where one classifier outperforms another and building suitable multi-classifier\n systems,\n\n\nobtaining calibrated estimates of the posterior probabilities.\n\n\n\n\nFor more information see the tutorials and introductory papers by\n\nFawcett (2004)\n,\n\nFawcett (2006)\n\nas well as \nFlach (ICML 2004)\n.\n\n\nIn many applications as, e.g., diagnostic tests or spam detection, there is uncertainty\nabout the class priors or the misclassification costs at the time of prediction, for example\nbecause it's hard to quantify the costs or because costs and class priors vary over time.\nUnder these circumstances the classifier is expected to work well for a whole range of\ndecision thresholds and the area under the ROC curve (AUC) provides a scalar performance\nmeasure for comparing and selecting classifiers.\n\nmlr\n provides the AUC for binary classification (\nauc\n based on package\n\nROCR\n) an also a generalization of the AUC for the\nmulti-class case (\nmulticlass.auc\n based on package \npROC\n).\n\n\nmlr\n offers three ways to plot ROC and other performance curves.\n\n\n\n\nmlr\n's function \ngenerateROCRCurvesData\n is a convenient interface to \nROCR\n's\n \nperformance\n methods with an associated plotting function,\n \nplotROCRCurves\n which uses \nggplot2\n.\n\n\nThe \nmlr\n function \nasROCRPrediction\n converts \nmlr\n \nPrediction\n objects to objects\n of \nROCR\n's class \nprediction\n.\n Then, \nROCR\n's functionality can be used to further analyse the results and generate\n performance plots.\n\n\nmlr\n's function \nplotViperCharts\n provides an interface to\n \nViperCharts\n.\n\n\n\n\nLet's have a look at some examples demonstrating the three possible methods.\n\n\nNote that the \nlearners\n have to be capable of predicting probabilities.\nHave a look at the \ntable of learners\n\nor run \nlistLearners(prob = TRUE)\n to get a list of all learners that support 
this.\n\n\nPerformance plots with generateROCRCurvesData and plotROCRCurves\n\n\nAs mentioned above \ngenerateROCRCurvesData\n is an interface to \nROCR\n's\n\nperformance\n methods.\nIt provides S3 methods for objects of class \nPrediction\n, \nResampleResult\n\nand \nBenchmarkResult\n (resulting from calling \npredict\n, \nresample\n\nor \nbenchmark\n). \nplotROCRCurves\n plots output from \ngenerateROCRCurvesData\n using \nggplot2\n.\n\n\nExample 1: Single predictions\n\n\nWe consider the \nSonar\n data set from package \nmlbench\n, which poses a\nbinary classification problem (\nsonar.task\n) and apply \nlinear discriminant analysis\n.\n\n\nn = getTaskSize(sonar.task)\ntrain.set = sample(n, size = round(2/3 * n))\ntest.set = setdiff(seq_len(n), train.set)\n\nlrn1 = makeLearner(\nclassif.lda\n, predict.type = \nprob\n)\nmod1 = train(lrn1, sonar.task, subset = train.set)\npred1 = predict(mod1, task = sonar.task, subset = test.set)\nroc_data = generateROCRCurvesData(pred1)\nroc_data\n#\n learner False positive rate True positive rate Cutoff\n#\n 1 prediction 0 0.00000000 1.0147059\n#\n 2 prediction 0 0.02702703 1.0000000\n#\n 3 prediction 0 0.05405405 0.9999999\n#\n 4 prediction 0 0.08108108 0.9999999\n#\n 5 prediction 0 0.10810811 0.9999999\n\n\n\n\ngenerateROCRCurvesData\n returns an object of class \"ROCRCurvesData\" which contains the results from \nROCR\n's \nperformance\n method (which depends on arguments \nmeas1\n and \nmeas2\n). The data can be extracted by accessing the data element of the object. The object also contains information about the input arguments to \ngenerateROCRCurvesData\n which may be useful.\n\n\nPer default, \nplotROCRCurves\n draws a ROC curve and optionally adds a diagonal line that represents\nthe performance of a random classifier.\n\n\ndroc = generateROCRCurvesData(pred1)\nplotROCRCurves(droc, diagonal = TRUE)\n\n\n\n\n \nThere is also an experimental plotting function \nplotROCRCurvesGGVIS\n which uses \nggvis\n to create similar\nfigures with the addition of (optional) interactive tooltips (displayed on hover) that display the threshold\nat that point in the curve.\n\n\nplotROCRCurvesGGVIS(droc, cutoffs = TRUE)\n\n\n\n\nThe corresponding area under curve (\nauc\n) can be calculated as usual by calling\n\nperformance\n.\n\n\nperformance(pred1, auc)\n#\n auc \n#\n 0.847973\n\n\n\n\nIn addition to \nlinear discriminant analysis\n we try a support vector machine\nwith RBF kernel (\nksvm\n).\n\n\nlrn2 = makeLearner(\nclassif.ksvm\n, predict.type = \nprob\n)\nmod2 = train(lrn2, sonar.task, subset = train.set)\npred2 = predict(mod2, task = sonar.task, subset = test.set)\n\n\n\n\nIn order to compare the performance of the two learners you might want to display the two\ncorresponding ROC curves in one plot.\nFor this purpose just pass a named \nlist\n of \nPrediction\ns to \nplotROCRCurves\n.\n\n\nplotROCRCurves(generateROCRCurvesData(list(lda = pred1, ksvm = pred2)))\n\n\n\n\n \n\n\nIt's clear from the plot above that \nksvm\n has a slightly higher AUC than\n\nlda\n.\n\n\nperformance(pred2, auc)\n#\n auc \n#\n 0.9214527\n\n\n\n\nIt is easily possible to generate other performance plots by passing the appropriate performance\nmeasures to \nplotROCRCurves\n.\nNote that arguments \nmeas1\n and \nmeas2\n do not refer to \nmlr\n's performance measures,\nbut to measures provided by \nROCR\n and listed \nhere\n.\nBelow is code for a \nlift chart\n which shows the lift value (\n\"lift\"\n) versus the rate of\npositive predictions (\n\"rpp\"\n).\n\n\nout = 
generateROCRCurvesData(list(lda = pred1, ksvm = pred2), meas1 = \nlift\n, meas2 = \nrpp\n)\nplotROCRCurves(out)\n\n\n\n\n \n\n\nA plot of a single performance measure (accuracy in the example code below) versus the\nthreshold can be generated by setting \nmeas2 = \"cutoff\"\n.\n\n\nout = generateROCRCurvesData(list(lda = pred1, ksvm = pred2), meas1 = \nacc\n, meas2 = \ncutoff\n)\nplotROCRCurves(out)\n\n\n\n\n \n\n\nAs you may recall, an alternative function for plotting performance values versus the decision\nthreshold is \nplotThreshVsPerf\n.\nWhile \nplotThreshVsPerf\n permits to plot several performance measures at once,\n\nplotROCRCurves\n makes it easy to superpose the performance curves of multiple learners.\n\n\nExample 2: Benchmark experiment\n\n\nThe analysis in the example above can be improved in several regards.\nWe only considered the training performance and, ideally, the support vector machine should\nhave been \ntuned\n.\nMoreover, we wrote individual code for training/prediction of each learner, which can become\ntedious very quickly.\nA more practical way to apply several learners to a \nTask\n and compare their performance is\nprovided by function \nbenchmark\n (see also \nBenchmark Experiments\n).\n\n\nWe again consider the \nSonar\n data set and apply \nlda\n\nas well as \nksvm\n.\nWe first generate a \ntuning wrapper\n for \nksvm\n.\nThe cost parameter is tuned on a (for demonstration purposes small) parameter grid.\nWe assume that we are interested in a good performance over the complete threshold range\nand therefore tune with regard to the \nauc\n.\nThe error rate (\nmmce\n) for threshold 0.5 is reported as well.\n\n\n## Tune wrapper for ksvm\nrdesc.inner = makeResampleDesc(\nHoldout\n)\nms = list(auc, mmce)\nps = makeParamSet(\n makeDiscreteParam(\nC\n, 2^(-1:1))\n)\nctrl = makeTuneControlGrid()\nlrn2 = makeTuneWrapper(lrn2, rdesc.inner, ms, ps, ctrl, show.info = FALSE)\n\n\n\n\nBelow the actual benchmark experiment is conducted.\nAs resampling strategy we use 5-fold cross-validation and again calculate the \nauc\n\nas well as the error rate (for a threshold/cutoff value of 0.5).\n\n\n## Benchmark experiment\nlrns = list(lrn1, lrn2)\nrdesc.outer = makeResampleDesc(\nCV\n, iters = 5)\n\nres = benchmark(lrns, tasks = sonar.task, resampling = rdesc.outer, measures = ms, show.info = FALSE)\nres\n#\n task.id learner.id auc.test.mean mmce.test.mean\n#\n 1 Sonar-example classif.lda 0.7835442 0.2592334\n#\n 2 Sonar-example classif.ksvm.tuned 0.9454418 0.1390244\n\n\n\n\nCalling \nplotROCRCurves\n on the \nresult\n of the benchmark experiment\nproduces a plot with ROC curves for all learners in the experiment.\n\n\nplotROCRCurves(generateROCRCurvesData(res))\n\n\n\n\n \n\n\nPer default, threshold-averaged ROC curves are shown.\nSince we used 5-fold cross-validation we have predictions on 5 test data sets and therefore\n5 ROC curves per classifier.\nFor each threshold value the means of the corresponding 5 false and true positive rates are\ncalculated and plotted against each other.\n\n\nIf you want to plot the individual ROC curves for each resample iteration set \navg = \"none\"\n.\nOther available options are \navg = \"horizontal\"\n and \navg = \"vertical\"\n.\n\n\nplotROCRCurves(generateROCRCurvesData(res, avg = \nnone\n))\n\n\n\n\n \n\n\nAn alternative to averaging is to just merge the 5 test folds and draw a single ROC curve.\nMerging can be achieved by manually changing the \nclass\n attribute of\nthe prediction objects from \nResamplePrediction\n to 
\nPrediction\n.\n\n\nBelow the predictions are extracted from the \nBenchmarkResult\n via function \ngetBMRPredictions\n,\nthe \nclass\n is changed and the ROC curves are created.\n\n\nAveraging methods are normally preferred\n(cp. \nFawcett, 2006\n),\nas they permit to assess the variability, which is needed to properly compare classifier\nperformance.\n\n\n## Extract predictions\npreds = getBMRPredictions(res)[[1]]\n\n## Change the class attribute\npreds2 = lapply(preds, function(x) {class(x) = \nPrediction\n; return(x)})\n\n## Draw ROC curves\nplotROCRCurves(generateROCRCurvesData(preds2, avg = \nnone\n))\n\n\n\n\n \n\n\nAgain, you can easily create other standard evaluation plots by calling \nplotROCRCurves\n\non the \nBenchmarkResult\n with the appropriate performance measures (see \nROCR::performance\n).\n\n\nPerformance plots with asROCRPrediction\n\n\nDrawing performance plots with package \nROCR\n works through three basic commands:\n\n\n\n\nROCR::prediction\n: Create a \nROCR\n \nprediction\n object.\n\n\nROCR::performance\n: Calculate one or more performance measures for the\n given prediction object.\n\n\nA reimplementation of \nROCR::plot\n which uses \nggplot2\n.\n\n\n\n\nmlr\n's function \nasROCRPrediction\n converts an \nmlr\n \nPrediction\n object to\na \nROCR\n \nprediction\n object.\nIn order to create performance plots steps 2. and 3. have to be run by the user.\n\n\nThis is obviously less convenient than calling \nplotROCRCurves\n (which extracts predictions,\ncalls \nasROCRPrediction\n and executes steps 2. and 3. internally).\nOn the other hand this way provides more control over the generated plots by, e.g., using graphical\nparameters that are not (yet) accessible via \nplotROCRCurves\n.\nMoreover, you can directly benefit from any enhancements in \nROCR\n, use your own\n\nROCR\n-based code or other packages that depend on \nROCR\n, and use \nROCR\n's \nplot\n.\nFor more details see the \nROCR\n documentation and \ndemo(ROCR)\n.\n\n\nAn addditional alternative is to call \nplotROCRCurves\n, extract the data from the \nggplot2\n object using the \ndata\n element of the object, e.g., \nobj$data\n, and then plot the data using whatever method you prefer.\n\n\nExample 1: Single predictions (continued)\n\n\nWe go back to out first example where we trained and predicted \nlda\n on the\n\nsonar classification task\n.\n\n\nn = getTaskSize(sonar.task)\ntrain.set = sample(n, size = round(2/3 * n))\ntest.set = setdiff(seq_len(n), train.set)\n\n## Train and predict linear discriminant analysis\nlrn1 = makeLearner(\nclassif.lda\n, predict.type = \nprob\n)\nmod1 = train(lrn1, sonar.task, subset = train.set)\npred1 = predict(mod1, task = sonar.task, subset = test.set)\n\n\n\n\nBelow we use \nasROCRPrediction\n to convert the lda prediction, let \nROCR\n calculate the\ntrue and false positive rate and plot the ROC curve.\n\n\n## Convert prediction\nROCRpred1 = asROCRPrediction(pred1)\n\n## Calculate true and false positive rate\nROCRperf1 = ROCR::performance(ROCRpred1, \ntpr\n, \nfpr\n)\n\n## Draw ROC curve\nROCR::plot(ROCRperf1)\n\n\n\n\n \n\n\nBelow is the same ROC curve, but we make use of some more graphical parameters:\nThe ROC curve is color-coded by the threshold and selected threshold values are printed on\nthe curve. 
Additionally, the convex hull (black broken line) of the ROC curve is drawn.\n\n\n## Draw ROC curve\nROCR::plot(ROCRperf1, colorize = TRUE, print.cutoffs.at = seq(0.1, 0.9, 0.1), lwd = 2)\n\n## Draw convex hull of ROC curve\nch = ROCR::performance(ROCRpred1, \nrch\n)\nROCR::plot(ch, add = TRUE, lty = 2)\n\n\n\n\n \n\n\nExample 2: Benchmark experiments (continued)\n\n\nWe again consider the benchmark experiment conducted earlier.\nWe first extract the predictions by \ngetBMRPredictions\n and then convert them via function\n\nasROCRPrediction\n.\n\n\n## Extract predictions\npreds = getBMRPredictions(res)[[1]]\n\n## Convert predictions\nROCRpreds = lapply(preds, asROCRPrediction)\n\n## Calculate true and false positive rate\nROCRperfs = lapply(ROCRpreds, function(x) ROCR::performance(x, \ntpr\n, \nfpr\n))\n\n\n\n\nWe draw the horizontally averaged ROC curves (solid lines) as well as the ROC curves for\nthe individual resampling iterations (broken lines).\nMoreover, standard error bars are plotted for selected true positive rates (0.1, 0.2, ..., 0.9).\nSee \nROCR\n's \nplot\n function for details.\n\n\n## lda average ROC curve\nplot(ROCRperfs[[1]], col = \nblue\n, avg = \nhorizontal\n, spread.estimate = \nstderror\n,\n show.spread.at = seq(0.1, 0.9, 0.1), plotCI.col = \nblue\n, plotCI.lwd = 2, lwd = 2)\n## lda individual ROC curves\nplot(ROCRperfs[[1]], col = \nblue\n, lty = 2, lwd = 0.25, add = TRUE)\n\n## ksvm average ROC curve\nplot(ROCRperfs[[2]], col = \nred\n, avg = \nhorizontal\n, spread.estimate = \nstderror\n,\n show.spread.at = seq(0.4, 0.9, 0.1), plotCI.col = \nred\n, plotCI.lwd = 2, lwd = 2, add = TRUE)\n## ksvm individual ROC curves\nplot(ROCRperfs[[2]], col = \nred\n, lty = 2, lwd = 0.25, add = TRUE)\n\nlegend(\nbottomright\n, legend = getBMRLearnerIds(res), lty = 1, lwd = 2, col = c(\nblue\n, \nred\n))\n\n\n\n\n \n\n\nIn order to create other evaluation plots like \nprecision/recall graphs\n you just have to change\nthe performance measures when calling \nROCR::performance\n.\n\n\n## Extract and convert predictions\npreds = getBMRPredictions(res)[[1]]\nROCRpreds = lapply(preds, asROCRPrediction)\n\n## Calculate precision and recall\nROCRperfs = lapply(ROCRpreds, function(x) ROCR::performance(x, \nprec\n, \nrec\n))\n\n## Draw performance plot\nplot(ROCRperfs[[1]], col = \nblue\n, avg = \nthreshold\n)\nplot(ROCRperfs[[2]], col = \nred\n, avg = \nthreshold\n, add = TRUE)\nlegend(\nbottomleft\n, legend = getBMRLearnerIds(res), lty = 1, col = c(\nblue\n, \nred\n))\n\n\n\n\n \n\n\nIf you want to plot a performance measure versus the threshold, specify only one measure when\ncalling \nROCR::performance\n.\nBelow the average accuracy over the 5 cross-validation iterations is plotted against the\nthreshold. Moreover, boxplots for certain threshold values (0.1, 0.2, ..., 0.9) are drawn.\n\n\n## Extract and convert predictions\npreds = getBMRPredictions(res)[[1]]\nROCRpreds = lapply(preds, asROCRPrediction)\n\n## Calculate accuracy\nROCRperfs = lapply(ROCRpreds, function(x) ROCR::performance(x, \nacc\n))\n\n## Plot accuracy versus threshold\nplot(ROCRperfs[[1]], avg = \nvertical\n, spread.estimate = \nboxplot\n, lwd = 2, col = \nblue\n,\n show.spread.at = seq(0.1, 0.9, 0.1), ylim = c(0,1), xlab = \nThreshold\n)\n\n\n\n\n \n\n\nViper charts\n\n\nmlr\n also supports \nViperCharts\n for plotting ROC and other performance\ncurves. 
Like \nplotROCRCurves\n it has S3 methods for objects of class \nPrediction\n,\n\nResampleResult\n and \nBenchmarkResult\n.\nBelow plots for the benchmark experiment (Example 2) are generated.\n\n\nz = plotViperCharts(res, chart = \nrocc\n, browse = FALSE)\n#\n Error in function (type, msg, asError = TRUE) : couldn't connect to host\n\n\n\n\nNote that besides ROC curves you get several other plots like lift charts or cost curves.\nFor details, see \nplotViperCharts\n.", + "text": "ROC Analysis and Performance Curves\n\n\nFor binary scoring classifiers a \nthreshold\n (in the following also called \ncutoff\n) value\ncontrols how predicted posterior probabilities are turned into class labels.\nROC curves and other performance plots serve to visualize and analyse the relationship between\none or two performance measures and the threshold.\n\n\nThis section is mainly devoted to \nreceiver operating characteristic\n (ROC) curves that\nplot the \ntrue positive rate\n (sensitivity) on the vertical axis against the \nfalse positive rate\n\n(1 - specificity, fall-out) on the horizontal axis for all possible threshold values.\nCreating other performance plots like \nlift charts\n or \nprecision/recall graphs\n works\nanalogously and is only shown briefly.\n\n\nIn addition to performance visualization ROC curves are helpful in\n\n\n\n\ndetermining an optimal decision threshold for given class prior probabilities and\n misclassification costs (for alternatives see also the sections about\n \ncost-sensitive classification\n and\n \nimbalanced classification problems\n in this tutorial),\n\n\nidentifying regions where one classifier outperforms another and building suitable multi-classifier\n systems,\n\n\nobtaining calibrated estimates of the posterior probabilities.\n\n\n\n\nFor more information see the tutorials and introductory papers by\n\nFawcett (2004)\n,\n\nFawcett (2006)\n\nas well as \nFlach (ICML 2004)\n.\n\n\nIn many applications as, e.g., diagnostic tests or spam detection, there is uncertainty\nabout the class priors or the misclassification costs at the time of prediction, for example\nbecause it's hard to quantify the costs or because costs and class priors vary over time.\nUnder these circumstances the classifier is expected to work well for a whole range of\ndecision thresholds and the area under the ROC curve (AUC) provides a scalar performance\nmeasure for comparing and selecting classifiers.\n\nmlr\n provides the AUC for binary classification (\nauc\n based on package\n\nROCR\n) an also a generalization of the AUC for the\nmulti-class case (\nmulticlass.auc\n based on package \npROC\n).\n\n\nmlr\n offers three ways to plot ROC and other performance curves.\n\n\n\n\nmlr\n's function \ngenerateROCRCurvesData\n is a convenient interface to \nROCR\n's\n \nperformance\n methods with an associated plotting function,\n \nplotROCRCurves\n which uses \nggplot2\n.\n\n\nThe \nmlr\n function \nasROCRPrediction\n converts \nmlr\n \nPrediction\n objects to objects\n of \nROCR\n's class \nprediction\n.\n Then, \nROCR\n's functionality can be used to further analyse the results and generate\n performance plots.\n\n\nmlr\n's function \nplotViperCharts\n provides an interface to\n \nViperCharts\n.\n\n\n\n\nLet's have a look at some examples demonstrating the three possible methods.\n\n\nNote that the \nlearners\n have to be capable of predicting probabilities.\nHave a look at the \ntable of learners\n\nor run \nlistLearners(prob = TRUE)\n to get a list of all learners that support 
this.\n\n\nPerformance plots with generateROCRCurvesData and plotROCRCurves\n\n\nAs mentioned above \ngenerateROCRCurvesData\n is an interface to \nROCR\n's\n\nperformance\n methods.\nIt provides S3 methods for objects of class \nPrediction\n, \nResampleResult\n\nand \nBenchmarkResult\n (resulting from calling \npredict\n, \nresample\n\nor \nbenchmark\n). \nplotROCRCurves\n plots output from \ngenerateROCRCurvesData\n using \nggplot2\n.\n\n\nExample 1: Single predictions\n\n\nWe consider the \nSonar\n data set from package \nmlbench\n, which poses a\nbinary classification problem (\nsonar.task\n) and apply \nlinear discriminant analysis\n.\n\n\nn = getTaskSize(sonar.task)\ntrain.set = sample(n, size = round(2/3 * n))\ntest.set = setdiff(seq_len(n), train.set)\n\nlrn1 = makeLearner(\nclassif.lda\n, predict.type = \nprob\n)\nmod1 = train(lrn1, sonar.task, subset = train.set)\npred1 = predict(mod1, task = sonar.task, subset = test.set)\nroc_data = generateROCRCurvesData(pred1)\nroc_data\n#\n learner False positive rate True positive rate Cutoff\n#\n 1 prediction 0 0.00000000 1.0147059\n#\n 2 prediction 0 0.02702703 1.0000000\n#\n 3 prediction 0 0.05405405 0.9999999\n#\n 4 prediction 0 0.08108108 0.9999999\n#\n 5 prediction 0 0.10810811 0.9999999\n\n\n\n\ngenerateROCRCurvesData\n returns an object of class \"ROCRCurvesData\" which contains the results from \nROCR\n's \nperformance\n method (which depends on arguments \nmeas1\n and \nmeas2\n). The data can be extracted by accessing the data element of the object. The object also contains information about the input arguments to \ngenerateROCRCurvesData\n which may be useful.\n\n\nPer default, \nplotROCRCurves\n draws a ROC curve and optionally adds a diagonal line that represents\nthe performance of a random classifier.\n\n\ndroc = generateROCRCurvesData(pred1)\nplotROCRCurves(droc, diagonal = TRUE)\n\n\n\n\n \nThere is also an experimental plotting function \nplotROCRCurvesGGVIS\n which uses \nggvis\n to create similar\nfigures with the addition of (optional) interactive tooltips (displayed on hover) that display the threshold\nat that point in the curve.\n\n\nplotROCRCurvesGGVIS(droc, cutoffs = TRUE)\n\n\n\n\nThe corresponding area under curve (\nauc\n) can be calculated as usual by calling\n\nperformance\n.\n\n\nperformance(pred1, auc)\n#\n auc \n#\n 0.847973\n\n\n\n\nIn addition to \nlinear discriminant analysis\n we try a support vector machine\nwith RBF kernel (\nksvm\n).\n\n\nlrn2 = makeLearner(\nclassif.ksvm\n, predict.type = \nprob\n)\nmod2 = train(lrn2, sonar.task, subset = train.set)\npred2 = predict(mod2, task = sonar.task, subset = test.set)\n\n\n\n\nIn order to compare the performance of the two learners you might want to display the two\ncorresponding ROC curves in one plot.\nFor this purpose just pass a named \nlist\n of \nPrediction\ns to \nplotROCRCurves\n.\n\n\nplotROCRCurves(generateROCRCurvesData(list(lda = pred1, ksvm = pred2)))\n\n\n\n\n \n\n\nIt's clear from the plot above that \nksvm\n has a slightly higher AUC than\n\nlda\n.\n\n\nperformance(pred2, auc)\n#\n auc \n#\n 0.9214527\n\n\n\n\nIt is easily possible to generate other performance plots by passing the appropriate performance\nmeasures to \nplotROCRCurves\n.\nNote that arguments \nmeas1\n and \nmeas2\n do not refer to \nmlr\n's performance measures,\nbut to measures provided by \nROCR\n and listed \nhere\n.\nBelow is code for a \nlift chart\n which shows the lift value (\n\"lift\"\n) versus the rate of\npositive predictions (\n\"rpp\"\n).\n\n\nout = 
generateROCRCurvesData(list(lda = pred1, ksvm = pred2), meas1 = \nlift\n, meas2 = \nrpp\n)\nplotROCRCurves(out)\n\n\n\n\n \n\n\nA plot of a single performance measure (accuracy in the example code below) versus the\nthreshold can be generated by setting \nmeas2 = \"cutoff\"\n.\n\n\nout = generateROCRCurvesData(list(lda = pred1, ksvm = pred2), meas1 = \nacc\n, meas2 = \ncutoff\n)\nplotROCRCurves(out)\n\n\n\n\n \n\n\nAs you may recall, an alternative function for plotting performance values versus the decision\nthreshold is \nplotThreshVsPerf\n.\nWhile \nplotThreshVsPerf\n permits to plot several performance measures at once,\n\nplotROCRCurves\n makes it easy to superpose the performance curves of multiple learners.\n\n\nExample 2: Benchmark experiment\n\n\nThe analysis in the example above can be improved in several regards.\nWe only considered the training performance and, ideally, the support vector machine should\nhave been \ntuned\n.\nMoreover, we wrote individual code for training/prediction of each learner, which can become\ntedious very quickly.\nA more practical way to apply several learners to a \nTask\n and compare their performance is\nprovided by function \nbenchmark\n (see also \nBenchmark Experiments\n).\n\n\nWe again consider the \nSonar\n data set and apply \nlda\n\nas well as \nksvm\n.\nWe first generate a \ntuning wrapper\n for \nksvm\n.\nThe cost parameter is tuned on a (for demonstration purposes small) parameter grid.\nWe assume that we are interested in a good performance over the complete threshold range\nand therefore tune with regard to the \nauc\n.\nThe error rate (\nmmce\n) for threshold 0.5 is reported as well.\n\n\n## Tune wrapper for ksvm\nrdesc.inner = makeResampleDesc(\nHoldout\n)\nms = list(auc, mmce)\nps = makeParamSet(\n makeDiscreteParam(\nC\n, 2^(-1:1))\n)\nctrl = makeTuneControlGrid()\nlrn2 = makeTuneWrapper(lrn2, rdesc.inner, ms, ps, ctrl, show.info = FALSE)\n\n\n\n\nBelow the actual benchmark experiment is conducted.\nAs resampling strategy we use 5-fold cross-validation and again calculate the \nauc\n\nas well as the error rate (for a threshold/cutoff value of 0.5).\n\n\n## Benchmark experiment\nlrns = list(lrn1, lrn2)\nrdesc.outer = makeResampleDesc(\nCV\n, iters = 5)\n\nres = benchmark(lrns, tasks = sonar.task, resampling = rdesc.outer, measures = ms, show.info = FALSE)\nres\n#\n task.id learner.id auc.test.mean mmce.test.mean\n#\n 1 Sonar-example classif.lda 0.7835442 0.2592334\n#\n 2 Sonar-example classif.ksvm.tuned 0.9454418 0.1390244\n\n\n\n\nCalling \nplotROCRCurves\n on the \nresult\n of the benchmark experiment\nproduces a plot with ROC curves for all learners in the experiment.\n\n\nplotROCRCurves(generateROCRCurvesData(res))\n\n\n\n\n \n\n\nPer default, threshold-averaged ROC curves are shown.\nSince we used 5-fold cross-validation we have predictions on 5 test data sets and therefore\n5 ROC curves per classifier.\nFor each threshold value the means of the corresponding 5 false and true positive rates are\ncalculated and plotted against each other.\n\n\nIf you want to plot the individual ROC curves for each resample iteration set \navg = \"none\"\n.\nOther available options are \navg = \"horizontal\"\n and \navg = \"vertical\"\n.\n\n\nplotROCRCurves(generateROCRCurvesData(res, avg = \nnone\n))\n\n\n\n\n \n\n\nAn alternative to averaging is to just merge the 5 test folds and draw a single ROC curve.\nMerging can be achieved by manually changing the \nclass\n attribute of\nthe prediction objects from \nResamplePrediction\n to 
\nPrediction\n.\n\n\nBelow the predictions are extracted from the \nBenchmarkResult\n via function \ngetBMRPredictions\n,\nthe \nclass\n is changed and the ROC curves are created.\n\n\nAveraging methods are normally preferred\n(cp. \nFawcett, 2006\n),\nas they make it possible to assess the variability, which is needed to properly compare classifier\nperformance.\n\n\n## Extract predictions\npreds = getBMRPredictions(res)[[1]]\n\n## Change the class attribute\npreds2 = lapply(preds, function(x) {class(x) = \nPrediction\n; return(x)})\n\n## Draw ROC curves\nplotROCRCurves(generateROCRCurvesData(preds2, avg = \nnone\n))\n\n\n\n\n \n\n\nAgain, you can easily create other standard evaluation plots by calling \nplotROCRCurves\n\non the \nBenchmarkResult\n with the appropriate performance measures (see \nROCR::performance\n).\n\n\nPerformance plots with asROCRPrediction\n\n\nDrawing performance plots with package \nROCR\n works through three basic commands:\n\n\n\n\nROCR::prediction\n: Create a \nROCR\n \nprediction\n object.\n\n\nROCR::performance\n: Calculate one or more performance measures for the\n given prediction object.\n\n\nROCR::plot\n: Plot the performance object.\n\n\n\n\nmlr\n's function \nasROCRPrediction\n converts an \nmlr\n \nPrediction\n object to\na \nROCR\n \nprediction\n object.\nIn order to create performance plots, steps 2 and 3 have to be run by the user.\n\n\nThis is obviously less convenient than calling \nplotROCRCurves\n (which extracts predictions,\ncalls \nasROCRPrediction\n and executes steps 2 and 3 internally).\nOn the other hand, this way provides more control over the generated plots by, e.g., using graphical\nparameters that are not (yet) accessible via \nplotROCRCurves\n.\nMoreover, you can directly benefit from any enhancements in \nROCR\n, use your own\n\nROCR\n-based code or other packages that depend on \nROCR\n, and use \nROCR\n's \nplot\n.\nFor more details see the \nROCR\n documentation and \ndemo(ROCR)\n.\n\n\nAn additional alternative is to call \nplotROCRCurves\n, extract the data from the \nggplot2\n object using the \ndata\n element of the object, e.g., \nobj$data\n, and then plot the data using whatever method you prefer.\n\n\nExample 1: Single predictions (continued)\n\n\nWe go back to our first example where we trained and predicted \nlda\n on the\n\nsonar classification task\n.\n\n\nn = getTaskSize(sonar.task)\ntrain.set = sample(n, size = round(2/3 * n))\ntest.set = setdiff(seq_len(n), train.set)\n\n## Train and predict linear discriminant analysis\nlrn1 = makeLearner(\nclassif.lda\n, predict.type = \nprob\n)\nmod1 = train(lrn1, sonar.task, subset = train.set)\npred1 = predict(mod1, task = sonar.task, subset = test.set)\n\n\n\n\nBelow we use \nasROCRPrediction\n to convert the lda prediction, let \nROCR\n calculate the\ntrue and false positive rate and plot the ROC curve.\n\n\n## Convert prediction\nROCRpred1 = asROCRPrediction(pred1)\n\n## Calculate true and false positive rate\nROCRperf1 = ROCR::performance(ROCRpred1, \ntpr\n, \nfpr\n)\n\n## Draw ROC curve\nROCR::plot(ROCRperf1)\n\n\n\n\n \n\n\nBelow is the same ROC curve, but we make use of some more graphical parameters:\nThe ROC curve is color-coded by the threshold and selected threshold values are printed on\nthe curve. 
Additionally, the convex hull (black broken line) of the ROC curve is drawn.\n\n\n## Draw ROC curve\nROCR::plot(ROCRperf1, colorize = TRUE, print.cutoffs.at = seq(0.1, 0.9, 0.1), lwd = 2)\n\n## Draw convex hull of ROC curve\nch = ROCR::performance(ROCRpred1, \nrch\n)\nROCR::plot(ch, add = TRUE, lty = 2)\n\n\n\n\n \n\n\nExample 2: Benchmark experiments (continued)\n\n\nWe again consider the benchmark experiment conducted earlier.\nWe first extract the predictions by \ngetBMRPredictions\n and then convert them via function\n\nasROCRPrediction\n.\n\n\n## Extract predictions\npreds = getBMRPredictions(res)[[1]]\n\n## Convert predictions\nROCRpreds = lapply(preds, asROCRPrediction)\n\n## Calculate true and false positive rate\nROCRperfs = lapply(ROCRpreds, function(x) ROCR::performance(x, \ntpr\n, \nfpr\n))\n\n\n\n\nWe draw the horizontally averaged ROC curves (solid lines) as well as the ROC curves for\nthe individual resampling iterations (broken lines).\nMoreover, standard error bars are plotted for selected true positive rates (0.1, 0.2, ..., 0.9).\nSee \nROCR\n's \nplot\n function for details.\n\n\n## lda average ROC curve\nplot(ROCRperfs[[1]], col = \nblue\n, avg = \nhorizontal\n, spread.estimate = \nstderror\n,\n show.spread.at = seq(0.1, 0.9, 0.1), plotCI.col = \nblue\n, plotCI.lwd = 2, lwd = 2)\n## lda individual ROC curves\nplot(ROCRperfs[[1]], col = \nblue\n, lty = 2, lwd = 0.25, add = TRUE)\n\n## ksvm average ROC curve\nplot(ROCRperfs[[2]], col = \nred\n, avg = \nhorizontal\n, spread.estimate = \nstderror\n,\n show.spread.at = seq(0.4, 0.9, 0.1), plotCI.col = \nred\n, plotCI.lwd = 2, lwd = 2, add = TRUE)\n## ksvm individual ROC curves\nplot(ROCRperfs[[2]], col = \nred\n, lty = 2, lwd = 0.25, add = TRUE)\n\nlegend(\nbottomright\n, legend = getBMRLearnerIds(res), lty = 1, lwd = 2, col = c(\nblue\n, \nred\n))\n\n\n\n\n \n\n\nIn order to create other evaluation plots like \nprecision/recall graphs\n you just have to change\nthe performance measures when calling \nROCR::performance\n.\n\n\n## Extract and convert predictions\npreds = getBMRPredictions(res)[[1]]\nROCRpreds = lapply(preds, asROCRPrediction)\n\n## Calculate precision and recall\nROCRperfs = lapply(ROCRpreds, function(x) ROCR::performance(x, \nprec\n, \nrec\n))\n\n## Draw performance plot\nplot(ROCRperfs[[1]], col = \nblue\n, avg = \nthreshold\n)\nplot(ROCRperfs[[2]], col = \nred\n, avg = \nthreshold\n, add = TRUE)\nlegend(\nbottomleft\n, legend = getBMRLearnerIds(res), lty = 1, col = c(\nblue\n, \nred\n))\n\n\n\n\n \n\n\nIf you want to plot a performance measure versus the threshold, specify only one measure when\ncalling \nROCR::performance\n.\nBelow the average accuracy over the 5 cross-validation iterations is plotted against the\nthreshold. Moreover, boxplots for certain threshold values (0.1, 0.2, ..., 0.9) are drawn.\n\n\n## Extract and convert predictions\npreds = getBMRPredictions(res)[[1]]\nROCRpreds = lapply(preds, asROCRPrediction)\n\n## Calculate accuracy\nROCRperfs = lapply(ROCRpreds, function(x) ROCR::performance(x, \nacc\n))\n\n## Plot accuracy versus threshold\nplot(ROCRperfs[[1]], avg = \nvertical\n, spread.estimate = \nboxplot\n, lwd = 2, col = \nblue\n,\n show.spread.at = seq(0.1, 0.9, 0.1), ylim = c(0,1), xlab = \nThreshold\n)\n\n\n\n\n \n\n\nViper charts\n\n\nmlr\n also supports \nViperCharts\n for plotting ROC and other performance\ncurves. 
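Since the charts are rendered by the ViperCharts web service, plotViperCharts needs a working internet connection. A small defensive sketch, assuming (as the example below suggests) that the return value is the URL of the generated chart:

    ## Wrap the call so a script keeps running when the service is unreachable
    z = try(plotViperCharts(res, chart = "rocc", browse = FALSE), silent = TRUE)
    if (!inherits(z, "try-error")) z  ## assumed to hold the chart URL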
Like \nplotROCRCurves\n it has S3 methods for objects of class \nPrediction\n,\n\nResampleResult\n and \nBenchmarkResult\n.\nBelow plots for the benchmark experiment (Example 2) are generated.\n\n\nz = plotViperCharts(res, chart = \nrocc\n, browse = FALSE)\n\n\n\n\nYou can see the plot created this way \nhere\n.\nNote that besides ROC curves you get several other plots like lift charts or cost curves.\nFor details, see \nplotViperCharts\n.", "title": "ROC Analysis" }, { @@ -507,12 +507,12 @@ }, { "location": "/roc_analysis/index.html#viper-charts", - "text": "mlr also supports ViperCharts for plotting ROC and other performance\ncurves. Like plotROCRCurves it has S3 methods for objects of class Prediction , ResampleResult and BenchmarkResult .\nBelow plots for the benchmark experiment (Example 2) are generated. z = plotViperCharts(res, chart = rocc , browse = FALSE)\n# Error in function (type, msg, asError = TRUE) : couldn't connect to host Note that besides ROC curves you get several other plots like lift charts or cost curves.\nFor details, see plotViperCharts .", + "text": "mlr also supports ViperCharts for plotting ROC and other performance\ncurves. Like plotROCRCurves it has S3 methods for objects of class Prediction , ResampleResult and BenchmarkResult .\nBelow plots for the benchmark experiment (Example 2) are generated. z = plotViperCharts(res, chart = rocc , browse = FALSE) You can see the plot created this way here .\nNote that besides ROC curves you get several other plots like lift charts or cost curves.\nFor details, see plotViperCharts .", "title": "Viper charts" }, { "location": "/multilabel/index.html", - "text": "Multilabel Classification\n\n\nMultilabel classification is a classification problem where multiple target labels can be\nassigned to each observation instead of only one like in multiclass classification.\n\n\nTwo different approaches exist for multilabel classification. \nProblem transformation\nmethods\n try to transform the multilabel classification into binary or multiclass\nclassification problems. \nAlgorithm adaptation methods\n adapt multiclass algorithms\nso they can be applied directly to the problem.\n\n\nCreating a task\n\n\nThe first thing you have to do for multilabel classification in \nmlr\n is to\nget your data in the right format. You need a \ndata.frame\n which\nconsists of the features and a logical vector for each label which indicates if the\nlabel is present in the observation or not. After that you can create a\n\nMultilabelTask\n like a normal \nClassifTask\n. Instead of one\ntarget name you have to specify a vector of targets which correspond to the names of\nlogical variables in the \ndata.frame\n. 
In the following example\nwe get the yeast data frame from the already existing \nyeast.task\n, extract\nthe 14 label names and create the task again.\n\n\nyeast = getTaskData(yeast.task)\nlabels = colnames(yeast)[1:14]\nyeast.task = makeMultilabelTask(id = \nmulti\n, data = yeast, target = labels)\nyeast.task\n#\n Supervised task: multi\n#\n Type: multilabel\n#\n Target: label1,label2,label3,label4,label5,label6,label7,label8,label9,label10,label11,label12,label13,label14\n#\n Observations: 2417\n#\n Features:\n#\n numerics factors ordered \n#\n 103 0 0 \n#\n Missings: FALSE\n#\n Has weights: FALSE\n#\n Has blocking: FALSE\n#\n Classes: 14\n#\n label1 label2 label3 label4 label5 label6 label7 label8 label9 \n#\n 762 1038 983 862 722 597 428 480 178 \n#\n label10 label11 label12 label13 label14 \n#\n 253 289 1816 1799 34\n\n\n\n\nConstructing a learner\n\n\nMultilabel classification in \nmlr\n can currently be done in two ways:\n\n\n\n\n\n\nUse the binary relevance method.\nThis problem transformation method converts the multilabel problem to binary\nclassification problems for each\nlabel and applies a simple binary classificator on these. In \nmlr\n this can be done by\nconverting your binary learner to a wrapped binary relevance multilabel learner.\n\n\n\n\n\n\nApply directly an algorithm adaptation method which treats the whole\nproblem with a specific algorithm.\n\n\n\n\n\n\nBinary relevance method\n\n\nFor generating a wrapped multilabel learner first create a binary (or multiclass)\nclassification learner with \nmakeLearner\n. Afterwards apply the function\n\nmakeMultilabelBinaryRelevanceWrapper\n to the learner\nto convert it into a binary relevance learner.\n\n\nYou can also generate a binary relevance learner directly, as you can see in the example.\n\n\nmultilabel.lrn = makeLearner(\nclassif.rpart\n, predict.type = \nprob\n)\nmultilabel.lrn = makeMultilabelBinaryRelevanceWrapper(multilabel.lrn)\nmultilabel.lrn\n#\n Learner multilabel.classif.rpart from package rpart\n#\n Type: multilabel\n#\n Name: ; Short name: \n#\n Class: MultilabelBinaryRelevanceWrapper\n#\n Properties: numerics,factors,ordered,missings,weights,prob,twoclass,multiclass\n#\n Predict-Type: prob\n#\n Hyperparameters: xval=0\n\nmultilabel.lrn1 = makeMultilabelBinaryRelevanceWrapper(\nclassif.rpart\n)\nmultilabel.lrn1\n#\n Learner multilabel.classif.rpart from package rpart\n#\n Type: multilabel\n#\n Name: ; Short name: \n#\n Class: MultilabelBinaryRelevanceWrapper\n#\n Properties: numerics,factors,ordered,missings,weights,prob,twoclass,multiclass\n#\n Predict-Type: response\n#\n Hyperparameters: xval=0\n\n\n\n\nAlgorithm adaptation method\n\n\nCurrently the only available algorithm adaptation method in \nR\n is the Random Ferns\nmultilabel algorithm in the \nrFerns\n package. You can create the learner for this algorithm\nlike in multiclass classification problems.\n\n\nmultilabel.lrn2 = makeLearner(\nmultilabel.rFerns\n)\nmultilabel.lrn2\n#\n Learner multilabel.rFerns from package rFerns\n#\n Type: multilabel\n#\n Name: Random ferns; Short name: rFerns\n#\n Class: multilabel.rFerns\n#\n Properties: numerics,factors,ordered\n#\n Predict-Type: response\n#\n Hyperparameters:\n\n\n\n\nTrain\n\n\nYou can \ntrain\n a model as usual with a multilabel learner and a\nmultilabel task as input. 
You can also pass \nsubset\n and \nweights\n arguments if the\nlearner supports this.\n\n\nmod = train(multilabel.lrn, yeast.task)\nmod = train(multilabel.lrn, yeast.task, subset = 1:1500, weights = rep(1/1500, 1500))\nmod\n#\n Model for learner.id=multilabel.classif.rpart; learner.class=MultilabelBinaryRelevanceWrapper\n#\n Trained on: task.id = multi; obs = 1500; features = 103\n#\n Hyperparameters: xval=0\n\nmod2 = train(multilabel.lrn2, yeast.task, subset = 1:100)\nmod2\n#\n Model for learner.id=multilabel.rFerns; learner.class=multilabel.rFerns\n#\n Trained on: task.id = multi; obs = 100; features = 103\n#\n Hyperparameters:\n\n\n\n\nPredict\n\n\nPrediction can be done as usual in \nmlr\n with \npredict\n and by passing a trained model\nand either the task to the \ntask\n argument or some new data to the \nnewdata\n\nargument. As always you can specify a \nsubset\n of the data\nwhich should be predicted.\n\n\npred = predict(mod, task = yeast.task, subset = 1:10)\npred = predict(mod, newdata = yeast[1501:1600,])\nnames(as.data.frame(pred))\n#\n [1] \ntruth.label1\n \ntruth.label2\n \ntruth.label3\n \n#\n [4] \ntruth.label4\n \ntruth.label5\n \ntruth.label6\n \n#\n [7] \ntruth.label7\n \ntruth.label8\n \ntruth.label9\n \n#\n [10] \ntruth.label10\n \ntruth.label11\n \ntruth.label12\n \n#\n [13] \ntruth.label13\n \ntruth.label14\n \nprob.label1\n \n#\n [16] \nprob.label2\n \nprob.label3\n \nprob.label4\n \n#\n [19] \nprob.label5\n \nprob.label6\n \nprob.label7\n \n#\n [22] \nprob.label8\n \nprob.label9\n \nprob.label10\n \n#\n [25] \nprob.label11\n \nprob.label12\n \nprob.label13\n \n#\n [28] \nprob.label14\n \nresponse.label1\n \nresponse.label2\n \n#\n [31] \nresponse.label3\n \nresponse.label4\n \nresponse.label5\n \n#\n [34] \nresponse.label6\n \nresponse.label7\n \nresponse.label8\n \n#\n [37] \nresponse.label9\n \nresponse.label10\n \nresponse.label11\n\n#\n [40] \nresponse.label12\n \nresponse.label13\n \nresponse.label14\n\n\npred2 = predict(mod2, task = yeast.task)\nnames(as.data.frame(pred2))\n#\n [1] \nid\n \ntruth.label1\n \ntruth.label2\n \n#\n [4] \ntruth.label3\n \ntruth.label4\n \ntruth.label5\n \n#\n [7] \ntruth.label6\n \ntruth.label7\n \ntruth.label8\n \n#\n [10] \ntruth.label9\n \ntruth.label10\n \ntruth.label11\n \n#\n [13] \ntruth.label12\n \ntruth.label13\n \ntruth.label14\n \n#\n [16] \nresponse.label1\n \nresponse.label2\n \nresponse.label3\n \n#\n [19] \nresponse.label4\n \nresponse.label5\n \nresponse.label6\n \n#\n [22] \nresponse.label7\n \nresponse.label8\n \nresponse.label9\n \n#\n [25] \nresponse.label10\n \nresponse.label11\n \nresponse.label12\n\n#\n [28] \nresponse.label13\n \nresponse.label14\n\n\n\n\n\nDepending on the chosen \npredict.type\n of the learner you get true and predicted values and\npossibly probabilities for each class label.\nThese can be extracted by the usual accessor functions \ngetPredictionTruth\n, \ngetPredictionResponse\n\nand \ngetPredictionProbabilities\n.\n\n\nPerformance\n\n\nThe performance of your prediction can be assessed via function \nperformance\n.\nYou can specify via the \nmeasures\n argument which \nmeasure(s)\n to calculate.\nThe default measure for multilabel classification is the Hamming loss (\nhamloss\n).\nAll available measures for multilabel classification can be shown by \nlistMeasures\n.\n\n\nperformance(pred)\n#\n hamloss \n#\n 0.2257143\n\nperformance(pred2, measures = list(hamloss, timepredict))\n#\n hamloss timepredict \n#\n 0.6946924 0.0830000\n\nlistMeasures(\nmultilabel\n)\n#\n [1] \ntimepredict\n 
\nfeatperc\n \ntimeboth\n \ntimetrain\n \nhamloss\n\n\n\n\n\nResampling\n\n\nFor evaluating the overall performance of the learning algorithm you can do some\n\nresampling\n. As usual you have to define a resampling strategy, either\nvia \nmakeResampleDesc\n or \nmakeResampleInstance\n. After that you can run the \nresample\n\nfunction. Below the default measure Hamming loss is calculated.\n\n\nrdesc = makeResampleDesc(method = \nCV\n, stratify = FALSE, iters = 3)\nr = resample(learner = multilabel.lrn, task = yeast.task, resampling = rdesc, show.info = FALSE)\nr\n#\n Resample Result\n#\n Task: multi\n#\n Learner: multilabel.classif.rpart\n#\n hamloss.aggr: 0.23\n#\n hamloss.mean: 0.23\n#\n hamloss.sd: 0.01\n#\n Runtime: 9.11298\n\nr = resample(learner = multilabel.lrn2, task = yeast.task, resampling = rdesc, show.info = FALSE)\nr\n#\n Resample Result\n#\n Task: multi\n#\n Learner: multilabel.rFerns\n#\n hamloss.aggr: 0.47\n#\n hamloss.mean: 0.47\n#\n hamloss.sd: 0.01\n#\n Runtime: 0.827824\n\n\n\n\nBinary performance\n\n\nIf you want to calculate a binary\nperformance measure like, e.g., the \naccuracy\n, the \nmmce\n\nor the \nauc\n for each label, you can use function\n\ngetMultilabelBinaryPerformances\n.\nYou can apply this function to any multilabel prediction, e.g., also on the resample\nmultilabel prediction. For calculating the \nauc\n you need\npredicted probabilities.\n\n\ngetMultilabelBinaryPerformances(pred, measures = list(acc, mmce, auc))\n#\n acc.test.mean mmce.test.mean auc.test.mean\n#\n label1 0.75 0.25 0.6321925\n#\n label2 0.64 0.36 0.6547917\n#\n label3 0.68 0.32 0.7118227\n#\n label4 0.69 0.31 0.6764835\n#\n label5 0.73 0.27 0.6676923\n#\n label6 0.70 0.30 0.6417739\n#\n label7 0.81 0.19 0.5968750\n#\n label8 0.73 0.27 0.5164474\n#\n label9 0.89 0.11 0.4688458\n#\n label10 0.86 0.14 0.3996463\n#\n label11 0.85 0.15 0.5000000\n#\n label12 0.76 0.24 0.5330667\n#\n label13 0.75 0.25 0.5938610\n#\n label14 1.00 0.00 NA\n\ngetMultilabelBinaryPerformances(r$pred, measures = list(acc, mmce))\n#\n acc.test.mean mmce.test.mean\n#\n label1 0.69590401 0.3040960\n#\n label2 0.59371121 0.4062888\n#\n label3 0.70417873 0.2958213\n#\n label4 0.71328093 0.2867191\n#\n label5 0.71617708 0.2838229\n#\n label6 0.59577989 0.4042201\n#\n label7 0.55895739 0.4410426\n#\n label8 0.54447662 0.4555234\n#\n label9 0.33140257 0.6685974\n#\n label10 0.46586678 0.5341332\n#\n label11 0.47662391 0.5233761\n#\n label12 0.52172114 0.4782789\n#\n label13 0.52792718 0.4720728\n#\n label14 0.01406703 0.9859330", + "text": "Multilabel Classification\n\n\nMultilabel classification is a classification problem where multiple target labels can be\nassigned to each observation instead of only one like in multiclass classification.\n\n\nTwo different approaches exist for multilabel classification. \nProblem transformation\nmethods\n try to transform the multilabel classification into binary or multiclass\nclassification problems. \nAlgorithm adaptation methods\n adapt multiclass algorithms\nso they can be applied directly to the problem.\n\n\nCreating a task\n\n\nThe first thing you have to do for multilabel classification in \nmlr\n is to\nget your data in the right format. You need a \ndata.frame\n which\nconsists of the features and a logical vector for each label which indicates if the\nlabel is present in the observation or not. After that you can create a\n\nMultilabelTask\n like a normal \nClassifTask\n. 
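If the label columns in your data.frame are stored as 0/1 numbers instead of logicals, they have to be converted first. A minimal sketch with made-up data (the object and column names are hypothetical):

    ## Hypothetical toy data with 0/1 label columns and two features
    d = data.frame(label1 = c(1, 0, 1), label2 = c(0, 1, 1), x1 = rnorm(3), x2 = rnorm(3))
    labels = c("label1", "label2")
    d[labels] = lapply(d[labels], as.logical)  ## label columns must be logical
    ## afterwards the task can be created as usual:
    ## task = makeMultilabelTask(id = "toy", data = d, target = labels)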
Instead of one\ntarget name you have to specify a vector of targets which correspond to the names of\nlogical variables in the \ndata.frame\n. In the following example\nwe get the yeast data frame from the already existing \nyeast.task\n, extract\nthe 14 label names and create the task again.\n\n\nyeast = getTaskData(yeast.task)\nlabels = colnames(yeast)[1:14]\nyeast.task = makeMultilabelTask(id = \nmulti\n, data = yeast, target = labels)\nyeast.task\n#\n Supervised task: multi\n#\n Type: multilabel\n#\n Target: label1,label2,label3,label4,label5,label6,label7,label8,label9,label10,label11,label12,label13,label14\n#\n Observations: 2417\n#\n Features:\n#\n numerics factors ordered \n#\n 103 0 0 \n#\n Missings: FALSE\n#\n Has weights: FALSE\n#\n Has blocking: FALSE\n#\n Classes: 14\n#\n label1 label2 label3 label4 label5 label6 label7 label8 label9 \n#\n 762 1038 983 862 722 597 428 480 178 \n#\n label10 label11 label12 label13 label14 \n#\n 253 289 1816 1799 34\n\n\n\n\nConstructing a learner\n\n\nMultilabel classification in \nmlr\n can currently be done in two ways:\n\n\n\n\n\n\nUse the binary relevance method.\nThis problem transformation method converts the multilabel problem to binary\nclassification problems for each\nlabel and applies a simple binary classificator on these. In \nmlr\n this can be done by\nconverting your binary learner to a wrapped binary relevance multilabel learner.\n\n\n\n\n\n\nApply directly an algorithm adaptation method which treats the whole\nproblem with a specific algorithm.\n\n\n\n\n\n\nBinary relevance method\n\n\nFor generating a wrapped multilabel learner first create a binary (or multiclass)\nclassification learner with \nmakeLearner\n. Afterwards apply the function\n\nmakeMultilabelBinaryRelevanceWrapper\n to the learner\nto convert it into a binary relevance learner.\n\n\nYou can also generate a binary relevance learner directly, as you can see in the example.\n\n\nmultilabel.lrn = makeLearner(\nclassif.rpart\n, predict.type = \nprob\n)\nmultilabel.lrn = makeMultilabelBinaryRelevanceWrapper(multilabel.lrn)\nmultilabel.lrn\n#\n Learner multilabel.classif.rpart from package rpart\n#\n Type: multilabel\n#\n Name: ; Short name: \n#\n Class: MultilabelBinaryRelevanceWrapper\n#\n Properties: numerics,factors,ordered,missings,weights,prob,twoclass,multiclass\n#\n Predict-Type: prob\n#\n Hyperparameters: xval=0\n\nmultilabel.lrn1 = makeMultilabelBinaryRelevanceWrapper(\nclassif.rpart\n)\nmultilabel.lrn1\n#\n Learner multilabel.classif.rpart from package rpart\n#\n Type: multilabel\n#\n Name: ; Short name: \n#\n Class: MultilabelBinaryRelevanceWrapper\n#\n Properties: numerics,factors,ordered,missings,weights,prob,twoclass,multiclass\n#\n Predict-Type: response\n#\n Hyperparameters: xval=0\n\n\n\n\nAlgorithm adaptation method\n\n\nCurrently the only available algorithm adaptation method in \nR\n is the Random Ferns\nmultilabel algorithm in the \nrFerns\n package. You can create the learner for this algorithm\nlike in multiclass classification problems.\n\n\nmultilabel.lrn2 = makeLearner(\nmultilabel.rFerns\n)\nmultilabel.lrn2\n#\n Learner multilabel.rFerns from package rFerns\n#\n Type: multilabel\n#\n Name: Random ferns; Short name: rFerns\n#\n Class: multilabel.rFerns\n#\n Properties: numerics,factors,ordered\n#\n Predict-Type: response\n#\n Hyperparameters:\n\n\n\n\nTrain\n\n\nYou can \ntrain\n a model as usual with a multilabel learner and a\nmultilabel task as input. 
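Conceptually, training the binary relevance learner defined above fits one binary classifier per label. The following is only an illustrative sketch of that idea, not mlr's actual implementation; it uses rpart directly on the yeast data and label names from above:

    ## One binary rpart model per label (illustration of the binary relevance idea)
    yeast = getTaskData(yeast.task)
    labels = colnames(yeast)[1:14]
    feats = setdiff(colnames(yeast), labels)
    br.models = lapply(labels, function(lab) {
      d = yeast[, c(lab, feats)]
      d[[lab]] = as.factor(d[[lab]])  ## logical target -> factor for classification
      rpart::rpart(as.formula(paste(lab, "~ .")), data = d)
    })
    names(br.models) = labels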
You can also pass \nsubset\n and \nweights\n arguments if the\nlearner supports this.\n\n\nmod = train(multilabel.lrn, yeast.task)\nmod = train(multilabel.lrn, yeast.task, subset = 1:1500, weights = rep(1/1500, 1500))\nmod\n#\n Model for learner.id=multilabel.classif.rpart; learner.class=MultilabelBinaryRelevanceWrapper\n#\n Trained on: task.id = multi; obs = 1500; features = 103\n#\n Hyperparameters: xval=0\n\nmod2 = train(multilabel.lrn2, yeast.task, subset = 1:100)\nmod2\n#\n Model for learner.id=multilabel.rFerns; learner.class=multilabel.rFerns\n#\n Trained on: task.id = multi; obs = 100; features = 103\n#\n Hyperparameters:\n\n\n\n\nPredict\n\n\nPrediction can be done as usual in \nmlr\n with \npredict\n and by passing a trained model\nand either the task to the \ntask\n argument or some new data to the \nnewdata\n\nargument. As always you can specify a \nsubset\n of the data\nwhich should be predicted.\n\n\npred = predict(mod, task = yeast.task, subset = 1:10)\npred = predict(mod, newdata = yeast[1501:1600,])\nnames(as.data.frame(pred))\n#\n [1] \ntruth.label1\n \ntruth.label2\n \ntruth.label3\n \n#\n [4] \ntruth.label4\n \ntruth.label5\n \ntruth.label6\n \n#\n [7] \ntruth.label7\n \ntruth.label8\n \ntruth.label9\n \n#\n [10] \ntruth.label10\n \ntruth.label11\n \ntruth.label12\n \n#\n [13] \ntruth.label13\n \ntruth.label14\n \nprob.label1\n \n#\n [16] \nprob.label2\n \nprob.label3\n \nprob.label4\n \n#\n [19] \nprob.label5\n \nprob.label6\n \nprob.label7\n \n#\n [22] \nprob.label8\n \nprob.label9\n \nprob.label10\n \n#\n [25] \nprob.label11\n \nprob.label12\n \nprob.label13\n \n#\n [28] \nprob.label14\n \nresponse.label1\n \nresponse.label2\n \n#\n [31] \nresponse.label3\n \nresponse.label4\n \nresponse.label5\n \n#\n [34] \nresponse.label6\n \nresponse.label7\n \nresponse.label8\n \n#\n [37] \nresponse.label9\n \nresponse.label10\n \nresponse.label11\n\n#\n [40] \nresponse.label12\n \nresponse.label13\n \nresponse.label14\n\n\npred2 = predict(mod2, task = yeast.task)\nnames(as.data.frame(pred2))\n#\n [1] \nid\n \ntruth.label1\n \ntruth.label2\n \n#\n [4] \ntruth.label3\n \ntruth.label4\n \ntruth.label5\n \n#\n [7] \ntruth.label6\n \ntruth.label7\n \ntruth.label8\n \n#\n [10] \ntruth.label9\n \ntruth.label10\n \ntruth.label11\n \n#\n [13] \ntruth.label12\n \ntruth.label13\n \ntruth.label14\n \n#\n [16] \nresponse.label1\n \nresponse.label2\n \nresponse.label3\n \n#\n [19] \nresponse.label4\n \nresponse.label5\n \nresponse.label6\n \n#\n [22] \nresponse.label7\n \nresponse.label8\n \nresponse.label9\n \n#\n [25] \nresponse.label10\n \nresponse.label11\n \nresponse.label12\n\n#\n [28] \nresponse.label13\n \nresponse.label14\n\n\n\n\n\nDepending on the chosen \npredict.type\n of the learner you get true and predicted values and\npossibly probabilities for each class label.\nThese can be extracted by the usual accessor functions \ngetPredictionTruth\n, \ngetPredictionResponse\n\nand \ngetPredictionProbabilities\n.\n\n\nPerformance\n\n\nThe performance of your prediction can be assessed via function \nperformance\n.\nYou can specify via the \nmeasures\n argument which \nmeasure(s)\n to calculate.\nThe default measure for multilabel classification is the Hamming loss (\nhamloss\n).\nAll available measures for multilabel classification can be shown by \nlistMeasures\n.\n\n\nperformance(pred)\n#\n hamloss \n#\n 0.2257143\n\nperformance(pred2, measures = list(hamloss, timepredict))\n#\n hamloss timepredict \n#\n 0.6946924 0.0860000\n\nlistMeasures(\nmultilabel\n)\n#\n [1] \ntimepredict\n 
\nfeatperc\n \ntimeboth\n \ntimetrain\n \nhamloss\n\n\n\n\n\nResampling\n\n\nFor evaluating the overall performance of the learning algorithm you can do some\n\nresampling\n. As usual you have to define a resampling strategy, either\nvia \nmakeResampleDesc\n or \nmakeResampleInstance\n. After that you can run the \nresample\n\nfunction. Below the default measure Hamming loss is calculated.\n\n\nrdesc = makeResampleDesc(method = \nCV\n, stratify = FALSE, iters = 3)\nr = resample(learner = multilabel.lrn, task = yeast.task, resampling = rdesc, show.info = FALSE)\nr\n#\n Resample Result\n#\n Task: multi\n#\n Learner: multilabel.classif.rpart\n#\n hamloss.aggr: 0.23\n#\n hamloss.mean: 0.23\n#\n hamloss.sd: 0.01\n#\n Runtime: 9.4152\n\nr = resample(learner = multilabel.lrn2, task = yeast.task, resampling = rdesc, show.info = FALSE)\nr\n#\n Resample Result\n#\n Task: multi\n#\n Learner: multilabel.rFerns\n#\n hamloss.aggr: 0.47\n#\n hamloss.mean: 0.47\n#\n hamloss.sd: 0.01\n#\n Runtime: 0.829432\n\n\n\n\nBinary performance\n\n\nIf you want to calculate a binary\nperformance measure like, e.g., the \naccuracy\n, the \nmmce\n\nor the \nauc\n for each label, you can use function\n\ngetMultilabelBinaryPerformances\n.\nYou can apply this function to any multilabel prediction, e.g., also on the resample\nmultilabel prediction. For calculating the \nauc\n you need\npredicted probabilities.\n\n\ngetMultilabelBinaryPerformances(pred, measures = list(acc, mmce, auc))\n#\n acc.test.mean mmce.test.mean auc.test.mean\n#\n label1 0.75 0.25 0.6321925\n#\n label2 0.64 0.36 0.6547917\n#\n label3 0.68 0.32 0.7118227\n#\n label4 0.69 0.31 0.6764835\n#\n label5 0.73 0.27 0.6676923\n#\n label6 0.70 0.30 0.6417739\n#\n label7 0.81 0.19 0.5968750\n#\n label8 0.73 0.27 0.5164474\n#\n label9 0.89 0.11 0.4688458\n#\n label10 0.86 0.14 0.3996463\n#\n label11 0.85 0.15 0.5000000\n#\n label12 0.76 0.24 0.5330667\n#\n label13 0.75 0.25 0.5938610\n#\n label14 1.00 0.00 NA\n\ngetMultilabelBinaryPerformances(r$pred, measures = list(acc, mmce))\n#\n acc.test.mean mmce.test.mean\n#\n label1 0.69590401 0.3040960\n#\n label2 0.59371121 0.4062888\n#\n label3 0.70417873 0.2958213\n#\n label4 0.71328093 0.2867191\n#\n label5 0.71617708 0.2838229\n#\n label6 0.59577989 0.4042201\n#\n label7 0.55895739 0.4410426\n#\n label8 0.54447662 0.4555234\n#\n label9 0.33140257 0.6685974\n#\n label10 0.46586678 0.5341332\n#\n label11 0.47662391 0.5233761\n#\n label12 0.52172114 0.4782789\n#\n label13 0.52792718 0.4720728\n#\n label14 0.01406703 0.9859330", "title": "Multilabel Classification" }, { @@ -542,12 +542,12 @@ }, { "location": "/multilabel/index.html#performance", - "text": "The performance of your prediction can be assessed via function performance .\nYou can specify via the measures argument which measure(s) to calculate.\nThe default measure for multilabel classification is the Hamming loss ( hamloss ).\nAll available measures for multilabel classification can be shown by listMeasures . 
performance(pred)\n# hamloss \n# 0.2257143\n\nperformance(pred2, measures = list(hamloss, timepredict))\n# hamloss timepredict \n# 0.6946924 0.0830000\n\nlistMeasures( multilabel )\n# [1] timepredict featperc timeboth timetrain hamloss", + "text": "The performance of your prediction can be assessed via function performance .\nYou can specify via the measures argument which measure(s) to calculate.\nThe default measure for multilabel classification is the Hamming loss ( hamloss ).\nAll available measures for multilabel classification can be shown by listMeasures . performance(pred)\n# hamloss \n# 0.2257143\n\nperformance(pred2, measures = list(hamloss, timepredict))\n# hamloss timepredict \n# 0.6946924 0.0860000\n\nlistMeasures( multilabel )\n# [1] timepredict featperc timeboth timetrain hamloss", "title": "Performance" }, { "location": "/multilabel/index.html#resampling", - "text": "For evaluating the overall performance of the learning algorithm you can do some resampling . As usual you have to define a resampling strategy, either\nvia makeResampleDesc or makeResampleInstance . After that you can run the resample \nfunction. Below the default measure Hamming loss is calculated. rdesc = makeResampleDesc(method = CV , stratify = FALSE, iters = 3)\nr = resample(learner = multilabel.lrn, task = yeast.task, resampling = rdesc, show.info = FALSE)\nr\n# Resample Result\n# Task: multi\n# Learner: multilabel.classif.rpart\n# hamloss.aggr: 0.23\n# hamloss.mean: 0.23\n# hamloss.sd: 0.01\n# Runtime: 9.11298\n\nr = resample(learner = multilabel.lrn2, task = yeast.task, resampling = rdesc, show.info = FALSE)\nr\n# Resample Result\n# Task: multi\n# Learner: multilabel.rFerns\n# hamloss.aggr: 0.47\n# hamloss.mean: 0.47\n# hamloss.sd: 0.01\n# Runtime: 0.827824", + "text": "For evaluating the overall performance of the learning algorithm you can do some resampling . As usual you have to define a resampling strategy, either\nvia makeResampleDesc or makeResampleInstance . After that you can run the resample \nfunction. Below the default measure Hamming loss is calculated. rdesc = makeResampleDesc(method = CV , stratify = FALSE, iters = 3)\nr = resample(learner = multilabel.lrn, task = yeast.task, resampling = rdesc, show.info = FALSE)\nr\n# Resample Result\n# Task: multi\n# Learner: multilabel.classif.rpart\n# hamloss.aggr: 0.23\n# hamloss.mean: 0.23\n# hamloss.sd: 0.01\n# Runtime: 9.4152\n\nr = resample(learner = multilabel.lrn2, task = yeast.task, resampling = rdesc, show.info = FALSE)\nr\n# Resample Result\n# Task: multi\n# Learner: multilabel.rFerns\n# hamloss.aggr: 0.47\n# hamloss.mean: 0.47\n# hamloss.sd: 0.01\n# Runtime: 0.829432", "title": "Resampling" }, { @@ -602,7 +602,7 @@ }, { "location": "/create_learner/index.html", - "text": "Integrating another learner\n\n\nIn order to create a new learner in \nmlr\n, interface code to the \nR\n function\nmust be written. Three functions are required for each learner. First, you must\ndefine the learner itself with a name, description, capabilities, parameters,\nand a few other things. Second, you need to provide a function that calls the\nlearner function and builds the model given data. Finally, a prediction function\nthat returns predicted values given new data is needed.\n\n\nAll learners should inherit from \nrlearner.classif\n, \nrlearner.multilabel\n, \nrlearner.regr\n,\n\nrlearner.surv\n, \nrlearner.costsens\n, or \nrlearner.cluster\n. 
While it is\nalso possible to define a new type of learner that has special properties and\ndoes not fit into one of the existing schemes, this is much more advanced and\nnot covered here.\n\n\nClassification\n\n\nWe show how the \nLinear Discriminant Analysis\n from\npackage \nMASS\n has been integrated\ninto the classification learner \nclassif.lda\n in \nmlr\n as an example.\n\n\nDefinition of the learner\n\n\nThe minimal information required to define a learner is the \nmlr\n name of the\nlearner, its package, the parameter set, and the set of properties of your\nlearner. In addition, you may provide a human-readable name, a short name and a\nnote with information relevant to users of the learner.\n\n\nFirst, name your learner. The naming conventions in \nmlr\n are\n\nclassif.\nR_method_name\n for classification, \nmultilabel.\nR_method_name\n \nfor multilabel classification, \nregr.\nR_method_name\n for\nregression, \nsurv.\nR_method_name\n for survival analysis,\n\ncostsens.\nR_method_name\n for cost sensitive learning, and\n\ncluster.\nR_method_name\n for clustering. So in this example, the name starts with\n\nclassif.\n and we choose \nclassif.lda\n.\n\n\nSecond, we need to define the parameters of the learner. These are any options\nthat can be set when running it to change how it learns, how input is\ninterpreted, how and what output is generated, and so on. \nmlr\n provides a\nnumber of functions to define parameters, a complete list can be found in the\ndocumentation of \nLearnerParam\n of the\n\nParamHelpers\n package.\n\n\nIn our example, we have discrete and numeric parameters, so we use\n\nmakeDiscreteLearnerParam\n and\n\nmakeNumericLearnerParam\n to incorporate the\ncomplete description of the parameters. We include all possible values for\ndiscrete parameters and lower and upper bounds for numeric parameters. Strictly\nspeaking it is not necessary to provide bounds for all parameters and if this\ninformation is not available they can be estimated, but providing accurate and\nspecific information here makes it possible to tune the learner much better (see\nthe section on \ntuning\n).\n\n\nNext, we add information on the properties of the learner (see also the section\non \nlearners\n). Which types of features are supported (numerics,\nfactors)? Are case weights supported? Are class weights supported? Can the method deal\nwith missing values in the features and deal with NA's in a meaningful way (not \nna.omit\n)?\nAre one-class, two-class, multi-class problems supported? Can the learner predict\nposterior probabilities?\n\n\nIf the learner supports class weights the name of the relevant learner parameter\ncan be specified via argument \nclass.weights.param\n.\n\n\nBelow is the complete code for the definition of the LDA learner. It has one\ndiscrete parameter, \nmethod\n, and two continuous ones, \nnu\n and \ntol\n. It\nsupports classification problems with two or more classes and can deal with\nnumeric and factor explanatory variables. 
It can predict posterior\nprobabilities.\n\n\nmakeRLearner.classif.lda = function() {\n makeRLearnerClassif(\n cl = \nclassif.lda\n,\n package = \nMASS\n,\n par.set = makeParamSet(\n makeDiscreteLearnerParam(id = \nmethod\n, default = \nmoment\n, values = c(\nmoment\n, \nmle\n, \nmve\n, \nt\n)),\n makeNumericLearnerParam(id = \nnu\n, lower = 2, requires = quote(method == \nt\n)),\n makeNumericLearnerParam(id = \ntol\n, default = 1e-4, lower = 0),\n makeDiscreteLearnerParam(id = \npredict.method\n, values = c(\nplug-in\n, \npredictive\n, \ndebiased\n),\n default = \nplug-in\n, when = \npredict\n),\n makeLogicalLearnerParam(id = \nCV\n, default = FALSE, tunable = FALSE)\n ),\n properties = c(\ntwoclass\n, \nmulticlass\n, \nnumerics\n, \nfactors\n, \nprob\n),\n name = \nLinear Discriminant Analysis\n,\n short.name = \nlda\n,\n note = \nLearner param 'predict.method' maps to 'method' in predict.lda.\n\n )\n}\n\n\n\n\nCreating the training function of the learner\n\n\nOnce the learner has been defined, we need to tell \nmlr\n how to call it to\ntrain a model. The name of the function has to start with \ntrainLearner.\n,\nfollowed by the \nmlr\n name of the learner as defined above (\nclassif.lda\n\nhere). The prototype of the function looks as follows.\n\n\nfunction(.learner, .task, .subset, .weights = NULL, ...) { }\n\n\n\n\nThis function must fit a model on the data of the task \n.task\n with regard to\nthe subset defined in the integer vector \n.subset\n and the parameters passed\nin the \n...\n arguments. Usually, the data should be extracted from the task\nusing \ngetTaskData\n. This will take care of any subsetting as well. It must\nreturn the fitted model. \nmlr\n assumes no special data type for the return\nvalue -- it will be passed to the predict function we are going to define below,\nso any special code the learner may need can be encapsulated there.\n\n\nFor our example, the definition of the function looks like this. In addition to\nthe data of the task, we also need the formula that describes what to predict.\nWe use the function \ngetTaskFormula\n to extract this from the task.\n\n\ntrainLearner.classif.lda = function(.learner, .task, .subset, .weights = NULL, ...) {\n f = getTaskFormula(.task)\n MASS::lda(f, data = getTaskData(.task, .subset), ...)\n}\n\n\n\n\nCreating the prediction method\n\n\nFinally, the prediction function needs to be defined. The name of this function\nstarts with \npredictLearner.\n, followed again by the \nmlr\n name of the\nlearner. The prototype of the function is as follows.\n\n\nfunction(.learner, .model, .newdata, ...) { }\n\n\n\n\nIt must predict for the new observations in the \ndata.frame\n \n.newdata\n with\nthe wrapped model \n.model\n, which is returned from the training function.\nThe actual model the learner built is stored in the \n$learner.model\n member\nand can be accessed simply through \n.model$learner.model\n.\n\n\nFor classification, you have to return a factor of predicted classes if\n\n.learner$predict.type\n is \n\"response\"\n, or a matrix of predicted\nprobabilities if \n.learner$predict.type\n is \n\"prob\"\n and this type of\nprediction is supported by the learner. In the latter case the matrix must have\nthe same number of columns as there are classes in the task and the columns have\nto be named by the class names.\n\n\nThe definition for LDA looks like this. 
It is pretty much just a straight\npass-through of the arguments to the \npredict\n function and some extraction of\nprediction data depending on the type of prediction requested.\n\n\npredictLearner.classif.lda = function(.learner, .model, .newdata, predict.method = \nplug-in\n, ...) {\n p = predict(.model$learner.model, newdata = .newdata, method = predict.method, ...)\n if (.learner$predict.type == \nresponse\n) \n return(p$class) else return(p$posterior)\n}\n\n\n\n\nRegression\n\n\nThe main difference for regression is that the type of predictions are different\n(numeric instead of labels or probabilities) and that not all of the properties\nare relevant. In particular, whether one-, two-, or multi-class problems and\nposterior probabilities are supported is not applicable.\n\n\nApart from this, everything explained above applies. Below is the definition for\nthe \nearth\n learner from the\n\nearth\n package.\n\n\nmakeRLearner.regr.earth = function() {\n makeRLearnerRegr(\n cl = \nregr.earth\n,\n package = \nearth\n,\n par.set = makeParamSet(\n makeLogicalLearnerParam(id = \nkeepxy\n, default = FALSE, tunable = FALSE),\n makeNumericLearnerParam(id = \ntrace\n, default = 0, upper = 10, tunable = FALSE),\n makeIntegerLearnerParam(id = \ndegree\n, default = 1L, lower = 1L),\n makeNumericLearnerParam(id = \npenalty\n),\n makeIntegerLearnerParam(id = \nnk\n, lower = 0L),\n makeNumericLearnerParam(id = \nthres\n, default = 0.001),\n makeIntegerLearnerParam(id = \nminspan\n, default = 0L),\n makeIntegerLearnerParam(id = \nendspan\n, default = 0L),\n makeNumericLearnerParam(id = \nnewvar.penalty\n, default = 0),\n makeIntegerLearnerParam(id = \nfast.k\n, default = 20L, lower = 0L),\n makeNumericLearnerParam(id = \nfast.beta\n, default = 1),\n makeDiscreteLearnerParam(id = \npmethod\n, default = \nbackward\n,\n values = c(\nbackward\n, \nnone\n, \nexhaustive\n, \nforward\n, \nseqrep\n, \ncv\n)),\n makeIntegerLearnerParam(id = \nnprune\n)\n ),\n properties = c(\nnumerics\n, \nfactors\n),\n name = \nMultivariate Adaptive Regression Splines\n,\n short.name = \nearth\n,\n note = \n\n )\n}\n\n\n\n\ntrainLearner.regr.earth = function(.learner, .task, .subset, .weights = NULL, ...) {\n f = getTaskFormula(.task)\n earth::earth(f, data = getTaskData(.task, .subset), ...)\n}\n\n\n\n\npredictLearner.regr.earth = function(.learner, .model, .newdata, ...) {\n predict(.model$learner.model, newdata = .newdata)[, 1L]\n}\n\n\n\n\nAgain most of the data is passed straight through to/from the train/predict\nfunctions of the learner.\n\n\nSurvival Analysis\n\n\nFor survival analysis, you have to return so-called linear predictors in order to compute\nthe default measure for this task type, the \ncindex\n (for\n\n.learner$predict.type\n == \n\"response\"\n). For \n.learner$predict.type\n == \n\"prob\"\n,\nthere is no substantially meaningful measure (yet). You may either ignore this case or return\nsomething like predicted survival curves (cf. 
example below).\n\n\nThere are three properties that are specific to survival learners:\n\"rcens\", \"lcens\" and \"icens\", defining the type(s) of censoring a learner can handle -- right,\nleft and/or interval censored.\n\n\nLet's have a look at how the \nCox Proportional Hazard Model\n from\npackage \nsurvival\n has been integrated\ninto the survival learner \nsurv.coxph\n in \nmlr\n as an example:\n\n\nmakeRLearner.surv.coxph = function() {\n makeRLearnerSurv(\n cl = \nsurv.coxph\n,\n package = \nsurvival\n,\n par.set = makeParamSet(\n makeDiscreteLearnerParam(id = \nties\n, default = \nefron\n, values = c(\nefron\n, \nbreslow\n, \nexact\n)),\n makeLogicalLearnerParam(id = \nsingular.ok\n, default = TRUE),\n makeNumericLearnerParam(id = \neps\n, default = 1e-09, lower = 0),\n makeNumericLearnerParam(id = \ntoler.chol\n, default = .Machine$double.eps^0.75, lower = 0),\n makeIntegerLearnerParam(id = \niter.max\n, default = 20L, lower = 1L),\n makeNumericLearnerParam(id = \ntoler.inf\n, default = sqrt(.Machine$double.eps^0.75), lower = 0),\n makeIntegerLearnerParam(id = \nouter.max\n, default = 10L, lower = 1L),\n makeLogicalLearnerParam(id = \nmodel\n, default = FALSE, tunable = FALSE),\n makeLogicalLearnerParam(id = \nx\n, default = FALSE, tunable = FALSE),\n makeLogicalLearnerParam(id = \ny\n, default = TRUE, tunable = FALSE)\n ),\n properties = c(\nmissings\n, \nnumerics\n, \nfactors\n, \nweights\n, \nprob\n, \nrcens\n),\n name = \nCox Proportional Hazard Model\n,\n short.name = \ncoxph\n,\n note = \n\n )\n}\n\n\n\n\ntrainLearner.surv.coxph = function(.learner, .task, .subset, .weights = NULL, ...) {\n f = getTaskFormula(.task)\n data = getTaskData(.task, subset = .subset)\n if (is.null(.weights)) {\n mod = survival::coxph(formula = f, data = data, ...)\n } else {\n mod = survival::coxph(formula = f, data = data, weights = .weights, ...)\n }\n if (.learner$predict.type == \nprob\n) \n mod = attachTrainingInfo(mod, list(surv.range = range(getTaskTargets(.task)[, 1L])))\n mod\n}\n\n\n\n\npredictLearner.surv.coxph = function(.learner, .model, .newdata, ...) {\n if (.learner$predict.type == \nresponse\n) {\n predict(.model$learner.model, newdata = .newdata, type = \nlp\n, ...)\n } else if (.learner$predict.type == \nprob\n) {\n surv.range = getTrainingInfo(.model$learner.model)$surv.range\n times = seq(from = surv.range[1L], to = surv.range[2L], length.out = 1000)\n t(summary(survival::survfit(.model$learner.model, newdata = .newdata, se.fit = FALSE, conf.int = FALSE), \n times = times)$surv)\n } else {\n stop(\nUnknown predict type\n)\n }\n}\n\n\n\n\nClustering\n\n\nFor clustering, you have to return a numeric vector with the IDs of the clusters\nthat the respective datum has been assigned to. The numbering should start at 1.\n\n\nBelow is the definition for the \nFarthestFirst\n learner\nfrom the \nRWeka\n package. 
Weka\nstarts the IDs of the clusters at 0, so we add 1 to the predicted clusters.\nRWeka has a different way of setting learner parameters; we use the special\n\nWeka_control\n function to do this.\n\n\nmakeRLearner.cluster.FarthestFirst = function() {\n makeRLearnerCluster(\n cl = \ncluster.FarthestFirst\n,\n package = \nRWeka\n,\n par.set = makeParamSet(\n makeIntegerLearnerParam(id = \nN\n, default = 2L, lower = 1L),\n makeIntegerLearnerParam(id = \nS\n, default = 1L, lower = 1L),\n makeLogicalLearnerParam(id = \noutput-debug-info\n, default = FALSE, tunable = FALSE)\n ),\n properties = c(\nnumerics\n),\n name = \nFarthestFirst Clustering Algorithm\n,\n short.name = \nfarthestfirst\n\n )\n}\n\n\n\n\ntrainLearner.cluster.FarthestFirst = function(.learner, .task, .subset, .weights = NULL, ...) {\n ctrl = RWeka::Weka_control(...)\n RWeka::FarthestFirst(getTaskData(.task, .subset), control = ctrl)\n}\n\n\n\n\npredictLearner.cluster.FarthestFirst = function(.learner, .model, .newdata, ...) {\n as.integer(predict(.model$learner.model, .newdata, ...)) + 1L\n}\n\n\n\n\nMultilabel classification\n\n\nAs stated in the \nmultilabel\n section, multilabel classification\nmethods can be divided into problem transformation methods and algorithm adaptation methods.\n\n\nAt this moment the only problem transformation method implemented in \nmlr\n\nis the \nbinary relevance method\n. Integrating more of\nthese methods requires good knowledge of the architecture of the \nmlr\n package.\n\n\nThe integration of an algorithm adaptation multilabel classification learner is easier and\nworks very similar to the normal multiclass-classification.\nIn contrast to the multiclass case, not all of the learner properties are relevant.\nIn particular, whether one-, two-, or multi-class problems are supported is not applicable.\nFurthermore the prediction function output must be a matrix\nwith each prediction of a label in one column and the names of the labels\nas column names. If \n.learner$predict.type\n is \n\"response\"\n the predictions must\nbe logical. If \n.learner$predict.type\n is \n\"prob\"\n and this type of\nprediction is supported by the learner, the matrix must consist of predicted\nprobabilities.\n\n\nBelow is the definition of the \nrFerns\n learner from the\n\nrFerns\n package, which does not support probability predictions.\n\n\nmakeRLearner.multilabel.rFerns = function() {\n makeRLearnerMultilabel(\n cl = \nmultilabel.rFerns\n,\n package = \nrFerns\n,\n par.set = makeParamSet(\n makeIntegerLearnerParam(id = \ndepth\n, default = 5L),\n makeIntegerLearnerParam(id = \nferns\n, default = 1000L)\n ),\n properties = c(\nnumerics\n, \nfactors\n, \nordered\n),\n name = \nRandom ferns\n,\n short.name = \nrFerns\n,\n note = \n\n )\n}\n\n\n\n\ntrainLearner.multilabel.rFerns = function(.learner, .task, .subset, .weights = NULL, ...) {\n d = getTaskData(.task, .subset, target.extra = TRUE)\n rFerns::rFerns(x = d$data, y = as.matrix(d$target), ...)\n}\n\n\n\n\npredictLearner.multilabel.rFerns = function(.learner, .model, .newdata, ...) {\n as.matrix(predict(.model$learner.model, .newdata, ...))\n}", + "text": "Integrating another learner\n\n\nIn order to create a new learner in \nmlr\n, interface code to the \nR\n function\nmust be written. Three functions are required for each learner. First, you must\ndefine the learner itself with a name, description, capabilities, parameters,\nand a few other things. Second, you need to provide a function that calls the\nlearner function and builds the model given data. 
Finally, a prediction function\nthat returns predicted values given new data is needed.\n\n\nAll learners should inherit from \nrlearner.classif\n, \nrlearner.multilabel\n, \nrlearner.regr\n,\n\nrlearner.surv\n, \nrlearner.costsens\n, or \nrlearner.cluster\n. While it is\nalso possible to define a new type of learner that has special properties and\ndoes not fit into one of the existing schemes, this is much more advanced and\nnot covered here.\n\n\nClassification\n\n\nWe show how the \nLinear Discriminant Analysis\n from\npackage \nMASS\n has been integrated\ninto the classification learner \nclassif.lda\n in \nmlr\n as an example.\n\n\nDefinition of the learner\n\n\nThe minimal information required to define a learner is the \nmlr\n name of the\nlearner, its package, the parameter set, and the set of properties of your\nlearner. In addition, you may provide a human-readable name, a short name and a\nnote with information relevant to users of the learner.\n\n\nFirst, name your learner. The naming conventions in \nmlr\n are\n\nclassif.\nR_method_name\n for classification, \nmultilabel.\nR_method_name\n \nfor multilabel classification, \nregr.\nR_method_name\n for\nregression, \nsurv.\nR_method_name\n for survival analysis,\n\ncostsens.\nR_method_name\n for cost sensitive learning, and\n\ncluster.\nR_method_name\n for clustering. So in this example, the name starts with\n\nclassif.\n and we choose \nclassif.lda\n.\n\n\nSecond, we need to define the parameters of the learner. These are any options\nthat can be set when running it to change how it learns, how input is\ninterpreted, how and what output is generated, and so on. \nmlr\n provides a\nnumber of functions to define parameters, a complete list can be found in the\ndocumentation of \nLearnerParam\n of the\n\nParamHelpers\n package.\n\n\nIn our example, we have discrete and numeric parameters, so we use\n\nmakeDiscreteLearnerParam\n and\n\nmakeNumericLearnerParam\n to incorporate the\ncomplete description of the parameters. We include all possible values for\ndiscrete parameters and lower and upper bounds for numeric parameters. Strictly\nspeaking it is not necessary to provide bounds for all parameters and if this\ninformation is not available they can be estimated, but providing accurate and\nspecific information here makes it possible to tune the learner much better (see\nthe section on \ntuning\n).\n\n\nNext, we add information on the properties of the learner (see also the section\non \nlearners\n). Which types of features are supported (numerics,\nfactors)? Are case weights supported? Are class weights supported? Can the method deal\nwith missing values in the features and deal with NA's in a meaningful way (not \nna.omit\n)?\nAre one-class, two-class, multi-class problems supported? Can the learner predict\nposterior probabilities?\n\n\nIf the learner supports class weights the name of the relevant learner parameter\ncan be specified via argument \nclass.weights.param\n.\n\n\nBelow is the complete code for the definition of the LDA learner. It has one\ndiscrete parameter, \nmethod\n, and two continuous ones, \nnu\n and \ntol\n. It\nsupports classification problems with two or more classes and can deal with\nnumeric and factor explanatory variables. 
It can predict posterior\nprobabilities.\n\n\nmakeRLearner.classif.lda = function() {\n makeRLearnerClassif(\n cl = \nclassif.lda\n,\n package = \nMASS\n,\n par.set = makeParamSet(\n makeDiscreteLearnerParam(id = \nmethod\n, default = \nmoment\n, values = c(\nmoment\n, \nmle\n, \nmve\n, \nt\n)),\n makeNumericLearnerParam(id = \nnu\n, lower = 2, requires = quote(method == \nt\n)),\n makeNumericLearnerParam(id = \ntol\n, default = 1e-4, lower = 0),\n makeDiscreteLearnerParam(id = \npredict.method\n, values = c(\nplug-in\n, \npredictive\n, \ndebiased\n),\n default = \nplug-in\n, when = \npredict\n),\n makeLogicalLearnerParam(id = \nCV\n, default = FALSE, tunable = FALSE)\n ),\n properties = c(\ntwoclass\n, \nmulticlass\n, \nnumerics\n, \nfactors\n, \nprob\n),\n name = \nLinear Discriminant Analysis\n,\n short.name = \nlda\n,\n note = \nLearner param 'predict.method' maps to 'method' in predict.lda.\n\n )\n}\n\n\n\n\nCreating the training function of the learner\n\n\nOnce the learner has been defined, we need to tell \nmlr\n how to call it to\ntrain a model. The name of the function has to start with \ntrainLearner.\n,\nfollowed by the \nmlr\n name of the learner as defined above (\nclassif.lda\n\nhere). The prototype of the function looks as follows.\n\n\nfunction(.learner, .task, .subset, .weights = NULL, ...) { }\n\n\n\n\nThis function must fit a model on the data of the task \n.task\n with regard to\nthe subset defined in the integer vector \n.subset\n and the parameters passed\nin the \n...\n arguments. Usually, the data should be extracted from the task\nusing \ngetTaskData\n. This will take care of any subsetting as well. It must\nreturn the fitted model. \nmlr\n assumes no special data type for the return\nvalue -- it will be passed to the predict function we are going to define below,\nso any special code the learner may need can be encapsulated there.\n\n\nFor our example, the definition of the function looks like this. In addition to\nthe data of the task, we also need the formula that describes what to predict.\nWe use the function \ngetTaskFormula\n to extract this from the task.\n\n\ntrainLearner.classif.lda = function(.learner, .task, .subset, .weights = NULL, ...) {\n f = getTaskFormula(.task)\n MASS::lda(f, data = getTaskData(.task, .subset), ...)\n}\n\n\n\n\nCreating the prediction method\n\n\nFinally, the prediction function needs to be defined. The name of this function\nstarts with \npredictLearner.\n, followed again by the \nmlr\n name of the\nlearner. The prototype of the function is as follows.\n\n\nfunction(.learner, .model, .newdata, ...) { }\n\n\n\n\nIt must predict for the new observations in the \ndata.frame\n \n.newdata\n with\nthe wrapped model \n.model\n, which is returned from the training function.\nThe actual model the learner built is stored in the \n$learner.model\n member\nand can be accessed simply through \n.model$learner.model\n.\n\n\nFor classification, you have to return a factor of predicted classes if\n\n.learner$predict.type\n is \n\"response\"\n, or a matrix of predicted\nprobabilities if \n.learner$predict.type\n is \n\"prob\"\n and this type of\nprediction is supported by the learner. In the latter case the matrix must have\nthe same number of columns as there are classes in the task and the columns have\nto be named by the class names.\n\n\nThe definition for LDA looks like this. 
It is pretty much just a straight\npass-through of the arguments to the \npredict\n function and some extraction of\nprediction data depending on the type of prediction requested.\n\n\npredictLearner.classif.lda = function(.learner, .model, .newdata, predict.method = \nplug-in\n, ...) {\n p = predict(.model$learner.model, newdata = .newdata, method = predict.method, ...)\n if (.learner$predict.type == \nresponse\n) \n return(p$class) else return(p$posterior)\n}\n\n\n\n\nRegression\n\n\nThe main difference for regression is that the type of predictions are different\n(numeric instead of labels or probabilities) and that not all of the properties\nare relevant. In particular, whether one-, two-, or multi-class problems and\nposterior probabilities are supported is not applicable.\n\n\nApart from this, everything explained above applies. Below is the definition for\nthe \nearth\n learner from the\n\nearth\n package.\n\n\nmakeRLearner.regr.earth = function() {\n makeRLearnerRegr(\n cl = \nregr.earth\n,\n package = \nearth\n,\n par.set = makeParamSet(\n makeLogicalLearnerParam(id = \nkeepxy\n, default = FALSE, tunable = FALSE),\n makeNumericLearnerParam(id = \ntrace\n, default = 0, upper = 10, tunable = FALSE),\n makeIntegerLearnerParam(id = \ndegree\n, default = 1L, lower = 1L),\n makeNumericLearnerParam(id = \npenalty\n),\n makeIntegerLearnerParam(id = \nnk\n, lower = 0L),\n makeNumericLearnerParam(id = \nthres\n, default = 0.001),\n makeIntegerLearnerParam(id = \nminspan\n, default = 0L),\n makeIntegerLearnerParam(id = \nendspan\n, default = 0L),\n makeNumericLearnerParam(id = \nnewvar.penalty\n, default = 0),\n makeIntegerLearnerParam(id = \nfast.k\n, default = 20L, lower = 0L),\n makeNumericLearnerParam(id = \nfast.beta\n, default = 1),\n makeDiscreteLearnerParam(id = \npmethod\n, default = \nbackward\n,\n values = c(\nbackward\n, \nnone\n, \nexhaustive\n, \nforward\n, \nseqrep\n, \ncv\n)),\n makeIntegerLearnerParam(id = \nnprune\n)\n ),\n properties = c(\nnumerics\n, \nfactors\n),\n name = \nMultivariate Adaptive Regression Splines\n,\n short.name = \nearth\n,\n note = \n\n )\n}\n\n\n\n\ntrainLearner.regr.earth = function(.learner, .task, .subset, .weights = NULL, ...) {\n f = getTaskFormula(.task)\n earth::earth(f, data = getTaskData(.task, .subset), ...)\n}\n\n\n\n\npredictLearner.regr.earth = function(.learner, .model, .newdata, ...) {\n predict(.model$learner.model, newdata = .newdata)[, 1L]\n}\n\n\n\n\nAgain most of the data is passed straight through to/from the train/predict\nfunctions of the learner.\n\n\nSurvival analysis\n\n\nFor survival analysis, you have to return so-called linear predictors in order to compute\nthe default measure for this task type, the \ncindex\n (for\n\n.learner$predict.type\n == \n\"response\"\n). For \n.learner$predict.type\n == \n\"prob\"\n,\nthere is no substantially meaningful measure (yet). You may either ignore this case or return\nsomething like predicted survival curves (cf. 
example below).\n\n\nThere are three properties that are specific to survival learners:\n\"rcens\", \"lcens\" and \"icens\", defining the type(s) of censoring a learner can handle -- right,\nleft and/or interval censored.\n\n\nLet's have a look at how the \nCox Proportional Hazard Model\n from\npackage \nsurvival\n has been integrated\ninto the survival learner \nsurv.coxph\n in \nmlr\n as an example:\n\n\nmakeRLearner.surv.coxph = function() {\n makeRLearnerSurv(\n cl = \nsurv.coxph\n,\n package = \nsurvival\n,\n par.set = makeParamSet(\n makeDiscreteLearnerParam(id = \nties\n, default = \nefron\n, values = c(\nefron\n, \nbreslow\n, \nexact\n)),\n makeLogicalLearnerParam(id = \nsingular.ok\n, default = TRUE),\n makeNumericLearnerParam(id = \neps\n, default = 1e-09, lower = 0),\n makeNumericLearnerParam(id = \ntoler.chol\n, default = .Machine$double.eps^0.75, lower = 0),\n makeIntegerLearnerParam(id = \niter.max\n, default = 20L, lower = 1L),\n makeNumericLearnerParam(id = \ntoler.inf\n, default = sqrt(.Machine$double.eps^0.75), lower = 0),\n makeIntegerLearnerParam(id = \nouter.max\n, default = 10L, lower = 1L),\n makeLogicalLearnerParam(id = \nmodel\n, default = FALSE, tunable = FALSE),\n makeLogicalLearnerParam(id = \nx\n, default = FALSE, tunable = FALSE),\n makeLogicalLearnerParam(id = \ny\n, default = TRUE, tunable = FALSE)\n ),\n properties = c(\nmissings\n, \nnumerics\n, \nfactors\n, \nweights\n, \nprob\n, \nrcens\n),\n name = \nCox Proportional Hazard Model\n,\n short.name = \ncoxph\n,\n note = \n\n )\n}\n\n\n\n\ntrainLearner.surv.coxph = function(.learner, .task, .subset, .weights = NULL, ...) {\n f = getTaskFormula(.task)\n data = getTaskData(.task, subset = .subset)\n if (is.null(.weights)) {\n mod = survival::coxph(formula = f, data = data, ...)\n } else {\n mod = survival::coxph(formula = f, data = data, weights = .weights, ...)\n }\n if (.learner$predict.type == \nprob\n) \n mod = attachTrainingInfo(mod, list(surv.range = range(getTaskTargets(.task)[, 1L])))\n mod\n}\n\n\n\n\npredictLearner.surv.coxph = function(.learner, .model, .newdata, ...) {\n if (.learner$predict.type == \nresponse\n) {\n predict(.model$learner.model, newdata = .newdata, type = \nlp\n, ...)\n } else if (.learner$predict.type == \nprob\n) {\n surv.range = getTrainingInfo(.model$learner.model)$surv.range\n times = seq(from = surv.range[1L], to = surv.range[2L], length.out = 1000)\n t(summary(survival::survfit(.model$learner.model, newdata = .newdata, se.fit = FALSE, conf.int = FALSE), \n times = times)$surv)\n } else {\n stop(\nUnknown predict type\n)\n }\n}\n\n\n\n\nClustering\n\n\nFor clustering, you have to return a numeric vector with the IDs of the clusters\nthat the respective datum has been assigned to. The numbering should start at 1.\n\n\nBelow is the definition for the \nFarthestFirst\n learner\nfrom the \nRWeka\n package. 
Weka\nstarts the IDs of the clusters at 0, so we add 1 to the predicted clusters.\nRWeka has a different way of setting learner parameters; we use the special\n\nWeka_control\n function to do this.\n\n\nmakeRLearner.cluster.FarthestFirst = function() {\n makeRLearnerCluster(\n cl = \ncluster.FarthestFirst\n,\n package = \nRWeka\n,\n par.set = makeParamSet(\n makeIntegerLearnerParam(id = \nN\n, default = 2L, lower = 1L),\n makeIntegerLearnerParam(id = \nS\n, default = 1L, lower = 1L),\n makeLogicalLearnerParam(id = \noutput-debug-info\n, default = FALSE, tunable = FALSE)\n ),\n properties = c(\nnumerics\n),\n name = \nFarthestFirst Clustering Algorithm\n,\n short.name = \nfarthestfirst\n\n )\n}\n\n\n\n\ntrainLearner.cluster.FarthestFirst = function(.learner, .task, .subset, .weights = NULL, ...) {\n ctrl = RWeka::Weka_control(...)\n RWeka::FarthestFirst(getTaskData(.task, .subset), control = ctrl)\n}\n\n\n\n\npredictLearner.cluster.FarthestFirst = function(.learner, .model, .newdata, ...) {\n as.integer(predict(.model$learner.model, .newdata, ...)) + 1L\n}\n\n\n\n\nMultilabel classification\n\n\nAs stated in the \nmultilabel\n section, multilabel classification\nmethods can be divided into problem transformation methods and algorithm adaptation methods.\n\n\nAt this moment the only problem transformation method implemented in \nmlr\n\nis the \nbinary relevance method\n. Integrating more of\nthese methods requires good knowledge of the architecture of the \nmlr\n package.\n\n\nThe integration of an algorithm adaptation multilabel classification learner is easier and\nworks very similar to the normal multiclass-classification.\nIn contrast to the multiclass case, not all of the learner properties are relevant.\nIn particular, whether one-, two-, or multi-class problems are supported is not applicable.\nFurthermore the prediction function output must be a matrix\nwith each prediction of a label in one column and the names of the labels\nas column names. If \n.learner$predict.type\n is \n\"response\"\n the predictions must\nbe logical. If \n.learner$predict.type\n is \n\"prob\"\n and this type of\nprediction is supported by the learner, the matrix must consist of predicted\nprobabilities.\n\n\nBelow is the definition of the \nrFerns\n learner from the\n\nrFerns\n package, which does not support probability predictions.\n\n\nmakeRLearner.multilabel.rFerns = function() {\n makeRLearnerMultilabel(\n cl = \nmultilabel.rFerns\n,\n package = \nrFerns\n,\n par.set = makeParamSet(\n makeIntegerLearnerParam(id = \ndepth\n, default = 5L),\n makeIntegerLearnerParam(id = \nferns\n, default = 1000L)\n ),\n properties = c(\nnumerics\n, \nfactors\n, \nordered\n),\n name = \nRandom ferns\n,\n short.name = \nrFerns\n,\n note = \n\n )\n}\n\n\n\n\ntrainLearner.multilabel.rFerns = function(.learner, .task, .subset, .weights = NULL, ...) {\n d = getTaskData(.task, .subset, target.extra = TRUE)\n rFerns::rFerns(x = d$data, y = as.matrix(d$target), ...)\n}\n\n\n\n\npredictLearner.multilabel.rFerns = function(.learner, .model, .newdata, ...) {\n as.matrix(predict(.model$learner.model, .newdata, ...))\n}", "title": "Create Custom Learners" }, { @@ -623,7 +623,7 @@ { "location": "/create_learner/index.html#survival-analysis", "text": "For survival analysis, you have to return so-called linear predictors in order to compute\nthe default measure for this task type, the cindex (for .learner$predict.type == \"response\" ). For .learner$predict.type == \"prob\" ,\nthere is no substantially meaningful measure (yet). 
You may either ignore this case or return\nsomething like predicted survival curves (cf. example below). There are three properties that are specific to survival learners:\n\"rcens\", \"lcens\" and \"icens\", defining the type(s) of censoring a learner can handle -- right,\nleft and/or interval censored. Let's have a look at how the Cox Proportional Hazard Model from\npackage survival has been integrated\ninto the survival learner surv.coxph in mlr as an example: makeRLearner.surv.coxph = function() {\n makeRLearnerSurv(\n cl = surv.coxph ,\n package = survival ,\n par.set = makeParamSet(\n makeDiscreteLearnerParam(id = ties , default = efron , values = c( efron , breslow , exact )),\n makeLogicalLearnerParam(id = singular.ok , default = TRUE),\n makeNumericLearnerParam(id = eps , default = 1e-09, lower = 0),\n makeNumericLearnerParam(id = toler.chol , default = .Machine$double.eps^0.75, lower = 0),\n makeIntegerLearnerParam(id = iter.max , default = 20L, lower = 1L),\n makeNumericLearnerParam(id = toler.inf , default = sqrt(.Machine$double.eps^0.75), lower = 0),\n makeIntegerLearnerParam(id = outer.max , default = 10L, lower = 1L),\n makeLogicalLearnerParam(id = model , default = FALSE, tunable = FALSE),\n makeLogicalLearnerParam(id = x , default = FALSE, tunable = FALSE),\n makeLogicalLearnerParam(id = y , default = TRUE, tunable = FALSE)\n ),\n properties = c( missings , numerics , factors , weights , prob , rcens ),\n name = Cox Proportional Hazard Model ,\n short.name = coxph ,\n note = \n )\n} trainLearner.surv.coxph = function(.learner, .task, .subset, .weights = NULL, ...) {\n f = getTaskFormula(.task)\n data = getTaskData(.task, subset = .subset)\n if (is.null(.weights)) {\n mod = survival::coxph(formula = f, data = data, ...)\n } else {\n mod = survival::coxph(formula = f, data = data, weights = .weights, ...)\n }\n if (.learner$predict.type == prob ) \n mod = attachTrainingInfo(mod, list(surv.range = range(getTaskTargets(.task)[, 1L])))\n mod\n} predictLearner.surv.coxph = function(.learner, .model, .newdata, ...) {\n if (.learner$predict.type == response ) {\n predict(.model$learner.model, newdata = .newdata, type = lp , ...)\n } else if (.learner$predict.type == prob ) {\n surv.range = getTrainingInfo(.model$learner.model)$surv.range\n times = seq(from = surv.range[1L], to = surv.range[2L], length.out = 1000)\n t(summary(survival::survfit(.model$learner.model, newdata = .newdata, se.fit = FALSE, conf.int = FALSE), \n times = times)$surv)\n } else {\n stop( Unknown predict type )\n }\n}", - "title": "Survival Analysis" + "title": "Survival analysis" }, { "location": "/create_learner/index.html#clustering", @@ -637,7 +637,7 @@ }, { "location": "/create_measure/index.html", - "text": "Integrating another Measure\n\n\nIn some cases, you might want to evaluate a \nPrediction\n or \nResamplePrediction\n with a\n\nMeasure\n which is not yet implemented in \nmlr\n. 
This could be either a\nperformance measure which is not listed in the \nAppendix\n or a measure that\nuses a misclassification cost matrix.\n\n\nPerformance measures and aggregation schemes\n\n\nPerformance measures in \nmlr\n are objects of class \nMeasure\n.\nFor example the \nmse\n (mean squared error) looks as follows.\n\n\nstr(mse)\n#\n List of 10\n#\n $ id : chr \nmse\n\n#\n $ minimize : logi TRUE\n#\n $ properties: chr [1:3] \nregr\n \nreq.pred\n \nreq.truth\n\n#\n $ fun :function (task, model, pred, feats, extra.args) \n#\n $ extra.args: list()\n#\n $ best : num 0\n#\n $ worst : num Inf\n#\n $ name : chr \nMean of squared errors\n\n#\n $ note : chr \n\n#\n $ aggr :List of 3\n#\n ..$ id : chr \ntest.mean\n\n#\n ..$ name: chr \nTest mean\n\n#\n ..$ fun :function (task, perf.test, perf.train, measure, group, pred) \n#\n ..- attr(*, \nclass\n)= chr \nAggregation\n\n#\n - attr(*, \nclass\n)= chr \nMeasure\n\n\nmse$fun\n#\n function (task, model, pred, feats, extra.args) \n#\n {\n#\n measureMSE(pred$data$truth, pred$data$response)\n#\n }\n#\n \nbytecode: 0x192b7000\n\n#\n \nenvironment: namespace:mlr\n\n\nmeasureMSE\n#\n function (truth, response) \n#\n {\n#\n mean((response - truth)^2)\n#\n }\n#\n \nbytecode: 0xf8e6500\n\n#\n \nenvironment: namespace:mlr\n\n\n\n\n\nSee the \nMeasure\n documentation page for a detailed description of the object\nslots.\n\n\nAt the core is slot \n$fun\n which contains the function that calculates the performance value.\nThe actual work is done by function \nmeasureMSE\n.\nSimilar functions, always following the same naming scheme \nmeasureX\n, exist for most measures.\nSee the \nmeasures\n help page for a complete list.\n\n\nJust as \nTask\n and \nLearner\n objects each \nMeasure\n has an\nidentifier \n$id\n which is for example used to annotate results and plots.\nFor plots there is also the option to use the longer measure \n$name\n instead. See the\ntutorial page on \nVisualization\n for more information.\n\n\nMoreover, a \nMeasure\n includes a number of \n$properties\n that indicate for\nwhich types of learning problems it is suitable and what information is required to\ncalculate it.\nObviously, most measures need the \nPrediction\n object (\n\"req.pred\"\n) and, for supervised\nproblems, the true values of the target variable(s) (\n\"req.truth\"\n).\n\n\nFor \ntuning\n each \nMeasure\n knows its extreme values \n$best\n and\n\n$worst\n and if it wants to be minimized or maximized (\n$minimize\n).\n\n\nFor resampling slot \n$aggr\n specifies how performance values obtained in individual resampling\niterations are aggregated.\nThe most common scheme is \ntest.mean\n, i.e., the unweighted mean of the performances on the\ntest sets.\n\n\nstr(test.mean)\n#\n List of 3\n#\n $ id : chr \ntest.mean\n\n#\n $ name: chr \nTest mean\n\n#\n $ fun :function (task, perf.test, perf.train, measure, group, pred) \n#\n - attr(*, \nclass\n)= chr \nAggregation\n\n\ntest.mean$fun\n#\n function (task, perf.test, perf.train, measure, group, pred) \n#\n mean(perf.test)\n#\n \nbytecode: 0x1e7ad968\n\n#\n \nenvironment: namespace:mlr\n\n\n\n\n\nAll aggregation schemes are objects of class \nAggregation\n with the function in slot\n\n$fun\n doing the actual work.\n\n\nYou can change the aggregation scheme of a \nMeasure\n via function\n\nsetAggregation\n. 
See the tutorial page on \nresampling\n for some examples\nand the \naggregations\n help page for all available aggregation schemes.\n\n\nYou can construct your own \nMeasure\n and \nAggregation\n objects via functions\n\nmakeMeasure\n, \nmakeCostMeasure\n, \nmakeCustomResampledMeasure\n and \nmakeAggregation\n.\nSome examples are shown in the following.\n\n\nConstructing a performance measure\n\n\nFunction \nmakeMeasure\n provides a simple way to construct your own performance measure.\n\n\nBelow this is exemplified by re-implementing the mean misclassification error\n(\nmmce\n).\nWe first write a function that computes the measure on the basis of the true and predicted\nclass labels.\nNote that this function must have certain formal arguments listed in the documentation of\n\nmakeMeasure\n.\nThen the \nMeasure\n object is created and we work with it as usual with the\n\nperformance\n function.\n\n\nSee the \nR\n documentation of \nmakeMeasure\n for more details on the various parameters.\n\n\n## Define a function that calculates the misclassification rate\nmy.mmce.fun = function(task, model, pred, feats, extra.args) {\n tb = table(getPredictionResponse(pred), getPredictionTruth(pred))\n 1 - sum(diag(tb)) / sum(tb)\n}\n\n## Generate the Measure object\nmy.mmce = makeMeasure(\n id = \nmy.mmce\n, name = \nMy Mean Misclassification Error\n,\n properties = c(\nclassif\n, \nclassif.multi\n, \nreq.pred\n, \nreq.truth\n),\n minimize = TRUE, best = 0, worst = 1,\n fun = my.mmce.fun\n)\n\n## Train a learner and make predictions\nmod = train(\nclassif.lda\n, iris.task)\npred = predict(mod, task = iris.task)\n\n## Calculate the performance using the new measure\nperformance(pred, measures = my.mmce)\n#\n my.mmce \n#\n 0.02\n\n## Apparently the result coincides with the mlr implementaion\nperformance(pred, measures = mmce)\n#\n mmce \n#\n 0.02\n\n\n\n\nConstructing a measure for ordinary misclassification costs\n\n\nFor in depth explanations and details see the tutorial page on\n\ncost-sensitive classification\n.\n\n\nTo create a measure that involves ordinary, i.e., class-dependent misclassification costs\nyou can use function \nmakeCostMeasure\n. You first need to define the cost matrix. The rows\nindicate true and the columns predicted classes and the rows and columns have to be named\nby the class labels. The cost matrix can then be wrapped in a \nMeasure\n\nobject and predictions can be evaluated as usual with the \nperformance\n function.\n\n\nSee the \nR\n documentation of function \nmakeCostMeasure\n for details on the various\nparameters.\n\n\n## Create the cost matrix\ncosts = matrix(c(0, 2, 2, 3, 0, 2, 1, 1, 0), ncol = 3)\nrownames(costs) = colnames(costs) = getTaskClassLevels(iris.task)\n\n## Encapsulate the cost matrix in a Measure object\nmy.costs = makeCostMeasure(\n id = \nmy.costs\n, name = \nMy Costs\n,\n costs = costs, task = iris.task,\n minimize = TRUE, best = 0, worst = 3\n)\n\n## Train a learner and make a prediction\nmod = train(\nclassif.lda\n, iris.task)\npred = predict(mod, newdata = iris)\n\n## Calculate the average costs\nperformance(pred, measures = my.costs)\n#\n my.costs \n#\n 0.02666667\n\n\n\n\nCreating an aggregation scheme\n\n\nIt is possible to create your own aggregation scheme by calling the \nmlr\n function\n\nmakeAggregation\n.\nYou need to specify an identifier \nid\n and optionally a \nname\n and, most importantly, write\na function that does the actual aggregation. 
Note that this function must have a certain\nsignature detailed in the documentation of \nmakeAggregation\n.\n\n\nExample: Evaluating the range of measures\n\n\nLet's say you are interested in the range of the performance values obtained on individual\ntest sets.\n\n\nmy.range.aggr = makeAggregation(id = \ntest.range\n, name = \nTest Range\n,\n fun = function (task, perf.test, perf.train, measure, group, pred)\n diff(range(perf.test))\n)\n\n\n\n\nperf.train\n and \nperf.test\n are both numerical vectors containing the preformances on\nthe train and test data sets.\nIn the usual case (e.g., cross validation), the \nperf.train\n vector is empty.\n\n\nNow we can run a feature selection based on the first measure in the provided\n\nlist\n and see how the other measures turn out.\n\n\n## mmce with default aggregation scheme test.mean\nms1 = mmce\n\n## mmce with new aggregation scheme my.range.aggr\nms2 = setAggregation(ms1, my.range.aggr)\n\n## Minimum and maximum of the mmce over test sets\nms1min = setAggregation(ms1, test.min)\nms1max = setAggregation(ms1, test.max)\n\n## Feature selection\nrdesc = makeResampleDesc(\nCV\n, iters = 3)\nres = selectFeatures(\nclassif.rpart\n, iris.task, rdesc, measures = list(ms1, ms2, ms1min, ms1max),\n control = makeFeatSelControlExhaustive(), show.info = FALSE)\n\n## Optimization path, i.e., performances for the 16 possible feature subsets\nperf.data = as.data.frame(res$opt.path)\nhead(perf.data[1:8])\n#\n Sepal.Length Sepal.Width Petal.Length Petal.Width mmce.test.mean\n#\n 1 0 0 0 0 0.70666667\n#\n 2 1 0 0 0 0.31333333\n#\n 3 0 1 0 0 0.50000000\n#\n 4 0 0 1 0 0.09333333\n#\n 5 0 0 0 1 0.04666667\n#\n 6 1 1 0 0 0.28666667\n#\n mmce.test.range mmce.test.min mmce.test.max\n#\n 1 0.16 0.60 0.76\n#\n 2 0.02 0.30 0.32\n#\n 3 0.22 0.36 0.58\n#\n 4 0.10 0.04 0.14\n#\n 5 0.08 0.02 0.10\n#\n 6 0.08 0.24 0.32\n\npd = position_jitter(width = 0.005, height = 0)\np = ggplot(aes(x = mmce.test.range, y = mmce.test.mean, ymax = mmce.test.max, ymin = mmce.test.min,\n color = as.factor(Sepal.Width), pch = as.factor(Petal.Width)), data = perf.data) +\n geom_pointrange(position = pd) +\n coord_flip()\nprint(p)\n\n\n\n\n \n\n\nThe plot shows the range versus the mean misclassification error. The value on the\ny-axis thus corresponds to the length of the error bars. (Note that the points and error\nbars are jittered in y-direction.)", + "text": "Integrating another Measure\n\n\nIn some cases, you might want to evaluate a \nPrediction\n or \nResamplePrediction\n with a\n\nMeasure\n which is not yet implemented in \nmlr\n. 
This could be either a\nperformance measure which is not listed in the \nAppendix\n or a measure that\nuses a misclassification cost matrix.\n\n\nPerformance measures and aggregation schemes\n\n\nPerformance measures in \nmlr\n are objects of class \nMeasure\n.\nFor example the \nmse\n (mean squared error) looks as follows.\n\n\nstr(mse)\n#\n List of 10\n#\n $ id : chr \nmse\n\n#\n $ minimize : logi TRUE\n#\n $ properties: chr [1:3] \nregr\n \nreq.pred\n \nreq.truth\n\n#\n $ fun :function (task, model, pred, feats, extra.args) \n#\n $ extra.args: list()\n#\n $ best : num 0\n#\n $ worst : num Inf\n#\n $ name : chr \nMean of squared errors\n\n#\n $ note : chr \n\n#\n $ aggr :List of 3\n#\n ..$ id : chr \ntest.mean\n\n#\n ..$ name: chr \nTest mean\n\n#\n ..$ fun :function (task, perf.test, perf.train, measure, group, pred) \n#\n ..- attr(*, \nclass\n)= chr \nAggregation\n\n#\n - attr(*, \nclass\n)= chr \nMeasure\n\n\nmse$fun\n#\n function (task, model, pred, feats, extra.args) \n#\n {\n#\n measureMSE(pred$data$truth, pred$data$response)\n#\n }\n#\n \nbytecode: 0x1f0f1820\n\n#\n \nenvironment: namespace:mlr\n\n\nmeasureMSE\n#\n function (truth, response) \n#\n {\n#\n mean((response - truth)^2)\n#\n }\n#\n \nbytecode: 0x111b2dd8\n\n#\n \nenvironment: namespace:mlr\n\n\n\n\n\nSee the \nMeasure\n documentation page for a detailed description of the object\nslots.\n\n\nAt the core is slot \n$fun\n which contains the function that calculates the performance value.\nThe actual work is done by function \nmeasureMSE\n.\nSimilar functions, always following the same naming scheme \nmeasureX\n, exist for most measures.\nSee the \nmeasures\n help page for a complete list.\n\n\nJust as \nTask\n and \nLearner\n objects each \nMeasure\n has an\nidentifier \n$id\n which is for example used to annotate results and plots.\nFor plots there is also the option to use the longer measure \n$name\n instead. See the\ntutorial page on \nVisualization\n for more information.\n\n\nMoreover, a \nMeasure\n includes a number of \n$properties\n that indicate for\nwhich types of learning problems it is suitable and what information is required to\ncalculate it.\nObviously, most measures need the \nPrediction\n object (\n\"req.pred\"\n) and, for supervised\nproblems, the true values of the target variable(s) (\n\"req.truth\"\n).\n\n\nFor \ntuning\n each \nMeasure\n knows its extreme values \n$best\n and\n\n$worst\n and if it wants to be minimized or maximized (\n$minimize\n).\n\n\nFor resampling slot \n$aggr\n specifies how performance values obtained in individual resampling\niterations are aggregated.\nThe most common scheme is \ntest.mean\n, i.e., the unweighted mean of the performances on the\ntest sets.\n\n\nstr(test.mean)\n#\n List of 3\n#\n $ id : chr \ntest.mean\n\n#\n $ name: chr \nTest mean\n\n#\n $ fun :function (task, perf.test, perf.train, measure, group, pred) \n#\n - attr(*, \nclass\n)= chr \nAggregation\n\n\ntest.mean$fun\n#\n function (task, perf.test, perf.train, measure, group, pred) \n#\n mean(perf.test)\n#\n \nbytecode: 0x1ab3c3d8\n\n#\n \nenvironment: namespace:mlr\n\n\n\n\n\nAll aggregation schemes are objects of class \nAggregation\n with the function in slot\n\n$fun\n doing the actual work.\n\n\nYou can change the aggregation scheme of a \nMeasure\n via function\n\nsetAggregation\n. 
See the tutorial page on \nresampling\n for some examples\nand the \naggregations\n help page for all available aggregation schemes.\n\n\nYou can construct your own \nMeasure\n and \nAggregation\n objects via functions\n\nmakeMeasure\n, \nmakeCostMeasure\n, \nmakeCustomResampledMeasure\n and \nmakeAggregation\n.\nSome examples are shown in the following.\n\n\nConstructing a performance measure\n\n\nFunction \nmakeMeasure\n provides a simple way to construct your own performance measure.\n\n\nBelow this is exemplified by re-implementing the mean misclassification error\n(\nmmce\n).\nWe first write a function that computes the measure on the basis of the true and predicted\nclass labels.\nNote that this function must have certain formal arguments listed in the documentation of\n\nmakeMeasure\n.\nThen the \nMeasure\n object is created and we work with it as usual with the\n\nperformance\n function.\n\n\nSee the \nR\n documentation of \nmakeMeasure\n for more details on the various parameters.\n\n\n## Define a function that calculates the misclassification rate\nmy.mmce.fun = function(task, model, pred, feats, extra.args) {\n tb = table(getPredictionResponse(pred), getPredictionTruth(pred))\n 1 - sum(diag(tb)) / sum(tb)\n}\n\n## Generate the Measure object\nmy.mmce = makeMeasure(\n id = \nmy.mmce\n, name = \nMy Mean Misclassification Error\n,\n properties = c(\nclassif\n, \nclassif.multi\n, \nreq.pred\n, \nreq.truth\n),\n minimize = TRUE, best = 0, worst = 1,\n fun = my.mmce.fun\n)\n\n## Train a learner and make predictions\nmod = train(\nclassif.lda\n, iris.task)\npred = predict(mod, task = iris.task)\n\n## Calculate the performance using the new measure\nperformance(pred, measures = my.mmce)\n#\n my.mmce \n#\n 0.02\n\n## Apparently the result coincides with the mlr implementaion\nperformance(pred, measures = mmce)\n#\n mmce \n#\n 0.02\n\n\n\n\nConstructing a measure for ordinary misclassification costs\n\n\nFor in depth explanations and details see the tutorial page on\n\ncost-sensitive classification\n.\n\n\nTo create a measure that involves ordinary, i.e., class-dependent misclassification costs\nyou can use function \nmakeCostMeasure\n. You first need to define the cost matrix. The rows\nindicate true and the columns predicted classes and the rows and columns have to be named\nby the class labels. The cost matrix can then be wrapped in a \nMeasure\n\nobject and predictions can be evaluated as usual with the \nperformance\n function.\n\n\nSee the \nR\n documentation of function \nmakeCostMeasure\n for details on the various\nparameters.\n\n\n## Create the cost matrix\ncosts = matrix(c(0, 2, 2, 3, 0, 2, 1, 1, 0), ncol = 3)\nrownames(costs) = colnames(costs) = getTaskClassLevels(iris.task)\n\n## Encapsulate the cost matrix in a Measure object\nmy.costs = makeCostMeasure(\n id = \nmy.costs\n, name = \nMy Costs\n,\n costs = costs, task = iris.task,\n minimize = TRUE, best = 0, worst = 3\n)\n\n## Train a learner and make a prediction\nmod = train(\nclassif.lda\n, iris.task)\npred = predict(mod, newdata = iris)\n\n## Calculate the average costs\nperformance(pred, measures = my.costs)\n#\n my.costs \n#\n 0.02666667\n\n\n\n\nCreating an aggregation scheme\n\n\nIt is possible to create your own aggregation scheme by calling the \nmlr\n function\n\nmakeAggregation\n.\nYou need to specify an identifier \nid\n and optionally a \nname\n and, most importantly, write\na function that does the actual aggregation. 
Note that this function must have a certain\nsignature detailed in the documentation of \nmakeAggregation\n.\n\n\nExample: Evaluating the range of measures\n\n\nLet's say you are interested in the range of the performance values obtained on individual\ntest sets.\n\n\nmy.range.aggr = makeAggregation(id = \ntest.range\n, name = \nTest Range\n,\n fun = function (task, perf.test, perf.train, measure, group, pred)\n diff(range(perf.test))\n)\n\n\n\n\nperf.train\n and \nperf.test\n are both numerical vectors containing the preformances on\nthe train and test data sets.\nIn the usual case (e.g., cross validation), the \nperf.train\n vector is empty.\n\n\nNow we can run a feature selection based on the first measure in the provided\n\nlist\n and see how the other measures turn out.\n\n\n## mmce with default aggregation scheme test.mean\nms1 = mmce\n\n## mmce with new aggregation scheme my.range.aggr\nms2 = setAggregation(ms1, my.range.aggr)\n\n## Minimum and maximum of the mmce over test sets\nms1min = setAggregation(ms1, test.min)\nms1max = setAggregation(ms1, test.max)\n\n## Feature selection\nrdesc = makeResampleDesc(\nCV\n, iters = 3)\nres = selectFeatures(\nclassif.rpart\n, iris.task, rdesc, measures = list(ms1, ms2, ms1min, ms1max),\n control = makeFeatSelControlExhaustive(), show.info = FALSE)\n\n## Optimization path, i.e., performances for the 16 possible feature subsets\nperf.data = as.data.frame(res$opt.path)\nhead(perf.data[1:8])\n#\n Sepal.Length Sepal.Width Petal.Length Petal.Width mmce.test.mean\n#\n 1 0 0 0 0 0.70666667\n#\n 2 1 0 0 0 0.31333333\n#\n 3 0 1 0 0 0.50000000\n#\n 4 0 0 1 0 0.09333333\n#\n 5 0 0 0 1 0.04666667\n#\n 6 1 1 0 0 0.28666667\n#\n mmce.test.range mmce.test.min mmce.test.max\n#\n 1 0.16 0.60 0.76\n#\n 2 0.02 0.30 0.32\n#\n 3 0.22 0.36 0.58\n#\n 4 0.10 0.04 0.14\n#\n 5 0.08 0.02 0.10\n#\n 6 0.08 0.24 0.32\n\npd = position_jitter(width = 0.005, height = 0)\np = ggplot(aes(x = mmce.test.range, y = mmce.test.mean, ymax = mmce.test.max, ymin = mmce.test.min,\n color = as.factor(Sepal.Width), pch = as.factor(Petal.Width)), data = perf.data) +\n geom_pointrange(position = pd) +\n coord_flip()\nprint(p)\n\n\n\n\n \n\n\nThe plot shows the range versus the mean misclassification error. The value on the\ny-axis thus corresponds to the length of the error bars. (Note that the points and error\nbars are jittered in y-direction.)", "title": "Create Custom Measures" }, { @@ -647,7 +647,7 @@ }, { "location": "/create_measure/index.html#performance-measures-and-aggregation-schemes", - "text": "Performance measures in mlr are objects of class Measure .\nFor example the mse (mean squared error) looks as follows. 
str(mse)\n# List of 10\n# $ id : chr mse \n# $ minimize : logi TRUE\n# $ properties: chr [1:3] regr req.pred req.truth \n# $ fun :function (task, model, pred, feats, extra.args) \n# $ extra.args: list()\n# $ best : num 0\n# $ worst : num Inf\n# $ name : chr Mean of squared errors \n# $ note : chr \n# $ aggr :List of 3\n# ..$ id : chr test.mean \n# ..$ name: chr Test mean \n# ..$ fun :function (task, perf.test, perf.train, measure, group, pred) \n# ..- attr(*, class )= chr Aggregation \n# - attr(*, class )= chr Measure \n\nmse$fun\n# function (task, model, pred, feats, extra.args) \n# {\n# measureMSE(pred$data$truth, pred$data$response)\n# }\n# bytecode: 0x192b7000 \n# environment: namespace:mlr \n\nmeasureMSE\n# function (truth, response) \n# {\n# mean((response - truth)^2)\n# }\n# bytecode: 0xf8e6500 \n# environment: namespace:mlr See the Measure documentation page for a detailed description of the object\nslots. At the core is slot $fun which contains the function that calculates the performance value.\nThe actual work is done by function measureMSE .\nSimilar functions, always following the same naming scheme measureX , exist for most measures.\nSee the measures help page for a complete list. Just as Task and Learner objects each Measure has an\nidentifier $id which is for example used to annotate results and plots.\nFor plots there is also the option to use the longer measure $name instead. See the\ntutorial page on Visualization for more information. Moreover, a Measure includes a number of $properties that indicate for\nwhich types of learning problems it is suitable and what information is required to\ncalculate it.\nObviously, most measures need the Prediction object ( \"req.pred\" ) and, for supervised\nproblems, the true values of the target variable(s) ( \"req.truth\" ). For tuning each Measure knows its extreme values $best and $worst and if it wants to be minimized or maximized ( $minimize ). For resampling slot $aggr specifies how performance values obtained in individual resampling\niterations are aggregated.\nThe most common scheme is test.mean , i.e., the unweighted mean of the performances on the\ntest sets. str(test.mean)\n# List of 3\n# $ id : chr test.mean \n# $ name: chr Test mean \n# $ fun :function (task, perf.test, perf.train, measure, group, pred) \n# - attr(*, class )= chr Aggregation \n\ntest.mean$fun\n# function (task, perf.test, perf.train, measure, group, pred) \n# mean(perf.test)\n# bytecode: 0x1e7ad968 \n# environment: namespace:mlr All aggregation schemes are objects of class Aggregation with the function in slot $fun doing the actual work. You can change the aggregation scheme of a Measure via function setAggregation . See the tutorial page on resampling for some examples\nand the aggregations help page for all available aggregation schemes. You can construct your own Measure and Aggregation objects via functions makeMeasure , makeCostMeasure , makeCustomResampledMeasure and makeAggregation .\nSome examples are shown in the following.", + "text": "Performance measures in mlr are objects of class Measure .\nFor example the mse (mean squared error) looks as follows. 
str(mse)\n# List of 10\n# $ id : chr mse \n# $ minimize : logi TRUE\n# $ properties: chr [1:3] regr req.pred req.truth \n# $ fun :function (task, model, pred, feats, extra.args) \n# $ extra.args: list()\n# $ best : num 0\n# $ worst : num Inf\n# $ name : chr Mean of squared errors \n# $ note : chr \n# $ aggr :List of 3\n# ..$ id : chr test.mean \n# ..$ name: chr Test mean \n# ..$ fun :function (task, perf.test, perf.train, measure, group, pred) \n# ..- attr(*, class )= chr Aggregation \n# - attr(*, class )= chr Measure \n\nmse$fun\n# function (task, model, pred, feats, extra.args) \n# {\n# measureMSE(pred$data$truth, pred$data$response)\n# }\n# bytecode: 0x1f0f1820 \n# environment: namespace:mlr \n\nmeasureMSE\n# function (truth, response) \n# {\n# mean((response - truth)^2)\n# }\n# bytecode: 0x111b2dd8 \n# environment: namespace:mlr See the Measure documentation page for a detailed description of the object\nslots. At the core is slot $fun which contains the function that calculates the performance value.\nThe actual work is done by function measureMSE .\nSimilar functions, always following the same naming scheme measureX , exist for most measures.\nSee the measures help page for a complete list. Just as Task and Learner objects each Measure has an\nidentifier $id which is for example used to annotate results and plots.\nFor plots there is also the option to use the longer measure $name instead. See the\ntutorial page on Visualization for more information. Moreover, a Measure includes a number of $properties that indicate for\nwhich types of learning problems it is suitable and what information is required to\ncalculate it.\nObviously, most measures need the Prediction object ( \"req.pred\" ) and, for supervised\nproblems, the true values of the target variable(s) ( \"req.truth\" ). For tuning each Measure knows its extreme values $best and $worst and if it wants to be minimized or maximized ( $minimize ). For resampling slot $aggr specifies how performance values obtained in individual resampling\niterations are aggregated.\nThe most common scheme is test.mean , i.e., the unweighted mean of the performances on the\ntest sets. str(test.mean)\n# List of 3\n# $ id : chr test.mean \n# $ name: chr Test mean \n# $ fun :function (task, perf.test, perf.train, measure, group, pred) \n# - attr(*, class )= chr Aggregation \n\ntest.mean$fun\n# function (task, perf.test, perf.train, measure, group, pred) \n# mean(perf.test)\n# bytecode: 0x1ab3c3d8 \n# environment: namespace:mlr All aggregation schemes are objects of class Aggregation with the function in slot $fun doing the actual work. You can change the aggregation scheme of a Measure via function setAggregation . See the tutorial page on resampling for some examples\nand the aggregations help page for all available aggregation schemes. 
You can construct your own Measure and Aggregation objects via functions makeMeasure , makeCostMeasure , makeCustomResampledMeasure and makeAggregation .\nSome examples are shown in the following.", "title": "Performance measures and aggregation schemes" }, { @@ -697,22 +697,22 @@ }, { "location": "/integrated_learners/index.html", - "text": "Integrated Learners\n\n\nThis page lists the learning methods already integrated in \nmlr\n.\n\n\nColumns \nNum.\n, \nFac.\n, \nNAs\n, and \nWeights\n indicate if a method can cope with\nnumerical and factor predictors, if it can deal with missing values in a meaningful way\n(other than simply removing observations with missing values) and if observation\nweights are supported.\n\n\nColumn \nProps\n shows further properties of the learning methods.\n\nordered\n indicates that a method can deal with ordered factor features.\nFor \nclassification\n, you can see if binary and/or multi-class problems are supported\nand if the learner accepts class weights.\nFor \nsurvival analysis\n, the censoring type is shown.\nFor example \nrcens\n means that the learning method can deal with right censored data.\nMoreover, the type of prediction is displayed, where \nprob\n indicates that probabilities\ncan be predicted.\nFor \nregression\n, \nse\n means that additional to the mean response standard errors can be predicted.\nSee also \nRLearner\n for details.\n\n\nClassification (70)\n\n\n\n\n\n\n\n\nID / Short Name\n\n\nName\n\n\nPackages\n\n\nNum.\n\n\nFac.\n\n\nNAs\n\n\nWeights\n\n\nProps\n\n\nNote\n\n\n\n\n\n\n\n\n\n\nclassif.ada\n \nada\n\n\nada Boosting\n\n\nada\n\n\nX\n\n\nX\n\n\n\n\nX\n\n\nprob\ntwoclass\n\n\n\n\n\n\n\n\nclassif.avNNet\n \navNNet\n\n\nNeural Network\n\n\nnnet\n\n\nX\n\n\nX\n\n\n\n\nX\n\n\nmulticlass\nprob\ntwoclass\n\n\nsize\nhas been set to 3 by default. 
Doing bagging training of \nnnet\nif set \nbag=TRUE\n\n\n\n\n\n\nclassif.bartMachine\n \nbartmachine\n\n\nBayesian Additive Regression Trees\n\n\nbartMachine\n\n\nX\n\n\nX\n\n\nX\n\n\n\n\nprob\ntwoclass\n\n\n'use_missing_data' has been set to TRUE by default to allow missing data support\n\n\n\n\n\n\nclassif.bdk\n \nbdk\n\n\nBi-Directional Kohonen map\n\n\nkohonen\n\n\nX\n\n\n\n\n\n\n\n\nmulticlass\nprob\ntwoclass\n\n\n\n\n\n\n\n\nclassif.binomial\n \nbinomial\n\n\nBinomial Regression\n\n\nstats\n\n\nX\n\n\nX\n\n\n\n\nX\n\n\nprob\ntwoclass\n\n\nDelegates to glm with freely choosable binomial link function via learner param 'link'.\n\n\n\n\n\n\nclassif.blackboost\n \nblackbst\n\n\nGradient Boosting With Regression Trees\n\n\nmboost\nparty\n\n\nX\n\n\nX\n\n\nX\n\n\nX\n\n\nprob\ntwoclass\n\n\nsee ?ctree_control for possible breakage for nominal features with missingness\n\n\n\n\n\n\nclassif.boosting\n \nadabag\n\n\nAdabag Boosting\n\n\nadabag\nrpart\n\n\nX\n\n\nX\n\n\nX\n\n\n\n\nmulticlass\nprob\ntwoclass\n\n\nxval\nhas been set to 0 by default for speed.\n\n\n\n\n\n\nclassif.bst\n \nbst\n\n\nGradient Boosting\n\n\nbst\n\n\nX\n\n\n\n\n\n\n\n\ntwoclass\n\n\nThe argument \nlearner\nhas been renamed to \nLearner\ndue to a name conflict with \nsetHyerPars\n \nLearner\nhas been set to \nlm\nby default.\n\n\n\n\n\n\nclassif.cforest\n \ncforest\n\n\nRandom forest based on conditional inference trees\n\n\nparty\n\n\nX\n\n\nX\n\n\nX\n\n\nX\n\n\nmulticlass\nordered\nprob\ntwoclass\n\n\nsee ?ctree_control for possible breakage for nominal features with missingness\n\n\n\n\n\n\nclassif.clusterSVM\n \nclusterSVM\n\n\nClustered Support Vector Machines\n\n\nSwarmSVM\nLiblineaR\n\n\nX\n\n\n\n\n\n\n\n\ntwoclass\n\n\ncenters\nset to 2 by default\n\n\n\n\n\n\nclassif.ctree\n \nctree\n\n\nConditional Inference Trees\n\n\nparty\n\n\nX\n\n\nX\n\n\nX\n\n\nX\n\n\nmulticlass\nordered\nprob\ntwoclass\n\n\nsee ?ctree_control for possible breakage for nominal features with missingness\n\n\n\n\n\n\nclassif.dbnDNN\n \ndbn.dnn\n\n\nDeep neural network with weights initialized by DBN\n\n\ndeepnet\n\n\nX\n\n\n\n\n\n\n\n\nmulticlass\nprob\ntwoclass\n\n\noutput\nset to \nsoftmax\nby default\n\n\n\n\n\n\nclassif.dcSVM\n \ndcSVM\n\n\nDivided-Conquer Support Vector Machines\n\n\nSwarmSVM\n\n\nX\n\n\n\n\n\n\n\n\ntwoclass\n\n\n\n\n\n\n\n\nclassif.extraTrees\n \nextraTrees\n\n\nExtremely Randomized Trees\n\n\nextraTrees\n\n\nX\n\n\n\n\n\n\nX\n\n\nmulticlass\nprob\ntwoclass\n\n\n\n\n\n\n\n\nclassif.fnn\n \nfnn\n\n\nFast k-Nearest Neighbour\n\n\nFNN\n\n\nX\n\n\n\n\n\n\n\n\nmulticlass\ntwoclass\n\n\n\n\n\n\n\n\nclassif.gaterSVM\n \ngaterSVM\n\n\nMixture of SVMs with Neural Network Gater Function\n\n\nSwarmSVM\ne1071\n\n\nX\n\n\n\n\n\n\n\n\ntwoclass\n\n\nm set to 3 and max.iter set to 1 by default\n\n\n\n\n\n\nclassif.gbm\n \ngbm\n\n\nGradient Boosting Machine\n\n\ngbm\n\n\nX\n\n\nX\n\n\nX\n\n\nX\n\n\nmulticlass\nprob\ntwoclass\n\n\n\n\n\n\n\n\nclassif.geoDA\n \ngeoda\n\n\nGeometric Predictive Discriminant Analysis\n\n\nDiscriMiner\n\n\nX\n\n\n\n\n\n\n\n\nmulticlass\ntwoclass\n\n\n\n\n\n\n\n\nclassif.glmboost\n \nglmbst\n\n\nBoosting for GLMs\n\n\nmboost\n\n\nX\n\n\nX\n\n\n\n\nX\n\n\nprob\ntwoclass\n\n\nfamily\nhas been set to \nBinomial()\nby default. 
Maximum number of boosting iterations is set via 'mstop', the actual number used for prediction is controlled by 'm'.\n\n\n\n\n\n\nclassif.glmnet\n \nglmnet\n\n\nGLM with Lasso or Elasticnet Regularization\n\n\nglmnet\n\n\nX\n\n\nX\n\n\n\n\nX\n\n\nmulticlass\nprob\ntwoclass\n\n\nFactors automatically get converted to dummy columns, ordered factors to integer\n\n\n\n\n\n\nclassif.hdrda\n \nhdrda\n\n\nHigh-Dimensional Regularized Discriminant Analysis\n\n\nsparsediscrim\n\n\nX\n\n\n\n\n\n\n\n\nprob\ntwoclass\n\n\n\n\n\n\n\n\nclassif.IBk\n \nibk\n\n\nk-Nearest Neighbours\n\n\nRWeka\n\n\nX\n\n\nX\n\n\n\n\n\n\nmulticlass\nprob\ntwoclass\n\n\n\n\n\n\n\n\nclassif.J48\n \nj48\n\n\nJ48 Decision Trees\n\n\nRWeka\n\n\nX\n\n\nX\n\n\nX\n\n\n\n\nmulticlass\nprob\ntwoclass\n\n\nNAs are directly passed to WEKA with \nna.action = na.pass\n\n\n\n\n\n\nclassif.JRip\n \njrip\n\n\nPropositional Rule Learner\n\n\nRWeka\n\n\nX\n\n\nX\n\n\nX\n\n\n\n\nmulticlass\nprob\ntwoclass\n\n\nNAs are directly passed to WEKA with \nna.action = na.pass\n\n\n\n\n\n\nclassif.kknn\n \nkknn\n\n\nk-Nearest Neighbor\n\n\nkknn\n\n\nX\n\n\nX\n\n\n\n\n\n\nmulticlass\nprob\ntwoclass\n\n\n\n\n\n\n\n\nclassif.knn\n \nknn\n\n\nk-Nearest Neighbor\n\n\nclass\n\n\nX\n\n\n\n\n\n\n\n\nmulticlass\ntwoclass\n\n\n\n\n\n\n\n\nclassif.ksvm\n \nksvm\n\n\nSupport Vector Machines\n\n\nkernlab\n\n\nX\n\n\nX\n\n\n\n\n\n\nclass.weights\nmulticlass\nprob\ntwoclass\n\n\nKernel parameters have to be passed directly and not by using the kpar list in ksvm. Note that \nfit\nhas been set to \nFALSE\nby default for speed.\n\n\n\n\n\n\nclassif.lda\n \nlda\n\n\nLinear Discriminant Analysis\n\n\nMASS\n\n\nX\n\n\nX\n\n\n\n\n\n\nmulticlass\nprob\ntwoclass\n\n\nLearner param 'predict.method' maps to 'method' in predict.lda.\n\n\n\n\n\n\nclassif.LiblineaRL1L2SVC\n \nliblinl1l2svc\n\n\nL1-Regularized L2-Loss Support Vector Classification\n\n\nLiblineaR\n\n\nX\n\n\n\n\n\n\n\n\nclass.weights\nmulticlass\ntwoclass\n\n\n\n\n\n\n\n\nclassif.LiblineaRL1LogReg\n \nliblinl1logreg\n\n\nL1-Regularized Logistic Regression\n\n\nLiblineaR\n\n\nX\n\n\n\n\n\n\n\n\nclass.weights\nmulticlass\nprob\ntwoclass\n\n\n\n\n\n\n\n\nclassif.LiblineaRL2L1SVC\n \nliblinl2l1svc\n\n\nL2-Regularized L1-Loss Support Vector Classification\n\n\nLiblineaR\n\n\nX\n\n\n\n\n\n\n\n\nclass.weights\nmulticlass\ntwoclass\n\n\n\n\n\n\n\n\nclassif.LiblineaRL2LogReg\n \nliblinl2logreg\n\n\nL2-Regularized Logistic Regression\n\n\nLiblineaR\n\n\nX\n\n\n\n\n\n\n\n\nclass.weights\nmulticlass\nprob\ntwoclass\n\n\ntype 0 is primal and type 7 is dual problem\n\n\n\n\n\n\nclassif.LiblineaRL2SVC\n \nliblinl2svc\n\n\nL2-Regularized L2-Loss Support Vector Classification\n\n\nLiblineaR\n\n\nX\n\n\n\n\n\n\n\n\nclass.weights\nmulticlass\ntwoclass\n\n\ntype 2 is primal and type 1 is dual problem\n\n\n\n\n\n\nclassif.LiblineaRMultiClassSVC\n \nliblinmulticlasssvc\n\n\nSupport Vector Classification by Crammer and Singer\n\n\nLiblineaR\n\n\nX\n\n\n\n\n\n\n\n\nclass.weights\nmulticlass\ntwoclass\n\n\n\n\n\n\n\n\nclassif.linDA\n \nlinda\n\n\nLinear Discriminant Analysis\n\n\nDiscriMiner\n\n\nX\n\n\n\n\n\n\n\n\nmulticlass\ntwoclass\n\n\n\n\n\n\n\n\nclassif.logreg\n \nlogreg\n\n\nLogistic Regression\n\n\nstats\n\n\nX\n\n\nX\n\n\n\n\nX\n\n\nprob\ntwoclass\n\n\nDelegates to glm with family binomial/logit.\n\n\n\n\n\n\nclassif.lqa\n \nlqa\n\n\nFitting penalized Generalized Linear Models with the LQA algorithm\n\n\nlqa\n\n\nX\n\n\n\n\n\n\nX\n\n\nprob\ntwoclass\n\n\npenalty\nhas been set to \nlasso\n and \nlambda\nto 0.1 by 
default.\n\n\n\n\n\n\nclassif.lssvm\n \nlssvm\n\n\nLeast Squares Support Vector Machine\n\n\nkernlab\n\n\nX\n\n\nX\n\n\n\n\n\n\nmulticlass\ntwoclass\n\n\nfitted\nhas been set to \nFALSE\nby default for speed.\n\n\n\n\n\n\nclassif.lvq1\n \nlvq1\n\n\nLearning Vector Quantization\n\n\nclass\n\n\nX\n\n\n\n\n\n\n\n\nmulticlass\ntwoclass\n\n\n\n\n\n\n\n\nclassif.mda\n \nmda\n\n\nMixture Discriminant Analysis\n\n\nmda\n\n\nX\n\n\nX\n\n\n\n\n\n\nmulticlass\nprob\ntwoclass\n\n\nkeep.fitted\nhas been set to \nFALSE\nby default for speed and we use start.method='lvq' for more robust behavior / less technical crashes\n\n\n\n\n\n\nclassif.mlp\n \nmlp\n\n\nMulti-Layer Perceptron\n\n\nRSNNS\n\n\nX\n\n\n\n\n\n\n\n\nmulticlass\nprob\ntwoclass\n\n\n\n\n\n\n\n\nclassif.multinom\n \nmultinom\n\n\nMultinomial Regression\n\n\nnnet\n\n\nX\n\n\nX\n\n\n\n\nX\n\n\nmulticlass\nprob\ntwoclass\n\n\n\n\n\n\n\n\nclassif.naiveBayes\n \nnbayes\n\n\nNaive Bayes\n\n\ne1071\n\n\nX\n\n\nX\n\n\nX\n\n\n\n\nmulticlass\nprob\ntwoclass\n\n\n\n\n\n\n\n\nclassif.neuralnet\n \nneuralnet\n\n\nNeural Network from neuralnet\n\n\nneuralnet\n\n\nX\n\n\n\n\n\n\n\n\nprob\ntwoclass\n\n\nerr.fct\nhas been set to \nce\nto do classification.\n\n\n\n\n\n\nclassif.nnet\n \nnnet\n\n\nNeural Network\n\n\nnnet\n\n\nX\n\n\nX\n\n\n\n\nX\n\n\nmulticlass\nprob\ntwoclass\n\n\nsize\nhas been set to 3 by default.\n\n\n\n\n\n\nclassif.nnTrain\n \nnn.train\n\n\nTraining Neural Network by Backpropagation\n\n\ndeepnet\n\n\nX\n\n\n\n\n\n\n\n\nmulticlass\nprob\ntwoclass\n\n\noutput\nset to \nsoftmax\nby default\n\n\n\n\n\n\nclassif.nodeHarvest\n \nnodeHarvest\n\n\nNode Harvest\n\n\nnodeHarvest\n\n\nX\n\n\nX\n\n\n\n\n\n\nprob\ntwoclass\n\n\n\n\n\n\n\n\nclassif.OneR\n \noner\n\n\n1-R Classifier\n\n\nRWeka\n\n\nX\n\n\nX\n\n\nX\n\n\n\n\nmulticlass\nprob\ntwoclass\n\n\nNAs are directly passed to WEKA with \nna.action = na.pass\n\n\n\n\n\n\nclassif.pamr\n \npamr\n\n\nNearest shrunken centroid\n\n\npamr\n\n\nX\n\n\n\n\n\n\n\n\nprob\ntwoclass\n\n\nthreshold for prediction (\nthreshold.predict\n has been set to \n1\nby default\n\n\n\n\n\n\nclassif.PART\n \npart\n\n\nPART Decision Lists\n\n\nRWeka\n\n\nX\n\n\nX\n\n\nX\n\n\n\n\nmulticlass\nprob\ntwoclass\n\n\nNAs are directly passed to WEKA with \nna.action = na.pass\n\n\n\n\n\n\nclassif.plr\n \nplr\n\n\nLogistic Regression with a L2 Penalty\n\n\nstepPlr\n\n\nX\n\n\nX\n\n\n\n\nX\n\n\nprob\ntwoclass\n\n\nAIC and BIC penalty types can be selected via the new parameter \ncp.type\n\n\n\n\n\n\nclassif.plsdaCaret\n \nplsdacaret\n\n\nPartial Least Squares (PLS) Discriminant Analysis\n\n\ncaret\n\n\nX\n\n\n\n\n\n\n\n\nprob\ntwoclass\n\n\n\n\n\n\n\n\nclassif.probit\n \nprobit\n\n\nProbit Regression\n\n\nstats\n\n\nX\n\n\nX\n\n\n\n\nX\n\n\nprob\ntwoclass\n\n\nDelegates to glm with family binomial/probit.\n\n\n\n\n\n\nclassif.qda\n \nqda\n\n\nQuadratic Discriminant Analysis\n\n\nMASS\n\n\nX\n\n\nX\n\n\n\n\n\n\nmulticlass\nprob\ntwoclass\n\n\nLearner param 'predict.method' maps to 'method' in predict.lda.\n\n\n\n\n\n\nclassif.quaDA\n \nquada\n\n\nQuadratic Discriminant Analysis\n\n\nDiscriMiner\n\n\nX\n\n\n\n\n\n\n\n\nmulticlass\ntwoclass\n\n\n\n\n\n\n\n\nclassif.randomForest\n \nrf\n\n\nRandom Forest\n\n\nrandomForest\n\n\nX\n\n\nX\n\n\n\n\n\n\nclass.weights\nmulticlass\nordered\nprob\ntwoclass\n\n\n\n\n\n\n\n\nclassif.randomForestSRC\n \nrfsrc\n\n\nRandom Forest\n\n\nrandomForestSRC\n\n\nX\n\n\nX\n\n\nX\n\n\n\n\nmulticlass\nprob\ntwoclass\n\n\n'na.action' has been set to 'na.impute' by default to allow missing data 
support\n\n\n\n\n\n\nclassif.ranger\n \nranger\n\n\nRandom Forests\n\n\nranger\n\n\nX\n\n\nX\n\n\n\n\n\n\nmulticlass\nprob\ntwoclass\n\n\n\n\n\n\n\n\nclassif.rda\n \nrda\n\n\nRegularized Discriminant Analysis\n\n\nklaR\n\n\nX\n\n\nX\n\n\n\n\n\n\nmulticlass\nprob\ntwoclass\n\n\nestimate.error\nhas been set to \nFALSE\nby default for speed.\n\n\n\n\n\n\nclassif.rFerns\n \nrFerns\n\n\nRandom ferns\n\n\nrFerns\n\n\nX\n\n\nX\n\n\n\n\n\n\nmulticlass\nordered\ntwoclass\n\n\n\n\n\n\n\n\nclassif.rknn\n \nrknn\n\n\nRandom k-Nearest-Neighbors\n\n\nrknn\n\n\nX\n\n\n\n\n\n\n\n\nmulticlass\nordered\ntwoclass\n\n\n\n\n\n\n\n\nclassif.rotationForest\n \nrotationForest\n\n\nRotation Forest\n\n\nrotationForest\n\n\nX\n\n\nX\n\n\n\n\n\n\nordered\nprob\ntwoclass\n\n\n\n\n\n\n\n\nclassif.rpart\n \nrpart\n\n\nDecision Tree\n\n\nrpart\n\n\nX\n\n\nX\n\n\nX\n\n\nX\n\n\nmulticlass\nordered\nprob\ntwoclass\n\n\nxval\nhas been set to 0 by default for speed.\n\n\n\n\n\n\nclassif.rrlda\n \nrrlda\n\n\nRobust Regularized Linear Discriminant Analysis\n\n\nrrlda\n\n\nX\n\n\n\n\n\n\n\n\nmulticlass\ntwoclass\n\n\n\n\n\n\n\n\nclassif.saeDNN\n \nsae.dnn\n\n\nDeep neural network with weights initialized by Stacked AutoEncoder\n\n\ndeepnet\n\n\nX\n\n\n\n\n\n\n\n\nmulticlass\nprob\ntwoclass\n\n\noutput\nset to \nsoftmax\nby default\n\n\n\n\n\n\nclassif.sda\n \nsda\n\n\nShrinkage Discriminant Analysis\n\n\nsda\n\n\nX\n\n\n\n\n\n\n\n\nmulticlass\nprob\ntwoclass\n\n\n\n\n\n\n\n\nclassif.sparseLDA\n \nsparseLDA\n\n\nSparse Discriminant Analysis\n\n\nsparseLDA\nMASS\nelasticnet\n\n\nX\n\n\n\n\n\n\n\n\nmulticlass\nprob\ntwoclass\n\n\nArguments Q and stop are not yet provided as they depend on the task.\n\n\n\n\n\n\nclassif.svm\n \nsvm\n\n\nSupport Vector Machines (libsvm)\n\n\ne1071\n\n\nX\n\n\nX\n\n\n\n\n\n\nclass.weights\nmulticlass\nprob\ntwoclass\n\n\n\n\n\n\n\n\nclassif.xgboost\n \nxgboost\n\n\neXtreme Gradient Boosting\n\n\nxgboost\n\n\nX\n\n\nX\n\n\n\n\nX\n\n\nmulticlass\nprob\ntwoclass\n\n\nAll setting are passed directly, rather than through xgboost's 'param'. 
'rounds' set to 1 by default\n\n\n\n\n\n\nclassif.xyf\n \nxyf\n\n\nX-Y fused self-organising maps\n\n\nkohonen\n\n\nX\n\n\n\n\n\n\n\n\nmulticlass\nprob\ntwoclass\n\n\n\n\n\n\n\n\n\n\nRegression (49)\n\n\n\n\n\n\n\n\nID / Short Name\n\n\nName\n\n\nPackages\n\n\nNum.\n\n\nFac.\n\n\nNAs\n\n\nWeights\n\n\nProps\n\n\nNote\n\n\n\n\n\n\n\n\n\n\nregr.avNNet\n \navNNet\n\n\nNeural Network\n\n\nnnet\n\n\nX\n\n\nX\n\n\n\n\nX\n\n\n\n\nsize\nhas been set to 3 by default.\n\n\n\n\n\n\nregr.bartMachine\n \nbartmachine\n\n\nBayesian Additive Regression Trees\n\n\nbartMachine\n\n\nX\n\n\nX\n\n\nX\n\n\n\n\n\n\n'use_missing_data' has been set to TRUE by default to allow missing data support\n\n\n\n\n\n\nregr.bcart\n \nbcart\n\n\nBayesian CART\n\n\ntgp\n\n\nX\n\n\nX\n\n\n\n\n\n\nse\n\n\n\n\n\n\n\n\nregr.bdk\n \nbdk\n\n\nBi-Directional Kohonen map\n\n\nkohonen\n\n\nX\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nregr.bgp\n \nbgp\n\n\nBayesian Gaussian Process\n\n\ntgp\n\n\nX\n\n\n\n\n\n\n\n\nse\n\n\n\n\n\n\n\n\nregr.bgpllm\n \nbgpllm\n\n\nBayesian Gaussian Process with jumps to the Limiting Linear Model\n\n\ntgp\n\n\nX\n\n\n\n\n\n\n\n\nse\n\n\n\n\n\n\n\n\nregr.blackboost\n \nblackbst\n\n\nGradient Boosting with Regression Trees\n\n\nmboost\nparty\n\n\nX\n\n\nX\n\n\nX\n\n\nX\n\n\n\n\nsee ?ctree_control for possible breakage for nominal features with missingness\n\n\n\n\n\n\nregr.blm\n \nblm\n\n\nBayesian Linear Model\n\n\ntgp\n\n\nX\n\n\n\n\n\n\n\n\nse\n\n\n\n\n\n\n\n\nregr.brnn\n \nbrnn\n\n\nBayesian regularization for feed-forward neural networks\n\n\nbrnn\n\n\nX\n\n\nX\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nregr.bst\n \nbst\n\n\nGradient Boosting\n\n\nbst\n\n\nX\n\n\n\n\n\n\n\n\n\n\nThe argument \nlearner\nhas been renamed to \nLearner\ndue to a name conflict with \nsetHyerPars\n\n\n\n\n\n\nregr.btgp\n \nbtgp\n\n\nBayesian Treed Gaussian Process\n\n\ntgp\n\n\nX\n\n\nX\n\n\n\n\n\n\nse\n\n\n\n\n\n\n\n\nregr.btgpllm\n \nbtgpllm\n\n\nBayesian Treed Gaussian Process with jumps to the Limiting Linear Model\n\n\ntgp\n\n\nX\n\n\nX\n\n\n\n\n\n\nse\n\n\n\n\n\n\n\n\nregr.btlm\n \nbtlm\n\n\nBayesian Treed Linear Model\n\n\ntgp\n\n\nX\n\n\nX\n\n\n\n\n\n\nse\n\n\n\n\n\n\n\n\nregr.cforest\n \ncforest\n\n\nRandom Forest Based on Conditional Inference Trees\n\n\nparty\n\n\nX\n\n\nX\n\n\nX\n\n\nX\n\n\nordered\n\n\nsee ?ctree_control for possible breakage for nominal features with missingness\n\n\n\n\n\n\nregr.crs\n \ncrs\n\n\nRegression Splines\n\n\ncrs\n\n\nX\n\n\nX\n\n\n\n\nX\n\n\nse\n\n\n\n\n\n\n\n\nregr.ctree\n \nctree\n\n\nConditional Inference Trees\n\n\nparty\n\n\nX\n\n\nX\n\n\nX\n\n\nX\n\n\nordered\n\n\nsee ?ctree_control for possible breakage for nominal features with missingness\n\n\n\n\n\n\nregr.cubist\n \ncubist\n\n\nCubist\n\n\nCubist\n\n\nX\n\n\nX\n\n\nX\n\n\n\n\n\n\n\n\n\n\n\n\nregr.earth\n \nearth\n\n\nMultivariate Adaptive Regression Splines\n\n\nearth\n\n\nX\n\n\nX\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nregr.elmNN\n \nelmNN\n\n\nExtreme Learning Machine for Single Hidden Layer Feedforward Neural Networks\n\n\nelmNN\n\n\nX\n\n\n\n\n\n\n\n\n\n\nnhid has been set to 1 and actfun has been set to \"sig\" by default\n\n\n\n\n\n\nregr.extraTrees\n \nextraTrees\n\n\nExtremely Randomized Trees\n\n\nextraTrees\n\n\nX\n\n\n\n\n\n\nX\n\n\n\n\n\n\n\n\n\n\nregr.fnn\n \nfnn\n\n\nFast k-Nearest Neighbor\n\n\nFNN\n\n\nX\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nregr.frbs\n \nfrbs\n\n\nFuzzy Rule-based Systems\n\n\nfrbs\n\n\nX\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nregr.gbm\n \ngbm\n\n\nGradient Boosting 
Machine\n\n\ngbm\n\n\nX\n\n\nX\n\n\nX\n\n\nX\n\n\n\n\ndistribution\nhas been set to \ngaussian\n by default.\n\n\n\n\n\n\nregr.glmboost\n \nglmboost\n\n\nBoosting for GLMs\n\n\nmboost\n\n\nX\n\n\nX\n\n\n\n\nX\n\n\n\n\nMaximum number of boosting iterations is set via 'mstop', the actual number used is controlled by 'm'.\n\n\n\n\n\n\nregr.glmnet\n \nglmnet\n\n\nGLM with Lasso or Elasticnet Regularization\n\n\nglmnet\n\n\nX\n\n\nX\n\n\n\n\nX\n\n\nordered\n\n\nFactors automatically get converted to dummy columns, ordered factors to integer\n\n\n\n\n\n\nregr.IBk\n \nibk\n\n\nK-Nearest Neighbours\n\n\nRWeka\n\n\nX\n\n\nX\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nregr.kknn\n \nkknn\n\n\nK-Nearest-Neighbor regression\n\n\nkknn\n\n\nX\n\n\nX\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nregr.ksvm\n \nksvm\n\n\nSupport Vector Machines\n\n\nkernlab\n\n\nX\n\n\nX\n\n\n\n\n\n\n\n\nKernel parameters have to be passed directly and not by using the kpar list in ksvm. Note that \nfit\nhas been set to \nFALSE\nby default for speed.\n\n\n\n\n\n\nregr.LiblineaRL2L1SVR\n \nliblinl2l1svr\n\n\nL2-Regularized L1-Loss Support Vector Regression\n\n\nLiblineaR\n\n\nX\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nregr.LiblineaRL2L2SVR\n \nliblinl2l2svr\n\n\nL2-Regularized L2-Loss Support Vector Regression\n\n\nLiblineaR\n\n\nX\n\n\n\n\n\n\n\n\n\n\ntype 11 is primal and 12 is dual problem\n\n\n\n\n\n\nregr.lm\n \nlm\n\n\nSimple Linear Regression\n\n\nstats\n\n\nX\n\n\nX\n\n\n\n\nX\n\n\nse\n\n\n\n\n\n\n\n\nregr.mars\n \nmars\n\n\nMultivariate Adaptive Regression Splines\n\n\nmda\n\n\nX\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nregr.mob\n \nmob\n\n\nModel-based Recursive Partitioning Yielding a Tree with Fitted Models Associated with each Terminal Node\n\n\nparty\n\n\nX\n\n\nX\n\n\n\n\nX\n\n\n\n\n\n\n\n\n\n\nregr.nnet\n \nnnet\n\n\nNeural Network\n\n\nnnet\n\n\nX\n\n\nX\n\n\n\n\nX\n\n\n\n\nsize\nhas been set to 3 by default.\n\n\n\n\n\n\nregr.nodeHarvest\n \nnodeHarvest\n\n\nNode Harvest\n\n\nnodeHarvest\n\n\nX\n\n\nX\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nregr.pcr\n \npcr\n\n\nPrincipal Component Regression\n\n\npls\n\n\nX\n\n\nX\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nregr.penalized.lasso\n \nlasso\n\n\nLasso Regression\n\n\npenalized\n\n\nX\n\n\nX\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nregr.penalized.ridge\n \nridge\n\n\nPenalized Ridge Regression\n\n\npenalized\n\n\nX\n\n\nX\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nregr.plsr\n \nplsr\n\n\nPartial Least Squares Regression\n\n\npls\n\n\nX\n\n\nX\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nregr.randomForest\n \nrf\n\n\nRandom Forest\n\n\nrandomForest\n\n\nX\n\n\nX\n\n\n\n\n\n\nordered\nse\n\n\n\n\n\n\n\n\nregr.randomForestSRC\n \nrfsrc\n\n\nRandom Forest\n\n\nrandomForestSRC\n\n\nX\n\n\nX\n\n\nX\n\n\n\n\n\n\nna.action' has been set to 'na.impute' by default to allow missing data support\n\n\n\n\n\n\nregr.ranger\n \nranger\n\n\nRandom Forests\n\n\nranger\n\n\nX\n\n\nX\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nregr.rknn\n \nrknn\n\n\nRandom k-Nearest-Neighbors\n\n\nrknn\n\n\nX\n\n\n\n\n\n\n\n\nordered\n\n\n\n\n\n\n\n\nregr.rpart\n \nrpart\n\n\nDecision Tree\n\n\nrpart\n\n\nX\n\n\nX\n\n\nX\n\n\nX\n\n\nordered\n\n\nxval\nhas been set to 0 by default for speed.\n\n\n\n\n\n\nregr.rsm\n \nrsm\n\n\nResponse Surface Regression\n\n\nrsm\n\n\nX\n\n\n\n\n\n\n\n\n\n\nYou select the order of the regression by using modelfun = \"FO\" (first order), \"TWI\" (two-way interactions, this is with 1st oder terms!) 
and \"SO\" (full second order)\n\n\n\n\n\n\nregr.rvm\n \nrvm\n\n\nRelevance Vector Machine\n\n\nkernlab\n\n\nX\n\n\nX\n\n\n\n\n\n\n\n\nKernel parameters have to be passed directly and not by using the kpar list in rvm. Note that \nfit\nhas been set to \nFALSE\nby default for speed.\n\n\n\n\n\n\nregr.svm\n \nsvm\n\n\nSupport Vector Machines (libsvm)\n\n\ne1071\n\n\nX\n\n\nX\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nregr.xgboost\n \nxgboost\n\n\neXtreme Gradient Boosting\n\n\nxgboost\n\n\nX\n\n\nX\n\n\n\n\nX\n\n\n\n\nAll setting are passed directly, rather than through xgboost's 'param'. 'rounds' set to 1 by default\n\n\n\n\n\n\nregr.xyf\n \nxyf\n\n\nX-Y fused self-organising maps\n\n\nkohonen\n\n\nX\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nSurvival analysis (9)\n\n\n\n\n\n\n\n\nID / Short Name\n\n\nName\n\n\nPackages\n\n\nNum.\n\n\nFac.\n\n\nNAs\n\n\nWeights\n\n\nProps\n\n\nNote\n\n\n\n\n\n\n\n\n\n\nsurv.cforest\n \ncrf\n\n\nRandom Forest based on Conditional Inference Trees\n\n\nparty\nsurvival\n\n\nX\n\n\nX\n\n\nX\n\n\nX\n\n\nordered\nrcens\n\n\nsee ?ctree_control for possible breakage for nominal features with missingness\n\n\n\n\n\n\nsurv.coxph\n \ncoxph\n\n\nCox Proportional Hazard Model\n\n\nsurvival\n\n\nX\n\n\nX\n\n\nX\n\n\nX\n\n\nprob\nrcens\n\n\n\n\n\n\n\n\nsurv.cvglmnet\n \ncvglmnet\n\n\nGLM with Regularization (Cross Validated Lambda)\n\n\nglmnet\n\n\nX\n\n\nX\n\n\n\n\nX\n\n\nordered\nrcens\n\n\nFactors automatically get converted to dummy columns, ordered factors to integer\n\n\n\n\n\n\nsurv.glmboost\n \nglmboost\n\n\nGradient Boosting with Componentwise Linear Models\n\n\nsurvival\nmboost\n\n\nX\n\n\nX\n\n\n\n\nX\n\n\nordered\nrcens\n\n\nfamily\nhas been set to \nCoxPH()\nby default. Maximum number of boosting iterations is set via 'mstop', the actual number used for prediction is controlled by 'm'.\n\n\n\n\n\n\nsurv.glmnet\n \nglmnet\n\n\nGLM with Regularization\n\n\nglmnet\n\n\nX\n\n\nX\n\n\n\n\nX\n\n\nordered\nrcens\n\n\nFactors automatically get converted to dummy columns, ordered factors to integer\n\n\n\n\n\n\nsurv.penalized\n \npenalized\n\n\nPenalized Regression\n\n\npenalized\n\n\nX\n\n\nX\n\n\n\n\n\n\nordered\nrcens\n\n\nFactors automatically get converted to dummy columns, ordered factors to integer\n\n\n\n\n\n\nsurv.randomForestSRC\n \nrfsrc\n\n\nRandom Forests for Survival\n\n\nsurvival\nrandomForestSRC\n\n\nX\n\n\nX\n\n\nX\n\n\n\n\nordered\nrcens\n\n\n'na.action' has been set to 'na.impute' by default to allow missing data support\n\n\n\n\n\n\nsurv.ranger\n \nranger\n\n\nRandom Forests\n\n\nranger\n\n\nX\n\n\nX\n\n\n\n\n\n\nprob\nrcens\n\n\n\n\n\n\n\n\nsurv.rpart\n \nrpart\n\n\nSurvival Tree\n\n\nrpart\n\n\nX\n\n\nX\n\n\nX\n\n\nX\n\n\nordered\nrcens\n\n\nxval\nhas been set to 0 by default for speed.\n\n\n\n\n\n\n\n\nCluster analysis (5)\n\n\n\n\n\n\n\n\nID / Short Name\n\n\nName\n\n\nPackages\n\n\nNum.\n\n\nFac.\n\n\nNAs\n\n\nWeights\n\n\nProps\n\n\nNote\n\n\n\n\n\n\n\n\n\n\ncluster.Cobweb\n \ncobweb\n\n\nCobweb Clustering Algorithm\n\n\nRWeka\n\n\nX\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\ncluster.EM\n \nem\n\n\nExpectation-Maximization Clustering\n\n\nRWeka\n\n\nX\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\ncluster.FarthestFirst\n \nfarthestfirst\n\n\nFarthestFirst Clustering Algorithm\n\n\nRWeka\n\n\nX\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\ncluster.SimpleKMeans\n \nsimplekmeans\n\n\nK-Means Clustering\n\n\nRWeka\n\n\nX\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\ncluster.XMeans\n \nxmeans\n\n\nXMeans (k-means with automatic determination of k)\n\n\nRWeka\n\n\nX\n\n\n\n\n\n\n\n\n\n\nYou may have to install the 
XMeans Weka package: WPM('install-package', 'XMeans').\n\n\n\n\n\n\n\n\nCost-sensitive classification\n\n\nFor \nordinary misclassification costs\n you can use all the standard classification methods listed\nabove.\n\n\nFor \nexample-dependent costs\n there are several ways to generate cost-sensitive learners from\nordinary regression and classification learners.\nSee section \ncost-sensitive classification\n and the documentation\nof \nmakeCostSensClassifWrapper\n, \nmakeCostSensRegrWrapper\n and \nmakeCostSensWeightedPairsWrapper\n\nfor details.\n\n\nMultilabel classification (1)\n\n\n\n\n\n\n\n\nID / Short Name\n\n\nName\n\n\nPackages\n\n\nNum.\n\n\nFac.\n\n\nNAs\n\n\nWeights\n\n\nProps\n\n\nNote\n\n\n\n\n\n\n\n\n\n\nmultilabel.rFerns\n \nrFerns\n\n\nRandom ferns\n\n\nrFerns\n\n\nX\n\n\nX\n\n\n\n\n\n\nordered\n\n\n\n\n\n\n\n\n\n\nMoreover, you can use the binary relevance method to apply ordinary classification learners\nto the multilabel problem. See the documentation of function \nmakeMultilabelBinaryRelevanceWrapper\n\nand the tutorial section on \nmultilabel classification\n for details.", + "text": "Integrated Learners\n\n\nThis page lists the learning methods already integrated in \nmlr\n.\n\n\nColumns \nNum.\n, \nFac.\n, \nNAs\n, and \nWeights\n indicate if a method can cope with\nnumerical and factor predictors, if it can deal with missing values in a meaningful way\n(other than simply removing observations with missing values) and if observation\nweights are supported.\n\n\nColumn \nProps\n shows further properties of the learning methods.\n\nordered\n indicates that a method can deal with ordered factor features.\nFor \nclassification\n, you can see if binary and/or multi-class problems are supported\nand if the learner accepts class weights.\nFor \nsurvival analysis\n, the censoring type is shown.\nFor example \nrcens\n means that the learning method can deal with right censored data.\nMoreover, the type of prediction is displayed, where \nprob\n indicates that probabilities\ncan be predicted.\nFor \nregression\n, \nse\n means that additional to the mean response standard errors can be predicted.\nSee also \nRLearner\n for details.\n\n\nClassification (70)\n\n\n\n\n\n\n\n\nID / Short Name\n\n\nName\n\n\nPackages\n\n\nNum.\n\n\nFac.\n\n\nNAs\n\n\nWeights\n\n\nProps\n\n\nNote\n\n\n\n\n\n\n\n\n\n\nclassif.ada\n \nada\n\n\nada Boosting\n\n\nada\n\n\nX\n\n\nX\n\n\n\n\nX\n\n\nprob\ntwoclass\n\n\n\n\n\n\n\n\nclassif.avNNet\n \navNNet\n\n\nNeural Network\n\n\nnnet\n\n\nX\n\n\nX\n\n\n\n\nX\n\n\nmulticlass\nprob\ntwoclass\n\n\nsize\n has been set to 3 by default. 
Doing bagging training of \nnnet\n if set \nbag=TRUE\n.\n\n\n\n\n\n\nclassif.bartMachine\n \nbartmachine\n\n\nBayesian Additive Regression Trees\n\n\nbartMachine\n\n\nX\n\n\nX\n\n\nX\n\n\n\n\nprob\ntwoclass\n\n\n'use_missing_data' has been set to TRUE by default to allow missing data support\n\n\n\n\n\n\nclassif.bdk\n \nbdk\n\n\nBi-Directional Kohonen map\n\n\nkohonen\n\n\nX\n\n\n\n\n\n\n\n\nmulticlass\nprob\ntwoclass\n\n\n\n\n\n\n\n\nclassif.binomial\n \nbinomial\n\n\nBinomial Regression\n\n\nstats\n\n\nX\n\n\nX\n\n\n\n\nX\n\n\nprob\ntwoclass\n\n\nDelegates to glm with freely choosable binomial link function via learner param 'link'.\n\n\n\n\n\n\nclassif.blackboost\n \nblackbst\n\n\nGradient Boosting With Regression Trees\n\n\nmboost\nparty\n\n\nX\n\n\nX\n\n\nX\n\n\nX\n\n\nprob\ntwoclass\n\n\nsee ?ctree_control for possible breakage for nominal features with missingness\n\n\n\n\n\n\nclassif.boosting\n \nadabag\n\n\nAdabag Boosting\n\n\nadabag\nrpart\n\n\nX\n\n\nX\n\n\nX\n\n\n\n\nmulticlass\nprob\ntwoclass\n\n\nxval\n has been set to 0 by default for speed.\n\n\n\n\n\n\nclassif.bst\n \nbst\n\n\nGradient Boosting\n\n\nbst\n\n\nX\n\n\n\n\n\n\n\n\ntwoclass\n\n\nThe argument \nlearner\n has been renamed to \nLearner\n due to a name conflict with \nsetHyerPars\n. \nLearner\n has been set to \nlm\n by default.\n\n\n\n\n\n\nclassif.cforest\n \ncforest\n\n\nRandom forest based on conditional inference trees\n\n\nparty\n\n\nX\n\n\nX\n\n\nX\n\n\nX\n\n\nmulticlass\nordered\nprob\ntwoclass\n\n\nsee ?ctree_control for possible breakage for nominal features with missingness\n\n\n\n\n\n\nclassif.clusterSVM\n \nclusterSVM\n\n\nClustered Support Vector Machines\n\n\nSwarmSVM\nLiblineaR\n\n\nX\n\n\n\n\n\n\n\n\ntwoclass\n\n\ncenters\n set to 2 by default\n\n\n\n\n\n\nclassif.ctree\n \nctree\n\n\nConditional Inference Trees\n\n\nparty\n\n\nX\n\n\nX\n\n\nX\n\n\nX\n\n\nmulticlass\nordered\nprob\ntwoclass\n\n\nsee ?ctree_control for possible breakage for nominal features with missingness\n\n\n\n\n\n\nclassif.dbnDNN\n \ndbn.dnn\n\n\nDeep neural network with weights initialized by DBN\n\n\ndeepnet\n\n\nX\n\n\n\n\n\n\n\n\nmulticlass\nprob\ntwoclass\n\n\noutput\n set to \nsoftmax\n by default\n\n\n\n\n\n\nclassif.dcSVM\n \ndcSVM\n\n\nDivided-Conquer Support Vector Machines\n\n\nSwarmSVM\n\n\nX\n\n\n\n\n\n\n\n\ntwoclass\n\n\n\n\n\n\n\n\nclassif.extraTrees\n \nextraTrees\n\n\nExtremely Randomized Trees\n\n\nextraTrees\n\n\nX\n\n\n\n\n\n\nX\n\n\nmulticlass\nprob\ntwoclass\n\n\n\n\n\n\n\n\nclassif.fnn\n \nfnn\n\n\nFast k-Nearest Neighbour\n\n\nFNN\n\n\nX\n\n\n\n\n\n\n\n\nmulticlass\ntwoclass\n\n\n\n\n\n\n\n\nclassif.gaterSVM\n \ngaterSVM\n\n\nMixture of SVMs with Neural Network Gater Function\n\n\nSwarmSVM\ne1071\n\n\nX\n\n\n\n\n\n\n\n\ntwoclass\n\n\nm set to 3 and max.iter set to 1 by default\n\n\n\n\n\n\nclassif.gbm\n \ngbm\n\n\nGradient Boosting Machine\n\n\ngbm\n\n\nX\n\n\nX\n\n\nX\n\n\nX\n\n\nmulticlass\nprob\ntwoclass\n\n\n\n\n\n\n\n\nclassif.geoDA\n \ngeoda\n\n\nGeometric Predictive Discriminant Analysis\n\n\nDiscriMiner\n\n\nX\n\n\n\n\n\n\n\n\nmulticlass\ntwoclass\n\n\n\n\n\n\n\n\nclassif.glmboost\n \nglmbst\n\n\nBoosting for GLMs\n\n\nmboost\n\n\nX\n\n\nX\n\n\n\n\nX\n\n\nprob\ntwoclass\n\n\nfamily\n has been set to \nBinomial()\n by default. 
Maximum number of boosting iterations is set via 'mstop', the actual number used for prediction is controlled by 'm'.\n\n\n\n\n\n\nclassif.glmnet\n \nglmnet\n\n\nGLM with Lasso or Elasticnet Regularization\n\n\nglmnet\n\n\nX\n\n\nX\n\n\n\n\nX\n\n\nmulticlass\nprob\ntwoclass\n\n\nFactors automatically get converted to dummy columns, ordered factors to integer\n\n\n\n\n\n\nclassif.hdrda\n \nhdrda\n\n\nHigh-Dimensional Regularized Discriminant Analysis\n\n\nsparsediscrim\n\n\nX\n\n\n\n\n\n\n\n\nprob\ntwoclass\n\n\n\n\n\n\n\n\nclassif.IBk\n \nibk\n\n\nk-Nearest Neighbours\n\n\nRWeka\n\n\nX\n\n\nX\n\n\n\n\n\n\nmulticlass\nprob\ntwoclass\n\n\n\n\n\n\n\n\nclassif.J48\n \nj48\n\n\nJ48 Decision Trees\n\n\nRWeka\n\n\nX\n\n\nX\n\n\nX\n\n\n\n\nmulticlass\nprob\ntwoclass\n\n\nNAs are directly passed to WEKA with \nna.action = na.pass\n\n\n\n\n\n\nclassif.JRip\n \njrip\n\n\nPropositional Rule Learner\n\n\nRWeka\n\n\nX\n\n\nX\n\n\nX\n\n\n\n\nmulticlass\nprob\ntwoclass\n\n\nNAs are directly passed to WEKA with \nna.action = na.pass\n\n\n\n\n\n\nclassif.kknn\n \nkknn\n\n\nk-Nearest Neighbor\n\n\nkknn\n\n\nX\n\n\nX\n\n\n\n\n\n\nmulticlass\nprob\ntwoclass\n\n\n\n\n\n\n\n\nclassif.knn\n \nknn\n\n\nk-Nearest Neighbor\n\n\nclass\n\n\nX\n\n\n\n\n\n\n\n\nmulticlass\ntwoclass\n\n\n\n\n\n\n\n\nclassif.ksvm\n \nksvm\n\n\nSupport Vector Machines\n\n\nkernlab\n\n\nX\n\n\nX\n\n\n\n\n\n\nclass.weights\nmulticlass\nprob\ntwoclass\n\n\nKernel parameters have to be passed directly and not by using the kpar list in ksvm. Note that \nfit\n has been set to \nFALSE\n by default for speed.\n\n\n\n\n\n\nclassif.lda\n \nlda\n\n\nLinear Discriminant Analysis\n\n\nMASS\n\n\nX\n\n\nX\n\n\n\n\n\n\nmulticlass\nprob\ntwoclass\n\n\nLearner param 'predict.method' maps to 'method' in predict.lda.\n\n\n\n\n\n\nclassif.LiblineaRL1L2SVC\n \nliblinl1l2svc\n\n\nL1-Regularized L2-Loss Support Vector Classification\n\n\nLiblineaR\n\n\nX\n\n\n\n\n\n\n\n\nclass.weights\nmulticlass\ntwoclass\n\n\n\n\n\n\n\n\nclassif.LiblineaRL1LogReg\n \nliblinl1logreg\n\n\nL1-Regularized Logistic Regression\n\n\nLiblineaR\n\n\nX\n\n\n\n\n\n\n\n\nclass.weights\nmulticlass\nprob\ntwoclass\n\n\n\n\n\n\n\n\nclassif.LiblineaRL2L1SVC\n \nliblinl2l1svc\n\n\nL2-Regularized L1-Loss Support Vector Classification\n\n\nLiblineaR\n\n\nX\n\n\n\n\n\n\n\n\nclass.weights\nmulticlass\ntwoclass\n\n\n\n\n\n\n\n\nclassif.LiblineaRL2LogReg\n \nliblinl2logreg\n\n\nL2-Regularized Logistic Regression\n\n\nLiblineaR\n\n\nX\n\n\n\n\n\n\n\n\nclass.weights\nmulticlass\nprob\ntwoclass\n\n\ntype 0 is primal and type 7 is dual problem\n\n\n\n\n\n\nclassif.LiblineaRL2SVC\n \nliblinl2svc\n\n\nL2-Regularized L2-Loss Support Vector Classification\n\n\nLiblineaR\n\n\nX\n\n\n\n\n\n\n\n\nclass.weights\nmulticlass\ntwoclass\n\n\ntype 2 is primal and type 1 is dual problem\n\n\n\n\n\n\nclassif.LiblineaRMultiClassSVC\n \nliblinmulticlasssvc\n\n\nSupport Vector Classification by Crammer and Singer\n\n\nLiblineaR\n\n\nX\n\n\n\n\n\n\n\n\nclass.weights\nmulticlass\ntwoclass\n\n\n\n\n\n\n\n\nclassif.linDA\n \nlinda\n\n\nLinear Discriminant Analysis\n\n\nDiscriMiner\n\n\nX\n\n\n\n\n\n\n\n\nmulticlass\ntwoclass\n\n\n\n\n\n\n\n\nclassif.logreg\n \nlogreg\n\n\nLogistic Regression\n\n\nstats\n\n\nX\n\n\nX\n\n\n\n\nX\n\n\nprob\ntwoclass\n\n\nDelegates to glm with family binomial/logit.\n\n\n\n\n\n\nclassif.lqa\n \nlqa\n\n\nFitting penalized Generalized Linear Models with the LQA algorithm\n\n\nlqa\n\n\nX\n\n\n\n\n\n\nX\n\n\nprob\ntwoclass\n\n\npenalty\n has been set to \nlasso\n and \nlambda\n to 0.1 by 
default.\n\n\n\n\n\n\nclassif.lssvm\n \nlssvm\n\n\nLeast Squares Support Vector Machine\n\n\nkernlab\n\n\nX\n\n\nX\n\n\n\n\n\n\nmulticlass\ntwoclass\n\n\nfitted\n has been set to \nFALSE\n by default for speed.\n\n\n\n\n\n\nclassif.lvq1\n \nlvq1\n\n\nLearning Vector Quantization\n\n\nclass\n\n\nX\n\n\n\n\n\n\n\n\nmulticlass\ntwoclass\n\n\n\n\n\n\n\n\nclassif.mda\n \nmda\n\n\nMixture Discriminant Analysis\n\n\nmda\n\n\nX\n\n\nX\n\n\n\n\n\n\nmulticlass\nprob\ntwoclass\n\n\nkeep.fitted\n has been set to \nFALSE\n by default for speed and we use start.method='lvq' for more robust behavior / less technical crashes\n\n\n\n\n\n\nclassif.mlp\n \nmlp\n\n\nMulti-Layer Perceptron\n\n\nRSNNS\n\n\nX\n\n\n\n\n\n\n\n\nmulticlass\nprob\ntwoclass\n\n\n\n\n\n\n\n\nclassif.multinom\n \nmultinom\n\n\nMultinomial Regression\n\n\nnnet\n\n\nX\n\n\nX\n\n\n\n\nX\n\n\nmulticlass\nprob\ntwoclass\n\n\n\n\n\n\n\n\nclassif.naiveBayes\n \nnbayes\n\n\nNaive Bayes\n\n\ne1071\n\n\nX\n\n\nX\n\n\nX\n\n\n\n\nmulticlass\nprob\ntwoclass\n\n\n\n\n\n\n\n\nclassif.neuralnet\n \nneuralnet\n\n\nNeural Network from neuralnet\n\n\nneuralnet\n\n\nX\n\n\n\n\n\n\n\n\nprob\ntwoclass\n\n\nerr.fct\n has been set to \nce\n to do classification.\n\n\n\n\n\n\nclassif.nnet\n \nnnet\n\n\nNeural Network\n\n\nnnet\n\n\nX\n\n\nX\n\n\n\n\nX\n\n\nmulticlass\nprob\ntwoclass\n\n\nsize\n has been set to 3 by default.\n\n\n\n\n\n\nclassif.nnTrain\n \nnn.train\n\n\nTraining Neural Network by Backpropagation\n\n\ndeepnet\n\n\nX\n\n\n\n\n\n\n\n\nmulticlass\nprob\ntwoclass\n\n\noutput\n set to \nsoftmax\n by default\n\n\n\n\n\n\nclassif.nodeHarvest\n \nnodeHarvest\n\n\nNode Harvest\n\n\nnodeHarvest\n\n\nX\n\n\nX\n\n\n\n\n\n\nprob\ntwoclass\n\n\n\n\n\n\n\n\nclassif.OneR\n \noner\n\n\n1-R Classifier\n\n\nRWeka\n\n\nX\n\n\nX\n\n\nX\n\n\n\n\nmulticlass\nprob\ntwoclass\n\n\nNAs are directly passed to WEKA with \nna.action = na.pass\n\n\n\n\n\n\nclassif.pamr\n \npamr\n\n\nNearest shrunken centroid\n\n\npamr\n\n\nX\n\n\n\n\n\n\n\n\nprob\ntwoclass\n\n\nthreshold for prediction (\nthreshold.predict\n) has been set to \n1\n by default\n\n\n\n\n\n\nclassif.PART\n \npart\n\n\nPART Decision Lists\n\n\nRWeka\n\n\nX\n\n\nX\n\n\nX\n\n\n\n\nmulticlass\nprob\ntwoclass\n\n\nNAs are directly passed to WEKA with \nna.action = na.pass\n\n\n\n\n\n\nclassif.plr\n \nplr\n\n\nLogistic Regression with a L2 Penalty\n\n\nstepPlr\n\n\nX\n\n\nX\n\n\n\n\nX\n\n\nprob\ntwoclass\n\n\nAIC and BIC penalty types can be selected via the new parameter \ncp.type\n\n\n\n\n\n\nclassif.plsdaCaret\n \nplsdacaret\n\n\nPartial Least Squares (PLS) Discriminant Analysis\n\n\ncaret\n\n\nX\n\n\n\n\n\n\n\n\nprob\ntwoclass\n\n\n\n\n\n\n\n\nclassif.probit\n \nprobit\n\n\nProbit Regression\n\n\nstats\n\n\nX\n\n\nX\n\n\n\n\nX\n\n\nprob\ntwoclass\n\n\nDelegates to glm with family binomial/probit.\n\n\n\n\n\n\nclassif.qda\n \nqda\n\n\nQuadratic Discriminant Analysis\n\n\nMASS\n\n\nX\n\n\nX\n\n\n\n\n\n\nmulticlass\nprob\ntwoclass\n\n\nLearner param 'predict.method' maps to 'method' in predict.lda.\n\n\n\n\n\n\nclassif.quaDA\n \nquada\n\n\nQuadratic Discriminant Analysis\n\n\nDiscriMiner\n\n\nX\n\n\n\n\n\n\n\n\nmulticlass\ntwoclass\n\n\n\n\n\n\n\n\nclassif.randomForest\n \nrf\n\n\nRandom Forest\n\n\nrandomForest\n\n\nX\n\n\nX\n\n\n\n\n\n\nclass.weights\nmulticlass\nordered\nprob\ntwoclass\n\n\n\n\n\n\n\n\nclassif.randomForestSRC\n \nrfsrc\n\n\nRandom Forest\n\n\nrandomForestSRC\n\n\nX\n\n\nX\n\n\nX\n\n\n\n\nmulticlass\nprob\ntwoclass\n\n\n'na.action' has been set to 'na.impute' by default to allow missing data 
support\n\n\n\n\n\n\nclassif.ranger\n \nranger\n\n\nRandom Forests\n\n\nranger\n\n\nX\n\n\nX\n\n\n\n\n\n\nmulticlass\nprob\ntwoclass\n\n\nBy default, internal parallelization is switched off (num.threads = 1) and verbose output is disabled. Both settings are changeable.\n\n\n\n\n\n\nclassif.rda\n \nrda\n\n\nRegularized Discriminant Analysis\n\n\nklaR\n\n\nX\n\n\nX\n\n\n\n\n\n\nmulticlass\nprob\ntwoclass\n\n\nestimate.error\n has been set to \nFALSE\n by default for speed.\n\n\n\n\n\n\nclassif.rFerns\n \nrFerns\n\n\nRandom ferns\n\n\nrFerns\n\n\nX\n\n\nX\n\n\n\n\n\n\nmulticlass\nordered\ntwoclass\n\n\n\n\n\n\n\n\nclassif.rknn\n \nrknn\n\n\nRandom k-Nearest-Neighbors\n\n\nrknn\n\n\nX\n\n\n\n\n\n\n\n\nmulticlass\nordered\ntwoclass\n\n\n\n\n\n\n\n\nclassif.rotationForest\n \nrotationForest\n\n\nRotation Forest\n\n\nrotationForest\n\n\nX\n\n\nX\n\n\n\n\n\n\nordered\nprob\ntwoclass\n\n\n\n\n\n\n\n\nclassif.rpart\n \nrpart\n\n\nDecision Tree\n\n\nrpart\n\n\nX\n\n\nX\n\n\nX\n\n\nX\n\n\nmulticlass\nordered\nprob\ntwoclass\n\n\nxval\n has been set to 0 by default for speed.\n\n\n\n\n\n\nclassif.rrlda\n \nrrlda\n\n\nRobust Regularized Linear Discriminant Analysis\n\n\nrrlda\n\n\nX\n\n\n\n\n\n\n\n\nmulticlass\ntwoclass\n\n\n\n\n\n\n\n\nclassif.saeDNN\n \nsae.dnn\n\n\nDeep neural network with weights initialized by Stacked AutoEncoder\n\n\ndeepnet\n\n\nX\n\n\n\n\n\n\n\n\nmulticlass\nprob\ntwoclass\n\n\noutput\n set to \nsoftmax\n by default\n\n\n\n\n\n\nclassif.sda\n \nsda\n\n\nShrinkage Discriminant Analysis\n\n\nsda\n\n\nX\n\n\n\n\n\n\n\n\nmulticlass\nprob\ntwoclass\n\n\n\n\n\n\n\n\nclassif.sparseLDA\n \nsparseLDA\n\n\nSparse Discriminant Analysis\n\n\nsparseLDA\nMASS\nelasticnet\n\n\nX\n\n\n\n\n\n\n\n\nmulticlass\nprob\ntwoclass\n\n\nArguments Q and stop are not yet provided as they depend on the task.\n\n\n\n\n\n\nclassif.svm\n \nsvm\n\n\nSupport Vector Machines (libsvm)\n\n\ne1071\n\n\nX\n\n\nX\n\n\n\n\n\n\nclass.weights\nmulticlass\nprob\ntwoclass\n\n\n\n\n\n\n\n\nclassif.xgboost\n \nxgboost\n\n\neXtreme Gradient Boosting\n\n\nxgboost\n\n\nX\n\n\nX\n\n\n\n\nX\n\n\nmulticlass\nprob\ntwoclass\n\n\nAll setting are passed directly, rather than through xgboost's 'param'. 
'rounds' set to 1 by default\n\n\n\n\n\n\nclassif.xyf\n \nxyf\n\n\nX-Y fused self-organising maps\n\n\nkohonen\n\n\nX\n\n\n\n\n\n\n\n\nmulticlass\nprob\ntwoclass\n\n\n\n\n\n\n\n\n\n\nRegression (49)\n\n\n\n\n\n\n\n\nID / Short Name\n\n\nName\n\n\nPackages\n\n\nNum.\n\n\nFac.\n\n\nNAs\n\n\nWeights\n\n\nProps\n\n\nNote\n\n\n\n\n\n\n\n\n\n\nregr.avNNet\n \navNNet\n\n\nNeural Network\n\n\nnnet\n\n\nX\n\n\nX\n\n\n\n\nX\n\n\n\n\nsize\n has been set to 3 by default.\n\n\n\n\n\n\nregr.bartMachine\n \nbartmachine\n\n\nBayesian Additive Regression Trees\n\n\nbartMachine\n\n\nX\n\n\nX\n\n\nX\n\n\n\n\n\n\n'use_missing_data' has been set to TRUE by default to allow missing data support\n\n\n\n\n\n\nregr.bcart\n \nbcart\n\n\nBayesian CART\n\n\ntgp\n\n\nX\n\n\nX\n\n\n\n\n\n\nse\n\n\n\n\n\n\n\n\nregr.bdk\n \nbdk\n\n\nBi-Directional Kohonen map\n\n\nkohonen\n\n\nX\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nregr.bgp\n \nbgp\n\n\nBayesian Gaussian Process\n\n\ntgp\n\n\nX\n\n\n\n\n\n\n\n\nse\n\n\n\n\n\n\n\n\nregr.bgpllm\n \nbgpllm\n\n\nBayesian Gaussian Process with jumps to the Limiting Linear Model\n\n\ntgp\n\n\nX\n\n\n\n\n\n\n\n\nse\n\n\n\n\n\n\n\n\nregr.blackboost\n \nblackbst\n\n\nGradient Boosting with Regression Trees\n\n\nmboost\nparty\n\n\nX\n\n\nX\n\n\nX\n\n\nX\n\n\n\n\nsee ?ctree_control for possible breakage for nominal features with missingness\n\n\n\n\n\n\nregr.blm\n \nblm\n\n\nBayesian Linear Model\n\n\ntgp\n\n\nX\n\n\n\n\n\n\n\n\nse\n\n\n\n\n\n\n\n\nregr.brnn\n \nbrnn\n\n\nBayesian regularization for feed-forward neural networks\n\n\nbrnn\n\n\nX\n\n\nX\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nregr.bst\n \nbst\n\n\nGradient Boosting\n\n\nbst\n\n\nX\n\n\n\n\n\n\n\n\n\n\nThe argument \nlearner\n has been renamed to \nLearner\n due to a name conflict with \nsetHyerPars\n\n\n\n\n\n\nregr.btgp\n \nbtgp\n\n\nBayesian Treed Gaussian Process\n\n\ntgp\n\n\nX\n\n\nX\n\n\n\n\n\n\nse\n\n\n\n\n\n\n\n\nregr.btgpllm\n \nbtgpllm\n\n\nBayesian Treed Gaussian Process with jumps to the Limiting Linear Model\n\n\ntgp\n\n\nX\n\n\nX\n\n\n\n\n\n\nse\n\n\n\n\n\n\n\n\nregr.btlm\n \nbtlm\n\n\nBayesian Treed Linear Model\n\n\ntgp\n\n\nX\n\n\nX\n\n\n\n\n\n\nse\n\n\n\n\n\n\n\n\nregr.cforest\n \ncforest\n\n\nRandom Forest Based on Conditional Inference Trees\n\n\nparty\n\n\nX\n\n\nX\n\n\nX\n\n\nX\n\n\nordered\n\n\nsee ?ctree_control for possible breakage for nominal features with missingness\n\n\n\n\n\n\nregr.crs\n \ncrs\n\n\nRegression Splines\n\n\ncrs\n\n\nX\n\n\nX\n\n\n\n\nX\n\n\nse\n\n\n\n\n\n\n\n\nregr.ctree\n \nctree\n\n\nConditional Inference Trees\n\n\nparty\n\n\nX\n\n\nX\n\n\nX\n\n\nX\n\n\nordered\n\n\nsee ?ctree_control for possible breakage for nominal features with missingness\n\n\n\n\n\n\nregr.cubist\n \ncubist\n\n\nCubist\n\n\nCubist\n\n\nX\n\n\nX\n\n\nX\n\n\n\n\n\n\n\n\n\n\n\n\nregr.earth\n \nearth\n\n\nMultivariate Adaptive Regression Splines\n\n\nearth\n\n\nX\n\n\nX\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nregr.elmNN\n \nelmNN\n\n\nExtreme Learning Machine for Single Hidden Layer Feedforward Neural Networks\n\n\nelmNN\n\n\nX\n\n\n\n\n\n\n\n\n\n\nnhid has been set to 1 and actfun has been set to \"sig\" by default\n\n\n\n\n\n\nregr.extraTrees\n \nextraTrees\n\n\nExtremely Randomized Trees\n\n\nextraTrees\n\n\nX\n\n\n\n\n\n\nX\n\n\n\n\n\n\n\n\n\n\nregr.fnn\n \nfnn\n\n\nFast k-Nearest Neighbor\n\n\nFNN\n\n\nX\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nregr.frbs\n \nfrbs\n\n\nFuzzy Rule-based Systems\n\n\nfrbs\n\n\nX\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nregr.gbm\n \ngbm\n\n\nGradient Boosting 
Machine\n\n\ngbm\n\n\nX\n\n\nX\n\n\nX\n\n\nX\n\n\n\n\ndistribution\n has been set to \ngaussian\n by default.\n\n\n\n\n\n\nregr.glmboost\n \nglmboost\n\n\nBoosting for GLMs\n\n\nmboost\n\n\nX\n\n\nX\n\n\n\n\nX\n\n\n\n\nMaximum number of boosting iterations is set via 'mstop', the actual number used is controlled by 'm'.\n\n\n\n\n\n\nregr.glmnet\n \nglmnet\n\n\nGLM with Lasso or Elasticnet Regularization\n\n\nglmnet\n\n\nX\n\n\nX\n\n\n\n\nX\n\n\nordered\n\n\nFactors automatically get converted to dummy columns, ordered factors to integer\n\n\n\n\n\n\nregr.IBk\n \nibk\n\n\nK-Nearest Neighbours\n\n\nRWeka\n\n\nX\n\n\nX\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nregr.kknn\n \nkknn\n\n\nK-Nearest-Neighbor regression\n\n\nkknn\n\n\nX\n\n\nX\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nregr.ksvm\n \nksvm\n\n\nSupport Vector Machines\n\n\nkernlab\n\n\nX\n\n\nX\n\n\n\n\n\n\n\n\nKernel parameters have to be passed directly and not by using the kpar list in ksvm. Note that \nfit\n has been set to \nFALSE\n by default for speed.\n\n\n\n\n\n\nregr.LiblineaRL2L1SVR\n \nliblinl2l1svr\n\n\nL2-Regularized L1-Loss Support Vector Regression\n\n\nLiblineaR\n\n\nX\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nregr.LiblineaRL2L2SVR\n \nliblinl2l2svr\n\n\nL2-Regularized L2-Loss Support Vector Regression\n\n\nLiblineaR\n\n\nX\n\n\n\n\n\n\n\n\n\n\ntype 11 is primal and 12 is dual problem\n\n\n\n\n\n\nregr.lm\n \nlm\n\n\nSimple Linear Regression\n\n\nstats\n\n\nX\n\n\nX\n\n\n\n\nX\n\n\nse\n\n\n\n\n\n\n\n\nregr.mars\n \nmars\n\n\nMultivariate Adaptive Regression Splines\n\n\nmda\n\n\nX\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nregr.mob\n \nmob\n\n\nModel-based Recursive Partitioning Yielding a Tree with Fitted Models Associated with each Terminal Node\n\n\nparty\n\n\nX\n\n\nX\n\n\n\n\nX\n\n\n\n\n\n\n\n\n\n\nregr.nnet\n \nnnet\n\n\nNeural Network\n\n\nnnet\n\n\nX\n\n\nX\n\n\n\n\nX\n\n\n\n\nsize\n has been set to 3 by default.\n\n\n\n\n\n\nregr.nodeHarvest\n \nnodeHarvest\n\n\nNode Harvest\n\n\nnodeHarvest\n\n\nX\n\n\nX\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nregr.pcr\n \npcr\n\n\nPrincipal Component Regression\n\n\npls\n\n\nX\n\n\nX\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nregr.penalized.lasso\n \nlasso\n\n\nLasso Regression\n\n\npenalized\n\n\nX\n\n\nX\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nregr.penalized.ridge\n \nridge\n\n\nPenalized Ridge Regression\n\n\npenalized\n\n\nX\n\n\nX\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nregr.plsr\n \nplsr\n\n\nPartial Least Squares Regression\n\n\npls\n\n\nX\n\n\nX\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nregr.randomForest\n \nrf\n\n\nRandom Forest\n\n\nrandomForest\n\n\nX\n\n\nX\n\n\n\n\n\n\nordered\nse\n\n\n\n\n\n\n\n\nregr.randomForestSRC\n \nrfsrc\n\n\nRandom Forest\n\n\nrandomForestSRC\n\n\nX\n\n\nX\n\n\nX\n\n\n\n\n\n\nna.action' has been set to 'na.impute' by default to allow missing data support\n\n\n\n\n\n\nregr.ranger\n \nranger\n\n\nRandom Forests\n\n\nranger\n\n\nX\n\n\nX\n\n\n\n\n\n\n\n\nBy default, internal parallelization is switched off (num.threads = 1) and verbose output is disabled. Both settings are changeable.\n\n\n\n\n\n\nregr.rknn\n \nrknn\n\n\nRandom k-Nearest-Neighbors\n\n\nrknn\n\n\nX\n\n\n\n\n\n\n\n\nordered\n\n\n\n\n\n\n\n\nregr.rpart\n \nrpart\n\n\nDecision Tree\n\n\nrpart\n\n\nX\n\n\nX\n\n\nX\n\n\nX\n\n\nordered\n\n\nxval\n has been set to 0 by default for speed.\n\n\n\n\n\n\nregr.rsm\n \nrsm\n\n\nResponse Surface Regression\n\n\nrsm\n\n\nX\n\n\n\n\n\n\n\n\n\n\nYou select the order of the regression by using modelfun = \"FO\" (first order), \"TWI\" (two-way interactions, this is with 1st oder terms!) 
and \"SO\" (full second order)\n\n\n\n\n\n\nregr.rvm\n \nrvm\n\n\nRelevance Vector Machine\n\n\nkernlab\n\n\nX\n\n\nX\n\n\n\n\n\n\n\n\nKernel parameters have to be passed directly and not by using the kpar list in rvm. Note that \nfit\n has been set to \nFALSE\n by default for speed.\n\n\n\n\n\n\nregr.svm\n \nsvm\n\n\nSupport Vector Machines (libsvm)\n\n\ne1071\n\n\nX\n\n\nX\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nregr.xgboost\n \nxgboost\n\n\neXtreme Gradient Boosting\n\n\nxgboost\n\n\nX\n\n\nX\n\n\n\n\nX\n\n\n\n\nAll setting are passed directly, rather than through xgboost's 'param'. 'rounds' set to 1 by default\n\n\n\n\n\n\nregr.xyf\n \nxyf\n\n\nX-Y fused self-organising maps\n\n\nkohonen\n\n\nX\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nSurvival analysis (9)\n\n\n\n\n\n\n\n\nID / Short Name\n\n\nName\n\n\nPackages\n\n\nNum.\n\n\nFac.\n\n\nNAs\n\n\nWeights\n\n\nProps\n\n\nNote\n\n\n\n\n\n\n\n\n\n\nsurv.cforest\n \ncrf\n\n\nRandom Forest based on Conditional Inference Trees\n\n\nparty\nsurvival\n\n\nX\n\n\nX\n\n\nX\n\n\nX\n\n\nordered\nrcens\n\n\nsee ?ctree_control for possible breakage for nominal features with missingness\n\n\n\n\n\n\nsurv.coxph\n \ncoxph\n\n\nCox Proportional Hazard Model\n\n\nsurvival\n\n\nX\n\n\nX\n\n\nX\n\n\nX\n\n\nprob\nrcens\n\n\n\n\n\n\n\n\nsurv.cvglmnet\n \ncvglmnet\n\n\nGLM with Regularization (Cross Validated Lambda)\n\n\nglmnet\n\n\nX\n\n\nX\n\n\n\n\nX\n\n\nordered\nrcens\n\n\nFactors automatically get converted to dummy columns, ordered factors to integer\n\n\n\n\n\n\nsurv.glmboost\n \nglmboost\n\n\nGradient Boosting with Componentwise Linear Models\n\n\nsurvival\nmboost\n\n\nX\n\n\nX\n\n\n\n\nX\n\n\nordered\nrcens\n\n\nfamily\n has been set to \nCoxPH()\n by default. Maximum number of boosting iterations is set via 'mstop', the actual number used for prediction is controlled by 'm'.\n\n\n\n\n\n\nsurv.glmnet\n \nglmnet\n\n\nGLM with Regularization\n\n\nglmnet\n\n\nX\n\n\nX\n\n\n\n\nX\n\n\nordered\nrcens\n\n\nFactors automatically get converted to dummy columns, ordered factors to integer\n\n\n\n\n\n\nsurv.penalized\n \npenalized\n\n\nPenalized Regression\n\n\npenalized\n\n\nX\n\n\nX\n\n\n\n\n\n\nordered\nrcens\n\n\nFactors automatically get converted to dummy columns, ordered factors to integer\n\n\n\n\n\n\nsurv.randomForestSRC\n \nrfsrc\n\n\nRandom Forests for Survival\n\n\nsurvival\nrandomForestSRC\n\n\nX\n\n\nX\n\n\nX\n\n\n\n\nordered\nrcens\n\n\n'na.action' has been set to 'na.impute' by default to allow missing data support\n\n\n\n\n\n\nsurv.ranger\n \nranger\n\n\nRandom Forests\n\n\nranger\n\n\nX\n\n\nX\n\n\n\n\n\n\nprob\nrcens\n\n\nBy default, internal parallelization is switched off (num.threads = 1) and verbose output is disabled. 
Both settings are changeable.\n\n\n\n\n\n\nsurv.rpart\n \nrpart\n\n\nSurvival Tree\n\n\nrpart\n\n\nX\n\n\nX\n\n\nX\n\n\nX\n\n\nordered\nrcens\n\n\nxval\n has been set to 0 by default for speed.\n\n\n\n\n\n\n\n\nCluster analysis (5)\n\n\n\n\n\n\n\n\nID / Short Name\n\n\nName\n\n\nPackages\n\n\nNum.\n\n\nFac.\n\n\nNAs\n\n\nWeights\n\n\nProps\n\n\nNote\n\n\n\n\n\n\n\n\n\n\ncluster.Cobweb\n \ncobweb\n\n\nCobweb Clustering Algorithm\n\n\nRWeka\n\n\nX\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\ncluster.EM\n \nem\n\n\nExpectation-Maximization Clustering\n\n\nRWeka\n\n\nX\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\ncluster.FarthestFirst\n \nfarthestfirst\n\n\nFarthestFirst Clustering Algorithm\n\n\nRWeka\n\n\nX\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\ncluster.SimpleKMeans\n \nsimplekmeans\n\n\nK-Means Clustering\n\n\nRWeka\n\n\nX\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\ncluster.XMeans\n \nxmeans\n\n\nXMeans (k-means with automatic determination of k)\n\n\nRWeka\n\n\nX\n\n\n\n\n\n\n\n\n\n\nYou may have to install the XMeans Weka package: WPM('install-package', 'XMeans').\n\n\n\n\n\n\n\n\nCost-sensitive classification\n\n\nFor \nordinary misclassification costs\n you can use all the standard classification methods listed\nabove.\n\n\nFor \nexample-dependent costs\n there are several ways to generate cost-sensitive learners from\nordinary regression and classification learners.\nSee section \ncost-sensitive classification\n and the documentation\nof \nmakeCostSensClassifWrapper\n, \nmakeCostSensRegrWrapper\n and \nmakeCostSensWeightedPairsWrapper\n\nfor details.\n\n\nMultilabel classification (1)\n\n\n\n\n\n\n\n\nID / Short Name\n\n\nName\n\n\nPackages\n\n\nNum.\n\n\nFac.\n\n\nNAs\n\n\nWeights\n\n\nProps\n\n\nNote\n\n\n\n\n\n\n\n\n\n\nmultilabel.rFerns\n \nrFerns\n\n\nRandom ferns\n\n\nrFerns\n\n\nX\n\n\nX\n\n\n\n\n\n\nordered\n\n\n\n\n\n\n\n\n\n\nMoreover, you can use the binary relevance method to apply ordinary classification learners\nto the multilabel problem. See the documentation of function \nmakeMultilabelBinaryRelevanceWrapper\n\nand the tutorial section on \nmultilabel classification\n for details.", "title": "Integrated Learners" }, { "location": "/integrated_learners/index.html#integrated-learners", - "text": "This page lists the learning methods already integrated in mlr . Columns Num. , Fac. , NAs , and Weights indicate if a method can cope with\nnumerical and factor predictors, if it can deal with missing values in a meaningful way\n(other than simply removing observations with missing values) and if observation\nweights are supported. Column Props shows further properties of the learning methods. ordered indicates that a method can deal with ordered factor features.\nFor classification , you can see if binary and/or multi-class problems are supported\nand if the learner accepts class weights.\nFor survival analysis , the censoring type is shown.\nFor example rcens means that the learning method can deal with right censored data.\nMoreover, the type of prediction is displayed, where prob indicates that probabilities\ncan be predicted.\nFor regression , se means that additional to the mean response standard errors can be predicted.\nSee also RLearner for details. Classification (70) ID / Short Name Name Packages Num. Fac. NAs Weights Props Note classif.ada ada ada Boosting ada X X X prob twoclass classif.avNNet avNNet Neural Network nnet X X X multiclass prob twoclass size has been set to 3 by default. 
Doing bagging training of nnet if set bag=TRUE classif.bartMachine bartmachine Bayesian Additive Regression Trees bartMachine X X X prob twoclass 'use_missing_data' has been set to TRUE by default to allow missing data support classif.bdk bdk Bi-Directional Kohonen map kohonen X multiclass prob twoclass classif.binomial binomial Binomial Regression stats X X X prob twoclass Delegates to glm with freely choosable binomial link function via learner param 'link'. classif.blackboost blackbst Gradient Boosting With Regression Trees mboost party X X X X prob twoclass see ?ctree_control for possible breakage for nominal features with missingness classif.boosting adabag Adabag Boosting adabag rpart X X X multiclass prob twoclass xval has been set to 0 by default for speed. classif.bst bst Gradient Boosting bst X twoclass The argument learner has been renamed to Learner due to a name conflict with setHyerPars Learner has been set to lm by default. classif.cforest cforest Random forest based on conditional inference trees party X X X X multiclass ordered prob twoclass see ?ctree_control for possible breakage for nominal features with missingness classif.clusterSVM clusterSVM Clustered Support Vector Machines SwarmSVM LiblineaR X twoclass centers set to 2 by default classif.ctree ctree Conditional Inference Trees party X X X X multiclass ordered prob twoclass see ?ctree_control for possible breakage for nominal features with missingness classif.dbnDNN dbn.dnn Deep neural network with weights initialized by DBN deepnet X multiclass prob twoclass output set to softmax by default classif.dcSVM dcSVM Divided-Conquer Support Vector Machines SwarmSVM X twoclass classif.extraTrees extraTrees Extremely Randomized Trees extraTrees X X multiclass prob twoclass classif.fnn fnn Fast k-Nearest Neighbour FNN X multiclass twoclass classif.gaterSVM gaterSVM Mixture of SVMs with Neural Network Gater Function SwarmSVM e1071 X twoclass m set to 3 and max.iter set to 1 by default classif.gbm gbm Gradient Boosting Machine gbm X X X X multiclass prob twoclass classif.geoDA geoda Geometric Predictive Discriminant Analysis DiscriMiner X multiclass twoclass classif.glmboost glmbst Boosting for GLMs mboost X X X prob twoclass family has been set to Binomial() by default. Maximum number of boosting iterations is set via 'mstop', the actual number used for prediction is controlled by 'm'. classif.glmnet glmnet GLM with Lasso or Elasticnet Regularization glmnet X X X multiclass prob twoclass Factors automatically get converted to dummy columns, ordered factors to integer classif.hdrda hdrda High-Dimensional Regularized Discriminant Analysis sparsediscrim X prob twoclass classif.IBk ibk k-Nearest Neighbours RWeka X X multiclass prob twoclass classif.J48 j48 J48 Decision Trees RWeka X X X multiclass prob twoclass NAs are directly passed to WEKA with na.action = na.pass classif.JRip jrip Propositional Rule Learner RWeka X X X multiclass prob twoclass NAs are directly passed to WEKA with na.action = na.pass classif.kknn kknn k-Nearest Neighbor kknn X X multiclass prob twoclass classif.knn knn k-Nearest Neighbor class X multiclass twoclass classif.ksvm ksvm Support Vector Machines kernlab X X class.weights multiclass prob twoclass Kernel parameters have to be passed directly and not by using the kpar list in ksvm. Note that fit has been set to FALSE by default for speed. classif.lda lda Linear Discriminant Analysis MASS X X multiclass prob twoclass Learner param 'predict.method' maps to 'method' in predict.lda. 
classif.LiblineaRL1L2SVC liblinl1l2svc L1-Regularized L2-Loss Support Vector Classification LiblineaR X class.weights multiclass twoclass classif.LiblineaRL1LogReg liblinl1logreg L1-Regularized Logistic Regression LiblineaR X class.weights multiclass prob twoclass classif.LiblineaRL2L1SVC liblinl2l1svc L2-Regularized L1-Loss Support Vector Classification LiblineaR X class.weights multiclass twoclass classif.LiblineaRL2LogReg liblinl2logreg L2-Regularized Logistic Regression LiblineaR X class.weights multiclass prob twoclass type 0 is primal and type 7 is dual problem classif.LiblineaRL2SVC liblinl2svc L2-Regularized L2-Loss Support Vector Classification LiblineaR X class.weights multiclass twoclass type 2 is primal and type 1 is dual problem classif.LiblineaRMultiClassSVC liblinmulticlasssvc Support Vector Classification by Crammer and Singer LiblineaR X class.weights multiclass twoclass classif.linDA linda Linear Discriminant Analysis DiscriMiner X multiclass twoclass classif.logreg logreg Logistic Regression stats X X X prob twoclass Delegates to glm with family binomial/logit. classif.lqa lqa Fitting penalized Generalized Linear Models with the LQA algorithm lqa X X prob twoclass penalty has been set to lasso and lambda to 0.1 by default. classif.lssvm lssvm Least Squares Support Vector Machine kernlab X X multiclass twoclass fitted has been set to FALSE by default for speed. classif.lvq1 lvq1 Learning Vector Quantization class X multiclass twoclass classif.mda mda Mixture Discriminant Analysis mda X X multiclass prob twoclass keep.fitted has been set to FALSE by default for speed and we use start.method='lvq' for more robust behavior / less technical crashes classif.mlp mlp Multi-Layer Perceptron RSNNS X multiclass prob twoclass classif.multinom multinom Multinomial Regression nnet X X X multiclass prob twoclass classif.naiveBayes nbayes Naive Bayes e1071 X X X multiclass prob twoclass classif.neuralnet neuralnet Neural Network from neuralnet neuralnet X prob twoclass err.fct has been set to ce to do classification. classif.nnet nnet Neural Network nnet X X X multiclass prob twoclass size has been set to 3 by default. classif.nnTrain nn.train Training Neural Network by Backpropagation deepnet X multiclass prob twoclass output set to softmax by default classif.nodeHarvest nodeHarvest Node Harvest nodeHarvest X X prob twoclass classif.OneR oner 1-R Classifier RWeka X X X multiclass prob twoclass NAs are directly passed to WEKA with na.action = na.pass classif.pamr pamr Nearest shrunken centroid pamr X prob twoclass threshold for prediction ( threshold.predict has been set to 1 by default classif.PART part PART Decision Lists RWeka X X X multiclass prob twoclass NAs are directly passed to WEKA with na.action = na.pass classif.plr plr Logistic Regression with a L2 Penalty stepPlr X X X prob twoclass AIC and BIC penalty types can be selected via the new parameter cp.type classif.plsdaCaret plsdacaret Partial Least Squares (PLS) Discriminant Analysis caret X prob twoclass classif.probit probit Probit Regression stats X X X prob twoclass Delegates to glm with family binomial/probit. classif.qda qda Quadratic Discriminant Analysis MASS X X multiclass prob twoclass Learner param 'predict.method' maps to 'method' in predict.lda. 
classif.quaDA quada Quadratic Discriminant Analysis DiscriMiner X multiclass twoclass classif.randomForest rf Random Forest randomForest X X class.weights multiclass ordered prob twoclass classif.randomForestSRC rfsrc Random Forest randomForestSRC X X X multiclass prob twoclass 'na.action' has been set to 'na.impute' by default to allow missing data support classif.ranger ranger Random Forests ranger X X multiclass prob twoclass classif.rda rda Regularized Discriminant Analysis klaR X X multiclass prob twoclass estimate.error has been set to FALSE by default for speed. classif.rFerns rFerns Random ferns rFerns X X multiclass ordered twoclass classif.rknn rknn Random k-Nearest-Neighbors rknn X multiclass ordered twoclass classif.rotationForest rotationForest Rotation Forest rotationForest X X ordered prob twoclass classif.rpart rpart Decision Tree rpart X X X X multiclass ordered prob twoclass xval has been set to 0 by default for speed. classif.rrlda rrlda Robust Regularized Linear Discriminant Analysis rrlda X multiclass twoclass classif.saeDNN sae.dnn Deep neural network with weights initialized by Stacked AutoEncoder deepnet X multiclass prob twoclass output set to softmax by default classif.sda sda Shrinkage Discriminant Analysis sda X multiclass prob twoclass classif.sparseLDA sparseLDA Sparse Discriminant Analysis sparseLDA MASS elasticnet X multiclass prob twoclass Arguments Q and stop are not yet provided as they depend on the task. classif.svm svm Support Vector Machines (libsvm) e1071 X X class.weights multiclass prob twoclass classif.xgboost xgboost eXtreme Gradient Boosting xgboost X X X multiclass prob twoclass All setting are passed directly, rather than through xgboost's 'param'. 'rounds' set to 1 by default classif.xyf xyf X-Y fused self-organising maps kohonen X multiclass prob twoclass Regression (49) ID / Short Name Name Packages Num. Fac. NAs Weights Props Note regr.avNNet avNNet Neural Network nnet X X X size has been set to 3 by default. 
regr.bartMachine bartmachine Bayesian Additive Regression Trees bartMachine X X X 'use_missing_data' has been set to TRUE by default to allow missing data support regr.bcart bcart Bayesian CART tgp X X se regr.bdk bdk Bi-Directional Kohonen map kohonen X regr.bgp bgp Bayesian Gaussian Process tgp X se regr.bgpllm bgpllm Bayesian Gaussian Process with jumps to the Limiting Linear Model tgp X se regr.blackboost blackbst Gradient Boosting with Regression Trees mboost party X X X X see ?ctree_control for possible breakage for nominal features with missingness regr.blm blm Bayesian Linear Model tgp X se regr.brnn brnn Bayesian regularization for feed-forward neural networks brnn X X regr.bst bst Gradient Boosting bst X The argument learner has been renamed to Learner due to a name conflict with setHyerPars regr.btgp btgp Bayesian Treed Gaussian Process tgp X X se regr.btgpllm btgpllm Bayesian Treed Gaussian Process with jumps to the Limiting Linear Model tgp X X se regr.btlm btlm Bayesian Treed Linear Model tgp X X se regr.cforest cforest Random Forest Based on Conditional Inference Trees party X X X X ordered see ?ctree_control for possible breakage for nominal features with missingness regr.crs crs Regression Splines crs X X X se regr.ctree ctree Conditional Inference Trees party X X X X ordered see ?ctree_control for possible breakage for nominal features with missingness regr.cubist cubist Cubist Cubist X X X regr.earth earth Multivariate Adaptive Regression Splines earth X X regr.elmNN elmNN Extreme Learning Machine for Single Hidden Layer Feedforward Neural Networks elmNN X nhid has been set to 1 and actfun has been set to \"sig\" by default regr.extraTrees extraTrees Extremely Randomized Trees extraTrees X X regr.fnn fnn Fast k-Nearest Neighbor FNN X regr.frbs frbs Fuzzy Rule-based Systems frbs X regr.gbm gbm Gradient Boosting Machine gbm X X X X distribution has been set to gaussian by default. regr.glmboost glmboost Boosting for GLMs mboost X X X Maximum number of boosting iterations is set via 'mstop', the actual number used is controlled by 'm'. regr.glmnet glmnet GLM with Lasso or Elasticnet Regularization glmnet X X X ordered Factors automatically get converted to dummy columns, ordered factors to integer regr.IBk ibk K-Nearest Neighbours RWeka X X regr.kknn kknn K-Nearest-Neighbor regression kknn X X regr.ksvm ksvm Support Vector Machines kernlab X X Kernel parameters have to be passed directly and not by using the kpar list in ksvm. Note that fit has been set to FALSE by default for speed. regr.LiblineaRL2L1SVR liblinl2l1svr L2-Regularized L1-Loss Support Vector Regression LiblineaR X regr.LiblineaRL2L2SVR liblinl2l2svr L2-Regularized L2-Loss Support Vector Regression LiblineaR X type 11 is primal and 12 is dual problem regr.lm lm Simple Linear Regression stats X X X se regr.mars mars Multivariate Adaptive Regression Splines mda X regr.mob mob Model-based Recursive Partitioning Yielding a Tree with Fitted Models Associated with each Terminal Node party X X X regr.nnet nnet Neural Network nnet X X X size has been set to 3 by default. 
regr.nodeHarvest nodeHarvest Node Harvest nodeHarvest X X regr.pcr pcr Principal Component Regression pls X X regr.penalized.lasso lasso Lasso Regression penalized X X regr.penalized.ridge ridge Penalized Ridge Regression penalized X X regr.plsr plsr Partial Least Squares Regression pls X X regr.randomForest rf Random Forest randomForest X X ordered se regr.randomForestSRC rfsrc Random Forest randomForestSRC X X X na.action' has been set to 'na.impute' by default to allow missing data support regr.ranger ranger Random Forests ranger X X regr.rknn rknn Random k-Nearest-Neighbors rknn X ordered regr.rpart rpart Decision Tree rpart X X X X ordered xval has been set to 0 by default for speed. regr.rsm rsm Response Surface Regression rsm X You select the order of the regression by using modelfun = \"FO\" (first order), \"TWI\" (two-way interactions, this is with 1st oder terms!) and \"SO\" (full second order) regr.rvm rvm Relevance Vector Machine kernlab X X Kernel parameters have to be passed directly and not by using the kpar list in rvm. Note that fit has been set to FALSE by default for speed. regr.svm svm Support Vector Machines (libsvm) e1071 X X regr.xgboost xgboost eXtreme Gradient Boosting xgboost X X X All setting are passed directly, rather than through xgboost's 'param'. 'rounds' set to 1 by default regr.xyf xyf X-Y fused self-organising maps kohonen X Survival analysis (9) ID / Short Name Name Packages Num. Fac. NAs Weights Props Note surv.cforest crf Random Forest based on Conditional Inference Trees party survival X X X X ordered rcens see ?ctree_control for possible breakage for nominal features with missingness surv.coxph coxph Cox Proportional Hazard Model survival X X X X prob rcens surv.cvglmnet cvglmnet GLM with Regularization (Cross Validated Lambda) glmnet X X X ordered rcens Factors automatically get converted to dummy columns, ordered factors to integer surv.glmboost glmboost Gradient Boosting with Componentwise Linear Models survival mboost X X X ordered rcens family has been set to CoxPH() by default. Maximum number of boosting iterations is set via 'mstop', the actual number used for prediction is controlled by 'm'. surv.glmnet glmnet GLM with Regularization glmnet X X X ordered rcens Factors automatically get converted to dummy columns, ordered factors to integer surv.penalized penalized Penalized Regression penalized X X ordered rcens Factors automatically get converted to dummy columns, ordered factors to integer surv.randomForestSRC rfsrc Random Forests for Survival survival randomForestSRC X X X ordered rcens 'na.action' has been set to 'na.impute' by default to allow missing data support surv.ranger ranger Random Forests ranger X X prob rcens surv.rpart rpart Survival Tree rpart X X X X ordered rcens xval has been set to 0 by default for speed. Cluster analysis (5) ID / Short Name Name Packages Num. Fac. NAs Weights Props Note cluster.Cobweb cobweb Cobweb Clustering Algorithm RWeka X cluster.EM em Expectation-Maximization Clustering RWeka X cluster.FarthestFirst farthestfirst FarthestFirst Clustering Algorithm RWeka X cluster.SimpleKMeans simplekmeans K-Means Clustering RWeka X cluster.XMeans xmeans XMeans (k-means with automatic determination of k) RWeka X You may have to install the XMeans Weka package: WPM('install-package', 'XMeans'). Cost-sensitive classification For ordinary misclassification costs you can use all the standard classification methods listed\nabove. 
For example-dependent costs there are several ways to generate cost-sensitive learners from\nordinary regression and classification learners.\nSee section cost-sensitive classification and the documentation\nof makeCostSensClassifWrapper , makeCostSensRegrWrapper and makeCostSensWeightedPairsWrapper \nfor details. Multilabel classification (1) ID / Short Name Name Packages Num. Fac. NAs Weights Props Note multilabel.rFerns rFerns Random ferns rFerns X X ordered Moreover, you can use the binary relevance method to apply ordinary classification learners\nto the multilabel problem. See the documentation of function makeMultilabelBinaryRelevanceWrapper \nand the tutorial section on multilabel classification for details.", + "text": "This page lists the learning methods already integrated in mlr . Columns Num. , Fac. , NAs , and Weights indicate if a method can cope with\nnumerical and factor predictors, if it can deal with missing values in a meaningful way\n(other than simply removing observations with missing values) and if observation\nweights are supported. Column Props shows further properties of the learning methods. ordered indicates that a method can deal with ordered factor features.\nFor classification , you can see if binary and/or multi-class problems are supported\nand if the learner accepts class weights.\nFor survival analysis , the censoring type is shown.\nFor example rcens means that the learning method can deal with right censored data.\nMoreover, the type of prediction is displayed, where prob indicates that probabilities\ncan be predicted.\nFor regression , se means that additional to the mean response standard errors can be predicted.\nSee also RLearner for details. Classification (70) ID / Short Name Name Packages Num. Fac. NAs Weights Props Note classif.ada ada ada Boosting ada X X X prob twoclass classif.avNNet avNNet Neural Network nnet X X X multiclass prob twoclass size has been set to 3 by default. Doing bagging training of nnet if set bag=TRUE . classif.bartMachine bartmachine Bayesian Additive Regression Trees bartMachine X X X prob twoclass 'use_missing_data' has been set to TRUE by default to allow missing data support classif.bdk bdk Bi-Directional Kohonen map kohonen X multiclass prob twoclass classif.binomial binomial Binomial Regression stats X X X prob twoclass Delegates to glm with freely choosable binomial link function via learner param 'link'. classif.blackboost blackbst Gradient Boosting With Regression Trees mboost party X X X X prob twoclass see ?ctree_control for possible breakage for nominal features with missingness classif.boosting adabag Adabag Boosting adabag rpart X X X multiclass prob twoclass xval has been set to 0 by default for speed. classif.bst bst Gradient Boosting bst X twoclass The argument learner has been renamed to Learner due to a name conflict with setHyerPars . Learner has been set to lm by default. 
classif.cforest cforest Random forest based on conditional inference trees party X X X X multiclass ordered prob twoclass see ?ctree_control for possible breakage for nominal features with missingness classif.clusterSVM clusterSVM Clustered Support Vector Machines SwarmSVM LiblineaR X twoclass centers set to 2 by default classif.ctree ctree Conditional Inference Trees party X X X X multiclass ordered prob twoclass see ?ctree_control for possible breakage for nominal features with missingness classif.dbnDNN dbn.dnn Deep neural network with weights initialized by DBN deepnet X multiclass prob twoclass output set to softmax by default classif.dcSVM dcSVM Divided-Conquer Support Vector Machines SwarmSVM X twoclass classif.extraTrees extraTrees Extremely Randomized Trees extraTrees X X multiclass prob twoclass classif.fnn fnn Fast k-Nearest Neighbour FNN X multiclass twoclass classif.gaterSVM gaterSVM Mixture of SVMs with Neural Network Gater Function SwarmSVM e1071 X twoclass m set to 3 and max.iter set to 1 by default classif.gbm gbm Gradient Boosting Machine gbm X X X X multiclass prob twoclass classif.geoDA geoda Geometric Predictive Discriminant Analysis DiscriMiner X multiclass twoclass classif.glmboost glmbst Boosting for GLMs mboost X X X prob twoclass family has been set to Binomial() by default. Maximum number of boosting iterations is set via 'mstop', the actual number used for prediction is controlled by 'm'. classif.glmnet glmnet GLM with Lasso or Elasticnet Regularization glmnet X X X multiclass prob twoclass Factors automatically get converted to dummy columns, ordered factors to integer classif.hdrda hdrda High-Dimensional Regularized Discriminant Analysis sparsediscrim X prob twoclass classif.IBk ibk k-Nearest Neighbours RWeka X X multiclass prob twoclass classif.J48 j48 J48 Decision Trees RWeka X X X multiclass prob twoclass NAs are directly passed to WEKA with na.action = na.pass classif.JRip jrip Propositional Rule Learner RWeka X X X multiclass prob twoclass NAs are directly passed to WEKA with na.action = na.pass classif.kknn kknn k-Nearest Neighbor kknn X X multiclass prob twoclass classif.knn knn k-Nearest Neighbor class X multiclass twoclass classif.ksvm ksvm Support Vector Machines kernlab X X class.weights multiclass prob twoclass Kernel parameters have to be passed directly and not by using the kpar list in ksvm. Note that fit has been set to FALSE by default for speed. classif.lda lda Linear Discriminant Analysis MASS X X multiclass prob twoclass Learner param 'predict.method' maps to 'method' in predict.lda. 
classif.LiblineaRL1L2SVC liblinl1l2svc L1-Regularized L2-Loss Support Vector Classification LiblineaR X class.weights multiclass twoclass classif.LiblineaRL1LogReg liblinl1logreg L1-Regularized Logistic Regression LiblineaR X class.weights multiclass prob twoclass classif.LiblineaRL2L1SVC liblinl2l1svc L2-Regularized L1-Loss Support Vector Classification LiblineaR X class.weights multiclass twoclass classif.LiblineaRL2LogReg liblinl2logreg L2-Regularized Logistic Regression LiblineaR X class.weights multiclass prob twoclass type 0 is primal and type 7 is dual problem classif.LiblineaRL2SVC liblinl2svc L2-Regularized L2-Loss Support Vector Classification LiblineaR X class.weights multiclass twoclass type 2 is primal and type 1 is dual problem classif.LiblineaRMultiClassSVC liblinmulticlasssvc Support Vector Classification by Crammer and Singer LiblineaR X class.weights multiclass twoclass classif.linDA linda Linear Discriminant Analysis DiscriMiner X multiclass twoclass classif.logreg logreg Logistic Regression stats X X X prob twoclass Delegates to glm with family binomial/logit. classif.lqa lqa Fitting penalized Generalized Linear Models with the LQA algorithm lqa X X prob twoclass penalty has been set to lasso and lambda to 0.1 by default. classif.lssvm lssvm Least Squares Support Vector Machine kernlab X X multiclass twoclass fitted has been set to FALSE by default for speed. classif.lvq1 lvq1 Learning Vector Quantization class X multiclass twoclass classif.mda mda Mixture Discriminant Analysis mda X X multiclass prob twoclass keep.fitted has been set to FALSE by default for speed and we use start.method='lvq' for more robust behavior / less technical crashes classif.mlp mlp Multi-Layer Perceptron RSNNS X multiclass prob twoclass classif.multinom multinom Multinomial Regression nnet X X X multiclass prob twoclass classif.naiveBayes nbayes Naive Bayes e1071 X X X multiclass prob twoclass classif.neuralnet neuralnet Neural Network from neuralnet neuralnet X prob twoclass err.fct has been set to ce to do classification. classif.nnet nnet Neural Network nnet X X X multiclass prob twoclass size has been set to 3 by default. classif.nnTrain nn.train Training Neural Network by Backpropagation deepnet X multiclass prob twoclass output set to softmax by default classif.nodeHarvest nodeHarvest Node Harvest nodeHarvest X X prob twoclass classif.OneR oner 1-R Classifier RWeka X X X multiclass prob twoclass NAs are directly passed to WEKA with na.action = na.pass classif.pamr pamr Nearest shrunken centroid pamr X prob twoclass threshold for prediction ( threshold.predict ) has been set to 1 by default classif.PART part PART Decision Lists RWeka X X X multiclass prob twoclass NAs are directly passed to WEKA with na.action = na.pass classif.plr plr Logistic Regression with a L2 Penalty stepPlr X X X prob twoclass AIC and BIC penalty types can be selected via the new parameter cp.type classif.plsdaCaret plsdacaret Partial Least Squares (PLS) Discriminant Analysis caret X prob twoclass classif.probit probit Probit Regression stats X X X prob twoclass Delegates to glm with family binomial/probit. classif.qda qda Quadratic Discriminant Analysis MASS X X multiclass prob twoclass Learner param 'predict.method' maps to 'method' in predict.lda. 
classif.quaDA quada Quadratic Discriminant Analysis DiscriMiner X multiclass twoclass classif.randomForest rf Random Forest randomForest X X class.weights multiclass ordered prob twoclass classif.randomForestSRC rfsrc Random Forest randomForestSRC X X X multiclass prob twoclass 'na.action' has been set to 'na.impute' by default to allow missing data support classif.ranger ranger Random Forests ranger X X multiclass prob twoclass By default, internal parallelization is switched off (num.threads = 1) and verbose output is disabled. Both settings are changeable. classif.rda rda Regularized Discriminant Analysis klaR X X multiclass prob twoclass estimate.error has been set to FALSE by default for speed. classif.rFerns rFerns Random ferns rFerns X X multiclass ordered twoclass classif.rknn rknn Random k-Nearest-Neighbors rknn X multiclass ordered twoclass classif.rotationForest rotationForest Rotation Forest rotationForest X X ordered prob twoclass classif.rpart rpart Decision Tree rpart X X X X multiclass ordered prob twoclass xval has been set to 0 by default for speed. classif.rrlda rrlda Robust Regularized Linear Discriminant Analysis rrlda X multiclass twoclass classif.saeDNN sae.dnn Deep neural network with weights initialized by Stacked AutoEncoder deepnet X multiclass prob twoclass output set to softmax by default classif.sda sda Shrinkage Discriminant Analysis sda X multiclass prob twoclass classif.sparseLDA sparseLDA Sparse Discriminant Analysis sparseLDA MASS elasticnet X multiclass prob twoclass Arguments Q and stop are not yet provided as they depend on the task. classif.svm svm Support Vector Machines (libsvm) e1071 X X class.weights multiclass prob twoclass classif.xgboost xgboost eXtreme Gradient Boosting xgboost X X X multiclass prob twoclass All setting are passed directly, rather than through xgboost's 'param'. 'rounds' set to 1 by default classif.xyf xyf X-Y fused self-organising maps kohonen X multiclass prob twoclass Regression (49) ID / Short Name Name Packages Num. Fac. NAs Weights Props Note regr.avNNet avNNet Neural Network nnet X X X size has been set to 3 by default. 
regr.bartMachine bartmachine Bayesian Additive Regression Trees bartMachine X X X 'use_missing_data' has been set to TRUE by default to allow missing data support regr.bcart bcart Bayesian CART tgp X X se regr.bdk bdk Bi-Directional Kohonen map kohonen X regr.bgp bgp Bayesian Gaussian Process tgp X se regr.bgpllm bgpllm Bayesian Gaussian Process with jumps to the Limiting Linear Model tgp X se regr.blackboost blackbst Gradient Boosting with Regression Trees mboost party X X X X see ?ctree_control for possible breakage for nominal features with missingness regr.blm blm Bayesian Linear Model tgp X se regr.brnn brnn Bayesian regularization for feed-forward neural networks brnn X X regr.bst bst Gradient Boosting bst X The argument learner has been renamed to Learner due to a name conflict with setHyerPars regr.btgp btgp Bayesian Treed Gaussian Process tgp X X se regr.btgpllm btgpllm Bayesian Treed Gaussian Process with jumps to the Limiting Linear Model tgp X X se regr.btlm btlm Bayesian Treed Linear Model tgp X X se regr.cforest cforest Random Forest Based on Conditional Inference Trees party X X X X ordered see ?ctree_control for possible breakage for nominal features with missingness regr.crs crs Regression Splines crs X X X se regr.ctree ctree Conditional Inference Trees party X X X X ordered see ?ctree_control for possible breakage for nominal features with missingness regr.cubist cubist Cubist Cubist X X X regr.earth earth Multivariate Adaptive Regression Splines earth X X regr.elmNN elmNN Extreme Learning Machine for Single Hidden Layer Feedforward Neural Networks elmNN X nhid has been set to 1 and actfun has been set to \"sig\" by default regr.extraTrees extraTrees Extremely Randomized Trees extraTrees X X regr.fnn fnn Fast k-Nearest Neighbor FNN X regr.frbs frbs Fuzzy Rule-based Systems frbs X regr.gbm gbm Gradient Boosting Machine gbm X X X X distribution has been set to gaussian by default. regr.glmboost glmboost Boosting for GLMs mboost X X X Maximum number of boosting iterations is set via 'mstop', the actual number used is controlled by 'm'. regr.glmnet glmnet GLM with Lasso or Elasticnet Regularization glmnet X X X ordered Factors automatically get converted to dummy columns, ordered factors to integer regr.IBk ibk K-Nearest Neighbours RWeka X X regr.kknn kknn K-Nearest-Neighbor regression kknn X X regr.ksvm ksvm Support Vector Machines kernlab X X Kernel parameters have to be passed directly and not by using the kpar list in ksvm. Note that fit has been set to FALSE by default for speed. regr.LiblineaRL2L1SVR liblinl2l1svr L2-Regularized L1-Loss Support Vector Regression LiblineaR X regr.LiblineaRL2L2SVR liblinl2l2svr L2-Regularized L2-Loss Support Vector Regression LiblineaR X type 11 is primal and 12 is dual problem regr.lm lm Simple Linear Regression stats X X X se regr.mars mars Multivariate Adaptive Regression Splines mda X regr.mob mob Model-based Recursive Partitioning Yielding a Tree with Fitted Models Associated with each Terminal Node party X X X regr.nnet nnet Neural Network nnet X X X size has been set to 3 by default. 
regr.nodeHarvest nodeHarvest Node Harvest nodeHarvest X X regr.pcr pcr Principal Component Regression pls X X regr.penalized.lasso lasso Lasso Regression penalized X X regr.penalized.ridge ridge Penalized Ridge Regression penalized X X regr.plsr plsr Partial Least Squares Regression pls X X regr.randomForest rf Random Forest randomForest X X ordered se regr.randomForestSRC rfsrc Random Forest randomForestSRC X X X na.action' has been set to 'na.impute' by default to allow missing data support regr.ranger ranger Random Forests ranger X X By default, internal parallelization is switched off (num.threads = 1) and verbose output is disabled. Both settings are changeable. regr.rknn rknn Random k-Nearest-Neighbors rknn X ordered regr.rpart rpart Decision Tree rpart X X X X ordered xval has been set to 0 by default for speed. regr.rsm rsm Response Surface Regression rsm X You select the order of the regression by using modelfun = \"FO\" (first order), \"TWI\" (two-way interactions, this is with 1st oder terms!) and \"SO\" (full second order) regr.rvm rvm Relevance Vector Machine kernlab X X Kernel parameters have to be passed directly and not by using the kpar list in rvm. Note that fit has been set to FALSE by default for speed. regr.svm svm Support Vector Machines (libsvm) e1071 X X regr.xgboost xgboost eXtreme Gradient Boosting xgboost X X X All setting are passed directly, rather than through xgboost's 'param'. 'rounds' set to 1 by default regr.xyf xyf X-Y fused self-organising maps kohonen X Survival analysis (9) ID / Short Name Name Packages Num. Fac. NAs Weights Props Note surv.cforest crf Random Forest based on Conditional Inference Trees party survival X X X X ordered rcens see ?ctree_control for possible breakage for nominal features with missingness surv.coxph coxph Cox Proportional Hazard Model survival X X X X prob rcens surv.cvglmnet cvglmnet GLM with Regularization (Cross Validated Lambda) glmnet X X X ordered rcens Factors automatically get converted to dummy columns, ordered factors to integer surv.glmboost glmboost Gradient Boosting with Componentwise Linear Models survival mboost X X X ordered rcens family has been set to CoxPH() by default. Maximum number of boosting iterations is set via 'mstop', the actual number used for prediction is controlled by 'm'. surv.glmnet glmnet GLM with Regularization glmnet X X X ordered rcens Factors automatically get converted to dummy columns, ordered factors to integer surv.penalized penalized Penalized Regression penalized X X ordered rcens Factors automatically get converted to dummy columns, ordered factors to integer surv.randomForestSRC rfsrc Random Forests for Survival survival randomForestSRC X X X ordered rcens 'na.action' has been set to 'na.impute' by default to allow missing data support surv.ranger ranger Random Forests ranger X X prob rcens By default, internal parallelization is switched off (num.threads = 1) and verbose output is disabled. Both settings are changeable. surv.rpart rpart Survival Tree rpart X X X X ordered rcens xval has been set to 0 by default for speed. Cluster analysis (5) ID / Short Name Name Packages Num. Fac. 
NAs Weights Props Note cluster.Cobweb cobweb Cobweb Clustering Algorithm RWeka X cluster.EM em Expectation-Maximization Clustering RWeka X cluster.FarthestFirst farthestfirst FarthestFirst Clustering Algorithm RWeka X cluster.SimpleKMeans simplekmeans K-Means Clustering RWeka X cluster.XMeans xmeans XMeans (k-means with automatic determination of k) RWeka X You may have to install the XMeans Weka package: WPM('install-package', 'XMeans'). Cost-sensitive classification For ordinary misclassification costs you can use all the standard classification methods listed\nabove. For example-dependent costs there are several ways to generate cost-sensitive learners from\nordinary regression and classification learners.\nSee section cost-sensitive classification and the documentation\nof makeCostSensClassifWrapper , makeCostSensRegrWrapper and makeCostSensWeightedPairsWrapper \nfor details. Multilabel classification (1) ID / Short Name Name Packages Num. Fac. NAs Weights Props Note multilabel.rFerns rFerns Random ferns rFerns X X ordered Moreover, you can use the binary relevance method to apply ordinary classification learners\nto the multilabel problem. See the documentation of function makeMultilabelBinaryRelevanceWrapper \nand the tutorial section on multilabel classification for details.", "title": "Integrated Learners" }, { "location": "/measures/index.html", - "text": "Implemented Performance Measures\n\n\nThe following tables show the performance measures available for the different types of\nlearning problems as well as general performance measures in alphabetical order.\n(See also the documentation about \nmeasures\n and \nmakeMeasure\n for available measures and\ntheir properties.)\n\n\nIf you find that a measure is missing, either \nopen an issue\n\nor read \nhow to implement a measure yourself\n.\n\n\nColumn \nMinimize\n indicates if the measure is minimized during, e.g., tuning or\nfeature selection.\n\nBest\n and \nWorst\n show the best and worst values the performance measure can attain.\nFor \nclassification\n, column \nMultiClass\n indicates if a measure is suitable for\nmulti-class problems. 
If not, the measure can only be used for binary classification problems.\n\n\nThe next six columns refer to information required to calculate the performance measure.\n\n\n\n\nPrediction\n: The \nPrediction\n object.\n\n\nTruth\n: The true values of the response variable(s) (for supervised learning).\n\n\nProbs\n: The predicted probabilities (might be needed for classification).\n\n\nModel\n: The \nWrappedModel\n (e.g., for calculating the training time).\n\n\nTask\n: The \nTask\n (relevant for cost-sensitive classification).\n\n\nFeats\n: The predicted data (relevant for clustering).\n\n\n\n\nAggregation\n shows the default \naggregation method\n tied to the measure.\n\n\nClassification\n\n\n\n\n\n\n\n\nMeasure\n\n\nNote\n\n\nMinimize\n\n\nBest\n\n\nWorst\n\n\nMultiClass\n\n\nPrediction\n\n\nTruth\n\n\nProbs\n\n\nModel\n\n\nTask\n\n\nFeats\n\n\nAggregation\n\n\n\n\n\n\n\n\n\n\nacc\n - Accuracy\n\n\n\n\n\n\n1\n\n\n0\n\n\nX\n\n\nX\n\n\nX\n\n\n\n\n\n\n\n\n\n\ntest.mean\n\n\n\n\n\n\nauc\n - Area under the curve\n\n\n\n\n\n\n1\n\n\n0\n\n\n\n\nX\n\n\nX\n\n\nX\n\n\n\n\n\n\n\n\ntest.mean\n\n\n\n\n\n\nbac\n - Balanced accuracy\n\n\nMean of true positive rate and true negative rate.\n\n\n\n\n1\n\n\n0\n\n\n\n\nX\n\n\nX\n\n\n\n\n\n\n\n\n\n\ntest.mean\n\n\n\n\n\n\nber\n - Balanced error rate\n\n\nMean of misclassification error rates on all individual classes.\n\n\nX\n\n\n0\n\n\n1\n\n\nX\n\n\nX\n\n\nX\n\n\n\n\n\n\n\n\n\n\ntest.mean\n\n\n\n\n\n\nbrier\n - Brier score\n\n\n\n\nX\n\n\n0\n\n\n1\n\n\n\n\nX\n\n\nX\n\n\nX\n\n\n\n\n\n\n\n\ntest.mean\n\n\n\n\n\n\nf1\n - F1 measure\n\n\n\n\n\n\n1\n\n\n0\n\n\n\n\nX\n\n\nX\n\n\n\n\n\n\n\n\n\n\ntest.mean\n\n\n\n\n\n\nfdr\n - False discovery rate\n\n\n\n\nX\n\n\n0\n\n\n1\n\n\n\n\nX\n\n\nX\n\n\n\n\n\n\n\n\n\n\ntest.mean\n\n\n\n\n\n\nfn\n - False negatives\n\n\nAlso called misses.\n\n\nX\n\n\n0\n\n\nInf\n\n\n\n\nX\n\n\nX\n\n\n\n\n\n\n\n\n\n\ntest.mean\n\n\n\n\n\n\nfnr\n - False negative rate\n\n\n\n\nX\n\n\n0\n\n\n1\n\n\n\n\nX\n\n\nX\n\n\n\n\n\n\n\n\n\n\ntest.mean\n\n\n\n\n\n\nfp\n - False positives\n\n\nAlso called false alarms.\n\n\nX\n\n\n0\n\n\nInf\n\n\n\n\nX\n\n\nX\n\n\n\n\n\n\n\n\n\n\ntest.mean\n\n\n\n\n\n\nfpr\n - False positive rate\n\n\nAlso called false alarm rate or fall-out.\n\n\nX\n\n\n0\n\n\n1\n\n\n\n\nX\n\n\nX\n\n\n\n\n\n\n\n\n\n\ntest.mean\n\n\n\n\n\n\ngmean\n - G-mean\n\n\nGeometric mean of recall and specificity.\n\n\n\n\n1\n\n\n0\n\n\n\n\nX\n\n\nX\n\n\n\n\n\n\n\n\n\n\ntest.mean\n\n\n\n\n\n\ngpr\n - Geometric mean of precision and recall\n\n\n\n\n\n\n1\n\n\n0\n\n\n\n\nX\n\n\nX\n\n\n\n\n\n\n\n\n\n\ntest.mean\n\n\n\n\n\n\nmcc\n - Matthews correlation coefficient\n\n\n\n\n\n\n1\n\n\n-1\n\n\n\n\nX\n\n\nX\n\n\n\n\n\n\n\n\n\n\ntest.mean\n\n\n\n\n\n\nmmce\n - Mean misclassification error\n\n\n\n\nX\n\n\n0\n\n\n1\n\n\nX\n\n\nX\n\n\nX\n\n\n\n\n\n\n\n\n\n\ntest.mean\n\n\n\n\n\n\nmulticlass.auc\n - Multiclass area under the curve\n\n\nCalls \npROC::multiclass.roc\n\n\n\n\n1\n\n\n0\n\n\nX\n\n\nX\n\n\nX\n\n\nX\n\n\n\n\n\n\n\n\ntest.mean\n\n\n\n\n\n\nnpv\n - Negative predictive value\n\n\n\n\n\n\n1\n\n\n0\n\n\n\n\nX\n\n\nX\n\n\n\n\n\n\n\n\n\n\ntest.mean\n\n\n\n\n\n\nppv\n - Positive predictive value\n\n\nAlso called precision.\n\n\n\n\n1\n\n\n0\n\n\n\n\nX\n\n\nX\n\n\n\n\n\n\n\n\n\n\ntest.mean\n\n\n\n\n\n\ntn\n - True negatives\n\n\nAlso called correct rejections.\n\n\n\n\nInf\n\n\n0\n\n\n\n\nX\n\n\nX\n\n\n\n\n\n\n\n\n\n\ntest.mean\n\n\n\n\n\n\ntnr\n - True negative rate\n\n\nAlso called 
specificity.\n\n\n\n\n1\n\n\n0\n\n\n\n\nX\n\n\nX\n\n\n\n\n\n\n\n\n\n\ntest.mean\n\n\n\n\n\n\ntp\n - True positives\n\n\n\n\n\n\nInf\n\n\n0\n\n\n\n\nX\n\n\nX\n\n\n\n\n\n\n\n\n\n\ntest.mean\n\n\n\n\n\n\ntpr\n - True positive rate\n\n\nAlso called hit rate or recall.\n\n\n\n\n1\n\n\n0\n\n\n\n\nX\n\n\nX\n\n\n\n\n\n\n\n\n\n\ntest.mean\n\n\n\n\n\n\n\n\nRegression\n\n\n\n\n\n\n\n\nMeasure\n\n\nNote\n\n\nMinimize\n\n\nBest\n\n\nWorst\n\n\nPrediction\n\n\nTruth\n\n\nProbs\n\n\nModel\n\n\nTask\n\n\nFeats\n\n\nAggregation\n\n\n\n\n\n\n\n\n\n\nmae\n - Mean of absolute errors\n\n\n\n\nX\n\n\n0\n\n\nInf\n\n\nX\n\n\nX\n\n\n\n\n\n\n\n\n\n\ntest.mean\n\n\n\n\n\n\nmedae\n - Median of absolute errors\n\n\n\n\nX\n\n\n0\n\n\nInf\n\n\nX\n\n\nX\n\n\n\n\n\n\n\n\n\n\ntest.mean\n\n\n\n\n\n\nmedse\n - Median of squared errors\n\n\n\n\nX\n\n\n0\n\n\nInf\n\n\nX\n\n\nX\n\n\n\n\n\n\n\n\n\n\ntest.mean\n\n\n\n\n\n\nmse\n - Mean of squared errors\n\n\n\n\nX\n\n\n0\n\n\nInf\n\n\nX\n\n\nX\n\n\n\n\n\n\n\n\n\n\ntest.mean\n\n\n\n\n\n\nrmse\n - Root mean square error\n\n\nThe RMSE is aggregated as sqrt(mean(rmse.vals.on.test.sets^2)). If you don't want that, you could also use test.mean\n\n\nX\n\n\n0\n\n\nInf\n\n\nX\n\n\nX\n\n\n\n\n\n\n\n\n\n\ntest.rmse\n\n\n\n\n\n\nsae\n - Sum of absolute errors\n\n\n\n\nX\n\n\n0\n\n\nInf\n\n\nX\n\n\nX\n\n\n\n\n\n\n\n\n\n\ntest.mean\n\n\n\n\n\n\nsse\n - Sum of squared errors\n\n\n\n\nX\n\n\n0\n\n\nInf\n\n\nX\n\n\nX\n\n\n\n\n\n\n\n\n\n\ntest.mean\n\n\n\n\n\n\n\n\nSurvival analysis\n\n\n\n\n\n\n\n\nMeasure\n\n\nNote\n\n\nMinimize\n\n\nBest\n\n\nWorst\n\n\nPrediction\n\n\nTruth\n\n\nProbs\n\n\nModel\n\n\nTask\n\n\nFeats\n\n\nAggregation\n\n\n\n\n\n\n\n\n\n\ncindex\n - Concordance index\n\n\n\n\n\n\n1\n\n\n0\n\n\nX\n\n\nX\n\n\n\n\n\n\n\n\n\n\ntest.mean\n\n\n\n\n\n\n\n\nCluster analysis\n\n\n\n\n\n\n\n\nMeasure\n\n\nNote\n\n\nMinimize\n\n\nBest\n\n\nWorst\n\n\nPrediction\n\n\nTruth\n\n\nProbs\n\n\nModel\n\n\nTask\n\n\nFeats\n\n\nAggregation\n\n\n\n\n\n\n\n\n\n\ndb\n - Davies-Bouldin cluster separation measure\n\n\nSee \n?clusterSim::index.DB\n\n\nX\n\n\n0\n\n\nInf\n\n\nX\n\n\n\n\n\n\n\n\n\n\nX\n\n\ntest.mean\n\n\n\n\n\n\ndunn\n - Dunn index\n\n\nSee \n?clValid::dunn\n\n\n\n\nInf\n\n\n0\n\n\nX\n\n\n\n\n\n\n\n\n\n\nX\n\n\ntest.mean\n\n\n\n\n\n\nG1\n - Calinski-Harabasz pseudo F statistic\n\n\nSee \n?clusterSim::index.G1\n\n\n\n\nInf\n\n\n0\n\n\nX\n\n\n\n\n\n\n\n\n\n\nX\n\n\ntest.mean\n\n\n\n\n\n\nG2\n - Baker and Hubert adaptation of Goodman-Kruskal's gamma statistic\n\n\nSee \n?clusterSim::index.G2\n\n\n\n\nInf\n\n\n0\n\n\nX\n\n\n\n\n\n\n\n\n\n\nX\n\n\ntest.mean\n\n\n\n\n\n\nsilhouette\n - Rousseeuw's silhouette internal cluster quality index\n\n\nSee \n?clusterSim::index.S\n\n\n\n\nInf\n\n\n0\n\n\nX\n\n\n\n\n\n\n\n\n\n\nX\n\n\ntest.mean\n\n\n\n\n\n\n\n\nCost-sensitive classification\n\n\n\n\n\n\n\n\nMeasure\n\n\nNote\n\n\nMinimize\n\n\nBest\n\n\nWorst\n\n\nPrediction\n\n\nTruth\n\n\nProbs\n\n\nModel\n\n\nTask\n\n\nFeats\n\n\nAggregation\n\n\n\n\n\n\n\n\n\n\nmcp\n - Misclassification penalty\n\n\nAverage difference between costs of oracle and model prediction.\n\n\nX\n\n\n0\n\n\nInf\n\n\nX\n\n\n\n\n\n\n\n\nX\n\n\n\n\ntest.mean\n\n\n\n\n\n\nmeancosts\n - Mean costs of the predicted choices\n\n\n\n\nX\n\n\n0\n\n\nInf\n\n\nX\n\n\n\n\n\n\n\n\nX\n\n\n\n\ntest.mean\n\n\n\n\n\n\n\n\nNote that in case of \nordinary misclassification costs\n you can also generate performance\nmeasures from cost matrices by function \nmakeCostMeasure\n.\nFor details see the section on \ncost-sensitive 
classification\n.\n\n\nMultilabel classification\n\n\n\n\n\n\n\n\nMeasure\n\n\nNote\n\n\nMinimize\n\n\nBest\n\n\nWorst\n\n\nPrediction\n\n\nTruth\n\n\nProbs\n\n\nModel\n\n\nTask\n\n\nFeats\n\n\nAggregation\n\n\n\n\n\n\n\n\n\n\nhamloss\n - Hamming loss\n\n\n\n\nX\n\n\n0\n\n\n1\n\n\nX\n\n\nX\n\n\n\n\n\n\n\n\n\n\ntest.mean\n\n\n\n\n\n\n\n\nGeneral performance measures\n\n\n\n\n\n\n\n\nMeasure\n\n\nNote\n\n\nMinimize\n\n\nBest\n\n\nWorst\n\n\nPrediction\n\n\nTruth\n\n\nProbs\n\n\nModel\n\n\nTask\n\n\nFeats\n\n\nAggregation\n\n\n\n\n\n\n\n\n\n\nfeatperc\n - Percentage of original features used for model\n\n\nUseful for feature selection.\n\n\nX\n\n\n0\n\n\n1\n\n\nX\n\n\n\n\n\n\nX\n\n\n\n\n\n\ntest.mean\n\n\n\n\n\n\ntimeboth\n - timetrain + timepredict\n\n\n\n\nX\n\n\n0\n\n\nInf\n\n\nX\n\n\n\n\n\n\nX\n\n\n\n\n\n\ntest.mean\n\n\n\n\n\n\ntimepredict\n - Time of predicting test set\n\n\n\n\nX\n\n\n0\n\n\nInf\n\n\nX\n\n\n\n\n\n\n\n\n\n\n\n\ntest.mean\n\n\n\n\n\n\ntimetrain\n - Time of fitting the model\n\n\n\n\nX\n\n\n0\n\n\nInf\n\n\n\n\n\n\n\n\nX\n\n\n\n\n\n\ntest.mean", + "text": "Implemented Performance Measures\n\n\nThe following tables show the performance measures available for the different types of\nlearning problems as well as general performance measures in alphabetical order.\n(See also the documentation about \nmeasures\n and \nmakeMeasure\n for available measures and\ntheir properties.)\n\n\nIf you find that a measure is missing, either \nopen an issue\n\nor read \nhow to implement a measure yourself\n.\n\n\nColumn \nMinimize\n indicates if the measure is minimized during, e.g., tuning or\nfeature selection.\n\nBest\n and \nWorst\n show the best and worst values the performance measure can attain.\nFor \nclassification\n, column \nMultiClass\n indicates if a measure is suitable for\nmulti-class problems. 
If not, the measure can only be used for binary classification problems.\n\n\nThe next six columns refer to information required to calculate the performance measure.\n\n\n\n\nPrediction\n: The \nPrediction\n object.\n\n\nTruth\n: The true values of the response variable(s) (for supervised learning).\n\n\nProbs\n: The predicted probabilities (might be needed for classification).\n\n\nModel\n: The \nWrappedModel\n (e.g., for calculating the training time).\n\n\nTask\n: The \nTask\n (relevant for cost-sensitive classification).\n\n\nFeats\n: The predicted data (relevant for clustering).\n\n\n\n\nAggregation\n shows the default \naggregation method\n tied to the measure.\n\n\nClassification\n\n\n\n\n\n\n\n\nMeasure\n\n\nNote\n\n\nMinimize\n\n\nBest\n\n\nWorst\n\n\nMultiClass\n\n\nPrediction\n\n\nTruth\n\n\nProbs\n\n\nModel\n\n\nTask\n\n\nFeats\n\n\nAggregation\n\n\n\n\n\n\n\n\n\n\nacc\n - Accuracy\n\n\n\n\n\n\n1\n\n\n0\n\n\nX\n\n\nX\n\n\nX\n\n\n\n\n\n\n\n\n\n\ntest.mean\n\n\n\n\n\n\nauc\n - Area under the curve\n\n\n\n\n\n\n1\n\n\n0\n\n\n\n\nX\n\n\nX\n\n\nX\n\n\n\n\n\n\n\n\ntest.mean\n\n\n\n\n\n\nbac\n - Balanced accuracy\n\n\nMean of true positive rate and true negative rate.\n\n\n\n\n1\n\n\n0\n\n\n\n\nX\n\n\nX\n\n\n\n\n\n\n\n\n\n\ntest.mean\n\n\n\n\n\n\nber\n - Balanced error rate\n\n\nMean of misclassification error rates on all individual classes.\n\n\nX\n\n\n0\n\n\n1\n\n\nX\n\n\nX\n\n\nX\n\n\n\n\n\n\n\n\n\n\ntest.mean\n\n\n\n\n\n\nbrier\n - Brier score\n\n\n\n\nX\n\n\n0\n\n\n1\n\n\n\n\nX\n\n\nX\n\n\nX\n\n\n\n\n\n\n\n\ntest.mean\n\n\n\n\n\n\nf1\n - F1 measure\n\n\n\n\n\n\n1\n\n\n0\n\n\n\n\nX\n\n\nX\n\n\n\n\n\n\n\n\n\n\ntest.mean\n\n\n\n\n\n\nfdr\n - False discovery rate\n\n\n\n\nX\n\n\n0\n\n\n1\n\n\n\n\nX\n\n\nX\n\n\n\n\n\n\n\n\n\n\ntest.mean\n\n\n\n\n\n\nfn\n - False negatives\n\n\nAlso called misses.\n\n\nX\n\n\n0\n\n\nInf\n\n\n\n\nX\n\n\nX\n\n\n\n\n\n\n\n\n\n\ntest.mean\n\n\n\n\n\n\nfnr\n - False negative rate\n\n\n\n\nX\n\n\n0\n\n\n1\n\n\n\n\nX\n\n\nX\n\n\n\n\n\n\n\n\n\n\ntest.mean\n\n\n\n\n\n\nfp\n - False positives\n\n\nAlso called false alarms.\n\n\nX\n\n\n0\n\n\nInf\n\n\n\n\nX\n\n\nX\n\n\n\n\n\n\n\n\n\n\ntest.mean\n\n\n\n\n\n\nfpr\n - False positive rate\n\n\nAlso called false alarm rate or fall-out.\n\n\nX\n\n\n0\n\n\n1\n\n\n\n\nX\n\n\nX\n\n\n\n\n\n\n\n\n\n\ntest.mean\n\n\n\n\n\n\ngmean\n - G-mean\n\n\nGeometric mean of recall and specificity.\n\n\n\n\n1\n\n\n0\n\n\n\n\nX\n\n\nX\n\n\n\n\n\n\n\n\n\n\ntest.mean\n\n\n\n\n\n\ngpr\n - Geometric mean of precision and recall\n\n\n\n\n\n\n1\n\n\n0\n\n\n\n\nX\n\n\nX\n\n\n\n\n\n\n\n\n\n\ntest.mean\n\n\n\n\n\n\nmcc\n - Matthews correlation coefficient\n\n\n\n\n\n\n1\n\n\n-1\n\n\n\n\nX\n\n\nX\n\n\n\n\n\n\n\n\n\n\ntest.mean\n\n\n\n\n\n\nmmce\n - Mean misclassification error\n\n\n\n\nX\n\n\n0\n\n\n1\n\n\nX\n\n\nX\n\n\nX\n\n\n\n\n\n\n\n\n\n\ntest.mean\n\n\n\n\n\n\nmulticlass.auc\n - Multiclass area under the curve\n\n\nCalls \npROC::multiclass.roc\n.\n\n\n\n\n1\n\n\n0\n\n\nX\n\n\nX\n\n\nX\n\n\nX\n\n\n\n\n\n\n\n\ntest.mean\n\n\n\n\n\n\nnpv\n - Negative predictive value\n\n\n\n\n\n\n1\n\n\n0\n\n\n\n\nX\n\n\nX\n\n\n\n\n\n\n\n\n\n\ntest.mean\n\n\n\n\n\n\nppv\n - Positive predictive value\n\n\nAlso called precision.\n\n\n\n\n1\n\n\n0\n\n\n\n\nX\n\n\nX\n\n\n\n\n\n\n\n\n\n\ntest.mean\n\n\n\n\n\n\ntn\n - True negatives\n\n\nAlso called correct rejections.\n\n\n\n\nInf\n\n\n0\n\n\n\n\nX\n\n\nX\n\n\n\n\n\n\n\n\n\n\ntest.mean\n\n\n\n\n\n\ntnr\n - True negative rate\n\n\nAlso called 
specificity.\n\n\n\n\n1\n\n\n0\n\n\n\n\nX\n\n\nX\n\n\n\n\n\n\n\n\n\n\ntest.mean\n\n\n\n\n\n\ntp\n - True positives\n\n\n\n\n\n\nInf\n\n\n0\n\n\n\n\nX\n\n\nX\n\n\n\n\n\n\n\n\n\n\ntest.mean\n\n\n\n\n\n\ntpr\n - True positive rate\n\n\nAlso called hit rate or recall.\n\n\n\n\n1\n\n\n0\n\n\n\n\nX\n\n\nX\n\n\n\n\n\n\n\n\n\n\ntest.mean\n\n\n\n\n\n\n\n\nRegression\n\n\n\n\n\n\n\n\nMeasure\n\n\nNote\n\n\nMinimize\n\n\nBest\n\n\nWorst\n\n\nPrediction\n\n\nTruth\n\n\nProbs\n\n\nModel\n\n\nTask\n\n\nFeats\n\n\nAggregation\n\n\n\n\n\n\n\n\n\n\nmae\n - Mean of absolute errors\n\n\n\n\nX\n\n\n0\n\n\nInf\n\n\nX\n\n\nX\n\n\n\n\n\n\n\n\n\n\ntest.mean\n\n\n\n\n\n\nmedae\n - Median of absolute errors\n\n\n\n\nX\n\n\n0\n\n\nInf\n\n\nX\n\n\nX\n\n\n\n\n\n\n\n\n\n\ntest.mean\n\n\n\n\n\n\nmedse\n - Median of squared errors\n\n\n\n\nX\n\n\n0\n\n\nInf\n\n\nX\n\n\nX\n\n\n\n\n\n\n\n\n\n\ntest.mean\n\n\n\n\n\n\nmse\n - Mean of squared errors\n\n\n\n\nX\n\n\n0\n\n\nInf\n\n\nX\n\n\nX\n\n\n\n\n\n\n\n\n\n\ntest.mean\n\n\n\n\n\n\nrmse\n - Root mean square error\n\n\nThe RMSE is aggregated as sqrt(mean(rmse.vals.on.test.sets^2)). If you don't want that, you could also use test.mean\n\n\nX\n\n\n0\n\n\nInf\n\n\nX\n\n\nX\n\n\n\n\n\n\n\n\n\n\ntest.rmse\n\n\n\n\n\n\nsae\n - Sum of absolute errors\n\n\n\n\nX\n\n\n0\n\n\nInf\n\n\nX\n\n\nX\n\n\n\n\n\n\n\n\n\n\ntest.mean\n\n\n\n\n\n\nsse\n - Sum of squared errors\n\n\n\n\nX\n\n\n0\n\n\nInf\n\n\nX\n\n\nX\n\n\n\n\n\n\n\n\n\n\ntest.mean\n\n\n\n\n\n\n\n\nSurvival analysis\n\n\n\n\n\n\n\n\nMeasure\n\n\nNote\n\n\nMinimize\n\n\nBest\n\n\nWorst\n\n\nPrediction\n\n\nTruth\n\n\nProbs\n\n\nModel\n\n\nTask\n\n\nFeats\n\n\nAggregation\n\n\n\n\n\n\n\n\n\n\ncindex\n - Concordance index\n\n\n\n\n\n\n1\n\n\n0\n\n\nX\n\n\nX\n\n\n\n\n\n\n\n\n\n\ntest.mean\n\n\n\n\n\n\n\n\nCluster analysis\n\n\n\n\n\n\n\n\nMeasure\n\n\nNote\n\n\nMinimize\n\n\nBest\n\n\nWorst\n\n\nPrediction\n\n\nTruth\n\n\nProbs\n\n\nModel\n\n\nTask\n\n\nFeats\n\n\nAggregation\n\n\n\n\n\n\n\n\n\n\ndb\n - Davies-Bouldin cluster separation measure\n\n\nSee \n?clusterSim::index.DB\n.\n\n\nX\n\n\n0\n\n\nInf\n\n\nX\n\n\n\n\n\n\n\n\n\n\nX\n\n\ntest.mean\n\n\n\n\n\n\ndunn\n - Dunn index\n\n\nSee \n?clValid::dunn\n.\n\n\n\n\nInf\n\n\n0\n\n\nX\n\n\n\n\n\n\n\n\n\n\nX\n\n\ntest.mean\n\n\n\n\n\n\nG1\n - Calinski-Harabasz pseudo F statistic\n\n\nSee \n?clusterSim::index.G1\n.\n\n\n\n\nInf\n\n\n0\n\n\nX\n\n\n\n\n\n\n\n\n\n\nX\n\n\ntest.mean\n\n\n\n\n\n\nG2\n - Baker and Hubert adaptation of Goodman-Kruskal's gamma statistic\n\n\nSee \n?clusterSim::index.G2\n.\n\n\n\n\nInf\n\n\n0\n\n\nX\n\n\n\n\n\n\n\n\n\n\nX\n\n\ntest.mean\n\n\n\n\n\n\nsilhouette\n - Rousseeuw's silhouette internal cluster quality index\n\n\nSee \n?clusterSim::index.S\n.\n\n\n\n\nInf\n\n\n0\n\n\nX\n\n\n\n\n\n\n\n\n\n\nX\n\n\ntest.mean\n\n\n\n\n\n\n\n\nCost-sensitive classification\n\n\n\n\n\n\n\n\nMeasure\n\n\nNote\n\n\nMinimize\n\n\nBest\n\n\nWorst\n\n\nPrediction\n\n\nTruth\n\n\nProbs\n\n\nModel\n\n\nTask\n\n\nFeats\n\n\nAggregation\n\n\n\n\n\n\n\n\n\n\nmcp\n - Misclassification penalty\n\n\nAverage difference between costs of oracle and model prediction.\n\n\nX\n\n\n0\n\n\nInf\n\n\nX\n\n\n\n\n\n\n\n\nX\n\n\n\n\ntest.mean\n\n\n\n\n\n\nmeancosts\n - Mean costs of the predicted choices\n\n\n\n\nX\n\n\n0\n\n\nInf\n\n\nX\n\n\n\n\n\n\n\n\nX\n\n\n\n\ntest.mean\n\n\n\n\n\n\n\n\nNote that in case of \nordinary misclassification costs\n you can also generate performance\nmeasures from cost matrices by function \nmakeCostMeasure\n.\nFor details see the section on \ncost-sensitive 
classification\n.\n\n\nMultilabel classification\n\n\n\n\n\n\n\n\nMeasure\n\n\nNote\n\n\nMinimize\n\n\nBest\n\n\nWorst\n\n\nPrediction\n\n\nTruth\n\n\nProbs\n\n\nModel\n\n\nTask\n\n\nFeats\n\n\nAggregation\n\n\n\n\n\n\n\n\n\n\nhamloss\n - Hamming loss\n\n\n\n\nX\n\n\n0\n\n\n1\n\n\nX\n\n\nX\n\n\n\n\n\n\n\n\n\n\ntest.mean\n\n\n\n\n\n\n\n\nGeneral performance measures\n\n\n\n\n\n\n\n\nMeasure\n\n\nNote\n\n\nMinimize\n\n\nBest\n\n\nWorst\n\n\nPrediction\n\n\nTruth\n\n\nProbs\n\n\nModel\n\n\nTask\n\n\nFeats\n\n\nAggregation\n\n\n\n\n\n\n\n\n\n\nfeatperc\n - Percentage of original features used for model\n\n\nUseful for feature selection.\n\n\nX\n\n\n0\n\n\n1\n\n\nX\n\n\n\n\n\n\nX\n\n\n\n\n\n\ntest.mean\n\n\n\n\n\n\ntimeboth\n - timetrain + timepredict\n\n\n\n\nX\n\n\n0\n\n\nInf\n\n\nX\n\n\n\n\n\n\nX\n\n\n\n\n\n\ntest.mean\n\n\n\n\n\n\ntimepredict\n - Time of predicting test set\n\n\n\n\nX\n\n\n0\n\n\nInf\n\n\nX\n\n\n\n\n\n\n\n\n\n\n\n\ntest.mean\n\n\n\n\n\n\ntimetrain\n - Time of fitting the model\n\n\n\n\nX\n\n\n0\n\n\nInf\n\n\n\n\n\n\n\n\nX\n\n\n\n\n\n\ntest.mean", "title": "Implemented Performance Measures" }, { "location": "/measures/index.html#implemented-performance-measures", - "text": "The following tables show the performance measures available for the different types of\nlearning problems as well as general performance measures in alphabetical order.\n(See also the documentation about measures and makeMeasure for available measures and\ntheir properties.) If you find that a measure is missing, either open an issue \nor read how to implement a measure yourself . Column Minimize indicates if the measure is minimized during, e.g., tuning or\nfeature selection. Best and Worst show the best and worst values the performance measure can attain.\nFor classification , column MultiClass indicates if a measure is suitable for\nmulti-class problems. If not, the measure can only be used for binary classification problems. The next six columns refer to information required to calculate the performance measure. Prediction : The Prediction object. Truth : The true values of the response variable(s) (for supervised learning). Probs : The predicted probabilities (might be needed for classification). Model : The WrappedModel (e.g., for calculating the training time). Task : The Task (relevant for cost-sensitive classification). Feats : The predicted data (relevant for clustering). Aggregation shows the default aggregation method tied to the measure. Classification Measure Note Minimize Best Worst MultiClass Prediction Truth Probs Model Task Feats Aggregation acc - Accuracy 1 0 X X X test.mean auc - Area under the curve 1 0 X X X test.mean bac - Balanced accuracy Mean of true positive rate and true negative rate. 1 0 X X test.mean ber - Balanced error rate Mean of misclassification error rates on all individual classes. X 0 1 X X X test.mean brier - Brier score X 0 1 X X X test.mean f1 - F1 measure 1 0 X X test.mean fdr - False discovery rate X 0 1 X X test.mean fn - False negatives Also called misses. X 0 Inf X X test.mean fnr - False negative rate X 0 1 X X test.mean fp - False positives Also called false alarms. X 0 Inf X X test.mean fpr - False positive rate Also called false alarm rate or fall-out. X 0 1 X X test.mean gmean - G-mean Geometric mean of recall and specificity. 
1 0 X X test.mean gpr - Geometric mean of precision and recall 1 0 X X test.mean mcc - Matthews correlation coefficient 1 -1 X X test.mean mmce - Mean misclassification error X 0 1 X X X test.mean multiclass.auc - Multiclass area under the curve Calls pROC::multiclass.roc 1 0 X X X X test.mean npv - Negative predictive value 1 0 X X test.mean ppv - Positive predictive value Also called precision. 1 0 X X test.mean tn - True negatives Also called correct rejections. Inf 0 X X test.mean tnr - True negative rate Also called specificity. 1 0 X X test.mean tp - True positives Inf 0 X X test.mean tpr - True positive rate Also called hit rate or recall. 1 0 X X test.mean Regression Measure Note Minimize Best Worst Prediction Truth Probs Model Task Feats Aggregation mae - Mean of absolute errors X 0 Inf X X test.mean medae - Median of absolute errors X 0 Inf X X test.mean medse - Median of squared errors X 0 Inf X X test.mean mse - Mean of squared errors X 0 Inf X X test.mean rmse - Root mean square error The RMSE is aggregated as sqrt(mean(rmse.vals.on.test.sets^2)). If you don't want that, you could also use test.mean X 0 Inf X X test.rmse sae - Sum of absolute errors X 0 Inf X X test.mean sse - Sum of squared errors X 0 Inf X X test.mean Survival analysis Measure Note Minimize Best Worst Prediction Truth Probs Model Task Feats Aggregation cindex - Concordance index 1 0 X X test.mean Cluster analysis Measure Note Minimize Best Worst Prediction Truth Probs Model Task Feats Aggregation db - Davies-Bouldin cluster separation measure See ?clusterSim::index.DB X 0 Inf X X test.mean dunn - Dunn index See ?clValid::dunn Inf 0 X X test.mean G1 - Calinski-Harabasz pseudo F statistic See ?clusterSim::index.G1 Inf 0 X X test.mean G2 - Baker and Hubert adaptation of Goodman-Kruskal's gamma statistic See ?clusterSim::index.G2 Inf 0 X X test.mean silhouette - Rousseeuw's silhouette internal cluster quality index See ?clusterSim::index.S Inf 0 X X test.mean Cost-sensitive classification Measure Note Minimize Best Worst Prediction Truth Probs Model Task Feats Aggregation mcp - Misclassification penalty Average difference between costs of oracle and model prediction. X 0 Inf X X test.mean meancosts - Mean costs of the predicted choices X 0 Inf X X test.mean Note that in case of ordinary misclassification costs you can also generate performance\nmeasures from cost matrices by function makeCostMeasure .\nFor details see the section on cost-sensitive classification . Multilabel classification Measure Note Minimize Best Worst Prediction Truth Probs Model Task Feats Aggregation hamloss - Hamming loss X 0 1 X X test.mean General performance measures Measure Note Minimize Best Worst Prediction Truth Probs Model Task Feats Aggregation featperc - Percentage of original features used for model Useful for feature selection. X 0 1 X X test.mean timeboth - timetrain + timepredict X 0 Inf X X test.mean timepredict - Time of predicting test set X 0 Inf X test.mean timetrain - Time of fitting the model X 0 Inf X test.mean", + "text": "The following tables show the performance measures available for the different types of\nlearning problems as well as general performance measures in alphabetical order.\n(See also the documentation about measures and makeMeasure for available measures and\ntheir properties.) If you find that a measure is missing, either open an issue \nor read how to implement a measure yourself . Column Minimize indicates if the measure is minimized during, e.g., tuning or\nfeature selection. 
Best and Worst show the best and worst values the performance measure can attain.\nFor classification , column MultiClass indicates if a measure is suitable for\nmulti-class problems. If not, the measure can only be used for binary classification problems. The next six columns refer to information required to calculate the performance measure. Prediction : The Prediction object. Truth : The true values of the response variable(s) (for supervised learning). Probs : The predicted probabilities (might be needed for classification). Model : The WrappedModel (e.g., for calculating the training time). Task : The Task (relevant for cost-sensitive classification). Feats : The predicted data (relevant for clustering). Aggregation shows the default aggregation method tied to the measure. Classification Measure Note Minimize Best Worst MultiClass Prediction Truth Probs Model Task Feats Aggregation acc - Accuracy 1 0 X X X test.mean auc - Area under the curve 1 0 X X X test.mean bac - Balanced accuracy Mean of true positive rate and true negative rate. 1 0 X X test.mean ber - Balanced error rate Mean of misclassification error rates on all individual classes. X 0 1 X X X test.mean brier - Brier score X 0 1 X X X test.mean f1 - F1 measure 1 0 X X test.mean fdr - False discovery rate X 0 1 X X test.mean fn - False negatives Also called misses. X 0 Inf X X test.mean fnr - False negative rate X 0 1 X X test.mean fp - False positives Also called false alarms. X 0 Inf X X test.mean fpr - False positive rate Also called false alarm rate or fall-out. X 0 1 X X test.mean gmean - G-mean Geometric mean of recall and specificity. 1 0 X X test.mean gpr - Geometric mean of precision and recall 1 0 X X test.mean mcc - Matthews correlation coefficient 1 -1 X X test.mean mmce - Mean misclassification error X 0 1 X X X test.mean multiclass.auc - Multiclass area under the curve Calls pROC::multiclass.roc . 1 0 X X X X test.mean npv - Negative predictive value 1 0 X X test.mean ppv - Positive predictive value Also called precision. 1 0 X X test.mean tn - True negatives Also called correct rejections. Inf 0 X X test.mean tnr - True negative rate Also called specificity. 1 0 X X test.mean tp - True positives Inf 0 X X test.mean tpr - True positive rate Also called hit rate or recall. 1 0 X X test.mean Regression Measure Note Minimize Best Worst Prediction Truth Probs Model Task Feats Aggregation mae - Mean of absolute errors X 0 Inf X X test.mean medae - Median of absolute errors X 0 Inf X X test.mean medse - Median of squared errors X 0 Inf X X test.mean mse - Mean of squared errors X 0 Inf X X test.mean rmse - Root mean square error The RMSE is aggregated as sqrt(mean(rmse.vals.on.test.sets^2)). If you don't want that, you could also use test.mean X 0 Inf X X test.rmse sae - Sum of absolute errors X 0 Inf X X test.mean sse - Sum of squared errors X 0 Inf X X test.mean Survival analysis Measure Note Minimize Best Worst Prediction Truth Probs Model Task Feats Aggregation cindex - Concordance index 1 0 X X test.mean Cluster analysis Measure Note Minimize Best Worst Prediction Truth Probs Model Task Feats Aggregation db - Davies-Bouldin cluster separation measure See ?clusterSim::index.DB . X 0 Inf X X test.mean dunn - Dunn index See ?clValid::dunn . Inf 0 X X test.mean G1 - Calinski-Harabasz pseudo F statistic See ?clusterSim::index.G1 . Inf 0 X X test.mean G2 - Baker and Hubert adaptation of Goodman-Kruskal's gamma statistic See ?clusterSim::index.G2 . 
Inf 0 X X test.mean silhouette - Rousseeuw's silhouette internal cluster quality index See ?clusterSim::index.S . Inf 0 X X test.mean Cost-sensitive classification Measure Note Minimize Best Worst Prediction Truth Probs Model Task Feats Aggregation mcp - Misclassification penalty Average difference between costs of oracle and model prediction. X 0 Inf X X test.mean meancosts - Mean costs of the predicted choices X 0 Inf X X test.mean Note that in case of ordinary misclassification costs you can also generate performance\nmeasures from cost matrices by function makeCostMeasure .\nFor details see the section on cost-sensitive classification . Multilabel classification Measure Note Minimize Best Worst Prediction Truth Probs Model Task Feats Aggregation hamloss - Hamming loss X 0 1 X X test.mean General performance measures Measure Note Minimize Best Worst Prediction Truth Probs Model Task Feats Aggregation featperc - Percentage of original features used for model Useful for feature selection. X 0 1 X X test.mean timeboth - timetrain + timepredict X 0 Inf X X test.mean timepredict - Time of predicting test set X 0 Inf X test.mean timetrain - Time of fitting the model X 0 Inf X test.mean", "title": "Implemented Performance Measures" }, { diff --git a/devel/html/multilabel/index.html b/devel/html/multilabel/index.html index 8828d8f6..c10c5909 100644 --- a/devel/html/multilabel/index.html +++ b/devel/html/multilabel/index.html @@ -510,7 +510,7 @@

    Performance

    performance(pred2, measures = list(hamloss, timepredict)) #> hamloss timepredict -#> 0.6946924 0.0830000 +#> 0.6946924 0.0860000 listMeasures("multilabel") #> [1] "timepredict" "featperc" "timeboth" "timetrain" "hamloss" @@ -530,7 +530,7 @@

    Resampling

    #> hamloss.aggr: 0.23 #> hamloss.mean: 0.23 #> hamloss.sd: 0.01 -#> Runtime: 9.11298 +#> Runtime: 9.4152 r = resample(learner = multilabel.lrn2, task = yeast.task, resampling = rdesc, show.info = FALSE) r @@ -540,7 +540,7 @@

    Resampling

    #> hamloss.aggr: 0.47 #> hamloss.mean: 0.47 #> hamloss.sd: 0.01 -#> Runtime: 0.827824 +#> Runtime: 0.829432

    Binary performance

    diff --git a/devel/html/nested_resampling/index.html b/devel/html/nested_resampling/index.html index f86d11d4..848f6521 100644 --- a/devel/html/nested_resampling/index.html +++ b/devel/html/nested_resampling/index.html @@ -412,7 +412,7 @@

    Tuning

    #> mmce.aggr: 0.05 #> mmce.mean: 0.05 #> mmce.sd: 0.03 -#> Runtime: 18.9149 +#> Runtime: 23.346

    You can obtain the error rates on the 3 outer test sets by:
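
A minimal sketch of that access, assuming the measures.test slot of the resample result (the same slot is used again further below in the wrapper example):

r$measures.test
## one row per outer iteration, containing the iteration index and the mmce on that outer test set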

    @@ -466,16 +466,16 @@

    Accessing the tuning result

    opt.paths = getNestedTuneResultsOptPathDf(r)
     head(opt.paths, 10)
     #>       C sigma mmce.test.mean dob eol error.message exec.time iter
    -#> 1  0.25  0.25     0.05882353   1  NA          <NA>     0.046    1
    +#> 1  0.25  0.25     0.05882353   1  NA          <NA>     0.048    1
     #> 2   0.5  0.25     0.04411765   2  NA          <NA>     0.045    1
    -#> 3     1  0.25     0.04411765   3  NA          <NA>     0.046    1
    -#> 4     2  0.25     0.01470588   4  NA          <NA>     0.047    1
    -#> 5     4  0.25     0.05882353   5  NA          <NA>     0.046    1
    -#> 6  0.25   0.5     0.05882353   6  NA          <NA>     0.046    1
    +#> 3     1  0.25     0.04411765   3  NA          <NA>     0.045    1
    +#> 4     2  0.25     0.01470588   4  NA          <NA>     0.046    1
    +#> 5     4  0.25     0.05882353   5  NA          <NA>     0.045    1
    +#> 6  0.25   0.5     0.05882353   6  NA          <NA>     0.047    1
     #> 7   0.5   0.5     0.01470588   7  NA          <NA>     0.046    1
     #> 8     1   0.5     0.02941176   8  NA          <NA>     0.045    1
    -#> 9     2   0.5     0.01470588   9  NA          <NA>     0.046    1
    -#> 10    4   0.5     0.05882353  10  NA          <NA>     0.049    1
    +#> 9     2   0.5     0.01470588   9  NA          <NA>     0.045    1
    +#> 10    4   0.5     0.05882353  10  NA          <NA>     0.046    1
     

    Below we visualize the opt.paths for the 3 outer resampling iterations.
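
One way such a visualization could look, as a sketch assuming ggplot2 (the columns C, sigma, mmce.test.mean and iter are the ones in the opt.paths data frame shown above; this is not necessarily the plot used on the original page):

library(ggplot2)
g = ggplot(opt.paths, aes(x = as.factor(C), y = as.factor(sigma), fill = mmce.test.mean))
g + geom_tile() + facet_wrap(~ iter)
## one tile map per outer resampling iteration, shaded by the inner cross-validated mmce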

    @@ -522,7 +522,7 @@

    Wrapper methods

    #> mse.aggr: 31.70 #> mse.mean: 31.70 #> mse.sd: 4.79 -#> Runtime: 49.0522 +#> Runtime: 52.561 r$measures.test #> iter mse @@ -571,11 +571,11 @@

    Accessing the selected features

    #> 6 0 0 0 0 1 0 0 0 0 0 0 0 0 63.06724 #> dob eol error.message exec.time #> 1 1 2 <NA> 0.022 -#> 2 2 2 <NA> 0.038 -#> 3 2 2 <NA> 0.034 -#> 4 2 2 <NA> 0.034 +#> 2 2 2 <NA> 0.033 +#> 3 2 2 <NA> 0.032 +#> 4 2 2 <NA> 0.031 #> 5 2 2 <NA> 0.034 -#> 6 2 2 <NA> 0.034 +#> 6 2 2 <NA> 0.031

    An easy-to-read version of the optimization path for sequential feature selection can be @@ -628,7 +628,7 @@

    Filter methods with tuning

    #> mse.aggr: 25.39 #> mse.mean: 25.39 #> mse.sd: 8.35 -#> Runtime: 8.5303 +#> Runtime: 9.75966

    Accessing the selected features and optimal percentage

    @@ -689,12 +689,12 @@

    Accessing the se opt.paths = lapply(res, function(x) as.data.frame(x$opt.path)) opt.paths[[1]] #> fw.threshold mse.test.mean dob eol error.message exec.time -#> 1 0 24.89160 1 NA <NA> 0.560 -#> 2 0.2 25.18817 2 NA <NA> 0.273 -#> 3 0.4 25.18817 3 NA <NA> 0.262 -#> 4 0.6 32.15930 4 NA <NA> 0.245 -#> 5 0.8 90.89848 5 NA <NA> 0.233 -#> 6 1 90.89848 6 NA <NA> 0.218 +#> 1 0 24.89160 1 NA <NA> 0.584 +#> 2 0.2 25.18817 2 NA <NA> 0.283 +#> 3 0.4 25.18817 3 NA <NA> 0.263 +#> 4 0.6 32.15930 4 NA <NA> 0.251 +#> 5 0.8 90.89848 5 NA <NA> 0.241 +#> 6 1 90.89848 6 NA <NA> 0.228

    Benchmark experiments

    @@ -849,23 +849,23 @@

    Example 1: Two tasks, two learn getNestedTuneResultsOptPathDf(res$results[["Sonar-example"]][["classif.ksvm.tuned"]]) #> C sigma mmce.test.mean dob eol error.message exec.time iter -#> 1 0.5 0.5 0.3428571 1 NA <NA> 0.048 1 -#> 2 1 0.5 0.3428571 2 NA <NA> 0.049 1 -#> 3 2 0.5 0.3428571 3 NA <NA> 0.046 1 -#> 4 0.5 1 0.3428571 4 NA <NA> 0.048 1 +#> 1 0.5 0.5 0.3428571 1 NA <NA> 0.056 1 +#> 2 1 0.5 0.3428571 2 NA <NA> 0.052 1 +#> 3 2 0.5 0.3428571 3 NA <NA> 0.048 1 +#> 4 0.5 1 0.3428571 4 NA <NA> 0.049 1 #> 5 1 1 0.3428571 5 NA <NA> 0.048 1 -#> 6 2 1 0.3428571 6 NA <NA> 0.049 1 -#> 7 0.5 2 0.3428571 7 NA <NA> 0.048 1 +#> 6 2 1 0.3428571 6 NA <NA> 0.047 1 +#> 7 0.5 2 0.3428571 7 NA <NA> 0.049 1 #> 8 1 2 0.3428571 8 NA <NA> 0.052 1 #> 9 2 2 0.3428571 9 NA <NA> 0.048 1 -#> 10 0.5 0.5 0.2142857 1 NA <NA> 0.051 2 -#> 11 1 0.5 0.2142857 2 NA <NA> 0.048 2 -#> 12 2 0.5 0.2000000 3 NA <NA> 0.047 2 -#> 13 0.5 1 0.2142857 4 NA <NA> 0.050 2 -#> 14 1 1 0.2142857 5 NA <NA> 0.048 2 -#> 15 2 1 0.2142857 6 NA <NA> 0.050 2 -#> 16 0.5 2 0.2142857 7 NA <NA> 0.051 2 -#> 17 1 2 0.2142857 8 NA <NA> 0.051 2 +#> 10 0.5 0.5 0.2142857 1 NA <NA> 0.052 2 +#> 11 1 0.5 0.2142857 2 NA <NA> 0.052 2 +#> 12 2 0.5 0.2000000 3 NA <NA> 0.048 2 +#> 13 0.5 1 0.2142857 4 NA <NA> 0.049 2 +#> 14 1 1 0.2142857 5 NA <NA> 0.049 2 +#> 15 2 1 0.2142857 6 NA <NA> 0.051 2 +#> 16 0.5 2 0.2142857 7 NA <NA> 0.056 2 +#> 17 1 2 0.2142857 8 NA <NA> 0.056 2 #> 18 2 2 0.2142857 9 NA <NA> 0.051 2 @@ -937,12 +937,12 @@

    Example 2: One task, #> 5 0 0 0 1 0 0 0 0 0 0 0 0 0 86.93409 #> 6 0 0 0 0 1 0 0 0 0 0 0 0 0 76.32457 #> dob eol error.message exec.time -#> 1 1 2 <NA> 0.018 -#> 2 2 2 <NA> 0.023 -#> 3 2 2 <NA> 0.023 +#> 1 1 2 <NA> 0.019 +#> 2 2 2 <NA> 0.024 +#> 3 2 2 <NA> 0.024 #> 4 2 2 <NA> 0.024 -#> 5 2 2 <NA> 0.025 -#> 6 2 2 <NA> 0.023 +#> 5 2 2 <NA> 0.027 +#> 6 2 2 <NA> 0.024 analyzeFeatSelResult(feats[[1]]) #> Features : 8 diff --git a/devel/html/performance/index.html b/devel/html/performance/index.html index 77050643..e27af440 100644 --- a/devel/html/performance/index.html +++ b/devel/html/performance/index.html @@ -440,7 +440,7 @@

    Requirements of performance measur model has to be passed.

    performance(pred, measures = timetrain, model = mod)
     #> timetrain 
    -#>     0.121
    +#>     0.111
     

    For many performance measures in cluster analysis the Task is required.
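
For example, the Dunn index from the cluster measures table needs the clustered data, which performance can extract from the task. A sketch, assuming a cluster task mtcars.task and a corresponding prediction pred (names assumed here, not taken from the code above):

performance(pred, measures = dunn, task = mtcars.task)
## dunn is computed from the features stored in the task together with the predicted cluster memberships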

    diff --git a/devel/html/preproc/index.html b/devel/html/preproc/index.html index 5592b559..b5a44d3f 100644 --- a/devel/html/preproc/index.html +++ b/devel/html/preproc/index.html @@ -749,7 +749,7 @@

    Creating the preprocessing wrapperCreating the preprocessing wrapper

    Joint tuning of preprocessing and learner parameters

    @@ -830,18 +830,18 @@

    Joint tuning of pr as.data.frame(res$opt.path) #> decay center scale mse.test.mean dob eol error.message exec.time -#> 1 0 TRUE TRUE 49.38128 1 NA <NA> 0.069 -#> 2 0.05 TRUE TRUE 20.64761 2 NA <NA> 0.082 -#> 3 0.1 TRUE TRUE 22.42986 3 NA <NA> 0.078 -#> 4 0 FALSE TRUE 96.25474 4 NA <NA> 0.034 -#> 5 0.05 FALSE TRUE 14.84306 5 NA <NA> 0.082 -#> 6 0.1 FALSE TRUE 16.65383 6 NA <NA> 0.078 -#> 7 0 TRUE FALSE 40.51518 7 NA <NA> 0.079 -#> 8 0.05 TRUE FALSE 68.00069 8 NA <NA> 0.077 -#> 9 0.1 TRUE FALSE 55.42210 9 NA <NA> 0.085 -#> 10 0 FALSE FALSE 96.25474 10 NA <NA> 0.032 -#> 11 0.05 FALSE FALSE 56.25758 11 NA <NA> 0.083 -#> 12 0.1 FALSE FALSE 42.85529 12 NA <NA> 0.079 +#> 1 0 TRUE TRUE 49.38128 1 NA <NA> 0.064 +#> 2 0.05 TRUE TRUE 20.64761 2 NA <NA> 0.071 +#> 3 0.1 TRUE TRUE 22.42986 3 NA <NA> 0.066 +#> 4 0 FALSE TRUE 96.25474 4 NA <NA> 0.028 +#> 5 0.05 FALSE TRUE 14.84306 5 NA <NA> 0.070 +#> 6 0.1 FALSE TRUE 16.65383 6 NA <NA> 0.060 +#> 7 0 TRUE FALSE 40.51518 7 NA <NA> 0.069 +#> 8 0.05 TRUE FALSE 68.00069 8 NA <NA> 0.065 +#> 9 0.1 TRUE FALSE 55.42210 9 NA <NA> 0.069 +#> 10 0 FALSE FALSE 96.25474 10 NA <NA> 0.027 +#> 11 0.05 FALSE FALSE 56.25758 11 NA <NA> 0.069 +#> 12 0.1 FALSE FALSE 42.85529 12 NA <NA> 0.067

    Preprocessing wrapper functions

    diff --git a/devel/html/resample/index.html b/devel/html/resample/index.html index b7f58306..38cf16d1 100644 --- a/devel/html/resample/index.html +++ b/devel/html/resample/index.html @@ -393,7 +393,7 @@

    Resampling

    #> cindex.aggr: 0.63 #> cindex.mean: 0.63 #> cindex.sd: 0.05 -#> Runtime: 0.167005 +#> Runtime: 0.126201 ## peak a little bit into r names(r) #> [1] "learner.id" "task.id" "measures.train" "measures.test" diff --git a/devel/html/roc_analysis/index.html b/devel/html/roc_analysis/index.html index 7cd7f5f4..091d3a86 100644 --- a/devel/html/roc_analysis/index.html +++ b/devel/html/roc_analysis/index.html @@ -683,10 +683,10 @@

    Viper charts

ResampleResult and BenchmarkResult. Below, plots for the benchmark experiment (Example 2) are generated.

    z = plotViperCharts(res, chart = "rocc", browse = FALSE)
    -#> Error in function (type, msg, asError = TRUE) : couldn't connect to host
     
    -

    Note that besides ROC curves you get several other plots like lift charts or cost curves. +

    You can see the plot created this way here. +Note that besides ROC curves you get several other plots like lift charts or cost curves. For details, see plotViperCharts.

    diff --git a/devel/html/sitemap.xml b/devel/html/sitemap.xml index 371abf21..831dbef1 100644 --- a/devel/html/sitemap.xml +++ b/devel/html/sitemap.xml @@ -4,7 +4,7 @@ None/index.html - 2015-11-24 + 2015-11-27 daily @@ -13,55 +13,55 @@ None/task/index.html - 2015-11-24 + 2015-11-27 daily None/learner/index.html - 2015-11-24 + 2015-11-27 daily None/train/index.html - 2015-11-24 + 2015-11-27 daily None/predict/index.html - 2015-11-24 + 2015-11-27 daily None/performance/index.html - 2015-11-24 + 2015-11-27 daily None/resample/index.html - 2015-11-24 + 2015-11-27 daily None/benchmark_experiments/index.html - 2015-11-24 + 2015-11-27 daily None/parallelization/index.html - 2015-11-24 + 2015-11-27 daily None/visualization/index.html - 2015-11-24 + 2015-11-27 daily @@ -71,91 +71,91 @@ None/configureMlr/index.html - 2015-11-24 + 2015-11-27 daily None/wrapper/index.html - 2015-11-24 + 2015-11-27 daily None/preproc/index.html - 2015-11-24 + 2015-11-27 daily None/impute/index.html - 2015-11-24 + 2015-11-27 daily None/bagging/index.html - 2015-11-24 + 2015-11-27 daily None/tune/index.html - 2015-11-24 + 2015-11-27 daily None/feature_selection/index.html - 2015-11-24 + 2015-11-27 daily None/nested_resampling/index.html - 2015-11-24 + 2015-11-27 daily None/cost_sensitive_classif/index.html - 2015-11-24 + 2015-11-27 daily None/over_and_undersampling/index.html - 2015-11-24 + 2015-11-27 daily None/roc_analysis/index.html - 2015-11-24 + 2015-11-27 daily None/multilabel/index.html - 2015-11-24 + 2015-11-27 daily None/learning_curve/index.html - 2015-11-24 + 2015-11-27 daily None/partial_prediction/index.html - 2015-11-24 + 2015-11-27 daily None/classifier_calibration/index.html - 2015-11-24 + 2015-11-27 daily @@ -165,19 +165,19 @@ None/create_learner/index.html - 2015-11-24 + 2015-11-27 daily None/create_measure/index.html - 2015-11-24 + 2015-11-27 daily None/create_imputation/index.html - 2015-11-24 + 2015-11-27 daily @@ -187,25 +187,25 @@ None/example_tasks/index.html - 2015-11-24 + 2015-11-27 daily None/integrated_learners/index.html - 2015-11-24 + 2015-11-27 daily None/measures/index.html - 2015-11-24 + 2015-11-27 daily None/filter_methods/index.html - 2015-11-24 + 2015-11-27 daily diff --git a/devel/html/tune/index.html b/devel/html/tune/index.html index 29904f81..ad9ada0d 100644 --- a/devel/html/tune/index.html +++ b/devel/html/tune/index.html @@ -409,55 +409,55 @@

    Grid search with manual discreti #> With control class: TuneControlGrid #> Imputation value: 1 #> [Tune-x] 1: C=0.25; sigma=0.25 -#> [Tune-y] 1: mmce.test.mean=0.0467; time: 0.0 min; memory: 156Mb use, 478Mb max +#> [Tune-y] 1: mmce.test.mean=0.0467; time: 0.0 min; memory: 156Mb use, 479Mb max #> [Tune-x] 2: C=0.5; sigma=0.25 -#> [Tune-y] 2: mmce.test.mean=0.0467; time: 0.0 min; memory: 156Mb use, 478Mb max +#> [Tune-y] 2: mmce.test.mean=0.0467; time: 0.0 min; memory: 156Mb use, 479Mb max #> [Tune-x] 3: C=1; sigma=0.25 -#> [Tune-y] 3: mmce.test.mean=0.04; time: 0.0 min; memory: 156Mb use, 478Mb max +#> [Tune-y] 3: mmce.test.mean=0.04; time: 0.0 min; memory: 156Mb use, 479Mb max #> [Tune-x] 4: C=2; sigma=0.25 -#> [Tune-y] 4: mmce.test.mean=0.0467; time: 0.0 min; memory: 156Mb use, 478Mb max +#> [Tune-y] 4: mmce.test.mean=0.0467; time: 0.0 min; memory: 156Mb use, 479Mb max #> [Tune-x] 5: C=4; sigma=0.25 -#> [Tune-y] 5: mmce.test.mean=0.0467; time: 0.0 min; memory: 156Mb use, 478Mb max +#> [Tune-y] 5: mmce.test.mean=0.0467; time: 0.0 min; memory: 156Mb use, 479Mb max #> [Tune-x] 6: C=0.25; sigma=0.5 -#> [Tune-y] 6: mmce.test.mean=0.06; time: 0.0 min; memory: 156Mb use, 478Mb max +#> [Tune-y] 6: mmce.test.mean=0.06; time: 0.0 min; memory: 156Mb use, 479Mb max #> [Tune-x] 7: C=0.5; sigma=0.5 -#> [Tune-y] 7: mmce.test.mean=0.04; time: 0.0 min; memory: 156Mb use, 478Mb max +#> [Tune-y] 7: mmce.test.mean=0.04; time: 0.0 min; memory: 156Mb use, 479Mb max #> [Tune-x] 8: C=1; sigma=0.5 -#> [Tune-y] 8: mmce.test.mean=0.04; time: 0.0 min; memory: 156Mb use, 478Mb max +#> [Tune-y] 8: mmce.test.mean=0.04; time: 0.0 min; memory: 156Mb use, 479Mb max #> [Tune-x] 9: C=2; sigma=0.5 -#> [Tune-y] 9: mmce.test.mean=0.0467; time: 0.0 min; memory: 156Mb use, 478Mb max +#> [Tune-y] 9: mmce.test.mean=0.0467; time: 0.0 min; memory: 156Mb use, 479Mb max #> [Tune-x] 10: C=4; sigma=0.5 -#> [Tune-y] 10: mmce.test.mean=0.0467; time: 0.0 min; memory: 156Mb use, 478Mb max +#> [Tune-y] 10: mmce.test.mean=0.0467; time: 0.0 min; memory: 156Mb use, 479Mb max #> [Tune-x] 11: C=0.25; sigma=1 -#> [Tune-y] 11: mmce.test.mean=0.0533; time: 0.0 min; memory: 156Mb use, 478Mb max +#> [Tune-y] 11: mmce.test.mean=0.0533; time: 0.0 min; memory: 156Mb use, 479Mb max #> [Tune-x] 12: C=0.5; sigma=1 -#> [Tune-y] 12: mmce.test.mean=0.04; time: 0.0 min; memory: 156Mb use, 478Mb max +#> [Tune-y] 12: mmce.test.mean=0.04; time: 0.0 min; memory: 156Mb use, 479Mb max #> [Tune-x] 13: C=1; sigma=1 -#> [Tune-y] 13: mmce.test.mean=0.0467; time: 0.0 min; memory: 156Mb use, 478Mb max +#> [Tune-y] 13: mmce.test.mean=0.0467; time: 0.0 min; memory: 156Mb use, 479Mb max #> [Tune-x] 14: C=2; sigma=1 -#> [Tune-y] 14: mmce.test.mean=0.0467; time: 0.0 min; memory: 156Mb use, 478Mb max +#> [Tune-y] 14: mmce.test.mean=0.0467; time: 0.0 min; memory: 156Mb use, 479Mb max #> [Tune-x] 15: C=4; sigma=1 -#> [Tune-y] 15: mmce.test.mean=0.0533; time: 0.0 min; memory: 156Mb use, 478Mb max +#> [Tune-y] 15: mmce.test.mean=0.0533; time: 0.0 min; memory: 156Mb use, 479Mb max #> [Tune-x] 16: C=0.25; sigma=2 -#> [Tune-y] 16: mmce.test.mean=0.0533; time: 0.0 min; memory: 156Mb use, 478Mb max +#> [Tune-y] 16: mmce.test.mean=0.0533; time: 0.0 min; memory: 156Mb use, 479Mb max #> [Tune-x] 17: C=0.5; sigma=2 -#> [Tune-y] 17: mmce.test.mean=0.04; time: 0.0 min; memory: 156Mb use, 478Mb max +#> [Tune-y] 17: mmce.test.mean=0.04; time: 0.0 min; memory: 156Mb use, 479Mb max #> [Tune-x] 18: C=1; sigma=2 -#> [Tune-y] 18: mmce.test.mean=0.0333; time: 0.0 min; memory: 156Mb use, 478Mb max 
+#> [Tune-y] 18: mmce.test.mean=0.0333; time: 0.0 min; memory: 156Mb use, 479Mb max #> [Tune-x] 19: C=2; sigma=2 -#> [Tune-y] 19: mmce.test.mean=0.04; time: 0.0 min; memory: 156Mb use, 478Mb max +#> [Tune-y] 19: mmce.test.mean=0.04; time: 0.0 min; memory: 156Mb use, 479Mb max #> [Tune-x] 20: C=4; sigma=2 -#> [Tune-y] 20: mmce.test.mean=0.0467; time: 0.0 min; memory: 156Mb use, 478Mb max +#> [Tune-y] 20: mmce.test.mean=0.0467; time: 0.0 min; memory: 156Mb use, 479Mb max #> [Tune-x] 21: C=0.25; sigma=4 -#> [Tune-y] 21: mmce.test.mean=0.113; time: 0.0 min; memory: 156Mb use, 478Mb max +#> [Tune-y] 21: mmce.test.mean=0.113; time: 0.0 min; memory: 156Mb use, 479Mb max #> [Tune-x] 22: C=0.5; sigma=4 -#> [Tune-y] 22: mmce.test.mean=0.0667; time: 0.0 min; memory: 156Mb use, 478Mb max +#> [Tune-y] 22: mmce.test.mean=0.0667; time: 0.0 min; memory: 156Mb use, 479Mb max #> [Tune-x] 23: C=1; sigma=4 -#> [Tune-y] 23: mmce.test.mean=0.0533; time: 0.0 min; memory: 156Mb use, 478Mb max +#> [Tune-y] 23: mmce.test.mean=0.0533; time: 0.0 min; memory: 156Mb use, 479Mb max #> [Tune-x] 24: C=2; sigma=4 -#> [Tune-y] 24: mmce.test.mean=0.06; time: 0.0 min; memory: 156Mb use, 478Mb max +#> [Tune-y] 24: mmce.test.mean=0.06; time: 0.0 min; memory: 156Mb use, 479Mb max #> [Tune-x] 25: C=4; sigma=4 -#> [Tune-y] 25: mmce.test.mean=0.0667; time: 0.0 min; memory: 156Mb use, 478Mb max +#> [Tune-y] 25: mmce.test.mean=0.0667; time: 0.0 min; memory: 156Mb use, 479Mb max #> [Tune] Result: C=1; sigma=2 : mmce.test.mean=0.0333 res #> Tune result: @@ -517,16 +517,16 @@

    Accessing the tuning result

    #> Length: 25 #> Add x values transformed: FALSE #> Error messages: TRUE. Errors: 0 / 25. -#> Exec times: TRUE. Range: 0.077 - 0.092. 0 NAs. +#> Exec times: TRUE. Range: 0.067 - 0.085. 0 NAs. opt.grid = as.data.frame(res$opt.path) head(opt.grid) #> C sigma acc.test.mean acc.test.sd dob eol error.message exec.time -#> 1 0.25 0.25 0.9533333 0.03055050 1 NA <NA> 0.091 -#> 2 0.5 0.25 0.9466667 0.02309401 2 NA <NA> 0.081 -#> 3 1 0.25 0.9533333 0.01154701 3 NA <NA> 0.083 -#> 4 2 0.25 0.9533333 0.01154701 4 NA <NA> 0.082 -#> 5 4 0.25 0.9533333 0.01154701 5 NA <NA> 0.081 -#> 6 0.25 0.5 0.9333333 0.01154701 6 NA <NA> 0.085 +#> 1 0.25 0.25 0.9533333 0.03055050 1 NA <NA> 0.071 +#> 2 0.5 0.25 0.9466667 0.02309401 2 NA <NA> 0.069 +#> 3 1 0.25 0.9533333 0.01154701 3 NA <NA> 0.067 +#> 4 2 0.25 0.9533333 0.01154701 4 NA <NA> 0.067 +#> 5 4 0.25 0.9533333 0.01154701 5 NA <NA> 0.068 +#> 6 0.25 0.5 0.9333333 0.01154701 6 NA <NA> 0.069

    A quick visualization of the performance values on the search grid can be accomplished as follows:
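
For instance, as a sketch assuming ggplot2 (opt.grid is the data frame created above from res$opt.path; this is not necessarily the plot used on the original page):

library(ggplot2)
g = ggplot(opt.grid, aes(x = as.factor(C), y = as.factor(sigma), fill = acc.test.mean))
g + geom_tile()
## tiles are shaded by the cross-validated accuracy of each (C, sigma) combination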

    @@ -592,23 +592,23 @@

    Grid search without manual di #> With control class: TuneControlGrid #> Imputation value: 1 #> [Tune-x] 1: C=0.000244; sigma=0.000244 -#> [Tune-y] 1: mmce.test.mean=0.527; time: 0.0 min; memory: 156Mb use, 478Mb max +#> [Tune-y] 1: mmce.test.mean=0.527; time: 0.0 min; memory: 156Mb use, 479Mb max #> [Tune-x] 2: C=1; sigma=0.000244 -#> [Tune-y] 2: mmce.test.mean=0.527; time: 0.0 min; memory: 156Mb use, 478Mb max +#> [Tune-y] 2: mmce.test.mean=0.527; time: 0.0 min; memory: 156Mb use, 479Mb max #> [Tune-x] 3: C=4.1e+03; sigma=0.000244 -#> [Tune-y] 3: mmce.test.mean=0.04; time: 0.0 min; memory: 156Mb use, 478Mb max +#> [Tune-y] 3: mmce.test.mean=0.04; time: 0.0 min; memory: 156Mb use, 479Mb max #> [Tune-x] 4: C=0.000244; sigma=1 -#> [Tune-y] 4: mmce.test.mean=0.527; time: 0.0 min; memory: 156Mb use, 478Mb max +#> [Tune-y] 4: mmce.test.mean=0.527; time: 0.0 min; memory: 156Mb use, 479Mb max #> [Tune-x] 5: C=1; sigma=1 -#> [Tune-y] 5: mmce.test.mean=0.04; time: 0.0 min; memory: 156Mb use, 478Mb max +#> [Tune-y] 5: mmce.test.mean=0.04; time: 0.0 min; memory: 156Mb use, 479Mb max #> [Tune-x] 6: C=4.1e+03; sigma=1 -#> [Tune-y] 6: mmce.test.mean=0.0667; time: 0.0 min; memory: 156Mb use, 478Mb max +#> [Tune-y] 6: mmce.test.mean=0.0667; time: 0.0 min; memory: 156Mb use, 479Mb max #> [Tune-x] 7: C=0.000244; sigma=4.1e+03 -#> [Tune-y] 7: mmce.test.mean=0.567; time: 0.0 min; memory: 156Mb use, 478Mb max +#> [Tune-y] 7: mmce.test.mean=0.567; time: 0.0 min; memory: 156Mb use, 479Mb max #> [Tune-x] 8: C=1; sigma=4.1e+03 -#> [Tune-y] 8: mmce.test.mean=0.687; time: 0.0 min; memory: 156Mb use, 478Mb max +#> [Tune-y] 8: mmce.test.mean=0.687; time: 0.0 min; memory: 156Mb use, 479Mb max #> [Tune-x] 9: C=4.1e+03; sigma=4.1e+03 -#> [Tune-y] 9: mmce.test.mean=0.687; time: 0.0 min; memory: 156Mb use, 478Mb max +#> [Tune-y] 9: mmce.test.mean=0.687; time: 0.0 min; memory: 156Mb use, 479Mb max #> [Tune] Result: C=1; sigma=1 : mmce.test.mean=0.04 res #> Tune result: @@ -619,15 +619,15 @@

    Grid search without manual di

    Note that res$opt.path contains the parameter values on the original scale.

    as.data.frame(res$opt.path)
     #>     C sigma mmce.test.mean dob eol error.message exec.time
    -#> 1 -12   -12     0.52666667   1  NA          <NA>     0.054
    -#> 2   0   -12     0.52666667   2  NA          <NA>     0.055
    -#> 3  12   -12     0.04000000   3  NA          <NA>     0.052
    -#> 4 -12     0     0.52666667   4  NA          <NA>     0.054
    -#> 5   0     0     0.04000000   5  NA          <NA>     0.053
    -#> 6  12     0     0.06666667   6  NA          <NA>     0.057
    -#> 7 -12    12     0.56666667   7  NA          <NA>     0.059
    -#> 8   0    12     0.68666667   8  NA          <NA>     0.061
    -#> 9  12    12     0.68666667   9  NA          <NA>     0.056
    +#> 1 -12   -12     0.52666667   1  NA          <NA>     0.048
    +#> 2   0   -12     0.52666667   2  NA          <NA>     0.050
    +#> 3  12   -12     0.04000000   3  NA          <NA>     0.046
    +#> 4 -12     0     0.52666667   4  NA          <NA>     0.050
    +#> 5   0     0     0.04000000   5  NA          <NA>     0.047
    +#> 6  12     0     0.06666667   6  NA          <NA>     0.048
    +#> 7 -12    12     0.56666667   7  NA          <NA>     0.049
    +#> 8   0    12     0.68666667   8  NA          <NA>     0.049
    +#> 9  12    12     0.68666667   9  NA          <NA>     0.049
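As a quick sanity check of the note above that res$opt.path stores the untransformed values, the grid points -12, 0 and 12 back-transform to exactly the C and sigma values printed in the tuning log, assuming a trafo of the form function(x) 2^x:

## Back-transforming the grid values by hand
2^c(-12, 0, 12)
# i.e. 0.000244140625, 1 and 4096, matching 0.000244, 1 and 4.1e+03 in the log above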
     

In order to get the transformed parameter values instead, use function
@@ -670,11 +670,11 @@

Iterated F-Racing f
 #> 5 11.995501 polydot NA 2 0.08 5 NA
 #> 6 -5.731782 vanilladot NA NA 0.14 6 NA
 #> error.message exec.time
-#> 1 <NA> 0.033
-#> 2 <NA> 0.034
-#> 3 <NA> 0.031
-#> 4 <NA> 0.035
-#> 5 <NA> 0.060
+#> 1 <NA> 0.031
+#> 2 <NA> 0.027
+#> 3 <NA> 0.029
+#> 4 <NA> 0.030
+#> 5 <NA> 0.028
 #> 6 <NA> 0.031
@@ -724,12 +724,12 @@

Tuning across wh
 #> 5 classif.randomForest NA 125
 #> 6 classif.randomForest NA 383
 #> mmce.test.mean dob eol error.message exec.time
-#> 1 0.04666667 1 NA <NA> 0.056
-#> 2 0.75333333 2 NA <NA> 0.055
-#> 3 0.03333333 3 NA <NA> 0.094
-#> 4 0.24000000 4 NA <NA> 0.060
-#> 5 0.04000000 5 NA <NA> 0.056
-#> 6 0.04000000 6 NA <NA> 0.079
+#> 1 0.04666667 1 NA <NA> 0.051
+#> 2 0.75333333 2 NA <NA> 0.054
+#> 3 0.03333333 3 NA <NA> 0.079
+#> 4 0.24000000 4 NA <NA> 0.053
+#> 5 0.04000000 5 NA <NA> 0.049
+#> 6 0.04000000 6 NA <NA> 0.071

    Multi-criteria evaluation and optimization

diff --git a/devel/html/wrapper/index.html b/devel/html/wrapper/index.html
index 81cf9d2a..2b7d6908 100644
--- a/devel/html/wrapper/index.html
+++ b/devel/html/wrapper/index.html
@@ -460,25 +460,25 @@

    Example: Bagging wrapper
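The tuning log in the hunk below comes from an rpart learner that is first wrapped for bagging and then wrapped again for tuning (the result prints as learner.id=classif.rpart.bagged.tuned with learner.class=TuneWrapper). A hedged reconstruction of how such a nested wrapper is typically built in mlr; the iteration counts, parameter ranges and resampling are assumptions, not taken from the diff:

## Bag an rpart learner, then tune minsplit (rpart) and bw.feats (bagging
## wrapper) jointly via random search; lrn then behaves like any other learner.
lrn = makeBaggingWrapper(makeLearner("classif.rpart"), bw.iters = 10)
ps = makeParamSet(
  makeIntegerParam("minsplit", lower = 1, upper = 10),
  makeNumericParam("bw.feats", lower = 0.25, upper = 1)
)
ctrl = makeTuneControlRandom(maxit = 10L)
rdesc = makeResampleDesc("CV", iters = 3)
lrn = makeTuneWrapper(lrn, resampling = rdesc, par.set = ps, control = ctrl)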

 #> With control class: TuneControlRandom
 #> Imputation value: 1
 #> [Tune-x] 1: minsplit=5; bw.feats=0.935
-#> [Tune-y] 1: mmce.test.mean=0.0467; time: 0.1 min; memory: 160Mb use, 478Mb max
+#> [Tune-y] 1: mmce.test.mean=0.0467; time: 0.1 min; memory: 160Mb use, 522Mb max
 #> [Tune-x] 2: minsplit=9; bw.feats=0.675
-#> [Tune-y] 2: mmce.test.mean=0.0467; time: 0.1 min; memory: 160Mb use, 478Mb max
+#> [Tune-y] 2: mmce.test.mean=0.0467; time: 0.1 min; memory: 160Mb use, 522Mb max
 #> [Tune-x] 3: minsplit=2; bw.feats=0.847
-#> [Tune-y] 3: mmce.test.mean=0.0467; time: 0.1 min; memory: 160Mb use, 478Mb max
+#> [Tune-y] 3: mmce.test.mean=0.0467; time: 0.1 min; memory: 160Mb use, 522Mb max
 #> [Tune-x] 4: minsplit=4; bw.feats=0.761
-#> [Tune-y] 4: mmce.test.mean=0.0467; time: 0.1 min; memory: 160Mb use, 478Mb max
+#> [Tune-y] 4: mmce.test.mean=0.0467; time: 0.1 min; memory: 160Mb use, 522Mb max
 #> [Tune-x] 5: minsplit=6; bw.feats=0.338
-#> [Tune-y] 5: mmce.test.mean=0.0867; time: 0.1 min; memory: 160Mb use, 478Mb max
+#> [Tune-y] 5: mmce.test.mean=0.0867; time: 0.1 min; memory: 160Mb use, 522Mb max
 #> [Tune-x] 6: minsplit=1; bw.feats=0.637
-#> [Tune-y] 6: mmce.test.mean=0.0467; time: 0.1 min; memory: 160Mb use, 478Mb max
+#> [Tune-y] 6: mmce.test.mean=0.0467; time: 0.1 min; memory: 160Mb use, 522Mb max
 #> [Tune-x] 7: minsplit=1; bw.feats=0.998
-#> [Tune-y] 7: mmce.test.mean=0.0467; time: 0.1 min; memory: 160Mb use, 478Mb max
+#> [Tune-y] 7: mmce.test.mean=0.0467; time: 0.1 min; memory: 160Mb use, 522Mb max
 #> [Tune-x] 8: minsplit=4; bw.feats=0.698
-#> [Tune-y] 8: mmce.test.mean=0.0467; time: 0.1 min; memory: 160Mb use, 478Mb max
+#> [Tune-y] 8: mmce.test.mean=0.0467; time: 0.1 min; memory: 160Mb use, 522Mb max
 #> [Tune-x] 9: minsplit=3; bw.feats=0.836
-#> [Tune-y] 9: mmce.test.mean=0.0467; time: 0.1 min; memory: 160Mb use, 478Mb max
+#> [Tune-y] 9: mmce.test.mean=0.0467; time: 0.1 min; memory: 160Mb use, 522Mb max
 #> [Tune-x] 10: minsplit=10; bw.feats=0.529
-#> [Tune-y] 10: mmce.test.mean=0.0533; time: 0.1 min; memory: 160Mb use, 478Mb max
+#> [Tune-y] 10: mmce.test.mean=0.0533; time: 0.1 min; memory: 160Mb use, 522Mb max
 #> [Tune] Result: minsplit=1; bw.feats=0.998 : mmce.test.mean=0.0467
 print(lrn)
 #> Model for learner.id=classif.rpart.bagged.tuned; learner.class=TuneWrapper
diff --git a/devel/mlr_tutorial.zip b/devel/mlr_tutorial.zip
index b8d44708..0e961a84 100644
Binary files a/devel/mlr_tutorial.zip and b/devel/mlr_tutorial.zip differ