e1071::svm(): Use formula interface only if factors are present #1740

Merged
merged 10 commits into from Jun 17, 2019
1 change: 1 addition & 0 deletions NEWS.md
@@ -11,6 +11,7 @@
   See `?regr.randomForest` for more details.
   `regr.ranger` relies on the functions provided by the package ("jackknife" and "infjackknife" (default))
   (@jakob-r, #1784)
+- `e1071::svm()` now uses the formula interface only if factors are present. This change is intended to prevent the "stack overflow" errors some users encountered with large datasets. See #1738 for more information. (@mb706, #1740)

 ## functions - general
 - `getClassWeightParam()` now also works for Wrapper* Models and ensemble models (@ja-thomas, #891)
13 changes: 10 additions & 3 deletions R/RLearner_classif_svm.R
@@ -28,9 +28,16 @@ makeRLearner.classif.svm = function() {
 }

 #' @export
-trainLearner.classif.svm = function(.learner, .task, .subset, .weights = NULL, ...) {
-  f = getTaskFormula(.task)
-  e1071::svm(f, data = getTaskData(.task, .subset), probability = .learner$predict.type == "prob", ...)
+trainLearner.classif.svm = function(.learner, .task, .subset, .weights = NULL, ...) {
+  if (sum(getTaskDesc(.task)$n.feat[c("factors", "ordered")]) > 0) {
+    # use formula interface if factors are present
+    f = getTaskFormula(.task)
+    e1071::svm(f, data = getTaskData(.task, .subset), probability = .learner$predict.type == "prob", ...)
+  } else {
+    # use the "data.frame" approach if no factors are present to prevent issues like https://github.com/mlr-org/mlr/issues/1738
+    d = getTaskData(.task, .subset, target.extra = TRUE)
+    e1071::svm(d$data, d$target, probability = .learner$predict.type == "prob", ...)
+  }
 }

 #' @export
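The branching in this hunk can be sketched outside of mlr. The following is a minimal illustration, not the package's implementation: `choose_svm_interface()` and `fit_svm()` are hypothetical helper names, and the factor check on a plain data frame only approximates what mlr reads from the task description (`n.feat[c("factors", "ordered")]`). The model fit runs only if `e1071` is installed.

```r
# Hypothetical helper (not part of mlr): pick an interface for e1071::svm()
# depending on whether any feature column is a factor.
choose_svm_interface <- function(data, target) {
  features <- data[setdiff(names(data), target)]
  # is.factor() is TRUE for both unordered and ordered factors
  has_factors <- any(vapply(features, is.factor, logical(1)))
  if (has_factors) "formula" else "xy"
}

# Hypothetical wrapper mirroring the two branches of the hunk above.
fit_svm <- function(data, target, ...) {
  if (choose_svm_interface(data, target) == "formula") {
    # the formula interface expands factor features via the model matrix
    e1071::svm(as.formula(paste(target, "~ .")), data = data, ...)
  } else {
    # x/y interface: features and target are passed separately, no formula
    e1071::svm(x = data[setdiff(names(data), target)], y = data[[target]], ...)
  }
}

# All iris features are numeric, so the x/y interface is chosen.
interface_numeric <- choose_svm_interface(iris, "Species")

# Adding a factor feature switches the choice to the formula interface.
iris_with_factor <- iris
iris_with_factor$size <- factor(ifelse(iris$Sepal.Length > 5.8, "big", "small"))
interface_factor <- choose_svm_interface(iris_with_factor, "Species")

if (requireNamespace("e1071", quietly = TRUE)) {
  model_xy <- fit_svm(iris, "Species")
  model_formula <- fit_svm(iris_with_factor, "Species")
}
```

The target column is excluded before the check, so a factor target (as in classification) does not by itself force the formula interface.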
11 changes: 8 additions & 3 deletions R/RLearner_regr_svm.R
@@ -27,9 +27,14 @@ makeRLearner.regr.svm = function() {
 }

 #' @export
-trainLearner.regr.svm = function(.learner, .task, .subset, .weights = NULL, ...) {
-  f = getTaskFormula(.task)
-  e1071::svm(f, data = getTaskData(.task, .subset), ...)
+trainLearner.regr.svm = function(.learner, .task, .subset, .weights = NULL, ...) {
+  if (sum(getTaskDesc(.task)$n.feat[c("factors", "ordered")]) > 0) {
+    f = getTaskFormula(.task)
+    e1071::svm(f, data = getTaskData(.task, .subset), ...)
+  } else {
+    d = getTaskData(.task, .subset, target.extra = TRUE)
+    e1071::svm(d$data, d$target, ...)
+  }
 }

 #' @export
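To give a feel for why a formula over very many features can be fragile, the sketch below builds an explicit additive formula and counts how deeply its `+` calls are nested. This is only an illustration under assumptions: `plus_depth()` is a hypothetical helper, the term count is kept small, and the code does not reproduce the error reported in #1738 or claim to identify its exact cause.

```r
# Build a formula with many additive terms, similar in shape to what a
# formula interface has to process for a wide dataset.
n_terms <- 200L  # kept small here; the tests in this PR use 2e4 features
term_names <- paste0("V", seq_len(n_terms))
f <- as.formula(paste("y ~", paste(term_names, collapse = " + ")))

# Hypothetical helper: walk down the left-nested `+` calls iteratively
# (no recursion) and count them.
plus_depth <- function(formula) {
  expr <- formula[[3]]  # right-hand side of a two-sided formula
  depth <- 0L
  while (is.call(expr) && identical(expr[[1]], as.name("+"))) {
    depth <- depth + 1L
    expr <- expr[[2]]  # the left operand holds the rest of the chain
  }
  depth
}

depth <- plus_depth(f)  # n_terms - 1 nested calls
```

Each additional term adds one level of nesting, so code that processes the formula recursively has to descend roughly as many levels as there are features. Passing the feature data and target directly, as the new `else` branch does, avoids building such a formula at all.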
2 changes: 2 additions & 0 deletions docs/news/index.html

8 changes: 8 additions & 0 deletions tests/testthat/test_classif_svm.R
@@ -54,3 +54,11 @@ test_that("classif_svm", {
   preds = predict(model, multiclass.task)
   expect_lt(performance(preds), 0.3)
 })
+
+test_that("classif_svm with many features", {
+  set.seed(8008135)
+  xt = cbind(as.data.frame(matrix(rnorm(4e4), ncol = 2e4)), x = as.factor(c("a", "b")))
+  xt.task = makeClassifTask("xt", xt, "x")
+  # the given task has many features, the formula interface fails
+  train("classif.svm", xt.task)
+})
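The shape of the synthetic data in this test is easy to misread, so here is a small base R check of what it produces. It reuses the construction from the test verbatim; the factor count at the end is an assumed stand-in for mlr's `n.feat` bookkeeping, not the package's own code.

```r
set.seed(8008135)
# 4e4 random values arranged in 2e4 columns give a 2-row matrix; the factor
# target `x` is appended as one extra column.
xt <- cbind(as.data.frame(matrix(rnorm(4e4), ncol = 2e4)), x = as.factor(c("a", "b")))
xt_dim <- dim(xt)  # 2 rows, 20001 columns

# Count factor columns among the features only (the target is excluded).
features <- xt[setdiff(names(xt), "x")]
n_factor_features <- sum(vapply(features, is.factor, logical(1)))  # 0
```

With two observations, 20,000 numeric features, and no factor features, a task built from this data takes the new non-formula branch of `trainLearner.classif.svm`.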
9 changes: 9 additions & 0 deletions tests/testthat/test_regr_svm.R
@@ -30,3 +30,12 @@ test_that("regr_svm", {

   testCVParsets("regr.svm", regr.df, regr.target, tune.train = tt, tune.predict = tp, parset.list = parset.list)
 })
+
+test_that("regr_svm with many features", {
+  set.seed(8008135)
+  xt = cbind(as.data.frame(matrix(rnorm(4e4), ncol = 2e4)), x = 1:2)
+  xt.task = makeRegrTask("xt", xt, "x")
+  # the given task has many features, the formula interface fails
+  train("regr.svm", xt.task)
+})