avoid creating regression tasks for data with categorical targets #91
Comments
@giuseppec @mllg I tried to train a regression model where the target is numeric but the covariates are a mix of categorical and numerical variables, and I get the following error:
@mohammadreza-sheykhmousa @giuseppec @mllg I have the same problem: how do I create a regression task with a numeric target (of course) and categorical covariates? I would be very grateful for your help.
That means that the learner you are trying to apply does not support factor features. You must convert them to numeric first, or extend your learner with one-hot encoding using mlr3pipelines:
library(mlr3pipelines)
learner = po("encode") %>>% learner
Creating such a task should be no problem; only some learners do not support factor features. If you can give an example where the task creation fails, please reopen.
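For readers following along, here is a minimal sketch of the encoding approach described above, written out end to end. The toy data frame `d`, its columns, and the choice of `regr.rpart` are assumptions made for illustration only; rpart handles factors on its own, so it merely stands in for a learner that does not. The result of `%>>%` is a Graph, so the sketch wraps it with `as_learner()` before resampling.

```r
library(mlr3)
library(mlr3pipelines)

# Hypothetical toy data: numeric target, one numeric and one factor feature.
set.seed(1)
d = data.frame(
  y = rnorm(60),
  x = rnorm(60),
  g = factor(sample(c("a", "b", "c"), 60, replace = TRUE))
)
task = as_task_regr(d, target = "y", id = "toy")

# Encode factor features first, then pass the result on to the learner.
# regr.rpart is only a stand-in; substitute the learner that rejects factors.
graph = po("encode") %>>% lrn("regr.rpart")

# Wrap the Graph so it can be used wherever a Learner is expected.
glrn = as_learner(graph)

rr = resample(task, glrn, rsmp("holdout"))
print(rr$aggregate(msr("regr.mse")))
```

Because the encoding step lives inside the learner, it is fitted on the training split only and reapplied to the test split during resampling.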
Hi @mllg, yes, you're right! I solved the problem as follows:
gr = pipeline_robustify(tsk_rgr, lrn) %>>% po("learner", lrn)
ede = resample(tsk_rgr, GraphLearner$new(gr), rsmp("holdout"))
tsk_regr1 = ede$task$clone()
Thanks to
Unfortunately, I have the same problem with mgcv::gam, even though the original model itself can process factors. The mlr3pipelines option does not work either:
library(dplyr)
library(mlr3)
library(mlr3extralearners)
library(mlr3pipelines)
library(mgcv)
# Example from here:
# https://mlr3extralearners.mlr-org.com/reference/mlr_learners_classif.gam.html
# ... get data
t <- mlr3::tsk("spam")
t_data <- t$data()
# ... (re-)create the task
t_re <- as_task_classif(
  id = "spam",
  target = "type",
  positive = "spam",
  x = t_data
)
# ... init mgcv gam learner
l <- mlr3::lrn("classif.gam",
  formula = type ~ s(george) + s(charDollar) + s(edu) + ti(george, edu))
# ... train and get gam
l$train(t_re)
l$model # ... creates some output -> example of mlr3-page successfully reproduced
# NOW, the error due to the created factor
set.seed(123)
t_data_fac <- t_data %>%
  dplyr::mutate(fac = sample(x = 1:3, size = nrow(t_data), replace = TRUE) %>% as.factor())
# create task with additional factor variable
t_fac = as_task_classif(
  id = "spam_fac",
  target = "type",
  positive = "spam",
  x = t_data_fac
)
# ... init mgcv gam learner with factor
l_fac <- mlr3::lrn("classif.gam",
  formula = type ~ s(george) + s(charDollar) + s(edu) + ti(george, edu) + fac)
# ... train and get gam
l_fac$train(t_fac)
# Error: <TaskClassif:spam_fac> has the following unsupported feature types: factor
# Here is the page of mlr3-implementation of mgcv::gam
# https://mlr3extralearners.mlr-org.com/reference/mlr_learners_classif.gam.html
# Perhaps in Feature Types: “logical”, “integer”, “numeric” --> "factor" is missing?
# ... but normally, factors are no issues for mgcv::gam
l_gam <- mgcv::gam(
  formula = type ~ s(george) + s(charDollar) + s(edu) + ti(george, edu) + fac,
  data = t_data_fac, family = "binomial")
l_gam %>% summary()
# Parametric coefficients:
#              Estimate Std. Error z value Pr(>|z|)
# (Intercept) 2.441e+03  2.479e+05   0.010    0.992
# fac2        4.675e-02  9.814e-02   0.476    0.634
# fac3        5.879e-02  9.689e-02   0.607    0.544
# Now, trying mlr3pipelines solution, but it does not work
l_fac_enc <- mlr3pipelines::po("encode") %>>%
  mlr3::lrn("classif.gam", formula = type ~ s(george) + s(charDollar) + s(edu) + ti(george, edu) + fac)
l_fac_enc$train(input = t_fac)
# Error in eval(predvars, data, env) : object 'fac' not found
# This happened in PipeOp classif.gam's $train()
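A likely explanation for the `object 'fac' not found` error: `po("encode")` replaces the factor column with dummy columns, so a formula that still names `fac` no longer matches any column in the encoded task. The sketch below only inspects the column names after encoding. The toy data frame `d` and its columns are hypothetical, and `method = "treatment"` is chosen here for illustration rather than taken from the thread.

```r
library(mlr3)
library(mlr3pipelines)

# Hypothetical toy data with a binary target and a three-level factor feature.
set.seed(123)
d = data.frame(
  y = factor(sample(c("no", "yes"), 30, replace = TRUE)),
  x = rnorm(30),
  fac = factor(sample(1:3, 30, replace = TRUE))
)
task = as_task_classif(d, target = "y", id = "toy")

# Apply the encoding PipeOp on its own and look at the resulting features.
# Treatment encoding drops the first level and creates one column per
# remaining level, named <column>.<level>.
encoded = po("encode", method = "treatment")$train(list(task))[[1]]
print(encoded$feature_names)

# The original column name is gone after encoding.
print("fac" %in% encoded$feature_names)
```

If this is indeed the cause, the formula passed to the learner would need to reference the encoded columns (for example `fac.2 + fac.3`) instead of `fac`.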
It is currently possible to create regression tasks with categorical target: