avoid creating regression tasks for data with categorical targets #91

Closed
giuseppec opened this issue Dec 18, 2018 · 7 comments

@giuseppec
Contributor

It is currently possible to create a regression task with a categorical target:

b = as_data_backend(iris)
TaskRegr$new("iris", backend = b, target = "Species")
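The kind of guard being asked for could look roughly like the sketch below. It uses base R only; `check_regr_target` is a hypothetical helper made up for illustration and is not part of mlr3, and the error wording is likewise an assumption.

```r
# Hypothetical helper (not an mlr3 function): refuse non-numeric regression targets.
check_regr_target = function(data, target) {
  y = data[[target]]
  if (!is.numeric(y)) {
    stop(sprintf("Target column '%s' must be numeric, not '%s'", target, class(y)[1L]))
  }
  invisible(TRUE)
}

# A numeric target passes silently.
check_regr_target(iris, "Sepal.Length")

# A factor target is rejected; capture the message instead of aborting the script.
result = tryCatch(
  check_regr_target(iris, "Species"),
  error = function(e) conditionMessage(e)
)
print(result)
```

A check of this shape would run at task construction, before any learner sees the data.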
@mohammadreza-sheykhmousa

mohammadreza-sheykhmousa commented Aug 28, 2020

@giuseppec @mllg I tried to train a regression model where the target is numeric but the covariates are a mix of categorical and numerical variables, and I get the following error:
Error: <TaskRegr:meuse> has the following unsupported feature types: factor
Can you please help me here?

@frdanconia

@mohammadreza-sheykhmousa @giuseppec @mllg I have the same problem: how do I create a regression task with a numeric target (of course) and categorical covariates? I would be very grateful for your help.

@mllg mllg reopened this Sep 24, 2020
@lebensterben

  1. @giuseppec There is nothing wrong with using even linear regression for classification.
  2. @mohammadreza-sheykhmousa @frdanconia For a factor variable with n levels, create n-1 binary indicator variables.
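As a minimal sketch of the second point, base R's `model.matrix()` can build the n-1 indicator columns. The `iris` data is used only for illustration; `Species` has three levels, so two indicator columns remain once the intercept column is dropped.

```r
# Build treatment-coded indicators for a 3-level factor.
# model.matrix() adds an intercept column first, which is dropped here.
indicators = model.matrix(~ Species, data = iris)[, -1]

# Two columns remain; the first level ("setosa") is the reference level.
print(colnames(indicators))
print(dim(indicators))
```

Rows belonging to the reference level have zeros in both columns.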

@mllg
Member

mllg commented Sep 25, 2020

TaskRegr:meuse has the following unsupported feature types: factor

That means that the learner you are trying to apply does not support factor features. You must convert them to numeric first or enhance your learner with one-hot encoding using mlr3pipelines:

library(mlr3pipelines)
learner = po("encode") %>>% learner
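A slightly fuller sketch of the same idea, assuming a recent mlr3 and mlr3pipelines. The task and learner are illustrative choices only: `regr.rpart` is used because it ships with mlr3, although it already accepts factors, and `method = "treatment"` is one of the encodings `po("encode")` offers alongside the default one-hot.

```r
library(mlr3)
library(mlr3pipelines)

# Regression task with a numeric target and one factor feature (Species).
task = as_task_regr(iris, target = "Sepal.Length", id = "iris_regr")

# Illustrative learner; substitute the learner that rejects factor features.
learner = lrn("regr.rpart")

# Feature types the learner declares support for.
print(learner$feature_types)

# Prepend the encoding step and wrap the graph so it behaves like a learner.
graph_learner = as_learner(po("encode", method = "treatment") %>>% learner)
graph_learner$train(task)

prediction = graph_learner$predict(task)
print(prediction)
```

The wrapped object can be passed to `resample()` or `benchmark()` like any other learner.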

@mllg
Member

mllg commented Sep 25, 2020

I have the same problem: how do I create a regression task with a numeric target (of course) and categorical covariates? I would be very grateful for your help.

Creating such a task should be no problem; it is only that some learners do not support factor features. If you can give an example where the task creation fails, please reopen.

@mllg mllg closed this as completed Sep 25, 2020
@mohammadreza-sheykhmousa

Hi @mllg, yes, you're right! I solved the problem as follows:

gr = pipeline_robustify(tsk_rgr, lrn) %>>% po("learner", lrn)
ede = resample(tsk_rgr, GraphLearner$new(gr), rsmp("holdout"))
tsk_regr1 = ede$task$clone()

Thanks to pipeline_robustify.

@raff-k

raff-k commented Jan 25, 2023

Unfortunately, I have the same problem with mgcv::gam, even though the original model itself can process factors. The option with mlr3pipelines::po did not help me. Below is an example.

library(dplyr)
library(mlr3)
library(mlr3extralearners)
library(mlr3pipelines)
library(mgcv)

# Example from here:
# https://mlr3extralearners.mlr-org.com/reference/mlr_learners_classif.gam.html

# ... get data
t <- mlr3::tsk("spam")
t_data <- t$data()

# ... (re-)create the task
t_re <- as_task_classif( 
  id = "spam", 
  target = "type", 
  positive = "spam",
  x = t_data
)

# ... init mgcv gam learner
l <- mlr3::lrn("classif.gam",
              formula =  type ~ s(george) + s(charDollar) + s(edu) + ti(george, edu))

# ... train and get gam
l$train(t_re)
l$model # ... creates some output -> example of mlr3-page successfully reproduced


# Now reproduce the error caused by the added factor
set.seed(123)
t_data_fac <- t_data %>%
        dplyr::mutate(fac = sample(x = c(1:3), size = nrow(t_data), replace = TRUE) %>% as.factor())

# create task with additional factor variable
t_fac = as_task_classif( 
  id = "spam_fac", 
  target = "type", 
  positive = "spam",
  x = t_data_fac
)

# ... init mgcv gam learner with factor
l_fac <- mlr3::lrn("classif.gam",
              formula =  type ~ s(george) + s(charDollar) + s(edu) + ti(george, edu) + fac)


# ... train and get gam
l_fac$train(t_fac)

# Error: <TaskClassif:spam_fac> has the following unsupported feature types: factor

# Here is the page of mlr3-implementation of mgcv::gam
# https://mlr3extralearners.mlr-org.com/reference/mlr_learners_classif.gam.html
# Perhaps in Feature Types: “logical”, “integer”, “numeric” --> "factor" is missing?

# ... but factors are normally no issue for mgcv::gam
l_gam <- mgcv::gam(formula = type ~ s(george) + s(charDollar) + s(edu) + ti(george, edu) + fac, 
                 data = t_data_fac, family = "binomial")

l_gam %>% summary()

# Parametric coefficients:
#              Estimate Std. Error z value Pr(>|z|)
# (Intercept) 2.441e+03  2.479e+05   0.010    0.992
# fac2        4.675e-02  9.814e-02   0.476    0.634
# fac3        5.879e-02  9.689e-02   0.607    0.544



# Now try the mlr3pipelines solution, which does not work either
l_fac_enc <- mlr3pipelines::po("encode")  %>>% 
  mlr3::lrn("classif.gam", formula =  type ~ s(george) + s(charDollar) + s(edu) + ti(george, edu) + fac)

l_fac_enc$train(input = t_fac)
# Error in eval(predvars, data, env) : object 'fac' not found
# This happened PipeOp classif.gam's $train()
