avoid creating regression tasks for data with categorical targets #91

giuseppec · 2018-12-18T14:13:08Z

It is currently possible to create a regression task with a categorical target:

library(mlr3)
# Species is a factor, yet the regression task is created without an error
b = as_data_backend(iris)
TaskRegr$new("iris", backend = b, target = "Species")
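The kind of check being requested can be pictured with a small base-R sketch. The helper `assert_numeric_target()` is hypothetical and is not part of mlr3; it only illustrates rejecting a non-numeric target column before a regression task would be built.

```r
# Hypothetical helper (not an mlr3 function): stop if the target is not numeric.
assert_numeric_target <- function(data, target) {
  if (!is.numeric(data[[target]])) {
    stop(sprintf(
      "Target column '%s' must be numeric, but has class '%s'.",
      target, class(data[[target]])[1]
    ))
  }
  invisible(TRUE)
}

# A numeric target passes silently.
ok <- assert_numeric_target(iris, "Sepal.Length")

# A factor target raises an error, captured here with try().
res <- try(assert_numeric_target(iris, "Species"), silent = TRUE)
inherits(res, "try-error")
```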


mohammadreza-sheykhmousa · 2020-08-28T09:43:02Z

@giuseppec @mllg I tried to train a regression model where the target is numeric but the covariates are a mix of categorical and numerical variables, and I get the following error:
Error: <TaskRegr:meuse> has the following unsupported feature types: factor
Can you please help me here?

frdanconia · 2020-09-24T08:28:36Z

@mohammadreza-sheykhmousa @giuseppec @mllg I have the same problem: how do I create a regression task with a numeric target (of course) and categorical covariates? I would be very grateful for your help.

lebensterben · 2020-09-24T09:23:19Z

@giuseppec There is nothing wrong with using even linear regression for classification.
@mohammadreza-sheykhmousa @frdanconia For a factor variable with n levels, create n-1 binary indicator variables.
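The n-1 indicator encoding can be sketched in base R with `model.matrix()`, which applies treatment contrasts to unordered factors by default. The data frame and its column names below are made up for illustration.

```r
# Toy data: numeric target, one numeric covariate, one factor covariate.
df <- data.frame(
  y    = c(1.2, 3.4, 2.2, 5.1, 4.0, 2.9),
  x    = c(10, 20, 15, 30, 25, 18),
  soil = factor(c("a", "b", "c", "a", "b", "c"))
)

# With default treatment contrasts, a factor with n levels becomes
# n - 1 indicator columns; the first level ("a") is the baseline.
mm <- model.matrix(~ x + soil, data = df)

# Drop the intercept column and put the target back.
df_num <- data.frame(y = df$y, mm[, -1, drop = FALSE])
colnames(df_num)
```

Every column of `df_num` is numeric, so it can be passed to a learner that does not accept factor features.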

mllg · 2020-09-25T12:19:31Z

> TaskRegr:meuse has the following unsupported feature types: factor

That means that the learner you are trying to apply does not support factor features. You must convert them to numeric first or enhance your learner with one-hot encoding using mlr3pipelines:

library(mlr3pipelines)
learner = po("encode") %>>% learner
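The snippet above leaves `learner` undefined, so here is a fuller sketch of the same idea. It assumes the packages mlr3, mlr3pipelines and rpart are installed; the task built from `iris` and the choice of `regr.rpart` are only for illustration (rpart itself accepts factors, so the encoding is not strictly needed for it).

```r
library(mlr3)
library(mlr3pipelines)

# Regression task with a numeric target and one factor feature (Species).
task <- as_task_regr(iris, target = "Sepal.Length", id = "iris_regr")

# Encoding on its own: the factor feature is replaced by numeric indicators.
encoded <- po("encode")$train(list(task))[[1]]
encoded$feature_types

# Encoding chained with a learner and wrapped so it behaves like one learner.
graph_learner <- as_learner(po("encode") %>>% lrn("regr.rpart"))
graph_learner$train(task)
prediction <- graph_learner$predict(task)
```

Wrapping the graph with `as_learner()` lets it be used with `resample()` and similar functions that expect a learner.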

mllg · 2020-09-25T12:20:33Z

> I have the same problem, how to create regression task with numeric target (of course) and categorical covariates? I would be very grateful for your help.

Creating such a task should be no problem, only some learners do not like the factor features. If you can give an example where the task creation fails, please reopen.

mohammadreza-sheykhmousa · 2020-09-28T09:41:19Z

Hi @mllg, yes, you're right! I solved the problem as follows:

gr = pipeline_robustify(tsk_rgr, lrn) %>>% po("learner", lrn)
ede = resample(tsk_rgr, GraphLearner$new(gr), rsmp("holdout"))
tsk_regr1 = ede$task$clone()

Thanks to pipeline_robustify.

raff-k · 2023-01-25T17:29:07Z

Unfortunately, I have the same problem with mgcv::gam, although the original model itself can process factors. The mlr3pipelines::po option did not help me. Below is an example.

library(dplyr)
library(mlr3)
library(mlr3extralearners)
library(mlr3pipelines)
library(mgcv)

# Example from here:
# https://mlr3extralearners.mlr-org.com/reference/mlr_learners_classif.gam.html

# ... get data
t <- mlr3::tsk("spam")
t_data <- t$data()

# ... (re-)create the task
t_re <- as_task_classif( 
  id = "spam", 
  target = "type", 
  positive = "spam",
  x = t_data
)

# ... init mgcv gam learner
l <- mlr3::lrn("classif.gam",
              formula =  type ~ s(george) + s(charDollar) + s(edu) + ti(george, edu))

# ... train and get gam
l$train(t_re)
l$model # ... creates some output -> example of mlr3-page successfully reproduced


# NOW, the error due to the created factor
set.seed(123)
t_data_fac <- t_data %>%
        dplyr::mutate(fac = sample(x = c(1:3), size = nrow(t_data), replace = TRUE) %>% as.factor())

# create task with additional factor variable
t_fac = as_task_classif( 
  id = "spam_fac", 
  target = "type", 
  positive = "spam",
  x = t_data_fac
)

# ... init mgcv gam learner with factor
l_fac <- mlr3::lrn("classif.gam",
              formula =  type ~ s(george) + s(charDollar) + s(edu) + ti(george, edu) + fac)


# ... train and get gam
l_fac$train(t_fac)

# Error: <TaskClassif:spam_fac> has the following unsupported feature types: factor

# Here is the page of mlr3-implementation of mgcv::gam
# https://mlr3extralearners.mlr-org.com/reference/mlr_learners_classif.gam.html
# Perhaps in Feature Types: “logical”, “integer”, “numeric” --> "factor" is missing?

# ... but normally, factors are no issues for mgcv::gam
l_gam <- mgcv::gam(formula = type ~ s(george) + s(charDollar) + s(edu) + ti(george, edu) + fac, 
                 data = t_data_fac, family = "binomial")

l_gam %>% summary()

# Parametric coefficients:
# Estimate Std. Error z value Pr(>|z|)
# (Intercept) 2.441e+03  2.479e+05   0.010    0.992
# fac2        4.675e-02  9.814e-02   0.476    0.634
# fac3        5.879e-02  9.689e-02   0.607    0.544



# Now, trying mlr3pipelines solution, but it does not work
l_fac_enc <- mlr3pipelines::po("encode")  %>>% 
  mlr3::lrn("classif.gam", formula =  type ~ s(george) + s(charDollar) + s(edu) + ti(george, edu) + fac)

l_fac_enc$train(input = t_fac)
# Error in eval(predvars, data, env) : object 'fac' not found
# This happened PipeOp classif.gam's $train()
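A possible explanation for the last error, offered as an assumption rather than a confirmed diagnosis: `po("encode")` with its default one-hot method replaces a factor column with indicator columns named after the column and its levels, so a column called `fac` no longer exists when the formula is evaluated. The sketch below uses made-up toy data and only mlr3 and mlr3pipelines to show the renaming; it does not involve `classif.gam`.

```r
library(mlr3)
library(mlr3pipelines)

# Toy data with a three-level factor named `fac`, mirroring the example above.
set.seed(123)
d <- data.frame(
  y   = rnorm(30),
  x   = runif(30),
  fac = factor(sample(1:3, size = 30, replace = TRUE), levels = 1:3)
)
task <- as_task_regr(d, target = "y", id = "toy_fac")

# One-hot encoding replaces `fac` with one indicator column per level.
encoded <- po("encode")$train(list(task))[[1]]
encoded$feature_names

# The original column name is gone, so a formula term `+ fac` cannot resolve.
"fac" %in% encoded$feature_names
```

If this is indeed the cause, the formula would have to refer to the encoded column names instead of `fac`. Whether that is the intended way to combine `classif.gam` with factor features is not settled in this thread.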

giuseppec added Priority: Medium Type: Enhancement labels Dec 18, 2018

mllg closed this as completed in e383404 Dec 18, 2018

mllg reopened this Sep 24, 2020

mllg closed this as completed Sep 25, 2020
