
Sometimes we reduce variables to only 1 factor level - but not all learners can work with such variables #21

Closed
danielhorn opened this issue Jan 30, 2014 · 22 comments

@danielhorn (Collaborator) commented Jan 30, 2014

Example with kknn:

library(mlrMBO)
fun = function(x)
  sin(x$num2) + ifelse(x$disc1 == "a", sin(x$num1), 0)
ps = makeParamSet(
  makeDiscreteParam("disc1", values = c("a", "b")),
  makeNumericParam("num1", lower = 0, upper = 1, 
                   requires = quote(disc1 == "a")),
  makeNumericParam("num2", lower = 0, upper = 1)
)

res = mbo(fun, ps,
  learner = makeBaggingWrapper(makeLearner("regr.kknn"), 10L, predict.type = "se"),
  control = makeMBOControl(init.design.points = 20, iters = 10, infill.crit = "ei"))
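
The failure can be reduced to base R: model contrasts cannot be built for a factor with only one observed level (the same error surfaces later in this thread with regr.lm). A minimal reduction, assuming the learner builds a model matrix internally:

# Minimal base-R reduction (assumed mechanism): contrasts cannot be built
# for a factor with a single observed level.
d = data.frame(y = rnorm(5), disc1 = factor(rep("a", 5)))
lm(y ~ disc1, data = d)
# Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]):
#   contrasts can be applied only to factors with 2 or more levels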

I can think of 3 possible solutions:

  1. Guarantee inside mlrMBO (in the focus search) that every variable has at least 2 factor levels.
  2. Inside mlrMBO, before learning the model, remove variables with only 1 level.
  3. Force the user to use a preproc wrapper for their learner, which removes variables with only 1 factor level.
@berndbischl
Copy link
Member

Daniel, doesn't this only concern PREDICTION?
So what you say in 2) makes no real sense ("learning")?
Because in focussearch we never train a model?

And if so, I think I already took care of this in mlr in a general way so it should not happen anymore?

@danielhorn (Collaborator, Author)

Yes, 2) does not make real sense. I think we already discussed this a while ago.

I will test it again on Monday with Karin.

@berndbischl (Member)

If you test it, whatever the result, please add a unit test!

@danielhorn (Collaborator, Author)

Tested and unit test added.

We found both test cases that work fine and ones that fail. See e0bab5e.

@danielhorn (Collaborator, Author) commented Mar 31, 2014

Have a look at these examples. The first one succeeds, the second one produces an error. The only real difference is the second numeric param, which only the first example has.

library(mlrMBO)

f1 = function(x)
  ifelse(x$disc1 == "a", 2 * x$num1 - 1, 1 - x$num2)
ps1 = makeParamSet(
  makeNumericParam("num1", lower = -2, upper = 1),
  makeNumericParam("num2", lower = -1, upper = 2),
  makeDiscreteParam("disc1", values = c("a", "b"))
)

f2 = function(x)
  ifelse(x$disc1 == "a", 2 * x$num1 - 1, 1 - x$num1)
ps2 = makeParamSet(
  makeDiscreteParam("disc1", values = c("a", "b")),
  makeNumericParam("num1", lower = 0, upper = 1)
)

ctrl = makeMBOControl(iters = 2, init.design.points = 10, infill.opt.focussearch.points = 100)
lrn = makeLearner("regr.kknn")
mbo(f1, ps1, learner = lrn, control = ctrl)
mbo(f2, ps2, learner = lrn, control = ctrl)

@jakobbossek (Contributor)

What is the status here? At the moment both examples of Daniel's previous post work fine.

@danielhorn (Collaborator, Author)

I'm not sure if I am missing something at the moment, but have a look at this example:

library(mlrMBO)
par.set = makeParamSet(
  makeNumericVectorParam("x", len = 5, lower = 0, upper = 1),
  makeDiscreteParam("z", values = 1:10)
)
f = function(x) sum(x$x) + as.numeric(x$z)
learner = makeBaggingWrapper(makeLearner("regr.lm"), 2L)
learner = setPredictType(learner, "se")
control = makeMBOControl(init.design.points = 5L, iters = 2L, save.on.disk.at = numeric(0L))
control = setMBOControlInfill(control, crit = "ei")
res = mbo(f, par.set, learner = learner, control = control)

@jakobbossek (Contributor)

Ok, this one fails.

@KarinSchork (Contributor) commented Aug 18, 2014

I also found some cases which fail or produce warnings:

library(mlrMBO)

fun = function(x) {
  ifelse(x$disc1 == "a",
    (x$num1 - 0.3)^2 * (x$num1 + 2) * (x$num1 + 4) * (x$num1 + 0.1),
    (x$num1 + 0.2)^2 * (x$num1 - 1.1)^2)
}
ps = makeParamSet(
  makeDiscreteParam("disc1", values = c("a", "b")),
  makeNumericParam("num1", lower = 0, upper = 1)
)

learner1 = makeBaggingWrapper(makeLearner("regr.lm"), bw.iters = 10L)
learner1 = setPredictType(learner1, "se")
learner2 = makeBaggingWrapper(makeLearner("regr.blackboost"), bw.iters = 10L)
learner2 = setPredictType(learner2, "se")
learner3 = makeBaggingWrapper(makeLearner("regr.mob"), bw.iters = 10L)
learner3 = setPredictType(learner3, "se")
learner4 = makeBaggingWrapper(makeLearner("regr.crs"), bw.iters = 10L)
learner4 = setPredictType(learner4, "se")


controlMBO = makeMBOControl(init.design.points = 10,
  iters = 5, save.on.disk.at = numeric(0L))
controlMBO = setMBOControlInfill(controlMBO, crit = "lcb", opt = "focussearch")

set.seed(2274)
mbo(fun, ps, learner = learner1, control = controlMBO)
set.seed(2274)
mbo(fun, ps, learner = learner2, control = controlMBO)
set.seed(2274)
mbo(fun, ps, learner = learner3, control = controlMBO)
set.seed(2274)
mbo(fun, ps, learner = learner4, control = controlMBO)

@berndbischl (Member)

@jakobbossek
Please check if this is all tested and works.

If so, we can close.

@mllg (Member) commented Dec 11, 2014

learner3 triggered something mob-specific and should be fixed now.

@jakobbossek (Contributor)

learner2 caused an error because of a missing propose.time argument in extras on model failure. This is fixed now.

@berndbischl (Member)

The warning we see with learner1 seems to be simply bad luck: in the bagging we happen to sample "disc1" only in rows where it is "b".

We possibly want to stratify on the factors, but this might be hard...
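
A rough sketch of the stratification idea (hypothetical helper, not mlr code): draw the bootstrap indices within each factor level, so no bag can lose a level entirely.

# Hypothetical stratified bootstrap: sample row indices within each level of
# the stratification column; every non-empty level stays in every bag.
stratifiedBag = function(data, strat.col) {
  groups = split(seq_len(nrow(data)), data[[strat.col]])
  idx = unlist(lapply(groups, function(i) sample(i, length(i), replace = TRUE)))
  data[idx, , drop = FALSE]
}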

@berndbischl (Member)

An easier option here would be to simply learn on the non-constant features. But then we get problems in predict. We can handle this via a preproc wrapper: we store which features were constant in training (and removed) and remove them in prediction as well. Maybe this is best for now.
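
A minimal sketch of that idea, using mlr's makePreprocWrapper (the wrapper name and details are hypothetical):

library(mlr)
# Hypothetical wrapper: drop factor columns with a single observed level in
# training, remember which were dropped, and drop the same columns in predict.
makeDropConstFactorWrapper = function(learner) {
  makePreprocWrapper(learner,
    train = function(data, target, args) {
      const = vapply(data, function(col)
        is.factor(col) && nlevels(droplevels(col)) < 2L, logical(1L))
      const[names(data) == target] = FALSE
      list(data = data[, !const, drop = FALSE],
        control = list(dropped = names(data)[const]))
    },
    predict = function(data, target, args, control) {
      data[, setdiff(names(data), control$dropped), drop = FALSE]
    })
}

# Usage sketch: wrap the base learner, so each bag drops its own constant columns.
# lrn = makeBaggingWrapper(makeDropConstFactorWrapper(makeLearner("regr.lm")), 10L)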

@berndbischl added this to the v0.1 milestone Feb 10, 2016
@jakob-r (Member) commented Feb 10, 2016

We agreed that we want to check the initial design for whether all factor levels are present. If not, an error is thrown. This is an easy, fast fix.
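
A sketch of that check (hypothetical helper; parameter-set internals as in ParamHelpers):

# Hypothetical check: stop if the initial design misses any level of a
# discrete parameter.
checkInitDesignLevels = function(design, par.set) {
  for (p in Filter(function(p) p$type == "discrete", par.set$pars)) {
    miss = setdiff(names(p$values), unique(as.character(design[[p$id]])))
    if (length(miss) > 0L)
      stop(sprintf("initial design misses level(s) %s of parameter '%s'",
        paste(miss, collapse = ", "), p$id))
  }
}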

@jakobbossek (Contributor)

How does this solve the problem in cases where we use bagging / a bagging wrapper?
Even if all factor levels are covered for all discrete parameters in the initial design, the training sets generated for bagging might be awkward and contain only a single factor level.
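
A quick back-of-the-envelope check (the numbers are an assumed scenario): with 10 design points of which only 2 have disc1 == "b", a bootstrap bag of size 10 misses "b" with probability (8/10)^10, so degenerate bags are quite likely across 10 bagging iterations.

p.miss = (8 / 10)^10   # ~0.107: one bag misses level "b"
1 - (1 - p.miss)^10    # ~0.68: at least one of 10 bags is degenerate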

@jakobbossek (Contributor)

ping

@berndbischl (Member)

I think somebody needs to summarize the status here.
So do we still have examples that fail? How important are they?

I do have a general solution for all of these problems, I think, using the new vtreat package. But we cannot do that now, and it should be done in mlr.
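
For reference, the vtreat idea in a nutshell (a hedged sketch, not the planned mlr integration): design a treatment plan on the training data and apply the same plan at prediction time, so constant or novel factor levels get encoded consistently instead of breaking contrasts.

library(vtreat)
# Sketch: a numeric-outcome treatment plan; factor variables are turned into
# stable numeric indicator/impact columns.
d = data.frame(disc1 = factor(c("a", "a", "b", "a")), num1 = runif(4), y = rnorm(4))
plan = designTreatmentsN(d, varlist = c("disc1", "num1"), outcomename = "y")
d.treated = prepare(plan, d)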

@ja-thomas (Contributor)

Just reran the examples:


f = function(x) {
  ifelse(x$disc1 == "a",
    (x$num1 - 0.3)^2 * (x$num1 + 2) * (x$num1 + 4) * (x$num1 + 0.1),
    (x$num1 + 0.2)^2 * (x$num1 - 1.1)^2)
}
ps = makeParamSet(
  makeDiscreteParam("disc1", values = c("a", "b")),
  makeNumericParam("num1", lower = 0, upper = 1)
)


fun = makeSingleObjectiveFunction(fn = f, par.set = ps, has.simple.signature = FALSE) 

learner1 = makeBaggingWrapper(makeLearner("regr.lm"), bw.iters = 10L)
learner1 = setPredictType(learner1, "se")
learner2 = makeBaggingWrapper(makeLearner("regr.blackboost"), bw.iters = 10L)
learner2 = setPredictType(learner2, "se")
learner3 = makeBaggingWrapper(makeLearner("regr.mob"), bw.iters = 10L)
learner3 = setPredictType(learner3, "se")
learner4 = makeBaggingWrapper(makeLearner("regr.crs"), bw.iters = 10L)
learner4 = setPredictType(learner4, "se")


controlMBO = makeMBOControl()
controlMBO = setMBOControlTermination(controlMBO, iters = 5)
controlMBO = setMBOControlInfill(controlMBO, crit = "cb", opt = "focussearch")

des = generateDesign(n = 10, ps)

set.seed(2274)
mbo(fun, des, learner = learner1, control = controlMBO) # fails
 > Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) : 
  contrasts can be applied only to factors with 2 or more levels 

set.seed(2274)
mbo(fun, des, learner = learner2, control = controlMBO) # works
set.seed(2274)
mbo(fun, des, learner = learner3, control = controlMBO) # fails
>  Error in trainLearner.regr.mob(.learner = list(id = "regr.mob", type = "regr",  : 
  Failed to fit party::mob. Some coefficients are estimated as NA 

set.seed(2274)
mbo(fun, des, learner = learner4, control = controlMBO) # warnings
> There were 50 or more warnings (use warnings() to see the first 50)
> warnings()
> Warning messages:
> 1: In krscvNOMAD(xz = xz, y = y, degree.max = degree.max,  ... :
   optimal degree equals search maximum (3): rerun with larger degree.max

1 and 3 fail, 2 runs, 4 gives warnings.

So it depends on the learner, and a general fix has to be in mlr; I'm not sure we can solve it directly in mlrMBO.

@mllg (Member) commented Nov 22, 2016

Now all examples work and I struggle to reproduce the problem. This was maybe resolved during code cleanup (I added some drop = FALSE, replaced sapply with vapply, etc.).

If someone has an example that still fails, please post it.

@ja-thomas (Contributor)

I would suggest we close this until someone can produce a similar problem again.

OK? @berndbischl @jakobbossek @mllg @jakob-r

Otherwise @berndbischl can/should have a look.

@mllg (Member) commented Jan 2, 2017

Agreed.

@mllg closed this as completed Jan 2, 2017