hyperband and non-linear runtime scale #13

berndbischl · 2019-10-01T18:41:08Z

i am pretty sure, all of the HB formulas dont hold anymore if the tuned algo does not scale linear in runtime with eta.
this is especially true if we use the subsampling trick. we at least need to document this, but also maybe adapt a bit.

this is a more complicated issue, and needs to be discussed in the team
of course, would be great if people already post observations and thoughts here

SebGGruber · 2019-10-08T08:32:15Z

Documenting appropriately is a no-brainer - I already did that today.
But maybe give an additional argument specifying the runtime complexity of the learner w.r.t. the chosen budget in big-o notation and update the HB formula accordingly, so the first and the last bracket require approximately the same runtime again.

SebGGruber · 2019-10-08T10:08:56Z

I had some further thoughts about it and if we simply apply the inverse of the complexity on each budget, we should end up with the same budget across all brackets again.
A simple example for O(f(budget)) with f(budget) = log(budget) :

The inverse then would be g := f^(-1) = exp

Assume we are using R = 2 and eta = 2, the brackets layout would look like this:

 bracket | bracket_stage | budget | n.configs | "real" runtime (ignoring constants) = f(budget)
    1             1           1          2          log(1) = 0
    1             2           2          1          log(2) = 0.693
    2             1           2          2          log(2) = 0.693

In this example, the usual hyperband algorithm always takes a lot longer to run (theoretically) in the last bracket compared to the first ones.
How to fix this? Apply the inverse of the complexity function on each budget in each bracket stage:

 bracket | bracket_stage | budget = g(budget_old) | n.configs | "real" runtime (ignoring constants)
   1              1            exp(1) = 0               2           log(0) = 1
   1              2            exp(2) = 7.39            1           log(7.39) = 2
   2              1            exp(2) = 7.39            2           log(7.39) = 2

Now, the sum of the budgets is (approximately) equal across all brackets, again.

The only issue I can see here is, that the user specifies R = 2, but the algorithm then uses a budget of 7.39 for the last bracket stage. This is misleading? But at least the budgets are scaled correctly

SebGGruber · 2019-11-12T19:23:23Z

should work with the param trafo.
I added an example to the "examples" section of the class doc.
It's a dumb example, but it shows how it would work and I didnt know better

set.seed(123)
ps = ParamSet$new(list(
  ParamInt$new("nrounds", lower = 1, upper = 10, tags = "budget"),
  ParamDbl$new("eta",     lower = 0, upper = 1),
  ParamFct$new("booster", levels = c("gbtree", "gblinear", "dart"))
))

ps$trafo = function(x, param_set) {
  x$nrounds = round(log(x$nrounds)) + 1L
  return(x)
}

inst = TuningInstance$new(
   tsk("iris"),
   lrn("classif.xgboost"),
   rsmp("holdout"),
   msr("classif.ce"),
   ps,
   term("evals", n_evals = 100000)
)
tuner = TunerHyperband$new(eta = 2)
tuner$tune(inst)
inst$best()

juliambr · 2020-05-25T12:51:09Z

From a theoretical perspective, I think this issue is not a problem. The only assumption the hyperband paper makes is that the algorithm performance converges with a budget parameter going to infty.

"Adapting for convergence rates" is another topic, and is also given as outlook in the original hyperband paper.
The issue can be closed IMO, but another interesting research question can be opened.

SebGGruber self-assigned this Oct 8, 2019

SebGGruber added the ready for review label Nov 12, 2019

jakob-r removed the ready for review label May 20, 2020

juliambr added Status: Revision Needed Status: Review Needed and removed Status: Revision Needed labels May 25, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

hyperband and non-linear runtime scale #13

hyperband and non-linear runtime scale #13

berndbischl commented Oct 1, 2019

SebGGruber commented Oct 8, 2019 •

edited

Loading

SebGGruber commented Oct 8, 2019 •

edited

Loading

SebGGruber commented Nov 12, 2019 •

edited

Loading

juliambr commented May 25, 2020 •

edited

Loading

hyperband and non-linear runtime scale #13

hyperband and non-linear runtime scale #13

Comments

berndbischl commented Oct 1, 2019

SebGGruber commented Oct 8, 2019 • edited Loading

SebGGruber commented Oct 8, 2019 • edited Loading

SebGGruber commented Nov 12, 2019 • edited Loading

juliambr commented May 25, 2020 • edited Loading

SebGGruber commented Oct 8, 2019 •

edited

Loading

SebGGruber commented Oct 8, 2019 •

edited

Loading

SebGGruber commented Nov 12, 2019 •

edited

Loading

juliambr commented May 25, 2020 •

edited

Loading