Add lp2distr and risk2distr compositor #31

RaphaelS1 · 2019-10-31T15:47:56Z

Using mlr3pipelines, add a compositor (a node in a pipeline) that takes three inputs: a learner, a baseline estimator and a model form.

In mlr3pipelines notation this would look something like:

compositor = po("compositor", estimator = "Breslow", model = "PH")
learner_po = po("learner", learner = lrn("surv.rpart"))
graph = learner_po %>>% compositor
glrn = GraphLearner$new(graph)

I would also suggest some syntactic sugar of the form:

composed_learner = composeSurvival(learner = lrn("surv.rpart"),
                                   estimator = "Breslow", model = "PH")

Two open questions:

1. Should there be two compositors, one for crank to distr and the other for lp to distr? In the former case this may only make sense for PH models, whereas in the latter lp works for PH, AFT and PO models.
2. Should we only allow non-parametric baseline estimators such as Kaplan-Meier, or should we allow parametric ones too? The latter introduces complications, because parametric baselines are typically estimated jointly with the model coefficients using the full likelihood. I am unsure whether the maths works when only the baseline is estimated (@fkiraly ?)
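To make the first question concrete, here is a minimal base R sketch of how a baseline survival function S0 and a linear predictor lp combine under the three model forms mentioned above. The function name `compose_form` is hypothetical, and the AFT and PO sign conventions are assumptions: AFT here assumes log T = lp + error, and PO assumes that the odds of an event by time t are multiplied by exp(lp).

```r
# S(t | x) under PH, AFT and PO, given a baseline survival S0 and lp.
# Hypothetical helper; AFT and PO sign conventions are assumptions.
compose_form = function(S0, lp, form = c("ph", "aft", "po")) {
  form = match.arg(form)
  switch(form,
    # PH: hazard multiplied by exp(lp), so S(t) = S0(t)^exp(lp)
    ph  = function(t) S0(t)^exp(lp),
    # AFT: assuming log T = lp + error, time is rescaled by exp(-lp)
    aft = function(t) S0(t * exp(-lp)),
    # PO: assuming the odds of an event by t are multiplied by exp(lp)
    po  = function(t) 1 / (1 + exp(lp) * (1 - S0(t)) / S0(t))
  )
}

# Example baseline: exponential survival with rate 0.1
S0 = function(t) exp(-0.1 * t)
sapply(c("ph", "aft", "po"), function(f) compose_form(S0, lp = 0.5, form = f)(10))
```

Under these conventions a positive lp shortens survival in PH and PO but lengthens it in AFT. Whatever convention the compositor adopts would need to be documented.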


RaphaelS1 · 2019-11-02T10:11:21Z

Just to add two points to this:

This could take the form of one node with either:
1. Two inputs, where one is a survival learner and the other is either surv.kaplan or surv.nelson; or
2. One input, which is a survival learner, and two parameters for the estimation method and the model form (this is the option in the example above).
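Under option 1, the second input supplies a non-parametric baseline. Below is a minimal base R sketch of the two estimators those learners are named after: Kaplan-Meier for survival and Nelson-Aalen for the cumulative hazard. The function name `baseline_estimates` is hypothetical, and ties are handled by grouping events at each distinct event time.

```r
# Kaplan-Meier S(t) and Nelson-Aalen H(t) at each distinct event time.
# Hypothetical helper; status = 1 marks an event, 0 a censored time.
baseline_estimates = function(time, status) {
  ut = sort(unique(time[status == 1]))               # distinct event times
  n_risk  = sapply(ut, function(s) sum(time >= s))   # number at risk at s
  n_event = sapply(ut, function(s) sum(time == s & status == 1))
  data.frame(
    time = ut,
    km = cumprod(1 - n_event / n_risk),              # Kaplan-Meier survival
    na = cumsum(n_event / n_risk)                    # Nelson-Aalen cumulative hazard
  )
}

time   = c(2, 3, 3, 5, 7, 8)
status = c(1, 1, 0, 1, 0, 1)
baseline_estimates(time, status)
```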
There are some models, such as random forests, which assume a PH form but do not return a distribution. It therefore does not make sense to allow users to compose them with, for example, an AFT model. I am not sure whether we should a) implement the composition in the learner and thereby force the distr return to be of PH form, b) have some sort of look-up in the compositor to identify whether the model is compatible with PH, or c) just document that the learner should only be paired with PH, leaving users free to do what they want. The last option is the most user-friendly and diplomatic, but it can lead to problematic models.

RaphaelS1 · 2019-11-02T16:35:28Z

To add to my thoughts, here is some R-style pseudo-code for how I envisage this functioning. Assume that the compositor is one PipeOp with a single argument called form. The example below is for form = PH.

Let h2 be the hazard function for learner 2 (lrn2) and lp1 be the linear predictor for learner 1 (lrn1), and analogously for the other abbreviations. lrn1 supplies the relative risks and lrn2 the baseline distribution.

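# h2, S2: baseline hazard and survival functions of t from lrn2 (NULL if not predicted)
# lp1, crank1: linear predictor and continuous ranking from lrn1 (NULL if not predicted)
compose_ph = function(h2 = NULL, S2 = NULL, lp1 = NULL, crank1 = NULL) {
  # lp takes priority over crank; crank is assumed to be a relative risk already
  risk = if (!is.null(lp1)) exp(lp1) else crank1

  h = if (!is.null(h2)) function(t) h2(t) * risk
  S = if (!is.null(S2)) function(t) S2(t)^risk

  # f(t) = h(t) * S(t) needs both pieces; F(t) = 1 - S(t) needs only S
  pdf = if (!is.null(h) && !is.null(S)) function(t) h(t) * S(t)
  cdf = if (!is.null(S)) function(t) 1 - S(t)

  # returned as a plain list; a distribution object could wrap these
  list(pdf = pdf, cdf = cdf)
}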

This makes the following assumptions:

lrn1 includes lp and/or crank in its prediction.
If lp is included, it takes priority over crank.
If lp is excluded, crank is a relative risk that can be plugged straight into the PH/AFT/PO equations.

fkiraly · 2019-11-18T16:59:51Z

Given the discussion last week (about PipeOps etc.), I'll try to answer the original questions.

The composition makes sense as a PipeOp as long as it only mutates the output type. It would store a baseline hazard estimate at fit, then multiply it by the crank or relative hazard predictions (obtained from lp) at predict, so it fits the current PipeOp paradigm. For more generic composition, an AutoTuner-style wrapper might be more appropriate.
Where crank calibration strategies sit in the current mlr3(+pipelines) architectural paradigm is not entirely clear to me, because the composite may require additional fitting of a calibration functional, which as far as I can see is not a PipeOp.
As long as the model is a classical PH model, estimation of the baseline hazard can be completely separated from estimation of the proportional hazard, whether the baseline estimator is parametric, non-parametric or an abstract black box. More entangled model types may require MLE or EM-style algorithms, which aren't easily written as composites (unless you have a neural-network-style update composition architecture in place).
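The fit/predict split described above can be sketched in base R for the classical PH case. This assumes the baseline is estimated with Breslow's estimator (the estimator named in the first post). At fit, the sketch stores the baseline cumulative hazard H0(t) given the training lp. At predict, it returns S(t | x) = exp(-H0(t) exp(lp)). The names `fit_breslow` and `predict_surv` are hypothetical.

```r
# Fit: Breslow baseline cumulative hazard, stored as a step function of t.
# Hypothetical helper; assumes classical PH with a known training lp.
fit_breslow = function(time, status, lp) {
  ut = sort(unique(time[status == 1]))
  dH = sapply(ut, function(s) {
    # events at s divided by the summed relative risk of those at risk
    sum(time == s & status == 1) / sum(exp(lp[time >= s]))
  })
  # H0 is 0 before the first event time, then jumps at each event time
  stepfun(ut, c(0, cumsum(dH)))
}

# Predict: one row per new observation, one column per time point
predict_surv = function(H0, lp_new, times) {
  outer(exp(lp_new), H0(times), function(r, h) exp(-h * r))
}

time   = c(2, 3, 3, 5, 7, 8)
status = c(1, 1, 0, 1, 0, 1)
lp     = c(0.5, 0, -0.5, 0.2, 0, -0.2)
H0 = fit_breslow(time, status, lp)
predict_surv(H0, lp_new = c(0, 1), times = c(1, 4, 8))
```

With all lp equal to 0, the stored baseline reduces to the Nelson-Aalen estimate. This reflects the separation argument above: only the risk weights in the denominator depend on the PH part.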

fkiraly · 2019-11-18T17:04:53Z

Regarding the second post: on the question of what to do with generic models that return a crank or lp that is not interpretable as a relative risk in a PH model, there are perhaps two subsidiary points to make:
(i) as long as evaluation is fair, it would show that bad user-defined composites are rubbish.
(ii) even if the crank return etc. does not define a relative risk, you could still insert a risk calibration step in between and then use the PH assumption, or a conditional PH assumption.
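One possible calibration step for (ii), sketched in base R: treat the standardised crank as the single covariate of a Cox model, maximise the partial likelihood (Breslow handling of ties) over its coefficient beta, and map crank to the relative risk exp(beta * crank). This is only one choice of calibration functional, not necessarily the one intended above. The function name `calibrate_crank` and the search interval are assumptions.

```r
# Calibrate an arbitrary crank into a relative risk via a one-covariate
# Cox partial likelihood. Hypothetical helper; interval is an assumption.
calibrate_crank = function(time, status, crank, interval = c(-10, 10)) {
  x = as.vector(scale(crank))  # standardise so exp(beta * x) stays stable
  loglik = function(beta) {
    sum(sapply(which(status == 1), function(i) {
      # event i contributes its risk score relative to its risk set
      beta * x[i] - log(sum(exp(beta * x[time >= time[i]])))
    }))
  }
  beta = optimize(loglik, interval, maximum = TRUE)$maximum
  # map new crank values onto the same standardised scale, then to risks
  function(crank_new) exp(beta * (crank_new - mean(crank)) / sd(crank))
}

# Higher crank fails earlier here, so the fitted relative risk increases with crank
rr = calibrate_crank(time = 1:6, status = rep(1, 6), crank = 6:1)
rr(c(1, 6))
```

The calibrated risk could then be passed to a PH compositor in place of the raw crank.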

I'd be for a variant of (c): users can build all the rubbish models they want (perhaps they aren't all rubbish), and the benchmarking workflow tells them how rubbish the models actually are.
This could perhaps be combined with a model flag or trait that triggers a warning such as "prediction is not a relative hazard, or model doesn't comply with PH assumption. Hence, use within a PH compositor may result in an underperforming model" when the model is used in a PH compositor.
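A minimal sketch of the warning-flag variant of (c) in base R: the composition is still allowed, but a warning is raised when the learner does not carry a PH-compatibility trait. The trait name `"ph_risk"` and the function `check_ph_compatible` are hypothetical, not existing mlr3 properties.

```r
# Warn, but do not stop, when a learner lacks a (hypothetical) "ph_risk"
# trait and is used in a PH compositor.
check_ph_compatible = function(learner_properties, form) {
  if (form == "ph" && !("ph_risk" %in% learner_properties)) {
    warning("prediction is not a relative hazard, or model doesn't comply with ",
            "PH assumption. Hence, use within a PH compositor may result in an ",
            "underperforming model")
  }
  invisible(TRUE)
}

check_ph_compatible(c("ph_risk"), form = "ph")     # silent
check_ph_compatible(c("importance"), form = "aft") # silent: not a PH compositor
```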

fkiraly · 2019-11-18T17:06:50Z

Regarding the compositor in the third post: I'd be in favour of more fine-grained user control over which output is used to build the distr return object, rather than hard-coding a case distinction or conditional flow that looks somewhat arbitrary. Of course, what you coded could be the "sensible default".
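Finer-grained user control could be as small as one extra parameter. In the minimal base R sketch below, the hypothetical `use` argument lets the user choose which prediction feeds the composition, and `"auto"` reproduces the default from the third post (lp preferred over crank). The function name `select_risk` is also hypothetical.

```r
# Choose the relative-risk source explicitly; "auto" prefers lp over crank.
# Hypothetical helper and parameter names.
select_risk = function(lp = NULL, crank = NULL, use = c("auto", "lp", "crank")) {
  use = match.arg(use)
  if (use == "auto") use = if (!is.null(lp)) "lp" else "crank"
  switch(use,
    lp    = { if (is.null(lp)) stop("lp requested but not predicted"); exp(lp) },
    crank = { if (is.null(crank)) stop("crank requested but not predicted"); crank }
  )
}

select_risk(lp = 0, crank = 5)                # default: uses lp
select_risk(lp = 0, crank = 5, use = "crank") # user override: uses crank
```

Failing loudly when the requested output is missing keeps the user's choice explicit, instead of silently falling back to the other prediction.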

RaphaelS1 added the labels Priority: Medium, Status: Available and Type: Enhancement on Oct 31, 2019

This was referenced Oct 31, 2019:

Consider extending glms, esp. glmboost to other compositions #28 (Closed)

What type of PipeOp is composition? #35 (Closed)

RaphaelS1 mentioned this issue Nov 13, 2019:

survivalsvm return type #13 (Closed)

RaphaelS1 closed this as completed Nov 26, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add lp2distr and risk2distr compositor #31

Add lp2distr and risk2distr compositor #31

RaphaelS1 commented Oct 31, 2019 •

edited

RaphaelS1 commented Nov 2, 2019

RaphaelS1 commented Nov 2, 2019

fkiraly commented Nov 18, 2019 •

edited

fkiraly commented Nov 18, 2019 •

edited

fkiraly commented Nov 18, 2019

Add lp2distr and risk2distr compositor #31

Add lp2distr and risk2distr compositor #31

Comments

RaphaelS1 commented Oct 31, 2019 • edited

RaphaelS1 commented Nov 2, 2019

RaphaelS1 commented Nov 2, 2019

fkiraly commented Nov 18, 2019 • edited

fkiraly commented Nov 18, 2019 • edited

fkiraly commented Nov 18, 2019

RaphaelS1 commented Oct 31, 2019 •

edited

fkiraly commented Nov 18, 2019 •

edited

fkiraly commented Nov 18, 2019 •

edited