Add predict helpers such as: predict_expectation, predict_percentile (similar to lifelines) #190

konradsemsch · 2021-03-22T15:23:39Z

Hi! I really like your package implementation and its compatibility with sklearn, but one thing that is refraining us from using it in our solution is the lack of the above predict function implementations (particularly predict_expectation). In our case, we need to know the expected number of days a given client is expected to remain a client. Ranking observations by the obtained "arbitrary" score will not work.

Given that the above function is based on the estimated survival functions it should be possible to implement (as long as I read the lifelines code correctly):

    def predict_expectation(self, X: DataFrame, conditional_after: Optional[ndarray] = None) -> pd.Series:
        r"""
        Compute the expected lifetime, :math:`E[T]`, using covariates X. This algorithm to compute the expectation is
        to use the fact that :math:`E[T] = \int_0^\inf P(T > t) dt = \int_0^\inf S(t) dt`. To compute the integral, we use the trapezoidal rule to approximate the integral.
        Caution
        --------
        If the survival function doesn't converge to 0, then the expectation is really infinity and the returned
        values are meaningless/too large. In that case, using ``predict_median`` or ``predict_percentile`` would be better.
        Parameters
        ----------
        X: numpy array or DataFrame
            a (n,d) covariate numpy array or DataFrame. If a DataFrame, columns
            can be in any order. If a numpy array, columns must be in the
            same order as the training data.
        conditional_after: iterable, optional
            Must be equal is size to X.shape[0] (denoted `n` above).  An iterable (array, list, series) of possibly non-zero values that represent how long the
            subject has already lived for. Ex: if :math:`T` is the unknown event time, then this represents :math:`s` in
            :math:`T | T > s`. This is useful for knowing the *remaining* hazard/survival of censored subjects.
            The new timeline is the remaining duration of the subject, i.e. normalized back to starting at 0.
        Notes
        -----
        If X is a DataFrame, the order of the columns do not matter. But
        if X is an array, then the column ordering is assumed to be the
        same as the training dataset.
        See Also
        --------
        predict_median
        predict_percentile
        """
        subjects = utils._get_index(X)
        v = self.predict_survival_function(X, conditional_after=conditional_after)[subjects]
        return pd.Series(trapz(v.values.T, v.index), index=subjects)

Reference to the lifelines implementation:

sebp · 2021-03-25T07:57:18Z

It is correct that the expected survival time can be estimated with the integral under the survival function. However, this only leads to a reasonable estimate if the last observed time point corresponds to an event such that the survival function is zero thereafter. If that's not the case, the expected survival time is undefined without making any additional assumptions.

Typical assumptions are (i) assuming the remaining instances experience an event directly after the last observed time point (S(t) becomes zero), (ii) assuming the remaining instances never experience an event (S(t) remains constant), (iii) assuming a parametric form, e.g. exponential curve, at the tail. Finally, estimating the restricted mean survival time, i.e., an expectation over a pre-defined interval rather than all of time, is from a practical perspective the most reasonable.

I don't know how lifelines handles this, but I think, when implementing this, it is important to make it explicit that estimating the mean survival time can be tricky and avoid giving an answer for which the underlying assumptions are not clear.

konradsemsch · 2021-03-25T11:53:54Z

Hi @sebp and thanks for coming back on this! Yes, there's a note in the lifelines documentation referring to what you have mentioned:

In general, I would consider this a really desirable feature in one of the future releases. I find your package much better integrated with the sklearn ecosystem, but the lack of these prediction helpers prevents us from using it. An additional point in favour of implementing this is that when I check other package implementations, such as the quite famous survival package in R, it also provides an option to predict the expected survival time.

I'd be happy to contribute here with a PR 😉

sebp · 2021-03-25T12:34:25Z

If you can contribute with a PR, that would be even better.

PragyanSubedi · 2021-04-26T08:29:09Z

Any updates on this issue for implementation? @sebp @konradsemsch

sebp · 2021-05-02T19:55:16Z

@PragyanSubedi I did not have time to look into it, I'm afraid.

konradsemsch · 2021-05-03T07:57:50Z

@PragyanSubedi - I've started looking into that. I should have a first PR draft in a week.

konradsemsch · 2021-05-15T07:26:19Z

@sebp - I added a basic first PR: #196. Please take a look at it and let me know if you agree. Once we have that, I'd move further with implementing tests and adding this helper to other models as well.

bobbyx27 · 2022-04-26T09:34:58Z

Hi, @sebp : I wonder why not considering a predict_hazard_function in addition to predict_cumulative_hazard_function ? Or is there an obvious and simple way to get it, that I'm missing... Thank you very much for all the great work!

sebp · 2022-04-26T16:17:42Z

By definition, it would be the derivative of the cumulative hazard function. In general, directly estimating the hazard function is tricky. I'm curious, what would you like to use the hazard function for?

tomikrogerus · 2022-05-03T08:51:54Z

Hi, @konradsemsch and @sebp are you still working with this predict_expectation function? Would be really interested about it.

sebp · 2022-05-04T10:59:48Z

@tomikrogerus I'm currently not working on it. PR #196 is inactive too.

sebp mentioned this issue Apr 1, 2021

What is the origin of time for predict function? #94

Closed

sebp added the enhancement label Oct 26, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add predict helpers such as: predict_expectation, predict_percentile (similar to lifelines) #190

Add predict helpers such as: predict_expectation, predict_percentile (similar to lifelines) #190

konradsemsch commented Mar 22, 2021

sebp commented Mar 25, 2021

konradsemsch commented Mar 25, 2021 •

edited

sebp commented Mar 25, 2021 via email

PragyanSubedi commented Apr 26, 2021

sebp commented May 2, 2021

konradsemsch commented May 3, 2021

konradsemsch commented May 15, 2021

bobbyx27 commented Apr 26, 2022

sebp commented Apr 26, 2022 via email

tomikrogerus commented May 3, 2022 •

edited

sebp commented May 4, 2022

Add predict helpers such as: predict_expectation, predict_percentile (similar to lifelines) #190

Add predict helpers such as: predict_expectation, predict_percentile (similar to lifelines) #190

Comments

konradsemsch commented Mar 22, 2021

sebp commented Mar 25, 2021

konradsemsch commented Mar 25, 2021 • edited

sebp commented Mar 25, 2021 via email

PragyanSubedi commented Apr 26, 2021

sebp commented May 2, 2021

konradsemsch commented May 3, 2021

konradsemsch commented May 15, 2021

bobbyx27 commented Apr 26, 2022

sebp commented Apr 26, 2022 via email

tomikrogerus commented May 3, 2022 • edited

sebp commented May 4, 2022

konradsemsch commented Mar 25, 2021 •

edited

tomikrogerus commented May 3, 2022 •

edited