
PHReg predicted hazard ratios size does not match test set size #3178

Open
EmanuelGoncalves opened this issue Aug 23, 2016 · 13 comments

@EmanuelGoncalves

The prediction of the hazard ratios with the PHReg model seems to be constrained to the size of the training set; is this correct?

I don't see this behaviour in the lifelines Python package (https://github.com/CamDavidsonPilon/lifelines).

Small working example below.

import numpy as np
from statsmodels.duration.hazard_regression import PHReg

# Data
n, p = 500, 5
exog = np.random.normal(size=(n, p))
lin_pred = exog.sum(1)
endog = -np.exp(-lin_pred)*np.log(np.random.uniform(size=n))


# Train model - 200 observations
model = PHReg(endog[:200], exog[:200]).fit()
print(model.summary())


# Test model - 300 observations
hr_pred = model.predict(exog[200:], pred_type='hr')

print(len(hr_pred.predicted_values))
# Output: 200 (it should be 300, the same size as exog[200:])

Thank you,

@kshedden
Contributor

kshedden commented Aug 24, 2016

Emanuel,

Thanks for the report. It looks like this is due to the argument order
being different in PHReg compared to some of the other predict
implementations. If you use named arguments as follows it should work:

hr_pred = model.predict(exog=exog[200:], pred_type='hr')

Unlike other predict methods, in PHReg we allow people to pass in a new endog to update the baseline hazard estimates. This endog is the first positional argument to PHRegResults.predict.
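
To make the argument binding concrete, here are both calls side by side (a sketch reusing model, exog and the predicted_values attribute from the code above):

# The single positional argument is bound to endog, so the baseline/training
# exog (200 rows) is used and 200 values come back:
hr_pred = model.predict(exog[200:], pred_type='hr')
print(len(hr_pred.predicted_values))   # 200

# Naming the argument binds it to exog, giving one prediction per test row:
hr_pred = model.predict(exog=exog[200:], pred_type='hr')
print(len(hr_pred.predicted_values))   # 300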

@josef-pkt can comment on whether this is a bug that needs to be fixed.

Kerby

@EmanuelGoncalves
Author

Dear Kerby,

I was expecting exog to be the first argument of the predict function. Indeed, if I specify the argument name then the sizes match.

Thank you for your support,

@akravetz

@josef-pkt I recently ran into this as well. I'm happy to submit a PR updating the parameter order if that's the correct solution

@josef-pkt
Member

josef-pkt commented Nov 24, 2019

PHReg.predict has a different argument order than PHRegResults.predict.

I think we should fix it in PHRegResults, but we need to find a way to raise a warning about the change in API; i.e., we cannot silently return different numbers if users passed endog, exog as positional arguments.
In general, I wrote somewhere in the docs a warning that the deprecation policy does not apply when users pass keyword arguments positionally. But this case sounds too important to just change it.

And in spite of that warning, all our own examples use exog in predict as positional, so that case falls under backwards compatibility constraints.

@josef-pkt
Member

Related to deprecation:
If we change the behavior and users run code written for the new predict on an older statsmodels version, then this shouldn't cause (silently) incorrect results.

@kshedden
Contributor

What if we just add a warning now, stating that the behavior is atypical and will change in a future version?

@josef-pkt
Member

I started to look at Python's inspect again, but parameters = inspect.getargvalues(inspect.currentframe()) doesn't help to distinguish positional from keyword arguments in the call,
so it is not useful here.
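
A small sketch of the limitation: inspect.getargvalues only reports the values bound in the frame, so a positional call and a keyword call look identical (hypothetical predict used for illustration):

import inspect

def predict(endog=None, exog=None):
    info = inspect.getargvalues(inspect.currentframe())
    # Only the bound values are visible; how they were passed is not recorded.
    return info.locals["endog"], info.locals["exog"]

print(predict(5))          # (5, None)
print(predict(endog=5))    # (5, None) -- indistinguishable from the positional call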

@bashtage
Member

You just use

def func(*args, **kwargs):
    """
    func(x, y, z=1)

    Parameters
    ----------
    x : int
    y : float
    z : int, optional
    """

And then handle parsing args and kwargs, including warning if needed.
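
A minimal sketch of that parsing, with hypothetical names and made-up warning text:

import warnings

def func(*args, **kwargs):
    # Real signature func(x, y, z=1) is documented in the docstring, as above.
    if len(args) > 2:
        # A third positional value can only have been meant as z.
        warnings.warn("pass z as a keyword argument", FutureWarning)
    x, y = args[0], args[1]
    z = args[2] if len(args) > 2 else kwargs.get("z", 1)
    return x + y + z

print(func(1, 2, z=3))   # 6
print(func(1, 2, 3))     # 6, with a FutureWarning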

Or just make keyword arguments mandatory, which will break calls but is a simple fix, since we are Python 3.5+ now.

As a rule, new, complex functions should require keywords, and we should slowly convert many existing ones to use this pattern, so

func(x, y, opt1=1, opt2=2, opt3="3", opt4=False)

should be

func(x, y, *, opt1=1, opt2=2, opt3="3", opt4=False)
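
With the bare * in the signature, passing the options positionally fails loudly instead of silently misbinding, for example:

def func(x, y, *, opt1=1):
    return x + y + opt1

print(func(1, 2, opt1=3))   # 6
# func(1, 2, 3) would raise a TypeError (too many positional arguments)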

@josef-pkt
Member

The only way I see right now to find out what has been passed as positional and what as keyword arguments is by using a logging decorator that uses args and kwargs.

@bashtage Do you know of any cases in other packages like pandas for how to change signature with backwards compatibility constraints?

@josef-pkt
Member

@bashtage
I didn't see your message when I wrote my last one.

just using predict(*args, **kwargs) would make the signature uninformative
It would be possible for a version or two, but not nice

I haven't seen this
func(x, y, *, opt1=1, opt2=2, opt3="3", opt4=False)

pattern yet, so I will have to look at it.

@bashtage
Member

If you are only interested in positional inputs you can just intercept args:

func(x, y, z=1)

can be

func(x, y, *args, z=1)

so that if args is not empty, then you know it is z.
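
A short sketch of that interception, with hypothetical names and warning text:

import warnings

def func(x, y, *args, z=1):
    if args:
        # The only thing *args can have caught here is the old positional z.
        warnings.warn("pass z as a keyword argument", FutureWarning)
        z = args[0]
    return x + y + z

print(func(1, 2, z=5))   # 8
print(func(1, 2, 5))     # 8, with a FutureWarning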

just using predict(*args, **kwargs) would make the signature uninformative

That is the point of the next line in the docstring, which has the correct signature. This is picked up by numpydoc so that rendered docs are correct, and it shows whenever you type func?

@josef-pkt
Member

predict(*args, <old keywords>) looks good, simpler than a decorator and with only minimal noise in the signature.

@josef-pkt
Member

Hi @akravetz
Can you write a PR like the last comment?

Add *args.
If pred_type in ["lhr", "hr"], then interpret args as exog if len(args) == 1.
In other cases warn or raise.

We need unit tests for this to verify which calling variants are allowed and which warn or raise.

The target is that exog is allowed to be used as a positional argument, similar to many examples for other models; all other arguments will eventually have to be specified as keyword arguments by the user/caller.
During the transition we warn if we can unambiguously identify positional arguments and raise if it is ambiguous between the old and new versions.
(Some users that don't use keywords might get an exception after the change, but that should be a small set.)
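
A rough sketch of what such a transitional predict could look like; this is hypothetical, not the actual statsmodels code, and the messages are made up:

import warnings

def predict(*args, endog=None, exog=None, pred_type="lhr", **kwargs):
    if args:
        if pred_type in ("lhr", "hr") and len(args) == 1 and exog is None:
            # A single positional argument is interpreted as exog, matching
            # the calling convention of predict in other models.
            warnings.warn("only exog may be passed positionally; pass all "
                          "other arguments as keywords", FutureWarning)
            exog = args[0]
        else:
            # Ambiguous between the old (endog-first) and the new convention.
            raise TypeError("pass endog and exog as keyword arguments")
    # ... dispatch to the existing implementation with endog, exog, etc. ...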
