
PHReg predicted hazard ratios size does not match test set size #3178

Open
EmanuelGoncalves opened this issue Aug 23, 2016 · 13 comments

@EmanuelGoncalves

The prediction of the hazard ratios with the PHReg model seems to be constrained to the size of the training set; is this correct?

I don't see this behaviour in the lifelines Python package (https://github.com/CamDavidsonPilon/lifelines).

Small working example below.

import numpy as np
from statsmodels.duration.hazard_regression import PHReg

# Data
n, p = 500, 5
exog = np.random.normal(size=(n, p))
lin_pred = exog.sum(1)
endog = -np.exp(-lin_pred)*np.log(np.random.uniform(size=n))


# Train model - 200 observations
model = PHReg(endog[:200], exog[:200]).fit()
print(model.summary())


# Test model - 300 observations
hr_pred = model.predict(exog[200:], pred_type='hr')

print(len(hr_pred.predicted_values))
# Output: 200 (it should be 300, the same size as exog[200:])

Thank you,

@kshedden
Contributor

kshedden commented Aug 24, 2016

Emanuel,

Thanks for the report. It looks like this is due to the argument order
being different in PHReg compared to some of the other predict
implementations. If you use named arguments as follows it should work:

hr_pred = model.predict(exog=exog[200:], pred_type='hr')

Unlike other predict methods, in PHReg we allow people to pass in a new endog to update the baseline hazard estimates. This endog is the first positional argument to PHRegResults.predict.
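
To make the argument binding concrete, here are both calls side by side (a sketch reusing model, exog and the predicted_values attribute from the code above):

# The single positional argument is bound to endog, so the baseline/training
# exog (200 rows) is used and 200 values come back:
hr_pred = model.predict(exog[200:], pred_type='hr')
print(len(hr_pred.predicted_values))   # 200

# Naming the argument binds it to exog, giving one prediction per test row:
hr_pred = model.predict(exog=exog[200:], pred_type='hr')
print(len(hr_pred.predicted_values))   # 300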

@josef-pkt can comment on whether this is a bug that needs to be fixed.

Kerby

@EmanuelGoncalves
Author

Dear Kerby,

I was expecting exog to be the first argument of the predict function. Indeed, if I specify the argument name then the sizes match.

Thank you for your support,

@akravetz

@josef-pkt I recently ran into this as well. I'm happy to submit a PR updating the parameter order if that's the correct solution

@josef-pkt
Member

josef-pkt commented Nov 24, 2019

PHReg.predict has a different argument order than PHRegResults.predict.

I think we should fix it in PHRegResults, but we need to find a way to raise a warning about the change in API; i.e., we cannot silently return different numbers if users passed endog, exog as positional arguments.
In general, I wrote somewhere in the docs a warning that the deprecation policy does not apply when users pass keyword arguments positionally. But this case sounds too important to just change it.

And in spite of that warning, all our own examples use exog in predict as positional, so that case falls under backwards compatibility constraints.

@josef-pkt
Member

Related to deprecation:
If we change the behavior and users run code written for the new predict on an older statsmodels version, then this shouldn't cause (silently) incorrect results.

@kshedden
Contributor

What if we just add a warning now, stating that the behavior is atypical and will change in a future version?

@josef-pkt
Member

I started to look at Python's inspect again, but parameters = inspect.getargvalues(inspect.currentframe()) doesn't help to distinguish positional from keyword arguments in the call,
so it is not useful here.
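
A small sketch of the limitation: inspect.getargvalues only reports the values bound in the frame, so a positional call and a keyword call look identical (hypothetical predict used for illustration):

import inspect

def predict(endog=None, exog=None):
    info = inspect.getargvalues(inspect.currentframe())
    # Only the bound values are visible; how they were passed is not recorded.
    return info.locals["endog"], info.locals["exog"]

print(predict(5))          # (5, None)
print(predict(endog=5))    # (5, None) -- indistinguishable from the positional call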

@bashtage
Member

You just use

def func(*args, **kwargs):
    """
    func(x, y, z=1)

    Parameters
    ----------
    x : int
    y : float
    z : int, optional
    """

And then handle parsing args and kwargs, including warning if needed.
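
A minimal sketch of that parsing, with hypothetical names and made-up warning text:

import warnings

def func(*args, **kwargs):
    # Real signature func(x, y, z=1) is documented in the docstring, as above.
    if len(args) > 2:
        # A third positional value can only have been meant as z.
        warnings.warn("pass z as a keyword argument", FutureWarning)
    x, y = args[0], args[1]
    z = args[2] if len(args) > 2 else kwargs.get("z", 1)
    return x + y + z

print(func(1, 2, z=3))   # 6
print(func(1, 2, 3))     # 6, with a FutureWarning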

Or just make keyword arguments mandatory, which will break calls but is a simple fix, since we are Python 3.5+ now.

As a rule, new, complex functions should require keywords, and we should slowly convert many existing ones to use this pattern, so

func(x, y, opt1=1, opt2=2, opt3="3", opt4=False)

should be

func(x, y, *, opt1=1, opt2=2, opt3="3", opt4=False)
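
With the bare * in the signature, passing the options positionally fails loudly instead of silently misbinding, for example:

def func(x, y, *, opt1=1):
    return x + y + opt1

print(func(1, 2, opt1=3))   # 6
# func(1, 2, 3) would raise a TypeError (too many positional arguments)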

@josef-pkt
Member

The only way I see right now to find out what has been passed as positional and what as keyword arguments is by using a logging decorator that uses args and kwargs.

@bashtage Do you know of any cases in other packages like pandas for how to change signature with backwards compatibility constraints?

@josef-pkt
Member

@bashtage
I didn't see your message when I wrote my last one.

just using predict(*args, **kwargs) would make the signature uninformative
It would be possible for a version or two, but not nice

I haven't seen this
func(x, y, *, opt1=1, opt2=2, opt3="3", opt4=False)

pattern yet, so I will have to look at it.

@bashtage
Member

If you are only interested in positional inputs you can just intercept args:

func(x, y, z=1)

can be

func(x, y, *args, z=1)

so that if args is not empty, then you know it is z.
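
A short sketch of that interception, with hypothetical names and warning text:

import warnings

def func(x, y, *args, z=1):
    if args:
        # The only thing *args can have caught here is the old positional z.
        warnings.warn("pass z as a keyword argument", FutureWarning)
        z = args[0]
    return x + y + z

print(func(1, 2, z=5))   # 8
print(func(1, 2, 5))     # 8, with a FutureWarning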

just using predict(*args, **kwargs) would make the signature uninformative

That is the point of the next line in the docstring, which has the correct signature. This is picked up by numpydoc so that rendered docs are correct, and it shows whenever you type func?

@josef-pkt
Member

predict(*args, <old keywords>) looks good, simpler than a decorator and with only minimal noise in the signature.

@josef-pkt
Member

Hi @akravetz
Can you write a PR like the last comment?

Add *args.
If pred_type in ["lhr", "hr"], then interpret args as exog if len(args) == 1.
In other cases warn or raise.

We need unit tests for this to verify which calling variants are allowed and which warn or raise.

The target is that exog is allowed to be used as a positional argument, similar to many examples for other models; all other arguments will eventually have to be specified as keyword arguments by the user/caller.
During the transition we warn if we can unambiguously identify positional arguments and raise if it is ambiguous between the old and new versions.
(Some users that don't use keywords might get an exception after the change, but that should be a small set.)
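
A rough sketch of what such a transitional predict could look like; this is hypothetical, not the actual statsmodels code, and the messages are made up:

import warnings

def predict(*args, endog=None, exog=None, pred_type="lhr", **kwargs):
    if args:
        if pred_type in ("lhr", "hr") and len(args) == 1 and exog is None:
            # A single positional argument is interpreted as exog, matching
            # the calling convention of predict in other models.
            warnings.warn("only exog may be passed positionally; pass all "
                          "other arguments as keywords", FutureWarning)
            exog = args[0]
        else:
            # Ambiguous between the old (endog-first) and the new convention.
            raise TypeError("pass endog and exog as keyword arguments")
    # ... dispatch to the existing implementation with endog, exog, etc. ...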
