ENH: PHReg formula improvements #1954

merged 2 commits into statsmodels:master from kshedden:phreg_formula on Sep 26, 2014


kshedden commented Sep 6, 2014

This PR allows extra arguments of array type to be passed to the PHReg from_formula method as variable names. The data are read out of the same data object that is used for the formula.
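
The by-name lookup described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `resolve_by_name` is a hypothetical helper, and a plain dict stands in for the data object passed with the formula.

```python
def resolve_by_name(data, arg):
    """Hypothetical helper: a string is treated as a key into `data`;
    anything else (an array, a list, None) is passed through unchanged."""
    if isinstance(arg, str):
        return data[arg]
    return arg


# Stand-in for the data object used with the formula (column names are made up).
data = {
    "time": [5.0, 3.0, 8.0, 2.0],
    "age": [61, 54, 47, 70],
    "status": [1, 0, 1, 1],
}

status_by_name = resolve_by_name(data, "status")     # looked up in `data`
status_direct = resolve_by_name(data, [1, 0, 1, 1])  # already array-like
nothing = resolve_by_name(data, None)                # optional argument left unset
```

With this PR, a call along the lines of `PHReg.from_formula("time ~ age", data, status="status")` would then read the status values out of `data`, assuming `data` has columns with those names.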

kshedden added some commits Sep 5, 2014
@kshedden kshedden Override from_formula so that extra array args can be passed by name 1d86697
@kshedden kshedden Test maintenance

Coverage Status

Coverage increased (+0.01%) when pulling 2ac71f3 on kshedden:phreg_formula into 9ce1605 on statsmodels:master.


looks good

I'm not sure, and I haven't checked, whether the argument names are already "standard".

@jseabold jseabold commented on the diff Sep 21, 2014
@@ -282,6 +282,68 @@ def __init__(self, endog, exog, status=None, entry=None,
         self.ties = ties
+    @classmethod
+    def from_formula(cls, formula, data, status=None, entry=None,
+                     strata=None, offset=None, subset=None,
+                     ties='breslow', missing='drop', *args, **kwargs):
jseabold Sep 21, 2014 Member

I don't feel particularly strongly about it, but we have missing='none' as the default everywhere else. The idea is not to do any unnecessary computations unless asked for.

josef-pkt Sep 21, 2014 Member

Since patsy 1.2 (IIRC) we have missing='drop' as default in from_formula because we don't prevent patsy from dropping missing rows.
We still have the open issues about conflicts between patsy's and our missing value handling.
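
To make the two defaults concrete: with missing='none' nothing is checked or removed, while missing='drop' removes every row that has a missing value in any of the inputs. The sketch below only illustrates that idea with NumPy; `drop_missing_rows` is a hypothetical helper, not the statsmodels or patsy implementation.

```python
import numpy as np


def drop_missing_rows(*arrays):
    """Hypothetical helper: keep only the rows that have no NaN in any array."""
    keep = np.ones(len(arrays[0]), dtype=bool)
    for a in arrays:
        # View each input as 2-D so 1-D and 2-D arrays are handled alike.
        a2 = np.asarray(a, dtype=float).reshape(len(a), -1)
        keep &= ~np.isnan(a2).any(axis=1)
    return [np.asarray(a)[keep] for a in arrays]


time = np.arange(1.0, 21.0)   # 20 observations
exog = np.ones((20, 4))
status = np.ones(20)

exog[10:15, :] = np.nan       # 5 rows missing in exog
time[0] = np.nan              # 1 more row missing in time

time_d, exog_d, status_d = drop_missing_rows(time, exog, status)
```

The same rows are removed from every array, so the inputs stay aligned after dropping.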

@josef-pkt josef-pkt added this to the 0.6 milestone Sep 25, 2014
@josef-pkt josef-pkt commented on the diff Sep 25, 2014
@@ -130,14 +132,40 @@ def test_missing(self):
         exog[10:15,:] = np.nan
         md = PHReg(time, exog, status, missing='drop')
-        assert(len(md.endog) == 185)
-        assert(len(md.status) == 185)
-        assert(all(md.exog.shape == np.r_[185,4]))
+        assert_allclose(len(md.endog), 185)
josef-pkt Sep 25, 2014 Member

integers can be tested with assert_equal (no floating point errors possible)
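
A short sketch of the difference, using only `numpy.testing`. The row count 185 is taken from the test above; the other numbers are made up for illustration.

```python
import numpy as np
from numpy.testing import assert_allclose, assert_equal

n_rows = 185  # an integer count, as returned by len(...)

# Exact comparison: passes only when the values are identical.
assert_equal(n_rows, 185)

# Shapes are tuples of integers, so they can be compared exactly as well.
assert_equal(np.zeros((185, 4)).shape, (185, 4))

# Tolerance-based comparison: meant for floating-point results.
assert_allclose(0.1 + 0.2, 0.3)

# If a loose tolerance is passed, assert_allclose accepts an off-by-one count:
# |186 - 185| = 1 <= rtol * 185 = 1.85
assert_allclose(186, 185, rtol=0.01)

# assert_equal rejects the same off-by-one count.
off_by_one_caught = False
try:
    assert_equal(186, 185)
except AssertionError:
    off_by_one_caught = True
```

For counts and shapes, `assert_equal` states the intent directly and leaves no tolerance to get wrong.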


@kshedden @jseabold
About the name again. I was reading a survey article (for actuaries) about parametric proportional hazards models. I don't have a good overview of what other related models we might get.
Is PHReg too general as a name without Cox or Partial in the name?
Frailty models, parametric PH models, AFT accelerated failure time, ... I don't know how they all fit together.

We can keep PHReg as the name for now, but we might have to rename it if we get additional similar models in the future.


Ideally, from_formula would get the docstring for free so we don't have to maintain it twice, but that will have to wait. Merging.

@jseabold jseabold merged commit 9f5b54d into statsmodels:master Sep 26, 2014

2 checks passed

continuous-integration/appveyor AppVeyor build succeeded
continuous-integration/travis-ci The Travis CI build passed
@kshedden kshedden deleted the kshedden:phreg_formula branch Sep 27, 2014