ENH: PHReg formula improvements #1954

Merged
jseabold merged 2 commits into statsmodels:master from kshedden:phreg_formula on Sep 26, 2014

4 participants

@kshedden
Contributor
kshedden commented Sep 6, 2014

This PR allows extra arguments of array type to be passed to the PHReg from_formula method as variable names. The data are read out of the same data object that is used for the formula.
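
The lookup described above can be sketched in a few lines. This is only an illustration, not the code in this PR: `resolve_by_name` and the column names are hypothetical, and a plain dict stands in for the data object.

```python
def resolve_by_name(data, value):
    """Return data[value] if value is a column name, otherwise value itself."""
    if isinstance(value, str):
        return data[value]
    return value


# A plain dict stands in for the data object passed with the formula.
data = {
    "time": [5.0, 3.0, 8.0],
    "age": [61, 47, 70],
    "status": [1, 0, 1],
}

# Passing a name pulls the array out of the same data object ...
status = resolve_by_name(data, "status")
# ... while an explicit array (or None) is passed through unchanged.
entry = resolve_by_name(data, [0.0, 0.0, 1.0])
strata = resolve_by_name(data, None)
```

Under this scheme a call such as `PHReg.from_formula("time ~ age", data, status="status")` can name the censoring column instead of passing the array itself (the column names here are made up).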

kshedden added some commits Sep 5, 2014
@kshedden Override from_formula so that extra array args can be passed by name (1d86697)
@kshedden Test maintenance (2ac71f3)
@coveralls

Coverage Status

Coverage increased (+0.01%) when pulling 2ac71f3 on kshedden:phreg_formula into 9ce1605 on statsmodels:master.

@josef-pkt
Member

looks good

I'm not sure, and I haven't checked, whether the argument names are already "standard".

@jseabold jseabold commented on the diff Sep 21, 2014
statsmodels/duration/hazard_regression.py
@@ -282,6 +282,68 @@ def __init__(self, endog, exog, status=None, entry=None,
         self.ties = ties
+    @classmethod
+    def from_formula(cls, formula, data, status=None, entry=None,
+                     strata=None, offset=None, subset=None,
+                     ties='breslow', missing='drop', *args, **kwargs):
@jseabold
jseabold Sep 21, 2014 Member

I don't feel particularly strongly about it but we have missing='none' as the default everywhere. The idea is not to do any unnecessary computations unless asked for.

@josef-pkt
josef-pkt Sep 21, 2014 Member

Since patsy 0.2 (IIRC) we have missing='drop' as default in from_formula because we don't prevent patsy from dropping missing rows.
We still have the open issues about conflicts between patsy's and our missing value handling.
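
To make the trade-off concrete, here is a rough sketch of what row-wise dropping involves when several arrays have to stay aligned. It is an illustration under simplifying assumptions, not the statsmodels or patsy implementation; `drop_missing_rows` is a hypothetical helper and plain lists stand in for arrays.

```python
import math


def drop_missing_rows(*columns):
    """Keep only the rows in which no column holds a NaN.

    All columns must have the same length; the same rows are removed
    from every column so that they stay aligned.
    """
    def is_missing(x):
        return isinstance(x, float) and math.isnan(x)

    keep = [
        i for i, row in enumerate(zip(*columns))
        if not any(is_missing(x) for x in row)
    ]
    return [[col[i] for i in keep] for col in columns]


time = [5.0, 3.0, float("nan"), 8.0]
age = [61.0, float("nan"), 52.0, 70.0]
status = [1, 0, 1, 1]

# Rows 1 and 2 each contain a NaN, so only rows 0 and 3 survive.
time_c, age_c, status_c = drop_missing_rows(time, age, status)
```

Every value is scanned even when nothing is missing, which is the work that a default of missing='none' avoids.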

@josef-pkt josef-pkt added this to the 0.6 milestone Sep 25, 2014
@josef-pkt josef-pkt commented on the diff Sep 25, 2014
statsmodels/duration/tests/test_phreg.py
@@ -130,14 +132,40 @@ def test_missing(self):
         exog[10:15,:] = np.nan
         md = PHReg(time, exog, status, missing='drop')
-        assert(len(md.endog) == 185)
-        assert(len(md.status) == 185)
-        assert(all(md.exog.shape == np.r_[185,4]))
+        assert_allclose(len(md.endog), 185)
@josef-pkt
josef-pkt Sep 25, 2014 Member

integers can be tested with assert_equal (no floating point errors possible)
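
A small sketch of the difference, using made-up values chosen so that the default relative tolerance of assert_allclose (rtol=1e-7) becomes visible. For counts as small as 185 both functions would catch an off-by-one, so this only illustrates why the exact check is the more natural fit for integers.

```python
from numpy.testing import assert_allclose, assert_equal

n_expected = 10**8
n_actual = 10**8 + 1  # off by one

# assert_allclose uses a relative tolerance (rtol=1e-7 by default),
# so this off-by-one difference is within tolerance and passes.
assert_allclose(n_actual, n_expected)

# assert_equal compares exactly and reports the mismatch.
try:
    assert_equal(n_actual, n_expected)
    exact_check_failed = False
except AssertionError:
    exact_check_failed = True
```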

@josef-pkt
Member

@kshedden @jseabold
About the name again. I was reading a survey article (for actuaries) about parametric proportional hazard models. I don't have a good overview of what other related models we might get.
Is PHReg too general as a name without Cox or Partial in the name?
Frailty models, parametric PH models, accelerated failure time (AFT) models, ... I don't know how they all fit together.

We can keep PHReg for now as the name, but we might have to rename if we get additional similar models in future.

@jseabold
Member

Ideally, we get the docstring for free in from_formula so we don't have to have it twice, but that will have to wait. Merging.

@jseabold jseabold merged commit 9f5b54d into statsmodels:master Sep 26, 2014

2 checks passed

continuous-integration/appveyor AppVeyor build succeeded
continuous-integration/travis-ci The Travis CI build passed
@kshedden kshedden deleted the kshedden:phreg_formula branch Sep 27, 2014