BUG/ENH fix llnull, extra kwds to recreate model #1610

Merged
merged 5 commits into statsmodels:master from josef-pkt:clone_fix_llnull on Apr 21, 2014


@josef-pkt
Member

This adds _init_keys to base.Model and fixes llnull in discrete models.

Includes one test precision adjustment for TestProbitCG.test_conf_int (use relative precision).

llnull:
BUG #1221: missing exposure in llnull of count models
BUG #1608: missing loglike_method in llnull of NegativeBinomial

recreating models: see #1093

no test yet for nb1 versus nb2 with exposure
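
Roughly, the bookkeeping added to base.Model looks like this (a simplified sketch, not the exact statsmodels code; the real __init__ also goes through the data handling machinery and attaches some keywords, e.g. exposure, in transformed form):

    class Model(object):
        def __init__(self, endog, exog=None, **kwargs):
            self.endog = endog
            self.exog = exog
            # remember the extra keyword names so that results code
            # (llnull, diagnostic tests, ...) can recreate an equivalent model
            self._init_keys = list(kwargs.keys())
            for key, value in kwargs.items():
                setattr(self, key, value)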

@coveralls

Coverage Status

Coverage remained the same when pulling 55fa677 on josef-pkt:clone_fix_llnull into e7e018a on statsmodels:master.

@jseabold jseabold and 1 other commented on an outdated diff Apr 20, 2014
statsmodels/discrete/discrete_model.py
         model = self.model
         #TODO: what parameters to pass to fit?
-        null = model.__class__(model.endog, np.ones(self.nobs)).fit(disp=0)
-        return null.llf
+        mod_null = model.__class__(model.endog, np.ones(self.nobs), **kwds)
@jseabold
jseabold Apr 20, 2014 Member

Couldn't you just attach exposure and offset here without going through the init_keys business?

@josef-pkt
josef-pkt Apr 20, 2014 Member

I don't like adding ifs to a superclass that check attributes that are only available in some subclasses.
I thought about just adding the method to the CountModels.

But I want a generic solution. Here I don't need any special code for loglike_method in NegativeBinomial except registering the keyword.
Plus, we will need the same for other things; see #1093 (e.g. diagnostic tests).

@jseabold
jseabold Apr 20, 2014 Member

You have the generic solution, but you don't need it. I'm not saying take out init_keys, but you don't really need it here to check for parameters that are attached as attributes already.

@josef-pkt
josef-pkt Apr 20, 2014 Member

We need it. Generically, we don't know which attributes have been attached from the model.__init__ signature.
This should also automatically catch weights in WLS, rho in GLSAR, ...
I got started with this a few days ago because I added an attribute in a Poisson subclass that broke llnull as called by summary(). I fixed that in a different way, but it's going to happen all the time.

The alternative to a generic solution requires lots of code that handles special cases.

@jseabold
jseabold Apr 20, 2014 Member

Thinking about it a bit, it's really not all that generic. You still need to know what keys you're looking for in _init_keys.

@jseabold
jseabold Apr 20, 2014 Member

All _init_keys does right now is give you an intermediary way to get stuff that was given to __init__ and attached. You're still writing special case code as far as I can see. I'd prefer to fix this bug and work on _init_keys and use cases in a separate PR.

@josef-pkt
josef-pkt Apr 20, 2014 Member

I like it this way, because it's a good test case for _init_keys. I'm catching both exposure and offset, which are handed off to the superclass base.Model, and also loglike_method, which needs to be added in the class.

I don't see a reason to replace this with several special cases that would have to be removed again in the next PR.
I can add a link_test to this PR (or in the next) where I need exactly the same generic code or lots of special cases: #1479 (comment)

@josef-pkt
josef-pkt Apr 20, 2014 Member

The basic idea is that the class or instance has to tell us what __init__ arguments are required. Then any code that needs to create it can just use the kwds.

For the design:
I didn't want to keep a reference to the full data in _init_keys. However, I will most likely write a method get_init_keys that returns the full dict (essentially the part starting at line 2143).
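
As a sketch, such a helper on base.Model would just pull the registered keywords back out of the attached attributes (name and details tentative here):

    def _get_init_kwds(self):
        # hypothetical helper: collect the registered __init__ keywords from
        # the attributes that were attached to the model instance
        return dict((key, getattr(self, key, None)) for key in self._init_keys)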

@jseabold
jseabold Apr 20, 2014 Member

Well, the reason is so that we can discuss the design of a new feature in the appropriate PR, not in the bug-fix PR. Why don't you add a _make_init to base.Model so you don't have to write the dict(((key... code? I also wonder if there's not a way to do this with __new__.

@josef-pkt
josef-pkt Apr 20, 2014 Member

We can still refactor later; all this does is add one private attribute that we need to have no matter what, even if we start to use __new__. (Either we write a generic __new__ or we have to add it to most classes, and then it still won't work if a subclass adds a new keyword argument.)
Adding _make_init to base.Model is what I meant with a get_init_keys method: get_init_kwds or something like that.

This PR, so far, only makes the minimum changes to get a generic solution for 2 bugs.

@josef-pkt
josef-pkt Apr 20, 2014 Member

In a follow-up I can add the generic unit tests in base that check that a model is correctly recreated. But I will do this "enhancement" after fixing exposure in NegativeBinomial (#1611), so I can activate the test case here.

@jseabold
jseabold Apr 20, 2014 Member

I wrote that before I saw your reply.

Well, yes, this is my point about why we don't need to use _init_keys here if we're going to have to refactor and think about design later anyway.

Regardless, I still think the checks should be moved to the appropriate subclasses.

@josef-pkt
josef-pkt Apr 20, 2014 Member

Refactoring will only change the first part of the generic discrete llnull, replacing it with a method call.
Whichever way we refine the "model cloning", the second part remains a generic llnull solution, and we don't need to add it to Poisson and NegativeBinomial explicitly.
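
Roughly, after that refactoring the generic llnull would look like this (a sketch, not the committed code; assumes numpy as np and cache_readonly from statsmodels.tools.decorators, as a method on the discrete results class):

    @cache_readonly
    def llnull(self):
        # only the first part changes: the inlined dict over _init_keys
        # becomes a method call on the model
        model = self.model
        kwds = model._get_init_kwds()
        mod_null = model.__class__(model.endog, np.ones(self.nobs), **kwds)
        return mod_null.fit(disp=0).llf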

@jseabold
jseabold Apr 20, 2014 Member

Why check for an exposure in models that you know can't/won't have one?

@josef-pkt
josef-pkt Apr 20, 2014 Member

Because I think #1609 is a bug, and I just work around it until that is fixed. I don't want to change the design to handle an unrelated problem (or we get a snowball effect of designing for workarounds).

@jseabold
jseabold Apr 20, 2014 Member

Yes, that's a bug. My point is that only CountModels will have an exposure, so it should be a CountModel 'feature'. NegativeBinomial came after Poisson, so there's not the right amount of code sharing yet in CountModel.

@josef-pkt
josef-pkt Apr 20, 2014 Member

The nice point of _init_keys is that we don't care which keyword names were added.
Whether they are exposure or weights or loglike_method or fixed_params_mask or order or ..., the code is generic as long as the original argument value (after data handling (*)) has been attached to the model instance as an attribute.

(*) To get ahead: I'm thinking of this for internal usage, as in the use cases in #1093 where we don't need the formula or pandas meta-information.
I haven't seen or thought about any use case that might need pandas or formula information. That's much further down the road if we need it.

@jseabold
jseabold Apr 20, 2014 Member

I understand the reason for _init_keys. But you've hard-coded exposure here. That's not generic. Admittedly, it's because we log it in __init__ and store the logged version. So... either fix that as part of this PR too (and take logs wherever exposure is used / fix the hard-coded exposure or offset only), or only check for exposure in CountModel. That's my main point. My other point is that if you're going to introduce those changes in this (bug-fix) PR, then go the whole nine yards and implement the _get_keys or whatever. This could've been a quick two-line fix to close the bug, but now I'd like to see the other issues fixed in this PR if we're going to start the other changes.

@josef-pkt
josef-pkt Apr 20, 2014 Member

I don't want to fix log(exposure) in this PR. It's not backwards compatible and needs a better design (for example, merge _offset_all = np.exp(exposure) + offset to avoid the repeated sum).

@jseabold
jseabold Apr 20, 2014 Member

Why is it not backwards compatible? Nothing the user does would change. I doubt there are people out there with pickled exposure variables, but we could add a warning.

This PR either 1) fixes only the bug it's for, 2) goes all the way and solves the 3-4 outstanding, related issues, or 3) solves 1.5 issues and adds a hack for checking exposure in models where we don't necessarily need to.

Right now it's 3, but it'd make more sense to do 1 or 2, I think. In any case, we've talked about this PR more than it warrants.

@jseabold jseabold commented on an outdated diff Apr 20, 2014
statsmodels/discrete/discrete_model.py
@@ -2137,10 +2140,21 @@ def llr_pvalue(self):
     @cache_readonly
     def llnull(self):
+        # TODO make this generally available see #1093
+        if self.model._init_keys:
+            kwds = dict(((key, getattr(self.model, key, None))
+                         for key in self.model._init_keys))
+
+            #TODO: workaround for #1609, find "generic" solution
+            if 'exposure' in kwds and kwds['exposure'] is not None:
+                kwds['exposure'] = np.exp(kwds['exposure'])
@jseabold
jseabold Apr 20, 2014 Member

I think we should only do these checks in the appropriate subclass. Exposure should only be in count models. Offset could be in any.

@josef-pkt josef-pkt added a commit to josef-pkt/statsmodels that referenced this pull request Apr 20, 2014
@josef-pkt josef-pkt TST: enable test, mostly copied from PR #1610 9298a57
@josef-pkt josef-pkt changed the title from BUG/ENH fix llnull, extra kwds to recreate modle to BUG/ENH fix llnull, extra kwds to recreate model Apr 20, 2014
@josef-pkt
Member

OK, I figured out a way to avoid adding a special CountModel llnull:
instead, CountModel overrides _get_init_kwds so no other code needs to know that exposure is funny.
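
A sketch of what that override amounts to (simplified; exposure is stored in logs on the model, so it gets exponentiated before being handed back to __init__):

    import numpy as np

    class CountModel(DiscreteModel):
        def _get_init_kwds(self):
            # undo the log that __init__ applied to exposure, so the
            # recreated model receives the keyword in its original form
            kwds = super(CountModel, self)._get_init_kwds()
            if kwds.get('exposure') is not None:
                kwds['exposure'] = np.exp(kwds['exposure'])
            return kwds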

@coveralls

Coverage Status

Coverage remained the same when pulling 05116c2 on josef-pkt:clone_fix_llnull into e7e018a on statsmodels:master.

@josef-pkt josef-pkt merged commit 14b9537 into statsmodels:master Apr 21, 2014

1 check passed

continuous-integration/travis-ci The Travis CI build passed
@josef-pkt josef-pkt deleted the josef-pkt:clone_fix_llnull branch Apr 21, 2014