WIP: Add Panel Data models #1133
I've been thinking about this a little bit, and have now convinced myself that forcing users to use an What do you think about it?
About xtset: I suggested this or something similar before (there might be an open issue). My general opinion (not having looked at this branch in a while, except for Poisson): Of course a common sub-case is the standard (macro-) panel, with calendar time and cross-section and two-way effects.
right, but then you can have different data prep functions, like Stata's |
Part of this PR was unifying the data handling so that it works for any panel data model, not just the linear case, and so that it is handled in this base class. The way it works now, which I think is unchanged from before except that it is now general, is that you can pass the data to any panel data model in one of two ways. Either you give time and panel as (separate) indices, or you give y and X whose index is a MultiIndex that has time and panel as the respective levels. https://github.com/statsmodels/statsmodels/pull/1133/files#diff-8ab5d9484c849d2418de300970ad5b58R84 It makes sense for something like survival models (stset), where you might have different kinds of censoring, etc. That is, the information there can affect the estimation, but I'm not sure what we'd gain in the panel case. I'm open to this change if it will make some things easier, but I don't see how yet. All of the potential code re-use is in your groupings class, which, I agree, should be re-usable for the survival models, though it may take a bit more work to generalize. I was just looking at them again last weekend.
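To make the two input styles concrete, here is a minimal pandas-only sketch. It does not call any statsmodels API; the toy data, the variable names, and the level names "panel" and "time" are assumptions made up for illustration. It only shows that the same identifiers can be carried either as separate aligned arrays or as levels of a MultiIndex.

```python
import numpy as np
import pandas as pd

# Hypothetical toy panel: 3 units ("a", "b", "c") observed in 2 periods each.
panel_ids = np.repeat(["a", "b", "c"], 2)
time_ids = np.tile([2001, 2002], 3)
values = np.arange(6, dtype=float)

# Style 1: the data plus separate, aligned panel and time identifiers.
y_flat = pd.Series(values)

# Style 2: the identifiers travel with the data as MultiIndex levels.
# The level names are an assumption of this sketch.
index = pd.MultiIndex.from_arrays([panel_ids, time_ids], names=["panel", "time"])
y_indexed = pd.Series(values, index=index)

# The separate identifiers can be recovered from the MultiIndex by level name.
panel_from_index = y_indexed.index.get_level_values("panel").to_numpy()
time_from_index = y_indexed.index.get_level_values("time").to_numpy()

# Per-unit means, the kind of grouped statistic a panel model needs.
unit_means = y_indexed.groupby(level="panel").mean()
```

Either representation holds the same information, so a shared data-handling layer could normalize both into one internal form.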
Note that groupings is now attached to the model.data attribute, not to the models as well.
I also see that the data changes have partially broken older cases (or revealed bugs).
'''Apply to a sub-group of observations'''
n = subset.shape[0]
B = np.ones((n,n)) / n
out = subset - chain_dot(np.diag(theta[position]), B, subset)
I guess this can be replaced with something that avoids (n,n) arrays, unless subset is always small.
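One way the (n,n) arrays could be avoided, sketched under assumptions: since B is the averaging matrix, multiplying B by subset just repeats the column means of subset in every row, and the diagonal matrix then scales row i by the i-th weight. The sketch below assumes that theta[position] is a 1-d array of length n and that subset is 2-d; neither is confirmed by the diff. It uses the `@` operator in place of `chain_dot`, and the names `theta_pos`, `out_dense`, and `out_lean` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 5, 3
subset = rng.normal(size=(n, k))   # assumed: 2-d block of observations
theta_pos = rng.uniform(size=n)    # assumed stand-in for theta[position]

# Dense version, mirroring the reviewed line: builds two (n, n) arrays.
B = np.ones((n, n)) / n
out_dense = subset - np.diag(theta_pos) @ B @ subset

# Lean version: B @ subset equals the column means repeated in each row,
# and np.diag(theta_pos) scales row i by theta_pos[i].
col_means = subset.mean(axis=0, keepdims=True)      # shape (1, k)
out_lean = subset - theta_pos[:, None] * col_means  # broadcasts to (n, k)

same = np.allclose(out_dense, out_lean)
```

The lean version needs O(n k) memory instead of O(n^2), which matters if a sub-group can be large.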
I just had a quick look to see whether it can be merged soon, so that we would have everything together to start comparing GEE and Panel, and others.
I just realized that the handle_data subclass abstraction is in this PR and not master. I'm going to make a PR with just this change in master, because I think it's going to be generally useful. E.g., with survival models as well.
Rebased after merge of #1421.
I think this is ready to at least start talking about. There are still a few TODOs in the source, particularly with making sure that twoway effects are correctly handled. Stata doesn't offer much in the way of twoway effects, assuming that for most panel models N >> T.
This supersedes #690, which can be closed but referred to for more information.