WIP: Add Panel Data models #1133

jseabold · 2013-10-21T14:06:02Z

I think this is ready to at least start talking about. There are still a few TODOs in the source, particularly with making sure that twoway effects are correctly handled. Stata doesn't offer much in the way of twoway effects, assuming that for most panel models N >> T.

This supersedes #690, which can be closed but referred to for more information.

vincentarelbundock · 2013-10-21T14:24:03Z

I've been thinking about this a little bit, and have now convinced myself that forcing users to use an xtset-like function to prepare data would save us a lot of under-the-hood trouble.

What do you think about it?

josef-pkt · 2013-10-21T14:57:38Z

about xtset: I suggested this or similar ones before (there might be an open issue)
One problem I have with Stata in regular usage is that it only allows one dataset to be active. (Which is a pain when I try to prepare 3 examples at the same time.) (I thought that's related to having separate index and data.)

My general opinion (not having looked at this branch in a while, except for Poisson):
A more restrictive structure on the data, and a separate "xt-index", will be useful. But I think "forcing" too restrictive a structure will prevent uses that are not typical for the area for which it was initially written.
For example, GEE (PR) allows for a multi-dimensional (continuous) time-index so that it can also be used as a spatial index.
In most applications for panel/longitudinal data that I looked at recently (microeconometrics), "time" is not calendar time, just an integer (discrete) or float (continuous) event time index.
(If I understand it correctly now, the SUR model in sysreg is the same as a balanced short panel with unrestricted covariance matrix, if we flip time and cross-section index.)

Of course a common sub-case is the standard (macro-) panel, with calendar time and cross-section and two-way effects.

vincentarelbundock · 2013-10-21T15:02:29Z

right, but then you can have different data prep functions, like Stata's stset for survival time data. There could be quite a bit of code reuse between these data prep functions too. It just seems ugly to handle all sorts of data input in the model classes.

jseabold · 2013-10-21T15:06:07Z

Part of this PR was unifying the data-handling so that it works for any panel data model, not just the linear case (and so that it's handled in this base class). The way it works now, which I think is unchanged from before except that it's now general, is that you can either pass time and panel to any panel data model, as (separate) indices, or pass y and X whose index is a MultiIndex that has time and panel as the respective levels.

https://github.com/statsmodels/statsmodels/pull/1133/files#diff-8ab5d9484c849d2418de300970ad5b58R84
https://github.com/jseabold/statsmodels/blob/6a6b01bc8ef9a79aa3cf115b8e0a399c1e1f22cb/statsmodels/panel/base/data.py
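
A minimal sketch of the two input styles described above, using plain pandas rather than the code in this PR. The toy data, the variable names, and the names and order of the MultiIndex levels are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: 3 panels (units), each observed in 2 periods.
panel = np.repeat(["a", "b", "c"], 2)
time = np.tile([2000, 2001], 3)
y = np.arange(6, dtype=float)

# Style 1: panel and time are kept as separate index-like arrays
# alongside the data.
separate = pd.DataFrame({"y": y, "panel": panel, "time": time})

# Style 2: the data carries both in a MultiIndex.
# The level names and their order here are an assumption.
index = pd.MultiIndex.from_arrays([panel, time], names=["panel", "time"])
y_indexed = pd.Series(y, index=index, name="y")

# Either style yields the same grouping information.
panel_from_index = y_indexed.index.get_level_values("panel")
time_from_index = y_indexed.index.get_level_values("time")

# Example use: per-panel means, computed from the index alone.
group_means = y_indexed.groupby(level="panel").mean()
```

With the MultiIndex style, the panel and time labels travel with y and X, so no separate arguments are needed.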

A separate prep function makes sense for something like survival models (stset), where you might have different kinds of censoring, etc. That is, the information there can affect the estimation. I'm not sure what we'd gain in the panel case, though. I'm open to this change if it will make some things easier, but I don't see how yet.

All of the potential code re-use is in your groupings class, which, I agree, should be able to be re-used for the Survival models, though it may take a bit more work to generalize. I was just looking at them again last weekend.

jseabold · 2013-10-21T15:06:35Z

Note also that groupings is now attached to the model.data attribute and not to the models.

jseabold · 2013-10-21T15:07:04Z

I also see that the data changes have partially broken older cases (or revealed bugs).

coveralls · 2013-10-22T10:50:22Z

Coverage remained the same when pulling 6d40ebe on jseabold:panel-vincent into 3b7082c on statsmodels:master.

coveralls · 2013-10-22T12:38:53Z

Coverage remained the same when pulling dbd8a00 on jseabold:panel-vincent into 3b7082c on statsmodels:master.

coveralls · 2013-10-22T13:21:59Z

Coverage remained the same when pulling f3881db on jseabold:panel-vincent into 3b7082c on statsmodels:master.

josef-pkt · 2013-12-20T15:15:02Z

statsmodels/regression/linear_panel.py

+    '''Apply to a sub-group of observations'''
+    n = subset.shape[0]
+    B = np.ones((n,n)) / n
+    out = subset - chain_dot(np.diag(theta[position]), B, subset)


I guess this can be replaced with something without (n,n) arrays
unless subset is always small
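
A sketch of what avoiding the (n,n) array could look like. Since B averages over rows, the product of B and subset is just the column means of subset repeated n times, so broadcasting gives the same result. This assumes that theta[position] is a 1d array of length n and that subset is 2d. The function names are hypothetical, and chain_dot is replaced here by the @ operator.

```python
import numpy as np

def transform_dense(subset, theta_group):
    """Transformation as in the diff, building explicit (n, n) arrays."""
    n = subset.shape[0]
    B = np.ones((n, n)) / n
    return subset - np.diag(theta_group) @ B @ subset

def transform_broadcast(subset, theta_group):
    """Same result without (n, n) arrays.

    B @ subset repeats the column means of subset in every row, and
    left-multiplying by diag(theta_group) scales row i by theta_group[i].
    """
    col_means = subset.mean(axis=0, keepdims=True)      # shape (1, k)
    return subset - theta_group[:, None] * col_means    # shape (n, k)

# Hypothetical inputs for a single group with n = 5 rows and k = 3 columns.
rng = np.random.default_rng(0)
subset = rng.normal(size=(5, 3))
theta_group = rng.uniform(size=5)

dense = transform_dense(subset, theta_group)
fast = transform_broadcast(subset, theta_group)
same = np.allclose(dense, fast)
```

The broadcast version needs O(n k) memory instead of O(n^2), which matters only if groups can be large.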

josef-pkt · 2013-12-20T15:28:53Z

I just had a quick look to see whether it can be merged soon, so that we would have everything together to start comparing GEE, Panel, and others.

jseabold · 2014-02-21T15:40:01Z

I just realized that the handle_data subclass abstraction is in this PR and not master. I'm going to make a PR with just this change in master, because I think it's going to be generally useful. E.g., with survival models as well.

jseabold · 2014-02-24T04:10:21Z

Rebased after merge of #1421.

josef-pkt reviewed Dec 20, 2013
This was referenced Jan 22, 2014

Proportional hazards survival regression model (a.k.a. Cox model) #1312

Closed

MIGRATE: move stats code to statsmodels / deprecate in pandas pandas-dev/pandas#6077

Closed

josef-pkt added the PR label Feb 19, 2014

This was referenced Feb 21, 2014

ENH: Make handle_data overwritable by subclasses. #1416

Merged

ENH: Add grouping utilities code #1421

Merged

vincentarelbundock added 5 commits February 24, 2014 00:06

ENH: New Linear Panel Model

9151ec6

smarter input processing sorting

ed7bcbb

added between time model

1d1ee98

ENH: Use reindex in Panel model

8d849aa

removed fd

db4ac52

jseabold added 22 commits February 24, 2014 00:07

REF: Remove _effects_levels and import it.

02d86ad

TST: Comment out tests.

598b743

STY: Rename panel -> linear_panel

38729d1

TST: Test rsquared in random.

8a953ba

BUG: Make sure to return.

4e27073

TST/BUG: Fix and tests standard deviations

1e2be1d

TST: Test/fix rho.

6865264

ENH: Add Hausman test.

1256933

EX: Make example runnable without internet.

5c7f9ac

ENH: Estimate constant in within model to be consistent.

8a5003a

TST: Add some stata tricks

2523cce

BUG: Change and back to or.

7b3f390

DOC: Clean up Hausman doc.

1a40705

TST: Smoke test summary method.

5c0ac18

ENH: Make sure summary doesn't raise. TODO: Add summaries.

c40c5c2

ENH: Improve Hausman test usage.

f23199f

TST: Test hausman_test.

a248265

ENH: Add nottest decorator.

0e2db04

ENH: Mark hausman_test function as not a test.

1ae189e

REF: Fix refactor victim

9099344

REF: Always nobs length 2d out of transform_slices.

757d7ff

REF: Preserve 1d in 1d out.

10221f3

josef-pkt mentioned this pull request Jul 4, 2015

SUMM: old pull requests #2503

Open

15 tasks

josef-pkt mentioned this pull request Aug 2, 2015

Absorbing fixed effects #2568

Open

josef-pkt mentioned this pull request Dec 26, 2015

SUMM/ENH: migrate pandas stats models #2745

Closed

This was referenced Jan 22, 2016

SUMM/REF: grouputils #2784

Open

ENH: helper function: pandas multi-way group demean #2783

Open

josef-pkt removed the PR label Mar 5, 2016

josef-pkt mentioned this pull request Jun 10, 2016

COMPAT: update calls to pandas objects/methods #2994

Merged

jseabold closed this Mar 14, 2021
