DES: Use MultiIndex to make params *always* 1-dimensional #3652
Comments
See also the thread here: https://mail.python.org/pipermail/numpy-discussion/2017-February/076474.html

The trade-off is that we might get some conveniences or improvements by using pandas inside the models, but if we mix data structures, then it is difficult, or a lot of work, to keep track of small differences in behavior, e.g. pandas for-loops iterate over columns, numpy over "rows", the 0 axis.

So far the trade-off has been in favor of using only numpy inside the models, and restricting pandas to the interface or to where using it really has a large advantage. Also, the core part of the models is just "math": we/they don't care what the variables mean, and carrying around labels (columns and index names) is mostly irrelevant to the math and code. (pandas was initially designed to be user friendly and not developer friendly, with two-letter names that I never remembered and too much magic, second-guessing the user, besides the differences in behavior.)

Note: This is a recurrent issue because we always face this decision and trade-off, and have recurrent discussions about what to do in specific cases. The trend is towards more pandas, also because there are more users and potential contributors familiar with pandas than with pure numpy. In the specific cases, I don't think using pandas is worth the pain.
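To make the iteration difference mentioned above concrete, here is a minimal sketch. The array contents and the column names `a` and `b` are made up for illustration and are not taken from any statsmodels model.

```python
import numpy as np
import pandas as pd

# Illustrative data only: 3 rows, 2 columns.
values = np.arange(6).reshape(3, 2)
frame = pd.DataFrame(values, columns=["a", "b"])

# Iterating over a 2-D ndarray walks along axis 0 and yields rows.
array_items = [row.tolist() for row in values]

# Iterating over a DataFrame yields the column labels, not the rows.
frame_items = list(frame)

print(array_items)  # [[0, 1], [2, 3], [4, 5]]
print(frame_items)  # ['a', 'b']
```

Code written for one container can therefore run without error on the other while looping over the wrong axis.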
That's a pretty compelling case. My understanding is that the pandas team intends to address the indexing "warts" in 2.0, but that's still distant-horizon.
What happens to your opinion if we focus the suggestion on the An idea I toyed with yesterday with little success was to patch |
The same as for the models applies to the results classes; those are essentially part of the core computation for the models. That Results are separate from Models is mostly for internal housekeeping (Results hold fit results; the model doesn't know the results of fit) and for code reuse and inheritance. As you seem to have discovered, a results instance is the wrapped instance and The main reason is that we still want to use numpy for follow-up or post-estimation computation, like the Wald tests, conf_int and similar.
Minor detail to illustrate what we need to watch out for when mixing pandas and numpy

I'm very risk averse about large refactorings of existing models, unless there is a very strong reason. I think it's better to introduce new approaches at the fringes, with new models or other classes where a new pattern promises some advantages. When we have some experience, and if it turns out to be useful for the generic case, then large-scale or core code refactoring is feasible. (That's also true for many generic enhancements. E.g. the still existing code duplication in sandwich |
I know the |
The mental effort of having to think about ddof and similar still remains. And it won't be "fun" to debug why the precision of the results is low (although I'm quite used to debugging degrees-of-freedom corrections against Stata).
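One well-known instance of the ddof issue, which may or may not be the case this comment had in mind, is that numpy and pandas use different defaults for the standard deviation. A minimal sketch with made-up data:

```python
import numpy as np
import pandas as pd

# Illustrative data only.
x = np.array([1.0, 2.0, 3.0, 4.0])

# numpy defaults to ddof=0 (divide by n).
std_numpy = np.std(x)

# pandas defaults to ddof=1 (divide by n - 1).
std_pandas = pd.Series(x).std()

# The two agree only when ddof is passed explicitly.
std_numpy_ddof1 = np.std(x, ddof=1)

print(std_numpy, std_pandas, std_numpy_ddof1)
```

Passing `ddof` explicitly on both sides removes the ambiguity, at the cost of having to remember to do so everywhere.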
Discussion topic:

Over in #3651 we've been troubleshooting some problems caused in `MNLogit` because some methods are not expecting `params` (or `bse` or `tvalues` or ...) to be 2-dimensional.

In cases with 2-dimensional `params` I consistently have to go back and check which dimension is which. If we cast these into a `pd.DataFrame` internally (except for in the optimization step), we could have the labels and could attach names to the indexes, e.g. "Exog" and "Eq#".

Then any time there is a method that assumes params are 1-dimensional, we have "stack" and "unstack".

A secondary motivation is cases that are not raising errors but make it easy to shoot myself in the foot, like `irf.cov()`. That returns an array with shape `(steps+1, neqs**2, neqs**2)`. (The deprecation of `pd.Panel` notwithstanding) Ideally these latter two dimensions should have indexes that behave a lot like a `pd.MultiIndex`.
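A minimal sketch of the stack/unstack idea, using plain pandas rather than any statsmodels internals. The variable names (`const`, `x1`, `x2`), the equation numbers, and the parameter values are hypothetical; only the axis names "Exog" and "Eq#" come from the proposal above.

```python
import numpy as np
import pandas as pd

# Hypothetical 2-D params: 3 exog variables (rows) by 2 equations (columns).
raw = np.arange(6, dtype=float).reshape(3, 2)
params = pd.DataFrame(
    raw,
    index=pd.Index(["const", "x1", "x2"], name="Exog"),
    columns=pd.Index([1, 2], name="Eq#"),
)

# stack() moves the column level into the row index, giving a
# 1-dimensional Series indexed by a MultiIndex with levels (Exog, Eq#).
flat = params.stack()
print(list(flat.index.names))  # ['Exog', 'Eq#']
print(flat.shape)              # (6,)

# Each entry is addressed by label, so there is no need to remember
# which axis is which.
print(flat.loc[("x1", 2)])     # 3.0

# unstack() restores the 2-D layout for code that expects it.
restored = flat.unstack("Eq#")
print(restored.shape)          # (3, 2)
```

Whether a 1-dimensional labeled form like this is worth the cost of mixing pandas into the results classes is exactly the trade-off discussed in the comments.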