SummaryTable is brutally cumbersome #414
Comments
I was playing around with some tables today and I must say I agree with your comment. In terms of defining a vision for how to move forward with this, what's the desired feature set? Would it be sufficient to have a couple of helper functions that populate a pandas DataFrame, and then use DataFrame.to_string() or DataFrame.to_html()? Does statsmodels need more complex tables than simple arrays (e.g. multicolumn)?
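To make the question concrete, here is a minimal sketch of what such a helper could look like. The function name `params_frame`, its arguments, and the column labels are hypothetical and not part of statsmodels; only `DataFrame.to_string()` and `DataFrame.to_html()` are existing pandas calls.

```python
import pandas as pd


def params_frame(names, coefs, std_errs):
    """Hypothetical helper: collect parameter estimates in a DataFrame."""
    df = pd.DataFrame({"coef": coefs, "std err": std_errs}, index=names)
    # Derived column: the ratio of each estimate to its standard error.
    df["t"] = df["coef"] / df["std err"]
    return df


table = params_frame(["const", "x1"], [2.0, 1.0], [0.25, 0.5])

# The same object renders as plain text or as HTML.
text = table.to_string(float_format="{:.3f}".format)
html = table.to_html(float_format="{:.3f}".format)
print(text)
```

This only covers a single homogeneous table; it says nothing yet about stacking several tables into one summary.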
Look at an example, e.g. http://nbviewer.ipython.org/3484294/. The results.summary() output is 5 tables: the top and bottom are each 2 horizontally concatenated tables, with the simpler single params table in the middle. Discrete models are missing the regression diagnostics. This is not designed for quick simple tables; pandas works much better in that case. In a branch I made some changes to the html rendering (since align on decimal doesn't exist). Some problems where I think the greater control over rendering helps (compared to pandas, AFAIK): I don't know how much control over formatting we can get using a DataFrame. In simpler cases with a table of just numbers (which are pretty homogeneous) pandas is more convenient, but I doubt we have enough control for fancier formatting in more complex tables. (SimpleTable also renders LaTeX.)
Yes, flexibility does seem to matter. And LaTeX support is a big plus. But I'm not sure that pandas DataFrames are really limited to simple cases. They can basically behave like the SimpleTable building block. You can concatenate them horizontally, or stack tables with different numbers of columns vertically by printing them one after the other and forcing them to have equal width. to_string() also accepts a "formatters" argument, which can apply arbitrary functions to the columns you want. So in theory we could write a pretty simple "align_float_on_decimal" function that would give us neatly formatted columns. Of course, if there are too many formatter functions to write, reinventing the wheel wouldn't be worth it. http://pandas.pydata.org/pandas-docs/dev/generated/pandas.DataFrame.to_string.html
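A rough sketch of the decimal-alignment idea, under some assumptions: the helper `align_on_decimal` and the sample numbers are made up for illustration. Since `formatters` functions are applied to one element at a time and cannot see the rest of the column, the sketch shows two routes: a fixed number of decimals through `formatters`, and a per-column pre-formatting helper when trailing zeros should be dropped.

```python
import pandas as pd


def align_on_decimal(values, max_decimals=4):
    """Hypothetical helper: equal-width strings with decimal points lined up."""
    texts = []
    for v in values:
        t = f"{v:.{max_decimals}f}"
        if "." in t:
            # Drop trailing zeros, and the point itself if nothing follows it.
            t = t.rstrip("0").rstrip(".")
        texts.append(t)
    parts = [t.partition(".") for t in texts]
    int_width = max(len(whole) for whole, _, _ in parts)
    frac_width = max(len(frac) for _, _, frac in parts)
    aligned = []
    for whole, dot, frac in parts:
        s = whole.rjust(int_width)
        if frac_width:
            # Values without a fractional part get a blank in place of the point.
            s += (dot or " ") + frac.ljust(frac_width)
        aligned.append(s)
    return aligned


params = pd.DataFrame(
    {"coef": [1.5, 12.25, 100.0], "std err": [0.031, 0.4, 2.0]},
    index=["const", "x1", "x2"],
)

# Route 1: element-wise formatters with a fixed number of decimals.
fixed = params.to_string(
    formatters={"coef": "{:.4f}".format, "std err": "{:.4f}".format}
)

# Route 2: pre-format each column as a whole, then let pandas lay out strings.
aligned = pd.DataFrame(
    {col: align_on_decimal(params[col]) for col in params.columns},
    index=params.index,
)
print(fixed)
print(aligned.to_string())
```

Route 2 gives up numeric dtypes in the rendered frame, so it is only suitable as a last step before printing.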
Lots of options. Two possibilities
To the second: I worked on the summary for two weeks or so, and was happy enough when I got it to work as it is now. I ran out of patience fighting with this, and didn't go back to see if it could be made more convenient or cleaner. However, for summary() it looks to me like this will be necessary regardless of how we create the tables. For homogeneous tables like summary_frame in outliers, a pandas DataFrame is nicer because it can do the rendering and hold the data at the same time.
Yeah, you're probably right. I'm still a bit curious, so if I have time I'll try to put together a minimal working example with pandas, just to get a better sense of how close to an acceptable result we can get using 40 lines of code.
Having added summary tables to AR, ARIMA, and now marginal effects for discrete choice variables, I have to say that doing this is brutally difficult and rather annoying. There's got to be a better way or a refactor that could sort this out.