SummaryTable is brutally cumbersome #414

jseabold · 2012-08-02T17:14:26Z

Having added summary tables to AR, ARIMA, and now marginal effects for discrete choice variables, I have to say that doing this is brutally difficult and rather annoying. There's got to be a better way or a refactor that could sort this out.

vincentarelbundock · 2012-11-18T03:15:00Z

I was playing around with some tables today and I must say I agree with your comment. In terms of defining a vision for how to move forward with this, what's the desired feature set? Would it be sufficient to have a couple of helper functions that populate a pandas DataFrame, and then use DataFrame.to_string() or DataFrame.to_html()?
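To make the proposal concrete, here is a minimal sketch of such a helper. The function name `params_frame`, the column labels, and the numbers are all made up for illustration; only `DataFrame.to_string()` and `DataFrame.to_html()` come from the comment above.

```python
import numpy as np
import pandas as pd


def params_frame(names, params, bse):
    """Collect estimates into a DataFrame; the column labels are illustrative."""
    params = np.asarray(params, dtype=float)
    bse = np.asarray(bse, dtype=float)
    return pd.DataFrame(
        {"coef": params, "std err": bse, "t": params / bse},
        index=list(names),
    )


# Made-up numbers, only to exercise the helper.
table = params_frame(["const", "x1"], [1.5, -0.25], [0.5, 0.125])

# Both renderers accept a one-argument function for formatting floats.
text = table.to_string(float_format=lambda v: f"{v:.3f}")
html = table.to_html(float_format=lambda v: f"{v:.3f}")
print(text)
```

This only covers a single homogeneous table with one precision for every column.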

Does statsmodels need more complex tables than simple arrays (e.g. multicolumn)?

josef-pkt · 2012-11-18T04:03:28Z

look at an example e.g. http://nbviewer.ipython.org/3484294/

results.summary() is made up of 5 tables: the top and bottom parts are each 2 horizontally concatenated tables, and the simpler single table with the params sits in the middle.
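As a rough illustration of that layout (not how statsmodels builds it), side-by-side and stacked text tables can be sketched in plain Python. The helper names `hconcat` and `vstack` and the sample strings are hypothetical.

```python
def hconcat(left, right, sep="   "):
    """Join two multi-line text blocks side by side, padding the shorter one."""
    left_lines = left.splitlines()
    right_lines = right.splitlines()
    height = max(len(left_lines), len(right_lines))
    left_width = max((len(line) for line in left_lines), default=0)
    left_lines += [""] * (height - len(left_lines))
    right_lines += [""] * (height - len(right_lines))
    return "\n".join(
        (l.ljust(left_width) + sep + r).rstrip()
        for l, r in zip(left_lines, right_lines)
    )


def vstack(blocks, rule="="):
    """Stack text blocks vertically, separated by rules of a common width."""
    width = max(len(line) for block in blocks for line in block.splitlines())
    line = rule * width
    return "\n".join([line] + [part for block in blocks for part in (block, line)])


# Made-up content: a two-part header on top, a single table below it.
top = hconcat("Model:  OLS\nNo. Obs:  100", "R-squared:  0.9")
print(vstack([top, "const   1.5000\nx1     -0.2500"]))
```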

discrete models are missing the regression diagnostics
http://nbviewer.ipython.org/3484274/

This is not designed for quick simple tables, pandas works much better in this case.
What I worked on with summary is to have enough control to get a nice table for the standard case. Tables that follow the same pattern are largely boilerplate. New tables with a different pattern are "work".

In a branch I made some changes to the html rendering (since align on decimal doesn't exist)
bb4dafa

some problems where I think the greater control over rendering helps (compared to pandas, AFAIK):
different precision in columns,
columns that have numbers with 1e-20 and 1e4 at the same time
regression summary (top and bottom table) where each line has different units.
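The first two points can be illustrated with a small sketch in plain Python. The thresholds `low` and `high` and the function names are arbitrary assumptions chosen for the example, not anything taken from SimpleTable.

```python
def format_cell(value, precision=4, low=1e-4, high=1e5):
    """Use fixed notation for mid-range magnitudes, scientific otherwise."""
    magnitude = abs(value)
    if magnitude != 0 and (magnitude < low or magnitude >= high):
        return f"{value:.{precision}e}"
    return f"{value:.{precision}f}"


def format_column(values, precision=4):
    """Format one column and right-justify the cells to a common width."""
    cells = [format_cell(v, precision) for v in values]
    width = max(len(c) for c in cells)
    return [c.rjust(width) for c in cells]


# A column mixing 1e4 and 1e-20, and a second column with its own precision.
mixed = format_column([1e4, 1e-20, 0.5], precision=4)
pvalues = format_column([0.5, 0.031], precision=3)
print("\n".join(mixed))
```

A single fixed format such as `"%.4f"` would print 1e-20 as `0.0000`, which is why the choice of notation is made per cell here.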

I don't know how much control over formatting we can get with using a DataFrame. In simpler cases with a table of just numbers (which are pretty homogeneous) pandas is more convenient, but I doubt we have enough control for fancier formatting in more complex tables.

(SimpleTable also renders Latex)

vincentarelbundock · 2012-11-18T04:26:40Z

Yes, flexibility does seem to matter. And Latex support is a big plus. But I'm not sure that pandas DataFrames are really limited to simple cases. They can basically behave like the SimpleTable building block. You can concatenate them horizontally, or stack tables with different numbers of columns vertically by printing them one after the other and forcing them to have equal width. to_string() also accepts a "formatters" argument, which can apply arbitrary functions to the columns you want. So in theory we could write a pretty simple "align_float_on_decimal" function that would give us neatly formatted columns. Of course, if there are too many formatter functions to write, reinventing the wheel wouldn't be worth it.
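A possible shape for that formatter is sketched below. It assumes finite floats and `precision >= 1`, and the sample data are made up. Because each function passed to `formatters` sees one element at a time, the column is inspected up front to find the common widths, and a closure is returned.

```python
import pandas as pd


def align_float_on_decimal(column, precision=3):
    """Return a one-argument formatter that lines up the decimal points.

    Trailing zeros are stripped and replaced by spaces, so every cell in the
    column has the same width and the same decimal-point position.
    Assumes finite floats and precision >= 1.
    """
    def raw(value):
        return f"{value:.{precision}f}".rstrip("0")

    int_width = max(len(raw(v).split(".")[0]) for v in column)
    frac_width = max(len(raw(v).split(".")[1]) for v in column)

    def formatter(value):
        whole, frac = raw(value).split(".")
        return whole.rjust(int_width) + "." + frac.ljust(frac_width)

    return formatter


# Made-up numbers; one formatter is built per column.
df = pd.DataFrame({"coef": [1.5, -12.25, 100.0], "P>|t|": [0.5, 0.031, 0.0]})
formatters = {name: align_float_on_decimal(df[name]) for name in df.columns}
print(df.to_string(formatters=formatters))
```

Whether the padding survives rendering unchanged may depend on the pandas version, so this is something to check rather than rely on.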

http://pandas.pydata.org/pandas-docs/dev/generated/pandas.DataFrame.to_string.html

josef-pkt · 2012-11-18T04:45:34Z

Lots of options.
I'm not sure you will end up with less setup and formatting code than with https://github.com/statsmodels/statsmodels/blob/master/statsmodels/tsa/arima_model.py#L1314

Two possibilities

try out using pandas for at least simpler tables (no Latex for now), or
try to streamline the current summary helper functions and classes.

to the second: I worked on the summary for two weeks or so, and was happy enough when I got it to work as it is now. I ran out of patience with fighting with this, and didn't go back and see if it could be made more convenient or cleaner.

However, for summary() the steps

- fetch results
- reformat them to the correct string representation
- stick them in a "table"

look to me like they will be necessary however we create the tables.
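Those three steps can be sketched independently of the table backend. Everything below is hypothetical: `results` is a plain dict standing in for a results object, and the function names, format strings, and values are invented for the example.

```python
def fetch(results):
    """Step 1: pull (label, value) pairs out of a results object.

    `results` is a plain dict here; a real results class would expose
    attributes instead.
    """
    return [(name, results[name]) for name in ("nobs", "llf", "aic")]


def to_strings(pairs, formats):
    """Step 2: convert each value to its display string."""
    return [(label, formats[label].format(value)) for label, value in pairs]


def to_table(rows, title):
    """Step 3: lay the strings out as a two-column text table."""
    label_width = max(len(label) for label, _ in rows)
    value_width = max(len(text) for _, text in rows)
    width = label_width + 2 + value_width
    body = [label.ljust(label_width) + "  " + text.rjust(value_width)
            for label, text in rows]
    return "\n".join([title.center(width), "=" * width, *body, "=" * width])


results = {"nobs": 100, "llf": -123.456789, "aic": 252.913578}  # made-up values
formats = {"nobs": "{:d}", "llf": "{:.3f}", "aic": "{:.2f}"}
rows = to_strings(fetch(results), formats)
print(to_table(rows, "Example"))
```

Only the last step would change if the strings were put into a DataFrame or a SimpleTable instead of a hand-built text table.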

For homogeneous tables like summary_frame in outliers, a pandas DataFrame is nicer because it can do the rendering and hold the data at the same time.

vincentarelbundock · 2012-11-18T05:01:26Z

Yeah, you're probably right. I'm still a bit curious, so if I have time I'll try to put together a minimal working example with pandas, just to get a better sense of how close to an acceptable result we can get using 40 lines of code.

jseabold mentioned this issue Feb 13, 2014

SimpleTable in Summary (e.g. OLS) is slow for large models #1385

Closed
