Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Already on GitHub? Sign in to your account

Summary re-write #636

Closed
wants to merge 12 commits into
from

Conversation

Projects
None yet
5 participants
Member

vincentarelbundock commented Jan 30, 2013

TODO:

  • Reverse name in call. Proper: def summary(self, yname=None, xname=None, title=None, alpha=.05, yname_list=None):
  • Individual column alignment
  • More column spacing by default in summary_params
  • option to not enforce equal table width
  • default lrlr alignment for model info panes
  • summary_col: order/rename regressors in the row index

http://nbviewer.ipython.org/4124662/

What's in here:

  • Summary class: smry = Summary()
  • Convert user input to DataFrames: smry.add_dict(), smry.add_df(), smry.add_array()
  • DataFrame -> SimpleTables -> Output: smry.as_text(), smry.as_html(), smry.as_latex()
  • Helper functions for convenient generation of summary objects: summary_params, summary_model
  • Side-by-side summaries for multiple models: summary_cols
  • 14 tests
  • The main model summaries have all been converted to the new format. I've kept the old summary functions as "summary_old.py" so that sandbox examples can still use it in the interim until everything is converted over.

On ASCII tables implementation: _measure_tables takes a list of DFs, converts them to ascii tables, measures their widths, and calculates how much white space to add to each of them so they all have same width. Then, smry.as_text() produces a new set of ascii tables via SimpleTable by specifying the white space in the colsep output option. In theory, this means we shouldn't ever have to fiddle with make tables fit by hand, and larger column titles/entries should be accommodated (within reason).

PS: The test failure that currently makes Travis go red is trivial to fix.

@vincentarelbundock vincentarelbundock referenced this pull request Jan 30, 2013

Closed

WIP: Summary2 #582

Owner

josef-pkt commented Jan 30, 2013

Thank you Vincent

We are going to merge it for sure, but before we replace the current summary, I think, it's necessary to check corner cases.
I was thinking about this PR yesterday: Since I didn't have or take the time yet to review this again, and play with it, I thought that we could use it in parallel for the transition.
My impression is that the current summary() are still more fine tuned, but your new work is giving greater flexibility in formatting and creates pandas dataframes as byproduct, besides new features comparing models table.
So, I would prefer to be a bit conservative about changing the default summary of the main models, if we want to get 0.5.0 out next month.

two issues, after looking at your notebook

  • scientific notation: all your examples have nicely scaled numbers, what if we have 5000, 1.23456e6, 1.23456e-20 and 0.01 in a table.
    I used a helper function forg (f_or_g formatting) to get consistent number of positions (One of the reasons the current summary has mostly hard coded precision)
  • as I mentioned already, we need more control over the location of two column tables, in your example, I don't like when R-squared is on the left side bottom line and Adj. R-squared is on the right column
Member

vincentarelbundock commented Jan 30, 2013

Sounds reasonable.

  1. forg() could be made slightly more general and included as part of the _formatter() function that I use (we could rename the latter _forg if you prefer).
  2. The current set-up already allows that. 2 very simple alternatives:
    • The info in the top panel is split right down the middle. Right now, R2 and R2-adj happen to be placed right in the middle of the dict, so they get split. But we can move them up or down by a few positions in the dict so they won't be separated anymore.
    • If you want a fixed-forever look, we can just create a 4-column numpy array and use smry.add_array() instead
Member

vincentarelbundock commented Jan 31, 2013

@josef-pkt ,

It seems pretty obvious to me that there's no perfect universal solution to the float formatting issue. The best we can do is have reasonably sane defaults that the user can tweak easily. Hard coding precisions doesn't sound like a solution. So what I've done is introduce a check in the summary._formatter() function to make sure 1e-4 <= x <= 1e4. When this is the case, we apply the default "%.4f". Otherwise, we move to "%4.5g", which seems like a pretty good compromise.

This gives us a single point of entry to improve and write an intelligent statsmodels-wide default float formatting option, and it still makes it easy for users to input their own float string representation if they want.

Member

vincentarelbundock commented Jan 31, 2013

Sorry, forgot to show you the output: http://nbviewer.ipython.org/4679325/

Owner

josef-pkt commented Jan 31, 2013

Looks pretty good, we loose some of the allignment and mostly fixed column width, but it could be worth giving up for the added flexibility.
One advantage of your implementation is that the 3 tables in the summary have the same widths, so having fixed column width is less important.

I think, I prefer reducing the number of decimals for "%g" cases. At least for p-value knowing the extra numbers in 4.5647e-05 or with e-30 is mostly useless. For coefficients/params it would still make sense to have the digits.

I just saw this yesterday by chance, by a poli-sci student:
http://johnbeieler.org/py_apsrtable/

With the extensions now we have two use cases for summary()-like functions.
The current usage as a quick summary of the estimation results (plus warnings) in an interactive session, when users still have full access to the result instance. (Sometimes I didn't like the small number of decimals in the current summary, but I could just print out the full results.)
The second usage is what you would actually keep as record and put in a paper. Here increased flexibility is a big advantage and more options means that users have to do less from scratch. We could even start to add "style sheets" like those encoded in the apsrtable.

I don't know if all the flexibility of the second use case should to be directly in the summary() method. On the other hand if we have set reasonable defaults for everything, then we don't need to look at the options if we just use print results.summary()

Member

vincentarelbundock commented Jan 31, 2013

w.r.t. precision, I agree that pvalues shouldn't show too many digits. An easy fix would be to have summary_params() replace small values by a string. For example, R uses "<2e-16" when p is very small. Perhaps we should follow that convention.

py_apsrtable looks cool. I think the current summary_cols() function in this PR can do most of these things, except maybe rename and sort variables. I'd be curious to hear what @johnb30 thinks of all this. Perhaps he'd like to help...

johnb30 commented Feb 9, 2013

I'd be willing to help however I can. From a quick look, I do like this rewrite of the summary function better than my py_apsrtable. Then again, py_apsrtable was written to enforce a certain type of style, i.e., one commonly used in political science. If @vincentarelbundock would like, I could work on the renaming of variables though I don't know exactly when I would be able to get around to it.

Contributor

janschulz commented Feb 21, 2013

On latest pandas, a error is reported:

[..]\frame.py:3622: FutureWarning: rename with inplace=True  will return None from pandas 0.11 onward
  " from pandas 0.11 onward", FutureWarning)
Contributor

janschulz commented Feb 21, 2013

I also get lots of nan printed:

formular = "avg ~ eigen"
model = sm.ols(formular, df).fit()
print(model.summary())

               Results: Ordinary least squares
============================================================
Model:              nan       R-squared:          7.5148e-05
Dependent Variable: nan       Adj. R-squared:     4.7607e-05
Date:               nan       AIC:                371950.0  
No. Observations:   36308.0   BIC:                371960.0  
Df Model:           1.0       Log-Likelihood:     -185970.0 
Df Residuals:       36306.0   F-statistic:        2.7285    
Scale:              1646.2994 Prob (F-statistic): 0.0986    
------------------------------------------------------------
             Coef.  Std.Err.    t    P>|t|   [0.025   0.975]
------------------------------------------------------------
Intercept   19.5523   0.2133 91.6769    nan  19.1343 19.9703
eigen       -6.4587     3.91 -1.6518 0.0986 -14.1224  1.2051
------------------------------------------------------------
Omnibus:         99558.786  Durbin-Watson:     1.556        
Prob(Omnibus):   0.0        Jarque-Bera (JB):  11204102571.3
Skew:            34.004     Prob(JB):          0.0          
Kurtosis:        2723.554   Condition No.:     18.0         
============================================================
print model.pvalues

Intercept   0.0000
eigen       0.0986
Dtype: float64
  • There should be something in "Model", "Date" and "Dependend Variable"
  • No. observation should probably formatted as an integer, not as float?
  • P>|t| value of 'nan'?

Also, would it be possible to format R^2 without scientific notions? It's much easier for me to look for "0.00xy" than for x.ye-03

[rebased version of this branch on top of statsmodel master]

Contributor

janschulz commented Feb 21, 2013

summary_col also seems to have an error regarding 'nan':

formular = "avg ~ close"
model2 = sm.gls(formular, df).fit()
from statsmodels.iolib.summary import summary_col
print summary_col([model, model2])
=========================
          Model 0 Model 1
-------------------------
Intercept     nan     nan
              nan     nan
close         nan -3.1856
              nan     nan
eigen         nan     nan
              nan     nan
-------------------------
N         36308.0 36308.0
R2            0.0     0.0
=========================
Standard errors in
parentheses.
* p<.1, ** p<.05,
***p<.01

_summ = summary_col([model, model2])
_summ.tables
[              Model 0     Model 1
Intercept  19.5523***  19.5329***
             (0.2133)    (0.2129)
close                     -3.1856
                         (7.8689)
eigen        -6.4587*            
             (3.9100)            
N               36308       36308
R2              0.000       0.000]
Member

vincentarelbundock commented Feb 21, 2013

Thanks for the report. Can you give me data that I can try to replicate this?

Contributor

janschulz commented Feb 21, 2013

Nope, sorry...

ommenting out this line in summary.py:454 removes most of the problems (nan, empty model,..., float formatted observation)

dat = dat.applymap(lambda x: _formatter(x, float_format))

The only missing thing is the scientific formatting

Member

vincentarelbundock commented Feb 21, 2013

Cool. But that line is important for formatting. Do you think you could figure out why the function _formatter() wouldn't work on parameters (elements) of your model. Do:

from statsmodels.iolib.summary import summary_params
res = sm.OLS(y, X).fit()
s = summary_params(res)

and then try _formatter() on elements of s

Contributor

janschulz commented Feb 21, 2013

I think the problem is pandas.DataFrame.applymap:

def _formatter2(element, float_format=None):
    '''Convert an object to string. If float, then it is formatted properly'''
    return element
import statsmodels
statsmodels.iolib.summary._formatter = _formatter2

results in the same problem.

from statsmodels.iolib.summary import summary_params
s = summary_params(model1)
print(s)
             Coef. Std.Err.        t   P>|t|    [0.025   0.975]
Intercept  19.5523   0.2133  91.6769  <2e-16   19.1343  19.9703
eigen      -6.4587   3.9100  -1.6518  0.0986  -14.1224   1.2051
print(statsmodels.iolib.summary._formatter(s.ix[0,3]))
'<2e-16'

@janschulz janschulz referenced this pull request in pandas-dev/pandas Feb 21, 2013

Closed

applymap and misc/float columns #2909

Owner

josef-pkt commented Feb 21, 2013

I would switch back to using the number instead of "<2e-16" even if it's 1.2e-35

Contributor

janschulz commented Feb 21, 2013

@vincentarelbundock The nan and float formatting problem is gone in latest pandas (or a future version, I cherry-picked the fix). This evenbrought me R^2 with leading zeros. :-) So from my (limited tested) perspective: merge 👍

@josef-pkt and the p value formatting: I always liked "<0.0001" (i.e. numbers instead of scientific notations, but not too many zeros)...

Member

vincentarelbundock commented Feb 21, 2013

cool, but we still need to support older versions of pandas, so it's perhaps better to stop using applymap algogether for now.

Contributor

janschulz commented Feb 21, 2013

As far I understood the bug, this wasn't intended (was only in git, not in a release?). Not sure when applymap was initially introduced...

Owner

jseabold commented Feb 21, 2013

Not sure about the bug either, but this works on 0.7.3 (minimum supported version), 0.9.1, and 0.10.1. You're fine targeting released versions only. We can't realistically work around bugs that could crop up in unreleased code.

[~/]
[2]: df = pandas.DataFrame(data=[1.,"a"])

[~/]
[3]: print(df.applymap(lambda x: x))
   0
0  1
1  a
Member

vincentarelbundock commented Feb 21, 2013

Great. After someone else has reviewed this I'll finish up the test suite
and will run it on multiple pandas versions.

Vincent Arel-Bundock

On Thu, Feb 21, 2013 at 4:12 PM, Skipper Seabold
notifications@github.comwrote:

Not sure about the bug either, but this works on 0.7.3 (minimum supported
version), 0.9.1, and 0.10.1. You're fine targeting released versions only.
We can't realistically work around bugs that could crop up in unreleased
code.

[~/]
[2]: df = pandas.DataFrame(data=[1.,"a"])

[~/]
[3]: print(df.applymap(lambda x: x))
0
0 1
1 a


Reply to this email directly or view it on GitHubhttps://github.com/statsmodels/statsmodels/pull/636#issuecomment-13913109.

Member

vincentarelbundock commented Mar 11, 2013

Another good reason to go ahead with this change: Long variable names BREAK current summaries

https://gist.github.com/vincentarelbundock/5131316

Member

vincentarelbundock commented Mar 11, 2013

The same example, using my re-write:

https://gist.github.com/vincentarelbundock/5131319

Owner

josef-pkt commented Mar 11, 2013

Letting only the params table grow was semi-intentional (pretending it's nice when it was too much work to implement otherwise.)

I still think my alignment is prettier, but it's close.

Member

vincentarelbundock commented Mar 11, 2013

;)

Owner

josef-pkt commented Mar 11, 2013

half way down the page
http://blog.yhathq.com/posts/logistic-regression-and-python.html
Ain't summary pretty ? :)

Member

vincentarelbundock commented Mar 11, 2013

That's a really nice blog post

@josef-pkt josef-pkt referenced this pull request Mar 26, 2013

Closed

Fix latex output #734

@josef-pkt josef-pkt commented on the diff Mar 27, 2013

statsmodels/discrete/discrete_model.py
@@ -2118,21 +2120,23 @@ def margeff(self, at='overall', method='dydx', atexog=None, dummy=False,
return effects
- def summary(self, yname=None, xname=None, title=None, alpha=.05,
- yname_list=None):
+ def summary(self, title=None, xname=None, yname=None, alpha=.05,
@josef-pkt

josef-pkt Mar 27, 2013

Owner

why did you reverse the sequence between xname and yname?
It would be confusing to me since our signatures are (endog, exog)

it's just keywords, so not really important

@vincentarelbundock

vincentarelbundock Mar 27, 2013

Member

not intentional. added to todo list (first post of this thread)

Owner

josef-pkt commented Mar 27, 2013

I really would like to get this one merged, but I still don't think it's a substitute for the current summary.

Before I looked at the changes in the current version, I thought the best would be to reinstate the old summary, and create a new summary in parallel summary_flex (or summary_new ?) for your version.
But, now I saw that the old version are not available anymore with merging this.

the points:

  • I think the allignment in the old version is still "prettier" than in the new version, especially for examples in documentation and blogs. (the statsmodel "signature" summary)
  • I like that I have control over left and right columns or where the break across columns is
  • I also find it easier to understand the source if the summary method in the model actually specifies what's in the summary (explicit instead of implicit). (My weird "None" was the compromise: the model_results.summary tells what's supposed to be in it, the Summary class takes care of the formatting and details.)
  • Related to the previous point, the list of tuples controlls the sequence of items, although the OrderedDict is equivalent if we can/could have arbitrary blank lines (I didn't check this)

There are also many reasons why I would like to get this in, more flexibility, more options, new features.

Also, it looks like it's faster to add to a new model. However, in contrast to Skipper, I don't complain about more work for the very common or popular models, if I can fine-tune the presentation, (even if the user cannot).

I still haven't worked my way through all the changes in the backend code. Some things might only become clear until we have used it for some time. (I'm not focused enough to work my way through it in the details.)
That's why I was thinking about the parallel system of old and new summary (hopefully only temporarily).

Member

vincentarelbundock commented Mar 27, 2013

yes, all this is doable.

  • I could write a smry.add_tuple method
  • summary_model is a convenience function, we can build dicts or tuples by hand in every summary method if you prefer, but summary_model reduces code duplication.
  • You might be right that it would be good to have the two versions in parallel for a bit. It may be good to switch over at least some of the models so the functions get some exposure to real-world use.
  • What exactly do you prefer about the current alignment. I see two main differences:
    1. Your odd columns are left aligned and your even columns are right aligned. Mine are all left-aligned. My personal view is that yours might look "cleaner" when looking at the full table, but that this sort of alignment contravenes all sorts of principles of good typographic design. It makes no sense for the reader to see something on the far left side and then to have to follow an imaginary line all the way across the page. I guess we could still add more flexbility in alignment by allowing manual alignment of each column individually.
    2. Spacing between columns in the parameters summary. That's just a matter of switching the default colspace argument from 1 to 3

Unfortunately, I probably won't have time to touch code for the next 3 weeks. I'll let you know when I get time.

Owner

josef-pkt commented Mar 27, 2013

I was going mostly by Stata examples (from the Web or from the other Vincent before I had Stata myself)
http://www.ats.ucla.edu/stat/dae/ has some examples.
SAS is more mixed in terms of alignment, but genmod also has labels left and numbers right aligned:
http://www.ats.ucla.edu/stat/sas/dae/poissonreg.htm
essentially pretty block formatting as in a book.
(To avoid stretching the left and right parts too far, was one of the reasons not to enforce equal widths for all three tables, both Stata and SAS have mixed line widths in summaries, at least in many cases)

The other issue is that your decimals are not always aligned in the params table (also a tradeoff "pretty" versus informative)

(Related: Besides for the formatting, I also used mainly Stata printouts to decide what to add to the summary tables)

Owner

josef-pkt commented Mar 27, 2013

One reason, I would be quite happy with two different styles is that in blogs and documentation nobody reads the numbers in detail, and pretty is better than informative.

If you want to publish the results in a journal, then having more control over precision is necessary because publishers, referees and readers do think it's important. (Plus formatting details are handled by Latex.)
.

Member

vincentarelbundock commented Mar 27, 2013

I don't think that Stata should be taken as a model for graphic design ;)

Honestly, I really don't care that much about the look of the tables. Adding an option for column-wise alignment (e.g. align=['l','r','l','r']) would pretty much allow you to replicate current look if you want. I'm sure we can also improve forg() to make decimals align. We could also have an option to not enforce equal pane width. And that would solve it...

My main problem with the current setup is that it's basically unusable for anyone but Skipper and you. Removing difficulties like this one is not just about prioritizing laziness over fine-grained control, it's about removing barriers to entry for others to contribute. I don't think I would write a summary with the current setup.

I'm also not too sure about having two systems in parallel. I think that with the changes I suggested above we can easily replicate the current printout with the new framework. Holding two sets of summary functions might be good for transition, but I think SM needs to buy-in to one system in the medium run to avoid user and new developer confusion. There should be one obvious way, etc, etc.

Member

vincentarelbundock commented Mar 27, 2013

Also, there's great control over precision in my rewrite since every method allows users to pass a float_format argument which takes custom formatting.

Member

vincentarelbundock commented Mar 27, 2013

Oh, and BTW, the old versions are still there, but in the summary_old file.

Owner

jseabold commented Mar 27, 2013

On Wed, Mar 27, 2013 at 12:21 AM, Vincent Arel-Bundock <
notifications@github.com> wrote:

My main problem with the current setup is that it's basically unusable for
anyone but Skipper and you. Removing difficulties like this one is not just
about prioritizing laziness over fine-grained control, it's about removing
barriers to entry for others to contribute. I don't think I would write a
summary with the current setup.

Don't count me in on it being usable for me. My "complaining" (really?) was
after it took me hours to format printing a table for summary. It just
seemed to me that there must be a better way to do this. Like you, I didn't
follow the development this code or look too much at the details unless I
had to, and I found it to be very difficult to work with. Thanks, again,
for having a look at this.

Owner

jseabold commented Mar 27, 2013

Somewhat related, though we don't really have the luxury of considering this "feature-creep." Might be nice to keep an eye on developments here.

pydata/pandas#3190

Since this is a solved problem elsewhere, I peaked at how libraries like gretl are solving this in a general way. It's about 5000++ lines of C++ code, though, and I didn't yet find the actual templating parts separate from the support code.

Member

vincentarelbundock commented Mar 27, 2013

Looks interesting. Initially I tried to get rid of SimpleTable completely, just relying on pandas printing, but I soon realized that it was not flexible enough for us. For instance, distance between columns couldn't be adjusted, etc. Some of the pandas to_??? methods were also implemented relatively recently, so it might be a while before statsmodels can use any of this.

Owner

jseabold commented Mar 27, 2013

Sure. Mostly just wanted to keep a cross-link here. I learn a lot watching programmers who are better than I and have a better sense of the current landscape.

Contributor

janschulz commented Apr 10, 2013

Any news on this? I really like the new features, especially the possibility to get nicely formatted summary tables in ipython notebook...

Is there anything I can help to get this merged?

Owner

jseabold commented Apr 10, 2013

I am almost done with all of the fixes for 0.5, and I plan to look at PRs next, if it will also help on the decision for this one. I haven't followed this development closely yet. My impression was that this was mostly an internal refactor. Do summary tables not already have a _repr_html method for the notebooks?

Contributor

janschulz commented Apr 15, 2013

I think they do, but there is no summary_col which I find a great help!

Contributor

janschulz commented Apr 16, 2013

BTW: it would be nice if one could specify the names of the independent variables (independent_names={"oldname": "newname"}) or specify the order in summary_col.

Member

vincentarelbundock commented Apr 16, 2013

I added this to the todo list (first post), but I have a ton of work stuff going on right now, so I don't know when I'll be able to get back to this. Also, my sense is that these features could be added afterwards, by me or some enterprising user ;), once the basic framework is in place.

Contributor

janschulz commented Jun 7, 2013

Any news? I couldn't get "git rebase master" to finish while in this branch, so I suspect some manual rebasing is needed :-(

Member

vincentarelbundock commented Jun 12, 2013

I fixed one of the last nagging aesthetic concerns that @josef-pkt had by introducing the cool "max_width" argument and slightly increasing padding between columns. Individual column alignment (e.g. lrlr) would actually be a bit more complicated since SimpleTable doesn't offer that functionality directly. Will have to think about that. Here's my new plan:

Considering that:

  1. There would be great benefits to having this included in the next release
  2. The code base is very simple and should be easy to review
  3. This still represents a considerable change to a highly visible part of statsmodels
  4. Such a change should be introduce in a way that won't break everything but will still get some exposure in order to generate feedback and bug reports

I propose:

  1. New PR which keeps all current summary functions intact and puts the my new summary functions in an alternative file/module "summary_new"
  2. Most model summaries keep using the old functions
  3. Discrete models are converted to the new summaries. Hopefully this will help us identify problems and avenues for improvements
  4. Quantile regression summaries will be written with the new format

Once this is introduced and we get some mileage, we can consider going back to the old way or improvement this and extending it to all the models.

What do you think @josef-pkt, @jseabold, @janschulz ?

Contributor

janschulz commented Jun 12, 2013

As long as I have a summary_col method, I'm happy :-)

Contributor

janschulz commented Jun 25, 2013

I tried to merge this on top of current master (947943d), but this fails with some merge conflicts:

c:\data\external\statsmodels>git merge vincentarelbundock/summary
Auto-merging statsmodels/tsa/arima_model.py
Auto-merging statsmodels/sandbox/examples/example_nbin.py
Auto-merging statsmodels/robust/robust_linear_model.py
CONFLICT (content): Merge conflict in statsmodels/robust/robust_linear_model.py
Auto-merging statsmodels/regression/linear_model.py
CONFLICT (content): Merge conflict in statsmodels/regression/linear_model.py
Removing statsmodels/iolib/tests/test_summary_old.py
Auto-merging statsmodels/iolib/summary.py
CONFLICT (content): Merge conflict in statsmodels/iolib/summary.py
Auto-merging statsmodels/genmod/generalized_linear_model.py
CONFLICT (content): Merge conflict in statsmodels/genmod/generalized_linear_mode
l.py
Auto-merging statsmodels/discrete/tests/test_discrete.py
Auto-merging statsmodels/discrete/discrete_model.py
CONFLICT (content): Merge conflict in statsmodels/discrete/discrete_model.py
Auto-merging statsmodels/discrete/discrete_margins.py
Auto-merging examples/notebooks/example_gmle.ipynb
CONFLICT (content): Merge conflict in examples/notebooks/example_gmle.ipynb
Automatic merge failed; fix conflicts and then commit the result.
Member

vincentarelbundock commented Jun 25, 2013

Yes, this is expected. I plan to submit a new version that keeps all the old summaries in place and just provides an option for this. That way there won't be conflict and the transition (if it ever goes through) will be smoother.

Contributor

janschulz commented Jun 26, 2013

Ok, I just took the summary module, included the fmt* thingies and used that directly in my project to get a summary_col function. Works nicely :-)

The function should check if the model names include duplicates, as pandas doesn't like it when there are duplicate column names. I currently use something like this:

if len(set(_names) != len(_names):
    from collections import defaultdict
    name_counter = defaultdict(str)
    new_names = []
    for _name in _names:
        name_counter[_name] += "I"
        new_names.append(_name+" " +name_counter[_name])

For the ordering of regressors:

import pandas
d = pandas.DataFrame({"a":["A","b","C", "D"], "b":[1,2,3, 4], "C":["A","b","C", "D"]})
d = d.set_index("a")
_order = ["A","D"] # would be passed in and defaults to an empty list []
_new_index = []
_old_index = list(d.index)
for element in _order:
    _new_index.append(element)
    _old_index.remove(element)
_new_index.extend(_old_index)
d.reindex(_new_index)

Would you take such patches? And if yes: against which code branch?

Owner

josef-pkt commented Jun 26, 2013

If the old summary stays in place, then I can still merge this for 0.5.
I also would like to see the additional functionality like the multi-model tables get in and have it available for users.
It would be possible to switch one of the models to the new summary as default in master after 0.5 is out, to get some feedback.

Member

vincentarelbundock commented Jun 26, 2013

Cool.

@janschulz That looks good. Give me a day or two and I'll make a new cleaner pull request. I'll page you when that's done. Code should stay the same, so your patch should work as-is if it works now.

@josef-pkt Good idea. Which model should we convert? quantile?

Owner

josef-pkt commented Jun 26, 2013

@vincentarelbundock I thought you could add a summary2 (or some name like that) to all models, then we can just rename it when we want to change the default, and it gives users the option to use either one.
(That was my idea when I left summary_old in there when I wasn't sure yet I like my rewrite, summary_old needs to be deleted now, as you did in this PR.)

I would also add summary2 to the docs and to the release announcement, as a more flexible alternative to summary(), so it will get a share of users. (and we get feedback from actual usage.)

quantile sounds fine for the change in default, but maybe after 0.5 is out. I don't remember how the quantile regression summary() looks like.

Thanks

Member

vincentarelbundock commented Jun 26, 2013

@josef-pkt @janschulz Check this out: #915

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment