WIP: Summary2 #582

Closed
wants to merge 20 commits into from

4 participants

@vincentarelbundock
Collaborator

This is a cleaned-up version of what I posted and then deleted a few days ago. It now works as advertised, but some of it could certainly benefit from a more experienced Python hand.

Let me know if any of this is usable.

Vincent

statsmodels/tsa/arima_model.py
((21 lines not shown))
+ # Model info
+ model_info = summary_model(self)
+ model_info['Method:'] = self.model.method
+ model_info['Sample:'] = sample[0]
+ model_info['S.D. of innovations:'] = "%#5.3f" % self.sigma2**.5
+ model_info['HQIC:'] = "%#5.3f" % self.hqic
+ model_info['No. Observations:'] = str(len(self.model.endog))
+
@josef-pkt Owner

What determines the order in which these items show up in the table, and whether each item ends up on the left or the right?

Is it still possible to show 'Sample' on two lines?

@vincentarelbundock Collaborator

The order in which elements were entered in the OrderedDict determines where they fall in the table. We split left/right automatically in the middle of the dict (with white space on the right side if the number of elements is odd). You can tweak the order by rearranging the summary_model() function, or by sorting the resulting OrderedDict. In the current setup, ARIMA-specific fields are just appended at the end of the basic information fields shared by other models.

Alternatively, you could sacrifice convenience for control by creating a NumPy array or DataFrame with 4 columns (labels in columns 1 and 3) and then passing it to Summary().add_df or add_array.
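
A minimal sketch of the split described above, using only the standard library. The helper name `split_two_columns` and the example values are hypothetical and are not taken from the PR; the sketch only illustrates how insertion order could map to a 4-column layout with labels in columns 1 and 3.

```python
from collections import OrderedDict


def split_two_columns(info):
    """Split an ordered mapping into rows of (label, value, label, value)."""
    items = list(info.items())
    half = (len(items) + 1) // 2  # the left side gets the extra item
    left, right = items[:half], items[half:]
    # Pad the right side with empty cells when the item count is odd.
    right += [("", "")] * (len(left) - len(right))
    return [l + r for l, r in zip(left, right)]


# Illustrative values only (assumed, not from the PR).
model_info = OrderedDict()
model_info["Model:"] = "ARMA"
model_info["Method:"] = "css-mle"
model_info["No. Observations:"] = "100"

rows = split_two_columns(model_info)
for row in rows:
    print("%-20s %-10s %-20s %-10s" % row)
```

Reordering the entries of the dict before the split changes where each item lands, which is the "tweak the order" option mentioned above.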

@vincentarelbundock
Collaborator

Updated the example notebook: http://nbviewer.ipython.org/4124662/

@JanSchulz

It would be nice if summary_col could produce nice-looking HTML tables in the IPython notebook, like pandas DataFrames do (see pydata/pandas#772). Also, calling just model.summary() in a notebook now renders like print(...) output in a <pre>...</pre> block, whereas the old way gave a nice HTML table.

@JanSchulz

The summary_col(output) method fails for me:

---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
<ipython-input-48-249bb855cece> in <module>()
----> 1 print summary_col(output)

C:\portabel\Python27\lib\site-packages\statsmodels\iolib\summary.py in summary_col(results, float_format, model_names, stars, info_dict)
    460     cols = [_col_info(x, info_dict) for x in results]
    461     merg = lambda x,y: x.merge(y, how='outer', right_index=True, left_index=True)
--> 462     info = reduce(merg, cols)
    463     # Summary
    464     smry = Summary()

C:\portabel\Python27\lib\site-packages\statsmodels\iolib\summary.py in <lambda>(x, y)
    459     # Info as dataframe columns
    460     cols = [_col_info(x, info_dict) for x in results]
--> 461     merg = lambda x,y: x.merge(y, how='outer', right_index=True, left_index=True)
    462     info = reduce(merg, cols)
    463     # Summary

C:\portabel\Python27\lib\site-packages\pandas\core\frame.pyc in merge(self, right, how, on, left_on, right_on, left_index, right_index, sort, suffixes, copy)
   4362                      left_on=left_on, right_on=right_on,
   4363                      left_index=left_index, right_index=right_index, sort=sort,
-> 4364                      suffixes=suffixes, copy=copy)
   4365 
   4366     #----------------------------------------------------------------------

C:\portabel\Python27\lib\site-packages\pandas\tools\merge.pyc in merge(left, right, how, on, left_on, right_on, left_index, right_index, sort, suffixes, copy)
     34                          right_index=right_index, sort=sort, suffixes=suffixes,
     35                          copy=copy)
---> 36     return op.get_result()
     37 if __debug__: merge.__doc__ = _merge_doc % '\nleft : DataFrame'
     38 

C:\portabel\Python27\lib\site-packages\pandas\tools\merge.pyc in get_result(self)
    185 
    186         # this is a bit kludgy
--> 187         ldata, rdata = self._get_merge_data()
    188 
    189         # TODO: more efficiently handle group keys to avoid extra

C:\portabel\Python27\lib\site-packages\pandas\tools\merge.pyc in _get_merge_data(self)
    276         lsuf, rsuf = self.suffixes
    277         ldata, rdata = ldata._maybe_rename_join(rdata, lsuf, rsuf,
--> 278                                                 copydata=False)
    279         return ldata, rdata
    280 

C:\portabel\Python27\lib\site-packages\pandas\core\internals.pyc in _maybe_rename_join(self, other, lsuffix, rsuffix, copydata)
   1174 
   1175     def _maybe_rename_join(self, other, lsuffix, rsuffix, copydata=True):
-> 1176         to_rename = self.items.intersection(other.items)
   1177         if len(to_rename) > 0:
   1178             if not lsuffix and not rsuffix:

C:\portabel\Python27\lib\site-packages\pandas\core\index.pyc in intersection(self, other)
    653             this = self.astype('O')
    654             other = other.astype('O')
--> 655             return this.intersection(other)
    656 
    657         if self.is_monotonic and other.is_monotonic:

C:\portabel\Python27\lib\site-packages\pandas\core\index.pyc in intersection(self, other)
    662                 pass
    663 
--> 664         indexer = self.get_indexer(other.values)
    665         indexer = indexer.take((indexer != -1).nonzero()[0])
    666         return self.take(indexer)

C:\portabel\Python27\lib\site-packages\pandas\core\index.pyc in get_indexer(self, target, method, limit)
    789             this = self.astype(object)
    790             target = target.astype(object)
--> 791             return this.get_indexer(target, method=method, limit=limit)
    792 
    793         if not self.is_unique:

C:\portabel\Python27\lib\site-packages\pandas\core\index.pyc in get_indexer(self, target, method, limit)
    792 
    793         if not self.is_unique:
--> 794             raise Exception('Reindexing only valid with uniquely valued Index '
    795                             'objects')
    796 

Exception: Reindexing only valid with uniquely valued Index objects

It seems to come from the missing column name: adding res.columns = [str(result.model.endog_names)] before return in _col_params(result, float_format='%.4f', stars=True) and out.columns = [str(result.model.endog_names)] in _col_info(result, info_dict=None) made the error go away.
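
The workaround above labels each column with the model's dependent variable. If two models in the same table shared a dependent variable, those labels would presumably repeat, so a sketch of one way to force unique labels may be useful. The helper `unique_labels` is hypothetical and not part of statsmodels or pandas.

```python
from collections import Counter


def unique_labels(names):
    """Append a running suffix to any label that occurs more than once."""
    totals = Counter(names)
    seen = Counter()
    out = []
    for name in names:
        if totals[name] > 1:
            seen[name] += 1
            out.append("%s (%d)" % (name, seen[name]))
        else:
            out.append(name)
    return out


labels = unique_labels(
    ["np.log(avg_blub + 1)", "np.log(blublu + 1)", "np.log(avg_blub + 1)"]
)
print(labels)
```

The resulting list could then be assigned to the columns before the merge; whether that is the right place for the fix in summary_col is left open here.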

@JanSchulz

Long "model names" (see comment above) leave the "R2/AIC/N" part of the table misaligned with the rest of the values:

========================================================
              np.log(avg_blub + 1)  np.log(blublu + 1)  
--------------------------------------------------------
Intercept     2.7458***              1.3378***          
              (0.0165)               (0.0071)           
BLUBBEREEE    0.1779***              0.1763***          
              (0.0130)               (0.0056)           
--------------------------------------------------------
R2                    0.053                   0.229     
AIC                   116081.787              55120.079 
N                     36308                   36308     
========================================================
Standard errors in parentheses.
* p<.1, ** p<.05, ***p<.01

[shortened: I actually had more models (and variables) included]

BUT it works! Thanks a lot!
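
A sketch of one way to keep both parts of the table aligned: pad every cell to the width of the widest cell in its column, header included. The function `align_columns` is hypothetical and is not how summary_col is implemented; the cell values are copied from the table above.

```python
def align_columns(rows):
    """Left-justify every cell to the widest cell in its column."""
    widths = [max(len(cell) for cell in col) for col in zip(*rows)]
    return [
        "  ".join(cell.ljust(w) for cell, w in zip(row, widths)).rstrip()
        for row in rows
    ]


table = [
    ["", "np.log(avg_blub + 1)", "np.log(blublu + 1)"],
    ["Intercept", "2.7458***", "1.3378***"],
    ["R2", "0.053", "0.229"],
    ["N", "36308", "36308"],
]
lines = align_columns(table)
print("\n".join(lines))
```

Because the widths are computed over the parameter rows and the info rows together, a long model name widens the whole column rather than only the upper block.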

@josef-pkt
Owner

possible enhancement for comparing models table: add results from a t-test

What looks nice in the econometrics textbook by Stock and Watson is that variables with interaction effects or polynomial terms (linear and squared terms, for example) get a separate row in the table that shows the total effect of the variable and whether it is significantly different from zero.

It should be possible to get everything from the ttest method.
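
As a sketch of the arithmetic such a row would need, under one possible reading of "total effect": for a model with a linear and a squared term, the effect of x at a chosen point x0 is b1 + 2*b2*x0, which is a linear combination r'b of the coefficients with standard error sqrt(r'Vr). The function name, coefficients, and covariance matrix below are made up for illustration and do not come from any fitted model.

```python
import math


def linear_combination_tstat(r, beta, cov):
    """Return (estimate, std_err, t) for the linear combination r'beta."""
    n = len(r)
    estimate = sum(ri * bi for ri, bi in zip(r, beta))
    variance = sum(r[i] * cov[i][j] * r[j] for i in range(n) for j in range(n))
    std_err = math.sqrt(variance)
    return estimate, std_err, estimate / std_err


# Made-up numbers: y = b0 + b1*x + b2*x**2
beta = [1.0, 0.5, -0.1]
cov = [
    [0.04, 0.0, 0.0],
    [0.0, 0.01, 0.0],
    [0.0, 0.0, 0.0025],
]
x0 = 2.0
r = [0.0, 1.0, 2 * x0]  # derivative of the fitted curve at x0

estimate, std_err, t_value = linear_combination_tstat(r, beta, cov)
print(estimate, std_err, t_value)
```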

@vincentarelbundock
Collaborator

@josef-pkt I don't have the book with me. Could this be done simply by adding new rows like the N, R2? If so, then we can just write a simple helper function to create a dict with the required information and then feed that to the info_dict argument of summary_col(). I think I'd rather nail down the basic functionality before moving ahead with this though.
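
A rough sketch of what such a helper could look like, assuming info_dict maps a row label to a function that takes a results object and returns a formatted string. That shape is an assumption; the thread does not spell out the signature. `make_info_dict` and `fake_result` are hypothetical, and the numbers are copied from the table in the comment above.

```python
from collections import OrderedDict
from types import SimpleNamespace


def make_info_dict():
    """Map row labels to functions that format one value from a results object."""
    info = OrderedDict()
    info["R2"] = lambda res: "%.3f" % res.rsquared
    info["AIC"] = lambda res: "%.3f" % res.aic
    info["N"] = lambda res: "%d" % res.nobs
    return info


# Stand-in for a fitted results object (not a real statsmodels result).
fake_result = SimpleNamespace(rsquared=0.053, aic=116081.787, nobs=36308)

extra_rows = [(label, fn(fake_result)) for label, fn in make_info_dict().items()]
print(extra_rows)
```

A row for a t-test result would then be one more entry in the dict, with a function that runs the test and formats the statistic.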

@josef-pkt
Owner

Yes, either adding new rows like the N, R2, or adding to the parameter table. (I haven't looked at the details yet.)

This can also wait, just something from the wishlist to keep in mind.

@vincentarelbundock vincentarelbundock referenced this pull request
Closed

Summary re-write #636

2 of 6 tasks complete
@vincentarelbundock
Collaborator

Work moved to #636

@coveralls

Coverage Status

Changes Unknown when pulling 2f9000d on vincentarelbundock:summary2 into statsmodels:master.

Commits on Jan 4, 2013
  1. @vincentarelbundock

    ENH: summary2

    vincentarelbundock authored
  2. @vincentarelbundock
  3. @vincentarelbundock
  4. @vincentarelbundock
  5. @vincentarelbundock

    add_array cleanup

    vincentarelbundock authored
  6. @vincentarelbundock
  7. @vincentarelbundock

    code reuse

    vincentarelbundock authored
  8. @vincentarelbundock

    simpler add_dict

    vincentarelbundock authored
Commits on Jan 5, 2013
  1. @vincentarelbundock
Commits on Jan 6, 2013
  1. @vincentarelbundock
  2. @vincentarelbundock
  3. @vincentarelbundock

    misc bugs

    vincentarelbundock authored
Commits on Jan 9, 2013
  1. @vincentarelbundock
Commits on Jan 13, 2013
  1. @vincentarelbundock

    TST: fix tests

    vincentarelbundock authored
  2. @vincentarelbundock
Commits on Jan 28, 2013
  1. @vincentarelbundock
  2. @vincentarelbundock
  3. @vincentarelbundock
Commits on Jan 29, 2013
  1. @vincentarelbundock

    summary refactor

    vincentarelbundock authored
Commits on Jan 30, 2013
  1. @vincentarelbundock

    before backport

    vincentarelbundock authored