Recently I wanted to use Python to output regression results the way Stata's `esttab` command does. From the official examples I only knew the two methods `summary()` and `summary2()`, and neither can export results to external files such as Excel. So I read the source code of both methods hoping I could do something about it, and ultimately made some changes to summary2. The new summary2 can directly output the results of multiple models, with significance stars, through its `summary_col()` function, and because the tables are DataFrames we can use the pandas methods `to_excel()` or `to_csv()` to export the summary as an .xlsx or .csv file. Besides, my modifications also support panel regressions from the `linearmodels` package. Along the way I also found some bugs in summary2.

My main idea is to build every part of the summary table as a DataFrame and then append them to `self.tables`, so that the pandas output methods work. Detailed changes, descriptions, and possible bugs in summary2 are shown below, and some examples are given at the end. All changes are marked with a numbered comment like `# +1:`, `# +2:`, and unchanged parts are replaced by `......`
```python
class Summary(object):
    ......
    # +1:
    # I added output methods based on pd.DataFrame.to_excel()/to_csv().
    # The tables are concatenated before being written, mainly to keep the
    # output distinguishable and nicer when printed. There may be better ways.
    def to_excel(self, path=None):
        import os
        if not path:
            # default to the working directory (os.path.join is portable,
            # unlike a hard-coded '\\' separator)
            path = os.path.join(os.getcwd(), 'summary_results.xlsx')
        summ_df = pd.concat(self.tables, axis=0)
        return summ_df.to_excel(path)

    def to_csv(self, path=None):
        import os
        if not path:
            path = os.path.join(os.getcwd(), 'summary_results.csv')
        summ_df = pd.concat(self.tables, axis=0)
        return summ_df.to_csv(path)
```
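As a standalone sanity check of the export idea, the sketch below concatenates two made-up stand-ins for `self.tables` and writes them with pandas, mirroring the default-path logic of `to_csv()` (the table contents and the temp-file location are hypothetical, chosen just for this demo):

```python
import os
import tempfile

import pandas as pd

# Made-up stand-ins for the DataFrames collected in self.tables.
tables = [
    pd.DataFrame({'m1': ['0.50***', '(2.10)']}, index=['x1', '']),
    pd.DataFrame({'m1': ['0.42']}, index=['R-squared:']),
]

# Mirrors the default-path logic of to_csv(), but writes to the temp dir.
path = os.path.join(tempfile.gettempdir(), 'summary_results.csv')
pd.concat(tables, axis=0).to_csv(path)
print(os.path.exists(path))  # → True
```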
```python
def _measure_tables(tables, settings):
    '''Compare widths of ascii tables in a list and calculate padding values.
    We add space to each col_sep to get as close as possible to the width
    of the largest table. Then, we add a few spaces to the first column to
    pad the rest.
    '''
    # simple_tables = _simple_tables(tables, settings)
    # tab = [x.as_text() for x in simple_tables]
    # length = [len(x.splitlines()[0]) for x in tab]
    # len_max = max(length)
    # pad_sep = []
    # pad_index = []
    # for i in range(len(tab)):
    #     nsep = tables[i].shape[1] - 1
    #     pad = int((len_max - length[i]) / nsep)
    #     pad_sep.append(pad)
    #     len_new = length[i] + nsep * pad
    #     pad_index.append(len_max - len_new)
    # return pad_sep, pad_index, max(length)
    # +2:
    # The code above may have two bugs:
    # Bug 1: if tables or settings is an empty list, _simple_tables()
    #        returns [], so length is also empty and max() raises an error.
    # Bug 2: if tables[i] has just one column, nsep is 0 and '/ nsep'
    #        raises ZeroDivisionError.
    # So I added the exception handling below.
    simple_tables = _simple_tables(tables, settings)
    if simple_tables == []:
        len_max = 0
        pad_sep = None
        pad_index = None
    else:
        tab = [x.as_text() for x in simple_tables]
        length = [len(x.splitlines()[0]) for x in tab]
        len_max = max(length)
        pad_sep = []
        pad_index = []
        for i in range(len(tab)):
            nsep = tables[i].shape[1] - 1
            try:
                pad = int((len_max - length[i]) / nsep)
            except ZeroDivisionError:
                pad = int(len_max - length[i])
            pad_sep.append(pad)
            len_new = length[i] + nsep * pad
            pad_index.append(len_max - len_new)
    return pad_sep, pad_index, len_max
```
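The padding arithmetic and both guards can be exercised in isolation. This sketch, with hypothetical table widths, mirrors the try/except above (`measure_pad` is a helper written only for this demo, not part of summary2):

```python
def measure_pad(lengths, nseps):
    """Padding per table, as in the patched _measure_tables.

    lengths: first-line width of each rendered table
    nseps:   number of column separators per table (columns - 1)
    """
    if not lengths:  # guard for an empty table list
        return None, None, 0
    len_max = max(lengths)
    pad_sep, pad_index = [], []
    for length, nsep in zip(lengths, nseps):
        try:
            pad = (len_max - length) // nsep
        except ZeroDivisionError:  # single-column table, nsep == 0
            pad = len_max - length
        pad_sep.append(pad)
        pad_index.append(len_max - (length + nsep * pad))
    return pad_sep, pad_index, len_max

print(measure_pad([20, 14, 17], [3, 0, 2]))  # → ([0, 6, 1], [0, 6, 1], 20)
```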
......
```python
def summary_model(results):
    '''Create a dict with information about the model
    '''
    def time_now(*args, **kwds):
        now = datetime.datetime.now()
        return now.strftime('%Y-%m-%d %H:%M')

    info = OrderedDict()
    # +3:
    # I added some information for panel regressions from the package
    # linearmodels. Panel results use different attribute names for a few
    # statistics, so those lambdas fall back from the statsmodels name to
    # the linearmodels one. (Note that simply assigning the same key twice
    # would keep only the second lambda and break the statsmodels case.)
    info['Model:'] = lambda x: x.model.__class__.__name__
    info['Model Family:'] = lambda x: x.family.__class__.__name__
    info['Link Function:'] = lambda x: x.family.link.__class__.__name__
    # add1: statsmodels uses endog_names, linearmodels uses dependent.vars
    info['Dependent Variable:'] = lambda x: (x.model.endog_names
                                             if hasattr(x.model, 'endog_names')
                                             else x.model.dependent.vars[0])
    info['Date:'] = time_now
    info['No. Observations:'] = lambda x: "%#6d" % x.nobs
    info['Df Model:'] = lambda x: "%#6d" % x.df_model
    info['Df Residuals:'] = lambda x: "%#6d" % x.df_resid
    info['Converged:'] = lambda x: x.mle_retvals['converged']
    info['No. Iterations:'] = lambda x: x.mle_retvals['iterations']
    info['Method:'] = lambda x: x.method
    info['Norm:'] = lambda x: x.fit_options['norm']
    info['Scale Est.:'] = lambda x: x.fit_options['scale_est']
    info['Cov. Type:'] = lambda x: x.fit_options['cov']
    # add2: some models (e.g. OLS) have no fit_options attribute, so use
    # cov_type; linearmodels panel results store it as _cov_type.
    info['Covariance Type:'] = lambda x: (x.cov_type
                                          if hasattr(x, 'cov_type')
                                          else x._cov_type)
    info['R-squared:'] = lambda x: "%#8.3f" % x.rsquared
    info['Adj. R-squared:'] = lambda x: "%#8.3f" % x.rsquared_adj
    info['Pseudo R-squared:'] = lambda x: "%#8.3f" % x.prsquared
    info['AIC:'] = lambda x: "%8.4f" % x.aic
    info['BIC:'] = lambda x: "%8.4f" % x.bic
    # add3: linearmodels exposes loglike instead of llf
    info['Log-Likelihood:'] = lambda x: "%#8.5g" % (x.llf if hasattr(x, 'llf')
                                                    else x.loglike)
    info['LL-Null:'] = lambda x: "%#8.5g" % x.llnull
    info['LLR p-value:'] = lambda x: "%#8.5g" % x.llr_pvalue
    info['Deviance:'] = lambda x: "%#8.5g" % x.deviance
    info['Pearson chi2:'] = lambda x: "%#6.3g" % x.pearson_chi2
    # add4/add5: linearmodels wraps the F test in an f_statistic object
    info['F-statistic:'] = lambda x: "%#8.4g" % (x.fvalue
                                                 if hasattr(x, 'fvalue')
                                                 else x.f_statistic.stat)
    info['Prob (F-statistic):'] = lambda x: "%#6.3g" % (
        x.f_pvalue if hasattr(x, 'f_pvalue') else x.f_statistic.pval)
    info['Scale:'] = lambda x: "%#8.5g" % x.scale
    # add6: effects included in a panel regression
    info['Effects:'] = lambda x: ','.join(['%#8s' % i
                                           for i in x.included_effects])

    out = OrderedDict()
    for key, func in iteritems(info):
        try:
            out[key] = func(results)
        # NOTE: some models don't have e.g. llf defined (RLM), hence the
        # NotImplementedError in the tuple below
        except (AttributeError, KeyError, NotImplementedError):
            pass
    return out
```
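The try/except loop at the end is what makes the big dict of lambdas safe: any statistic a model does not define is silently skipped. A minimal standalone sketch of that pattern with a fake result class (`FakeOLSResult` is invented for the demo):

```python
from collections import OrderedDict

class FakeOLSResult:
    # Invented stand-in carrying only OLS-like attributes.
    rsquared = 0.42
    nobs = 86

info = OrderedDict()
info['R-squared:'] = lambda x: "%#8.3f" % x.rsquared
info['No. Observations:'] = lambda x: "%#6d" % x.nobs
info['Pseudo R-squared:'] = lambda x: "%#8.3f" % x.prsquared  # Logit only

out = OrderedDict()
for key, func in info.items():
    try:
        out[key] = func(FakeOLSResult())
    except (AttributeError, KeyError, NotImplementedError):
        pass  # skip statistics this model does not define

print(list(out))  # → ['R-squared:', 'No. Observations:']
```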
```python
def summary_params(results, yname=None, xname=None, alpha=.05, use_t=True,
                   skip_header=False, float_format="%.4f"):
    '''Create a summary table of parameters from a results instance
    Parameters
    ----------
    results : results instance
        some required information is directly taken from the result
        instance
    yname : string or None
        optional name for the endogenous variable, default is "y"
    xname : list of strings or None
        optional names for the exogenous variables, default is "var_xx"
    alpha : float
        significance level for the confidence intervals
    use_t : bool
        indicator whether the p-values are based on the Student-t
        distribution (if True) or on the normal distribution (if False)
    skip_header : bool
        If False (default), then the header row is added. If True, then no
        header row is added.
    float_format : string
        float formatting options (e.g. ".3g")
    Returns
    -------
    params_table : pd.DataFrame
    '''
    from linearmodels.panel.results import (PanelEffectsResults,
                                            PanelResults,
                                            RandomEffectsResults)
    res_tuple = (PanelEffectsResults, PanelResults, RandomEffectsResults)
    if isinstance(results, tuple):
        # kept from the original statsmodels code path; everything is
        # unpacked here, so nothing must be re-read from `results` below
        results, params, std_err, tvalues, pvalues, conf_int = results
        bse = std_err
    # +4:
    # I added panel results, some of whose attribute names differ, so I
    # modified the code as follows.
    elif isinstance(results, res_tuple):
        bse = results.std_errors
        tvalues = results.tstats
        conf_int = results.conf_int(1 - alpha)
        params = results.params
        pvalues = results.pvalues
    else:
        bse = results.bse
        tvalues = results.tvalues
        conf_int = results.conf_int(alpha)
        params = results.params
        pvalues = results.pvalues

    data = np.array([params, bse, tvalues, pvalues]).T
    data = np.hstack([data, conf_int])
    data = pd.DataFrame(data)

    if use_t:
        data.columns = ['Coef.', 'Std.Err.', 't', 'P>|t|',
                        '[' + str(alpha/2), str(1 - alpha/2) + ']']
    else:
        data.columns = ['Coef.', 'Std.Err.', 'z', 'P>|z|',
                        '[' + str(alpha/2), str(1 - alpha/2) + ']']

    if not xname:
        try:
            data.index = results.model.exog_names
        except AttributeError:
            data.index = results.model.exog.vars  # linearmodels
    else:
        data.index = xname
    return data
```
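The core of the table assembly is plain numpy stacking. With hypothetical estimates for two regressors it looks like this:

```python
import numpy as np
import pandas as pd

# Hypothetical estimates for two regressors.
params = np.array([1.50, -0.30])
bse = np.array([0.25, 0.10])
tvalues = params / bse
pvalues = np.array([0.0000, 0.0027])
conf_int = np.column_stack([params - 1.96 * bse, params + 1.96 * bse])

alpha = 0.05
data = pd.DataFrame(
    # four stat columns stacked as rows, transposed, then the CI appended
    np.hstack([np.array([params, bse, tvalues, pvalues]).T, conf_int]),
    index=['const', 'x1'],
    columns=['Coef.', 'Std.Err.', 't', 'P>|t|',
             '[' + str(alpha / 2), str(1 - alpha / 2) + ']'])
print(data.shape)  # → (2, 6)
```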
```python
# +5:
# The following function could only stack standard errors, but in practice
# we usually report t statistics, so I modified it to show one of the
# standard errors, t statistics or p-values via the parameter 'show'.
# +6:
# Bug: different models use different names for the intercept term; for
# example, a formula-based OLS model names it 'Intercept' while logit
# models use 'const'. So I also added a function that unifies the name to
# make the data merge work.

## Vertical summary instance for multiple models
# def _col_params(result, float_format='%.4f', stars=True):
#     '''Stack coefficients and standard errors in single column
#     '''
#     # Extract parameters
#     res = summary_params(result)
#     # Format float
#     for col in res.columns[:2]:
#         res[col] = res[col].apply(lambda x: float_format % x)
#     # Std.Errors in parentheses
#     res.ix[:, 1] = '(' + res.ix[:, 1] + ')'
#     # Significance stars
#     if stars:
#         idx = res.ix[:, 3] < .1
#         res.ix[idx, 0] = res.ix[idx, 0] + '*'
#         idx = res.ix[:, 3] < .05
#         res.ix[idx, 0] = res.ix[idx, 0] + '*'
#         idx = res.ix[:, 3] < .01
#         res.ix[idx, 0] = res.ix[idx, 0] + '*'
#     # Stack Coefs and Std.Errors
#     res = res.ix[:, :2]
#     res = res.stack()
#     res = pd.DataFrame(res)
#     res.columns = [str(result.model.endog_names)]
#     return res
```
```python
def _col_params(result, float_format='%.4f', stars=True, show='t'):
    '''Stack coefficients and one of std.err/t/p-values in a single column
    '''
    # I added the parameter 'show': 't' displays t values (the default),
    # 'p' p-values, and 'se' standard errors.
    # Extract parameters
    res = summary_params(result)
    # Format floats
    # Note that scientific numbers are turned into 'str' by '%.4f'
    for col in res.columns[:3]:
        res[col] = res[col].apply(lambda x: float_format % x)
    res.iloc[:, 3] = np.around(res.iloc[:, 3], 4)
    # Significance stars
    # (.ix is deprecated, so .iloc/.loc are used instead)
    if stars:
        idx = res.iloc[:, 3] < .1
        res.loc[res.index[idx], res.columns[0]] += '*'
        idx = res.iloc[:, 3] < .05
        res.loc[res.index[idx], res.columns[0]] += '*'
        idx = res.iloc[:, 3] < .01
        res.loc[res.index[idx], res.columns[0]] += '*'
    # Std.Errors, t values or p-values in parentheses
    res.iloc[:, 3] = res.iloc[:, 3].apply(lambda x: float_format % x)  # p-values to str
    res.iloc[:, 1] = '(' + res.iloc[:, 1] + ')'
    res.iloc[:, 2] = '(' + res.iloc[:, 2] + ')'
    res.iloc[:, 3] = '(' + res.iloc[:, 3] + ')'
    # Stack Coefs and Std.Errors or p-values
    # ('is' compares object identity, so '==' must be used for strings)
    if show == 't':
        res = res.iloc[:, [0, 2]]
    elif show == 'se':
        res = res.iloc[:, :2]
    elif show == 'p':
        res = res.iloc[:, [0, 3]]
    res = res.stack()
    res = pd.DataFrame(res)
    try:
        res.columns = [str(result.model.endog_names)]
    except AttributeError:
        res.columns = result.model.dependent.vars  # for PanelOLS

    # I added this index-name transformation function to deal with both
    # MultiIndex and single-level indexes.
    def _Intercept_2const(df):
        from pandas import MultiIndex
        if 'Intercept' in df.index:
            if isinstance(df.index, MultiIndex):
                new_index = []
                for i in df.index.values:
                    i = list(i)
                    if 'Intercept' in i:
                        i[i.index('Intercept')] = 'const'
                    new_index.append(i)
                multi_index = lzip(*new_index)
                df.index = MultiIndex.from_arrays(multi_index)
            else:
                index_list = df.index.tolist()
                idx = index_list.index('Intercept')
                index_list[idx] = 'const'
                df.index = index_list
        return df

    return _Intercept_2const(res)
```
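The cumulative star scheme can be checked on its own: a p-value below every threshold collects one star per threshold, so p < .01 ends up with three stars. A sketch with made-up values:

```python
import pandas as pd

# Made-up coefficients (already formatted to strings) and p-values.
res = pd.DataFrame({'Coef.': ['0.5000', '1.2000', '0.3000'],
                    'P>|t|': [0.003, 0.07, 0.40]})
# One '*' per threshold passed, so p < .01 ends up with '***'.
for cut in (.1, .05, .01):
    idx = res['P>|t|'] < cut
    res.loc[idx, 'Coef.'] += '*'
print(res['Coef.'].tolist())  # → ['0.5000***', '1.2000*', '0.3000']
```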
```python
# def _col_info(result, info_dict=None):
#     '''Stack model info in a column
#     '''
#     if info_dict is None:
#         info_dict = {}
#     out = []
#     index = []
#     for i in info_dict:
#         if isinstance(info_dict[i], dict):
#             # this is a specific model info_dict, but not for this result...
#             continue
#         try:
#             out.append(info_dict[i](result))
#         except:
#             out.append('')
#         index.append(i)
#     out = pd.DataFrame({str(result.model.endog_names): out}, index=index)
#     return out
```
```python
# +7:
# I modified the function above. The main change is that the parameter
# 'info_dict' is renamed to 'more_info', which is a list rather than a
# dict. Besides, I built a default dict holding some model information from
# summary_model(); it is printed by default, and users can append other
# statistics through the 'more_info' parameter.
def _col_info(result, more_info=None):
    '''Stack model info in a column
    '''
    model_info = summary_model(result)
    default_info_ = OrderedDict()
    default_info_['Model:'] = lambda x: x.get('Model:')
    default_info_['No. Observations:'] = lambda x: x.get('No. Observations:')
    default_info_['R-squared:'] = lambda x: x.get('R-squared:')
    default_info_['Adj. R-squared:'] = lambda x: x.get('Adj. R-squared:')
    default_info_['Pseudo R-squared:'] = lambda x: x.get('Pseudo R-squared:')
    default_info_['F-statistic:'] = lambda x: x.get('F-statistic:')
    default_info_['Covariance Type:'] = lambda x: x.get('Covariance Type:')
    default_info_['Effects:'] = lambda x: x.get('Effects:')

    default_info = default_info_.copy()
    for k, v in default_info_.items():
        if v(model_info):
            default_info[k] = v(model_info)
        else:
            default_info.pop(k)  # pop the items whose value is None

    if more_info is None:
        more_info = default_info
    else:
        if not isinstance(more_info, list):
            more_info = [more_info]
        for i in more_info:
            try:
                default_info[i] = getattr(result, i)
            except (AttributeError, KeyError, NotImplementedError) as e:
                raise e
        more_info = default_info

    try:
        out = pd.DataFrame(more_info, index=[result.model.endog_names]).T
    except AttributeError:
        out = pd.DataFrame(more_info, index=result.model.dependent.vars).T
    return out
```
```python
# def _make_unique(list_of_names):
#     if len(set(list_of_names)) == len(list_of_names):
#         return list_of_names
#     # pandas does not like it if multiple columns have the same names
#     from collections import defaultdict
#     name_counter = defaultdict(str)
#     header = []
#     for _name in list_of_names:
#         name_counter[_name] += "I"
#         header.append(_name + " " + name_counter[_name])
#     return header
# +8:
# The function above has a flaw: even non-duplicated names get a suffix
# appended. And when endog_names are duplicated four or more times, the
# names look like 'y IIII' or 'y IIIII...'. So I used Arabic numerals
# instead.
def _make_unique(list_of_names):
    if len(set(list_of_names)) == len(list_of_names):
        return list_of_names
    # pandas does not like it if multiple columns have the same names
    from collections import defaultdict
    dic_of_names = defaultdict(list)
    for i, v in enumerate(list_of_names):
        dic_of_names[v].append(i)
    for v in dic_of_names.values():
        if len(v) > 1:
            c = 0
            for i in v:
                c += 1
                list_of_names[i] += '_%i' % c
    return list_of_names
```
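A quick standalone check of the numbering scheme (duplicated names get `_1`, `_2`, ..., and an already-unique list is returned untouched); `make_unique` below is a copy of the function for demo purposes:

```python
from collections import defaultdict

def make_unique(list_of_names):
    """Append numeric suffixes to duplicated names, in place."""
    if len(set(list_of_names)) == len(list_of_names):
        return list_of_names
    dic_of_names = defaultdict(list)
    for i, v in enumerate(list_of_names):
        dic_of_names[v].append(i)
    for positions in dic_of_names.values():
        if len(positions) > 1:
            for c, i in enumerate(positions, start=1):
                list_of_names[i] += '_%i' % c
    return list_of_names

print(make_unique(['y', 'y', 'y', 'x']))  # → ['y_1', 'y_2', 'y_3', 'x']
```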
```python
# +9:
# The following function is the most critical one. I added the parameters
# 'show' and 'title', changed the default value of 'stars' to True, and
# turned the dict parameter 'info_dict' into a list named 'more_info'.
# Finally, 'const' is put in the first position of regressor_order by
# default.
# +10:
# Bug: np.unique() disrupts the original order of a list, which can result
# in index confusion.
# def summary_col(results, float_format='%.4f', model_names=[], stars=False,
#                 info_dict=None, regressor_order=[]):
#     """
#     Summarize multiple results instances side-by-side (coefs and SEs)
#     Parameters
#     ----------
#     results : statsmodels results instance or list of result instances
#     float_format : string
#         float format for coefficients and standard errors
#         Default : '%.4f'
#     model_names : list of strings of length len(results); if the names
#         are not unique, a roman number will be appended to all model names
#     stars : bool
#         print significance stars
#     info_dict : dict
#         dict of lambda functions to be applied to results instances to
#         retrieve model info. To use specific information for different
#         models, add a (nested) info_dict with model name as the key.
#         Example: `info_dict = {"N":..., "R2": ..., "OLS":{"R2":...}}` would
#         only show `R2` for OLS regression models, but additionally `N` for
#         all other results.
#         Default : None (use the info_dict specified in
#         result.default_model_infos, if this property exists)
#     regressor_order : list of strings
#         list of names of the regressors in the desired order. All
#         regressors not specified will be appended to the end of the list.
#     """
```
```python
def summary_col(results, float_format='%.4f', model_names=[], stars=True,
                more_info=None, regressor_order=[], show='t', title=None):
    if not isinstance(results, list):
        results = [results]

    cols = [_col_params(x, stars=stars, float_format=float_format, show=show)
            for x in results]

    # Unique column names (pandas has problems merging otherwise)
    if model_names:
        colnames = _make_unique(model_names)
    else:
        colnames = _make_unique([x.columns[0] for x in cols])
    for i in range(len(cols)):
        cols[i].columns = [colnames[i]]

    merg = lambda x, y: x.merge(y, how='outer', right_index=True,
                                left_index=True)
    summ = reduce(merg, cols)

    # 'const' goes first by default
    if not regressor_order:
        regressor_order = ['const']
    varnames = summ.index.get_level_values(0).tolist()
    ordered = [x for x in regressor_order if x in varnames]
    unordered = [x for x in varnames if x not in regressor_order + ['']]
    # Note: np.unique() can disrupt the original order of 'unordered';
    # pd.Series.unique() preserves it.
    # order = ordered + list(np.unique(unordered))
    order = ordered + list(pd.Series(unordered).unique())
    f = lambda idx: sum([[x + 'coef', x + 'stde'] for x in idx], [])
    # summ.index = f(np.unique(varnames))
    summ.index = f(pd.Series(varnames).unique())
    summ = summ.reindex(f(order))
    summ.index = [x[:-4] for x in summ.index]
    idx = pd.Series(lrange(summ.shape[0])) % 2 == 1
    summ.index = np.where(idx, '', summ.index.get_level_values(0))
    summ = summ.fillna('')

    # Add info about the models.
    # (The old code looked up per-model entries in info_dict; now the
    # default info from _col_info() is used, extended by 'more_info'.)
    cols = [_col_info(x, more_info=more_info) for x in results]
    # Use unique column names, otherwise the merge will not succeed
    for df, name in zip(cols, _make_unique([df.columns[0] for df in cols])):
        df.columns = [name]
    info = reduce(merg, cols)
    info.columns = summ.columns
    info = info.fillna('')

    # Notes below the table ('is' compares identity, so use '==' for strings)
    if show == 't':
        note = ['\t t statistics in parentheses.']
    elif show == 'se':
        note = ['\t Std. error in parentheses.']
    else:
        note = ['\t p-values in parentheses.']
    if stars:
        note += ['\t * p<.1, ** p<.05, *** p<.01']
    # I tried two ways to place the extra text, in the index or in the
    # columns, and found that the index prints better.
    note_df = pd.DataFrame([], index=['note:'] + note,
                           columns=summ.columns).fillna('')

    if title is not None:
        title = str(title)
    else:
        title = '\t Results Summary'
    # I also tried to build a title DataFrame that centers the title over
    # the columns, but the printed result was not good, so the title simply
    # goes into the index.
    title_df = pd.DataFrame([], index=[title], columns=summ.columns).fillna('')

    smry = Summary()
    smry.add_df(title_df, header=False, align='l')  # title DF
    smry.add_df(summ, header=True, align='l')       # params DF
    smry.add_df(info, header=False, align='l')      # model information DF
    smry.add_df(note_df, header=False, align='l')   # extra text DF
    return smry
```
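The heart of `summary_col()` is the outer merge that aligns models with different regressors; cells missing from a model become empty strings. A sketch with two hypothetical one-column frames standing in for `_col_params()` output (the real output has a (variable, stat) MultiIndex; a single-level index is used here for brevity):

```python
from functools import reduce

import pandas as pd

# Hypothetical per-model columns, as _col_params() would produce them.
m1 = pd.DataFrame({'y_1': ['0.50***', '-0.21']}, index=['const', 'x1'])
m2 = pd.DataFrame({'y_2': ['0.70**', '0.33*']}, index=['const', 'x2'])

# Outer merge on the index keeps every regressor from every model.
merg = lambda x, y: x.merge(y, how='outer', right_index=True, left_index=True)
summ = reduce(merg, [m1, m2]).fillna('')
print(summ)
```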
......
Next, some examples are shown, including OLS, GLM, GEE, Logit and panel regression results. Other models have not been tested yet, but what is certain is that multi-equation models like VAR do not work here.
```python
# Load the data and fit
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# ols
dat = sm.datasets.get_rdataset("Guerry", "HistData").data
res_ols = smf.ols('Lottery ~ Literacy + np.log(Pop1831)', data=dat).fit()

# glm
data = sm.datasets.scotland.load()
data.exog = sm.add_constant(data.exog)
gamma_model = sm.GLM(data.endog, data.exog, family=sm.families.Gamma())
res_glm = gamma_model.fit()

# gee
data = sm.datasets.get_rdataset('epil', package='MASS').data
fam = sm.families.Poisson()
ind = sm.cov_struct.Exchangeable()
mod = smf.gee("y ~ age + trt + base", "subject", data,
              cov_struct=ind, family=fam)
res_gee = mod.fit()

# logit
spector_data = sm.datasets.spector.load()
spector_data.exog = sm.add_constant(spector_data.exog)
logit_mod = sm.Logit(spector_data.endog, spector_data.exog)
res_logit = logit_mod.fit()

# load panel data and fit the models
from linearmodels.datasets import wage_panel
data = wage_panel.load()
year = pd.Categorical(data.year)
data = data.set_index(['nr', 'year'])
data['year'] = year

from linearmodels.panel import PooledOLS
exog_vars = ['black', 'hisp', 'exper', 'expersq', 'married', 'educ',
             'union', 'year']
exog = sm.add_constant(data[exog_vars])
mod = PooledOLS(data.lwage, exog)
res_pooled = mod.fit()

from linearmodels.panel import PanelOLS
exog_vars = ['expersq', 'union', 'married']
exog = sm.add_constant(data[exog_vars])
mod = PanelOLS(data.lwage, exog, entity_effects=True, time_effects=True)
res_fe_re = mod.fit()

from linearmodels.panel import FirstDifferenceOLS
exog_vars = ['exper', 'expersq', 'union', 'married']
exog = data[exog_vars]
mod = FirstDifferenceOLS(data.lwage, exog)
res_fd = mod.fit()

exog_vars = ['black', 'hisp', 'exper', 'expersq', 'married', 'educ', 'union']
exog = sm.add_constant(data[exog_vars])
mod = PooledOLS(data.lwage, exog)
res_robust = mod.fit(cov_type='robust')
res_clust_entity = mod.fit(cov_type='clustered', cluster_entity=True)
res_clust_entity_time = mod.fit(cov_type='clustered', cluster_entity=True,
                                cluster_time=True)
```
Then we import the function `summary_col()` from the modified summary2, which I saved as a module named summary3. This lets us directly output the concatenated results with stars and some default model information.

```python
from summary3 import summary_col
```
For a single regression result we can pass the result object directly, though wrapping it in a list is better:

```python
# summary_col(res_ols)
summary_col([res_ols])
```
This returns a `Summary` instance; in a Notebook the output looks like this:

We can also use the `print` function to output it as text. The parameter `more_info` adds extra model information to the printout. For example,

The (incomplete) output is:

We can also use `regressor_order` to set the order of the variables, `show` to display whichever of the p-values, t-values or standard errors you want, and `title` to give the table a custom title.

The output is:
Finally, if you want to export the summary results to an external file, you can do it like this:

```python
summary_col([res_glm, res_logit]).to_excel()
```

This creates an Excel file named 'summary_results.xlsx' in your working directory. Of course, you can choose the filename and path just as you would with pandas (which is in fact what is used underneath).
Thanks for this idea and the code! I am not particularly familiar with the details of the summary functions, so I can't speak to the code itself right now.
But one thing that might be useful (especially to someone like me who is not particularly familiar) is to make a pull request with the changes included. For example, I can't tell if this is all new code, or if this is replacing / improving some existing code.