Some perhaps significant changes and advices for summary2 #4792

Open
young2j opened this issue Jul 9, 2018 · 2 comments


young2j commented Jul 9, 2018

Recently I wanted to use Python to output statistical results the way Stata's esttab command does, but the only methods I knew from the official examples were summary() and summary2(). In addition, results could not be exported to external files such as Excel. So I read the source code of those two methods, hoping I could do something about it, and ultimately made some changes to summary2. The new summary2 can directly output the results of multiple models, with significance stars, through its summary_col() function, and the pandas methods to_excel() and to_csv() can then export the summary as an .xlsx or .csv file. My modifications also support panel regressions from the linearmodels package. Along the way I also found some bugs in summary2.
My main idea is to build every part of the summary table as a DataFrame and append it to self.tables, so that the pandas output methods work. Detailed changes, descriptions, and possible bugs in summary2 are shown below, with some examples at the end. Each change is marked with #+ plus an Arabic numeral, like #+1: or #+2:, and unchanged parts are replaced by ......
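As a sketch of that idea, keeping every part of the table as a DataFrame makes export a one-liner. The MiniSummary class below is a hypothetical stand-in for the real Summary class, included only to illustrate the mechanism:

```python
import pandas as pd

# Hypothetical minimal container illustrating the idea: every part of the
# summary (params, model info, notes) is kept as a DataFrame in .tables,
# so a single pd.concat plus one pandas call exports the whole table.
class MiniSummary:
    def __init__(self):
        self.tables = []

    def add_df(self, df):
        self.tables.append(df)

    def as_frame(self):
        # All parts share the same columns, so stacking them vertically
        # reproduces the printed layout.
        return pd.concat(self.tables, axis=0)

    def to_csv(self, path):
        return self.as_frame().to_csv(path)

s = MiniSummary()
s.add_df(pd.DataFrame({'m1': ['0.5020***', '(0.1134)']}, index=['x1', '']))
s.add_df(pd.DataFrame({'m1': ['86']}, index=['No. Observations:']))
```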

class Summary(object):
     
          ......
     
    # +1:
    # I added output methods based on pd.DataFrame.to_excel()/to_csv().
    # The results are merged on output, mainly to make the printed table
    # distinguishable and better looking. There may be better ways.

    def to_excel(self, path=None):
        import os
        if path is None:
            # default to the working directory, joined portably instead
            # of hard-coding the Windows '\\' separator
            path = os.path.join(os.getcwd(), 'summary_results.xlsx')
        summ_df = pd.concat(self.tables, axis=0)
        return summ_df.to_excel(path)

    def to_csv(self, path=None):
        import os
        if path is None:
            path = os.path.join(os.getcwd(), 'summary_results.csv')
        summ_df = pd.concat(self.tables, axis=0)
        return summ_df.to_csv(path)
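The default-path logic can be exercised on its own. The resolve_path helper below is hypothetical, not part of summary2; it just isolates the portable fallback behavior:

```python
import os

def resolve_path(path=None, default_name='summary_results.xlsx'):
    # Fall back to the current working directory, joined with
    # os.path.join so the separator is correct on every platform.
    return path if path else os.path.join(os.getcwd(), default_name)

explicit = resolve_path('results.csv')   # an explicit path wins
implicit = resolve_path()                # otherwise cwd/default_name
```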

def _measure_tables(tables, settings):
    '''Compare width of ascii tables in a list and calculate padding values.
    We add space to each col_sep to get us as close as possible to the
    width of the largest table. Then, we add a few spaces to the first
    column to pad the rest.
    '''
    # simple_tables = _simple_tables(tables, settings)
    # tab = [x.as_text() for x in simple_tables]

    # length = [len(x.splitlines()[0]) for x in tab]
    # len_max = max(length)
    # pad_sep = []
    # pad_index = []

    # for i in range(len(tab)):
    #     nsep = tables[i].shape[1] - 1
    #     pad = int((len_max - length[i]) / nsep)
    #     pad_sep.append(pad)
    #     len_new = length[i] + nsep * pad
    #     pad_index.append(len_max - len_new)

    # return pad_sep, pad_index, max(length)

    #+2:
    # The code above may have two bugs:
    # Bug 1: if `tables` or `settings` is an empty list, _simple_tables()
    #        returns [], so `length` is also empty and max() raises
    #        ValueError.
    # Bug 2: if tables[i] has just one column, nsep is 0 and the division
    #        by nsep raises ZeroDivisionError.
    # So I added exception handling as follows.

    simple_tables = _simple_tables(tables, settings)
   
    if simple_tables == []:
        len_max = 0
        pad_sep = None
        pad_index = None
    else:
        tab = [x.as_text() for x in simple_tables]
        length = [len(x.splitlines()[0]) for x in tab]
        len_max = max(length)
        pad_sep = []
        pad_index = []
        for i in range(len(tab)):
            nsep = tables[i].shape[1] - 1
            try:
                pad = int((len_max - length[i]) / nsep)
            except ZeroDivisionError:
                pad = int(len_max - length[i])
            pad_sep.append(pad)
            len_new = length[i] + nsep * pad
            pad_index.append(len_max - len_new)

    return pad_sep, pad_index, len_max
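Both failure modes are easy to reproduce in isolation with plain Python, no tables needed:

```python
# Bug 1: max() over an empty sequence raises ValueError, which is what
# happens when _simple_tables() returns [] for empty input.
try:
    max([])
    empty_fails = False
except ValueError:
    empty_fails = True

# Bug 2: a one-column table has nsep = ncols - 1 = 0 separators, so the
# padding division blows up; the patch falls back to the raw difference.
len_max, length_i, nsep = 10, 8, 1 - 1
try:
    pad = int((len_max - length_i) / nsep)
except ZeroDivisionError:
    pad = int(len_max - length_i)
```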

      ......

def summary_model(results):
    '''Create a dict with information about the model
    '''
    def time_now(*args, **kwds):
        now = datetime.datetime.now()
        return now.strftime('%Y-%m-%d %H:%M')
    info = OrderedDict()

    #+3:
    # I added some information for panel regressions from the linearmodels
    # package. Panel results use some different attribute names, but that
    # doesn't matter here.

    info['Model:'] = lambda x: x.model.__class__.__name__
    info['Model Family:'] = lambda x: x.family.__class__.__name__
    info['Link Function:'] = lambda x: x.family.link.__class__.__name__
    info['Dependent Variable:'] = lambda x: x.model.endog_names
    # add1  
    info['Dependent Variable:'] = lambda x: x.model.dependent.vars[0]
    
    info['Date:'] = time_now
    info['No. Observations:'] = lambda x: "%#6d" % x.nobs
    info['Df Model:'] = lambda x: "%#6d" % x.df_model
    info['Df Residuals:'] = lambda x: "%#6d" % x.df_resid
    info['Converged:'] = lambda x: x.mle_retvals['converged']
    info['No. Iterations:'] = lambda x: x.mle_retvals['iterations']
    info['Method:'] = lambda x: x.method
    info['Norm:'] = lambda x: x.fit_options['norm']
    info['Scale Est.:'] = lambda x: x.fit_options['scale_est']
    info['Cov. Type:'] = lambda x: x.fit_options['cov']
    # add2
    # I added the x.cov_type item because some models,
    # like OLS, have no fit_options attribute.

    info['Covariance Type:'] = lambda x: x.cov_type
    info['Covariance Type:'] = lambda x: x._cov_type # Panel

    info['R-squared:'] = lambda x: "%#8.3f" % x.rsquared
    info['Adj. R-squared:'] = lambda x: "%#8.3f" % x.rsquared_adj
    info['Pseudo R-squared:'] = lambda x: "%#8.3f" % x.prsquared
    info['AIC:'] = lambda x: "%8.4f" % x.aic
    info['BIC:'] = lambda x: "%8.4f" % x.bic
    info['Log-Likelihood:'] = lambda x: "%#8.5g" % x.llf
    # add 3
    info['Log-Likelihood:'] = lambda x: "%#8.5g" % x.loglike

    info['LL-Null:'] = lambda x: "%#8.5g" % x.llnull
    info['LLR p-value:'] = lambda x: "%#8.5g" % x.llr_pvalue
    info['Deviance:'] = lambda x: "%#8.5g" % x.deviance
    info['Pearson chi2:'] = lambda x: "%#6.3g" % x.pearson_chi2
    info['F-statistic:'] = lambda x: "%#8.4g" % x.fvalue
    # add4
    info['F-statistic:'] = lambda x: "%#8.4g" % x.f_statistic.stat

    info['Prob (F-statistic):'] = lambda x: "%#6.3g" % x.f_pvalue
    # add5
    info['Prob (F-statistic):'] = lambda x: "%#6.3g" % x.f_statistic.pval

    info['Scale:'] = lambda x: "%#8.5g" % x.scale
    # add6
    info['Effects:'] = lambda x: ','.join(['%#8s' % i for i in x.included_effects])
   
    out = OrderedDict()
    for key, func in iteritems(info):
        try:
            out[key] = func(results)
        # NOTE: some models don't have loglike defined (RLM), so that's NIE
        except (AttributeError, KeyError, NotImplementedError):
            pass
    return out
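The probing pattern used here, a dict of lambdas applied to the results object with missing attributes silently skipped, can be sketched on a toy results class (the Res class below is hypothetical):

```python
from collections import OrderedDict

class Res:
    # Toy results object: it has nobs and rsquared, but deliberately
    # lacks prsquared, just as an OLS result would.
    nobs = 86
    rsquared = 0.35

info = OrderedDict()
info['No. Observations:'] = lambda x: "%#6d" % x.nobs
info['R-squared:'] = lambda x: "%#8.3f" % x.rsquared
info['Pseudo R-squared:'] = lambda x: "%#8.3f" % x.prsquared  # missing

out = OrderedDict()
for key, func in info.items():
    try:
        out[key] = func(Res())
    except (AttributeError, KeyError, NotImplementedError):
        pass  # models simply omit statistics they do not define
```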

def summary_params(results, yname=None, xname=None, alpha=.05, use_t=True,
                   skip_header=False, float_format="%.4f"):
    '''create a summary table of parameters from results instance

    Parameters
    ----------
    res : results instance
        some required information is directly taken from the result
        instance
    yname : string or None
        optional name for the endogenous variable, default is "y"
    xname : list of strings or None
        optional names for the exogenous variables, default is "var_xx"
    alpha : float
        significance level for the confidence intervals
    use_t : bool
        indicator whether the p-values are based on the Student-t
        distribution (if True) or on the normal distribution (if False)
    skip_header : bool
        If false (default), then the header row is added. If true, then no
        header row is added.
    float_format : string
        float formatting options (e.g. ".3g")

    Returns
    -------
    params_table : SimpleTable instance
    '''
    from linearmodels.panel.results import PanelEffectsResults
    from linearmodels.panel.results import RandomEffectsResults 
    from linearmodels.panel.results import PanelResults
    res_tuple = (PanelEffectsResults,PanelResults,RandomEffectsResults)

    if isinstance(results, tuple):
        results, params, std_err, tvalues, pvalues, conf_int = results
    # else:
    #     params = results.params
    #     bse = results.bse
    #     tvalues = results.tvalues
    #     pvalues = results.pvalues
    #     conf_int = results.conf_int(alpha)
   
    #+4:
    # I added panel results, some of whose attribute names differ,
    # and modified the code as follows.

    elif isinstance(results,res_tuple):
        bse = results.std_errors
        tvalues = results.tstats
        conf_int = results.conf_int(1-alpha)
    else:
        bse = results.bse
        tvalues = results.tvalues
        conf_int = results.conf_int(alpha) 
    params = results.params
    pvalues = results.pvalues

    data = np.array([params, bse, tvalues, pvalues]).T
    data = np.hstack([data, conf_int])
    data = pd.DataFrame(data)

    if use_t:
        data.columns = ['Coef.', 'Std.Err.', 't', 'P>|t|',
                        '[' + str(alpha/2), str(1-alpha/2) + ']']
    else:
        data.columns = ['Coef.', 'Std.Err.', 'z', 'P>|z|',
                        '[' + str(alpha/2), str(1-alpha/2) + ']']

    if not xname:
        # data.index = results.model.exog_names
        try:
            data.index = results.model.exog_names
        except AttributeError:
            data.index = results.model.exog.vars
    else:
        data.index = xname

    return data

    #+5:
    # The following function could only stack standard errors, but in
    # practice we usually report t statistics. I modified it to show one
    # of the standard errors, t statistics, or p-values via the 'show'
    # parameter.
    #+6:
    # Bug: different models use different names for the intercept term;
    # for example, OLS names it 'Intercept' while logit models use 'const'.
    # So I also added a function to unify the name and facilitate merging.

## Vertical summary instance for multiple models
# def _col_params(result, float_format='%.4f', stars=True):
#     '''Stack coefficients and standard errors in single column
#     '''

#     # Extract parameters
#     res = summary_params(result)
#     # Format float
#     for col in res.columns[:2]:
#         res[col] = res[col].apply(lambda x: float_format % x)
#     # Std.Errors in parentheses
#     res.ix[:, 1] = '(' + res.ix[:, 1] + ')'
#     # Significance stars
#     if stars:
#         idx = res.ix[:, 3] < .1
#         res.ix[idx, 0] = res.ix[idx, 0] + '*'
#         idx = res.ix[:, 3] < .05
#         res.ix[idx, 0] = res.ix[idx, 0] + '*'
#         idx = res.ix[:, 3] < .01
#         res.ix[idx, 0] = res.ix[idx, 0] + '*'
#     # Stack Coefs and Std.Errors
#     res = res.ix[:, :2]
#     res = res.stack()
#     res = pd.DataFrame(res)
#     res.columns = [str(result.model.endog_names)]
#     return res

def _col_params(result, float_format='%.4f', stars=True,show='t'):
    '''Stack coefficients and standard errors in single column
    '''

    # I added the parameter 'show', which defaults to 't' to display
    # t-values; 'p' shows p-values and 'se' shows standard errors.
    
    # Extract parameters
    res = summary_params(result)
   
    # Format float
    # Note that scientific numbers are formatted to 'str' type through '%.4f'

    for col in res.columns[:3]:
        res[col] = res[col].apply(lambda x: float_format % x)
    res.iloc[:,3] = np.around(res.iloc[:,3],4)
    
    # Significance stars
    # The .ix indexer is deprecated, so .loc is used instead.

    if stars:
        idx = res.iloc[:, 3] < .1
        res.loc[res.index[idx], res.columns[0]] += '*'
        idx = res.iloc[:, 3] < .05
        res.loc[res.index[idx], res.columns[0]] += '*'
        idx = res.iloc[:, 3] < .01
        res.loc[res.index[idx], res.columns[0]] += '*'

    # Std.Errors or tvalues or  pvalues in parentheses
    res.iloc[:,3] = res.iloc[:,3].apply(lambda x: float_format % x) # pvalues to str
    res.iloc[:, 1] = '(' + res.iloc[:, 1] + ')'
    res.iloc[:, 2] = '(' + res.iloc[:, 2] + ')'
    res.iloc[:, 3] = '(' + res.iloc[:, 3] + ')'

    # Stack Coefs and Std.Errors or pvalues
    if show == 't':
        res = res.iloc[:, [0, 2]]
    elif show == 'se':
        res = res.iloc[:, :2]
    elif show == 'p':
        res = res.iloc[:, [0, 3]]
    res = res.stack()
    res = pd.DataFrame(res)
    try:
        res.columns = [str(result.model.endog_names)]
    except AttributeError:
        res.columns = result.model.dependent.vars  # for PanelOLS
    
    # I added an index-name transformation function
    # to deal with both MultiIndex and single-level indexes.

    def _Intercept_2const(df):
        from pandas.core.indexes.multi import MultiIndex
        if 'Intercept' in df.index:
            if isinstance(df.index,MultiIndex):
                new_index = []
                for i in df.index.values:
                    i = list(i)
                    if 'Intercept' in i:
                        i[i.index('Intercept')] = 'const'
                    new_index.append(i)
                multi_index = lzip(*new_index)
                df.index = MultiIndex.from_arrays(multi_index)
            else:
                index_list = df.index.tolist()
                idx = index_list.index('Intercept')
                index_list[idx] = 'const'
                df.index = index_list
        return df
    return _Intercept_2const(res)
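On a single-level index, the renaming step behaves like this (toy DataFrame with made-up values; the MultiIndex branch is analogous):

```python
import pandas as pd

# Toy parameter table: OLS-style results name the intercept 'Intercept'.
df = pd.DataFrame({'y': [1.2345, 0.4321]}, index=['Intercept', 'x1'])

# 'in df.index' is the supported membership test
# (Index.contains is deprecated).
if 'Intercept' in df.index:
    idx = df.index.tolist()
    idx[idx.index('Intercept')] = 'const'  # unify with logit-style naming
    df.index = idx
```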

# def _col_info(result, info_dict=None):
#     '''Stack model info in a column
#     '''
#     if info_dict is None:
#         info_dict = {}
#     out = []
#     index = []
#     for i in info_dict:
#         if isinstance(info_dict[i], dict):
#             # this is a specific model info_dict, but not for this result...
#             continue
#         try:
#             out.append(info_dict[i](result))
#         except:
#             out.append('')
#         index.append(i)
#     out = pd.DataFrame({str(result.model.endog_names): out}, index=index)
#     return out

    #+7:
    # I modified the above function. The main change is renaming the
    # parameter 'info_dict' to 'more_info', which is a list rather than
    # a dict. Besides, I built a default dict of model information from
    # summary_model(); it is printed by default, and users can append
    # other statistics via the more_info parameter.
   
def _col_info(result, more_info=None):
   
    '''Stack model info in a column
    '''
    model_info = summary_model(result)
    default_info_ = OrderedDict()
    default_info_['Model:'] = lambda x: x.get('Model:')
    default_info_['No. Observations:'] = lambda x: x.get('No. Observations:')
    default_info_['R-squared:'] = lambda x: x.get('R-squared:')
    default_info_['Adj. R-squared:'] = lambda x: x.get('Adj. R-squared:')                    
    default_info_['Pseudo R-squared:'] = lambda x: x.get('Pseudo R-squared:')
    default_info_['F-statistic:'] = lambda x: x.get('F-statistic:')
    default_info_['Covariance Type:'] = lambda x: x.get('Covariance Type:')
    default_info_['Effects:'] = lambda x: x.get('Effects:')

    default_info = default_info_.copy()
    for k,v in default_info_.items():
        if v(model_info):
            default_info[k] = v(model_info)
        else:
            default_info.pop(k) # pop the item whose value is none.
            
    if more_info is None:
        more_info = default_info
    else:
        if not isinstance(more_info,list):
            more_info = [more_info]
        for i in more_info:
            try:
                default_info[i] = getattr(result,i)
            except (AttributeError, KeyError, NotImplementedError) as e:
                raise e
        more_info = default_info
    try:
        out = pd.DataFrame(more_info, index=[result.model.endog_names]).T
    except AttributeError:
        out = pd.DataFrame(more_info, index=result.model.dependent.vars).T
    return out
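The default-info filtering, which drops entries whose lookup returns None, can be seen on a toy model_info dict (the values are hypothetical):

```python
from collections import OrderedDict

# Toy stand-in for summary_model() output: only two statistics defined.
model_info = {'Model:': 'OLS', 'R-squared:': '0.350'}

default_ = OrderedDict()
default_['Model:'] = lambda x: x.get('Model:')
default_['R-squared:'] = lambda x: x.get('R-squared:')
default_['Pseudo R-squared:'] = lambda x: x.get('Pseudo R-squared:')

default = default_.copy()
for k, v in default_.items():
    if v(model_info):
        default[k] = v(model_info)   # replace the lambda with its value
    else:
        default.pop(k)               # pop the item whose value is None
```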

# def _make_unique(list_of_names):
#     if len(set(list_of_names)) == len(list_of_names):
#         return list_of_names
#     # pandas does not like it if multiple columns have the same names
#     from collections import defaultdict
#     name_counter = defaultdict(str)
#     header = []
#     for _name in list_of_names:
#         name_counter[_name] += "I"
#         header.append(_name+" " + name_counter[_name])
#     return header
   
    #+8:
    # The function above has a flaw: non-duplicated names also get a
    # suffix. And when endog_names are duplicated four or more times,
    # the y names look like 'y IIII' or 'y IIIII...'. So I used Arabic
    # numerals instead.

def _make_unique(list_of_names):
    if len(set(list_of_names)) == len(list_of_names):
        return list_of_names
    # pandas does not like it if multiple columns have the same names
    from collections import defaultdict
    dic_of_names = defaultdict(list)
    for i,v in enumerate(list_of_names):
        dic_of_names[v].append(i)
    for v in  dic_of_names.values():
        if len(v)>1:
            c = 0
            for i in v:
                c += 1
                list_of_names[i] += '_%i' % c
    return list_of_names
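With this numbering scheme, only duplicated names receive a suffix. The sketch below re-implements the same function standalone for illustration:

```python
from collections import defaultdict

def make_unique(list_of_names):
    # Same scheme as above: if all names are already unique, pass them
    # through; otherwise give each duplicate a numeric suffix.
    if len(set(list_of_names)) == len(list_of_names):
        return list_of_names
    positions = defaultdict(list)
    for i, name in enumerate(list_of_names):
        positions[name].append(i)
    for idx_list in positions.values():
        if len(idx_list) > 1:
            for c, i in enumerate(idx_list, start=1):
                list_of_names[i] += '_%i' % c
    return list_of_names

# Duplicated 'y' columns get numbered; the unique 'x' keeps its name.
names = make_unique(['y', 'y', 'x', 'y'])
```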

    #+9:
    # The following function is the most critical one.
    # I added the parameters 'show' and 'title', changed the default
    # value of 'stars' to True, and turned the dict parameter
    # 'info_dict' into a list named 'more_info'. Finally, 'const' is
    # placed first by default in regressor_order.
    #+10:
    # Bug: np.unique() disrupts the original order of the list, which
    # can result in index confusion.

# def summary_col(results, float_format='%.4f', model_names=[], stars=False,
#                 info_dict=None, regressor_order=[]): 
    # """
    # Summarize multiple results instances side-by-side (coefs and SEs)

    # Parameters
    # ----------
    # results : statsmodels results instance or list of result instances
    # float_format : string
    #     float format for coefficients and standard errors
    #     Default : '%.4f'
    # model_names : list of strings of length len(results) if the names are not
    #     unique, a roman number will be appended to all model names
    # stars : bool
    #     print significance stars
    # info_dict : dict
    #     dict of lambda functions to be applied to results instances to retrieve
    #     model info. To use specific information for different models, add a
    #     (nested) info_dict with model name as the key.
    #     Example: `info_dict = {"N":..., "R2": ..., "OLS":{"R2":...}}` would
    #     only show `R2` for OLS regression models, but additionally `N` for
    #     all other results.
    #     Default : None (use the info_dict specified in
    #     result.default_model_infos, if this property exists)
    # regressor_order : list of strings
    #     list of names of the regressors in the desired order. All regressors
    #     not specified will be appended to the end of the list.
    # """

def summary_col(results, float_format='%.4f', model_names=[], stars=True,
                more_info=None, regressor_order=[],show='t',title=None): 
    if not isinstance(results, list):
        results = [results]

    cols = [_col_params(x, stars=stars, float_format=float_format,show=show) for x in
            results]

    # Unique column names (pandas has problems merging otherwise)
    if model_names:
        colnames = _make_unique(model_names)
    else:
        colnames = _make_unique([x.columns[0] for x in cols])
    for i in range(len(cols)):
        cols[i].columns = [colnames[i]]

    merg = lambda x, y: x.merge(y, how='outer', right_index=True,
                                left_index=True)
    summ = reduce(merg, cols)

    # if regressor_order:
    if not regressor_order:
        regressor_order = ['const']
    
    varnames = summ.index.get_level_values(0).tolist()
    ordered = [x for x in regressor_order if x in varnames]
    unordered = [x for x in varnames if x not in regressor_order + ['']]

    # Note: np.unique can disrupt the original order  of list 'unordered'.
    # Then pd.Series().unique()  works well.

    # order = ordered + list(np.unique(unordered))
    order = ordered + list(pd.Series(unordered).unique())

    f = lambda idx: sum([[x + 'coef', x + 'stde'] for x in idx], [])
    # summ.index = f(np.unique(varnames))
    summ.index = f(pd.Series(varnames).unique())
    summ = summ.reindex(f(order))
    summ.index = [x[:-4] for x in summ.index]

    idx = pd.Series(lrange(summ.shape[0])) % 2 == 1
    summ.index = np.where(idx, '', summ.index.get_level_values(0))
    summ = summ.fillna('')
    
    # add infos about the models.
#     if info_dict:
#         cols = [_col_info(x, info_dict.get(x.model.__class__.__name__,
#                                            info_dict)) for x in results]
#     else:
#         cols = [_col_info(x, getattr(x, "default_model_infos", None)) for x in
#                 results]
    
    cols = [_col_info(x, more_info=more_info) for x in results]
    
    # use unique column names, otherwise the merge will not succeed
    for df , name in zip(cols, _make_unique([df.columns[0] for df in cols])):
        df.columns = [name]
    merg = lambda x, y: x.merge(y, how='outer', right_index=True,
                                left_index=True)
    info = reduce(merg, cols)
    info.columns = summ.columns
    info = info.fillna('')
#     dat = pd.DataFrame(np.vstack([summ, info]))  # pd.concat better, but error
#     dat.columns = summ.columns
#     dat.index = pd.Index(summ.index.tolist() + info.index.tolist())
#     summ = dat

#     summ = summ.fillna('')

#     smry = Summary()
#     smry.add_df(summ, header=True, align='l')
#     smry.add_text('Standard errors in parentheses.')
#     if stars:
#         smry.add_text('* p<.1, ** p<.05, ***p<.01')
#     return smry

    if show == 't':
        note = ['\t t statistics in parentheses.']
    elif show == 'se':
        note = ['\t Std. error in parentheses.']
    elif show == 'p':
        note = ['\t p-values in parentheses.']
    if stars:
        note += ['\t * p<.1, ** p<.05, ***p<.01']

    # Here I tried two ways to put the extra text, in the index location
    # or the columns location, and found the former works better.

#     note_df = pd.DataFrame(note, index=['note'] + [''] * (len(note) - 1),
#                            columns=[summ.columns[0]])

    note_df = pd.DataFrame([], index=['note:'] + note,
                           columns=summ.columns).fillna('')
#     summ_all = pd.concat([summ,info,note_df],axis=0)
    
    if title is not None:
        title = str(title)
    else:
        title = '\t Results Summary'
    
    # Here I tried to construct a title DataFrame and adjust the title's
    # position according to the number of columns, but the printed
    # result was not good enough.
    
    # col_len = len(summ.columns)
    # fake_data = ['']*col_len
    # if col_len % 2 == 1:
    #     from math  import ceil
    #     i = ceil(col_len/2)
    # else:
    #     i = int(col_len/2)
    # fake_data[i-1] = title
    # title_df = pd.DataFrame([fake_data],index=[''],columns=summ.columns).fillna('')
    
    title_df = pd.DataFrame([],index=[title],columns=summ.columns).fillna('')
    
    smry = Summary()
    smry.add_df(title_df,header=False,align='l') # title DF
    smry.add_df(summ, header=True, align='l') # params DF
    smry.add_df(info, header=False, align='l') # model information DF
    smry.add_df(note_df, header=False, align='l') # extra text DF
    return smry
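The order-preservation point from #+10 can be verified directly:

```python
import numpy as np
import pandas as pd

varnames = ['const', 'x2', 'x1', 'x2']

# np.unique sorts its output, so the original regressor order is lost.
sorted_names = list(np.unique(varnames))

# pd.Series.unique keeps first-appearance order, which is what the
# summary index needs.
ordered_names = list(pd.Series(varnames).unique())
```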

   ......

Next I show some examples including OLS, GLM, GEE, logit, and panel regression results. Other models have not been tested yet, but what can be determined is that multi-equation models like VAR do not work here.

# Load the data and fit
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
# ols
dat = sm.datasets.get_rdataset("Guerry", "HistData").data
res_ols = smf.ols('Lottery ~ Literacy + np.log(Pop1831)', data=dat).fit()
#glm
data = sm.datasets.scotland.load()
data.exog = sm.add_constant(data.exog)
gamma_model = sm.GLM(data.endog, data.exog, family=sm.families.Gamma())
res_glm = gamma_model.fit()
# gee
data = sm.datasets.get_rdataset('epil', package='MASS').data
fam = sm.families.Poisson()
ind = sm.cov_struct.Exchangeable()
mod = smf.gee("y ~ age + trt + base", "subject", data, cov_struct=ind, family=fam)
res_gee = mod.fit()
# logit
spector_data = sm.datasets.spector.load()
spector_data.exog = sm.add_constant(spector_data.exog)
logit_mod = sm.Logit(spector_data.endog, spector_data.exog)
res_logit = logit_mod.fit()

# load panel data and fit the model
from linearmodels.datasets import wage_panel
data = wage_panel.load()
year = pd.Categorical(data.year)
data = data.set_index(['nr', 'year'])
data['year'] = year

from linearmodels.panel import PooledOLS
exog_vars = ['black','hisp','exper','expersq','married', 'educ', 'union', 'year']
exog = sm.add_constant(data[exog_vars])
mod = PooledOLS(data.lwage, exog)
res_pooled = mod.fit()

from linearmodels.panel import PanelOLS
exog_vars = ['expersq','union','married']
exog = sm.add_constant(data[exog_vars])
mod = PanelOLS(data.lwage, exog, entity_effects=True, time_effects=True)
res_fe_re = mod.fit()

from linearmodels.panel import FirstDifferenceOLS
exog_vars = ['exper','expersq', 'union', 'married']
exog = data[exog_vars]
mod = FirstDifferenceOLS(data.lwage, exog)
res_fd = mod.fit()

exog_vars = ['black','hisp','exper','expersq','married', 'educ', 'union']
exog = sm.add_constant(data[exog_vars])
mod = PooledOLS(data.lwage, exog)

res_robust = mod.fit(cov_type='robust')
res_clust_entity = mod.fit(cov_type='clustered', cluster_entity=True)
res_clust_entity_time = mod.fit(cov_type='clustered', cluster_entity=True, cluster_time=True)

Then we import the function summary_col() from the modified summary2, which I named summary3, as a module. We can then directly output the concatenated results, with stars and some default model information.

from summary3 import summary_col 

For a single regression result, we can pass the result object directly, though a list is better:

# summary_col(res_ols)
summary_col([res_ols]) 

This will return a Summary class instance; in a notebook the output is:

We can also use the print function to output the table as text. The parameter more_info adds new model information to the printout. For example,

print(summary_col([res_ols,res_glm,res_gee,res_logit],more_info=['df_model','scale']))

The incomplete output is

We can also use regressor_order to set the order of the variables, show to display whichever of the p-values, t-values, or std. errors you want, and title to define a custom title for your table.

print(summary_col([res_fe_re,res_fd,res_robust,res_clust_entity,res_clust_entity_time],
             regressor_order=['black'],show='se',title='Panel Results Summary Table'))

The output is

Finally, if you want to export the summary results to external files, you can do it like this:

summary_col([res_glm,res_logit]).to_excel()

This creates an Excel file named 'summary_results.xlsx' in your working directory.
Of course you can specify the filename and path, just as with pandas (which is indeed what is used under the hood):

summary_col([res_clust_entity,res_fd]).to_csv('your path\\filename.csv')
@ChadFulton
Member

Thanks for this idea and the code! I am not particularly familiar with the details of the summary functions, so I can't speak to the code itself right now.

But one thing that might be useful (especially to someone like me who is not particularly familiar) is to make a pull request with the changes included. For example, I can't tell if this is all new code, or if this is replacing / improving some existing code.

@jbrockmendel
Contributor

What Chad said. Also, please don’t leave commented-out code; delete it if it is no longer useful.
