
FAQ Having trouble getting Exogenous names in model summaries #5492

Open
emilmirzayev opened this issue Feb 12, 2019 · 14 comments

Comments

@emilmirzayev
Contributor

Hi. I am using statsmodels installed with Anaconda, with the following versions:


>>> statsmodels.__version__
'0.9.0'
>>> exit()

(base) C:\Users\emirzayev>conda --version
conda 4.6.2

Now, when I fit a model, I do not see the names of the variables in the summary table, only x1, x2, ..., xN. Is there a way to have the variable names in the summary as well? Or is this change permanent?

                           Logit Regression Results                           
==============================================================================
Dep. Variable:                 choice   No. Observations:                 1766
Model:                          Logit   Df Residuals:                     1757
Method:                           MLE   Df Model:                            8
Date:                Tue, 12 Feb 2019   Pseudo R-squ.:                  -2.762
Time:                        14:13:09   Log-Likelihood:                -1214.0
converged:                       True   LL-Null:                       -322.66
                                        LLR p-value:                     1.000
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
x1             0.0751      0.070      1.073      0.283      -0.062       0.212
x2            -0.0944      0.109     -0.866      0.386      -0.308       0.119
x3            -0.1273      0.103     -1.233      0.218      -0.330       0.075
x4             0.0641      0.050      1.273      0.203      -0.035       0.163
x5             0.0682      0.054      1.266      0.205      -0.037       0.174
x6             0.1070      0.122      0.880      0.379      -0.131       0.345
x7            -0.0595      0.076     -0.778      0.437      -0.209       0.090
x8             0.1410      0.143      0.987      0.324      -0.139       0.421
x9             0.1769      0.093      1.896      0.058      -0.006       0.360
==============================================================================

Thanks in advance.

@josef-pkt
Member

josef-pkt commented Feb 12, 2019

Are you using numpy arrays for endog and exog in Logit?

numpy arrays don't hold the names of variables/columns, so the param or exog names are just made up.

Using pandas DataFrames for exog, or using formulas, preserves the names, and they are used in the summary.

There is a way to set the names, but it still does not have a very clean API.
If you have your own xnames, then
model.exog_names[:] = xnames
Note that this is an in-place modification, not an assignment.

Just for the summary:
summary has an xname keyword that allows overriding the parameter/exog names. That will not change any attributes and is only used for creating the summary table.

xnames needs to be a list of strings with the same length as params.

@josef-pkt josef-pkt changed the title [Question] Having trouble getting Exogenous names in model summaries FAQ Having trouble getting Exogenous names in model summaries Feb 12, 2019
@emilmirzayev
Contributor Author

Yes, I am using NumPy arrays for this. I only noticed it now because I used fit_transform on the data this time. I will try your method and post my feedback here; I am waiting for the code to finish running first.

@emilmirzayev
Contributor Author

emilmirzayev commented Feb 12, 2019

@josef-pkt, I tried to assign the new names after initialization, like this:

model_ap_simple = sm.Logit(y_train, X_train)

model_ap_simple.exog_names[:] = exog_variables_simple
model_ap_simple.fit()
Optimization terminated successfully.
         Current function value: 0.688823
         Iterations 5
<statsmodels.discrete.discrete_model.BinaryResultsWrapper at 0x213ce544080>

with open("AffinityPropagationSimpleModel.txt", "w") as file:
    file.write(str(model_ap_simple.summary()))
Traceback (most recent call last):

  File "<ipython-input-141-efe57b8b3497>", line 2, in <module>
    file.write(str(model_ap_simple.summary()))

AttributeError: 'Logit' object has no attribute 'summary'

and also after the init-fit phase:

    model_ap_simple.exog_names[:] = exog_variables_simple

  File "C:\Users\emirzayev\AppData\Local\Continuum\anaconda3\lib\site-packages\statsmodels\base\wrapper.py", line 35, in __getattribute__
    obj = getattr(results, attr)

AttributeError: 'LogitResults' object has no attribute 'exog_names'

In both cases I got an error.
I am probably doing something wrong. I would appreciate any help.

@josef-pkt
Member

josef-pkt commented Feb 12, 2019

AFAICS, you are mixing up the model and results instances (a difference from sklearn).

model_ap_simple.fit()
returns a results instance and does not, in general, change the model instance model_ap_simple.

In the second case you try to access exog_names on the results instance and not on the model instance.

try this:

model_ap_simple = sm.Logit(y_train, X_train)
model_ap_simple.exog_names[:] = exog_variables_simple

results_ap_simple = model_ap_simple.fit()
print(results_ap_simple.summary())

@emilmirzayev
Contributor Author

It did work! Thank you for answering on such short notice.

@stevenlis

@josef-pkt I'm always curious about this. Is there any reason why Date and Time always show in a summary table? Is there any way to hide them?

@josef-pkt
Member

@StevenLi-DS

It's just so we know when we estimated the model (or better when we printed the summary *).
You are the first to ask for hiding/removing it.
A few weeks ago I saw a Twitter comment from someone who was happy to have the date and time.

When I wrote the first implementation of summary, I just browsed through what several statistics and econometrics programs (especially Stata) were showing in the summary, and added what I thought looks useful and "traditional".

There is currently no option to adjust what is in the summary; what is included is hardcoded for each model. summary2 is more flexible than summary, but I never looked at whether we can make results statistics optional when using summary2.

(*) We have an issue to show the fit time instead of the summary time, but I haven't convinced myself yet that we want to call time in fit, i.e. do this extra work in all fit methods.

@stevenlis

I can't think of a useful case so far, other than exposing that I've got nothing better to do than stats late at night. lol...😂

@stevenlis

I just realized that I could actually modify the statsmodels.regression.linear_model.OLS in place? Shouldn't this be prevented?

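import statsmodels.formula.api as smf

# formula and df are defined elsewhere
model = smf.ols(formula, data=df)
results = model.fit()
exogs_list = model.exog_names    # the model's own list, not a copy
exogs_list.remove('Intercept')   # so this also changes model.exog_names
endog_name = model.endog_names
# calling summary() now raises an error
results.summary()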
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
 in 
      7 endog_name = model.endog_names
      8 
----> 9 results.summary()

~\Anaconda3\lib\site-packages\statsmodels\regression\linear_model.py in summary(self, yname, xname, title, alpha)
   2406                              yname=yname, xname=xname, title=title)
   2407         smry.add_table_params(self, yname=yname, xname=xname, alpha=alpha,
-> 2408                               use_t=self.use_t)
   2409 
   2410         smry.add_table_2cols(self, gleft=diagn_left, gright=diagn_right,

~\Anaconda3\lib\site-packages\statsmodels\iolib\summary.py in add_table_params(self, res, yname, xname, alpha, use_t)
    862         if res.params.ndim == 1:
    863             table = summary_params(res, yname=yname, xname=xname, alpha=alpha,
--> 864                                    use_t=use_t)
    865         elif res.params.ndim == 2:
    866 #            _, table = summary_params_2dflat(res, yname=yname, xname=xname,

~\Anaconda3\lib\site-packages\statsmodels\iolib\summary.py in summary_params(results, yname, xname, alpha, use_t, skip_header, title)
    465 
    466     if len(xname) != len(params):
--> 467         raise ValueError('xnames and params do not have the same length')
    468 
    469     params_stubs = xname

ValueError: xnames and params do not have the same length

@emilmirzayev
Contributor Author

@StevenLi-DS, I think you are right. Maybe only renaming should be allowed when modifying?
Deleting a variable ex post just by deleting its name from exog_names should not be possible.

@stevenlis

I think maybe it should just return a copy of the exog names
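
Until something like that exists, a user-side workaround is to copy the list before editing it. A minimal sketch, assuming a made-up formula `y ~ x` and synthetic data:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data; the column names "x" and "y" are hypothetical.
rng = np.random.default_rng(0)
df = pd.DataFrame({"x": rng.normal(size=50)})
df["y"] = 1.0 + 2.0 * df["x"] + rng.normal(size=50)

model = smf.ols("y ~ x", data=df)
results = model.fit()

# list(...) makes a copy, so removing an entry leaves the model untouched.
predictors = list(model.exog_names)
predictors.remove("Intercept")

# The model still has one name per parameter, so summary() keeps working.
summary_text = str(results.summary())
print(predictors)
print(model.exog_names)
```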

@kshedden
Contributor

In general I don't think we have ever tried to prevent people from changing attributes of model or results classes. There are some "cache readonly" attributes in the results classes that cannot be changed, but this is an indirect effect of their being cached so that they are not repeatedly recomputed.

@josef-pkt
Member

To emphasize Kerby's comment.

In general the user should not change any attributes of either model or results. There is a kind of exception for changing exog_names as in my example above because we don't have an official interface for changing it.

However, not changing attributes is not enforced. If the user changes attributes, then it is at her/his own risk.
One reason that we cannot enforce it is that we are using those backdoors (changing attributes during execution) internally, and there is no easy way to prevent users from doing it if we want to do it on our own.
Also, for expert usage, e.g. when I am writing a new model prototype, I often just manipulate the attributes of a model or results instance. This is fragile and not safe, but it allows for fast writing of experimental code, which can then be safeguarded by unit tests or rewritten into a cleaner version.

@stevenlis

@kshedden @josef-pkt Thanks for the explanation. Maybe it should be written in the docs to warn users.
