aic in ar/arima/... #3574

Open · jbrockmendel opened this issue Mar 24, 2017 · 5 comments

@jbrockmendel (Contributor)

Related to #3568, I am trying to simplify the AR[...] models and have them inherit some attributes as properties from a general case. There are two inconsistencies I need help understanding.

The aic/bic/hqic are defined slightly differently in ARMAResults, ARResults, and VARResults. The differences can be understood by just looking at aic:

ARResults: aic = np.log(self.sigma2) + 2 * (df_model+1) / nobs

VARResults: aic = ld + 2 * (neqs * df_model) / nobs

Here ld is analogous to np.log(self.sigma2).

ARMAResults: aic = -2 * llf + 2 * (df_model+1)

In the CSS case, llf can be simplified to the point where this is a positive affine transform of the version in ARResults.
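For concreteness, a small self-contained check of that claim, assuming the standard concentrated Gaussian CSS log-likelihood and made-up numbers (illustrative only, not statsmodels internals):

    import numpy as np

    sigma2, df_model, nobs = 1.7, 3, 250

    # ARResults-style AIC (per-observation scale):
    aic_ar = np.log(sigma2) + 2 * (df_model + 1) / nobs

    # ARMAResults-style AIC, using the concentrated Gaussian CSS llf:
    llf_css = -nobs / 2 * (np.log(2 * np.pi) + np.log(sigma2) + 1)
    aic_arma = -2 * llf_css + 2 * (df_model + 1)

    # aic_arma equals nobs * aic_ar plus a constant, i.e. a positive
    # affine transform of the ARResults version:
    np.testing.assert_allclose(aic_arma, nobs * (aic_ar + np.log(2 * np.pi) + 1))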

Finally, the actual questions:

  1. Is there a reason not to use a common definition between AR vs ARMA/ARIMA?

  2. If we set neqs=1 in the VAR, we should end up with an AR, but there is an extra 1 unaccounted for. Where did it go?
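To make question 2 concrete (made-up numbers), substituting neqs = 1 into the VAR expression and comparing with the AR expression shows the difference is exactly the 2 / nobs contributed by the extra +1:

    import numpy as np

    sigma2, df_model, nobs = 1.7, 3, 250
    neqs = 1  # single-equation VAR, so ld plays the role of log(sigma2)

    aic_var = np.log(sigma2) + 2 * (neqs * df_model) / nobs  # VARResults form
    aic_ar = np.log(sigma2) + 2 * (df_model + 1) / nobs      # ARResults form

    print(aic_ar - aic_var)  # 2 / nobs = 0.008, the unaccounted-for "+1"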

@josef-pkt (Member)

See #1802 for a related issue (and #1733, #1719, and #3514).

How many parameters are in a model? Do we count scale as a parameter?
In least squares or quasi-likelihood the scale does not enter the estimation of the parameters. When we use full MLE, then scale is part of the specified parameters and in that case we have +1.
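As a count-only illustration of this point (hypothetical AR(2) with constant; the names are illustrative, not the actual statsmodels counting):

    k_ar, k_trend = 2, 1

    k_ls = k_ar + k_trend       # least squares / quasi-likelihood: 3
    k_mle = k_ar + k_trend + 1  # full MLE also counts the scale: 4

    # The extra parameter under full MLE is the "+1" that shows up in the
    # (df_model + 1) penalty terms quoted above.
    print(k_ls, k_mle)  # 3 4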

The main problem is that conventions differ by statistical package and model. Plus we don't want to break backwards compatibility without a good reason.

The definition or convention that is implemented in statsmodels often depends on what reference package was used for writing the unit tests during development.
I don't have a problem with deviating from any convention if we have a strong enough statistical opinion to explain and justify what we are doing (*). Forming a strong statistical/econometric/theoretical opinion during development (as a "programmer") is often not easy, so we default to whatever the other packages are doing, even if it causes inconsistencies.

(*) We have some differences from Stata, where we adjust the unit tests to match Stata when our function doesn't match it because we use a different definition, convention, or degrees-of-freedom correction.

Related aside:
I'm always looking for "strong" opinions, Monte Carlo studies, or good references, which help in coming to a decision when it's too difficult for me to become convinced either way on my own.
(For example: it took me several years and looking at many related articles to become convinced that we need use_t in all models, as an option for better small-sample inference. Initially we had either the t or the normal distribution but did not allow a choice.)

@josef-pkt (Member)

Related: given that it is often difficult to judge what the behavior should be, I have a tendency to add options: choose one default, but make it possible for users to replicate other packages or other definitions or conventions.
E.g. if we have a method like information_criteria with options, then the default is less important and we can choose something for consistency across models within statsmodels.
(It still leaves us with backwards compatibility considerations when something has been available for a long time.)
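A hypothetical sketch of what such a method with options could look like; the name, signature, and option names are illustrative, not an existing statsmodels API:

    import numpy as np

    def information_criteria(llf, nobs, k_params, crit="aic", count_scale=True):
        """Information criterion under an explicit counting convention.

        count_scale mirrors the +1 discussion above: whether the error
        variance is included in the parameter count.
        """
        k = k_params + (1 if count_scale else 0)
        if crit == "aic":
            return -2 * llf + 2 * k
        if crit == "bic":
            return -2 * llf + np.log(nobs) * k
        if crit == "hqic":
            return -2 * llf + 2 * np.log(np.log(nobs)) * k
        raise ValueError("unknown criterion %r" % crit)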

@jbrockmendel (Contributor, Author)

I don't envy you the task of juggling all of the possible implementations, especially the backward-compatibility part. The handle-it-once approach in ResultMixin is nice, especially the property-like setup.

You have definitely given this more thought than I have, so I'm going to stick to trying to figure out the AR[...] case for now and add a few misc comments below.

For AR, it looks increasingly like the IC should depend on the estimation method. In fact, there is a # TODO comment where self.df_model and self.df_resid are set, suggesting that these should depend on that method. I'll try to sort this out.

For ARMA, it looks like your comment about the variance being estimated might explain the extra +1.

For VAR, I think there may be a factor of self.neqs missing if we take the docstring literally:

    @property
    def df_model(self):
        """Number of estimated parameters, including the intercept / trends
        """
        return self.neqs * self.k_ar + self.k_trend

The df_model here gives the number of parameters estimated per equation. That's why down in info_criteria we have to define free_params in a way that equals neqs*df_model. The missing factor of neqs happens to cancel out every other time df_model is used to calculate something, so it doesn't actually affect anything.
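A minimal sketch of that counting with made-up numbers (not the actual statsmodels code path):

    neqs, k_ar, k_trend = 4, 2, 1

    # What the df_model property above returns: parameters per equation.
    df_model_per_eq = neqs * k_ar + k_trend  # 9

    # What the information criteria need: total free parameters, hence the
    # extra factor of neqs applied when free_params is defined.
    free_params = neqs * df_model_per_eq     # 36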

Misc:

> How many parameters are in a model?

This seems like an argument for making df_model something like a property derived from self.params.size (at least at the Results level).

> Do we count scale as a parameter?
> In least squares or quasi-likelihood the scale does not enter the estimation of the parameters. When we use full MLE, then scale is part of the specified parameters and in that case we have +1.

In many cases we have a method argument or attribute that could help with this. But most of the cases where I see an explicit +1 are accompanied by a comment like # assume constant or another indication that it is driven by an intercept term that is not being counted somewhere. Brainstorming: what if those assumptions went into an explicit flag, kind of like use_t, and then df_model/df_resid could go into a property/cache_readonly high up the class chain?
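A rough sketch of that brainstorm; all names are hypothetical, not existing statsmodels code:

    import numpy as np

    class GenericResults:
        """Illustrative base results class with explicit counting flags."""

        def __init__(self, params, nobs, count_scale=False):
            self.params = np.asarray(params)
            self.nobs = nobs
            # explicit flag, analogous in spirit to use_t
            self.count_scale = count_scale

        @property  # could be a cache_readonly in statsmodels
        def df_model(self):
            # count whatever was actually estimated, plus scale if requested
            return self.params.size + (1 if self.count_scale else 0)

        @property
        def df_resid(self):
            return self.nobs - self.df_model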

@josef-pkt (Member)

I had forgotten about the constant issue in df_model. We have had it from the very beginning without counting the constant, but I find it very annoying. I once started with df_model_wc for "with constant" but wasn't convinced enough to pull it through as a new generic name. For the most part I try to use only df_resid as the main relevant attribute for inference, which is always supposed to count the constant if it is included in the parameters. For df_resid the main issue is whether or not to include scale in the count.

(Aside: in another case, cluster-robust standard errors, I introduced df_resid_inference for when we want to use different degrees of freedom for inference or for small-sample correction than those used, for example, in scale = ssr / df_resid.)

I need to look at your other comments more carefully, but will be largely offline for the rest of today.

@josef-pkt (Member)

Just as a general idea or principle:

I think VAR and the related SVAR and VECM are the outliers here that need special treatment; they are mainly least-squares estimators right now. (Some parts still use the MLE interpretation, but don't necessarily use a full MLE optimization.)
I think AR/ARMA/ARIMA can be considered MLE models (even if the estimator is least squares in some cases), and those three should be standardized with the corresponding statespace models, which are also MLE. If we need to fix things, we could likely treat them as bugfixes as long as they don't change core usage.
