orig_endog vs endog_lagged vs... #3689

Open
jbrockmendel opened this issue May 23, 2017 · 7 comments

Comments

@jbrockmendel
Contributor

Mostly @ChadFulton

Many models in tsa have to truncate the given endog and update nobs accordingly. They could be more uniform in the names they use for the truncated and un-truncated versions.

  • var_model has endog_lagged
  • arima_model calls the truncated version self.endog, then later uses self.data.endog to access the original. (At least I think that's the motivation; I'm less sure about this one than the others.)
  • markov_autoregression has orig_endog. It also updates self.data.endog, so the arima_model approach is not immediately applicable.
  • regression.linear_model.GLSAR truncates exog and calls it wexog

I don't have any real preferences over these variants, but would like to settle on a standard. It could be done with aliases to maintain backward-compatibility. Thoughts?

@josef-pkt
Member

Good idea, but I don't have an opinion on names and convention right now.

However, GLSAR is different: wendog and wexog are transformations, similar to WLS and GLS, and not just truncations.
(wendog and wexog are "whitened" transformations of the original, in this case after applying an AR filter that currently does not keep the initial observations, but could keep them. Prediction, fittedvalues, and resid are for the original endog and exog.)
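
To make the distinction concrete, here is a minimal sketch contrasting AR(1) whitening with plain truncation. It assumes a known AR coefficient `rho` and uses plain Python lists to stay dependency-free; the helper names `whiten_ar1` and `truncate` are hypothetical and are not the GLSAR implementation.

```python
def whiten_ar1(x, rho):
    """Apply the AR(1) filter x[t] - rho * x[t-1]; the first observation is dropped."""
    return [x[t] - rho * x[t - 1] for t in range(1, len(x))]


def truncate(x, k):
    """Drop the first k observations; the remaining values are unchanged."""
    return list(x[k:])


endog = [1.0, 2.0, 4.0, 8.0]
wendog = whiten_ar1(endog, rho=0.5)  # same length as the truncated series, different values
trunc = truncate(endog, 1)           # same length, original values

print(wendog)
print(trunc)
```

Both results lose one observation, but only the truncated series still contains values from the original endog, which is why the two cases arguably deserve different names.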

@josef-pkt
Member

VAR endog_lagged might also be a bit different. IIRC those are the past endog for the regressor matrix, and not the truncated endog.
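
A rough sketch of that difference, assuming a VAR with `p` lags and ignoring any constant or trend columns: the lagged regressor matrix stacks past rows of endog, while the truncated endog is just the rows that have a full set of lags available. The function name `lagged_regressors` is hypothetical and this is not the var_model code.

```python
def lagged_regressors(endog, p):
    """For each t >= p, build the row [y[t-1], ..., y[t-p]] flattened into one list."""
    rows = []
    for t in range(p, len(endog)):
        row = []
        for lag in range(1, p + 1):
            row.extend(endog[t - lag])
        rows.append(row)
    return rows


# Two-variable series with four observations (illustrative numbers only).
endog = [[1, 10], [2, 20], [3, 30], [4, 40]]
p = 2

regressors = lagged_regressors(endog, p)  # past values, used on the right-hand side
truncated_endog = endog[p:]               # current values, used on the left-hand side

print(regressors)
print(truncated_endog)
```

The two objects have the same number of rows (nobs after truncation) but different contents and widths.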

@jbrockmendel
Contributor Author

> However, GLSAR is different

You're right on all counts. I still consider these special cases of the same general phenomenon. The unifying characteristic is that these are all cases where we cannot define nobs as just len(self.endog). Exactly why I care about this characteristic is a topic for another day.

> but I don't have an opinion on names and convention right now.

As long as it isn't part of the exposed API, choosing an ideal convention is probably less important than choosing a convention. So let's throw some stuff against the wall:

orig_endog is probably the clearest of the names I've seen. We could attach it to model.data and make it a cache_readonly to clarify that it is intended to be immutable.

VARResults has an n_totobs attribute that corresponds to len(orig_endog). We could make that available in the general case, but privatize it as _nobs_total.
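
Something along these lines, as a sketch only. The classes `_Data` and `TruncatingModel` are hypothetical stand-ins, and a plain setter-less `property` is used in place of statsmodels' `cache_readonly` so the example runs without statsmodels installed.

```python
class _Data:
    """Hypothetical stand-in for model.data."""

    def __init__(self, endog):
        self._orig_endog = tuple(endog)

    @property
    def orig_endog(self):
        # No setter is defined, so assigning to data.orig_endog raises AttributeError.
        return self._orig_endog


class TruncatingModel:
    """Hypothetical model that drops the first k_lags observations."""

    def __init__(self, endog, k_lags):
        self.data = _Data(endog)
        self.endog = list(endog[k_lags:])

    @property
    def nobs(self):
        # Number of observations actually used in estimation.
        return len(self.endog)

    @property
    def _nobs_total(self):
        # Number of observations originally supplied.
        return len(self.data.orig_endog)


model = TruncatingModel([1.0, 2.0, 3.0, 4.0, 5.0], k_lags=2)
print(model.nobs, model._nobs_total)
```

With this layout, nobs always describes the truncated sample, and the original length is still recoverable without relying on len(self.endog).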

@ChadFulton
Member

I will add that I also get confused about this, particularly because model.data already has an orig_endog attribute, but of course it is the original version of the endog array passed to the TimeSeriesModel constructor, not necessarily the original version of the endog array passed to the actual model's constructor. So in e.g. SARIMAX, we have model.endog, model.orig_endog, model.data.endog, and model.data.orig_endog.

@jbrockmendel
Contributor Author

> So in e.g. SARIMAX, we have model.endog, model.orig_endog, model.data.endog, and model.data.orig_endog.

Are statespace models sufficiently recent that changes can be made without breaking backwards-compatibility? If you were to decide on a One True Convention in statespace, that could become the Schelling Point for other models.

To the extent that multiple names need to be retained for backwards-compatibility, there's room to reduce ambiguity by being explicit. Something like:

    @property
    def y(self):
        """self.y is an alias for self.endog.  This is retained for backward-compatibility and will be removed in version x.y.z."""
        return self.endog

In the case of self.y, it looks like it mostly shows up in vector_ar and sandbox. There are probably some self.x aliases scattered about too.
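
If the alias should also announce its own removal, one option is to emit a warning from the property. This is a sketch under assumptions: the class name `LegacyModel`, the choice of `FutureWarning`, and the message text are all illustrative, not an existing statsmodels convention.

```python
import warnings


class LegacyModel:
    """Hypothetical model that keeps a deprecated alias for endog."""

    def __init__(self, endog):
        self.endog = endog

    @property
    def y(self):
        """Deprecated alias for self.endog, retained for backward-compatibility."""
        warnings.warn(
            "y is deprecated; use endog instead",
            FutureWarning,
            stacklevel=2,  # attribute the warning to the caller, not to this property
        )
        return self.endog


model = LegacyModel([1.0, 2.0, 3.0])

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    value = model.y  # still works, but records a FutureWarning

print(value, [w.category.__name__ for w in caught])
```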

This relates to #1664 on which I'll post a comment shortly.

@jbrockmendel
Contributor Author

Related #2314

@jbrockmendel
Contributor Author

Related: mlemodel.nobs_effective

3 participants