orig_endog vs endog_lagged vs... #3689

jbrockmendel · 2017-05-23T00:23:17Z

Many models in tsa have to truncate the given endog and update nobs accordingly. They could use more uniform names for the truncated and untruncated versions.

var_model has endog_lagged
arima_model calls the truncated version self.endog, then later uses self.data.endog to get the original. (At least I think that's the motivation; I'm less sure about this one than the others.)
markov_autoregression has orig_endog. It also updates self.data.endog so the previous method is not immediately applicable.
regression.linear_model.GLSAR truncates exog and calls it wexog

I don't have any real preferences over these variants, but would like to settle on a standard. It could be done with aliases to maintain backward-compatibility. Thoughts?


josef-pkt · 2017-05-23T00:45:15Z

good idea, but I don't have an opinion on names and convention right now.

However, GLSAR is different: wendog and wexog are transformations similar to those in WLS and GLS, not just truncations.
(wendog and wexog are "whitened" transformations of the original, in this case after applying an AR filter that currently does not keep the initial observations, but could keep them.
prediction, fittedvalues, and resid are for the original endog and exog.)

josef-pkt · 2017-05-23T00:49:53Z

VAR endog_lagged might also be a bit different. IIRC those are the past endog for the regressor matrix, and not the truncated endog.

jbrockmendel · 2017-05-23T03:47:41Z

> However, GLSAR is different

You're right on all counts. I still consider these special cases of the same general phenomenon. The unifying characteristic is that these are all cases where we cannot define nobs as just len(self.endog). Exactly why I care about this characteristic is a topic for another day.
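To make the distinction concrete, here is a minimal sketch in plain Python, not taken from any statsmodels model. The helper `split_for_ar` and the variable names are hypothetical; it only illustrates how the truncated endog, the lagged regressors, and the two observation counts relate.

```python
def split_for_ar(endog, k_ar):
    """Hypothetical helper: split a series for an AR(k_ar) regression.

    Returns the truncated left-hand side and the rows of lagged values.
    """
    if not 0 < k_ar < len(endog):
        raise ValueError("k_ar must satisfy 0 < k_ar < len(endog)")
    # Left-hand side: only observations that have a full set of k_ar lags.
    truncated = list(endog[k_ar:])
    # Regressors: the row for time t holds endog[t-1], ..., endog[t-k_ar].
    lagged = [
        [endog[t - lag] for lag in range(1, k_ar + 1)]
        for t in range(k_ar, len(endog))
    ]
    return truncated, lagged


orig_endog = [1.0, 2.0, 4.0, 7.0, 11.0]
truncated, lagged = split_for_ar(orig_endog, k_ar=2)

nobs_total = len(orig_endog)  # what the user passed in
nobs = len(truncated)         # what the estimator actually uses
```

Here `nobs` is `len(orig_endog) - k_ar`, so it cannot be read off the original array alone, and `lagged` is a different object from `truncated` even though both have `nobs` rows.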

> but I don't have an opinion on names and convention right now.

As long as it isn't part of the exposed API, choosing an ideal convention is probably less important than choosing a convention. So let's throw some stuff against the wall:

orig_endog is probably the clearest of the names I've seen. We could attach it to model.data and make it a cache_readonly to clarify that it is intended to be immutable.

VARResults has an n_totobs attribute that corresponds to len(orig_endog). We could make that available in the general case, but privatize it as _nobs_total.
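A rough sketch of what that could look like, under stated assumptions: `DataSketch` and `k_trunc` are hypothetical names, and `functools.cached_property` stands in for statsmodels' `cache_readonly` so the example runs without statsmodels. Unlike a read-only cache, `cached_property` caches the value but does not by itself block reassignment; the tuple is only a hint that the data should not be mutated.

```python
from functools import cached_property


class DataSketch:
    """Hypothetical stand-in for model.data; not the statsmodels class."""

    def __init__(self, endog, k_trunc):
        self._raw_endog = tuple(endog)  # untouched copy of the user's input
        self.k_trunc = k_trunc          # number of leading observations dropped

    @cached_property
    def orig_endog(self):
        # Computed once, then cached on the instance.
        return self._raw_endog

    @property
    def endog(self):
        # Truncated version that the estimator would work with.
        return self.orig_endog[self.k_trunc:]

    @property
    def _nobs_total(self):
        return len(self.orig_endog)

    @property
    def nobs(self):
        return len(self.endog)


data = DataSketch([1, 2, 3, 4, 5], k_trunc=2)
```

With this layout, `data._nobs_total - data.nobs` always equals `k_trunc`, and the two counts never need to be stored separately.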

ChadFulton · 2017-05-26T01:10:04Z

I will add that I also get confused about this, particularly because model.data already has an orig_endog attribute, but of course it is the original version of the endog array passed to the TimeSeriesModel constructor, not necessarily the original version of the endog array passed to the actual model's constructor. So in e.g. SARIMAX, we have model.endog, model.orig_endog, model.data.endog, and model.data.orig_endog.

jbrockmendel · 2017-05-26T17:49:57Z

> So in e.g. SARIMAX, we have model.endog, model.orig_endog, model.data.endog, and model.data.orig_endog.

Are statespace models sufficiently recent that changes can be made without breaking backwards-compatibility? If you were to decide on a One True Convention in statespace, that could become the Schelling Point for other models.

To the extent that multiple names need to be retained for backwards-compatibility, there's room to reduce ambiguity by being explicit. Something like:

    @property
    def y(self):
        """self.y is an alias for self.endog.  This is retained for backward-compatibility and will be removed in version x.y.z."""
        return self.endog

In the case of self.y, it looks like it mostly shows up in vector_ar and sandbox. There are probably some self.xs scattered about too.
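If the alias should also nudge callers at runtime, one option is to emit a `DeprecationWarning` from the property. This is only a sketch: `ModelSketch` is a hypothetical class, and the message deliberately leaves the removal version unspecified.

```python
import warnings


class ModelSketch:
    """Hypothetical model class used only to illustrate the alias pattern."""

    def __init__(self, endog):
        self.endog = endog

    @property
    def y(self):
        """Deprecated alias for ``endog``."""
        warnings.warn(
            "y is a deprecated alias for endog and will be removed "
            "in a future version; use endog instead.",
            DeprecationWarning,
            stacklevel=2,  # attribute the warning to the caller's line
        )
        return self.endog
```

Because the property has no setter, assigning to `model.y` raises `AttributeError`, which keeps `endog` as the single place the data can be changed.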

This relates to #1664 on which I'll post a comment shortly.

jbrockmendel · 2017-06-02T17:50:46Z

Related #2314

jbrockmendel · 2017-06-02T18:41:40Z

Related: mlemodel.nobs_effective

josef-pkt added comp-tsa design type-refactor labels May 23, 2017

jbrockmendel mentioned this issue May 29, 2017

WIP: Dimensions Mixins #3725

Closed

This was referenced Oct 4, 2017

base.data.ModelData handles two unrelated tasks #3932

Closed

Pythonic data #3995

Closed

jbrockmendel mentioned this issue Oct 12, 2017

SUMM: Topic Week - Internal Consistency #4030

Open

jbrockmendel closed this as completed Dec 25, 2017

jbrockmendel reopened this Mar 10, 2018

jbrockmendel mentioned this issue May 12, 2019

Issue Labels #5708

Open
