Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Wrap X12/X13-ARIMA AUTOMDL. Closes #442. #1587

Merged
merged 22 commits into from Sep 24, 2014

Conversation

Projects
None yet
6 participants
@jseabold
Copy link
Member

commented Apr 10, 2014

This works but is still WIP for some cleanup.

  • add to release notes
  • add tests (will have to run locally only unless we build x12 on travis/throw up a binary somewhere)
  • make sure API is sensible
  • deal with x12path systematically, allow setting at build time, allow setting via a function
  • finish docstrings
  • download links for the binaries
  • add to optional dependencies

The spec code in there should allow wrapping more than AUTOMDL, but I don't know that we really need that. We could, for example, allow use of x12arima as a method in ARIMA and use it's seasonal ARIMA until we have a native solution.

@coveralls

This comment has been minimized.

Copy link

commented Apr 10, 2014

Coverage Status

Coverage remained the same when pulling 3aae32f on jseabold:add-x12 into 5171e28 on statsmodels:master.

@bashtage

This comment has been minimized.

Copy link
Contributor

commented Apr 10, 2014

Nice addition.

@josef-pkt josef-pkt added PR labels Apr 14, 2014

@coveralls

This comment has been minimized.

Copy link

commented Apr 14, 2014

Coverage Status

Coverage remained the same when pulling fcfeebb on jseabold:add-x12 into d7ff182 on statsmodels:master.

@@ -13,3 +13,4 @@
from .base import datetools
from .seasonal import seasonal_decompose
from ..graphics import tsaplots as graphics
import x12

This comment has been minimized.

Copy link
@josef-pkt

josef-pkt Apr 15, 2014

Member

from . import x12

@josef-pkt

This comment has been minimized.

Copy link
Member

commented Apr 15, 2014

some docs in how to get X12/X13 and what needs to be on the path is also needed.
download link ?
https://www.census.gov/srd/www/x13as/

I think once it's working, additional usecases will show up where this could be expanded.

@jseabold

This comment has been minimized.

Copy link
Member Author

commented Apr 15, 2014

I'm not sure on windows. I might need some help here. I've always built it from source, so I'm not sure all the different binary names you might get. We just need x12a or x13a (x13as?) on the path for linux.

@josef-pkt

This comment has been minimized.

Copy link
Member

commented Apr 15, 2014

x13as.exe
most likely it's easy, unzip and add to system path (or specify path variable)

E:\Josef\_progs\x13as>x13as Testairline -s -o testout

X-13ARIMA-SEATS Seasonal Adjustment Program
Version Number 1.1 Build 9
Execution began  Apr 14, 2014  22.05.58
@jseabold

This comment has been minimized.

Copy link
Member Author

commented Apr 15, 2014

Apparently the specs for x12 and x13 are different. x13 provides a converter utility. I don't know if it makes sense to rely on this utility. Probably best to have native support for both specs. Or just target x13?

@josef-pkt

This comment has been minimized.

Copy link
Member

commented Apr 15, 2014

I don't see a reason to support x12 if the latest version is x13.
We don't have legacy code to support.

How easy is it to install x13 parallel to x12 on Linux? For users who already have x12.
I don't see a problem on Windows because it's just an unzip as far as I saw.

@jseabold

This comment has been minimized.

Copy link
Member Author

commented Apr 15, 2014

Well the reason to support it is that I wrote the specs for x12 first.

It's trivial to install the new one. I just built it, and they also provide binaries for linux.

I'll see if there are any changes to the specs we use. Hopefully, they both work out of the box.

@jseabold

This comment has been minimized.

Copy link
Member Author

commented Apr 15, 2014

Looks like these specifications were unaffected. Seems to run fine in x12 or x13. I'm thinking about changing the namespace to x13 though. I added a note that we use x12 and x13 interchangeably but both should work.

@jseabold

This comment has been minimized.

Copy link
Member Author

commented Apr 15, 2014

Re: namespace. What's the proper way to only export these two functions in __all__ to the api? I thought if all was defined then it would only tab-complete those, but that's not the case. Should both functions be preceded by x13 and we just import this in the tsa namespace? I.e.,

sm.tsa.x13_select_arima_order
sm.tsa.x13_arima_analysis
@jseabold

This comment has been minimized.

Copy link
Member Author

commented Apr 15, 2014

I slightly prefer sm.tsa.x13.select_arima_order and sm.tsa.x13.arima_analysis but not enough to bend over backwards to make it happen.

@josef-pkt

This comment has been minimized.

Copy link
Member

commented Apr 15, 2014

If it's just adding two functions, then I would prefer the underline names, not really worth starting a new namespace.
If the plan is to extend this, and we get a collection of functions, then a new namespace would be fine.

y X ?

return series_spec


def x13arima_analysis(y, x12path=None, X=None, log=None, outlier=True,

This comment has been minimized.

Copy link
@josef-pkt

josef-pkt Apr 15, 2014

Member

I think x12path should go later, assuming that in general the path will be picked up automatically and doesn't have to be given on each function call.
maxorder and maxdiff look like higher priority arguments to me.

(spot check: I haven't looked at more than the signature, and whether it's really a function)

This comment has been minimized.

Copy link
@jseabold

jseabold Apr 15, 2014

Author Member

These are the kind of checks that are helpful. Now that it's general enough to be smart, I think this makes sense. I'm using keyword args in all my user code for this so far.

This comment has been minimized.

Copy link
@josef-pkt

josef-pkt Apr 15, 2014

Member

question: Does this only allow for annual seasonality? Is seasonality inferred from freq, or is there a way to set it explicitly?

This comment has been minimized.

Copy link
@jseabold

jseabold Apr 15, 2014

Author Member

No.

Seasonality is inferred from the index, but if you pass an array-like with a start date and a freq, then it's settable. That said, it only allows quarterly and monthly with a note to expand this if it's desired for some reason.

return results, seasadj, trend, irregular, stdout, spec


def select_arima_order(y, x12path=None, X=None, log=None, outlier=True,

This comment has been minimized.

Copy link
@josef-pkt

josef-pkt Apr 15, 2014

Member

same as above, I would think that maxorder will be used most of the time by users

The maximum order of the regular and seasonal ARMA polynomials to
examine during the model identification. The order for the regular
polynomial must be greater than zero and no larger than 4. The
order for the seaonal polynomial may be 1 or 2.

This comment has been minimized.

Copy link
@josef-pkt

josef-pkt Apr 15, 2014

Member

typo: seasonal

is order <= 4 an x13 limitation? If we have monthly or higher frequency data, then we might want to try more lags.

This comment has been minimized.

Copy link
@jseabold

jseabold Apr 15, 2014

Author Member

I really can't type that word. Yes 4 is an x13 limitation. There are also series length, etc. limitations from x13. It may be possible to change it in the source and compile your own (there are some limitations you can change at your own peril), but we don't support this.


def _make_automdl_options(maxorder, maxdiff, diff):
options = "\n"
options += "maxorder = ({} {})\n".format(maxorder[0], maxorder[1])

This comment has been minimized.

Copy link
@josef-pkt

josef-pkt Apr 15, 2014

Member

IIRC, this is not valid python 2.6 syntax {0}, {1}

Changed in version 2.7: The positional argument specifiers can be omitted, so '{} {}' is equivalent to '{0} {1}'.

@jseabold

This comment has been minimized.

Copy link
Member Author

commented Apr 16, 2014

I think I've ticked off all the check boxes. The path isn't handled systematically in that it's a settable property at build-time or at run-time that will always be respected. We'll have to wait for that once we have a proper options system.


_binary_names = ('x13as.exe', 'x13as', 'x12a.exe', 'x12a')

class _freq_to_period:

This comment has been minimized.

Copy link
@bashtage

bashtage Apr 16, 2014

Contributor

Isn't this an old style class that doesn't work under Python 3?

This comment has been minimized.

Copy link
@jseabold

jseabold Apr 16, 2014

Author Member

I think all classes in Python 3 are new-style. We inherit from object in < 2.7 to ensure this. Could be wrong though.

This comment has been minimized.

Copy link
@josef-pkt

josef-pkt Apr 16, 2014

Member

We need (object) for python 2 to get new-style classes (which are not new at all anymore).
python 3 only has new-style classes, whether it does or doesn't inherit from object.

I think it would be better to stick to pep-8 capitalized class names

This comment has been minimized.

Copy link
@josef-pkt

josef-pkt Apr 16, 2014

Member

maybe this should inherit from dict ?

This comment has been minimized.

Copy link
@jseabold

jseabold Apr 16, 2014

Author Member

It's a private, internal only class for this module, so I'm not really that worried about it. Really, it should be replaced by "the way" to do this. I don't know exactly what that is since there's still a few loose ends floating around re: period-handling. I'll likely just file a ticket to look at this once I can come back and merge the exponential smoothing and I have a better idea.

@jseabold

This comment has been minimized.

Copy link
Member Author

commented Apr 16, 2014

One minor issue is where to put these in the docs. I stuck them with the tsa.stattools stuff in source/tsa.rst, but they might not "belong" there.

@coveralls

This comment has been minimized.

Copy link

commented Apr 16, 2014

Coverage Status

Coverage remained the same when pulling 18e2359 on jseabold:add-x12 into d7ff182 on statsmodels:master.

@coveralls

This comment has been minimized.

Copy link

commented Apr 16, 2014

Coverage Status

Coverage remained the same when pulling 7d895ed on jseabold:add-x12 into d7ff182 on statsmodels:master.

@coveralls

This comment has been minimized.

Copy link

commented Apr 16, 2014

Coverage Status

Coverage remained the same when pulling 7d895ed on jseabold:add-x12 into d7ff182 on statsmodels:master.

return series_spec


def x13_arima_analysis(y, maxorder=(2, 1), maxdiff=(2, 1), diff=None, X=None,

This comment has been minimized.

Copy link
@josef-pkt

josef-pkt Apr 16, 2014

Member

I would still prefer the more standard

def x13_arima_analysis(endog, exog=None, maxorder=(2, 1), maxdiff=(2, 1), diff=None, ...

This comment has been minimized.

Copy link
@jseabold

jseabold Apr 16, 2014

Author Member

Well, I'm still trying to move away from this, and I haven't had time to finish the great rename yet. I had to go through again last week people getting lost when I bring this up. Really don't want to get into this debate again right now though.

This comment has been minimized.

Copy link
@argriffing

argriffing Apr 18, 2014

If statsmodels were only an econometrics package then maybe people would immediately understand endog vs. exog, but coming from outside of econometrics this terminology is strange to me. When I see endog I just think of the shibe doge surrounded with econometrics jargon thought bubbles 'such pricing', 'much elasticity', and I forget what I was doing.

This comment has been minimized.

Copy link
@josef-pkt

josef-pkt Apr 18, 2014

Member

see #395 for the looooong discussion
The main point: None of the code that I'm reviewing has one letter names in public functions, and internally only if it's a temp variable within a few lines of code. (As a maintainer I try to "memorize" as few things as possible per line of code. That's the only way to keep track of 100,000 lines of code.)

This comment has been minimized.

Copy link
@argriffing

argriffing Apr 18, 2014

I see. It looks like an important part of your system!

This comment has been minimized.

Copy link
@josef-pkt

josef-pkt Apr 18, 2014

Member

maybe a bit, but x and y are neither descriptive nor specific names, and I didn't object too strongly to finding other "names" that are not letters.
Besides, I think most of those 97% almost never even need to see "endog" and "exog".

This comment has been minimized.

Copy link
@argriffing

argriffing Apr 18, 2014

Besides, I think most of those 97% almost never even need to see "endog" and "exog".

Aren't they literally the first thing you see in statsmodels docs after the function name? For example linear regression or multinomial logistic regression, unless most of the 97% are not going to look at docs.

This comment has been minimized.

Copy link
@jseabold

jseabold Apr 18, 2014

Author Member

Do you find their use to be confusing here?

This comment has been minimized.

Copy link
@argriffing

argriffing Apr 18, 2014

Personally I don't find it too confusing anymore, I just need a moment to get re-oriented to the statsmodels notation. But the first time I used statsmodels endog and exog sent me on a half-hour expedition to learn what they were all about. Most people probably would not have spent that long -- they would have either understood more quickly or they would have just moved on to something else.

After reading the discussion in #395 I can see how using endog and exog instead of y and x could help maintainability and development by putting more semantics into the variable names, especially if you are used to that system. I think it's OK that statsmodels has a handful of developers actively contributing and testing research code using eccentric but internally consistent notation, and I assume that other projects will start repackaging the code as it matures and give it a less eccentric interface.

This comment has been minimized.

Copy link
@jseabold

jseabold Apr 18, 2014

Author Member

Could you add this to #395? It would be good to keep all viewpoints in the same place.

@coveralls

This comment has been minimized.

Copy link

commented Apr 18, 2014

Coverage Status

Coverage remained the same when pulling b5b74a9 on jseabold:add-x12 into d7ff182 on statsmodels:master.

seasadj = _open_and_read(ftempout.name + '.d11')
trend = _open_and_read(ftempout.name + '.d12')
irregular = _open_and_read(ftempout.name + '.d13')
except X13Error, err:

This comment has been minimized.

Copy link
@josef-pkt

josef-pkt Apr 18, 2014

Member

not python 3 syntax use as err we don't need py 2.5 anymore.

BTW: What's the point in catching the exception and then raising it right away again?
try ... finally ... ?

@coveralls

This comment has been minimized.

Copy link

commented Apr 18, 2014

Coverage Status

Coverage remained the same when pulling 8f6b50b on jseabold:add-x12 into d7ff182 on statsmodels:master.

@jseabold jseabold force-pushed the jseabold:add-x12 branch from 28308bd to b354f24 Sep 24, 2014

@bashtage

This comment has been minimized.

Copy link
Contributor

commented Sep 24, 2014

Is this a GitHub bug? Isn't the x/y conversation half a year old?

@argriffing

This comment has been minimized.

Copy link

commented Sep 24, 2014

Is this a GitHub bug? Isn't the x/y conversation half a year old?

Nah, some of this PR had X which @jseabold is changing to exog, not half a year ago but like half an hour ago in jseabold@e1fff26

I'm not sure of the status of the x/y conversation, I assume stats in python will change to x/y one way or another, possibly involving statsmodels becoming de-facto replaced by some project where the developers are paid to be less eccentric.

@coveralls

This comment has been minimized.

Copy link

commented Sep 24, 2014

Coverage Status

Coverage decreased (-0.37%) when pulling b354f24 on jseabold:add-x12 into e27e5dc on statsmodels:master.

@jseabold

This comment has been minimized.

Copy link
Member Author

commented Sep 24, 2014

On Wed, Sep 24, 2014 at 3:26 PM, argriffing notifications@github.com
wrote:

Is this a GitHub bug? Isn't the x/y conversation half a year old?

Nah, some of this PR had X which @jseabold https://github.com/jseabold
is changing to exog, not half a year ago but like half an hour ago in
jseabold/statsmodels@e1fff26
jseabold@e1fff26

I'm not sure of the status of the x/y conversation, I assume stats in
python will change to x/y one way or another, possibly involving
statsmodels becoming de-facto replaced by some project where the developers
are paid to be less eccentric.

:/

@bashtage

This comment has been minimized.

Copy link
Contributor

commented Sep 24, 2014

I see. I don't mind either way, but that find/replace seems to have gone awry

"""
If x12path is not given, then either x13as[.exe] or x12a[.exe] must
be found on the PATH. Otherwise, the environmental variable exog12PATH or
X13PATH must be defined. If prefer_x13 is True, only X13PATH is searched
for. If it is false, only exog12PATH is searched for.
"""

exog12PATH

@jseabold jseabold force-pushed the jseabold:add-x12 branch from b354f24 to aa16c75 Sep 24, 2014

@jseabold

This comment has been minimized.

Copy link
Member Author

commented Sep 24, 2014

Should be fixed

@josef-pkt

This comment has been minimized.

Copy link
Member

commented Sep 24, 2014

find/replace seems to have gone awry

which I think proves my point

@argriffing If you want to fork statsmodels over x and y, then you are welcome to it. I won't have to maintain that fork. (You are then also free to use any other letter from the ASCII or greek alphabet.)

@jseabold

This comment has been minimized.

Copy link
Member Author

commented Sep 24, 2014

On Wed, Sep 24, 2014 at 3:51 PM, Josef Perktold notifications@github.com
wrote:

find/replace seems to have gone awry

which I think proves my point

That I was careless with my regex and only accounted for 13 and not 12?
That's a reason to alienate 98% of users?

@josef-pkt

This comment has been minimized.

Copy link
Member

commented Sep 24, 2014

That I was careless with my regex and only accounted for 13 and not 12?
That's a reason to alienate 98% of users?

I don't think 98% of users are alienated.
However, yes, you are the expert on regular expressions and are careless.
The last time I used a regular expression is several years ago, and for me it wouldn't be carelessness, I just wouldn't know, at least not without spending several hours trying out regexes.

Plus, we need a context sensitive regex in our head whenever we read a piece of code with x in it.

@coveralls

This comment has been minimized.

Copy link

commented Sep 24, 2014

Coverage Status

Coverage decreased (-0.37%) when pulling aa16c75 on jseabold:add-x12 into 57a876d on statsmodels:master.

@jseabold

This comment has been minimized.

Copy link
Member Author

commented Sep 24, 2014

On Wed, Sep 24, 2014 at 4:14 PM, Josef Perktold notifications@github.com
wrote:

That I was careless with my regex and only accounted for 13 and not 12?
That's a reason to alienate 98% of users?

I don't think 98% of users are alienated.

Again, this is based on the several straw polls I conduct w/1-200 people
each time I give a talk on statsmodels plus every discussion we've had
about this.

However, yes, you are the expert on regular expressions and are careless.

Do you want to get ad hominem now? That'll be even more fun for everyone.

The last time I used a regular expression is several years ago, and for me
it wouldn't be carelessness, I just wouldn't know, at least not without
spending several hours trying out regexes.

Plus, we need a context sensitive regex in our head whenever we read a
piece of code with x in it.

Or style guidelines, code comments, or documentation, but who needs those
right? Let's continue to imagine the horrors that could happen if people
take things to the sloppy extreme.

jseabold added a commit that referenced this pull request Sep 24, 2014

Merge pull request #1587 from jseabold/add-x12
ENH: Wrap X12/X13-ARIMA AUTOMDL. Closes #442.

@jseabold jseabold merged commit 61d3654 into statsmodels:master Sep 24, 2014

2 checks passed

continuous-integration/appveyor AppVeyor build succeeded
Details
continuous-integration/travis-ci The Travis CI build passed
Details

@jseabold jseabold deleted the jseabold:add-x12 branch Sep 24, 2014

@argriffing

This comment has been minimized.

Copy link

commented Sep 24, 2014

If you want to fork statsmodels over x and y, then you are welcome to it. I won't have to maintain that fork. (You are then also free to use any other letter from the ASCII or greek alphabet.)

Honestly I have only appreciation for the work that you guys put into making this free software, and this notation does not matter to me much at all. If I ever fork statsmodels over notation it will be because I want the variables named likelihood to actually be a likelihood, not a log likelihood or a negative log likelihood or penalized negative log likelihood or a loss function unrelated to likelihoods in any way :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.