Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Add naive seasonal decomposition function #1484

Merged
merged 27 commits into from Apr 4, 2014

Conversation

Projects
None yet
3 participants
@jseabold
Copy link
Member

commented Mar 15, 2014

This adds a function seasonal_decompose that does a naive seasonal decomposition. It's pretty much the same as R's decompose function as far as functionality goes. I doubt anyone would ever want to use this in practice vs. model-based approaches, but it's ok for a quick look I suppose, and it gets the ball rolling a bit more. It needs a namespace decision. I'm thinking tsa.seasonal. It's not right/obvious in the filters namespace really. I also need to add a note and example to the release docs.

I expect that the first commit will be more contentious. I renamed arfilter -> convolution_filter and I added a true arfilter as recursive_filter. Unfortunately, they don't handle missing data well because of scipy/scipy#3454. I probably won't tackle that on this PR. I also didn't add any tests for these because I ran out of steam, and I just wan't sure what you were after for the 2d cases. I think recursive_filter will need to consider many cases 2d x, 2d filt, and/or 2d init. I think it'd be trivial to add some tests. E.g., just do the same thing over and over again in different dimensions. I'm just not seeing it in the code yet. It also needs to handle pandas-in pandas-out and dates with the wrapper functions.

@josef-pkt

This comment has been minimized.

Copy link
Member

commented Mar 15, 2014

I will look at it tonight.

overall for 2d, I was thinking of two or three versions

  • parallel: filter many time series at the same time
  • VAR like filter filter several time series with cross-feedback
  • ARX, distributed lag, miso_filter (multi input single output), with univariate y but including exog

several things I didn't manage because scipy didn't have proper multivariate filters. scipy.signal has improved a lot since then.

for some parts I tried to replicate built-in functions in GAUSS, to try to replicate some research code of econometricians I knew.

for the usecases the filter can go in two directions depending whether we want to estimate (get y_predicted/one-step-ahead prediction, unknown error) or we want to simulate (get y, observed error)

@jseabold

This comment has been minimized.

Copy link
Member Author

commented Mar 15, 2014

Ok, yeah. That's roughly what I thought. Parallel is easy. The rest, I didn't really look at. Might be broken now. convolution_filter should do whatever it did. I took out the 3d stuff since the code never got there, and I just wanted to get what I had working.

Looks like I'm on too new numpy. Hard to beleive nanmean was added in 1.8.0.

@coveralls

This comment has been minimized.

Copy link

commented Mar 15, 2014

Coverage Status

Coverage remained the same when pulling a2e75ee on jseabold:seaonal-decomp into e1be28e on statsmodels:master.

@jseabold

This comment has been minimized.

Copy link
Member Author

commented Mar 17, 2014

I'd like to merge this to use it for the tests for exponential smoothing. Once we have a decomposed function it's easy to make many different types of series. Trend, no-trend, exponential trend, no-trend seasonality, etc.

Should I just pull out the untested cases from recursive_filter and convolution_filter?

Alternatively, we leave them in. They never really were tested and should work as they were (with correct names now). The only thing I know doesn't work right is init for greater than 1d series and/or filt in recursive_filter. I think init 2d + filt 2d + x 2d requires many permutations. (More than I care to work on.)

@jseabold

This comment has been minimized.

Copy link
Member Author

commented Mar 17, 2014

Actually, I can just rebase on top of this and work off that then rebase off master once it's settled. Not pressing.

@coveralls

This comment has been minimized.

Copy link

commented Mar 19, 2014

Coverage Status

Coverage remained the same when pulling ebac972 on jseabold:seaonal-decomp into e1be28e on statsmodels:master.

@jseabold

This comment has been minimized.

Copy link
Member Author

commented Mar 28, 2014

I'm wanting to go ahead and merge this. What was there wasn't strictly correct (it wasn't an AR filter). Now the part's that used at least we know is correct. and things are correctly delineated. One possible option is just to disable the parts I'm not sure about if they're correct. I really don't want to take the time to write tests for the other parts of this code.

@jseabold

This comment has been minimized.

Copy link
Member Author

commented Mar 28, 2014

Cherry-picked a few commits from #1489. That will still need a rebase before merging. Might still want to consider using the pandas wrappers decorators that tsa (and predict, eventually) as actual decorators. I'm just mocking the decorator pattern with two lines rather than using it for what it's for right now.

I decided that fun is overrated and added tests and pandas wrappers to recursive_filter and convolution_filter. I still am not testing the case for filt 2d and init 2d for recursive. I don't really want to get into this for this PR...

There's also a plot method for the result now. This is good to merge as far as I'm concerned. 2d filtered coefficients should be looked and made to work (if they don't) at some point though.

@jseabold

This comment has been minimized.

Copy link
Member Author

commented Mar 28, 2014

I also added a note about the backwards incompatible changes to removing the incorrect arfilter.

@jseabold

This comment has been minimized.

Copy link
Member Author

commented Apr 4, 2014

I went ahead and removed the 2d stuff for the recursive filter. signal.lfilter doesn't handle 2d inputs and the user can just as easily get what they want with a loop instead of our doing it. I cleared up and tested what is going on with the convolution_filter in the case of 2d inputs (also just a loop, but I left it since we handle the case of single column y and 2d filter). Our use case is different than just handing off 2d inputs to traditional signal.convolve I think. We can revisit in the future, if there are use cases it should be expanded for.

----------
x : array-like
Time-series data. Should be 1d or n x 1.
filt : array-like

This comment has been minimized.

Copy link
@josef-pkt

josef-pkt Apr 4, 2014

Member

not worth saving two letters filter

This comment has been minimized.

Copy link
@josef-pkt

josef-pkt Apr 4, 2014

Member

actually, better would be filter_coeff because it's not the polynomial it's the AR coefficients (negative and no leading 1)

This comment has been minimized.

Copy link
@jseabold

jseabold Apr 4, 2014

Author Member

filter is a "reserved word"

This comment has been minimized.

Copy link
@jseabold

jseabold Apr 4, 2014

Author Member

Eh, really? It makes everything more verbose without really adding much IMO given it's laid out for you in the docs.

This comment has been minimized.

Copy link
@josef-pkt

josef-pkt Apr 4, 2014

Member

I'm NOT reading docs, or at least, I don't want to look at them each time I use a function.

Don't use the same word to mean two different things. If I ever have to use this function, my first guess would be that it's a AR filter, i.e. a lag polynomial. (ar versus ar_coeffs is very clear in my opinion.)

(However, pandas became very popular with 2 and 3 letter names, -- I still haven't figured it out.)

This comment has been minimized.

Copy link
@jseabold

jseabold Apr 4, 2014

Author Member

Again, not reading the docs isn't a valid argument for me. I don't write them to not be looked at.

Where are we using the same word to mean two different things? The docs should be cleared up to call them AR coefficients (as it is in R).

I'd rather go back to a than filter_coeffs.

-------
y : array
Filtered array, number of columns determined by x and filt. If a
pandas object is given, a pandas object is returned.

This comment has been minimized.

Copy link
@josef-pkt

josef-pkt Apr 4, 2014

Member

add comment about return length, is it truncated or "same"?

Computes the recursive filter ::
y[n] = x[n] + filt[0]*y[n-1] + ... + a[n_filt-1]*y[n-n_filt]

This comment has been minimized.

Copy link
@josef-pkt

josef-pkt Apr 4, 2014

Member

put x[n] at the back so it looks more like an AR process

see Notes
filt : array_like
Linear filter coefficients in reverse time-order. Must have the
same number of dimensions as x.

This comment has been minimized.

Copy link
@josef-pkt

josef-pkt Apr 4, 2014

Member

filter could be 1d and we add the second axis

This comment has been minimized.

Copy link
@jseabold

jseabold Apr 4, 2014

Author Member

I wanted to preserve shape-in, shape-out and reduce the amount of shuffling code we have to do and simplify the code. Same restrictions as convolve except we treat 2d x 2d filter different when the second dim isn't 1.

filt : array_like
Linear filter coefficients in reverse time-order. Must have the
same number of dimensions as x.
nsides : int, optional

This comment has been minimized.

Copy link
@josef-pkt

josef-pkt Apr 4, 2014

Member

n_sides ?

This comment has been minimized.

Copy link
@jseabold

jseabold Apr 4, 2014

Author Member

Similar to maxiter, niter, etc. I think nsides is probably clear enough.

Linear filter coefficients in reverse time-order. Must have the
same number of dimensions as x.
nsides : int, optional
If 2, a centered moving average is computed using the filter

This comment has been minimized.

Copy link
@josef-pkt

josef-pkt Apr 4, 2014

Member

how is the centering if filter length is even?

This comment has been minimized.

Copy link
@jseabold

jseabold Apr 4, 2014

Author Member

See the notes.

This comment has been minimized.

Copy link
@jseabold

jseabold Apr 4, 2014

Author Member

I actually need to update them. I changed this and the reverse is true now.

import numpy as np


def test_pandas_freq_decorator():

This comment has been minimized.

Copy link
@josef-pkt

josef-pkt Apr 4, 2014

Member

should this go to test module ?

This comment has been minimized.

Copy link
@jseabold

jseabold Apr 4, 2014

Author Member

This file still WIP and probably won't stay like this. I'm using it also in the exponential smoothing PR. I'll likely settle it down in that PR or after when I fix also the other filters. I don't think I'm actually using the decorators anywhere yet, I'm still getting a sense of what we're going to need here. I can do utils -> _utils, if better.

return np.array([pd_nanmean(x[i::freq]) for i in range(freq)])


def seasonal_decompose(X, model="additive", filt=None, freq=None):

This comment has been minimized.

Copy link
@josef-pkt

josef-pkt Apr 4, 2014

Member

why capital X?

filter_coeff ?

This comment has been minimized.

Copy link
@josef-pkt

josef-pkt Apr 4, 2014

Member

convolution actually uses the full filter, ma-polynomial, not _coeff

This comment has been minimized.

Copy link
@jseabold

jseabold Apr 4, 2014

Author Member

X -> x is fine. I'm not really too keen about filter_coeff.


# nan pad for conformability - convolve doesn't do it
if model.startswith('m'):
detrended = X/trend

This comment has been minimized.

Copy link
@josef-pkt

josef-pkt Apr 4, 2014

Member

pep pep-8 spaces

@@ -131,101 +156,148 @@ def fftconvolve3(in1, in2=None, in3=None, mode="full"):

#original changes and examples in sandbox.tsa.try_var_convolve
#examples and tests are there
def arfilter(x, a):
'''apply an autoregressive filter to a series x
def recursive_filter(x, filt, init=None):

This comment has been minimized.

Copy link
@josef-pkt

josef-pkt Apr 4, 2014

Member

filt -> filter

@josef-pkt

This comment has been minimized.

Copy link
Member

commented Apr 4, 2014

I don't really have a strong opinion, since I haven't been working on this for a long time.

Dropping the extra dimension handling is fine, until we run into usecases where we need it. (I don't remember which multivariate dynamic model I had in mind.)

@jseabold

This comment has been minimized.

Copy link
Member Author

commented Apr 4, 2014

I just wanted tested code in there and didn't want to spend anymore time correcting what was only a helper function for my original need. Easy to extend later, if someone comes knocking, but no one noticed before. We'll see.

@jseabold

This comment has been minimized.

Copy link
Member Author

commented Apr 4, 2014

I addressed all of the comments. We'll see what travis says. I'd like to merge this asap to get it off my plate and we can come back around to it when/if we need to.

@coveralls

This comment has been minimized.

Copy link

commented Apr 4, 2014

Coverage Status

Coverage remained the same when pulling 91d0682 on jseabold:seaonal-decomp into b2cefd2 on statsmodels:master.

@jseabold

This comment has been minimized.

Copy link
Member Author

commented Apr 4, 2014

Test failure is unrelated. I'm going to rebase and merge this.

jseabold added a commit that referenced this pull request Apr 4, 2014

Merge pull request #1484 from jseabold/seaonal-decomp
ENH: Add naive seasonal decomposition function

@jseabold jseabold merged commit ec765bb into statsmodels:master Apr 4, 2014

@jseabold jseabold deleted the jseabold:seaonal-decomp branch Apr 4, 2014

@josef-pkt josef-pkt added the PR label Apr 14, 2014

PierreBdR pushed a commit to PierreBdR/statsmodels that referenced this pull request Sep 2, 2014

Merge pull request statsmodels#1484 from jseabold/seaonal-decomp
ENH: Add naive seasonal decomposition function
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.