ENH: Add naive seasonal decomposition function #1484

Merged
merged 27 commits into from Apr 4, 2014

Projects

None yet

3 participants

@jseabold
Member

This adds a function seasonal_decompose that does a naive seasonal decomposition. It's pretty much the same as R's decompose function as far as functionality goes. I doubt anyone would ever want to use this in practice vs. model-based approaches, but it's ok for a quick look I suppose, and it gets the ball rolling a bit more. It needs a namespace decision. I'm thinking tsa.seasonal. It's not right/obvious in the filters namespace really. I also need to add a note and example to the release docs.

I expect that the first commit will be more contentious. I renamed arfilter -> convolution_filter and I added a true arfilter as recursive_filter. Unfortunately, they don't handle missing data well because of scipy/scipy#3454. I probably won't tackle that on this PR. I also didn't add any tests for these because I ran out of steam, and I just wan't sure what you were after for the 2d cases. I think recursive_filter will need to consider many cases 2d x, 2d filt, and/or 2d init. I think it'd be trivial to add some tests. E.g., just do the same thing over and over again in different dimensions. I'm just not seeing it in the code yet. It also needs to handle pandas-in pandas-out and dates with the wrapper functions.

@josef-pkt
Member

I will look at it tonight.

overall for 2d, I was thinking of two or three versions

  • parallel: filter many time series at the same time
  • VAR like filter filter several time series with cross-feedback
  • ARX, distributed lag, miso_filter (multi input single output), with univariate y but including exog

several things I didn't manage because scipy didn't have proper multivariate filters. scipy.signal has improved a lot since then.

for some parts I tried to replicate built-in functions in GAUSS, to try to replicate some research code of econometricians I knew.

for the usecases the filter can go in two directions depending whether we want to estimate (get y_predicted/one-step-ahead prediction, unknown error) or we want to simulate (get y, observed error)

@jseabold
Member

Ok, yeah. That's roughly what I thought. Parallel is easy. The rest, I didn't really look at. Might be broken now. convolution_filter should do whatever it did. I took out the 3d stuff since the code never got there, and I just wanted to get what I had working.

Looks like I'm on too new numpy. Hard to beleive nanmean was added in 1.8.0.

@coveralls

Coverage Status

Coverage remained the same when pulling a2e75ee on jseabold:seaonal-decomp into e1be28e on statsmodels:master.

@jseabold
Member

I'd like to merge this to use it for the tests for exponential smoothing. Once we have a decomposed function it's easy to make many different types of series. Trend, no-trend, exponential trend, no-trend seasonality, etc.

Should I just pull out the untested cases from recursive_filter and convolution_filter?

Alternatively, we leave them in. They never really were tested and should work as they were (with correct names now). The only thing I know doesn't work right is init for greater than 1d series and/or filt in recursive_filter. I think init 2d + filt 2d + x 2d requires many permutations. (More than I care to work on.)

@jseabold
Member

Actually, I can just rebase on top of this and work off that then rebase off master once it's settled. Not pressing.

@coveralls

Coverage Status

Coverage remained the same when pulling ebac972 on jseabold:seaonal-decomp into e1be28e on statsmodels:master.

@jseabold
Member

I'm wanting to go ahead and merge this. What was there wasn't strictly correct (it wasn't an AR filter). Now the part's that used at least we know is correct. and things are correctly delineated. One possible option is just to disable the parts I'm not sure about if they're correct. I really don't want to take the time to write tests for the other parts of this code.

@jseabold
Member

Cherry-picked a few commits from #1489. That will still need a rebase before merging. Might still want to consider using the pandas wrappers decorators that tsa (and predict, eventually) as actual decorators. I'm just mocking the decorator pattern with two lines rather than using it for what it's for right now.

I decided that fun is overrated and added tests and pandas wrappers to recursive_filter and convolution_filter. I still am not testing the case for filt 2d and init 2d for recursive. I don't really want to get into this for this PR...

There's also a plot method for the result now. This is good to merge as far as I'm concerned. 2d filtered coefficients should be looked and made to work (if they don't) at some point though.

@jseabold
Member

I also added a note about the backwards incompatible changes to removing the incorrect arfilter.

@jseabold
Member
jseabold commented Apr 4, 2014

I went ahead and removed the 2d stuff for the recursive filter. signal.lfilter doesn't handle 2d inputs and the user can just as easily get what they want with a loop instead of our doing it. I cleared up and tested what is going on with the convolution_filter in the case of 2d inputs (also just a loop, but I left it since we handle the case of single column y and 2d filter). Our use case is different than just handing off 2d inputs to traditional signal.convolve I think. We can revisit in the future, if there are use cases it should be expanded for.

@josef-pkt josef-pkt and 1 other commented on an outdated diff Apr 4, 2014
statsmodels/tsa/filters/filtertools.py
@@ -131,101 +156,148 @@ def fftconvolve3(in1, in2=None, in3=None, mode="full"):
#original changes and examples in sandbox.tsa.try_var_convolve
#examples and tests are there
-def arfilter(x, a):
- '''apply an autoregressive filter to a series x
+def recursive_filter(x, filt, init=None):
+ '''
+ Autoregressive, or recursive, filtering.
+
+ Parameters
+ ----------
+ x : array-like
+ Time-series data. Should be 1d or n x 1.
+ filt : array-like
@josef-pkt
josef-pkt Apr 4, 2014 Member

not worth saving two letters filter

@josef-pkt
josef-pkt Apr 4, 2014 Member

actually, better would be filter_coeff because it's not the polynomial it's the AR coefficients (negative and no leading 1)

@jseabold
jseabold Apr 4, 2014 Member

filter is a "reserved word"

@jseabold
jseabold Apr 4, 2014 Member

Eh, really? It makes everything more verbose without really adding much IMO given it's laid out for you in the docs.

@josef-pkt
josef-pkt Apr 4, 2014 Member

I'm NOT reading docs, or at least, I don't want to look at them each time I use a function.

Don't use the same word to mean two different things. If I ever have to use this function, my first guess would be that it's a AR filter, i.e. a lag polynomial. (ar versus ar_coeffs is very clear in my opinion.)

(However, pandas became very popular with 2 and 3 letter names, -- I still haven't figured it out.)

@jseabold
jseabold Apr 4, 2014 Member

Again, not reading the docs isn't a valid argument for me. I don't write them to not be looked at.

Where are we using the same word to mean two different things? The docs should be cleared up to call them AR coefficients (as it is in R).

I'd rather go back to a than filter_coeffs.

@josef-pkt josef-pkt commented on an outdated diff Apr 4, 2014
statsmodels/tsa/filters/filtertools.py
+
+ Parameters
+ ----------
+ x : array-like
+ Time-series data. Should be 1d or n x 1.
+ filt : array-like
+ AR lag polynomial. See Notes
+ init : array-like
+ Initial values of the time-series prior to the first value of y.
+ The default is zero.
+
+ Returns
+ -------
+ y : array
+ Filtered array, number of columns determined by x and filt. If a
+ pandas object is given, a pandas object is returned.
@josef-pkt
josef-pkt Apr 4, 2014 Member

add comment about return length, is it truncated or "same"?

@josef-pkt josef-pkt commented on an outdated diff Apr 4, 2014
statsmodels/tsa/filters/filtertools.py
+ init : array-like
+ Initial values of the time-series prior to the first value of y.
+ The default is zero.
+
+ Returns
+ -------
+ y : array
+ Filtered array, number of columns determined by x and filt. If a
+ pandas object is given, a pandas object is returned.
+
+ Notes
+ -----
+
+ Computes the recursive filter ::
+
+ y[n] = x[n] + filt[0]*y[n-1] + ... + a[n_filt-1]*y[n-n_filt]
@josef-pkt
josef-pkt Apr 4, 2014 Member

put x[n] at the back so it looks more like an AR process

@josef-pkt josef-pkt and 1 other commented on an outdated diff Apr 4, 2014
statsmodels/tsa/filters/filtertools.py
Parameters
----------
x : array_like
data array, 1d or 2d, if 2d then observations in rows
- a : array_like
- autoregressive filter coefficients, ar lag polynomial
- see Notes
+ filt : array_like
+ Linear filter coefficients in reverse time-order. Must have the
+ same number of dimensions as x.
@josef-pkt
josef-pkt Apr 4, 2014 Member

filter could be 1d and we add the second axis

@jseabold
jseabold Apr 4, 2014 Member

I wanted to preserve shape-in, shape-out and reduce the amount of shuffling code we have to do and simplify the code. Same restrictions as convolve except we treat 2d x 2d filter different when the second dim isn't 1.

@josef-pkt josef-pkt and 1 other commented on an outdated diff Apr 4, 2014
statsmodels/tsa/filters/filtertools.py
Parameters
----------
x : array_like
data array, 1d or 2d, if 2d then observations in rows
- a : array_like
- autoregressive filter coefficients, ar lag polynomial
- see Notes
+ filt : array_like
+ Linear filter coefficients in reverse time-order. Must have the
+ same number of dimensions as x.
+ nsides : int, optional
@josef-pkt
josef-pkt Apr 4, 2014 Member

n_sides ?

@jseabold
jseabold Apr 4, 2014 Member

Similar to maxiter, niter, etc. I think nsides is probably clear enough.

@josef-pkt josef-pkt and 1 other commented on an outdated diff Apr 4, 2014
statsmodels/tsa/filters/filtertools.py
Parameters
----------
x : array_like
data array, 1d or 2d, if 2d then observations in rows
- a : array_like
- autoregressive filter coefficients, ar lag polynomial
- see Notes
+ filt : array_like
+ Linear filter coefficients in reverse time-order. Must have the
+ same number of dimensions as x.
+ nsides : int, optional
+ If 2, a centered moving average is computed using the filter
@josef-pkt
josef-pkt Apr 4, 2014 Member

how is the centering if filter length is even?

@jseabold
jseabold Apr 4, 2014 Member

See the notes.

@jseabold
jseabold Apr 4, 2014 Member

I actually need to update them. I changed this and the reverse is true now.

@josef-pkt josef-pkt and 1 other commented on an outdated diff Apr 4, 2014
statsmodels/tsa/filters/utils.py
+
+def dummy_func_array(X):
+ return X.values
+
+def dummy_func_pandas_columns(X):
+ return X.values
+
+
+def dummy_func_pandas_series(X):
+ return X['A']
+
+import pandas as pd
+import numpy as np
+
+
+def test_pandas_freq_decorator():
@josef-pkt
josef-pkt Apr 4, 2014 Member

should this go to test module ?

@jseabold
jseabold Apr 4, 2014 Member

This file still WIP and probably won't stay like this. I'm using it also in the exponential smoothing PR. I'll likely settle it down in that PR or after when I fix also the other filters. I don't think I'm actually using the decorators anywhere yet, I'm still getting a sense of what we're going to need here. I can do utils -> _utils, if better.

@josef-pkt josef-pkt and 1 other commented on an outdated diff Apr 4, 2014
statsmodels/tsa/seasonal.py
+from .filters.utils import _maybe_get_pandas_wrapper_freq
+from .filters.filtertools import convolution_filter
+from statsmodels.tsa.tsatools import freq_to_period
+from statsmodels.tools.tools import Bunch
+
+
+def seasonal_mean(x, freq):
+ """
+ Return means for each period in x. freq is an int that gives the
+ number of periods per cycle. E.g., 12 for monthly. NaNs are ignored
+ in the mean.
+ """
+ return np.array([pd_nanmean(x[i::freq]) for i in range(freq)])
+
+
+def seasonal_decompose(X, model="additive", filt=None, freq=None):
@josef-pkt
josef-pkt Apr 4, 2014 Member

why capital X?

filter_coeff ?

@josef-pkt
josef-pkt Apr 4, 2014 Member

convolution actually uses the full filter, ma-polynomial, not _coeff

@jseabold
jseabold Apr 4, 2014 Member

X -> x is fine. I'm not really too keen about filter_coeff.

@josef-pkt josef-pkt commented on an outdated diff Apr 4, 2014
statsmodels/tsa/seasonal.py
+ elif freq is None:
+ raise ValueError("You must specify a freq or X must be a "
+ "pandas object with a timeseries index")
+
+
+ if filt is None:
+ if freq % 2 == 0: # split weights at ends
+ filt = np.array([.5] + [1] * (freq - 1) + [.5]) / freq
+ else:
+ filt = np.repeat(1./freq, freq)
+
+ trend = convolution_filter(X, filt)
+
+ # nan pad for conformability - convolve doesn't do it
+ if model.startswith('m'):
+ detrended = X/trend
@josef-pkt
josef-pkt Apr 4, 2014 Member

pep pep-8 spaces

@josef-pkt josef-pkt commented on an outdated diff Apr 4, 2014
statsmodels/tsa/filters/filtertools.py
@@ -131,101 +156,148 @@ def fftconvolve3(in1, in2=None, in3=None, mode="full"):
#original changes and examples in sandbox.tsa.try_var_convolve
#examples and tests are there
-def arfilter(x, a):
- '''apply an autoregressive filter to a series x
+def recursive_filter(x, filt, init=None):
@josef-pkt
josef-pkt Apr 4, 2014 Member

filt -> filter

@josef-pkt
Member

I don't really have a strong opinion, since I haven't been working on this for a long time.

Dropping the extra dimension handling is fine, until we run into usecases where we need it. (I don't remember which multivariate dynamic model I had in mind.)

@jseabold
Member
jseabold commented Apr 4, 2014

I just wanted tested code in there and didn't want to spend anymore time correcting what was only a helper function for my original need. Easy to extend later, if someone comes knocking, but no one noticed before. We'll see.

@jseabold
Member
jseabold commented Apr 4, 2014

I addressed all of the comments. We'll see what travis says. I'd like to merge this asap to get it off my plate and we can come back around to it when/if we need to.

@coveralls

Coverage Status

Coverage remained the same when pulling 91d0682 on jseabold:seaonal-decomp into b2cefd2 on statsmodels:master.

@jseabold
Member
jseabold commented Apr 4, 2014

Test failure is unrelated. I'm going to rebase and merge this.

jseabold added some commits Jun 8, 2012
@jseabold jseabold ENH: Add seasonal_decompose function. 4d43cab
@jseabold jseabold REF: Refactor arfilter. Correct name and split. Closes #1478. de46fa9
@jseabold jseabold ENH: Use pandas nanmean. Numpy too new. 6c3ce5a
@jseabold jseabold TST: Old pandas can't assign nan to int series. 4342711
@jseabold jseabold REF: Convert to array before checking finiteness. 2aba184
@jseabold jseabold ENH: Let non-ts pandas objects through. Need to set freq. e028cc2
@jseabold jseabold ENH: Return pass through function 8539f6f
@jseabold jseabold ENH: Check MS not MS- for freq bf2ba32
@jseabold jseabold ENH: Pull out seasonal_mean function. e73a6b4
@jseabold jseabold ENH: Use convolve to properly handle missing values. 28d0682
@jseabold jseabold WIP: Start work on pandas io decorators. c10948c
@jseabold jseabold REF: Update code for pandas wrapper refactor b1d6687
@jseabold jseabold ENH: Add pandas wrappers. Test filters. b67f47f
@jseabold jseabold REF: Move seasonal out of filters. 3eaf69a
@jseabold jseabold ENH: Add a results class with a plotting method. 6060b7d
@jseabold jseabold DOC: Add seasonal_decompose to release notes. 8eecb30
@jseabold jseabold REF: Make sure x and filt have same dimensions. 6f6a490
@jseabold jseabold ENH: Only allow 1d inputs to recursive_filter 54aae7c
@jseabold jseabold ENH: Clear up behavior for 2d convolution. ad29a4e
@jseabold jseabold TST: Test 2d convolution filter. e58675b
@jseabold jseabold BUG: Fix indent a9b6bc6
@jseabold jseabold STY: X -> x 47d335f
@jseabold jseabold STY: Pep-8 f29ad98
@jseabold jseabold DOC: Clarifications and corrections. [skip ci] a3009ef
@jseabold jseabold STY: filt -> ar_coeffs for clarity 2481973
@jseabold jseabold ENH: Allow 1d filt with 2d x 96a9e69
@jseabold jseabold REF: Make WIP utils private. 5577d39
@jseabold jseabold merged commit ec765bb into statsmodels:master Apr 4, 2014
@jseabold jseabold deleted the jseabold:seaonal-decomp branch Apr 4, 2014
@josef-pkt josef-pkt added the PR label Apr 14, 2014
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment