Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: New Seasonal TimeSeries Model:TBATS/BATS/MTBATS #2892

Open
wants to merge 25 commits into
base: main
Choose a base branch
from

Conversation

Leoyzen
Copy link
Contributor

@Leoyzen Leoyzen commented Apr 14, 2016

A new timeseries forecast/decompose model based on Rob.J.Hyndman's paper.
http://robjhyndman.com/papers/complex-seasonality/

It is the same as the tbats model of R's forecast package.

Feature:

  1. Long Seasonality forecast (such as minutes data)
  2. Complex seasonality based on trigonometric transform(such as multiple seasonality/non-interger periods)
  3. BoxCox transform
  4. arma error
  5. exog support

work:

  • TBATS Model
  • BATS Model
  • MTBATS Model
  • MBATS Model
  • Documents
  • Related
    • Test case
    • IPython notebook examples
    • related datasets
  • Base Tools
    • Fourier Seasonal tools
    • Fourier forecast exog generate tool
    • Model auto select tool
    • Fourier K auto select tool

@Leoyzen Leoyzen changed the title ENH: New Seasonal TimeSeries Model:TBATS/BATS/MB ENH: New Seasonal TimeSeries Model:TBATS/BATS/MTBATS Apr 14, 2016
@coveralls
Copy link

Coverage Status

Coverage decreased (-0.02%) to 85.359% when pulling 35ed40c on Leoyzen:tbats into 970e99e on statsmodels:master.


import numpy as np
from numpy.linalg import LinAlgError
import statsmodels.api as sm
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAICS, this adds a circular import and causes the TravisCI test error.
inside the code we only import directly from the modules (or subpackages)

(I'm not sure what this circular import does and why it fails for some tests but not all.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm looking into it

@coveralls
Copy link

coveralls commented Jul 19, 2016

Coverage Status

Coverage decreased (-1.1%) to 88.052% when pulling b5d09f8 on Leoyzen:tbats into 9ccf83f on statsmodels:master.


if self.boxcox:
endog, lmbda = self.transform_boxcox(self.data.endog)
if np.isnan(lmbda) or not (0 <= lmbda <= 1):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As of the next commit you can specify bounds=(0, 1) as an optional argument to transform_boxcox. This will ensure optimisation takes place over the specified interval.

@coveralls
Copy link

coveralls commented Jul 19, 2016

Coverage Status

Coverage decreased (-1.1%) to 88.052% when pulling decd0b8 on Leoyzen:tbats into 9ccf83f on statsmodels:master.

@coveralls
Copy link

coveralls commented Jul 20, 2016

Coverage Status

Coverage decreased (-1.1%) to 88.067% when pulling 935de44 on Leoyzen:tbats into 9ccf83f on statsmodels:master.

@ChadFulton
Copy link
Member

I finally had a chance to look at Hyndman's papers and I'm starting to look this over.

elif key == 'transition':
self.ssm['transition', :-1, :-1] = value
elif key == 'selection':
self.ssm['transition', :-1, -1:] = value
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm guessing this should be 'selection', and the same below in __getitem__?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, follow the construction of your blog post which you gave to me it should be like this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see why trying to set the selection should set things in the transition matrix. Can you remind me which blog post? It could have been a typo on my part.

Given this, it looks like you never set the actual selection matrix, which I think would render the model useless? Or am I missing something?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I see, yes, you're correct; and now I see the setting of the actual selection matrix in the __init__ call, so that should be fine.

@Leoyzen
Copy link
Contributor Author

Leoyzen commented Jul 20, 2016

@ChadFulton Maybe you can help me to look at the start params initialization at arma error and regression part, actually I am not familar with this.

@coveralls
Copy link

coveralls commented Jul 20, 2016

Coverage Status

Coverage decreased (-1.1%) to 88.065% when pulling f234733 on Leoyzen:tbats into 9ccf83f on statsmodels:master.

----------
periods: iterable of float or None or array, optional
The period of the seasonal component. Default is None. Can be multiple
k: int or None, optional, should be same length of periods
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be renamed into a descriptive name.

@ChadFulton
Copy link
Member

@ChadFulton Maybe you can help me to look at the start params initialization at arma error and regression part, actually I am not familar with this.

Sure - you're asking about how to compute start_params for the ARMA and regression coefficients?

@ChadFulton
Copy link
Member

I can't quite tell what the status of this model is. It looks pretty complete, but there are no unit tests. Are things generally working, or is there something holding it up?

I know you were running into a problem with "variance initialization of the model", but I'm not sure what that means - is that still an issue?

@ChadFulton
Copy link
Member

It looks to me like this model pretty much works. Hard to test against the R version of TBATS because they allow larger parameter ranges for alpha and beta. I don't quite understand, but running:

library(forecast)
dta <- read.csv('datasets/macrodata/macrodata.csv')
endog <- log(dta$realgdp) * 400
res <- tbats(endog)

yields the following:

BATS(1, {0,0}, 0.994, -)

Call: tbats(y = endog)

Parameters
  Alpha: 1.23139
  Beta: -0.01928973
  Damping Parameter: 0.994394

Seed States:
            [,1]
[1,] 3134.685159
[2,]    5.292013

Sigma: 3.641562
AIC: 1613.3

alpha > 1 which is not a huge problem if TBATS is allowing the larger parameter range, but I thought beta was always supposed to be positive (e.g. see Hyndman's book, page 45).

I'm not an expert in these models though, so I may have misunderstood the applicable parameter ranges. @Leoyzen @davidljung do you know what's going on with R here?

@Leoyzen
Copy link
Contributor Author

Leoyzen commented Jul 21, 2016

@ChadFulton According to the file checkAdmissibility.R I don't see any limitation to the alpha/beta/gamma, so I just use the information from the book.

Your question is also my problem , it became more different result to TBATS of R.

The model is almost done here , but some issue with arma and variance initial coefs(within initialize_state function). I'm not familar with this, so please help.

The optimize should be done otherwise it takes long time and large nums of iteration to the fit, and I found that it's hard to optimization to converge with lbfgs.

My next goal is to get more docs and right code style of statsmodels.

@ChadFulton
Copy link
Member

@ChadFulton According to the file checkAdmissibility.R I don't see any limitation to the alpha/beta/gamma, so I just use the information from the book.

Your question is also my problem , it became more different result to TBATS of R.

Actually I think I have found the reference here - take a look at Table 10.1 in Hyndman's book. It looks like he's using the third set of constraints, which relax the positivity constraint if there is damping.

Copy link

@github-advanced-security github-advanced-security bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CodeQL found more than 20 potential problems in the proposed changes. Check the Files changed tab for more details.

@Leoyzen
Copy link
Contributor Author

Leoyzen commented Feb 28, 2024

After 8 years, I've making a progress now.
Identifying code by statespace.ExponentialSmoothing, I made some change in order to implement SSOE(or Innovation State Space) within current framework better.

The key change is from the fomula below.

Considering SSOE (or Innovation State space) formula:

$$\displaylines{y_t = w \alpha_{t-1} + \varepsilon_t \\\ \alpha_{t} = F \alpha_{t-1}+ g \varepsilon_t}$$

and the normal state space formula:

$$\displaylines{y_t = Z \alpha_t \\\ \alpha_t = T \alpha_{t-1} + R \eta_{t-1}}$$

according to the timing differences, we can rewrite the normal formual to

$$\displaylines{y_t = ZT \alpha_{t-1} + ZR\eta_t \\\ \alpha_{t} = T \alpha_{t-1} + TR \eta_{t-1} }$$

then we make sure $ZR=1$ and $TR=g$ .

$ZR=1$ can be ensure by adding a hidden state in transition matrix like ExponentialSmoothing does.

and $TR=g$ can be resolved by scipy.linalg.solve.

    def update(self, params, transformed=True, includes_fixed=False, complex_step=False):
       ......
       # exclude ma from caculation.It's ok.
        end = -q if q != 0 else None
        # we assume $TR = g $
        # so we have to resolve R from $R = T^ * g$
        self["selection", 1:end, 0] = la.solve(
            self["transition", 1:end, 1:end], self._internal_selection[:end, 0]
        )
        # we need ensure $ ZR = 1 $. so the first value of selection matrix should be
        # caculate from others.
        self["selection", 0, 0] = 1 - self["design", :, 1:].dot(
            self["selection", 1:, :]
        )

Then the code from can be removed.

def _initialize_constant_statespace(self, initial_state):
        .....
        # Apply the prediction step to get to what we need for our Kalman
        # filter implementation
        # This is not needed anymore because we compute within update step.
        # constant = np.dot(self.ssm["transition"], constant)

        self.initialization.constant = constant

This implement has been verified by both official implements and python implements, also manually caculated.

So it will be nice when a new class such as InnovationRepresentation which inherent from Representation and implement such changes.

@ChadFulton Can you take a look and give some advices before I moving forward?

@ChadFulton
Copy link
Member

Thanks @Leoyzen for coming back to this, I will take a look soon (unfortunately, it might not be until this weekend)

@Leoyzen
Copy link
Contributor Author

Leoyzen commented Mar 5, 2024

I just created a class named InnovationsMLEModel to simplify the procedure implementing the SSOE Model in Kalman Filters.

The params and results has been verified with both Python tbats and R forecast package.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
tsa-statespace
In progress / Needs work
Development

Successfully merging this pull request may close these issues.

None yet

8 participants