
ENH: Improvements to new ARIMA-type estimators #6159

Open
4 of 15 tasks
ChadFulton opened this issue Sep 11, 2019 · 12 comments

@ChadFulton
Member

ChadFulton commented Sep 11, 2019

Collection of follow-ups to #5827. These can/should be broken out into individual PRs. Many are relatively straightforward and would make good first PRs.

General

  • Documentation (none was added in the original PR).
  • Release notes.
  • Example notebook.
  • Double-check how sm.tsa.arima.ARIMA works with fix_params (it should fail except when the fit method is statespace).
  • Estimators that do not support seasonal models per se should support models where the only seasonal part is seasonal differencing (a small sketch of the kind of specification this targets follows this list).
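
For context, here is a minimal sketch of the kind of specification that last item targets; the simulated series is purely illustrative and not part of the original issue:

import numpy as np
from statsmodels.tsa.arima.model import ARIMA

# Illustrative series; only the specification matters here
endog = np.random.normal(size=120)

# The only seasonal component is a seasonal difference (D=1, no seasonal AR/MA terms)
mod = ARIMA(endog, order=(1, 1, 1), seasonal_order=(0, 1, 0, 12))

# The state space method already handles this; the enhancement is for the other
# estimators (e.g. method='hannan_rissanen') to accept it too, since the seasonal
# differencing can simply be applied to the series up front.
res = mod.fit(method='statespace')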

GLS

  • Add support for fixed parameters
  • Improve "Returns" documentation for other_results.
  • Add documentation for why we have e.g. include_constant but not other trend specifications (e.g. that it is to maintain consistency with estimation methods whose assumptions require demeaned series).
  • Fix the following test and put it back into the GLS test suite:
@pytest.mark.low_precision('Test against Example 6.6.3 in Brockwell and Davis'
                           ' (2016)')
# @pytest.mark.xfail(reason="Source appears to find suboptimal parameters")
def test_brockwell_davis_example_663():
    # TODO: the parameters described by BD appear to be suboptimal (based on
    # llf computed from state space form), so that this test fails. Should try
    # to confirm with ITSM2000 (i.e. see if we can get it to find better
    # parameters closer to what we find, or compare llf, or something).
    # TODO: quite a slow test, and xfail anyway due to finding better
    # parameters...
    # Get the data, perform seasonal differencing
    endog = sbl.diff(12).iloc[12:]

    exog = pd.Series((sbl.index > '1983-01-01').astype(int),
                     index=sbl.index).diff(12).iloc[12:]

    res, _ = gls(endog, exog, order=(0, 0, 12), max_iter=3)

    assert_allclose(res.exog_params, -328.45, atol=1e-2)
    assert_allclose(res.ma_params,
                    [.219, .098, .031, .064, .069, .111, .081,
                     .057, .092, -0.28, .183, -.672], atol=1e-3)
    assert_allclose(res.sigma2, 12581, atol=1)

Hannan-Rissanen

  • Better warnings / errors when series are short relative to lag length
  • Add support for fixed parameters
  • Seems like we could add support for seasonal parameters in this model.
  • Tests for the case with the bias-corrected estimator.

Innovations MLE

  • Add support for fixed parameters

Innovations algorithm

  • Add support for ARMA models; see Brockwell and Davis (2016) p.154 and Example 5.1.6. This estimator should be feasible given the Cython versions of the general innovations algorithm that were introduced in PERF: Cythonize innovations algo and filter #5947.
@rajathpatel23

@ChadFulton I would like to work on this enhancement, and would appreciate some guidance on getting started.

Thank you

@ChadFulton
Member Author

@rajathpatel23 that would be great, thanks!

There are a number of things here that I think should be pretty self-contained, including:

  • Example notebooks: it would be great to add a notebook showing how to use the new models.
    • There are lots of options here. One would be to replicate some of the "Examples" sections from Brockwell and Davis' book.
  • Tests: we could use more unit tests in general, and there are two specific issues I called out above:
    • GLS test: the test I put in the issue, above, fails because we find better parameters than are reported in Brockwell and Davis' book. Some more investigation on this case would be useful, especially comparing against their ITSM2000 program (can be found at http://extras.springer.com/2002/978-0-387-21657-7/ITSM2000, only runs on Windows).
    • We don't have any tests for the HR estimator with the bias correction, because I couldn't find any packages that implemented it. However, I notice that Gomez and Maravall (e.g. in their chapter "Automatic Modeling Methods for Univariate Series") have also implemented this bias correction in TRAMO/SEATS, so I was thinking that it is probably available in X13-ARIMA/SEATS, and we might be able to test against that package somehow.
    • In general, it would be great to add more unit tests for any of the methods against ITSM2000 or X13-ARIMA/SEATS.
  • Support for fixed parameters: GLS, innovations MLE, and Hannan-Rissanen should each be able to support fixed parameters. If you wanted to try this, I would suggest picking just one to start with.

If you have a particular interest, I can go into more details.

@emilmirzayev
Contributor

Hi Chad,

How can one add example notebooks?
Which directory should I put them in?

Thanks,

@bashtage
Member

Three steps to add a notebook:

  1. Add your notebook to examples/notebooks
  2. Add a block to the json file that contains the metadata for notebooks.
  3. Add a nice screenshot for your notebook that makes it look interesting in docs/_static/images

I like the format that @ChadFulton uses with the title. I used it here: https://github.com/statsmodels/statsmodels/blob/master/docs/source/_static/images/rolling_ls.png

@bashtage
Member

NB: The json file determines where the example notebook appears in the docs, so please add in the correct section.

@ChadFulton ChadFulton modified the milestones: Someday, 0.11 Oct 6, 2019
@bashtage bashtage modified the milestones: 0.11, 0.12 Jan 24, 2020
@ChadFulton ChadFulton modified the milestones: 0.12, 0.13 Oct 27, 2020
@madhushree14

Hi Chad,
I would like to pick up the "Support for fixed parameters" item. Would you please share some more details on this enhancement?

@ChadFulton
Member Author

That would be much appreciated, thanks!

First, a little background (maybe you already know this, but just in case you haven't run into this feature before). In our models, all of the parameters are typically estimated by maximum likelihood. But in some cases, you might want to "fix" one of the parameters to a particular value, and then estimate the other parameters by maximum likelihood. A simple example is:

import numpy as np
import statsmodels.api as sm

endog = np.random.normal(size=100)

# AR(1) model with an intercept
mod = sm.tsa.SARIMAX(endog, order=(1, 0, 0), trend='c')

# Suppose we want to estimate the AR(1) coefficient, but fix the intercept at 0.5
with mod.fix_params({'intercept': 0.5}):
    res = mod.fit()
print(res.summary())

This results in:

                               SARIMAX Results                                
==============================================================================
Dep. Variable:                      y   No. Observations:                  100
Model:               SARIMAX(1, 0, 0)   Log Likelihood                -146.547
Date:                Thu, 10 Dec 2020   AIC                            297.094
Time:                        19:44:35   BIC                            302.305
Sample:                             0   HQIC                           299.203
                                - 100                                         
Covariance Type:                  opg                                         
=====================================================================================
                        coef    std err          z      P>|z|      [0.025      0.975]
-------------------------------------------------------------------------------------
intercept (fixed)     0.5000        nan        nan        nan         nan         nan
ar.L1                -0.1595      0.096     -1.655      0.098      -0.348       0.029
sigma2                1.0972      0.144      7.631      0.000       0.815       1.379
===================================================================================
Ljung-Box (L1) (Q):                   0.13   Jarque-Bera (JB):                 2.30
Prob(Q):                              0.72   Prob(JB):                         0.32
Heteroskedasticity (H):               1.26   Skew:                            -0.37
Prob(H) (two-sided):                  0.50   Kurtosis:                         2.97
===================================================================================

Warnings:
[1] Covariance matrix calculated using the outer product of gradients (complex-step).

A more complete set of examples can be found in this notebook: https://www.statsmodels.org/stable/examples/notebooks/generated/statespace_fixed_params.html.

This feature is available in state space models (and also in the new ETSModel class), but it's not yet been implemented for all of the new ARIMA-type estimators referenced here (including Innovations MLE, Hannan-Rissanen, and GLS).

I think the easiest place to get started would be adding fixed parameters to the Hannan-Rissanen estimator, which is the hannan_rissanen function in the file statsmodels/tsa/arima/estimators/hannan_rissanen.py. The basic idea here is that ultimately the parameters are estimated by least squares (see lines 121 and 141), and so fixing certain parameter values is straightforward.

To see why it is straightforward with OLS, consider the following regression equation:

y = a + b x_1 + c x_2 + e

If we want to fix b = 2, then the equation is y = a + 2 x_1 + c x_2 + e, or we can rewrite it as a different regression

z = a + c x_2 + e

where we have created the new variable z according to z = y - 2 x_1.
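
To make the algebra concrete, here is a minimal sketch of the same trick using statsmodels' OLS; the simulated data and variable names are just for illustration:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 + 0.5 * x2 + rng.normal(size=n)

# Fix b = 2 by moving the fixed term to the left-hand side...
z = y - 2.0 * x1

# ...and regress the adjusted series on the remaining variables only
res = sm.OLS(z, sm.add_constant(x2)).fit()
print(res.params)  # estimates of a and c, with b held fixed at 2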

So basically, what I suggest is the following:

  1. Create a new GitHub issue (e.g. ENH: Fixed parameters in Hannan-Rissanen) as a more convenient location for discussion and advice going forward.

  2. Add a new argument fixed_params to hannan_rissanen, which should accept a dictionary with parameter names as keys and the fixed numbers as values.

  3. In each case (lines 121 and 141), if there are fixed parameters, you need to remove the columns that correspond to those parameters from the exog arrays being passed to OLS (e.g. in line 121, the lagged_endog variable is passed as the exog argument), multiply them by the given fixed parameter values, and subtract that from the endogenous variable to create a new one (i.e. like the z I mentioned above; a rough sketch of this transformation is included below).

Also, for the first attempt, I would just raise a NotImplementedError if unbiased=True (since I haven't really thought about fixed parameter values in this case).
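
For what it's worth, here is a rough, hypothetical sketch of the transformation described in step 3; the helper name and the idea of passing the parameter names alongside the design matrix are my own assumptions, not the existing hannan_rissanen code:

import numpy as np

def apply_fixed_params(endog, exog, param_names, fixed_params):
    # Hypothetical helper: remove the columns of `exog` that correspond to
    # fixed parameters and fold their contribution into a new endogenous
    # variable (i.e. the z = y - 2 x_1 trick above).
    # `param_names` gives the parameter name for each column of `exog`;
    # `fixed_params` maps a subset of those names to fixed values.
    endog = np.asarray(endog, dtype=float).copy()
    exog = np.asarray(exog, dtype=float)

    keep = []
    for i, name in enumerate(param_names):
        if name in fixed_params:
            endog = endog - fixed_params[name] * exog[:, i]
        else:
            keep.append(i)

    free_names = [param_names[i] for i in keep]
    return endog, exog[:, keep], free_names

The OLS step would then run on the returned endog/exog, and the fixed values would be re-inserted into the full parameter vector afterwards.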

Thanks!

@madhushree14

Thank you Chad for this nice illustration. I will start working accordingly.

@madhushree14

Hi Chad, could you please help me understand how to identify the corresponding column when removing a parameter from the 2-d array?

@jackzyliu
Contributor

Hi @ChadFulton, first time contributing here. I would like to help out if this issue is still open and active.

I can pick up where @madhushree14 left off on #7202, or I can go on to work on GLS. Either way, I'd be curious to investigate the result difference on Brockwell and Davis example 6.6.3 next.

@ChadFulton
Member Author

Hi @jackzyliu, thanks, that would be much appreciated!

I think we need fixed parameters for HR and/or innovations MLE before we can support fixed parameters for GLS, so I will ping #7202 / #7355 to see what the status is.

In the meantime, if you have time/interest to take a look at the implementation of Hannan-Rissanen, that would be a great way to get into things here.

Thanks again!

@jtimko16

jtimko16 commented Oct 9, 2022

Hello,

I am looking for a good issue for my very first contribution to any Python module. As I have used ARIMA several times before and have a background in econometrics, this could be a good choice.

Is there any open part of this task where I could contribute? Thanks for any guidance.
