ARDL fitted params in summary() are not real #8700

Open
satyrmipt opened this issue Feb 23, 2023 · 7 comments
Comments

@satyrmipt

satyrmipt commented Feb 23, 2023

Problem short description:

I use ARDL on a sandbox task to study all the parameters and how they work.
I fitted the model and I get a perfect result (the task is designed so that the model returns perfect results).
But when I look at m.params or m.summary(), the values of the fitted parameters are not the same as those the model uses to predict 100% precise values.

**Problem detailed description:**
### EXAMPLE START
# import libs:
import pandas as pd
from statsmodels.tsa.api import ARDL

# create test data from a precise math formula
sq = [x**2 + x + 11 for x in range(0,101)]
tdf = pd.DataFrame({'sq':sq})

# fit ARDL model and print model information
mt1 = ARDL(tdf['sq'], seasonal = False, lags=1, trend = 'ct' ).fit()
print(mt1.params)
print(mt1.summary())
print(mt1.forecast(5))

### EXAMPLE END

As you see, I'm working with f(t) = t^2 + t + 11, which can be represented as a recurrent sequence in the form
f(t) = 1 * f(t-1) + 2 * t, which is in perfect agreement with what the ARDL model is expected to fit: a lag-1 dependence and a linear time trend. And it does predict perfectly, as you can see in the output of the print(mt1.forecast(5)) line.

Expected fitted parameters are: const = 0, trend = 2, L1 = 1
But both mt1.summary() and mt1.params return: const = -2, trend = 2, L1 = 1
Why does the constant equal -2, and how was the model able to predict perfect results with this value of const? What is the final equation the model uses to predict correct values with this value of const?

General form of equation can be found here in "Notes" section.

Expected Output

I expected to get mt1.params as

  const    0
  trend    2.0
  sq.L1    1.0
  dtype: float64

for data of the form f(t) = 1*f(t-1) + 2*t + 0

Output of import statsmodels.api as sm; sm.show_versions()


INSTALLED VERSIONS
------------------
Python: 3.9.12.final.0

statsmodels
===========

Installed: 0.13.5 (C:\Users\my_user\Anaconda3\lib\site-packages\statsmodels)

Required Dependencies
=====================

cython: 0.29.28 (C:\Users\my_user\Anaconda3\lib\site-packages\Cython)
numpy: 1.21.5 (C:\Users\my_user\Anaconda3\lib\site-packages\numpy)
scipy: 1.7.3 (C:\Users\my_user\Anaconda3\lib\site-packages\scipy)
pandas: 1.4.2 (C:\Users\my_user\Anaconda3\lib\site-packages\pandas)
    dateutil: 2.8.2 (C:\Users\my_user\Anaconda3\lib\site-packages\dateutil)
patsy: 0.5.2 (C:\Users\my_user\Anaconda3\lib\site-packages\patsy)

Optional Dependencies
=====================

matplotlib: 3.5.1 (C:\Users\my_user\Anaconda3\lib\site-packages\matplotlib)
    backend: module://matplotlib_inline.backend_inline 
cvxopt: Not installed
joblib: 1.1.0 (C:\Users\my_user\Anaconda3\lib\site-packages\joblib)

Developer Tools
================

IPython: 8.2.0 (C:\Users\my_user\Anaconda3\lib\site-packages\IPython)
    jinja2: 2.11.3 (C:\Users\my_user\Anaconda3\lib\site-packages\jinja2)
sphinx: 4.4.0 (C:\Users\my_user\Anaconda3\lib\site-packages\sphinx)
    pygments: 2.11.2 (C:\Users\my_user\Anaconda3\lib\site-packages\pygments)
pytest: 7.1.1 (C:\Users\my_user\Anaconda3\lib\site-packages\pytest)
virtualenv: Not installed
@bashtage
Member

Can you post full output?

@satyrmipt
Author

satyrmipt commented Feb 23, 2023

Can you post full output?

Full output of the code in my issue and picture of output:


const   -2.0
trend    2.0
sq.L1    1.0
dtype: float64
                              ARDL Model Results                              
==============================================================================
Dep. Variable:                     sq   No. Observations:                  101
Model:                       ARDL(1,)   Log Likelihood                2412.000
Method:               Conditional MLE   S.D. of innovations              0.000
Date:                Thu, 23 Feb 2023   AIC                          -4816.001
Time:                        12:22:40   BIC                          -4805.580
Sample:                             1   HQIC                         -4811.783
                                  101                                         
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
const         -2.0000   2.61e-12  -7.67e+11      0.000      -2.000      -2.000
trend          2.0000   1.14e-13   1.75e+13      0.000       2.000       2.000
sq.L1          1.0000    1.1e-15   9.06e+14      0.000       1.000       1.000
==============================================================================
101    10313.0
102    10517.0
103    10723.0
104    10931.0
105    11141.0
dtype: float64

[image: screenshot of the output above]

@josef-pkt
Member

My guess is that this depends on the origin of the trend. What is the first t?
It looks to me as if the negative constant is compensating for one trend period.

@satyrmipt
Author

satyrmipt commented Feb 23, 2023

As you can see my sequence is:
sq = pd.Series([x**2 + x + 11 for x in range(0,101)])
and first 5 terms are:

0    11
1    13
2    17
3    23
4    31
dtype: int64

So the first t = 0, the first x = 0, and you can quickly check that the first terms of the sequence are generated by
f(t) = 1*f(t-1) + 2*t + 0
where
f(0) = 11
So there is no -2 in the recurrent formula which generates the sequence.

Let's directly calculate the 100th and 101st terms of the sequence (I mean for t = 100 and t = 101):

100**2+100+11 = 10111
101**2+101+11 = 10313

Now check the indirect calculation using f(t) = 1*f(t-1) + 2*t + 0:
10111 + 2*101 + 0 = 10313
Both methods give the same result for the 101st term.
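
The single-term check above can be extended to the whole sample. The following is a minimal pure-Python sketch (no statsmodels involved) that builds the sequence from the recurrence f(t) = f(t-1) + 2*t with f(0) = 11 and compares every term with the closed form; the names `f`, `values` and `recurrence_matches` are illustrative only.

```python
def f(t):
    """Closed form used to generate the test data in this issue."""
    return t**2 + t + 11


# Build the same sequence from the recurrence, with a 0-based time index t.
values = [f(0)]
for t in range(1, 106):
    values.append(values[-1] + 2 * t)

# Compare every term, including the five out-of-sample points t = 101..105.
recurrence_matches = all(values[t] == f(t) for t in range(106))
print(recurrence_matches)
print(values[100], values[101])
```

This only confirms that the recurrence is right when t is counted from 0; it says nothing about how ARDL counts its trend variable.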

Originally I thought that f(t) had to depend on t-1 and not on t. But as I linked earlier, that is not true: y(t) depends exactly on t:
image

Anyway, if you see that I'm somehow wrong, please post the recurrent formula which both generates my sequence and is in good agreement with

const   -2.0
trend    2.0
sq.L1    1.0
dtype: float64

@josef-pkt
Member

josef-pkt commented Feb 23, 2023

Yes, theoretically

But how are trend and initial conditions implemented in ARDL? I never looked at it.

f(t) = 1*f(t-1) + 2*t + 0
define t0 by t = t0 - 1
f(t) = 1*f(t-1) + 2*(t0 - 1) + 0 = 1*f(t-1) + 2*t0 - 2
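
The substitution above can be checked numerically without statsmodels. This sketch assumes, as guessed in this thread and not confirmed here, that the trend regressor equals 1 for the first observation, so the row with 0-based index i has trend value i + 1. Under that assumption the reported coefficients const = -2, trend = 2, sq.L1 = 1 should reproduce both the sample and the five forecasts posted earlier. The names `CONST`, `TREND`, `L1` and `forecasts` are illustrative.

```python
# Coefficients reported by mt1.params in this thread.
CONST, TREND, L1 = -2.0, 2.0, 1.0


def f(t):
    """Closed form used to generate the test data (0-based t)."""
    return t**2 + t + 11


y = [float(f(i)) for i in range(101)]

# Assumption: the trend value for the row with 0-based index i is i + 1.
in_sample_ok = all(
    y[i] == CONST + TREND * (i + 1) + L1 * y[i - 1] for i in range(1, 101)
)

# Iterate the same equation forward for indices 101..105.
forecasts = []
prev = y[-1]
for i in range(101, 106):
    prev = CONST + TREND * (i + 1) + L1 * prev
    forecasts.append(prev)

print(in_sample_ok)
print(forecasts)
```

If the trend instead started at 0, the same data would require const = 0, which is what the original post expected.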

@satyrmipt
Author

satyrmipt commented Feb 23, 2023

Do you have any idea how we can change the sequence so as to force the model to add a t^2 term? It would help to check whether there is a shift in t_zero. Originally I thought about it in the most naive way:
image

@bashtage
Member

You would need to manually encode t**2 as an exogenous variable since the max trend supported is 'ct'.
