ARDL fitted params in summary() are not real #8700

satyrmipt · 2023-02-23T09:38:36Z

**Problem short description:**

I use ARDL on a sandbox task to study all the parameters and how they work.
I fitted the model and got a perfect result (the task is designed so that the model returns perfect results).
But when I look at `mt1.params` or `mt1.summary()`, the values of the fitted parameters are not the ones the model uses to predict 100% precise values.

**Problem detailed description:**

```python
# import libs:
import pandas as pd
from statsmodels.tsa.api import ARDL

# create test data from a precise math formula: f(x) = x**2 + x + 11, x = 0..100
sq = [x**2 + x + 11 for x in range(0, 101)]
tdf = pd.DataFrame({'sq': sq})

# fit an ARDL model (one lag of sq, constant + linear time trend) and print model information
mt1 = ARDL(tdf['sq'], lags=1, trend='ct', seasonal=False).fit()
print(mt1.params)
print(mt1.summary())
print(mt1.forecast(5))
```

As you can see, I'm working with `f(t) = t^2 + t + 11`, which can be represented as a recurrent sequence of the form
`f(t) = 1 * f(t-1) + 2 * t`. This is in perfect agreement with what the ARDL model is expected to fit: a lag-1 dependence and a linear time trend. And the model does make a perfect prediction, as you can see in the output of the `print(mt1.forecast(5))` line.
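A quick plain-Python check of the recurrence claim above: starting from f(0) = 11 and applying `f(t) = 1*f(t-1) + 2*t + 0` reproduces the closed form for every t in the sample, and continuing it for five more steps gives the same values that `mt1.forecast(5)` prints further down the thread. The helper names are made up for this sketch; it does not touch statsmodels.

```python
def closed_form(t):
    """f(t) = t^2 + t + 11, the formula used to build the test data."""
    return t**2 + t + 11


def by_recurrence(n_terms, f0=11):
    """Generate f(0), ..., f(n_terms - 1) with f(t) = 1*f(t-1) + 2*t + 0."""
    values = [f0]
    for t in range(1, n_terms):
        values.append(1 * values[-1] + 2 * t + 0)
    return values


# 101 in-sample points (t = 0..100) plus 5 out-of-sample steps (t = 101..105)
seq = by_recurrence(106)
assert seq[:101] == [closed_form(t) for t in range(101)]

print(seq[:5])    # [11, 13, 17, 23, 31]
print(seq[101:])  # [10313, 10517, 10723, 10931, 11141]
```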

Expected fitted parameters are: const = 0, trend = 2, L1 = 1.
But both `mt1.summary()` and `mt1.params` return: const = -2, trend = 2, L1 = 1.
Why does the constant equal -2, and how was the model able to predict perfect results with this value of const? What is the final equation the model uses to predict the correct values with this value of const?

The general form of the equation can be found here, in the "Notes" section.
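For this particular specification (`lags=1`, `trend='ct'`, no exogenous regressors) the general form reduces to a single regression equation. Written out, with $\tau_t$ as our own shorthand (not the documentation's notation) for whatever value the trend column takes at observation $t$:

$$
y_t = \mathrm{const} + \mathrm{trend}\cdot\tau_t + \mathrm{L1}\cdot y_{t-1} + \varepsilon_t
$$

For this series the three regressors (a constant, $\tau_t$ and $y_{t-1} = t^2 - t + 11$) are linearly independent, so the exact fit is unique, and const = 0, trend = 2, L1 = 1 is that fit only when $\tau_t = t$, i.e. when the trend column is 0 at the first observation.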

Expected Output

I expected to get `mt1.params` as

```
const    0
trend    2.0
sq.L1    1.0
dtype: float64
```

for data of the form `f(t) = 1*f(t-1) + 2*t + 0`

Output of `import statsmodels.api as sm; sm.show_versions()`


```
INSTALLED VERSIONS
------------------
Python: 3.9.12.final.0

statsmodels
===========

Installed: 0.13.5 (C:\Users\my_user\Anaconda3\lib\site-packages\statsmodels)

Required Dependencies
=====================

cython: 0.29.28 (C:\Users\my_user\Anaconda3\lib\site-packages\Cython)
numpy: 1.21.5 (C:\Users\my_user\Anaconda3\lib\site-packages\numpy)
scipy: 1.7.3 (C:\Users\my_user\Anaconda3\lib\site-packages\scipy)
pandas: 1.4.2 (C:\Users\my_user\Anaconda3\lib\site-packages\pandas)
    dateutil: 2.8.2 (C:\Users\my_user\Anaconda3\lib\site-packages\dateutil)
patsy: 0.5.2 (C:\Users\my_user\Anaconda3\lib\site-packages\patsy)

Optional Dependencies
=====================

matplotlib: 3.5.1 (C:\Users\my_user\Anaconda3\lib\site-packages\matplotlib)
    backend: module://matplotlib_inline.backend_inline
cvxopt: Not installed
joblib: 1.1.0 (C:\Users\my_user\Anaconda3\lib\site-packages\joblib)

Developer Tools
================

IPython: 8.2.0 (C:\Users\my_user\Anaconda3\lib\site-packages\IPython)
    jinja2: 2.11.3 (C:\Users\my_user\Anaconda3\lib\site-packages\jinja2)
sphinx: 4.4.0 (C:\Users\my_user\Anaconda3\lib\site-packages\sphinx)
    pygments: 2.11.2 (C:\Users\my_user\Anaconda3\lib\site-packages\pygments)
pytest: 7.1.1 (C:\Users\my_user\Anaconda3\lib\site-packages\pytest)
virtualenv: Not installed
```


bashtage · 2023-02-23T09:45:53Z

Can you post full output?

satyrmipt · 2023-02-23T10:17:22Z

> Can you post full output?

Full output of the code in my issue and picture of output:


```
const   -2.0
trend    2.0
sq.L1    1.0
dtype: float64
                              ARDL Model Results
==============================================================================
Dep. Variable:                     sq   No. Observations:                  101
Model:                       ARDL(1,)   Log Likelihood                2412.000
Method:               Conditional MLE   S.D. of innovations              0.000
Date:                Thu, 23 Feb 2023   AIC                          -4816.001
Time:                        12:22:40   BIC                          -4805.580
Sample:                             1   HQIC                         -4811.783
                                  101
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
const         -2.0000   2.61e-12  -7.67e+11      0.000      -2.000      -2.000
trend          2.0000   1.14e-13   1.75e+13      0.000       2.000       2.000
sq.L1          1.0000    1.1e-15   9.06e+14      0.000       1.000       1.000
==============================================================================
101    10313.0
102    10517.0
103    10723.0
104    10931.0
105    11141.0
dtype: float64
```

josef-pkt · 2023-02-23T14:44:52Z

My guess is that this depends on the origin of the trend: what's the first t?
It looks to me like the negative constant is compensating for one trend period.
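This guess can be checked without statsmodels by solving the same least-squares problem by hand. The NumPy sketch below regresses f(t) on a constant, a trend column and f(t-1) over t = 1..100 (one observation is used up by the lag), once with the trend equal to t and once with it equal to t + 1. Which origin ARDL actually uses is not established by this sketch; it only shows that the constant, and nothing else, moves with the origin. `fit_ols` is a hypothetical helper name.

```python
import numpy as np

# Test data from the issue: f(t) = t^2 + t + 11 for t = 0..100
t_all = np.arange(101, dtype=float)
f = t_all**2 + t_all + 11

# Estimation sample: t = 1..100, because one observation is used up by the lag
t = t_all[1:]
y = f[1:]
y_lag = f[:-1]


def fit_ols(trend_origin):
    """Least-squares fit of y on [const, trend, y_lag].

    trend_origin is the value the trend column takes at t = 0
    (0 means trend = t, 1 means trend = t + 1).
    """
    trend = t + trend_origin
    X = np.column_stack([np.ones_like(t), trend, y_lag])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef  # [const, trend, L1]


print(np.round(fit_ols(0), 6))  # trend = t:     const ~ 0,  trend ~ 2, L1 ~ 1
print(np.round(fit_ols(1), 6))  # trend = t + 1: const ~ -2, trend ~ 2, L1 ~ 1
```

The second line matches the `mt1.params` output posted above, which is consistent with (but does not prove) a trend column that starts at 1.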

satyrmipt · 2023-02-23T15:05:59Z

As you can see, my sequence is:

```python
sq = pd.Series([x**2 + x + 11 for x in range(0, 101)])
```

and the first 5 terms are:

```
0    11
1    13
2    17
3    23
4    31
dtype: int64
```

So the first t = 0 (the first x = 0), and you can quickly check that the first members of the sequence are generated by
`f(t) = 1*f(t-1) + 2*t + 0`
where
`f(0) = 11`.
So there is no -2 in the recurrence formula that generates the sequence.

Let's make direct calculations for the 100th and 101st members of the sequence (I mean for t = 100 and t = 101):

`100**2 + 100 + 11 = 10111`
`101**2 + 101 + 11 = 10313`

Now check the indirect calculation using `f(t) = 1*f(t-1) + 2*t + 0`:
`10111 + 2*101 + 0 = 10313`
Both methods give the same result for the 101st member.

Originally I thought that f(t) had to depend on t-1 and not on t. But according to what I linked earlier, that is not true: y(t) depends exactly on t.

Anyway, if you see that I'm wrong somehow, please post the recurrence formula that both generates my sequence and is in good agreement with:

```
const   -2.0
trend    2.0
sq.L1    1.0
dtype: float64
```

josef-pkt · 2023-02-23T15:15:25Z

Yes, theoretically.

But how are the trend and the initial conditions implemented in ARDL? I never looked at it.

$$f(t) = 1\cdot f(t-1) + 2t + 0$$

define $t_0$ by $t = t_0 - 1$:

$$f(t) = 1\cdot f(t-1) + 2(t_0 - 1) + 0 = 1\cdot f(t-1) + 2t_0 - 2$$
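Plugging the reported parameters (const = -2, trend = 2, L1 = 1) back into the series shows what this substitution means in practice: they reproduce every in-sample value and all five forecasts when the trend regressor is read as t0 = t + 1, and every step is off by exactly -2 when it is read as t. That the ARDL trend column equals t + 1 here is the assumption this sketch encodes (`trend_origin=1`); `one_step` is a hypothetical helper, not a statsmodels function.

```python
def one_step(prev, t, const=-2.0, trend=2.0, l1=1.0, trend_origin=1):
    """One step of y_t = const + trend * t0 + l1 * y_{t-1}, with t0 = t + trend_origin.

    Defaults are the values printed by mt1.params; trend_origin=1 encodes the
    assumption that the trend regressor is t + 1.
    """
    t0 = t + trend_origin
    return const + trend * t0 + l1 * prev


f = [t**2 + t + 11 for t in range(106)]  # closed form, t = 0..105

# Trend read as t0 = t + 1: every step (in-sample and the 5 forecasts) matches
errors_t_plus_1 = [one_step(f[t - 1], t) - f[t] for t in range(1, 106)]

# Trend read as t itself: every step is off by the constant
errors_t = [one_step(f[t - 1], t, trend_origin=0) - f[t] for t in range(1, 106)]

print(set(errors_t_plus_1))  # {0.0}
print(set(errors_t))         # {-2.0}
```

So the equation behind the reported parameters would be f(t) = -2 + 2*(t + 1) + 1*f(t-1), which is algebraically the same as f(t) = f(t-1) + 2*t.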

satyrmipt · 2023-02-23T17:09:52Z

Do you have any idea how we could change the sequence so as to force the model to add a t^2 term? That would help check whether there is a shift in t_zero. Originally I thought about it in the most naive way:

bashtage · 2023-02-27T09:25:12Z

You would need to manually encode `t**2` as an exogenous variable, since the maximum trend supported is `'ct'`.
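A minimal NumPy sketch of what such a probe could show, standing in for an ARDL fit with a hand-made squared-time column (it does not call statsmodels). Two assumptions specific to this series: the squared column is built from the user's own t, starting at 0, and the lag `sq.L1` is left out, because f(t-1) = t^2 - t + 11 is an exact linear combination of a constant, a linear trend and t^2, so keeping it would make the regressors perfectly collinear. Regressing f(t) = t^2 + t + 11 on [const, trend, t^2], the fitted constant is 11 if the trend column equals t and 10 if it equals t + 1, so the constant would reveal the trend origin. `fit_with_t_squared` is a hypothetical name.

```python
import numpy as np

t = np.arange(101, dtype=float)  # the user's time index, starting at 0
y = t**2 + t + 11                # the test series from the issue
t_squared = t**2                 # hand-made "exogenous" squared-time column


def fit_with_t_squared(trend_origin):
    """Fit y on [const, trend, t_squared]; trend_origin is the trend value at t = 0."""
    trend = t + trend_origin
    X = np.column_stack([np.ones_like(t), trend, t_squared])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef  # [const, trend, t_squared]


# Why sq.L1 is dropped: the lagged series is an exact combination of the other columns
lag = (t - 1) ** 2 + (t - 1) + 11
assert np.allclose(lag, t_squared - t + 11)

print(np.round(fit_with_t_squared(0), 6))  # const ~ 11 if the trend is t
print(np.round(fit_with_t_squared(1), 6))  # const ~ 10 if the trend is t + 1
```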
