GLS.fit().params contains only 1 value #7884

Open
jmakov opened this issue Nov 16, 2021 · 1 comment

jmakov commented Nov 16, 2021

Describe the bug

I'm doing a linear fit with statsmodels.api.GLS. I stumbled upon a case where .fit() produces a result that has only 1 value in .params. I'd expect to get 2: beta and the intercept (y = beta*x + intercept). I checked the docs, but they didn't explain this. Am I perhaps using this incorrectly?

Code Sample, a copy-pastable example if possible

import pandas as pd
import statsmodels.api as sm

first_normalized_pc = pd.Series(...)
second_normalized_pc = pd.Series(...)

# get rolling difference
window_len = 100
differences = []
idx_start = 10000
idx_end = 40000

for i in range(idx_start, idx_end):
    window_end = i + window_len
    x = first_normalized_pc[i:window_end]
    S1 = sm.add_constant(x.values)
    y = second_normalized_pc[i:window_end]
    results = sm.GLS(y.values, S1).fit()
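    # fitted value at the last point of the window minus the observed value;
    # .iloc makes the lookup positional regardless of the Series index
    differences.append(results.params[1] * x.iloc[-1] - y.iloc[-1] + results.params[0])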

Expected Output

I'd naively expect that results.params contains 2 items.

Output of import statsmodels.api as sm; sm.show_versions()

INSTALLED VERSIONS

Python: 3.7.10.final.0
OS: Linux 5.4.0-89-generic #100-Ubuntu SMP Fri Sep 24 14:50:10 UTC 2021 x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

statsmodels

Installed: 0.12.2 (/home/toaster/PROGS/miniconda3/envs/puma-lab/lib/python3.7/site-packages/statsmodels)

Required Dependencies

cython: 0.29.24 (/home/toaster/PROGS/miniconda3/envs/puma-lab/lib/python3.7/site-packages/Cython)
numpy: 1.20.3 (/home/toaster/PROGS/miniconda3/envs/puma-lab/lib/python3.7/site-packages/numpy)
scipy: 1.5.4 (/home/toaster/PROGS/miniconda3/envs/puma-lab/lib/python3.7/site-packages/scipy)
pandas: 1.3.2 (/home/toaster/PROGS/miniconda3/envs/puma-lab/lib/python3.7/site-packages/pandas)
dateutil: 2.8.2 (/home/toaster/PROGS/miniconda3/envs/puma-lab/lib/python3.7/site-packages/dateutil)
patsy: 0.5.2 (/home/toaster/PROGS/miniconda3/envs/puma-lab/lib/python3.7/site-packages/patsy)

Optional Dependencies

matplotlib: 3.4.3 (/home/toaster/PROGS/miniconda3/envs/puma-lab/lib/python3.7/site-packages/matplotlib)
backend: module://matplotlib_inline.backend_inline
cvxopt: Not installed
joblib: 1.0.1 (/home/toaster/PROGS/miniconda3/envs/puma-lab/lib/python3.7/site-packages/joblib)

Developer Tools

IPython: 7.27.0 (/home/toaster/PROGS/miniconda3/envs/puma-lab/lib/python3.7/site-packages/IPython)
jinja2: 3.0.1 (/home/toaster/PROGS/miniconda3/envs/puma-lab/lib/python3.7/site-packages/jinja2)
sphinx: Not installed
pygments: 2.10.0 (/home/toaster/PROGS/miniconda3/envs/puma-lab/lib/python3.7/site-packages/pygments)
pytest: Not installed
virtualenv: Not installed

@josef-pkt (Member)

This issue got lost.

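The most likely reason is that x is constant within a window. By default (has_constant="skip"), add_constant does NOT add a constant column if the data already contain one, so S1 keeps a single column and params has a single value.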

In general, it's better to add the constant column to the dataframe once, before the loop: add_constant then no longer runs on every iteration, which avoids an additional array copy per window.

add_constant has an option, has_constant="add", to add the constant even if one already exists. However, x then has two constant-valued columns, so the parameters are not identified, and the OLS/WLS/GLS estimate is a pinv-regularized (minimum-norm) solution.

josef-pkt added the FAQ label Sep 26, 2022