DFM results change if columns are shuffled #6936

Closed
DanielWeitzenfeld opened this issue Jul 31, 2020 · 2 comments · Fixed by #6937

@DanielWeitzenfeld

Describe the bug

Fitting sm.DFM on the same dataset, but with the order of the columns shuffled, sometimes produces different results. Specifically, the estimated factor comes out at a different scale.

Code Sample, a copy-pastable example if possible

https://gist.github.com/DanielWeitzenfeld/1835112662bc489be6cc0a6bb170db25

The factors are the same shape, but at different scales. In the notebook above, the difference is small, but in the example that motivated my posting this, the difference is an order of magnitude and seems to lead to significantly different fitted values after smoothing.

I understand that the scale of the factor is only weakly identified. As far as I can tell, the only constraint on it is that the covariance of the factor innovations is fixed to the identity matrix I. But why does shuffling the columns lead to different scales?
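To make the scale question concrete, here is a minimal pure-Python sketch of a one-factor model with an AR(1) factor. It is a toy illustration, not statsmodels code; the names `simulate_factor`, `common_component`, `innovations`, `phi`, `loadings` and `c` are all assumptions made up for this example. It shows that multiplying the factor by `c` and dividing the loadings by `c` leaves the common component unchanged, while the innovation variance is multiplied by `c**2`. So with the innovation variance fixed at 1, the scale should be pinned down up to sign, and nothing in that argument depends on column order.

```python
import random


def simulate_factor(phi, n, seed=0):
    """Simulate an AR(1) factor f_t = phi * f_{t-1} + eta_t with Var(eta_t) = 1."""
    rng = random.Random(seed)
    factor, prev = [], 0.0
    for _ in range(n):
        prev = phi * prev + rng.gauss(0.0, 1.0)
        factor.append(prev)
    return factor


def common_component(loadings, factor):
    """Return loading_i * f_t; rows are time points, columns are series."""
    return [[lam * ft for lam in loadings] for ft in factor]


def innovations(phi, factor):
    """Recover eta_t = f_t - phi * f_{t-1}, taking the pre-sample value as 0."""
    prev, out = 0.0, []
    for ft in factor:
        out.append(ft - phi * prev)
        prev = ft
    return out


def variance(xs):
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)


# Hypothetical values chosen only for illustration.
phi, loadings, c = 0.8, [1.0, -0.5, 2.0], 10.0

f = simulate_factor(phi, 500)
f_scaled = [c * ft for ft in f]                   # factor blown up by c
loadings_scaled = [lam / c for lam in loadings]   # loadings shrunk by c

# The observable part of the model is identical under both parameterizations.
same = all(
    abs(a - b) < 1e-9
    for row_a, row_b in zip(
        common_component(loadings, f),
        common_component(loadings_scaled, f_scaled),
    )
    for a, b in zip(row_a, row_b)
)

# The innovation variance of the rescaled factor is c**2 times the original.
ratio = variance(innovations(phi, f_scaled)) / variance(innovations(phi, f))
print(same, round(ratio, 6))
```

Under these assumptions, a factor that differs in scale by an order of magnitude between two fits cannot be a pure relabeling of the same solution, since the rescaled factor would violate the unit innovation variance.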

INSTALLED VERSIONS

Python: 3.6.0.final.0
OS: Darwin 19.3.0 Darwin Kernel Version 19.3.0: Thu Jan 9 20:58:23 PST 2020; root:xnu-6153.81.5~1/RELEASE_X86_64 x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

statsmodels

Installed: 0.11.1 (/Users/weitzenfeld/.virtualenvs/leftfoot/lib/python3.6/site-packages/statsmodels)

Required Dependencies

cython: Not installed
numpy: 1.18.2 (/Users/weitzenfeld/.virtualenvs/leftfoot/lib/python3.6/site-packages/numpy)
scipy: 1.1.0 (/Users/weitzenfeld/.virtualenvs/leftfoot/lib/python3.6/site-packages/scipy)
pandas: 0.23.4 (/Users/weitzenfeld/.virtualenvs/leftfoot/lib/python3.6/site-packages/pandas)
dateutil: 2.7.3 (/Users/weitzenfeld/.virtualenvs/leftfoot/lib/python3.6/site-packages/dateutil)
patsy: 0.5.0 (/Users/weitzenfeld/.virtualenvs/leftfoot/lib/python3.6/site-packages/patsy)

Optional Dependencies

matplotlib: 3.2.1 (/Users/weitzenfeld/.virtualenvs/leftfoot/lib/python3.6/site-packages/matplotlib)
backend: module://ipykernel.pylab.backend_inline
cvxopt: Not installed
joblib: 0.12.2 (/Users/weitzenfeld/.virtualenvs/leftfoot/lib/python3.6/site-packages/joblib)

Developer Tools

IPython: 6.2.1 (/Users/weitzenfeld/.virtualenvs/leftfoot/lib/python3.6/site-packages/IPython)
jinja2: 2.10 (/Users/weitzenfeld/.virtualenvs/leftfoot/lib/python3.6/site-packages/jinja2)
sphinx: Not installed
pygments: 2.2.0 (/Users/weitzenfeld/.virtualenvs/leftfoot/lib/python3.6/site-packages/pygments)
pytest: Not installed
virtualenv: Not installed

@ChadFulton
Member

Thanks for the report, this is a pretty interesting example. It looks to me like the issue here isn't the identification of the factor, but just that the optimizer is finding different parameters depending on the order of the factors. If you check the llf results attribute, your dfm1 model is associated with a higher log-likelihood.

I don't know why this is. It may have to do with the order of the parameters. If you check the start_params attribute on each of the models, they are the same, just permuted in the same way as the variables. So maybe if the optimizer starts by adjusting one of the parameters first, it ends up in a region with a higher local maximum?
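The check described above can be sketched as a small helper that aligns two fits by parameter name rather than by position. This is only a sketch: `compare_fits` is a hypothetical helper, not part of statsmodels, and it assumes each results object exposes `llf`, `params`, `model.param_names` and `model.start_params`, with parameter names keyed by column name so that a column shuffle only permutes them. The stand-in objects and their numbers below are invented so the block runs without fitting anything.

```python
from types import SimpleNamespace


def compare_fits(res_a, res_b):
    """Align two fitted results by parameter name and summarize how they differ."""

    def by_name(res, values):
        return dict(zip(res.model.param_names, values))

    start_a = by_name(res_a, res_a.model.start_params)
    start_b = by_name(res_b, res_b.model.start_params)
    est_a = by_name(res_a, res_a.params)
    est_b = by_name(res_b, res_b.params)

    names = sorted(set(est_a) & set(est_b))
    return {
        # Positive means res_a reached the higher log-likelihood.
        "llf_diff": res_a.llf - res_b.llf,
        # Zero means both optimizers started from the same point, up to ordering.
        "max_start_diff": max(abs(start_a[n] - start_b[n]) for n in names),
        # Nonzero means the optimizers ended at different parameter values.
        "max_param_diff": max(abs(est_a[n] - est_b[n]) for n in names),
    }


# Stand-ins for two fitted results; names and values are made up for illustration.
res1 = SimpleNamespace(
    llf=-100.0,
    params=[0.5, 1.0],
    model=SimpleNamespace(
        param_names=["loading.f1.x", "loading.f1.y"],
        start_params=[0.1, 0.2],
    ),
)
res2 = SimpleNamespace(
    llf=-103.0,
    params=[0.1, 0.05],
    model=SimpleNamespace(
        param_names=["loading.f1.y", "loading.f1.x"],
        start_params=[0.2, 0.1],
    ),
)

report = compare_fits(res1, res2)
print(report)
```

With real results in place of the stand-ins, identical starting values combined with a nonzero log-likelihood gap would point at the optimizer path rather than at the starting point or the model specification.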

@ChadFulton
Member

The problem here is also exacerbated by the fact that the underlying factor is not well-approximated by a stationary factor.

I'm going to mark this as a "won't fix", but in some tests the EM algorithm approach of the new DynamicFactorMQ class in #6937 appears to get to the optimum regardless of column ordering. So I'll also mark this issue as being closed by that PR.

@bashtage bashtage added this to the 0.12 milestone Aug 5, 2020