Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: incorrect definition of piecewise boolean indicator array in Hampel.rho (*robust* module) #8696

Closed
kkmita opened this issue Feb 21, 2023 · 2 comments · Fixed by #8801
Closed
Milestone

Comments

@kkmita
Copy link

kkmita commented Feb 21, 2023

Describe the bug

There is a bug in class Hampel within statsmodels/robust/norms.py in defining piecewise function over z in Hampel.rho method.

It results in incorrect values for the Hampel.rho method outcomes.

The rho method uses _subset method to calculate boolean indicator arrays for t1, t2, and t3 pieces, with the last piece t4 defining the whole range needed for the Hampel's $\rho$ function being calculated in the rho method. This last piece is incorrect in the calculation of the boolean indicator array, I show in the code below.

Note that I can't reproduce the issue perfectly, since Hampel.rho does not rely on Hampel._subset entirely in defining piecewise boolean indicator arrays, as opposed to Hampel.psi or Hampel.weights.

Code Sample, a copy-pastable example if possible

import statsmodels.api as sm
# create hampel norm with default parameters a, b, c
hampel = sm.robust.norms.Hampel(a=2., b=4., c=8.)
# create specially crafted t1, t2, t3
t1, t2, t3 = hampel._subset([1, 3, 5, 9])
# calculate t4 as in `Hampel.rho` (line 639 calculates that implicitly, without assigning variable `t4`)
t4 = (1 - t1 + t2 + t3)
# t4 gives `array([0, 2, 2, 1])`

Note: As you can see, there are many issues on our GitHub tracker, so it is very possible that your issue has been posted before. Please check first before submitting so that we do not have to handle and close duplicates.

Note: Please be sure you are using the latest released version of statsmodels, or a recent build of main. If your problem has been fixed in an unreleased version, you might be able to use main until a new release occurs.

Note: If you are using a released version, have you verified that the bug exists in the main branch of this repository? It helps the limited resources if we know problems exist in the current main branch so that they do not need to check whether the code sample produces a bug in the next release.

If the issue has not been resolved, please file it in the issue tracker.

Expected Output

The implicit t4 that is calculated "on the fly" in Hampel.rho should result in an array array([0, 0, 0, 1]) to pick only the last piecewise boolean indicator array filter.

In fact, it is obvious the "on the fly" calculation in Hampel.rho line 639 should be

(1 - (t1 + t2 + t3)) * a * (b + c - a))

rather than the existing

(1 - t1 + t2 + t3) * a * (b + c - a))

Output of import statsmodels.api as sm; sm.show_versions()

INSTALLED VERSIONS

Python: 3.8.10.final.0
OS: Linux 5.10.102.1-microsoft-standard-WSL2 #1 SMP Wed Mar 2 00:30:59 UTC 2022 x86_64
byteorder: little
LC_ALL: None
LANG: C.UTF-8

statsmodels

Installed: 0.13.5 (/home/kamil/.cache/pypoetry/virtualenvs/smps-aJ6sAQBH-py3.8/lib/python3.8/site-packages/statsmodels)

Required Dependencies

cython: Not installed
numpy: 1.22.3 (/home/kamil/.cache/pypoetry/virtualenvs/smps-aJ6sAQBH-py3.8/lib/python3.8/site-packages/numpy)
scipy: 1.8.0 (/home/kamil/.cache/pypoetry/virtualenvs/smps-aJ6sAQBH-py3.8/lib/python3.8/site-packages/scipy)
pandas: 1.4.1 (/home/kamil/.cache/pypoetry/virtualenvs/smps-aJ6sAQBH-py3.8/lib/python3.8/site-packages/pandas)
dateutil: 2.8.2 (/home/kamil/.cache/pypoetry/virtualenvs/smps-aJ6sAQBH-py3.8/lib/python3.8/site-packages/dateutil)
patsy: 0.5.3 (/home/kamil/.cache/pypoetry/virtualenvs/smps-aJ6sAQBH-py3.8/lib/python3.8/site-packages/patsy)

Optional Dependencies

matplotlib: 3.5.1 (/home/kamil/.cache/pypoetry/virtualenvs/smps-aJ6sAQBH-py3.8/lib/python3.8/site-packages/matplotlib)
backend: module://matplotlib_inline.backend_inline
cvxopt: Not installed
joblib: 1.1.0 (/home/kamil/.cache/pypoetry/virtualenvs/smps-aJ6sAQBH-py3.8/lib/python3.8/site-packages/joblib)

Developer Tools

IPython: 8.1.1 (/home/kamil/.cache/pypoetry/virtualenvs/smps-aJ6sAQBH-py3.8/lib/python3.8/site-packages/IPython)
jinja2: 3.0.3 (/home/kamil/.cache/pypoetry/virtualenvs/smps-aJ6sAQBH-py3.8/lib/python3.8/site-packages/jinja2)
sphinx: Not installed
pygments: 2.11.2 (/home/kamil/.cache/pypoetry/virtualenvs/smps-aJ6sAQBH-py3.8/lib/python3.8/site-packages/pygments)
pytest: Not installed
virtualenv: Not installed

@kkmita
Copy link
Author

kkmita commented Feb 21, 2023

Note that this affects the same line of code as in #8697 but a different problem is raised in this BUG issue.

@josef-pkt
Copy link
Member

see #8697 (comment)

hampel will be fixed in #8801

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants