BUG: cluster: _vq unable to handle large features #3702
Ah... the 50-minute error again.
outcodes = np.empty((nobs,), dtype=np.int32)
np.PyArray_FillWithScalar(outdists, np.inf)
NPY_INFINITY is a (C) double. I suspect the problem here is that the ndarray.fill method expects a Python object rather than a C number, i.e., you should use np.inf instead. The np.PyArray_FillWithScalar function, OTOH, can handle a C type.
I think you should just try replacing NPY_INFINITY with np.inf.
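The consequence being discussed can be seen without Cython. In this minimal NumPy sketch (the helper `fill_sentinel` is hypothetical), a float64 buffer standing in for `outdists` is filled once with `np.inf` and once with the int32 maximum, 2^31 - 1, which is the value this PR reports `outdists` ending up with. Only the first is a sentinel that every finite squared distance beats.

```python
import numpy as np

# 2**31 - 1, the value outdists was observed to hold instead of INF.
INT32_MAX = np.iinfo(np.int32).max


def fill_sentinel(n, value):
    """Hypothetical helper: a float64 buffer of length n filled via ndarray.fill."""
    buf = np.empty(n, dtype=np.float64)
    buf.fill(value)
    return buf


good = fill_sentinel(3, np.inf)
bad = fill_sentinel(3, INT32_MAX)

large_dist = 1e10  # squared distance between features of magnitude ~1e5
print(large_dist < good[0])  # True: any finite distance beats inf
print(large_dist < bad[0])   # False: the "infinite" sentinel is only ~2.1e9
```

The sketch only shows why the fill value matters; it does not reproduce how Cython ends up converting NPY_INFINITY.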
Oops, I see you tried that. Could be a bug in 1.5, which is truly ancient these days. What Cython version are you using?
Oops, numpy 1.5 failed on
if obs.dtype != codes.dtype:
    raise ValueError('observation and code should have same dtype')
if obs.dtype not in (np.float32, np.float64):
    raise ValueError('type other than float or double not supported')
Both of these look like TypeError rather than ValueError.
_vq.vq should initialize outdists to INF, but ndarray.fill seems to treat NPY_INFINITY as an int32 and initializes outdists to 2^31 - 1, so the result may be wrong when the features are large. Using np.inf to fill outdists would be safer.
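The failure mode can be sketched in pure NumPy. `vq_sketch` below is a hypothetical illustration, not the Cython code in `_vq`; it assumes the kernel keeps, per observation, the smallest squared distance seen so far and only updates `outcodes` when a code beats it. With `np.inf` as the starting value every observation gets a code; with 2^31 - 1, an observation whose nearest squared distance exceeds about 2.1e9 is never assigned.

```python
import numpy as np


def vq_sketch(obs, codes, init_dist):
    """Hypothetical nearest-code search; init_dist seeds each outdists entry."""
    obs = np.asarray(obs, dtype=np.float64)
    codes = np.asarray(codes, dtype=np.float64)
    nobs = obs.shape[0]
    # -1 marks "never assigned" (the PR's code uses np.empty, i.e. garbage).
    outcodes = np.full(nobs, -1, dtype=np.int32)
    outdists = np.full(nobs, init_dist, dtype=np.float64)
    for i in range(nobs):
        for j in range(codes.shape[0]):
            # Squared Euclidean distance between observation i and code j.
            dist = np.sum((obs[i] - codes[j]) ** 2)
            if dist < outdists[i]:
                outdists[i] = dist
                outcodes[i] = j
    return outcodes, outdists


obs = [[1e5, 0.0], [1.0, 1.0]]   # first row has a "large" feature
codes = [[0.0, 0.0], [3e5, 0.0]]

print(vq_sketch(obs, codes, np.inf)[0].tolist())     # [0, 0]
print(vq_sketch(obs, codes, 2**31 - 1)[0].tolist())  # [-1, 0]
```

Small features are unaffected because their distances stay below the bogus sentinel, which is why the bug only shows up with large features.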
if obs.dtype != codes.dtype:
    raise TypeError('observation and code should have same dtype')
if obs.dtype not in (np.float32, np.float64):
    raise TypeError('type other than float or double not supported')
if obs_a.ndim != codes_a.ndim:
    raise ValueError('observation and code should have same rank')
Since rank is deprecated, might as well update this to "have the same number of dimensions". Might have to put that on the next line to fit the line length restriction:
raise ValueError(
    "observation and code should have the same number of dimensions")
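Putting the review suggestions together, a standalone version of these checks might look like the following. The name `check_vq_inputs` is hypothetical and this is not the code merged in this PR; it just applies the TypeError/ValueError split and the reworded dimension message suggested in the review.

```python
import numpy as np


def check_vq_inputs(obs, codes):
    """Hypothetical helper mirroring the input checks discussed in this review."""
    obs = np.asarray(obs)
    codes = np.asarray(codes)
    # Wrong dtype means the argument has the wrong *type*: TypeError.
    if obs.dtype != codes.dtype:
        raise TypeError('observation and code should have same dtype')
    if obs.dtype not in (np.float32, np.float64):
        raise TypeError('type other than float or double not supported')
    # Mismatched dimensions means an unacceptable *value*: ValueError.
    if obs.ndim != codes.ndim:
        raise ValueError(
            "observation and code should have the same number of dimensions")
    return obs, codes


obs, codes = check_vq_inputs([[1.0, 2.0]], [[0.0, 0.0]])  # passes
```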
LGTM modulo nitpick.
BUG: cluster: _vq unable to handle large features
Thanks Richard.
_vq.vq should initialize outdists to INF, but ndarray.fill seems to treat NPY_INFINITY as an int32 and initializes outdists to 2^31 - 1, so the result may be wrong when the features are large. Using np.inf to fill outdists would be safer. ndarray.fill(np.inf) fails with numpy 1.5 (only in Cython, no idea why), but np.PyArray_FillWithScalar does not, so use the latter instead.