Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: cluster: _vq unable to handle large features #3702

Merged
merged 3 commits into from
Jun 4, 2014
Merged

BUG: cluster: _vq unable to handle large features #3702

merged 3 commits into from
Jun 4, 2014

Conversation

cairijun
Copy link
Contributor

@cairijun cairijun commented Jun 3, 2014

_vq.vq should initialize outdists to INF but ndarray.fill seems to treat NPY_INFINITY as an int32 and initialize outdists to 2 ^ 31 - 1, so that the result may be wrong when the features are large. Use np.inf to fill outdists would be safer.

ndarray.fill(np.inf) will fail with numpy 1.5 (only in Cython, no idea why), but np.PyArray_FillWithScalar will not. So use the latter instead.

@cairijun
Copy link
Contributor Author

cairijun commented Jun 3, 2014

Ah... the 50min error again.

outcodes = np.empty((nobs,), dtype=np.int32)
np.PyArray_FillWithScalar(outdists, np.inf)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NPY_INFINITY is a (C) double. I suspect the problem here is that the ndarray.fill method expects a python object rather than a C number, i.e., you should use np.inf instead. The np.PyArray_FillWithScalar function, OTOH, can handle a C type.

I think you should just try replacing NPY_INFINITY with np.inf.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, I see you tried that. Could be a bug in 1.5, which is truly ancient these days. What Cython version are you using?

@cairijun
Copy link
Contributor Author

cairijun commented Jun 3, 2014

Oops, numpy 1.5 failed on test__vq_sametype because _vq didn't check the dtype of obs before using it to initialize outdists, and this test passes an int array as the obs and it blows up outdists.fill. Fixed in 6b81841. outdists.fill(np.inf) should work now.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.0%) when pulling f0438db on richardtsai:vq_large_features into 9f89371 on scipy:master.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.0%) when pulling f0438db on richardtsai:vq_large_features into 9f89371 on scipy:master.

if obs.dtype != codes.dtype:
raise ValueError('observation and code should have same dtype')
if obs.dtype not in (np.float32, np.float64):
raise ValueError('type other than float or double not supported')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both of these look like TypeError rather than ValueError.

_vq.vq should initialize outdists to INF but ndarray.fill seems
to treat NPY_INFINITY as an int32 and initialize outdists
as 2 ^ 31 - 1, so that the result may be wrong when the features
are large. Use np.inf to fill outdists would be safer.
if obs.dtype != codes.dtype:
raise TypeError('observation and code should have same dtype')
if obs.dtype not in (np.float32, np.float64):
raise TypeError('type other than float or double not supported')
if obs_a.ndim != codes_a.ndim:
raise ValueError('observation and code should have same rank')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since rank is deprecated, might as well update this to "have the same number of dimensions". Might have to put that on the next line to fit the line length restriction:

        raise ValueError(
            "observation and code should have the same number of dimensions")

@charris
Copy link
Member

charris commented Jun 3, 2014

LGTM modulo nitpick.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.0%) when pulling 7578838 on richardtsai:vq_large_features into 9f89371 on scipy:master.

@cairijun cairijun closed this Jun 4, 2014
@cairijun cairijun reopened this Jun 4, 2014
@coveralls
Copy link

Coverage Status

Coverage increased (+0.0%) when pulling 719da20 on richardtsai:vq_large_features into 9f89371 on scipy:master.

charris added a commit that referenced this pull request Jun 4, 2014
BUG: cluster: _vq unable to handle large features
@charris charris merged commit 5e24556 into scipy:master Jun 4, 2014
@charris
Copy link
Member

charris commented Jun 4, 2014

Thanks Richard.

@rgommers rgommers added this to the 0.15.0 milestone Jun 4, 2014
@cairijun cairijun deleted the vq_large_features branch June 5, 2014 04:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
defect A clear bug or issue that prevents SciPy from being installed or used as expected scipy.cluster
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants