BUG: cluster: _vq unable to handle large features #3702

cairijun · 2014-06-03T07:19:47Z

_vq.vq should initialize outdists to INF, but ndarray.fill seems to treat NPY_INFINITY as an int32 and initializes outdists to 2^31 - 1, so the result may be wrong when the features are large. Using np.inf to fill outdists would be safer.

ndarray.fill(np.inf) will fail with numpy 1.5 (only in Cython, no idea why), but np.PyArray_FillWithScalar will not. So use the latter instead.
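To illustrate why the initial value matters, here is a minimal pure-Python sketch of a nearest-code search. It is not the actual _vq implementation: the helper name `nearest_code` and the sample data are assumptions made for illustration, and Euclidean distance is assumed. When every distance exceeds 2^31 - 1, a search seeded with that value never picks a code, while one seeded with infinity does.

```python
import math

INT32_MAX = 2**31 - 1  # the value outdists reportedly ended up with


def nearest_code(obs, codebook, init_dist):
    """Hypothetical helper: return (index, distance) of the closest code.

    Returns (-1, init_dist) if no code is closer than init_dist.
    """
    best_idx, best_dist = -1, init_dist
    for i, code in enumerate(codebook):
        # Euclidean distance between the observation and this code
        dist = math.sqrt(sum((o - c) ** 2 for o, c in zip(obs, code)))
        if dist < best_dist:
            best_idx, best_dist = i, dist
    return best_idx, best_dist


# Large features: both distances are far above 2**31 - 1.
obs = [1e10, 1e10]
codebook = [[0.0, 0.0], [5e9, 5e9]]

bad = nearest_code(obs, codebook, INT32_MAX)   # no code beats the sentinel
good = nearest_code(obs, codebook, math.inf)   # second code is selected
```

With the finite sentinel, `bad` keeps the index -1 and the sentinel distance; with `math.inf`, `good` selects the second code.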

cairijun · 2014-06-03T09:37:45Z

Ah... the 50min error again.

charris · 2014-06-03T12:56:25Z

scipy/cluster/_vq.pyx

    outcodes = np.empty((nobs,), dtype=np.int32)
+    np.PyArray_FillWithScalar(outdists, np.inf)


NPY_INFINITY is a (C) double. I suspect the problem here is that the ndarray.fill method expects a python object rather than a C number, i.e., you should use np.inf instead. The np.PyArray_FillWithScalar function, OTOH, can handle a C type.

I think you should just try replacing NPY_INFINITY with np.inf.

Oops, I see you tried that. Could be a bug in 1.5, which is truly ancient these days. What Cython version are you using?

cairijun · 2014-06-03T13:39:23Z

Oops, numpy 1.5 failed on test__vq_sametype because _vq didn't check the dtype of obs before using it to initialize outdists. This test passes an int array as obs, which blows up outdists.fill. Fixed in 6b81841. outdists.fill(np.inf) should work now.

coveralls · 2014-06-03T14:13:10Z

Coverage increased (+0.0%) when pulling f0438db on richardtsai:vq_large_features into 9f89371 on scipy:master.

coveralls · 2014-06-03T14:24:19Z

Coverage increased (+0.0%) when pulling f0438db on richardtsai:vq_large_features into 9f89371 on scipy:master.

charris · 2014-06-03T15:10:56Z

scipy/cluster/_vq.pyx

+    if obs.dtype != codes.dtype:
+        raise ValueError('observation and code should have same dtype')
+    if obs.dtype not in (np.float32, np.float64):
+        raise ValueError('type other than float or double not supported')


Both of these look like TypeError rather than ValueError.

charris · 2014-06-03T16:09:55Z

scipy/cluster/_vq.pyx

+    if obs.dtype != codes.dtype:
+        raise TypeError('observation and code should have same dtype')
+    if obs.dtype not in (np.float32, np.float64):
+        raise TypeError('type other than float or double not supported')
    if obs_a.ndim != codes_a.ndim:
        raise ValueError('observation and code should have same rank')


Since rank is deprecated, might as well update this to "have the same number of dimensions". Might have to put that on the next line to fit the line length restriction:

    raise ValueError(
        "observation and code should have the same number of dimensions")
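For reference, the checks discussed in this review could be sketched in plain Python as below. The helper name `check_vq_inputs` is hypothetical and is not part of scipy. It only mirrors the review suggestions: TypeError for dtype problems, ValueError for a mismatch in number of dimensions.

```python
import numpy as np


def check_vq_inputs(obs, codes):
    """Hypothetical helper mirroring the checks discussed in this review."""
    obs = np.asarray(obs)
    codes = np.asarray(codes)
    # dtype problems are reported as TypeError, as suggested above
    if obs.dtype != codes.dtype:
        raise TypeError('observation and code should have same dtype')
    if obs.dtype not in (np.float32, np.float64):
        raise TypeError('type other than float or double not supported')
    # a shape problem stays a ValueError, with 'rank' wording replaced
    if obs.ndim != codes.ndim:
        raise ValueError(
            'observation and code should have the same number of dimensions')
    return obs, codes
```

Passing two integer arrays raises TypeError before any float-only operation (such as filling with np.inf) is attempted.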

charris · 2014-06-03T16:11:19Z

LGTM modulo nitpick.

coveralls · 2014-06-03T18:13:51Z

Coverage increased (+0.0%) when pulling 7578838 on richardtsai:vq_large_features into 9f89371 on scipy:master.

coveralls · 2014-06-04T03:52:49Z

Coverage increased (+0.0%) when pulling 719da20 on richardtsai:vq_large_features into 9f89371 on scipy:master.

charris · 2014-06-04T15:48:35Z

Thanks Richard.

charris reviewed Jun 3, 2014
WarrenWeckesser added PR labels Jun 3, 2014

charris reviewed Jun 3, 2014
cairijun added 2 commits June 3, 2014 23:55

BUG: cluster: _vq should check the dtype before using it

02e78d9

charris reviewed Jun 3, 2014
MAINT: cluster: use 'ndim' instead of 'rank' in _vq

719da20

cairijun closed this Jun 4, 2014

cairijun reopened this Jun 4, 2014

charris added a commit that referenced this pull request Jun 4, 2014

Merge pull request #3702 from richardtsai/vq_large_features

5e24556

BUG: cluster: _vq unable to handle large features

charris merged commit 5e24556 into scipy:master Jun 4, 2014

rgommers added this to the 0.15.0 milestone Jun 4, 2014

cairijun deleted the vq_large_features branch June 5, 2014 04:56
