
Mismatching dimensions in extmath.weighted_mode #4454

Closed
hgascon opened this issue Mar 27, 2015 · 5 comments

@hgascon

hgascon commented Mar 27, 2015

Apparently, this check here isn't enough.

When using neighbors.RadiusNeighborsClassifier.predict, I keep getting errors due to the mismatching dimensions in the next line:
ValueError: operands could not be broadcast together with shapes (8792) (8807)
ValueError: operands could not be broadcast together with shapes (3) (2)
...
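Both messages are NumPy broadcasting failures. As a minimal sketch (plain NumPy, no scikit-learn involved), adding two 1-D arrays whose lengths differ, with neither length equal to 1, raises the same kind of ValueError. The arrays here are made up for illustration, and newer NumPy versions print the shapes as `(3,) (2,)` rather than `(3) (2)`.

```python
import numpy as np

# Two 1-D arrays of different lengths (neither is 1) cannot be broadcast.
a = np.zeros(3)
w = np.ones(2)

message = None
try:
    a + w
except ValueError as exc:
    # Same error family as in the report above.
    message = str(exc)

print(message)
```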

@amueller
Member

amueller commented Apr 1, 2015

Thanks for the report.
Can you give an example with synthetic data that is crashing?

@amueller
Member

amueller commented Apr 1, 2015

Also, which version of scikit-learn are you using? (and sorry for the slow reply, I was sick)

@amueller amueller added the Bug label Apr 1, 2015
@hgascon
Author

hgascon commented Apr 1, 2015

>>> sklearn.__version__
'0.14-git'

If a.shape != w.shape and a.shape > w.shape, there is no problem. But if a.shape < w.shape, the arrays can't be broadcast in the next line: w = np.zeros(a.shape, dtype=w.dtype) + w
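To check which shape pairs that line can handle, here is a small sketch that runs the quoted expression in isolation. The helper names `pad_weights` and `broadcasts` are hypothetical. Under NumPy's broadcasting rules, what matters is whether trailing dimensions match or are 1, not which shape tuple compares larger.

```python
import numpy as np

def pad_weights(a_shape, w):
    """Mimic `w = np.zeros(a.shape, dtype=w.dtype) + w` for a given shape of a."""
    return np.zeros(a_shape, dtype=w.dtype) + w

def broadcasts(a_shape, w_shape):
    """Return True if the expression above succeeds for these shapes."""
    try:
        pad_weights(a_shape, np.ones(w_shape))
        return True
    except ValueError:
        return False

# w has fewer dimensions and matches a's trailing axis: succeeds.
print(broadcasts((4, 3), (3,)))
# Scalar (0-d) weight: succeeds.
print(broadcasts((5,), ()))
# Same number of dimensions, different lengths: fails in both directions.
print(broadcasts((3,), (2,)))
print(broadcasts((2,), (3,)))
```

In this sketch the expression only fills in missing leading dimensions; two 1-D arrays of different lengths, like the (10032) and (10050) pair in the traceback below, fail regardless of which one is longer.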

In [53]: X_train
Out[53]:
<10076x70068 sparse matrix of type '<type 'numpy.int64'>'
    with 1466563 stored elements in Compressed Sparse Column format>
In [54]: X_val
Out[54]:
<2520x70068 sparse matrix of type '<type 'numpy.int64'>'
    with 437226 stored elements in Compressed Sparse Column format>
In [55]: rnc = RadiusNeighborsClassifier(radius=30, weights='distance', outlier_label='unknown')
In [56]: rnc.fit(X_train, y_train)
In [57]: y_pred = rnc.predict(X_val)

/usr/local/lib/python2.7/site-packages/sklearn/utils/extmath.py in weighted_mode(a, w, axis)
    304
    305     if a.shape != w.shape:
--> 306         w = np.zeros(a.shape, dtype=w.dtype) + w
ValueError: operands could not be broadcast together with shapes (10032) (10050)

@amueller
Member

amueller commented Apr 1, 2015

The code looks like it tries to handle different ndims, not different shapes.
I don't understand what the function is supposed to do if the shapes are different.
As I said, it would be great if you could reproduce with random data.
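For reference, a weighted mode returns the value whose summed weight is largest. The sketch below is a hypothetical 1-D version (`weighted_mode_1d` is not scikit-learn's implementation) that requires `a` and `w` to have identical shapes and raises a clear error otherwise, instead of relying on broadcasting.

```python
import numpy as np

def weighted_mode_1d(a, w):
    """Hypothetical 1-D weighted mode: the value of a with the largest total weight."""
    a = np.asarray(a)
    w = np.asarray(w, dtype=float)
    if a.shape != w.shape:
        # Fail loudly rather than broadcasting mismatched arrays.
        raise ValueError(f"a and w must have the same shape, got {a.shape} and {w.shape}")
    # Map each element to the index of its unique value, then sum weights per value.
    values, inverse = np.unique(a, return_inverse=True)
    totals = np.bincount(inverse.ravel(), weights=w, minlength=len(values))
    best = np.argmax(totals)  # on ties, the smallest value wins
    return values[best], totals[best]

mode, score = weighted_mode_1d([4, 1, 4, 2, 4, 2], [1, 3, 3, 0.5, 0.5, 1])
print(mode, score)
```

Here the totals are 3 for value 1, 1.5 for value 2 and 4.5 for value 4, so the sketch returns 4 with a score of 4.5.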

Also, I'm not sure what is happening in train here. Your X_train seems to have 70068 datapoints, while y_train only has four labels.

Could you try upgrading scikit-learn to 0.16? You are using an old development version.

@amueller amueller modified the milestone: 0.19 Sep 29, 2016
@amueller
Member

Closing, as there was no reply.
