Tests failing #2
Comments
Thanks for reporting this, Tim. I suspected that having floating-point calculations in the tests might lead to some issues, so I've changed the tests to match only the first two digits after the decimal point. Could you please paste the output of |
Flask==0.10.1
Jinja2==2.6
MDP==3.3
PIL==1.1.7
PySAL==1.5.0
PySide==1.1.2
PyYAML==3.10
Pygments==1.6
SQLAlchemy==0.8.1
Sphinx==1.1.3
Werkzeug==0.9.1
astropy==0.2.3
atom==0.2.3
binstar-client==0.1.0
biopython==1.61
bitarray==0.8.1
boto==2.9.6
casuarius==1.1
chaco==4.2.1
conda==1.8.1
cubes==0.10.2
distribute==0.6.45
docutils==0.10
enable==4.2.1
enaml==0.7.6
gevent==0.13.8
gevent-websocket==0.3.6
gevent-zeromq==0.2.2
greenlet==0.4.1
grin==1.2.1
h5py==2.1.1
ipython==0.13.2
itsdangerous==0.21
keyring==1.4
llvmmath==0.1
llvmpy==0.11.3
lxml==3.2.1
matplotlib==1.2.1
menuinst==1.0.1
meta==development
moves==0.1
networkx==1.7
nltk==2.0.4
nose==1.3.0
numba==0.9.0
numexpr==2.0.1
numpy==1.7.1
pandas==0.12.0
pep8==1.4.5
ply==3.4
praw==2.1.4
psutil==0.7.1
py==1.4.14
pycosat==0.6.0
pycparser==2.09.1
pycrypto==2.6
pyface==4.2.1
pyflakes==0.7.2
pyparsing==1.5.6
pyreadline==2.0-dev1
pytest==2.3.5
python-dateutil==2.1
pytz==2013b
pywin32==218.4
pyzmq==2.2.0.1
requests==1.2.3
rope==0.9.4
scikit-image==0.8.2
scikit-learn==0.14.1
scipy==0.12.0
simplejson==3.3.0
six==1.3.0
sklearn-pandas==0.0.3
spyder==2.2.0
statsmodels==0.4.3
sympy==0.7.2
tables==2.4.0
tornado==3.1
traits==4.2.1
traitsui==4.2.1
tweepy==2.1
update-checker==0.5
vincent==0.2
wsgiref==0.1.2
xlrd==0.9.2
xlwt==0.7.5 |
I'm able to reproduce this. It seems the interface of sklearn has changed. The following code fails with scikit-learn 0.14.1 but works with scikit-learn 0.13.1:
I need to investigate further to see if this is a sklearn bug or if the tests need to be adjusted appropriately. |
I second that it is a "bug" in sklearn 0.14, in file sklearn/utils/multiclass.py, function
    if y.ndim > 2 or y.dtype == object:
        return 'unknown'
It will return 'unknown' for any np.array of strings. So it works fine with ['cat', 'dog', 'fish'], but not with np.asarray(['cat', 'dog', 'fish']) anymore. |
Good find. Still, it seems a little weird for the behaviour to change based on the data representation of two arrays that numpy considers equivalent. I'm going to take a better look at the sklearn code to see if there's a good reason behind this and if I can't find one I'll file a bug report on that project.
|
It seems to be a deeper disconnect between sklearn and pandas than I'd hoped: pandas seems to want string arrays to have the "object" dtype, while sklearn expects them to have the appropriate numpy string datatype. I've found a few hack solutions but none that I feel good about publishing. I'm continuing to dig into the sklearn code to see if there's a better way. |
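The dtype mismatch described above can be seen with NumPy alone. This is a minimal sketch, not sklearn's actual code: `looks_unknown` is a hypothetical helper that only reproduces the condition quoted earlier in this thread (`y.ndim > 2 or y.dtype == object`), and the object-dtype array is an assumption standing in for what pandas hands over for a string column.

```python
import numpy as np

def looks_unknown(y):
    """Hypothetical helper: mirrors only the condition quoted in this thread."""
    y = np.asarray(y)
    return y.ndim > 2 or y.dtype == object

labels = ['cat', 'dog', 'fish']

# A plain list becomes a fixed-width NumPy string array (kind 'U' on Python 3).
fixed_width = np.asarray(labels)

# Assumption: this mimics the object-dtype array pandas produces for strings.
as_object = np.asarray(labels, dtype=object)

print(fixed_width.dtype.kind, looks_unknown(fixed_width))
print(as_object.dtype.kind, looks_unknown(as_object))
```

The two arrays hold the same values, yet only the object-dtype one trips the quoted condition.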
I think we want to support |
I wish numpy had a dtype for variable length strings... |
Yes, it's a shame that an array of sequences looks the same (from a dtype perspective) as an array of variable-length strings. The least hacky workaround that I can think of without changing sklearn is to convert the arrays to fixed-length strings ( |
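One way to sketch the fixed-length conversion mentioned above, using only NumPy. The object-dtype input is an assumption standing in for a pandas string column, and `astype(str)` is just one possible way to do the conversion, not necessarily the one used in sklearn_pandas.

```python
import numpy as np

# Assumption: an object-dtype array like the one pandas yields for a string column.
as_object = np.array(['cat', 'dog', 'fish'], dtype=object)

# astype(str) produces a fixed-width string dtype sized to the longest element.
fixed_width = as_object.astype(str)

print(as_object.dtype)         # the generic object dtype
print(fixed_width.dtype.kind)  # a NumPy string kind rather than 'O'
print(fixed_width.tolist())    # values are unchanged
```

The conversion has to look at every element to size the dtype, so it is not free on large columns.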
+1 for the temp workaround in sklearn_pandas to restore sklearn 0.14 compat, and I will create an issue for the sklearn project. |
@paulgb wouldn't converting to list of string or unicode before sending to sklearn work even better? |
That would work, but the issue is more with knowing when to convert to strings without having to do a scan of the entire table. I experimented a little more with pandas internals and it seems that I've updated the code here and on PyPI to version 0.0.4, which includes this fix. I'd like to wait to hear some feedback on whether it solved the problem before closing this issue. |
Closing this optimistically based on tests passing and things working for Tim (https://twitter.com/tdhopper/status/381088588739272704) |
First time one of my tweets has ever been mentioned as a reason for closing a bug report. |
There is a mismatch between "what you can pass" and "what you are actually passing". This means that scikit-learn is not able to recognize what type of problem you want to solve (regression or classification). The "Unknown label type: 'unknown'" error is raised in relation to the y values that you pass to scikit-learn.
|
I'm seeing a bunch of tests fail. I'm on Windows 7 with Python 2.7.5 via Anaconda 1.6.2 (64-bit).