Loading Arff Data not working #57

ameet-1997 · 2017-05-30T04:56:32Z

The first line of the official docs on loading datasets,
from skmultilearn.dataset import Dataset
fails with the error "ImportError: cannot import name 'Dataset'".

ChristianSch · 2017-06-28T11:12:49Z

Hey there,
the docs are wrong. You can load ARFF data sets like this:

from skmultilearn.dataset import load_dataset
d = load_dataset('enron', 'undivided')

d is a dictionary which, in the case of enron, holds the features and labels under the keys X and y:

X, y = d['X'], d['y']

Available data sets can be listed like this:

from skmultilearn.dataset import available_data_sets
available_data_sets()

For more information please see here.
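To illustrate what available_data_sets() does with the list it fetches, here is a small parser that groups file names into data set names and variants. This is a sketch, not the library's code: the line format "<hash>;<name>-<variant>.<extension>" is an assumption inferred from the split calls visible in the dataset.py traceback quoted in this thread, and the sample hashes and file names are hypothetical.

```python
from collections import defaultdict


def parse_dataset_listing(listing):
    """Map each data set name to the set of variants found in a text listing.

    Assumes (hypothetically) one entry per line, formatted as
    "<hash>;<name>-<variant>.<extension>".
    """
    variants = defaultdict(set)
    for line in listing.splitlines():
        if not line.strip():
            continue  # skip blank lines, e.g. a trailing newline
        file_name = line.split(';')[-1]         # drop the hash column
        stem = file_name.split('.')[0]          # drop the extension(s)
        name, _, variant = stem.partition('-')  # "enron-train" -> ("enron", "train")
        variants[name].add(variant)
    return dict(variants)


# Hypothetical listing; hashes and file names are placeholders.
sample = (
    "abc123;enron-train.scikitml.bz2\n"
    "def456;enron-test.scikitml.bz2\n"
    "0789ff;enron-undivided.scikitml.bz2\n"
)
print(parse_dataset_listing(sample))
```

The parser expects text (str). Feeding it the raw bytes of an HTTP response on Python 3 would fail in the same way as the traceback below.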

mattalhonte · 2017-08-26T06:11:01Z

Not working here either, on Python 3.5 with Anaconda. The first time, it failed while trying to load the standard library module from future. I worked around that by commenting out the import.

Now I'm getting this:
from skmultilearn.dataset import available_data_sets
available_data_sets()

TypeError                                 Traceback (most recent call last)
<ipython-input-73-c3b496b1614f> in <module>()
      1 from skmultilearn.dataset import available_data_sets
----> 2 available_data_sets()

~/miniconda3/envs/py35/lib/python3.5/site-packages/skmultilearn/dataset.py in available_data_sets()
    106     archives = get_dataset_list()
    107     archives = [x.split(';')[-1].split('.')[0].split('-') for x in archives.split('\n')]
--> 108     variants = set()
    109     for a in archives:
    110         if a[0] not in variants:

TypeError: a bytes-like object is required, not 'str'
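On Python 3, this TypeError is what you get when you pass a str separator to bytes.split(). Below is a minimal reproduction. It assumes, without confirmation from this thread, that get_dataset_list() returns the raw bytes of a download and that the listing is UTF-8. The payload is hypothetical.

```python
# Hypothetical listing payload as bytes (what a downloaded body is on Python 3).
raw = b"abc123;enron-train.scikitml.bz2\ndef456;enron-test.scikitml.bz2"

error_message = None
try:
    raw.split('\n')  # str separator on a bytes object raises TypeError
except TypeError as exc:
    error_message = str(exc)
print(error_message)

# Decoding first (UTF-8 assumed here) yields a str, so str separators work again.
lines = raw.decode('utf-8').split('\n')
print(lines)
```

On Python 2, bytes is an alias of str, so the same split succeeds. That is consistent with the problem reproducing on Python 3.6 but not on Python 2.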

ljvmiranda921 · 2017-08-29T15:07:37Z

I think this is the same bytes/str incompatibility between Python 2 and 3.
To the creators: I'm wondering whether you run integration tests (e.g. with travis-ci) to at least check compatibility.
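A CI matrix with one job per Python version could run a regression test like the one below. It is a sketch, not skmultilearn's test suite. listing_to_text is a hypothetical helper that decodes a bytes payload before any str-based parsing.

```python
import unittest


def listing_to_text(payload, encoding='utf-8'):
    """Return payload as text, decoding it first if it is bytes (hypothetical helper)."""
    if isinstance(payload, bytes):
        return payload.decode(encoding)
    return payload


class ListingToTextTest(unittest.TestCase):
    def test_bytes_payload_is_decoded(self):
        self.assertEqual(listing_to_text(b"a;enron-train\n"), "a;enron-train\n")

    def test_text_payload_is_unchanged(self):
        self.assertEqual(listing_to_text("a;enron-train\n"), "a;enron-train\n")

    def test_result_supports_str_split(self):
        # The exact operation that failed in the traceback: splitting on a str.
        lines = listing_to_text(b"a;x\nb;y").split('\n')
        self.assertEqual(lines, ["a;x", "b;y"])


if __name__ == '__main__':
    # exit=False keeps the interpreter alive after the run; argv ignores CLI args.
    unittest.main(argv=['listing_to_text_tests'], exit=False)
```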

ChristianSch · 2017-09-27T16:51:12Z

I can reproduce this problem on a clean install of Python 3.6, but not on Python 2. I'll look into it.

fixed dataset fetching and listing; closes #57

niedakh · 2018-03-31T12:58:10Z

@ChristianSch thank you for the fix!

* Added support for sparse X and y. Corrected small typo: "ytring" -> "ystring"
* Added tests for sparse X and y.
* Added a return state to the fit() method to comlpy with the usual interface of scikit-learn.
* Cleanup: deleted unused variables, corrected variable name case, ...
* fixed dataset fetching and listing; closes scikit-multilearn#57
* hotfix: removed standard library packages from requirements.txt to prevent typosquatting and malicious code execution
* Fix np.zeros in rakelo.py (scikit-multilearn#76)
  This commit fixes the use of np.zeros, formerly np.zeroes, to resolve Issue scikit-multilearn#76
  Author: ljvmiranda921
  Email: ljvmiranda@gmail.com
* add citation info
  test if slack commit log works
* Update README.md (scikit-multilearn#74)
  This commit updates the README.md for this library. New sections were added:
  - A short description of the project
  - Features
  - Dependencies
  - Installation
  - Basic Usage
  - Contributing
  - Cite
  Two badges were also added:
  - PyPI badge
  - License badge
  Next steps (can only be done by project owner):
  - Add travis-ci badge once a successfull travis-build is implemented (owner only).
  - Add documentation badge (owner only)
  Author: ljvmiranda921
  E-mail: ljvmiranda@gmail.com
* initial travis setup
* fix tests
* remove pyc in travis
* name container properly, pass MEKA_CLASSPATH
* add travis
* add slack notifications
* trying multi-env travis
* fixed igraph package name
* add test requirements
* add osx test build
* fix linux py3 build name
* fix osx homebrew repo name
* add one more osx homebrew repo
* enforce p3 on osx in travis
* fix python osx problems on travis
* pip2 instead of pip
* cask remove oclint
* pip3 instead of pip
* explicit python3 on osx for travis
* correc liac-arff req
* add dtype to np.zeros
* add a per label binarizer for quality measures, closes scikit-multilearn#84
* fix BRkNN top label number selection syntax error
* mod fit for sparse y columns
  ensure that sparse y columns of shapes like (800,1) (returned from some classifiers etc) are converted properly to shape (800,) -otherwise bugs are thrown by some scikit validation functions.
* proper processing of output matrix structures
  ensure proper back and forth conversions of y values shaped like (800,1) and (800,) - to avoid errors thrown by some scikit validation functions. np.ravel does not properly process some matrices unless they are first cast to arrays.
* make the code much more readable
* some more variables renamed to a more informative name
* import issparse, reformat
* import numpy
* prediction transposition in CC is no longer required
* fix returning 1d label vector and testing for that
* fix meka io bytes/strings and decode if needed, reformat
* enforce updating to current docker image
* fix travis for meka tests
* add self-edge normalization option and fix test
* disable weight normalization on unweighted graph
* separate graph builders and label space clusterers, more tests written, some parameter sets for graphtool do not work atm
* formatting changes
* don't build a list if noone needs it
* fix some circularity problems with typing
* less extensive data testing for now, a lot of cases fail with certain generator params, due to one-classiness of partitions
* fix stochastic block modelling based on graphtool
* fix and standardize clustering output alongside with a proper integration test in label space partitioning classifier
* remove osx travis for now, not working anyway
* adjust partitions to new test data
* adjust labelset sizes to new test data
* make sure correct version is pulled
* change the default rakeld/rakelo behaviour to include labelpowerset, a voting classifier is added to allow overlapping classification in rakel style with any clusterer and clasifier, a rakel's clustering logic is moved to a random clusterer
* fix CV base test
* travis python2 should work correctly now, with new devdocker
* introduce a working test case instead of randomly crashing generators
* add absoulte imports to fix igraph import in p2
* fix tests and add set_params support for clusters so that CV works
* some flaky setup line for travis
* fix random label space clusterer test with overlaps to pass
* temporary workaround for dense matrices
* workaround output in matrix shape as well
* update documentation
* documentation and naming corrections
* fix rename of cluster_*
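The "proper processing of output matrix structures" item notes that some matrices do not flatten properly unless they are first cast to arrays. The sketch below uses numpy only and a hypothetical 800 x 1 label column, matching the shape in the commit message. An np.matrix is always two-dimensional, so its own ravel() method returns shape (1, 800), while casting to a plain ndarray first gives (800,). The sketch does not show how skmultilearn itself handled this.

```python
import numpy as np

# A single label column stored as an np.matrix (hypothetical 800 x 1 shape).
column = np.matrix(np.ones((800, 1)))

# The matrix method keeps two dimensions, because np.matrix is always 2-D.
flat_matrix = column.ravel()
print(flat_matrix.shape)  # (1, 800)

# Casting to a plain ndarray first yields the 1-D shape scikit-learn validators expect.
flat_array = np.asarray(column).ravel()
print(flat_array.shape)  # (800,)
```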

* Added support for sparse X and y. Corrected small typo: "ytring" -> "ystring"
* Added tests for sparse X and y.
* Added a return state to the fit() method to comlpy with the usual interface of scikit-learn.
* Cleanup: deleted unused variables, corrected variable name case, ...
* fixed dataset fetching and listing; closes scikit-multilearn#57
* hotfix: removed standard library packages from requirements.txt to prevent typosquatting and malicious code execution
* Fix np.zeros in rakelo.py (scikit-multilearn#76)
  This commit fixes the use of np.zeros, formerly np.zeroes, to resolve Issue scikit-multilearn#76
  Author: ljvmiranda921
  Email: ljvmiranda@gmail.com
* add citation info
  test if slack commit log works
* Update README.md (scikit-multilearn#74)
  This commit updates the README.md for this library. New sections were added:
  - A short description of the project
  - Features
  - Dependencies
  - Installation
  - Basic Usage
  - Contributing
  - Cite
  Two badges were also added:
  - PyPI badge
  - License badge
  Next steps (can only be done by project owner):
  - Add travis-ci badge once a successfull travis-build is implemented (owner only).
  - Add documentation badge (owner only)
  Author: ljvmiranda921
  E-mail: ljvmiranda@gmail.com
* implemented iterative stratification with high-order relationships support (Sechidis2011, Szymanski2017)
* fix imports, random state and fold init
* unit tests
* add documentation
* initial travis setup
* fix tests
* remove pyc in travis
* name container properly, pass MEKA_CLASSPATH
* add travis
* add slack notifications
* trying multi-env travis
* fixed igraph package name
* add test requirements
* add osx test build
* fix linux py3 build name
* fix osx homebrew repo name
* add one more osx homebrew repo
* enforce p3 on osx in travis
* fix python osx problems on travis
* pip2 instead of pip
* cask remove oclint
* pip3 instead of pip
* explicit python3 on osx for travis
* correc liac-arff req
* add dtype to np.zeros
* add a per label binarizer for quality measures, closes scikit-multilearn#84
* fix BRkNN top label number selection syntax error
* fix normalization of confidence computation
  normalize by number of neighbors counted, rather than label count (original produces results > 1 when k > num_labels.)
* mod fit for sparse y columns
  ensure that sparse y columns of shapes like (800,1) (returned from some classifiers etc) are converted properly to shape (800,) -otherwise bugs are thrown by some scikit validation functions.
* proper processing of output matrix structures
  ensure proper back and forth conversions of y values shaped like (800,1) and (800,) - to avoid errors thrown by some scikit validation functions. np.ravel does not properly process some matrices unless they are first cast to arrays.
* make the code much more readable
* some more variables renamed to a more informative name
* import issparse, reformat
* import numpy
* prediction transposition in CC is no longer required
* fix returning 1d label vector and testing for that
* fix meka io bytes/strings and decode if needed, reformat
* enforce updating to current docker image
* fix travis for meka tests
* add self-edge normalization option and fix test
* disable weight normalization on unweighted graph
* separate graph builders and label space clusterers, more tests written, some parameter sets for graphtool do not work atm
* formatting changes
* don't build a list if noone needs it
* fix some circularity problems with typing
* less extensive data testing for now, a lot of cases fail with certain generator params, due to one-classiness of partitions
* fix stochastic block modelling based on graphtool
* fix and standardize clustering output alongside with a proper integration test in label space partitioning classifier
* remove osx travis for now, not working anyway
* adjust partitions to new test data
* adjust labelset sizes to new test data
* make sure correct version is pulled
* change the default rakeld/rakelo behaviour to include labelpowerset, a voting classifier is added to allow overlapping classification in rakel style with any clusterer and clasifier, a rakel's clustering logic is moved to a random clusterer
* fix CV base test
* travis python2 should work correctly now, with new devdocker
* introduce a working test case instead of randomly crashing generators
* add absoulte imports to fix igraph import in p2
* fix tests and add set_params support for clusters so that CV works
* some flaky setup line for travis
* fix random label space clusterer test with overlaps to pass
* temporary workaround for dense matrices
* workaround output in matrix shape as well
* update documentation
* documentation and naming corrections
* fix rename of cluster_*
* adhere to review
* document helpers
* see reqs/dev
* clean up requirements
* adhere to new docs convention
* skip graphtool on windows
* make external library imports optional
* first approach at windows oriented CI
* copy how requests did it
* fix paths
* more work on appveyor
* change path delimeter for windows XD
* one more path separator fix
* add missing wrapper for windows
* give up on cmd_in_env atm
* skip igraph test on win32 atm
* ignore java for a moment
* fix cmdlet
* more debugging of appveyor
* more appveyor
* don't build for now, just test
* give up igraph/graphtool tests on win32
* fix test command on windows
* more disabling of igraph and graphtool on win32
* fix the louvain community dependency
* close file before removal, should fix meka on windows
* setup slack notifications
* fix yaml indent
* add appveyor badge
* save file name before closing
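The "fix normalization of confidence computation" item says a label's confidence should be divided by the number of neighbors counted rather than by the number of labels, because the old version could exceed 1 when k > num_labels. The sketch below illustrates that idea with numpy. The function name and the (k, n_labels) input layout are hypothetical, and this is not BRkNN's actual code.

```python
import numpy as np


def label_confidences(neighbor_labels):
    """Fraction of the k nearest neighbors that carry each label.

    neighbor_labels: (k, n_labels) binary array, one row per neighbor
    (a hypothetical layout chosen for this sketch).
    """
    neighbor_labels = np.asarray(neighbor_labels, dtype=float)
    k = neighbor_labels.shape[0]
    # Dividing by k (neighbors counted) keeps every confidence in [0, 1].
    return neighbor_labels.sum(axis=0) / k


# 5 neighbors, 2 labels: label 0 appears 5 times, label 1 twice.
neighbors = np.array([[1, 0], [1, 1], [1, 0], [1, 0], [1, 1]])
confidences = label_confidences(neighbors)  # label 0: 5/5 = 1.0, label 1: 2/5 = 0.4

# Dividing by the label count (2) instead would give 5/2 = 2.5 for label 0.
wrong = neighbors.sum(axis=0) / neighbors.shape[1]
print(confidences, wrong)
```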

ameet-1997 changed the title from "Loading Arff Date not working" to "Loading Arff Data not working" on May 30, 2017

ChristianSch mentioned this issue Sep 27, 2017

Fix: interacting with dataset.py causes exception due to file modes #58

Closed
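The title of #58 attributes the exception to file modes. On Python 3, a file's mode decides whether write() accepts str or bytes. Downloaded content is bytes, so it has to be written in binary mode ("wb"). The sketch below illustrates that general rule. The payload and file name are placeholders, and it is not the code from #58 or #70.

```python
import os
import tempfile

payload = b"placeholder for downloaded archive bytes"
path = os.path.join(tempfile.mkdtemp(), "example.bz2")

# Text mode expects str, so writing bytes raises TypeError on Python 3.
text_mode_error = None
try:
    with open(path, "w") as handle:
        handle.write(payload)
except TypeError as exc:
    text_mode_error = exc
print("text mode:", text_mode_error)

# Binary mode accepts bytes and writes them unchanged.
with open(path, "wb") as handle:
    handle.write(payload)

with open(path, "rb") as handle:
    round_trip = handle.read()
print(round_trip == payload)
```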

ChristianSch self-assigned this Sep 27, 2017

ChristianSch mentioned this issue Sep 27, 2017

fixed dataset fetching and listing; closes #57 #70

Merged

ChristianSch mentioned this issue Oct 5, 2017

Solid example codes and performance of skmultilearn #81

Closed

ChristianSch mentioned this issue Jan 8, 2018

Issue with WEKA multilabel ? #93

Closed

niedakh closed this as completed in cbf12cd Mar 31, 2018

niedakh added a commit that referenced this issue Mar 31, 2018

Merge pull request #70 from ChristianSch/fix-dataset-fetching

41f7534

fixed dataset fetching and listing; closes #57
