
Loading Arff Data not working #57

Closed
ameet-1997 opened this issue May 30, 2017 · 5 comments

@ameet-1997

The first line from the official docs on loading datasets,

from skmultilearn.dataset import Dataset

raises the error "ImportError: cannot import name 'Dataset'".

@ameet-1997 ameet-1997 changed the title "Loading Arff Date not working" to "Loading Arff Data not working" May 30, 2017
@ChristianSch
Member

ChristianSch commented Jun 28, 2017

Hey there,
The docs are wrong; you can load ARFF data sets like this:

from skmultilearn.dataset import load_dataset
d = load_dataset('enron', 'undivided')

d is a dictionary that (in the case of enron) holds the features and labels under the keys X and y:

X, y = d['X'], d['y']

Available data sets can be listed like this:

from skmultilearn.dataset import available_data_sets
available_data_sets()

For more information, please see here.

@mattalhonte

Not working here either, on Python 3.5 with Anaconda. The first time, it failed while trying to load the standard library from future; I monkeypatched that import out by commenting it.

Now I'm getting this:
from skmultilearn.dataset import available_data_sets
available_data_sets()

TypeError                                 Traceback (most recent call last)
<ipython-input-73-c3b496b1614f> in <module>()
      1 from skmultilearn.dataset import available_data_sets
----> 2 available_data_sets()

~/miniconda3/envs/py35/lib/python3.5/site-packages/skmultilearn/dataset.py in available_data_sets()
    106     archives = get_dataset_list()
    107     archives = [x.split(';')[-1].split('.')[0].split('-') for x in archives.split('\n')]
--> 108     variants = set()
    109     for a in archives:
    110         if a[0] not in variants:

TypeError: a bytes-like object is required, not 'str'
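The TypeError arises because, on Python 3, archives is evidently a bytes object (for example, a raw network response body), while split('\n') and split(';') are called with str separators. A minimal sketch of the usual fix is to decode before splitting. The helper parse_dataset_list and the sample listing are hypothetical: they only mimic the list comprehension visible in the traceback and are not scikit-multilearn's actual code.

```python
def parse_dataset_list(archives):
    """Turn a raw listing of ';'-separated lines into [name, variant] pairs.

    Hypothetical helper mirroring the list comprehension in the traceback.
    """
    if isinstance(archives, bytes):
        # Python 3 network reads return bytes; decode so str separators work
        archives = archives.decode('utf-8')
    return [line.split(';')[-1].split('.')[0].split('-')
            for line in archives.split('\n') if line]  # skip empty lines


# Made-up listing in the format the traceback's parsing implies (assumption)
raw = b"0a1b;enron-undivided.scikitml.bz2\n2c3d;enron-train.scikitml.bz2\n"

try:
    raw.split('\n')  # what the failing line does with undecoded bytes
except TypeError as exc:
    print(exc)  # Python 3: a bytes-like object is required, not 'str'

print(parse_dataset_list(raw))  # [['enron', 'undivided'], ['enron', 'train']]
```

Skipping empty lines is an extra choice in this sketch, so a trailing newline does not yield an empty entry.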

@ljvmiranda921

I think this is the same bytes/string incompatibility between Python 2 and 3.
To the creators: I'm wondering whether you run integration tests (e.g. with travis-ci) to at least check compatibility.

@ChristianSch
Member

ChristianSch commented Sep 27, 2017

I can reproduce this problem on a clean install of Python 3.6, but not on Python 2. I'll look into it.

@niedakh
Contributor

niedakh commented Mar 31, 2018

@ChristianSch thank you for the fix!

ChristianSch added a commit to ChristianSch/scikit-multilearn that referenced this issue Jun 2, 2018
* Added support for sparse X and y.
Corrected small typo: "ytring" -> "ystring"

* Added tests for sparse X and y.

* Added a return statement to the fit() method to comply with the usual interface of scikit-learn.

* Cleanup: deleted unused variables, corrected variable name case, ...

* fixed dataset fetching and listing; closes scikit-multilearn#57

* hotfix: removed standard library packages from requirements.txt to prevent typosquatting and malicious code execution

* Fix np.zeros in rakelo.py (scikit-multilearn#76)

This commit fixes the use of np.zeros, formerly np.zeroes, to resolve
Issue scikit-multilearn#76

Author: ljvmiranda921
Email: ljvmiranda@gmail.com

* add citation info

test if slack commit log works

* Update README.md (scikit-multilearn#74)

This commit updates the README.md for this library.
New sections were added:

- A short description of the project
- Features
- Dependencies
- Installation
- Basic Usage
- Contributing
- Cite

Two badges were also added:
- PyPI badge
- License badge

Next steps (can only be done by project owner):
- Add travis-ci badge once a successful travis-build is
implemented (owner only).
- Add documentation badge (owner only)

Author: ljvmiranda921
E-mail: ljvmiranda@gmail.com

* initial travis setup

* fix tests

* remove pyc in travis

* name container properly, pass MEKA_CLASSPATH

* add travis

* add slack notifications

* trying multi-env travis

* fixed igraph package name

* add test requirements

* add osx test build

* fix linux py3 build name

* fix osx homebrew repo name

* add one more osx homebrew repo

* enforce p3 on osx in travis

* fix python osx problems on travis

* pip2 instead of pip

* cask remove oclint

* pip3 instead of pip

* explicit python3 on osx for travis

* correct liac-arff req

* add dtype to np.zeros

* add a per label binarizer for quality measures, closes scikit-multilearn#84

* fix BRkNN top label number selection syntax error

* mod fit for sparse y columns

ensure that sparse y columns of shapes like (800,1) (returned from some classifiers, etc.) are properly converted to shape (800,); otherwise some scikit-learn validation functions throw errors.

* proper processing of output matrix structures

ensure proper back-and-forth conversion of y values shaped like (800,1) and (800,) to avoid errors thrown by some scikit-learn validation functions.
np.ravel does not properly process some matrices unless they are first cast to arrays.

* make the code much more readable

* some more variables renamed to a more informative name

* import issparse, reformat

* import numpy

* prediction transposition in CC is no longer required

* fix returning 1d label vector and testing for that

* fix meka io bytes/strings and decode if needed, reformat

* enforce updating to current docker image

* fix travis for meka tests

* add self-edge normalization option and fix test

* disable weight normalization on unweighted graph

* separate graph builders and label space clusterers, more tests written, some parameter sets for graphtool do not work atm

* formatting changes

* don't build a list if no one needs it

* fix some circularity problems with typing

* less extensive data testing for now, a lot of cases fail with certain generator params, due to one-classiness of partitions

* fix stochastic block modelling based on graphtool

* fix and standardize clustering output alongside with a proper integration test in label space partitioning classifier

* remove osx travis for now, not working anyway

* adjust partitions to new test data

* adjust labelset sizes to new test data

* make sure correct version is pulled

* change the default rakeld/rakelo behaviour to include labelpowerset; a voting classifier is added to allow overlapping classification in rakel style with any clusterer and classifier; rakel's clustering logic is moved to a random clusterer

* fix CV base test

* travis python2 should work correctly now, with new devdocker

* introduce a working test case instead of randomly crashing generators

* add absolute imports to fix igraph import in p2

* fix tests and add set_params support for clusters so that CV works

* some flaky setup line for travis

* fix random label space clusterer test with overlaps to pass

* temporary workaround for dense matrices

* workaround output in matrix shape as well

* update documentation

* documentation and naming corrections

* fix rename of cluster_*
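The (800,1) versus (800,) shape problem described in the commit message above can be illustrated with NumPy and SciPy. A minimal sketch, assuming a single label column stored as a scipy.sparse matrix of shape (800, 1); column_to_1d is a hypothetical helper, not the code the commit added.

```python
import numpy as np
from scipy import sparse


def column_to_1d(y_col):
    """Convert one label column of shape (n, 1) into a flat (n,) ndarray.

    Hypothetical helper illustrating the conversion from the commit message.
    """
    if sparse.issparse(y_col):
        # scipy.sparse matrices are not ndarrays; densify before flattening
        y_col = y_col.toarray()
    # np.asarray turns an np.matrix into a plain ndarray, so ravel gives 1-D
    return np.asarray(y_col).ravel()


y_sparse = sparse.csc_matrix(np.ones((800, 1)))
as_matrix = y_sparse.todense()       # np.matrix of shape (800, 1)

print(as_matrix.ravel().shape)       # (1, 800): np.matrix stays 2-D
print(column_to_1d(y_sparse).shape)  # (800,)
print(column_to_1d(as_matrix).shape) # (800,)
```

Casting to a plain ndarray first is what makes the flatten reliable, in line with the commit's note that matrices must be cast to arrays.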
ChristianSch added a commit to ChristianSch/scikit-multilearn that referenced this issue Jun 2, 2018
* Added support for sparse X and y.
Corrected small typo: "ytring" -> "ystring"

* Added tests for sparse X and y.

* Added a return statement to the fit() method to comply with the usual interface of scikit-learn.

* Cleanup: deleted unused variables, corrected variable name case, ...

* fixed dataset fetching and listing; closes scikit-multilearn#57

* hotfix: removed standard library packages from requirements.txt to prevent typosquatting and malicious code execution

* Fix np.zeros in rakelo.py (scikit-multilearn#76)

This commit fixes the use of np.zeros, formerly np.zeroes, to resolve
Issue scikit-multilearn#76

Author: ljvmiranda921
Email: ljvmiranda@gmail.com

* add citation info

test if slack commit log works

* Update README.md (scikit-multilearn#74)

This commit updates the README.md for this library.
New sections were added:

- A short description of the project
- Features
- Dependencies
- Installation
- Basic Usage
- Contributing
- Cite

Two badges were also added:
- PyPI badge
- License badge

Next steps (can only be done by project owner):
- Add travis-ci badge once a successful travis-build is
implemented (owner only).
- Add documentation badge (owner only)

Author: ljvmiranda921
E-mail: ljvmiranda@gmail.com

* implemented iterative stratification with high-order relationships support (Sechidis2011, Szymanski2017)

* fix imports, random state and fold init

* unit tests

* add documentation

* initial travis setup

* fix tests

* remove pyc in travis

* name container properly, pass MEKA_CLASSPATH

* add travis

* add slack notifications

* trying multi-env travis

* fixed igraph package name

* add test requirements

* add osx test build

* fix linux py3 build name

* fix osx homebrew repo name

* add one more osx homebrew repo

* enforce p3 on osx in travis

* fix python osx problems on travis

* pip2 instead of pip

* cask remove oclint

* pip3 instead of pip

* explicit python3 on osx for travis

* correct liac-arff req

* add dtype to np.zeros

* add a per label binarizer for quality measures, closes scikit-multilearn#84

* fix BRkNN top label number selection syntax error

* fix normalization of confidence computation

normalize by number of neighbors counted, rather than label count (original produces results > 1 when k > num_labels.)

* mod fit for sparse y columns

ensure that sparse y columns of shapes like (800,1) (returned from some classifiers, etc.) are properly converted to shape (800,); otherwise some scikit-learn validation functions throw errors.

* proper processing of output matrix structures

ensure proper back-and-forth conversion of y values shaped like (800,1) and (800,) to avoid errors thrown by some scikit-learn validation functions.
np.ravel does not properly process some matrices unless they are first cast to arrays.

* make the code much more readable

* some more variables renamed to a more informative name

* import issparse, reformat

* import numpy

* prediction transposition in CC is no longer required

* fix returning 1d label vector and testing for that

* fix meka io bytes/strings and decode if needed, reformat

* enforce updating to current docker image

* fix travis for meka tests

* add self-edge normalization option and fix test

* disable weight normalization on unweighted graph

* separate graph builders and label space clusterers, more tests written, some parameter sets for graphtool do not work atm

* formatting changes

* don't build a list if no one needs it

* fix some circularity problems with typing

* less extensive data testing for now, a lot of cases fail with certain generator params, due to one-classiness of partitions

* fix stochastic block modelling based on graphtool

* fix and standardize clustering output alongside with a proper integration test in label space partitioning classifier

* remove osx travis for now, not working anyway

* adjust partitions to new test data

* adjust labelset sizes to new test data

* make sure correct version is pulled

* change the default rakeld/rakelo behaviour to include labelpowerset; a voting classifier is added to allow overlapping classification in rakel style with any clusterer and classifier; rakel's clustering logic is moved to a random clusterer

* fix CV base test

* travis python2 should work correctly now, with new devdocker

* introduce a working test case instead of randomly crashing generators

* add absolute imports to fix igraph import in p2

* fix tests and add set_params support for clusters so that CV works

* some flaky setup line for travis

* fix random label space clusterer test with overlaps to pass

* temporary workaround for dense matrices

* workaround output in matrix shape as well

* update documentation

* documentation and naming corrections

* fix rename of cluster_*

* adhere to review

* document helpers

* see reqs/dev

* clean up requirements

* adhere to new docs convention

* skip graphtool on windows

* make external library imports optional

* first approach at windows oriented CI

* copy how requests did it

* fix paths

* more work on appveyor

* change path delimiter for windows XD

* one more path separator fix

* add missing wrapper for windows

* give up on cmd_in_env atm

* skip igraph test on win32 atm

* ignore java for a moment

* fix cmdlet

* more debugging of appveyor

* more appveyor

* don't build for now, just test

* give up igraph/graphtool tests on win32

* fix test command on windows

* more disabling of igraph and graphtool on win32

* fix the louvain community dependency

* close file before removal, should fix meka on windows

* setup slack notifications

* fix yaml indent

* add appveyor badge

* save file name before closing
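The BRkNN confidence fix in the second commit ("normalize by number of neighbors counted, rather than label count") comes down to dividing each label's neighbour count by k. A minimal sketch, assuming the labels of the k nearest neighbours are given as a binary (k, n_labels) NumPy array; label_confidences is a hypothetical helper, not BRkNN's actual implementation.

```python
import numpy as np


def label_confidences(neighbor_labels):
    """Fraction of the k nearest neighbours that carry each label.

    Hypothetical helper: neighbor_labels is a binary (k, n_labels) array,
    one row per neighbour. Dividing by k keeps every value within [0, 1].
    """
    k = neighbor_labels.shape[0]
    counts = neighbor_labels.sum(axis=0)  # neighbours having each label
    return counts / float(k)


# k = 5 neighbours, 2 labels
neighbours = np.array([[1, 0], [1, 1], [1, 0], [0, 0], [1, 0]])

print(label_confidences(neighbours).tolist())  # [0.8, 0.2]
# Dividing by the label count instead can exceed 1 when k > n_labels:
print((neighbours.sum(axis=0) / float(neighbours.shape[1])).tolist())  # [2.0, 0.5]
```

With k = 5 and only 2 labels, the label-count normalization yields 2.0 for the first label, which is the "results > 1 when k > num_labels" case the commit message describes.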