New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG] EHN add collections of imbalanced datasets #249

Merged
merged 18 commits into from Apr 6, 2017

Conversation

Projects
None yet
3 participants
@glemaitre
Member

glemaitre commented Mar 18, 2017

Reference Issue

Fixes #247

What does this implement/fix? Explain your changes.

Make a fetcher for an imbalanced datasets collections available on Zenodo maintained by ourself.

Any other comments?

TODO:

  • Write the fetcher;
  • Unit tests;
  • Write documentation
@pep8speaks

This comment has been minimized.

pep8speaks commented Mar 18, 2017

Hello @glemaitre! Thanks for updating the PR.

Line 108:18: E128 continuation line under-indented for visual indent
Line 109:18: E128 continuation line under-indented for visual indent
Line 110:18: E128 continuation line under-indented for visual indent
Line 111:18: E128 continuation line under-indented for visual indent

Comment last updated on April 03, 2017 at 13:01 Hours UTC
@codecov

This comment has been minimized.

codecov bot commented Mar 18, 2017

Codecov Report

Merging #249 into master will decrease coverage by 0.09%.
The diff coverage is 95.19%.

Impacted file tree graph

@@            Coverage Diff            @@
##           master     #249     +/-   ##
=========================================
- Coverage   98.27%   98.18%   -0.1%     
=========================================
  Files          58       60      +2     
  Lines        3427     3530    +103     
=========================================
+ Hits         3368     3466     +98     
- Misses         59       64      +5
Impacted Files Coverage Δ
imblearn/datasets/__init__.py 100% <100%> (ø) ⬆️
imblearn/datasets/tests/test_zenodo.py 89.18% <89.18%> (ø)
imblearn/datasets/zenodo.py 98.46% <98.46%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5ca3037...28e7830. Read the comment docs.

@glemaitre glemaitre force-pushed the glemaitre:fetcher_dataset branch from 16caacf to ac13c51 Mar 19, 2017

glemaitre added some commits Mar 19, 2017

@glemaitre glemaitre changed the title from [WIP] EHN add collections of imbalanced datasets to [MRG] EHN add collections of imbalanced datasets Mar 19, 2017

@glemaitre glemaitre changed the title from [MRG] EHN add collections of imbalanced datasets to [WIP] EHN add collections of imbalanced datasets Mar 19, 2017

@glemaitre

This comment has been minimized.

Member

glemaitre commented Mar 19, 2017

Still have an issue with the docstring regarding the table. It is not so nice. I have to check what can be done.

glemaitre added some commits Mar 25, 2017

@glemaitre glemaitre changed the title from [WIP] EHN add collections of imbalanced datasets to [MRG] EHN add collections of imbalanced datasets Mar 25, 2017

@glemaitre

This comment has been minimized.

Member

glemaitre commented Mar 25, 2017

@chkoar This one can be merged also. The PEP8 has been corrected and the table looks nice.

doc/api.rst Outdated
@@ -158,7 +158,7 @@ Functions
:toctree: generated/

datasets.make_imbalance

datasets.fetch_zenodo

This comment has been minimized.

@chkoar

chkoar Mar 26, 2017

Member

Why not just fetch_datasets?

This comment has been minimized.

@glemaitre

glemaitre Mar 27, 2017

Member

I wanted something similar to fetch_mldata

@@ -14,6 +14,8 @@ New features

- Turn off steps in :class:`pipeline.Pipeline` using the `None`
object. By `Christos Aridas`_.
- Add a fetching method `datasets.fetch_zenodo` in order to get some

This comment has been minimized.

@chkoar

chkoar Mar 26, 2017

Member

I would say function instead of method


__all__ = ['make_imbalance']
__all__ = ['make_imbalance',

This comment has been minimized.

@chkoar

chkoar Mar 26, 2017

Member

Sort them if you like

This comment has been minimized.

@glemaitre

glemaitre Mar 27, 2017

Member

Uhm not sure since that they are not in the same file.

download_if_missing=True,
random_state=None,
shuffle=False):
"""Load the Higgs dataset, downloading it if necessary.

This comment has been minimized.

@chkoar

chkoar Mar 26, 2017

Member

L112 could be removed

Returns
-------
datasets : OrderedDict of Bunch object,

This comment has been minimized.

@chkoar

chkoar Mar 26, 2017

Member

Dictionary of Bunch objects.

This comment has been minimized.

@glemaitre

glemaitre Mar 27, 2017

Member

I would like to specified that this is an ordered dictionary. I change the code in accordance

glemaitre added some commits Mar 27, 2017

@@ -0,0 +1,279 @@
"""Collection of imbalanced datasets.
This collection of datasets have been proposed in [1]_. The

This comment has been minimized.

@chkoar

chkoar Mar 30, 2017

Member

I think that is: "This collection of datasets has been proposed..." or "These datasets have been proposed..."

@glemaitre

This comment has been minimized.

Member

glemaitre commented Mar 30, 2017

@chkoar done

@glemaitre glemaitre force-pushed the glemaitre:fetcher_dataset branch from 79e2216 to 5f79302 Mar 30, 2017

----------
data_home : string, optional (default=None)
Specify another download and cache folder for the datasets. By default
all scikit learn data is stored in '~/scikit_learn_data' subfolders.

This comment has been minimized.

@chkoar

chkoar Mar 31, 2017

Member

scikit-learn :)

@chkoar

This comment has been minimized.

Member

chkoar commented Apr 3, 2017

My main concern here is the name of the function. I would call it like fetch_datasets , fetch_data or something like that, since we choose to host them over zenodo. Apart from that it could be easily merged! :P

@glemaitre

This comment has been minimized.

Member

glemaitre commented Apr 3, 2017

Ok so I went for fetch_datasets

@chkoar

This comment has been minimized.

Member

chkoar commented Apr 3, 2017

Will we change the filenames?

@glemaitre

This comment has been minimized.

Member

glemaitre commented Apr 6, 2017

Will we change the filenames?

Not for the moment. It has no influence on the API or the import. So if we come with a better name we can change it afterwards.

@chkoar chkoar merged commit 3c54ea6 into scikit-learn-contrib:master Apr 6, 2017

4 of 6 checks passed

codecov/patch 95.19% of diff hit (target 98.27%)
Details
codecov/project 98.18% (-0.1%) compared to 5ca3037
Details
ci/circleci Your tests passed on CircleCI!
Details
code-quality/landscape Code quality decreased by -0.13%
Details
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details

glemaitre added a commit to glemaitre/imbalanced-learn that referenced this pull request Jun 15, 2017

glemaitre added a commit to glemaitre/imbalanced-learn that referenced this pull request Jun 15, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment