
[MRG] FIX/TST pass argument to ratio as callable #307

Merged
merged 3 commits into master on Aug 11, 2017

Conversation

3 participants
@glemaitre
Member

glemaitre commented Jul 31, 2017

Reference Issue

closes #305

What does this implement/fix? Explain your changes.

Allow passing arguments to ratio when it is a callable.
This can be useful when part of the heuristic (e.g. a multiplier) is decided outside of the function.
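
A rough sketch of the intended usage, assuming check_ratio simply forwards extra keyword arguments to the callable (the exact forwarding mechanism and the multiplier parameter below are illustrative, not the final API):

import numpy as np

from collections import Counter
from imblearn.utils import check_ratio

y = np.array([1] * 50 + [2] * 100 + [3] * 25)

def ratio_func(y, multiplier):
    """Scale each class count by a user-chosen multiplier."""
    target_stats = Counter(y)
    return {key: int(count * multiplier[key])
            for key, count in target_stats.items()}

# the multiplier is decided by the user and forwarded to the callable
ratio_ = check_ratio(ratio_func, y, 'over-sampling',
                     multiplier={1: 1.5, 2: 1, 3: 3})
# expected: {1: 25, 2: 0, 3: 50}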

Any other comments?

@glemaitre

Member

glemaitre commented Jul 31, 2017

@chkoar Here it comes. The multiplier could be decided by the user and passed to the function.

@codecov

codecov bot commented Jul 31, 2017

Codecov Report

Merging #307 into master will increase coverage by 0.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #307      +/-   ##
==========================================
+ Coverage   98.32%   98.34%   +0.01%     
==========================================
  Files          68       68              
  Lines        3890     3918      +28     
==========================================
+ Hits         3825     3853      +28     
  Misses         65       65
Impacted Files Coverage Δ
imblearn/utils/validation.py 100% <100%> (ø) ⬆️
imblearn/utils/tests/test_validation.py 100% <100%> (ø) ⬆️
imblearn/ensemble/balance_cascade.py 100% <0%> (ø) ⬆️
...prototype_selection/neighbourhood_cleaning_rule.py 100% <0%> (ø) ⬆️
...ampling/prototype_selection/one_sided_selection.py 100% <0%> (ø) ⬆️
.../under_sampling/prototype_selection/tomek_links.py 100% <0%> (ø) ⬆️
..._sampling/prototype_selection/tests/test_allknn.py 100% <0%> (ø) ⬆️
imblearn/combine/smote_tomek.py 100% <0%> (ø) ⬆️
imblearn/datasets/imbalance.py 100% <0%> (ø) ⬆️
... and 8 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@massich

Contributor

massich commented Aug 1, 2017

LGTM

@glemaitre glemaitre changed the title from [MRG] FIX/TST pass argument to ratio as callable to [MRG+1] FIX/TST pass argument to ratio as callable Aug 1, 2017

@chkoar

Member

chkoar commented Aug 1, 2017

I am -1 on this. That is the whole point of a callable as ratio:

  1. Design a function that accepts y as input.
  2. Do whatever you want inside it.
import numpy as np

from collections import Counter
from imblearn.utils import check_ratio
from sklearn.utils.testing import assert_equal

y = np.array([1] * 50 + [2] * 100 + [3] * 25)

def ratio_func(y):
    """samples such that each class will be affected by the multiplier."""
    multiplier = {1: 1.5, 2: 1, 3: 3}
    target_stats = Counter(y)
    return {key: int(values * multiplier[key])
            for key, values in target_stats.items()}

ratio_ = check_ratio(ratio_func, y, 'over-sampling')
assert_equal(ratio_, {1: 25, 2: 0, 3: 50})

@glemaitre glemaitre changed the title from [MRG+1] FIX/TST pass argument to ratio as callable to [RFC] FIX/TST pass argument to ratio as callable Aug 1, 2017

@massich

Contributor

massich commented Aug 1, 2017

I agree that you can do whatever you want inside ratio_func, and therefore you might not need this. But at the same time, if you can do whatever you want, why shouldn't it be possible to pass parameters to such a function?

What I mean is:
A) we designed the API the way @chkoar describes;
B) however, users might want to call it while passing parameters.

If the ratio function takes some parameter, (A) can wrap the call and (B) can do it directly. I don't see much problem in allowing this behaviour since (A) remains perfectly valid. I might still wrap the function because it is clearer, but some users could exploit the flexibility of (B) in a pipeline or elsewhere.
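
For what it is worth, a minimal sketch of option (A), binding the parameter with functools.partial so the wrapped function still matches the one-argument ratio(y) signature used in the snippet above (the multiplier values are only illustrative):

import numpy as np

from collections import Counter
from functools import partial
from imblearn.utils import check_ratio

y = np.array([1] * 50 + [2] * 100 + [3] * 25)

def ratio_func(y, multiplier):
    """Same heuristic as above, but the multiplier is a parameter."""
    target_stats = Counter(y)
    return {key: int(count * multiplier[key])
            for key, count in target_stats.items()}

# (A): bind the parameter up front, then pass a plain one-argument callable
wrapped = partial(ratio_func, multiplier={1: 1.5, 2: 1, 3: 3})
ratio_ = check_ratio(wrapped, y, 'over-sampling')
# same result as before: {1: 25, 2: 0, 3: 50}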

@glemaitre

Member

glemaitre commented Aug 1, 2017

We might want to use **kwargs in this case. The only thing is that we need to make sure it works inside a grid search.
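
A rough sketch of how the grid-search concern could be exercised: put the same function bound to different parameters in the parameter grid (RandomOverSampler, the step names, and the toy data are only illustrative, assuming the ratio parameter accepts a callable):

from collections import Counter
from functools import partial

from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

from imblearn.over_sampling import RandomOverSampler
from imblearn.pipeline import Pipeline

X, y = make_classification(n_samples=300, weights=[0.2, 0.8], random_state=0)

def ratio_func(y, multiplier):
    """Over-sample every class up to multiplier times its current count."""
    target_stats = Counter(y)
    return {key: int(count * multiplier)
            for key, count in target_stats.items()}

pipeline = Pipeline([('sampler', RandomOverSampler(random_state=0)),
                     ('clf', DecisionTreeClassifier(random_state=0))])

# each candidate is the same callable bound to a different multiplier
param_grid = {'sampler__ratio': [partial(ratio_func, multiplier=m)
                                 for m in (1.5, 2.0, 3.0)]}
grid = GridSearchCV(pipeline, param_grid, cv=3).fit(X, y)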

@glemaitre

Member

glemaitre commented Aug 2, 2017

@chkoar Does the change with kwargs make sense?

@chkoar

Member

chkoar commented Aug 3, 2017

A user still has to provide a function like this one, right?

@glemaitre

Member

glemaitre commented Aug 3, 2017

Yep, this is one of the possibilities. This way you allow both behaviours.

@glemaitre

Member

glemaitre commented Aug 3, 2017

In addition to the previous behaviour.

@glemaitre

Member

glemaitre commented Aug 4, 2017

@chkoar While working on the documentation, I went through an example where this is really useful.

We share the same ratio behaviour between making a dataset imbalanced and using algorithms to balance it.

Therefore, it would be nice to have this feature if you want to try different levels of imbalance in a dataset. Before, we could provide something like ratio=[0.1, 0.3, 0.5]. If we still want to use a float multiplier, we need to define a function, and more precisely one function per multiplier.

If we use **kwargs we can avoid those several functions in a more practical manner.

So IMHO it makes sense in this configuration, much more than in the opposite one (grid-searching the ratio of the algorithm to get the best performance).
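
For concreteness, a sketch of that documentation use case, assuming make_imbalance forwards extra keyword arguments to the ratio callable (the parameter names multiplier and minority_class are illustrative):

from collections import Counter

from sklearn.datasets import load_iris
from imblearn.datasets import make_imbalance

def ratio_func(y, multiplier, minority_class):
    """Keep only a fraction of one class: one function, several multipliers."""
    target_stats = Counter(y)
    return {minority_class: int(multiplier * target_stats[minority_class])}

X, y = load_iris(return_X_y=True)

# try several levels of imbalance without writing one function per multiplier
for multiplier in (0.1, 0.3, 0.5):
    X_res, y_res = make_imbalance(X, y, ratio=ratio_func,
                                  multiplier=multiplier, minority_class=1)
    print(multiplier, Counter(y_res))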

@glemaitre glemaitre changed the title from [RFC] FIX/TST pass argument to ratio as callable to [MRG] FIX/TST pass argument to ratio as callable Aug 4, 2017

@glemaitre glemaitre force-pushed the scikit-learn-contrib:master branch from 9395cbe to 333d81b Aug 9, 2017

@glemaitre

Member

glemaitre commented Aug 9, 2017

@chkoar are you still -1?

@chkoar

Member

chkoar commented Aug 10, 2017

I am neutral

@glemaitre glemaitre force-pushed the scikit-learn-contrib:master branch 2 times, most recently from 1b22868 to 33660d4 Aug 11, 2017

@glemaitre glemaitre merged commit d4e52e5 into scikit-learn-contrib:master Aug 11, 2017

6 checks passed

ci/circleci: Your tests passed on CircleCI!
code-quality/landscape: Code quality remained the same
codecov/patch: 100% of diff hit (target 98.36%)
codecov/project: Absolute coverage decreased by -0.02% but relative coverage increased by +1.63% compared to dd941e7
continuous-integration/appveyor/pr: AppVeyor build succeeded
continuous-integration/travis-ci/pr: The Travis CI build passed