Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG] FIX/TST pass argument to ratio as callable #307

Merged
merged 3 commits into from
Aug 11, 2017

Conversation

glemaitre
Copy link
Member

Reference Issue

closes #305

What does this implement/fix? Explain your changes.

Allow to pass argument to ratio when this is a function.
It could be useful when the heuristic is outside of the function

Any other comments?

@glemaitre
Copy link
Member Author

@chkoar Here it comes. The multiplier could be decided by the user and passed to the function.

@codecov
Copy link

codecov bot commented Jul 31, 2017

Codecov Report

Merging #307 into master will increase coverage by 0.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #307      +/-   ##
==========================================
+ Coverage   98.32%   98.34%   +0.01%     
==========================================
  Files          68       68              
  Lines        3890     3918      +28     
==========================================
+ Hits         3825     3853      +28     
  Misses         65       65
Impacted Files Coverage Δ
imblearn/utils/validation.py 100% <100%> (ø) ⬆️
imblearn/utils/tests/test_validation.py 100% <100%> (ø) ⬆️
imblearn/ensemble/balance_cascade.py 100% <0%> (ø) ⬆️
...prototype_selection/neighbourhood_cleaning_rule.py 100% <0%> (ø) ⬆️
...ampling/prototype_selection/one_sided_selection.py 100% <0%> (ø) ⬆️
.../under_sampling/prototype_selection/tomek_links.py 100% <0%> (ø) ⬆️
..._sampling/prototype_selection/tests/test_allknn.py 100% <0%> (ø) ⬆️
imblearn/combine/smote_tomek.py 100% <0%> (ø) ⬆️
imblearn/datasets/imbalance.py 100% <0%> (ø) ⬆️
... and 8 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 33660d4...d251db9. Read the comment docs.

@massich
Copy link
Contributor

massich commented Aug 1, 2017

LGTM

@glemaitre glemaitre changed the title [MRG] FIX/TST pass argument to ratio as callable [MRG+1] FIX/TST pass argument to ratio as callable Aug 1, 2017
@chkoar
Copy link
Member

chkoar commented Aug 1, 2017

I am -1 on this. That's the point of a callable as ratio.

  1. Design a function that accepts a y as input.
  2. Do whatever you want.
import numpy as np

from collections import Counter
from imblearn.utils import check_ratio
from sklearn.utils.testing import assert_equal

y = np.array([1] * 50 + [2] * 100 + [3] * 25)

def ratio_func(y):
    """samples such that each class will be affected by the multiplier."""
    multiplier = {1: 1.5, 2: 1, 3: 3}
    target_stats = Counter(y)
    return {key: int(values * multiplier[key])
            for key, values in target_stats.items()}

ratio_ = check_ratio(ratio_func, y, 'over-sampling')
assert_equal(ratio_, {1: 25, 2: 0, 3: 50})

@glemaitre glemaitre changed the title [MRG+1] FIX/TST pass argument to ratio as callable [RFC] FIX/TST pass argument to ratio as callable Aug 1, 2017
@massich
Copy link
Contributor

massich commented Aug 1, 2017

I agree, that you could do what ever you want inside ratio_func and therefore you might not need it. But at the same time, if you can do whatever you want, why wouldn't be possible to get any parameter passed to such function.

What I mean, is:
A) we thought about the API as @chkoar call
B) however, users would like to call it passing parameters.

In the case of having a ratio function that takes some parameter, (A) could wrapper the call, (B) could do it directly. I don't see much problem in allowing the behavior since (A) remains perfectly valid. I might still wrap the function 'cos its more clear. But maybe some user can explode the flexibility of (B) in some pipelining or whatever.

@glemaitre
Copy link
Member Author

We probably might want to use **kwargs in this case. The only thing is that we need to make sure it is working inside a grid-search.

@glemaitre
Copy link
Member Author

@chkoar Does the change with kwargs make sense.

@chkoar
Copy link
Member

chkoar commented Aug 3, 2017

Still a user has to provide a function like this one, right?

@glemaitre
Copy link
Member Author

yep this is one of the possibility. Like this you allow both behaviour.

@glemaitre
Copy link
Member Author

In addition of the previous behaviour

@glemaitre
Copy link
Member Author

@chkoar Working on the documentation, I went through an example which is really useful.

We are sharing the same ratio behaviour between making a dataset imbalanced than using algorithms to make them balanced.

Therefore, it could be nice to have this feature, if you want to try different level of imbalancing in a dataset. Before, we could provide something like ratio=[0.1, 0.3, 0.5]. If we still want to have a float multiplier we need to define a function but more precisely one function by multiplier.

If we use **kwargs we can efficiently avoid those several functions in a more practical manner.

So IMHO, it makes sense in this configuration, much more than in the opposite one (grid-search the ratio for the algorithm, to get the best performance)

@glemaitre glemaitre changed the title [RFC] FIX/TST pass argument to ratio as callable [MRG] FIX/TST pass argument to ratio as callable Aug 4, 2017
@glemaitre
Copy link
Member Author

@chkoar are you still -1?

@chkoar
Copy link
Member

chkoar commented Aug 10, 2017

I am neutral

@glemaitre glemaitre force-pushed the master branch 2 times, most recently from 1b22868 to 33660d4 Compare August 11, 2017 14:43
@glemaitre glemaitre merged commit d4e52e5 into scikit-learn-contrib:master Aug 11, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Ratio as callable should allow to pass arguments.
3 participants