[MRG] FIX/TST pass argument to ratio as callable #307
@chkoar Here it comes. The multiplier could be decided by the user and passed to the function. |
Codecov Report
@@ Coverage Diff @@
## master #307 +/- ##
==========================================
+ Coverage 98.32% 98.34% +0.01%
==========================================
Files 68 68
Lines 3890 3918 +28
==========================================
+ Hits 3825 3853 +28
Misses 65 65
|
LGTM |
I am -1 on this. That's the point of a callable as ratio.
import numpy as np
from collections import Counter
from imblearn.utils import check_ratio
from sklearn.utils.testing import assert_equal
y = np.array([1] * 50 + [2] * 100 + [3] * 25)
def ratio_func(y):
    """samples such that each class will be affected by the multiplier."""
    multiplier = {1: 1.5, 2: 1, 3: 3}
    target_stats = Counter(y)
    return {key: int(values * multiplier[key])
            for key, values in target_stats.items()}
ratio_ = check_ratio(ratio_func, y, 'over-sampling')
assert_equal(ratio_, {1: 25, 2: 0, 3: 50})
|
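For readers checking the expected dictionary in the assertion above, here is a minimal standalone sketch of the arithmetic. It assumes that, for over-sampling, the returned value is the number of samples to generate per class (target count minus current count), which is consistent with the asserted result; the names `target` and `to_generate` are illustrative only and are not part of imblearn.

```python
from collections import Counter

# Same class distribution and multipliers as in the snippet above.
y = [1] * 50 + [2] * 100 + [3] * 25
multiplier = {1: 1.5, 2: 1, 3: 3}

# Current number of samples per class.
current = Counter(y)

# Target number of samples per class after applying the multiplier.
target = {key: int(count * multiplier[key]) for key, count in current.items()}

# Assumption: over-sampling needs the difference between target and current.
to_generate = {key: target[key] - current[key] for key in current}

print(target)
print(to_generate)
```

Under that assumption, class 2 needs no new samples because its multiplier is 1.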
I agree that you could do whatever you want inside What I mean is: In the case of a ratio function that takes some parameter, (A) could wrap the call, (B) could do it directly. I don't see much of a problem in allowing this behavior, since (A) remains perfectly valid. I might still wrap the function because it is clearer. But maybe some users can exploit the flexibility of (B) in some pipelining or whatever. |
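To make the two options concrete, here is a rough sketch using only the standard library. `resolve_ratio` is a hypothetical stand-in for whatever helper ends up calling the ratio callable; it is not the imblearn API, and the keyword name `multiplier` is likewise an assumption made for illustration.

```python
from collections import Counter
from functools import partial


def ratio_func(y, multiplier):
    """Return the target count per class, scaled by a per-class multiplier."""
    return {key: int(count * multiplier[key])
            for key, count in Counter(y).items()}


def resolve_ratio(ratio, y, **kwargs):
    """Hypothetical helper: call the ratio callable, forwarding extra kwargs."""
    return ratio(y, **kwargs)


y = [1] * 50 + [2] * 100 + [3] * 25
multiplier = {1: 1.5, 2: 1, 3: 3}

# (A) Wrap the call: bind the parameter first, then pass a one-argument callable.
wrapped = partial(ratio_func, multiplier=multiplier)
result_a = resolve_ratio(wrapped, y)

# (B) Do it directly: let the helper forward the parameter to the callable.
result_b = resolve_ratio(ratio_func, y, multiplier=multiplier)

print(result_a)
print(result_b)
```

Both paths give the same target counts, so allowing (B) would not invalidate code written in the style of (A).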
We probably might want to use |
@chkoar Does the change with |
Still a user has to provide a function like this one, right? |
Yep, this is one of the possibilities. This way you allow both behaviours. |
In addition to the previous behaviour |
@chkoar Working on the documentation, I went through an example which is really useful. We are sharing the same Therefore, it could be nice to have this feature if you want to try different levels of imbalance in a dataset. Before, we could provide something like If we use So IMHO, it makes sense in this configuration, much more than in the opposite one (grid-searching the ratio for the algorithm to get the best performance). |
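As an illustration of the "different levels of imbalance" use case, the following sketch sweeps a single parameter of a ratio function. Everything here is hypothetical: `ratio_func`, its `keep_fraction` parameter, and the toy labels are assumptions made for the example and do not reproduce the documentation example referred to above.

```python
from collections import Counter


def ratio_func(y, keep_fraction):
    """Target counts: keep the majority class whole, shrink every other class."""
    counts = Counter(y)
    majority = max(counts, key=counts.get)
    return {key: count if key == majority else max(1, int(count * keep_fraction))
            for key, count in counts.items()}


# Toy labels: class 0 is the majority, class 1 the minority.
y = [0] * 100 + [1] * 80

# One ratio function, several imbalance levels, selected by a keyword argument.
levels = [1.0, 0.5, 0.1]
targets = {level: ratio_func(y, keep_fraction=level) for level in levels}

for level, target in targets.items():
    print(level, target)
```

The point of the sketch is that the heuristic lives in one function while the level is chosen from outside, which is what passing arguments to the ratio callable would allow.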
@chkoar are you still -1? |
I am neutral |
Reference Issue
closes #305
What does this implement/fix? Explain your changes.
Allow passing arguments to ratio when it is a callable.
This can be useful when the heuristic is defined outside of the function.
Any other comments?