SMOTETomek with SMOTE variants #589

Closed
pckroon opened this issue Aug 1, 2019 · 5 comments

@pckroon

pckroon commented Aug 1, 2019

Description

Hi! First off, I know very little about machine learning in general, and imbalanced machine learning in particular, so I don't know if this will make much sense.
The problem I encountered is that I cannot use combine.SMOTETomek with a SMOTE variant such as SVMSMOTE.

Steps/Code to Reproduce

import numpy as np
from imblearn.combine import SMOTETomek
from imblearn.over_sampling import SVMSMOTE

sampler = SMOTETomek(smote=SVMSMOTE())
sampler.fit_resample(np.arange(10).reshape(5, -1), np.arange(5))

Expected Results

A SMOTETomek sampler that uses SVMSMOTE for oversampling.

Actual Results

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/.../.virtualenvs/cartographer/lib/python3.6/site-packages/imblearn/base.py", line 84, in fit_resample
    output = self._fit_resample(X, y)
  File "/home/.../.virtualenvs/cartographer/lib/python3.6/site-packages/imblearn/combine/_smote_tomek.py", line 139, in _fit_resample
    self._validate_estimator()
  File "/home/.../.virtualenvs/cartographer/lib/python3.6/site-packages/imblearn/combine/_smote_tomek.py", line 117, in _validate_estimator
    'Got {} instead.'.format(type(self.smote)))
ValueError: smote needs to be a SMOTE object.Got <class 'imblearn.over_sampling._smote.SVMSMOTE'> instead.

Versions

>>> import platform; print(platform.platform())
Linux-4.15.0-54-generic-x86_64-with-Ubuntu-18.04-bionic
>>> import sys; print("Python", sys.version)
Python 3.6.7 (default, Oct 22 2018, 11:32:17) 
[GCC 8.2.0]
>>> import numpy; print("NumPy", numpy.__version__)
NumPy 1.16.2
>>> import scipy; print("SciPy", scipy.__version__)
SciPy 1.2.1
>>> import sklearn; print("Scikit-Learn", sklearn.__version__)
Scikit-Learn 0.21.2
>>> import imblearn; print("Imbalanced-Learn", imblearn.__version__)
Imbalanced-Learn 0.5.0
@hayesall
Member

hayesall commented Aug 1, 2019

Thanks for the question @pckroon! Currently the smote= parameter is used for passing a SMOTE object with parameters that are different from the defaults.

And it looks like the error is raised here:

if self.smote is not None:
    if isinstance(self.smote, SMOTE):
        self.smote_ = clone(self.smote)
    else:
        raise ValueError('smote needs to be a SMOTE object.'
                         'Got {} instead.'.format(type(self.smote)))

This should probably be adapted to accept SVMSMOTE and the other SMOTE variants as well.

@pckroon
Author

pckroon commented Aug 1, 2019

Changing

if isinstance(self.smote, SMOTE):
to if isinstance(self.smote, BaseSMOTE): should do it code-wise (along with changing the corresponding import statement), but I don't know whether there is some deeper reason why this would be a bad idea.

PS. Is there any particular reason why ADASYN wouldn't work in this context?
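
The effect of the proposed isinstance change can be sketched without imbalanced-learn at all. The classes below are hypothetical stand-ins that only mimic the names in this thread, and the sketch assumes, as the suggestion above implies, that SMOTE and SVMSMOTE share a common BaseSMOTE parent; the library's real class hierarchy may differ. The two validate functions are likewise illustrative and are not the library's code.

```python
# Hypothetical stand-ins for illustration only; these are NOT the imblearn classes.
class BaseSMOTE:
    pass


class SMOTE(BaseSMOTE):
    pass


class SVMSMOTE(BaseSMOTE):
    pass


def validate_strict(smote):
    """Mimics the current check: only SMOTE (or its subclasses) is accepted."""
    if not isinstance(smote, SMOTE):
        raise ValueError(
            "smote needs to be a SMOTE object. Got {} instead.".format(type(smote))
        )
    return smote


def validate_relaxed(smote):
    """Mimics the proposed check: anything derived from BaseSMOTE is accepted."""
    if not isinstance(smote, BaseSMOTE):
        raise ValueError(
            "smote needs to be a BaseSMOTE object. Got {} instead.".format(type(smote))
        )
    return smote


# The strict check rejects the variant, the relaxed check accepts it.
try:
    validate_strict(SVMSMOTE())
    strict_accepts_variant = True
except ValueError:
    strict_accepts_variant = False

relaxed_accepts_variant = isinstance(validate_relaxed(SVMSMOTE()), SVMSMOTE)
```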

@chkoar
Member

chkoar commented Aug 1, 2019

@pckroon basically by using the Pipeline object you could chain whatever samplers you want.

@pckroon
Author

pckroon commented Aug 1, 2019

Ah ok :)
I thought there was more going on than just calling one after the other.
In that case I would suggest removing/deprecating the SMOTEENN and SMOTETomek objects altogether, and in the combine docs write a little bit about using a pipeline to chain them. Currently it looks as if the combinations SMOTE+ENN and SMOTE+TomekLinks are special.

@glemaitre
Member

These samplers are a bit easier to discover if you come from the literature.
You don't need to know about the internals, or that they amount to building a pipeline.
If you have read the paper, then it is true that they are a bit overkill.
