SMOTETomek with SMOTE variants #589

pckroon opened this issue Aug 1, 2019 · 5 comments

@pckroon pckroon commented Aug 1, 2019


Hi! First off, I know very little about machine learning in general, and imbalanced machine learning in particular, so I don't know if this will make much sense.
The problem I encountered is that I cannot use combine.SMOTETomek with a SMOTE variant, e.g. SVMSMOTE.

Steps/Code to Reproduce

import numpy as np
from imblearn.combine import SMOTETomek
from imblearn.over_sampling import SVMSMOTE

sampler = SMOTETomek(smote=SVMSMOTE())
sampler.fit_resample(np.arange(10).reshape(5, -1), np.arange(5))

Expected Results

A SMOTETomek sampler that uses SVMSMOTE for oversampling.

Actual Results

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/.../.virtualenvs/cartographer/lib/python3.6/site-packages/imblearn/", line 84, in fit_resample
    output = self._fit_resample(X, y)
  File "/home/.../.virtualenvs/cartographer/lib/python3.6/site-packages/imblearn/combine/", line 139, in _fit_resample
  File "/home/.../.virtualenvs/cartographer/lib/python3.6/site-packages/imblearn/combine/", line 117, in _validate_estimator
    'Got {} instead.'.format(type(self.smote)))
ValueError: smote needs to be a SMOTE object.Got <class 'imblearn.over_sampling._smote.SVMSMOTE'> instead.


>>> import platform; print(platform.platform())
>>> import sys; print("Python", sys.version)
Python 3.6.7 (default, Oct 22 2018, 11:32:17) 
[GCC 8.2.0]
>>> import numpy; print("NumPy", numpy.__version__)
NumPy 1.16.2
>>> import scipy; print("SciPy", scipy.__version__)
SciPy 1.2.1
>>> import sklearn; print("Scikit-Learn", sklearn.__version__)
Scikit-Learn 0.21.2
>>> import imblearn; print("Imbalanced-Learn", imblearn.__version__)
Imbalanced-Learn 0.5.0

@hayesall hayesall commented Aug 1, 2019

Thanks for the question @pckroon! Currently the smote= parameter is used for passing a SMOTE object with parameters that are different from the defaults.

And it looks like the error is raised here:

    if self.smote is not None:
        if isinstance(self.smote, SMOTE):
            self.smote_ = clone(self.smote)
        else:
            raise ValueError('smote needs to be a SMOTE object.'
                             'Got {} instead.'.format(type(self.smote)))

This should probably be adapted to accept SVMSMOTE and the other SMOTE variants as well.

@pckroon pckroon commented Aug 1, 2019


Swapping if isinstance(self.smote, SMOTE): for if isinstance(self.smote, BaseSMOTE): should do it code-wise (along with changing the corresponding import statement), but I don't know whether there's some deeper reason why this would be a bad idea.

PS. Is there any particular reason why ADASYN wouldn't work in this context?

@chkoar chkoar commented Aug 1, 2019

@pckroon basically by using the Pipeline object you could chain whatever samplers you want.

@pckroon pckroon commented Aug 1, 2019

Ah ok :)
I thought there was more going on than just calling one after the other.
In that case I would suggest removing/deprecating the SMOTEENN and SMOTETomek objects altogether, and in the combine docs write a little bit about using a pipeline to chain them. Currently it looks as if the combinations SMOTE+ENN and SMOTE+TomekLinks are special.

@glemaitre glemaitre commented Sep 18, 2019

These samplers are a bit easier to discover if you come from the literature.
You don't need to know about the internals, or that each one amounts to building a pipeline.
If you have read the paper, then it is true that a dedicated class is a bit overkill.
