TomekLinks fit_sample taking a long time #567
"Tomek Links" is a fairly expensive algorithm, since it has to compute pairwise distances between all examples. Even before taking the dimensionality of your text data into account, it has to compute on the order of n² distances for n examples, i.e. roughly (1.8 × 10⁶)² ≈ 3.2 × 10¹² for your 1,800,000 records. From page 4 of "A Study of the Behavior of Several Methods for Balancing Machine Learning Training Data": "As finding Tomek links is computationally demanding, it would be computationally cheaper if it was performed on a reduced data set." Maybe you could sample from your data set (while preserving the underlying distribution) or apply some dimensionality reduction technique first?
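To make the quadratic cost concrete, here is a minimal brute-force sketch, assuming NumPy and a dense feature matrix. A Tomek link is a pair of samples with different labels that are each other's nearest neighbour. The helpers `tomek_links` and `drop_majority_links` are hypothetical illustrations, not imbalanced-learn's API; imbalanced-learn's own TomekLinks relies on a nearest-neighbour search rather than this exact code. The sketch builds the full n × n distance matrix, which for 1,800,000 rows in float64 would need about 26 TB, so it is only meant for small data.

```python
import numpy as np


def tomek_links(X, y):
    """Boolean mask of samples that belong to a Tomek link (brute force).

    Builds the full n x n distance matrix, so time and memory grow as O(n^2).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    sq = (X ** 2).sum(axis=1)
    # Squared Euclidean distance between every pair of rows (n x n matrix).
    d = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    np.fill_diagonal(d, np.inf)   # a sample is not its own neighbour
    nn = d.argmin(axis=1)         # index of each sample's nearest neighbour
    idx = np.arange(len(X))
    mutual = nn[nn] == idx        # i and nn[i] are each other's nearest neighbour
    different = y != y[nn]        # ... and carry different labels
    return mutual & different


def drop_majority_links(X, y):
    """Remove only the majority-class member of each Tomek link."""
    y = np.asarray(y)
    labels, counts = np.unique(y, return_counts=True)
    majority = labels[counts.argmax()]
    keep = ~(tomek_links(X, y) & (y == majority))
    return np.asarray(X)[keep], y[keep]
```

Dropping only the majority-class member mirrors what `ratio='majority'` asks for in the code below; the `X @ X.T` product is where both the n² factor and the feature dimensionality enter the cost.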
Ok
Yes, totally agree.
I am working on a text classification problem and am using the TomekLinks class from the imblearn module to resample my data. But after calling its fit_sample(X, y) method, the program does nothing, even when I wait for 30 minutes. My data set has 1,800,000 records (text data). Here is the code snippet:
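```python
from imblearn.under_sampling import TomekLinks
tl = TomekLinks(return_indices=True, ratio='majority', random_state=42)
# With return_indices=True, fit_sample also returns the indices of the kept samples.
X_tl, y_tl, idx_tl = tl.fit_sample(train_x, y_binary)
```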
Can anyone explain why it is taking such a long time, and how to handle this situation?
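One way to act on the suggestion of sampling from the data set first is stratified subsampling. This is a minimal sketch, assuming NumPy, that `y_binary` is a 1-D label array, and that `train_x` supports row indexing with an integer array (NumPy arrays and SciPy sparse matrices do; a pandas object would need `.iloc`). `stratified_subsample` is a hypothetical helper, not part of imbalanced-learn. Keeping the same fraction of every class preserves the class proportions, which is one simple reading of "preserving the underlying distribution"; it does not guarantee the feature distribution is preserved.

```python
import numpy as np


def stratified_subsample(y, fraction, seed=0):
    """Pick about `fraction` of the rows of every class, without replacement."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    picked = []
    for label in np.unique(y):
        rows = np.flatnonzero(y == label)               # all rows with this label
        k = max(1, int(round(fraction * len(rows))))    # keep at least one row per class
        picked.append(rng.choice(rows, size=k, replace=False))
    return np.sort(np.concatenate(picked))              # sorted, so original row order is kept


# Hypothetical usage: shrink the data to about 5 % before running TomekLinks.
# idx = stratified_subsample(y_binary, fraction=0.05)
# X_small, y_small = train_x[idx], y_binary[idx]
```

Because the number of pairwise distances grows quadratically, keeping 5 % of the rows cuts it by a factor of about (1 / 0.05)² = 400.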