TomekLinks fit_sample taking long time #567

Closed
atulec04 opened this issue May 8, 2019 · 2 comments

@atulec04 commented May 8, 2019

I am working on a text classification problem and am using the TomekLinks class from the imblearn module to resample my data. After I call the fit_sample(X, y) method of TomekLinks, the program appears to do nothing, even after waiting 30 minutes. My data set has 1,800,000 records (text data). Here is the code snippet:

from imblearn.under_sampling import TomekLinks

# With return_indices=True, fit_sample returns three values: X, y, and the kept indices.
tl = TomekLinks(return_indices=True, ratio='majority', random_state=42)
X_tl, y_tl, idx_tl = tl.fit_sample(train_x, y_binary)

Can anyone explain why it is taking such a long time, and how to handle this situation?

@hayesall (Contributor) commented May 8, 2019

"Tomek Links" is a fairly expensive algorithm since it has to compute pairwise distances between all examples. Even before taking the dimensionality of your text data into account, it will have to compute something on the order of (1.8 * 10^6)^2 values, which is roughly 3.2 * 10^12.
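To make that scale concrete, here is a back-of-the-envelope calculation. It is only an illustration: the 8 bytes per distance is an assumption (float64), and a nearest-neighbor search need not hold a full distance matrix in memory at once.

```python
n_samples = 1_800_000  # data set size reported in this issue

# Number of unordered pairs of distinct samples: n * (n - 1) / 2
n_pairs = n_samples * (n_samples - 1) // 2

# Size of a dense n x n distance matrix in terabytes,
# assuming 8 bytes (float64) per entry.
dense_matrix_tb = n_samples ** 2 * 8 / 1e12

print(f"{n_pairs:.2e} distinct pairs")
print(f"{dense_matrix_tb:.2f} TB for a dense float64 distance matrix")
```

Halving the number of samples cuts the number of pairs by about a factor of four, which is why working on a reduced data set helps so much.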

From page 4 of "A Study of the Behavior of Several Methods for Balancing Machine Learning Training Data": "As finding Tomek links is computationally demanding, it would be computationally cheaper if it was performed on a reduced data set."

Maybe you could sample from your data set (while preserving the underlying class distribution), or apply a dimensionality reduction technique first?

@atulec04 (Author) commented May 8, 2019

Ok

@atulec04 atulec04 closed this May 11, 2019