FIX reproducibility and parallelization of InstanceHardnessThreshold #598

Shihab-Shahriar
Contributor

This PR aims to solve a couple of problems with the existing InstanceHardnessThreshold sampler.

  1. When estimator is not None, the result is not reproducible if the estimator has no random_state set.
  2. When estimator is not None, its n_jobs value may differ from the one given to the InstanceHardnessThreshold constructor. If the given estimator has n_jobs equal to 1, setting n_jobs>1 in InstanceHardnessThreshold has no effect, and fit_resample runs in a single thread.
  3. When n_jobs matches in both places, moving parallelism from the estimator to cross_val_predict enables coarse-grained parallelism, possibly speeding up computation.
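
A rough sketch of the idea behind points 1 to 3, not the actual diff in this PR. The helper name `prepare_estimator` and the `ToyEstimator` class are hypothetical and exist only so the snippet runs without scikit-learn; `copy.deepcopy` stands in for `sklearn.base.clone`. It assumes the estimator follows the scikit-learn `get_params`/`set_params` convention, and that an unset `random_state` should be filled in from the sampler while `n_jobs` on the estimator is forced to 1.

```python
import copy


def prepare_estimator(estimator, random_state=None):
    """Return a configured copy of `estimator` (hypothetical helper).

    - Fills in `random_state` when the estimator exposes it but left it unset.
    - Forces the estimator's own `n_jobs` to 1 so that parallelism can be
      handled one level up, across cross-validation folds.
    """
    est = copy.deepcopy(estimator)  # stand-in for sklearn.base.clone
    params = est.get_params()
    updates = {}
    if "random_state" in params and params["random_state"] is None:
        updates["random_state"] = random_state
    if "n_jobs" in params:
        updates["n_jobs"] = 1
    if updates:
        est.set_params(**updates)
    return est


class ToyEstimator:
    """Minimal stand-in that mimics the scikit-learn parameter API."""

    def __init__(self, random_state=None, n_jobs=4):
        self.random_state = random_state
        self.n_jobs = n_jobs

    def get_params(self, deep=True):
        return {"random_state": self.random_state, "n_jobs": self.n_jobs}

    def set_params(self, **params):
        for name, value in params.items():
            setattr(self, name, value)
        return self


original = ToyEstimator()
prepared = prepare_estimator(original, random_state=42)
```

With a real scikit-learn estimator, the prepared copy would then be passed to `cross_val_predict`, with the sampler's `n_jobs` given to `cross_val_predict` itself so that the folds run in parallel rather than the individual fits.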

@pep8speaks

Hello @Shihab-Shahriar! Thanks for opening this PR. We checked the lines you've touched for PEP 8 issues, and found:

Line 15:52: E231 missing whitespace after ','
Line 129:48: E231 missing whitespace after ','
Line 149:39: E128 continuation line under-indented for visual indent
Line 149:80: E501 line too long (81 > 79 characters)
