FIX reproducibility and parallelization of InstanceHardnessThreshold #598

Shihab-Shahriar · 2019-09-07T14:11:57Z

This PR aims to solve a couple of problems with the existing InstanceHardnessThreshold sampler.

When estimator is not None, the result will not be reproducible if the estimator does not have a random_state.
When estimator is not None, it may have a different n_jobs value than the one given to the InstanceHardnessThreshold constructor. If the given estimator's n_jobs equals 1, setting n_jobs>1 in InstanceHardnessThreshold has no effect, and fit_resample runs in a single thread.
When the two n_jobs values do match, moving the parallelism from the estimator to cross_val_predict enables coarse-grained parallelism, possibly speeding up computation.
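
To illustrate the idea behind these three points, here is a minimal sketch and not necessarily the code in this PR. The helper name `hardness_probabilities`, its defaults, and the choice to force the estimator's own `n_jobs` to 1 are assumptions made for the example. It uses only public scikit-learn calls.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_predict


def hardness_probabilities(estimator, X, y, random_state=0, n_jobs=1, cv=5):
    """Hypothetical helper: out-of-fold class probabilities for every sample."""
    est = clone(estimator)  # never mutate the estimator passed in by the user
    params = est.get_params()
    # Seed the clone so an unseeded estimator still gives repeatable results.
    if "random_state" in params and params["random_state"] is None:
        est.set_params(random_state=random_state)
    # Assumption: keep the estimator single-threaded and parallelize over
    # the CV folds instead (coarse-grained parallelism).
    if "n_jobs" in params:
        est.set_params(n_jobs=1)
    skf = StratifiedKFold(n_splits=cv, shuffle=True, random_state=random_state)
    return cross_val_predict(
        est, X, y, cv=skf, n_jobs=n_jobs, method="predict_proba"
    )


X, y = make_classification(n_samples=200, weights=[0.8, 0.2], random_state=0)
rf = RandomForestClassifier(n_estimators=10)  # deliberately no random_state
p1 = hardness_probabilities(rf, X, y)
p2 = hardness_probabilities(rf, X, y)
print(np.allclose(p1, p2))
```

Passing `n_jobs=2` (or more) to the helper would distribute the folds across workers regardless of the `n_jobs` stored on the user's estimator, which is the behaviour the second and third points ask for.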

pep8speaks · 2019-09-07T14:11:59Z

Hello @Shihab-Shahriar! Thanks for opening this PR. We checked the lines you've touched for PEP 8 issues, and found:

In the file imblearn/under_sampling/_prototype_selection/_instance_hardness_threshold.py:

Line 15:52: E231 missing whitespace after ','
Line 129:48: E231 missing whitespace after ','
Line 149:39: E128 continuation line under-indented for visual indent
Line 149:80: E501 line too long (81 > 79 characters)

FIX reproducibility and parallelization of InstanceHardnessThreshold

6e37a60

Shihab-Shahriar closed this Sep 7, 2019

Shihab-Shahriar mentioned this pull request Sep 7, 2019

FIX reproducibility and parallelization of InstanceHardnessThreshold #599

Merged
