-
Notifications
You must be signed in to change notification settings - Fork 861
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Multiprocessing over features rather than CV folds in Sequential Feature Selection (addressing #191) #193
Conversation
Hello @whalebot-helmsman! Thanks for updating the PR. Cheers ! There are no PEP8 issues in this Pull Request. 🍻 Comment last updated on May 18, 2017 at 20:57 Hours UTC |
Looks great, thanks a lot! I will add a few minor code comments regarding the documentation. (Alt. I am happy to add those to this PR if you do the "enabling repository maintainer permissions on existing pull requests" feature.) |
Okay, it didn't let me add the comments directly to the code since it wasn't modified by you in this PR, so let me post it here :) Could you change the docstrings for
Also, could you add the following snippet to the Changelog in docs/sources/CHANGELOG.md?
Otherwise, the PR looks great, thanks a lot! |
Added changelog and documentation updates |
@@ -66,7 +66,8 @@ class ExhaustiveFeatureSelector(BaseEstimator, MetaEstimatorMixin): | |||
otherwise. | |||
No cross-validation if cv is None, False, or 0. | |||
n_jobs : int (default: 1) | |||
The number of CPUs to use for cross validation. -1 means 'all CPUs'. | |||
The number of CPUs to use for evaluating different feature subsets | |||
in parallel. -1 means 'all CPUs'. | |||
pre_dispatch : int, or string (default: '2*n_jobs') | |||
Controls the number of jobs that get dispatched | |||
during parallel execution in cross_val_score. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Could you also update the pre_dispatch docstring since it still has the cross_val_score referenced there, which may be misleading
@@ -84,7 +84,8 @@ class SequentialFeatureSelector(BaseEstimator, MetaEstimatorMixin): | |||
exclusion/inclusion if floating=True and | |||
algorithm gets stuck in cycles. | |||
n_jobs : int (default: 1) | |||
The number of CPUs to use for cross validation. -1 means 'all CPUs'. | |||
The number of CPUs to use for evaluating different feature subsets | |||
in parallel. -1 means 'all CPUs'. | |||
pre_dispatch : int, or string (default: '2*n_jobs') | |||
Controls the number of jobs that get dispatched | |||
during parallel execution in cross_val_score. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also here. thanks!
I have to look into Travis CI to see if it supports multiple processors to better tests PRs like that. However, I just tried it locally on 8 cpus and it seems to yield the same results as in the unit tests with 1 processor -- everything seems to work as expected and I will merge it now. Thanks a lot for this really useful contribution! |
Do you use unit-tests for comparison? There are very easy (in a sense of CPU time required) tasks in unit-tests. In such scenario overhead of starting new processes is bigger than gains of multiprocessing.
|
Yeah, I was primarily looking for correctness. I.e., that the previous results in the unit tests are reproduced with > 1 CPUs. |
Description
Use 1 process per feature for sequential feature selection and exhaustive feature selection
Related issues or pull requests
Fixes #191