Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for abstract cv method for StackingCVClassifier #203

Merged
merged 3 commits into from
Jun 15, 2017

Conversation

sque
Copy link
Contributor

@sque sque commented Jun 14, 2017

Description

Change the API of StackingCVClassifier in order to accept any kind of cross-validation technique.

Related issues or pull requests

Related #202

Pull Request requirements

  • Added appropriate unit test functions in the ./mlxtend/*/tests directories
  • Ran nosetests ./mlxtend -sv and make sure that all unit tests pass
  • Checked the test coverage by running nosetests ./mlxtend --with-coverage
  • Checked for style issues by running flake8 ./mlxtend
  • Added a note about the modification or contribution to the ./docs/sources/CHANGELOG.md file
  • Modify documentation in the appropriate location under mlxtend/docs/sources/ (optional)
  • Checked that the Travis-CI build passed at https://travis-ci.org/rasbt/mlxtend

@coveralls
Copy link

Coverage Status

Coverage increased (+0.02%) to 93.68% when pulling 7bc008f on sque:abstract_cv_method_for_stacking into e688f7d on rasbt:master.

@@ -26,6 +26,7 @@ The CHANGELOG for the current development version is available at
- `plot_decision_regions` now supports plotting decision regions for more than 2 training features. (via [James Bourbeau](https://github.com/jrbourbeau)).
- Parallel execution in `mlxtend.feature_selection.SequentialFeatureSelector` and `mlxtend.feature_selection.ExhaustiveFeatureSelector` is now performed over different feature subsets instead of the different cross-validation folds to better utilize machines with multiple processors if the number of features is large ([#193](https://github.com/rasbt/mlxtend/pull/193), via [@whalebot-helmsman](https://github.com/whalebot-helmsman)).
- Raise meaningful error messages if pandas `DataFrame`s or Python lists of lists are fed into the StackingCVClassifer as a `fit` arguments.
- The `StackingCVClassifier` can now accept any kind of cross validation technique by using `cv` argument. E.g. `StackingCVClassifier(..., cv=GroupKFold(n_splits=3))`
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks nice, but could you mention that the n_folds argument was changed to cv? Maybe sth. like

  • The n_folds parameter of the StackingCVClassifier was changed to cv and can now accept any kind of cross validation technique that is available from scikit-learn. For example, StackingCVClassifier(..., cv=StratifiedKFold(n_splits=3)) or StackingCVClassifier(..., cv=GroupKFold(n_splits=3)) (via Konstantinos Paliouras).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree it looks better like this. Thanks

@rasbt
Copy link
Owner

rasbt commented Jun 14, 2017

Thanks for the PR, the code looks fine, and that's a pretty elegant solution. Regarding the examples in the documentation (at https://github.com/rasbt/mlxtend/blob/master/docs/sources/user_guide/classifier/StackingCVClassifier.ipynb). Are you planning on changing the previous once to make sure they work with the API change? Also, it would prob. be a good idea to add a new example for GroupKFold or so.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.02%) to 93.68% when pulling 0cbd4d1 on sque:abstract_cv_method_for_stacking into e688f7d on rasbt:master.

@sque
Copy link
Contributor Author

sque commented Jun 15, 2017

I updated the CHANGELOG as you instructed but I would like to avoid updating the documentation because I am not familiar with the process. In case I am commiting more changes in the future, I will try to learn the process :)

@rasbt
Copy link
Owner

rasbt commented Jun 15, 2017

Thanks! For the documentation, you would just need to update the Jupyter Notebook text. However, I just see that the n_folds parameter was nowhere used explicitly in the examples, so nothing would have to be changed here :). The only think that would change is the API documentation at the bottom, but this will be done automatically when I push the documentation to the website during the next version release.

The general way would be to go into the docs subdirectory and execute the make_api.py script from there:

cd docs
python make_api.py

This would build a local API documentation via markdown, which is then simply read into the Jupyter notebook -- later, the Jupyter notebook gets converted into Markdown via

python ipynb2markdown.py --ipynb sources/user_guide/classifier/StackingCVClassifier.ipynb

But don't worry about it, I will run ipynb2markdown.py --all before the next version release.

Overall, it looks good now, I can merge it now unless you have additional suggestions

@sque
Copy link
Contributor Author

sque commented Jun 15, 2017

Thank you for the info and I will keep them in mind for the next time :) I do not intend to make any more changes you can proceed with the merging

@rasbt
Copy link
Owner

rasbt commented Jun 15, 2017

Nice, I will merge it then! Thanks a lot for the contribution, I greatly appreciate it!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants