[MRG+2] Minimize the validation of X in adaboost #13174

Merged

Conversation

chkoar
Contributor

@chkoar chkoar commented Feb 15, 2019

Reference Issues/PRs

Fixes #7768. Takes over #8304.

What does this implement/fix? Explain your changes.

This PR (almost) transfers the responsibility of the validation of X and y from AdaBoost to the base estimator.
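The idea can be sketched with a toy example. This is only an illustration of the delegation pattern, not the code in this PR: `ToyBaseEstimator` and `ToyMetaEstimator` are hypothetical names, and the checks they perform are assumptions chosen for brevity.

```python
class ToyBaseEstimator:
    """Hypothetical base estimator that owns the strict input checks."""

    def fit(self, X, y):
        # The base estimator decides what a valid sample looks like.
        for row in X:
            if not all(isinstance(v, (int, float)) for v in row):
                raise ValueError("ToyBaseEstimator needs numeric features")
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        # Predict the training mean for every sample.
        return [self.mean_ for _ in X]


class ToyMetaEstimator:
    """Hypothetical meta-estimator that checks only what it uses itself."""

    def __init__(self, base_estimator):
        self.base_estimator = base_estimator

    def fit(self, X, y):
        # Minimal validation: the wrapper only needs consistent lengths.
        if len(X) != len(y):
            raise ValueError("X and y have inconsistent lengths")
        # Everything else is left to the base estimator.
        self.estimator_ = self.base_estimator.fit(X, y)
        return self

    def predict(self, X):
        return self.estimator_.predict(X)


model = ToyMetaEstimator(ToyBaseEstimator()).fit([[1.0], [2.0], [3.0]], [1, 2, 3])
predictions = model.predict([[4.0]])
```

With this split, an input the wrapper has no opinion about (for example non-numeric features) is accepted or rejected by the base estimator alone.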

Any other comments?

I retained all the tests. If we want to relax the check about sparsity we should remove the test_sparse_classification and test_sparse_regression tests.

Member

@jnothman jnothman left a comment

Looks nice

Please add an entry to the change log at doc/whats_new/v0.21.rst. Like the other entries there, please reference this pull request with :issue: and credit yourself (and other contributors if applicable) with :user:

boost = AdaBoostRegressor(DummyEstimator(), n_estimators=3)
boost.fit(X, y_regr)
assert_equal(len(boost.estimator_weights_), len(boost.estimator_errors_))
Member

With the adoption of pytest, we are phasing out use of test helpers assert_equal, assert_true, etc. Please use bare assert statements, e.g. assert x == y, assert not x, etc.
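As a small sketch of the requested style: the two lists below are hypothetical stand-ins for the fitted `estimator_weights_` and `estimator_errors_` attributes, so the snippet runs without fitting a model.

```python
# Hypothetical stand-ins for boost.estimator_weights_ and boost.estimator_errors_.
estimator_weights = [1.0, 0.8, 0.5]
estimator_errors = [0.1, 0.2, 0.3]

# Helper style being phased out:
#     assert_equal(len(estimator_weights), len(estimator_errors))

# pytest style: a bare assert statement on the same comparison.
assert len(estimator_weights) == len(estimator_errors)
```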

Contributor Author

@chkoar chkoar Feb 18, 2019

I did not touch that test. It only looks that way because I removed and then re-added the test_sparse_classification and test_sparse_regression tests. Do you want me to fix that even though it is unrelated to the PR?

Member

@NicolasHug NicolasHug left a comment

some comments

@chkoar
Contributor Author

chkoar commented Feb 18, 2019

@jnothman @NicolasHug do you think that we should retain the test_sparse_classification and test_sparse_regression tests?

Member

@jnothman jnothman left a comment

I don't understand. Why should we not retain those tests?

@chkoar
Contributor Author

chkoar commented Feb 18, 2019

> I don't understand. Why should we not retain those tests?

@jnothman while minimizing the validation, I was thinking that these checks/tests should be the responsibility of the base estimator. That's why I am asking.

@chkoar chkoar force-pushed the minimize_validation_in_meta_estimators branch from 88cebc6 to 2e9a045 Compare February 21, 2019 17:35
@chkoar
Contributor Author

chkoar commented Feb 21, 2019

@jnothman _num_samples might be unnecessary since we use check_array and check_X_y

Member

@jnothman jnothman left a comment

Otherwise LGTM

@@ -118,6 +118,9 @@ Support for Python 3.4 and below has been officially dropped.
value of ``learning_rate`` in ``update_terminal_regions`` is not consistent
with the document and the caller functions.
:issue:`6463` by :user:`movelikeriver <movelikeriver>`.

- |Enhancement| Minimized the validation of X in :class:`ensemble.AdaBoostClassifier`
and :class:`ensemble.AdaBoostRegressor` :issue:`13174` by :user:`Christos Aridas <chkoar>`.
Member

please keep lines under 80 characters

@@ -70,16 +69,32 @@ def __init__(self,
        self.learning_rate = learning_rate
        self.random_state = random_state

    def _validate_data(self, X, y=None):

        accept_sparse = ['csr', 'csc']
Member

It took me a while to understand why we don't just accept_sparse=True. Maybe add a comment that these are required for safe_indexing support???

Contributor Author

test_sparse_regression and test_sparse_classification check that sparse matrices are converted to one of the two formats. Check here for instance. We could minimize the validation further by turning off these assertions and using accept_sparse=True, but safe_indexing does not support all sparse matrix formats. What do you propose?

Member

What @jnothman is asking for is simply a comment on the line above saying that these are required for safe_indexing support.
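For readers following the thread, here is a hedged sketch of what listing the two formats does. It calls `check_array` directly on small made-up matrices rather than going through AdaBoost, and the plain row indexing at the end only stands in for the kind of row selection that safe_indexing performs.

```python
import numpy as np
from scipy import sparse
from sklearn.utils import check_array

# Small made-up data, used only for illustration.
X_dense = np.array([[0.0, 1.0], [2.0, 0.0], [0.0, 3.0]])
X_coo = sparse.coo_matrix(X_dense)

# A format outside the accepted list is converted to the first listed format.
X_checked = check_array(X_coo, accept_sparse=['csr', 'csc'], dtype=None)
fmt_from_coo = X_checked.format

# A format that is already in the list is kept as-is.
X_csc = sparse.csc_matrix(X_dense)
fmt_from_csc = check_array(X_csc, accept_sparse=['csr', 'csc'], dtype=None).format

# Row selection by integer indices works on the converted matrix.
rows_dense = X_checked[[0, 2]].toarray()
```

With accept_sparse=True the input format would be passed through unchanged, which is why the explicit list matters when later steps need to index rows.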

@agramfort agramfort changed the title Minimize the validation of X in adaboost [MRG+1] Minimize the validation of X in adaboost Feb 27, 2019
Member

@GaelVaroquaux GaelVaroquaux left a comment

LGTM, but the comment by @jnothman should be addressed (it's just a question of adding a comment line)

@GaelVaroquaux GaelVaroquaux changed the title [MRG+1] Minimize the validation of X in adaboost [MRG+2] Minimize the validation of X in adaboost Feb 27, 2019
@glemaitre
Member

Some PEP8 I think

@chkoar
Contributor Author

chkoar commented Feb 27, 2019

@glemaitre good catch. I think it is ok.

@jnothman jnothman merged commit 9d21197 into scikit-learn:master Feb 28, 2019
@jnothman
Member

Thanks @chkoar!

xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019
koenvandevelde pushed a commit to koenvandevelde/scikit-learn that referenced this pull request Jul 12, 2019
Development

Successfully merging this pull request may close these issues.

Minimize validation of X in ensembles with a base estimator
6 participants