[MRG+2] Minimize the validation of X in adaboost #13174
Looks nice
Please add an entry to the change log at doc/whats_new/v0.21.rst. Like the other entries there, please reference this pull request with :issue: and credit yourself (and other contributors if applicable) with :user:.
boost = AdaBoostRegressor(DummyEstimator(), n_estimators=3)
boost.fit(X, y_regr)
assert_equal(len(boost.estimator_weights_), len(boost.estimator_errors_))
With the adoption of pytest, we are phasing out use of test helpers such as assert_equal, assert_true, etc. Please use bare assert statements, e.g. assert x == y, assert not x, etc.
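A minimal sketch of that assertion in the requested style. The SimpleNamespace object below is a hypothetical stand-in for a fitted AdaBoostRegressor (only the estimator_weights_ and estimator_errors_ attributes are assumed), so the snippet runs without scikit-learn:

```python
from types import SimpleNamespace


def check_one_weight_per_error(boost):
    # Bare assert instead of assert_equal(...): pytest's assertion rewriting
    # still reports both lengths when the comparison fails.
    assert len(boost.estimator_weights_) == len(boost.estimator_errors_)


# Hypothetical stand-in for a fitted AdaBoostRegressor with n_estimators=3.
fake_boost = SimpleNamespace(
    estimator_weights_=[1.0, 1.0, 1.0],
    estimator_errors_=[0.2, 0.3, 0.25],
)
check_one_weight_per_error(fake_boost)
```

Because pytest rewrites plain assert statements to show the compared values, the helper adds nothing over the bare form.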
I did not touch that test. It looks as if I did because I removed and then re-added the test_sparse_classification and test_sparse_regression tests. Do you want me to fix it even though it is unrelated to the PR?
some comments
@jnothman @NicolasHug do you think that we should retain the
I don't understand. Why should we not retain those tests?
@jnothman while minimizing the validation, I was thinking that these checks/tests should be the responsibility of the base estimator. That's why I am asking.
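A stdlib-only sketch of that division of labour, under stated assumptions: StringLengthEstimator and MinimalMetaEstimator are hypothetical classes, not scikit-learn code. The wrapper checks only what it needs for its own bookkeeping and leaves the question of which inputs are acceptable to the base estimator:

```python
class StringLengthEstimator:
    """Hypothetical base estimator that accepts raw strings and validates them."""

    def fit(self, X, y):
        # The base estimator knows what input it can handle, so it does the check.
        if not all(isinstance(doc, str) for doc in X):
            raise TypeError("this estimator expects a sequence of strings")
        self.mean_length_ = sum(len(doc) for doc in X) / len(X)
        return self


class MinimalMetaEstimator:
    """Hypothetical ensemble wrapper that validates only what it needs itself."""

    def __init__(self, base_estimator):
        self.base_estimator = base_estimator

    def fit(self, X, y):
        # The wrapper keeps one weight per sample, so it only checks that X and y
        # agree on the number of samples. It does not coerce X into a numeric
        # array, which would reject the strings the base estimator accepts.
        if len(X) != len(y):
            raise ValueError("X and y have inconsistent numbers of samples")
        self.sample_weight_ = [1.0 / len(X)] * len(X)
        self.estimator_ = self.base_estimator.fit(X, y)
        return self


model = MinimalMetaEstimator(StringLengthEstimator()).fit(["spam", "ham eggs"], [1, 0])
print(model.estimator_.mean_length_)
```

Passing non-string input such as [1, 2] still fails, but the error now comes from the base estimator, which is the component that actually knows its input requirements.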
88cebc6 to 2e9a045
@jnothman
Otherwise LGTM
doc/whats_new/v0.21.rst
@@ -118,6 +118,9 @@ Support for Python 3.4 and below has been officially dropped.
  value of ``learning_rate`` in ``update_terminal_regions`` is not consistent
  with the document and the caller functions.
  :issue:`6463` by :user:`movelikeriver <movelikeriver>`.

- |Enhancement| Minimized the validation of X in :class:`ensemble.AdaBoostClassifier`
  and :class:`ensemble.AdaBoostRegressor` :issue:`13174` by :user:`Christos Aridas <chkoar>`.
please keep lines under 80 characters
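A quick way to rewrap the entry under the 80-character limit, sketched with Python's standard textwrap module (the wrapping it produces is one option, not necessarily what was committed):

```python
import textwrap

entry = (
    "|Enhancement| Minimized the validation of X in "
    ":class:`ensemble.AdaBoostClassifier` and "
    ":class:`ensemble.AdaBoostRegressor` :issue:`13174` by "
    ":user:`Christos Aridas <chkoar>`."
)

# "- " starts the bullet; continuation lines are indented two spaces so that
# reStructuredText keeps them inside the same list item. Long-word and hyphen
# breaking are disabled so roles such as :class:`...` are never split mid-word.
wrapped = textwrap.fill(
    entry,
    width=79,
    initial_indent="- ",
    subsequent_indent="  ",
    break_long_words=False,
    break_on_hyphens=False,
)
print(wrapped)
```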
@@ -70,16 +69,32 @@ def __init__(self,
        self.learning_rate = learning_rate
        self.random_state = random_state

    def _validate_data(self, X, y=None):

        accept_sparse = ['csr', 'csc']
It took me a while to understand why we don't just use accept_sparse=True. Maybe add a comment that these are required for safe_indexing support?
test_sparse_regression and test_sparse_classification check that sparse matrices are converted to one of the two formats. Check here, for instance. We could minimize the validation further by turning off these assertions and using accept_sparse=True, but safe_indexing does not support all sparse matrix formats. What do you propose?
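A stdlib-only model of the conversion being discussed. It assumes the behaviour of scikit-learn's check_array when accept_sparse is a list (an input in an unlisted format is converted to the first listed one) and works on format names instead of real scipy.sparse matrices; resolve_sparse_format is a hypothetical helper, not a scikit-learn function:

```python
# Sparse formats that support the row indexing safe_indexing relies on.
ADABOOST_SPARSE_FORMATS = ["csr", "csc"]


def resolve_sparse_format(fmt, accept_sparse):
    """Return the sparse format an input ends up in after validation (sketch)."""
    if accept_sparse is True:
        # Every format passes through unchanged, including "coo", which
        # scipy's coo_matrix did not support row indexing for at the time.
        return fmt
    if fmt in accept_sparse:
        return fmt
    # Unlisted formats are converted to the first accepted format.
    return accept_sparse[0]


for fmt in ["csr", "csc", "coo", "lil"]:
    print(fmt, "->", resolve_sparse_format(fmt, ADABOOST_SPARSE_FORMATS))
```

With the explicit list, a COO input is converted to CSR before it reaches any indexing; with accept_sparse=True it would arrive unchanged.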
What @jnothman is asking for is simply a comment on the line above saying that these are required for safe_indexing support.
LGTM, but the comment by @jnothman should be addressed (it's just a question of adding a comment line)
Some PEP8 issues, I think.
@glemaitre good catch. I think it is ok.
Thanks @chkoar!
This reverts commit cc6fbf4.
Reference Issues/PRs
Fixes #7768. Takes over #8304.
What does this implement/fix? Explain your changes.
This PR (almost) transfers the responsibility for validating X and y from AdaBoost to the base estimator.
Any other comments?
I retained all the tests. If we want to relax the check about sparsity, we should remove the test_sparse_classification and test_sparse_regression tests.
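For context, a rough sketch of the pattern behind those two sparse tests: the base estimator records the type of the training data it receives, and the test asserts that every fitted member saw CSR or CSC data. FormatRecordingEstimator and fit_ensemble are hypothetical stand-ins (format names replace scipy.sparse matrix types), not the actual test code:

```python
class FormatRecordingEstimator:
    """Hypothetical base estimator that records the format it was trained on."""

    def fit(self, X_format):
        self.data_type_ = X_format
        return self


def fit_ensemble(X_format, accept_sparse, n_estimators=3):
    """Hypothetical stand-in: validate the input format, then fit each member."""
    if accept_sparse is not True and X_format not in accept_sparse:
        X_format = accept_sparse[0]  # convert to the first accepted format
    return [FormatRecordingEstimator().fit(X_format) for _ in range(n_estimators)]


# Roughly what the sparse tests assert: every member saw CSR or CSC data.
members = fit_ensemble("coo", ["csr", "csc"])
assert all(m.data_type_ in ("csr", "csc") for m in members)

# With accept_sparse=True a COO input reaches the members unchanged, so the
# same assertion would fail; this is why relaxing the check means dropping it.
relaxed = fit_ensemble("coo", True)
assert not any(m.data_type_ in ("csr", "csc") for m in relaxed)
```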