[MRG+2] Minimize the validation of X in adaboost #13174

chkoar · 2019-02-15T22:37:43Z

Reference Issues/PRs

Fixes #7768. Takes over #8304.

What does this implement/fix? Explain your changes.

This PR (almost) transfers the responsibility of the validation of X and y from AdaBoost to the base estimator.
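As an illustration of the idea in the description, here is a minimal sketch of a meta-estimator that checks only what it needs itself and leaves the rest to its base estimator. The classes `StrictBase` and `ThinMeta` are hypothetical toys, not scikit-learn code, and do not reflect how this PR is implemented.

```python
class StrictBase:
    """Hypothetical base estimator that validates its own input."""

    def fit(self, X, y):
        # The base estimator knows what a valid sample looks like.
        if any(not isinstance(row, (list, tuple)) for row in X):
            raise ValueError("each sample must be a sequence of features")
        self.n_features_ = len(X[0])
        return self


class ThinMeta:
    """Hypothetical meta-estimator that performs minimal validation."""

    def __init__(self, base):
        self.base = base

    def fit(self, X, y):
        # The meta-estimator only needs X and y to line up sample by sample.
        if len(X) != len(y):
            raise ValueError("X and y have inconsistent lengths")
        # Everything else (types, shapes, contents) is left to the base.
        self.base.fit(X, y)
        return self


meta = ThinMeta(StrictBase()).fit([[1, 2], [3, 4]], [0, 1])
print(meta.base.n_features_)
```

With this split, an input the base estimator cannot handle is rejected by the base estimator rather than by the wrapper.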

Any other comments?

I retained all the tests. If we want to relax the check about sparsity we should remove the test_sparse_classification and test_sparse_regression tests.

jnothman

Looks nice

Please add an entry to the change log at doc/whats_new/v0.21.rst. Like the other entries there, please reference this pull request with :issue: and credit yourself (and other contributors if applicable) with :user:

sklearn/ensemble/weight_boosting.py

jnothman · 2019-02-17T21:33:59Z

sklearn/ensemble/tests/test_weight_boosting.py

+
+    boost = AdaBoostRegressor(DummyEstimator(), n_estimators=3)
+    boost.fit(X, y_regr)
+    assert_equal(len(boost.estimator_weights_), len(boost.estimator_errors_))


With the adoption of pytest, we are phasing out use of test helpers assert_equal, assert_true, etc. Please use bare assert statements, e.g. assert x == y, assert not x, etc.

I did not touch that test. It only shows up in the diff because I removed and then re-added the test_sparse_classification and test_sparse_regression tests. Do you want me to fix it even though it is unrelated to the PR?
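For reference, the requested change from `assert_equal` to a bare assert could look like the sketch below. The two lists are hypothetical stand-ins for `boost.estimator_weights_` and `boost.estimator_errors_`, so the snippet runs without fitting a model.

```python
# Hypothetical stand-ins for boost.estimator_weights_ and
# boost.estimator_errors_ after fitting with n_estimators=3.
estimator_weights = [1.0, 0.5, 0.25]
estimator_errors = [0.1, 0.2, 0.3]

# Helper style being phased out:
#     assert_equal(len(estimator_weights), len(estimator_errors))
# pytest style with a bare assert statement:
assert len(estimator_weights) == len(estimator_errors)
```

pytest rewrites bare assert statements so that a failure still reports both sides of the comparison.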

NicolasHug

some comments

sklearn/ensemble/tests/test_weight_boosting.py

chkoar · 2019-02-18T13:15:03Z

@jnothman @NicolasHug do you think that we should retain the test_sparse_classification and test_sparse_regression tests?

jnothman

I don't understand. Why should we not retain those tests?

sklearn/ensemble/tests/test_weight_boosting.py

chkoar · 2019-02-18T23:05:33Z

I don't understand. Why should we not retain those tests?

@jnothman while minimizing the validation, I was thinking that these checks/tests should be the responsibility of the base estimator. That's why I am asking.

chkoar · 2019-02-21T18:00:37Z

@jnothman _num_samples might be unnecessary since we use check_array and check_X_y
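A short sketch of why a separate sample-count check may be redundant: `check_X_y` already verifies that X and y have the same number of samples. The arrays below are made-up illustration data.

```python
import numpy as np
from sklearn.utils import check_X_y

X = np.arange(6).reshape(3, 2)   # 3 samples, 2 features
y_ok = np.array([0, 1, 0])       # 3 targets
y_bad = np.array([0, 1])         # only 2 targets

# Consistent lengths pass validation and are returned as arrays.
X_checked, y_checked = check_X_y(X, y_ok)

# Inconsistent lengths are rejected with a ValueError.
try:
    check_X_y(X, y_bad)
    length_mismatch_detected = False
except ValueError:
    length_mismatch_detected = True
```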

sklearn/ensemble/tests/test_weight_boosting.py

sklearn/ensemble/weight_boosting.py

jnothman

Otherwise LGTM

jnothman · 2019-02-26T22:15:24Z

doc/whats_new/v0.21.rst

@@ -118,6 +118,9 @@ Support for Python 3.4 and below has been officially dropped.
  value of ``learning_rate`` in ``update_terminal_regions`` is not consistent
  with the document and the caller functions.
  :issue:`6463` by :user:`movelikeriver <movelikeriver>`.
+
+- |Enhancement| Minimized the validation of X in :class:`ensemble.AdaBoostClassifier`
+  and :class:`ensemble.AdaBoostRegressor` :issue:`13174` by :user:`Christos Aridas <chkoar>`.


please keep lines under 80 characters

jnothman · 2019-02-26T22:21:37Z

sklearn/ensemble/weight_boosting.py

@@ -70,16 +69,32 @@ def __init__(self,
        self.learning_rate = learning_rate
        self.random_state = random_state

+    def _validate_data(self, X, y=None):
+
+        accept_sparse = ['csr', 'csc']


It took me a while to understand why we don't just use accept_sparse=True. Maybe add a comment that these formats are required for safe_indexing support?

test_sparse_regression and test_sparse_classification check that sparse matrices are converted to one of the two formats. Check here for instance. We could minimize the validation further by turning off these assertions and using accept_sparse=True, but safe_indexing does not support all sparse matrix formats. What do you propose?

What @jnothman is asking for is simply a comment on the line above saying that these are required for safe_indexing support.
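A minimal sketch of the behaviour under discussion, assuming a standalone helper named `validate_X` (hypothetical, not the PR's `_validate_data`): passing a list of formats to `check_array` converts any other sparse format to the first listed one, and the requested comment records why the list is restricted.

```python
import numpy as np
from scipy import sparse
from sklearn.utils import check_array


def validate_X(X):
    # Accept or convert to these sparse formats only, because they
    # support the row indexing needed later (safe_indexing in the PR).
    accept_sparse = ['csr', 'csc']
    return check_array(X, accept_sparse=accept_sparse, dtype=None)


X_coo = sparse.coo_matrix(np.eye(3))
X_checked = validate_X(X_coo)
print(X_checked.format)
```

An input that is already CSR or CSC keeps its format; only other sparse formats are converted.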

GaelVaroquaux

LGTM, but the comment by @jnothman should be addressed (it's just a question of adding a comment line)

GaelVaroquaux · 2019-02-27T13:16:51Z


What @jnothman is asking for is simply a comment on the line above saying that these are required for safe_indexing support.

glemaitre · 2019-02-27T17:35:48Z

Some PEP8 I think

chkoar · 2019-02-27T18:22:58Z

@glemaitre good catch. I think it is ok.

jnothman · 2019-02-28T05:18:24Z

Thanks @chkoar!


jnothman reviewed Feb 17, 2019

NicolasHug reviewed Feb 17, 2019

sklearn/ensemble/tests/test_weight_boosting.py (outdated)

sklearn/ensemble/tests/test_weight_boosting.py (outdated)

sklearn/ensemble/tests/test_weight_boosting.py (outdated)

jnothman reviewed Feb 18, 2019

sklearn/ensemble/tests/test_weight_boosting.py (outdated)

chkoar added 12 commits February 21, 2019 16:18

Minimize validation of X

9332837

One more iteration

88118ab

Update docstrings

a52af10

Pass common tests

2f07c55

add back the removed tests

fe06a6c

Always convert to csr or csc

110249d

last changes

0a0a0c3

Fix tests

59d0944

Add _num_samples

ceae5fe

fix pep8

cb51fd6

fix test

7c728f9

Revert back docstrings and data_validation

2e9a045

chkoar force-pushed the minimize_validation_in_meta_estimators branch from 88cebc6 to 2e9a045 Compare February 21, 2019 17:35

chkoar added 3 commits February 21, 2019 19:39

pep8

5178572

Update whats_new/v0.21.rst

03adc18

pep8

a6ffe4b

jnothman reviewed Feb 23, 2019

sklearn/ensemble/tests/test_weight_boosting.py

chkoar added 2 commits February 23, 2019 15:06

Check prediction

daf7b7a

Check prediction again

d0c62f6

agramfort reviewed Feb 25, 2019

sklearn/ensemble/tests/test_weight_boosting.py (outdated)

chkoar added 2 commits February 25, 2019 17:32

Address agramfort comments

1713475

Fix data generation

3a07a37

agramfort reviewed Feb 25, 2019

sklearn/ensemble/weight_boosting.py (outdated)

fix validation [ci skip]

bdd6a24

Change strategy for DummyClassifier

1efc196

agramfort reviewed Feb 26, 2019

sklearn/ensemble/weight_boosting.py (outdated)

User relative imports

e5ccc53

jnothman reviewed Feb 26, 2019

fix whats_new

e6bf9ab

agramfort changed the title ~~Minimize the validation of X in adaboost~~ [MRG+1] Minimize the validation of X in adaboost Feb 27, 2019

jnothman approved these changes Feb 27, 2019

GaelVaroquaux approved these changes Feb 27, 2019

GaelVaroquaux changed the title ~~[MRG+1] Minimize the validation of X in adaboost~~ [MRG+2] Minimize the validation of X in adaboost Feb 27, 2019

Address jnothman comment

5f9fc61

Remove white space

1a2e896

jnothman merged commit 9d21197 into scikit-learn:master Feb 28, 2019

xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019

ENH Minimize the validation of X in adaboost (scikit-learn#13174)

cc6fbf4

xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019

Revert "ENH Minimize the validation of X in adaboost (scikit-learn#13174)"

22584a1

This reverts commit cc6fbf4.

xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019

Revert "ENH Minimize the validation of X in adaboost (scikit-learn#13174)"

513b9ef

This reverts commit cc6fbf4.

koenvandevelde pushed a commit to koenvandevelde/scikit-learn that referenced this pull request Jul 12, 2019

ENH Minimize the validation of X in adaboost (scikit-learn#13174)

d258bf2

amueller mentioned this pull request Aug 7, 2019

[WIP] minimize validation of X in adaboost #8304

Closed

cmarmo mentioned this pull request Jun 9, 2020

[WIP] Minimize validation of X in ensembles with a base estimator #12072

Closed
