TST Replace boston in histgradboost test_predictor #16918
Conversation
assert r2_score(y_train, predictor.predict(X_train)) > 0.69
assert r2_score(y_test, predictor.predict(X_test)) > 0.30
Aren't these rather low? Same with the other PR you have; maybe using another dataset would be more reasonable?
Yes, that is the problem with the diabetes dataset. I have changed to california housing and it seems to work reasonably well with the original bins.
- train: (bins=200; 0.8233) (bins=256; 0.8340)
- test: (bins=200; 0.8112) (bins=256; 0.8094)
The downside of using fetch_california_housing is that it requires network access, which means we would need to mark these tests with @pytest.mark.network.
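For illustration, here is a minimal sketch of what marking such a test could look like, assuming scikit-learn's test configuration registers a custom "network" marker (the test name and body below are hypothetical):

```python
import pytest

# Hedged sketch: scikit-learn's test setup is assumed to register a custom
# "network" marker so that CI runs without internet access can deselect
# tests that download data. The test below is illustrative only.
@pytest.mark.network
def test_predictor_on_california_housing():
    # fetch_california_housing() would download the dataset here on first use.
    pass
```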
With some parameter tuning on the diabetes dataset I can get these results:
n_bins=50
train: 0.4253613178731953
test: 0.38498296812822475
n_bins=100
train: 0.4298426536827863
test: 0.3991035630532065
Parameters:
min_samples_leaf = 50
max_leaf_nodes = None
@thomasjpfan and @adrinjalali which dataset do you guys suggest to use?
Would a make_regression with some tuned parameters not be a good option?
Good point, I'll try this tomorrow.
+1 for using make_regression
Thanks @ogrisel and @adrinjalali. I've amended to make_regression
Thanks @lucyleeow. This LGTM
LGTM thank you @lucyleeow !
Reference Issues/PRs
Towards #16155
What does this implement/fix? Explain your changes.
Replace the boston dataset with the ~~diabetes~~ California housing dataset in sklearn/ensemble/_hist_gradient_boosting/tests/test_predictor.py
Any other comments?
Unsure of the best n_bins values / what this is testing. I noticed that the boston features are more spread out and generally have a longer right tail cf. the diabetes dataset. Also, the R2 values with n_bins 200 and 256 with diabetes were the same. The R2 values with California housing are: