[WIP] Bandits regressors for model selection #391

etiennekintzler · 2020-11-22T19:36:24Z

Description

The PR introduce bandits (epsilon-greedy and UCB) for model selection (see issue #270 ). The PR concerns only regressors, but I can add the classifiers in a subsequent PR.

The use of the classes are straightforward :

bandit = UCBRegressor(models=models, metric=metrics.MSE()),

for (x, y) in data.take(N):    
        y_pred = bandit.predict_one(x=x)
        bandit.learn_one(x=x, y=y)

best_model = bandit.best_model

There are convenience methods such as :

percentage_pulled : to get the percentage each arm was pulled
best_model : return the model with the highest average reward

Also I added a method add_models where the user can add models on the fly.

I am also working on a notebook that studies the behavior of the bandits for model selection. The notebook also include Exp3, which seems promising but has numerical stability issue and yields counter-intuitive results (see section 3 of the NB). That's why I kept it out of this PR. More generally, the performances of UCB and epsilon-greedy are rather good but there seems to be some variance in the performance.

Improvements

It's still WIP on the following points :

docstring, mainly add examples + cleaning.
typing
some comments might be removed.
certain functionalities I am not sure to include or not:
- Save the history of metric value (cf save_metrics parameter) in Bandits
the name of the classes and the methods are open for changes

I would appreciate inputs regarding the functionalities and the methods/classes naming.

etiennekintzler · 2020-11-23T08:28:33Z

river/expert/bandit.py

+    def __init__(self, epsilon=0.1, epsilon_decay=None, **kwargs):
+        super().__init__(**kwargs)


This part seems to make the tests fail (same part for UCBBandit). Should I enumerate all the parameters (that are in Bandit.__init__) instead of using **kwargs ?

Yes, you should :)

Actually I try with it but the tests do fail again. I see also that expert.SuccessiveHalving* are purposely exclude from the test (in get_all_estimators) while expert.EWARegressor is not. Is the kind of regressor I implemented supposed to pass this test at all ?

You're right, it's not supposed to pass the tests by default. You need implement a _default_params method, like what is done here. This method will get called here to initialize the state of your model.

I just add default LinearRegression in a Pipeline and tests pass. It somehow surprise me since I didn't gave default arg for metric and reward_scaler.

Not related to this PR but one test failed :

======================================================================== FAILURES ======================================================================== ___________________________________________________ [doctest] river.neighbors.sam_knn.SAMKNNClassifier ___________________________________________________ 052 >>> from river import metrics 053 >>> from river import neighbors 054 055 >>> dataset = synth.ConceptDriftStream(position=500, width=20, seed=1).take(1000) 056 057 >>> model = neighbors.SAMKNNClassifier(window_size=100) 058 059 >>> metric = metrics.Accuracy() 060 061 >>> evaluate.progressive_val_score(dataset, model, metric) Expected: Accuracy: 59.90% Got: Accuracy: 58.20% /path/to/package/river/neighbors/sam_knn.py:61: DocTestFailure

@smastelini any idea why the above test is failing?

I could not reproduce that failure in my local setup (in the master branch). Pinging @jacobmontiel, since he has more experience with this specific k-NN variant than I do.

FYU the master branch in my fork is 10 commits behind master in online-ml/river however these 10 commits are mostly doc so..

It passes on my local setup using pytest but it did fail on the CI pipeline (https://travis-ci.org/github/online-ml/river/jobs/745694837).

Very weird! Not sure what to say. Are you sure your development setup is correct?

@MaxHalford

My bad :/ I forget that I add the bandits classes in the ignored tuple in get_all_estimators.

I can now confirm that it fails as in the CI when I do not provide the arguments.

@smastelini any idea why the above test is failing?

I could not reproduce that failure in my local setup (in the master branch). Pinging @jacobmontiel, since he has more experience with this specific k-NN variant than I do.

I am confused, is this failing?

I clone and rebuild the whole package (following https://github.com/online-ml/river/blob/master/CONTRIBUTING.md#installation) and it fails at the same place:

======================================================================== FAILURES ======================================================================== ___________________________________________________ [doctest] river.neighbors.sam_knn.SAMKNNClassifier ___________________________________________________ 052 >>> from river import metrics 053 >>> from river import neighbors 054 055 >>> dataset = synth.ConceptDriftStream(position=500, width=20, seed=1).take(1000) 056 057 >>> model = neighbors.SAMKNNClassifier(window_size=100) 058 059 >>> metric = metrics.Accuracy() 060 061 >>> evaluate.progressive_val_score(dataset, model, metric) Expected: Accuracy: 59.90% Got: Accuracy: 58.20% /home/etienne/PROJETS/river_test/river/river/neighbors/sam_knn.py:61: DocTestFailure

I am using Python 3.7.7 and Ubuntu 20.04.1 LTS.

I can open an issue to keep it out of the PR

MaxHalford

Thanks for the PR! It's a good start, and I agree that starting with regression only is a good idea :). I've made a few comments to get the ball rolling. High five!

river/expert/__init__.py

river/expert/bandit.py

MaxHalford · 2020-11-23T17:22:50Z

river/expert/bandit.py

+    def __init__(self, epsilon=0.1, epsilon_decay=None, **kwargs):
+        super().__init__(**kwargs)


Yes, you should :)

river/expert/bandit.py

etiennekintzler · 2020-11-24T17:57:12Z

Thanks for the PR! It's a good start, and I agree that starting with regression only is a good idea :). I've made a few comments to get the ball rolling. High five!

Thanks, also thank you for this review !

I incorporate all your points and left some conversation unresolved (when your feedback is needed).

I will quickly test this new class since I introduce noticeable change by removing the numpy dependency. Also I need to finish writing docstring (Parameters and Example) then you can merge if it looks good to you. If you see other desirable change tell me too.

etiennekintzler · 2020-11-25T17:39:09Z

The test check_shuffle_features_no_impact from module estimator_checks fails. The reason is not because of the shuffling of the features per se but because the bandit learning process is not deterministic.

To put it differently, if there is two arms (A1, A2) and two bandits (B1, B2), and if the bandit B1 pull arm A1 and bandit B2 pull arm A2 at round 0, they won't output the same prediction at round 1 because their internals (in particular _average_reward) will differ.

MaxHalford · 2020-11-26T09:00:10Z

@etiennekintzler, can you close and reopen the pull request? We've switched to GitHub actions for CI, which should be much faster and more enjoyable than Travis :)

MaxHalford · 2020-11-26T17:56:40Z

@etiennekintzler: you need to a seed parameter and seed the random generation process to make it reproducible.

Regarding unit tests, I guess that for the moment you can rely CI and locally ignore that SamKNN test that isn't passing.

etiennekintzler added 3 commits November 15, 2020 13:27

[WIP] first layout for bandits classes

da7a823

improve docstring, rm class not in PR, clean some

13154a6

delete nb from pr, clean some

b0fefc6

etiennekintzler commented Nov 23, 2020

View reviewed changes

MaxHalford self-requested a review November 23, 2020 14:35

MaxHalford reviewed Nov 23, 2020

View reviewed changes

etiennekintzler added 7 commits November 23, 2020 18:45

enumerate all parameters in __init__, fix \epsilon

e128e61

align on convention: single quote and import

a50693f

rm print_every, change print_info->__repr__, skip line after class

22c7923

substitute stdlib for numpy

afe1cac

rm metrics tracing, add _learn_one for powerusers

3c25c93

forget to rm object tracing in class __init__ signature

860e779

add type to models, _default_params, use '+=' for list append

9fe14f1

etiennekintzler added 4 commits November 25, 2020 09:46

change parameters in _default_params

f0b1b6c

intercept_lr instead of lr in LinearRegression

9cb720c

mv argmax to utils.math

d3169dc

fix mistake: EpsilonGreedyRessor didnt inherit from base.Regressor

c7b20ca

etiennekintzler closed this Nov 26, 2020

etiennekintzler mentioned this pull request Nov 26, 2020

Bandits regressors for model selection (new PR to use Github CI/CD) #397

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[WIP] Bandits regressors for model selection #391

[WIP] Bandits regressors for model selection #391

etiennekintzler commented Nov 22, 2020

etiennekintzler Nov 23, 2020

MaxHalford Nov 23, 2020

etiennekintzler Nov 23, 2020

MaxHalford Nov 23, 2020

etiennekintzler Nov 24, 2020 •

edited

etiennekintzler Nov 25, 2020

MaxHalford Nov 25, 2020

etiennekintzler Nov 25, 2020

jacobmontiel Nov 26, 2020

etiennekintzler Nov 26, 2020

MaxHalford left a comment

MaxHalford Nov 23, 2020

etiennekintzler commented Nov 24, 2020

etiennekintzler commented Nov 25, 2020

MaxHalford commented Nov 26, 2020

MaxHalford commented Nov 26, 2020 •

edited

		def __init__(self, epsilon=0.1, epsilon_decay=None, **kwargs):
		super().__init__(**kwargs)

[WIP] Bandits regressors for model selection #391

[WIP] Bandits regressors for model selection #391

Conversation

etiennekintzler commented Nov 22, 2020

Description

Improvements

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

etiennekintzler Nov 24, 2020 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

MaxHalford left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

etiennekintzler commented Nov 24, 2020

etiennekintzler commented Nov 25, 2020

MaxHalford commented Nov 26, 2020

MaxHalford commented Nov 26, 2020 • edited

etiennekintzler Nov 24, 2020 •

edited

MaxHalford commented Nov 26, 2020 •

edited