
Bandits regressors for model selection (new PR to use Github CI/CD) #397

Merged
merged 30 commits on Jan 4, 2021

Conversation

etiennekintzler
Contributor

Description

The PR introduces bandits (epsilon-greedy and UCB) for model selection (see issue #270). It covers only regressors, but I can add classifiers in a subsequent PR.

Using the classes is straightforward:

bandit = UCBRegressor(models=models, metric=metrics.MSE())

for x, y in data.take(N):
    y_pred = bandit.predict_one(x=x)
    bandit.learn_one(x=x, y=y)

best_model = bandit.best_model

There are convenience methods such as:

  • percentage_pulled: the percentage of times each arm was pulled
  • best_model: returns the model with the highest average reward

I also added an add_models method so the user can add models on the fly.

I am also working on a notebook that studies the behavior of the bandits for model selection. The notebook also includes Exp3, which seems promising but has numerical stability issues and yields counter-intuitive results (see section 3 of the NB). That's why I kept it out of this PR. More generally, UCB and epsilon-greedy perform rather well, but there seems to be some variance in their performance.

Improvements

It's still a WIP on the following points:

  • docstrings: mainly adding examples + cleaning
  • some comments might be removed
  • the names of the classes and methods are open to change

@etiennekintzler
Contributor Author

As in #391 :

The test check_shuffle_features_no_impact from module estimator_checks fails. The reason is not because of the shuffling of the features per se but because the bandit learning process is not deterministic.

To put it differently, if there are two arms (A1, A2) and two bandits (B1, B2), and bandit B1 pulls arm A1 while bandit B2 pulls arm A2 at round 0, they won't output the same prediction at round 1 because their internals (in particular _average_reward) will differ.
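A toy illustration of that divergence (purely illustrative values, not the PR's actual code), assuming each bandit tracks an _average_reward per arm:

# Two identical bandits, each tracking an average reward per arm
average_reward_b1 = [0.0, 0.0]
average_reward_b2 = [0.0, 0.0]

# Round 0: suppose B1 happens to pull arm A1 while B2 happens to pull arm A2
average_reward_b1[0] = 0.8  # B1 only updated its estimate for A1
average_reward_b2[1] = 0.3  # B2 only updated its estimate for A2

# Round 1: the greedy arm, and hence the model used to predict, now differs
greedy_b1 = average_reward_b1.index(max(average_reward_b1))  # 0 -> A1
greedy_b2 = average_reward_b2.index(max(average_reward_b2))  # 1 -> A2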

@smastelini
Member

The test check_shuffle_features_no_impact from module estimator_checks fails. The reason is not because of the shuffling of the features per se but because the bandit learning process is not deterministic.

Hi @etiennekintzler. Some while ago I had to set the seed parameter of some methods in order to make them pass the tests. The default seed parameter must be added in the _default_params method. Please, take a look at that as an example.

@MaxHalford
Member

Indeed @etiennekintzler, as @smastelini is saying, you need to seed the randomized parts of your code. For instance, instead of calling random.random(), first initialize a random number generator in your __init__:

self._rng = random.Random(seed)

where seed is a parameter provided by the user that defaults to None.

You can then do self._rng.random().
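A minimal sketch of this pattern (class and method names here are illustrative, not the PR's actual implementation):

import random


class SeededBandit:

    def __init__(self, models, seed=None):
        self.models = models
        self.seed = seed  # stored so it can be exposed as a default parameter for cloning
        self._rng = random.Random(seed)  # private generator used instead of the random module

    def _pull_random_arm(self):
        # Every random decision goes through self._rng, so two instances built
        # with the same seed pull the same sequence of arms.
        return self._rng.randrange(len(self.models))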

@etiennekintzler
Contributor Author

etiennekintzler commented Nov 26, 2020

Indeed @etiennekintzler, as @smastelini is saying, you need to seed the randomized parts of your code. For instance, instead of calling random.random(), first initialize a random number generator in your __init__:

self._rng = random.Random(seed)

where seed is a parameter provided by the user that defaults to None.

You can then do self._rng.random()

@MaxHalford I misunderstood when you first talked about this in the first PR. I thought you meant fixing a different seed for each bandit (model and shuffled, as they are named in the test), which would not solve the problem. You are talking about setting the seed in _default_params so the two bandits will pull the same arm?

The test check_shuffle_features_no_impact from module estimator_checks fails. The reason is not because of the shuffling of the features per se but because the bandit learning process is not deterministic.

Hi @etiennekintzler. Some while ago I had to set the seed parameter of some methods in order to make them pass the tests. The default seed parameter must be added in the _default_params method. Please, take a look at that as an example.

Hello @smastelini, I checked your code, I think I got it now, thank you 👍

Also @MaxHalford, not related to this, but I think the CI was cancelled because of RAM usage (it happens when I run the tests on my computer); you could try to provision more RAM and see if the problem persists.

@codecov-io

codecov-io commented Nov 29, 2020

Codecov Report

Merging #397 (8d5c0a1) into master (a9bfae0) will decrease coverage by 0.63%.
The diff coverage is 80.15%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #397      +/-   ##
==========================================
- Coverage   85.38%   84.75%   -0.64%     
==========================================
  Files         276      276              
  Lines       13352    13626     +274     
==========================================
+ Hits        11401    11549     +148     
- Misses       1951     2077     +126     
Impacted Files Coverage Δ
river/base/ensemble.py 100.00% <ø> (+10.00%) ⬆️
river/datasets/bikes.py 85.71% <ø> (ø)
river/datasets/elec2.py 85.71% <ø> (ø)
river/datasets/higgs.py 75.00% <ø> (ø)
river/datasets/http.py 75.00% <0.00%> (ø)
river/datasets/malicious_url.py 33.33% <0.00%> (ø)
river/datasets/movielens100k.py 85.71% <ø> (ø)
river/datasets/music.py 100.00% <ø> (ø)
river/datasets/phishing.py 100.00% <ø> (ø)
river/datasets/restaurants.py 87.50% <ø> (ø)
... and 268 more

Continue to review full report at Codecov.

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data

Member

@MaxHalford MaxHalford left a comment

Looking good!

@etiennekintzler
Contributor Author

Looking good!

Thanks :)

Regarding the example in the docstring, which dataset in river would you recommend for model selection?

@MaxHalford
Member

Not sure, but something like logistic regression / Hoeffding tree / GaussianNB sounds good.

@etiennekintzler
Contributor Author

etiennekintzler commented Dec 1, 2020

You are talking about methods, no? (I was talking about datasets.)

Regarding methods, I would have liked to use online PCA (to select the number of components), but it doesn't seem to exist yet in river (I just saw it in #3).

@MaxHalford
Member

Lol my bad: use Phishing for binary classification, ImageSegments for multi-class, and TrumpApproval for regression :). Indeed, we haven't implemented online PCA yet :)

@etiennekintzler
Contributor Author

Hello @MaxHalford !

Would it help to have a "warm-up" period where each model is updated regardless of its cumulative reward?

Yes, I did; this is what I meant when I was talking about the "explore each arm first" strategy in my previous message.

After thinking about and tinkering with the reward system, I found an alternative that seems to work well: getting rid of the online scaling for the reward and using strong discounting for the first n rewards.

More specifically (see the sketch after this list):

  1. a fixed scaling function, such as a sigmoid, for the reward instead of the online scaler.
  2. an online scaler to standardize y. This allows the sigmoid to be used for problems with different scales of y (and thus of the MSE/MAE).
  3. a discounting factor for the first rewards, such as exponential decay or simply setting the reward to 0 for the first n iterations. The goal is to let the y scaler and the models learn without skewing the average reward going forward. It sounds a bit like a warm-up/explore-first strategy, but without messing with the average reward.
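A rough sketch of how these three pieces could fit together (the function name, warm-up length, and exact squashing are illustrative assumptions, not the PR's actual code):

import math

def scaled_reward(y_true, y_pred, n_iter, warmup=100):
    # 3. Discount the first rewards: here they are simply set to 0, so the y scaler
    #    and the models can learn without skewing the average reward later on.
    if n_iter < warmup:
        return 0.0
    # Squared error on the target, which is assumed to already be standardized (point 2);
    # clamped to avoid overflow in math.exp below.
    error = min((y_true - y_pred) ** 2, 500.0)
    # 1. Fixed sigmoid-like squashing instead of an online scaler: maps an error
    #    in [0, inf) to a reward in (0, 1], where 1 means a perfect prediction.
    return 2.0 / (1.0 + math.exp(error))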

Other benefits of abandoning the online scaler for the reward are:

  • it simplifies the choices the user has to make
  • it avoids the risk that a model with a really bad reward skews the whole scaling

However, the main drawback is that some bandit models (like UCB) make hypotheses about the distribution of the reward (e.g. sub-Gaussian), which might differ a bit from what we obtain using the sigmoid.

Member

@MaxHalford MaxHalford left a comment

Looking good!

One important thing: since you've started this PR we've added more thorough code style rules. Essentially you'll want to run pre-commit install --hook-type pre-push on your work station. This will run black + flake8 before you git push. You can fix the code style for black by running black river --config .black.

@etiennekintzler
Contributor Author

etiennekintzler commented Dec 21, 2020

One important thing: since you've started this PR we've added more thorough code style rules. Essentially you'll want to run pre-commit install --hook-type pre-push on your work station. This will run black + flake8 before you git push. You can fix the code style for black by running black river --config .black.

I had done it (as suggested in CONTRIBUTING.md) but got an error when pushing from local.

Thinking the issue was on my side, I pushed directly from GitHub (as 7bea5bb was just a name change) and got the same error as the one I had locally, that is:

would reformat /home/runner/work/river/river/river/expert/bandit.py
Oh no! 💥 💔 💥
1 file would be reformatted, 311 files would be left unchanged.

flake8...................................................................Failed
- hook id: flake8
- exit code: 1

river/utils/math.py:335:12: E741 ambiguous variable name 'l'
river/expert/bandit.py:143:5: E303 too many blank lines (2)

Error: Process completed with exit code 1.

Does it mean that E741 and E303 are blocking and that I have to resolve them myself?

Also, I don't know what to do with the DocTestFailure that happens because the docstring's Example should have the output of the command as well (when it's not stored in a variable). For example, I can't use:

>>> for x, y in dataset:
...    bandit.learn_one(x=x, y=y)

since learn_one returns self. Of course, the result could be stored in a variable (I guess that's the reason why it's written like this in the README.md), but it looks a bit awkward to me (and misleading for the user).

@MaxHalford
Member

MaxHalford commented Dec 21, 2020

Does it mean that E741 and E303 are blocking and that I have to resolve them myself?

Yep those are flake8 errors that you have to correct.

It seems that you also have some black errors, which you can correct by running black river --config .black. I advise installing the dev version of black before doing this: pip install git+git://github.com/psf/black --upgrade.

Also, I don't know what to do with the DocTestFailure that happens because the docstring's Example should have the output of the command as well (when it's not stored in a variable)

Alas you have to assign the output to a variable. Typically I would write bandit = bandit.learn_one(x, y). Now that I think about it, we could consider removing the return self from each fit_one method altogether as I don't see it bringing any value.

@etiennekintzler
Contributor Author

Does it mean that E741 and E303 are blocking and that I have to resolve them myself?

Yep those are flake8 errors that you have to correct.

Ok !

It seems that you also have some black errors, which you can correct by running black river --config .black. I advise installing the dev version of black before doing this: pip install git+git://github.com/psf/black --upgrade.

Thank you :) it works well after the update.

Also, I don't know what to do with the DocTestFailure that happens because the docstring's Example should have the output of the command as well (when it's not stored in a variable)

Alas you have to assign the output to a variable. Typically I would write bandit = bandit.learn_one(x, y). Now that I think about it, we could consider removing the return self from each fit_one method altogether as I don't see it bringing any value.

Yep, or you could just remove the related test (which runs for every base.Regressor, I guess).

@MaxHalford
Member

It's a bit more complicated than that because we need to update every example.

@etiennekintzler
Contributor Author

Oh ok, I see.

Should I set it to

>>> for x, y in dataset:
...    bandit = bandit.learn_one(x=x, y=y)

and keep return self in learn_one, or wait for the change?

@MaxHalford
Member

I wouldn't wait for the changes, it'll take some time :)

@etiennekintzler changed the title from "[WIP] Bandits regressors for model selection (new PR to use Github CI/CD)" to "Bandits regressors for model selection (new PR to use Github CI/CD)" on Dec 24, 2020
@etiennekintzler
Contributor Author

Hello @MaxHalford !

There are tests not related to this PR that are failing.

They are located in river/compat/test_sklearn.py.

Also, there is another test that results in an internal error (locally as well as in the CI/CD pipeline). It happens right after river/metrics/test_.py::test_pickling[_RollingRegressionReport] (see the CI/CD logs); the error is:

warnings._OptionError: invalid module name: 'sklearn.metrics.classification'

@MaxHalford
Member

Hey mate!

I fixed the tests :). It was all down to the new release of scikit-learn (0.24).

@etiennekintzler
Contributor Author

Great ! Can you push it ? :D

@etiennekintzler
Contributor Author

Cool, all tests pass now!

I've nothing else to add for now, so you can review and merge if it looks good to you :)

The main change since your last review is the argmax function defined directly in river/bandits.py. A standard argmax tends to favour pulling the first model, since argmax([0, 0, 0, 0]) always returns 0; this argmax pulls a random arm in that case (see the sketch below). When there is one and only one maximum, both functions are of course the same.
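A minimal sketch of such a tie-breaking argmax (illustrative, not the PR's exact implementation):

import random

def argmax_with_random_ties(values, rng=random):
    # Gather every index reaching the maximum and pick one of them at random.
    # When the maximum is unique, this behaves exactly like the standard argmax.
    best = max(values)
    candidates = [i for i, v in enumerate(values) if v == best]
    return rng.choice(candidates)

argmax_with_random_ties([0, 0, 0, 0])  # uniform over the four arms instead of always arm 0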

Also, I don't know how you want to frame it, but I think it could be good to mark this bandit class as experimental, since this implementation doesn't really stem from theory (hence the importance of having researchers' input on this issue).

Member

@MaxHalford MaxHalford left a comment

Looks really good IMO! Ideally, I would like it if you could add some more comments to the internal functions. Maybe adding a docstring to the Bandit class would help. You could quickly describe the purpose of each function and how they work together. Not sure if I'm clear :). I just want newcomers to be able to grok how we're framing this. Then again, the code is really clear.


# Predict and learn with the chosen model
chosen_model = self[chosen_arm]
y_pred = chosen_model.predict_one(x)
Member

This might be predict_proba for a classifier right?

Contributor Author

Yes, you're right, I didn't anticipate that. Also, for classifiers the whole scaling thing is less of a problem (since the target is in {0, 1}).

@MaxHalford
Member

I guess we can merge now, and take care of the classification aspect in another PR?

@etiennekintzler
Contributor Author

Looks really good IMO! Ideally, I would like it if you could add some more comments to the internal functions. Maybe adding a docstring to the Bandit class would help. You could quickly describe the purpose of each function and how they work together. Not sure if I'm clear :). I just want newcomers to be able to grok how we're framing this. Then again, the code is really clear.

Thank you very much! I added some docstrings.

I guess we can merge now, and take care of the classification aspect in another PR?

Yes, I think you can merge 👌!

The classification aspect in _compute_scaled_reward can be treated when introducing classifiers. Before classifiers, I'll do a PR with the Exp3 regressor, which is straightforward to implement and offers good results (see this WIP notebook).

@MaxHalford MaxHalford merged commit 1853cec into online-ml:master Jan 4, 2021