Bandits regressors for model selection (new PR to use Github CI/CD) #397
Conversation
As in #391: to put it differently, if there are two arms (A1, A2) and two bandits (B1, B2), and bandit B1 pulls arm A1 while bandit B2 pulls arm A2 at round 0, they won't output the same prediction at round 1 because their internals (in particular …) will differ.
Hi @etiennekintzler. A while ago I had to set the …
Indeed @etiennekintzler, as @smastelini is saying, you need to seed the randomized parts of your code. For instance, instead of calling the module-level random functions directly, you can set self._rng = random.Random(seed), where seed is a parameter of the class. You can then draw every random number through self._rng.
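To make the seeding pattern concrete, here is a minimal sketch; the class and attribute names are illustrative, not the exact ones from this PR:

```python
import random


class EpsilonGreedy:
    """Illustrative only: class and attribute names are assumptions."""

    def __init__(self, epsilon: float = 0.1, seed: int = None):
        self.epsilon = epsilon
        self.seed = seed
        # Seed the randomized parts once; two instances built with the same
        # seed will then behave identically.
        self._rng = random.Random(seed)

    def pull(self, n_arms: int) -> int:
        # Every draw goes through self._rng, never through the module-level
        # random functions.
        if self._rng.random() < self.epsilon:
            return self._rng.randrange(n_arms)  # explore
        return 0  # placeholder for the greedy (exploit) choice
```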
@MaxHalford I misunderstood when you first talked about this in the 1st PR. I thought you meant fixing a different seed for each bandit (for …).
Hello @smastelini, I checked your code and I think I got it now, thank you 👍 Also @MaxHalford, not related to this, but I think the CI was cancelled because of RAM usage (it happens when I run the tests on my computer); you could try to provision more RAM and see if the problem persists.
Codecov Report
@@ Coverage Diff @@
## master #397 +/- ##
==========================================
- Coverage 85.38% 84.75% -0.64%
==========================================
Files 276 276
Lines 13352 13626 +274
==========================================
+ Hits 11401 11549 +148
- Misses 1951 2077 +126
Continue to review full report at Codecov.
Looking good!
Thanks :) Regarding the example in the docstring, which dataset in river should I use?
Not sure, but something like logistic regression / Hoeffding tree / GaussianNB sounds good.
You are talking about methods, no? (I was talking about datasets.) Regarding methods, I would have liked to use online PCA (to do selection on the number of components), but it doesn't seem to exist yet in river (I just saw it on #3).
Lol my bad: use Phishing for binary classification, ImageSegments for multi-class, and TrumpApproval for regression :). Indeed, we haven't implemented online PCA yet :)
Hello @MaxHalford!
Yes I did, this is what I meant when I was talking about the "explore each arm first" strategy in my previous message. After thinking and tinkering with the rewarding system, I found an alternative that seems to work well, which is to get rid of the online scaling for the reward and to use strong discounting for the first n rewards. More specifically:
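The exact scheme isn't reproduced in this thread, but a rough sketch of the idea (down-weighting the first n rewards so early, noisy pulls don't dominate the running averages) could look like this, with the function name and the linear ramp both being assumptions:

```python
def discounted_reward(raw_reward: float, t: int, warmup: int = 100) -> float:
    """Down-weight rewards observed during the first `warmup` rounds.

    `t` is the current round. The linear ramp below is only one possible
    form of "strong discounting"; the PR may use a different one.
    """
    if t < warmup:
        return raw_reward * (t / warmup)
    return raw_reward
```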
The other benefits of abandoning the online scaler for the reward are: …
However, the main drawback is that some bandit models (like UCB) make hypotheses about the distribution of the reward (e.g. sub-Gaussian), which might differ somewhat from what we obtain using the sigmoid.
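For reference, the kind of index those hypotheses underpin is the textbook UCB1 rule, sketched below; this is not necessarily the exact variant implemented in the PR:

```python
import math


def ucb_index(avg_reward: float, n_pulls: int, t: int, delta: float = 2.0) -> float:
    # Classic UCB1-style index: mean reward plus an exploration bonus that
    # shrinks as the arm gets pulled more often. The theory behind the bonus
    # assumes bounded / sub-Gaussian rewards, which is why reshaping rewards
    # with a sigmoid can change the algorithm's behavior.
    if n_pulls == 0:
        return float("inf")  # force every arm to be tried at least once
    return avg_reward + math.sqrt(delta * math.log(t) / n_pulls)
```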
…le to avoid DocTestFailure
…t_after in __init__
Looking good!
One important thing: since you started this PR we've added more thorough code style rules. Essentially, you'll want to run pre-commit install --hook-type pre-push on your workstation. This will run black + flake8 before you git push. You can fix the code style for black by running black river --config .black.
I had done it (as suggested in CONTRIBUTING.md) but got an error when pushing from local. Thinking the issue was on my side, I pushed directly from GitHub (as 7bea5bb was just a name change) and got the same error as the one I had locally, that is:
would reformat /home/runner/work/river/river/river/expert/bandit.py
Oh no! 💥 💔 💥
1 file would be reformatted, 311 files would be left unchanged.
flake8...................................................................Failed
- hook id: flake8
- exit code: 1
river/utils/math.py:335:12: E741 ambiguous variable name 'l'
river/expert/bandit.py:143:5: E303 too many blank lines (2)
Error: Process completed with exit code 1.
Does this mean that E741 and E303 are blocking and that I have to resolve them myself? Also, I don't know what to do with
>>> for x, y in dataset:
...     bandit.learn_one(x=x, y=y)
since learn_one returns the model itself.
Yep, those are blocking. It seems that you also have some …
Alas, you have to assign the output to a variable. Typically I would write bandit = bandit.learn_one(x=x, y=y).
Ok!
Thank you :) It works well after the update.
Yep, or you could just remove the related test (that will run for every …).
It's a bit more complicated than that because we need to update every example.
Oh ok, I see. Should I set it to
>>> for x, y in dataset:
...     bandit = bandit.learn_one(x=x, y=y)
and keep …?
I wouldn't wait for the changes, it'll take some time :)
…_iter extra line; make average_reward 'public'
Hello @MaxHalford! There are tests not related to this PR that are failing. They are located in … Also, there is another test that results in an internal error (locally as well as in the CI/CD pipeline). It happens right after the warning:
_OptionError: invalid module name: 'sklearn.metrics.classification'
Hey mate! I fixed the tests :). It was all down to the new release of scikit-learn (0.24).
Great! Can you push it? :D
Cool, all tests pass now! I've nothing else to add for now, so you can review and merge if it looks good to you :) The main change since your last review is the … Also, I don't know how you want to frame it, but I think it could be good to mark this bandit class as experimental, since this implementation doesn't really stem from theory (hence the importance of having researchers' input on this issue).
Looks really good IMO! Ideally, I would like it if you could add some more comments to the internal functions. Maybe adding a docstring to the Bandit class would help. You could quickly describe the purpose of each function and how they work together. Not sure if I'm clear :). I just want newcomers to be able to grok how we're framing this. Then again, the code is really clear.
# Predict and learn with the chosen model
chosen_model = self[chosen_arm]
y_pred = chosen_model.predict_one(x)
This might be predict_proba for a classifier, right?
Yes, you're right, I didn't anticipate that. Also, for classifiers the whole scaling thing is less of a problem (since the target is {0, 1}).
I guess we can merge now and take care of the classification aspect in another PR?
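A possible shape for that follow-up, sketched under the assumption that the dispatch happens where the chosen model predicts (base.Classifier and predict_proba_one exist in river; the helper itself is hypothetical):

```python
from river import base


def _predict(chosen_model, x):
    # For classifiers we may want class probabilities rather than a hard
    # label, e.g. to build a smoother reward. This dispatch is illustrative;
    # the PR deliberately leaves the classification case for later.
    if isinstance(chosen_model, base.Classifier):
        return chosen_model.predict_proba_one(x)
    return chosen_model.predict_one(x)
```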
Thank you very much! I added some docstrings.
Yes, I think you can merge 👌! The classification aspect in …
Description
The PR introduces bandits (epsilon-greedy and UCB) for model selection (see issue #270). The PR concerns only regressors, but I can add classifiers in a subsequent PR.
The use of the classes is straightforward:
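For instance (a sketch assuming the epsilon-greedy wrapper is exposed as expert.EpsilonGreedyRegressor with models, epsilon, and seed parameters, as suggested by the PR title and discussion):

```python
from river import datasets, expert, linear_model, optim, preprocessing

# Candidate models: the same pipeline with different learning rates.
models = [
    preprocessing.StandardScaler()
    | linear_model.LinearRegression(optimizer=optim.SGD(lr=lr))
    for lr in (0.0001, 0.001, 0.01, 0.1)
]

bandit = expert.EpsilonGreedyRegressor(models=models, epsilon=0.1, seed=42)

for x, y in datasets.TrumpApproval():
    y_pred = bandit.predict_one(x)
    bandit = bandit.learn_one(x=x, y=y)  # learn_one returns the model
```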
There are convenience methods such as:
- percentage_pulled: to get the percentage each arm was pulled
- best_model: to return the model with the highest average reward

Also, I added a method add_models so the user can add models on the fly (see the sketch below for how these might be used).

I am also working on a notebook that studies the behavior of the bandits for model selection. The notebook also includes Exp3, which seems promising but has numerical stability issues and yields counter-intuitive results (see section 3 of the notebook). That's why I kept it out of this PR. More generally, the performance of UCB and epsilon-greedy is rather good, but there seems to be some variance in it.
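Continuing the sketch above, the convenience methods listed here might be used like so (names taken from their descriptions; exact signatures are assumptions):

```python
# Share of pulls per arm (assumed to be a property).
print(bandit.percentage_pulled)

# Model with the highest average reward so far.
print(bandit.best_model)

# Add extra candidate models on the fly.
bandit.add_models([preprocessing.StandardScaler() | linear_model.LinearRegression()])
```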
Improvements
It's still WIP on the following points: