Add Stable-Baselines3 RL Example. #1420
Conversation
EDIT: I installed
First of all, thank you so much for doing this awesome work!!!
I left a couple of comments but most of them are cosmetic.
So, before resolving them, let @hvy review this and make sure we are on the same page. 😃
examples/rl/sb3_simple.py
Outdated
try:
    study.optimize(objective, n_trials=N_TRIALS, n_jobs=N_JOBS)
except KeyboardInterrupt:
    pass
Is it possible to use timeout instead of KeyboardInterrupt?
We can have both, no?
This is more intended to create the report even when the user kills the optimization early.
The timeout would be more for the tests, no?
You are absolutely right, and your use of KeyboardInterrupt is really cool.
That being said, it's a bit embarrassing to ask, but could you also set timeout for faster CI runs, like this?
optuna/examples/pytorch_simple.py
Line 136 in e249aa8
study.optimize(objective, n_trials=100, timeout=600)
Co-authored-by: Masaki Kozuki <masaki.kozuki.2014@gmail.com>
Thanks for your comments; I added your suggestions. In the past there was a CI stage for examples, but apparently it was removed recently.
Good catch! Currently, example runs are daily and not checked in the PR's CI:
optuna/.github/workflows/examples.yml
Lines 3 to 5 in e249aa8
So, I think what we need to do is to add your example to:
optuna/.github/workflows/examples.yml
Lines 38 to 39 in e249aa8
Thanks for your quick action. Took a quick skim through your code and it basically LGTM.
examples/rl/sb3_simple.py
Outdated
self.eval_idx += 1
self.trial.report(self.last_mean_reward, self.eval_idx)
# Prune trial if needed
if self.trial.should_prune(self.eval_idx):
As described here, the step argument has been deprecated for a while and has now been removed. The logic remains unchanged in this case, so let's simply omit it.
- if self.trial.should_prune(self.eval_idx):
+ if self.trial.should_prune():
Thanks, I think I wrote that code a year and a half ago... so things have changed a bit ;)
examples/rl/sb3_simple.py
Outdated
# Sometimes, random hyperparams can generate NaN.
# Prune hyperparams that generate NaNs.
print(e)
raise optuna.exceptions.TrialPruned()
Just a tip, but if applicable you can also, after cleaning up, return a float('nan') from the objective function instead of treating it as a pruned trial. Optuna will treat that trial as a failed trial (https://github.com/optuna/optuna/blob/master/optuna/study.py#L737), and samplers/pruners in Optuna will know how to handle it.
Thanks for the swift action! I have one suggestion around the suggest_* methods.
Co-authored-by: Hideaki Imamura <38826298+HideakiImamura@users.noreply.github.com>
Is there an easy way to display the user attributes for each trial in the terminal? Because for an RL researcher, it's not very intuitive to see "gamma=0.002" in the terminal...
Thanks for the swift action! LGTM except for one minor comment!
IMO, the current visualization of the optimized parameters looks good. If we have a sufficient time budget, it would be a good idea to run the training again with the optimized parameters and evaluate the performance of the parameters.
Co-authored-by: Hideaki Imamura <38826298+HideakiImamura@users.noreply.github.com>
Could you merge the master branch? It will resolve the CI failure.
Thanks for your great effort! LGTM!
Sorry for the late review, and again thanks for your effort. LGTM!
I did not check every detail with stable_baselines3, but the usage of Optuna seems good, and I also verified the example locally.
Note: the only thing missing now is to deactivate tests for Python < 3.6 (I don't know where that should be changed).
Motivation
closes #1314
Description of the changes
Add a hyperparameter tuning example in a reinforcement learning context using Stable-Baselines3.