
Add Stable-Baselines3 RL Example. #1420

Merged
merged 15 commits into from Jul 13, 2020

Conversation

araffin (Contributor) commented Jun 24, 2020

Note: the only thing missing now is to deactivate tests for python < 3.6 (I don't know where that should be changed)

Motivation

closes #1314

Description of the changes

Add a hyperparameter tuning example in a reinforcement learning context using Stable-Baselines3.

araffin (Contributor, Author) commented Jun 24, 2020

@HideakiImamura where is the flake8 config to check the style locally? I've been using `flake8 .` at the root of the project, but it does not show the errors that CircleCI shows...

EDIT: I installed `hacking` and was able to fix the issues

@hvy hvy self-assigned this Jun 25, 2020
crcrpar (Contributor) left a comment

First of all, thank you so much for doing this awesome work!!!

I left a couple of comments but most of them are cosmetic.
So, before resolving them, let @hvy review this and make sure we are on the same page. 😃

examples/rl/sb3_simple.py (four resolved review threads)
Comment on lines 151 to 154:

    try:
        study.optimize(objective, n_trials=N_TRIALS, n_jobs=N_JOBS)
    except KeyboardInterrupt:
        pass
Contributor:

Is it possible to use timeout instead of KeyboardInterrupt?

araffin (Contributor, Author):

We can have both, no?
This is more intended to create the report even when the user kills the optimization early.
The timeout would be more for the tests, no?

Contributor:

You are absolutely right, and your use of `KeyboardInterrupt` is really cool.

That said (it's a bit embarrassing to ask), could you also set `timeout` for faster CI runs, like this?

    study.optimize(objective, n_trials=100, timeout=600)

examples/rl/sb3_simple.py (three resolved review threads)
araffin and others added 2 commits June 26, 2020 10:33
Co-authored-by: Masaki Kozuki <masaki.kozuki.2014@gmail.com>
araffin (Contributor, Author) commented Jun 26, 2020

Thanks for your comments, I added your suggestions.
One last thing that I did not find: how do you add that example to CI tests?

In the past there was such a stage, but it was apparently removed recently.

crcrpar (Contributor) commented Jun 26, 2020

> One last thing that I did not find: how do you add that example to CI tests?
>
> In the past, there was a stage but it was removed recently apparently.

Good catch! Currently, example runs are daily and not checked in PR's CI.

    on:
      schedule:
        - cron: '0 15 * * *'

So, I think what we need to do is add your example to `IGNORES`.

    if [ ${{ matrix.python-version }} = 3.5 ]; then
      IGNORES='chainermn_.*|pytorch_lightning.*|fastai_.*|allennlp_.*'

hvy (Member) left a comment

Thanks for your quick action. Took a quick skim through your code and it basically LGTM.

    self.eval_idx += 1
    self.trial.report(self.last_mean_reward, self.eval_idx)
    # Prune trial if needed
    if self.trial.should_prune(self.eval_idx):
Member:

As described here, the `step` argument has been deprecated for a while and has now been removed. The logic should remain unchanged in this case, so let's simply omit it.

Suggested change:

    - if self.trial.should_prune(self.eval_idx):
    + if self.trial.should_prune():

araffin (Contributor, Author):

Thanks, I think I wrote that code a year and a half ago... so things have changed a bit ;)

    # Sometimes, random hyperparams can generate NaN
    # Prune hyperparams that generate NaNs
    print(e)
    raise optuna.exceptions.TrialPruned()
Member:

Just a tip, but if applicable you can also, after cleaning up, return a `float('nan')` from the objective function instead of treating it as a pruned trial. Optuna will treat that trial as a failed trial (https://github.com/optuna/optuna/blob/master/optuna/study.py#L737), and samplers/pruners in Optuna will know how to handle it.

examples/rl/sb3_simple.py (resolved review thread)
@HideakiImamura HideakiImamura self-assigned this Jun 30, 2020
HideakiImamura (Member):

@araffin Thanks for the PR! The introduced example looks very impressive. I would like to review this PR after you address @hvy's reviews above. Thanks!

HideakiImamura (Member) left a comment

Thanks for the swift action! I have one suggestion around the `suggest_*` methods.

examples/rl/sb3_simple.py (resolved review thread)
araffin and others added 4 commits July 1, 2020 16:03
araffin (Contributor, Author) commented Jul 1, 2020

Is there an easy way to display the user attributes for each trial in the terminal?

Because for an RL researcher, it's not very intuitive to see "gamma=0.002" in the terminal...

HideakiImamura (Member) left a comment

Thanks for the swift action! LGTM except for one minor comment!

examples/rl/sb3_simple.py (resolved review thread)
HideakiImamura (Member):

IMO, the current visualization of the optimized parameters looks good. If we have a sufficient time budget, it would be a good idea to run the training again with the optimized parameters and evaluate their performance.

Co-authored-by: Hideaki Imamura <38826298+HideakiImamura@users.noreply.github.com>
HideakiImamura (Member):

Could you merge the master branch? It will resolve the CI failure.

HideakiImamura (Member) left a comment

Thanks for your great effort! LGTM!

hvy (Member) left a comment

Sorry for the late review, and again thanks for your effort. LGTM!

I did not check every detail with stable_baselines3 but the usage of Optuna seems good and I also verified the example locally.

@hvy hvy merged commit ff7f9f6 into optuna:master Jul 13, 2020
@HideakiImamura HideakiImamura added this to the v2.0.0 milestone Jul 13, 2020