
max_grad_norm and use_clipped_value_loss #160

Closed
seungjaeryanlee opened this issue Dec 28, 2018 · 2 comments

@seungjaeryanlee
Hello! I was documenting your PPO code (algo/ppo.py) to improve my understanding of the algorithm, and I got confused by max_grad_norm and use_clipped_value_loss.

If I am understanding this correctly, max_grad_norm is given to nn.utils.clip_grad_norm_() to cap the gradient norm, and use_clipped_value_loss toggles a clipped value-function loss. However, I could not find the relevant details in the paper Proximal Policy Optimization Algorithms. If either is explicitly mentioned there, would you please point it out for me?
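For context, here is a minimal sketch of how I understand max_grad_norm being applied in an update step (variable names and values are illustrative, not the repo's exact code):

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 2)                        # stand-in for the actor-critic network
optimizer = torch.optim.Adam(model.parameters(), lr=3e-4)
max_grad_norm = 0.5                            # hypothetical value

loss = model(torch.randn(8, 4)).pow(2).mean()  # dummy loss for illustration
optimizer.zero_grad()
loss.backward()
# Rescale all gradients together so their global L2 norm is at most max_grad_norm.
nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm)
optimizer.step()
```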

For L^VF, the paper seems to use the simple squared-error loss, equivalent to use_clipped_value_loss=False, but I could not find anything about the use_clipped_value_loss=True case. Is this a trick not mentioned in the paper?

Thank you in advance for your help. Happy holidays!

@ikostrikov
Owner

They introduced the new loss in the implementation of PPO2:
https://github.com/openai/baselines/blob/master/baselines/ppo2/model.py#L63

Also see gradient norm clipping here:
https://github.com/openai/baselines/blob/master/baselines/ppo2/model.py#L102
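In PyTorch terms, the clipped value loss from that first link looks roughly like this (a sketch with illustrative names, assuming clip_param is the same clip range used for the policy ratio):

```python
import torch

def value_loss(values, old_values, returns, clip_param, use_clipped_value_loss=True):
    if use_clipped_value_loss:
        # Keep the new value prediction within clip_param of the old prediction,
        # mirroring the policy's clipped surrogate objective.
        values_clipped = old_values + (values - old_values).clamp(-clip_param, clip_param)
        unclipped = (values - returns).pow(2)
        clipped = (values_clipped - returns).pow(2)
        # Take the element-wise maximum (the pessimistic bound) of the two losses.
        return 0.5 * torch.max(unclipped, clipped).mean()
    # Plain squared-error loss, i.e. the paper's L^VF.
    return 0.5 * (values - returns).pow(2).mean()
```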

@seungjaeryanlee
Author

Thank you for the links! I see how they correspond to those parts of PPO2 in OpenAI Baselines.

It's unfortunate that these changes are not written up in any paper. I guess I will have to read the openai/baselines code as well.
