Question regarding state_values.detach() #29
Comments
For updating the value network, we later calculate the MSE loss without detaching the state values. As to why it is performing better, I do not know.
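For reference, a minimal sketch of the loss pattern described above, assuming a combined actor-critic objective of the kind commonly used in PyTorch PPO implementations; the function name, arguments, and coefficients here are illustrative, not the repository's exact code:

```python
import torch
import torch.nn as nn

mse_loss = nn.MSELoss()

def ppo_loss(logprobs, old_logprobs, state_values, rewards, dist_entropy, eps_clip=0.2):
    # Advantages are computed from DETACHED state values, so the clipped
    # surrogate term does not push gradients into the value head.
    advantages = rewards - state_values.detach()

    # Probability ratio between the new and old policies.
    ratios = torch.exp(logprobs - old_logprobs.detach())
    surr1 = ratios * advantages
    surr2 = torch.clamp(ratios, 1 - eps_clip, 1 + eps_clip) * advantages

    # The MSE (value) loss uses the UNDETACHED state_values, so the value
    # network is still updated through this term.
    value_loss = mse_loss(state_values, rewards)

    # Combined loss, averaged over the batch before backpropagation.
    return (-torch.min(surr1, surr2) + 0.5 * value_loss - 0.01 * dist_entropy).mean()
```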
Hi, thank you for the reply. I must have made some mistake in my experiment. As you said, .detach() returns a new tensor without the computation graph, so it wouldn't stop the value layer from being updated through the undetached state values. My mistake!
Thank you for your example. I modified mine like this: critic_vpi = self.policy_next.network_critic(curr_states)
Refer to here:
Hi, thanks for the great implementation. I learned a lot about PPO by reading your code.
I have one question regarding state_values.detach() when updating PPO.
When you detach a tensor, it loses the computation record that is used in backpropagation.
So I checked whether the weights of the value layer of the policy get updated, and they did not.
Surprisingly, in my own experiment, the training performance was better with .detach() than without it. But I still find it difficult to understand the use of .detach() theoretically.
Thank you.
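As a side note on what .detach() actually does, here is a small self-contained check (illustrative, not taken from the repository) that matches the clarification in the replies above: detaching produces a new tensor outside the computation graph, while the original, undetached state values still carry gradients back to the value layer.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy "value layer": a single linear critic head (illustrative, not the repo's network).
value_layer = nn.Linear(4, 1)
states = torch.randn(8, 4)
returns = torch.randn(8, 1)

state_values = value_layer(states)

# .detach() returns a NEW tensor with no graph; the original state_values
# tensor is untouched and still connected to value_layer.
advantages = returns - state_values.detach()
print(advantages.requires_grad)             # False: no gradient flows through the advantages

# The value (MSE) loss is taken on the UNDETACHED state_values,
# so backprop still reaches the value layer's weights.
value_loss = F.mse_loss(state_values, returns)
value_loss.backward()
print(value_layer.weight.grad is not None)  # True: the value layer is still updated
```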