Calculation method of value estimate #23

Closed
wayunderfoot opened this issue Mar 16, 2020 · 3 comments

Comments

@wayunderfoot

Thank you for your outstanding work. Your paper mentions both the estimated value and the real value. Could you explain how each of them is calculated? Thank you.

@sfujim
Owner

sfujim commented Mar 30, 2020

The estimated value is the average output of the critic. The real value is computed by resetting the MuJoCo simulator to a state sampled from the replay buffer and then running the trajectory to completion, starting from that state and its corresponding buffered action.
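
As a rough illustration of the real-value computation described above, here is a minimal sketch in plain Python. It makes several assumptions that are not stated in this thread: the return is discounted by a factor `gamma`, actions after the first buffered one come from the current policy, and the simulator can be restored through a hypothetical `restore` method. `ToyChain` is a made-up stand-in for a MuJoCo environment, not part of any real library.

```python
from typing import Callable, List, Tuple


def discounted_return(rewards: List[float], gamma: float) -> float:
    """Sum of gamma**t * r_t, accumulated backwards over the trajectory."""
    total = 0.0
    for r in reversed(rewards):
        total = r + gamma * total
    return total


class ToyChain:
    """Hypothetical stand-in for a simulator whose state can be restored.

    The state is an integer position; every step yields reward 1.0 and the
    episode ends once the position reaches `length`.
    """

    def __init__(self, length: int = 3) -> None:
        self.length = length
        self.pos = 0

    def restore(self, state: int) -> None:
        # A real MuJoCo setup would restore the full simulator state here.
        self.pos = state

    def step(self, action: int) -> Tuple[int, float, bool]:
        self.pos += action
        return self.pos, 1.0, self.pos >= self.length


def true_value(env, policy: Callable[[int], int], state: int, action: int,
               gamma: float = 0.99, max_steps: int = 1000) -> float:
    """Monte Carlo return: take the buffered action first, then follow policy."""
    env.restore(state)
    rewards = []
    for _ in range(max_steps):
        next_state, reward, done = env.step(action)
        rewards.append(reward)
        if done:
            break
        action = policy(next_state)  # assumption: later actions come from the policy
    return discounted_return(rewards, gamma)
```

With `ToyChain(3)`, unit steps, and `gamma=0.5`, the value from state 0 is 1 + 0.5 + 0.25 = 1.75. Averaging `true_value` over many sampled state-action pairs would give the reported real value.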

Alternatively, you could estimate the real value by running trajectories from the initial start states (i.e. just reset the env and run), and compute the corresponding estimated value by evaluating the critic on those same start states.
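
The reset-and-run alternative could be sketched roughly as follows. All four callables (`reset`, `step`, `policy`, `critic`) are hypothetical placeholders for whatever environment and networks are in use, and the discount factor `gamma` is an assumption rather than something stated in this thread.

```python
from statistics import mean
from typing import Callable, List, Tuple


def start_state_comparison(
    reset: Callable[[], float],
    step: Callable[[float], Tuple[float, float, bool]],
    policy: Callable[[float], float],
    critic: Callable[[float, float], float],
    episodes: int = 10,
    gamma: float = 0.99,
    max_steps: int = 1000,
) -> Tuple[float, float]:
    """Return (mean critic estimate, mean discounted return) over start states."""
    estimates: List[float] = []
    returns: List[float] = []
    for _ in range(episodes):
        state = reset()
        action = policy(state)
        estimates.append(critic(state, action))  # estimate at the start state
        ret, discount = 0.0, 1.0
        for _ in range(max_steps):
            state, reward, done = step(action)
            ret += discount * reward
            discount *= gamma
            if done:
                break
            action = policy(state)
        returns.append(ret)
    return mean(estimates), mean(returns)
```

The gap between the two returned numbers gives a simple measure of over- or underestimation at the start states.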

@sfujim sfujim closed this as completed Apr 7, 2020
@geyang

geyang commented Dec 3, 2021

Hey Scott, when you say

"Estimated value comes from the average output from the critic."

Which critic do you use to compute the estimate? Do you use both critics and average between them, or is the averaging done over the starting states?

@sfujim
Owner

sfujim commented Dec 3, 2021

For TD3 it's the min of the critics. Both the estimated and real value are averages over state-action pairs in the replay buffer. So for the estimated value, it was: sample state-action pairs from the buffer -> evaluate both critics on these pairs -> take the min of the two critics for each pair -> average over the sampled pairs.
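
The estimated-value pipeline described above could look roughly like this in plain Python. The names `estimated_value`, `q1`, and `q2` are illustrative, and states and actions are scalars only to keep the sketch short; in practice the two callables would be the two critic networks evaluated on batched tensors.

```python
import random
from statistics import mean
from typing import Callable, Sequence, Tuple

Pair = Tuple[float, float]  # (state, action); scalars keep the sketch simple


def estimated_value(
    buffer: Sequence[Pair],
    q1: Callable[[float, float], float],
    q2: Callable[[float, float], float],
    batch_size: int,
    rng: random.Random,
) -> float:
    """Average over sampled pairs of min(Q1(s, a), Q2(s, a))."""
    # 1. Sample state-action pairs from the buffer (without replacement).
    batch = rng.sample(list(buffer), batch_size)
    # 2. Evaluate both critics on each pair and keep the smaller value.
    mins = [min(q1(s, a), q2(s, a)) for s, a in batch]
    # 3. Average over the sampled pairs.
    return mean(mins)
```

Note that the min is taken per pair before averaging, which is not the same as taking the min of the two critics' average outputs.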
