Hey there,
you used a Monte Carlo estimate. Would it not also be nice to have GAE (Generalized Advantage Estimation)?
The function should be something like:
I am not sure exactly how I can include GAE in the code...
    def get_advantages(self, values, masks, rewards, gamma):
        returns = []
        gae = 0
        for i in reversed(range(len(rewards))):
            delta = rewards[i-1] + gamma * values[i] * masks[i-1] - values[i-1]
            gae = delta + gamma * 0.95 * masks[i-1] * gae
            returns.insert(0, gae + values[i-1])
        adv = np.array(returns) - values.detach().numpy()
        adv = torch.tensor(adv.astype(np.float32)).float()
        # Normalizing advantages
        return returns, (adv - adv.mean()) / (adv.std() + 1e-5)
    # Monte Carlo estimate of rewards:
    rewards = []
    discounted_reward = 0
    for reward, is_terminal in zip(reversed(memory.rewards), reversed(memory.is_terminals)):
        if is_terminal:
            discounted_reward = 0
        discounted_reward = reward + (self.gamma * discounted_reward)
        rewards.insert(0, discounted_reward)
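For comparison, here is a minimal pure-Python sketch of how GAE could sit next to the Monte Carlo loop above. It is not the repo's code: `compute_gae`, `monte_carlo_returns`, `last_value` and `lam` are hypothetical names, and it assumes `rewards[t]`, `values[t]` and `is_terminals[t]` all describe the same step `t`. It indexes with `t` rather than `i-1`, because in the function posted above `rewards[i-1]` wraps around to the last element when `i` is 0.

```python
def compute_gae(rewards, values, is_terminals, last_value=0.0, gamma=0.99, lam=0.95):
    """Return (advantages, returns) for one rollout stored as plain lists.

    last_value is a hypothetical argument: the critic's estimate for the state
    after the final step, only relevant if the rollout stops mid-episode.
    """
    advantages = [0.0] * len(rewards)
    gae = 0.0
    next_value = last_value
    for t in reversed(range(len(rewards))):
        # A terminal step ends the episode: no bootstrapping across it and
        # no advantage carried over from the following episode.
        not_done = 0.0 if is_terminals[t] else 1.0
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        gae = delta + gamma * lam * not_done * gae
        advantages[t] = gae
        next_value = values[t]
    # Value targets for the critic: advantage plus the value baseline.
    returns = [a + v for a, v in zip(advantages, values)]
    return advantages, returns


def monte_carlo_returns(rewards, is_terminals, gamma=0.99):
    """Discounted reward-to-go, reset at episode ends (same logic as the loop above)."""
    out = []
    discounted = 0.0
    for reward, done in zip(reversed(rewards), reversed(is_terminals)):
        if done:
            discounted = 0.0
        discounted = reward + gamma * discounted
        out.insert(0, discounted)
    return out
```

Under these assumptions, setting `lam=1.0` (with `last_value=0.0`) makes the returned value targets match the Monte Carlo estimate, while `lam=0.0` reduces each advantage to the one-step TD error, so `lam` trades variance against bias between the two. Advantage normalization, as in the posted function, could still be applied afterwards.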
I do not think the added complexity is worth it in this repo, since the goal is to provide a simple, beginner-friendly implementation.
Also, bootstrapping values requires the experience to be collected from parallel workers in order to work well in practice. Source: skip to 54:19 of https://www.youtube.com/watch?v=EKqxumCuAAY&list=PLkFD6_40KJIwhWJpGazJ9VSj9CFMkb79A&index=6
I do not know how GAE will perform with a single worker, and I do not have the time to re-test GAE on different environments.
If your implementation works correctly, then you can add it in your fork of this repo.