Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Including GAE #26

Closed
CesMak opened this issue Mar 28, 2020 · 1 comment
Closed

Including GAE #26

CesMak opened this issue Mar 28, 2020 · 1 comment

Comments

@CesMak
Copy link

CesMak commented Mar 28, 2020

Hey there,

you used Monte Carlo Estimate - would it no also be nice to have GAE (Generalized Advantage estimation?)

The function should be something like:

I am not sure how exactly I can include gae in the code....

    def get_advantages(self, values, masks, rewards, gamma):
        returns = []
        gae = 0
        for i in reversed(range(len(rewards))):
            delta = rewards[i-1] + gamma * values[i] * masks[i-1] - values[i-1]
            gae = delta + gamma * 0.95 * masks[i-1] * gae
            returns.insert(0, gae + values[i-1])

        adv = np.array(returns) - values.detach().numpy()
        adv = torch.tensor(adv.astype(np.float32)).float()
        # Normalizing advantages
        return returns, (adv - adv.mean()) / (adv.std() + 1e-5)
       # Monte Carlo estimate of rewards:
        rewards = []
        discounted_reward = 0
        for reward, is_terminal in zip(reversed(memory.rewards), reversed(memory.is_terminals)):
            if is_terminal:
                discounted_reward = 0
            discounted_reward = reward + (self.gamma * discounted_reward)
            rewards.insert(0, discounted_reward)
@nikhilbarhate99
Copy link
Owner

I do not think the added complexity is worth it in this repo, since the goal is to provide a simplistic and beginner friendly implementation.

Also, Bootstrapping values does require the experience to be collected from parallel workers in order to work practically.
Source: skip to 54:19 of (https://www.youtube.com/watch?v=EKqxumCuAAY&list=PLkFD6_40KJIwhWJpGazJ9VSj9CFMkb79A&index=6)

I do not know how GAE will perform on a single worker and I also do not have the time to re-test GAE on different environments.

If your implementation works correctly, then you can add it in your fork of this repo.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants