Hi, thanks for the great work. I have three questions if you don't mind.

In `code-for-paper/src/policy_gradients/torch_utils.py` (line 358 in 094994f), the comments suggest it uses "Incorrect reward normalization". I was wondering if you could elaborate. Does that mean we should avoid using `RewardFilter` because of the incorrect normalization and use `ZFilter` instead for the reward normalization?
Another concern I have is with the `reset()` call of the `RewardFilter`. It seems that in your customized envs,

```python
def reset(self):
    # Reset the state, and the running total reward
    start_state = self.env.reset()
    self.total_true_reward = 0.0
    self.counter = 0.0
    self.state_filter.reset()
    return self.state_filter(start_state, reset=True)
```
It seems the `reward_filter` will never be reset. However, the `reward_filter` always multiplies the existing running return by `gamma`. Could this be a bug?

The `reward_filter` already uses `gamma` as part of its inputs, but do you still calculate the advantage using `gamma` again, or is this somehow omitted?
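For concreteness, here is a minimal sketch of the kind of reward scaling I'm asking about. This is not the repo's exact code; `RunningStd` and `DiscountedReturnRewardFilter` are names made up for this illustration.

```python
import numpy as np


class RunningStd:
    """Tracks a running standard deviation with Welford's algorithm."""
    def __init__(self):
        self.n, self.mean, self.m2 = 0, 0.0, 0.0

    def push(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def std(self):
        return float(np.sqrt(self.m2 / self.n)) if self.n > 1 else 1.0


class DiscountedReturnRewardFilter:
    """Scales each reward by the std of a running discounted return.

    The accumulator `ret` is never cleared between episodes here,
    mirroring the behavior questioned above.
    """
    def __init__(self, gamma):
        self.gamma = gamma
        self.ret = 0.0
        self.rs = RunningStd()

    def __call__(self, reward):
        # ret_t = gamma * ret_{t-1} + r_t  (a backward-looking discounted sum)
        self.ret = self.gamma * self.ret + reward
        self.rs.push(self.ret)
        # Divide by std(ret) only; the mean is not subtracted, which is
        # presumably what the "Incorrect reward normalization" comment refers to.
        return reward / (self.rs.std + 1e-8)
```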
Thanks.
Hi @vwxyzjn,

To my understanding, the `state_filter` and `reward_filter` keep their internal `RunningStats` objects alive and are never reset, which means they track running statistics across all episodes rather than per episode. (I am a little concerned about whether that is reasonable.)

The `gamma` in the `reward_filter` and the `gamma` in the advantage calculation have different meanings and purposes: the former maintains a running sum of rewards (the accumulated past return is discounted by `gamma` and the new reward is added), while the latter discounts future rewards.
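To make the distinction concrete, here is a small illustrative sketch (not code from the repo) contrasting the two uses of `gamma`:

```python
# (1) gamma in the reward filter: a *backward* discounted sum, updated online.
#     ret_t = gamma * ret_{t-1} + r_t
def running_return(rewards, gamma):
    ret, out = 0.0, []
    for r in rewards:
        ret = gamma * ret + r
        out.append(ret)
    return out


# (2) gamma in the return/advantage calculation: a *forward* discounted sum.
#     G_t = r_t + gamma * r_{t+1} + gamma^2 * r_{t+2} + ...
def discounted_future_returns(rewards, gamma):
    out, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        out.append(g)
    return out[::-1]


rewards = [1.0, 0.0, 2.0]
print(running_return(rewards, 0.99))             # [1.0, 0.99, 2.9801]
print(discounted_future_returns(rewards, 0.99))  # [2.9602, 1.98, 2.0]
```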
Hi all, I'm an author of the corresponding paper for this repo. Since this was an anonymized submission, we were unable to comment on or change the code during the review period. There is now an updated repository with better hyperparameters, where we also switched to a system in which we reset the reward filter: https://github.com/MadryLab/implementation-matters
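For readers who want to apply a similar fix in their own wrappers, a minimal sketch of what resetting such a filter on environment reset could look like is below. The `reset_return()` method is a hypothetical name used for illustration; see the updated repo linked above for the actual implementation.

```python
def reset(self):
    # Reset the state, the running total reward, and (in this sketch)
    # the reward filter's discounted-return accumulator.
    start_state = self.env.reset()
    self.total_true_reward = 0.0
    self.counter = 0.0
    self.state_filter.reset()
    self.reward_filter.reset_return()  # hypothetical method that zeroes the running return
    return self.state_filter(start_state, reset=True)
```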