RewardForwardFilter to compute intrinsic returns for normalize intrinsic reward #16

boscotsang · 2019-04-28T06:38:33Z

In ppo_agent.py, the running estimate of the intrinsic returns is computed with rff_int:
rffs_int = np.array([self.I.rff_int.update(rew) for rew in self.I.buf_rews_int.T])
In reinforcement learning, returns are computed as sum_t \gamma^t * r_t. However, rff_int seems to compute the returns as sum_t \gamma^(T-t) * r_t, which discounts the rewards forward in time.
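To spell out the difference, here is a short derivation. It assumes that rff_int keeps a running value updated as R_t = \gamma R_{t-1} + r_t at every step; that is a reading of the code, not something confirmed in this thread. T denotes the last step of the rollout.

```latex
% Standard discounted return from step t (rewards ahead of t are discounted):
G_t = \sum_{k=0}^{T-t} \gamma^{k} \, r_{t+k}

% Assumed forward filter recursion, with R_0 = r_0:
R_t = \gamma R_{t-1} + r_t
\quad\Longrightarrow\quad
R_T = \sum_{t=0}^{T} \gamma^{T-t} \, r_t
```

Under this assumption the most recent reward gets weight 1 and the oldest reward is discounted the most, which is the opposite of the usual return.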
What is the reason for computing the intrinsic returns forward?
Thanks!


4kasha · 2019-05-01T19:26:02Z

Hi,

According to this comment, it seems to be just for convenience.
I think changing it to self.I.buf_rews_int.T[::-1] would not change the standard deviation significantly.
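One way to check this is a small sketch like the one below. It is not the repository's code: ForwardFilter is a hypothetical stand-in for rff_int that assumes the update rule gamma * previous + reward, the rewards are random made-up data, and a fresh filter is used for each pass instead of one that persists across rollouts.

```python
import numpy as np


class ForwardFilter:
    """Hypothetical stand-in for rff_int (assumed rule: gamma * prev + rew)."""

    def __init__(self, gamma):
        self.gamma = gamma
        self.rewems = None

    def update(self, rews):
        # First call just stores the rewards; later calls discount the
        # running value and add the new rewards on top.
        if self.rewems is None:
            self.rewems = rews
        else:
            self.rewems = self.rewems * self.gamma + rews
        return self.rewems


def filtered(rews_by_step, gamma):
    """Run a fresh filter over a sequence of per-step reward vectors."""
    f = ForwardFilter(gamma)
    return np.array([f.update(r) for r in rews_by_step])


rng = np.random.default_rng(0)
gamma = 0.99
buf_rews_int = rng.random((4, 128))  # (n_envs, n_steps), made-up data

forward = filtered(buf_rews_int.T, gamma)         # order used in the issue
backward = filtered(buf_rews_int.T[::-1], gamma)  # suggested reversal

# With the reversed order, the last filter output is the ordinary
# discounted return from step 0: sum_t gamma^t * r_t.
discounts = gamma ** np.arange(buf_rews_int.shape[1])
g0 = (buf_rews_int * discounts).sum(axis=1)

print("std forward :", forward.std())
print("std backward:", backward.std())
```

The two printed values can then be compared directly. With real intrinsic rewards, which are not stationary like this random data, the gap may differ, so this only illustrates how to run the comparison.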

alirezakazemipour · 2020-10-08T08:24:09Z

Exactly. 👍
I think they made a mistake!
It should have been self.I.buf_rews_int.T[::-1], as 4kasha mentioned.

alirezakazemipour mentioned this issue Oct 8, 2020

Paper and implementation are different. openai/large-scale-curiosity#6

Closed
