
The reason for clipped reward in V-trace #56

Closed
benlin1996 opened this issue Dec 23, 2020 · 2 comments

Comments

@benlin1996

For example, in network.py in dmlab, you use clipped_reward (line 112). Could you explain why you construct the reward and the network this way? And is there any constraint on how I set rewards in my environment?

Thanks

@lespeholt
Collaborator

For this project we didn't try to innovate much on the networks. The network comes from the IMPALA work: https://arxiv.org/abs/1802.01561

For some games it may make sense not to clip, or to represent the reward as a one-hot vector. However, I haven't found it to be very important. It matters most when the same object in a game can yield different rewards depending on other factors in the game.
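
Roughly, the input construction looks like this (a minimal sketch, not the exact code from dmlab/networks.py; the names `torso_output`, `prev_action`, and `prev_reward` are placeholders):

```python
import tensorflow as tf

def make_core_input(torso_output, prev_action, prev_reward, num_actions):
  # Clip the previous reward to [-1, 1] so its scale matches the other
  # (roughly unit-scale) features fed to the recurrent core.
  clipped_reward = tf.clip_by_value(
      tf.cast(prev_reward, tf.float32), -1.0, 1.0)
  # One-hot encode the previous action.
  one_hot_action = tf.one_hot(prev_action, num_actions)
  # Concatenate visual features, clipped reward, and previous action
  # into the LSTM input.
  return tf.concat(
      [torso_output, clipped_reward[..., None], one_hot_action], axis=-1)
```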

Clipping the reward for the loss is important though: https://github.com/google-research/seed_rl/blob/master/agents/vtrace/learner.py#L94
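
That clipping is just a clip_by_value on the reward tensor before the V-trace targets are computed. A minimal sketch (the function name and the commented call below are illustrative, not the exact learner code):

```python
import tensorflow as tf

def clip_rewards_for_loss(rewards, max_abs_reward=1.0):
  # Clipping bounds the V-trace return targets, which keeps the value and
  # policy-gradient losses on a comparable scale across environments with
  # very different raw reward magnitudes.
  return tf.clip_by_value(rewards, -max_abs_reward, max_abs_reward)

# The clipped rewards then feed into the V-trace targets, e.g.:
# vtrace_returns = vtrace.from_importance_weights(
#     log_rhos=log_rhos, discounts=discounts,
#     rewards=clip_rewards_for_loss(rewards),
#     values=values, bootstrap_value=bootstrap_value)
```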

@benlin1996
Author

Thanks for your explanation.
