
The reason for clipped reward in V-trace #56

Closed
benlin1996 opened this issue Dec 23, 2020 · 2 comments

Comments

@benlin1996

For example, in network.py in dmlab, you use clipped_reward (line 112). Could you explain why you construct the reward and the network this way? And is there any constraint on how I set rewards in my environment?

Thanks

@lespeholt
Collaborator

For this project we didn't try to innovate much on the networks. The network comes from the IMPALA work: https://arxiv.org/abs/1802.01561

For some games it may make sense not to clip, or to represent the reward as a one-hot vector. However, I haven't found it to be very important. It matters most when the same object in a game can yield different rewards depending on other factors in the game.
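
Roughly, the input construction looks like this (a minimal sketch, not the exact code from dmlab/networks.py; the names `torso_output`, `prev_action`, and `prev_reward` are placeholders):

```python
import tensorflow as tf

def make_core_input(torso_output, prev_action, prev_reward, num_actions):
  # Clip the previous reward to [-1, 1] so its scale matches the other
  # (roughly unit-scale) features fed to the recurrent core.
  clipped_reward = tf.clip_by_value(
      tf.cast(prev_reward, tf.float32), -1.0, 1.0)
  # One-hot encode the previous action.
  one_hot_action = tf.one_hot(prev_action, num_actions)
  # Concatenate visual features, clipped reward, and previous action
  # into the LSTM input.
  return tf.concat(
      [torso_output, clipped_reward[..., None], one_hot_action], axis=-1)
```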

Clipping the reward for the loss is important though: https://github.com/google-research/seed_rl/blob/master/agents/vtrace/learner.py#L94
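
That clipping is just a clip_by_value on the reward tensor before the V-trace targets are computed. A minimal sketch (the function name and the commented call below are illustrative, not the exact learner code):

```python
import tensorflow as tf

def clip_rewards_for_loss(rewards, max_abs_reward=1.0):
  # Clipping bounds the V-trace return targets, which keeps the value and
  # policy-gradient losses on a comparable scale across environments with
  # very different raw reward magnitudes.
  return tf.clip_by_value(rewards, -max_abs_reward, max_abs_reward)

# The clipped rewards then feed into the V-trace targets, e.g.:
# vtrace_returns = vtrace.from_importance_weights(
#     log_rhos=log_rhos, discounts=discounts,
#     rewards=clip_rewards_for_loss(rewards),
#     values=values, bootstrap_value=bootstrap_value)
```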

@benlin1996
Author

Thanks for your explanation.
