About SHIFT #8

Open
yichaowa opened this issue Nov 26, 2018 · 1 comment

Comments

@yichaowa

I don't understand why we need to subtract a shift from the reward. How should this value be set?

@dryanguasr

They explain that in the article. The idea is to suppress the survival bonus in the reward function in order to avoid some local optima (for example, a policy that just stays alive without moving forward). In Hopper the survival bonus is 1 per step, so shift is set to 1; in Humanoid it is 5 per step, so shift is set to 5. It is even commented in the ars.py file:

# for Swimmer-v1 and HalfCheetah-v1 use shift = 0
# for Hopper-v1, Walker2d-v1, and Ant-v1 use shift = 1
# for Humanoid-v1 used shift = 5
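
To make the effect concrete, here is a minimal sketch of what subtracting the shift does to a rollout's return. It is not the code from ars.py: `rollout_return` is a hypothetical helper, and the reward sequences are made-up numbers that only assume a Hopper-like survival bonus of 1 per step.

```python
def rollout_return(rewards, shift=0.0):
    """Sum the per-step rewards after subtracting a constant shift from each."""
    return sum(r - shift for r in rewards)


# Made-up episodes, assuming a survival bonus of 1 per step (Hopper-like).
# Policy A stays upright for 100 steps but never moves forward.
standing_still = [1.0] * 100
# Policy B earns 0.5 of forward reward per step but falls after 40 steps.
moving_then_falls = [1.0 + 0.5] * 40

# Without the shift, standing still looks better (100.0 vs 60.0).
print(rollout_return(standing_still), rollout_return(moving_then_falls))

# With shift = 1 the survival bonus cancels out and only forward
# progress counts, so the moving policy wins (0.0 vs 20.0).
print(rollout_return(standing_still, shift=1.0),
      rollout_return(moving_then_falls, shift=1.0))
```

With the bonus left in, a search method can settle on the "stay alive and do nothing" policy because it scores higher. Setting shift to the per-step bonus removes that incentive, which is why the value depends on the environment.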
