Parallel Runner has worse performance at same timestep/episode than Episode Runner #3

Open
PMatthaei opened this issue Apr 29, 2021 · 0 comments

PMatthaei commented Apr 29, 2021

Reason:

  • Steppers did not read the correct metrics. For example, the reward was read correctly when the policy team id was set to 0, but in the build plan the policy team was at position 1. As a result, the done array was indexed incorrectly and the done flag of the scripted agent was read instead.
  • Learning seems to diverge or fluctuate strongly with the parallel steppers.
  • The episode stepper achieves better results in the same amount of time.
  • Check whether the logging is wrong and the policy is actually good.
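
The indexing problem from the first point can be illustrated with a minimal sketch. This is not the project's actual code: the build plan layout, the `is_scripted` key, and the helper names `find_policy_team_index` and `read_step_metrics` are all assumptions made for illustration. It only shows how a hard-coded team index of 0 picks up the scripted team's values when the policy team sits at position 1.

```python
# Hypothetical build plan: the scripted team comes first, the policy team second.
build_plan = [
    {"name": "scripted", "is_scripted": True},
    {"name": "policy", "is_scripted": False},
]


def find_policy_team_index(plan):
    """Return the position of the first non-scripted team in the plan."""
    for position, team in enumerate(plan):
        if not team["is_scripted"]:
            return position
    raise ValueError("build plan contains no policy team")


def read_step_metrics(rewards, dones, team_index):
    """Pick the reward and done flag of one team from per-team step outputs."""
    return rewards[team_index], dones[team_index]


# Hypothetical per-team outputs of a single environment step.
rewards = [0.0, 1.5]
dones = [True, False]  # scripted team is done, policy team is still running

policy_index = find_policy_team_index(build_plan)

# Hard-coded index 0 reads the scripted team's reward and done flag.
buggy = read_step_metrics(rewards, dones, 0)

# Deriving the index from the build plan reads the policy team's values.
fixed = read_step_metrics(rewards, dones, policy_index)
```

Deriving the index from the build plan in one place, rather than assuming a fixed team id in each stepper, would keep the episode and parallel steppers consistent if the team order changes.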