Parallel Runner has worse performance at same timestep/episode than Episode Runner #3

PMatthaei · 2021-04-29T14:04:13Z

Reason:

The steppers did not read the correct metrics. For example, the reward was read correctly only if the policy team id was set to 0, but in the build plan the policy team was at position 1. As a result, the done array was accessed at the wrong index and returned the done boolean of the scripted agent.
Learning seems to diverge or fluctuate strongly in the parallel steppers.
The episode stepper achieves better results in the same time.
Check whether the logging is wrong and the policy is actually good.
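
The indexing mistake described under "Reason" can be illustrated with a minimal sketch. It assumes per-team reward and done values arrive as flat sequences ordered as in the build plan. The helper `select_policy_metrics` and the sample values are hypothetical and are not taken from the project's code.

```python
from typing import Sequence, Tuple


def select_policy_metrics(
    rewards: Sequence[float],
    dones: Sequence[bool],
    policy_team_id: int,
) -> Tuple[float, bool]:
    """Return (reward, done) for the policy team (hypothetical helper).

    Assumes both sequences hold one entry per team, in build-plan order.
    """
    if len(rewards) != len(dones):
        raise ValueError("rewards and dones must have one entry per team")
    if not 0 <= policy_team_id < len(rewards):
        raise IndexError(f"policy_team_id {policy_team_id} is out of range")
    # Index with the policy team's position instead of a hard-coded 0.
    return rewards[policy_team_id], dones[policy_team_id]


# Made-up step output: scripted team at position 0, policy team at position 1.
rewards = [0.0, 1.5]
dones = [True, False]

# Buggy read: a hard-coded index 0 picks up the scripted agent's values.
buggy = (rewards[0], dones[0])

# Corrected read: use the position the build plan assigns to the policy team.
fixed = select_policy_metrics(rewards, dones, policy_team_id=1)
```

With these sample values, the hard-coded read reports the scripted agent's done flag, so the episode would be treated as finished for the policy team even though it is still running.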
