Is this a bug in runner.py? #14
Comments
This leaves another problem. According to the code, the model is trained every n_steps, say 12. A single training batch may then contain steps from two episodes, so some of the values calculated with the Bellman equation will be wrong.
No bugs, episode end is accounted for inside the agent during returns calculation.
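For readers following along, here is a minimal sketch of what accounting for an episode end during returns calculation can look like. It is not the project's actual code: the function name `n_step_returns` and its arguments are hypothetical, and it assumes `dones[t]` is true when the episode ended at step `t` of the rollout.

```python
def n_step_returns(rewards, dones, bootstrap_value, gamma=0.99):
    """Discounted returns for one rollout of a single env (illustrative only).

    rewards[t] and dones[t] describe step t of the rollout; bootstrap_value
    is an estimate of the value of the state reached after the last step.
    """
    returns = [0.0] * len(rewards)
    running = bootstrap_value
    for t in reversed(range(len(rewards))):
        # At an episode boundary, drop everything accumulated from later
        # steps so that the next episode's rewards do not leak backwards.
        if dones[t]:
            running = 0.0
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# A 4-step rollout in which the episode ends at step 1.
example = n_step_returns(
    rewards=[1.0, 1.0, 1.0, 1.0],
    dones=[False, True, False, False],
    bootstrap_value=10.0,
    gamma=0.5,
)
```

With the reset in place, steps 0 and 1 only see rewards from the first episode, while steps 2 and 3 bootstrap from the value estimate of the second one.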
Oh, thanks a lot, but I still think that there is an error in the reward calculation...
If you're looking at console logs then they're calculated here. Notice that rewards are averaged and only displayed after all envs report back.
Yeah, I have seen that code. :) I mean that when the average score is calculated, it may include more than one episode from a single env, because that env has to wait for the others to report.
Okay, I can see it now. Can confirm it's a bug, good catch! This most likely doesn't affect non-adversarial minigames, but it might well explain the high variance on others. As I mentioned in #7, I'm currently re-writing the project essentially from scratch, so I don't think I'll have time to fix it in the legacy codebase, but I'll be sure to keep this bug in mind. I plan to publish the rewrite by the end of August.
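One possible way to avoid the inflated scores, sketched under assumptions rather than taken from the project: keep a running score per env and record it the moment that env reports done, instead of summing until every env has finished. The class `EpisodeScoreTracker` and its methods are hypothetical names used only for illustration.

```python
class EpisodeScoreTracker:
    """Tracks per-episode scores across several parallel envs (illustrative)."""

    def __init__(self, n_envs):
        self.running = [0.0] * n_envs  # score of the episode in progress
        self.finished = []             # scores of completed episodes

    def step(self, rewards, dones):
        """Consume one step's rewards and done flags, one entry per env."""
        for i, (reward, done) in enumerate(zip(rewards, dones)):
            self.running[i] += reward
            if done:
                # Record this episode on its own and start a fresh count, so
                # an env that restarts early never merges two episodes.
                self.finished.append(self.running[i])
                self.running[i] = 0.0

    def mean_score(self):
        """Average score over completed episodes, or None if there are none."""
        if not self.finished:
            return None
        return sum(self.finished) / len(self.finished)


# Two envs: env 0 finishes after 2 steps and restarts, env 1 after 4 steps.
tracker = EpisodeScoreTracker(n_envs=2)
tracker.step([1.0, 1.0], [False, False])
tracker.step([1.0, 1.0], [True, False])
tracker.step([1.0, 1.0], [False, False])
tracker.step([1.0, 1.0], [False, True])
```

Here env 0's first episode is logged as 2 and env 1's as 4; the 2 points env 0 has collected in its restarted episode stay in `running` and are not mixed into either score.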
Great! Thanks a lot. Looking forward to your next great release!
Let's keep the ticket open until next release so others are informed as well.
Fixed!
Thank you for the great code. When I tried new maps, I found a problem in runner.py. When there is more than one env and one env finishes before the others, it restarts the game. By the time all envs are done, the calculated reward covers several episodes, so it is a much bigger number. If you understand what I am describing, could you tell me whether this is a problem?