Probably wrong MCTS implementation #16
I checked the MCTS implementation in https://github.com/YeWR/EfficientZero (written in C), and I am almost certain we should reverse search_path.
Hi @ChenDRAG, thanks for finding this potential bug. At first glance, it does look like it should be reversed. I will get back to this in a day or two to check whether that is correct. I am also happy to accept a PR for it.
In v2 of the pseudocode, the authors corrected the mistake of not reversing search_path. Another small error is that the target rewards seem to be misaligned; there is a Stack Overflow thread about that: https://stackoverflow.com/questions/60234530/is-the-reward-value-in-muzeros-pseudocode-misaligned
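To make the reward-alignment point concrete, here is a minimal sketch of reward targets built under one assumed convention: rewards[i] is the reward observed after taking action i in state i, unroll step 0 (initial inference) predicts no reward, and unroll step k >= 1 predicts the reward of the transition into state state_index + k. The helper make_reward_targets is hypothetical and is not taken from muzero-pytorch or the pseudocode; it only shows the off-by-one shift the linked thread is about.

```python
from typing import List


def make_reward_targets(rewards: List[float], state_index: int,
                        num_unroll_steps: int) -> List[float]:
    """Hypothetical helper: reward targets aligned with unrolled predictions.

    Assumes rewards[i] is the reward received after acting in state i, so the
    prediction at unroll step k >= 1 (dynamics applied to state
    state_index + k - 1) should be trained on rewards[state_index + k - 1].
    """
    targets = []
    for k in range(num_unroll_steps + 1):
        current_index = state_index + k
        if k == 0:
            # Initial inference has no incoming transition, so no reward.
            targets.append(0.0)
        elif current_index - 1 < len(rewards):
            # Reward of the transition that led into current_index.
            targets.append(rewards[current_index - 1])
        else:
            # Unrolled past the end of the episode: absorbing state.
            targets.append(0.0)
    return targets
```

For example, with rewards [1, 2, 3, 4], state_index 1 and two unroll steps, this yields [0.0, 2, 3]; indexing with rewards[current_index] instead would shift every target one step earlier.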
Referenced code: muzero-pytorch/core/mcts.py, line 105 (commit 4c008d0)
As I understand it, shouldn't we reverse search_path and compute the value from the bottom up?
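To illustrate why the walk has to go bottom-up, here is a minimal single-player sketch. The Node class and both functions are hypothetical stand-ins, not muzero-pytorch's actual code. It assumes search_path is ordered root to leaf, that each node stores the reward predicted on the edge leading into it, and it omits the to_play sign flip and min-max value normalization used in the full MuZero search.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    """Hypothetical minimal MCTS node (not muzero-pytorch's Node class)."""
    reward: float = 0.0      # reward predicted on the edge into this node
    value_sum: float = 0.0
    visit_count: int = 0

    def value(self) -> float:
        return self.value_sum / self.visit_count if self.visit_count else 0.0


def backpropagate(search_path: List[Node], value: float, discount: float) -> None:
    """Walk leaf -> root: each node gets the return estimated from its own
    subtree, then that return is discounted and combined with the reward of
    the edge into the node before being passed to the parent."""
    for node in reversed(search_path):
        node.value_sum += value
        node.visit_count += 1
        value = node.reward + discount * value


def backpropagate_forward(search_path: List[Node], value: float, discount: float) -> None:
    """Same update, but iterating root -> leaf (the ordering questioned above)."""
    for node in search_path:
        node.value_sum += value
        node.visit_count += 1
        value = node.reward + discount * value
```

With a leaf value of 10, discount 0.5 and edge rewards 1 and 2 along the path, the reversed walk gives node values 4.5, 7 and 10 (root to leaf), which are the correct discounted returns. The forward walk gives 10, 5 and 3.5: the root receives the raw leaf estimate with none of the intermediate rewards, and the leaf's own estimate is overwritten.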