Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Probably wrong MCTS implementation #16

Closed
ChenDRAG opened this issue Dec 20, 2021 · 3 comments · Fixed by #17
Closed

Probably wrong MCTS implementation #16

ChenDRAG opened this issue Dec 20, 2021 · 3 comments · Fixed by #17

Comments

@ChenDRAG
Copy link

for node in search_path:

As what I understand, we should reverse search_path and calculate value from bottom to up?

@ChenDRAG
Copy link
Author

I checked https://github.com/YeWR/EfficientZero 's implementation of MCTS(based on c language), and am almost certain we should reverse search_path.
https://github.com/YeWR/EfficientZero/blob/a0c094818d750237d5aa14263a65a1e1e4f2bbcb/core/ctree/cnode.cpp#L289

@ChenDRAG ChenDRAG changed the title Perhaps wrong MCTS implementation? Probably wrong MCTS implementation Dec 20, 2021
@koulanurag
Copy link
Owner

koulanurag commented Dec 20, 2021

Hi @ChenDRAG,

Thanks for finding this potential bug. Superficially, it does look that it should be reversed.

I would get back to this in a day or two to investigate its correctness.

Also, I am also happy to accept a PR on it.

@bAmpT
Copy link
Contributor

bAmpT commented Jan 12, 2022

The authors corrected the mistake of not reverseing the search_path in the v2 of the pseudocode. Another small error is that the target rewards seem to be misaligned, there is a thread on stackoverflow about that: https://stackoverflow.com/questions/60234530/is-the-reward-value-in-muzeros-pseudocode-misaligned

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants