Sampled MuZero and Sampled EfficientZero #162

Closed
WilliamYangXu opened this issue Dec 5, 2023 · 3 comments
Labels
algorithm New algorithm good first issue Good for newcomers

Comments

@WilliamYangXu

WilliamYangXu commented Dec 5, 2023

Hi, thanks for the awesome implementations.

The names "Sampled MuZero" and "Sampled EfficientZero" seem to be used interchangeably, and I wonder whether they refer to the same algorithm. If so, do they mean the continuous-action-space version of MuZero (proposed in https://arxiv.org/pdf/2104.06303.pdf), or a continuous-action-space version of EfficientZero? If not, I cannot find the Sampled MuZero algorithm in the lzero folder. Where can I find it, since your paper mentions that it is implemented? Thanks a lot.

@puyuan1996
Collaborator

  • Hello, our current Sampled EfficientZero implementation is essentially Sampled MuZero plus the self-supervised (SSL) consistency loss and the value prefix prediction proposed in EfficientZero. In our paper, we chose not to compare against the original Sampled MuZero, because the SSL loss contributes significantly to sample efficiency in the Atari environments.

  • If you need the original Sampled MuZero, you would have to disable these two enhancements (SSL loss and value prefix prediction) in the current Sampled EfficientZero implementation. The first is simple: set the SSL loss weight to zero. The second, replacing value prefix prediction with the per-step reward prediction used by the original Sampled MuZero, may require a fair amount of work.

  • If you do need a Sampled MuZero implementation, we welcome your contribution and are ready to help as much as we can (P.S. we should still have an implementation of the original Sampled MuZero in an older private backup branch). Best wishes!
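The two changes described above (zeroing the SSL term and going from value prefix back to per-step reward) can be illustrated with a small plain-Python sketch. This is not LightZero code: `LossWeights`, `total_loss`, `rewards_to_value_prefix`, `value_prefix_to_rewards`, the `horizon` reset length, and the default weights are all hypothetical names and illustrative values. The value prefix is modeled as a running sum of rewards along the unrolled trajectory that restarts every `horizon` steps, which is a simplification of the EfficientZero scheme.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LossWeights:
    # Hypothetical, illustrative weights; not LightZero's actual config keys or values.
    value: float = 0.25
    reward: float = 1.0
    policy: float = 1.0
    ssl: float = 2.0


def total_loss(value_loss: float, reward_loss: float, policy_loss: float,
               ssl_loss: float, w: LossWeights) -> float:
    """Weighted sum of per-component losses.

    Setting w.ssl to 0 removes the self-supervised consistency term entirely,
    which is the first step toward recovering plain Sampled MuZero.
    """
    return (w.value * value_loss + w.reward * reward_loss
            + w.policy * policy_loss + w.ssl * ssl_loss)


def rewards_to_value_prefix(rewards: List[float], horizon: int) -> List[float]:
    """Simplified value prefix: running sum of rewards, reset every `horizon` steps."""
    prefixes: List[float] = []
    running = 0.0
    for k, r in enumerate(rewards):
        if k % horizon == 0:
            running = 0.0  # start a new accumulation window
        running += r
        prefixes.append(running)
    return prefixes


def value_prefix_to_rewards(prefixes: List[float], horizon: int) -> List[float]:
    """Recover per-step rewards (what Sampled MuZero predicts directly)
    by differencing consecutive prefixes, restarting at each reset."""
    rewards: List[float] = []
    for k, p in enumerate(prefixes):
        prev = 0.0 if k % horizon == 0 else prefixes[k - 1]
        rewards.append(p - prev)
    return rewards


if __name__ == "__main__":
    rs = [1.0, 0.0, 2.0, 1.0, 3.0]
    vp = rewards_to_value_prefix(rs, horizon=2)
    print(vp)                                      # running sums, reset every 2 steps
    print(value_prefix_to_rewards(vp, horizon=2))  # recovers rs
    print(total_loss(1.0, 1.0, 1.0, 5.0, LossWeights(ssl=0.0)))  # SSL term ignored
```

The asymmetry between the two changes is visible here: dropping the SSL loss is a single weight, whereas switching prediction targets changes what the dynamics network learns and how the search consumes its output, which is why the second change touches more of the code.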

puyuan1996 added the labels good first issue (Good for newcomers) and algorithm (New algorithm) on Dec 5, 2023
@WilliamYangXu
Author

WilliamYangXu commented Dec 5, 2023

Hi, and thanks for your quick reply. I am happy to contribute an implementation of Sampled MuZero if you can share your original version from the backup branch. You can reach me through the email in my GitHub profile. Thanks a lot.

@puyuan1996
Collaborator

Hello, thank you for your enthusiasm. Unfortunately, we have been unable to locate the backup branch, and even if we could, it contains only a draft implementation without any optimizations. A better approach might therefore be to create a new branch from the main branch and make the modifications outlined above. We appreciate your support. Best wishes!
