Sampled MuZero and Sampled EfficientZero #162

WilliamYangXu · 2023-12-05T04:30:32Z

Hi, thanks for the awesome implementations.

The names "Sampled MuZero" and "Sampled EfficientZero" seem to be used interchangeably, and I wonder whether they refer to the same algorithm. If so, do they refer to the continuous action space version of MuZero (proposed in https://arxiv.org/pdf/2104.06303.pdf), or to the continuous action space version of EfficientZero? If not, I cannot find the Sampled MuZero algorithm in the lzero folder. Where can I find it, given that your paper mentions it is implemented? Thanks a lot.

puyuan1996 · 2023-12-05T14:23:56Z

Hello, our current implementation, Sampled EfficientZero, is essentially Sampled MuZero plus the SSL loss and value prefix prediction proposed in EfficientZero. In our paper, we chose not to compare against the original version of Sampled MuZero, because the SSL loss contributes significantly to sample efficiency in the Atari environments.
If you need the original version of Sampled MuZero, you would need to disable these two enhancements (SSL loss and value prefix prediction) in the current Sampled EfficientZero implementation. For the first, you can simply set the SSL loss weight to zero here. Note, however, that the second change, reverting the value prefix prediction to the original reward prediction of Sampled MuZero, might require a fair amount of work.
If you do need an implementation of Sampled MuZero, we welcome your contribution and are ready to provide as much assistance as possible (P.S. we should have an implementation of the original Sampled MuZero in an older private backup branch). Best wishes!
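
For readers following along, the two changes described above can be sketched roughly as follows. This is only an illustration, not the repository's code: the function names and the `ssl_loss_weight` argument are hypothetical, and the value prefix is assumed here to be an undiscounted running sum of rewards over the unroll steps.

```python
from itertools import accumulate
from typing import List, Sequence


def weighted_total_loss(policy_loss: float, value_loss: float,
                        reward_loss: float, ssl_loss: float,
                        ssl_loss_weight: float) -> float:
    """Combine the loss terms; a weight of zero drops the SSL term (hypothetical helper)."""
    return policy_loss + value_loss + reward_loss + ssl_loss_weight * ssl_loss


def value_prefix_targets(rewards: Sequence[float]) -> List[float]:
    """EfficientZero-style target: running sum of rewards (assumed undiscounted here)."""
    return list(accumulate(rewards))


def reward_targets(prefixes: Sequence[float]) -> List[float]:
    """MuZero-style target: per-step rewards recovered by differencing the prefixes."""
    previous = [0.0, *prefixes[:-1]]
    return [p - q for p, q in zip(prefixes, previous)]
```

Setting the weight to zero only changes the loss; switching the targets also means the prediction head has to output a per-step reward instead of a prefix, which is likely where most of the work lies.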

WilliamYangXu · 2023-12-05T19:02:31Z

Hi, and thanks for your quick reply. I am happy to contribute to the implementation of Sampled MuZero if you can provide your original version from the backup branch. You can reach me through the email in my GitHub profile. Thanks a lot.

puyuan1996 · 2023-12-06T06:55:19Z

Hello, thank you for your enthusiasm. Unfortunately, it seems we are unable to locate the backup branch, and even if we could, it contains only a draft version of the implementation without any optimizations. Therefore, a better approach might be to create a new branch directly from the main branch and undertake modifications as per the steps we've outlined above. We appreciate your support. Best wishes!

puyuan1996 added the "good first issue" (Good for newcomers) and "algorithm" (New algorithm) labels Dec 5, 2023

puyuan1996 closed this as completed Dec 26, 2023

puyuan1996 mentioned this issue Apr 10, 2024

About Replicating SampledZero Performance in the Hopper-V3 Environment #210

Open
