Consider using qtransform_completed_by_mix_value. #1

fidlej · 2022-06-15T22:51:53Z

Thanks for the nice project.
Have you tried using the default qtransform_completed_by_mix_value for the gumbel_muzero_policy?

qtransform_by_min_max assigns a value of zero to unvisited actions, which does not have a good theoretical justification.
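
To make the difference concrete, here is a minimal NumPy sketch. It is not the mctx implementation: it assumes the completed Q-values follow the mixed-value formula from the Gumbel MuZero paper, and it omits the rescaling that the library's qtransform applies afterwards. The function names and the numbers are made up for illustration.

```python
import numpy as np


def min_max_transform(qvalues, visits, min_value, max_value):
    """Min-max normalization; unvisited actions end up with a score of 0."""
    # Unvisited actions are filled with min_value, which normalizes to 0.
    filled = np.where(visits > 0, qvalues, min_value)
    return (filled - min_value) / (max_value - min_value)


def mixed_value(raw_value, qvalues, visits, prior_probs):
    """Mixes the network value with the prior-weighted Q of visited actions."""
    visited = visits > 0
    total_visits = visits.sum()
    if total_visits == 0:
        # Nothing visited yet: fall back to the network value alone.
        return raw_value
    # Prior-weighted average Q-value over the visited actions only.
    weighted_q = (prior_probs[visited] * qvalues[visited]).sum() / prior_probs[visited].sum()
    return (raw_value + total_visits * weighted_q) / (total_visits + 1)


def completed_qvalues(raw_value, qvalues, visits, prior_probs):
    """Keeps Q for visited actions; fills unvisited ones with the mixed value."""
    v_mix = mixed_value(raw_value, qvalues, visits, prior_probs)
    return np.where(visits > 0, qvalues, v_mix)


# Hypothetical node with three actions; action 1 was never visited.
qvalues = np.array([0.6, 0.0, 0.2])
visits = np.array([3, 0, 1])
prior_probs = np.array([0.5, 0.3, 0.2])
raw_value = 0.4  # value predicted by the network for this node

by_min_max = min_max_transform(qvalues, visits, min_value=0.0, max_value=1.0)
completed = completed_qvalues(raw_value, qvalues, visits, prior_probs)

print(by_min_max)  # action 1 scores 0, the worst possible value
print(completed)   # action 1 gets the mixed value, between the visited Q-values
```

With min-max normalization the unvisited action is treated as the worst possible action, while the completed version gives it an estimate in line with the rest of the node.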

NTT123 · 2022-06-16T02:30:16Z

Hi @fidlej, thanks for informing me about this issue. I will take a deeper look into these transforms.

NTT123 · 2022-06-16T11:36:40Z

The agent is much stronger when training with qtransform_completed_by_mix_value. Thank you again for your suggestion!

NTT123 added a commit that referenced this issue Jun 16, 2022

Use a better qtransform as suggested in #1

486bce5

NTT123 closed this as completed Jun 16, 2022

onegigbyte mentioned this issue Nov 25, 2022

Killed unexpectedly in Colab with TPU #7

Closed
