[action_mask error] #158
Hello, I understand your concerns. In previous versions, we did not specifically test for scenarios where the … In our latest PR #160, we have implemented and optimized this adjustment. We warmly invite you to review and test these modifications. Thank you for your valuable feedback; it is greatly appreciated and beneficial for the advancement of LightZero. Best wishes!
Hi, I ran the following file to test: … I set the policy dict as: … and set … If I didn't set action_type = 'varied_action_space', the error mentioned above still occurred. After setting action_type = 'varied_action_space', the error disappeared, and the reward does increase as the number of training steps increases. But one weird thing remains: the completed value stays at -inf throughout the training process, as the collector log shows:
[12-04 22:47:17] INFO collect end: muzero_collector.py:729
I don't know whether this is an issue. Today, before you uploaded the pull request with the fix, I was checking the same place and trying to fix this bug. I did exactly the same thing as you did, but I just used the "else" branch to run the code. Thanks for your reply!!
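One plausible reading of the -inf completed value, offered as an assumption rather than a diagnosis of LightZero's code: if the completed Q-values of masked actions are filled with -inf so they can never be selected, any statistic averaged over the full action vector also becomes -inf, while quantities computed over legal actions only stay finite. The minimal NumPy sketch below uses hypothetical values and variable names to illustrate this.

```python
import numpy as np

# Hypothetical completed Q-values for 5 actions; actions 1, 3 and 4 are
# illegal and were filled with -inf so they can never be chosen.
completed_q = np.array([0.4, -np.inf, 0.1, -np.inf, -np.inf])
action_mask = np.array([1, 0, 1, 0, 0], dtype=bool)

# Averaging over every action lets the -inf entries dominate.
naive_mean = completed_q.mean()
# Averaging over legal actions only gives a finite value.
legal_mean = completed_q[action_mask].mean()

# A softmax over the masked values is still well defined, because
# exp(-inf) == 0 and at least one action is legal. Subtracting the
# largest legal value keeps the exponentials numerically stable.
logits = completed_q - completed_q[action_mask].max()
probs = np.exp(logits) / np.exp(logits).sum()

print(naive_mean, legal_mean)
print(probs)
```

Under this assumption, a -inf value in the logs would be a reporting artifact rather than a training failure, which would be consistent with the reward still increasing.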
Thank you for your feedback.
Best wishes!
Hi, I've run it under the default config with … and I guess that you didn't add … Thanks!
Hello, indeed, after following your modifications, we did encounter this issue. We are currently investigating the cause and searching for a solution. Thank you for your patience and feedback.
Hi, this problem occurs because masked actions were not handled properly during Gumbel MuZero collection. The error has now been fixed in #178. We welcome you to review the change and test whether it resolves the problem you were facing. Please let us know if you have any other questions or feedback. Thank you for reporting this issue!
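For readers unfamiliar with how an action mask interacts with Gumbel MuZero's root action selection, here is a minimal sketch of the Gumbel-top-k trick with masking: Gumbel noise is added to the policy logits, illegal actions are forced to -inf, and the k highest scores are kept. The function name `masked_gumbel_top_k` and its arguments are hypothetical; this is not the code from #178, only an illustration of why unmasked illegal actions could otherwise be sampled.

```python
import numpy as np


def masked_gumbel_top_k(logits, action_mask, k, rng):
    """Sample up to k distinct legal actions with the Gumbel-top-k trick.

    Illegal actions get a score of -inf, so they are never selected,
    and k is capped at the number of legal actions.
    """
    logits = np.asarray(logits, dtype=np.float64)
    action_mask = np.asarray(action_mask, dtype=bool)
    gumbel = rng.gumbel(size=logits.shape)
    scores = np.where(action_mask, logits + gumbel, -np.inf)
    k = min(k, int(action_mask.sum()))
    # Sorting -scores ascending ranks scores descending; -inf ends up last.
    return np.argsort(-scores)[:k]


rng = np.random.default_rng(0)
# Action 1 is masked out; asking for 8 actions returns only the 3 legal ones.
chosen = masked_gumbel_top_k([0.1, 2.0, 0.3, 1.5], [1, 0, 1, 1], k=8, rng=rng)
print(chosen)
```

Capping k matters when the number of considered actions is fixed in the config but the legal action set varies per state.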
Any game whose "action_mask" is not all ones (for example, one set when creating the BaseEnv: …) results in the following error:
Traceback (most recent call last):
  File "./zoo/custom/pkgir/config/pjk_disc_gumbel_muzero_config.py", line 93, in <module>
    train_muzero([main_config, create_config], seed=0, max_env_step=max_env_step)
  File "/home/LightZero-main/lzero/entry/train_muzero.py", line 174, in train_muzero
    train_data = replay_buffer.sample(batch_size, policy)
  File "/home/LightZero-main/lzero/mcts/buffer/game_buffer_muzero.py", line 76, in sample
    batch_target_policies_non_re = self._compute_target_policy_non_reanalyzed(
  File "/home/LightZero-main/lzero/mcts/buffer/game_buffer_muzero.py", line 681, in _compute_target_policy_non_reanalyzed
    batch_target_policies_non_re = np.asarray(batch_target_policies_non_re)
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 2 dimensions. The detected shape was (128, 6) + inhomogeneous part.
Exception ignored in: <function MuZeroEvaluator.__del__ at 0x7f8bebff93a0>
After reading the code in game_buffer_muzero.py around line 661, I found that when
if self._cfg.env_type == 'not_board_games':
is true, legal_actions is not processed, whereas in the board-game case the legal actions are processed.
So I guess action_mask is not supported in the not_board_games scenario?
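If the non-board-game branch stores visit distributions only over legal actions, one way to keep the batch rectangular is to scatter each distribution back into a vector of length action_space_size, with zeros for masked actions. This is a sketch under that assumption: `pad_policy_to_action_space` is a hypothetical helper, not LightZero's API, and the maintainers' actual fix is in PR #160.

```python
import numpy as np


def pad_policy_to_action_space(visit_dist, action_mask):
    """Scatter a distribution over legal actions into a full-size vector.

    visit_dist: probabilities ordered like the legal actions (mask == 1).
    action_mask: 0/1 array of length action_space_size.
    Illegal actions receive probability 0.
    """
    action_mask = np.asarray(action_mask)
    legal_actions = np.flatnonzero(action_mask)
    if len(visit_dist) != len(legal_actions):
        raise ValueError("visit_dist must have one entry per legal action")
    full = np.zeros(action_mask.shape[0], dtype=np.float32)
    full[legal_actions] = visit_dist
    return full


mask = np.array([1, 0, 1, 0, 0])
padded = pad_policy_to_action_space([0.7, 0.3], mask)
other = pad_policy_to_action_space([1.0, 0.0], mask)
# Every padded row has length 5, so stacking now yields a regular array.
batch = np.asarray([padded, other])
print(padded)
print(batch.shape)
```

Because each padded row has the same length, `np.asarray` on the batch yields a regular `(batch, action_space_size)` array, which is what the board-game branch already produces.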