
[action_mask error] #158

Closed · lewis841214 opened this issue Dec 4, 2023 · 6 comments

@lewis841214

Any game that sets the "action_mask" to something other than all ones, for example when creating the BaseEnv:

    if not self._continuous:
        action_mask = np.ones(self.discrete_action_num, 'int8')
        # Here I set action 2 to be invalid (guarded so the continuous
        # case, where action_mask is None, is untouched):
        action_mask[2] = 0
    else:
        action_mask = None

    obs = {'observation': obs, 'action_mask': action_mask, 'to_play': -1}
    return BaseEnvTimestep(obs, rew, done, info)

will result in the following error:

    Traceback (most recent call last):
      File "./zoo/custom/pkgir/config/pjk_disc_gumbel_muzero_config.py", line 93, in <module>
        train_muzero([main_config, create_config], seed=0, max_env_step=max_env_step)
      File "/home/LightZero-main/lzero/entry/train_muzero.py", line 174, in train_muzero
        train_data = replay_buffer.sample(batch_size, policy)
      File "/home/LightZero-main/lzero/mcts/buffer/game_buffer_muzero.py", line 76, in sample
        batch_target_policies_non_re = self._compute_target_policy_non_reanalyzed(
      File "/home/LightZero-main/lzero/mcts/buffer/game_buffer_muzero.py", line 681, in _compute_target_policy_non_reanalyzed
        batch_target_policies_non_re = np.asarray(batch_target_policies_non_re)
    ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 2 dimensions. The detected shape was (128, 6) + inhomogeneous part.
    Exception ignored in: <function MuZeroEvaluator.__del__ at 0x7f8bebff93a0>

After reading the code in game_buffer_muzero.py around line 661, I found that in the branch

    if self._cfg.env_type == 'not_board_games':

the legal_actions aren't processed, whereas in the board-game case they are.

So I guess the action_mask isn't supported for the not_board_games scenario?
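To make the shape mismatch concrete, here is a minimal standalone sketch (illustrative only, not LightZero's actual code) of the difference between the two branches:

    import numpy as np

    # Board-game branch: each visit-count distribution is re-expanded onto the
    # full action space via legal_actions, so every row ends up the same length.
    full_action_space = 4
    legal_actions = [0, 1, 3]          # action 2 masked out
    visit_counts = [10, 5, 5]          # one count per legal action
    row = np.zeros(full_action_space)
    row[legal_actions] = np.array(visit_counts) / sum(visit_counts)
    print(row)                         # [0.5  0.25 0.   0.25] -> fixed length

    # not_board_games branch (before the fix): raw distributions of varying
    # lengths are appended as-is, giving a ragged list that np.asarray rejects
    # on recent NumPy with exactly the "inhomogeneous shape" ValueError above.
    ragged = [[0.5, 0.25, 0.25], [0.25, 0.25, 0.25, 0.25]]
    np.asarray(ragged)                 # ValueError on NumPy >= 1.24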

@puyuan1996
Collaborator

Hello, I understand your concerns. In previous versions, we did not specifically test scenarios where the action_mask in not_board_games contains zeros. However, theoretically, our handling of variable action spaces should be extendable to not_board_games. Therefore, we have proposed splitting the original env_type into two variables: env_type and action_type.

In our latest PR #160, we have implemented and optimized this adjustment. We warmly invite you to review and test these modifications. Thank you for your valuable feedback; it is greatly appreciated and beneficial for the advancement of LightZero. Best wishes!

@puyuan1996 puyuan1996 added the enhancement New feature or request label Dec 4, 2023
@lewis841214
Author

Hi, I ran the following file to test:

    python3 ./zoo/box2d/lunarlander/config/lunarlander_disc_gumbel_muzero_config.py

I set the policy dict as:

    env_type='not_board_games',
    action_type='varied_action_space',

and set action_mask[0] = 0 in the env file.

If I don't set action_type='varied_action_space', the error mentioned above still occurs. But after setting it, the error disappeared, and the reward does increase as the training steps increase.
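For context, the two fields sit in the config roughly like this (a paraphrased excerpt, not the full stock config; the surrounding variable name is assumed from the filename and may differ):

    lunarlander_disc_gumbel_muzero_config = dict(
        # ... other top-level fields unchanged from the default config ...
        policy=dict(
            # ... other policy fields unchanged ...
            env_type='not_board_games',
            # Without the next line, the inhomogeneous-shape error reappears:
            action_type='varied_action_space',
        ),
    )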

But one weird part remains: the completed value becomes -inf throughout the training process, as:

    [12-04 22:47:17] INFO collect end:                        muzero_collector.py:729
        episode_count: 16
        envstep_count: 1248
        avg_envstep_per_episode: 78.0
        avg_envstep_per_sec: 224.66893464049363
        avg_episode_per_sec: 2.8803709569294056
        collect_time: 5.554840067217127
        reward_mean: -373.18820628837983
        reward_std: 212.4814746359075
        reward_max: -135.52152484840005
        reward_min: -788.0163128857876
        total_envstep_count: 1248
        total_episode_count: 32
        total_duration: 14.115308258408682
        visit_entropy: 1.3028649394086207
        completed_value: -inf
    [12-04 22:47:17] WARNING NaN or Inf found in input tensor.              x2num.py:14
    [12-04 22:47:17] WARNING NaN or Inf found in input tensor.

I don't know whether this is an issue?

By the way, I have a question:

    (screenshot: the else branch around line 661 of game_buffer_muzero.py)

Today, before you uploaded the fix PR, I was checking the same place and trying to fix this bug. I did exactly the same thing as you did, except I just used the "else" part to run the code. But the weird thing is: the code inside the else branch in the screenshot is independent of state_index and current_index, so does it just keep producing the same thing? I.e., does the variable "target_policies" keep appending the same thing?

Thanks for your reply!!

@puyuan1996
Collaborator

Thank you for your feedback.

  • Regarding the issue of encountering completed_value: -inf when running lunarlander_disc_gumbel_muzero_config.py, I would like to confirm, did you only use the default configuration and make no additional modifications? Did this problem arise at the very beginning of the program execution? On my macOS system, I executed 30K environment steps and did not encounter a similar issue. In order to pinpoint the problem more accurately, please provide more detailed information.

  • About your observation that the code segment does not use state_index and current_index: this is because our goal here is to transform the visit-count distribution obtained from the MCTS search into target_policies that comply with a specific data format. This is mainly accomplished through distributions = roots_distributions[policy_index] and policy_index += 1 (a sketch follows below). I acknowledge that there is redundancy in this section of the code and that more efficient implementations exist; we will optimize it in the coming weeks. I greatly appreciate your valuable suggestion.
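For illustration, a minimal sketch of that mechanism (hypothetical and simplified; the real loop in game_buffer_muzero.py also handles padding and masking):

    policy_index = 0
    target_policies = []
    for state_index in pos_in_game_segment_list:      # one entry per sampled position
        for current_index in range(state_index, state_index + num_unroll_steps + 1):
            # A different search result is consumed on every iteration via
            # policy_index, so the appended targets do vary even though the
            # branch body never reads state_index or current_index directly.
            distributions = roots_distributions[policy_index]
            policy_index += 1
            total = sum(distributions)
            target_policies.append([visit / total for visit in distributions])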

Best wishes!

@puyuan1996 puyuan1996 added the config New or improved configuration label Dec 5, 2023
@lewis841214
Author

Hi, I've run lunarlander_disc_gumbel_muzero_config.py under the default config, with

    action_type='varied_action_space',

added at line 43, and

    action_mask[0] = 0

added at line 139 of LightZero-fix-action-mask/zoo/box2d/lunarlander/envs/lunarlander_env.py, and completed_value: -inf occurs.

I guess that you didn't add action_mask[0] = 0 in the env file, so all elements of action_mask equal 1. If I don't put action_mask[0] = 0 into the env file, the error doesn't occur either, but that's not what we want, right? This enhancement was created precisely for the case where some action is masked as 0.

Thanks!

@puyuan1996
Collaborator

Hello, indeed, after following your modifications, we did encounter this issue. We are currently investigating the cause and searching for a solution. Thank you for your patience and feedback.

@karroyan
Collaborator

Hi, this problem occurred because masked actions were not handled properly in Gumbel MuZero collecting. The error has now been fixed in #178. We welcome you to review the change and test whether it resolves the problem you were facing. Please let us know if you have any other questions or feedback. Thank you for reporting this issue!
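For readers who hit the same symptom, here is a minimal illustrative sketch (this is not the actual #178 change; gumbel_top_k and its signature are made up here) of the usual way to respect an action mask during Gumbel top-k sampling:

    import numpy as np

    def gumbel_top_k(logits, action_mask, k, rng=None):
        # Push illegal logits to -inf so they can never win the top-k draw.
        rng = rng or np.random.default_rng()
        masked_logits = np.where(action_mask == 1, logits, -np.inf)
        scores = masked_logits + rng.gumbel(size=logits.shape)  # -inf stays -inf
        k = min(k, int(action_mask.sum()))    # never request more than the legal count
        return np.argsort(scores)[::-1][:k]   # indices of the k best legal actions

    logits = np.array([0.2, 1.5, -0.3, 0.7])
    mask = np.array([0, 1, 1, 1])             # action 0 illegal, as in this issue
    print(gumbel_top_k(logits, mask, k=3))    # e.g. [1 3 2]; never contains 0

If a downstream statistic is averaged over all actions, including entries forced to -inf, the -inf leaks into the logs; that is consistent with (though not necessarily the cause of) the completed_value: -inf observed above.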
