fix(lisong): fix icm/rnd+onppo config bugs and app_key env bugs #564

Merged: 25 commits, Mar 5, 2023

Contributor

@song2181 song2181 commented Dec 28, 2022

Description

Fix the max_envstep setting in the configs. Fix the MiniGrid-AKTDT-7x7-1-v0 bug.

Related Issue

TODO

Check List

  • merge the latest version source branch/repo, and resolve all the conflicts
  • pass style check
  • pass all the tests

@song2181 song2181 changed the title Dev icm onppo fix(lisong): fix icm/rnd+onppo config bugs and app_key env bugs Dec 28, 2022
@PaParaZz1 PaParaZz1 added the bug (Something isn't working) and env (Questions about RL environment) labels Dec 28, 2022
dizoo/minigrid/__init__.py (resolved)
dizoo/minigrid/config/minigrid_icm_offppo_config.py (outdated, resolved)
dizoo/minigrid/config/minigrid_icm_offppo_config.py (outdated, resolved)
@@ -185,7 +186,7 @@ def step(self, action):

obs = self.gen_obs()

-return obs, reward, done, {}
+return obs, reward, done, done, {}
Member

Why are there two `done` values here?

Contributor Author

Because minigrid_env.py uses the step method of the Gymnasium env, whose return value is (observation, reward, terminated, truncated, info):
obs, rew, done, _, info = self._env.step(action)

Member

Please add a comment in the code explaining this.
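
The five-value return discussed in this thread can be sketched as follows. This is only an illustration, not the code from this PR: `StubEnv` and `step_compat` are hypothetical names, and folding `terminated` and `truncated` into a single `done` flag is one common convention that assumes the caller does not need to tell the two apart.

```python
from typing import Any, Dict, Tuple


class StubEnv:
    """Hypothetical stand-in for an env whose step() follows the Gymnasium 5-tuple API."""

    def __init__(self, max_steps: int = 3) -> None:
        self.max_steps = max_steps
        self.step_count = 0

    def step(self, action: int) -> Tuple[int, float, bool, bool, Dict[str, Any]]:
        self.step_count += 1
        obs = self.step_count
        reward = 0.0
        terminated = False  # this stub never reaches a terminal state
        truncated = self.step_count >= self.max_steps  # step limit reached
        return obs, reward, terminated, truncated, {}


def step_compat(env: StubEnv, action: int) -> Tuple[int, float, bool, Dict[str, Any]]:
    """Collapse the 5-tuple into the older (obs, reward, done, info) shape."""
    obs, reward, terminated, truncated, info = env.step(action)
    done = terminated or truncated  # assumption: the caller only needs one flag
    return obs, reward, done, info
```

A caller that unpacks five values, as in the line quoted above, needs every env it wraps to return five values, which is why a stub that still returns four would fail to unpack.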

setup.py (outdated, resolved)
ding/reward_model/rnd_reward_model.py (resolved)
ding/reward_model/icm_reward_model.py (outdated, resolved)
ding/reward_model/icm_reward_model.py (outdated, resolved)
ding/reward_model/icm_reward_model.py (resolved)
self.tb_logger.add_scalar('icm_reward/icm_reward_mean', icm_reward.mean(), self.estimate_cnt_icm)
self.tb_logger.add_scalar('icm_reward/icm_reward_min', icm_reward.min(), self.estimate_cnt_icm)
self.tb_logger.add_scalar('icm_reward/icm_reward_std', icm_reward.std(), self.estimate_cnt_icm)
icm_reward = (raw_icm_reward - raw_icm_reward.min()) / (raw_icm_reward.max() - raw_icm_reward.min() + 1e-8)
Member

Why is the reward normalized twice here?
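
To make the question concrete, here is a small sketch of what a second min-max pass does. It assumes both passes use the same formula as the line under review (with the same `1e-8` epsilon); the thread does not show the other normalization, so that is an assumption, and `min_max` is a hypothetical helper written with plain lists rather than tensors.

```python
from typing import List


def min_max(values: List[float], eps: float = 1e-8) -> List[float]:
    """Rescale values to roughly [0, 1] using (x - min) / (max - min + eps)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo + eps) for v in values]


raw = [0.2, 1.5, 3.0]
once = min_max(raw)
twice = min_max(once)

# Largest elementwise change introduced by the second pass.
max_diff = max(abs(a - b) for a, b in zip(once, twice))
```

Under that assumption the second pass changes the values only by an amount on the order of the epsilon, so it adds cost without changing the result in any meaningful way.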

dizoo/minigrid/__init__.py (resolved)
@@ -171,8 +174,12 @@ def __init__(self, config: EasyDict, device: str, tb_logger: 'SummaryWriter') ->
self.ce = nn.CrossEntropyLoss(reduction="mean")
self.forward_mse = nn.MSELoss(reduction='none')
self.reverse_scale = config.reverse_scale
self.res = nn.Softmax(dim=-1)
Member

Why use softmax here if we only need to pick the action with an argmax operation?
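
The point behind this question can be checked with a short sketch: softmax is strictly increasing in each logit, so it never changes which index is largest. The `softmax` and `argmax` helpers below are hypothetical, written with plain Python lists for illustration, and the logits are made-up values rather than anything from the reward model.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a flat list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def argmax(values: List[float]) -> int:
    """Index of the largest value (first one wins on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


logits = [0.3, 2.1, -1.0, 1.7]
probs = softmax(logits)

# Both calls select the same index, so softmax is redundant before argmax.
same_choice = argmax(logits) == argmax(probs)
```

Softmax would still be needed if the probabilities themselves are used, for example to sample an action or to log a distribution; whether that applies here is not stated in the thread.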

value_weight=0.5,
entropy_weight=0.001,
clip_ratio=0.2,
-adv_norm=False,
+adv_norm=True,
+value_norm=True,
Member

offppo doesn't have a value_norm option.

@@ -151,6 +151,9 @@ class ICMRewardModel(BaseRewardModel):
update_per_collect=100,
# (float) the importance weight of the forward and reverse loss
reverse_scale=1,
intrinsic_reward_weight=0.003, # 1/300
Member

Add a comment for each field in the default config.

@PaParaZz1 PaParaZz1 merged commit 072370a into opendilab:main Mar 5, 2023
3 participants