
[bug] HER not working in environments that have max_step = 1 #578

Closed
renan-cunha opened this issue Nov 23, 2019 · 1 comment
Labels
custom gym env Issue related to Custom Gym Env

Comments

@renan-cunha

Describe the bug
HER is not working in environments that have a max step count equal to 1, for example a modified version of BitFlipping where the agent can take only one action per episode. The desired goal passed to the "compute_reward()" method never changes. I think HER should sometimes use the achieved_goal as the desired one, so it can compute an informative reward and the agent can learn something. The behavior is fine with the default version of BitFlipping.

[screenshot: desired goal used]

Code example
[link: code to run]

Modified part of BitFlipping: the 'step' method always returns True as the 'done' variable.

    def step(self, action):
        if self.continuous:
            self.state[action > 0] = 1 - self.state[action > 0]
        else:
            self.state[action] = 1 - self.state[action]
        obs = self._get_obs()
        reward = self.compute_reward(obs['achieved_goal'], obs['desired_goal'], None)
        done = reward == 0
        self.current_step += 1
        # The episode terminates when we reach the goal or the max number of steps
        info = {'is_success': done}
        done = done or self.current_step >= self.max_steps
        # 'done' is hard-coded to True so that every episode lasts exactly one step
        return obs, reward, True, info
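
A rough sketch of the relabeling I have in mind (a hypothetical helper, not stable-baselines code): replace the stored desired goal with the goal that was actually achieved, so compute_reward can return a success signal even in a one-step episode.

    def relabel_with_achieved_goal(env, obs, action, next_obs):
        # Hypothetical HER-style relabeling: pretend the achieved goal was
        # the desired goal all along, then recompute the reward under it.
        new_goal = next_obs['achieved_goal']
        new_reward = env.compute_reward(next_obs['achieved_goal'], new_goal, None)
        # For BitFlipping's sparse reward this is now 0 (success), so the
        # agent gets a useful learning signal even with max_steps == 1.
        return obs, action, new_reward, next_obs, new_goal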

System Info
Describe the characteristic of your environment:

  • lib installed by pip
  • using CPU
  • python 3.6.8
  • tensorflow 1.14.1
  • gym 0.15.4
  • stable baselines 2.7.0
@araffin
Collaborator

araffin commented Feb 2, 2020

Hello,
Thanks for reporting the issue. After looking more closely, I think this is normal for the future strategy.
As mentioned in the code:

    # We cannot sample a goal from the future in the last step of an episode
    if (transition_idx == len(self.episode_transitions) - 1 and
            self.goal_selection_strategy == GoalSelectionStrategy.FUTURE):
        break
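
To make that concrete, here is a minimal sketch (using a hypothetical one-step episode_transitions list, not the actual replay-buffer code) showing that the condition above fires on the only transition, so no hindsight goal is ever sampled:

    episode_transitions = [('obs_0', 'action_0', 'reward_0', 'next_obs_0')]

    hindsight_goals = []
    for transition_idx, transition in enumerate(episode_transitions):
        # In a 1-step episode the only transition is also the last one,
        # so the 'future' strategy breaks out before sampling anything.
        if transition_idx == len(episode_transitions) - 1:
            break
        hindsight_goals.append(transition)

    print(hindsight_goals)  # [] -> no relabeled transitions are created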

However, using the final strategy, it samples different goals but does not seem to learn much...
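
For reference, a minimal sketch (assumed transition layout, not the library's code) of why the final strategy can still relabel a one-step episode: the achieved goal of the last transition exists even when that transition is the only one.

    def final_strategy_goal(episode_transitions):
        # The 'final' strategy relabels with the achieved goal of the
        # last transition, which a 1-step episode always has.
        _, _, _, last_next_obs = episode_transitions[-1]
        return last_next_obs['achieved_goal']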

EDIT: it is normal that it cannot succeed every time when N_BITS > n_steps, because the agent does not have enough time to flip all the bits.

araffin closed this as completed May 9, 2020