
[bug] HER not working in environments that have max_step = 1 #578

Closed
renan-cunha opened this issue Nov 23, 2019 · 1 comment
Labels
custom gym env Issue related to Custom Gym Env

Comments

@renan-cunha

Describe the bug
HER is not working in environments that have a max step count equal to 1, for example a modified version of BitFlipping where the agent can take only one action per episode. The desired goal passed to the "compute_reward()" method never changes. I think HER should sometimes use the achieved_goal as the desired one, so it can compute an informative reward and the agent can learn something. The behavior is fine with the default version of BitFlipping.

[screenshot: desired goal used]

Code example
[link: code to run]

Modified part of BitFlipping: the 'step' method always returns True as the 'done' variable.

    def step(self, action):
        if self.continuous:
            self.state[action > 0] = 1 - self.state[action > 0]
        else:
            self.state[action] = 1 - self.state[action]
        obs = self._get_obs()
        reward = self.compute_reward(obs['achieved_goal'], obs['desired_goal'], None)
        done = reward == 0
        self.current_step += 1
        # The episode terminates when we reach the goal or the max number of steps
        info = {'is_success': done}
        done = done or self.current_step >= self.max_steps
        # 'done' is hard-coded to True so that every episode lasts exactly one step
        return obs, reward, True, info
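
A rough sketch of the relabeling I have in mind (a hypothetical helper, not stable-baselines code): replace the stored desired goal with the goal that was actually achieved, so compute_reward can return a success signal even in a one-step episode.

    def relabel_with_achieved_goal(env, obs, action, next_obs):
        # Hypothetical HER-style relabeling: pretend the achieved goal was
        # the desired goal all along, then recompute the reward under it.
        new_goal = next_obs['achieved_goal']
        new_reward = env.compute_reward(next_obs['achieved_goal'], new_goal, None)
        # For BitFlipping's sparse reward this is now 0 (success), so the
        # agent gets a useful learning signal even with max_steps == 1.
        return obs, action, new_reward, next_obs, new_goal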

System Info
Describe the characteristic of your environment:

  • lib installed by pip
  • using CPU
  • python 3.6.8
  • tensorflow 1.14.1
  • gym 0.15.4
  • stable baselines 2.7.0
@araffin
Collaborator

araffin commented Feb 2, 2020

Hello,
Thanks for reporting the issue. After looking more closely, I think this is normal for the future strategy.
As mentioned in the code:

    # We cannot sample a goal from the future in the last step of an episode
    if (transition_idx == len(self.episode_transitions) - 1 and
            self.goal_selection_strategy == GoalSelectionStrategy.FUTURE):
        break
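
To make that concrete, here is a minimal sketch (using a hypothetical one-step episode_transitions list, not the actual replay-buffer code) showing that the condition above fires on the only transition, so no hindsight goal is ever sampled:

    episode_transitions = [('obs_0', 'action_0', 'reward_0', 'next_obs_0')]

    hindsight_goals = []
    for transition_idx, transition in enumerate(episode_transitions):
        # In a 1-step episode the only transition is also the last one,
        # so the 'future' strategy breaks out before sampling anything.
        if transition_idx == len(episode_transitions) - 1:
            break
        hindsight_goals.append(transition)

    print(hindsight_goals)  # [] -> no relabeled transitions are created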

However, using the final strategy, it samples different goals but does not seem to learn much...
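
For reference, a minimal sketch (assumed transition layout, not the library's code) of why the final strategy can still relabel a one-step episode: the achieved goal of the last transition exists even when that transition is the only one.

    def final_strategy_goal(episode_transitions):
        # The 'final' strategy relabels with the achieved goal of the
        # last transition, which a 1-step episode always has.
        _, _, _, last_next_obs = episode_transitions[-1]
        return last_next_obs['achieved_goal']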

EDIT: it is normal that it cannot succeed every time when N_BITS > n_steps, because the agent does not have enough time to flip all the bits.

araffin closed this as completed May 9, 2020