HER does not converge on simple envs that adhere to GoalEnv interface #428

avaziri · 2018-06-07T22:21:31Z

System information

OS Platform and Distribution: Ubuntu 16.04.3 LTS
Gym version: 0.10.5
Baselines Head Commit- 9cb7ece
Exact command to reproduce: See detailed description

Describe the problem

I have not been able to get HER to converge on any environment other than the Mujoco environments. I have concerns about the robustness and reproducibility of the algorithm. Even with hyperparameter tuning HER fails to converge on all the simple environments I have tried.
Here is a graph of test progress on FetchPush-v1 with command python -m baselines.her.experiment.train --env_name=FetchPush-v1:

Here is a graph of test progress on the much more simple environment I made. I will give the source, and a description of the environment in a follow up post. It was run with command python -m baselines.her.experiment.train --env_name ContinuousGoalOrientedMoveToPoint-v0:

Result: Over 50 epochs it seems any learning is negligible. I am confident the environment is functioning properly. I have confirmed that DDPG can solve this environment very well with less than 20 epochs of training. I have also made a keyboard agent and played this environment myself; observing that everything works as expected.

Hypothesis About Cause of Problem

Perhaps there is something special about Mujoco environments which is not captured in the top level GoalEnv interface (i.e. the problem is in gym)
Perhaps there is a bug in HER that is masked when using Mujoco environment (i.e. the problem is in baselines)

The text was updated successfully, but these errors were encountered:

avaziri · 2018-06-07T22:21:44Z

ContinuousGoalOrientedMoveToPoint-v0 Environment

Env Summary:

The goal of this environment is to move the agent (a red dot) to touch the goal (a greev dot). The agent has momentum and will bounce off of walls if it touches one.

Gif of Environment

Please excuse the inconsistent frame rate, I was controlling the agent with a keyboard and did not hit the keys at a constant frequency.

Env Details

State Space: Positions in a unit square, velocities between -0.05 and +0.05.
Goal Space: Any point in the unit square, re-sampled each episode
Initial State Space: A point in the unit square, re-sampled each episode as well as a velocity within the velocity bounds re-sampled each episode
Action Space: X force component, Y force component, between -1 and 1
Reward: 1 for reaching goal, 0 otherwise
Terminal Condition: When the goal is achieved or when the time limit is reached (time limit comes from wrapper)

Source code

continuous_goal_oriented_particle.py.gz
Make sure to register the environment with a time limit.

from gym.envs.registration import register

def register_move_to_point_env():
	register(
		id='ContinuousGoalOrientedMoveToPoint-v0',
		entry_point='baselines.her.experiment.continuous_goal_oriented_particle:ContinuousGoalOrientedMoveToPoint',
		max_episode_steps=250,
		reward_threshold=1,)

avaziri · 2018-06-07T23:06:49Z

@mandrychowicz and/or other developers
I know it is not your job to solve random environments people propose to you, but I hope you can look into this one. I have spent a lot of time debugging it, and am only asking you after exhausting my own ideas. I feel that making sure this environment works is worth your time. It provides value to users by giving them (1) At least one goal oriented environment that does not require a mujoco license (2) A proof positive that the OpenAI gym GoalEnv interface is sufficient to make your own goal oriented environment.

I would love to use HER and continue to build on it at my company. Unfortunately this simple environment which was meant to be used for unit tests does not want to solve.

avaziri · 2018-06-08T23:43:07Z

Update: My coworker was able to solve the ContinuousGoalOrientedMoveToPoint environment with 100% success rate with TDM, which makes me more confident that this is a problem with Baselines HER and not the environment.

avaziri · 2018-06-11T18:34:57Z

I was unable to get HER to solve the ContinuousMoveToPointEnvironment with hyper-parameter tuning. I manually varied the following one at a time to no effect:

number of layers
hidden network size
critic learning rate
actor learning rate
batch_size
replay_k

Here you can see a graph of the runs which are all noisy measurements with mean ~0.25

iSaran · 2018-09-11T13:52:01Z

@avaziri I also have some problems solving my environments with HER, but still I'm not sure if it's the HER's implementation fault, or mine. Did you have any progress with identifying the problem with your environment?

filipolszewski · 2018-10-17T06:53:38Z

Did you try to use -1 reward for not reaching a goal and 0 reward for reaching it? Also, your environment seems easier then FetchPush - maybe try to use 1 hidden layer network with 20-50 hidden neurons and batch size of 8, maybe 16? Try to run this with HER probability 0.0 as well. I am very interested what will happen.

Did you inspect how the agent moves around the env after several epochs of learning? I would be interested what is his behaviour - is it totally random, does it have any kind of pattern etc.

* Update dependencies * Fix for numpy v1.17.0 * Downgrade pytest version for travis * Update .travis.yml * Rollback to previous docker image * Update .travis.yml * Trying to upgrade pytest * Try to ignore pytest warning * Update setup.cfg * Use full name to ignore pytest warning * Correct import * Remove gym and tf warnings * Add TD3 to tensorboard tests * Ignore additional gym warning * Get rid of additional tf warning * Re-enable new docker image * Test with different tf version * Upgrade pytest * Upgrade tf to 1.13.2 * Try downgrading tf * Move Travis CI test to separate bash file * Try splitting up test suite * Diagnostic: echo test glob * Avoid unintended wildcard * Upgrade TF to 1.13.2, move scripts to subdirectory * Split up tests to keep them <20m long * Try thread-safe SubprocVecEnv * Disable non-thread safe start methods in tests; document default change * Rebalance tests * Rebalance some more * Rebalance, hopefully for the last time * Fix globs * Update docker cpu image: add coverage reporter for travis * Codacy partial upload * Bump Docker image version * Make Travis read environment variable * Pass project token in * Remove pip install and fix coverage final report

avaziri mentioned this issue Jun 7, 2018

HER does not converge on simple envs that adhere to GoalEnv interface openai/gym#1064

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

HER does not converge on simple envs that adhere to GoalEnv interface #428

HER does not converge on simple envs that adhere to GoalEnv interface #428

avaziri commented Jun 7, 2018 •

edited

Loading

avaziri commented Jun 7, 2018 •

edited

Loading

avaziri commented Jun 7, 2018 •

edited

Loading

avaziri commented Jun 8, 2018

avaziri commented Jun 11, 2018

iSaran commented Sep 11, 2018

filipolszewski commented Oct 17, 2018

HER does not converge on simple envs that adhere to GoalEnv interface #428

HER does not converge on simple envs that adhere to GoalEnv interface #428

Comments

avaziri commented Jun 7, 2018 • edited Loading

System information

Describe the problem

Hypothesis About Cause of Problem

avaziri commented Jun 7, 2018 • edited Loading

ContinuousGoalOrientedMoveToPoint-v0 Environment

Env Summary:

Gif of Environment

Env Details

Source code

avaziri commented Jun 7, 2018 • edited Loading

avaziri commented Jun 8, 2018

avaziri commented Jun 11, 2018

iSaran commented Sep 11, 2018

filipolszewski commented Oct 17, 2018

avaziri commented Jun 7, 2018 •

edited

Loading

avaziri commented Jun 7, 2018 •

edited

Loading

avaziri commented Jun 7, 2018 •

edited

Loading