add flattening logic so ddpg can handle image observations #23
Conversation
```diff
@@ -14,7 +14,7 @@ def rollout(env, agent, max_path_length=np.inf, animated=False, speedup=1):
     if animated:
         env.render()
     while path_length < max_path_length:
-        a, agent_info = agent.get_action(o)
+        a, agent_info = agent.get_action(env.observation_space.flatten(o))
```
This would break quite a few things, since the `get_action` method is supposed to receive just the raw observations. Why is this change needed?
Without this change I get the following error:

```
Traceback (most recent call last):
  File "/home/tom/rllab/scripts/run_experiment_lite.py", line 103, in <module>
    run_experiment(sys.argv)
  File "/home/tom/rllab/scripts/run_experiment_lite.py", line 90, in run_experiment
    maybe_iter = concretize(data)
  File "/home/tom/rllab/rllab/misc/instrument.py", line 898, in concretize
    return method(*args, **kwargs)
  File "/home/tom/rllab/rllab/algos/ddpg.py", line 263, in train
    self.evaluate(epoch, pool)
  File "/home/tom/rllab/rllab/algos/ddpg.py", line 381, in evaluate
    max_path_length=self.max_path_length,
  File "/home/tom/rllab/rllab/sampler/parallel_sampler.py", line 114, in sample_paths
    show_prog_bar=True
  File "/home/tom/rllab/rllab/sampler/stateful_pool.py", line 142, in run_collect
    result, inc = collect_once(self.G, *args)
  File "/home/tom/rllab/rllab/sampler/parallel_sampler.py", line 89, in _worker_collect_one_path
    path = rollout(G.env, G.policy, max_path_length)
  File "/home/tom/rllab/rllab/sampler/utils.py", line 17, in rollout
    a, agent_info = agent.get_action(o)
  File "/home/tom/rllab/rllab/policies/deterministic_mlp_policy.py", line 66, in get_action
    action = self._f_actions([observation])[0]
  File "/home/tom/anaconda2/envs/rllab/lib/python2.7/site-packages/theano/compile/function_module.py", line 784, in __call__
    allow_downcast=s.allow_downcast)
  File "/home/tom/anaconda2/envs/rllab/lib/python2.7/site-packages/theano/tensor/type.py", line 178, in filter
    data.shape))
TypeError: ('Bad input argument to theano function with name "/home/tom/rllab/rllab/misc/ext.py:135" at index 0 (0-based)', 'Wrong number of dimensions: expected 2, got 4 with shape (1, 96, 96, 3).')
```
In short, the policy network expects a flat input but is getting a non-flat one. An alternative would be to handle the flattening inside the `get_action` method of `DeterministicMLPPolicy`.
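For illustration, a minimal sketch of that alternative, assuming the policy can reach its observation space through the environment spec; the subclass name here is hypothetical and is only meant to show where the flattening would live:

```python
from rllab.policies.deterministic_mlp_policy import DeterministicMLPPolicy


class FlatteningMLPPolicy(DeterministicMLPPolicy):
    """Hypothetical subclass: accept raw (e.g. image-shaped) observations
    and flatten them before the underlying MLP sees them."""

    def get_action(self, observation):
        # e.g. a (96, 96, 3) frame becomes a flat vector of length 27648
        flat_obs = self.observation_space.flatten(observation)
        return super(FlatteningMLPPolicy, self).get_action(flat_obs)
```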
```diff
@@ -224,7 +224,7 @@ def train(self):
                     self.es_path_returns.append(path_return)
                     path_length = 0
                     path_return = 0
-                action = self.es.get_action(itr, observation, policy=sample_policy)  # qf=qf)
+                action = self.es.get_action(itr, self.env.observation_space.flatten(observation), policy=sample_policy)  # qf=qf)
```
Just to be a little more consistent, could this be changed to pass the raw observation to the exploration strategy? (It shouldn't require any other changes, since I believe none of the exploration strategies currently implemented make use of the observations.)
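For illustration, a rough sketch of the pattern the exploration strategies follow; this is a simplified, hypothetical strategy rather than rllab's actual OUStrategy, but it shows why the observation's shape never matters inside the strategy itself:

```python
import numpy as np


class SimpleGaussianStrategy(object):
    """Hypothetical exploration strategy: the observation is forwarded to
    the policy untouched, and only the resulting action is perturbed."""

    def __init__(self, env_spec, sigma=0.3):
        self._action_space = env_spec.action_space
        self._sigma = sigma

    def get_action(self, t, observation, policy, **kwargs):
        # The strategy never inspects the observation; it only adds noise
        # to the action the policy produces.
        action, _ = policy.get_action(observation)
        noisy_action = action + self._sigma * np.random.randn(*action.shape)
        return np.clip(noisy_action, self._action_space.low, self._action_space.high)
```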
Thanks, merged!
Adds logic to flatten observations that are not a flat vector, allowing the DDPG algorithm to work with environments that return image data in their observations. cf. #22.
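Roughly, the flattening amounts to collapsing an image-shaped observation into a single vector before it reaches the MLP policy. A minimal sketch of the idea (not the exact rllab implementation):

```python
import numpy as np


def flatten_observation(obs):
    """Rough equivalent of what the observation space's flatten() does for
    image data: collapse an (H, W, C) array into one flat vector that a
    plain MLP policy can consume."""
    return np.asarray(obs).flatten()


# Example: a 96x96 RGB frame becomes a vector of length 27648.
frame = np.zeros((96, 96, 3))
assert flatten_observation(frame).shape == (96 * 96 * 3,)
```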