
CategoricalGRUPolicy InputLayer dimensionality #59

Closed
windweller opened this issue Nov 23, 2016 · 9 comments

@windweller

windweller commented Nov 23, 2016

Hi,

I guess I'm a bit confused by this:

l_input = L.InputLayer(
                shape=(None, None, input_dim),
                name="input")

CategoricalGRUPolicy is a vectorized policy, so a VectorizedSampler is created; it spawns 12 duplicate environments and samples from them. So assuming the observation_space is (10), what vec_env observes is shaped (12, 10): (n_env, observation_space.n)

But the InputLayer for CategoricalGRUPolicy is shaped as (None, None, input_dim)
Where's the extra dimension coming from?


The observation_space is a Box type, and the flatten_n call in get_actions(), flat_obs = self.observation_space.flatten_n(observations), doesn't add a dimension to it.

@dementrock
Member

Hi, the dimension layout is (batch size, time, obs dim). The extra dimension is needed to construct a computation graph for the entire history. This is only used in the optimization phase but not during execution of the policy.
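To make that layout concrete, here is a small numpy sketch (not from the thread; the shapes are illustrative) of the two cases:

import numpy as np

n_envs, obs_dim, max_len = 12, 10, 5

# Optimization phase: whole trajectories are stacked into a 3-D tensor of
# shape (batch size, time, obs dim), matching the (None, None, input_dim)
# InputLayer above.
trajectories = np.zeros((n_envs, max_len, obs_dim))

# Execution phase: the vectorized environments return one observation per
# env, shape (n_envs, obs_dim); a length-1 time axis can be added if a
# (batch, time, obs dim) input is needed for a single step.
obs = np.zeros((n_envs, obs_dim))
obs_step = obs[:, np.newaxis, :]   # shape (12, 1, 10)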

@windweller
Author

So an easy solution is to take (12, 10) / (n_env, obs_dim) / (batch_size, obs_dim), the current obses shape returned by vec_env, and expand it to (12, 1, 10), right? (Add a time dimension of 1 to it.)

@dementrock
Member

What's the use case?

@windweller
Author

windweller commented Nov 25, 2016

I'm adding a custom environment based on one of the text tasks from OpenAI Gym (character copy, but slightly varied), so at each step the environment returns a character embedding (10-dim).

When I dig into the VectorizedSampler and use CategoricalGRUPolicy, it creates an array of n environments, and when step() is called on vec_env, it returns observations of shape (12, 10).

@dementrock
Member

The current code is supposed to work with recurrent policies already. Did you notice anything broken?

@windweller
Author

windweller commented Nov 28, 2016

Ahhh! My mistake, the current code does work! Sorry for the trouble!! :(

Can I ask a different question?

The "Copy-v0" environment has its action-space as a tuple:

self.action_space = Tuple(
            [Discrete(len(self.MOVEMENTS)), Discrete(2), Discrete(self.base)]
        )

and it seems like convert_gym_space or to_tf_space can't handle this kind of Tuple action_space... is there a good way to get around this?

https://github.com/openai/gym/blob/master/gym/envs/algorithmic/algorithmic_env.py

@dementrock
Member

Hmm, I think both of these should support this scenario. Did you see an error? A CategoricalGRUPolicy won't be able to handle this kind of action space, though, since you want to apply a separate softmax to each group of actions, so you need to customize the nonlinearity applied to the output.
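As an illustration (not code from rllab; the group sizes are just an example for a Tuple of Discrete spaces like the one above), a grouped softmax could look like this in numpy:

import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def grouped_softmax(flat_logits, group_sizes):
    # Split the flat output into one block of logits per Discrete sub-space
    # and normalize each block with its own softmax.
    splits = np.cumsum(group_sizes)[:-1]
    groups = np.split(flat_logits, splits, axis=-1)
    return np.concatenate([softmax(g) for g in groups], axis=-1)

# Example group sizes for Tuple([Discrete(2), Discrete(2), Discrete(5)]).
logits = np.random.randn(12, 9)            # (n_envs, sum of group sizes)
probs = grouped_softmax(logits, [2, 2, 5])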

@windweller
Author

import gym
from sandbox.rocky.tf.envs.base import TfEnv
from sandbox.rocky.tf.envs.vec_env_executor import VecEnvExecutor

env = gym.make("Copy-v0")
env = TfEnv(env)

config = {
    "max_seq_len": 10,  
    "batch_size": 128,
}

n_envs = int(config["batch_size"] / config["max_seq_len"])
n_envs = max(1, min(n_envs, 100))

envs = [env for _ in range(n_envs)]
vec_env = VecEnvExecutor(
    envs=envs,
    max_path_length=config["max_seq_len"]
)

Error is:

[2016-11-28 11:32:49,682] Making new env: Copy-v0
Traceback (most recent call last):
  File "exps/text_env_test.py", line 39, in <module>
    max_path_length=config["max_seq_len"]
  File "/Users/xxx/Documents/rllab/sandbox/rocky/tf/envs/vec_env_executor.py", line 11, in __init__
    self._action_space = envs[0].action_space
  File "/usr/local/lib/python2.7/site-packages/cached_property.py", line 26, in __get__
    value = obj.__dict__[self.func.__name__] = self.func(obj)
  File "/Users/xxx/Documents/rllab/sandbox/rocky/tf/envs/base.py", line 40, in action_space
    return to_tf_space(self.wrapped_env.action_space)
  File "/Users/xxx/Documents/rllab/sandbox/rocky/tf/envs/base.py", line 20, in to_tf_space
    raise NotImplementedError
NotImplementedError

@dementrock
Member

I see. The recommended way to use gym environments is via GymEnv: https://github.com/openai/rllab/blob/master/rllab/envs/gym_env.py.
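For reference, a minimal sketch of that approach (assuming GymEnv takes the gym environment id as its first argument, as in the file linked above) would be:

from rllab.envs.gym_env import GymEnv
from sandbox.rocky.tf.envs.base import TfEnv

# Wrap the gym environment with GymEnv first, then with TfEnv, instead of
# passing the raw gym.Env to TfEnv directly.
env = TfEnv(GymEnv("Copy-v0"))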

jonashen pushed a commit to jonashen/rllab that referenced this issue May 29, 2018
…ll#59)

The selection of the regular TensorFlow package or GPU-
accelerated TensorFlow was added into the setup_linux
script.
Although no error is produced if both packages are installed,
the GPU package won't work if the regular package is installed
as well. Therefore, both packages were removed from the conda
environment file.
In the setup_linux script, the selection of the TF package is
done after the environment is created, leaving the regular
release as the default package. Two guards were added for the
creation and update of the conda environment to verify if their
execution was successful or else exit the script with an error
message.
The TF version was set to 1.8 which is the most recent release
that works for both the GPU and regular version.
As a side note, the prettytensor package removed from the conda
environment has the regular TF package as a dependency, and
prettytensor is currently not used in the rllab project, so
it was replaced by the selection of the regular TF package
in the setup script.