
CategoricalGRUPolicy InputLayer dimensionality #59

Closed
windweller opened this issue Nov 23, 2016 · 9 comments

@windweller

windweller commented Nov 23, 2016

Hi,

I guess I'm a bit confused by this:

l_input = L.InputLayer(
                shape=(None, None, input_dim),
                name="input")

CategoricalGRUPolicy is a vectorized policy, so a VectorizedSampler is created; it spawns 12 duplicate environments and samples from them. So assuming the observation_space is (10), what vec_env observes is shaped (12, 10): (n_env, observation_space.n)

But the InputLayer for CategoricalGRUPolicy is shaped as (None, None, input_dim)
Where's the extra dimension coming from?


The observation_space is a Box type, and the flatten_n call in get_actions(), flat_obs = self.observation_space.flatten_n(observations), doesn't add a dimension to it.

@dementrock
Member

Hi, the dimension layout is (batch size, time, obs dim). The extra dimension is needed to construct a computation graph for the entire history. This is only used in the optimization phase but not during execution of the policy.
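To make that layout concrete, here is a small numpy sketch (not from the thread; the shapes are illustrative) of the two cases:

import numpy as np

n_envs, obs_dim, max_len = 12, 10, 5

# Optimization phase: whole trajectories are stacked into a 3-D tensor of
# shape (batch size, time, obs dim), matching the (None, None, input_dim)
# InputLayer above.
trajectories = np.zeros((n_envs, max_len, obs_dim))

# Execution phase: the vectorized environments return one observation per
# env, shape (n_envs, obs_dim); a length-1 time axis can be added if a
# (batch, time, obs dim) input is needed for a single step.
obs = np.zeros((n_envs, obs_dim))
obs_step = obs[:, np.newaxis, :]   # shape (12, 1, 10)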

@windweller
Author

So an easy solution is to take (12, 10) / (n_env, obs_dim) / (batch_size, obs_dim), the current obses shape returned by vec_env, and expand it to (12, 1, 10), right? (Add a time dimension of 1 to it.)

@dementrock
Member

What's the use case?

@windweller
Author

windweller commented Nov 25, 2016

I'm adding a custom environment based on one of the text tasks from OpenAI Gym (character copy, but slightly varied), so at each step the environment returns a character embedding (10-dim).

When I dig into the VectorizedSampler and use CategoricalGRUPolicy, it creates an array of n environments, and when step() is called on vec_env, it returns observations of shape (12, 10).

@dementrock
Member

The current code is supposed to work with recurrent policies already. Did you notice anything broken?

@windweller
Author

windweller commented Nov 28, 2016

Ahhh! My mistake, the current code does work! Sorry for the trouble!! :(

Can I ask a different question?

The "Copy-v0" environment has its action-space as a tuple:

self.action_space = Tuple(
            [Discrete(len(self.MOVEMENTS)), Discrete(2), Discrete(self.base)]
        )

and it seems like convert_gym_space or to_tf_space can't handle this kind of Tuple action_space... is there a good way to get around this?

https://github.com/openai/gym/blob/master/gym/envs/algorithmic/algorithmic_env.py

@dementrock
Member

Hmm, I think both of these should support this scenario. Did you see an error? A CategoricalGRUPolicy won't be able to handle this kind of action space, though, since you want to apply a separate softmax to each group of actions, so you need to customize the nonlinearity applied to the output.
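As an illustration (not code from rllab; the group sizes are just an example for a Tuple of Discrete spaces like the one above), a grouped softmax could look like this in numpy:

import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def grouped_softmax(flat_logits, group_sizes):
    # Split the flat output into one block of logits per Discrete sub-space
    # and normalize each block with its own softmax.
    splits = np.cumsum(group_sizes)[:-1]
    groups = np.split(flat_logits, splits, axis=-1)
    return np.concatenate([softmax(g) for g in groups], axis=-1)

# Example group sizes for Tuple([Discrete(2), Discrete(2), Discrete(5)]).
logits = np.random.randn(12, 9)            # (n_envs, sum of group sizes)
probs = grouped_softmax(logits, [2, 2, 5])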

@windweller
Author

import gym
from sandbox.rocky.tf.envs.base import TfEnv
from sandbox.rocky.tf.envs.vec_env_executor import VecEnvExecutor

env = gym.make("Copy-v0")
env = TfEnv(env)

config = {
    "max_seq_len": 10,  
    "batch_size": 128,
}

n_envs = int(config["batch_size"] / config["max_seq_len"])
n_envs = max(1, min(n_envs, 100))

envs = [env for _ in range(n_envs)]
vec_env = VecEnvExecutor(
    envs=envs,
    max_path_length=config["max_seq_len"]
)

Error is:

[2016-11-28 11:32:49,682] Making new env: Copy-v0
Traceback (most recent call last):
  File "exps/text_env_test.py", line 39, in <module>
    max_path_length=config["max_seq_len"]
  File "/Users/xxx/Documents/rllab/sandbox/rocky/tf/envs/vec_env_executor.py", line 11, in __init__
    self._action_space = envs[0].action_space
  File "/usr/local/lib/python2.7/site-packages/cached_property.py", line 26, in __get__
    value = obj.__dict__[self.func.__name__] = self.func(obj)
  File "/Users/xxx/Documents/rllab/sandbox/rocky/tf/envs/base.py", line 40, in action_space
    return to_tf_space(self.wrapped_env.action_space)
  File "/Users/xxx/Documents/rllab/sandbox/rocky/tf/envs/base.py", line 20, in to_tf_space
    raise NotImplementedError
NotImplementedError

@dementrock
Member

I see. The recommended way to use gym environments is via GymEnv: https://github.com/openai/rllab/blob/master/rllab/envs/gym_env.py.
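For reference, a minimal sketch of that approach (assuming GymEnv takes the gym environment id as its first argument, as in the file linked above) would be:

from rllab.envs.gym_env import GymEnv
from sandbox.rocky.tf.envs.base import TfEnv

# Wrap the gym environment with GymEnv first, then with TfEnv, instead of
# passing the raw gym.Env to TfEnv directly.
env = TfEnv(GymEnv("Copy-v0"))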

jonashen pushed a commit to jonashen/rllab that referenced this issue May 29, 2018
…ll#59)

The selection of the regular TensorFlow package or GPU-
accelerated TensorFlow was added into the setup_linux
script.
Although no error is produced if both packages are installed,
the GPU package won't work if the regular package is installed
as well. Therefore, both packages were removed from the conda
environment file.
In the setup_linux script, the selection of the TF package is
done after the environment is created, leaving the regular
release as the default package. Two guards were added for the
creation and update of the conda environment to verify if their
execution was successful or else exit the script with an error
message.
The TF version was set to 1.8 which is the most recent release
that works for both the GPU and regular version.
As a side note, the prettytensor package removed from the conda
environment has the regular TF package as a dependency, and
prettytensor is currently not used in the rllab project, so
it was replaced by the selection of the regular TF package
in the setup script.