[rllib] example and docs on how to use parametric actions with DQN / PG algorithms #3384
@@ -164,6 +165,8 @@ def _init_shape(self, obs_space, options):
        return (size, )

    def transform(self, observation):
        if not isinstance(observation, OrderedDict):
            observation = OrderedDict(sorted(list(observation.items())))
oops this is kind of an important bug fix
do you need to match the rest of the checks? https://github.com/openai/gym/blob/master/gym/spaces/dict_space.py#L34
No. The linked code is a check against the space spec, whereas this change sorts the observation dict.
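For reference, a minimal sketch of why the sort matters. It assumes the space spec orders its keys alphabetically when built from a plain dict, so a flattened observation only lines up with the spec if its keys are visited in the same order. `flatten_observation` and the key names are hypothetical and are not the preprocessor code in this PR.

```python
from collections import OrderedDict


def flatten_observation(observation):
    """Flatten a dict observation into one list, visiting keys in sorted order.

    Hypothetical helper for illustration only. As in the diff above, an
    OrderedDict is trusted to already be in the right order and is left alone.
    """
    if not isinstance(observation, OrderedDict):
        observation = OrderedDict(sorted(observation.items()))
    flat = []
    for value in observation.values():
        flat.extend(value)
    return flat


# The same observation built with two different insertion orders.
obs_a = {"action_mask": [1, 0], "cart": [0.5]}
obs_b = {"cart": [0.5], "action_mask": [1, 0]}

flat_a = flatten_observation(obs_a)
flat_b = flatten_observation(obs_b)
```

Without the sort, `obs_b` would flatten with the `cart` value first, and the mask entries would land in the wrong slots of the flat vector.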
Test FAILed.
Very sorry for my incorrect earlier comment; the issue was simply that I hadn't installed Ray from your branch. Both DQN and PPO are working beautifully with the masking. I'm not using the action embeddings, but those at least work in the example you created.
That should be fixed in the latest update -- try pulling.
Ping @richardliaw
tf.boolean_mask(x, tf.logical_not(tf.is_inf(x)))
could be cleaner (for future ref)
is_inf won't work here because the masked values are never actually inf: inf is numerically unstable, so we use tf.float32.min instead.
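A plain-Python sketch of that point, under stated assumptions: `mask_logits` and `softmax` are hypothetical helpers, and `FLOAT32_MIN` is hard-coded to the most negative finite float32 value as a stand-in for `tf.float32.min`. It is not the model code from this PR, only an illustration of clamping `log(mask)` to a finite floor.

```python
import math

# Most negative finite float32 value, standing in for tf.float32.min.
FLOAT32_MIN = -3.4028234663852886e38


def mask_logits(logits, action_mask, floor=FLOAT32_MIN):
    """Add log(mask) to each logit, clamping log(0) = -inf to a finite floor."""
    masked = []
    for logit, m in zip(logits, action_mask):
        log_m = math.log(m) if m > 0 else -math.inf
        masked.append(logit + max(log_m, floor))
    return masked


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    top = max(xs)
    exps = [math.exp(x - top) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


logits = [1.0, 2.0, 3.0]
action_mask = [1, 0, 1]  # the second action is unavailable

masked = mask_logits(logits, action_mask)
probs = softmax(masked)

# No masked value is inf, so an is_inf-style filter keeps every entry.
kept_by_isinf_filter = [x for x in masked if not math.isinf(x)]

# A true -inf mask can produce NaN, for example when multiplied by zero.
nan_example = -math.inf * 0.0
```

The masked action gets probability 0 while every intermediate value stays finite, which is also why filtering on inf would not drop anything.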
What do these changes do?
Add examples of how to work with parametric action spaces (e.g., OpenAI Five style).
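As a rough illustration of the idea (not the example added in this PR), a parametric action space can be scored by taking the dot product of each available action's embedding with an "intent" vector produced by the policy, then choosing only among actions the mask allows. All names and numbers below are hypothetical.

```python
def score_actions(intent, action_embeddings):
    """Dot each candidate action's embedding with the intent vector."""
    return [
        sum(i * e for i, e in zip(intent, embedding))
        for embedding in action_embeddings
    ]


def pick_valid_action(logits, action_mask):
    """Return the index of the highest-scoring action whose mask entry is 1."""
    valid = [idx for idx, m in enumerate(action_mask) if m]
    return max(valid, key=lambda idx: logits[idx])


intent = [1.0, -1.0]                      # assumed output of the policy network
action_embeddings = [[0.5, 0.5], [2.0, 0.0], [0.0, -1.0]]
action_mask = [1, 0, 1]                   # the second action is unavailable

logits = score_actions(intent, action_embeddings)
chosen = pick_valid_action(logits, action_mask)
```

Here the second action has the highest raw score but is masked out, so the third action is selected.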
Related issue number
Closes #3364
cc @zegerhoogeboom