[rllib] example and docs on how to use parametric actions with DQN / PG algorithms #3384

Merged
merged 22 commits into from
Nov 28, 2018

ericl (Contributor) commented Nov 22, 2018

What do these changes do?

Add examples of how to work with parametric action spaces (e.g., OpenAI 5 style).
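As background, here is a minimal NumPy sketch of the OpenAI Five-style idea, assuming each observation carries one embedding per candidate action plus a 0/1 action mask. The policy network's output "intent" vector is dot-producted with each candidate's embedding to give one logit per action. Invalid actions are then pushed to the most negative finite float32. The names parametric_action_logits, intent and avail_action_embeds are hypothetical and are not RLlib's API.

```python
import numpy as np

# Most negative finite float32 (about -3.4e38), used in place of -inf.
FLOAT_MIN = np.finfo(np.float32).min


def parametric_action_logits(intent, avail_action_embeds, action_mask):
    """Score candidate actions against an intent vector, then mask invalid ones.

    intent:              [embed_dim] vector produced by the policy network
    avail_action_embeds: [num_actions, embed_dim], one embedding per candidate
    action_mask:         [num_actions], 1.0 = valid, 0.0 = invalid
    """
    # One logit per candidate action: dot product of its embedding with the intent.
    logits = avail_action_embeds @ intent
    # log(1) = 0 leaves valid actions untouched; log(0) = -inf is clamped to FLOAT_MIN.
    with np.errstate(divide="ignore"):
        inf_mask = np.maximum(np.log(action_mask), FLOAT_MIN)
    return logits + inf_mask


intent = np.array([1.0, 0.0], dtype=np.float32)
embeds = np.array([[2.0, 0.0], [5.0, 1.0], [0.5, 3.0]], dtype=np.float32)
mask = np.array([1.0, 0.0, 1.0], dtype=np.float32)

logits = parametric_action_logits(intent, embeds, mask)
# Action 1 has the highest raw score (5.0) but is masked, so action 0 wins.
print(int(np.argmax(logits)))  # 0
```

In this sketch the candidate embeddings arrive with the observation. As long as the number of action slots is fixed, what each slot means can change between steps without touching the output layer.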

Related issue number

Closes #3364

cc @zegerhoogeboom

@@ -164,6 +165,8 @@ def _init_shape(self, obs_space, options):
        return (size, )

    def transform(self, observation):
        if not isinstance(observation, OrderedDict):
            observation = OrderedDict(sorted(list(observation.items())))
Contributor Author:

oops this is kind of an important bug fix

Contributor:

do you need to match the rest of the checks? https://github.com/openai/gym/blob/master/gym/spaces/dict_space.py#L34

Contributor Author:

No. Note that that is a check against the space spec; this is sorting the observation dict.
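Here is why the sort matters. As the linked gym check suggests, a Dict space built from a plain dict orders its sub-spaces by key. A plain observation dict, by contrast, iterates in whatever order it was built. So code that walks observation.values() in lockstep with the space's sub-spaces can pair a value with the wrong sub-space. The sketch below uses a standalone, hypothetical flatten_dict_obs helper (not RLlib's DictFlatteningPreprocessor) that applies the same sort, so that two insertion orders of the same observation flatten identically.

```python
from collections import OrderedDict

import numpy as np


def flatten_dict_obs(observation):
    """Flatten a dict observation into one float32 vector in key-sorted order."""
    # Same normalization as the patched transform(): a plain dict is re-ordered
    # by key so its values line up with the key-sorted sub-spaces of a Dict space.
    if not isinstance(observation, OrderedDict):
        observation = OrderedDict(sorted(list(observation.items())))
    return np.concatenate(
        [np.asarray(v, dtype=np.float32).ravel() for v in observation.values()]
    )


# The same observation built with two different key insertion orders.
a = flatten_dict_obs({"mask": [1, 0], "cart": [0.5, -0.1, 0.2, 0.0]})
b = flatten_dict_obs({"cart": [0.5, -0.1, 0.2, 0.0], "mask": [1, 0]})
print(np.array_equal(a, b))  # True
```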

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/9538/

ericl changed the title from "[rllib] example and docs on how to use parametric actions with pg algorithms" to "[rllib] example and docs on how to use parametric actions with DQN / PG algorithms" on Nov 22, 2018

ericl added the "tests-ok" label ("The tagger certifies test failures are unrelated and assumes personal liability.") on Nov 23, 2018
zegerhoogeboom commented Nov 23, 2018

Very sorry for my earlier, incorrect comment; the issue was simply that I hadn't installed Ray from your branch. Both DQN and PPO are working beautifully with the masking. I'm not using the action embeddings, but those at least work in the example you created.
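For DQN, the mask applies to Q-values rather than to policy logits. The following is a minimal NumPy sketch of one way to wire it, assuming a 0/1 mask is available for both the current and the next state. Invalid actions are excluded when picking the greedy action and when taking the max in the one-step target. The helper names are hypothetical. The sketch overwrites masked Q-values with np.where instead of adding a clamped log-mask; for a max or argmax the two behave the same.

```python
import numpy as np

# Most negative finite float32, used instead of -inf for invalid actions.
FLOAT_MIN = np.finfo(np.float32).min


def masked_q(q_values, action_mask):
    """Replace the Q-values of invalid actions (mask == 0) with FLOAT_MIN."""
    return np.where(action_mask > 0, q_values, FLOAT_MIN).astype(np.float32)


def greedy_action(q_values, action_mask):
    """Index of the best action among the valid ones."""
    return int(np.argmax(masked_q(q_values, action_mask)))


def td_target(reward, next_q, next_mask, gamma, done):
    """One-step Q-learning target that bootstraps only from legal next actions."""
    # Assumes at least one action is legal in the next state.
    if done:
        return float(reward)
    return float(reward + gamma * np.max(masked_q(next_q, next_mask)))


q = np.array([1.0, 10.0, 3.0], dtype=np.float32)
mask = np.array([1, 0, 1], dtype=np.float32)
print(greedy_action(q, mask))                          # 2
print(td_target(1.0, q, mask, gamma=0.5, done=False))  # 2.5
```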

ericl (Contributor Author) commented Nov 23, 2018

That should be fixed in the latest update -- try pulling.

ericl (Contributor Author) commented Nov 27, 2018

Ping @richardliaw

richardliaw (Contributor) left a comment

tf.boolean_mask(x, tf.logical_not(tf.is_inf(x))) could be cleaner (for future ref)

ericl (Contributor Author) left a comment

The reason is_inf won't work is that inf is numerically unstable, so we use tf.float32.min instead.
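Here is one concrete way -inf causes trouble. If masked logits are set to -inf, a masked action gets log-probability -inf and probability 0. The entropy term p * log p then evaluates to 0 * -inf = NaN. Clamping the log-mask at the most negative finite float32, the NumPy counterpart of tf.float32.min, keeps every term finite. Below is a minimal NumPy sketch; masked_entropy and its use_inf flag are hypothetical names used only for this comparison.

```python
import numpy as np

# NumPy counterpart of tf.float32.min.
FLOAT_MIN = np.finfo(np.float32).min


def log_softmax(x):
    """Numerically stable log-softmax over a 1-D array."""
    z = x - np.max(x)
    return z - np.log(np.sum(np.exp(z)))


def masked_entropy(logits, mask, use_inf):
    """Entropy of the masked categorical, with either -inf or FLOAT_MIN masking."""
    with np.errstate(divide="ignore", invalid="ignore"):
        log_mask = np.log(mask)  # 0 for valid actions, -inf for masked ones
        if not use_inf:
            log_mask = np.maximum(log_mask, FLOAT_MIN)
        logp = log_softmax(logits + log_mask)
        p = np.exp(logp)
        # With -inf masking, the masked term is 0 * -inf, which is NaN.
        return -np.sum(p * logp)


logits = np.array([1.0, 2.0, 0.5], dtype=np.float32)
mask = np.array([1.0, 1.0, 0.0], dtype=np.float32)
print(masked_entropy(logits, mask, use_inf=True))   # nan
print(masked_entropy(logits, mask, use_inf=False))  # finite, about 0.582
```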

ericl merged commit f0df97d into ray-project:master on Nov 28, 2018
Labels: tests-ok
4 participants