[RLlib] Exploration API: Policy changes needed for forward pass noisifications. #7798
Conversation
```diff
@@ -95,17 +92,8 @@ def compute_actions(self,
                 **kwargs):
        return list(state_batches[0]), state_batches, {}

    def learn_on_batch(self, samples):
```
Consider keeping these; they are there to make it clear these are no-ops.
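For example, explicit no-op overrides like these document that a heuristic policy neither learns nor has weights. A sketch only; the class name and exact method bodies are assumptions based on the diff context above:

```python
from ray.rllib.policy.policy import Policy


class AlwaysSameHeuristic(Policy):  # hypothetical heuristic policy
    def compute_actions(self, obs_batch, state_batches=None, **kwargs):
        return list(state_batches[0]), state_batches, {}

    def learn_on_batch(self, samples):
        # Explicit no-op: this policy never learns from samples.
        return {}

    def get_weights(self):
        # Explicit no-op: there are no weights to checkpoint.
        return {}

    def set_weights(self, weights):
        # Explicit no-op: nothing to restore.
        pass
```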
```python
run_heuristic_vs_learned(use_lstm=False)
# run_with_custom_entropy_loss()
parser = argparse.ArgumentParser()
parser.add_argument("--stop", type=int, default=1000)
```
Consider keeping parser args at the top of the file by convention.
👍
Done.
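For illustration, the convention being asked for puts the parser at module level, right after the imports. A sketch, not the file's exact contents:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--stop", type=int, default=1000)

# ... function definitions (run_same_policy, etc.) go here ...

if __name__ == "__main__":
    args = parser.parse_args()
```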
```python
run_same_policy(args)
run_heuristic_vs_learned(args, use_lstm=True)
run_heuristic_vs_learned(args, use_lstm=False)
run_with_custom_entropy_loss(args)
```
Hmm I can see this being a bit confusing to run as an example since there are four different runs in the output.
Could we at least add print()s in between?
Yeah, I just fixed that test case. For some reason, 3 of these have always been commented out and weren't working. I'll add the print()s.
Done.
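A sketch of what the fixed `__main__` block could look like with `print()`s in between. The `run_*` calls and `args` mirror the diff above; the parser and the exact messages are assumptions:

```python
if __name__ == "__main__":
    args = parser.parse_args()

    print("Running with the same policy for both players ...")
    run_same_policy(args)

    print("Running heuristic vs. learned (LSTM) ...")
    run_heuristic_vs_learned(args, use_lstm=True)

    print("Running heuristic vs. learned (no LSTM) ...")
    run_heuristic_vs_learned(args, use_lstm=False)

    print("Running with custom entropy loss ...")
    run_with_custom_entropy_loss(args)
```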
```python
            timestep=timestep)
    else:
        # Exploration hook before each forward pass.
        self.exploration.before_compute_actions(
```
Shouldn't the hook be called in all cases, including `action_sampler_fn`?
Not sure. I was thinking of `action_sampler_fn` as a completely custom way of doing things, so the user would have to apply exploration him/herself. That being said: no one is using `action_sampler_fn` right now anyway, so I guess it doesn't matter much.
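A minimal sketch (not RLlib's actual code) of the branching being discussed. `action_sampler_fn`, `exploration`, `before_compute_actions`, and `get_exploration_action` mirror the names in the diff and this PR; everything else is illustrative:

```python
def compute_actions(policy, obs_batch, timestep=None, **kwargs):
    if policy.action_sampler_fn is not None:
        # Fully custom path: the user-supplied sampler must apply any
        # exploration itself -- no hook is called automatically.
        return policy.action_sampler_fn(policy, policy.model, obs_batch,
                                        **kwargs)
    # Default path: let the Exploration object prepare/noisify the model
    # (e.g. ParameterNoise) before the forward pass ...
    policy.exploration.before_compute_actions(timestep=timestep)
    # ... then build the ActionDistribution and sample through the
    # Exploration object (see `get_exploration_action` in this PR).
    dist_inputs, _ = policy.model({"obs": obs_batch})
    action_dist = policy.dist_class(dist_inputs, policy.model)
    return policy.exploration.get_exploration_action(
        action_distribution=action_dist, timestep=timestep)
```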
Looks good, a few minor comments.
Cool, thanks! Will fix these.
Test FAILed.
Test FAILed.
Test PASSed.
@ericl This can be merged now. Tests are all ok.
Test PASSed.
LGTM. Looks like some merge conflicts, though.
Merged, waiting for re-testing.
@ericl Please merge. Re-tests ok after merging.
Test FAILed.
This PR contains the following Policy changes to allow different types of forward-pass noisifications:

- `action_sampler_fn`: Fully customized action-sampling behavior, returning only actions and logp. If provided, the Policy will not automatically use its Exploration object; exploration has to be done (customized) inside the `action_sampler_fn` itself!
- `action_distribution_fn`: Customized distribution-inputs and -class generator. The Policy's Exploration is used automatically (`before_compute_actions` and `get_exploration_action`). This removes the need for a `log_likelihood_fn` and thus simplifies DQN and SAC.
- All `Policy.compute_actions()` methods now return (by default and if available) the extra-action-fetches key `ACTION_DIST_INPUTS`. This is currently only used by the ParameterNoise class (but should be useful information for other cases as well).
- The ActionDistribution object is now passed directly into `Exploration.get_exploration_action()`. Before, distribution-inputs and distribution-class were passed in separately.
- Each Exploration class requires the Model upon construction. Hence, the Model is no longer passed into e.g. `Exploration.get_exploration_action`. This requires the Model to be generated before the Exploration object in all Policy classes.
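For illustration, a minimal sketch contrasting the two hooks from a user's perspective. Only the hook names (`action_sampler_fn`, `action_distribution_fn`, `before_compute_actions`, `get_exploration_action`) come from this PR; the model-call convention, the `Categorical` import path, the three-tuple return shape, and the helper names are assumptions:

```python
from ray.rllib.models.tf.tf_action_dist import Categorical


# Variant 1: fully custom sampling. The Policy will NOT call its
# Exploration object -- any noise must be applied in here.
def my_action_sampler_fn(policy, model, obs_batch, **kwargs):
    dist_inputs, _ = model({"obs": obs_batch})
    # `my_custom_noisy_sample` is a hypothetical helper.
    actions, logp = my_custom_noisy_sample(dist_inputs)
    return actions, logp


# Variant 2: only generate distribution inputs and class. The Policy
# automatically calls `exploration.before_compute_actions()` before the
# forward pass and `exploration.get_exploration_action()` on the
# resulting ActionDistribution object.
def my_action_distribution_fn(policy, model, obs_batch, **kwargs):
    dist_inputs, _ = model({"obs": obs_batch})
    return dist_inputs, Categorical, []  # inputs, dist class, state-outs
```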
. This requires the Model being generated before the Exploration object in all Policy classes.Checks
scripts/format.sh
to lint the changes in this PR.