
[rllib] Support prev_state/prev_action in rollout and fix multiagent #4565

Merged (10 commits into ray-project:master, Apr 10, 2019)

Conversation

@vladfi1 (Contributor) commented Apr 4, 2019

What do these changes do?

Fixes a few issues with rollout.py:

  1. Multiagent envs were not properly handled.
  2. prev_state and prev_action weren't passed to the agent.

The code has also been simplified and should be more readable.

Closes #4573

Linter

  • I've run scripts/format.sh to lint the changes in this PR.
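For context on the second fix, a minimal sketch of the per-step call these changes enable (variable names like a_obs are illustrative, based on the Agent.compute_action API at the time, not copied from the diff):

    # Feed recurrent state and the previous action/reward back into the
    # policy, rather than calling compute_action(obs) alone.
    a_action, p_state, _ = agent.compute_action(
        a_obs,
        state=agent_states[agent_id],        # RNN state from the last step
        prev_action=prev_actions[agent_id],  # action taken on the last step
        prev_reward=prev_rewards[agent_id],  # reward received on the last step
        policy_id=policy_id)                 # per-agent policy in multiagent envs
    agent_states[agent_id] = p_state         # carry the state forward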

@AmplabJenkins

Can one of the admins verify this patch?

@ericl self-assigned this Apr 4, 2019
@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/13577/

    policy_map = agent.local_evaluator.policy_map
    state_init = {p: m.get_initial_state() for p, m in policy_map.items()}
    use_lstm = {p: len(s) > 0 for p, s in state_init.items()}
    action_init = {
        p: m.action_space.sample()
        for p, m in policy_map.items()
    }
Contributor

note that in the actual code we feed the all-zeros action initially

Contributor Author

Does that make sense? All zeros might not even be in the action space?

Contributor

Yeah, that's a good question.

Contributor

I think this is fine for now, though it would be better to set it to zeros for consistency (or switch the sampler to stick in a random initial action, but that might be weird).
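To make the concern concrete (an illustrative example, not code from this PR): an all-zeros vector is not guaranteed to be a member of every gym action space, whereas a sampled action always is.

    import numpy as np
    from gym.spaces import Box

    # A Box whose valid actions lie in [1.0, 2.0]; zero is outside it.
    box = Box(low=1.0, high=2.0, shape=(2,))
    assert not box.contains(np.zeros(2))  # all-zeros is not a valid action
    assert box.contains(box.sample())     # a sampled action always is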

@@ -124,37 +140,46 @@ def rollout(agent, env_name, num_steps, out=None, no_render=True):
     while steps < (num_steps or steps + 1):
         if out is not None:
             rollout = []
-        state = env.reset()
+        obs = env.reset()
+        multi_obs = obs if multiagent else {0: obs}
Contributor

(review comment not preserved in this export)

Contributor Author

Done.

@@ -124,37 +140,46 @@ def rollout(agent, env_name, num_steps, out=None, no_render=True):
     while steps < (num_steps or steps + 1):
         if out is not None:
             rollout = []
-        state = env.reset()
+        obs = env.reset()
Contributor

should the mapping cache be reset as well?

Contributor Author

Yeah, I think that's appropriate.
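A hedged sketch of what that per-episode reset might look like (names follow the initialization snippet above, not the literal diff; on first use, each agent falls back to its policy's state_init/action_init entry):

    import collections

    # Reset per-episode bookkeeping so agent -> policy assignments and any
    # stale recurrent state don't leak across episodes.
    mapping_cache = {}
    agent_states = {}                              # agent_id -> RNN state
    prev_actions = {}                              # agent_id -> last action
    prev_rewards = collections.defaultdict(float)  # agent_id -> last reward
    obs = env.reset()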

@ericl changed the title from "Rollout fix" to "[rllib] Support prev_state/prev_action in rollout and fix multiagent" on Apr 5, 2019
        policy_agent_mapping = agent.config["multiagent"][
            "policy_mapping_fn"]
        mapping_cache = {}
    else:
        policy_agent_mapping = lambda _: DEFAULT_POLICY_ID
@cclauss (Contributor) commented Apr 5, 2019

(review comment not preserved in this export)

Contributor Author

Done.
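In the single-agent fallback above, every agent id maps to DEFAULT_POLICY_ID. A minimal sketch of how the mapping and its cache would be consulted each step (assumed usage, memoizing so the mapping fn runs once per agent per episode):

    policy_id = mapping_cache.setdefault(
        agent_id, policy_agent_mapping(agent_id))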

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/13598/

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/13599/

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/13605/

@ericl (Contributor) commented Apr 7, 2019

Seems this causes rllib/tests/test_rollout.sh to raise an error:

Traceback (most recent call last):
  File "../rollout.py", line 212, in <module>
    run(args, parser)
  File "../rollout.py", line 105, in run
    rollout(agent, args.env, num_steps, args.out, args.no_render)
  File "../rollout.py", line 183, in rollout
    next_obs, reward, done, _ = env.step(action)
  File "/home/eric/.local/lib/python3.5/site-packages/gym/wrappers/time_limit.py", line 31, in step
    observation, reward, done, info = self.env.step(action)
  File "/home/eric/.local/lib/python3.5/site-packages/gym/envs/atari/atari_env.py", line 68, in step
    action = self._action_set[a]
IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices

@waldroje (Contributor) commented Apr 9, 2019

I believe step() needs to also be contingent on multiagent, i.e.

    if multiagent:
        next_obs, reward, done, _ = env.step(action)
    else:
        next_obs, reward, done, _ = env.step(action[_DUMMY_AGENT_ID])

Also, line #172 updates

    agent_states[policy_id] = p_state

but above, in the call to agent.compute_action, it uses agent_states[agent_id], and at least in the cartpole_lstm example policy_id != agent_id, so obs and prev_states are not being carried forward.

@vladfi1 (Contributor Author) commented Apr 9, 2019

Single-agent envs should work now (test_rollout.py passes).

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/13652/

@vladfi1 (Contributor Author) commented Apr 9, 2019

Test failure doesn't seem related to this PR:

  File "/ray/python/ray/experimental/sgd/test_sgd.py", line 48, in <module>
    all_reduce_alg=args.all_reduce_alg)
  File "/ray/python/ray/experimental/sgd/sgd.py", line 110, in __init__
    shard_shapes = ray.get(self.workers[0].shard_shapes.remote())
  File "/ray/python/ray/worker.py", line 2306, in get
    raise value
ray.exceptions.RayTaskError: ray_worker (pid=118, host=308e9c11b868)
NameError: global name 'FileNotFoundError' is not defined

@cclauss (Contributor) commented Apr 9, 2019

FileNotFoundError was added in Python 3. In Python 2 use OSError instead.
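A common shim for code that must run on both versions (a generic sketch, not a quote from the Ray codebase):

    try:
        FileNotFoundError            # built-in on Python 3
    except NameError:
        FileNotFoundError = OSError  # Python 2 fallback

Since FileNotFoundError subclasses OSError on Python 3, catching OSError also works unchanged on both versions.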

@waldroje (Contributor) commented Apr 9, 2019

The latest version still updates

    agent_states[policy_id] = p_state

yet passes agent_states[agent_id] to agent.compute_action. I don't believe that is correct, as it results in the initial state simply being passed in repeatedly, unless agent_id == policy_id, which was not the case in the cartpole_lstm example I had run.
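To spell out the mismatch (an illustrative sketch of the buggy flow, not the literal diff):

    # Buggy: recurrent state is written under policy_id...
    agent_states[policy_id] = p_state
    # ...but read back under agent_id on the next step, so whenever
    # agent_id != policy_id the initial state is re-read forever.
    state = agent_states[agent_id]

    # Fix: key both the read and the write by agent_id, since each agent
    # carries its own recurrent state even when agents share a policy.
    agent_states[agent_id] = p_state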

@vladfi1 (Contributor Author) commented Apr 9, 2019

@waldroje You are correct, good catch (if only Python had types...). Fixed now.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/13667/

@ericl (Contributor) commented Apr 10, 2019

Lint unrelated.

@ericl merged commit 74fd3d7 into ray-project:master on Apr 10, 2019
@ericl (Contributor) commented Apr 10, 2019

Thanks!
