[RLlib] Fix type hints for `original_batches` in callbacks. #24214

XuehaiPan · 2022-04-26T10:29:07Z

Why are these changes needed?

Fix type hints for `original_batches` in `on_postprocess_trajectory`.

Related issue number

N/A

Checks

- I've run `scripts/format.sh` to lint the changes in this PR.
- I've included any doc changes needed for https://docs.ray.io/en/master/.
- I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
  - Unit tests
  - Release tests
  - This PR is not tested :(

Signed-off-by: Xuehai Pan <XuehaiPan@pku.edu.cn>

gjoliver · 2022-04-26T22:04:53Z

rllib/agents/callbacks.py

@@ -196,7 +196,7 @@ def on_postprocess_trajectory(
        policy_id: PolicyID,
        policies: Dict[PolicyID, Policy],
        postprocessed_batch: SampleBatch,
-        original_batches: Dict[AgentID, SampleBatch],
+        original_batches: Dict[AgentID, Tuple[Policy, SampleBatch]],


what, that's surprising ...
how can we mix these 2 things together?

XuehaiPan · 2022-04-27

ray/rllib/evaluation/collectors/simple_list_collector.py

Lines 824 to 832 in 18c269c

    pre_batches = {}
    for (eps_id, agent_id), collector in self.agent_collectors.items():
        # Build only if there is data and agent is part of given episode.
        if collector.agent_steps == 0 or eps_id != episode_id:
            continue
        pid = self.agent_key_to_policy_id[(eps_id, agent_id)]
        policy = self.policy_map[pid]
        pre_batch = collector.build(policy.view_requirements)
        pre_batches[agent_id] = (policy, pre_batch)

`pre_batches` is a dict whose values have type `Tuple[Policy, SampleBatch]`. It is then passed to `on_postprocess_trajectory` as the `original_batches` argument.

ray/rllib/evaluation/collectors/simple_list_collector.py

Lines 913 to 925 in 18c269c

    for agent_id, post_batch in sorted(post_batches.items()):
        agent_key = (episode_id, agent_id)
        pid = self.agent_key_to_policy_id[agent_key]
        policy = self.policy_map[pid]
        self.callbacks.on_postprocess_trajectory(
            worker=get_global_worker(),
            episode=episode,
            agent_id=agent_id,
            policy_id=pid,
            policies=self.policy_map,
            postprocessed_batch=post_batch,
            original_batches=pre_batches,
        )
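
For readers writing their own callbacks, here is a minimal sketch of what the corrected hint implies: each value in `original_batches` has to be unpacked into a `(policy, batch)` pair before the batch is used. The `Policy` and `SampleBatch` classes and the `summarize_original_batches` helper below are hypothetical stand-ins so that the snippet runs without Ray installed; they are not RLlib's actual classes.

```python
from typing import Dict, Tuple

# Hypothetical stand-ins so the sketch runs without Ray installed; in RLlib
# these would be the real AgentID, Policy, and SampleBatch types.
AgentID = str


class Policy:
    def __init__(self, name: str) -> None:
        self.name = name


class SampleBatch(dict):
    """Placeholder for a dict-like batch of columns."""


def summarize_original_batches(
    original_batches: Dict[AgentID, Tuple[Policy, SampleBatch]],
) -> Dict[AgentID, Tuple[str, int]]:
    """Return (policy name, number of rewards) for each agent."""
    summary = {}
    for agent_id, (policy, pre_batch) in original_batches.items():
        # Each value is a (policy, batch) pair, so it is unpacked here
        # before the batch itself is touched.
        summary[agent_id] = (policy.name, len(pre_batch["rewards"]))
    return summary


# Mirrors `pre_batches[agent_id] = (policy, pre_batch)` from the collector.
pre_batches = {
    "agent_0": (Policy("shared_policy"), SampleBatch(rewards=[1.0, 0.5])),
    "agent_1": (Policy("shared_policy"), SampleBatch(rewards=[0.0])),
}
summary = summarize_original_batches(pre_batches)
```

Code written against the old hint (`Dict[AgentID, SampleBatch]`) would index the tuple as if it were a batch, which is the mistake the corrected annotation helps a type checker catch.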

gjoliver · 2022-04-27

oh it's Tuple, not Union, I got scared.
Thanks!
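
The distinction being drawn here can be sketched as follows, again with hypothetical placeholder classes rather than RLlib's own: a `Tuple[Policy, SampleBatch]` value always carries both objects in fixed positions, whereas a `Union[Policy, SampleBatch]` value would be one or the other and would force callers to branch on the runtime type.

```python
from typing import Optional, Tuple, Union, get_origin


class Policy:
    """Hypothetical placeholder for a policy object."""


class SampleBatch(dict):
    """Hypothetical placeholder for a dict-like batch."""


PairHint = Tuple[Policy, SampleBatch]    # one value holding both objects
EitherHint = Union[Policy, SampleBatch]  # one value that is either object

# The two hints have different origins: a tuple type versus a union.
pair_origin = get_origin(PairHint)
either_origin = get_origin(EitherHint)


def batch_of(value: PairHint) -> SampleBatch:
    # With a Tuple hint, position fixes the type, so no isinstance() check
    # is needed to get at the batch.
    _policy, batch = value
    return batch


def batch_or_none(value: EitherHint) -> Optional[SampleBatch]:
    # With a Union hint, the caller has to branch on the runtime type.
    return value if isinstance(value, SampleBatch) else None
```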

sven1977 · 2022-04-29

:) @gjoliver

I think there are only very few places in RLlib where we mix different types, e.g. in a return value (for example, `_process_observations()` in the sampler code), and no, we probably shouldn't do this.

gjoliver

@sven1977 can you help merge?

sven1977 · 2022-04-29T08:33:08Z

Nice fix. Thanks @XuehaiPan, really appreciate all your help on RLlib :)

Fix type hints for original_batches

8611727

Signed-off-by: Xuehai Pan <XuehaiPan@pku.edu.cn>

XuehaiPan requested review from sven1977, gjoliver, avnishn, ArturNiederfahrenhorst and smorad as code owners April 26, 2022 10:29

gjoliver reviewed Apr 26, 2022

gjoliver approved these changes Apr 27, 2022

sven1977 changed the title ~~[RLlib] Fix type hints for original_batches~~ [RLlib] Fix type hints for original_batches. Apr 29, 2022

sven1977 changed the title ~~[RLlib] Fix type hints for original_batches.~~ [RLlib] Fix type hints for original_batches in callbacks. Apr 29, 2022

sven1977 merged commit 3c3dd50 into ray-project:master Apr 29, 2022

XuehaiPan deleted the fix-typehint-original_batches branch August 23, 2022 06:34

XuehaiPan commented Apr 26, 2022

gjoliver Apr 26, 2022

XuehaiPan Apr 27, 2022

gjoliver Apr 27, 2022

sven1977 Apr 29, 2022

gjoliver left a comment

sven1977 commented Apr 29, 2022