[RLlib] SlateQ training iteration function. #24151
Conversation
I don't see a training iteration function in the diff here -- maybe github is off, but maybe you forgot to upload a commit? LMK
@avnishn , we use the exact same …
This all looks pretty good to me except the global vars that we would use for updating rollout workers -- I guess since sampling is synchronous, my comments don't really matter, so Imma go ahead and approve.
# Update weights and global_vars - after learning on the local worker -
# on all remote workers.
global_vars = {
    "timestep": self._counters[NUM_ENV_STEPS_SAMPLED],
If this is using an LR schedule or entropy schedule, shouldn't this be num agent steps trained?
True. The thing is that NUM_AGENT_STEPS_SAMPLED is always a sum over all (multi) agents. So let's say you have 2 agents in your env and 1 policy, which both these agents map to. Then you would update this policy's timestep counter with the sum of these 2 agents' steps, which would be incorrect (as this count is possibly much larger than env steps; double if the two agents always act at the same time).
So I would leave it for now to reflect our current behavior.
Definitely worth looking into this and maybe providing a better per-policy fix for it.
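To make the concern concrete, here is a toy illustration (hypothetical numbers, not taken from this PR) of why the agent-step counter can outpace the env-step counter when two agents map to a single policy:

# Toy example: 2 agents in the env, both mapped to one policy,
# both acting on every env step.
counters = {"num_env_steps_sampled": 0, "num_agent_steps_sampled": 0}

for _ in range(1000):
    counters["num_env_steps_sampled"] += 1    # one env step
    counters["num_agent_steps_sampled"] += 2  # both agents stepped

assert counters["num_agent_steps_sampled"] == 2 * counters["num_env_steps_sampled"]
# Using the agent-step count as the policy's "timestep" would advance an
# LR/entropy schedule twice as fast as the env-step count would.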
# Update weights and global_vars - after learning on the local worker - on all
# remote workers.
global_vars = {
    "timestep": self._counters[NUM_ENV_STEPS_SAMPLED],
Same comment here as above.
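For context, a minimal sketch of what such a hunk typically leads into: building global_vars from the env-step counter and broadcasting it together with the freshly learned weights to the remote workers. The sync_weights(global_vars=...) call is an assumption based on common RLlib patterns, not copied from this diff.

# Sketch (assumed API): after learning on the local worker, push the new
# weights and the updated global_vars to all remote rollout workers so their
# policies (and any LR/entropy schedules) stay in sync.
global_vars = {
    "timestep": self._counters[NUM_ENV_STEPS_SAMPLED],
}
self.workers.sync_weights(global_vars=global_vars)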
"replay_buffer_config": { | ||
"type": "MultiAgentReplayBuffer", | ||
# Enable the new ReplayBuffer API. | ||
"_enable_replay_buffer_api": True, |
Is there a reason you have to set this explicitly, and it's not just the default already?
Does this make SlateQ use another replay buffer API?
Yeah, it'll use the new replay buffer API we are currently rolling out across RLlib.
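For reference, a minimal sketch of how this flag could be set in a SlateQ config (the env name and buffer capacity are placeholders, not values from this PR):

from ray.rllib.agents.slateq import SlateQTrainer

config = {
    # Placeholder: SlateQ expects a RecSim-style slate-recommendation env.
    "env": "my_recsim_env",
    "replay_buffer_config": {
        "type": "MultiAgentReplayBuffer",
        # Opt into the new ReplayBuffer API being rolled out across RLlib.
        "_enable_replay_buffer_api": True,
        "capacity": 100_000,
    },
}
trainer = SlateQTrainer(config=config)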
Why are these changes needed?
Adds a training iteration function for SlateQ and sets _disable_execution_plan_api=True by default for SlateQ.
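As a rough sketch of the shape such a training iteration function takes (helper names and signatures here are assumptions based on RLlib's training-iteration utilities of that era, not necessarily what this PR ships):

from ray.rllib.execution.rollout_ops import synchronous_parallel_sample
from ray.rllib.execution.train_ops import train_one_step
from ray.rllib.utils.metrics import NUM_AGENT_STEPS_SAMPLED, NUM_ENV_STEPS_SAMPLED


def training_iteration(self):
    # 1) Sample synchronously from all rollout workers.
    new_batch = synchronous_parallel_sample(worker_set=self.workers)
    self._counters[NUM_ENV_STEPS_SAMPLED] += new_batch.env_steps()
    self._counters[NUM_AGENT_STEPS_SAMPLED] += new_batch.agent_steps()

    # 2) Store the new experiences in the (new-API) replay buffer.
    self.local_replay_buffer.add(new_batch)

    # 3) Sample a train batch from the buffer and learn on the local worker.
    train_batch = self.local_replay_buffer.sample(self.config["train_batch_size"])
    train_results = train_one_step(self, train_batch)

    # 4) Update weights and global_vars - after learning on the local worker -
    #    on all remote workers.
    global_vars = {"timestep": self._counters[NUM_ENV_STEPS_SAMPLED]}
    self.workers.sync_weights(global_vars=global_vars)

    return train_results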
Related issue number
Checks
- I've run scripts/format.sh to lint the changes in this PR.