
[RLlib] Move bandits into main agents folder; Make RecSim adapter more accessible; #21773

Merged

Conversation

@sven1977 sven1977 (Contributor) commented Jan 21, 2022

  • Move the bandit algorithms (LinTS and LinUCB) from rllib/contrib into the main rllib.agents folder.
  • Rename the registry names for these algos from "contrib/Lin[TS|UCB]" to "BanditLin[TS|UCB]" (see the usage sketch below).
  • Clean up the RecSim environment adapter and create 3 out-of-the-box RLlib-ready RecSim environments (interest evolution, long-term satisfaction, and interest exploration).
  • Add simple compilation tests for the bandit algorithms.
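
As a rough usage sketch of the renamed registry entries (the import path of the example env below is an assumption and not stated in this PR's description):

from ray import tune
# NOTE: the exact module of WheelBanditEnv is assumed here; per this PR it lives
# somewhere under rllib/examples/env/.
from ray.rllib.examples.env.bandit_envs_discrete import WheelBanditEnv

# Old (now deprecated): tune.run("contrib/LinTS") / tune.run("contrib/LinUCB")
# New registry names introduced by this PR:
tune.run(
    "BanditLinTS",
    config={"env": WheelBanditEnv},
    stop={"training_iteration": 10},
)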

TODO (follow up PR):

  • Run benchmarks to use for hard-task nightly/weekly learning tests.

Why are these changes needed?

Related issue number

Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@sven1977 sven1977 changed the title [WIP RLlib] Move bandits into main agents folder. [RLlib] Move bandits into main agents folder; Make RecSim adapter more accessible; Jan 21, 2022
@sven1977 sven1977 requested a review from avnishn January 21, 2022 10:32
@sven1977 sven1977 marked this pull request as ready for review January 21, 2022 10:32
@@ -81,14 +78,14 @@ def make_model_and_action_dist(policy, obs_space, action_space, config):

     # TODO: Have a separate model catalogue for bandits
     if exploration_config:
-        if exploration_config["type"] == TS_PATH:
+        if exploration_config["type"] == "ThompsonSampling":
Contributor:

maybe make this backwards compatible?

Member:

+1

Contributor Author (sven1977):

Not sure this would work: The exploration type is now a single "ThompsonSampling" string as we moved the ThompsonSampling class into our built-in RLlib utils/exploration package.

The two supported exploration types are pretty much hard-coded into these two bandit Trainers (UCB and TS).
Also, the old TS_PATH (-> ThompsonSampling path) no longer exists as it was inside the contrib folder.
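
For reference, the user-facing config now presumably refers to these built-in exploration classes by their plain string names, roughly like this (a sketch based on the strings in the diff above, not the actual default configs):

# Thompson sampling bandit (BanditLinTS):
ts_config = {"exploration_config": {"type": "ThompsonSampling"}}

# Upper-confidence-bound bandit (BanditLinUCB):
ucb_config = {"exploration_config": {"type": "UpperConfidenceBound"}}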

Contributor:

OK, this may not be the right level to do backward-compat messaging. Can you make sure that users' code from previous Ray versions will automatically raise an actionable error message to help them migrate to the new changes?

Contributor Author (sven1977):

This is done through the added "location" check. The new test_bandits test case makes sure this works and users will get a deprecation message when they do stuff like:

tune.run("contrib/LinTS")

# OR

from ray.rllib.contrib.bandits.agents.lin_ts import LinTSTrainer
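
For comparison, the post-PR equivalents would look roughly like this (the new module path of the trainer class is an assumption; only the registry name "BanditLinTS" and the class name BanditLinTSTrainer are confirmed by this PR):

tune.run("BanditLinTS")

# OR (module path assumed):

from ray.rllib.agents.bandit.bandit import BanditLinTSTrainer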

@avnishn avnishn (Member) left a comment

LGTM. Left a few comments and asked some clarifying questions. Thanks for doing this and for all the hard work.

@@ -81,14 +78,14 @@ def make_model_and_action_dist(policy, obs_space, action_space, config):

     # TODO: Have a separate model catalogue for bandits
     if exploration_config:
-        if exploration_config["type"] == TS_PATH:
+        if exploration_config["type"] == "ThompsonSampling":
             if isinstance(original_space, spaces.Dict):
                 assert "item" in original_space.spaces, \
                     "Cannot find 'item' key in observation space"
                 model_cls = ParametricLinearModelThompsonSampling
             else:
                 model_cls = DiscreteLinearModelThompsonSampling
-        elif exploration_config["type"] == UCB_PATH:
+        elif exploration_config["type"] == "UpperConfidenceBound":
Member:

Same here, in terms of keeping this backwards compatible.

Contributor Author (sven1977):

Please see my comment above.

@@ -32,7 +32,7 @@ def plot_model_weights(means, covs):
 if __name__ == "__main__":
     num_iter = 10
     print("Running training for %s time steps" % num_iter)
-    trainer = LinTSTrainer(env=WheelBanditEnv)
+    trainer = BanditLinTSTrainer(env=WheelBanditEnv)
Member:

Is this a toy environment? Asking for clarification, not w.r.t. the PR.

Contributor Author (sven1977):

Yes, this is one of our "bandit-friendly" example envs, now in the rllib/examples/env/... folder.

        # Force good learning behavior (this is a very simple env).
        self.assertTrue(results["episode_reward_mean"] == 10.0)

    def test_deprecated_locations(self):
Member:

I like this

    def tearDownClass(cls) -> None:
        ray.shutdown()

    def test_bandit_lin_ts_compilation(self):
Member:

Do you think it's worth it if we started writing some unit tests for algorithms that check whether the loss functions, etc. have the right outputs? It's extra work, but I've found in the past that unit tests like these heavily reduce the time it takes to debug later performance regressions. That, and it was a good exercise for building knowledge of the algorithms we were writing.

Member:

I can definitely start this pattern on my own whenever I implement an algorithm next.

Member:

+1
Not necessarily this PR though.
E.g., if we add a tf impl for the bandits, we can write unit tests to make sure those loss funcs are the same.

Contributor Author (sven1977):

Great idea @avnishn. Some of our algos do have these tests, e.g. PPO, DQN, PG, MARWIL, but not all. Yes, we should ideally have these for all algos. I agree that it helps reduce bugs at a small cost!
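
To make the idea concrete, here is a toy, self-contained illustration of such a loss-output test (plain NumPy, not RLlib code; the TD-style loss is just a stand-in for whatever loss an algorithm defines):

import numpy as np

def td_loss(q_values, rewards, next_q_values, gamma=0.9):
    # Mean squared TD error; stands in for "the algorithm's loss function".
    targets = rewards + gamma * next_q_values
    return float(np.mean((q_values - targets) ** 2))

# Hand-written batch with an easily verifiable expected loss:
# targets = [0.9, 1.9] -> squared errors = [0.01, 0.01] -> mean = 0.01
q = np.array([1.0, 2.0])
r = np.array([0.0, 1.0])
next_q = np.array([1.0, 1.0])
assert abs(td_loss(q, r, next_q) - 0.01) < 1e-8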

"contrib/LinUCB",
config=UCB_CONFIG,
"BanditLinUCB",
config=config,
stop={"training_iteration": training_iterations},
Member:

Could we add a reward-based stopping criterion once we finish benchmarking these?

Contributor Author (sven1977):

Definitely need to clean up these scripts. These are the original ones from our contributors.

Contributor Author (sven1977):

Can we do this in a follow-up PR?
I just moved these here; these are not new scripts. Let's go through them in another PR and make sure the bandits learn properly.

Examples:
    >>> env_ctx = EnvContext({"a": 1, "b": 2}, worker_index=0)
    >>> env_ctx.set_defaults({"a": -42, "c": 3})
    >>> print(env_ctx)
Member:

I like this

Member:

If I understand this function correctly, it removes clutter from a lot of the example environments that we've added.
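
A rough sketch of how an example env could use it (the env class and its config keys here are hypothetical; only EnvContext.set_defaults() itself is from this PR):

import gym
from gym.spaces import Discrete
from ray.rllib.env.env_context import EnvContext

class MyExampleEnv(gym.Env):
    def __init__(self, config: EnvContext):
        # Fill in missing keys once instead of sprinkling config.get(...) calls around.
        config.set_defaults({"num_arms": 5, "noise": 0.0})
        self.num_arms = config["num_arms"]
        self.noise = config["noise"]
        self.action_space = Discrete(self.num_arms)
        self.observation_space = Discrete(1)

    def reset(self):
        return 0

    def step(self, action):
        # Dummy dynamics: reward 1.0 for arm 0, episode ends after one step.
        return 0, float(action == 0), True, {}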

rllib/env/wrappers/recsim.py (resolved)
 from ray.rllib.utils.error import UnsupportedSpaceException


 class TestRecSimWrapper(unittest.TestCase):
     def test_observation_space(self):
-        env = make_recsim_env(config={})
+        env = InterestEvolutionRecSimEnv()
         obs = env.reset()
Member:

We could also check whether a sampled observation is contained in the observation space, although I'm not sure how good a test that is. We could sample a few thousand observations and check whether they're contained in the obs space (it's a relatively cheap operation).

Contributor Author (sven1977):

I think we are doing this right below, though only twice: after env.reset() and after step().

Added some more checks for the action space test.

Not sure I fully understand what you are aiming at, though, but please feel free to suggest more checks here.
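
For reference, a sketch of the kind of containment check being discussed (the import path of InterestEvolutionRecSimEnv is assumed; the class itself appears in the test diff above):

# Sketch only; not part of this PR's test file.
from ray.rllib.env.wrappers.recsim import InterestEvolutionRecSimEnv  # path assumed

env = InterestEvolutionRecSimEnv()
obs = env.reset()
assert env.observation_space.contains(obs)
for _ in range(1000):
    obs, reward, done, info = env.step(env.action_space.sample())
    assert env.observation_space.contains(obs)
    if done:
        obs = env.reset()
        assert env.observation_space.contains(obs)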


    return multi_action


def rllib_gym_wrapper(recsim_gym_env: gym.Env,
Member:

recsys_gym_wrapper?

Contributor Author (sven1977):

Oh no! Great catch! :D

Contributor Author (sven1977):

fixed

…its_become_1st_class_citizens

# Conflicts:
#	doc/source/rllib/rllib-algorithms.rst
@avnishn avnishn (Member) left a comment

Had some comments about seeding the learning tests, but otherwise LGTM.

        results = None
        for i in range(num_iterations):
            results = trainer.train()
            check_train_results(results)
Member:

Can we seed this test?
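
For reference, RLlib trainers accept a "seed" key in their config, so seeding this test could look roughly like this (sketch only; the surrounding test code is not shown here):

config["seed"] = 42  # fix the RNG seeds used by the trainer and its workers
trainer = BanditLinTSTrainer(config=config, env=WheelBanditEnv)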

@gjoliver gjoliver (Member) left a comment

lgtm

@sven1977 sven1977 merged commit 893536e into ray-project:master Jan 27, 2022