[RLlib] Stats bug fix: EMA stats w/o window would lead to infinite list mem-leak. #45752

sven1977 · 2024-06-05T12:31:27Z

Stats bug fix: EMA stats w/o window would lead to infinite list mem-leak, slowing down execution (and learning).
Minor cleanups and enhancements, mostly in preparation for upcoming IMPALA on new stack.
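
To illustrate the class of bug named in the title, here is a minimal sketch. It is not RLlib's actual `Stats` implementation; the `EmaStat` class and its `ema_coeff` parameter are hypothetical names. An EMA only needs the previous average, so a stat without a window can fold each new value into a single float instead of appending to a list that never shrinks.

```python
class EmaStat:
    """Hypothetical EMA tracker with O(1) state (not RLlib's Stats class)."""

    def __init__(self, ema_coeff: float = 0.01):
        self.ema_coeff = ema_coeff
        self.value = None  # Running average; None until the first push.
        self.num_pushed = 0

    def push(self, x: float) -> None:
        # Fold the new value into the running average right away, so no
        # per-value history has to be kept around.
        if self.value is None:
            self.value = float(x)
        else:
            self.value = self.ema_coeff * x + (1.0 - self.ema_coeff) * self.value
        self.num_pushed += 1


stat = EmaStat(ema_coeff=0.5)
for x in [1.0, 2.0, 3.0]:
    stat.push(x)
```

However many values are pushed, the object holds one float and one counter, so memory use stays constant.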

Why are these changes needed?

Related issue number

Checks

I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
I've run scripts/format.sh to lint the changes in this PR.
I've included any doc changes needed for https://docs.ray.io/en/master/.
- I've added any new APIs to the API Reference. For example, if I added a
  method in Tune, I've added it in doc/source/tune/api/ under the
  corresponding .rst file.
I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
Testing Strategy
- Unit tests
- Release tests
- This PR is not tested :(

Signed-off-by: sven1977 <svenmika1977@gmail.com>

simonsays1980 · 2024-06-06T09:47:51Z

rllib/connectors/common/add_states_from_episodes_to_batch.py

@@ -310,7 +310,7 @@ def __call__(
                )
                self.add_n_batch_items(
                    batch=data,
-                    column="loss_mask",
+                    column=Columns.LOSS_MASK,


Great! More constants! This will reduce our errors everywhere in the lib.
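
A small sketch of why constants help here. The `Columns` class below is a hypothetical stand-in defined only for this example, not the RLlib class itself. A typo in a string literal goes unnoticed at the lookup, while a typo in a constant name raises immediately.

```python
class Columns:
    """Hypothetical stand-in for a class of column-name constants."""

    OBS = "obs"
    LOSS_MASK = "loss_mask"


batch = {Columns.LOSS_MASK: [True, True, False]}

# A typo in a string literal silently looks up a different key ...
typo_lookup = batch.get("loss_maks")  # No error is raised here.

# ... whereas a typo in a constant name fails right at the access.
try:
    batch.get(Columns.LOSS_MAKS)
    typo_detected = False
except AttributeError:
    typo_detected = True
```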

simonsays1980 · 2024-06-06T09:49:53Z

rllib/utils/actor_manager.py

-                None make them synchronous calls.
+            timeout_seconds: Time to wait (in seconds) for results. Set this to 0.0 for
+                fire-and-forget. Set this to None (default) to wait infinitely (i.e. for
+                synchronous execution).


Great description!
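
The two timeout settings described in the docstring can be illustrated with a standard-library analogy. This sketch uses `concurrent.futures.wait`, not the ActorManager API, and `slow_task` is a made-up function: a timeout of 0 returns without waiting, while a timeout of None blocks until the work is done.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait


def slow_task() -> str:
    time.sleep(0.2)
    return "done"


with ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(slow_task)

    # timeout=0: return immediately; the task is most likely still running.
    done_now, pending_now = wait([future], timeout=0)

    # timeout=None: block until the task has finished (synchronous behavior).
    done_later, pending_later = wait([future], timeout=None)
    result = future.result()
```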

simonsays1980 · 2024-06-06T09:57:00Z

rllib/utils/actor_manager.py


        Returns:
-            A tuple of corresponding (remote_calls, remote_actor_ids, valid_tags)
-
+            A tuple consisting of a list of the remote calls that match the tag(s),


Interesting. Has this been there all along? So we can tag certain tasks and check whether these tasks have been completed?

sven1977 · 2024-06-06

Correct. This was always there, albeit rarely used because we don't really have any async algos on the Learner API yet.

You can send async requests to the ActorManager with a tag and later fetch the async results from the manager using this tag. The tag acts as a label that says: I only want these results; the others, even if already ready, I don't care about right now and will fetch later.
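
The tagging pattern described above could look roughly like this. It is a sketch built on `concurrent.futures`, not the ActorManager implementation; `TaggedRequests`, `submit`, and `fetch_ready` are hypothetical names chosen for this example.

```python
from concurrent.futures import Future, ThreadPoolExecutor, wait
from typing import Any, Dict, List, Tuple


class TaggedRequests:
    """Hypothetical helper (not RLlib's ActorManager): async calls with a tag."""

    def __init__(self, pool: ThreadPoolExecutor):
        self._pool = pool
        self._in_flight: Dict[Future, str] = {}

    def submit(self, fn, *args, tag: str) -> None:
        # Remember which tag each in-flight request was sent with.
        self._in_flight[self._pool.submit(fn, *args)] = tag

    def fetch_ready(self, tags: Tuple[str, ...] = ()) -> List[Tuple[str, Any]]:
        # Return finished results whose tag matches; an empty `tags` matches
        # everything. Non-matching requests stay in flight for a later fetch.
        results = []
        for future, tag in list(self._in_flight.items()):
            if (len(tags) == 0 or tag in tags) and future.done():
                results.append((tag, future.result()))
                del self._in_flight[future]
        return results


with ThreadPoolExecutor(max_workers=2) as pool:
    manager = TaggedRequests(pool)
    manager.submit(pow, 2, 3, tag="sample")
    manager.submit(pow, 3, 2, tag="eval")
    wait(list(manager._in_flight))  # Let both requests finish first.

    sample_results = manager.fetch_ready(tags=("sample",))
    remaining = len(manager._in_flight)
    eval_results = manager.fetch_ready(tags=("eval",))
```

In this sketch, the "eval" result is already finished when the "sample" results are fetched, but it stays with the manager until it is asked for by tag.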

simonsays1980 · 2024-06-06T10:00:53Z

rllib/utils/actor_manager.py

            )
        remote_calls = []
        remote_actor_ids = []
        valid_tags = []
        for call, (tag, actor_id) in self._in_flight_req_to_actor_id.items():
            # the default behavior is to return all ready results.
-            if not len(tags) or tag in tags:
+            if len(tags) == 0 or tag in tags:


Does this mean we throw away all other results that don't have this tag? What if we have, e.g., two tags, call this function, and afterwards need the results for the two tags separately (maybe the results come from different sampling regimes): can we distinguish the results via valid_tags?
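
The diff excerpt does not show what happens to requests whose tag does not match, so the following only sketches the selection step itself. It assumes the three returned lists are parallel (entry i of each belongs to the same call); `filter_by_tags` and the sample `in_flight` dict are hypothetical. Under that assumption, results for several tags can be separated again by zipping the calls with `valid_tags`.

```python
from collections import defaultdict


def filter_by_tags(in_flight, tags=()):
    """Select in-flight calls by tag; an empty `tags` selects all of them."""
    calls, actor_ids, valid_tags = [], [], []
    for call, (tag, actor_id) in in_flight.items():
        if len(tags) == 0 or tag in tags:
            calls.append(call)
            actor_ids.append(actor_id)
            valid_tags.append(tag)
    return calls, actor_ids, valid_tags


# Hypothetical in-flight map: call -> (tag, actor_id).
in_flight = {
    "call_a": ("sample", 0),
    "call_b": ("eval", 1),
    "call_c": ("sample", 2),
}

calls, actor_ids, valid_tags = filter_by_tags(in_flight, tags=("sample", "eval"))

# Group the selected calls by the tag reported alongside them.
by_tag = defaultdict(list)
for call, tag in zip(calls, valid_tags):
    by_tag[tag].append(call)

only_eval = filter_by_tags(in_flight, tags=("eval",))
```

Note that this selection function only reads `in_flight`; it does not remove the entries it skips.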

simonsays1980

LGTM.

wip

4c60e4c

Signed-off-by: sven1977 <svenmika1977@gmail.com>

sven1977 requested review from ArturNiederfahrenhorst and simonsays1980 as code owners June 5, 2024 12:31

sven1977 assigned simonsays1980 Jun 5, 2024

sven1977 enabled auto-merge (squash) June 5, 2024 12:54

github-actions bot added the go add ONLY when ready to merge, run all tests label Jun 5, 2024

sven1977 added the tests-ok The tagger certifies test failures are unrelated and assumes personal liability. label Jun 5, 2024

simonsays1980 reviewed Jun 6, 2024

simonsays1980 self-requested a review June 6, 2024 10:06

simonsays1980 approved these changes Jun 6, 2024

sven1977 merged commit cbb1634 into ray-project:master Jun 6, 2024
6 checks passed

sven1977 deleted the fix_metrics_stats_bug_ema_infinite_growth branch June 6, 2024 11:08

ryanaoleary pushed a commit to ryanaoleary/ray that referenced this pull request Jun 6, 2024

[RLlib] Stats bug fix: EMA stats w/o window would lead to infinite li…

c1e4a44

…st mem-leak. (ray-project#45752) Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>

yucai pushed a commit to yucai/ray that referenced this pull request Jun 7, 2024

[RLlib] Stats bug fix: EMA stats w/o window would lead to infinite li…

4f20ab6

…st mem-leak. (ray-project#45752) Signed-off-by: yucai <yyu1@ebay.com>

yucai pushed a commit to yucai/ray that referenced this pull request Jun 7, 2024

[RLlib] Stats bug fix: EMA stats w/o window would lead to infinite li…

08449dd

…st mem-leak. (ray-project#45752) Signed-off-by: yucai <yyu1@ebay.com>

richardsliu pushed a commit to richardsliu/ray that referenced this pull request Jun 12, 2024

[RLlib] Stats bug fix: EMA stats w/o window would lead to infinite li…

f2a610b

…st mem-leak. (ray-project#45752) Signed-off-by: Richard Liu <ricliu@google.com>
