[RLlib] Stats bug fix: EMA stats w/o window would lead to infinite list mem-leak. #45752
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -266,7 +266,7 @@ def __init__( | |
self._restored_actors = set() | ||
self.add_actors(actors or []) | ||
|
||
# Maps outstanding async requests to the ids of the actors that | ||
# Maps outstanding async requests to the IDs of the actors that | |
# are executing them. | ||
self._in_flight_req_to_actor_id: Mapping[ray.ObjectRef, int] = {} | ||
|
||
|
@@ -457,7 +457,7 @@ def _fetch_result( | |
calls were fired against. | ||
remote_calls: list of remote calls to fetch. | ||
tags: list of tags used for identifying the remote calls. | ||
timeout_seconds: timeout for the ray.wait() call. Default is None. | ||
timeout_seconds: Timeout (in sec) for the ray.wait() call. Default is None. | ||
return_obj_refs: whether to return ObjectRef instead of actual results. | ||
mark_healthy: whether to mark certain actors healthy based on the results | ||
of these remote calls. Useful, for example, to make sure actors | ||
|
@@ -593,10 +593,9 @@ def foreach_actor( | |
actors "healthy" that respond to the request within `timeout_seconds` | ||
and are currently tagged as "unhealthy". | ||
remote_actor_ids: Apply func on a selected set of remote actors. | ||
timeout_seconds: Ray.get() timeout in seconds. Default is None, which will | ||
block until all remote results have been received. Setting this to 0.0 | ||
makes all the remote calls fire-and-forget, while setting this to | ||
None make them synchronous calls. | ||
timeout_seconds: Time to wait (in seconds) for results. Set this to 0.0 for | ||
fire-and-forget. Set this to None (default) to wait infinitely (i.e. for | ||
synchronous execution). | ||
Great description! |
||
return_obj_refs: whether to return ObjectRef instead of actual results. | ||
Note, for fault tolerance reasons, these returned ObjectRefs should | ||
never be resolved with ray.get() outside of the context of this manager. | ||
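The `timeout_seconds` semantics described in the docstring above (0.0 for fire-and-forget, None to block until everything is done) can be illustrated with a minimal sketch. It uses `concurrent.futures.wait` from the Python standard library as a stand-in for the Ray calls; `slow_square` and `collect` are hypothetical names introduced only for this illustration and are not part of RLlib.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait


def slow_square(x):
    # Hypothetical stand-in for a remote actor call that takes a while.
    time.sleep(0.2)
    return x * x


def collect(futures, timeout_seconds):
    # timeout_seconds=0.0 returns right away with whatever is already done;
    # timeout_seconds=None blocks until every future has finished.
    done, not_done = wait(futures, timeout=timeout_seconds)
    results = [f.result() for f in futures if f in done]
    return results, len(not_done)


with ThreadPoolExecutor(max_workers=2) as pool:
    futures = [pool.submit(slow_square, i) for i in range(2)]
    # Non-blocking poll: unfinished calls are simply reported as pending.
    ready_now, pending_now = collect(futures, 0.0)
    # Blocking wait: all results are available afterwards.
    ready_all, pending_all = collect(futures, None)
```

In this sketch the non-blocking poll does not cancel anything; the pending calls keep running and are picked up by the later blocking wait.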
|
@@ -649,8 +648,10 @@ def foreach_actor_async( | |
"""Calls given functions against each actors without waiting for results. | ||
|
||
Args: | ||
func: A single, or a list of Callables, that get applied on the list | ||
of specified remote actors. | ||
func: A single Callable applied to all specified remote actors or a list | ||
of Callables, that get applied on the list of specified remote actors. | ||
In the latter case, both list of Callables and list of specified actors | ||
must have the same length. | ||
tag: A tag to identify the results from this async call. | ||
healthy_only: If True, applies `func` only to actors currently tagged | ||
"healthy", otherwise to all actors. If `healthy_only=False` and | ||
|
@@ -730,41 +731,44 @@ def foreach_actor_async( | |
return len(remote_calls) | ||
|
||
def _filter_calls_by_tag( | ||
self, tags | ||
self, tags: Union[str, List[str], Tuple[str]] | ||
) -> Tuple[List[ray.ObjectRef], List[ActorHandle], List[str]]: | ||
"""Return all the in flight requests that match the given tags. | ||
"""Return all the in flight requests that match the given tags, if any. | ||
|
||
Args: | ||
tags: A str or a list of str. If tags is empty, return all the in flight | ||
tags: A str or a list/tuple of str. If tags is empty, return all the in | ||
flight requests. | ||
|
||
Returns: | ||
A tuple of corresponding (remote_calls, remote_actor_ids, valid_tags) | ||
|
||
A tuple consisting of a list of the remote calls that match the tag(s), | ||
Interesting. Did we have this all this time? So we can tag certain tasks and check whether these tasks have been worked on?
Correct. This was always there, albeit rarely used because we don't really have any async algos on the Learner API yet. You can send async requests to the ActorManager with a tag and later fetch the async results from the manager using this tag, as a kind of label that says: I only want these results; the others, even if already ready, I don't care about right now and will fetch them later. |
||
a list of the corresponding remote actor IDs for these calls (same length), | ||
and a list of the tags corresponding to these calls (same length). | ||
""" | ||
if isinstance(tags, str): | ||
tags = {tags} | ||
elif isinstance(tags, (list, tuple)): | ||
tags = set(tags) | ||
else: | ||
raise ValueError( | ||
f"tags must be either a str or a list of str, got {type(tags)}." | ||
f"tags must be either a str or a list/tuple of str, got {type(tags)}." | ||
) | ||
remote_calls = [] | ||
remote_actor_ids = [] | ||
valid_tags = [] | ||
for call, (tag, actor_id) in self._in_flight_req_to_actor_id.items(): | ||
# the default behavior is to return all ready results. | ||
if not len(tags) or tag in tags: | ||
if len(tags) == 0 or tag in tags: | ||
Does this mean we throw away all other results not having this |
||
remote_calls.append(call) | ||
remote_actor_ids.append(actor_id) | ||
valid_tags.append(tag) | ||
|
||
return remote_calls, remote_actor_ids, valid_tags | ||
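The tag filtering shown in this hunk can be tried out in isolation with a standalone sketch. It assumes the in-flight mapping has the shape `{request: (tag, actor_id)}`, as the loop above suggests; plain strings stand in for `ray.ObjectRef` keys, and `filter_calls_by_tag` is a hypothetical free function, not the manager's actual method.

```python
from typing import Dict, List, Sequence, Tuple, Union

# Hypothetical stand-in: strings play the role of ray.ObjectRef keys.
InFlight = Dict[str, Tuple[str, int]]


def filter_calls_by_tag(
    in_flight: InFlight, tags: Union[str, Sequence[str]] = ()
) -> Tuple[List[str], List[int], List[str]]:
    # Normalize the tags argument to a set, mirroring the diff above.
    if isinstance(tags, str):
        tag_set = {tags}
    elif isinstance(tags, (list, tuple)):
        tag_set = set(tags)
    else:
        raise ValueError(
            f"tags must be either a str or a list/tuple of str, got {type(tags)}."
        )
    calls, actor_ids, valid_tags = [], [], []
    for call, (tag, actor_id) in in_flight.items():
        # An empty tag set selects every in-flight request.
        if len(tag_set) == 0 or tag in tag_set:
            calls.append(call)
            actor_ids.append(actor_id)
            valid_tags.append(tag)
    return calls, actor_ids, valid_tags


in_flight = {"req-0": ("train", 1), "req-1": ("eval", 2), "req-2": ("train", 3)}
train_calls, train_actors, train_tags = filter_calls_by_tag(in_flight, "train")
all_calls, _, _ = filter_calls_by_tag(in_flight)
```

In this sketch the filter only reads the mapping, so requests with non-matching tags stay in `in_flight` and can be selected by a later call.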
|
||
@DeveloperAPI | ||
def fetch_ready_async_reqs( | ||
self, | ||
*, | ||
tags: Union[str, List[str]] = (), | ||
tags: Union[str, List[str], Tuple[str]] = (), | ||
timeout_seconds: Optional[float] = 0.0, | ||
return_obj_refs: bool = False, | ||
mark_healthy: bool = True, | ||
|
This file was deleted.
Great! More constants! This will reduce our errors everywhere in the lib.