[Client][Proxy] Prevent Logstream from Timing Out when Delays in DataClient #16180
def delay_in_rewrite(input: JobConfig):
    import time
    time.sleep(6)
Did you test this fails before the fix?
@ericl Added a check to ensure that logsclient is working! I had previously tested this interactively and forgot that the logclient thread failing mainly prints a scary warning.
Sorry, I don't think I grok all the details of this change. Besides moving the logic in the new create_specific_server() to run earlier, and skipping not-ready servers in the background _check_processes() thread, what other changes help prevent Logstream from timing out?
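To make the "skipping not-ready servers" part of the question concrete, here is a minimal sketch of what one pass of such a background check could look like. It is not the code from this PR: the SpecificServer record, its ready and alive fields, and the check_processes function are hypothetical stand-ins chosen for illustration.

```python
import threading
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SpecificServer:
    """Hypothetical stand-in for a per-client server record."""
    name: str
    # Set once the server has finished starting up.
    ready: threading.Event = field(default_factory=threading.Event)
    # Stand-in for "the underlying process is still running".
    alive: bool = True


def check_processes(servers: Dict[str, SpecificServer]) -> List[str]:
    """Run one pass of a background health check.

    Servers that have not finished starting are skipped instead of being
    treated as dead; only ready servers whose process exited are removed.
    """
    removed = []
    for client_id, server in list(servers.items()):
        if not server.ready.is_set():
            continue  # still starting up, leave it alone
        if not server.alive:
            del servers[client_id]
            removed.append(client_id)
    return removed


servers = {
    "starting": SpecificServer("starting", alive=False),
    "dead": SpecificServer("dead", alive=False),
    "healthy": SpecificServer("healthy"),
}
servers["dead"].ready.set()
servers["healthy"].ready.set()
removed = check_processes(servers)
```

In this sketch the "starting" entry survives the pass even though its process is not running yet, which is the property the question refers to.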
LGTM!
No worries @mwtian! Moving the creation logic to I prefer to keep the creation logic associated with
I see. It seems in the existing implementation,
@mwtian The current failure is that the ray/python/ray/util/client/worker.py Line 463 in e1da31f
36981fc to 431f16e
Why are these changes needed?
This ensures that the LogstreamServicer does not time out fetching a channel when the DataStream initial connection takes a long time.
Related issue number
Closes #16178
Checks
I've run scripts/format.sh to lint the changes in this PR.
time.sleep(10) in rewrite_runtime_env_uris.
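As a rough illustration of the behavior described above, the sketch below shows one way a log-stream lookup could wait for a channel that is registered early but only becomes usable after a slow data connection finishes. It is an assumption-laden sketch, not the implementation in this PR: ChannelRegistry, reserve, publish, get_channel, and slow_data_setup are all hypothetical names, and the delay is shortened from the several seconds used in testing.

```python
import threading
import time
from typing import Dict, Optional


class ChannelRegistry:
    """Hypothetical registry mapping client ids to channels."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._events: Dict[str, threading.Event] = {}
        self._channels: Dict[str, str] = {}

    def reserve(self, client_id: str) -> None:
        # Register the client before the slow setup starts, so that a
        # later lookup can wait instead of failing immediately.
        with self._lock:
            self._events.setdefault(client_id, threading.Event())

    def publish(self, client_id: str, channel: str) -> None:
        # Called once the slow setup has finished.
        with self._lock:
            event = self._events.setdefault(client_id, threading.Event())
            self._channels[client_id] = channel
        event.set()

    def get_channel(self, client_id: str, timeout: float) -> Optional[str]:
        with self._lock:
            event = self._events.get(client_id)
        # Unknown clients fail fast; reserved clients wait up to `timeout`.
        if event is None or not event.wait(timeout):
            return None
        with self._lock:
            return self._channels.get(client_id)


def slow_data_setup(registry: ChannelRegistry, client_id: str, delay: float) -> None:
    time.sleep(delay)  # simulates a delayed initial data connection
    registry.publish(client_id, "channel-for-" + client_id)


registry = ChannelRegistry()
registry.reserve("client-1")
worker = threading.Thread(target=slow_data_setup, args=(registry, "client-1", 0.2))
worker.start()
channel = registry.get_channel("client-1", timeout=5.0)
worker.join()
unknown = registry.get_channel("client-2", timeout=0.1)
```

The design choice illustrated here is that reserving the entry up front lets the log side distinguish "not ready yet" from "never requested", so only the latter returns immediately.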