-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Data] Reduce internal Ray Data stack trace output by default #43251
Conversation
Signed-off-by: Scott Lee <sjl@anyscale.com>
Signed-off-by: Scott Lee <sjl@anyscale.com>
Signed-off-by: Scott Lee <sjl@anyscale.com>
By default, the stack trace for these exceptions is omitted from | ||
stdout, but will still be emitted to the Ray Data specific log file. | ||
To emit all stack frames to stdout, set | ||
`DataContext.internal_stack_trace_stdout` to True.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
`DataContext.internal_stack_trace_stdout` to True.""" | |
`DataContext.log_internal_stack_trace_to_stdout` to True.""" |
"is omitted from stdout, and only written to the Ray Data log " | ||
f"file at {data_exception_logger._log_path}. To output the " | ||
"full stack trace to stdout, set " | ||
"`DataContext.internal_stack_trace_stdout` to True`." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"`DataContext.internal_stack_trace_stdout` to True`." | |
"`DataContext.log_internal_stack_trace_to_stdout` to True`." |
python/ray/data/tests/test_logger.py
Outdated
if not found_exception_call: | ||
raise Exception( | ||
f"Searched logs for the following text, but did not find: `{text}` " | ||
f"All calls: {mock_calls}" | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit, use assert
?
Signed-off-by: Scott Lee <sjl@anyscale.com>
"""Represents an Exception originating from user code, e.g. | ||
user-specified UDF used in a Ray Data transformation. | ||
|
||
By default, the stack trace for these exceptions is omitted from |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a bit misleading. sounds like the entire stack trace is omitted.
from ray.util.annotations import DeveloperAPI | ||
|
||
|
||
@DeveloperAPI |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not a huge deal. but I slightly prefer to put these 2 exceptions and the util function in a new ray/data/exceptions.py
file.
Because they are more related to exceptions, instead of logging. And we will no longer need UserCodeException.__module__ = __name__
as well
log_to_stdout = DataContext.get_current().log_internal_stack_trace_to_stdout | ||
# data_exception_logger.get_logger(log_to_stdout=log_to_stdout).exception(e) | ||
|
||
is_user_code_exception = isinstance(e, ray.exceptions.RayTaskError) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ray.exceptions.RayTaskError
doesn't always mean user code exception. it could be because of other ray internal exceptions from the tasks.
To accurately detect user code exception. we should add a try-catch wrap to the UDF.
) | ||
|
||
if is_user_code_exception: | ||
raise e.with_traceback(None) from UserCodeException() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should be raise UserCodeException() from e.with_traceback(None)
right?
Because e
was the original exception, and the new UserCodeException
was caused by e
.
But I think it's okay to just raise e.with_traceback(None)
as well.
We can use UserCodeException
to wrap the UDF, as I mentioned in the above comment.
Signed-off-by: Scott Lee <sjl@anyscale.com>
Signed-off-by: Scott Lee <sjl@anyscale.com>
Signed-off-by: Scott Lee <sjl@anyscale.com>
Signed-off-by: Scott Lee <sjl@anyscale.com>
Signed-off-by: Scott Lee <sjl@anyscale.com>
Signed-off-by: Scott Lee <sjl@anyscale.com>
Signed-off-by: Scott Lee <sjl@anyscale.com>
Signed-off-by: Scott Lee <sjl@anyscale.com>
Remaining premerge failure is unrelated to this PR ( |
… default"" (#43735) Reverts #43638, adding changes from original PR #43251 with fix for breaking osx / windows tests. Closes #40802 Postmerge test run: https://buildkite.com/ray-project/postmerge/builds/3306 --------- Signed-off-by: Scott Lee <sjl@anyscale.com>
…oject#43251) Whenever there is any error when using Ray Data, the full stack trace is currently printed to stdout. If the exception originates from the user code, the internal stack trace is usually not needed, and obfuscates the relevant user code with the actual error. This PR reduces the stack trace output to omit internal Ray Data or Ray Core code in the stdout output, and only logs the full stack trace to the Ray Data ray-data.log file. Users can enable the full stack trace outputted to stdout by setting DataContext.log_internal_stack_trace_to_stdout = True. Signed-off-by: Scott Lee <sjl@anyscale.com>
…oject#43251) Whenever there is any error when using Ray Data, the full stack trace is currently printed to stdout. If the exception originates from the user code, the internal stack trace is usually not needed, and obfuscates the relevant user code with the actual error. This PR reduces the stack trace output to omit internal Ray Data or Ray Core code in the stdout output, and only logs the full stack trace to the Ray Data ray-data.log file. Users can enable the full stack trace outputted to stdout by setting DataContext.log_internal_stack_trace_to_stdout = True. Signed-off-by: Scott Lee <sjl@anyscale.com>
…ray-project#43251)" (ray-project#43638) This reverts commit 8eea8d8. Confirmed that this broke darwin://python/ray/tests:test_actor_retry (go/flaky)
… default"" (ray-project#43735) Reverts ray-project#43638, adding changes from original PR ray-project#43251 with fix for breaking osx / windows tests. Closes ray-project#40802 Postmerge test run: https://buildkite.com/ray-project/postmerge/builds/3306 --------- Signed-off-by: Scott Lee <sjl@anyscale.com>
Why are these changes needed?
Whenever there is any error when using Ray Data, the full stack trace is currently printed to stdout. If the exception originates from the user code, the internal stack trace is usually not needed, and obfuscates the relevant user code with the actual error. This PR reduces the stack trace output to omit internal Ray Data or Ray Core code in the stdout output, and only logs the full stack trace to the Ray Data ray-data.log file. Users can enable the full stack trace outputted to stdout by setting DataContext.log_internal_stack_trace_to_stdout = True.
For the sample user code with an error, we compare the stack trace output for several Ray Data usages before and after this PR:
The output is as follows:
Before (stdout):
After (stdout):
After (ray-data.log):
Before (stdout):
After (stdout):
After (ray-data.log):
Before (stdout):
After (stdout):
After (ray-data.log):
Finally, here is a comparison of the log output in the case of an internal Ray Data code error (I added a
1/0
in the Ray Data code to trigger an exception):Before:
After (stdout):
After (ray-data.log):
Related issue number
Checks
git commit -s
) in this PR.scripts/format.sh
to lint the changes in this PR.method in Tune, I've added it in
doc/source/tune/api/
under thecorresponding
.rst
file.