Add address and pid prefix to the mars exception message #2730

Merged (4 commits) on Feb 23, 2022

fyrestone
Contributor

What do these changes do?

Developers need the address and pid to debug problems in a large distributed cluster, so this PR:

  • Adds an address and pid prefix to the Mars exception message.

An example exception:

  File "mars/oscar/core.pyx", line 226, in _handle_actor_result
    result = await result
  File "/Users/admin/mars/services/tests/fault_injection_patch.py", line 86, in run
    return await super().run()
  File "/Users/admin/mars/services/subtask/worker/processor.py", line 457, in run
    await self._execute_graph(chunk_graph)
  File "/Users/admin/mars/services/subtask/worker/processor.py", line 195, in _execute_graph
    await self._async_execute_operand(self._datastore, chunk.op)
  File "/Users/admin/mars/services/tests/fault_injection_patch.py", line 94, in _async_execute_operand
    handle_fault(fault)
  File "/Users/admin/mars/services/tests/fault_injection_manager.py", line 50, in handle_fault
    raise FaultInjectionError("Fault Injection")
types._MarsError: [address=ray://test_cluster/1/2, pid=35418] Fault Injection
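The prefix on the last line of the traceback has the form `[address=..., pid=...]`. A minimal sketch of how such a prefix could be built is shown here; the helper names `format_error_prefix` and `prefix_message` are hypothetical and are not taken from the Mars code base.

```python
import os
from typing import Optional


def format_error_prefix(address: str, pid: int) -> str:
    """Hypothetical helper: build a prefix in the format shown in the traceback."""
    return f"[address={address}, pid={pid}]"


def prefix_message(address: str, message: str, pid: Optional[int] = None) -> str:
    """Hypothetical helper: prepend the address/pid prefix to an error message."""
    # Fall back to the current process id when none is supplied.
    if pid is None:
        pid = os.getpid()
    return f"{format_error_prefix(address, pid)} {message}"


# Reproduces the message text from the example exception.
print(prefix_message("ray://test_cluster/1/2", "Fault Injection", pid=35418))
# -> [address=ray://test_cluster/1/2, pid=35418] Fault Injection
```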

Related issue number

Fixes #xxxx

Check code requirements

  • tests added / passed (if needed)
  • Ensure all linting tests pass, see here for how to run them

class _MarsError(ErrorMessage.AsCauseBase, type(self.error)):
    pass

return _MarsError(
Member

Wrapping existing errors has some side effects. For instance, some exception types have custom attributes, such as HTTP error codes. Wrapping them with error messages hides these attributes and may lead to unexpected errors. Simply adding attributes to these exceptions and logging locations when handling exceptions at the receiver end may be a better solution.
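To illustrate the concern, here is a small sketch comparing the two approaches. `HTTPError`, `WrappedError`, `wrap`, and `annotate` are hypothetical names made up for this example; none of them come from Mars.

```python
class HTTPError(Exception):
    """Hypothetical exception type that carries a custom attribute."""

    def __init__(self, message: str, status_code: int):
        super().__init__(message)
        self.status_code = status_code


class WrappedError(Exception):
    """Hypothetical generic wrapper that keeps only a prefixed message."""


def wrap(error: Exception, address: str, pid: int) -> Exception:
    # Wrapping: a new exception object replaces the original, so custom
    # attributes such as status_code are not available on the result.
    return WrappedError(f"[address={address}, pid={pid}] {error}")


def annotate(error: Exception, address: str, pid: int) -> Exception:
    # Alternative from the review: keep the original exception object and
    # attach the location to it as extra attributes.
    error.address = address
    error.pid = pid
    return error


original = HTTPError("Not Found", status_code=404)
wrapped = wrap(original, "127.0.0.1:1234", 1)
annotated = annotate(original, "127.0.0.1:1234", 1)

print(hasattr(wrapped, "status_code"))  # -> False
print(annotated.status_code, annotated.address, annotated.pid)  # -> 404 127.0.0.1:1234 1
```

Code at the receiver end that checks `error.status_code` keeps working with the annotated exception, but would fail with an `AttributeError` on the wrapped one.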

Contributor Author

Fixed. Thanks. I added a test case test_as_instanceof_cause for testing as_instanceof_cause.
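The diff fragment above defines `_MarsError` as a subclass of both `ErrorMessage.AsCauseBase` and `type(self.error)`. The following is a rough sketch of that general technique, not the actual Mars implementation: the mixin body, the function signature, and the attribute-copying steps are assumptions made for illustration, as is the `HTTPError` example type.

```python
class AsCauseBase:
    """Assumed mixin: prefixes str() output with the address and pid."""

    def __str__(self):
        return f"[address={self.address}, pid={self.pid}] {super().__str__()}"


def as_instanceof_cause(error: BaseException, address: str, pid: int) -> BaseException:
    # Create a subclass of the original exception type at runtime, so that
    # isinstance checks and `except OriginalType:` clauses still match.
    class _PrefixedError(AsCauseBase, type(error)):
        pass

    # Skip __init__, which may require custom arguments, and copy state instead.
    new_error = _PrefixedError.__new__(_PrefixedError)
    new_error.args = error.args
    new_error.__dict__.update(error.__dict__)  # keeps custom attributes
    new_error.address = address
    new_error.pid = pid
    return new_error.with_traceback(error.__traceback__)


class HTTPError(Exception):
    """Hypothetical exception type with a custom attribute."""

    def __init__(self, message: str, status_code: int):
        super().__init__(message)
        self.status_code = status_code


err = as_instanceof_cause(HTTPError("Not Found", 404), "127.0.0.1:1234", 7)
print(isinstance(err, HTTPError), err.status_code)  # -> True 404
print(err)  # -> [address=127.0.0.1:1234, pid=7] Not Found
```

In this sketch the custom attribute survives and the message gains the prefix, which is the kind of behavior a test such as `test_as_instanceof_cause` would plausibly check.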

@@ -59,8 +59,9 @@


 class _ErrorProcessor:
-    def __init__(self, message_id: bytes, protocol):
+    def __init__(self, message_id: bytes, address: str, protocol):
Member

Simply use pool as the argument?

Contributor Author

Currently, _ErrorProcessor needs some message info and the address of the current pool. Replacing address with pool does not simplify the constructor parameters.

Member

wjsi left a comment

LGTM

Contributor

hekaisheng left a comment

LGTM

hekaisheng merged commit 0f2c0d6 into mars-project:master on Feb 23, 2022
qinxuye pushed a commit to hekaisheng/mars that referenced this pull request Mar 1, 2022
chaokunyang pushed a commit to chaokunyang/mars that referenced this pull request May 31, 2022
…ars exception message (mars-project#2730)

Merge branch improve_mars_exception_compatibility of git@gitlab.alipay-inc.com:ray-project/mars.git into master
https://code.alipay.com/ray-project/mars/pull_requests/251

Signed-off-by: 慕白 <chaokun.yck@antgroup.com>


* Add address and pid prefix to the mars exception message (mars-project#2730)

* Fix CI
chaokunyang pushed a commit to chaokunyang/mars that referenced this pull request May 31, 2022
Merge branch cp_2633_2723_2730 of git@gitlab.alipay-inc.com:ray-project/mars.git into master
https://code.alipay.com/ray-project/mars/pull_requests/266

Signed-off-by: 不涸 <zhongchun.yzc@antgroup.com>


* Refine failure recovery log and exception (mars-project#2633)

* Refine fo log and exception

* Pin xgboost_ray to 0.1.5

Co-authored-by: 留宝 <po.lb@antgroup.com>
Co-authored-by: 刘宝 <po.lb@antfin.com>

* Fix duplicate exceptions in log (mars-project#2723)

* Add address and pid prefix to the mars exception message (mars-project#2730)