Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Serve] Fix surprious __call__ invocation in Deployment DAG's exec_impl #24199

Merged
merged 4 commits into from
Apr 27, 2022

Conversation

simon-mo
Copy link
Contributor

@simon-mo simon-mo commented Apr 26, 2022

Why are these changes needed?

It turns out whenever you call dag.execute on a deployment graph, the nodes are executed recursively. When a node is executed, whatever in its _exec_impl will be called and its return value is put in parent_node of all its child node's _exec_impl. So this PR make sure the DeploymentNode's exec just return the handle to be used by DeploymentMethodNode's exec.

Related issue number

Closes #24116

Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@@ -139,7 +139,8 @@ def _copy_impl(

def _execute_impl(self, *args, **kwargs):
"""Executor of DeploymentNode by ray.remote()"""
return self._deployment_handle.remote(*self._bound_args, **self._bound_kwargs)
# TODO fix comment
return self._deployment_handle
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

class node's impl is also broken, see 24198

@edoakes edoakes added the @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. label Apr 27, 2022
@simon-mo simon-mo changed the title [Serve] [WIP] Fix deployment node exec impl [Serve] Fix surprious __call__ invocation in Deployment DAG's exec_impl Apr 27, 2022
@simon-mo simon-mo removed the @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. label Apr 27, 2022
@simon-mo
Copy link
Contributor Author

@edoakes @jiaodong ready for review/stamp

Copy link
Member

@jiaodong jiaodong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great fix ! From my benchmark yesterday, patching this also improves perf

return method_body.remote(
*self._bound_args,
**self._bound_kwargs,
)

def _get_serve_deployment_handle(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no longer used

other_args_to_resolve=self._bound_other_args_to_resolve,
other_args_to_resolve={
**self._bound_other_args_to_resolve,
PARENT_CLASS_NODE_KEY: self,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

used by unit test, not by the actual ray dag -> serve dag translation path

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we're only passing it so DeploymentMethodNode can find method_body = getattr(self._deployment_node, self._deployment_method_name) that's essentially a deployment handle method call, do we need to include the actual instance of DeploymentNode in other_args_to_resolve ?

Given other_args_to_resolve is on critical path of DAG execution and json serde it's preferable if we add minimal info and avoid passing in Deployment or DAGNode objects if possible.

@simon-mo simon-mo merged commit b4d9fcd into ray-project:master Apr 27, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Serve] Fix weird missing methods upon doing serve.run
3 participants