Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Serve][Deployment Graph] Let .bind return ray DAGNode types and remove exposing DeploymentNode as public #24065

Merged
merged 12 commits into from
Apr 21, 2022

Conversation

jiaodong
Copy link
Member

@jiaodong jiaodong commented Apr 21, 2022

Why are these changes needed?

See dag layering summary in #24061

We need to cleanup and set right ray dag -> serve dag layering where .bind() can be called on @serve.deployment decorated class or func, but only returns raw Ray DAGNode type, executable by ray core and serve_dag is only available after serve-specific transformations.

Thus this PR removes exposed serve DAGNode type such as DeploymentNode.

It also removes the syntax of class.bind().bind() to return a DeploymentMethodNode that defaults to __call__ to match same behavior in ray dag building.

closes #24062

Next steps

Refactor DeploymentNode implementation that doesn't always bundle DAGNode, Deployment creation, handling config and replacing handle in one node. We need all of the functionalities, but shouldn't be in one place.

Concerte example: all Ray DAGNode types uses _copy to handle function calls recursively which will re-instantiate DAGNode instances upon replacement it's the mechanism used to execute the DAG as well. But we don't want to re-do everything upon each call in DeploymentNode, not just for abstraction, but also for performance.

Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

from ray.experimental.dag.class_node import ClassNode
from ray.experimental.dag.function_node import FunctionNode
from ray.experimental.dag import DAGNode
from ray.experimental.dag.class_node import ClassNode # noqa: F401
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these imports happen at deployment_graph level so from api.py we're just importing from serve module, rather than directly exposing experimental folder stuff.

Ideally we should have another effort to cleanup imports while eliminating circular imports.

python/ray/serve/deployment_graph.py Outdated Show resolved Hide resolved
python/ray/serve/pipeline/deployment_node.py Outdated Show resolved Hide resolved
python/ray/experimental/dag/class_node.py Outdated Show resolved Hide resolved
@ericl ericl added the @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. label Apr 21, 2022
@@ -79,6 +79,12 @@ def _contains_input_node(self) -> bool:
return False

def __getattr__(self, method_name: str):
# User trying to call .bind() without a bind class method
if method_name == "bind" and "bind" not in dir(self._body):
Copy link
Member Author

@jiaodong jiaodong Apr 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ericl addressed, now we don't have the extra .bind() on @serve.deployment, consistent with Ray Core .bind now.

Only little nit i did here is to detect and surface the same DAGNode base exception message when user tries to Actor.bind().bind(), that previously we only surfaced type object 'Actor' has no attribute 'bind'

@jiaodong jiaodong removed the @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. label Apr 21, 2022
@jiaodong jiaodong requested review from ericl and edoakes April 21, 2022 04:26
@jiaodong jiaodong added tests-ok The tagger certifies test failures are unrelated and assumes personal liability. flaky-test labels Apr 21, 2022
Copy link
Contributor

@edoakes edoakes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, let's wait for @ericl to sign off

if method_name == "bind" and "bind" not in dir(self._body):
raise AttributeError(
f".bind() cannot be used again on {type(self)} "
f"(args: {self.get_args()}, kwargs: {self.get_kwargs()})."
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why print args and kwargs here? Doesn't seem relevant and might be spammy.

@edoakes
Copy link
Contributor

edoakes commented Apr 21, 2022

@jiaodong please fix error message before we merge

@ericl ericl merged commit f0071d3 into ray-project:master Apr 21, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
tests-ok The tagger certifies test failures are unrelated and assumes personal liability.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Serve] Let .bind() on @serve.deployment return ClassNode / FunctionNode only
5 participants