New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Serve] Add deployment graph cookbook #24524
Conversation
import ray | ||
from ray import serve | ||
from ray.experimental.dag.input_node import InputNode | ||
from ray.serve.drivers import DAGDriver |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's probably better to have a snippet at top of this doc as "imports" section so all the subsequent cells can be much shorter in length. We want to to the exact opposite of current e2e documentation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will do this in further pr if we have lots of duplicates import (i think we will have :) ).
else: | ||
prev_outputs[i] = nodes[i].forward.bind(prev_outputs[i - 1]) | ||
|
||
serve_dag = DAGDriver.options(route_prefix="/my-dag").bind(prev_outputs[-1]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i would suggest cutting this part completely and directly do dag.execute(0)
and prints its output in the doc. The sequence of action we want user to take is Ray DAG -> Serve DAG where DAGDriver and serve.run() comes later when user needs http / serving / rolling updates.
Therefore i will be helpful to have sections like "How do I serve my DAG with HTTP" , "How do i add adapter / schema to my http input" and "How do I scale one node in my deployment graph"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
prev_outputs[-1] is a bit confusing here as cookbook that aims for simplicity :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you have suggestions what we should describe this scenario? I can add comments mention that "we need to use the last node to execute the dag" But let me know if you have good idea
# 10 nodes chain | ||
num_nodes = 10 | ||
nodes = [Model.bind(w) for w in range(num_nodes)] | ||
prev_outputs = [None] * num_nodes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think it's worth having a "warning" as "anti-pattern" section for this part explaining why using a variable name and re-assign it in a loop is not the right way.
tip: After wheel is built, latest documentation on commit can be viewed on CI |
# 10 nodes chain in a line | ||
num_nodes = 10 | ||
nodes = [Model.bind(w) for w in range(num_nodes)] | ||
outputs = [None] * num_nodes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we need very explicitly inline comment here (i would suggest even adding a WARNING section right underneath) about why this does not behave like a python variable that can be reused in a loop
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Synced offline, leave it later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This format looks much more self-contained and extendable :) really close now and just a few nits. I can take care of inline editing later on as we iterate on later parts.
doc/source/serve/deployment-graph/chain_nodes_same_class_different_args.md
Outdated
Show resolved
Hide resolved
doc/source/serve/deployment-graph/chain_nodes_same_class_different_args.md
Outdated
Show resolved
Hide resolved
doc/source/serve/deployment-graph/chain_nodes_same_class_different_args.md
Show resolved
Hide resolved
doc/source/serve/deployment-graph/chain_nodes_same_class_different_args.md
Show resolved
Hide resolved
- file: serve/deployment-graph | ||
- file: serve/deployment-graph/deployment-graph | ||
sections: | ||
- file: serve/deployment-graph/deployment-graph-e2e-tutorial |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you know I'm against this "feature-first" approach to docs hierarchy, but I can't stop you guys from doing your thing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're referencing Ray dataset structure which i believe you contributed a lot to its format, open to changes :) iiuc you're suggesting we should bring that to first level user guide instead? I think that's better format too to avoid nesting docs, but meanwhile Deployment graph was just in alpha stage but i haven't found a structure that satisfies both yet. What do you think is more reasonable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm working on Serve doc restructuring. This structure will be transient for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for clarifying, @simon-mo . We don't have to change anything for this PR, just wanted to chime in.
@jiaodong Generally, users are interested in their workloads and workflows, rather than specific features of a library (that they often don't know yet and won't click on). A user guide / cookbook for this is great, but I wouldn't put it top level. To me there shouldn't be a "Deployment Graph" doc there at all, as I find the structure a little confusing.
Where should I go to learn about deploying my models first? Core API: Deployments, Calling Deployments, Serving ML Models, Deployment Graph, or Deploying Ray Serve? They all sound very similar and I don't want to read through 5 docs first to understand what I need, because I might only have 10 minutes. All relatively easy to streamline, I think. No worries there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the inputs Max, it makes a lot of sense. As we chatted earlier in March, Serve team should re-structure our docs to better organize our content on this and it's happening now, please don't hesitate to step up and give us suggestions like this :)
For this PR in particular, Sihan is ramping on the context to make progress, but I will self-assign and ensure syncing with Simon to properly organize them as part of the docs restructure effort.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry i should have mentioned this earlier for the image file please commit it to separate repo to make the size of ray repo smaller: https://github.com/ray-project/images/tree/master/docs, you should be able to push it directly and link it via the url of its raw content.
The content and structure looks good to me for the first page. Please rebase and resolve a small merge conflict and let's aim to merge it tomorrow morning
431eec0
to
86c8cba
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Letting @simon-mo to take another look
Hmm looks like this snippet is still not tested. I think we need to add a py_test target at https://github.com/ray-project/ray/blob/master/python/ray/serve/BUILD#L458-L464 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM as long as test passes
merging |
Why are these changes needed?
Provide more examples for user to use deployment graph api
Related issue number
Checks
scripts/format.sh
to lint the changes in this PR.