Add example for inference server #994
Conversation
Hey @iojw, thanks for submitting this PR! Just curious: what's the expected usage? Is it for real-time inference or offline inference? Is the inference request repeated over time? And why do we need the Sky Python API for this? I think we need to make a concrete story here.
@WoosukKwon I believe the expected usage is for real-time inference, with the inference request changing based on the input image being passed in the HTTP request. I think the case for using the Python API is that it is much more natural than using …
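For illustration, here is a minimal sketch of what that per-request flow could look like with the Python API (the run command, paths, and cluster name are placeholders, not the code in this PR):

```python
import sky

def handle_request(local_image_path: str) -> None:
    # Build a task whose input is the image from this HTTP request.
    task = sky.Task(
        run='python predict.py /remote/inputs/image.png',  # placeholder command
        workdir='.',
    )
    task.set_resources(sky.Resources(cloud=sky.Azure()))
    # Ship the uploaded image to the cluster before the task runs.
    task.set_file_mounts({'/remote/inputs/image.png': local_image_path})
    # Provision a cluster (or reuse one with the same name) and run the task.
    sky.launch(task, cluster_name='inference')
```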
Thanks @iojw for submitting this PR! Left some comments. Please check them out.
workdir=workdir,
run=run_fn)

resources = sky.Resources(cloud=sky.Azure())
Is there any reason for using Azure here?
I only had access to Azure at the time, so I used it for easier local testing. I recently gained access to AWS as well; would you suggest leaving it as the default in this case?
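One option (just my assumption, not something decided in this PR) would be to leave the cloud unspecified and let Sky's optimizer pick among whatever clouds the user has enabled:

```python
import sky

# Let the optimizer choose among the user's enabled clouds...
resources = sky.Resources()
# ...or pin a specific cloud explicitly:
# resources = sky.Resources(cloud=sky.AWS())
```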
@@ -0,0 +1,18 @@
import os
Can we have a more realistic example in addition to (or instead of) this one? As pointed out in server.py, we need to send model weights and inputs to the cluster and get the prediction outputs from the cluster. Why don't we show such a complete example here?
That makes sense! I went with a simpler example because I don't have much experience with ML and inference - what do you think would be a good library and model to implement here?
# Copy model weights to the cluster
# Instead of local path, can also specify a cloud object store URI
'/remote/path/to/model-weights': 'local/path/to/model-weights',
# Copy image to the cluster
remote_image_path: local_image_path,
Nice point! Our storage APIs are indeed useful for sending model weights and inputs.
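For instance, a sketch of such file mounts (the bucket name and paths are placeholders, not the PR's actual values):

```python
import sky

task = sky.Task(run='python predict.py')  # placeholder command
task.set_file_mounts({
    # Copy model weights to the cluster from a local path...
    '/remote/path/to/model-weights': 'local/path/to/model-weights',
    # ...or from a cloud object store URI, e.g.:
    # '/remote/path/to/model-weights': 's3://my-bucket/model-weights',
    # Copy the uploaded image for this request to the cluster.
    '/remote/path/to/image.png': 'local/path/to/image.png',
})
```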
examples/inference_server/server.py
LOCAL_UPLOAD_FOLDER = '/Users/isaac/Dropbox/Berkeley/Sky/sky/examples/inference_server/uploads'
REMOTE_UPLOAD_FOLDER = '/remote/path/to/folder'
Can we make this example easier to run? It would be nice if users can run this example without any modification to the code.
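One way to do that (a sketch, assuming the uploads folder can simply live next to server.py) is to derive the local path from the script's own location instead of hard-coding a user-specific path:

```python
import os

# Store uploads next to server.py so users don't need to edit any paths.
LOCAL_UPLOAD_FOLDER = os.path.join(
    os.path.dirname(os.path.abspath(__file__)), 'uploads')
os.makedirs(LOCAL_UPLOAD_FOLDER, exist_ok=True)
REMOTE_UPLOAD_FOLDER = '/tmp/inference_server_uploads'
```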
This PR is stale because it has been open 120 days with no activity. Remove stale label or comment or this will be closed in 10 days.
This PR was closed because it has been stalled for 10 days with no activity.
This is a first pass at how a server executing computations on Sky might be implemented. The use case I have in mind is when a user wants to execute certain computations on Sky, triggered by HTTP requests and using the request parameters as input.
To test it out:
1. Modify the `LOCAL_UPLOAD_FOLDER` variable and the data init file mount path.
2. Run `FLASK_APP=examples/inference_server/server.py flask run`, then access `http://127.0.0.1:5000` and upload an image.

The server spins up a Sky cluster, initializes the cluster with some model weights, runs some computation on the uploaded image, and returns the result as the HTTP response.

Some thoughts on the implementation:
- `sky.launch` takes the bulk of the time. If all the computations can be executed on the same cluster, we can have a setup function run before the first request is made that spins up the cluster, and subsequent computations would all be run on this cluster using `sky.exec` (see the sketch below).

Let me know what you think!
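A rough sketch of that launch-once / exec-many pattern (hypothetical names and placeholder commands, not the actual server.py in this PR):

```python
import sky
from flask import Flask

app = Flask(__name__)
CLUSTER_NAME = 'inference-server'
_cluster_ready = False

def _ensure_cluster():
    """Provision the cluster on the first request; later requests reuse it."""
    global _cluster_ready
    if not _cluster_ready:
        setup_task = sky.Task(run='echo cluster ready')  # placeholder setup
        setup_task.set_resources(sky.Resources(cloud=sky.Azure()))
        sky.launch(setup_task, cluster_name=CLUSTER_NAME)  # slow: provisions a VM
        _cluster_ready = True

@app.route('/predict', methods=['POST'])
def predict():
    _ensure_cluster()
    infer_task = sky.Task(run='python predict.py')  # placeholder inference command
    sky.exec(infer_task, cluster_name=CLUSTER_NAME)  # fast: reuses the live cluster
    return 'ok'
```

With this shape, only the first request pays the provisioning cost of `sky.launch`; every later request only pays for `sky.exec` on the already-running cluster.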