
Add example for inference server #994

Closed
wants to merge 12 commits

Conversation

@iojw (Collaborator) commented Jul 19, 2022

This is a first pass at how a server executing computations on Sky might be implemented. The use case I have in mind is a user who wants to run certain computations on Sky, triggered by HTTP requests and taking the request parameters as input.

To test it out:

  1. Update the LOCAL_UPLOAD_FOLDER variable and the data init file mount path.
  2. Run FLASK_APP=examples/inference_server/server.py flask run, then access http://127.0.0.1:5000 and upload an image. The server spins up a Sky cluster, initializes it with some model weights, runs a computation on the uploaded image, and returns the result as the HTTP response (a minimal sketch of the handler follows).
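
A minimal sketch of what the handler does (predict.py, the RESULT: sentinel, the paths, and the cluster name are all illustrative, not the PR's actual code):

```python
import os
import re

import flask
import sky

app = flask.Flask(__name__)

# Hypothetical sentinel: the remote run command prints 'RESULT: <value>' so
# the server can regex it out of the logs (no return-value API yet, see #978).
RESULT_RE = re.compile(r'^RESULT: (.*)$', re.MULTILINE)


@app.route('/', methods=['POST'])
def infer():
    upload = flask.request.files['image']
    os.makedirs('/tmp/uploads', exist_ok=True)
    local_path = f'/tmp/uploads/{upload.filename}'
    upload.save(local_path)

    task = sky.Task(run='python predict.py /remote/input.img', workdir='.')
    task.set_resources(sky.Resources(cloud=sky.Azure()))
    task.set_file_mounts({
        '/remote/model-weights': 'local/model-weights',  # placeholder paths
        '/remote/input.img': local_path,
    })
    sky.launch(task, cluster_name='inference')

    # The PR scrapes the job logs for the sentinel; this sketch assumes they
    # were captured to a local file by some means (hand-waved here).
    logs = open('job.log').read()
    match = RESULT_RE.search(logs)
    return match.group(1) if match else ('no result found', 500)
```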

Some thoughts on the implementation:

  • Currently, there does not seem to be a good way of retrieving the return value of a function, so I had to regex the job logs to extract the result. This is not ideal, but I believe it is being addressed by Add core programmatic API align with CLI #978
  • Because the handler launches a new Sky cluster, each request is very long-running, currently taking > 10 min. I have 2 ideas for how to improve this (both sketched after this list):
    • Use a task queue like Celery to execute the computations in the background when the handler is called, and return the job ID instead. The user can then poll the status of the job with that ID.
    • It looks like sky.launch takes the bulk of the time. If all the computations can run on the same cluster, a setup function run before the first request can spin up the cluster once, and subsequent computations can all be run on it using sky.exec.
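
A rough sketch of the Celery idea (illustrative only; a local Redis broker and the run_inference task name are assumptions):

```python
from celery import Celery

# Illustrative Celery app; the broker/backend URLs are assumptions.
celery_app = Celery('inference',
                    broker='redis://localhost:6379/0',
                    backend='redis://localhost:6379/0')


@celery_app.task
def run_inference(local_image_path: str) -> str:
    # The sky.launch / sky.exec calls would move here, off the request path.
    ...

# In the Flask handler, return immediately with a job ID the client can poll:
#   job = run_inference.delay(local_path)
#   return {'job_id': job.id}
```

And a sketch of the warm-cluster idea (cluster name and predict.py are placeholders): launch once at startup, then dispatch each request with sky.exec, which skips provisioning:

```python
import sky

CLUSTER_NAME = 'inference-server'  # placeholder


def warm_up():
    # Run once before the first request: pay the sky.launch cost up front.
    setup_task = sky.Task(setup='pip install -r requirements.txt', run='true')
    setup_task.set_resources(sky.Resources(cloud=sky.Azure()))
    sky.launch(setup_task, cluster_name=CLUSTER_NAME)


def run_on_cluster(remote_image_path: str):
    # Subsequent requests reuse the existing cluster via sky.exec.
    task = sky.Task(run=f'python predict.py {remote_image_path}')
    sky.exec(task, cluster_name=CLUSTER_NAME)
```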

Let me know what you think!

@iojw self-assigned this Jul 19, 2022
@WoosukKwon (Collaborator) commented:
Hey @iojw, thanks for submitting this PR!

Just curious: what's the expected usage? Is it for real-time inference or offline inference? Is the inference request repeated over time? And why do we need the Sky Python API for this? I think we need a concrete story here.

@iojw (Collaborator, Author) commented Jul 20, 2022

@WoosukKwon I believe the expected usage is real-time inference, with the inference request varying based on the input image passed in the HTTP request. I think the case for using the Python API is that it is much more natural than subprocess, since the server is built in Python as well (a short contrast is sketched below). cc @concretevitamin
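
A minimal contrast of the two styles (task and cluster names are illustrative):

```python
import subprocess

import sky

# CLI via subprocess: shell out, and to get anything back, parse text output.
subprocess.run(['sky', 'launch', '-c', 'inference', 'task.yaml'], check=True)

# Python API: the task is a first-class object in the same process.
task = sky.Task(run='python predict.py')
sky.launch(task, cluster_name='inference')
```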

@WoosukKwon (Collaborator) left a comment:


Thanks @iojw for submitting this PR! Left some comments. Please check them out.

workdir=workdir,
run=run_fn)

resources = sky.Resources(cloud=sky.Azure())
Collaborator:

Is there any reason for using Azure here?

Collaborator Author:

I only had access to Azure at the time, so I used it for easier local testing. I recently gained access to AWS as well; would you suggest leaving it as the default in this case? (A cloud-agnostic option is sketched below.)
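
If the example shouldn't be tied to one provider, one option (a sketch; when no cloud is specified, Sky's optimizer chooses among the clouds the user has enabled) is to omit the argument entirely:

```python
import sky

# Leaving `cloud` unspecified lets Sky's optimizer pick any enabled cloud,
# so the example is not tied to Azure or AWS.
resources = sky.Resources()
```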

@@ -0,0 +1,18 @@
import os
Collaborator:

Can we have a more realistic example in addition to (or instead of) this one? As pointed out in server.py, we need to send model weights and inputs to the cluster and get the prediction outputs from the cluster. Why don't we show such a complete example here?

Collaborator Author:

That makes sense! I went with a simpler example because I don't have much experience with ML and inference. What do you think would be a good library and model to use here?
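
Purely as an illustration (not a suggestion from the thread; assumes torch and torchvision are installed on the cluster), a pretrained torchvision ResNet-18 could serve as the "realistic" computation:

```python
import torch
from PIL import Image
from torchvision import models, transforms


def classify(image_path: str) -> int:
    # Pretrained ResNet-18; downloads ImageNet weights on first use.
    model = models.resnet18(pretrained=True)
    model.eval()
    # Standard ImageNet preprocessing.
    preprocess = transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225]),
    ])
    batch = preprocess(Image.open(image_path).convert('RGB')).unsqueeze(0)
    with torch.no_grad():
        # Return the predicted ImageNet class index.
        return model(batch).argmax(dim=1).item()
```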

Comment on lines +69 to +73
# Copy model weights to the cluster
# Instead of local path, can also specify a cloud object store URI
'/remote/path/to/model-weights': 'local/path/to/model-weights',
# Copy image to the cluster
remote_image_path: local_image_path,
Collaborator:

Nice point! Our storage APIs are indeed useful in sending model weights and inputs.
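
For instance (a sketch; the bucket name and paths are placeholders), file mounts accept either local paths or object-store URIs:

```python
import sky

task = sky.Task(run='python predict.py')
task.set_file_mounts({
    # Local path: Sky syncs the file up to the cluster.
    '/remote/input.img': 'uploads/input.img',
    # Object-store URI: weights are pulled from cloud storage instead.
    '/remote/model-weights': 's3://my-bucket/model-weights',  # placeholder
})
```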

Comment on lines 20 to 21
LOCAL_UPLOAD_FOLDER = '/Users/isaac/Dropbox/Berkeley/Sky/sky/examples/inference_server/uploads'
REMOTE_UPLOAD_FOLDER = '/remote/path/to/folder'
Collaborator:

Can we make this example easier to run? It would be nice if users can run this example without any modification to the code.
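
One way to do that (a sketch; the remote path is an arbitrary scratch directory) is to derive the local folder from the script's own location, so the example runs unmodified:

```python
import os

# Resolve the upload folder relative to this file instead of hard-coding a
# user-specific absolute path.
LOCAL_UPLOAD_FOLDER = os.path.join(
    os.path.dirname(os.path.abspath(__file__)), 'uploads')
os.makedirs(LOCAL_UPLOAD_FOLDER, exist_ok=True)
REMOTE_UPLOAD_FOLDER = '/tmp/inference_uploads'  # arbitrary remote scratch dir
```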

@concretevitamin removed their request for review May 14, 2023 16:35
@github-actions (bot):

This PR is stale because it has been open 120 days with no activity. Remove stale label or comment or this will be closed in 10 days.

@github-actions (bot) added the Stale label Sep 12, 2023
@github-actions (bot):

This PR was closed because it has been stalled for 10 days with no activity.

@github-actions (bot) closed this Sep 22, 2023