Review model API design #259
A bunch of different ways. I built a couple of custom queue-based batch processing pipelines using RabbitMQ. We also used AI Platform for some workloads, Cloud Dataflow with Klio for large-scale batch processing, Kubeflow for many of the non-deep-learning models, real-time systems where features were precomputed in batch and retrieved from fast feature stores, and so on. I've tended towards queue-based systems because I've worked mostly with deep learning models where latency wasn't critical and models consumed lots of resources, limiting possible concurrency. But I can also see use cases for a bundled real-time HTTP API.

Two trade-offs I'm thinking about:

First, how can we design Cog such that it can be used in heterogeneous environments? Ideally the same Cog model should be deployable on Replicate, AI Platform, SageMaker, Cloud Dataflow, Seldon, custom GKE setups, and so on. The sidecar pattern would help in some of these environments, but not on the managed serving platforms. Can we make a core Cog server that can be wrapped with adapters for various environments? Perhaps we provide well-documented HTTP and AMQP APIs out of the box, and maintain adapters for different platforms together with the community.

Secondly, how do we keep the APIs simple? Both KServe and Seldon are great, but, like k8s, they have large surface areas and steep learning curves. Can we be as opinionated in our serving APIs as we are with Cog, and still be deployable in different environments?
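To make the "core server wrapped with adapters" idea concrete, here's a minimal sketch. Everything in it is hypothetical (the `Predictor` class, the adapter functions, the message shapes are invented for illustration, not any real Cog API), and a plain `queue.Queue` stands in for a RabbitMQ channel. The point is that the model-serving core is written once, and each deployment target gets a thin adapter around it.

```python
import json
import queue
import threading
from http.server import BaseHTTPRequestHandler

# Core predictor: the single piece shared by every deployment target.
# (Hypothetical sketch -- not a real Cog class.)
class Predictor:
    def predict(self, inputs: dict) -> dict:
        # Stand-in for real model inference.
        return {"output": inputs.get("text", "").upper()}

# Adapter 1: queue-based worker. A queue.Queue stands in here for an
# AMQP channel; a real adapter would consume from RabbitMQ instead.
def queue_worker(predictor, requests_q, responses_q):
    while True:
        msg = requests_q.get()
        if msg is None:  # shutdown sentinel
            break
        responses_q.put(predictor.predict(msg))

# Adapter 2: minimal HTTP wrapper around the *same* core predictor,
# built on the stdlib so the sketch stays self-contained.
def make_http_handler(predictor):
    class Handler(BaseHTTPRequestHandler):
        def do_POST(self):
            body = self.rfile.read(int(self.headers["Content-Length"]))
            result = predictor.predict(json.loads(body))
            payload = json.dumps(result).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(payload)

        def log_message(self, *args):  # silence per-request logging
            pass

    return Handler

if __name__ == "__main__":
    predictor = Predictor()

    # Drive the queue adapter end to end.
    req_q, res_q = queue.Queue(), queue.Queue()
    t = threading.Thread(target=queue_worker, args=(predictor, req_q, res_q))
    t.start()
    req_q.put({"text": "hello"})
    print(res_q.get())  # {'output': 'HELLO'}
    req_q.put(None)
    t.join()
```

A SageMaker or AI Platform adapter would be another thin wrapper of the same shape: translate the platform's invocation format into `predictor.predict()` and back, leaving the model code untouched.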
This is a quick fix to get GPU models serving correctly. The real fix is being incorporated into replicate#259 and replicate#343. Signed-off-by: Ben Firshman <ben@firshman.co.uk>
Hmm, I wonder whether we can consider this superseded by #443?
I think we can...
The current model API was hastily written, and we have now learned many things that could be incorporated into its design.
User data
Requirements
Essential
Future, to design in context
Prior art
We don't need to reinvent the wheel.
Off the shelf
Real world
Areas for discussion
Potential solutions
Next steps
See also
(Maintainers -- please edit this and consider it a wiki! Edited by @bfirsh, ...)