KubeAI v0.9.0
Highlights
- Autoscaling now works with any engine, including Ollama and FasterWhisper
- Models can now be cached on shared filesystems (e.g. Filestore, EFS)
What's Changed
- Autoscale based on KubeAI OpenTelemetry active-request metrics by @nstogner in #261
- Add resourceProfiles and 405B on A100 80GB by @samos123 in #264
- Refactor e2e tests by @nstogner in #263
- Add Autoscaler State ConfigMap by @nstogner in #268
- Add TPU quota to GKE install guide and use values-gke.yaml by @samos123 in #271
- Update vLLM images to 0.6.3 by @samos123 in #273
- Shared filesystem caching by @nstogner in #272
- Add manual test of vLLM on GPU and TPU by @samos123 in #279
Full Changelog: v0.8.0...v0.9.0