Feature Type
Nice to have
Feature Description
Gemini on Vertex AI now supports priority-tier inference in preview:
https://docs.cloud.google.com/vertex-ai/generative-ai/docs/priority-paygo
The Gemini API likewise supports service tiers with priority inference:
https://ai.google.dev/gemini-api/docs/priority-inference
TL;DR: API users can pay extra for inference that guarantees lower latency and more flexible throughput.
For voice agents with spiky traffic that value low-latency inference, this is a critical feature.
This should be supported explicitly through the gemini plugin. Currently, it can only be approximated by passing custom HTTP options to the Gemini LLM.
Workarounds / Alternatives
No response
Additional Context
No response