Gemini - Priority Tiering #5664

@smith-mathieu-infinitusai

Description

Feature Type

Nice to have

Feature Description

Gemini on Vertex AI now supports priority-tier inference in preview:
https://docs.cloud.google.com/vertex-ai/generative-ai/docs/priority-paygo

The Gemini (genai) API also supports service tiers with priority inference:
https://ai.google.dev/gemini-api/docs/priority-inference

TL;DR: API users can pay extra for inference in exchange for lower latency and more flexible throughput.
For voice agents that have spiky traffic and value low-latency inference, this is a critical feature.

This should be supported explicitly through the Gemini plugin. Currently, it can be implemented through the Gemini LLM HTTP options.
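
For reference, a minimal sketch of what the HTTP-options workaround might look like. Everything here is an assumption to check against the docs linked above: the header name and value are placeholders for whatever the priority tier actually requires, and `build_http_options` is a hypothetical helper, not part of the plugin or the Google SDK.

```python
from typing import Dict, Optional

# ASSUMPTION: placeholder header name/value for selecting the priority tier.
# Confirm the real name and value in the priority PayGo docs before using.
PRIORITY_HEADER = "X-Vertex-AI-LLM-Shared-Request-Type"
PRIORITY_VALUE = "priority"


def build_http_options(
    extra_headers: Optional[Dict[str, str]] = None,
    priority: bool = True,
) -> Dict[str, Dict[str, str]]:
    """Hypothetical helper: return HTTP options carrying the priority header.

    The caller's headers are copied, so the input dict is never mutated.
    """
    headers = dict(extra_headers or {})
    if priority:
        headers[PRIORITY_HEADER] = PRIORITY_VALUE
    return {"headers": headers}


if __name__ == "__main__":
    # Print the options that would be handed to the LLM's HTTP options.
    print(build_http_options({"x-request-source": "voice-agent"}))
```

The resulting dict would then be wrapped in the SDK's HTTP options type and passed to the Gemini LLM. Explicit plugin support would replace this with a single tier parameter, so users don't have to know the header.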

Workarounds / Alternatives

No response

Additional Context

No response

Labels

enhancement (New feature or request)
