GenAI service_tier - is it even working?? #2448

@danielLublinsky

Hey,

Gemini has Priority inference - https://ai.google.dev/gemini-api/docs/priority-inference#how-to-use
We recently enabled priority for all of our AI usage.
Sadly, I can't get it to perform; we still get:

503 UNAVAILABLE. {'error': {'code': 503, 'message': 'This model is currently experiencing high demand. Spikes in demand are usually temporary. Please try again later.', 'status': 'UNAVAILABLE'}}

The service_tier flag is set and the response acknowledges the setting. We are running Tier 2 on the API key, which the docs say is eligible:

Priority inference is available to Tier 2 & Tier 3 users across the GenerateContent API and Interactions API endpoints.

And still nothing.
I know it's not 100% guaranteed to return a response at peak times, but in the few weeks it has been out it has been completely useless for us: it never helps avoid the 503 or speeds up response times as promised.
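
In the meantime, one possible stopgap for the 503s is a client-side retry with exponential backoff. This is only a minimal sketch, not anything from the SDK: `retry_on_unavailable` and `looks_like_503` are hypothetical helper names, `call` stands for whatever zero-argument function issues the request, and the check on a `.code` attribute is an assumption about how the error exposes its HTTP status.

```python
import random
import time


def retry_on_unavailable(call, is_unavailable, max_attempts=5,
                         base_delay=1.0, max_delay=30.0, sleep=time.sleep):
    """Run call() and retry with exponential backoff on 503-style errors."""
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception as exc:
            # Re-raise anything that is not a 503, and the last failed attempt.
            if not is_unavailable(exc) or attempt == max_attempts - 1:
                raise
            # Exponential backoff capped at max_delay, with full jitter.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, delay))


def looks_like_503(exc):
    # Assumption: the error exposes the HTTP status as `.code`; otherwise
    # fall back to the 'UNAVAILABLE' status text from the error message.
    return getattr(exc, "code", None) == 503 or "UNAVAILABLE" in str(exc)
```

Usage would be along the lines of `retry_on_unavailable(lambda: make_request(), looks_like_503)`, where `make_request` is a placeholder for the actual call. This does not fix priority routing itself; it only smooths over short demand spikes.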

Am I possibly missing something?
Has anyone else used this and actually gotten something out of it?

Environment details

  • Programming language: Python 3.14.4

  • Package version: 2.1.0

Thanks!

Labels

priority: p2 (Moderately-important priority. Fix may not be included in next release.)
type: bug (Error or flaw in code with unintended results or allowing sub-optimal usage patterns.)
