
Feature Request: ADK LlmAgent should support self-deployed Vertex AI Endpoints (which use :predict) in the 'model=' parameter #936

@aashnakunk

Description


Is your feature request related to a problem? Please describe.

Yes. I'm frustrated because it's currently impossible to use a self-deployed open model (like Gemma 2 or Llama 3) from the Vertex AI Model Garden as the "brain" for an adk.Agent.

When I pass the self-deployed endpoint path to the model= parameter in adk.Agent, the agent fails on its first call with a 400 'Failed to apply chat template.' error.

Through testing, I've confirmed this is due to an API mismatch:

The ADK Agent class is hard-coded to speak the modern :generateContent API (which is necessary for tool-calling, chat history, and system instructions).

The standard Model Garden "Deploy" button for open models (like Gemma) creates an endpoint that only understands the simpler :predict API.

This API conflict means the ADK Agent is sending a complex :generateContent payload to an endpoint that only accepts simple :predict payloads, causing it to fail.
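To make the mismatch concrete, here is a rough sketch of the two request shapes. The field names follow the public Vertex AI REST docs, but the exact payloads that ADK and the serving container exchange may differ; the inner `prompt`/`max_tokens` keys in particular depend on which container Model Garden deployed.

```python
# Illustrative sketch only; exact field names vary by ADK version and
# serving container.

# Shape of a :generateContent request (what the ADK agent sends):
generate_content_request = {
    "contents": [
        {"role": "user", "parts": [{"text": "What is the capital of France?"}]}
    ],
    "systemInstruction": {"parts": [{"text": "You are a helpful assistant."}]},
    "tools": [{"functionDeclarations": [{"name": "get_weather"}]}],
}

# Shape of a :predict request (what the Model Garden endpoint accepts):
predict_request = {
    "instances": [
        {"prompt": "What is the capital of France?", "max_tokens": 256}
    ]
}
```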

Describe the solution you'd like

I'd like the ADK Agent class to be able to use self-deployed Model Garden endpoints as its main "brain." This could be solved in two ways:

Make the ADK smarter: Update the adk.Agent class to detect when a model= endpoint is a :predict service and automatically translate its agentic requests (with tools, history, etc.) into the {"instances": [...]} format that the endpoint understands (a rough sketch of such a translation follows this list).

Make Model Garden smarter: Change the default "Deploy" container for generative models (like Gemma/Llama) so they serve the agent-compatible :generateContent API instead of just the :predict API.
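As an illustration of the first option, a shim along these lines could flatten a generateContent-style request into a :predict payload. Everything here is hypothetical: the flattening format, the `prompt`/`max_tokens` keys, and the auth handling are all assumptions, and a real fix would also need to handle tool declarations, function-call responses, and streaming, none of which have an obvious :predict equivalent.

```python
import requests  # token assumed from e.g. `gcloud auth print-access-token`

def generate_content_to_predict(gc_request: dict) -> dict:
    """Hypothetical translation: flatten system instruction + chat history
    into one prompt string for an {"instances": [...]} endpoint."""
    lines = []
    for part in gc_request.get("systemInstruction", {}).get("parts", []):
        lines.append(f"[system] {part.get('text', '')}")
    for content in gc_request.get("contents", []):
        role = content.get("role", "user")
        for part in content.get("parts", []):
            lines.append(f"[{role}] {part.get('text', '')}")
    return {"instances": [{"prompt": "\n".join(lines), "max_tokens": 512}]}

def call_predict(endpoint_url: str, token: str, gc_request: dict) -> dict:
    """POST the translated payload to the endpoint's :predict route."""
    resp = requests.post(
        f"{endpoint_url}:predict",
        headers={"Authorization": f"Bearer {token}"},
        json=generate_content_to_predict(gc_request),
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()
```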

Describe alternatives you've considered

The only alternative I've found is to not use the SLM as the agent brain.

Instead, I have to:

Use a Gemini model (like gemini-1.5-flash-001) as the adk.Agent brain.

Write a custom @tool that manually calls my self-deployed Gemma endpoint's :predict API (using requests or aiplatform.gapic.PredictionServiceClient).

This works, but it's an unsatisfying workaround. It relegates the SLM to a simple "tool" instead of letting it be the actual agent, which is what I was trying to test and build.
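For reference, the workaround looks roughly like this. It's a sketch only: the project/region/endpoint values are placeholders, the ADK import path and Agent signature may differ across versions, and the instance keys depend on the serving container.

```python
from google.adk.agents import Agent  # import path may vary by ADK version
from google.cloud import aiplatform
from google.protobuf import json_format
from google.protobuf.struct_pb2 import Value

# Placeholders: substitute your own project, region, and endpoint ID.
ENDPOINT = "projects/PROJECT_ID/locations/us-central1/endpoints/ENDPOINT_ID"

def ask_gemma(prompt: str) -> str:
    """Tool that forwards a prompt to the self-deployed Gemma :predict endpoint."""
    client = aiplatform.gapic.PredictionServiceClient(
        client_options={"api_endpoint": "us-central1-aiplatform.googleapis.com"}
    )
    instance = json_format.ParseDict({"prompt": prompt, "max_tokens": 256}, Value())
    response = client.predict(endpoint=ENDPOINT, instances=[instance])
    return str(response.predictions[0])

# The SLM is demoted to a tool; Gemini is still the actual agent brain.
agent = Agent(
    name="gemma_proxy_agent",
    model="gemini-1.5-flash-001",
    instruction="Answer questions by calling the ask_gemma tool.",
    tools=[ask_gemma],
)
```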

Additional context

This issue was confirmed by successfully deploying gemma2-2b-it to an endpoint (.../endpoints/6374043728566288384).

A curl command to the :predict endpoint with an {"instances": [...]} payload succeeded.

The adk.Agent class, when pointed to that exact same endpoint, failed with the 400 'Failed to apply chat template.' error.

This same core problem also seems to affect publisher models like Mistral, which use a non-standard :rawPredict API and also cannot be used as an Agent brain. This forces all adk.Agent development to be locked into Gemini models.
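To illustrate the Mistral case: :rawPredict takes an opaque HTTP body rather than typed instances, so it is even further from the :generateContent contract the agent needs. A hedged sketch follows; the payload schema is defined by the publisher, not shown here.

```python
import json

from google.api import httpbody_pb2
from google.cloud import aiplatform

def call_raw_predict(endpoint: str, payload: dict) -> str:
    """Sketch of a :rawPredict call: the body is opaque bytes whose schema
    is set by the publisher, not by the Vertex AI prediction API."""
    client = aiplatform.gapic.PredictionServiceClient(
        client_options={"api_endpoint": "us-central1-aiplatform.googleapis.com"}
    )
    body = httpbody_pb2.HttpBody(
        content_type="application/json",
        data=json.dumps(payload).encode("utf-8"),
    )
    response = client.raw_predict(endpoint=endpoint, http_body=body)
    return response.data.decode("utf-8")
```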
