[Feature] Enforce Agent output_schema for OpenAI-compatible / third-party BaseLlm providers (DeepSeek, NVIDIA NIM, etc.)

## 🔴 Required Information

### Is your feature request related to a specific problem?

Yes. In a multi-stage ADK `SequentialAgent` pipeline we rely on `Agent(output_schema=PydanticModel)` for **mechanical JSON extraction** (e.g. a `StructuredResearch` schema with nested lists, enums, optional fields).

This works reliably with **native Google models** (`Gemini` / `Gemma` via the Google API): ADK passes the schema to the API and the response is constrained at generation time.

It does **not** work reliably when the same `Agent` uses a **custom `BaseLlm`** backed by OpenAI-compatible third-party endpoints (we use **DeepSeek** and previously **NVIDIA NIM**):

1. ADK forwards `response_schema` on `LlmRequest.config`, but there is **no official ADK mapping** from that schema to provider-specific strict structured-generation APIs.
2. Our custom connector can only fall back to OpenAI `response_format: {"type": "json_object"}` — which guarantees *some* JSON object, **not** conformance to the Pydantic schema (field names, types, required fields, enums).
3. Models frequently return invalid shapes anyway: prose prefixes, ` ```json ` fences, wrong keys (`date` instead of `period`), missing required fields (`sentiment`), nested dicts instead of lists — causing `model_validate_json` failures in the agent loop.
4. Provider-specific hacks (e.g. NVIDIA NIM `extra_body.nvext.guided_json`) are outside ADK, brittle, and still failed in production (JSON wrapped in markdown despite `guided_json`).

**Concrete production impact:** we had to move a structuring stage from NVIDIA Llama 3.3 70B (NIM) back to **Gemma via Gemini API** solely because `output_schema` is effectively a first-class feature only for Google-native connectors. DeepSeek stages still need duplicated schema text in prompts + heavy post-normalization.

Example agent setup (works on Gemini, fragile on custom `BaseLlm`):

```python
structuring = Agent(
    model=get_gemini_structuring(),  # Gemma — reliable output_schema
    name="StructuringAgent",
    output_schema=StructuredResearch,
    output_key="structured",
)

scoring = Agent(
    model=get_deepseek_connector(),  # custom BaseLlm — json_object only
    name="ScoringAgent",
    output_schema=CompanyResponse,
    output_key="company_json",
)
```

### Describe the Solution You'd Like

We would like ADK to treat `Agent.output_schema` as a **portable contract** across model backends, not only Google API models.

Minimum viable solution:

1. **Document** how `output_schema` / `LlmRequest.config.response_schema` should be honored by custom `BaseLlm` implementations.
2. Provide an **official or reference OpenAI-compatible connector** that maps ADK schema → provider capabilities:
   - OpenAI / Azure: `response_format: { type: "json_schema", json_schema: { name, schema, strict: true } }`
   - NVIDIA NIM: `extra_body.nvext.guided_json` (or current NIM structured-generation API)
   - DeepSeek / other OpenAI-compat: best-effort strict mode where supported, clear fallback otherwise
3. **Capability flags** on `BaseLlm` (e.g. `supports_strict_json_schema`, `supports_json_object`) so ADK can choose strategy and log when schema enforcement is degraded.
4. Optional but valuable: a **single ADK sanitization hook** before Pydantic validation (strip markdown fences / prose wrappers) when providers return quasi-JSON — so every custom connector does not reimplement this.
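
To make item 4 concrete, here is a minimal sketch of what such a sanitization hook could look like. The name `sanitize_json_response` and its behavior are assumptions for illustration, not existing ADK API; it only handles the two wrappers described above (markdown fences and prose prefixes) and uses the standard library only.

```python
import json
import re

# Three backticks, built indirectly so this example stays fence-safe.
_FENCE = "`" * 3
# Matches a fenced block, with or without a "json" language tag.
_FENCE_RE = re.compile(_FENCE + r"(?:json)?\s*(.*?)" + _FENCE, re.DOTALL | re.IGNORECASE)


def sanitize_json_response(raw: str) -> str:
    """Return the first JSON object or array found in `raw`, re-serialized.

    Hypothetical helper (not part of ADK). Raises ValueError when no JSON
    value can be decoded from the text.
    """
    text = raw.strip()
    fenced = _FENCE_RE.search(text)
    if fenced:
        # Prefer the content of the first fenced block when one exists.
        text = fenced.group(1).strip()
    decoder = json.JSONDecoder()
    # Try every position that could start an object or array, so that a
    # prose prefix before the JSON payload is skipped.
    for match in re.finditer(r"[{\[]", text):
        try:
            value, _end = decoder.raw_decode(text, match.start())
        except json.JSONDecodeError:
            continue
        return json.dumps(value)
    raise ValueError("no JSON object or array found in model response")
```

The returned string could then be passed to the existing `model_validate_json` step unchanged. A hook like this repairs wrappers only; it does not fix wrong keys or missing fields, which still need schema enforcement at generation time.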

Ideal API shape (pseudo-code):

```python
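from typing import Literal

from google.adk.models.base_llm import BaseLlm


class OpenAICompatibleLlm(BaseLlm):
    structured_output_mode: Literal["strict_schema", "json_object", "none"] = "strict_schema"

    async def generate_content_async(self, llm_request, stream=False):
        kwargs = {}  # extra arguments for the chat-completions call
        # adk_schema_to_json_schema is a proposed helper (not existing ADK API)
        # that returns a JSON Schema dict, or None when no schema is set.
        schema = adk_schema_to_json_schema(llm_request.config.response_schema)
        if schema and self.structured_output_mode == "strict_schema":
            kwargs["response_format"] = {
                "type": "json_schema",
                "json_schema": {
                    "name": schema.get("title", "output"),
                    "schema": schema,
                    "strict": True,
                },
            }
        elif schema and self.structured_output_mode == "json_object":
            kwargs["response_format"] = {"type": "json_object"}
        ...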
```
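
The pseudo-code above relies on a hypothetical `adk_schema_to_json_schema` step. One part of that step can be sketched independently of ADK: as far as we understand OpenAI's Structured Outputs rules, strict mode expects every object in the schema to set `additionalProperties: false` and to list all of its properties under `required`. The helper names below (`to_strict_json_schema`, `build_response_format`) are assumptions for illustration, and the input is assumed to be a plain JSON Schema dict such as the output of Pydantic's `model_json_schema()`.

```python
import copy
from typing import Any


def to_strict_json_schema(schema: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of `schema` adjusted for strict mode (hypothetical helper)."""
    strict = copy.deepcopy(schema)  # never mutate the caller's schema

    def visit(node: Any) -> None:
        if isinstance(node, dict):
            if node.get("type") == "object" and isinstance(node.get("properties"), dict):
                # Strict mode: no extra keys, and every property is required.
                node["additionalProperties"] = False
                node["required"] = list(node["properties"])
            for child in node.values():
                visit(child)  # also walks nested objects and "$defs"
        elif isinstance(node, list):
            for child in node:
                visit(child)

    visit(strict)
    return strict


def build_response_format(name: str, schema: dict[str, Any]) -> dict[str, Any]:
    """Build the `response_format` payload shown in the pseudo-code above."""
    return {
        "type": "json_schema",
        "json_schema": {
            "name": name,
            "schema": to_strict_json_schema(schema),
            "strict": True,
        },
    }
```

Marking every property as required changes the meaning of optional fields, so a real adapter would probably also need to rewrite optional fields as nullable types. That part is left out of this sketch.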

For custom providers (NVIDIA NIM), ADK could expose an extension point:

```python
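def apply_structured_output(provider: str, schema: dict, request_kwargs: dict) -> dict:
    if provider == "nvidia_nim":
        # Merge into any existing nvext options instead of overwriting them.
        extra_body = request_kwargs.setdefault("extra_body", {})
        extra_body.setdefault("nvext", {})["guided_json"] = schema
    ...  # other providers
    return request_kwargs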
```

### Impact on your work

- **High impact** on production multi-agent pipelines that mix Google models (search/tools) with cheaper external models (reasoning/scoring).
- Without this, teams must either:
  - (a) duplicate schemas in prompts and maintain custom sanitizers/normalizers per provider, or
  - (b) route all structured stages to Google models, increasing cost/quota pressure on `GEMINI_API_KEY`.

Timeline: not a hard deadline, but this is a **practical blocker** for using ADK as a provider-agnostic orchestration layer.

### Willingness to contribute

**Yes** — we can contribute a reference `BaseLlm` implementation and regression tests (Pydantic schema + OpenAI-compat provider matrix) if the ADK team defines the intended extension interface.

---

## 🟡 Recommended Information

### Describe Alternatives You've Considered

| Alternative | Why insufficient |
|---|---|
| **Prompt-only schema** (embed `model_json_schema()` in instructions) | Models still drift; duplicates schema; does not enforce at decode time. |
| **`response_format: json_object` only** | Returns any JSON object; frequent field/type mismatches; requires heavy `normalize_llm_output()` in application code. |
| **NVIDIA `guided_json` in custom connector** | Provider-specific; outside ADK; still returned prose + fenced JSON in our CI/production runs. |
| **LiteLLM proxy** | We migrated away from LiteLLM to direct ADK connectors; proxy adds another failure layer and does not standardize ADK `output_schema` semantics. |
| **Use Gemma/Gemini for all structured stages** | Works today, but defeats the purpose of using DeepSeek/NIM for cost/latency; concentrates quota on one API key. |
| **Post-hoc JSON extraction + Pydantic repair** | Fragile, hard to test, masks model quality issues; every team reimplements the same sanitizers. |

### Proposed API / Implementation

1. **`BaseLlm` structured output contract**
   - When `Agent(output_schema=Model)` is set, ADK always populates `LlmRequest.config.response_schema` consistently (already happens).
   - Add documented obligation: connectors should use strict schema enforcement when possible.

2. **Reference connector: `OpenAICompatibleLlm` in ADK (or `contrib/`)**
   - Map Pydantic / JSON Schema → OpenAI Structured Outputs (`json_schema` + `strict: true`).
   - Pluggable `StructuredOutputAdapter` per host (`api.openai.com`, `api.deepseek.com`, `integrate.api.nvidia.com`).

3. **Validation pipeline**
   ```
   Agent.output_schema
     → LlmRequest.config.response_schema
     → Provider adapter (strict / guided_json / json_object)
     → Optional ADK sanitize_json_response()
     → Pydantic model_validate_json (existing ADK behavior)
   ```

4. **Developer-visible telemetry**
   - Log when falling back from `strict_schema` → `json_object` → `none`, with model id and provider.
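
The capability flags and the fallback logging described above could fit together roughly as follows. This is a sketch under stated assumptions: `StructuredOutputCapabilities` and `choose_structured_output_mode` are hypothetical names, the flag names are taken from the proposal in this issue, and the logger name is arbitrary.

```python
import logging
from dataclasses import dataclass

logger = logging.getLogger("structured_output")

# Fallback order from the proposal: strict_schema -> json_object -> none.
_MODE_ORDER = ["strict_schema", "json_object", "none"]


@dataclass(frozen=True)
class StructuredOutputCapabilities:
    """Hypothetical per-connector capability flags."""

    supports_strict_json_schema: bool = False
    supports_json_object: bool = False


def choose_structured_output_mode(
    requested: str,
    caps: StructuredOutputCapabilities,
    *,
    model_id: str,
    provider: str,
) -> str:
    """Pick the strongest supported mode at or below `requested`."""
    if requested not in _MODE_ORDER:
        raise ValueError(f"unknown structured output mode: {requested!r}")
    supported = {
        "strict_schema": caps.supports_strict_json_schema,
        "json_object": caps.supports_json_object,
        "none": True,  # always available as the last resort
    }
    candidates = _MODE_ORDER[_MODE_ORDER.index(requested):]
    chosen = next(mode for mode in candidates if supported[mode])
    if chosen != requested:
        # Make the degradation visible to developers.
        logger.warning(
            "structured output degraded: %s -> %s (model=%s, provider=%s)",
            requested, chosen, model_id, provider,
        )
    return chosen
```

For example, a connector that only declares `supports_json_object=True` and is asked for `"strict_schema"` would get `"json_object"` back, with one warning logged that names the model and provider.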

### Additional Context

**Environment**
- ADK Python, `google.adk.agents.llm_agent.Agent`, `google.adk.models.base_llm.BaseLlm`
- Providers tested: Gemini API (`gemini-2.5-flash`, `gemma-4-31b-it`), DeepSeek (`deepseek-chat`, `deepseek-reasoner`), NVIDIA NIM (`meta/llama-3.3-70b-instruct`)
- Pattern: custom `OpenAICompatibleLlm` subclassing `BaseLlm`, using `openai.AsyncOpenAI` with custom `base_url`

**Observed failure modes (NVIDIA Llama + custom connector, pre-migration)**
- Response: `` Oto wynik analizy...\n```json\n{...}\n``` `` (a Polish prose prefix, "Here is the analysis result...", followed by fenced JSON)
- ADK validates raw string → `JSONDecodeError` at column 1
- Even with `guided_json`, field aliases (`date` vs `period`) and missing enum fields required app-side normalizers
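
These failure modes can be reproduced without ADK or any provider. The snippet below is an illustrative sketch using only the standard library: the helper names are hypothetical, and the required-key set is a made-up subset standing in for a real Pydantic schema.

```python
import json
from typing import Any, Optional


def decode_error_position(raw: str) -> Optional[tuple[int, int]]:
    """Return (line, column) of the JSON decode error, or None if `raw` parses."""
    try:
        json.loads(raw)
    except json.JSONDecodeError as exc:
        return exc.lineno, exc.colno
    return None


def missing_required_keys(payload: dict[str, Any], required: set[str]) -> list[str]:
    """Return the required keys that are absent from `payload`, sorted."""
    return sorted(required - payload.keys())


fence = "`" * 3  # three backticks, built indirectly to keep this example fence-safe
raw = "Oto wynik analizy...\n" + fence + 'json\n{"date": "2024-Q1"}\n' + fence

# Parsing fails on the first character of the prose prefix.
print(decode_error_position(raw))

# Even after unwrapping, an aliased key ("date" instead of "period") and a
# missing enum field still fail a required-key check.
print(missing_required_keys({"date": "2024-Q1"}, {"period", "sentiment"}))
```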

**What works today (Google-native only)**
- `Agent(output_schema=StructuredResearch)` + `Gemini(model="gemma-4-31b-it")` → stable structured extraction in E2E CI (149 tests, live pipeline)

**What remains fragile (DeepSeek)**
- `Agent(output_schema=CompanyResponse)` + custom `BaseLlm` → relies on `json_object` + prompt schema duplication + Pydantic validators

Happy to share a minimal reproduction repository / test case if useful.