fix(oci): sanitize tool JSON Schema for Gemini Proto validator #283

Merged
fede-kamel merged 2 commits into main from fix/oci-gemini-schema-munging
May 28, 2026

@fede-kamel fede-kamel commented May 28, 2026

Summary

OCI's V1 OpenAI-compat transport (OCIChatCompletionsModel) serves every non-Cohere-R / non-DAC model on OCI, including google.gemini-*. Pydantic emits type: ["string", "null"] for Optional[str] and type: "any" for Any — both rejected by Gemini's Proto schema validator with a hard 400 before the model gets a turn:

400 Bad Request: Unknown name 'type' at
'tools[0].function_declarations[N].parameters.properties[M].value...':
Proto field is not repeating, cannot start list.

This PR ports the validated sanitisation pattern from Oracle's own langchain-oci (OCIUtils.sanitize_schema in libs/oci/langchain_oci/common/utils.py) into Locus, so every tool's function.parameters schema is normalised before the V1 call leaves Locus.

Fixes #281.

What changed (commit 88c4d79)

New helper: locus.core.structured.sanitize_oci_tool_schema():

  • Collapses type: ["string", "null"] → first non-null type (Gemini's Proto requires a single string).
  • Remaps type: "any" → "object" (Gemini doesn't know "any").
  • Strips metadata keys OCI rejects on tool params: title, const, default: None, any x-* namespace.
  • Promotes const: "x" → enum: ["x"] (OCI accepts enum, not const).
  • Defaults missing items → {"type": "object"} when type == "array".
  • Recurses through properties / $defs / definitions as dicts of named sub-schemas, so user-defined field names (e.g. a field literally named title) survive even though schema-level title is stripped.
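
The rules above can be sketched in plain Python. This is a minimal illustration, not the code from this PR or from langchain-oci: the name `sanitize_schema_sketch` is hypothetical, and the fallback to `"string"` for a degenerate `["null"]` type list is an assumption, since the description does not say what the real helper returns in that case.

```python
from typing import Any

# Keys whose values are dicts of *named* sub-schemas. The names are
# user-defined, so only the values are sanitised, never the names.
_NAMED_SCHEMA_MAPS = ("properties", "$defs", "definitions")


def sanitize_schema_sketch(schema: Any) -> Any:
    """Return a sanitised copy of a JSON Schema fragment (hypothetical sketch)."""
    if isinstance(schema, list):
        return [sanitize_schema_sketch(item) for item in schema]
    if not isinstance(schema, dict):
        return schema

    out: dict[str, Any] = {}
    for key, value in schema.items():
        if key in _NAMED_SCHEMA_MAPS and isinstance(value, dict):
            # A property literally named "title" must survive, so recurse
            # into each named sub-schema without filtering the names.
            out[key] = {name: sanitize_schema_sketch(sub) for name, sub in value.items()}
        elif key == "title" or key.startswith("x-"):
            continue  # schema-level metadata
        elif key == "default" and value is None:
            continue  # default: None is dropped
        elif key == "const":
            out["enum"] = [value]  # const: "x" becomes enum: ["x"]
        elif key == "type":
            if isinstance(value, list):
                non_null = [t for t in value if t != "null"]
                # Assumption: fall back to "string" for a degenerate ["null"].
                value = non_null[0] if non_null else "string"
            out["type"] = "object" if value == "any" else value
        else:
            out[key] = sanitize_schema_sketch(value)

    if out.get("type") == "array" and "items" not in out:
        out["items"] = {"type": "object"}
    return out


example = {
    "title": "SearchArgs",
    "type": "object",
    "properties": {
        "title": {"type": ["string", "null"], "default": None, "title": "Title"},
        "mode": {"const": "search"},
        "payload": {"type": "any", "x-internal": True},
        "tags": {"type": "array"},
    },
}
cleaned = sanitize_schema_sketch(example)
```

In this sketch the function builds new dicts and lists rather than editing in place, so `example` is left untouched. The schema-level `title` is removed while the property named `title` remains.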

Wiring: OCIChatCompletionsModel._sanitize_tools_for_oci() is invoked from both complete() and stream() and applied unconditionally to tools[*].function.parameters. This matches langchain-oci's design: the stripped keys are pure metadata that non-Gemini vendors ignore.

Removed — the earlier speculative _requires_gemini_schema_strip + _munge_response_format_for_gemini keyword strip (vendor-gated additionalProperties / patternProperties / dependencies). It was targeting a different OCI endpoint and didn't match the observed bug shape. The now-validated $ref inline for Gemini response_format is preserved.

Drive-by (commit 6ee95fc) — chore(deps): cap starlette <1.2

The first CI run on this PR failed with ModuleNotFoundError: No module named 'httpx2' in tests/unit/test_a2a_protocol.py and tests/unit/test_server_app_full.py. Neither file is touched by the #281 work.

Root cause: starlette 1.2.0 was published on 2026-05-28 (between main's last green CI at 11:35Z and this PR's first CI run at 12:03Z) and makes its TestClient require httpx2 instead of httpx. Locus pins httpx<1.0 (because OCIRequestSigner and BearerAuth subclass the soon-removed top-level httpx.Auth), so the resolver can't satisfy starlette 1.2's TestClient transitively.

Capped starlette<1.2 alongside the existing httpx<1.0 pin; verified the 102 affected tests collect + pass under starlette 1.0.0. Lift both pins together when the auth classes are ported to httpx 2.x.

If you'd prefer the dep pin in its own PR, happy to split — but bundling unblocks CI on this one in the same round-trip.

Why mirror langchain-oci

The langchain-oci package ships Oracle's own LangChain integration against the same OCI Generative AI endpoint Locus's V1 transport hits. Its sanitize_schema is the reference pattern used in production by every Oracle-hosted LangChain agent. Re-deriving the sanitiser independently would just create a parallel set of mistakes; copying the proven pattern keeps the bug surface narrow.

Tests

Unit (tests/unit/test_oci_tool_schema_sanitize.py, 12 tests, all passing):

  • test_optional_field_collapses_type_list_to_single_string — Pydantic-generated Optional[str] schema clears the sanitiser.
  • test_explicit_type_list_collapses_to_first_non_null — handcrafted ["string","null"] / ["null","number"] / degenerate ["null"].
  • test_type_any_normalises_to_object.
  • test_metadata_keys_stripped: title/const/x-*/default:None all gone; const: "search" promoted to enum.
  • test_user_field_named_title_survives_recursion — defends the properties/$defs recursion seam.
  • test_array_without_items_gets_default_items.
  • test_does_not_mutate_input.
  • test_nested_recursion_through_items_properties — the exact deepest-nesting shape from the issue's error path.
  • 4 wiring tests on _sanitize_tools_for_oci: applies to each tool, returns None/[] for empty inputs, passes through non-function tools, doesn't mutate caller's list.

Integration (tests/integration/test_oci_gemini_tool_schema_live.py, gated by RUN_LIVE_OCI=1):

  • test_gemini_accepts_optional_field_tool_schema — sends a tool with Optional[str], Optional[int], Any, and a nested array-of-objects to real Gemini 2.5 Flash; pre-fix raised 400, post-fix completes.
  • test_gemini_accepts_clean_schema_unchanged — control: a tool with no Optional fields still works (sanitiser is idempotent on clean schemas).

Both integration tests verified live against oci:google.gemini-2.5-flash in us-chicago-1.

Regression sweep: 149 OCI-related unit tests passing (test_oci_*.py + test_tools_*.py), no breakage from removing the speculative response_format strip.

Test plan

  • uv run pytest tests/unit/test_oci_tool_schema_sanitize.py -v — 12/12
  • RUN_LIVE_OCI=1 uv run pytest tests/integration/test_oci_gemini_tool_schema_live.py -v — 2/2 live against real Gemini
  • uv run pytest tests/unit/test_oci_openai_compat*.py tests/unit/test_tools_*.py — 149/149, no regressions
  • uv run pytest tests/unit/test_a2a_protocol.py tests/unit/test_server_app_full.py — 102/102 with starlette<1.2 pin
  • Pre-commit (ruff lint + format + mypy + codespell + commitizen + DCO sign-off) — passing on both commits

OCI's V1 OpenAI-compat transport (OCIChatCompletionsModel) serves every
non-Cohere-R / non-DAC model including google.gemini-*. Pydantic emits
type: ["string", "null"] for Optional[str] and type: "any" for Any, and
Gemini's Proto schema validator rejects both with a hard 400:

  Unknown name 'type' at 'tools[0].function_declarations[N]...':
  Proto field is not repeating, cannot start list.

The new sanitize_oci_tool_schema() helper mirrors
OCIUtils.sanitize_schema from Oracle's own langchain-oci package - the
validated reference pattern used in production against this endpoint.
It collapses list types to a single string, remaps "any" -> "object",
strips metadata keys (title / const / x-* / default:None), promotes
const -> enum, and defaults missing items on arrays. Recurses correctly
through properties / $defs / definitions so user-defined field names
survive.

Wired into _sanitize_tools_for_oci() and applied unconditionally to
tools[*].function.parameters in both complete() and stream() -
matching langchain-oci's design; the stripped keys are pure metadata
that non-Gemini vendors ignore. The earlier speculative
response_format strip (vendor-gated additionalProperties /
patternProperties / dependencies) was targeting a different OCI
endpoint and has been removed; only the now-validated $ref inline for
Gemini response_format is preserved.

Verified live against oci:google.gemini-2.5-flash in us-chicago-1: a
tool with Optional[str], Optional[int], Any, and a nested
array-of-objects (the deepest-nesting shape from the issue's error
path) clears the validator and the call completes. 12 unit tests
cover the helper + wiring; 2 live integration tests prove the
end-to-end fix.

Fixes #281

Signed-off-by: Federico Kamelhar <federico.kamelhar@oracle.com>
@oracle-contributor-agreement oracle-contributor-agreement Bot added the OCA Verified label May 28, 2026
starlette 1.2.0 (2026-05-28) makes its TestClient require ``httpx2``
instead of ``httpx``. Locus pins ``httpx>=0.27,<1.0`` (because
``OCIRequestSigner`` and ``BearerAuth`` subclass the soon-removed
top-level ``httpx.Auth``), so the resolver can't satisfy starlette
1.2's TestClient transitively, and ``tests/unit/test_a2a_protocol.py``
+ ``tests/unit/test_server_app_full.py`` fail at collection time
with::

    ModuleNotFoundError: No module named 'httpx2'
    StarletteDeprecationWarning: Using ``httpx`` with
    ``starlette.testclient`` is deprecated; install ``httpx2``
    instead.

Cap ``starlette<1.2`` alongside the existing ``httpx<1.0`` pin.
Lift both together when ``OCIRequestSigner`` / ``BearerAuth`` are
ported to the httpx 2.x auth API.

This is a drive-by fix bundled with #281 to unblock its CI; the
break is unrelated to the schema-sanitisation work in that PR — it
just landed on PyPI between main's last green CI (11:35Z) and
#281's first CI run (12:03Z).

Verified locally: 102/102 tests in the two affected files collect
and pass under starlette 1.0.0.

Signed-off-by: Federico Kamelhar <federico.kamelhar@oracle.com>
@fede-kamel fede-kamel merged commit 1951ce3 into main May 28, 2026
10 checks passed
fede-kamel added a commit that referenced this pull request May 28, 2026
…t safety net (#284)

Two Gemini-on-V1-transport fixes that landed since b22:

  * fix(agent): force summary call when assistant content is empty
    (closes #280) — #282
  * fix(oci): sanitize tool JSON Schema for Gemini Proto validator
    (closes #281) — #283

Plus one drive-by ``starlette<1.2`` cap to unblock CI after upstream
broke the existing ``httpx<1.0`` pin on 2026-05-28.

No breaking changes. The schema sanitiser is applied unconditionally
on the V1 (OpenAI-compat) transport, but the strip-list mirrors
``langchain_oci.common.utils.OCIUtils.sanitize_schema`` — the keys
it removes are pure metadata that non-Gemini vendors ignore.

The empty-content safety net costs at most one extra LLM call per
agent run, and only when the model already returned an iteration
with no tool calls AND no content. Uses ``auxiliary_model`` if set
so operators can route the recovery call to a cheaper model.

Signed-off-by: Federico Kamelhar <federico.kamelhar@oracle.com>
@fede-kamel fede-kamel deleted the fix/oci-gemini-schema-munging branch May 29, 2026 01:14
Labels

OCA Verified All contributors have signed the Oracle Contributor Agreement.

Development

Successfully merging this pull request may close these issues.

OCIModel + Gemini: structured output rejects Pydantic schemas with additionalProperties:false

1 participant