
feat: cli OpenAI-compatible API response_format support#884

Open
markstur wants to merge 5 commits into generative-computing:main from markstur:issue_824

Conversation

@markstur
Contributor

@markstur markstur commented Apr 17, 2026

Misc PR

Type of PR

  • Bug Fix
  • New Feature
  • Documentation
  • Other

Description

feat: cli OpenAI-compatible API `response_format` support

   - Added `JsonSchemaFormat` model to represent JSON schema definitions
   - Extended `ResponseFormat` to support `json_schema` type (in addition to existing `text` and `json_object`)
   - Used field alias to avoid conflict with Pydantic's `schema` method

   - Added `_json_schema_to_pydantic()` utility function to dynamically convert JSON schemas to Pydantic models
   - Updated `_build_model_options()` to exclude `response_format` from model options (handled separately)
   - Modified `make_chat_endpoint()` to:
     - Parse `response_format` from requests
     - Convert `json_schema` type to Pydantic models using the utility function
     - Detect if the serve function accepts a `format` parameter using `inspect.signature()`
     - Pass the generated Pydantic model as `format=` parameter to serve functions that support it
     - Handle backward compatibility with serve functions that don't accept `format`
   - Added proper error handling for invalid schemas

   - Test json_schema format is converted to Pydantic model and passed to serve
   - Test json_object format doesn't pass a schema
   - Test text format doesn't pass a schema
   - Test error handling for missing json_schema field
   - Test error handling for invalid JSON schemas
   - Test backward compatibility with serve functions without format parameter
   - Test optional fields in JSON schemas

When a client sends a request with `response_format.type = "json_schema"`, the server:
1. Extracts the JSON schema from `response_format.json_schema.schema`
2. Dynamically creates a Pydantic model from the schema
3. Passes it as the `format=` parameter to the serve function
4. The serve function can then use this for constrained decoding via Mellea's `instruct()` method

This maps OpenAI's `response_format` API to Mellea's native `format=` parameter for structured output.

Testing

  • Tests added to the respective file if code was changed
  • New code has 100% coverage if code was added
  • Ensure existing tests and github automation passes (a maintainer will kick off the github automation when the rest of the PR is populated)

Attribution

  • AI coding assistants used


Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
@markstur markstur requested a review from a team as a code owner April 17, 2026 23:37
@github-actions
Contributor

The PR description has been updated. Please fill out the template for your PR to be reviewed.

@markstur markstur changed the title Issue 824 feat: cli OpenAI-compatible API response_format support Apr 17, 2026
@github-actions github-actions Bot added the enhancement New feature or request label Apr 17, 2026
Comment thread cli/serve/app.py
Comment thread cli/serve/streaming.py Outdated
Comment thread cli/serve/app.py Outdated
Comment thread cli/serve/app.py Outdated
Comment thread cli/serve/models.py
Comment thread cli/serve/app.py Outdated
Contributor

@jakelorocco jakelorocco left a comment


I have a broader question that is touched on in my comments below: If we trust our backend provider to properly handle our structured output requests, why do we do any validation on our side? (Because module.serve might do something funky?)

Comment thread cli/serve/app.py Outdated
Comment thread cli/serve/app.py
* elif/pass not needed. Comment
* set finish reason from result

Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
@planetf1
Contributor

OpenAI will return a finish_reason of length if limits are hit.

If we're trying to be OpenAI compatible (which is the intent here), my default response would be to say we should emulate that behaviour -- though I am uncomfortable in some ways with returning syntactically invalid JSON. I guess the rationale is 'we returned what we could'. In some ways it comes down to our 'contract' with callers

For streaming this is a key difference as we fail validation (after the data has been streamed)

We also have the inconsistency between streaming (here) and non-streaming. For the latter we're already returning the same as OpenAI since we missed doing any validation...

I'd say to remove the validation -- then we have the same behaviour and it's down to the format parameter and how the backend constrained decoding works. I think this is what @jakelorocco describes

I think this also means my earlier comment about ADDING streaming validation is something I no longer feel is the right way forward. In fact the opposite -- remove it

@psschwei
Member

+1 on emulating openai as faithfully as possible

* inspect serve() to set accepts_format and is_async in make_chat_endpoint instead of on every request
* improved support for json_schema that can be converted to pydantic
* no output validation. The module is given a format when applicable, but instead
  of throwing an error if the formatting is incorrect, the results are sent to the client.
* consolidated redundant blocks of kwargs by building a dict of kwargs needed.
* improved setting of finish_reason
* added tests
* improved comments/docstrings where clarification needed

Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
@markstur
Contributor Author

Removed the validation, so now it is more consistent with what you might expect from an OpenAI API server that will return what it has, even if it is truncated or otherwise broken.

@markstur markstur enabled auto-merge May 1, 2026 00:23
value.upper() if value and value[0].isalpha() else f"VALUE_{index}"
): value
for index, value in enumerate(enum_values)
}
Contributor


String-enum members collide when two values upper-case to the same name. The dict comprehension at L126-131 keys each member on value.upper(), so ["open", "OPEN"] produces a 1-member enum — one value is silently dropped, and validating the dropped value fails with ValidationError even though the schema declared it.

Reproduced against the branch:

m = json_schema_to_pydantic(
    {"type": "object",
     "properties": {"s": {"type": "string", "enum": ["open", "OPEN"]}},
     "required": ["s"]},
)
m.model_validate({"s": "open"})   # ValidationError

Case-variant enums are uncommon in hand-written OpenAI schemas but routine in schemas migrated from other sources (OpenAPI, Java/TS codebases, ["AM", "PM"], etc.). The failure is silent at conversion time and only surfaces as a mismatched validation later.

Two options, either is fine:

  1. Dedupe member names — track seen names, append _1, _2, … on collision. Minimal change, preserves the named-Enum form.
  2. Collapse to Literal — the non-string branch at L134 already uses Literal[tuple(enum_values)], which is case-exact, needs no name generation, and would let L122-132 disappear. No public contract depends on the Enum class being named.

Worth a regression test asserting both "open" and "OPEN" validate when both are declared.
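A minimal sketch of option 1, assuming a standalone helper (`build_string_enum` is a hypothetical name, not the branch's code):

```python
from enum import Enum

def build_string_enum(name: str, values: list[str]) -> type[Enum]:
    """Dedupe generated member names so case-variant values all survive."""
    members: dict[str, str] = {}
    for index, value in enumerate(values):
        base = value.upper() if value and value[0].isalpha() else f"VALUE_{index}"
        candidate, suffix = base, 1
        while candidate in members:  # collision: append _1, _2, ...
            candidate = f"{base}_{suffix}"
            suffix += 1
        members[candidate] = value
    return Enum(name, members)

Status = build_string_enum("Status", ["open", "OPEN"])
assert len(Status) == 2                      # both values kept
assert Status("open") is not Status("OPEN")  # each validates to its own member
```

This is the regression test shape suggested above: both case variants must round-trip once declared together.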

raise ValueError(f"{_format_path(path)} enum must be an array")
return _enum_annotation(enum_values, path)

field_type = normalized_schema.get("type", "string")
Contributor


L286: field_type = normalized_schema.get("type", "string") silently defaults any typeless schema node to str. JSON Schema with no type keyword is valid and means "any type"; defaulting to str narrows that silently.

Reproduced:

json_schema_to_pydantic({
    "type": "object",
    "properties": {"data": {"description": "anything"}},
    "required": ["data"],
}).model_validate({"data": 42})  # ValidationError: expected str

{"description": "..."} without type is a real pattern in schemas pasted from OpenAPI specs where types are contextual. The converter already raises ValueError for unrecognised types ({"type": "integr"} → "unsupported JSON schema type"), so this is the only silently-wrong fallthrough left in the type dispatch.

Two options:

  1. Raise on missing type — strictest and matches OpenAI's own strict-mode requirement that every node declares a type. Most aligned with the project's fail-fast policy.
  2. Default to Any — spec-faithful; treats missing type as "any JSON value" and lets Pydantic validate nothing specific.

Either is a clear improvement over the current silent narrowing. Lower severity than the enum collision — OpenAI-style schemas usually do declare type on every node — but worth fixing while the converter is fresh.
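A sketch of option 2, again with hypothetical helper names and only flat primitive dispatch:

```python
from typing import Any

from pydantic import create_model

_TYPE_MAP = {"string": str, "integer": int, "number": float, "boolean": bool}

def annotation_for(prop_schema: dict) -> Any:
    if "type" not in prop_schema:
        return Any  # JSON Schema: a node with no "type" matches any value
    return _TYPE_MAP[prop_schema["type"]]  # unknown type -> KeyError for caller

Model = create_model("Payload", data=(annotation_for({"description": "anything"}), ...))
Model.model_validate({"data": 42})      # valid, no longer narrowed to str
Model.model_validate({"data": "text"})  # also valid
```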

if normalized_schema.get("type") != "object":
raise ValueError(f"{_format_path(path)} must be an object schema")

cache_key = f"{current_model_name}:{id(object_schema)}"
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

L354: cache_key = f"{current_model_name}:{id(object_schema)}" — the cache never protects against recursive $ref. _normalize_schema returns a fresh dict on every ref resolution, so each re-entry to _object_schema_to_model for the same $def gets a new id(), a new cache miss, and another recursion.

Reproduced:

json_schema_to_pydantic({
    "type": "object",
    "$defs": {"Node": {
        "type": "object",
        "properties": {"val": {"type": "integer"},
                       "child": {"$ref": "#/$defs/Node"}},
        "required": ["val"],
    }},
    "properties": {"root": {"$ref": "#/$defs/Node"}},
    "required": ["root"],
})  # RecursionError

Recursive schemas are rare in OpenAI-style structured output (strict mode historically disallowed them) but a valid JSON Schema pattern for tree/linked-list outputs. The current behaviour is a server-side crash — caught by the endpoint's except Exception and surfaced as a generic 500.

Minimum fix: detect cycles by tracking in-flight ref names in a set and raise ValueError("recursive $ref is not supported") — turns a 500 into a clean 400 and adds one line to the "Still unsupported" list in the docstring. Supporting recursion properly would need Pydantic forward refs + model_rebuild(); probably not worth it for v1.

Severity: lower than findings 1 and 2 — unlikely to hit OpenAI users in practice — but trivial to fix and improves the error surface.
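The in-flight-set idea can be sketched as a standalone pre-pass (`check_refs` is a hypothetical helper; the real fix would thread the set through the converter's own recursion):

```python
def check_refs(schema: dict, defs: dict, in_flight: frozenset[str] = frozenset()) -> None:
    """Raise ValueError on a recursive $ref before any model building starts."""
    ref = schema.get("$ref")
    if ref is not None:
        name = ref.rsplit("/", 1)[-1]
        if name in in_flight:
            raise ValueError("recursive $ref is not supported")
        check_refs(defs[name], defs, in_flight | {name})
        return
    for sub in schema.get("properties", {}).values():
        check_refs(sub, defs, in_flight)
    if "items" in schema:
        check_refs(schema["items"], defs, in_flight)

# The recursive schema from the repro above now fails fast with a clean error
# instead of blowing the stack:
schema = {
    "type": "object",
    "$defs": {"Node": {"type": "object",
                       "properties": {"val": {"type": "integer"},
                                      "child": {"$ref": "#/$defs/Node"}},
                       "required": ["val"]}},
    "properties": {"root": {"$ref": "#/$defs/Node"}},
    "required": ["root"],
}
try:
    check_refs(schema, schema["$defs"])
except ValueError as exc:
    print(exc)  # -> recursive $ref is not supported
```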

Comment thread cli/serve/utils.py
if choices and len(choices) > 0:
finish_reason = choices[0].get("finish_reason")
if finish_reason in valid_reasons:
return finish_reason
Contributor


extract_finish_reason handles Ollama (chat_response.done_reason) and OpenAI-direct (oai_chat_response["choices"][0]["finish_reason"]) but silently skips LiteLLM and returns the default "stop" for every LiteLLM-backed request.

LiteLLM stores its response at a different key and in a different shape — mellea/backends/litellm.py:443,693:

mot._meta["litellm_chat_response"] = chunk.choices[0].model_dump()

This is a flat Choice dict ({"finish_reason": "length", "index": 0, "message": {...}}), not wrapped in a choices array, so neither existing branch matches.

Not a regression — pre-PR the code hardcoded "stop" for every backend, so LiteLLM users are no worse off. But the PR's stated goal is "use backend finish_reason", and LiteLLM is a first-class backend in Mellea; leaving it defaulting to "stop" means clients that branch on finish_reason == "length" to paginate will loop forever on LiteLLM-fronted models.

Proposed addition (after L46):

# LiteLLM backend stores a flat Choice.model_dump() dict
litellm_response = output._meta.get("litellm_chat_response")
if isinstance(litellm_response, dict):
    finish_reason = litellm_response.get("finish_reason")
    if finish_reason in valid_reasons:
        return finish_reason

HuggingFace backend is also uncovered but has no OpenAI-equivalent finish signal to extract, so fall-through to "stop" is reasonable there. Worth calling out in the docstring.
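Putting the three backend shapes together, a runnable sketch of the proposed extraction order (the helper name and exact meta structure are assumptions based on the keys quoted above):

```python
from typing import Any

VALID_REASONS = {"stop", "length", "tool_calls", "content_filter"}

def extract_finish_reason(meta: dict[str, Any]) -> str:
    # Ollama: done_reason attribute on the chat response object
    ollama = meta.get("chat_response")
    if getattr(ollama, "done_reason", None) in VALID_REASONS:
        return ollama.done_reason
    # OpenAI-direct: finish_reason inside a "choices" array
    oai = meta.get("oai_chat_response")
    if isinstance(oai, dict) and oai.get("choices"):
        reason = oai["choices"][0].get("finish_reason")
        if reason in VALID_REASONS:
            return reason
    # LiteLLM: a flat Choice.model_dump() dict, not wrapped in "choices"
    litellm = meta.get("litellm_chat_response")
    if isinstance(litellm, dict):
        reason = litellm.get("finish_reason")
        if reason in VALID_REASONS:
            return reason
    return "stop"  # no backend signal: fall back to the pre-PR default

assert extract_finish_reason(
    {"litellm_chat_response": {"finish_reason": "length", "index": 0}}
) == "length"
```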

Comment thread test/cli/test_serve.py
assert call_args.kwargs["format"] is None

# Verify response is successful
assert isinstance(response, ChatCompletion)
Contributor


L608-609 (and L635-636 in the text-format sibling test) use a conditional assert that never fires:

if "format" in call_args.kwargs:
    assert call_args.kwargs["format"] is None

The shared mock_module fixture uses Mock(), and inspect.signature(Mock().serve) returns (*args, **kwargs) with parameters ["args", "kwargs"], so "format" is never in sig.parameters, accepts_format in make_chat_endpoint is always False for this fixture, and format is never added to serve_kwargs. The if branch is unreachable; the assert proves nothing.

Reproduced:

>>> inspect.signature(Mock().serve).parameters
mappingproxy({"args": <Parameter "*args">, "kwargs": <Parameter "**kwargs">})
>>> "format" in inspect.signature(Mock().serve).parameters
False

Net effect: these two tests only verify that endpoint() returns a ChatCompletion when response_format is json_object / text — the "no format model is passed" behaviour they claim to test is not tested at all.

Fix options:

  1. Unconditional assert: assert "format" not in call_args.kwargs. Locks in the mock's actual behaviour (no format in params → no format in call).
  2. Use a real function with format= param — e.g. def fake_serve(input, requirements=None, model_options=None, format=None): ... and assert format is None unconditionally. More faithful to the real code path.

Option 2 is closer to what the test docstring claims; option 1 is a one-line fix.
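Option 2 can be sketched as follows (parameter names are assumptions mirroring the comment's example, not the serve API's guaranteed signature):

```python
import inspect
from unittest.mock import Mock

def fake_serve(input, requirements=None, model_options=None, format=None):
    """Stub serve() whose signature really declares a `format` parameter."""
    return "ok"

# With a real function, the signature-based detection path actually triggers:
assert "format" in inspect.signature(fake_serve).parameters

# ...whereas a bare Mock only exposes *args/**kwargs, so it never does:
assert "format" not in inspect.signature(Mock().serve).parameters
```

Wiring `fake_serve` into the fixture lets the test assert `format is None` unconditionally, as the docstring claims.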

Comment thread cli/serve/app.py
return create_openai_error_response(
status_code=400,
message=f"Invalid JSON schema: {e!s}",
error_type="invalid_request_error",
Contributor


Scoped nit on the schema-conversion error handling (the broader except Exception handler at L252 is pre-existing, not in scope for this PR).

L181 catches only ValueError from json_schema_to_pydantic, but the converter can also surface non-ValueError exceptions for pathological schemas that should still be client errors:

  • create_model(...) can raise PydanticUserError / PydanticSchemaGenerationError on schemas that produce an internally-inconsistent model.
  • Enum(...) in _enum_annotation can raise TypeError for certain member-name edge cases (see the enum-collision comment above).

These currently fall through to the generic 500 handler and surface to the client as "Internal server error: ..." rather than a clean 400 "Invalid JSON schema".

Minimal fix — widen this block only:

except (ValueError, TypeError) as e:
    return create_openai_error_response(
        status_code=400,
        message=f"Invalid JSON schema: {e!s}",
        error_type="invalid_request_error",
        param="response_format.json_schema.schema",
    )

Or import the Pydantic error types explicitly and include them in the tuple. Either way keeps the scope tight to schema-conversion failures and leaves the outer handler alone.

Comment thread cli/serve/models.py
"""JSON Schema definition when type is 'json_schema'."""


class StreamOptions(BaseModel):
Contributor


Minor: ResponseFormat accepts {"type": "json_schema", "json_schema": null} at Pydantic-parse time and defers the "json_schema is required when type=json_schema" check to the endpoint at app.py:169-175. The endpoint does catch it and returns a clean 400, so this is a code-organisation concern rather than a functional bug.

A model_validator on ResponseFormat would co-locate the invariant with the type it constrains and shrink the endpoint body by a branch:

from pydantic import model_validator

class ResponseFormat(BaseModel):
    type: Literal["text", "json_object", "json_schema"]
    json_schema: JsonSchemaFormat | None = None

    @model_validator(mode="after")
    def _require_json_schema(self) -> "ResponseFormat":
        if self.type == "json_schema" and self.json_schema is None:
            raise ValueError("json_schema is required when type is 'json_schema'")
        return self

The endpoint's existing except RequestValidationError handler (via validation_exception_handler) will then turn the ValueError into a 400 with the same shape as other validation errors, and the explicit block at app.py:169-175 can go away.

Not blocking — happy to ship as-is if the author prefers endpoint-level validation.

Contributor

@planetf1 planetf1 left a comment


Suggesting some changes. I think the main one is around the enum clash -- the others add value over what was there before, so the comments are more about completeness rather than regression - so I think it would be good to fix up, but minded to approve when you are comfortable with the changes. nothing else is a blocker


Labels

enhancement New feature or request


Development

Successfully merging this pull request may close these issues.

m serve OpenAI API structured output

4 participants