grammar : support array references in json schema #16792

aldehir · 2025-10-26T20:37:05Z

The JSON schema to grammar conversion does not support $ref values that point at array items, such as an entry inside anyOf. It appears zod or the MCP library may emit such references to reuse a schema instead of creating a separate definition.

Example

curl http://localhost:8080/v1/chat/completions -d '{
  "model": "gpt-oss-20b",
  "messages": [
    {
      "role": "user",
      "content": "Build a binary tree that matches (A (B C D) E)"
    }
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "build_binary_tree",
        "description": "build a binary tree. The left/right fields may be either a string value or another tree.",
        "parameters": {
          "type": "object",
          "properties": {
            "tree": {
              "type": "object",
              "properties": {
                "left": {
                  "anyOf": [
                    {
                      "type": "string"
                    },
                    {
                      "$ref": "#/properties/tree"
                    }
                  ]
                },
                "right": {
                  "anyOf": [
                    {
                      "$ref": "#/properties/tree/properties/left/anyOf/0"
                    },
                    {
                      "$ref": "#/properties/tree"
                    }
                  ]
                }
              },
              "additionalProperties": false
            }
          },
          "required": [
            "tree"
          ],
          "additionalProperties": false,
          "$schema": "http://json-schema.org/draft-07/schema#"
        }
      }
    }
  ],
  "tool_choice": "auto"
}'

The example above will fail with:

{
  "error": {
    "code": 500,
    "message": "JSON schema conversion failed:\nError resolving ref #/properties/tree/properties/left/anyOf/0: 0 not in [{\"type\":\"string\"},{\"$ref\":\"#/properties/tree\"}]",
    "type": "server_error"
  }
}

This PR adds support for referencing array items.
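
As a rough illustration of what resolving such a reference involves, here is a minimal Python sketch. It is not the code from this PR: resolve_ref is a hypothetical helper, it only handles local references starting with #/, and it ignores JSON Pointer escaping (~0, ~1). The point it shows is that a path component is treated as an index when the current node is an array.

```python
def resolve_ref(schema, ref):
    """Resolve a local '#/a/b/0' style reference against schema (hypothetical helper)."""
    if not ref.startswith("#/"):
        raise ValueError(f"only local refs are handled in this sketch: {ref}")
    target = schema
    for part in ref[2:].split("/"):
        if isinstance(target, list):
            # Array item: the path component must be a valid index.
            if not part.isdecimal() or int(part) >= len(target):
                raise KeyError(f"Error resolving ref {ref}: {part} is not a valid index")
            target = target[int(part)]
        elif isinstance(target, dict) and part in target:
            # Object member: plain key lookup.
            target = target[part]
        else:
            raise KeyError(f"Error resolving ref {ref}: {part} not found")
    return target


# Trimmed version of the schema from the example above.
schema = {
    "properties": {
        "tree": {
            "type": "object",
            "properties": {
                "left": {
                    "anyOf": [
                        {"type": "string"},
                        {"$ref": "#/properties/tree"},
                    ]
                }
            },
        }
    }
}

print(resolve_ref(schema, "#/properties/tree/properties/left/anyOf/0"))
# prints {'type': 'string'}
```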

Since indexes are less unique than object keys, it also renames the grammar rules derived from references.

Currently, the rules are named after the last component of the reference. E.g. #/properties/tree => tree. That doesn't work well with indexes, so instead this PR names them as follows:

1. Extract the content after #.
2. Replace non-alphanumeric characters with a dash -.
3. Prefix with ref.

This results in #/properties/tree => ref-properties-tree as the grammar rule name.
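
The naming steps above can be sketched in a few lines of Python. This is an illustration under assumptions, not the PR's code: ref_to_rule_name is a hypothetical name, and the regular expression is a literal reading of "replace non-alphanumeric characters with a dash" (the pattern actually merged may differ).

```python
import re


def ref_to_rule_name(ref):
    """Derive a grammar rule name from a $ref (hypothetical helper)."""
    # Step 1: keep only the content after '#'.
    fragment = ref.split("#", 1)[1] if "#" in ref else ref
    # Step 2: replace every non-alphanumeric character with a dash.
    dashed = re.sub(r"[^a-zA-Z0-9]", "-", fragment)
    # Step 3: prefix with 'ref'.
    return "ref" + dashed


print(ref_to_rule_name("#/properties/tree"))
# prints ref-properties-tree
print(ref_to_rule_name("#/properties/tree/properties/left/anyOf/0"))
# prints ref-properties-tree-properties-left-anyOf-0
```

Using the whole path keeps names distinct even when two references end in the same index, which naming by the last component alone could not guarantee.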

@ochafik I would like your opinion on the rule naming. I don't know if there is any additional impact I am not seeing.

Here is the same example against this PR:

{
  "model": "unsloth/gpt-oss-20b",
  "choices": [
    {
      "finish_reason": "tool_calls",
      "index": 0,
      "message": {
        "role": "assistant",
        "reasoning_content": "We need to use the function build_binary_tree. The f...",
        "content": null,
        "tool_calls": [
          {
            "type": "function",
            "function": {
              "name": "build_binary_tree",
              "arguments": "{\"tree\":{\"left\":{\"left\":\"C\",\"right\":\"D\"},\"right\":\"E\"}}"
            },
            "id": "NHd0M0zGWTx1XG6DN2NalUqMT0b8GVb3"
          }
        ]
      }
    }
  ]
}

And the grammar rules generated:

build-binary-tree-args ::= "{" space build-binary-tree-args-tree-kv "}" space
build-binary-tree-args-tree ::= "{" space  (build-binary-tree-args-tree-left-kv build-binary-tree-args-tree-left-rest | build-binary-tree-args-tree-right-kv )? "}" space
build-binary-tree-args-tree-kv ::= "\"tree\"" space ":" space build-binary-tree-args-tree
build-binary-tree-args-tree-left ::= string | build-binary-tree-args-tree-left-1
build-binary-tree-args-tree-left-1 ::= ref-properties-tree
build-binary-tree-args-tree-left-kv ::= "\"left\"" space ":" space build-binary-tree-args-tree-left
build-binary-tree-args-tree-left-rest ::= ( "," space build-binary-tree-args-tree-right-kv )?
build-binary-tree-args-tree-right ::= build-binary-tree-args-tree-right-0 | build-binary-tree-args-tree-right-1
build-binary-tree-args-tree-right-0 ::= string
build-binary-tree-args-tree-right-1 ::= ref-properties-tree
build-binary-tree-args-tree-right-kv ::= "\"right\"" space ":" space build-binary-tree-args-tree-right
build-binary-tree-call ::= "build_binary_tree"channel " <|constrain|>json"? "<|message|>" build-binary-tree-args
build-binary-tree-call0 ::= "build_binary_tree" " <|constrain|>json"? "<|message|>" build-binary-tree-args
channel ::= "<|channel|>" ( "commentary" | "analysis" )
char ::= [^"\\\x7F\x00-\x1F] | [\\] (["\\bfnrt] | "u" [0-9a-fA-F]{4})
recipient-in-channel ::= channel " to=functions." ( build-binary-tree-call0 )
recipient-in-role ::= "<|start|>assistant"? " to=functions." ( build-binary-tree-call )
ref-properties-tree ::= "{" space  (ref-properties-tree-left-kv ref-properties-tree-left-rest | ref-properties-tree-right-kv )? "}" space
ref-properties-tree-left ::= string | ref-properties-tree-left-1
ref-properties-tree-left-1 ::= ref-properties-tree
ref-properties-tree-left-kv ::= "\"left\"" space ":" space ref-properties-tree-left
ref-properties-tree-left-rest ::= ( "," space ref-properties-tree-right-kv )?
ref-properties-tree-right ::= ref-properties-tree-right-0 | ref-properties-tree-right-1
ref-properties-tree-right-0 ::= string
ref-properties-tree-right-1 ::= ref-properties-tree
ref-properties-tree-right-kv ::= "\"right\"" space ":" space ref-properties-tree-right
root ::= recipient-in-role | recipient-in-channel
space ::= | " " | "\n"{1,2} [ \t]{0,20}
string ::= "\"" char* "\"" space

CISC · 2025-10-27T10:58:16Z

@aldehir Do you need someone to merge for you and/or are you waiting for @ochafik?

ochafik

Thanks @aldehir !

Looks good (small nits), no objections re/ naming (looks neater!)

Could you share an example of a zod schema that produces def arrays? The spec seems to say defs must be an object (which in the JSON spec doesn't mean array, I believe).

aldehir · 2025-10-27T14:11:23Z

@ochafik thank you for pointing that out, and for the feedback. This stemmed from the following (contrived) example in #16714 (comment):

const SomeObject = z.object({ test: z.string() })

const testType = z.lazy(() =>
  z.object({
    and: z.union([SomeObject, testType]),
    or: z.union([SomeObject, testType]),
    not: z.union([SomeObject, testType])
  })
)

server.registerTool('test',
  {
    title: 'Get test record by ID',
    description: 'Get test record by ID',
    inputSchema: {
      test: testType
    }
  },
  ({ testId }) => {
    return {
      content: [{ type: 'text', text: 'You requested test record ID: ' + testId }]
    }
  }
)

Which produces:

{
  "type": "object",
  "properties": {
    "test": {
      "type": "object",
      "properties": {
        "and": {
          "anyOf": [
            {
              "type": "object",
              "properties": {
                "test": {
                  "type": "string"
                }
              },
              "required": [
                "test"
              ],
              "additionalProperties": false
            },
            {
              "$ref": "#/properties/test"
            }
          ]
        },
        "or": {
          "anyOf": [
            {
              "$ref": "#/properties/test/properties/and/anyOf/0"
            },
            {
              "$ref": "#/properties/test"
            }
          ]
        },
        "not": {
          "anyOf": [
            {
              "$ref": "#/properties/test/properties/and/anyOf/0"
            },
            {
              "$ref": "#/properties/test"
            }
          ]
        }
      },
      "required": [
        "and",
        "or",
        "not"
      ],
      "additionalProperties": false
    }
  },
  "required": [
    "test"
  ],
  "additionalProperties": false,
  "$schema": "http://json-schema.org/draft-07/schema#"
}
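
To check whether a given schema contains references of this kind, something like the following Python sketch could be used. It is only an illustration: collect_refs, passes_through_list, and find_array_refs are hypothetical helpers, only local #/ references are considered, and the schema is a trimmed version of the output above.

```python
def collect_refs(node):
    """Yield every string $ref value found anywhere in the schema."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "$ref" and isinstance(value, str):
                yield value
            else:
                yield from collect_refs(value)
    elif isinstance(node, list):
        for item in node:
            yield from collect_refs(item)


def passes_through_list(root, ref):
    """Return True if resolving ref has to index into an array."""
    if not ref.startswith("#/"):
        return False
    node = root
    for part in ref[2:].split("/"):
        if isinstance(node, list):
            return True
        if not isinstance(node, dict) or part not in node:
            return False  # dangling reference; not handled in this sketch
        node = node[part]
    return False


def find_array_refs(schema):
    return [ref for ref in collect_refs(schema) if passes_through_list(schema, ref)]


schema = {
    "properties": {
        "test": {
            "properties": {
                "and": {"anyOf": [{"type": "object"}, {"$ref": "#/properties/test"}]},
                "or": {
                    "anyOf": [
                        {"$ref": "#/properties/test/properties/and/anyOf/0"},
                        {"$ref": "#/properties/test"},
                    ]
                },
            }
        }
    }
}

print(find_array_refs(schema))
# prints ['#/properties/test/properties/and/anyOf/0']
```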

This does not appear to come from Zod's new toJSONSchema() function in v4, but rather from the MCP TypeScript library, which still uses zod v3.

I used a definitions array only to keep the test case simple. I updated it to reference a schema in anyOf instead.

aldehir · 2025-10-27T17:04:07Z

@CISC, barring any objections, it is ready to merge. I do not have the ability to do so.

grammar : support array references in json schema

9112541

aldehir requested a review from ggerganov as a code owner October 26, 2025 20:37

github-actions bot added the testing, examples, python, and server labels Oct 26, 2025

CISC reviewed Oct 26, 2025

common/json-schema-to-grammar.cpp (outdated)

Update json-schema-to-grammar.cpp

9eac00c

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

ggerganov approved these changes Oct 27, 2025

ochafik reviewed Oct 27, 2025

common/json-schema-to-grammar.cpp (outdated)

examples/json_schema_to_grammar.py (outdated)

tools/server/public_legacy/json-schema-to-grammar.mjs (outdated)

aldehir added 2 commits October 27, 2025 08:58

grammar : improve regex when naming ref derived rules

4f0a592

grammar : replace non-conformant definitions array with anyOf test case

927e069

CISC merged commit 280d97b into ggml-org:master Oct 28, 2025
74 checks passed
