#  GPT-5 New Params and Tools

We’re introducing new developer controls in the GPT-5 series that give you greater control over model responses—from shaping output length and style to enforcing strict formatting. Below is a quick overview of the latest features:


| #  | Feature | Overview | Values / Usage |
|----|---------|----------|----------------|
| 1. | **Verbosity Parameter** | Lets you hint the model to be more or less expansive in its replies. Keep prompts stable and use the parameter instead of re-writing. | • **low** → terse UX, minimal prose.<br>• **medium** *(default)* → balanced detail.<br>• **high** → verbose, great for audits, teaching, or hand-offs. |
| 2. | **Free-Form Function Calling** | Generate raw text payloads—anything from Python scripts to SQL queries—directly to your custom tool without JSON wrapping. Offers greater flexibility for external runtimes like:<br>• Code sandboxes (Python, C++, Java, …)<br>• SQL databases<br>• Shell environments<br>• Config generators | Use when structured JSON isn’t needed and raw text is more natural for the target tool. |
| 3. | **Context-Free Grammar (CFG)** | A set of production rules defining valid strings in a language. Each rule rewrites a non-terminal into terminals and/or other non-terminals, independent of surrounding context. Useful for constraining output to match the syntax of programming languages or custom formats in OpenAI tools. | Use as a contract to ensure the model emits only valid strings accepted by the grammar. |

**Supported Models:**  
- gpt-5  
- gpt-5-mini  
- gpt-5-nano  

**Supported API Endpoints** 
- Responses API 
- Chat Completions API 

Note: We recommend using the Responses API with GPT-5 series models to get the best performance out of them.


## Pre-requisites 

Begin by upgrading the OpenAI SDK to a version that supports the new GPT-5 parameters and tools. Make sure you've set `OPENAI_API_KEY` as an environment variable.

In [2]:
!pip install --quiet --upgrade openai pandas && \
echo -n "openai " && pip show openai | grep '^Version:' | cut -d' ' -f2 && \
echo -n "pandas " && pip show pandas | grep '^Version:' | cut -d' ' -f2

openai 1.99.1
pandas 2.3.1


## 1. Verbosity Parameter 

### 1.1 Overview 
The verbosity parameter lets you hint the model to be more or less expansive in its replies.   

**Values:** "low", "medium", "high"

- low → terse UX, minimal prose.
- medium (default) → balanced detail.
- high → verbose, great for audits, teaching, or hand-offs.

Keep prompts stable and use the param rather than re-writing.


In [5]:
from openai import OpenAI
import pandas as pd
from IPython.display import display

client = OpenAI()

question = "Write a poem about a boy and his first pet dog."

data = []

for verbosity in ["low", "medium", "high"]:
    response = client.responses.create(
        model="gpt-5-mini",
        input=question,
        text={"verbosity": verbosity}
    )

    # Extract text
    output_text = ""
    for item in response.output:
        if hasattr(item, "content"):
            for content in item.content:
                if hasattr(content, "text"):
                    output_text += content.text

    # Truncate for display
    if len(output_text) > 700:
        sample_output = (
            output_text[:500]
            + " ... redacted for brevity ... "
            + output_text[-200:]
        )
    else:
        sample_output = output_text

    usage = response.usage
    data.append({
        "Verbosity": verbosity,
        "Sample Output": output_text,
        "Output Tokens": usage.output_tokens
    })

# Create DataFrame
df = pd.DataFrame(data)

# Display nicely with centered headers
pd.set_option('display.max_colwidth', None)
styled_df = df.style.set_table_styles(
    [
        {'selector': 'th', 'props': [('text-align', 'center')]},  # Center column headers
        {'selector': 'td', 'props': [('text-align', 'left')]}     # Left-align table cells
    ]
)

display(styled_df)


Unnamed: 0,Verbosity,Sample Output,Output Tokens
0,low,"He found him behind the old shed, a damp ball of surprise, eyes like two small moons against the dusk. His palms trembled when they touched the ribs, a first brave promise — the word that made them both grin: ""home."" They learned the map of each other's days: mud on the porch, the shorthand of paws on tile, lessons in patience—sit, wait, the clumsy language of fetch— evenings stitched together by a tail that kept time. The dog taught him how to be gentle with broken things: scarred knees, lost kites, the ache of a scraped promise. When the boy fell, the dog was there to push him up with a wet nose, when the world tightened, the dog leaned in like a small sun. Years threaded silver into ears and softened the sprint, the boy grew taller, pockets fuller of other lives. On the last slow walk he carried the dog across the field they had worn bare, and learned what it meant to hold and let go at once. Sometimes, in a house that now holds different voices, he still calls the name and feels a short, bright tug at the hem of memory.",754
1,medium,"He was small enough to be held in two palms, hands surprised by the light, warm weight of a breathing animal. The dog came with a cardboard box of toys, a collar that jingled like a laugh, and eyes the color of late autumn pools. They learned each other by touch: how a thumb smoothed a lump of fur, how a paw fit between fingers like a promise. Afternoons became slow, luminous things — fetch across the yard, the dog tumbling through dandelions, the boy counting sticks as if they were crowns. At night the dog curled against a ribcage and taught the boy how to sleep with one ear listening. They shared secrets whispered into warm fur, bandaged scraped knees with careful nuzzles, and sat together on porches while light leaned away. The world outside was larger, sharper, but he had a companion who never measured fear, only presence — a warm, steady map back home. Lessons came not in words but in ritual: the bowl filled, the leash clicked, the hush before storms. Responsibility fit him like a second skin. Years threaded through collars and collars loosened, legs that once ran like wind now slowed to wiser steps. The boy learned how to be brave in other ways: walking into rooms without the certainty of pawprints, carrying a small quiet grief folded into his chest. Even when the dog went where yards are always green, he left prints on the boy’s life that never washed away — a tucked-in photograph, a faded ball, the smell of wet fur in summer rain. Sometimes, when the house is very still, the boy — now taller, still hears the jingle of a collar and smiles, for he knows how to be loved, and how to love back.",939
2,high,"The dog arrived like a small sun, all fur and surprise, an apology for shoes chewed at the doorstep. He fit into the boy’s lap as if he had been made there: a warm, wiggling answer to the question the boy never knew he’d asked. They learned one another by touch — the boy’s clumsy fingers finding the soft map of ears, the steady paddling heart. The dog learned the geometry of a bedroom, a couch, a favorite chair; the boy learned the weight of responsibility when the bowl was empty. They practiced words like spellcraft: sit, stay, no — then laugh when the commands rearranged themselves into games. Mud sketched secret rivers on their trousers; the neighborhood became a kingdom to patrol on two legs and four. At night the dog lay like a small, even moon against the boy’s ribs, breathing the rhythm of a shared world into his dreams. Storms were smaller storms then, because the dog’s body was a promise: that thunder could be waited out, that hands could find something warm. They buried treasures — bones, a tin soldier, a lost mitten — beneath the old apple tree, sworn companions to the earth. They chased imagined villains down fences and over hedges, and sometimes, at the creek, the boy learned the exact angle of courage: how to step in, then trust another heartbeat beside his. The dog taught him how to be brave without knowing the word, showed him where loyalty lived (in the bright, impatient wag of a tail). He taught him how to say hello, properly; how to stay when needed; how to forgive a day of wrongs with a single, earnest lick. Seasons folded themselves into the years; the boy measured time in collars replaced, in new aches in the dog’s hips. Snow made the yard a clean page; leaves wrote their own goodbyes. When the dog’s runs slowed, the boy learned a different kind of steady: to sit more, to listen longer, to count the small comforts. 
There was a last evening where the light sat low and patient, and the boy — now older in ways that did not fit his face — held that same warm weight. He remembered the first bark like a promise kept, the first wild sprint across the grass, and he kept, beneath his ribs, the map of a thousand small mercies. Now when he walks by an old apple tree, his hand finds empty air, but his steps know how to make room for another’s rhythm. Sometimes a stray dog will glance his way and tilt its head, and he smiles, answering without words what he was taught long ago: how to open a hand, how to offer a place on the floor, how to recognize the sun when it returns — in fur, in breath, in the simple, astonished love of a first pet who showed a boy what home can mean.",1174


Output tokens increase with each verbosity level, by roughly 25% per step in this run: low (754) → medium (939) → high (1174).
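As a quick check of the growth between levels, the short sketch below recomputes the step-to-step increase from the token counts reported in the table above. The variable names are illustrative, and the figures come from this single run, so a rerun will likely give different numbers.

```python
# Output token counts copied from the run above (one sample per level).
output_tokens = {"low": 754, "medium": 939, "high": 1174}

levels = list(output_tokens)

# Relative increase from each verbosity level to the next.
step_increase = {
    f"{prev}->{curr}": output_tokens[curr] / output_tokens[prev] - 1
    for prev, curr in zip(levels, levels[1:])
}

for step, increase in step_increase.items():
    print(f"{step}: +{increase:.1%}")

# Overall ratio between the most and least verbose settings.
high_to_low = output_tokens["high"] / output_tokens["low"]
print(f"high/low: {high_to_low:.2f}x")
```

For this run, each step adds about a quarter more output tokens, and `high` produces about 1.56 times as many as `low`.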

### 1.2 Using Verbosity for Coding Use Cases

The verbosity parameter also influences the length and complexity of generated code, as well as the depth of the accompanying explanations. In the example below, we ask for a Python program that sorts an array of 1,000,000 random numbers at each verbosity level.

In [6]:
from openai import OpenAI

client = OpenAI()

prompt = "Output a Python program that sorts an array of 1000000 random numbers"

def ask_with_verbosity(verbosity: str, question: str):
    response = client.responses.create(
        model="gpt-5-mini",
        input=question,
        text={
            "verbosity": verbosity
        }
    )

    # Extract assistant's text output
    output_text = ""
    for item in response.output:
        if hasattr(item, "content"):
            for content in item.content:
                if hasattr(content, "text"):
                    output_text += content.text

    # Token usage details
    usage = response.usage

    print("--------------------------------")
    print(f"Verbosity: {verbosity}")
    print("Output:")
    print(output_text)
    print("Tokens => input: {} | output: {}".format(
        usage.input_tokens, usage.output_tokens
    ))


# Example usage:
ask_with_verbosity("low", prompt)

--------------------------------
Verbosity: low
Output:
#!/usr/bin/env python3
import random
import time

def main():
    N = 1_000_000
    # generate 1,000,000 random floats
    arr = [random.random() for _ in range(N)]

    t0 = time.perf_counter()
    arr.sort()  # in-place Timsort
    t1 = time.perf_counter()

    print(f"Sorted {N} numbers in {t1 - t0:.4f} seconds")
    # optional quick checks
    print("First 5:", arr[:5])
    print("Last 5:", arr[-5:])
    print("Verified sorted:", all(arr[i] <= arr[i+1] for i in range(len(arr)-1)))

if __name__ == "__main__":
    main()
Tokens => input: 21 | output: 877


Notice that the output is a plain script with no surrounding explanation. Now let's run the same prompt with 'medium'.

In [7]:
ask_with_verbosity("medium", prompt)

--------------------------------
Verbosity: medium
Output:
Here's a simple Python program that generates 1,000,000 random floats and sorts them using Python's built-in Timsort. It times generation and sorting and verifies the result.

```python
import random
import time

def main():
    N = 1_000_000

    print(f"Generating {N} random numbers...")
    t0 = time.perf_counter()
    arr = [random.random() for _ in range(N)]
    t1 = time.perf_counter()
    print(f"Generated in {t1 - t0:.3f} seconds")

    print("Sorting...")
    t2 = time.perf_counter()
    arr.sort()
    t3 = time.perf_counter()
    print(f"Sorted in {t3 - t2:.3f} seconds")

    # Quick verification
    is_sorted = all(arr[i] <= arr[i+1] for i in range(len(arr)-1))
    print("Verified sorted:", is_sorted)

    print("First 10 elements:", arr[:10])
    print("Last 10 elements:", arr[-10:])

if __name__ == "__main__":
    main()
```

If you prefer a faster/more memory-efficient approach and have NumPy installed, you can re

Medium verbosity generated richer code with additional explanations. Let's do the same with high.

In [8]:
ask_with_verbosity("high", prompt)

--------------------------------
Verbosity: high
Output:
Below are two complete Python programs you can run to generate and sort 1,000,000 random numbers. One is a pure-Python implementation using the built-in list and list.sort() (Timsort). The other uses NumPy (faster and far more memory-efficient for large numeric arrays). Each script times generation and sorting and optionally verifies the result is sorted.

Pure-Python version (no extra dependencies)
- Generates 1,000,000 Python floats with random.random().
- Uses list.sort() (Timsort), which is O(n log n) and very efficient in practice.
- Note: Python float objects have overhead, so the list will use substantially more memory than a raw numeric array.

Save as sort_random_pure.py and run: python sort_random_pure.py
You can change the size with the --count option.

```python
#!/usr/bin/env python3
"""
Generate and sort N random numbers using pure Python list and list.sort().
Default N = 1_000_000. Use --count to override.
"""
impo

High verbosity yielded additional details and explanations. 

### 1.3 Takeaways 

The new verbosity parameter reliably scales both the length and depth of the model’s output while preserving correctness and reasoning quality - **without changing the underlying prompt**.
In this example:

- **Low verbosity** returns only the script, with brief inline comments and no surrounding prose.
- **Medium verbosity** wraps the script in a short explanation, times generation and sorting separately, and starts to suggest a NumPy alternative.
- **High verbosity** yields two complete programs (pure Python and NumPy) with a docstring, a `--count` option, usage instructions, and notes on complexity and memory overhead.

## 2. Free‑Form Function Calling

### 2.1 Overview 
GPT-5 can now send raw text payloads, anything from Python scripts to SQL queries, to your custom tool without wrapping the data in JSON. To enable this, declare the tool with `"type": "custom"`. This differs from classic structured function calls and gives you greater flexibility when interacting with external runtimes such as:

- code_exec with sandboxes (Python, C++, Java, …)
- SQL databases
- Shell environments
- Configuration generators

**Note that the custom tool type does NOT support parallel tool calling.**

### 2.2 Quick Start Example - Compute the Area of a Circle

The example below instructs the model to write simple Python code that calculates the area of a circle, and to send that code through the free-form `code_exec` tool call.

In [9]:
from openai import OpenAI

client = OpenAI()

response = client.responses.create(
    model="gpt-5-mini",
    input="Please use the code_exec tool to calculate the area of a circle with radius equal to the number of 'r's in strawberry",
    text={"format": {"type": "text"}},
    tools=[
        {
            "type": "custom",
            "name": "code_exec",
            "description": "Executes arbitrary python code",
        }
    ]
)
print(response.output)

[ResponseReasoningItem(id='rs_6894d0c323d481a2b727907746def8ec03e38603225fe1bd', summary=[], type='reasoning', content=[], encrypted_content=None, status=None), ResponseOutputMessage(id='ctc_6894d0c4917881a29923bb525509b34003e38603225fe1bd', content=None, role=None, status='completed', type='custom_tool_call', call_id='call_1ACilrk0d1DISLvW4Q2iE0jc', input='# Calculate the area of a circle where radius = number of \'r\'s in "strawberry"\nimport math\nradius = "strawberry".count(\'r\')\narea = math.pi * radius**2\n{"radius": radius, "area": area, "area_exact": f"{radius**2}*pi"}', name='code_exec')]


The model emits a `tool call` containing raw Python. You execute that code server‑side, capture the printed result, and send it back in a follow‑up responses.create call.
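One possible way to close that loop is sketched below. It assumes the payload is Python and runs it in a child interpreter with a timeout. `run_python_payload` and `build_tool_output` are hypothetical helper names, and the output item mirrors the shape used in the benchmark later in this notebook. A plain subprocess is not a security sandbox, so treat this as an illustration rather than a safe executor for untrusted code.

```python
import subprocess
import sys


def run_python_payload(code: str, timeout_s: float = 10.0) -> str:
    """Run a raw Python payload in a separate interpreter and return its stdout."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return f"error: timed out after {timeout_s} seconds"
    if proc.returncode != 0:
        # Surface the failure to the model instead of raising locally.
        return f"error (exit {proc.returncode}): {proc.stderr.strip()}"
    return proc.stdout.strip()


def build_tool_output(call_id: str, result: str) -> dict:
    """Build the follow-up input item (shape assumed from this notebook)."""
    return {"type": "function_call_output", "call_id": call_id, "output": result}


if __name__ == "__main__":
    result = run_python_payload("import math\nprint(math.pi * 3 ** 2)")
    print(build_tool_output("call_example", result))
```

Only text written to stdout is captured, so a payload that ends in a bare expression (as in the output above) returns an empty string. Asking the model to `print` its result is one way to avoid that.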

### 2.3 Mini‑Benchmark – Sorting an Array in Three Languages
To illustrate free-form tool calling, we will ask GPT-5 to:
- Generate Python, C++, and Java code that sorts a fixed array 10 times.
- Print only the time (in ms) taken for each iteration.
- Call all three functions exactly once, and then stop.

In [10]:
from openai import OpenAI
from typing import List, Optional

MODEL_NAME = "gpt-5"

# Tools that will be passed to every model invocation. They are defined once so
# that the configuration lives in a single place.
TOOLS = [
    {
        "type": "custom",
        "name": "code_exec_python",
        "description": "Executes python code",
    },
    {
        "type": "custom",
        "name": "code_exec_cpp",
        "description": "Executes c++ code",
    },
    {
        "type": "custom",
        "name": "code_exec_java",
        "description": "Executes java code",
    },
]

client = OpenAI()

def create_response(
    input_messages: List[dict],
    previous_response_id: Optional[str] = None,
):
    """Wrapper around ``client.responses.create``.

    Parameters
    ----------
    input_messages: List[dict]
        The running conversation history to feed to the model.
    previous_response_id: str | None
        Pass the ``response.id`` from the *previous* call so the model can keep
        the thread of the conversation.  Omit on the very first request.
    """
    kwargs = {
        "model": MODEL_NAME,
        "input": input_messages,
        "text": {"format": {"type": "text"}},
        "tools": TOOLS,
    }
    if previous_response_id:
        kwargs["previous_response_id"] = previous_response_id

    return client.responses.create(**kwargs)

# Recursive 
def run_conversation(
    input_messages: List[dict],
    previous_response_id: Optional[str] = None,
):
  
    response = create_response(input_messages, previous_response_id)

    # ``response.output`` typically holds a reasoning item followed by either a
    # custom tool call or the final message.  Look the tool call up by type
    # instead of relying on its position in the list.
    tool_call = next(
        (item for item in response.output if item.type == "custom_tool_call"),
        None,
    )

    if tool_call and tool_call.type == "custom_tool_call":
        print("--- tool name ---")
        print(tool_call.name)
        print("--- tool call argument (generated code) ---")
        print(tool_call.input)
        
        # Add a synthetic *tool result* so the model can continue the thread.
        
        input_messages.append(
            {
                "type": "function_call_output",
                "call_id": tool_call.call_id,
                "output": "done", # <-- replace with the result of the tool call
            }
        )

        # Recurse with updated conversation and track the response id so the
        # model is aware of the prior turn.
        return run_conversation(input_messages, previous_response_id=response.id)
    else:
        # Base-case: no further tool call - return. 
        return 


prompt = """
Write code to sort the array of numbers in three languages: C++, Python and Java (10 times each) using code_exec functions.

ALWAYS CALL THESE THREE FUNCTIONS EXACTLY ONCE: code_exec_python, code_exec_cpp and code_exec_java tools to sort the array in each language. Stop once you've called these three functions in each language once.

Print only the time it takes to sort the array in milliseconds. 

[448, 986, 255, 884, 632, 623, 246, 439, 936, 925, 644, 159, 777, 986, 706, 723, 534, 862, 195, 686, 846, 880, 970, 276, 613, 736, 329, 622, 870, 284, 945, 708, 267, 327, 678, 807, 687, 890, 907, 645, 364, 333, 385, 262, 730, 603, 945, 358, 923, 930, 761, 504, 870, 561, 517, 928, 994, 949, 233, 137, 670, 555, 149, 870, 997, 809, 180, 498, 914, 508, 411, 378, 394, 368, 766, 486, 757, 319, 338, 159, 585, 934, 654, 194, 542, 188, 934, 163, 889, 736, 792, 737, 667, 772, 198, 971, 459, 402, 989, 949]
"""

# Initial developer message.
messages = [
    {
        "role": "developer",
        "content": prompt,
    }
]

run_conversation(messages)


--- tool name ---
code_exec_python
--- tool call argument (generated code) ---
arr_orig = [448, 986, 255, 884, 632, 623, 246, 439, 936, 925, 644, 159, 777, 986, 706, 723, 534, 862, 195, 686, 846, 880, 970, 276, 613, 736, 329, 622, 870, 284, 945, 708, 267, 327, 678, 807, 687, 890, 907, 645, 364, 333, 385, 262, 730, 603, 945, 358, 923, 930, 761, 504, 870, 561, 517, 928, 994, 949, 233, 137, 670, 555, 149, 870, 997, 809, 180, 498, 914, 508, 411, 378, 394, 368, 766, 486, 757, 319, 338, 159, 585, 934, 654, 194, 542, 188, 934, 163, 889, 736, 792, 737, 667, 772, 198, 971, 459, 402, 989, 949]

import time

start = time.perf_counter_ns()
for _ in range(10):
    arr = arr_orig.copy()
    arr.sort()
elapsed_ms = (time.perf_counter_ns() - start) // 1_000_000
print(elapsed_ms)
--- tool name ---
code_exec_cpp
--- tool call argument (generated code) ---
#include <algorithm>
#include <vector>
#include <chrono>
#include <iostream>
int main() {
    std::vector<int> orig = {448, 986, 255, 884, 632, 623, 2

The model generated code for the same sorting task in Python, C++, and Java, one tool call per language. After each call, the tool output was fed back to the model as input so that it could continue until every function had been called exactly once.

### 2.4 Takeaways 

Free-form tool calling in GPT-5 lets you send raw text payloads—such as Python scripts, SQL queries, or config files—directly to custom tools without JSON wrapping. This provides greater flexibility for interacting with external runtimes and allows the model to generate code or text in the exact format your tool expects. It’s ideal when structured JSON is unnecessary and natural text output improves usability.

## 3. Context‑Free Grammar (CFG)

### 3.1 Overview 
A context‑free grammar is a collection of production rules that define which strings belong to a language. Each rule rewrites a non‑terminal symbol into a sequence of terminals (literal tokens) and/or other non‑terminals, independent of surrounding context—hence context‑free. CFGs can capture the syntax of most programming languages and, in OpenAI custom tools, serve as contracts that force the model to emit only strings that the grammar accepts.

### 3.2 Grammar Fundamentals

**Supported Grammar Syntax** 
- Lark - https://lark-parser.readthedocs.io/en/stable/
- Regex - https://docs.rs/regex/latest/regex/#syntax

We use LLGuidance under the hood to constrain model sampling: https://github.com/guidance-ai/llguidance.

**Unsupported Lark Features** 
- Lookaround in regexes (`(?=...)`, `(?!...)`, etc.)
- Lazy modifier (`*?`, `+?`, `??`) in regexes.
- Terminal priorities, templates, %declares, %import (except %import common).


**Terminals vs Rules & Greedy Lexing** 

| Concept          | Take-away                                                                    |
|------------------|------------------------------------------------------------------------------|
| Terminals (UPPER)| Matched first by the lexer – longest match wins.                             |
| Rules (lower)    | Combine terminals; cannot influence how text is tokenised.                   |
| Greedy lexer     | Never try to “shape” free text across multiple terminals – you’ll lose control. |

**Correct vs Incorrect Pattern Design**

✅ **One bounded terminal handles free‑text between anchors**  
start: SENTENCE  
SENTENCE: /[A-Za-z, ]*(the hero|a dragon)[A-Za-z, ]*(fought|saved)[A-Za-z, ]*(a treasure|the kingdom)[A-Za-z, ]*\./  

❌ **Don’t split free‑text across multiple terminals/rules**  
start: sentence  
sentence: /[A-Za-z, ]+/ subject /[A-Za-z, ]+/ verb /[A-Za-z, ]+/ object /[A-Za-z, ]+/  
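
To see why the single bounded terminal behaves predictably, it can help to test the pattern outside the API. The sketch below uses Python's `re` module as a stand-in. The API itself relies on LLGuidance and Rust regex syntax, so this only approximates the behaviour, and only for a pattern as simple as this one. The sample sentences are made up for illustration.

```python
import re

# Same pattern as the SENTENCE terminal above, split across two lines.
SENTENCE = re.compile(
    r"[A-Za-z, ]*(the hero|a dragon)[A-Za-z, ]*(fought|saved)"
    r"[A-Za-z, ]*(a treasure|the kingdom)[A-Za-z, ]*\."
)


def is_valid_sentence(text: str) -> bool:
    """Return True when the whole string is accepted by the terminal."""
    return SENTENCE.fullmatch(text) is not None


samples = [
    "Long ago, the hero fought for a treasure.",  # anchors in order, ends with "."
    "a dragon saved the kingdom.",                # minimal valid sentence
    "the hero fought a treasure",                 # missing the final period
    "the kingdom saved a dragon.",                # anchors in the wrong order
    "the hero fought 3 dragons for a treasure.",  # digit outside the char class
]

for sample in samples:
    print(f"{is_valid_sentence(sample)!s:<5} {sample}")
```

Because the free text and the anchors live in one terminal, the order of subject, verb, and object is enforced by the regex itself rather than by how the lexer happens to split the input.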


### 3.3 Example - SQL Dialect — MS SQL vs PostgreSQL

The following example shows how to build multi-dialect SQL tools with CFGs. It demonstrates:

- Two isolated grammar definitions (`mssql_grammar`, `postgres_grammar`) encoding TOP vs LIMIT semantics.
- How to prompt, invoke, and inspect tool calls in a single script.
- A side‑by‑side inspection of the assistant’s responses.

Define the Lark grammars for the two SQL dialects:

In [11]:
import textwrap

# ----------------- grammars for MS SQL dialect -----------------
mssql_grammar = textwrap.dedent(r"""
            // ---------- Punctuation & operators ----------
            SP: " "
            COMMA: ","
            GT: ">"
            EQ: "="
            SEMI: ";"

            // ---------- Start ----------
            start: "SELECT" SP "TOP" SP NUMBER SP select_list SP "FROM" SP table SP "WHERE" SP amount_filter SP "AND" SP date_filter SP "ORDER" SP "BY" SP sort_cols SEMI

            // ---------- Projections ----------
            select_list: column (COMMA SP column)*
            column: IDENTIFIER

            // ---------- Tables ----------
            table: IDENTIFIER

            // ---------- Filters ----------
            amount_filter: "total_amount" SP GT SP NUMBER
            date_filter: "order_date" SP GT SP DATE

            // ---------- Sorting ----------
            sort_cols: "order_date" SP "DESC"

            // ---------- Terminals ----------
            IDENTIFIER: /[A-Za-z_][A-Za-z0-9_]*/
            NUMBER: /[0-9]+/
            DATE: /'[0-9]{4}-[0-9]{2}-[0-9]{2}'/
    """)

# ----------------- grammars for PostgreSQL dialect -----------------
postgres_grammar = textwrap.dedent(r"""
            // ---------- Punctuation & operators ----------
            SP: " "
            COMMA: ","
            GT: ">"
            EQ: "="
            SEMI: ";"

            // ---------- Start ----------
            start: "SELECT" SP select_list SP "FROM" SP table SP "WHERE" SP amount_filter SP "AND" SP date_filter SP "ORDER" SP "BY" SP sort_cols SP "LIMIT" SP NUMBER SEMI

            // ---------- Projections ----------
            select_list: column (COMMA SP column)*
            column: IDENTIFIER

            // ---------- Tables ----------
            table: IDENTIFIER

            // ---------- Filters ----------
            amount_filter: "total_amount" SP GT SP NUMBER
            date_filter: "order_date" SP GT SP DATE

            // ---------- Sorting ----------
            sort_cols: "order_date" SP "DESC"

            // ---------- Terminals ----------
            IDENTIFIER: /[A-Za-z_][A-Za-z0-9_]*/
            NUMBER: /[0-9]+/
            DATE: /'[0-9]{4}-[0-9]{2}-[0-9]{2}'/
    """)

### 3.4 Generate specific SQL dialect 
Let's define the prompt and call the tool to produce a query in the MS SQL dialect.

In [None]:
from openai import OpenAI
client = OpenAI()

sql_prompt_mssql = (
    "Call the mssql_grammar to generate a query for Microsoft SQL Server that retrieve the "
    "five most recent orders per customer, showing customer_id, order_id, order_date, and total_amount, "
    "where total_amount > 500 and order_date is after '2025-01-01'. "
)

response_mssql = client.responses.create(
    model="gpt-5",
    input=sql_prompt_mssql,
    text={"format": {"type": "text"}},
    tools=[
        {
            "type": "custom",
            "name": "mssql_grammar",
            "description": "Executes read-only Microsoft SQL Server queries limited to SELECT statements with TOP and basic WHERE/ORDER BY. YOU MUST REASON HEAVILY ABOUT THE QUERY AND MAKE SURE IT OBEYS THE GRAMMAR.",
            "format": {
                "type": "grammar",
                "syntax": "lark",
                "definition": mssql_grammar
            }
        },
    ],
    parallel_tool_calls=False
)

print("--- MS SQL Query ---")
print(response_mssql.output[1].input)

--- MS SQL Query ---
SELECT TOP 5 customer_id, order_id, order_date, total_amount FROM orders WHERE total_amount > 500 AND order_date > '2025-01-01' ORDER BY order_date DESC;


The generated SQL correctly uses the `SELECT TOP` construct. Next, run the same request against the PostgreSQL grammar.

In [None]:
sql_prompt_pg = (
    "Call the postgres_grammar to generate a query for PostgreSQL that retrieve the "
    "five most recent orders per customer, showing customer_id, order_id, order_date, and total_amount, "
    "where total_amount > 500 and order_date is after '2025-01-01'. "
)

response_pg = client.responses.create(
    model="gpt-5",
    input=sql_prompt_pg,
    text={"format": {"type": "text"}},
    tools=[
        {
            "type": "custom",
            "name": "postgres_grammar",
            "description": "Executes read-only PostgreSQL queries limited to SELECT statements with LIMIT and basic WHERE/ORDER BY. YOU MUST REASON HEAVILY ABOUT THE QUERY AND MAKE SURE IT OBEYS THE GRAMMAR.",
            "format": {
                "type": "grammar",
                "syntax": "lark",
                "definition": postgres_grammar
            }
        },
    ],
    parallel_tool_calls=False,
)

print("--- PG SQL Query ---")
print(response_pg.output[1].input)

--- PG SQL Query ---
SELECT customer_id, order_id, order_date, total_amount FROM orders WHERE total_amount > 500 AND order_date > '2025-01-01' ORDER BY order_date DESC LIMIT 5;


The two outputs express the same logical query in different physical syntax. Supplying a distinct grammar per dialect ensures the model can only produce statements that are valid for the chosen dialect.

| Dialect       | Generated Query                                              | Key Difference                          |
|---------------|--------------------------------------------------------------|------------------------------------------|
| MS SQL Server | SELECT TOP 5 customer_id, … ORDER BY order_date DESC;         | Uses `TOP N` clause before column list.  |
| PostgreSQL    | SELECT customer_id, … ORDER BY order_date DESC LIMIT 5;       | Uses `LIMIT N` after `ORDER BY`.         |



### 3.5 Example - Regex CFG Syntax

The following code example demonstrates using the Regex CFG syntax to constrain the free-form tool call to a certain timestamp pattern.

In [1]:
from openai import OpenAI
client = OpenAI()

timestamp_grammar_definition = r"^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01]) (?:[01]\d|2[0-3]):[0-5]\d$"

timestamp_prompt = (
        "Call the timestamp_grammar to save a timestamp for August 7th 2025 at 10AM."
)

response_timestamp = client.responses.create(
    model="gpt-5",
    input=timestamp_prompt,
    text={"format": {"type": "text"}},
    tools=[
        {
            "type": "custom",
            "name": "timestamp_grammar",
            "description": "Saves a timestamp in date + time in 24-hr format.",
            "format": {
                "type": "grammar",
                "syntax": "regex",
                "definition": timestamp_grammar_definition
            }
        },
    ],
    parallel_tool_calls=False
)

print("--- Timestamp ---")
print(response_timestamp.output[1].input)

--- Timestamp ---
2025-08-07 10:00
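
A grammar guarantees shape, not meaning, so it can be worth validating the payload again before using it. The sketch below reuses the same pattern with Python's `re` module (an approximation of the Rust regex syntax used by the API) and then parses the value with `datetime.strptime`. `check_timestamp` is a hypothetical helper.

```python
import re
from datetime import datetime

TIMESTAMP_RE = re.compile(
    r"^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01]) (?:[01]\d|2[0-3]):[0-5]\d$"
)


def check_timestamp(text: str) -> str:
    """Classify a candidate timestamp against the regex and the calendar."""
    if TIMESTAMP_RE.fullmatch(text) is None:
        return "rejected by grammar"
    try:
        datetime.strptime(text, "%Y-%m-%d %H:%M")
    except ValueError:
        # The regex allows day 31 in every month, so some matches are not dates.
        return "matches grammar, not a real date"
    return "valid"


for candidate in ["2025-08-07 10:00", "2025-08-07 24:00", "2025-02-31 10:00"]:
    print(candidate, "->", check_timestamp(candidate))
```

The third candidate shows the gap: `2025-02-31 10:00` satisfies the regex but is not a calendar date, so a semantic check after the tool call still has value.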


### 3.6 Best Practices

Lark grammars can be tricky to perfect. While simple grammars perform most reliably, complex grammars often require iteration on the grammar definition itself, the prompt, and the tool description to ensure that the model does not go out of distribution.

- Keep terminals bounded – use `/[^.\n]{0,10}\./` rather than `/.*\./`. Limit matches both by content (negated character class) and by length (`{M,N}` quantifier).
- Prefer explicit char‑classes over `.` wildcards.
- Thread whitespace explicitly, e.g. using `SP: " "`, instead of a global `%ignore`.
- Describe your tool: tell the model exactly what the CFG accepts and instruct it to reason heavily about compliance.

**Troubleshooting**
- API rejects the grammar because it is too complex ➜ Simplify rules and terminals, remove `%ignore.*`.
- Unexpected tokens ➜ Confirm terminals aren’t overlapping; check greedy lexer.
- The model drifts "out-of-distribution" (outputs become excessively long or repetitive, or are syntactically valid but semantically wrong):
    - Tighten the grammar.
    - Iterate on the prompt (add few-shot examples) and the tool description (explain the grammar and instruct the model to reason about conforming to it).
    - Experiment with a higher reasoning effort (e.g., bump from medium to high).
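
As a rough illustration of the last point, the sketch below assembles request arguments with an explicit reasoning effort. `build_request` is a hypothetical helper, and the tool list is left empty here. In practice it would hold a grammar-constrained tool like the ones defined earlier.

```python
from typing import Any, Dict, List


def build_request(prompt: str, tools: List[dict], effort: str = "medium") -> Dict[str, Any]:
    """Assemble keyword arguments for a Responses API call with a chosen reasoning effort."""
    return {
        "model": "gpt-5",
        "input": prompt,
        "text": {"format": {"type": "text"}},
        "tools": tools,
        "reasoning": {"effort": effort},
        "parallel_tool_calls": False,
    }


# Retry the same prompt with more reasoning when the first attempt drifts.
request = build_request("Call the mssql_grammar tool ...", tools=[], effort="high")
print(request["reasoning"])

# With a configured client, the call would then be:
# response = client.responses.create(**request)
```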

**Resources:**  
- Lark Docs – https://lark-parser.readthedocs.io/en/stable/
- Lark IDE – https://www.lark-parser.org/ide/
- LLGuidance Syntax – https://github.com/guidance-ai/llguidance/blob/main/docs/syntax.md
- Regex (Rust crate) – https://docs.rs/regex/latest/regex/#syntax

### 3.7 Takeaways

Context-Free Grammar (CFG) support in GPT-5 lets you strictly constrain model output to match predefined syntax, ensuring only valid strings are generated. This is especially useful for enforcing programming language rules or custom formats, reducing post-processing and errors. By providing a precise grammar and clear tool description, you can make the model reliably stay within your target output structure.