# Daft llm_generate Structured Outputs Examples
Example suite for structured-output calls over text against vLLM and SGLang OpenAI-compatible servers.

In [None]:
!pip install "daft[huggingface]" vllm

In [None]:
!hf auth login

### Online Serving - Launch vLLM OpenAI Compatible Server

Run the following in your terminal:
```bash
python -m vllm.entrypoints.openai.api_server \
  --model google/gemma-3n-e4b-it \
  --guided-decoding-backend guidance \
  --dtype bfloat16 \
  --gpu-memory-utilization 0.85 \
  --host 0.0.0.0 --port 8000
```

Note: If you are in Google Colab, you can open a terminal by clicking the terminal icon in the bottom left of the UI.

The vLLM server usually takes at least **7.5** minutes to become ready.
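
Rather than guessing when the server is ready, you can poll it. The sketch below assumes the server was started with the command above (port 8000) and that it answers `GET /v1/models`, the standard model-listing route of OpenAI-compatible servers. The helper names `server_is_up` and `wait_for_server`, the 15-minute ceiling, and the 10-second poll interval are illustrative choices, not part of Daft or vLLM.

```python
import time
import urllib.error
import urllib.request


def server_is_up(url: str, timeout: float = 5.0) -> bool:
    """Return True if a GET request to `url` answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error: not ready yet.
        return False


def wait_for_server(url, max_wait_s=900.0, poll_s=10.0,
                    probe=server_is_up, sleep=time.sleep):
    """Poll `url` until `probe` succeeds or `max_wait_s` seconds elapse."""
    deadline = time.monotonic() + max_wait_s
    while True:
        if probe(url):
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(poll_s)


# Example call (assumed URL, matching the launch command above):
# wait_for_server("http://localhost:8000/v1/models")
```

The `probe` and `sleep` parameters exist only so the loop can be exercised without a running server.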

For these small examples, an L4 GPU should be fine. A T4 does not support bfloat16.

# Table of Contents

1. **Text Examples with llm_generate**
    - Simple Text Generation
    - Guided Choice
    - Guided Regex
    - Pydantic JSON Schema
    - Guided Grammar
    - Structural Tag

2. **Image Examples with Batch UDF**
    - 

## Text Examples with llm_generate

In [None]:
import daft
from daft import col
from daft.functions import llm_generate

api_key = "none"
# vLLM serves plain HTTP by default, and its OpenAI-compatible routes live under /v1
base_url = "http://localhost:8000/v1"
model_id = "google/gemma-3n-e4b-it"
# Not passed to the calls below; OpenAI-compatible servers expect `max_tokens`
sampling_params = {
    "temperature": 0.0,
    "max_tokens": 200,
}

### Text Generation

In [None]:
df = daft.from_pylist([
    {"text":"What is the best thing about daft dataframes?"},
])

df_result = df.with_column("result", llm_generate(
        df["text"],
        model=model_id,
        provider="openai",
        base_url=base_url,
        api_key=api_key
    )
)
df_result.show()

### Guided Choice

In [None]:
df = daft.from_pylist([
    {"text":"Classify this sentiment: Daft is fast!"},
])

df_result = df.with_column("result", llm_generate(
        df["text"],
        model=model_id,
        provider="openai",
        extra_body={"guided_choice": ["positive", "negative"]},
        base_url=base_url,
        api_key=api_key
    )
)
df_result.show()

### Guided Regex

In [None]:
df = daft.from_pylist([
    {"text":"Generate an email address for Alan Turing, who works at Enigma. End in .com and new line. Example result: 'alan.turing@enigma.com\n'"},
])

df_result = df.with_column("result", llm_generate(
        df["text"],
        model=model_id,
        provider="openai",
        extra_body={"guided_regex": r"[a-z0-9.]{1,20}@\w{6,10}\.com\n"},
        base_url=base_url,
        api_key=api_key
    )
)
df_result.show()
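
To see which strings the regex constraint admits, you can test the same pattern locally with Python's `re` module. This is a minimal sketch: the helper name `satisfies_pattern` and the candidate strings are made up for illustration, and `re.fullmatch` only approximates how the guided-decoding backend applies the pattern.

```python
import re

# Same pattern as the guided_regex value above.
EMAIL_PATTERN = r"[a-z0-9.]{1,20}@\w{6,10}\.com\n"


def satisfies_pattern(text: str) -> bool:
    """Return True if the whole string matches the guided regex."""
    return re.fullmatch(EMAIL_PATTERN, text) is not None


# Hypothetical candidates, chosen to hit each part of the pattern.
candidates = [
    "alan.turing@enigma.com\n",  # satisfies every part of the pattern
    "Alan.Turing@enigma.com\n",  # uppercase is not allowed before the @
    "alan.turing@ibm.com\n",     # domain is shorter than 6 characters
    "alan.turing@enigma.com",    # trailing newline is missing
]
for candidate in candidates:
    print(repr(candidate), satisfies_pattern(candidate))
```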

### Pydantic JSON Schema

In [None]:
import enum
import pydantic

# Define the pydantic model
class CarType(str, enum.Enum):
    SEDAN = "SEDAN"
    SUV = "SUV"
    TRUCK = "TRUCK"
    COUPE = "COUPE"


class CarDescription(pydantic.BaseModel):
    brand: str
    model: str
    car_type: CarType

# Define the prompt
df = daft.from_pylist([
    {"text": "Generate a JSON with the brand, model and car_type of the most iconic car from the 90's"},
])

# Generate the result
df_result = df.with_column("result", llm_generate(
        df["text"],
        model=model_id,
        provider="openai",
        response_format={
            "type": "json_schema",
            "json_schema": {
                "name": "car-description",
                "schema": CarDescription.model_json_schema(),
            },
        },
        base_url=base_url,
        api_key=api_key
    )
)

# Validate the result, return a pydantic model
df_result_validated = df_result.with_column("pydantic_model_validated", df_result["result"].apply(
    lambda x: CarDescription.model_validate_json(x),
    return_dtype=daft.DataType.python()
))

df_result_validated.show()

### Guided Grammar

In [None]:
df = daft.from_pylist([
    {"text":"Generate an SQL query to show the 'username' and 'email' from the 'users' table."},
])

df_result = df.with_column("result", llm_generate(
        df["text"],
        model=model_id,
        provider="openai",
        extra_body={"guided_grammar": """
root ::= select_statement

select_statement ::= "SELECT " column " from " table " where " condition

column ::= "col_1 " | "col_2 "

table ::= "table_1 " | "table_2 "

condition ::= column "= " number

number ::= "1 " | "2 "
        """},
        base_url=base_url,
        api_key=api_key
    )
)
df_result.show()
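
Note that the grammar above is deliberately tiny: it can only produce queries over `col_1`/`col_2` and `table_1`/`table_2`, so the constrained output cannot mention `username`, `email`, or `users` no matter what the prompt asks for. As a rough illustration, the sketch below enumerates every string the grammar accepts. The lists mirror the grammar rules by hand; this is not how the decoding backend evaluates the grammar.

```python
from itertools import product

# Terminals copied from the grammar rules above (trailing spaces included).
columns = ["col_1 ", "col_2 "]
tables = ["table_1 ", "table_2 "]
numbers = ["1 ", "2 "]

# select_statement ::= "SELECT " column " from " table " where " condition
# condition        ::= column "= " number
queries = [
    f"SELECT {column} from {table} where {where_column}= {number}"
    for column, table, where_column, number in product(columns, tables, columns, numbers)
]

print(len(queries))  # 2 * 2 * 2 * 2 = 16 possible outputs
for query in queries[:3]:
    print(repr(query))
```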

### Structural Tag

In [None]:
df = daft.from_pylist([
    {"text": """
You have access to the following function to retrieve the weather in a city:

{
    "name": "get_weather",
    "parameters": {
        "city": {
            "param_type": "string",
            "description": "The city to get the weather for",
            "required": True
        }
    }
}

If you choose to call a function ONLY reply in the following format:
<{start_tag}={function_name}>{parameters}{end_tag}
where

start_tag => `<function`
parameters => a JSON dict with the function argument name as key and function
              argument value as value.
end_tag => `</function>`

Here is an example,
<function=example_function_name>{"example_name": "example_value"}</function>

Reminder:
- Function calls MUST follow the specified format
- Required parameters MUST be specified
- Only call one function at a time
- Put the entire function call reply on one line
- Always add your sources when using search results to answer the user query

You are a helpful assistant.

Given the previous instructions, what is the weather in New York City, Boston,
and San Francisco?"""},
])

# Generate the result
df_result = df.with_column("result", llm_generate(
        df["text"],
        model=model_id,
        provider="openai",
        response_format={
            "type": "structural_tag",
            "structures": [
                {
                    "begin": "<function=get_weather>",
                    "schema": {
                        "type": "object",
                        "properties": {"city": {"type": "string"}},
                        "required": ["city"],
                    },
                    "end": "</function>",
                }
            ],
            "triggers": ["<function="],
        },
        base_url=base_url,
        api_key=api_key
    )
)
df_result.show()
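
Once the model replies in the `<function=...>{...}</function>` format, the calls still have to be extracted from the result string. The following is a minimal sketch using only the standard library. The helper name `parse_function_calls` and the sample reply are hypothetical; the actual model output may differ.

```python
import json
import re

# Matches one call in the format requested by the prompt above.
FUNCTION_CALL = re.compile(
    r"<function=(?P<name>\w+)>(?P<args>\{.*?\})</function>", re.DOTALL
)


def parse_function_calls(text: str) -> list[tuple[str, dict]]:
    """Return (function_name, arguments) pairs found in `text`."""
    calls = []
    for match in FUNCTION_CALL.finditer(text):
        try:
            arguments = json.loads(match.group("args"))
        except json.JSONDecodeError:
            continue  # skip payloads that are not valid JSON
        calls.append((match.group("name"), arguments))
    return calls


# Hypothetical reply, shaped like the format the prompt asks for.
sample_reply = (
    '<function=get_weather>{"city": "New York City"}</function>\n'
    '<function=get_weather>{"city": "Boston"}</function>'
)
print(parse_function_calls(sample_reply))
```

The same function could be applied to the `result` column with `.apply(...)` and `return_dtype=daft.DataType.python()`, as in the Pydantic validation cell.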