# Structured Outputs with llm_generate using the vLLM Provider

<a target="_blank" href="https://colab.research.google.com/github/everettVT/daft-structured-outputs/blob/main/workload/llm_generate_vllm_provider_examples.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

Don't forget to follow the installation and setup instructions in the README.

WARNING: vLLM has experimental support for macOS with Apple Silicon. For now, users must build from source to natively run on macOS.

For CPU-only runs, Colab or an Intel/AMD machine is recommended.


In [2]:
!pip install daft vllm pydantic

Collecting daft
  Downloading daft-0.5.22-cp39-abi3-manylinux_2_24_x86_64.whl.metadata (12 kB)
Collecting vllm
  Downloading vllm-0.10.1.1-cp38-abi3-manylinux1_x86_64.whl.metadata (15 kB)

#### Authenticate with Hugging Face for access to Gemma 3 series models

In [1]:
!hf auth login



    To log in, `huggingface_hub` requires a token generated from https://huggingface.co/settings/tokens .
Enter your token (input will not be visible): 
Add token as git credential? (Y/n) 
Token is valid (permission: read).
The token `Anyscale Ray Serve LLM` has been saved to /root/.cache/huggingface/stored_tokens
[1m[31mCannot authenticate through git-credential as no helper is defined on your machine.
You might have to 

In [8]:
import daft
from daft import col
from daft.functions import llm_generate, format
from vllm.sampling_params import GuidedDecodingParams

MAX_TOKENS = 100
TEMPERATURE = 0.0
NUM_GPUS = 1

## Guided Choice

In [5]:
df_choice = daft.from_pydict({
    "text": [
        "I'm not a fan of slow data pipelines",
        "Daft Dataframes are wicked fast!",
    ]
})

df_choice = df_choice.with_column("sentiment", llm_generate(
    "Classify this sentiment: " + df_choice["text"],
    model="google/gemma-3-270m",
    provider="vllm",
    guided_decoding=GuidedDecodingParams(choice=["Positive", "Negative"]),
    max_tokens=MAX_TOKENS,
    temperature=TEMPERATURE,
    num_gpus=NUM_GPUS,
)).collect()
df_choice.show()

| text (Utf8) | sentiment (Utf8) |
| --- | --- |
| I'm not a fan of slow data pipelines | Positive |
| Daft Dataframes are wicked fast! | Negative |
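
Guided choice constrains the *form* of the output, not its correctness: every row above is one of the two allowed labels, yet the labels look swapped for both inputs. The sketch below is a conceptual illustration of the constraint only, not vLLM's implementation. The helper name `allowed_next_chars` is hypothetical, and it works character by character, whereas a real backend masks model tokens.

```python
def allowed_next_chars(prefix: str, choices: list[str]) -> set[str]:
    """Return the characters that keep `prefix` a prefix of at least one choice.

    Conceptual stand-in for guided choice: at every step, only continuations
    that can still end in one of the allowed strings are permitted.
    """
    return {
        choice[len(prefix)]
        for choice in choices
        if choice.startswith(prefix) and len(choice) > len(prefix)
    }


choices = ["Positive", "Negative"]
print(sorted(allowed_next_chars("", choices)))    # ['N', 'P']
print(sorted(allowed_next_chars("Po", choices)))  # ['s']
print(allowed_next_chars("Positive", choices))    # set() -> generation must stop
```

Because the constraint only narrows what can be emitted, output quality still depends on the model and the prompt.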


## Guided Regex

In [10]:
df_regex = daft.from_pydict({
    "name": [
        "John Doe",
        "Jane Smith",
        "Alice Johnson",
        "Bob Brown",
        "Charlie Davis",
    ],
    "company": [
        "Acme Inc.",
        "Globex Corp.",
        "Initech",
        "Soylent Corp.",
        "Umbrella Corp.",
    ]
})

df_regex = df_regex.with_column("email", llm_generate(
    format("""
    Generate an email address for {} at {}
    End in .com
    """, col("name"), col("company")),
    model="google/gemma-3-270m",
    provider="vllm",
    guided_decoding=GuidedDecodingParams(regex=r"\w+@\w+\.com\n"),
    max_tokens=MAX_TOKENS,
    temperature=TEMPERATURE,
    num_gpus=NUM_GPUS
))
df_regex.show()


| name (Utf8) | company (Utf8) | email (Utf8) |
| --- | --- | --- |
| John Doe | Acme Inc. | GenerateanemailaddressforJohnDoeatAcmeInc@com.com |
| Jane Smith | Globex Corp. | GenerateanemailaddressforJaneSmithatGlobexCorp@gmail.com |
| Alice Johnson | Initech | GenerateanemailaddressforAliceJohnsonatInitech@gmail.com |
| Bob Brown | Soylent Corp. | GenerateanemailaddressforBobBrownatSoylentCorp@gmail.com |
| Charlie Davis | Umbrella Corp. | GenerateanemailaddressforCharlieDavisatUmpireCorp@gmail.com |
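
The pattern `\w+@\w+\.com\n` admits only word characters on either side of the `@`, so spaces and dots are never available to the model. That is consistent with the run-together local parts in the output above. A minimal sketch for checking candidate strings against the same pattern with Python's `re` module follows. It assumes Python's regex dialect is close enough to the one used by the decoding backend, which may not hold for every construct, and the helper name `matches_email_pattern` is hypothetical.

```python
import re

# Same pattern that is passed to GuidedDecodingParams(regex=...) above.
EMAIL_PATTERN = re.compile(r"\w+@\w+\.com\n")


def matches_email_pattern(text: str) -> bool:
    """True only if the whole string matches, including the trailing newline."""
    return EMAIL_PATTERN.fullmatch(text) is not None


print(matches_email_pattern("jdoe@acme.com\n"))      # True
print(matches_email_pattern("john.doe@acme.com\n"))  # False: '.' is not matched by \w
print(matches_email_pattern("jdoe@acme.com"))        # False: trailing newline is required
```

If dotted or hyphenated local parts are wanted, the character class in the pattern would need to be widened accordingly.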


## Guided Decoding by Pydantic JSON Schema

In [12]:
from pydantic import BaseModel, Field
from enum import Enum

class CarType(str, Enum):
    sedan = "sedan"
    suv = "SUV"
    truck = "Truck"
    coupe = "Coupe"


class CarDescription(BaseModel):
    brand: str = Field(description="The brand (make) of the car")
    model: str = Field(description="The model name of the car")
    car_type: CarType = Field(description="The type of vehicle")


df_pydantic = daft.from_pydict({
    "decade": [
        "80's",
        "90's",
        "2000's",
    ]
})

df_pydantic = df_pydantic.with_column("car_description_json", llm_generate(
    format("Generate a car description for the {} decade", col("decade")),
    model="google/gemma-3-270m",
    provider="vllm",
    guided_decoding=GuidedDecodingParams(json=CarDescription.model_json_schema()),
    max_tokens=MAX_TOKENS,
    temperature=TEMPERATURE,
    num_gpus=NUM_GPUS
))

df_pydantic = df_pydantic.with_column("car_description_validated",
    df_pydantic["car_description_json"].apply(
        CarDescription.model_validate_json,
        return_dtype=daft.DataType.python(),
    )
)
df_pydantic.show()


| decade (Utf8) | car_description_json (Utf8) | car_description_validated (Python) |
| --- | --- | --- |
| 80's | `{ "brand": "Ford", "model": "Ford Mustang", "car_type": "SUV" }` | `brand='Ford' model='Ford Mustang' car_type=<CarType.suv: 'SUV'>` |
| 90's | `{ "brand": "90s", "model": "90s", "car_type": "SUV" }` | `brand='90s' model='90s' car_type=<CarType.suv: 'SUV'>` |
| 2000's | `{ "brand": "Toyota", "model": "Toyota", "car_type": "SUV" }` | `brand='Toyota' model='Toyota' car_type=<CarType.suv: 'SUV'>` |
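
Schema-guided output can still fail validation, for example when generation stops at `max_tokens` before the closing brace, and a single exception inside `apply` would then fail the whole column. The sketch below shows one way to parse defensively and return `None` for bad rows. It is a dependency-free stand-in for `CarDescription.model_validate_json`, not a replacement for it. The helper name `parse_car` is hypothetical, and the required keys are the ones that appear in the recorded output above.

```python
import json
from typing import Optional

# Assumption: keys as they appear in the recorded output; enum values from CarType.
REQUIRED_KEYS = {"brand", "model", "car_type"}
ALLOWED_CAR_TYPES = {"sedan", "SUV", "Truck", "Coupe"}


def parse_car(raw: str) -> Optional[dict]:
    """Return the parsed record, or None if the JSON is malformed or off-schema."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None  # e.g. output cut off mid-object
    if not isinstance(data, dict) or not REQUIRED_KEYS.issubset(data):
        return None
    car_type = data["car_type"]
    if not isinstance(car_type, str) or car_type not in ALLOWED_CAR_TYPES:
        return None
    return data


print(parse_car('{"brand": "Ford", "model": "Ford Mustang", "car_type": "SUV"}'))
print(parse_car('{"brand": "Ford", "model": "Ford Mus'))  # None: truncated JSON
```

With Pydantic, the same pattern would wrap `model_validate_json` in a `try`/`except` for its validation error and return `None` on failure.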


## Guided Decoding by Grammar

In [13]:
simplified_sql_grammar = """
root ::= select_statement
select_statement ::= "SELECT " column " from " table " where " condition
column ::= "col_1 " | "col_2 "
table ::= "table_1 " | "table_2 "
condition ::= column "= " number
number ::= "1 " | "2 "
limit ::= "LIMIT " number
"""

df_grammar = daft.from_pydict({
    "prompt": [
        "Generate an SQL query to show the 'username' and 'email' from the 'users' table.",
        "Generate an SQL query to show the 'name' and 'age' from the 'users' table where the age is greater than 30.",
        "Generate an SQL query to show the 'name' and 'age' from the 'users' table where the age is greater than 30 and the name is 'John Doe'.",
        "Show me how to see the first 10 rows of the 'users' table.",
    ]
})

df_grammar = df_grammar.with_column("sql_query", llm_generate(
    df_grammar["prompt"],
    model="google/gemma-3-270m",
    provider="vllm",
    guided_decoding=GuidedDecodingParams(grammar=simplified_sql_grammar),
    num_gpus=NUM_GPUS
))
df_grammar.show()

| prompt (Utf8) | sql_query (Utf8) |
| --- | --- |
| Generate an SQL query to show the 'username' and 'email' from the 'users' table. | SELECT col_1 from table_1 where |
| Generate an SQL query to show the 'name' and 'age' from the 'users' table where the age is greater than 30. | SELECT col_1 from table_2 where col_1 |
| Generate an SQL query to show the 'name' and 'age' from the 'users' table where the age is greater than 30 and the name is 'John Doe'. | SELECT col_1 from table_1 where col_ |
| Show me how to see the first 10 rows of the 'users' table. | SELECT col_1 from table_1 where col_ |
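
The grammar above is finite, so its whole language can be enumerated. Doing so makes two things visible: the `limit` rule is never reachable from `root`, and every valid statement must contain a `where` clause, whatever the prompt asks for. The sketch below hard-codes the grammar's terminals in plain Python rather than parsing the grammar text, and the helper names are hypothetical. It classifies a string as a complete statement or only a prefix of one, which is one way to detect outputs that were cut off. The outputs above stop mid-statement, which would be consistent with hitting a token limit, since this call does not pass `max_tokens`.

```python
from itertools import product

# Terminals copied from the grammar above, including their trailing spaces.
COLUMNS = ["col_1 ", "col_2 "]
TABLES = ["table_1 ", "table_2 "]
NUMBERS = ["1 ", "2 "]


def all_statements() -> list[str]:
    """Enumerate every string derivable from `root` (the `limit` rule is unused)."""
    return [
        "SELECT " + col + " from " + table + " where " + cond_col + "= " + num
        for col, table, cond_col, num in product(COLUMNS, TABLES, COLUMNS, NUMBERS)
    ]


STATEMENTS = all_statements()


def is_complete(text: str) -> bool:
    """True if `text` is a full statement of the grammar."""
    return text in STATEMENTS


def is_prefix(text: str) -> bool:
    """True if `text` could still be extended into a full statement."""
    return any(statement.startswith(text) for statement in STATEMENTS)


print(len(STATEMENTS))  # 16
truncated = "SELECT col_1  from table_1  where col_"
print(is_complete(truncated), is_prefix(truncated))  # False True
```

Note that the literal terminals yield double spaces (for example after `col_1`). The single spaces in the table above may simply reflect how the output was rendered.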
