## Guided Generation Test
Original code is from here: https://github.com/dottxt-ai/outlines

There were some questionable things in this code, so I worked with Google Gemini
to clean and explain the code a little better.

According to Gemini the code was also out-of-date...

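`%%capture` suppresses the long install output.

Also, fun fact: cell magics like `%%capture` must be the very first line in a cell, even before a comment. Otherwise IPython tries to run them as a line magic and raises a `UsageError`.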

In [2]:
%%capture
!pip install outlines

UsageError: Line magic function `%%capture` not found.


In [None]:
%%capture
# %%capture is a cell magic, so it has to stay on the first line; it suppresses the long output log
!pip install --upgrade transformers accelerate torch

In [2]:
import outlines
import torch  # needed for torch.float16 below
from enum import Enum
from pydantic import BaseModel
from transformers import AutoTokenizer, AutoModelForCausalLM
from typing import List

# ---------------------------------------------------------
# 1. SETUP: DEFINING THE "SHAPE" OF THE DATA
# ---------------------------------------------------------
# We aren't just writing a Python class here. Outlines will look at this 
# Pydantic model and convert it into a complex "Grammar" or Regex 
# behind the scenes. This acts like a stencil for the AI.
class TicketPriority(str, Enum):
    low = "low"
    medium = "medium"
    high = "high"
    urgent = "urgent"

class ServiceTicket(BaseModel):
    priority: TicketPriority
    category: str
    requires_manager: bool
    summary: str
    action_items: List[str]

# ---------------------------------------------------------
# 2. SETUP: LOADING THE BRAINS
# ---------------------------------------------------------
MODEL_NAME = "microsoft/Phi-3-mini-4k-instruct"

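# We load the standard Hugging Face model.
# 'device_map="cuda"' places the model on the GPU (use "auto" to let
# accelerate choose a device), and float16 halves the memory footprint.
# 'trust_remote_code=True' allows custom Python code from the model repo
# to run; in this cell it is passed only to the tokenizer.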
llm = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME, 
    device_map="cuda", 
    torch_dtype=torch.float16
)
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, trust_remote_code=True)

# THE MAGIC LINE:
# outlines.from_transformers wraps the standard LLM. 
# It intercepts the model's generation process. When the model tries to 
# pick the next word, Outlines checks the Pydantic schema (ServiceTicket)
# and bans any token that doesn't fit the JSON structure.
model = outlines.from_transformers(llm, tokenizer)



A new version of the following files was downloaded from https://huggingface.co/microsoft/Phi-3-mini-4k-instruct:
- configuration_phi3.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.


A new version of the following files was downloaded from https://huggingface.co/microsoft/Phi-3-mini-4k-instruct:
- modeling_phi3.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.

AttributeError: 'DynamicCache' object has no attribute 'seen_tokens'
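
As an aside on the "magic line" in the cell above: the token-banning idea it describes can be illustrated with a toy example. This is a minimal sketch and not how Outlines is actually implemented. The vocabulary, the scores, and the helper names `mask_logits` and `pick_token` are all made up for illustration.

```python
import math

# Hypothetical toy vocabulary; a real model has tens of thousands of tokens.
VOCAB = ["low", "medium", "high", "urgent", "banana", "42"]

# Tokens the schema would allow at this position (the TicketPriority values).
ALLOWED = {"low", "medium", "high", "urgent"}


def mask_logits(logits, vocab, allowed):
    """Set the score of every disallowed token to -inf so it can never win."""
    return [
        score if token in allowed else -math.inf
        for token, score in zip(vocab, logits)
    ]


def pick_token(logits, vocab):
    """Greedy decoding: return the token with the highest score."""
    best_index = max(range(len(vocab)), key=lambda i: logits[i])
    return vocab[best_index]


# Made-up scores in which the unconstrained model prefers "banana".
raw_logits = [0.1, 0.3, 0.2, 1.5, 2.7, 0.9]

unconstrained = pick_token(raw_logits, VOCAB)
constrained = pick_token(mask_logits(raw_logits, VOCAB, ALLOWED), VOCAB)
print(unconstrained, constrained)
```

The model's own preferences still decide among the allowed tokens; the mask only removes choices that would break the structure.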

In [None]:
# ---------------------------------------------------------
# 3. THE INPUT
# ---------------------------------------------------------
customer_email = """
Subject: URGENT - Cannot access my account after payment
I paid for the premium plan 3 hours ago and still can't access any features.
Please fix this immediately or refund my payment.
"""

# We format the prompt with Phi-3's chat template tags
# (<|user|>, <|end|>, <|assistant|>).
prompt = f"""<|user|>
Analyze this customer email:
{customer_email}
<|end|>
<|assistant|>
"""

# ---------------------------------------------------------
# 4. EXECUTION: GENERATING THE TICKET
# ---------------------------------------------------------
# Here is the fix for the confusing variable usage.
# When we pass `ServiceTicket` as the second argument, Outlines:
# 1. Forces the LLM to output valid JSON matching the ServiceTicket schema.
# 2. Waits for generation to finish.
# 3. Returns that JSON as a string. (As far as I can tell, this is how the
#    Outlines 1.x API that provides from_transformers behaves.)
ticket_json = model(
    prompt,
    ServiceTicket,
    max_new_tokens=500
)
# Pydantic turns the JSON string into a validated 'ServiceTicket' object.
structured_ticket_object = ServiceTicket.model_validate_json(ticket_json)

# ---------------------------------------------------------
# 5. LOGIC: USING THE RESULT
# ---------------------------------------------------------
# Because 'structured_ticket_object' is a validated Python object,
# we can access properties using dot notation (.priority) immediately.
# No further JSON parsing is required here.

def alert_manager(ticket):
    print(f"ALARM: {ticket.priority.upper()} priority issue: {ticket.summary}")

# We check the Enums and Booleans directly
if structured_ticket_object.priority == TicketPriority.urgent or structured_ticket_object.requires_manager:
    alert_manager(structured_ticket_object)
else:
    print("Ticket routed to standard queue.")