# [Guidance](https://github.com/guidance-ai/guidance) for JSON-constrained LLM output

This notebook demonstrates how to use Guidance to constrain an LLM's output to JSON that conforms to a Pydantic schema.

In [None]:
# Install guidance if needed
# !pip install guidance

In [1]:
import guidance
from guidance import system, user, assistant, gen
from guidance import json as gen_json
from guidance.models import Transformers
from pydantic import BaseModel, Field
from typing import List
import json

In [2]:
# Define Pydantic schema for person extraction
class Person(BaseModel):
    name: str = Field(description="Person's name")
    time: str = Field(description="Time period when mentioned, or 'not specified'")
    place: str = Field(description="Geographic location, or 'not specified'")
    role: str = Field(description="Person's role or occupation, or 'not specified'")

class PeopleExtraction(BaseModel):
    people: List[Person] = Field(description="List of people extracted from text")

In [None]:
# Load your model with Guidance
model_path = "/gpfs1/llm/llama-3.2-hf/Meta-Llama-3.2-3B-Instruct"

# Initialize the Guidance model
lm = Transformers(model_path, device_map="cuda")
print("Guidance model loaded successfully!")

Guidance model loaded successfully!


In [4]:
# Sample text from your existing data
sample_text = """Whether Ong would have seen cyberspace as fundamentally oral or literary, he would surely have recognized it as transformative: not just a revitalization of older forms, not just an amplification, but something wholly new. He might have sensed a coming discontinuity akin to the emergence of literacy itself. Few understood better than Ong just how profound a discontinuity that had been.
When he began his studies, "oral literature" was a common phrase. It is an oxymoron laced with anachronism; the words imply an all-too-unconscious approach to the past by way of the present. Oral literature was generally treated as a variant of writing; this, Ong said, was "rather like thinking of horses as automobiles without wheels."
"Language in fact bears the same relationship to the concept of mind that legislation bears to the concept of parliament," says Jonathan Miller: "it is a competence forever bodying itself in a series of concrete performances." Much the same might be said of writing—it is concrete performance—but when the word is instantiated in paper or stone, it takes on a separate existence as artifice. It is a product of tools, and it is a tool. And like many technologies that followed, it thereby inspired immediate detractors.
One unlikely Luddite was also one of the first long-term beneficiaries. Plato (channeling the nonwriter Socrates) warned that this technology meant impoverishment."""

In [5]:
# Function to extract people using Guidance JSON constraints
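def extract_people_with_guidance(text: str) -> str:
    """Extract people from text and return them as a schema-constrained JSON string."""
    
    # Guidance model objects are immutable: each `+=` below rebinds `model`
    # to a new state, so the shared `lm` is left unchanged between calls.
    model = lm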
    
    with system():
        model += "You are an expert at extracting people information from text."
    
    with user():
        model += f"""Extract all people mentioned in the following text. For each person, provide their name, time period, location, and role. Use 'not specified' for missing information.

Text: {text}

Extract people as JSON:"""
    
    with assistant():
        model += gen_json(name="people_data", schema=PeopleExtraction)
    
    return model["people_data"]

print("Function defined successfully!")

Function defined successfully!


In [6]:
# Test the extraction
print("Testing Guidance-based JSON extraction...")
result = extract_people_with_guidance(sample_text)

print("\nRaw JSON result:")
print(result)

print("\nParsed and formatted result:")
parsed_result = json.loads(result)
print(json.dumps(parsed_result, indent=2))

Testing Guidance-based JSON extraction...


Raw JSON result:
{"people": [{"name": "Ong", "time": "not specified", "place": "not specified", "role": "scholar"}, {"name": "Socrates", "time": "not specified", "place": "not specified", "role": "philosopher"}, {"name": "Plato", "time": "not specified", "place": "not specified", "role": "philosopher"}, {"name": "Jonathan Miller", "time": "not specified", "place": "not specified", "role": "scholar"}]}

Parsed and formatted result:
{
  "people": [
    {
      "name": "Ong",
      "time": "not specified",
      "place": "not specified",
      "role": "scholar"
    },
    {
      "name": "Socrates",
      "time": "not specified",
      "place": "not specified",
      "role": "philosopher"
    },
    {
      "name": "Plato",
      "time": "not specified",
      "place": "not specified",
      "role": "philosopher"
    },
    {
      "name": "Jonathan Miller",
      "time": "not specified",
      "place": "not specified",
      "role": "scholar"
    }
  ]
}
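
The prompt and schema ask the model to write the literal string 'not specified' for missing fields, which is awkward to work with downstream. A minimal sketch of one way to post-process the result, assuming you would rather have `None` for missing values; `SENTINEL` and `normalize_person` are hypothetical names introduced here, and the sample string is a shortened copy of the output above:

```python
import json

# Shortened copy of the raw JSON string printed above (two of the four people).
raw = (
    '{"people": ['
    '{"name": "Ong", "time": "not specified", "place": "not specified", "role": "scholar"}, '
    '{"name": "Plato", "time": "not specified", "place": "not specified", "role": "philosopher"}'
    ']}'
)

SENTINEL = "not specified"  # placeholder string requested in the prompt and schema


def normalize_person(person: dict) -> dict:
    """Return a copy of one person record with sentinel values replaced by None."""
    return {key: (None if value == SENTINEL else value) for key, value in person.items()}


people = [normalize_person(p) for p in json.loads(raw)["people"]]
print(people[0])
```

The original records are left untouched, so the raw string can still be validated against the Pydantic schema, which expects plain strings.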


In [7]:
# Validate with Pydantic
print("Validating with Pydantic...")
try:
    validated_result = PeopleExtraction.model_validate_json(result)
    print("✅ JSON is valid according to schema!")
    
    print("\nExtracted people:")
    for i, person in enumerate(validated_result.people, 1):
        print(f"{i}. {person.name} - {person.role} ({person.time}, {person.place})")
        
except Exception as e:
    print(f"❌ Validation failed: {e}")

Validating with Pydantic...
✅ JSON is valid according to schema!

Extracted people:
1. Ong - scholar (not specified, not specified)
2. Socrates - philosopher (not specified, not specified)
3. Plato - philosopher (not specified, not specified)
4. Jonathan Miller - scholar (not specified, not specified)
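
Schema validation confirms the shape of the output, not how informative it is: in the run above, every `time` and `place` fell back to 'not specified'. A small sketch for tallying that per field, using only the standard library; `count_unspecified` is a hypothetical helper, and the records are copied from the output above:

```python
from collections import Counter

# Records copied from the extraction output above.
people = [
    {"name": "Ong", "time": "not specified", "place": "not specified", "role": "scholar"},
    {"name": "Socrates", "time": "not specified", "place": "not specified", "role": "philosopher"},
    {"name": "Plato", "time": "not specified", "place": "not specified", "role": "philosopher"},
    {"name": "Jonathan Miller", "time": "not specified", "place": "not specified", "role": "scholar"},
]


def count_unspecified(records, sentinel="not specified"):
    """Count, per field name, how many records hold the sentinel value."""
    counts = Counter()
    for record in records:
        for field_name, value in record.items():
            if value == sentinel:
                counts[field_name] += 1
    return counts


missing = count_unspecified(people)
for field_name in ("name", "time", "place", "role"):
    print(f"{field_name}: {missing[field_name]}/{len(people)} unspecified")
```

A tally like this can help when comparing prompts or models, since a change that keeps the JSON valid may still alter how many fields are actually filled in.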
