# Guaranteeing valid output syntax

Large language models are good at generating useful outputs, but they cannot guarantee that those outputs follow a specific format. This causes problems when the output of a language model is used as input to another system. For example, if we want a language model to generate a JSON object, we need to make sure that the output is valid JSON. This is awkward to enforce with standard APIs, but with `guidance` we can both speed up inference and ensure that the generated JSON is always valid.

This notebook shows how to generate a JSON object that we know will have a valid format. The example used here is generating a random character profile for a game, but the ideas apply to any scenario where you want JSON output.

In [1]:
import guidance

# define the model we will use
# lm = guidance.models.LlamaCpp("/home/scottlundberg_google_com/models/mistral-7b-v0.1.Q8_0.gguf", n_gpu_layers=-1)
lm = guidance.models.Transformers("microsoft/Phi-3-mini-4k-instruct")


In [2]:
from guidance import gen, select

# we can pre-define valid option sets
sample_weapons = ["sword", "axe", "mace", "spear", "bow", "crossbow"]
sample_armor = ["leather", "chainmail", "plate"]

# define a re-usable "guidance function" that we can use below
@guidance
def quoted_list(lm, name, n):
    for i in range(n):
        if i > 0:
            lm += ", "
        lm += '"' + gen(name, list_append=True, stop='"') + '"'
    return lm

@guidance
def generate_character(
    lm,
    character_one_liner,
    weapons: list[str] = sample_weapons,
    armour: list[str] = sample_armor,
    n_items: int = 3
):
    lm += f'''\
    {{
        "description" : "{character_one_liner}",
        "name" : "{gen("character_name", stop='"')}",
        "age" : {gen("age", regex="[0-9]+")},
        "armour" : "{select(armour, name="armour")}",
        "weapon" : "{select(weapons, name="weapon")}",
        "class" : "{gen("character_class", stop='"')}",
        "mantra" : "{gen("mantra", stop='"')}",
        "strength" : {gen("strength", regex="[0-9]+")},
        "quest_items" : [{quoted_list("quest_items", n_items)}]
    }}'''
    return lm


generation = lm + generate_character("A quick and nimble fighter")
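
One caveat with the template above: `character_one_liner` is pasted into the JSON text as-is, so a description containing a double quote or a backslash would break the syntax that the rest of the template enforces. The following is a minimal sketch of one way to guard against this using only the standard library; the helper name `json_string_body` is hypothetical and is not part of `guidance`:

```python
import json


def json_string_body(text: str) -> str:
    """Escape text for use between the double quotes of a JSON string."""
    # json.dumps returns a quoted JSON string literal; drop the outer quotes so
    # the result can be placed inside a template that already supplies them.
    return json.dumps(text)[1:-1]


raw = 'A fighter known as "The Blade"'
escaped = json_string_body(raw)

# Build the same one-field object with and without escaping.
unsafe = '{"description" : "' + raw + '"}'
safe = '{"description" : "' + escaped + '"}'

try:
    json.loads(unsafe)
    unsafe_is_valid = True
except json.JSONDecodeError:
    unsafe_is_valid = False

print("raw text parses:", unsafe_is_valid)
print("escaped text round-trips:", json.loads(safe)["description"] == raw)
```

Passing `json_string_body(character_one_liner)` into the template instead of the raw string should keep the `description` field well-formed for arbitrary input.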


We have produced valid JSON:

In [3]:
import json

gen_json = json.loads(str(generation))

print(f"Loaded json:\n{json.dumps(gen_json, indent=4)}")

Loaded json:
{
    "description": "A quick and nimble fighter",
    "name": "Sabretooth",
    "age": 25,
    "armour": "leather",
    "weapon": "sword",
    "class": "warrior",
    "mantra": "Fight or die",
    "strength": 10,
    "quest_items": [
        "Sword of Valor",
        "Shield of Courage",
        "Helmet of Wisdom"
    ]
}
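
The parse succeeds here, but note that the `[0-9]+` pattern used for the integer fields also admits values with leading zeros such as `007`, which JSON does not allow. The sketch below illustrates the difference using only the standard library. `STRICT_INT` is a suggested stricter pattern, and it is an assumption (not verified here) that the `regex` argument of `gen` accepts the alternation it uses:

```python
import json
import re

LOOSE_INT = r"[0-9]+"          # pattern used for the integer fields above
STRICT_INT = r"0|[1-9][0-9]*"  # suggested alternative: no leading zeros


def parses_as_json(number_text: str) -> bool:
    """Return True if number_text is accepted as the JSON value of "age"."""
    try:
        json.loads('{"age" : ' + number_text + "}")
        return True
    except json.JSONDecodeError:
        return False


# Compare what each pattern admits with what the JSON parser accepts.
for candidate in ["25", "0", "007"]:
    loose = re.fullmatch(LOOSE_INT, candidate) is not None
    strict = re.fullmatch(STRICT_INT, candidate) is not None
    print(candidate, loose, strict, parses_as_json(candidate))
```

Only the stricter pattern agrees with the parser on all three candidates, so it may be the safer choice when the output must always load.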


We have also captured our generated text and can access it like a dictionary:

In [4]:
generation["weapon"]

'sword'

## Using a schema

We can also define a JSON Schema for our character and pass it to `guidance`:

In [5]:
character_schema = """{
    "type": "object",
    "properties": {
        "description" : { "type" : "string" },
        "name" : { "type" : "string" },
        "age" : { "type" : "integer" },
        "armour" : { "type" : "string", "enum" : ["leather", "chainmail", "plate"] },
        "weapon" : { "type" : "string", "enum" : ["sword", "axe", "mace", "spear", "bow", "crossbow"] },
        "class" : { "type" : "string" },
        "mantra" : { "type" : "string" },
        "strength" : { "type" : "integer" },
        "quest_items" : { "type" : "array", "items" : { "type" : "string" } }
    },
    "required": [ "description", "name", "age", "armour", "weapon", "class", "mantra", "strength", "quest_items" ],
    "additionalProperties": false
}
"""

character_schema_obj = json.loads(character_schema)
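
Because the `enum` values in the schema repeat the option lists defined earlier, the two can drift apart. As a sketch of one alternative, the same schema could be assembled as a Python dictionary so that each list is written only once. The name `build_character_schema` is hypothetical, and the option lists are repeated here only to keep the block self-contained:

```python
import json

sample_weapons = ["sword", "axe", "mace", "spear", "bow", "crossbow"]
sample_armor = ["leather", "chainmail", "plate"]


def build_character_schema(weapons: list[str], armour: list[str]) -> dict:
    """Assemble the character schema, reusing the option lists as enums."""
    properties = {
        "description": {"type": "string"},
        "name": {"type": "string"},
        "age": {"type": "integer"},
        "armour": {"type": "string", "enum": list(armour)},
        "weapon": {"type": "string", "enum": list(weapons)},
        "class": {"type": "string"},
        "mantra": {"type": "string"},
        "strength": {"type": "integer"},
        "quest_items": {"type": "array", "items": {"type": "string"}},
    }
    return {
        "type": "object",
        "properties": properties,
        # Every property is required, matching the hand-written schema.
        "required": list(properties),
        "additionalProperties": False,
    }


schema_from_lists = build_character_schema(sample_weapons, sample_armor)
print(json.dumps(schema_from_lists, indent=4))
```

The resulting dictionary has the same shape as `character_schema_obj`, so it could be used in its place wherever a schema object is expected.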

Our previous generation complies with this schema:

In [6]:
from jsonschema import validate

validate(instance=gen_json, schema=character_schema_obj)

Now, use our schema with `guidance`:

In [7]:
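# alias the import so it shadows neither the standard-library `json` module
# nor the `gen_json` dictionary created earlier
from guidance import json as guidance_json

generated = lm + "A character attuned to the forest"
generated += guidance_json(schema=character_schema_obj, name="next_character")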


Again, we have a valid JSON result:

In [8]:
loaded_character = json.loads(generated["next_character"])

validate(instance=loaded_character, schema=character_schema_obj)

print(json.dumps(loaded_character, indent=4))

{
    "description": "A druid named Elara, who communicates with the ancient spirits of the forest.",
    "name": "Elara",
    "age": 27,
    "armour": "leather",
    "weapon": "bow",
    "class": "druid",
    "mantra": "Nature's harmony, ancient spirits' guidance",
    "strength": 12,
    "quest_items": [
        "Ancient Oak Seed",
        "Mystic Herb",
        "Crystal of the Forest"
    ]
}

