# Guaranteeing valid output syntax

Large language models are great at generating useful outputs, but they are not great at guaranteeing that those outputs follow a specific format. This can cause problems when we want to use the outputs of a language model as input to another system. For example, if we want to use a language model to generate a JSON object, we need to make sure that the output is valid JSON. This can be a real pain with standard APIs, but with `guidance` we can both accelerate inference and ensure that generated JSON is always valid: the fixed parts of the structure (braces, keys, quotes and commas) are inserted directly, and the model only generates the values.
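A quick way to see the problem is to run near-miss outputs through a strict parser. The sketch below uses only Python's standard `json` module; the helper name `parse_or_error` and the sample strings are illustrative inventions, not real model output.

```python
import json


def parse_or_error(text: str):
    """Return (parsed_value, None) on success or (None, error_message) on failure."""
    try:
        return json.loads(text), None
    except json.JSONDecodeError as exc:
        return None, exc.msg


# Hypothetical near-miss outputs of the kind an unconstrained model can produce
candidates = {
    "valid": '{"name": "Blade", "age": 18}',
    "trailing comma": '{"name": "Blade", "age": 18,}',
    "leading zero": '{"name": "Blade", "age": 018}',  # JSON forbids leading zeros
    "truncated": '{"name": "Bla',
}

for label, text in candidates.items():
    value, error = parse_or_error(text)
    print(f"{label:15} -> {'ok' if error is None else 'invalid: ' + error}")
```

Each of the three near misses raises `json.JSONDecodeError`. A template that fixes the punctuation and constrains the format of each value avoids these failures by construction rather than by retrying.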

This notebook shows how to generate a JSON object we know will have a valid format. The example used here is generating a random character profile for a game, but the ideas are readily applicable to any scenario where you want JSON output.

```python
import os

import guidance

# define the model we will use
# MODEL_PATH should point at the GGUF file you wish to use
target_model_path = os.getenv("MODEL_PATH")
print(f"Attempting to load {target_model_path}")

# n_gpu_layers=-1 asks llama.cpp to offload all layers to the GPU
lm = guidance.models.LlamaCpp(target_model_path, n_gpu_layers=-1)
```

```
Attempting to load /mnt/c/Users/riedgar/Downloads/llama-2-7b.Q5_K_M.gguf
```
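`os.getenv` returns `None` when `MODEL_PATH` is unset, so a missing or mistyped path only surfaces later, inside model loading. A small guard such as the hypothetical `resolve_model_path` below (plain standard library, not part of `guidance`) fails early with a clearer message; its result could then be passed to `guidance.models.LlamaCpp` as in the cell above.

```python
import os
from pathlib import Path


def resolve_model_path(env_var: str = "MODEL_PATH") -> Path:
    """Hypothetical helper: read a GGUF path from the environment and check it exists."""
    raw = os.getenv(env_var)
    if not raw:
        raise RuntimeError(f"Set {env_var} to the path of a .gguf model file")
    path = Path(raw).expanduser()
    if not path.is_file():
        raise FileNotFoundError(f"{env_var} points at {path}, which is not a file")
    if path.suffix.lower() != ".gguf":
        # Not fatal: only warn, in case the wrong file was selected
        print(f"Warning: {path.name} does not have a .gguf extension")
    return path
```

With this in place, `target_model_path = str(resolve_model_path())` would replace the bare `os.getenv` call.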


```python
import json

from guidance import gen, select

# we can pre-define valid option sets
sample_weapons = ["sword", "axe", "mace", "spear", "bow", "crossbow"]
sample_armour = ["leather", "chainmail", "plate"]

@guidance
def generate_character(
    llm,
    character_one_liner,
    weapons: list[str] = sample_weapons,
    armour: list[str] = sample_armour,
    n_items: int = 3
):
    llm += "{"
    # json.dumps quotes and escapes the description, so quotes or backslashes
    # in the one-liner cannot break the JSON
    llm += '"description" : ' + json.dumps(character_one_liner) + ','
    llm += '"name" : "' + gen(name="character_name", stop='"') + '",'
    # With guidance, we can call a GPU rather than merely random.randint()
    # JSON numbers may not have leading zeros, so the first digit must be 1-9
    llm += '"age" : ' + gen(name="age", regex="[1-9][0-9]*") + ','
    # select() restricts the output to one of the listed options
    llm += '"armour" : "' + select(armour, name="armour") + '",'
    llm += '"weapon" : "' + select(weapons, name="weapon") + '",'
    llm += '"class" : "' + gen(name="character_class", stop='"') + '",'
    llm += '"mantra" : "' + gen(name="mantra", stop='"') + '",'
    # Again, we can avoid calling random.randint() like a pleb
    llm += '"strength" : ' + gen(name="strength", regex="[1-9][0-9]*") + ','
    llm += '"quest_items" : [ '
    for i in range(n_items):
        # list_append=True collects each generated item into a list under "items"
        llm += '"' + gen(name="items", list_append=True, stop='"') + '"'
        # We now pause a moment to express our thoughts on the JSON
        # specification's dislike of trailing commas
        if i < n_items - 1:
            llm += ','
    llm += "]"
    llm += "}"
    return llm


generation = lm + generate_character("A quick and nimble fighter")
```
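The `if i < n_items - 1` check exists because JSON does not allow a comma after the last array element. The separator logic can be checked outside the model in plain Python; the sketch below builds the array text both with the index check and with `str.join`, using placeholder item strings (assumptions, not generated values). Inside `generate_character` the explicit check fits naturally because each item is appended to `llm` one `gen` call at a time.

```python
import json


def array_with_index_check(items: list[str]) -> str:
    """Mirror the notebook's loop: emit a comma only between elements."""
    out = "[ "
    for i, item in enumerate(items):
        out += '"' + item + '"'
        if i < len(items) - 1:
            out += ","
    return out + "]"


def array_with_join(items: list[str]) -> str:
    """Equivalent plain-Python version using str.join."""
    return "[ " + ",".join('"' + item + '"' for item in items) + "]"


placeholder_items = ["rope", "lantern", "map"]  # placeholders, not model output
a = array_with_index_check(placeholder_items)
b = array_with_join(placeholder_items)
print(a)                      # [ "rope","lantern","map"]
print(a == b, json.loads(a))  # True ['rope', 'lantern', 'map']

# A naive loop that always appends a comma leaves a trailing one, which json rejects
naive = "[ " + "".join('"' + item + '",' for item in placeholder_items) + "]"
try:
    json.loads(naive)
except json.JSONDecodeError as exc:
    print("naive version rejected:", exc.msg)
```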

Parsing the result with Python's `json` module confirms that we have produced valid JSON:

```python
import json

gen_json = json.loads(str(generation))

print(f"Loaded json:\n{json.dumps(gen_json, indent=4)}")
```

```
Loaded json:
{
    "description": "A quick and nimble fighter",
    "name": "Blade",
    "age": 18,
    "armour": "leather",
    "weapon": "sword",
    "class": "fighter",
    "mantra": "I am the blade, the blade is me.",
    "strength": 10,
    "quest_items": [
        "1",
        "2",
        "3"
    ]
}
```
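Valid syntax is only part of what the template provides: `select` also restricts `armour` and `weapon` to the listed options, and the regex-constrained fields come back as integers. A hypothetical `check_profile` helper like the one below can assert those properties on the parsed dictionary. The dictionary literal reproduces the output shown above, and the option lists are copied from the notebook.

```python
sample_weapons = ["sword", "axe", "mace", "spear", "bow", "crossbow"]
sample_armour = ["leather", "chainmail", "plate"]

# The parsed output shown above
gen_json = {
    "description": "A quick and nimble fighter",
    "name": "Blade",
    "age": 18,
    "armour": "leather",
    "weapon": "sword",
    "class": "fighter",
    "mantra": "I am the blade, the blade is me.",
    "strength": 10,
    "quest_items": ["1", "2", "3"],
}


def check_profile(profile: dict, n_items: int = 3) -> list[str]:
    """Hypothetical check: return a list of problems (empty if none found)."""
    problems = []
    expected_keys = {"description", "name", "age", "armour", "weapon",
                     "class", "mantra", "strength", "quest_items"}
    missing = expected_keys - set(profile)
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")
    if profile.get("armour") not in sample_armour:
        problems.append(f"unexpected armour: {profile.get('armour')!r}")
    if profile.get("weapon") not in sample_weapons:
        problems.append(f"unexpected weapon: {profile.get('weapon')!r}")
    for key in ("age", "strength"):
        if not isinstance(profile.get(key), int):
            problems.append(f"{key} is not an integer")
    items = profile.get("quest_items")
    if not (isinstance(items, list) and len(items) == n_items):
        problems.append(f"expected {n_items} quest items")
    return problems


print(check_profile(gen_json))  # []
```

Note that the constraints say nothing about whether values are sensible: the quest items `"1"`, `"2"` and `"3"` pass every check here.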


The named captures (such as `character_name`, `age` and `weapon`) are also stored on the result and can be accessed like a dictionary:

```python
generation["weapon"]
```

```
'sword'
```

<hr style="height: 1px; opacity: 0.5; border: none; background: #cccccc;">
<div style="text-align: center; opacity: 0.5">Have an idea for more helpful examples? Pull requests that add to this documentation notebook are encouraged!</div>