# Guidance Garage Demo

This notebook is a brief introduction to the capabilities of [Guidance](https://github.com/guidance-ai/guidance), a Python package designed to make interfacing with LLMs both easier and more reliable. Guidance supports a variety of LLMs, although its constrained generation features mostly work only with local models.

Before you begin, please install the required packages into your Python environment:

```bash
pip install -r /path/to/this/directory/requirements.txt
```

We will use the 'mini' Phi-3 model in this demo, but the code should work with most models available via Hugging Face Transformers. After installing the above packages, you should be able to [run the sample inference code for Phi-3](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct#sample-inference-code).

## Simple Usage

We will start by importing guidance and some useful functions:

In [None]:
import guidance
from guidance import gen, select, models, assistant, system, user, with_temperature

Next, we create a `Model` object, which is Guidance's abstract representation of an LLM. We will use the `Transformers` implementation of `Model`, and load Phi-3 from the Hugging Face hub:

In [None]:
lm = models.Transformers("microsoft/Phi-3-mini-4k-instruct")

`Model` objects are immutable, so each time we append to one with `+=`, we get a new `Model` rather than a modified original (the copy is shallow; the underlying LLM is not duplicated). As we accumulate prompts and responses, these are stored in the `Model` object so that we can reference them later.
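
The copy-on-append behaviour can be pictured with a toy class. This is only a sketch: `ToyModel` and its attributes are hypothetical stand-ins for illustration, not Guidance's implementation.

```python
class ToyModel:
    """Hypothetical stand-in for a Guidance Model (illustration only)."""

    def __init__(self, backend, text=""):
        self.backend = backend  # shared between copies, never duplicated
        self.text = text        # accumulated prompts and responses

    def __add__(self, more):
        # Build a new object and leave the original untouched.
        return ToyModel(self.backend, self.text + more)


backend = object()  # placeholder for the (large) underlying LLM
base = ToyModel(backend)

chat = base
chat += "How are you?"  # no __iadd__, so this is chat = chat + "..."

print(repr(base.text))
print(repr(chat.text))
print(chat.backend is base.backend)
```

Because `base` is never modified, several independent conversations can branch off the same starting object, which is the pattern this notebook uses with `lm`.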

We can use the `user()` and `assistant()` context managers to build a conversation with the model. Sending prompts to the model is a matter of string concatenation, and we can get a response from the model by calling `gen()`:

In [None]:
chat_lm = lm

with user():
    chat_lm += "How are you?"

with assistant():
    chat_lm += gen("chat_response", max_tokens=20, temperature=0.8)

The first argument to `gen()` is a key we can use to extract the specific text generated by the call:

In [None]:
chat_lm["chat_response"]

## Constrained Generation

Constrained generation is a powerful feature of Guidance. With it, we can force the model to produce an answer from a list we specify. It is accessed via the `select()` function:

In [None]:
food_lm = lm

with user():
    food_lm += "Do you like brussels sprouts?"

with assistant():
    food_lm += with_temperature(select(name="brussels", options=["Yes, I like them", "No I despise them"]), temperature=0.8)

The output here also shows some of the power of constrained generation: only the first token is highlighted, which means that only the first token was generated by the model. Once that was generated, Guidance was able to see that only one of the `options` passed to `select()` was still a possibility, and was therefore able to inject the remaining tokens.
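
The forcing logic described above can be sketched in plain Python. This is a simplified illustration that works on characters rather than tokens; `forced_completion` is a hypothetical helper, not part of Guidance.

```python
def forced_completion(options, generated):
    """Return the text that can be injected without consulting the model.

    If exactly one option is still consistent with what has been generated
    so far, the rest of that option is forced; otherwise nothing is.
    """
    remaining = [opt for opt in options if opt.startswith(generated)]
    if len(remaining) == 1:
        return remaining[0][len(generated):]
    return ""


options = ["Yes, I like them", "No I despise them"]

# Before anything is generated, both options are still possible.
print(repr(forced_completion(options, "")))

# Once the model has produced "No", only one option remains.
print(repr(forced_completion(options, "No")))
```

The second call returns the remainder of the only surviving option, which corresponds to the un-highlighted text in the notebook output.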

In [None]:
food_lm["brussels"]

## JSON Generation

Constrained generation is really powerful when working with formatted data, such as JSON. Guidance can be taught any context-free grammar as a constraint, but it comes with (partial) support for [JSON schema](https://json-schema.org/). Let us start with some useful `import` statements:

In [None]:
import json
from jsonschema import validate

For our example, we're going to generate characters for a role-playing game, in the usual Tolkienesque setting. We can write a very simple JSON schema for these characters:

In [None]:
character_schema = """{
    "type": "object",
    "properties": {
        "description" : { "type" : "string" },
        "name" : { "type" : "string" },
        "age" : { "type" : "integer" },
        "armour" : { "type" : "string", "enum" : ["leather", "chainmail", "plate"] },
        "weapon" : { "type" : "string", "enum" : ["sword", "axe", "mace", "spear", "bow", "crossbow"] },
        "class" : { "type" : "string" },
        "mantra" : { "type" : "string" },
        "strength" : { "type" : "integer" },
        "quest_items" : { "type" : "array", "items" : { "type" : "string" } }
    },
    "additionalProperties": false
}
"""

character_schema_obj = json.loads(character_schema)

Now, let's import the necessary function from Guidance itself. Since we have already imported the standard-library `json` module, we have to rename the Guidance function on import to avoid a name clash:

In [None]:
from guidance import json as gen_json

We can use this with a standard one-shot prompting strategy. We provide the LLM with a system prompt via the `system()` context manager, and then our one-shot example with the `user()` and `assistant()` context managers as before. We then put in our actual request, and call `gen_json()` with the schema object we just loaded:

In [None]:
character_lm = lm

with system():
    character_lm += """You are a DM creating characters for a game in a Tolkienesque setting.
Users will provide a one-line description of a character, and you should respond with a longer
description in JSON format.
"""

# Now give an example
with user():
    character_lm += "A quick and nimble fighter"

with assistant():
    character_lm += """{
    "description": "A quick and nimble fighter",
    "name": "Mokosh",
    "age": 20,
    "armour": "chainmail",
    "weapon": "sword",
    "class": "fighter",
    "mantra": "I am the sword of the gods",
    "strength": 10,
    "quest_items": [
        "Bag of holding",
        "Amulet of Perun"
    ]
}"""

# Now ask for our character
with user():
    character_lm += "A character attuned to the forest"

with assistant():
    character_lm += gen_json(schema=character_schema_obj, name="next_character", temperature=0.8)

Notice how only a subset of tokens in the output were actually produced by the LLM. Many of the other tokens could be forced because the model was constrained by the schema.

We can show that we really produced valid JSON with `json.loads()`, and also validate it against the schema we provided:

In [None]:
loaded_character = json.loads(character_lm["next_character"])

validate(instance=loaded_character, schema=character_schema_obj)

print(json.dumps(loaded_character, indent=4))
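
To see what the `json.loads()` check guards against, here is a small standard-library sketch. The helper `is_valid_json` and both sample strings are made up for illustration; the second string contains a stray trailing comma, a common slip in hand-written or unconstrained JSON.

```python
import json


def is_valid_json(text):
    """Return True if text parses as JSON, False otherwise."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


good = '{"name": "Mokosh", "quest_items": ["Bag of holding"]}'
bad = '{"name": "Mokosh", "quest_items": ["Bag of holding",]}'  # trailing comma

print(is_valid_json(good))
print(is_valid_json(bad))
```

Parsing only confirms that the text is well-formed JSON; the separate `validate()` call is what checks the parsed object against the schema.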

With Guidance, we can do even better. Using the `@guidance` decorator, we can create functions which can be used with the 'string concatenation' approach, much like `gen()` and `select()` (not to mention `json()` itself - used as `gen_json()` here). The function must accept a Guidance `Model` as its first argument, and then return a `Model` at the end. Inside, you can call other Guidance functions (or Python ones):

In [None]:
@guidance
def generate_character(
    lm_curr,
    key: str,
    character_one_liner: str,
    temperature: float
):
    with system():
        lm_curr += """You are a DM creating characters for a game in a Tolkienesque setting.
Users will provide a one-line description of a character, and you should respond with a longer
description in JSON format.
"""

    # Now give an example
    with user():
        lm_curr += "A quick and nimble fighter"

    with assistant():
        lm_curr += """{
    "description": "A quick and nimble fighter",
    "name": "Mokosh",
    "age": 20,
    "armour": "chainmail",
    "weapon": "sword",
    "class": "fighter",
    "mantra": "I am the sword of the gods",
    "strength": 10,
    "quest_items": [
        "Bag of holding",
        "Amulet of Perun"
    ]
}"""

    # Now ask for our character
    with user():
        lm_curr += character_one_liner

    with assistant():
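        # Pass the caller's temperature through; it was previously accepted but ignored
        lm_curr += gen_json(schema=character_schema_obj, name=key, temperature=temperature)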

    return lm_curr

To be a _stateless_ Guidance function, the supplied function must not reference any of its own generations (i.e. it mustn't contain code like `lm["my_generation"]`), but that only matters when interacting with a remote endpoint.

We can now use this function like `gen()` or `select()`:

In [None]:
key = "new_character"
char_0 = lm + generate_character(character_one_liner="A crafty rogue", key=key, temperature=0.8)

print(json.dumps(json.loads(char_0[key]), indent=4))

And again:

In [None]:
char_1 = char_0 + generate_character(character_one_liner="A paladin from strange lands", key=key, temperature=0.8)

print(json.dumps(json.loads(char_1[key]), indent=4))