# Introduction to `guidance`

This notebook is a terse tutorial walkthrough of the syntax of `guidance`.

## Models

At the core of any guidance program are the immutable model objects. You can create an initial model object using any of the constructors under `guidance.models`: 

In [1]:
from guidance import models

# For LlamaCpp, you need to provide the path on disk to a .gguf model
# A sample model can be downloaded from
# https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.2-GGUF/blob/main/mistral-7b-instruct-v0.2.Q8_0.gguf
mistral = models.LlamaCpp("/home/scottlundberg_google_com/models/mistral-7b-instruct-v0.2.Q8_0.gguf", n_gpu_layers=-1, n_ctx=4096)

#llama2 = models.Transformers("meta-llama/Llama-2-7b-hf")
#gpt3 = models.OpenAI("text-davinci-003")
#palm2 = models.VertexAI("text-bison@001")

## Simple generation

Once you have an initial model object you can append text to it with the addition operator. This creates a new model object that has the same context (prompt) as the original model, but with the text appended at the end (just like what would happen if you add two strings together).

In [2]:
lm = mistral + "Who won the last Kentucky derby and by how much?"
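
The "append returns a new object" behaviour can be pictured with a toy class. This is only a sketch of the semantics described above: `ToyModel` is a hypothetical stand-in written for illustration, not part of `guidance`, and it ignores model weights and caching entirely.

```python
# Toy stand-in for an immutable model object (hypothetical, not guidance code).
class ToyModel:
    def __init__(self, context=""):
        self._context = context

    def __add__(self, text):
        # Return a *new* object; the original is never modified.
        return ToyModel(self._context + text)

    def __str__(self):
        return self._context


base = ToyModel("You are a helpful bot. ")
extended = base + "Who won the last Kentucky derby?"

print(str(base))      # the original context is unchanged
print(str(extended))  # the new object carries the appended text
```

As with string concatenation, `base` can be reused as the starting point for any number of other prompts.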

Once you have added some text to the model, you can ask the model to generate unconstrained text using the `gen` guidance function. Guidance functions represent executable components that can be appended to a model. When you append a guidance function to a model, the model extends its state by executing the guidance function.

In [3]:
from guidance import gen

lm + gen(max_tokens=10)

Note that while the `lm` and `mistral` objects are semantically separate, for performance purposes they share the same model weights and KV cache, so the incremental creation of new lm objects is very cheap and reuses all the computation from prior objects.

We can add the text and the `gen` function in one statement to follow the traditional prompt-then-generate pattern:

In [4]:
mistral + '''\
Q: Who won the last Kentucky derby and by how much?
A:''' + gen(stop="Q:")

## Simple templates

You can define a template in `guidance` (v0.1+) using f-strings. You can interpolate both standard variables and guidance functions. Note that in Python 3.12 and later you can put any valid expression into an f-string slot, but in Python 3.11 and below a few characters (such as the backslash) are disallowed inside slots.

In [5]:
query = "Who won the last Kentucky derby and by how much?"
mistral + f'''\
Q: {query}
A: {gen(stop="Q:")}'''
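
For code that must also run on Python 3.11 or earlier, one common workaround for the backslash restriction is to compute the awkward value before the f-string. The snippet below is plain Python with no `guidance` calls, and the variable names are made up for illustration.

```python
# Works on all Python 3 versions: the backslash lives outside the f-string slot.
newline = "\n"
options = ["SEARCH", "RESPOND"]

prompt = f"Options:{newline}{newline.join(options)}"
print(prompt)

# On Python 3.11 and below, writing the backslash directly inside the braces,
# e.g. f"{'\n'.join(options)}", is rejected with a SyntaxError.
```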

## Capturing variables

Often when you are building a guidance program you will want to capture specific portions of the output generated by the model. You can do this by giving a name to the element you wish to capture.

In [6]:
query = "Who won the last Kentucky derby and by how much?"
lm = mistral + f'''\
Q: {query}
A: {gen(name="answer", stop="Q:")}'''

Then we can access the variable by indexing into the final model object.

In [7]:
lm["answer"]

"The last Kentucky Derby was held on May 1, 2021, and the winner was Medina Spirit, ridden by jockey John Velazquez. Medina Spirit won by 0.5 lengths over Mandaloun. However, it's important to note that Medina Spirit failed a drug test and the results of the race are under investigation. Therefore, the official winner may change."

## Function encapsulation

When you have a set of model operations you want to group together, you can place them into a custom guidance function. To do this, define a decorated Python function that takes a model as the first positional argument and returns a new, updated model. You can add this guidance function to a model to execute it, just as you would with built-in guidance functions such as `gen`.

In [8]:
import guidance

@guidance
def qa_bot(lm, query):
    lm += f'''\
    Q: {query}
    A: {gen(name="answer", stop="Q:")}'''
    return lm

query = "Who won the last Kentucky derby and by how much?"
mistral + qa_bot(query) # note we don't pass the `lm` arg here (that will get passed during execution when it gets added to the model)

Note that one atypical feature of guidance functions is that multi-line string literals defined inside a guidance function respect the Python indentation structure. This means that the whitespace before "Q:" and "A:" in the prompt above is stripped. If those lines were indented 6 spaces instead of 4, only the first 4 spaces would be stripped, since that is the current Python indentation level. This allows us to define multi-line templates inside guidance functions while keeping the indentation readable. If you ever want to disable this behavior, use `@guidance(dedent=False)`.
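
The stripping rule can be imitated in a few lines of plain Python. This is a sketch of the behaviour as described in the paragraph above, not `guidance`'s actual implementation; `strip_indent` is a hypothetical helper, and the indentation level is passed in by hand rather than detected.

```python
def strip_indent(text, level):
    """Remove up to `level` leading spaces from every line of `text`."""
    stripped_lines = []
    for line in text.split("\n"):
        # Count leading spaces, but never strip more than `level` of them.
        leading = len(line) - len(line.lstrip(" "))
        stripped_lines.append(line[min(leading, level):])
    return "\n".join(stripped_lines)


four = "    Q: question\n    A: answer"
six = "      Q: question\n      A: answer"

print(repr(strip_indent(four, 4)))  # all indentation removed
print(repr(strip_indent(six, 4)))   # two spaces remain on each line
```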

## Selecting among alternatives

Guidance has lots of ways to constrain model generation, but the most basic building block is the `select` function, which forces the model to choose between a set of options (either strings or full grammars).

In [9]:
from guidance import select

mistral + f'''\
Q: {query}
Now I will choose to either SEARCH the web or RESPOND.
Choice: {select(["SEARCH", "RESPOND"], name="choice")}'''

Note that guidance is smart about when tokens are forced by the program (and so don't need to be predicted by the model), so only one token was generated in the program above (the beginning of "SEARCH", which is highlighted in green).
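
To see why most of the choice is forced, consider a character-level toy version of the idea. This is a simplification under stated assumptions: `guidance` works on tokens and grammars, whereas the hypothetical `forced_suffix` helper below only compares plain strings.

```python
def forced_suffix(options, generated):
    """Return the text that is forced once `generated` has been produced.

    If every option consistent with `generated` continues the same way,
    that shared continuation does not need to be predicted.
    """
    candidates = [o for o in options if o.startswith(generated)]
    if not candidates:
        raise ValueError("generated text matches no option")
    # Longest common prefix of the remaining candidates.
    prefix = candidates[0]
    for option in candidates[1:]:
        while not option.startswith(prefix):
            prefix = prefix[:-1]
    return prefix[len(generated):]


options = ["SEARCH", "RESPOND"]
print(repr(forced_suffix(options, "")))   # nothing is forced before the first choice
print(repr(forced_suffix(options, "S")))  # after "S", the rest of "SEARCH" is forced
```

Once the first step rules out "RESPOND", the remainder of "SEARCH" can be appended without consulting the model.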

## Interleaved generation and control

Because guidance is pure Python code, you can interleave (constrained) generation commands with traditional Python control statements. In the example below we first ask the model to decide if it should search the web or respond directly, then act accordingly.

In [10]:
@guidance
def qa_bot(lm, query):
    lm += f'''\
    Q: {query}
    Now I will choose to either SEARCH the web or RESPOND.
    Choice: {select(["SEARCH", "RESPOND"], name="choice")}
    '''
    if lm["choice"] == "SEARCH":
        lm += "A: I don't know, Google it!"
    else:
        lm += f'A: {gen(stop="Q:", name="answer")}'
    return lm

mistral + qa_bot(query)
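
The calling convention used by `qa_bot`, where the function is called without `lm` and only receives it when added to a model, can be mimicked with a small decorator. The sketch below is a toy under stated assumptions: `ToyModel`, `toy_guidance`, and `qa_prompt` are hypothetical names, and the real `@guidance` decorator does considerably more than this.

```python
import functools


class ToyModel:
    """Hypothetical immutable model: holds text, returns new objects on +."""

    def __init__(self, text=""):
        self.text = text

    def __add__(self, other):
        if callable(other):
            # A deferred function: run it now, passing this model as `lm`.
            return other(self)
        return ToyModel(self.text + other)


def toy_guidance(func):
    """Hypothetical decorator: calling the function only records its arguments."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # `lm` is supplied later, when the result is added to a model.
        return lambda lm: func(lm, *args, **kwargs)

    return wrapper


@toy_guidance
def qa_prompt(lm, query):
    lm += f"Q: {query}\nA:"
    return lm


result = ToyModel("System prompt. ") + qa_prompt("What is 2 + 2?")
print(result.text)
```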

## Generating lists

Whenever you want to generate a list of items, you can use the `list_append` parameter, which causes the captured value to be appended to a list instead of overwriting previous values.

In [11]:
lm = mistral + f'''\
Q: {query}
Now I will choose to either SEARCH the web or RESPOND.
Choice: {select(["SEARCH", "RESPOND"], name="choice")}
'''
if lm["choice"] == "SEARCH":
    lm += "Here are 3 search queries:\n"
    for i in range(3):
        lm += f'''{i+1}. "{gen(stop='"', name="queries", temperature=1.0, list_append=True)}"\n'''

In [12]:
lm["queries"]

['last Kentucky derby winner',
 'latest Kentucky derby results',
 'Kentucky derby winner 2021']
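
The difference between overwriting and appending can be sketched with an ordinary dictionary. The `capture` helper below is hypothetical and only mirrors the behaviour described for `list_append`; it says nothing about how `guidance` stores captures internally.

```python
def capture(variables, name, value, list_append=False):
    """Return a new dict of captured variables with `value` stored under `name`."""
    updated = dict(variables)
    if list_append:
        # Append to the existing list (or start one) instead of overwriting.
        updated[name] = list(updated.get(name, [])) + [value]
    else:
        updated[name] = value
    return updated


state = {}
for query_text in ["first query", "second query", "third query"]:
    state = capture(state, "queries", query_text, list_append=True)
print(state["queries"])  # all three values are kept

overwritten = {}
for query_text in ["first query", "second query", "third query"]:
    overwritten = capture(overwritten, "queries", query_text)
print(overwritten["queries"])  # only the last value survives
```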

## Chat

You can control chat models using special `with` context blocks that wrap whatever is inside them with the special formats needed for the chat model you are using. This allows you to express chat programs without tying yourself to a single model backend.

In [13]:
# to use role based chat tags you need a chat model, here we use gpt-3.5-turbo but you can use 'gpt-4' as well
gpt35 = models.OpenAI("gpt-3.5-turbo")

In [14]:
from guidance import system, user, assistant

with system():
    lm = gpt35 + "You are a helpful assistant."

with user():
    lm += "What is the meaning of life?"

with assistant():
    lm += gen("response")
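
The wrapping performed by a role block can be imitated with a standard-library context manager. This is a toy sketch: the `role` helper and the `<|...|>` tags are invented for illustration, since the real tags depend on the chat model in use.

```python
from contextlib import contextmanager

transcript = []  # accumulated text pieces, in order


@contextmanager
def role(name):
    """Hypothetical role block: emits made-up opening and closing tags."""
    transcript.append(f"<|{name}|>")
    try:
        yield
    finally:
        transcript.append(f"<|end_{name}|>")


with role("system"):
    transcript.append("You are a helpful assistant.")

with role("user"):
    transcript.append("What is the meaning of life?")

print("".join(transcript))
```

Because the tags live inside the context manager, the body of each block stays the same when the tag format changes.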

### Multistep

In [15]:
# you can create and guide multi-turn conversations by using a series of role tags
@guidance
def experts(lm, query):
    with system():
        lm += "You are a helpful assistant."

    with user():
        lm += f"""\
        I want a response to the following question:
        {query}
        Who are 3 world-class experts (past or present) who would be great at answering this?
        Please don't answer the question or comment on it yet."""

    with assistant():
        lm += gen(name='experts', max_tokens=300)
    
    with user():
        lm += f"""\
        Great, now please answer the question as if these experts had collaborated in writing a joint anonymous answer.
        In other words, their identity is not revealed, nor is the fact that there is a panel of experts answering the question.
        If the experts would disagree, just present their different positions as alternatives in the answer itself (e.g. 'some might argue... others might argue...').
        Please start your answer with ANSWER:"""
    
    with assistant():
        lm += gen(name='answer', max_tokens=500)

    return lm
                   
gpt35 + experts(query='What is the meaning of life?')

## Streaming

Often you want to get the results of a generation as it is happening so that you can update an interface. You can do this programmatically using the `.stream()` method of model objects. This creates a `ModelStream` that you can use to accumulate updates. These updates don't get executed until you iterate over the `ModelStream` object. When you iterate over the object, you get a series of partially completed model objects as the guidance program is executed.

In [16]:
for part in mistral.stream() + qa_bot(query):
    part # do something with the partially executed lm
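
The "nothing runs until you iterate" behaviour is the same laziness that Python generators have. The sketch below uses a hypothetical `toy_stream` generator to show the pattern; it is an analogy and does not reflect how `ModelStream` is implemented.

```python
def toy_stream(prompt, pieces, log):
    """Hypothetical lazy stream: yields the growing text one piece at a time."""
    text = prompt
    for piece in pieces:
        log.append(piece)  # record that this step actually ran
        text += piece
        yield text


log = []
stream = toy_stream("A:", [" one", " two", " three"], log)
print(log)  # nothing has run yet because generators are lazy

partials = list(stream)  # iterating drives the execution
print(partials)
print(log)
```

Each yielded value is a complete snapshot of the text so far, which is convenient for redrawing an interface on every update.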
