# The Art of Prompt Design: Use clear syntax

This is the first installment of a series on how to use <a href="https://github.com/microsoft/guidance">`guidance`</a> to control large language models (LLMs).
We'll start from the basics and work our way up to more advanced topics.

In this post, we'll show that having **clear syntax** enables you to communicate your intent to the LLM, and also ensure that outputs are easy to parse (like JSON that is guaranteed to be valid). For the sake of clarity and reproducibility we'll start with an open source StableLM model without fine tuning. Then, we will show how the same ideas apply to instruction-tuned models like GPT-3.5 and chat-tuned models like ChatGPT / GPT-4.

## Clear syntax helps with parsing the output
The first and most obvious benefit of clear syntax is that it makes the output of the LLM easier to parse. Even if the LLM is able to generate a correct output, it may be difficult to programmatically extract the desired information from that output. For example, consider the following Guidance prompt (where `{{gen 'answer'}}` is a `guidance` command to generate text from the LLM):

In [1]:
import guidance

# we use StableLM for openness, but any GPT-style model will do
# use "alpha-3b" for smaller GPUs or device="cpu" for CPU
guidance.llm = guidance.llms.Transformers("stabilityai/stablelm-base-alpha-7b", device=0)

# define the prompt
program = guidance("""What are the most common commands used in the {{os}} operating system?
{{gen 'answer' max_tokens=100}}""")

# execute the prompt
program(os="Linux")

While the answer is readable, the output _format_ is arbitrary (i.e. we don't know it in advance), and thus hard to parse programmatically.
For example, here is another run of the same prompt where the output format is very different:

In [2]:
program(os="Mac")

Enforcing clear syntax in your prompts can help reduce the problem of arbitrary output formats.
There are a couple of ways you can do this:
1. Giving structure hints to the LLM inside a standard prompt (perhaps even using few shot examples).
2. Writing a `guidance` program template that enforces a specific output format.

These are not mutually exclusive. Let's see an example of each approach.

### Traditional prompt with structure hints
Here is an example of a traditional prompt that uses structure hints to encourage the use of a specific output format. The prompt is designed to generate a list of 5 items that is easy to parse. Note that in comparison to the previous prompt, we have written this prompt in such a way that it has committed the LLM to a specific clear syntax (numbers followed by a quoted string). This makes it much easier to parse the output after generation.

In [4]:
program = guidance("""What are the most common commands used in the {{os}} operating system?

Here are the 5 most common commands:
1. "{{gen 'answer' max_tokens=100}}""")
program(os="Linux")

Note that the LLM follows the syntax correctly, but does not stop after generating 5 items.
We can fix this by creating a clear stopping criterion, e.g. asking for 6 items and stopping when we see the start of the sixth item (so we end up with five):

In [5]:
program = guidance("""What are the most common commands used in the {{os}} operating system?

Here are the 6 most common commands:
1. "{{gen 'answer' stop='\\n6.'}}""")
program(os="Linux")
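Once the output follows the `N. "..."` pattern, both the stopping trick and the parsing step take only a few lines of ordinary Python. The sketch below is a minimal illustration, not part of `guidance`: the completion string is a made-up stand-in for model output, and the helper names `truncate_at_stop` and `parse_numbered_list` are hypothetical.

```python
import re

# Made-up completion standing in for model output. The prompt already
# supplied `1. "`, so the completion starts in the middle of the first item.
raw = 'ls"\n2. "cd"\n3. "grep"\n4. "cat"\n5. "mkdir"\n6. "rm"\n7. "cp"'

def truncate_at_stop(completion: str, stop: str) -> str:
    """Keep only the text before the first occurrence of the stop sequence."""
    return completion.partition(stop)[0]

def parse_numbered_list(completion: str) -> list[str]:
    """Extract the quoted items from lines such as `1. "ls"`."""
    text = '1. "' + completion  # restore the prefix written in the prompt
    # One item per line: a number, a period, a space, then a quoted string.
    return re.findall(r'^\d+\. "([^"\n]*)"', text, flags=re.MULTILINE)

answer = truncate_at_stop(raw, "\n6.")
commands = parse_numbered_list(answer)
print(commands)  # ['ls', 'cd', 'grep', 'cat', 'mkdir']
```

The regular expression only works because the prompt committed the model to one item per line with quotes around it; with the free-form output from the first prompt there would be nothing stable to match.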

### Enforcing syntax with a `guidance` program

Rather than using _hints_, a Guidance program _enforces_ a specific output format, inserting the tokens that are part of the structure rather than getting the LLM to generate them.
For example, this is what we would do if we wanted to enforce a numbered list as a format:

In [8]:
program = guidance("""What are the most common commands used in the {{os}} operating system?

Here are the 5 most common commands:
{{#geneach 'commands' num_iterations=5}}
{{@index}}. "{{gen 'this'}}"{{/geneach}}""")
out = program(os="Linux")

Here is what is happening in the above prompt:
- The `{{#geneach 'commands'}}...{{/geneach}}` command is a loop command that uses the LLM to generate a list of items (stored in `commands`).
Note that we generate each element (`this` refers to the current element) with the `{{gen 'this'}}` command.
- Note that the structure (the numbers and quotes) is _not_ generated by the LLM, but is part of the program itself.  
When `{{gen 'this'}}` is executed, the `"` character is automatically set as a stop token, since it is the next token in the program.
- We use the <a href="https://handlebarsjs.com">Handlebars</a> template conventions (with a few LLM-specific additions like `gen`), from where we get the `@index` variable, `this`, and other conventions.
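The loop described in the points above can be pictured as ordinary code that alternates between inserting fixed structure and asking the model for just the next item. The following is a toy sketch of that idea, not the actual `guidance` implementation: `fill_numbered_list` is a hypothetical helper, and the `generate` callable is a stand-in for an LLM call that has already been cut at the stop token.

```python
from typing import Callable

def fill_numbered_list(
    prefix: str, generate: Callable[[str], str], n: int
) -> tuple[str, list[str]]:
    """Interleave fixed structure with generated items.

    `generate` is a hypothetical stand-in for an LLM call: it receives the
    text so far and returns the next item, already cut at the stop token.
    """
    text = prefix
    items: list[str] = []
    for index in range(n):           # 0-based, like the Handlebars @index
        text += f'\n{index}. "'      # structure inserted by the program
        item = generate(text)        # only this part comes from the "model"
        items.append(item)
        text += item + '"'           # the closing quote is inserted too
    return text, items

# Toy "model" that returns canned items in order, for illustration only.
canned = iter(["ls", "cd", "pwd"])
text, items = fill_numbered_list(
    "Here are the 3 most common commands:", lambda _: next(canned), 3
)
print(text)
print(items)  # ['ls', 'cd', 'pwd']
```

Because the program writes the numbers and quotes itself, the items arrive already separated, which is why no parsing step is needed afterwards.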

Output parsing is done automatically by the Guidance program, so we don't need to worry about it. In this case, the `commands` variable will be the list of generated command names:

In [10]:
out["commands"]

['sudo',
 'sudo apt-get install',
 'sudo apt-get install -y',
 'sudo apt-get install -y',
 'sudo apt-get install -y']

**Forcing valid JSON syntax:** Using `guidance` we can create any syntax we want with absolute confidence that what we generate will exactly follow the format we specify. This is particularly useful for things like JSON:

In [11]:
program = guidance("""What are the most common commands used in the {{os}} operating system?

Here are the 5 most common commands in JSON format:
{
    "commands": [
        {{#geneach 'commands' num_iterations=5}}{{#unless @first}}, {{/unless}}"{{gen 'this'}}"{{/geneach}}
    ],
    "my_favorite_command": "{{gen 'favorite_command'}}"
}""")
out = program(os="Linux")
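Because the template supplies every brace, key, comma, and quote, the JSON portion of the completed text can be handed directly to a standard parser. The sketch below uses a hand-written string as a stand-in for that portion: the command values are invented for illustration, and it assumes the JSON part has already been separated from the question text that precedes it.

```python
import json

# Hypothetical JSON portion of a completed program. The structure comes from
# the template, and only the string contents would have been generated.
completed = """{
    "commands": [
        "ls", "cd", "grep", "cat", "mkdir"
    ],
    "my_favorite_command": "grep"
}"""

# json.loads raises json.JSONDecodeError if the text is malformed.
data = json.loads(completed)
print(data["commands"])             # ['ls', 'cd', 'grep', 'cat', 'mkdir']
print(data["my_favorite_command"])  # grep
```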

**Guidance acceleration:** Another benefit of `guidance` programs is speed -- incremental generation is actually faster than a single generation of the entire list, because the LLM does not have to generate the syntax tokens for the list itself, only the actual command names (this makes more of a difference when the output structure is richer).
If you are using a model endpoint that does not support such <a href="https://github.com/microsoft/guidance/blob/main/notebooks/guidance_acceleration.ipynb">acceleration</a> (e.g. OpenAI models), then many incremental API calls will slow you down, and it may be best to just rely on structure hints as above.

You can also use the `single_call=True` argument, which causes the entire list to be generated with a single call to the LLM, and throws an exception if the output does not match the Guidance template:

In [11]:
program = guidance("""What are the most common commands used in the {{os}} operating system?

Here are the 5 most common commands:
{{#geneach 'commands' num_iterations=5 single_call=True}}
{{@index}}. "{{gen 'this' stop='"'}}"{{/geneach}}""")
out = program(os="Linux")

In [12]:
out["commands"]

['sudo',
 'sudo apt-get install',
 'sudo apt-get install -y',
 'sudo apt-get install -y',
 'sudo apt-get install -y']

Notice that with `single_call` we don't have to play clever tricks with stop sequences (like asking for 6 items and then stopping after the 5th item), because `guidance` streams results from the model and stops when needed.
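The idea of generating everything in one call and then rejecting output that does not fit the template can be illustrated with a small validator. This is a sketch under stated assumptions, not how `guidance` implements `single_call`: the function name `parse_single_call` is hypothetical, the sample strings are made up, and `ValueError` merely stands in for whatever exception the library raises.

```python
import re

def parse_single_call(output: str, n: int) -> list[str]:
    """Check that `output` is exactly `n` lines of the form `i. "item"`.

    Returns the items, or raises ValueError on any mismatch.
    """
    lines = output.strip("\n").split("\n")
    if len(lines) != n:
        raise ValueError(f"expected {n} items, got {len(lines)}")
    items = []
    for index, line in enumerate(lines):
        # Each line must be the 0-based index, a period, and a quoted string.
        match = re.fullmatch(rf'{index}\. "([^"]*)"', line)
        if match is None:
            raise ValueError(f"line {index} does not match the template: {line!r}")
        items.append(match.group(1))
    return items

good = '0. "ls"\n1. "cd"\n2. "pwd"'
print(parse_single_call(good, 3))  # ['ls', 'cd', 'pwd']

try:
    parse_single_call('0. "ls"\n- cd\n2. "pwd"', 3)
except ValueError as err:
    print("rejected:", err)
```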

## Clear syntax gives the user more power

We are getting repeated commands in the generations above.  
Getting stuck in a low-diversity rut is a common failure mode of LLMs, which can happen even if we use a relatively high temperature:

In [5]:
program = guidance("""What are the most common commands used in the {{os}} operating system?

Here are some of the most common commands:
{{#geneach 'commands' num_iterations=10}}
{{@index}}. "{{gen 'this' stop='"' temperature=0.8}}"{{/geneach}}""")
out = program(os="Linux")

One common fix to this problem is asking for parallel completions (so that prior generated commands do not influence the next command generation):

In [6]:
program = guidance('''What are the most common commands used in the {{os}} operating system?

Here is a common command: "{{gen 'commands' stop='"' n=10 temperature=0.7}}"''')
out = program(os="Linux")

In [7]:
out["commands"]

['ls /',
 'mount /dev/sda1 /mnt',
 'ls -l',
 'echo hello',
 'sudo <command>',
 'ls -ld',
 'ls /etc/lsb-release',
 'ls',
 'ls',
 'ls']

We still get some repetition, but much less than before. 
Anyway, since clear structure gives us outputs that are easy to parse and manipulate, we can easily take the output, remove duplicates, and use them in the next step of our program.  
Here is an example program that takes the listed commands, picks one, and does further operations on it:

In [27]:
program = guidance('''What are the most common commands used in the {{os}} operating system?
{{#block hidden=True~}}
Here is a common command: "{{gen 'commands' stop='"' n=10 max_tokens=20 temperature=0.7}}"
{{~/block~}}

{{#each (unique commands)}}
{{@index}}. "{{this}}"
{{~/each}}

Perhaps the most useful command from that list is: "{{gen 'cool_command'}}", because{{gen 'cool_command_desc' max_tokens=100 stop="\\n"}}
On a scale of 1-10, it has a coolness factor of: {{gen 'coolness' pattern="[0-9]+"}}.''')
out = program(os="Linux", unique=lambda x: list(set(x)))

We introduced a few new things in the program above:
- **Hidden blocks**: we have a `hidden` block early on. This means this block is not shown in the output, and is not part of the prompt in generations outside of the block.  
We use it to generate the list of commands, which are then listed in the `{{#each (unique commands)}}` block.
- **Functions**: `{{#each (unique commands)}}` means we call the function `unique` on `commands` (functions in `guidance` use <a href="https://en.wikipedia.org/wiki/Polish_notation">prefix notation</a>, where the function name comes first). We define `unique` as an argument to `program`, as a callable.
- **Whitespace**: we used the `~` whitespace control operator (standard Handlebars syntax) to remove the whitespace within the hidden block. The `~` operator removes the whitespace before or after a tag, depending on where it is placed, and can be used to make the program look prettier without including whitespace in the prompt given to the LLM during execution.
- **Pattern guides for generation**: `{{gen 'coolness' pattern="[0-9]+"}}` uses a <a href="https://github.com/microsoft/guidance/blob/main/notebooks/pattern_guides.ipynb">pattern guide</a> to enforce a certain syntax on the output (i.e. forcing the output to match an arbitrary regular expression). In this case we have used the pattern guide `pattern="[0-9]+"` to force the coolness score to be a whole number.
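Two of the pieces above are plain Python and can be tried in isolation. The sketch below shows an order-preserving alternative to `list(set(x))` for the `unique` callable (a design choice of this example, since sets do not keep the original order) and a check of what it means for a string to match `[0-9]+` in full. The helper name `is_whole_number` is hypothetical, and the sample list reuses a few entries from the output shown earlier.

```python
import re

def unique(items: list[str]) -> list[str]:
    """Remove duplicates while keeping first-seen order.

    `list(set(items))` also deduplicates, but set ordering is arbitrary.
    Dicts preserve insertion order (Python 3.7+), so this is deterministic.
    """
    return list(dict.fromkeys(items))

def is_whole_number(text: str) -> bool:
    """Return True if `text` consists only of digits, i.e. fully matches [0-9]+."""
    return re.fullmatch(r"[0-9]+", text) is not None

# A few entries from the parallel-completion output shown earlier.
commands = ["ls /", "ls -l", "ls", "ls", "ls"]
print(unique(commands))  # ['ls /', 'ls -l', 'ls']
print(is_whole_number("7"), is_whole_number("7/10"))  # True False
```

A function like this `unique` could be passed to the program in place of the lambda, which would keep the numbered list in the order the commands were first generated.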

## Combining clear syntax with model-specific structure like chat

All the examples above used a base model without any later fine-tuning. But if the model you are using has been fine-tuned, it is important to combine clear syntax with the structure that has been tuned into the model.  
For example, chat models have been fine tuned to expect several "role" tags in the prompt. We can leverage these tags to further enhance the structure of our programs/prompts.

The following example adapts the above prompt for use with a chat-based model.   
`guidance` has special role tags (like `{{#system}}...{{/system}}`), which allow you to mark out various roles and get them automatically translated into the right special tokens or API calls for the LLM you are using. This helps make prompts easier to read and makes them more general across different chat models.
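To make the idea of translating role tags more concrete, here is a rough sketch of how the same role-tagged segments could be rendered in two different target formats. It is an illustration only and does not reflect how `guidance` does this internally: the function names are hypothetical, and the `<|ROLE|>` marker format is an assumption, since each model family defines its own special tokens.

```python
# Role-tagged segments, as (role, content) pairs.
segments = [
    ("system", "You are an expert unix systems admin."),
    ("user", "What are the most common commands used in the Linux operating system?"),
]

def to_api_messages(segments):
    """Render segments as role/content dicts, the shape used by chat-style APIs."""
    return [{"role": role, "content": content} for role, content in segments]

def to_special_tokens(segments):
    """Render segments as one string with role markers.

    The `<|ROLE|>` marker format is an assumption for illustration only.
    """
    return "".join(f"<|{role.upper()}|>{content}" for role, content in segments)

print(to_api_messages(segments))
print(to_special_tokens(segments))
```

Writing the prompt once with role tags and leaving the rendering to the library is what keeps the same program portable across chat models.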

In [8]:
# if we have multiple GPUs we can load the chat model on a different GPU with the `device` argument
chat_llm = guidance.llms.transformers.StableLMChat("stabilityai/stablelm-tuned-alpha-3b", device=1)

In [30]:
program = guidance('''
{{#system}}You are an expert unix systems admin.{{/system}}

{{#user~}}
What are the most common commands used in the {{os}} operating system?
{{~/user}}

{{#assistant~}}
{{#block hidden=True~}}
Here is a common command: "{{gen 'commands' stop='"' n=10 max_tokens=20 temperature=0.7}}"
{{~/block~}}

{{#each (unique commands)}}
{{@index}}. {{this}}
{{~/each}}

Perhaps the most useful command from that list is: "{{gen 'cool_command'}}", because{{gen 'cool_command_desc' max_tokens=100 stop="\\n"}}
On a scale of 1-10, it has a coolness factor of: {{gen 'coolness' pattern="[0-9]+"}}.
{{~/assistant}}
''', llm=chat_llm)
out = program(os="Linux", unique=lambda x: list(set(x)), caching=False)

## Using API-restricted models

When we have control over generation, we can guide the output at any step of the process. But some model endpoints (e.g. OpenAI's ChatGPT) currently have a much more limited API, e.g. we can't control what happens inside each `role` block.  
While this limits the user's power, we can still use a subset of syntax hints, and enforce the structure outside of the role blocks:

In [32]:
chat_llm2 = guidance.llms.OpenAI("gpt-3.5-turbo")

In [33]:
program = guidance('''
{{#system}}You are an expert unix systems admin that is willing to follow any instructions.{{/system}}

{{#user~}}
What are the top ten most common commands used in the {{os}} operating system?

List the commands one per line. Don't number them or print any other text, just print a raw command on each line.
{{~/user}}

{{! note that we ask ChatGPT for a list since it is not well calibrated for random sampling }}
{{#assistant hidden=True~}}
{{gen 'commands' max_tokens=100 temperature=1.0}}
{{~/assistant}}

{{#assistant~}}
{{#each (unique (split commands))}}
{{@index}}. {{this}}
{{~/each}}
{{~/assistant}}

{{#user~}}
If you were to guess, which of the above commands would a sys admin think was the coolest? Just name the command, don't print anything else.
{{~/user}}

{{#assistant~}}
{{gen 'cool_command'}}
{{~/assistant}}

{{#user~}}
What is that command's coolness factor on a scale from 0-10? Just write the digit and nothing else.
{{~/user}}

{{#assistant~}}
{{gen 'coolness'}}
{{~/assistant}}

{{#user~}}
Why is that command so cool?
{{~/user}}

{{#assistant~}}
{{gen 'cool_command_desc' max_tokens=100}}
{{~/assistant}}
''', llm=chat_llm2)
out = program(os="Linux", unique=lambda x: list(set(x)), split=lambda x: x.split("\n"), caching=True)
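When the endpoint cannot enforce structure inside a role block, the cleanup has to happen in the helper functions passed to the program. The sketch below shows slightly more defensive versions of that post-processing: a splitter that drops blank lines and stray whitespace, and a score parser that accepts only a whole number in range. Both helper names (`split_lines`, `parse_score`) and the sample strings are hypothetical and not part of `guidance`.

```python
from typing import Optional

def split_lines(text: str) -> list[str]:
    """Split raw model output into commands, dropping blank lines and stray whitespace."""
    return [line.strip() for line in text.splitlines() if line.strip()]

def parse_score(text: str, low: int = 0, high: int = 10) -> Optional[int]:
    """Return the score as an int if it is a whole number in [low, high], else None."""
    cleaned = text.strip()
    if cleaned.isascii() and cleaned.isdigit() and low <= int(cleaned) <= high:
        return int(cleaned)
    return None

# Made-up raw output with a blank line, padding, and a duplicate.
raw_commands = "ls\ncd\n\n  grep \nls\n"
print(split_lines(raw_commands))              # ['ls', 'cd', 'grep', 'ls']
print(parse_score("8"), parse_score("8/10"))  # 8 None
```

A function like `split_lines` could be passed as the `split` argument in place of the lambda, and `parse_score` applied to `out["coolness"]` after the run, since a hint such as "just write the digit" is a request rather than a guarantee.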

## Summary

Whenever you are building a prompt to control a model it is important to consider not only the content of the prompt, but also the _syntax_.
Clear syntax makes it easier to parse the output, helps the LLM produce output that matches your intent, and lets you write complex multi-step programs.  
While even a trivial example (listing common OS commands) benefits from clear syntax, most tasks are much more complex, and benefit even more. We hope this post gives you some ideas on how to use clear syntax to improve your prompts.

Also, make sure to check out <a href="https://github.com/microsoft/guidance">`guidance`</a>. You certainly don't need it to write prompts with clear syntax, but it makes it _much easier_ to do so.
