# Prompting

In this notebook, we'll prompt a language model to perform some real-world tasks. We'll extract entities from text and do some classification.

First, let's make sure our libraries are up to date:

In [None]:
!pip install -U transformers
!pip install git+https://github.com/guidance-ai/guidance

Collecting transformers
  Downloading transformers-4.46.3-py3-none-any.whl.metadata (44 kB)
Downloading transformers-4.46.3-py3-none-any.whl (10.0 MB)
Installing collected packages: transformers
  Attempting uninstall: transformers
    Found existing installation: transformers 4.46.2
    Uninstalling transformers-4.46.2:
      Successfully uninstalled transformers-4.46.2
Successfully installed transformers-4.46.3
Collecting git+https://github.com/guidance-ai/guidance
  Cloning https://github.com/guidance-ai/guidance to /tmp/pip-req-build-x08cpniz
  Running command git clone --filter=blob:none --quiet https://github.com/guidance-ai/guidance /tmp/pip-req-build-x08cpniz
  Resolved https://github.com/guidance-ai/guidance to commit 2f114670904119be93

We can now load a chat model. In this demo, we're going to use TinyLlama, as it loads quickly and runs within Colab. For better performance, you can experiment with running larger models.

In [2]:
from transformers import pipeline

model_name = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
pipe = pipeline("text-generation", model_name, device_map="auto")

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.



## Information Extraction

Let's get our LLM to find all the problems (diseases or symptoms), tests run (e.g. X-rays), and treatments given (e.g. drugs) in some veterinary notes.

We can write a prompt which describes our problem and how we want the model to answer:

In [3]:
MAIN_PROMPT = """\
You are a smart and intelligent Named Entity Recognition (NER) system.

Entity Definition:
1. PROBLEM: Any disease, syndrome, or symptom.
2. TREATMENT: Any medical care given to fix a problem.
3. TEST: Any diagnostic test used to investigate a problem.

Output Format:
{"PROBLEM": [list of entities present], "TREATMENT": [list of entities present], "TEST": [list of entities present]}
If no entities are present for a category, output an empty list for that category.
"""

Then let's turn this into a sequence of messages.

- We'll include the prompt we defined above.
- We then include a few example input and output pairs.
- Finally, we provide the input we wish the model to solve.

In [4]:
input_sentence = "Archie is a 10-year-old cat with a broken leg. He currently receives 3 units of ProZinc insulin."

messages = [
     {"role": "system", "content": MAIN_PROMPT},
     {"role": "user", "content": "My dog developed lumps on her skin. This has been diagnosed as keratoacanthomas and treated with anti-itch medication"},
     {"role": "assistant", "content": '{"PROBLEM": ["lumps on her skin", "keratoacanthomas"], "TREATMENT": ["anti-itch medication"], "TEST": []}'},
     {"role": "user", "content": "Jess has been sneezing for 2 months or more. Today we took a nasal scope and CT. Placed on a week of Clavamox."},
     {"role": "assistant", "content": '{"PROBLEM": ["sneezing"], "TREATMENT": ["Clavamox"], "TEST": ["nasal scope", "CT"]}'},
     {"role": "user", "content": input_sentence}
]

output = pipe(messages)[0]["generated_text"][-1]
print(output["content"])

{"PROBLEM": ["broken leg"], "TREATMENT": ["ProZinc insulin"], "TEST": []}


Because the model's output is in standard JSON format, we can parse the result as data:

In [5]:
import json
output_data = json.loads(output["content"])

print("Problems", output_data["PROBLEM"])
print("Treatments", output_data["TREATMENT"])
print("Tests", output_data["TEST"])

Problems ['broken leg']
Treatments ['ProZinc insulin']
Tests []
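
Small models do not always return valid JSON, in which case `json.loads` raises an exception. Below is a minimal sketch of a more defensive parser. The helper name `parse_entities` and the choice to fall back to empty lists are illustrative assumptions, not part of the original notebook.

```python
import json

EXPECTED_KEYS = ("PROBLEM", "TREATMENT", "TEST")

def parse_entities(raw_text):
    """Parse the model's reply into a dict with one list of strings per entity type.

    Hypothetical helper: returns empty lists when the reply is not usable JSON.
    """
    empty = {key: [] for key in EXPECTED_KEYS}
    try:
        data = json.loads(raw_text)
    except json.JSONDecodeError:
        return empty
    if not isinstance(data, dict):
        return empty
    result = {}
    for key in EXPECTED_KEYS:
        value = data.get(key, [])
        # Keep only string entries; anything that is not a list counts as missing
        result[key] = [v for v in value if isinstance(v, str)] if isinstance(value, list) else []
    return result

print(parse_entities('{"PROBLEM": ["broken leg"], "TREATMENT": ["ProZinc insulin"], "TEST": []}'))
print(parse_entities("Sorry, I am not sure."))
```

In the notebook you would call `parse_entities(output["content"])` instead of `json.loads(output["content"])`, so a malformed reply yields empty lists rather than a crash.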


These could then be saved to a file or visualized, e.g.:

In [6]:
from spacy import displacy

ents = []
for category in output_data:
  for entity in output_data[category]:
    # Locate the entity in the input; skip it if the model's text doesn't appear verbatim
    start = input_sentence.find(entity)
    if start == -1:
      continue
    ents.append({"start": start, "end": start + len(entity), "label": category})

doc = {
    "text": input_sentence,
    "ents": ents
}

# Color keys are written in upper case so they match the entity labels above
colors = {"PROBLEM": "#ffbbbb", "TEST": "#bbbbff", "TREATMENT": "#bbffbb"}
displacy.render(doc, style="ent", manual=True, jupyter=True, options={"colors": colors})

Try changing the sentence and prompt and see what the model produces.
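
To make that experimentation easier, the message list can be assembled by a small helper. This is only a sketch: the name `build_messages` and the `(user_text, assistant_text)` tuple format are assumptions made for illustration.

```python
def build_messages(system_prompt, examples, query):
    """Build a chat message list: system prompt, few-shot pairs, then the query.

    `examples` is a list of (user_text, assistant_text) tuples (hypothetical format).
    """
    messages = [{"role": "system", "content": system_prompt}]
    for user_text, assistant_text in examples:
        messages.append({"role": "user", "content": user_text})
        messages.append({"role": "assistant", "content": assistant_text})
    messages.append({"role": "user", "content": query})
    return messages

demo_examples = [
    ("Jess has been sneezing for 2 months or more. Today we took a nasal scope and CT. Placed on a week of Clavamox.",
     '{"PROBLEM": ["sneezing"], "TREATMENT": ["Clavamox"], "TEST": ["nasal scope", "CT"]}'),
]
demo_messages = build_messages(
    "You are a Named Entity Recognition (NER) system.",
    demo_examples,
    "Archie is a 10-year-old cat with a broken leg.",
)
for message in demo_messages:
    print(message["role"], "->", message["content"][:40])
```

The resulting list can be passed to `pipe(...)` exactly like the hand-written `messages` list used earlier.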

## Under the hood

The above example used the [chat templating feature](https://huggingface.co/docs/transformers/main/en/chat_templating) of the transformers library.

Behind the scenes, this is turned into a single long input for the model, which includes special tokens indicating who is "speaking" in the chat dialogue.

For example, the sequence of messages:

In [7]:
messages = [
   {"role": "system", "content": "You are a helpful chatbot."},
   {"role": "user", "content": "Hello, how are you?"},
   {"role": "assistant", "content": "I'm doing great. How can I help you today?"},
   {"role": "user", "content": "I'd like to show off how chat templating works!"},
]

will be turned into the following prompt under the hood:

In [8]:
print(pipe.tokenizer.apply_chat_template(messages, tokenize=False))

<|system|>
You are a helpful chatbot.</s>
<|user|>
Hello, how are you?</s>
<|assistant|>
I'm doing great. How can I help you today?</s>
<|user|>
I'd like to show off how chat templating works!</s>



Notice how the special tokens `<|system|>`, `<|user|>`, `<|assistant|>`, and `</s>` are added between each round of dialogue.

Each LLM (Large Language Model) is trained using a different format, so these special tokens are model-specific. The chat templating feature hides this away for us so we don't have to remember which tokens to use.

This isn't yet supported for all LLMs in the transformers library, however, so sometimes you may need to construct the above prompt by hand.
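
As a rough sketch of what constructing the prompt by hand could look like, the function below reproduces the TinyLlama layout printed above. The name `build_prompt` and its parameters are hypothetical, and the exact whitespace is inferred from that printed output, so compare the result against `apply_chat_template` before relying on it for another model.

```python
def build_prompt(messages, eos_token="</s>", add_generation_prompt=False):
    """Format chat messages in the <|role|> ... </s> layout shown above."""
    parts = []
    for message in messages:
        # Each turn: role marker on its own line, then the content and end-of-sequence token
        parts.append(f"<|{message['role']}|>\n{message['content']}{eos_token}\n")
    if add_generation_prompt:
        # Cue the model to answer as the assistant
        parts.append("<|assistant|>\n")
    return "".join(parts)

demo_chat = [
    {"role": "system", "content": "You are a helpful chatbot."},
    {"role": "user", "content": "Hello, how are you?"},
]
print(build_prompt(demo_chat, add_generation_prompt=True))
```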

## Classification

Let's try another example, this time predicting the sex of the animal as one of three categories:

*   Male
*   Female
*   Unknown

In [9]:
test_sentences = [
    {
        "sentence": "The Owner brought his dog into the surgery yesterday and mentioned a history with diabetes.",
        "label": "unknown"
    },
    {
        "sentence": "I saw a 5yo cat with a broken leg. She didn't show any improvement since her last visit.",
        "label": "female"
    },
    {
        "sentence": "9yo F/N cat",
        "label": "female"
    },
    {
        "sentence": "Shorthair was brought into my clinic the other day, and is the most beautiful boy!",
        "label": "male"
    },
]

Write a prompt which can classify the sex of the animal (not the owner!):




In [10]:
MAIN_PROMPT = """\

You are a smart sex classifying vet. Classify the following text as male, female, or unknown.

"""

def classify_sex(sentence):
  messages = [
    {"role": "system", "content": MAIN_PROMPT},
    {"role": "user", "content": f"Sentence: {sentence}"}
  ]
  return pipe(messages)[0]["generated_text"][-1]["content"].lower()

In [None]:
print(classify_sex("I have been working with a 10-year-old diabetic cat. He is treated with 3 units of ProZinc insulin."))

Try it out on the examples:

In [None]:
for sentence in test_sentences:
  output = classify_sex(sentence["sentence"])
  print("Input:           ", sentence["sentence"])
  print("Label:           ", sentence["label"])
  print("Model Prediction:", output.replace("\n", " \\n"))
  prediction = "female" if "female" in output else "male" if "male" in output else "unknown"
  print("Verdict          :", "✅" if prediction == sentence["label"].lower() else "❌")
  print()

Input:            The Owner brought his dog into the surgery yesterday and mentioned a history with diabetes.
Label:            unknown
Model Prediction: sentence: the owner brought his dog into the surgery yesterday and mentioned a history with diabetes.
Verdict          : ✅

Input:            I saw a 5yo cat with a broken leg. She didn't show any improvement since her last visit.
Label:            female
Model Prediction: sentence: i saw a 5-year-old cat with a broken leg. she did not show any improvement since her last visit.
Verdict          : ❌

Input:            9yo F/N cat
Label:            female
Model Prediction: sentence: 9yo f/n cat
Verdict          : ❌

Input:            Shorthair was brought into my clinic the other day, and is the most beautiful boy!
Label:            male
Model Prediction: sentence: shorthair was brought into my clinic the other day, and is the most beautiful boy!
Verdict          : ❌



Did the model get them all right? Prompting is hard and small changes make a big difference to the output. Try modifying your phrasing or adding more examples.
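
When comparing prompts, it helps to reduce each run to a single number. The sketch below extracts a label from free-form model output and computes accuracy. The names `extract_label` and `accuracy` are illustrative, and the rule differs slightly from the substring check in the loop above: a reply that mentions both "male" and "female" is counted as "unknown".

```python
import re

def extract_label(model_output):
    """Return 'male' or 'female' if exactly one appears as a whole word, else 'unknown'."""
    matches = set(re.findall(r"\b(male|female)\b", model_output.lower()))
    if len(matches) == 1:
        return matches.pop()
    return "unknown"

def accuracy(predictions, labels):
    """Fraction of predictions that equal the corresponding label."""
    correct = sum(p == l for p, l in zip(predictions, labels))
    return correct / len(labels)

demo_outputs = ["Female.", "The cat is male", "could be male or female", "sentence: 9yo f/n cat"]
demo_labels = ["female", "male", "unknown", "female"]
demo_predictions = [extract_label(text) for text in demo_outputs]
print(demo_predictions)
print(accuracy(demo_predictions, demo_labels))
```

In the notebook, you would collect `classify_sex(s["sentence"])` for every item in `test_sentences` and pass the extracted labels to `accuracy`.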

## Guiding the model

Previously, we told the model which format to output the result in and gave it some examples. You may have noticed that it can be quite hard to get the model to follow the format you want!

Most of the time, instructions and examples are enough, but if you're asking the LLM to solve a task it hasn't seen before (or, as in our case, using a very small model), it may struggle with the output format. This is a problem if we want to parse the model's output and we're expecting it to be in a specific format.

We can ensure that the model outputs in the correct format by using constrained generation. There are loads of libraries which do this, but we'll explore using the [guidance](https://guidance.readthedocs.io/en/latest/) library here.  
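
To build some intuition for what constrained generation means, here is a toy sketch. It is not how guidance is implemented: real libraries work over model tokens and logits, whereas this example works over characters and uses a stand-in scoring function (`toy_scorer`, hypothetical). At each step, only characters that keep the output consistent with one of the allowed options are considered.

```python
def constrained_greedy_decode(options, score_next_char):
    """Greedily build a string that is guaranteed to be one of `options`.

    `score_next_char(prefix, char)` stands in for a language model and returns
    a score for appending `char` to `prefix`. Assumes no option is a prefix of
    another option.
    """
    if not options:
        raise ValueError("options must not be empty")
    prefix = ""
    while prefix not in options:
        # Characters that keep the prefix consistent with at least one option
        allowed = {opt[len(prefix)] for opt in options
                   if opt.startswith(prefix) and len(opt) > len(prefix)}
        prefix += max(sorted(allowed), key=lambda ch: score_next_char(prefix, ch))
    return prefix

def toy_scorer(prefix, char):
    # Stand-in for a real model: favours spelling out "female", otherwise indifferent
    target = "female"
    return 1.0 if target[len(prefix):len(prefix) + 1] == char else 0.0

print(constrained_greedy_decode(["male", "female", "unknown"], toy_scorer))
```

Whatever the scoring function prefers, the result can only ever be one of the listed options, which is the property we want when parsing the output.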

First, we make sure we have all the dependencies, then load the library and a model. We're using the same model as in the previous section, but you could use other backends here, including the OpenAI API.

In [None]:
from guidance import models, select
from transformers import pipeline

model_name = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
lm = models.Transformers(model_name)

{% if message['role'] == 'user' %}
{{ '<|user|>
' + message['content'] + eos_token }}
{% elif message['role'] == 'system' %}
{{ '<|system|>
' + message['content'] + eos_token }}
{% elif message['role'] == 'assistant' %}
{{ '<|assistant|>
'  + message['content'] + eos_token }}
{% endif %}
{% if loop.last and add_generation_prompt %}
{{ '<|assistant|>' }}
{% endif %}
{% endfor %} was unable to be loaded directly into guidance.
                        Defaulting to the ChatML format which may not be optimal for the selected model. 
                        For best results, create and pass in a `guidance.ChatTemplate` subclass for your model.


Let's try applying this to classifying the sex of an animal in the input.

We can now define the rules for the output. In this example, we want the model to output only one of three options:


*   Male
*   Female
*   Unknown

In [None]:
def classify_sex(sentence):
  # Few-shot examples followed by the sentence we want classified
  messages = [
    {"role": "system", "content": MAIN_PROMPT},
    {"role": "user", "content": "Sentence: He was a happy dog"},
    {"role": "assistant", "content": "male"},
    {"role": "user", "content": "Sentence: I met Lucy, a 3yo female shorthair today."},
    {"role": "assistant", "content": "female"},
    {"role": "user", "content": "Sentence: A 5yo poodle was brought into the office today by the owner. She described how they wouldn't eat food."},
    {"role": "assistant", "content": "unknown"},
    {"role": "user", "content": f"Sentence: {sentence}"}
  ]
  prompt = pipe.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
  prompt = prompt.replace("</s>", "")

  # This line is the magic!
  # Notice how select is given 3 options constraining the output!
  return str(lm + prompt + select(['male', 'female', 'unknown'], name="sex")).split("<|assistant|>")[-1].strip()

In [None]:
classify_sex("I have been working with a 10-year-old diabetic cat. He is treated with 3 units of ProZinc insulin.")

'male'

In [None]:
for sentence in test_sentences:
  output = classify_sex(sentence["sentence"])
  print("Input:           ", sentence["sentence"])
  print("Label:           ", sentence["label"])
  print("Model Prediction:", output.replace("\n", " \\n"))
  prediction = "female" if "female" in output else "male" if "male" in output else "unknown"
  print("Verdict          :", "✅" if prediction == sentence["label"].lower() else "❌")
  print()

We detected that you are passing `past_key_values` as a tuple of tuples. This is deprecated and will be removed in v4.47. Please convert your cache or use an appropriate `Cache` class (https://huggingface.co/docs/transformers/kv_cache#legacy-cache-format)


Input:            The Owner brought his dog into the surgery yesterday and mentioned a history with diabetes.
Label:            unknown
Model Prediction: unknown
Verdict          : ✅

Input:            I saw a 5yo cat with a broken leg. She didn't show any improvement since her last visit.
Label:            female
Model Prediction: female
Verdict          : ✅

Input:            9yo F/N cat
Label:            female
Model Prediction: female
Verdict          : ✅

Input:            Shorthair was brought into my clinic the other day, and is the most beautiful boy!
Label:            male
Model Prediction: male
Verdict          : ✅



If you're using an API such as OpenAI's, you might not even need this. Such APIs support structured outputs such as JSON when you pass the appropriate parameter.

## Exercises

1. Experiment with different prompts. The structure of the prompt makes a big difference to the performance of the model.
2. Explore some of the other things the [guidance library](https://github.com/guidance-ai/guidance/tree/main) can do.