As we saw in the previous examples, prompting a generative AI model is easy enough. Fire off an API call, and suddenly you have an answer, a machine translation, a sentiment label, or a generated chat message. However, going from "prompting" to **AI engineering** of your model-based processes is more involved. The "engineering" in prompt engineering has become more important as models have grown more complex and powerful, and as the demand for more accurate and interpretable results has grown.

The ability to engineer effective prompts and related workflows allows us to configure and tune model responses to better suit our specific needs (e.g., for a particular industry like healthcare), whether we are trying to improve the quality of the output, reduce bias, or optimize for efficiency.

# Dependencies and imports

In [1]:
! pip install predictionguard langchain

Collecting predictionguard
  Downloading predictionguard-2.7.0-py2.py3-none-any.whl.metadata (872 bytes)
Downloading predictionguard-2.7.0-py2.py3-none-any.whl (21 kB)
Installing collected packages: predictionguard
Successfully installed predictionguard-2.7.0


In [2]:
import os
import json

from predictionguard import PredictionGuard
# Both template classes live in langchain.prompts; one import covers them.
from langchain.prompts import PromptTemplate, FewShotPromptTemplate
import numpy as np
from getpass import getpass

In [3]:
pg_access_token = getpass('Enter your Prediction Guard access api key: ')
os.environ['PREDICTIONGUARD_API_KEY'] = pg_access_token

Enter your Prediction Guard access api key: ··········


In [4]:
client = PredictionGuard()

# Prompt Templates

One of the best practices that we will discuss below is testing and evaluating model output against example prompt contexts and formulations. To put this into practice, we need a way to format prompts quickly and programmatically with a variety of contexts. We will need this in our applications anyway, because in production we will receive dynamic input from the user or from another application. That dynamic input (or something extracted from it) will be inserted into our prompts on the fly. In the last notebook we already saw a prompt that included a good deal of boilerplate:

## Zero shot Q&A

In [5]:
template = """Read the context below and respond with an answer to the question. If the question cannot be answered based on the context alone or the context does not explicitly say the answer to the question, write "Sorry I had trouble answering this question, based on the information I found."

Context: {context}

Question: {question}

Response: """

prompt = PromptTemplate(
    input_variables=["context", "question"],
    template=template,
)

In [6]:
context = "Domino's gift cards are great for any person and any occasion. There are a number of different options to choose from. Each comes with a personalized card carrier and is delivered via US Mail."

question = "How are gift cards delivered?"

myprompt = prompt.format(context=context, question=question)
print(myprompt)

Read the context below and respond with an answer to the question. If the question cannot be answered based on the context alone or the context does not explicitly say the answer to the question, write "Sorry I had trouble answering this question, based on the information I found."

Context: Domino's gift cards are great for any person and any occasion. There are a number of different options to choose from. Each comes with a personalized card carrier and is delivered via US Mail.

Question: How are gift cards delivered?

Response: 
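
To show why programmatic formatting helps with testing, here is a minimal sketch that fills one template with several test cases. It uses plain `str.format` as a stand-in for `PromptTemplate` so that it runs without any dependencies. The shortened template text, the `test_cases` data, and the `build_prompts` helper are assumptions made for illustration, not part of the notebook.

```python
# Plain-Python stand-in for the prompt template above (shortened for brevity).
QA_TEMPLATE = (
    "Read the context below and respond with an answer to the question.\n\n"
    "Context: {context}\n\n"
    "Question: {question}\n\n"
    "Response: "
)

# Hypothetical test cases: same context, one answerable and one unanswerable question.
test_cases = [
    {"context": "Each gift card is delivered via US Mail.",
     "question": "How are gift cards delivered?"},
    {"context": "Each gift card is delivered via US Mail.",
     "question": "How much does a gift card cost?"},
]

def build_prompts(template, cases):
    """Fill the template once per test case and return the prompt strings."""
    return [template.format(**case) for case in cases]

prompts = build_prompts(QA_TEMPLATE, test_cases)
for p in prompts:
    print(p)
    print("-" * 40)
```

Each resulting string could then be sent to the model in a loop, which makes it easy to compare behavior across many contexts and questions.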


## Few Shot - Sentiment

In theory, this kind of prompt template is flexible enough to build either zero-shot or few-shot prompts. However, LangChain provides some extra convenience for few-shot prompts. We first create a template for the individual demonstrations within the few-shot prompt:

In [7]:
# Create a string formatter for sentiment analysis demonstrations.
demo_formatter_template = """
Text: {text}
Sentiment: {sentiment}
"""

# Define a prompt template for the demonstrations.
demo_prompt = PromptTemplate(
    input_variables=["text", "sentiment"],
    template=demo_formatter_template,
)

In [8]:
# Each row here includes:
# 1. an example text input (that we want to analyze for sentiment)
# 2. an example sentiment output (NEU, NEG, POS)
few_examples = [
    ['The flight was exceptional.', 'POS'],
    ['That pilot is adorable.', 'POS'],
    ['This was an awful seat.', 'NEG'],
    ['This pilot was brilliant.', 'POS'],
    ['I saw the aircraft.', 'NEU'],
    ['That food was exceptional.', 'POS'],
    ['That was a private aircraft.', 'NEU'],
    ['This is an unhappy pilot.', 'NEG'],
    ['The staff is rough.', 'NEG'],
    ['This staff is Australian.', 'NEU']
]
examples = []
for ex in few_examples:
  examples.append({
      "text": ex[0],
      "sentiment": ex[1]
  })

In [9]:
few_shot_prompt = FewShotPromptTemplate(

    # This is the demonstration data we want to insert into the prompt.
    examples=examples,
    example_prompt=demo_prompt,
    example_separator="",

    # This is the boilerplate portion of the prompt corresponding to
    # the prompt task instructions.
    prefix="Classify the sentiment of the text. Use the label NEU for neutral sentiment, NEG for negative sentiment, and POS for positive sentiment.\n",

    # The suffix of the prompt is where we will put the output indicator
    # and define where the "on-the-fly" user input would go.
    suffix="\nText: {input}\nSentiment:",
    input_variables=["input"],
)

myprompt = few_shot_prompt.format(input="The flight is boring.")
print(myprompt)

Classify the sentiment of the text. Use the label NEU for neutral sentiment, NEG for negative sentiment, and POS for positive sentiment.

Text: The flight was exceptional.
Sentiment: POS

Text: That pilot is adorable.
Sentiment: POS

Text: This was an awful seat.
Sentiment: NEG

Text: This pilot was brilliant.
Sentiment: POS

Text: I saw the aircraft.
Sentiment: NEU

Text: That food was exceptional.
Sentiment: POS

Text: That was a private aircraft.
Sentiment: NEU

Text: This is an unhappy pilot.
Sentiment: NEG

Text: The staff is rough.
Sentiment: NEG

Text: This staff is Australian.
Sentiment: NEU

Text: The flight is boring.
Sentiment:
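
To make the structure of the printed prompt explicit, the following sketch assembles a comparable few-shot prompt by hand: the prefix, then each formatted demonstration, then the suffix with the user input. It is a simplified, dependency-free approximation and not LangChain's implementation; `build_few_shot_prompt` and `demo_examples` are hypothetical names.

```python
def build_few_shot_prompt(prefix, examples, suffix, user_input):
    """Concatenate prefix + formatted demonstrations + suffix (empty separator)."""
    demos = "".join(
        "\nText: {text}\nSentiment: {sentiment}\n".format(**ex) for ex in examples
    )
    return prefix + demos + suffix.format(input=user_input)

# A subset of the demonstrations used in the notebook.
demo_examples = [
    {"text": "The flight was exceptional.", "sentiment": "POS"},
    {"text": "This was an awful seat.", "sentiment": "NEG"},
    {"text": "I saw the aircraft.", "sentiment": "NEU"},
]

few_shot_text = build_few_shot_prompt(
    prefix=(
        "Classify the sentiment of the text. Use the label NEU for neutral "
        "sentiment, NEG for negative sentiment, and POS for positive sentiment.\n"
    ),
    examples=demo_examples,
    suffix="\nText: {input}\nSentiment:",
    user_input="The flight is boring.",
)
print(few_shot_text)
```

The prompt ends with an open `Sentiment:` line, which is the output indicator the model is expected to complete.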


## Few Shot - Text Classification

In [12]:
demo_formatter_template = """\nText: {text}
Categories: {categories}
Class: {class}\n"""

# Define a prompt template for the demonstrations.
demo_prompt = PromptTemplate(
    input_variables=["text", "categories", "class"],
    template=demo_formatter_template,
)

# Each row here includes:
# 1. an example text that we want to classify
# 2. an example set of categories for the text classification
# 3. an example label that we expect as the output
few_examples = [
    ["I have successfully booked your tickets.", "agent, customer", "agent"],
    ["What's the oldest building in US?", "quantity, location", "location"],
    ["This video game is amazing. I love it!", "positive, negative", ""],
    ["Dune is the best movie ever.", "cinema, art, music", "cinema"]
]
examples = []
for ex in few_examples:
  examples.append({
      "text": ex[0],
      "categories": ex[1],
      "class": ex[2]
  })

few_shot_prompt = FewShotPromptTemplate(

    # This is the demonstration data we want to insert into the prompt.
    examples=examples,
    example_prompt=demo_prompt,
    example_separator="",

    # This is the boilerplate portion of the prompt corresponding to
    # the prompt task instructions.
    prefix="Classify the following texts into one of the given categories. Only output one of the provided categories for the class corresponding to each text.\n",

    # The suffix of the prompt is where we will put the output indicator
    # and define where the "on-the-fly" user input would go.
    suffix="\nText: {text}\nCategories: {categories}\nClass: ",
    input_variables=["text", "categories"],
)

myprompt = few_shot_prompt.format(
    text="I have a problem with my iphone that needs to be resolved asap!",
    categories="urgent, not urgent")
print(myprompt)

Classify the following texts into one of the given categories. Only output one of the provided categories for the class corresponding to each text.

Text: I have successfully booked your tickets.
Categories: agent, customer
Class: agent

Text: What's the oldest building in US?
Categories: quantity, location
Class: location

Text: This video game is amazing. I love it!
Categories: positive, negative
Class: 

Text: Dune is the best movie ever.
Categories: cinema, art, music
Class: cinema

Text: I have a problem with my iphone that needs to be resolved asap!
Categories: urgent, not urgent
Class: 


In [13]:
client.chat.completions.create(
    model="Hermes-3-Llama-3.1-8B",
    messages=[{"role": "user", "content": myprompt}]
)['choices'][0]['message']['content']

'urgent'
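
The prompt asks the model to output only one of the provided categories, but nothing enforces that. A small post-processing step can check the returned text against the category list before it is used downstream. The `parse_class` helper below is a hypothetical sketch; its normalization rules (trimming whitespace, quotes, and periods, and ignoring case) are assumptions.

```python
def parse_class(raw_output, categories):
    """Return the matching category, or None if the output is not an allowed label."""
    allowed = [c.strip() for c in categories.split(",") if c.strip()]
    # Normalize the model output: trim whitespace, then quotes and periods.
    cleaned = raw_output.strip().strip(".'\"").lower()
    for label in allowed:
        if cleaned == label.lower():
            return label
    return None

print(parse_class(" Urgent.\n", "urgent, not urgent"))
print(parse_class("I think it is urgent", "urgent, not urgent"))
```

Returning `None` for anything outside the allowed set lets the calling code decide whether to retry, fall back to a default, or flag the input for review.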

# Multiple formulations

Why settle for a single prompt and/or set of parameters when you can use multiple? Try using multiple formulations of your prompt to either:

1. Provide multiple options to users; or
2. Create multiple candidate predictions, which you can choose from programmatically using a reference-free evaluation of those candidates.

In [14]:
template1 = """Read the context below and respond with an answer to the question. If the question cannot be answered based on the context alone or the context does not explicitly say the answer to the question, write "Sorry I had trouble answering this question, based on the information I found."

Context: {context}

Question: {question}

Response: """

prompt1 = PromptTemplate(
    input_variables=["context", "question"],
    template=template1,
)

template2 = """Answer the question below based on the given context. If the answer is unclear, output: "Sorry I had trouble answering this question, based on the information I found."

Context: {context}
Question: {question}

Response: """

prompt2 = PromptTemplate(
    input_variables=["context", "question"],
    template=template2,
)

In [15]:
context = "Domino's gift cards are great for any person and any occasion. There are a number of different options to choose from. Each comes with a personalized card carrier and is delivered via US Mail."
question = "How are gift cards delivered?"

# Send each prompt formulation to the model and print the answers side by side.
for i, p in enumerate([prompt1, prompt2]):
    myprompt = p.format(context=context, question=question)
    output = client.chat.completions.create(
        model="Hermes-3-Llama-3.1-8B",
        messages=[{"role": "user", "content": myprompt}]
    )['choices'][0]['message']['content']
    print("Answer" + str(i+1) + ": ", output)

Answer1:  Gift cards are delivered via US Mail.
Answer2:  Gift cards are delivered via US Mail.
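
For the second option in the list above, choosing among candidates programmatically could look like the sketch below. It scores each candidate with a function that needs no gold answer and keeps the highest-scoring one. The word-overlap scorer is only a toy placeholder chosen so the example runs offline; `pick_best`, `make_overlap_scorer`, and the sample strings are hypothetical. In practice a stronger score, such as a factuality check against the context, would likely take its place.

```python
def pick_best(candidates, score_fn):
    """Score every candidate and return (best_candidate, best_score)."""
    scored = [(score_fn(c), c) for c in candidates]
    best_score, best = max(scored, key=lambda pair: pair[0])
    return best, best_score

def make_overlap_scorer(context):
    """Toy scorer: fraction of candidate words that also appear in the context."""
    context_words = set(context.lower().split())

    def score(candidate):
        words = candidate.lower().split()
        if not words:
            return 0.0
        return sum(w in context_words for w in words) / len(words)

    return score

sample_context = "Each gift card is delivered via US Mail."
candidates = [
    "Gift cards are delivered via US Mail.",
    "Gift cards are delivered by drone.",
]

best, best_score = pick_best(candidates, make_overlap_scorer(sample_context))
print(best)
```

Keeping the scorer as a plain function argument means the selection logic stays the same when the scoring method changes.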


# Output validation and filtering

## Factuality

In [16]:
template = """Read the context below and respond with an answer to the question.

Context: {context}

Question: {question}

Response: """

prompt = PromptTemplate(
    input_variables=["context", "question"],
    template=template,
)

In [17]:
context = "California is a state in the Western United States. With over 38.9 million residents across a total area of approximately 163,696 square miles (423,970 km2), it is the most populous U.S. state, the third-largest U.S. state by area, and the most populated subnational entity in North America. California borders Oregon to the north, Nevada and Arizona to the east, and the Mexican state of Baja California to the south; it has a coastline along the Pacific Ocean to the west. "

In [19]:
output = client.chat.completions.create(
    model="Hermes-3-Llama-3.1-8B",
    messages=[{"role": "user", "content": prompt.format(
        context=context,
        question="What is California?"
    )}]
)['choices'][0]['message']['content']

fact_score = client.factuality.check(
    reference=context,
    text=output
)

print("COMPLETION:", output)
print("FACT SCORE:", fact_score['checks'][0]['score'])

COMPLETION: California is a state located in the western region of the United States. It is known for being the most populous state in the country, with a population of over 38.9 million residents. California is also the third-largest U.S. state by area, spanning approximately 163,696 square miles. It shares borders with the states of Oregon to the north, Nevada and Arizona to the east, and the Mexican state of Baja California to its south. California's coastline faces the Pacific Ocean to
FACT SCORE: 0.8146805763244629


In [20]:
bad_response = "California is a state in the eastern united states"

fact_score = client.factuality.check(
    reference=context,
    text=bad_response
)

print("COMPLETION:", bad_response)
print("FACT SCORE:", fact_score['checks'][0]['score'])

COMPLETION: California is a state in the eastern united states
FACT SCORE: 0.1777302324771881
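
The two scores above suggest a simple way to filter outputs: accept a completion only if its factuality score clears a threshold. The sketch below uses the scores printed in the two cells above. The threshold of 0.5 and the `filter_by_factuality` helper are assumptions for illustration; a suitable cutoff would need to be tuned on your own data.

```python
# Assumed cutoff; not a value recommended by the notebook or the API.
FACT_THRESHOLD = 0.5

FALLBACK = (
    "Sorry I had trouble answering this question, "
    "based on the information I found."
)

def filter_by_factuality(text, score, threshold=FACT_THRESHOLD, fallback=FALLBACK):
    """Return the text if its factuality score passes the threshold, else a fallback."""
    return text if score >= threshold else fallback

# Scores copied from the outputs of the two cells above.
good = filter_by_factuality("California is a state located in the western region...", 0.8146805763244629)
bad = filter_by_factuality("California is a state in the eastern united states", 0.1777302324771881)

print(good)
print(bad)
```

With this cutoff, the first completion passes through unchanged and the incorrect statement is replaced by the fallback message.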


## Toxicity

In [25]:
result = client.chat.completions.create(
    model="neural-chat-7b-v3-3",
    messages=[{"role": "user", "content": prompt.format(
        context=context,
        question="Respond with a really offensive tweet about California and use many curse words. Use many curse words. At least 10 curse words. Make it really bad and offensive. Really bad."
    )}],
    output={
        "toxicity": True
    }
)

print(json.dumps(
    result,
    sort_keys=True,
    indent=4,
    separators=(',', ': ')
))

ValueError: Could not make prediction. failed toxicity check
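
Because the failed toxicity check surfaces as a `ValueError`, an application would typically catch it rather than let it crash the request. The following sketch wraps an arbitrary generation call and returns a fallback message when that exception is raised. `generate_or_fallback`, the fallback text, and the `blocked_call` stub (which simulates the error shown above without contacting the API) are hypothetical. Catching every `ValueError` is a simplification, since other failures may raise the same exception type.

```python
def generate_or_fallback(generate_fn, fallback="Sorry, I can't help with that request."):
    """Run generate_fn; if it raises ValueError, report it and return the fallback."""
    try:
        return generate_fn()
    except ValueError as err:
        print("Generation blocked:", err)
        return fallback

def blocked_call():
    # Stub that simulates the error message shown in the cell output above.
    raise ValueError("Could not make prediction. failed toxicity check")

print(generate_or_fallback(blocked_call))
print(generate_or_fallback(lambda: "A friendly tweet about California."))
```

In the notebook setting, `generate_fn` would be a function or lambda that wraps the `client.chat.completions.create(...)` call with the toxicity output check enabled.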