# Prompt engineering

Prompts consist of three main components: 

* __Instructions__ that describe the task requirements, goals, and format of input/output. They explain the task to the model unambiguously. 
* __Examples__ that demonstrate the desired input-output pairs. They provide diverse demonstrations of how different inputs should map to outputs. 
* __Input__ that the model must act on to generate the output.
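The three components can be assembled mechanically. The sketch below is a minimal illustration, not a LangChain API. The `build_prompt` helper and the `Input:`/`Output:` labels are assumptions chosen for readability. Passing an empty example list gives a zero-shot prompt, and adding examples makes it few-shot.

```python
from typing import List, Tuple


def build_prompt(instructions: str, examples: List[Tuple[str, str]], task_input: str) -> str:
    """Assemble a prompt from instructions, input/output examples, and the input to act on."""
    sections = [instructions.strip()]
    # Each example demonstrates one input -> output mapping.
    for example_input, example_output in examples:
        sections.append(f"Input: {example_input}\nOutput: {example_output}")
    # The actual input ends with an empty "Output:" slot for the model to complete.
    sections.append(f"Input: {task_input}\nOutput:")
    return "\n\n".join(sections)


demo_prompt = build_prompt(
    "Classify the sentiment of the input as Positive or Negative.",
    [("This is awesome!", "Positive"), ("This is bad!", "Negative")],
    "What a horrible show!",
)
print(demo_prompt)
```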


## Zero-shot Prompting

* No examples are provided; the model relies entirely on what it learned during pre-training
* Works for simple tasks, but struggles with complex reasoning

In [None]:
import json
from dotenv import load_dotenv
from IPython.display import display, Markdown, JSON
from langchain.chains import SequentialChain, LLMChain
from langchain.prompts.pipeline import PipelinePromptTemplate
from langchain.prompts.prompt import PromptTemplate
from langchain.pydantic_v1 import BaseModel, Field
from langchain_core.output_parsers import StrOutputParser, PydanticOutputParser
from langchain_openai import AzureChatOpenAI

In [None]:
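# Load AZURE_OPENAI_API_KEY, AZURE_OPENAI_ENDPOINT, etc. from a local .env file
load_dotenv()
llm = AzureChatOpenAI(
    openai_api_version="2023-05-15",
    azure_deployment="gpt-4",  # name of your Azure OpenAI deployment
)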

### PromptTemplate

__PromptTemplate__ is the basic template for plain-text prompts and is mainly used here for _zero-shot prompting_. Its key feature is the ability to declare named parameters, such as `{text}`, and supply their values when the prompt is formatted.
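To make the parameter mechanism concrete, here is a standard-library sketch of roughly what happens with an f-string-style template: the named placeholders are discovered, then filled in. `template_variables` and `fill_template` are hypothetical helpers, not LangChain's implementation; `PromptTemplate` exposes the discovered names as `input_variables`.

```python
from string import Formatter
from typing import List


def template_variables(template: str) -> List[str]:
    """List the named {placeholders} in an f-string-style template, in order of first appearance."""
    names: List[str] = []
    for _literal, field_name, _format_spec, _conversion in Formatter().parse(template):
        if field_name and field_name not in names:
            names.append(field_name)
    return names


def fill_template(template: str, **values: str) -> str:
    """Fill every placeholder, failing loudly if a value is missing."""
    missing = [name for name in template_variables(template) if name not in values]
    if missing:
        raise ValueError(f"missing template variables: {missing}")
    return template.format(**values)


template = "Classify the sentiment of this text: {text}"
print(template_variables(template))  # ['text']
print(fill_template(template, text="I hate the movie Batman & Robin, it was terrible!"))
```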

In [None]:
# Version 1: format the template into a plain string and pass that string to the model
prompt_template = PromptTemplate.from_template("Classify the sentiment of this text: {text}")

print(prompt_template.format(text="I hate the movie Batman & Robin, it was terrible!"))

In [None]:
result = llm.invoke(prompt_template.format(text="I hate the movie Batman & Robin, it was terrible!"))

In [None]:
display(Markdown(result.content))

In [None]:
# Version 2: pipe the template into the model to build a chain (LangChain Expression Language)
prompt_template = PromptTemplate(input_variables=["text"], template="Classify the sentiment of this text: {text}")
chain = prompt_template | llm

In [None]:
result = chain.invoke({"text": "I hate the movie Batman & Robin, it was terrible!"})

In [None]:
display(Markdown(result.content))

## Few-shot Prompting

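* Provide a few demonstrations of inputs and their desired outputs in the prompt
* The demonstrations show the model the expected output format (and, with chain-of-thought exemplars, the expected reasoning format)
* In the chain-of-thought paper (Wei et al., 2022), few-shot exemplars that include reasoning steps roughly tripled PaLM 540B's accuracy on the GSM8K grade-school math benchmark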

In [None]:
prompt_1 = """
A "whatpu" is a small, furry animal native to Tanzania. An example of a sentence that uses the word whatpu is:
We were traveling in Africa and we saw these very cute whatpus.
 
To do a "farduddle" means to jump up and down really fast. An example of a sentence that uses the word farduddle is:
"""

In [None]:
result_1 = llm.invoke(prompt_1)
display(Markdown(result_1.content))

In [None]:
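# The label placement is deliberately inconsistent; the model can often still infer the intended pattern
prompt_2 = """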
Positive This is awesome! 
This is bad! Negative
Wow that movie was rad!
Positive
What a horrible show! --
"""

In [None]:
result_2 = llm.invoke(prompt_2)
display(Markdown(result_2.content))

## Chain-of-Thought (CoT)

In [None]:
prompt_3 = """
I went to the market and bought 10 apples. I gave 2 apples to the neighbor and 2 to the repairman. I then went and bought 5 more apples and ate 1. How many apples did I remain with?
"""

In [None]:
result_3 = llm.invoke(prompt_3)
display(Markdown(result_3.content))

In [None]:
prompt_4 = """
Given this context: ```Jeff and Tommy are neighbors. Tommy and Eddy are not neighbors.``` and the following query: ```Are Jeff and Eddy neighbors?```. 
Please answer the question in the query and explain your reasoning.
If there is not enough information to answer, please say
"I do not have enough information to answer this question."
"""

In [None]:
result_4 = llm.invoke(prompt_4)
display(Markdown(result_4.content))

In [None]:
prompt_5 = """
Roger has 5 tennis balls. 
He buys 2 more cans of tennis balls. 
Each can has 3 tennis balls. 
How many tennis balls does he have now?
"""

In [None]:
result_5 = llm.invoke(prompt_5)
display(Markdown(result_5.content))
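A common zero-shot variant of CoT (Kojima et al., 2022) simply appends a trigger phrase such as "Let's think step by step." to the question. The sketch below shows that idea as a helper (`zero_shot_cot` is a hypothetical name) and writes out the reasoning chains for `prompt_3` and `prompt_5` explicitly, so the model's answers can be checked against 10 apples and 11 tennis balls.

```python
COT_TRIGGER = "Let's think step by step."


def zero_shot_cot(question: str) -> str:
    """Append a zero-shot chain-of-thought trigger phrase to a question."""
    return f"{question.strip()}\n\n{COT_TRIGGER}"


# Reference reasoning chain for prompt_3
apples = 10  # bought 10 apples
apples -= 2  # gave 2 to the neighbor -> 8
apples -= 2  # gave 2 to the repairman -> 6
apples += 5  # bought 5 more -> 11
apples -= 1  # ate 1 -> 10

# Reference reasoning chain for prompt_5
balls = 5  # Roger starts with 5 balls
balls += 2 * 3  # 2 cans with 3 balls each -> 11

print(apples, balls)  # 10 11
```

With the notebook's `llm`, the helper would be used as `llm.invoke(zero_shot_cot(prompt_5))`.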

## Least-to-most Prompting

Least-to-most prompting (LtM) takes CoT prompting a step further: it first breaks a problem into subproblems and then solves them in order, from the simplest to the most complex, feeding each answer into the next step. The technique is inspired by real-world educational strategies for teaching children.

LtM leads to multiple improvements:

* improved accuracy over Chain of Thought
* increased generalization on problems harder than those in the prompt
* dramatically improved performance in compositional generalization, in particular the SCAN benchmark

In [None]:
prompt_6 = """
Q: turn left
A: TURN LEFT

Q: turn right
A: TURN RIGHT

Q: jump left
A: TURN LEFT + JUMP

Q: run right
A: TURN RIGHT + RUN

Q: look twice
A: LOOK * 2

Q: run and look twice
A: RUN + LOOK * 2

Q: jump right thrice
A: (TURN RIGHT + JUMP) * 3

Q: walk after run
A: RUN + WALK

Q: turn opposite left
A: TURN LEFT * 2

Q: turn around left
A: TURN LEFT * 4

Q: turn opposite right
A: TURN RIGHT * 2

Q: turn around right
A: TURN RIGHT * 4

Q: walk opposite left
A: TURN LEFT * 2 + WALK

Q: walk around left
A: (TURN LEFT + WALK) * 4

Q: "jump around left twice after walk opposite left thrice" 
A:
"""

In [None]:
result_6 = llm.invoke(prompt_6)
display(Markdown(result_6.content))
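In LtM, the model both decomposes the command and solves the pieces. To make the least-to-most order visible, the sketch below replaces the model with a deterministic interpreter (`solve_scan`, hypothetical, written only for the command grammar used in `prompt_6`). It records each subproblem as it is solved, so the simplest pieces ("walk opposite left") are resolved before the compound command.

```python
from typing import List, Tuple

ACTIONS = {"walk": "WALK", "run": "RUN", "jump": "JUMP", "look": "LOOK"}
TURNS = {"left": "TURN LEFT", "right": "TURN RIGHT"}


def solve_scan(command: str, steps: List[Tuple[str, List[str]]]) -> List[str]:
    """Expand a SCAN-style command into actions, recording each solved subproblem in `steps`."""
    if " after " in command:
        # "X after Y" means: do Y first, then X.
        first, second = command.split(" after ", 1)
        actions = solve_scan(second, steps) + solve_scan(first, steps)
    elif " and " in command:
        first, second = command.split(" and ", 1)
        actions = solve_scan(first, steps) + solve_scan(second, steps)
    elif command.endswith(" twice"):
        actions = solve_scan(command[: -len(" twice")], steps) * 2
    elif command.endswith(" thrice"):
        actions = solve_scan(command[: -len(" thrice")], steps) * 3
    else:
        words = command.split()
        # "turn" only changes direction; other verbs add an action.
        action = [] if words[0] == "turn" else [ACTIONS[words[0]]]
        if len(words) == 1:
            actions = action
        elif words[1] in TURNS:  # e.g. "jump left"
            actions = [TURNS[words[1]]] + action
        elif words[1] == "opposite":  # e.g. "walk opposite left"
            actions = [TURNS[words[2]]] * 2 + action
        elif words[1] == "around":  # e.g. "walk around left"
            actions = ([TURNS[words[2]]] + action) * 4
        else:
            raise ValueError(f"unsupported command: {command!r}")
    steps.append((command, actions))
    return actions


ltm_steps: List[Tuple[str, List[str]]] = []
expanded = solve_scan("jump around left twice after walk opposite left thrice", ltm_steps)
for subproblem, actions in ltm_steps:
    print(f"{subproblem!r}: {len(actions)} actions")
# 'walk opposite left': 3 actions
# 'walk opposite left thrice': 9 actions
# 'jump around left': 8 actions
# 'jump around left twice': 16 actions
# 'jump around left twice after walk opposite left thrice': 25 actions
```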

In [None]:
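prompt_7 = """
CUSTOMER INQUIRY:
I just bought a T-shirt from your Arnold collection on March 1st. 
I saw that it was on discount, so bought a shirt that was originally $30, and got 40% off. 
I saw that you have a new discount for shirts at 50%. 
I'm wondering if I can return the shirt and have enough store credit to buy two of your shirts?

INSTRUCTIONS: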
You are a customer service agent tasked with kindly responding to customer inquiries. 
Returns are allowed within 30 days. 
Today's date is March 29th. 
There is currently a 50% discount on all shirts. 
Shirt prices range from $18-$100 at your store. 
Do not make up any information about discount policies.
What subproblems must be solved before answering the inquiry?
"""

In [None]:
result_7 = llm.invoke(prompt_7)
display(Markdown(result_7.content))

In [None]:
prompt_8 = """
CUSTOMER INQUIRY:
I just bought a T-shirt from your Arnold collection on March 1st. 
I saw that it was on discount, so bought a shirt that was originally $30, and got 40% off. 
I saw that you have a new discount for shirts at 50%. 
I'm wondering if I can return the shirt and have enough store credit to buy two of your shirts?

INSTRUCTIONS:
You are a customer service agent tasked with kindly responding to customer inquiries. 
Returns are allowed within 30 days. 
Today's date is March 29th. There is currently a 50% discount on all shirts. 
Shirt prices range from $18-$100 at your store. 
Do not make up any information about discount policies.
Determine if the customer is within the 30-day return window. Let's go step by step.
"""

In [None]:
result_8 = llm.invoke(prompt_8)
display(Markdown(result_8.content))
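The subproblems for this inquiry can also be worked out directly, which gives a reference answer to compare against `result_8`. This sketch makes three assumptions the prompt does not state: a year (only "March 1st" and "March 29th" are given), that store credit equals the $18 actually paid rather than the $30 list price, and that the cheapest shirt ($18) is the relevant one for "two of your shirts".

```python
from datetime import date

YEAR = 2024  # assumption: the prompt names no year
purchase_date = date(YEAR, 3, 1)
today = date(YEAR, 3, 29)
RETURN_WINDOW_DAYS = 30

# Subproblem 1: is the customer inside the 30-day return window?
days_since_purchase = (today - purchase_date).days
within_window = days_since_purchase <= RETURN_WINDOW_DAYS

# Subproblem 2: how much credit would the return produce? Amounts are in cents to avoid float rounding.
original_price_cents = 3000
paid_cents = original_price_cents * (100 - 40) // 100  # 40% off -> $18.00 (assumed to be the credit)

# Subproblem 3: what do two of the cheapest shirts cost at the current 50% discount?
cheapest_shirt_cents = 1800
two_cheapest_cents = 2 * cheapest_shirt_cents * 50 // 100  # $9.00 each -> $18.00

can_buy_two = within_window and paid_cents >= two_cheapest_cents
print(days_since_purchase, paid_cents / 100, two_cheapest_cents / 100, can_buy_two)
# 28 18.0 18.0 True
```

Under these assumptions the purchase is 28 days old, and $18 of credit covers exactly two shirts only if both come from the cheapest $18 tier.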

## Self-consistency

Self-consistency prompting samples several chain-of-thought reasoning paths for the same prompt (by querying the model multiple times with a non-zero temperature) and keeps the final answer that appears most often. More at https://www.promptingguide.ai/techniques/consistency

In [None]:
prompt_9 = """
When I was 6 my sister was half my age. Now
I’m 70 how old is my sister?
"""

In [None]:
# Expected answer: 67 (at 6 the sister was 3, so she is 3 years younger; 70 - 3 = 67)
result_9 = llm.invoke(prompt_9)
display(Markdown(result_9.content))

In [None]:
prompt_10 = """
Q: There are 15 trees in the grove. Grove workers will plant trees in the grove today. After they are done,
there will be 21 trees. How many trees did the grove workers plant today?
A: We start with 15 trees. Later we have 21 trees. The difference must be the number of trees they planted.
So, they must have planted 21 - 15 = 6 trees. The answer is 6.
Q: If there are 3 cars in the parking lot and 2 more cars arrive, how many cars are in the parking lot?
A: There are 3 cars in the parking lot already. 2 more arrive. Now there are 3 + 2 = 5 cars. The answer is 5.
Q: Leah had 32 chocolates and her sister had 42. If they ate 35, how many pieces do they have left in total?
A: Leah had 32 chocolates and Leah’s sister had 42. That means there were originally 32 + 42 = 74
chocolates. 35 have been eaten. So in total they still have 74 - 35 = 39 chocolates. The answer is 39.
Q: Jason had 20 lollipops. He gave Denny some lollipops. Now Jason has 12 lollipops. How many lollipops
did Jason give to Denny?
A: Jason had 20 lollipops. Since he only has 12 now, he must have given the rest to Denny. The number of
lollipops he has given to Denny must have been 20 - 12 = 8 lollipops. The answer is 8.
Q: Shawn has five toys. For Christmas, he got two toys each from his mom and dad. How many toys does
he have now?
A: He has 5 toys. He got 2 from mom, so after that he has 5 + 2 = 7 toys. Then he got 2 more from dad, so
in total he has 7 + 2 = 9 toys. The answer is 9.
Q: There were nine computers in the server room. Five more computers were installed each day, from
monday to thursday. How many computers are now in the server room?
A: There are 4 days from monday to thursday. 5 computers were added each day. That means in total 4 * 5 =
20 computers were added. There were 9 computers in the beginning, so now there are 9 + 20 = 29 computers.
The answer is 29.
Q: Michael had 58 golf balls. On tuesday, he lost 23 golf balls. On wednesday, he lost 2 more. How many
golf balls did he have at the end of wednesday?
A: Michael initially had 58 balls. He lost 23 on Tuesday, so after that he has 58 - 23 = 35 balls. On
Wednesday he lost 2 more so now he has 35 - 2 = 33 balls. The answer is 33.
Q: Olivia has $23. She bought five bagels for $3 each. How much money does she have left?
A: She bought 5 bagels for $3 each. This means she spent $15. She has $8 left.
Q: When I was 6 my sister was half my age. Now I’m 70 how old is my sister?
A:
"""

In [None]:
result_10 = llm.invoke(prompt_10)
display(Markdown(result_10.content))
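The cells above call the model only once per prompt, while self-consistency needs several sampled completions plus an aggregation step. Self-consistency (Wang et al., 2022) aggregates by majority vote over the final answers. The sketch below assumes the final answer is the last number in each completion, which is a simplification. The `samples` list is hypothetical and stands in for several `.content` strings obtained by invoking the model on `prompt_10` with a non-zero `temperature`.

```python
import re
from collections import Counter
from typing import List, Optional


def extract_answer(completion: str) -> Optional[str]:
    """Take the last number in a chain-of-thought completion as its final answer."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", completion)
    return numbers[-1] if numbers else None


def majority_vote(completions: List[str]) -> Optional[str]:
    """Return the most frequent final answer across sampled completions."""
    votes = Counter(answer for answer in map(extract_answer, completions) if answer is not None)
    return votes.most_common(1)[0][0] if votes else None


# Hypothetical sampled completions for prompt_10
samples = [
    "When I was 6 my sister was 3, so she is 3 years younger. 70 - 3 = 67. The answer is 67.",
    "Half of 70 is 35. The answer is 35.",
    "The age gap is 6 - 3 = 3 years, so my sister is 70 - 3 = 67.",
]
print(majority_vote(samples))  # 67
```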

## Chain-of-density (CoD)

### How Chain-of-Density prompting works

1. Initial summary: the model writes a long (about 80 words) but entity-sparse summary of the text.
1. Identify missing entities: it lists 1-3 informative entities from the text that the previous summary does not yet contain.
1. Densify: it rewrites the summary at the same length, fusing and compressing phrases to make room for the new entities without dropping any earlier ones.
1. Repeat: steps 2 and 3 are repeated (5 times in the prompt below), producing a chain of increasingly dense summaries from which the preferred density can be chosen.

More details https://advanced-stack.com/resources/how-to-summarize-using-chain-of-density-prompting.html

In [None]:
article = """
What is a Large Language Model (LLM)?

A large language model is an advanced type of language model that is trained using deep learning techniques on massive amounts of text data. 
These models are capable of generating human-like text and performing various natural language processing tasks.

In contrast, the definition of a language model refers to the concept of assigning probabilities to sequences of words, based on the analysis of text corpora. 
A language model can be of varying complexity, from simple n-gram models to more sophisticated neural network models. 
However, the term “large language model” usually refers to models that use deep learning techniques and have a large number of parameters, which can range from millions to billions. 
These models can capture complex patterns in language and produce text that is often indistinguishable from that written by humans.

How a Large Language Model (LLM) Is Built?

A large-scale transformer model known as a “large language model” is typically too massive to run on a single computer and is, therefore, provided as a service over an API or web interface. 
These models are trained on vast amounts of text data from sources such as books, articles, websites, and numerous other forms of written content. 
By analyzing the statistical relationships between words, phrases, and sentences through this training process, the models can generate coherent and contextually relevant responses to prompts or queries.
ChatGPT’s GPT-4, a large language model, was trained on massive amounts of internet text data, allowing it to understand various languages and possess knowledge of diverse topics. 
As a result, it can produce text in multiple styles. 
While its capabilities, including translation, text summarization, and question-answering, may seem impressive, they are not surprising, given that these functions operate using special “grammars” that match up with prompts.

How do large language models work?

Large language models like GPT-4 (Generative Pre-trained Transformer 4) work based on a transformer architecture. 
Here’s a simplified explanation of how they work:

* Learning from Lots of Text: These models start by reading a massive amount of text from the internet. It’s like learning from a giant library of information.

* Innovative Architecture: They use a unique structure called a transformer, which helps them understand and remember lots of information.

* Breaking Down Words: They look at sentences in smaller parts, like breaking words into pieces. This helps them work with language more efficiently.

* Understanding Words in Sentences: Unlike simple programs, these models understand individual words and how words relate to each other in a sentence. They get the whole picture.

* Getting Specialized: After the general learning, they can be trained more on specific topics to get good at certain things, like answering questions or writing about particular subjects.

* Doing Tasks: When you give them a prompt (a question or instruction), they use what they’ve learned to respond. It’s like having an intelligent assistant that can understand and generate text.

General Architecture

The architecture of Large Language Model primarily consists of multiple layers of neural networks, like recurrent layers, feedforward layers, embedding layers, and attention layers. 
These layers work together to process the input text and generate output predictions.

* The embedding layer converts each word in the input text into a high-dimensional vector representation. These embeddings capture semantic and syntactic information about the words and help the model to understand the context.

* The feedforward layers of Large Language Models have multiple fully connected layers that apply nonlinear transformations to the input embeddings. These layers help the model learn higher-level abstractions from the input text.

* The recurrent layers of LLMs are designed to interpret information from the input text in sequence. These layers maintain a hidden state that is updated at each time step, allowing the model to capture the dependencies between words in a sentence.

* The attention mechanism is another important part of LLMs, which allows the model to focus selectively on different parts of the input text. This mechanism helps the model attend to the input text’s most relevant parts and generate more accurate predictions.
"""

In [None]:
article_template = """
Article:
  
{article}

----

You will generate increasingly concise, entity-dense summaries of the above
Article.

Repeat the following 2 steps 5 times.

- Step 1: Identify 1-3 informative Entities from the Article
which are missing from the previously generated summary and are the most
relevant.

- Step 2: Write a new, denser summary of identical length which covers
every entity and detail from the previous summary plus the missing entities

A Missing Entity is:

- Relevant: to the main story
- Specific: descriptive yet concise (5 words or fewer)
- Novel: not in the previous summary
- Faithful: present in the Article
- Anywhere: located anywhere in the Article

Guidelines:
- The first summary should be long (4-5 sentences, approx. 80 words) yet
highly non-specific, containing little information beyond the entities
marked as missing.

- Use overly verbose language and fillers (e.g. "this article discusses") to
reach approx. 80 words.

- Make every word count: re-write the previous summary to improve flow and
make space for additional entities.

- Make space with fusion, compression, and removal of uninformative phrases
like "the article discusses"

- The summaries should become highly dense and concise yet self-contained,
e.g., easily understood without the Article.

- Missing entities can appear anywhere in the new summary.

- Never drop entities from the previous summary. If space cannot be made,
add fewer new entities.

> Remember to use the exact same number of words for each summary.
Answer in JSON.

> The JSON in `summaries_per_step` should be a list (length 5) of
dictionaries whose keys are "missing_entities" and "denser_summary".
"""

article_prompt = PromptTemplate.from_template(article_template)
article_prompt.input_variables


In [None]:
introduction_template = """You are an {person} in writing rich and dense summaries in broad domains."""
introduction_prompt = PromptTemplate.from_template(introduction_template)
introduction_prompt.input_variables

In [None]:
input_prompts = [
    ("introductions", introduction_prompt),
    ("articles", article_prompt),
]

In [None]:
full_template = """{introductions}

{articles}

"""
full_prompt = PromptTemplate.from_template(full_template)
full_prompt.input_variables

In [None]:
pipeline_prompt = PipelinePromptTemplate(
    final_prompt=full_prompt, pipeline_prompts=input_prompts
)

In [None]:
pipeline_prompt.input_variables

In [None]:
print(
    pipeline_prompt.format(
        person="expert",
        article=article,
    )
)

In [None]:
output_parser = StrOutputParser()
chain = pipeline_prompt | llm | output_parser

In [None]:
result = chain.invoke({"person": "expert", "article": article})

In [None]:
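# Models sometimes wrap the JSON in a Markdown code fence; strip it before parsing
fence = "`" * 3
cleaned = result.strip().removeprefix(fence + "json").removeprefix(fence).removesuffix(fence).strip()
output = json.loads(cleaned)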

In [None]:
display(JSON(output, expanded=True))

In [None]:
print(json.dumps(output, indent=2))

In [None]:
print(len(output["summaries_per_step"]))

In [None]:
display(Markdown(output["summaries_per_step"][0]["denser_summary"]))

In [None]:
print(output["summaries_per_step"][0]["missing_entities"])
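Because `json.loads` only checks syntax, a small structural check can catch outputs that ignore the prompt's rules. This sketch uses a hypothetical helper, `check_cod_output`, that verifies the number of steps and that every entity listed as missing appears in its summary and in all later ones (the "never drop entities" rule). It also returns each summary's word count, so the "same number of words" rule can be checked by eye. The exact-substring match is a simplification that will flag paraphrased entities, and the helper assumes `missing_entities` is either a list or a `;`-delimited string.

```python
from typing import Any, Dict, List


def check_cod_output(output: Dict[str, Any], expected_steps: int = 5) -> List[int]:
    """Validate a Chain-of-Density result and return the word count of each summary.

    Raises ValueError if the structure is wrong or an entity introduced in one step
    is missing from that step's summary or any later one.
    """
    steps = output.get("summaries_per_step")
    if not isinstance(steps, list) or len(steps) != expected_steps:
        raise ValueError(f"expected 'summaries_per_step' to be a list of {expected_steps} steps")
    seen_entities: List[str] = []
    word_counts: List[int] = []
    for number, step in enumerate(steps, start=1):
        summary = step["denser_summary"]
        entities = step["missing_entities"]
        if isinstance(entities, str):
            # Assumption: a string of entities is ";"-delimited.
            entities = [e.strip() for e in entities.split(";") if e.strip()]
        seen_entities.extend(entities)
        # "Never drop entities": every entity seen so far must still appear.
        for entity in seen_entities:
            if entity.lower() not in summary.lower():
                raise ValueError(f"step {number}: entity {entity!r} is missing from the summary")
        word_counts.append(len(summary.split()))
    return word_counts
```

In the notebook, `check_cod_output(output)` would run on the parsed result; roughly equal word counts suggest the model respected the fixed-length rule.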

## Chain-of-verification (CoVe)

In the 2023 paper [Chain-of-Verification Reduces Hallucination in Large Language Models](https://arxiv.org/abs/2309.11495), the authors show how Chain-of-Verification (CoVe) can reduce hallucination through a four-step process:

1. Generate baseline response (query LLM)
1. Plan verifications (given the query and baseline response, generate a list of questions that help verify any mistakes in the original response)
1. Execute verifications (answer each verification question and check the answers against the original response for inconsistencies or mistakes)
1. Generate final verified response (generate a revised response that incorporates the results of the verification step if there are any inconsistencies)

More details here https://medium.com/@james.li/a-langchain-implementation-of-chain-of-verification-cove-to-reduce-hallucination-0a8fa2929b2a and https://mister-seo.com/chatgpt-faktencheck-chain-of-verification/

In [None]:
query = "Which German chancellors were born in Berlin?"


In [None]:
# 1. Generate baseline response chain

input_variables = ["query"]
base_response_output_key = "base_response"
base_response_template = """Question: {query} Answer:"""
base_response_prompt_template = PromptTemplate(
    input_variables=input_variables, template=base_response_template
)

base_response_chain = LLMChain(
    llm=llm,
    prompt=base_response_prompt_template,
    output_key=base_response_output_key,
)

In [None]:
# 2. Plan verification chain

plan_verifications_template = """
Given the question and answer below, generate a series of verification questions that test the factual claims in the original baseline response.
For example, if part of a longform model response contains the statement 
“Federal Chancellors Willy Brandt and Angela Merkel were born in Berlin. Brandt was born in 1913, Merkel in 1954”, 
then one possible verification question to check the birthplace claim could be “Were Chancellor Willy Brandt and Angela Merkel born in Berlin?”

Question: {query}
Answer: {base_response}

<fact in passage>, <verification question, generated by combining the query and the fact>

{format_instructions}
"""


class PlanVerificationsOutput(BaseModel):
    query: str = Field(description="The user's query")
    base_response: str = Field(description="The response to the user's query")
    facts_and_verification_questions: dict[str, str] = Field(
        description="Facts (as the dictionary keys) extracted from the response and verification questions related to the query (as the dictionary values)"
    )


plan_verifications_output_parser = PydanticOutputParser(
    pydantic_object=PlanVerificationsOutput
)

plan_verifications_prompt_template = PromptTemplate(
    input_variables=input_variables + [base_response_output_key],
    template=plan_verifications_template,
    partial_variables={
        "format_instructions": plan_verifications_output_parser.get_format_instructions()
    },
)
plan_verifications_chain = LLMChain(
    llm=llm,
    prompt=plan_verifications_prompt_template,
    output_key="output",
    output_parser=plan_verifications_output_parser,
)

In [None]:
# Steps 1 + 2: run the baseline and plan-verification chains in sequence

answer_and_plan_verification = SequentialChain(
    chains=[base_response_chain, plan_verifications_chain],
    input_variables=["query"],
    output_variables=["output"],
    verbose=True)


intermediate_result = answer_and_plan_verification.run(query)

In [None]:
intermediate_result.base_response

In [None]:
intermediate_result.facts_and_verification_questions

In [None]:
claimed_facts = list(intermediate_result.facts_and_verification_questions.keys())
verification_questions = list(
    intermediate_result.facts_and_verification_questions.values()
)

In [None]:
verification_questions

In [None]:
# 3. Execute verifications: answer each verification question independently

verify_results_str = ""
verify_input_variables = ["question"]
verify_output_key = "answer"
verify_template = """{question}"""

verify_prompt_template = PromptTemplate(
    input_variables=verify_input_variables, template=verify_template
)

verify_chain = LLMChain(
    llm=llm, prompt=verify_prompt_template, output_key=verify_output_key
)
for question in verification_questions:
    # Each question is asked without the baseline response, so the answer is not conditioned on it
    answer = verify_chain.run(question).strip()
    verify_results_str += f"Question: {question}\nAnswer: {answer}\n\n"

In [None]:
print(verify_results_str)
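The notebook stops after step 3. Step 4 only needs a prompt that combines the query, the baseline response, and the verification Q&A. The sketch below builds such a prompt; `build_final_response_prompt` and its wording are hypothetical, not taken from the paper.

```python
def build_final_response_prompt(query: str, base_response: str, verification_qa: str) -> str:
    """Build a prompt for CoVe step 4: revise the baseline answer using the verification Q&A."""
    return (
        "Below are a question, an original answer, and verification questions with their answers.\n"
        "Revise the original answer so that it is consistent with the verification answers. "
        "Remove any claim that the verification answers contradict.\n\n"
        f"Question: {query}\n"
        f"Original answer: {base_response}\n\n"
        f"Verification results:\n{verification_qa}\n"
        "Final verified answer:"
    )
```

With the notebook's objects, this would be called as `llm.invoke(build_final_response_prompt(query, intermediate_result.base_response, verify_results_str)).content`.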

## Tree-of-Thought

![](../assets/tot.webp)

In [None]:
prompt_11_template = """
Step 1:
 
I have a problem related to {input}. Could you brainstorm three distinct solutions? Please consider a variety of factors such as {perfect_factors}
A:
"""

prompt_11 = PromptTemplate.from_template(prompt_11_template)
prompt_11.input_variables


In [None]:
chain1 = LLMChain(
    llm=llm,
    prompt=prompt_11,
    output_key="solutions"
)

In [None]:
prompt_12_template ="""
Step 2:

For each of the three proposed solutions, evaluate their potential. 
Consider their pros and cons, initial effort needed, implementation difficulty, potential challenges, and the expected outcomes. 
Assign a probability of success and a confidence level to each option based on these factors

{solutions}

A:"""

prompt_12 = PromptTemplate.from_template(prompt_12_template)
prompt_12.input_variables

In [None]:
chain2 = LLMChain(
    llm=llm,
    prompt=prompt_12,
    output_key="review"
)

In [None]:
prompt_13_template ="""
Step 3:

For each solution, deepen the thought process. Generate potential scenarios, strategies for implementation, any necessary partnerships or resources, and how potential obstacles might be overcome. 
Also, consider any potential unexpected outcomes and how they might be handled.

{review}

A:"""

prompt_13 = PromptTemplate.from_template(prompt_13_template)
prompt_13.input_variables


In [None]:
chain3 = LLMChain(
    llm=llm,
    prompt=prompt_13,
    output_key="deepen_thought_process"
)

In [None]:
prompt_14_template ="""
Step 4:

Based on the evaluations and scenarios, rank the solutions in order of promise. 
Provide a justification for each ranking and offer any final thoughts or considerations for each solution
{deepen_thought_process}

A:"""

prompt_14 = PromptTemplate.from_template(prompt_14_template)
prompt_14.input_variables

In [None]:
chain4 = LLMChain(
    llm=llm,
    prompt=prompt_14,
    output_key="ranked_solutions"
)

In [None]:
overall_chain = SequentialChain(
    chains=[chain1, chain2, chain3, chain4],
    input_variables=["input", "perfect_factors"],
    output_variables=["ranked_solutions"],
    verbose=True
)

response = overall_chain.invoke({"input": "human colonization of Mars", "perfect_factors": "The distance between Earth and Mars is very large, making regular resupply difficult"})
display(JSON(response))

In [None]:
print(response['ranked_solutions'])
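The four-step chain above walks a single path through the ToT stages (propose, evaluate, deepen, rank). The Tree-of-Thoughts method (Yao et al., 2023) instead keeps several partial solutions ("thoughts") alive and searches over them. Below is a minimal breadth-first search that keeps only the best-rated thoughts at each level. In the real method an LLM proposes and rates thoughts. Here `propose`, `evaluate`, and `is_goal` are hypothetical stand-ins on a toy puzzle: reach 9 from 1 using `+1` and `*2`.

```python
from typing import Callable, List, Optional, Tuple

# A "thought" is a partial solution: (current value, operations applied so far).
Thought = Tuple[int, Tuple[str, ...]]


def tree_of_thoughts_bfs(
    root: Thought,
    propose: Callable[[Thought], List[Thought]],
    evaluate: Callable[[Thought], float],
    is_goal: Callable[[Thought], bool],
    beam_width: int = 2,
    max_depth: int = 5,
) -> Optional[Thought]:
    """Breadth-first search over thoughts, keeping the `beam_width` best-rated ones per level."""
    frontier = [root]
    for _ in range(max_depth):
        candidates = []
        for thought in frontier:
            for child in propose(thought):
                if is_goal(child):
                    return child
                candidates.append(child)
        # Evaluate and prune: keep only the most promising thoughts for the next level.
        candidates.sort(key=evaluate, reverse=True)
        frontier = candidates[:beam_width]
    return None


TARGET = 9


def propose(thought: Thought) -> List[Thought]:
    value, ops = thought
    return [(value + 1, ops + ("+1",)), (value * 2, ops + ("*2",))]


def evaluate(thought: Thought) -> float:
    return -abs(TARGET - thought[0])  # closer to the target scores higher


def is_goal(thought: Thought) -> bool:
    return thought[0] == TARGET


print(tree_of_thoughts_bfs((1, ()), propose, evaluate, is_goal))
# (9, ('+1', '*2', '*2', '+1'))
```

Raising `beam_width` explores more alternatives per level at the cost of more proposals and evaluations, which for an LLM means more model calls.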