# Prompt Engineering
<img src="../assets/module_4/pe_banner.jpg">

Prompt engineering is a new discipline focused on getting useful, reliable output from large language models (LLMs).

As a prompt engineer, you explore the capabilities and limitations of LLMs. But prompt engineering is about more than writing prompts: it is a combination of skills and techniques that lets you interact and innovate with LLMs.

In this module, we will learn the key principles of working with LLMs through prompts.

## Local Model using GPT4All
> GPT4All is an open-source software ecosystem that allows anyone to train and deploy powerful and customized large language models (LLMs) on everyday hardware. Nomic AI oversees contributions to the open-source ecosystem ensuring quality, security and maintainability.

It provides Python bindings that are easy to set up and use.

```python
!pip install gpt4all
```

For the OpenAI bindings:
```python
!pip install --upgrade openai
```

<a target="_blank" href="https://colab.research.google.com/github/raghavbali/llm_workshop_dhs23/blob/main/module_04/prompt_engineeering_and_langchain.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

In [1]:
!pip install gpt4all

Collecting gpt4all
  Downloading gpt4all-1.0.6-py3-none-macosx_10_9_universal2.whl (7.7 MB)
     |████████████████████████████████| 7.7 MB 2.3 MB/s eta 0:00:01
Installing collected packages: gpt4all
Successfully installed gpt4all-1.0.6
You should consider upgrading via the '/Users/r.bali/.pyenv/versions/3.8.11/envs/exp/bin/python3.8 -m pip install --upgrade pip' command.


In [1]:
import gpt4all
from IPython.display import display, Markdown
import openai
import json
import os



In [16]:
# NOTE: If you have access to the OpenAI API, set MODEL_TYPE to "OPENAI";
# the rest of the notebook works the same way.
MODEL_TYPE = "openLLAMA"  # or "OPENAI"

In [17]:
if MODEL_TYPE == "OPENAI":
    API_KEY = ""
    os.environ["OPENAI_API_KEY"] = API_KEY
    openai.organization = ""
    openai.api_key = os.environ["OPENAI_API_KEY"]
    llm_model = "gpt-3.5-turbo"
else:
    # llama quantized
    MODEL_NAME = "nous-hermes-13b.ggmlv3.q4_0.bin"
    #or "GPT4All-13B-snoozy.ggmlv3.q4_0.bin"
    llm_model = gpt4all.GPT4All(MODEL_NAME)

Found model file at  /Users/r.bali/.cache/gpt4all/nous-hermes-13b.ggmlv3.q4_0.bin


objc[83064]: Class GGMLMetalClass is implemented in both /Users/r.bali/.pyenv/versions/3.8.11/envs/exp/lib/python3.8/site-packages/gpt4all/llmodel_DO_NOT_MODIFY/build/libreplit-mainline-metal.dylib (0x1110bf208) and /Users/r.bali/.pyenv/versions/3.8.11/envs/exp/lib/python3.8/site-packages/gpt4all/llmodel_DO_NOT_MODIFY/build/libllamamodel-mainline-metal.dylib (0x111227208). One of the two will be used. Which one is undefined.
llama.cpp: loading model from /Users/r.bali/.cache/gpt4all/nous-hermes-13b.ggmlv3.q4_0.bin
llama_model_load_internal: format     = ggjt v3 (latest)
llama_model_load_internal: n_vocab    = 32001
llama_model_load_internal: n_ctx      = 2048
llama_model_load_internal: n_embd     = 5120
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 40
llama_model_load_internal: n_layer    = 40
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: ftype      = 2 (mostly Q4_0)
llama_model_load_internal: n_ff       = 13824
llama_model

In [18]:
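if MODEL_TYPE == "OPENAI":
    def get_completion(prompt, model):
        # Uses the pre-1.0 openai client API (openai.ChatCompletion);
        # `model` is a model name such as "gpt-3.5-turbo".
        messages = [{"role": "user", "content": prompt}]
        response = openai.ChatCompletion.create(
            model=model,
            messages=messages,
            temperature=0,  # keep the output as deterministic as possible
        )
        return response.choices[0].message["content"]
else:
    def get_completion(prompt, model):
        # `model` is a loaded gpt4all.GPT4All instance.
        response = model.generate(prompt, streaming=False)
        # json.dumps wraps the string in quotes and escapes newlines,
        # which is why some local-model outputs below appear in quotes.
        return json.dumps(response, indent=4)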

## Prompting Basics

+ Be Clear and Provide Specific Instructions
+ Allow Time to **Think**
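
The first principle, clear instructions with explicit delimiters, can be sketched as a small helper. This is an illustrative sketch only: `build_delimited_prompt` and `BACKTICKS` are hypothetical names, not part of GPT4All or OpenAI.

```python
BACKTICKS = "`" * 3  # triple backticks, built this way to keep the example readable


def build_delimited_prompt(instruction: str, text: str, delimiter: str = BACKTICKS) -> str:
    """Wrap `text` in delimiters so the model can tell instructions from input."""
    if delimiter in text:
        # If the delimiter occurs inside the text, the boundary becomes ambiguous.
        raise ValueError("text already contains the delimiter; pick another one")
    return f"{instruction}\n{delimiter}{text.strip()}{delimiter}\n"


prompt = build_delimited_prompt(
    "Summarize the text delimited by triple backticks into a single sentence.",
    "The Transformer is based solely on attention mechanisms.",
)
print(prompt)
```

Rejecting text that already contains the delimiter is one possible design choice; switching to a different delimiter would work as well.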



In [5]:
# Be Clear and Specific

# Example: Clearly state which text to look at, provide delimiters
text = """
The dominant sequence transduction models are based on complex recurrent or 
convolutional neural networks in an encoder-decoder configuration. The best 
performing models also connect the encoder and decoder through an attention 
mechanism. We propose a new simple network architecture, the Transformer, 
based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. 
Experiments on two machine translation tasks show these models to be superior in quality 
while being more parallelizable and requiring significantly less time to train.
"""

prompt = f"""
Summarize the text delimited by triple backticks \
into a single sentence. Identify key contributions.
```{text}```
"""
display(Markdown(f"> sample output {MODEL_TYPE}"))
print(get_completion(prompt, llm_model))

> sample output OPENAI

The text discusses the Transformer, a new network architecture that relies solely on attention mechanisms, eliminating the need for recurrent or convolutional neural networks, and shows that it outperforms existing models in machine translation tasks in terms of quality, parallelizability, and training time.


In [20]:
display(Markdown(f"> sample output {MODEL_TYPE}"))
print(get_completion(prompt, llm_model))

> sample output openLLAMA

"Key contributions: Propose a new simple network architecture called Transformer based solely on attention mechanisms, showing superiority over existing complex recurrent or convolutional neural networks while being faster and more efficient during training."


In [6]:
# Be Clear and Specific
text = """
The dominant sequence transduction models are based on complex recurrent or 
convolutional neural networks in an encoder-decoder configuration. The best 
performing models also connect the encoder and decoder through an attention 
mechanism. We propose a new simple network architecture, the Transformer, 
based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. 
Experiments on two machine translation tasks show these models to be superior in quality 
while being more parallelizable and requiring significantly less time to train.
"""
prompt = f"""
Summarize the text delimited by triple backticks \
into a single sentence. Provide response in markdown format
with a title for the summary.
```{text}```

"""
response = get_completion(prompt,llm_model)
display(Markdown(f"> sample output {MODEL_TYPE}"))
display(Markdown(response))

> sample output OPENAI

## Summary: 
The Transformer network architecture, which is based solely on attention mechanisms and does not use recurrence or convolutions, outperforms other models in terms of quality, parallelizability, and training time on machine translation tasks.

In [22]:
display(Markdown(f"> sample output {MODEL_TYPE}"))
display(Markdown(response))

> sample output openLLAMA

"Title: Summary of the Transformer architecture for sequence transduction"

In [7]:
# Be Clear and Specific, aka provide step by step instructions
text = """To make tea you first need to have a cup full of water,
half cup milk, some sugar and tea leaves. Start by boiling water.
Once it comes to a boil, add milk to it. Next step is to add tea and
let it boil for another minute.
Add sugar to taste. Serve in a tall glass
"""

prompt = f"""
Read the text delimited by triple single quotes.
Check if it contains a sequence of instructions, \
re-write the instructions in the following format:

Point 1 - ...
Point 2 - …
…
Point N - …

If the text does not contain a sequence of instructions, \
then apologize that you cannot rephrase such text.

'''{text}'''
"""

response = get_completion(prompt,llm_model)
display(Markdown(f"> sample output {MODEL_TYPE}"))
print(response)

> sample output OPENAI

Point 1 - To make tea, you first need to have a cup full of water, half cup milk, some sugar, and tea leaves.
Point 2 - Start by boiling water.
Point 3 - Once it comes to a boil, add milk to it.
Point 4 - The next step is to add tea and let it boil for another minute.
Point 5 - Add sugar to taste.
Point 6 - Serve in a tall glass.


> sample output openLLAMA

Point 1 - Boil the water

Point 2 - Add half cup of milk

Point 3 - Add some sugar

Point 4 - Let it boil for one more minute

Point 5 - Add tea leaves and let it steep for another five minutes.
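
Because the prompt fixes an output format, a response can be checked programmatically. A minimal sketch, assuming the response is plain text with one `Point N - ...` item per line; `parse_points` is a hypothetical helper, not part of any library used here.

```python
import re
from typing import List

POINT_PATTERN = re.compile(r"^Point (\d+) - (.+)$")


def parse_points(response: str) -> List[str]:
    """Return the steps of a 'Point N - ...' list, or raise if the format is violated."""
    steps: List[str] = []
    for line in response.splitlines():
        line = line.strip()
        if not line:
            continue  # blank lines between points are allowed
        match = POINT_PATTERN.match(line)
        if match is None:
            raise ValueError(f"line does not follow the format: {line!r}")
        if int(match.group(1)) != len(steps) + 1:
            raise ValueError(f"points are not numbered consecutively: {line!r}")
        steps.append(match.group(2))
    return steps


sample = "Point 1 - Boil the water\n\nPoint 2 - Add half cup of milk"
print(parse_points(sample))
```

A check like this only validates the format; whether the steps are faithful to the source text still has to be judged separately.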

In [9]:
# without instructions
# openAI
display(Markdown(f"> sample output {MODEL_TYPE}"))
get_completion('What are snakes?',llm_model)

> sample output OPENAI

'Snakes are elongated, legless reptiles belonging to the suborder Serpentes. They are characterized by their long, cylindrical bodies covered in scales, lack of limbs, and ability to move in a serpentine motion. Snakes are found in various habitats worldwide, except in Antarctica, Iceland, Ireland, and New Zealand. They come in a wide range of sizes, from tiny thread snakes measuring a few inches to large pythons and anacondas that can exceed 20 feet in length. Snakes are carnivorous and feed on a variety of prey, including rodents, birds, amphibians, and other reptiles. They have a unique way of capturing and consuming their food, using their highly flexible jaws to swallow prey whole. Some snakes are venomous, possessing specialized fangs and venom glands to immobilize or kill their prey, while others are non-venomous and rely on constriction to subdue their victims. Snakes play important roles in ecosystems as both predators and prey, and they have been the subject of fascination an

In [10]:
# Be Clear and Specific, aka provide examples
prompt = f"""
Your task is to answer in conversation style mentioned in triple back quotes.
Keep answers very short similar to examples provided below.

```
<kid>: What are birds?
<father>: birds are cute little creatures that can fly

<kid>: What are whales?
<father>: Whales are very big fish that roam the oceans
```

<kid>: What are snakes?
"""
response = get_completion(prompt,llm_model)
display(Markdown(f"> sample output {MODEL_TYPE}"))
print(response)

> sample output OPENAI

<father>: Snakes are long, slithery reptiles.


> sample output openLLAMA

```
<parent>: Snakes are slimy, scaly animals with no legs. They eat small creatures and can swallow them whole!
```

In [12]:
# Allow for time to think (similar to step by step instructions)
text = """
Our last holiday was in Germany. We visited Berlin and Hamburg.
"""
prompt = f"""
Summarize the text delimited by triple \
backticks briefly. Then follow the instructions :
1 - Translate the summary to German.
2 - List each city in the text.
3 - Output a python dictionary object that contains the following \
keys: original_text, german_translation, num_cities, city_names.

Text:
```{text}```
"""

response = get_completion(prompt,llm_model)
display(Markdown(f"> sample output {MODEL_TYPE}"))
print(response)

> sample output OPENAI

Summary: The text mentions that the last holiday was in Germany and the cities visited were Berlin and Hamburg.

German Translation: Unser letzter Urlaub war in Deutschland. Wir haben Berlin und Hamburg besucht.

City List: Berlin, Hamburg

Python Dictionary:
{
  "original_text": "Our last holiday was in Germany. We visited Berlin and Hamburg.",
  "german_translation": "Unser letzter Urlaub war in Deutschland. Wir haben Berlin und Hamburg besucht.",
  "num_cities": 2,
  "city_names": ["Berlin", "Hamburg"]
}
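
When a prompt asks for a Python dictionary, the dictionary usually arrives embedded in free text. The sketch below shows one way to pull it out safely, assuming the response contains a single dictliteral; `extract_dict` is a hypothetical helper and the sample string is a shortened version of the output above.

```python
import ast


def extract_dict(response: str) -> dict:
    """Parse the span from the first '{' to the last '}' as a Python literal."""
    start = response.find("{")
    end = response.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no dictionary found in response")
    # ast.literal_eval only accepts literals, so model output is never executed as code.
    parsed = ast.literal_eval(response[start:end + 1])
    if not isinstance(parsed, dict):
        raise ValueError("parsed object is not a dictionary")
    return parsed


sample_response = '''City List: Berlin, Hamburg

Python Dictionary:
{
  "original_text": "Our last holiday was in Germany. We visited Berlin and Hamburg.",
  "num_cities": 2,
  "city_names": ["Berlin", "Hamburg"]
}'''

result = extract_dict(sample_response)
print(result["city_names"])
```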


In [24]:
response = get_completion(prompt,llm_model)
display(Markdown(f"> sample output {MODEL_TYPE}"))
print(response)

> sample output openLLAMA

"Summary: Our trip to Germany included visits to Berlin and Hamburg."


In [15]:
# Allow time to think, aka ask LLM to generate its own answer and then compare

prompt = f"""
Determine if the user's solution delimited by triple backticks \
is correct or not.
To solve the problem the instructions are as follows:
- Step 1: prepare your own solution to the problem.
- Step 2: Compare your solution to the user's solution \
and evaluate if the user's solution is correct or not.
Do not decide if the solution is correct until
you have done the problem yourself.

Use the following format:
Question:
```
question here
```
User's solution:
```
student's solution here
```
Actual solution:
```
steps to work out the solution and your solution here
```
Is the user's solution the same as actual solution \
just calculated:
```
yes or no
```
Final Answer:
```
correct or incorrect
```

Question:
```
I went to the market and bought 10 apples.
I gave 2 apples to the neighbor and 2 to the repairman.
I then went and bought 5 more apples and ate 1. How many apples did I remain with?
```
User's solution:
```
1. I started with 10 apples.
2. I gave away 2 apples to the neighbor and 2 to the repairman, so now I have 6 apples left.
3. Then I bought 5 more apples, so now I have 11 apples.
4. I then ate 1 apple, so I will have only 10 apples with me.
```
Actual Answer:
"""

response = get_completion(prompt,llm_model)
display(Markdown(f"> sample output {MODEL_TYPE}"))
print(response)

> sample output OPENAI

1. I started with 10 apples.
2. I gave away 2 apples to the neighbor and 2 to the repairman, so now I have 6 apples left.
3. Then I bought 5 more apples, so now I have 11 apples.
4. I then ate 1 apple, so I will have only 10 apples with me.

Is the user's solution the same as actual solution just calculated:
yes

Final Answer:
correct
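
The arithmetic behind the expected verdict can be checked independently of any model by replaying the steps of the problem:

```python
# Replay the apple problem step by step, independent of any model.
apples = 10   # bought 10 apples
apples -= 2   # gave 2 to the neighbor
apples -= 2   # gave 2 to the repairman
apples += 5   # bought 5 more
apples -= 1   # ate 1

user_answer = 10  # final count claimed in the user's solution
print(apples, "correct" if apples == user_answer else "incorrect")
```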


In [26]:
response = get_completion(prompt,llm_model)
display(Markdown(f"> sample output {MODEL_TYPE}"))
print(response)

> sample output openLLAMA

"```\nYes, the user's solution is correct as it follows the given steps and arrives at the same answer as calculated above. So, the final number of apples remaining with you would be 10. The answer to the question \"How many apples did I remain with?\" is incorrect."


## Types of Prompts

<img src="../assets/module_4/pe_types.jpg">

### Zero-Shot Prompting
Zero-shot prompting means prompting without any examples. Since LLMs are trained on huge amounts of data and instructions, they work quite well without specific examples (shots) on common tasks such as summarization, sentiment classification, and grammar checks.

_Sample Prompt_:
```
Classify the text as neutral, positive or negative.
Text: The food at this restaurant is so bad.
Sentiment:

```
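
A zero-shot prompt like the one above can be assembled from a text and a label set. This is a small sketch; `zero_shot_prompt` is a hypothetical helper name.

```python
from typing import Sequence


def zero_shot_prompt(text: str, labels: Sequence[str]) -> str:
    """Build a classification prompt that contains an instruction but no examples."""
    if len(labels) < 2:
        raise ValueError("need at least two labels")
    label_list = ", ".join(labels[:-1]) + f" or {labels[-1]}"
    return (
        f"Classify the text as {label_list}.\n"
        f"Text: {text}\n"
        "Sentiment:"
    )


print(zero_shot_prompt(
    "The food at this restaurant is so bad.",
    ["neutral", "positive", "negative"],
))
```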

### Few-Shot Prompting
LLMs handle the basic instructions they were trained on well, but for more complex requirements they need some hand-holding in the form of a few examples to better understand the task.

_Sample Prompt_:
```
Superb drinks and amazing service! > Positive
I don't understand why this place is so expensive, worst food ever. > Negative
Totally worth it, tasty 100%. > Positive
This place is such an utter waste of time. >
```
**Note**: We did not explicitly instruct the LLM to do sentiment classification; instead, we gave it examples (few-shot) to help it understand the task.
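
The same pattern can be generated from a list of labelled examples. A minimal sketch, where `few_shot_prompt` is a hypothetical helper and the ` > ` separator follows the sample prompt above:

```python
from typing import List, Tuple


def few_shot_prompt(examples: List[Tuple[str, str]], query: str, sep: str = " > ") -> str:
    """Render labelled examples, then the unlabelled query for the model to complete."""
    lines = [f"{text}{sep}{label}" for text, label in examples]
    # Leave the label slot of the final line empty so the model fills it in.
    lines.append(f"{query}{sep}".rstrip())
    return "\n".join(lines)


examples = [
    ("Superb drinks and amazing service!", "Positive"),
    ("I don't understand why this place is so expensive, worst food ever.", "Negative"),
    ("Totally worth it, tasty 100%.", "Positive"),
]
print(few_shot_prompt(examples, "This place is such an utter waste of time."))
```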


### Chain of Thought (COT)
Tasks that are more complex and require a bit of *reasoning* (careful there 😉) need special measures. Chain-of-thought prompting, introduced in a paper of similar title by [Wei et al.](https://arxiv.org/abs/2201.11903), combines few-shot prompting with additional instructions for the LLM to think through the problem while generating the response.

_Sample Prompt_:
<img src="../assets/module_4/cot_few_shot.png">

> Source: [Wei et al.](https://arxiv.org/abs/2201.11903)

#### COT Zero Shot ✨
COT Zero Shot extends the COT setup: instead of providing examples of how to solve a problem, we explicitly add ``Let's think step by step`` to the prompt. This was introduced by [Kojima et al.](https://arxiv.org/abs/2205.11916).

_Sample Prompt_:
```
I went to the market and bought 10 apples.
I gave 2 apples to the neighbor and 2 to the repairman.
I then went and bought 5 more apples and ate 1. How many apples did I remain with?
Let's think step by step.
```
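
Turning any question into a zero-shot COT prompt only requires appending the trigger phrase. A tiny sketch, with `zero_shot_cot_prompt` as a hypothetical helper name:

```python
COT_TRIGGER = "Let's think step by step."


def zero_shot_cot_prompt(question: str, trigger: str = COT_TRIGGER) -> str:
    """Append the reasoning trigger to a question, with no worked examples."""
    return f"{question.strip()}\n{trigger}"


question = (
    "I went to the market and bought 10 apples.\n"
    "I gave 2 apples to the neighbor and 2 to the repairman.\n"
    "I then went and bought 5 more apples and ate 1. How many apples did I remain with?"
)
print(zero_shot_cot_prompt(question))
```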

## Advanced Prompting Techniques
Prompt Engineering or PE is an active area of research where new techniques
are being explored every day. Some of these are:

  - [Auto Chain of Thought](https://arxiv.org/abs/2210.03493)
  - [Majority Vote or Self-Consistency](https://arxiv.org/abs/2203.11171)
  - [Tree of Thoughts](https://arxiv.org/abs/2305.10601)
  - Retrieval-Augmented Generation (RAG)
  - [Auto Prompt Engineering (APE)](https://arxiv.org/abs/2211.01910)
  - [Multi-modal Prompting](https://arxiv.org/abs/2302.00923)
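
As an illustration of one item from this list, self-consistency samples several reasoning paths for the same question and keeps the most common final answer. The sketch below covers only the voting step; the sampled answers are made up for illustration and `majority_vote` is a hypothetical helper.

```python
from collections import Counter
from typing import Iterable, Tuple


def majority_vote(answers: Iterable[str]) -> Tuple[str, float]:
    """Return the most common answer and the fraction of samples that agree with it."""
    normalized = [a.strip().lower() for a in answers if a.strip()]
    if not normalized:
        raise ValueError("no answers to vote on")
    winner, count = Counter(normalized).most_common(1)[0]
    return winner, count / len(normalized)


# Hypothetical final answers extracted from five sampled completions.
sampled_answers = ["10", "10", "11", "10 ", "9"]
answer, agreement = majority_vote(sampled_answers)
print(answer, agreement)
```

In practice the samples would come from repeated calls with a non-zero temperature, and extracting the final answer from each completion is a separate step not shown here.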
  


## LangChain 🦜🔗
- [LangChain](https://python.langchain.com/docs/get_started/introduction.html) is a framework for developing LLM-powered applications.
- It provides capabilities to connect LLMs to a number of different data sources.
- It provides interfaces for language models to interact with their external environment (aka _agentic_ behavior).
- It provides the levels of abstraction required to design end-to-end applications.

In [7]:
!pip install langchain

You should consider upgrading via the '/Users/r.bali/.pyenv/versions/3.8.11/envs/exp/bin/python3.8 -m pip install --upgrade pip' command.


In [27]:
llm_model.config

{'systemPrompt': '',
 'promptTemplate': '### Instruction:\n{0}\n### Response:\n',
 'order': 'c',
 'md5sum': '4acc146dd43eb02845c233c29289c7c5',
 'name': 'Hermes',
 'filename': 'nous-hermes-13b.ggmlv3.q4_0.bin',
 'filesize': '8136777088',
 'requires': '2.4.7',
 'ramrequired': '16',
 'parameters': '13 billion',
 'quant': 'q4_0',
 'type': 'LLaMA',
 'description': '<strong>Extremely good model</strong><br><ul><li>Instruction based<li>Gives long responses<li>Curated with 300,000 uncensored instructions<li>Trained by Nous Research<li>Cannot be used commercially</ul>',
 'path': '/Users/r.bali/.cache/gpt4all/nous-hermes-13b.ggmlv3.q4_0.bin'}

In [28]:
from langchain import PromptTemplate, LLMChain
from langchain.llms import GPT4All
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

In [29]:
llm = GPT4All(
    model='/Users/r.bali/.cache/gpt4all/nous-hermes-13b.ggmlv3.q4_0.bin',  # or GPT4All-13B-snoozy.ggmlv3.q4_0.bin
    callbacks=[StreamingStdOutCallbackHandler()]
)

Found model file at  /Users/r.bali/.cache/gpt4all/nous-hermes-13b.ggmlv3.q4_0.bin


llama.cpp: loading model from /Users/r.bali/.cache/gpt4all/nous-hermes-13b.ggmlv3.q4_0.bin
llama_model_load_internal: format     = ggjt v3 (latest)
llama_model_load_internal: n_vocab    = 32001
llama_model_load_internal: n_ctx      = 2048
llama_model_load_internal: n_embd     = 5120
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 40
llama_model_load_internal: n_layer    = 40
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: ftype      = 2 (mostly Q4_0)
llama_model_load_internal: n_ff       = 13824
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 13B
llama_model_load_internal: ggml ctx size =    0.09 MB
llama_model_load_internal: mem required  = 9031.71 MB (+ 1608.00 MB per state)
llama_new_context_with_model: kv self size  = 1600.00 MB


In [19]:
template = """
You are a friendly chatbot assistant that responds in a conversational
manner to users' questions. Keep the answers short, unless specifically
asked by the user to elaborate on something.

Question: {question}

Answer:"""
prompt = PromptTemplate(template=template, input_variables=["question"])

llm_chain = LLMChain(prompt=prompt, llm=llm)

query = input("Prompt: ")
llm_chain(query)

Found model file at  /root/.cache/gpt4all/GPT4All-13B-snoozy.ggmlv3.q4_0.bin
Prompt: What is the capital of Germany
 The capital of Germany is Berlin

{'question': 'What is the capital of Germany',
 'text': ' The capital of Germany is Berlin'}
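
Conceptually, a prompt template fills named placeholders in a string. The sketch below shows that idea in plain Python; it assumes nothing about LangChain's internals, and `render_prompt` is a hypothetical helper.

```python
from string import Formatter

TEMPLATE = """
You are a friendly chatbot assistant. Keep the answers short.

Question: {question}

Answer:"""


def render_prompt(template: str, **variables: str) -> str:
    """Fill named placeholders, failing loudly if any of them is missing."""
    expected = {name for _, name, _, _ in Formatter().parse(template) if name}
    missing = expected - set(variables)
    if missing:
        raise KeyError(f"missing template variables: {sorted(missing)}")
    return template.format(**variables)


print(render_prompt(TEMPLATE, question="What is the capital of Germany"))
```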

## LangChain Conversation Buffer

LangChain provides an easy-to-use interface that lets LLMs refer to context (memory)
across multiple chains or calls.
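
The idea of a windowed buffer (keep the last `k` exchanges and prepend them to the next prompt) can be sketched without LangChain. `WindowMemory` below is a hypothetical, simplified stand-in, not LangChain's implementation.

```python
from collections import deque
from typing import Deque, Tuple


class WindowMemory:
    """Keep only the last k (human, ai) exchanges."""

    def __init__(self, k: int = 4) -> None:
        # A deque with maxlen silently drops the oldest exchange when full.
        self.turns: Deque[Tuple[str, str]] = deque(maxlen=k)

    def save(self, human: str, ai: str) -> None:
        self.turns.append((human, ai))

    def history(self) -> str:
        return "\n".join(f"Human: {h}\nAI: {a}" for h, a in self.turns)


memory = WindowMemory(k=2)
memory.save("Summarize the text", "Berlin and Hamburg were visited.")
memory.save("Translate to German", "Berlin und Hamburg wurden besucht.")
memory.save("List the cities", "Berlin, Hamburg")

# The oldest exchange has been dropped; the rest is prepended to the new input.
prompt = f"{memory.history()}\nHuman: Build a dictionary\nAssistant:"
print(prompt)
```

A small window keeps the prompt within the model's context length, at the cost of forgetting earlier turns.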

In [30]:
from langchain import LLMChain, PromptTemplate
from langchain.memory import ConversationBufferWindowMemory

In [31]:
template = """
{history}
Human: {human_input}
Assistant:"""

prompt = PromptTemplate(input_variables=["history", "human_input"], template=template)


chatgpt_chain = LLMChain(
    llm=llm,
    prompt=prompt,
    verbose=True,
    memory=ConversationBufferWindowMemory(k=4),
)

In [32]:
output = chatgpt_chain.predict(
    human_input="""Follow the instruction I specify as {instruction} on the
    text I specify as <text>
    My first instruction is {summarize text briefly} <We went on Holiday to \
    Germany. We visited Berlin and Hamburg"""
)
print(output)



> Entering new LLMChain chain...
Prompt after formatting:


H: Follow the instruction I specify as {instruction} on the
    text I specify as <text>
    My first instruction is {summarize text briefly} <We went on Holiday to     Germany. We visited Berlin and Hamburg
Assistant:
 Berlin and Hamburg are two cities in Germany that you visited during your holiday there.
> Finished chain.
 Berlin and Hamburg are two cities in Germany that you visited during your holiday there.


In [33]:
output = chatgpt_chain.predict(human_input="{Translate summary in German}")
print(output)



> Entering new LLMChain chain...
Prompt after formatting:

H: Follow the instruction I specify as {instruction} on the
    text I specify as <text>
    My first instruction is {summarize text briefly} <We went on Holiday to     Germany. We visited Berlin and Hamburg
AI:  Berlin and Hamburg are two cities in Germany that you visited during your holiday there.
Human: {Translate summary in German}
Assistant:
Das war eine schnelle Übersicht über unseren Urlaub nach Deutschland, wo wir Berlin und Hamburg besucht haben.
> Finished chain.
Das war eine schnelle Übersicht über unseren Urlaub nach Deutschland, wo wir Berlin und Hamburg besucht haben.


In [34]:
output = chatgpt_chain.predict(human_input="{Identify the name of cities}")
print(output)



> Entering new LLMChain chain...
Prompt after formatting:

H: Follow the instruction I specify as {instruction} on the
    text I specify as <text>
    My first instruction is {summarize text briefly} <We went on Holiday to     Germany. We visited Berlin and Hamburg
AI:  Berlin and Hamburg are two cities in Germany that you visited during your holiday there.
Human: {Translate summary in German}
AI: Das war eine schnelle Übersicht über unseren Urlaub nach Deutschland, wo wir Berlin und Hamburg besucht haben.
Human: {Identify the name of cities}
Assistant:
 The two cities you visited during your holiday to Germany are Berlin and Hamburg.
> Finished chain.
 The two cities you visited during your holiday to Germany are Berlin and Hamburg.


In [35]:
output = chatgpt_chain.predict(human_input="prepare a python dictionary with keys original_text, city_names, summary, german_summary")
print(output)



> Entering new LLMChain chain...
Prompt after formatting:

H: Follow the instruction I specify as {instruction} on the
    text I specify as <text>
    My first instruction is {summarize text briefly} <We went on Holiday to     Germany. We visited Berlin and Hamburg
AI:  Berlin and Hamburg are two cities in Germany that you visited during your holiday there.
Human: {Translate summary in German}
AI: Das war eine schnelle Übersicht über unseren Urlaub nach Deutschland, wo wir Berlin und Hamburg besucht haben.
Human: {Identify the name of cities}
AI:  The two cities you visited during your holiday to Germany are Berlin and Hamburg.
Human: prepare a python dictionary with keys original_text, city_names, summary, german_summary
Assistant:
 {'original_text': 'We went on Holiday to Germany. We visited Berlin and Hamburg', 'city_names': ['Berlin','Hamburg'],'summarize text briefly':'Das war eine schnelle Übersicht über unseren Urlaub nach Deutschland, wo wir Berl

## Beyond LangChain

### [LlamaIndex](https://www.llamaindex.ai/)
Similar to LangChain, LlamaIndex provides utilities to extend the power of LLMs through various integrations for:
- Data ingestion
- Data indexing
- Querying
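
The ingest, index, query flow can be illustrated with a toy keyword index in plain Python. This sketch does not use the LlamaIndex API; all function names and the sample documents are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set


def ingest(raw_docs: Dict[str, str]) -> Dict[str, List[str]]:
    """Ingestion: turn raw text into lowercase word tokens per document."""
    return {doc_id: text.lower().replace(".", " ").split() for doc_id, text in raw_docs.items()}


def build_index(tokens: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Indexing: map each word to the set of documents that contain it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, words in tokens.items():
        for word in words:
            index[word].add(doc_id)
    return index


def query(index: Dict[str, Set[str]], question: str) -> List[str]:
    """Querying: rank documents by how many question words they contain."""
    scores: Dict[str, int] = defaultdict(int)
    for word in question.lower().replace("?", " ").split():
        for doc_id in index.get(word, ()):
            scores[doc_id] += 1
    return sorted(scores, key=lambda d: (-scores[d], d))


docs = {
    "trip": "Our last holiday was in Germany. We visited Berlin and Hamburg.",
    "tea": "Start by boiling water. Add milk, tea and sugar.",
}
index = build_index(ingest(docs))
print(query(index, "Which cities in Germany?"))
```

The retrieved documents would then be placed into the prompt as context for the LLM.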

### [LangSmith](https://docs.smith.langchain.com/)
LangSmith helps build production-grade applications by providing tools and utilities for:
- Debugging
- Testing
- Integrations
- Token usage

### [HuggingFace](https://huggingface.co/models?other=LLM)
The de facto standard for not just LLMs but large models across NLP, computer vision, and more.
Libraries such as ``transformers``, ``diffusers``, and ``accelerate`` make it easier to work
with deep learning models in PyTorch/TensorFlow. Hugging Face also provides model cards and Spaces
for hosting and running models in the cloud for free.

### [LLM-Foundry](https://github.com/mosaicml/llm-foundry)
MosaicML released their own GPT-style models, which use [FlashAttention](https://arxiv.org/pdf/2205.14135.pdf) and [FasterTransformer](https://github.com/NVIDIA/FasterTransformer) for efficient training and inference,
along with ALiBi for extended context lengths (65k+ tokens). LLM-Foundry is the package built to support their implementations
for training and fine-tuning LLMs.


