# Prompt Engineering with Llama 2 & 3
***

## Models

- **Base/foundational models** - pure next word predictors, useful for developers who want to continue training for a particular task.

- **Chat models** - ideal for chatbots and question/answering tasks, further trained on instructions.

- **Code models** - further trained on code.


## Overview

Llama 2 chat models come in 3 sizes:
1. Llama 2 7B chat
2. Llama 2 13B chat
3. Llama 2 70B chat

The larger the model, the more capacity it has to learn from its training data. Chat models are instruction-tuned on natural-language instructions (e.g. "summarize this", "tell me a joke"). **It is more common to use the base foundation models for fine-tuning.**


**Performance:**

|                              | Llama 2 | Falcon 40B | GPT-3.5 |
|------------------------------|---------|------------|---------|
| Performance (MMLU benchmark) | 68.9    | 55.4       | 70      |


<br>**Code Llama models:**

![Optional Title](imgs/code_llama_0.png)

* Created by taking Llama 2 and training it for coding tasks. Comes in 3 sizes (7B, 13B, 34B).

* For each size, there is a base version and an instruction fine-tuned version.

* Base code Llama models are derived from the non-chat Llama models. Can be used for autocompleting/filling code.

* Instruction fine-tuned models are created by training the Llama chat models, so they exhibit more human-like behavior (e.g. "help me write some code to build some web app" or "please debug the following code I just wrote"). They are good at giving human-readable explanations of what the code is doing.

* There is also Code Llama Python, which is specialized for Python coding.

**Purple Llama**

Umbrella project for Generative AI Safety
* tools
* models
* benchmarks

Two goals: 
* Make sure LLM outputs are safe to run on computers (cybersecurity)
* Make sure LLM models are generating content that isn't harmful/toxic to humans.

1. **CyberSecEval** - a set of tools and benchmark datasets to check whether code completion tools are generating secure code that guards against viruses and other cyber threats.
2. **Llama Guard** - Takes input/output of any LLM to detect harmful/toxic content.

## Getting Started

* *Instruction tags*: Llama chat models expect prompts wrapped in instruction tags (`[INST]` ... `[/INST]`). The `query` helper used below applies this recommended formatting for you.

* Foundation model - https://huggingface.co/meta-llama/Llama-2-7b-hf

* Chat model - https://huggingface.co/meta-llama/Llama-2-7b-chat-hf

* Try setting `repetition_penalty=1.2`, `top_p=0.95`, or `temperature=0.8`.
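
For intuition about `top_p`, the sketch below shows nucleus (top-p) filtering as it is commonly described: keep the smallest set of most likely tokens whose cumulative probability reaches `top_p`, then renormalize. The function name `top_p_filter` and the toy probabilities are illustrative assumptions; these notes do not show how the hosted model applies the setting internally.

```python
def top_p_filter(probs, top_p):
    """Keep the most likely tokens until their cumulative probability reaches top_p.

    Returns a dict mapping token index -> renormalized probability.
    """
    # Visit token indices from most to least likely.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    # Renormalize so the kept probabilities sum to 1 again.
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}


# Hypothetical next-token probabilities for a four-token vocabulary.
toy_probs = [0.5, 0.3, 0.15, 0.05]
print(top_p_filter(toy_probs, top_p=0.9))
```

A lower `top_p` discards more of the unlikely tokens, which tends to make output more focused; a value close to 1 leaves almost the whole distribution available for sampling.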

In [1]:
from utils import query

# Using the base (foundation) model: https://huggingface.co/meta-llama/Llama-2-7b-hf

model_name = "meta-llama/Llama-2-7b-hf"

response = query("What's the capital of Spain?", model_name, temperature=0.7, verbose=True)

Model: meta-llama/Llama-2-7b-hf
Temperature: 0.7
Prompt: [INST]What's the capital of Spain?[/INST]

max_tokens: 1024
top_p: 0.95


In [6]:
response

'No, it isn\'t. Madrid is not in Latin America either. It belongs to Europe and there are many differences between these two continents.\nThen what are you complaining about if I say that Spanish-speaking countries use Spanish as their official language and English speaking countries do so too with English? You don\'t have a point. You sound like someone who says "I hate Americans" while living on US soil!\nBtw, my last post was made by me'

3 ways to access Llama:

1. Hosted API service (e.g. Hugging Face, Bedrock, GCP, Azure, Together.ai, etc)
2. Self-configured Cloud
3. Host on your own computer

**Prompting Llama models**

![Optional Title](imgs/llama_0.png)

For most use-cases, use the Llama chat models instead of the base foundation models. 

The foundational models do not understand the *instruction tags*.

If you ask the foundational model "What is the capital of France?", it will return something like "What is the capital of Spain?", etc.


Foundational models **learn to predict the next word given the words that came before it.**

The chat model might return a more human-like response to the question. The chat model was trained using instruction tags with human-like responses to queries.

If you want consistent responses given the same input prompt, set `temperature` to 0. This will have the Llama model behave deterministically.
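
To see why a temperature of 0 gives repeatable output, here is a minimal sketch of temperature-scaled softmax, the standard way temperature is usually described. The function name and the toy scores are assumptions for illustration; the hosted model's actual sampling code is not shown in these notes.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; lower temperature sharpens the distribution."""
    if temperature <= 0:
        # Treat temperature 0 as greedy decoding: all probability on the top score.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for three candidate next tokens.
toy_logits = [2.0, 1.0, 0.1]
for t in (0.0, 0.4, 1.0):
    probs = softmax_with_temperature(toy_logits, t)
    print(t, [round(p, 3) for p in probs])
```

At temperature 0 the same token is always chosen, so the same prompt yields the same continuation. Higher temperatures spread probability across more tokens, so repeated calls can differ.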

In [2]:
# test temperature
response = query("What is the capital of France?", model_name, temperature=0.4, verbose=True)

Model: meta-llama/Llama-2-7b-hf
Temperature: 0.4
Prompt: [INST]What is the capital of France?[/INST]

max_tokens: 1024
top_p: 0.95


In [3]:
response

'Paris\nParis is the capital of France.\nParis is the capital of France.Paris is the capital of France.\n[INST]What is the capital of France?[/INST'

Now let's look at `max_tokens`, which the helper function sets to 1024 by default. On average, 1 token is about 3/4 of a word.


Note that setting a smaller number of tokens doesn't make the model give its complete answer more succinctly. It just stops partway through its answer.

LLMs have a limit to how many tokens they can take as input, and give as output.

Note that the `input` tokens plus `max_new_tokens` (the output tokens, which you set through `max_tokens`) must be less than or equal to (<=) 4097, the limit applied to the Llama 2 7B Chat model here (the model's own *context size* is 4096 tokens). Make sure that input + output doesn't exceed this. Llama will not trim the request for you, and the call may return a validation error. For example, if your input is 3974 tokens, set `max_tokens` to at most 123 so that you do not pass the 4097-token limit.
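
The budget arithmetic can be sketched as follows. The helper names are hypothetical, `CONTEXT_LIMIT` simply reuses the 4097 figure quoted above (check the value your provider enforces), and `estimate_tokens` applies only the rough "1 token is about 3/4 of a word" rule of thumb rather than a real tokenizer.

```python
import math

# Limit quoted in the text above; an assumption to verify against your provider.
CONTEXT_LIMIT = 4097


def estimate_tokens(text, words_per_token=0.75):
    """Rough token estimate from the ~3/4 word per token rule of thumb."""
    n_words = len(text.split())
    return math.ceil(n_words / words_per_token)


def max_new_tokens_allowed(input_tokens, context_limit=CONTEXT_LIMIT):
    """Largest output budget that keeps input + output within the limit."""
    return max(context_limit - input_tokens, 0)


print(max_new_tokens_allowed(3974))  # 123, matching the example in the text
print(estimate_tokens("What is the capital of France?"))
```

For exact counts you would use the model's own tokenizer; the word-based estimate is only good enough for a quick sanity check before sending a long prompt.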

In [4]:
# test max_tokens
response = query("What is the capital of France?", model_name, max_tokens=5, temperature=0.4, verbose=True)

Model: meta-llama/Llama-2-7b-hf
Temperature: 0.4
Prompt: [INST]What is the capital of France?[/INST]

[{'generated_text': '[INST]What is the capital of France?[/INST]\nAns: Paris.'}]


In [6]:
response

'Ans: Paris.'

**Summarizing a book**

In [1]:
with open("data/test_book.txt", "r", encoding="utf-8") as file:
    text = file.read()

In [4]:
from utils import query

prompt = f"""
Give me a summary of the following text in 100 words:\n\n
{text}
"""


model_name = "meta-llama/Llama-2-7b-hf"

response = query(prompt, model_name, temperature=0.4, verbose=False)

In [3]:
response

'### [INST]\n\nGive me a summary of the following text in 100 words:\n\n\nTitle: The Lost World\n\nAuthor: Sir Arthur Conan Doyle\n\nIn "The Lost World," Sir Arthur Conan Doyle crafts a thrilling tale of adventure and discovery in the heart of the Amazon jungle. The story follows Professor Challenger, a renowned explorer and scientist, who leads a team of'

## Multi-turn Conversations


In [1]:
from utils import query

prompt = f"""
Give me 5 fun activities I can do this weekend.
"""


model_name="meta-llama/Llama-2-7b-chat-hf"

response = query(prompt, model_name, temperature=0.7, verbose=False)

In [2]:
response

"Of course! Here are 5 fun activities you can do this weekend:\n\n1. Outdoor Adventure: Plan a hike, camping trip, or a picnic in a nearby park or nature reserve. Being in nature can be incredibly rejuvenating and relaxing.\n2. Game Night: Invite some friends over for a game night. You can play board games, card games, or even video games. It's a great way to social"

**Constructing multi-turn prompts**

General form of multi-turn chat prompt:

```python
prompt_chat = f"""
    User: {prompt_1}
    Assistant: {response_1}
    User: {prompt_2}
    Assistant: {response_2}
    User: {prompt_3}
"""
```

With each turn, you add a new prompt-response pair.

Llama-2 form of multi-turn chat prompt:


```python
prompt_chat = f"""
    <s>[INST]{user_prompt_1}[/INST]
    {model_response_1}</s>
    <s>[INST]{user_prompt_2}[/INST]
    {model_response_2}</s>
    ...
    <s>[INST]{user_prompt_3}[/INST]
"""
```

Chat prompt ends with latest input from the user. 

You wrap each prompt-response pair with a set of start tags `<s>` and end tags `</s>`.

You open the last prompt with a start tag `<s>`, but this time you don't close it with an end tag: the turn is not over, because you want the model to respond.
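
The tagging rules above can be sketched as a small function. `build_chat_prompt` is a hypothetical name used only for illustration, and the exact whitespace around the tags is an assumption.

```python
def build_chat_prompt(prompts, responses):
    """Assemble a Llama 2 style multi-turn prompt from past turns plus the newest prompt."""
    if len(prompts) != len(responses) + 1:
        raise ValueError("expected exactly one more prompt than responses")
    parts = []
    # Completed turns: wrap each prompt-response pair in <s> ... </s>.
    for user_msg, model_msg in zip(prompts, responses):
        parts.append(f"<s>[INST] {user_msg} [/INST]\n{model_msg}\n</s>")
    # Latest turn: opened with <s> but left unclosed so the model responds.
    parts.append(f"<s>[INST] {prompts[-1]} [/INST]")
    return "\n".join(parts)


print(build_chat_prompt(
    ["What can I do this weekend?", "Which of these is good for my health?"],
    ["You could go hiking or host a game night."],
))
```

Raising an error on mismatched list lengths catches the common mistake of forgetting to append the previous response before asking the next question.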

In [14]:
prompt_1 = """
    What are some fun activities I can doo this weekend??
"""
response_1 = query(prompt_1, max_tokens=3000, verbose=True)
print(response_1)

Model: meta-llama/Llama-2-7b-chat-hf
Temperature: 0.8
Prompt: [INST]
    What are some fun activities I can doo this weekend??
[/INST]

max_tokens: 3000
top_p: 0.95
There are so many fun activities you can do this weekend, depending on your interests and preferences! Here are some ideas to get you started:

1. Outdoor Adventures: Go hiking, camping, or kayaking in a nearby park or nature reserve. Rent a bike and explore a new trail or explore the local beaches and go swimming.
2. Cultural Events: Attend a concert, play, or festival in your area.


In [16]:
print(response_1)

There are so many fun activities you can do this weekend, depending on your interests and preferences! Here are some ideas to get you started:

1. Outdoor Adventures: Go hiking, camping, or kayaking in a nearby park or nature reserve. Rent a bike and explore a new trail or explore the local beaches and go swimming.
2. Cultural Events: Attend a concert, play, or festival in your area.


In [17]:
prompt_2 = """
Which of these would be good for my health?
"""

In [18]:
chat_prompt = f"""
<s>[INST] {prompt_1} [/INST]
{response_1}
</s>
<s>[INST] {prompt_2} [/INST]
"""
print(chat_prompt)


<s>[INST] 
    What are some fun activities I can doo this weekend??
 [/INST]
There are so many fun activities you can do this weekend, depending on your interests and preferences! Here are some ideas to get you started:

1. Outdoor Adventures: Go hiking, camping, or kayaking in a nearby park or nature reserve. Rent a bike and explore a new trail or explore the local beaches and go swimming.
2. Cultural Events: Attend a concert, play, or festival in your area.
</s>
<s>[INST] 
Which of these would be good for my health?
 [/INST]



In [19]:
# add_inst=False because the helper function already adds
# the instruction tags for single-turn chat;
# here we are constructing a multi-turn chat prompt,
# so we put the instruction tags in ourselves
response_2 = query(chat_prompt,
                 add_inst=False,
                 verbose=True)

Model: meta-llama/Llama-2-7b-chat-hf
Temperature: 0.8
Prompt: 
<s>[INST] 
    What are some fun activities I can doo this weekend??
 [/INST]
There are so many fun activities you can do this weekend, depending on your interests and preferences! Here are some ideas to get you started:

1. Outdoor Adventures: Go hiking, camping, or kayaking in a nearby park or nature reserve. Rent a bike and explore a new trail or explore the local beaches and go swimming.
2. Cultural Events: Attend a concert, play, or festival in your area.
</s>
<s>[INST] 
Which of these would be good for my health?
 [/INST]


max_tokens: 1024
top_p: 0.95


In [20]:
print(response_2)

All of the activities I mentioned can be good for your health, depending on your current level of physical activity and any health considerations you may have. Here are some specific health benefits of each activity:

1. Hiking: Hiking is a great way to get cardiovascular exercise, build muscle strength and endurance, and improve flexibility. It can also help reduce stress and improve mental health.
2. Camping: Camping can provide opportunities


There will always be one more prompt than responses: each earlier prompt is paired with its response, and the final prompt is the one we are sending to the model to get a new response back. Let's try using the `chat` utils function.

In [1]:
from utils import query, chat

In [7]:
prompt_1 = """
    What are some fun activities I can doo this weekend??
"""

response_1 = query(prompt_1)

In [8]:
prompt_2 = """
Which of these would be good for my health?
"""

In [9]:
prompts = [prompt_1, prompt_2]
responses = [response_1]

In [10]:
response_2 = chat(prompts, responses, verbose=True)

Chat Prompt:
<s>[INST] 
    What are some fun activities I can doo this weekend??
 [/INST]
There are plenty of fun activities you can do this weekend! Here are some ideas:

1. Outdoor Adventures: Go for a hike, have a picnic in a nearby park, or go camping in the mountains.
2. Indoor Games: Host a game night with friends, play board games or card games, or try escape room challenges.
3. Movie Night: Rent a classic movie or a new release and have a movie marathon
</s>
<s>[INST] 
Which of these would be good for my health?
 [/INST]



In [13]:
response_2

'All of the activities I mentioned can be good for your health, depending on your individual needs and preferences. Here are some specific benefits of each activity:\n\n1. Outdoor Adventures: Spending time in nature has been shown to have numerous health benefits, including reducing stress levels, improving mood, and boosting the immune system. Being physically active outdoors can also improve cardiovascular health and overall fitness.\n2. Indoor Games'

Let's keep going...

In [14]:
prompt_3 = "Which of these activities would be fun with friends?"

prompts = [prompt_1, prompt_2, prompt_3]
responses = [response_1, response_2]

response_3 = chat(prompts=prompts, responses=responses, verbose=True)

Chat Prompt:
<s>[INST] 
    What are some fun activities I can doo this weekend??
 [/INST]
There are plenty of fun activities you can do this weekend! Here are some ideas:

1. Outdoor Adventures: Go for a hike, have a picnic in a nearby park, or go camping in the mountains.
2. Indoor Games: Host a game night with friends, play board games or card games, or try escape room challenges.
3. Movie Night: Rent a classic movie or a new release and have a movie marathon
</s>
<s>[INST] 
Which of these would be good for my health?
 [/INST]
All of the activities I mentioned can be good for your health, depending on your individual needs and preferences. Here are some specific benefits of each activity:

1. Outdoor Adventures: Spending time in nature has been shown to have numerous health benefits, including reducing stress levels, improving mood, and boosting the immune system. Being physically active outdoors can also improve cardiovascular health and overall fitness.
2. Indoor Games
</s>
<s>[IN
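
The bookkeeping used across the turns above (append the new prompt, call the model with the full history, append its reply) can be wrapped in a loop. In this sketch `run_conversation` and `fake_model` are hypothetical names; `fake_model` is a stand-in that makes no real model call.

```python
def run_conversation(user_turns, ask_model):
    """Feed user turns to `ask_model` one at a time, carrying the history along."""
    prompts, responses = [], []
    for turn in user_turns:
        prompts.append(turn)
        # At this point there is exactly one more prompt than responses.
        assert len(prompts) == len(responses) + 1
        reply = ask_model(prompts, responses)
        responses.append(reply)
    return prompts, responses


def fake_model(prompts, responses):
    """Stand-in for a real model call; it only reports how much history it saw."""
    return f"reply to turn {len(prompts)} (saw {len(responses)} earlier replies)"


prompts, responses = run_conversation(
    ["What can I do this weekend?", "Which of these is healthy?"], fake_model
)
print(responses)
```

To talk to a real model, `ask_model` could be a thin wrapper around the `chat` helper used above, which is called with the same two lists.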

## Prompt Engineering Best Practices

**Prompt Engineering**: The science and art of communicating with an LLM, so that it responds and behaves in a way that's useful for you.

You can guide the model to improve its response for your task through specific instructions or by including different kinds of information, or "context", e.g.
* Providing **examples of the task** you are trying to carry out
* Specifying how to **format responses**
* Requesting that the model assume a particular **"role or persona"** when creating its response
* Including **additional information or data** for the model to use in its response

### Zero-shot Prompting

- Here is an example of zero-shot prompting.
- You are prompting the model to see if it can infer the task from the structure of your prompt.
- In zero-shot prompting, you only provide the structure to the model, but without any examples of the completed task.


In [15]:
from utils import query, chat

prompt = """
Message: Hi Amit, thanks for the thoughtful birthday card!
Sentiment: ?
"""
response = query(prompt)
print(response)

The sentiment in the message "Hi Amit, thanks for the thoughtful birthday card!" is "Gratitude".


**In-context learning**: LLMs can determine the task you want them to perform from the structure of your prompt or from examples in it. When the prompt contains only the structure and no completed examples, this is called **zero-shot prompting**. Some LLMs cannot do this: they may fall back to their base behavior and just continue generating text. You can build upon this by providing one or more examples of what you're trying to do, which can help the model **infer the task**.

### Few-shot Prompting

- Here is an example of few-shot prompting.
- In few-shot prompting, you not only provide the structure to the model, but also two or more examples.
- You are prompting the model to see if it can infer the task from the structure, as well as the examples in your prompt.
- Prompting with 1 example is called **one-shot prompting**.
- Prompting with more than 1 is called **few-shot prompting** or **n-shot prompting**.
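
A few-shot prompt of this shape can also be assembled from a list of labeled examples. The helper name `build_few_shot_prompt` is an assumption for illustration; the `Message:` / `Sentiment:` layout follows the structure used in these notes.

```python
def build_few_shot_prompt(examples, new_message):
    """Format labeled examples, then the unlabeled message the model should complete."""
    blocks = [f"Message: {msg}\nSentiment: {label}" for msg, label in examples]
    # The final block leaves the label open for the model to fill in.
    blocks.append(f"Message: {new_message}\nSentiment: ?")
    return "\n\n".join(blocks)


examples = [
    ("Hi Dad, you're 20 minutes late to my piano recital!", "Negative"),
    ("Can't wait to order pizza for dinner tonight", "Positive"),
]
print(build_few_shot_prompt(examples, "Hi Amit, thanks for the thoughtful birthday card!"))
```

Passing an empty `examples` list gives a zero-shot prompt, and a single example gives a one-shot prompt, so the same helper covers all three cases.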

In [17]:
prompt = """
Message: Hi Dad, you're 20 minutes late to my piano recital!
Sentiment: Negative

Message: Can't wait to order pizza for dinner tonight
Sentiment: Positive

Message: Hi Amit, thanks for the thoughtful birthday card!
Sentiment: ?
"""
response = query(prompt)
print(response)

Sure, here are the sentiments for each message:

1. Message: Hi Dad, you're 20 minutes late to my piano recital!
Sentiment: Negative
2. Message: Can't wait to order pizza for dinner tonight
Sentiment: Positive
3. Message: Hi Amit, thanks for the thoughtful birthday card!
Sentiment: Positive


In [18]:
response

"Sure, here are the sentiments for each message:\n\n1. Message: Hi Dad, you're 20 minutes late to my piano recital!\nSentiment: Negative\n2. Message: Can't wait to order pizza for dinner tonight\nSentiment: Positive\n3. Message: Hi Amit, thanks for the thoughtful birthday card!\nSentiment: Positive"

### Specifying the Output Format

- You can also specify the format in which you want the model to respond.
- In the example below, you are asking to "give a one word response".

In [None]:
# Assumption: utils also exposes the `llama` helper used from here on
from utils import llama

prompt = """
Message: Hi Dad, you're 20 minutes late to my piano recital!
Sentiment: Negative

Message: Can't wait to order pizza for dinner tonight
Sentiment: Positive

Message: Hi Amit, thanks for the thoughtful birthday card!
Sentiment: ?

Give a one word response.
"""
response = llama(prompt)
print(response)

**Note:** For all the examples above, you used the 7 billion parameter model, `llama-2-7b-chat`. And as you saw in the last example, the 7B model was uncertain about the sentiment.

- You can use the larger (70 billion parameter) `llama-2-70b-chat` model to see if you get a better, more certain response:

In [None]:
prompt = """
Message: Hi Dad, you're 20 minutes late to my piano recital!
Sentiment: Negative

Message: Can't wait to order pizza for dinner tonight
Sentiment: Positive

Message: Hi Amit, thanks for the thoughtful birthday card!
Sentiment: ?

Give a one word response.
"""
response = llama(prompt,
                model="togethercomputer/llama-2-70b-chat")
print(response)

- Now, use the smaller model again, but adjust your prompt to help the model understand what is expected of it.
- Restrict the model's output format to choose from `positive`, `negative` or `neutral`.

In [None]:
prompt = """
Message: Hi Dad, you're 20 minutes late to my piano recital!
Sentiment: Negative

Message: Can't wait to order pizza for dinner tonight
Sentiment: Positive

Message: Hi Amit, thanks for the thoughtful birthday card!
Sentiment: 

Respond with either positive, negative, or neutral.
"""
response = llama(prompt)
print(response)
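
Even with a restricted output format, the reply may come back as a full sentence, so downstream code often normalizes it. The sketch below is one hedged way to do that; `parse_sentiment` and `ALLOWED` are hypothetical names, and returning `None` for ambiguous replies is a design assumption.

```python
import re

ALLOWED = ("positive", "negative", "neutral")


def parse_sentiment(response):
    """Return the single allowed label mentioned in the reply, or None if unclear."""
    text = response.lower()
    # Match whole words only, so a label embedded in a longer word is not counted.
    found = [label for label in ALLOWED if re.search(rf"\b{label}\b", text)]
    return found[0] if len(found) == 1 else None


print(parse_sentiment("Sentiment: Positive"))
print(parse_sentiment("It could be positive or neutral."))
```

Treating replies that mention zero or several labels as `None` makes it easy to retry or flag them instead of silently picking one.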

### Role Prompting

- Roles give LLMs context about what type of answers are desired.
- Llama 2 often gives more consistent responses when provided with a role.
- First, try a standard prompt and see the response.

In [None]:
prompt = """
How can I answer this question from my friend:
What is the meaning of life?
"""
response = llama(prompt)
print(response)

- Now, try giving the model a "role" and, within the role, a "tone" in which it should respond.

In [None]:
role = """
Your role is a life coach \
who gives advice to people about living a good life. \
You attempt to provide unbiased advice.
You respond in the tone of an English pirate.
"""

prompt = f"""
{role}
How can I answer this question from my friend:
What is the meaning of life?
"""
response = llama(prompt)
print(response)

### Summarization
- Summarizing a large text is another common use case for LLMs. Let's try that!

In [None]:
email = """
Dear Amit,

An increasing variety of large language models (LLMs) are open source, or close to it. The proliferation of models with relatively permissive licenses gives developers more options for building applications.

Here are some different ways to build applications based on LLMs, in increasing order of cost/complexity:

Prompting. Giving a pretrained LLM instructions lets you build a prototype in minutes or hours without a training set. Earlier this year, I saw a lot of people start experimenting with prompting, and that momentum continues unabated. Several of our short courses teach best practices for this approach.
One-shot or few-shot prompting. In addition to a prompt, giving the LLM a handful of examples of how to carry out a task — the input and the desired output — sometimes yields better results.
Fine-tuning. An LLM that has been pretrained on a lot of text can be fine-tuned to your task by training it further on a small dataset of your own. The tools for fine-tuning are maturing, making it accessible to more developers.
Pretraining. Pretraining your own LLM from scratch takes a lot of resources, so very few teams do it. In addition to general-purpose models pretrained on diverse topics, this approach has led to specialized models like BloombergGPT, which knows about finance, and Med-PaLM 2, which is focused on medicine.
For most teams, I recommend starting with prompting, since that allows you to get an application working quickly. If you’re unsatisfied with the quality of the output, ease into the more complex techniques gradually. Start one-shot or few-shot prompting with a handful of examples. If that doesn’t work well enough, perhaps use RAG (retrieval augmented generation) to further improve prompts with key information the LLM needs to generate high-quality outputs. If that still doesn’t deliver the performance you want, then try fine-tuning — but this represents a significantly greater level of complexity and may require hundreds or thousands more examples. To gain an in-depth understanding of these options, I highly recommend the course Generative AI with Large Language Models, created by AWS and DeepLearning.AI.

(Fun fact: A member of the DeepLearning.AI team has been trying to fine-tune Llama-2-7B to sound like me. I wonder if my job is at risk? 😜)

Additional complexity arises if you want to move to fine-tuning after prompting a proprietary model, such as GPT-4, that’s not available for fine-tuning. Is fine-tuning a much smaller model likely to yield superior results than prompting a larger, more capable model? The answer often depends on your application. If your goal is to change the style of an LLM’s output, then fine-tuning a smaller model can work well. However, if your application has been prompting GPT-4 to perform complex reasoning — in which GPT-4 surpasses current open models — it can be difficult to fine-tune a smaller model to deliver superior results.

Beyond choosing a development approach, it’s also necessary to choose a specific model. Smaller models require less processing power and work well for many applications, but larger models tend to have more knowledge about the world and better reasoning ability. I’ll talk about how to make this choice in a future letter.

Keep learning!

Andrew
"""

In [None]:
prompt = f"""
Summarize this email and extract some key points.
What did the author say about llama models?:

email: {email}
"""

response = llama(prompt)
print(response)

### Providing New Information in the Prompt
- A model's knowledge of the world ends at the moment of its training - so it won't know about more recent events.
- Llama 2 was released for research and commercial use on July 18, 2023, and its training ended some time before that date.
- Ask the model about an event, in this case the FIFA Women's World Cup 2023, which started on July 20, 2023, and see how the model responds.

In [None]:
prompt = """
Who won the 2023 Women's World Cup?
"""
response = llama(prompt)
print(response)

- As you can see, the model still thinks that the tournament is yet to be played, even though you are now in 2024!
- Another thing to **note**: July 18, 2023 was the date the model was released to the public, and its training ended even earlier, so it only has information up to that point. The response says, "the final match is scheduled to take place in July 2023", but the final match was played on August 20, 2023.

- You can provide the model with information about recent events, in this case text from Wikipedia about the 2023 Women's World Cup.

In [None]:
context = """
The 2023 FIFA Women's World Cup (Māori: Ipu Wahine o te Ao FIFA i 2023)[1] was the ninth edition of the FIFA Women's World Cup, the quadrennial international women's football championship contested by women's national teams and organised by FIFA. The tournament, which took place from 20 July to 20 August 2023, was jointly hosted by Australia and New Zealand.[2][3][4] It was the first FIFA Women's World Cup with more than one host nation, as well as the first World Cup to be held across multiple confederations, as Australia is in the Asian confederation, while New Zealand is in the Oceanian confederation. It was also the first Women's World Cup to be held in the Southern Hemisphere.[5]
This tournament was the first to feature an expanded format of 32 teams from the previous 24, replicating the format used for the men's World Cup from 1998 to 2022.[2] The opening match was won by co-host New Zealand, beating Norway at Eden Park in Auckland on 20 July 2023 and achieving their first Women's World Cup victory.[6]
Spain were crowned champions after defeating reigning European champions England 1–0 in the final. It was the first time a European nation had won the Women's World Cup since 2007 and Spain's first title, although their victory was marred by the Rubiales affair.[7][8][9] Spain became the second nation to win both the women's and men's World Cup since Germany in the 2003 edition.[10] In addition, they became the first nation to concurrently hold the FIFA women's U-17, U-20, and senior World Cups.[11] Sweden would claim their fourth bronze medal at the Women's World Cup while co-host Australia achieved their best placing yet, finishing fourth.[12] Japanese player Hinata Miyazawa won the Golden Boot scoring five goals throughout the tournament. Spanish player Aitana Bonmatí was voted the tournament's best player, winning the Golden Ball, whilst Bonmatí's teammate Salma Paralluelo was awarded the Young Player Award. England goalkeeper Mary Earps won the Golden Glove, awarded to the best-performing goalkeeper of the tournament.
Of the eight teams making their first appearance, Morocco were the only one to advance to the round of 16 (where they lost to France; coincidentally, the result of this fixture was similar to the men's World Cup in Qatar, where France defeated Morocco in the semi-final). The United States were the two-time defending champions,[13] but were eliminated in the round of 16 by Sweden, the first time the team had not made the semi-finals at the tournament, and the first time the defending champions failed to progress to the quarter-finals.[14]
Australia's team, nicknamed the Matildas, performed better than expected, and the event saw many Australians unite to support them.[15][16][17] The Matildas, who beat France to make the semi-finals for the first time, saw record numbers of fans watching their games, their 3–1 loss to England becoming the most watched television broadcast in Australian history, with an average viewership of 7.13 million and a peak viewership of 11.15 million viewers.[18]
It was the most attended edition of the competition ever held.
"""

In [None]:
prompt = f"""
Given the following context, who won the 2023 Women's World cup?
context: {context}
"""
response = llama(prompt)
print(response)

### Try it Yourself!

Try asking questions of your own! Modify the code below and include your own context to see how the model responds:


In [None]:
context = """
<paste context in here>
"""
query = "<your query here>"

prompt = f"""
Given the following context,
{query}

context: {context}
"""
response = llama(prompt,
                 verbose=True)
print(response)

### Chain-of-thought Prompting
- LLMs can perform better at reasoning and logic problems if you ask them to break the problem down into smaller steps. This is known as **chain-of-thought** prompting.

In [None]:
prompt = """
15 of us want to go to a restaurant.
Two of them have cars
Each car can seat 5 people.
Two of us have motorcycles.
Each motorcycle can fit 2 people.

Can we all get to the restaurant by car or motorcycle?
"""
response = llama(prompt)
print(response)

- Modify the prompt to ask the model to "think step by step" about the math problem you provided.

In [None]:
prompt = """
15 of us want to go to a restaurant.
Two of them have cars
Each car can seat 5 people.
Two of us have motorcycles.
Each motorcycle can fit 2 people.

Can we all get to the restaurant by car or motorcycle?

Think step by step.
"""
response = llama(prompt)
print(response)

- Provide the model with additional instructions.

In [None]:
prompt = """
15 of us want to go to a restaurant.
Two of them have cars
Each car can seat 5 people.
Two of us have motorcycles.
Each motorcycle can fit 2 people.

Can we all get to the restaurant by car or motorcycle?

Think step by step.
Explain each intermediate step.
Only when you are done with all your steps,
provide the answer based on your intermediate steps.
"""
response = llama(prompt)
print(response)

- The order of instructions matters!
- Ask the model to "answer first" and "explain later" to see how the output changes.

In [None]:
prompt = """
15 of us want to go to a restaurant.
Two of them have cars
Each car can seat 5 people.
Two of us have motorcycles.
Each motorcycle can fit 2 people.

Can we all get to the restaurant by car or motorcycle?
Think step by step.
Provide the answer as a single yes/no answer first.
Then explain each intermediate step.
"""

response = llama(prompt)
print(response)

- Since LLMs predict their answer one token at a time, the best practice is to ask them to think step by step, and then only provide the answer after they have explained their reasoning.
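
When comparing the model's answers to the restaurant problem, it helps to have the arithmetic worked out independently. A minimal check, with variable names chosen here for illustration:

```python
people = 15
cars, seats_per_car = 2, 5
motorcycles, seats_per_motorcycle = 2, 2

# Total seats available across all vehicles.
capacity = cars * seats_per_car + motorcycles * seats_per_motorcycle

print(capacity)            # 14
print(capacity >= people)  # False: one person is left without a seat
```

So the correct answer is "no": 10 car seats plus 4 motorcycle seats give 14 seats for 15 people. A response that answers "yes", or that reaches its answer before working through these steps, is worth treating with suspicion.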