## Lesson 4

In [2]:
from modules.utils import llama, llama_chat

### In-Context Learning

#### Standard prompt with instruction

So far, you have been stating the instruction explicitly in the prompt:

In [3]:
prompt = """
What is the sentiment of:
Hi Amit, thanks for the thoughtful birthday card!
"""
response = llama(prompt)
print(response)

  The sentiment of the message "Hi Amit, thanks for the thoughtful birthday card!" is positive. The use of the word "thoughtful" implies that the sender appreciated the effort put into the card, and the tone is friendly and sincere.


### Zero-shot Prompting

Here is an example of zero-shot prompting.

You are prompting the model to see if it can infer the task from the structure of your prompt.

In zero-shot prompting, you provide only the structure, without any examples of the completed task.

In [4]:
prompt = """
Message: Hi Amit, thanks for the thoughtful birthday card!
Sentiment: ?
"""
response = llama(prompt)
print(response)

  The sentiment of the message is "Appreciation" or "Gratitude". The sender is expressing their appreciation for the birthday card that Amit sent.


In [5]:
# try a different model
prompt = """
Message: Hi Amit, thanks for the thoughtful birthday card!
Sentiment: ?
"""
response = llama(prompt, model="togethercomputer/alpaca-7b", verbose=True)
print(response)

Prompt:
[INST]
Message: Hi Amit, thanks for the thoughtful birthday card!
Sentiment: ?
[/INST]

model: togethercomputer/alpaca-7b



### Few-shot Prompting

Here is an example of few-shot prompting.

In few-shot prompting, you provide not only the structure but also two or more examples of the completed task.

You are prompting the model to see if it can infer the task from the structure, as well as the examples in your prompt.

In [6]:
prompt = """
Message: Hi Dad, you're 20 minutes late to my piano recital!
Sentiment: Negative

Message: Can't wait to order pizza for dinner tonight
Sentiment: Positive

Message: Hi Amit, thanks for the thoughtful birthday card!
Sentiment: ?
"""
response = llama(prompt)
print(response)

  Sure, here are the sentiments for each message:

1. Message: Hi Dad, you're 20 minutes late to my piano recital!
Sentiment: Negative
2. Message: Can't wait to order pizza for dinner tonight
Sentiment: Positive
3. Message: Hi Amit, thanks for the thoughtful birthday card!
Sentiment: Positive
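The few-shot prompt above is a list of labeled examples followed by the unlabeled message. Here is a minimal sketch of a helper that builds prompts in the same "Message:/Sentiment:" layout. `build_sentiment_prompt` is a hypothetical name and is not part of `modules.utils`. If you pass no examples, it produces the zero-shot prompt from earlier. The optional `instruction` is appended at the end, as in the next cell.

```python
def build_sentiment_prompt(message, examples=(), instruction=""):
    """Build a zero-shot (no examples) or few-shot sentiment prompt.

    Each example is a (message, label) pair rendered as
    "Message: ..." / "Sentiment: ..." and separated by a blank line,
    followed by the unlabeled message whose sentiment is "?".
    """
    blocks = [f"Message: {text}\nSentiment: {label}" for text, label in examples]
    blocks.append(f"Message: {message}\nSentiment: ?")
    prompt = "\n\n".join(blocks)
    if instruction:
        prompt += "\n\n" + instruction
    # Leading/trailing newlines match the triple-quoted prompts in this lesson.
    return "\n" + prompt + "\n"


examples = [
    ("Hi Dad, you're 20 minutes late to my piano recital!", "Negative"),
    ("Can't wait to order pizza for dinner tonight", "Positive"),
]
few_shot = build_sentiment_prompt(
    "Hi Amit, thanks for the thoughtful birthday card!", examples
)
zero_shot = build_sentiment_prompt("Hi Amit, thanks for the thoughtful birthday card!")
print(few_shot)
```

You can pass the resulting string to `llama(...)` exactly like the hand-written prompts.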


### Specifying the Output Format

You can also specify the format in which you want the model to respond.

In the example below, you ask the model to "give a one word response".

In [7]:
prompt = """
Message: Hi Dad, you're 20 minutes late to my piano recital!
Sentiment: Negative

Message: Can't wait to order pizza for dinner tonight
Sentiment: Positive

Message: Hi Amit, thanks for the thoughtful birthday card!
Sentiment: ?

Give a one word response.
"""
response = llama(prompt)
print(response)

  Sure! Here are the one-word responses for each message:

1. Negative: Disappointed
2. Positive: Excited
3. ? (Uncertain): Grateful


`Note:` For the sentiment examples above (except the Alpaca cell), you used the 7 billion parameter model, llama-2-7b-chat. As you saw in the last example, the 7B model was uncertain about the sentiment.

You can use the larger (70 billion parameter) llama-2-70b-chat model to see if you get a better, more certain response:

In [20]:
prompt = """
Message: Hi Dad, you're 20 minutes late to my piano recital!
Sentiment: Negative

Message: Can't wait to order pizza for dinner tonight
Sentiment: Positive

Message: Hi Amit, thanks for the thoughtful birthday card!
Sentiment: ?

Give a one word response.
"""
response = llama(prompt,
                model="togethercomputer/llama-2-70b-chat")
print(response)

  Positive


Now, use the smaller model again, but adjust your prompt to help the model understand what is expected of it.

Restrict the model's output format to choose from `positive`, `negative` or `neutral`.

In [21]:
prompt = """
Message: Hi Dad, you're 20 minutes late to my piano recital!
Sentiment: Negative

Message: Can't wait to order pizza for dinner tonight
Sentiment: Positive

Message: Hi Amit, thanks for the thoughtful birthday card!
Sentiment: 

Respond with either positive, negative, or neutral.
"""
response = llama(prompt)
print(response)

  Sure, I'd be happy to help! Here are my responses:

Message: Hi Dad, you're 20 minutes late to my piano recital!
Sentiment: Negative

Message: Can't wait to order pizza for dinner tonight
Sentiment: Positive

Message: Hi Amit, thanks for the thoughtful birthday card!
Sentiment: Positive
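Even with a restricted label set, the 7B model restated every example before answering. If you need a single label in code, one option is to parse the response. The sketch below is a heuristic and an assumption, not part of the course utilities. It takes the last allowed label that follows "Sentiment:", falls back to the last bare allowed word, and returns `None` if nothing matches. On free-form answers like the "one word response" output earlier, the fallback can pick up an example's label, so treat its result as a best guess.

```python
import re

ALLOWED_LABELS = ("positive", "negative", "neutral")


def parse_sentiment(response, allowed=ALLOWED_LABELS):
    """Return the label for the final message in a (possibly chatty) response.

    Chatty models often repeat the few-shot examples first, so the label
    is taken from the LAST "Sentiment: <label>" match. If there is none,
    the last bare allowed word is used; otherwise None is returned.
    """
    words = "|".join(allowed)
    labeled = re.findall(rf"Sentiment:\s*({words})\b", response, flags=re.IGNORECASE)
    if labeled:
        return labeled[-1].lower()
    bare = re.findall(rf"\b({words})\b", response, flags=re.IGNORECASE)
    return bare[-1].lower() if bare else None


chatty = (
    "Sure, I'd be happy to help! Here are my responses:\n\n"
    "Message: Hi Dad, you're 20 minutes late to my piano recital!\nSentiment: Negative\n\n"
    "Message: Hi Amit, thanks for the thoughtful birthday card!\nSentiment: Positive\n"
)
print(parse_sentiment(chatty))
print(parse_sentiment("  Positive"))
```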


### Role Prompting

Roles give LLMs context about what type of answers are desired.

Llama 2 often gives more consistent responses when provided with a role.

First, try a standard prompt and see the response.

In [22]:
prompt = """
How can I answer this question from my friend:
What is the meaning of life?
"""
response = llama(prompt)
print(response)

  The question of the meaning of life is a complex and philosophical one that has been debated throughout human history. There are many different perspectives and interpretations on what the meaning of life is, and there is no one definitive answer. However, here are some possible ways to approach this question:

1. Religious or spiritual perspective: Many people believe that the meaning of life is to fulfill a divine or spiritual purpose, whether that be to follow a set of moral guidelines, to achieve spiritual enlightenment, or to fulfill a specific mission or calling.
2. Personal fulfillment: Some people believe that the meaning of life is to find personal fulfillment and happiness, whether that be through relationships, career, hobbies, or other activities.
3. Social or cultural perspective: From a social or cultural perspective, the meaning of life may be tied to the values and beliefs of one's community or society. For example, some cultures place a strong emphasis on family and 

Now, try it by giving the model a "role", and within the role, a "tone" in which it should respond.

In [23]:
role = """
Your role is a life coach \
who gives advice to people about living a good life. \
You attempt to provide unbiased advice.
You respond in the tone of an English pirate.
"""

prompt = f"""
{role}
How can I answer this question from my friend:
What is the meaning of life?
"""
response = llama(prompt)
print(response)

  Shiver me timbers! Yer lookin' fer the meaning o' life, eh? Well, matey, that be a question that's been puzzlin' the greatest minds on the high seas fer centuries! *adjusts eye patch*

Now, I ain't one to give ye a straight answer, but I'll share me thoughts with ye. The meaning o' life, me hearty, be different fer each and every one o' us. It be the sum o' all yer experiences, the memories ye make, the adventures ye have, and the treasure ye find along the way! *winks*

Ye see, life be a great big ocean, and ye be a ship sailin' through it. Ye gotta chart yer own course, follow yer heart, and navigate through the storms and calm seas. The meaning o' life be findin' yer own treasure, me matey! *adjusts hat*

So, don't be lookin' fer a definitive answer, or a treasure map that'll lead ye straight to the meaning o' life. It be a journey, a adventure, a treasure hunt, if ye will! *winks*

Now, go forth and find yer own treasure, me hearty! And remember, the meaning o' life be whatever y
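In the cell above, the role is simply pasted in front of the question. Chat-tuned Llama 2 models can also take the role as a separate system prompt. The sketch below shows both options. It assumes Meta's published Llama 2 chat template (`[INST] ... [/INST]`, with an optional `<<SYS>>` block). The function names are hypothetical. The verbose output earlier only shows `[INST]` wrapping, so it is not shown whether the course's `llama` helper adds a system block, and you should not pass an already templated string to a helper that wraps it again. The role is built from adjacent string literals, which avoids the missing space that a trailing backslash continuation can cause inside a triple-quoted string.

```python
def build_role_prompt(role, question):
    """Put the role description in front of the question, as the cell above does."""
    return f"\n{role.strip()}\n{question.strip()}\n"


def llama2_chat_format(user_message, system_prompt=None):
    """Wrap a message in the Llama 2 chat template (an assumption, see above).

    The optional system prompt goes inside a <<SYS>> block at the start of
    the first [INST] turn. The BOS token <s> is left to the tokenizer.
    """
    if system_prompt:
        user_message = f"<<SYS>>\n{system_prompt.strip()}\n<</SYS>>\n\n{user_message.strip()}"
    return f"[INST] {user_message.strip()} [/INST]"


# Adjacent string literals are concatenated; each sentence keeps its own
# trailing space, so no backslash continuation is needed.
role = (
    "Your role is a life coach who gives advice to people about living a good life. "
    "You attempt to provide unbiased advice. "
    "You respond in the tone of an English pirate."
)
question = "How can I answer this question from my friend:\nWhat is the meaning of life?"

print(build_role_prompt(role, question))
print(llama2_chat_format(question, system_prompt=role))
```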

Now try a different role with a different model:

In [39]:
role = """
Your role is a pioneering scientist who won two Nobel prizes \
and whose work on radioactivity revolutionized the field. \
You attempt to provide unbiased advice.
You respond in the tone of Han Solo.
"""

prompt = f"""
{role}
How can I answer this question from my friend:
Can you explain quantum gravity?
"""
response = llama(prompt,
                model="meta-llama/Llama-2-70b-chat-hf")
print(response)

  Well, well, well. Look who's asking about Quantum Gravity. You're looking for some answers, huh? Let me tell you, kid, I've been around the galaxy a few times, and I've seen some strange things. But Quantum Gravity? That's some heavy stuff.

First of all, you gotta understand that gravity, it's not just some force that pulls things towards each other. It's a curvature of spacetime, you know? It's like the fabric of the universe is bent and twisted, and that's what makes things fall towards each other.

Now, Quantum Gravity, it's like taking that idea and turning it up to 11. It's like, instead of just bending spacetime, you're talking about the very fabric of reality itself. It's like, at the quantum level, gravity isn't just a force, it's a fundamental aspect of the universe.

But here's the thing, kid. We don't really know much about Quantum Gravity. I mean, sure, we've got some theories, but they're still just that - theories. We've got a lot of smart people working on it, but it'

### Summarization

Summarizing a large text is another common use case for LLMs. Let's try that!

In [45]:
email = """
Dear Amit,

An increasing variety of large language models (LLMs) are open source, or close to it. The proliferation of models with relatively permissive licenses gives developers more options for building applications.

Here are some different ways to build applications based on LLMs, in increasing order of cost/complexity:

Prompting. Giving a pretrained LLM instructions lets you build a prototype in minutes or hours without a training set. Earlier this year, I saw a lot of people start experimenting with prompting, and that momentum continues unabated. Several of our short courses teach best practices for this approach.
One-shot or few-shot prompting. In addition to a prompt, giving the LLM a handful of examples of how to carry out a task — the input and the desired output — sometimes yields better results.
Fine-tuning. An LLM that has been pretrained on a lot of text can be fine-tuned to your task by training it further on a small dataset of your own. The tools for fine-tuning are maturing, making it accessible to more developers.
Pretraining. Pretraining your own LLM from scratch takes a lot of resources, so very few teams do it. In addition to general-purpose models pretrained on diverse topics, this approach has led to specialized models like BloombergGPT, which knows about finance, and Med-PaLM 2, which is focused on medicine.
For most teams, I recommend starting with prompting, since that allows you to get an application working quickly. If you're unsatisfied with the quality of the output, ease into the more complex techniques gradually. Start one-shot or few-shot prompting with a handful of examples. If that doesn't work well enough, perhaps use RAG (retrieval augmented generation) to further improve prompts with key information the LLM needs to generate high-quality outputs. If that still doesn't deliver the performance you want, then try fine-tuning — but this represents a significantly greater level of complexity and may require hundreds or thousands more examples. To gain an in-depth understanding of these options, I highly recommend the course Generative AI with Large Language Models, created by AWS and DeepLearning.AI.

(Fun fact: A member of the DeepLearning.AI team has been trying to fine-tune Llama-2-7B to sound like me. I wonder if my job is at risk? 😜)

Additional complexity arises if you want to move to fine-tuning after prompting a proprietary model, such as GPT-4, that's not available for fine-tuning. Is fine-tuning a much smaller model likely to yield superior results than prompting a larger, more capable model? The answer often depends on your application. If your goal is to change the style of an LLM's output, then fine-tuning a smaller model can work well. However, if your application has been prompting GPT-4 to perform complex reasoning — in which GPT-4 surpasses current open models — it can be difficult to fine-tune a smaller model to deliver superior results.

Beyond choosing a development approach, it's also necessary to choose a specific model. Smaller models require less processing power and work well for many applications, but larger models tend to have more knowledge about the world and better reasoning ability. I'll talk about how to make this choice in a future letter.

Keep learning!

Andrew
"""

In [54]:
prompt = f"""
Summarize this email and extract some key points.
What did the author say about llama models?:

email: {email}
"""

response = llama(prompt, model="meta-llama/Llama-2-13b-chat-hf")
print(response)

  Sure! Here's a summary of the email and some key points about llama models:

Summary:
The author discusses different approaches to building applications based on large language models (LLMs), ranging from prompting to fine-tuning. They recommend starting with prompting and gradually increasing the complexity of the techniques as needed. The author also mentions the trade-offs between using smaller or larger models and the importance of choosing the right model for the application.

Key points about llama models:

1. Llama models are open source or close to it, providing more options for developers.
2. Prompting is a quick and easy way to build applications based on LLMs, but may not yield the best results.
3. One-shot or few-shot prompting can provide better results than prompting alone.
4. Fine-tuning can deliver superior results, but requires more resources and a larger dataset.
5. Pretraining a custom LLM from scratch is a complex and resource-intensive process, but can lead to sp
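The summarization prompts in this lesson label the text with `email: {email}`. A common alternative, swapped in here and not taken from the course, is to fence the text with delimiters so the model can tell where the instructions end and the text to summarize begins. This can help when the text itself contains a question, as the next email does. `build_summary_prompt` and the `###` delimiter are hypothetical choices for this sketch.

```python
def build_summary_prompt(text, question=None, delimiter="###"):
    """Ask for a summary and key points, with the text fenced by delimiters.

    An optional follow-up question (for example, about sentiment) is placed
    after the main instruction and before the delimited text.
    """
    lines = ["Summarize the text between the delimiters and extract the key points."]
    if question:
        lines.append(question)
    lines.append(f"{delimiter}\n{text.strip()}\n{delimiter}")
    return "\n".join(lines)


print(build_summary_prompt("Dear Amit, ...", "What is the sentiment of the text?"))
```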

In [62]:
email2 = """
Hi Geeta,

We are trying to access Confluence through Amazon Kendra. They have a confluence connector to pull data from confluence. We tried several ways with different versions. We even had the cloud team update policy to make sure we have permissions through AWS. But no luck so far.

Specifically we are trying to access our documentation at https://confluence.marketintelligence.spglobal.com/display/MTS/Middle+Tier+Suite

Do you know how to get Confluence to work with AWS Kendra, or do you know of someone who was able to do that in the past?

Thanks.
"""

In [64]:
prompt = f"""
Summarize this email and extract the key points.
What did the author say and what was the sentiment of the email?:

email: {email2}
"""

response = llama(prompt, model="meta-llama/Llama-2-13b-chat-hf")
print(response)

  Sure! Here's a summary of the email and the key points:

Summary: The author is trying to access Confluence through Amazon Kendra but is having trouble. They have tried several methods and have even updated the cloud team's policies to ensure permissions are in place, but they are still unable to access the Confluence documentation. They are reaching out to the recipient to see if they know of a solution or have experience with this issue.

Key points:

* The author is trying to access Confluence through Amazon Kendra.
* They have tried several methods and have updated the cloud team's policies.
* They are unable to access the Confluence documentation.
* They are reaching out to the recipient for help and advice.

Sentiment: The author's tone is polite and professional, and they express a sense of frustration and helplessness. They are seeking assistance and guidance from the recipient.


### Providing New Information in the Prompt

A model's knowledge of the world is limited to the data it was trained on, so it won't know about events that happened after its training ended.

Llama 2 was released for research and commercial use on July 18, 2023, and its training ended some time before that date.

Ask the model about an event that happened after its release, in this case the Hamas attacks on Israel in October 2023, and see how the model responds.

In [83]:
prompt = """
How many British citizens were killed during the Hamas attacks on Israel in 2023?
"""
response = llama(prompt)
print(response)

  I'm just an AI, I don't have access to real-time or historical data on the number of British citizens killed during the Hamas attacks on Israel in 2023. The information you are seeking is likely to be sensitive and may not be publicly available due to privacy and security concerns.

I would recommend consulting official sources, such as the UK government or the Israeli government, for information on the casualties of the conflict. These sources may provide information on the number of civilians, including British citizens, who were killed or injured during the conflict.

It's important to note that the conflict between Hamas and Israel is a complex and sensitive issue, and it's essential to approach it with respect and sensitivity towards all parties involved. It's also important to rely on credible sources of information and to avoid spreading misinformation or propaganda.


As you can see, the model has no information about the attacks, which took place on 7 October 2023, after its training ended. Instead of answering, it suggests consulting official sources.

Another thing to note is that July 18, 2023 was the date the model was released to the public, and it was trained before that, so it only has information up to that point.

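You can provide the model with information about recent events directly in the prompt. Here, the context is text from Wikipedia about the 2023 FIFA Women's World Cup, which started on July 20, 2023, two days after Llama 2 was released.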

In [66]:
context = """
The 2023 FIFA Women's World Cup (Māori: Ipu Wahine o te Ao FIFA i 2023)[1] was the ninth edition of the FIFA Women's World Cup, the quadrennial international women's football championship contested by women's national teams and organised by FIFA. The tournament, which took place from 20 July to 20 August 2023, was jointly hosted by Australia and New Zealand.[2][3][4] It was the first FIFA Women's World Cup with more than one host nation, as well as the first World Cup to be held across multiple confederations, as Australia is in the Asian confederation, while New Zealand is in the Oceanian confederation. It was also the first Women's World Cup to be held in the Southern Hemisphere.[5]
This tournament was the first to feature an expanded format of 32 teams from the previous 24, replicating the format used for the men's World Cup from 1998 to 2022.[2] The opening match was won by co-host New Zealand, beating Norway at Eden Park in Auckland on 20 July 2023 and achieving their first Women's World Cup victory.[6]
Spain were crowned champions after defeating reigning European champions England 1–0 in the final. It was the first time a European nation had won the Women's World Cup since 2007 and Spain's first title, although their victory was marred by the Rubiales affair.[7][8][9] Spain became the second nation to win both the women's and men's World Cup since Germany in the 2003 edition.[10] In addition, they became the first nation to concurrently hold the FIFA women's U-17, U-20, and senior World Cups.[11] Sweden would claim their fourth bronze medal at the Women's World Cup while co-host Australia achieved their best placing yet, finishing fourth.[12] Japanese player Hinata Miyazawa won the Golden Boot scoring five goals throughout the tournament. Spanish player Aitana Bonmatí was voted the tournament's best player, winning the Golden Ball, whilst Bonmatí's teammate Salma Paralluelo was awarded the Young Player Award. England goalkeeper Mary Earps won the Golden Glove, awarded to the best-performing goalkeeper of the tournament.
Of the eight teams making their first appearance, Morocco were the only one to advance to the round of 16 (where they lost to France; coincidentally, the result of this fixture was similar to the men's World Cup in Qatar, where France defeated Morocco in the semi-final). The United States were the two-time defending champions,[13] but were eliminated in the round of 16 by Sweden, the first time the team had not made the semi-finals at the tournament, and the first time the defending champions failed to progress to the quarter-finals.[14]
Australia's team, nicknamed the Matildas, performed better than expected, and the event saw many Australians unite to support them.[15][16][17] The Matildas, who beat France to make the semi-finals for the first time, saw record numbers of fans watching their games, their 3–1 loss to England becoming the most watched television broadcast in Australian history, with an average viewership of 7.13 million and a peak viewership of 11.15 million viewers.[18]
It was the most attended edition of the competition ever held.
"""

In [67]:
prompt = f"""
Given the following context, who won the 2023 Women's World Cup?
context: {context}
"""
response = llama(prompt)
print(response)

  Based on the information provided in the context, Spain won the 2023 Women's World Cup.
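The two context cells above and the next one share the same pattern: a question followed by the text the model should rely on. Here is a minimal sketch of a helper that reproduces the prompt used in the next cell. `build_context_prompt` is a hypothetical name. The optional `strict` instruction, which asks the model to answer only from the context, is an added assumption and not part of the original prompts. It may reduce, but does not guarantee the absence of, answers drawn from the model's own training data.

```python
def build_context_prompt(query, context, strict=False):
    """Build the 'Given the following context' prompt used in this lesson.

    With strict=True, an extra instruction asks the model to answer only
    from the supplied context and to say so when the answer is missing.
    """
    prompt = f"\nGiven the following context,\n{query}\n\ncontext: {context}\n"
    if strict:
        prompt += (
            "\nAnswer using only the context. "
            "If the context does not contain the answer, say you don't know.\n"
        )
    return prompt


print(build_context_prompt("Who won the 2023 Women's World Cup?", "Spain were crowned champions..."))
```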


Try asking questions of your own! Modify the code below and include your own context to see how the model responds:

In [84]:
context = """
Prime Minister Rishi Sunak confirms that six British citizens were killed during the Hamas attacks on Israel, while a further ten are missing.[751]
Two British teenage sisters, Noya and Yahel Sharabi, are among those missing, and believed to have been kidnapped, following the 7 October attacks on Israel. Their mother, Lianne, also a British citizen, was killed in the Be'eri massacre.[752][753] The following day the girls' family tells the BBC the Yahel was also murdered.[754] On 22 October the family release a statement to say Noya was also murdered.[755]
Guardian cartoonist Steve Bell is sacked following a row over a drawing he created of Israeli Prime Minister Benjamin Netanyahu that was deemed to be antisemitic.[756]
Justice Secretary Alex Chalk announces that prisons in England and Wales will be allowed to release some minor offenders on probation early in order to alleviate overcrowding.[757]
"""
query = "How many British citizens were killed during the Hamas attacks on Israel in 2023?"

prompt = f"""
Given the following context,
{query}

context: {context}
"""
response = llama(prompt,
                 verbose=True)
print(response)

Prompt:
[INST]
Given the following context,
How many British citizens were killed during the Hamas attacks on Israel in 2023?

context: 
Prime Minister Rishi Sunak confirms that six British citizens were killed during the Hamas attacks on Israel, while a further ten are missing.[751]
Two British teenage sisters, Noya and Yahel Sharabi, are among those missing, and believed to have been kidnapped, following the 7 October attacks on Israel. Their mother, Lianne, also a British citizen, was killed in the Be'eri massacre.[752][753] The following day the girls' family tells the BBC the Yahel was also murdered.[754] On 22 October the family release a statement to say Noya was also murdered.[755]
Guardian cartoonist Steve Bell is sacked following a row over a drawing he created of Israeli Prime Minister Benjamin Netanyahu that was deemed to be antisemitic.[756]
Justice Secretary Alex Chalk announces that prisons in England and Wales will be allowed to release some minor offenders on probation

### Chain-of-thought Prompting

LLMs can perform better at reasoning and logic problems if you ask them to break the problem down into smaller steps. This is known as chain-of-thought prompting.

In [134]:
prompt = """
15 of us want to go to a restaurant.
Two of them have cars
Each car can seat 5 people.
Two of us have motorcycles.
Each motorcycle can fit 2 people.

Can we all get to the restaurant by car or motorcycle?
"""
response = llama(prompt, model="mistralai/Mixtral-8x7B-Instruct-v0.1")
print(response)

 Yes, all 15 of you can get to the restaurant using the cars and motorcycles. Here's how:

First, let's use the two cars, each seating 5 people. This will accommodate 10 people (5 + 5).

Next, since there are only 15 - 10 = 5 people left, they can all ride on the two motorcycles, with 2 people per motorcycle.

So, it is possible for all 15 people to get to the restaurant using the available cars and motorcycles.
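You can check the model's answer above directly. This sketch assumes that the stated capacities include the drivers, who are among the 15 people, and that each vehicle makes a single trip.

```python
people = 15
cars, seats_per_car = 2, 5
motorcycles, seats_per_motorcycle = 2, 2

# Every seat counts, including the drivers' seats.
capacity = cars * seats_per_car + motorcycles * seats_per_motorcycle
print(capacity)            # 14
print(capacity >= people)  # False: one person is left without a seat
```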


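The answer above is wrong: after the cars take 10 people, the two motorcycles can carry only 4 of the remaining 5. Modify the prompt to ask the model to "think step by step" about the math problem you provided.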

In [135]:
prompt = """
15 of us want to go to a restaurant.
Two of them have cars
Each car can seat 5 people.
Two of us have motorcycles.
Each motorcycle can fit 2 people.

Can we all get to the restaurant by car or motorcycle?

Think step by step.
"""
response = llama(prompt, model="mistralai/Mixtral-8x7B-Instruct-v0.1")
print(response)

 Let's break this down:

1. You have 15 people who want to go to a restaurant.
2. There are two cars, each capable of seating 5 people. So, in total, the cars can accommodate 10 people (2 cars * 5 people per car).
3. There are two motorcycles, each capable of fitting 2 people. So, in total, the motorcycles can carry 4 people (2 motorcycles * 2 people per motorcycle).
4. Adding the capacity of the cars and motorcycles together, we get 14 seats (10 seats from cars + 4 seats from motorcycles).
5. Since you only have 15 people, and there are 14 seats available, it is indeed possible for all of you to get to the restaurant using the cars and motorcycles.

So, yes, you can all get to the restaurant by car or motorcycle.


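The step-by-step answer is still wrong: it correctly counts 14 seats, but then concludes that 14 seats are enough for 15 people. Provide the model with additional instructions.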

In [136]:
prompt = """
15 of us want to go to a restaurant.
Two of them have cars
Each car can seat 5 people.
Two of us have motorcycles.
Each motorcycle can fit 2 people.

Can we all get to the restaurant by car or motorcycle?

Think step by step.
Explain each intermediate step.
Only when you are done with all your steps,
provide the answer based on your intermediate steps.
"""
response = llama(prompt, model="mistralai/Mixtral-8x7B-Instruct-v0.1")
print(response)

 Step 1: Calculate the total number of people who can be transported by cars.
Each car can seat 5 people, and there are 2 cars. So, the total number of people that can be transported by cars is 2 cars * 5 people/car = 10 people.

Step 2: Calculate the total number of people who can be transported by motorcycles.
Each motorcycle can fit 2 people, and there are 2 motorcycles. So, the total number of people that can be transported by motorcycles is 2 motorcycles * 2 people/motorcycle = 4 people.

Step 3: Add the total number of people who can be transported by cars and motorcycles together.
10 people (cars) + 4 people (motorcycles) = 14 people.

Step 4: Compare the total number of people who can be transported by cars and motorcycles with the total number of people who need to go to the restaurant.
In this case, 14 people is less than 15 people (the total number of people who want to go to the restaurant).

Answer: No, we cannot all get to the restaurant by car or motorcycle because the t

The order of instructions matters!

Ask the model to "answer first" and "explain later" to see how the output changes.

In [137]:
prompt = """
15 of us want to go to a restaurant.
Two of them have cars
Each car can seat 5 people.
Two of us have motorcycles.
Each motorcycle can fit 2 people.

Can we all get to the restaurant by car or motorcycle?
Think step by step.
Provide the answer as a single yes/no answer first.
Then explain each intermediate step.
"""

response = llama(prompt, model="mistralai/Mixtral-8x7B-Instruct-v0.1")
print(response)

 Yes, we can all get to the restaurant by car or motorcycle.

Explanation:

1. We have 15 people in total.
2. Two cars can seat 5 people each, so together they can accommodate 10 people (5*2=10).
3. Two motorcycles can fit 2 people each, so they can carry 4 people (2*2=4).
4. Adding the seating capacity of cars and motorcycles, we get 14 (10 from cars + 4 from motorcycles).
5. Since there are 15 people in total and the cars and motorcycles can carry 14, it seems like one person will have to go separately.
6. However, the problem states that two people have motorcycles. These two people can ride their motorcycles to the restaurant, freeing up two spots in the cars for other people.
7. With the two motorcycle riders now occupying car seats, the cars can now accommodate 10 people and the motorcycles can carry 2 people.
8. In total, the cars and motorcycles can now carry 12 people (10 from cars + 2 from motorcycles), which is enough for everyone to get to the restaurant.


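When asked to answer first, the model committed to "Yes" and then built a flawed explanation to justify it. Since LLMs generate their answer one token at a time, the best practice is to ask them to think step by step, and only provide the answer after they have explained their reasoning.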
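The reasoning-first instructions from the restaurant example can be kept in a small helper, together with a way to pull the final answer out of a long response. This is a minimal sketch. Both function names are hypothetical. The extractor assumes the model labels its conclusion with "Answer:", as the Mixtral output for the reasoning-first prompt did, but the prompt does not request that label, so the extractor returns `None` when it is missing.

```python
def build_cot_prompt(problem, answer_first=False):
    """Append chain-of-thought instructions to a problem statement.

    By default, the steps come first and the answer last, which is the
    ordering recommended above. answer_first=True reproduces the
    answer-first variant for comparison.
    """
    if answer_first:
        steps = [
            "Think step by step.",
            "Provide the answer as a single yes/no answer first.",
            "Then explain each intermediate step.",
        ]
    else:
        steps = [
            "Think step by step.",
            "Explain each intermediate step.",
            "Only when you are done with all your steps,",
            "provide the answer based on your intermediate steps.",
        ]
    return "\n" + problem.strip() + "\n\n" + "\n".join(steps) + "\n"


def extract_final_answer(response, marker="Answer:"):
    """Return the text after the last `marker`, or None if it is missing."""
    idx = response.rfind(marker)
    if idx == -1:
        return None
    return response[idx + len(marker):].strip()


print(build_cot_prompt("Can we all get to the restaurant by car or motorcycle?"))
print(extract_final_answer("Step 4: 14 is less than 15.\n\nAnswer: No, we cannot."))
```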

#### Keep the prompting process iterative, and add examples and instructions as needed.