# Lesson 4

### Import helper function

In [23]:
from utils import llama

### In-Context Learning

#### Standard prompt with instruction
- So far, you have been stating the instruction explicitly in the prompt:

In [24]:
prompt = """
What is the sentiment of:
Hi Pavan, thanks for the thoughtful Lecture today!
"""
response = llama(prompt)
print(response)

The sentiment of this message is extremely positive. The use of the word "thoughtful" to describe the lecture and the expression of gratitude with "thanks" convey a strong sense of appreciation and admiration for the speaker's effort.


### Zero-shot Prompting
- Here is an example of zero-shot prompting.
- You are prompting the model to see if it can infer the task from the structure of your prompt.
- In zero-shot prompting, you provide only the structure to the model, without any examples of the completed task.


In [25]:
prompt = """
Message: Hi Pavan, thanks for the thoughtful Lecture today!
Sentiment: ?
"""
response = llama(prompt)
print(response)

The sentiment of this message is positive. The user is expressing gratitude and appreciation for the lecture given by Pavan, indicating a favorable tone.


### Few-shot Prompting
- Here is an example of few-shot prompting.
- In few-shot prompting, you not only provide the structure to the model, but also two or more examples.
- You are prompting the model to see if it can infer the task from the structure, as well as the examples in your prompt.

In [26]:
prompt = """
Message: Hi Pavan, you're 20 minutes late to my AI Lecture!
Sentiment: {"intent": "Negative"}

Message: Can't wait to order a book for my birthday
Sentiment: {"intent": "Positive"}

Message: Hi Pavan, thanks for the thoughtful Lecture today!
Sentiment: ?
"""
response = llama(prompt)
print(response)

To determine the sentiment of the third message, we need to analyze its content.

The message reads: "Hi Pavan, thanks for the thoughtful Lecture today!"

In this message:

- The speaker is expressing gratitude towards you (Pavan) for something.
- They are using positive adjectives like "thoughtful" to describe your lecture.
- There's no negative language or tone in the message.

Based on these observations, I would classify the sentiment of the third message as: {"intent": "Positive"}

So, the updated list would be:

1. Sentiment: {"intent": "Negative"} (Message 1)
2. Sentiment: {"intent": "Positive"} (Message 2)
3. Sentiment: {"intent": "Positive"} (Message 3)
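The zero-shot and few-shot prompts above share one layout, so they can be assembled from a list of labelled examples. The following is a minimal sketch; `build_few_shot_prompt` is a hypothetical helper and not part of `utils`. The string it returns would be passed to `llama(prompt)` as in the cells above.

```python
def build_few_shot_prompt(examples, message):
    """Assemble a prompt from (text, label) pairs plus one unlabelled message.

    Hypothetical helper: the layout mirrors the hand-written prompts above.
    """
    blocks = [f"Message: {text}\nSentiment: {label}" for text, label in examples]
    # The final block leaves the label open so the model fills it in.
    blocks.append(f"Message: {message}\nSentiment: ?")
    return "\n" + "\n\n".join(blocks) + "\n"


examples = [
    ("Hi Pavan, you're 20 minutes late to my AI Lecture!", "Negative"),
    ("Can't wait to order a book for my birthday", "Positive"),
]
prompt = build_few_shot_prompt(
    examples, "Hi Pavan, thanks for the thoughtful Lecture today!"
)
print(prompt)
```

Passing an empty `examples` list reduces this to the zero-shot prompt, which makes it easy to compare the two styles on the same message.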


### Specifying the Output Format
- You can also specify the format in which you want the model to respond.
- In the example below, you ask the model to "give a one word response".

In [31]:
prompt = """
Message: Hi Pavan, you're 20 minutes late to my AI Lecture!
Sentiment: Negative

Message: Can't wait to order a book for my birthday
Sentiment: Positive

Message: Hi Pavan, thanks for the thoughtful Lecture today!
Sentiment: ?

Give a one word response.
"""
response = llama(prompt=prompt, model="deepseek-r1:8b")
print(response)

<think>
Okay, so I need to figure out the sentiment of the message "Hi Pavan, thanks for the thoughtful Lecture today!" The user is asking me to respond with just one word. Let's break this down step by step.

First, looking at the structure of the message: it starts with a greeting ("Hi Pavan"), which is pretty standard and doesn't carry any negative or positive connotations on its own. Then there's an expression of gratitude: "thanks for the thoughtful Lecture today!" The word "thoughtful" here is key because it's describing something positive about the lecture. So, the speaker is appreciative and acknowledging that the lecture was well-considered or well-prepared.

Now, considering the context from the previous messages provided. In the first message, the sentiment was negative because the person was upset about being late to an AI Lecture. The second message was positive because they were excited to order a book for their birthday. So, this third message is in response to the AI Le

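**Note:** The last example passed `model="deepseek-r1:8b"` to the helper, while the earlier examples used the helper's default model. As the output shows, the response spent its length on `<think>` reasoning and was cut off before it gave a one word answer.

- You can pass a `model` argument to the helper to try a different model, for example a larger one such as the 70 billion parameter `llama-2-70b-chat`, to see if you get a better, more certain response. The cell below runs the prompt with `model="deepseek-r1:8b"` again: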

In [6]:
prompt = """
Message: Hi Pavan, you're 20 minutes late to my AI Lecture!
Sentiment: Negative

Message: Can't wait to order a book for my birthday
Sentiment: Positive

Message: Hi Pavan, thanks for the thoughtful Lecture today!
Sentiment: ?

Give a one word response.
"""
response = llama(prompt, model="deepseek-r1:8b")
print(response)

<think>
Okay, so I need to figure out the sentiment of the message "Hi Pavan, thanks for the thoughtful Lecture today!" The user is asking me to respond with just one word. Let's break this down step by step.

First, looking at the structure of the message: it starts with a greeting ("Hi Pavan"), which is pretty standard and doesn't carry any negative or positive connotations on its own. Then there's an expression of gratitude: "thanks for the thoughtful Lecture today!" The word "thoughtful" here is key because it's describing something positive about the lecture. So, the speaker is appreciative and acknowledging that the lecture was well-considered or well-prepared.

Now, considering the context from the previous messages provided. In the first message, the sentiment was negative because the person was upset about being late to an AI Lecture. The second message was positive because they were excited to order a book for their birthday. So, this third message is in response to the AI Le

- Now, use the smaller model again, but adjust your prompt to help the model understand what is expected of it.
- Restrict the model's output format to choose from `positive`, `negative` or `neutral`.

In [32]:
prompt = """
Message: Hi Pavan, you're 20 minutes late to my AI Lecture!
Sentiment: Negative

Message: Can't wait to order a book for my birthday
Sentiment: Positive

Message: Hi Pavan, thanks for the thoughtful Lecture today!
Sentiment: ?

Respond with either positive, negative, or neutral.
"""
response = llama(prompt)
print(response)

Neutral. The sentiment of the message is not explicitly stated as positive or negative, but rather it's a polite and appreciative greeting.
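Even with a restricted output format, the model may add an explanation after the label, as in the response above. A small post-processing step can pull out the label. This is a minimal sketch under stated assumptions: `extract_sentiment` is a hypothetical helper, and the `<think>` handling assumes reasoning output is wrapped in `<think>` tags as in the earlier outputs.

```python
import re

ALLOWED = ("positive", "negative", "neutral")


def extract_sentiment(response):
    """Return the first allowed label in `response`, or None if there is none."""
    # Drop <think>...</think> reasoning. If the closing tag is missing
    # (a truncated response), drop everything after the opening tag.
    visible = re.sub(r"<think>.*?(</think>|\Z)", "", response, flags=re.DOTALL)
    pattern = r"\b(" + "|".join(ALLOWED) + r")\b"
    match = re.search(pattern, visible, flags=re.IGNORECASE)
    return match.group(1).lower() if match else None


print(extract_sentiment("Neutral. The sentiment is not explicitly stated."))
print(extract_sentiment("<think>negative or positive?</think>\nPositive"))
print(extract_sentiment("<think>The tone could be negative or"))
```

Returning `None` for a truncated response keeps a label mentioned in unfinished reasoning from being mistaken for the answer.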


### Role Prompting
- Roles give LLMs context about what type of answers are desired.
- Llama 2 often gives more consistent responses when provided with a role.
- First, try a standard prompt and see the response.

In [33]:
prompt = """
How can I answer this question from my friend:
What is the tokenizer in LLM?
"""
response = llama(prompt)
print(response)

The tokenizer, also known as a wordpiece or subword tokenizer, is a crucial component of Large Language Models (LLMs). Here's how you can answer your friend's question:

**What is the tokenizer in LLM?**

In the context of Large Language Models, the tokenizer is a module that splits input text into individual tokens. These tokens are then fed into the model as input.

Think of it like this: when you type a sentence on your keyboard, each key press creates a token (e.g., "hello" becomes 3 tokens: h-e-l-lo). The tokenizer does something similar with text input. It breaks down words and subwords (smaller units within words) into individual tokens that the model can process.

The tokenizer's primary function is to:

1. **Split words into subwords**: Large language models often use subwording techniques, like WordPiece or BPE (Byte Pair Encoding), which split words into smaller subwords based on their frequency and context.
2. **Convert text to numerical representations**: The tokenizer con

- Now, try it by giving the model a "role", and within the role, a "tone" in which it should respond.

In [34]:
role = """
Your role is an AI Architect \
who gives advice to people about advanced architectural questions. \
You attempt to provide unbiased advice.
You respond in the tone of a Hindi-speaking professor.
"""

prompt = f"""
{role}
How can I answer this question from my friend:
What is the tokenizer in LLM ?
"""
response = llama(prompt)
print(response)

Beta, tokenizer ek bahut hi important component hai LLM (Large Language Model) mein. 

Tokenization ka matlab yah hai ki hum kisi bhi text ko small chunks mein todne ke liye ek algorithm ka upyog karte hain. Yeh chunks humein model ko understand karne aur process karne mein madad karte hain.

Ek tokenizer ka kaam yah hai ki wo input text ko words, subwords ya tokens mein badal dete hain. Isse model ko samajhne aur process karne mein asaan hota hai. 

Ab, aap apni friend se puch sakte hain ki tokenizer kya hota hai? Kya yah ek machine learning algorithm hai? Kya yah humein input text ko understand karne mein madad karta hai?

Aapke paas kuch examples bhi de sakte hain, jaise ki "tokenizer ka upyog kaise hota hai?" ya "tokenizer ke liye kya algorithms upyogi hote hain?"

Is tarah se aap apni friend ko tokenizer ke baare mein samajhne mein madad kar sakte hain.
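A detail worth knowing when writing multi-line role strings like the one above: inside a triple-quoted Python string, a backslash at the end of a line removes the newline, so the text before and after it is joined with nothing in between. The short sketch below (the strings are illustrative only) shows the difference a space before the backslash makes.

```python
# Backslash-newline inside a triple-quoted string joins the two lines directly.
joined = """advice.\
You attempt"""

# A space before the backslash keeps the sentences separated.
spaced = """advice. \
You attempt"""

print(repr(joined))
print(repr(spaced))
```

Printing `repr(role)` before sending a prompt is a quick way to check what text the model actually receives.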


### Summarization
- Summarizing a large text is another common use case for LLMs. Let's try that!

In [11]:
email = """
Dear Professor,

I'm writing to share our comprehensive research findings on optimizing Retrieval Augmented Generation (RAG) pipelines. Our team has made significant breakthroughs in improving accuracy and reducing hallucination rates through systematic enhancements across multiple components of the RAG architecture.

Our first major advancement came through chunking optimization. Traditional approaches often rely on arbitrary text splits, but we've implemented semantic chunking that preserves contextual meaning. By reducing chunk sizes to 256 tokens and incorporating a 20% overlap between adjacent chunks, we've maintained contextual continuity while enabling more precise retrieval. This modification alone yielded a 15% improvement in retrieval precision across our test cases.

The vector database layer presented another opportunity for substantial improvement. We developed a hybrid search architecture that combines dense and sparse vectors, leveraging the strengths of both approaches. The integration of BM25 with cross-encoder reranking has proven particularly effective. By adding metadata filtering capabilities, we've enabled more nuanced domain-specific retrieval. These enhancements collectively reduced our hallucination rate by 27%, a significant improvement in output reliability.

Query processing emerged as a critical component for overall system performance. We deployed an advanced query expansion system using a fine-tuned T5 model, which better captures semantic variations in user queries. Our hypothesis-driven query decomposition approach breaks complex queries into manageable sub-queries, while maintaining logical relationships. The implementation of query-specific routing to specialized indexes has further refined our retrieval accuracy. These query processing improvements resulted in a 32% boost in answer relevance.

To validate these improvements, we developed a comprehensive evaluation framework. Our test suite now includes over 5,000 carefully curated question-answer pairs, spanning multiple domains and complexity levels. We employ automated evaluation metrics including ROUGE, BLEU, and BERTScore, supplemented by a structured human-in-loop feedback system. This rigorous evaluation approach has confirmed an 89% accuracy rate on domain-specific queries, representing a substantial improvement over baseline performance.

Looking ahead, we're exploring several promising avenues for further enhancement. Our preliminary work on multi-vector indexing shows potential for handling complex, multi-hop queries more effectively. We're also investigating advanced context compression techniques that could enable more efficient processing of longer documents while maintaining retrieval accuracy. Initial results suggest these approaches could push our accuracy rates even higher.

I look forward to presenting our detailed findings at next week's NLP conference, where we can discuss potential collaborations and share insights with the broader research community.

Best regards,
Pavan
"""

In [12]:
prompt = f"""
Summarize this email and extract some key points in bullets.
What did the author say about the RAG process?

email: {email}
"""

response = llama(prompt)
print(response)

**Summary:**

The email is from Pavan, who shares their team's research findings on optimizing Retrieval Augmented Generation (RAG) pipelines. They report significant breakthroughs in improving accuracy and reducing hallucination rates through various enhancements to the RAG architecture, including chunking optimization, vector database layer improvements, query processing advancements, and preliminary work on multi-vector indexing and context compression.

**Key Points:**

* The team implemented semantic chunking that preserves contextual meaning, resulting in a 15% improvement in retrieval precision.
* A hybrid search architecture combining dense and sparse vectors reduced hallucination rates by 27%.
* An advanced query expansion system using a fine-tuned T5 model improved answer relevance by 32%.
* A comprehensive evaluation framework with over 5,000 carefully curated question-answer pairs validated the improvements, achieving an 89% accuracy rate.
* Preliminary work on multi-vector

### Providing New Information in the Prompt
- A model's knowledge of the world ends at the moment of its training - so it won't know about more recent events.
- Llama 2 was released for research and commercial use on July 18, 2023, and its training ended some time before that date.
- Ask the model about an event, in this case the FIFA Women's World Cup 2023, which started on July 20, 2023, and see how the model responds.

In [13]:
prompt = """
Who won the 2023 Women's World Cup?
"""
response = llama(prompt)
print(response)

I don't have information on the winner of the 2023 Women's World Cup as my knowledge cutoff is December 2023, and I do not have real-time updates. However, I can suggest checking with a reliable news source or the official FIFA website for the latest information on the tournament.


- As you can see, the model cannot name the winner, even though the tournament has already been played.
- Another thing to **note**: July 18, 2023 was the date Llama 2 was released to the public, and it was trained before that, so it only has information up to that point. The final match was played on August 20, 2023, after that release date.

- You can provide the model with information about recent events, in this case text from Wikipedia about the 2023 Women's World Cup.

In [14]:
context = """
The 2023 FIFA Women's World Cup (Māori: Ipu Wahine o te Ao FIFA i 2023)[1] was the ninth edition of the FIFA Women's World Cup, the quadrennial international women's football championship contested by women's national teams and organised by FIFA. The tournament, which took place from 20 July to 20 August 2023, was jointly hosted by Australia and New Zealand.[2][3][4] It was the first FIFA Women's World Cup with more than one host nation, as well as the first World Cup to be held across multiple confederations, as Australia is in the Asian confederation, while New Zealand is in the Oceanian confederation. It was also the first Women's World Cup to be held in the Southern Hemisphere.[5]
This tournament was the first to feature an expanded format of 32 teams from the previous 24, replicating the format used for the men's World Cup from 1998 to 2022.[2] The opening match was won by co-host New Zealand, beating Norway at Eden Park in Auckland on 20 July 2023 and achieving their first Women's World Cup victory.[6]
Spain were crowned champions after defeating reigning European champions England 1–0 in the final. It was the first time a European nation had won the Women's World Cup since 2007 and Spain's first title, although their victory was marred by the Rubiales affair.[7][8][9] Spain became the second nation to win both the women's and men's World Cup since Germany in the 2003 edition.[10] In addition, they became the first nation to concurrently hold the FIFA women's U-17, U-20, and senior World Cups.[11] Sweden would claim their fourth bronze medal at the Women's World Cup while co-host Australia achieved their best placing yet, finishing fourth.[12] Japanese player Hinata Miyazawa won the Golden Boot scoring five goals throughout the tournament. Spanish player Aitana Bonmatí was voted the tournament's best player, winning the Golden Ball, whilst Bonmatí's teammate Salma Paralluelo was awarded the Young Player Award. England goalkeeper Mary Earps won the Golden Glove, awarded to the best-performing goalkeeper of the tournament.
Of the eight teams making their first appearance, Morocco were the only one to advance to the round of 16 (where they lost to France; coincidentally, the result of this fixture was similar to the men's World Cup in Qatar, where France defeated Morocco in the semi-final). The United States were the two-time defending champions,[13] but were eliminated in the round of 16 by Sweden, the first time the team had not made the semi-finals at the tournament, and the first time the defending champions failed to progress to the quarter-finals.[14]
Australia's team, nicknamed the Matildas, performed better than expected, and the event saw many Australians unite to support them.[15][16][17] The Matildas, who beat France to make the semi-finals for the first time, saw record numbers of fans watching their games, their 3–1 loss to England becoming the most watched television broadcast in Australian history, with an average viewership of 7.13 million and a peak viewership of 11.15 million viewers.[18]
It was the most attended edition of the competition ever held.
"""

In [15]:
prompt = f"""
Given the following context, who won the 2023 Women's World Cup?
context: {context}
"""
response = llama(prompt)
print(response)

Spain won the 2023 Women's World Cup by defeating England 1-0 in the final.
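The date reasoning in this section can be checked with a few lines of standard-library Python. This is only a sketch: the variable names are illustrative, and the dates are the ones stated above (Llama 2 release, tournament start, final match).

```python
from datetime import date

llama2_release = date(2023, 7, 18)    # public release date stated above
tournament_start = date(2023, 7, 20)  # first match of the tournament
final_match = date(2023, 8, 20)       # date of the final

# Both events fall after the release, so training data cannot cover them.
print(tournament_start > llama2_release)
print((final_match - llama2_release).days)
```

Any fact dated after a model's training data has to be supplied in the prompt, as the `context` cell does.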


### Try it Yourself!

Try asking questions of your own! Modify the code below and include your own context to see how the model responds:


In [16]:
context = """
<paste context in here>
"""
query = "<your query here>"

prompt = f"""
Given the following context,
{query}

context: {context}
"""
response = llama(prompt, verbose=True)
print(response)

Prompt:
[INST]
Given the following context,
<your query here>

context: 
<paste context in here>

[/INST]

model: llama3.2:latest
Please go ahead and provide the context, and I'll do my best to assist you with your query.


### Chain-of-thought Prompting
- LLMs can perform better at reasoning and logic problems if you ask them to break the problem down into smaller steps. This is known as **chain-of-thought** prompting.

In [18]:
prompt = """
15 of us want to go to an AI Conference.
Two of us have cars
Each car can seat 5 people.
Two of us have motorcycles.
Each motorcycle can fit 2 people.

Can we all get to the AI Conference by car or motorcycle?
"""
response = llama(prompt)
print(response)

Let's analyze the situation:

* We have 15 people in total.
* Two people have cars, and each car can seat 5 people. This means that the two cars can seat a maximum of 10 people (2 x 5 = 10).
* Since we need to accommodate all 15 people, we'll use the motorcycles as well.

We have 5 people left who don't fit in the cars. These 5 people will be transported by motorcycle.

Since each motorcycle can fit 2 people, we'll need at least 3 motorcycles to transport these 5 people (5 / 2 = 2.5, so we round up to 3).

So, to summarize:

* The two cars will seat 10 people.
* Three motorcycles will seat the remaining 5 people.

Yes, all 15 people can get to the AI Conference by car or motorcycle.
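For reference, the seating arithmetic in this problem can be checked directly. The sketch below assumes that the vehicle owners are among the 15 people and that the stated capacities include the driver.

```python
people = 15

car_seats = 2 * 5         # two cars, five seats each
motorcycle_seats = 2 * 2  # two motorcycles, two seats each
total_seats = car_seats + motorcycle_seats

print(total_seats)            # 14
print(total_seats >= people)  # False: one person is left without a seat
```

Under these assumptions the correct answer is "no", which is a useful yardstick for judging the model responses in this section.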


- Modify the prompt to ask the model to "think step by step" about the math problem you provided.

In [19]:
prompt = """
15 of us want to go to an AI Conference.
Two of us have cars
Each car can seat 5 people.
Two of us have motorcycles.
Each motorcycle can fit 2 people.

Can we all get to the AI Conference by car or motorcycle?

Think step by step.
"""
response = llama(prompt)
print(response)

Let's break down the problem step by step:

1. We have 15 people in total, and two of them have cars that can seat 5 people each. This means we have a total capacity of 10 people for the cars (2 cars x 5 people each).
2. Since there are only 15 people in total, and we already have a capacity of 10 people for the cars, it's not possible to fit all 15 people into the cars.
3. Now, let's consider the motorcycles. We have two people who own motorcycles that can seat 2 people each. This means we have a total capacity of 4 people for the motorcycles (2 motorcycles x 2 people each).
4. Since we still need to accommodate 5 more people after using the cars and motorcycles, it's not possible to fit all 15 people by car or motorcycle alone.

Therefore, unfortunately, it is not possible to get all 15 people to the AI Conference by car or motorcycle alone.


- Provide the model with additional instructions.

In [20]:
prompt = """
15 of us want to go to an AI Conference.
Two of us have cars
Each car can seat 5 people.
Two of us have motorcycles.
Each motorcycle can fit 2 people.

Can we all get to the AI Conference by car or motorcycle?

Think step by step.
Explain each intermediate step.
Only when you are done with all your steps,
provide the answer based on your intermediate steps.
"""
response = llama(prompt)
print(response)

To determine if everyone can get to the AI Conference by car or motorcycle, we'll break down the problem into manageable steps.

Step 1: Calculate the total number of people who need transportation
There are 15 people in total. Since two have cars and two have motorcycles, that's a total of 4 vehicles. However, we don't know how many people will use each vehicle yet.

Step 2: Determine the capacity of all vehicles combined
Each car can seat 5 people, so 2 cars can seat 10 people (2 x 5). Each motorcycle can fit 2 people, so 2 motorcycles can fit 4 people (2 x 2).

Total capacity = Car capacity + Motorcycle capacity
= 10 + 4
= 14

Step 3: Compare the total capacity to the number of people who need transportation
We have a total of 15 people who need transportation, but our vehicles can only seat 14 people. This means we're short one person.

Since we can't fit everyone in the available vehicles, we'll analyze further:

Step 4: Consider alternative arrangements for the remaining person
T

- The order of instructions matters!
- Ask the model to "answer first" and "explain later" to see how the output changes.

In [21]:
prompt = """
15 of us want to go to an AI Conference.
Two of us have cars
Each car can seat 5 people.
Two of us have motorcycles.
Each motorcycle can fit 2 people.

Can we all get to the AI Conference by car or motorcycle?

Think step by step.
Provide the answer as a single yes/no answer first.
Then explain each intermediate step.
"""

response = llama(prompt)
print(response)

Yes.

Here's the step-by-step explanation:

1. We have two cars, each seating 5 people, so we can seat up to 10 people in the cars (2 cars x 5 people/car).
2. Since there are only 15 people in total and 10 seats available in the cars, we need to find a way to transport the remaining 5 people.
3. We have two motorcycles, each seating 2 people, so we can seat up to 4 people on the motorcycles (2 motorcycles x 2 people/motorcycle).
4. Since there are only 5 people left and 4 seats available on the motorcycles, we can fit all 5 people on the motorcycles.

Therefore, it is possible for all 15 people to get to the AI Conference by car or motorcycle.


- Since LLMs predict their answer one token at a time, the best practice is to ask them to think step by step, and then only provide the answer after they have explained their reasoning.
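The reasoning-first ordering can be applied to any question by appending the instructions after the question text. The following is a minimal sketch; `reason_then_answer` is a hypothetical helper, and the instruction wording is copied from the prompt used earlier in this section.

```python
REASON_FIRST = (
    "Think step by step.\n"
    "Explain each intermediate step.\n"
    "Only when you are done with all your steps,\n"
    "provide the answer based on your intermediate steps.\n"
)


def reason_then_answer(question):
    """Place the question first, then the reasoning-before-answer instructions."""
    return f"\n{question.strip()}\n\n{REASON_FIRST}"


prompt = reason_then_answer(
    "Can we all get to the AI Conference by car or motorcycle?"
)
print(prompt)
```

The resulting string would be passed to `llama(prompt)` in the same way as the hand-written prompts above.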