## What is RAG?
Retrieval-augmented generation (RAG) is a technique in which a language model first retrieves relevant information from a database or document collection, and then uses that information to give a better-grounded answer.

## Why and when do we prefer RAG over fine-tuning?
We **prefer RAG over fine-tuning** when we want the model to give **up-to-date or specific answers** from **our own data** without changing the model itself.

---

**Why prefer RAG?**

* **Cheaper & faster** – No need to train the model again.
* **Easier to update** – Just change the documents, not the model.
* **Better for private or large data** – Your data stays in your own store instead of being baked into the model's weights.

---

**When to use RAG?**

* When your data **changes often** (like news, product lists).
* When you want the model to **answer from your documents**.
* When **training a model is too costly or slow**.

---

Think of RAG like **giving the model a library to read** instead of teaching it everything from scratch.
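
As a rough illustration of the retrieve-then-generate idea, the sketch below ranks a few documents by word overlap with the question and pastes the best match into a prompt. The documents, the `tokenize`, `retrieve` and `build_prompt` helpers, and the overlap scoring are all assumptions made for this illustration; the rest of this notebook uses embeddings and a vector database instead.

```python
import re

# Toy "knowledge base": in a real system these would be chunks of your documents.
DOCS = [
    "Our store opens at 9am and closes at 6pm on weekdays.",
    "Refunds are accepted within 30 days of purchase.",
    "Shipping is free for orders above 50 dollars.",
]

def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(question: str, docs: list[str], k: int = 1) -> list[str]:
    """Return the k documents sharing the most words with the question."""
    q = tokenize(question)
    ranked = sorted(docs, key=lambda d: len(q & tokenize(d)), reverse=True)
    return ranked[:k]

def build_prompt(question: str, docs: list[str]) -> str:
    """Paste the retrieved documents into the prompt ahead of the question."""
    context = "\n".join(retrieve(question, docs))
    return (
        "Using the context below, answer the query.\n\n"
        f"Context:\n{context}\n\n"
        f"Query: {question}"
    )

print(build_prompt("Are refunds accepted after purchase?", DOCS))
```

The resulting string is what would be sent to the language model; updating the answer only requires editing `DOCS`, not retraining anything.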


# Install Libraries
* **`langchain`** – Core framework to build LLM-powered applications.
* **`langchain-community`** – Extra integrations like tools, APIs, and vector stores.
* **`langchain-pinecone`** – Connects LangChain with Pinecone for vector storage and retrieval.
* **`langchain_groq`** – Enables LangChain to use Groq's ultra-fast language models.
* **`datasets`** – Provides ready-to-use NLP/ML datasets from Hugging Face.

Later cells also import `dotenv` (from the `python-dotenv` package) and `langchain_huggingface` (from the `langchain-huggingface` package). Install these as well if they are not already present in your environment.

In [1]:
%pip install langchain==0.3.23 langchain-community==0.3.21 langchain-pinecone==0.2.5 langchain_groq datasets==3.5.0

Note: you may need to restart the kernel to use updated packages.


## Load API Keys from .env file
### What is env file?

* A `.env` file is a simple text file that stores environment variables (like API keys and secrets) in `key=value` format.
* Example content of a `.env` file:
    * `PINECONE_API_KEY=your_pinecone_api_key_here`
    * `GROQ_API_KEY=your_groq_api_key_here`
* It helps keep sensitive information out of your code and makes it easier to manage secrets securely.

**The code below:**

* **Imports** `os`, a Python built-in module that lets your code interact with the operating system (including its environment variables), and `load_dotenv`, which loads values from a `.env` file.
* **Loads the `.env` file** so Python can use the secret keys stored in it (like API keys).
* **Gets the values** of `PINECONE_API_KEY` and `GROQ_API_KEY` with `os.getenv`.

**In short:** this code **reads your secret keys from a `.env` file** so you don't have to write them directly in your code.


In [2]:
import os
from dotenv import load_dotenv
load_dotenv()
pinecone_api=os.getenv("PINECONE_API_KEY")
# the variable name must match the key used in your .env file
groq_api=os.getenv("GROQ_API_KEY")

if groq_api:
    print("GROQ API is loaded successfully.")
else:
    print("GROQ API is not loaded.")
if pinecone_api:
    print("Pinecone is loaded successfully.")
else:
    print("Pinecone API is not loaded.")

GROQ API is loaded successfully.
Pinecone is loaded successfully.
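
The check above only prints a message when a key is missing, so later cells would still fail with a less obvious error. A minimal sketch of a fail-fast alternative is shown below; `require_env` is a hypothetical helper, not part of `python-dotenv`, and the placeholder value exists only so the sketch runs on its own.

```python
import os

def require_env(*names: str) -> dict[str, str]:
    """Return the named environment variables, raising if any is missing or empty."""
    missing = [n for n in names if not os.getenv(n)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {n: os.environ[n] for n in names}

# Placeholder so the sketch runs standalone; in the notebook the real
# value would already have been loaded by load_dotenv().
os.environ.setdefault("PINECONE_API_KEY", "your_pinecone_api_key_here")

keys = require_env("PINECONE_API_KEY")
print(sorted(keys))
```

Raising early makes a misnamed or missing key visible at the point where it is loaded rather than at the first API call.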


## What is `langchain_groq`?

`langchain_groq` is a **LangChain integration** that lets you **connect to Groq’s LLMs** (like LLaMA3) easily.

Think of it as a **bridge between LangChain and Groq’s fast language models**.

---

## What is `ChatGroq`?

`ChatGroq` is a **class (tool)** inside `langchain_groq`.

It lets you:

* **Send prompts** to Groq-hosted models
* **Receive responses** from those models
* Use these models in your **LangChain app**, like chatbots, RAG, agents, etc.

---

**Why do we use this?**

Instead of manually setting up HTTP requests to Groq’s API, `ChatGroq` makes it **super easy**:

* Lets you talk to a specific Groq-hosted model (this notebook uses `Llama-3.3-70B-Versatile`)
* Works smoothly with LangChain tools (retrievers, chains, memory, etc.)
* Connects securely with your `groq_api_key`

---

#### In Simple Words

* `langchain_groq` lets LangChain talk to Groq.
* `ChatGroq` is the tool that helps you **chat with Groq’s AI model** using your API key.


In [3]:
from langchain_groq import ChatGroq
chat=ChatGroq(
    groq_api_key=groq_api,
    model_name="Llama-3.3-70B-Versatile"
)


Groq exposes an **OpenAI-compatible chat API**, so conversations with Groq-hosted models such as Llama 3 and Mixtral use the **same chat structure as OpenAI**.
Written out as plain text, a chat **typically looks like this**:

```
System: You are a helpful assistant.
User: Hi, how are you?
Assistant: I'm doing well! How can I assist you today?
User: What is quantum computing?
Assistant:
```

The final `"Assistant:"` without a response is what prompts the model to continue the conversation. When calling the API, the same conversation is passed as a list of structured messages rather than as plain text.

---

**In Code (OpenAI/Groq-compatible format):**

When using the API (like with `ChatGroq` or `ChatOpenAI` in LangChain), you use this structure:

```
[
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hi, how are you?"},
    {"role": "assistant", "content": "I'm doing well! How can I assist you today?"},
    {"role": "user", "content": "What is quantum computing?"}
]
```
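
To make the link between the two representations concrete, the sketch below renders the role/content list as the plain-text transcript shown earlier. `render_transcript` and the label mapping are hypothetical helpers written for this illustration; real chat models apply their own model-specific templates.

```python
ROLE_LABELS = {"system": "System", "user": "User", "assistant": "Assistant"}

def render_transcript(messages: list[dict[str, str]]) -> str:
    """Render role/content dicts as 'Label: text' lines, ending with an open assistant turn."""
    lines = [f"{ROLE_LABELS[m['role']]}: {m['content']}" for m in messages]
    lines.append("Assistant:")  # the open turn the model is asked to complete
    return "\n".join(lines)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hi, how are you?"},
    {"role": "assistant", "content": "I'm doing well! How can I assist you today?"},
    {"role": "user", "content": "What is quantum computing?"},
]
print(render_transcript(messages))
```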

---

**In LangChain (message objects):**

LangChain wraps those into **message classes**, like:

```
from langchain_core.messages import SystemMessage, HumanMessage, AIMessage

messages = [
    SystemMessage(content="You are a helpful assistant."),
    HumanMessage(content="Hi, how are you?"),
    AIMessage(content="I'm doing well! How can I assist you today?"),
    HumanMessage(content="What is quantum computing?")
]
```
The format is very similar: we've just swapped the `"system"` role for `SystemMessage`, `"user"` for `HumanMessage`, and `"assistant"` for `AIMessage`.

Then you pass them to the model:

```
response = chat.invoke(messages)
print(response.content)
```

In [4]:
from langchain_core.messages import SystemMessage, AIMessage, HumanMessage
messages=[
    SystemMessage(content="You are a helpful assistant."),
    HumanMessage(content="Who is the founder of Pakistan?"),
    AIMessage(content="Quaid-e-Azam Muhammad Ali Jinnah was the founder of Pakistan."),
    HumanMessage(content="I'd like to know about the history of Pakistan.")
]


We generate the next response from the AI by passing these messages to the `ChatGroq` object.

Like saying to the AI:

“Here’s what has been said so far — now tell me what the AI should say next.”

LangChain then handles formatting and sending this to the LLM backend, and `res` stores the AI's next reply.

**In Short:**
* You define a conversation (via `messages`).
* Call the LLM using `chat.invoke(messages)`. Calling `chat(messages)` directly is deprecated in recent LangChain versions.
* Get a response back, stored in `res`.

In [5]:
res=chat.invoke(messages)
res


AIMessage(content="The history of Pakistan is a rich and complex one, spanning thousands of years. Here's a brief overview:\n\n**Ancient Civilizations (3300 BCE - 500 CE):**\nThe region that is now Pakistan was home to some of the world's oldest and most influential civilizations, including the Indus Valley Civilization (3300-1300 BCE), the Vedic Civilization (1500-500 BCE), and the ancient Gandhara Civilization (500 BCE-500 CE). These civilizations made significant contributions to art, architecture, literature, and philosophy.\n\n**Islamic Conquest and Rule (711-1858 CE):**\nIn 711 CE, Arab Muslims conquered the region, introducing Islam and establishing the Umayyad Caliphate. Over the centuries, various Muslim dynasties, including the Ghaznavids, Ghorids, and Mughals, ruled the region, leaving a lasting legacy in architecture, art, and culture.\n\n**British Colonial Rule (1858-1947 CE):**\nIn 1858, the British East India Company established colonial rule in the region, which became 

To see the model's reply as plain text, print `res.content`:

In [6]:
print(res.content)

The history of Pakistan is a rich and complex one, spanning thousands of years. Here's a brief overview:

**Ancient Civilizations (3300 BCE - 500 CE):**
The region that is now Pakistan was home to some of the world's oldest and most influential civilizations, including the Indus Valley Civilization (3300-1300 BCE), the Vedic Civilization (1500-500 BCE), and the ancient Gandhara Civilization (500 BCE-500 CE). These civilizations made significant contributions to art, architecture, literature, and philosophy.

**Islamic Conquest and Rule (711-1858 CE):**
In 711 CE, Arab Muslims conquered the region, introducing Islam and establishing the Umayyad Caliphate. Over the centuries, various Muslim dynasties, including the Ghaznavids, Ghorids, and Mughals, ruled the region, leaving a lasting legacy in architecture, art, and culture.

**British Colonial Rule (1858-1947 CE):**
In 1858, the British East India Company established colonial rule in the region, which became part of British India. Durin

Because `res` is just another `AIMessage` object, we can append it to `messages`, add another `HumanMessage`, and generate the next response in the conversation.

In [7]:
# add latest AI response to messages
messages.append(res)

# now create a new user prompt
prompt=HumanMessage(content="Who was the first prime minister of Pakistan?")

# add to messages
messages.append(prompt)

# send to LLM
res = chat.invoke(messages)

print(res.content)

Liaquat Ali Khan was the first Prime Minister of Pakistan, serving from August 14, 1947, until his assassination on October 16, 1951. He was a close associate and friend of Muhammad Ali Jinnah, the founder of Pakistan, and played a key role in the country's early years.
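
The append-and-resend pattern above can be wrapped in a small helper. The sketch below uses plain role/content dicts and a stand-in `fake_llm` function so that it runs without an API key; `Conversation` and `fake_llm` are hypothetical names, and in the notebook the call would go to `chat.invoke` with LangChain message objects instead.

```python
from typing import Callable

Message = dict[str, str]

def fake_llm(messages: list[Message]) -> str:
    """Stand-in for a real model: reports how many messages it received."""
    return f"(reply based on {len(messages)} messages)"

class Conversation:
    """Keeps the full message history and resends it on every turn."""

    def __init__(self, llm: Callable[[list[Message]], str], system: str):
        self.llm = llm
        self.messages: list[Message] = [{"role": "system", "content": system}]

    def ask(self, question: str) -> str:
        # add the user's turn, call the model with the whole history,
        # then store the reply so the next turn can see it
        self.messages.append({"role": "user", "content": question})
        reply = self.llm(self.messages)
        self.messages.append({"role": "assistant", "content": reply})
        return reply

conv = Conversation(fake_llm, "You are a helpful assistant.")
print(conv.ask("Who is the founder of Pakistan?"))
print(conv.ask("Who was the first prime minister of Pakistan?"))
print(len(conv.messages))
```

Because the model itself is stateless, the history list is the only "memory" the chatbot has: every turn grows the prompt that is sent.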


## Dealing with Hallucinations

We have our chatbot, but as mentioned — the knowledge of LLMs can be limited. The reason for this is that LLMs learn all they know during training. An LLM essentially compresses the "world" as seen in the training data into the internal parameters of the model. We call this knowledge the _parametric knowledge_ of the model.

By default, LLMs have no access to the external world.

The result of this is very clear when we ask LLMs about more recent information, such as Deepseek R1.

In [8]:
# ask about something newer than the model's training data
prompt = HumanMessage(content="What is so special about Deepseek R1?")
res = chat.invoke(messages + [prompt])
print(res.content)

A model whose training data predates Deepseek R1 cannot answer this from its parametric knowledge. Sometimes it is clear that the LLM doesn't know the information, but an LLM may also respond as if it _does_ know the answer, and this can be very hard to detect.

One way around this is to give the model the missing information directly in the prompt. We call this _source knowledge_. As a small example, here is a passage about Pakistan:

In [9]:
source_knowledge = (
    "Pakistan was created on August 14, 1947, as a result of the partition of British India. "
    "The movement for independence was led by Muhammad Ali Jinnah and the All-India Muslim League, "
    "who demanded a separate homeland for Muslims. The new nation consisted of two regions, "
    "West Pakistan (present-day Pakistan) and East Pakistan (now Bangladesh), separated by about 1,600 km of Indian territory. "
    "In 1971, East Pakistan broke away after a civil war and became Bangladesh. "
    "Pakistan's first constitution was adopted in 1956, declaring it an Islamic Republic. "
    "Since independence, Pakistan has experienced several military coups, democratic transitions, "
    "and significant developments in its nuclear program, cultural identity, and regional politics."
)


We can feed this additional knowledge into our prompt with some instructions telling the LLM how we'd like it to use this information alongside our original query.

In [10]:
query="What is so special about Pakistan?"

augmented_prompt=f"""Using the context below, answer the query.

context:{source_knowledge}
Query:{query}
"""

Now we feed this into our chatbot as we did before.

In [11]:
# create a new user prompt
prompt = HumanMessage(
    content=augmented_prompt
)
# add to messages
messages.append(prompt)

# send to GROQ
res = chat.invoke(messages)


In [12]:
print(res.content)

Pakistan is special because it was created as a separate homeland for Muslims, making it a unique nation with a distinct cultural and religious identity. Additionally, it has a complex and fascinating history, having experienced the partition of British India, the separation of East Pakistan (now Bangladesh), and significant developments in its nuclear program, cultural identity, and regional politics.


## How do we get this information in the first place?

The answer now draws directly on the context we supplied. This is made possible by augmenting our query with external knowledge (source knowledge). There's just one problem: we wrote that context by hand, and for real questions we need a way to find the relevant text automatically.

This is where Pinecone and vector databases come in. But first, we'll need a dataset.

## Importing the Data

We will use the Hugging Face Datasets library to load our data, specifically the `"jamescalam/deepseek-r1-paper-chunked"` dataset. This dataset contains the Deepseek R1 paper pre-processed into RAG-ready chunks.

In [13]:
from datasets import load_dataset
dataset=load_dataset(
    "jamescalam/deepseek-r1-paper-chunked",
    split="train"
)
dataset

Dataset({
    features: ['doi', 'chunk-id', 'chunk', 'num_tokens', 'pages', 'source'],
    num_rows: 76
})

In [14]:
dataset[0]

{'doi': '2501.12948v1',
 'chunk-id': 1,
 'chunk': "uestion: If a > 1, then the sum of the real solutions of √a - √a + x = x is equal to Response: <think> To solve the equation √a – √a + x = x, let's start by squaring both . . . (√a-√a+x)² = x² ⇒ a - √a + x = x². Rearrange to isolate the inner square root term:(a – x²)² = a + x ⇒ a² – 2ax² + (x²)² = a + x ⇒ x⁴ - 2ax² - x + (a² – a) = 0",
 'num_tokens': 145,
 'pages': [1],
 'source': 'https://arxiv.org/abs/2501.12948'}

## Dataset Overview

The dataset we are using is sourced from the Deepseek R1 arXiv paper. Each entry in the dataset represents a "chunk" of text from that paper.

Because most **L**arge **L**anguage **M**odels (LLMs) only contain knowledge of the world as it was during training, even many of the newest LLMs cannot answer questions about Deepseek R1 — at least not without this data.

## Building the Knowledge Base

We now have a dataset that can serve as our chatbot knowledge base. Our next task is to transform that dataset into a knowledge base that our chatbot can use. To do this we need an embedding model and a vector database.

We begin by initializing our Pinecone client, which requires a [free API key](https://app.pinecone.io).

In [15]:
from pinecone import Pinecone
# initialize client
pc=Pinecone(api_key=pinecone_api)


If an index with this name is left over from an earlier run, delete it first to free resources (uncomment the line below):

In [22]:
index_name="rag1"
# pc.delete_index(index_name) 

In [23]:
from pinecone import ServerlessSpec, CloudProvider, AwsRegion, Metric
pc.create_index(
    name=index_name,
    metric=Metric.DOTPRODUCT,
    dimension=384,
    spec=ServerlessSpec(
        cloud=CloudProvider.AWS,
        region=AwsRegion.US_EAST_1

    )
)
index = pc.Index(index_name)


Our index is now ready but it's empty. It is a vector index, so it needs vectors. To create these vector embeddings we will use Hugging Face's `sentence-transformers/all-MiniLM-L6-v2` model, whose 384-dimensional output matches the `dimension=384` we set on the index. We can access it via LangChain like so:

In [24]:
from langchain_huggingface import HuggingFaceEmbeddings
embed_model = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")


Using this model we can create embeddings like so:

In [25]:
texts = [
    'this is the first chunk of text',
    'then another second chunk of text is here'
]

res = embed_model.embed_documents(texts)
len(res), len(res[0])

(2, 384)

From this we get two 384-dimensional embeddings, one for each of our two chunks of text.
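
To give a sense of how such vectors are compared during retrieval, the sketch below computes dot-product and cosine similarity on three made-up 3-dimensional vectors (the real embeddings here have 384 dimensions). The vectors and helper names are illustrative assumptions, not output of the embedding model.

```python
import math

def dot(a: list[float], b: list[float]) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: the dot product after scaling both vectors to unit length."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

# Made-up vectors standing in for embeddings of a query and two chunks.
query = [1.0, 0.0, 1.0]
chunk_a = [0.9, 0.1, 0.8]   # points in a similar direction to the query
chunk_b = [0.0, 1.0, 0.0]   # orthogonal to the query

scores = {"chunk_a": dot(query, chunk_a), "chunk_b": dot(query, chunk_b)}
best = max(scores, key=scores.get)
print(best, round(cosine(query, chunk_a), 3))
```

The index in this notebook was created with the dot-product metric. When vectors are normalized to unit length, dot product and cosine similarity produce the same ranking.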

We're now ready to embed and index all of our data! We do this by looping through our dataset, embedding the chunks, and inserting everything in batches.

In [26]:
from tqdm.auto import tqdm  # for progress bar

data = dataset.to_pandas()  # this makes it easier to iterate over the dataset

batch_size = 100

for i in tqdm(range(0, len(data), batch_size)):
    i_end = min(len(data), i+batch_size)
    # get batch of data
    batch = data.iloc[i:i_end]
    # generate unique ids for each chunk
    ids = [f"{x['doi']}-{x['chunk-id']}" for _, x in batch.iterrows()]
    # get text to embed
    texts = [x['chunk'] for _, x in batch.iterrows()]
    # embed text
    embeds = embed_model.embed_documents(texts)
    # get metadata to store in Pinecone
    metadata = [
        {'text': x['chunk'],
         'source': x['source']} for _, x in batch.iterrows()
    ]
    # add to Pinecone
    index.upsert(vectors=zip(ids, embeds, metadata))

  0%|          | 0/1 [00:00<?, ?it/s]

100%|██████████| 1/1 [00:36<00:00, 36.53s/it]
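
The progress bar shows a single step because the 76 rows fit inside one batch of 100. The batching logic on its own can be sketched as below, using placeholder records shaped like the dataset rows; `batched` is a hypothetical helper and the record contents are made up.

```python
from typing import Iterator

def batched(records: list[dict], size: int) -> Iterator[list[dict]]:
    """Yield consecutive slices of `records` with at most `size` items each."""
    for start in range(0, len(records), size):
        yield records[start:start + size]

# 76 placeholder records shaped like the dataset rows (values are made up).
records = [
    {"doi": "2501.12948v1", "chunk-id": n, "chunk": f"text {n}"}
    for n in range(1, 77)
]

for batch in batched(records, 100):
    # same id scheme as the notebook: "<doi>-<chunk-id>"
    ids = [f"{r['doi']}-{r['chunk-id']}" for r in batch]
    print(len(batch), ids[0], ids[-1])
```

With a larger dataset the same loop would issue one embedding call and one upsert per batch, which keeps request sizes bounded.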


We can check that the vector index has been populated using `describe_index_stats`:

In [27]:
index.describe_index_stats()

{'dimension': 384,
 'index_fullness': 0.0,
 'metric': 'dotproduct',
 'namespaces': {'': {'vector_count': 76}},
 'total_vector_count': 76,
 'vector_type': 'dense'}

# Retrieval Augmented Generation
We've built a fully-fledged knowledge base. Now it's time to link that knowledge base to our chatbot. To do that we'll be diving back into LangChain and reusing our template prompt from earlier.

To use LangChain here we need to load the LangChain abstraction for a vector index, called a `vectorstore`. We pass in our vector `index` to initialize the object.

In [28]:
from langchain_pinecone import PineconeVectorStore
text_field="text"
vectorstore=PineconeVectorStore(
    index=index,
    embedding=embed_model,
    text_key=text_field
)

Using this `vectorstore` we can already query the index and see if we have any relevant information given our question about Deepseek R1.

In [29]:
query = "What is so special about Deepseek R1?"
vectorstore.similarity_search(query,k=3)



[Document(id='2501.12948v1-39', metadata={'source': 'https://arxiv.org/abs/2501.12948'}, page_content='## 1.2. Summary of Evaluation Results - **Reasoning tasks:** (1) DeepSeek-R1 achieves a score of 79.8% Pass@1 on AIME 2024, slightly surpassing OpenAI-01-1217. On MATH-500, it attains an impressive score of 97.3%, performing on par with OpenAI-01-1217 and significantly outperforming other models. (2) On coding-related tasks, DeepSeek-R1 demonstrates expert level in code competition tasks, as it achieves 2,029 Elo rating on Codeforces outperforming 96.3% human participants in the competition. For engineering-related tasks, DeepSeek-R1 performs slightly better than DeepSeek-V3, which could help developers in real world tasks. - **Knowledge:** On benchmarks such as MMLU, MMLU-Pro, and GPQA Diamond, DeepSeek-R1 achieves outstanding results, significantly outperforming DeepSeek-V3 with scores of 90.8% on MMLU, 84.0% on MMLU-Pro, and 71.5% on GPQA Diamond. While its performance is slightly 

We return a lot of text here and it's not that clear what we need or what is relevant. Fortunately, our LLM will be able to parse this information much faster than us. All we need is to link the output from our `vectorstore` to our `chat` chatbot. To do that we can use the same logic as we used earlier.

In [30]:
def augment_prompt(query:str):
    results=vectorstore.similarity_search(query,k=3)
    source_knowledge="\n".join([x.page_content for x in results])
    augmented_prompt=f"""Using the context below, answer the query.
    context:
    {source_knowledge}
    Query:
    {query}"""
    return augmented_prompt


Using this we produce an augmented prompt:

In [31]:
print(augment_prompt(query))

Using the context below, answer the query.
    context:
    ## 1.2. Summary of Evaluation Results - **Reasoning tasks:** (1) DeepSeek-R1 achieves a score of 79.8% Pass@1 on AIME 2024, slightly surpassing OpenAI-01-1217. On MATH-500, it attains an impressive score of 97.3%, performing on par with OpenAI-01-1217 and significantly outperforming other models. (2) On coding-related tasks, DeepSeek-R1 demonstrates expert level in code competition tasks, as it achieves 2,029 Elo rating on Codeforces outperforming 96.3% human participants in the competition. For engineering-related tasks, DeepSeek-R1 performs slightly better than DeepSeek-V3, which could help developers in real world tasks. - **Knowledge:** On benchmarks such as MMLU, MMLU-Pro, and GPQA Diamond, DeepSeek-R1 achieves outstanding results, significantly outperforming DeepSeek-V3 with scores of 90.8% on MMLU, 84.0% on MMLU-Pro, and 71.5% on GPQA Diamond. While its performance is slightly below that of OpenAI-01-1217 on these bench

There is still a lot of text here, so let's pass it onto our chat model to see how it performs.

In [32]:
prompt=HumanMessage(content=augment_prompt(query))
messages.append(prompt)
res=chat.invoke(messages)
print(res.content)

DeepSeek-R1 is special because it achieves outstanding results in various reasoning-related benchmarks, such as AIME 2024, MATH-500, and GPQA Diamond, demonstrating its competitive edge in educational tasks. It also performs exceptionally well in coding-related tasks, achieving an expert level in code competition tasks with a 2,029 Elo rating on Codeforces, outperforming 96.3% of human participants. Additionally, DeepSeek-R1 shows strong performance in knowledge-based tasks, significantly outperforming other models in benchmarks such as MMLU, MMLU-Pro, and GPQA Diamond. Its capabilities make it a potential solution for AI-driven reasoning applications.
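
The prompt printed earlier carries the indentation of the triple-quoted f-string inside `augment_prompt`. A possible variant that avoids this and numbers each retrieved chunk is sketched below; `format_prompt` is a hypothetical helper that takes the chunk texts directly, so it runs without a vector store.

```python
def format_prompt(chunks: list[str], query: str) -> str:
    """Join retrieved chunks into a numbered context block, then append the query."""
    context = "\n\n".join(f"[{n}] {chunk}" for n, chunk in enumerate(chunks, start=1))
    return (
        "Using the context below, answer the query.\n\n"
        f"Context:\n{context}\n\n"
        f"Query:\n{query}"
    )

# Placeholder chunks; in the notebook these would be the `page_content`
# values returned by `vectorstore.similarity_search`.
chunks = ["First retrieved chunk.", "Second retrieved chunk."]
print(format_prompt(chunks, "What is so special about Deepseek R1?"))
```

Numbering the chunks also gives the model something to refer to if you later ask it to say which passage supports its answer.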


We can continue with another Deepseek R1 question:

In [None]:
prompt = HumanMessage(
    content=augment_prompt(
        "how does deepseek r1 compare to deepseek r1 zero?"
    )
)

res = chat.invoke(messages + [prompt])
print(res.content)

There is no mention of "DeepSeek R1 Zero" in the provided context. The context only compares DeepSeek-R1 with other models such as OpenAI-01-1217, DeepSeek-V3, GPT-4o-0513, and Claude-3.5-Sonnet-1022, but does not mention a "DeepSeek R1 Zero" model. Therefore, it is not possible to compare DeepSeek R1 with DeepSeek R1 Zero based on the provided information.

You can continue asking questions about Deepseek R1, but once you're done you can delete the index to save resources:

In [34]:
pc.delete_index(index_name)