### What is RAG?
One of the most powerful applications enabled by LLMs is the sophisticated question-answering (Q&A) chatbot: an application that can answer questions about specific source information. These applications use a technique known as retrieval-augmented generation (RAG), which augments an LLM's knowledge with additional data, including your own.

LLMs can reason about wide-ranging topics, but their knowledge is limited to public data up to the specific point in time that they were trained. If you want to build AI applications that can reason about private data or data introduced after a model's cut-off date, you must augment the knowledge of the model with the specific information that it needs. The process of retrieving the appropriate information and inserting it into the model prompt is known as RAG.

LangChain has several components designed to help build Q&A applications and, more generally, RAG applications.

### RAG architecture
A typical RAG application has two main components:

* **Indexing**: A pipeline for ingesting and indexing data from a source. This usually happens offline.

* **Retrieval and generation**: The actual RAG chain takes the user query at run time and retrieves the relevant data from the index, then passes that to the model.

The most common full sequence, from raw data to answer, consists of the following steps.


- **Indexing**
1. Load: First, you must load your data. This is done with [DocumentLoaders](https://python.langchain.com/docs/how_to/#document-loaders).

2. Split: [Text splitters](https://python.langchain.com/docs/how_to/#text-splitters) break large `Documents` into smaller chunks. This is useful both for indexing data and for passing it into a model because large chunks are harder to search and won't fit in a model's finite context window.

3. Store: You need somewhere to store and index your splits so that they can later be searched. This is often done using a [VectorStore](https://python.langchain.com/docs/how_to/#vector-stores) and [Embeddings](https://python.langchain.com/docs/how_to/embed_text/) model.


<img src="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/WEE3pjeJvSZP0R7UL7CYTA.png" width="50%" alt="indexing"/> <br>
<span style="font-size: 10px;">[source](https://python.langchain.com/docs/tutorials/rag/)</span>


- **Retrieval and generation**
1. Retrieve: Given a user input, relevant splits are retrieved from storage using a retriever.
2. Generate: A ChatModel / LLM produces an answer using a prompt that includes the question and the retrieved data.

<img src="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/SwPO26VeaC8VTZwtmWh5TQ.png" width="50%" alt="retrieval"/> <br>
<span style="font-size: 10px;">[source](https://python.langchain.com/docs/use_cases/question_answering/)</span>
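
To make the two phases concrete, here is a minimal sketch of the whole sequence using only the Python standard library. It is not how LangChain implements these steps: the document text is invented, the "index" is a plain list, and word overlap stands in for real similarity search. All function names are hypothetical.

```python
# Toy RAG pipeline using only the standard library (no LangChain, no real LLM).

def load() -> str:
    # Load: stand-in for a document loader; this text is invented for illustration.
    return (
        "Smoking is only allowed in designated areas.\n\n"
        "Mobile phones are mainly for work use.\n\n"
        "Report lost devices to IT immediately."
    )

def split(text: str) -> list[str]:
    # Split: break the document into chunks at blank lines.
    return [part.strip() for part in text.split("\n\n") if part.strip()]

def tokens(text: str) -> set[str]:
    # Lowercase words with surrounding punctuation stripped.
    return {w.strip(".,?!").lower() for w in text.split()}

def retrieve(index: list[str], question: str, k: int = 1) -> list[str]:
    # Retrieve: rank chunks by how many words they share with the question.
    q = tokens(question)
    ranked = sorted(index, key=lambda chunk: len(q & tokens(chunk)), reverse=True)
    return ranked[:k]

def build_prompt(context: list[str], question: str) -> str:
    # Generate (input side): the text a model would receive.
    joined = "\n\n".join(context)
    return f"Context:\n{joined}\n\nQuestion:\n{question}"

toy_index = split(load())  # Store: here the "index" is just a list
toy_question = "Where is smoking allowed?"
top = retrieve(toy_index, toy_question)
prompt_text = build_prompt(top, toy_question)
print(prompt_text)
```

A real application replaces the word-overlap ranking with embeddings and a vector store, and sends `prompt_text` to a chat model instead of printing it.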


In [1]:
from dotenv import load_dotenv

load_dotenv()

True

In [2]:
import wget
filename = 'companyPolicies.txt'
url = 'https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/6JDbUb_L3egv_eOkouY71A.txt'

# Use wget to download the file
wget.download(url, out=filename)
print('file downloaded')

file downloaded


In [3]:
with open(filename,"r") as file:
    contents = file.read()
    print(contents)

1.	Code of Conduct

Our Code of Conduct outlines the fundamental principles and ethical standards that guide every member of our organization. We are committed to maintaining a workplace that is built on integrity, respect, and accountability.
Integrity: We hold ourselves to the highest ethical standards. This means acting honestly and transparently in all our interactions, whether with colleagues, clients, or the broader community. We respect and protect sensitive information, and we avoid conflicts of interest.
Respect: We embrace diversity and value each individual's contributions. Discrimination, harassment, or any form of disrespectful behavior is unacceptable. We create an inclusive environment where differences are celebrated and everyone is treated with dignity and courtesy.
Accountability: We take responsibility for our actions and decisions. We follow all relevant laws and regulations, and we strive to continuously improve our practices. We report any potential violations of 

### Split the document into chunks

In this step, you are splitting the document into chunks, which is basically the `split` process in `Indexing`.

<img src="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/0JFmAV5e_mejAXvCilgHWg.png" width="50%" alt="split"/>

`LangChain` is used to split the document and create chunks. It helps you divide a long story (document) into smaller parts, which are called `chunks`, so that it's easier to handle. 

The splitter first cuts the text at every occurrence of the split separator and then merges neighboring pieces into chunks of at most `chunk_size` characters. This project sets the chunk size to 1000. `CharacterTextSplitter` uses `\n\n` as the default separator and does not split inside a piece, so any passage longer than 1000 characters that contains no `\n\n` is kept whole. That is why the output below reports chunks larger than the specified size; the splitting is not random. You can change the separator with the `separator` parameter of `CharacterTextSplitter`; for example, `separator="\n"`.


In [4]:
from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import CharacterTextSplitter

loader = TextLoader(filename)
docs = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000,chunk_overlap=0)
chunks = text_splitter.split_documents(docs)
print(len(chunks)) 

Created a chunk of size 1624, which is longer than the specified 1000
Created a chunk of size 1885, which is longer than the specified 1000
Created a chunk of size 1903, which is longer than the specified 1000
Created a chunk of size 1729, which is longer than the specified 1000
Created a chunk of size 1678, which is longer than the specified 1000
Created a chunk of size 2032, which is longer than the specified 1000
Created a chunk of size 1894, which is longer than the specified 1000


16
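
The warnings above are easier to interpret with a small model of separator-based splitting. The sketch below is a simplified imitation written for this notebook, not LangChain's actual implementation: it ignores `chunk_overlap` and whitespace stripping, and the function name is hypothetical. It cuts only at the separator and then merges pieces greedily, so a piece that is already longer than `chunk_size` comes out as an oversized chunk.

```python
def split_like_character_splitter(text: str, separator: str = "\n\n",
                                  chunk_size: int = 20) -> list[str]:
    """Cut at `separator`, then greedily merge pieces up to `chunk_size`."""
    pieces = [p for p in text.split(separator) if p]
    merged: list[str] = []
    current = ""
    for piece in pieces:
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size:
            current = candidate         # still fits: keep merging
        else:
            if current:
                merged.append(current)  # flush what has been merged so far
            current = piece             # a piece is never cut internally
    if current:
        merged.append(current)
    return merged

sample = "short one\n\nshort two\n\nthis paragraph has no blank line and is long"
toy_chunks = split_like_character_splitter(sample, chunk_size=20)
for chunk in toy_chunks:
    note = " (over chunk_size)" if len(chunk) > 20 else ""
    print(len(chunk), repr(chunk) + note)
```

The two short paragraphs merge into one chunk of exactly 20 characters, while the long paragraph stays whole because it contains no separator. If oversized chunks are a problem, a splitter that falls back to smaller separators, such as `RecursiveCharacterTextSplitter` (used in the exercise at the end), is one option.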


### Embedding and storing
This step is the `embed` and `store` processes in `Indexing`. <br>
<img src="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/u_oJz3v2cSR_lr0YvU6PaA.png" width="50%" alt="split"/>

In this step, you're taking the pieces of the story, your "chunks," converting the text into numbers, and making them easier for your computer to understand and remember by using a process called "embedding." Think of embedding like giving each chunk its own special code. This code helps the computer quickly find and recognize each chunk later on. 

You do this embedding process during a phase called "Indexing." The goal is to make sure that when you need to find specific information or details within your larger document, the computer can do so swiftly and accurately.

The following code creates an embedding model from Hugging Face (`sentence-transformers/all-MiniLM-L6-v2`), embeds the chunks, and ingests them into Chroma.

When ingestion completes, the code prints "Documents ingested".



In [None]:
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_chroma import Chroma

embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
doc_search = Chroma.from_documents(documents=chunks,
                                   embedding=embeddings,
                                   persist_directory="./chroma_db")

# persist_directory is where Chroma saves the vector database on disk
print("Documents ingested")

Documents ingested
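
To see what "converting text into numbers" buys you, here is a toy version of embedding and similarity search. It is only an illustration under simplifying assumptions: the "embedding" is a word-count vector rather than a learned model such as `all-MiniLM-L6-v2`, the store is a Python list rather than Chroma, and the chunk texts are invented.

```python
import math
from collections import Counter

def toy_embed(text: str, vocabulary: list[str]) -> list[float]:
    # Toy embedding: one dimension per vocabulary word, value = word count.
    counts = Counter(w.strip(".,?!").lower() for w in text.split())
    return [float(counts[word]) for word in vocabulary]

def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity: 1.0 for the same direction, 0.0 for no overlap.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

toy_texts = [
    "Smoking is only allowed in designated areas.",
    "Mobile phones are mainly for work use.",
]
vocabulary = sorted({w.strip(".,?!").lower() for t in toy_texts for w in t.split()})

# "Indexing": embed every chunk once and keep the vector next to its text.
toy_store = [(toy_embed(t, vocabulary), t) for t in toy_texts]

# "Retrieval": embed the query the same way and pick the closest vector.
query_vec = toy_embed("Can I use mobile phones?", vocabulary)
best = max(toy_store, key=lambda item: cosine(query_vec, item[0]))
print(best[1])
```

A learned embedding model plays the same role as `toy_embed` but also places texts with similar meaning close together even when they share no words, which word counts cannot do.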


In [6]:
from langchain_groq import ChatGroq

llm = ChatGroq(model="openai/gpt-oss-20b")

Note: LangChain v1 no longer provides the legacy chain classes in the main `langchain` package, so I'm using LCEL (LangChain Expression Language) for this part.

In [7]:
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

prompt = ChatPromptTemplate.from_template(
    """You are a helpful assistant.
    
    Answer the question using ONLY the context below.
    
    Context:
    {context}
    
    Question:
    {question}"""
)

def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)


In [None]:
retriever = doc_search.as_retriever()

# User Question → Retriever → Context → Prompt → LLM → Answer

rag_chain = (
    {
        "context": retriever | format_docs,
        "question": RunnablePassthrough() # Returns the input unchanged
    }
    | prompt 
    | llm 
    | StrOutputParser()
)

response = rag_chain.invoke("what is the mobile policy?")
response

'**Mobile Phone Policy**\n\nThe policy establishes how employees should use mobile devices in the organization. Key points include:\n\n- **Acceptable Use** – Phones are mainly for work; limited personal use is allowed if it doesn’t interfere with duties.  \n- **Security** – Protect devices and credentials, be cautious with unknown apps or links, and report any security concerns immediately.  \n- **Confidentiality** – Do not send sensitive company data via unsecured messaging or email; be discreet in public.  \n- **Cost Management** – Keep personal charges separate from company accounts; reimburse any personal charges on company-issued phones.  \n- **Compliance** – Follow all relevant laws and regulations on data protection and privacy.  \n- **Lost or Stolen Devices** – Report any loss or theft right away to IT or your supervisor.  \n- **Consequences** – Non-compliance can result in disciplinary action, including loss of mobile-phone privileges.  \n\nThe policy is 

# How a RAG Chain Works in LangChain v1

## Short Answer

**The question comes first.**
Then the retriever uses the question to build the context.

```
User Question → Retriever → Context → Prompt → LLM → Answer
```

---

# Step-by-Step Explanation

Let's say the user asks:

```
"What is mobile policy?"
```

And your RAG chain is:

```python
rag_chain = (
    {
        "context": retriever | format_docs,
        "question": RunnablePassthrough()
    }
    | prompt
    | llm
    | StrOutputParser()
)
```

---

## Step 1: User Sends Question

```
Input → "What is mobile policy?"
```

This is the **only input** to the chain.

---

## Step 2: Input Goes Into Both Branches

```python
{
   "context": retriever | format_docs,
   "question": RunnablePassthrough()
}
```

This means:

* Send the same question to **retriever**
* Send the same question to **prompt**

So both receive:

```
"What is mobile policy?"
```

---

## Step 3: Retriever Builds Context

Retriever searches vector database using the question.

Example retrieved document:

```
Mobile Policy:
Employees must not use personal phones during meetings.
```

Then `format_docs()` joins the `page_content` of the retrieved documents into one string:

```
Mobile Policy: Employees must not use personal phones during meetings.
```

This becomes **context**.

---

## Step 4: RunnablePassthrough Passes Question

```python
RunnablePassthrough()
```

This just forwards the question unchanged.

So now we have:

```python
{
  "context": "Mobile Policy text...",
  "question": "What is mobile policy?"
}
```

---

## Step 5: Prompt Is Filled

Prompt template:

```
Context:
{context}

Question:
{question}
```

After filling:

```
Context:
Mobile Policy: Employees must not use personal phones during meetings.

Question:
What is mobile policy?
```

---

## Step 6: LLM Generates Answer

Example output:

```
The mobile policy states that employees must not use personal phones during meetings.
```

---

## Step 7: Output Parser

`StrOutputParser()` converts LLM output into plain text.

---

#  Final Flow Diagram

```
1. User Question
        ↓
2. Retriever searches docs
        ↓
3. Context built
        ↓
4. Prompt filled
        ↓
5. LLM generates answer
        ↓
6. Output returned
```

---

# Important Notes

* You **cannot build context before the question**, because retriever needs the question.
* `RunnablePassthrough()` is used to pass the question unchanged.
* Both context-building and question-passing happen **after the question arrives**.

---

# Real-Life Analogy

Like Google Search:

1. You type a question
2. Google searches the internet
3. Shows results

No question → no search.

---

# Minimal Toy Example

```python
def retriever(question):
    return ["Mobile policy: no phones in meetings"]

def format_docs(docs):
    return "\n".join(docs)

question = "What is mobile policy?"

context = format_docs(retriever(question))

print(f"""
Context: {context}
Question: {question}
""")
```
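
The toy example calls the functions by hand. To get a feel for how the `|` operator and the dictionary fan-out fit together, the sketch below mimics them with a small hypothetical `Step` class. This is an illustration of the idea only and is not LangChain's `Runnable` implementation; every name in it is made up for this notebook.

```python
class Step:
    """Wraps a function so that steps can be chained with `|`."""

    def __init__(self, fn):
        self.fn = fn

    def __or__(self, other):
        # (a | b) runs a first, then feeds its result to b.
        nxt = as_step(other)
        return Step(lambda value: nxt.fn(self.fn(value)))

    def __ror__(self, other):
        # Lets a plain dict appear on the left-hand side of `|`.
        return as_step(other) | self

    def invoke(self, value):
        return self.fn(value)

def as_step(obj):
    if isinstance(obj, Step):
        return obj
    if isinstance(obj, dict):
        # Fan-out: every branch receives the same input value.
        branches = {key: as_step(branch) for key, branch in obj.items()}
        return Step(lambda value: {k: b.fn(value) for k, b in branches.items()})
    return Step(obj)  # any other callable

toy_retriever = Step(lambda q: ["Mobile policy: no phones in meetings"])
join_docs = Step(lambda docs: "\n".join(docs))
passthrough = Step(lambda value: value)
fill_prompt = Step(lambda d: f"Context: {d['context']}\nQuestion: {d['question']}")

toy_chain = {"context": toy_retriever | join_docs, "question": passthrough} | fill_prompt
result = toy_chain.invoke("What is mobile policy?")
print(result)
```

The single input string reaches both branches of the dictionary, and the resulting dictionary is what the prompt step receives, which mirrors Steps 2 through 5 above.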

---

In [None]:
rag_chain_with_sources = (
    {
        "docs": retriever,
        "question": RunnablePassthrough()
    }
    # Add a `context` field built from the retrieved docs; existing fields are kept
    | RunnablePassthrough.assign(
        context=lambda x: format_docs(x["docs"])
    )
    | prompt
    | llm
)

response = rag_chain_with_sources.invoke("what is the mobile policy?")
print(response.content)

The Mobile Phone Policy sets the standards and expectations for how employees may use mobile devices in the organization. It covers:

- **Acceptable Use:** Devices are mainly for work; limited personal use is allowed only if it does not interfere with work duties.  
- **Security:** Employees must protect their devices and credentials, be careful with unfamiliar apps or links, and report any security concerns immediately.  
- **Confidentiality:** Sensitive company information should not be sent through unsecured messaging or email, and employees should be discreet about company matters in public.  
- **Cost Management:** Personal use must be kept separate from company accounts, and any personal charges on company-issued phones must be reimbursed.  
- **Compliance:** All applicable laws and regulations about mobile use, data protection, and privacy must be followed.  
- **Lost or Stolen Devices:** Any loss or theft must be reported right away to IT or a supervisor.  
- **Consequences:*
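
The `RunnablePassthrough.assign(...)` step in the cell above keeps the incoming dictionary and adds a new key computed from it. The following plain-Python sketch imitates that behavior as I understand it; `assign` and `join_contents` here are hypothetical helpers, and the documents are plain strings rather than LangChain `Document` objects.

```python
def assign(**computed):
    """Return a step that copies the input dict and adds computed keys."""
    def step(data: dict) -> dict:
        extra = {key: fn(data) for key, fn in computed.items()}
        return {**data, **extra}  # original keys survive, new keys are added
    return step

def join_contents(docs: list[str]) -> str:
    return "\n\n".join(docs)

state = {"docs": ["chunk A", "chunk B"], "question": "what is the mobile policy?"}
add_context = assign(context=lambda x: join_contents(x["docs"]))
new_state = add_context(state)
print(sorted(new_state))
```

Because the raw `docs` stay in the dictionary next to the formatted `context`, a later step can still reach the retrieved documents, for example to report sources.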

You use prompts to guide the responses from an LLM the way you want. For instance, if the LLM is uncertain about an answer, you instruct it to simply state, "I do not know," instead of attempting to generate a speculative response.

Let's see an example.


In [12]:
response = rag_chain.invoke("Can I eat in company vehicles?")
print(response)

The policies you provided do not contain any guidance about eating in company vehicles.  Therefore, based on the information available here, it’s unclear whether eating in company vehicles is permitted or prohibited.


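As you can see, the query asks about something that does not exist in the document. In this run the LLM hedges instead of plainly stating that it does not know, and a less careful response could include information that is not true. You don't want this to happen, so you must add an explicit instruction to the prompt.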


### Using PromptTemplates

In [13]:
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser

prompt = ChatPromptTemplate.from_template(
    """Use the information from the document to answer the question at the end. If you don't know the answer, just say that  you don't know, definitely do not try to make up an answer.
    
    {context}

    Question: {question}
    """
)

def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

rag_chain = (
    {
        "context": retriever | format_docs,
        "question": RunnablePassthrough()
    }
    | prompt 
    | llm
    | StrOutputParser()
)

query = "Can I eat in company vehicles ?"
response = rag_chain.invoke(query)
print(response)

I don't know.


### Make the conversation have memory

Take a look at a situation in which an LLM does not have memory.

You start a new query, "What I cannot do in it?". You do not specify what "it" is. In this case, "it" means "company vehicles" if you refer to the last query.


In [14]:
query = "What I cannot do in it?"
response = rag_chain.invoke(query)
print(response)

**What you are not allowed to do (according to the policies in the document):**

- **Internet & Email Use**
  - Use company-provided internet or email for personal activities that interfere with job responsibilities.
  - Share your login credentials or passwords with anyone else.
  - Download or open email attachments or click on links from unknown or unverified sources.
  - Send confidential information, trade secrets, or sensitive customer data over email without encryption.
  - Discuss company matters on public forums or social media without discretion.
  - Use the internet or email to harass, discriminate, or distribute offensive or inappropriate content.

- **Drug & Alcohol Policy**
  - Use, possess, distribute, or sell illegal drugs or unauthorized controlled substances on company premises or during work-related activities.
  - Misuse prescription drugs.
  - Consume alcoholic beverages during work hours, on company property, or while performing company-related duties (unles

From the response, you can see that the model has no memory: it does not resolve "it" to company vehicles, so it misses the expected answer, which relates to "smoking is not permitted in company vehicles."

To give the LLM memory, use `RunnableWithMessageHistory` and `InMemoryChatMessageHistory` from LangChain.

In [None]:
from langchain_core.runnables import RunnableWithMessageHistory
from langchain_core.chat_history import InMemoryChatMessageHistory
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser
from operator import itemgetter

prompt = ChatPromptTemplate.from_messages([
    ("system","Use the information from the document to answer the question at the end. If you don't know the answer, just say that  you don't know, definitely do not try to make up an answer."),
    ("placeholder","{history}"),
    ("human","{question}"),
    ("system","Context:\n{context}")
])

def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

store = {}

def get_session_history(session_id):
    if session_id not in store:
        store[session_id] = InMemoryChatMessageHistory()
    return store[session_id]

chain = (
    {
        "context": itemgetter("question")| retriever | format_docs,
        "question": itemgetter("question")
    }
    | prompt
    | llm
    | StrOutputParser()
) 

rag_chain_with_memory = RunnableWithMessageHistory(chain,
                                                   get_session_history,
                                                   input_messages_key="question",
                                                   history_messages_key="history")

# Note: RunnableWithMessageHistory stores each turn of the conversation automatically

response = rag_chain_with_memory.invoke(
    {"question":"What is the mobile policy?"},
    config={"configurable":{"session_id":"1"}}
)

print(response)

**Mobile Phone Policy Summary**

The Mobile Phone Policy outlines how employees should use mobile devices at work. Key points include:

- **Primary Purpose**: Devices are mainly for work-related tasks; limited personal use is allowed only if it doesn’t interfere with job duties.  
- **Security**: Protect devices and login credentials, avoid suspicious apps or links, and report any security incidents promptly.  
- **Confidentiality**: Do not send sensitive company information through unsecured messaging or email, and be discreet about company matters in public.  
- **Cost Management**: Keep personal charges separate from company accounts; reimburse the company for any personal usage on company-issued phones.  
- **Compliance**: Follow all applicable laws and regulations on data protection and privacy.  
- **Lost/Stolen Devices**: Report immediately to IT or your supervisor.  
- **Consequences**: Violations can lead to disciplinary action, up to loss of mobile phone privileges.  



In [19]:
query = "List points in it?"
response = rag_chain_with_memory.invoke(
    {"question":query},
    config={"configurable":{"session_id":"1"}}
)
print(response)

I’m sorry, but I don’t have enough information to list the specific points in those policies.
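
The follow-up still fails even though the history is stored. One plausible reason is that in the chain above the history reaches the prompt, but the retriever still receives only the latest question (`itemgetter("question") | retriever`), so "List points in it?" is searched without knowing what "it" is. A common remedy is to turn the follow-up into a standalone query before retrieval, often by asking the LLM to rewrite it. The sketch below uses a much cruder stand-in, prefixing the previous human question, only to show where that step would sit. All names are hypothetical and no LangChain classes are involved.

```python
from collections import defaultdict

# One message list per session id, similar in spirit to the `store` dict above.
toy_histories: dict[str, list[tuple[str, str]]] = defaultdict(list)

def retrieval_query(session_id: str, question: str) -> str:
    """Prefix the previous human question so that "it" has a referent."""
    previous = [text for role, text in toy_histories[session_id] if role == "human"]
    return f"{previous[-1]} {question}" if previous else question

def record_turn(session_id: str, question: str, answer: str) -> None:
    toy_histories[session_id].append(("human", question))
    toy_histories[session_id].append(("ai", answer))

first = retrieval_query("1", "What is the mobile policy?")
record_turn("1", "What is the mobile policy?", "(model answer)")
second = retrieval_query("1", "List points in it?")
other_session = retrieval_query("2", "List points in it?")
print(second)
```

In a chain like the one above, the string returned by `retrieval_query` would be sent to the retriever, while the prompt would still receive the user's original question and the history.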


### Making this a RAG agent

In [20]:
from langchain.tools import tool

@tool 
def search_docs(query: str) -> str:
    '''Search document for information'''
    results = doc_search.similarity_search(query)
    return results[0].page_content

In [22]:
from langchain.chat_models import init_chat_model
from langchain.agents import create_agent
from langgraph.checkpoint.memory import InMemorySaver

model = init_chat_model(model="openai/gpt-oss-20b",
                        model_provider="groq")

agent = create_agent(model=model,
                     tools=[search_docs],
                     system_prompt="You are a helpful agent that can search the policy document for information.",
                     checkpointer=InMemorySaver())

In [24]:
from langchain.messages import HumanMessage

response = agent.invoke({"messages":HumanMessage(content="What is the smoking policy?")},
                        config={'configurable':{'thread_id':'1'}})

In [25]:
print(response['messages'][-1].content)

I couldn’t locate the detailed smoking policy text in the document - only the heading “5. Smoking Policy” was returned. If you have the full policy text handy, feel free to share it and I can help summarize or explain it for you.


In [26]:
response = agent.invoke({"messages":HumanMessage(content="Can you list all the points of it?")},
                        config={'configurable':{'thread_id':'1'}})
print(response['messages'][-1].content)

It looks like the document only contains the heading “5. Smoking Policy” and no further details were found in the searchable text. If you have the full text of the policy (or a PDF/word document that includes the bullet points), please paste it here and I’ll gladly list all the points for you.


In [27]:
response = agent.invoke({"messages":HumanMessage(content="Can you summarize it?")},
                        config={'configurable':{'thread_id':'1'}})
print(response['messages'][-1].content)

I’m afraid the document I can search only shows the heading “5. Smoking Policy” – the full text of the policy itself isn’t available in the current source. If you can paste the policy’s bullet points or the complete paragraph, I’ll gladly read it and give you a concise summary.
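
All three answers mention only the heading "5. Smoking Policy". A likely explanation, though the notebook does not confirm it, is that `search_docs` returns only `results[0].page_content`, and with `\n\n` as the separator a section heading can end up as its own short chunk that scores highest for a query repeating the heading words. The sketch below reproduces that effect with invented chunks and a word-overlap ranking standing in for similarity search, and shows how returning the top few chunks instead of one would surface the body text.

```python
def rank(candidates: list[str], query: str) -> list[str]:
    # Stand-in for similarity search: order chunks by shared lowercase words.
    q = {w.strip(".,?!").lower() for w in query.split()}

    def score(chunk: str) -> int:
        return len(q & {w.strip(".,?!").lower() for w in chunk.split()})

    return sorted(candidates, key=score, reverse=True)

# Invented chunks: the first one is a heading that became its own chunk.
policy_chunks = [
    "5. Smoking Policy",
    "Smoking permitted only in designated outdoor areas.",
    "Mobile phones are mainly for work use.",
]

def search_top1(query: str) -> str:
    # Mirrors a tool that returns only the best match.
    return rank(policy_chunks, query)[0]

def search_topk(query: str, k: int = 2) -> str:
    # Returns the k best matches joined together.
    return "\n\n".join(rank(policy_chunks, query)[:k])

print(search_top1("What is the smoking policy?"))
print(search_topk("What is the smoking policy?"))
```

Applied to the real tool, the analogous change would be to join the `page_content` of several results from `doc_search.similarity_search(query)` rather than returning only the first one.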


### Exercise 1: Work on your own document


You are welcome to use your own document to practice. Another document has also been prepared that you can use for practice. Can you load this document and make the LLM read it for you? <br>
Here is the URL to the document: https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/XVnuuEg94sAE4S_xAsGxBA.txt


In [34]:
from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_chroma import Chroma 
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_groq import ChatGroq
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.chat_history import InMemoryChatMessageHistory
from langchain_core.runnables import RunnableWithMessageHistory
from operator import itemgetter


import wget

url = "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/XVnuuEg94sAE4S_xAsGxBA.txt"
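# Separate filename (an arbitrary choice) so this download does not collide with the earlier companyPolicies.txt
filename = "exercise_document.txt"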

wget.download(url, filename)
loader = TextLoader(filename)
docs = loader.load()

text_splitter = RecursiveCharacterTextSplitter(chunk_size=500,chunk_overlap=50,separators=["\n\n","\n"," ",""])
chunks = text_splitter.split_documents(docs)
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
vectorstore = Chroma.from_documents(documents=chunks,
                                    embedding=embeddings,
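                                    # Separate directory (an arbitrary choice) so these chunks are not mixed with the earlier collection
                                    persist_directory="./chroma_db_exercise")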

retriever = vectorstore.as_retriever()

llm = ChatGroq(model="meta-llama/llama-4-scout-17b-16e-instruct")

prompt = ChatPromptTemplate.from_messages([
    ("system","Use the information from the document to answer the question at the end. If you don't know the answer, just say that  you don't know, definitely do not try to make up an answer."),
    ("placeholder","{history}"),
    ("human","{question}"),
    ("system","Context:\n{context}")
])

def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

store = {}

def get_session_history(session_id):
    if session_id not in store:
        store[session_id] = InMemoryChatMessageHistory()
    return store[session_id]

chain = (
    {
        "context":itemgetter("question") | retriever | format_docs,
        "question":itemgetter("question")
    }
    | prompt 
    | llm 
    | StrOutputParser()
)

rag_chain = RunnableWithMessageHistory(chain,
                                       get_session_history,
                                       input_messages_key="question",
                                       history_messages_key="history")

rag_chain.invoke({"question":"What is this document about?"},
                 config={'configurable':{'session_id':'1'}})




"This document appears to be about a Code of Conduct, as the phrase is repeated four times. However, I don't have enough information to provide a detailed description of what the Code of Conduct entails."