# Lab: Using LangChain to build a simple RAG pipeline

## Goal

Explore the connection to models.corp and the functionality of basic LangChain components, with the ultimate goal of developing a simple **Retrieval Augmented Generation (RAG)** question-and-answer system based on publicly available Medium articles.


### Steps

1. Install dependencies
2. Get models.corp credentials
3. Quick langchain tutorial (optional)
4. Download and transform data
5. Generate embeddings and add to vector store
6. Build the RAG chain
7. Query and test the RAG system
    
---
TO FINISH
### Prerequisites

#### Access to models.corp
You need to .....
A VPN connection is required. See https://gitlab.cee.redhat.com/models-corp/user-documentation/-/blob/main/getting-started.md

#### Python
Python 3.13 is required.

### Key components

list the key components

### And anything else
---

## 1. Install dependencies


Run the following cell to install the LangChain dependencies.

In [1]:
!pip install langchain_community langchain_openai langchain_milvus langchain_text_splitters langchain_huggingface langchain_core


Install the other dependencies (uncomment the line if these packages are not already installed):

In [None]:
#!pip install beautifulsoup4 python-dotenv chromadb sentence-transformers

## 2. Get models.corp credentials

In this lab we will be using the **granite-3.1-8b-instruct** model.

To connect to the model you will need the model ID, the model API URL and your corresponding API key.

#### Model id and url
The models.corp user documentation lists the available models: https://gitlab.cee.redhat.com/models-corp/user-documentation#open-model-list
On the **granite-3.1-8b-instruct** page you can find the model string (model ID) and API (model API URL), along with other information about the model: https://gitlab.cee.redhat.com/models-corp/user-documentation/-/blob/main/models/granite-3-1-8b-instruct.md

#### Model API key
To get 14-day access to models.corp you need to create an account and complete a ServiceNow request. These ServiceNow tickets are automated, and you should have sandbox access within 30 minutes. See https://gitlab.cee.redhat.com/models-corp/user-documentation/-/blob/main/getting-started.md (VPN required).

Find your API key in the applications tab of the models.corp platform: https://developer.models.corp.redhat.com
For this lab we will be using the granite-3.1-8b-instruct model, so find the key corresponding to this model in the credentials column of the applications tab.

**Storing your key**

You will need to store this API key securely in your local environment (not in this notebook).
Create a `.env` file in the same directory as this notebook and insert the line `GRANITE_ACCESS_TOKEN="YOUR TOKEN GOES HERE"`

In [1]:
from dotenv import load_dotenv
import os

# Load the environment variables from the .env file
load_dotenv()

# Read the access token from the environment
access_token = os.getenv("GRANITE_ACCESS_TOKEN")


In [2]:
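# Set the variables for the model id and url
model = "ibm-granite/granite-3.1-8b-instruct"
model_api_url = "https://granite-3-1-8b-instruct--apicast-production.apps.int.stc.ai.prod.us-east-1.aws.paas.redhat.com/v1"

# Optional: to test against a local Ollama server instead, uncomment the two lines below
# (Ollama ignores the API key, so access_token can be any placeholder string)
# model = "granite3.3"
# model_api_url = "http://localhost:11434/v1"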

#### SSL certificate

You may need to add an SSL certificate to connect to models.corp. You can find the PEM file in this folder. ADD LINK 

**macOS or Linux**

Add the PEM file to your current working directory and run the bash command:
```bash
cat 2022-IT-Root-CA.pem >> `python -m certifi`
```

**Windows**

Add the PEM file to your current working directory and run the PowerShell command:
```powershell
Get-Content .\2022-IT-Root-CA.pem | Add-Content -Path (python -m certifi) -Encoding ascii
```

Alternatively, you can set the `SSL_CERT_FILE` environment variable to the path of the certificate: `SSL_CERT_FILE=[path to 2022-IT-Root-CA.pem]`

## 3. Quick langchain tutorial (optional)

*NOTE: This section does not contribute to the RAG system. It is included for testing and exploration.* 



LangChain is a powerful framework that simplifies the development of AI applications by providing a standard way to chain together language models with external data sources and other components and tools.

In LangChain, components are the modular, reusable building blocks of an AI application, such as Models, Prompt Templates, and Output Parsers. These individual components are then linked together into chains, which define a complete workflow for a specific task, such as answering a question or summarizing a document. Using the LangChain Expression Language (LCEL), you can connect these components, creating an automated data flow from the initial user input to the final, processed output.

### 3a. Models: the core engine
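A Model in LangChain is a wrapper around a large language model such as Granite (the model we are using for this lab). Because the models.corp endpoint is OpenAI-compatible, we can use LangChain's `ChatOpenAI` class and point its `base_url` at the model API URL.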


In [3]:
# minimum import needed
from langchain_openai import ChatOpenAI

# Rough temperature guidelines: 0-0.4 low (deterministic); 0.5-1.0 medium (balanced creativity and coherence); 1.1-2 high (creative and random)

# Initialize the model. Set a low temperature for predictable, less creative responses
testllm = ChatOpenAI(model=model, api_key=access_token, base_url=model_api_url, temperature=0.1)

# We can now "invoke" the model with a simple prompt
response = testllm.invoke("What is the official currency of Italy?")

# The response is an AI Message object, so we access its content
print(response.content)


The official currency of Italy is the Euro. The Euro was introduced as a common currency for many European countries in 1999, and it replaced various national currencies, including the Italian Lira. As a member of the Eurozone, Italy adopted the Euro for cash transactions, and it is now used for all official financial dealings within the country. The Euro is also utilized by 19 out of the 27 European Union (EU) member states, facilitating trade and travel among these nations.
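The temperature guidelines in the cell above can be made concrete with a little arithmetic. A common way model servers apply temperature is to divide the model's raw next-token scores (logits) by the temperature T before the softmax, so a low T sharpens the distribution toward the top token and a high T flattens it. The sketch below is a plain-Python illustration with made-up scores for three candidate tokens; it is not LangChain or models.corp code.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw next-token scores into probabilities, scaled by temperature."""
    scaled = [x / temperature for x in logits]
    max_scaled = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - max_scaled) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up scores for three candidate next tokens, e.g. "Euro", "Lira", "Dollar"
logits = [4.0, 2.0, 1.0]

for t in (0.1, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(f"temperature={t}: " + ", ".join(f"{p:.3f}" for p in probs))
```

At `temperature=0.1` nearly all the probability goes to the first token, which is why low temperatures give repeatable answers; at `2.0` the other tokens get a real chance of being sampled.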


### 3b. Output Parsers: Structuring the Response

A chat model returns an `AIMessage` object. An output parser is a class that converts the model's response into a more usable format, such as a simple string, a list or a JSON object.

In the last exercise, we used `response.content` to get the string content of the model's response. Here we will use an output parser component instead, which we can later plug into a LangChain chain.


In [9]:
from langchain_core.output_parsers import StrOutputParser

# Initialize the parser
output_parser = StrOutputParser()

# The llm.invoke() call returns an AIMessage object
response = testllm.invoke("Give me a 60 word biography on Josephine Baker")

# The parser converts the AIMessage into a simple string
parsed_output = output_parser.invoke(response)

print(parsed_output)

Josephine Baker (1891-1975) was an American-born French entertainer and civil rights activist, celebrated for her dazzling performances in Parisian nightclubs. Known as "The Black Venus," she broke racial barriers with her exotic dance routines, captivating audiences worldwide. Beyond the stage, Baker was a dedicated anti-colonial and civil rights campaigner, adopting 12 children of diverse backgrounds and fighting for equality. Her legacy endures as an icon of artistic brilliance and social justice.


### 3c. Prompt Templates: Crafting Your Instructions
A prompt template in LangChain is a reusable object that creates a complete and formatted prompt for a language model by dynamically inserting user inputs and other variables into a predefined text structure.

We will compare two prompt template classes.



#### PromptTemplate
This class creates a single string from a template.

In [53]:
from langchain_core.prompts import PromptTemplate

# Define a PromptTemplate object with a placeholder for a word to translate
PT_prompt = PromptTemplate(
    template="Translate the expression {toTranslate} from English to French.",
    input_variables=["toTranslate"]
)

# Invoke the template and print the output and its type
formatted_PT_prompt=PT_prompt.invoke({"toTranslate": "little by little"})

print(formatted_PT_prompt)
print(f"Type: {type(formatted_PT_prompt)}\n")

text='Translate the expression little by little from English to French.'
Type: <class 'langchain_core.prompt_values.StringPromptValue'>



#### ChatPromptTemplate
This class creates a structured list of messages for chat models.

In [54]:
from langchain_core.prompts import ChatPromptTemplate

# Define a ChatPromptTemplate object with a placeholder for a word to translate
CPT_prompt = ChatPromptTemplate.from_template(
    "Translate the expression {toTranslate} from English to French."
)

# Invoke the template and print the output and its type
formatted_CPT_prompt=CPT_prompt.invoke({"toTranslate": "little by little"})

print(formatted_CPT_prompt)
print(f"Type: {type(formatted_CPT_prompt)}\n")

messages=[HumanMessage(content='Translate the expression little by little from English to French.', additional_kwargs={}, response_metadata={})]
Type: <class 'langchain_core.prompt_values.ChatPromptValue'>



#### Comparison
`ChatPromptTemplate` creates a structured list of messages, whereas `PromptTemplate` creates a single string.

`ChatPromptTemplate` is used more commonly because modern chat models are optimised for a sequence of messages. They perform best when they receive a structured list of messages with roles (System for instructions, Human for user input), and `ChatPromptTemplate` is designed specifically for this.
However, it is also possible to format those roles manually with `PromptTemplate`.

### 3d. LangChain Expression Language (LCEL): Chaining It All Together

LCEL is the declarative syntax used to chain LangChain components together. It uses the pipe symbol ( | ). Data flows from one component to the next in the sequence.

We will create a simple chain using the `testllm` model, the `PT_prompt` template and the `output_parser` from the previous exercises.

In [10]:
# Build the chain: prompt template -> model -> output parser
test_chain = PT_prompt | testllm | output_parser

# Invoke the chain with the expression we want to translate into French
chain_output = test_chain.invoke({"toTranslate": "congratulations on building this chain"})

print(chain_output)

Congratulations on your first LangChain chain!

## 4. Download and transform data

This example uses several Medium articles on the subject of generative AI as source documents.

We will load the documents and split them into shorter document chunks.

In [4]:
import bs4
from langchain_community.document_loaders import WebBaseLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter


# Instantiate the WebBaseLoader to fetch content from specified URLs.
loader = WebBaseLoader(
    web_paths=(
        "https://medium.com/@tuhinsharma121/mastering-prompt-engineering-a-beginners-guide-to-ai-interaction-2a28434ccb67",
        "https://medium.com/@rahuljangir2992/graph-based-prompting-revolutionizing-ai-reasoning-f316b7266c1f",
        "https://medium.com/@fassha08/transforming-search-ai-agents-and-multi-vector-intelligence-1bde1dbe66e7",
        "https://medium.com/@harshkumar1146/prompt-chaining-unlocking-the-full-potential-of-ai-assistants-4fdf2f28c1a5",
    ),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(
        )
    ),
)

# The load() method fetches and parses the content from the URLs returning a list of Document objects, where each object contains the text content of one webpage.
documents = loader.load()

# Initialize the RecursiveCharacterTextSplitter which will break down large texts into smaller chunks.
text_splitter = RecursiveCharacterTextSplitter(chunk_size=2000, chunk_overlap=200)

docs = text_splitter.split_documents(documents)

USER_AGENT environment variable not set, consider setting it to identify your requests.


#### Exploring the document chunks

*Note: This does not contribute to building the RAG system*

The output, `docs`, is a flat list of all the text chunks derived from the original documents. These smaller, more granular pieces of text are now in an ideal format to be used for creating vector embeddings for a similarity search in a RAG pipeline.

If we look at the first two list items, we can see the included metadata and note the overlap in the `page_content` chunks.
Without overlap, you risk cutting a sentence or a complete thought exactly in half, leaving two incoherent chunks that have lost their meaning. This can seriously harm the RAG system's ability both to find the right information (retrieval) and to understand it correctly (generation).
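The effect of overlap is easier to see on a tiny example. The sketch below is a simplified, hypothetical fixed-size character splitter, not the `RecursiveCharacterTextSplitter` algorithm (which first tries to split on paragraph breaks, newlines and spaces before falling back to single characters). It only shows what `chunk_size` and `chunk_overlap` mean: each chunk repeats the last `chunk_overlap` characters of the previous one.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Fixed-size character chunks; each chunk repeats the last chunk_overlap characters of the previous one."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this chunk already reaches the end of the text
    return chunks

sample = "Few-shot prompting gives the model a handful of worked examples before the real question."
for chunk in split_with_overlap(sample, chunk_size=40, chunk_overlap=10):
    print(repr(chunk))
```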



In [4]:
print(f"The articles have been split into {len(docs)} sub-documents. \n")
print ("First sub-document:")
docs[0]


The articles have been split into 38 sub-documents. 

First sub-document:


Document(metadata={'source': 'https://medium.com/@tuhinsharma121/mastering-prompt-engineering-a-beginners-guide-to-ai-interaction-2a28434ccb67', 'title': 'Mastering Prompt Engineering: A Beginner’s Guide to AI Interaction | by Tuhin Sharma | Medium', 'description': 'In today’s world of artificial intelligence (AI), prompt engineering has become a key skill. It changes how we talk to AI models and make them work better. Whether you’re experienced or just…', 'language': 'en'}, page_content='Mastering Prompt Engineering: A Beginner’s Guide to AI Interaction | by Tuhin Sharma | MediumSitemapSign upSign inMedium LogoWriteSign upSign inMastering Prompt Engineering: A Beginner’s Guide to AI InteractionTuhin SharmaFollow11 min read·Jul 10, 2024--1ListenShareIn today’s world of artificial intelligence (AI), prompt engineering has become a key skill. It changes how we talk to AI models and make them work better. Whether you’re experienced or just starting, knowing how to create good prompts can 

In [5]:
print("Second sub-document:")
docs[1]

Second sub-document:


Document(metadata={'source': 'https://medium.com/@tuhinsharma121/mastering-prompt-engineering-a-beginners-guide-to-ai-interaction-2a28434ccb67', 'title': 'Mastering Prompt Engineering: A Beginner’s Guide to AI Interaction | by Tuhin Sharma | Medium', 'description': 'In today’s world of artificial intelligence (AI), prompt engineering has become a key skill. It changes how we talk to AI models and make them work better. Whether you’re experienced or just…', 'language': 'en'}, page_content='as it showcases their ability to handle a wide variety of tasks and questions they have not explicitly been prepared for.ExamplePrompt:Classify the text into neutral, negative or positive. Text: I think the vacation is okay.Sentiment:Output:NeutralLimitationFirstly, these models often aren’t as accurate as models trained on specific tasks because they’re trying to generalize without direct examples. This can lead to errors or lower confidence in the results. Also, because they handle such a broad rang

## 5. Generate embeddings and add to vector store

### 5a. Embeddings
Vector embeddings are dense numerical representations of data (our article chunks in this lab). An embedding is essentially an array of numbers that can represent a vector in multidimensional space. These vector representations position similar concepts closer together within this high-dimensional vector space, creating a semantic map of the text.

To generate embeddings from our text chunks we use an embedding model from Hugging Face, mxbai-embed-large-v1. This model converts each text chunk into a vector with 1024 dimensions.


In [5]:
from langchain_huggingface import HuggingFaceEmbeddings

# Specify the embeddings model
embeddings = HuggingFaceEmbeddings(model_name="mixedbread-ai/mxbai-embed-large-v1")



#### Exploring embeddings
We can explore the embedding of some sample text.

*Note: This does not contribute to building the RAG system*

In [7]:
sample_text="What are the advantages of few shot prompting?"

vector_embedding = embeddings.embed_query(sample_text)

# Print some information about the vector
print(f"Number of dimensions {len(vector_embedding)}")
print(f"Data type of the numbers {type(vector_embedding[0])}")

# Print the first few numbers to give a sense of what it looks like
print("Here are the first 5 dimensions (numbers) of the vector")
print(vector_embedding[:5])


Number of dimensions 1024
Data type of the numbers <class 'float'>
Here are the first 5 dimensions (numbers) of the vector
[0.1568078249692917, 0.26713573932647705, -0.11924576759338379, 0.4239087998867035, -1.244588851928711]


### 5b. Vector store

We will add the embeddings and the document chunks to a vector database running locally.

Two options are presented below. Milvus and Chroma are both open-source vector databases that can run locally to provide fast, low-latency similarity search for AI applications. Milvus is built for very large, complex projects that need to handle huge amounts of data, while Chroma is designed to be simple and easy to start with for smaller, single-machine applications.

#### Option 1 - Milvus

(Milvus Lite, which stores the database in the local `milvus_demo.db` file, does not run on Windows.)

In [None]:
from langchain_milvus import Milvus

vectorstore = Milvus.from_documents(  
    documents=docs,
    embedding=embeddings,
    connection_args={
        "uri": "./milvus_demo.db",
    },
    drop_old=True,  # Drop the old Milvus collection if it exists
    index_params={"index_type": "FLAT", "metric_type": "L2"},

)

# Check that the number of entries matches the number of sub-documents (chunks)
vectorstore.col.flush()
number_of_entries = vectorstore.col.num_entities
print(f"The number of entries in the vector store is: {number_of_entries}")

#### Option 2 - Chroma

In [6]:
from langchain_community.vectorstores import Chroma

collection_name = "Medium_articles"

# Delete the collection if it already exists (e.g. when this cell is re-run),
# so the same chunks are not added twice
try:
    print(f"Attempting to delete collection: '{collection_name}'...")
    vectorstore.delete_collection()
    print("Collection deleted successfully.")
except Exception:
    # Expected on the very first run, when `vectorstore` does not exist yet
    print("Collection did not exist, so no deletion was needed.")

# Add the document chunks and their embeddings to the collection
vectorstore = Chroma.from_documents(
    documents=docs,
    embedding=embeddings,
    collection_name=collection_name,
    collection_metadata={"hnsw:space": "l2"},  # L2 (Euclidean) distance for the HNSW index
)

# Check that the number of entries matches the number of sub-documents (chunks)
number_of_entries = vectorstore._collection.count()
print(f"The vector store contains {number_of_entries} entries.")

Attempting to delete collection: 'Medium_articles'...
Collection did not exist, so no deletion was needed.
The vector store contains 38 entries.


#### About index parameters

An index in a vector database is a data structure that organizes the vectors. 
FLAT and HNSW (Hierarchical Navigable Small World) are two different types of indexes used in vector databases to manage and search through high-dimensional data. They represent a fundamental trade-off between search accuracy and speed.

**FLAT Index** 

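A FLAT index is the most basic approach to vector search. It is a brute-force method in which the query vector is compared directly with every vector in the dataset. Because every query scans everything, it becomes slow on large datasets, but its results are exact (100% accurate).
In our example above, the Milvus instance uses a FLAT index. With a full Milvus server you could change this parameter to HNSW; note that, according to its documentation, Milvus Lite (the local `milvus_demo.db` used here) supports only the FLAT index type.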
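A brute-force FLAT search fits in a few lines. The sketch below is a plain-Python illustration with made-up 2-dimensional vectors (the real embeddings here have 1024 dimensions); it is not how Milvus implements the index, but it performs the same exhaustive L2 comparison.

```python
import math

def flat_search(query, vectors, k):
    """Exhaustive (FLAT) search: compare the query with every stored vector, keep the k closest."""
    # math.dist computes the L2 (Euclidean) distance between two points
    distances = [(math.dist(query, v), i) for i, v in enumerate(vectors)]
    distances.sort()
    return [i for _, i in distances[:k]]

# Made-up stored vectors, indexed 0..3
stored = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [0.9, 1.2]]
print(flat_search([1.0, 1.0], stored, k=2))  # [1, 3]
```

Every query costs one distance computation per stored vector, which is fine for our 38 chunks but is what makes FLAT slow on millions of vectors.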


**HNSW (Hierarchical Navigable Small World) Index** 

HNSW is a sophisticated graph-based index that provides a strong balance between search speed and accuracy. It is an Approximate Nearest Neighbor (ANN) algorithm, meaning it finds results that are very likely to be the nearest neighbors, but without a 100% guarantee.
In our example above, the Chroma instance uses an HNSW index.

**Similarity Measures**

In both cases we have specified L2 as the distance metric (`"metric_type": "L2"` for Milvus, `"hnsw:space": "l2"` for Chroma).

This parameter defines how similarity between two vector embeddings is measured.
L2 means that similarity between two vectors is measured by the Euclidean (straight-line) distance between them in multi-dimensional space: the smaller the distance, the more similar the vectors.

In simpler terms, each vector embedding can be thought of as a point in space. Two vector embeddings are 'similar' if their points are close to each other.

This is how our RAG system identifies the most relevant document chunks to feed to the LLM: the embedding of the user query is compared with the embeddings of the sub-documents.

L2 distance is not the only way to compare two vector embeddings. Another common choice is cosine similarity, which measures the angle between two vectors in multi-dimensional space.
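The two measures can disagree, because cosine similarity ignores vector length while L2 distance does not. In formula terms, L2 distance is sqrt(Σ(aᵢ − bᵢ)²) and cosine similarity is (a · b) / (‖a‖ ‖b‖). The sketch below computes both in plain Python on made-up 2-dimensional vectors.

```python
import math

def l2_distance(a, b):
    # Straight-line distance: smaller means more similar
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    # Cosine of the angle between the vectors: 1 = same direction, 0 = perpendicular
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

a = [1.0, 2.0]
b = [2.0, 4.0]   # same direction as a, twice as long
c = [2.0, -1.0]  # perpendicular to a

print(f"a vs b: L2 = {l2_distance(a, b):.3f}, cosine = {cosine_similarity(a, b):.3f}")  # a vs b: L2 = 2.236, cosine = 1.000
print(f"a vs c: L2 = {l2_distance(a, c):.3f}, cosine = {cosine_similarity(a, c):.3f}")  # a vs c: L2 = 3.162, cosine = 0.000
```

`b` points in exactly the same direction as `a`, so their cosine similarity is a perfect 1, yet their L2 distance is not zero. For vectors normalized to length 1 the two measures rank neighbours identically, since ‖a − b‖² = 2 − 2 cos θ.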

## 6. Build the RAG chain

Now that our documents have been chunked and stored in a vector database with their embeddings, we are ready to define the components needed for our RAG system.

In [7]:
# imports for this section
from langchain_core.runnables import RunnablePassthrough
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_openai import ChatOpenAI

#### Initialize the model
Initialize the model using the models.corp credentials defined earlier (see section 2, Get models.corp credentials).
Note the low temperature parameter, which makes responses more predictable and less creative, a good fit for factual question answering.

In [8]:
# Initialize the model
llm = ChatOpenAI(model=model, api_key=access_token, base_url=model_api_url, temperature=0.1)

#### Create a retriever object
The `vectorstore.as_retriever()` method creates a retriever object, which acts as a specialized search interface for the vector database.

Given a query, the retriever uses the embedding model to compute the query's embedding vector, uses the vector store's index to find the most semantically similar document chunks, and returns the top k of them (here `k=2`, set through `search_kwargs`).
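The three retriever steps (embed the query, compare it with the stored vectors, return the top k) can be sketched without a real embedding model. The sketch below is hypothetical: it uses bag-of-words word counts as a stand-in for embeddings (so it matches shared words, not meaning, unlike `mxbai-embed-large-v1`) and cosine similarity for the comparison.

```python
import math
import re
from collections import Counter

# Hypothetical mini "vector store": three short chunks instead of the Medium articles
toy_chunks = [
    "Few-shot prompting gives the model a few examples before the question.",
    "Chain-of-thought prompting asks the model to reason step by step.",
    "A vector database stores embeddings for similarity search.",
]

def toy_embed(text):
    # Stand-in for an embedding model: a bag-of-words count vector (NOT semantic)
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# "Index" the chunks once, like adding them to the vector store
toy_index = [(toy_embed(chunk), chunk) for chunk in toy_chunks]

def toy_retrieve(query, k=2):
    # 1. embed the query, 2. score every stored chunk, 3. return the top k chunks
    query_vector = toy_embed(query)
    scored = sorted(toy_index, key=lambda pair: cosine(query_vector, pair[0]), reverse=True)
    return [chunk for _, chunk in scored[:k]]

print(toy_retrieve("What is few-shot prompting?", k=1))
```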

In [9]:
retriever = vectorstore.as_retriever(search_kwargs={"k": 2})

#### Define the prompt template

Note that we are using `PromptTemplate` and manually specifying the roles Human and Assistant (see section 3c).
There are also two placeholders: `{question}` will be filled by the user's query and `{context}` by the article chunks returned by the retriever.

In [10]:
PROMPT_TEMPLATE = """
Human: You are an AI assistant, and you provide answers to questions by using fact-based and statistical information when possible.
Use the following pieces of information to provide a concise answer to the question enclosed in <question> tags.
If you don't know the answer, just say that you don't know, don't try to make up an answer. Specify if the answer is in the context or not.
<context>
{context}
</context>

<question>
{question}
</question>

The response should be specific and use statistics or numbers when possible.

Assistant:"""

prompt = PromptTemplate(
    template=PROMPT_TEMPLATE, input_variables=["context", "question"]
)


#### Building the RAG chain
The chain's first step is a dictionary with two keys, `context` and `question`. LCEL turns this dictionary into a `RunnableParallel`, so both entries receive the same input query:
* `context`: The input query is passed to the retriever, which fetches relevant documents. These documents are then piped to `format_docs` to be joined into a single string.
* `question`: `RunnablePassthrough()` receives the same input query and passes it through unchanged.

The resulting dictionary is then fed to the prompt, where the `{context}` and `{question}` placeholders are replaced by the retrieved document string and the user's query.

The formatted prompt is passed to the LLM, and finally the LLM's response is passed through an output parser.


In [11]:
# joins the returned documents
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)
    
# builds the RAG chain
rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

## 7. Query and test the RAG system
We can now invoke the RAG chain and test if it replies correctly to our queries with responses based on our stored documents.

In [24]:
query = "What are the advantages of few shot prompting?"

res = rag_chain.invoke(query)
print("--------------------------\n")
print("Question : ",query)
print("\n--------------------------\n")
print("Response : ",res)
print("\n--------------------------")


--------------------------

Question :  What are the advantages of few shot prompting?

--------------------------

Response :  Few-shot prompting offers several advantages in the field of artificial intelligence, particularly with models like GPT. It allows AI models to understand and generate context-specific responses with minimal input data. This technique involves providing the AI with a few examples to guide its responses, enabling the model to infer the desired task from these examples without explicit programming.

One key advantage is improved performance on a wide range of tasks. According to a study by Brown et al. (2020), few-shot learning can achieve performance comparable to full fine-tuning on a diverse set of tasks, even when the number of training examples is limited.

Another advantage is the ability to handle tasks that the model has not been explicitly trained on. This is demonstrated in the provided context, where the model correctly identifies the sentiment of the

#### Explore the retrieved documents for comparison with the response
We can invoke the retriever directly to explore the documents that were used to inform the response.

In [25]:
# Invoke the retriever directly to get the relevant documents

retrieved_docs = retriever.invoke(query)

# Print the retrieved documents to inspect their content
print("--- Retrieved Documents ---")
for i, doc in enumerate(retrieved_docs):
    print(f"\n--- Document {i+1} ---\n")
    print(f"Source: {doc.metadata.get('source', 'N/A')}\n")
    print(f"Content: {doc.page_content}\n")
print("---------------------------")

--- Retrieved Documents ---

--- Document 1 ---

Source: https://medium.com/@tuhinsharma121/mastering-prompt-engineering-a-beginners-guide-to-ai-interaction-2a28434ccb67

Content: is:Output:When we won the game, we all started to farduddle in celebration.LimitationStandard few-shot prompting is good for many tasks, but it’s not perfect, especially for complex thinking tasks. Let’s show why that is.Prompt:The odd numbers in this group add up to an even number: 4, 8, 9, 15, 12, 2, 1.A: The answer is False.The odd numbers in this group add up to an even number: 17,  10, 19, 4, 8, 12, 24.A: The answer is True.The odd numbers in this group add up to an even number: 16,  11, 14, 4, 8, 13, 24.A: The answer is True.The odd numbers in this group add up to an even number: 17,  9, 10, 12, 13, 4, 2.A: The answer is False.The odd numbers in this group add up to an even number: 15, 32, 5, 13, 82, 7, 1. A:Output:The answer is True.3. Chain-of-Thought (CoT) PromptingChain-of-thought (CoT) prompting he

### Exploring and testing the RAG system
Let's see what happens if we ask something that is not in the Medium article documents. Is this the behaviour you expect?

In [26]:
query_test = "What is the capital of France?"

res = rag_chain.invoke(query_test)
print("--------------------------\n")
print("Question : ",query_test)
print("\n--------------------------\n")
print("Response : ",res)
print("\n--------------------------")


--------------------------

Question :  What is the capital of France?

--------------------------

Response :  The capital of France is Paris. It is the most populous city in France, with a population of approximately 2.14 million people within the city proper and over 12 million in the metropolitan area. Paris is known for its iconic landmarks such as the Eiffel Tower, Louvre Museum, and Notre-Dame Cathedral.

--------------------------


#### Again explore the retrieved documents for comparison with the response
Can you see any information related to the query?

In [28]:
# Invoke the retriever directly to get the relevant documents

retrieved_docs = retriever.invoke(query_test)

# Print the retrieved documents to inspect their content
print("--- Retrieved Documents ---")
for i, doc in enumerate(retrieved_docs):
    print(f"\n--- Document {i+1} ---\n")
    print(f"Source: {doc.metadata.get('source', 'N/A')}\n")
    print(f"Content: {doc.page_content}\n")
print("---------------------------")

--- Retrieved Documents ---

--- Document 1 ---

Source: https://medium.com/@fassha08/transforming-search-ai-agents-and-multi-vector-intelligence-1bde1dbe66e7

Content: langchain_core.output_parsers import StrOutputParserfrom semantic_router import Routefrom semantic_router.encoders import HuggingFaceEncoderfrom semantic_router.layer import RouteLayerfrom langchain_community.tools.tavily_search import TavilySearchResultsimport osInstantiate the Embedding Modelembeddings = HuggingFaceEmbeddings(            model_name="BAAI/bge-large-en-v1.5")Vector DatabasesIn this blog, we are using LangChain wrappers to connect to different vector databases. For more information on setting up PgVector, please refer to the PgVector documentation. Similarly, for Qdrant, you can refer to the Qdrant documentation. Regardless of which vector store we are using, the process remains the same.Data UsedDataset1: Technical documentationDomain: Software DevelopmentVector Database: PGVectorDataset2: Customer Serv

### Next steps

You have successfully built a simple **Retrieval Augmented Generation (RAG)** question-and-answer system based on publicly available Medium articles.

Well done!

#### To go further
- Experiment with different parameters and prompts
- Try other models from models.corp