## Types of Query Transformation

- Improve User's Prompt
    - Parallel Query Fan out Retrieval (filter_unique_query)
    - Reciprocal Rank Fusion (Rank LLM Output)
    - Query Decomposition (Breaking Query)
        - Less Abstract (sync process)
        - More Abstract (Step Back Prompting)
    - HyDE (Hypothetical Document Embedding)

## Method 1: Parallel Query Fan out Retrieval (filter_unique_query)

![rag flow image](paralle-query.png "RAG FLOW")

In [1]:
import os
from dotenv import load_dotenv
from pathlib import Path
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings
from langchain_qdrant import QdrantVectorStore
from openai import OpenAI

load_dotenv()

True

In [17]:
file_path = "./nodejs.pdf"

loader = PyPDFLoader(file_path)

In [18]:
# 1 Data loading
docs = loader.load()
docs

Ignoring wrong pointing object 268 0 (offset 0)
Ignoring wrong pointing object 309 0 (offset 0)


[Document(metadata={'producer': 'macOS Version 10.14.1 (Build 18B75) Quartz PDFContext', 'creator': 'Acrobat PDFMaker 17 for Word', 'creationdate': "D:20190227140340Z00'00'", 'author': 'Andrew Mead', 'moddate': "D:20190227140340Z00'00'", 'source': './nodejs.pdf', 'total_pages': 125, 'page': 0, 'page_label': '1'}, page_content='A PDF Reference for        The Complete Node.js Dev Course                Version 3.0'),
 Document(metadata={'producer': 'macOS Version 10.14.1 (Build 18B75) Quartz PDFContext', 'creator': 'Acrobat PDFMaker 17 for Word', 'creationdate': "D:20190227140340Z00'00'", 'author': 'Andrew Mead', 'moddate': "D:20190227140340Z00'00'", 'source': './nodejs.pdf', 'total_pages': 125, 'page': 1, 'page_label': '2'}, page_content='Version 1.0 2 \nSection 1: Welcome ................................................................................................................... 8 \nSection 2: Installing and Exploring Node.js ......................................................

In [19]:
# Types of Text Splitters: https://python.langchain.com/docs/concepts/text_splitters/
# Chunking func
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200
)

# Apply the chunking function to the loaded docs
split_docs = text_splitter.split_documents(docs)
split_docs

[Document(metadata={'producer': 'macOS Version 10.14.1 (Build 18B75) Quartz PDFContext', 'creator': 'Acrobat PDFMaker 17 for Word', 'creationdate': "D:20190227140340Z00'00'", 'author': 'Andrew Mead', 'moddate': "D:20190227140340Z00'00'", 'source': './nodejs.pdf', 'total_pages': 125, 'page': 0, 'page_label': '1'}, page_content='A PDF Reference for        The Complete Node.js Dev Course                Version 3.0'),
 Document(metadata={'producer': 'macOS Version 10.14.1 (Build 18B75) Quartz PDFContext', 'creator': 'Acrobat PDFMaker 17 for Word', 'creationdate': "D:20190227140340Z00'00'", 'author': 'Andrew Mead', 'moddate': "D:20190227140340Z00'00'", 'source': './nodejs.pdf', 'total_pages': 125, 'page': 1, 'page_label': '2'}, page_content='Version 1.0 2 \nSection 1: Welcome ................................................................................................................... 8 \nSection 2: Installing and Exploring Node.js ......................................................
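To make the `chunk_size` and `chunk_overlap` settings above concrete, here is a minimal sketch of fixed-size chunking with overlap. It is a simplification and not what `RecursiveCharacterTextSplitter` does internally (that splitter also tries to break on separators such as paragraphs and newlines). The helper name `sliding_window_chunks` is hypothetical.

```python
def sliding_window_chunks(text, chunk_size, chunk_overlap):
    """Split text into fixed-size windows where neighbours share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")

    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(text):
            break
        start += step
    return chunks


# Tiny example: windows of 4 characters, each sharing 2 with the previous one.
example_chunks = sliding_window_chunks("abcdefghij", chunk_size=4, chunk_overlap=2)
print(example_chunks)
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of storing some text twice.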

In [20]:
# Embedder function
embedder = OpenAIEmbeddings(
    model="text-embedding-3-large",
    api_key=os.getenv("OPEN_API_KEY")
)

In [21]:
# Storing in Vector DB
# connecting to Qdrant vector store (running locally through docker ~ docker compose -f docker-compose.db.yml up)
vector_store = QdrantVectorStore.from_documents(
    documents=[], # for the 1st time it will create
    url="http://localhost:6333", 
    collection_name="rag-nodejs", 
    embedding=embedder # openai embedder
)

# adding document(chunked)
vector_store.add_documents(documents=split_docs)

print('Ingestion done')

Ingestion done


In [2]:
user_query = "what is fs module in node"

#### Loading LLM Model to Generate 3 queries based on user's input

In [4]:
SYSTEM_PROMPT = """
You are a helpful AI assistant. Given the user's query, generate multiple relevant queries.

Generate 3 queries based on the user query.

output format:
[
    "query1",
    "query2",
    "query3"
]
"""

In [5]:
# Init Openai client
client = OpenAI(api_key=os.getenv("OPEN_API_KEY"))

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_query}
    ],
    temperature=0.7
)

# Print the response
print("\nUser Query:", user_query)
print("\nAssistant Response:", response.choices[0].message.content)


User Query: what is fs module in node

Assistant Response: 1. What are the main functionalities provided by the fs module in Node.js?
2. How can the fs module be used in Node.js to read and write files?
3. Can you provide examples of how to use the fs module in Node.js to manipulate file systems?


In [10]:
multi_queries = response.choices[0].message.content.split("\n")
multi_queries

['1. What are the main functionalities provided by the fs module in Node.js?',
 '2. How can the fs module be used in Node.js to read and write files?',
 '3. Can you provide examples of how to use the fs module in Node.js to manipulate file systems?']
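The model returned a numbered list here rather than the requested array, so each entry in `multi_queries` still carries its `1.`, `2.`, `3.` prefix. One way to clean that up before searching is sketched below. The helper name `strip_numbering` is hypothetical, and the regular expression only covers the common markers (`1.`, `2)`, `-`, `*`).

```python
import re


def strip_numbering(lines):
    """Remove leading list markers such as '1. ', '2) ' or '- ' and drop empty lines."""
    cleaned = []
    for line in lines:
        text = re.sub(r"^\s*(?:\d+[.)]|[-*])\s+", "", line).strip()
        if text:
            cleaned.append(text)
    return cleaned


raw_lines = [
    "1. What are the main functionalities provided by the fs module in Node.js?",
    "2. How can the fs module be used in Node.js to read and write files?",
    "",
    "3. Can you provide examples of how to use the fs module in Node.js to manipulate file systems?",
]
clean_queries = strip_numbering(raw_lines)
print(clean_queries)
```

A query that merely starts with a digit, such as "3 ways to read a file", is left untouched because the digit is not followed by `.` or `)`.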

In [22]:
# Retrieving documents via similarity search
relevant_chunk1 = vector_store.similarity_search(multi_queries[0])
relevant_chunk2 = vector_store.similarity_search(multi_queries[1])
relevant_chunk3 = vector_store.similarity_search(multi_queries[2])

In [23]:
print("Chunk 1: ",relevant_chunk1)
print("Chunk 2: ",relevant_chunk2)
print("Chunk 3: ",relevant_chunk3)

Chunk 1:  [Document(metadata={'producer': 'macOS Version 10.14.1 (Build 18B75) Quartz PDFContext', 'creator': 'Acrobat PDFMaker 17 for Word', 'creationdate': "D:20190227140340Z00'00'", 'author': 'Andrew Mead', 'moddate': "D:20190227140340Z00'00'", 'source': './nodejs.pdf', 'total_pages': 125, 'page': 9, 'page_label': '10', '_id': '42482a4c-5f19-44f1-9f2b-9ed0de25ff53', '_collection_name': 'rag-nodejs'}, page_content="Importing Node.js Core Modules \nTo get started, let’s work with some built-in Node.js modules. These are modules that \ncome with Node, so there’s no need to install them. \nThe module system is built around the require function. This function is used to load in a \nmodule and get access to its contents. require is a global variable provided to all your \nNode.js scripts, so you can use it anywhere you like! \nLet’s look at an example. \nconst fs = require('fs') \n  \nfs.writeFileSync('notes.txt', 'I live in Philadelphia') \nThe script above uses require to load in the fs
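The three searches above run one after another. If the fan-out should actually happen in parallel, a thread pool is one option. The sketch below uses a stand-in `fake_search` function so it runs without a vector store; `fan_out_search` is a hypothetical helper, and passing `vector_store.similarity_search` as `search_fn` assumes the underlying client is safe to call from several threads.

```python
from concurrent.futures import ThreadPoolExecutor


def fan_out_search(queries, search_fn, max_workers=4):
    """Run search_fn for every query concurrently; results come back in query order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(search_fn, queries))


# Stand-in for a real similarity search (assumption: one list of chunks per query).
def fake_search(query):
    return [f"chunk for: {query}"]


results = fan_out_search(["q1", "q2", "q3"], fake_search)
print(results)
```

`pool.map` preserves input order, so `results[0]` always belongs to the first query even if another search finishes earlier.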

### Creating unique chunks from 3 chunks

In [25]:
# Create a dictionary using document IDs as keys to ensure uniqueness
unique_docs = {}
for doc in relevant_chunk1 + relevant_chunk2 + relevant_chunk3:
    # Use the document's ID as the key - assumes each document has a unique ID
    doc_id = doc.metadata['_id']
    unique_docs[doc_id] = doc

# Convert back to a list if needed
unique_chunks = list(unique_docs.values())
print("Unique Chunks: ", len(unique_chunks))

Unique Chunks:  5
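The cell above relies on every document carrying an `_id` in its metadata. As a hedged variant, the sketch below falls back to a hash of the text when no ID is present and keeps the first occurrence of each document. `FakeDoc` is a stand-in for a LangChain `Document`, and `dedupe_documents` is a hypothetical helper name.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class FakeDoc:
    """Stand-in for a LangChain Document (for illustration only)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def dedupe_documents(docs):
    """Keep the first occurrence of each document, preserving order."""
    seen = set()
    unique = []
    for doc in docs:
        # Prefer the store-assigned ID; otherwise hash the text itself.
        key = doc.metadata.get("_id")
        if key is None:
            key = hashlib.sha256(doc.page_content.encode("utf-8")).hexdigest()
        if key not in seen:
            seen.add(key)
            unique.append(doc)
    return unique


docs = [
    FakeDoc("fs intro", {"_id": "a"}),
    FakeDoc("require()", {"_id": "b"}),
    FakeDoc("fs intro", {"_id": "a"}),  # duplicate ID
    FakeDoc("no id here"),
    FakeDoc("no id here"),              # duplicate content, no ID
]
unique = dedupe_documents(docs)
print(len(unique))
```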


### Running the LLM on the user query with the unique chunks as context

In [28]:
# Feed the relevant chunks from the similarity search into the model's context
SYSTEM_PROMPT  = f"""
You are a helpful AI assistant who responds based on the available context.

Context:
{unique_chunks}
"""

In [29]:
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_query}
    ],
    temperature=0.7
)

# Print the response
print("\nUser Query:", user_query)
print("\nAssistant Response:", response.choices[0].message.content)


User Query: what is fs module in node

Assistant Response: The fs module in Node.js is a built-in core module that provides functions for interacting with the file system. It allows you to perform various file operations like reading from files, writing to files, updating files, deleting files, and more.
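Interpolating the list of `Document` objects directly into the prompt also sends their full metadata (producer, creation date, IDs) to the model. A possible alternative is to pass only the text and a page label, as sketched below. `FakeDoc` stands in for a LangChain `Document`, and `format_context` and its `max_chars` parameter are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDoc:
    """Stand-in for a LangChain Document (for illustration only)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def format_context(docs, max_chars=None):
    """Join chunk texts into one context string, labelled with index and page."""
    parts = []
    for i, doc in enumerate(docs, start=1):
        page = doc.metadata.get("page_label", "?")
        parts.append(f"[{i}] (page {page})\n{doc.page_content.strip()}")
    context = "\n\n".join(parts)
    # Optional hard cap to keep the prompt within a size budget.
    return context if max_chars is None else context[:max_chars]


sample_docs = [
    FakeDoc("  Importing Node.js Core Modules \n", {"page_label": "10"}),
    FakeDoc("Version 1.0 11", {}),
]
context = format_context(sample_docs)
print(context)
```

The resulting string could then replace `{unique_chunks}` in the system prompt.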


## Method 2: Reciprocal Rank Fusion (Rank LLM Output)

![rag flow image](rank-llm-output.png "RAG FLOW")

In [34]:
# Instead of finding unique chunks, we will rank the relevant chunks

def rank_documents_rrf(chunk_lists, k=60):
    """
    Ranks documents using Reciprocal Rank Fusion.
    
    Args:
        chunk_lists: List of document lists [chunk1, chunk2, chunk3]
        k: Smoothing constant that dampens the advantage of top-ranked documents (default: 60)
    
    Returns:
        List of ranked documents with their scores
    """
    # Track documents by ID for scoring
    doc_scores = {}
    doc_objects = {}
    
    # Process each result list
    for doc_list in chunk_lists:
        # Process each document in the list
        for rank, doc in enumerate(doc_list):
            doc_id = doc.metadata['_id']
            
            # Store the document object for later retrieval
            doc_objects[doc_id] = doc
            
            # Reciprocal rank score: 1 / (k + rank), with rank counted from 1.
            # enumerate() is 0-indexed, so add 1 to get the traditional rank.
            rrf_score = 1.0 / (rank + 1 + k)
            
            # Initialize or update score
            if doc_id not in doc_scores:
                doc_scores[doc_id] = rrf_score
            else:
                doc_scores[doc_id] += rrf_score
    
    # Create result list with (document, score) pairs
    ranked_results = [(doc_objects[doc_id], score) for doc_id, score in doc_scores.items()]
    
    # Sort by score in descending order
    ranked_results.sort(key=lambda x: x[1], reverse=True)
    
    return ranked_results

# Usage
all_chunks = [relevant_chunk1, relevant_chunk2, relevant_chunk3]
ranked_docs = rank_documents_rrf(all_chunks)

# Print top 5 ranked documents with scores
for i, (doc, score) in enumerate(ranked_docs[:5]):
    print(f"Rank {i+1}: Document {doc.metadata['_id']} (Score: {score:.4f})")
    print(f"Page: {doc.metadata['page_label']}")
    print(f"Content preview: {doc.page_content[:100]}...\n")

Rank 1: Document 42482a4c-5f19-44f1-9f2b-9ed0de25ff53 (Score: 0.0492)
Page: 10
Content preview: Importing Node.js Core Modules 
To get started, let’s work with some built-in Node.js modules. These...

Rank 2: Document ea27758e-e311-4de5-bb48-05ad259459d6 (Score: 0.0484)
Page: 11
Content preview: Version 1.0 11 
• Node.js fs documentation 
Lesson 3: Importing Your Own Files 
Putting all your cod...

Rank 3: Document c05ca589-44cc-4e15-a531-c5ce814fae4c (Score: 0.0476)
Page: 10
Content preview: Version 1.0 10 
Section 3: Node.js Module System 
Lesson 1: Section Intro 
The best way to get start...

Rank 4: Document 4ad1969d-c4c0-4a8b-8d7e-4184d3587a59 (Score: 0.0312)
Page: 15
Content preview: Version 1.0 15 
nodemon app.js 
P.S. You can stop nodemon by using ctrl + c from the terminal! 
Link...

Rank 5: Document 9736419f-c9e4-450f-ba4b-5390f9499bf8 (Score: 0.0156)
Page: 2
Content preview: Lesson 5: Your First Node.js Script .................................................................
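The printed scores can be checked by hand against the formula `1 / (k + rank)` with `k = 60`. The rank positions below are an assumption inferred from the scores themselves (they are not shown in the output): a document ranked first in all three lists, one ranked second in all three, and so on.

```python
def rrf(ranks, k=60):
    """Sum 1 / (k + rank) over the 1-based ranks at which a document appeared."""
    return sum(1.0 / (k + r) for r in ranks)


# Assumed rank positions (inferred from the scores, not taken from the notebook).
scenarios = {
    "rank 1 in all three lists": [1, 1, 1],
    "rank 2 in all three lists": [2, 2, 2],
    "rank 3 in all three lists": [3, 3, 3],
    "rank 4 in two lists": [4, 4],
    "rank 4 in one list": [4],
}

formatted = {name: f"{rrf(ranks):.4f}" for name, ranks in scenarios.items()}
for name, score in formatted.items():
    print(f"{name}: {score}")
```

Under these assumptions the values come out as 0.0492, 0.0484, 0.0476, 0.0312 and 0.0156, matching the five scores above. This shows how appearing in more lists lifts a document even when its individual ranks are the same.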

In [35]:
# Feed the ranked chunks from the similarity search into the model's context
SYSTEM_PROMPT  = f"""
You are a helpful AI assistant who responds based on the available context.

Context:
{ranked_docs}
"""

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_query}
    ],
    temperature=0.7
)

# Print the response
print("\nUser Query:", user_query)
print("\nAssistant Response:", response.choices[0].message.content)


User Query: what is fs module in node

Assistant Response: The fs module in Node.js is a built-in module that provides functions to manipulate the file system. It allows you to perform various operations such as reading from and writing to files, creating new files, updating existing files, deleting files, and more.


## Method 3: Query Decomposition (Breaking Query)
- Less Abstract (sync process)
- Uses the Chain of Thought (CoT) method

![rag flow image](query-decomposition.png "RAG FLOW")

In [37]:
SYSTEM_PROMPT = """
You are a helpful AI assistant. Given the user's query, break it down into 3 less abstract queries.

Example:
User Query:
"what is machine learning"

Break Down Query:
"what is machine"
"what is learning"
"what is machine learning"

output format:
[
    "query1",
    "query2",
    "query3"
]
"""

In [38]:
# Init Openai client
client = OpenAI(api_key=os.getenv("OPEN_API_KEY"))

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_query}
    ],
    temperature=0.7
)

# Print the response
print("\nUser Query:", user_query)
print("\nAssistant Response:", response.choices[0].message.content)


User Query: what is fs module in node

Assistant Response: [
    "what is fs module",
    "what is fs in node",
    "what is fs module in node"
]


In [49]:
multi_queries = response.choices[0].message.content.split(",\n")
multi_queries

['[\n    "what is fs module"',
 '    "what is fs in node"',
 '    "what is fs module in node"\n]']
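Splitting on `",\n"` leaves brackets, quotes and indentation attached to each query. Because the response here is a valid JSON array, parsing it with `json.loads` is a cleaner option. The sketch below is one way to do that; `parse_json_queries` is a hypothetical helper, and it returns an empty list when the model does not return valid JSON.

```python
import json


def parse_json_queries(raw):
    """Parse a JSON array of strings; return [] if the text is not a valid JSON list."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return []
    if not isinstance(data, list):
        return []
    # Keep only non-empty strings and trim surrounding whitespace.
    return [item.strip() for item in data if isinstance(item, str) and item.strip()]


raw = '[\n    "what is fs module",\n    "what is fs in node",\n    "what is fs module in node"\n]'
parsed = parse_json_queries(raw)
print(parsed)
```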

In [57]:
llm_content = []
def sequential_query_decomposition(questions, retriever, llm_client):
    """
    Process a list of decomposed questions sequentially, where each answer
    provides context for the next question.
    """
    # Extract the questions without trying to parse as JSON
    parsed_questions = []
    
    for item in questions:
        # Clean up any extra characters and whitespace
        question = item.replace('[', '').replace(']', '').replace('\n', '').strip()
        if question.startswith('"') and question.endswith('"'):
            question = question[1:-1]
        if question:  # Only add non-empty questions
            parsed_questions.append(question)
    
    # Initialize with empty context
    current_context = ""
    
    # Process each question sequentially
    for i, question in enumerate(parsed_questions):
        print(f"Processing question {i+1}: {question}")
        
        # Retrieve relevant chunks for this question
        relevant_chunks = retriever.similarity_search(query=question)
        
        # Create system prompt with current context and new relevant chunks
        system_prompt = f"""
        You are a helpful AI Assistant who responds based on the available context.
        
        Previous information: {current_context}
        
        Current context:
        {relevant_chunks}
        """
        
        # Get response from LLM
        response = llm_client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": question}
            ],
            temperature=0.7
        )
        
        # Extract the response content
        current_context = response.choices[0].message.content
        llm_content.append(current_context)
        print(f"Intermediate answer: {current_context}\n")
    
    return current_context

# Usage
questions = [
    "what is fs module",
    "what is fs in node",
    "what is fs module in node",
]

# Initialize OpenAI client
client = OpenAI(api_key=os.getenv("OPEN_API_KEY"))

# Call the function with your questions, retriever, and client
final_answer = sequential_query_decomposition(questions, vector_store, client)
print(f"Final answer: {final_answer}")

Processing question 1: what is fs module
Intermediate answer: The fs module in Node.js is a built-in module that provides functions for interacting with the file system. It allows you to perform operations like reading from and writing to files, creating new files, updating file content, deleting files, and more. In the context provided, an example usage of the fs module is shown where the `writeFileSync` function is used to write a message to a file named `notes.txt`.

Processing question 2: what is fs in node
Intermediate answer: In Node.js, the `fs` module is a built-in module that provides functions for interacting with the file system. It allows you to perform various file operations like reading from and writing to files, creating new files, updating file content, deleting files, and more. The `fs` module is essential for working with files and directories in Node.js applications.

Processing question 3: what is fs module in node
Intermediate answer: The `fs` module in Node.js is

In [58]:
llm_content

['The fs module in Node.js is a built-in module that provides functions for interacting with the file system. It allows you to perform operations like reading from and writing to files, creating new files, updating file content, deleting files, and more. In the context provided, an example usage of the fs module is shown where the `writeFileSync` function is used to write a message to a file named `notes.txt`.',
 'In Node.js, the `fs` module is a built-in module that provides functions for interacting with the file system. It allows you to perform various file operations like reading from and writing to files, creating new files, updating file content, deleting files, and more. The `fs` module is essential for working with files and directories in Node.js applications.',
 'The `fs` module in Node.js is a built-in module that provides functions for interacting with the file system. It allows you to perform various file operations like reading from and writing to files, creating new file

In [59]:
# Feed the intermediate answers into the model's context
SYSTEM_PROMPT  = f"""
You are a helpful AI assistant who responds based on the available context.

Context:
{llm_content}
"""

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_query}
    ],
    temperature=0.7
)

# Print the response
print("\nUser Query:", user_query)
print("\nAssistant Response:", response.choices[0].message.content)


User Query: what is fs module in node

Assistant Response: The fs module in Node.js is a built-in module that provides functions for interacting with the file system. It allows you to perform operations like reading from and writing to files, creating new files, updating file content, deleting files, and more. The fs module is essential for working with files and directories in Node.js applications.


## Method 4: HyDE (Hypothetical Document Embedding) [less abstract]

![rag flow image](hyde.png "RAG FLOW")

In [60]:
SYSTEM_PROMPT = """
You are a helpful AI assistant. Given the user's query, generate an entire document on that topic.
"""

In [61]:
# Init Openai client
client = OpenAI(api_key=os.getenv("OPEN_API_KEY"))

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_query}
    ],
    temperature=0.7
)

# Print the response
print("\nUser Query:", user_query)
print("\nAssistant Response:", response.choices[0].message.content)


User Query: what is fs module in node

Assistant Response: The fs module in Node.js is a core module that provides a way to interact with the file system in a server-side environment. The 'fs' stands for file system, and this module allows you to work with files, directories, and file operations like reading, writing, updating, and deleting files.

Here are some common operations that you can perform using the fs module in Node.js:

1. Reading files: You can use functions like `fs.readFile()` or `fs.readFileSync()` to read the contents of a file.

2. Writing files: You can use functions like `fs.writeFile()` or `fs.writeFileSync()` to write data to a file.

3. Updating files: You can use functions like `fs.appendFile()` to add data to an existing file without overwriting it.

4. Deleting files: You can use the `fs.unlink()` function to delete a file from the file system.

5. Working with directories: You can create, read, update, and delete directories using functions like `fs.mkdir()

In [62]:
llm_doc = response.choices[0].message.content

In [63]:
relevant_chunk = vector_store.similarity_search(llm_doc)
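The HyDE steps above (generate a hypothetical document, then search with that document instead of the raw query) can be condensed into one function. The sketch below takes the generator and the search as plain callables so it runs without an API key. `hyde_retrieve` and the two stand-in functions are hypothetical, and the fallback to the original query when generation returns nothing is an added assumption, not something the notebook does.

```python
def hyde_retrieve(query, generate_document, search):
    """Search with an LLM-written hypothetical document rather than the raw query."""
    hypothetical_doc = generate_document(query).strip()
    # Assumption: fall back to the original query if generation returned nothing.
    search_text = hypothetical_doc or query
    return search(search_text)


# Stand-ins so the sketch is runnable without an LLM or a vector store.
def fake_generate(query):
    return f"A long explanatory document answering: {query}"


searched_with = []


def fake_search(text):
    searched_with.append(text)
    return ["chunk-1", "chunk-2"]


chunks = hyde_retrieve("what is fs module in node", fake_generate, fake_search)
print(searched_with[0])
print(chunks)
```

In the notebook, `generate_document` would wrap the chat completion call and `search` would be `vector_store.similarity_search`.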

In [65]:
# Feed the relevant chunks from the similarity search into the model's context
SYSTEM_PROMPT  = f"""
You are a helpful AI assistant who responds based on the available context.

Context:
{relevant_chunk}
"""

In [66]:
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_query}
    ],
    temperature=0.7
)

# Print the response
print("\nUser Query:", user_query)
print("\nAssistant Response:", response.choices[0].message.content)


User Query: what is fs module in node

Assistant Response: The fs module in Node.js is a built-in module that provides functions for interacting with the file system. It allows you to perform various operations such as reading from and writing to files, creating and deleting files, as well as manipulating file metadata.
