RAG Pipeline for Query Expansion

In [1]:
# Purpose: Demonstrate query expansion in a Retrieval-Augmented Generation (RAG) pipeline using LangChain

In [2]:
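# Step 1: Import required modules
# Note: these import paths follow the older LangChain layout; recent releases
# provide the same classes from the langchain_openai and langchain_community packages.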

import os
from dotenv import load_dotenv
from langchain.chat_models import ChatOpenAI
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain
from langchain.vectorstores import FAISS
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.chains import RetrievalQA # For building a QA system that uses retrieval

In [3]:
# Step 2: Initialize the LLM

load_dotenv()

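# load_dotenv() has already copied OPENAI_API_KEY from .env into the environment.
# Fail early with a clear message if the key is missing.
if not os.getenv('OPENAI_API_KEY'):
    raise RuntimeError('OPENAI_API_KEY is not set; add it to your .env file.')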

# Creates an instance of the language model.
llm = ChatOpenAI(temperature=0.7)


In [4]:
# Step 3: Define the query expansion prompt
prompt = PromptTemplate(
    input_variables=['query'],
    template='''
You are a helpful assistant that expands user queries for better document retrieval.
Given the query: "{query}", generate 5 diverse and relevant expanded versions.
Include synonyms, paraphrases, and related subtopics.

Expanded queries:
1.
2.
3.
4.
5.
'''
)
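
To see what the model actually receives, the placeholder substitution can be reproduced without LangChain. This is a minimal sketch, assuming the template is filled the way Python's str.format fills it (LangChain's default template format behaves this way for a simple {query} placeholder). The TEMPLATE constant is a shortened, hypothetical copy of the prompt above.

```python
# Shortened, hypothetical copy of the notebook's prompt text.
TEMPLATE = (
    'Given the query: "{query}", generate 5 diverse and relevant expanded versions.\n'
    'Include synonyms, paraphrases, and related subtopics.'
)

# Substitute the user query into the {query} placeholder.
filled = TEMPLATE.format(query='What are the health benefits of green tea?')
print(filled)
```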

In [5]:
# Step 4: Build the query expansion chain
query_expansion_chain = LLMChain(llm=llm, prompt=prompt)



In [6]:
# Step 5: Example user query
user_query = 'What are the health benefits of green tea?'


In [7]:
# Step 6: Run query expansion
expanded_output = query_expansion_chain.run(query=user_query)
expanded_output


'1. "What are the advantages of consuming green tea for one\'s well-being?"\n2. "Can you explain the positive impacts of green tea on one\'s health?"\n3. "How does green tea contribute to improving overall health?"\n4. "What positive effects does green tea have on the body\'s wellness?"\n5. "What are some of the health-boosting properties found in green tea?"'

In [8]:
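# Step 7: Split the output into one expanded query per line
# (the list numbering and surrounding quotes are kept as returned by the model)
expanded_queries = [
    line.strip() for line in expanded_output.split('\n')
    if line.strip() and line.strip()[0].isdigit()
]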
expanded_queries

['1. "What are the advantages of consuming green tea for one\'s well-being?"',
 '2. "Can you explain the positive impacts of green tea on one\'s health?"',
 '3. "How does green tea contribute to improving overall health?"',
 '4. "What positive effects does green tea have on the body\'s wellness?"',
 '5. "What are some of the health-boosting properties found in green tea?"']
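
The entries above still carry the list numbering and the wrapping quotes, and that markup would be embedded along with the query text during retrieval. A minimal sketch of one way to strip it is shown below; the helper name strip_list_markup and the sample_lines list are hypothetical and not part of the original notebook.

```python
import re


def strip_list_markup(line: str) -> str:
    """Remove a leading 'N.' or 'N)' marker and one pair of wrapping double quotes."""
    text = re.sub(r'^\s*\d+[.)]\s*', '', line).strip()
    if len(text) >= 2 and text[0] == '"' and text[-1] == '"':
        text = text[1:-1]
    return text


# Two lines copied from the expanded output above, used as sample input.
sample_lines = [
    '1. "What are the advantages of consuming green tea for one\'s well-being?"',
    '3. "How does green tea contribute to improving overall health?"',
]

plain_queries = [strip_list_markup(line) for line in sample_lines]
print(plain_queries)
```

Whether the cleaned text retrieves better than the numbered text depends on the embedding model, so this is worth checking on real data rather than assuming.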

In [9]:
# Step 8: Display the original and expanded queries
print(f'Original Query: {user_query}')
print()
print('Expanded Queries:')

for eq in expanded_queries:
    print(eq)

Original Query: What are the health benefits of green tea?

Expanded Queries:
1. "What are the advantages of consuming green tea for one's well-being?"
2. "Can you explain the positive impacts of green tea on one's health?"
3. "How does green tea contribute to improving overall health?"
4. "What positive effects does green tea have on the body's wellness?"
5. "What are some of the health-boosting properties found in green tea?"


In [10]:
# Step 9: Set up embeddings + vector store (FAISS example)
embeddings = OpenAIEmbeddings()

# Build a small FAISS index from sample docs
docs = [
    'Green tea contains antioxidants that support heart health.',
    'It may help with weight loss by boosting metabolism.',
    'Green tea can lower cholesterol levels.'
]

# Embed each text as a numerical vector and build a FAISS index for similarity search
vectorstore = FAISS.from_texts(docs, embeddings)

# Retrieve the top 2 docs for the original query and for each expanded query
for q in [user_query] + expanded_queries:
    results = vectorstore.similarity_search(q, k=2)
    print(f'\nResult for query: {q}')
    
    for r in results:
        print(f'- {r.page_content}')

# Persist the index to disk so it can be reloaded later
vectorstore.save_local('faiss_index')


Result for query: What are the health benefits of green tea?
- Green tea contains antioxidants that support heart health.
- Green tea can lower cholesterol levels.

Result for query: 1. "What are the advantages of consuming green tea for one's well-being?"
- Green tea contains antioxidants that support heart health.
- Green tea can lower cholesterol levels.

Result for query: 2. "Can you explain the positive impacts of green tea on one's health?"
- Green tea contains antioxidants that support heart health.
- Green tea can lower cholesterol levels.

Result for query: 3. "How does green tea contribute to improving overall health?"
- Green tea contains antioxidants that support heart health.
- Green tea can lower cholesterol levels.

Result for query: 4. "What positive effects does green tea have on the body's wellness?"
- Green tea contains antioxidants that support heart health.
- Green tea can lower cholesterol levels.

Result for query: 5. "What are some of the health-boosting proper
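
In the results above, each query returns the same two documents, so passing every hit downstream would repeat content. A common follow-up is to merge the per-query hits and drop duplicates while keeping first-seen order. The sketch below illustrates that idea under stated assumptions: toy_search is a hypothetical word-overlap stand-in for vectorstore.similarity_search so the example runs without an API key, and merge_unique_results is an illustrative helper, not a LangChain function.

```python
from typing import Callable, Iterable, List

SAMPLE_DOCS = [
    'Green tea contains antioxidants that support heart health.',
    'It may help with weight loss by boosting metabolism.',
    'Green tea can lower cholesterol levels.',
]


def toy_search(query: str, k: int = 2) -> List[str]:
    """Hypothetical stand-in for a vector search: rank docs by shared words."""
    query_words = set(query.lower().replace('?', '').split())

    def overlap(doc: str) -> int:
        return len(query_words & set(doc.lower().rstrip('.').split()))

    # sorted() is stable, so ties keep the original document order.
    return sorted(SAMPLE_DOCS, key=lambda doc: -overlap(doc))[:k]


def merge_unique_results(
    queries: Iterable[str], search: Callable[[str], List[str]]
) -> List[str]:
    """Run every query and keep each retrieved text once, in first-seen order."""
    seen = set()
    merged = []
    for query in queries:
        for text in search(query):
            if text not in seen:
                seen.add(text)
                merged.append(text)
    return merged


queries = [
    'What are the health benefits of green tea?',
    'Does green tea help with weight loss?',
]
merged = merge_unique_results(queries, toy_search)
for text in merged:
    print(f'- {text}')
```

With the real index, the search argument could be a small wrapper that calls vectorstore.similarity_search(q, k=2) and returns the page_content of each hit.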

In [11]:
# Step 10: Create a retriever