# Retrieval-Augmented Generation (RAG) Pipeline

This notebook sets up and runs a Retrieval-Augmented Generation (RAG) pipeline using LangChain, Hugging Face models, and the Groq API. A RAG pipeline combines a retrieval step with a generative model so that queries are answered from retrieved context. Here, RAG is used to build a system that, for each natural language question in the BIRD benchmark, identifies the Source Tables (STs) that contain data relevant to answering the question. Results are evaluated against the BIRD ground truth by computing the overall recall, precision, and F1-score for the detected STs.
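
As a rough illustration of the evaluation described above, the sketch below computes precision, recall, and F1 over predicted and ground-truth table sets. The function name `score_source_tables`, the case-insensitive matching, and the choice of micro-averaging (pooling counts over all questions) are assumptions; the notebook does not show how the overall scores are aggregated.

```python
def score_source_tables(predictions, ground_truths):
    """Micro-averaged precision, recall and F1 over per-question table sets."""
    tp = fp = fn = 0
    for predicted, expected in zip(predictions, ground_truths):
        predicted = {name.lower() for name in predicted}
        expected = {name.lower() for name in expected}
        tp += len(predicted & expected)   # tables correctly detected
        fp += len(predicted - expected)   # tables detected but not needed
        fn += len(expected - predicted)   # needed tables that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical predictions for two questions, for illustration only.
predictions = [["schools", "frpm"], ["customers", "yearmonth"]]
ground_truths = [["schools", "frpm"], ["customers", "transactions_1k", "yearmonth"]]
precision, recall, f1 = score_source_tables(predictions, ground_truths)
print(precision, recall, f1)
```

In this toy case every predicted table is correct but one expected table is missed, so precision is perfect while recall is lower.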

## Setup and Dependencies

First, we import the necessary libraries and set up the environment.

In [None]:
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import FAISS
from langchain.chains import RetrievalQA
from langchain.llms import HuggingFacePipeline
import os
from groq import Groq
from langchain_groq import ChatGroq
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
import torch
from langchain.schema.runnable import RunnablePassthrough, RunnableLambda
from langchain.schema import StrOutputParser
from langchain.retrievers import BM25Retriever, EnsembleRetriever

# Set up the LLM
Two different LLMs were used for this project:
<ul>
<li>phi-2 (2.7B): a smaller Transformer model with 2.7 billion parameters, run locally.</li>
<li>llama-3.1-8b-instant: accessed through an API; Llama 3.1 is an auto-regressive language model that uses an optimized transformer architecture with 8 billion parameters.</li>
</ul>

## Local Setup

In [None]:
# Load the model and tokenizer from local disk
save_directory = r"filepath\phi-2"  # raw string so the backslash is not read as an escape; use the filepath to your model's location
model = AutoModelForCausalLM.from_pretrained(save_directory)
tokenizer = AutoTokenizer.from_pretrained(save_directory)

# Move the model to GPU if available
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

### Create a Hugging Face Pipeline

We create a Hugging Face pipeline for text generation using the loaded model and tokenizer.

In [None]:
# Create a Hugging Face pipeline
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    #temperature=0,
    device="cuda" if torch.cuda.is_available() else "cpu",
)

### Wrap the Pipeline in LangChain's HuggingFacePipeline

We wrap the Hugging Face pipeline in LangChain's `HuggingFacePipeline` to integrate it with the LangChain framework.

In [None]:
# Wrap the pipeline in LangChain's HuggingFacePipeline
llm = HuggingFacePipeline(pipeline=pipe)

## API Setup

Alternatively, we can set up an LLM using the Groq API. This requires an API key and initializes the `ChatGroq` model.

In [None]:
# Access the model through the Groq API
os.environ["GROQ_API_KEY"] = "your API key"  # Replace with your Groq API key
# Initialize Groq LLM
llm = ChatGroq(
    model_name="llama-3.1-8b-instant",
    temperature=0,
)

In [None]:
from langchain.prompts import FewShotPromptTemplate, PromptTemplate
from langchain.document_loaders import TextLoader
from langchain.text_splitter import CharacterTextSplitter

## Load and Split Documents

We load the documents from a text file and split them into smaller chunks for processing. The file contains a natural-language description of the database we are using. Splitting it into smaller documents lets the RAG pipeline pass only the useful parts as context for the task.

In [None]:
#Load and split documents
loader = TextLoader("../Data/schema_descriptions.txt")
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
texts = text_splitter.split_documents(documents)

Created a chunk of size 2131, which is longer than the specified 1000
Created a chunk of size 1013, which is longer than the specified 1000
Created a chunk of size 1646, which is longer than the specified 1000
Created a chunk of size 1981, which is longer than the specified 1000
Created a chunk of size 1282, which is longer than the specified 1000
Created a chunk of size 1034, which is longer than the specified 1000
Created a chunk of size 3055, which is longer than the specified 1000
Created a chunk of size 3438, which is longer than the specified 1000
Created a chunk of size 1279, which is longer than the specified 1000
Created a chunk of size 1158, which is longer than the specified 1000
Created a chunk of size 1283, which is longer than the specified 1000
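
The warnings above appear because `CharacterTextSplitter` cuts only at its separator (a blank line by default), so a paragraph longer than `chunk_size` is kept whole. The following simplified sketch reproduces that behavior in plain Python. `split_on_separator` is a hypothetical helper written for illustration, not the LangChain implementation, and it ignores `chunk_overlap`.

```python
def split_on_separator(text, chunk_size, separator="\n\n"):
    """Greedily pack separator-delimited pieces into chunks of at most chunk_size.

    A single piece longer than chunk_size cannot be cut at a separator,
    so it ends up as an oversized chunk of its own.
    """
    chunks, current = [], ""
    for piece in text.split(separator):
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size or not current:
            current = candidate
        else:
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks


# Three paragraphs; the middle one alone exceeds the 40-character limit.
sample = "short table description" + "\n\n" + "x" * 50 + "\n\n" + "another short one"
chunks = split_on_separator(sample, chunk_size=40)
oversized = [len(chunk) for chunk in chunks if len(chunk) > 40]
print(len(chunks), oversized)
```

If oversized chunks are a problem, one option is a splitter that falls back to finer separators, such as LangChain's `RecursiveCharacterTextSplitter`; whether that helps retrieval quality here is untested.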


## Create Embeddings

We create embeddings for the document chunks using the <b>"all-MiniLM-L6-v2"</b> model.
This is a sentence-transformers model: it maps sentences and paragraphs to a 384-dimensional dense vector space and can be used for tasks like clustering or semantic search.

In [None]:
# Create embeddings
embedding_model = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
vector_store = FAISS.from_documents(texts, embedding_model)
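
To make the idea of searching an embedding space concrete, here is a minimal nearest-neighbor sketch in plain Python. The three-dimensional vectors are made-up stand-ins for the 384-dimensional embeddings, the names `l2_distance` and `nearest` are hypothetical, and Euclidean distance is used on the assumption that the vector store was built with its default settings.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(query_vector, indexed, k=2):
    """Return the names of the k indexed vectors closest to the query."""
    ranked = sorted(indexed, key=lambda item: l2_distance(query_vector, item[1]))
    return [name for name, _ in ranked[:k]]


# Toy vectors, invented for illustration only.
indexed = [
    ("schools", [0.9, 0.1, 0.0]),
    ("frpm", [0.8, 0.3, 0.1]),
    ("gasstations", [0.0, 0.2, 0.9]),
]
top = nearest([1.0, 0.2, 0.0], indexed, k=2)
print(top)
```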

## Define Few-Shot Examples

We define a set of few-shot examples to guide the model in answering queries based on the context.
These examples serve as a demonstration of the desired input-output format, helping the model understand the task without requiring extensive fine-tuning.

In [None]:
# Define few-shot examples
examples = [
    {
        "query": "What is the unabbreviated mailing street address of the school with the highest FRPM count for K-12 students?",
        "answer": "[schools, frpm]"
    },
    {
        "query": "Who is the top spending customer and how much is the average price per single item purchased by this customer? What currency was being used?",
        "answer": "[customers, transactions_1k, yearmonth]"
    },
    {
        "query": "What is the amount spent by customer 38508 at the gas stations? How much had the customer spent in January 2012?",
        "answer": "[transactions_1k, gasstations, yearmonth]"
    },
    {
        "query": "Please list the lowest three eligible free rates for students aged 5-17 in continuation schools.",
        "answer": "[frpm]"
    },
]

## Define Few-Shot Examples with Chain-of-Thought Reasoning

In this section, we define the prompt template that wraps the few-shot examples. Its instructions also spell out the reasoning steps, in the spirit of **chain-of-thought (CoT)** prompting, so that the model checks each table step by step before producing its answer.

Combining few-shot examples with step-by-step instructions is intended to help the model generalize and return more accurate table lists.

In [None]:
# Define the example prompt
example_prompt = PromptTemplate(
    input_variables=["query", "answer"],
    template="Query: {query}\nAnswer: {answer}",
)

# Define the FewShotPromptTemplate with updated instructions
few_shot_prompt = FewShotPromptTemplate(
    examples=examples,
    example_prompt=example_prompt,
    prefix="""Return ONLY the names of the tables from the context that could be useful for the query. 
Follow these steps:
1. Check if the table name or any of its columns could contain information relevant to the query.
2. If the table is relevant, include its name in a list. Do not include column names or explanations.
3. Return the result as a Python list of table names in the format: [tablename1, tablename2].

Context:
{context}

Examples:
""",
    suffix="\n\nQuery: {query}\nAnswer:",
    input_variables=["context", "query"],
)
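
To see roughly what text the model receives, the sketch below assembles a prompt in plain Python in the same order as the template: prefix, formatted examples, then suffix, joined by blank lines. `render_few_shot` is a hypothetical helper that approximates the template's behavior, and the shortened prefix and sample context are placeholders, not the notebook's actual data.

```python
def render_few_shot(prefix, examples, suffix, separator="\n\n", **values):
    """Join prefix, formatted examples and suffix, then fill in the variables."""
    example_strings = [
        "Query: {query}\nAnswer: {answer}".format(**example) for example in examples
    ]
    template = separator.join([prefix, *example_strings, suffix])
    return template.format(**values)


examples = [
    {
        "query": "Please list the lowest three eligible free rates for students aged 5-17 in continuation schools.",
        "answer": "[frpm]",
    },
]
# Shortened stand-in for the full instruction prefix.
prefix = "Return ONLY the names of the relevant tables.\n\nContext:\n{context}\n\nExamples:\n"
suffix = "\n\nQuery: {query}\nAnswer:"

prompt = render_few_shot(
    prefix,
    examples,
    suffix,
    context="Table frpm: free or reduced-price meal data per school.",  # placeholder context
    query="Which schools have the highest FRPM count?",
)
print(prompt)
```

The prompt ends with an open `Answer:` line, which is why the model's completion is expected to be just the bracketed table list.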

## Define Retrieval Options

We define three retrieval options:
<ul>
<li>single semantic retriever: uses the embeddings with maximal marginal relevance (MMR) search to return a set of documents that are both relevant to the query and diverse, improving the quality of retrieved information for downstream tasks like question answering</li>
<li>ensemble retriever: combines a sparse retriever (BM25) with a dense retriever (embeddings), because their strengths are complementary. The sparse retriever is good at finding relevant documents based on keywords, while the dense retriever is good at finding relevant documents based on semantic similarity.</li>
<li>two-step retrieval process: first retrieves the documents relevant to the query; then, for each initially retrieved document, finds additional similar documents</li>
</ul>

In [None]:
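#OPTION 1: single semantic retriever
# MMR search (relevance plus diversity). Note: "score_threshold" is the parameter of search_type="similarity_score_threshold"; with "mmr" it is likely ignored.
retriever=vector_store.as_retriever(search_type="mmr", search_kwargs={"score_threshold": 0.7})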
#retriever=vector_store.as_retriever(search_type="similarity", search_kwargs={"score_threshold": 0.5})
#retriever=vector_store.as_retriever(search_type="mmr", search_kwargs={"k":3})
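
The MMR search type used above trades relevance against redundancy. The sketch below shows the selection rule on toy two-dimensional vectors; `mmr_select` is a hypothetical plain-Python helper, the vectors are invented, and `lambda_mult=0.5` is assumed as the balance between the two terms.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def mmr_select(query_vec, doc_vecs, k, lambda_mult=0.5):
    """Pick k document indices, penalizing similarity to already selected ones."""
    selected = []
    candidates = list(range(len(doc_vecs)))
    while candidates and len(selected) < k:
        def score(i):
            relevance = cosine(query_vec, doc_vecs[i])
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Documents 0 and 1 are near-duplicates; document 2 is less relevant but different.
doc_vecs = [[1.0, 0.1], [1.0, 0.12], [0.8, -0.6]]
picked = mmr_select([1.0, 0.0], doc_vecs, k=2)
print(picked)
```

A pure similarity ranking would return the two near-duplicates; MMR skips the second one in favor of the more distinct document.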

In [None]:
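#OPTION 2: ensemble retriever, hybrid semantic and keyword-based retriever

# Semantic retriever (note: the ensemble below does not use this variable; it builds a default dense retriever via vector_store.as_retriever())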
retriever=vector_store.as_retriever(search_type="mmr", search_kwargs={"k": 3})
# Create a keyword-based retriever (BM25)
bm25_retriever = BM25Retriever.from_documents(texts)
bm25_retriever.k = 3  # Number of documents to retrieve
# Combine retrievers with EnsembleRetriever
ensemble_retriever = EnsembleRetriever(
    retrievers=[vector_store.as_retriever(), bm25_retriever],
    weights=[0.5, 0.5],
)
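
An ensemble of this kind merges the ranked lists of its retrievers with weighted reciprocal rank fusion. The sketch below illustrates that fusion in plain Python; `weighted_rrf` is a hypothetical helper, the constant `c=60` is assumed as the smoothing term, and the two rankings are invented for illustration.

```python
from collections import defaultdict


def weighted_rrf(ranked_lists, weights, c=60):
    """Fuse ranked lists: each document earns weight / (rank + c) per list."""
    scores = defaultdict(float)
    for ranking, weight in zip(ranked_lists, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += weight / (rank + c)
    return sorted(scores, key=scores.get, reverse=True)


# Made-up rankings from a dense and a keyword retriever.
dense_ranking = ["frpm", "schools", "satscores"]
keyword_ranking = ["schools", "customers", "frpm"]
fused = weighted_rrf([dense_ranking, keyword_ranking], weights=[0.5, 0.5])
print(fused)
```

Documents that appear near the top of both lists rise above documents found by only one retriever, which is the complementary effect described for the hybrid option.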

In [None]:
#OPTION 3:
#Define the two-step retrieval process
def two_step_retrieval(query):
    # First retrieval: Retrieve documents relevant to the query
    initial_docs = ensemble_retriever.get_relevant_documents(query)

    # Second retrieval: Retrieve additional documents similar to the initial ones
    additional_docs = []
    for doc in initial_docs:
        # Find similar documents based on the content of the initial document
        similar_docs = vector_store.similarity_search(doc.page_content, k=2)  # Adjust k as needed
        additional_docs.extend(similar_docs)

    # Combine and deduplicate documents based on page_content
    unique_docs = []
    seen_contents = set()
    for doc in initial_docs + additional_docs:
        if doc.page_content not in seen_contents:
            unique_docs.append(doc)
            seen_contents.add(doc.page_content)

    return unique_docs

## Define the RAG Chain

We define the RAG chain using the chosen retriever and the few-shot prompt template.

In [None]:
# Chain with the semantic retriever only

rag_chain = (
    {"context": retriever, "query": RunnablePassthrough()}
    | RunnableLambda(lambda x: {"context": "\n\n".join([doc.page_content for doc in x["context"]]), "query": x["query"]})
    | few_shot_prompt
    | llm
    | StrOutputParser()
)


In [None]:
# Chain with the hybrid semantic/keyword-based retriever
rag_chain = (
    {"context": ensemble_retriever, "query": RunnablePassthrough()}
    | RunnableLambda(lambda x: {"context": "\n\n".join([doc.page_content for doc in x["context"]]), "query": x["query"]})
    | few_shot_prompt
    | llm
    | StrOutputParser()
)


In [None]:
# Chain for two-step retrieval
rag_chain = (
    {"context": RunnableLambda(two_step_retrieval), "query": RunnablePassthrough()}
    | RunnableLambda(lambda x: {"context": "\n\n".join([doc.page_content for doc in x["context"]]), "query": x["query"]})
    | few_shot_prompt
    | llm
    | StrOutputParser()
)

In [None]:
#example query
query = "What is the highest eligible free rate for K-12 students in the schools in Alameda County?"
#query="Among the account opened, how many female customers who were born before 1950 and stayed in Sokolov?"
response = rag_chain.invoke(query)
print(response)

[schools, frpm]


In [None]:
response[response.rfind("["):response.rfind("]")+1]

'[schools, frpm]'
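
The slice above still yields a string. For scoring against the ground truth it may be more convenient to turn it into a list of table names; the sketch below is one way to do that. `parse_table_list` is a hypothetical helper, and it assumes the answer is the last bracketed, comma-separated list in the response.

```python
def parse_table_list(response):
    """Extract table names from the last [...] list found in the response."""
    start, end = response.rfind("["), response.rfind("]")
    if start == -1 or end < start:
        return []  # no well-formed list found
    inner = response[start + 1:end]
    # Drop surrounding whitespace and optional quotes around each name.
    return [name.strip().strip("'\"") for name in inner.split(",") if name.strip()]


tables = parse_table_list("Answer: [schools, frpm]")
print(tables)
```

Returning an empty list for malformed output keeps the evaluation loop running even when the model does not follow the requested format.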