## [`Retrieval-Augmented Generation` (RAG) using `LangChain`, `LlamaIndex` and `OpenAI`](https://medium.com/@prasadmahamulkar/introduction-to-retrieval-augmented-generation-rag-using-langchain-and-lamaindex-bd0047628e2a)

#### How does RAG work?

- **Indexing**
The indexing process is a crucial first step in data preparation for language models. Original data is cleaned, converted into standardized plain text, and segmented into smaller chunks for efficient processing. These chunks are transformed into vector representations through an embedding model, facilitating similarity comparisons during retrieval. The final index stores these text chunks and their vector embeddings, enabling efficient and scalable search capabilities.

- **Retrieval**
When a user asks a question, the system encodes it with the same embedding model used during indexing. It then computes similarity scores between the query vector and the chunk vectors in the index, and retrieves the top K chunks with the highest similarity as additional context for answering the user’s request.

- **Generation**
The user’s question and the retrieved chunks are combined into a single prompt for a large language model. The model then generates a response grounded in that context, adapting its approach to task-specific criteria.
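
The three stages above can be sketched in a few lines of plain Python. This is a minimal illustration, not how LangChain or LlamaIndex implement them: `embed` is a toy bag-of-words stand-in for a real embedding model, and the names `embed`, `cosine`, `retrieve`, `build_prompt` and the sample chunks are all hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words count vector (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Indexing: store each chunk alongside its vector.
chunks = [
    "QLoRA fine-tunes a quantized model using low-rank adapters.",
    "FAISS is a library for vector similarity search.",
    "Paris is the capital of France.",
]
index = [(chunk, embed(chunk)) for chunk in chunks]


def retrieve(question: str, k: int = 2) -> list[str]:
    """Retrieval: encode the question the same way and keep the top-k chunks."""
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def build_prompt(question: str, context: list[str]) -> str:
    """Generation (first half): combine question and context into one prompt."""
    joined = "\n".join(context)
    return f"Use only this context:\n{joined}\n\nQuestion: {question}"


question = "How does QLoRA fine-tune a quantized model?"
top = retrieve(question, k=1)
prompt = build_prompt(question, top)
print(prompt)
```

In a real pipeline the prompt would then be sent to the LLM; the sketch stops at prompt assembly.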

#### Basic RAG Using `LangChain`

In [None]:
!pip install sentence_transformers pypdf faiss-gpu
!pip install langchain langchain-community langchain-openai

In [None]:
## Start by installing and loading all the necessary libraries.

from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_openai import ChatOpenAI
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.prompts import ChatPromptTemplate
from langchain.chains import create_retrieval_chain

# For openai key
import os
os.environ["OPENAI_API_KEY"] = "Your Key"

In [None]:
## Load a PDF document using PyPDFLoader to extract text from PDF files.

loader = PyPDFLoader("/content/qlora_paper.pdf")
documents = loader.load()

In [None]:
## Use the TextSplitter to split the document into chunks.

text = RecursiveCharacterTextSplitter().split_documents(documents)
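
To make the effect of chunk size and overlap concrete, here is a simplified fixed-width splitter. It is only a sketch: `RecursiveCharacterTextSplitter` additionally tries to break on separators such as paragraphs and sentences, which this hypothetical `split_with_overlap` helper does not attempt.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into windows of chunk_size characters that share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
pieces = split_with_overlap(sample, chunk_size=10, chunk_overlap=3)
print(pieces)  # ['abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz']
```

Each chunk repeats the last three characters of the previous one, so a sentence that straddles a boundary is still seen whole by at least one chunk.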

In [None]:
## Load an embedding model to convert text into numerical embeddings

embeddings = HuggingFaceEmbeddings(
    model_name="BAAI/bge-small-en-v1.5",
    encode_kwargs={"normalize_embeddings": True},
)
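
Setting `normalize_embeddings` to `True` makes the model return unit-length vectors. A small worked example with made-up 2-D vectors shows why that is convenient: for unit vectors the dot product equals the cosine similarity, and the squared Euclidean distance is `2 - 2 * cosine`, so all three measures rank chunks in the same order. The helper names below are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    """Scale a vector to unit length."""
    length = math.sqrt(dot(v, v))
    return [x / length for x in v]


def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


a = [3.0, 4.0]
b = [4.0, 3.0]
a_unit, b_unit = normalize(a), normalize(b)

cos_ab = cosine(a, b)                      # cosine on the raw vectors
dot_unit = dot(a_unit, b_unit)             # plain dot product on unit vectors
sq_dist = sum((x - y) ** 2 for x, y in zip(a_unit, b_unit))

print(cos_ab, dot_unit, sq_dist)
```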

In [None]:
## Create a Vector Store using FAISS to store embeddings and text chunks.

vectorstore = FAISS.from_documents(text, embeddings)

## Save these embeddings for later use.

vectorstore.save_local("vectorstore.db")

In [None]:
## Create a retriever from the vector store. It returns the chunks whose embeddings are most similar to the query.

retriever = vectorstore.as_retriever()

In [None]:
## Load the language model (LLM) that generates the answer, define the prompt, then build the document chain and the retrieval chain.

llm = ChatOpenAI(model_name="gpt-3.5-turbo")

template = """
You are an assistant for question-answering tasks.
Use the provided context only to answer the following question:

<context>
{context}
</context>

Question: {input}
"""

prompt = ChatPromptTemplate.from_template(template)

doc_chain = create_stuff_documents_chain(llm, prompt)
chain = create_retrieval_chain(retriever, doc_chain)

In [None]:
## Invoke the retrieval chain (built above from the retriever and the document chain) with a user query to get a relevant response.

response = chain.invoke({"input": "what is Qlora?"})

response['answer']

#### Basic RAG Using `LlamaIndex`

In [None]:
## Install and load the necessary libraries from LlamaIndex.
## The imports below use the pre-0.10 `llama_index` package layout, so the version is pinned.

!pip install -U llama_hub "llama_index<0.10" pypdf

from llama_index import SimpleDirectoryReader
from llama_index import Document
from llama_index.node_parser import SimpleNodeParser
from llama_index.schema import IndexNode
from llama_index.llms import OpenAI
from llama_index import ServiceContext
from llama_index import VectorStoreIndex
from llama_index.query_engine import RetrieverQueryEngine

# For openai key
import os
os.environ["OPENAI_API_KEY"] = "Your Key"

In [None]:
## Load a PDF document and combine each page of the document into one document object.

documents = SimpleDirectoryReader(
    input_files=["/content/qlora_paper.pdf"]
).load_data()

doc_text = "\n\n".join([d.get_content() for d in documents])
text = [Document(text=doc_text)]

In [None]:
## Split the document into text chunks (nodes). Replace the default node IDs with readable ones (node-0, node-1, ...).

node_parser = SimpleNodeParser.from_defaults()
base_nodes = node_parser.get_nodes_from_documents(text)

for idx, node in enumerate(base_nodes):
    node.id_ = f"node-{idx}"

In [None]:
## Load an embedding model and a language model (LLM).

from llama_index.embeddings import resolve_embed_model

embed_model = resolve_embed_model("local:BAAI/bge-small-en")

llm = OpenAI(model="gpt-3.5-turbo")

In [None]:
## Create a service context that bundles the LLM and the embedding model for the indexing and querying stages.

service_context = ServiceContext.from_defaults(llm=llm, embed_model=embed_model)

In [None]:
## Embed the nodes (chunks) and store them in a vector store index.

index = VectorStoreIndex(base_nodes, service_context=service_context)

In [None]:
## Create a retriever using the vector store index to retrieve relevant information for user queries.

retriever = index.as_retriever()

In [None]:
## Set up a query engine by combining the retriever and the service context, then run a user query to get a relevant response.

query_engine = RetrieverQueryEngine.from_args(retriever, service_context=service_context)

response = query_engine.query("What is Qlora?")
print(str(response))

#### Advanced RAG Using `LangChain`

In [None]:
## Use the TextSplitter to split the document into parent and child chunks.

from langchain.text_splitter import RecursiveCharacterTextSplitter

## Create the parent documents - The big chunks
parent_splitter = RecursiveCharacterTextSplitter(chunk_size=2000)

## Create the child documents - The small chunks
child_splitter = RecursiveCharacterTextSplitter(chunk_size=400)

from langchain.storage import InMemoryStore
store = InMemoryStore()

In [None]:
## Create a vector store using Chroma to hold the embeddings of the child chunks (requires the `chromadb` package).

from langchain_community.vectorstores import Chroma

vectorstore = Chroma(
    collection_name="split_parents",
    embedding_function=embeddings,
)

In [None]:
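## Create a parent document retriever, then add the documents to it.
## `documents` must be the LangChain documents loaded with PyPDFLoader in the basic LangChain section (the LlamaIndex section reuses that variable name).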

from langchain.retrievers import ParentDocumentRetriever
retriever = ParentDocumentRetriever(
    vectorstore=vectorstore,
    docstore=store,
    child_splitter=child_splitter,
    parent_splitter=parent_splitter,
)

retriever.add_documents(documents)

In [None]:
## Create a retrieval chain like the previous one, this time with the parent document retriever, and invoke it with a user query to get a response.

chain = create_retrieval_chain(retriever, doc_chain)

response = chain.invoke({"input": "what is Qlora?"})

response['answer']
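
The idea behind the parent document retriever can be illustrated without any library: search over small child chunks, but hand the larger parent chunk to the LLM. The sketch below uses word overlap as a toy stand-in for vector similarity; the sample data and the `retrieve_parents` helper are hypothetical and are not LangChain's implementation.

```python
import re


def tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap_score(query: str, text: str) -> int:
    """Toy relevance score: number of shared words (stand-in for vector similarity)."""
    return len(tokens(query) & tokens(text))


# Parent chunks: the large pieces that are eventually passed to the LLM.
parents = {
    "p0": "QLoRA quantizes the base model to 4 bits. Low-rank adapters are then trained on top.",
    "p1": "FAISS builds an index over vectors. Nearest neighbours are found by similarity.",
}

# Child chunks: one per sentence, each tagged with the ID of its parent.
children = [
    (sentence.strip(), parent_id)
    for parent_id, text in parents.items()
    for sentence in text.split(".")
    if sentence.strip()
]


def retrieve_parents(query: str, k: int = 2) -> list[str]:
    """Rank the child chunks, then return the distinct parents of the top-k children."""
    ranked = sorted(children, key=lambda c: overlap_score(query, c[0]), reverse=True)
    seen, results = set(), []
    for _, parent_id in ranked[:k]:
        if parent_id not in seen:
            seen.add(parent_id)
            results.append(parents[parent_id])
    return results


hits = retrieve_parents("low-rank adapters", k=1)
print(hits)
```

The query matches only the second sentence of `p0`, yet the returned text also contains the first sentence, which is the extra context the small chunk alone would have missed.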

#### Advanced RAG Using `LlamaIndex`

In [None]:
## Set child chunk sizes (128, 256, 512) in `sub_chunk_sizes` and create parsers (`sub_node_parsers`) for child chunks

sub_chunk_sizes = [128, 256, 512]
sub_node_parsers = [
    SimpleNodeParser.from_defaults(chunk_size=c, chunk_overlap=20) for c in sub_chunk_sizes
]

all_nodes = []
for base_node in base_nodes:
    for n in sub_node_parsers:
        sub_nodes = n.get_nodes_from_documents([base_node])
        sub_inodes = [
            IndexNode.from_text_node(sn, base_node.node_id) for sn in sub_nodes
        ]
        all_nodes.extend(sub_inodes)

    # Also add the original (parent) node to the list of nodes.
    original_node = IndexNode.from_text_node(base_node, base_node.node_id)
    all_nodes.append(original_node)

all_nodes_dict = {n.node_id: n for n in all_nodes}

In [None]:
## Create embeddings of all nodes (which contain both parent and child nodes) and store them in the vector store index.

index = VectorStoreIndex(all_nodes, service_context=service_context)

In [None]:
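## Create a recursive retriever that follows each retrieved child node's reference back to its parent node.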

vector_retriever_chunk = index.as_retriever()

from llama_index.retrievers import RecursiveRetriever
retriever_chunk = RecursiveRetriever(
    "vector",
    retriever_dict={"vector": vector_retriever_chunk},
    node_dict=all_nodes_dict,
    verbose=True,
)
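
Conceptually, the recursive retriever looks up each retrieved child node's reference in `node_dict` and returns the parent it points to. The following sketch mimics that lookup with a hypothetical `Ref` dataclass standing in for `IndexNode`; it illustrates the idea only and does not reproduce LlamaIndex internals. Several child hits of different sizes may point to the same parent, so the sketch keeps each parent once.

```python
from dataclasses import dataclass


@dataclass
class Ref:
    """Hypothetical stand-in for an IndexNode: a small chunk that points at a parent ID."""
    text: str
    index_id: str


# Parent nodes keyed by ID, playing the role of all_nodes_dict.
node_dict = {
    "node-0": "full text of node-0",
    "node-1": "full text of node-1",
}

# Pretend these came back from the vector search, best match first.
retrieved = [
    Ref("128-token slice", "node-1"),
    Ref("256-token slice", "node-1"),
    Ref("512-token slice", "node-0"),
]


def resolve(hits: list[Ref], nodes: dict[str, str]) -> list[str]:
    """Replace each hit by the parent it references, keeping the first occurrence only."""
    seen, resolved = set(), []
    for hit in hits:
        if hit.index_id not in seen:
            seen.add(hit.index_id)
            resolved.append(nodes[hit.index_id])
    return resolved


parents_for_llm = resolve(retrieved, node_dict)
print(parents_for_llm)  # ['full text of node-1', 'full text of node-0']
```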

In [None]:
## Set up a query engine by combining the recursive retriever and the service context, then run a user query.

from llama_index.query_engine import RetrieverQueryEngine

query_engine_chunk = RetrieverQueryEngine.from_args(
    retriever_chunk, service_context=service_context
)

response = query_engine_chunk.query("What is Qlora?")
print(str(response))