#1. Document Preprocessing and Chunking

###Install Required Libraries

In [11]:
!pip install langchain==0.1.13 langchain-community langchain-google-genai sentence-transformers chromadb pypdf



###Mount Google Drive

In [12]:
from google.colab import drive
drive.mount("/content/drive", force_remount=True)

Mounted at /content/drive


###Load and Chunk PDF Documents

In [13]:
from langchain_community.document_loaders import PyPDFDirectoryLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

loader = PyPDFDirectoryLoader("/content/drive/MyDrive/RAG_AI_Papers")  # Replace with your folder path
docs = loader.load()

text_splitter = RecursiveCharacterTextSplitter(chunk_size=300, chunk_overlap=50)
chunks = text_splitter.split_documents(docs)

print(f"Loaded {len(docs)} pages and split into {len(chunks)} chunks.")

Loaded 109 pages and split into 1422 chunks.
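To make the effect of `chunk_size=300` and `chunk_overlap=50` concrete, here is a simplified sketch of overlapping character windows. The helper `chunk_text` is hypothetical and is not how `RecursiveCharacterTextSplitter` works internally: the real splitter first tries to break on natural separators (paragraphs, lines, words), so its chunks are usually shorter than `chunk_size` and end on cleaner boundaries.

```python
def chunk_text(text: str, chunk_size: int = 300, chunk_overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows that overlap by chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current window already reached the end of the text
    return chunks


sample = "x" * 700
pieces = chunk_text(sample, chunk_size=300, chunk_overlap=50)
print([len(p) for p in pieces])  # [300, 300, 200]
```

With 300-character chunks, a single paragraph of a paper is often spread across several chunks, so it may be worth experimenting with larger values if answers look truncated.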


#2. Create Embeddings and Vector Store

In [14]:
from langchain_community.embeddings import SentenceTransformerEmbeddings
from langchain_community.vectorstores import Chroma

embeddings = SentenceTransformerEmbeddings(model_name="all-MiniLM-L6-v2")

vectorstore = Chroma.from_documents(chunks, embeddings)

retriever = vectorstore.as_retriever(search_kwargs={'k': 5})
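The retriever returns the `k` chunks whose embeddings are closest to the query embedding. The sketch below illustrates that idea with brute-force cosine similarity over toy 2-dimensional vectors. It is only an illustration: Chroma uses its own index and configured distance metric, and the function names `cosine_similarity` and `top_k` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=5):
    """Return the indices of the k document vectors most similar to the query."""
    scored = [(cosine_similarity(query_vec, v), i) for i, v in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [i for _, i in scored[:k]]


toy_docs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]
print(top_k([1.0, 0.0], toy_docs, k=2))  # [0, 1]
```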

#3. End-to-End QA Chatbot

###Set Google Gemini API Key

In [15]:
import os
from getpass import getpass
# Prompt for the key at runtime; never hardcode a real API key in a notebook.
os.environ["GOOGLE_API_KEY"] = getpass("Google Gemini API key: ")

###Load Gemini Model and Build RAG Chain

In [18]:
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser

llm = ChatGoogleGenerativeAI(model="gemini-1.5-flash", temperature=0.2)

template = """
<context>
{context}
</context>

You are an advanced AI research assistant trained to analyze and synthesize information from technical academic documents. Based on the provided context extracted from AI research papers, generate a clear, accurate, and well-structured answer to the user’s question. Ensure your response is grounded in the retrieved context and explain concepts with precision and relevance. Avoid speculation and only include information supported by the source material.

Question: {query}
"""

prompt = ChatPromptTemplate.from_template(template)

rag_chain = (
    {"context": retriever, "query": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
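In the chain above, the `context` slot receives the retriever's raw list of `Document` objects, which is most likely rendered into the prompt through its string representation, metadata included. A common alternative is to join only the chunk text before it reaches the prompt. The helper below is a minimal sketch of that idea. `format_docs` is a hypothetical name, and the `SimpleNamespace` objects are stand-ins for real `Document` instances so the example runs on its own.

```python
from types import SimpleNamespace


def format_docs(docs):
    """Join the text of retrieved documents into one context string."""
    return "\n\n".join(doc.page_content for doc in docs)


# Stand-in objects with the same page_content attribute as a Document.
fake_docs = [
    SimpleNamespace(page_content="First chunk."),
    SimpleNamespace(page_content="Second chunk."),
]
print(format_docs(fake_docs))
```

To use it in the notebook, the first step of the chain would become `{"context": retriever | format_docs, "query": RunnablePassthrough()}`; LCEL wraps the plain function as a runnable when it is piped after the retriever.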

#4. Test the RAG System

In [19]:
questions = [
    "What are the main components of a RAG model, and how do they interact?",
    "What are the two sub-layers in each encoder layer of the Transformer model?",
    "Explain how positional encoding is implemented in Transformers and why it is necessary.",
    "Describe the concept of multi-head attention in the Transformer architecture. Why is it beneficial?",
    "What is few-shot learning, and how does GPT-3 implement it during inference?"
]

for question in questions:
    print(f"\nQuestion: {question}")
    print("Answer:", rag_chain.invoke(question))


Question: What are the main components of a RAG model, and how do they interact?
Answer: The provided text focuses on the open-sourcing of code for RAG models and mentions their performance on various datasets.  However, it does *not* describe the internal components or interactions within a RAG model itself.  Therefore, I cannot answer your question based on this context alone.  The provided text only indicates that Hugging Face helped open-source code to run RAG models, not the architecture of the models themselves.
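When an answer reports that the context does not cover the question, as above, it helps to look at what the retriever actually returned. The following is a small debugging sketch under a few assumptions: `preview_retrieved` is a hypothetical helper, it assumes the retriever exposes `invoke`, and it assumes each document carries a `source` entry in its metadata. `FakeRetriever` is a stand-in so the example runs without the notebook's objects.

```python
from types import SimpleNamespace


def preview_retrieved(retriever, question, max_chars=200):
    """Return a one-line preview of each document the retriever returns."""
    docs = retriever.invoke(question)
    previews = []
    for rank, doc in enumerate(docs, start=1):
        source = getattr(doc, "metadata", {}).get("source", "unknown")
        text = doc.page_content[:max_chars].replace("\n", " ")
        previews.append(f"[{rank}] {source}: {text}")
    return previews


class FakeRetriever:
    """Stand-in retriever that always returns one placeholder document."""

    def invoke(self, question):
        return [
            SimpleNamespace(
                page_content="first line\nsecond line",
                metadata={"source": "example.pdf"},
            )
        ]


for line in preview_retrieved(FakeRetriever(), "placeholder question"):
    print(line)
```

In the notebook, `preview_retrieved(retriever, questions[0])` would show whether the five retrieved chunks contain anything about the RAG architecture at all, which separates a retrieval problem from a generation problem.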


Question: What are the two sub-layers in each encoder layer of the Transformer model?
Answer: Based on the provided text, each encoder layer in the Transformer model has two sub-layers:

1. A multi-head self-attention mechanism.
2. A simple, position-wise fully connected feed-forward network (the full description is cut off in the provided text snippets).


Question: Explain how positional encoding is implemented in Transformers and why it is necessary.
Answer: In Tran