#1. Document Preprocessing and Chunking

###Install Required Libraries

In [20]:
!pip install langchain==0.1.13 langchain-community langchain-google-genai sentence-transformers chromadb pypdf



###Mount Google Drive

In [21]:
from google.colab import drive
drive.mount("/content/drive", force_remount=True)

Mounted at /content/drive


###Load and Chunk PDF Documents

In [22]:
from langchain_community.document_loaders import PyPDFDirectoryLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

loader = PyPDFDirectoryLoader("/content/drive/MyDrive/RAG_AI_Papers")  # Replace with your folder path
docs = loader.load()

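# Normalize metadata so every chunk carries a filename and a page for citations.
for i, doc in enumerate(docs):
  source = doc.metadata.get('source', f'doc_{i}.pdf')
  doc.metadata['filename'] = source.split('/')[-1]  # keep only the file name
  # The PDF loader reports zero-based page indices; fall back to i + 1 if missing.
  doc.metadata['page'] = doc.metadata.get('page', i + 1)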

text_splitter = RecursiveCharacterTextSplitter(chunk_size=300, chunk_overlap=50)
chunks = text_splitter.split_documents(docs)

print(f"Loaded {len(docs)} pages and split into {len(chunks)} chunks.")

Loaded 109 pages and split into 1422 chunks.
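
To make the `chunk_size=300, chunk_overlap=50` settings concrete, here is a minimal plain-Python sketch of overlapping character windows. It is a simplification: `RecursiveCharacterTextSplitter` prefers to break on separators such as paragraph and line breaks, so its real chunks are usually shorter and aligned to text boundaries. The helper name `sliding_window_chunks` is hypothetical and not part of LangChain.

```python
def sliding_window_chunks(text, chunk_size=300, chunk_overlap=50):
    """Split text into fixed-size character windows that overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks

# A 700-character sample whose content varies, so the overlap is visible.
sample = "".join(str(i % 10) for i in range(700))
pieces = sliding_window_chunks(sample)
print([len(p) for p in pieces])           # chunk lengths
print(pieces[0][-50:] == pieces[1][:50])  # neighbours share 50 characters
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of storing some text twice.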


#2. Create Embeddings and Vector Store

In [35]:
from langchain_community.embeddings import SentenceTransformerEmbeddings
from langchain_community.vectorstores import Chroma

embeddings = SentenceTransformerEmbeddings(model_name="all-MiniLM-L6-v2")
vectorstore = Chroma.from_documents(chunks, embeddings)
retriever = vectorstore.as_retriever(search_kwargs={'k': 5})
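
Conceptually, the retriever embeds the query and returns the `k` stored chunks whose vectors are closest to it. The sketch below illustrates that idea with cosine similarity over toy 3-dimensional vectors (the `all-MiniLM-L6-v2` model produces 384-dimensional ones). It is an illustration only: Chroma uses its own index and distance settings, which may differ from plain cosine similarity, and the names `cosine_similarity`, `top_k`, and `toy_index` are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, indexed, k=5):
    """Return the k (text, score) pairs most similar to query_vec."""
    scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in indexed]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # best match first
    return scored[:k]

# Toy vectors standing in for real chunk embeddings.
toy_index = [
    ("attention layers", [1.0, 0.0, 0.0]),
    ("positional encoding", [0.0, 1.0, 0.0]),
    ("retrieval augmented generation", [0.0, 0.0, 1.0]),
]
results = top_k([0.9, 0.1, 0.0], toy_index, k=2)
print([text for text, _ in results])
```

A larger `k` gives the model more context to draw on but also more irrelevant text, so it is worth tuning alongside the chunk size.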

#3. End-to-End QA Chatbot

###Set Google Gemini API Key

In [36]:
import os
from getpass import getpass
# Prompt for the key at runtime rather than hard-coding a secret in the notebook.
os.environ["GOOGLE_API_KEY"] = getpass("Google API key: ")

###Load Gemini Model and Build RAG Chain

In [37]:
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain.prompts import ChatPromptTemplate

llm = ChatGoogleGenerativeAI(model="gemini-1.5-flash", temperature=0.2)

def prepare_context_with_sources(documents):
  context_blocks = []
  source_citations = set()

  for doc in documents:
    filename = doc.metadata.get("filename", "unknown_file")
    page = doc.metadata.get("page", "N/A")
    # Flatten line breaks so each chunk reads as a single line in the prompt.
    content = doc.page_content.strip().replace("\n", " ")

    context_blocks.append(f"[{filename}, Page {page}]: {content}")
    source_citations.add((filename, page))

  return "\n\n".join(context_blocks), source_citations

template = """
<context>
{context}
</context>

You are an AI assistant answering questions based on academic papers. Answer the following question truthfully and clearly using only the above context. Do not hallucinate or make up information.

Question: {query}
"""

prompt = ChatPromptTemplate.from_template(template)

def rag_with_sources(query):
  docs = retriever.get_relevant_documents(query)
  context, sources = prepare_context_with_sources(docs)

  inputs = {"context": context, "query": query}
  answer = llm.invoke(prompt.format_prompt(**inputs).to_messages())

  formatted_sources = "\n" + "\n".join([f"[{file}, {page}]" for file, page in sources])
  result = f"question: {query}\nanswer: {answer.content.strip()}\n{formatted_sources}"
  return result
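
Because `sources` is a set, the order of the citation lines can change between runs. One way to make the output stable is to sort the citations before formatting them, as in this minimal sketch. The helper `format_sources` is hypothetical and not part of the notebook; it sorts on string forms so that integer pages and the `"N/A"` fallback can be compared, which also means page 10 sorts before page 9.

```python
def format_sources(sources):
    """Return one "[filename, page]" line per unique citation, in a stable order."""
    # Compare string forms so integer pages and the "N/A" fallback can coexist.
    ordered = sorted(sources, key=lambda item: (str(item[0]), str(item[1])))
    return "\n".join(f"[{filename}, {page}]" for filename, page in ordered)

# Example citations shaped like the (filename, page) tuples collected above.
example = {
    ("2005.11401v4.pdf", 9),
    ("1706.03762v7.pdf", 2),
    ("1706.03762v7.pdf", "N/A"),
}
print(format_sources(example))
```

Inside `rag_with_sources`, the list comprehension that builds `formatted_sources` could be swapped for a call to a helper like this one.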

#4. Test the RAG System

###Sample Questions

In [38]:
questions = [
    "What are the main components of a RAG model, and how do they interact?",
    "What are the two sub-layers in each encoder layer of the Transformer model?",
    "Explain how positional encoding is implemented in Transformers and why it is necessary.",
    "Describe the concept of multi-head attention in the Transformer architecture. Why is it beneficial?",
    "What is few-shot learning, and how does GPT-3 implement it during inference?"
]

for q in questions:
    print(rag_with_sources(q))
    print("\n" + "="*100 + "\n")

question: What are the main components of a RAG model, and how do they interact?
answer: The provided text does not describe the components of a RAG model or how they interact.  It only mentions that code to run RAG models has been open-sourced by HuggingFace and provides links to this code and a demo.

[unknown_file, 9]
[2005.11401v4.pdf, 9]
[unknown_file, 1]


question: What are the two sub-layers in each encoder layer of the Transformer model?
answer: The first sub-layer is a multi-head self-attention mechanism, and the second is a simple, position-wise fully connected feed-forward network.

[1706.03762v7.pdf, 2]
[unknown_file, 2]


question: Explain how positional encoding is implemented in Transformers and why it is necessary.
answer: The provided text states that positional encodings are added to the input embeddings at the bottom of the encoder and decoder stacks in Transformers.  These encodings have the same dimension as the embeddings and are summed with them.  The reason for

### Your Questions

In [39]:
# Simple interactive loop: type a question, or 'exit' to stop.
while True:
    user_input = input("Ask a question (or type 'exit'): ").strip()
    if user_input.lower() == "exit":
        print("Exiting Q&A.")
        break
    if not user_input:
        continue  # ignore empty input
    result = rag_with_sources(user_input)
    print("\n" + result + "\n")


Ask a question (or type 'exit'): what is rag

question: what is rag
answer: Based on the provided text, RAG is a technology that produces more factual generations and offers more control and interpretability.  It can be used in various scenarios to benefit society, such as by incorporating a medical index to answer open-domain questions on medical topics.  The text also mentions that HuggingFace helped open-source code to run RAG models.

[unknown_file, 9]
[2005.11401v4.pdf, 9]

Ask a question (or type 'exit'): exit
Exiting Q&A.
