# Gemini RAG Knowledge Engine
### A Full-Stack Retrieval-Augmented Generation (RAG) Application

**Author:** Karthik K
**Tech Stack:** Google Gemini 2.5 Flash, LangChain, ChromaDB, Sentence-Transformers

**Project Description:**
This notebook builds an end-to-end RAG pipeline. It ingests custom PDF/TXT documents, splits them into chunks, embeds the chunks into a Chroma vector database, and uses the Gemini 2.5 Flash model to answer user queries grounded in that data. The final output is a deployed Streamlit web application.

## **Environment Setup**
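Installing the libraries the RAG pipeline needs:
* `langchain` / `langchain-community`: Orchestration framework, plus the community document loaders, embedding wrapper, and Chroma wrapper.
* `langchain-google-genai`: LangChain integration for Gemini (`ChatGoogleGenerativeAI`).
* `chromadb`: Vector database for storing document embeddings.
* `sentence-transformers`: Library that runs the open-source embedding model (`all-MiniLM-L6-v2`).
* `pypdf`: PDF parsing used by `PyPDFLoader`.
* `google-generativeai`: Google's Python SDK for the Gemini API.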
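The install cells below run several overlapping `pip install` commands, so it can help to confirm what actually ended up installed before importing anything. A minimal sketch using the standard library's `importlib.metadata`; the package list mirrors this notebook's installs, and `report_versions` is a hypothetical helper name, not part of any library.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names installed by the pip cells in this notebook (assumed list)
REQUIRED = [
    "langchain",
    "langchain-community",
    "langchain-google-genai",
    "chromadb",
    "sentence-transformers",
    "pypdf",
]


def report_versions(packages=REQUIRED):
    """Return {package: installed version string, or None if it is missing}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in report_versions().items():
        print(f"{name:25s} {ver or 'NOT INSTALLED'}")
```

Run after the installs, any `NOT INSTALLED` row points to the cell that needs re-running before the imports further down will succeed.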

In [1]:
!pip install chromadb sentence-transformers

Collecting chromadb
  Downloading chromadb-1.3.5-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (7.2 kB)
Collecting build>=1.0.3 (from chromadb)
  Downloading build-1.3.0-py3-none-any.whl.metadata (5.6 kB)
Collecting pybase64>=1.4.1 (from chromadb)
  Downloading pybase64-1.4.2-cp312-cp312-manylinux1_x86_64.manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_5_x86_64.whl.metadata (8.7 kB)
Collecting posthog<6.0.0,>=2.4.0 (from chromadb)
  Downloading posthog-5.4.0-py3-none-any.whl.metadata (5.7 kB)
Collecting onnxruntime>=1.14.1 (from chromadb)
  Downloading onnxruntime-1.23.2-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.metadata (5.1 kB)
Collecting opentelemetry-exporter-otlp-proto-grpc>=1.2.0 (from chromadb)
  Downloading opentelemetry_exporter_otlp_proto_grpc-1.38.0-py3-none-any.whl.metadata (2.4 kB)
Collecting pypika>=0.48.9 (from chromadb)
  Downloading PyPika-0.48.9.tar.gz (67 kB)

In [2]:
!pip install -U langchain-google-genai google-generativeai

Collecting langchain-google-genai
  Downloading langchain_google_genai-3.1.0-py3-none-any.whl.metadata (2.7 kB)
Collecting filetype<2.0.0,>=1.2.0 (from langchain-google-genai)
  Downloading filetype-1.2.0-py2.py3-none-any.whl.metadata (6.5 kB)
Collecting google-ai-generativelanguage<1.0.0,>=0.9.0 (from langchain-google-genai)
  Downloading google_ai_generativelanguage-0.9.0-py3-none-any.whl.metadata (10 kB)
Collecting langchain-core<2.0.0,>=1.0.5 (from langchain-google-genai)
  Downloading langchain_core-1.1.0-py3-none-any.whl.metadata (3.6 kB)
INFO: pip is looking at multiple versions of google-generativeai to determine which version is compatible with other requirements. This could take a while.
Collecting google-generativeai
  Downloading google_generativeai-0.8.5-py3-none-any.whl.metadata (3.9 kB)
  Downloading google_generativeai-0.8.4-py3-none-any.whl.metadata (4.2 kB)
  Downloading google_generativeai-0.8.3-py3-none-any.whl.metadata (3.9 kB)
  Downloading google_generativeai-0.8

In [3]:
!pip install google-generativeai



In [4]:
!pip install google-genai



In [5]:
!pip install -U langchain-google-genai

Collecting langchain-google-genai
  Using cached langchain_google_genai-3.1.0-py3-none-any.whl.metadata (2.7 kB)
Collecting google-ai-generativelanguage<1.0.0,>=0.9.0 (from langchain-google-genai)
  Using cached google_ai_generativelanguage-0.9.0-py3-none-any.whl.metadata (10 kB)
Collecting langchain-core<2.0.0,>=1.0.5 (from langchain-google-genai)
  Using cached langchain_core-1.1.0-py3-none-any.whl.metadata (3.6 kB)
Downloading langchain_google_genai-3.1.0-py3-none-any.whl (55 kB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m55.6/55.6 kB[0m [31m3.8 MB/s[0m eta [36m0:00:00[0m
[?25hDownloading google_ai_generativelanguage-0.9.0-py3-none-any.whl (1.4 MB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m1.4/1.4 MB[0m [31m29.6 MB/s[0m eta [36m0:00:00[0m
[?25hDownloading langchain_core-1.1.0-py3-none-any.whl (473 kB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m473.8/473.8 kB[0m [31m30.2 MB/s[0m eta [36m0:00:00[0m
[?25hInstal

## **LangChain Setup**

In [6]:
!pip install -U langchain

Collecting langchain
  Downloading langchain-1.0.8-py3-none-any.whl.metadata (4.9 kB)
Collecting langgraph<1.1.0,>=1.0.2 (from langchain)
  Downloading langgraph-1.0.3-py3-none-any.whl.metadata (7.8 kB)
Collecting langgraph-checkpoint<4.0.0,>=2.1.0 (from langgraph<1.1.0,>=1.0.2->langchain)
  Downloading langgraph_checkpoint-3.0.1-py3-none-any.whl.metadata (4.7 kB)
Collecting langgraph-prebuilt<1.1.0,>=1.0.2 (from langgraph<1.1.0,>=1.0.2->langchain)
  Downloading langgraph_prebuilt-1.0.5-py3-none-any.whl.metadata (5.2 kB)
Collecting langgraph-sdk<0.3.0,>=0.2.2 (from langgraph<1.1.0,>=1.0.2->langchain)
  Downloading langgraph_sdk-0.2.9-py3-none-any.whl.metadata (1.5 kB)
Collecting ormsgpack>=1.12.0 (from langgraph-checkpoint<4.0.0,>=2.1.0->langgraph<1.1.0,>=1.0.2->langchain)
  Downloading ormsgpack-1.12.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (1.2 kB)
Downloading langchain-1.0.8-py3-none-any.whl (93 kB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0

In [7]:
!pip install -U langchain langchain-google-genai



In [8]:
!pip install langchain_community

Collecting langchain_community
  Downloading langchain_community-0.4.1-py3-none-any.whl.metadata (3.0 kB)
Collecting langchain-classic<2.0.0,>=1.0.0 (from langchain_community)
  Downloading langchain_classic-1.0.0-py3-none-any.whl.metadata (3.9 kB)
Collecting requests<3.0.0,>=2.32.5 (from langchain_community)
  Downloading requests-2.32.5-py3-none-any.whl.metadata (4.9 kB)
Collecting dataclasses-json<0.7.0,>=0.6.7 (from langchain_community)
  Downloading dataclasses_json-0.6.7-py3-none-any.whl.metadata (25 kB)
Collecting marshmallow<4.0.0,>=3.18.0 (from dataclasses-json<0.7.0,>=0.6.7->langchain_community)
  Downloading marshmallow-3.26.1-py3-none-any.whl.metadata (7.3 kB)
Collecting typing-inspect<1,>=0.4.0 (from dataclasses-json<0.7.0,>=0.6.7->langchain_community)
  Downloading typing_inspect-0.9.0-py3-none-any.whl.metadata (1.5 kB)
Collecting langchain-text-splitters<2.0.0,>=1.0.0 (from langchain-classic<2.0.0,>=1.0.0->langchain_community)
  Downloading langchain_text_splitters-1.0.0

In [9]:
!pip install pypdf

Collecting pypdf
  Downloading pypdf-6.4.0-py3-none-any.whl.metadata (7.1 kB)
Downloading pypdf-6.4.0-py3-none-any.whl (329 kB)
Installing collected packages: pypdf
Successfully installed pypdf-6.4.0


## **Necessary Imports**

In [10]:
# Chains
from langchain_classic.chains import RetrievalQA
from langchain_classic.chains import ConversationalRetrievalChain
from langchain_classic.memory.buffer import ConversationBufferMemory

In [11]:
from langchain_community.document_loaders import DirectoryLoader, TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
# Both community wrappers below are deprecated; the maintained versions live in
# the langchain-huggingface and langchain-chroma packages
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import Chroma

import os
from google.colab import userdata

from langchain_google_genai import ChatGoogleGenerativeAI

## **The Main Application Logic**
The following cells contain the core logic of the application. They handle:
1.  **Authentication:** Loading API keys securely from Colab Secrets.
2.  **Ingestion:** Loading `.txt` documents from the `data` directory, plus PDF/text files uploaded at runtime.
3.  **Indexing:** Splitting text into chunks, embedding them, and storing the vectors in ChromaDB.
4.  **Retrieval Chain:** Connecting the Gemini LLM to the vector store.
5.  **Testing:** Running sample queries, including a follow-up question, to verify the pipeline works.
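Step 4 is where retrieval meets generation. `ConversationalRetrievalChain` defaults to the "stuff" strategy: the retrieved chunks are concatenated into the prompt as context for the LLM. The sketch below shows that idea in plain Python; `build_stuff_prompt`, its character budget, and its template wording are hypothetical and are not LangChain's actual prompt.

```python
def build_stuff_prompt(question: str, chunks: list[str], max_chars: int = 4000) -> str:
    """Concatenate retrieved chunks into one context block, then append the question.

    Chunks are assumed to arrive in relevance order; chunks are added until the
    next one would push the total past max_chars (separators are not counted).
    """
    context_parts = []
    used = 0
    for chunk in chunks:
        if used + len(chunk) > max_chars:
            break
        context_parts.append(chunk)
        used += len(chunk)
    context = "\n\n".join(context_parts)
    return (
        "Answer the question using only the context below. "
        "If the answer is not in the context, say you don't know.\n\n"
        f"{context}\n\nQuestion: {question}\nAnswer:"
    )
```

The budget matters because every stuffed chunk costs input tokens; with `k=5` and 1,000-character chunks, as used later in this notebook, the context stays around 5,000 characters per question.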

In [12]:
from google.colab import userdata
from langchain_google_genai import ChatGoogleGenerativeAI

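# Read the key stored under "GEMINI_API_KEY" in Colab Secrets
GOOGLE_API_KEY = userdata.get('GEMINI_API_KEY')

llm = ChatGoogleGenerativeAI(
    model="gemini-2.5-flash",
    temperature=0,      # minimize sampling randomness for factual Q&A
    max_tokens=None,    # no explicit cap; the model's default limit applies
    timeout=None,
    max_retries=2,      # retry failed API calls up to twice
    api_key=GOOGLE_API_KEY
)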
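`userdata.get` only works inside Colab. If the notebook is also run locally, a fallback to an environment variable keeps the authentication step usable in both places. A sketch under that assumption: `get_api_key` is a hypothetical helper, and the secret name `GEMINI_API_KEY` is taken from the cell above.

```python
import os


def get_api_key(name: str = "GEMINI_API_KEY") -> str:
    """Return the key from Colab Secrets if available, else from the environment."""
    try:
        from google.colab import userdata  # importable only inside Colab
        value = userdata.get(name)
    except Exception:
        # Not running in Colab, or the secret is missing / access not granted
        value = None
    value = value or os.environ.get(name)
    if not value:
        raise RuntimeError(
            f"Set the {name} secret in Colab or export it as an environment variable."
        )
    return value
```

Failing loudly here is deliberate: a `None` key would otherwise only surface later as an authentication error from the first `llm.invoke` call.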

In [13]:
# Smoke test: confirm the API key and model respond before building the pipeline
messages = [
    (
        "system",
        "You are a helpful assistant that translates English to French. Translate the user sentence.",
    ),
    ("human", "I love programming."),
]

ai_msg = llm.invoke(messages)

ai_msg

AIMessage(content="J'adore la programmation.", additional_kwargs={}, response_metadata={'prompt_feedback': {'block_reason': 0, 'safety_ratings': []}, 'finish_reason': 'STOP', 'model_name': 'gemini-2.5-flash', 'safety_ratings': [], 'model_provider': 'google_genai'}, id='lc_run--0331a14b-0ab9-4a50-919d-cb0a0a63ba17-0', usage_metadata={'input_tokens': 21, 'output_tokens': 7, 'total_tokens': 28, 'input_token_details': {'cache_read': 0}})

In [14]:
from google.colab import drive
import os

# Mount Google Drive
drive.mount('/content/drive')

# Move to project folder
%cd /content/drive/My Drive/RAG-Chatbot-Project/

# Verification
print("Current folder:", os.getcwd())
print("Files in here:", os.listdir())

Mounted at /content/drive
/content/drive/My Drive/RAG-Chatbot-Project
Current folder: /content/drive/My Drive/RAG-Chatbot-Project
Files in here: ['.git', 'README.md', 'data', 'chroma_db', '.ipynb_checkpoints', '1706.03762v7.pdf', 'Gemini-RAG-Knowledge-Engine.ipynb']


In [15]:
DATA_PATH = './data'

# Load documents
loader = DirectoryLoader(DATA_PATH, glob="*.txt", loader_cls=TextLoader)
documents = loader.load()

print(f"Loaded {len(documents)} document(s).")

Loaded 2 document(s).


In [16]:
# Split documents into chunks
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = text_splitter.split_documents(documents)

print(f"Split into {len(chunks)} chunks.")

Split into 2 chunks.
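`chunk_size` and `chunk_overlap` define a sliding window: each chunk holds at most `chunk_size` characters, and consecutive chunks share `chunk_overlap` characters, so text near a boundary appears in both neighboring chunks. `RecursiveCharacterTextSplitter` is smarter than a fixed window (it prefers to break on paragraph, line, and word boundaries first), so the sketch below only illustrates the size/overlap arithmetic; `split_with_overlap` is a hypothetical helper.

```python
def split_with_overlap(text: str, chunk_size: int = 500, chunk_overlap: int = 50) -> list[str]:
    """Fixed-window splitter: windows of chunk_size chars, stepping by chunk_size - chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks
```

With this notebook's settings (500/50) the window advances 450 characters at a time, so a 1,200-character text yields three chunks starting at offsets 0, 450, and 900, with lengths 500, 500, and 300.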


In [17]:
# Initialize the embedding model
model_name = "sentence-transformers/all-MiniLM-L6-v2"
embedding_model = HuggingFaceEmbeddings(model_name=model_name)

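# Vectors are persisted here. ./chroma_db already exists in the project folder,
# so re-running this cell adds the same chunks again under new random IDs.
persist_directory = './chroma_db'

# Create (or extend) the vector database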
vectorstore = Chroma.from_documents(
    chunks,
    embedding_model,
    persist_directory=persist_directory
)

print("Success: Vector store created.")

Success: Vector store created.
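Conceptually, the retriever built later with `search_kwargs={"k": 5}` embeds the query with the same model and returns the `k` stored chunks whose vectors are closest to it. The sketch below ranks by cosine similarity with an exhaustive scan in plain Python. It is an illustration only: Chroma uses an approximate nearest-neighbor index with a configurable distance metric, and `cosine` / `top_k` are hypothetical names.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], doc_vecs: list[list[float]], k: int = 5) -> list[int]:
    """Indices of the k document vectors most similar to the query, best first."""
    scored = [(cosine(query_vec, vec), i) for i, vec in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [i for _, i in scored[:k]]
```

For example, with stored vectors `[1, 0]`, `[0, 1]`, and `[0.9, 0.1]`, the query `[1, 0]` ranks them as indices `[0, 2, 1]`.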


In [25]:
import os
from google.colab import files
from langchain_community.document_loaders import PyPDFLoader, TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_classic.chains import ConversationalRetrievalChain
from langchain_classic.memory import ConversationBufferMemory

# 1. Upload a file
print("Please upload a PDF or Text file:")
uploaded = files.upload()

# 2. Process the file
if uploaded:
    for filename in uploaded.keys():
        print(f"\nProcessing {filename}...")

        # files.upload() already saved the file to the current folder (the Drive
        # project folder); writing the bytes again just makes the path explicit
        file_path = os.path.join(".", filename)
        with open(file_path, "wb") as f:
            f.write(uploaded[filename])

        # Select loader (case-insensitive, so ".PDF" also goes to PyPDFLoader)
        if filename.lower().endswith(".pdf"):
            loader = PyPDFLoader(file_path)
        else:
            loader = TextLoader(file_path, encoding="utf-8")

        new_docs = loader.load()
        print(f"Loaded {len(new_docs)} pages/documents.")

        # Split text
        text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
        new_chunks = text_splitter.split_documents(new_docs)
        print(f"Split into {len(new_chunks)} chunks.")

        # 3. Add to Database
        vectorstore.add_documents(new_chunks)
        print(f"✅ Successfully added {filename} to the database!")
else:
    print("No file uploaded.")

# ---------------------------------------------------------
# 4. REFRESH THE BRAIN (runs with or without an upload, so
#    qa_chain always exists for the question cell below)
# ---------------------------------------------------------
print("🔄 Refreshing Chatbot Brain...")

# Define Memory
memory = ConversationBufferMemory(
    memory_key="chat_history",
    return_messages=True,
    output_key='answer'  # the chain returns several outputs; save only 'answer'
)

# Build the Conversational Chain (top 5 chunks per question)
qa_chain = ConversationalRetrievalChain.from_llm(
    llm=llm,
    retriever=vectorstore.as_retriever(search_kwargs={"k": 5}),
    memory=memory,
    return_source_documents=True,
    verbose=False
)

print("🚀 Chatbot is updated and ready for questions!")

Please upload a PDF or Text file:


Saving 1706.03762v7.pdf to 1706.03762v7.pdf

Processing 1706.03762v7.pdf...
Loaded 15 pages/documents.
Split into 49 chunks.
✅ Successfully added 1706.03762v7.pdf to the database!
🔄 Refreshing Chatbot Brain...
🚀 Chatbot is updated and ready for questions!
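Every run of the indexing cell, and every upload, adds chunks under fresh random IDs, so re-running the notebook against the persisted `./chroma_db` folder stores the same text more than once and the retriever can return duplicate hits. One common remedy, sketched here as an assumption rather than what this project does, is to derive a deterministic ID from each chunk's source and text and pass the list as `ids=` to `vectorstore.add_documents(...)`; LangChain's Chroma wrapper upserts by ID, so a repeated ID overwrites instead of duplicating. `chunk_id` and `dedupe_chunks` are hypothetical helpers operating on `(text, source)` pairs.

```python
import hashlib


def chunk_id(text: str, source: str) -> str:
    """Deterministic ID: the same source file and chunk text give the same ID on every run."""
    digest = hashlib.sha256(f"{source}\x00{text}".encode("utf-8")).hexdigest()
    return digest[:32]


def dedupe_chunks(pairs: list[tuple[str, str]]) -> tuple[list[str], list[int]]:
    """Return (ids, kept_indices) for the (text, source) pairs, dropping exact repeats.

    Chroma rejects duplicate IDs inside a single batch, so repeats are removed
    before the IDs are handed to add_documents.
    """
    seen = set()
    ids, kept = [], []
    for i, (text, source) in enumerate(pairs):
        cid = chunk_id(text, source)
        if cid in seen:
            continue
        seen.add(cid)
        ids.append(cid)
        kept.append(i)
    return ids, kept
```

In the upload cell, the pairs would come from each chunk's `page_content` and `metadata["source"]`; only the chunks at `kept_indices` are then added, together with `ids`. Identical text on two pages of the same file collapses to one entry, which is usually harmless for retrieval.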


In [27]:
# Question 1: Initial Context
q1 = "What is the Transformer?"
print(f"👤 User: {q1}")
result1 = qa_chain.invoke({"question": q1})
print(f"🤖 Bot: {result1['answer']}\n")

# Question 2: Follow-up (Using "It")
# The bot must know that "It" refers to the Transformer from Q1
q2 = "Does it use recurrent layers?"
print(f"👤 User: {q2}")
result2 = qa_chain.invoke({"question": q2})
print(f"🤖 Bot: {result2['answer']}")

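# --- Cite Sources (The Professional Touch) ---
print("\n--- 📄 Citations ---")
seen = set()
for doc in result2['source_documents']:
    # Get source name and page number if available
    source_name = doc.metadata.get('source', 'Unknown file')
    page = doc.metadata.get('page')  # PyPDFLoader stores 0-based page indices
    page_label = f"Page {page + 1}" if isinstance(page, int) else "Unknown page"
    # Several retrieved chunks can come from the same page; list each page once
    if (source_name, page_label) in seen:
        continue
    seen.add((source_name, page_label))
    print(f"- Found in: {source_name} ({page_label})")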

👤 User: What is the Transformer?
🤖 Bot: The Transformer is a model architecture that uses stacked self-attention and point-wise, fully connected layers for both its encoder and decoder.

Specifically, the encoder is composed of a stack of N=6 identical layers. Each of these layers has two sub-layers:
1.  A multi-head self-attention mechanism.
2.  A simple, position-wise fully connected feed-forward network.

Around each of these two sub-layers, the Transformer employs a residual connection followed by layer normalization. The output of each sub-layer is calculated as LayerNorm(x + Sublayer(x)). To facilitate these residual connections, all sub-layers in the model, as well as the embedding layers, produce outputs of dimension dmodel = 512.

👤 User: Does it use recurrent layers?
🤖 Bot: No, the provided text states that the Transformer uses "stacked self-attention and point-wise, fully connected layers" for both the encoder and decoder. It describes the encoder's sub-layers as a "multi-he
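The follow-up works because `ConversationalRetrievalChain`, once `chat_history` is non-empty, first asks the LLM to rewrite the new question as a standalone one and then retrieves with the rewritten question; that is how "it" in the second question resolves to "the Transformer". A rough sketch of how such a condense prompt might be assembled; `build_condense_prompt`, the role labels, and the wording are hypothetical, not LangChain's template.

```python
ROLE_LABELS = {"human": "Human", "ai": "Assistant"}


def build_condense_prompt(chat_history: list[tuple[str, str]], question: str) -> str:
    """chat_history: ordered (role, text) pairs such as ("human", ...), ("ai", ...)."""
    if not chat_history:
        # First turn: nothing to resolve, so the question is used as-is
        return question
    history = "\n".join(
        f"{ROLE_LABELS.get(role, role)}: {text}" for role, text in chat_history
    )
    return (
        "Given the conversation below and a follow-up question, rephrase the "
        "follow-up as a standalone question.\n\n"
        f"Chat history:\n{history}\n\n"
        f"Follow-up question: {question}\n"
        "Standalone question:"
    )
```

Returning the bare question on the first turn mirrors the chain's own behavior of skipping the rewrite step when there is no history, which saves one LLM call per new conversation.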

In [29]:
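from google.colab import userdata

GITHUB_TOKEN = userdata.get('GITHUB_TOKEN')

# A commit identity is required: without it `git commit` fails (see the output
# below) and the push finds nothing new to send. Placeholder values; use your own.
!git config user.email "you@example.com"
!git config user.name "Your Name"

!git add .

!git commit -m "Completed the RAG pipeline and done with testing !"

!git push https://{GITHUB_TOKEN}@github.com/karthik-k11/RAG-Chatbot-Project.git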

Author identity unknown

*** Please tell me who you are.

Run

  git config --global user.email "you@example.com"
  git config --global user.name "Your Name"

to set your account's default identity.
Omit --global to set the identity only in this repository.

fatal: unable to auto-detect email address (got 'root@2d324b33e286.(none)')
Everything up-to-date
