
# NCERT PDF Chatbot

**Team:** Team-6 (Tilak and Ashman)

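**Description:** This chatbot answers questions about an NCERT PDF using a retrieval-augmented generation (RAG) pipeline: BM25 retrieval over the PDF's text chunks, with answers generated by Google Gemini (`gemini-2.0-flash`).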

### Instructions:
- Upload an NCERT PDF.
- Ask questions whose answers are in the PDF's content; those get the most accurate answers.


In [2]:
# Installing required dependencies
!pip install streamlit llama-index google-generativeai
!npm install -g localtunnel



In [3]:
# Upload PDF files from your local machine
from google.colab import files
uploaded = files.upload()


Saving ncert.pdf to ncert.pdf


In [4]:
!mkdir -p data/ncert
!mv *.pdf data/ncert/
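The two shell commands above rely on Colab's `!` syntax. Outside a notebook, roughly the same step can be done in plain Python. This is a minimal sketch using only the standard library; `move_pdfs` is a hypothetical helper name, not part of the original notebook.

```python
import shutil
from pathlib import Path


def move_pdfs(src_dir: str = ".", dest_dir: str = "data/ncert") -> list[Path]:
    """Move every *.pdf directly inside src_dir into dest_dir (like `mkdir -p` + `mv *.pdf`)."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)  # equivalent of `mkdir -p`
    moved = []
    for pdf in sorted(Path(src_dir).glob("*.pdf")):  # top level only, like the shell glob
        target = dest / pdf.name
        shutil.move(str(pdf), str(target))
        moved.append(target)
    return moved


if __name__ == "__main__":
    print(move_pdfs())
```

Like the shell glob on Linux, `glob("*.pdf")` is case-sensitive, so a file named `NCERT.PDF` would not be moved.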


In [1]:
# Installing required dependencies
!pip install llama-index==0.10.68 llama-index-core==0.10.68.post1
!pip install llama-index-readers-file llama-index-embeddings-google-genai
!pip install llama-index-llms-google-genai llama-index-retrievers-bm25
!pip install python-dotenv google-generativeai deepeval



In [2]:
import os
from getpass import getpass

os.environ["GOOGLE_API_KEY"] = getpass("Google API key: ")  # read at runtime; never hardcode keys in a notebook


In [3]:
from llama_index.embeddings.google_genai import GoogleGenAIEmbedding
from llama_index.core import Settings

Settings.embed_model = GoogleGenAIEmbedding(
    model_name="text-embedding-004",
    api_key=os.environ["GOOGLE_API_KEY"]
)


In [4]:
# Reading documents from the uploaded folder
from llama_index.readers.file import PDFReader
from llama_index.core import SimpleDirectoryReader
from llama_index.core.node_parser import SentenceSplitter

file_extractor = {".pdf": PDFReader()}

documents = SimpleDirectoryReader(
    input_dir="data/ncert",
    file_extractor=file_extractor
).load_data()

# Split into chunks of up to 750 tokens; consecutive chunks share about 150 tokens of context
parser = SentenceSplitter(chunk_size=750, chunk_overlap=150)
nodes = parser.get_nodes_from_documents(documents)
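`SentenceSplitter(chunk_size=750, chunk_overlap=150)` produces chunks of up to 750 tokens in which consecutive chunks share about 150 tokens, so text near a chunk boundary appears in both neighbours. The sketch below illustrates only the sliding-window arithmetic, under two simplifying assumptions: whitespace-separated words stand in for tokens, and sentence boundaries are ignored (the real splitter prefers to break between sentences). `sliding_window_chunks` is a hypothetical name.

```python
def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into windows of chunk_size words, each overlapping the previous by chunk_overlap words.

    Simplification: words stand in for tokens, and sentence boundaries are ignored.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need 0 <= chunk_overlap < chunk_size")
    words = text.split()
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reaches the end of the text
    return chunks


# 10 words, windows of 4 with an overlap of 2, so each window advances by 2
demo_text = " ".join(f"w{i}" for i in range(10))
print(sliding_window_chunks(demo_text, chunk_size=4, chunk_overlap=2))
# ['w0 w1 w2 w3', 'w2 w3 w4 w5', 'w4 w5 w6 w7', 'w6 w7 w8 w9']
```

With the notebook's settings the window advances by 750 - 150 = 600 units, so a 1000-unit text yields two chunks, the second covering units 600 to 999.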


In [5]:
# Build a vector index (embeds every node); the BM25 retriever in the next cell reads its nodes from it
from llama_index.core import VectorStoreIndex

index = VectorStoreIndex(nodes)
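`VectorStoreIndex` embeds every node with `Settings.embed_model`; a vector retriever then embeds the question the same way and returns the nodes whose vectors are most similar. A toy sketch of that ranking step, assuming cosine similarity as the scoring function and using hand-made 3-dimensional vectors in place of real embeddings. `cosine`, `top_k` and the toy data are illustrative names, not LlamaIndex APIs.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], node_vecs: dict[str, list[float]], k: int = 2) -> list[tuple[str, float]]:
    """Return the k node ids most similar to the query, highest score first."""
    scored = [(node_id, cosine(query_vec, vec)) for node_id, vec in node_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy "embeddings"; real text-embedding-004 vectors have far more dimensions
toy_vectors = {
    "ohms_law": [0.9, 0.1, 0.0],
    "mobility": [0.7, 0.6, 0.1],
    "superposition": [0.0, 0.2, 0.9],
}
toy_query = [1.0, 0.2, 0.0]
print([name for name, _ in top_k(toy_query, toy_vectors, k=2)])  # ['ohms_law', 'mobility']
```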


In [6]:
from llama_index.retrievers.bm25 import BM25Retriever

bm25_retriever = BM25Retriever.from_defaults(
    index=index,
    similarity_top_k=3
)


DEBUG:bm25s:Building index from IDs objects
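`BM25Retriever` ranks nodes by keyword overlap rather than by embeddings (the DEBUG line shows it is backed by the `bm25s` package). A from-scratch sketch of Okapi BM25 scoring follows; `k1=1.5`, `b=0.75` and the Lucene-style IDF are common defaults assumed here rather than read from `bm25s`, and the tokenizer does no stemming or stopword removal. The sample documents are paraphrased from the answers shown later in this notebook.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and keep runs of letters/digits (no stemming, no stopword removal)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Okapi BM25 score of every document for the query, using Lucene-style IDF."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(toks) for toks in tokenized) / n_docs  # average document length
    # Document frequency: number of documents containing each term
    df = Counter(term for toks in tokenized for term in set(toks))
    query_terms = set(tokenize(query))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)  # term frequency within this document
        score = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


sample_docs = [
    "Ohm's law relates potential difference V and current I: V = RI.",
    "Mobility is the magnitude of the drift velocity per unit electric field.",
    "Superposition of waves: the net displacement is the vector sum of displacements.",
]
scores = bm25_scores("state ohms law", sample_docs)
print(max(range(len(sample_docs)), key=scores.__getitem__))  # 0
```

In the demo, `ohms` does not match the token `ohm` from "Ohm's", so only `law` contributes to the first document's score. That kind of mismatch is one reason BM25 retrievers typically stem words before indexing.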


In [7]:
import google.generativeai as genai
import os

genai.configure(api_key=os.environ["GOOGLE_API_KEY"])

model = genai.GenerativeModel('gemini-2.0-flash')  # optional direct client for quick API checks; not used by the query engine




In [8]:

from llama_index.llms.google_genai import GoogleGenAI
import os

llm = GoogleGenAI(
    model="gemini-2.0-flash",
    api_key=os.environ["GOOGLE_API_KEY"]
)




In [9]:
# Build the query engine: BM25 retrieval + Gemini answer synthesis in "compact" mode
from llama_index.core import get_response_synthesizer
from llama_index.core.query_engine import RetrieverQueryEngine

response_synthesizer = get_response_synthesizer(
    llm=llm,
    response_mode="compact"
)

query_engine = RetrieverQueryEngine(
    retriever=bm25_retriever,
    response_synthesizer=response_synthesizer
)
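With `response_mode="compact"`, the response synthesizer concatenates as many retrieved chunks as fit in one prompt before calling the LLM, then refines the answer across packs only if more than one is needed. That means fewer LLM calls than answering chunk by chunk. A simplified sketch of the packing step, measuring the budget in characters instead of tokens; `pack_chunks` and `budget_chars` are hypothetical names, and the real synthesizer also splits chunks that are too large on their own.

```python
def pack_chunks(chunks: list[str], budget_chars: int, sep: str = "\n\n") -> list[str]:
    """Greedily merge consecutive chunks so each packed context stays within budget_chars.

    A chunk longer than the budget by itself is kept as its own (oversized) pack.
    """
    packs: list[str] = []
    current = ""
    for chunk in chunks:
        candidate = chunk if not current else current + sep + chunk
        if len(candidate) <= budget_chars or not current:
            current = candidate  # still fits (or nothing to flush yet)
        else:
            packs.append(current)  # flush the full pack and start a new one
            current = chunk
    if current:
        packs.append(current)
    return packs


# Three 4-character chunks with a 10-character budget: the first two share a pack
print(len(pack_chunks(["aaaa", "bbbb", "cccc"], budget_chars=10)))  # 2
```

Each pack corresponds to one LLM call, so in this example compact mode would make two calls where answering chunk by chunk would make three.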


In [10]:
# Query the document using natural language
response = query_engine.query("state ohms law")
print(response)


Ohm's law states that the potential difference (V) between the ends of a conductor is proportional to the current (I) flowing through it, which can be written as V = RI, where R is the resistance of the conductor. A conducting material obeys Ohm’s law when the resistivity of the material does not depend on the magnitude and direction of the applied electric field. However, there are materials and devices where the proportionality of V and I does not hold.



### Question-Answer Section

In [11]:
# Chatbot question-answer test: run a few sample questions through the query engine


questions = [
    "What is displacement in a wave?",
    "What is mobility?",
    "Explain superposition of waves.",
]

print(" Chatbot Responses:\n")

for idx, q in enumerate(questions, 1):
    print(f"Q{idx}: {q}")
    try:
        response = query_engine.query(q)
        print(f"A{idx}: {response}\n")
    except Exception as e:
        print(f"⚠️ Error: {e}\n")


 Chatbot Responses:

Q1: What is displacement in a wave?
A1: Displacement in a wave is the vector sum of the displacements produced by each of the waves at a particular point in the medium. It can be represented by y(x,t) = a sin (kx – wt), where 'a' is the amplitude, and w is the angular frequency of the wave. For a y-polarized wave, the displacement is along the y direction.


Q2: What is mobility?
A2: Mobility is the magnitude of the drift velocity per unit electric field.


Q3: Explain superposition of waves.
A3: The principle of superposition applies when two or more light sources illuminate the same point. When considering the intensity of light due to these sources at a given point, there is an interference term in addition to the sum of the individual intensities. This term is significant only if it has a non-zero average, which occurs if the sources have the same frequency and a stable phase difference.




In [13]:
# Chat UI: wrap the existing `query_engine` in a Gradio chat interface
import gradio as gr

# Log of (question, answer) pairs from this session
chat_history = []

def chatbot_interface(message, history):
    try:
        if 'query_engine' not in globals():
            return "⚠️ System isn't ready. Please upload and process a PDF first."

        response = query_engine.query(message)
        chat_history.append((message, str(response)))
        return str(response)

    except Exception as e:
        return (
            "⚠️ Something went wrong while processing your query. "
            "Please make sure the PDF is uploaded correctly and try again."
        )

# Description shown at the top of the chat UI
greeting_msg = """📄 You've uploaded a file.

You can now ask factual questions based on the document’s content.
This tool retrieves relevant sections from your PDF and works best for direct lookups.

💡 Try asking:
- What does [term] mean in this document?
- What is mentioned about [topic]?
- Find the paragraph that talks about [concept]
"""

# Launch Chat UI
gr.ChatInterface(
    fn=chatbot_interface,
    title="📘 Ask Your PDF",
    description=greeting_msg,
    examples=[
        "state ohms law",
        "Define oxidation as per pdf",
        "Explain superposition of waves."
    ]
).launch()




It looks like you are running Gradio on a hosted a Jupyter notebook. For the Gradio app to work, sharing must be enabled. Automatically setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://3f473dc9abd8a86b32.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)
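`chatbot_interface` finds `query_engine` through `globals()`, which makes its logic hard to test without a live Gemini key. A sketch of an alternative that passes the engine in explicitly, so the same logic can be exercised offline with a stand-in engine. `make_chat_handler`, `QueryEngineLike` and `FakeEngine` are hypothetical names; wiring it up would look like `gr.ChatInterface(fn=make_chat_handler(query_engine), ...)`, assuming the same `(message, history)` signature used above.

```python
from typing import Callable, Optional, Protocol


class QueryEngineLike(Protocol):
    """Anything with a .query(str) method, such as the notebook's RetrieverQueryEngine."""

    def query(self, question: str) -> object: ...


def make_chat_handler(engine: Optional[QueryEngineLike]) -> Callable[[str, list], str]:
    """Return a (message, history) -> reply function, the shape gr.ChatInterface calls."""

    def handler(message: str, history: list) -> str:
        # history is supplied by Gradio but not needed for single-turn lookups
        if engine is None:
            return "⚠️ System isn't ready. Please upload and process a PDF first."
        try:
            return str(engine.query(message))
        except Exception:
            return "⚠️ Something went wrong while processing your query. Please try again."

    return handler


class FakeEngine:
    """Offline stand-in that echoes the question instead of calling Gemini."""

    def query(self, question: str) -> str:
        return f"echo: {question}"


chat = make_chat_handler(FakeEngine())
print(chat("state ohms law", []))  # echo: state ohms law
```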


