<a href="https://colab.research.google.com/github/mansoorshakeel/ML-DL-Projects/blob/main/RAG%20BASED%20Q%26A%20CHATBOT.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**SIMPLE DOCUMENT Q&A CHATBOT USING RAG**

INSTALL AND IMPORT THE NECESSARY LIBRARIES

In [None]:
!pip install -q langchain langchain-community langchain-google-genai
!pip install -q chromadb
!pip install -q pypdf
!pip install -q sentence-transformers

In [None]:
import os
from google.colab import files  # For uploading files in Colab
import google.generativeai as genai

# LangChain components - these help us build the RAG system
from langchain_community.document_loaders import PyPDFLoader, TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain.chains import RetrievalQA



print("✅ Libraries imported!")

✅ Libraries imported!


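We're using Google's Gemini API (it has a free tier). Keep the API key out of the notebook itself so it isn't leaked when the notebook is shared.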

In [None]:
from getpass import getpass

# Read the key from the environment, or prompt for it; never hardcode it here
API_KEY = os.environ.get("GOOGLE_API_KEY") or getpass("Enter your Google API key: ")

# Check if API key is set
if not API_KEY:
    print("❌ ERROR: Please set your API key first!")
else:
    # Store the key under the GOOGLE_API_KEY variable name, not under the key's own value
    os.environ["GOOGLE_API_KEY"] = API_KEY
    genai.configure(api_key=API_KEY)
    print("✅ API Key configured correctly!")

✅ API Key configured correctly!


Upload your documents

In [None]:
print("📁 Click 'Choose Files' to upload your document...")
uploaded = files.upload()

# Check if any files were uploaded
if uploaded:
    # Get the filename
    filename = list(uploaded.keys())[0]
    print(f"✅ Uploaded: {filename}")
else:
    print("❌ No file was uploaded.")
    filename = None # Set filename to None if no file was uploaded

📁 Click 'Choose Files' to upload your document...


Saving MansoorShakeel.Resume.pdf to MansoorShakeel.Resume (3).pdf
✅ Uploaded: MansoorShakeel.Resume (3).pdf


Checking if it's a PDF or text file

In [None]:
loader = None  # Stays None unless a supported file type was uploaded

if filename is None:
    print("Please upload a file in the previous step.")
elif filename.lower().endswith('.pdf'):   # Case-insensitive, so "CV.PDF" also works
    loader = PyPDFLoader(filename)
    print("📄 Loading PDF...")
elif filename.lower().endswith('.txt'):
    loader = TextLoader(filename)
    print("📄 Loading text file...")
else:
    print("❌ Please upload a PDF or TXT file")

📄 Loading PDF...


In [None]:
documents = loader.load()
print(f"✅ Loaded {len(documents)} page(s)")

✅ Loaded 1 page(s)


SPLIT DOCUMENT INTO CHUNKS

In [None]:
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,        # Maximum number of characters per chunk
    chunk_overlap=50,      # Characters shared between consecutive chunks
)

Split the document

In [None]:
chunks = text_splitter.split_documents(documents)
print(f"✅ Split into {len(chunks)} chunks")

# Let's see what a chunk looks like
print("\n📝 Example chunk:")
print(chunks[0].page_content[:200] + "...")


✅ Split into 8 chunks

📝 Example chunk:
Mansoor Shakeel 
Islamabad | +92 (316) 1522086 | mansoorshakeel0@gmail.com | LinkedIn-Mansoor Shakeel 
 
 
 
Work Experience 
 IT Support Internee at Graana- Islamabad                                 ...
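
To make the two splitter parameters concrete, here is a simplified, pure-Python sketch of fixed-size chunking with overlap. It is not how `RecursiveCharacterTextSplitter` works internally (that class first tries to break on separators such as paragraphs and line breaks); `sliding_window_chunks` is a hypothetical helper used only for illustration.

```python
def sliding_window_chunks(text, chunk_size=500, chunk_overlap=50):
    """Split text into fixed-size character windows that overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")

    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
pieces = sliding_window_chunks(sample, chunk_size=10, chunk_overlap=3)
print(pieces)
```

Each chunk starts with the last 3 characters of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk.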


CREATING EMBEDDINGS

In [None]:
embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)

print("✅ Embedding model loaded!")


✅ Embedding model loaded!
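
The embedding model turns each chunk into a vector of numbers, and chunks with similar meaning end up with vectors that point in similar directions. A common way to compare two such vectors is cosine similarity. The sketch below uses made-up 3-dimensional vectors purely for illustration; real embeddings from this model have far more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


# Made-up toy "embeddings" (assumption: values chosen only for the demo)
query_vec = [1.0, 0.0, 1.0]
similar_vec = [0.9, 0.1, 0.8]
unrelated_vec = [0.0, 1.0, 0.0]

print(cosine_similarity(query_vec, similar_vec))    # close to 1
print(cosine_similarity(query_vec, unrelated_vec))  # 0.0
```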


CREATE VECTOR STORE (DATABASE)

In [None]:
vectorstore = Chroma.from_documents(
    documents=chunks,           # Our document chunks
    embedding=embeddings,       # The embedding model
    persist_directory="./db"    # Where to save the database
)

print("✅ Vector store created!")
print(f"📊 Stored {len(chunks)} chunks in the database")

✅ Vector store created!
📊 Stored 8 chunks in the database
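
Conceptually, the vector store keeps one vector per chunk and returns the chunks whose vectors are closest to the query vector. The toy class below sketches that idea with a plain linear scan; `ToyVectorStore` and its vectors are hypothetical, and a real store such as Chroma is expected to use an index instead of scanning every entry.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyVectorStore:
    """In-memory stand-in for a vector database (illustration only)."""

    def __init__(self):
        self._items = []  # list of (text, vector) pairs

    def add(self, text, vector):
        self._items.append((text, vector))

    def search(self, query_vector, k=3):
        """Return the texts of the k entries most similar to the query."""
        scored = [
            (cosine_similarity(query_vector, vector), text)
            for text, vector in self._items
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [text for _, text in scored[:k]]


store = ToyVectorStore()
store.add("Work experience", [0.9, 0.1, 0.0])
store.add("Education", [0.1, 0.9, 0.0])
store.add("Skills", [0.7, 0.2, 0.1])
store.add("Contact details", [0.0, 0.1, 0.9])

top = store.search([1.0, 0.0, 0.0], k=2)
print(top)
```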


SET UP THE AI MODEL

In [None]:
llm = ChatGoogleGenerativeAI(
    model="gemini-1.5-flash",  # Gemini model name
    temperature=0.2,           # Low temperature for more focused, consistent answers
    google_api_key=API_KEY,
    convert_system_message_to_human=True
)

print("✅ AI model initialized!")

✅ AI model initialized!


CREATING THE QA SYSTEM

In [None]:
retriever = vectorstore.as_retriever(
    search_kwargs={"k": 3}  # Return top 3 most relevant chunks
)

# Create the Question-Answering chain
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,                              # The AI model
    chain_type="stuff",                   # How to combine chunks
    retriever=retriever,                  # How to search
    return_source_documents=True          # Show which chunks were used
)

print("✅ QA System ready!")

✅ QA System ready!
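
With `chain_type="stuff"`, the retrieved chunks are all placed ("stuffed") into a single prompt together with the question. The sketch below shows the general shape of such a prompt; `build_stuffed_prompt` and the wording of the template are assumptions for illustration, not the prompt LangChain actually sends.

```python
def build_stuffed_prompt(question, chunks):
    """Join all retrieved chunks into one context block and append the question."""
    context = "\n\n".join(chunks)
    return (
        "Use the following context to answer the question. "
        "If the answer is not in the context, say you don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Placeholder chunks standing in for the top 3 retriever results
retrieved = [
    "Chunk about work experience.",
    "Chunk about skills.",
    "Chunk about education.",
]
prompt = build_stuffed_prompt("What skills are listed?", retrieved)
print(prompt)
```

Because everything goes into one prompt, this approach fits best when the retrieved text is small; here that is at most 3 chunks of up to 500 characters each.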


ASK QUESTIONS!

In [None]:
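def ask(question):
    """Run one question through the QA chain, print the answer and its sources, and return the result."""
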
    print("\n" + "="*60)
    print(f"❓ QUESTION: {question}")
    print("="*60)

    # Get the answer
    result = qa_chain.invoke({"query": question})

    # Print the answer
    print(f"\n💡 ANSWER:\n{result['result']}")

    # Show which parts of the document were used
    print(f"\n📚 SOURCES (Top {len(result['source_documents'])} relevant chunks):")
    for i, doc in enumerate(result['source_documents'], 1):
        print(f"\n--- Chunk {i} ---")
        print(doc.page_content[:150] + "...")

    return result
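
The source preview in `ask` appends "..." to every chunk, even one shorter than 150 characters. A small helper can add the ellipsis only when text was actually cut. `preview` is a hypothetical name, shown here as an optional refinement.

```python
def preview(text, limit=150):
    """Return text cut to `limit` characters, adding '...' only if truncated."""
    text = text.strip()
    if len(text) <= limit:
        return text
    return text[:limit].rstrip() + "..."


print(preview("Short chunk."))
print(preview("x" * 200, limit=10))
```

Inside `ask`, the print call could then use `preview(doc.page_content)` in place of the slice.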

INTERACTIVE MODE

In [None]:
def chat():
    """
    Interactive chat mode - ask multiple questions
    Type 'quit' to exit
    """
    print("\n💬 CHAT MODE STARTED")
    print("Type your questions below. Type 'quit' to exit.\n")

    while True:
        question = input(" You: ").strip()

        if question.lower() in ['quit', 'exit', 'q']:
            print(" Goodbye!")
            break

        if question:
            ask(question)

chat()



💬 CHAT MODE STARTED
Type your questions below. Type 'quit' to exit.

 You: What programming languages does this person know?

❓ QUESTION: What programming languages does this person know?




KeyboardInterrupt: 
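
The run above stopped with a `KeyboardInterrupt` after the question was echoed, which suggests the cell was interrupted while the answer was being fetched. A minimal sketch of a loop that exits cleanly in that case follows. The `ask_fn` and `input_fn` parameters are hypothetical additions so the loop can be exercised without a live model or keyboard.

```python
def chat_loop(ask_fn, input_fn=input):
    """Question loop that exits cleanly on 'quit', Ctrl+C, or end of input.

    Returns the number of questions that were answered.
    """
    answered = 0
    while True:
        try:
            question = input_fn(" You: ").strip()
            if question.lower() in ("quit", "exit", "q"):
                print(" Goodbye!")
                break
            if question:
                ask_fn(question)
                answered += 1
        except (KeyboardInterrupt, EOFError):
            # Interrupted while waiting for input or while answering
            print("\n Chat interrupted. Goodbye!")
            break
    return answered


# Demo with scripted input and a stand-in for ask()
scripted = iter(["What skills are listed?", "", "quit"])
seen = []
count = chat_loop(seen.append, input_fn=lambda prompt: next(scripted))
print(count, seen)
```

In the notebook, the same loop would be started with `chat_loop(ask)`.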