# RAG System with Qdrant and Gemini

This notebook implements a complete Retrieval-Augmented Generation (RAG) system using:
- **Qdrant** (in Docker) as the vector database
- **SentenceTransformers** for embeddings
- **Gemini** as the LLM
- **LangChain** to connect everything

## Prerequisites

Before running this notebook:
1. Install Docker Desktop for Windows
2. Start Qdrant in Docker from a Command Prompt (`%cd%` is `cmd.exe` syntax for the current directory) with:
   ```
   docker run -d --name qdrant -p 6333:6333 -p 6334:6334 -v "%cd%\qdrant_storage:/qdrant/storage" qdrant/qdrant
   ```
3. Create a folder named `pdfs` containing your PDF documents

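Before running the cells below, it can help to confirm that the container is actually listening. The following is a minimal sketch, assuming the REST port 6333 from the command above is mapped to `localhost`; `is_port_open` is a hypothetical helper name, not part of Docker or Qdrant.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection raises OSError (refused, timed out, unreachable) on failure
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Example (assumes the port mapping from the docker command above):
# if not is_port_open("localhost", 6333):
#     print("Qdrant does not seem to be reachable on port 6333")
```

This only checks that something accepts TCP connections on the port; it does not verify that the service behind it is Qdrant.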
## 2. Import Libraries and Setup

In [1]:
import os
import time
from dotenv import load_dotenv
import google.generativeai as genai
# Third-party integrations live in langchain_community in current LangChain releases
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import CharacterTextSplitter
from langchain_community.embeddings import SentenceTransformerEmbeddings
from langchain_community.vectorstores import Qdrant
from langchain.prompts import PromptTemplate
from langchain.chains import RetrievalQA
from langchain_google_genai import ChatGoogleGenerativeAI
from qdrant_client import QdrantClient
from qdrant_client.http.models import Distance, VectorParams, PointStruct



## 3. Configure Environment

Set up the Google API key for Gemini. You can store it in a `.env` file or input it directly below.

In [2]:
# Load environment variables
load_dotenv()

# Set Google API key
GOOGLE_API_KEY = os.getenv("GOOGLE_API_KEY")
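if not GOOGLE_API_KEY:
    # Uncomment and input your key if not using .env
    # GOOGLE_API_KEY = "your_google_api_key_here"
    pass

# Fail early with a clear message: os.environ only accepts string values, not None
if not GOOGLE_API_KEY:
    raise ValueError("GOOGLE_API_KEY is not set. Add it to .env or set it above.")

os.environ["GOOGLE_API_KEY"] = GOOGLE_API_KEY
genai.configure(api_key=GOOGLE_API_KEY)

print("✅ Google API key configured")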

✅ Google API key configured

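If you would rather not keep the key in a file or in the notebook itself, one option is to prompt for it at run time. This is a minimal sketch using only the standard library; `resolve_api_key` and its parameters are hypothetical names introduced here for illustration.

```python
import getpass
import os
from typing import Callable, Mapping


def resolve_api_key(
    env: Mapping[str, str] = os.environ,
    prompt_fn: Callable[[str], str] = getpass.getpass,
    var_name: str = "GOOGLE_API_KEY",
) -> str:
    """Return the key from `env`; if it is missing, ask for it without echoing."""
    key = env.get(var_name, "").strip()
    if not key:
        # getpass hides the typed value, so the key does not end up in cell output
        key = prompt_fn(f"Enter {var_name}: ").strip()
    if not key:
        raise ValueError(f"{var_name} is required")
    return key


# Example: GOOGLE_API_KEY = resolve_api_key()
```

Passing `env` and `prompt_fn` as parameters keeps the function easy to test without touching the real environment.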

## 4. Set Up Embedding Model

We'll use the SentenceTransformers library with the `all-MiniLM-L6-v2` model, which produces 384-dimensional embeddings.

In [3]:
# Set up the embedding model
embeddings = SentenceTransformerEmbeddings(model_name="all-MiniLM-L6-v2")
embedding_dimension = embeddings.client.get_sentence_embedding_dimension()
print(f"✅ Encoder ready, dim = {embedding_dimension}")



✅ Encoder ready, dim = 384


## 5. Connect to Qdrant and Create Collection

Connect to the Qdrant server running in Docker and set up the vector collection.

In [4]:
# Connect to Qdrant Docker container
qdrant_client = QdrantClient(host="localhost", port=6333)
collection_name = "pdf_documents"

# Start from a clean slate: if the collection already exists, delete it and recreate it
collections = qdrant_client.get_collections().collections
collection_names = [collection.name for collection in collections]

if collection_name in collection_names:
    # Delete collection if it exists
    qdrant_client.delete_collection(collection_name=collection_name)
    print(f"🗑️ Deleted existing collection: {collection_name}")

# Create new collection
qdrant_client.create_collection(
    collection_name=collection_name,
    vectors_config=VectorParams(size=embedding_dimension, distance=Distance.COSINE),
)
print(f"✅ Created new Qdrant collection: {collection_name}")
print(f"🌐 Qdrant dashboard available at: http://localhost:6333/dashboard")

✅ Created new Qdrant collection: pdf_documents
🌐 Qdrant dashboard available at: http://localhost:6333/dashboard

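The collection above is created with `Distance.COSINE`. To illustrate what that metric measures, here is a small pure-Python sketch of cosine similarity between two vectors. It is for intuition only and is not how Qdrant computes scores internally; `cosine_similarity` is a name introduced here.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between a and b: 1 = same direction, 0 = orthogonal, -1 = opposite."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


print(cosine_similarity([1.0, 0.0], [2.0, 0.0]))  # same direction, different length
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # orthogonal
```

Because the metric ignores vector length, two chunks are compared by the direction of their embeddings only.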

## 6. Load and Split PDF Documents

We'll load PDFs from the `./pdfs` directory and split them into manageable chunks.

In [5]:
# Set up folder path and text splitter
folder_path = "./pdfs"  # folder containing your PDFs
loader_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=200)

# Check if folder exists
if not os.path.exists(folder_path):
    os.makedirs(folder_path)
    print(f"Created directory: {folder_path}")
    print("Please add your PDF files to this directory and run this cell again.")
else:
    # Load and process PDF files
    documents = []
    pdf_files = 0
    
    for fname in os.listdir(folder_path):
        if not fname.lower().endswith(".pdf"):
            continue
        pdf_files += 1
        path = os.path.join(folder_path, fname)
        print(f"Loading {fname}...")
        loader = PyPDFLoader(path)
        pages = loader.load_and_split(text_splitter=loader_splitter)
        documents.extend(pages)

    if pdf_files == 0:
        print(f"No PDF files found in {folder_path}. Please add PDF files and run this cell again.")
    else:
        print(f"✅ Loaded and split {len(documents)} chunks from {pdf_files} PDF files.")

Loading 2023-S1-SE4020-Lecture-02-Introduction.pdf...
Loading 2025-S1-SE4020-Lecture-01-Introduction.pdf...
✅ Loaded and split 31 chunks from 2 PDF files.

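To build intuition for `chunk_size` and `chunk_overlap`, the sketch below splits a string with a plain sliding window. This is a simplification: `CharacterTextSplitter` first splits on a separator (by default a blank line) and then merges the pieces up to the size limit, so its chunk boundaries will differ. `chunk_text` is a hypothetical helper, not a LangChain function.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> List[str]:
    """Split text into fixed-size windows where consecutive windows share `chunk_overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks: List[str] = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
        start += step
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, chunk_overlap=2))
# → ['abcd', 'cdef', 'efgh', 'ghij']
```

A larger overlap repeats more text across chunks, which can help when an answer straddles a boundary but also increases the number of vectors stored.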

## 7. Create Vector Store and Add Documents

We'll create embeddings for all document chunks and store them in Qdrant.

In [6]:
# Check if documents were loaded before proceeding
if 'documents' in locals() and len(documents) > 0:
    # Create vector store
    start_time = time.time()
    print("Creating embeddings and uploading to Qdrant...")
    
    vectorstore = Qdrant.from_documents(
        documents,
        embeddings,
        url="http://localhost:6333",
        collection_name=collection_name,
    )
    
    elapsed_time = time.time() - start_time
    print(f"✅ Uploaded {len(documents)} document chunks to Qdrant in {elapsed_time:.2f} seconds")
    print(f"🌐 View your collection at: http://localhost:6333/dashboard/#/collections/{collection_name}")
else:
    print("No documents loaded. Please run the previous cell successfully first.")

Creating embeddings and uploading to Qdrant...
✅ Uploaded 31 document chunks to Qdrant in 2.97 seconds
🌐 View your collection at: http://localhost:6333/dashboard/#/collections/pdf_documents


## 8. Set Up the LLM and Prompt Template

We'll use Google's Gemini 2.0 Flash model (`gemini-2.0-flash`) and create a prompt template for consistent answers.

In [23]:
# Set up the LLM (Gemini)
llm = ChatGoogleGenerativeAI(
    model="gemini-2.0-flash",
    google_api_key=GOOGLE_API_KEY,
    temperature=0.2,
    convert_system_message_to_human=True,
)

# Create the prompt template
template = """You are an expert assistant. Use the following context (with page numbers) to answer the user's question.

Context:
{context}

Question:
{question}

Answer:
1. Summary:  
   Provide a succinct explanatory summary (1–2 sentences).

2. Key Points:  
   List the main supporting details in bullet form. For each bullet, cite the page number in parentheses.

Example format:

1. Summary:  
   The primary purpose of vector databases is to store and query dense vector embeddings for similarity search (page 12).

2. Key Points:  
   - Vector databases offer fully managed services, eliminating infrastructure overhead (page 5).  
   - They support cosine and dot-product similarity metrics for fast nearest-neighbor retrieval (page 8).  
   - They integrate seamlessly with popular embedding libraries like SentenceTransformer (page 14).  
   - They provide automatic indexing to scale to millions of vectors (page 20).

Now, answer the question below following this format:
{question}
"""

prompt = PromptTemplate(
    input_variables=["context", "question"],
    template=template
)

print("✅ Prompt template created")

✅ Prompt template created


## 9. Create the RAG Chain

Now we'll connect all components to create our RAG system.

In [24]:
# Check if vectorstore exists
if 'vectorstore' in locals():
    # Create RetrievalQA chain
    retriever = vectorstore.as_retriever(search_kwargs={"k": 3})
    qa_chain = RetrievalQA.from_chain_type(
        llm=llm,
        chain_type="stuff",
        retriever=retriever,
        chain_type_kwargs={"prompt": prompt},
        return_source_documents=True
    )
    
    print("✅ RAG chain with Gemini is ready.")
else:
    print("Vectorstore not created. Please run previous cells successfully first.")

✅ RAG chain with Gemini is ready.

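Conceptually, the `"stuff"` chain type places the text of all retrieved chunks into the `{context}` slot of the prompt and sends everything to the LLM in one call. The sketch below mimics that step in plain Python and also prefixes each chunk with a page label, since the prompt asks for page citations. It is an illustration under assumptions, not LangChain's implementation: `build_context` and `fill_prompt` are hypothetical helpers, and chunks are represented as plain dictionaries.

```python
from typing import Any, Dict, List


def build_context(chunks: List[Dict[str, Any]], separator: str = "\n\n") -> str:
    """Join retrieved chunks into one context string, labelling each with its page."""
    parts = []
    for chunk in chunks:
        page = chunk.get("metadata", {}).get("page")
        # PyPDFLoader stores zero-based page indices, so add 1 for a human-readable label
        label = f"[page {page + 1}]" if isinstance(page, int) else "[page unknown]"
        parts.append(f"{label} {chunk['page_content']}")
    return separator.join(parts)


def fill_prompt(template: str, chunks: List[Dict[str, Any]], question: str) -> str:
    """Fill the {context} and {question} placeholders of a prompt template."""
    return template.format(context=build_context(chunks), question=question)


retrieved = [
    {"page_content": "Ranges define sequences.", "metadata": {"page": 15}},
    {"page_content": "No page here.", "metadata": {}},
]
print(build_context(retrieved))
```

The zero-based page index would explain why a source listed as page 15 in the metadata can correspond to page 16 in the document itself.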

## 10. Test with Example Questions

Let's test our RAG system with some example questions.

In [25]:
# Define a function to handle questions
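def ask_question(question):
    # qa_chain is a notebook-level (global) variable, so only globals() needs checking
    if 'qa_chain' not in globals():
        print("RAG chain not created. Please run previous cells successfully first.")
        return None
    
    # invoke() replaces the deprecated direct call qa_chain({...})
    result = qa_chain.invoke({"query": question})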
    
    print("\n" + "="*50)
    print(f"QUESTION: {question}")
    print("="*50)
    
    print("\nANSWER:")
    print(result["result"])
    
    print("\nSOURCES:")
    for doc in result["source_documents"]:
        src = doc.metadata.get("source", "unknown")
        pg = doc.metadata.get("page", "unknown")
        print(f" • {src} — page {pg}")
    
    return result

# Try an example question
example_question = "What are ranges?"
result = ask_question(example_question)




QUESTION: What are ranges?

ANSWER:
1. Summary:
Ranges in Swift define a sequence of values, and Swift provides several operators to create different types of ranges, including closed, half-open, and one-sided ranges. These ranges can be countable (integers) or strideable, allowing for enumeration or stepping through values with a specific increment (page 16).

2. Key Points:
   - Closed Range Operator (a...b): Includes both 'a' and 'b' (page 16).
   - Half-Open Range Operator (a..<b): Includes 'a' but not 'b' (page 16).
   - One-Sided Ranges (a... or ...b): Represents a range from 'a' to the end or from the beginning to 'b' (page 16).
   - Countable Range: Ranges of integers that can be enumerated (a..<b or a...b where a and b are integers) (page 16).
   - Strideable Range: Values stepped through with a certain stride (stride(from: a, to: b, by: s) or stride(from: a, through: b, by: s)) (page 16).

SOURCES:
 • ./pdfs\2023-S1-SE4020-Lecture-02-Introduction.pdf — page 15
 • ./pdfs\2023

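Because a single PDF page can be split into several chunks, the SOURCES list may show the same file and page more than once. A minimal sketch of de-duplicating the entries while keeping retrieval order follows; `unique_sources` is a hypothetical helper that takes the `metadata` dictionaries of the returned documents.

```python
from typing import Any, Dict, Iterable, List, Tuple


def unique_sources(metadatas: Iterable[Dict[str, Any]]) -> List[Tuple[Any, Any]]:
    """Return (source, page) pairs in first-seen order, without duplicates."""
    seen = set()
    ordered: List[Tuple[Any, Any]] = []
    for meta in metadatas:
        key = (meta.get("source", "unknown"), meta.get("page", "unknown"))
        if key not in seen:
            seen.add(key)
            ordered.append(key)
    return ordered


# Example with the result returned by ask_question:
# for src, pg in unique_sources(doc.metadata for doc in result["source_documents"]):
#     print(f" • {src}, page {pg}")
```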
## 11. Interactive Mode

Use this cell to ask custom questions about your documents.

In [None]:
# Ask your own question
your_question = input("What would you like to ask about your documents? ")
result = ask_question(your_question)

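To ask several questions without re-running the cell each time, a small loop can wrap the function above. This is a sketch: `question_loop` is a hypothetical helper, and the stop words are an arbitrary choice.

```python
from typing import Callable, Tuple


def question_loop(
    ask: Callable[[str], object],
    read: Callable[[str], str] = input,
    stop_words: Tuple[str, ...] = ("quit", "exit", ""),
) -> int:
    """Call `ask` for each entered question until a stop word; return how many were asked."""
    asked = 0
    while True:
        question = read("Question (or 'quit' to stop): ").strip()
        if question.lower() in stop_words:
            return asked
        ask(question)
        asked += 1


# Example: question_loop(ask_question)
```

An empty line also ends the loop, which avoids sending blank queries to the chain.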
## 12. Exploring the Vector Space in Qdrant Dashboard

You can explore your vectors in the Qdrant dashboard:

1. Open http://localhost:6333/dashboard in your browser
2. Go to "Collections" and click on the `pdf_documents` collection
3. Use the "Search" tab to perform vector searches
4. View the "Cluster view" to visualize your vector space

## Next Steps

Some ideas to improve your RAG system:

1. Adjust the chunk size and overlap for better context retrieval
2. Try different embedding models for improved relevance
3. Experiment with different prompt templates
4. Implement metadata filtering to search specific documents or sections
5. Add logging to track performance and improve the system over time
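
As a starting point for the logging idea, the sketch below times each call with a decorator and writes the latency to a standard-library logger. The logger name `"rag"` and the decorator name `log_latency` are assumptions made for this example.

```python
import functools
import logging
import time

logger = logging.getLogger("rag")


def log_latency(func):
    """Decorator that logs how long each call to `func` takes, even if it raises."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            elapsed = time.perf_counter() - start
            logger.info("%s took %.3f s", func.__name__, elapsed)
    return wrapper


# Example: enable INFO output, then wrap the question function defined earlier
# logging.basicConfig(level=logging.INFO)
# ask_question = log_latency(ask_question)
```

Logging the latency per question makes it easier to see whether changes to chunk size or `k` affect response time.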