<a href="https://colab.research.google.com/github/sabrinaaquino/rag_chatbot/blob/main/simple_rag_ai_chatbot.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Building a RAG Chatbot with Qdrant, Gemma 3, and Docling

Install the necessary Python libraries (uncomment the line below if they are not already installed).

In [None]:
#!pip install qdrant-client docling fastembed google-generativeai

Import the libraries we'll use.

In [None]:
from qdrant_client import QdrantClient, models
from docling.document_converter import DocumentConverter
from docling.chunking import HybridChunker
from fastembed import TextEmbedding
import google.generativeai as genai
from google.colab import userdata

`DocumentConverter()` reads documents (PDFs, Word files, and so on) and converts them into a structured format.

In [None]:
source = "data/the_rust_workbook.pdf"
document = DocumentConverter().convert(source=source).document


In [None]:
print(document)

Next, split the document into smaller, manageable parts called "chunks". To ensure that no chunk exceeds the 256-token limit of the `all-MiniLM-L6-v2` model, set the `max_tokens` parameter to 256:

In [None]:
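chunk_tokenizer = "sentence-transformers/all-MiniLM-L6-v2"
# max_tokens is a setting of the chunker itself; chunk() only takes the document.
# Assumes a docling version whose HybridChunker accepts a tokenizer name and max_tokens.
chunker = HybridChunker(tokenizer=chunk_tokenizer, max_tokens=256)
chunks = chunker.chunk(dl_doc=document)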
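To make the idea of a token budget concrete, here is a minimal sketch that greedily packs sentences into chunks without going over a limit. It does not reproduce how `HybridChunker` works internally: it counts whitespace-separated words as a stand-in for real tokenizer tokens, and `split_by_token_budget` is a hypothetical helper written only for this illustration.

```python
def split_by_token_budget(sentences, max_tokens):
    """Greedily pack sentences into chunks of at most max_tokens "tokens".

    Assumption: a token is a whitespace-separated word, which only
    approximates what a real model tokenizer would count. A single
    sentence longer than the budget still becomes its own chunk.
    """
    packed, current, current_len = [], [], 0
    for sentence in sentences:
        n_tokens = len(sentence.split())
        # Start a new chunk when adding this sentence would exceed the budget.
        if current and current_len + n_tokens > max_tokens:
            packed.append(" ".join(current))
            current, current_len = [], 0
        current.append(sentence)
        current_len += n_tokens
    if current:
        packed.append(" ".join(current))
    return packed


demo_sentences = [
    "Rust is a systems programming language.",
    "It focuses on safety and speed.",
    "Ownership is its central idea.",
]
demo_chunks = split_by_token_budget(demo_sentences, max_tokens=12)
for chunk in demo_chunks:
    print(len(chunk.split()), chunk)
```

The first two sentences fit the 12-word budget together, so the third starts a new chunk. A real chunker would also respect document structure such as headings and tables.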

Let's extract the `.text` attribute of each element in `chunks` and collect the results in a new list.

In [None]:
text_chunks = [c.text for c in chunks]

In [None]:
print(len(text_chunks))

235


In [None]:
embedding_model = TextEmbedding(model_name="BAAI/bge-small-en-v1.5")
embeddings = list(embedding_model.embed(text_chunks))

In [None]:
print(len(embeddings))

235


In [None]:
client = QdrantClient(":memory:")

In [None]:
collection_name = "rust-book"

In [None]:
embedding_dim = len(embeddings[0])

client.create_collection(
    collection_name=collection_name,
    vectors_config=models.VectorParams(
        size=embedding_dim,
        distance=models.Distance.COSINE,
    ),
)

True

In [None]:
points = [
    models.PointStruct(id=i, vector=vec, payload={"text": text_chunks[i]})
    for i, vec in enumerate(embeddings)
]

In [None]:
print(len(points[0].vector))

384


In [None]:
client.upload_points(
    collection_name=collection_name,
    points=points,
)

In [None]:
query_text = "Who is the author of this book"
# Embed the query with the same model used for the chunks
query_vector = list(embedding_model.embed(query_text))[0]

In [None]:
search_result = client.query_points(
    collection_name=collection_name,
    query=query_vector,
    limit=3,
)
print(search_result)

points=[ScoredPoint(id=3, version=0, score=0.7342091347160316, payload={'text': 'The author has made every effort to ensure the accuracy and completeness of the information contained in this book. However, the author assumes no responsibility for errors, omissions, or contrary interpretations of the subject matter.\nThis publication is offered with the understanding that the author is not engaged in rendering legal, financial, professional, or technical advice or services. If legal, financial, or other expert assistance is required, the services of a competent professional should be sought.\nIn no event shall the author or any distributor of this book be liable for any special, incidental, indirect, or consequential damages whatsoever arising out of or in connection with the use or inability to use this book, even if the author or distributor has been advised of the possibility of such damages.'}, vector=None, shard_key=None, order_value=None), ScoredPoint(id=2, version=0, score=0.6722
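The `score` values in the output are cosine similarities, since the collection was created with `Distance.COSINE`. As a rough illustration of what that score measures, here is a brute-force sketch in plain Python. It is not Qdrant's implementation; `cosine_similarity` and `top_k` are hypothetical helpers, and the 2-D vectors stand in for the 384-dimensional embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query, vectors, k=3):
    """Return (index, score) pairs for the k vectors most similar to query."""
    scored = [(i, cosine_similarity(query, v)) for i, v in enumerate(vectors)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy 2-D vectors standing in for real embeddings.
toy_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
toy_hits = top_k([1.0, 0.1], toy_vectors, k=2)
print(toy_hits)
```

The query points almost along the first axis, so vector 0 ranks first and the diagonal vector 2 ranks second, mirroring how `limit=3` keeps only the highest-scoring points.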

In [None]:
context = "\n\n".join(hit.payload["text"] for hit in search_result.points)

In [None]:
print(context)

The author has made every effort to ensure the accuracy and completeness of the information contained in this book. However, the author assumes no responsibility for errors, omissions, or contrary interpretations of the subject matter.
This publication is offered with the understanding that the author is not engaged in rendering legal, financial, professional, or technical advice or services. If legal, financial, or other expert assistance is required, the services of a competent professional should be sought.
In no event shall the author or any distributor of this book be liable for any special, incidental, indirect, or consequential damages whatsoever arising out of or in connection with the use or inability to use this book, even if the author or distributor has been advised of the possibility of such damages.

The information in this book is provided 'as is' without any guarantees of completeness, accuracy, usefulness, or timeliness. The author disclaims any liability for any damag

With retrieval done, let's move on to generation: passing the retrieved context and the question to the model.

In [None]:
genai.configure(api_key=userdata.get('gemini-api-key'))

List the available models that support `generateContent`.

In [None]:
for m in genai.list_models():
    if 'generateContent' in m.supported_generation_methods:
        print(m.name)

models/gemini-1.0-pro-vision-latest
models/gemini-pro-vision
models/gemini-1.5-pro-latest
models/gemini-1.5-pro-001
models/gemini-1.5-pro-002
models/gemini-1.5-pro
models/gemini-1.5-flash-latest
models/gemini-1.5-flash-001
models/gemini-1.5-flash-001-tuning
models/gemini-1.5-flash
models/gemini-1.5-flash-002
models/gemini-1.5-flash-8b
models/gemini-1.5-flash-8b-001
models/gemini-1.5-flash-8b-latest
models/gemini-1.5-flash-8b-exp-0827
models/gemini-1.5-flash-8b-exp-0924
models/gemini-2.5-pro-exp-03-25
models/gemini-2.5-pro-preview-03-25
models/gemini-2.5-flash-preview-04-17
models/gemini-2.5-flash-preview-05-20
models/gemini-2.5-flash-preview-04-17-thinking
models/gemini-2.5-pro-preview-05-06
models/gemini-2.0-flash-exp
models/gemini-2.0-flash
models/gemini-2.0-flash-001
models/gemini-2.0-flash-exp-image-generation
models/gemini-2.0-flash-lite-001
models/gemini-2.0-flash-lite
models/gemini-2.0-flash-preview-image-generation
models/gemini-2.0-flash-lite-preview-02-05
models/gemini-2.0-fl

In [None]:
model = genai.GenerativeModel("gemma-3-27b-it")

In [None]:
prompt = f"""You are a helpful assistant. Use the following context to answer the user's question and if you can't find the answer in the context, say you don't know.

Context: {context}
Question: {query_text}
Answer:"""

In [None]:
response = model.generate_content(prompt)
print(response.text)

The author is Francescociulla, and you can reach them at me@francescociulla.com.


Let's wrap the retrieval and generation steps in a single function.

In [None]:
def rag_respond_with_gemini(query_text):
    """Answer a question using chunks retrieved from Qdrant as context."""

    # Step 1: Embed the query with the same model used for the chunks
    query_vector = list(embedding_model.embed(query_text))[0]

    # Step 2: Retrieve the three most relevant chunks from Qdrant
    search_results = client.query_points(
        collection_name=collection_name,
        query=query_vector,
        limit=3,
    )

    context = "\n\n".join(hit.payload["text"] for hit in search_results.points)

    # Step 3: Format the prompt
    prompt = f"""You are a helpful assistant.

Use the context below to answer the user's question.

Context:
{context}

Question: {query_text}
Answer:"""

    # Step 4: Generate the answer with the model served through the Gemini API
    response = model.generate_content(prompt)
    return response.text.strip()
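Unlike the earlier standalone prompt, the prompt inside this function does not tell the model to say it doesn't know when the context lacks the answer. Below is a minimal sketch of a prompt builder that keeps that instruction and also handles an empty retrieval result. `build_rag_prompt` is a hypothetical helper, not part of any library used here, and its wording is only one possible choice.

```python
def build_rag_prompt(question, context_chunks):
    """Assemble a RAG prompt from a question and retrieved text chunks."""
    # Drop empty chunks and separate the rest with blank lines.
    context = "\n\n".join(
        chunk.strip() for chunk in context_chunks if chunk.strip()
    )
    if not context:
        # Make an empty retrieval explicit instead of leaving a blank section.
        context = "(no context retrieved)"
    return (
        "You are a helpful assistant. Use the context below to answer the "
        "user's question. If the answer is not in the context, say you "
        "don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


demo_prompt = build_rag_prompt(
    "Who is the author of this book?",
    ["The author has made every effort to ensure accuracy.", "  "],
)
print(demo_prompt)
```

Inside the function, this helper could be called with `[hit.payload["text"] for hit in search_results.points]` as the second argument, which keeps prompt wording in one place and makes it easy to test without calling the model.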

In [None]:
response = rag_respond_with_gemini("Tell me about ownership in Rust")
print(response)

According to the provided text, ownership in Rust is a key part of the language's system for managing memory safely. It's explored in an exercise where you move a variable's value to another, and observe what happens when trying to use the original. The text also explains that ownership works *with* lifetimes to prevent "dangling references" – a common source of bugs and crashes in other languages.

You can also *borrow* values using the `&` operator instead of moving them, which demonstrates how borrowing works in Rust.

The topic of ownership is covered in more detail in Chapter 2 of the book.


In [None]:
response = rag_respond_with_gemini("Why Rust is a good programming language?")
print(response)

According to the text, Rust is a good programming language because of its **performance, safety, and concurrency capabilities**. It was also voted the **most admired programming language** in the Stack Overflow Developer Survey 2024, and has **unique features, especially in memory management**. Additionally, it helps you write **organized and easier to manage** code through features like **functions and variables**.
