**1. Set up Asyncio**
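Jupyter already runs an asyncio event loop, so libraries that try to start their own loop fail inside a notebook. `nest_asyncio` patches the running loop to allow nested use, which lets the async operations in this pipeline run.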
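
To make the problem concrete, here is a minimal sketch of what goes wrong without the patch. It uses only the standard library and is meant to be run as a plain Python script, not in a notebook cell; `inner` and `outer` are hypothetical names chosen for illustration.

```python
import asyncio


async def inner():
    return "done"


async def outer():
    # outer() runs inside an event loop, much like code in a notebook cell.
    coro = inner()
    try:
        # Starting a second loop from inside a running one is rejected.
        asyncio.run(coro)
    except RuntimeError as exc:
        coro.close()  # tidy up the coroutine that never got to run
        return str(exc)
    return "no error"


message = asyncio.run(outer())
print(message)
```

The `RuntimeError` caught here is the failure that `nest_asyncio.apply()` is meant to avoid.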

In [None]:
import nest_asyncio
nest_asyncio.apply()

import sys
print("Python executable:", sys.executable)
print("Python path:", sys.path)

import numpy
print("numpy version:", numpy.__version__)
print("Numpy path:", numpy.__file__)

import torch
print(torch.__version__)

**2. Set up Qdrant vector database**
We'll use Qdrant as our vector database to store and retrieve vector embeddings.

Collections in Qdrant serve as the primary organisational unit for storing and managing vector data.
Collections are designed to group similar or related vectors together, allowing for efficient search and retrieval operations within that group.

Vector requirements:
- all vectors within a collection must have the same dimensionality
- a single distance metric is used for comparing vectors in a collection
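
The dimensionality rule can be illustrated with a toy in-memory stand-in. This is a sketch only: `ToyCollection` and its `upsert` method are hypothetical and are not Qdrant's API, and the distance metric is left out for brevity.

```python
class ToyCollection:
    """Hypothetical in-memory stand-in for a collection; not Qdrant's API."""

    def __init__(self, name, size):
        self.name = name
        self.size = size    # dimensionality shared by every vector
        self.points = {}    # point id -> vector

    def upsert(self, point_id, vector):
        # Reject vectors whose length differs from the collection's size.
        if len(vector) != self.size:
            raise ValueError(
                f"expected {self.size} dimensions, got {len(vector)}"
            )
        self.points[point_id] = list(vector)


collection = ToyCollection("chat_with_docs", size=3)
collection.upsert(1, [0.1, 0.2, 0.3])

error = None
try:
    collection.upsert(2, [0.1, 0.2])  # wrong length, so this is rejected
except ValueError as exc:
    error = str(exc)

print(len(collection.points), error)
```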

Supported distance metrics:
- dot product
- cosine
- euclidean
- manhattan
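
As a rough illustration of how these four metrics differ, the sketch below computes each one in plain Python for two small vectors. The function names are illustrative and are not taken from Qdrant. Dot product and cosine are similarities (higher means closer), while Euclidean and Manhattan are distances (lower means closer).

```python
import math


def dot_product(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    norm_a = math.sqrt(dot_product(a, a))
    norm_b = math.sqrt(dot_product(b, b))
    return dot_product(a, b) / (norm_a * norm_b)


def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan_distance(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


a = [3.0, 4.0]
b = [4.0, 3.0]

scores = {
    "dot": dot_product(a, b),
    "cosine": cosine_similarity(a, b),
    "euclidean": euclidean_distance(a, b),
    "manhattan": manhattan_distance(a, b),
}
print(scores)
```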

In [4]:
import qdrant_client

collection_name = "chat_with_docs"

client = qdrant_client.QdrantClient(
  host="localhost",
  port=6333,
)


**3. Read the documents**
Load the documents from the specified path and extract their contents for use in our RAG pipeline. Only `.pdf` files are read, and subdirectories are searched recursively.

In [5]:
from llama_index.core import SimpleDirectoryReader

input_dir_path = "./docs"

loader = SimpleDirectoryReader(
  input_dir=input_dir_path,
  required_exts=[".pdf"],
  recursive=True,
)

docs = loader.load_data()

In [None]:
type(docs), len(docs)
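
For intuition about what the extension filter and the recursive flag select, here is a standard-library sketch of the file discovery step alone. `find_files` is a hypothetical helper, not part of LlamaIndex, and it does not parse the PDFs.

```python
import tempfile
from pathlib import Path


def find_files(input_dir, required_exts, recursive=True):
    """Return files under input_dir whose suffix is in required_exts."""
    root = Path(input_dir)
    candidates = root.rglob("*") if recursive else root.glob("*")
    return sorted(
        p for p in candidates
        if p.is_file() and p.suffix in required_exts
    )


# Build a throwaway directory tree to try the helper on.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "nested").mkdir()
    (root / "a.pdf").write_bytes(b"")
    (root / "notes.txt").write_bytes(b"")
    (root / "nested" / "b.pdf").write_bytes(b"")

    all_pdfs = [p.name for p in find_files(root, {".pdf"})]
    top_level = [p.name for p in find_files(root, {".pdf"}, recursive=False)]

print(all_pdfs, top_level)
```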

**4. Define a function to index data**
In this step we define a function that creates an index over our document embeddings, which will be stored in the Qdrant vector database.
The index will allow us to organise and search through the document embeddings efficiently.

In [7]:
from llama_index.core import VectorStoreIndex, StorageContext
from llama_index.vector_stores.qdrant import QdrantVectorStore

# this function splits the documents into chunks, embeds each chunk, and stores the vectors in the vector database
def create_index(documents):
  vector_store = QdrantVectorStore(client=client, collection_name=collection_name)

  storage_context = StorageContext.from_defaults(vector_store=vector_store)

  index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)

  return index
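
Before embedding, the documents are split into smaller chunks. The splitter and its settings depend on the LlamaIndex configuration, so the sketch below only shows the general idea with a naive word-window chunker. `chunk_words` and its parameters are hypothetical.

```python
def chunk_words(text, chunk_size, overlap):
    """Split text into windows of chunk_size words sharing `overlap` words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reaches the end of the text
    return chunks


text = "one two three four five six seven eight nine ten"
chunks = chunk_words(text, chunk_size=4, overlap=1)
for chunk in chunks:
    print(chunk)
```

The overlap repeats a little text across neighbouring chunks, so a sentence that straddles a boundary is still retrievable from at least one of them.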


**5. Index our data**
We set an embedding model from Hugging Face to convert our documents into vector embeddings, and then store them in Qdrant using the function defined above.

In [None]:
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.core import Settings

# the model used to generate the embeddings
embed_model = HuggingFaceEmbedding(
  model_name="BAAI/bge-large-en-v1.5",
  trust_remote_code=True,
  device="cpu",
)

# make sure the same model is used throughout the pipeline to maintain consistency in embeddings generation
Settings.embed_model = embed_model

index = create_index(docs)
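
To see why the same embedding model has to be used for documents and queries, consider this toy embedder. It is a hypothetical word-hashing scheme and bears no relation to how the BGE model works. It only shows that an embedder is a fixed mapping from text to a vector of fixed length, and that vectors from differently configured embedders cannot be compared.

```python
import hashlib
import math


def toy_embed(text, dim=8):
    """Hash each word into one of `dim` buckets, then L2-normalise."""
    vector = [0.0] * dim
    for word in text.lower().split():
        digest = hashlib.sha256(word.encode("utf-8")).digest()
        vector[digest[0] % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm else vector


def cosine(a, b):
    # Inputs are unit length, so the dot product is the cosine similarity.
    return sum(x * y for x, y in zip(a, b))


doc_vec = toy_embed("vector databases store embeddings")
same_vec = toy_embed("vector databases store embeddings")
other_dim = toy_embed("vector databases store embeddings", dim=16)

print(cosine(doc_vec, same_vec))      # identical text, identical vector
print(len(doc_vec), len(other_dim))   # different settings, different sizes
```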


**6. Load the LLM**
We configure an LLM to handle the response generation step in our RAG pipeline. Here we use the `llama3.2:1b` model served through Ollama.

In [9]:
from llama_index.llms.ollama import Ollama

# request_timeout (in seconds) stops a slow or stalled request from hanging indefinitely
llm = Ollama(model="llama3.2:1b", request_timeout=120)

# we set the LLM to be used throughout the pipeline
Settings.llm = llm
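
The general idea behind a request timeout can be sketched with the standard library. This is not how the Ollama client implements it: `slow_reply` is a hypothetical stand-in for a model call, and the short durations are chosen only so that the example finishes quickly.

```python
import asyncio


async def slow_reply(delay):
    # Stand-in for a model call that takes `delay` seconds to answer.
    await asyncio.sleep(delay)
    return "reply"


async def ask(delay, timeout):
    try:
        # Give up if no answer arrives within `timeout` seconds.
        return await asyncio.wait_for(slow_reply(delay), timeout=timeout)
    except asyncio.TimeoutError:
        return "timed out"


fast = asyncio.run(ask(delay=0.01, timeout=1.0))
slow = asyncio.run(ask(delay=1.0, timeout=0.05))
print(fast, slow)
```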

**7. Define the prompt template**
We create a prompt template that gives the LLM a consistent format: the retrieved context first, then instructions for answering, then the query.

In [10]:
from llama_index.core import PromptTemplate

template = """Context information is below:
              ---------------------
              {context_str}
              ---------------------
              Given the context information above, I want you to think
              step by step to answer the query in a crisp manner.
              In case you don't know the answer, say 'I don't know!'
            
              Query: {query_str}
        
              Answer:"""

# we define the prompt template
prompt_template = PromptTemplate(
  template=template,
)
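
At query time the `{context_str}` and `{query_str}` placeholders are filled with the retrieved text and the user's question. The sketch below uses plain `str.format` on a shortened template as a stand-in. LlamaIndex's own formatting may do more, and the chunk texts here are made up.

```python
template = (
    "Context information is below:\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Query: {query_str}\n"
    "Answer:"
)

retrieved_chunks = ["Chunk one.", "Chunk two."]  # hypothetical retrieved text

prompt = template.format(
    context_str="\n\n".join(retrieved_chunks),
    query_str="What is in the chunks?",
)
print(prompt)
```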

**8. Query the document**
Finally, we use the index created above to set up a query engine that retrieves the most relevant chunks from our indexed documents and passes them to the LLM to answer user queries.

In [None]:
query_engine = index.as_query_engine(
  similarity_top_k=5,
  response_mode="compact",
  verbose=True,
  response_kwargs={"answer_prefix": "Answer:"},
)

query_engine.update_prompts(
    {"response_synthesizer:text_qa_template": prompt_template}
)

response = query_engine.query("What exactly is DSPy?")

from IPython.display import Markdown, display

display(Markdown(str(response)))
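
The retrieval half of the query step boils down to scoring every stored vector against the query vector and keeping the top `k`. The sketch below is a toy version with hand-written two-dimensional vectors. `top_k` is a hypothetical helper, and a real vector database uses an approximate index instead of scoring every point.

```python
import heapq


def top_k(query_vec, indexed, k):
    """Return the k (score, text) pairs with the highest dot-product score."""
    scored = (
        (sum(q * x for q, x in zip(query_vec, vec)), text)
        for text, vec in indexed
    )
    return heapq.nlargest(k, scored)


indexed = [
    ("chunk about apples", [1.0, 0.0]),
    ("chunk about pears", [0.8, 0.6]),
    ("chunk about trains", [0.0, 1.0]),
]

hits = top_k([1.0, 0.0], indexed, k=2)

# The retrieved texts are joined and become the context given to the LLM.
context_str = "\n\n".join(text for _, text in hits)
print(context_str)
```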