# DSPy Retriever using Qdrant

This notebook walks you through using Qdrant as a retriever in DSPy. We'll load a dataset into Qdrant and then retrieve relevant context from it with a DSPy retriever.

#### Setup

In [None]:
%pip install "dspy-ai[qdrant]"

#### Configure Constants

This notebook assumes you have a Qdrant instance running at http://localhost:6333/. To learn more about setting up Qdrant, refer to the [quickstart guide](https://qdrant.tech/documentation/quick-start/).

In [1]:
COLLECTION_NAME = "DBPEDIA-DSPY"
QDRANT_URL = "http://localhost:6333"

### Ingesting data

We'll load the [Qdrant/dbpedia-entities-openai3-text-embedding-3-small-1536-100K](https://huggingface.co/datasets/Qdrant/dbpedia-entities-openai3-text-embedding-3-small-1536-100K) dataset, which contains DBpedia entries along with embeddings pre-computed using OpenAI's `text-embedding-3-small` model.

In [None]:
%pip install datasets

In [None]:
from datasets import load_dataset

# We will use a small subset of the dataset
dataset = (
    load_dataset(
        "Qdrant/dbpedia-entities-openai3-text-embedding-3-small-1536-100K",
        streaming=True,
        split="train",
    )
    .take(1000)
    .remove_columns(["openai", "combined_text"])
)
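With `streaming=True`, rows are read lazily rather than downloading the full 100K-row dataset first, and `.take(1000)` stops after the first 1,000. As a rough pure-Python analogy (not how the `datasets` library is implemented), `itertools.islice` takes a prefix of a lazy iterator in the same way, and removing columns amounts to dropping keys from each record. The `fake_rows` generator and its field values are made-up stand-ins.

```python
from itertools import islice
from typing import Iterator


def fake_rows() -> Iterator[dict]:
    """Hypothetical stand-in for a lazily streamed 100K-row dataset."""
    for i in range(100_000):
        yield {
            "text": f"Entity number {i}",
            "combined_text": f"Title {i} Entity number {i}",
            "openai": [0.0],
            "text-embedding-3-small-1536-embedding": [0.0, 0.0, 0.0],
        }


def take_without(rows: Iterator[dict], n: int, drop: set[str]) -> list[dict]:
    """Lazily take the first n rows and drop the unwanted columns from each."""
    return [
        {key: value for key, value in row.items() if key not in drop}
        for row in islice(rows, n)
    ]


# Only 1,000 rows are ever generated; the rest of the generator is never consumed
subset = take_without(fake_rows(), 1000, {"openai", "combined_text"})
print(len(subset), sorted(subset[0]))
```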

Set up a client that points to a Qdrant instance at http://localhost:6333/.

In [3]:
from qdrant_client import QdrantClient

client = QdrantClient(url=QDRANT_URL)


We [create a collection](https://qdrant.tech/documentation/concepts/collections/#create-a-collection) with the appropriate dimensions and distance metric to load our dataset into.

In [4]:
from qdrant_client import models

client.create_collection(
    collection_name=COLLECTION_NAME,
    vectors_config=models.VectorParams(
        size=1536,
        distance=models.Distance.COSINE,
    ),
)
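`size=1536` must match the length of every vector we upload, and `distance=models.Distance.COSINE` tells Qdrant to rank points by cosine similarity. For intuition only, here is a pure-Python version of that score; Qdrant computes it natively, and the dimension check simply mirrors the fact that this collection only accepts 1536-dimensional vectors.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot(a, b) / (|a| * |b|), in [-1, 1] for non-zero vectors."""
    # Both vectors must share one dimensionality, like the collection's size=1536
    if len(a) != len(b):
        raise ValueError(f"dimension mismatch: {len(a)} vs {len(b)}")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # same direction -> 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # orthogonal -> 0.0
print(cosine_similarity([1.0, 1.0], [3.0, 3.0]))  # same direction, different length -> ~1.0
```

Because cosine similarity ignores vector length, only the direction of each embedding affects the ranking.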


We can now index the dataset in Qdrant. The `upload_collection` method accepts arguments such as `batch_size` and `parallel` to configure batching and parallelism; we'll go with the defaults.

In [5]:
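# Materialize the streamed subset once so the same entries are reused below
entries = list(dataset)
# Pull the precomputed embedding out of each entry; the remaining fields become the payload
vectors = [entry.pop("text-embedding-3-small-1536-embedding") for entry in entries]

client.upload_collection(collection_name=COLLECTION_NAME, vectors=vectors, payload=entries)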
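To picture what a batch size means, the sketch below splits aligned vectors and payloads into fixed-size chunks, the way a client can send points in several requests instead of one. It is an illustration, not `qdrant_client`'s internals; the helper `batched_points`, the toy data, and the batch size of 64 are assumptions for the example.

```python
from typing import Iterator


def batched_points(
    vectors: list[list[float]], payloads: list[dict], batch_size: int
) -> Iterator[tuple[list[list[float]], list[dict]]]:
    """Hypothetical helper: yield aligned (vectors, payloads) chunks of at most batch_size points."""
    if len(vectors) != len(payloads):
        raise ValueError("each vector needs exactly one payload")
    for start in range(0, len(vectors), batch_size):
        end = start + batch_size
        yield vectors[start:end], payloads[start:end]


# Toy data standing in for the 1,000 DBpedia entries (made-up values)
toy_vectors = [[float(i)] for i in range(1000)]
toy_payloads = [{"text": f"entry {i}"} for i in range(1000)]

sizes = [len(chunk) for chunk, _ in batched_points(toy_vectors, toy_payloads, batch_size=64)]
print(len(sizes), sizes[-1])  # 16 batches; the last one holds 40 points
```

Each vector must stay aligned with the payload at the same index, which is why `vectors` and `payload` are passed to `upload_collection` as parallel sequences.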

The loading is now complete. You can browse through the entries at http://localhost:6333/dashboard.

#### Initialize Qdrant retriever and OpenAI vectorizer

The Qdrant retriever lets us configure which vectorizer embeds incoming queries. We'll use `OpenAIVectorizer` with the `text-embedding-3-small` model, matching the model that produced our dataset's embeddings.

We can also specify which field of the Qdrant payload holds the document content. Based on the dataset we loaded, that field is `"text"`.
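Conceptually, a vector retriever like `QdrantRM` embeds the query with the vectorizer, finds the nearest stored vectors, and returns the configured payload field of each hit. The in-memory sketch below mimics that flow with a made-up bag-of-words `toy_embed` function in place of `OpenAIVectorizer`; it illustrates the idea and is not how `QdrantRM` is implemented.

```python
import math
from collections import Counter

# Hypothetical tiny vocabulary; a real vectorizer produces dense 1536-dim vectors
VOCAB = ["software", "program", "audio", "music", "river", "mountain"]


def toy_embed(text: str) -> list[float]:
    """Hypothetical embedding: word counts over VOCAB (stand-in for OpenAIVectorizer)."""
    counts = Counter(text.lower().replace(".", "").split())
    return [float(counts[word]) for word in VOCAB]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity, treating an all-zero vector as unrelated to everything."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def search(query: str, points: list[dict], document_field: str, k: int) -> list[str]:
    """Embed the query, rank stored points by cosine similarity, return each hit's payload field."""
    query_vector = toy_embed(query)
    ranked = sorted(points, key=lambda p: cosine(query_vector, p["vector"]), reverse=True)
    return [p["payload"][document_field] for p in ranked[:k]]


docs = [
    "A software program for editing audio.",
    "A river flows past the mountain.",
    "Music software for recording audio.",
]
# Each stored point pairs a vector with a payload, like the points in our Qdrant collection
points = [{"vector": toy_embed(d), "payload": {"text": d}} for d in docs]

print(search("audio software program", points, document_field="text", k=2))
```

Here `document_field="text"` plays the same role as the `document_field` argument to `QdrantRM`: it selects which payload key is returned as the passage.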

In [None]:
import os

os.environ["OPENAI_API_KEY"] = "<YOUR_OPENAI_API_KEY>"

In [3]:
from dsp.modules.sentence_vectorizer import OpenAIVectorizer

vectorizer = OpenAIVectorizer(model="text-embedding-3-small")

In [4]:
from dspy.retrieve.qdrant_rm import QdrantRM

qdrant_retriever = QdrantRM(
    qdrant_client=client,
    qdrant_collection_name=COLLECTION_NAME,
    vectorizer=vectorizer,
    document_field="text",
)

With `qdrant_retriever` instantiated, we can configure `dspy` to use it.

In [5]:
import dspy

dspy.settings.configure(rm=qdrant_retriever)
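Configuring `rm` globally is why `dspy.Retrieve()` in the next section can be created without passing a retriever: roughly speaking, the retrieval module looks up the configured retriever when it is called. The toy analogue below uses hypothetical `Settings`, `configure`, `ToyRetrieve`, and `keyword_rm` names and a naive keyword-overlap ranking in place of vector search; it illustrates the pattern and is not DSPy's code.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# A retriever takes a query and a passage count k, and returns passages
Retriever = Callable[[str, int], list[str]]


@dataclass
class Settings:
    """Hypothetical global settings object, loosely analogous to dspy.settings."""
    rm: Optional[Retriever] = None


settings = Settings()


def configure(rm: Retriever) -> None:
    """Register the retriever that every ToyRetrieve call will use."""
    settings.rm = rm


class ToyRetrieve:
    """Hypothetical retrieval module: stores only k, not the retriever itself."""

    def __init__(self, k: int = 3) -> None:
        self.k = k

    def __call__(self, query: str) -> list[str]:
        # Look up the retriever at call time, so configuration can happen after construction
        if settings.rm is None:
            raise RuntimeError("no retriever configured")
        return settings.rm(query, self.k)


CORPUS = [
    "diff is a computer program that compares files",
    "the danube is a river in europe",
    "counterspy is one of many spyware removal programs",
    "the alps are a mountain range",
]


def keyword_rm(query: str, k: int) -> list[str]:
    """Rank passages by how many query words they share (a toy, not vector search)."""
    words = set(query.lower().split())
    ranked = sorted(CORPUS, key=lambda doc: len(words & set(doc.split())), reverse=True)
    return ranked[:k]


configure(keyword_rm)
print(ToyRetrieve()("computer programs"))
```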

### Trying out the retriever

We can query our retriever with the `dspy.Retrieve` class, just as DSPy RAG pipelines do.

In [6]:
retrieve = dspy.Retrieve()

retrieve("Some computer programs.")

Prediction(
    passages=['CounterSpy is a proprietary spyware removal program for Microsoft Windows software developed by Sunbelt Software.', 'In computing, the diff utility is a data comparison tool that calculates and displays the differences between two files. Unlike edit distance notions used for other purposes, diff is line-oriented rather than character-oriented, but it is like Levenshtein distance in that it tries to determine the smallest set of deletions and insertions to create one file from the other.', "AudioDesk is an audio workstation application by Mark of the Unicorn (MOTU) for the Mac OS. It is a multi-track recording, editing, and mixing application, with both offline file-based processing and realtime effects. It is a more basic version of MOTU's Digital Performer  DAW software. Much of the graphical user interface (GUI) and its operation are similar to Digital Performer, although it lacks some of Digital Performer's features."]
)

The retriever returns passages from our Qdrant collection that are relevant to the query.