### dspy.Retriever Interface
This notebook demonstrates how to use the `dspy.Retriever` interface for custom retriever integrations.

**Supported Integrations**: ColBERTv2, Databricks, FAISS, Milvus, Pinecone, Weaviate


In [1]:
import dspy


colbert = dspy.ColBERTv2(url='http://20.102.90.50:2017/wiki17_abstracts')
results = colbert("What's the name of the castle that David Gregory inherited?", k=5).passages
print(results)

[{'text': 'David Gregory (physician) | David Gregory (20 December 1625 – 1720) was a Scottish physician and inventor. His surname is sometimes spelt as Gregorie, the original Scottish spelling. He inherited Kinnairdy Castle in 1664. Three of his twenty-nine children became mathematics professors. He is credited with inventing a military cannon that Isaac Newton described as "being destructive to the human species". Copies and details of the model no longer exist. Gregory\'s use of a barometer to predict farming-related weather conditions led him to be accused of witchcraft by Presbyterian ministers from Aberdeen, although he was never convicted.', 'pid': 3296134, 'rank': 1, 'score': 23.595149993896484, 'prob': 0.9494133657446476, 'long_text': 'David Gregory (physician) | David Gregory (20 December 1625 – 1720) was a Scottish physician and inventor. His surname is sometimes spelt as Gregorie, the original Scottish spelling. He inherited Kinnairdy Castle in 1664. Three of his twenty-nine

To create and configure a custom retriever, subclass `dspy.Retriever` and implement its `forward` method.


In [None]:
class MyRetriever(dspy.Retriever):
    def __init__(self, embedder: dspy.Embedding, k: int = 5, cache: bool = False):
        # embedder: a dspy.Embedding that specifies (1) the embedding model and/or (2) an embedding function used to compute embeddings.
        # k: number of top results to return per query.
        # cache: whether to cache query results (disabled by default).
        super().__init__(embedder=embedder, k=k, cache=cache)

    def forward(self, query, k):
        embeddings = self.embedder([query])

        # Add custom logic here if your retriever uses its own client, index, vector store, etc.
        results = [
            {"passage": f"Mock passage {i+1} for query '{query}'", "doc_id": i, "score": 1.0 / (i + 1)}
            for i in range(k)
        ]
        passages = [res["passage"] for res in results]
        doc_ids = [res["doc_id"] for res in results]
        scores = [res["score"] for res in results]
        return dspy.Prediction(passages=passages, doc_ids=doc_ids, scores=scores)

embedder = dspy.Embedding(embedding_model="openai/text-embedding-3-small")
retriever = MyRetriever(embedder=embedder, k=10, cache=True)

result = retriever("What's the name of the castle that David Gregory inherited?")
print(result.passages)
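
The mock list built in `forward` stands in for a real lookup. As a rough illustration of what that custom logic could look like, the sketch below ranks a small in-memory corpus by cosine similarity and returns the same three fields (`passages`, `doc_ids`, `scores`). It is plain Python and does not call DSPy. The names `cosine` and `top_k_by_cosine` and the hand-written two-dimensional vectors are hypothetical and exist only for this example. A real retriever would take its vectors from `self.embedder` and its index from your vector store.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_by_cosine(query_vec, corpus, k):
    """Return the k corpus entries most similar to query_vec, best first.

    corpus is a list of (doc_id, passage, vector) tuples.
    """
    scored = [(cosine(query_vec, vec), doc_id, passage) for doc_id, passage, vec in corpus]
    scored.sort(key=lambda item: item[0], reverse=True)
    top = scored[:k]
    return {
        "passages": [passage for _, _, passage in top],
        "doc_ids": [doc_id for _, doc_id, _ in top],
        "scores": [score for score, _, _ in top],
    }


# Toy vectors written by hand for illustration; they are not real embeddings.
corpus = [
    (0, "He inherited Kinnairdy Castle in 1664.", [1.0, 0.0]),
    (1, "Three of his twenty-nine children became mathematics professors.", [0.0, 1.0]),
    (2, "David Gregory was a Scottish physician and inventor.", [0.9, 0.1]),
]
hits = top_k_by_cosine([1.0, 0.0], corpus, k=2)
print(hits["doc_ids"])
```

Inside `MyRetriever.forward`, the returned dictionary could be unpacked into `dspy.Prediction(**hits)` in place of the mock results, assuming the query vector comes from `self.embedder([query])`.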

With this integration, you can easily configure retrievers in DSPy programs and pipelines.

The following example defines a multi-hop program that composes two `dspy.ChainOfThought` modules with a retriever.

In [None]:
class MultiHop(dspy.Module):
    def __init__(self, retriever, passages_per_hop):
        super().__init__()
        # `retriever` is an already-constructed instance, so store it as-is
        # and pass k at call time instead of calling it here without a query.
        self.retrieve = retriever
        self.passages_per_hop = passages_per_hop
        self.generate_query = dspy.ChainOfThought("context, question -> search_query")
        self.generate_answer = dspy.ChainOfThought("context, question -> answer")

    def forward(self, question):
        context = []
        for _ in range(2):
            query = self.generate_query(context=context, question=question).search_query
            context += self.retrieve(query, k=self.passages_per_hop).passages
        return dspy.Prediction(
            context=context,
            answer=self.generate_answer(context=context, question=question).answer,
        )
    
embedder = dspy.Embedding(embedding_model="openai/text-embedding-3-small")

multihop = MultiHop(retriever=MyRetriever(embedder=embedder, cache=True), passages_per_hop=3)
result = multihop(question="What's the name of the castle that David Gregory inherited?")

print(result.context)
print(result.answer)
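
Because each hop appends to `context`, the same passage can appear more than once when two hops retrieve overlapping results. One optional way to handle this, which is not part of the example above, is an order-preserving de-duplication step applied before the answer is generated. The helper name `dedupe_keep_order` and the placeholder passages are hypothetical.

```python
def dedupe_keep_order(passages):
    """Drop repeated passages while keeping the first occurrence of each."""
    seen = set()
    unique = []
    for passage in passages:
        if passage not in seen:
            seen.add(passage)
            unique.append(passage)
    return unique


# Placeholder passages standing in for the results of two hops.
hop_1 = ["passage A", "passage B"]
hop_2 = ["passage B", "passage C"]
context = dedupe_keep_order(hop_1 + hop_2)
print(context)  # ['passage A', 'passage B', 'passage C']
```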