
Confusion about return types of retrievers #998

@wxywb


When we set an RM in dspy and call dspy.Retrieve, the following lines are executed:

def retrieve(query: str, k: int, **kwargs) -> list[str]:
    """Retrieves passages from the RM for the query and returns the top k passages."""
    if not dsp.settings.rm:
        raise AssertionError("No RM is loaded.")
    passages = dsp.settings.rm(query, k=k, **kwargs)
    if not isinstance(passages, Iterable):
        # it's not an iterable yet; make it one.
        # TODO: we should unify the type signatures of dspy.Retriever
        passages = [passages]
    passages = [psg.long_text for psg in passages]
    if dsp.settings.reranker:
        passages_cs_scores = dsp.settings.reranker(query, passages)
        passages_cs_scores_sorted = np.argsort(passages_cs_scores)[::-1]
        passages = [passages[idx] for idx in passages_cs_scores_sorted]
    return passages

So, from the line passages = [psg.long_text for psg in passages], we can see that the passages returned by a retriever should be a list of dotdict objects, each with a long_text attribute.
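To make that implicit contract concrete, here is a minimal sketch. DotDict is a simplified stand-in for dspy's dotdict (not the library class itself), and extract_long_texts is a hypothetical helper that mirrors only the list comprehension from the snippet above.

```python
class DotDict(dict):
    """Simplified stand-in for a dotdict: dict keys readable as attributes."""

    def __getattr__(self, name):
        try:
            return self[name]
        except KeyError as exc:
            raise AttributeError(name) from exc


def extract_long_texts(passages):
    # Same comprehension as in retrieve(): every item must expose .long_text.
    return [psg.long_text for psg in passages]


# A list of dotdict-like objects satisfies the contract.
ok = extract_long_texts([DotDict(long_text="Paris is the capital of France.")])

# A list of plain strings does not: str has no long_text attribute.
try:
    extract_long_texts(["Paris is the capital of France."])
    plain_strings_failed = False
except AttributeError:
    plain_strings_failed = True
```

Any retriever whose items lack a long_text attribute would fail at this same step, whatever container type it returns.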

However, the base class for retrievers is declared to return a Prediction:

def forward(self, query_or_queries: Union[str, List[str]], k: Optional[int] = None, **kwargs) -> Prediction:

The official example, which uses dspy.ColBERTv2, returns List[dotdict] instead of Prediction (although it is not a dspy.Retrieve):

return [dotdict(psg) for psg in topk]

So we now have two kinds of retrievers. One kind returns List[dotdict]:

  • azureaisearch_rm.py
  • chromadb_rm.py
  • clarifai_rm.py
  • vectara_rm.py

The other kind returns dspy.Prediction:

  • databricks_rm.py
  • mongodb_atlas_rm.py
  • pinecone_rm.py
  • ragatouille_rm.py
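
One possible way to reconcile the two return shapes is to normalize them at the call site. The sketch below is an assumption, not dspy's implementation: FakePrediction stands in for dspy.Prediction, to_long_texts is a hypothetical helper, and it assumes Prediction-style retrievers expose their results under a passages attribute, which may not hold for every retriever listed above.

```python
from collections.abc import Iterable
from types import SimpleNamespace


class FakePrediction:
    """Hypothetical stand-in for dspy.Prediction; stores fields as attributes."""

    def __init__(self, **fields):
        self.__dict__.update(fields)


def to_long_texts(result):
    """Normalize a retriever result to list[str] (hypothetical helper)."""
    # Assumption: Prediction-style results keep their passages in `.passages`.
    if isinstance(result, FakePrediction):
        result = result.passages
    # Like retrieve(), wrap a single result in a list; a bare string is
    # also treated as one passage rather than iterated character by character.
    if isinstance(result, str) or not isinstance(result, Iterable):
        result = [result]
    # Accept plain strings as well as objects with a long_text attribute.
    return [p if isinstance(p, str) else p.long_text for p in result]


list_style = [SimpleNamespace(long_text="doc A"), SimpleNamespace(long_text="doc B")]
prediction_style = FakePrediction(passages=["doc A", "doc B"])

texts_from_list = to_long_texts(list_style)
texts_from_prediction = to_long_texts(prediction_style)
```

Both inputs normalize to the same list of strings, which is the shape that retrieve is annotated to return.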
