Added kwargs parameter for retrievers and text to SQL pipeline using DSPy #721

Athe-kunal · 2024-03-26T07:21:24Z

All the retrievers support metadata filtering arguments, which can be passed during dspy.Retrieve(k=num_passages) as **kwargs. This PR add just the **kwargs parameter to retriever which can be passed to the retriever client and we can sample passages based on metadata. For example in ChromaDB, we have

retrieve = dspy.Retrieve(k=3)
retrieve("What is valuation",where={'book_source': 'Investment_Valuations_techniques'})

We will only retrieve documents for the given book source and send to reranker or model for next stages in the pipeline

CShorten · 2024-03-27T02:58:38Z

This is awesome!! 🔥

@Athe-kunal any chance you would also be interested in taking a look at this - #560? I see you have also updated the dsp.retrieveEnsemble call to handle the dependency in dspy.Retrieve when configured with dspy.settings(rm=rm). I think this issue is important for how we want to return metadata from the RMs. Still need to dig deeper personally, but maybe you could keep the momentum rolling into that problem as well haha.

P.S. Please note Weaviate can also take **kwargs to interface metadata filters such as:

import weaviate.classes as wvc

jeopardy = client.collections.get("JeopardyQuestion")
response = jeopardy.query.hybrid(
   query="food",
   filters=wvc.query.Filter.by_property("round").equal("Double Jeopardy!"),
   limit=3
)

for o in response.objects:
   print(o.properties)

Athe-kunal · 2024-03-27T03:54:09Z

Hi @CShorten
Thanks for you for your response. I got started with DSPy following your videos on weaviate and DSPy. Those are really great resources to get started with DSPy, thanks for that.

Coming to PR, the retrieve pipeline just returns the long_text key. In dspy.RetrieveEnsemble, it just returns the long_text. All the RMs return a dotdict data structure, and I only checked for ChromaDB, and it returns metadata and other metrics. However, those are ignored in the dspy.RetrieveEnsemble function. I will be working on returning metadata and other metrics that the libraries can return. It should be a default behaviour (returning the metadata), IMO.

CShorten · 2024-03-27T14:30:09Z

Thank you so much for the kind words, it means a lot!

Ok fantastic, I think we are good to merge this one adding **kwargs, and maybe we can continue our discussion of the long_text key problem on #560?

CShorten · 2024-03-27T14:31:18Z

@hsm207 maybe you will find this one interesting on the use of **kwargs for accessing search APIs.

hsm207 · 2024-03-27T16:56:52Z

the docstring in the WeaviateRM's forward method need to be updated too to document that kwargs param and that all of it will be passed to the client's hybrid method 😀

Athe-kunal · 2024-03-28T05:55:50Z

Yes, we need to have documentation for all the retrievers and what extra parameters they can take. Some of the retrievers have some inputs in the __init__ method, but not sure, if it will scale well.

hsm207 · 2024-03-28T10:04:51Z

Some of the retrievers have some inputs in the __init__ method, but not sure, if it will scale well.

Just had a look at some other retrievers and I see what you mean. There are solutions we can consider but they would require a major refactoring which I don't think is appropriate at this stage of the project. But good that you brought this up. Maybe the maintainers can put it in their backlog.

Athe-kunal · 2024-03-28T21:06:02Z

Yeah sure.
I think for now I am working on returning the metadata by the retriever if the retriever is returning it. The current dspy.retrieveEnsemble only returns the long_text parameter and ignores the rest of the keys in the dotdict data-structure.

Athe-kunal · 2024-03-31T05:49:13Z

@CShorten and @hsm207
I have added a text to SQL end-to-end example, please do review and give me some feedback on how to improve it

Josephrp

A great example for a popular use case of DSPy

Josephrp · 2024-03-31T08:38:52Z

the code looks good to me.
simply there are some superficial and language related comments to really communicate the use case more broadly i think.
If you would like my help, i can surely support you with information comments here on this thread for your consideration.

Athe-kunal · 2024-03-31T15:58:13Z

Yeah sure @Josephrp, please provide your feedback

CShorten · 2024-04-03T00:08:45Z

Lgtm, tested with:

import dspy
from dspy.retrieve.weaviate_rm import WeaviateRM
import weaviate
import weaviate.classes as wvc

weaviate_client = weaviate.connect_to_local()
weaviate_rm = WeaviateRM("WeaviateBlogChunk", weaviate_client=weaviate_client)
dspy.settings.configure(rm=weaviate_rm)

filters=wvc.query.Filter.by_property("content").contains_all(["sql"])

weaviate_rm("What is ref2vec?", filters=filters)

CShorten · 2024-04-03T01:00:15Z

Having trouble with dspy.Retrieve(k=1)("What is ref2vec", filters=filters).

Have you tested that this works @Athe-kunal? Will dive in further tomorrow.

CShorten · 2024-04-03T01:14:19Z

Apologies this error was caused by how I was testing.

Confirming, dspy.Retrieve(k=1)("What is ref2vec", filters=filters) works ✅

kwargs parameter for retrievers

dc6c539

CShorten added 2 commits March 27, 2024 10:26

Update weaviate_rm.py for search arguments in **kwargs

11c2633

Run ruff linter

0eb6a49

Merge branch 'stanfordnlp:main' into main

9a61ff7

Athe-kunal and others added 2 commits March 30, 2024 21:53

Merge branch 'stanfordnlp:main' into main

46d4cff

Add text to sql end-to-end pipeline from scratch using DSPy

65bcf70

Athe-kunal changed the title ~~Added kwargs parameter for retrievers~~ Added kwargs parameter for retrievers and text to SQL pipeline using DSPy Mar 31, 2024

Josephrp reviewed Mar 31, 2024

View reviewed changes

Athe-kunal and others added 5 commits March 31, 2024 12:15

Merge branch 'stanfordnlp:main' into main

49f65c8

Merge branch 'stanfordnlp:main' into main

be88ce6

text to SQL tutorial fixes

9e45e0a

Changed the table structure and renamed the file to financial data

98da58c

Merge branch 'stanfordnlp:main' into main

8012f5f

Missing **kwargs argument pass through in retrieveEnsemble

fdf6551

CShorten merged commit fd63306 into stanfordnlp:main Apr 3, 2024

Added kwargs parameter for retrievers and text to SQL pipeline using DSPy #721

Added kwargs parameter for retrievers and text to SQL pipeline using DSPy #721

Uh oh!

Conversation

Athe-kunal commented Mar 26, 2024

Uh oh!

CShorten commented Mar 27, 2024

Uh oh!

Athe-kunal commented Mar 27, 2024

Uh oh!

CShorten commented Mar 27, 2024

Uh oh!

CShorten commented Mar 27, 2024

Uh oh!

hsm207 commented Mar 27, 2024

Uh oh!

Athe-kunal commented Mar 28, 2024

Uh oh!

hsm207 commented Mar 28, 2024

Uh oh!

Athe-kunal commented Mar 28, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Athe-kunal commented Mar 31, 2024

Uh oh!

Josephrp left a comment

Choose a reason for hiding this comment

Uh oh!

Josephrp commented Mar 31, 2024

Uh oh!

Athe-kunal commented Mar 31, 2024

Uh oh!

CShorten commented Apr 3, 2024

Uh oh!

CShorten commented Apr 3, 2024

Uh oh!

CShorten commented Apr 3, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Athe-kunal commented Mar 28, 2024 •

edited

Loading