-
Notifications
You must be signed in to change notification settings - Fork 2.4k
Added support for LlamaIndex Retrievers as a DSPy Retriever Module #911
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| @property | ||
| def similarity_top_k(self) -> int: | ||
| """Return similarity top k of retriever.""" | ||
| return self.retriever.similarity_top_k |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not every retriever in llama-index will have this property 👀 It kind of depends. The BaseRetriever class is fairly generic, and just exposes retrieve()/aretrieve() methods
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ooh thank you for pointing this out - I'll make an adjustment shortly!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @logan-markewich for looking into this!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe I've resolved this - please review!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this looks good, but the output should be typed as Optional[int] though
Kind of curious what the motivation is to expose this on the class? Is this something that dspy can optimize later?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't recall if it can be optimized, but its a parameter that is often used with their forward method so I was looking for a way to pass it to the underlying LI retriever.
ammirsm
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the contribution and help!
The PR overal LGTM, couple small stuffs which needs attention and we can approve and merge it.
pyproject.toml
Outdated
| psycopg2 = { version = "^2.9.9", optional = true } | ||
| pgvector = { version = "^0.2.5", optional = true } | ||
| structlog = "^24.1.0" | ||
| llama-index = "^0.10.30" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't you think it should be optional?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know it is causing lots of confusion but for the dependencies right now we use 3 different places... I have a PR to fix that in a short but if it is going to be merge before that you need to add your dependencies in requirements.txt (which use for building our package), here that you added it, and dependencies in the pyproject.toml ...
more info:
#819
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm open to making the dependency optional - but wouldn't my tests fail as you pointed out if we run that route?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the best way is to tag those tests to just be ran when the llama-index is installed and we fix the CI based on that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll go with that. I'll make the adjustment today or tomorrow. I'll only run those tests when llamaindex is present.
|
|
||
|
|
||
| def test_lirm_as_rm(rag_setup): | ||
| """Test the retriever as retriever method""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this test will fail in the CI if we make that dependency optional...
properties. Updated requirements.txt.
|
There are some merge conflicts in pyproject and poetry and I think after that we should be good to go. |
I believe I've resolved this now! |
ammirsm
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
|
Thanks all! |
Given that Llama Index has some of the most wide and robust support for various retrieval abstractions and methods, adding an interface from LlamaIndex into DSPy would allow many to leverage the various retrieval classes open in LlamaIndex.
https://docs.llamaindex.ai/en/stable/api_reference/retrievers/