How to pass filter down to Chroma db when using ConversationalRetrievalChain #2095
Comments
@tomeck Doing something like this should work on the latest version: chain = ChatVectorDBChain(vectorstore=vector_store, search_kwargs={"filter": {"type": "things"}}, top_k_docs_for_context=1, ...)
|
Looking at this, it should work: https://github.com/hwchase17/langchain/blob/0bee219cb38248b7f152e44d99476183291862c7/langchain/vectorstores/base.py#L133 I will give it a try in my code. Edit: Doing this directly works: VectorStoreRetriever(vectorstore=vector_store, search_kwargs={"filter": {"type": "filter"}, "k": 1}) The ergonomics aren't great though. |
I am using the latest release 0.0.123 which is missing kwargs in the |
For me, it still complains if I pass the args to |
Master implementation of
Would something like this work?
|
@tomeck It should, but I guess you might as well just init the retriever yourself with this: VectorStoreRetriever(vectorstore=vector_store, search_kwargs={"filter": {"type": "filter"}, "k": 1}) That's what the |
OK thanks, this got me a lot further.
qa = ConversationalRetrievalChain.from_llm(
    OpenAI(temperature=0),
    VectorStoreRetriever(vectorstore=vectorstore, search_kwargs={"filter": {"tenant": "commerce-hub"}}),
    callback_manager=manager,
    verbose=True,
    return_source_documents=True) |
Don't forget to set the |
FYI - I am using Chroma as my vectorstore. I had to hack Chroma.py, specifically to change docs_and_scores = self.similarity_search_with_score(query, k, where=filter) to this docs_and_scores = self.similarity_search_with_score(query, k, filter=filter) |
@tomeck that should be fixed in the latest master |
I know this is closed, but for those of you who figured out how to filter, could you show me another example? I am trying to initialize a retriever with a filter based on the hash_code in the metadata. Basically, I am trying to build a retriever that is scoped to a single document, which is identified by its hash_code. I was trying this but had no luck:
Here is a sample of my chroma collection.
|
@perryrobinson Upgrading to the latest version of langchain (0.0.157) would help. |
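For the single-document case asked about above, here is a minimal sketch of the `search_kwargs` shape already used earlier in this thread (`{"filter": {...}, "k": ...}`). The helper name `scoped_search_kwargs` and the sample value `"abc123"` are hypothetical, and the commented-out retriever call only mirrors the earlier comments; whether the filter reaches Chroma depends on your langchain version.

```python
from typing import Any, Dict


def scoped_search_kwargs(hash_code: str, k: int = 4) -> Dict[str, Any]:
    """Build search_kwargs that restrict results to a single document.

    Hypothetical helper: it only assembles the dict shape shown earlier in
    this thread. `hash_code` is the metadata key from the question above.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    return {"filter": {"hash_code": hash_code}, "k": k}


kwargs = scoped_search_kwargs("abc123", k=2)
print(kwargs)

# Assumed usage, mirroring the earlier comments (not executed here):
# retriever = VectorStoreRetriever(vectorstore=vectorstore, search_kwargs=kwargs)
```

The metadata value has to match exactly what was stored at indexing time, so it is worth printing one stored record's metadata before blaming the filter.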
is there a way to filter by multiple file names? |
I ended up just writing my own custom retriever wrapper and it's working great |
any plans to support filtering on a list of values? like search_kwargs={"filter":{"type":["thing1", "thing2"]}}. I'm using ChatVectorDBChain with Chroma. Any hacks? |
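One possible workaround for matching several values of the same metadata field is to expand the list into Chroma's `$or` operator over `$eq` clauses. The sketch below only builds the filter dict; the helper name `any_of` is hypothetical, and it is an assumption that your langchain version forwards the `filter` entry to Chroma unchanged.

```python
from typing import Any, Dict, Sequence


def any_of(field: str, values: Sequence[Any]) -> Dict[str, Any]:
    """Build a Chroma-style metadata filter that matches any of `values`.

    Hypothetical helper, not part of langchain or chromadb.
    """
    unique = list(dict.fromkeys(values))  # drop duplicates, keep order
    if not unique:
        raise ValueError("at least one value is required")
    if len(unique) == 1:
        # A single value needs no $or; as far as I know Chroma expects
        # $or to contain at least two clauses.
        return {field: unique[0]}
    return {"$or": [{field: {"$eq": value}} for value in unique]}


search_kwargs = {"filter": any_of("type", ["thing1", "thing2"]), "k": 4}
print(search_kwargs)

# Assumed usage, mirroring the earlier comments (not executed here):
# retriever = VectorStoreRetriever(vectorstore=vectorstore, search_kwargs=search_kwargs)
```

The same helper would cover the "multiple file names" question above by passing the metadata key that holds the file name.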
Hey guys, I just figured it out: vec = VectorStoreRetriever(vectorstore=vectorstore, search_kwargs={"where_document": {"$or": [{"$contains": "search_string_1"}, {"$contains": "search_string_2"}]}}) |
If you need to pass filters with: from langchain.chains import RetrievalQA, it can be done like this: |
If I use this solution, it almost always answers "I don't know." And when I check the source_documents in the results, it gives me an empty list. Am I missing anything? |
This would work |
Does "where_document" work in ConversationalRetrievalChain? retriever = vectordb.as_retriever(
search_kwargs={
"k": 4,
"where_document": {'$contains': 'KEYWORD'}
}
)
qa = ConversationalRetrievalChain.from_llm(
llm,
retriever=retriever,
memory=self.memory,
return_source_documents=True,
combine_docs_chain_kwargs={"prompt": qa_prompt}
) |
I am also having trouble using "$contains" with the vectordb.as_retriever() function. I am using chromadb as the vectorstore. I am able to filter documents when I hardcode the search value. However, the filter no longer works if I use any of the Chroma where filters described here (https://docs.trychroma.com/usage-guide#using-where-filters), such as $contains, $in or $eq. Any help would be appreciated. |
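In Chroma's own query API, `where` filters on metadata and `where_document` filters on the document text, and `$contains` is documented as a `where_document` operator, so putting it inside the metadata `filter` may be why it is rejected. Which operators are accepted depends on the installed Chroma version. A minimal sketch that keeps the two kinds of filter apart follows; the helper name `build_search_kwargs` and the sample values are hypothetical, and passing `where_document` through `search_kwargs` follows the earlier comments in this thread rather than anything verified here.

```python
from typing import Any, Dict, Optional


def build_search_kwargs(
    k: int = 4,
    metadata_filter: Optional[Dict[str, Any]] = None,
    document_contains: Optional[str] = None,
) -> Dict[str, Any]:
    """Assemble search_kwargs with metadata and document filters kept apart.

    Hypothetical helper: "filter" carries the metadata condition and
    "where_document" carries the text condition, as in the comments above.
    """
    kwargs: Dict[str, Any] = {"k": k}
    if metadata_filter:
        kwargs["filter"] = metadata_filter
    if document_contains:
        kwargs["where_document"] = {"$contains": document_contains}
    return kwargs


search_kwargs = build_search_kwargs(
    k=4,
    metadata_filter={"tenant": "commerce-hub"},
    document_contains="KEYWORD",
)
print(search_kwargs)

# Assumed usage (not executed here):
# retriever = vectordb.as_retriever(search_kwargs=search_kwargs)
```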
Hello guys, I just want to share my experience: passing a small number, say 5, as the "k" parameter in search_kwargs to retrieve the top 5 documents from chromadb only works if you have a limited number of docs indexed in the db. Since I have more than 30000 docs, I had to set k to a number greater than 30000 (at runtime it is automatically adjusted to the max length of the docs array) so that the retriever puts the best-matching documents in the first positions of the docs array. With that setting it behaves correctly: in docs[0] I get exactly what I was searching for. So I assume that either this is a Chroma bug with big dbs, or the k parameter doesn't really mean the top documents retrieved from the whole DB. Has anyone explored what the k parameter really does for the chromadb retriever? |
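Chroma's index is an approximate nearest-neighbour structure (HNSW), so a small k is not guaranteed to return the exact best matches. One way to check whether that is what is happening is to compare the retriever's output with an exact brute-force ranking over a sample of the embeddings. The sketch below is pure Python and uses cosine similarity, which is an assumption; swap in whatever distance your collection is configured with. The function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def exact_top_k(
    query: Sequence[float], vectors: Sequence[Sequence[float]], k: int
) -> List[Tuple[int, float]]:
    """Return (index, similarity) for the k most similar vectors, best first."""
    scored = [(i, cosine_similarity(query, v)) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy data standing in for stored embeddings.
vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(exact_top_k([1.0, 0.1], vectors, k=2))
```

If the exact ranking and the retriever disagree for small k but agree for large k, the index's approximate search is the likely cause rather than the `k` argument itself.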
I need to supply a 'where' value to filter on metadata to Chromadb's similarity_search_with_score function. I can't find a straightforward way to do it. Is there some way to do it when I kick off my chain? Any hints, hacks, plans to support?