
How to pass filter down to Chroma db when using ConversationalRetrievalChain #2095

Closed · tomeck opened this issue Mar 28, 2023 · 23 comments

@tomeck commented Mar 28, 2023

I need to supply a 'where' value to Chroma's similarity_search_with_score function to filter on metadata, but I can't find a straightforward way to do it. Is there some way to pass it when I kick off my chain? Any hints, hacks, or plans to support this?

@Arttii (Contributor) commented Mar 28, 2023

@tomeck doing something like this should work on the latest version:

chain = ChatVectorDBChain(
    vectorstore=vector_store,
    search_kwargs={"filter": {"type": "things"}},
    top_k_docs_for_context=1,
    ...
)

search_kwargs should work for the other vectorstore chains as well, I think.

@suneelmatham

Since ChatVectorDBChain is being deprecated, I have been trying to use ConversationalRetrievalChain. I've been passing search_kwargs to the retriever, but it raises an unexpected keyword argument error. I'm using the latest version.

[screenshot of the traceback]

@Arttii (Contributor) commented Mar 28, 2023

Looking at this, it should work https://github.com/hwchase17/langchain/blob/0bee219cb38248b7f152e44d99476183291862c7/langchain/vectorstores/base.py#L133

I will give it a try in my code.

Edit: Doing this directly works:

retriever = VectorStoreRetriever(
    vectorstore=vector_store,
    search_kwargs={"filter": {"type": "filter"}, "k": 1},
)

The ergonomics aren't great, though.

@suneelmatham

I am using the latest release, 0.0.123, which is missing the kwargs in the as_retriever function in that same file; that's what caused this issue for me. I just noticed in the repo that it's been fixed on the master branch.

@Arttii (Contributor) commented Mar 28, 2023

For me it still complains if I pass the args to as_retriever. Maybe I have some version clashes from pulling the latest from master; I'm not certain.

@tomeck (Author) commented Mar 28, 2023

The master implementation of as_retriever takes no args:

def as_retriever(self) -> VectorStoreRetriever:
    return VectorStoreRetriever(vectorstore=self)

Would something like this work?

retriever = vectorstore.as_retriever()
retriever.search_kwargs = {"filter": {"type": "filter"}, "k": 1}
qa = ConversationalRetrievalChain.from_llm(
    OpenAI(temperature=0),
    retriever,
    callback_manager=manager,
    verbose=True,
    return_source_documents=True,
)

@Arttii (Contributor) commented Mar 28, 2023

@tomeck it should, but you might as well just init the retriever yourself:

VectorStoreRetriever(
    vectorstore=vector_store,
    search_kwargs={"filter": {"type": "filter"}, "k": 1},
)

That's what as_retriever does anyway.

@tomeck (Author) commented Mar 28, 2023

OK, thanks, this got me a lot further:

qa = ConversationalRetrievalChain.from_llm(
    OpenAI(temperature=0),
    VectorStoreRetriever(
        vectorstore=vectorstore,
        search_kwargs={"filter": {"tenant": "commerce-hub"}},
    ),
    callback_manager=manager,
    verbose=True,
    return_source_documents=True,
)

@Arttii (Contributor) commented Mar 28, 2023

Don't forget to set k if you want to get more or fewer than 4 similar documents; that's the default.

@tomeck (Author) commented Mar 28, 2023

FYI - I am using Chroma as my vectorstore. I had to hack Chroma.py, specifically similarity_search

to change

docs_and_scores = self.similarity_search_with_score(query, k, where=filter)

to this

docs_and_scores = self.similarity_search_with_score(query, k, filter=filter)

@Arttii (Contributor) commented Mar 28, 2023

@tomeck that should be fixed in the latest master

tomeck closed this as completed Mar 28, 2023
@perryrobinson commented May 2, 2023

I know this is closed, but for those of you who figured out how to filter, could you show me another example? I am trying to initialize a retriever with a filter based on the hash_code in the metadata. Basically, I am trying to build a retriever scoped to a single document, represented by its hash_code. I was trying this, but no luck:

search_kwargs = {
    'top_k': 5,
    'filters': [
        {'type': 'term', 'field': 'metadatas.hash_code', 'value': doc.hash_code}
    ]
}

Here is a sample of my Chroma collection:

{
  "ids": [
    "849f739c-313a-56e5-95be-e3da7d142766"
  ],
  "documents": [
    "blah blah blah blah"
  ],
  "metadatas": [
    {
      "hash_code": "96efcee6a43aaa8a699bbf90c1d002c35e358d1d44c08ce178a1d522c3d7d6fd",
      "source": "garbage.pdf",
      "doc_type": "pdf"
    }
  ]
}
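For comparison, the filter shape used elsewhere in this thread for Chroma-backed retrievers is a flat metadata dict, not the term/field/value list above. A hypothetical sketch of that shape, scoping retrieval to a single document by its hash_code (the hash is taken from the sample collection above; the as_retriever call is shown commented out since it needs a live vectorstore):

```python
# Hypothetical Chroma-style search_kwargs for scoping retrieval to a
# single document by its hash_code metadata (exact match), following
# the flat-dict filter shape used in the other comments here.
doc_hash = "96efcee6a43aaa8a699bbf90c1d002c35e358d1d44c08ce178a1d522c3d7d6fd"

search_kwargs = {
    "k": 5,
    "filter": {"hash_code": doc_hash},
}
# retriever = vectordb.as_retriever(search_kwargs=search_kwargs)
```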

@vibha0411

@perryrobinson Upgrading to the latest version of langchain (0.0.157) would help.

@vibha0411

Is there a way to filter by multiple file names? It looks like you can currently only have one filter per attribute.

@perryrobinson

Is there a way to filter by multiple file names? It looks like you can currently only have one filter per attribute.

I ended up just writing my own custom retriever wrapper, and it's working great.
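The commenter doesn't share their wrapper, but a minimal sketch of one way such a wrapper could look is below: it works around the one-filter-per-attribute limitation by issuing one exact-match query per source file and merging the results. All names here (MultiSourceRetriever, FakeStore) are hypothetical, and the stand-in store exists only so the example runs; the real `vectorstore` would be anything exposing similarity_search(query, k, filter=...).

```python
# Sketch of a custom retriever wrapper (hypothetical, not the
# commenter's actual code): one exact-match metadata query per
# source file, results merged and truncated to k.

class MultiSourceRetriever:
    def __init__(self, vectorstore, sources, k=4):
        self.vectorstore = vectorstore
        self.sources = sources
        self.k = k

    def get_relevant_documents(self, query):
        results = []
        for source in self.sources:
            # One flat metadata filter per call, as Chroma expects.
            results.extend(
                self.vectorstore.similarity_search(
                    query, k=self.k, filter={"source": source}
                )
            )
        return results[: self.k]

# Tiny in-memory stand-in for a vectorstore, for demonstration only.
class FakeStore:
    def __init__(self, docs):
        self.docs = docs  # list of (text, metadata) pairs

    def similarity_search(self, query, k, filter=None):
        hits = [
            text
            for text, meta in self.docs
            if all(meta.get(f) == v for f, v in (filter or {}).items())
        ]
        return hits[:k]

store = FakeStore([
    ("bikes text", {"source": "Bikes.pdf"}),
    ("ice cream text", {"source": "IceCreams.pdf"}),
    ("cars text", {"source": "Cars.pdf"}),
])
retriever = MultiSourceRetriever(store, ["Bikes.pdf", "IceCreams.pdf"], k=2)
print(retriever.get_relevant_documents("anything"))
# ['bikes text', 'ice cream text']
```

Note that merging per-filter result lists like this loses the global score ordering across sources; a fuller wrapper would merge by similarity score instead of list order.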

@vyakhya commented May 8, 2023

Any plans to support filtering on a list of values, like search_kwargs={"filter": {"type": ["thing1", "thing2"]}}? I'm using ChatVectorDBChain with Chroma. Any hacks?

@pedrobuenoxs

Hey guys, I just figured it out.

Reference

vec = VectorStoreRetriever(
    vectorstore=vectorstore,
    search_kwargs={
        "where_document": {
            "$or": [
                {"$contains": "search_string_1"},
                {"$contains": "search_string_2"},
            ]
        }
    },
)

@Amigs commented Jun 5, 2023

If you need to pass filters with from langchain.chains import RetrievalQA, you can do it like this:

retriever = vectordb.as_retriever(
    search_kwargs={
        "filter": {"source": "PDF/Others/Astronomical forcing of meter-scale organic-rich mudstone–limestone cyclicity in the Eocene Dongying sag, China Implications for shale reservoir exploration.pdf"},
        "k": 4,
    }
)
qa_chain = RetrievalQA.from_chain_type(
    llm=model,
    chain_type="stuff",
    retriever=retriever,
    return_source_documents=True,
)

@daviibf commented Jun 28, 2023

If you need to pass filters with from langchain.chains import RetrievalQA, you can do it like this: retriever = vectordb.as_retriever(search_kwargs={"filter": {"source": "PDF/Others/Astronomical forcing of meter-scale organic-rich mudstone–limestone cyclicity in the Eocene Dongying sag, China Implications for shale reservoir exploration.pdf"}, "k": 4}) qa_chain = RetrievalQA.from_chain_type(llm=model, chain_type="stuff", retriever=retriever, return_source_documents=True)

If I use this solution, it almost always answers "I don't know." And when I check source_documents in the results, it gives me an empty list. Am I missing anything?

@nareshr8 commented Jul 7, 2023

Any plans to support filtering on a list of values, like search_kwargs={"filter": {"type": ["thing1", "thing2"]}}? I'm using ChatVectorDBChain with Chroma. Any hacks?

search_kwargs={
    "filter": {
        "$or": [
            {"source": {"$eq": "./SampleDoc/Bikes.pdf"}},
            {"source": {"$eq": "./SampleDoc/IceCreams.pdf"}},
        ]
    }
}

This would work.
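To make the semantics of that $or filter concrete, here is a small illustrative evaluator for Chroma-style metadata where clauses with $or, $and, and $eq. This is a sketch of the filter semantics as documented, not Chroma's actual code:

```python
# Evaluates a Chroma-style metadata `where` clause against one
# document's metadata dict. Supports $or, $and, $eq, and the
# {field: value} equality shorthand. Illustration only.

def matches(where, metadata):
    if "$or" in where:
        return any(matches(clause, metadata) for clause in where["$or"])
    if "$and" in where:
        return all(matches(clause, metadata) for clause in where["$and"])
    # Leaf clause: {field: {"$eq": value}} or shorthand {field: value}.
    ((field, cond),) = where.items()
    if isinstance(cond, dict):
        return metadata.get(field) == cond["$eq"]
    return metadata.get(field) == cond

where = {"$or": [{"source": {"$eq": "./SampleDoc/Bikes.pdf"}},
                 {"source": {"$eq": "./SampleDoc/IceCreams.pdf"}}]}
print(matches(where, {"source": "./SampleDoc/Bikes.pdf"}))  # True
print(matches(where, {"source": "./SampleDoc/Cars.pdf"}))   # False
```

So the $or clause is exactly "match documents whose source equals either of the two paths," which answers the earlier question about filtering on a list of values.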

@PTTrazavi (Contributor)

Does "where_document" work in ConversationalRetrievalChain? My code is as follows, but it doesn't work.

retriever = vectordb.as_retriever(
    search_kwargs={
        "k": 4, 
        "where_document": {'$contains': 'KEYWORD'}
    }
)

qa = ConversationalRetrievalChain.from_llm(
    llm,
    retriever=retriever,
    memory=self.memory,
    return_source_documents=True,
    combine_docs_chain_kwargs={"prompt": qa_prompt}
)

@mLpenguin

Does "where_document" work in ConversationalRetrievalChain? My code is as follows, but it doesn't work.

retriever = vectordb.as_retriever(
    search_kwargs={
        "k": 4, 
        "where_document": {'$contains': 'KEYWORD'}
    }
)

qa = ConversationalRetrievalChain.from_llm(
    llm,
    retriever=retriever,
    memory=self.memory,
    return_source_documents=True,
    combine_docs_chain_kwargs={"prompt": qa_prompt}
)

I am also having trouble using "$contains" with the vectordb.as_retriever() function. I am using chromadb as the vectorstore.

I am able to filter documents by hardcoding the search value: vectordb.as_retriever(search_kwargs={"filter": {"source": "/../../pdf source file test.pdf"}}) returns the correct files.

However, the filter no longer works if I use any of the Chroma where filters described here (https://docs.trychroma.com/usage-guide#using-where-filters), such as $contains, $in, or $eq. E.g., vectordb.as_retriever(search_kwargs={"filter": {"source": {'$contains': 'test'}}}) returns nothing.

Any help would be appreciated.
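One likely explanation, based on Chroma's documented query API: `$contains` belongs to the `where_document` clause, which matches against the document text, while the metadata `where`/`filter` clause supports comparison operators like `$eq` and `$in` but no substring matching on metadata values. That would be why `{"filter": {"source": {"$contains": "test"}}}` returns nothing. A sketch of the distinction (hypothetical helper, not Chroma's code):

```python
# Illustrates which clause applies where, under the assumption that
# metadata filters are exact-match and $contains only applies to
# the document text via where_document.

def passes(doc_text, metadata, filter=None, where_document=None):
    if filter:
        for field, value in filter.items():
            if metadata.get(field) != value:  # exact match only
                return False
    if where_document:
        if where_document["$contains"] not in doc_text:
            return False
    return True

meta = {"source": "/docs/pdf source file test.pdf"}
# Exact-match metadata filter: works.
print(passes("hello world", meta, filter={"source": "/docs/pdf source file test.pdf"}))  # True
# Substring match against document text: works via where_document.
print(passes("hello world", meta, where_document={"$contains": "world"}))  # True
# Substring "test" is in the metadata value, but the filter is
# exact-match, so this fails, matching the behavior reported above.
print(passes("hello world", meta, filter={"source": "test"}))  # False
```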

@biagiomaf commented Jan 1, 2024

Hello guys, I just want to share my experience: passing a small number, say 5, as the "k" parameter in search_kwargs to retrieve the top 5 documents from chromadb only works if you have a limited number of docs indexed in the db.

Since I have more than 30000 docs, I had to set k to a number greater than 30000 (at runtime it is automatically adjusted down to the length of the docs array) to get the best-matching documents into the first positions of the docs array. It then does this correctly: in docs[0] I get exactly what I was searching for.

So I assume either this is a bug in Chroma for large databases, or the k parameter doesn't really select the top documents across the whole DB. Has anyone explored what the k parameter actually does for the chromadb retriever?
