Issue: chroma retriever where_document parameter passed in search_kwargs is invalid #10082

PTTrazavi · 2023-09-01T07:52:13Z

Issue you'd like to raise.

I am using chroma db as retriever in ConversationalRetrievalChain, but the parameter "where_document" does not work.

search_kwargs = {
    "k": k,
    "filter": filter,
    "where_document": {"$contains": "1000001"}
}
retriever = vectordb.as_retriever(
    search_kwargs=search_kwargs
)

The official Chroma documentation says:
Chroma supports filtering queries by metadata and document contents. The where filter is used to filter by metadata, and the where_document filter is used to filter by document contents.

Suggestion:

Can ConversationalRetrievalChain support the where_document filter for Chroma?

dosubot · 2023-09-01T07:59:54Z

🤖

Hello,

Thank you for raising this issue. Based on the context provided, it seems that you are trying to use the where_document filter with the ConversationalRetrievalChain in LangChain.

Currently, the ConversationalRetrievalChain supports the filter parameter, which is used to filter by metadata. However, the where_document filter, which is used to filter by document contents, is not directly supported.

Here is an example of how you can use the filter parameter:

chain = ConversationalRetrievalChain.from_llm(
    OpenAI(temperature=0),
    docsearch.as_retriever(search_kwargs={'filter': {'source':'pdfname'}}),
    memory=memory
)
print(chain({'question':query}))

If you need to search between multiple PDFs at once, you can pass the filter arguments along with the inputs object:

filter = [{'source': 'pdf_name'}, {'source': 'pdf_name2'}]
print(chain({'question': question, 'filter': filter}))
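As an alternative way to cover several sources in one retriever, the metadata filter itself can express the "any of these sources" condition. The sketch below assumes Chroma's `$or` operator for metadata filters and that it needs at least two clauses; `build_source_filter` and its `key` parameter are hypothetical names introduced only for illustration, and `k=4` is an arbitrary value.

```python
from typing import Any, Dict, List


def build_source_filter(sources: List[str], key: str = "source") -> Dict[str, Any]:
    """Build a Chroma-style metadata filter that matches any of the given sources.

    Hypothetical helper for illustration; not part of LangChain or Chroma.
    """
    if not sources:
        raise ValueError("at least one source is required")
    if len(sources) == 1:
        # Assumption: `$or` needs two or more clauses, so a single source
        # is expressed as a plain equality filter instead.
        return {key: sources[0]}
    return {"$or": [{key: source} for source in sources]}


# The resulting dict is meant to be passed as the "filter" entry of search_kwargs.
metadata_filter = build_source_filter(["pdf_name", "pdf_name2"])
search_kwargs = {"k": 4, "filter": metadata_filter}
```

Because the list of sources is an ordinary Python list, it can be assembled at runtime, for example from the file names found in several directories.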

However, if you want to filter by document contents, you might need to implement a custom solution. One possible workaround could be to retrieve all documents first, and then filter the results in Python using the where_document condition.
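The post-filtering workaround described above could look roughly like the following. This is a minimal sketch, not Chroma's implementation: `Doc` is a hypothetical stand-in for a retrieved document with a `page_content` attribute, and `matches_where_document` handles only a small assumed subset of the `where_document` syntax (`$contains`, `$and`, `$or`).

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Doc:
    """Hypothetical stand-in for a retrieved document."""
    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def matches_where_document(text: str, condition: Dict[str, Any]) -> bool:
    """Evaluate a where_document-style condition against a document's text."""
    if len(condition) != 1:
        raise ValueError("expected exactly one operator per condition")
    (op, arg), = condition.items()
    if op == "$contains":
        return arg in text
    if op == "$and":
        return all(matches_where_document(text, c) for c in arg)
    if op == "$or":
        return any(matches_where_document(text, c) for c in arg)
    raise ValueError(f"unsupported operator: {op}")


def filter_by_content(docs: List[Doc], condition: Dict[str, Any]) -> List[Doc]:
    """Keep only the documents whose text satisfies the condition."""
    return [d for d in docs if matches_where_document(d.page_content, condition)]


docs = [Doc("order 1000001 shipped"), Doc("order 2000002 pending")]
kept = filter_by_content(docs, {"$contains": "1000001"})
```

Filtering after retrieval can leave fewer than `k` results, so it may be worth retrieving more documents than needed before applying the content filter.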

I understand that this might not be the ideal solution, and having the where_document filter directly supported in the ConversationalRetrievalChain would be beneficial. I will raise this as a feature request to the LangChain development team.

In the meantime, I hope the above workaround can help you with your current needs. If you have any further questions or need more clarification, please feel free to ask.

Best regards,
Dosu

This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.

@hwchase17

- Description: add where_document filter parameter in Chroma
- Issue: [10082](#10082)
- Dependencies: no
- Tag maintainer: for a quicker response, tag the relevant maintainer (see below)
- Twitter handle: no

@hwchase17

Co-authored-by: Jeremy Lai <jeremy_lai@wiwynn.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>

dosubot · 2023-12-01T16:02:04Z

Hi, @PTTrazavi! I'm Dosu, and I'm here to help the LangChain team manage their backlog. I wanted to let you know that I'll be marking this issue as stale.

From what I understand, you raised an issue stating that the "where_document" parameter does not work when using chroma db as a retriever in ConversationalRetrievalChain. Dosubot responded, explaining that the ConversationalRetrievalChain currently only supports the "filter" parameter for metadata filtering, not the "where_document" filter for document contents. Dosubot provided a workaround and mentioned that they will raise this as a feature request to the LangChain development team.

If this issue is still relevant to the latest version of the LangChain repository, please let the LangChain team know by commenting on the issue. Otherwise, feel free to close the issue yourself. If there's no further activity, the issue will be automatically closed in 7 days.

Thank you for your contribution to the LangChain repository! If you have any other questions or need further assistance, feel free to ask.

ssk2706 · 2023-12-05T09:59:07Z

@dosu-bot is there any way I can dynamically add a filter that covers all the documents stored in multiple directories?

dosubot bot added the "Ɑ: vector store" (Related to vector store module) and "🤖:improvement" (Medium size change to existing code to handle new use-cases) labels Sep 1, 2023

PTTrazavi mentioned this issue Sep 5, 2023

add where_document filter for chroma #10214

Merged

richarda23 mentioned this issue Nov 7, 2023

add docs for where_document argument and add to missing methods #12971

Closed

dosubot bot added the stale Issue has not had recent activity or appears to be solved. Stale issues will be automatically closed label Dec 1, 2023

dosubot bot removed the stale Issue has not had recent activity or appears to be solved. Stale issues will be automatically closed label Dec 5, 2023

dosubot bot added the stale Issue has not had recent activity or appears to be solved. Stale issues will be automatically closed label Mar 10, 2024

dosubot bot closed this as not planned (won't fix, can't repro, duplicate, stale) Mar 17, 2024

dosubot bot removed the stale Issue has not had recent activity or appears to be solved. Stale issues will be automatically closed label Mar 17, 2024
