Issue: chroma retriever where_document parameter passed in search_kwargs is invalid #10082

Closed
PTTrazavi opened this issue Sep 1, 2023 · 3 comments
Labels
🤖:improvement Medium size change to existing code to handle new use-cases Ɑ: vector store Related to vector store module

Comments

@PTTrazavi
Contributor

Issue you'd like to raise.

I am using Chroma as the retriever in a ConversationalRetrievalChain, but the "where_document" parameter has no effect.

search_kwargs = {
    "k": k,
    "filter": filter,
    "where_document": {"$contains": "1000001"}
}
retriever = vectordb.as_retriever(
    search_kwargs=search_kwargs
)

The official Chroma documentation says:
Chroma supports filtering queries by metadata and document contents. The where filter is used to filter by metadata, and the where_document filter is used to filter by document contents.

Suggestion:

Can ConversationalRetrievalChain support the where_document filter for Chroma?

@dosubot dosubot bot added Ɑ: vector store Related to vector store module 🤖:improvement Medium size change to existing code to handle new use-cases labels Sep 1, 2023
@dosubot

dosubot bot commented Sep 1, 2023

🤖

Hello,

Thank you for raising this issue. Based on the context provided, it seems that you are trying to use the where_document filter with the ConversationalRetrievalChain in LangChain.

Currently, the ConversationalRetrievalChain supports the filter parameter, which is used to filter by metadata. However, the where_document filter, which is used to filter by document contents, is not directly supported.

Here is an example of how you can use the filter parameter:

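from langchain.chains import ConversationalRetrievalChain
from langchain.llms import OpenAI

# `docsearch` is an existing vector store and `memory` an existing chat memory object.
chain = ConversationalRetrievalChain.from_llm(
    OpenAI(temperature=0),
    # Restrict retrieval to documents whose "source" metadata equals "pdfname".
    docsearch.as_retriever(search_kwargs={"filter": {"source": "pdfname"}}),
    memory=memory,
)
print(chain({"question": query}))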

If you need to search between multiple PDFs at once, you can pass the filter arguments along with the inputs object:

filter = [{'source': 'pdf_name'}, {'source': 'pdf_name2'}]
print(chain({"question": question, 'filter': filter}))
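
As a hedged aside: Chroma's metadata filter syntax also documents an `$or` operator, so several sources can be expressed as one filter dictionary rather than a list. The following is a minimal sketch, assuming the metadata key is `source`. The helper name `build_source_filter` is hypothetical and is not part of LangChain or Chroma.

```python
from typing import Any, Dict, List


def build_source_filter(sources: List[str], key: str = "source") -> Dict[str, Any]:
    """Build a Chroma-style metadata filter matching any of the given sources.

    Hypothetical helper for illustration; not a LangChain or Chroma API.
    """
    if not sources:
        raise ValueError("at least one source is required")
    if len(sources) == 1:
        # A single condition needs no logical operator.
        return {key: sources[0]}
    # "$or" combines two or more conditions into one filter.
    return {"$or": [{key: s} for s in sources]}


metadata_filter = build_source_filter(["pdf_name", "pdf_name2"])
print(metadata_filter)
```

The resulting dictionary could then be supplied as the `filter` entry of `search_kwargs`, in the same way as the single-source example above.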

However, if you want to filter by document contents, you might need to implement a custom solution. One possible workaround could be to retrieve all documents first, and then filter the results in Python using the where_document condition.
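
The retrieve-then-filter workaround can be sketched roughly as follows. This is an illustration under stated assumptions, not LangChain's or Chroma's implementation: `Doc` is a stand-in for a retrieved document with a `page_content` attribute, and `matches` and `post_filter` are hypothetical helpers that evaluate only a small subset of where_document-style conditions (`$contains`, `$and`, `$or`).

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Doc:
    """Stand-in for a retrieved document (mirrors the `page_content` attribute)."""
    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def matches(text: str, condition: Dict[str, Any]) -> bool:
    """Evaluate a small subset of where_document-style conditions in Python."""
    if len(condition) != 1:
        raise ValueError("expected exactly one operator per condition")
    op, value = next(iter(condition.items()))
    if op == "$contains":
        return value in text
    if op == "$and":
        return all(matches(text, c) for c in value)
    if op == "$or":
        return any(matches(text, c) for c in value)
    raise ValueError(f"unsupported operator: {op}")


def post_filter(docs: List[Doc], condition: Dict[str, Any]) -> List[Doc]:
    """Keep only documents whose content satisfies the condition."""
    return [d for d in docs if matches(d.page_content, condition)]


# In practice `docs` would come from the retriever; these are made-up samples.
docs = [Doc("order 1000001 shipped"), Doc("order 2000002 pending")]
kept = post_filter(docs, {"$contains": "1000001"})
print([d.page_content for d in kept])
```

Because the filter runs after retrieval, it can return fewer than `k` documents, so a larger `k` may be needed for the initial search.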

I understand that this might not be the ideal solution, and having the where_document filter directly supported in the ConversationalRetrievalChain would be beneficial. I will raise this as a feature request to the LangChain development team.

In the meantime, I hope the above workaround can help you with your current needs. If you have any further questions or need more clarification, please feel free to ask.

Best regards,
Dosu

This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

baskaryan added a commit that referenced this issue Sep 8, 2023
- Description: add where_document filter parameter in Chroma
- Issue: [10082](#10082)
- Dependencies: no
- Tag maintainer: for a quicker response, tag the relevant maintainer (see below)
- Twitter handle: no

@hwchase17

---------

Co-authored-by: Jeremy Lai <jeremy_lai@wiwynn.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
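
With this change, the call from the original report would be expected to pass `where_document` through to Chroma. Below is a minimal sketch of assembling the arguments, assuming a LangChain version that includes this commit. `build_search_kwargs` is a hypothetical helper, and `vectordb` in the final comment is assumed to be an existing Chroma vector store as in the report.

```python
from typing import Any, Dict, Optional


def build_search_kwargs(
    k: int,
    metadata_filter: Optional[Dict[str, Any]] = None,
    contains: Optional[str] = None,
) -> Dict[str, Any]:
    """Assemble retriever search_kwargs (hypothetical helper, for illustration)."""
    kwargs: Dict[str, Any] = {"k": k}
    if metadata_filter:
        # "filter" restricts results by document metadata.
        kwargs["filter"] = metadata_filter
    if contains:
        # "where_document" restricts results by document contents.
        kwargs["where_document"] = {"$contains": contains}
    return kwargs


search_kwargs = build_search_kwargs(4, {"source": "pdfname"}, "1000001")
print(search_kwargs)
# Assumed usage, mirroring the original report:
# retriever = vectordb.as_retriever(search_kwargs=search_kwargs)
```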

dosubot bot commented Dec 1, 2023

Hi, @PTTrazavi! I'm Dosu, and I'm here to help the LangChain team manage their backlog. I wanted to let you know that I'll be marking this issue as stale.

From what I understand, you raised an issue stating that the "where_document" parameter does not work when using chroma db as a retriever in ConversationalRetrievalChain. Dosubot responded, explaining that the ConversationalRetrievalChain currently only supports the "filter" parameter for metadata filtering, not the "where_document" filter for document contents. Dosubot provided a workaround and mentioned that they will raise this as a feature request to the LangChain development team.

If this issue is still relevant to the latest version of the LangChain repository, please let the LangChain team know by commenting on the issue. Otherwise, feel free to close the issue yourself. If there's no further activity, the issue will be automatically closed in 7 days.

Thank you for your contribution to the LangChain repository! If you have any other questions or need further assistance, feel free to ask.

@dosubot dosubot bot added the stale Issue has not had recent activity or appears to be solved. Stale issues will be automatically closed label Dec 1, 2023
@ssk2706

ssk2706 commented Dec 5, 2023

@dosu-bot Is there a way to dynamically add a filter that covers all the documents stored in multiple directories?

@dosubot dosubot bot removed the stale label Dec 5, 2023
@dosubot dosubot bot added the stale label Mar 10, 2024
@dosubot dosubot bot closed this as not planned Mar 17, 2024
@dosubot dosubot bot removed the stale label Mar 17, 2024