Query Regarding Metadata-Based Filtering for Vector Search in Langchain #9645
Replies: 2 comments 3 replies
-
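🤖 Yes, LangChain can restrict retrieval to documents whose metadata matches a condition. One approach is to extend VectorStoreRetriever with a class that filters the retrieved documents by their metadata:

    # Import paths as of mid-2023 LangChain releases; they may differ in later versions.
    from typing import List
    from pydantic import Field
    from langchain.schema import Document
    from langchain.vectorstores.base import VectorStoreRetriever

    class FilteredRetriever(VectorStoreRetriever):
        vectorstore: VectorStoreRetriever  # wraps another retriever, despite the field name
        search_type: str = "similarity"
        search_kwargs: dict = Field(default_factory=dict)
        filter_prefix: str

        def get_relevant_documents(self, query: str) -> List[Document]:
            results = self.vectorstore.get_relevant_documents(query=query)
            return [doc for doc in results if doc.metadata['source'].startswith(self.filter_prefix)]

Note that this retriever runs the vector search first and only then discards results whose 'source' metadata does not start with filter_prefix, so it can return fewer documents than the underlying search produced.

And use it like this:

    filtered_retriever = FilteredRetriever(vectorstore=store.as_retriever(), filter_prefix=source_filter)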
    chain = ConversationalRetrievalChain.from_llm(
        llm=llm,
        retriever=filtered_retriever,
        memory=st.session_state.memory,
        verbose=True,
        return_source_documents=True,
    )

This solution was provided in a similar issue titled "Filtering retrieval with ConversationalRetrievalChain".

Additionally, once the documents have been added to the vector store with their metadata, you can perform a filtered vector search using the where_filter parameter of similarity_search. Here is an example of how to use this parameter:

    # Use a filter to only retrieve documents from a specific user.
    # Weaviate where filters take a path/operator/value form; the value key
    # (valueText here) depends on the data type of the property.
    weaviate.similarity_search(
        query='example query',
        where_filter={'path': ['user_id'], 'operator': 'Equal', 'valueText': '1234'}
    )

Please note that the actual implementation of the filtering based on metadata is not provided in the given context. You would need to implement this functionality yourself.
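The two approaches in this answer behave differently: a retriever like FilteredRetriever filters after the top-k search, while a store-level filter narrows the candidates before ranking. The following is a minimal, library-free sketch of that difference. The embeddings, user IDs, file names, and function names are all made up for illustration and are not part of LangChain or Weaviate.

```python
import math
from typing import Dict, List, Tuple

# Toy records: (embedding, metadata). All values are hypothetical.
Record = Tuple[List[float], Dict[str, str]]

STORE: List[Record] = [
    ([1.0, 0.0], {"user_id": "1234", "source": "a.pdf"}),
    ([0.9, 0.1], {"user_id": "5678", "source": "b.pdf"}),
    ([0.8, 0.2], {"user_id": "5678", "source": "c.pdf"}),
    ([0.0, 1.0], {"user_id": "1234", "source": "d.pdf"}),
]


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query: List[float], records: List[Record], k: int) -> List[Record]:
    """Return the k records most similar to the query."""
    return sorted(records, key=lambda r: cosine(query, r[0]), reverse=True)[:k]


def post_filter(query: List[float], user_id: str, k: int) -> List[Record]:
    """Search everything first, then drop hits belonging to other users."""
    return [r for r in top_k(query, STORE, k) if r[1]["user_id"] == user_id]


def pre_filter(query: List[float], user_id: str, k: int) -> List[Record]:
    """Keep only this user's records, then rank within that subset."""
    candidates = [r for r in STORE if r[1]["user_id"] == user_id]
    return top_k(query, candidates, k)


query = [0.7, 0.3]
after = post_filter(query, "1234", k=2)   # top-2 overall belong to user 5678
before = pre_filter(query, "1234", k=2)   # ranks only user 1234's records
print(len(after), [r[1]["source"] for r in before])
```

With this toy data, the two closest records overall belong to another user, so post-filtering leaves nothing for user 1234, whereas pre-filtering still returns that user's two documents. For per-user document sets, a filter applied inside the vector store is therefore usually the closer match to the requirement.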
-
@mohit19906 Hello, have you found an answer to your question? I am also building a kind of content generator and likewise don't need all the data to be passed into the retriever. I would really appreciate your response!
-
I have a large number of PDF documents that I would like to interact with using Langchain. My specific requirement is that I don't want all the documents to be subjected to vector search for each question. Instead, while storing the PDF files in the VectorDB, I would like to assign different sets of PDF files to different User IDs, which I believe could be achieved using Metadata.
When it comes to retrieval, my desire is for the vector search to only be performed on a specific set of PDF files associated with a particular User ID. Can Langchain filter documents based on Metadata and then perform vector search on these filtered documents?
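The storage half of this requirement can be sketched without any library: every chunk of a PDF inherits the user ID of the user the file is assigned to. The mapping, file names, chunk texts, and the tag_chunks helper below are hypothetical placeholders; in LangChain, each record would roughly correspond to a Document with page_content and metadata.

```python
from typing import Dict, List

# Hypothetical assignment of PDF files to user IDs.
USER_FILES: Dict[str, List[str]] = {
    "1234": ["contract.pdf", "invoice.pdf"],
    "5678": ["manual.pdf"],
}


def tag_chunks(
    user_files: Dict[str, List[str]],
    chunks_by_file: Dict[str, List[str]],
) -> List[dict]:
    """Attach user_id and source metadata to every chunk of every assigned file."""
    records = []
    for user_id, files in user_files.items():
        for name in files:
            for index, text in enumerate(chunks_by_file.get(name, [])):
                records.append(
                    {
                        "text": text,
                        "metadata": {"user_id": user_id, "source": name, "chunk": index},
                    }
                )
    return records


# Placeholder chunk texts standing in for the output of a PDF loader and splitter.
chunks = {
    "contract.pdf": ["clause 1", "clause 2"],
    "invoice.pdf": ["total due"],
    "manual.pdf": ["setup steps"],
}
records = tag_chunks(USER_FILES, chunks)
print(len(records))
```

Once each chunk carries a user_id in its metadata, retrieval for a given user only needs a filter on that one field.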