
save ParentDocumentRetriever in local path _🤖_ #20315

Closed
visionKinger opened this issue Apr 11, 2024 · 2 comments
Labels: 🤖:docs, Ɑ: embeddings, 🔌: openai, Ɑ: retriever, Ɑ: text splitters, Ɑ: vector store

Comments

@visionKinger

          _🤖_

Hey @nithinreddyyyyyy, back for more adventures in code, I see! Always a pleasure to dive into these challenges with you. 🚀

Based on the information you provided and the similar issue I found in the LangChain repository, you can use FAISS for ParentDocumentRetriever in the LangChain framework as follows:

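import faiss
from langchain.retrievers import ParentDocumentRetriever
from langchain.storage import InMemoryStore
from langchain_community.docstore.in_memory import InMemoryDocstore
from langchain_community.embeddings.openai import OpenAIEmbeddings
from langchain_community.vectorstores.faiss import FAISS
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Initialize the embeddings and an empty FAISS vector store for the child chunks
embeddings = OpenAIEmbeddings()
dim = len(embeddings.embed_query("dimension probe"))  # embedding size for the index
vectorstore = FAISS(
    embedding_function=embeddings,
    index=faiss.IndexFlatL2(dim),
    docstore=InMemoryDocstore(),
    index_to_docstore_id={},
)

# Key-value store that holds the full parent chunks
store = InMemoryStore()

# Define the child and parent splitters
child_splitter = RecursiveCharacterTextSplitter(chunk_size=400, chunk_overlap=50)
parent_splitter = RecursiveCharacterTextSplitter(chunk_size=1200, chunk_overlap=300)

# Initialize the ParentDocumentRetriever with FAISS
parent_document_retriever = ParentDocumentRetriever(
    vectorstore=vectorstore,
    docstore=store,
    child_splitter=child_splitter,
    parent_splitter=parent_splitter,
)

# Add documents to the retriever (`documents` is your list of Document objects)
parent_document_retriever.add_documents(documents, ids=None)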

This code snippet includes the use of child_splitter and parent_splitter with FAISS as requested by the user. You can find more details in the related issue.
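To make the mechanism behind the snippet concrete, here is a stdlib-only toy of the parent/child idea: small child chunks are indexed for matching, but the larger parent chunks they were cut from are returned. It is a simplified sketch, not LangChain's implementation; the class name `ToyParentDocumentRetriever`, the word-count splitter and the bag-of-words similarity are all hypothetical stand-ins for the text splitters, the embedding model and the FAISS index.

```python
import math
from collections import Counter


def split_words(text: str, size: int) -> list[str]:
    # Hypothetical splitter: fixed-size chunks of `size` words (stand-in for RecursiveCharacterTextSplitter)
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def embed(text: str) -> Counter:
    # Hypothetical "embedding": a bag-of-words count vector (stand-in for OpenAIEmbeddings)
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyParentDocumentRetriever:
    """Index small child chunks, but return the larger parent chunks they came from."""

    def __init__(self, parent_size: int, child_size: int) -> None:
        self.parent_size = parent_size
        self.child_size = child_size
        self.docstore: dict[str, str] = {}  # parent_id -> parent text (role of `docstore=store`)
        self.vectors: list[tuple[Counter, str]] = []  # (child vector, parent_id), role of the vector store

    def add_documents(self, documents: list[str]) -> None:
        for d, doc in enumerate(documents):
            for p, parent in enumerate(split_words(doc, self.parent_size)):
                parent_id = f"{d}-{p}"
                self.docstore[parent_id] = parent
                # Each child chunk remembers which parent it was cut from
                for child in split_words(parent, self.child_size):
                    self.vectors.append((embed(child), parent_id))

    def invoke(self, query: str, k: int = 4) -> list[str]:
        # Rank child chunks against the query and keep the top k ...
        q = embed(query)
        ranked = sorted(self.vectors, key=lambda item: cosine(q, item[0]), reverse=True)
        # ... then map them back to unique parents, preserving rank order
        parent_ids: list[str] = []
        for _, parent_id in ranked[:k]:
            if parent_id not in parent_ids:
                parent_ids.append(parent_id)
        return [self.docstore[pid] for pid in parent_ids]


retriever = ToyParentDocumentRetriever(parent_size=6, child_size=3)
retriever.add_documents(["cats purr softly at night dogs bark loudly at dawn"])
print(retriever.invoke("loudly dawn", k=2))  # ['bark loudly at dawn']
```

Both top-ranked children ("dawn" and "bark loudly at") come from the same parent, so only one parent is returned. The sketch also shows that the retriever's state lives in two places, the child-chunk index and the parent docstore; the real `ParentDocumentRetriever` likewise takes both as separate constructor arguments (`vectorstore` and `docstore`).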

For retrieving relevant documents with additional metadata and filtering capabilities, you can follow the pattern shown in the test cases for the FAISS class:

from langchain_community.embeddings.openai import OpenAIEmbeddings
from langchain_community.vectorstores.faiss import FAISS

# A list of texts plus one metadata dict per text
texts = ["text1", "text2", "text3"]
metadatas = [{"page": i} for i in range(len(texts))]

# from_texts takes an Embeddings object and embeds the texts itself;
# for precomputed vectors, use FAISS.from_embeddings instead
embeddings = OpenAIEmbeddings()

# Create a FAISS instance from the texts, embeddings, and metadatas
docsearch = FAISS.from_texts(texts, embeddings, metadatas=metadatas)

# Now you can use this FAISS instance to retrieve relevant documents with filtering
query = "some query"
filter_criteria = {"page": 1}  # or use a lambda function for more complex filtering
relevant_docs = docsearch.similarity_search(query, k=1, filter=filter_criteria)

In this example, relevant_docs will contain the most relevant document to the query that also matches the filter criteria. You can adjust the k parameter to retrieve more documents and the filter parameter to apply different filtering rules. For asynchronous use cases, you can use the await FAISS.afrom_texts and await docsearch.asimilarity_search methods with the same parameters. You can find more details in the test cases for the FAISS class.
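The interplay of `k` and `filter` described above can be illustrated with a stdlib-only sketch. It is a simplification, not the FAISS wrapper's code: vectors are given directly instead of coming from an embedding model, filtering is exact metadata equality or a callable, and every candidate is filtered before ranking (the real `similarity_search` instead over-fetches `fetch_k` candidates from the index and then filters them). The names `matches`, `similarity_search` and `MetadataFilter` here are hypothetical.

```python
import math
from typing import Callable, Optional, Sequence, Union

# Hypothetical filter type: exact-match dict, predicate on metadata, or no filter
MetadataFilter = Optional[Union[dict, Callable[[dict], bool]]]


def matches(metadata: dict, flt: MetadataFilter) -> bool:
    if flt is None:
        return True
    if callable(flt):
        return flt(metadata)
    # Every key in the filter must be present with an equal value
    return all(metadata.get(key) == value for key, value in flt.items())


def similarity_search(
    query_vec: Sequence[float],
    vectors: list[Sequence[float]],
    metadatas: list[dict],
    k: int = 4,
    flt: MetadataFilter = None,
) -> list[int]:
    # Euclidean (L2) distance, smaller is closer, as in a flat L2 index
    scored = sorted(
        (math.dist(query_vec, vec), i)
        for i, vec in enumerate(vectors)
        if matches(metadatas[i], flt)
    )
    # Return the positions of the k closest vectors that passed the filter
    return [i for _, i in scored[:k]]


vectors = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
metadatas = [{"page": i} for i in range(len(vectors))]
query = [0.9, 0.0]
print(similarity_search(query, vectors, metadatas, k=1))  # [1]
print(similarity_search(query, vectors, metadatas, k=1, flt={"page": 2}))  # [2]
print(similarity_search(query, vectors, metadatas, k=2, flt=lambda m: m["page"] != 1))  # [0, 2]
```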

The FAISS.from_texts method in the LangChain framework is a class method that constructs a FAISS (Facebook AI Similarity Search) wrapper from raw documents. This method is a user-friendly interface that embeds documents, creates an in-memory docstore, and initializes the FAISS database. It is intended to be a quick way to get started with the framework. You can find more details in the FAISS class source code.
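The steps listed above (embed the texts, fill an in-memory docstore, initialize the index) can be sketched in plain Python. `ToyFAISS` and `toy_embed` are hypothetical; the fields only mirror the roles of the wrapper's `index`, `docstore` and `index_to_docstore_id` attributes, a flat list of vectors stands in for a real FAISS index, and raw `(text, metadata)` tuples stand in for `Document` objects.

```python
import uuid
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class ToyFAISS:
    embed: Callable[[str], list[float]]
    index: list[list[float]] = field(default_factory=list)  # row i holds vector i
    docstore: dict[str, tuple[str, dict]] = field(default_factory=dict)  # doc id -> (text, metadata)
    index_to_docstore_id: dict[int, str] = field(default_factory=dict)  # row i -> doc id

    @classmethod
    def from_texts(
        cls,
        texts: list[str],
        embed: Callable[[str], list[float]],
        metadatas: Optional[list[dict]] = None,
    ) -> "ToyFAISS":
        store = cls(embed=embed)
        metadatas = metadatas or [{} for _ in texts]
        for text, metadata in zip(texts, metadatas):
            doc_id = str(uuid.uuid4())  # random id per document
            store.index_to_docstore_id[len(store.index)] = doc_id  # link the next row to this id
            store.index.append(embed(text))  # embed the text and add it to the index
            store.docstore[doc_id] = (text, metadata)  # keep the raw text in memory
        return store


def toy_embed(text: str) -> list[float]:
    # Hypothetical 2-d "embedding": text length and number of "a" characters
    return [float(len(text)), float(text.count("a"))]


db = ToyFAISS.from_texts(["a", "bb"], toy_embed, metadatas=[{"page": 0}, {"page": 1}])
print(db.index)  # [[1.0, 1.0], [2.0, 0.0]]
print(db.docstore[db.index_to_docstore_id[1]])  # ('bb', {'page': 1})
```

The `index_to_docstore_id` mapping is what lets a search, which only returns row positions, be turned back into stored texts and metadata.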

I hope this helps! If you have any further questions, feel free to ask.


This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.


Originally posted by @dosubot[bot] in #17576 (comment)


dosubot bot commented Apr 11, 2024

Hey there, @visionKinger! 🚀 I'm Dosu, your friendly bot assistant here to help you tackle bugs, answer your questions, and guide you into becoming a contributor while you're waiting for a human maintainer to chime in. Just think of me as your LangChain sidekick. I'm diving into your issue now and will get back to you with a full answer shortly. Stay tuned!

@visionKinger
Author

I want to know whether a ParentDocumentRetriever can be saved to a local path, and if so, how to save and load it.

@dosubot bot added the labels Ɑ: embeddings, Ɑ: retriever, Ɑ: text splitters, Ɑ: vector store, 🔌: openai, 🤖:docs on Apr 11, 2024
@dosubot bot added the stale label on Jul 11, 2024
@dosubot bot closed this as not planned (won't fix, can't repro, duplicate, stale) on Jul 18, 2024
@dosubot bot removed the stale label on Jul 18, 2024