DOC: RAG Search example validation error #17039

Closed
lepascal opened this issue Feb 5, 2024 · 4 comments
Labels
🤖:bug (Related to a bug, vulnerability, unexpected error with an existing feature) · 🔌: openai (Primarily related to OpenAI integrations) · Ɑ: vector store (Related to vector store module)

Comments

@lepascal

lepascal commented Feb 5, 2024

Issue with current documentation:

I am going through the examples in the documentation in a JupyterLab notebook. I'm running the code from here:

# Requires:
# pip install langchain docarray tiktoken langchain-openai

from langchain_community.vectorstores import DocArrayInMemorySearch
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnableParallel, RunnablePassthrough
from langchain_openai.chat_models import ChatOpenAI
from langchain_openai.embeddings import OpenAIEmbeddings

vectorstore = DocArrayInMemorySearch.from_texts(
    ["harrison worked at kensho", "bears like to eat honey"],
    embedding=OpenAIEmbeddings(),
)
retriever = vectorstore.as_retriever()

template = """Answer the question based only on the following context:
{context}

Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)
model = ChatOpenAI()
output_parser = StrOutputParser()

setup_and_retrieval = RunnableParallel(
    {"context": retriever, "question": RunnablePassthrough()}
)
chain = setup_and_retrieval | prompt | model | output_parser

chain.invoke("where did harrison work?")

I'm getting this error message on the chain.invoke():

ValidationError: 2 validation errors for DocArrayDoc
text
  Field required [type=missing, input_value={'embedding': [-0.0192381..., 0.010137099064823456]}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.6/v/missing
metadata
  Field required [type=missing, input_value={'embedding': [-0.0192381..., 0.010137099064823456]}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.6/v/missing

Is the code in the docs obsolete or is this a problem of my setup? I'm using langchain 0.1.5 and Python 3.11.
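
One quick way to rule out a stale install is to print the versions that actually resolve inside the notebook kernel, e.g. with the standard importlib.metadata module:

from importlib.metadata import version

# Print the versions the running kernel actually resolves
for pkg in ("langchain", "langchain-community", "langchain-core", "pydantic", "docarray"):
    print(pkg, version(pkg))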



dosubot bot commented Feb 5, 2024

🤖

Based on the error message you provided, it seems the DocArrayDoc model that backs the DocArrayInMemorySearch vector store requires text and metadata fields, but a document is being constructed without them.

DocArrayDoc is a Pydantic model used in the LangChain library to represent a document in the DocArrayInMemorySearch vector store. Its required fields are text and metadata: text holds the actual content of the document, and metadata holds additional information about it.

The from_texts() method is supposed to populate these fields when it converts a list of texts into DocArrayDoc objects, but judging by the error, they are being treated as missing.
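
For intuition, the schema implied by the error looks roughly like this (a hypothetical sketch: the real class is generated dynamically inside langchain_community, so the exact field types here are assumptions):

from typing import Any, Dict, Optional
from pydantic import BaseModel

# Hypothetical sketch of the DocArrayDoc schema implied by the error message;
# the real model is built dynamically and also carries docarray machinery.
class DocArrayDoc(BaseModel):
    text: Optional[str]                  # no explicit default
    metadata: Optional[Dict[str, Any]]   # no explicit default
    embedding: Any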

Here's a possible workaround: build the documents yourself with explicit text and metadata, and construct the vector store from them. A sketch using the standard Document class and the from_documents() constructor that DocArrayInMemorySearch inherits:

from langchain_community.vectorstores import DocArrayInMemorySearch
from langchain_core.documents import Document
from langchain_openai.embeddings import OpenAIEmbeddings

texts = ["harrison worked at kensho", "bears like to eat honey"]
embedding = OpenAIEmbeddings()

# Build Document objects with explicit page_content and metadata
docs = [Document(page_content=text, metadata={}) for text in texts]

# from_documents() embeds the texts and indexes them in one step
vectorstore = DocArrayInMemorySearch.from_documents(docs, embedding)

In this code, we create a list of Document objects, each with a page_content field (the actual text) and a metadata field (an empty dictionary here), and then build the DocArrayInMemorySearch vector store from that list. from_documents() computes the embeddings itself, so there is no need to embed the texts by hand.
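
With the store built this way, a direct similarity check should confirm retrieval works before wiring it back into the chain:

# Quick sanity check on the rebuilt store (same query as the chain)
results = vectorstore.similarity_search("where did harrison work?")
print(results[0].page_content)  # expected: "harrison worked at kensho"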

Please note that this is a workaround and might not be the best solution. The issue might be due to a bug in the from_texts() method or a problem with your setup. If this workaround doesn't solve your problem, please provide more information about your setup and the steps you followed before encountering this error.

I hope this helps! If you have any other questions, feel free to ask.

For more information about the DocArrayDoc object and the DocArrayInMemorySearch vector store, you can check the source code here: libs/community/langchain_community/vectorstores/docarray/base.py

-Dosu


@dosubot dosubot bot added Ɑ: vector store Related to vector store module 🤖:bug Related to a bug, vulnerability, unexpected error with an existing feature 🔌: openai Primarily related to OpenAI integrations labels Feb 5, 2024
@lepascal lepascal changed the title DOC: <Please write a comprehensive title after the 'DOC: ' prefix>RAG Search example validation error DOC: RAG Search example validation error Feb 5, 2024
@keenborder786
Contributor

I tried the exact same code and everything is working fine on my end. Could you please upgrade your langchain_community and langchain_core versions? That might be the problem.
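
For reference, upgrading all three in one go looks like this (standard pip, package names as published on PyPI):

pip install -U langchain langchain-community langchain-core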

@lepascal
Author

lepascal commented Feb 6, 2024

Strange, it doesn't work for me. My versions:

langchain 0.1.5
langchain-community 0.0.17
langchain-core 0.1.18
Python 3.9.6

Full error:

---------------------------------------------------------------------------
ValidationError                           Traceback (most recent call last)
Cell In[12], line 32
     27 setup_and_retrieval = RunnableParallel(
     28     {"context": retriever, "question": RunnablePassthrough()}
     29 )
     30 chain = setup_and_retrieval | prompt | model | output_parser
---> 32 chain.invoke("where did harrison work?")

File ~/Library/Python/3.9/lib/python/site-packages/langchain_core/runnables/base.py:2053, in RunnableSequence.invoke(self, input, config)
   2051 try:
   2052     for i, step in enumerate(self.steps):
-> 2053         input = step.invoke(
   2054             input,
   2055             # mark each step as a child run
   2056             patch_config(
   2057                 config, callbacks=run_manager.get_child(f"seq:step:{i+1}")
   2058             ),
   2059         )
   2060 # finish the root run
   2061 except BaseException as e:

File ~/Library/Python/3.9/lib/python/site-packages/langchain_core/runnables/base.py:2692, in RunnableParallel.invoke(self, input, config)
   2679     with get_executor_for_config(config) as executor:
   2680         futures = [
   2681             executor.submit(
   2682                 step.invoke,
   (...)
   2690             for key, step in steps.items()
   2691         ]
-> 2692         output = {key: future.result() for key, future in zip(steps, futures)}
   2693 # finish the root run
   2694 except BaseException as e:

File ~/Library/Python/3.9/lib/python/site-packages/langchain_core/runnables/base.py:2692, in <dictcomp>(.0)
   2679     with get_executor_for_config(config) as executor:
   2680         futures = [
   2681             executor.submit(
   2682                 step.invoke,
   (...)
   2690             for key, step in steps.items()
   2691         ]
-> 2692         output = {key: future.result() for key, future in zip(steps, futures)}
   2693 # finish the root run
   2694 except BaseException as e:

File /Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/concurrent/futures/_base.py:445, in Future.result(self, timeout)
    443     raise CancelledError()
    444 elif self._state == FINISHED:
--> 445     return self.__get_result()
    446 else:
    447     raise TimeoutError()

File /Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/concurrent/futures/_base.py:390, in Future.__get_result(self)
    388 if self._exception:
    389     try:
--> 390         raise self._exception
    391     finally:
    392         # Break a reference cycle with the exception in self._exception
    393         self = None

File /Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/concurrent/futures/thread.py:52, in _WorkItem.run(self)
     49     return
     51 try:
---> 52     result = self.fn(*self.args, **self.kwargs)
     53 except BaseException as exc:
     54     self.future.set_exception(exc)

File ~/Library/Python/3.9/lib/python/site-packages/langchain_core/retrievers.py:121, in BaseRetriever.invoke(self, input, config, **kwargs)
    117 def invoke(
    118     self, input: str, config: Optional[RunnableConfig] = None, **kwargs: Any
    119 ) -> List[Document]:
    120     config = ensure_config(config)
--> 121     return self.get_relevant_documents(
    122         input,
    123         callbacks=config.get("callbacks"),
    124         tags=config.get("tags"),
    125         metadata=config.get("metadata"),
    126         run_name=config.get("run_name"),
    127         **kwargs,
    128     )

File ~/Library/Python/3.9/lib/python/site-packages/langchain_core/retrievers.py:224, in BaseRetriever.get_relevant_documents(self, query, callbacks, tags, metadata, run_name, **kwargs)
    222 except Exception as e:
    223     run_manager.on_retriever_error(e)
--> 224     raise e
    225 else:
    226     run_manager.on_retriever_end(
    227         result,
    228     )

File ~/Library/Python/3.9/lib/python/site-packages/langchain_core/retrievers.py:217, in BaseRetriever.get_relevant_documents(self, query, callbacks, tags, metadata, run_name, **kwargs)
    215 _kwargs = kwargs if self._expects_other_args else {}
    216 if self._new_arg_supported:
--> 217     result = self._get_relevant_documents(
    218         query, run_manager=run_manager, **_kwargs
    219     )
    220 else:
    221     result = self._get_relevant_documents(query, **_kwargs)

File ~/Library/Python/3.9/lib/python/site-packages/langchain_core/vectorstores.py:654, in VectorStoreRetriever._get_relevant_documents(self, query, run_manager)
    650 def _get_relevant_documents(
    651     self, query: str, *, run_manager: CallbackManagerForRetrieverRun
    652 ) -> List[Document]:
    653     if self.search_type == "similarity":
--> 654         docs = self.vectorstore.similarity_search(query, **self.search_kwargs)
    655     elif self.search_type == "similarity_score_threshold":
    656         docs_and_similarities = (
    657             self.vectorstore.similarity_search_with_relevance_scores(
    658                 query, **self.search_kwargs
    659             )
    660         )

File ~/Library/Python/3.9/lib/python/site-packages/langchain_community/vectorstores/docarray/base.py:127, in DocArrayIndex.similarity_search(self, query, k, **kwargs)
    115 def similarity_search(
    116     self, query: str, k: int = 4, **kwargs: Any
    117 ) -> List[Document]:
    118     """Return docs most similar to query.
    119 
    120     Args:
   (...)
    125         List of Documents most similar to the query.
    126     """
--> 127     results = self.similarity_search_with_score(query, k=k, **kwargs)
    128     return [doc for doc, _ in results]

File ~/Library/Python/3.9/lib/python/site-packages/langchain_community/vectorstores/docarray/base.py:106, in DocArrayIndex.similarity_search_with_score(self, query, k, **kwargs)
     94 """Return docs most similar to query.
     95 
     96 Args:
   (...)
    103     Lower score represents more similarity.
    104 """
    105 query_embedding = self.embedding.embed_query(query)
--> 106 query_doc = self.doc_cls(embedding=query_embedding)  # type: ignore
    107 docs, scores = self.doc_index.find(query_doc, search_field="embedding", limit=k)
    109 result = [
    110     (Document(page_content=doc.text, metadata=doc.metadata), score)
    111     for doc, score in zip(docs, scores)
    112 ]

File ~/Library/Python/3.9/lib/python/site-packages/pydantic/main.py:171, in BaseModel.__init__(self, **data)
    169 # `__tracebackhide__` tells pytest and some other tools to omit this function from tracebacks
    170 __tracebackhide__ = True
--> 171 self.__pydantic_validator__.validate_python(data, self_instance=self)

ValidationError: 2 validation errors for DocArrayDoc
text
  Field required [type=missing, input_value={'embedding': [-0.0191128..., 0.010052215411752115]}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.6/v/missing
metadata
  Field required [type=missing, input_value={'embedding': [-0.0191128..., 0.010052215411752115]}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.6/v/missing
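
The bottom frames are telling: similarity_search_with_score() builds a query document with only an embedding (query_doc = self.doc_cls(embedding=query_embedding)), and pydantic 2.6 rejects it (see the pydantic/main.py frame and the errors.pydantic.dev/2.6 links). One plausible explanation, shown below as a minimal sketch with a hypothetical stand-in class: pydantic v2 treats Optional fields without an explicit default as required, where v1 defaulted them to None.

from typing import Optional
from pydantic import BaseModel  # the behavior below is pydantic v2 only

class QueryDocSketch(BaseModel):  # hypothetical stand-in for the dynamic doc class
    text: Optional[str]       # no default: optional under v1, required under v2
    metadata: Optional[dict]  # no default: optional under v1, required under v2
    embedding: list

QueryDocSketch(embedding=[0.1, 0.2])
# pydantic v2: ValidationError: 2 validation errors for QueryDocSketch
#   text: Field required; metadata: Field required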

@lepascal
Author

lepascal commented Feb 6, 2024

I found that this happens once I install langflow. I started a fresh environment and it works.
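
For anyone who hits the same conflict, a clean setup for just this example looks like the sketch below (the environment name is arbitrary; adjust packages to taste):

python -m venv rag-example
source rag-example/bin/activate
pip install langchain langchain-community langchain-core langchain-openai docarray tiktoken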

@lepascal lepascal closed this as completed Feb 6, 2024