FileNotFoundError in PineconeVectorStore.from_documents on AWS Lambda #22325

Closed
5 tasks done
Vwake04 opened this issue May 30, 2024 · 6 comments
Labels
🔌: aws Primarily related to Amazon Web Services (AWS) integrations 🤖:bug Related to a bug, vulnerability, unexpected error with an existing feature 🔌: pinecone Primarily related to Pinecone vector store integration Ɑ: retriever Related to retriever module Ɑ: vector store Related to vector store module

Comments

@Vwake04
Contributor

Vwake04 commented May 30, 2024

Checked other resources

  • I added a very descriptive title to this issue.
  • I searched the LangChain documentation with the integrated search.
  • I used the GitHub search to find a similar question and didn't find it.
  • I am sure that this is a bug in LangChain rather than my code.
  • The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).

Example Code

from langchain_pinecone import PineconeVectorStore

# Dummy data for illustration purposes
dummy_docs = ...
dummy_embeddings = ...
dummy_index_name = ...

# Attempt to create a PineconeVectorStore retriever
vector_store_retriever = PineconeVectorStore.from_documents(dummy_docs, dummy_embeddings, index_name=dummy_index_name)

Error Message and Stack Trace (if applicable)

  File "/home/ubuntu/app.py", line 31, in _get_vector_store_retriever
    vector_store_retriever = PineconeVectorStore.from_documents(docs, self.embeddings, index_name=self.cfg.index_name)
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/langchain_core/vectorstores.py", line 550, in from_documents
    return cls.from_texts(texts, embedding, metadatas=metadatas, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/langchain_pinecone/vectorstores.py", line 441, in from_texts
    pinecone.add_texts(
  File "/home/ubuntu/langchain_pinecone/vectorstores.py", line 158, in add_texts
    async_res = [
                ^
  File "/home/ubuntu/langchain_pinecone/vectorstores.py", line 159, in <listcomp>
    self._index.upsert(
  File "/home/ubuntu/pinecone/utils/error_handling.py", line 10, in inner_func
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/pinecone/data/index.py", line 168, in upsert
    return self._upsert_batch(vectors, namespace, _check_type, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/pinecone/data/index.py", line 189, in _upsert_batch
    return self._vector_api.upsert(
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/pinecone/core/client/api_client.py", line 772, in __call__
    return self.callable(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/pinecone/core/client/api/data_plane_api.py", line 1084, in __upsert
    return self.call_with_http_info(**kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/pinecone/core/client/api_client.py", line 834, in call_with_http_info
    return self.api_client.call_api(
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/pinecone/core/client/api_client.py", line 417, in call_api
    return self.pool.apply_async(self.__call_api, (resource_path,
           ^^^^^^^^^
  File "/home/ubuntu/pinecone/core/client/api_client.py", line 103, in pool
    self._pool = ThreadPool(self.pool_threads)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/multiprocessing/pool.py", line 930, in __init__
    Pool.__init__(self, processes, initializer, initargs)
  File "/usr/local/lib/python3.11/multiprocessing/pool.py", line 196, in __init__
    self._change_notifier = self._ctx.SimpleQueue()
                            ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/multiprocessing/context.py", line 113, in SimpleQueue
    return SimpleQueue(ctx=self.get_context())
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/multiprocessing/queues.py", line 341, in __init__
    self._rlock = ctx.Lock()
                  ^^^^^^^^^^
  File "/usr/local/lib/python3.11/multiprocessing/context.py", line 68, in Lock
    return Lock(ctx=self.get_context())
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/multiprocessing/synchronize.py", line 169, in __init__
    SemLock.__init__(self, SEMAPHORE, 1, 1, ctx=ctx)
  File "/usr/local/lib/python3.11/multiprocessing/synchronize.py", line 57, in __init__
    sl = self._semlock = _multiprocessing.SemLock(
                         ^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory

Description

  • Problem: Encountering FileNotFoundError: [Errno 2] No such file or directory when trying to use PineconeVectorStore.from_documents in an AWS Lambda environment with the base image python:3.11-slim.
  • Expected Behavior: The code should initialize a PineconeVectorStore retriever without errors.
  • Actual Behavior: The initialization fails with a FileNotFoundError indicating an issue with multiprocessing in the slim Python 3.11 Docker image.
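To check whether a given environment is affected, independent of Pinecone: the traceback ends inside multiprocessing, where the client's pool property builds a ThreadPool and the underlying SemLock cannot be created. Below is a minimal probe, offered as a sketch rather than a definitive diagnosis. The helper name thread_pool_available is hypothetical, and the usual explanation (the Lambda runtime not providing the shared-memory device that POSIX semaphores rely on) is an assumption here, not something this traceback proves.

```python
from multiprocessing.pool import ThreadPool


def thread_pool_available() -> bool:
    """Return True if a multiprocessing ThreadPool can be created here.

    Hypothetical helper for diagnosis only; not part of any library.
    """
    try:
        pool = ThreadPool(1)
    except (OSError, ImportError) as exc:
        # FileNotFoundError is a subclass of OSError, so the failure seen
        # in the traceback (SemLock creation) is caught here.
        print(f"ThreadPool unavailable: {exc!r}")
        return False
    # Clean up the worker thread before reporting success.
    pool.close()
    pool.join()
    return True


if __name__ == "__main__":
    print("ThreadPool available:", thread_pool_available())
```

If this prints False inside the Lambda container but True locally, the problem is the runtime's multiprocessing support rather than the documents, embeddings, or index name.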

System Info

  • Python Version: 3.11

  • Platform: Linux (AWS Lambda with python:3.11-slim base image)

  • Installed Packages:

langchain==0.2.1
langchain-community==0.2.1
langchain-core==0.2.2
langchain-openai==0.1.8
langchain-pinecone==0.1.1
langchain-text-splitters==0.2.0
langdetect==1.0.9
langgraph==0.0.57
langsmith==0.1.63
pinecone-client==3.2.2
@dosubot dosubot bot added Ɑ: retriever Related to retriever module Ɑ: vector store Related to vector store module 🔌: aws Primarily related to Amazon Web Services (AWS) integrations 🔌: pinecone Primarily related to Pinecone vector store integration 🤖:bug Related to a bug, vulnerability, unexpected error with an existing feature labels May 30, 2024
@Vwake04
Contributor Author

Vwake04 commented May 30, 2024

I found a similar issue in the LangChain community, which was addressed and resolved through a PR by nelsonauner. However, that fix is not present in langchain_pinecone. Here is the link to the relevant PR: LangChain PR #16753.

I can open a PR to incorporate this change into langchain_pinecone.

@arpanghoshal

@Vwake04 Can you help me use PineconeVectorStore.from_documents in AWS Lambda without any error? It would be a great help for me.

@Vwake04
Contributor Author

Vwake04 commented Jun 2, 2024

> @Vwake04 Can you help me use PineconeVectorStore.from_documents in AWS Lambda without any error? It would be a great help for me.

I'm maintaining my own version of langchain_pinecone with changes from PR #16753 by @nelsonauner. Changes are in libs/partners/pinecone/langchain_pinecone/vectorstores.py. I install it as a separate dependency instead of the official version.

@darrensapalo

darrensapalo commented Jun 22, 2024

The good news is that the changes by @nelsonauner and @Vwake04 were committed to the master branch in commit 0deb98a 11 hours ago.

Now we are just waiting for the new release to show up at https://pypi.org/project/langchain-pinecone/#history.

If I understand correctly, once the changes have been published we'll just need to add a parameter async_req=False when using AWS Lambda to make it work:

vector_store = PineconeVectorStore.from_documents(
    split_documents,
    index_name=index_name,
    embedding=embeddings,
    async_req=False,
)
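If the same code also runs outside Lambda, one option is to pass async_req=False only when running in Lambda. This is a rough sketch under a few assumptions: the async_req parameter behaves as described above once the fix is published, the helper names in_aws_lambda and pinecone_upsert_kwargs are made up for illustration, and detection relies on the AWS_LAMBDA_FUNCTION_NAME environment variable that the Lambda runtime sets.

```python
import os


def in_aws_lambda() -> bool:
    """Best-effort check for the AWS Lambda runtime (hypothetical helper)."""
    # The Lambda runtime sets AWS_LAMBDA_FUNCTION_NAME for every invocation.
    return "AWS_LAMBDA_FUNCTION_NAME" in os.environ


def pinecone_upsert_kwargs() -> dict:
    """Extra keyword arguments to forward to PineconeVectorStore.from_documents.

    Assumption: the installed langchain-pinecone version accepts async_req.
    """
    return {"async_req": False} if in_aws_lambda() else {}


# Usage (not executed here, since it needs Pinecone credentials):
# vector_store = PineconeVectorStore.from_documents(
#     split_documents,
#     index_name=index_name,
#     embedding=embeddings,
#     **pinecone_upsert_kwargs(),
# )
```

Outside Lambda the helper returns an empty dict, so the call keeps whatever default the library uses.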

Temporary workaround

Get a copy of the updated file and save it in the same directory as the code that uses PineconeVectorStore, under a name such as my_own_langchain_pinecone_vector_store.py.

Then import the class from that local file. Instead of

from langchain_pinecone import PineconeVectorStore

You would use:

from my_own_langchain_pinecone_vector_store import PineconeVectorStore

Then follow the usage noted above.
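To avoid editing imports again once the official package is updated, the swap can be written as a fallback. This is only a sketch: import_first_available is a hypothetical helper, and the local module name is the example name used above.

```python
import importlib


def import_first_available(module_names, attribute):
    """Return `attribute` from the first importable module that defines it.

    Hypothetical helper: modules are tried in order, so list the local
    patched copy first and the official package second.
    """
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            # Module is not installed or not on the path; try the next one.
            continue
        if hasattr(module, attribute):
            return getattr(module, attribute)
    raise ImportError(f"{attribute!r} not found in any of {list(module_names)}")


# Usage (not executed here, since it needs one of the two modules present):
# PineconeVectorStore = import_first_available(
#     ["my_own_langchain_pinecone_vector_store", "langchain_pinecone"],
#     "PineconeVectorStore",
# )
```

With this in place, deleting the local file later makes the import fall through to the official langchain_pinecone package without further code changes.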

@arpanghoshal

arpanghoshal commented Jun 22, 2024 via email

@Vwake04
Contributor Author

Vwake04 commented Jun 22, 2024

PR merged: #22571

@Vwake04 Vwake04 closed this as completed Jun 22, 2024