
Bug after the openai updated in Embedding #13051

Closed
SGA-Yuze-Li opened this issue Nov 8, 2023 · 23 comments
Labels
🤖:bug Related to a bug, vulnerability, unexpected error with an existing feature 🔌: chroma Primarily related to ChromaDB integrations Ɑ: embeddings Related to text embedding models module

Comments

@SGA-Yuze-Li

System Info

AWS Sagemaker DataScience3.0 Image.

Who can help?

No response

Information

  • The official example notebooks/scripts
  • My own modified scripts

Related Components

  • LLMs/Chat Models
  • Embedding Models
  • Prompts / Prompt Templates / Prompt Selectors
  • Output Parsers
  • Document Loaders
  • Vector Stores / Retrievers
  • Memory
  • Agents / Agent Executors
  • Tools / Toolkits
  • Chains
  • Callbacks/Tracing
  • Async

Reproduction

Here is my code; it worked before Nov 7th.

Chroma.from_documents(documents=document, embedding=embeddings,)

Then I get this error:
ValueError: Expected EmbeddingFunction.__call__ to have the following signature: odict_keys(['self', 'input']), got odict_keys(['self', 'args', 'kwargs'])
Please see https://docs.trychroma.com/embeddings for details of the EmbeddingFunction interface.
Please note the recent change to the EmbeddingFunction interface: https://docs.trychroma.com/migration#migration-to-0416---november-7-2023

Expected behavior

Does anyone know how to fix this?
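For reference, the check behind this error can be sketched as follows. This is only an illustration reconstructed from the error message (the validator name and details are hypothetical, not Chroma's actual source): Chroma 0.4.16+ inspects the signature of an embedding function's __call__ and rejects anything that is not exactly (self, input).

```python
import inspect
from typing import List

def validate_embedding_function(fn) -> None:
    """Raise ValueError unless fn's __call__ has the (self, input) signature."""
    params = list(inspect.signature(type(fn).__call__).parameters)
    if params != ["self", "input"]:
        raise ValueError(
            "Expected EmbeddingFunction.__call__ to have the following "
            f"signature: ['self', 'input'], got {params}"
        )

class OldStyleFunction:
    # Pre-0.4.16 style: generic *args/**kwargs -- fails the check,
    # producing odict_keys(['self', 'args', 'kwargs']) in the message.
    def __call__(self, *args, **kwargs) -> List[List[float]]:
        return [[0.0]]

class NewStyleFunction:
    # 0.4.16+ style: a single named `input` parameter -- passes.
    def __call__(self, input: List[str]) -> List[List[float]]:
        return [[0.0] for _ in input]
```

So the error means the installed LangChain version is still handing Chroma an old-style callable, not that anything is wrong with the user's own code.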


dosubot bot commented Nov 8, 2023

🤖

Hello @SGA-Yuze-Li,

I'm Dosu, a bot designed to assist you with the LangChain repository. I'm here to help you navigate through bugs, answer your questions, and guide you if you'd like to contribute to the project. While we wait for a human maintainer, I'm here to provide initial assistance.

I'm looking at your issue right now and will provide a more comprehensive response shortly.

@dosubot dosubot bot added Ɑ: embeddings Related to text embedding models module 🤖:bug Related to a bug, vulnerability, unexpected error with an existing feature 🔌: chroma Primarily related to ChromaDB integrations labels Nov 8, 2023
@95

95 commented Nov 8, 2023

I've had this exact issue as well. #13050
Hoping for a resolution


dosubot bot commented Nov 8, 2023

🤖

The error you're encountering is due to a recent change in the EmbeddingFunction interface. The EmbeddingFunction.__call__ method now expects two arguments: self and input. However, it seems that in your code, the EmbeddingFunction.__call__ method is being called with self, args, and kwargs.

To resolve this issue, you need to update the way you're calling the EmbeddingFunction.__call__ method. Instead of passing args and kwargs, you should pass a single input argument.

Here's how you can do it:

from typing import List

from langchain.embeddings.base import Embeddings

class SagemakerEndpointEmbeddings(Embeddings):
    # ...

    def embed_query(self, query: str) -> List[float]:
        input = self.prepare_input(query)
        return self.embedding_function(input)

    def embed_documents(self, documents: List[str]) -> List[List[float]]:
        input = self.prepare_input(documents)
        return self.embedding_function(input)

In the prepare_input method, you should prepare the input argument in a way that is compatible with the new EmbeddingFunction.__call__ interface.

Also, you might need to adjust the predict_fn() function within the custom inference.py script to handle batched requests. Change the return line from return {"vectors": sentence_embeddings[0].tolist()} to return {"vectors": sentence_embeddings.tolist()}.
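That batching change can be sketched as below. Everything here is a stand-in for illustration (FakeModel, the _Batch wrapper, and the payload shape are hypothetical, not the real SageMaker model object or request format); the point is only the one-line fix of returning the whole batch instead of row zero.

```python
from typing import Dict, List

class _Batch(list):
    """Minimal stand-in for an embeddings array exposing .tolist()."""
    def tolist(self) -> List[List[float]]:
        return [list(row) for row in self]

class FakeModel:
    """Stand-in for a model whose encode() embeds a batch of sentences."""
    def encode(self, sentences: List[str]) -> _Batch:
        # Toy embedding: one dimension, the sentence length.
        return _Batch([[float(len(s))] for s in sentences])

def predict_fn(input_data: Dict, model) -> Dict:
    embeddings = model.encode(input_data["inputs"])
    # Return every row of the batch, not just embeddings[0]:
    return {"vectors": embeddings.tolist()}
```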

Please refer to the following sources for more information:

I hope this helps! If you have any further questions or if something is unclear, please let me know.

This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.

@thundo

thundo commented Nov 8, 2023

Same error on Colab with openai pinning to 0.28.1

@95

95 commented Nov 8, 2023

Same error on Colab with openai pinning to 0.28.1

Yes. I pinned to 0.28.1 as well, didn't help.

@sjoerd222888

Problem seems to appear with chroma update to 0.4.16

Pinning to previous version seems to help.

chromadb==0.4.15

Langchain still implements the old interface. See: https://docs.trychroma.com/migration#migration-to-0416---november-7-2023

@pa1kumar9

Problem seems to appear with chroma update to 0.4.16

Pinning to previous version seems to help.

chromadb==0.4.15

This worked, thank you!

@95

95 commented Nov 8, 2023

Problem seems to appear with chroma update to 0.4.16

Pinning to previous version seems to help.

chromadb==0.4.15

Langchain still implements the old interface. See: https://docs.trychroma.com/migration#migration-to-0416---november-7-2023

Worked perfectly. Thank you

@itsvaibhav01

Problem seems to appear with chroma update to 0.4.16

Pinning to previous version seems to help.

chromadb==0.4.15

Langchain still implements the old interface. See: https://docs.trychroma.com/migration#migration-to-0416---november-7-2023

Thanks a lot!!!

@SGA-Yuze-Li
Author

Problem seems to appear with chroma update to 0.4.16

Pinning to previous version seems to help.

chromadb==0.4.15

Langchain still implements the old interface. See: https://docs.trychroma.com/migration#migration-to-0416---november-7-2023

Thanks a lot! Worked!

@Kozato01

Kozato01 commented Nov 8, 2023

Problem seems to appear with chroma update to 0.4.16

Pinning to the previous version seems to help.

chromadb==0.4.15

Langchain still implements the old interface. See: https://docs.trychroma.com/migration#migration-to-0416---november-7-2023

My hero!

@efriis
Member

efriis commented Nov 8, 2023

Langchain 0.0.332 released with the chroma team's fix for compatibility with chromadb>=0.4.16

You can now use the latest of both: pip install -U langchain chromadb

@efriis efriis closed this as completed Nov 8, 2023
@ScottXiao233

Langchain 0.0.332 released with the chroma team's fix for compatibility with chromadb>=0.4.16

Can now use latest of both pip install -U langchain chromadb

I still hit the problem, and all the libraries are at the latest version. Here is my code:

from langchain.document_loaders import TextLoader
from langchain.text_splitter import CharacterTextSplitter
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma

loader = TextLoader('/tanglg/lx/code/Qwen/forest_file.txt')
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.split_documents(documents)
 
embeddings = OpenAIEmbeddings(
    openai_api_key="EMPTY",
    openai_api_base="http://localhost:8000/v1", 
)

forest_file_store = Chroma.from_documents(documents=texts, embedding=embeddings, collection_name="forest_file")

and here is the error info:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/tanglg/lx/code/Qwen/xly_vectordb_test.ipynb cell 3, line 12
      5 texts = text_splitter.split_documents(documents)
      7 embeddings = OpenAIEmbeddings(
      8     openai_api_key="EMPTY",
      9     openai_api_base="http://localhost:8000/v1",
     10 )
---> 12 forest_file_store = Chroma.from_documents(documents=texts, embedding=embeddings, collection_name="forest_file")

File /tanglg/commonsoft/miniconda3/envs/qwen/lib/python3.10/site-packages/langchain/vectorstores/chroma.py:684, in Chroma.from_documents(cls, documents, embedding, ids, collection_name, persist_directory, client_settings, client, collection_metadata, **kwargs)
    682 texts = [doc.page_content for doc in documents]
    683 metadatas = [doc.metadata for doc in documents]
--> 684 return cls.from_texts(
    685     texts=texts,
    686     embedding=embedding,
    687     metadatas=metadatas,
    688     ids=ids,
    689     collection_name=collection_name,
    690     persist_directory=persist_directory,
    691     client_settings=client_settings,
    692     client=client,
    693     collection_metadata=collection_metadata,
    694     **kwargs,
    695 )
...

ValueError: Expected EmbeddingFunction.__call__ to have the following signature: odict_keys(['self', 'input']), got odict_keys(['self', 'args', 'kwargs'])
Please see https://docs.trychroma.com/embeddings for details of the EmbeddingFunction interface.
Please note the recent change to the EmbeddingFunction interface: https://docs.trychroma.com/migration#migration-to-0416---november-7-2023

@TimCoates

I'm getting this bug with:
langchain V 0.0.332
chromadb V 0.4.16

persist_directory='./db'
from langchain.embeddings import GPT4AllEmbeddings
embedding=GPT4AllEmbeddings()
from langchain.vectorstores import Chroma
vectordb = Chroma.from_documents(documents=texts,
                                 embedding=embedding,
                                 persist_directory=persist_directory)

@efriis
Member

efriis commented Nov 9, 2023

@ScottXiao233 - I tried reproducing, but your code works for me. If you're using a custom openai embeddings endpoint, maybe make sure it works with the new sdk alone?

@TimCoates - This also might be something with your local embeddings setup. Maybe submit as a new issue specific to GPT4AllEmbeddings? It's unclear to me why you'd see this using something other than OpenAI

@afedotov-align

@efriis, it also doesn't work for me with
langchain==0.0.335
chromadb==0.4.17

code example

embeddings = AzureOpenAIEmbeddings(
    model="text-embedding-ada-002",
    deployment=os.getenv("EMBEDDING_DEPLOYMENT_NAME"),
    azure_endpoint=os.getenv("AZURE_ENDPOINT"),
    api_key=os.getenv("AZURE_OPENAI_KEY"),
    openai_api_version=os.getenv("AZURE_API_VERSION"),
)

persistent_client = chromadb.PersistentClient("./test")

collection = persistent_client.get_or_create_collection(
    name="test", embedding_function=embeddings.embed_documents
)

@efriis
Member

efriis commented Nov 14, 2023

@afedotov-align - looks like you might be mixing LangChain langchain.vectorstores.Chroma with directly accessing chromadb. Could you try again using the LangChain vectorstore with the LangChain embedding function? https://python.langchain.com/docs/integrations/vectorstores/chroma#basic-example

@TimCoates

@ScottXiao233 - I tried reproducing, but your code works for me. If you're using a custom openai embeddings endpoint, maybe make sure it works with the new sdk alone?

@TimCoates - This also might be something with your local embeddings setup. Maybe submit as a new issue specific to GPT4AllEmbeddings? It's unclear to me why you'd see this using something other than OpenAI

Thanks. Rather than using .from_documents, doing it like this:
vectorstore = Chroma("langchain_store", embedding_function=embeddings, persist_directory="./db")
then:
vectorstore.add_documents(texts)
worked for me.

@shreyahegde18

Problem seems to appear with chroma update to 0.4.16

Pinning to previous version seems to help.

chromadb==0.4.15

Langchain still implements the old interface. See: https://docs.trychroma.com/migration#migration-to-0416---november-7-2023

Thank you :')

@deeeed

deeeed commented Dec 4, 2023

This issue should still be open.

@amanlai

amanlai commented Feb 25, 2024

@varayush007 Try defining a custom class that sub-classes OpenAIEmbeddings and use it as the embedding function. This discussion may help.
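One way such a wrapper can look is sketched below. This is an illustrative adapter, not the class from the linked discussion: the ChromaCompatibleEmbedder name and the fake_embed stand-in (used here in place of a real OpenAIEmbeddings().embed_documents) are hypothetical. The idea is simply to expose a __call__(self, input) signature, which is what Chroma >= 0.4.16 validates.

```python
from typing import Callable, List

class ChromaCompatibleEmbedder:
    """Adapter giving any embed_documents-style callable the
    (self, input) __call__ signature that Chroma >= 0.4.16 expects."""

    def __init__(self, embed_documents: Callable[[List[str]], List[List[float]]]):
        self._embed_documents = embed_documents

    def __call__(self, input: List[str]) -> List[List[float]]:
        # Chroma passes the batch of texts as `input`.
        return self._embed_documents(input)

# Toy embed function standing in for OpenAIEmbeddings().embed_documents:
def fake_embed(texts: List[str]) -> List[List[float]]:
    return [[float(len(t))] for t in texts]

embedder = ChromaCompatibleEmbedder(fake_embed)
embedder(["ab", "abc"])  # → [[2.0], [3.0]]
```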

@BlackSiao

chromadb==0.4.15
pip install chromadb==0.4.15 really solved my problem, thank you

@kaisersoser

Downgrading my chromadb version to chromadb==0.4.15 worked for me.
Just run the command: pip install --upgrade chromadb==0.4.15
