langchain.vectorstores.Chroma support for EmbeddingFunction.__call__ update of ChromaDB #13441

Closed
2 of 14 tasks
Mshardul opened this issue Nov 16, 2023 · 6 comments
Labels
🤖:bug Related to a bug, vulnerability, unexpected error with an existing feature 🔌: chroma Primarily related to ChromaDB integrations Ɑ: embeddings Related to text embedding models module Ɑ: vector store Related to vector store module

Comments

@Mshardul

System Info

This worked with my previous configuration:
langchain v0.0.225
chromadb v0.4.7

It no longer works with that configuration, nor with the latest versions of both:
langchain v0.0.336
chromadb v0.4.17

Who can help?

No response

Information

  • The official example notebooks/scripts
  • My own modified scripts

Related Components

  • LLMs/Chat Models
  • Embedding Models
  • Prompts / Prompt Templates / Prompt Selectors
  • Output Parsers
  • Document Loaders
  • Vector Stores / Retrievers
  • Memory
  • Agents / Agent Executors
  • Tools / Toolkits
  • Chains
  • Callbacks/Tracing
  • Async

Reproduction

I have the packages listed above installed.

Running either of these pieces of code triggers the error:

from langchain.document_loaders import TextLoader
from langchain.indexes import VectorstoreIndexCreator

loader = TextLoader(file_path)
index = VectorstoreIndexCreator().from_loaders([loader])  # this is where I am getting the error

OR

from langchain.document_loaders import TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import Chroma
from langchain.embeddings.openai import OpenAIEmbeddings

# file_path: path to a local text file, as in the first snippet
docs = TextLoader(file_path).load()

text_splitter = RecursiveCharacterTextSplitter()
splits = text_splitter.split_documents(docs)
embedding = OpenAIEmbeddings()
vectordb = Chroma.from_documents(  # this is where I am getting the error
    documents=splits,
    embedding=embedding,
)

Here is the error

Expected EmbeddingFunction.__call__ to have the following signature: odict_keys(['self', 'input']), got odict_keys(['self', 'args', 'kwargs'])
Please see https://docs.trychroma.com/embeddings for details of the EmbeddingFunction interface.
Please note the recent change to the EmbeddingFunction interface: https://docs.trychroma.com/migration#migration-to-0416---november-7-2023
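
For readers unfamiliar with this kind of error, here is a minimal sketch of how a signature check like the one in the message could work. It is not Chroma's actual source: check_call_signature, NewStyle and OldStyle are hypothetical names, and the only thing taken from the error above is the expected parameter list ['self', 'input'].

```python
import inspect


def check_call_signature(ef_class: type) -> None:
    """Hypothetical check: __call__ must take exactly (self, input)."""
    params = list(inspect.signature(ef_class.__call__).parameters.keys())
    expected = ["self", "input"]
    if params != expected:
        raise ValueError(
            f"Expected __call__ to have the signature {expected}, got {params}"
        )


class NewStyle:
    # Matches the interface named in the error message.
    def __call__(self, input):
        return [[0.0] for _ in input]


class OldStyle:
    # A generic signature like this is what the error message reports.
    def __call__(self, *args, **kwargs):
        return [[0.0]]


check_call_signature(NewStyle)  # passes silently

try:
    check_call_signature(OldStyle)
except ValueError as exc:
    message = str(exc)
    print(message)
```

The point is that the check looks at parameter names, not behaviour, so an embedding function that works fine but declares `*args, **kwargs` would still be rejected.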

Expected behavior

Earlier a chromadb instance would be created, and I would be able to query it with my prompts. That is the expected behaviour.


dosubot bot commented Nov 16, 2023

🤖

Hello @Mshardul,

Nice to meet you! I'm a bot designed to assist with issues, answer questions, and help you become a contributor to the LangChain project. While we wait for a human maintainer to join us, I'm here to start the troubleshooting process with you. Please hold tight while I take a look at your issue and come up with a detailed response.

@dosubot dosubot bot added Ɑ: embeddings Related to text embedding models module Ɑ: vector store Related to vector store module 🤖:bug Related to a bug, vulnerability, unexpected error with an existing feature 🔌: chroma Primarily related to ChromaDB integrations labels Nov 16, 2023

dosubot bot commented Nov 16, 2023

🤖

The error you're encountering is likely due to a recent change in the ChromaDB library. The EmbeddingFunction.__call__ method in LangChain v0.0.336 might not be compatible with the updated signature in ChromaDB v0.4.17.

Based on the similar issues found in the LangChain repository, there are a few potential solutions:

  1. Downgrade the Chroma library to a previous version (0.4.15) that is compatible with LangChain v0.0.336. This can be done by adding chromadb==0.4.15 to the requirements file or installing it directly using pip: pip install chromadb==0.4.15.

  2. Update both LangChain and Chroma to their latest versions using pip install -U langchain chromadb. According to the issues found, the compatibility issue has been fixed in LangChain v0.0.332 and above.

  3. If you're still encountering the issue after updating, you could try using the Chroma.add_documents() method instead of Chroma.from_documents(). This was suggested in the issue titled "Bug after the openai updated in Embedding".

Unfortunately, without the exact source code of OpenAIEmbeddings and ChromaDB's EmbeddingFunction.__call__, I can't provide a more specific solution. If none of the above solutions work, could you please provide more details about the OpenAIEmbeddings class and the exact error message you're seeing?
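
Before choosing between downgrading and upgrading, it may help to confirm which versions are actually installed in the active environment. The following is a small sketch using only the standard library; installed_version, release_tuple and CHROMA_INTERFACE_CHANGE are illustrative names, and the 0.4.16 threshold is taken from the migration link in the error message.

```python
import re
from importlib import metadata
from typing import Optional, Tuple


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def release_tuple(version: str) -> Tuple[int, ...]:
    """Parse the leading numeric release, e.g. '0.4.17' -> (0, 4, 17)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    return tuple(int(part) for part in match.group().split(".")) if match else ()


# Version named in the Chroma migration note linked from the error message.
CHROMA_INTERFACE_CHANGE = (0, 4, 16)

for name in ("langchain", "chromadb"):
    print(name, installed_version(name) or "not installed")

chroma = installed_version("chromadb")
if chroma is not None and release_tuple(chroma) >= CHROMA_INTERFACE_CHANGE:
    print("chromadb is at or past the EmbeddingFunction interface change (0.4.16)")
```

Running this inside the same interpreter or notebook kernel that raises the error avoids being misled by a different environment's pip.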


This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.


@umair313
Contributor

@Mshardul What is your Python version?
It's working for me on Python 3.9.18.

@Janaksunuwar

These version combinations worked for me.

!pip install langchain==0.0.316
!pip install openai==0.28.1
!pip install chromadb==0.4.15

@DarkTeddie

Issue Resolution: Embedding Function Override for GPT4All/OpenAI with Chroma Update Compatibility

Hello,

I noticed your query about an issue with OpenAIEmbeddings and wanted to share my solution, as I faced a similar problem with GPT4All. The root of the issue is an incompatibility between LangChain's embedding function implementation and the new requirements introduced by Chroma's latest update.

Problem Identified:

  • LangChain's embedding function lacks the __call__ method, which is now required by Chroma.
  • This problem is also present in the OpenAI implementation.

Solution Implemented:
I resolved this by creating a custom embedding function that inherits from the existing GPT4AllEmbeddings class and adds the __call__ method. The method returns the result of the embed_documents method.

Here's a snippet of the custom class implementation:

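from typing import List


# ExistingEmbeddingFunction is a placeholder for the class being extended
# (GPT4AllEmbeddings in my case).
class CustomEmbeddingFunction(ExistingEmbeddingFunction):
    def __init__(self):
        super().__init__()

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        """
        Embed a list of documents using the EmbeddingModel.

        Args:
            texts: List of texts to embed.

        Returns:
            List of embeddings, one for each text.
        """
        # self.client.embed is the GPT4All client call; other classes may differ.
        embeddings = [self.client.embed(str(text)) for text in texts]
        return [list(map(float, e)) for e in embeddings]

    def __call__(self, input: List[str]) -> List[List[float]]:
        # Chroma expects the signature (self, input); delegate to embed_documents.
        return self.embed_documents(input)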

I hope this solution works for you as well.
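
If subclassing is inconvenient, the same idea can be sketched with a small wrapper instead of inheritance. This is only an illustration under assumptions: CallableEmbeddingAdapter and FakeEmbedder are hypothetical names, and the wrapped object is assumed to expose embed_documents(texts) the way LangChain embedding classes do. Whether it resolves the error in a particular setup is untested here.

```python
from typing import List, Protocol


class SupportsEmbedDocuments(Protocol):
    """Anything exposing embed_documents, as LangChain embedding classes do."""

    def embed_documents(self, texts: List[str]) -> List[List[float]]: ...


class CallableEmbeddingAdapter:
    """Hypothetical wrapper adding a __call__(self, input) entry point."""

    def __init__(self, embedder: SupportsEmbedDocuments) -> None:
        self._embedder = embedder

    def __call__(self, input: List[str]) -> List[List[float]]:
        # Forward to the wrapped object's document-embedding method.
        return self._embedder.embed_documents(list(input))


class FakeEmbedder:
    """Stand-in for a real embedding model so the sketch runs offline."""

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        return [[float(len(text))] for text in texts]


adapter = CallableEmbeddingAdapter(FakeEmbedder())
vectors = adapter(["ab", "abcd"])
print(vectors)
```

In real use, FakeEmbedder would be replaced by an actual embedding object such as a GPT4AllEmbeddings or OpenAIEmbeddings instance; the wrapper leaves that object unmodified.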

@deeeed

deeeed commented Dec 4, 2023

+1 would be great to have a fix on that

@dosubot dosubot bot added the stale Issue has not had recent activity or appears to be solved. Stale issues will be automatically closed label Mar 11, 2024
@dosubot dosubot bot closed this as not planned Mar 18, 2024
@dosubot dosubot bot removed the stale Issue has not had recent activity or appears to be solved. Stale issues will be automatically closed label Mar 18, 2024