This issue was moved to a discussion.

unable to use embed_documents function for ChromaDB #17295

Closed
nithinreddyyyyyy opened this issue Feb 9, 2024 · 1 comment
Labels
🤖:bug Related to a bug, vulnerability, unexpected error with an existing feature 🔌: chroma Primarily related to ChromaDB integrations Ɑ: vector store Related to vector store module

Comments

@nithinreddyyyyyy

Issue with current documentation:

Below is the code I'm using to try to handle longer context lengths:

# Instantiate the OpenAIEmbeddings class
openai = OpenAIEmbeddings(openai_api_key="sk-")

# Generate embeddings for your documents
documents = [doc for doc in documents]

# Create a Chroma vector store from the documents
vectorstore = Chroma.from_documents(documents, openai.embed_documents)

It returns the following error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
[<ipython-input-81-029281717453>](https://localhost:8080/#) in <cell line: 8>()
      6 
      7 # Create a Chroma vector store from the documents
----> 8 vectorstore = Chroma.from_documents(documents, openai.embed_documents)

2 frames
[/usr/local/lib/python3.10/dist-packages/langchain_community/vectorstores/chroma.py](https://localhost:8080/#) in from_documents(cls, documents, embedding, ids, collection_name, persist_directory, client_settings, client, collection_metadata, **kwargs)
    776         texts = [doc.page_content for doc in documents]
    777         metadatas = [doc.metadata for doc in documents]
--> 778         return cls.from_texts(
    779             texts=texts,
    780             embedding=embedding,

[/usr/local/lib/python3.10/dist-packages/langchain_community/vectorstores/chroma.py](https://localhost:8080/#) in from_texts(cls, texts, embedding, metadatas, ids, collection_name, persist_directory, client_settings, client, collection_metadata, **kwargs)
    734                 documents=texts,
    735             ):
--> 736                 chroma_collection.add_texts(
    737                     texts=batch[3] if batch[3] else [],
    738                     metadatas=batch[2] if batch[2] else None,

[/usr/local/lib/python3.10/dist-packages/langchain_community/vectorstores/chroma.py](https://localhost:8080/#) in add_texts(self, texts, metadatas, ids, **kwargs)
    273         texts = list(texts)
    274         if self._embedding_function is not None:
--> 275             embeddings = self._embedding_function.embed_documents(texts)
    276         if metadatas:
    277             # fill metadatas with empty dicts if somebody

AttributeError: 'function' object has no attribute 'embed_documents'

Can you help me handle the context length issue? I don't want to use RecursiveCharacterTextSplitter because I've already chunked the data manually. I just want to send the data to ChromaDB while still handling the embedding model's context length.

Idea or request for content:

No response


dosubot bot commented Feb 9, 2024

🤖

Hey there, @nithinreddyyyyyy! Great to see you back. Let's dive into this new challenge you've brought to our doorstep.

Based on the traceback, the failing call is self._embedding_function.embed_documents(texts) inside Chroma.add_texts. Chroma expects its embedding argument to be an embeddings object that exposes an embed_documents method. Your code passes the method openai.embed_documents itself, so Chroma ends up looking for embed_documents on a function, which raises the AttributeError.

The OpenAIEmbeddings class does have an embed_documents method (for a list of texts) and an embed_query method (for a single text), so no custom wrapper is needed. Pass the OpenAIEmbeddings instance instead of one of its methods.

Here's how you can modify your code:

# Imports are assumed; the issue does not show them
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings

# Instantiate the OpenAIEmbeddings class
openai = OpenAIEmbeddings(openai_api_key="sk-")

# `documents` is your list of manually chunked Document objects

# Create a Chroma vector store, passing the embeddings object (not its method)
vectorstore = Chroma.from_documents(documents, openai)

Chroma then calls embed_documents on the whole list of texts for you. On context length: OpenAIEmbeddings has an embedding_ctx_length parameter (8191 tokens by default), and as far as I can tell it splits any text longer than that internally and combines the resulting embeddings instead of sending one oversized input. If your manual chunks are already below that limit, no extra splitting should be needed.
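To see why passing the method fails, here is a minimal sketch that reproduces the same kind of error without LangChain or an API key. FakeEmbeddings, FakeStore, and try_add are hypothetical stand-ins written for this illustration; they only mimic the call shown in the traceback and are not the real classes.

```python
class FakeEmbeddings:
    """Hypothetical stand-in for an embeddings class such as OpenAIEmbeddings."""

    def embed_documents(self, texts):
        # Toy vectors: one number per text (its length). A real model calls an API.
        return [[float(len(text))] for text in texts]


class FakeStore:
    """Hypothetical stand-in that uses the embedding object like the traceback shows."""

    def __init__(self, embedding_function):
        self._embedding_function = embedding_function

    def add_texts(self, texts):
        texts = list(texts)
        # Same attribute lookup as in the traceback: needs an object, not a callable.
        return self._embedding_function.embed_documents(texts)


def try_add(embedding_function, texts):
    """Return the vectors, or the name of the exception that was raised."""
    try:
        return FakeStore(embedding_function).add_texts(texts)
    except AttributeError as exc:
        return type(exc).__name__


embeddings = FakeEmbeddings()
ok = try_add(embeddings, ["a", "bcd"])                   # object: works
bad = try_add(embeddings.embed_documents, ["a", "bcd"])  # method: fails
print(ok, bad)
```

The first call succeeds because the store can look up embed_documents on the object. The second fails because a method has no embed_documents attribute of its own.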

Alternatively, if you want to control how many documents are sent at a time, you can create the vector store first and add the documents in batches:

# Imports are assumed; the issue does not show them
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings

# Instantiate the OpenAIEmbeddings class
openai = OpenAIEmbeddings(openai_api_key="sk-")

# Create an empty Chroma vector store that uses the embeddings object
vectorstore = Chroma(embedding_function=openai)

# Add the pre-chunked documents in batches (100 is an illustrative batch size)
batch_size = 100
for start in range(0, len(documents), batch_size):
    vectorstore.add_documents(documents[start:start + batch_size])
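Since you chunked the data manually, it may also help to check the chunks against the context limit before sending anything. The sketch below is only a rough pre-flight check: estimate_tokens and find_oversized are hypothetical helpers, and the "about 4 characters per token" rule is a coarse assumption for English text, not a real tokenizer. The 8191 default mirrors the embedding_ctx_length default mentioned above.

```python
def estimate_tokens(text):
    """Very rough estimate: about 4 characters per token (an assumption)."""
    return (len(text) + 3) // 4


def find_oversized(texts, max_tokens=8191, count_tokens=estimate_tokens):
    """Return (index, token_count) pairs for texts whose count exceeds max_tokens."""
    oversized = []
    for index, text in enumerate(texts):
        count = count_tokens(text)
        if count > max_tokens:
            oversized.append((index, count))
    return oversized


# Example with plain strings; for Document objects, pass
# [doc.page_content for doc in documents] instead.
chunks = ["short chunk", "x" * 40_000]
flagged = find_oversized(chunks)
print(flagged)
```

For exact counts, you could pass a tokenizer-based function as count_tokens in place of the estimate. Any chunk that gets flagged is a candidate for re-chunking by hand before it goes to the vector store.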

I hope this helps! If you have any other questions or run into any issues, please let me know.


@dosubot dosubot bot added Ɑ: vector store Related to vector store module 🤖:bug Related to a bug, vulnerability, unexpected error with an existing feature 🔌: chroma Primarily related to ChromaDB integrations labels Feb 9, 2024
@langchain-ai langchain-ai locked and limited conversation to collaborators Feb 9, 2024
@eyurtsev eyurtsev converted this issue into discussion #17319 Feb 9, 2024
