
unable to run get_len_safe_embeddings function which i wrote #17267

Closed

nithinreddyyyyyy opened this issue Feb 8, 2024 · 2 comments
Labels
🤖:bug Related to a bug, vulnerability, unexpected error with an existing feature Ɑ: embeddings Related to text embedding models module 🔌: openai Primarily related to OpenAI integrations

Comments

@nithinreddyyyyyy

Issue with current documentation:

Below is the code:

def _get_len_safe_embeddings(
    self, texts: List[str], *, engine: str, chunk_size: Optional[int] = None
) -> List[List[float]]:
    """
    Generate length-safe embeddings for a list of texts.

    This method handles tokenization and embedding generation, respecting the
    set embedding context length and chunk size. It supports both tiktoken
    and HuggingFace tokenizer based on the tiktoken_enabled flag.
    Args:
        texts (List[str]): A list of texts to embed.
        engine (str): The engine or model to use for embeddings.
        chunk_size (Optional[int]): The size of chunks for processing embeddings.

    Returns:
        List[List[float]]: A list of embeddings for each input text.
    """

    tokens = []
    indices = []
    model_name = self.tiktoken_model_name or self.model
    _chunk_size = chunk_size or self.chunk_size

    # If tiktoken flag set to False
    if not self.tiktoken_enabled:
        try:
            from transformers import AutoTokenizer
        except ImportError:
            raise ValueError(
                "Could not import transformers python package. "
                "This is needed in order to use OpenAIEmbeddings without "
                "`tiktoken`. Please install it with `pip install transformers`."
            )

        tokenizer = AutoTokenizer.from_pretrained(
            pretrained_model_name_or_path=model_name
        )
        for i, text in enumerate(texts):
            # Tokenize the text using HuggingFace transformers
            tokenized = tokenizer.encode(text, add_special_tokens=False)

            # Split tokens into chunks respecting the embedding_ctx_length
            for j in range(0, len(tokenized), self.embedding_ctx_length):
                token_chunk = tokenized[j : j + self.embedding_ctx_length]
                tokens.append(token_chunk)
                indices.append(i)

    # Embed each chunk separately
    batched_embeddings = []
    for i in range(0, len(tokens), _chunk_size):
        token_batch = tokens[i : i + _chunk_size]
        response = embed_with_retry(
            self,
            inputs=token_batch,
            **self._invocation_params,
        )
        if not isinstance(response, dict):
            response = response.dict()
        batched_embeddings.extend(r["embedding"] for r in response["data"])

    # Concatenate the embeddings for each text
    embeddings: List[List[float]] = [[] for _ in range(len(texts))]
    for i in range(len(indices)):
        embeddings[indices[i]].extend(batched_embeddings[i])

    return embeddings
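For reference, the chunk-and-reassemble bookkeeping that this method performs (split each text's tokens into context-sized chunks, embed each chunk, then concatenate the pieces per original text via the indices list) can be sketched standalone with a dummy embedder. All names below are illustrative, not LangChain's API:

```python
from typing import List

CTX_LEN = 4  # stand-in for embedding_ctx_length


def fake_embed(chunk: List[int]) -> List[float]:
    # dummy embedder: one float per token, just to make the flow visible
    return [float(t) for t in chunk]


def len_safe_embed(token_lists: List[List[int]]) -> List[List[float]]:
    tokens: List[List[int]] = []
    indices: List[int] = []
    for i, toks in enumerate(token_lists):
        # split each text's tokens into context-sized chunks,
        # remembering which text each chunk came from
        for j in range(0, len(toks), CTX_LEN):
            tokens.append(toks[j : j + CTX_LEN])
            indices.append(i)

    # embed each chunk, then concatenate the results per original text
    embeddings: List[List[float]] = [[] for _ in token_lists]
    for idx, chunk in zip(indices, tokens):
        embeddings[idx].extend(fake_embed(chunk))
    return embeddings


print(len_safe_embed([[1, 2, 3, 4, 5], [6]]))
# the first text spans two chunks, the second fits in one
```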

followed by the code below:

# Instantiate the OpenAIEmbeddings class
openai = OpenAIEmbeddings(openai_api_key="")

# Generate embeddings for your documents
embeddings = get_len_safe_embeddings([doc.page_content for doc in documents], engine="text-embedding-ada-002")

# Create tuples of text and corresponding embedding
text_embeddings = list(zip([doc.page_content for doc in documents], embeddings))

# Create a FAISS vector store from the embeddings
vectorStore = FAISS.from_embeddings(text_embeddings, openai)

It returned the following error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-31-14fb4a40f661> in <cell line: 5>()
      3 
      4 # Generate embeddings for your documents
----> 5 embeddings = _get_len_safe_embeddings([doc.page_content for doc in documents], engine="text-embedding-ada-002")
      6 
      7 # Create tuples of text and corresponding embedding

TypeError: _get_len_safe_embeddings() missing 1 required positional argument: 'texts'

Can you assist me with this code? It would be great to get this issue resolved. Can you write updated code?

Idea or request for content:

No response


dosubot bot commented Feb 8, 2024

🤖

Hey @nithinreddyyyyyy! Great to see you diving into another challenge. How's everything going on your end?

The error message you're seeing is due to the way you're calling the _get_len_safe_embeddings function. This function is a method of a class, and as such, it needs to be called on an instance of that class. The self parameter in the method definition refers to the instance of the class on which the method is being called.

From the code you've shared, it seems like _get_len_safe_embeddings is a method of the OpenAIEmbeddings class. Therefore, you need to call it on an instance of OpenAIEmbeddings, not as a standalone function.
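As a minimal, generic illustration (class and method names here are hypothetical, not LangChain code): calling a method through the class instead of an instance means the first argument fills the `self` slot, so a later parameter comes up missing and Python raises the same kind of TypeError:

```python
class Greeter:
    def greet(self, name: str) -> str:
        # 'self' is supplied automatically only when called on an instance
        return f"Hello, {name}!"


g = Greeter()
print(g.greet("world"))  # bound call: self is passed implicitly

try:
    # unbound call: "world" lands in the 'self' slot, so 'name' is missing
    Greeter.greet("world")
except TypeError as e:
    print(type(e).__name__)  # TypeError, just like in the traceback above
```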

Here's how you can modify your code:

# Instantiate the OpenAIEmbeddings class
openai = OpenAIEmbeddings(openai_api_key="")

# Generate embeddings for your documents
embeddings = openai._get_len_safe_embeddings([doc.page_content for doc in documents], engine="text-embedding-ada-002")

# Create tuples of text and corresponding embedding
text_embeddings = list(zip([doc.page_content for doc in documents], embeddings))

# Create a FAISS vector store from the embeddings
vectorStore = FAISS.from_embeddings(text_embeddings, openai)

Please note that methods starting with an underscore, like _get_len_safe_embeddings, are conventionally considered private in Python, meaning they're intended for internal use within the class and not meant to be called directly from outside the class. It's generally better to use public methods of the class, which are part of the class's public interface.
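A generic sketch of that convention (names hypothetical): the leading-underscore helper does the work internally, while callers go through the public method:

```python
class Embedder:
    def _compute(self, text: str) -> list:
        # "private" helper: the leading underscore signals internal use only
        return [float(len(text))]

    def embed(self, text: str) -> list:
        # public interface method that outside callers should use
        return self._compute(text)


print(Embedder().embed("abc"))
```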

In this case, you might want to use the embed_documents method instead, which internally calls _get_len_safe_embeddings. Here's how you can do that:

# Instantiate the OpenAIEmbeddings class
openai = OpenAIEmbeddings(openai_api_key="")

# Generate embeddings for your documents
embeddings = openai.embed_documents([doc.page_content for doc in documents])

# Create tuples of text and corresponding embedding
text_embeddings = list(zip([doc.page_content for doc in documents], embeddings))

# Create a FAISS vector store from the embeddings
vectorStore = FAISS.from_embeddings(text_embeddings, openai)

I hope this helps! If you have any more questions, feel free to ask.


This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.

@dosubot dosubot bot added Ɑ: embeddings Related to text embedding models module 🤖:bug Related to a bug, vulnerability, unexpected error with an existing feature 🔌: openai Primarily related to OpenAI integrations labels Feb 8, 2024
@eyurtsev
Collaborator

eyurtsev commented Feb 9, 2024

Closing issue since this is not a documentation question

@eyurtsev closed this as not planned Feb 9, 2024
2 participants