ValueError: Unknown encoding text-embedding-ada-002 #6726

Closed
2 of 14 tasks
heavenkiller2018 opened this issue Jun 25, 2023 · 6 comments
Labels
01 bug Confirmed bug

heavenkiller2018 commented Jun 25, 2023

System Info

no

Who can help?

@hwchase17 @agola11

Information

  • The official example notebooks/scripts
  • My own modified scripts

Related Components

  • LLMs/Chat Models
  • Embedding Models
  • Prompts / Prompt Templates / Prompt Selectors
  • Output Parsers
  • Document Loaders
  • Vector Stores / Retrievers
  • Memory
  • Agents / Agent Executors
  • Tools / Toolkits
  • Chains
  • Callbacks/Tracing
  • Async

Reproduction

When running the following code:

from langchain.embeddings import OpenAIEmbeddings

embedding_model = OpenAIEmbeddings()
embeddings = embedding_model.embed_documents(
    [
        "Hi there!",
        "Oh, hello!",
        "What's your name?",
        "My friends call me World",
        "Hello World!"
    ]
)

the following error occurs:

ValueError                                Traceback (most recent call last)
Cell In[9], line 1
----> 1 embeddings = embedding_model.embed_documents(
      2     [
      3         "Hi there!",
      4         "Oh, hello!",
      5         "What's your name?",
      6         "My friends call me World",
      7         "Hello World!"
      8     ]
      9 )
     10 len(embeddings), len(embeddings[0])

File ~/micromamba/envs/openai/lib/python3.11/site-packages/langchain/embeddings/openai.py:305, in OpenAIEmbeddings.embed_documents(self, texts, chunk_size)
    293 """Call out to OpenAI's embedding endpoint for embedding search docs.
    294 
    295 Args:
   (...)
    301     List of embeddings, one for each text.
    302 """
    303 # NOTE: to keep things simple, we assume the list may contain texts longer
    304 #       than the maximum context and use length-safe embedding function.
--> 305 return self._get_len_safe_embeddings(texts, engine=self.deployment)

File ~/micromamba/envs/openai/lib/python3.11/site-packages/langchain/embeddings/openai.py:225, in OpenAIEmbeddings._get_len_safe_embeddings(self, texts, engine, chunk_size)
    223 tokens = []
    224 indices = []
--> 225 encoding = tiktoken.get_encoding(self.model)
    226 for i, text in enumerate(texts):
    227     if self.model.endswith("001"):
    228         # See: https://github.com/openai/openai-python/issues/418#issuecomment-1525939500
    229         # replace newlines, which can negatively affect performance.

File ~/micromamba/envs/openai/lib/python3.11/site-packages/tiktoken/registry.py:60, in get_encoding(encoding_name)
     57     assert ENCODING_CONSTRUCTORS is not None
     59 if encoding_name not in ENCODING_CONSTRUCTORS:
---> 60     raise ValueError(f"Unknown encoding {encoding_name}")
     62 constructor = ENCODING_CONSTRUCTORS[encoding_name]
     63 enc = Encoding(**constructor())

ValueError: Unknown encoding text-embedding-ada-002

How can I fix this?

Expected behavior

no

dosubot (bot) added the 01 bug label on Jun 25, 2023
dosubot bot commented Jun 25, 2023

Answer generated by a 🤖

Answer

To fix the ValueError: Unknown encoding text-embedding-ada-002, you need to update the tiktoken package to the latest version that supports the text-embedding-ada-002 encoding. You can do this by running:

pip install --upgrade tiktoken

After upgrading tiktoken, try running your code again:

from langchain.embeddings import OpenAIEmbeddings

embedding_model = OpenAIEmbeddings()
embeddings = embedding_model.embed_documents(
    [
        "Hi there!",
        "Oh, hello!",
        "What's your name?",
        "My friends call me World",
        "Hello World!"
    ]
)

This should resolve the ValueError you encountered (docs/snippets/modules/data_connection/text_embedding/get_started.mdx).



This response is meant to be useful, save you time, and share context. It is not meant to be a precise solution, but rather a starting point for your own research.


heavenkiller2018 commented Jun 25, 2023

I have followed your instructions, but the problem still persists. @hwchase17

Before upgrading:

❯ pip list | grep tiktoken
tiktoken                 0.1.2

After upgrading tiktoken:

❯ pip list | grep tiktoken
tiktoken                 0.4.0
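The traceback places the failing call in `langchain/embeddings/openai.py`, not inside tiktoken: tiktoken is correctly rejecting a name that is not an encoding. That explains why moving tiktoken from 0.1.2 to 0.4.0 changed nothing; the installed langchain version is likely the one that matters. A small stdlib-only sketch for collecting both versions when reporting, using `importlib.metadata` (Python 3.8+); `installed_versions` is a hypothetical helper and the package list is only an example:

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    found: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    # Print one line per package, e.g. for pasting into a bug report.
    for name, ver in installed_versions(["langchain", "tiktoken", "openai"]).items():
        print(f"{name:<10} {ver or 'not installed'}")
```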

heavenkiller2018 (author) commented:

Is there anybody who can help?

dosubot bot commented Sep 26, 2023

Hi, @heavenkiller2018! I'm Dosu, and I'm here to help the LangChain team manage our backlog. I wanted to let you know that we are marking this issue as stale.

From what I understand, you reported a ValueError with the message "Unknown encoding text-embedding-ada-002" when using the OpenAIEmbeddings class. I suggested updating the tiktoken package to the latest version, but unfortunately, the problem still persists.

Before we close this issue, we wanted to check with you if it is still relevant to the latest version of the LangChain repository. If it is, please let us know by commenting on this issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days.

Thank you for your understanding, and we appreciate your contribution to the LangChain project!

dosubot (bot) added the stale label on Sep 26, 2023
dosubot (bot) closed this as not planned on Oct 5, 2023
dosubot (bot) removed the stale label on Oct 5, 2023
jgasconm commented:

I'm having the same problem when following the code in the LangChain documentation. I have updated tiktoken to the latest version (0.5.2). Has anyone solved this issue?

sajjadafridi commented:

I have the same issue, but in a different scenario.
My Flask API works locally and inside a local Docker container, but after pushing it to Google Cloud and running it on Cloud Run, it shows this error:

Unknown encoding text-embedding-ada-002. Plugins found: ['tiktoken_ext.openai_public']
