OpenAIEmbeddings model argument does not work #17409
🤖 Hey there, @nyck33! Great to see you diving into the depths of LangChain once again. 🚀 Based on the information you've provided, the warning comes from LangChain's token-counting logic, which asks tiktoken for the encoding that matches your model name. This is evident in the following code snippet:

```python
try:
    encoding = tiktoken.encoding_for_model(model_name)
except KeyError:
    logger.warning("Warning: model not found. Using cl100k_base encoding.")
    model = "cl100k_base"
    encoding = tiktoken.get_encoding(model)
```

This means that if the model you specified in your environment variables is not recognized by tiktoken, LangChain falls back to the `cl100k_base` encoding for token counting and logs the warning you are seeing. The embedding request itself is still sent with the model you configured. To resolve the warning, you can either use a model name that tiktoken recognizes or upgrade tiktoken to a version that registers your model. I hope this helps clarify the issue. If you have any further questions, feel free to ask. Sources
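To make the fallback concrete without requiring tiktoken installed: the pattern reduces to a registry lookup with a default. A minimal sketch, where `MODEL_TO_ENCODING` and `encoding_name_for_model` are hypothetical stand-ins for tiktoken's internal model registry and LangChain's lookup:

```python
import logging

logger = logging.getLogger(__name__)

# Hypothetical stand-in for tiktoken's model-to-encoding registry.
MODEL_TO_ENCODING = {
    "gpt-4": "cl100k_base",
    "text-embedding-ada-002": "cl100k_base",
}

def encoding_name_for_model(model_name: str) -> str:
    """Look up the encoding for a model, falling back the way LangChain does."""
    try:
        return MODEL_TO_ENCODING[model_name]
    except KeyError:
        # Only token counting is affected; the embedding request still
        # goes out with the model name you configured.
        logger.warning("Warning: model not found. Using cl100k_base encoding.")
        return "cl100k_base"
```

With an older registry, `encoding_name_for_model("text-embedding-3-large")` takes the fallback branch and logs the warning, which matches what the thread describes.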
So the model being mentioned in your response has to do with tokenization and not the embeddings? I was concerned that it was not able to make the embedding vectors using the model and dimensions I chose:

```
EMBEDDING_DIMENSION=256  # edit this based on your model preference, e.g. text-embedding-3-small, text-embedding-ada-002
EMBEDDING_MODEL=text-embedding-3-large
```

but I guess it is doing that part okay. What are the differences between the tokenization models?
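For context on the dimensions side: per OpenAI's model documentation, text-embedding-ada-002 and text-embedding-3-small return 1536-dimensional vectors by default, text-embedding-3-large returns 3072, and only the text-embedding-3 family accepts a `dimensions` parameter to shorten the output. A small sketch for sanity-checking a configured dimension against those defaults (the table and helper are mine, not from the thread):

```python
# Default output sizes per OpenAI's published embedding model specs.
DEFAULT_DIMENSIONS = {
    "text-embedding-ada-002": 1536,
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
}

def dimension_is_native(model: str, requested: int) -> bool:
    """True if `requested` matches the model's default output size."""
    return DEFAULT_DIMENSIONS.get(model) == requested
```

So a `.env` asking for 256 dimensions from text-embedding-3-large is a deliberate shortening, which only works if the `dimensions` parameter is actually passed through to the API call.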
TL;DR: Try: `pip install -U langchain tiktoken`

Reasons: From my perspective, the problem may be related to "text-embedding-3-small" and "text-embedding-3-large" not being registered in the `MODEL_TO_ENCODING` mapping of your installed tiktoken version. After updating, you may find the following in this file:

```python
MODEL_TO_ENCODING: dict[str, str] = {
    # chat
    "gpt-4": "cl100k_base",
    "gpt-3.5-turbo": "cl100k_base",
    "gpt-3.5": "cl100k_base",  # Common shorthand
    "gpt-35-turbo": "cl100k_base",  # Azure deployment name
    # base
    "davinci-002": "cl100k_base",
    "babbage-002": "cl100k_base",
    # embeddings
    "text-embedding-ada-002": "cl100k_base",
    "text-embedding-3-small": "cl100k_base",
    "text-embedding-3-large": "cl100k_base",
    # DEPRECATED MODELS
    # text (DEPRECATED)
    ...
```
The issue still persists @oliverwang15. Lib versions ->

my code:

```python
from langchain_openai import OpenAIEmbeddings
from pinecone import Pinecone

pc = Pinecone()
old_index = pc.Index(INDEX_NAME)
embed = OpenAIEmbeddings(model="text-embedding-3-small")
vectors = embed.embed_query("hello")
print(len(vectors))
old_index.upsert(vectors=[
    {"id": "A", "values": vectors},
])
```

printing: that's why my old_index shows a mismatch in dimension
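One way to fail fast on such a mismatch is to compare the embedding length with the index's configured dimension before calling `upsert`. A minimal sketch; the helper name is hypothetical, and it assumes you can obtain the index dimension (e.g. from `pc.describe_index(INDEX_NAME)`):

```python
def check_vector_fits_index(vector: list, index_dimension: int) -> None:
    """Raise client-side instead of letting the upsert fail server-side."""
    if len(vector) != index_dimension:
        raise ValueError(
            f"Embedding has {len(vector)} dimensions but the index expects "
            f"{index_dimension}; re-create the index with a matching dimension "
            f"or request a matching size from the embedding model."
        )
```

Calling this right after `embed_query` turns the silent mismatch into an explicit error that names both sizes.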
Checked other resources
Example Code
and in my .env
```
EMBEDDING_DIMENSION=256  # edit this based on your model preference, e.g. text-embedding-3-small, text-embedding-ada-002
EMBEDDING_MODEL=text-embedding-3-large
```
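One way to make sure the code actually honors these variables is to read them explicitly and pass them through, rather than relying on defaults. A minimal sketch, assuming these exact variable names; whether `OpenAIEmbeddings` accepts a `dimensions` argument depends on your langchain-openai version, so that line is left as a comment:

```python
import os

# Mirror the .env defaults shown above.
model = os.getenv("EMBEDDING_MODEL", "text-embedding-3-large")
dimension = int(os.getenv("EMBEDDING_DIMENSION", "256"))

# Hypothetical wiring (version-dependent):
# embed = OpenAIEmbeddings(model=model, dimensions=dimension)
```

Passing the values explicitly makes it obvious in one place which model and dimension the index and the embedder are each using.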
Error Message and Stack Trace (if applicable)
It's a warning.
Description
I want it to use the model I designated. Can I change the default in base.py?
I can't believe the results are actually correct but this is a tiny tiny children's book so it could have been a fluke.
System Info
```
(langchain) nyck33@nyck33-lenovo:/media/nyck33/65DA61B605B0A8C1/projects/langchain-deeplearning-ai-tutorial$ pip freeze | grep langchain
langchain==0.1.5
langchain-community==0.0.19
langchain-core==0.1.21
langchain-openai==0.0.5
```