GPTSimpleVectorIndex azure embedding error #990

Closed
kwin-wang opened this issue Mar 30, 2023 · 6 comments
Comments

@kwin-wang

After referring to this example Azure OpenAI demo and running the code, I received the following error message. How can I fix this problem?

INFO:openai:error_code=None error_message="This model's maximum context length is 2046 tokens, however you requested 3383 tokens (3383 in your prompt; 0 for the completion). Please reduce your prompt; or completion length." error_param=None error_type=invalid_request_error message='OpenAI API error received' stream_error=False
Traceback (most recent call last):
  File "/workspaces/gptdemo/test.py", line 61, in <module>
    index = GPTSimpleVectorIndex.from_documents(documents, service_context=service_context)
  File "/home/codespace/.python/current/lib/python3.10/site-packages/llama_index/indices/base.py", line 100, in from_documents
    return cls(
  File "/home/codespace/.python/current/lib/python3.10/site-packages/llama_index/indices/vector_store/vector_indices.py", line 94, in __init__
    super().__init__(
  File "/home/codespace/.python/current/lib/python3.10/site-packages/llama_index/indices/vector_store/base.py", line 58, in __init__
    super().__init__(
  File "/home/codespace/.python/current/lib/python3.10/site-packages/llama_index/indices/base.py", line 69, in __init__
    index_struct = self.build_index_from_nodes(nodes)
  File "/home/codespace/.python/current/lib/python3.10/site-packages/llama_index/token_counter/token_counter.py", line 78, in wrapped_llm_predict
    f_return_val = f(_self, *args, **kwargs)
  File "/home/codespace/.python/current/lib/python3.10/site-packages/llama_index/indices/vector_store/base.py", line 214, in build_index_from_nodes
    return self._build_index_from_nodes(nodes)
  File "/home/codespace/.python/current/lib/python3.10/site-packages/llama_index/indices/vector_store/base.py", line 203, in _build_index_from_nodes
    self._add_nodes_to_index(index_struct, nodes)
  File "/home/codespace/.python/current/lib/python3.10/site-packages/llama_index/indices/vector_store/base.py", line 182, in _add_nodes_to_index
    embedding_results = self._get_node_embedding_results(
  File "/home/codespace/.python/current/lib/python3.10/site-packages/llama_index/indices/vector_store/base.py", line 100, in _get_node_embedding_results
    ) = self._service_context.embed_model.get_queued_text_embeddings()
  File "/home/codespace/.python/current/lib/python3.10/site-packages/llama_index/embeddings/base.py", line 151, in get_queued_text_embeddings
    embeddings = self._get_text_embeddings(cur_batch_texts)
  File "/home/codespace/.python/current/lib/python3.10/site-packages/llama_index/embeddings/base.py", line 103, in _get_text_embeddings
    result = [self._get_text_embedding(text) for text in texts]
  File "/home/codespace/.python/current/lib/python3.10/site-packages/llama_index/embeddings/base.py", line 103, in <listcomp>
    result = [self._get_text_embedding(text) for text in texts]
  File "/home/codespace/.python/current/lib/python3.10/site-packages/llama_index/embeddings/langchain.py", line 30, in _get_text_embedding
    return self._langchain_embedding.embed_documents([text])[0]
  File "/home/codespace/.python/current/lib/python3.10/site-packages/langchain/embeddings/openai.py", line 254, in embed_documents
    response = embed_with_retry(
  File "/home/codespace/.python/current/lib/python3.10/site-packages/langchain/embeddings/openai.py", line 53, in embed_with_retry
    return _completion_with_retry(**kwargs)
  File "/home/codespace/.local/lib/python3.10/site-packages/tenacity/__init__.py", line 289, in wrapped_f
    return self(f, *args, **kw)
  File "/home/codespace/.local/lib/python3.10/site-packages/tenacity/__init__.py", line 379, in __call__
    do = self.iter(retry_state=retry_state)
  File "/home/codespace/.local/lib/python3.10/site-packages/tenacity/__init__.py", line 314, in iter
    return fut.result()
  File "/home/codespace/.python/current/lib/python3.10/concurrent/futures/_base.py", line 439, in result
    return self.__get_result()
  File "/home/codespace/.python/current/lib/python3.10/concurrent/futures/_base.py", line 391, in __get_result
    raise self._exception
  File "/home/codespace/.local/lib/python3.10/site-packages/tenacity/__init__.py", line 382, in __call__
    result = fn(*args, **kwargs)
  File "/home/codespace/.python/current/lib/python3.10/site-packages/langchain/embeddings/openai.py", line 51, in _completion_with_retry
    return embeddings.client.create(**kwargs)
  File "/home/codespace/.python/current/lib/python3.10/site-packages/openai/api_resources/embedding.py", line 33, in create
    response = super().create(*args, **kwargs)
  File "/home/codespace/.python/current/lib/python3.10/site-packages/openai/api_resources/abstract/engine_api_resource.py", line 153, in create
    response, _, api_key = requestor.request(
  File "/home/codespace/.python/current/lib/python3.10/site-packages/openai/api_requestor.py", line 226, in request
    resp, got_stream = self._interpret_response(result, stream)
  File "/home/codespace/.python/current/lib/python3.10/site-packages/openai/api_requestor.py", line 619, in _interpret_response
    self._interpret_response_line(
  File "/home/codespace/.python/current/lib/python3.10/site-packages/openai/api_requestor.py", line 682, in _interpret_response_line
    raise self.handle_error_response(
openai.error.InvalidRequestError: This model's maximum context length is 2046 tokens, however you requested 3383 tokens (3383 in your prompt; 0 for the completion). Please reduce your prompt; or completion length.

@eerhil commented Mar 31, 2023

I'm running into the same problem myself. Somehow the embedding model does not respect the maximum token length from the PromptHelper, and I could not find any alternative way to set it from the documentation.

Could someone help us solve this?

@DavidLiCIG

It doesn't seem like the embedding operation uses the PromptHelper. I got past that error by explicitly setting chunk_size_limit:

service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor, embed_model=embedding_llm, prompt_helper=prompt_helper, chunk_size_limit=1000)
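
For reference, here is a minimal sketch of where that setting fits into the Azure OpenAI demo setup (llama_index 0.5.x-era API). The endpoint, key, API version, and deployment names below are placeholders, and the exact embedding parameter names may vary with the langchain version:

```python
import openai
from langchain.llms import AzureOpenAI
from langchain.embeddings import OpenAIEmbeddings
from llama_index import GPTSimpleVectorIndex, LLMPredictor, PromptHelper, ServiceContext, SimpleDirectoryReader
from llama_index.embeddings.langchain import LangchainEmbedding

# Azure OpenAI connection settings (placeholders).
openai.api_type = "azure"
openai.api_base = "https://<your-resource>.openai.azure.com/"
openai.api_version = "2022-12-01"
openai.api_key = "<your-api-key>"

llm_predictor = LLMPredictor(llm=AzureOpenAI(deployment_name="<your-llm-deployment>"))

embedding_llm = LangchainEmbedding(OpenAIEmbeddings(
    deployment="<your-embedding-deployment>",  # as in the demo; parameter name may differ by langchain version
    openai_api_key=openai.api_key,
))

# Example values only; these govern the LLM side, not the embedding calls.
prompt_helper = PromptHelper(max_input_size=2048, num_output=256, max_chunk_overlap=20)

# chunk_size_limit caps the size of each chunk sent to the embedding endpoint,
# which is what keeps a single request under the model's ~2046-token window.
service_context = ServiceContext.from_defaults(
    llm_predictor=llm_predictor,
    embed_model=embedding_llm,
    prompt_helper=prompt_helper,
    chunk_size_limit=1000,
)

documents = SimpleDirectoryReader("data").load_data()  # "data" is a placeholder path
index = GPTSimpleVectorIndex.from_documents(documents, service_context=service_context)
```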

@fms-santos

I'm having the exact same problem when using the Azure OpenAI example

@fms-santos

> I'm running into the same problem myself. Somehow the embedding model does not respect the maximum token length from the PromptHelper, and I could not find any alternative way to set it from the documentation.

This didn't solve the issue in my case.

@eerhil commented Apr 11, 2023

> chunk_size_limit=1000

This solved the error for me. Maybe the OpenAI demo could be updated accordingly?
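
As a side note, a quick way to sanity-check chunk sizes before embedding (a sketch that assumes the tiktoken package; cl100k_base is the encoding used by text-embedding-ada-002, older embedding models use other encodings):

```python
import tiktoken

# Count tokens the way text-embedding-ada-002 does (cl100k_base encoding).
encoding = tiktoken.get_encoding("cl100k_base")

def num_tokens(text: str) -> int:
    return len(encoding.encode(text))

# Any chunk whose token count exceeds the embedding model's context window
# (2046 tokens in the error above) will trigger the InvalidRequestError.
print(num_tokens("example chunk of text"))
```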

@dosubot (bot) commented Aug 20, 2023

Hi, @kwin-wang! I'm here to help the LlamaIndex team manage their backlog and I wanted to let you know that we are marking this issue as stale.

Based on my understanding, the issue you reported was that the code requested more tokens than the model's maximum context length. It appears the issue was resolved by setting chunk_size_limit explicitly. Additionally, the embedding models have since been updated to take the maximum token lengths from the PromptHelper into account.

Before we close this issue, we wanted to check with you if it is still relevant to the latest version of the LlamaIndex repository. If it is, please let us know by commenting on the issue. Otherwise, feel free to close the issue yourself or it will be automatically closed in 7 days.

Thank you for your contribution to the LlamaIndex repository!

@dosubot (bot) added the stale label on Aug 20, 2023.
@dosubot (bot) closed this as not planned (won't fix, can't repro, duplicate, stale) on Sep 10, 2023.
@dosubot (bot) removed the stale label on Sep 10, 2023.