
Sometimes documents stay in the queue permanently #492

Closed
ravikhunt opened this issue Feb 8, 2024 · 8 comments · Fixed by #558
Labels
bug Something isn't working
Milestone

Comments

@ravikhunt

ravikhunt commented Feb 8, 2024

Sometimes the status does not update, the file stops progressing, and it stays in the queue indefinitely.

It's not a very big file: roughly 1.4 MB, with 1,500 rows and 5 columns of data.

(screenshot attached)

@ryonsteele
Contributor

Hi @ravikhunt
Try these troubleshooting steps and let us know what you find.
https://github.com/microsoft/PubSec-Info-Assistant/blob/main/docs/deployment/troubleshooting.md

@ravikhunt
Author

@ryonsteele here are the details:

{
"id": "xxxxx",
"file_path": "upload/testing mock data.csv",
"file_name": "testing mock data.csv",
"state": "Queued",
"start_timestamp": "2024-02-07 09:33:51",
"state_description": "",
"state_timestamp": "2024-02-07 09:49:42",
"status_updates": [
{ "status": "File uploaded from browser to Azure Blob Storage", "status_timestamp": "2024-02-07 09:33:51", "status_classification": "Info" },
{ "status": "Pipeline triggered by Blob Upload", "status_timestamp": "2024-02-07 09:33:58", "status_classification": "Info" },
{ "status": "FileUploadedFunc - FileUploadedFunc function started", "status_timestamp": "2024-02-07 09:33:58", "status_classification": "Debug" },
{ "status": "FileUploadedFunc - csv file sent to submit queue. Visible in 257 seconds", "status_timestamp": "2024-02-07 09:33:58", "status_classification": "Debug" },
{ "status": "FileLayoutParsingOther - Starting to parse the non-PDF file", "status_timestamp": "2024-02-07 09:38:29", "status_classification": "Info" },
{ "status": "FileLayoutParsingOther - Message received from non-pdf submit queue", "status_timestamp": "2024-02-07 09:38:29", "status_classification": "Debug" },
{ "status": "FileLayoutParsingOther - SAS token generated to access the file", "status_timestamp": "2024-02-07 09:38:29", "status_classification": "Debug" },
{ "status": "FileLayoutParsingOther - partitioning complete", "status_timestamp": "2024-02-07 09:38:45", "status_classification": "Debug" },
{ "status": "FileLayoutParsingOther - chunking complete. 3045 chunks created", "status_timestamp": "2024-02-07 09:38:47", "status_classification": "Debug" },
{ "status": "FileLayoutParsingOther - chunking stored.", "status_timestamp": "2024-02-07 09:49:42", "status_classification": "Debug" },
{ "status": "FileLayoutParsingOther - message sent to enrichment queue", "status_timestamp": "2024-02-07 09:49:42", "status_classification": "Debug" }
],
"_rid": "XHUAAPn2+-IjAAAAAAAAAA==",
"_self": "dbs/XHUAAA==/colls/XHUAAPn2+-I=/docs/XHUAAPn2+-IjAAAAAAAAAA==/",
"_etag": "\"02006397-0000-0d00-0000-65c352360000\"",
"_attachments": "attachments/",
"_ts": 1707299382
}

@ArpitaisAn0maly
Contributor

Hi @ravikhunt

From the details you posted, chunking completed successfully and the file appears to be stuck in the enrichment queue / embedding process. Can you please verify that your enrichment app is up and running?

@ravikhunt
Author

ravikhunt commented Feb 13, 2024

It's running, and it has been running the whole time. After that I uploaded another file, and it was processed successfully.

@mbarnettHMX
Collaborator

We are also seeing csv and xlsx files (the 'products' file in the example Ice Cream data set) get stuck in the embedding process, but in our case it's a max "requeue limit" issue. I assume that can be worked around with a change to 'max_requeue_count', but we haven't tested that yet.

{
"id": "dXBsb2FkL0ljZSBDcmVhbS9wcm9kdWN0cy54bHN4",
"file_path": "upload/Ice Cream/products.xlsx",
"file_name": "products.xlsx",
"state": "Error",
"start_timestamp": "2024-02-09 18:28:53",
"state_description": "",
"state_timestamp": "2024-02-09 19:05:46",
"status_updates": [
{
"status": "File uploaded from browser to Azure Blob Storage",
"status_timestamp": "2024-02-09 18:28:53",
"status_classification": "Info"
},
{
"status": "Pipeline triggered by Blob Upload",
"status_timestamp": "2024-02-09 18:29:00",
"status_classification": "Info"
},
{
"status": "FileUploadedFunc - FileUploadedFunc function started",
"status_timestamp": "2024-02-09 18:29:00",
"status_classification": "Debug"
},
{
"status": "FileUploadedFunc - xlsx file sent to submit queue. Visible in 59 seconds",
"status_timestamp": "2024-02-09 18:29:00",
"status_classification": "Debug"
},
{
"status": "FileLayoutParsingOther - Starting to parse the non-PDF file",
"status_timestamp": "2024-02-09 18:30:23",
"status_classification": "Info"
},
{
"status": "FileLayoutParsingOther - Message received from non-pdf submit queue",
"status_timestamp": "2024-02-09 18:30:23",
"status_classification": "Debug"
},
{
"status": "FileLayoutParsingOther - SAS token generated to access the file",
"status_timestamp": "2024-02-09 18:30:23",
"status_classification": "Debug"
},
{
"status": "FileLayoutParsingOther - partitioning complete",
"status_timestamp": "2024-02-09 18:30:41",
"status_classification": "Debug"
},
{
"status": "FileLayoutParsingOther - chunking complete. 173 chunks created",
"status_timestamp": "2024-02-09 18:30:41",
"status_classification": "Debug"
},
{
"status": "FileLayoutParsingOther - chunking stored.",
"status_timestamp": "2024-02-09 18:30:55",
"status_classification": "Debug"
},
{
"status": "FileLayoutParsingOther - message sent to enrichment queue",
"status_timestamp": "2024-02-09 18:30:56",
"status_classification": "Debug"
},
{
"status": "TextEnrichment - Received message from text-enrichment-queue ",
"status_timestamp": "2024-02-09 18:31:13",
"status_classification": "Debug"
},
{
"status": "TextEnrichment - detected language of text is en.",
"status_timestamp": "2024-02-09 18:31:14",
"status_classification": "Debug"
},
{
"status": "TextEnrichment - Text enrichment is complete",
"status_timestamp": "2024-02-09 18:33:49",
"status_classification": "Debug"
},
{
"status": "Embeddings process started with model azure-openai_text-embedding-ada-002",
"status_timestamp": "2024-02-09 18:34:07",
"status_classification": "Info"
},
{
"status": "Message requed to embeddings queue, attempt 1. Visible in 60 seconds. Error: .",
"status_timestamp": "2024-02-09 18:34:18",
"status_classification": "Error",
"stack_trace": "Traceback (most recent call last):\n File "/opt/python/3.10.12/lib/python3.10/threading.py", line 973, in _bootstrap\n self._bootstrap_inner()\n File "/opt/python/3.10.12/lib/python3.10/threading.py", line 1016, in _bootstrap_inner\n self.run()\n File "/opt/python/3.10.12/lib/python3.10/threading.py", line 953, in run\n self._target(*self._args, **self._kwargs)\n File "/tmp/8dc225b4bec7dc3/app.py", line 289, in poll_queue_thread\n poll_queue()\n File "/tmp/8dc225b4bec7dc3/app.py", line 420, in poll_queue\n statusLog.upsert_document(blob_path, f'Message requed to embeddings queue, attempt {str(requeue_count)}. Visible in {str(backoff)} seconds. Error: {str(error)}.',\n File "/tmp/8dc225b4bec7dc3/antenv/lib/python3.10/site-packages/tenacity/init.py", line 382, in call\n result = fn(*args, **kwargs)\n File "/tmp/8dc225b4bec7dc3/app.py", line 87, in encode\n response = openai.Embedding.create(\n File "/tmp/8dc225b4bec7dc3/antenv/lib/python3.10/site-packages/openai/api_resources/embedding.py", line 33, in create\n response = super().create(*args, **kwargs)\n File "/tmp/8dc225b4bec7dc3/antenv/lib/python3.10/site-packages/openai/api_resources/abstract/engine_api_resource.py", line 153, in create\n response, _, api_key = requestor.request(\n File "/tmp/8dc225b4bec7dc3/antenv/lib/python3.10/site-packages/openai/api_requestor.py", line 226, in request\n resp, got_stream = self._interpret_response(result, stream)\n File "/tmp/8dc225b4bec7dc3/antenv/lib/python3.10/site-packages/openai/api_requestor.py", line 619, in _interpret_response\n self._interpret_response_line(\n File "/tmp/8dc225b4bec7dc3/antenv/lib/python3.10/site-packages/openai/api_requestor.py", line 679, in _interpret_response_line\n raise self.handle_error_response(\nopenai.error.InvalidRequestError: This model's maximum context length is 8191 tokens, however you requested 28524 tokens (28524 in your prompt; 0 for the completion). 
Please reduce your prompt; or completion length.\n\nThe above exception was the direct cause of the following exception:\n\nTraceback (most recent call last):\n File "/tmp/8dc225b4bec7dc3/app.py", line 230, in embed_texts\n embeddings = model_obj.encode(texts)\n File "/tmp/8dc225b4bec7dc3/antenv/lib/python3.10/site-packages/tenacity/init.py", line 289, in wrapped_f\n return self(f, *args, **kw)\n File "/tmp/8dc225b4bec7dc3/antenv/lib/python3.10/site-packages/tenacity/init.py", line 379, in call\n do = self.iter(retry_state=retry_state)\n File "/tmp/8dc225b4bec7dc3/antenv/lib/python3.10/site-packages/tenacity/init.py", line 326, in iter\n raise retry_exc from fut.exception()\ntenacity.RetryError: RetryError[<Future at 0x7675e37f2a40 state=finished raised InvalidRequestError>]\n\nThe above exception was the direct cause of the following exception:\n\nTraceback (most recent call last):\n File "/tmp/8dc225b4bec7dc3/app.py", line 363, in poll_queue\n embedding = embed_texts(target_embeddings_model, [text])\n File "/tmp/8dc225b4bec7dc3/app.py", line 244, in embed_texts\n raise HTTPException(status_code=500, detail=f"Failed to embed: {str(error)}") from error\nfastapi.exceptions.HTTPException\n"
},
{
"status": "Embeddings process started with model azure-openai_text-embedding-ada-002",
"status_timestamp": "2024-02-09 18:36:07",
"status_classification": "Info"
},
{
"status": "Message requed to embeddings queue, attempt 2. Visible in 193 seconds. Error: .",
"status_timestamp": "2024-02-09 18:36:15",
"status_classification": "Error",
"stack_trace": "Traceback (most recent call last):\n File "/opt/python/3.10.12/lib/python3.10/threading.py", line 973, in _bootstrap\n self._bootstrap_inner()\n File "/opt/python/3.10.12/lib/python3.10/threading.py", line 1016, in _bootstrap_inner\n self.run()\n File "/opt/python/3.10.12/lib/python3.10/threading.py", line 953, in run\n self._target(*self._args, **self._kwargs)\n File "/tmp/8dc225b4bec7dc3/app.py", line 289, in poll_queue_thread\n poll_queue()\n File "/tmp/8dc225b4bec7dc3/app.py", line 420, in poll_queue\n statusLog.upsert_document(blob_path, f'Message requed to embeddings queue, attempt {str(requeue_count)}. Visible in {str(backoff)} seconds. Error: {str(error)}.',\n File "/tmp/8dc225b4bec7dc3/antenv/lib/python3.10/site-packages/tenacity/init.py", line 382, in call\n result = fn(*args, **kwargs)\n File "/tmp/8dc225b4bec7dc3/app.py", line 87, in encode\n response = openai.Embedding.create(\n File "/tmp/8dc225b4bec7dc3/antenv/lib/python3.10/site-packages/openai/api_resources/embedding.py", line 33, in create\n response = super().create(*args, **kwargs)\n File "/tmp/8dc225b4bec7dc3/antenv/lib/python3.10/site-packages/openai/api_resources/abstract/engine_api_resource.py", line 153, in create\n response, _, api_key = requestor.request(\n File "/tmp/8dc225b4bec7dc3/antenv/lib/python3.10/site-packages/openai/api_requestor.py", line 226, in request\n resp, got_stream = self._interpret_response(result, stream)\n File "/tmp/8dc225b4bec7dc3/antenv/lib/python3.10/site-packages/openai/api_requestor.py", line 619, in _interpret_response\n self._interpret_response_line(\n File "/tmp/8dc225b4bec7dc3/antenv/lib/python3.10/site-packages/openai/api_requestor.py", line 679, in _interpret_response_line\n raise self.handle_error_response(\nopenai.error.InvalidRequestError: This model's maximum context length is 8191 tokens, however you requested 28524 tokens (28524 in your prompt; 0 for the completion). 
Please reduce your prompt; or completion length.\n\nThe above exception was the direct cause of the following exception:\n\nTraceback (most recent call last):\n File "/tmp/8dc225b4bec7dc3/app.py", line 230, in embed_texts\n embeddings = model_obj.encode(texts)\n File "/tmp/8dc225b4bec7dc3/antenv/lib/python3.10/site-packages/tenacity/init.py", line 289, in wrapped_f\n return self(f, *args, **kw)\n File "/tmp/8dc225b4bec7dc3/antenv/lib/python3.10/site-packages/tenacity/init.py", line 379, in call\n do = self.iter(retry_state=retry_state)\n File "/tmp/8dc225b4bec7dc3/antenv/lib/python3.10/site-packages/tenacity/init.py", line 326, in iter\n raise retry_exc from fut.exception()\ntenacity.RetryError: RetryError[<Future at 0x7675e85cb940 state=finished raised InvalidRequestError>]\n\nThe above exception was the direct cause of the following exception:\n\nTraceback (most recent call last):\n File "/tmp/8dc225b4bec7dc3/app.py", line 363, in poll_queue\n embedding = embed_texts(target_embeddings_model, [text])\n File "/tmp/8dc225b4bec7dc3/app.py", line 244, in embed_texts\n raise HTTPException(status_code=500, detail=f"Failed to embed: {str(error)}") from error\nfastapi.exceptions.HTTPException\n"
},
{
"status": "Embeddings process started with model azure-openai_text-embedding-ada-002",
"status_timestamp": "2024-02-09 18:40:18",
"status_classification": "Info"
},
{
"status": "Message requed to embeddings queue, attempt 3. Visible in 279 seconds. Error: .",
"status_timestamp": "2024-02-09 18:40:28",
"status_classification": "Error",
"stack_trace": "Traceback (most recent call last):\n File "/opt/python/3.10.12/lib/python3.10/threading.py", line 973, in _bootstrap\n self._bootstrap_inner()\n File "/opt/python/3.10.12/lib/python3.10/threading.py", line 1016, in _bootstrap_inner\n self.run()\n File "/opt/python/3.10.12/lib/python3.10/threading.py", line 953, in run\n self._target(*self._args, **self._kwargs)\n File "/tmp/8dc225b4bec7dc3/app.py", line 289, in poll_queue_thread\n poll_queue()\n File "/tmp/8dc225b4bec7dc3/app.py", line 420, in poll_queue\n statusLog.upsert_document(blob_path, f'Message requed to embeddings queue, attempt {str(requeue_count)}. Visible in {str(backoff)} seconds. Error: {str(error)}.',\n File "/tmp/8dc225b4bec7dc3/antenv/lib/python3.10/site-packages/tenacity/init.py", line 382, in call\n result = fn(*args, **kwargs)\n File "/tmp/8dc225b4bec7dc3/app.py", line 87, in encode\n response = openai.Embedding.create(\n File "/tmp/8dc225b4bec7dc3/antenv/lib/python3.10/site-packages/openai/api_resources/embedding.py", line 33, in create\n response = super().create(*args, **kwargs)\n File "/tmp/8dc225b4bec7dc3/antenv/lib/python3.10/site-packages/openai/api_resources/abstract/engine_api_resource.py", line 153, in create\n response, _, api_key = requestor.request(\n File "/tmp/8dc225b4bec7dc3/antenv/lib/python3.10/site-packages/openai/api_requestor.py", line 226, in request\n resp, got_stream = self._interpret_response(result, stream)\n File "/tmp/8dc225b4bec7dc3/antenv/lib/python3.10/site-packages/openai/api_requestor.py", line 619, in _interpret_response\n self._interpret_response_line(\n File "/tmp/8dc225b4bec7dc3/antenv/lib/python3.10/site-packages/openai/api_requestor.py", line 679, in _interpret_response_line\n raise self.handle_error_response(\nopenai.error.InvalidRequestError: This model's maximum context length is 8191 tokens, however you requested 28524 tokens (28524 in your prompt; 0 for the completion). 
Please reduce your prompt; or completion length.\n\nThe above exception was the direct cause of the following exception:\n\nTraceback (most recent call last):\n File "/tmp/8dc225b4bec7dc3/app.py", line 230, in embed_texts\n embeddings = model_obj.encode(texts)\n File "/tmp/8dc225b4bec7dc3/antenv/lib/python3.10/site-packages/tenacity/init.py", line 289, in wrapped_f\n return self(f, *args, **kw)\n File "/tmp/8dc225b4bec7dc3/antenv/lib/python3.10/site-packages/tenacity/init.py", line 379, in call\n do = self.iter(retry_state=retry_state)\n File "/tmp/8dc225b4bec7dc3/antenv/lib/python3.10/site-packages/tenacity/init.py", line 326, in iter\n raise retry_exc from fut.exception()\ntenacity.RetryError: RetryError[<Future at 0x7675e8467d30 state=finished raised InvalidRequestError>]\n\nThe above exception was the direct cause of the following exception:\n\nTraceback (most recent call last):\n File "/tmp/8dc225b4bec7dc3/app.py", line 363, in poll_queue\n embedding = embed_texts(target_embeddings_model, [text])\n File "/tmp/8dc225b4bec7dc3/app.py", line 244, in embed_texts\n raise HTTPException(status_code=500, detail=f"Failed to embed: {str(error)}") from error\nfastapi.exceptions.HTTPException\n"
},
{
"status": "Embeddings process started with model azure-openai_text-embedding-ada-002",
"status_timestamp": "2024-02-09 18:46:44",
"status_classification": "Info"
},
{
"status": "Message requed to embeddings queue, attempt 4. Visible in 292 seconds. Error: .",
"status_timestamp": "2024-02-09 18:46:54",
"status_classification": "Error",
"stack_trace": "Traceback (most recent call last):\n File "/opt/python/3.10.12/lib/python3.10/threading.py", line 973, in _bootstrap\n self._bootstrap_inner()\n File "/opt/python/3.10.12/lib/python3.10/threading.py", line 1016, in _bootstrap_inner\n self.run()\n File "/opt/python/3.10.12/lib/python3.10/threading.py", line 953, in run\n self._target(*self._args, **self._kwargs)\n File "/tmp/8dc225b4bec7dc3/app.py", line 289, in poll_queue_thread\n poll_queue()\n File "/tmp/8dc225b4bec7dc3/app.py", line 420, in poll_queue\n statusLog.upsert_document(blob_path, f'Message requed to embeddings queue, attempt {str(requeue_count)}. Visible in {str(backoff)} seconds. Error: {str(error)}.',\n File "/tmp/8dc225b4bec7dc3/antenv/lib/python3.10/site-packages/tenacity/init.py", line 382, in call\n result = fn(*args, **kwargs)\n File "/tmp/8dc225b4bec7dc3/app.py", line 87, in encode\n response = openai.Embedding.create(\n File "/tmp/8dc225b4bec7dc3/antenv/lib/python3.10/site-packages/openai/api_resources/embedding.py", line 33, in create\n response = super().create(*args, **kwargs)\n File "/tmp/8dc225b4bec7dc3/antenv/lib/python3.10/site-packages/openai/api_resources/abstract/engine_api_resource.py", line 153, in create\n response, _, api_key = requestor.request(\n File "/tmp/8dc225b4bec7dc3/antenv/lib/python3.10/site-packages/openai/api_requestor.py", line 226, in request\n resp, got_stream = self._interpret_response(result, stream)\n File "/tmp/8dc225b4bec7dc3/antenv/lib/python3.10/site-packages/openai/api_requestor.py", line 619, in _interpret_response\n self._interpret_response_line(\n File "/tmp/8dc225b4bec7dc3/antenv/lib/python3.10/site-packages/openai/api_requestor.py", line 679, in _interpret_response_line\n raise self.handle_error_response(\nopenai.error.InvalidRequestError: This model's maximum context length is 8191 tokens, however you requested 28524 tokens (28524 in your prompt; 0 for the completion). 
Please reduce your prompt; or completion length.\n\nThe above exception was the direct cause of the following exception:\n\nTraceback (most recent call last):\n File "/tmp/8dc225b4bec7dc3/app.py", line 230, in embed_texts\n embeddings = model_obj.encode(texts)\n File "/tmp/8dc225b4bec7dc3/antenv/lib/python3.10/site-packages/tenacity/init.py", line 289, in wrapped_f\n return self(f, *args, **kw)\n File "/tmp/8dc225b4bec7dc3/antenv/lib/python3.10/site-packages/tenacity/init.py", line 379, in call\n do = self.iter(retry_state=retry_state)\n File "/tmp/8dc225b4bec7dc3/antenv/lib/python3.10/site-packages/tenacity/init.py", line 326, in iter\n raise retry_exc from fut.exception()\ntenacity.RetryError: RetryError[<Future at 0x7675e8680430 state=finished raised InvalidRequestError>]\n\nThe above exception was the direct cause of the following exception:\n\nTraceback (most recent call last):\n File "/tmp/8dc225b4bec7dc3/app.py", line 363, in poll_queue\n embedding = embed_texts(target_embeddings_model, [text])\n File "/tmp/8dc225b4bec7dc3/app.py", line 244, in embed_texts\n raise HTTPException(status_code=500, detail=f"Failed to embed: {str(error)}") from error\nfastapi.exceptions.HTTPException\n"
},
{
"status": "Embeddings process started with model azure-openai_text-embedding-ada-002",
"status_timestamp": "2024-02-09 18:53:01",
"status_classification": "Info"
},
{
"status": "Message requed to embeddings queue, attempt 5. Visible in 730 seconds. Error: .",
"status_timestamp": "2024-02-09 18:53:13",
"status_classification": "Error",
"stack_trace": "Traceback (most recent call last):\n File "/opt/python/3.10.12/lib/python3.10/threading.py", line 973, in _bootstrap\n self._bootstrap_inner()\n File "/opt/python/3.10.12/lib/python3.10/threading.py", line 1016, in _bootstrap_inner\n self.run()\n File "/opt/python/3.10.12/lib/python3.10/threading.py", line 953, in run\n self._target(*self._args, **self._kwargs)\n File "/tmp/8dc225b4bec7dc3/app.py", line 289, in poll_queue_thread\n poll_queue()\n File "/tmp/8dc225b4bec7dc3/app.py", line 420, in poll_queue\n statusLog.upsert_document(blob_path, f'Message requed to embeddings queue, attempt {str(requeue_count)}. Visible in {str(backoff)} seconds. Error: {str(error)}.',\n File "/tmp/8dc225b4bec7dc3/antenv/lib/python3.10/site-packages/tenacity/init.py", line 382, in call\n result = fn(*args, **kwargs)\n File "/tmp/8dc225b4bec7dc3/app.py", line 87, in encode\n response = openai.Embedding.create(\n File "/tmp/8dc225b4bec7dc3/antenv/lib/python3.10/site-packages/openai/api_resources/embedding.py", line 33, in create\n response = super().create(*args, **kwargs)\n File "/tmp/8dc225b4bec7dc3/antenv/lib/python3.10/site-packages/openai/api_resources/abstract/engine_api_resource.py", line 153, in create\n response, _, api_key = requestor.request(\n File "/tmp/8dc225b4bec7dc3/antenv/lib/python3.10/site-packages/openai/api_requestor.py", line 226, in request\n resp, got_stream = self._interpret_response(result, stream)\n File "/tmp/8dc225b4bec7dc3/antenv/lib/python3.10/site-packages/openai/api_requestor.py", line 619, in _interpret_response\n self._interpret_response_line(\n File "/tmp/8dc225b4bec7dc3/antenv/lib/python3.10/site-packages/openai/api_requestor.py", line 679, in _interpret_response_line\n raise self.handle_error_response(\nopenai.error.InvalidRequestError: This model's maximum context length is 8191 tokens, however you requested 28524 tokens (28524 in your prompt; 0 for the completion). 
Please reduce your prompt; or completion length.\n\nThe above exception was the direct cause of the following exception:\n\nTraceback (most recent call last):\n File "/tmp/8dc225b4bec7dc3/app.py", line 230, in embed_texts\n embeddings = model_obj.encode(texts)\n File "/tmp/8dc225b4bec7dc3/antenv/lib/python3.10/site-packages/tenacity/init.py", line 289, in wrapped_f\n return self(f, *args, **kw)\n File "/tmp/8dc225b4bec7dc3/antenv/lib/python3.10/site-packages/tenacity/init.py", line 379, in call\n do = self.iter(retry_state=retry_state)\n File "/tmp/8dc225b4bec7dc3/antenv/lib/python3.10/site-packages/tenacity/init.py", line 326, in iter\n raise retry_exc from fut.exception()\ntenacity.RetryError: RetryError[<Future at 0x7675e84d37f0 state=finished raised InvalidRequestError>]\n\nThe above exception was the direct cause of the following exception:\n\nTraceback (most recent call last):\n File "/tmp/8dc225b4bec7dc3/app.py", line 363, in poll_queue\n embedding = embed_texts(target_embeddings_model, [text])\n File "/tmp/8dc225b4bec7dc3/app.py", line 244, in embed_texts\n raise HTTPException(status_code=500, detail=f"Failed to embed: {str(error)}") from error\nfastapi.exceptions.HTTPException\n"
},
{
"status": "Embeddings process started with model azure-openai_text-embedding-ada-002",
"status_timestamp": "2024-02-09 19:05:35",
"status_classification": "Info"
},
{
"status": "An error occurred, max requeue limit was reached. Error description: ",
"status_timestamp": "2024-02-09 19:05:46",
"status_classification": "Error",
"stack_trace": "Traceback (most recent call last):\n File "/opt/python/3.10.12/lib/python3.10/threading.py", line 973, in _bootstrap\n self._bootstrap_inner()\n File "/opt/python/3.10.12/lib/python3.10/threading.py", line 1016, in _bootstrap_inner\n self.run()\n File "/opt/python/3.10.12/lib/python3.10/threading.py", line 953, in run\n self._target(*self._args, **self._kwargs)\n File "/tmp/8dc225b4bec7dc3/app.py", line 289, in poll_queue_thread\n poll_queue()\n File "/tmp/8dc225b4bec7dc3/app.py", line 425, in poll_queue\n statusLog.upsert_document(\n File "/tmp/8dc225b4bec7dc3/antenv/lib/python3.10/site-packages/tenacity/init.py", line 382, in call\n result = fn(*args, **kwargs)\n File "/tmp/8dc225b4bec7dc3/app.py", line 87, in encode\n response = openai.Embedding.create(\n File "/tmp/8dc225b4bec7dc3/antenv/lib/python3.10/site-packages/openai/api_resources/embedding.py", line 33, in create\n response = super().create(*args, **kwargs)\n File "/tmp/8dc225b4bec7dc3/antenv/lib/python3.10/site-packages/openai/api_resources/abstract/engine_api_resource.py", line 153, in create\n response, _, api_key = requestor.request(\n File "/tmp/8dc225b4bec7dc3/antenv/lib/python3.10/site-packages/openai/api_requestor.py", line 226, in request\n resp, got_stream = self._interpret_response(result, stream)\n File "/tmp/8dc225b4bec7dc3/antenv/lib/python3.10/site-packages/openai/api_requestor.py", line 619, in _interpret_response\n self._interpret_response_line(\n File "/tmp/8dc225b4bec7dc3/antenv/lib/python3.10/site-packages/openai/api_requestor.py", line 679, in _interpret_response_line\n raise self.handle_error_response(\nopenai.error.InvalidRequestError: This model's maximum context length is 8191 tokens, however you requested 28524 tokens (28524 in your prompt; 0 for the completion). 
Please reduce your prompt; or completion length.\n\nThe above exception was the direct cause of the following exception:\n\nTraceback (most recent call last):\n File "/tmp/8dc225b4bec7dc3/app.py", line 230, in embed_texts\n embeddings = model_obj.encode(texts)\n File "/tmp/8dc225b4bec7dc3/antenv/lib/python3.10/site-packages/tenacity/init.py", line 289, in wrapped_f\n return self(f, *args, **kw)\n File "/tmp/8dc225b4bec7dc3/antenv/lib/python3.10/site-packages/tenacity/init.py", line 379, in call\n do = self.iter(retry_state=retry_state)\n File "/tmp/8dc225b4bec7dc3/antenv/lib/python3.10/site-packages/tenacity/init.py", line 326, in iter\n raise retry_exc from fut.exception()\ntenacity.RetryError: RetryError[<Future at 0x7675e8573b20 state=finished raised InvalidRequestError>]\n\nThe above exception was the direct cause of the following exception:\n\nTraceback (most recent call last):\n File "/tmp/8dc225b4bec7dc3/app.py", line 363, in poll_queue\n embedding = embed_texts(target_embeddings_model, [text])\n File "/tmp/8dc225b4bec7dc3/app.py", line 244, in embed_texts\n raise HTTPException(status_code=500, detail=f"Failed to embed: {str(error)}") from error\nfastapi.exceptions.HTTPException\n"
}
],
"_rid": "jtwnAL-HDLWcAAAAAAAAAA==",
"_self": "dbs/jtwnAA==/colls/jtwnAL-HDLU=/docs/jtwnAL-HDLWcAAAAAAAAAA==/",
"_etag": "\"9701b912-0000-0100-0000-65c6778a0000\"",
"_attachments": "attachments/",
"_ts": 1707505546
}
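The growing "Visible in N seconds" delays in the log above suggest a requeue loop with exponential backoff plus jitter, capped at a maximum attempt count. Here is a minimal sketch of that pattern; the names (MAX_REQUEUE_COUNT, process_with_requeue, the handler) are hypothetical and do not come from the project's actual code:

```python
import random

MAX_REQUEUE_COUNT = 5  # hypothetical mirror of the 'max_requeue_count' setting


def backoff_seconds(attempt, base=60, cap=900):
    """Exponential backoff with jitter: the delay roughly doubles per attempt,
    which is consistent with the growing 'Visible in N seconds' values above."""
    return min(cap, base * (2 ** (attempt - 1))) + random.randint(0, 30)


def process_with_requeue(message, handler):
    """Retry a failing message, re-queueing with backoff until the limit is hit."""
    for attempt in range(1, MAX_REQUEUE_COUNT + 1):
        try:
            return handler(message)
        except Exception:
            if attempt == MAX_REQUEUE_COUNT:
                # Corresponds to "max requeue limit was reached" in the status log.
                raise
            delay = backoff_seconds(attempt)
            # In the real pipeline the message goes back on the queue and stays
            # invisible for `delay` seconds; here we just record the delays.
            message.setdefault("requeues", []).append(delay)
```

Raising the limit only buys more attempts; if each attempt fails deterministically (as with the token-limit error above), the message will still exhaust the limit.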

@georearl
Contributor

georearl commented Feb 28, 2024

I believe this may be due to the chunking behavior of unstructured.io. In the version we use, the library doesn't chunk by size; it just creates a single chunk, which crashes later steps that can only cope with chunks up to a particular size. We have a ticket on the board to address this as part of the 1.1 release.
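Size-aware chunking of the kind described above can be sketched as follows. This is a minimal illustration, not the project's implementation: token counting is approximated by whitespace word count, whereas a real pipeline would use the embedding model's own tokenizer, and the 8191 figure is the context limit reported in the error traces above.

```python
MAX_TOKENS = 8191  # context limit reported for text-embedding-ada-002


def approx_token_count(text):
    """Crude stand-in for a real tokenizer: count whitespace-separated words."""
    return len(text.split())


def chunk_by_size(rows, max_tokens=MAX_TOKENS):
    """Greedily pack rows into chunks that stay under the token limit,
    instead of emitting one oversized chunk that the embedder rejects."""
    chunks, current, current_tokens = [], [], 0
    for row in rows:
        n = approx_token_count(row)
        if current and current_tokens + n > max_tokens:
            chunks.append("\n".join(current))
            current, current_tokens = [], 0
        current.append(row)
        current_tokens += n
    if current:
        chunks.append("\n".join(current))
    return chunks
```

With this shape, a 28,524-token table would come through as several under-limit chunks rather than a single request that exceeds the model's 8,191-token maximum.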

@dayland
Contributor

dayland commented Mar 14, 2024

PR #558 was applied to main to address these issues. Pull the latest from main and re-run make deploy.

@georearl
Contributor

Resolved; closing due to inactivity.
